
Scalable k-NN based text clustering

  • Alessandro Lulli
  • Thibault Debatty
  • Matteo Dell'Amico
  • Pietro Michiardi
  • Laura Ricci
  • University of Pisa
  • EURECOM Ecole d'Ingénieur et Centre de Recherche en Sciences du Numérique
  • Symantec Research Labs
  • CNR

Publication: Contribution to book/report/conference proceedings, conference contribution, peer-reviewed

14 citations (Scopus)

Abstract

Clustering items using textual features is an important problem with many applications, such as root-cause analysis of spam campaigns, as well as identifying common topics in social media. Due to the sheer size of such data, algorithmic scalability becomes a major concern. In this work, we present our approach for text clustering that builds an approximate k-NN graph, which is then used to compute connected components representing clusters. Our focus is to understand the scalability / accuracy tradeoff that underlies our method: we do so through an extensive experimental campaign, where we use real-life datasets, and show that even rough approximations of k-NN graphs are sufficient to identify valid clusters. Our method is scalable and can be easily tuned to meet requirements stemming from different application domains.
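
The pipeline described in the abstract (build a k-NN graph over the texts, then read clusters off its connected components) can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes an exact brute-force k-NN search for the approximate graph construction, and the character-trigram Jaccard similarity, the `min_similarity` edge threshold, and all function names are assumptions introduced purely for illustration.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased string."""
    text = text.lower()
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_graph(texts, k=2):
    """Exact brute-force k-NN graph: index -> list of (similarity, neighbour).

    Assumption: a stand-in for an approximate k-NN builder; it costs O(n^2)
    comparisons and is only meant to show the shape of the data.
    """
    grams = [ngrams(t) for t in texts]
    graph = {}
    for i in range(len(texts)):
        scored = [
            (jaccard(grams[i], grams[j]), j)
            for j in range(len(texts))
            if j != i
        ]
        scored.sort(reverse=True)  # most similar neighbours first
        graph[i] = scored[:k]
    return graph


def connected_components(graph, n, min_similarity=0.3):
    """Union-find over k-NN edges whose similarity reaches min_similarity.

    Assumption: the threshold is a hypothetical pruning rule that keeps weak
    edges from merging unrelated items.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, neighbours in graph.items():
        for sim, j in neighbours:
            if sim >= min_similarity:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def cluster_texts(texts, k=2, min_similarity=0.3):
    """Cluster texts: k-NN graph first, then connected components."""
    return connected_components(knn_graph(texts, k), len(texts), min_similarity)
```

Calling `cluster_texts` on a list of strings returns lists of indices, one list per cluster. In this sketch, lowering `k` or raising `min_similarity` yields a sparser graph and therefore smaller, tighter clusters, which is one simple way to explore a scalability/accuracy trade-off of the kind the abstract refers to.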

Original language: English
Title: Proceedings - 2015 IEEE International Conference on Big Data, Big Data 2015
Editors: Howard Ho, Beng Chin Ooi, Mohammed J. Zaki, Xiaohua Hu, Laura Haas, Vipin Kumar, Sudarsan Rachuri, Shipeng Yu, Morris Hui-I Hsiao, Jian Li, Feng Luo, Saumyadipta Pyne, Kemafor Ogan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 958-963
Number of pages: 6
ISBN (electronic): 9781479999255
DOIs
Publication status: Published - 22 Dec. 2015
Event: 3rd IEEE International Conference on Big Data, Big Data 2015 - Santa Clara, United States
Duration: 29 Oct. 2015 - 1 Nov. 2015

Publication series

Name: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

Conference

Conference: 3rd IEEE International Conference on Big Data, Big Data 2015
Country/Territory: United States
City: Santa Clara
Period: 29/10/15 - 1/11/15
