Proximity Estimation and Hardness of Short-Text Corpora
September 01-September 05
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DEXA.2008.872008 19th International Conference on ...
In this work, we investigate the relative hardness of short-text corpora in clustering problems and how this hardness relates to traditional similarity measures. Our approach attempts to establish a connection between the hardness of a corpus and the precision level exhibited by similarity measures, according to the results obtained with different cluster validity measures on the "ideal" clustering of each corpus. We also propose a new validity measure, named contiguity error, that allowed us to observe this connection consistently in all the collections considered.
Index Terms:
clustering, short-text corpora, proximity estimation, cluster validity measures
Citation:
Marcelo Luis Errecalde, Diego Ingaramo, Paolo Rosso, "Proximity Estimation and Hardness of Short-Text Corpora," dexa, pp.15-19, 2008 19th International Conference on Database and Expert Systems Application, 2008