Document Clustering with Semantic Analysis
Kauai, Hawaii, January 4-7, 2006
DOI: http://doi.ieeecomputersociety.org/10.1109/HICSS.2006.129
Proceedings of the 39th Annual Hawaii ...
Yong Wang, Mississippi State University
Julia Hodges, Mississippi State University
Document clustering automatically groups an entire document collection into clusters and is used in many fields, including data mining and information retrieval. In the traditional vector space model, the unique words occurring in the document set serve as the features. Because of synonymy (different words with the same meaning) and polysemy (one word with several meanings), however, such a bag of original words cannot represent the content of a document precisely. In this paper, we investigate using word sense disambiguation to identify the senses of words and use those senses to construct the feature vectors that represent documents. Our experimental results demonstrate that, in most conditions, using word senses can improve the performance of our document clustering system. However, a comprehensive statistical analysis indicates that the differences between using the original single words and using word senses are not statistically significant. We also evaluate several basic clustering algorithms to support algorithm selection.
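To illustrate why sense-based features can help, here is a minimal sketch, assuming the disambiguation step has already run. The toy documents `d1` to `d4` and their sense labels (`vehicle`, `shore`, `lender`, and so on) are hand-assigned hypothetical examples, and `term_vector`, `cosine`, and `similarity` are illustrative helpers, not the authors' system. It builds term-frequency vectors keyed either by the surface word or by its sense and compares documents with cosine similarity, the usual vector space model measure.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each token is (surface word, sense label).
# ASSUMPTION: sense labels are hand-assigned for illustration; a real system
# would obtain them from a word sense disambiguation step using context.
DOCS = {
    "d1": [("car", "vehicle"), ("engine", "motor")],
    "d2": [("automobile", "vehicle"), ("engine", "motor")],  # synonym of d1's "car"
    "d3": [("river", "stream"), ("bank", "shore")],
    "d4": [("bank", "lender"), ("loan", "credit")],          # "bank" with another meaning
}


def term_vector(tokens, use_senses):
    """Term-frequency vector keyed by the surface word or by its sense label."""
    return Counter(sense if use_senses else word for word, sense in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def similarity(a, b, use_senses):
    """Compare two documents from DOCS using word or sense features."""
    return cosine(term_vector(DOCS[a], use_senses), term_vector(DOCS[b], use_senses))


if __name__ == "__main__":
    for a, b in [("d1", "d2"), ("d3", "d4")]:
        print(a, b,
              "words=%.2f" % similarity(a, b, use_senses=False),
              "senses=%.2f" % similarity(a, b, use_senses=True))
```

Running it prints `d1 d2 words=0.50 senses=1.00` and `d3 d4 words=0.50 senses=0.00`: sense features merge the synonyms "car" and "automobile" and separate the two meanings of "bank", which word features cannot do. Whether this yields a significant clustering gain on real collections is exactly what the paper's statistical analysis addresses.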
Citation:
Yong Wang, Julia Hodges, "Document Clustering with Semantic Analysis," Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06), Track 3, vol. 3, p. 54c, 2006.