GDClust: A Graph-Based Document Clustering Technique
Omaha, Nebraska, USA, October 28-31
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDMW.2007.104
Seventh IEEE International Conference ...
This paper introduces a new document clustering technique based on frequent senses. The proposed system, GDClust (Graph-Based Document Clustering), works with frequent senses rather than the frequent keywords used in traditional text mining techniques. GDClust represents text documents as hierarchical document-graphs and uses an Apriori paradigm to find frequent subgraphs, which reflect frequent senses. The discovered frequent subgraphs are then used to generate sense-based document clusters. We propose a novel multilevel Gaussian minimum support approach for candidate subgraph generation. GDClust uses an English-language ontology to construct document-graphs and applies graph-based data mining techniques for sense discovery and clustering. It is an automated system that requires minimal human interaction for clustering.
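The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Document-graphs are modeled as sets of directed (child, parent) is-a edges using hand-made toy data rather than real ontology output. Frequent subgraphs are found with a level-wise Apriori join over connected edge sets. The "Gaussian minimum support" is approximated by a hypothetical function `gaussian_min_support` applied per subgraph size k, which may differ from the paper's multilevel formulation. The helper names (`is_connected`, `mine_frequent_subgraphs`, `doc_similarity`) and the Jaccard similarity used for clustering are illustrative choices.

```python
import math
from itertools import combinations

# Hypothetical toy "document-graphs": each document is a set of directed
# (child, parent) is-a edges, standing in for ontology-derived graphs.
docs = [
    frozenset({("dog", "canine"), ("canine", "carnivore"), ("carnivore", "animal")}),
    frozenset({("cat", "feline"), ("feline", "carnivore"), ("carnivore", "animal")}),
    frozenset({("dog", "canine"), ("canine", "carnivore"), ("carnivore", "animal")}),
    frozenset({("car", "vehicle"), ("vehicle", "artifact")}),
]


def is_connected(edges):
    """Return True if the non-empty edge set is weakly connected (direction ignored)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return len(seen) == len(adj)


def gaussian_min_support(k, peak, width, high, low):
    """Hypothetical schedule: the threshold dips to `low` at size `peak`
    and rises toward `high` for smaller or larger subgraphs."""
    return high - (high - low) * math.exp(-((k - peak) ** 2) / (2 * width ** 2))


def mine_frequent_subgraphs(docs, min_support):
    """Apriori-style level-wise search over connected edge sets.

    `min_support(k)` returns the fractional threshold for k-edge subgraphs.
    With a size-dependent threshold, a subgraph rejected at level k is never
    extended, so this is a heuristic search rather than an exhaustive one.
    """
    n = len(docs)

    def support(sub):
        # Fraction of documents whose edge set contains every edge of `sub`.
        return sum(1 for d in docs if sub <= d) / n

    all_edges = {e for d in docs for e in d}
    level = {frozenset([e]) for e in all_edges if support(frozenset([e])) >= min_support(1)}
    frequent, k = {}, 1
    while level:
        for sub in level:
            frequent[sub] = support(sub)
        k += 1
        # Join two frequent (k-1)-edge subgraphs whose union has exactly k
        # edges, keeping only unions that remain connected.
        candidates = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == k and is_connected(a | b)
        }
        level = {c for c in candidates if support(c) >= min_support(k)}
    return frequent


def doc_similarity(d1, d2, frequent):
    """Jaccard overlap of the frequent subgraphs two documents contain
    (an assumed measure, not necessarily the one GDClust uses)."""
    f1 = {s for s in frequent if s <= d1}
    f2 = {s for s in frequent if s <= d2}
    if not (f1 or f2):
        return 0.0
    return len(f1 & f2) / len(f1 | f2)


def schedule(k):
    # Illustrative parameters only; they are not taken from the paper.
    return gaussian_min_support(k, peak=2, width=1.0, high=0.5, low=0.25)


frequent = mine_frequent_subgraphs(docs, schedule)
for i, j in combinations(range(len(docs)), 2):
    print(i, j, round(doc_similarity(docs[i], docs[j], frequent), 3))
```

With these toy thresholds, documents 0 and 2 contain exactly the same frequent subgraphs (similarity 1.0), document 1 overlaps them only through the carnivore-animal edge, and document 3 shares nothing with the others. A clustering step, for example agglomerative clustering on this similarity, could then group the documents; which algorithm GDClust actually uses is not stated in the abstract.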
Citation:
M. Shahriar Hossain and Rafal A. Angryk, "GDClust: A Graph-Based Document Clustering Technique," in Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007), pp. 417-422, 2007.