Improving Web Clustering by Cluster Selection
Compiègne University of Technology, France, September 19-September 22
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WI.2005.75
2005 IEEE/WIC/ACM International Confe ...
Daniel Crabtree, Victoria University of Wellington
Xiaoying Gao, Victoria University of Wellington
Peter Andreae, Victoria University of Wellington
Web page clustering groups semantically related web pages and is useful for categorizing, organizing, and refining search results. When clustering using only textual information, Suffix Tree Clustering (STC) outperforms other clustering algorithms by making use of phrases and allowing clusters to overlap. One problem with STC and similar algorithms is how to select a small set of clusters to display to the user from a very large set of generated clusters. The cluster selection method used in STC is flawed in that it does not handle overlapping clusters appropriately. This paper introduces a new cluster scoring function and a new cluster selection algorithm that overcome the problems with overlapping clusters; combined with STC, they form a new clustering algorithm, ESTC. The paper's experiments show that ESTC significantly outperforms STC and that, even with less data, ESTC performs similarly to a commercial clustering search engine.
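The abstract does not spell out the ESTC scoring function or selection algorithm, so the following is only a minimal sketch of the general problem it describes: picking a few clusters to display when candidate clusters overlap heavily. The function names, the toy data, and the greedy coverage heuristic are all assumptions made for illustration; they are not the method of the paper or of STC. The sketch contrasts a naive top-k ranking by cluster size with a selection that counts only pages not already covered.

```python
from typing import Dict, List, Set


def select_top_k(clusters: Dict[str, Set[int]], k: int) -> List[str]:
    """Naive selection (illustrative): rank clusters by size, ignoring overlap."""
    return sorted(clusters, key=lambda name: (-len(clusters[name]), name))[:k]


def select_greedy_coverage(clusters: Dict[str, Set[int]], k: int) -> List[str]:
    """Overlap-aware selection (illustrative, not ESTC): repeatedly pick the
    cluster that adds the most pages not yet covered by earlier picks."""
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(clusters)
    while remaining and len(chosen) < k:
        # Ties are broken alphabetically so the result is deterministic.
        best = min(remaining, key=lambda n: (-len(remaining[n] - covered), n))
        if not remaining[best] - covered:
            break  # every remaining cluster is fully redundant
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Hypothetical candidate clusters: label -> set of page ids.
clusters = {
    "jaguar car": {1, 2, 3, 4, 5},
    "jaguar cars": {1, 2, 3, 4},   # near-duplicate of "jaguar car"
    "jaguar animal": {6, 7, 8},
    "jaguar os": {9, 10},
}

print(select_top_k(clusters, 2))
print(select_greedy_coverage(clusters, 2))
```

On this toy input the size-only ranking spends both display slots on two near-identical clusters, while the coverage-aware variant picks one of them and then a cluster about a different topic, which is the kind of behavior an overlap-aware selection step is meant to produce.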
Index Terms:
web clustering, cluster selection
Citation:
Daniel Crabtree, Xiaoying Gao, Peter Andreae, "Improving Web Clustering by Cluster Selection," wi, pp.172-178, 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), 2005