XML Document Clustering Using Common XPath
Tokyo, Japan April 08-April 09
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WIRI.2005.39
International Workshop on Challenges ...
Ho-pong Leung, Department of Computing, Hong Kong Polytechnic University, Hunghom, Hong Kong, China.
Fu-lai Chung, Department of Computing, Hong Kong Polytechnic University, Hunghom, Hong Kong, China.
Stephen C.F. Chan, Department of Computing, Hong Kong Polytechnic University, Hunghom, Hong Kong, China.
Robert Luk, Department of Computing, Hong Kong Polytechnic University, Hunghom, Hong Kong, China.

XML is becoming a common way of storing data. The elements and their arrangement in the document's hierarchy not only describe the document structure but also imply the data's semantic meaning, and hence provide valuable information to develop tools for manipulating XML documents. In this paper, we pursue a data mining approach to the problem of XML document clustering. We introduce a novel XML structural representation called common XPath (CXP), which encodes the frequently occurring elements with the hierarchical information, and propose to take the CXPs mined to form the feature vectors for XML document clustering. In other words, data mining acts as a feature extractor in the clustering process. Based on this idea, we devise a path-based XML document clustering algorithm called PBClustering which groups the documents according to their CXPs, i.e., their frequent structures. Encouraging simulation results are observed and reported.
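
The general idea of using frequent paths as clustering features can be illustrated with a minimal sketch. This is not the paper's CXP mining procedure or the PBClustering algorithm, neither of which is specified in the abstract. It assumes, for illustration only, that a "frequent structure" is a root-to-element tag path appearing in at least a given fraction of documents, and it swaps in a simple greedy grouping by Jaccard similarity. The function names, the `min_support` and `threshold` parameters, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Return the set of root-to-element tag paths found in one document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def frequent_paths(docs_paths, min_support):
    """Keep paths that occur in at least min_support (a fraction) of documents."""
    counts = Counter(p for paths in docs_paths for p in paths)
    n = len(docs_paths)
    return sorted(p for p, c in counts.items() if c / n >= min_support)


def feature_vectors(docs_paths, features):
    """Binary vector per document: 1 if the document contains the path."""
    return [[1 if f in paths else 0 for f in features] for paths in docs_paths]


def jaccard(u, v):
    """Jaccard similarity of two binary vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def greedy_cluster(vectors, threshold):
    """Assumed stand-in for the clustering step: put each document into the
    first cluster whose first member is at least `threshold` similar."""
    clusters = []
    for i, v in enumerate(vectors):
        for cluster in clusters:
            if jaccard(v, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical sample documents: two book-like, two order-like.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<order><item><price/></item></order>",
    "<order><item><price/><qty/></item></order>",
]
docs_paths = [element_paths(d) for d in docs]
features = frequent_paths(docs_paths, min_support=0.5)
vectors = feature_vectors(docs_paths, features)
clusters = greedy_cluster(vectors, threshold=0.5)
print(features)
print(clusters)
```

In this sketch the mining step acts only as a feature selector: paths that appear in a single document (such as `/book/year`) fall below the support threshold and never enter the vectors, so documents are compared on their shared structure alone.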

Citation:
Ho-pong Leung, Fu-lai Chung, Stephen C.F. Chan, Robert Luk, "XML Document Clustering Using Common XPath," wiri, pp.91-96, International Workshop on Challenges in Web Information Retrieval and Integration, 2005