Classifying XML Documents by Using Genre Features
Regensburg, Germany September 03-September 07
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DEXA.2007.120
18th International Conference on Data ...
Malcolm Clark, The Robert Gordon University, UK
Stuart Watt, The Robert Gordon University, UK
The categorization of documents is traditionally topic-based. This paper presents a complementary analysis of research and experiments on genre, showing that encouraging results can be obtained by using genre structure (form) features. We conducted an experiment to assess the effectiveness of Extensible Markup Language (XML) tag information and part-of-speech (P-O-S) features for the classification of genres. The hypothesis tested was that, if a focus on genre can lead to high precision on normal textual documents, then good results can also be achieved by using XML tag information in addition to P-O-S information. The experiment was carried out on a subset of the Initiative for the Evaluation of XML Retrieval (INEX) 1.4 collection. The features were extracted and the documents were classified using machine learning algorithms, which yielded encouraging results for logistic regression and neural networks. We propose that utilizing these features and training a classifier may benefit retrieval for most World Wide Web (WWW) technologies, such as XML and Extensible Hypertext Markup Language (XHTML).
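As a rough illustration of what XML tag features might look like, the sketch below computes the relative frequency of each element tag in a document and lays the result out as a fixed-order vector that a classifier such as logistic regression could take as input. It is not the authors' feature set: the function names (`tag_features`, `to_vector`), the sample document, and its tag names are all hypothetical, and P-O-S features are left out.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_features(xml_text: str) -> dict:
    """Return the relative frequency of each element tag in one XML document."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    counts = Counter(elem.tag for elem in root.iter())
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def to_vector(features: dict, vocabulary: list) -> list:
    """Order the features by a shared tag vocabulary; absent tags become 0.0."""
    return [features.get(tag, 0.0) for tag in vocabulary]


# Hypothetical document; the tag names are invented for illustration only.
sample = (
    "<article><fm><ti>Title</ti></fm>"
    "<bdy><sec><p>One.</p><p>Two.</p></sec></bdy></article>"
)

features = tag_features(sample)
vocabulary = sorted(features)  # in practice, built from the whole training set
vector = to_vector(features, vocabulary)
```

In a fuller pipeline, the vocabulary would be collected across all training documents so that every document maps to a vector of the same length, and the P-O-S features would be appended to the same vector.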
Citation:
Malcolm Clark, Stuart Watt, "Classifying XML Documents by Using Genre Features," dexa, pp.242-248, 18th International Conference on Database and Expert Systems Applications (DEXA 2007), 2007