loading...
A Learning Approach to Discovering Web Page Semantic Structures
Seoul, Korea August 31-September 01
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2005.19Eighth International Conference on Do ...
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Junlan Feng, AT&T LABS RESEARCH
Patrick Haffner, AT&T LABS RESEARCH
Mazin Gilbert, AT&T LABS RESEARCH
This paper proposes a learning approach for discovering the semantic structure of web pages. The task includes partitioning the text on a web page into information blocks and identifying their semantic categories. We employed two machine learning techniques, Adaboost and SVMs, to learn from a labeled web page corpus. We evaluated our approach on general web pages from the World Wide Web and obtained encouraging results. This work can be beneficial to a number of web-driven applications such as search engines, web-based question answering, web-based data mining as well as voice enabled web navigation.
Citation:
Junlan Feng, Patrick Haffner, Mazin Gilbert, "A Learning Approach to Discovering Web Page Semantic Structures," icdar, pp.1055-1059, Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005
Usage of this product signifies your acceptance of the Terms of Use.