Layout Based Information Extraction from HTML Documents
Curitiba, Paraná, Brazil, September 23-26
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2007.155
Ninth International Conference on Doc ...
R. Burget, Brno University of Technology
We propose a method of information extraction from HTML documents based on modelling the visual information in the document. A page segmentation algorithm is used to detect the document layout; the extraction process is then based on analysing the mutual positions of the detected blocks and their visual features. This approach is more robust than traditional DOM-based methods, and it opens new possibilities for specifying extraction tasks.
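The abstract does not spell out how the mutual positions of blocks are analysed. As a rough illustration only, the Python sketch below assumes the page has already been segmented into rectangular blocks, each carrying a bounding box and one visual feature (boldness). It then locates a value by its position relative to a bold label block: first the nearest block to the right in the same row, otherwise the nearest block below in the same column. The `Block` class, the `right_of`, `below` and `extract` helpers, and the sample coordinates are all hypothetical; this is not the author's segmentation or extraction algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """Hypothetical output of a page segmentation step: text, bounding box, one visual feature."""
    text: str
    x: float       # left edge (pixels)
    y: float       # top edge (pixels)
    width: float
    height: float
    bold: bool = False

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height


def same_row(a: Block, b: Block) -> bool:
    """True if the vertical extents of the two blocks overlap."""
    return a.y < b.bottom and b.y < a.bottom


def same_column(a: Block, b: Block) -> bool:
    """True if the horizontal extents of the two blocks overlap."""
    return a.x < b.right and b.x < a.right


def right_of(label: Block, blocks: List[Block]) -> Optional[Block]:
    """Nearest block that starts to the right of `label` within the same row."""
    cands = [b for b in blocks
             if b is not label and b.x >= label.right and same_row(label, b)]
    return min(cands, key=lambda b: b.x - label.right, default=None)


def below(label: Block, blocks: List[Block]) -> Optional[Block]:
    """Nearest block that starts below `label` within the same column."""
    cands = [b for b in blocks
             if b is not label and b.y >= label.bottom and same_column(label, b)]
    return min(cands, key=lambda b: b.y - label.bottom, default=None)


def extract(blocks: List[Block], label_text: str) -> Optional[str]:
    """Find a bold label block matching `label_text` (ignoring a trailing colon)
    and return the text of the block positioned next to it, or None."""
    for blk in blocks:
        if blk.bold and blk.text.strip().rstrip(":").lower() == label_text.lower():
            value = right_of(blk, blocks) or below(blk, blocks)
            return value.text if value else None
    return None


# Toy page with made-up coordinates: one label/value pair side by side,
# one label with its value underneath.
blocks = [
    Block("Title:", 10, 10, 50, 20, bold=True),
    Block("Layout Based Information Extraction from HTML Documents", 70, 10, 400, 20),
    Block("Author", 10, 40, 60, 20, bold=True),
    Block("R. Burget", 10, 65, 100, 20),
]

if __name__ == "__main__":
    print(extract(blocks, "Title"))
    print(extract(blocks, "Author"))
```

Because the rule is stated in terms of geometry and styling rather than a DOM path, the same query would still match if the label and value were placed in differently nested elements, which is the intuition behind the robustness the abstract claims for layout-based extraction.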
Citation:
R. Burget, "Layout Based Information Extraction from HTML Documents," icdar, vol. 2, pp.624-628, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2, 2007