A Fully Automated Object Extraction System for the World Wide Web
Mesa, AZ April 16-April 19
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDSC.2001.918966
21st IEEE International Conference on ...
David Buttler, Georgia Institute of Technology
Ling Liu, Georgia Institute of Technology
Calton Pu, Georgia Institute of Technology
Abstract: This paper presents Omini, a fully automated object extraction system. A distinct feature of Omini is its suite of algorithms and automatically learned information extraction rules for discovering and extracting objects from dynamic or static Web pages that contain multiple object instances. We evaluated the system using more than 2,000 Web pages over 40 sites. It achieves 100% precision (it returns only correct objects) and excellent recall (between 93% and 98%, with very few significant objects left out). The object boundary identification algorithms are fast, taking about 0.1 second per page with a simple optimization.
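The precision and recall figures quoted in the abstract can be read as simple set ratios. The following is a minimal sketch of that arithmetic, not the evaluation code used by the authors. The function name `precision_recall`, the object identifiers, and the counts (50 labelled objects, 48 extracted) are hypothetical and chosen only for illustration.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for extracted objects against a reference set.

    precision = correct extracted objects / all extracted objects
    recall    = correct extracted objects / all objects that should be found
    """
    extracted, relevant = set(extracted), set(relevant)
    correct = extracted & relevant
    # Convention assumed here: an empty denominator yields 1.0.
    precision = len(correct) / len(extracted) if extracted else 1.0
    recall = len(correct) / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical page: 50 hand-labelled objects, of which 48 are extracted
# and nothing spurious is returned (illustrative numbers, not from the paper).
relevant = {f"obj{i}" for i in range(50)}
extracted = {f"obj{i}" for i in range(48)}

p, r = precision_recall(extracted, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case every returned object is correct, so precision is 100%, while two labelled objects are missed, which puts recall inside the 93% to 98% range the abstract reports.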
Citation:
David Buttler, Ling Liu, Calton Pu, "A Fully Automated Object Extraction System for the World Wide Web," 21st IEEE International Conference on Distributed Computing Systems (ICDCS'01), pp. 0361, 2001