Efficient Techniques for Effective Wrapper Induction
Atlanta, Georgia April 03-April 07
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDEW.2006.53
22nd International Conference on Data ...
Valter Crescenzi, Universita degli Studi Roma Tre, Italy
Paolo Merialdo, Universita degli Studi Roma Tre, Italy
Several recent studies have concentrated on generating wrappers that extract data from Web data sources. The ROADRUNNER system aims to automate the tedious and expensive process of writing wrappers in an unsupervised, domain-independent, and scalable manner. The system is based on a grammar inference algorithm, called MATCH, which has been designed within a sound theoretical framework. However, in its original definition MATCH lacks expressivity: in many cases, when it runs over real-life Web pages, it is unable to produce a solution. In this paper we address the challenging issue of developing techniques that allow us to build an effective and efficient system upon MATCH without giving up the original formal background. First, we analyze the main limitations of MATCH; then we illustrate the techniques we have developed to overcome these limitations. Finally, we report the results of experiments that show the efficacy of the introduced techniques and demonstrate the improvements to the overall system.
Citation:
Valter Crescenzi, Paolo Merialdo, "Efficient Techniques for Effective Wrapper Induction," icdew, pp.47, 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006