Learning Rules to Pre-process Web Data for Automatic Integration
Athens, Georgia, USA November 10-November 11
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/RULEML.2006.16
Second International Conference on Ru ...
Kai Simon, Universität Freiburg, Germany
Thomas Hornung, Universität Freiburg, Germany
Georg Lausen, Universität Freiburg, Germany
Web pages such as product catalogues and search engine result pages often follow a global layout template that makes it easier for a user to retrieve information. In this paper we present a technique that makes such content machine-processable by extracting it and transforming it into tabular form. We achieve this with ViPER, our fully automatic wrapper system, which localizes and extracts structured data records from such web pages using a strategy based on the visual perception of a web page.

The first contribution of this paper is to give deep insight into the post-processing heuristics of ViPER, which are materialized as a set of rules. Once these rules are defined, the regular content of a web page can be abstracted into a relational view. Second, we show that new, unseen content rendered with the same layout only has to be extracted by ViPER; the remaining transformation can then be performed by applying the learned rules.
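The idea of learning rules once and reusing them on pages with the same layout can be illustrated with a minimal Python sketch. It is not ViPER's actual heuristics: the names `learn_rules`, `apply_rules`, `to_price` and `to_text`, the price pattern, and the sample records are all hypothetical, and the records are assumed to have already been extracted as lists of text fields.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical rule type: turns one raw extracted field into a table cell.
Rule = Callable[[str], object]

# Assumed pattern for a price field such as "$199.99" or "$ 149".
PRICE = re.compile(r"^\s*[$€£]?\s*(\d+(?:\.\d+)?)\s*$")


def to_price(raw: str) -> float:
    """Parse a price field; raises ValueError if the field is not a price."""
    match = PRICE.match(raw)
    if match is None:
        raise ValueError(f"not a price: {raw!r}")
    return float(match.group(1))


def to_text(raw: str) -> str:
    """Collapse runs of whitespace in a free-text field."""
    return " ".join(raw.split())


def learn_rules(records: List[List[str]]) -> List[Rule]:
    """Choose one rule per column position from sample records.

    Illustrative heuristic only: a column whose sample values all look
    like prices gets the price rule, every other column the text rule.
    """
    rules: List[Rule] = []
    for col in range(len(records[0])):
        values = [record[col] for record in records]
        if all(PRICE.match(value) for value in values):
            rules.append(to_price)
        else:
            rules.append(to_text)
    return rules


def apply_rules(rules: List[Rule], records: List[List[str]]) -> List[Tuple]:
    """Transform newly extracted records into relational rows."""
    return [
        tuple(rule(field) for rule, field in zip(rules, record))
        for record in records
    ]


# Records from a sample page (made-up data), used to learn the rules once.
sample = [["  Canon  A520 ", "$199.99"], ["Nikon L3", "$ 149"]]
rules = learn_rules(sample)

# Records from a new page with the same layout: only the rules are applied.
new_page = [["Sony   W50", "$229.50"]]
print(apply_rules(rules, new_page))  # prints [('Sony W50', 229.5)]
```

The sketch keeps learning and application separate, so the per-column decisions are made once on the sample page and new pages only pay for the cheap transformation step.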

Citation:
Kai Simon, Thomas Hornung, Georg Lausen, "Learning Rules to Pre-process Web Data for Automatic Integration," Second International Conference on Rules and Rule Markup Languages for the Semantic Web (RuleML'06), pp. 107-116, 2006