Probe, Cluster, and Discover: Focused Extraction of QA-Pagelets from the Deep Web
Boston, Massachusetts March 30-April 02
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDE.2004.1319988
20th International Conference on Data ...
James Caverlee, Georgia Institute of Technology
Ling Liu, Georgia Institute of Technology
David Buttler, Georgia Institute of Technology
In this paper, we introduce the concept of a QA-Pagelet to refer to the content region in a dynamic page that contains query matches. We present THOR, a scalable and efficient mining system for discovering and extracting QA-Pagelets from the Deep Web. A unique feature of THOR is its two-phase extraction framework. In the first phase, pages from a deep web site are grouped into distinct clusters of structurally similar pages. In the second phase, the pages in each cluster are examined by a subtree filtering algorithm that exploits structural and content similarity at the subtree level to identify the QA-Pagelets.
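The two-phase idea can be illustrated with a minimal Python sketch. It is not THOR's implementation: the abstract does not specify the similarity measures or the filtering rules, so this sketch assumes tag-frequency cosine similarity for the clustering phase and exact text comparison of same-position subtrees for the filtering phase. All names (`PathCollector`, `cluster_pages`, `candidate_pagelets`, the `0.9` threshold) are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no end tag


class PathCollector(HTMLParser):
    """Collects a tag-frequency signature and the text under each subtree path."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # current path, e.g. ["html[1]", "body[1]"]
        self.child_counts = [Counter()]  # sibling counters, one per open level
        self.tags = Counter()            # tag -> number of occurrences on the page
        self.text = {}                   # subtree path -> text found beneath it

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag in VOID_TAGS:
            return
        siblings = self.child_counts[-1]
        siblings[tag] += 1
        self.stack.append(f"{tag}[{siblings[tag]}]")
        self.child_counts.append(Counter())

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].startswith(tag + "["):
                del self.stack[i:]
                del self.child_counts[i + 1:]
                break

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        # Credit the text to every enclosing subtree.
        for depth in range(1, len(self.stack) + 1):
            path = "/".join(self.stack[:depth])
            self.text[path] = (self.text.get(path, "") + " " + data).strip()


def parse(html):
    collector = PathCollector()
    collector.feed(html)
    return collector


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_pages(pages, threshold=0.9):
    """Phase 1 (assumed): greedily group pages with similar tag signatures."""
    clusters = []
    for page in pages:
        for cluster in clusters:
            if cosine(page.tags, cluster[0].tags) >= threshold:
                cluster.append(page)
                break
        else:
            clusters.append([page])
    return clusters


def candidate_pagelets(cluster):
    """Phase 2 (assumed): keep subtrees present on every page of the cluster
    whose text differs on every page, listing the deepest ones first."""
    shared = set.intersection(*(set(page.text) for page in cluster))
    dynamic = [
        path
        for path in shared
        if len({page.text[path] for page in cluster}) == len(cluster)
    ]
    return sorted(dynamic, key=lambda path: (-path.count("/"), path))
```

In this sketch, static regions such as navigation text are dropped because their content is identical across the pages of a cluster, while regions whose content changes with each probe query survive as candidates. Ranking by depth is a crude stand-in for a real selection step.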
Citation:
James Caverlee, Ling Liu, David Buttler, "Probe, Cluster, and Discover: Focused Extraction of QA-Pagelets from the Deep Web," icde, pp.103, 20th International Conference on Data Engineering (ICDE'04), 2004