On Precision and Recall of Multi-Attribute Data Extraction from Semistructured Sources
Melbourne, Florida, November 19-22
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2003.1250945
Third IEEE International Conference o ...
Guizhen Yang, University at Buffalo, NY
Saikat Mukherjee, Stony Brook University, NY
I. V. Ramakrishnan, Stony Brook University, NY
Machine learning techniques for data extraction from semistructured sources exhibit different precision and recall characteristics. However, to date, the formal relationship between learning algorithms and their impact on these two metrics has remained unexplored. This paper proposes a formalization of precision and recall of extraction and, based on this formalism, investigates the complexity-theoretic aspects of learning algorithms for multi-attribute data extraction. We show that there is a tradeoff between the precision/recall of extraction and computational efficiency, and we present experimental results that demonstrate the practical utility of these concepts in designing scalable data extraction algorithms that improve recall without compromising precision.
Citation:
Guizhen Yang, Saikat Mukherjee, I. V. Ramakrishnan, "On Precision and Recall of Multi-Attribute Data Extraction from Semistructured Sources," Third IEEE International Conference on Data Mining (ICDM'03), p. 395, 2003