Automatic Wrapper Generation for Semi-Structured Biological Data Based on Table Structure Identification
Prague, Czech Republic September 01-September 05
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DEXA.2003.1231998
14th International Workshop on Databa ...
Liangyou Chen, Mississippi State University
Hasan M. Jamil, Mississippi State University
Nan Wang, Mississippi State University
Biological data analyses usually require complex manipulations involving tool applications, multiple web site navigation, result selection and filtering, and iteration over the internet. Most biological data are generated from structured databases and by applications and presented to the users embedded within repeated structures, or tables, in HTML documents. In this paper we outline a novel technique for the identification of table structures in HTML documents. This identification technique is then used to automatically generate composite wrappers for applications requiring distributed resources. We demonstrate that our method is robust enough to discover standard as well as non-standard table structures in HTML documents. Thus our technique outperforms contemporary techniques used in systems such as XWrap and AutoWrapper. We discuss our technique in the context of our PickUp system that exploits the theoretical developments presented in this paper and emerges as an elegant automatic wrapper generation system.
Citation:
Liangyou Chen, Hasan M. Jamil, Nan Wang, "Automatic Wrapper Generation for Semi-Structured Biological Data Based on Table Structure Identification," dexa, pp.55, 14th International Workshop on Database and Expert Systems Applications (DEXA'03), 2003