Automated Metadata and Instance Extraction from News Web Sites
Compiègne University of Technology, France, September 19-September 22
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WI.2005.382005 IEEE/WIC/ACM International Confe ...
Srinivas Vadrevu, Arizona State University
Saravanakumar Nagarajan, Arizona State University
Fatih Gelgi, Arizona State University
Hasan Davulcu, Arizona State University
Over the past few years, the World Wide Web has established itself as a vital resource for news. With the continuous growth in the number of available news Web sites and the diversity in their presentation of content, there is an increasing need to organize and keep track of news-related information on the Web. In this paper, we present automated techniques for extracting metadata and instance information by organizing and mining a set of news Web sites. We develop algorithms that detect and utilize HTML regularities in Web documents to turn them into hierarchical semantic structures encoded as XML. The tree-mining algorithms that we present identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances, annotated with their labels whenever these are available. We report an experimental evaluation on the news domain to demonstrate the efficacy of our algorithms.
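To make the idea of turning HTML regularities into a hierarchical XML structure concrete, here is a minimal Python sketch. It is not the authors' algorithm: it assumes, purely for illustration, that heading tags (h1 to h6) mark concept labels and that list items and paragraphs are instances of the nearest preceding heading. The names `OutlineBuilder`, `html_to_xml`, and the `site`, `concept`, and `instance` elements are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption for this sketch: heading depth stands in for taxonomy depth.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}
LEAVES = {"li", "p"}  # tags whose text is treated as a concept instance


class OutlineBuilder(HTMLParser):
    """Hypothetical helper: headings become <concept>, li/p become <instance>."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("site")
        self.stack = [(0, self.root)]  # (heading level, XML node) pairs
        self.current = None            # tag whose text is being collected
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS or tag in LEAVES:
            self.current = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.current:
            return  # e.g. an <a> nested inside the tag being collected
        text = " ".join("".join(self.buffer).split())  # normalize whitespace
        self.current = None
        if not text:
            return
        if tag in HEADINGS:
            level = HEADINGS[tag]
            # Close concepts at the same or a deeper level than this heading.
            while self.stack[-1][0] >= level:
                self.stack.pop()
            node = ET.SubElement(self.stack[-1][1], "concept", label=text)
            self.stack.append((level, node))
        else:
            ET.SubElement(self.stack[-1][1], "instance").text = text


def html_to_xml(html):
    """Return the root XML element built from an HTML string."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


# Made-up sample page, not data from the paper.
page = """
<h1>World</h1>
<h2>Europe</h2>
<ul><li>Summit opens in <a href="#">Paris</a></li><li>Election results</li></ul>
<h2>Asia</h2>
<p>Markets rally</p>
<h1>Sports</h1>
<ul><li>Cup final tonight</li></ul>
"""
root = html_to_xml(page)
print(ET.tostring(root, encoding="unicode"))
```

Real news pages rarely use headings this cleanly, which is why the paper relies on detecting regularities in the HTML rather than on fixed tag names; the sketch only shows the shape of the output, with concepts nested by level and instances attached to the concept they appear under.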
Citation:
Srinivas Vadrevu, Saravanakumar Nagarajan, Fatih Gelgi, Hasan Davulcu, "Automated Metadata and Instance Extraction from News Web Sites," wi, pp.38-41, 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), 2005