New Techniques for the Discovery of Logical Documents in Web
Kyoto, Japan November 28-November 30
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/DANTE.1999.844950
1999 International Symposium on Datab ...
Keishi Tajima, Kobe University
Katsumi Tanaka, Kobe University
We propose a method of identifying logical documents in Web data. Pages in Web data are sometimes designed for presentation and do not always reflect logical structure, whereas a logical document is a data unit that does represent logical structure. One logical document often corresponds to a connected subgraph consisting of multiple pages. Therefore, for Web data processing that should capture logical structure, such as querying facilities, extended support for user navigation, and Web structure analysis, logical documents are more appropriate data units than pages. We develop a method of identifying such logical documents in existing Web data. Our method uses three kinds of information: link structure, directory structure embedded in URIs, and page contents.
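The idea of a logical document as a connected subgraph of pages can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not specify. It assumes, purely for illustration, that two linked pages belong together when they share a host and URI directory, and it ignores page contents entirely. The names `directory_of` and `group_pages` and the example.org URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit
import posixpath


def directory_of(url):
    """Return (host, directory path) for a URL; a hypothetical grouping key."""
    parts = urlsplit(url)
    return parts.netloc, posixpath.dirname(parts.path)


def group_pages(links):
    """Group pages into connected components, following only links whose
    endpoints share a host and directory (an illustrative assumption,
    not the rule used in the paper)."""
    neighbors = defaultdict(set)
    pages = set()
    for src, dst in links:
        pages.update((src, dst))
        if directory_of(src) == directory_of(dst):
            # Treat the link as undirected for the purpose of grouping.
            neighbors[src].add(dst)
            neighbors[dst].add(src)

    groups, seen = [], set()
    for page in sorted(pages):
        if page in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [page], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbors[current] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Hypothetical link data: (source page, destination page).
example_links = [
    ("http://example.org/manual/index.html", "http://example.org/manual/ch1.html"),
    ("http://example.org/manual/ch1.html", "http://example.org/manual/ch2.html"),
    ("http://example.org/manual/index.html", "http://example.org/news/today.html"),
]

for group in group_pages(example_links):
    print(group)
```

In this toy data the three pages under `/manual` form one candidate group, while the page under `/news` stays separate because the link to it crosses directories. A real method would also need to weigh link patterns and page contents, as the abstract indicates.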
Index Terms:
Web, WWW, hypertext, structure discovery, data units, logical documents, structural analysis, query, retrieval, navigation, summarization, overview, partitioning, link patterns, link structure
Citation:
Keishi Tajima, Katsumi Tanaka, "New Techniques for the Discovery of Logical Documents in Web," dante, pp.125, 1999 International Symposium on Database Applications in Non-Traditional Environments (DANTE'99), 1999