A Probabilistic Model for Intelligent Web Crawlers
Dallas, Texas November 03-November 06
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CMPSAC.2003.1245354
27th Annual International Computer So ...
Ke Hu, The Chinese University of Hong Kong
Wing Shing Wong, The Chinese University of Hong Kong
With the enormous growth of the World Wide Web in recent years, discovering web pages efficiently has become an important challenge for web crawler designers. In this paper, we outline a simple model that predicts the distribution of the search depth at which a breadth-first search reaches the first web pages relevant to a user query. We define this probability as the crawler confidence. Recent studies indicate that, at a large scale, the Web's structure follows power-law distributions in several respects [3][7]. Our work, by contrast, models the microscopic linkage structure of the Web from an intelligent crawler's point of view. With the information provided by the crawler confidence, an intelligent crawler can adjust its crawling behavior to achieve a higher harvest rate.
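As a rough illustration of the quantity the abstract describes, the sketch below measures the breadth-first search depth at which the first relevant page is reached, then tabulates that depth over several start pages. It is not the authors' probabilistic model, which the abstract does not spell out; it only shows the empirical distribution such a model would aim to predict. The function names, the toy link graph, and the set of relevant pages are all hypothetical.

```python
from collections import Counter, deque


def depth_to_first_relevant(graph, start, is_relevant):
    """Breadth-first search from `start`; return the depth of the first
    relevant page reached, or None if no relevant page is reachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if is_relevant(page):
            return depth
        for nxt in graph.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


def empirical_depth_distribution(graph, starts, is_relevant):
    """Fraction of start pages whose first relevant page lies at each depth.
    Start pages that never reach a relevant page are left out."""
    depths = [depth_to_first_relevant(graph, s, is_relevant) for s in starts]
    counts = Counter(d for d in depths if d is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: c / total for d, c in sorted(counts.items())}


# Hypothetical toy link graph (page -> outgoing links), for illustration only.
toy_graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["r1"],
    "d": ["r2"],
    "e": ["r1"],
    "r1": [],
    "r2": [],
}
relevant_pages = {"r1", "r2"}  # assumed relevance labels

dist = empirical_depth_distribution(
    toy_graph, ["a", "b", "e"], relevant_pages.__contains__
)
print(dist)
```

A crawler could compare such an observed distribution against a model's predicted one to decide how deep to keep searching from a given page before giving up; how the paper itself uses the crawler confidence is not detailed in the abstract.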
Citation:
Ke Hu, Wing Shing Wong, "A Probabilistic Model for Intelligent Web Crawlers," COMPSAC, pp. 278, 27th Annual International Computer Software and Applications Conference, 2003