The rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. A focused crawler aims to selectively seek out pages that are relevant to a predefined set of topics. Besides specifying topics by keywords, it is customary to also use exemplary documents to compute the similarity of a given web document to the topic. In this paper we introduce a new hybrid focused crawler, which uses the link structure of documents as well as the similarity of pages to the topic to crawl the web.
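The idea of combining link structure with content similarity can be illustrated with a minimal sketch. It is not the method of the paper, whose details are not given in this abstract. It assumes a bag-of-words cosine similarity against exemplary documents, a frontier priority that mixes the similarity of a link's anchor text with the mean relevance of already-crawled pages linking to the target, and an in-memory dictionary standing in for the web. The names `cosine_similarity`, `topic_similarity`, `crawl`, the weight `alpha`, and the `web` structure are all hypothetical.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def topic_similarity(text, exemplars):
    """Similarity of a text to the topic: best match against any exemplar."""
    return max((cosine_similarity(text, e) for e in exemplars), default=0.0)


def crawl(seeds, web, exemplars, alpha=0.5, limit=100):
    """Best-first crawl of `web`, a dict mapping
    url -> (page_text, [(target_url, anchor_text), ...]).
    Returns the URLs in the order they were crawled."""
    relevance = {}  # url -> content similarity of each crawled page
    parents = {}    # url -> crawled pages known to link to it
    frontier = [(-1.0, url) for url in seeds]  # max-heap via negated priority
    heapq.heapify(frontier)
    order = []
    while frontier and len(order) < limit:
        _, url = heapq.heappop(frontier)
        if url in relevance or url not in web:
            continue  # already crawled, or unknown to the toy web
        text, links = web[url]
        relevance[url] = topic_similarity(text, exemplars)
        order.append(url)
        for target, anchor in links:
            parents.setdefault(target, []).append(url)
            if target in relevance:
                continue
            # Content term (assumption): anchor-text similarity to the topic.
            content = topic_similarity(anchor, exemplars)
            # Link term (assumption): mean relevance of crawled in-linking pages.
            link = sum(relevance[p] for p in parents[target]) / len(parents[target])
            priority = alpha * content + (1 - alpha) * link
            heapq.heappush(frontier, (-priority, target))
    return order
```

Calling `crawl(["seed"], web, exemplars)` on a small hand-built `web` dictionary returns the visiting order. Setting `alpha` closer to 1 makes the frontier favor content cues, while a value closer to 0 favors links from pages already judged relevant.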
Citation:
Mohsen Jamali, Hassan Sayyadi, Babak Bagheri Hariri, Hassan Abolhassani, "A Method for Focused Crawling Using Combination of Link Structure and Content Similarity," 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), pp. 753-756, 2006.