CINDI Robot: an Intelligent Web Crawler Based on Multi-level Inspection
Banff, Alberta, Canada, September 6 to 8, 2007
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/IDEAS.2007.21
11th International Database Engineering and Applications Symposium (IDEAS 2007)
Rui Chen, Concordia University, Canada
Bipin C. Desai, Concordia University, Canada
Cong Zhou, Motorola Canada
With the explosive growth of the Web, focused web crawlers are gaining attention. Focused web crawlers aim to find web pages related to a predefined topic. CINDI Robot is a focused web crawler devoted to finding academic documents in computer science and software engineering. We propose a multi-level inspection scheme to discover relevant web pages. In this scheme, text features of the page content drive the classification, while other web characteristics, such as URL patterns and anchor text, assist the decision process. Experimental results demonstrate that this multi-level inspection method outperforms other traditional methods.
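The abstract does not say how the inspection levels are combined, so the following is only a minimal illustrative sketch, not the authors' method. It assumes a weighted sum of three per-level scores (URL pattern, anchor text, page content) compared against a threshold. The topic vocabulary, URL regex, weights, threshold, and the names `Candidate`, `url_score`, `anchor_score`, `content_score`, and `inspect` are all hypothetical. The keyword-ratio `content_score` merely stands in for a trained text classifier such as the SVM or Naïve Bayes classifiers named in the index terms.

```python
import re
from dataclasses import dataclass

# Hypothetical topic vocabulary and URL hints (not taken from the paper).
TOPIC_TERMS = {"algorithm", "database", "software", "compiler",
               "crawler", "classifier", "engineering"}
URL_HINTS = re.compile(r"(publication|paper|pub|proceedings|\.pdf$)", re.IGNORECASE)


@dataclass
class Candidate:
    """A link discovered during crawling (hypothetical structure)."""
    url: str
    anchor_text: str
    content: str


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def url_score(url: str) -> float:
    """Level 1: 1.0 if the URL matches an academic-looking pattern, else 0.0."""
    return 1.0 if URL_HINTS.search(url) else 0.0


def anchor_score(anchor: str) -> float:
    """Level 2: fraction of anchor-text tokens that are topic terms."""
    tokens = tokenize(anchor)
    if not tokens:
        return 0.0
    return sum(t in TOPIC_TERMS for t in tokens) / len(tokens)


def content_score(content: str) -> float:
    """Level 3: placeholder for a trained classifier (e.g. SVM or Naive Bayes)."""
    tokens = tokenize(content)
    if not tokens:
        return 0.0
    return sum(t in TOPIC_TERMS for t in tokens) / len(tokens)


def inspect(c: Candidate, weights=(0.2, 0.3, 0.5), threshold=0.25) -> bool:
    """Combine the three level scores with hypothetical weights into one decision."""
    scores = (url_score(c.url), anchor_score(c.anchor_text), content_score(c.content))
    combined = sum(w * s for w, s in zip(weights, scores))
    return combined >= threshold


if __name__ == "__main__":
    paper = Candidate("http://example.edu/~alice/publications/crawler.pdf",
                      "database crawler paper",
                      "a focused crawler uses a classifier and a database")
    shop = Candidate("http://example.com/shop/shoes",
                     "buy shoes",
                     "cheap running shoes on sale")
    print(inspect(paper), inspect(shop))
```

Giving the content level the largest weight mirrors the abstract's statement that text features drive the classification while URL and anchor cues only assist; a real crawler would replace the keyword ratios with trained models and tuned weights.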
Index Terms:
focused web crawler, SVM classifier, Naïve Bayes classifier, multi-level inspection, revised context graph, tunneling
Citation:
Rui Chen, Bipin C. Desai, Cong Zhou, "CINDI Robot: an Intelligent Web Crawler Based on Multi-level Inspection," 11th International Database Engineering and Applications Symposium (IDEAS 2007), pp. 93-101, 2007