Personalized Spam Filtering with Semi-supervised Classifier Ensemble
Hong Kong, China, December 18-22, 2006
DOI: http://doi.ieeecomputersociety.org/10.1109/WI.2006.1322006
IEEE/WIC/ACM International Confe ...
Victor Cheng, Hong Kong Baptist University, Hong Kong
C.H. Li, Hong Kong Baptist University, Hong Kong
The proliferation of unsolicited email, also known as spam, places a significant burden on email users worldwide. Recent research on spam filtering has shown that high accuracy can be obtained if labeled email examples are available from the particular user of the spam filter. However, providing personalized labeled training examples is time-consuming, and it is often inconvenient, or even impossible due to privacy concerns. In this paper, a semi-supervised personalized spam filter based on a classifier ensemble is proposed; it classifies a user's emails accurately by learning from both generic labeled emails and personalized unlabeled emails. The proposed multi-stage classification process begins by learning an SVM model from generic labeled data. The user's unlabeled emails are then fed to this SVM to generate personalized labeled data for constructing personalized naive Bayes classifiers. Furthermore, some personalized labeled examples are generated by exploiting rare word distributions and then fed into a semi-supervised classifier. The multi-stage results are integrated with SVMs learned from generic labeled emails to produce the final classification results. Experimental results show that the proposed approaches can significantly increase classification accuracy in spam filtering.
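The abstract describes the first two stages (generic SVM, then naive Bayes on SVM-pseudo-labeled user mail) only at a high level. The following is a minimal, hypothetical Python sketch of those stages under several assumptions that are not from the paper: a linear SVM fitted with Pegasos sub-gradient steps stands in for the authors' SVM (with `lam=0.01`, `epochs=15`, and the bias regularized for simplicity), tokenization is naive whitespace splitting, and the "integration" step is a plain average of the SVM's sigmoid-squashed score and the naive Bayes posterior. The rare-word stage and the semi-supervised classifier are omitted entirely. All names (`personalized_filter`, `LinearSVM`, `NaiveBayes`) and the toy emails are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive lower-case whitespace tokenizer (an illustrative assumption)."""
    return text.lower().split()


def features(text):
    """Bag-of-words term counts plus a constant bias feature."""
    x = Counter(tokenize(text))
    x["__bias__"] = 1.0
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearSVM:
    """Linear SVM fitted with Pegasos sub-gradient steps (a stand-in solver;
    the bias is folded into the weights and regularized for simplicity)."""

    def __init__(self, lam=0.01, epochs=15):
        self.lam, self.epochs, self.w = lam, epochs, {}

    def decision(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def fit(self, X, y):  # y holds +1 (spam) or -1 (ham)
        t = 0
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                t += 1
                eta = 1.0 / (self.lam * t)
                violated = label * self.decision(x) < 1  # hinge loss active
                for f in self.w:  # shrink by (1 - eta * lam), i.e. (1 - 1/t)
                    self.w[f] *= 1.0 - 1.0 / t
                if violated:
                    for f, v in x.items():
                        self.w[f] = self.w.get(f, 0.0) + eta * label * v
        return self


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing; labels are 1 (spam) or 0 (ham)."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.class_docs, self.n_docs = Counter(labels), len(labels)
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1]))
        return self

    def log_score(self, doc, c):
        prior = (self.class_docs[c] + 1) / (self.n_docs + 2)  # smoothed prior
        denom = sum(self.counts[c].values()) + self.vocab_size
        return math.log(prior) + sum(
            math.log((self.counts[c][tok] + 1) / denom) for tok in tokenize(doc))

    def prob_spam(self, doc):
        return sigmoid(self.log_score(doc, 1) - self.log_score(doc, 0))


def personalized_filter(generic, user_emails):
    """Stage 1: SVM on generic labeled mail. Stage 2: pseudo-label the user's
    unlabeled mail with it and fit naive Bayes. Stage 3: average both scores."""
    svm = LinearSVM().fit([features(text) for text, _ in generic],
                          [1 if is_spam else -1 for _, is_spam in generic])
    pseudo = [1 if svm.decision(features(e)) > 0 else 0 for e in user_emails]
    nb = NaiveBayes().fit(user_emails, pseudo)

    def classify(text):
        p = (sigmoid(svm.decision(features(text))) + nb.prob_spam(text)) / 2
        return 1 if p > 0.5 else 0

    return classify, pseudo, svm


if __name__ == "__main__":
    generic = [("free money win now", True), ("meeting agenda notes today", False),
               ("free prize claim now", True), ("meeting project review today", False),
               ("free offer cheap pills", True), ("meeting lunch team plans", False)]
    user = ["free money for crypto investors", "claim your free crypto bonus now",
            "meeting notes for the thesis draft", "thesis draft review today"]
    classify, pseudo, svm = personalized_filter(generic, user)
    print(pseudo)                    # [1, 1, 0, 0]
    print(classify("crypto bonus"))  # 1
```

With these toy emails, `crypto bonus` shares no words with the generic corpus, so the SVM score is 0 (probability 0.5); the naive Bayes model fitted to the pseudo-labeled user mail is what tips it to spam, which illustrates why personalization can help. Averaging is only one possible integration rule; the abstract does not specify how the stage outputs are combined with the generic SVMs.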
Citation:
Victor Cheng, C.H. Li, "Personalized Spam Filtering with Semi-supervised Classifier Ensemble," wi, pp.195-201, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI'06), 2006