Adding Semantics to Email Clustering
Hong Kong December 18-December 22
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2006.16
Sixth IEEE International Conference o ...
Hua Li, Microsoft Research Asia, China
Dou Shen, Hong Kong University of Science and Technology, Hong Kong
Benyu Zhang, Microsoft Research Asia, China
Zheng Chen, Microsoft Research Asia, China
Qiang Yang, Hong Kong University of Science and Technology, Hong Kong
This paper presents a novel algorithm that clusters emails according to their contents and the sentence styles of their subject lines. The algorithm uses natural language processing and frequent itemset mining techniques to automatically generate meaningful generalized sentence patterns (GSPs) from email subjects. We then put forward a novel unsupervised approach that treats GSPs as pseudo class labels and conducts email clustering in a supervised manner, although no human labeling is involved. The proposed algorithm is expected not only to improve clustering performance but also to provide meaningful descriptions of the resulting clusters through the GSPs. Experimental results on an open dataset (the Enron email dataset) and on a personal email dataset that we collected demonstrate that the proposed algorithm outperforms the K-means algorithm in terms of the widely used F1 measure. Furthermore, the cluster naming readability is improved by 68.5% on the personal email dataset.
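The abstract gives only the outline of the method, so the following Python sketch is an illustration of the general idea rather than the authors' implementation. It assumes a much simpler setting: frequent word sets mined from subject lines stand in for GSPs (no natural language processing or generalization step is applied), and a nearest-centroid classifier with cosine similarity stands in for the supervised step. All function names and the parameters `min_support` and `max_size` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def frequent_patterns(subjects, min_support=2, max_size=2):
    """Mine word sets that occur in at least `min_support` subject lines.

    Simplified stand-in for GSP generation (assumption: no NLP generalization).
    """
    counts = Counter()
    for subject in subjects:
        words = sorted(set(tokenize(subject)))
        for size in range(1, max_size + 1):
            counts.update(combinations(words, size))
    return {frozenset(p): c for p, c in counts.items() if c >= min_support}


def pseudo_label(subject, patterns):
    """Pick the largest, then most frequent, pattern contained in the subject."""
    words = set(tokenize(subject))
    matches = [p for p in patterns if p <= words]
    if not matches:
        return None
    return max(matches, key=lambda p: (len(p), patterns[p], sorted(p)))


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_emails(emails, min_support=2):
    """Cluster (subject, body) pairs, using subject patterns as pseudo labels.

    Returns one pattern (a frozenset of words) per email, which doubles as
    the cluster name, or None if no pattern could be mined at all.
    """
    patterns = frequent_patterns([s for s, _ in emails], min_support)
    labels = [pseudo_label(s, patterns) for s, _ in emails]

    # "Training": build one bag-of-words centroid per pseudo class.
    centroids = {}
    for (subject, body), label in zip(emails, labels):
        if label is not None:
            centroids.setdefault(label, Counter()).update(
                tokenize(subject + " " + body)
            )

    # "Classification": assign every email to its most similar centroid.
    assignments = []
    for subject, body in emails:
        vec = Counter(tokenize(subject + " " + body))
        best = max(
            centroids,
            key=lambda l: (cosine(vec, centroids[l]), sorted(l)),
            default=None,
        )
        assignments.append(best)
    return assignments
```

Calling `cluster_emails` on a list of `(subject, body)` tuples returns one word set per email. Because every email is assigned by content similarity in the second pass, an email whose subject matches no frequent pattern can still join a cluster, and the pattern itself serves as a readable cluster name.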
Citation:
Hua Li, Dou Shen, Benyu Zhang, Zheng Chen, Qiang Yang, "Adding Semantics to Email Clustering," icdm, pp.938-942, Sixth IEEE International Conference on Data Mining (ICDM'06), 2006