loading...
Iterative Projected Clustering by Subspace Mining
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2005.29February 2005 (vol. 17 no. 2) pp. 176-189
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
Irrelevant attributes add noise to high-dimensional clusters and render traditional clustering techniques inappropriate. Recently, several algorithms that discover projected clusters and their associated subspaces have been proposed. In this paper, we realize the analogy between mining frequent itemsets and discovering dense projected clusters around random points. Based on this, we propose a technique that improves the efficiency of a projected clustering algorithm (DOC). Our method is an optimized adaptation of the frequent pattern tree growth method used for mining frequent itemsets. We propose several techniques that employ the branch and bound paradigm to efficiently discover the projected clusters. An experimental study with synthetic and real data demonstrates that our technique significantly improves on the accuracy and speed of previous techniques.

[1] 176 C.C. Aggarwal, J.L. Wolf, P.S. Yu, C. Procopiuc, and J.S. Park, “Fast Algorithms for Projected Clustering,” Proc. ACM SIGMOD, 1999.
[2] C.C. Aggarwal and P.S. Yu, “Finding Generalized Projected Clusters in High Dimensional Spaces,” Proc. ACM SIGMOD, 2000.
[3] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, “Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications,” Proc. ACM SIGMOD, 1998.
[4] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” Proc. Int'l Conf. Very Large Data Bases (VLDB '94), 1994.
[5] F. Beil, M. Ester, and X. Xu, “Frequent Term-Based Text Clustering,” Proc. ACM SIGKDD, 2002.
[6] K.S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, “When is 'Nearest Neighbor' Meaningful?” Proc. Int'l Conf. Database Theory (ICDT '99), 1999.
[7] C. Blake and C. Merz, “UCI Repository of Machine Learning Databases,” Univ. of Calif., Irvine, Dept. of Information and Computer Sciences, http://www.ics.uci.edu/~mlearnMLRepository.html , 1998.
[8] R. Frischholz and U. Dieckmann, “BioID: A Multimodal Biometric Identification System,” Computer, vol. 33, no. 2, Feb. 2000.
[9] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM SIGMOD, 2000.
[10] A. Hinneburg, C.C. Aggarwal, and D.A. Keim, “What is the Nearest Neighbor in High Dimensional Spaces?” Proc. Int'l Conf. Very Large Data Bases (VLDB '00), 2000.
[11] J. Pei, X. Zhang, M. Cho, H. Wang, and P.S. Yu, “Maple: A Fast Algorithm for Maximal Pattern-Based Clustering,” Proc. Int'l Conf. Data Mining (ICDM '03), 2003.
[12] C.M. Procopiuc, M. Jones, P.K. Agarwal, and T.M. Murali, “A Monte Carlo Algorithm for Fast Projective Clustering,” Proc. ACM SIGMOD, 2002.
[13] H. Toivonen, “Sampling Large Databases for Association Rules,” Proc. Int'l Conf. Very Large Data Bases (VLDB '96), 1996.
[14] H. Wang, W. Wang, J. Yang, and P.S. Yu, “Clustering by Pattern Similarity in Large Data Sets,” Proc. ACM SIGMOD, 2002.
[15] J. Yang, W. Wang, H. Wang, and P.S. Yu, “$\delta {\hbox{-}}Clusters$ : Capturing Subspace Correlation in a Large Data Set,” Proc. Int'l Conf. Data Eng. (ICDE '00), 2002.
[16] M.L. Yiu and N. Mamoulis, “Frequent-Pattern Based Iterative Projected Clustering,” Proc. Third IEEE Int'l Conf. Data Mining (ICDM '03), Nov. 2003.
[17] M.J. Zaki and K. Gouda, “Fast Vertical Mining Using Diffsets,” Proc. ACM SIGKDD, 2003.
[18] M.J. Zaki, S. Parthasarathy, W. Li, and M. Ogihara, “Evaluation of Sampling for Data Mining of Association Rules,” Research Issues in Data Eng. (RIDE), 1997.

Index Terms:
Database management, database applications, clustering, classification, and association rules.
Citation:
Man Lung Yiu, Nikos Mamoulis, "Iterative Projected Clustering by Subspace Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 2, pp. 176-189, Feb. 2005, doi:10.1109/TKDE.2005.29
Usage of this product signifies your acceptance of the Terms of Use.