loading...
Compressed Hierarchical Mining of Frequent Closed Patterns from Dense Data Sets
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2007.1047September 2007 (vol. 19 no. 9) pp. 1175-1187
 This Article 
 
PDF
HTML
 
 Share 
   
 Bibliographic References 
   
 Add to: 
 
Digg
Furl
Spurl
Blink
Simpy
Google
Del.icio.us
Y!MyWeb
 
 Search 
   
This paper addresses the problem of finding frequent closed patterns (FCPs) from very dense datasets. We introduce two compressed hierarchical FCP mining algorithms C-Miner and B-Miner. The two algorithms compress the original mining space, hierarchically partition the whole mining task into independent subtasks and mine each subtask progressively. The two algorithms adopt different task-partitioning strategies: CMiner partitions the mining task based on Compact Matrix Division whereas B-Miner partitions the task based on Base Rows Projection. The compressed hierarchical mining algorithms enhance the mining efficiency and facilitate a progressive refinement of results. Moreover, because the subtasks can be mined independently, C-Miner and B-Miner can be readily parallelized without incurring significant communication overhead. We have implemented C-Miner and B-Miner, and our performance study on synthetic datasets and real dense microarray datasets shows their effectiveness over existing schemes. We also report experimental results on parallel versions of these two methods.

[1] 1175 R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int'l Conf. Very Large Data Bases (VLDB '94), pp. 487-499, 1994.
[2] J. Besson, C. Robardet, and J.-F. Boulicaut, “Constraint-Based Mining of Formal Concepts in Transactional Data,” Proc. Eighth Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD '04), pp. 615-624, 2004.
[3] G. Cong, K.L. Tan, A.K.H. Tung, and F. Pan, “Mining Frequent Closed Patterns in Microarray Data,” Proc. Fourth IEEE Int'l Conf. Data Mining (ICDM '04), pp.363-366, 2004.
[4] M. El-Hajj and O.R. Zaiane, “Parallel Association Rule Mining with Minimum Inter-Processor Communication,” Proc. 14th Int'l Workshop Database and Expert Systems Applications (DEXA '03), pp.519-523, 2003.
[5] H. Mannila, H. Toivonen, and A.I. Verkamo, “Efficient Algorithms for Discovering Association Rules,” Proc. 12th Nat'l Conf. Artificial Intelligence (AAAI '94) Workshop Knowledge Discovery in Databases (KDD '94), pp. 181-192, 1994.
[6] F. Pan, G. Cong, and A.K.H. Tung, “Carpenter: Finding Closed Patterns in Long Biological Datasets,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '03), pp. 642-673, 2003.
[7] F. Pang, A.K.H. Tung, G. Cong, and X. Xu, “Cobbler: Combining Column and Row Enumeration for Closed Pattern Discovery,” Proc. 16th Int'l Conf. Scientific and Statistical Database Management (SSDBMS '04), pp. 21-30, 2004.
[8] J. Pei, J. Han, and R. Mao, “Closet: An Efficient Algorithm for Mining Frequent Closed Itemsets,” Proc. SIGMOD Int'l Workshop Data Mining and Knowledge Discovery, pp. 21-30, 2000.
[9] P. Shenoy, J. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, and D. Shah, “Turbo-Charging Vertical Mining of Large Databases,” Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '00), pp.22-23, 2000.
[10] J. Wang, J. Han, and J. Pei, “CLOSET+: Searching for the Best Strategies for Mining Frequent Closed Itemsets,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '03), pp. 236-245, 2003.
[11] R.P.G. Yudho Giri Sucahyo and A. Rudra, “Efficiently Mining Frequent Patterns from Dense Datasets Using a Cluster of Computers,” Proc. 16th Australian Joint Conf. Artificial Intelligence (AI '03), pp. 233-244, 2003.
[12] M. Zaki and C. Hsiao, “CHARM: An Efficient Algorithm for Closed Association Rule Mining,” Proc. Second SIAM Int'l Conf. Data Mining (SDM '02), 2002.
[13] M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, “New Algorithms for Fast Discovery of Association Rules,” Proc. Third Int'l Conf. Knowledge Discovery and Data Mining (KDD '97), pp. 283-286, 1997.
[14] M.J. Zaki, “Parallel and Distributed Association Mining: A Survey,” IEEE Concurrency, special issue on data mining, pp. 14-25, 1999.
[15] U. Takeaki, K. Masashi, and A. Hiroki, “LCM ver. 3: Collaboration of Array, Bitmap and Prefix Tree for Frequent Itemset Mining,” Proc. First Int'l Workshop Open Source Data Mining: Frequent Pattern Mining Implementations (OSDM '05), pp. 77-86, 2005.
[16] C. Creighton and S. Hanash, “Mining Gene Expression Database for Association Rules,” Bioinformatics, vol. 19, pp. 79-86, 2003.
[17] G. Cong, A.K.H. Tung, X. Xu, F. Pan, and J. Yang, “FARMER: Finding Interesting Rule Groups in Microarray Datasets,” Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '04), 2004.
[18] L. Ji, “Mining Localized Co-Expressed Gene Patterns from Microarray Data,” PhD dissertation, http://www.comp.nus. edu.sg~jiliping, 2006.

Index Terms:
Frequent closed patterns, progressive, dense datasets, data mining, parallel mining
Citation:
Liping Ji, Kian-Lee Tan, Anthony Tung, "Compressed Hierarchical Mining of Frequent Closed Patterns from Dense Data Sets," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 9, pp. 1175-1187, June 2007, doi:10.1109/TKDE.2007.1047
Usage of this product signifies your acceptance of the Terms of Use.