Mining Approximate Frequent Itemsets from Noisy Data
Houston, Texas November 27-November 30
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2005.93
Fifth IEEE International Conference o ...
Jinze Liu, University of North Carolina at Chapel Hill
Susan Paulsen, University of North Carolina at Chapel Hill
Wei Wang, University of North Carolina at Chapel Hill
Andrew Nobel, University of North Carolina at Chapel Hill
Jan Prins, University of North Carolina at Chapel Hill

Frequent itemset mining is a popular and important first step in analyzing data sets across a broad range of applications. The traditional, "exact" approach for finding frequent itemsets requires that every item in the itemset occur in each supporting transaction. However, real data is typically subject to noise, and in the presence of such noise, traditional itemset mining may fail to detect relevant itemsets, particularly large itemsets, which are more vulnerable to noise.
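
To make the exact-support requirement concrete, here is a minimal sketch in Python. The function name `exact_support` and the toy transactions are illustrative assumptions and do not come from the paper. It counts the transactions that contain every item of an itemset, and shows how a single missing item removes a transaction from the support of a large itemset.

```python
def exact_support(transactions, itemset):
    """Count transactions that contain every item of `itemset` (exact model)."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


# Toy data (hypothetical): four transactions that should all contain a, b, c, d.
clean = [{"a", "b", "c", "d"} for _ in range(4)]

# The same data after noise drops one item from two of the transactions.
noisy = [
    {"a", "b", "c", "d"},
    {"a", "b", "d"},       # lost "c"
    {"a", "b", "c", "d"},
    {"b", "c", "d"},       # lost "a"
]

pattern = {"a", "b", "c", "d"}
clean_support = exact_support(clean, pattern)   # all four transactions count
noisy_support = exact_support(noisy, pattern)   # the two damaged ones drop out
small_support = exact_support(noisy, {"b", "d"})  # a small itemset is unaffected here
```

In this toy case the four-item pattern loses half of its support, while the two-item subset `{"b", "d"}` keeps all of it, which is the sense in which larger itemsets are more exposed to noise.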

In this paper we propose approximate frequent itemsets (AFI) as a noise-tolerant itemset model. In addition to the usual requirement for sufficiently many supporting transactions, the AFI model places constraints on the fraction of errors permitted in each item column and the fraction of errors permitted in a supporting transaction. Taken together, these constraints winnow out the approximate itemsets that exhibit systematic errors. In the context of a simple noise model, we demonstrate that AFI is better at recovering underlying data patterns, while identifying fewer spurious patterns, than either the exact frequent itemset approach or the existing error-tolerant itemset approach of Yang et al. [11].
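
The three constraints described above can be sketched as a simple check on one candidate block of a binary transaction-by-item matrix. This is only one reading of the abstract, not the paper's mining algorithm: the function name `is_approximate_itemset`, the parameter names `eps_row`, `eps_col`, and `min_support`, and the toy matrices are all assumptions made for illustration.

```python
def is_approximate_itemset(matrix, rows, cols, eps_row, eps_col, min_support):
    """Check one candidate (rows, cols) block against three constraints.

    matrix: list of lists of 0/1; matrix[t][i] == 1 if transaction t has item i.
    rows: indices of the candidate supporting transactions.
    cols: indices of the items in the candidate itemset.
    eps_row, eps_col: assumed error tolerances, as fractions in [0, 1].
    min_support: assumed minimum number of supporting transactions.
    """
    if not rows or not cols or len(rows) < min_support:
        return False
    # Each supporting transaction may miss at most eps_row of the items.
    for t in rows:
        missing = sum(1 for i in cols if matrix[t][i] == 0)
        if missing > eps_row * len(cols):
            return False
    # Each item may be absent from at most eps_col of the transactions.
    for i in cols:
        missing = sum(1 for t in rows if matrix[t][i] == 0)
        if missing > eps_col * len(rows):
            return False
    return True


# Hypothetical 4x4 blocks, each with exactly four missing entries.
scattered = [   # errors spread out: one per row and one per column
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]
systematic = [  # errors concentrated: item 0 is missing everywhere
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]

idx = [0, 1, 2, 3]
scattered_ok = is_approximate_itemset(scattered, idx, idx, 0.25, 0.25, 3)
systematic_ok = is_approximate_itemset(systematic, idx, idx, 0.25, 0.25, 3)
```

Both blocks contain the same number of errors, but only the scattered one passes. The systematic block satisfies the per-transaction limit and fails the per-item limit, which illustrates how applying both constraints together can reject patterns whose errors pile up in a single column.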

Citation:
Jinze Liu, Susan Paulsen, Wei Wang, Andrew Nobel, Jan Prins, "Mining Approximate Frequent Itemsets from Noisy Data," icdm, pp.721-724, Fifth IEEE International Conference on Data Mining (ICDM'05), 2005