space of frequent sequence mining and present two novel pruning strategies, SEP (Sequence Extension Pruning) and IEP (Item Extension Pruning), which can be used in any Apriori-like sequence mining algorithm or lattice-theoretic approach. At the cost of a small amount of additional memory, the proposed strategies prune invalid regions of the search space and effectively reduce the total cost of frequency counting. To test their effectiveness, we optimize SPAM [2] and present the improved algorithm, SPAMSEPIEP, which uses SEP and IEP to prune the search space by sharing the lists of frequent 2-sequences. A comprehensive set of performance experiments shows that SPAMSEPIEP outperforms SPAM by a factor of 10 on small datasets and by 30% to 50% on reasonably large datasets.
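The abstract does not spell out how SEP and IEP work, so the following is only a minimal sketch of the general idea it hints at: using shared lists of frequent 2-sequences to discard extension candidates before any frequency counting. It relies on the Apriori property (every subsequence of a frequent sequence is itself frequent) and should not be read as the authors' exact strategies. All function names, the toy database, and the representation of a sequence as a list of item sets are assumptions made for illustration.

```python
def is_subsequence(pattern, sequence):
    """True if `pattern` (list of itemsets) occurs in order inside `sequence`."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(pattern, database):
    """Number of database sequences that contain `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def frequent_two_sequences(database, items, min_sup):
    """Build the two shared lists of frequent 2-sequences (hypothetical helper).

    s_pairs holds (a, b) where <{a},{b}> is frequent (b in a later itemset).
    i_pairs holds sorted (a, b) where <{a, b}> is frequent (same itemset).
    """
    s_pairs, i_pairs = set(), set()
    for a in items:
        for b in items:
            if support([frozenset([a]), frozenset([b])], database) >= min_sup:
                s_pairs.add((a, b))
            if a < b and support([frozenset([a, b])], database) >= min_sup:
                i_pairs.add((a, b))
    return s_pairs, i_pairs


def prune_s_candidates(pattern, candidates, s_pairs):
    """Sequence extension: appending {c} needs <{x},{c}> frequent for every x in pattern."""
    prior = set().union(*pattern)
    return [c for c in candidates if all((x, c) in s_pairs for x in prior)]


def prune_i_candidates(pattern, candidates, s_pairs, i_pairs):
    """Item extension: adding c to the last itemset needs <{x, c}> frequent for every
    x in that itemset and <{x},{c}> frequent for every x in the earlier itemsets."""
    last = pattern[-1]
    earlier = set().union(*pattern[:-1])
    return [
        c for c in candidates
        if all(tuple(sorted((x, c))) in i_pairs for x in last)
        and all((x, c) in s_pairs for x in earlier)
    ]


# Toy database (an assumption, not data from the paper).
DB = [
    [frozenset("a"), frozenset("b"), frozenset("c")],
    [frozenset("a"), frozenset("bc")],
    [frozenset("ab"), frozenset("c")],
    [frozenset("bc"), frozenset("a")],
]
S_PAIRS, I_PAIRS = frequent_two_sequences(DB, ["a", "b", "c"], min_sup=2)

# Candidates that survive pruning for the pattern <{a}>.
s_survivors = prune_s_candidates([frozenset("a")], ["a", "b", "c"], S_PAIRS)
i_survivors = prune_i_candidates([frozenset("a")], ["b", "c"], S_PAIRS, I_PAIRS)
```

The checks are necessary conditions only: a candidate that survives must still have its support counted, but a candidate that fails can be dropped without touching the database, which is where the savings in frequency counting would come from. The extra memory is the two pair sets, at most quadratic in the number of frequent items.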
Citation:
Xu Yusheng, Ma Zhixin, Li Lian, Tharam S. Dillon, "Effective Pruning Strategies for Sequential Pattern Mining," First International Workshop on Knowledge Discovery and Data Mining (WKDD 2008), pp. 21-24, 2008