Finding patterns in biological sequences owns a sig- nificant impact on many real-world applications such as biological sequence analysis, text indexing, stream data mining, and sensor networking. The problem of Pattern Matching with Wildcards and Length Con- straints is to find all locations of occurrences of a pat- tern P in a text T, which can be a biological sequence, text string, etc. The user can specify a varying range for the number of wildcards between every two con- secutive letters in P and also the length constraints of P. Another constraint is the one-off condition, where every literal in T can only be used once for matching with P. The on-line version of this problem is to find out an occurrence of the given pattern that satisfies all constraints as soon as the occurrence appears in the input of T so far. There is an algorithm SAIL to find the optimal solution for the on-line version of this problem. However, SAIL only handles exact pat- tern matching. In this paper, we propose an efficient on-line algorithm for approximate pattern matching with wildcards and length constraints, which is a more general problem than exact matching. We apply dy- namic programming in our algorithm and prove that our algorithm is correct.
Citation:
Dan He, Xindong Wu, Xingquan Zhu, "SAIL-APPROX: An Efficient On-Line Algorithm for Approximate Pattern Matching with Wildcards and Length Constraints," bibm, pp.151-158, 2007 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2007), 2007