Continuous Similarity-Based Queries on Streaming Time Series
|
In many applications, local or remote sensors send in streams of data, and the system needs to monitor the streams to discover relevant events/patterns and deliver instant reaction correspondingly. An important scenario is that the incoming stream is a continually appended time series, and the patterns are time series in a database. At each time when a new value arrives (called a time position), the system needs to find, from the database, the nearest or near neighbors of the incoming time series up to the time position. This paper attacks the problem by using Fast Fourier Transform (FFT) to efficiently find the cross correlations of time series, which yields, in a batch mode, the nearest and near neighbors of the incoming time series at many time positions. To take advantage of this batch processing in achieving fast response time, this paper uses prediction methods to predict future values. When the prediction length is long, FFT is used to compute the cross correlations of the predicted series (with the values that have already arrived) and the database patterns, and to obtain predicted distances between the incoming time series at many future time positions and the database patterns. If the prediction length is short, the direct computation method is used to obtain these predicted distances to avoid the overhead of using FFT. When the actual data value arrives, the prediction error together with the predicted distances is used to filter out patterns that are not possible to be the nearest or near neighbors, which provides fast responses. Experiments show that with reasonable prediction errors, the performance gain is significant. Especially, when the long term predictions are available, the proposed method can handle incoming data at a very fast streaming rate.
[1] 1320 R. Agrawal , C. Faloutsos , and A.N. Swami , “Efficient Similarity Search In Sequence Databases,” Proc. Conf. Foundations of Data Organization and Algorithms, pp. 69-84, 1993.
[2] R. Agrawal , K.-I. Lin , H.S. Sawhney , and K. Shim , “Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases,” VLDB J., pp. 490-501, 1995.
[3] R. Avnur and J.M. Hellerstein , “Eddies: Continuously Adaptive Query Processing,” Proc. ACM SIGMOD, pp. 261-272, 2000.
[4] S. Babu and J. Widom , “Continuous Queries over Data Streams,” SIGMOD Record, Sept. 2001.
[5] S. Berchtold and D.A. Keim , “High-Dimensional Index Structures, Database Support for Next Decade's Applications (tutorial),” Proc. ACM SIGMOD, 1998.
[6] C. Burrus and T. Parks , DFT/FFT and Convolution Algorithms. John Wiley and Sons, 1985.
[7] J. Chen , D.J. DeWitt , and J.F. Naughton , “Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries,” Proc. Int'l Conf. Data Eng., 2002.
[8] J. Chen , D.J. DeWitt , F. Tian , and Y. Wang , “NiagaraCQ: A Scalable Continuous Query System for Internet Databases,” Proc. ACM SIGMOD, pp. 379-390, 2000.
[9] C. Faloutsos , M. Ranganathan , and Y. Manolopoulos , “Fast Subsequence Matching in Time-Series Databases,” Proc. ACM SIGMOD, pp. 419-429, 1994.
[10] M. Frigo and S.G. Johnson , “FFTW: C Subroutine Library for Computing the Discrete Fourier Transform (DFT),” http:/www. fftw.org/, 2004.
[11] L. Gao and X.S. Wang , “CQP Code,” http://www.cem.uvm.edu/~lgaoresearch.html , 2005.
[12] L. Gyorfi , G. Lugosi , and G. Morvai , “A Simple Randomized Algorithm for Sequential Prediction of Ergodic Time Series,” IEEE Trans. Information Theory, vol. 45, no. 7, pp. 2642-2650, 1999.
[13] Y.-W. Huang and P.S. Yu , “Adaptive Query Processing for Time-Series Data,” Proc. ACM SIGKDD, pp. 282-286, 1999.
[14] T. Kahveci and A.K. Singh , “Variable Length Queries for Time Series Data,” Proc. Int'l Conf. Data Eng., pp. 273-282, 2001.
[15] S. Madden and M.J. Franklin , “Fjording the Stream: An Architecture for Queries over Streaming Sensor Data,” Proc. Int'l Conf. Data Eng., 2002.
[16] S. Madden , M. Shah , J.M. Hellerstein , and V. Raman , “Continuously Adaptive Continuous Queries over Streams,” Proc. ACM SIGMOD, 2002.
[17] A. Oppenheim and R. Schafer , Digital Signal Processing. Prentice-Hall, 1975.
[18] C.-S. Perng , H. Wang , S.R. Zhang , and D.S. Parker , “Landmarks: A New Model for Similarity-Based Pattern Querying in Time Series Databases,” Proc. Int'l Conf. Data Eng., pp. 33-42, 2000.
[19] Physionet and CINC, “CinC Challenge 2001 Data Sets: The PAF Prediction Challenge Database,” http://www.physionet.org/ physiobank/database afpdb/, 2001.
[20] B. Plale and K. Schwan , “Optimizations Enabled by a Relational Data Model View to Querying Data Streams,” Proc. Int'l Parallel and Distributed Processing Symp., 2001.
[21] The Transforms and Applications Handbook, A.D. Poularikas, ed. CRC Press LLC, 2000.
[22] D. Rafiei and A. Mendelzon , “Similarity-Based Queries for Time Series Data,” Proc. ACM SIGMOD, pp. 13-25, 1997.
[23] D. Rafiei and A.O. Mendelzon , “Querying Time Series Data Based on Similarity,” IEEE Trans. Knowledge and Data Eng., vol. 12, no. 5, pp. 675-693, 2000.
[24] C. Shahabi and D. Yan , “Real-Time Pattern Isolation and Recognition over Immersive Sensor Data Streams,” Proc. Ninth Int'l Conf. Multi-Media Modeling, 2003.
[25] H. Shatkay and S.B. Zdonik , “Approximate Queries and Representations for Large Data Sequences,” Proc. Int'l Conf. Data Eng., pp. 536-545, 1996.
[26] D. Terry , D. Goldberg , D. Nichols , and B. Oki , “Continuous Queries over Append-Only Databases,” Proc. ACM SIGMOD, pp. 321-330, 1992.
[27] L. Wang , K.K. Teo , and Z. Lin , “Predicting Time Series with Wavelet Packet Neural Networks,” Proc. Int'l Joint Conf. Neural Networks, vol. 3, pp. 1593-1597, 2001.
[28] S. Winograd , “Some Bilinear Forms Whose Multiplicative Complexity Depends on the Field of Constants,” Math. Systems Theory, vol. 10, pp. 169-180, 1977.
Index Terms:
Index Terms- Similarity search, data mining, data stream processing, time series analysis, nearest neighbor search, continuous query.
Citation:
Like Gao, Xiaoyang Sean Wang, "Continuous Similarity-Based Queries on Streaming Time Series," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 10, pp. 1320-1332, Oct. 2005, doi:10.1109/TKDE.2005.161