Abstract
We study the problem of searching similar patterns in time series data for variable length queries. Recently, a multi-resolution indexing technique (MRI) was proposed in (Kahveci and Singh, in proceedings of the international conference on data engineering, pp. 273–282, 2001; Kahveci and Singh, IEEE Trans Knowl Data Eng 16(4):418–433, 2004) to address this problem, which uses compression as an additional step to reduce the index size. In this paper, we propose an alternative technique, called compact MRI (CMRI), which uses adaptive piecewise constant approximation (APCA) representation as dimensionality reduction technique, and which occupies much less space without requiring compression. We implemented both MRI and CMRI, and conducted extensive experiments to evaluate and compare their performance on real stock data as well as synthetic. Our results indicate that CMRI provides a much better precision ranging from 0.75 to 0.89 on real data, and from 0.80 to 0.95 on synthetic data, while for MRI, these ranges are from 0.16 to 0.34, and from 0.03 to 0.65, respectively. Compared to sequential scan, we found that CMRI is 4–30 times faster and the number of disk I/Os it required is close to minimal. In terms of storage utilization, CMRI occupies 1% of the memory occupied by MRI. These results and analysis show CMRI to be an efficient and scalable indexing technique for large time series databases.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Agarwal R, Faloutsos C, Swami A (1993) Efficient similarity search in sequence databases. In: Proceedings of the International conference on foundation of data organization and algorithms, pp 69–84
Agrawal R, Lin KI, Sawhney HS, Shim K (1995) Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In: Proceedings of the 21st international conference on very large data bases. Morgan Kaufmann, Zurich, pp 490–501
Beckman N, Kriegel HP, Schneider R, Fu W (1990) The R*-tree: an efficient and robust access method for points and retangles. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 322–332
Bellman RE (1961). Adaptive control process: a guided tour. Princeton University Press, Princeton
Chakrabarti K, Mehrotra S (2000) Local dimensionality reduction: a new approach to indexing high dimensional spaces. In: Proceedings of the 26th VLDB Conference
Chan K, Fu W (1999) Efficient time series matching by wavelets. In: Proceedings of the 15th International conference on data engineering, pp 126–133
Faloutsos C (1996). Searching multimedia databases by content. Kluwer, Dordrecht
Faloutsos C, King-Ip (David) Lin (1995) Fastmap: a fast algorithm for indexing, data mining and visualization of traditional and multimedia datasets. In: Proceedings of the ACM SIGMOD conference, pp 163–174
Faloutsos C, Ranganathan M, Manopolous Y (1994) Fast subsequence matching in time series databases. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 419–429
Guttman A (1984) R-trees: An efficient indexing structure for spatial searching. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 47–57
Kahveci T, Singh AK (2001) Variable length queries for time series data. In: Proceedings of the international conference on data engineering, pp 273–282
Kahveci T and Singh AK (2004). Optimizing similarity search for arbitrary length time series queries. IEEE Trans Knowl Data Eng 16(4): 418–433
Kanth KVR, Agrawal D, Singh A (1998) Dimensionality reduction for similarity searching in dynamic databases. In: Proceedings of the ACM SIGMOD international conference on management of data
Keogh E and Ann RC (2005). Exact indexing of dynamic time warping. Knowl Info Syst 7(3): 358–386
Keogh EJ and Lin J (2005). Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowl Info Syst 8(2): 154–177
Keogh E, Chakrabarti K, Mehrotra S and Pazzani M (2001). Dimensionality reduction for fast similarity search in large databases. Knowl Info Syst 3(3): 263–286
Keogh E, Chakrabarti K, Mehrotra S, Pazzani M (2001b) Locally adaptive dimensionallity reduction for indexing large time series databases. J ACM
Lee S, Chun S, Kim D, Lee J, Chung C (2000) Similarity search for multidimensional data sequences. In: Proceedings of the 16th international conference on data engineering. IEEE Computer Society, Washington, DC, pp 599–608
Li Q, Lopez IFV and Moon B (2000). Skyline index for time series data. IEEE Trans Knowl Data Eng (TKDE) 16: 669–664
Popivanov I, Miller RJ (2002) Similarity search over time series database using wavelets. In: Proceedings of the 18th international conference on data engineering, pp 212–221
Rafiei D, Mendelzon AO (1997) Similarity-based queries for time series data. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 13–25
Rafiei D, Mendelzon AO (1998) Efficient retrieval of similar time sequences. In: Proceedings of the FODO Conference, pp 249–256
Shepard RN (1980). Multi dimensional scaling. Princeton University Press, Princeton
Sun H, Ozturk O, Ferhatosmanoglu H (2003) Comri: a compressed multi-resolution index structure for sequence similarity queries. In: IEEE computer society bioinformatics conference (CSB’03), pp 553–558
Wu YL, Divyakant A, Abbadi AE (2000) A comparision of dft and dwt based similarity search in time series databases. In: Proceedings of the international conference on information and knowledge management, pp 488–495
Yi BK, Faloutsos C (2000) Fast time sequence indexing for arbitrary lp norms. In: Proceedings of the 26th international conference on very large databases(VLDB)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Kadiyala, S., Shiri, N. A compact multi-resolution index for variable length queries in time series databases. Knowl Inf Syst 15, 131–147 (2008). https://doi.org/10.1007/s10115-007-0097-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-007-0097-z