Abstract
Recently, the research on efficient extraction of previously unknown, frequently appearing patterns in a time-series data has received much attention. These patterns are called ‘motifs’. Motifs are useful for various time-series data mining tasks. In this paper, we propose a motif discovery algorithm to extract a motif that represents a characteristic pattern of the given data based on Minimum Description Length (MDL) principle. In addition, the algorithm can extract motifs from multi-dimensional time-series data by using Principal Component Analysis (PCA). In experimental evaluation, we show the efficiency of the motif discovery algorithm, and the usefulness of extracted motifs to various data mining tasks.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Akaike, H. (1969). Fitting autoregressive model for prediction. The Annals of the Institute of Statistical Mathematics, 21, 243–247.
Berberidis, C., Vlahavas, I., Aref, W. G., Atallah, M., & Elmagarmid, A. K. (2002). On the discovery of weak periodicities in large time series. In Proc. of PAKDD 2002 (pp. 51–61).
Buhler, J., & Tompa, M. (2001). Finding motifs using random projections. In Proc. of 5th International Conference on Computer Molecular Biology (pp. 67–74).
Caraca-Valente, J. P., & Lopez-Chavarrias, I. (2000). Discovering similar patterns in time series. In Proc. of 6th ACM SIGMOD International Confference on Knowledge Discovery and Data Mining (pp. 126–133).
Chakrabarti, S., Sarawagi, S., & Dom, B. (1998). Mining surprising patterns using temporal description length. In Proc. of 24th International Conference on Very Large databases (pp. 606–617).
Chiu, B., Keogh, E., & Lonardi, S. (2003). Probabilistic discovery of time series motif. In Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 493–498).
Colomer, J., Melendez, J., & Gamero, F. I. (2002). Pattern recognition based on episodes and DTW, application to diagnosis of a level control system. In Proc. of 16th International Workshop on Qualitative Reasoning (pp. 37–43).
Cyril, G., Peter, T., Egill, R., Arup, N. F., & Kai, H. L. (1999). On clustering fMRI time series. NeuroImage, 9:3, 298–310.
Das, G., Lin, K., Mannila, H., Renganathan, G., & Smyth, P. (1998). Rule discovery from time series. In Proc. of the 4th ACM SIGMOD International Conference on Knowledge Discovery and Data Mining (pp. 16–22).
Fradkin, D., & Madigan, D. (2003). Experiments with random projections for machine learning. In Proc. of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 517–522).
Friedman, J. H., & Tukey, J. W. (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, 23:9, 881–889.
Heras, D. B., Cabaleiro, J. C., Perez, V. B., Costas, P., & Rivera, F. F. (1996). Principal component analysis on vector computers. In Proc. of Vector and Parallel Processing 1996 (pp. 416–428).
Hyvrinen, A., & Oja, E. (2000). Independent component analysis: Algorithms and applications. Neural Networks, 13:4, 411–430.
Kashino, K., Smith, G., & Murase, H. (1999). Time-series active search for quick retrieval of audio and video. In Proc. of 1999 International Conference on Acoustics, Speech and Signal Processing (pp. 2993–2996).
Keogh, E. (2004). The UCR Time Series Data Mining Archive [http://www.cs.ucr.edu/~eamonn/TSDMA/datasets. html]. Riverside CA, University of California: Computer Science & Engineering Department.
Koopman, S. J., & Ooms, M. (2003). Time-series modeling of daily tax revenues. Statistica Neerlandica, 57:4, 439–469.
Levin, A. U., Leen, T. K., and Moody, J. E. (1993), Fast pruning using principal components. In Proc. of NIPS ′93 (pp. 35–42).
Lin, J., Keogh, E., Lonardi, S., & Chiu, B. (2003). A symbolic representation of time series, with implications for streaming algorithms. In Proc. of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.
Lin, J., Keogh, E., Lonardi, S., & Patel, P. (2002). Finding motifs in time series. In Proc. of the 2nd Workshop on Temporal Data Mining (pp. 53–68).
Mori, T., & Uehara, K. (2001). Extraction of primitive motion and discovery of association rules from human motion. In Proc. of the 10th IEEE International Workshop on Robot and Human Communication (pp. 200–206).
Myers, C. S., & Rabiner, L. R. (1981). A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System Technical Journal, 7:60, 1389–1409.
Navarro, G., & Baeza-Yates, R., (1999). Fast multi-dimensional approximate pattern matching. In Proc. of the 10th Annual Symposium on Combinatorial Pattern Matching (pp. 243–257).
Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry, Vol. 15. World Scientific Publishing.
Schwarz, G (1981). Estimating the dimension of a model. The Annals of Statistical Mathematics, 6:2, 461-464.
Shannon, C. E. (1949). Communication in the presence of noise. In Proc. of 2nd Institute of Radio Engineers (pp. 10–21).
Tanaka, Y., & Uehara, K. (2003). Discover motifs in multi dimensional time-series using the principal component analysis and the MDL principle. In Proc. of 3rd International Conference on Machine Learning and Data Mining in Pattern Recognition (pp. 252–265).
Vigario, R., Jousmaki, V., Hamalainen, M., Hari, R., and Oja, E. (1998), Independent component analysis for identification of artifacts in magnetoencephalographic recording. In Proc. of NIPS ′97 (pp. 229–235).
Vlachos, M., Hadjieleftheriou, M., Gunopulos, D., & Keogh, E. (2003). Indexing multi-dimensional time-series with support for multiple distance measure. In Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 216–225).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Tanaka, Y., Iwamoto, K. & Uehara, K. Discovery of Time-Series Motif from Multi-Dimensional Data Based on MDL Principle. Mach Learn 58, 269–300 (2005). https://doi.org/10.1007/s10994-005-5829-2
Received:
Revised:
Accepted:
Issue Date:
DOI: https://doi.org/10.1007/s10994-005-5829-2