Abstract
Multivariate time series (MTS) data mining has attracted much interest in recent years because a growing number of fields need to manage and process large collections of MTS. In these settings, pattern recognition tasks such as similarity search, clustering, or classification can be challenging because the data are high-dimensional, noisy, redundant, and made up of correlated features. Dimensionality reduction is therefore often used as a preprocessing step to make the data more manageable. In this paper, we propose a novel MTS similarity search approach that addresses these problems through dimensionality reduction and correlation analysis. An important contribution of the proposed technique is a representation that transforms an MTS with a large number of variables into a univariate signal before correlations are sought within the set. The technique relies on unsupervised learning through Principal Component Analysis (PCA) to uncover weights associated with the original input variables and to use them in deriving the univariate signal. We conduct numerous experiments on various benchmark datasets to study the performance of the proposed technique. Compared with major existing techniques, our results indicate improved similarity search accuracy and increased efficiency.
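The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea, under stated assumptions: each variable's weight is taken from the absolute PCA loadings of the leading component(s), scaled by the variance they explain; the univariate signal is the weighted sum of the standardized variables; and similarity is the Pearson correlation between two such signals of equal length. The function names (`pca_variable_weights`, `to_univariate`, `similarity`) and the parameter `k` are hypothetical and are not taken from the paper.

```python
import numpy as np


def _standardize(X):
    """Center each column and scale it to unit variance (constant columns are left at zero)."""
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant variables
    return Xc / std


def pca_variable_weights(X, k=1):
    """Assumed weighting scheme: variance-weighted absolute loadings of the top-k components.

    X has shape (n_timesteps, n_variables). Returns nonnegative weights that sum to 1.
    """
    Z = _standardize(np.asarray(X, dtype=float))
    # SVD of the standardized data: rows of Vt are the principal directions,
    # and the squared singular values are proportional to the explained variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s[:k] ** 2
    w = (np.abs(Vt[:k]) * explained[:, None]).sum(axis=0)
    return w / w.sum()


def to_univariate(X, k=1):
    """Collapse an MTS into one signal: weighted sum of its standardized variables."""
    X = np.asarray(X, dtype=float)
    return _standardize(X) @ pca_variable_weights(X, k=k)


def similarity(X, Y, k=1):
    """Pearson correlation between the univariate signals of two equal-length MTS."""
    return float(np.corrcoef(to_univariate(X, k), to_univariate(Y, k))[0, 1])
```

In this sketch each MTS derives its own weights; sharing one weight vector across a whole collection would be an equally plausible design, and the paper may differ on this point. A similarity search would then rank the candidate MTS by `similarity(query, candidate)`.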
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kane, A., Shiri, N. (2017). Multivariate Time Series Representation and Similarity Search Using PCA. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2017. Lecture Notes in Computer Science, vol 10357. Springer, Cham. https://doi.org/10.1007/978-3-319-62701-4_10
DOI: https://doi.org/10.1007/978-3-319-62701-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62700-7
Online ISBN: 978-3-319-62701-4
eBook Packages: Computer Science (R0)