Multivariate Time Series Representation and Similarity Search Using PCA

Kane, Aminata; Shiri, Nematollaah

doi:10.1007/978-3-319-62701-4_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10357)

Included in the following conference series:

Industrial Conference on Data Mining

Abstract

Multivariate time series (MTS) data mining has attracted much interest in recent years, driven by the growing number of fields that need to manage and process large collections of MTS. In such settings, pattern recognition tasks such as similarity search, clustering, or classification can be challenging because the data are high-dimensional, noisy, redundant, and have correlated features. Dimensionality reduction is therefore often used as a preprocessing step to make the data more manageable. In this paper we propose a novel MTS similarity search approach that addresses these problems through dimensionality reduction and correlation analysis. An important contribution of the proposed technique is a representation that transforms an MTS with a large number of variables into a univariate signal before correlations are sought within the set. The technique relies on unsupervised learning through Principal Component Analysis (PCA) to uncover weights associated with the original input variables and to use them in deriving the univariate signal. We conduct numerous experiments on various benchmark datasets to study the performance of the proposed technique. Compared to major existing techniques, our results indicate increased accuracy and efficiency. We also show that our technique yields improved similarity search accuracy.
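The abstract does not give the exact weighting scheme, so the following is only a minimal sketch of the general idea (PCA-derived variable weights, a weighted univariate signal, then correlation), not the authors' algorithm. The function names, the choice of weighting each variable by its absolute loadings scaled by the eigenvalues of the top components, the pooling of both series to learn the weights, and the use of Pearson correlation as the similarity score are all assumptions made for illustration.

```python
import numpy as np


def standardize(X):
    """Z-score each column (variable) of a (T, m) array."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)


def pca_variable_weights(X, n_components=2):
    """Hypothetical weighting: one non-negative weight per variable,
    built from the absolute loadings of the top principal components,
    each scaled by its eigenvalue. Assumes X has at least 2 variables."""
    Z = standardize(X)
    C = np.cov(Z, rowvar=False)            # (m, m) covariance of z-scored data
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[top], 0.0, None)
    w = np.abs(eigvecs[:, top]) @ lam      # (m,) raw importance per variable
    return w / w.sum()                     # normalize so the weights sum to 1


def to_univariate(X, weights):
    """Collapse a (T, m) MTS into a length-T signal as a weighted sum."""
    return standardize(X) @ weights


def correlation_similarity(u, v):
    """Pearson correlation between two equal-length univariate signals."""
    return float(np.corrcoef(u, v)[0, 1])


# Synthetic demo data (not from the paper): two noisy copies of one MTS.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 200)
base = np.column_stack([np.sin(t), np.cos(t), np.sin(2.0 * t)])
X = base + 0.05 * rng.standard_normal(base.shape)
Y = X + 0.05 * rng.standard_normal(base.shape)

# Assumption: weights are learned once from the pooled observations.
weights = pca_variable_weights(np.vstack([X, Y]))
score = correlation_similarity(to_univariate(X, weights),
                               to_univariate(Y, weights))
print(weights, score)
```

Learning the weights once from pooled data keeps every series in the same one-dimensional representation, so correlation scores are comparable across the set. A per-series weighting would also be possible but would make the scores harder to compare.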

Author information

Authors and Affiliations

Computer Science and Software Engineering, Concordia University, Montreal, Canada
Aminata Kane & Nematollaah Shiri

Corresponding author

Correspondence to Aminata Kane.

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Leipzig, Sachsen, Germany
Petra Perner

About this paper

Cite this paper

Kane, A., Shiri, N. (2017). Multivariate Time Series Representation and Similarity Search Using PCA. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2017. Lecture Notes in Computer Science, vol 10357. Springer, Cham. https://doi.org/10.1007/978-3-319-62701-4_10

DOI: https://doi.org/10.1007/978-3-319-62701-4_10
Published: 01 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62700-7
Online ISBN: 978-3-319-62701-4
eBook Packages: Computer Science, Computer Science (R0)