Abstract
Classification of time series has attracted great interest over the past decade. While dozens of techniques have been introduced, recent empirical evidence strongly suggests that the simple nearest neighbor algorithm is very difficult to beat for most time series problems, especially on large-scale datasets. This may be considered good news, given how simple the nearest neighbor algorithm is to implement, but it has some negative consequences. First, the nearest neighbor algorithm requires storing and searching the entire dataset, resulting in high time and space complexity that limits its applicability, especially on resource-limited sensors. Second, beyond mere classification accuracy, we often wish to gain some insight into the data and to make the classification result more explainable, which the global characteristics used by the nearest neighbor algorithm cannot provide. In this work we introduce a new time series primitive, time series shapelets, which addresses these limitations. Informally, shapelets are time series subsequences which are in some sense maximally representative of a class. To classify objects, we can use the distance to the shapelet rather than the distance to the nearest neighbor. As we shall show with extensive empirical evaluations in diverse domains, classification algorithms based on the time series shapelet primitive can be interpretable, more accurate, and significantly faster than state-of-the-art classifiers.
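To make the idea of classifying by distance to a shapelet concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes the distance between a series and a shapelet is the smallest Euclidean distance over all equal-length sliding windows, and that a single distance threshold separates two classes. The function names, the example shapelet, the threshold, and the class labels are all hypothetical and chosen by hand for illustration; the abstract does not describe how a shapelet or threshold is selected.

```python
import math


def subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    contiguous window of the series with the same length.
    (Assumed distance definition, for illustration only.)"""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best_sq = math.inf  # best squared distance found so far
    for start in range(len(series) - m + 1):
        sq = 0.0
        for j in range(m):
            diff = series[start + j] - shapelet[j]
            sq += diff * diff
            if sq >= best_sq:
                # This window cannot improve on the best one; stop early.
                break
        best_sq = min(best_sq, sq)
    return math.sqrt(best_sq)


def classify(series, shapelet, threshold, near_label, far_label):
    """Assign near_label if the series contains a window within
    `threshold` of the shapelet, otherwise far_label."""
    if subsequence_distance(series, shapelet) <= threshold:
        return near_label
    return far_label


# Hypothetical, hand-picked values: a small "bump" as the shapelet.
shapelet = [0.0, 1.0, 0.0]
with_bump = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
flat = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

label_a = classify(with_bump, shapelet, 0.5, "bump", "flat")
label_b = classify(flat, shapelet, 0.5, "bump", "flat")
print(label_a, label_b)
```

In this sketch only the shapelet and one threshold need to be stored at classification time, rather than the whole training set, and the matched window shows which part of the series drove the decision.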
Additional information
Responsible editor: Bart Goethals.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Ye, L., Keogh, E. Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Disc 22, 149–182 (2011). https://doi.org/10.1007/s10618-010-0179-5
DOI: https://doi.org/10.1007/s10618-010-0179-5