Instance Reduction for Time Series Classification by Exploiting Representative Characteristics using k-means

Vinh, Vo Thanh; Nguyen, Hien T.; Tran, Tin T.

doi:10.1007/978-3-319-49046-5_19

Vo Thanh Vinh¹⁸,
Hien T. Nguyen¹⁸ &
Tin T. Tran¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9978))

Included in the following conference series:

International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making

1642 Accesses

Abstract

1-Nearest Neighbor has been endorsed an efficient method for time series classification as it outperforms more advanced classification algorithms in most cases. However, the time and space efficiency of this method depend on the number of instances in a training set. In order to improve its running time and space using, one can apply an approach called instance reduction. This approach reduces the training set size by choosing the best instances and using only them during classification of new instances. However, the high-dimensional characteristic of time series is a challenge for many mining tasks including instance reduction. Due to this challenge, only two instance reduction methods have been proposed: Naïve Rank Reduction and INSIGHT. In this work, we propose a new approach to instance reduction using K-means. Our method can select a reasonable set of instances so that it can preserve the representative characteristics and keep the generalization of the original training set. We conduct experiments to evaluate our method on 46 public datasets from the UCR Classification Archive. The experimental results show that the proposed method achieves state-of-the-art performance on these datasets.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Hubness-Aware Classification, Instance Selection and Feature Construction: Survey and Extensions to Time-Series

Low-Feature Extraction for Multi-label Patterns Analyzing in Complex Time Series Mining

A Novel Clustering-Based 1-NN Classification of Time Series Based on MDL Principle

Keywords

1 Introduction

Time series data can be found in many areas of our life such as finance, economy, or medicine. Mining data in these areas includes many different tasks such as motif discovery, anomaly detection, rule extraction, clustering, or classification. In which, classification is definitely important as it is the background of pattern recognition. For example, it can be applied in signature verification, handwriting recognition, analysis of brainwaves, or electrocardiograph signals. One of challenging properties of time series data is its high dimensionality. Some classification models such as 1-Nearest Neighbor (1-NN), Artificial Neural Network, or Bayesian Network have been applied successfully in time series. Among them, the simple approach 1-NN has been considered as very hard to be beaten [1, 3]. Nevertheless, time and space efficiency of 1-NN is still need to be improved. To this end, one can apply these directions:

1.
Using some indexing structures such as R-Tree [17], M-Tree [16], TS-Tree [18].
2.
Reducing dimensions of time series [19, 20].
3.
Speeding up computational time of distance measure [21, 22].
4.
Reducing the number of instances in the training set [1, 2].

Following the fourth approach, Xi et al. [1] proposed Naïve Rank Reduction and Buza et al. [2] proposed Instance Selection based on Graph-coverage and Hubness for Time series (INSIGHT). Although these methods can help reduce instances, they have three drawbacks: (i) do not preserve the training set’s distribution, (ii) may produce an imbalance training set, and (iii) must be provided the reduced training set size.

The contribution of our work can be summarized as follows. Firstly, we propose a new approach to instance reduction in time series classification using K-means. Secondly, our method can decide whenever to stop eliminating instances from the training set, in the meaning, if we continue to remove more instance, it would lead to poor results. The experiments were conducted over popular time series datasets from the UCR Classification Archive [7]. Experimental results show that the proposed method outperforms Naïve Rank Reduction and INSIGHT in most cases.

2 Backgrounds and Related Works

2.1 Time Series

A time series T is a sequence of real numbers collected at regular intervals over a period of time: T = t ₁, t ₂,…, t _n. It can be considered as a n-dimensional object in a metric space.

2.2 1-Nearest Neighbor

The 1-Nearest Neighbor (1-NN) assigns a new data object to the same class of its nearest object in the training set. 1-NN has been considered hard to be beaten in classification of time series data among other methods such as Artificial Neural Network, Bayesian Network [1, 3]. In this work, we use 1-NN to classify time series data and evaluate the error rate of this classifier on the reduced training set obtained by our instance reduction method.

2.3 Instance Reduction

The main goal of instance reduction is: (i) to improve the running time of classification algorithms while losing accuracy as little as possible and (ii) to fit data, especially training data, into limited memory devices. Many instance reduction methods were proposed to speed up nearest neighbor classification of conventional data such as in [8–15]. However, it is hard to find an efficient classification method for time series data because of its high dimensionality characteristic. So far, there has been only two works related to instance reduction for time series data as above mentioned.

2.4 Naïve Rank Reduction

Xi et al. in 2006 [1] proposed a method called Naïve Rank Reduction for time series data which uses a ranking function developed from the ideas in the most referenced paper by Wilson and Martinez (1997) [6]. Naïve Rank Reduction method first assigns rank to each instance x in the training set by identifying its contribution to the classification of other instances. The instances are ranked by the following formula:

$$ rank(x) = \sum\limits_{j} {\left\{ {\begin{array}{*{20}l} 1 \hfill & {{\text{if }}class(x) = class(x_{j} )} \hfill \\ { - 2} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.} $$

(1)

where x _j is the instance having x as its nearest neighbor. This ranking function attempts to keep the instances that contribute to correctly classifying other instances. The ranking function presented in Eq. (1) may produce the same rank for some instances. In order to break this tie, the authors used a second ranking function as presented in Eq. (2).

$$ priority(x) = \sum\limits_{j} {\frac{1}{{d(x,x_{j} )^{2} }}} $$

(2)

where x _j is the instance having x as its nearest neighbor and d(x, x _j) is the distance between instances x and x _j. The idea behind Eq. (2) is that the farther the instance from its nearest neighbor, the lower priority it should have. Because this instance may be noisy or unrepresentative instance, so it should be eliminated first.

2.5 INSIGHT Method for Instances Selection

Buza et al. in 2011 [2] proposed a method called Instance Selection based on Graph-coverage and Hubness for Time series (INSIGHT), which ranks the time series in the training set by using one of their three score functions. These ranking functions attempt to exploit the hubness property of time series [2]. This property states that for data with high dimensionality, some objects tend to become nearest neighbors much more frequently than others. The term hubness is a phenomenon that the distribution of f ^k_N (x) significantly skewed to the right, where f ^k_N (x) denote the k-occurrence of an instance x in a training set D, which is the number of instances in D having x as their k nearest neighbor. The score functions of INSIGHT were defined as follows:

$ f_{G} (x) = f_{G}^{1} (x) $: Good 1-occurrence score.
$ f_{R} (x) = f_{G}^{1} (x)/\left( {f_{N}^{1} (x) + 1} \right) $: Relative score.
$ f_{Xi} (x) = f_{G}^{1} (x) - 2f_{B}^{1} (x) $: Xi’s score (base on ranking criterion of Xi et al. [1]).

In which, f ¹_G (x) is the number of instances in the training set having x as their 1-nearest neighbor and x has the same label with them. $ f_{B}^{1} (x) $ is the number of instances in the training set having x as their 1-nearest neighbor and label of x is different from their label.

2.6 Notes on Naïve Rank Reduction and INSIGHT

Naïve Rank Reduction was used in a method called Fast Time Series Classification Using Numerosity Reduction (FastAWARD) proposed by Xi et al. [1]. This method iteratively removes time series in a training set. In each iteration, ranks of all the time series in the training set is calculated and the lowest ranked time series is discarded. Thus, each iteration kept a particular number time series. Xi et al. claim that the best value of Sakoe-Chiba band [4, 5] for DTW distance depends on the number of time series in the reduced training set. Therefore, they include a final step in order to calculate the warping window size for each training set size. The INSIGHT method just ranks instances in the training set and reduces the training set only once. This method does not include the second step to find the value for Sakoe-Chiba band r as in FastAWARD. Instead, the warping-window size is set to 5 % of the time series length.

Our proposed method is different from the FastAWARD and INSIGHT in that we do not use a ranking function. Moreover, our method does not focus on finding Sakoe-Chiba band r since this additional step is not important but computationally expensive. In addition, our method can easily include this step. All things considered, we only compare our method with the Naïve Rank Reduction and INSIGHT.

2.7 Weaknesses of Naïve Rank Reduction and INSIGHT

The main idea of Naïve Rank Reduction and INSIGHT bases on ranking functions, which attempt to exploit the original training set to discover how much the instances impact in the classification process. Then, the instances are arranged from least to most impact. The instances contributing most to the classifier are selected first. This idea has three drawbacks as follows:

It does not maintain the representative characteristics of the training set.
It may lead to imbalance in the number of instances of each class in the reduced training set.
It is parametric which means that we must provide the reduced training set size.

With regards to the first drawback, because the instances that have most impact in the classification process are kept, the training set’s distribution does not maintain as original. This means that representative characteristics of original training set are not preserved in reduced training set. As shown in Fig. 1, the original training set and the reduced training set have different distribution. Therefore, a new instance (square) in real world might be wrongly classified because the instances in the original training set that look like this new instance were removed.

For the second drawback, the ranking function may bias some classes while reducing instances. As a result, some classes may have much more instances than the others. We illustrate this issue in Fig. 2. The original training set has three classes; each contains 7 instances (circle). Meanwhile, the reduced training set is unbalanced, which has three instances for Class 1, two instance for Class 2, and six instances for Class 3. As a result, the new instance (square), which is classified as Class 2 with the original training set, will be classified as Class 3 with the reduced training set.

3 Proposed Method

3.1 The Framework

In order to overcome the drawbacks mentioned in Subsect. 2.7, we propose a framework to reduce training set size while maintaining distribution of each class. The outline of our method is described in Fig. 3. In order to keep p% instances in the training set, we run K-means on each class with the number of cluster K = p x number of instances in the class to get K centroids. Then, the K centroids are used as training instances. All the instances kept in each class are combined together to get the reduced training set.

As we can see from Fig. 3, the framework uses K-means to partition n instances into K sets C = {C ₁, C ₂,…, C _K} so as to minimize the sum of distance of each instance in the cluster to the K centroids as in Eq. (3).

$$ \mathop {\arg \hbox{min} }\limits_{C} \sum\limits_{i = 1}^{K} {\sum\limits_{{x \in C_{i} }} {dist(x,\mu_{i} )} } $$

(3)

where µ _i is the centroids of data instances in C _i. As a result, the instances with similar characteristics are grouped into one cluster and they are all similar to their centroid. Therefore, we can keep the centroids only as they can represent the whole cluster. Figure 4 illustrates effect of our framework, which can help keep representative instances while the number of instances in each class is balance as original. While the component of the framework, K-means, is well-known, we emphasize that the framework as a whole is new for the instance reduction of time series classification.

3.2 Choosing Appropriate Percentage of Instances

One question in instance reduction problem is how many instances should be kept in the training set. In general, the bigger the training set size, the more accurate the classifier. Although accuracy rate should be considered as a major factor in deciding which training set is better, training set size should also be considered as a penalty. If a training set has more instances than the others, we should penalize it more. For example, given a training set with 100 instances and accuracy rate for classification is 0.85. Now we remove this training set to 20 instances with its new accuracy rate for classification is 0.8. The accuracy rate decreases only 5 % but it can help to speedup 5 times. In this situation, it is more likely that we prefer the second choice. Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are the two popular model selections. The ideas behind these two methods are the same except that BIC gives more penalties to the model with more parameters. The formulas of AIC and BIC are as follows:

$$ \begin{aligned} & AIC = - 2\log L + 2k \\ & BIC = - 2\log L + \log (N) \times k \\ \end{aligned} $$

(4)

where k is the number of parameters in the model, logL is a log-likelihood function, and N is the number of observed data. We can understand that AIC and BIC evaluate each model to find the best one bases on two factors: likelihood value and the number of parameters of the model. In other words, they attempt to balance likelihood and model’s size. In this work, we adopt the above two formulas in Eq. (4) to select an appropriate point at which we should not remove anymore instances. Note that we only apply the idea of balancing between likelihood value and the number of parameters of a model. So we emphasize that our method is not AIC or BIC. The formulas are redefined as follows:

$$ \begin{aligned} & f1\_score = 2 \times percentage\_error + 2 \times k \\ & f2\_score = 2 \times percentage\_error + \log (N) \times k \\ \end{aligned} $$

(5)

Where:

percentage_error: the percentage of wrongly classified instances.
k: the number used to penalize reduced training set, which is defined based on the percentage of selected instances as follows:

Percentage
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
k
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
N: total number of instance in original training set.

In general, the percentage of selected instances with the lowest value of these two criterions is the best one. In Eq. (5), we map the percentage of selected instance to the number of parameters. For example, if we select 10 % instances as reduced training set, the value of k is 2. Now why do we need this mapping? The answer is that if we do not map but use the percentage of selected instance, then it will penalize a lot, i.e. the value of k will have a high weight in selecting appropriate training set, and thus the accuracy rate does not play an essential part in deciding the percentage at which we choose to reduce the training set to.

Figure 5 illustrates how to identify the number of instances for reduced training set. In which we run our framework in Subsect. 3.2 with different number of instances, i.e. 10 %, 15 %, 20 %,…, 90 % of original training set. Then, we use these reduced training sets to classify original one (100 % instances) to get the percentage of wrongly classified instances (percentage_error). Finally, Eq. (5) is applied to find out the best point to reduce the training set to.

4 Experiments

This section shows two main experiments. Firstly, we compare accuracy rate of our method with those of Naïve Rank Reduction, and INSIGHT. Accuracy rate is the ratio of the number of correctly classified instances on tested data to the total number of tested instances. In general, the higher accuracy rate, the better method. Secondly, we find the percentage at which we should not remove any more instance. 10-fold cross-validation was applied over 46 datasets from the UCR Time Series Classification Archive [7]. Each datasets has from 2 to 50 classes. The length of each time series in each datasets is from 60 to 1882. Euclidean Distance was used as similarity measurement. The ranking function used in INSIGHT method is Good 1-occurrence score as this is the function used in the experiment of Buza et al. [2].

4.1 Comparing to Previous Methods

The two scatter graphs in Fig. 6 show comparison of the proposed method with those of Naïve Rank Reduction and INSIGHT. Each circle point in the scatter graphs represents a dataset. As we can see from the figures, most circle points deviate on the lower triangle, which means that the proposed method is better than the two previous ones. In detail, the proposed method outperforms INSIGHT in 41 datasets, draws in 1 datasets, and loses in 4 datasets. In comparison to Naïve Rank Reduction, the proposed method outperforms it in 42 datasets, draws in 1 datasets, and loses in only 3 datasets. Moreover, Table 1 shows accuracy rate of the three approaches on each dataset.

Table 1. Accuracy rate ± standard deviation in 46 datasets after reducing to 20 % of original training set

Full size table

4.2 Experiments on Choosing Appropriate Percentage

Figure 7(a) and (b) illustrate f1_score and f2_score respectively on Haptics dataset. As we can see, f1_score selects 70 % as the percentage that we should reduce the training set to and f2_score selects at 45 %. Comparing these points to the figures on Fig. 7(c), which shows the error rates when we use the reduced training sets to classify instances on testing set, these reductions are acceptable because the error rates at these points do not increase too much. In detail, the training set with 70 % chosen by f1_score has the error rate of 0.51. The training set with 45 % chosen by f2_score has the error rate of 0.46. Meanwhile, the training set of 90 % has the error rate of 0.48.

Figure 8(a) and (b) illustrate f1_score and f2_score respectively on Lightning 7 dataset. Both f1_score and f2_score select at 25 % as the percentage that we should reduce the training set to. Compare these points to the corresponding error rates on Fig. 8(c), which use the reduced training sets to classify instances on testing set, these reductions are acceptable because the error rates at these points even decreased. In detail, the training set with 25 % instances has the error rate of 0.1. Meanwhile, the training set with 90 % instances has the error rate of 0.3.

Figure 9(a) and (b) illustrate f1_score and f2_score respectively on Gun Point dataset. f1_score selects at 25 % as the percentage that we should reduce the training set to and f2_score selects at 10 %. Comparing these points to the error rate on Fig. 9(c) when we use the reduced training sets to classify instances on testing set, these reductions are reasonable because the error rate is 0.0 at all the percentage of reduction.

5 Conclusions

Reducing training set size is vitally important in improving speed and space using of 1-NN, an efficient method for time series classification. In this work, our method reduces training set size by using K-means to determine its representative characteristics. Then, only representative instances are kept as training data. In comparison with previous methods, our method has a different viewpoint as it does not use a ranking function, which can lead to some weaknesses mentioned in this paper. In addition, the proposed method can identify an appropriate percentage which we should choose to reduce the training set to. The experimental results show that the proposed method outperforms Naïve Rank Reduction and INSIGHT.

References

Xi, X., Keogh, E., Shelton, C., Wei, L., Ratanamahatana, C.A.: Fast time series classification using numerosity reduction. In: Proceedings of the 23rd International Conference on Machine learning, ICML 2006, pp. 1033–1040 (2006)
Google Scholar
Buza, K., Nanopoulos, A., Schmidt-Thieme, L.: INSIGHT: efficient and effective instance selection for time-series classification. In: Huang, J.Z., Cao, L., Srivastava, J. (eds.) PAKDD 2011. LNCS (LNAI), vol. 6635, pp. 149–160. Springer, Heidelberg (2011). doi:10.1007/978-3-642-20847-8_13
Chapter Google Scholar
Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., Keogh, E.: Querying and mining of time series data: experimental comparison of representations and distance measures. Proc. VLDB Endowment 1(2), 1542–1552 (2008)
Article Google Scholar
Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7(3), 358–386 (2005)
Article Google Scholar
Berndt, D., Clifford, J.: Using dynamic time warping to find patterns in time series. In: Proceedings of AAAI Workshop on Knowledge Discovery in Databases, KDD 1994, Seattle, Washington, USA, pp. 359–370 (1994)
Google Scholar
Wilson, D.R., Martinez, T.R.: Instance pruning techniques. In: Proceedings of ICML 1997, Morgan Kaufmann, pp. 403–411 (1997)
Google Scholar
Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A. and Batista, G.: The UCR Time Series Classification Archive (2015). www.cs.ucr.edu/~eamonn/time_series_data/
Brighton, H., Mellish, C.: Advances in instance selection for instance-based learning algorithms. Data Min. Knowl. Discov. 6, 153–172 (2002)
Article MathSciNet MATH Google Scholar
Garcia, S., Derrac, J., Cano, J.R., Herrera, F.: Prototype selection for nearest neighbor classification taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012)
Article Google Scholar
Jankowski, N., Grochowski, M.: Comparison of instances seletion algorithms I. algorithms survey. In: Rutkowski, L., Siekmann, Jörg, H., Tadeusiewicz, R., Zadeh, Lotfi, A. (eds.) ICAISC 2004. LNCS (LNAI), vol. 3070, pp. 598–603. Springer, Heidelberg (2004). doi:10.1007/978-3-540-24844-6_90
Chapter Google Scholar
Grochowski, M., Jankowski, N.: Comparison of instance selection algorithms II. results and comments. In: Rutkowski, L., Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds.) ICAISC 2004. LNCS (LNAI), vol. 3070, pp. 580–585. Springer, Heidelberg (2004). doi:10.1007/978-3-540-24844-6_87
Chapter Google Scholar
Triguero, I., Derrac, J., Garcia, S., Herrera, F.: A Taxonomy and experimental study on prototype generation for nearest neighbor classification. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 41(1), 86–100 (2012)
Article Google Scholar
Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 2(3), 408–421 (1972)
Article MathSciNet MATH Google Scholar
Wilson, D.R., Martinez, T.R.: Instance pruning techniques. In: Proceedings of ICML1997, Morgan Kaufmann, pp. 403–411 (1997)
Google Scholar
Wilson, D.R., Martinez, T.R.: Reduction techniques for instance-based learning algorithms. Mach. Learn. 38, 257–286 (2000)
Article MATH Google Scholar
Ciaccia, P., Patella, M., Zezula, P.: M-tree: an efficient access method for similarity search in metric spaces. In: VLDB 1997 Proceedings of the 23rd International Conference on Very Large Data Bases, pp. 426–435 (1997)
Google Scholar
Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: SIGMOD 1984 Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 47–57 (1984)
Google Scholar
Assent, I., Krieger, R., Afschari, F., Seidl, T.: The TS-tree: efficient time series search and retrieval. In: Proceedings of the 11th International Conference on Extending database technology: Advances in database technology (EDBT 2008), pp. 252–263 (2008)
Google Scholar
Keogh, E., Chakrabarti, K., Pazzani, P., Mehrotra, S.: Dimensionality reduction for fast similarity search in large time series databases. Knowl. Inf. Syst. 3, 263–286 (2001)
Article MATH Google Scholar
Keogh, E., Pazzani, M.: An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 239–241 (1998)
Google Scholar
Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., Keogh, E.: Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012), pp. 262–270 (2012)
Google Scholar
Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowl. Inf. Syst. 7, 358–386 (2005)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Vo Thanh Vinh, Hien T. Nguyen & Tin T. Tran

Authors

Vo Thanh Vinh
View author publications
You can also search for this author in PubMed Google Scholar
Hien T. Nguyen
View author publications
You can also search for this author in PubMed Google Scholar
Tin T. Tran
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Hien T. Nguyen .

Editor information

Editors and Affiliations

Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Van-Nam Huynh
Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka, Japan
Masahiro Inuiguchi
University of Science, Ho Chi Minh City, Vietnam
Bac Le
Duy Tan University, Da Nang, Vietnam
Bao Nguyen Le
Université de Technologie de Compiègne , Compiègne, France
Thierry Denoeux

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Vinh, V.T., Nguyen, H.T., Tran, T.T. (2016). Instance Reduction for Time Series Classification by Exploiting Representative Characteristics using k-means. In: Huynh, VN., Inuiguchi, M., Le, B., Le, B., Denoeux, T. (eds) Integrated Uncertainty in Knowledge Modelling and Decision Making. IUKM 2016. Lecture Notes in Computer Science(), vol 9978. Springer, Cham. https://doi.org/10.1007/978-3-319-49046-5_19

Download citation

DOI: https://doi.org/10.1007/978-3-319-49046-5_19
Published: 29 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49045-8
Online ISBN: 978-3-319-49046-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics