1 Introduction

Time series is a useful way for representing temporal data in a wide variety of real-world applications. Some applications of time series analysis are: economic indicators analysis, meteorological phenomena analysis, human movement and health monitoring, signal processing, etc. This has lead to numerous studies on data mining over time series.

For dealing with large amounts of time series data, there is a need of developing time series representations that are concise and indexable. This allows one to efficiently implement data mining tasks like clustering, classification, and anomaly detection. For this reason, there are many techniques for transforming a time series into a lower resolution representation. For instance, the most common spectral decomposition technique, Discrete Fourier Transform (DFT), is highly useful for periodic signals and feature-based systems. However, the main disadvantage of DFT over other techniques is its time complexity, that in the best case is \(O(n \log n)\). Alternatively, Piecewise Approximation techniques address the segmentation problem with a time complexity of O(n). These techniques require only one parameter, the number of segments or the sliding window size. Some of the better-known indexable techniques are: Piecewise Linear Approximation (PLA) [3, 12], Piecewise Aggregate Approximation (PAA) [10] and Adaptive Piecewise Constant Approximation (APCA) [2].

In addition to these numeric representation techniques, new approaches based on symbolic representations have taken notoriety in this field. Symbolic representations are used to offer greater legibility of interpretation, simplicity, and efficient implementation. In this group, Symbolic Aggregate Approximation (SAX) [19] deserves special mention, as well as some of its extensions: Differential Evolution SAX for non-normalized time series [7] and SAX for multivariate time series [21]. Other techniques combine SAX with any additional feature of time series, for example the trend-based techniques [6, 20].

All of the aforementioned techniques generate the representation of a time series for a single level of resolution. On the other hand, multi-resolution techniques operate the time series representation using different levels of granularity. The building algorithm of a multi-resolution representation has O(cn) time complexity, where c is a constant of the resolution level. An example of this type of technique is the Discrete Haar Wavelet Transform (DWT) [26]. An advantage of the multi-resolution approach is that it provides us hierarchical access to data by means of resolution levels, for example: a multi-resolution index for binary SAX (iSAX) [23], and an iterative clustering algorithm using Haar Wavelets [17], or Multi-resolution PAA (MPAA) [18].

In this paper, we introduce a new multi-resolution representation based on local trends and mean values of the time series. It becomes a parameter-free technique when we use the maximum level of resolution which will be defined in this work. Unlike DWT and MPAA that only takes into account the average values, our proposed multi-resolution representation also considers the local trends on each level of resolution. We also propose a symbolic representation derived from the numeric representation by applying SAX quantization on the trend and value components, which allows us to provide a lower bounding function for indexing a time series collection. Finally, we propose a multi-resolution discord discovery for anomaly detection based on our proposed time series approximation.

Our proposed multi-resolution representation has several advantages with respect to the state of the art. First, it adds the trend value to the representation, which improves its discriminative power as shown in our experimental evaluation. Second, it provides lower-bounding distances for each resolution level, which can be used to early prune a similarity search thus saving distance computations. Finally, it takes advantage of a symbolic representation that allows us to build a hash table for quickly finding buckets with similar time series, which makes the technique efficient.

The efficacy and efficiency of our approach is experimentally evaluated in two data-mining tasks, classification and anomaly detection, over a diversity of data domains (UCR Time Series Archive [4, 9]). We empirically show that our method outperforms conventional methods.

This paper is an extended version of a previous work presented at the 14th International Work-Conference on Artificial Neural Networks (IWANN’17) [22]. The main differences between this article and the conference paper are:

  • The aim of the conference paper was improving the discord discovery task. This paper focuses on the multi-resolution proposal as a general technique that can be used for different time series mining tasks.

  • We improved the presentation of the state-of-the-art techniques.

  • We added a second data mining task, classification, for evaluating the efficiency and effectiveness of the multi-resolution representation.

  • We added several experimental evaluations using more time series collections.

1.1 Background

Piecewise Approximation Method splits the time series into segments and builds a new time series with the representative values of each segment. This approach is widely used for dimensional reduction of time series. For example, two well-known piecewise approximation techniques are PAA [11] and SAX [16]. Other related methods are Piecewise Approximations based on Trend and Value features, that is, each segment of the time series is represented by both the local trend and a representative value. The trend can be denoted using the derivative estimation [8], the slope [6], or a radio-based function [5]. Figure 1 shows three recent techniques that use this approach, which are listed below.

Fig. 1
figure 1

Three techniques for time series representation based on trend and value: a Piecewise Trend Approximation, b Piecewise Trend-Value Approximation and (c) Extended Trend-Value Approximation

Piecewise Trend Approximation (PTA) [5]: Given a time series \(P = \{ p_1, \cdots , p_n \} \), this algorithm partitions P into \(m < n\) variable length segments and builds the new representation:

$$\begin{aligned} PTA(P, m, \epsilon ) = \{ (r_1, t_{r_1}) , \ldots , (r_m, t_{r_m}) \}, \end{aligned}$$
(1)

where \(t_{r_i}\) is the right endpoint and \(r_i\) is the ratio between both endpoints in the ith segment. This technique adapts the segments to the original shape of the time series, such that two conjunctive segments represent different trends. This adaptation requires tuning the threshold of ratios \(\epsilon \), indicating when two consecutive segments can be aggregated.

Piecewise Trend-Value Approximation (TVA) [6]: It is an extension of SAX by adding new symbols that represent the slope of each segment:

$$\begin{aligned} TVA(P, m) = \{ ({\hat{v}}_1, {\hat{s}}_1) , \ldots , ({\hat{v}}_m, {\hat{s}}_m) \}, \end{aligned}$$
(2)

where \({\hat{v}}_i \in \, \{a= \hbox {low}, \, b= \hbox {normal}, \, c= \hbox {high}\}\) is the SAX symbol and \({\hat{s}}_i \in \, \{U = \hbox {up}, \, D = \hbox {down}, \, S = \hbox {straight}\}\) is the slope symbol. The algorithm uses linear regression to compute the slope. Testing was performed on a particular streaming multivariate time series to assign a label to each multivariate segment using classification tasks.

Extended Trend-Value Approximation (1d-SAX) [20]: It is a compact binary representation to improve the retrieval performance using the same quantity of information as SAX. Unlike TVA, it works with alphabets of different sizes:

$$\begin{aligned} 1d-SAX (P, m, \alpha ^v, \alpha ^s) = \{ ({\hat{v}}_1, {\hat{s}}_1) , \cdots , ({\hat{v}}_m, {\hat{s}}_m) \}, \end{aligned}$$
(3)

where \({\hat{v}}_i\) is the average value symbol from an alphabet of size \(\alpha ^v\) and \({\hat{s}}_i \in \) is the slope symbol from an alphabet of size \(\alpha ^s\). In addition, the authors propose a lookup table with all distances between two different symbols to speed-up the approximate search on large databases. Testing was performed on seven univariate time series collections (UCR Archive) using classification tasks.

2 Multi-resolution Trend-Value Approximation

2.1 Motivation

Why a Trend-Based Representation? Esmael et al. claims that “using only the value approximation, causes a high possibility to miss some important patterns in some time series data. SAX does not pay enough attention to the shapes of the time subsequences and may produce similar strings for completely different time series” [6]. Figure 2 shows an example of this statement. Here, we use an agglomerative hierarchical clustering to group five time series in three different classes. The time series is split into four segments. SAX only takes the mean value while TVA considers the slope obtaining a better match between time series 2 and 3, which belong to the same class.

Fig. 2
figure 2

A comparison of the ability of two time series representations to cluster five members of the CBF dataset [4] using the Euclidean distance

TVA and 1d-SAX are similar techniques that compress each trend-value pair of the time series by a quantization process. In this work, we extend the ability of the local trends to various levels of resolution. While the granularity parameter (number of segments) of the piecewise approximation like TVA and 1d-SAX produces a horizontal segmentation, we propose a hierarchical segmentation induced by the resolution level. We will discuss how segmentation provides greater advantages in design and optimization and use both representations, symbolic and numeric, to evaluate our proposed method. Moreover, the numeric representation serves us to further reduce the reconstruction error and obtain better results in information retrieval tasks.

Our time series representation technique is called Multi-resolution Trend-Value Approximation (MTVA). The technique generates trend-value pairs on each level of resolution of the time series, and then computes the similarity between two MTVA representations using a distance measure. In addition, we design a symbolic representation to index a time series collection. Table 1 summarizes the notation used in this and subsequent sections.

Table 1 Notation used in this paper

2.2 Bottom-Up Construction Algorithm

Given the times series \(P = \{ p_1, \ldots , p_n \} \) and L as the level of resolution defined by the user, the MTVA representation of P is built following the next steps:

  1. 1.

    We start in the last resolution level L dividing the time series into \(M=2^{L-1}\) segments of size \(w=n/M\).

  2. 2.

    Let \(Y=\{y_1, \ldots , y_w\}\) be a segment of P in the time segment \(X=\{x_1, \ldots , x_w\}\). We compute the linear regression on each segment by the function \(lr(x) = ax + b\), where:

    • \(a = \frac{\sum _{i=1}^{w}(x_j - {\bar{X}})*y_j}{\sum _{i=1}^{w}(x_j - {\bar{X}})^2} \)

    • \(b = {\bar{Y}} - a * {\bar{X}}\)

    • \({\bar{X}}\) and \({\bar{Y}}\) are the average value of X and Y, respectively.

    • The trend-value pair (vs) of the segment Y is defined by:

      • \(v = a*\frac{x_1+x_w}{2} + b\) is the mean value.

      • \(s = \arctan (a)\) is the slope,

  3. 3.

    For the next resolution levels \(M = 2^{\{L-2, L-3,\ldots , 0\}}\), compute the trend-value pair (vs) for each segment as follows:

    • \(v = \frac{v_i + v_{i+1}}{2}\)

    • \(s = \arctan \left( \frac{v_{i+1} - v_i}{x_{i+1} - x_i}\right) \).

    • \(v_i\) and \(x_i\) are, respectively, the mean value and the average time associated with a segment in the upper level (see Fig. 3).

  4. 4.

    The MTVA representation is an array of all the trend-value pairs: \(MTVA(P, L) = \{(v_1, s_1), ..., (v_m, s_m)\}\).

Fig. 3
figure 3

Construction of the Multi-resolution Trend-Value Approximation

Figure 3 shows the MTVA representation of the time series P up the third level of resolution \((L=3)\). Parameter L can be automatically computed so that the spatial complexity of the MTVA representation does not exceed the space of the original time series, that is, adjusting the total number of segments \(m \le n/2\). Alternatively, m can be defined in terms of the level of resolution \(m = 2^L - 1\). Solving both equations, we conclude that the level of resolution for P is:

$$\begin{aligned} L_{max} = \lfloor \log _2(n/2) \rfloor + 1. \end{aligned}$$
(4)

2.3 Distances Measures

First, we need cost functions to measure the distance between trend-value pairs. Given two pairs \(p_i\) and \(q_j\), we define three cost functions:

  • \(cost_1(p_i, q_j) = |v_i^p - v_j^q| + |s_i^p - s_j^q| \)

  • \(cost_2(p_i, q_j) = |v_i^p - v_j^q|^2 + |s_i^p - s_j^q|^2 \)

  • \(cost_{dot}(p_i, q_j)= (1 + |v_i^p - v_j^q|)*(1 + |s_i^p - s_j^q|) - 1 \)

For using \(cost_1\) and \(cost_2\), both value-domain and slope-domain should have similar ranges to avoid that the distance is governed by only one of them. The slope ranges are between \(-\frac{\pi }{2}\) and \(+\frac{\pi }{2}\). We therefore normalize the time series by a standard normalization procedure (e.g., Z-distribution). The cost function \(cost_{dot}\) can also be used on time series that need to retain their value-domain.

Second, we adapt two common techniques for measuring the distance between two sets of trend-value pairs:

  • Linear Distance:

    $$\begin{aligned} Dist(P, Q) = \sum _{i=1}^M cost(p_i, q_i). \end{aligned}$$
    (5)
  • Dynamic Time Warping Distance: It is computed building a cumulative cost matrix \( C \in {\mathbb {R}}^{M \times M}\) in the following way:

    $$\begin{aligned} C[0,0]= & {} 0 \quad C[i,0] = \infty \quad C[0,j] = \infty \\ C[i,j]= & {} cost(p_i,q_j) + min \left\{ \,\, \begin{array}{l} C[i-1,j] \\ C[i,j-1] \\ C[i-1,j-1] \end{array} \right. \\&\Rightarrow Dist(P,Q) = C[M, M]. \end{aligned}$$

Finally, our multi-resolution distance \( MDist \) measures the dissimilarity between two MTVA representations executing Dist on all levels of resolution:

$$\begin{aligned} MDist (P, Q)=\sum _{l=1}^{L} Dist(P[a \ldots b], Q[a \ldots b]), \quad \text{ where } a = 2^{l -1} \text{ and } b = 2^l - 1. \end{aligned}$$
(6)

Algorithm 1 shows the pseudocode of \( MDist \). Its time complexity using the Linear Distance corresponds to the sum of the time for each level of resolution:

$$\begin{aligned} T(L) = \sum _{l=1}^{L}M_l = \sum _{l=1}^{L}2^{l-1} = 2^L - 1. \end{aligned}$$
(7)

If we compute the distance in the worst case where L is exactly \(\log _2(n/2)+1\), the time complexity is O(n), where n is the length of the original time series. For the DTW Distance, the time complexity is expressed as follows:

$$\begin{aligned} T(L) = \sum _{l=1}^{L}M_l^2 = \sum _{l=1}^{L}(2^{l-1})^2 = \frac{4^L - 1}{3}, \end{aligned}$$
(8)

where in the worst case it is \(O(n^2)\). Therefore, \( MDist \) in the worst case is theoretically as fast as the classic distances that work over raw time series.

figure a

To reduce the number of distance computations, we design an optimization strategy that applies a threshold th to break out of the loop when the partial distance is greater than th, thus avoiding having to compute the rest of levels (see Algorithm 1, lines 7–9). This threshold allows us to efficiently compute the similarity between two MTVA representations reducing the number of calls to Dist. When the algorithm receives two inputs of different resolution, it takes the maximum resolution level that both have in common (line 1). A practical application of the pruning is in time series classification when we use the Nearest Neighbor Search (see Algorithm 2, line 14). This optimization follows the spirit of the Anticipatory DTW [1], where the authors apply a threshold into the DTW distance to filter out comparisons.

2.4 Symbolic Representation

By using discretization techniques, we transform the numeric representation into a sequence of symbols. This symbolic representation provides us greater ease of interpretation and simplicity to manage time series collections.

Definition 1

According to Lin et al., “Breakpoints are a sorted list of numbers \( \beta = \{ \beta _1, \, \ldots , \beta _{\alpha -1} \}\), such that the area under a N(0, 1) Gaussian curve from \( \beta _i\) to \(\beta _{i+1}= 1/\alpha \) (\(\beta _0\) and \(\beta _{\alpha }\) area defined as \(-\infty \) and \(+\infty \), respectively)” [16]. For example, if \(\alpha = 4\), the breakpoints are \(\{\beta _1 = -\,0.67,\beta _2 = 0, \beta _3= +\,0.67\}\).

2.4.1 Gaussian Assumption

To transform the numeric pair \(p_i=(v_i, s_i)\) to a symbolic pair \({\hat{p}}_i=({\hat{v}}_i, {\hat{s}}_i)\), we quantize both values separately using breakpoints that produce equal-size areas under the Gaussian curve \(N(\mu , \sigma ^2)\) (similar to 1d-SAX, see Fig. 1). Gaussian discretization is feasible for normalized time series, since statistically the mean value and the slope have a Gaussian distribution [19, 20]. As in 1d-SAX, the breakpoints are determined by the curve N(0, 1) for the mean value and \(N(0, \sigma ^2_L)\) for the slope. In this last case, we use the variance \(\sigma ^2_L\) in terms of the level of resolution L because each level of resolution generates different slope distributions (see Fig. 4), unlike the 1d-SAX that uses a slope variance in terms of the size of the segment. Additionally, to apply the linear regression between X and Y, we recommend that both variables have a similar range. If the time series is normalized in N(0, 1), the temporal component X must fit in this interval size. In this work, we normalize the length of each segment \(X=[1, w] \rightarrow X=[-1, 1]\). Thus, the variance \(\sigma ^2_L\) is only in terms of the level of resolution and independent of the size of the time series.

Fig. 4
figure 4

Density of the slope varying the level of resolution in ECG time series

2.4.2 Alphabet Size

The alphabet size is delimited by the number of breakpoints (Definition 1) and strongly influences the compression ratio and the reconstruction error. To quantize the trend-value pair, we need two alphabets with size \(\alpha _v\) and \(\alpha _s\) for the mean value and the slope respectively. We use binary symbols where \(\alpha \) is a power of two [23]. For example, to compress the numeric MTVA representation up the level 3 using \(\alpha _v=4\) and \(\alpha _s=4\), we need \((2 + 2) * (2^3 - 1)\) bits, which is less than 4 bytes for time series. The symbolic MTVA representation is useful for different applications like indexing and anomaly detection.

2.5 Indexing

We propose a simple bucket index to efficiently manage MTVA time series datasets (see Fig. 5). We use the symbolic representation to build a hash table, where each bucket \({\hat{P}}\) envelops a set of similar MTVA time series. Moreover, we design a lower bounding function called \(LB\_ MDist \) to measure the distance between the query object Q and a bucket \({\hat{P}}\), so that it is lower or equal than the real distance between Q and any object \(P \in {\hat{P}}\).

Fig. 5
figure 5

Index model for the MTVA representation

Before defining \( LB\_MDist \), we first need to define the lower bounding function of the trend-value cost, which is denoted as follows:

$$\begin{aligned} LB\_cost({\hat{p}}_i, q_i) \le cost(p_i, q_i). \end{aligned}$$
(9)

The symbol \(\hat{p_i}\) derives from a trend-value pair \(p_i\) that is located between two breakpoints \( \beta _{Ui} < p_i \le \beta _{Li}\), independently for each pair value (see Fig. 6). Either of these equations then computes \(LB\_cost\):

  • \( LB\_cost_1({\hat{p}}_i, q_i)= {\varDelta }v + {\varDelta }s \)

  • \( LB\_cost_2({\hat{p}}_i, q_i)= ({\varDelta }v)^2 + ({\varDelta }s)^2 \)

  • \( LB\_cost_{dot}({\hat{p}}_i, q_i)= (1+{\varDelta }v) *(1 + {\varDelta }s) - 1 \)

where

Fig. 6
figure 6

Lower bounding trend-value cost, the blue line represents a trend-value pair stored in our database and the green line is the query. (Color figure online)

Equation 10 describes the lower bounding function for the multi-resolution linear distance:

$$\begin{aligned} { LB }\_{ MDist }({\hat{P}}, Q)=\sum _{l=1}^ IL \sum _{i=2^{(l -1)}}^{2^l - 1} LB\_cost({\hat{p}}_i, q_i). \end{aligned}$$
(10)

The Indexing Level \(IL \le L\) is the level of resolution with which we build the hash table. \( IL \) determines the amount of bits for the index structure.

For similarity searches, on the one hand, we can use \(LB\_ MDist \) to find approximated buckets in which the objects are most similar to a query object Q. On the other hand, we can use \(LB\_ MDist \) to efficiently search the nearest neighbor of Q (see Algorithm 2). We end up with the same result as applying a linear scan over the numeric MTVA time series. This algorithm orders the buckets in such a way that those closest to Q, in terms of \(LB\_ MDist \), are visited first (lines 2–5). Afterwards, we refine the candidates objects, applying a post-processing search with the anticipatory pruning distance MDist (lines 12–19) that receives as threshold the best match distance so far. The algorithm finishes immediately when the current visited bucket has a lower bounding distance greater than the best match distance (lines 9–11).

figure b

3 Multi-resolution Discord Discovery

Keogh et al. [14] introduced an unsupervised method—called discord discovery—for finding the most unusual subsequence (the one with the largest distance from its nearest non-self match) in a streaming time series. The basic algorithm has a computational complexity of \(O(N^2)\), where N is the total number of subsequences. Therefore, the main challenge of the discord discovery techniques is to face this quadratic complexity.

In this sense, our MTVA representation together with the proposed efficient search techniques and the discord discovery heuristics can be used to solve the anomaly detection on normalized subsequences. This is called HOT MTVA. However, we observed that the index method designed above uses only one hash table for the whole SAX space (similar to HOT SAX), and this has a disadvantage: the size of the hash table grows considerably when using a higher resolution level. This may result in a high searching cost, which could be as expensive as applying a linear scan over the data. Therefore, we propose a multi-resolution method which increases the level resolution when a hash-bucket is overflowed (see Fig. 7). Such an indexing structure is the perfect fit for our MTVA representation. Moreover, it allows one to control the level of resolution of the detected anomaly.

Fig. 7
figure 7

Multi-resolution Index Model for the MTVA representation

3.1 Building Algorithm

Let \(P = \{ p_1, \ldots , p_n \} \) be a time series, and let w be the size of the sliding window for extracting all subsequences. Algorithm 3 describes the insertion procedure of a MTVA subsequence \(C_p\) into the indexing structure \(\mathbb {R}\). Unlike HOT iSAX, we apply hierarchical quantization (line 2) to access the hash tables (where each slot is a node) from the lowest resolution to the maximum resolution. If a terminal node is full, we re-insert all its associated objects into a hash table of higher level to provide additional differentiation (lines 10–16), so we create new nodes with the next resolution level of the current node (lines 20–22). We use a size threshold \(th_{max}\) to control the maximum number of objects in a terminal node (the so-called bucket). It follows that the indexing level has dynamic behavior, as its incremental value depends on the size of the dataset and the maximum level of resolution (\(L_{max}\)).

figure c

3.2 Discord Discovery Heuristics

The discordant subsequence is found by applying the optimal discord discovery procedure [14] using the following heuristics:

Outer Loop Heuristic We first visit all subsequences belonging to the bucket that contains the minimum number of subsequences starting from the lowest resolution level (\(L=1\)). Afterwards, we visit the rest of buckets in random order. This heuristic ensures that the subsequences that are most isolated, at each resolution level, will be visited at the beginning of the search as potential candidates.

Inner Loop Heuristic We then use an inner loop to search the best non-self match of each selected candidate \(C_q\). We first visit all subsequences contained in the bucket from which \(C_q\) is retrieved. Afterwards, we apply the nearest non-self match search algorithm (NNM-Search) to visit the rest of the buckets. This heuristic allows us to first visit all the subsequences most similar to \(C_q\), increasing the probability of an early termination of the loop.

The NNM-Search algorithm (see Algorithm 4) performs a hierarchical search across the internal nodes using a stack to maintain the nodes ordered by MINDIST (lines 10–17). This MINDIST is a lower bounding function of the MTVA distance, which measures the minimum distance between the query and the current node. MINDIST is calculated as:

$$\begin{aligned} MINDIST ({\hat{P}}, Q, l, \alpha )= \alpha + \sum _{i=2^{(l -1)}}^{2^l - 1} LB\_cost({\hat{p}}_i, q_i), \end{aligned}$$
(11)

where l is the current resolution level and \(\alpha \) is the accumulated distance of the previous levels. The algorithm also applies two breaking statements to break the inner loop as early as possible: one corresponds to MINDIST (line 6) and the other one to the best-so-far discord distance (line 27).

figure d

4 Experimental Results

In this section, we evaluate the performance of our approach for classification and anomaly detection in different datasets. An Intel Core i7 3.4GHz with 8GB RAM is used for conducting all our experiments. All algorithms are implemented in C++.

4.1 Classification

For this experimental evaluation, we use 44 univariate time series collections from the UCR Archive (see Table 2). We first evaluate the quality of our proposed representation with other state-of-the-art techniques for representing time series. We next compare our approach with two classic distances that work over the raw representation. The nearest neighbor search (Algorithm 2) gives us the classification results.

Table 2 UCR Time Series Archive: data collections used for our assessments

4.1.1 Trend-Value Approximation Performance

We evaluate the performance of the numeric representations based on trends and mean values (TVA and MTVA) with respect to the representation based only on average values (PAA). To get the best performance, we use the best parameter tuning in each time series collection. In this sense, PAA and TVA require tuning the total number of segments in terms of the length of the time series \(m = factor * n\), where the best value for \( factor \) is experimentally selected from the set \(\{0.1, 0.15, 0.2, \ldots , 0.5\}\). MTVA requires tuning the level of resolution L from the set \(\{2, 3, \ldots , L_{max}\}\). Table 3 shows the results obtained by the linear distance. Here, we calculate the average classification error, the win-tie-lose results of the numeric TVA and MTVA representations compared to PAA representation, and the p value (Wilcoxon signed ranks test [25], confidence level = 0.95). We mark a tie between two classification results when their error difference is lower than 0.001. We observe that our multi-resolution trend-value representation performs better than the PAA and TVA representations. Moreover, our technique required less number of segments than PAA, although we need twice the memory space for the numeric trend-value pairs.

Table 3 Classification error using the best configuration of three numeric representations

Furthermore, we compare our multi-resolution representation with other state-of-the-art techniques (see Fig. 8). First, the numeric MTVA is compared with the Multi-resolution PAA (MPAA [18]) which is similar to the Discrete Haar Wavelet Transform (DWT [11]). Second, the symbolic MTVA is compared with other symbolic representations like SAX and 1d-SAX. For the latter, we use the same quantity of information for generating the same compression ratio in each dataset (Table 4). As the breakpoints depend on the slope distribution \(N(0, \sigma ^2)\), we empirically set the slope variance as \(\sigma ^2=\frac{2}{\sqrt{m}}\) and \(\sigma ^2=\frac{2}{L}\) for the 1d-SAX and the MTVA respectively, which obtained the best trade-off in most datasets. After setting the parameters, we compute the classification error using the Euclidean distance (\(cost_2\) for the MTVA) over eight time series collections that have at least eight levels of resolution. We observe that MTVA reaches the smallest error sooner than MPAA. Moreover, the error obtained by both MTVA and 1d-SAX is stabilized from level \(L \ge 4\). However, the error obtained by SAX shows a monotonically decreasing trend while the size of word gets bigger, and it obtains better performance than our method from level \(L \ge 7\).

An important characteristic of our multi-resolution representation is that the segments always have the same size in each resolution level defined by L. In this way, for each time series length, we can generate all trend-value pairs of the maximum resolution level and store them in secondary memory. Afterwards, we just take out those that are requested by the user. This characteristic provides us with greater flexibility for testing our index model with different index levels without the need to recalculate the MTVA representation. In contrast, the segments generated by the piecewise approximations (like PAA, TVA, SAX and 1d-SAX) are of different sizes in each resolution level defined by m. Thus, they always need to recalculate the representation when one varies the value of m.

4.1.2 MTVA Distance Performance

Table 5 shows the results obtained by our multi-resolution distance MDist, using the linear distance and the dynamic time warping as base distance. For both cases, we calculate the average classification error, the win-tie-lose results of the numeric MTVA approximation compared with the raw representation, and the p value. Moreover, we performed two tests: a first test using the maximum level of resolution for each dataset (Eq. 4), and a second test using the best level of resolution for each dataset.

We obtain a smaller classification error when we use MTVA, mainly in two of the proposed trend-value costs: \(cost_1\) and \(cost_{dot}\). Using linear distance, we note that the obtained decrease in classification error is statistically significant (p value < 0.05). Using the DTW distance, our multi-resolution representation has a non-significant difference regarding the raw representation (p value > 0.05). However, we observed that MTVA was particularly effective in half of the datasets, which means that our approach performs better than the classic distance depending on the data collection.

Fig. 8
figure 8

Comparison of our multi-resolution representation with other state-of-the-art techniques

Table 4 Equalizing the quantitative information
Table 5 Classification error using raw representation and MTVA representation

As noted above, the efficiency is the most remarkable characteristic of using time series representation. To show the computational advantage of our approach, we select the largest dataset that contains time series of length 750, where the maximum level of resolution is \(L_{max}=8\). Figure 9 shows the improved efficiency of MDist (using the linear distance) compared with the classic Euclidean Distance in terms of computed cost functions increasing the level of resolution. We first perform a linear scan using the MDist according to Eq. 6. Afterwards, we apply the anticipatory pruning strategy (Algorithm 1). We note the wide lead of using the anticipatory pruning, which allows us to save about one order of magnitude in computed cost functions in the best case (\(L=8\)). This efficiency advantage is due to multi-resolution properties described in Sect. 2.3. We finally evaluate our simple bucket index, where the search algorithm also employs the pruning property (Algorithm 2). We note that our indexing method outperforms the linear scan up to two orders of magnitude, thus being the most efficient option for retrieving TVA time series.

Fig. 9
figure 9

Efficiency improvement by the MTVA Distance regarding the classic ED that uses raw representation

Table 6 Datasets for discord discovery evaluation

We conclude with an important fact about our multi-resolution representation: the best level of resolution was obtained in the interval \(L=\{4,5,6\}\) for 38 of 44 time series collections. This will allow us to limit the range of possible values for L in new time series collections.

4.2 Anomaly Detection

In this experiment, we address the anomaly detection problem using the discord discovery approach together with our proposed methods. Effectiveness will be evaluated over a set of real cases of anomalous time series collected by Keogh et al. [13] detailed in Table 6. An expert in the field (application domain) annotated the anomalous region. Also, the window size is different for each time series. To evaluate the efficiency of our approach on static time series, we use the datasets ECG, EEG, ERP, Koski, Random Walk and Packet from the UCR Time Series Archive [4]. We also use the “Time Series for Weather Data” from the National Oceanic and Atmospheric Administration in the USA (available at www.esrl.noaa.gov/psd/boulder/). For each dataset, we randomly extract time series of lengths 1k, 2k, 4k, 8k, 16k and 32k. We empirically set the maximum number of elements in a bucket as \(th_{max}=50\).

We first evaluate the accuracy of the two trend-value numeric representations, TVA and MTVA, over all anomaly cases. An important feature of the trend-value representations is that the slope of the noisy segments tends to zero and thereby the unusualness of noisy subsequences is reduced. The classic techniques use the Euclidean Distance as measure distance over the raw representation of the normalized subsequence. For TVA and MTVA representations, we use the linear distance with the cost function \(cost_2\). We use the same number of inputs in both methods and three different values of dimensional reduction. Table 7 shows that trend-value representations achieve a higher percentage of true detections (\(ACC \ge 0.5\)) than using the raw representation. In this way, we assert that our method is more robust to local noise. Furthermore, we can improve this percentage up to 100% of true detections finding the best value for \(\varepsilon \) for each of the time series. Additionally, we also note that TVA outperforms MTVA, but nevertheless we highlight the flexibility of MTVA for dynamically working in different levels of resolution at runtime.

Table 7 Percentage of true detections for trend-value representations

Secondly, we accelerate the search using our indexing structures and compare it with the main state-of-the-art techniques like HOT_SAX [13] and HOT_iSAX [15]. We set the same quantitative information for each technique (see Table 4). We empirically set the indexing level as \( IL =3\), so the word size is set as \(m = 2^{IL} - 1\). To measure the efficiency of the algorithms, we consider the number of computed distances (see Fig. 10). We observe that our MTVA techniques are much more efficient than HOT_iSAX in terms of computed distances. Moreover, we achieved a better performance in CPU runtime due to the dimensional reduction and the anticipatory pruning distance. Finally, Fig. 11 shows experimentally the benefits of using a multi-resolution index for MTVA representation instead of a simple bucket index. The multi-resolution index significantly reduces the number of buckets and computed MINDIST for the same resolution level.

Fig. 10
figure 10

Efficiency improvement of our multi-resolution method in Anomaly Detection

Clearly, discord discovery using our multi-resolution trend-value representation returns better results. Our approach also provides flexibility with finding anomalies at different levels of granularity.

Fig. 11
figure 11

Structural comparison of two indexing methods for MTVA representation

Table 8 Parameters to be tuned

5 Conclusions

We proposed a multi-resolution time series representation (MTVA) which is composed of trend-value pairs obtained by applying regression linear in each resolution segment. Our parameter-free technique models the full resolution of time series, keeping the same spatial complexity of the raw representation. We also provided a (dis)similarity measure and its lower bounding function to perform efficient searches.

We demonstrated the utility of our proposed method in Classification Accuracy improving the performance of the classic techniques based on raw representation. Also, our numeric representation was more effective than the classic approximations: PAA, TVA and Multi-resolution PAA. We performed a comparison of three symbolic data models using the same quantitative information, where our symbolic MTVA slightly outperformed SAX and 1d-SAX. The efficiency of our method was tested in terms of computed cost functions, where our MTVA distance was two orders of magnitude faster than the classic Euclidean Distance.

MTVA also was evaluated in Anomaly Detection, where we highlighted the slope feature for mitigating the false unusualness of noisy subsequences. The efficiency of our multi-resolution discord discovery algorithm outperformed HOT_iSAX in terms of computed distances. One additional advantage of our multi-resolution representation is that the level of resolution was more intuitive and easier to fine-tune than the number of segments in each piecewise approximation technique.

One disadvantage of trend-value approximations is that they require twice the space per segment. Adding a parameter to represent the trend of the time series, it runs the risk of subtracting simplicity to our concise data model if it is compared with the SAX technique. Table 8 shows a summary of the parameters that need be tuned by each technique. Nevertheless, MTVA can be used as baseline for finding anomalies in different levels of granularity.