1 Introduction

Anomaly detection techniques aim to find patterns that do not conform to expected behavior in the data set (Chandola et al. 2009; Huang 2013). These patterns are often called anomalies, outliers, abnormal changes, surprises or discords in different contexts, frequently arising in real-world applications such as bioinformatics and finance (Huang 2013; Chandola et al. 2008a, b; Keogh et al. 2005). In this paper we present a new anomaly detection method called weighted local outlier factor (WLOF), which is able to extract and weight features in time series.

In the past decades, many anomaly detection methods have been developed for specific application domains. They can be broadly divided into two categories (Beigi et al. 2011): modeling approaches (including rule-based, pattern-matching and model-based approaches), which require prior knowledge of the application domain, and data mining approaches (including similarity-based and statistical approaches), which do not. Hadi (1994) used a modeling approach based on statistical estimation of distribution parameters to identify anomalies in multivariate samples. Tandon and Chan (2007) used a parametric statistical modeling approach based on association rule mining for network intrusion detection. Keogh et al. (2005) used distance-based approaches to identify anomalies in time series. Sun et al. (2005) proposed an algorithm that computes the neighbourhood of each node in bipartite graphs using random walk with restarts and graph partitioning, and then uses the neighbourhood information to identify abnormal nodes. Some researchers have combined modeling and data-mining approaches to identify anomalies in data streams. For example, Chandola et al. (2008a, b) proposed a framework for modeling categorical data with a desired set of characteristics and a set of separability statistics, which are helpful for understanding the performance of similarity measures for outlier detection. In addition, Aydin et al. (2015) proposed a modified kernel-based tracking method for detecting anomalies in railway traffic, and Jin et al. (2016) proposed a method for detecting bearing anomalies and fault prognosis using the Kalman filter. Moreover, several surveys on outlier detection in different application areas have been reported in the literature (Hodge and Austin 2004; Zhang et al. 2008; Gupta et al. 2014).

The nature of the anomalies determines which anomaly detection techniques should be applied. Following Chandola et al. (2009), anomalies can be grouped into three categories. (1) Point anomalies: a data instance is anomalous with respect to the rest of the data, as in the case of credit card fraud. (2) Contextual anomalies: a data instance is anomalous in a specific context, but not otherwise. Contextual anomalies have been investigated in time series data (Weigend et al. 1995) and spatial data (Kou et al. 2006). (3) Collective anomalies: a collection of data instances is anomalous with respect to the entire data set. Collective anomalies can be found, for example, in electrocardiogram data (Keogh et al. 2005).

In this paper we focus on collective anomalies in different types of sequential data. In order to find collective anomalies, we need to segment a time series into a set of sub-series, i.e. subsequences. Piecewise linear representation (PLR) (Keogh et al. 2001; Yankov et al. 2007; Keogh et al. 2008) is a common feature representation method that has been used to capture the main features of time series data or data streams. The main idea of PLR is to use K connected straight lines to represent a time series of length n (K ≪ n). The advantages of PLR are: (1) a low-dimensional index structure and (2) high computational efficiency (Keogh et al. 2001; Yan et al. 2013). PLR can achieve higher precision with a larger number of segments, but at the cost of more computation time. Keogh et al. (2001, 2008) also proposed a piecewise aggregate approximation (PAA) method for dimensionality reduction in time series data (Palpanas et al. 2004), which segments a time series with a fixed-size window and uses the average value of each sub-segment to represent it. Park et al. (2001a, b) used a monotonic sliding-window segmentation algorithm to represent a time series and demonstrated good results on smooth time series data; however, real-world data often include a great deal of noise, and the number of segments required is then very large. Peng et al. (2000) used the landmark model to segment a time series by selecting segment points according to a minimum distance/percentage principle, a smoothing process implemented as a linear-time algorithm. Pratt and Fink (2002) proposed an important point segmentation method that compresses a time series by selecting some of its minima and maxima. In this paper we adopt a piecewise linear representation method based on important points (PLR_IP).
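To make the PAA idea above concrete, the following minimal Python sketch (our own illustration, not the paper's code; it assumes the series length is a multiple of the number of segments) represents a series by per-window averages:

```python
import numpy as np

def paa(series: np.ndarray, n_segments: int) -> np.ndarray:
    """Piecewise aggregate approximation: the mean of each of
    n_segments equal-width windows (length must divide evenly)."""
    return series.reshape(n_segments, -1).mean(axis=1)

# Example: a 500-point series reduced to 50 averaged segments.
x = np.sin(np.linspace(0, 20 * np.pi, 500))
print(paa(x, 50).shape)  # (50,)
```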

Given a new representation of time series data, we also need a method for measuring the difference between data objects (instances) embedded in subsequences in order to detect collective anomalies. A PLR method can thus be used to segment a time series into an alternative representation, and the distances of objects within their neighbourhood can be used to find anomalies. For instance, Ramaswamy et al. (2000) used the distance to the k-nearest neighbourhood to rank outliers; their approach can compute the top n outliers. Breunig et al. (2000) used a local outlier factor (LOF), whose value depends on how isolated an object is with respect to its surrounding neighbourhood, as a measure for determining outliers. Although that approach can find meaningful outliers, the LOF method has two issues. One is that it does not work well when features have different orders of magnitude: the features with large magnitudes determine the results, whereas the features with smaller magnitudes have little effect. The other is that the LOF method recognizes anomalies in time series data based on their original values (Breunig et al. 2000), but it cannot do so when anomalies are interleaved in regular frequency spectra or take other complex forms.

In order to address these two issues, we propose the WLOF, in which all selected features are taken into account in detecting anomalies. Importantly, we construct four features to represent time series data, three of which are defined on the basis of the PLR_IP, each representing a different aspect of a time series. The first feature is the average of the data points in the subsequence corresponding to a sliding window. The second and third features are the number of important points and the maximum angle of the subsequence, respectively, which are designed mainly for finding anomalies in regular spectra. The fourth feature is inspired by Lin et al. (2003), who used the symbolic aggregate approximation (SAX) method to map a time series into a character string such as “cbccbaab”, with every character of the alphabet representing the feature of one segment (Keogh et al. 2006). Similarly, to represent a segment with a feature, we propose a new feature based on the differences between the values of important points in a subsequence: we compute the maximum difference between important points in a sliding window, which may cover several segments. This feature represents the maximum change over all the segments involved in a sliding window. Together, these features constitute the core of the WLOF method for finding anomalies in time series data.

After presenting the WLOF method in detail, we present experimental results to evaluate it. The experiments were carried out over 17 benchmark datasets, with comparative analysis against other approaches, demonstrating the effectiveness of the proposed WLOF method in discovering anomalies within time series data.

The paper is organized as follows. In Sect. 2, we introduce the concepts of PLR_IP and WLOF. In Sect. 3 we present the experimental results over 17 data sets, which show that the proposed method can find local outliers. In Sect. 4 we discuss the effect of different parameters. Finally, Sect. 5 presents conclusions and future work.

2 Methodology

2.1 Notation

2.1.1 Time series and subsequences

Time series or sequential data exist in many real-world domains such as commercial, economic, medical, and gene expression data. These domains typically involve large amounts of regularly updated data, which makes it very difficult to detect anomalies directly in the original time series. Thus, we separate a time series into a set of relatively short subsequences using a sliding window. First, we give definitions of a time series and its subsequences as follows:

Definition 1

(Time series) A sequence of pairs \( T = [(Z_{1}, t_{1}), (Z_{2}, t_{2}), \ldots, (Z_{n}, t_{n})] \), \( t_{1} < t_{2} < \cdots < t_{n} \), where \( Z_{i} \) is a data point in a d-dimensional data space and \( t_{i} \) is the time stamp at which \( Z_{i} \) occurs (1 ≤ i ≤ n).

Definition 2

(Subsequence, Keogh et al. 2005) Given a time series \( T = [(Z_{1}, t_{1}), (Z_{2}, t_{2}), \ldots, (Z_{n}, t_{n})] \), a subsequence C of T is a sampling of length m ≤ n of contiguous positions starting at p, that is, \( C_{p,m} = [(Z_{p}, t_{p}), \ldots, (Z_{p+m-1}, t_{p+m-1})] \) for 1 ≤ p ≤ n − m + 1. To obtain a set of subsequences \( C_{m} = \{ C_{1}, C_{2}, \ldots, C_{n-m+1} \} \), sliding windows can be used, where each subsequence corresponds to a sliding window and the overlap between two adjacent sliding windows can be adjusted for different applications.
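As a concrete illustration of this construction, the following minimal Python sketch (our own, not the paper's code) extracts all subsequences of length m; the step parameter controls the overlap between adjacent windows:

```python
import numpy as np

def subsequences(z: np.ndarray, m: int, step: int = 1) -> np.ndarray:
    """All windows C_{p,m} of length m; step controls the overlap
    between adjacent windows (step=1 yields n - m + 1 subsequences)."""
    n = len(z)
    return np.stack([z[p:p + m] for p in range(0, n - m + 1, step)])

z = np.arange(10.0)
print(subsequences(z, m=4).shape)  # (7, 4): n - m + 1 = 7 windows
```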

2.1.2 Anomalous features of a subsequence

A subsequence could be anomalous compared with other subsequences, or could contain an anomaly; either case can be characterized by various features of the subsequence, such as its average value and the maximum difference between values of important points. In this study, four such features have been identified. Before defining them, we define extreme points, important points, piecewise linear representation and fitting error.

Definition 3

[Extreme points (Yan et al. 2013)] Given a 1-dimensional time series \( T = [(Z_{1}, t_{1}), (Z_{2}, t_{2}), \ldots, (Z_{n}, t_{n})] \), if (\( Z_{i} > Z_{i-1} \) and \( Z_{i} > Z_{i+1} \)) or (\( Z_{i} < Z_{i-1} \) and \( Z_{i} < Z_{i+1} \)), the point \( (Z_{i}, t_{i}) \) is an extreme point.

Definition 4

(Important points) Extreme points are important features of time series, but sometimes the distance between two neighbouring extreme points is too large, making it difficult to find an anomaly. For this reason, we introduce the concept of important points, consisting of extreme points plus additional points identified by the two-step procedure below. The first step selects extreme points that represent the largest distances to the points already selected, and the second step ensures that the distance between neighbouring important points is not too large.

  • Step 1 Select extreme points as important points. The first and last data points of the subsequence are always selected as important points. Suppose there are L extreme points in \( T = [(Z_{1}, t_{1}), (Z_{2}, t_{2}), \ldots, (Z_{n}, t_{n})] \), where L < n. For a specified number g of required important points and a parameter β ∊ (0, 1), if \( L \ge \left\lfloor \beta (g - 2) \right\rfloor \), then \( \left\lfloor \beta (g - 2) \right\rfloor \) extreme points are selected as important points iteratively as follows. At each iteration, the data point \( (Z_{r}, t_{r}) \) is selected, where r satisfies:

    $$ r = \mathop{\arg\max}\limits_{j \in FI} D[Z_{j}, Z_{i_{j}}] $$
    (1)

    where FI is the set of subscripts of extreme points that have not yet been selected as important points, D is a distance measure, and \( (Z_{i_{j}}, t_{i_{j}}) \) is the currently selected important point nearest to \( (Z_{j}, t_{j}) \). If \( L < \left\lfloor \beta (g - 2) \right\rfloor \), all the extreme points are selected as important points. Note that since we aim to find abnormal changes in the time series, and the change in time t is uniform (i.e. the distance between two adjacent data points in t is the same), we select the important points based on the Z value only.

  • Step 2 Select additional points as important points if necessary. The remaining \( g - 2 - \left\lfloor \beta (g - 2) \right\rfloor \) important points are also selected iteratively as follows. Suppose \( P = [(Z_{i_{1}}, t_{i_{1}}), (Z_{i_{2}}, t_{i_{2}}), \ldots, (Z_{i_{l}}, t_{i_{l}})] \) is the sequence of important points selected so far. At each iteration the data point \( (Z_{h}, t_{h}) \) is selected, where \( t_{h} = \left\lfloor \frac{t_{i_{a}} + t_{i_{a+1}}}{2} \right\rfloor \), \( Z_{h} = Z_{t_{h}} \), and a is obtained as follows:

    $$ a = \mathop{\arg\max}\limits_{1 \le j \le l - 1} D\left[ Z_{i_{j}}, Z_{i_{j+1}} \right] $$
    (2)

    i.e. we identify the largest distance between consecutive currently selected important points. If \( L < \left\lfloor \beta (g - 2) \right\rfloor \), all the extreme points are selected as important points and the remaining g − 2 − L important points are obtained using Formula (2).

Here we give an illustration of important points. Suppose that Fig. 1 shows a sequence of a time series, six important points are required, and \( \beta = \frac{1}{2} \). First of all, the beginning point b1 and end point b2 are selected, as indicated by the yellow circles; then we need to select two extreme points as important points according to Step 1 of Definition 4. First e1 is selected and then e2, according to Formula (1), as indicated by the red circles. We have now selected all the extreme points allowed, the number being \( \left\lfloor \beta (g - 2) \right\rfloor = 2 \) with g = 6 and \( \beta = \frac{1}{2} \), so the remaining extreme points m1, m2 and m3 cannot be selected as important points. In this situation, we need to select two additional points as important points according to Step 2 of Definition 4 to ensure none of the differences are too large. The largest difference in Z values between neighbouring points is between b1 and e1, so a1 is selected as an important point; a2 is then selected according to Formula (2), since after a1 has been added the largest difference in Z values is between a1 and e1. Points a1 and a2 are indicated by the green circles. Since large differences between points affect feature extraction, the six important points identified should be more suitable for this purpose.

Fig. 1 Illustration of important points

Definition 5

Piecewise linear representation (PLR) of time series based on important points (Yan et al. 2013)

Given a time series \( T = [(Z_{1}, t_{1}), (Z_{2}, t_{2}), \ldots, (Z_{n}, t_{n})] \) whose set of important points is \( T' = [(Z'_{1}, t'_{1}), (Z'_{2}, t'_{2}), \ldots, (Z'_{m}, t'_{m})] \), where \( Z'_{1} = Z_{1} \), \( Z'_{m} = Z_{n} \) and m < n, a PLR of T can be obtained by first defining a set of functions \( T_{l} = (f_{1}, f_{2}, \ldots, f_{m-1}) \), where \( f_{j} \) is the linear fitting function between the points \( (Z'_{j}, t'_{j}) \) and \( (Z'_{j+1}, t'_{j+1}) \). The PLR of T is obtained by replacing each point of T with the point given by the corresponding \( f_{j} \) at the same time stamp. The fitting sequence can be expressed as \( T'' = [(Z''_{1}, t_{1}), (Z''_{2}, t_{2}), \ldots, (Z''_{n}, t_{n})] \). In this paper T′ denotes the set of important points and T″ the fitting sequence.

Definition 6

(Fitting error of PLR) Having defined the fitting sequence T″, which has the same length as the original sequence T, the fitting error between the two is defined as follows:

$$ Err = \sqrt {\sum\nolimits_{i = 1}^{n} {\left( {Z_{i} - Z_{i}^{''} } \right)^{2} } } $$
(3)

where n is the length of the original sequence, and \( Z_{i} \) and \( Z''_{i} \) denote the original and fitted values at the same time \( t_{i} \), respectively. A smaller fitting error indicates that the fitting sequence better reflects the original sequence.
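As a minimal sketch of Definitions 5 and 6 (our own illustration; it realizes the linear fitting functions via numpy's piecewise linear interpolation through the important points, and the function names are ours):

```python
import numpy as np

def plr_fit(t, z, t_imp, z_imp):
    """Piecewise linear fitting sequence through the important points
    (t_imp, z_imp), evaluated at every original time stamp t (Definition 5)."""
    return np.interp(t, t_imp, z_imp)

def fitting_error(z, z_fit):
    """Root of the summed squared residuals, Formula (3)."""
    return np.sqrt(np.sum((z - z_fit) ** 2))

t = np.arange(8.0)
z = np.array([0., 1., 0., 2., 1., 3., 0., 1.])
t_imp = np.array([0., 3., 5., 7.])   # important-point time stamps
z_imp = z[[0, 3, 5, 7]]              # important-point values
print(fitting_error(z, plr_fit(t, z, t_imp, z_imp)))
```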

According to Definition 5, we develop a segmentation method called PLR_IP, which uses the important points to segment the time series. We now define the four features that will be used to characterize subsequences (each of which corresponds to a sliding window) for anomaly detection.

Definition 7

The maximum angle of a subsequence

Let \( T' = [(Z'_{1}, t'_{1}), (Z'_{2}, t'_{2}), \ldots, (Z'_{len}, t'_{len})] \) be the important points in a given subsequence, where len is the number of important points; for simplicity we write \( T' = [I_{1}, I_{2}, \ldots, I_{len}] \). Define \( \theta_{i} \) to be the angle between the vectors \( V_{i-1,i} \) and \( V_{i,i+1} \), where \( V_{i-1,i} \) is the vector from \( I_{i-1} \) to \( I_{i} \) and \( V_{i,i+1} \) the vector from \( I_{i} \) to \( I_{i+1} \), for i = 2, 3, …, len − 1. \( \theta_{i} \) is called the degree of anomaly of the ith important point, as shown in Fig. 2. The maximum angle of the subsequence corresponding to a sliding window is denoted \( S_{\theta}^{p} \) and is given by

$$ S_{\theta}^{p} = \max \left\{ \left| \theta_{2} \right|, \left| \theta_{3} \right|, \ldots, \left| \theta_{len-1} \right| \right\}, \quad \theta_{i} \in (-\pi, \pi) $$
(4)
Fig. 2 The angle or degree of anomaly of the important point, where I1, I2 and I3 are important points according to Definition 4

Note that no degree of anomaly is defined for the first and last important points of a subsequence. The angles are determined by the important points alone; the fitted data points do not affect them.

Definition 8

Number of important points in a subsequence

The number of important points in a subsequence, denoted \( S_{N}^{p} \), is defined as

$$ S_{N}^{p} = \left| \left\{ (Z'_{\alpha}, t'_{\alpha}) \in T' : t_{p} \le t'_{\alpha} \le t_{p+m-1} \right\} \right| $$
(5)

where \( T' = [(Z'_{1}, t'_{1}), (Z'_{2}, t'_{2}), \ldots, (Z'_{len}, t'_{len})] \), with \( t'_{1} < t'_{2} < \cdots < t'_{len} \), is the set of important points of the time series T. \( S_{N}^{p} \) is the number of important points in \( C_{p} \) computed by Definition 4.

Definition 9

Average value of a subsequence

The average value of Z over a subsequence, denoted \( S_{\mu}^{p} \), is defined as

$$ S_{\mu }^{p} = \frac{1}{m}\sum\limits_{i = p}^{p + m - 1} {Z_{i} } $$
(6)

where p is the index of the first position of the sliding window and p + m − 1 the index of the last, and the \( Z_{i} \) are the values of the data points in the sliding window \( C_{p} \).

Definition 10

The maximum difference between values of important points in a subsequence

$$ S_{\sigma}^{p} = \max \left\{ h_{2}, h_{3}, \ldots, h_{len} \right\} $$
(7)

where \( h_{i} = \left| Z'_{i} - Z'_{i-1} \right| \) is the difference in Z between \( (Z'_{i}, t'_{i}) \) and \( (Z'_{i-1}, t'_{i-1}) \), and \( T' = [(Z'_{1}, t'_{1}), (Z'_{2}, t'_{2}), \ldots, (Z'_{len}, t'_{len})] \) are the important points in the sliding window.
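Putting Definitions 7–10 together, a minimal Python sketch of the feature computation for one subsequence could look as follows (our own illustration; the signed-angle formula via arctan2 is one way to obtain \( \theta_{i} \in (-\pi, \pi) \), and the function name is ours):

```python
import numpy as np

def window_features(z_win, t_imp, z_imp):
    """The four features of one subsequence (Definitions 7-10);
    (t_imp, z_imp) are the important points inside the window."""
    s_mu = z_win.mean()                                    # Definition 9
    s_n = len(t_imp)                                       # Definition 8
    s_sigma = np.abs(np.diff(z_imp)).max() if s_n >= 2 else 0.0  # Definition 10
    s_theta = 0.0                                          # Definition 7
    for i in range(1, s_n - 1):
        v1 = (t_imp[i] - t_imp[i - 1], z_imp[i] - z_imp[i - 1])
        v2 = (t_imp[i + 1] - t_imp[i], z_imp[i + 1] - z_imp[i])
        # signed angle between v1 and v2 (cross product, dot product)
        ang = np.arctan2(v1[0] * v2[1] - v1[1] * v2[0],
                         v1[0] * v2[0] + v1[1] * v2[1])
        s_theta = max(s_theta, abs(ang))
    return s_theta, s_n, s_mu, s_sigma
```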

2.1.3 A weighted local outlier factor method

Using the features of the time series defined above, we propose a new anomaly detection method called the “weighted local outlier factor”, which assigns a different weight to each feature and then uses the weighted features for anomaly detection. The relevant definitions are given below.

Definition 11

The distance between two subsequences P and Q in the new feature space.

We have defined four features in Definitions 7–10, which give us a four-dimensional feature space in which we can compute the distance between two subsequences. Suppose subsequence P is represented by the point \( (x_{p}, y_{p}, l_{p}, m_{p}) \) and subsequence Q by the point \( (x_{q}, y_{q}, l_{q}, m_{q}) \) in this feature space, where x, y, l, m denote the four features and the number of subsequences n is determined by the size of the sliding window. The weighted Euclidean distance is defined as follows:

$$ {\text{wdist}}\,\left( {P,Q} \right) \equiv \sqrt {w_{1} \left( {x_{p} - x_{q} } \right)^{2} + w_{2} \left( {y_{p} - y_{q} } \right)^{2} + w_{3} \left( {l_{p} - l_{q} } \right)^{2} + w_{4} \left( {m_{p} - m_{q} } \right)^{2} } $$
(8)

where the \( w_{i} \) are weights assigned to the four features, with \( \sum_{i=1}^{4} w_{i} = 1 \). To determine appropriate weights, we use the sum of the values of each feature and ensure that, for a given feature, the larger its sum, the smaller its weight. This prevents a feature with a large sum from determining the result while the other features become irrelevant. One way of achieving this is as follows:

$$ w_{i} = \frac{{\sum\limits_{j = 1}^{4} {Sum_{j} - Sum_{i} } }}{{3\left( {\sum\limits_{j = 1}^{4} {Sum_{j} } } \right)}} $$
(9)

where \( Sum_{1} = \sum_{k=1}^{n} |x_{k}| \) for feature x, and similarly for the other features y, l and m. The idea is that instead of using the normalized sum, i.e. \( w_{i} = \frac{Sum_{i}}{\sum\nolimits_{j=1}^{4} Sum_{j}} \), we use the mean of the normalized sums of the other three features, so that larger sums get smaller weights. An empirical comparison between the weighted local outlier factor and the local outlier factor is presented in Sect. 3.
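A minimal sketch of Formula (9) over an n × 4 feature matrix (our own; the function name is illustrative):

```python
import numpy as np

def feature_weights(features: np.ndarray) -> np.ndarray:
    """Formula (9): a feature with a larger absolute sum receives a
    smaller weight; the four weights sum to 1."""
    sums = np.abs(features).sum(axis=0)        # Sum_i for each feature
    return (sums.sum() - sums) / (3.0 * sums.sum())

F = np.array([[1., 100., 5., 3.],
              [2., 120., 4., 2.]])
w = feature_weights(F)
print(w, w.sum())  # the dominant 2nd feature gets the smallest weight; sum = 1
```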

Definition 12

The k-distance of subsequence object P: kwdist(P) (Breunig et al. 2000)

Here each subsequence is viewed as one object represented by the four features x, y, l, m. For any positive integer k, the k-distance of object P, denoted kwdist(P), is defined as the distance wdist(P, O) (see Definition 11) between P and an object O ∊ D, where D is the set of subsequence objects, such that:

  1. For at least k objects \( O' \in D\backslash \{ P\} \) it holds that wdist(P, O′) ≤ wdist(P, O), and

  2. For at most k − 1 objects \( O' \in D\backslash \{ P\} \) it holds that wdist(P, O′) < wdist(P, O).

These constraints define the k-distance of object P as the distance between P and its kth nearest object O. Figure 3 illustrates the k-distance of a subsequence object P. The reachability distance of an object is defined as follows:

Fig. 3 The k-distance of subsequence object P: kwdist(P) for k = 4

Definition 13

The k-weighted local reachability density of subsequence object P (Breunig et al. 2000):

$$ wlrd_{k}(P) = \frac{k}{\sum\nolimits_{Q \in kw(P)} reach\text{-}wdist_{k}(P, Q)} $$
(10)

where \( kw(P) = \{ Q \in D\backslash \{ P\} : wdist(P, Q) \le kwdist(P) \} \) and \( reach\text{-}wdist_{k}(P, Q) = \max \{ kwdist(Q), wdist(P, Q) \} \). Based on the reachability distance, the weighted local outlier factor of an object P is defined as follows:

Definition 14

k-weighted local outlier factor of an object P (Breunig et al. 2000)

$$ WLOF_{k}(P) = \frac{\frac{1}{k}\sum\nolimits_{Q \in kw(P)} wlrd_{k}(Q)}{wlrd_{k}(P)} $$
(11)

According to Definition 14, we can compute the k-weighted local outlier factor of each subsequence object P; the larger its value, the stronger the anomaly. From here on this will simply be referred to as the weighted outlier factor, noting that it depends on the constant k.
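Definitions 11–14 can be combined into a compact sketch (ours, not the paper's implementation; it uses a brute-force O(n²) distance matrix and, for simplicity, assumes exactly k neighbours per object, i.e. it ignores distance ties):

```python
import numpy as np

def wlof(features: np.ndarray, weights: np.ndarray, k: int) -> np.ndarray:
    """k-weighted local outlier factor per subsequence (Definitions 11-14)."""
    # wdist (Formula 8) equals the Euclidean distance after scaling each
    # feature by the square root of its weight
    x = features * np.sqrt(weights)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]                          # k nearest neighbours
    kdist = np.take_along_axis(d, idx[:, -1:], axis=1).ravel()  # k-distance
    # reach-wdist_k(P, Q) = max(kwdist(Q), wdist(P, Q))
    reach = np.maximum(kdist[idx], np.take_along_axis(d, idx, axis=1))
    lrd = k / reach.sum(axis=1)                                 # Formula (10)
    return lrd[idx].mean(axis=1) / lrd                          # Formula (11)

# Example: the last point is isolated in feature space -> largest WLOF.
F = np.array([[0., 0, 0, 0], [.1, 0, 0, 0], [0, .1, 0, 0],
              [.1, .1, 0, 0], [5., 5, 0, 0]])
print(wlof(F, np.full(4, 0.25), k=2))
```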

2.2 Anomaly detection algorithm based on weighted local outlier factor

2.2.1 Selection of important points

Based on Definition 4, we present pseudo-code for selecting important points in Algorithm 1; the time series is then segmented into g − 1 segments using the g important points.

Algorithm 1 Selection of important points
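A runnable Python sketch of Algorithm 1 (our own reading of Definition 4; distances are taken on the Z values, as the definition specifies, and the function names are illustrative):

```python
import numpy as np

def extreme_indices(z):
    """Indices of local minima and maxima (Definition 3)."""
    return [i for i in range(1, len(z) - 1)
            if (z[i] > z[i-1] and z[i] > z[i+1])
            or (z[i] < z[i-1] and z[i] < z[i+1])]

def important_points(z, g, beta=0.5):
    """Select g important points; beta controls the share of extremes."""
    n = len(z)
    selected = [0, n - 1]                        # first and last points
    ext = extreme_indices(z)
    budget = min(len(ext), int(np.floor(beta * (g - 2))))
    # Step 1: greedily add the extreme point whose Z value is farthest
    # from its nearest already-selected important point, Formula (1)
    for _ in range(budget):
        cand = [j for j in ext if j not in selected]
        r = max(cand, key=lambda j: min(abs(z[j] - z[i]) for i in selected))
        selected.append(r)
    # Step 2: split the consecutive pair with the largest Z difference
    # at its time midpoint, Formula (2)
    while len(selected) < g:
        s = sorted(selected)
        a = max(range(len(s) - 1), key=lambda i: abs(z[s[i+1]] - z[s[i]]))
        h = (s[a] + s[a + 1]) // 2
        if h in selected:
            break                                # no new midpoint available
        selected.append(h)
    return sorted(selected)
```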

2.2.2 A new method based on weighted local outlier factor

The proposed anomaly detection algorithm is based on the weighted local outlier factor as shown in Algorithm 2. It involves the following main steps:

  • Step 1 Uniform scaling. This operation enlarges or shrinks the data by scaling all points into the range [0, 1].

  • Step 2 Smooth the data using locally weighted scatter-plot smoothing (LOWESS). To find the extreme points, we must smooth the original data; otherwise too many extreme points are found (a sketch of Steps 1–2 is given after this list).

  • Step 3 Selection of important points. We select the important points according to Formulas (1) and (2) in Definition 4, as shown in Algorithm 1.

  • Step 4 Compute the features of subsequences. (1) The maximum angle of the subsequences, (2) the number of important points in the subsequences, (3) the average of the subsequences, and (4) the maximum difference between values of important points of the subsequences.

  • Step 5 Compute the weighted local outlier factors. We compute the weighted local outlier factor of each subsequence based on Definition 14 and rank them; the larger the value of the k-weighted outlier factor, the stronger the anomaly.

At the end of the process, the weighted local outlier factor of each subsequence is output; larger values represent stronger anomalies. We show the largest values of the weighted outlier factor of subsequences over different data sets in Sect. 3.
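For Steps 1–2, a minimal sketch (ours; it assumes the statsmodels LOWESS implementation, and the smoothing fraction frac is an illustrative parameter not specified in the paper):

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def scale_and_smooth(z: np.ndarray, frac: float = 0.05) -> np.ndarray:
    """Steps 1-2 of Algorithm 2: scale into [0, 1], then LOWESS-smooth;
    frac is the share of points used in each local regression."""
    z = (z - z.min()) / (z.max() - z.min())   # Step 1: uniform scaling
    t = np.arange(len(z))
    return lowess(z, t, frac=frac, return_sorted=False)  # Step 2: smoothing
```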

2.2.3 Metrics for measurement

Huang (2013) introduced two metrics, which are used in this study to measure the performance of the anomaly detection algorithms. Suppose a dataset D of n objects contains \( d_{k} \) true anomalies. We use our proposed method to find anomalies ranked within the top 10, and let \( m_{k} \) be the number of true anomalies thus detected in D. The accuracy of anomaly detection is then defined as follows:

$$ {\text{Accuracy}} = \frac{{m_{k} }}{{d_{k} }} $$
(12)

The second measure, “RankPower”, was also introduced in Huang (2013). Suppose \( R_{i} \) denotes the rank of the ith true anomaly detected. Then,

$$ {\text{RankPower}} = \frac{{m_{k} (m_{k} + 1)}}{{2\sum\limits_{i = 1}^{{m_{k} }} {R_{i} } }} $$
(13)

Larger values of the two metrics imply better performance.
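Both metrics reduce to a few lines of Python (our sketch; the argument names are illustrative):

```python
def accuracy_and_rankpower(true_ranks, d_k):
    """Formulas (12)-(13): true_ranks are the (1-based) ranks, within the
    top 10, at which true anomalies were detected; d_k is the number of
    true anomalies in the data set."""
    m_k = len(true_ranks)
    accuracy = m_k / d_k
    rank_power = m_k * (m_k + 1) / (2 * sum(true_ranks)) if m_k else 0.0
    return accuracy, rank_power

# Example: 2 of 2 true anomalies found at ranks 1 and 2 -> perfect scores.
print(accuracy_and_rankpower([1, 2], d_k=2))  # (1.0, 1.0)
```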

3 Experimental results

Since we use the sliding window method to obtain the subsequences, several parameters need to be set before conducting an evaluation. For the proposed k-weighted local outlier factor method, we obtain the maximum anomaly values by searching from k = 5 to k = 20 in steps of 1. We use the important points to segment the time series for the piecewise linear representation; in Sect. 3.1 we vary the number of important points to evaluate the effect of the piecewise linear representation, and in Sect. 3.2 we set it to 10% of the length of the time series. The sliding window method requires a window size: here we set the window sizes to be larger than the period of the system underlying the time series data in order to find anomalies. We also ran comparison experiments with window sizes 50% smaller and 50% larger than the selected ones (Sect. 4). For selecting the extreme points and additional points, we set the parameter β to 1/2.

The experiments start by obtaining the subsequences and selecting important points with the parameter β, sliding a window of length w across the time series T, then obtaining the features for each subsequence, and finally computing the weighted local outlier factor of each subsequence. Note that the index of subsequences goes from 1 to n − w + 1. The experiments use the piecewise linear representation based on important points on the 17 data sets shown in Table 1, which were downloaded from www.cs.ucr.edu/~eamonn/.

Algorithm 2 Anomaly detection based on the weighted local outlier factor
Table 1 Comparison of fitting errors

3.1 Experimental results of piecewise linear representation based on important points (PLR_IP)

This section reports the evaluation of using important points (PLR_IP) to obtain the subsequences. Table 1 presents summary statistics for the 17 data sets used in this work for the comparison between PLR_IP and piecewise linear representation based on the piecewise aggregate approximation (PLR_PAA). In the evaluation, the number of segments over these data sets is determined by the number of important points, from 40 to 100, i.e. 8–20% of the data points for a data set containing 500 points. In the rest of the experiments, we set the number of important points to 10% of the data points. If the length of a data set is larger than 500, we separate the data set into several segments of 500 data points each. If there are fewer than 500 data points in the last segment, it is combined with the preceding one, as illustrated in Column 3 of Table 1. For example, for 500*6 + 750 in the first row of Table 1, the last segment of 250 points is combined with the previous segment of 500 points. We then compute the average fitting error. These data sets are used for anomaly detection in the following sections. We compute the average fitting errors of PLR_IP and PLR_PAA for the different segment numbers (40–100), i.e. the number of segments of PLR_IP and the number of intervals of PLR_PAA. Figure 4 shows the experimental results for the ECG stdb_308_0 dataset, while the results for all the datasets, averaged over the number of segments, are shown in the last two columns of Table 1. We used the t test to examine the differences between the fitting errors of PLR_IP and PLR_PAA over all the data sets. The one-sided paired t test value is 0.22, which indicates that the difference between the PLR_IP and PLR_PAA errors over these data sets is not statistically significant. However, as Table 1 shows, the PLR_IP method does achieve a lower fitting error on 9 data sets.

Fig. 4 Comparison results for ECG stdb_308_0 (1:500)

Examining all the data sets, we find that PLR_IP has larger fitting errors for data sets with many peaks, such as the Space Shuttle Marotta Valve series and the Respiration data set. On the other hand, PLR_IP has smaller fitting errors for data sets with fewer peaks, such as Aerospace L-1t and stdb_308_0. Overall, the PLR_IP method can effectively fit these sequential datasets.

3.2 Anomaly detection in electrocardiograms

Electrocardiograms (ECGs) are time series recording the activity of the heart, detected by electrodes attached to the surface of the skin and recorded or displayed by a device external to the body. Given their importance, many annotated data sets have been collected. This experiment evaluates three ECG datasets, chfdb_chf01_275, chfdb_chf13_45590 and stdb_308_0, shown in Figs. 5, 6 and 7, respectively. Figures 5 and 6 are very simple, and it is easy to find the anomaly, whereas Fig. 7 shows very complicated ECG data in which the anomaly is difficult to find. Figures 5, 6 and 7 show the original time series (blue line) and the result using PLR_IP (red line). Table 2 shows the experimental results on the ECG chfdb_chf01_275 using the WLOF and the LOF (vector) method, which uses the vector of all values of the original subsequence as the input to the LOF method (Breunig et al. 2000); the window size is set to w = 400 and the number of important points to m = 375. In this study, we present at most the top 10 detected subsequences, ranked by their WLOF values. As seen from Table 2, the strongest outlier is in subsequence 1991. Because the window size is 400, the strongest outlier data point sequence is 1991–2390, and the second strongest is 2163–2560. The rank 1 and rank 2 subsequences overlap with the anomaly area marked by the yellow circle in Fig. 5. The anomaly is also detected by the LOF (vector) method at rank 1.

Fig. 5 The time series anomaly found in electrocardiogram chfdb_chf01_275 (marked in yellow circle)

Fig. 6 The time series anomaly found in chfdb_chf13_45590 (marked in yellow circle)

Fig. 7 The time series anomaly found in electrocardiogram stdb_308_0 (marked in yellow circle)

Table 2 Results of the ECG chfdb_chf01_275 for window size = 400

Table 3 shows the results on the ECG chfdb_chf13_45590 using the WLOF and the LOF (vector) method, with the window size set to w = 250 and the number of important points to m = 375. The strongest outlier is subsequence 2728. Because the window size is 250, the strongest outlier data point sequence is 2728–2977, in which a possible anomaly area is marked with the yellow circle in Fig. 6. The anomaly is not detected by the LOF (vector) method until rank 4. Table 4 shows the results on the ECG stdb_308_0 using the proposed WLOF and the LOF (vector) method, with window size w = 400 and number of important points m = 550. The strongest outlier is in subsequence 1939. Because the window size is 400, the strongest outlier data point sequence is 1939–2388; rank 3 also includes the anomaly area indicated with the yellow circle in Fig. 7. The anomaly is detected by the LOF (vector) method at rank 6.

Table 3 Results of the ECG chfdb_chf13_45590 for window size = 250
Table 4 Results of the ECG stdb_308_0 for window size = 400

3.3 Anomaly detection in space telemetry

Figures 8 and 9 show two Space Shuttle Marotta Valve series that were annotated by a NASA engineer (Keogh et al. 2005). In Fig. 8, the expert annotated the anomaly as “Poppet pulled out of the solenoid before energizing”; in Fig. 9, as “Poppet pulled significantly out of the solenoid before energizing”. Tables 5 and 6 show the results for Space Shuttle Marotta Valve Series 1 and 2 using the WLOF and LOF (vector) methods, with the window size set to w = 500 and the number of important points to m = 500. The strongest outlier subsequence for series 1 according to WLOF starts at 2098 and, because the window size is 500, spans 2098–2597, overlapping the anomaly area marked by the yellow circle in Fig. 8. The strongest outlier subsequence for series 2 according to WLOF is 369–868, which does not overlap with the anomaly area marked by the yellow circle in Fig. 9; however, the 8th strongest outlier subsequence for series 2 is 4030–4529, which does. Note that none of the subsequences identified by the LOF method in Tables 5 and 6 overlap with the corresponding anomaly areas in Figs. 8 and 9.

Fig. 8 The time series anomaly found in space shuttle Marotta valve series 1 (marked in yellow circle)

Fig. 9 The time series anomaly found in space shuttle Marotta valve series 2 (marked in yellow circle)

Table 5 Results of the space shuttle Marotta valve series 1 for window size = 500
Table 6 Results of the space shuttle Marotta valve series 2 for window size = 500

3.4 Anomaly detection in patients’ respiration

The Respiration dataset is a time series showing a patient's respiration (measured by thorax extension). The dataset was manually segmented and labeled with 'awake' and 'sleep' (Keogh et al. 2005). Figure 10 shows the original time series of the patient's respiration (blue line) and the segmented result (red line). As Fig. 10 shows, there are three different stages (0–2950, 2951–3300, and 3301–4000). Table 7 shows the detection results on the Respiration dataset using the WLOF, with window size w = 150 and number of important points m = 400. The strongest outliers are subsequences 2908 and 2909; given the window size of 150, the strongest outlier data subsequence is thus 2908–3057, which includes the change from the first stage to the second, and the rank 7 subsequence is 3390–3539, which is just above the change from the second stage to the third. The LOF (vector) method finds relevant subsequences at all ranks from 1 to 7, but they all correspond to the same transition from the second stage to the third, with the subsequences all lying just below the boundary between these stages.

Fig. 10 The original time series of patients’ respiration (blue line) and the segmented result (red line)

Table 7 Results of the patients’ respiration for window size = 150

3.5 Anomaly detection in aerospace data

This section presents the experimental results of anomaly detection on the Aerospace time series data sets (Keogh et al. 2004), shown in Figs. 11, 12, 13, 14 and 15. Figure 11 shows the data set L-1j, an impulse series with one impulse negated (inverted). Table 8 shows the results for Aerospace L-1j, with window size w = 30 and number of important points m = 100. The strongest outlier is subsequence 480, so the segment 480–509 overlaps with the anomaly of Aerospace L-1j with one negative impulse, as shown in Fig. 11. The anomaly is also detected at rank 1 by the LOF (vector) method. The same parameters were used for the Aerospace L-1b sequence, in which one impulse has its amplitude doubled, as shown in Fig. 12 and Table 9. The strongest outlier is subsequence 471, so the segment 471–500 overlaps with the anomaly with one impulse amplitude doubled, as shown in Fig. 12. This anomaly is also detected at rank 1 by the LOF (vector) method.

Fig. 11 The time series anomaly found in Aerospace L-1j data (marked in yellow circle)

Fig. 12 The time series anomaly found in Aerospace L-1b data (marked in yellow circle)

Fig. 13 The time series anomaly found in Aerospace L-1p data (marked in yellow circle)

Fig. 14 The time series anomaly found in Aerospace L-1q data (marked in yellow circle)

Fig. 15 The time series anomaly found in Aerospace L-1t data

Table 8 Results of Aerospace L-1j data set for window size = 30
Table 9 Results of Aerospace L-1b data set for window size = 30

Figure 13 shows the Aerospace L-1p sequence, a sine with phase advance. Table 10 shows the results for the Aerospace L-1p sequence using the WLOF and LOF (vector) methods, with window size w = 30 and number of important points m = 100. The strongest outlier is subsequence 481, and the segment 481–510 overlaps the anomaly of Aerospace L-1p, as shown in Fig. 13. The LOF (vector) method cannot detect the anomaly within ranks 1–10. Figure 14 shows the Aerospace L-1q sequence, a sine with phase delay. Table 11 shows the results for the Aerospace L-1q sequence using the WLOF and LOF (vector) methods, with window size w = 30 and segment number m = 100. The strongest outlier subsequence according to WLOF is 503–532, which does not overlap with the anomaly area marked by the yellow circle in Fig. 14; however, the 2nd strongest outlier subsequence is 439–468, which does. This anomaly is at rank 1 for the LOF (vector) method.

Table 10 Results of Aerospace L-1p data set for window size = 30
Table 11 Results of Aerospace L-1q data set for window size = 30

Figure 15 shows the Aerospace L-1t sequence, a sine with shot noise. The data set has three anomalies, each consisting of one cycle with a few large-magnitude values. Table 12 shows the results for the Aerospace L-1t sequence with window size w = 30 and number of important points m = 100. The strongest outlier is subsequence 471; the segment 471–500 contains one of the anomalies in Aerospace L-1t, as shown in Fig. 15. Ranks 2, 3 and 4 correspond to the second anomaly and ranks 5 and 6 to the third. The LOF (vector) method obtains similar results for this data set.

Table 12 Results of Aerospace L-1t data set for window size = 30

The experimental results for the other data sets given in Table 1 are shown in Table 13. There are two anomalies in the Lighting2_TEST data set, detected at rank 1 and rank 2. There is only one anomaly in each of the other data sets; the anomaly is detected at rank 1 in four of them and at rank 3 in the remaining one. The results are compared with the LOF (vector) method in Table 14.

Table 13 The experimental results for 5 data sets
Table 14 Experimental results of different window sizes and methods

4 Discussion

Many rank-based anomaly detection algorithms have been proposed, such as LOF, the connectivity-based outlier factor (COF), and the influential measure of outlierness by symmetric relationship (INFLO) (Huang 2013). They have been used to detect anomalies in several public benchmark data sets; some anomalies are detected at rank 1, but others are missed (Huang 2013). Our empirical results demonstrate that the WLOF method outperforms the LOF method over the seventeen datasets under different settings of the window size and number of important points. Here we examine the effect of the different parameters. The number of important points was set to 10% of the number of data points. We set the window size according to the features of the time series data; it should be larger than the distance from one peak to the next. In order to examine the effect of our feature extraction method, we also obtained results with the LOF method using the features from our method, so that it could be compared with the LOF method using the vector of all original data points, as used in Sect. 3. The experimental results are shown in Table 14.

We also examine different window sizes for the WLOF, NLOF, LOF and LOF (vector) methods. The difference between WLOF and NLOF is that instead of constructing four features with different weights, NLOF normalizes the time series data by simply mapping each data point into the range [0, 1]. The difference between LOF and LOF (vector) is that the input to the LOF method is the four subsequence features obtained by our feature extraction method, whereas the input to LOF (vector) is the vector of all original values of the subsequence. As Table 14 shows, all the anomalies are detected by the WLOF method using the window sizes of Sect. 3, and only one anomaly is missed within ranks 1–10 by the LOF method using our feature extraction; by contrast, 7 anomalies are missed by the LOF (vector) method using the original data point values as features. This result illustrates that our feature extraction and weighting methods achieve better performance than the LOF methods. For these window sizes, the WLOF finds 100% of the anomalies, the LOF method 95%, and the LOF (vector) only 65%. The WLOF also obtains better rankings for most of these data sets, such as data sets 1, 12 and 15, achieving the best RankPower of 5.12 compared to 3.39 for LOF and 2.76 for LOF (vector). Regarding other window sizes, as Table 14 shows, when the window sizes are reduced by half relative to Sect. 3, 11 anomalies are missed by the LOF (vector) method (a detection rate of just 45%), while 9 anomalies are missed by the LOF and 9 by the WLOF (a 55% detection rate), with two anomalies ranked at 10 by the LOF method. RankPower also reflects the performance of the algorithms: the WLOF obtains a better RankPower (1.83) than the LOF and LOF (vector) methods, whose RankPower values are 1.32 and 0.96, respectively. For window sizes one and a half times those of Sect. 3, 7 anomalies are missed by both the LOF (vector) and LOF methods (they find 65% of the anomalies), although 2 anomalies are detected at rank 10 by LOF (vector), and 5 anomalies are missed by WLOF (which finds 75%). The WLOF again obtains a better RankPower (2.35) than the LOF and LOF (vector) methods, whose values are 2.07 and 2.22, respectively.

We also carried out experiments with NLOF over these datasets. Unlike NLOF, which normalizes the time series, our weighted method WLOF takes account of the relationships between features by weighting them when aggregating all the features together. Table 14 shows the experimental results obtained using NLOF, which achieves accuracies of 95%, 55%, and 85% and RankPower values of 2.32, 1.47, and 2.15 for the different window sizes, respectively. As Table 14 also shows, the WLOF method obtains accuracies of 100%, 55%, and 75% and RankPower values of 5.12, 1.83, and 2.35 for the different window sizes, respectively. In other words, WLOF obtains better RankPower than NLOF. As Table 15 shows, the accuracy of finding the anomalies is 100% for β = 1/2, 80% for β = 2/3 and 75% for β = 3/4; these accuracies are better than the results for LOF (vector). Overall, the experimental results demonstrate that, with suitable window sizes, our method improves the performance of anomaly detection over the 17 data sets in comparison with the LOF methods.

Table 15 Experimental results for WLOF with different values of the parameter β

We now compare our WLOF with the HOT SAX method proposed by Keogh et al. (2005), who used it to represent time series data and then find discords based on the distance between subsequences. This method also requires several parameters: a window size for the subsequences, and the parameter nseg, the number of symbols used to represent a subsequence. The alphabet size, set to 10 in this paper, means that HOT SAX uses the alphabet “a, b, c, …, j” to represent subsequences; more details can be found in Keogh et al. (2005). Table 16 shows the experimental results. The accuracy for all window sizes is 75%, with a RankPower of 2.61 for the window sizes of Sect. 3 and of 2.93 and 4.62 for window sizes 50% smaller and larger, respectively. Therefore, the WLOF obtained greater accuracy for the window sizes of Sect. 3, the same results for window sizes 50% larger, and lower accuracy for window sizes 50% smaller. While the HOT SAX method has better RankPower results for the smaller and larger window sizes, WLOF obtains the best RankPower (5.12) for the window sizes of Sect. 3, better than any result of the other methods at any of the window sizes considered.

Table 16 Experimental results of different window sizes and methods

With respect to computational complexity, we compare the WLOF and HOT SAX methods. Suppose n is the size of the data set. Keogh et al. (2005) pointed out that the complexity of their method is O(n2), although they proposed heuristics to reduce it (Keogh et al. 2005), and they later presented an algorithm that finds discords exactly in O(n) time, with “two linear scans through the database and a limited amount of memory based computation” (Yankov et al. 2007). The WLOF has the same complexity as the LOF, which differs from that of HOT SAX. Breunig et al. (2000) analyzed the complexity of LOF; the complexity of WLOF and LOF is as follows:

$$ T(n) = O(n*t_{k} ) $$
(14)

where \( t_{k} \) is the time for a k-nearest-neighbour search.

For low-dimensional data, the complexity is O(n); for medium- to moderately high-dimensional data, it is O(n log n); for extremely high-dimensional data, it is O(n2).

With respect to the effect of the weighted local outlier factor, Fig. 16 shows the important points selected for the ECG data set chfdb_chf13_45590, with the parameters given in Sect. 3.2. The symbol '*' represents extreme points and 'o' the additional important points computed by Formula (2). As Fig. 16 shows, the selected important points segment the time series data, which helps obtain the four features defined. Table 17 shows the four feature values for the first seven subsequences of chfdb_chf13_45590; we find Sum1 = 647, Sum2 = 77,224, Sum3 = 3915 and Sum4 = 2569, obtained as described after Formula (9). Notice that for the feature 'number of important points in the subsequence', Sum2 is much larger than the sums of the other features; it would therefore dominate the results of the LOF method using the four features as input, which is why that method is unable to find the anomaly in chfdb_chf13_45590 near data point 2700, shown in Fig. 6. Our WLOF method instead assigns different weights to different features, using the sum of the values of each feature to ensure that appropriate weights are used, as given in Formula (9). As Table 14 shows, for chfdb_chf13_45590 the WLOF method finds the anomaly at rank 1. In summary, the WLOF makes use of all the features in anomaly detection.

Fig. 16 The selected important points of time series ECG chfdb_chf13_45590

Table 17 The four feature values for the first 7 subsequences of chfdb_chf13_45590

To investigate the discriminability of the four features, we carried out further experiments on combinations of these features, analyzing the effect of every combination of three features. Table 18 shows the results. Features 2, 3 and 4 obtain the best results, with 100% accuracy and a RankPower of 3.28. However, as shown in Table 14, adding feature 1 yields a higher RankPower of 5.12. Combining features 1, 2 and 4 gives 0% accuracy, which indicates that feature 3, 'average of the subsequence', plays a very important role in anomaly detection. Therefore, using the four features together can effectively identify the anomalies, but including more features does not necessarily improve the results: in the case of LOF (vector), where the vector of all values of the original subsequence is used as the input features for the LOF method, the accuracy is low, as shown in Table 14.

Table 18 How combinations of three features affect the results

This section has discussed the experimental results of our WLOF method in comparison with the LOF, NLOF, LOF (vector) and HOT SAX methods. The experiments show that the new features work better than LOF (vector), and that our weighting method works better than the normalization method, as shown in Tables 14 and 16. The effect of the proposed new features is presented in Table 18, and the assessment of different values of the parameter β in Table 15, with the results demonstrating that the WLOF method obtains better accuracy than LOF (vector) for different β values. Across all the experiments, our important points, features and weighting method obtain better accuracy and RankPower.

5 Conclusion and future work

In this paper, we have proposed a new WLOF method with three novel components. The PLR_IP component, which builds on extreme points and additional points, can effectively fit the original time series for appropriate values of the parameter β. The four features, three of which are defined on the basis of the PLR_IP method, represent different aspects of time series data and serve as the input to the WLOF method. Finally, the weighting scheme, which assigns the four features different weights, makes effective use of the discriminative power of all the features together. These components effectively characterize time series data and underpin the WLOF, with the experiments over the seventeen datasets illustrating their effectiveness in anomaly detection.

The comparison between our weighting method and the normalization method demonstrates that the PLR_IP method can effectively extract the features of time series and assist the WLOF method in detecting anomalies in time series data. The experimental results also show that the WLOF method obtains better results over the 17 data sets than the LOF, NLOF and LOF (vector) methods, and results comparable with HOT SAX. These results indicate that our feature extraction method improves the anomaly detection performance of the LOF method, and that our weighting method is better than the normalization method.

One particular issue with the proposed approach is that a number of parameters need to be set prior to its application to anomaly detection. To overcome this shortcoming in practice, we plan to conduct further studies in line with the current research, including (1) investigating other features for anomaly detection, for example the geometrical information of data points; (2) devising a new weighting method that captures the relationships among all features; and (3) revising the WLOF model to reduce the number of parameters required.