1 Introduction

With the rapid proliferation of closed-circuit television cameras, satellites and mobile devices, massive amounts of trajectory data have been generated for different kinds of moving objects such as people, hurricanes, animals and vehicles [11] (Fig. 1). Trajectory data analysis therefore plays a vital role, and abnormal trajectory detection is one of its key problems. Mature methods based on distance similarity exist [10]. Although their computational complexity is relatively high on large-scale data, they work well for small- and medium-scale trajectory anomaly detection. A well-known example is the Sequential Hausdorff Nearest Neighbor Conformal Anomaly Detector (SNN-CAD) proposed by Laxhammar et al. [6], which is built on conformal prediction (CP) theory [8]. First, the Hausdorff distance [1] between trajectories is computed as the trajectory similarity. Then a Non-Conformity Measure (NCM) [9] is defined in terms of the k-nearest neighbors [2]. Finally, conformal prediction is used to determine whether a trajectory is abnormal. However, this NCM does not distinguish abnormal trajectories well, so the detection accuracy is limited.

Fig. 1. A display of aircraft trajectories (left) and synthetic trajectories (right). Red trajectories are abnormal and black trajectories are normal. (Colour figure online)

In view of this, we introduce a new NCM, Mean Distance Deviation (MDD), and present a removing-updating strategy to enhance conformal anomaly detection. Accordingly, we propose the Mean Distance Deviation based on Enhanced Conformal Anomaly Detector (MDD-ECAD), which detects trajectory anomalies more accurately than SNN-CAD in our experiments.

Equally important, in this paper we propose a new distance measure, Improved Moved Euclidean Distance (IMED), obtained by improving the Euclidean Distance (ED). It characterizes the distance between trajectories efficiently. Moreover, IMED does not require the trajectories to have the same length, and its computational complexity is low.

The rest of this paper is organized as follows. Section 2 introduces the relevant background knowledge of our work. Section 3 presents the details of our method, MDD-ECAD. Experimental data and results are described in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Background

In this section, we introduce the basic concept of a trajectory and the details of CP theory.

2.1 Trajectory Type

In general, the trajectory data studied in this paper are sequences of coordinate points in a Cartesian coordinate system. A trajectory can thus be represented simply as \( T = (a_{1},a_{2},\cdots ,a_{n})\), where each \(a_{i}\) is a coordinate point.

2.2 Conformal Prediction

Conformal prediction (CP) uses past experience to determine precise levels of confidence in new predictions. Generally speaking, assume a training set \(\left\{ \left( x_{1},y_{1} \right) ,\left( x_{2},y_{2} \right) ,\cdots ,\left( x_{l},y_{l} \right) \right\} \), where \(x_{i}\left( i = 1,\cdots ,l \right) \) is the input data, that is, data observed or collected by some means, and \(y_{i}\left( i = 1,\cdots ,l \right) \) is the output data, that is, the label predicted by some method. For example, in trajectory anomaly detection, \(x_{i}\) is a trajectory collected by a sensor and \(y_{i}\) is its label, either abnormal or normal. Given a newly observed example \(x_{l+1}\), the basic idea of conformal prediction is to estimate the p-value \(p_{l+1}\) of \(x_{l+1}\) with a designed NCM based on the training data. Finally, \(p_{l+1}\) is compared with a pre-defined threshold \(\epsilon \) to determine the label of \(x_{l+1}\).

If \(p_{l+1}< \epsilon \), \(x_{l+1}\) is identified as a conformal anomaly. Otherwise, \(x_{l+1}\) is considered normal. The key to estimating the p-value of the new example is the design of an effective NCM. Next, we introduce the concept of a Non-Conformity Measure (NCM), whose purpose is to measure the difference between the new example and a set of observed data.

Formally, an NCM is a mathematical function. A given NCM yields a score \(\alpha _{i}\) that measures the difference between the example \(x_{i}\) and the rest of the dataset. The score of \(x_{i}\) is given by

$$\begin{aligned} \alpha _{i} = A\left( X_{j\ne i},x_{i} \right) \end{aligned}$$
(1)

where X is a set of data, \(x_{i}\) is an example in X, and \(A(\cdot )\) is the chosen NCM.

Based on formula (1), the scores \(\left( \alpha _{1},\alpha _{2},\cdots ,\alpha _{l+1} \right) \) are obtained. The p-value of \(x_{l+1}\), \(p_{l+1}\), is then the ratio of the number of trajectories whose nonconformity scores are greater than or equal to that of \(x_{l+1}\) to the total number of trajectories. The p-value is defined as follows:

$$\begin{aligned} p_{l+1} = \frac{\left| \left\{ \alpha _{i}|\alpha _{i}\ge \alpha _{l+1},1\le i\le l+1 \right\} \right| }{l+1} \end{aligned}$$
(2)

where \(\left| \left\{ \cdot \right\} \right| \) counts the number of elements in the set. CP estimates a p-value for each new example to predict its label and works well with an effective NCM, especially when \(\epsilon \) is close to the proportion of abnormal data in the dataset.
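As a minimal sketch of formula (2) and the decision rule above, the p-value of the newest example can be computed directly from the list of nonconformity scores. The function names `p_value` and `is_conformal_anomaly` are illustrative and do not come from the original method; the scores are assumed to have been produced beforehand by some NCM, with the new example stored last.

```python
def p_value(scores):
    """p-value of the last score, following formula (2).

    `scores` holds alpha_1 ... alpha_{l+1}; the new example is assumed
    to be the last entry.
    """
    alpha_new = scores[-1]
    # Count scores that are at least as nonconforming as the new one
    # (the new example itself is always counted).
    at_least = sum(1 for a in scores if a >= alpha_new)
    return at_least / len(scores)


def is_conformal_anomaly(scores, epsilon):
    """Flag the last example when its p-value falls below epsilon."""
    return p_value(scores) < epsilon


if __name__ == "__main__":
    scores = [0.2, 0.5, 0.1, 0.9]  # illustrative values only
    print(p_value(scores), is_conformal_anomaly(scores, 0.3))
```

Because the new example always counts itself, the smallest possible p-value is \(1/(l+1)\), so a threshold below that value can never flag anything.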

The Sequential Hausdorff Nearest Neighbor Conformal Anomaly Detector (SNN-CAD) method was developed by Laxhammar et al. [6]. Their main contribution is to use the Hausdorff distance as the trajectory distance and the k-nearest neighbors as the NCM. Suppose there are two point sets \( T_{a} = \left\{ a_{1},a_{2} ,\cdots ,a_{m} \right\} \) and \( T_{b} = \left\{ b_{1},b_{2} ,\cdots ,b_{n} \right\} \). For the definition of the Hausdorff distance, we refer the reader to that article. The NCM is defined as follows:

$$\begin{aligned} \alpha _{i}= \sum _{T_{b}\in Neig(T_{a})}d\left( T_{a},T_{b} \right) \end{aligned}$$
(3)

where \(d(\cdot ,\cdot )\) is a trajectory distance and \(Neig(T_{a})\) denotes the set of k-nearest neighbors of \(T_{a}\).
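The k-nearest-neighbor score of formula (3) can be sketched as follows, assuming a precomputed symmetric matrix of pairwise trajectory distances. The function name `knn_scores` and the matrix layout (a list of lists, with `dist[i][j]` the distance between trajectories i and j) are assumptions made for illustration; any trajectory distance could be used to fill the matrix.

```python
def knn_scores(dist, k):
    """Sum of distances to the k nearest neighbors, as in formula (3).

    `dist` is an N x N list of lists of pairwise distances.
    Returns one nonconformity score per trajectory.
    """
    n = len(dist)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of trajectories")
    scores = []
    for i in range(n):
        # Distances from trajectory i to every other trajectory, ascending.
        others = sorted(dist[i][j] for j in range(n) if j != i)
        scores.append(sum(others[:k]))
    return scores
```

A trajectory far from all others receives a large score, which in turn yields a small p-value under formula (2).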

3 Our Method

3.1 Improved Moved Euclidean Distance

To measure the distance between two trajectories effectively, researchers have put forward various distance measures. The most commonly used are the Euclidean distance (ED), the Hausdorff distance (HD), and dynamic time warping (DTW). Comparing the advantages and disadvantages of these three distances, we come to the following conclusions: (1) DTW and HD can handle trajectories of unequal length, but their computational complexity is too high for medium- and large-scale data. (2) ED is simple to implement and fast to compute, but it cannot handle trajectories of unequal length.

This raises the question of how to compute the distance quickly while still handling trajectories of unequal length. For this purpose, based on ED, we propose a new distance measure, Improved Moved Euclidean Distance (IMED), which enlarges the difference between trajectories and thereby improves trajectory anomaly detection. The proposed distance measure handles both equal-length and unequal-length trajectories. The basic idea is to fix the longer trajectory and move the shorter trajectory along it, one point at a time, until the end of the longer trajectory is reached. Given two trajectories \( T_{a} = \left\{ a_{1},a_{2} ,\cdots ,a_{m} \right\} \) and \( T_{b} = \left\{ b_{1},b_{2} ,\cdots ,b_{n} \right\} \) with \(n \ge m\), the IMED is defined as follows:

$$\begin{aligned} d_{IME}(T_{a},T_{b}) = \frac{\sum _{j = 0}^{n-m}\sqrt{\sum _{i = 1}^{m}\left\| b_{i+j}-a_{i} \right\| ^{2}}}{n-m+1} \end{aligned}$$
(4)

In particular, when \(n = m\), \(T_{a}\) and \(T_{b}\) have the same length, the sum over \(j\) has a single term, and the IMED reduces to the ordinary ED.
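Formula (4) can be sketched in a few lines. The function name `imed` and the representation of a trajectory as a list of coordinate tuples are illustrative assumptions. The formula is stated for \(n \ge m\); swapping the arguments when the first trajectory is the longer one is an additional assumption made here so that the function can be called in either order.

```python
import math


def imed(ta, tb):
    """Improved Moved Euclidean Distance following formula (4).

    `ta` and `tb` are non-empty sequences of equal-dimension points.
    """
    # Assumption: make `ta` the shorter trajectory so that n >= m holds.
    if len(ta) > len(tb):
        ta, tb = tb, ta
    m, n = len(ta), len(tb)
    if m == 0:
        raise ValueError("trajectories must be non-empty")
    total = 0.0
    for j in range(n - m + 1):  # each offset of the shorter trajectory
        squared = 0.0
        for i in range(m):
            # Squared Euclidean norm of b_{i+j} - a_i.
            squared += sum((p - q) ** 2 for p, q in zip(tb[i + j], ta[i]))
        total += math.sqrt(squared)
    # Average over the n - m + 1 alignments.
    return total / (n - m + 1)
```

For two trajectories of lengths m and n, this performs on the order of \(m(n-m+1)\) point comparisons, with no dynamic-programming table as in DTW.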

Fig. 2. An example of our IMED for unequal lengths.

3.2 Mean Distance Deviation

An appropriate NCM is critical for anomaly detection. Generally, if a trajectory is similar to its neighboring trajectories, it can be considered normal; if it differs from the trajectories around it, it can be judged abnormal. Use of the local neighborhood is a fundamental idea in many anomaly detection methods, such as the classic KNN. SNN-CAD uses the sum of the distances from a trajectory to its k-nearest neighbors as the score for comparison with other trajectories. The larger this value, the greater the difference between the behavior of the trajectory and that of the surrounding trajectories. However, this KNN score is not ideal for judging whether a trajectory is abnormal. For this reason, we propose a new NCM, Mean Distance Deviation (MDD). The experiments reported later (Sect. 4) show that it performs much better than KNN in trajectory anomaly detection. Its definition is as follows (Fig. 2):

$$\begin{aligned} \alpha \left( T_{a} \right) = \sqrt{\frac{\sum _{T_{b}\in Neig(T_{a})}(MD(T_{a})-MD(T_{b}))^{2}}{k}} \end{aligned}$$
(5)

where

$$\begin{aligned} MD(T_{a}) = \frac{\sum _{T_{b}\in Neig(T_{a})}d(T_{a},T_{b})}{k} \end{aligned}$$
(6)
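Formulas (5) and (6) can be sketched as follows, again assuming a precomputed matrix of pairwise trajectory distances. The names `k_nearest` and `mdd_scores` are illustrative. The sketch reads \(MD(T_{b})\) as the mean distance from \(T_{b}\) to its own k-nearest neighbors, which is how the notation suggests it should be interpreted; this reading is an assumption.

```python
import math


def k_nearest(dist, i, k):
    """Indices of the k nearest neighbors of trajectory i (self excluded)."""
    others = [j for j in range(len(dist)) if j != i]
    others.sort(key=lambda j: dist[i][j])
    return others[:k]


def mdd_scores(dist, k):
    """Mean Distance Deviation per trajectory, following formulas (5)-(6)."""
    n = len(dist)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of trajectories")
    neighbors = [k_nearest(dist, i, k) for i in range(n)]
    # Formula (6): mean distance from each trajectory to its own neighbors.
    md = [sum(dist[i][j] for j in neighbors[i]) / k for i in range(n)]
    # Formula (5): root mean squared deviation between MD(T_a) and the
    # MD values of the neighbors of T_a.
    return [
        math.sqrt(sum((md[i] - md[j]) ** 2 for j in neighbors[i]) / k)
        for i in range(n)
    ]
```

The score is large when a trajectory's mean neighbor distance differs strongly from that of its neighbors, so an isolated trajectory stands out even when the dataset contains regions of different density.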

3.3 Removing-Updating Strategy

The process of anomaly detection based on CP is to calculate the p-value of each example and then compare it with the given threshold \(\epsilon \) to determine whether the trajectory is abnormal. Because abnormal data detected in an earlier round would otherwise interfere with the current detection, we add a removing-updating strategy to CP.

Specifically, after the p-values of all data have been calculated, the most abnormal example is removed. The threshold is then updated and the process is repeated on the remaining data until the threshold reaches 0.
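The loop described above might be sketched as follows. The text does not state how the threshold is updated, so the rule used here is purely an assumption: \(\epsilon \) is converted into a budget of expected anomalies, one unit of budget is spent per removal, and the threshold is the remaining budget divided by the number of remaining trajectories. The names `p_values`, `removing_updating` and the `score_fn` parameter (any function mapping a distance matrix to a list of nonconformity scores) are likewise hypothetical.

```python
def p_values(scores):
    """p-value of every example: share of scores >= its own score."""
    n = len(scores)
    return [sum(1 for a in scores if a >= s) / n for s in scores]


def removing_updating(dist, epsilon, score_fn):
    """Iteratively remove the most abnormal trajectory.

    Returns the original indices of the removed (flagged) trajectories.
    The threshold update rule is an assumption, not taken from the paper.
    """
    remaining = list(range(len(dist)))
    budget = round(epsilon * len(remaining))  # assumed anomaly budget
    threshold = budget / len(remaining) if remaining else 0.0
    flagged = []
    while threshold > 0:
        # Recompute scores on the remaining data only, so that removed
        # anomalies no longer influence their former neighbors.
        sub = [[dist[i][j] for j in remaining] for i in remaining]
        p = p_values(score_fn(sub))
        worst = min(range(len(p)), key=p.__getitem__)  # smallest p-value
        flagged.append(remaining.pop(worst))
        budget -= 1
        threshold = budget / len(remaining) if remaining else 0.0
    return flagged
```

Recomputing the scores after each removal is the point of the strategy: an obvious anomaly no longer distorts the neighborhoods of the trajectories that remain.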

4 Experiment

In this section, to evaluate MDD-ECAD and IMED, we compare the MDD-ECAD algorithm with the SNN-CAD algorithm, and compare the distances IMED, HD and DTW, on synthetic and real-life data.

4.1 Data Sets

The Synthetic trajectories I dataset [5], designed for anomaly detection, was created by Laxhammar et al. [6] using trajectory generator software. It includes 100 datasets with 2000 trajectories each, about \(1\%\) of which are abnormal. Each trajectory is a sequence of two-dimensional coordinate points. To expand the experimental data, we use further synthetic trajectories [3], comprising Synthetic trajectories II, III and IV. Each of these three collections contains 100 trajectory datasets, with \(\epsilon \) equal to 0.05, 0.01 and 0.02, respectively. Each dataset has 2000 trajectories, with the number of sample points ranging from 20 to 100.

The Aircraft trajectories dataset [7] contains 470 two-dimensional trajectories, 450 normal and 20 abnormal. The trajectory length varies from 12 to 171 sampling points.

4.2 Performance Measure

Trajectory anomaly detection is a binary classification problem. Therefore, we can use the following evaluation quantities: true positive (TP), false positive (FP), false negative (FN), and true negative (TN). Precision (P), Recall (R) and F1 are used to assess classification accuracy, and the F1 score is used to evaluate all experiments. The larger the F1 value, the better the algorithm performs.

$$\begin{aligned} P = \frac{TP}{TP+FP},\,R = \frac{TP}{TP+FN},\,F1 = \frac{2*P*R}{P+R} \end{aligned}$$
(7)
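Formula (7) translates directly into code. The function name `precision_recall_f1` is illustrative, and returning 0 when a denominator is zero is a convention assumed here, since the formula leaves that case undefined.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion counts, as in formula (7)."""
    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with the hypothetical counts TP = 18, FP = 2 and FN = 2 (not taken from the experiments), precision, recall and F1 all equal 0.9.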

4.3 Experimental Results and Analysis

Table 1. The F1 results (%) of synthetic trajectory datasets with two methods.
Table 2. The F1 results (%) of real-life trajectory datasets with different methods.

In the experiments, we mainly compare the performance of SNN-CAD and MDD-ECAD. Table 1 shows that the F1 of MDD-ECAD on the four synthetic datasets is 89.11%, 86.35%, 90.76% and 92.31%, respectively, higher than that of SNN-CAD. On the more complex real-life data, Table 2 shows that MDD-ECAD still outperforms SNN-CAD and iVAT+ [4]: the F1 of MDD-ECAD reaches 95%, while SNN-CAD achieves only 75% and iVAT+ 90%. A possible reason for the weaker performance of SNN-CAD is that its NCM does not sufficiently amplify abnormal trajectory behavior, a shortcoming that MDD compensates for. In addition, on the detection-framework side, the removing-updating strategy avoids secondary interference from obvious abnormal trajectories on the others.

Table 3. The F1 results (%) of MDD-ECAD with different distance measures.
Table 4. Runtimes (s) of MDD-ECAD with different distance measures.

To compare IMED, HD and DTW, we use each of the three distances within the MDD-ECAD method. Table 3 shows that the F1 obtained with IMED is higher than with HD and DTW, which indicates that IMED measures the distance between trajectories better. The running times of IMED, HD and DTW are given in Table 4, and IMED is clearly the fastest. This agrees with the theoretical analysis, according to which IMED has the lowest computational complexity of the three.

5 Conclusion

In this paper, to improve on the performance of SNN-CAD, we propose a new method for calculating the trajectory distance. A new non-conformity measure and a removing-updating strategy are also used in our anomaly detector. Extensive experiments show that our detector outperforms SNN-CAD.