1 Introduction

Today, with the development of positioning equipment, such as GPS (Global Position System), Wi-Fi, Video surveillance as well as WSN (Wireless Sensor Network), more and more trajectory data are collected and stored in moving object database. These data contain a great deal of knowledge, which need an efficient and effective analysis. With the increasing requirement of LBS (Location-based Service), the ways to discover useful knowledge from large-scale trajectory data have been being given more and more attention. The main tasks of trajectory mining include trajectory clustering (Yuan et al. 2012b, 2017a), trajectory classification (Anagnostopoulos and Hadjiefthymiades 2014; Lee et al. 2008b), Trajectory prediction (Qiao et al. 2015a, b, 2010), trajectory pattern mining (Yuan et al. 2013, 2017b), and outlier detection (Yuan et al. 2011, 2012a). Trajectory data is called spatial-temporal data early, for it is extracted from surveillance video record by camera with the form of time and position (Masciari 2011). Over the last few years, benefiting from positioning equipment and location tracing device, mobile objects of all sizes, no matter it is a tiny cellphone or a giant ocean liner, can be easily tracked across globe, such as surveillance video (Piciarelli and Foresti 2006, 2007; Piciarelli et al. 2008), WSN (Shahid et al. 2015; Wang et al. 2014), Wi-Fi (Sapiezynski et al. 2015), GPS-liked equipment (Lee et al. 2008a), and other devices (Costa et al. 2011). There are many scenarios that moving objects are traced for various purposes. For example, zoologists attach remote sensing facility to animals to study their ecology behavior (Lee et al. 2007, 2008a; Li et al. 2010). Traffic planners install GPS-equipped devices in vehicles for better traffic monitor and routing guide (Li et al. 2009). Meteorologists use satellite and radar to measure the typhoon (Huang et al. 2011). The collected massive trajectory data needs effective algorithms to deal with, which can greatly promote trajectory data mining research and related applications.

An outlier means a data object that is grossly different from or inconsistent with the remaining set of data (Han et al. 2011). The trajectory outlier means a trajectory that has great local or global difference with most other trajectories in terms of some similarity metric (Zhu et al. 2017). Outlier detection is a significant topic in data mining. Therefore, the task of outlier detection is to find some data objects which are different and to what extent they are different from the most others in the dataset. In some applicant fields, such as trajectory compression (Sun et al. 2016), it is different to distinguish the noise and outlier. Outliers are different from noise, for the latter is impossible data but exists in a database or a data set and the former is surprising data. Finding those surprising data are more meaningful and important in data mining field. For example, finding abnormal trading in financial transactions can detect fraudulent trading, money laundering and other financial crimes. Discovering abnormal trajectories in a large-scale dataset also has remarkable meaning and practical value. A common example is meteorology typhoon monitoring. If we are able to detect sudden changes in a typhoon track, such as the change of direction, we can issue an early warning as soon as possible to reduce casualties and property losses. More than that, trajectory outlier detection plays a wide and important role in fields of traffic monitoring, automatic navigation, border security and so on.

The main framework of this paper is given in Fig. 1 with parts of ideas borrowing from Li et al. (2011a, b) and Zheng (2015). As described in Fig. 1, trajectory metadata of many kinds of moving objects are collected and stored in various formats. Before they can be used for analyzing, data clean must be done to smooth and structure data to uniform. Then all data are store into moving object trajectory database. Combined with related semantic knowledge, related algorithms and distance metrics, various trajectory outliers can be detected. Visualization is one of the indispensable components for a trajectory mining system, and it can support intuitionistic data display and interaction. Finally, detected outliers can be applied in many application domains to analyze the pattern and other information of data in different application scenarios.

Fig. 1
figure 1

Framework of this work

In order to provide an overview on moving object trajectory outlier detection, this paper surveys, summaries, and discusses some state-of-art outlier detection algorithms. The organization of this paper is as follows: the basic idea and the aim of trajectory outlier detection are introduced in Sect. 1. In Sect. 2, some related work of outlier detection are described and discussed. In Sect. 3 we briefly expound the preliminaries of trajectory outlier detection, including trajectory definition and trajectory distance metrics. Section 4 summaries and categories trajectory outlier detection algorithms that have been published in the open literature. In Sect. 5, some typical application scenarios are listed to show the application of trajectory outlier detection. Finally, Sect. 6 draws the conclusions of our work and points out some future possible research directions of interests.

2 Related works

Outlier detection is a very meaningful and important research topic and has been being applied in many application fields. In each field, outlier may be represented with different symbols. For example, outlier, is described as event in video processing. Meanwhile it is called fault in machine sensor data mining. Also it is represented by risk in bank account analysis.

2.1 Traditional outlier detection algorithms

There are many supervised, semi-supervised and unsupervised techniques have been used for outlier detection, including classification based methods, clustering based methods, density based methods, statistical based methods, and so on (Gupta et al. 2013).

Classification based methods learn a classification model (classifier) using a set of data objects during the training phase and then classify the data object to one of the training classes during the testing phase. These techniques can be either supervised or unsupervised (Shahid et al. 2015). Classification based outlier detection algorithms assume that a classifier can be learnt from a given feature space that can distinguish between normal and outlier classes. Generally, classification based outlier detection algorithms can be divided into one-class and multi-class two categories. The former assumes that all training data objects are with a single class label. A discriminative boundary can be learnt using proper one-class classification algorithms to find normal data objects, and any data objects which don’t fall within the learnt boundary will be considered as outliers. While the latter assumes that the labeled training data may belong to multiple normal classes. Classifiers have to be learnt to distinguish which class these data truly belongs to. A test data will be considered as an outlier if it is not classified as a normal data by any classifiers. Anagnostopoulos and Hadjiefthymiades (2014) treated the problem of movement prediction as a classification task and assumed the existence of a gradually trained knowledge base. Based on the analysis, they tried to compare the movement pattern of a certain object with stored information in order to predict its future locations. Kokkula and Musti (2013) provided a formal framework for characterizing the class imbalance and outlier elimination, and found that the classification accuracy of minority class increases in the presence of synthesized patterns. Shahid et al. (2015) surveyed characteristics and classification of outlier detection techniques for wireless sensor networks in harsh environments, and gave some possible open research areas on outlier detection. Sillito and Fisher (2008) proposed a novel learning framework to find anomalous behavior in video data. In their work motion trajectories of pedestrians are taken full consideration and the control points of approximating cubic spline curves are used to parameterize trajectories. Then anomalous behavior can be found in unlabeled trajectories combined with occasional examples of normal behavior using an incremental semi-supervised one-class learning algorithm.

Clustering based methods are used to discover object groups which are strongly related, while outlier detection is used to discover objects that are not strongly related to other objects. In the results of clusters, those far from others or with lest relevant to others may be marked as outliers. Therefore, clustering can be used to detect outliers. There many clustering algorithms can be used to find outliers in dataset, such as K-means, DBSCAN, STING as well as BRICH (Yuan et al. 2012a, 2017a). Liao (2005) pointed out that outliers are the by-product of clustering. Piciarelli and Foresti (2006) proposed an online trajectory clustering algorithm with clusters organized in a tree-like structure. The detected cluster can be used to identify anomalous events for video surveillance system. Lee et al. (2007) proposed a DBSCAN liked clustering algorithm to cluster trajectory segments, while some other segments which are not as similar as the most of segments, are marked as outliers. In order to generate high-quality features and achieve high classification accuracy, Lee et al. (2008b) proposed a feature generation framework to capture both higher-level and lower-level features from moving object trajectory data using hierarchical region based and trajectory based clustering algorithms. The time and space complexity of some clustering techniques (such as K-means) are linear, so the outlier detection technique based on such algorithms is efficient. And labels are not required when detecting outliers, which means they are unsupervised. But each clustering algorithm is only suitable for a particular dataset, so appropriate clustering algorithms should be selected carefully.

Distance based methods are with the basic idea of that if a data object is far away from most of objects, it probably be an outlier. The distance based methods are more extensively used than the statistic based methods as they are easier to determine or design meaningful distance measures of a dataset than to determine the statistical distribution models. One of the easiest ways to measure whether an object is far away from the most objects is to use the k-nearest neighbor distance, that is, the outlier score of an object is determined by its k-nearest neighbor distance (Hubballi et al. 2011). The advantage of the distance-based outlier detection methods lies in it’s simple to implement. However, the worst time complexity of this kind algorithms may arise to \(O (m^{2})\), which may be so cost that can’t deal with large volume data. Moreover, these methods are also sensitive to parameters. Therefore distance based methods cannot handle multi-density datasets as they use global thresholds and ignore change in density (Knorr et al. 2000).

Density based methods are closely related to distance based outlier detection methods. Because density is usually defined by the distance, especially Euclidean distance. From the view point of density, the outliers are objects in low density regions. The outlier detection methods based on relative density can process multi density dataset effectively. The time complexity of this kind of methods may reach as high as \(O(m^{2})\), which is comparable to the distance-based approaches. Han et al. (2011) introduced various density-based outlier detection methods according to the difference in density calculation. Liu et al. (2013) proposed a density-based trajectory outlier detection algorithm to compensate the disadvantages of the TRAOD algorithm (Lee et al. 2008a), which is unable to detect anomalous defects when the trajectory is local and dense. Cao et al. (2014) proposed a new density-based local outlier concept based on uncertain data to quickly detect outliers.

Statistic based methods are also called model-based methods, which may use existing models or build a model for evaluating the degree that the data objects fit for the model(s). The most commonly used data models include Gaussian distribution (Shaikh and Kitagawa 2014), multivariate Gaussian distribution (Hazel 2000). Outlier detection using statistic based methods can be quite effective and efficiency when there is sufficient data and priori knowledge. However, this kind of methods are no so ideal when applied to high-dimensional data.

Beside the methods discussed above, there are also some other methods play important roles in the fields of outlier detection, such as nearest neighbor based methods and information theory based methods. Ramirez-Padron et al. (2010) addressed the problem of determining which class of kernels should be used in a geometric framework for nearest neighbor-based outlier detection. Then they introduced the class of similarity kernels and employs it within that framework and proposed the use of isotropic stationary kernels for the case of normed input spaces. Daneshpazhouh and Sami (2014) present a two-phase entropy-based solution to conquer the difficulties that no enough labeled data for training and only a few positive labeled samples are available. Firstly, reliable negative examples are extracted from positive and unlabeled data and then, as the second phase, the entropy-based outlier detection algorithm is employed to detect top N outliers.

2.2 Outlier detection on different data

As aforementioned, outlier detection is widely used in many kinds of fields. With each kind of application, outlier detection algorithms should work on different dataset, such as video data, stream data, sequence data as well as trajectory data.

Video cameras are commonly used devices to capture the motion pictures of moving objects. Due to the high dimension of video data, it is a challenge to find abnormal events in video data. In order to solve the over fitting problem which previously defined Hidden Markov Model (HMM) based similarity may fall in, Jiang et al. (2007) proposed a multi-sample-based similarity measure, where HMM training and distance measuring are based on multiple samples. These multiple training data are acquired by a novel dynamic hierarchical clustering method. By iteratively reclassifying and retraining the data groups at different clustering levels, the initial training and clustering errors due to over fitting will be sequentially corrected in later steps. Aiming at learning motion patterns and detecting anomalies from human trajectories in video data, Suzuki et al. (2007) employed HMM to model spatial and temporal features of human positions in trajectories. In order to solve the problem of high-dimensional, they provided Multi-dimensional Scaling based on eigenvector decomposition to project coordinates of trajectories in low-dimensional space. Then k-means clustering was applied to the projected data to acquire human motion patterns with a formed similarity matrix of HMM mutual distances. Finally, abnormal trajectory can be detected by the use of likelihood scores for HMM representing motion patterns. Outlier detection in video surveillance field is mainly used to find the abnormal moving pattern in the motion pictures. Generally, with moving object matching and other technologies, moving objects’ trajectories can be derived from video data. Therefore, methods of trajectory outlier detection also can be used in this kind of data.

Fault detection and classification is very important for equipment running guarantee and equipment testing in all kinds of areas. Many researchers have paid much attention in finding more effective and efficiency technologies to detect faults of the equipment from their running data. Silva et al. (2006) proposed a novel method for transmission-line fault detection and classification using oscillographic data. The fault detection and its clearing time are determined based on a set of rules obtained from the current waveform analysis in time and wavelet domains. Then an artificial neural network was employed to classify the fault from the voltage and current wave forms pattern recognition in the time domain. Chung and Chung (2015) used Self-Organizing Map (SOM) to discover the base models from the automation system. Then strategies based on data mining were developed to detect and diagnose the faults. Finally, a semi-supervised anomaly detection technique was introduced to construct the condition monitoring system on the basis of classification rules based on standardized data and domain experts’ knowledge. In order to improve the reliability of fault diagnosis, Wang et al. (2013a) developed an approach of merging short-time Fourier transform (STFT), linear fitting and median filtering is to extract the total energy growth rate from rolling bearing during run-up, and the total energy growth rate was adopted to diagnose faults as input vector of radial basis function (RBF) neural network. In the field of fault detection and classification, data collected from the running equipment are often viewed as signals. Moreover, due to importance of detecting fault in time, faults are often detected and classified online using Fast Fourier Transform (FFT) (Wang et al. 2013b) wavelet analysis as well as artificial neural network.

Stream data are very common in nowadays, which means data output constantly from some kinds of data source. For example, continuous video stream, unstopped equipment running data and other data output without interruption. As discussed above, a lot of researches have been being down to optimize or improve the outlier detection in each field. However, tradition algorithms cannot be used in stream data directly, this problem is challenging due to the requirement of high speed data processing within limited space cost (Bu et al. 2009). Facing with the continuous coming video data, Piciarelli and Foresti (2006) proposed an online trajectory clustering algorithm to avoid the classical two-step clustering (data collection and off-line processing). The detected cluster can be used to identify anomalous events for video surveillance system. To improve outlier detection in high-dimensional data stream, Piao et al. (2014) proposed a novel high-dimensional data stream outlier detection algorithm based on angle distribution. Their work can effectively identify the normal point, border point and outlier accurately. Besides, a small-scale calculation set of data stream was established to reduce the computational complexity, and an updated mechanism was developed to solve the problem of concept drift. Said et al. (2015) concentrated on the detection of outliers in data stream using frequent pattern mining technique and proposed an adaptive method for finding outliers in stream of data with a presented outlier measurement. Since the era of big data and artificial intelligence is coming, in the coming future, outlier detection in stream data will become more and more attractive. High processing speed and accuracy with low cost will be the evaluation benchmark of proposed works.

3 Basics of trajectory outlier detection

Most of the works discussed in Sect. 2 are related to trajectory data analysis. However, with the widely used positioning device, all kinds of moving objects can be traced theoretically and their trajectory can be collected and stored. Different from video data and data format in other application fields, trajectory is a sequence data associated with both spatial and temporal information. Moreover, the sequence changes continuously which bring great challenges of storage and analysis. In order to comprehensively discuss the outlier detection for trajectory, in this section, we firstly give the definition of trajectory. Then some commonly used distance metrics are introduced.

3.1 Trajectory definition

A trajectory represents a moving object’s continuous motion history in theory, and it refers to the fine-grained location traces collected via a variety of devices (Ge et al. 2010). Due to the limitation of location positioning devices (such as sensors, GPS devices), a trajectory in real world is a sequence of positions observed at discrete time instances.

Given TD as Trajectory Database which denotes trajectories set, and \(TD= \{TR_{1}, TR_{2}, {\ldots } , TR_{n} \}\). A trajectory (TR) is a chronological sequence consisted of multi-dimensional points, which is denoted by \(TR_{i}= \{P_{1}, P_{2}, {\ldots } , P_{m}\}({1 \le i \le n}). P_{j} (1 \le j \le m)\). A sampling point \(P_{i}\) in a TR is represented as \(< Location_{j}\), \(T_{j}>\), which means that the position of the moving object is \(Location_{j}\) at the sampling time of \(T_{j}\). \(Location_{j }\) is a multi-dimensional location point. A trajectory \(P_{c1}\), \(P_{c2}, {\ldots }, P_{ci} (1\le c1< c2 < {\ldots } \le m)\) represents a trajectory segment or sub-trajectory of a trajectory \(TR_{i}\), denoted as TS(Trajectory Segment), \(TS_{i}=\{L_{i1}, L_{i2}, {\ldots }, L_{inum}\)}.

Generally a raw trajectory point is a 2-dimensional location coordinate (x and y) or a 3-dimensional coordinate with additional height information (h). Conveniently, we assume trajectory point is 2-dimensional coordinate.

3.2 Trajectory distance metrics

Appropriate distance metrics are fairly significant for any distance based outlier detection methods, which are not only limited to the application of trajectory outlier detection field but also play an important role in video surveillance, trajectory clustering and other fields. Predecessors have conducted a lot of research work on distance metrics when analyzing trajectories (Zhang et al. 2006). In this subsection, we will take a brief look at several widely used trajectory distance metrics, including Euclidean distance, Hausdorff distance, Longest Common Subsequence (LCSS) distance and Dynamic Time Warping (DTW) distance. These metrics are proposed for many years and demonstrated be effective and efficiency in many literatures (Table 1).

Table 1 Summary of trajectory distance measurements

3.2.1 Euclidean distance

Euclidean distance is the classical distance metric that has a long history and has a wide range of use in the field of data mining, and can be extended to trajectory analysis. The linear time complexity makes it capable to handle a great quantity of data, which is rather important in trajectory mining. Euclidean distance is very important in moving object data mining, due to its simple implement and parameter free (Clarke 1976; Yuan et al. 2017a, b). Let \(L_{i}\) and \(L_{j}\) are p-dimensional trajectory segments with length of n. Their Euclidean distance denoted as \(D_{E}\) is given in (1).

$$\begin{aligned} D_E (L_i ,L_j )=\frac{1}{n}\sum _{k=1}^n {\sqrt{\sum _{m=1}^p {(a_k^m -b_k^m )^{2}} }} \end{aligned}$$
(1)

The complexity of the algorithm is O(n), making it cost less time on the same dataset compared with other distance measures. However, the Euclidean distance of two trajectories is different from it between raw points. Firstly, the coordinate of each sampling point in the trajectory must be the same dimension. Secondly, the two sampling points with their Euclidean distance to be calculated in two trajectories should be the corresponding positions (at the same time). Thirdly, the Euclidean distance and distance measures can be synthesized in the form of average or sum, maximum, minimum, etc. Finally, two trajectories to be measured must be the same length. However, in real application, the points in trajectories may not be sampled at the same time. Moreover, the two trajectories are often of different length, which limits the use of Euclidean distance. Besides, due to parameter free, the Euclidean distance cannot treat noise well. In order to overcome the shortages of Euclidean distance, many other distance measures are proposed.

3.2.2 Hausdorff distance

Hausdorff distance is used to measure how far two trajectories (segments) are from each other. Informally, two trajectories are close in the Hausdorff distance if every sampling point of either trajectory is close to some points of the other one. Hausdorff distance (Rote 1991; Huttenlocher et al. 1993) denoted as \(D_{H}(L_{i}, L_{j})\) is given in (2).

$$\begin{aligned} D_H\left( {L_i ,L_j } \right) =\max (h(L_i ,L_j ),h(L_j ,L_i )) \end{aligned}$$
(2)

where \(h(L_i ,L_j )=\mathop {\max }\limits _{a\in L_i } (\mathop {\min }\limits _{b\in L_j } (dist(a,b)))\)

In the formula, \(h(L_{i}\), \(L_{j})\) is the direct Hausdorff distance of \(L_{i}\) and \(L_{j}\), and dist(a,b) is the Euclidean distance between sampling points a and bin \(L_{i}\) and \(L_{j}\) respectively. The dist(ab) can be viewed as a simple form of \(D_{E}\). \(h(L_{i}\), \(L_{j})\) and \(h(L_{j}\), \(L_{i})\) are the bidirectional Hausdorff distances between \(L_{i}\) and \(L_{j}\). The sampling point identified from \(L_{j}\) which is nearest to each point in \(L_{i}\), and the point identified from \(L_{j}\) which is most mismatched with \(L_{i}\), determine the value of \(h(L_{i}, L_{j})\). Suppose \(h(L_{i}, L_{j})=d\), is the distance between all points in \(L_{i}\) and all points in \(L_{j}\) is no more than d. \(h(L_{j}, L_{i})\) is in the same way.

The Hausdorff distance between trajectory segments \(L_{i}\) and \(L_{j}\) selects the maximum unidirectional Hausdorff distance from \(L_{i}\) to \(L_{j}\) and from \(L_{j}\) to \(L_{i}\). It can better measure the maximum mismatching degree between two trajectory segments. Hausdorff distance can tolerate the influence caused by the disturbance of points, and have a better robust. But Hausdorff distance is sensitive to noise data. Suppose the length of \(L_{i}\) and \(L_{j}\) are m and n respectively. It can be seen from formula (2) that the complexity of the Hausdorff distance computation is \(\hbox {O}(m*n)\), which means that it requires a large time overhead to handle large-scale trajectory data.

Hausdorff distance is extremely sensitive to noise. For example, a sampling point far away from \(L_{i}\) will lead to quite great value of \(D_{H}(L_{i}\), \(L_{j})\). In this case, obtained distance is unable to represent real difference between the two trajectories. Therefore, several improved Hausdorff distance metrics have been proposed: Huttenlocher et al. (1993) raise the conception of partial Hausdorff distance. Dubuisson and Jain (1994) proposed means Hausdorff distance. Sim et al. (1999) proposed proposes two robust Hausdorff distance measures based on m-estimation and least trimmed square (LTS) which are more efficient than the traditional Hausdorff distance measures.

3.2.3 LCSS distance

Longest Common Sub-Sequence (LCSS) (Robinson 1990) is different from distance calculation and it is used to obtain the longest common sub-sequence existing in two trajectory sequences. The longest common subsequence is generally solved recursively, as shown in (3).

$$\begin{aligned}&D_L (L_i ,L_j )\nonumber \\ {}&\quad =\left\{ {{\begin{array}{c} 0 \\ {1+LCSS_{\sigma ,\varepsilon } (Head(L_i ),Head(L_j ))} \\ {\max (LCSS_{\sigma ,\varepsilon } (Head(L_i ),L_j ),LCSS_{\sigma ,\varepsilon } (L_i ,Head(L_j )))} \\ \end{array} }} \right. {\begin{array}{c} {n=m=0} \\ {|a_i^x -b_j^x |\le \sigma ,and\left| {a_i^y -b_j^y } \right| \le \varepsilon } \\ {others} \\ \end{array} } \end{aligned}$$
(3)

In formula (3), \(D_{L}(L_{i}\), \(L_{j})\) is the longest common subsequence of two trajectory segments \(L_{i}\) and \(L_{j}\) whose length are n and m respectively. Suppose the location of moving objects are recorded in 2-dimention. \(\delta \) and \(\varepsilon \) are user defined distance thresholds of the x-direction and y-direction respectively, which are used to control how far when looking for matches. When the abscissa difference and ordinate difference between two trajectories A and B is respectively less than \(\sigma \) and \(\varepsilon \), the pair of trajectory points is considered similar and the value of LCSS is increased by 1. If the number of points of trajectory \(L_{i}\) and \(L_{j}\) are both 0, then \(D_{L}(L_{i}\), \(L_{j})\) is 0.

LCSS allows certain deviation existing in sampling data. Therefore, the LCSS is effective and efficient in practical application. However, LCSS is over-reliance on two user parameters \(\sigma \) and \(\varepsilon \), so how to determine the two optimal parameters is a difficult problem. The complexity of LCSS computation is \(O(m*n)\).

3.2.4 DTW distance

Dynamic Time Warping (DTW) method (Sankoff and Kruskal 1983) is a well-known technique to find an optimal alignment between two given (time-dependent) sequences under certain restrictions. DTW is suitable for matching trajectories even if they are of different length. Its goal is to find the warping path w between two trajectories with the smallest warping cost. DTW distance is specifically defined as that in the case of ensuring the order of trajectory points. It completes the local scaling of time dimension by repeating the previous points, and makes the minimum distance between trajectories as DTW distance. DTW distance can be represented by (4).

$$\begin{aligned} D_D (L_i ,L_j )=\left\{ \begin{array}{cc} 0 &{} {m=n=0}\\ \infty &{} {m=0||n=0}\\ dist(a_i^k ,b_j^k )+\min \left\{ \begin{array}{cc} {D_D (Rest(L_i ),Rest(L_j ))} \\ {D_D (Rest(L_i ),L_j )} \\ {D_d (L_i ,Rest(L_j ))} \\ \end{array}\right. &{} {others}\\ \end{array}\right. \end{aligned}$$
(4)

Here, \(D_{D}(L_{i}\), \(L_{j})\) is the DTW distance between two trajectory segments \(L_{i}\) and \(L_{j}\) whose length are m, n respectively. dist(\(a_{i}\), \(b_{j})\)is the Euclidean Distance between two points \(a_{i}\) and \(b_{j}\). Rest(\(L_{i})\) and Rest(\(L_{j})\) are the remaining trajectory space after removing the first sampling point of trajectory segment \(L_{i}\) and \(L_{j}\). As given in formula (4), when the number of points of trajectories \(L_{i}\) and \(L_{j}\) both are 0, the DTW distance (\(D_{D})\) is 0. When the count of points in either trajectory is 0, the DTW distance is \(\infty \). When the number of trajectory points both are not 0, the DTW distance is the minimum distance between trajectories calculated by a recursive method.

DTW distance can better find similar trajectories after the local scaling of time dimension. It effectively solves the problems of different sampling rates and inconsistent timescales. But it requires trajectory data points must be continuous when calculating the DTW distance Therefore, DTW distance is sensitive to noise. In addition, if the two trajectories are completely dissimilar in a small range, the method cannot identify DTW distance. It can be seen from formula (4), the algorithm also requires a large time cost, and its complexity is \(\hbox {O}(m*n)\). Therefore it’s unsuitable for large-scale trajectory data processing. In order to improve the efficiency of DTW, Keogh and Pazzani (2000) put forward an improved algorithm PDTW, which makes it possible to deal with large amount of data.

3.2.5 Other distance

In order to find outliers in moving object database comprehensively, various metrics were proposed on the basis of traditional distance. For example, Yuan et al. (2011) proposed a structural distance including the comparison of direction, angle, speed, and density. The structural distance can measure two trajectories (segments) in the view of microcosm. In order to cope with the spatial difference of different trajectories, Liu et al. (2012) proposed a relative distance to compare trajectory segments.

4 Trajectory outlier detection algorithms

Trajectory outlier detection plays an important role in many fields, such as climate monitoring, transportation management. According to the well-known saying “one person’s noise could be another person’s signal”, the outliers may be of particular interest (Lee et al. 2008a). Knorr et al. (2000) introduced the concept of distance-based outlier, and present an algorithm to discovery outliers from trajectory dataset. His research is recognized as the first study to detect trajectory outlier. Since then, all kinds of abnormal trajectory detection algorithms have been proposed. Zheng and Zhou (2011) and Zheng (2015) gave some brief introduction on anomalies detection from trajectories. In this paper, we divide these researches into two categories: one is the anomaly detection algorithms based on trajectory analysis, and the other is trajectory outlier detection algorithms. The former is not the focus of our research and we just provide an overview in the related work section. While the latter is the core of our study, and we will investigate and summarize the representative algorithms.

Despite the importance of trajectory outlier detection, it has not been paid enough attention. So far, studies on trajectory outlier detection are relatively few. Among the few researchers, Professor Han Jiawei’s group has carried out extensive and in-depth research on trajectory data mining. For example, Lee et al. (2007) proposed a novel partition-and-group framework to cluster trajectory segments. Based on the similar framework, Lee et al. (2008a, b) proposed a partition-and-detect framework to find outlier trajectory segments in moving object database. Li et al. (2007) proposed a rule and motif-based anomaly detection algorithm to find outliers in massive moving object datasets. Li et al. (2009) proposes a method for detecting temporal outliers with an emphasis on historical similarity trends between data points. The proposed algorithms in Lee et al. (2007) have an important influence on trajectory clustering and trajectory outlier detection (Lee et al. 2008a; Li et al. 2006, 2007). Li et al. (2007) proposed a classifier-based abnormal trajectory detection called ROAM (Li et al. 2007) (Rule and Motif-Based Anomaly Detection in Moving Objects) in 2007. In the algorithm, trajectories are expressed using discrete pattern fragments called motifs which are collected via k-means clustering method, and then a classifier is employed for anomaly detection. Soon later, to overcome the shortcoming of ROAM, a new framework called partition-and-detection is proposed. Based on the framework, they develop a trajectory outlier detection algorithm TRAOD. TRAOD utilizes the line segment Hausdorff distance (Sim et al. 1999), which comes from pattern recognition field, to measure the distance between two sub-trajectories. At the same time, two-level trajectory partitioning ensures both high quality and high efficiency. Professor Han Jiawei’s group open the doors of the trajectory outlier detection research. Their work attracts more and more scholars to carry out such investigation.

4.1 Processes of trajectory outlier detection

Base on the analysis of the existing researches on trajectory outlier detection, we summarize the main processes of trajectory outlier detection, which is given in Fig. 2.

Fig. 2
figure 2

Processes of trajectory outlier detection

A trajectory outlier detection system generally consists of three core components, including trajectory preprocessing, representation transformation and trajectory outlier detection. Firstly, trajectory preprocessing is conducted to deal with a number of issues, such as noise filtering, segmentation, and map-matching. This stage is also a fundamental step of many trajectory data mining tasks. At the second step, raw trajectory data are transformed into appropriate representation and the output will be used as the input of third step. In the detection phase, outlier detection algorithms are introduced or proposed to discovery outliers of trajectories or their segments. These components will be discussed in detail in the following sub-sections.

4.2 Classification of trajectory outlier detection algorithms

An outlier is an observation point that is distant from other observations (Yuan et al. 2011, 2012b). This definition is often subjective and depends heavily on the context. The definition of a trajectory outlier should be neutral to applications. However not everyone has the same idea of what should be considered as a trajectory outlier, and not all datasets conform the same definitions or rules (Lee et al. 2008a). We divide trajectory outlier into four different categories according to application scenarios as follows.

Distance-based outlier A distance-based outlier in a dataset T is an object O of T such that at least the fraction p of the objects in T lies greater than distance D from O. In Fig. 3, \(TR_{1}\) is far away from \(TR_{6}\) and \(TR_{7 }\) and should be regarded as outlier if p is set to 2. The performance of distance based outlier detection algorithm depends heavily on a suitable distance metric. We discuss each of these measurements in detail in the following section.

Fig. 3
figure 3

An example of outlying trajectories

Density-based outlier A density-based outlier in a dataset T is an object O of T such that density of O less than density threshold d. A distance-based outlier always needs to select a global distance threshold, so the outlying trajectories detected can be regard as global outliers. However some datasets present complex distributed characteristic. Their outliers are correlated with the density of their neighbors. These can be regarded as local abnormal trajectories. Just like \(TR_{6}\) and \(TR_{7}\) in Fig. 3, both of them will not be viewed as an outlier in terms of distance. But when it comes to density they are abnormal.

Feature-based outlier A feature-based outlier is a trajectory that is significantly different from others in a dataset in terms of trajectory feature, such as velocity and direction. For example, the trajectory \(TR_{2}\) in Fig. 3 is an outlier as it moves in the opposite direction. Similarly, we can detect outlier trajectory whose velocity is abnormal.

Sub-trajectory outlier Partition a trajectory into a set of trajectory partitions and then detects abnormal sub-trajectories, the detected trajectory partitions are so-called sub-trajectory outlier. Different from what discussed above, this kind of outlier is detected by comparing trajectory partially. The shortcoming of methods based on comparing trajectory as a whole is that they are often unable to detect outlying portions of the trajectories. As it is illustrated on Fig. 3, the partition in bold of \(TR_{4}\) is an outlying sub-trajectory.

Activity outlier The trajectories of moving objects are the reflection of their activities in the physical world (Ashbrook and Starner 2003). Therefore, through analyzing their trajectories, moving objects’ activity outliers can be detected. Basharat et al. (2008) learned objects’ motion patterns to find their anomaly activities through analyzing the moving trends of trajectories. Li et al. (2011a) detected abnormal behavior via sparse reconstruction analysis of trajectory. Li et al. (2010) developed a moving object mining system to find different activities in trajectory database.

4.3 Trajectory outlier detection algorithms

An outlier trajectory is a trajectory or a part of trajectory that is obviously different from others in a trajectory dataset. We groups previously developed trajectory outlier detection into four major categories. In this section, we take a brief look at these methods.

4.3.1 Distance-based algorithms

Distance-based measures are famous in clustering where they are used to quantify the similarity of objects. Outliers can be discovered as a side-effect of clustering. They are objects lying outside clusters.

Knorr et al. (2000) introduced the concept of distance-based outlier. His study is recognized as earliest research on trajectory outlier detection. They transform each trajectory into a composite object, which consists of four key features: the position, the length, the direction and the velocity. Then a traditional distance-based outlier detecting method is employed to discovery outlying trajectories. The shortcoming of this method is that it may not be able to detect outlying sub-trajectory since the differences are averaged out over the whole trajectory.

To solve the problem raised by comparing whole trajectory, Lee et al. (2008a). introduce a partition-and-detect framework and present an abnormal trajectories detection algorithm TRAOD based on it. TRAOD contains two phases: partition phase and detecting phase. A trajectory is partitioned in coarse granularity firstly and then in fine granularity. In detecting stage, outlying trajectory partitions are detected by a distance-based method. The line segment Hausdorff distance (Sim et al. 1999) is employed to measure the distance between two partitions.

The distance metric used in TRAOD only involves the dissimilarity in the shape and local motion pattern of sub-trajectories, the common deviation between sub-trajectories is not eliminated. Liu et al. (2010) proposed a novel distance function, which derived from Minimum Hausdorff distance. This distance function uses k consecutive points as basic unit to calculate local outlier of trajectory. Based on local outlier, two trajectories are detected whether they are globally similar or not. While target trajectory doesn’t have enough similar trajectories, it will be viewed as outlying trajectory. What’s more, the paper introduces R-Tree to improve the algorithm performance.

All distance measures used in these methods treat trajectory as static data sequence, and did not take valuable features into consideration, for example, the starting time of trajectory may effect on trajectory shape, location, and other features. To address this problem, Yuan et al. (2011) proposed a Trajectory Outlier Detection algorithm based on Structural Similarity (TOD-SS). TOD-SS consider more trajectory features, and the importance of each feature can be adjusted by their weights. The structural similarity not only can better reflect the differences of both internal and external features, but also can strengthen analysis effect.

4.3.2 Density-based algorithms

As mentioned above, every distance-based algorithm needs a user specified global distance threshold, which is hard to choose a reasonable value. The outlying trajectories detected with a global distance threshold can be regard as global outliers. However some datasets present complex distributed characteristic. Their outliers are correlated with the density of their neighbors. These can be regarded as local abnormal trajectories.

In order to overcoming the weaknesses of distance-based method, Liu et al. (2013) put forward the density-based trajectory outlier detection (DBTOD). DBTOD takes advantage of TRAOD and introduce the concept of trajectory density, which takes account of the distribution of neighborhood objects. Density of trajectory is make up of two components: the distance between the sub-trajectories and the number of sub-trajectories within the given ranges.

The DBTOD algorithm makes full use of the partition-and-detect framework, and consists of partition stage and detection stage. Each trajectory is partitioned into a set of t-partition using the same trajectory partition algorithm as (Lee et al. 2008a) used. Then a density-based, instead of distance-based, detection algorithm is employed to discovery abnormal sub-trajectories

Their method has two advantages over distance-based algorithm. First, it is able to detect both anomalous sub-trajectories and anomalous local trajectories. What’s more, it overcomes the sensitive parameter problem of distance-based methods.

In order to speed up the process of trajectory outlier detection, some index technologies and storage structures should be employed. For example, in order to optimize the proposed method and eliminate unrelated trajectory segments, R-tree is commonly used data structure to store trajectories (segments) (Liu et al. 2009; Guan et al. 2012).

4.3.3 Classification-based algorithms

A classification-based trajectory outlier detection algorithm, called ROAM (Rule- and Motif-based Anomaly Detection in Moving Objects) was proposed by Li et al. (2007). In this algorithm, common patterns called motifs are used to represent trajectory. The framework consists of three components: (1) Motif-based feature space. Trajectories are partitioned into motifs and construct a multi-dimensional feature space oriented on the motif with associated attribute. (2) Automated hierarchy extraction. By examining the pattern in the tracks, the hierarchies in the feature space were derived. (3) Hierarchical rule-based classifier. A hierarchical rule-based classifier is a rule-based classifier which explores the hierarchical feature space and finds the effective regions for classification.

Same as other classification-based outlier detection algorithm, it’s difficult to find a standard training data set for ROAM. Besides, when handling a complex trajectory, the motif-based trajectory might cover local anomaly. Therefore some outlying partitions might escape detection.

4.3.4 Evolving trajectory outlier detection

Ge et al. (2010) proposed an evolving trajectory outlier detection approach, called TOP-EYE, to identify evolving outliers at very early stage with relatively low false alarm rate. The algorithm continuously computes the outlying score for each trajectory in an accumulating way. The outlying score can be defined based on moving direction or density of trajectories.

First, unlike traditional vector date, trajectories are complex data for which traditional distance measures and outlying score functions cannot be used to directly. A data transformation process is applied to cope with the difficulty. Specifically, they discretize the continuous space into small grids.

For detecting direction outliers, they use a probabilistic model to turn the direction information of trajectories in a grid into a vector with eight values indicate the probabilities of moving toward eight directions with this grid. Therefore, they can generate the direction trend within a fixed area by summarizing the moving directions from large amounts of trajectories for a period of time. Then, once some objects move across this along the completely different direction from the summarized directions, they can detect them as outliers in a real-time fashion by measuring the similarity between the directions of the observed objects and summarized directions.

For detecting density outliers, they compute the density of trajectories in each grid conveniently. The trajectory density with each grid is estimated as the number of trajectories across this grid. They can obtain the trajectory density distribution with sufficient historical trajectory data. The outlying score of a new trajectory can then be measured based on the density of trajectories in the grid where this trajectory actually passes.

The main idea of TOP-EYE is that abnormal behavior of moving object is gradually reflected in its moving characteristics represented by its trajectory. Therefore, it is possible to detect the trajectory outliers in an evolving way combining the instant abnormity of the object with the influence of its prior movement. They introduce an exponential decay function to control the influence. As both historical and instant abnormity are taken into count, temporarily disturbance won’t affect the final result. In addition, TOP-EYE has ability to identify evolving outlier at very early stage with relatively low false alarm rate. What’s more, the method can save a lot of time cost because only conduct computation on the grid level.

5 Application scenarios

Trajectory outlier detection is widely used in numerous spatial-temporal application scenarios, such as climate, animal or traffic detection. In this section, we introduce some typical application scenarios to intuitively describe the meaning of studying trajectory outlier detection.

  1. (1)

    Abnormal climate detection is a typical application scenario of spatial-temporal outlier detection. There are mainly two ways to find abnormal climate. One is finding outlier (sub) trajectories of various storm, typhoon as well as other meteorological data (Lee et al. 2008a). The other is finding the trends or periods of extreme climate using spatial and temporal weighted regression (Fotheringham et al. 2015). Lee developed a trajectory outlier detection algorithm TRAOD. Their algorithm consists of two phases: partitioning and detection. For the first phase, they proposed a two-level trajectory partitioning strategy to partition a trajectory into a set of line segments, which ensures both high quality and high efficiency. For the second phase, they present a hybrid of the distance-based and density-based approaches. They conducted experiments on two real data sets (the hurricane track data set and the animal movement data set), and the results demonstrate that TRAOD correctly detects outlying sub-trajectories from real trajectory data. While Fotheringham present a novelty model called geographical and temporal weighted regression (GTWR) to account for local effects in both space and time. In their work, they firstly detailed both the model formulation and the estimation of GTWR focusing on spatiotemporal kernel function definition and spatiotemporal bandwidth optimization. Then the performance of GTWR was examined by comparison with basic geographical weighted regression models. Finally, they discussed and drew some conclusions on the highlight of effectiveness and potential superiority of GTWR in local spatiotemporal modeling.

  2. (2)

    Animal habit analysis is a very important application scenario of moving object trajectory analysis. Li et al. (2011b) developed an efficient prototype system called MoveMine to discover animal movement patterns from animal trajectory data. Animal abnormal habit or moving tendency, which may be events or observations (represented by a collection of trajectories) that do not conform to an expected pattern, may greatly attract the attention from biologists or even botanists. MoveMine is constructed that integrates various data mining functions, including sophisticated pattern mining and trajectory analysis. Two interesting moving object pattern mining functions are developed in MoveMine: i. Periodic behavior mining and ii. Swarm pattern mining. The system is tested on real datasets and its results are visualized in Google Map and Google Earth. With MoveMine, users can be facilitated when analyzing moving object data.

  3. (3)

    Urban traffic detection is very important for city life. Both spatial and temporal outliers can be detected from traffic data. Through analyzing the trajectories of moving objects in urban area or with the constraint of road network, different traffic flow can be identified, which are very helpful for smart city plan and smart transportation management. Li et al. (2009) proposed a Temporal Outlier Discovery (TOD) framework for detecting temporal outliers in vehicle traffic data. In their work, agglomerated temporal information of the entire dataset is utilized to detect outliers. At each time step, each road segment checks its similarity vs. other road segments, and the historical similarity values are recorded in a temporal neighborhood vector at each road segment. Outliers are calculated from drastic changes in these vectors.

  4. (4)

    Urban safety management has been being considered as a new important application of trajectory moving mining. The movement of crowds in a city is strategically important for risk assessment, and public safety management (Hoang et al. 2016). Therefore, through analyzing the abnormal status of traffic flow and the communication base station, the direction and density of moving crowds can be extracted, and quick response to the urban safety management can be provided to the city administrator. The work of Hoang et al. (2016) is quite different from the predictions of each individual’s movements and each road segment’s traffic conditions, which are computationally costly and not necessary from the perspective of public safety on a citywide scale. In their work, they developed a practical solution for citywide traffic prediction. Firstly, they partition the map of a city into regions using both its road network and historical records of human mobility. In order to model the multiple complex factors affecting crowd flows, they decompose flows into three components: seasonal patterns, trend, and residual flows. The seasonal and trend models are built as intrinsic Gaussian Markov random fields which can cope with noisy and missing data, whereas the residual model exploits the spatiotemporal dependence among different flows and regions, as well as the effect of weather. Experiment results on three real-world datasets (Beijing Taxi GPS Data, New York Taxi Pickup Data, and New York Bike Rent Data) show that their method is scalable and outperforms all baselines significantly in terms of accuracy.

6 Conclusion

Outlier detection is an important way to find surprised data or pattern. In this paper, the research status and new development of trajectory outlier detection algorithms in recent years have been surveyed and summarized. Firstly, some recently proposed studies on outlier detection algorithms and researches on different application domain are analyzed and summarized from algorithmic thinking, key technology and the advantages and disadvantages. Then, popular distance measures are discussed. Thirdly, comprehensive analysis on trajectory outlier detection algorithms are discussed. Lastly, some application scenarios are pointed out and discussed. Last but not least, we present several important and classic algorithms. These algorithms are significant references for carrying out research work in the field of detecting abnormal trajectory.