
1 Introduction

Recent advances in telecommunication, sensing, and recording technologies allow for storing positions from moving objects at large scales in (near) real time. Analysing positional data streams is highly important in many applications; examples range from navigation and routing systems, network traffic, animal migration/tracking, movements of avatars in computer games to tactics in team sports.

In this paper, we focus on identifying frequent movement patterns in positional data streams that consist of a possibly infinite sequence of coordinates. Existing approaches to frequent pattern mining [3, 20, 29] use identities of atomic events to define sequences (episodes) [15, 17]. In positional data, events correspond to sequences of positions (i.e., trajectories), and due to the continuous domain it is very unlikely to observe a trajectory twice. Instead, we observe a multitude of different trajectories that give rise to an exponentially growing set of possibly frequent sequences. Consequently, mining positional data can only be addressed in the context of big data.

Our contribution is threefold: (i) To remedy the absence of matching atomic events, we propose an efficient preprocessing of the positional data using locality sensitive hashing and approximate dynamic time warping. (ii) To process the resulting near-neighbour trajectories, we present a novel frequent pattern mining algorithm that generalises Achar et al. [1] to positional data. (iii) We present a distributed algorithm for processing positional data at large scale. Empirically, we evaluate all stages of our approach on positional data of a real soccer game where cameras and sensors realise a bird’s eye view of the pitch that allows for locating the players and the ball several times per second.

The next section reviews related work. Section 3 introduces the representation and the efficient computation of near neighbours. We present our algorithm to detect frequent trajectories in Sect. 4. Section 5 reports on empirical results and Sect. 6 concludes.

2 Related Work

Spatio-temporal data mining aims to extract the behaviour and relation of moving objects from (positional) data streams and is frequently used in computational biology for mining animal movements. In [11], the authors aim to detect closely moving objects. Although composition and location of the group may change over time, the identity of the group is considered fixed. Such groups of objects are called moving clusters.

Trajectory-based patterns are first introduced by [6]. These patterns represent a set of individual trajectories that share the property of visiting the same sequence of places within similar travel time. Trajectory-based approaches use a discretisation of the movements to identify places that are also known beforehand. Our contribution considers a continuous generalisation: every coordinate on the pitch is a place of interest and trajectories are relations between coordinates and travel time.

Event sequence mining has been introduced by [3] as a problem of mining frequent sequential patterns in a set of sequences. Sequential pattern mining discovers frequent subsequences as patterns in a sequence database. The most common example is the cart analysis proposed by [3]. Their approach discovers items that are often bought together by customers. Efficient generalisations have been proposed to mine large sequence databases [20, 29]. A special case of sequential pattern mining that aims at closed patterns is introduced in [28]. Frequent episode discovery algorithms [17] summarise techniques to describe and find patterns in a stream of events. Applications in several domains have been developed like manufacturing [15], telecommunication [17], biology [19], and text mining [9]. However, these algorithms are restricted to specific types of episodes. Achar et al. [1] propose the first approach to mine unrestricted episodes. Our approach generalises [1] to mining positional data streams.

Data mining for team sports mostly focuses on event recognition in videos [5, 27] or the conversion of video data into a positional data stream [10]. Reference [10] also report on simple descriptive statistics including heat maps and passing accuracies. Reference [12] uses positional data to assess player positions in particular areas of the pitch, such as catchable, safe or competing zones. Prior work also utilises positional data of basketball games [21] and military operations [24] to identify tactical patterns, respectively. However, the presented approaches focus on detecting a priori known patterns in the data stream. By contrast, we devise a purely data-driven approach to find frequent patterns in positional data without making any assumptions on zones, tasks or movements.

3 Efficiently Finding Similar Movements

3.1 Representation

Consider a positional data stream \(\mathcal {D}\) with \(\ell \) objects \({\varvec{o}}_1,\ldots ,{\varvec{o}}_\ell \). Every object \({\varvec{o}}_i\) is represented by a sequence of coordinates \(\mathcal {P}_{i}=\langle {{\varvec{x}}}^i_1,{{\varvec{x}}}^i_2,\ldots \rangle \) where \({{\varvec{x}}}^i_t=(x_{1},x_{2},\ldots ,x_d)^\top \) denotes the position of the \(i\)-th object in \(d\)-dimensional space at time \(t\). A trajectory or movement of the \(i\)-th object is a contiguous subsequence \({\varvec{p}}_{[t,t+m]}\) of the stream, i.e., \({\varvec{p}}_{[t,t+m]}=\langle {{\varvec{x}}}^i_t,{{\varvec{x}}}^i_{t+1},\ldots ,{{\varvec{x}}}^i_{t+m}\rangle \), where \(m\) is the length of the trajectory. In the remainder, the time index \(t\) is omitted and the elements of a trajectory are indexed by offsets \(1,\ldots ,m\).

For generality, we focus on finding similar trajectories where (i) the exact location of a trajectory does not matter (translation invariance), (ii) the range of the trajectory is negligible (scale invariance), and (iii) turns to the left or to the right are considered identical (rotation invariance). Note that, depending on the application at hand, one or more of these requirements may be inappropriate and can be dropped by altering the representation accordingly. Requirements (i)–(iii) give rise to the so-called angle/arc-length representation [26], which encodes a movement as a list of tuples of angles \(\theta _t\) and movement vectors \({\varvec{v}}_t ={{\varvec{x}}}_t - {{\varvec{x}}}_{t-1}\). The difference \({\varvec{v}}_t\) is called the movement vector at time \(t\), and the angles are computed with respect to a (randomly drawn) reference vector, here \({\varvec{v}}_{ref}=(1,0)^\top \),

$$\begin{aligned} \theta _t = \mathrm {sign}({\varvec{v}}_t,{\varvec{v}}_{ref}) \left[ \cos ^{-1}\left( \frac{{\varvec{v}}_t^\top {\varvec{v}}_{ref}}{\Vert {\varvec{v}}_t\Vert \, \Vert {\varvec{v}}_{ref}\Vert }\right) \right] , \end{aligned}$$

where the \(\mathrm {sign}\) function computes the direction (clockwise or counterclockwise) of the movement with respect to the reference.

Additionally, transformed trajectories are normalised by subtracting the average angle, so that \(\theta _t\in [-\pi, +\pi ]\) for all \(t\), and by normalising the total distance to one. Finally, we discard the difference vectors and represent trajectories solely by their sequences of angles, \({\varvec{p}}\mapsto \tilde{\varvec{p}}= \langle \theta _1, \ldots , \theta _m\rangle \).
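To make the representation concrete, the following minimal Python sketch maps a trajectory of coordinates to its angle sequence; the function name, the use of NumPy, and the handling of parallel movement vectors are our own choices rather than part of the original method.

```python
import numpy as np

def angle_sequence(traj, v_ref=np.array([1.0, 0.0])):
    """Angle/arc-length sketch: map m+1 coordinates to m signed angles.

    Assumes consecutive positions differ (non-zero movement vectors).
    The sign is the z-component of the cross product between v_ref and
    the movement vector; the centring follows the normalisation above.
    """
    traj = np.asarray(traj, dtype=float)
    v = np.diff(traj, axis=0)                     # movement vectors v_t = x_t - x_{t-1}
    norms = np.linalg.norm(v, axis=1) * np.linalg.norm(v_ref)
    cos = np.clip(v @ v_ref / norms, -1.0, 1.0)   # clip guards against rounding errors
    sign = np.sign(v_ref[0] * v[:, 1] - v_ref[1] * v[:, 0])
    theta = np.where(sign == 0.0, 1.0, sign) * np.arccos(cos)
    return theta - theta.mean()                   # centre the angles around zero
```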

3.2 Approximate Dynamic Time Warping

Recall that pairs of trajectories may contain phase shifts, that is, a movement may begin slowly and then speed up while another starts fast and then slows down towards the end. Such phase shifts are well captured by alignment-based similarity measures such as dynamic time warping [22].

Dynamic time warping (DTW) is a non-metric distance function that measures the distance between two sequences and is often used to compare time-related signals in biometrics or speech recognition. Let \({\varvec{s}}=\langle s_1,\ldots ,s_n\rangle \) and \({\varvec{q}}=\langle q_1,\ldots ,q_m\rangle \) be two sequences and let \(cost(s_i,q_j)\) be a cost function detailing the cost of matching \(s_i\) with \(q_j\). The goal of dynamic time warping is to find an alignment \(\varLambda =\{(\eta _i, \mu _i)\}\) with \(\eta _i \in [1,n]\) and \(\mu _i \in [1,m]\) of the sequences \({\varvec{s}}\) and \({\varvec{q}}\) that has minimal cost subject to the constraints that (i) the alignment starts at position \((1,1)\) and ends at position \((n,m)\) (boundary), (ii) step \((\eta _i,\mu _j)\) following step \((\eta _k,\mu _l)\) implies \(i-k \le 1\) and \(j-l \le 1\) (continuity), and (iii) step \((\eta _i,\mu _j)\) following step \((\eta _k,\mu _l)\) implies \(i-k \ge 0\) and \(j-l \ge 0\) (monotonicity). An alignment that fulfils these conditions is given by [18] using \(g(\emptyset , \emptyset ) = 0\), \(g({\varvec{s}}, \emptyset ) = g(\emptyset , {\varvec{q}}) = \infty \), and

$$\begin{aligned} g({\varvec{s}}, {\varvec{q}}) = cost(s_1, q_1) + \min \left\{ \begin{array}{l} g({\varvec{s}}, \langle q_2,\ldots ,q_m\rangle ) \\ g(\langle s_2,\ldots ,s_n\rangle , {\varvec{q}}) \\ g(\langle s_2,\ldots ,s_n\rangle ,\langle q_2,\ldots ,q_m\rangle ) \end{array} \right\} . \end{aligned}$$

The cost function \(cost\) is arbitrary and any metric or non-metric distance function can be used. Dynamic time warping has a complexity of \(\mathcal O(|{\varvec{s}}|\, |{\varvec{q}}|)\), which is prohibitive for mining positional data streams.
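A direct dynamic-programming sketch of the recursion may look as follows; the absolute difference as the default cost function is an assumption, and any other (non-)metric cost can be plugged in.

```python
import numpy as np

def dtw(s, q, cost=lambda a, b: abs(a - b)):
    """Exact DTW via dynamic programming; O(|s||q|) time and space.

    g[i, j] holds the cost of the best alignment of the first i elements
    of s with the first j elements of q; the infinity initialisation
    encodes g(s, empty) = g(empty, q) = infinity.
    """
    n, m = len(s), len(q)
    g = np.full((n + 1, m + 1), np.inf)
    g[0, 0] = 0.0                                 # g(empty, empty) = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i, j] = cost(s[i - 1], q[j - 1]) + min(
                g[i - 1, j],                      # advance in s only
                g[i, j - 1],                      # advance in q only
                g[i - 1, j - 1])                  # advance in both
    return g[n, m]
```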

Many methods have been proposed to speed up the computation of dynamic time warping. Besides global constraints (e.g., [8, 23]), efficient approximations can be obtained by lower bounds. The rationale is that lower bound functions can be computed in less time and are therefore often used as pruning techniques in applications like indexing or information retrieval; the exact DTW computation only needs to be carried out if the lower bound value is below a given threshold. We make use of two lower bound functions, \(f_{kim}\) [14] and \(f_{keogh}\) [13], defined on sequences that are assumed to have equal length \(m\) (as is the case for our fixed-size trajectory windows). The bound \(f_{kim}\) focuses on the first, last, greatest and smallest values of two sequences [14] and can be computed in \(\mathcal O(m)\):

$$\begin{aligned} f_{kim}({\varvec{s}}, {\varvec{q}}) = \max \left\{ | s_1- q_1 | , | s_m - q_m | , | \max ({\varvec{s}}) - \max ({\varvec{q}}) | , | \min ({\varvec{s}}) - \min ({\varvec{q}}) | \right\} \!. \end{aligned}$$

If the greatest and smallest entries are normalised to a specific value, their computation can be ignored and the time complexity reduces to \(\mathcal O(1)\). The second lower bound, \(f_{keogh}\) [13], uses minimum values \(\ell _i\) and maximum values \(u_i\) of sub-sequences of the query \({\varvec{q}}\), given by

$$\begin{aligned} \ell _i = \min (q_{i-r}, \ldots , q_{i+r}) \quad \text {and}\quad u_i = \max (q_{i-r}, \ldots , q_{i+r}), \end{aligned}$$

where \(r\) is a user-defined threshold. Trivially, \(u_i \ge q_i \ge \ell _i\) holds for all \(i\), and the lower bound \(f_{keogh}\) is given by

$$\begin{aligned} f_{keogh}({\varvec{q}},{\varvec{s}}) = \sqrt{\sum _{i=1}^m c_i}\quad \text {with} \quad c_i = \begin{cases} (s_i - u_i)^2 &{} \text {if } s_i > u_i \\ (s_i - \ell _i)^2 &{} \text {if } s_i < \ell _i \\ 0 &{} \text {otherwise,} \end{cases} \end{aligned}$$

which can also be computed in \(\mathcal O(m)\). The result is a non-metric distance function: it satisfies all properties of a metric except the triangle inequality.
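Both bounds are straightforward to implement; the sketch below assumes equal-length angle sequences, and the default band radius r = 5 is an arbitrary choice of ours.

```python
import numpy as np

def f_kim(s, q):
    """O(m) bound from the first, last, greatest and smallest values."""
    return max(abs(s[0] - q[0]), abs(s[-1] - q[-1]),
               abs(max(s) - max(q)), abs(min(s) - min(q)))

def f_keogh(q, s, r=5):
    """Envelope-based bound; r is the user-defined window threshold."""
    q, s = np.asarray(q, float), np.asarray(s, float)
    total = 0.0
    for i in range(len(q)):
        window = q[max(0, i - r):i + r + 1]       # sub-sequence around q_i
        l_i, u_i = window.min(), window.max()     # envelope of the query
        if s[i] > u_i:
            total += (s[i] - u_i) ** 2
        elif s[i] < l_i:
            total += (s[i] - l_i) ** 2
    return float(np.sqrt(total))
```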

3.3 An \(N\)-Best Algorithm

Given a trajectory \({\varvec{q}}\in \mathcal {D}\), the goal is to find the most similar trajectories in \(\mathcal {D}\). A straightforward approach computes the DTW distance between \({\varvec{q}}\) and every trajectory in \(\mathcal {D}\) and sorts the outcomes accordingly. However, this requires \(|\mathcal {D}|\) DTW computations, each of which is quadratic in the length of the trajectories, and is clearly infeasible. Algorithm 1 computes the \(N\) most similar trajectories for a given query \({\varvec{q}}\) efficiently by making use of the lower bound functions \(f_{kim}\) and \(f_{keogh}\). Lines 2–9 compute the DTW distances of the first \(N\) entries in the database and store the entry with the highest distance to \({\varvec{q}}\). Lines 10–21 loop over the remaining trajectories in \(\mathcal {D}\), first applying the lower bound functions \(f_{kim}\) and \(f_{keogh}\) to efficiently filter irrelevant movements before computing the exact DTW distance for the remaining candidates. Every trajectory realising a smaller DTW distance than the current maximum replaces its peer; \(maxdist\) and \(maxind\) are updated accordingly. Note that the complexity of Algorithm 1 is linear in the number of trajectories in \(\mathcal {D}\). In the worst case, the sequences are sorted in descending order of DTW distance, which requires computing all DTW distances; in practice, however, much lower run-times are observed. A crucial factor is the tightness of the lower bound functions: the better they approximate the DTW, the better the pruning. For \(N = 1\), the maximum value drops quickly towards the lowest possible value. By contrast, setting \(N = |\mathcal {D}|\) requires computing the exact DTW distances for all entries in the database. Hence, \(N\ll |\mathcal {D}|\) is required to reduce the overall computation time. The computation can trivially be distributed with Hadoop: distances are computed in the mapper and sorting is done in the reducer.

[Algorithm 1]
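A condensed sketch of the search loop is given below; it assumes the dtw, f_kim and f_keogh functions from the previous listings and replaces the maxdist/maxind bookkeeping of Algorithm 1 with a max-heap, which is an implementation choice of ours.

```python
import heapq

def n_best(q, database, N):
    """Return the N (distance, index) pairs with smallest DTW distance.

    Sketch of Algorithm 1, assuming N <= len(database): seed with the
    exact distances of the first N entries, then prune the remaining
    entries with the cheap lower bounds before falling back to DTW.
    """
    heap = [(-dtw(q, database[i]), i) for i in range(N)]   # max-heap via negation
    heapq.heapify(heap)
    for i in range(N, len(database)):
        s, maxdist = database[i], -heap[0][0]
        if f_kim(q, s) >= maxdist:        # cheap filter first
            continue
        if f_keogh(q, s) >= maxdist:      # tighter filter second
            continue
        d = dtw(q, s)                     # exact distance only when needed
        if d < maxdist:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-negd, i) for negd, i in heap)
```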

3.4 Distance-Based Hashing

Locality sensitive hashing (LSH) [7] provides an alternative to the \(N\)-Best algorithm introduced above. Distance-based hashing (DBH) is a general class of LSH functions that can be used with arbitrary spaces and (possibly non-metric) distances [4]. The hash family is constructed as follows. Let \(h : \mathcal {X} \rightarrow \mathbb {R}\) be a function that maps elements \(x \in \mathcal {X}\) to a real number. For two randomly drawn members \(x_1, x_2 \in \mathcal {X}\), the function \(h\) is defined as

$$\begin{aligned} h_{x_1,x_2}(x) = \frac{dist(x,x_1)^2 + dist(x_1, x_2)^2 - dist(x, x_2)^2}{2\, dist(x_1, x_2)}. \end{aligned}$$
[Algorithm 2]

The binary hash value for \(x\) simply verifies whether \(h(x)\) lies in an interval \([t_1, t_2]\),

$$\begin{aligned} h^{[t_1,t_2]}_{x_1,x_2}(x) = \begin{cases} 1 &{} \text {if } h_{x_1,x_2}(x) \in [t_1, t_2] \\ 0 &{} \text {otherwise,} \end{cases} \end{aligned}$$

where the boundaries \(t_1\) and \(t_2\) are chosen such that a randomly drawn \(x \in \mathcal {X}\) lies within the interval with 50 % probability and outside with 50 % probability. Given the set \(\mathcal {T}(x_1, x_2)\) of admissible intervals and the function \(h\), the DBH family is defined as

$$\begin{aligned} \mathcal {H}_{DBH} = \left\{ h^{[t_1,t_2]}_{x_1,x_2} \,:\, x_1, x_2 \in \mathcal {X}\,\wedge \, [t_1,t_2] \in \mathcal {T}(x_1, x_2) \right\} . \end{aligned}$$

Using random draws from \(\mathcal {H}_{DBH}\), new hash families can be constructed using AND- and OR-concatenation.

We use DBH to further improve the efficiency of the \(N\)-Best algorithm by removing a large fraction of trajectories before processing them with Algorithm 1. Given a query trajectory \({\varvec{q}}\in \mathcal {D}\), the retrieval process first identifies candidate objects that are hashed to the same bucket for at least one of the hash functions, and then computes the exact distances of the remaining candidates using the \(N\)-Best algorithm. As the distance measure of the DBH hash family we use the lower bound \(f_{kim}\). The computation is again easily distributed with Hadoop.
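The construction of a single DBH function can be sketched as follows; drawing the interval boundaries from the interquartile range of sampled projections is one possible way to realise the 50 % requirement, and the sample size is our own choice.

```python
import random

def make_dbh_function(database, dist, sample_size=100):
    """Draw one binary hash function from the DBH family.

    database is a list of trajectories and dist any (non-)metric
    distance, e.g. f_kim above; assumes the two pivots are distinct.
    """
    x1, x2 = random.sample(database, 2)
    d12 = dist(x1, x2)

    def proj(x):                                  # the real-valued h(x)
        return (dist(x, x1) ** 2 + d12 ** 2 - dist(x, x2) ** 2) / (2 * d12)

    # choose [t1, t2] so that roughly half of a sample projects inside
    sample = random.sample(database, min(sample_size, len(database)))
    vals = sorted(proj(x) for x in sample)
    t1, t2 = vals[len(vals) // 4], vals[3 * len(vals) // 4]
    return lambda x: 1 if t1 <= proj(x) <= t2 else 0

def bucket_key(x, hash_functions):
    """AND-concatenation: candidates share a bucket iff all bits agree."""
    return tuple(h(x) for h in hash_functions)
```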

4 Frequent Episode Mining for Positional Data

The main difference between frequent episode mining (e.g., [1]) and mining frequent trajectories from positional data streams is the definition of events. In contrast to having a predefined set of atomic events, every trajectory in the stream is considered an event. Thus, events may overlap and are very unlikely to occur more than once. In the absence of a predefined set of atomic events, we use the previously defined approximate distance functions in the mining step.

[Algorithm 3]

An event stream is a time-ordered stream of trajectories. Every event is represented by a tuple \((A,t)\) where \(A\) is an event and \(t\) denotes its timestamp. An episode \(\alpha \) is a directed acyclic graph described by a triplet \((\mathcal {V}, \le , m)\) where \(\mathcal {V}\) is a set of nodes, \(\le \) is a partial order on \(\mathcal {V}\) (directed edges between the nodes), and \(m: \mathcal {V} \rightarrow E\) is a bijective function that maps nodes to events in the event stream. In the remainder, we focus on transitively closed episodes [25]: if node \(A\) is ordered before node \(B\) (\(A < B\)), there must be a direct edge between \(A\) and \(B\), that is, \(\forall A,B \in \mathcal {V}: A < B \implies edge(A, B)\).

The partial ordering of nodes upper bounds the number of possible directed acyclic graphs on the event stream. The ordering makes it impossible to include two identical (or similar) events in the same episode; episodes that do not allow duplicate events are called injective episodes [1]. An episode \(\alpha \) is called frequent if it occurs often enough in the event stream, and counting \(\alpha \) amounts to finding all episodes that are similar to \(\alpha \). A sub-episode \(\beta \) of an episode \(\alpha \) is created by removing exactly one node \(n\) together with all edges from and to \(n\); e.g., for the episode \(A \rightarrow B \rightarrow C\) the sub-episodes are \(A \rightarrow B\), \(A \rightarrow C\) and \(B \rightarrow C\). The sub-episode of a singleton is denoted by the empty set \(\varnothing \).
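The episode structure and the sub-episode operation can be made concrete with a small sketch; the Episode class and the integer node indices are our own encoding of the triplet \((\mathcal {V}, \le , m)\). For \(A \rightarrow B \rightarrow C\) with edges {(0,1), (1,2), (0,2)}, sub_episodes yields exactly \(A \rightarrow B\), \(A \rightarrow C\) and \(B \rightarrow C\), matching the example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Episode:
    """Sketch of the triplet (V, <=, m): nodes are indices 0..k-1,
    events[i] is the event mapped to node i, edges encodes the order."""
    events: tuple
    edges: frozenset      # pairs (a, b) meaning node a is ordered before b

def is_transitively_closed(ep):
    # a < b and b < c must imply the direct edge (a, c)
    return all((a, c) in ep.edges
               for a, b in ep.edges for b2, c in ep.edges if b == b2)

def sub_episodes(ep):
    """Yield all sub-episodes obtained by removing exactly one node."""
    k = len(ep.events)
    for n in range(k):
        keep = [i for i in range(k) if i != n]
        remap = {old: new for new, old in enumerate(keep)}
        yield Episode(tuple(ep.events[i] for i in keep),
                      frozenset((remap[a], remap[b]) for a, b in ep.edges
                                if n not in (a, b)))
```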

Generally, frequent episodes can be found by Apriori-like algorithms [2]. The principles of dynamic programming are exploited to combine already frequent episodes into larger ones [16, 17]. We differentiate between alternating episode generation and counting phases. Every newly generated episode must be unique, transitively closed, and injective. Candidates possessing infrequent sub-episodes are discarded due to the downward closure lemma [1]. We now present counting and episode generation algorithms for processing positional data with Hadoop.
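Conceptually, the two phases alternate as in the following skeleton, where count and generate stand for the algorithms developed in the next two subsections; the higher-order formulation is our own abstraction.

```python
def mine_frequent_episodes(stream, singletons, count, generate):
    """Apriori-style alternation of counting and generation phases.

    count(stream, episodes) returns the frequent subset of the given
    candidate episodes; generate(frequent) returns the next, larger
    candidates. Terminates once no candidate remains frequent.
    """
    frequent, result = count(stream, singletons), []
    while frequent:
        result.extend(frequent)
        frequent = count(stream, generate(frequent))
    return result
```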

4.1 Counting Phase

The frequency of an episode is defined as the maximum number of non-overlapping occurrences of the episode in the event stream [16]. Non-overlapping episodes can be detected and counted with finite state automata (FSAs), where every FSA is tailored to accept only a particular episode. The idea is as follows: for every episode that needs to be counted, an FSA is created and the event stream is processed by each FSA. If an FSA moves out of the initial state, a new FSA is created for possibly later occurring episodes; once the final state has been reached, the episode counter is incremented and all FSA instances of the episode are deleted except for the one still remaining in the initial state.

Algorithm 2 shows the FSA transition function that counts an instance of an episode; whenever the FSA reaches its final state, its frequency is incremented. As input, Algorithm 2 receives the \(fsa\) instance, which contains the current state, the last transition time and the first transition time. Additionally, the corresponding episode, the current time stamp and the events starting at that time stamp are passed to the function. First, in case the FSA is already in the final state, the function returns without doing anything (line 1). Algorithm 2 then iterates over all source nodes in the current state and all events occurring at time \(t\) (lines 4–5). Whenever an event \(e\) is similar to the event of source node \(n\) (line 6), the FSA advances to the next state. The algorithm also keeps track of the start time and the last transition time to check the expiry time (lines 9 and 11).

The FSA transition function is embedded in the counting algorithm shown in Algorithm 3, which is defined as a map-function for the Hadoop/MapReduce framework. The function first loads the event stream (line 1) and initialises an empty FSA for every episode. Next, the event stream and the FSAs are traversed and passed to the FSA transition function. Whenever an FSA leaves the start state, a new FSA is added to the set of FSAs; this ensures that there is always exactly one FSA in the start state. Once an FSA reaches its final state, all other FSAs are removed and the process starts again with a single FSA in the start state. In case more than one FSA reaches the final state, Algorithm 3 removes all but the youngest one, as it has higher chances of meeting the expiry time constraints. The test for expiry time is not shown in the pseudo code; instances violating the expiry time do not contribute to the frequency count, and neither do FSAs that associate overlapping events with the same object. Note that the general idea of the counting algorithm is very similar to [1]; however, due to the different notion of an event, many optimisations do not apply in our case.
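For the special case of serial episodes, the FSA-based counting reduces to the following sketch; the general algorithm additionally tracks partial orders, expiry times, and overlapping events of the same object, all of which we omit here.

```python
def count_nonoverlapping(stream, episode, similar):
    """Count non-overlapping occurrences of a serial episode.

    stream is a time-ordered list of events (trajectories), episode a
    list of reference events, and similar(e1, e2) the approximate
    distance test from Sect. 3. An FSA is encoded by the index of the
    episode event it is waiting for.
    """
    fsas, frequency = [0], 0
    for event in stream:
        advanced = sorted({s + 1 if similar(event, episode[s]) else s
                           for s in fsas})
        if 0 not in advanced:
            advanced.append(0)          # keep exactly one FSA in the start state
        if len(episode) in advanced:    # an FSA reached its final state:
            frequency += 1              # count the occurrence and restart
            advanced = [0]              # with a single FSA
        fsas = advanced
    return frequency
```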

[Algorithm 4]

Following [1], we also employ bidirectional evidence, as frequencies alone are necessary but not sufficient for detecting frequent episodes. The entropy-based bidirectional evidence can be integrated into the counting algorithm; see [1] for details. We omit the presentation here for lack of space.

4.2 Generation Phase

Algorithm 5 is designed to efficiently find the indices of the differentiated nodes of two episodes \(\alpha \) and \(\beta \). To this end, it first tries to find the bijective mapping \(\pi \) that maps each node (and its corresponding event) of episode \(\alpha \) to episode \(\beta \) (line 1). In case such a complete mapping cannot be found, \(\pi \) contains only the partial mapping and \(n\) the number of missing nodes (see Algorithm 4). Episodes \(\alpha \) and \(\beta \) are combinable if and only if \(n = 1\). The remainder of the algorithm finds the missing node indices by accumulating the existing indices and subtracting the accumulated result from the sum of all indices; this little trick finds the missing indices in time \(O(n)\). The function returns the node indices that differentiate \(\alpha \) and \(\beta \). To avoid running Algorithm 5 on all pairs of episodes, each episode is associated with its block start identifier [1]. The idea is the following: all episodes generated from an episode \(\alpha \) share the same sub-episode, which is trivially identical to \(\alpha \) as they originate from adding a node to \(\alpha \). The generation step thus only takes those episodes into account that possess the same block start identifier.
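The index trick is a one-liner once the partial mapping is known; a sketch, assuming both episodes have \(k\) nodes indexed 0 to \(k-1\) and that pi is the partial bijection returned by Algorithm 4 as a dictionary:

```python
def differentiated_nodes(k, pi):
    """Recover the two unmapped node indices in O(k).

    pi maps node indices of alpha to node indices of beta and misses
    exactly one node on each side, i.e. the episodes are combinable.
    """
    total = k * (k - 1) // 2              # sum of all indices 0 + 1 + ... + (k-1)
    a = total - sum(pi.keys())            # the node of alpha missing from pi
    b = total - sum(pi.values())          # the node of beta missing from pi
    return a, b
```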

[Algorithm 5]

Given two combinable episodes \(\alpha \) and \(\beta \) and the differentiated nodes \(a\) and \(b\) (found by Algorithm 5), it is now possible to combine these episodes into up to three new candidates, as described by [1]. The first candidate originates from adding node \(b\) to episode \(\alpha \) including all edges from and to \(b\). The second candidate is generated from the first candidate by adding an edge from node \(a\) to node \(b\), and the third one adds an edge from \(b\) to \(a\) to the first candidate. In contrast to [1], we do not test whether all sub-episodes of each candidate are frequent, as this would require an efficient lookup of all episodes, which can be quite complex for positional data. Candidates with infrequent sub-episodes are allowed at this stage of the algorithm as they will be eliminated in the next counting step anyway.
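Building on the Episode sketch above, the three candidates can be generated as follows; the node mapping pi and the convention that the appended node receives index k are assumptions of our encoding.

```python
def combine(alpha, beta, a, b, pi):
    """Combine two combinable episodes into up to three candidates.

    a, b are the differentiated nodes of alpha and beta; pi maps the
    shared nodes of beta to their counterparts in alpha. Candidates
    that are not transitively closed are dropped, as in Algorithm 6.
    """
    k = len(alpha.events)                 # beta's extra node becomes node k
    pi = {**pi, b: k}
    moved = frozenset((pi[x], pi[y]) for x, y in beta.edges if b in (x, y))
    c1 = Episode(alpha.events + (beta.events[b],), alpha.edges | moved)
    c2 = Episode(c1.events, c1.edges | {(a, k)})   # add edge a -> new node
    c3 = Episode(c1.events, c1.edges | {(k, a)})   # add edge new node -> a
    return [c for c in (c1, c2, c3) if is_transitively_closed(c)]
```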

[Algorithm 6]

The complete episode generation algorithm is shown in Algorithm 6. As input, a list of frequent episodes ordered by their block start identifier is given; the result is a list of new episodes that are passed on to the counting algorithm. In lines 2 and 4, all episode pairs are processed as long as they share the same block start identifier (line 6). Then, the three possible candidates are generated (line 7) and kept in case they are transitively closed (line 9). Before a new episode is added to the result set, its block start identifier is updated (line 10). Analogously to the counting phase, domain-specific constraints may be added to filter out unwanted episodes (e.g., in terms of expiry time, overlapping trajectories of the same object, etc.).
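The pairing over blocks can then be sketched on top of the previous listings; block_id and the helper combinable (wrapping Algorithms 4 and 5) are our own bookkeeping, and we simplify by letting all candidates stemming from the same episode form one new block.

```python
def generate_candidates(frequent, block_id, combinable):
    """Sketch of Algorithm 6; frequent is ordered by block start id.

    combinable(alpha, beta) returns the tuple (a, b, pi) of Algorithm 5
    or None if the episodes cannot be combined.
    """
    candidates, new_block = [], {}
    for i, alpha in enumerate(frequent):
        for beta in frequent[i + 1:]:
            if block_id[beta] != block_id[alpha]:
                break                     # only pairs within the same block
            match = combinable(alpha, beta)
            if match is None:
                continue
            for cand in combine(alpha, beta, *match):
                new_block[cand] = i       # update the block start identifier
                candidates.append(cand)
    return candidates, new_block
```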

5 Empirical Evaluation

5.1 Positional Data

For the experiments, we use positional data from the DEBS Grand Challenge 2013. The data is recorded from a soccer game of two teams with eight players on each side. Each player is equipped with two sensors, one for each foot; we average every pair of sensors to obtain a single measurement per player at each point in time. Events happening before and after the game or during the half-time break are removed, and coordinates outside of the pitch are discarded. To reduce the overall amount of data, we average the positional data of each player over 100 ms. The set of trajectories is created by introducing a sliding window of constant size for each player so that trajectories begin every 500 ms and last for one second. This procedure yields 111,041 trajectories in total: 50,212 for team A, 50,245 for team B, and 10,584 for the ball.
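A sketch of this preprocessing for a single player may look as follows; the (t_ms, x, y) tuple format and the gap-free binning are simplifying assumptions.

```python
import numpy as np

def build_trajectories(positions, window_ms=1000, stride_ms=500, bin_ms=100):
    """Average raw positions into 100 ms bins, then cut overlapping
    one-second windows that start every 500 ms.

    positions is a time-sorted list of (t_ms, x, y) tuples for one
    player; bins without measurements are simply skipped.
    """
    bins = {}
    for t, x, y in positions:                        # 100 ms averaging
        bins.setdefault(t // bin_ms, []).append((x, y))
    coords = np.array([np.mean(bins[t], axis=0) for t in sorted(bins)])

    w, s = window_ms // bin_ms, stride_ms // bin_ms  # 10 samples, stride 5
    return [coords[i:i + w] for i in range(0, len(coords) - w + 1, s)]
```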

5.2 Near Neighbour Search

The first set of experiments evaluates the run-time of the three approaches Exact, \(N\)-Best, and LSH. Since the exact variant needs quadratically many comparisons in the length of the stream, we focus on a subset of 15,000 consecutive positions of team A in this experiment. We fix \(N=1000\) and measure the total computation time for all approaches. Figure 1 (left) shows the run-times in seconds for varying sample sizes.

Fig. 1. Left: Run-time. Right: Pruning efficiency.

Fig. 2. Left: Accuracy of LSH. Right: Most similar trajectories for a given query.

Unsurprisingly, the computation time of the exact distances grows quadratically in the size of the data. By contrast, the \(N\)-Best algorithm performs slightly super-linearly and significantly outperforms its exact counterpart. Pre-filtering trajectories using LSH results in only a small additional speed-up. The figure also shows that distributing the computation significantly improves the run-time of the algorithms and indicates that parallelisation allows for computing near neighbours on large data sets very efficiently. The observed improvements in run-time are the result of a highly efficient pruning strategy. Figure 1 (right) details the fraction of pruned trajectories for the respective approaches. The table shows that the effectiveness of the pruning for \(f_{kim}\) and \(f_{keogh}\) increases with increasing numbers of trajectories. By contrast, the pruning rate of LSH is roughly constant: it does not change with the number of trajectories but depends on the overall data size and on the ratio \(\frac{N}{|\mathcal {D}|}\).

We now investigate the accuracy of the proposed LSH pruning. The 1,000 most similar trajectories are compared with the ground truth given by the exact approach; that is, for each trajectory, the two result lists are compared. Figure 2 (left) reports averaged precision@\(N\) scores with \(N\in \{100, 200, \dots , 1000\}\) over all trajectories. The average precision decreases slightly for larger \(N\). We attribute this observation to the small data set, which comprises only a few similar trajectories per query: while highly similar trajectories are successfully found, the approximate near-neighbour method fills up the remaining slots with non-similar trajectories. The worst-case precision is therefore close to zero for \(N=100\) and increases slightly for larger \(N\) due to only a few true near neighbours in the data.

Although LSH performs well and its accuracy decreases only slightly with \(N\), we focus on the \(N\)-Best algorithm with \(N = 1{,}000\) in the remainder for a loss-free and exact computation of the top-\(N\) matches. According to Fig. 1 (left), the slightly longer execution time is negligible. Figure 2 (right) shows the most similar trajectories for three query trajectories. For common trajectories (top rows), the most similar trajectories are true near neighbours; it can also be seen that the proposed distance function is rotation invariant. For uncommon trajectories (bottom row), the found candidates are very different from the query.

5.3 Episode Discovery

The first experiments on the episode discovery algorithm focus on the influence of the parameters with respect to the number of generated and counted episodes. The algorithm depends on four parameters: the similarity, frequency, bidirectional evidence, and expiry time thresholds. For this set of experiments, we use the trajectories of team A to find frequent tactical patterns in the game. The results are shown in Fig. 3.

An interesting parameter is the similarity threshold, as it strongly impacts the number of generated episodes: small changes may already lead to an exponential growth in the number of candidate episodes, and large values quickly render the problem infeasible even on medium-sized Hadoop clusters. A similar effect can be observed for the expiry time threshold; incrementing the expiry time often requires decreasing the similarity threshold. The number of counted episodes is adjusted by the frequency threshold. As shown in the figure, the number of generated episodes can often be reduced by one or more orders of magnitude. By contrast, the bidirectional evidence threshold affects the result only marginally.

Fig. 3. Top row: Varying similarity (first and second columns) and frequency (third and fourth columns) thresholds. Bottom row: Varying bidirectional evidence (first and second columns) and expiry time (third and fourth columns) thresholds.

Finally, we present two frequent episodes in Fig. 4. The ball is shown as a black line; all other lines describe the players of team A during the time span of the episode. The involved trajectories are displayed by thick lines with a circle at the beginning to indicate the movement direction; a small circle at the beginning of a trajectory indicates that the trajectory depends on one or more other trajectories. Additionally, each trajectory is labelled with player ID and timestamp. For completeness, the opponent goalkeeper is drawn as a red line at the bottom. The displayed episodes present a well-known pattern of soccer games: players move towards the ball. In the left figure, the ball is played towards the opponent goal and players 1, 2, 3, 6, and 7 follow the direction of the ball. In the right figure, the opponent team prepares an attack by passing the ball from one side to the other; the players of team A follow the direction of the ball to prevent the attacking team from moving forward.

6 Conclusion

We proposed a novel method for mining frequent patterns in positional data streams where consecutive coordinates of objects are treated as movements. Our contribution is threefold: First, we proposed an efficient and accurate method to find the most similar trajectories for a given query. Second, we proposed an algorithm that uses these distances to efficiently combine individual movements into complex frequent patterns consisting of multiple trajectories. Third, we presented a distributed version for big data and Hadoop/MapReduce frameworks. Empirical results on positional data from a soccer game showed that the found patterns are intuitive and interpretable.

Fig. 4. Exemplary episodes.