1 Introduction

The continuous proliferation of GPS-enabled mobile devices [26] (e.g., car navigation systems and smart phones) and online map services (e.g., GoogleMaps,Footnote 1 BingMapsFootnote 2 and MapQuestFootnote 3) enable people to acquire their current geographic positions in real time and to interact with servers to query spatial information regarding their trips [14, 2325]. In the meantime, with the rapid development of indoor positioning systems (e.g., Wi-Fi, RFID, and Bluetooth), the movements of moving objects in indoor spaces are increasingly tracked and recorded [9, 10], which makes it possible to plan and to optimize the travel routes of moving objects in indoor spaces. The potential market of these location based services in the near future enables many novel applications. Emerging ones are TAUP queries that plan unobstructed paths in traffic-aware spatial networks to avoid potential traffic congestions. This type of queries are mainly motivated by indoor space applications, but are also applicable in outdoor spaces. We believe that the TAUP queries are useful in popular applications such as path planning and recommendation. To give examples, we describe the following two application scenarios:

Indoor Scenario::

At an international airport, bags are collected from travelers and delivered to the corresponding aircrafts. The movements of bags are constrained in a large transfer network, which is made up by hundreds of conveyer belts, trolleys and buggies. The travel routes of bags are roughly planned when they are collected, such as “Check-in 8:55am → Screen machine 9:30am → Tilt-tray sorter 9:50am → Flight SK1217 10:20am”. The travel path between any two adjacent intended places is uncertain. Large amounts of bags may create many traffic congestions in the transfer network and the delivery of some bags may be delayed. Suppose the travel routes of all bags in delivery are available, our task is to find unobstructed paths for VIP bags to avoid the potential congestions and to guarantee their arrival times. This work is motivated by the BagTrack project,Footnote 4 which is dedicated to improving bag handling in aviation industry globally.

Outdoor Scenario::

Trajectory sharing and search are pervasive nowadays. Travelers can easily upload their trajectories to some specialized web sites such as Bikely,Footnote 5 GPS-Way-points,Footnote 6 Share-My-Routes,Footnote 7 Microsoft GeoLife.Footnote 8 In practice, most of the existing trajectory data are stored in compressed format (lossy compression leads to the uncertainty). Compared to original trajectories, compressed trajectories have clear advantages in data processing, transmitting, and storing [12]. By analyzing historical travel trajectories of commuters, it is possible to construct a comprehensive traffic model to describe the traffic conditions in road networks, and to help users plan an unobstructed travel route and to avoid the potential traffic congestions.

In this light, we propose two novel probabilistic TAUP queries: (1) a time-threshold query like“what is the path from the check-in desk to the flight SK 1217 with the minimum congestion probability to take at most 45 minutes?”, (2) a probability-threshold query like “what is the fastest path from the check-in desk to the flight SK 1217 whose congestion probability is less than 20 %?”. These queries are applied in spatial networks, since in a large number of practical scenarios, moving objects move in a constrained environment (e.g., transfer networks in the indoor scenario, and road networks in the outdoor scenario) rather than a Euclidean space. The TAUP queries are challenging due to three reasons. First, it is necessary to establish a practical traffic model to detect the potential traffic congestions (time-delays)Footnote 9 and describe the traffic conditions in spatial networks, by analyzing uncertain trajectories (i.e., roughly planned routes in the indoor scenario, and compressed trajectories in the outdoor scenario) of moving objects. Second, the uncertainty should be taken into account during the uncertain trajectory reconstruction, thus a probabilistic model and an efficient trajectory reconstruction algorithm are required. Third, we need to develop efficient algorithms to compute the TAUP queries under different query settings.

To overcome these challenges, we construct a traffic-aware spatial network G t a (V, E) to detect potential time-delays and model traffic conditions in spatial networks. First, we propose an efficient algorithm to reconstruct uncertain trajectories. An uncertain trajectory is reconstructed into several possible paths, and all possibilities are taken into account. Then, we map these possible paths onto spatial networks. For each vertex vG t a (V, E), we maintain a set of traffic records to describe its traffic conditions. Based on G t a (V, E), we propose two novel probabilistic path queries (TAUP) and develop efficient algorithms to compute them. To sum up, the main contributions of this paper are as follows:

  • We define two novel TAUP queries in traffic-aware spatial networks. It provides new features for advanced spatio-temporal information systems, and may benefit users in popular applications such as route planning and recommendation.

  • We propose a comprehensive probabilistic model to evaluate the uncertainty when reconstructing uncertain trajectories, and a traffic-aware spatial network based on uncertain trajectory data to model the traffic conditions practically (Sections 2 and 3).

  • We develop two algorithms to compute the time-threshold and probability-threshold TAUP queries efficiently (Section 4).

  • We conduct extensive experiments on real and synthetic spatial data to investigate the performance of the developed algorithms (Section 5).

Note that this paper is an expanded version of our previous work [15]. We inherit the definition of traffic-aware spatial network, which covers the majority of the contents in Sections 2 and 3. In this paper, we propose two new probabilistic path queries in traffic-aware spatial networks (TAUP queries), and develop efficient algorithms to compute them (refer to Section 4, pages 11 – 16). In addition, we conduct extensive experiments to demonstrate the performance of the two new queries under different query settings (refer to Section 5, pages 18 – 20). Compared to the conference version [15], more than 40 percent significantly new materials are added into the journal version.

The rest of the paper is organized as follows. Section 2 introduces spatial networks and uncertain trajectories used in this paper, as well as uncertain trajectory reconstruction algorithm. The construction of traffic-aware spatial network is detailed in Section 3. The TAUP query processing is addressed in Section 4, which is followed by the experimental results in Section 5. This paper is concluded in Section 7 after some discussions of related work in Section 6.

2 Uncertain trajectory reconstruction

2.1 Preliminaries

2.1.1 Spatial networks

Spatial networks (i.e., transfer networks in the indoor space, and road networks in the outdoor space) are modeled by a connected and undirected graph G(V, E), where V is the set of vertices and E is the set of edges. A weight is assigned to each edge to represent application specific factors such as traveling time. Given two vertices a and b in a spatial network, the network distance between them is the length of their shortest network path (i.e., a sequence of edges linking a and b where the accumulated weight is minimal). Data points (e.g., trajectory sample points) are embedded in networks and they may be located in edges. If the network distances to the two end vertices of an edge are known, it is straightforward to derive the network distance to any point in this edge. Thus, we assume that all data points are in the vertices for the simplification of description.

2.1.2 Uncertain trajectory

There are two categories of uncertain trajectory data: roughly planned travel routes in the indoor space, and compressed trajectories in the outdoor spaces. In the indoor space, travel routes of bags are roughly planned such as “Check-in 8:55am → Screen machine 9:30am → Tilt-tray sorter 9:50am → Flight SK1217 10:20am”, where sample points (intended places) are the vertices on a spatial network. In the outdoor space, raw trajectory samples obtained from GPS devices are typically of the form of 〈l o n g i t u d e, l a t i t u d e, t i m e s t a m p〉. How to select historical trajectory data of travelers and how to map the tuple 〈l o n g i t u d e, l a t i t u d e, t i m e s t a m p〉 onto a spatial network are interesting research problems themselves, but they are outside the scope of this paper. We assume that trajectory data are selected and all trajectory sample points have already been aligned to the vertices on the spatial network by some map-matching algorithms [1, 2, 6, 20]. Between any two adjacent sample points a and b, the exact travel path of moving objects is uncertain in both indoor and outdoor spaces. The spatio-temporal attributes of an uncertain trajectory are defined as follows.

Definition 1

Uncertain Trajectory An uncertain trajectory τ of a moving object in a spatial network G is a finite sequence of time stamped positions: τ = 〈p 1, p 2,…,p n 〉, where p i is a vertex in G, and p i .t is its timestamp, for i = 1,2,..,n.

2.2 Uncertain trajectory reconstruction algorithm

Given a spatial network G(V, E), each vertex vG.V is allocated a threshold v.k to describe its traffic processing capability. That means at most v.k moving objects can be processed at vertex v in one minute, and each individual moving object will take \(\frac {1}{v.k}\) minutes processing time.

Given an uncertain trajectory segment τ s e g (p i , p j ) connecting two adjacent sample points p i and p j , it is difficult to find the exact path P(p i , p j ) between them due to the rough route planning (indoor) or trajectory compression loss (outdoor). Here, we propose a random walk based probabilistic model to evaluate the uncertainty of trajectories, and all possibilities are considered. Conceptually, we assume that the movement of object o between p i and p j is according to a random walk, and its moving space is constrained by two thresholds. First, the maximum moving time between p i and p j is constrained by (p j .tp i .t), where p i .t and p j .t are the timestamps of p i and p j , respectively. Second, there should be no loop in P(p i , p j ), which means that one vertex cannot appear twice in one path. The length and probability of path P = 〈p 1,…,p k 〉 are defined as follows:

$$ P.length = \sum\limits_{i=1}^{k-1} \left((p_{i},p_{i+1}).weight+ \frac{1}{p_{i+1}.k}\right) $$
(1)
$$ P.prob = \prod\limits_{i=1}^{k-1} (p_{i},p_{i+1}).prob $$
(2)

Here, (p i , p i + 1).w e i g h t and (p i , p i + 1).p r o b are the weight and probability of edge (p i , p i + 1), respectively. Suppose a moving object o is at vertex p i and it may select edge (p i , p i + 1) as its following moving direction. This probability is defined as the probability of edge (p i , p i + 1).

All valid sub-paths of the trajectory segment τ s e g (p i , p j ) can be retrieved according to Algorithm 1. Here, p i and p j are two adjacent sample points and Procedure (p i ) is a recursive function. A depth-first traversal is conducted to find all possible sub-paths connecting p i and p j . A network expansion [4] starts from p i , and all adjacent vertices of p i are scanned iteratively (line 4). For each vertex n adjacent to p i , we check whether it conflicts the two thresholds: n cannot appear twice in the same path (lines 5–6), and the path length cannot be greater than (p j .tp i .t)(lines 7–8). If n satisfies both thresholds, the related information of p a t h(n) are recorded, including its vertices, length and probability (lines 9–11). Once the destination p j has been scanned by the expansion, a valid sub-path is found and stored in P a t h l i s t(p i , p j )(lines 12–16). Otherwise, a new recursive function based on n will be conducted (line 17). Finally, all the possible sub-paths connecting p i and p j are returned to users (line 20).

figure g

An example of trajectory segment reconstruction is shown in Fig. 1, where p 1 and p 2 are two adjacent trajectory sample points, and n 1, n 2,…,n 8 are vertices in a spatial network G. Each edge is assigned a weight to represent the travel time along this edge, and each vertex is assigned a threshold to describe its traffic processing capability. In Fig. 1, the traffic processing capability for each vertex is 1, thus the processing time for each individual object is also 1. The maximum moving time between p 1 and p 2 is p 2.tp 1.t = 33.

Fig. 1
figure 1

Probabilistic model

There exist three sub-paths P 1, P 2 and P 3 that satisfy both thresholds mentioned above. For P 1=〈p 1, n 1, n 2, n 4, n 5, p 2〉, according to Eq. 1, its length is computed as P 1.l e n g t h = 6 + 1 + 3 + 1 + 7 + 1 + 3 + 1 + 8 + 1 = 32. The probability computation is more complex than the length computation. (p 1, n 1) is the first edge of P 1 and its probability is \(\frac {1}{1}\). We assume that the movement of objects is according to a random walk. At n 1, a moving object m has four choices: {p 1, n 2, n 3, n 5} for the next step. Vertex p 1 is invalid in this case, since a loop 〈p 1, n 1, p 1〉 is formed. As a result, the rest three vertices share the same probability \(\frac {1}{3}\). Then, at n 2, n 4 holds \(\frac {1}{1}\) probability to be the next vertex, since n 1 will form a loop. Following this procedure, the probability of P 1 is computed based on Eq. 2: \({P_{1}.prob = \frac {1}{1} \times \frac {1}{3} \times \frac {1}{1} \times \frac {1}{2} \times \frac {1}{3} =\frac {1}{18}}\). Similarly, we compute the length and probability of P 2=〈s 1, n 1, n 5, s 2〉 and P 3=〈s 1, n 1, n 3, n 6, n 5, s 2〉. P 2 has the length 24 and the probability \(\frac {1}{1} \times \frac {1}{3} \times \frac {1}{4} = \frac {1}{12}\). P 3 has the length 32 and the probability \(\frac {1}{1} \times \frac {1}{3} \times \frac {1}{1} \times \frac {1}{1} \times \frac {1}{3} = \frac {1}{9}\).

Other sub-paths that cannot satisfy both thresholds are labeled as invalid paths and are pruned. Then, we normalize the original probabilities of valid sub-paths P 1, P 2, and P 3.

$$ P_{i}.probN = \frac{P_{i}.prob}{{\sum}_{i = 1}^{k} P_{i}.prob} $$
(3)

Here, P i .p r o b N is the normalized probability of P i . Consequently, \(Prob_{N}(P_{1}) =\frac {1/18}{1/4} = \frac {2}{9}\), \({Prob_{N}(P_{2}) = \frac {1/12}{1/4} = \frac {1}{3}}\) and \({Prob_{N}(P_{3}) = \frac {1/9}{1/4} = \frac {4}{9}}\). At this stage, the trajectory segment T s e g (s i , s j ) is reconstructed to three possible sub-paths P 1, P 2, P 3.

After that, we combine the possible sub-paths from different trajectory segments to create a full path P(s, d)=〈P 1, P 2,…,P k 〉 connecting the source s and the destination d, where P 1, P 2,…,P k are sub-paths from trajectory segments τ s e g1, τ s e g2,…,τ s e g k , respectively. The length and probability of P(s, d) can be computed as \(P(s, d).length = {\sum }_{i=1}^{k} (P_{i}.length)\) and \(P(s, d).prob = {\prod }_{i=1}^{k} P_{i}.prob\), respectively.

3 Traffic-aware spatial network

3.1 Time-delay

For any vertex v in a spatial network G(V, E), if the number of moving objects to be processed by v is over its processing capability v.k, a time-delay (congestion) will occur. Suppose the processing capability of v is v.k per minute, and each individual moving object will take \(\frac {1}{v.k}\) minutes processing time. If the gap between any two moving objects is less than \(\frac {1}{v.k}\) minutes, a time-delay will be triggered. The time-delay for object o at vertex v is estimated as follows.

$$ T_{d}(o, v, T_{a}(o,v)) =\left\{\begin{array}{ll} {\sum}_{o_{i} \in O_{p}} o_{i}.prob \cdot (\frac{1}{v.k} - T_{a}(o,v) \\ + T_{a}(o_{i},v))+ {\sum}_{o_{j} \in O_{w}} \frac{o_{j}.prob}{v.k} & \text{if} ~C_{1}\\ \\ 0 & \text{if}~ C_{2} \end{array} \right. $$
(4)
C 1::

v is occupied at the time point T a (o, v), and O p is the set of moving objects being processed by v at this time point. In addition, there are also |O w | moving objects waiting to be processed.

C 2::

v is available at the time point T a (o, v), thus the corresponding time-delay is 0. Here T a (o, v) is the arrival time of object o at vertex v.

The total time cost of object o at vertex v is the sum of waiting time (time-delay) and processing time. Thus the leaving time of moving object o from vertex v can be computed as follows.

$$ T_{l}(o,v) = T_{a}(o,v) + T_{d}(o, v, T_{a}(o,v)) + \frac{1}{v.k} $$
(5)

An example is shown in Fig. 2, where P 1=〈n 1, n 2, v, n 3, n 4〉 and P 2=〈n 5, n 6, v, n 7, n 8〉 are two reconstructed probabilistic trajectories for moving objects o 1 and o 2 respectively. P 1 has 50 % probability while P 2 has 75 % probability. Vertex v is an intersection and its traffic processing capability is v.k = 4 per minute, which means processing an individual moving object will take \(\frac {1}{4}\) minute = 15 seconds. The expected arrival time of o 1 at v is 12:28:40, while o 2’s expected arrival time at v is 12:28:45. When o 2 arrives at v, the processing of o 1 is not finished until 10 seconds later. Referring to Eqs. 4 and 5, the time-delay of moving object o 2 at vertex v can be estimated as T d (v, o 2) = 50 % × 0 : 10 = 0 : 05. The estimated leaving time of o 2 from v is 12:28:45+0:05+0:15=12:29:05, and the expected arrival time of o 2 at vertices n 7 and n 8 will be delayed correspondingly

Fig. 2
figure 2

Time-delay

Note that at some time points, there may exist more than one moving object with the status of “being processed by vertex v” (referring to Eq 4). For example, in Fig. 2, moving object o 1 is being processed within the time range [12:28:40, 12:28:55], and the estimated processing time of o 2 is within the time range [12:28:50, 12:29:05]. In the overlapped time range [12:28:50, 12:28:55], o 1 and o 2 are both in the status of “being processed by v”. Suppose there is a new moving object o 3, and the arrival time of o 3 at v is within the overlapped time range [12:28:50, 12:28:55], to estimate the time-delay of o 3, the influences from both o 1 and o 2 should be taken into account. In addition, if more than one moving object arrive at one vertex at the same time point, we will arrange their schedules randomly.

3.2 Traffic-aware spatial network

To detect potential time-delays and model traffic conditions practically, we construct a traffic-aware spatial network G t a (V, E) by analysing the uncertain trajectories over spatial networks. In our definition, a traffic-aware spatial network G t a (V, E) is a spatial network, where each vertex vG t a .V has been assigned a threshold v.k to describe its traffic processing capability. There are also a set of traffic information records attached to v to describe its traffic conditions. Each record is in the form of (object, probability, expected arrival time, time-delay, processing time).

Given a spatial network G(V, E) and a set of uncertain trajectories T, the construction of a traffic-aware spatial network takes two steps. First, we reconstruct the uncertain trajectories in T according to a random walk probabilistic model, and map the reconstructed trajectories (with probability) onto G(V, E). Second, we sort the vertices in reconstructed trajectories according to their timestamps, and then refine their time-delays.

An example is shown in Fig. 3, where n 1, n 2,…,n 6 are vertices and their traffic processing capabilities are all 1 moving object per minute. There are 2 uncertain trajectories τ 1=〈s 1, d 1〉 and τ 2=〈s 2, d 2〉 of moving objects o 1 and o 2, respectively. Trajectory τ 1 is reconstructed into two possible paths P 1=〈s 1, n 2, n 1, n 4, d 1〉 and P 2=〈s 1, n 2, n 5, n 4, d 1〉, and each of them has 50 % probability; while τ 2 is reconstructed into two possible paths P 3=〈s 2, n 3, n 2, n 5, d 2〉 and P 4=〈s 2, n 3, n 6, n 5, d 2〉, and each of them also has 50 % probability. Then, we map all possible paths onto the spatial network, and compute the expected arrival times (ignore the time-delay) as the timestamps for vertices in all possible paths. For instance, the departure time of P 1 is 17:25:30, and the expected arrival time of n 2, n 1, n 4 and d 1 are 17:27:30, 17:30:00, 17:33:30, and 17:36:30, respectively.

Fig. 3
figure 3

Traffic-aware spatial network construction

We maintain a dynamic priority heap containing these timestamps. At each step, we only refine the timestamp on the top of the heap. In Fig. 3, n 3P 3, P 4 will be refined firstly, followed by n 2P 1, P 2. Then, we detect the first time-delay at vertex n 2P 3. The expected arrival time of moving object o 1 at n 2 is 17:27:30 with 100 % probability (P 1 and P 2), while the expected arrival time of o 2 at n 2 is 17:28:00 with 50 % probability (P 3). Referring to Eqs. 4 and 5, the time-delay of o 2P 3 at n 2 is 0:30, and the estimated leaving time is 17:29:30. Due to the time-delay at n 2, the expected arrival times for the rest vertices in P 3 (i.e., n 5 and d 2) have to be adjusted correspondingly. The refinement follows this procedure, step by step, until the priority heap is empty and all time-stamps have been refined. For each vertex vV, we maintain a set of traffic records to describe its traffic conditions. The refinement results of Fig. 3 are demonstrated in Table 1.

Table 1 Traffic information records

The complete procedure of establishing a traffic-aware spatial network is detailed in Algorithm 2. Given a spatial network G(V, E) and a set of uncertain trajectories T, for each uncertain trajectory τ, we reconstruct its possible paths and map them onto G(V, E) (lines 1–6). In the next, we select the vertex v with the minimum timestamp and refine it (lines 8–9). If a time-delay is detected, the timestamps of the rest vertices following v in the same path are updated correspondingly (lines 10–11).

figure h

3.3 Traffic records indexing and search

Each vertex vG t a (V, E) maintains a set of traffic records to describe its traffic conditions (refer to Table 1). We notice that there may exist a large number of records (e.g, hundreds of or thousands of records, depending on the number of uncertain trajectories) at one vertex. Such massive traffic data may prevent the computation from being addressed in real time. To accelerate the computing process, we establish a B-tree for the column “expected arrival time”. Note that the utilization of the well known B-tree is only for improving the search efficiency. Other indexing approaches can also be easily adapted. Given a moving object o and its expected arrival time T a (o, v) at vertex v, if a moving object o i in the traffic records of v satisfies (6), it will be labeled as “a waiting object” (i.e., when moving object o arrives at v, o i is waiting for being processed) and put into data set O w . Otherwise, if o i satisfies (7), it will be labeled as “a processing object” (i.e., when o arrives at v, o i is being processed) and put into data set O p . According to Eq. 4, the time-delay of moving object o at vertex v can be obtained.

$$ T_{a}(o_{i},v) < T_{a}(o,v) < T_{a}(o_{i},v) + T_{d}(o_{i},v,T_{a}(o_{i},v)) $$
(6)
$$ \left\{\begin{array}{ll} T_{a}(o_{i},v) + T_{d}(o_{i},v,T_{a}(o_{i},v)) < T_{a}(o,v)\\ T_{a}(o,v)< T_{a}(o_{i},v) + T_{d}(o_{i},v,T_{a}(o_{i},v)) + \frac{1}{v.k} \end{array} \right. $$
(7)

4 Probabilistic TAUP query processing

In this section, we propose two novel probabilistic path queries in traffic-aware spatial networks (TAUP queries), and develop two algorithms to compute them efficiently.

Definition 2

Congestion Probability Given a traffic-aware spatial network G t a (V, E) and a moving object o, the congestion probability of object o at vertex v is defined by

$$ o(v).prob_{c} = 1 - \prod\limits_{o_{i} \in O_{p}} (1 - o_{i}(v).prob_{c}) \cdot \prod\limits_{o_{j} \in O_{w}} (1-o_{j}(v).prob_{c}). $$
(8)

Here, O p is the set of “processing objects” (these objects are being processing when o arrives at v, refer to Eq. 7), and O w is the set of “waiting objects” (these objects are waiting to be processed when o arrives at v, refer to Eq. 6). Parameter o i (v).p r o b is the probability of the scenario that when object o arrives at v, object o i is being processed at v, and parameter o j (v).p r o b is the probability of the scenario that when o arrives at v, o j is being processed. In the aforementioned two scenarios, vertex v is occupied by other objects, and the movement of object o is obstructed.

The travel time and congestion probability of path P = 〈v 1, v 2,…,v k 〉 are defined as follows:

$$ P.time = \sum\limits_{i=1}^{k-1}\left(w(v_{i},v_{i+1})+\frac{1}{v_{i+1}.k}\right) $$
(9)
$$ P.prob_{c} = 1 - \prod\limits_{i=1}^{k} (1 - o(v_{i}).prob_{c}) $$
(10)

Here, w(v i , v i + 1) is the weight of edge (v i , v i + 1), and o(v i ).p r o b c is the congestion probability of object o at vertex v.

4.1 Time-threshold TAUP query

Definition 3

Time-threshold TAUP Query Given a traffic-aware spatial network G t a (V, E), a query source s, a query destination d, a departure time t, and a time-threshold τ.t, time-threshold TAUP query finds the path P from s to d with the minimum congestion probability and its travel time is no greater than τ.t, such that \(\forall P^{\prime } \in Pathlist(s,d)\setminus \{P\}(P(s,d).time \leq \tau .t \wedge P^{\prime }(s,d).time \leq \tau .t \rightarrow P(s,d).prob_{c} < P^{\prime }(s,d).prob_{c})\), where P a t h l i s t(s, d) is a path set that contains all possible paths from s to d.

Network expansion is a conventional (e.g., Dijkstra’s expansion [4]) method to compute the shortest/fastest path problem in spatial networks, but it fails to compute the time-threshold TAUP query due to the non-awareness of moving-object processing time and congestion probability at each vertex in G t a (V, E). To overcome this weakness, we develop a novel search algorithm named T A U P t i m e to compute the time-threshold TAUP query in real time. In T A U P t i m e , we maintain a time label (travel time, refer to Eq. 9) and a probability label (congestion probability, refer to Eq. 10) for each vertex v in a network expansion tree. The time label P(s, v).t i m e is defined by

$$ P(s,v).time = P(s,c).time + w(c,v) + \frac{1}{v.k}, $$
(11)

where v and c are vertices in G t a (V, E), and c is the parent vertex of v (i.e., c = v.p r e) in the expansion tree, and \(\frac {1}{v.k}\) is the individual moving object processing time at vertex v.

We also define a travel time lower bound P(v, d).t i m e.l b = s d(v, d) < P(v, d).t i m e to estimate the travel time from v to d, where s d(v, d) is the shortest path distance from v to d. The all-pair shortest path distances in G t a (V, E) are pre-computed.

$$\left\{\begin{array}{ll} P(s,v) = P(s,c).time + w(c,v) + \frac{1}{v.k}\\ P(v,d).time > P(v,d).time.lb = sd(v,d) \end{array} \right. $$
$$\Rightarrow P(s,v).time + P(v,d).time > P(s,c).time + w(c,v) + \frac{1}{v.k} + sd(v,d) $$

The value of (P(s, v).t i m e + P(v, d).t i m e) is the total travel time from s to d via v. If the value of \((P(s,c).time + w(c,v) + \frac {1}{v.k} + sd(v,d))\) is greater than time threshold τ.t, we have that

$$\left\{\begin{array}{ll} P(s,c).time + w(c,v) + \frac{1}{v.k} + sd(v,d) > \tau.t \\ P(s,v).time + P(v,d).time > P(s,c).time + w(c,v) +\frac{1}{v.k} + sd(v,d) \end{array} \right. $$
$$\Rightarrow P(s,v).time + P(v,d).time > \tau.t $$

Thus there does not exist any travel path from s to d via v whose total travel time is no greater than τ.t, and vertex v can be pruned from the expansion tree safely.

The congestion probability label of vertex v is defined by

$$ P(s,v).prob_{c} = 1 - (1 - P(s,c).prob_{c}) \cdot (1 - o(v).prob_{c}), $$
(12)

where P(s, v).p r o b c is the congestion probability of path P(s, v), and o(v).p r o b c is the congestion probability of moving object o at vertex v (refer to Eq. 10).

In the T A U P t i m e algorithm, we inherit the best-first search strategy from the conventional network expansion (e.g., Dijkstra’s algorithm). At each time, we select the vertex v with the minimum congestion probability label P(s, v).p r o b c for expansion. We also check the time label of v to guarantee that the value of (P(s, v).t i m e + P(v, d).t i m e) is no greater than the time threshold τ.t. The T A U P t i m e search algorithm is detailed in Algorithm 3. The query input includes a source s, a destination d, a departure time t, and a time threshold τ.t, while the output is the path P(s, d) with the minimum congestion probability. Initially, the path P(s, d) is set to null, and the set of scanned vertices is also set to null, and the probability label of each vertex vV is set to 0 (line 1–2). The source s is labeled as scanned vertex and is put in the heap O s . Its time label is set to 0, and its probability label is set to 1 (lines 3–4). During the search process, in each iteration, we select the vertex c with the minimum probability label P(s, c) c .p r o b c from the heap O s (lines 5–7). Once c = d, the destination d has been reached and the path P(s, d) is found (lines 8–12). We evaluate the distance label for each adjacent vertex of c. If the value of \((P(s,c).time + w(c,n) + \frac {1}{n.k} + sd(n,d))\) is greater than time threshold τ.t, vertex n can be pruned safely. Otherwise, for vertex n, if its current probability label is greater than the new label (1−(1−P(s, c).p r o b c )⋅(1−o(n).p r o b c )), its probability label and time label are updated, respectively, and n is put into the heap O s (lines 13–19).

figure i

4.2 Probability-threshold TAUP query

Definition 4

Probability-threshold TAUP Query Given a traffic-aware spatial network G t a (V, E), a query source s, a query destination d, a departure time t, and a probability threshold τ.p, the probability-threshold TAUP query finds the path P from s to d with the minimum travel time and its congestion probability is less than τ.p, such that \(\forall P^{\prime }(s,d) \in Pathlist(s,d)\setminus \{P\}(P(s,d).prob_{c} < \tau .p \wedge P^{\prime }(s,d).prob_{c} < \tau .p \rightarrow P(s,d).time < P(s,d).time)\), where P a t h l i s t(s, d) is a path set that contains all paths from s to d.

figure j

To compute the probability-threshold TAUP query efficiently, we develop a novel T A U P p r o b algorithm. For each vertex vG t a (V, E), we maintain a time label v.t i m e and congestion probability label v.p r o b c , and their definitions are given as follows.

$$ v.time = P(s,c).time + w(c,v) + \frac{1}{n.k} + sd(v,d) $$
(13)
$$ v.prob_{c} = 1 - (1 - c.prob_{c}) \cdot (1 - o(v).prob_{c}) $$
(14)

Here, c is the parent vertex of v in the expansion tree, and s d(v, d) is the shortest path distance between vertex v and the query destination d. The time label includes two parts: (P(s, c).t i m e + w(c, v) + 1/v.k) describes the travel time from the query source s to vertex v via c, while s d(v, d) is the heuristic part to estimate the travel time from v to d. Intuitively, the value of s d(v, d) is always less than the exact travel time from v to d, since the moving object processing time is not taken into account. Thus the correctness of T A U P p r o b algorithm can be guaranteed.

The T A U P p r o b is detailed in Algorithm 4. The query input includes a source s, a destination d, a departure time t, and a probability threshold τ.p, while the output is the path P(s, d) with the minimum travel time. Initially, the path P(s, d) is set to null, and the set of scanned vertices is also set to null. The time label of each vertex vV is set to \(+\infty \), and the probability label is set to 0 (line 1–2). The source s is labeled as scanned vertex and is put in the heap O s (line 3). During the search process, in each iteration, we select the vertex c with the minimum time label c.t i m e from the heap O s (lines 4–6). Once c = d, the destination d has been reached and the path P(s, d) is found (lines 8–11). If the value of n.p r o b c is no greater than probability threshold τ.p, vertex n can be pruned safely. Otherwise, for vertex n, if the value of P(s, n).t i m e is greater than that of \(P(s,c).time + w(c,v) + \frac {1}{n.k}\), the value of P(s, n).t i m e is updated to that of \(P(s,c).time + w(c,v) + \frac {1}{n.k}\). We also update the probability label and time label of n, respectively, and we put n into the heap O s (lines 12–19).

5 Experimental results

We conducted extensive experiments on real and synthetic spatial data sets to demonstrate the performance of TAUP queries. The two data sets used in our experiments were the Beijing Road Network (BRN) and synthetic Indoor Transfer Network (ITN), which contain 28,342 vertices and 6,105 vertices, respectively, stored as adjacency lists. In BRN, we adopted the real trajectory data collected by MOIR project [11]. In ITN, synthetic trajectory data were used. All algorithms were implemented in Java and tested on a Linux platform with Intel Core i7-3520M Processor (2.90GHz) and 8GB memory. The experimental results were averaged over 20 independent trials with different query inputs. The main performance metrics were CPU time and the number of visited vertices. The number of visited vertices was selected as a metric for two reasons: (i) it can describe the exact amount of data access; (i i) it can reflect the real disk I/O cost to a certain degree. The parameter settings are listed in Table 2. By default, the number of uncertain trajectories were 40,000 and 20,000 in BRN and ITN respectively to construct the traffic-aware spatial network.

Table 2 Parameter settings

5.1 Performance of uncertain trajectory reconstruction

First of all, we studied the performance of the uncertain trajectory reconstruction (UTR) in Algorithm 1, when uncertain trajectory length (the number of sample points) varies (Fig. 4). The length of uncertain trajectory was set from 2 to 10 in both BRN and ITN. Longer trajectories lead to more computation efforts, thus the CPU time and the number of visited vertices are expected to be higher in both BRN and ITN. Generally, the maximum CPU time is under 700 ms in BRN and under 400 ms in ITN, while the maximum number of visited vertices is less than 2000 in BRN and is around 1500 in ITN.

Fig. 4
figure 4

Performance of uncertain trajectory reconstruction

5.2 Traffic-aware spatial network G t a (V, E) construction

Figure 5 demonstrates the performance of the construction of Traffic-Aware spatial network G t a (V, E) (refer to Algorithm 2), including the construction times (Fig. 5a and c) and network sizes (Fig. 5b and d) in BRN and ITN, respectively. The number of reconstructed possible paths (reconstructed trajectories) is from 10,000 to 40,000 in both BRN and ITN. Obviously, more trajectory data leads to higher construction time, and larger size of the constructed traffic-aware spatial network. The traffic-aware spatial network construction is an off-line process due to the huge amount of possible paths, and the traffic records are stored in disk. For each vertex vG t a (V, E), we maintain a pointer pointing to the positions of the related traffic records in disk. Once the vertex v is scanned by a network expansion, the related traffic records can be accessed efficiently.

Fig. 5
figure 5

Performance of traffic-aware spatial network construction

5.3 Performance of TAUP query processing

We studied the performance of the time-threshold TAUP query processing (refer to Algorithm 3) and probability-threshold TAUP query processing (refer to Algorithm 4) in Sections 5.3.1 and 5.3.2, respectively.

5.3.1 Effect of time threshold τ.t

Figure 6 shows the effect of τ.t on the time threshold TAUP query processing. For the purpose of comparison, a baseline algorithm was also implemented. Compared to T A U P t i m e , the baseline algorithm did not use the travel time lower bound P(v, d).t i m e.l b to prune the search space. The baseline algorithm is denoted as “ T A U P t i m e without lower bound” in Fig. 6. Intuitively, the larger the time threshold is, the larger search space it may need. Thus the CPU time and the number of visited vertices are expected to be higher when the value of time threshold increases. In Fig. 6, it is clear that the time-threshold TAUP query can be computed within 1 minute in both BRN and ITN, and with the help of travel time lower bound, the query efficiency is improved notably. The CPU time and the number of visited vertices of the baseline algorithm are almost as twice as those of T A U P t i m e .

Fig. 6
figure 6

Effect of time threshold τ.t

It is worth to note that (1) the number of visited vertices may be greater than the size of vertex set V , since a vertex may be visited several times in one query; (2) the CPU time is not fully corresponding to the Disk I/O times. In some cases, the increment of computation cost may offset the benefits from the reduction of Disk I/O times.

5.3.2 Effect of congestion probability threshold τ.p

Figure 7 shows the performance of probability-threshold TAUP query processing when the value of τ.p varies. For the purpose of comparison, the T A U P p r o b algorithm without the heuristic part was used as baseline algorithm. It is denoted as “ T A U P p r o b without heuristic” in Fig. 7. Generally, a lager probability threshold will lead to a larger search space. Thus the CPU time and the number of visited vertices are expected to increase when the value of τ.p increases. From Fig. 7, it is clear that with the help of heuristic search strategy, the performance of probability-threshold TAUP query processing is improved by 1.5 – 2 times in terms of both CPU time and the number of visited vertices. These results clearly demonstrate the superiority of the heuristic strategy.

Fig. 7
figure 7

Effect of congestion probability threshold τ.p

6 Related work

6.1 Path queries in spatial networks

In static spatial networks, Dijkstra’s algorithm [4] and the A algorithm [7] are two classical methods to address the shortest path problem. In Dijkstra’s algorithm, an expansion wavefront is expanded from the source s. A priority heap H is adopted to maintain the unscanned vertexes. At each expansion, vertex v with the minimum distance label (usually the network distance from s to v) will be selected, and removed from the priority heap, and labeled as scanned vertex. In the next, all the unscanned neighbor vertex to v will be put into H, until the destination d has been reached and the shortest path from s to d is found. In A algorithm, the value of s d(v s , v) + d E (v, v d ), where d E (v, v d ) is the Euclidean distance between v and v d , is used as the distance label of vertex v, to estimate the network distance from v s to v d via v. A algorithm is an heuristic searching approach, and its performance is greater than Dijkstra’s algorithm in general.

Time-dependent road network [5] and probabilistic road network [8] are two representative dynamic spatial networks. In time-dependent road networks, each edge may have different weight (travel cost) at different times. Time-dependent shortest path (TDSP) query is a variant of dynamic shortest path problem, which is designed to find the best departure time for users, to minimize the global traveling time from a source to a destination over a large road network, where the traffic conditions are dynamically changing from time to time. However, the TDSP query only considers the dynamic edge weight, but the possibility of traffic congestion at each vertex and the corresponding vertex time-delay are not taken into account. While in our TAUP query, the edge weight is a constant value, and our study focuses on computing the possibility of traffic congestions and the corresponding time-delays at vertices. The vertex time-delay depends on the number of moving objects there and also on their arrival times; thus, it is not a constant value. That is the main difference between the TDSP query and our TAUP query; thus, the optimization techniques can hardly be used in our problem. Similarly, in probabilistic road networks, each edge is assigned a set of probabilistic data to describe the traveling cost along this edge, and probabilistic shortest path (PSP) queries [8] ask for (i) the fastest path constrained by a probability threshold, and (i i) the path with the highest probability constrained by a travel time threshold. Due to the similar reasons of the TDSP query (the possibility of traffic congestion and vertex time-delay are not taken into account), the optimization techniques in the PSP query cannot be used in our TAUP query.

Despite the bulk of literature on shortest path queries [4, 5, 7, 8], none of the existing work can address the proposed TAUP query due to two reasons: Dijkstra’s algorithm and A algorithm are not aware of the time-delay and moving object processing time, and time-dependent and probabilistic shortest path queries are based on different traffic models from ours.

This paper is an expanded version of our previous works [15, 16]. We inherit the concept of traffic-aware spatial network, which covers the majority of the contents in Sections 2 and 3. In this paper, we propose two new probabilistic path queries in traffic-aware spatial networks (TAUP queries), and develop efficient algorithms to compute them (refer to Section 4). In addition, we conduct extensive experiments to demonstrate the performance of the two new queries under different query settings (refer to Section 5). Compared to the conference version [15], more than 40 percent significantly new materials are added into the journal version.

6.2 Uncertain trajectory data management

Uncertain trajectory data management [3, 13, 17, 19, 21, 22, 27, 28] has received significant attention in recent ten years. Wolfson et al. [21, 22] addressed the update problem of moving objects by proposing an information cost model that captured the uncertainty, deviation and communication. Pfoser et al. [13] proposed a formal quantitative approach to the aspect of uncertainty in modeling moving objects. Massive uncertain trajectory data enabled many novel spatial queries. Trajcevski et al. introduced continuous range queries [19] and nearest neighbor queries [18] on uncertain trajectories. Zhang et al. [27] devised an efficient location-prediction method, and integrated it into an effective indexing structure designed for uncertain moving objects. Zheng et al. [28] presented the probabilistic range queries on uncertain trajectories in road networks and the corresponding efficient solutions.

7 Conclusions and future directions

In this paper, we studied a novel problem of planning unobstructed paths in traffic-aware spatial networks (TAUP queries) to avoid the potential traffic congestions. We propose two probabilistic TAUP queries based on time threshold and congestion probability threshold, respectively. The TAUP queries are useful in some popular applications, such as planning unobstructed paths for VIP bags in airports and recommending convenient routes to travelers. As a solution, we constructed a traffic-aware spatial network to model the traffic awareness and developed two algorithms to compute the TAUP queries efficiently. The performance of the developed algorithms were verified by extensive experiments on real and synthetic spatial data.

Four interesting directions for future research exist. First, it is of our interest to establish a new pattern-based probabilistic model by analyzing historical traffic data, to replace the existing random-walk based model in uncertain trajectory construction (refer to Algorithm 1). Second, it is our plan to establish a new spatial index on traffic-aware spatial networks, and to develop a much tighter lower bound to prune the search space and to further enhance the query efficiency. Third, we are interested to study the cost-aware path planning problem in traffic-aware spatial networks. We may define different levels of segment cost in traffic-aware spatial networks, and low-cost segments (e.g., priority segment and express lane) may have higher priority in path planning. We may also make a trade-off between time/probability threshold and segment cost. Some paths may not guarantee the given thresholds, but they may achieve a comparable low travel cost, and they may still need to be taken into account. Fourth, we plan to adopt parallel computing technique to accelerate the process of traffic-aware spatial network construction, especially when we have massive volume of uncertain trajectories to be processed. The construction of traffic-aware spatial network takes two steps: first, we reconstruct uncertain trajectories to possible paths, and second we use these reconstructed paths to construct traffic-aware spatial network. The parallel computing technique is useful in the first step, because the reconstruction process for different uncertain trajectories are independent, and the whole task can easily be divided into N sub-tasks for N different workers. The overall performance of the first step can be improved by approximate N times in terms of CPU time.