1 Introduction

Most of us travel along regular routes, whether daily, weekly, or monthly, and these routes exhibit patterns that reflect our lifestyles and work schedules. The paths are anything but random; they follow our habitual trails. For instance, students take regular school buses every day; working professionals shuttle between home and office every morning and evening; and for recreation, we patronize our favorite hangout places regularly. The same applies to walking, driving, and riding on public transport, all of which leave trails of our activities on the road network. Building on this principle, many smart city projects are now being discussed and planned; one of the main parts of smart city synergy [1, 2] is sensing and collecting traffic patterns, analyzing them, and inferring a suitable traffic or town planning strategy for improving quality of life. Applications designed to enhance our transportation systems, such as crowd control, security surveillance, traffic jam remedies, and better allocation or sharing of road resources, have emerged recently. The quality of the underlying decision making, however, hinges on how efficiently and effectively the analytics can analyze and predict the next moves of every individual.

Positioning technology [3] has grown in popularity in recent years as its accuracy and affordability have improved. Location-based services for smartphones are now common [4]; they signal the users’ current geographic locations to the mobile network and pull information about nearby points of interest [5]. Predicting the next location a person is moving to from his current position is becoming essential for location-based analytics, which in turn supports intelligent transportation systems [6] and location-based services (LBS) [7] for individuals. Taking unmanned vehicles, a rising research topic, as an example, Nissan reported that if the route of a vehicle could be known and preset in advance, hybrid fuel economy could be improved by 7.8% [8]. This is seconded by Tate and Boyd, who devised an optimal control scheme for a hybrid vehicle assuming the route is already known [9].

For predicting a traveler’s trajectory given his current position and his historical path, a number of algorithms exist [9,10,11,12]. These algorithms usually attempt to match a predicted future path with the existing path with the lowest fitting error. They often assume that the trajectory carries elements of people’s intentions and destinations, as well as the moving speeds and their changes along the way. They work by connecting the dots along the paths, so as to infer the future path for a particular traveler.

In this paper, we investigate an alternative type of trajectory prediction model which simplifies the trajectory details using the concept of segments. The trajectories or paths travelled by all the users are aggregated into travel segments, generalizing them into representative frequently travelled paths. Each segment is made up of turns at the junctions along the way, thereby ignoring the distance and direction details between the turns. When we piece together a series of segments representing a current trajectory, the direction of the next turn at the junction where the traveler currently is can be predicted from the segments he has travelled through. This “next-turn” prediction model works by searching the history of all the travelers’ paths for those that match the current traveler’s path and reading off the next turn they took. In other words, our new model does not require detailed matching. The next turn of a path is inferred from a similar representative travel pattern drawn from an aggregated archive of data. Hence there is no need to collect sufficient path records for an individual traveler, nor to dig into the details of each path travelled by that traveler. The model is therefore more suitable for big data analytics [13], where mass data (instead of individual profiles) are often dealt with. We anticipate that the model is efficient because it simply matches segments that were already generated and transformed from the original raw data in a preprocessing step, performed when new records from the sensors are added to the database. The segment database is believed to represent the purposes and traveling patterns of the travelers well without explicitly recording the details.
For example, a high traffic flow with high traveling probability from the start to the end of a highway connecting some important suburbs to the central business district on weekday mornings represents a popular route for people arriving at the office for work. Their turns, in terms of entries to and exits from the highway, could be represented by probabilities for guessing where the mass would go. Other examples include, but are not limited to, a frequently travelled route linking public transport stations to tourist attractions, occasional heavy vehicle flows to and from stadiums during events, popular transits at certain intersections connecting people living and working in different areas, and routes that direct people to and from an airport.

We predict by matching the current trajectory without regard to any particular traveler’s historical paths. Using travelled segments, which give a rough representation of the accumulated trails, together with collective segments summed over the population in the archive, simplifies the computation significantly.

The remainder of this paper is structured as follows. Section 2 presents related work on possible application scenarios as well as similar trajectory prediction techniques. In Sect. 3, the prediction model is described. Simulation experiments and the results are reported in Sect. 4. Section 5 concludes the paper.

2 Related work

2.1 Application scenarios

Some possible application scenarios in which our model predicts the next turn given a current position are listed below. (1) Urban security: after a crime has taken place, the robbers are on the run. With the aid of real-time CCTV cameras streaming live data feeds [14], the police know the trajectory the robbers have taken so far. The robbers were last seen at a road junction, but their identities and destinations are unknown. In this situation, the police would want to know the probabilities of the subsequent turns from the junction, given the path the robbers have travelled so far, so as to continue the pursuit. Our model predicts the most probable turn (or direction) that a traveler would take given the path pattern that he came from. (2) LBS recommendation service [15]: we assume the traveler is a tourist who has visited a series of spots of interest. From the path connecting the tourist attractions he has visited, we could recommend the next destination this tourist should visit. The recommendation is based on the most probable next turn given his current location, as taken by most people who have already travelled in the same way. (3) Road navigation system for vehicle drivers: since drivers’ trajectories are tracked as a series of segments, the probabilities of traffic flows could be displayed to the drivers at each turn. This tells drivers which turns from the current junction most vehicle drivers have taken, given the routes that they have travelled. This is useful when an explicit destination is neither known nor stored in the system; drivers can adopt a go-with-the-flow strategy and follow the most probable route taken in the past by drivers who travelled the same route to the current junction.
(4) Road planning: by knowing the probabilities of the turns at each junction, simulation models of road traffic usage could be built more accurately and give better results, especially in what-if scenario planning. (5) Catching the catch: the same idea could be applied to moving animals instead of humans, for example, in animal hunting, netting fish, and intercepting moving targets on the road. At certain time periods and in certain seasons, fish in the wild usually swim past a specific place, from one stream to another. When their historical paths are known, fishermen could predict when and where they will make a bigger catch. The same principle applies to anything that is on the run over the road network [16]; knowing where they came from and where they are now, the model predicts their next turns from the collective big data.

2.2 Trajectory inference models

Pu [10] gave a method based on dead reckoning. She predicted the next latitude and longitude from the current latitude, longitude, and bearing, taking speed and time intervals into consideration. According to her method, once these elements are obtained, the travelled distance is the speed multiplied by the travel time. The next latitude and longitude can then be predicted from the current latitude, longitude, bearing, and distance. Using this algorithm, a remote system could track the real-time position of the pedestrian. It was argued that the GPS signal could be affected by interference from various factors such as weather, obstruction by buildings, and electromagnetic blind spots underground and in enclosed lifts. Even if the GPS signal is received normally, there is also a delay at the remote monitor. Therefore, because so many factors feed into the prediction, the result is at best an approximation of the actual situation.
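As an illustration of the dead-reckoning step described above, the following minimal sketch computes the next position from the current latitude, longitude, bearing, speed, and time interval. It is not Pu's implementation: the function name `dead_reckon`, the spherical-Earth radius, and the use of the standard great-circle destination formula are assumptions made here for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth with mean radius


def dead_reckon(lat_deg, lon_deg, bearing_deg, speed_mps, dt_s):
    """Estimate the next (latitude, longitude) in degrees from the current fix."""
    distance = speed_mps * dt_s            # travelled distance = speed x travel time
    delta = distance / EARTH_RADIUS_M      # angular distance in radians
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)      # bearing measured clockwise from north

    # Great-circle destination point for a given start, bearing, and distance.
    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(theta))
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)
```

For example, `dead_reckon(39.9, 116.4, 0.0, 1.4, 60.0)` moves the fix 84 m due north (1.4 m/s for 60 s) and leaves the longitude unchanged. Any error in the measured speed or bearing accumulates with each step, which is consistent with the remark that the result is at best an approximation.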

In Liu and Karimi’s [17] algorithm, the current location of the person is taken as the center of a circle whose radius is the road speed limit multiplied by the prediction period. Exit points are the points where the circumference of the circle intersects the roads. A search tree is constructed with the current location as the root and the exit points as leaf nodes. The shortest paths from the root to each leaf are regarded as the possible paths. A probability is associated with taking each node, so according to the probabilistic model, the probability of reaching each exit point from the current position can be calculated. The path that has the highest probability is the most probable route. This method considers the coverage of a circle of given radius, which is unlike ours. We instead center on the road junctions and approximate the path by segments rather than by precise distances derived from a circle radius.

In Ye’s [18] algorithm, they mined the pedestrian’s route to generate routes of interest (ROI). Then all the ROI points are integrated and transformed into a prediction tree. Based on the tree, the next ROI of the pedestrian is predicted. This method uses a concept of simplifying the detailed routes that is similar to ours. We take it to a higher level of abstraction by summing up the routes travelled by all the travelers.

Anagnostopoulos [19] proposed a context model based on classification, which deals with location prediction of moving users. The induced model predicts the next movement of a mobile user with certain moving profile and history of movements. Then temporal context periods throughout a day are also incorporated in the proposed model. Two classification schemes are introduced in order to support location prediction. These schemes are evaluated with three data mining algorithms, and the most accurate algorithm is adopted. Finally, the classification schemes are also compared with three non-data mining schemes for location prediction, by means of prediction accuracy.

In Liao’s [20] method, they introduced a hierarchical Markov model to learn and infer a user’s daily movement through an urban community. Multiple levels of abstraction are used to bridge the gap between raw GPS data and high-level information such as the user’s destination and mode of transportation. Rao-Blackwellized particle filters are used to achieve efficient inference. Locations where the user frequently changes mode of transportation, such as bus stops and parking lots, are learned from GPS data logs without manual labeling of training data. The model then detects novel behavior or user errors by modeling activities in the context of the user’s historical data. Recently, there is an emerging research trend of applying analytics to big road traffic data. The applications range from using big data to predict highway traffic accidents [21] and analysis of transportation systems [22] to vehicle speed estimation [23].

3 Segment-based prediction model

3.1 Model formulation

The concept of segments, current trajectory, and future trajectory is shown in Fig. 1. The objective is to predict which turn a traveler will take at a junction, continuing from his current position. The junction is called the decision point. The travelled paths (from markers S to P as shown in Fig. 1) are characterized by segments, each of which is a series of junctions connected in some order.

Fig. 1 Illustration of segment, current trajectory, and future trajectory

The formulation of the prediction model is based on the layout of a road map where the accessible routes and junctions are shown. In the simplest form, the entire map contains the information \(M=\langle V,E\rangle \). The junctions in the map are represented by a collection of vertices, V. A vertex in M is denoted by \(v_{i}\ (i\in {\vert }N{\vert })\), where \({\vert }N{\vert }\) is the number of possible branches spanning out from the junction. A series of junctions which a traveler has passed is an ordered vector denoted by \(\rho _{j} =(v_{1},v_{2},v_{3}, {\ldots } v_{M})\). The junction where the traveler currently is located is called the decision point \(v_{D}\). The next junction (or direction of the turn) that the traveler is predicted to turn into from \(v_{D}\) is \(v_{N}\), which is called the future vertex. \(s_{D}\in E\) is the series of vertices that the traveler has passed through before reaching \(v_{D}\). It is called the current trajectory and holds a list of visiting records. At the decision point \(v_{D}\), the probability \(P(\bullet )\) of taking a future turn \(v_{N}\) is shown in Eq. 1.

$$\begin{aligned} P\left( v_{N}, v_{1}, v_{2}, v_{3}, {\ldots }v_{D}\right)= & {} P\left( v_{N}{\vert }s_{D}\right) \nonumber \\ P\left( v_{N},s_{D}\right)= & {} P\left( v_{N}{\vert } s_{D}\right) \end{aligned}$$
(1)

where \(v_{1}, v_{2}, v_{3}, {\ldots } v_{D}\) are transformed into a vector of segments as \(s_{D}\). The segment vector is a simplified version of route that the travelers have already passed through \(v_{1}, v_{2}, v_{3}, {\ldots } v_{D-1}\), prior to arriving at the decision point \(v_{D}\). \(P(v_{1}, v_{2}, v_{3}, {\ldots } v_{D})\) is the summation of the probabilities in terms of frequency counting of taking \(v_{1}, v_{2}, v_{3}, {\ldots } v_{D}\).

\(P(v_{N}{\vert }s_{D})\) defines the probability of taking \(v_{N}\) when the traveler walked through \(v_{1}, v_{2}, v_{3}, {\ldots } v_{D-1}\) that leads to the decision point \(v_{D}\). When the traveler is now standing at the crossroad at \(v_{D}\), which is the decision point, the probability matrix of taking each future vertex is shown as follows:

$$\begin{aligned} \overline{P(M)} = \begin{bmatrix} P\left( v_{1} |s_{k} ,s_{1} \right) & P\left( v_{2} |s_{k} ,s_{1} \right) & \cdots & P\left( v_{i} |s_{k} ,s_{1} \right) \\ \vdots & \vdots & \ddots & \vdots \\ P\left( v_{1} |s_{k} ,s_{j-1} \right) & P\left( v_{2} |s_{k} ,s_{j-1} \right) & \cdots & P\left( v_{i} |s_{k} ,s_{j-1} \right) \\ P\left( v_{1} |s_{k} ,s_{j} \right) & P\left( v_{2} |s_{k} ,s_{j} \right) & \cdots & P\left( v_{i} |s_{k} ,s_{j} \right) \end{bmatrix} \quad \forall ~i,j,k \end{aligned}$$
(2)

In this prediction matrix, the conditional probability \(P\left( {v_i|s_k ,s_j } \right) \) is the probability that a traveler who has passed \(s_{k}\) finally gets to \(v_{i}\) along the trajectory \(s_{j}\). \(v_{i}\) is a future vertex like \(v_{N}\). \(s_{k}\) is the trajectory the traveler has passed. \(s_{j}\) is a series of vertices the traveler takes from \(v_{D}\) to \(v_{i}\). The sum of each column in Eq. 2 can be expressed as follows:

$$\begin{aligned} \overline{P(M)}&= \left[ \sum _{z=1}^{j} P\left( v_1 |s_k ,s_z \right) ,\ \sum _{z=1}^{j} P\left( v_2 |s_k ,s_z \right) ,\ \ldots ,\ \sum _{z=1}^{j} P\left( v_i |s_k ,s_z \right) \right] \\&= \left[ \frac{\left| s_k ,v_1 \right| }{\left| s_k \right| },\ \frac{\left| s_k ,v_2 \right| }{\left| s_k \right| },\ \ldots ,\ \frac{\left| s_k ,v_i \right| }{\left| s_k \right| } \right] \end{aligned}$$
(3)

where \(s_k \) is the trajectory of the traveler who has passed the sequence of vertices \(v_{1}, v_{2}, v_{3} {\ldots } v_{D}\). \(\left| {s_k } \right| \) is the historical number of travelers who have passed this path crossing through the vertices sequence \(v_{1}, v_{2}, v_{3} {\ldots } v_{D}\) in the path history database. \(\left| {s_k ,v_N } \right| \) is the number of times that travelers have passed the sequence \(v_{1},v_{2}, v_{3} {\ldots } v_{D}\) as historical path, leading to the next vertex \(v_{N}\).

\(\mathop \sum \nolimits _{z=1}^j P\left( {v_i |s_k ,s_z } \right) \) is the probability that a traveler who has passed the segments \(s_{k}\) finally reaches the point \(v_{i}\), summed over every trajectory \(s_{z}\) that the traveler may take between the decision point \(v_{D}\) and \(v_{i}\).

From Eq. (3), the probability that a traveler who has passed a series of vertices represented by \(s_{k}\) eventually arrives at one of the possible candidate future vertices \(v_N^{\prime } \) is as follows:

$$\begin{aligned} P\left( {v_N^{\prime } ,s_k } \right) =\mathop \sum \limits _{z=1}^j P\left( {v_N^{\prime } |s_k ,s_z } \right) =\frac{\left| {s_k ,v_N^{\prime } } \right| }{\left| {s_k } \right| } \end{aligned}$$
(4)

The most probable next turn is the candidate \(v_N^{\prime }\) that maximizes \(P(v_N^{\prime },s_{k})\).
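The frequency-counting form of Eq. 4 can be sketched in a few lines. The sketch below is illustrative rather than the implementation used in the experiments: the function names, the representation of a path as a list of junction IDs, and the fixed `history_len` window standing in for \(s_{k}\) are all assumptions. It also counts \(\left| s_k \right| \) only over occurrences that are followed by a further junction, so that the probabilities of the candidate turns sum to one.

```python
from collections import Counter, defaultdict


def build_counts(paths, history_len):
    """Count |s_k| and |s_k, v_N| over all archived paths.

    paths: iterable of junction-ID sequences, one per archived trip.
    history_len: number of junctions, ending at the decision point, used as s_k.
    """
    prefix_counts = Counter()           # |s_k|
    next_counts = defaultdict(Counter)  # |s_k, v_N| for each candidate v_N
    for path in paths:
        for end in range(history_len, len(path)):
            s_k = tuple(path[end - history_len:end])
            prefix_counts[s_k] += 1
            next_counts[s_k][path[end]] += 1
    return prefix_counts, next_counts


def next_turn_probabilities(current, prefix_counts, next_counts, history_len):
    """Return {v_N: |s_k, v_N| / |s_k|} for the tail of the current trajectory."""
    s_k = tuple(current[-history_len:])
    total = prefix_counts.get(s_k, 0)
    if total == 0:
        return {}                       # no matching history in the archive
    return {v: n / total for v, n in next_counts[s_k].items()}


def predict_next_turn(current, prefix_counts, next_counts, history_len):
    """Pick the candidate vertex with the highest probability, or None."""
    probs = next_turn_probabilities(current, prefix_counts, next_counts, history_len)
    return max(probs, key=probs.get) if probs else None
```

With an archive of four hypothetical trips, `A-B-C-D` twice, `A-B-C-E` once, and `X-B-C-E` once, and `history_len=3`, a traveler who has passed `A, B, C` gets probability 2/3 for `D` and 1/3 for `E`, so `D` is the predicted turn.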

3.2 Prediction procedures

One prime ingredient for the prediction is the raw sensing data collected over time, which mark the time-stamped locations traversed by all the road users. In a general sense, the road users can be cars, bikes, or pedestrians. In our model, a minimum of three segments is used to mimic the shape of the current trajectory, which is then matched against the historical paths. Figure 2 shows a step-by-step workflow that preprocesses the raw data into segments, identified by path IDs, for establishing a trajectory. The raw data format consists of data fields such as latitude, longitude, altitude, and a time-stamp in the form of day of the week, date, and time. The converted data format has the same fields with an automatically generated path ID number appended; the time-stamp is also simplified for fast processing.

Fig. 2 Workflow of the general model setup procedures

A real-life GPS dataset from Microsoft Research Asia (see Footnote 1), which contains 17,687 trajectories, is used in the experimentation for validating the proposed model. This dataset recorded a broad range of users’ outdoor movements, including not only life routines like going home and going to work but also entertainment and sports activities, such as shopping, sightseeing, dining, hiking, and cycling. It is widely distributed over 30 cities of China and also covers some cities in the USA and Europe. The majority of the data was created in Beijing, China. Each folder of this dataset stores one user’s GPS log files in PLT format. Although the PLT extension is more commonly associated with plotter files from the AutoCAD design program, here each PLT file is a GPS track log. Each PLT file contains a single trajectory and is named by its starting time, so there are 17,687 PLT files in total. The files contain fields such as latitude, longitude, height, speed, and heading direction. After preprocessing, we retained latitude, longitude, and time-stamp as spatio-temporal data.
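The preprocessing step that keeps only latitude, longitude, and time-stamp and attaches a path ID could look like the sketch below. The record layout is an assumption, not something stated in the text: it supposes a plain-text log with six header lines followed by comma-separated records whose first two fields are latitude and longitude and whose sixth and seventh fields are a date (`YYYY-MM-DD`) and a time (`HH:MM:SS`). The function name `parse_plt` and the sample values are hypothetical.

```python
from datetime import datetime

HEADER_LINES = 6  # assumption: number of header lines before the data records


def parse_plt(text, path_id):
    """Convert the text of one track log into (path_id, lat, lon, time-stamp) tuples."""
    records = []
    for line in text.splitlines()[HEADER_LINES:]:
        fields = line.strip().split(",")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        lat, lon = float(fields[0]), float(fields[1])
        stamp = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M:%S")
        records.append((path_id, lat, lon, stamp))
    return records


# Illustrative input only; the header text and coordinates are made up for the sketch.
sample = "\n".join(
    ["header line %d" % i for i in range(1, 7)]
    + ["39.984702,116.318417,0,492,39744.1201851852,2008-10-23,02:53:04",
       "39.984683,116.318450,0,492,39744.1202546296,2008-10-23,02:53:10"]
)
rows = parse_plt(sample, path_id=1)
```

Each resulting tuple carries the generated path ID together with the retained latitude, longitude, and time-stamp; grouping the tuples by path ID then yields one trajectory per file.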

4 Experiments

The experiments are designed with the objective of verifying the feasibility of the proposed segment-based model. Four main aspects of the design of the prediction model may vary and in turn might affect its performance: (1) the number of segments that compose a trajectory; (2) the length of the trajectory or path that is already known; (3) the shape of the trajectory; and (4) the effect of time filters. The dataset contains path information of both past and future. By referring to the time-stamps, we can control which paths are treated as already travelled and which are treated as so-called future paths. Thus, it is possible to validate the accuracy of the prediction model, since the future paths that extend from the travelled paths are already stored in the database used for testing.
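The validation idea, hiding the known future turn of a test path and checking whether the model recovers it, can be sketched as follows. This is a simplified stand-in for the experimental setup: the function name `holdout_accuracy`, the list-of-junction-IDs representation, and the choice to skip test paths with no matching history are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def holdout_accuracy(history_paths, test_paths, history_len):
    """Fraction of test paths whose last junction is predicted correctly.

    The last junction of each test path plays the role of the known 'future'
    turn; the history_len junctions before it are the travelled trajectory.
    """
    table = defaultdict(Counter)
    for path in history_paths:
        for end in range(history_len, len(path)):
            table[tuple(path[end - history_len:end])][path[end]] += 1

    hits = trials = 0
    for path in test_paths:
        if len(path) <= history_len:
            continue                      # too short to hold out a future turn
        travelled = tuple(path[-history_len - 1:-1])
        actual = path[-1]
        candidates = table.get(travelled)
        if not candidates:
            continue                      # assumption: skip paths with no history
        predicted = candidates.most_common(1)[0][0]
        trials += 1
        hits += predicted == actual
    return hits / trials if trials else None
```

With a hypothetical history of three trips `A-B-C` and one trip `A-B-D`, and test paths `A-B-C` and `A-B-D`, the model predicts `C` in both cases and the accuracy is 0.5.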

4.1 Testing with different numbers of segments

By default, we arbitrarily choose three segments for fitting a trajectory that contains information on the locations that have been travelled. In principle, any number of segments can be used to form a trajectory. If a different number of segments is used to match the shape of the current trajectory, we may get different results. So, in this part of the experiment, we test different numbers of segments and check the effect on the prediction accuracy. We randomly select some testing paths that are made up of 2 to 5 segments. Paths with IDs #2, #3, #5, and #9 have been selected (the ID numbers are arbitrary). It was found that a path made up of five segments covers most of the complex shapes and almost reaches the maximum length of the paths that most people would travel in a city. Five segments are sufficient to cover even the most complex travel patterns without any recurrence or return to the original place, e.g., return trips. The paths selected here are those that took place within half a day (12 hours). They all have distinct starting and ending positions, as one-way trips. The results are given in Table 1, which shows the probabilities of the possible turns. The turn with a sufficiently high probability is chosen as the predicted turn; these are marked in italics in the table. Then, the predicted turn is checked against the actual turn from the data archive. The same experiment is repeated 20 times with the same setting but with different randomly picked paths from the large data archive. The average accuracies are charted in Fig. 3.

Table 1 Experiment results of different numbers of segments used
Fig. 3 Average results of different numbers of segments used

Figure 3 shows that each line stays almost flat as the number of segments used varies. Likewise, in Table 1, we can observe that the number of segments affects the accuracy only slightly. To quantify this, we compute the rate of change of accuracy with respect to the number of segments using the simple formula \(\frac{\Delta _{accuracy} }{\Delta _{segment\,number} }\). The averaged rates of change for the four randomly chosen paths are 0.966667%, 1.393333%, −0.1%, and 0.00054%, respectively. Averaging these rates gives 0.565%. As a result, the accuracy changes by only about 0.565% on average when one segment is added or removed. The effect of the number of segments is quite insignificant.
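The arithmetic behind the 0.565% figure can be checked with a short script. The four rates are the values reported above; the helper `change_rate` is a hypothetical name for the \(\Delta _{accuracy}/\Delta _{segment\,number}\) formula, included only to show how one such rate would be obtained from two measurements.

```python
def change_rate(acc_from, acc_to, seg_from, seg_to):
    """Accuracy change (in percentage points) per additional segment."""
    return (acc_to - acc_from) / (seg_to - seg_from)


# Averaged change rates reported for the four test paths, in percent.
reported_rates = [0.966667, 1.393333, -0.1, 0.00054]

# Mean of the four rates: (0.966667 + 1.393333 - 0.1 + 0.00054) / 4.
mean_rate = sum(reported_rates) / len(reported_rates)
print(f"{mean_rate:.3f}%")
```

The sum of the four rates is 2.26054, so the mean is 0.565135, which rounds to the 0.565% quoted in the text.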

4.2 Testing with different lengths of trajectory

In this part, we vary the length of the current trajectory in order to investigate the relationship between current trajectory length and accuracy. In this test group, 2 to 4 segments are used to fit the shape of the current trajectory. The same experiment setting is kept; 20 sample runs are drawn randomly from the data archive. The results are given in Table 2. Again, the paths that produce sufficiently high probabilities are marked in shaded cells. The accuracies from the 20 samples are averaged and charted in Fig. 4.

Table 2 Experiment results of different lengths of trajectory used
Fig. 4 Average results of different lengths of paths used

Figure 4 shows that the average accuracy of guessing the future turn correctly for each test path rises as the current trajectory length increases. Cross-checking with the results in Table 2, we can see that the accuracy improves incrementally with every 100 meters of extra distance added. From the above data, we observe that the prediction accuracy grows in direct proportion to the current trajectory length. For every 100 meters added to the current trajectory, the accuracy grows by 0.46418625% on average.

4.3 Testing with different complexities of trajectory

In this part of the experiment, we pick ten test paths randomly and place them into two groups according to their trajectory characteristics. In general, the paths in group A are more straightforward than those in group B; the trajectories in the two groups differ in their twists and turns. The objective of this experiment is to evaluate the performance of the prediction model under different shapes of trajectory. The same experiment setting is used, and 20 samples from each path are taken for evaluation. The results are averaged and charted in Figs. 5 and 6 for group A and group B, respectively. The selected ten paths are given in the “Appendix.”

Fig. 5 Average accuracy versus path length for group A of paths

Fig. 6 Average accuracy versus path length for group B of paths

From Figs. 5 and 6, which show the performance for groups A and B, respectively, the accuracy variation of each group is estimated as the summed accuracy of its tests divided by their summed length: average accuracy \(=\frac{\mathop \sum \nolimits _i Accuracy\left( i \right) }{\mathop \sum \nolimits _i Length\left( i \right) }\), where \(i\) runs over the tests in the group. So,

  • Group A: Average accuracy for group A = \(\frac{447.77\% }{14479}\) = 0.030925%

  • Group B: Average accuracy for group B =  \(\frac{359.42\% }{7418}\) = 0.04845241%
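The two group ratios can be reproduced directly from the totals quoted in the list above. The function name `accuracy_per_length` is a hypothetical label for the formula; the totals are the reported ones.

```python
def accuracy_per_length(total_accuracy_pct, total_length):
    """Summed accuracy (in percent) divided by summed path length."""
    return total_accuracy_pct / total_length


group_a = accuracy_per_length(447.77, 14479)  # group A totals from the text
group_b = accuracy_per_length(359.42, 7418)   # group B totals from the text
ratio = group_b / group_a                     # how much steeper group B is

print(f"A: {group_a:.6f}%  B: {group_b:.8f}%")
```

The script yields 0.030925% for group A and 0.04845241% for group B, so the accuracy gained per unit of path length in group B is a little more than one and a half times that of group A.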

According to the group results, as the current trajectory length increases, the collective accuracy of the paths in group B increases more than that of the paths in group A. That means that, for the same path length, higher prediction accuracy can be achieved for paths with more complex shapes than for those with a simple structure. In other words, complex structures characterize the paths better and hence identify the traffic flow more strongly and uniquely.

To sum up thus far, we can draw the following conclusions from the experiment results.

  1. Accuracy generally improves as the current trajectory length increases.

  2. At the same trajectory length, a higher accuracy is obtained from the group with the more complex trajectory character, that is, with more twists and turns.

So, the prediction accuracy depends on the current trajectory length and on the trajectory character in terms of its turns.

4.4 Testing with time filter

In reality, travel traits are time related because people (travelers) travel according to time schedules. For example, certain roads are more travelled than others during business peak hours, school hours, and religious services that take place at certain hours of the day and on certain days of the week.

Within certain time periods, a large number of people do the same thing. In the morning, most of us go to work or study; in the afternoon or evening, we return home. At noon, people go out for lunch. Therefore, our prediction model can be modified with a time filter that refines the predictions to take account of different time periods of the day. In this modification, we match the current trajectory against historical paths gathered within a specified time period instead of all the historical paths in the database.

Fig. 7 Average accuracy of prediction with and without time filters

Let \(\delta \) be the time period of a day.

Let \(\delta ({\vert }s_{k}{\vert })\) be the number of historical paths that travelers have traversed through the vertices sequence \(v_{1},v_{2}, v_{3} {\ldots } v_{D}\) during the time period \(\delta \).

\(\delta ({\vert }s_{k},v_N^{\prime }{\vert })\) is the number of historical paths in which travelers have traversed the vertex sequence \(v_{1},v_{2},v_{3} {\ldots } v_{D}\) and subsequently reached \(v_N^{\prime }\) during the time period \(\delta \). Substituting into Eq. 4, the time-based prediction can be approximated by \(\frac{\delta (|s_k ,v_N^{\prime } |)}{\delta \left( {\left| {s_k } \right| } \right) }\).
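The time-filtered ratio can be sketched as a small variation on plain frequency counting: only trips that fall inside the period \(\delta \) contribute to either count. The sketch below is an illustration under assumptions, not the experimental code: each trip is represented as a pair of a junction-ID list and a single time of day, and the function name and the inclusive period bounds are choices made here.

```python
from collections import Counter
from datetime import time


def time_filtered_probabilities(trips, s_k, start, end):
    """Next-turn probabilities using only trips recorded within [start, end].

    trips: iterable of (junction_sequence, time_of_day) pairs.
    s_k: junction sequence ending at the decision point.
    """
    s_k = tuple(s_k)
    n = len(s_k)
    total = 0              # delta(|s_k|): matches inside the time period
    reached = Counter()    # delta(|s_k, v_N'|) for each candidate v_N'
    for junctions, t in trips:
        if not (start <= t <= end):
            continue       # trip lies outside the time filter
        for i in range(n, len(junctions)):
            if tuple(junctions[i - n:i]) == s_k:
                total += 1
                reached[junctions[i]] += 1
    if total == 0:
        return {}          # too little data in this period
    return {v: c / total for v, c in reached.items()}
```

For instance, if the hypothetical archive holds trips `A-B-C` at 8:00 and 9:00, `A-B-D` at 8:30, and `A-B-D` at 18:00, 18:30, and 19:00, then for a traveler at `B` coming from `A` the 7:00 to 10:00 filter gives `C` a probability of 2/3, while the 17:00 to 20:00 filter gives `D` a probability of 1. An empty result corresponds to the case where a period has too few records to support a prediction.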

In our experiment, three main time periods are used: 7:00 to 10:00, 11:00 to 15:00, and 17:00 to 20:00, which broadly represent morning, noon, and evening. The time periods are imposed on the 12 paths, with 20 samples for each path, and the results are given in Table 3. The average accuracies are charted in Fig. 7.

Table 3 Experiment results when time filter is applied

The first column lists the arbitrary IDs of the test paths; the second column lists the results without a time period constraint. We ignored results with little data support, such as those for test paths 8, 9, and 10, because few records are available during those times; the probability in this situation drops to almost zero \(({\approx } 0\%)\). From the results given in Table 3, we compare the results without a time period constraint with the results with one. The prediction accuracies are averaged over the samples and charted in Fig. 7.

As shown in Fig. 7, in most cases applying a time filter improves the prediction accuracy for a test path compared with no time filter. Some time filters, such as the morning and evening periods, make a strong difference to the results. The evening period has the strongest influence on the prediction. Presumably, a very high volume of traffic (even traffic jams) occurs during those peak hours, when almost everybody is heading home out of the city along a limited set of routes. The experiment results show that time filters influence the prediction results and accuracies. For instance, for test path 6, the prediction without a time constraint is future trajectory 3 with 60.84% probability, but the prediction for the time period from 11:00 to 15:00 is future trajectory 2 with 59.15% probability. If the trajectory to be predicted took place between 11:00 and 15:00, the prediction would therefore be future trajectory 2.

Compared with classical state-of-the-art algorithms such as those of Liu [17], Ye [18], Liao [11], and Jeung [12], we offer an alternative model which does not attempt to predict the exact next locations. Rather, our model approximates which turn, out of all possible turns from a road junction, a traveler is likely to take after he has travelled a certain length of path. This approximate prediction model greatly reduces the data volume required for big transport data, because information on traveling velocities and profiles of each individual road user are not required. It is based on the belief that the traveling patterns of each individual user do not fluctuate too much over time. So collectively, when the traveling patterns of all the people in a city have been aggregated, the next-turn probabilities at each road junction become quite stable and are resilient to abrupt changes by a minority. If the probabilities do change, it implies that there are changes in the trends of traffic movements, which may trigger the need for road traffic re-planning.

In terms of data processing speed, which depends on the underlying database structure and the search algorithm being used, the probability inference typically takes from a few to dozens of minutes. Most of the time is spent scanning the database, whereas the probability computation itself is extremely simple and fast. Once the probabilities are computed and stored for each road junction, testing is very fast and does not need to refer to the data archive. Hence, this model is suitable for big data analytics, for quickly looking up the likelihood of each turn at a junction and guessing where the traveling object [20] would proceed next, given the trajectory it has travelled so far.

5 Conclusion

In recent years, with the rapid popularization of vehicle navigation equipment, smartphones, and portable computing devices, the collection of trajectory data has become more convenient than ever. The historical trajectory data of people or cars contain important information about their movements, and by analyzing it, we can predict the user’s next location, which enables a greater variety of services for the users. Many factors affect location prediction, such as path distance, weather conditions, and landform. In this paper, a novel next-turn prediction model is formulated and studied. The next-turn prediction model predicts the most likely turn that a traveler would take from a road junction after he has travelled through a certain pattern on the roads. Unlike other similar prediction models, our model is inferred from the collective traveling data of the mass. Matching precisely against individual users’ past records is not needed; therefore, user profiling is not required. Using mass data, the prediction can be refined by a time filter which matches against path histories from only certain time periods instead of all. Our experiment results show that the length of the past trajectory that a user has travelled has some influence on the accuracy of next-turn prediction. The complexity of the trajectory also affects the accuracy, because a trajectory with a complex (hence relatively unique) shape can better identify and predict the future turns at the road junctions ahead.

As future work, more experiments will be devoted to modeling this phenomenon of trajectory complexity in relation to next-turn prediction accuracy. It would also be interesting to associate the unique patterns with different roles or purposes of travel, should sufficient information become available in the future. This would give more dimensions of information about how the roads were used, the predicted future road uses, and the directional flows of traffic associated with the purposes of use. Furthermore, the prediction model is anticipated to be enhanced so that, by knowing only which particular turns a traveler has made, we would be able to predict his next turn at the forthcoming junction. Without referring to the past trajectories, the prediction can be made more quickly, making it suitable for real-time prediction in the nick of time. This work can potentially be extended to optimal route planning or traffic engineering, because insights are harvested in terms of the probabilities of the next turn at any junction given the big traffic data.