1 Introduction

Urban traffic networks, which are typical human-centered, time-variant complex systems, have developed rapidly in recent decades, and the problem of creating highly efficient transportation systems has attracted significant attention. Intelligent transport systems (ITSs) [1] enable traffic, road infrastructure, and user data to be recorded and regularly updated to provide vital analysis in management systems for route guidance, traffic control, traffic policy adjustment, and transportation network planning, thus increasing traffic safety and efficiency in road transport. Intensive research has been done in spatiotemporal traffic network analysis. However, few systematic reviews organize the field and map its existing research directions, and the correlations and differences among existing techniques and applications remain unclear.

Typically, successful urban traffic analysis applications must capture the spatially and temporally varying characteristics of the traffic systems they model. Traffic parameters, such as speed, occupancy, and congestion, are dynamic and location-dependent; therefore, analysis systems should give urban traffic managers insight into temporal and spatial pattern recognition, tendency anticipation, and status prediction results (preferably with graphical demonstration). In fact, analysis should provide temporal and spatial similarities, patterns and evolution, and traffic status and predictions in real time. Therefore, it is crucial to develop technologies that efficiently summarize trends in traffic data and provide decision-making help for traffic management.

However, urban traffic network data also tend to be influenced by travel demand fluctuation: traffic control devices, accidents, major events, road construction, and even weather conditions or weekday patterns will interrupt the time-dependent regularity of urban traffic systems. Therefore, not only traffic factors such as volume, velocity, capacity, road structure, and signal control but also non-traffic factors should be considered during spatiotemporal traffic data analysis.

In addition, traffic systems should handle vast amounts of diverse types of data. For instance, NASA captures several gigabytes of remote sensor data each day [2]. As datasets grow larger, different traffic data may be collected and stored in different formats in different traffic management systems, e.g., position data from geographic information systems (GISs); congestion data from traffic control systems (TCSs); trajectory data and patterns generated by the vast majority of traffic systems; signal control data from traffic signal control systems (TSCSs); traffic demand data from mobile terminals, public transportation systems, public sensing devices, surveillance cameras, or even apps like Uber; weather data from weather forecast systems; and data from anomalous traffic systems (ATSs). Urban traffic data are distributed, heterogeneous, and autonomous [3]. Urban traffic systems produce a wide variety of data with all kinds of attributes: weather information, which is sequential data; points of interest (POIs) data [4], which are spatial point data; road maps, which are graphic data; motion trajectories, which are spatiotemporal data; traffic flow, which is flow data; social media posts, which are text or image data; and trajectory data, which are essentially directed graph data with time tags. Managing and integrating large-scale heterogeneous traffic data is a challenge for spatiotemporal traffic analysis. When an application uses a variety of data, only associations established in advance between the different data make the later analysis and mining process efficient and feasible.

The goal of this paper is to provide a comprehensive survey of spatiotemporal traffic network analysis techniques to date. First, we introduce some potential applications of spatiotemporal traffic network analysis in Sect. 2. Second, data sources and corresponding preprocessing methods are discussed in Sect. 3. Then, various existing methodologies comprising the state of the art in traffic network analysis are discussed in Sect. 4. The evaluation of models is discussed in Sect. 5. A detailed discussion of future research challenges and new research points is provided in Sect. 6. Finally, Sect. 7 summarizes the state of the techniques described in this paper. This framework is illustrated in Fig. 1.

Fig. 1 Framework

2 Possible applications

2.1 Traffic prediction and forecasting

The most popular and straightforward application of spatiotemporal traffic network analysis is to predict future traffic volume, travel times and congestion, and provide route guidance for navigation systems.

In contrast to simple freeway traffic, urban traffic network flows are complicated and influenced by both traffic and nontraffic factors. Since reductions in congestion depend on network complexity [5], traffic prediction and route guidance for urban networks are much more challenging than for freeway or single arterial traffic. Urban traffic prediction can be further classified as travel time prediction, traffic congestion prediction, and traffic volume/density/status prediction/forecasting [6,7,8,9,10], all of which serve a single purpose: detect spatial, temporal, or spatiotemporal traffic patterns and predict their future tendencies [11].

There are many difficulties in spatiotemporal traffic network analysis for traffic prediction and route guidance. First, different methods use different measurements for traffic patterns; popular metrics include traffic flow [12, 13]; density [13, 14]; speed [15]; travel time [16, 17]; and traffic status [18]. The initial purpose of these methods varies, thus shaping the problem domain in different ways. The unique features of the measurements further shape the mechanisms quite differently. This diversity makes it difficult to compare methods efficiently and fairly.

Models have been proposed for traffic prediction in recent years, but many of them have focused on freeway [18, 19] or simple arterial network [17, 20] prediction. Urban traffic networks are more complex and spatiotemporally dynamic, fluctuating with traffic demands. Furthermore, factors such as signal control and notable events should be considered in urban networks, making the task more challenging. Attempts [21,22,23] have been made to increase the prediction accuracy of urban traffic networks.

As both temporal and spatial traffic data have boomed in recent years, storing and archiving traffic data efficiently and extracting important patterns that can be applied to decision-making have given rise to significant challenges. Attempts have been made to reduce the dimensionality of the data with abstract functional data analysis (AFDA) [24] and principal component analysis (PCA) [25], to process large volumes of data with multiple agents simultaneously [26], or to abstract data to different levels (intersection, subregion, or region) using a hierarchical structure [27].

Finally, urban traffic is influenced by sudden events and accidents, yet conventional methods only consider static traffic networks and predict travel time based on historical data. Numerous approaches to address dynamic traffic networks have been proposed, including regression [28], neural-network-based [29], Kalman filtering [30], and time series estimation methods. However, few of the above incorporate dynamic event response mechanisms. A knowledge-based real-time prediction system has been proposed [5] that uses pattern mining and dynamic weight combination to provide a real-time event response mechanism, thus improving travel time prediction.

In a real-world application, Google obtained a patent in 2013 for traffic prediction that considers space-time relations [31]. To determine the future traffic state of each directed section to be predicted, the system considers the spatial influences exerted on that section by its neighboring sections in the directed graph network. The traffic prediction model incorporates the determined spatial influences together with the historical traffic data of the sections; the spatial influence of a section can be used as a spatial operator, and a time sequence model can be incorporated [32]. In this way, the traffic condition within a spatial scope can be measured more practically, which improves the accuracy of prediction.

2.2 Anomalous pattern discovery

There are always many unexpected events in urban traffic systems, such as natural disasters (earthquakes, floods, etc.), major events and commercial promotions, and traffic accidents and temporary traffic control. Urban functional structure development influences traffic patterns. If anomalous events can be anticipated or inferred, urban traffic management can improve its ability to respond to emergencies, thus ensuring the safety of traffic networks, and reducing tragedy and social and economic loss.

For instance, Pang et al. [33] applied GPS data from taxis to monitor the emergence of unexpected behavior in the Beijing metropolitan area. They aimed to identify sequential cells with time intervals that show the most significant difference from normal historical behavior. The analysis concluded with a case study of festival day traffic and National People's Congress Day traffic, demonstrating the effectiveness of the proposed method. STOTree [34] was proposed to construct outlier trees based on both the spatial and temporal properties of detected outliers (certain links in a time frame); these trees uncover causal relationships among outliers. This method can successfully detect spatiotemporal outliers and their causal interactions that coincide with known abnormal traffic control events, such as the Olympic Games in Beijing. Using identified anomalous events and their causal interactions, the spatiotemporal impact of planned events can be predicted and properly handled in advance [35].
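The general idea of flagging cells and time intervals that differ most from historical behavior can be sketched as follows. This is a minimal illustration using a plain z-score test, not the statistic used in [33] or the outlier trees of [34]; the function name, the data layout, and the threshold of 3.0 are assumptions made for the example.

```python
from statistics import mean, stdev


def anomalous_slots(history, current, threshold=3.0):
    """Flag (cell, time slot) keys whose current count deviates from history.

    history: dict mapping (cell, slot) -> list of past counts for that key
    current: dict mapping (cell, slot) -> count observed today
    Returns a list of (key, z_score), largest absolute deviation first.
    """
    flagged = []
    for key, past in history.items():
        if key not in current or len(past) < 2:
            continue  # not enough data to estimate the spread
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # constant history: z-score is undefined
        z = (current[key] - mu) / sigma
        if abs(z) >= threshold:
            flagged.append((key, round(z, 2)))
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical taxi counts per grid cell for the 08:00 slot on five past days.
history = {
    ("c1", 8): [100, 110, 90, 105, 95],
    ("c2", 8): [50, 52, 48, 51, 49],
}
current = {("c1", 8): 102, ("c2", 8): 90}
result = anomalous_slots(history, current)  # only ("c2", 8) is flagged
```

A per-key z-score treats every cell independently; methods such as those cited above additionally exploit the adjacency of cells in space and time.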

Since it may be difficult to explain why a discovered anomaly is interesting, the authors of [36] proposed to dig deeper for the root of anomalous patterns by studying the traffic between regions rather than on roads. This approach not only reduces complexity but can also detect the root cause of traffic anomalies. Important social events can be cross-checked to extract internal relationships and analogous patterns between events. For example, anomalous increases in traffic volumes on two roads may be caused by Sakura Festivals in different districts of the city. To quantify traffic both spatially and temporally, dictionary-based compression theory has been applied to identify anomalous patterns in multi-dimensional traffic data.

Conversely, if incidents are unplanned, they still can be captured by real-time data collection in transportation networks. Traffic incidents include nonrecurring events such as accidents, weather hazards, or earthquakes [37]. If an incident is observed (from real-time data), its impact on surrounding traffic can be predicted and quantified using historical analysis results from similar incidents [38].

In DBAPRS [39], an intelligent system for analyzing and managing evacuation and relief supply delivery behaviors during large-scale disasters, spatiotemporal traffic network analysis is essential for better understanding and simulating human mobility after disasters. For example, based on the developed analysis model, some new (anomalous) features or mobility patterns can be detected after the recent severe earthquake, tsunami, and release of radioactivity in Japan that are likely to play a vital role in future disaster relief and management worldwide.

The A-map company of China, which plays a role in mainland China similar to that of Google Maps, has published annual traffic analysis reports for major cities in China for 3 years. These reports [40] are so influential that government agencies are currently adjusting their traffic management policies accordingly. In these reports, the spatiotemporal analysis method is also applied to the impact analysis of major traffic accidents. For example, at 6 a.m. on July 9, 2015, near Dajiaoting bridge on the East Fourth Ring in Beijing, a brake failure on a large truck caused a 7-car pileup. The accident led to congestion lasting about 4 hours on the southern part of the East Fourth Ring, including four bridges, as shown in Fig. 2. The congestion queue reached 17 km, and the congestion spread at a speed approaching 4.86 km/h. As shown in Fig. 2, the direct loss from the accident is within the controllable range.

Fig. 2 Space-time distribution diagram of traffic condition on July 9

In addition to the direct loss, the social and economic losses caused by the accident cannot be ignored. According to loop detector data and real-time estimation, the average vehicle arrival rate is about 3500 vehicles per hour, and the traffic jam after the accident is about 7500 vehicles per hour; if the congestion is assumed to remain in a static state throughout, the total delay caused by the accident is about 39800 hours. Supposing an average load of 2.5 people per vehicle and taking the average monthly wage in Beijing as 6463 yuan, the total economic loss of the accident is up to 2.679 million yuan. Obviously, the indirect economic loss caused by traffic delay is much more serious than the direct loss. The analysis of anomalous patterns here considers both the time and space dimensions so as to better analyze the impact of anomalous events, predict their consequences, and provide lessons for future traffic management.
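The arithmetic behind the economic loss estimate can be retraced with a short calculation. The text does not state how the monthly wage is converted to an hourly value; the sketch below assumes 240 hours per month (30 days of 8 hours), an assumption chosen because it reproduces the reported figure of roughly 2.679 million yuan.

```python
# Figures reported in the text.
total_delay_vehicle_hours = 39_800
persons_per_vehicle = 2.5
monthly_wage_yuan = 6463

# ASSUMPTION (not stated in the source): 30 days x 8 hours per month.
hours_per_month = 30 * 8

person_hours = total_delay_vehicle_hours * persons_per_vehicle
hourly_wage_yuan = monthly_wage_yuan / hours_per_month
loss_yuan = person_hours * hourly_wage_yuan

print(f"delayed person-hours: {person_hours:,.0f}")
print(f"estimated loss: {loss_yuan:,.0f} yuan")
```

Under this assumption the delay amounts to 99,500 person-hours, valued at about 26.9 yuan per hour.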

2.3 Traffic network structure analysis and bottleneck detection

Urban traffic congestion indicates, to a certain extent, that the design of the existing road network cannot meet the requirements of urban traffic development. As shown in Fig. 3a, urban traffic networks can be split into regions using high-speed and main city loops; then, spatiotemporal traffic network analysis methods summarize the characteristics of large-scale traffic data traveling in and between different regions. Thus, areas of poor connectivity can be identified to explore weak linkages or bottlenecks in the current urban traffic network.

Figure 3b gives an example of analysis results from these data. The bottleneck intersections and areas are intuitively shown with shading. These results can be used as a reference for traffic management systems and in the development of urban transportation planning. Meanwhile, by comparing results from consecutive years, the effectiveness and influence of some already-implemented planning projects (such as new roads and subways) can be validated.

Fig. 3 Examples of traffic network structural analysis

Hence, traffic bottleneck detection applications have drawn significant attention in recent years. A traffic bottleneck refers to a section of a road or a zone in a network where the traffic supply capacity is below the current traffic demand, thus leading to traffic congestion [41]. The causes of traffic congestion can be classified into three categories: (1) temporary obstruction caused by unusual events such as road construction, accidents, or even traffic regulations such as a temporary speed limit [42]; (2) permanent constraint resulting from insufficient capacity; and (3) stochastic demand fluctuation when traffic demand booms during big events, such as commercial performances.

Because of the nature of traffic flow propagation, once formed, bottlenecks are difficult to monitor and easy to propagate to other areas of traffic networks. Methods have been proposed to study link-side [43] or link-to-link bottleneck propagation [44, 45] with node models, which study the propagation of traffic flow using dynamic equilibrium models. There are still gaps that need further exploration, for example, link-to-area and area-to-area propagation analysis methods.

A cell transmission model (CTM) was proposed to recognize urban congestion propagation [46]; it discretizes intersections into homogeneous cells with split time intervals and identifies bottlenecks by simulating traffic demands. Game-theoretic concepts such as dilemmas [47] can also be used to analyze the spatiotemporal traffic flow behavior of congestion and bottlenecks. A three-phase spatiotemporal traffic bottleneck mining (STBM) model was proposed to discover traffic bottlenecks using location-based services [48]. Moreover, an attempt has been made to explore traffic flow propagation patterns using the concept of conductivity [49]. Google has also applied spatiotemporal analysis to real-world bottleneck detection [50], using congestion detection, time series classification, and analysis of link-speed data and capacity on road sections to determine nonlinear, multi-segment traffic bottlenecks in a transportation network graph. The algorithm uses link-speed data to detect bottleneck conditions and classifies bottlenecks and bottleneck-like traffic features. With fuzzy identification, it identifies and evaluates recurrent bottlenecks even when the location of the bottleneck varies slightly across multiple time periods or across multiple days.
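The core update of a cell transmission model can be sketched for a single one-way link as follows. This is a minimal sketch of the standard CTM formulation (flow into a cell is the minimum of what the upstream cell sends, the inflow capacity, and the remaining space), not the specific variant of [46]; all parameter values and names are hypothetical, and demand that cannot enter the first cell is simply dropped.

```python
def ctm_step(n, demand, q_max, n_max, exit_capacity, w_over_v=1.0):
    """Advance the occupancy of every cell on a one-way link by one time step.

    n             vehicles currently in each cell
    demand        vehicles wanting to enter the first cell this step
    q_max         maximum inflow (vehicles per step) each cell can accept
    n_max         maximum number of vehicles each cell can hold
    exit_capacity maximum outflow (vehicles per step) from the last cell
    w_over_v      ratio of backward wave speed to free-flow speed (<= 1)
    """
    cells = len(n)
    flow = [0.0] * (cells + 1)  # flow[i] enters cell i; flow[cells] leaves the link
    for i in range(cells):
        sending = demand if i == 0 else n[i - 1]
        receiving = w_over_v * (n_max[i] - n[i])
        flow[i] = min(sending, q_max[i], receiving)
    flow[cells] = min(n[-1], exit_capacity)
    return [n[i] + flow[i] - flow[i + 1] for i in range(cells)]


# Hypothetical link: cell 2 accepts only 4 vehicles per step (a capacity drop).
q_max = [10, 10, 4, 10]
n_max = [40, 40, 40, 40]
state = [0, 0, 0, 0]
for _ in range(30):
    state = ctm_step(state, demand=8, q_max=q_max, n_max=n_max, exit_capacity=10)

# Cells that are at least 80% full indicate the queue behind the bottleneck.
queued_cells = [i for i, (veh, cap) in enumerate(zip(state, n_max)) if veh >= 0.8 * cap]
```

In this toy run the queue forms in the cells upstream of the capacity drop and spills back toward the entrance, which is the propagation behavior that bottleneck identification relies on.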

Recognizing bottlenecks is foundational to improving network traffic conditions and preventing traffic congestion. The features of bottleneck and traffic flow propagation patterns should be further explored, considering time, location, and network information.

2.4 Traffic guidance systems

Not limited to simple route guidance algorithms, a traffic guidance system is an important subsystem of ITSs and advanced traveler information systems (ATISs) [51] and can effectively guide most of the vehicles running on a road network, reducing the travel time of vehicles on the road, and equitably distributing traffic volume across the road network.

Traffic guidance systems [52, 53] comprise three components: (1) in-vehicle, (2) out-of-vehicle, and (3) central control systems. In-vehicle systems (see Fig. 4a) gather real-time information and provide route guidance inside a vehicle, while out-of-vehicle systems, which include traffic information broadcasts, traffic guidance screens (see Fig. 4b), and urban parking guidance systems, guide the traffic flow outside the vehicle. Both in-vehicle and out-of-vehicle systems require the coordination of background spatiotemporal analysis of traffic data in central control systems. Note that the above subsystems, such as traffic guidance screens, do not always show real-time traffic information; they may show the anticipated traffic status instead. For example, when an intersection will soon develop traffic congestion, a traffic guidance screen can show the corresponding road in red to guide vehicles to choose other roads, thus avoiding the formation of congestion.

Intuitively, accurate spatiotemporal analysis of the traffic network is also a very important part of intelligent traffic guidance systems and has a direct impact on their guidance effect. In particular, computing an optimal route [54] in a traffic network between specified origin and destination nodes is one of the important branches of real-world traffic guidance applications. Many applications such as autopilot vehicle routing [55, 56], logistics planning, or intelligent navigation systems need to solve a huge number of shortest-path queries in traffic networks. Although the simplest cost evaluation factor here is distance or travel time, the cost function may be any combination of travel time, distance, exhaust gas emission, tolls, energy consumption, comfort index, etc. The task is to compute the costs of optimal paths between arbitrary origin–destination pairs. Since the guidance information should be provided on time, preprocessing is allowed, but it has to be sufficiently fast, and the routing algorithm must be able to handle real-time requests from a large number of users.
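How a combined cost function changes the optimal route can be illustrated with a small example. The sketch below runs Dijkstra's algorithm over a weighted sum of per-edge attributes; the attribute names, the weights, and the toy network are assumptions made for the illustration, and the coefficients must be non-negative for the algorithm to remain correct.

```python
import heapq


def cheapest_route(graph, origin, destination, weights):
    """Dijkstra's algorithm over a weighted sum of per-edge attributes.

    graph:   dict node -> list of (neighbor, attrs), where attrs is a dict
             such as {"time": minutes, "toll": yuan}
    weights: dict attribute -> non-negative coefficient
    Returns (path, cost), or (None, inf) if the destination is unreachable.
    """
    def edge_cost(attrs):
        return sum(weights.get(name, 0.0) * value for name, value in attrs.items())

    best = {origin: 0.0}
    parent = {}
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, attrs in graph.get(node, []):
            new_cost = cost + edge_cost(attrs)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if destination not in best:
        return None, float("inf")
    path = [destination]
    while path[-1] != origin:
        path.append(parent[path[-1]])
    return path[::-1], best[destination]


# Hypothetical network: the route via B is faster but charges a toll.
graph = {
    "A": [("B", {"time": 10, "toll": 5}), ("C", {"time": 15, "toll": 0})],
    "B": [("D", {"time": 10, "toll": 0})],
    "C": [("D", {"time": 15, "toll": 0})],
    "D": [],
}
fastest = cheapest_route(graph, "A", "D", {"time": 1.0})
toll_averse = cheapest_route(graph, "A", "D", {"time": 1.0, "toll": 3.0})
```

With travel time alone the route via B wins; once each yuan of toll is weighted as three minutes, the toll-free route via C becomes optimal.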

Recently, T-drive has also drawn growing attention. Using trajectory information from GPS-equipped taxis, T-drive [57] proposed a routing algorithm based on a landmark system. For example, the method uses a red dot to represent a road segment frequently traversed by taxi drivers (called a landmark); the connection between two landmarks is a virtual edge representing the aggregation of continuous trajectories passing through the connected landmarks. By learning from historical trajectory data, the travel time for any virtual edge can be estimated and used to recommend optimal routes to drivers and to alleviate traffic congestion by dynamically assigning drivers different optimal routes for the same origin and destination (OD) pair.

The environment of optimal route planning can generally be classified into static and dynamic traffic networks [58]. In a static traffic network, the cost associated with each edge is constant, which simplifies the problem domain, while in a dynamic network the traffic system uses real-world maps, so variations on the map, such as traffic congestion, accidents, and traffic controls, influence the costs. Therefore, a dynamic route planning [59] algorithm is aware of the feedback from the traffic system and should provide alternative routes in real time.

Optimal route planning in guidance systems can be treated as a time-dependent shortest-path problem [60], where the traffic network is a time-dependent graph and the edge delay varies over time. The traditional Dijkstra-based routing algorithm encounters difficulty in this dynamic context, which can be overcome effectively by adding time-refinement steps. The proposed method can efficiently obtain the optimal path in large time-dependent graphs.
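To make the notion of a time-dependent graph concrete, the sketch below computes the earliest arrival time when each edge delay is a function of the entry time. It is a plain time-dependent Dijkstra search that is valid only under a FIFO assumption (leaving later never means arriving earlier on the same edge); it is not the time-refinement method of [60], and the delay profile and network are hypothetical.

```python
import heapq


def earliest_arrival(graph, origin, destination, depart):
    """Earliest arrival time in a time-dependent graph (FIFO edges assumed).

    graph: dict node -> list of (neighbor, delay_fn), where delay_fn(t) is the
           travel time of the edge when it is entered at time t (minutes)
    Returns the arrival time at destination, or None if it is unreachable.
    """
    arrival = {origin: depart}
    heap = [(depart, origin)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == destination:
            return t
        if t > arrival.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, delay_fn in graph.get(node, []):
            t_next = t + delay_fn(t)
            if t_next < arrival.get(nxt, float("inf")):
                arrival[nxt] = t_next
                heapq.heappush(heap, (t_next, nxt))
    return None


def peak_delay(t):
    # Hypothetical profile: 10 min in free flow, rising to 50 min at 08:30
    # (t = 510). The slope stays above -1, so the edge respects FIFO.
    return 10 + 40 * max(0, 1 - abs(t - 510) / 60)


graph = {
    "A": [("B", peak_delay), ("C", lambda t: 15)],
    "C": [("B", lambda t: 10)],
    "B": [],
}
off_peak = earliest_arrival(graph, "A", "B", depart=420)  # leave at 07:00
peak = earliest_arrival(graph, "A", "B", depart=480)      # leave at 08:00
```

Departing at 07:00 the direct edge is best, whereas at 08:00 the detour via C arrives earlier, so the optimal route depends on the departure time.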

Fig. 4 Examples of in-vehicle and out-of-vehicle guidance systems. (a) An in-vehicle navigator; (b) a traffic guidance screen, where a green line means low traffic volume, a yellow line means moderate traffic volume, and a red line means traffic congestion (color figure online)

The simple static shortest-path problem can be solved by aggregating queries evaluated with average travel times. However, real-world traffic systems are ever-changing and full of uncertainty; therefore, simple aggregates often fail to adapt to changing traffic. Probabilistic path queries have been incorporated [61] to capture this uncertainty with the P* algorithm, a best-first search method based on a hierarchical partition tree index and heuristic evaluation functions. The algorithm performed well in dealing with the uncertainty of real-world traffic networks by incorporating probability theory into the graph.

Meanwhile, the related problem of finding the traffic-aware fastest path (TAFP query) has also attracted researchers' attention [62, 63]. The TAFP query algorithm aims to find the fastest path to a destination and guarantee that moving objects can arrive in time. Although the queries are mainly motivated by indoor applications, they are also applicable in outdoor contexts, such as traffic networks. Shang et al. [64] constructed a traffic-aware spatial network by analyzing uncertain trajectories of moving objects and verified the performance of the queries by extensive experiments on real and synthetic spatial data.

Inspired by the different preferences of travelers, optimal route planning algorithms also consider objectives other than the shortest travel time; eco-routing [65] is one popular topic. To reduce carbon emissions and the other environmental footprint of traveling, eco-routing quantifies the environmental impact of vehicles as they travel in a traffic network. An eco-weight assignment is proposed in the Ecomark 2.0 system; the framework analyzes the actual fuel consumption, impact model calibration, and utility of the impact models in assigning eco-weights.

With the rapid development of mobile apps like Uber, more specific demands, such as taxi or carpooling distribution systems, have emerged [66]. These systems provide taxi drivers with locations where they can pick up passengers more quickly and maximize profit, and recommend to users positions where they can easily find vacant taxis or vehicles. The efficient distribution provided by these systems has improved the public usage of taxis and vehicles, thereby alleviating traffic loads in urban networks.

In Uber, two kinds of models are applied for passengers [67], and the interests of both passengers and taxi drivers are modeled. For non-sharing tours (passengers who do not want to share a taxi or carpool), a stable matching approach is used. It can deal with unequal numbers of passenger requests and taxis by matching them to dummy partners within the required response time constraints. Stable matchings are proved to exist with dummy partners, considering both the shortest response time and the most convenient vehicles. For sharing taxi dispatches (multiple passengers can share a taxi or a carpool), passenger requests can be packed by solving a maximum set packing problem.
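The stable matching idea for non-sharing dispatch can be sketched with the classic deferred-acceptance (Gale-Shapley) procedure. This is an illustrative sketch, not the dispatch model of [67]: the preference lists are hypothetical (in practice they might be derived from response time), and a passenger left without a taxi is matched to `None`, which plays the role of a dummy partner.

```python
def stable_match(passenger_prefs, taxi_prefs):
    """Passenger-proposing deferred acceptance with unequal side sizes.

    passenger_prefs: passenger -> list of taxis, most preferred first
    taxi_prefs:      taxi -> list of passengers, most preferred first
    Returns dict passenger -> taxi, or None for a dummy partner.
    """
    rank = {t: {p: i for i, p in enumerate(prefs)} for t, prefs in taxi_prefs.items()}
    next_choice = {p: 0 for p in passenger_prefs}
    engaged = {}  # taxi -> passenger currently holding it
    free = list(passenger_prefs)
    while free:
        p = free.pop()
        prefs = passenger_prefs[p]
        if next_choice[p] >= len(prefs):
            continue  # list exhausted: p stays with the dummy partner
        t = prefs[next_choice[p]]
        next_choice[p] += 1
        if p not in rank[t]:
            free.append(p)  # taxi t does not accept p at all
            continue
        holder = engaged.get(t)
        if holder is None:
            engaged[t] = p
        elif rank[t][p] < rank[t][holder]:
            engaged[t] = p
            free.append(holder)  # previous holder must propose again
        else:
            free.append(p)
    match = {p: None for p in passenger_prefs}
    for t, p in engaged.items():
        match[p] = t
    return match


# Hypothetical example: three requests compete for two taxis.
passenger_prefs = {"p1": ["t1", "t2"], "p2": ["t1", "t2"], "p3": ["t1", "t2"]}
taxi_prefs = {"t1": ["p2", "p1", "p3"], "t2": ["p1", "p3", "p2"]}
match = stable_match(passenger_prefs, taxi_prefs)
```

No passenger and taxi in the result would both prefer each other over their assigned partners, which is the stability property the matching approach aims for.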

Broadly speaking, intelligent signal control systems also belong to traffic guidance systems, since they control the distribution of traffic flow based on real-time traffic network analysis. Efficient advanced signal control systems, such as SCOOT and SCATS, normally coordinate traffic networks using a multi-level hierarchical control mechanism based on both traffic network topology and real-time traffic information analysis [68]. Traffic network node degree, the connection between nodes, and peak-hour traffic flows can be selected as indexes for node importance assessment in advanced signal control, and c-means clustering can be applied to identify regional traffic control nodes [69].

2.5 Human mobility analysis and urban planning

To date, the popularity of social networks (Facebook, WeChat, etc.), especially of location-based social networks, brings a wealth of media data and time-dependent location data sources to traffic network analysis. The human movement patterns hidden in these data sources constitute the fundamental demand of urban traffic networks, which is a crucial issue for traffic management [70].

Big data represents new opportunities to measure traffic demand, diagnose problems, and plan. However, integrating these new data sources into existing urban and transportation planning is a challenging problem. Attempts have been made to tackle this challenge. For example, with the increasing availability of human-tracking data in modern traffic systems, travelers' movement paths around public transport networks can be anticipated based on historical trajectory data or public transportation IC card records. Shang et al. [64] developed an efficient algorithm based on network expansion to find unobstructed routes in public transport networks. The congested stations are first predicted; then, an optimal route is provided that avoids these obstructions to increase traveling efficiency. Extensive experiments proved the efficiency and effectiveness of the proposed method and also pointed to another important application field of spatiotemporal traffic network analysis: providing recommendations for human mobility, which, in the long run, may further influence traffic demand or even shape the future development of the city.

In real applications, great progress has been made in processing new data sources. For example, call detail records (CDRs) from mobile phones, in conjunction with geospatial data, census records, or surveys, can be used to analyze and generate an urban OD demand matrix [71]. A case study was conducted in the Kansas Metro Corridor to analyze the feasibility of using cell phone data to track cross-region traffic activities and to derive the OD travel demand and commuting traffic data [72]. Estimation methods based on triangulated mobile phone records of millions of anonymized users have also been proposed, which aggregate individuals' daily trips by census tract [73].
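The aggregation of mobile phone records into an OD demand matrix can be sketched as follows. This is a deliberately simplified illustration: each change of zone between a user's consecutive records is counted as one trip, whereas the cited studies apply further steps (such as stay detection and expansion with census data) that are not reproduced here. The record layout and the tower-to-zone mapping are assumptions.

```python
from collections import Counter, defaultdict


def od_matrix(records, tower_zone):
    """Count zone-to-zone movements from call detail records.

    records:    iterable of (user_id, timestamp, tower_id)
    tower_zone: dict mapping tower_id -> zone identifier
    Returns a Counter mapping (origin_zone, destination_zone) -> trip count.
    """
    visits_by_user = defaultdict(list)
    for user, timestamp, tower in records:
        visits_by_user[user].append((timestamp, tower_zone[tower]))

    od = Counter()
    for visits in visits_by_user.values():
        visits.sort()  # chronological order per user
        for (_, origin), (_, destination) in zip(visits, visits[1:]):
            if origin != destination:  # moves within a zone are not trips
                od[(origin, destination)] += 1
    return od


# Hypothetical records: timestamps are hours of the day.
tower_zone = {"T1": "home", "T2": "cbd", "T3": "cbd"}
records = [
    ("u1", 7, "T1"), ("u1", 9, "T2"), ("u1", 10, "T3"), ("u1", 18, "T1"),
    ("u2", 8, "T1"), ("u2", 9, "T3"),
]
od = od_matrix(records, tower_zone)
```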

A framework that discovers regions of different functions (DRoF) in a city using both human mobility between regions and POIs in a region has also been proposed [74]. Human mobility data, extracted from taxi trajectory data and public transportation system entrance and exit information, were summarized using spatiotemporal traffic network analysis to identify human mobility patterns that can distinguish the popularity of POIs of the same class and reveal the function of a region. A better understanding of region function can further help the development of traffic management systems and establish future urban and transportation planning frameworks. For example, more roads should be planned to connect commercial and residential regions (routes that experience greater commuting traffic demand, especially during morning and evening rush hours). If the city wants to avoid heavy commuting traffic, then urban planning should arrange more residential areas around existing identified commercial areas; shorter average commuting trips will lead to lighter commuting traffic demands.

Interest in human movement has developed further [75] and shifted from the analysis of raw human movement data to more application-oriented ways of analyzing segments of movement. Not only human movement but also the reasons behind movement trajectories are studied by enriching trajectories with semantic information and providing information about their related contextual data. Therefore, the behavioral patterns of the moving objects can be extracted as semantic behavior. A semantic behavior [76] is a trajectory behavior whose predicate bears on some contextual data and possibly on some spatial and/or temporal data, either for individual trajectories or for groups of trajectory movements.

It is worth mentioning that, although human mobility analysis is now possible thanks to the abundance of human mobility data, whenever trajectory positions or POI data refer to the personal sphere of individuals, e.g., where individuals live or work or the places they visit, they constitute personal information and should be protected against unauthorized disclosure. Therefore, information security should be carefully considered during the process.

Available human mobility data show new opportunities to measure traffic demand and can be further integrated into urban transportation planning to solve traffic problems from more strategic and top-level perspectives, which requires more effort for future development.

3 Data acquisition and preprocessing

Sources of related traffic data can be classified into four major categories:

(1) Location-based data are usually gathered from traffic surveillance systems, e.g., road cameras, inductive loop detectors, and wireless sensor networks [77] in urban traffic networks. Location-based data measure specific time-varying traffic factors (such as traffic volume, average traveling speed, traffic density, and traffic status) at points in traffic networks, such as sections, intersections, or areas of interest. If the data are collected from road cameras, further image preprocessing is necessary to convert the raw data to the required format [78]. The data collected may have various formats, for example, image data from cameras, stream or spatiotemporal data from detectors, and point data from road network maps.

Point of interest (POI) data [79] represent locations that are worth visiting because most people find them interesting or because they provide some type of service, such as a restaurant or shop. Spatially detailed, volunteered geographic information (VGI), together with modern techniques, offers new possibilities for representing activity hot spots in urban areas. Rapidly developing location-based social networks (LBSNs) have led to increasing research interest in analyzing POI data. For example, POI recommendation [80] recommends places that users have not visited before but might be interested in, by estimating social, behavioral, and psychological factors and mining historical location-based POI data. Combined with other data such as land use, transportation, and environmental resources, POI data can also be used as an alternative means of exploring economic activities at the local level, as studied in [81].

(2) Vehicle-based data are collected and archived in our daily lives with the help of GPS-equipped vehicles (such as taxis, probe cars, and public buses). For instance, many taxis in urban area have GPS sensors to give them efficient route guidance information, while recording their trajectory data in the background [82]. The trajectory data usually constitute series of location information with corresponding time stamps, which can be a fundamental source for traffic management [54], location recommendations [57], and transportation network improvements. Moreover, speed information can also be collected by sending probe vehicles into traffic networks [82], providing vital information for travel time estimation.

(3) Human-based data: with the development of mobile devices, users' real-world movements are recorded in the form of spatiotemporal data streams, both intentionally and unintentionally.

Intentional Data: Travelers log their travel routes on Google maps to plan journeys. Uber users input their OD pairs for taxi or carpooling services; all the above constitute basic data sources for urban travel demand analysis.

Unintentional Data: A user carrying a mobile phone unintentionally generates a sequence of locations (which can be derived from telecom tower triangulation) with corresponding times, which constitutes trajectory data. Additionally, Twitter posts with position stamps uploaded by witnesses of an accident or emergency event may be useful for triggering accident emergency schemes in traffic management systems [83].

(4) Supplementary data can be collected from various sources, for instance, weather data from weather report websites, which has a spatiotemporal format; road construction data from road administration centers, comprising location and duration data; and dynamic traffic control data, such as temporary closures to traffic during certain periods, from traffic management systems, which are also location and duration data [84].

As a wider variety of data becomes available for spatiotemporal traffic network analysis, more data preprocessing is required. In this section, we introduce four basic techniques to process a spatiotemporal traffic network database before starting concrete mining tasks, including noise filtering, compression, segmentation, and heterogeneous data management.

3.1 Noise filtering

In real-time traffic systems, data are naturally noisy, which may be caused by equipment failures, incomplete or missing data, typos, or other unknown reasons. To improve modeling accuracy, noise reduction is needed before starting a mining task. Existing methods fall into two major categories.

(1) Data smoothing and inference methods: The most common method in this category is Kalman filtering (KF) [85], which has been used for real-time traffic estimation with an extended Kalman filter (EKF) applied to a flow model [86]. In its ensemble form, KF is a sequential data assimilation method using Monte Carlo or other ensemble integrations. By integrating an ensemble of a model's previous states, the mean and error covariances can be computed to capture the distribution of the data and reduce noise. In addition, the problem of missing data caused by, for example, a lack of sufficient traffic sensors can be addressed using KF methods that automatically complete data records using the estimated mean and covariances [87]. This process is also known as data fusion and is discussed further in Sect. 3.4. Similarly, support vector regression (SVR) can be used to recombine data from adjacent sections to reduce noise and compress the dataset [88]. A fuzzy inference system has also been applied to identify potential noise data points and replace them with proper fuzzy values, thus smoothing the whole dataset [89].

(2) Outlier elimination methods: Rather than replacing noisy data with estimated values, outlier elimination methods identify outliers and delete them from the data source. For example, a fuzzy clustering method was applied to identify and delete irregular data from traffic systems [90]. EMD wavelet transformation has also been applied to remove high-frequency components, which contain much of the noise, from the traffic stream. Wavelet transformation is suitable for nonstationary and nonlinear data analysis.
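As a minimal sketch of the smoothing idea in category (1), the scalar Kalman filter below assumes a random-walk model for a single detector's speed readings. The function name `kalman_smooth`, the noise variances `q` and `r`, and the use of `None` to mark missing samples are illustrative assumptions and are not taken from [85, 86, 87].

```python
def kalman_smooth(readings, q=1.0, r=25.0):
    """Scalar Kalman filter with a random-walk state model.

    readings: list of floats, with None marking a missing sample.
    q: assumed process-noise variance; r: assumed measurement-noise variance.
    Returns the filtered estimate for every time step.
    """
    # Initialise the state from the first available reading.
    first = next(v for v in readings if v is not None)
    x, p = first, r
    estimates = []
    for z in readings:
        # Predict: the state is assumed to persist, uncertainty grows by q.
        p = p + q
        if z is not None:
            # Update: blend prediction and measurement using the Kalman gain.
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        # For a missing sample, the prediction itself fills the gap.
        estimates.append(x)
    return estimates


speeds = [50.0, 52.0, None, 90.0, 51.0, 49.0]  # 90.0 is a noisy spike
filtered = kalman_smooth(speeds)
```

In this sketch the missing third sample is completed from the prediction, and the spike at 90.0 is damped rather than deleted, which is the difference from the outlier elimination methods in category (2).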

3.2 Data compression

Managing and summarizing large-scale datasets is a basic challenge in urban traffic network analysis. Huge datasets make it difficult to extract useful patterns efficiently; furthermore, newly available data sources bring new research opportunities as well as requirements for proper image and text compression technologies. Finally, many applications do not require such precise data; thus, data can be compressed for more efficient performance without sacrificing too much accuracy.

As many applications require processing data in a timely fashion, a series of compression techniques have been proposed. There are four major categories of compression methods.

(1) Temporal compression: Representative methods are window-based algorithms, such as the sliding window algorithm [91], which summarizes necessary information using a simple average in a sliding time window. The search window algorithm calculates the proper search window size as follows to reduce computational effort while maintaining prediction accuracy [92]:

$$\begin{aligned} Search\_window\_size = nearest\_size(\frac{C}{{{V_{av}}}}) \times D.I \ge {D_{\min }} \end{aligned}$$
(1)

where C is a constant; \(V_{av}\) denotes the average speed on the road section; and D.I represents the detector's data collection interval, which has a minimum value of \(D_{\min}\). Although the search window size is automatically calculated, C and \(D_{\min}\) should be heuristically defined based on the problem domain.
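The following sketch illustrates one possible reading of Eq. (1) together with a simple sliding-window average. It assumes that nearest_size rounds \(C/V_{av}\) to a whole number of collection intervals (at least one) and that the constraint requires the collection interval to be no smaller than \(D_{\min}\); the function names and the sample values are hypothetical.

```python
def search_window_size(c, v_av, interval, d_min):
    """Window length in the same time unit as `interval` (e.g. seconds)."""
    if interval < d_min:
        raise ValueError("collection interval is below D_min")
    # Assumption: nearest_size() rounds to a whole number of intervals, at least one.
    n_intervals = max(1, round(c / v_av))
    return n_intervals * interval


def sliding_average(values, n):
    """Simple average over a trailing window of up to n samples."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - n + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


size = search_window_size(c=120.0, v_av=40.0, interval=30.0, d_min=30.0)
smoothed = sliding_average([10, 20, 30, 40], n=2)
```

Under this reading, slower traffic (smaller \(V_{av}\)) yields a longer search window, so more history is summarized when conditions change slowly.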

(2) Spatial compression: Methods have been proposed to discretize spatial data into different dimension levels, such as section, subregional, or regional levels, using a hierarchy [27, 36] as demonstrated in Fig. 5. The granularity of the analysis process can be adjusted flexibly for different application requirements; thus, analysis methods need not process the most detailed traffic network at all times. Instead, they can first locate the interest areas based on approximate data from compressed regional-level datasets and then conduct detailed analysis in the corresponding section-level dataset.

Fig. 5 An example of spatial compression where traffic data have been discretized to different hierarchy levels

(3) Dimension reduction methods: Dimensionality reduction and feature extraction are also popular in data compression, for example PCA [25, 93], AFDA [24], and apriori-based approaches [48]. Well known for its capability to select major features, PCA projects the original data vector onto an orthogonal linear space in which the k-th largest variance of the projected data lies on the k-th coordinate, as shown in Eq. (2):

$$\begin{aligned} \mathbf {y} = W\mathbf {x} + \mathbf {\mu }\end{aligned}$$
(2)

where for each data vector \(\mathbf {x}\), there exists a corresponding \(\mathbf {y} \in {R^n}\), with \(\mathbf {\mu }\in {R^{n \times 1}}\) and \( W \in {R^{n \times n}}\); \(\mathbf {\mu }\) allows a mean shift, and R represents a projection space with n dimensions. The main component factors of the object vector can be extracted by sorting the sum of the correlation covariances. Then, the first few principal components are retained in the transposed correlation coefficient matrix, which contains the most valuable information in a more compact structure.
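To make the projection concrete, the sketch below extracts only the leading principal component of a small two-variable dataset using power iteration on the sample covariance matrix. It is an illustrative simplification: the data are centred first, which corresponds to choosing \(\mathbf {\mu } = -W\bar{\mathbf {x}}\) in Eq. (2), and the (flow, occupancy) values and function names are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (mean, unit direction) of the leading principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    w = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return mean, w


def project(row, mean, w):
    """Score of one observation on the leading component."""
    return sum((row[j] - mean[j]) * w[j] for j in range(len(row)))


# Hypothetical (flow, occupancy) pairs from one detector.
data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
mean, w = first_principal_component(data)
scores = [project(r, mean, w) for r in data]
```

Each two-dimensional observation is replaced by a single score, which is the compression step; keeping more components would require deflation or a full eigendecomposition.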

In contrast, AFDA uses functional component analysis, which is usually applied to sparse longitudinal data, and is suitable for solving the sparse traffic data problem. The proposed methods calculate the covariance structure and mean function of vehicle trajectories and estimate the variance of measurement errors. Then, the eigenbasis functions of the data are calculated and the functional component scores can be used for further data compression.

In addition, since most pattern mining methods are only interested in frequent events, some infrequent attributes can be deleted from the database to compress the dataset (note that these methods cannot be used in anomaly detection applications). Based on the Apriori principle [48], frequent sets of components in a dataset can only be obtained by combining corresponding frequent subsets; thus, we only need to retrieve frequent subsets of components for data processing. Intuitively, only frequent event data and all their frequent combination data will remain in datasets for pattern mining. For example, when studying events of interest, such as traffic congestion, if a section rarely has any traffic congestion, we can delete it from datasets for congestion pattern mining. In conventional pattern mining, full combinations of all sections are examined, which is very time consuming. With apriori-based data compression, infrequent sections are deleted in preprocessing. Consequently, the number of possible section sets that need further examination can be reduced exponentially.
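The pruning described above can be sketched as follows, under the assumption that each day is summarized as the set of sections that were congested on that day. The function name, the `min_support` threshold, and the toy data are hypothetical.

```python
from itertools import combinations


def frequent_section_sets(days, min_support, max_size=2):
    """days: list of sets, each holding the sections congested on that day."""
    n = len(days)
    # Level 1: keep only sections congested often enough on their own.
    counts = {}
    for day in days:
        for s in day:
            counts[s] = counts.get(s, 0) + 1
    frequent = {frozenset([s]) for s, c in counts.items() if c / n >= min_support}
    result = set(frequent)
    kept = sorted(s for fs in frequent for s in fs)
    # Higher levels: candidates are built only from surviving sections, and a
    # candidate is counted only if all of its (k-1)-subsets are frequent.
    for k in range(2, max_size + 1):
        level = set()
        for combo in combinations(kept, k):
            if all(frozenset(sub) in result for sub in combinations(combo, k - 1)):
                support = sum(1 for day in days if set(combo) <= day) / n
                if support >= min_support:
                    level.add(frozenset(combo))
        result |= level
    return result


days = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"A", "B"}]
frequent = frequent_section_sets(days, min_support=0.5)
```

Here section C is congested on only one of four days, so it is dropped at the first level and no combination containing C is ever counted.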

(4) Encoding methods: Other compression methods try to delete redundant codes from a database by converting the data format to more efficient encoding systems. Song et al. [94] use Huffman coding to represent frequently traveled paths with shorter codes, thereby saving storage. A dictionary-based compression method was applied in [36] to represent the database in a more compressed way, which saves both storage and calculation time. A pattern table is necessary to compress and encode the features of the database. Given the usage, the optimal length of the codes to encode the pattern can be calculated based on optimal prefix codes. Then, the best pattern to encode the database is determined by the minimum description length (MDL) principle. Wavelet transforms [95] can also be implemented to encode stream data in a more compact form with less noise.
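As a simplified illustration of the Huffman idea mentioned above (not the specific scheme of [94]), the sketch below assigns prefix-free bit strings to path identifiers so that frequently traveled paths receive shorter codes. The path identifiers and the trip log are hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Map each symbol to a prefix-free bit string; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical trip log: each entry is the identifier of a traveled path.
trips = ["P1"] * 6 + ["P2"] * 3 + ["P3"] * 1
codes = huffman_codes(trips)
encoded_bits = sum(len(codes[t]) for t in trips)
```

With these counts the most frequent path P1 receives a one-bit code, and the ten trips are stored in 14 bits instead of the 20 bits that a fixed two-bit code would need.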

3.3 Data segmentation

In many scenarios, such as clustering and classification, databases must be divided into additional segments for further processing. Segmentation not only reduces computational complexity, but also enables us to mine more specific patterns, such as day-to-day or regional traffic patterns, beyond what we can learn from an entire database. In general, there are three types of segmentation methods.

(1) Based on time intervals: Traffic demands vary with time, showing day-to-day fluctuations. Commuting demands peak at morning and evening rush hours and are minimal at midnight; thus, a method was proposed [96] to divide traffic data from each day and extract similarity patterns across daily databases. In [97], traffic data each day are divided into small segments, such as morning rush hours, morning nonrush hours, and evening rush hours. By comparing corresponding data from the same period every day, a time-of-day pattern can be generated for further traffic analysis.

(2) Based on space: Data can be divided into sub datasets by regions for regional pattern study or specific research objectives. For example, traffic accident analysis [98] is often performed on a small subset of traffic accidents, and accidents only influence contiguous areas; thus, only the region adjacent to the accident should be analyzed. Therefore, it is necessary to divide the database using regional segmentation. As mentioned in Sect. 3.2, spatial compression methods [27, 36] must also divide the raw dataset into subregional segments and then discretize them into different hierarchy levels.

(3) Based on other factors: Factors such as weather also influence the spatiotemporal patterns of traffic networks. By dividing the database and archiving the segments by weather, the authors of [99] studied whether weather influences traffic patterns and estimated the capacity and speed reductions due to rain, snow, fog, cold, and wind by weather intensity level.

3.4 Heterogeneous data management

The top two concerns for heterogeneous traffic data management are data synchronization and fusion. As we know, traffic data are heterogeneous; therefore, spatiotemporal traffic data analysis involves various kinds of data in different systems, such as TCSs, ATSs, and GISs. Because data are distributed in different servers and locations, comprehensive traffic analysis requires communicating and exchanging data across systems. Multi-agent systems [26, 100] can be an efficient tool for solving the problem of data management in distributed environments. Each agent may oversee one type of system, and a coordination mechanism can refresh the real-time dynamic data, providing scalability while maintaining the autonomy and security of separate management systems.

As described in Sect. 3.1, data fusion techniques are also used to manage heterogeneous data. It is a challenge to integrate data with different characteristics, resolutions, measurements, and accuracy. The most widely used data fusion and assimilation techniques for traffic state estimation problems are KF and its variations (extended or unscented KF) [85, 86]. KF methods assume a Gaussian distribution, which may fail when data distributions are skewed or multi-modal. Therefore, particle filters are also applied for data fusion; they approximate Bayesian optimal estimates given sufficient samples. However, the result is no better than KF if there are insufficient particles. In [101], nudging data assimilation was used to drive model variables toward observations (available data samples). Nudging introduces a term proportional to the difference between the predicted and observed data values into the constitutive equation \(f(s,t,x) = 0\):

$$\begin{aligned} f(s,t,x) = - \lambda (t,s) \cdot (x - {x^o}) \end{aligned}$$
(3)

where s is space, t is time, x is the source data vector, and \(x^o\) is the observed values. The nudging factor \(\lambda \) decreases as the location shifts from the measurement location and time passes from the observation. As a result, \(\lambda \) drives the solution toward the observations. For example, when calculating an LWR partial differential equation in traffic database, the \(f(s,t,x)=0\) (original) state of the model is simply:

$$\begin{aligned} \frac{{\partial \rho (s,t)}}{{\partial t}} + \frac{{\partial q(\rho )}}{{\partial s}} = 0 \end{aligned}$$
(4)

where \(\rho (s,t)\) represents the data value at location s and time t, and q is the fundamental diagram of the data value. Therefore, based on (3), we get the following:

$$\begin{aligned} \frac{{\partial \rho (s,t)}}{{\partial t}} + \frac{{\partial q(\rho )}}{{\partial s}} = - \sum \limits _{j = 1}^J {\lambda (s - {s_j},t) \cdot \left( \rho ({s_j}(t),t) - {\rho ^o}({s_j}(t),t) \right) } \end{aligned}$$
(5)

where the summation combines the loop counts, which provide the measured values at the locations \(s_j(t)\). The nudging factor is flexible and becomes 0 outside the predefined influence area. For example, when considering traffic density, the nudging term is positive when the model underestimates the traffic density, thus acting as a source that adds vehicles to the modeled density, and negative when the model overestimates it, thus removing vehicles. Although nudging is easy to implement, it is not easy to find a proper nudging factor, and the factor is difficult to link to the objective variables in models with differential equations.
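The relaxation behavior of the right-hand side of Eq. (5) can be illustrated in isolation. The sketch below deliberately sets aside the flux term of Eq. (4) and considers a single location with one measurement, so it shows only how the nudging term pulls a model density toward an observed value; the explicit Euler step, the constant nudging factor `lam`, and the density values are assumptions made for illustration.

```python
def nudge(model_value, observed, lam, dt):
    """One explicit Euler step of d(rho)/dt = -lam * (rho - rho_observed)."""
    return model_value - dt * lam * (model_value - observed)


rho = 20.0        # model density in veh/km (hypothetical)
rho_obs = 35.0    # measured density at the same location (hypothetical)
history = [rho]
for _ in range(50):
    rho = nudge(rho, rho_obs, lam=0.2, dt=0.5)
    history.append(rho)
```

Because the model starts below the measurement, every step adds density, and the gap shrinks by a constant factor of 1 - lam * dt per step. A larger nudging factor closes the gap faster but lets measurement noise enter the model state more directly, which is one reason the factor is hard to choose.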

4 Methodology

Spatiotemporal database systems organize large amounts of multi-dimensional data. Efficient analysis requires the methods to retrieve spatial and temporal relation patterns in real time. An increasing number of traffic applications manage real-world spatiotemporal databases. As a result, recent reports have shown an increase in research papers on managing spatiotemporal network analysis in traffic applications. Not only do the spatial aspects of objects change over time, but also the granularities of temporal factors depend on different spatial aspects. For example, sliding window size can be decided by the geographic level of the database: regional databases have larger windows, whereas section-level databases may use smaller ones.

It is important to identify spatial and temporal traffic patterns in urban network analysis applications. Research attempts have distinguished between freeways and urban networks due to the constraints imposed by signaling or other control measures, which influence the temporal patterns in traffic networks. However, even in freeway or single arterial cases, which are simpler to analyze, capturing the spatiotemporal patterns of traffic elements is an open problem.

Table 1 Artificial intelligence methodologies

Existing literature has largely used data from motorways and arterials and has employed univariate mathematical models to predict and analyze traffic networks [102]. At the urban network level, attempts to integrate spatiotemporal information into traffic network analysis algorithms require well-established models that accurately replicate traffic flow dynamics both spatially and temporally. There are five categories of methodology that can be used in spatiotemporal traffic network analysis: artificial intelligence (AI), inference, regression and data driven, model-based, and classification methodologies.

4.1 AI methodologies

Facing the challenge of jointly considering nonstationarity and nonlinearity in traffic analysis model selections [103], choosing the model with the best forecasting and analysis is essential for enhancing traffic management decision making.

AI techniques have attracted increasing attention in traffic management; their advantages lie in their interdisciplinary explanatory power, ability to engage human-like reasoning, and adaptivity to various problem domains. The most popular AI methods for solving spatiotemporal data analysis problems are artificial neural network (ANN) methods, as shown in Table 1. Inspired by human neural networks [104], ANNs are very sophisticated modeling techniques, capable of modeling extremely complex functions, especially when linear approximation is not valid, which is usually the case in urban traffic network systems. ANNs also help address the problem of dimensionality, which is often encountered when modeling nonlinear functions with many variables.

ANNs learn from historical data and training examples. The user gathers representative data and then invokes training algorithms to automatically learn the structure of the data. The node network of an ANN can serve as a good analog of the adjacency structure of traffic systems; thus, the temporal pattern of traffic flow can be discovered by analyzing surrounding areas as well as historical information [8]. Although some heuristic knowledge of selecting data and choosing an appropriate network structure is necessary, the level of user knowledge needed to successfully apply ANNs is much lower than would be the case using more traditional statistical methods. To date, interest has concentrated on hybrid ANNs in spatiotemporal traffic analysis. To optimize the hybrid analysis structure, various optimization techniques have been implemented, such as genetic, Bayesian, spectral basis transformation, clustering, EKF, and nature-inspired algorithms, as shown in Table 1.

Table 2 Inference methodologies

4.2 Inference methodologies

Calibration of transport models for spatiotemporal traffic systems also gives rise to a variety of statistical inference methods, such as estimation of travel demand parameters [68, 70]; fuzzy logic inferences [88]; Bayesian inferences [105]; and Markov chain inferences [4], as shown in Table 2. Once the ways in which vehicles move through the network are known, statistical inference becomes a straightforward solution.

Inference methods concentrate on deducing properties of an underlying distribution by analyzing sample datasets. Observed data are assumed to be sampled from a larger population; the spatiotemporal pattern or distribution probabilities obtained from the current dataset can be updated as more evidence becomes available. This feature makes them well adapted to the dynamic and online analysis of traffic management systems.

As shown in Table 2, the increasing use of GPS devices has led to an increasing availability of trajectory data. Researchers have devoted their efforts to developing analysis methods for trajectories [4, 36, 57]. Similar to probe vehicle data, trajectory data from GPS-equipped vehicles are incomplete and irregular [93]. Estimating the statistical distribution of source data using inference methods such as Markov chains, fuzzy logic, OD demand estimation, or game theory [47] provides a natural way to preprocess noisy data as well as capture hidden spatiotemporal patterns in data samples.
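As one small example of such inference, the sketch below estimates first-order Markov transition probabilities between discrete traffic states from an observed state sequence. The three state labels and the sample sequence are hypothetical, and a first-order chain is itself a modeling assumption.

```python
from collections import defaultdict


def transition_probabilities(states):
    """Maximum-likelihood estimate of first-order Markov transition probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    # Count every observed (current state, next state) pair.
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: c / total for nxt, c in row.items()}
    return probs


observed = ["free", "free", "busy", "jam", "busy", "free", "free", "busy"]
P = transition_probabilities(observed)
```

Because the estimate is built from counts, it can be refreshed as new observations arrive, which matches the online updating property of inference methods noted above.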

4.3 Regression and data-driven methodologies

Data fusion of traffic parameters is crucial in ITSs yet complicated to formulate mathematically. As discussed in Sect. 4.2, the development of data collection technologies has provided opportunities to acquire traffic data at different resolutions. Higher data resolution also inevitably brings datasets with more problems (noise, incompleteness, etc.) that require approaches to reduce noise and smooth the original datasets.

To improve data quality, several methods have been implemented, ranging from a simple moving average to regression-based methods, as shown in Table 3. For example, some road segments have missing or sparse probe vehicle data, which reduces accuracy. To tackle the sparse dataset problem [29, 44], data-driven methods use regression to calculate the traffic parameters of segments without sensor data by fusing information from adjacent time intervals and road segments. They make full use of the source data by dynamically selecting data from the neighboring areas with the highest correlation coefficients. Regression-based techniques (ARIMA [24], etc.) represent traffic patterns as equation matrices that use parameters from historical and adjacent areas as independent variables, thus handling spatiotemporal traffic analysis tasks.
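A minimal sketch of this correlation-driven fusion is given below. It assumes that a target segment has some historical speeds but no current reading, picks the neighboring segment whose history is most strongly correlated with the target, and applies a one-variable least-squares fit. The segment names and speed values are hypothetical, and the methods in [29, 44] are more elaborate.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    return my - b * mx, b


def impute(target_history, neighbor_histories, neighbor_now):
    """Estimate the target's current value from its best-correlated neighbor."""
    best = max(neighbor_histories,
               key=lambda k: abs(pearson(neighbor_histories[k], target_history)))
    a, b = fit_line(neighbor_histories[best], target_history)
    return best, a + b * neighbor_now[best]


target = [30.0, 40.0, 50.0, 60.0]
neighbors = {"upstream": [32.0, 41.0, 52.0, 61.0],
             "parallel": [55.0, 50.0, 58.0, 52.0]}
best, estimate = impute(target, neighbors, {"upstream": 45.0, "parallel": 54.0})
```

Re-running the selection whenever new history arrives gives the dynamic choice of neighbors described above.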

Table 3 Regression and data-driven methodologies

At the same time, the explosion of traffic data in metropolitan traffic networks and the concurrent advances in data management systems provide data-driven regression with new opportunities in practice. Regression methods have good prediction ability with limited errors and can solve the data fusion problem rapidly and efficiently.

Data-driven methods also provide suitable solutions for integrating data with different resolutions. In such situations, PCA [26] can provide an independent linear combination of the variables, thus reducing the dimensionality of heterogeneous datasets. Numerous nonparametric and nonlinear data-driven methods have been proposed to handle concurrent and dynamic traffic network analysis; thus, both temporal and spatial patterns can be discovered efficiently by data-driven methods [5, 39, 106]. However, the mathematical complexity of advanced data-driven models is an obstacle in practice. Because field staff often do not have deep knowledge of mathematically complex models, it is difficult for them to realize the potential of the models during application. Therefore, the expandability of these analysis methods should be improved. As discussed in [106], such practical considerations receive insufficient attention in many academically oriented studies.

4.4 Model-based methodologies

As the emerging trend shows that simple methods are insufficient to manage overwhelming database sizes and complex heterogeneous data structures, increasing attempts have been made to build sophisticated models to tackle spatiotemporal data analysis, as shown in Table 4.

KF models [32, 85, 107] are extended to include explanatory variables and cope with both multivariate time series and geographical information. Regular KF shows good static stability and adopts iterative methods for optimal traffic estimation without considering influencing factors. To consider spatial influences, the spatiotemporal characteristics of the traffic flow on urban traffic networks and other influence factors such as incident data [32] can be accounted for in flow rate prediction. Based on KF models, the structural state-space model [89] was proposed to estimate the OD traffic demand, thus giving essential support to dynamic traffic analysis systems. Features of day-to-day traffic demand evolution are captured as a state-space matrix with respect to actual links in the traffic network. KF models may not explicitly evaluate spatial influence; however, spatial factors are implicitly calculated when building the matrix structure of traffic networks.

CTM [38, 46] uses transmission among cells to mimic and predict traffic flow over time and space, especially the building, propagation, and dissipation of traffic queues. CTM captures the transfer flow volume between cells as a minimum of sending and receiving flows and uses cell-based density or flow as state variables. However, most CTMs focus on merging and dispatching traffic flows of urban freeways; thus, more optimization is necessary when dealing with urban traffic networks.
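The minimum-of-sending-and-receiving rule can be written as a single update step. The sketch below assumes a single link without merges or diverges, a trapezoidal flow-density relation, and free outflow at the downstream end; all parameter names and values are hypothetical.

```python
def ctm_step(density, dt, dx, v_free, w_back, rho_jam, q_max, inflow=0.0):
    """Advance cell densities (veh/km) by one time step of the cell transmission model."""
    n = len(density)
    # Sending (demand) and receiving (supply) flows of each cell, in veh/h.
    sending = [min(v_free * rho, q_max) for rho in density]
    receiving = [min(w_back * (rho_jam - rho), q_max) for rho in density]
    # flow[i] enters cell i; flow[n] leaves the last cell (free outflow assumed).
    flow = [min(inflow, receiving[0])]
    flow += [min(sending[i - 1], receiving[i]) for i in range(1, n)]
    flow.append(sending[-1])
    # Conservation of vehicles in every cell.
    return [density[i] + dt / dx * (flow[i] - flow[i + 1]) for i in range(n)]


# Hypothetical setup: 0.5 km cells and 10 s steps, so v_free * dt <= dx holds.
rho = [20.0, 20.0, 120.0, 20.0]
new_rho = ctm_step(rho, dt=10 / 3600, dx=0.5, v_free=100.0, w_back=20.0,
                   rho_jam=150.0, q_max=2000.0)
```

In this example the congested third cell can receive only a limited flow, so density builds up in the cell upstream of it while the congested cell itself discharges downstream, reproducing the queue building and dissipation described above.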

Table 4 Model-based methodologies

Functional data analysis (FDA) [25] converts the traffic data collected into groups of functions using statistical smoothing and clustering methods. Functional PCA associated with a functional dataset is defined by the same properties as in the multivariate case. PCA was implemented to find the main component from a data source, thus reducing dimensional complexity. Then, for each functional case, each principal component is specified by a principal component weight function defined over the range of the data. A graphic representation of the traffic network is analyzed and clustered to provide spatiotemporal patterns for analysis.

Gray system models [22] take each stochastic data sample variable as a gray quantity that changes within a given range or time. Gray relational space in gray system models can describe the spatial relationships between one main factor and all the other factors in a traffic system, while temporal features are captured by statistical regression. Gray system models are convenient to combine with other models, such as ANNs [109]. Kerner and Klenov proposed a deterministic speed adaptation microscopic model [108], in which free flow and synchronized traffic flow are assumed. The proposed model can qualitatively exhibit the empirical features of Section A (free flow) \(\rightarrow \) Section B (synchronized flow) \(\rightarrow \) Section C (traffic jam) transitions and spatiotemporal congested traffic patterns. The trend for hybrid models also emerges as traffic management must deal with increasingly complicated traffic networks [42, 96, 100] as shown in Table 4.

Table 5 Classification methodologies

4.5 Classification methodologies

Traditional spatiotemporal methods draw conclusions from day-to-day or time-of-day features of traffic datasets; however, there is a pressing need to develop systematic prediction approaches for traffic pattern identification. For example, the relationship between intersections and traffic flows of surrounding areas should be unveiled; the influence of incidents, events, and natural disasters to traffic flow should be clarified; and traffic demand patterns, which constantly evolve with the development of urban functional areas, should be further explored. The demand for these systematic prediction approaches makes classification methods an attractive choice, since they aim to explore internal correlations and features among events and items, as shown in Table 5.

Clustering [68] and anomaly detection are suitable for estimating data patterns, fusing sparse data, and identifying traffic network bottlenecks [68, 97, 98]. By revising the calculation mechanism for distance criteria, clustering methods can also be used to process text or document data from Twitter to identify important traffic incidents or social events that may influence traffic demand and urban traffic flow [84].

With association and rule mining methods [10, 33, 84], not only can historical rules be extracted from sequential pattern mining, but the correlations among social events and traffic patterns can also be obtained by spatiotemporal analysis. Apriori-based mechanisms [49] can improve database analysis efficiency. A typical temporal association rule has the format:

$$\begin{aligned} {\mathrm{Section\ A\ (congestion)}}_{(t = 0)} \rightarrow {\mathrm{Section\ B\ (congestion)}}_{(t=2)} \end{aligned}$$

As shown above, if section A has traffic congestion at time 0, then 2 time units later, section B also has traffic congestion. If this event sequence happens frequently enough, statistical criteria will be used to test the correlations between the events in this sequence. Only sequences that pass these tests are extracted as association rules. Temporal association rules are easy to use and interpret; thus, even without professional knowledge, traffic managers can effectively use the obtained association rules to guide management in field applications. However, correlations do not necessarily guarantee causality relationships; one event always happening before another does not automatically reflect that the first thing is the cause of the second. The explanatory ability of association rules must be improved by combining previous field knowledge.
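How often such a sequence occurs can be quantified with the usual support and confidence measures. The sketch below assumes that congestion in each section is recorded as one boolean per time unit; the function name and the sample series are hypothetical, and the statistical tests mentioned above would be applied on top of these counts.

```python
def rule_stats(events_a, events_b, lag):
    """Support and confidence of 'A congested at t -> B congested at t + lag'.

    events_a, events_b: lists of booleans, one per time unit.
    """
    n = len(events_a) - lag          # time units for which t + lag is observed
    antecedent = sum(1 for t in range(n) if events_a[t])
    both = sum(1 for t in range(n) if events_a[t] and events_b[t + lag])
    support = both / n
    confidence = both / antecedent if antecedent else 0.0
    return support, confidence


a = [True, False, False, True, False, False, True, False, False, False]
b = [False, False, True, False, False, True, False, False, False, False]
support, confidence = rule_stats(a, b, lag=2)
```

In this toy series, section A is congested three times and section B follows two time units later in two of those cases, giving a confidence of 2/3; as noted above, a high confidence still indicates correlation rather than causation.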

Decision trees [91] are effective models for inducing rules with a tree structure from traffic data. An optimal tree accounts for most of the data while minimizing the number of hierarchy levels. Unlike ANN methods, decision trees are a white-box model and are thus easy to interpret. However, when processing large databases such as traffic data, the tree structure can become convoluted. Furthermore, since the decision tree structure is constructed using information gain, it is biased in favor of categorical variables with more levels.

A support vector machine (SVM) [15] constructs a hyperplane or set of hyperplanes in a high-dimensional space for classification and regression. Proper hyperplanes are chosen based on statistical learning theory and the Vapnik–Chervonenkis (VC) dimension to separate the training points of different classes. After training, the established model can be used to estimate and predict future traffic. SVM approaches can deal with the local minimum problem of AI-based methods and are suitable for high-dimensional data with few samples. However, SVM models are sensitive to missing data; thus, data fusion preprocessing is necessary to maintain classification accuracy. As explained before, hybrid models are also increasingly used, as shown in Table 5, since the strengths of other methods can make up for deficiencies of the main model.

5 Evaluation of methods

It is both essential and difficult to evaluate the performance of analysis models effectively, since selection and comparison first require consistent standards and a comparable system of criteria. Various models are applied in different application fields and thus face quite different problem domains.

The most popular criterion for model performance is accuracy; however, different models interpret their accuracy rates differently, since they process data in different formats depending on the application domain and traffic management system. For example, for travel time data, accuracy is calculated based on the difference between the estimated and actual travel times, whereas traffic status analysis models first classify traffic data into different states, and if the predicted states and actual states coincide, then the accuracy is 100%.
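The two notions of accuracy described above can be contrasted in a short sketch. The mean absolute percentage error (MAPE) is used here as one common way to summarize the difference between estimated and actual travel times; it is an illustrative choice rather than a metric prescribed by the reviewed studies, and the sample values are hypothetical.

```python
def mape(estimated, actual):
    """Mean absolute percentage error for continuous outputs such as travel time."""
    return 100.0 * sum(abs(e - a) / a for e, a in zip(estimated, actual)) / len(actual)


def state_accuracy(predicted, actual):
    """Percentage of time steps whose predicted traffic state matches the actual one."""
    return 100.0 * sum(p == a for p, a in zip(predicted, actual)) / len(actual)


travel_error = mape([110.0, 95.0], [100.0, 100.0])
state_acc = state_accuracy(["free", "jam", "busy", "free"],
                           ["free", "jam", "jam", "free"])
```

The first measure is an error (lower is better) on a continuous quantity, while the second is a hit rate (higher is better) on discrete states, so the two numbers are not directly comparable across model types.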

Accuracy is of significant importance but should not be the sole determinant in selecting an appropriate methodology. Time and space complexity are also important for evaluating efficiency, especially considering that in most recent studies all decisions and feedback should be provided to travelers or users in real time. Evolutionary-based models [28, 106] have good robustness; however, when facing a dynamic traffic environment with fast-fluctuating data, their response time is longer than that of simple intuitive models such as regression and decision tree-based methods [91, 110]. Clustering methods [97, 98] are also useful in data preprocessing, since the vast amount of data can be clustered and summarized at different granularities, which saves storage space and working memory and thus decreases space complexity; however, the extra preprocessing step sometimes increases the overall time complexity.

Furthermore, other criteria are also important for evaluating model performance; for example, understandability is easily neglected. ANN-based methods are usually robust when dealing with noisy, nonlinear, and nonstationary problems, but the hidden-layer network structure of ANN methods is hard to interpret and provides limited meaningful information to model users; these methods thus offer good ease of use but poor understandability. Statistical analysis methods carefully consider nonlinear, nonstationary, and noisy problems and are logically rigorous, but their assumptions are often violated in practical urban traffic networks; therefore, their ease of use is limited. Inference and AI-based methods are easy to understand and implement, though their causal reasoning is still controversial [106].

Efforts have been made to compare the performance of the applied models in each application direction. For example, in time series traffic prediction [111], where the basic data types are traffic density and travel time, accuracy is evaluated by the difference between the estimated and actual travel times. Speed measurement [112] is also popular in prediction systems with trajectory data, since speed reflects the congestion condition of the target section. In traffic anomaly detection applications [33, 36], by contrast, more attention has been paid to the understandability of the output in addition to prediction accuracy, so that valuable suggestions can be inferred and provided to policy makers and traffic network planners.

On the other hand, traffic guidance systems [51, 53] emphasize the timeliness of responses; thus, time complexity is one of the most important criteria here. For some traffic guidance systems, such as traffic inducement [113], estimating the traffic state (low/middle/high traffic volume) of each section is accurate enough and calculating the actual travel time is not required; therefore, the inducement rate becomes the important performance measure. In bottleneck analysis of traffic networks [41], the interpretation of a bottleneck is more important than simply counting the number of detected bottlenecks. How to resolve a bottleneck without creating new ones is beyond the scope of this paper. Finally, for the new research hot spots, various criteria have been proposed for model evaluation, e.g., the rating and check-in rate of POI recommendation systems [80] and the matching rate, response time, and satisfaction rate of taxi routing and matching algorithms [54]. Discovering regions of different functions (DRoF) in a city, using both human mobility between regions and POI data within a region, has also been proposed to identify human mobility patterns and reveal the function of a region. A better understanding of region functions can further help the development of traffic management systems, for example by advising where to build a new road to meet transportation demand. Such a human mobility model is hard to evaluate without waiting for years to track the consequences after the road is constructed.

In conclusion, since most spatiotemporal traffic analysis models depend heavily on the system environment in which they are applied, their data formats, processing units, and analysis objectives differ according to their application domains. Not only are the data they deal with heterogeneous, multi-dimensional, and complex, but the algorithms themselves are also heterogeneous, multi-dimensional, and complex. As a result, they should be evaluated based on the concrete application purpose and context.

6 Challenges and opportunities

6.1 Heterogeneity

Traditional analysis approaches are often based on single data sources: short-term prediction models focus on analyzing traffic volume or travel time data, and anomaly detection is mainly based on trajectory or traffic flow data. As technologies in transportation, information, and communications have rapidly advanced in recent decades, managing heterogeneous and globally distributed traffic data has become the first challenge for analysis systems. Raw data may be collected by low-level detectors belonging to different management systems and thus need specific preprocessing for data cleaning, fusion, and aggregation before further analysis. For example, when we need to analyze the influence of certain traffic incidents, traffic flow data should be collected from TCSs, position data from GISs, and incident data from ATSs.

In practice, these data are stored on different servers and in different locations, with different formats, spatiotemporal resolutions, and operating systems. In many applications of urban traffic network analysis, treating heterogeneous data with different features equally is unsatisfactory.

Extracting consistent, meaningful, and accurate information from heterogeneous traffic management systems requires not only corresponding data fusion methods at different fusion levels [114], but also efficient coordination and workload allocation mechanisms, especially for urban traffic networks, which require online traffic analysis and management. Heterogeneous and globally distributed databases suggest a multi-agent system [3] with proper cooperation and management coordination. More effort is necessary to develop efficient algorithms that can be implemented and used in central traffic data archiving and processing facilities to process heterogeneous data.

6.2 New data sources, more applications, and emerging opportunity

As discussed in Sect. 2, with the rapid development of in-vehicle navigation systems, mobile apps like Uber, Google Maps, mobile navigation, and taxi recommendation systems, data generated from vehicle and human mobility have offered abundant new sources for urban spatiotemporal traffic analysis. Formerly unavailable trajectory data for human mobility and additional vehicle trajectory data are now attracting increasing attention and effort to analyze the spatiotemporal correlations among these data at different granularities. The results of these studies can be applied both in directly related fields, such as traffic management, and in indirectly related fields, such as urban functional area identification, POI detection, and city planning.

Additionally, the popularity of location-based social networks brings a wealth of media data and time-dependent location data sources. These data may seem irrelevant to spatiotemporal traffic analysis at first glance; however, they can be applied to analyze and generate urban OD demand patterns, which are essential to traffic demand estimation when predicting future traffic scenarios. Therefore, the spatiotemporal analysis of traffic networks can not only consider historical information, but also explore the causal relationship between demand and supply in transportation systems.

New perspectives are now available for deeper analysis. The continuous gathering of crowds induces traffic demand; thus, human mobility data shed light on the sources of traffic demand, which may help us to discover the equilibrium status of traffic systems considering real-time supply (currently available urban traffic networks) and demand. Advances in technologies bring new perspectives and plenty of data sources, which help to explore more applications of spatiotemporal traffic analysis models, thus providing new research opportunities.

6.3 More sophisticated criteria systems

Interest has focused on selecting the model that provides accurate and meaningful information based on the collected datasets, without considering underlying characteristics of traffic data; however, it is essential to effectively evaluate the performance of analysis models before selection and comparison. Most studies have emphasized discussions of findings with little attention paid to the overall quality of the proposed models.

As discussed in Sect. 5, accuracy is of significant importance, but should not be the sole determinant in selecting an appropriate methodology. More sophisticated criteria systems could be used to consistently and fairly evaluate analysis models, considering accuracy, time and space complexity, ease of use, understandability, goodness of fit, and explanatory power in a given problem domain.
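One simple way to combine several criteria into a single comparison is a weighted sum of normalized scores, sketched below in Python. This is only one possible aggregation and is not a criteria system proposed in the surveyed literature; the model names, scores, and weights are invented placeholders, and each score is assumed to be already normalized to [0, 1] with higher values being better.

```python
def weighted_score(scores, weights):
    """Weighted average of normalized criterion scores for one model."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def rank_models(candidates, weights):
    """Return (model, score) pairs sorted from best to worst."""
    scored = [(name, weighted_score(s, weights)) for name, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder scores for two hypothetical models (not measured results).
candidates = {
    "model_a": {"accuracy": 0.9, "speed": 0.4, "understandability": 0.2},
    "model_b": {"accuracy": 0.7, "speed": 0.9, "understandability": 0.9},
}
# Hypothetical weights for a real-time guidance setting.
realtime_weights = {"accuracy": 0.4, "speed": 0.4, "understandability": 0.2}
ranking = rank_models(candidates, realtime_weights)
```

Changing the weights to suit a different problem domain can reverse the ranking, which reflects the point that model evaluation depends on the application context.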

6.4 Trade-off for model hybridization

To confront the increasingly complicated decision-making needs of urban traffic network analysis, a heuristic approach that combines multiple models becomes a practical alternative. As discussed in Sect. 4, each group of methodologies has its own merits and demerits. Hybrid models generally absorb the advantages of each model and compensate for each model's weaknesses, thus resulting in overall improved performance.

However, as a growing number of approaches have adopted hybrid models, the boundary of hybridization must be clarified. For example, in which cases should hybrid models be used? Some researchers argue that hybridization should be done in multiple steps, dividing the analysis into stages that use different models. Others support the opposite, suggesting that it is better to use a deeply bound hybrid model for the entire process. Moreover, hybrid models often result in more complicated analysis; how does this influence the time and space complexity of the model? Since it is possible for baseline models to outperform hybrid models in time and space complexity, we must not neglect this possibility when evaluating the proposed approaches. Modelers face a trade-off in deciding whether it is worth the effort to adopt hybrid models.

6.5 Multi-dimensional, sparse, and variable-resolution data management

Traffic analysis approaches are directly dependent on data collection systems and technologies. Data collected from various traffic management systems are inherently sparse in structure, huge in size, and high-dimensional. For example, models can analyze the trajectories of taxis to estimate urban travel demand. Taxi GPS trajectory data are huge, but at any given time point many road sections have no taxis on them. To estimate the travel demand pattern of these roads, analysis models must grapple with data sparsity. Data fusion and clustering models are popular solutions for estimating missing data [8, 69, 85].
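The sparsity problem in the taxi example can be made concrete with a minimal Python sketch that estimates a value for road sections without observations from their adjacent sections. Plain neighbor averaging is used here as a deliberately simple stand-in; it is not the data fusion or clustering methods of [8, 69, 85], and the section identifiers, adjacency lists, and speed values are hypothetical.

```python
def fill_from_neighbors(observed, neighbors):
    """Estimate unobserved road sections from the mean of observed neighbors.

    observed:  dict mapping section id -> measured value (e.g., mean taxi speed)
    neighbors: dict mapping section id -> list of adjacent section ids
    Sections with no observed neighbor are left out of the result.
    """
    filled = dict(observed)
    for section, adjacent in neighbors.items():
        if section in observed:
            continue  # keep real measurements untouched
        values = [observed[n] for n in adjacent if n in observed]
        if values:
            filled[section] = sum(values) / len(values)
    return filled


# Hypothetical time slice: no taxi passed sections B and D.
observed = {"A": 30.0, "C": 50.0}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
estimate = fill_from_neighbors(observed, neighbors)
```

Only measured values are used as inputs, so estimates are never propagated from other estimates, and sections with no observed neighbor (D here) remain missing rather than being given an arbitrary value.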

The high dimensionality of data also aggravates the problem of data sparseness. When the data reach a certain size, even a simple matrix decomposition algorithm becomes difficult to perform. Finally, many applications require data to be processed in a timely fashion but do not need extremely precise data sources; thus, the data could be compressed for more efficient performance without sacrificing too much accuracy.

Therefore, traffic data could be collected at or compressed to a variety of resolutions to match the needs of both traffic management and control applications. The higher the data resolution, the larger the proportion of noise and, consequently, the more tedious data preprocessing (data fusion, segmentation, and compression) is necessary to maintain the prediction accuracy of the proposed model. However, inadequate data resolution also results in poor data fusion and analysis model performance, owing to the lack of vital information. Equilibrium solutions should be found to explore potentially useful patterns by exploiting information with appropriate resolution and data source granularity.

All the above features result in increased time and space complexity in the proposed models. Further considering mass processing of spatiotemporal traffic data, efficiency becomes a challenge that must be tackled in spatiotemporal urban traffic network analysis.

7 Conclusions

As transportation, information, and communication technologies have rapidly advanced in recent decades, traffic analysis data (e.g., trajectories representing moving objects) have become increasingly available. The potential of these data has captured the interest of researchers, who have developed spatiotemporal analysis methodologies for traffic networks. However, an integrated view of solved problems and methods in urban traffic network analysis, as well as the applications based on solutions to these problems, has not been fully explored.

This paper revisited the many spatiotemporal urban traffic network analysis methods and their advances in recent decades. It reviewed the impressive literature for identifying applications that can be solved by analyzing traffic data and the heuristic approaches used. It further surveyed the applications that are based on solutions to these problems. The preprocessing of multi-dimensional, noisy, and sparse data sources was carefully discussed.

By surveying and analyzing groups of methods and their functions, we then identified and discussed the features and merits of different methodologies. This paper also synthesized and analyzed the challenges confronted and the potentially interesting opportunities among them. There are still interesting research directions that will continue to attract the attention of researchers in years to come.

The major contribution of this paper is in providing an integrated view of spatiotemporal urban traffic network problems, the methods used to address them, and the applications that exploit solutions to these problems. The classification of application problems can help researchers identify new problems to which a traffic analysis model could be effectively applied.

As most vehicles and people are now equipped with high-end technologies, users can send information (intentionally or unintentionally) through mobile networks (Wi-Fi, GPS, navigator systems, etc.), and traffic analysis is now undergoing rapid development and expansion. Advances in communication technologies enable vehicles and people to continuously update their location and corresponding time information. This abundance of data sources and the consequent demand for real-time management will revolutionize the way we think about traffic network analysis.

Additionally, integrating social network information further improves transportation system management. The applications of traffic data analysis go beyond the boundaries of transportation and extend to interdisciplinary fields such as city planning, urban function localization, and emergency management. The flows of people, vehicles, and logistics have been integrated inseparably as part of the smart city and urban network.

At this crossroads of chance and challenge, more sophisticated testbeds, test environments, and criteria are necessary for testing and comparing mechanisms. When confronting the more complicated requirements for processing data from various sources, considering the merits and disadvantages of different methodologies, hybrid models become an intuitive solution. The trade-off between baseline methods and performance sacrifices for model hybridization must also be determined appropriately. Moreover, researchers should respond to the problem of handling temporal and spatial complexity, especially for large datasets; treating spatial and temporal dimensions of data simultaneously and efficiently for online management systems requires further effort [1].