1 Introduction

Ride-hailing applications such as Didi, UCAR, and Uber have become prevalent choices for daily commutes. They aim to provide passengers with convenient ride services and to improve the efficiency of public transportation. To provide high-quality services and remain profitable, ride-hailing platforms need to understand passenger demand in real time, which helps avoid empty drives (i.e., driving without passengers). Therefore, instead of merely forecasting the number of passenger requests within a region, it is important to know the origin and destination of each trip, because the demand between two regions at different time slots also reveals useful mobility patterns. If Origin-Destination (OD) travel demand can be predicted in advance, popular destinations and travel routes can be provided to travel service providers, and vehicles can be scheduled ahead of time.

OD matrix prediction has received increasing attention because of its importance. Existing studies [1, 14, 21] predict the OD travel demand between all regions of a city by constructing an OD matrix. As shown in Fig. 1, we regard the urban area as a large rectangle and divide it into \(m \times n\) regions, so that the OD matrix in each time slot has dimension \(N \times N\) \((N=m \times n)\). The matrices are then arranged in temporal order to form an OD matrix sequence, where \(M_{i,j}^{(t)}\) represents the number of trips from region \(r_i\) to region \(r_j\) between time \(t-1\) and time \(t\). OD matrix prediction aims to predict the OD matrix of a future time slot based on the OD matrices of multiple historical time slots.
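The construction described above can be sketched in a few lines. This is a minimal illustration, assuming trips have already been mapped to grid cells; the row-major region indexing, the helper names `region_index` and `build_od_matrix`, and the toy trips are assumptions made for the example and are not taken from the paper.

```python
import numpy as np


def region_index(row, col, n_cols):
    """Flatten a (row, col) grid cell into one region index in [0, m * n)."""
    return row * n_cols + col


def build_od_matrix(trips, n_regions):
    """Count trips per (origin, destination) region pair within one time slot."""
    od = np.zeros((n_regions, n_regions), dtype=np.int64)
    for origin, dest in trips:
        od[origin, dest] += 1
    return od


# Toy data (not from the paper): a 2 x 2 grid, so N = 4 regions.
m, n = 2, 2
trips = [
    (region_index(0, 0, n), region_index(1, 1, n)),
    (region_index(0, 0, n), region_index(1, 1, n)),
    (region_index(0, 1, n), region_index(0, 0, n)),
    (region_index(1, 0, n), region_index(0, 0, n)),
]
M_t = build_od_matrix(trips, m * n)
print(M_t)
```

Stacking one such matrix per time slot along a leading axis gives the OD matrix sequence used as model input.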

Fig. 1. Spatial neighborhood and OD matrix.

Although recent works combine traffic flows to incorporate mobility patterns into OD matrix prediction, it remains a challenging problem for the following reasons:

(1) Directed semantics of travel demand. There is a directed semantic relationship between OD travel demands among urban regions. In the left diagram of Fig. 2, A is an office region, and B and C are residential regions. During the morning rush hour, there are similar travel demands from B to A and from C to A. There is more travel demand with A as the destination and less with A as the origin, which indicates that the semantic neighbors of A in the departure and arrival directions are different in the same period. The right diagram of Fig. 2 shows the 10-day OD demand curves between A and B, revealing a large taxi demand from B to A. At the same time, because region B is marginal, the taxi demand from A to B appears to be random. This non-periodic and non-stationary curve increases the difficulty of forecasting.

Fig. 2. OD spatial correlation (left) and OD flow time series (right) among different urban regions

(2) Various traffic contexts. Existing OD matrix prediction studies use the number of travel demands between regions but do not fully consider other information, such as the traffic context of the origin and destination regions. Traffic context includes static and dynamic traffic context information, which carry different contextual information. For example, static POI information reflects the function of a region, while the dynamic traffic context of a region, such as inflow and outflow traffic, reflects its dynamic traffic conditions.

To tackle these challenges, we propose an effective model named ODCRN, to collectively predict the OD matrix in the following time slot more accurately and more efficiently. The primary contributions of this paper can be summarized as follows:

(1) To capture the directional information of travel demand, we construct two OD matrices, one from the origin to the destination (forward) and one from the destination to the origin (backward), and capture the forward and backward information with a bi-directional graph diffusion convolution network. We then combine them to integrate the bi-directional semantic information.

(2) We comprehensively consider static and dynamic context information. POI information, which reflects regional functions, serves as static traffic context information by establishing a bootstrap to better represent origins and destinations, and traffic flow information between regions serves as dynamic traffic context information. We then capture their features through dynamic and static traffic context learning networks that collaborate with OD matrix prediction.

(3) Extensive experiments on two open datasets, TaxiNYC and TaxiCD, demonstrate that the proposed ODCRN model outperforms the baselines.

2 Preliminaries and Related Work

2.1 Definitions

We first introduce several basic concepts to formulate the OD matrix prediction problem. We assume the target city is partitioned into subareas as regions and regard them as graph nodes.

Definition 2-1 OD Matrix. We define an OD matrix at time slot t as a 2-dimensional tensor \(M_t\in \mathbb {R}^{N_l\times N_l}\), where \(N_l\) is the total number of regions in the city. It can also be written as \(M_t=\{m_{i,j},1\le i,j\le N_l\}\), where \(m_{i,j}\) is the traffic flow from region i to region j at time slot t. The OD matrices sequence is defined as a 3-dimensional tensor of size \(N_t\times N_l\times N_l\), where \(N_t\) is the total number of time slots of historical traffic data.

Definition 2-2 Semantic Neighbors. \(m_{i,j}\) represents the number of travels from region \(v_i\) to region \(v_j\). \(v_i\) and \(v_j\) are the OD semantic neighbors in this particular travel demand. \(v_i\) and \(v_j\) needn’t be geographically adjacent.

Definition 2-3 Static Traffic Context. We define the traffic graph of the city as \(G=\{V,A\}\), where \(V =[v_1,\cdots ,v_{N_l}]\) and \(\left| V\right| =N_l\). \(v_i\in V\) is a node on the traffic graph and represents a region in the city. The 2-dimensional tensor \(S\) represents the static traffic context of all regions, such as POIs and embedding features.

Definition 2-4 Dynamic Traffic Context. We define the dynamic traffic context of the regions as a 3-dimensional tensor \(X\in \mathbb {R}^{N_t\times N_l\times D_t}\), where \(D_t\) is the feature dimension of the dynamic traffic context. For example, when we use inflow and outflow as context features, \(D_t=2\).

2.2 Origin-Destination Matrix Prediction

For a given time slot t, the task takes as inputs the OD matrices of the past p time slots \([M_{t-p+1},\cdots ,M_t]\), the dynamic traffic contexts of every region for the past p time slots \([X_{t-p+1},\cdots ,X_t]\), the static traffic context \(S\) of every region, and the adjacency matrix \(A\) that represents the regions' geographic proximity, and predicts the OD matrix of the next time slot, \(M_{t+1}\).

Generally, OD matrix prediction approaches [2,3,4] can be divided into two categories: static approaches and dynamic approaches. Static approaches [5, 6] treat the traffic flow in each time slot as independent and ignore spatial-temporal information, predicting the OD matrix from the average travel demand over a long period of time. Dynamic approaches [7] take time-variant traffic flow information into consideration, so they can be used for traffic flow management and dynamic route guidance.

In recent years, an increasing number of neural network approaches, especially Graph Neural Networks (GNNs) [9], have been applied to OD matrix prediction tasks. Since urban transportation networks can be seen as graph structures, GNNs can naturally handle the spatial relations of traffic flows in non-Euclidean city traffic networks. These approaches treat sensors as graph nodes [12], the data at fixed time intervals as node features, and the geographical proximity, distance, traffic similarity, or road connectivity between sensors or regions as edges between nodes. Researchers such as Wang and Ke [1, 14] proposed graph embedding methods to embed the grid and combined them with LSTM to predict the OD matrix. Multi-task learning traffic prediction approaches [15,16,17,18] were proposed for both regional traffic flow and OD matrix prediction tasks. Liu et al. [19] constructed station distance and similarity in traffic patterns for predicting subway OD trip traffic based on a combination of physical and virtual approaches. Zheng et al. [1] proposed a method to predict the OD matrix via graph convolution, and Liu et al. [8] combined the spatial-temporal context to predict taxi OD demand.

These approaches rely heavily on the construction of traffic graphs to capture the spatial correlation of flows between geo-locations. Moreover, due to the functional differences between locations and the real-time mobility and variability of traffic flows, most graph neural networks fail to capture the dynamic spatial correlations between urban locations, as well as the hierarchical spatial correlations that combine local traffic correlations and global traffic similarities.

3 The ODCRN Framework

In this section, we present the detailed architecture of the proposed model ODCRN. As Fig. 3 shows, the model consists of three parts. The first part is the spatial-temporal information capture model, which is built from Bi-GDC-GRU units following the RNN architecture. As Fig. 4(a) shows, each GRU unit integrates the bi-directional diffusion graph convolution (Fig. 4(b)) to capture the temporal and spatial correlations of the OD matrices. The other two parts are the dynamic and static traffic context learning networks, which capture the traffic context features associated with each time slot and feed them as node features \(\textrm{TC}_t\) into the Bi-GDC-GRU unit. In the prediction phase, we use a CNN with a kernel size of \(1\times 1\) to predict the OD matrix of the next time slot \(M_{t+1}\) from the last hidden state \(h_t\).

The inputs to the model include the OD matrices of the p historical time slots \([M_{t-p+1},\cdots ,M_t]\) and the traffic context information for the same p time slots \([{TC}_{t-p+1},\cdots ,TC_t]\). As Fig. 3 shows, the traffic context learning module learns region-related traffic context information and consists of two sub-networks: a static context learning network and a dynamic context learning network. We use Node2vec [20] embedding features, which describe the structural features of the regions, as the input of the static traffic context learning network. To predict the number of passengers traveling from one region to another, we use the inflows and outflows of all regions in each time slot as the input of the dynamic traffic context learning network.

3.1 Constructing Traffic Context from Historical Data

Static Traffic Context Learning Network Module. The attributes of a region itself, such as POIs and population, affect its traffic flow and how frequently it serves as the origin or destination of commutes. We pretrain the embedding features of each node with the Node2vec algorithm and use them as the static traffic context of the node. Node embedding features represent the structural properties of the nodes on the graph and the similarity between nodes.

Fig. 3. ODCRN architecture

After constructing the static traffic context, we use a deep neural network DNN as a Static Context Learner (SCL) to map the static traffic context into node hidden states as follows:

$$\begin{aligned} h_1=SCL\left( S\right) =DNN\left( S\right) \end{aligned}$$
(1)

In Eq. 1, \(h_1\) is the output of the static traffic context learning network SCL, and DNN is a multi-layer deep neural network.

Dynamic Traffic Context Learning Network Module. We count the inflow and outflow traffic of each region of the city in each time slot from the historical travel records, and use \(X_t\) as the regional dynamic traffic context. Since the inflow and outflow traffic of a region are affected by adjacent regions, we use a graph convolution network as a Dynamic Context Learner (DCL) to capture spatial correlations. DCL maps the dynamic traffic context of all nodes in each time slot into hidden state representations:

$$\begin{aligned} h_2=\ DCL\left( X_t\right) =GDC\left( A,X_t,W\right) =\ \sum _{k=0}^{K}{A^kX_tW_k} \end{aligned}$$
(2)

In Eq. 2, GDC denotes the graph diffusion convolution [13] operation, which is introduced in Sect. 3.2. \(X_t\) is the input dynamic traffic context at time slot t, \(A\) is the adjacency matrix, \(K\) is the number of diffusion steps, and \(W_k\) are the graph convolution parameters.
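The sum in Eq. 2 can be evaluated without forming the matrix powers explicitly, by repeatedly multiplying the diffused features by \(A\). The following NumPy sketch illustrates a single GDC layer under that reading; the function name `gdc`, the weight layout (one matrix per diffusion step), and the random toy inputs are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def gdc(A, X, W):
    """Graph diffusion convolution: sum_{k=0}^{K} A^k X W_k.

    A: (N, N) adjacency / transition matrix.
    X: (N, D_in) node features.
    W: (K + 1, D_in, D_out) array, one weight matrix per diffusion step.
    """
    out = np.zeros((X.shape[0], W.shape[2]))
    diffused = X  # A^0 X
    for k in range(W.shape[0]):
        out += diffused @ W[k]
        diffused = A @ diffused  # becomes A^(k+1) X for the next step
    return out


# Toy inputs (not from the paper): 4 regions, inflow/outflow features, K = 2.
rng = np.random.default_rng(0)
N, D_t, D_h, K = 4, 2, 3, 2
A = rng.random((N, N))
A /= A.sum(axis=1, keepdims=True)
X_t = rng.random((N, D_t))
W = rng.normal(size=(K + 1, D_t, D_h))
h2 = gdc(A, X_t, W)
print(h2.shape)
```

Reusing the diffused features keeps the cost at K matrix products per layer instead of computing each power of \(A\) separately.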

For each time slot t, we combine the two types of traffic context as shown in Eq. 3:

$$\begin{aligned} \textrm{TC}_t=SCL\left( S\right) \ \mid \mid \ DCL\left( X_t\right) =\ {h_1\mid \mid h_2} \end{aligned}$$
(3)
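Eqs. 1 and 3 can be illustrated together: a small fully connected network maps the static context \(S\) to \(h_1\), which is then concatenated with the dynamic hidden state \(h_2\) along the feature axis. In this sketch the ReLU activation, the layer sizes, and the random stand-in for \(h_2\) are assumptions; the paper only states that SCL is a multi-layer DNN.

```python
import numpy as np


def scl(S, W1, b1, W2, b2):
    """Two-layer fully connected network mapping static context S to h_1."""
    hidden = np.maximum(S @ W1 + b1, 0.0)  # ReLU is an assumption
    return hidden @ W2 + b2


rng = np.random.default_rng(0)
N, D_s, D_h = 4, 10, 32  # 4 toy regions, 10-dim embeddings, 32 hidden units

S = rng.normal(size=(N, D_s))  # stand-in for pretrained node embeddings
W1, b1 = rng.normal(size=(D_s, D_h)), np.zeros(D_h)
W2, b2 = rng.normal(size=(D_h, D_h)), np.zeros(D_h)

h1 = scl(S, W1, b1, W2, b2)       # Eq. 1, identical for every time slot
h2 = rng.normal(size=(N, D_h))    # placeholder for DCL(X_t) from Eq. 2

# Eq. 3: concatenate the two contexts per node.
TC_t = np.concatenate([h1, h2], axis=1)
print(TC_t.shape)
```

Because \(S\) does not depend on t, \(h_1\) can be computed once and reused at every time slot, while \(h_2\) changes with \(X_t\).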

3.2 Capturing Spatial-Temporal Correlations by Bi-GDC-GRU Unit

We propose the Bi-directional Graph Diffusion Convolution GRU (Bi-GDC-GRU) unit to capture spatial-temporal correlations. The demand in the two OD travel directions may differ at the same time. For example, office regions have more arrival demand and less departure demand in the morning. Figure 5(a) and Fig. 5(b) show that the OD semantic neighbors departing from and arriving at region \(r_2\) at the same time are different. Therefore, to partially alleviate the sparsity problem of the OD matrix, we construct the semantically adjacent edges of \(r_i\) in both directions (in and out), which aggregates information from more OD semantic neighbor nodes.

Fig. 4. Details of the Bi-GDC-GRU unit (Fig. 4(a)) and an example of the graph information aggregation process in Bi-GDC-GRU (Fig. 4(b))

Specifically, for each input time t, we use the OD matrix \(M_t\) to construct two directed weighted adjacency matrices, as shown in Fig. 5(b) and Fig. 5(c), and use them as graphs to aggregate neighbor node information. The forward adjacency matrix is denoted \(A_f^{(t)}\) and the backward adjacency matrix is denoted \(A_b^{(t)}\):

$$\begin{aligned} A_f^{\left( t\right) }=Norm\left( M_t\right) \end{aligned}$$
(4)
$$\begin{aligned} A_b^{\left( t\right) }=\left( A_f^{\left( t\right) }\right) ^{\top } \end{aligned}$$
(5)

In Eq. 4, Norm normalizes the OD matrix \(M_t\) to prevent gradient explosion after the graph convolution operation. In Eq. 5, \(\top \) denotes the transpose operation. Since the roles of a region as an origin and as a destination are correlated, we use the forward and backward adjacency matrices to simultaneously aggregate the features of the semantic neighbor nodes on the OD pairs starting from node \(v_i\) and arriving at node \(v_i\).
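The two adjacency matrices can be built directly from one OD matrix. The paper does not specify the form of Norm, so the sketch below assumes row normalization (each non-empty row sums to 1, turning trip counts into transition probabilities); the helper name `row_normalize` and the toy matrix are likewise illustrative assumptions.

```python
import numpy as np


def row_normalize(M):
    """Divide each row by its sum; rows without any trips stay all-zero."""
    M = np.asarray(M, dtype=float)
    row_sums = M.sum(axis=1, keepdims=True)
    return np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)


# Toy OD matrix for 3 regions (not from the paper); region 2 has no departures.
M_t = np.array([
    [0, 2, 2],
    [1, 0, 3],
    [0, 0, 0],
])

A_f = row_normalize(M_t)  # Eq. 4 under the row-normalization assumption
A_b = A_f.T               # Eq. 5: backward graph is the transpose

print(A_f)
print(A_b)
```

With this choice, entry (i, j) of the forward matrix is the share of trips leaving region i that end in region j, and the backward matrix reverses every edge.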

Fig. 5. An example of forward and backward OD matrices. Figure 5(a) shows the numbers and directions of trips among the four nodes. Figure 5(b) and Fig. 5(c) show the corresponding OD matrices. The travel demand from \(r_2\) to \(r_1\), \(r_2\), \(r_3\), and \(r_4\) is represented as [0,0,0,1], and the numbers of trips arriving at \(r_2\) are [3,0,5,4].

After constructing the forward and backward adjacency matrices, we use bi-directional diffusion graph convolution to capture the spatial correlation between OD pairs, as shown in Eq. 6:

$$\begin{aligned} h^{\left( t\right) }=\ \sum _{k=0}^{K}\left( A_f^{\left( t\right) }\right) ^k\textrm{TC}_tW_{k1}+\left( A_b^{\left( t\right) }\right) ^k\textrm{TC}_tW_{k2} \end{aligned}$$
(6)

\(\textrm{TC}_t\) represents the traffic context at time t, and \(W_{k1}\) and \(W_{k2}\) are the graph convolution kernel parameters of the forward and backward directions. For time t, the diffusion graph convolution of Eq. 6 is defined as a finite number of directed diffusion processes on the graph, where K is the number of diffusion steps and \({(A_f^{(t)})}^k\) represents the diffusion probability to the \(k\)-th hop neighbors. Figure 6 shows the convolution process of the forward diffusion graph. The central node is represented by aggregating and summing the information of neighboring nodes at different hops. The diffusion probability \(A_f^{(t)}\) indicates the strength of the edge connection between nodes. Because the OD traffic of nodes is bi-directional, the forward and backward diffusion graph convolution in Eq. 6 aggregates, for each region, the information of its multi-hop semantic neighbor nodes in both of its roles, as origin and as destination.
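Eq. 6 adds a forward and a backward diffusion term at every hop. A minimal NumPy sketch under that reading is given below; the function name `bi_gdc`, the separate weight stacks `W_f` and `W_b` (standing for \(W_{k1}\) and \(W_{k2}\)), and the random toy inputs are assumptions for illustration.

```python
import numpy as np


def bi_gdc(A_f, A_b, TC, W_f, W_b):
    """Bi-directional diffusion convolution:
    sum_{k=0}^{K} A_f^k TC W_f[k] + A_b^k TC W_b[k].

    W_f, W_b: (K + 1, D_in, D_out) arrays of per-hop weights.
    """
    out = np.zeros((TC.shape[0], W_f.shape[2]))
    fwd, bwd = TC, TC  # hop 0: the node's own features in both directions
    for k in range(W_f.shape[0]):
        out += fwd @ W_f[k] + bwd @ W_b[k]
        fwd, bwd = A_f @ fwd, A_b @ bwd  # advance both diffusions by one hop
    return out


# Toy inputs (not from the paper).
rng = np.random.default_rng(1)
N, D_in, D_out, K = 4, 6, 5, 2
A_f = rng.random((N, N))
A_f /= A_f.sum(axis=1, keepdims=True)
A_b = A_f.T
TC_t = rng.normal(size=(N, D_in))
W_f = rng.normal(size=(K + 1, D_in, D_out))
W_b = rng.normal(size=(K + 1, D_in, D_out))

h_t = bi_gdc(A_f, A_b, TC_t, W_f, W_b)
print(h_t.shape)
```

Dropping either term in the loop gives the one-directional variants that an ablation on the forward or backward graph would compare against.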

Fig. 6. A schematic of k-step forward graph diffusion convolution

Then, the time correlations of the OD matrix at different moments are captured by the GRU units based on the recurrent neural network. Specifically, we use GRU to capture the temporal correlation between OD matrices, and replace the matrix multiplication operation in GRU with the bi-directional diffusion graph convolution operation in Eq. 6.

$$\begin{aligned} r^{\left( t\right) }=\ \sigma {(\Theta _r*_G\left[ \textrm{TC}_t,H^{\left( t-1\right) }\right] +b_r)} \end{aligned}$$
(7)
$$\begin{aligned} u^{\left( t\right) }=\ \sigma \left( \Theta _u*_G\left[ \textrm{TC}_t,H^{\left( t-1\right) }\right] +b_u\right) \end{aligned}$$
(8)
$$\begin{aligned} c^{\left( t\right) }=tanh(\Theta _c*_G[TC_t,r^{(t)}\odot H^{(t-1)}]+b_c) \end{aligned}$$
(9)
$$\begin{aligned} H^{\left( t\right) }=u^{\left( t\right) }\odot H^{(t-1)}+(1-u^{(t)})\odot c^{(t)} \end{aligned}$$
(10)

In Eq. 10, \(H^{\left( t\right) }\) is the output of the GRU unit at time t. \(*_G\) is the diffusion graph convolution operation defined by Eq. 6. In Eqs. 7, 8, and 9, \(\Theta \) denotes the graph convolution kernel parameters, \(\sigma \) is the sigmoid activation function, and \(\odot \) is the Hadamard product. The reset gate \(r^{\left( t\right) }\) and the update gate \(u^{\left( t\right) }\) of the GRU unit at time t control how much information is forgotten and how much is output, respectively. The traffic context \(\textrm{TC}_t\) at the current time t and the hidden state \(H^{\left( t-1\right) }\) output by the GRU unit at the previous time are concatenated and used as the node features that are input into the diffusion graph convolution of Eq. 6. The Bi-GDC-GRU unit can therefore capture both temporal and spatial correlations.
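To make Eqs. 7 to 10 concrete, the following NumPy sketch runs one Bi-GDC-GRU cell over a few time steps. It is an illustration only: the parameter layout (a forward weight stack, a backward weight stack, and a bias per gate), the small random initialization, the row normalization of \(M_t\), and the toy sizes are all assumptions rather than details given in the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bi_gdc(A_f, A_b, X, W_f, W_b):
    """Eq. 6: sum over k of A_f^k X W_f[k] + A_b^k X W_b[k]."""
    out = np.zeros((X.shape[0], W_f.shape[2]))
    fwd, bwd = X, X
    for k in range(W_f.shape[0]):
        out += fwd @ W_f[k] + bwd @ W_b[k]
        fwd, bwd = A_f @ fwd, A_b @ bwd
    return out


def gru_step(A_f, A_b, TC_t, H_prev, params):
    """One Bi-GDC-GRU update following Eqs. 7-10."""
    x = np.concatenate([TC_t, H_prev], axis=1)
    r = sigmoid(bi_gdc(A_f, A_b, x, *params["r"][:2]) + params["r"][2])  # Eq. 7
    u = sigmoid(bi_gdc(A_f, A_b, x, *params["u"][:2]) + params["u"][2])  # Eq. 8
    x_c = np.concatenate([TC_t, r * H_prev], axis=1)
    c = np.tanh(bi_gdc(A_f, A_b, x_c, *params["c"][:2]) + params["c"][2])  # Eq. 9
    return u * H_prev + (1.0 - u) * c  # Eq. 10


def init_params(rng, d_in, d_h, K):
    """Small random weights for the three gates (hypothetical initialization)."""
    def gate():
        shape = (K + 1, d_in + d_h, d_h)
        return (0.1 * rng.normal(size=shape), 0.1 * rng.normal(size=shape), np.zeros(d_h))
    return {"r": gate(), "u": gate(), "c": gate()}


# Toy run (not from the paper): 4 regions, 3 historical time slots.
rng = np.random.default_rng(0)
N, d_tc, d_h, K, p = 4, 6, 5, 2, 3
params = init_params(rng, d_tc, d_h, K)
H = np.zeros((N, d_h))
for _ in range(p):
    M_t = rng.integers(0, 5, size=(N, N)).astype(float)
    A_f = M_t / np.maximum(M_t.sum(axis=1, keepdims=True), 1.0)  # assumed row normalization
    TC_t = rng.normal(size=(N, d_tc))
    H = gru_step(A_f, A_f.T, TC_t, H, params)
print(H.shape)
```

Note that the adjacency matrices are rebuilt from \(M_t\) at every step, so the graph used for aggregation changes over time along with the node features.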

In the prediction stage, we take the hidden state output by the last GRU unit as the input of the prediction network. Since we need to predict the OD matrix at the next moment, we adopt a single-layer convolutional neural network with a \(1 \times 1\) convolution kernel as the prediction network, as shown in Eq. 11:

$$\begin{aligned} {\ \hat{M}}_{t+1}=\Theta *_C\left( H^{\left( t\right) }\right) \end{aligned}$$
(11)

where \(H^{\left( t\right) }\) is the hidden state output by the last GRU unit, \(*_C\) is the convolution operation, \(\Theta \) is the convolution kernel parameter, and \({\hat{M}}_{t+1}\) is the predicted OD matrix at the next moment.
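A \(1 \times 1\) convolution applies the same linear map at every position, so Eq. 11 can be read as one shared matrix product over the nodes. The sketch below assumes the hidden state is stored as an array of shape (N, d_h) with hidden features as channels and N output channels (one per destination region); this layout and the name `predict_od` are assumptions, since the paper does not give the tensor layout.

```python
import numpy as np


def predict_od(H, Theta):
    """1 x 1 convolution over nodes: every row of H goes through the same Theta.

    H: (N, d_h) last hidden state, Theta: (d_h, N) kernel, result: (N, N).
    """
    return H @ Theta


# Toy inputs (not from the paper).
rng = np.random.default_rng(0)
N, d_h = 4, 5
H_t = rng.normal(size=(N, d_h))
Theta = rng.normal(size=(d_h, N))

M_hat = predict_od(H_t, Theta)

# Same result computed node by node, which is what "1 x 1" means here.
per_node = np.stack([H_t[i] @ Theta for i in range(N)])
print(np.allclose(M_hat, per_node))
```

Row i of the result is then interpreted as the predicted demand from region i to every other region.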

3.3 Loss Function of OD Matrix Prediction

We use the mean square error (MSE), a common loss function in regression tasks, to back-propagate the error and optimize the model parameters, as shown in Eq. 12:

$$\begin{aligned} L\left( W_\theta \right) =MSE\left( M_{t+1},{\ \hat{M}}_{t+1}\right) =\frac{1}{N_l*N_l}\sum _{i=1}^{N_l}\sum _{j=1}^{N_l}\left( m_{i,j}-{\hat{m}}_{i,j}\right) ^2 \end{aligned}$$
(12)

\(W_\theta \) are the parameters of the whole model. \(M_{t+1}\) and \({\hat{M}}_{t+1}\) are the true and predicted values of the OD matrix at the next moment, respectively. \(N_l\) is the total number of nodes, namely the number of regions in the city.
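As a quick numerical check of Eq. 12, the loss for a pair of small matrices can be computed directly. The function name `mse_loss` and the toy values are illustrative only.

```python
import numpy as np


def mse_loss(M_true, M_pred):
    """Eq. 12: squared error averaged over all N_l * N_l OD pairs."""
    M_true = np.asarray(M_true, dtype=float)
    M_pred = np.asarray(M_pred, dtype=float)
    n_l = M_true.shape[0]
    return float(np.sum((M_true - M_pred) ** 2) / (n_l * n_l))


# Toy 2 x 2 example: errors are 0, 1, 0, -2, so the loss is (0 + 1 + 0 + 4) / 4.
M_next = [[1, 2], [3, 4]]
M_hat = [[1, 1], [3, 6]]
loss = mse_loss(M_next, M_hat)
print(loss)  # 1.25
```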

4 Experiment

4.1 Datasets and Evaluation Metrics

We use taxi order data from New York City and Chengdu for the experiments. The raw data information includes boarding time, alighting time, boarding longitude, boarding latitude, alighting longitude and alighting latitude.

We divide each city into \(m\times n\) regions. Both New York City and Chengdu are divided into 256 \((16\times 16)\) regions. Based on the order data, we count the number of orders for each pair of boarding and alighting regions per hour to construct an OD matrix. The matrices are arranged in chronological order to form a sequence of OD matrices with dimension \(T\times N\times N\), where T is the total number of time slots and \(N=m\times n\) is the total number of regions. Specific information on the datasets is shown in Table 1:

Table 1. Description of Dataset

At the same time, we construct two kinds of traffic context information:

(1) Static Traffic Context Information. Using the order data from two weeks, we construct a graph in which the starting grid and the ending grid of an order are the two nodes of an edge, and the number of orders is the weight of the edge. The embedding vector of each grid is pre-trained with Node2vec.

(2) Dynamic Traffic Context Information. The numbers of alightings and boardings in each region per hour are counted as the inflow and outflow of the region, respectively.
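Both kinds of context can be derived from the OD matrices themselves: row sums give departures (outflow), column sums give arrivals (inflow), and the matrices summed over a period give a weighted edge list for the graph on which embeddings are trained. The sketch below assumes this derivation; the helper names and the toy sequence are hypothetical, and the Node2vec training step itself is not shown.

```python
import numpy as np


def inflow_outflow(M_t):
    """Return an (N, 2) array of [inflow, outflow] per region for one time slot."""
    outflow = M_t.sum(axis=1)  # trips that start in each region (boardings)
    inflow = M_t.sum(axis=0)   # trips that end in each region (alightings)
    return np.stack([inflow, outflow], axis=1)


def weighted_edges(od_sequence):
    """Sum OD matrices over time and list (origin, destination, weight) edges."""
    total = od_sequence.sum(axis=0)
    origins, dests = np.nonzero(total)
    return [(int(i), int(j), int(total[i, j])) for i, j in zip(origins, dests)]


# Toy sequence of two hourly OD matrices for 3 regions (not from the paper).
od_sequence = np.array([
    [[0, 2, 0], [1, 0, 0], [0, 3, 0]],
    [[0, 1, 1], [0, 0, 0], [0, 0, 2]],
])

X_0 = inflow_outflow(od_sequence[0])
edges = weighted_edges(od_sequence)
print(X_0)
print(edges)
```

The edge list is in a form that could be passed to a graph embedding tool; trips that start and end in the same region appear as self-loops and count toward both inflow and outflow.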

The OD matrix prediction task uses historical data from the past 8 h (8 time slots) to predict the OD matrix of the next 1 h (1 time slot). The dataset is divided in chronological order into training, validation, and test sets at a ratio of 8:1:1. We evaluate performance with two commonly used regression metrics, root mean square error (RMSE) and mean absolute error (MAE).
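The sample construction, the chronological split, and the two metrics can be sketched as follows. This is a minimal illustration: whether the split is applied before or after windowing is not stated in the paper, so splitting the list of windowed samples is an assumption, as are the helper names and the synthetic sequence.

```python
import numpy as np


def make_samples(od_seq, p=8):
    """Pair each window of p consecutive OD matrices with the matrix that follows it."""
    return [(od_seq[t - p:t], od_seq[t]) for t in range(p, len(od_seq))]


def chronological_split(samples, ratios=(8, 1, 1)):
    """Split without shuffling so that later samples never leak into training."""
    total = sum(ratios)
    n_train = len(samples) * ratios[0] // total
    n_val = len(samples) * ratios[1] // total
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


def rmse(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


# Synthetic sequence (not real data): 28 hourly OD matrices for 4 regions.
od_seq = np.arange(28 * 4 * 4).reshape(28, 4, 4)
samples = make_samples(od_seq, p=8)
train, val, test = chronological_split(samples)
print(len(samples), len(train), len(val), len(test))  # 20 16 2 2

print(mae([[1, 2], [3, 4]], [[1, 1], [3, 6]]))  # 0.75
```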

4.2 Parameter Setting

The hyperparameters of our model are set as follows:

(1) The dimension of the Node2vec node embedding features is 10. The static context learning network is a 2-layer neural network with 32 hidden units in each layer. The dynamic context learning network is a 3-layer GDC network with 32 units in each layer.

(2) ODCRN has two recurrent layers, and each recurrent cell contains two layers of diffusion graph convolutional networks. The number of hidden units is 32, and the number of GDC diffusion steps is set to 2.

(3) We use the Adam optimizer to train the model. The learning rate during training is 0.01 and the batch size is 128. In addition, to prevent overfitting, the regularization parameter is set to 0.0001 and early stopping is used.

4.3 Spatial-Temporal Data Preprocess

For OD matrix prediction, we use historical data from the traffic flow dataset to generate the static and dynamic traffic contexts as the traffic context input, and generate the OD matrices of the past 8 h as the historical OD matrix sequence input.

Fig. 7. Traffic context and OD matrix construction process

Static Traffic Context Data. According to the statistics of the start point, end point and traveling counts between them, we use Node2vec to train the embedding features of each region as our static traffic context, including graph structural characteristics and node similarity.

Dynamic Traffic Context Data. As shown in Fig. 7(b), we count the inflow and outflow of each region at hourly intervals as our dynamic traffic context data. Regions with high inflow tend to be the destinations of commutes, while those with high outflow tend to be the origins.

OD Matrices Sequence. Since inter-regional commuting takes time, the OD matrices are also counted in hours. We first grid the data to find the semantic neighbors of each region. Then we perform temporal statistics on the order start and end points of each region to determine the weights of the adjacency matrix. We regard the OD matrix as the adjacency matrix and use it as our input of the Bi-GDC-GRU cell to capture the spatial association of OD semantic neighbors.

4.4 Comparison and Analysis of Model Prediction Accuracy

Table 2 shows the prediction results of each method on the TaxiNYC and TaxiCD datasets. The traditional models, historical average (HA) and ARIMA, are simple but give poor predictions because of the randomness of OD travel demand between some regions. The deep learning method ST-ResNet [11], which considers trend, period, closeness, and other temporally relevant historical data and uses a residual network to increase the network depth, improves the prediction. As an improved version of ST-ResNet, the MDL [15] model designs node network and edge network branches to extract edge flow (OD) and node flow (inflow and outflow) features respectively, and uses multi-task learning to predict both flows. However, its performance is inferior to that of the single-task ST-ResNet. Both MDL and ST-ResNet use three segments of historical data as input. GEML [1] is a GNN-based OD matrix prediction model that uses grid embedding and graph convolution to aggregate information about the semantic and geographic neighbors of OD pairs, pre-weights the importance of neighbor nodes in the OD matrices, and adopts LSTM to capture temporal correlations. However, the model neither considers the dynamic and static context information of the origins and destinations, nor explicitly distinguishes the direction of the edges of the adjacency matrix.

Table 2. Comparison Results of TaxiNYC and TaxiCD

Among all models, ODCRN has the lowest error on the OD matrix prediction task. The prediction performance is improved because ODCRN uses the time-varying OD matrix as the adjacency matrix, the values of the OD matrix as the weights of edges, and considers both forward and backward-directed edges to aggregate neighbor node features more effectively. In addition, ODCRN incorporates both static and dynamic traffic context information of each region, which enables the traffic conditions in the region to be captured to further improve the prediction.

4.5 Comparison of Model Complexity and Speed

We compare the number of parameters (complexity), training speed and inference speed (testing speed) of ODCRN and several deep learning-based baseline models, where model complexity and speed are important reference metrics for model deployment. In the OD matrix prediction task, we define the OD matrix of the first 8 time slots with the OD matrix of the next 1 time slot representing the labeled data as one sample. Define the speed as the number of samples that can be processed per unit of time (samples/s).
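The two quantities compared here can be measured with very little code. The sketch below is a generic illustration, not the benchmarking script used in the paper: `count_parameters` sums the sizes of weight tensors given their shapes, and `samples_per_second` times a processing function over a list of samples. The example shapes and the dummy workload are hypothetical.

```python
import math
import time


def count_parameters(shapes):
    """Total number of scalar parameters for a list of tensor shapes."""
    return sum(math.prod(shape) for shape in shapes)


def samples_per_second(process, samples, repeats=3):
    """Throughput of `process`, averaged over several passes through the samples."""
    start = time.perf_counter()
    for _ in range(repeats):
        for sample in samples:
            process(sample)
    elapsed = time.perf_counter() - start
    return repeats * len(samples) / max(elapsed, 1e-9)


# Hypothetical gate of a bi-directional diffusion convolution with K = 2:
# forward and backward weights of shape (K + 1, d_in, d_h) plus one bias vector.
gate_shapes = [(3, 96, 32), (3, 96, 32), (32,)]
n_params = count_parameters(gate_shapes)
print(n_params)  # 18464

# Dummy workload standing in for a model's forward pass.
dummy_samples = [list(range(100)) for _ in range(50)]
speed = samples_per_second(sum, dummy_samples)
print(f"{speed:.0f} samples/s")
```

For a fair comparison between models, the same batch size and hardware would need to be used for every measurement.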

Table 3. Comparison of the number of Parameters on the Model and Speed

As shown in Table 3, the MDL model has the highest complexity: it uses multi-task learning to extract both regional traffic flow features and OD matrix features, and each task uses three deep convolutional networks with residual connections to extract spatial-temporal features from three highly correlated segments of historical data. ST-ResNet performs feature extraction only on the OD matrix, and this single-task design gives it fewer parameters and a higher speed than MDL.

Compared with the GEML model, which is an OD matrix prediction model with an LSTM network, our ODCRN model has a much smaller number of parameters. In GEML, the parameters in the LSTM units are shared across inputs at different time steps, which already greatly reduces the number of parameters. ODCRN integrates the GRU and the GCN, so the parameters of the whole model are mainly the convolutional kernel parameters of the bi-directional diffusion graph convolution in the GRU units, which further reduces the number of parameters compared with the serial structure of GCN and LSTM in GEML. The design of the recurrent unit in ODCRN captures both temporal and spatial correlations, which improves the training and inference speed while reducing the number of parameters.

4.6 Hyperparameter Experiments

In this section, we adjust the hyperparameters of the ODCRN model and compare the prediction errors under different settings. The experiment adjusts the hidden state dimension of the bi-directional diffusion graph convolution in the GRU, the hidden state dimension of the static traffic context learning network (DNN), and the hidden state dimension of the dynamic traffic context learning network (GDC). The experiments follow the control-variable method: one parameter is adjusted to observe the prediction accuracy of the model while the other parameters remain unchanged.
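The one-parameter-at-a-time protocol can be expressed as a small configuration generator. In this sketch the baseline values of 32 follow Sect. 4.2, while the key names and the candidate list (which includes 64 in addition to the values mentioned in the text) are assumptions for illustration.

```python
# Baseline hidden sizes from Sect. 4.2; the key names are hypothetical.
BASE = {"gru_hidden": 32, "scl_hidden": 32, "dcl_hidden": 32}

# Candidate hidden sizes; the exact grid used in the paper is an assumption.
CANDIDATES = [16, 32, 64, 128]


def one_at_a_time(base, candidates):
    """Vary one hyperparameter over the candidates while fixing the others."""
    configs = []
    for name in base:
        for value in candidates:
            config = dict(base)
            config[name] = value
            configs.append((name, config))
    return configs


configs = one_at_a_time(BASE, CANDIDATES)
for varied, config in configs[:4]:
    print(varied, config)
print(len(configs))  # 12
```

Each configuration would then be trained and evaluated separately, and the errors grouped by the varied parameter to produce one curve per panel.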

Fig. 8. The influence of the number of hidden units on model performance

In Fig. 8a(b) and Fig. 8b(b), a hidden state of 32 in the static context network provides a better representation of spatially static information, while smaller hidden state dimensions (16 dimensions) and larger hidden state dimensions (128 dimensions) suffer from underfitting and overfitting problems. Finally, the dynamic traffic context, i.e. the GDC with input features of regional inflow and outflow traffic (2 dimensions), was adjusted for different hidden states. As shown in Fig. 8a(c) and Fig. 8b(c), the model over-fits as the dimensionality of the GDC hidden layer increases, indicating that the dynamic traffic context features can be better extracted by using GDCs with low-dimensional (16-dimensional) hidden layers at lower input feature dimensions.

4.7 Ablation Experiment

We conduct extensive ablation studies to quantify the performance benefit of each component in our model. Four variants of the ODCRN-based model are constructed and tested as follows:

(1) W/O Static Context Learner: without inputting the static traffic context information, i.e., the Node2vec node embedding features.

(2) W/O Dynamic Context Learner: without inputting the dynamic traffic context information, i.e., the inflow and outflow of each region at each moment.

(3) W/O Forward GDC: removing the graph aggregation from the central node. The forward adjacency matrix is an \(N \times N\) \((N = m \times n)\) matrix from origin to destination.

(4) W/O Backward GDC: removing the graph aggregation to the central node. The backward adjacency matrix is an \(N \times N\) \((N = m \times n)\) matrix from destination to origin.

Table 4. Results of Ablation Experiments on Dataset TaxiNYC and TaxiCD

The results of the model ablation experiments are shown in Table 4. The ODCRN model gives the best prediction when all components are retained. The effectiveness of each component is described in turn as follows.

Static Context Learning Network. When the static context learning network is removed, RMSE and MAE increase compared to ODCRN, which indicates that adding Node2vec node embedding features helps improve the performance of the model. Since no POI information is available for the regions, we build a graph from one month of taxi orders, in which the starting and ending regions are nodes and the number of orders between them defines the connecting edges. The constructed graph structure reflects the community structure and other characteristics of the regions; for example, a region that is a popular starting or ending point has many connected edges in the graph. Node2vec trains node embedding features on this graph, which lets the OD matrix prediction model learn prior knowledge of the nodes and helps improve the prediction.

Dynamic Context Learning Network. Regions with high inflow traffic tend to be set as the endpoint in OD travel demand, while regions with high outflow traffic tend to be set as the starting point in OD travel demand. Using the dynamic traffic contexts of regions as the dynamic characteristics of nodes can improve the OD matrix prediction.

Forward and Backward Diffusion Graph Convolution. Due to the regularity of commuting activities, the OD travel demand has different directions at different moments, and departure and arrival are correlated. Therefore, we use the directed OD matrix (forward) and its transpose, the DO matrix (backward), to construct bidirectional edges from the target region to other regions and from other regions to the target region. With these edges, the graph convolution aggregates more information about neighbor nodes and learns more effective node representations than with one-way edges.

4.8 Comparison Experiment of Predicted Value and Actual Value

For a more intuitive view of the model predictions, we selected Region 113 and Region 198 in the TaxiNYC dataset and plotted the predicted and actual OD travel demand curves between the two regions, in each direction, for the 10 days from April 21, 2015 to April 30, 2015.

Fig. 9. Comparison between predicted and actual values between regions.

From Fig. 9, we can see that although the true OD curves fluctuate irregularly over time, they show an overall periodicity with clear morning and evening peaks. The trends and value ranges of the two curves differ, and the taxi demand from region 113 to region 198 exceeds that from region 198 to region 113, which indicates that the travel demand between regions is asymmetric. The forward and backward OD matrices we construct account for this by aggregating the features of neighbor nodes with OD semantic relationships through directed, weighted adjacency matrices. ODCRN fits the real data well and is generally consistent with the true values in both trend and magnitude. We can also see from Fig. 9 that the OD curves show dramatic local fluctuations caused by dynamic changes in taxi demand between regions. These aperiodic, randomly changing segments increase the difficulty of prediction, so the model fails to predict such fine details accurately.

Figure 10 compares the predicted and true values of the OD matrix (\(N \times N, N = 256\)) at three moments on April 30, 2015: 8 am, 2 pm, and 10 pm. For suburban regions, where few taxi trips occur, the corresponding entries of the OD matrix remain silent. The OD matrix predicted by our model is close to the real OD matrix in terms of the spatial distribution and temporal variation of ride-hailing trips, and the model predicts the OD travel demand among urban regions well.

Fig. 10. Comparison of the predicted and actual values of the OD matrix.

5 Limitation Analysis

Although the method proposed in this paper outperforms existing traffic prediction models in prediction accuracy, some aspects can still be improved.

Prediction of Non-stationary and Non-strictly Periodic Traffic Time Series. Due to the uneven development of urban regions, differences in travel demand, or the uneven distribution of data collection devices, the time series of some nodes show non-stationary and non-periodic trends. Such time series are more difficult to predict than regular, periodically varying ones, so a prediction model that is more robust to this kind of input is needed.

Traffic Flow Prediction Under Abnormal Conditions. By visualizing the prediction results, we find that the model has difficulty accurately predicting surges or sudden drops in traffic flow caused by peak hours, holidays, extreme weather, social events, traffic accidents, etc. To achieve more accurate prediction, we consider increasing the amount of training data, adding external features that describe the above events, or applying time series anomaly detection methods.

6 Conclusion

In this paper, we propose ODCRN, a novel model that integrates traffic context and spatial-temporal information of OD matrices. We take advantage of the bidirectional semantic information of each travel demand's semantic neighbors to capture both the inflow and outflow of a region. We then construct traffic context information from static and dynamic traffic contexts to complement the OD matrices in the prediction task. We conduct extensive experiments on two datasets, and the results demonstrate that our method outperforms baseline methods in terms of prediction accuracy and model complexity.

In future work, we will explore how to predict more accurately the traffic spikes and dips caused by daily rush hours, holidays, extreme weather, and social events. We will also consider integrating more traffic features into the model, which requires further experiments on model design and data mining.