1 Introduction

As urbanization accelerates, the surge in population and vehicles presents unprecedented challenges to urban traffic infrastructure [1]. Traffic prediction, a vital aspect of Intelligent Transportation Systems (ITSs) [2, 3], has evolved into a research area of common interest to both academia and industry. For traffic managers, traffic prediction provides real-time traffic information, helping drivers effectively avoid congestion. For travelers, it offers trip suggestions and helps optimize travel choices [4].

Fig. 1: Spatio-temporal Pearson analysis on the PeMS08 dataset

Traffic prediction aims to forecast future traffic flow states by observing traffic time series data and the underlying road network structure. It has always been challenging due to complex spatio-temporal dependencies: traffic data exhibits temporal, spatial, and spatio-temporal relationships. Temporal dependency refers to the influence of past time points on future ones, where complex temporal correlations make long-term traffic forecasting difficult. For example, when using data from the past 12 time steps to predict the next 12, the 8th to 12th steps are generally harder to predict, while the 1st to 4th steps are easier. Spatial dependency refers to the mutual influence between sensor nodes at the same time step. As shown in Fig. 1, temporal dependency usually exhibits slight periodic changes, and spatial dependency between roads displays different correlations under different environments. Spatio-temporal dependency involves the mutual influence between sensor nodes over time. Although recent research has made progress in capturing complex spatio-temporal dependencies, limitations remain in the following two aspects.

  1.

    Insufficient modeling of spatial dependencies. Early studies [5,6,7] constructed static adjacency matrices in the intrinsic space based on prior knowledge provided by road networks. This approach considered the intrinsic spatial dependencies between nodes but often ignored the dynamic spatial dependencies caused by factors such as traffic congestion. GWN [8] and AGCRN [9] addressed this limitation by constructing adaptive adjacency matrices using learnable node embeddings in a single space. However, they overlooked multiple latent spatial dependencies between nodes under different factors. These dependencies are influenced by spatial heterogeneity, causal associations, and uncertainty. For example, as shown in Fig. 2, spatial heterogeneity means that traffic conditions in different locations, such as residential and commercial areas, vary significantly due to their attributes, like road conditions and points of interest. Causal associations refer to the interactions and dependencies between different components of the traffic system [10, 11]. For instance, traffic congestion in a residential area may be caused by a traffic accident in a commercial area, and this congestion may in turn lead to subsequent multi-vehicle collisions. It is necessary to identify potential factors that may trigger events from pre-event data and analyze the direct and indirect impacts of events from post-event data. Uncertainty refers to the effects of various factors, such as weather changes and holidays, which increase the complexity of traffic dynamics prediction and management. Therefore, diversified spatial dependencies can be fully considered by dynamically modeling in multiple specific spaces.

  2.

    Limitations in processing long time series data. Due to their chain-like structure, Recurrent Neural Networks (RNNs) and their variants often suffer from gradient vanishing or exploding issues, making them inefficient at capturing long-term dependencies. In contrast, Transformer models are widely applied to various sequence processing tasks due to their unique structural design [12]. Transformers rely entirely on self-attention mechanisms to capture global dependencies between data points, effectively extracting long-term temporal dependencies.

Considering the aforementioned challenges, we introduce a Space-specific Graph Convolutional Recurrent Transformer Network (SSGCRTN) to explore the interaction between the input data and the spatio-temporal correlations in road networks. The main contributions are as follows:

  • We propose a novel Space-Specific Graph Convolution (SSGC), consisting of Intrinsic Space Graph Convolution (ISGC), Latent Space Graph Convolution (LSGC), and an Adaptive Fusion Layer (AFL). Notably, in the LSGC, we introduce a multi-head mechanism that divides embeddings into multiple subspaces and automatically learns dynamic graphs for each subspace. Learning in multiple subspaces effectively models the diversity of spatial relationships.

  • We develop a new Space-Specific Graph Convolutional Recurrent Network (SSGCRN), which replaces the gated units of GRU with SSGC to capture parallel spatio-temporal dependencies.

  • To better capture the causal relationships in context, we introduce a Spatio-Temporal Interaction Module (STIM), which uses a bidirectional SSGCRN to recursively integrate spatio-temporal dependencies in each cycle. At the end of STIM, a Transformer-based Global Temporal Fusion Module (GTFM) is introduced. It employs a self-attention mechanism to dynamically extract key features at each time step and flexibly allocate weights, thereby effectively capturing global spatio-temporal correlations.

  • Extensive experiments on four real-world traffic flow datasets and two speed datasets have confirmed the superior performance of our method.

Fig. 2: Traffic data’s spatial diversity. There are multiple latent spatial dependencies between nodes, which are influenced by spatial heterogeneity, dynamic associations, and uncertainty

2 Related works

Traffic prediction has been extensively researched [13], resulting in two main categories: traditional methods and deep learning methods. Traditional methods include both classic statistical learning and machine learning techniques.

Initially, traffic prediction tasks were viewed as simple time series forecasting problems, addressed using classic statistical approaches like ARIMA [14] and VAR [15]. These models assume linear relationships, whereas traffic data exhibits complex nonlinear relationships, so they perform worse than machine learning methods. The use of Markov jump neural networks for the synchronization of sampled-data control systems can handle random fluctuations in traffic prediction, providing a way to manage nonlinear traffic data that surpasses the limitations of traditional statistical methods [16]. Subsequently, other traditional machine learning methods, such as SVR [17], RFR [18], and KNN [19], have also been widely applied to traffic prediction. These methods capture the nonlinear characteristics of traffic data well but often require manual feature engineering, which limits their flexibility and degree of automation. The coupled inertial memristive neural network model offers a new perspective by simulating the complex interactions between road sections in the traffic network and capturing the nonlinear and periodic characteristics of traffic data [20]. Additionally, the event-triggered output tracking method provides an efficient solution for real-time traffic prediction by reducing communication requirements and improving system response efficiency, demonstrating higher efficiency and real-time performance in practical applications [21].

In recent years, the focus has shifted to deep learning methods due to their superior automatic feature learning capabilities. RNNs and their variants, LSTM and GRU [22,23,24,25], have been used to extract temporal features in traffic prediction. However, as the sequence length increases, these methods may encounter low computational efficiency and error accumulation. In contrast, Convolutional Neural Networks (CNNs) have fewer parameters and support transfer learning, leading to the widespread adoption of CNN-based methods like WaveNet [26] and TCN [27] for traffic prediction. Some researchers have also created grid-based maps of traffic data and used CNNs for spatial information extraction [28, 29]. However, these CNN-based methods are limited in capturing the topological structure of traffic road networks. Graph Neural Networks (GNNs) represent data using graphs and are widely applied with excellent performance in tasks such as graph classification, node clustering, and other graph-structured tasks. TCGNN [30] uses GNNs for network flow classification, PistGNN [31] serves recommendation systems, and HetGAPN [32] improves text processing in natural language processing. The flexibility and powerful capabilities of GNNs mean that they are not limited to traditional graph-structured tasks; they are beginning to demonstrate their unique value in the traffic domain as well.

Recent research has seen the adoption of GNN-based methods for analyzing spatial correlations in traffic data [33], while also incorporating RNNs, CNNs, and attention mechanisms to grasp the temporal dynamics of traffic data. For instance, DCRNN [5] combines GRU with diffusion GCNs, simulating the diffusion process of traffic spatial correlations through a directed graph. However, it only considers the fixed connection relationships between traffic nodes, capturing spatial dependencies through bidirectional random walks, while neglecting the dynamic associations between nodes. In contrast, ASTGCN [7] and STSGCN [34], by integrating spatio-temporal attention mechanisms, more effectively capture dynamic spatial-temporal dependencies. Nevertheless, relying solely on spatial or temporal attention components makes it difficult to comprehensively capture the global spatio-temporal dependencies in traffic flow data. This is especially true in the context of real-time urban traffic dynamics, where the fixed graph structure represented by a static adjacency matrix cannot effectively capture this dynamic nature. To address this, GWN [8] introduces a new adaptive adjacency matrix through node embeddings. STFGNN [35] generates temporal graphs in a data-driven manner and innovatively fuses spatio-temporal relationships at different time steps in parallel. STGODE [36] proposes a novel tensor-form GNN to address deep GCNs’ over-smoothing issue. Z-GCNETs [37] and TAMP-S2GCNets [38] are adept at extracting hidden temporal features by analyzing the topological characteristics of time. DSTAGNN [39] creates dynamic-aware graphs following a data-driven approach to represent dynamic node associations in traffic networks. STGPCN [40] uses graph product to take spatial and temporal graphs as inputs, automatically creating a new large overlapping spatio-temporal graph. DGCRN [41] finely models the dynamic graphs at each timestep using a graph generation algorithm, also introducing a novel training strategy. HSTGCNT [42] combines spatio-temporal graph convolutional networks with long-term temporal transformer networks to capture long-term and short-term temporal relationships and integrates these relationships through an attention fusion module. However, most GNN-based methods either overlook the inherent spatial dependencies in predefined graph matrices or ignore the dynamic associations of traffic nodes under different factor influences. Key determinants of intricate traffic flow fluctuations, like the causative links among traffic incidents, have not been completely accounted for. The latest research leverages the inherent spatial information of road networks, combining generative models with textual descriptions of traffic systems for traffic generation, resulting in more comprehensive and realistic traffic conditions [43].

Compared to the aforementioned methods, we consider the real connections between nodes in the traffic network and learn the latent associations between nodes under different influencing factors through embeddings in multiple subspaces. We also simulate the complex spatial interactions in the road network using an adaptive fusion structure. Additionally, we account for the causal relationships between traffic events and apply a self-attention mechanism to each node to capture global features better.

3 Preliminaries

Definition 1: Traffic Network. The actual traffic road network and sensors deployed for recording traffic information are formalized as a graph \(G=\left( V,E,A \right) \), where G represents the actual traffic road network, \(V=\left\{ v_1, v_2, \ldots , v_N\right\} \) denotes the set of sensor nodes on the road, E is the set of edges between the neighboring sensors, and \(A \in \mathbb {R}^{N \times N}\) corresponds to the adjacency matrix of G.

Definition 2: Traffic Signal Matrix. Traffic characteristics (such as traffic flow and speed) recorded by sensors at the t-th timestep are represented as a tensor \(X_t \in \mathbb {R}^{N \times C}\), where C is the number of traffic characteristics.

Traffic prediction involves learning a function, represented as \(\mathfrak {F}(\cdot )\), on traffic network G, to forecast traffic flow at sensor nodes based on historical data from H timesteps, for the forthcoming P timesteps:

$$\begin{aligned} {{X}_{(t+1):(t+P)}}=\mathfrak {F}\left( {{X}_{(t-H+1):t}};G \right) , \end{aligned}$$
(1)

where \(X_{(t+1):(t+P)} \in {{\mathbb {R}}^{P\times N\times C}}\) and \({{X}_{(t-H+1):t}}\in {{\mathbb {R}}^{H\times N\times C}}\).

4 Methodology

The architecture of our model is illustrated in Fig. 3. Initially, the input data is processed by STIM to extract preliminary spatio-temporal features. Next, it passes through GTFM to capture the global spatio-temporal features. Finally, a fully connected layer learns the nonlinear spatio-temporal dependencies and adjusts the dimensions to the required output size.

Fig. 3: SSGCRTN primarily consists of STIM and GTFM. STIM utilizes a bidirectional SSGCRN to model local spatio-temporal dependencies and capture causal relationships between traffic events. SSGC replaces the fully connected layers of GRU to achieve synchronized extraction of spatial and temporal dependencies. SSGC comprises ISGC, LSGC, and AFL. ISGC learns fixed spatial correlations among road network nodes, LSGC explores multiple latent spatial correlations, and AFL dynamically adjusts the mutual influence of different types of spatial correlations

4.1 Space-specific graph convolution

4.1.1 Intrinsic space graph convolution

The traffic network visually encodes the interconnections between urban centers and suburban roads, and the topological links between roads are essentially unchanging. Consequently, a thresholded Gaussian kernel, based on the standard deviation of the actual distances between sensors and a preset threshold, is used to construct the adjacency matrix \(A^{IS}\) in the intrinsic space. The initial node weights mirror the non-Euclidean topological connections among different nodes:

$$\begin{aligned} {A^{IS}} = \left\{ {\begin{array}{ll} {\exp \left( { - \frac{{d{{\left( {{v_i},{v_j}} \right) }^2}}}{{{\sigma ^2}}}} \right) ,} & {{\text {if}}\,d{{\left( {{v_i},{v_j}} \right) }^2} \le k} \\ {0,} & \text { otherwise} \end{array}} \right. , \end{aligned}$$
(2)

where \(d\left( v_i, v_j\right) \) is defined as the distance between node \({{v}_{i}}\) and node \({{v}_{j}}\), \(\sigma \) represents the standard deviation of the distance, and k signifies the threshold.
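As a concrete illustration, the following minimal NumPy sketch builds \(A^{IS}\) from a pairwise distance matrix according to (2); the function name and the assumption of a fully populated distance matrix are ours.

```python
import numpy as np

def gaussian_kernel_adjacency(dist: np.ndarray, k: float) -> np.ndarray:
    """Sketch of Eq. (2): intrinsic-space adjacency from sensor distances.

    dist: (N, N) matrix of road-network distances d(v_i, v_j).
    k:    threshold applied to the squared distance.
    """
    sigma2 = dist.std() ** 2                  # sigma^2: variance of the distances
    a_is = np.exp(-np.square(dist) / sigma2)  # Gaussian kernel weights
    a_is[np.square(dist) > k] = 0.0           # zero out pairs beyond the threshold
    return a_is
```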

Previous studies [44, 45] often used multi-layer GCNs to model long-distance spatial dependencies, but these models only consider the information of directly connected neighbors at each layer. Additionally, stacking multiple GCN layers can cause nodes in locally connected subgraphs to have overly similar representations, reducing predictive performance. In this paper, we update node representations using a single-layer GCN and introduce multi-hop neighbors in this layer to obtain richer road network topological information, effectively alleviating the over-smoothing issue. The K-hop ISGC is formulated as follows:

$$\begin{aligned} {X_{IS}}&= \sum \limits _{k = 0}^K {{H^{(k)}}} {W^{(k)}},\nonumber \\ {H^{(k)}}&= {H^{(k - 1)}}{\tilde{A}^{IS}}, \nonumber \\ {H^{(0)}}&= {X_{in}},\\ {\tilde{A}^{IS}}&= {\left( {{{\tilde{D}}^{IS}}} \right) ^{ - 1}}{A^{IS}}, \nonumber \\ \tilde{D}^{IS}_{i,i}&= \sum \limits _j {A_{i,j}^{IS}}, \nonumber \end{aligned}$$
(3)

where \(X_{i n}\) and \(X_{IS}\) represent the initial and resultant node states, respectively. \(W^{(k)} \in \mathbb {R}^{d_{i n} \times d_{\text{ out } }}\) is a learnable parameter, and K stands for the total number of hops.
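A minimal PyTorch sketch of this K-hop propagation is shown below; it assumes the standard \(\tilde{A}H\) orientation for the propagation step and batched node features, and the class name ISGC is ours.

```python
import torch
import torch.nn as nn

class ISGC(nn.Module):
    """Sketch of Eq. (3): single-layer, K-hop intrinsic-space graph convolution."""

    def __init__(self, d_in: int, d_out: int, k_hops: int):
        super().__init__()
        self.k_hops = k_hops
        # One learnable weight matrix W^(k) per hop, k = 0, ..., K
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.empty(d_in, d_out)) for _ in range(k_hops + 1)]
        )
        for w in self.weights:
            nn.init.xavier_uniform_(w)

    def forward(self, x: torch.Tensor, a_is: torch.Tensor) -> torch.Tensor:
        # x: (B, N, d_in) node features, a_is: (N, N) adjacency from Eq. (2)
        a_norm = a_is / a_is.sum(dim=1, keepdim=True).clamp(min=1e-6)  # D^-1 A
        h, out = x, 0.0
        for k in range(self.k_hops + 1):
            out = out + h @ self.weights[k]             # accumulate H^(k) W^(k)
            h = torch.einsum("ij,bjd->bid", a_norm, h)  # propagate one more hop
        return out
```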

4.1.2 Latent space graph convolution

The spatial correlations between sensors are influenced by various irregular factors, leading to spatial graph structures that are not entirely consistent under different conditions. Sole dependence on a predefined graph for spatial dependencies, without direct relevance to the prediction task, can result in notable bias. GWN generates the adjacency matrix through learnable node embeddings. First, two learnable parameters \(E_1\) and \(E_2\) are randomly initialized, where \(E_1, E_2 \in \mathbb {R}^{N \times D}\). Then, the calculation is performed as follows:

$$\begin{aligned} \hat{A}_{adp}={\text {Softmax}}\left( {\text {ReLU}}\left( E_1 E_2^T\right) \right) . \end{aligned}$$
(4)

AGCRN directly generates \({D}^{-\frac{1}{2}} {A}{D}^{-\frac{1}{2}}\) to avoid unnecessary and repetitive calculations:

$$\begin{aligned} D^{-\frac{1}{2}} A D^{-\frac{1}{2}}={\text {Softmax}}\left( {\text {ReLU}}\left( E_A E_A^T\right) \right) . \end{aligned}$$
(5)

We observe that both GWN and AGCRN overlook the multiple latent spatial dependencies that may exist between nodes. To address this, we propose modeling dynamic spatial dependencies in multiple latent spaces. Specifically, we first randomly initialize two learnable node embedding dictionaries \({{E}_{1}},{{E}_{2}}\in {{\mathbb {R}}^{N\times {{n}_{space}}\times {{d}_{k}}}}\), where \({{n}_{space}}\) is the number of latent spaces, and \({{d}_{k}}\) represents the embedding dimension of each latent space. The use of two different dictionaries allows the model to learn and represent node characteristics from two distinct perspectives, akin to observing data from different viewpoints, thus enhancing the potential to capture complex dependencies. \(E_1\) can be seen as providing an "emitting" characteristic description of the nodes, while \(E_2\) provides a corresponding "receiving" characteristic description. This design enables each node to interact with other nodes not just in a single way but in multiple latent spaces, playing different roles (i.e., emitter or receiver). Additionally, by independently optimizing \(E_1\) and \(E_2\), we can more flexibly adjust the interaction modes between nodes during training, which is advantageous for learning complex and nonlinear spatial relationships. We then infer the latent dependencies in each latent space using the following formula:

$$\begin{aligned} A_i^{LS} = {\text {Softmax}}\left( \frac{{\text {ReLU}}\left( E_{1,i}E_{2,i}^T \right) }{\sqrt{d_k}} \right) , \end{aligned}$$
(6)

where i represents the i-th latent space, and ReLU is the activation function used to filter out some of the weaker connections. Notably, the Softmax function is directly applied to achieve the normalized graph, bypassing the need to first create an adjacency matrix and then calculate the Laplacian matrix, thereby eliminating unnecessary computations. We replace \(A^{IS}\) in (3) with \(A_i^{LS}\), and then obtain \(X_i^{LS}\) through graph convolution operations. Each latent space independently learns different spatial dependencies, allowing the entire model to capture the diversity of spatial relationships. The final output \(X_{LS}\) is obtained by averaging the outputs of each latent space \(X_i^{LS}\) to model the multiple latent spatial dependencies:

$$\begin{aligned} X_{LS}=\frac{1}{n_{space}} \sum _{i=1}^{n_{space}} X_i^{LS}. \end{aligned}$$
(7)

In this process, our model integrates information from different processing streams, effectively capturing multi-level and multi-granularity spatial dependencies in the data. By integrating outputs from various latent spaces, the model enhances its comprehensive understanding of spatial domain knowledge, enabling it to interpret and utilize this information at multiple levels.
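The following PyTorch sketch illustrates how the per-space graphs of (6) can be generated and averaged as in (7); for brevity it uses a one-hop convolution with a shared linear projection rather than the full K-hop scheme of (3), and the class name LSGC is ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSGC(nn.Module):
    """Sketch of Eqs. (6)-(7): one dynamic graph per latent space, then averaging."""

    def __init__(self, num_nodes: int, n_space: int, d_k: int, d_in: int, d_out: int):
        super().__init__()
        # "Emitting" and "receiving" node-embedding dictionaries E_1, E_2
        self.e1 = nn.Parameter(torch.randn(num_nodes, n_space, d_k))
        self.e2 = nn.Parameter(torch.randn(num_nodes, n_space, d_k))
        self.proj = nn.Linear(d_in, d_out)
        self.d_k = d_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, d_in) node features
        # A_i^LS for every latent space i, shape (n_space, N, N)
        scores = torch.einsum("nsd,msd->snm", self.e1, self.e2)
        a_ls = F.softmax(F.relu(scores) / self.d_k ** 0.5, dim=-1)
        # One-hop graph convolution in each latent space, then average over spaces
        x_ls = torch.einsum("snm,bmd->bsnd", a_ls, x)   # (B, n_space, N, d_in)
        return self.proj(x_ls.mean(dim=1))              # (B, N, d_out)
```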

4.1.3 Adaptive fusion layer

ISGC focuses on the information of neighboring nodes around existing connected nodes on the road, making it localized. The learning of multiple subspaces in LSGC reveals potential connections between unknown nodes, allowing LSGC to effectively capture the dependencies between two spatially distant road nodes, making it global. To coordinate and enhance the interaction between these two types of spatial dependencies, we use an adaptive fusion layer to integrate them. The fusion method is as follows:

$$\begin{aligned} {X_s} = {W_{IS}}{X_{IS}} + {W_{LS}}{X_{LS}}, \end{aligned}$$
(8)

where \(W_{LS}\) is a learnable parameter and \(W_{IS}=1-W_{LS}\).
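A minimal sketch of this fusion is given below; squashing the learnable weight with a sigmoid so that \(W_{IS}\) and \(W_{LS}\) remain complementary in \((0,1)\) is our assumption rather than a detail stated above.

```python
import torch
import torch.nn as nn

class AdaptiveFusionLayer(nn.Module):
    """Sketch of Eq. (8): adaptive fusion of intrinsic- and latent-space outputs."""

    def __init__(self):
        super().__init__()
        # Single learnable scalar; the sigmoid keeps W_LS in (0, 1) so that
        # W_IS = 1 - W_LS (the squashing is an assumption on our part).
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x_is: torch.Tensor, x_ls: torch.Tensor) -> torch.Tensor:
        w_ls = torch.sigmoid(self.alpha)
        return (1.0 - w_ls) * x_is + w_ls * x_ls
```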

4.2 Bidirectional spatio-temporal dependency fusion

Inspired by AGCRN [9] and DGCRN [41], we design the SSGCRN, which replaces the traditional gating units of GRU with SSGC. This allows multi-granularity information from the spatial domain to be directly used to model short-term dynamics and complex interactions in the spatio-temporal domain simultaneously. This design not only enhances the accuracy of spatio-temporal data processing but also deepens the model’s understanding of node interactions at different temporal and spatial granularities. SSGCRN fuses parallel spatio-temporal dependencies across multiple spatial dimensions at each adjacent time step to effectively learn the underlying local spatio-temporal dependencies in traffic data.

It’s worth noting that relationships in traffic data are not invariably arranged sequentially. Indeed, there are intricate causal links among various traffic events. Therefore, we introduce STIM, which recursively fuses the intrinsic and latent spatio-temporal dependencies of each period through forward and backward SSGCRN. Specifically, we concatenate the input data \(X_s^{(t)}\) at the current time t with the output of the SSGCRN from the previous time step \(h_{t-1}^{(d)} \in \mathbb {R}^{N \times C^{\prime }}\) as the input information for the current time step, and then perform the following calculations:

$$\begin{aligned} z_t^{(d)}&=\sigma \left( \mathcal {G}\left( \left[ X_s^{(t)} \Vert h_{t-1}^{(d)}\right] ; \Upsilon _z^{(d)}\right) \right) ,\nonumber \\ r_t^{(d)}&= \sigma \left( \mathcal {G}\left( \left[ X_s^{(t)} \Vert h_{t - 1}^{(d)} \right] ;\Upsilon _r^{(d)} \right) \right) ,\nonumber \\ c_t^{(d)}&= \tanh \left( \mathcal {G}\left( \left[ X_s^{(t)} \Vert r_t^{(d)} \odot h_{t - 1}^{(d)} \right] ;\Upsilon _c^{(d)} \right) \right) ,\nonumber \\ h_t^{(d)}&= z_t^{(d)} \odot h_{t - 1}^{(d)} + \left( 1 - z_t^{(d)} \right) \odot c_t^{(d)}, \end{aligned}$$
(9)
Table 1 Details of datasets

where d represents the direction of STIM, with 1 indicating forward and -1 indicating backward. \([\cdot \Vert \cdot ]\) denotes concatenation along the feature dimension, \(\odot \) signifies element-wise multiplication, \(z_t^{(d)}\) and \(r_t^{(d)}\) are the outputs of the update gate and reset gate at time step t, respectively, and \(c_t^{(d)}\) is the new candidate activation state at the current time step, representing the latent new information calculated from the current input and the adjusted previous hidden state. \(\mathcal {G}\) denotes the SSGC module with learnable parameters \(\Upsilon _z^{(d)}\), \(\Upsilon _r^{(d)}\), and \(\Upsilon _c^{(d)}\). The forward and backward features of STIM are concatenated, denoted as \(\left[ h_t^{(1)} \Vert h_t^{(-1)}\right] \). Once the final step is completed, we collect the hidden features from all time steps within STIM to form a comprehensive feature \(X^H \in \mathbb {R}^{N \times T \times 2 C^{\prime }}\):

$$\begin{aligned} {X^H} = \left[ h_1^{(1)}, h_1^{(-1)}; h_2^{(1)}, h_2^{(-1)}; \cdots ; h_T^{(1)}, h_T^{(-1)} \right] . \end{aligned}$$
(10)
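A sketch of one recurrent step is given below; the gate modules are assumed to be SSGC blocks mapping the concatenated input to the hidden dimension, and the class name SSGCRNCell is ours. Running one such cell forward over \(t = 1, \ldots, T\) and a second cell backward, then concatenating the two hidden states at each step, produces \(X^H\) as in (10).

```python
import torch
import torch.nn as nn

class SSGCRNCell(nn.Module):
    """Sketch of Eq. (9): a GRU-style cell whose gates are SSGC blocks.

    ssgc_z, ssgc_r, ssgc_c: modules mapping (B, N, d_in + d_hidden) features
    to (B, N, d_hidden), e.g. an SSGC with its adjacency matrices baked in.
    """

    def __init__(self, ssgc_z: nn.Module, ssgc_r: nn.Module, ssgc_c: nn.Module):
        super().__init__()
        self.ssgc_z, self.ssgc_r, self.ssgc_c = ssgc_z, ssgc_r, ssgc_c

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x_t, h_prev], dim=-1)             # [X_s^(t) || h_{t-1}]
        z = torch.sigmoid(self.ssgc_z(xh))                # update gate
        r = torch.sigmoid(self.ssgc_r(xh))                # reset gate
        c = torch.tanh(self.ssgc_c(torch.cat([x_t, r * h_prev], dim=-1)))
        return z * h_prev + (1.0 - z) * c                 # new hidden state
```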

4.3 Global spatio-temporal dependency fusion

In STIM, the output at a given time step is influenced by both the current input and the hidden state from the previous time step. For long sequences, especially those that require understanding interdependent features, STIM needs to gather sufficient information over multiple time steps to connect events that are temporally distant. The self-attention mechanism, especially as applied in Transformers, has proven to be highly effective in learning the long-distance interdependencies of time series data, as it can directly capture the relationships between any two positions in the sequence. Therefore, we adopt GTFM after STIM to deeply integrate hidden features from all time steps in STIM. GTFM mainly consists of temporal multi-head self-attention and fully connected layers. To enhance the representational power of node features, we introduce residual connections into the network. Specifically, it can be represented as follows:

$$\begin{aligned} {Q_{{v_i}}} = X_{{v_i}}^H{W_q},\nonumber \\ {K_{{v_i}}} = X_{{v_i}}^H{W_k},\nonumber \\ {V_{{v_i}}} = X_{{v_i}}^H{W_v}, \end{aligned}$$
(11)

where \({W_q}\), \({W_k}\), and \({W_v}\) represent learnable weight matrices. The dynamic dependency between different time steps of node \(v_i\), \(T^T \in \mathbb {R}^{T \times T}\), is computed through the dot product of \(Q_{v_i}\) and \(K_{v_i}^T\):

$$\begin{aligned} {T^T}&= {\text {Softmax}}\left( \frac{{Q_{{v_i}}}K_{{v_i}}^T}{\sqrt{{d_k}}}\right) ,\nonumber \\ GT_{v_i}&= {T^T}V_{v_i}, \\ O_{v_i}&= {\text {LN}}\left( {\text {Concat}}\left( G{T_1}, \ldots ,G{T_H}\right) {W^T} + X_{{v_i}}^H\right) ,\nonumber \end{aligned}$$
(12)

where \(\frac{1}{\sqrt{d_k}}\) is a scaling factor, the Softmax function maps the relevance of \(Q_{v_i}\) and \(K_{v_i}\) to a range of [0,1], Concat is used to concatenate attention features, LN refers to layer normalization to improve model convergence, and \(W^T\) is a learnable parameter. After all nodes are computed, we obtain the final output \(O \in \mathbb {R}^{N \times T \times 2C^{\prime }}\).
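For illustration, the sketch below applies temporal multi-head self-attention independently to each node’s hidden sequence, with a residual connection and layer normalization; the use of PyTorch’s built-in attention module and the head count are our simplifying assumptions.

```python
import torch
import torch.nn as nn

class GTFM(nn.Module):
    """Sketch of Eqs. (11)-(12): per-node temporal multi-head self-attention."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        # d_model corresponds to 2C' and must be divisible by n_heads
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x_h: torch.Tensor) -> torch.Tensor:
        # x_h: (B, N, T, d_model) -- hidden features X^H from STIM
        b, n, t, d = x_h.shape
        seq = x_h.reshape(b * n, t, d)      # attend over time, independently per node
        out, _ = self.attn(seq, seq, seq)   # Q = K = V = X^H_{v_i}
        out = self.norm(out + seq)          # residual connection + layer norm
        return out.reshape(b, n, t, d)
```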

4.4 Multi-step traffic prediction

To finalize multi-step prediction, a dimension-specific linear transformation is applied to the output sequence via a fully connected neural network layer. This approach is more efficient than single-step prediction methods:

$$\begin{aligned} \hat{Y} = {W_l}O + {b_l}, \end{aligned}$$
(13)

where \(W_l\) and \(b_l\) are the weight matrix and bias term, respectively, and \(\hat{Y} \in \mathbb {R}^{N \times T \times 1}\) is the final prediction result.

5 Experiment

5.1 Datasets

To evaluate the performance of SSGCRTN, we conduct extensive experiments on six public traffic datasets: PeMS03/04/07/08 [34] and PeMSD7(M)/(L) [6]. The data are collected every 30 seconds in real time by the Caltrans Performance Measurement System (PeMS) and aggregated into 5-minute intervals; the datasets are summarized in Table 1. The distribution of sensors in PeMS can be seen in Fig. 4.

Each dataset consists of two parts. The first part is a CSV file that provides distance information between sensors with connectivity relationships. It includes three attribute columns: “from”, “to”, and “cost”. The “from” and “to” columns record the IDs of the two stations, and “cost” records the corresponding distance. The second part is a data file. For the PeMS03/04/07/08 datasets, we select the recorded traffic flow data. For the PeMSD7(M)/(L) datasets, we select the recorded speed data.

Fig. 4: Visualisation of PeMS sensor positions. The PeMS03/04/07/08 datasets correspond to districts 3, 4, 7, and 8 in California, respectively. PeMSD7(M) and PeMSD7(L) correspond to medium-scale and large-scale data from district 7, respectively

5.2 Data preprocessing

For consistency with benchmarks in earlier studies [34, 35], the datasets are split into training, validation, and test sets following a 6:2:2 ratio. The 60% training set provides sufficient data to help the model learn complex patterns. Traffic data typically includes data from multiple time periods and locations, requiring a large amount of data to capture the relationships between these variables. The 20% validation set is used to adjust model parameters and prevent overfitting, while the remaining 20% test set is used to evaluate the model’s final performance, ensuring that the model generalizes well to new data.

Missing data in the dataset is handled using masking. To accelerate the model’s convergence speed, we normalize the input data using Z-Score normalization, as follows:

$$\begin{aligned} x^{\prime }=\frac{x-{\text {mean}}(x)}{{\text {std}}(x)}, \end{aligned}$$
(14)

where \({\text {mean}}(x)\) denotes the average value and \({\text {std}}(x)\) signifies the standard deviation of the training data.
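A minimal sketch of the split and normalization is shown below; the chronological 6:2:2 split and training-set statistics follow the description above, while the function name and interface are illustrative.

```python
import numpy as np

def split_and_normalize(data: np.ndarray, train_ratio=0.6, val_ratio=0.2):
    """Chronological 6:2:2 split with Z-score normalization (Eq. (14)).

    data: (T, N, C) traffic signal tensor. The mean and std are computed on the
    training portion only, then applied to all three splits.
    """
    n = data.shape[0]
    i_train, i_val = int(n * train_ratio), int(n * (train_ratio + val_ratio))
    train, val, test = data[:i_train], data[i_train:i_val], data[i_val:]
    mean, std = train.mean(), train.std()
    norm = lambda x: (x - mean) / std
    return norm(train), norm(val), norm(test), mean, std
```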

5.3 Experiment settings

This research aims to predict traffic conditions for the next hour using traffic flow data from the previous hour. To achieve this, we set the values of H and P to 12.

The experimental setup includes a computer running Windows 10 OS, featuring an Intel Xeon Gold 5320 CPU @ 2.20 GHz. The system is equipped with 200 GB RAM and a single NVIDIA RTX 3090 GPU. For the PeMS03/04/07, and PeMSD7(M)/(L) datasets, we set \({n_{space}}=4\), \({{d}_{k}}=64\). For the PeMS08 dataset, \({n_{space}} = 16\), \({{d}_{k}}=64\). All tasks are evaluated using three widely-adopted evaluation metrics: MAE, MAPE and RMSE. The definitions of these metrics are as follows:

1) MAE:

$$\begin{aligned} {\text {MAE = }}\frac{1}{N}\sum \limits _{t = 1}^N {\left| {{{\hat{y}}_t} - {y_t}} \right| }, \end{aligned}$$
(15)

2) MAPE:

$$\begin{aligned} {\text {MAPE = }}\frac{{100\% }}{N}\sum \limits _{t = 1}^N {\left| {\frac{{{{\hat{y}}_t} - {y_t}}}{{{y_t}}}} \right| }, \end{aligned}$$
(16)

3) RMSE:

$$\begin{aligned} \textrm{RMSE}=\sqrt{\frac{1}{N} \sum _{t=1}^N\left( \hat{y}_t-y_t\right) ^2}, \end{aligned}$$
(17)

where N denotes the number of samples.
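The sketch below implements these three metrics in NumPy; masking near-zero ground-truth values in MAPE is a common safeguard that we add as an assumption, since the exact handling is not specified here.

```python
import numpy as np

def mae(y_hat: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean(np.abs(y_hat - y)))

def mape(y_hat: np.ndarray, y: np.ndarray, eps: float = 1e-5) -> float:
    # Ignore near-zero ground-truth values to avoid division by zero (our assumption)
    mask = np.abs(y) > eps
    return float(100.0 * np.mean(np.abs((y_hat[mask] - y[mask]) / y[mask])))

def rmse(y_hat: np.ndarray, y: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))
```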

MAE is selected as the loss function, and the Adam optimizer is utilized for the training process. We set the number of training epochs to 150, with a batch size of 64 and a learning rate of 0.001. To prevent overfitting, an early stopping strategy is employed. The model exhibiting the smallest loss on the validation set is chosen as the final model for evaluation. The experiments are conducted five times, and the average results of the evaluation metrics are reported.
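A minimal training-loop sketch consistent with this setup is given below; the early-stopping patience, data-loader interface, and device handling are illustrative assumptions rather than details reported above.

```python
import copy
import torch

def train(model, train_loader, val_loader, epochs=150, lr=1e-3, patience=20):
    """Train with MAE loss and Adam, keeping the model with the lowest validation loss."""
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, best_state, wait = float("inf"), None, 0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            optim.zero_grad()
            loss = torch.mean(torch.abs(model(x) - y))   # MAE loss
            loss.backward()
            optim.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(torch.mean(torch.abs(model(x) - y)).item()
                           for x, y in val_loader) / len(val_loader)
        if val_loss < best_loss:
            best_loss, best_state, wait = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            wait += 1
            if wait >= patience:                         # early stopping
                break
    model.load_state_dict(best_state)
    return model
```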

Table 2 Comparison of SSGCRTN and baselines on PeMS03, PeMS04, PeMS07, and PeMS08 datasets

5.4 Baseline methods

To evaluate SSGCRTN, it is compared with sixteen baseline models.

  • DCRNN [5]: Utilizes bidirectional random walks to represent spatial correlations and employs a GRU to capture temporal dependencies.

  • STGCN [6]: Exploits spatio-temporal correlations through a sequence of spatio-temporal convolutional blocks.

  • ASTGCN [7]: Employs a spatial-temporal attention mechanism to capture hidden spatio-temporal patterns.

  • GWN [8]: Captures latent spatial dependencies using an adaptive matrix and extends the receptive field with dilated 1D convolutional layers.

  • STG2Seq [46]: Uses graph convolutions exclusively to extract spatial correlations for multi-step traffic forecasting.

  • STSGCN [34]: Employs a graph convolution module designed to concurrently identify local spatial and temporal correlations.

  • AGCRN [9]: Captures node-specific spatial and temporal correlations through a node-adaptive parameter learning module.

  • LSGCN [47]: Uses graph convolution and cosine graph attention network to extract long-term and short-term spatial dependencies.

  • STFGNN [35]: Captures hidden spatial dependencies by integrating data-driven graphs with predefined spatial graphs.

  • Z-GCNETs [37]: Introduces a time-aware zigzag topology layer in GCN to capture significant time-aware topological features in data.

  • STGODE [36]: Proposes a novel tensor form of GNN to extract distant spatio-temporal correlations.

  • TAMP-S2GCNets [38]: Models spatio-temporal data using dynamic matrix construction and temporal graph sequences.

  • DSTAGNN [39]: Generates dynamic spatio-temporal graphs through a data-driven approach, enhancing the multi-head attention mechanism to represent dynamic associations between nodes.

  • DGCRN [41]: Utilizes a generative tactic to meticulously model the complex topological structure of dynamic graphs for each time interval.

  • STGPCN [40]: Convolves various spatio-temporal graphs defined by graph accumulation operations to capture spatio-temporal relationships.

  • STC-CGCN [25]: Introduces prior knowledge such as comfort to improve prediction accuracy.

5.5 Experiment results

Table 3 Comparison of SSGCRTN and baselines on PeMSD7(M) and PeMSD7(L) datasets
Fig. 5: Three metric values for each horizon on six datasets

Fig. 6: Model parameters and the cost of training

Tables 2 and 3 respectively present the experimental results of SSGCRTN and other baseline models on traffic flow and speed datasets. We draw the following conclusions:

Early models [5,6,7, 34, 46, 47] constructed static graphs to account for intrinsic spatial dependencies, capturing only shared patterns of traffic sequences while neglecting latent spatial dependencies. To account for latent dependencies, Graph WaveNet constructs adaptive graphs, ST-CGCN dynamically generates complex adjacency matrices, and STGPCN produces multiple spatio-temporal graphs. STFGNN and STGODE introduce temporal graphs and GODE, respectively. Z-GCNETs and TAMP-S2GCNets fully consider the topological properties conditioned on time, thereby enhancing the models’ spatio-temporal awareness and outperforming other graph-based methods. However, they fail to effectively explore parallel spatio-temporal dependencies, resulting in inferior performance compared to AGCRN and DGCRN. DSTAGNN combines dynamic graphs generated from historical data to better uncover potential spatial dependencies among nodes while using self-attention mechanisms to capture long-term temporal dependencies. Consequently, it is highly competitive compared to our model.

It is worth noting that SSGCRTN shows sub-optimal performance relative to the baselines only on PeMS07 (for MAE and RMSE) and PeMSD7(L) (for MAPE). We conjecture this is because these two datasets have the second-largest and the largest numbers of traffic nodes, respectively, making it difficult for SSGC to identify useful signals. The optimal performance on all other datasets can be attributed to three main reasons: 1) our model considers intrinsic and multiple latent spatial dependencies; 2) our model captures parallel spatio-temporal dependencies and enhances the understanding of temporal context; 3) our model accounts for long-term temporal dependencies.

Figure 5 shows the performance of several models at twelve prediction horizons across six datasets under three metrics. As the prediction horizon increases, the complexity of forecasting escalates, leading to a continual rise in MAE, RMSE, and MAPE. SSGCRTN identifies both preceding and succeeding dependencies in the temporal dimension for short-term predictions, while employing node-specific temporal self-attention for long-term forecasting. Thus, SSGCRTN is suitable for both short-term and long-term forecasting tasks, demonstrating its versatility and enduring stability.

Figure 6 compares SSGCRTN with other advanced spatio-temporal prediction models in terms of parameter count and training cost. The findings indicate that SSGCRTN effectively controls the overall number of parameters. During the training phase, SSGCRTN consumes more time because SSGC extracts features from multiple subspaces, learning a broader range of spatial knowledge than other baseline methods. Despite this, its overall training cost remains acceptable, and together with its outstanding predictive performance this makes it a preferred choice for spatio-temporal prediction.

Figure 7 demonstrates the quality of predictions at different times of the day by capturing twenty-four prediction snapshots along the time axis on the test set. SSGCRTN responds more quickly to dynamic changes in traffic flow under conditions of missing data and more accurately predicts the start of traffic peaks.

Fig. 7: Predictions of our model and DGCRN in different scenarios. (a) and (c) display the model’s effectiveness in predicting during instances of missing data. (b) and (d) visualize the prediction outcomes during peak data moments

Fig. 8: Heatmap of MAE for sensor nodes containing a lot of missing data in the PeMS04 and PeMS08 datasets. Model names are indicated along the horizontal axis, and the sensor node numbers are marked on the vertical axis

Moreover, the efficacy of SSGCRTN in situations with substantial data gaps is corroborated. Specifically, the ten road nodes with the most missing data in the PeMS04 and PeMS08 test sets are chosen, and the MAE values for various models are depicted as a heatmap, as illustrated in Fig. 8. The MAE values for some nodes in PeMS08 are generally lower than those in PeMS04, indicating that a lower data loss rate correlates with lower MAE values. It is evident that SSGCRTN consistently exhibits lower MAE (lighter colors) on these road nodes, strongly demonstrating its outstanding performance even in real-world scenarios with substantial missing data.

We also use relative error rates to quantify the differences between our model and other advanced baseline models. Across all datasets, the average MAE, RMSE, and MAPE values for our model are 12.70, 21.29, and 10.07%, respectively. The corresponding values for TAMP-S2GCNets are 13.72 (108.03%), 22.92 (107.66%), and 10.80% (107.25%); for DSTAGNN, 12.99 (102.28%), 21.62 (101.55%), and 10.24% (101.68%); and for DGCRN, 13.19 (103.86%), 21.88 (102.77%), and 11.47% (113.90%), where the percentages in parentheses express each value relative to ours. Overall, the improvement of our model over these models ranges from 1.55% to 13.90%. This improvement is crucial in traffic prediction, as even slight enhancements can significantly impact the accuracy of the final prediction results.

5.6 Ablation study

To substantiate the efficacy of various components of SSGCRTN, ablation studies are performed using the PeMS04 and PeMS08 datasets. We design five variants of the SSGCRTN model as follows:

  1.

    SSGC: This model specifically adopts our proposed SSGC for traffic prediction.

  2.

    STIM: This model removes the GTFM.

  3.

    w/o ISGC: This model removes the ISGC.

  4.

    w/o LSGC: This model removes the LSGC.

  5.

    w/o Reverse SSGCRN: This model removes the reverse SSGCRN from STIM.

The results of the ablation experiments are shown in Table 4. We draw the following conclusions:

Table 4 Component analysis of the SSGCRTN on PeMS04 and PeMS08 datasets
Fig. 9: The influence of \({n_{space}}\) and \(d_k\) on the model’s performance. Bar charts are used for numerical comparisons, and line charts are used for trend analysis. (a) As \({n_{space}}\) increases from 1 to 16, the overall performance of the model first increases and then decreases, with the best performance at \({n_{space}}\) of 16. (b) With \({n_{space}}\) fixed at 16, increasing \(d_k\) from 8 to 128 shows that the model performs best at \(d_k\) of 64

Fig. 10: Heatmaps of \({{A}^{IS}}\) and \(A_{i}^{LS}\) showing the last 50 sensor nodes in each dataset. The first to sixth rows are visualized on PeMS03, PeMS04, PeMS07, PeMS08, PeMSD7(M), and PeMSD7(L), respectively. The first column represents \({{A}^{IS}}\), and the last three columns are selected from the three specific spaces in \(A_{i}^{LS}\)

It’s evident that each component is indispensable for our model. This indicates that the model extracts different types of knowledge from multiple subgraphs through the causal fusion strategy, enhancing its perception of temporal and spatial dynamics and better capturing spatio-temporal features.

Compared to the other variants, SSGC exhibits the worst predictive performance, suggesting that focusing solely on the spatial correlations of traffic flow data, without deeply modeling the dynamic spatio-temporal associations between road network nodes over time, is insufficient for accurate traffic flow prediction. The fact that the w/o LSGC variant shows a larger error than w/o ISGC indicates that the dynamic features obtained from multiple latent spaces are more crucial than fixed spatial topological features. \(A_i^{LS}\), which is trained alongside the model, learns latent dependencies directly related to downstream tasks. The ISGC component also enhances predictive performance, suggesting that it learns the positional relationships between sensor nodes during training, capturing the intrinsic spatial dependencies between them. Furthermore, the unidirectional SSGCRN’s predictive performance falls short of STIM, indicating that the causal fusion strategy enables the model to learn and integrate spatio-temporal relationships in parallel at each time step; an in-depth study of the causal relationships in traffic events can achieve better predictions. Lastly, after removing GTFM, all three evaluation metrics increase significantly, indicating that GTFM greatly enhances the model’s ability to capture global spatio-temporal dependencies.

5.7 Hyperparameter effects

Since there may be multiple hidden spatial dependencies between nodes, our work introduces a multi-head mechanism, aiming for \(E_1\) and \(E_2\) to simulate dependencies in the latent space. Additionally, the node embedding dimension is an important parameter in the LSGC component, affecting the quality of the embeddings and determining whether SSGCRTN can effectively capture the diversity of spatial relationships. Figure 9 shows the prediction results of our model on PeMS08 with different hyperparameters. When we adjust one parameter, other parameters are set to their optimal defaults. The study finds that increasing \(n_{space}\) can lead to performance improvement. When \(n_{space}= 16\), the model performs best. This indicates that as \(n_{space}\) increases, the model acquires more latent spatial dependency information, validating the effectiveness of multiple latent spaces. SSGCRTN performs best when the embedding dimension is 64; both smaller and larger node embedding dimensions reduce performance. This may be because when the embedding dimension is small, the node embedding module can only contain relatively limited information, and when the embedding dimension is too large, the number of module parameters increases sharply, making it difficult for the model to optimize. In conclusion, finding the appropriate node embedding dimension is crucial for the spatio-temporal capture capability of SSGCRTN.

5.8 Analysis of multiple graphs

We visualize \({{A}^{IS}}\) and \(A_i^{LS}\) as heatmaps on six datasets, as shown in Fig. 10. \({{A}^{IS}}\) reflects the proximity of nodes in the intrinsic space, with its values determined by the actual distances between sensors – the closer the distance, the higher the value. \(A_i^{LS}\) reflects node interaction and similarity in the latent space, continuously adjusting as the model trains. In the \(A_i^{LS}\) heatmap, some rows, such as those within the red border, have higher values, indicating that the current node has a broad influence and affects most nodes. The dynamic graphs generated in multiple latent spaces fully utilize semantic associations between road nodes, better capturing latent traffic flow information. This pattern is found in the heatmaps in Fig. 10 (b)-(d), (f)-(h), (j)-(l), (n)-(p), (r)-(t), and (v)-(x). As shown in Fig. 10 (a), (e), (i), (m), (q), and (u), \({{A}^{IS}}\) heatmaps are relatively sparse compared to \(A_i^{LS}\), indicating lower intrinsic spatial correlation between sensor nodes in the original traffic network. \({{A}^{IS}}\) is static and cannot reflect real-time traffic conditions like peak hours. Due to the lack of specific geographical locations of sensor nodes and surrounding points of interest, the distribution characteristics of high-impact nodes in \(A_i^{LS}\) require further investigation.

6 Conclusion

This paper presents SSGCRTN, a novel approach to traffic prediction. It not only extracts intrinsic and various latent spatial dependencies through SSGC but also deeply explores the fusion of spatio-temporal correlations. Specifically, we combine SSGC with an RNN-based model to process spatio-temporal relationships in parallel at different times. SSGC not only captures existing links between road nodes but also explores latent node correlations under various factors. Since SSGC involves modeling multiple specific spaces, it may have limitations with large-scale node datasets. Additionally, SSGCRTN incorporates a temporal self-attention mechanism for each node \({{v}_{i}}\), crucial for identifying key traffic sequence features and understanding global spatio-temporal dependencies. Comprehensive experiments on six real traffic datasets show that SSGCRTN outperforms existing methods. As a generalized framework suitable for various time-series prediction tasks, we aim to adapt SSGCRTN for forecasting in areas like weather and air quality in future work.