1 Introduction

The continuous expansion of cities has accelerated urbanization, but it has also created many urban problems, and traffic is one of them. Economic development has brought a rapid increase in the number of private cars. According to relevant data, by the end of 2015 there were 279 million motor vehicles in China, of which 172 million (61.65%) were automobiles, and 327 million motor vehicle drivers, including more than 280 million automobile drivers. Compared with 207 million vehicles in 2010, national motor vehicle ownership increased by 34.78% in 5 years, a very fast growth rate [1, 2]. However, transportation infrastructure and management have been slow to develop. The mismatch between the two has led to traffic congestion and frequent traffic accidents, which seriously affect residents’ travel and the safety of life and property. In addition, traffic congestion has increased energy consumption and environmental pollution and has greatly damaged the urban ecological environment. It is therefore urgent to mitigate the impact of traffic problems on urban functions, and intelligent transportation systems (ITS) have become a key way to solve urban congestion problems [3]. An ITS [4] integrates advanced information technology, data communication technology, computer technology and other technologies with the whole transportation system to achieve harmony among people, vehicles and roads, alleviate traffic congestion and improve the capacity of the road network. According to [5], after the application of ITS in a certain area, the predicted benefits for 2015 were as follows: reducing traffic congestion by 10–50%; saving energy by 5–15%; reducing air pollution by more than 25%; reducing enterprise operating costs by 5–25%; and reducing accidents by 30–60%.

Short-term traffic state forecasting [6], a core technology of intelligent transportation systems, has been one of the most popular research topics in the field. It forecasts the future traffic situation by analyzing historical traffic data, helping traffic managers to alleviate congestion and helping travelers to plan their routes. The accuracy of traffic flow prediction is critical to the success of intelligent traffic management systems. In practice, traffic flow forecasting allows traffic managers to take control measures before congestion occurs, instead of reacting to traffic problems after congestion has already formed. However, short-term traffic flow is complex, changeable and unstable. Therefore, the core of traffic flow forecasting research is how to design a forecasting model that uses historical traffic flow data to predict future traffic flow.

There are generally two types of traffic flow forecasting methods [7]. The first category consists of traditional methods based on statistical theory, such as the auto-regressive moving average (ARMA) model and its improved variants, the Kalman filtering model and the local linear regression model. In 1984, Okutani and Stephanedes [8] introduced Kalman filtering into traffic flow prediction for the first time and found that its predictions were more accurate than those of the historical average method. In 2013, Guo et al. [9] fully considered the similar characteristics of urban traffic flow and used fuzzy logic to further optimize the Kalman filter prediction model, which was successfully applied to urban traffic flow prediction. In 2014, Guo et al. [10] conducted an in-depth study of the Kalman filter model, adopted a stochastic adaptive Kalman filter model for traffic flow prediction and achieved good results. The advantages of these models are that they are simple to calculate and fast to solve. However, they cannot reflect the uncertainty and nonlinear characteristics of traffic flow and cannot capture the complex and rapid changes hidden in it.

The second category consists of intelligent models based on knowledge discovery, including fuzzy theory models, wavelet theory models, chaos theory models and machine learning models. Machine learning prediction models include neural network, support vector machine and deep learning prediction models. The typical representative is the artificial neural network (ANN), which can approximate problems of arbitrary complexity without prior knowledge. The ANN model simulates the operation of the human nervous system and learns the volume and temporal characteristics of vehicles from historical traffic flow patterns, which suits nonlinear and dynamic traffic flow in particular. In 1998, Park et al. used an RBF neural network to forecast expressway traffic flow and, by comparison with other methods, concluded that the model performed better. However, ANN is based on the principle of empirical risk minimization. Therefore, when samples are limited, it has inherent defects; when the number of samples increases, it faces the problem of overfitting, and high accuracy is difficult to guarantee. Another limitation of the ANN model is the complexity of the model and its training, which makes a global optimum difficult to obtain. In contrast, the method based on support vector regression (SVR) can obtain the global optimum. This method converts the nonlinear regression problem into a linear regression problem through a kernel function. Therefore, SVR is also used by researchers to predict urban traffic flow. For example, in 2004, Vanajakshi et al. applied the support vector machine to traffic flow prediction and found that it outperformed the BP neural network. In 2005, Xu Qihua used the support vector machine to predict traffic flow, compared it with the artificial neural network and concluded that the support vector machine forecasts are better when the traffic flow data contain a certain proportion of noise.

In recent years, deep learning [11, 12] has received a lot of attention from researchers and industry as a new machine learning method. In 2014, Huang et al. [13, 14] used the deep belief network from deep learning to predict short-term traffic flow. The method combines a deep belief network with multitask regression and improved the prediction results by 5%. Later, Huang predicted traffic flow with both single-output and multitask-output models, and the results were better still. In 2016, Jiang Dehao used wavelet theory to reduce the noise of short-term traffic flow and then used a deep belief network (DBN) to predict traffic on a road in Birmingham, England, achieving good results. In addition, Kuremoto et al. used a belief network model based on restricted Boltzmann machines for time series to predict traffic flow. Lv et al. proposed an autoencoder-based deep network model for traffic flow prediction on a highway network. However, these studies did not consider the effect of the underlying trend in traffic data on the prediction results.

In view of the instability of the classical deep belief network–back propagation (DBN–BP) prediction model and the impact of the trend in traffic data on the prediction results, this paper combines the deep model with SVR and proposes a short-term traffic flow forecasting model based on a deep belief network and support vector regression (DBN–SVR). First, the input traffic flow data are differenced to remove the trend. Then, the DBN–SVR model is trained on the differenced data, and the resulting prediction model is used to predict the test samples.

2 An alternating prediction model based on machine learning

2.1 Traffic forecasting model

Traffic flow forecasting technology [15,16,17] is a key technology of intelligent transportation systems. It predicts the traffic volume in a certain period of time by analyzing historical traffic data. In general, the prediction period can be set to 5–30 min. Let \(X_{i}^{t}\) denote the traffic flow of the ith road at the tth time, with i = 1, 2,…, m and t = 1, 2,…, T. Given this traffic sequence, traffic flow prediction uses the previous traffic flow sequence to predict the flow of a certain road in the time period T + Δt, where Δt can be adjusted. The general prediction mode is shown in Fig. 1.

Fig. 1 Traffic flow prediction mode

As shown in Fig. 1, the traffic flow prediction model is first trained according to the analysis of traffic flow characteristics. The final prediction model is then obtained by optimizing the model parameters. Finally, the test samples are input into the optimized prediction model to obtain the predicted values. Traffic flow forecasting [18, 19] therefore generally includes two steps: feature learning and model learning. Feature learning is unsupervised: it extracts patterns from unlabeled training samples. It first obtains a feature model h from the training samples, which represent the historical traffic time series. After feature training, the traffic flow sequence X can be transformed by h into feature-space data Y, that is, \(h(X) \to Y\). Model learning is supervised: it obtains the final prediction model by optimizing the model parameters. Specifically, a prediction model \(Z_{n + 1} = g(Y)\) is learned from the paired set \(\left\{ {(Y_{1} ,Z_{1} ),(Y_{2} ,Z_{2} ), \ldots (Y_{n} ,Z_{n} )} \right\}\) of features Y and target values Z. By minimizing the target loss function L, a suitable weight parameter W of the prediction model is obtained:

$$L(Z;W) = ||Z - g(Y)||^{2}$$
(1)
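As a minimal illustration of Eq. (1), the sketch below assumes a one-feature linear predictor g(y) = w·y + b and fits it by gradient descent on the squared loss. The linear form, the learning rate, the number of epochs and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
def squared_loss(targets, predictions):
    """L(Z; W) = ||Z - g(Y)||^2, summed over all samples (Eq. 1)."""
    return sum((z - p) ** 2 for z, p in zip(targets, predictions))


def fit_linear(features, targets, lr=0.01, epochs=5000):
    """Fit g(y) = w * y + b by gradient descent.

    The gradient of the mean squared loss is used; it has the same
    minimizer as the summed loss in Eq. (1).
    """
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        errors = [w * y + b - z for y, z in zip(features, targets)]
        grad_w = 2.0 * sum(e * y for e, y in zip(errors, features)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from z = 2 * y + 1 (hypothetical, noise-free).
Y = [0.0, 1.0, 2.0, 3.0, 4.0]
Z = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear(Y, Z)
loss = squared_loss(Z, [w * y + b for y in Y])
```

In the paper's setting, Y would be the learned feature vectors and g a nonlinear predictor; only the role of the loss is the same.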

At present, traffic flow forecasting models can be roughly divided into six kinds of models [20, 21]: model based on statistical methods, dynamic traffic assignment model, traffic simulation model, nonparametric regression model, neural network model and model based on chaos theory. Although the prediction models differ from each other, the target loss function is basically the same.

1. Model based on statistical method

This model processes historical traffic data by means of mathematical statistics. It assumes that future data will have the same characteristics as past data and therefore uses historical data to make predictions. The theory of statistical models is simple and easy to understand. However, most of these models are linear, so their prediction performance deteriorates when traffic flow varies strongly.

2. Dynamic traffic assignment model

The model estimates the state of the network over time by collecting traffic flow data and travelers’ route choice behavior. Such methods have clear objectives and a clear theoretical basis, but the required information is difficult and expensive to obtain. Some models cannot be solved or are difficult to solve, and optimization takes a long time, so these models are only suitable for small- and medium-scale networks.

3. Traffic simulation model

The model provides a practical calculation method to simulate the relationship between traffic flow, occupancy and travel time. But strictly speaking, the traffic simulation model does not have the ability of real-time detection and cannot be used for traffic flow prediction.

4. Nonparametric regression model

The model is a nonparametric modeling method for uncertain, nonlinear dynamic systems. It requires no prior knowledge, only enough historical data: it finds neighbors in the historical data that are similar to the current point and uses those neighbors to predict the value at the next moment. Therefore, especially when special events occur, its predictions are more accurate than those of parametric modeling.

5. Neural network model

Neural networks can identify complex nonlinear systems and adopt a typical “black box” learning mode, which makes them well suited to traffic flow forecasting. However, the model does not yield an explicit input/output relationship, and a lack of data in the training process leads to poor prediction results. In addition, the generalization ability of the trained network is poor, and the learning algorithm of the neural network still lacks a complete theoretical foundation.

6. Model based on chaos theory

The purpose of chaos theory research is to reveal the simple laws hidden behind seemingly random phenomena, so as to solve a large class of complex system problems by using these common laws. Traffic flow system is an open complex giant system involving people’s participation, so chaos exists in traffic.
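Of the six model families above, the nonparametric regression model (item 4) is the easiest to make concrete. The sketch below is one possible k-nearest-neighbor forecaster: it looks for past windows similar to the most recent window and averages the values that followed them. The function name, the Euclidean distance and the unweighted average are illustrative assumptions, not choices stated in the paper.

```python
def knn_forecast(history, window, k):
    """Predict the next value as the mean successor of the k nearest past windows."""
    if len(history) <= window:
        raise ValueError("history must be longer than the window")
    current = history[-window:]
    candidates = []
    # Every past window that has a known successor is a candidate neighbor.
    for start in range(len(history) - window):
        past = history[start:start + window]
        dist = sum((a - b) ** 2 for a, b in zip(past, current)) ** 0.5
        candidates.append((dist, history[start + window]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


# Toy periodic flow series (hypothetical values).
series = [10, 20, 30, 40] * 5
prediction = knn_forecast(series, window=3, k=2)
```

Because the method only needs stored history, it involves no training step, which matches the "no prior knowledge, only enough historical data" description above.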

Deep learning is a newer class of neural network methods. It not only eases the training problems of traditional neural networks but also achieves better experimental results. Therefore, the deep belief network (DBN) can be used to improve traffic flow prediction models.

2.2 Deep belief network (DBN)

The deep belief network [22, 23] is a probabilistic generative model. It establishes a joint distribution between observed data and labels and evaluates both P(Observation | Label) and P(Label | Observation). A DBN consists of several restricted Boltzmann machine (RBM) layers, and its typical network structure is shown in Fig. 2. Each of these networks is “restricted” to one visible layer and one hidden layer, with connections between the layers but no connections between units within a layer. Hidden units are trained to capture the higher-order correlations of the data presented in the visible layer. Each RBM is a two-layer model containing only one hidden layer, and the training output of each RBM is used as the input of the next RBM.

Fig. 2 Typical DBN network structure

2.2.1 Restricted Boltzmann machine

The restricted Boltzmann machine (RBM) [24] is a component of DBN. RBM has two layers of neurons: One is a visible layer, which is composed of visible units for input training data, and the other layer is hidden layer, which consists of hidden units and is used as feature detectors. The structure is shown in Fig. 3. The gray h1 and h2 represent the two hidden elements of the hidden layer. The blue v1, v2, v3, v4 and v5 represent the five visible elements of the visible layer.

Fig. 3 RBM model diagram

In Fig. 3, let h denote the hidden units and v the visible units, with \(\forall i,\;j,v_{i} \in \{ 0,1\} ,\;h_{j} \in \{ 0,1\}\). Let m and n denote the number of units in the hidden and visible layers, respectively. The energy function of the RBM is defined as:

$$E(v,h|\theta ) = - \sum\limits_{i = 1}^{n} {a_{i} v_{i} } - \sum\limits_{j = 1}^{m} {b_{j} h_{j} } - \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{m} {v_{i} w_{ij} h_{j} } }$$
(2)

where \(\theta = \{ w_{ij} ,a_{i} ,b_{j} \}\) are the parameters of the RBM, wij is the connection weight between visible unit i and hidden unit j, ai is the bias of visible unit i and bj is the bias of hidden unit j. Based on the energy function, the joint probability distribution of the state (v, h) is as follows:

$$p(v,h|\theta ) = \frac{{{\text{e}}^{ - E(v,h|\theta )} }}{Z(\theta )},\;Z(\theta ) = \sum\limits_{v,h} {{\text{e}}^{ - E(v,h|\theta )} }$$
(3)

where \(Z(\theta )\) is the normalization factor. The distribution \(p(v|\theta )\) of v, that is, the marginal distribution of the joint probability distribution \(p(v,h|\theta )\), is:

$$p(v|\theta ) = \frac{1}{Z(\theta )}\sum\limits_{h} {{\text{e}}^{ - E(v,h|\theta )} }$$
(4)
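For a very small RBM, Eqs. (2) to (4) can be evaluated exactly by enumerating every binary state. The sketch below does this for n = 3 visible and m = 2 hidden units; the parameter values are hypothetical, and brute-force enumeration is only feasible at toy sizes because the number of states grows as 2^(n+m).

```python
import itertools
import math


def energy(v, h, a, b, w):
    """Eq. (2): E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i w_ij h_j."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    pair_term = sum(v[i] * w[i][j] * h[j]
                    for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - pair_term


def binary_states(size):
    """All 2**size binary vectors of the given length."""
    return list(itertools.product([0, 1], repeat=size))


def partition_function(a, b, w):
    """Eq. (3): Z = sum over all (v, h) of exp(-E(v, h))."""
    return sum(math.exp(-energy(v, h, a, b, w))
               for v in binary_states(len(a)) for h in binary_states(len(b)))


def marginal_visible(v, a, b, w):
    """Eq. (4): p(v) = (1 / Z) * sum over h of exp(-E(v, h))."""
    numerator = sum(math.exp(-energy(v, h, a, b, w))
                    for h in binary_states(len(b)))
    return numerator / partition_function(a, b, w)


# Hypothetical parameters: a_i (visible biases), b_j (hidden biases), w_ij.
a = [0.1, -0.2, 0.0]
b = [0.3, -0.1]
w = [[0.5, -0.3], [0.2, 0.4], [-0.6, 0.1]]
total = sum(marginal_visible(v, a, b, w) for v in binary_states(len(a)))
```

The marginals over all visible states sum to one, which is a quick sanity check that the normalization factor is computed consistently.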

Because there are no connections within a layer, the activation states of the hidden units are independent of each other given the visible layer, and vice versa. The activation probabilities of the jth hidden unit and the ith visible unit are, respectively:

$$P(h_{j} = 1|v,\theta ) = \sigma \left( {b_{j} + \sum\limits_{i} {v_{i} w_{ij} } } \right)$$
(5)
$$P(v_{i} = 1|h,\theta ) = \sigma \left( {a_{i} + \sum\limits_{j} {w_{ij} } h_{j} } \right)$$
(6)

where \(\sigma (x) = \frac{1}{1 + \exp ( - x)}\) is the sigmoid function.
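Eqs. (5) and (6) translate directly into code. The sketch below computes both conditional activation probabilities and draws binary samples from them; the helper names, the toy parameters and the fixed random seed are assumptions for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, b, w):
    """Eq. (5): P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij)."""
    return [sigmoid(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, a, w):
    """Eq. (6): P(v_i = 1 | h) = sigmoid(a_i + sum_j w_ij h_j)."""
    return [sigmoid(a[i] + sum(w[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def sample_binary(probs, rng):
    """Draw each unit independently: 1 with probability p, else 0."""
    return [1 if rng.random() < p else 0 for p in probs]


# Hypothetical parameters for 3 visible and 2 hidden units.
a = [0.1, -0.2, 0.0]
b = [0.3, -0.1]
w = [[0.5, -0.3], [0.2, 0.4], [-0.6, 0.1]]

rng = random.Random(0)
v = [1, 0, 1]
p_h = hidden_probs(v, b, w)      # one probability per hidden unit
h = sample_binary(p_h, rng)      # one Gibbs half-step: v -> h
p_v = visible_probs(h, a, w)     # the other half-step: h -> v
```

Alternating these two half-steps is the Gibbs sampling that contrastive divergence relies on.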

To find the best weights of each RBM, a pre-training process is needed. Training seeks the probability distribution under which the training samples are most likely. The specific algorithm is the contrastive divergence algorithm proposed by Hinton. The first layer is initialized with the training data, and the corresponding hidden layer is calculated from the conditional distribution; the weights and biases of the first RBM are calculated, and its output is then fed into the second RBM. After the second RBM is trained, it is stacked on top of the first RBM, and this step is repeated until the last RBM layer. As shown in Fig. 4, the training data first initialize the first visible layer v1. The values of h1, the bias b1 and W11, the connection weights between visible layer v1 and hidden layer h1, are then calculated using formula (5), that is, P(h1 = 1|v). The result is used as the input vector of RBM 2. According to formula (6), the values of v2, the bias a1 and W21, the connection weights between visible layer v2 and hidden layer h1, are calculated. These steps are repeated until the last RBM, RBM n.

Fig. 4 RBM pre-training process
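The pre-training procedure can be sketched as follows. This is a minimal one-step contrastive divergence (CD-1) implementation with greedy layer-wise stacking, written in plain Python for self-containment. The class and function names, the learning rate, the epoch count, the weight initialization and the toy data are all assumptions; the reconstruction uses probabilities rather than sampled states, which is a common simplification and not something the paper specifies.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.a = [0.0] * n_visible  # visible biases a_i
        self.b = [0.0] * n_hidden   # hidden biases b_j
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(self.w[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_update(self, v0, lr):
        # Positive phase: hidden probabilities driven by the data.
        ph0 = self.hidden_probs(v0)
        h0 = [1 if self.rng.random() < p else 0 for p in ph0]
        # Negative phase: one Gibbs step gives a reconstruction (as probabilities).
        v1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(v1)
        for i in range(len(self.a)):
            for j in range(len(self.b)):
                self.w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.b)):
            self.b[j] += lr * (ph0[j] - ph1[j])

    def train(self, data, epochs=200, lr=0.1):
        for _ in range(epochs):
            for v in data:
                self.cd1_update(v, lr)


def pretrain_stack(data, layer_sizes, rng, epochs=200):
    """Greedy layer-wise pre-training: each RBM's hidden output feeds the next RBM."""
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        rbm.train(layer_input, epochs=epochs)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms


rng = random.Random(0)
# Toy binary patterns (hypothetical data).
data = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
stack = pretrain_stack(data, layer_sizes=[3, 2], rng=rng)
features = stack[1].hidden_probs(stack[0].hidden_probs(data[0]))
```

Real traffic flow values are not binary, so in practice the inputs would first be scaled to [0, 1] or a Gaussian visible layer would be used; the paper does not state which.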

2.2.2 DBN model training process

The classical DBN [25, 26] has an RBM–BP structure: it is composed of several RBMs and one BP network. Training the DBN can be divided into two steps.

Step 1 Each RBM layer is trained individually and without supervision; the output of each RBM layer is used as the input to the next RBM layer. After all the RBM layers are trained, we obtain a model that maps multi-level features to numeric labels. The result is shown in Fig. 5a, where y1 represents the numeric label. All the hidden units of the top level are assigned to y1 labels.

Fig. 5 Training results of RBM and DBN

Step 2 The BP network at the top of the network receives the output feature vectors of the RBMs as its input feature vectors and trains the top-level classifier in a supervised manner. Since each RBM layer can only ensure that the weights in its own layer map optimally to that layer’s feature vectors, the back propagation network also propagates the error information from top to bottom to each RBM layer and fine-tunes the whole DBN. The overall training result is shown in Fig. 5b. Input data are divided into many different classes by the DBN.

In the DBN training process, RBM pre-training is equivalent to initializing the weights of a deep BP network, which overcomes the local optima and long training time caused by randomly initializing the weights of the BP network.

2.3 Traffic flow model based on DBN–SVR

2.3.1 Architecture

The architecture of the traffic flow model based on DBN–SVR is shown in Fig. 6. It differs from the classic DBN in that its top level is an SVR. SVR seeks a linear regression equation that fits all the sample points. The optimal hyperplane it seeks is not the one that separates two classes with the widest margin, but the one that minimizes the total deviation of the sample points from the hyperplane. Therefore, SVR can reach the global optimum. As shown in Fig. 6, multiple RBM models are combined to form a DBN. The training process of the DBN model is a bottom-up, layer-by-layer feature extraction process, in which the output of each layer is a nonlinear transformation of its input features. The fast RBM learning algorithm proposed by Hinton, the contrastive divergence (CD) algorithm [27], is used to train on the data and update the parameters. After the RBM models have been trained, the SVR predictor is used to predict the traffic flow.

Fig. 6 Architecture of traffic flow model based on DBN–SVR
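The paper does not spell out the SVR formulation used at the top level. Assuming the standard form, two ingredients determine its behavior: the kernel (an RBF kernel is named in the parameter settings) and the ε-insensitive loss. The sketch below shows both, together with the kernel-expansion form of a fitted predictor. The `gamma` and `epsilon` parameters and the helper names are illustrative assumptions, and no fitting procedure is shown.

```python
import math


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(target, prediction, epsilon):
    """Zero inside the epsilon tube around the target, linear outside it."""
    return max(0.0, abs(target - prediction) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, intercept, gamma):
    """f(x) = sum_k coef_k * K(sv_k, x) + intercept for an already fitted SVR."""
    return intercept + sum(c * rbf_kernel(sv, x, gamma)
                           for sv, c in zip(support_vectors, dual_coefs))


# Hypothetical fitted model with two support vectors in a 2-D feature space.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.0]
y_hat = svr_predict([0.0, 0.0], support_vectors, dual_coefs, intercept=0.5, gamma=1.0)
```

In the DBN–SVR architecture, the inputs to the kernel would be the feature vectors produced by the top RBM layer rather than raw traffic flow values.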

2.3.2 Prediction process

The prediction process of traffic flow prediction model is as follows:

1. Time series preprocessing

Traffic flow data form a random time series with an obvious trend over continuous time. Although the SVR model handles nonlinear data well, it handles time series with an obvious trend poorly. Therefore, to avoid the influence of the trend on the accuracy of traffic flow forecasting, this paper uses the differencing method [28] to eliminate the trend of the traffic flow data. Let the original traffic flow time series be x(t) and the differenced time series be xd(t). The processing is as follows:

$$x^{d} (t) = x(t) - x(t - d)$$
(7)

where d is the delay of the time series; xd(t) is used as a new input sample to predict the value of xd(t + 1). If the predicted value of the differenced series at the next time step is \(\hat{x}^{d} (t + 1)\), the final prediction for the original data is obtained by adding back the value x(t - d + 1), which is already observed when d ≥ 1:

$$\hat{x}(t + 1) = \hat{x}^{d} (t + 1) + x(t - d + 1)$$
(8)

2. DBN–SVR traffic flow prediction

Assuming that the sampling interval of traffic flow is t, the number of observed data is N, and the number of observed sections is p, the traffic flow data of all sections in the road section can be represented as a traffic data set:

$$A = \{ x_{1} ,x_{2} , \ldots ,x_{p} \}$$
(9)
$$x_{i} = \{ x_{{i,t_{1} }} ,x_{{i,t_{2} }} , \ldots ,x_{{i,t_{N} }} \}$$
(10)

where \(i = 1,2, \ldots ,p,x_{i}\) indicates the traffic flow of a section at different times.

There is a certain correlation between traffic flow in different sections in time and space, so the data set of each section can be expressed as X, and X is the input data set of the prediction model.

$$X = \{ x_{1} ,x_{2} , \ldots ,x_{p} \}$$
(11)
$$x_{i} = \{ x_{i,t} ,x_{i,t - \Delta t} , \ldots ,x_{i,t - M\Delta t} \}$$
(12)

where \(i = 1,2, \ldots ,p\) and \(x_{i,t}\) represents the traffic flow of the ith section at time t. Let H denote the output vector obtained after the DBN model learns features from the input data set; then

$$H = \varphi (X_{d} )$$
(13)

where \(\varphi\) represents the deep learning model, and the flow of each section at the next moment is predicted from the current moment and the previous M moments of each section. Xd is the traffic flow data set of formula (12) after the differencing of formula (7). The predicted traffic flow of any section j at time \(t + \Delta t\) is

$$y_{d} (j,t + \Delta t) = f(H)$$
(14)

where f is the prediction model and \(y_{d} (j,t + \Delta t)\) is the predicted traffic flow of the jth section at time \(t + \Delta t\), j = 1, 2,…,p. The traffic flow prediction algorithm proceeds as follows:

(a) Input data set X;

(b) Xd is obtained by differential preprocessing of the traffic flow data set;

(c) Xd is input to the DBN network model for feature learning, and the traffic flow feature H is obtained by formula (13);

(d) H is used as the SVR input for traffic prediction.

3 Experiment

3.1 Experimental data description

The short-term traffic flow of Jinlong Road in Chongqing is predicted. The data are collected from traffic detectors provided by the traffic department and cover the 2 weeks from 9 May 2016 to 22 May 2016. The sampling interval is 5 min, so a single road section yields 4032 data points (14 days × 288 samples per day). Residents’ daily travel has a certain regularity, which directly leads to the similarity and periodicity of short-term traffic flow over time. To examine this regularity in more detail, this paper analyzes the short-term traffic flow from 9 May 2016 to 15 May 2016. The short-term traffic flow of Jinlong Road in Chongqing is plotted separately for working days and non-working days. The results are shown in Fig. 7.

Fig. 7 Trends in short-term traffic flow on working days and rest days in Jinlong Road in Chongqing

Fig. 7 shows that the overall trends of working days and rest days are different. On working days, the morning and evening peaks are mainly around 7:00 am–8:30 am and 5:00 pm–7:00 pm, the minimum flow period is 2:00–5:00 in the morning, and the peak traffic reaches about 1200. On rest days, the morning and evening peaks are mainly around 8:30 am–10:00 am and 5:30 pm–8:00 pm, the minimum flow period is 4:00–6:30 in the morning, and the peak traffic reaches about 800. Therefore, this paper uses only working day sample data for prediction. The short-term data of Chongqing City from 9 May 2016 to 13 May 2016 and from 16 May 2016 to 20 May 2016 are used as the data set. The data of the first 6 days (9 May to 13 May, 16 May) are used as the training set. The sampling interval is 5 min and the time interval is 4. Therefore, there are 5060 training samples in the training set. The data of the next 4 days (17 May to 20 May) are used as the test set, which contains 2420 test samples. All training samples and test samples are input to the model.

3.2 Model parameter setting

According to the literature, the number of layers is set to k = 3, and the time interval of the DBN model is 4. The candidate differential delays are d = {1, 2, 3, 4, 5, 6, 7, 8, 9}. For the output layer, the initial number of nodes is 4, the step size is 6 and the upper limit of N is 50, with candidate values N = {5, 11, 17, 23, 29, 35, 41, 47}. The selected number of nodes is N = 42, and the optimal delay is d = 4. The parameters of the top-level SVR are set as follows: the kernel function is RBF, the number of iterations is 15,000 and the penalty factor C is 0.011. The number of iterations in DBN model training is 60.
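The paper does not describe how the delay d and the node count N were selected from their candidate sets. One plausible procedure is an exhaustive grid search over all (d, N) pairs, sketched below. The `validation_error` callable is hypothetical: in practice it would train the model with the given settings and return an error such as MAE on held-out data. The toy error surface exists only so that the example runs.

```python
import itertools


def grid_search(delays, node_counts, validation_error):
    """Return the (d, N) pair with the smallest validation error."""
    return min(itertools.product(delays, node_counts),
               key=lambda pair: validation_error(*pair))


delays = range(1, 10)            # d = 1, 2, ..., 9
node_counts = range(5, 50, 6)    # N = 5, 11, ..., 47


def toy_error(d, n):
    """Hypothetical error surface with its minimum at d = 4, N = 41."""
    return (d - 4) ** 2 + ((n - 41) / 6) ** 2


best_d, best_n = grid_search(delays, node_counts, toy_error)
```

With 9 delays and 8 node counts the search trains 72 models, which is cheap enough to run exhaustively.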

3.3 Indicators for evaluating performance

In order to verify the feasibility and accuracy of the short-term traffic flow prediction model based on DBN, this paper mainly uses mean absolute error (MAE) and mean absolute percentage error (MAPE) to determine the quality of the model. Their definitions are as follows:

$$\begin{aligned} & {\text{MAE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {|Z_{i} - \hat{Z}_{i} |} \\ & {\text{MAPE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\frac{{|Z_{i} - \hat{Z}_{i} |}}{{Z_{i} }}} \times 100\% \\ \end{aligned}$$
(15)

where Zi is the actual traffic flow value at time i, \(\hat{Z}_{i}\) is the traffic flow predicted at time i and n is the total number of predicted values. MAE directly reflects the deviation between the true value and the predicted value and has the same unit as both. MAPE expresses the error as a percentage of the true value; it is dimensionless and reflects the relative error level and credibility of the prediction.
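Both indicators in Eq. (15) are straightforward to compute. The sketch below assumes strictly positive actual flow values, since MAPE is undefined when an actual value is zero; the function names and the toy numbers are illustrative.

```python
def mae(actual, predicted):
    """Mean absolute error, in the same unit as the traffic flow."""
    return sum(abs(z - p) for z, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs(z - p) / z for z, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


# Toy values (hypothetical): absolute errors 10, 10, 20; relative errors 10%, 5%, 5%.
actual = [100, 200, 400]
predicted = [110, 190, 380]
mae_value = mae(actual, predicted)
mape_value = mape(actual, predicted)
```

Because MAPE divides by the actual value, low-flow periods such as the early morning can dominate it even when the absolute errors there are small.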

4 Analysis and discussion of experimental results

4.1 Analysis of forecast results

In this paper, the short-term traffic flow of Jinlong Road in Chongqing has been forecasted for a day. The prediction results are shown in Fig. 8. As can be seen from Fig. 8, the predicted results of the proposed model are in good agreement with the actual traffic flow data, reflecting the basic law of traffic flow changes with time, especially for the 00:00–9:00 period traffic flow data.

Fig. 8 Comparison of predicted and actual values of short-term traffic flow in 1 day

The short-term traffic flow forecasting model based on DBN–SVR is then applied to predict 4 days of short-term traffic flow on Jinlong Road in Chongqing. The forecasting results are shown in Fig. 9. As can be seen from Fig. 9, most of the predicted and actual values for the 4 test days from May 17 to May 20 are in good agreement, and there is little difference between the predicted and actual values. To quantify the effectiveness of the model, MAE and MAPE are taken as the overall evaluation indexes. The resulting values are MAE = 9.57 and MAPE = 5.91%.

Fig. 9 Comparison of short-term traffic flow prediction results of Jinlong Road in Chongqing

4.2 Performance comparison with other models

The short-term traffic flow forecasting model based on DBN–SVR is used to predict the short-term traffic flow in Chongqing, and the fitting effect is good. In order to further verify the accuracy and effectiveness of the model, SVR and the classical DBN–BP prediction model are used to calculate and compare the results of the same example data. The specific results are shown in Table 1. Comparison of predicted and actual results of three kinds of prediction models is shown in Fig. 10.

Table 1 Comparison of various prediction models

Fig. 10 Comparison of predicted and actual results of three kinds of prediction models

The data in Table 1 show that the prediction error of the DBN–SVR model is smaller than that of the other models, so its prediction performance is the best. As can be seen from Fig. 10, the prediction results of the DBN–SVR model over these 4 days are in good agreement with the actual results. The DBN–BP prediction model ranks second, and the SVR prediction model ranks last.

5 Conclusion

The uncertainty and variability of urban road traffic bring great challenges to traffic flow prediction. To address this problem, this paper uses a short-term traffic flow forecasting model based on a deep belief network and support vector regression (DBN–SVR) to predict urban road traffic flow. The DBN model learns the characteristics of traffic flow data in the road network, and the SVR predictor then predicts the traffic flow. Comparison with the measured traffic flow data shows that the difference between the predicted and measured values is small and the prediction accuracy is high, so the model can effectively predict short-term traffic flow. However, this paper only forecasts traffic for a single road section, so the results are still limited in scope. Traffic flow prediction for more sections will be added in future work to verify the validity of the model.