1 Introduction

Short-term traffic forecasting has been one of the most important problems in real-time traffic control for three decades. The quantities to be forecast include journey time, vehicle speed, traffic flow and density. The accuracy of traffic flow forecasting directly affects the effectiveness of traffic guidance, planning and control.

Various approaches have been applied to short-term traffic flow forecasting, and they can be classified as follows: (1) time series analysis methods, including ARIMA [1], SARIMA [2], Kalman filtering models [3, 4] and so on; (2) machine learning methods, including K nearest neighbor (KNN) [5–10], kernel estimators [11], artificial neural networks (ANN) [12] and so on. Among ANN approaches, several improved variants have been proposed, such as the back-propagation neural network (BPNN) [13], radial basis function neural network [14], wavelet network [15], fuzzy neural network [16] and object-oriented neural network [17]. However, these methods are either relatively complicated to model or not accurate enough [18–22]. Therefore, this paper proposes a hybrid method that is less complicated to model and yields accurate real-time traffic flow forecasts.

Unlike ANN, the support vector regression (SVR) model can obtain the global optimal solution, and it can map a nonlinear regression problem into a linear regression problem by applying a kernel function [23]. Particle Swarm Optimization (PSO) is a population-based stochastic approach for solving continuous and discrete optimization problems; compared with the Genetic Algorithm (GA) and other heuristic algorithms, it is easy to implement and has few parameters to tune. Based on a comprehensive analysis of SVR and PSO, this paper proposes a hybrid short-term traffic flow forecasting method based on PSO and SVR. In addition, to handle traffic data that contain noise, this paper proposes a PSO-SVR forecasting method with historical momentum based on the historical similarity of traffic flow data. The contributions of this paper are summarized as follows.

(1) It proposes a short-term traffic flow forecasting method based on hybrid PSO-SVR, which uses PSO to optimize the parameters of SVR.

(2) It proposes three strategies for handling particles that fly out of the search space, in order to improve the search speed of PSO.

(3) To handle traffic data that contain noise, it proposes a PSO-SVR forecasting method with historical momentum, which exploits the similarity of historical traffic data.

The rest of this paper is organized as follows. Section 2 reviews related work on classical methods for short-term traffic flow forecasting. Section 3 details the hybrid PSO-SVR method for short-term traffic flow forecasting. Section 4 details the PSO-SVR method with historical momentum. Section 5 analyzes the results of extensive comparison experiments. Finally, Sect. 6 presents conclusions and future research.

2 Related Work

Related work on short-term traffic flow forecasting can be summarized as follows.

(1) Kalman filtering models and system identification models formulated traffic flow prediction as a multidimensional model that captures the relationship between time and location. However, such methods had difficulty dealing with observation noise in traffic flow data, and their prediction accuracy was limited. Danech-Pajouh and Aron [24] adopted a hierarchical statistical method that used a mathematical clustering technique to classify the traffic flow data and built a tuned linear regression model for each category. This modified method improved prediction accuracy, but it placed high demands on the quality of the underlying data [25, 26].

(2) The ARIMA model developed by Box [20] is one of the most popular methods in traffic flow forecasting. Kamarianakis [27] successfully applied an ARIMA model that considers both spatial and temporal factors to traffic flow forecasting. However, ARIMA has an inherent limitation: its tendency to concentrate on the mean of past observations makes it unable to capture rapid variations in traffic flow. Williams [2] applied the seasonal ARIMA (SARIMA) model to traffic flow forecasting, which accounts for the periodic difference between peak and off-peak traffic, and obtained good forecasting results. However, detecting the required outliers and estimating the parameters of the SARIMA model was time-consuming.

(3) ANN imitates the information processing of the human nervous system, and several kinds of ANN have been widely used in traffic flow forecasting. Yin [28] applied a fuzzy-neural model (FNM) to predict traffic flow in an urban street network, and experiments verified that it was more accurate than ANN. Vlahogianni [29] adopted a genetic algorithm (GA) and a multilayer structure optimization strategy to determine an appropriate neural network structure. Their experiments indicated that a simple, static neural network with a genetic optimization step, momentum and a certain number of hidden units was suitable for modeling univariate and multivariate traffic data. ANN models can approximate arbitrary functions, especially nonlinear ones, but the learned function is difficult to interpret, and it is difficult to find the global optimal solution for non-convex problems.

(4) Support Vector Machine (SVM) can effectively overcome these shortcomings of ANN. It is trained not only to minimize the empirical risk but also, following the structural risk minimization principle, to minimize an upper bound on the generalization error. Because its training problem is convex, SVM can in theory obtain the global optimum, whereas ANN training may only reach a local optimum. In addition, through a kernel function, SVM maps a nonlinear problem in the low-dimensional input space to a linear problem in a high-dimensional feature space. Hong [30, 33] applied SVR to short-term traffic flow forecasting and used simulated annealing and genetic algorithms to optimize SVR parameter selection. However, these studies did not consider traffic data that contain noise.

In conclusion, short-term traffic flow forecasting is characterized by nonlinearity, complexity and real-time requirements. However, existing methods still fall short in building adaptive models, ensuring high accuracy and providing real-time solutions. In this paper, we combine PSO with SVR and propose a hybrid PSO-SVR short-term traffic flow forecasting method, which provides accurate, real-time prediction of short-term traffic flow and reduces the influence of noise in the traffic flow data.

3 A Hybrid PSO-SVR Method for Short-term Traffic Flow Forecasting

3.1 SVR

Support Vector Machines (SVMs) are a family of algorithms that can be used to solve classification and regression problems. They were invented by Vladimir Vapnik and his colleagues and were first introduced at the Conference on Computational Learning Theory (COLT) in 1992. Their basic model is the hyperplane with the maximum margin in the feature space.

Given a training dataset \(\left\{ {\left( {x_1 ,y_1 } \right) ,\ldots ,\left( {x_n ,y_n } \right) } \right\} \), the goal of SVR is to find a function that represents the relationship between \(x\) and \(y\), so that for a new \(x\) the function yields the corresponding forecast value. This function is shown in Eq. (1):

$$\begin{aligned} f\left( x \right) =w\phi \left( x \right) +b \end{aligned}$$
(1)

where \(w\) and \(b\) are the parameters that SVR learns; together they define a linear hyperplane that fits the training dataset. \(\phi \left( x \right) \) is a nonlinear mapping that maps \(x\) into a new feature space when the relationship between \(x\) and \(y\) is nonlinear. In the new space, the relationship between \(\phi \left( x \right) \) and \(y\) is linear.

SVR seeks to minimize the empirical risk defined in Eq. (2), where \(L_{\epsilon }\) is the \({\upepsilon }\)-insensitive loss function proposed by Vapnik, defined in Eq. (3).

$$\begin{aligned} R_{emp} =\frac{1}{n}\sum \nolimits _{i=1}^n L_{\epsilon } (y_i ,f(x_i )) \end{aligned}$$
(2)
$$\begin{aligned} L_{\epsilon } \left( {y,f\left( x \right) } \right) = \left\{ {{\begin{array}{l@{\quad }l} 0,&{} \mathrm{if}\left| {y-f(x)} \right| \le {\epsilon } \\ \left| {y-f(x)} \right| -{\epsilon }, &{} \mathrm{otherwise} \\ \end{array} }} \right. \end{aligned}$$
(3)
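As a minimal sketch in Python (the function names are hypothetical, not from the paper), Eqs. (2) and (3) can be computed directly: residuals inside the \(\epsilon \)-tube cost nothing, and larger residuals are penalized linearly by the amount they exceed \(\epsilon \).

```python
def eps_insensitive_loss(y, f_x, eps):
    """Eq. (3): zero inside the eps-tube, |y - f(x)| - eps outside it."""
    residual = abs(y - f_x)
    return 0.0 if residual <= eps else residual - eps


def empirical_risk(ys, preds, eps):
    """Eq. (2): mean eps-insensitive loss over the n training samples."""
    if not ys or len(ys) != len(preds):
        raise ValueError("ys and preds must be non-empty and of equal length")
    return sum(eps_insensitive_loss(y, p, eps) for y, p in zip(ys, preds)) / len(ys)


# Residuals 0.5, 2.0 and 4.0 with eps = 1.0 give losses 0.0, 1.0 and 3.0.
print(empirical_risk([10.0, 20.0, 30.0], [10.5, 18.0, 34.0], eps=1.0))  # 1.333...
```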

SVR performs linear regression in the feature space, lowering the empirical risk measured by the \(\upepsilon \)-insensitive loss while reducing model complexity by minimizing \(\Vert w\Vert ^{2}\). This is formulated in Eq. (4), where \(C>0\) trades off model complexity against training error, and \(\xi _i \), \(\xi _i^*(i=1,\ldots ,n)\) are non-negative slack variables that measure how far \(f(x_i)\) deviates from the actual value \(y_i\) beyond \(\epsilon \).

$$\begin{aligned} \begin{array}{ll} \mathop {\min }\limits _{w,b,\xi ,\xi ^{*}} & \frac{1}{2} \Vert w\Vert ^{2}+C \sum \limits _{i=1}^n (\xi _i +\xi _i^*)\\ \text {s.t.} & w\phi \left( {x_i } \right) +b-y_i \le \epsilon +\xi _i ,\\ & y_i -w\phi \left( {x_i } \right) -b\le \epsilon +\xi _i^*,\\ & \xi _i ,\xi _i^*\ge 0,\quad i=1,\ldots ,n.\\ \end{array} \end{aligned}$$
(4)

This optimization problem can be transformed into its dual problem, whose solution is given by Eq. (5), where \(a_i^*,a_i \) are the Lagrange multipliers obtained by solving the dual problem and \(K\left( {x_i ,x_j } \right) \) is the kernel function, which equals the inner product of \(\phi \left( {x_i } \right) \) and \(\phi \left( {x_j } \right) \). Any function that satisfies Mercer’s condition [31] can be used as a kernel function.

$$\begin{aligned}&f( x )=\sum \nolimits _{i=1}^n \left( {a_i^*-a_i } \right) K\left( {x_i ,x} \right) +b\nonumber \\&{s.t.}\quad \quad 0\le a_i^*\le C,\quad 0\le a_i \le C \end{aligned}$$
(5)

The most frequently used kernel functions are the polynomial kernel, the sigmoid kernel and the radial basis function (RBF) kernel. This paper uses the RBF kernel, which is defined in Eq. (6):

$$\begin{aligned} K\left( {x,z} \right) =\exp \left( {-\frac{\Vert x-z\Vert ^{2}}{2\gamma ^{2}}} \right) \end{aligned}$$
(6)

where \(\gamma \) is the kernel width parameter. Like \(\upepsilon \) and \(C\) in Eq. (4), \(\gamma \) must be set manually, and all three parameters strongly influence the forecasting accuracy of SVR.
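The sketch below evaluates Eqs. (5) and (6) for a new input. It assumes the coefficients \(a_i^*-a_i\), the support vectors and \(b\) have already been obtained by solving the dual problem, and it writes the RBF kernel in its standard form with a negative exponent, \(\exp (-\Vert x-z\Vert ^{2}/(2\gamma ^{2}))\). The function names and numbers are illustrative only.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel with gamma as a width: exp(-||x - z||^2 / (2 * gamma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * gamma ** 2))


def svr_predict(x, support_vectors, coef, b, gamma):
    """Eq. (5): f(x) = sum_i (a_i^* - a_i) * K(x_i, x) + b, with coef[i] = a_i^* - a_i.

    support_vectors, coef and b are assumed to come from an already-solved dual problem.
    """
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coef)) + b


# Illustrative numbers only: two support vectors with coefficients 0.5 and -0.25.
print(svr_predict([1.0, 2.0], [[1.0, 2.0], [3.0, 4.0]], [0.5, -0.25], b=10.0, gamma=1.0))
# ≈ 10.4954
```

Some libraries, such as LIBSVM and scikit-learn's `SVR`, parameterize the RBF kernel as \(\exp (-\gamma ' \Vert x-z\Vert ^{2})\) instead; a width \(\gamma \) in the form above corresponds to \(\gamma ' = 1/(2\gamma ^{2})\) there, so parameter values cannot be copied between the two conventions without conversion.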

The hybrid PSO-SVR method proposed in this paper uses SVR to forecast short-term traffic flow and uses PSO to optimize the selection of the SVR parameters. The forecasting process is shown in Fig. 1, where the “Pre-Processing Unit” preprocesses the real-time sensor data or the historical data into the format that SVR requires.

Fig. 1 Short-term traffic flow forecasting based on PSO-SVR algorithm

3.2 Performance Evaluation Index

In this paper, \(\textit{RMSE}\) (root mean squared error) and \(r^{2}\) (\(\textit{Rsquared}\)) are used to evaluate the forecasting performance of the model; they are defined in Eqs. (7) and (8), respectively. Here \(n\) is the number of test samples, \(x_i (i=1,\ldots ,n)\) is a test instance, \(f\left( {x_i } \right) \) is the forecast value for that instance, and \(y_i \left( {i=1,\ldots ,n} \right) \) is the true value. A smaller \(\textit{RMSE}\) or a larger \(r^{2}\) indicates higher forecasting accuracy. Since \(r^{2}\) is a squared correlation coefficient, \(0\le r^{2}\le 1\).

$$\begin{aligned} \textit{RMSE}=\sqrt{\frac{1}{n} \sum \nolimits _{i=1}^n \left( {f( {x_i } )-y_i } \right) ^{2}} \end{aligned}$$
(7)
$$\begin{aligned} r^{2}=\frac{\left( {n \sum \nolimits _{i=1}^n f( {x_i } )y_i -\sum \nolimits _{i=1}^n f(x_i ) \sum \nolimits _{i=1}^n y_i } \right) ^{2}}{\left( {n \sum \nolimits _{i=1}^n f( {x_i } )^{2}-( \sum \nolimits _{i=1}^n f(x_i ))^{2}} \right) \times \left( {n \sum \nolimits _{i=1}^n y_i^2 -\left( { \sum \nolimits _{i=1}^n y_i} \right) ^{2}} \right) } \end{aligned}$$
(8)
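Both metrics can be sketched in a few lines of Python (hypothetical function names). Because Eq. (8) is a squared correlation, a forecast that is systematically biased can still reach \(r^{2}=1\); the usage lines show such a case, where only \(\textit{RMSE}\) exposes the error. This is one reason to read the two metrics together.

```python
import math


def rmse(preds, ys):
    """Eq. (7): square root of the mean squared forecasting error."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))


def r_squared(preds, ys):
    """Eq. (8): squared Pearson correlation between forecasts f(x_i) and true values y_i."""
    n = len(ys)
    sf, sy = sum(preds), sum(ys)
    sff = sum(p * p for p in preds)
    syy = sum(y * y for y in ys)
    sfy = sum(p * y for p, y in zip(preds, ys))
    den = (n * sff - sf ** 2) * (n * syy - sy ** 2)
    if den == 0:
        raise ValueError("r^2 is undefined when forecasts or true values are constant")
    return (n * sfy - sf * sy) ** 2 / den


ys = [1.0, 2.0, 3.0, 4.0]
biased = [2.0 * y for y in ys]   # forecasts that are systematically twice too large
print(r_squared(biased, ys))     # 1.0
print(rmse(biased, ys))          # sqrt(7.5) ≈ 2.7386
```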

3.3 Hybrid PSO-SVR Algorithm

To obtain optimal SVR parameters, this paper uses PSO to search for them, yielding the PSO-SVR model. In this model, SVR is trained with S-fold cross-validation, and \(\textit{RMSE}\) is used to evaluate its performance: the smaller \(\textit{RMSE}\) is, the better the SVR is.

In PSO, let \(\Phi \) be the search space of the particles, i.e., the value range of the vector \(\left( {\gamma ,C,\epsilon } \right) \), and let \(n\) be the swarm size. The ranges of \(\gamma ,C,\epsilon \) are set to [0, 1000], [1, 10000] and [0, 50], respectively. Each particle \(i\) has a position \(\vec {x}_i\) and a velocity \(\vec {v}_i\), where \(\vec {x}_i\) corresponds to \((\gamma ,C,\epsilon )\). The quality of a particle is measured by its fitness, which is the \(\textit{RMSE}\) of the SVR trained with the parameters at that particle’s position. At each iteration, the particles update their positions and velocities according to their own historical best positions \(\vec {b}_i\) and the global historical best position \(\vec {g}\), both of which are updated after each iteration. The update formulas for the velocity and position of each particle are given in Eqs. (9) and (10). The iteration terminates either when the maximum number of iterations is reached or when \(\textit{RMSE}\) falls below a preset threshold. The final position \(\vec {g}\) is the output and represents the desired parameter combination. The variables in this description are detailed in Table 1.

$$\begin{aligned} \vec {v}_{i}^{t+1}&= w\vec {v}_{i}^{t}+{\varphi }_{1}\vec {U}_{1}^{t}(\vec {b}_{i}-\vec {x}_{i}^{t}) +{\varphi }_{2}\vec {U}_{2}^{t}(\vec {g} -\vec {x}_{i}^{t}) \end{aligned}$$
(9)
$$\begin{aligned} \vec {x}_{i}^{t+1}&= \vec {x}_{i}^{t}+\vec {v}_{i}^{t+1} \end{aligned}$$
(10)
Table 1 Description of PSO variables in PSO-SVR
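As an illustrative one-dimensional check of Eqs. (9) and (10), consider only the \(\gamma \) component of a particle with \(x=200\), \(v=10\), \(b_i=150\), \(g=100\), \(w=0.7\), \(\varphi _1=\varphi _2=1.5\) and random draws \(U_1=0.4\), \(U_2=0.6\). These numbers are assumptions chosen for the arithmetic, not settings from Table 1 or Table 10:

$$\begin{aligned} v^{t+1}&= 0.7\times 10+1.5\times 0.4\times (150-200)+1.5\times 0.6\times (100-200)\\ &= 7-30-90=-113,\\ x^{t+1}&= 200+(-113)=87. \end{aligned}$$

Both attraction terms pull the particle toward smaller \(\gamma \), and the combined step slightly overshoots \(\vec {g}\) (87 < 100), which is part of how PSO explores the region around the best positions found so far.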

To find a suitable PSO for selecting the SVR parameters, this paper proposes three strategies for handling particles that fly out of the search space, described as follows.

The first strategy is the standard strategy. In traditional PSO, a common way to handle particles that fly out of the search space is to stop updating their fitness values; in later iterations, such particles are attracted back into the search space.

The second strategy is the chaos strategy. Exploiting the ergodicity and randomness of chaos, PSO assigns each particle that flies out of the search space a new position generated from a chaotic sequence.

The third strategy is the convergence strategy. If the \(\gamma \) component of a particle’s position vector lies outside its allowed range, it is set to the \(\gamma \) component of \(\vec {g}\) plus a small disturbance. The other components (\(C\) and \(\epsilon \)) are handled in the same way.
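The three boundary-handling strategies can be sketched as a single Python function. The function name, the `next_z` callable (a source of chaotic values in [0, 1], such as the logistic map of Eq. (12)) and the disturbance size (a uniform draw of at most 1% of the range width) are assumptions, since the text specifies only “a little disturbance”. For the standard strategy, the function simply reports that the particle’s fitness should not be updated.

```python
import random


def handle_out_of_range(pos, bounds, g, strategy, next_z=None, disturb=0.01, rng=random):
    """Apply one boundary strategy to a particle position (hypothetical helper).

    Returns (new_pos, evaluate): evaluate is False when the particle's fitness
    should not be updated in this iteration (standard strategy).
    """
    outside = [not (lo <= p <= hi) for p, (lo, hi) in zip(pos, bounds)]
    if not any(outside):
        return list(pos), True
    if strategy == "standard":
        # Keep the position; skip the fitness update until the particle flies back in.
        return list(pos), False
    new_pos = list(pos)
    for d, (lo, hi) in enumerate(bounds):
        if not outside[d]:
            continue
        if strategy == "chaos":
            # Re-seed the component at a chaotic point z in [0, 1] mapped onto [lo, hi].
            new_pos[d] = lo + next_z() * (hi - lo)
        elif strategy == "convergence":
            # Move the component to g's value plus a small disturbance, kept inside range.
            nudged = g[d] + rng.uniform(-disturb, disturb) * (hi - lo)
            new_pos[d] = min(max(nudged, lo), hi)
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return new_pos, True


bounds = [(0.0, 1000.0), (1.0, 10000.0), (0.0, 50.0)]   # ranges of (gamma, C, eps)
g = [10.0, 500.0, 1.0]
print(handle_out_of_range([1200.0, 500.0, -3.0], bounds, g, "chaos", next_z=lambda: 0.5))
# ([500.0, 500.0, 25.0], True)
```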

The chaos strategy is designed for multimodal (multiple-peak) functions: it slows down the convergence of PSO but helps it avoid falling into local optima. This paper uses the logistic map with Feigenbaum iteration to generate chaos. The logistic map is the quadratic function defined in Eq. (11), where \(\mu \in \left[ {0,4} \right] ,x\in \left[ {0,1} \right] \). When \(\mu >3.57\), the iteration generates chaos. The Feigenbaum iteration is defined in Eq. (12), where \(\mu \in \left[ {3.57,4} \right] \). The resulting chaos-generation procedure, called Feigenbaum iteration chaos, is described in Table 2.

$$\begin{aligned} f\left( x \right)&= \mu x\left( {1-x} \right) \end{aligned}$$
(11)
$$\begin{aligned} x_{n+1}&= \mu x_n (1-x_n ) \end{aligned}$$
(12)
Table 2 Feigenbaum iteration chaos algorithm
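Table 2’s exact steps are not reproduced here, so the following is a sketch under a common scheme: iterate Eq. (12) from a seed \(x_0\in (0,1)\), discard a short transient (the burn-in length is an assumption), and map each chaotic value \(z\) onto a parameter range \([lo, hi]\) as \(lo+z(hi-lo)\). Seeds that land on fixed points, such as 0.5 or 0.75 with \(\mu =4\), should be avoided.

```python
def logistic_chaos(x0, mu=4.0, n=10, burn_in=100):
    """Iterate x_{k+1} = mu * x_k * (1 - x_k) (Eq. (12)) and return n values in [0, 1]."""
    if not 3.57 <= mu <= 4.0:
        raise ValueError("mu must lie in [3.57, 4] for chaotic behaviour")
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    x = x0
    for _ in range(burn_in):          # discard the transient (assumed burn-in)
        x = mu * x * (1.0 - x)
    values = []
    for _ in range(n):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


# Map chaotic values onto the gamma range [0, 1000] to re-seed out-of-range particles.
zs = logistic_chaos(0.123, n=5)
gammas = [0.0 + z * (1000.0 - 0.0) for z in zs]
print(gammas)
```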

The convergence strategy is designed for finding the optimum of a unimodal (single-peak) function. Because a unimodal function has a unique global optimum, chaos would move particles further away from the optimal position. The convergence strategy instead moves particles closer to the global best particle, and the small disturbance helps them approach the global optimum faster.

The PSOs adopting the standard, chaos and convergence strategies are described in Table 3 and are called “standard PSO”, “chaos PSO” and “fast PSO”, respectively.

Table 3 The processing step of the hybrid PSO-SVR algorithm
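Putting Eqs. (9) and (10) and the convergence strategy together gives a compact fast-PSO loop. This is a sketch under stated assumptions, not the exact procedure of Table 3: velocities start at zero, \(U_1,U_2\) are drawn uniformly from [0, 1) per component, and the defaults for \(w,\varphi _1,\varphi _2\) and the disturbance size are placeholders rather than the Table 10 settings; only the swarm size 5 and the 150-iteration limit follow Sect. 5.3.2. In PSO-SVR, the `fitness` callable would return the S-fold cross-validated \(\textit{RMSE}\) of an SVR trained with \((\gamma ,C,\epsilon )\); here any function of a position can be plugged in.

```python
import random


def fast_pso(fitness, bounds, n_particles=5, max_iter=150, w=0.7, phi1=1.5, phi2=1.5,
             disturb=0.01, tol=None, seed=0):
    """Minimize `fitness` over the box `bounds` with PSO plus the convergence strategy.

    Returns (g, g_fit): the global best position and its fitness.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [list(x) for x in xs]                    # b_i: historical best per particle
    best_fit = [fitness(x) for x in xs]
    g_idx = min(range(n_particles), key=best_fit.__getitem__)
    g, g_fit = list(best_pos[g_idx]), best_fit[g_idx]   # global historical best

    for _ in range(max_iter):
        if tol is not None and g_fit < tol:
            break                                       # RMSE below the preset threshold
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                u1, u2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + phi1 * u1 * (best_pos[i][d] - xs[i][d])
                            + phi2 * u2 * (g[d] - xs[i][d]))     # Eq. (9)
                xs[i][d] += vs[i][d]                              # Eq. (10)
                if not lo <= xs[i][d] <= hi:
                    # Convergence strategy: reset near g with a small disturbance.
                    nudged = g[d] + rng.uniform(-disturb, disturb) * (hi - lo)
                    xs[i][d] = min(max(nudged, lo), hi)
            f = fitness(xs[i])
            if f < best_fit[i]:
                best_fit[i], best_pos[i] = f, list(xs[i])
        # Update g once per iteration, after every particle has moved.
        g_idx = min(range(n_particles), key=best_fit.__getitem__)
        g, g_fit = list(best_pos[g_idx]), best_fit[g_idx]
    return g, g_fit
```

For the search space above, one would call `fast_pso(cv_rmse, [(0, 1000), (1, 10000), (0, 50)])`, where `cv_rmse` is a hypothetical function wrapping SVR training and cross-validation. Updating \(\vec {g}\) once per iteration, rather than immediately after each particle, follows the description in Sect. 3.3.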

4 PSO-SVR Algorithm with Historical Momentum

4.1 Weekly Similarity and Holiday Similarity Analysis of Traffic Flow

By observing the traffic flow plots of individual roads, this paper finds that traffic flow exhibits weekly similarity and holiday similarity. For example, the traffic flow curve of one Sunday is very similar to those of other Sundays. To measure the similarity, this paper uses the similarity coefficient \(SC\) defined in Eq. (13), where \(SC\le 1\); the larger \(SC\) is, the more similar the data are. In Eq. (13), \(n\) is the number of days being compared, and \(D_i ,D_j (i,j=1,2,\ldots ,n)\) are the flow data on the \(i\)th and \(j\)th of the \(n\) days. \(R\left( {D_i ,D_j } \right) \) is the squared correlation coefficient of \(D_i \) and \(D_j \), defined in Eq. (14), which has the same form as \(r^{2}\) in Eq. (8). \(k\) is the number of time periods in one day and is set to 96 in this paper.

$$\begin{aligned}&\displaystyle SC=\frac{\sum \nolimits _{1\le i< j\le n} R(D_i ,D_j )}{n(n-1)/2}&\end{aligned}$$
(13)
$$\begin{aligned}&\displaystyle R\left( {D_i ,D_j } \right) =\frac{\left( {k\sum \nolimits _{l=1}^k D_{il} D_{jl} -\sum \nolimits _{l=1}^k D_{il} \sum \nolimits _{l=1}^k D_{jl} } \right) ^{2}}{\left( {k\sum \nolimits _{l=1}^k D_{il}^2 -\left( {\sum \nolimits _{l=1}^k D_{il} } \right) ^{2}} \right) \left( {k\sum \nolimits _{l=1}^k D_{jl}^2 -\left( {\sum \nolimits _{l=1}^k D_{jl} } \right) ^{2}} \right) }&\end{aligned}$$
(14)
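The sketch below computes the similarity coefficient for a list of daily flow series (\(k=96\) values each in this paper), averaging \(R\) over the \(n(n-1)/2\) unordered pairs of distinct days. The function names are hypothetical, and the toy series are made-up data; a day whose flow is constant would make \(R\) undefined.

```python
from itertools import combinations


def r_pair(d_i, d_j):
    """Eq. (14): squared correlation between two days' k-period flow series."""
    k = len(d_i)
    si, sj = sum(d_i), sum(d_j)
    sij = sum(a * b for a, b in zip(d_i, d_j))
    sii = sum(a * a for a in d_i)
    sjj = sum(b * b for b in d_j)
    return (k * sij - si * sj) ** 2 / ((k * sii - si ** 2) * (k * sjj - sj ** 2))


def similarity_coefficient(days):
    """Eq. (13): mean of R over all pairs i < j of the n compared days."""
    n = len(days)
    if n < 2:
        raise ValueError("at least two days are needed")
    return sum(r_pair(a, b) for a, b in combinations(days, 2)) / (n * (n - 1) / 2)


# Toy 4-period "days" that are linear transforms of each other: SC = 1.
print(similarity_coefficient([[1, 2, 3, 4], [2, 4, 6, 8], [10, 11, 12, 13]]))  # 1.0
```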

Tables 5 and 6 display the weekly similarity and holiday similarity of traffic flow on roads AL1770 and AL3179 in April, May and June 2013. The real data come from the Highways Agency [32]. Table 4 lists the days used to compute the similarity between the traffic flow on a given day of the week and that on the same day in other weeks. The traffic flow data of 06-05 and 27-05 are excluded from the Monday similarity and are instead used for the holiday similarity, because both days were bank holidays in England. Tables 5 and 6 show that short-term traffic flow has strong weekly and holiday similarity; notably, the \(SC\) of the traffic flow data of 06-05 and 27-05 is 1. To check whether this result is caused by a data error, this paper also computes the \(SC\) of the traffic flow data of these two days on roads LM1, LM2, LM3 and LM4, and finds that all these \(SC\) values are 1.

Table 4 Data used to calculate similarity
Table 5 The similarity of traffic flow—AL1770
Table 6 The similarity of traffic flow—AL3179

4.2 PSO-SVR with Historical Momentum

PSO-SVR requires accurate traffic flow data for prediction. Noise in the traffic data reduces prediction accuracy because it breaks the original statistical regularities of the traffic flow data. Tables 7 and 8 display the prediction results obtained with traffic flow data to which noise was manually added, where \(\textit{RMSE}\) and \(\textit{Rsquared}\) are used to evaluate forecasting performance: a smaller \(\textit{RMSE}\) or a larger \(\textit{Rsquared}\) indicates higher accuracy. In most cases, the prediction results become worse as more noise is present in the traffic data.

Table 7 The forecasting based on PSO-SVR with data containing noise-AL1770
Table 8 The forecasting based on PSO-SVR with data containing noise-AL3179

To reduce the influence of noise, this paper presents, on the basis of PSO-SVR, the “short-term traffic flow forecasting method with historical momentum”, which exploits the similarity of historical traffic flow data. It is defined by Eq. (15), whose variables are described in Table 9. In Eq. (15), \(h\) is the historical data value that influences the forecasting process, \(p\) is the forecasting result of the original PSO-SVR algorithm, and \(\alpha \) controls how far the forecast moves toward \(h\). When \(p<h\), Eq. (15) increases the final forecasting result toward \(h\); when \(p>h\), it decreases the final result toward \(h\). Because historical traffic flow data are incorporated into the forecasting process, the new method performs well when the data contain noise, as verified by the experiments in Sect. 5.

$$\begin{aligned} O=p+\alpha \left( {h-p} \right) \end{aligned}$$
(15)
Table 9 Description of variables in Eq. (15)
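Whenever \(0\le \alpha \le 1\) (the range searched in Sect. 5.5), Eq. (15) is a convex combination of the PSO-SVR forecast \(p\) and the historical value \(h\): \(\alpha =0\) returns the plain PSO-SVR forecast and \(\alpha =1\) returns the historical value. A minimal Python sketch, with hypothetical function names:

```python
def momentum_adjust(p, h, alpha):
    """Eq. (15): O = p + alpha * (h - p), moving the PSO-SVR forecast p toward h."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return p + alpha * (h - p)


def adjust_series(preds, hist, alpha):
    """Apply Eq. (15) period by period to a day of forecasts and historical values."""
    return [momentum_adjust(p, h, alpha) for p, h in zip(preds, hist)]


print(momentum_adjust(100.0, 120.0, 0.25))   # 105.0 (p < h: pushed up)
print(momentum_adjust(140.0, 120.0, 0.5))    # 130.0 (p > h: pulled down)
```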

5 Experiments and Results Analysis

5.1 Experimental Data

The experimental data are provided by the Highways Agency [32], which manages the road network in England known as the Strategic Road Network. The data contain average journey time, speed and traffic flow information for all motorways and ’A’ roads in 15-minute intervals since April 2009. Each road (a junction-to-junction link) in the network managed by the Highways Agency is identified by a “Linkref”. This paper randomly selects the traffic flow data of two roads in April, May and June 2013, whose Linkrefs are AL1770 and AL3179.

5.2 Experimental Environment

Experiments are conducted on Ubuntu Server 12.04 with an Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz and 2GB RAM. In these experiments, the input vector length of SVR and BPNN is set to 4. To prevent forecasting errors from propagating forward, we adopt static forecasting: the forecast of the traffic flow at \(t+1\) uses the real value at \(t\) instead of the forecast value at \(t\).
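The input construction and static forecasting described above can be sketched as follows. `model` is a hypothetical stand-in for the trained SVR or BPNN (any callable mapping the 4 previous observed flows to a forecast); the mean predictor and the flow values are made up for illustration.

```python
def make_windows(series, length=4):
    """Pair each target series[t] with the `length` observed values before it."""
    return [(series[t - length:t], series[t]) for t in range(length, len(series))]


def static_forecast(series, model, length=4):
    """Static one-step-ahead forecasting: every input window contains observed values
    only, so an error in the forecast for t is never fed into the forecast for t + 1."""
    return [model(window) for window, _ in make_windows(series, length)]


def mean_model(window):
    """Hypothetical stand-in for the trained SVR: predicts the window mean."""
    return sum(window) / len(window)


flows = [100, 110, 120, 130, 140, 150]
print(make_windows(flows)[0])               # ([100, 110, 120, 130], 140)
print(static_forecast(flows, mean_model))   # [115.0, 125.0]
```

A recursive (dynamic) forecaster would instead append each forecast to the window used for the next step, which is exactly the error propagation that static forecasting avoids.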

5.3 Performance Analysis of the Hybrid PSO-SVR Method

5.3.1 Analysis of the Particle Number and the Iteration Number

The aim of this experiment is to find the minimum number of particles and the minimum number of iterations at which PSO converges when searching for SVR parameters. In the following, standard PSO, chaos PSO and fast PSO are abbreviated as PSO, cPSO and fPSO, respectively.

The traffic flow data of road AL1770 on 01-05, 15-05 and 31-05 are used as training datasets. The experiment follows the algorithm process described in Table 3, and each training dataset is tested 10 times. The parameters of this experiment are set as in Table 10, where \(w,\varphi _1 ,\varphi _2 \) are described in Eqs. (9) and (10), and \(\gamma ,C,\epsilon \) are described in Sect. 3.1. The results are shown in Tables 11, 12 and 13, where \(E{\text {-}}\textit{RMSE}\) and \(E{\text {-}}\textit{Rsquared}\) are the average \(\textit{RMSE}\) and \(\textit{Rsquared}\) values of the 10 SVR training runs for each training dataset, and \(E{\text {-}}iters\) is the average number of PSO iterations in these 10 runs.

Table 10 The experimental parameters’ settings
Table 11 The result of SVR parameters selecting by PSO- with data on 01-05-2013
Table 12 The result of SVR parameters selecting by PSO- with data on 15-05-2013
Table 13 The result of SVR parameters selecting by PSO- with data on 31-05-2013

It can be seen from Tables 11, 12 and 13 that:

(1) The performance of fPSO with 5 particles is close to that with 100 particles, so fPSO can converge to a stable result with only 5 particles.

(2) The performance of standard PSO with 5 particles differs greatly from that with 100 particles, so it cannot converge to a stable result with 5 particles. The same holds for cPSO.

(3) According to the SVR training results, the three PSOs perform similarly when the number of particles is 100.

(4) cPSO requires the fewest iterations, followed by fPSO.

5.3.2 Analysis of the Algorithm’s Running Time

To find a suitable PSO for SVR, we conduct another experiment comparing the running times of the three PSOs. The parameters and training data are the same as in the above experiment, except that the maximum number of iterations of fPSO is set to 150, because fPSO converges at about the 150th iteration. In addition, the swarm size of fPSO is 5, while that of the other two PSOs is 10, because standard PSO and cPSO can converge to a steady result with 10 particles. One experiment is carried out for each training dataset. The results are displayed in Table 14.

Table 14 The comparison of three PSOs

Table 14 shows that fPSO performs close to standard PSO but needs less time, while cPSO performs the worst. In summary, we draw the following conclusions.

(1) With enough particles, the three PSOs perform similarly.

(2) fPSO needs the fewest particles to converge to a steady result, so it spends less time on the SVR parameter search. Therefore, fPSO is more suitable for the parameter search of SVR.

5.4 Comparison Experiments

To test the performance of PSO-SVR in short-term traffic flow forecasting, we compare it with ARIMA and BPNN (Back Propagation Neural Network). This experiment uses the traffic flow data of road AL1770 on 7 randomly chosen days of May 2013 as the static forecasting target. The training and test datasets are the traffic flow data of road AL1770 over the 30 days of April 2013. The models are configured as follows.

(1) PSO-SVR: The length of input vector is 4, and the parameter combination \((\gamma ,C,\epsilon )\) of SVR, which is selected by PSO, is (0.000214847057443, 833.08646366, 0.00000524635605413).

(2) BPNN: The length of input vector is set to 4. The network has four layers: an input layer with 4 nodes, a first hidden layer with 2 nodes, a second hidden layer with 4 nodes and an output layer with 1 node. It is trained for 10,000 iterations.

(3) ARIMA: It is implemented in EViews, and the model is ARIMA(1,0,1).

The experimental results are displayed in Table 15, which shows that ARIMA gives the worst forecasting result and PSO-SVR gives the best.

Table 15 The comparison of PSO-SVR, ARIMA, SVR and BPNN

According to Table 15, PSO-SVR performs better than the other typical forecasting methods (SVR, ARIMA and BPNN). To further verify the effectiveness of the proposed method, PSO-SVR is compared with other state-of-the-art methods, namely GA-SA (Hybrid Genetic Algorithm-Simulated Annealing Algorithm) [33], KNN-LWNN (K Nearest Neighbor based on Linear Wavelet Neural Network) [8] and SSA (Singular Spectrum Analysis) [34]. These three algorithms are representative of different hybrid intelligent approaches. Since the corresponding references detail the algorithm steps, we implemented these algorithms ourselves for the comparison. These experiments use the traffic flow data of road AL1770 on 7 randomly chosen days of May 2013 as the static forecasting target, and the training and test datasets are the traffic flow data of road AL1770 over the 30 days of April 2013. The forecasting results are shown in Table 16: the proposed PSO-SVR achieves higher forecasting accuracy than the other state-of-the-art methods.

Table 16 The comparison of PSO-SVR with other state-of-the-art methods

5.5 Performance Analysis of the PSO-SVR Method with Historical Momentum

The aim of this experiment is to test the “short-term traffic flow forecasting method with historical momentum” with the traffic flow data containing noises. The experiment data are the traffic flow data of road AL1770 and road AL3179 on 01-05-2013, 05-05-2013, 27-05-2013, 28-05-2013, 31-05-2013.

The process of this experiment is as follows.

(1) Manually add 5 noise points to each experimental data series.

(2) Use PSO-SVR to forecast the traffic flow of roads AL1770 and AL3179 on 01-05-2013, 05-05-2013, 27-05-2013, 28-05-2013 and 31-05-2013.

(3) Use Eq. (15) to adjust the results of step (2). To find the best \(\alpha \), we vary \(\alpha \) from 0 to 1 in steps of 0.01 and choose the value that gives the best final forecasting result. The traffic flow data of 24-04-2013, 28-04-2013, 06-05-2013, 21-05-2013 and 24-05-2013 are used as the historical data for 01-05-2013, 05-05-2013, 27-05-2013, 28-05-2013 and 31-05-2013, respectively.
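Step (3) can be sketched as a grid search, assuming that the best \(\alpha \) is the one minimizing the \(\textit{RMSE}\) of Eq. (7) between the adjusted forecasts and the true flows. Here `preds` would be the PSO-SVR forecasts for a day, `hist` the flows of the matching historical day (e.g. 24-04-2013 for 01-05-2013), and `truth` the observed flows; the function name and toy numbers are hypothetical. \(\alpha \) is generated from integer steps to avoid accumulating floating-point error.

```python
import math


def best_alpha(preds, hist, truth, step=0.01):
    """Scan alpha = 0, step, ..., 1 and return (alpha, rmse) minimizing the RMSE of
    O = p + alpha * (h - p) (Eq. (15)) against the true flows."""
    n_steps = round(1.0 / step)
    best = None
    for s in range(n_steps + 1):
        alpha = s / n_steps
        adjusted = [p + alpha * (h - p) for p, h in zip(preds, hist)]
        err = math.sqrt(sum((o - y) ** 2 for o, y in zip(adjusted, truth)) / len(truth))
        if best is None or err < best[1]:
            best = (alpha, err)
    return best


# Toy data: the true flows lie halfway between forecast and history, so alpha = 0.5.
print(best_alpha([100.0, 130.0], [110.0, 120.0], [105.0, 125.0]))  # (0.5, 0.0)
```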

The results of this experiment are displayed in Tables 17 and 18 and Fig. 2. In Tables 17 and 18, “prediction with momentum” denotes the result of the short-term traffic flow forecasting with historical momentum, and “PSO-SVR” denotes the result of the PSO-SVR model. In Fig. 2, the Y-axis represents the traffic flow and the X-axis represents the 96 time periods of a day.

Fig. 2 The result of PSO-SVR with historical momentum: a AL1770(01-05-2013); b AL1770(05-05-2013); c AL1770(27-05-2013); d AL1770(28-05-2013); e AL1770(31-05-2013); f AL3179(01-05-2013); g AL3179(05-05-2013); h AL3179(27-05-2013); i AL3179(28-05-2013); j AL3179(31-05-2013). The Y-axis unit is vehicles, and the X-axis unit is a quarter hour (15 minutes).

Table 17 The result of PSO-SVR with historical momentum- AL1770
Table 18 The result of PSO-SVR with historical momentum- AL3179

From the experimental results, we draw the following findings.

(1) The PSO-SVR algorithm with historical momentum improves the static forecasting result when the data contain noise.

(2) The optimal \(\alpha \) in this experiment is relatively high, which is due to the strong historical similarity of the traffic flow data used in this experiment.

6 Conclusions

To ensure accurate and real-time prediction of short-term traffic flow, this paper proposes a hybrid PSO-SVR forecasting method. The method uses PSO to optimize the parameter settings of SVR and puts forward different strategies for handling particles that fly out of the search space, to adapt to differences in the distribution of the traffic data being forecast. In addition, a PSO-SVR forecasting method with historical momentum is proposed to reduce the influence of noise in the traffic flow data. To verify the validity of the proposed methods, we carry out a large number of performance and comparison experiments. In summary, we draw the following conclusions.

(1) Comparisons with ARIMA and BPNN confirm that the proposed hybrid PSO-SVR method performs well in short-term traffic flow forecasting.

(2) To find a suitable PSO for SVR, this paper proposes three strategies for handling particles that fly out of the search space. One of them, used in what we call fast PSO, is shown to make PSO converge with fewer particles and to reduce the time spent on the SVR parameter search.

(3) To address the problem that noise in traffic flow data reduces the prediction accuracy of PSO-SVR, this paper proposes the “short-term traffic flow forecasting with historical momentum” based on the similarity of historical traffic flow data. The experimental results indicate that this method produces good forecasting results when the traffic flow data contain noise.

Further research is still needed in the following two aspects.

(1) The similarity of historical traffic flow data should be used not only to select the training dataset but also to quantify the historical influence factor \(\alpha \).

(2) Our forecasting research shows that the traffic flow data of different roads in different time periods have distinct characteristics. For example, the traffic flow of the same road differs among weekends, workdays and holidays, and even in the same time period the traffic flow of different roads differs. Therefore, future work should further study the characteristics of traffic flow data in combination with road-specific factors, including the influence of road structure, geographical position and people’s daily life on traffic flow.