1 Introduction

The effective capacity of inter-urban motorway networks is an essential component of traffic control and information systems, particularly during daily peak periods. Since even slightly inaccurate capacity predictions lead to congestion with huge social costs in terms of travel time, fuel consumption, and environmental pollution, accurate forecasting of traffic flow during peak periods has attracted considerable interest in the literature.

A wide variety of forecasting approaches have been applied to the traffic flow of inter-urban motorway networks. These approaches can be classified according to the type of data, forecast horizon, and potential end-use [1], and include Kalman state space filtering models [2–5] and system identification models [6]. However, traffic flow data take the form of spatial time series collected at specific locations at constant time intervals. The above-mentioned studies and their empirical results indicate that the problem of forecasting inter-urban motorway traffic flow is multi-dimensional, involving relationships among measurements made at different times and geographical sites. In addition, these methods have difficulty coping with observation noise and missing values during modeling. Therefore, Danech-Pajouh and Aron [7] employed a layered statistical approach: a mathematical clustering technique to group the traffic flow data, and a separately tuned linear regression model for each cluster. Their experimental results revealed that the proposed model is superior to a competing forecasting approach, the autoregressive integrated moving average (ARIMA) model. Given the multi-dimensional pattern recognition requirements, such as time intervals, geographical sites, and the relationships between dependent and independent variables, non-parametric regression models [8–10] have also been employed successfully to forecast motorway traffic flow.

Furthermore, ARIMA models, initially developed by Box and Jenkins [11], are among the most popular alternatives in traffic flow forecasting [10, 12–15]. For example, Kamarianakis and Prastacos [13] successfully employed an ARIMA model with space and time factors to forecast space–time stationary traffic flow. However, a limitation of ARIMA models is their natural tendency to concentrate on the mean values of the past series, which seems unable to capture the rapidly varying process underlying traffic flow [16]. Recently, as an extension of the ARIMA model, Williams [17] applied the seasonal ARIMA (SARIMA) model to traffic flow forecasting. The proposed model captured the peak/non-peak flow periods by seasonal differencing, and the reported results showed that it significantly outperformed the heuristic forecast generation method in terms of forecasting accuracy. However, detecting the required outliers and estimating the parameters of a SARIMA model are quite time-consuming. These findings also encourage the author to employ the SARIMA model as the benchmark model in this study.

Since, as mentioned above, the process underlying inter-urban traffic flow is too complicated to be captured by a single linear statistical algorithm, artificial neural network (ANN) models, which can approximate any degree of complexity without prior knowledge of the problem, have received much attention and have been considered as alternative traffic flow forecasting models [14, 18–23]. An ANN emulates the processing of the human neurological system to determine the relevant vehicle-count and temporal characteristics from historical traffic flow patterns, especially for nonlinear and dynamic evolutions; it is therefore widely applied in traffic flow forecasting. Recently, Yin et al. [24] developed a fuzzy-neural model (FNM) to predict traffic flow in an urban street network. The FNM contains two modules: a gate network (GN) and an expert network (EN). The GN classifies the input data using a fuzzy approach, and the EN identifies the input–output relationship using neural network approaches. The empirical results showed that the FNM provides more accurate forecasts than the BPNN model. Vlahogianni et al. [22], building on a proper representation of traffic flow data with temporal and spatial characteristics, employed a genetic algorithm-based, multilayered, structural optimization strategy to determine the appropriate neural network structure. Their results show that the capabilities of a simple static neural network, with genetically optimized step size, momentum, and number of hidden units, are very satisfactory when modeling both univariate and multivariate traffic data.
Even though ANN-based forecasting models can approximate any function, particularly nonlinear functions, they suffer from two limitations: the operations of the so-called black box are difficult to explain (for example, how to determine a suitable network structure), and minimizing the network training error is a non-convex problem, so the global optimum is hard to find.

Support vector machines (SVMs) were originally developed to solve pattern recognition and classification problems. With the introduction of Vapnik’s ε-insensitive loss function, SVMs have been extended to nonlinear regression estimation, the so-called support vector regression (SVR), and have been successfully applied to forecasting problems in many fields, such as financial time series (stock index and exchange rate) forecasting [25–29], engineering and software (production values and reliability) forecasting [30, 31], atmospheric science forecasting [32–35], electric load forecasting [36–40], and so on. Practical results indicate that forecasting accuracy suffers from the lack of knowledge in selecting the three parameters (σ, C, and ε) of an SVR model, and structured ways of determining these parameters are still lacking. Recently, several nature-inspired evolutionary algorithms have been applied to optimization problems; the immune algorithm (IA) is one of them. The IA, proposed by Mori et al. [41] and based on the learning mechanism of natural immune systems, is used in this study. Like GA, SA, and PSO, the IA is a population-based evolutionary algorithm; it therefore provides a set of solutions for exploration and exploitation of the search space to obtain an optimal or near-optimal solution [42]. In addition, the diversity of the employed population determines the search outcome: either the desired solution or premature convergence (being trapped in a local minimum). As a special mechanism to avoid such trapping, the ergodicity property of chaotic sequences has been used as an optimization technique hybridized with evolutionary algorithms. In this investigation, a chaotic immune algorithm (CIA) is applied to determine the values of the three parameters of an SVR model.
On the other hand, as mentioned above, traffic flow data not only involve a complicated nonlinear pattern but also reveal a cyclic (seasonal) trend during daily peak periods (morning/evening commuting peaks). However, applications of SVR models to time series with a cyclic (seasonal) trend have not been widely explored. Therefore, this paper also applies the seasonal adjustment method [43, 44] to deal with the seasonal trend. The resulting SSVRCIA model is applied to forecast inter-urban motorway traffic flow in Panchiao City, Taipei County, Taiwan. The rest of this paper is organized as follows. Section 2 presents the models used for comparing forecast performance and the SVR model. Section 3 introduces the proposed SSVRCIA forecasting model. Section 4 illustrates a numerical example that reveals the forecasting performance of the proposed models. Conclusions are finally made in Sect. 5.

2 Forecasting methodology

In this investigation, three models, namely the seasonal ARIMA (SARIMA), seasonal Holt–Winters (SHW), and back-propagation neural network (BPNN) models, are used to benchmark the traffic flow forecasting performance of the proposed SSVRCIA model.

2.1 Seasonal autoregressive integrated moving average (SARIMA) model

Proposed by Box and Jenkins [11], the seasonal ARIMA process has been one of the most popular approaches in time series forecasting, particularly for strong seasonal component. The SARIMA process is often referred to as the \( {\text{SARIMA}}(p,d,q) \times (P,D,Q)_{S} \) model. Similar to the ARIMA model, the forecasting values are assumed to be a linear combination of past values and past errors. A time series \( \left\{ {X_{t} } \right\} \) is a SARIMA process with seasonal period length S if d and D are nonnegative integers and if the differenced series \( W_{t} = (1 - B)^{d} (1 - B^{S} )^{D} X_{t} \) is a stationary autoregressive moving average process. In symbolic terms, the model can be written as

$$ \phi_{p} (B)\Upphi_{P} (B^{S} )W_{t} = \theta_{q} (B)\Uptheta_{Q} (B^{S} )\varepsilon_{t} ,\quad t = 1,2, \ldots ,N $$
(1)

where N is the number of observations up to time t; B is the backshift operator defined by \( B^{a} W_{t} = W_{t - a} \); \( \phi_{p} (B) = 1 - \phi_{1} B - \cdots - \phi_{p} B^{p} \) is called a regular (non-seasonal) autoregressive operator of order p; \( \Upphi_{P} (B^{S} ) = 1 - \Upphi_{1} B^{S} - \cdots - \Upphi_{P} B^{PS} \) is a seasonal autoregressive operator of order P; \( \theta_{q} (B) = 1 - \theta_{1} B - \cdots - \theta_{q} B^{q} \) is a regular moving average operator of order q; \( \Uptheta_{Q} (B^{S} ) = 1 - \Uptheta_{1} B^{S} - \cdots - \Uptheta_{Q} B^{QS} \) is a seasonal moving average operator of order Q; ε t is identically and independently distributed as normal random variables with mean zero, variance σ2 and \( {\text{cov}}(\varepsilon_{t} ,\varepsilon_{t - k} ) = 0 \), \( \forall k \ne 0 \).

In the definition above, the parameters p and q represent the autoregressive and moving average order, respectively; and the parameters P and Q represent the autoregressive and moving average order at the model’s seasonal period length, S, respectively. The parameters d and D represent the order of ordinary and seasonal differencing, respectively.

Basically, when fitting a SARIMA model to data, the first task is to estimate values of d and D, the orders of differencing needed to make the series stationary and to remove most of the seasonality. The values of p, P, q, and Q then need to be estimated by the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series. Other model parameters may be estimated by suitable iterative procedures.
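The differencing step can be sketched numerically. The snippet below is a minimal illustration (not the statistical package used in this study): it applies the seasonal difference \( (1 - B^{S})X_{t} \) with S = 5 to a synthetic period-5 series and checks that the seasonal component is removed, leaving a series much closer to stationary.

```python
import numpy as np

# Minimal sketch of the seasonal differencing W_t = (1 - B)^d (1 - B^S)^D X_t
# for d = 0, D = 1, S = 5; the series here is synthetic, not the study's data.
rng = np.random.default_rng(0)
t = np.arange(200)
x = 100 + 10 * np.sin(2 * np.pi * t / 5) + rng.normal(0, 1, 200)  # period-5 cycle

S = 5
w = x[S:] - x[:-S]  # (1 - B^S) X_t: subtracts the value one season earlier

# The strong period-5 pattern dominates x but cancels exactly in w, so the
# differenced series has far smaller variance and no visible seasonality.
```

The ACF/PACF of `w` (rather than `x`) would then guide the choice of p, P, q, and Q.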

2.2 Seasonal Holt–Winters (SHW) model

To consider the seasonal effect, the second model employed is the seasonal Holt–Winters (SHW) linear exponential smoothing approach, which extends the Holt–Winters model [45, 46]. The extension accommodates additive seasonality, in which the magnitude of the seasonal effect does not change with the level of the series, or multiplicative seasonality, in which the amplitude of the seasonal pattern changes over time; the multiplicative form is used here. The SHW forecast is as follows:

$$ s_{t} = \alpha {\frac{{a_{t} }}{{I_{t - L} }}} + (1 - \alpha )(s_{t - 1} + b_{t - 1} ) $$
(2)
$$ b_{t} = \beta (s_{t} - s_{t - 1} ) + (1 - \beta )b_{t - 1} $$
(3)
$$ I_{t} = \gamma {\frac{{a_{t} }}{{s_{t} }}} + (1 - \gamma )I_{t - L} $$
(4)
$$ f_{t} = (s_{t} + ib_{t} )I_{t - L + i} $$
(5)

where a t is the actual value at time t; s t is the smoothed estimate at time t; b t is the trend value at time t; α is the level smoothing coefficient; and β is the trend smoothing coefficient. L is the length of seasonality; I is the seasonal adjustment factor; and γ is the seasonal adjustment coefficient.

Equation (2) lets the actual value be smoothed in a recursive manner by weighting the current level (α), and then adjusts s t directly for the trend of the previous period, b t−1, by adding it to the last smoothed value, s t−1. This helps to eliminate the lag and brings s t to the approximate base of the current data value. In addition, the first term of (2) is divided by the seasonal number I t−L; this de-seasonalizes a t (eliminates seasonal fluctuations from a t). Equation (3) updates the trend, which is expressed as the difference between the last two smoothed values. It modifies the trend by smoothing the difference in the last period (s t − s t−1) with β and adding the previous estimate of the trend multiplied by (1 − β). Equation (4) is comparable to a seasonal index, found as the ratio of the current value of the series, a t, to the smoothed value of the series, s t. If a t is larger than s t, the ratio will be greater than 1; otherwise, it will be less than 1. To smooth out the randomness of a t, (4) weights the newly computed seasonal factor with γ and the most recent seasonal number corresponding to the same season with (1 − γ). Equation (5) is used to forecast ahead: the trend, b t, is multiplied by the number of periods ahead to be forecast, i, and added to the base value, s t; the sum of s t and ib t is then multiplied by the seasonal number I t−L+i. The forecast error (e t) is defined as the actual value minus the forecast (fitted) value for time period t, that is:

$$ e_{t} = a_{t} - f_{t} $$
(6)

The forecast error is assumed to be an independent random variable with zero mean and constant variance. Values of smoothing coefficients, α and β, and seasonal adjustment coefficient, γ, are determined to minimize the forecasting error.
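The recursions (2)–(5) are compact enough to sketch directly. The function below is an illustrative reconstruction, not code from the study; in particular, the initialization of the level, trend, and seasonal factors is one common convention that the paper does not specify.

```python
import numpy as np

def shw_forecast(a, L, alpha, beta, gamma, i=1):
    """Multiplicative seasonal Holt-Winters, Eqs. (2)-(5).
    a: observed series (numpy array); L: season length; i: periods ahead."""
    # Initialization (one common convention; the paper does not specify one):
    s = a[:L].mean()                               # initial level
    b = (a[L:2 * L].mean() - a[:L].mean()) / L     # initial trend
    I = list(a[:L] / s)                            # initial seasonal factors
    for t in range(L, len(a)):
        s_prev = s
        s = alpha * a[t] / I[t - L] + (1 - alpha) * (s_prev + b)  # Eq. (2)
        b = beta * (s - s_prev) + (1 - beta) * b                  # Eq. (3)
        I.append(gamma * a[t] / s + (1 - gamma) * I[t - L])       # Eq. (4)
    return (s + i * b) * I[len(a) - 1 - L + i]                    # Eq. (5)

# Demo: noise-free level-100 series with seasonal factors repeating every 5 h
pattern = np.array([0.8, 1.2, 1.0, 0.9, 1.1])
demo = 100 * np.tile(pattern, 8)
next_hour = shw_forecast(demo, L=5, alpha=0.3, beta=0.1, gamma=0.2)
```

On this noise-free series the recursion reproduces the next seasonal value (100 × 0.8 = 80) exactly, since each update is a convex combination of identical quantities.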

2.3 Back-propagation neural networks (BPNN) model

The multi-layer back-propagation neural network (BPNN) is one of the most widely used neural network models. Consider the simplest BPNN architecture including three layers: an input layer (x), an output layer (o), and a hidden layer (h). The computational procedure of this network is described below:

$$ o_{i} = f\left( {\sum\limits_{j} {g_{ij} x_{ij} } } \right) $$
(7)

where o i denotes the output of node i, f(·) represents the activation function, g ij is the connection weight between nodes i and j in the lower layer which can be replaced with v ji and w kj , and x ij denotes the input signal from the node j in the lower layer.

The BPNN algorithm attempts to improve neural network performance by reducing the total error through gradient-based weight adjustment. The BPNN algorithm minimizes the sum of squared errors, which can be calculated by:

$$ E = \frac{1}{2}\sum\limits_{p = 1}^{P} {\sum\limits_{j = 1}^{K} {\left( {d_{pj} - o_{pj} } \right)^{2} } } $$
(8)

where E denotes the sum of squared errors, K is the number of output layer neurons, P is the number of training data patterns, d pj denotes the desired (actual) output, and o pj represents the network output. The BPNN algorithm is expressed as follows. Let Δv ji denote the weight change for any hidden layer neuron and Δw kj that for any output layer neuron,

$$ \Updelta v_{ji} = - \eta {\frac{\partial E}{{\partial v_{ji} }}}\quad i = 1, \ldots ,I,\; \, j = 1, \ldots ,J - 1, $$
(9)
$$ \Updelta w_{kj} = - \eta {\frac{\partial E}{{\partial w_{kj} }}}\quad j = 1, \ldots ,J - 1,\; \, k = 1, \ldots ,K $$
(10)

where η represents the learning rate parameter, specified at the start of the training cycle, which determines the training speed and stability of the network. Notably, the Jth node is the bias neuron without weight. The signal (s j ) to each hidden layer neuron and the signal (u k ) to each neuron in the output layer are expressed as \( s_{j} = \sum\nolimits_{i = 1}^{I} {v_{ji} x_{i} } \) and \( u_{k} = \sum\nolimits_{j = 1}^{J - 1} {w_{kj} y_{j} } \), respectively.

The error signal terms for the jth hidden neuron δ yj , and for the kth output neuron δ ok are defined as \( \delta_{yj} = - {\frac{\partial E}{{\partial s_{j} }}} \) and \( \delta_{ok} = - {\frac{\partial E}{{\partial u_{k} }}} \), respectively.

Applying the chain rule, the gradients of the cost function with respect to weights v ji and w kj are \( {\frac{\partial E}{{\partial v_{ji} }}} = {\frac{\partial E}{{\partial s_{j} }}}\,{\frac{{\partial s_{j} }}{{\partial v_{ji} }}} \) and \( {\frac{\partial E}{{\partial w_{kj} }}} = {\frac{\partial E}{{\partial u_{k} }}}\,{\frac{{\partial u_{k} }}{{\partial w_{kj} }}} \), respectively. The gradients of s j and u k with respect to weights v ji and w kj are \( {\frac{{\partial s_{j} }}{{\partial v_{ji} }}} = x_{i} \) and \( {\frac{{\partial u_{k} }}{{\partial w_{kj} }}} = y_{j} \), respectively. Combining the above equations, we obtain \( {\frac{\partial E}{{\partial v_{ji} }}} = - \delta_{yj} x_{i} \) and \( {\frac{\partial E}{{\partial w_{kj} }}} = - \delta_{ok} y_{j} \). Finally, the weight changes from (9) and (10) can be written as \( \Updelta v_{ji} = - \eta {\frac{\partial E}{{\partial v_{ji} }}} = \eta \delta_{yj} x_{i} \) and \( \Updelta w_{kj} = - \eta {\frac{\partial E}{{\partial w_{kj} }}} = \eta \delta_{ok} y_{j} \), respectively. The weights, v ji and w kj, are updated as in (11) and (12),

$$ w_{kj} = w_{kj} + \Updelta w_{kj} = w_{kj} + \eta \delta_{ok} y_{j} $$
(11)
$$ v_{ji} = v_{ji} + \Updelta v_{ji} = v_{ji} + \eta f_{j}^{\prime } (u_{j} )x_{i} \sum\limits_{k = 1}^{K} {\delta_{ok} w_{kj} } $$
(12)

The most common activation functions are the squashing sigmoid function, such as the logistic and tangent hyperbolic functions.
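As a concrete illustration of Eqs. (7)–(12), the sketch below trains a minimal 3-4-1 network with logistic activations by gradient descent on a toy separable task. This is an illustrative reconstruction, not the Matlab implementation used in Sect. 4; the data, architecture, and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy task (illustrative only): output 1 when the first input is positive
X = rng.uniform(-1, 1, (50, 3))
d = (X[:, :1] > 0).astype(float)

v = rng.normal(0.0, 0.5, (3, 4))   # input-to-hidden weights v_ji
w = rng.normal(0.0, 0.5, (4, 1))   # hidden-to-output weights w_kj
eta = 0.5                          # learning rate

for _ in range(5000):
    y = sigmoid(X @ v)                        # hidden outputs f(s_j), Eq. (7)
    o = sigmoid(y @ w)                        # network outputs o_pj
    delta_o = (d - o) * o * (1 - o)           # output error terms delta_ok
    delta_y = (delta_o @ w.T) * y * (1 - y)   # hidden error terms delta_yj
    w += eta * y.T @ delta_o / len(X)         # weight update, Eq. (11)
    v += eta * X.T @ delta_y / len(X)         # weight update, Eq. (12)

E = 0.5 * np.sum((d - o) ** 2)                # sum of squared errors, Eq. (8)
```

After training, E is far below its initial value and the network separates the two classes on all but the near-boundary points.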

2.4 Support vector regression (SVR) model

The basic ideas of SVMs for the regression case are briefly introduced. A nonlinear mapping \( \varphi ( \cdot ):\Re^{n} \to \Re^{{n_{h} }} \) is defined to map the input (training) data \( \left\{ {({\mathbf{x}}_{i} ,y_{i} )} \right\}_{i = 1}^{N} \) into a so-called high-dimensional feature space (which may have infinite dimensions), \( \Re^{{n_{h} }} \). In the high-dimensional feature space, there theoretically exists a linear function, f, that formulates the nonlinear relationship between input data and output data. This linear function, namely the SVR function, is given by (13),

$$ f({\mathbf{x}}) = {\bf{w}}^{\text{T}} \varphi ({\mathbf{x}}) + b $$
(13)

where f(x) denotes the forecasting values, and the coefficients w (\( {\mathbf{w}} \in \Re^{{n_{h} }} \)) and b (\( b \in \Re \)) are adjustable. As mentioned above, the SVM method aims at minimizing the empirical risk, employing the ε-insensitive loss function to find an optimal hyperplane in the high-dimensional feature space that maximizes the distance separating the training data into two subsets. Thus, the SVR focuses on finding that optimal hyperplane and minimizing the training error with respect to the ε-insensitive loss function. The SVR therefore minimizes the overall errors,

$$ \mathop {\text{Min}}\limits_{{{\mathbf{w}},b,\xi^{*} ,\xi }} \;R_{\varepsilon } ({\mathbf{w}},\xi^{*} ,\xi ) = \frac{1}{2}{\mathbf{w}}^{\text{T}} {\mathbf{w}} + C\sum\limits_{i = 1}^{N} {(\xi_{i}^{*} + \xi_{i} )} $$
(14)

with the constraints

$$ \begin{array}{*{20}c} {{\mathbf{y}}_{i} - {\mathbf{w}}^{T} \varphi ({\mathbf{x}}_{i} ) - b \le \varepsilon + \xi_{i}^{*} ,} \hfill & {i = 1,2, \ldots ,N} \hfill \\ { - {\mathbf{y}}_{i} + {\mathbf{w}}^{T} \varphi ({\mathbf{x}}_{i} ) + b \le \varepsilon + \xi_{i} ,} \hfill & {i = 1,2, \ldots ,N} \hfill \\ {\xi_{i}^{*} \ge 0,} \hfill & {i = 1,2, \ldots ,N} \hfill \\ {\xi_{i} \ge 0,} \hfill & {i = 1,2, \ldots ,N} \hfill \\ \end{array} $$

After the quadratic optimization problem with inequality constraints is solved, the parameter vector w in (13) is obtained,

$$ {\mathbf{w}} = \sum\limits_{i = 1}^{N} {\left( {\beta_{i}^{*} - \beta_{i} } \right)\varphi ({\mathbf{x}}_{i} )} $$
(15)

where \( \beta_{i}^{*} \), β i are obtained by solving a quadratic program and are the Lagrangian multipliers. Finally, the SVR regression function is obtained as (16) in the dual space,

$$ f({\mathbf{x}}) = \sum\limits_{i = 1}^{N} {\left( {\beta_{i}^{*} - \beta_{i} } \right)K({\mathbf{x}}_{i} ,{\mathbf{x}})} + b $$
(16)

where K(x i , x j ) is called the kernel function, and the value of the Kernel equals the inner product of two vectors, x i and x j , in the feature space φ(x i ) and φ(x j ), respectively; that is, \( K({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} ) = \varphi ({\mathbf{x}}_{i} ) \circ \varphi ({\mathbf{x}}_{j} ) \). Any function that meets Mercer’s condition [47] can be used as the Kernel function.

There are several types of kernel function. The most widely used are the Gaussian RBF with a width of \( \sigma :K({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} ) = \exp \left( { - 0.5{{\left\| {{\mathbf{x}}_{i} - {\mathbf{x}}_{j} } \right\|^{2} } \mathord{\left/ {\vphantom {{\left\| {{\mathbf{x}}_{i} - {\mathbf{x}}_{j} } \right\|^{2} } {\sigma^{2} }}} \right. \kern-\nulldelimiterspace} {\sigma^{2} }}} \right) \) and the polynomial kernel with an order of d and constants a 1 and a 2: \( K({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} ) = (a_{1} {\mathbf{x}}_{i} {\mathbf{x}}_{j} + a_{2} )^{d} \). To date, it is difficult to determine the most suitable type of kernel function for a specific data pattern [48, 49]. However, the Gaussian RBF kernel is not only easier to implement but also capable of nonlinearly mapping the training data into an infinite-dimensional space; thus, it is suitable for dealing with nonlinear relationships. Therefore, the Gaussian RBF kernel function is specified in this study.
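The Gaussian RBF kernel and the dual-form prediction (16) can be sketched as follows. The support vectors and multiplier differences here are placeholders; in practice \( \beta_{i}^{*} \) and β i come from solving the quadratic program in (14).

```python
import numpy as np

def rbf_kernel(xi, xj, sigma):
    # Gaussian RBF kernel: K(x_i, x_j) = exp(-0.5 ||x_i - x_j||^2 / sigma^2)
    return float(np.exp(-0.5 * np.sum((xi - xj) ** 2) / sigma ** 2))

def svr_predict(x, support_x, beta_diff, b, sigma):
    # Dual-form SVR function, Eq. (16):
    # f(x) = sum_i (beta_i* - beta_i) K(x_i, x) + b
    return sum(bd * rbf_kernel(xs, x, sigma)
               for xs, bd in zip(support_x, beta_diff)) + b
```

Note that `rbf_kernel(x, x, sigma)` is 1 for any x, and the kernel is symmetric, as Mercer's condition requires.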

3 Chaotic immune algorithm (CIA) in selecting parameters and seasonal adjustment

3.1 CIA in selecting parameters

The selection of the three parameters, σ, ε, and C, of an SVR model influences the accuracy of forecasting. However, structured methods for selecting these parameters efficiently are lacking. Recently, Hong [38] applied the immune algorithm (IA) to determine the parameters of an SVR model and found that the proposed model is superior to other competitive forecasting models (ANN and regression models). However, in the IA procedure, if the diversity of the initial population cannot be maintained under selective pressure, i.e., if the initial individuals are not fully diversified in the search space, the IA can only seek solutions in a narrow region and the solution found may be far from the global optimum (premature convergence). To overcome this shortcoming, an effective approach or improved procedure is needed to let the IA traverse the solution space effectively and efficiently. One feasible approach is the chaos approach, owing to its easy implementation and its special ability to avoid being trapped in local optima [50]. Applying chaotic sequences to diversify the initial definition domain can be a good alternative in stochastic optimization procedures: owing to the ergodicity of chaotic sequences, small changes in the parameter settings or initial values of the model lead to very different search behaviors, so chaotic sequences can enrich the search and help avoid entrapment in local optima [51]. There are many applications of chaotic sequences to optimization problems [52–56]. Coelho and Mariani [57] recently applied a chaotic artificial immune network (chaotic opt-aiNET), based on Zaslavsky's map with its spread-spectrum characteristic and large Lyapunov exponent, to the economic dispatch problem (EDP), successfully escaping from local optima and converging to a stable equilibrium.
Therefore, applying chaotic sequences to diversify the initial definition domain in the IA's initialization procedure (CIA) is a feasible approach to optimizing the parameter selection of an SVR model.

In the design of the CIA, many principal factors, such as identification of the affinity, selection of antibodies, and crossover and mutation of the antibody population, are similar to those of the IA. The detailed procedure of the CIA used in this study is as follows, and the flowchart is shown in Fig. 1.

Fig. 1

Chaotic immune algorithm (CIA) flowchart

Step 1

Initialization of antibody population

The values of the three parameters of an SVR model in the ith iteration can be represented as \( X_{k}^{(i)} ,k = C,\sigma ,\varepsilon \). Set i = 0, and employ (17) to map each of the three parameters from its interval (Min k , Max k ) into a chaotic variable \( x_{k}^{(i)} \) located in the interval (0, 1).

$$ x_{k}^{(i)} = {\frac{{X_{k}^{(i)} - {\text{Min}}_{k} }}{{{\text{Max}}_{k} - {\text{Min}}_{k} }}},\quad k = C,\sigma ,\varepsilon $$
(17)

Then, employ the chaotic sequence, defined as (18), with μ = 4 to compute the next iteration chaotic variable, \( x_{k}^{(i + 1)} \).

$$ x^{(i + 1)} = \mu x^{(i)} (1 - x^{(i)} ) $$
(18)
$$ x^{(i)} \in (0,1),\quad i = 0, \, 1, \, 2, \ldots , $$

where x (i) is the value of the chaotic variable x at the ith iteration, μ is the so-called bifurcation parameter of the system, \( \mu \in [0,4] \).

And, transform \( x_{k}^{(i + 1)} \) to obtain three parameters for the next iteration, \( X_{k}^{(i + 1)} \), by the following (19).

$$ X_{k}^{(i + 1)} = {\text{Min}}_{k} + x_{k}^{(i + 1)} ({\text{Max}}_{k} - {\text{Min}}_{k} ) $$
(19)

After this transformation, the three parameters, C, σ, and ε, constitute the initial antibody population and are then represented as binary-code strings. For example, assume that an antibody contains 12 binary codes representing the three SVR parameters, so each parameter is expressed by four binary codes. If the set boundaries for parameters σ, C, and ε are 2, 10, and 0.5, respectively, then the antibody with binary code “1 0 0 1 0 1 0 1 0 0 1 1” implies that the real values of the three parameters σ, C, and ε are 1.125, 3.125, and 0.09375, respectively. The number of initial antibodies is the same as the size of the memory cell, which is set to ten in this study.
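Step 1 can be sketched as below. The logistic-map iteration implements (17)–(19), and `decode_antibody` reproduces the worked 12-bit example, assuming each 4-bit field maps to (field integer / 2⁴) × Max k with Min k = 0, the convention consistent with the stated values 1.125, 3.125, and 0.09375. Both functions are illustrative reconstructions, not the study's code.

```python
import numpy as np

def chaotic_init(X0, lo, hi, n_antibodies):
    """Eqs. (17)-(19): normalize a parameter into (0, 1), iterate the logistic
    map with mu = 4, and map each iterate back to the search interval."""
    x = (X0 - lo) / (hi - lo)                  # Eq. (17)
    population = []
    for _ in range(n_antibodies):
        x = 4.0 * x * (1.0 - x)                # Eq. (18), mu = 4
        population.append(lo + x * (hi - lo))  # Eq. (19)
    return population

def decode_antibody(bits, maxima, n_bits=4):
    """Decode a binary antibody into parameter values, one n_bits field per
    parameter; value = (field integer / 2**n_bits) * Max_k (Min_k = 0 assumed)."""
    return [int(bits[k * n_bits:(k + 1) * n_bits], 2) / 2 ** n_bits * mx
            for k, mx in enumerate(maxima)]
```

For the antibody “100101010011” with boundaries (2, 10, 0.5), the decoder returns exactly the parameter values quoted in the text.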

Step 2

Identification of the affinity and the similarity

A higher affinity value implies that an antibody has a higher activation with an antigen. To maintain the diversity of the antibodies stored in the memory cells, antibodies with lower similarity have a higher probability of being included in the memory cell. Therefore, an antibody with a higher affinity value and a lower similarity value has a good likelihood of entering the memory cells. The affinity between an antibody and an antigen is defined as (20).

$$ Ag_{k} = {1 \mathord{\left/ {\vphantom {1 {(1 + d_{k} )}}} \right. \kern-\nulldelimiterspace} {(1 + d_{k} )}} $$
(20)

where d k denotes the SVR forecasting errors obtained by the antibody k.

The similarity between antibodies is expressed as (21).

$$ Ab_{ij} = {1 \mathord{\left/ {\vphantom {1 {(1 + T_{ij} )}}} \right. \kern-\nulldelimiterspace} {(1 + T_{ij} )}} $$
(21)

where T ij denotes the difference between the two SVR forecasting errors obtained by the antibody already inside the memory cell and the antibody that is about to enter it.

Step 3

Selection of antibodies in the memory cell

Antibodies with higher values of Ag k are considered to be potential candidates for entering the memory cell. However, the potential antibody candidates with Ab ij values exceeding a certain threshold are not qualified to enter the memory cell. In this investigation, the threshold value is set to 0.9.
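Steps 2 and 3 together amount to a greedy, diversity-preserving fill of the memory cell, sketched below with the affinity (20) and similarity (21) computed from forecasting errors. The helper is an illustrative reconstruction, not the study's code.

```python
def select_into_memory(candidate_errors, threshold=0.9, size=10):
    """Fill the memory cell: antibodies are tried in order of affinity
    Ag_k = 1 / (1 + d_k), Eq. (20), and admitted only if their similarity
    Ab_ij = 1 / (1 + T_ij), Eq. (21), to every stored antibody does not
    exceed the threshold (0.9 in this study)."""
    memory = []
    for err in sorted(candidate_errors):  # ascending error = descending affinity
        if len(memory) >= size:
            break
        if all(1.0 / (1.0 + abs(err - m)) <= threshold for m in memory):
            memory.append(err)
    return memory
```

Note how a near-duplicate antibody (similarity close to 1) is rejected even though its affinity is high, so the memory cell stays diverse.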

Step 4

Crossover of antibody population

New antibodies are created via crossover and mutation operations. To perform crossover operation, strings representing antibodies are paired randomly. Moreover, the proposed scheme adopts the single-point-crossover principle. Segments of paired strings (antibodies) between two determined break-points are swapped. In this investigation, the probability of crossover (p c) is set as 0.5. Finally, the three crossover parameters are decoded into a decimal format.

Step 5

Annealing chaotic mutation of antibody population

For the ith iteration (generation), the crossover antibody population (\( \hat{X}_{k}^{(i)} ,k = C,\sigma ,\varepsilon \)) in the current solution space (Min k , Max k ) is mapped to the chaotic variable interval [0, 1] to form the crossover chaotic variable space \( \hat{x}_{k}^{(i)} ,k = C,\sigma ,\varepsilon \), as in (22),

$$ \hat{x}_{k}^{(i)} = {\frac{{\hat{X}_{k}^{(i)} - {\text{Min}}_{k} }}{{{\text{Max}}_{k} - {\text{Min}}_{k} }}},\quad k = C,\sigma ,\varepsilon ,\;i = 1,2, \ldots ,q_{\max } $$
(22)

where q max is the maximum evolutional generation of the population. Then, the ith chaotic variable \( x_{k}^{(i)} \), scaled by the annealing operation δ, is added to \( \hat{x}_{k}^{(i)} \), and the resulting chaotic mutation variable remains in the interval [0, 1], as in (23),

$$ \tilde{x}_{k}^{(i)} = \hat{x}_{k}^{(i)} + \delta x_{k}^{(i)} $$
(23)

where δ is the annealing operation. Finally, the chaotic mutation variable obtained in the interval [0, 1] is mapped back to the solution interval (Min k , Max k ) by (24), applied with a given probability of mutation (p m ), thus completing a mutation operation.

$$ \tilde{X}_{k}^{(i)} = {\text{Min}}_{k} + \tilde{x}_{k}^{(i)} \left( {{\text{Max}}_{k} - {\text{Min}}_{k} } \right). $$
(24)

Step 6

Stopping criteria

If the number of generations reaches the given maximum, the best antibody is taken as the solution; otherwise, return to Step 2.

The CIA is used to seek a better combination of the three parameters of the SVR model. The normalized root mean square error (NRMSE), given by (25), is used as the criterion of forecasting error in this investigation; the parameter combination with the smallest NRMSE is selected.

$$ {\text{NRMSE}} = \sqrt {{{\sum\nolimits_{i = 1}^{n} {(a_{i} - f_{i} )^{2} } } \mathord{\left/ {\vphantom {{\sum\nolimits_{i = 1}^{n} {(a_{i} - f_{i} )^{2} } } {\sum\nolimits_{i = 1}^{n} {a_{i}^{2} } }}} \right. \kern-\nulldelimiterspace} {\sum\nolimits_{i = 1}^{n} {a_{i}^{2} } }}} $$
(25)

where n is the number of forecasting periods; a i is the actual traffic flow value at period i; and f i is the forecasting traffic flow value at period i.
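Eq. (25) is straightforward to compute; a minimal helper:

```python
import numpy as np

def nrmse(actual, forecast):
    """Normalized root mean square error, Eq. (25):
    sqrt( sum (a_i - f_i)^2 / sum a_i^2 )."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.sum((a - f) ** 2) / np.sum(a ** 2)))
```

A perfect forecast gives 0, and a constant-zero forecast gives exactly 1, which is what makes the measure "normalized".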

3.2 Seasonal adjustment

As mentioned above, traffic flow data reveal a cyclic (seasonal) trend during daily peak periods; any model that aims at highly accurate forecasting performance must therefore estimate this seasonal component. There are several approaches to estimating the seasonal index of a data series [44, 58, 59], including product-model and non-product-model types. Based on the type of the data series, this investigation employs Deo and Hurvich's [58] approach to compute the seasonal index, as shown in (26),

$$ {\text{peak}}_{t} = {\frac{{a_{t} }}{{f_{t} }}} = {\frac{{a_{t} }}{{\sum\nolimits_{i = 1}^{N} {\left( {\beta_{i}^{*} - \beta_{i} } \right)K({\mathbf{x}}_{i} ,{\mathbf{x}}) + b} }}} $$
(26)

where t = j, l + j, 2 l + j,…, (m − 1)l + j only for the same peak time point in each period. Then, the seasonal index (SI) for each peak time point j is computed as (27),

$$ {\text{SI}}_{j} = \frac{1}{m}\left( {{\text{peak}}_{j} + {\text{peak}}_{l + j} + \cdots + {\text{peak}}_{(m - 1)l + j} } \right) $$
(27)

Eventually, the forecasting value of the SSVRCIA is obtained by (28),

$$ f_{N + k} = \left( {\sum\limits_{i = 1}^{N} {\left( {\beta_{i}^{*} - \beta_{i} } \right)K({\mathbf{x}}_{i} ,{\mathbf{x}}_{N + k} )} + b} \right) \times {\text{SI}}_{k} $$
(28)

where k = j, l + j, 2 l + j,…, (m − 1)l + j implies the peak time point in another period (for forecasting period).
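The seasonal adjustment in (26)–(28) reduces to averaging actual-to-fitted ratios at matching peak-time positions and rescaling the raw SVR forecast; an illustrative sketch (the function and variable names are assumptions of this rewrite):

```python
import numpy as np

def seasonal_indexes(actual, fitted, l, m):
    """Eqs. (26)-(27): peak_t = a_t / f_t, averaged over the m observed
    periods for each of the l peak-time points, giving SI_j, j = 1, ..., l."""
    ratios = np.asarray(actual, float)[:m * l] / np.asarray(fitted, float)[:m * l]
    return ratios.reshape(m, l).mean(axis=0)

def seasonal_adjust(raw_forecast, si, k):
    """Eq. (28): rescale the raw SVR forecast at peak-time position k by SI_k."""
    return raw_forecast * si[k]

# Demo with l = 2 peak points observed over m = 3 periods
fitted = np.full(6, 100.0)
actual = np.array([90.0, 110.0, 90.0, 110.0, 90.0, 110.0])
si = seasonal_indexes(actual, fitted, l=2, m=3)
```

Here the indexes recover the alternating under/over-forecast pattern (0.9, 1.1), and a raw forecast of 100 at the second peak point is adjusted to 110.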

4 A numerical example and experimental results

The traffic flow data sets originate from three civil motorway detector sites. The civil motorway is the busiest inter-urban motorway network in Panchiao City, the capital of Taipei County, Taiwan. The major site is located at the center of Panchiao City, where the flow intersects an urban local street system, and it provides one-way traffic volume for each hour on weekdays. Therefore, one-way flow data for peak traffic are employed in this investigation, covering the morning peak period (from 6:00 to 10:00) and the evening peak period (from 16:00 to 20:00). The data were collected from February 2005 to March 2005. During the observation period, the numbers of traffic flow data available for the morning and evening peak periods are 45 and 90 h, respectively. For convenience, the traffic flow data are converted to equivalents of passengers (EOP), and both peak periods show the seasonality of the traffic data. In addition, the traffic flow data are divided into three parts: training data, validation data, and testing data. For the morning peak period, the training, validation, and testing data sets contain 30, 10, and 10 h, respectively. For the evening peak period, the experimental data are arranged as training data (60 h), validation data (15 h), and testing data (15 h).

4.1 Parameter determination of different comparative forecasting models

The parameter selection of forecasting models is important for obtaining good forecasting performance. For the SARIMA model, the parameters are determined by taking the first-order regular difference and the first seasonal difference to remove non-stationarity and seasonality. Using statistical packages, and requiring residuals that are uncorrelated and approximately white noise, the most suitable models for the morning and evening peak periods are \( {\text{SARIMA}}(1,0,1) \times (0,1,1)_{5} \) without a constant term and \( {\text{SARIMA}}(1,0,1) \times (1,1,1)_{5} \) with a constant term, respectively. The fitted SARIMA equations are presented as (29) and (30), respectively.

$$ (1 - 0.5167B)(1 - B^{5} )X_{t} = (1 + 0.3306B)(1 - 0.9359B^{5} )\varepsilon_{t} $$
(29)
$$ (1 - 0.5918B)(1 - B^{5} )X_{t} = 2.305 + (1 - 0.9003B^{5} )\varepsilon_{t} $$
(30)
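For illustration, the fitted morning-peak model (29) can be turned into a one-step-ahead point forecast by expanding the backshift polynomials and setting the future shock to zero. This is a sketch based directly on the printed coefficients, not the authors' estimation code:

```python
def sarima_one_step(x, e):
    """One-step-ahead point forecast implied by Eq. (29).

    Expanding (1 - 0.5167B)(1 - B^5)X_t = (1 + 0.3306B)(1 - 0.9359B^5)e_t
    and setting the future shock e_t = 0 gives the recursion below.
    `x` and `e` are past observations and residuals, most recent last
    (at least six of each are needed).
    """
    phi, theta, Theta = 0.5167, 0.3306, -0.9359
    return (phi * x[-1] + x[-5] - phi * x[-6]
            + theta * e[-1] + Theta * e[-5] + theta * Theta * e[-6])
```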

For the seasonal Holt–Winters (SHW) method, using the Minitab 14 statistical software, the appropriate parameters (L, α, β, and γ) are determined as 5, 0.15, 0.72, and 0.73 for the morning peak period, and as 5, 0.48, 0.04, and 0.13 for the evening peak period, respectively. For the BPNN model, the Matlab 6.5 computing software is employed to implement the forecasting procedure. The number of nodes in the hidden layer is used as a validation parameter of the BPNN model; the most suitable number of hidden nodes is three.
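The SHW updates can be sketched as a minimal multiplicative Holt–Winters recursion, where L is the season length and α, β, γ are the level, trend, and seasonal smoothing weights. This is an illustrative re-implementation whose initialization and details differ from the Minitab 14 routine:

```python
def shw_forecast(y, L, alpha, beta, gamma, steps=1):
    """Minimal multiplicative seasonal Holt-Winters recursion (a sketch)."""
    level = sum(y[:L]) / L                      # initial level: first-season mean
    trend = 0.0
    season = [y[i] / level for i in range(L)]   # initial seasonal factors
    for t in range(L, len(y)):
        s = season[t % L]
        new_level = alpha * (y[t] / s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % L] = gamma * (y[t] / new_level) + (1 - gamma) * s
        level = new_level
    n = len(y)
    # k-step-ahead forecast: extrapolated level/trend times the matching factor.
    return [(level + (k + 1) * trend) * season[(n + k) % L] for k in range(steps)]
```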

4.2 SSVRCIA traffic forecasting model

Before conducting the seasonal adjustment for the SSVRCIA model, it is necessary to run the CIA to determine suitable values of the three parameters of the SVR model. The CIA parameters used in the proposed models for the two traffic peak periods are set experimentally as shown in Table 1. In the SVRCIA modeling procedure, a rolling-based forecasting procedure was conducted in the training stage, and a 1-h-ahead forecasting policy was adopted in the validation and testing stages. Several sizes of data-rolling window are considered for forecasting the traffic flow in the next hour: different numbers of past traffic flow observations were fed into the SVRCIA model to forecast the traffic flow in the next validation period. Whenever the training error improves, the three parameters σ, C, and ε of the SVRCIA model, adjusted by the CIA, are used to calculate the validation error, and the parameter set with the minimum validation error is selected as the most appropriate one. Table 2 indicates that the SVRCIA models perform best when 15 and 35 input data are used for the morning and evening traffic forecasts, respectively.
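The window-size selection described above can be sketched as follows. Here `fit_predict` stands in for the trained SVRCIA model: given the last w observations it returns the 1-h-ahead forecast. Both the function names and this plug-in interface are illustrative, not the paper's code:

```python
def select_window_size(series, n_train, n_valid, candidates, fit_predict):
    """Choose the rolling-window length with the smallest validation RMSE."""
    best_w, best_err = None, float("inf")
    for w in candidates:
        sq_errs = []
        for t in range(n_train, n_train + n_valid):
            window = series[t - w:t]              # most recent w observations
            sq_errs.append((fit_predict(window) - series[t]) ** 2)
        rmse = (sum(sq_errs) / len(sq_errs)) ** 0.5
        if rmse < best_err:
            best_w, best_err = w, rmse
    return best_w, best_err
```

In the paper the candidate windows are evaluated per peak period, yielding 15 (morning) and 35 (evening) inputs as reported in Table 2.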

Table 1 CIA’s parameters setting in each peak period
Table 2 Forecasting results and associated parameters of the SVRCIA models

Now the seasonal term is considered. For the morning peak period, there are five peak time points in each cycle, from 6:00 to 10:00. The seasonal index for each peak time point is calculated from the 40 forecasting values of the SVRCIA model in the training (30 values) and validation (10 values) stages, as shown in Table 3. Similarly, the evening peak period also has five peak time points in each cycle, from 16:00 to 20:00; its seasonal indexes are likewise listed in Table 3.

Table 3 The seasonal indexes for each peak time point

The well-trained models, SARIMA, BPNN, SHW, SVRCIA, and SSVRCIA, are applied to forecast the traffic flow during the morning and evening peak periods. Tables 4 and 5 show the actual values and the forecasts obtained with the various models for the morning and evening peaks, respectively. NRMSE values for each peak hour are calculated to compare the proposed model fairly with the alternatives. The proposed SSVRCIA model has smaller NRMSE values than the SARIMA, BPNN, SHW, and SVRCIA models, and thus captures the traffic flow patterns better on an hourly average basis. Clearly, the seasonal adjustment employed here is well suited to this type of cyclic peak-data forecasting problem.
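For reference, one common NRMSE convention is the RMSE divided by the mean of the actual values; the section does not restate its exact normalization, so this choice is an assumption:

```python
import math

def nrmse(actual, forecast):
    """RMSE normalized by the mean of the actual values (one common
    convention; the paper's normalization may differ)."""
    n = len(actual)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
    return rmse / (sum(actual) / n)
```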

Table 4 Morning peak period traffic flow forecasting results (unit: EOP)
Table 5 Evening peak period traffic flow forecasting results (unit: EOP)

5 Conclusions

Accurate traffic forecasting is crucial for inter-urban traffic control systems, particularly for avoiding congestion and for increasing the efficiency of limited traffic resources during peak periods. The historical traffic data of Panchiao City in northern Taiwan show a seasonal fluctuation trend that occurs in many inter-urban traffic systems, so over-prediction or under-prediction of traffic flow affects the transportation capability of an inter-urban system. This study introduces a forecasting technique, SSVRCIA, and investigates its feasibility for forecasting inter-urban motorway traffic. The experimental results indicate that the SSVRCIA model achieves better forecasting performance than the SARIMA, BPNN, SHW, and SVRCIA models. The superior performance of the SSVRCIA model is due to the generalization ability of the SVR model, the proper selection of the SVR parameters by the CIA, and the effective seasonal adjustment. In addition, the SVR method employs quadratic programming, which rests on the assumptions of a convex feasible set and the existence of a global optimum; thus it should theoretically approach the global optimum if suitable search algorithms are employed. In contrast, the SARIMA and SHW models employ parametric techniques based on specific assumptions, such as a linear relationship between the current value of the underlying variable and its previous values and error terms, and these assumptions do not fully hold in real-world problems.

This investigation is the first to apply SVR with the CIA and a seasonal adjustment to forecasting inter-urban motorway traffic flow. Many forecasting methodologies have been proposed to deal with the seasonality of traffic flow; however, most models are time-consuming in verifying suitable time-phase divisions, particularly when the sample size is large. The SSVRCIA model provides a convenient and valid alternative for traffic flow forecasting: it directly uses historical observations from traffic control systems and then determines suitable parameters by efficient optimization algorithms. A next step would be to extend the model with other influencing factors and control variables during peak periods, such as driving speed limits, important social events, the percentage of heavy vehicles, bottleneck service levels, and waiting times at intersection traffic signals. In addition, although the proposed SSVRCIA model is already a hybrid forecasting model, other advanced optimization algorithms for parameter selection can be applied to the SVR model to satisfy the requirements of real-time traffic control systems. The goal of the author is to show that a combination of novel techniques can perform as well as the pure techniques.