1 Introduction

Various approaches have been used for forecasting in the literature. Conventional forecasting methods can be insufficient for real-life time series since they require certain assumptions to be satisfied [1]. Because artificial neural networks require no assumptions such as linearity or normality, they are applicable in many fields [4]. In recent years, artificial neural networks have become an effective way to forecast time series [5], and they have been used successfully for forecasting in various applications [2].

Various neuron models have been proposed in the literature [6, 14, 16, 17, 26]. The most preferred artificial neural network type is the multilayer perceptron (MLP) introduced by Rumelhart et al. [18]. When real-life problems are solved using standard artificial neural networks such as MLP, a large number of neurons is required in the architecture [20]. A neuron exploiting higher-order statistics can produce a superior neural network with comparatively fewer neurons [20]. Thus, higher-order neural networks have been suggested by Chaturvedi et al. [8], Giles and Maxwell [9], Homma and Gupta [11], Sinha et al. [21], and Taylor and Commbes [22]. Higher-order neurons have demonstrated improved computational power and generalization ability. However, these models are difficult to train because of a combinatorial explosion of higher-order terms as the number of inputs to the neuron increases [20]. In addition, it is well known that the forecasting performance of MLP is negatively affected when the data include outliers [10, 25]. In recent years, artificial neural network models based on neuron models such as the generalized-mean neuron (GMN) [23], the geometric mean neuron (G-MN) [20], and the single multiplicative neuron (SMN) [24] have been proposed as alternatives to MLP. Like MLP, these models can also be negatively affected by outliers since their aggregation functions are based on means.

In this study, a median neuron model (MNM) is first introduced, and a new feed forward neural network approach based on MNM is proposed in order to deal with the outlier problem. In the proposed MNM, unlike other neuron models, an aggregation function based on the median, which is not affected much by outliers, is employed instead of functions based on summation or mean. Unlike other measures of location such as the mean, the median is not affected much by outliers in a data set. Using a median-based aggregation function therefore prevents the neuron from producing an extreme output value for an outlier input value. As a result, the MNM-MFF model, which consists of MNMs, is a robust multilayer neural network approach that is not much affected by outliers. In the training process of the MNM-MFF model, it is very hard to obtain the derivative of the cost function with respect to the weights of the model since median-based aggregation functions are used; this means the back propagation learning algorithm cannot easily be used to determine the best values of the weights. Therefore, the modified particle swarm optimization method [3] is utilized to train the MNM-MFF model. To assess the forecasting performance of the proposed MNM-MFF model, it was applied to two well-known real time series, Australian beer consumption and the Box-Jenkins gas furnace data. In addition, different data scenarios were considered in the implementation to examine the performance of the proposed approach in more detail. Furthermore, other forecasting models available in the literature were used for comparison.

The remainder of the paper is organized as follows. The modified particle swarm optimization method used as a learning algorithm to train the MNM-MFF model is briefly summarized in the next section. MNM is introduced in Sect. 3. In Sect. 4, the MNM-MFF model is described and it is shown how the modified particle swarm optimization method is employed to train it. Section 5 presents the implementation and the obtained results. Finally, the results are discussed in the last section.

2 The modified particle swarm optimization (MPSO)

Particle swarm optimization is a population-based heuristic algorithm first proposed by Kennedy and Eberhart [13]. A distinguishing feature of this heuristic is that it simultaneously examines different points in different regions of the solution space to obtain the global optimum solution; local optimum traps can therefore be avoided. In this study, MPSO was used to train the MNM-MFF model. Detailed information about the MPSO method can be found in [3]. The MPSO algorithm has a time-varying inertia weight as in [19] and, similarly, time-varying acceleration coefficients as in [15].

Algorithm 1

The modified particle swarm optimization

Step 1

The positions of each particle k (k = 1, 2, …, pn) are randomly determined and kept in a vector \( X_k \) given as follows:

$$ X_{k} = \left\{ {x_{k1} ,x_{k2} , \ldots ,x_{kd} } \right\},\quad k = 1,2, \ldots ,pn $$
(1)

where \( x_{ki} \) (i = 1, 2, …, d) represents the ith position of the kth particle, and pn and d represent the number of particles in the swarm and the number of positions, respectively.

Step 2

Velocities are randomly determined and stored in a vector \( V_k \) given below.

$$ V_{k} = \left\{ {v_{k1} ,v_{k2} , \ldots ,v_{kd} } \right\},\quad k = 1,2, \ldots ,pn $$
(2)

Step 3

According to the evaluation function, the Pbest and Gbest particles given in (3) and (4), respectively, are determined.

$$ Pbest_{k} = \left( {p_{k,1} ,p_{k,2} , \ldots ,p_{k,d} } \right),\quad k = 1,2, \ldots ,pn $$
(3)
$$ Gbest = \left( {p_{g,1} ,p_{g,2} , \ldots ,p_{g,d} } \right) $$
(4)

where \( Pbest_k \) is a vector that stores the positions corresponding to the kth particle’s best individual performance, and Gbest is the particle with the best evaluation function value found so far, with g denoting the index of that best particle.

Step 4

Let \( c_1 \) and \( c_2 \) represent the cognitive and social coefficients, respectively, and let w be the inertia parameter. Let \( (c_{1i}, c_{1f}) \), \( (c_{2i}, c_{2f}) \), and \( (w_1, w_2) \) be the intervals that include the possible values for \( c_1 \), \( c_2 \), and w, respectively. At each iteration, these parameters are calculated by using the formulas given in (5), (6), and (7).

$$ c_{1} = \left( c_{1f} - c_{1i} \right)\frac{t}{\max t} + c_{1i} $$
(5)
$$ c_{2} = \left( c_{2f} - c_{2i} \right)\frac{t}{\max t} + c_{2i} $$
(6)
$$ w = \left( {w_{2} - w_{1} } \right)\frac{\max t - t}{\max t} + w_{1} $$
(7)

where maxt and t represent the maximum and current iteration numbers, respectively, and the subscripts f and i denote final and initial values.
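As an illustration, the schedules in (5), (6), and (7) can be computed as in the following sketch. The function name and the default interval endpoints are illustrative assumptions, not values prescribed by the text or by [3].

```python
# A minimal sketch of the time-varying coefficient schedules in Eqs. (5)-(7).
# Default interval endpoints are assumed for illustration only.

def mpso_coefficients(t, maxt, c1i=2.5, c1f=0.5, c2i=0.5, c2f=2.5, w1=0.4, w2=0.9):
    """Return (c1, c2, w) at iteration t out of maxt iterations."""
    c1 = (c1f - c1i) * t / maxt + c1i       # Eq. (5): cognitive coefficient
    c2 = (c2f - c2i) * t / maxt + c2i       # Eq. (6): social coefficient
    w = (w2 - w1) * (maxt - t) / maxt + w1  # Eq. (7): inertia weight
    return c1, c2, w

print(mpso_coefficients(0, 100))    # (2.5, 0.5, 0.9): exploratory at the start
print(mpso_coefficients(100, 100))  # (0.5, 2.5, 0.4): exploitative at the end
```

With these assumed endpoints, the cognitive coefficient and inertia weight shrink over the iterations while the social coefficient grows, shifting the swarm from exploration toward exploitation.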

Step 5

Values of velocities and positions are updated by using the formulas given in (8) and (9), respectively.

$$ v_{i,d}^{t + 1} = \left[ {w \times v_{i,d}^{t} + c_{1} \times {\text{rand}}_{1} \times \left( {p_{i,d} - x_{i,d} } \right) + c_{2} \times {\text{rand}}_{2} \times \left( {p_{g,d} - x_{i,d} } \right)} \right] $$
(8)
$$ x_{i,d}^{t + 1} = x_{i,d} + v_{i,d}^{t + 1} $$
(9)

where \( rand_1 \) and \( rand_2 \) are random values drawn from the interval [0, 1].

Step 6

Steps 3 to 5 are repeated until a predetermined maximum iteration number (maxt) is reached.
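The six steps above can be condensed into a short program. The sketch below, assuming numpy, is a minimal self-contained rendering of Algorithm 1 for a generic minimization problem; the sphere evaluation function and all parameter values are illustrative assumptions rather than settings taken from [3].

```python
# A minimal sketch of Algorithm 1 (MPSO) for minimizing a generic function.
import numpy as np

def mpso(evaluate, d, pn=30, maxt=200, vm=1.0,
         c1i=2.5, c1f=0.5, c2i=0.5, c2f=2.5, w1=0.4, w2=0.9, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, (pn, d))            # Step 1: random positions
    V = rng.uniform(-vm, vm, (pn, d))         # Step 2: random velocities
    pbest = X.copy()                          # Step 3: Pbest and Gbest
    pbest_val = np.array([evaluate(x) for x in X])
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for t in range(maxt):                     # Step 6: iterate until maxt
        # Step 4: time-varying coefficients, Eqs. (5)-(7)
        c1 = (c1f - c1i) * t / maxt + c1i
        c2 = (c2f - c2i) * t / maxt + c2i
        w = (w2 - w1) * (maxt - t) / maxt + w1
        # Step 5: velocity and position updates, Eqs. (8)-(9)
        r1 = rng.uniform(size=(pn, d))
        r2 = rng.uniform(size=(pn, d))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        V = np.clip(V, -vm, vm)               # keep velocities within (-vm, vm)
        X = X + V
        # Step 3 (repeated): refresh Pbest and Gbest
        vals = np.array([evaluate(x) for x in X])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = X[better], vals[better]
        g = pbest_val.argmin()
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Usage: minimize the 5-dimensional sphere function.
best, best_val = mpso(lambda x: float(np.sum(x ** 2)), d=5)
```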

3 Median neuron model

The first artificial neuron model was proposed by McCulloch and Pitts [16]; since then, various neuron models have been proposed in the literature. One of the most preferred types of artificial neural networks is MLP. In each neuron of an MLP, the function in (10) is generally employed as the aggregation function.

$$ net\left( {x_{j} ,w_{j} } \right) = \sum\limits_{j = 1}^{N} {w_{j} x_{j} } + w_{0} $$
(10)

where \( x_j \) and \( w_j \) (j = 1, 2, …, N) represent the input signals and weights, respectively, N is the number of input signals, and \( w_0 \) is the weight for the bias. It is clear that the neuron model given in (10) is negatively affected by input signals containing outliers since this model is based on the summation operation. Mean-based neuron models such as GMN and G-MN have the same problem since the mean is also negatively affected by outliers. In this study, a new neuron model, MNM, in which the median is utilized as the aggregation function, is proposed to deal with the outlier problem. Since the median is not affected much by outliers, neither is MNM. MNM is illustrated in Fig. 1.

Fig. 1

MNM

In Fig. 1, y and f represent the output signal of the neuron and the activation function, respectively. The bias value is 1, as seen from the figure. net is the activation value obtained from the aggregation function and is calculated as follows:

$$ net = Median(w_{1} x_{1} ,w_{2} x_{2} , \ldots ,w_{N} x_{N} , w_{0} ) $$
(11)

where \( x_j \) and \( w_j \) (j = 1, 2, …, N) represent the input signals and weights, respectively, N is the number of input signals, and \( w_0 \) is the weight for the bias.
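A small numeric sketch of Eq. (11), assuming numpy and a logistic activation, illustrates the robustness argument: replacing one input with an extreme value leaves the median of the weighted terms, and hence the neuron output, unchanged in this example, whereas the summation in (10) would be dominated by it. All weights and inputs below are illustrative.

```python
# A minimal sketch of a single MNM, Eq. (11), with a sigmoid activation.
import numpy as np

def mnm(x, w, w0):
    """Median neuron: net = Median(w1*x1, ..., wN*xN, w0), y = f(net)."""
    net = np.median(np.append(w * x, w0))  # median-based aggregation, Eq. (11)
    return 1.0 / (1.0 + np.exp(-net))      # sigmoid activation f

w = np.array([0.3, 0.5, 0.2, 0.4])
w0 = 0.1
x_clean = np.array([1.0, 2.0, 1.5, 1.0])
x_outlier = np.array([1.0, 2.0, 1.5, 100.0])   # one extreme input value

# Both calls print 0.574...: the weighted terms are (0.3, 1.0, 0.3, 0.4, 0.1)
# versus (0.3, 1.0, 0.3, 40.0, 0.1), and the median is 0.3 in both cases.
# A summation-based net would jump from 2.1 to 41.7.
print(mnm(x_clean, w, w0), mnm(x_outlier, w, w0))
```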

4 Multilayer feed forward network with median neuron model

The MNM-MFF model proposed in this study is a multilayer feed forward neural network model composed of MNMs. The architecture of the proposed MNM-MFF model, which has N and M neurons in its input and hidden layers, respectively, is shown in Fig. 2. As seen from this figure, the input and output vectors of the model are \( X = \left[ {x_{1} ,x_{2} , \ldots ,x_{N} } \right] \) and [y], respectively.

Fig. 2

The architecture of MNM-MFF

If \( w_{h_{ij}} \) is the weight that connects the ith hidden neuron with the jth input, the activation value of the ith hidden neuron can be given as

$$ net_{{h_{i} }} = Median\left( {w_{{h_{i1} }} x_{1} ,w_{{h_{i2} }} x_{2} , \ldots ,w_{{h_{iN} }} x_{N} , w_{{h_{i0} }} } \right),\quad i = 1,2, \ldots ,M $$
(12)

where \( w_{h_{i0}} \) is the weight for the bias of the ith hidden neuron. The nonlinear transformation performed by each of the M hidden neurons is given as

$$ y_{{h_{i} }} = f\left( {net_{{h_{i} }} } \right),\quad i = 1,2, \ldots ,M $$
(13)

where f denotes a sigmoid function. Similarly, the activation value and the output of the neuron in the output layer are given in (14) and (15).

$$ net = Median\left( {w_{{o_{1} }} y_{{h_{1} }} ,w_{{o_{2} }} y_{{h_{2} }} , \ldots ,w_{{o_{M} }} y_{{h_{M} }} , w_{{o_{0} }} } \right) $$
(14)
$$ y = f(net) $$
(15)

where \( w_{o_{i}} \) is the weight that connects the ith neuron of the hidden layer to the neuron of the output layer, and \( w_{o_{0}} \) is the weight for the bias of the output neuron.
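A minimal sketch of the forward pass in (12)-(15), assuming numpy, is given below. The function name and the argument shapes (a hidden weight matrix with a trailing bias column, and an output weight vector with a trailing bias entry) are illustrative assumptions.

```python
# A minimal sketch of the MNM-MFF forward pass, Eqs. (12)-(15).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mnm_mff_forward(x, Wh, wo):
    """x: length-N input; Wh: (M, N+1) hidden weights (last column = bias);
    wo: length-(M+1) output weights (last entry = bias)."""
    M = Wh.shape[0]
    yh = np.empty(M)
    for i in range(M):
        # Eq. (12): median over the weighted inputs and the bias weight
        net_h = np.median(np.append(Wh[i, :-1] * x, Wh[i, -1]))
        yh[i] = sigmoid(net_h)                            # Eq. (13)
    net = np.median(np.append(wo[:-1] * yh, wo[-1]))      # Eq. (14)
    return sigmoid(net)                                   # Eq. (15)

# Usage with an assumed 4-1-1 architecture (N = 4, M = 1):
rng = np.random.default_rng(1)
y = mnm_mff_forward(rng.uniform(size=4), rng.uniform(size=(1, 5)),
                    rng.uniform(size=2))
```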

To train the MNM-MFF model whose architecture is shown in Fig. 2, the MPSO method presented in Sect. 2 is utilized. A particle in MPSO consists of positions which are the weights of MNM-MFF. The structure of a particle is presented in Fig. 3.

Fig. 3

Structure of a particle in MPSO

In the MPSO algorithm, the mean square error (MSE) is employed as the evaluation function. The formula in (16) is used to calculate the MSE value.

$$ {\text{MSE}} = \frac{1}{n}\sum\limits_{t = 1}^{n} {({\text{output}}_{t} - {\text{target}}_{t} )^{2} } $$
(16)

where n represents the number of learning samples. A sketch of how a particle is decoded and scored is given below; the MPSO algorithm used in the training process of MNM-MFF then follows.
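As illustrated in Fig. 3, a particle is simply the flattened weight vector of the network. The following sketch, assuming numpy and reusing the hypothetical mnm_mff_forward function sketched above, decodes such a particle and evaluates Eq. (16) over the learning samples; the shapes are illustrative assumptions.

```python
# A minimal sketch of the evaluation function, Eq. (16): decode a particle
# into MNM-MFF weights and return the MSE over the learning samples.
# Requires the mnm_mff_forward sketch from Sect. 4 and numpy.
import numpy as np

def particle_mse(particle, inputs, targets, N, M):
    """particle: length M*(N+1) + M + 1; inputs: (n, N); targets: length n."""
    Wh = particle[:M * (N + 1)].reshape(M, N + 1)  # hidden weights and biases
    wo = particle[M * (N + 1):]                    # output weights and bias
    outputs = np.array([mnm_mff_forward(x, Wh, wo) for x in inputs])
    return float(np.mean((outputs - targets) ** 2))  # Eq. (16)
```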

Algorithm 2

The modified particle swarm optimization to train MNM-MFF

Step 1

The parameters of the method are determined.

In the first step, the parameters which direct the MPSO algorithm are determined. These parameters are pn, vm, \( c_{1i} \), \( c_{1f} \), \( c_{2i} \), \( c_{2f} \), \( w_1 \), and \( w_2 \), which were given in Sect. 2; here −vm and vm denote the minimum and maximum velocities.

Step 2

Initial values of positions and velocities are determined.

The initial positions and velocities of each particle in the swarm are randomly generated from uniform distributions on (0, 1) and (−vm, vm), respectively.

Step 3

Evaluation function values are computed.

Evaluation function values for each particle are calculated. The evaluation function is the MSE whose formula is given in (16).

Step 4

\( Pbest_k \) (k = 1, 2, …, pn) and Gbest are determined according to the evaluation function values calculated in the previous step.

\( Pbest_k \) is a vector that stores the positions corresponding to the kth particle’s best individual performance, and Gbest is the particle with the best evaluation function value found so far.

Step 5

The parameters are updated.

The updated values of the cognitive coefficient \( c_1 \), the social coefficient \( c_2 \), and the inertia parameter w are calculated using the formulas given in (5), (6), and (7).

Step 6

New values of positions and velocities are calculated.

New values of positions and velocities for each particle are computed by using the formulas given in (8) and (9). If the maximum iteration number has not been reached, the algorithm returns to Step 3; otherwise, it proceeds to Step 7.

Step 7

The optimal solution is determined.

The elements of Gbest are taken as the optimal weight values of the MNM-MFF.

5 The application

In order to evaluate the forecasting performance of the MNM-MFF model, it was applied to two well-known real time series, Australian beer consumption and the Box-Jenkins gas furnace data. When MNM-MFF and the other neural network models were used for forecasting, lagged variables of the time series were taken as the model inputs, as in the sketch below.
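As a hedged sketch of this input construction, the helper below (a hypothetical name, assuming numpy) turns a univariate series into lagged input rows and one-step-ahead targets.

```python
# A minimal sketch: inputs are the p previous observations, the target is
# the current observation. Names and the example series are illustrative.
import numpy as np

def make_lagged(series, p):
    """Return (inputs, targets): inputs[t] = series[t:t+p], target = series[t+p]."""
    series = np.asarray(series, dtype=float)
    inputs = np.array([series[t:t + p] for t in range(len(series) - p)])
    targets = series[p:]
    return inputs, targets

# Usage: with p = 4 lags (the period of the quarterly beer series), each
# input row holds one year of past observations.
X, y = make_lagged(np.arange(20.0), p=4)
```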

5.1 Australian beer consumption data

The first time series is the quarterly Australian beer consumption [12] between 1956 Q1 and 1994 Q1. The graph of this time series is shown in Fig. 4. The last 16 observations of the series were used as the test set. For the Australian beer consumption data, three scenarios were examined to assess the performance of the proposed approach when the data contain outliers. In the first case, the seasonal autoregressive integrated moving average (SARIMA), Winter’s multiplicative exponential smoothing (WMES), MLP, and MNM-MFF methods were applied to the original data set between 1956 Q1 and 1994 Q1. In the second case, an outlier value was obtained by multiplying the maximum observation of the original data by 5 and was then added to the data. In the last scenario, the outlier value was obtained by multiplying the maximum observation of the original data by 10 before being added to the data.

Fig. 4

Quarterly Australian beer consumption, 1956 Q1–1994 Q1

Case 1

The original Australian beer consumption data were forecasted using SARIMA, WMES, MLP, and the proposed MNM-MFF model. When both MLP and MNM-MFF were employed, the number of inputs was varied between 1 and 4, since the period of this time series is 4, and for both methods the number of neurons in the hidden layer was varied from 1 to 8. Thus, 32 different architectures were examined for each neural network method to determine the best one. The best architectures, together with the best models and the corresponding MSE values obtained from all methods, are summarized in Table 1.

Table 1 The MSE values calculated over the test set for Case 1

When Table 1 is examined, it is seen that the artificial neural network methods MLP, PSO-MLP, and MNM-MFF produced better forecasts in terms of MSE than the conventional methods, and that the proposed MNM-MFF method has the best forecasting accuracy. When the data were forecasted with the proposed method, 4-1-1 was found to be the best architecture; that is, the best architecture has 4, 1, and 1 neurons in the input, hidden, and output layers, respectively.

To examine the performance of MNM-MFF in more detail, MNM-MFF, PSO-MLP, and MLP were compared. For the training and test sets, the results obtained from 16 architectures are given in Tables 2, 3 and 4. The mean and standard deviation of the MSE values calculated over these 16 architectures are presented in Table 5.

Table 2 The training and testing MSE values obtained from different architectures for MLP
Table 3 The training and testing MSE values obtained from different architectures for PSO-MLP
Table 4 The training and testing MSE values obtained from different architectures for MNM-MFF
Table 5 The training and testing performances of MLP, PSO-MLP, and MNM-MFF for Case 1

As seen from Table 5, for both the training and test sets, the mean errors produced by MNM-MFF are smaller than those obtained from MLP and PSO-MLP. Furthermore, MNM-MFF has the minimum standard deviation values for both sets; in other words, MNM-MFF and PSO-MLP give more consistent results than MLP. Therefore, it is clear that MNM-MFF produces the most accurate forecasts for the Australian beer consumption data. The test-set forecasts of MLP, PSO-MLP, and MNM-MFF are shown graphically in Fig. 5.

Fig. 5

The prediction results of MLP, PSO-MLP, and the proposed method for Case 1

In addition to the MSE values and the forecasting graph, a scatter plot of the forecasts obtained from the proposed method against the observations is depicted in Fig. 6.

Fig. 6

Scatter plot of forecasts and observations for Case 1

Case 2

Case 1 showed that MNM-MFF has a superior forecasting performance. In the second case, the performances of the MLP, PSO-MLP, and MNM-MFF methods were compared when the data include an outlier. An outlier value was obtained by multiplying the maximum observation of the original data by 5; then the 15th observation of the original data, which is the maximum observation, was replaced by this outlier. The resulting data set was forecasted using MLP, PSO-MLP, and MNM-MFF with the same settings as in Case 1. All obtained results are shown in Tables 6, 7, 8, and 9.

Table 6 The training and testing MSE values obtained from different architectures for MLP
Table 7 The training and testing MSE values obtained from different architectures for PSO-MLP
Table 8 The training and testing MSE values obtained from different architectures for MNM-MFF
Table 9 The training and testing performances of MLP, PSO-MLP, and MNM-MFF for Case 2

According to Table 9, the mean error and standard deviation values obtained from MNM-MFF are smaller than those produced by MLP and PSO-MLP for both the training and test sets; in Case 2, MNM-MFF again produces more accurate and consistent forecasts. In addition, comparing Tables 5 and 9 shows that MLP and PSO-MLP produce worse forecasting results when the data contain an outlier, whereas MNM-MFF gives almost the same results, especially for the test set, even when the data include an outlier. In other words, the proposed approach is not affected by the outlier as much as the MLP and PSO-MLP methods. The test-set forecasts of MLP, PSO-MLP, and MNM-MFF are shown graphically in Fig. 7.

Fig. 7

The prediction results of MLP, PSO-MLP, and the proposed method for Case 2

A scatter plot of the forecasts obtained from the proposed method against the observations is shown in Fig. 8.

Fig. 8

Scatter plot of forecasts and observations for Case 2

Case 3

In the last case, an outlier value was obtained by multiplying the maximum observation, the 15th observation of the original data, by 10. Then the observation with the maximum value was replaced by this outlier. The MLP, PSO-MLP, and MNM-MFF methods were applied to the resulting data set with the same settings as in Case 1. All obtained results are presented in Tables 10, 11, 12, and 13.

Table 10 The training and testing MSE values obtained from different architectures for MLP
Table 11 The training and testing MSE values obtained from different architectures for PSO-MLP
Table 12 The training and testing MSE values obtained from different architectures for MNM-MFF
Table 13 The training and testing performances of MLP, PSO-MLP, and MNM-MFF for Case 3

When Table 13 is examined, it is observed that MNM-MFF has the minimum mean error and standard deviation for both the training and test sets; thus, in Case 3, MNM-MFF again gives the most accurate and consistent forecasting results. In light of the results obtained in all cases, it can clearly be said that the proposed MNM-MFF model is not affected by outliers as much as the MLP model. For both training and test sets, the results produced by the MLP and PSO-MLP methods are inaccurate and inconsistent in Cases 2 and 3, where the data contain an outlier. However, MNM-MFF is not affected by these extreme values as much as MLP and PSO-MLP, and it produces very similar forecasting results, especially for the test set, whether or not the data contain an outlier.

In order to compare the forecasting performances of the MLP, PSO-MLP, and MNM-MFF methods further, the average MSE values over the test sets are summarized in Table 14 for all cases. According to Table 14, MNM-MFF clearly produces more accurate out-of-sample forecasts than MLP and PSO-MLP in all cases. In Cases 2 and 3, where the data contain an outlier, the proposed MNM-MFF model gives accurate forecasts just as in Case 1, whereas the MLP and PSO-MLP models cannot produce good results. This is an important finding indicating that the MNM-MFF model is not affected by outliers as much as the MLP and PSO-MLP models for the Australian beer consumption data. The test-set forecasts of MLP, PSO-MLP, and MNM-MFF are shown graphically in Fig. 9.

Table 14 Average values obtained from MLP, PSO-MLP, and MNM-MFF for all cases
Fig. 9

The prediction results of MLP, PSO-MLP, and the proposed method for Case 3

In addition, a scatter plot of the forecasts obtained from the proposed method against the observations is given in Fig. 10.

Fig. 10

Scatter plot of forecasts and observations for Case 3

5.2 The gas furnace data

The second time series used in the implementation is the Box-Jenkins gas furnace data set [7]. In the gas furnace data, the gas flow rate x(t) is the input and the CO2 concentration y(t) is the output. Because of the characteristics of this well-known data set, when artificial neural networks have been applied to it, x(t − 4) and y(t − 1) have been taken as the inputs and y(t) as the output in all studies in the literature; a sketch of this input layout is given below. The first 146 and the last 150 observations were used for training and testing, respectively, as in other studies [24, 27]. To show the forecasting performance of the proposed MNM-MFF model, the data were forecasted using MNM-MFF and other artificial neural network models available in the literature. The 2-2-1 architecture has been used in all these studies [24, 27]; that is, the architecture has 2, 2, and 1 neurons in the input, hidden, and output layers, respectively. Thus, the same 2-2-1 architecture was used for the proposed MNM-MFF model in the implementation. MSE values calculated over both the training and test sets are summarized in Table 15. The results obtained from the back propagation single multiplicative neuron model (BP-SMN) and MLP were taken from [24]; the other results, produced by the particle swarm optimization single multiplicative neuron model (PSO-SMN), the cooperative random learning particle swarm optimization single multiplicative neuron model (CRPSO-SMN), and the genetic algorithm single multiplicative neuron model (GA-SMN), were taken from [27].
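The sketch below, assuming numpy and hypothetical array names x and y for the two series, shows this input layout.

```python
# A minimal sketch of the gas furnace input layout: inputs at time t are
# x(t-4) and y(t-1), the target is y(t). Array names are illustrative.
import numpy as np

def gas_furnace_samples(x, y):
    """Return (inputs, targets) with inputs[t] = [x(t-4), y(t-1)], target y(t)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    t = np.arange(4, len(y))                 # first usable index is t = 4
    inputs = np.column_stack([x[t - 4], y[t - 1]])
    return inputs, y[t]

# Usage with placeholder series of 296 observations (the data set's length),
# which yields 292 usable samples.
rng = np.random.default_rng(2)
X, targets = gas_furnace_samples(rng.uniform(size=296), rng.uniform(size=296))
```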

Table 15 MSE values obtained from all methods for training and test sets

When Table 15 is examined, it is clearly seen that the best forecasts for the test set were obtained with the proposed MNM-MFF model, which also produced the best result for the training set in terms of MSE. To examine the results visually, the graph of the observations (targets) and the predictions (outputs) produced by MNM-MFF is given in Fig. 11. According to Fig. 11, the proposed MNM-MFF model gives very accurate results for the gas furnace data; the agreement between the observations and the predictions of the proposed model is quite satisfactory.

Fig. 11

The prediction results of MNM-MFF for the gas furnace data

In the first case, the original gas furnace data were forecasted and MNM-MFF showed a superior forecasting performance. In the second case, the performances of the MLP, PSO-MLP, and MNM-MFF methods were compared when the data include an outlier. An outlier value was obtained by multiplying the maximum observation of the original data by 10; then the 15th observation of the original data, which is the maximum observation value, was replaced by this outlier. The resulting data set was forecasted using MLP and MNM-MFF with the same architecture as in the first case. All obtained results are shown in Table 16. According to Table 16, the proposed MNM-MFF has the best accuracy for both the training and test sets. This also indicates that the MNM-MFF model was not affected much by the outlier, whereas the outlier led the MLP model to misleading results.

Table 16 MSE values obtained from MLP, PSO-MLP, and MNM-MFF

In addition to the MSE values and the forecasting graph, a scatter plot of the forecasts obtained from the proposed method against the observations is depicted in Fig. 12.

Fig. 12

Scatter plot of forecasts and observations for the gas furnace data

6 Conclusions

In this study, a new neuron model called MNM is proposed. The proposed neuron model produces an output which is not affected much by extreme values since it employs a median-based aggregation function. In addition, a new multilayer feed forward neural network (MNM-MFF) model that consists of MNMs is first proposed in this study in order to reach a high accuracy level and to cope with the outlier problem. The proposed MNM-MFF model is a robust neural network model owing to its ability to deal with outliers. The modified particle swarm optimization method is used to train the proposed MNM-MFF model. In order to evaluate the performance of the proposed approach, it was applied to two real time series, which were also forecasted using other methods available in the literature for comparison. The comparison clearly showed that the proposed approach produces very accurate forecasts for both the Australian beer consumption and the gas furnace data sets. In addition, different data scenarios were considered in the implementation to examine the performance of the proposed approach when the data contain outliers, and the forecasts obtained from the proposed approach were compared to those produced by the MLP model, which has been the most preferred type of artificial neural network in many implementations. It was shown that the proposed model is not affected much by outliers.

To sum up, the proposed MNM-MFF model composed of MNMs provides two important advantages: it can produce very accurate forecasts, and it can be used to forecast time series which include outliers. It should be noted that these results were obtained for the parameter sets given above and the two time series examined in the study. For instance, if the length of the test set is changed, the results can change; similarly, if these parameter sets are used for other time series, the obtained results can change. Therefore, the obtained results are valid only for these parameter sets and these time series. In order to reach general conclusions, a comprehensive simulation study has to be carried out. However, it is very hard to perform such a simulation study since there are many types of time series and many parameter combinations.