Abstract
Datasets with outliers can be forecast with robust learning methods or robust artificial neural networks. In robust artificial neural networks, the architecture is made robust by using robust statistics as aggregation functions. The median neural network and the trimmed mean neural network are two such robust networks in the literature; they use the median and the trimmed mean, respectively, as aggregation functions. In this study, the Median-Pi artificial neural network is proposed as a new robust neural network for forecasting. It uses the median and a multiplicative function as aggregation functions. Because it uses the median, the proposed network can produce good results for data with outliers. The Median-Pi artificial neural network is trained by particle swarm optimization. Its performance is investigated on datasets from the International Time Series Forecasting Competition 2016 (CIF-2016) and compared, in the presence of outliers, with that of other artificial neural networks: the median neural network, the trimmed mean neural network, the Pi-Sigma neural network and the proposed robust network are applied to time series with outliers, and the obtained results are compared. According to the application results, the proposed Median-Pi artificial neural network produces better forecasts than the other network types.
1 Introduction
Multilayer perceptron artificial neural networks (MLPANNs) can be adversely affected by outliers. Because the inputs are combined in a complex way, a single outlying input can cause an excessively small or large output. If a dataset contains outliers, either a robust training algorithm or a robust artificial neural network (ANN) should be used to obtain satisfactory forecasting results.
In the literature, Chen and Jain [1] and Hsiao et al. [2] proposed robust learning algorithms based on M-estimators. Lee et al. [3] presented a robust learning algorithm for radial basis function neural networks. El-Melegy et al. [4] and Rusiecki [5] put forward least median of squares algorithms for ANNs. Thomas et al. [6] proposed a robust learning algorithm for MLPANNs. Some robust ANNs were also proposed in the literature: Bors and Pitas [7] proposed the median radial basis function ANN for datasets with outliers; Majhi et al. [8] proposed the Wilcoxon ANN as a robust neural network (NN); Aladag et al. [9] introduced the median neural network, in which the median of the incoming signals is used as the aggregation function; and Yolcu et al. [10] proposed the trimmed mean ANN.
ANNs based on the multiplicative neuron model are admissible alternatives to MLPANNs for forecasting purposes. Ghosh and Shin [11] proposed the Pi-Sigma neural network (PSNN) as a high-order neural network, and Yadav et al. [12] put forward the single multiplicative neuron model artificial neural network (SMNMANN). These ANNs can produce more accurate forecasts than MLPANNs. There are also other neuron models in the literature: Chen et al. [13] and Zhou et al. [14] used the dendritic neuron model, Attia et al. [15] proposed a generalized neuron model, Aladag et al. [9] proposed the median neuron model and Yolcu et al. [10] proposed the trimmed mean neuron model.
When the multiplicative neuron model (MNM) is used in the architecture, outliers can affect the outputs of the NN even more strongly than in NNs with an additive neuron model; Bas et al. [16] therefore proposed a robust training algorithm for the SMNMANN. While some studies use heuristic-based approaches to forecast datasets from different areas, such as Barati and Sharifian [17], others take advantage of different types of NN, such as Berenguer et al. [18] and Haviluddin [19]. Moreover, Chow and Cho [20] described new approaches to rainfall forecasting using NNs, and Cogollo and Velásquez [21] analysed the development of new forecasting models based on NNs. Thomas and Suhner [22] proposed a new pruning approach to determine the optimal structure of an NN. Beheshti et al. [23] used meta-heuristic algorithms to train an ANN to improve the accuracy of rainfall forecasting. Dey et al. [24] proposed an approach based on gene expression programming and NNs to forecast unsteady forced convection over a cylinder. Kiakojoori and Khorasani [25] addressed the health monitoring and prognosis of aircraft gas turbine engines using two different dynamic NNs, and Li [26] presented a prediction tool for traffic flow that combines an NN and a fuzzy system, called dynamic fuzzy NNs.
In this study, a new robust artificial neural network is proposed for forecasting. The new network uses the MNM and the median neuron model (MdNM) in its architecture and is called the Median-Pi artificial neural network (MdPNN). The MdPNN is trained by particle swarm optimization (PSO). The proposed network is intended for forecasting, but it can also be modified for other tasks such as classification and prediction.
In the second section of the paper, the proposed MdPNN is introduced and an algorithm is given that explains how the output is computed for a learning sample. In the third section, the training algorithm for the proposed network is introduced. Application results are given in Section 4, and the obtained results are discussed in the last section.
2 Median-Pi artificial neural network
Robust architectures for ANNs can be obtained by using robust statistics such as the median and the trimmed mean. In this paper, the Pi-Sigma neural network is modified by using the MdNM instead of the additive (sigma) neuron model. The proposed ANN is a high-order network, and thanks to the median neuron models used in its hidden layer, it is less affected by outliers in a dataset. Figure 1 shows the architecture of the proposed MdPNN.
In Fig. 1, M and Π represent the MdNM and the MNM, respectively. The architecture given in Fig. 1 is an MdPNN of order k with m inputs. The inputs of the MdPNN are lagged variables of the time series. W denotes the matrix of weights between the input and hidden layers; it has dimension m × k. There are k neurons in the hidden layer, and k is the order of the network. Median neuron models are employed in the hidden layer: the output of each hidden neuron is the median of its incoming signals. The incoming signals of the lth hidden neuron are the weighted inputs \( s_i = w_{il} x_i \) (i = 1, 2, …, m), and the output \( y_l \) (l = 1, 2, …, k) of the lth neuron is computed as in Eq. (2):

$$ y_l = \mathrm{median}\left( w_{1l} x_1, w_{2l} x_2, \ldots, w_{ml} x_m \right) + \theta_l, \qquad l = 1, 2, \ldots, k $$
The activation functions of the hidden layer neurons are linear, and θ1, θ2, …, θk are their bias terms. The output layer contains a single neuron that uses the MNM. The weights between the hidden and output layers are taken as one and the bias term as zero, so the net input of the output neuron is obtained as in Eq. (3):

$$ net = \prod_{l=1}^{k} y_l $$

The activation function of the output layer is the sigmoid function, and the output of the MdPNN is calculated as in Eq. (4):

$$ \widehat{x} = \frac{1}{1 + e^{-net}} $$
The algorithm below explains how the output is computed for a learning sample. In the algorithm, the input values of the learning sample are represented by x1 , x2 , … , xm.
Algorithm 1
Computation of output for MdPNN
Step 1: Outputs of the hidden layer neurons are computed from the incoming signals, as in Eq. (2).
Step 2: The output of the network is computed from the hidden layer outputs and the sigmoid activation function, as in Eqs. (3) and (4).
3 Training of MdPNN by particle swarm optimization
Derivative-based algorithms have commonly been used to train ANNs in the literature; back propagation is one of the most preferred for MLPANNs. Artificial intelligence optimization techniques have also been used for training NNs, and they have important advantages, such as working without derivatives and being less likely to get trapped in local optima. A training algorithm based on PSO is proposed for the MdPNN. PSO is an artificial intelligence optimization technique that can provide good results for numerical optimization problems, and it does not require the derivative of the cost function. PSO was proposed by Kennedy and Eberhart [27]; Shi and Eberhart [28] and Ma et al. [29] made some modifications to the algorithm. The training of the proposed network is performed with the modified PSO because the median function depends on the ordering of the data, so its derivatives are not easy to compute, and PSO is well suited to such objective functions. Genetic algorithms or artificial bee colony optimization could be used instead of PSO, but PSO was preferred because of its simple structure.
The training algorithm for the proposed network is given as Algorithm 2. It uses a PSO method with some minor modifications of the modified PSO in Aladag et al. [9].
Algorithm 2
Training Algorithm for MdPNN
Step 1: The parameters of the PSO are determined:
- pn: number of particles
- vmaps: upper bound for the velocities
- c1i: lower bound for the cognitive coefficient
- c1f: upper bound for the cognitive coefficient
- c2i: lower bound for the social coefficient
- c2f: upper bound for the social coefficient
- w1: lower bound for the inertia weight
- w2: upper bound for the inertia weight
- maxitr: maximum number of iterations
Step 2: Initial positions and velocities of the particles are generated.
The positions of the particles are composed of the weights and biases of the MdPNN; the structure of a particle is presented in Fig. 2. Each particle has (k × m + k) positions in total: the first k × m positions represent the weights between the input and hidden layer neurons, and the last k positions represent the biases of the hidden layer neurons. All initial positions are generated from the uniform distribution on (0, 1). Position j and velocity j of particle i are denoted by \( {P}_{i, j}^t \) and \( {V}_{i, j}^t \) (i = 1, 2, …, pn; j = 1, 2, …, k × m + k), respectively. Velocities are generated from the uniform distribution on (−vmaps, vmaps).
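The particle encoding and its decoding back into network weights can be sketched as follows; the function names `init_particles` and `decode` are illustrative, not from the paper.

```python
import numpy as np

def init_particles(pn, k, m, vmax, rng=None):
    """Generate an initial swarm for MdPNN training.

    Each particle has k*m weight positions followed by k bias positions:
    positions ~ U(0, 1), velocities ~ U(-vmax, vmax).
    """
    rng = np.random.default_rng() if rng is None else rng
    dim = k * m + k
    positions = rng.uniform(0.0, 1.0, size=(pn, dim))
    velocities = rng.uniform(-vmax, vmax, size=(pn, dim))
    return positions, velocities

def decode(particle, k, m):
    """Split one particle into the (m, k) weight matrix W and the k biases."""
    W = particle[:k * m].reshape(m, k)
    theta = particle[k * m:]
    return W, theta
```

With the paper's settings (pn = 30, order k = 2, m = 4 inputs), each particle is a vector of 2 × 4 + 2 = 10 positions.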
Step 3: Fitness values for each particle are calculated.
To calculate the outputs of the network, Algorithm 1 is applied to each learning sample in the training set. The targets and outputs are denoted by xt and \( {\widehat{x}}_t \), respectively. The root mean squared error (RMSE) over the training set, given in Eq. (7), is used as the fitness function:

$$ RMSE_i = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( x_t - \widehat{x}_t^{\, i} \right)^2} $$

where \( {\widehat{x}}_t^i \) represents the output of the network for time t obtained from particle i and n is the number of learning samples in the training set.
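Eq. (7) is the standard RMSE; a minimal sketch of the fitness computation:

```python
import numpy as np

def rmse(targets, outputs):
    """Root mean squared error over the training set (fitness function)."""
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return np.sqrt(np.mean((targets - outputs) ** 2))
```

A particle's fitness is simply `rmse` applied to the training targets and the outputs produced by the network decoded from that particle.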
Step 4: Pbest and Gbest are determined.
Pbest is a matrix whose elements are the positions corresponding to each particle's best individual performance, and Gbest is the particle with the best fitness value found so far. In the first iteration, Pbest equals the initial positions of the particles and Gbest is the best of these particles.
- \( {Pb}_{i, j}^t \): Pbest value for the ith particle, jth position in the tth iteration
- \( {Pg}_j^t \): Gbest value for the jth position in the tth iteration
Step 5: The velocities and positions are updated.
First, the cognitive and social coefficients and the inertia weight are calculated using Eqs. 8–10; they vary between their lower and upper bounds with the iteration counter. Then, the velocities and positions are updated using Eqs. 11–13.
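The update step can be sketched as below. Since Eqs. 8–13 are not reproduced here, the sketch assumes the standard time-varying PSO: the cognitive coefficient decreases from c1i to c1f, the social coefficient increases from c2i to c2f, and the inertia weight moves linearly between w2 and w1 over the iterations; the exact functional forms in the paper may differ.

```python
import numpy as np

def pso_update(P, V, Pbest, Gbest, t, maxitr,
               c1i=2.0, c1f=1.0, c2i=1.0, c2f=2.0,
               w1=0.4, w2=0.9, vmax=1.0):
    """One PSO velocity/position update (assumed forms of Eqs. 8-13)."""
    frac = t / maxitr
    c1 = c1i + (c1f - c1i) * frac          # cognitive coefficient (Eq. 8, assumed)
    c2 = c2i + (c2f - c2i) * frac          # social coefficient (Eq. 9, assumed)
    w = w2 - (w2 - w1) * frac              # inertia weight (Eq. 10, assumed)
    r1 = np.random.rand(*P.shape)          # random factors in (0, 1)
    r2 = np.random.rand(*P.shape)
    V = w * V + c1 * r1 * (Pbest - P) + c2 * r2 * (Gbest - P)
    V = np.clip(V, -vmax, vmax)            # bound velocities by vmaps
    return P + V, V
```

The velocity clipping corresponds to the vmaps bound in Step 1; without it, velocities can grow and the swarm diverges.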
Step 6: Fitness values for each particle are calculated.
This step is performed as in Step 3.
Step 7: Pbest and Gbest are updated.
Step 8: The stopping criterion is checked.
The algorithm stops when the maximum number of iterations is reached or when the RMSE value for Gbest falls below a predetermined threshold; otherwise, it returns to Step 5.
4 Application results
The forecasting performance of the proposed network was first investigated on real time series from CIF-2016, which contains 72 monthly time series with different numbers of observations. The first 20 time series, used in this paper, have 108 observations each and a seasonal component. The graphs of the time series are given in Figs. 3, 4, 5, 6 and 7.
The observation dates of the time series cannot be given in these figures because they are not declared in CIF-2016. The series clearly have different properties: linear trend, upward trend, downward trend, quadratic trend, seasonality and structural breaks can all be seen in Figs. 3, 4, 5, 6 and 7.
The results of the proposed MdPNN were compared with those of MdANN and TrMANN, which are robust methods, and with PSNN, the network on which the proposed model is based. In addition, 20 contaminated time series were obtained by adding outliers: in the contamination process, five times or 10 times the maximum observation was added to a time series value. After contamination, all time series were analysed using MdPNN, MdANN, TrMANN and PSNN. All networks were trained with PSO, with parameters pn = 30, vmaps = 1, c1i = 2, c1f = 1, c2i = 1, c2f = 2, w1 = 0.4, w2 = 0.9 and maxitr = 200. In all applications, the first 96 observations were used as the training set and the last 12 observations as the test set.
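The contamination process described above can be sketched as follows. The function name is illustrative, and the position of the injected outlier is an assumption, as the paper does not state which observation is contaminated.

```python
import numpy as np

def contaminate(series, factor=5, index=None, rng=None):
    """Inject one outlier by adding `factor` times the series maximum
    to a single observation (position chosen at random if not given).
    """
    rng = np.random.default_rng() if rng is None else rng
    series = np.asarray(series, dtype=float).copy()
    if index is None:
        index = int(rng.integers(0, len(series)))
    series[index] += factor * series.max()
    return series
```

With factor = 5 or factor = 10, this reproduces the two contamination levels used in the experiments.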
The inputs of the neural networks were taken as (xt − 1, xt − 2), (xt − 1, xt − 2, xt − 3, xt − 4) or (xt − 1, xt − 2, …, xt − 12); that is, m was taken as 2, 4 or 12. The order of the networks (the number of hidden layer neurons) was taken as 2. As a result of this experimental design, there are six possible situations for each dataset, given in Table 1.
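Building the learning samples from a time series with m lagged inputs can be sketched as below; the helper name `lagged_samples` is illustrative.

```python
import numpy as np

def lagged_samples(series, m):
    """Build (input, target) pairs from a time series: the inputs of
    sample t are (x_{t-1}, x_{t-2}, ..., x_{t-m}) and the target is x_t.
    """
    series = np.asarray(series, dtype=float)
    X = np.column_stack([series[m - j - 1 : len(series) - j - 1]
                         for j in range(m)])
    y = series[m:]
    return X, y
```

For m = 12, each sample covers a full year of monthly lags, which matches the seasonal period of the CIF-2016 series used here.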
For each time series, the NNs were trained 50 times with random initial weights in all cases. RMSE values were computed on the test sets. The means and minima of the RMSE values over the 50 repetitions are given in Tables 2 and 3 for all time series. The best results in terms of the mean and minimum statistics are highlighted in bold for the five-times and 10-times outliers in Tables 2 and 3. Moreover, the success rates of the models in Tables 2 and 3 are summarized in Tables 4 and 5. The detailed tables are given in the supplementary files.
Table 4 presents the success rates with regard to the mean statistic: the MdPNN has the best performance for 17 of the 20 time series (85% success rate) for the five-times outlier and for nine of the 20 time series (45% success rate) for the 10-times outlier. Table 5 presents the success rates with regard to the minimum statistic: the MdPNN has the best performance for 19 of the 20 time series (95%) for the five-times outlier and for 13 of the 20 time series (65%) for the 10-times outlier. These success rates are visualized in Figs. 8 and 9.
The Kruskal–Wallis H test was applied to the minimum statistics, and the results are given in Tables 6 and 7. The probability value is smaller than 0.05 in Table 6 and smaller than 0.10 in Table 7, so there are differences among the applied methods in each case. Moreover, the proposed method has the minimum median value.
Secondly, the Australian beer consumption series (Janacek [30], p. 84) between 1956 Q1 and 1994 Q1 was used to examine the performance of the proposed method. This time series is a well-known benchmark in the time series forecasting literature. The last 16 observations were used as the test set. The graph of the time series is given in Fig. 10.
The Australian beer consumption data were contaminated by adding an outlier (five times the maximum value) to the training data. The contaminated series was analysed with the proposed network, and the results were compared with those of the methods reported in Bas et al. [16]. The summarized RMSE values are given in Table 8.
In Table 8, MNM-BP-ANN denotes the SMNMANN trained with the back propagation learning algorithm, MNM-PSO-ANN the SMNMANN trained with PSO, and R-MNM-ANN the robust SMNMANN. The best result was obtained with the proposed MdPNN.
5 Discussion and conclusions
Several kinds of NN have been used successfully for time series forecasting for many years. Nevertheless, some issues remain to be solved, one of which is how the performance of the models is affected when the datasets contain outliers. In this paper, a new robust artificial neural network is introduced for time series forecasting. The proposed NN is a high-order network as well as having a robust architecture: in the MdPNN, the median neuron model and the multiplicative neuron model are combined. According to the application results, the MdPNN provides better forecasting performance than the other robust NNs in the literature. In particular, the MdPNN outperforms the other methods for the smaller outlier (five times the maximum value). For the outlier obtained by injecting 10 times the maximum observation, the success rates of the MdPNN drop to 45% (mean) and 65% (minimum), but they are still the best among the compared methods.
The performance of the ANN can be improved by using different robust statistics. From this point of view, in future studies, robust statistics such as the trimmed mean can be used to modify the proposed neural network. Moreover, the training algorithm of the proposed method can be adapted to handle a large number of outliers.
References
Chen DS, Jain RC (1994) A robust backpropagation learning algorithm for function approximation. IEEE Trans Neural Netw 5:467–479. doi:10.1109/72.286917
Hsiao C-C, Chuang C-C, Jeng J-T (2012) Robust back propagation learning algorithm based on near sets. In: 2012 International Conference on System Science and Engineering (ICSSE), June 30–July 2, 2012, Dalian, China, pp 19–23
Lee CC, Chung PC, Tsai JR, Chang CI (1999) Robust radial basis function neural networks. IEEE Trans Syst Man, Cybern Part B Cybern 29:674–685. doi:10.1109/3477.809023
El-Melegy MT, Essai MH, Ali AA (2009) Robust training of artificial feedforward neural networks. In: Foundations of Computational Intelligence Volume 1. Studies in Computational Intelligence, vol 201. Springer, pp 217–242
Rusiecki A (2012) Robust learning algorithm based on iterative least median of squares. Neural Process Lett 36:145–160
Thomas P, Bloch G, Sirou F, Eustache V (1999) Neural modeling of an induction furnace using robust learning criteria. Integrated Computer-Aided Engineering 6(1):15–26
Bors AG, Pitas I (1996) Median radial basis function neural network. IEEE Trans Neural Netw. doi:10.1109/72.548164
Majhi B, Rout M, Majhi R et al (2012) New robust forecasting models for exchange rates prediction. Expert Syst Appl 39:12658–12670. doi:10.1016/j.eswa.2012.05.017
Aladag CH, Egrioglu E, Yolcu U (2014) Robust multilayer neural network based on median neuron model. Neural Comput Appl. doi:10.1007/s00521-012-1315-5
Yolcu U, Bas E, Egrioglu E, Aladag CH (2015) A new multilayer feed forward network model based on trimmed mean neuron model. Neural Netw World J 25:587–602
Ghosh J, Shin Y (1992) Efficient higher-order neural networks for classification and function approximation. Int J Neural Syst 3:323–350
Yadav RN, Kalra PK, John J (2007) Time series prediction with single multiplicative neuron model. Appl Soft Comput J 7:1157–1163. doi:10.1016/j.asoc.2006.01.003
Chen W, Sun J, Gao S, Cheng J-J, Wang J, Todo Y (2017) Using a single dendritic neuron to forecast tourist arrivals to Japan. IEICE Trans Inf Syst E100D(1):190–202
Zhou T, Gao S, Wang J, Chu C, Todo Y, Tang Z (2016) Financial time series prediction using a dendritic neuron model. Knowl-Based Syst 105:214–224
Attia MA, Sallam EA, Fahmy MM (2012) A proposed generalized mean single multiplicative neuron model. In: Proceedings of the 2012 IEEE 8th International Conference on Intelligent Computer Communication and Processing (ICCP 2012), pp 73–78
Bas E, Uslu VR, Egrioglu E (2016) Robust learning algorithm for multiplicative neuron model artificial neural networks. Expert Syst Appl 56:80–88. doi:10.1016/j.eswa.2016.02.051
Barati M, Sharifian S (2015) A hybrid heuristic-based tuned support vector regression model for cloud load prediction. J Supercomput 71:4235–4259. doi:10.1007/s11227-015-1520-y
Berenguer TM, Berenguer JAM, García MEB et al (2015) Models of artificial neural networks applied to demand forecasting in nonconsolidated tourist destinations. Methodol Eur J Res Methods Behav Soc Sci 11:35–44. doi:10.1027/1614-2241/a000088
Haviluddin H (2015) Time series prediction using radial basis function neural network. Int J Electr Comput Eng 4:31–37
Chow TWS, Cho SY (1997) Development of a recurrent Sigma-Pi neural network rainfall forecasting system in Hong Kong. Neural Comput Appl 5:66–75. doi:10.1007/BF01501172
Cogollo MR, Velásquez JD (2014) Methodological advances in artificial neural networks for time series forecasting. IEEE Lat Am Trans 12:764–771. doi:10.1109/TLA.2014.6868881
Thomas P, Suhner M-C (2015) A new multilayer perceptron pruning algorithm for classification and regression applications. Neural Process Lett 42:437–458. doi:10.1007/s11063-014-9366-5
Beheshti Z, Firouzi M, Shamsuddin SM et al (2016) A new rainfall forecasting model using the CAPSO algorithm and an artificial neural network. Neural Comput Appl 27:2551–2565. doi:10.1007/s00521-015-2024-7
Dey P, Sarkar A, Das AK (2016) Development of GEP and ANN model to predict the unsteady forced convection over a cylinder. Neural Comput Appl 27:2537–2549. doi:10.1007/s00521-015-2023-8
Kiakojoori S, Khorasani K (2016) Dynamic neural networks for gas turbine engine degradation prediction, health monitoring and prognosis. Neural Comput Appl 27:2157–2192. doi:10.1007/s00521-015-1990-0
Li H (2016) Research on prediction of traffic flow based on dynamic fuzzy neural networks. Neural Comput Appl 27:1969–1980. doi:10.1007/s00521-015-1991-z
Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, vol 4, pp 1942–1948
Shi Y, Eberhart RC (1999) Empirical study of particle swarm optimization. In: Proceedings of the 1999 Congress on Evolutionary Computation (CEC 99), vol 3, pp 1945–1950. doi:10.1109/CEC.1999.785511
Ma Y, Jiang C, Hou Z, Wang C (2006) The formulation of the optimal strategies for the electricity producers based on the particle swarm optimization algorithm. IEEE Trans Power Syst 21:1663–1671
Janacek G (2001) Practical time series. Oxford University Press, New York, p 156
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Electronic supplementary material
ESM 1
(DOCX 19 kb)
Cite this article
Egrioglu, E., Yolcu, U., Bas, E. et al. Median-Pi artificial neural network for forecasting. Neural Comput & Applic 31, 307–316 (2019). https://doi.org/10.1007/s00521-017-3002-z