1 Introduction

Various approaches have been used for forecasting in the literature. Conventional forecasting methods can be insufficient for real-life time series since they require certain assumptions to be satisfied [1]. Because artificial neural networks require no assumptions such as linearity or normality, they are applicable in many fields [4]. In recent years, artificial neural networks have become an effective way to forecast time series [5], and they have been used successfully for forecasting in various applications [2].

Various neuron models have been proposed in the literature [6, 14, 16, 17, 26]. The most preferred artificial neural network type is the multilayer perceptron (MLP) introduced by Rumelhart et al. [18]. When real-life problems are solved using standard artificial neural networks such as MLP, a large number of neurons is required in the architecture [20]. A neuron exploiting higher-order statistics can produce a superior neural network with comparatively fewer neurons [20]. Thus, higher-order neural networks have been suggested by Chaturvedi et al. [8], Giles and Maxwell [9], Homma and Gupta [11], Sinha et al. [21], and Taylor and Commbes [22]. Higher-order neurons have demonstrated improved computational power and generalization ability. However, these models are difficult to train because of a combinatorial explosion of higher-order terms as the number of inputs to the neuron increases [20]. In addition, it is well known that the forecasting performance of MLP is negatively affected when the data include outliers [10, 25]. In recent years, artificial neural network models based on neuron models such as the generalized-mean neuron (GMN) [23], the geometric mean neuron (G-MN) [20], and the single multiplicative neuron (SMN) [24] have been proposed as alternatives to MLP. Like MLP, these models can also be negatively affected by outliers since their aggregation functions are based on means.

In this study, a median neuron model (MNM) is first introduced, and a new feed forward neural network approach based on MNM is proposed in order to deal with the outlier problem. In the proposed MNM, unlike other neuron models, an aggregation function based on the median, which is not affected much by outliers, is employed instead of functions based on summation or mean. Unlike other measures of location such as the mean, the median is not affected much by outliers in a data set. Using a median-based aggregation function therefore prevents the neuron from producing an extreme output value for an outlier input value. As a result, the MNM-MFF model, which consists of MNMs, is a robust multilayer neural network approach that is not much affected by outliers. In the training process of the MNM-MFF model, it is very hard to obtain the derivative of the cost function with respect to the weights of the model since median-based aggregation functions are used; this means the back propagation learning algorithm cannot easily be used to determine the best values of the weights. Therefore, the modified particle swarm optimization method [3] is utilized to train the MNM-MFF model. To assess the forecasting performance of the proposed MNM-MFF model, it was applied to two well-known real time series, Australian beer consumption and the Box-Jenkins gas furnace data. In addition, different data scenarios were considered in the implementation to examine the performance of the proposed approach in more detail. Furthermore, other forecasting models available in the literature were used for comparison.

The remainder of the paper is organized as follows. The modified particle swarm optimization method used as a learning algorithm to train the MNM-MFF model is briefly summarized in the next section. MNM is introduced in Sect. 3. In Sect. 4, the MNM-MFF model is described and it is shown how the modified particle swarm optimization method is employed to train it. Section 5 presents the implementation and the obtained results. Finally, the results are discussed in the last section.

2 The modified particle swarm optimization (MPSO)

Particle swarm optimization is a population-based heuristic algorithm first proposed by Kennedy and Eberhart [13]. A distinguishing feature of this heuristic is that it simultaneously examines different points in different regions of the solution space to obtain the global optimum solution; local optimum traps can therefore be avoided. In this study, MPSO was used to train the MNM-MFF model. Detailed information about the MPSO method can be found in [3]. The MPSO algorithm has a time-varying inertia weight as in [19] and, similarly, time-varying acceleration coefficients as in [15].

Algorithm 1

The modified particle swarm optimization

Step 1

The positions of each particle k (k = 1, 2, …, pn) are randomly determined and kept in a vector \( X_k \) given as follows:

$$ X_{k} = \left\{ {x_{k1} ,x_{k2} , \ldots ,x_{kd} } \right\},\quad k = 1,2, \ldots ,pn $$
(1)

where \( x_{ki} \) (i = 1, 2, …, d) represents the ith position of the kth particle, and pn and d represent the number of particles in the swarm and the number of positions, respectively.

Step 2

Velocities are randomly determined and stored in a vector \( V_k \) given below.

$$ V_{k} = \left\{ {v_{k1} ,v_{k2} , \ldots ,v_{kd} } \right\},\quad k = 1,2, \ldots ,pn $$
(2)

Step 3

According to the evaluation function, the Pbest and Gbest particles given in (3) and (4), respectively, are determined.

$$ Pbest_{k} = \left( {p_{k,1} ,p_{k,2} , \ldots ,p_{k,d} } \right),\quad k = 1,2, \ldots ,pn $$
(3)
$$ Gbest = \left( {p_{g,1} ,p_{g,2} , \ldots ,p_{g,d} } \right) $$
(4)

where \( Pbest_k \) is a vector that stores the positions corresponding to the kth particle’s best individual performance, and Gbest is the particle with the best evaluation function value found so far, with g denoting the index of that best particle.

Step 4

Let \( c_1 \) and \( c_2 \) represent the cognitive and social coefficients, respectively, and let w be the inertia parameter. Let \( (c_{1i}, c_{1f}) \), \( (c_{2i}, c_{2f}) \), and \( (w_1, w_2) \) be the intervals that include the possible values for \( c_1 \), \( c_2 \), and w, respectively. At each iteration, these parameters are calculated by using the formulas given in (5), (6), and (7).

$$ c_{1} = \left( c_{1f} - c_{1i} \right)\frac{t}{\max t} + c_{1i} $$
(5)
$$ c_{2} = \left( c_{2f} - c_{2i} \right)\frac{t}{\max t} + c_{2i} $$
(6)
$$ w = \left( {w_{2} - w_{1} } \right)\frac{\max t - t}{\max t} + w_{1} $$
(7)

where maxt and t represent the maximum and current iteration numbers, respectively, and the subscripts f and i denote final and initial values.
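As an illustration, the schedules in (5), (6), and (7) can be computed as in the following sketch. The function name and the default interval endpoints are illustrative assumptions, not values prescribed by the text or by [3].

```python
# A minimal sketch of the time-varying coefficient schedules in Eqs. (5)-(7).
# Default interval endpoints are assumed for illustration only.

def mpso_coefficients(t, maxt, c1i=2.5, c1f=0.5, c2i=0.5, c2f=2.5, w1=0.4, w2=0.9):
    """Return (c1, c2, w) at iteration t out of maxt iterations."""
    c1 = (c1f - c1i) * t / maxt + c1i       # Eq. (5): cognitive coefficient
    c2 = (c2f - c2i) * t / maxt + c2i       # Eq. (6): social coefficient
    w = (w2 - w1) * (maxt - t) / maxt + w1  # Eq. (7): inertia weight
    return c1, c2, w

print(mpso_coefficients(0, 100))    # (2.5, 0.5, 0.9): exploratory at the start
print(mpso_coefficients(100, 100))  # (0.5, 2.5, 0.4): exploitative at the end
```

With these assumed endpoints, the cognitive coefficient and inertia weight shrink over the iterations while the social coefficient grows, shifting the swarm from exploration toward exploitation.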

Step 5

Values of velocities and positions are updated by using the formulas given in (8) and (9), respectively.

$$ v_{i,d}^{t + 1} = \left[ {w \times v_{i,d}^{t} + c_{1} \times {\text{rand}}_{1} \times \left( {p_{i,d} - x_{i,d} } \right) + c_{2} \times {\text{rand}}_{2} \times \left( {p_{g,d} - x_{i,d} } \right)} \right] $$
(8)
$$ x_{i,d}^{t + 1} = x_{i,d} + v_{i,d}^{t + 1} $$
(9)

where \( rand_1 \) and \( rand_2 \) are random values drawn from the interval [0, 1].

Step 6

Steps 3 to 5 are repeated until a predetermined maximum iteration number (maxt) is reached.
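The six steps above can be condensed into a short program. The sketch below, assuming numpy, is a minimal self-contained rendering of Algorithm 1 for a generic minimization problem; the sphere evaluation function and all parameter values are illustrative assumptions rather than settings taken from [3].

```python
# A minimal sketch of Algorithm 1 (MPSO) for minimizing a generic function.
import numpy as np

def mpso(evaluate, d, pn=30, maxt=200, vm=1.0,
         c1i=2.5, c1f=0.5, c2i=0.5, c2f=2.5, w1=0.4, w2=0.9, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, (pn, d))            # Step 1: random positions
    V = rng.uniform(-vm, vm, (pn, d))         # Step 2: random velocities
    pbest = X.copy()                          # Step 3: Pbest and Gbest
    pbest_val = np.array([evaluate(x) for x in X])
    g = pbest_val.argmin()
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for t in range(maxt):                     # Step 6: iterate until maxt
        # Step 4: time-varying coefficients, Eqs. (5)-(7)
        c1 = (c1f - c1i) * t / maxt + c1i
        c2 = (c2f - c2i) * t / maxt + c2i
        w = (w2 - w1) * (maxt - t) / maxt + w1
        # Step 5: velocity and position updates, Eqs. (8)-(9)
        r1 = rng.uniform(size=(pn, d))
        r2 = rng.uniform(size=(pn, d))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        V = np.clip(V, -vm, vm)               # keep velocities within (-vm, vm)
        X = X + V
        # Step 3 (repeated): refresh Pbest and Gbest
        vals = np.array([evaluate(x) for x in X])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = X[better], vals[better]
        g = pbest_val.argmin()
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Usage: minimize the 5-dimensional sphere function.
best, best_val = mpso(lambda x: float(np.sum(x ** 2)), d=5)
```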

3 Median neuron model

The first artificial neuron model was proposed by McCulloch and Pitts [16]; since then, various neuron models have been proposed in the literature. One of the most preferred types of artificial neural networks is MLP. In each neuron of an MLP, the function in (10) is generally employed as the aggregation function.

$$ net\left( {x_{j} ,w_{j} } \right) = \sum\limits_{j = 1}^{N} {w_{j} x_{j} } + w_{0} $$
(10)

where \( x_j \) and \( w_j \) (j = 1, 2, …, N) represent the input signals and weights, respectively, N is the number of input signals, and \( w_0 \) is the weight for the bias. It is clear that the neuron model given in (10) is negatively affected by input signals containing outliers since this model is based on the summation operation. Mean-based neuron models such as GMN and G-MN have the same problem since the mean is also negatively affected by outliers. In this study, a new neuron model, MNM, in which the median is utilized as the aggregation function, is proposed to deal with the outlier problem. Since the median is not affected much by outliers, neither is MNM. MNM is illustrated in Fig. 1.

Fig. 1

MNM

In Fig. 1, y and f represent the output signal of the neuron and the activation function, respectively. The bias value is 1, as seen from the figure. net is the activation value obtained from the aggregation function and is calculated as follows:

$$ net = Median(w_{1} x_{1} ,w_{2} x_{2} , \ldots ,w_{N} x_{N} , w_{0} ) $$
(11)

where \( x_j \) and \( w_j \) (j = 1, 2, …, N) represent the input signals and weights, respectively, N is the number of input signals, and \( w_0 \) is the weight for the bias.
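A small numeric sketch of Eq. (11), assuming numpy and a logistic activation, illustrates the robustness argument: replacing one input with an extreme value leaves the median of the weighted terms, and hence the neuron output, unchanged in this example, whereas the summation in (10) would be dominated by it. All weights and inputs below are illustrative.

```python
# A minimal sketch of a single MNM, Eq. (11), with a sigmoid activation.
import numpy as np

def mnm(x, w, w0):
    """Median neuron: net = Median(w1*x1, ..., wN*xN, w0), y = f(net)."""
    net = np.median(np.append(w * x, w0))  # median-based aggregation, Eq. (11)
    return 1.0 / (1.0 + np.exp(-net))      # sigmoid activation f

w = np.array([0.3, 0.5, 0.2, 0.4])
w0 = 0.1
x_clean = np.array([1.0, 2.0, 1.5, 1.0])
x_outlier = np.array([1.0, 2.0, 1.5, 100.0])   # one extreme input value

# Both calls print 0.574...: the weighted terms are (0.3, 1.0, 0.3, 0.4, 0.1)
# versus (0.3, 1.0, 0.3, 40.0, 0.1), and the median is 0.3 in both cases.
# A summation-based net would jump from 2.1 to 41.7.
print(mnm(x_clean, w, w0), mnm(x_outlier, w, w0))
```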

4 Multilayer feed forward network with median neuron model

The MNM-MFF model proposed in this study is a multilayer feed forward neural network model composed of MNMs. The architecture of the proposed MNM-MFF model, which has N and M neurons in its input and hidden layers, respectively, is shown in Fig. 2. As seen from this figure, the input and output vectors of the model are \( X = \left[ {x_{1} ,x_{2} , \ldots ,x_{N} } \right] \) and [y], respectively.

Fig. 2

The architecture of MNM-MFF

If \( w_{h_{ij}} \) is the weight that connects the ith hidden neuron with the jth input, the activation value of the ith hidden neuron can be given as

$$ net_{{h_{i} }} = Median\left( {w_{{h_{i1} }} x_{1} ,w_{{h_{i2} }} x_{2} , \ldots ,w_{{h_{iN} }} x_{N} , w_{{h_{i0} }} } \right),\quad i = 1,2, \ldots ,M $$
(12)

where \( w_{h_{i0}} \) is the weight for the bias of the ith hidden neuron. The nonlinear transformation performed by each of the M hidden neurons is given as

$$ y_{{h_{i} }} = f\left( {net_{{h_{i} }} } \right),\quad i = 1,2, \ldots ,M $$
(13)

where f denotes a sigmoid function. Similarly, the activation value and the output of the neuron in the output layer are given in (14) and (15).

$$ net = Median\left( {w_{{o_{1} }} y_{{h_{1} }} ,w_{{o_{2} }} y_{{h_{2} }} , \ldots ,w_{{o_{M} }} y_{{h_{M} }} , w_{{o_{0} }} } \right) $$
(14)
$$ y = f(net) $$
(15)

where \( w_{o_{i}} \) is the weight that connects the ith neuron of the hidden layer to the neuron of the output layer, and \( w_{o_{0}} \) is the weight for the bias of the output neuron.
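A minimal sketch of the forward pass in (12)-(15), assuming numpy, is given below. The function name and the argument shapes (a hidden weight matrix with a trailing bias column, and an output weight vector with a trailing bias entry) are illustrative assumptions.

```python
# A minimal sketch of the MNM-MFF forward pass, Eqs. (12)-(15).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mnm_mff_forward(x, Wh, wo):
    """x: length-N input; Wh: (M, N+1) hidden weights (last column = bias);
    wo: length-(M+1) output weights (last entry = bias)."""
    M = Wh.shape[0]
    yh = np.empty(M)
    for i in range(M):
        # Eq. (12): median over the weighted inputs and the bias weight
        net_h = np.median(np.append(Wh[i, :-1] * x, Wh[i, -1]))
        yh[i] = sigmoid(net_h)                            # Eq. (13)
    net = np.median(np.append(wo[:-1] * yh, wo[-1]))      # Eq. (14)
    return sigmoid(net)                                   # Eq. (15)

# Usage with an assumed 4-1-1 architecture (N = 4, M = 1):
rng = np.random.default_rng(1)
y = mnm_mff_forward(rng.uniform(size=4), rng.uniform(size=(1, 5)),
                    rng.uniform(size=2))
```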

To train the MNM-MFF model whose architecture is shown in Fig. 2, the MPSO method presented in Sect. 2 is utilized. A particle in MPSO consists of positions which are the weights of MNM-MFF. The structure of a particle is presented in Fig. 3.

Fig. 3

Structure of a particle in MPSO

In the MPSO algorithm, the mean square error (MSE) is employed as the evaluation function. The formula in (16) is used to calculate the MSE value.

$$ {\text{MSE}} = \frac{1}{n}\sum\limits_{t = 1}^{n} {({\text{output}}_{t} - {\text{target}}_{t} )^{2} } $$
(16)

where n represents the number of learning samples. A sketch of how a particle is decoded and scored is given below; the MPSO algorithm used in the training process of MNM-MFF then follows.
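As illustrated in Fig. 3, a particle is simply the flattened weight vector of the network. The following sketch, assuming numpy and reusing the hypothetical mnm_mff_forward function sketched above, decodes such a particle and evaluates Eq. (16) over the learning samples; the shapes are illustrative assumptions.

```python
# A minimal sketch of the evaluation function, Eq. (16): decode a particle
# into MNM-MFF weights and return the MSE over the learning samples.
# Requires the mnm_mff_forward sketch from Sect. 4 and numpy.
import numpy as np

def particle_mse(particle, inputs, targets, N, M):
    """particle: length M*(N+1) + M + 1; inputs: (n, N); targets: length n."""
    Wh = particle[:M * (N + 1)].reshape(M, N + 1)  # hidden weights and biases
    wo = particle[M * (N + 1):]                    # output weights and bias
    outputs = np.array([mnm_mff_forward(x, Wh, wo) for x in inputs])
    return float(np.mean((outputs - targets) ** 2))  # Eq. (16)
```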

Algorithm 2

The modified particle swarm optimization to train MNM-MFF

Step 1

The parameters of the method are determined.

In the first step, the parameters which direct the MPSO algorithm are determined. These parameters are pn, vm, \( c_{1i} \), \( c_{1f} \), \( c_{2i} \), \( c_{2f} \), \( w_1 \), and \( w_2 \), which were given in Sect. 2; here −vm and vm denote the minimum and maximum velocities.

Step 2

Initial values of positions and velocities are determined.

The initial positions and velocities of each particle in the swarm are randomly generated from uniform distributions on (0, 1) and (−vm, vm), respectively.

Step 3

Evaluation function values are computed.

Evaluation function values for each particle are calculated. The evaluation function is the MSE whose formula is given in (16).

Step 4

\( Pbest_k \) (k = 1, 2, …, pn) and Gbest are determined according to the evaluation function values calculated in the previous step.

\( Pbest_k \) is a vector that stores the positions corresponding to the kth particle’s best individual performance, and Gbest is the particle with the best evaluation function value found so far.

Step 5

The parameters are updated.

The updated values of the cognitive coefficient \( c_1 \), the social coefficient \( c_2 \), and the inertia parameter w are calculated using the formulas given in (5), (6), and (7).

Step 6

New values of positions and velocities are calculated.

New values of positions and velocities for each particle are computed by using the formulas given in (8) and (9). If the maximum iteration number has not been reached, the algorithm returns to Step 3; otherwise, it proceeds to Step 7.

Step 7

The optimal solution is determined.

The elements of Gbest are taken as the optimal weight values of the MNM-MFF.

5 The application

In order to evaluate the forecasting performance of the MNM-MFF model, it was applied to two well-known real time series, Australian beer consumption and the Box-Jenkins gas furnace data. When MNM-MFF and the other neural network models were used for forecasting, lagged variables of the time series were taken as the model inputs, as in the sketch below.
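As a hedged sketch of this input construction, the helper below (a hypothetical name, assuming numpy) turns a univariate series into lagged input rows and one-step-ahead targets.

```python
# A minimal sketch: inputs are the p previous observations, the target is
# the current observation. Names and the example series are illustrative.
import numpy as np

def make_lagged(series, p):
    """Return (inputs, targets): inputs[t] = series[t:t+p], target = series[t+p]."""
    series = np.asarray(series, dtype=float)
    inputs = np.array([series[t:t + p] for t in range(len(series) - p)])
    targets = series[p:]
    return inputs, targets

# Usage: with p = 4 lags (the period of the quarterly beer series), each
# input row holds one year of past observations.
X, y = make_lagged(np.arange(20.0), p=4)
```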

5.1 Australian beer consumption data

The first time series is the quarterly Australian beer consumption [12] between 1956 Q1 and 1994 Q1. The graph of this time series is shown in Fig. 4. The last 16 observations of the series were used as the test set. For the Australian beer consumption data, three scenarios were examined to assess the performance of the proposed approach when the data contain outliers. In the first case, the seasonal autoregressive integrated moving average (SARIMA), Winter’s multiplicative exponential smoothing (WMES), MLP, and MNM-MFF methods were applied to the original data set between 1956 Q1 and 1994 Q1. In the second case, an outlier value was obtained by multiplying the maximum observation of the original data by 5 and was then added to the data. In the last scenario, the outlier value was obtained by multiplying the maximum observation of the original data by 10 before being added to the data.

Fig. 4

Quarterly Australian beer consumption, 1956 Q1–1994 Q1

Case 1

The original Australian beer consumption data were forecasted using SARIMA, WMES, MLP, and the proposed MNM-MFF model. When both MLP and MNM-MFF were employed, the number of inputs was varied between 1 and 4, since the period of this time series is 4, and for both methods the number of neurons in the hidden layer was varied from 1 to 8. Thus, 32 different architectures were examined for each neural network method to determine the best one. The best architectures, together with the best models and the corresponding MSE values obtained from all methods, are summarized in Table 1.

Table 1 The MSE values calculated over the test set for Case 1

When Table 1 is examined, it is seen that the artificial neural network methods MLP, PSO-MLP, and MNM-MFF produced better forecasts in terms of MSE than the conventional methods, and that the proposed MNM-MFF method has the best forecasting accuracy. When the data were forecasted with the proposed method, 4-1-1 was found to be the best architecture; that is, the best architecture has 4, 1, and 1 neurons in the input, hidden, and output layers, respectively.

To examine the performance of MNM-MFF in more detail, MNM-MFF, PSO-MLP, and MLP were compared. For the training and test sets, the results obtained from 16 architectures are given in Tables 2, 3 and 4. The mean and standard deviation of the MSE values calculated over these 16 architectures are presented in Table 5.

Table 2 The training and testing MSE values obtained from different architectures for MLP
Table 3 The training and testing MSE values obtained from different architectures for PSO-MLP
Table 4 The training and testing MSE values obtained from different architectures for MNM-MFF
Table 5 The training and testing performances of MLP, PSO-MLP, and MNM-MFF for Case 1

As seen from Table 5, for both the training and test sets, the mean errors produced by MNM-MFF are smaller than those obtained from MLP and PSO-MLP. Furthermore, MNM-MFF has the minimum standard deviation values for both sets; in other words, MNM-MFF and PSO-MLP give more consistent results than MLP. Therefore, it is clear that MNM-MFF produces the most accurate forecasts for the Australian beer consumption data. The test-set forecasts of MLP, PSO-MLP, and MNM-MFF are shown graphically in Fig. 5.

Fig. 5

The prediction results of MLP, PSO-MLP, and the proposed method for Case 1

In addition to the MSE values and the forecasting graph, a scatter plot of the forecasts obtained from the proposed method against the observations is depicted in Fig. 6.

Fig. 6

Scatter plot of forecasts and observations for Case 1

Case 2

Case 1 showed that MNM-MFF has a superior forecasting performance. In the second case, the performances of the MLP, PSO-MLP, and MNM-MFF methods were compared when the data include an outlier. An outlier value was obtained by multiplying the maximum observation of the original data by 5; then the 15th observation of the original data, which is the maximum observation, was replaced by this outlier. The resulting data set was forecasted using MLP, PSO-MLP, and MNM-MFF with the same settings as in Case 1. All obtained results are shown in Tables 6, 7, 8, and 9.

Table 6 The training and testing MSE values obtained from different architectures for MLP
Table 7 The training and testing MSE values obtained from different architectures for PSO-MLP
Table 8 The training and testing MSE values obtained from different architectures for MNM-MFF
Table 9 The training and testing performances of MLP, PSO-MLP, and MNM-MFF for Case 2

According to Table 9, the mean error and standard deviation values obtained from MNM-MFF are smaller than those produced by MLP and PSO-MLP for both the training and test sets; in Case 2, MNM-MFF again produces more accurate and consistent forecasts. In addition, comparing Tables 5 and 9 shows that MLP and PSO-MLP produce worse forecasting results when the data contain an outlier, whereas MNM-MFF gives almost the same results, especially for the test set, even when the data include an outlier. In other words, the proposed approach is not affected by the outlier as much as the MLP and PSO-MLP methods. The test-set forecasts of MLP, PSO-MLP, and MNM-MFF are shown graphically in Fig. 7.

Fig. 7

The prediction results of MLP, PSO-MLP, and the proposed method for Case 2

A scatter plot of the forecasts obtained from the proposed method against the observations is shown in Fig. 8.

Fig. 8

Scatter plot of forecasts and observations for Case 2

Case 3

In the last case, an outlier value was obtained by multiplying the maximum observation, the 15th observation of the original data, by 10. Then the observation with the maximum value was replaced by this outlier. The MLP, PSO-MLP, and MNM-MFF methods were applied to the resulting data set with the same settings as in Case 1. All obtained results are presented in Tables 10, 11, 12, and 13.

Table 10 The training and testing MSE values obtained from different architectures for MLP
Table 11 The training and testing MSE values obtained from different architectures for PSO-MLP
Table 12 The training and testing MSE values obtained from different architectures for MNM-MFF
Table 13 The training and testing performances of MLP, PSO-MLP, and MNM-MFF for Case 3

When Table 13 is examined, it is observed that MNM-MFF has the minimum mean error and standard deviation for both the training and test sets; thus, in Case 3, MNM-MFF again gives the most accurate and consistent forecasting results. In light of the results obtained in all cases, it can clearly be said that the proposed MNM-MFF model is not affected by outliers as much as the MLP model. For both training and test sets, the results produced by the MLP and PSO-MLP methods are inaccurate and inconsistent in Cases 2 and 3, where the data contain an outlier. However, MNM-MFF is not affected by these extreme values as much as MLP and PSO-MLP, and it produces very similar forecasting results, especially for the test set, whether or not the data contain an outlier.

In order to compare the forecasting performances of the MLP, PSO-MLP, and MNM-MFF methods further, the average MSE values over the test sets are summarized in Table 14 for all cases. According to Table 14, MNM-MFF clearly produces more accurate out-of-sample forecasts than MLP and PSO-MLP in all cases. In Cases 2 and 3, where the data contain an outlier, the proposed MNM-MFF model gives accurate forecasts just as in Case 1, whereas the MLP and PSO-MLP models cannot produce good results. This is an important finding indicating that the MNM-MFF model is not affected by outliers as much as the MLP and PSO-MLP models for the Australian beer consumption data. The test-set forecasts of MLP, PSO-MLP, and MNM-MFF are shown graphically in Fig. 9.

Table 14 Average values obtained from MLP, PSO-MLP, and MNM-MFF for all cases
Fig. 9

The prediction results of MLP, PSO-MLP, and the proposed method for Case 3

In addition, a scatter plot of the forecasts obtained from the proposed method against the observations is given in Fig. 10.

Fig. 10

Scatter plot of forecasts and observations for Case 3

5.2 The gas furnace data

The second time series used in the implementation is the Box-Jenkins gas furnace data set [7]. In the gas furnace data, the gas flow rate x(t) is the input and the CO2 concentration y(t) is the output. Because of the characteristics of this well-known data set, when artificial neural networks have been applied to it, x(t − 4) and y(t − 1) have been taken as the inputs and y(t) as the output in all studies in the literature; a sketch of this input layout is given below. The first 146 and the last 150 observations were used for training and testing, respectively, as in other studies [24, 27]. To show the forecasting performance of the proposed MNM-MFF model, the data were forecasted using MNM-MFF and other artificial neural network models available in the literature. The 2-2-1 architecture has been used in all these studies [24, 27]; that is, the architecture has 2, 2, and 1 neurons in the input, hidden, and output layers, respectively. Thus, the same 2-2-1 architecture was used for the proposed MNM-MFF model in the implementation. MSE values calculated over both the training and test sets are summarized in Table 15. The results obtained from the back propagation single multiplicative neuron model (BP-SMN) and MLP were taken from [24]; the other results, produced by the particle swarm optimization single multiplicative neuron model (PSO-SMN), the cooperative random learning particle swarm optimization single multiplicative neuron model (CRPSO-SMN), and the genetic algorithm single multiplicative neuron model (GA-SMN), were taken from [27].
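The sketch below, assuming numpy and hypothetical array names x and y for the two series, shows this input layout.

```python
# A minimal sketch of the gas furnace input layout: inputs at time t are
# x(t-4) and y(t-1), the target is y(t). Array names are illustrative.
import numpy as np

def gas_furnace_samples(x, y):
    """Return (inputs, targets) with inputs[t] = [x(t-4), y(t-1)], target y(t)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    t = np.arange(4, len(y))                 # first usable index is t = 4
    inputs = np.column_stack([x[t - 4], y[t - 1]])
    return inputs, y[t]

# Usage with placeholder series of 296 observations (the data set's length),
# which yields 292 usable samples.
rng = np.random.default_rng(2)
X, targets = gas_furnace_samples(rng.uniform(size=296), rng.uniform(size=296))
```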

Table 15 MSE values obtained from all methods for training and test sets

When Table 15 is examined, it is clearly seen that the best forecasts for the test set were obtained with the proposed MNM-MFF model, which also produced the best result for the training set in terms of MSE. To examine the results visually, the graph of the observations (targets) and the predictions (outputs) produced by MNM-MFF is given in Fig. 11. According to Fig. 11, the proposed MNM-MFF model gives very accurate results for the gas furnace data; the agreement between the observations and the predictions of the proposed model is quite satisfactory.

Fig. 11

The prediction results of MNM-MFF for the gas furnace data

In the first case, the original gas furnace data were forecasted and MNM-MFF showed a superior forecasting performance. In the second case, the performances of the MLP, PSO-MLP, and MNM-MFF methods were compared when the data include an outlier. An outlier value was obtained by multiplying the maximum observation of the original data by 10; then the 15th observation of the original data, which is the maximum observation value, was replaced by this outlier. The resulting data set was forecasted using MLP and MNM-MFF with the same architecture as in the first case. All obtained results are shown in Table 16. According to Table 16, the proposed MNM-MFF has the best accuracy for both the training and test sets. This also indicates that the MNM-MFF model was not affected much by the outlier, whereas the outlier led the MLP model to misleading results.

Table 16 MSE values obtained from MLP, PSO-MLP, and MNM-MFF

In addition to the MSE values and the forecasting graph, a scatter plot of the forecasts obtained from the proposed method against the observations is depicted in Fig. 12.

Fig. 12

Scatter plot of forecasts and observations for the gas furnace data

6 Conclusions

In this study, a new neuron model called MNM is proposed. The proposed neuron model produces an output which is not affected much by extreme values since it employs a median-based aggregation function. In addition, a new multilayer feed forward neural network (MNM-MFF) model that consists of MNMs is first proposed in this study in order to reach a high accuracy level and to cope with the outlier problem. The proposed MNM-MFF model is a robust neural network model owing to its ability to deal with outliers. The modified particle swarm optimization method is used to train the proposed MNM-MFF model. In order to evaluate the performance of the proposed approach, it was applied to two real time series, which were also forecasted using other methods available in the literature for comparison. The comparison clearly showed that the proposed approach produces very accurate forecasts for both the Australian beer consumption and the gas furnace data sets. In addition, different data scenarios were considered in the implementation to examine the performance of the proposed approach when the data contain outliers, and the forecasts obtained from the proposed approach were compared to those produced by the MLP model, which has been the most preferred type of artificial neural network in many implementations. It was shown that the proposed model is not affected much by outliers.

To sum up, the proposed MNM-MFF model composed of MNMs provides two important advantages: it can produce very accurate forecasts, and it can be used to forecast time series which include outliers. It should be noted that these results were obtained for the parameter sets given above and the two time series examined in the study. For instance, if the length of the test set is changed, the results can change; similarly, if these parameter sets are used for other time series, the obtained results can change. Therefore, the obtained results are valid only for these parameter sets and these time series. In order to reach general conclusions, a comprehensive simulation study has to be carried out. However, it is very hard to perform such a simulation study since there are many types of time series and many parameter combinations.