1 Introduction

Artificial neural networks, among the most widely used methods of artificial intelligence, are a simple imitation of the human nervous system. The most basic elements of the human nervous system are nerve cells called neurons. McCulloch and Pitts (1943) established the first mathematical model of a biological neuron, and this model still underlies various artificial neural network models today. Many artificial neural networks have three layers, namely input, hidden and output layers, and these layers define the architecture of an artificial neural network. Moreover, the operation of each layer differs from one artificial neural network model to another. Artificial neural networks have been extensively used for pattern recognition, clustering, classification and forecasting tasks in recent years. Feedforward neural networks (FFNNs) are a popular form of artificial neural network that can perceive and approximate computational models using their parallel layered structure (Ojha et al. 2017). The multilayer perceptron (MLP) is a special form of FFNN. Unlike the single perceptron model, MLP has one or more hidden layers. The input layer receives the incoming data and sends it to the hidden layer; the information is then transferred to the next layer. The number of hidden layers varies depending on the problem: it is at least one and is adjusted according to the need or the problem structure. The output of each layer becomes the input of the next layer. In the artificial neural network literature, MLP models have been successfully applied to time series forecasting. MLP employs one or several hidden layers and, like many FFNN models, generally works with additive neuron models (units).
Besides, some artificial neural networks use a multiplicative neuron model, such as the multiplicative neuron model artificial neural network proposed by Yadav et al. (2007). MLP is a first-order neural network that effectively carries out inner products, which are weighted and summed before passing through a nonlinear threshold function. Another way to overcome the restriction to linear maps is to introduce higher-order units to model nonlinear dependences (Giles and Maxwell 1987; Giles et al. 1988).

Higher-order neural networks (HONNs) are a type of FFNN that provides nonlinear decision boundaries, therefore offering better classification capability than the linear neuron (Guler and Sahin 1994). HONNs use higher-order combinations of their inputs. HONNs contain not only additive or multiplicative units but both at the same time; they also allow inputs to be used in duplicate. Unlike many artificial neural networks, the Pi-Sigma artificial neural network (PS-ANN), one of the HONNs, proposed by Shin and Gosh (1991), has a very different architecture from other artificial neural networks because it uses both additive and multiplicative structures. In the literature, there are many studies on PS-ANN for different aims, especially forecasting. Ghazali and Jumeily (2009) used PS-ANN for financial time series prediction. Husaini et al. (2014) used PS-ANN for temperature forecasting in Batu Pahat. Husaini et al. (2011) showed the effect of the network parameters of PS-ANN on temperature forecasting. Husaini et al. (2012) used PS-ANN for one-step-ahead temperature forecasting. Nayak et al. (2015) proposed a novel chemical reaction optimization based on PS-ANN for nonlinear classification. Mohamed et al. (2016) used a batch gradient method with a penalty for the training of PS-ANN. Nayak (2017) used PS-ANN based on a genetic algorithm and particle swarm optimization for exchange rate prediction. Bas et al. (2016) proposed a high-order fuzzy time series method based on PS-ANN and determined the fuzzy relations with PS-ANN. Dash et al. (2018) used a PS-ANN based on evolutionary algorithms for gold price prediction. Deepa et al. (2018) used a PS-ANN based on a bioinspired swarm intelligence optimization algorithm for multimodal tumour data analysis. Akram et al. (2019) proposed an improved PS-ANN with error feedback for physical time series prediction.
Nayak (2020) used a fireworks algorithm for the training of PS-ANN for modelling and forecasting chaotic crude oil price time series. Panda and Majhi (2020) used an improved spotted hyena optimizer with space transformational search for the training of PS-ANN. Kocak et al. (2020) proposed a new fuzzy time series method based on an ARMA-type recurrent PS-ANN. Pattanayak et al. (2020) proposed multi-step-ahead fuzzy time series forecasting using hybrid chemical reaction optimization with PS-ANN. Nayak and Ansari (2020) used a cooperative optimization algorithm in PS-ANN for stock forecasting.

In this paper, the training of PS-ANN is performed by a differential evolution algorithm (DEA) that uses the DE/rand/1 mutation strategy for time series forecasting. The performance of the proposed method (DEA-PS-ANN) is evaluated on two well-known data sets from the ANN literature and compared with many studies in that literature. The rest of the paper is organized as follows: Section two is about PS-ANN and DEA. The proposed DEA-PS-ANN is given in section three. The application results are given in section four and, finally, section five presents conclusions and discussions.

2 Preliminary Section

The main line of the study concerns PS-ANN and DEA, so this section contains basic information about both.

2.1 Pi-Sigma Artificial Neural Networks

PS-ANN is a higher-order neural network type that uses higher-order combinations of its inputs; it was proposed by Shin and Gosh (1991). PS-ANN consists of an input layer, a hidden layer of linear summing units, and finally an output layer that multiplies the hidden units. The term pi-sigma reflects the fact that the inputs are processed through a sum and then a product. Although the weights between the input and hidden layers are adjustable, the weights between the hidden and output layers are fixed at 1. Increasing the degree of the Pi-Sigma neural network makes the function that defines the relationship between the input and the output layer depend on more parameters and have a more complex structure, which allows better forecasting results to be obtained. However, working with excess parameters causes an overfitting problem and more calculation time for the training algorithm. The architecture of a PS-ANN with N inputs and Kth order is given in Fig. 1.

Fig. 1 The architecture of PS-ANN

The linear combinations of the input units are obtained with the weights \( w_{ij} \left( { i = 1,2, \ldots ,N ,\; j = 1,2, \ldots ,K} \right) \) and biases \( \theta_{j} \left( {j = 1,2, \ldots ,K} \right) \), where \( w_{ij} \) is the weight from the ith input to the jth hidden layer unit. One linear combination per hidden layer unit passes through a linear activation function to create the hidden layer outputs.
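As a concrete illustration, the hidden-layer computation above can be sketched as follows. This is a minimal sketch assuming NumPy; the function name `hidden_outputs` and the example values of `x`, `W` and `theta` are chosen here for illustration, not taken from the paper.

```python
import numpy as np

def hidden_outputs(x, W, theta):
    """Linear sums of the inputs, one per hidden (summing) unit.

    x: (N,) inputs, W: (N, K) weights w_ij, theta: (K,) biases theta_j.
    The hidden activation is linear, so each unit output is just its weighted sum.
    """
    return x @ W + theta

x = np.array([0.5, -1.0, 2.0])   # N = 3 inputs
W = np.full((3, 2), 0.1)         # K = 2 summing units, all weights 0.1
theta = np.zeros(2)              # zero biases
h = hidden_outputs(x, W, theta)  # each sum is 0.1 * (0.5 - 1.0 + 2.0) = 0.15
```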

2.2 Differential Evolution Algorithm

DEA is a simple, understandable and effective optimization algorithm compared to other evolutionary algorithms; it was proposed by Storn and Price (1995). DEA is similar to a genetic algorithm in that it uses the same evolutionary operators, such as mutation, crossover and selection, to direct the population towards an optimal solution. Chromosomes represent solutions in the DEA process. DEA's performance is based on two main components: the selected strategy and the control parameters. The underlying idea of DEA is the mutation, crossover and selection operators used to determine the global optimum in every generation. The control parameters consist of the population size (PB), the scaling factor (F) and the crossover rate.

The mutation operator is the main distinguishing component of DEA and is considered its main strategy. There are many different mutation strategies, each based on the classical mutation strategy and expressed with short designations. The strategy called the classical mutation strategy, given in Eq. (1), is one of the most frequently used mutation strategies in the DEA literature and is denoted DE/rand/1. In this notation, "DE" indicates the DEA and "rand" indicates randomness, that is, the chromosomes are selected randomly. Classical DEA uses three random chromosomes (\( c_{1} , c_{2} , c_{3} \)) selected from the population, \( \left( {c_{1} , c_{2} , c_{3} \in \left[ {1, \ldots , PB} \right]; c_{1} \ne c_{2} \ne c_{3} } \right) \). With the mutation operator, these chromosomes are combined to form the mutation vector (\( c_{mv} \)). Applying the mutation operator to all candidate chromosomes in the population defines a search rule based on other candidate solutions (chromosomes), as given in Eq. (1).

$$ c_{mv} = c_{3} + F \times (c_{1} - c_{2} ) $$
(1)
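The DE/rand/1 rule in Eq. (1) can be sketched as follows; a minimal illustration assuming NumPy, with the function name `de_rand_1` chosen here for convenience.

```python
import numpy as np

rng = np.random.default_rng(0)

def de_rand_1(population, F):
    """Eq. (1): c_mv = c3 + F * (c1 - c2) with three distinct random rows."""
    c1, c2, c3 = rng.choice(len(population), size=3, replace=False)
    return population[c3] + F * (population[c1] - population[c2])

pop = rng.uniform(0.0, 1.0, size=(10, 4))  # PB = 10 chromosomes, 4 genes each
c_mv = de_rand_1(pop, F=0.8)               # one mutation vector
```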

In the DEA process, a random chromosome called \( c_{x} \), which is different from \( c_{1} , c_{2} , c_{3} \) and is also called the target chromosome, is selected, and a new candidate chromosome (\( c_{c} \)) is created for the population by crossing this \( c_{x} \) with the \( c_{mv} \) chromosome; the crossover operator is given in Eq. (2).

$$ c_{c} \left( {i,j} \right) = \left\{ {\begin{array}{*{20}c} {c_{mv} \left( {i,j} \right)} & {rnd\left( {0,1} \right) < cr} \\ {c_{x} \left( {i,j} \right)} & {\text{otherwise}} \\ \end{array} } \right. $$
(2)

When constructing \( c_{c} \), random numbers must be generated, one for each gene in a chromosome. If a generated random number is smaller than the crossover rate (\( cr \)), the gene is taken from the \( c_{mv} \) chromosome; otherwise, it is taken from the chromosome \( c_{x} \), and the chromosome \( c_{c} \) is thus created.
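The gene-wise crossover of Eq. (2) can be sketched as below; a minimal illustration assuming NumPy, where `binomial_crossover` is a name chosen here and the all-zero/all-one chromosomes are toy values that make the mixing visible.

```python
import numpy as np

rng = np.random.default_rng(1)

def binomial_crossover(c_x, c_mv, cr):
    """Eq. (2): take a gene from c_mv when rnd(0,1) < cr, else from c_x."""
    mask = rng.uniform(0.0, 1.0, size=c_x.shape) < cr
    return np.where(mask, c_mv, c_x)

c_x = np.zeros(6)    # target chromosome (all genes 0, for visibility)
c_mv = np.ones(6)    # mutation vector (all genes 1)
c_c = binomial_crossover(c_x, c_mv, cr=0.5)  # a mix of zeros and ones
```

Note the limiting cases: with `cr = 0` the candidate equals the target, and with `cr = 1` it equals the mutation vector.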

Finally, the fitness values of the \( c_{c} \) and \( c_{x} \) chromosomes are compared in the DEA process. If the fitness value of \( c_{c} \) is better than that of \( c_{x} \), the candidate chromosome \( c_{c} \) replaces the target chromosome \( c_{x} \) in the next iteration; otherwise, the target chromosome \( c_{x} \) remains in the population. This is called the selection process in DEA. After the population is updated, the mutation and crossover operators and the selection process are repeated until a predefined stopping criterion, such as a certain number of iterations, is reached.
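The greedy selection step can be sketched as follows; `select` and the toy fitness function are illustrative names and values, not from the paper.

```python
def select(c_x, c_c, fitness):
    """Keep whichever of target c_x and candidate c_c has the better (lower) fitness."""
    return c_c if fitness(c_c) <= fitness(c_x) else c_x

# Toy fitness for single-gene chromosomes: distance from the optimum 0.5.
fitness = lambda c: abs(c[0] - 0.5)

survivor = select([0.9], [0.6], fitness)  # candidate [0.6] wins (0.1 < 0.4)
```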

3 The Proposed Method

The forecasting of time series is made under assumptions such as linearity and normal distribution in models such as autoregressive and moving average models. Considering that many time series encountered in daily life are not linear, several nonlinear models, such as ANNs, have been proposed in the literature for time series forecasting. PS-ANN, one of the most commonly used artificial neural network models, is a kind of high-order ANN that uses higher-order combinations of its inputs. Unlike in many other high-order ANN types, the number of weight values, determined by the number of inputs in PS-ANN, does not increase exponentially. Since PS-ANN does not have a complex structure, it is very useful compared to many other high-order ANN types and has quite good performance for time series prediction in the literature.

The genetic algorithm (GA) is one of the oldest and most frequently preferred artificial intelligence optimization algorithms. In a GA process, the chromosomes in the population are not evaluated one by one. In contrast, each chromosome in the population is evaluated one by one during the DEA process, and it is thus checked whether each chromosome can be transferred to the next generation or not. In short, the most important feature of DEA is that it evaluates each chromosome in the population individually. Thus, chromosomes that are important for the population are preserved without any changes and transferred to the next generations.

In the proposed method, the training of the PS-ANN is performed with DEA. In the implementation of DEA, DE/rand/1 is preferred as the mutation strategy. Moreover, with the use of an artificial intelligence optimization technique such as DEA, a search space with multiple solution points is used instead of a single random solution point. Using DEA in the training process of PS-ANN avoids derivative-based optimization algorithms such as Levenberg-Marquardt (LM), and thus the use of complex derivatives. The algorithm of the proposed method (DEA-PS-ANN) is given below.

Algorithm

The training of DEA-PS-ANN

Step 1 Determining the parameters of PS-ANN used in the learning process.

These parameters are; the number of inputs (\( m \)), the degree of PS-ANN (\( d \)), the number of chromosomes (\( nc \)), the number of genes (\( g \)) in a chromosome, and the crossover rate (\( cr \)).

Step 2 Creating the initial population.

Before creating the initial population, we identify the chromosomes that correspond to each solution in DEA. During the learning process of PS-ANN, the genes of a chromosome in DEA consist of the weight and bias values of PS-ANN. A chromosome consisting of the weight and bias values of PS-ANN is given in Fig. 2.

Fig. 2 A structure of a chromosome in DEA-PS-ANN

In Fig. 2, the genes from 1 to \( m \times d \) in a chromosome are weight values, and the genes from \( m \times d + 1 \) to \( m \times d + d \) are bias values. In total, there are \( m \times d + d \) genes in a chromosome.

Therefore, in creating the initial population, \( nc \) chromosomes consisting of \( m \times d + d \) genes each are generated randomly with the help of the continuous uniform distribution \( U\left( {0,1} \right) \).
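The chromosome layout and Step 2 can be sketched as follows; a minimal illustration assuming NumPy, with `init_population` and `split_chromosome` as names chosen here for clarity.

```python
import numpy as np

def init_population(nc, m, d, seed=42):
    """Step 2: nc random chromosomes of m*d weights followed by d biases."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(nc, m * d + d))  # genes ~ U(0, 1)

def split_chromosome(c, m, d):
    """Recover the (m, d) weight matrix and the d biases from one chromosome."""
    return c[: m * d].reshape(m, d), c[m * d :]

pop = init_population(nc=30, m=4, d=2)    # 30 chromosomes of 4*2 + 2 = 10 genes
W, theta = split_chromosome(pop[0], 4, 2)
```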

Step 3 Calculating the fitness function value for each chromosome.

First, the output of the jth hidden layer unit is calculated from the network signals as given in Eq. (3), and the output of the network is then calculated by Eq. (4).

$$ h_{j} = f_{1} \left( {\mathop \sum \limits_{{i = 1}}^{N} w_{{ij}} x_{i} + \theta _{j} } \right), \quad j = 1,2, \ldots ,K $$
(3)

In Eq. (3), \( f_{1} \left( x \right) = x \) is the linear activation function. The output of the network is calculated by Eq. (4) with the logistic activation function \( f_{2} \left( x \right) = \frac{1}{{1 + {\text{exp}}\left( { - x} \right)}} \).

$$ \hat{y} = f_{2} \left( {\mathop \prod \limits_{{j = 1}}^{K} h_{j} } \right) = \frac{1}{{1 + \exp \left( { - \mathop \prod \nolimits_{{j = 1}}^{K} h_{j} } \right)}} $$
(4)

Finally, for each chromosome, the root mean square error (RMSE) value, given in Eq. (5), is calculated as the fitness function.

$$ RMSE = \sqrt {\frac{{\mathop \sum \nolimits_{{t = 1}}^{n} \left( {x_{t} - \hat{x}_{t} } \right)^{2} }}{n}} $$
(5)

In Eq. (5), \( n \), \( x_{t} \) and \( \hat{x}_{t} \) are the number of learning samples, the observed value and the forecast value, respectively.
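The fitness evaluation of Eq. (5) is then a plain RMSE; a minimal sketch assuming NumPy:

```python
import numpy as np

def rmse(actual, forecast):
    """Eq. (5): root mean square error over the n learning samples."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # sqrt(4/3), about 1.1547
```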

Step 4 Applying the mutation and crossover operators.

Mutation and crossover operators are applied sequentially to each chromosome in the initial population using Eqs. (1) and (2).

Step 5 Comparing fitness values.

The fitness value of the candidate chromosome is compared with the fitness value of the target chromosome. The chromosome with the lowest RMSE value is transferred to the new generation. Mutation and crossover operators are applied to all chromosomes in the population.

Step 6 Stopping condition.

If the maximum number of iterations has been reached or the fitness value of the best chromosome is less than a predetermined error value (ε), the process is finished; if not, go to Step 3.

When the algorithm stops, the optimal weight and bias values are obtained from the chromosome with the lowest RMSE value in the population.
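Steps 2-6 can be combined into one compact training loop. The sketch below assumes NumPy and lagged inputs already prepared as a matrix; all names (`train_dea_ps_ann`, the inner `fitness`) are chosen here for illustration, not taken from the paper.

```python
import numpy as np

def train_dea_ps_ann(X, y, m, d, nc=30, F=0.8, cr=0.2, iters=200, seed=0):
    """Minimal sketch of the DEA-PS-ANN training loop (Steps 2-6).

    X: (n, m) lagged inputs, y: (n,) targets.
    Returns the best chromosome and its RMSE fitness.
    """
    rng = np.random.default_rng(seed)
    genes = m * d + d                           # m*d weights + d biases

    def fitness(c):
        W, theta = c[: m * d].reshape(m, d), c[m * d :]
        h = X @ W + theta                       # Eq. (3), identity activation
        y_hat = 1.0 / (1.0 + np.exp(-np.prod(h, axis=1)))  # Eq. (4)
        return np.sqrt(np.mean((y - y_hat) ** 2))          # Eq. (5), RMSE

    pop = rng.uniform(0.0, 1.0, size=(nc, genes))   # Step 2: initial population
    fit = np.array([fitness(c) for c in pop])       # Step 3: initial fitness
    for _ in range(iters):
        for t in range(nc):                     # each chromosome as target c_x
            c1, c2, c3 = rng.choice([i for i in range(nc) if i != t],
                                    size=3, replace=False)
            c_mv = pop[c3] + F * (pop[c1] - pop[c2])       # Eq. (1), mutation
            mask = rng.uniform(0.0, 1.0, size=genes) < cr
            c_c = np.where(mask, c_mv, pop[t])             # Eq. (2), crossover
            f_c = fitness(c_c)
            if f_c <= fit[t]:                   # Step 5: greedy selection
                pop[t], fit[t] = c_c, f_c
    return pop[np.argmin(fit)], float(fit.min())    # best weights and biases
```

Because selection is greedy, the best fitness in the population is non-increasing over iterations, which is what the stopping condition in Step 6 monitors.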

4 Application

To evaluate the performance of DEA-PS-ANN, the Australian beer consumption (AUST) time series, with observations between the second quarter of 1957 and the first quarter of 1994, given in Fig. 3, was analyzed first.

Fig. 3 AUST time series graph

The last 16 observations of the AUST time series were taken as the test set. In the analysis of the AUST time series, the number of inputs of the model was varied from 4 to 12 in increments of 4, the degree of the model from 2 to 5 in increments of 1, the crossover rate from 0.1 to 0.5 in increments of 0.1, and the number of chromosomes from 30 to 100 in increments of 10. The number of iterations was taken as 200. As a result of the trials, the best solution was obtained when the number of inputs was 12, the model degree 4, the number of chromosomes 40, and the crossover rate 0.2.

Apart from the proposed DEA-PS-ANN, the AUST time series was analyzed with the single multiplicative neuron model artificial neural network (SMNM-ANN) based on the BP learning algorithm (BP-SMNM-ANN) proposed by Yadav et al. (2007), the multilayer ANN based on PSO (PSO-ML-ANN) proposed by Kelwade and Salankar (2017), the radial basis ANN (RB-ANN) proposed by Li (2009), the SMNM-ANN based on PSO (PSO-SMNM-ANN) proposed by Zhao and Yang (2009), the multilayer feedforward ANN (ML-FF-ANN) proposed by Rumelhart et al. (1986), the Elman-type recurrent ANN (E-ANN) proposed by Elman (1990), the multiplicative seasonal ANN based on PSO (PSO MS-ANN) proposed by Aladag et al. (2013), the multilayer feedforward ANN based on the trimmed mean neuron model (TR-ML-FF-ANN), the PS-ANN based on PSO (PSO-PS-ANN), the SMNM-ANN based on DEA (DEA-SMNM-ANN) proposed by Bas (2016), the linear and nonlinear ANN (L&NL-ANN) proposed by Yolcu et al. (2013), the recurrent SMNM-ANN (R-SMNM-ANN) proposed by Egrioglu et al. (2017), the SMNM-ANN based on the artificial bat algorithm (ABA-SMNM-ANN) proposed by Bas et al. (2018), the SMNM-ANN with autoregressive coefficient (AC-SMNM-ANN) proposed by Cagcag Yolcu et al. (2018), the SMNM-ANN based on the Gauss activation function (Gauss-SMNM-ANN) proposed by Gundogdu et al. (2015), and the recurrent PS-ANN (R-PS-ANN) proposed by Akdeniz et al. (2018); the results are given in Table 1.

Table 1 RMSE and MAPE Values Obtained from All Methods for AUST Test Data

When Table 1 is examined, the proposed DEA-PS-ANN has the best performance among all methods in terms of both the RMSE and MAPE criteria. Besides, the graph of the actual observations of the AUST test set together with the forecasts obtained from the proposed DEA-PS-ANN is given in Fig. 4.

Fig. 4 AUST test set together with the forecasts obtained from the proposed DEA-PS-ANN

In Fig. 4, it can be seen that the real observations of the AUST test set are quite compatible with the forecasts obtained from the proposed DEA-PS-ANN.

Secondly and finally, the Turkey electricity consumption (TEC) time series, observed monthly between January 2002 and December 2013 and given in Fig. 5, was analyzed.

Fig. 5 TEC time series graph

The last 12 observations of the TEC time series were taken as the test set. In the analysis of the TEC time series, the number of inputs of the model was varied from 5 to 15 in increments of 1, the degree of the model from 2 to 5 in increments of 1, the crossover rate from 0.1 to 0.5 in increments of 0.1, and the number of chromosomes from 30 to 100 in increments of 10. The number of iterations was taken as 200. As a result of the trials, the best solution was obtained when the number of inputs was 13, the model degree 2, the number of chromosomes 40, and the crossover rate 0.2.

Apart from the proposed DEA-PS-ANN, the TEC time series was analyzed with BP-SMNM-ANN, ML-FF-ANN, ABA-SMNM-ANN, PSO-ML-ANN, L&NL-ANN, the ANN with deterministic trend and seasonal components (DT&S-ANN) proposed by Egrioglu et al. (2015), PSO-PS-ANN, PSO-SMNM-ANN and BP-PS-ANN.

When Table 2 is examined, just as for the AUST time series, the proposed DEA-PS-ANN method has the best performance among all methods in terms of the RMSE criterion and is the second-best method in terms of the MAPE criterion. Besides, the graph of the actual observations of the TEC test set together with the forecasts obtained from the proposed DEA-PS-ANN method is given in Fig. 6.

Table 2 RMSE and MAPE Values Obtained from All Methods for TEC Test Data

Fig. 6 TEC test set together with the forecasts obtained from the proposed DEA-PS-ANN

5 Conclusions and Discussions

Different from many artificial neural networks in the literature, PS-ANN, a kind of HONN, has both additive and multiplicative units in its structure. PS-ANN attracts attention in the literature as an effective artificial neural network model with high forecasting performance thanks to using these two units together.

In this study, the training of the PS-ANN was performed by DEA. To test the forecasting performance of the proposed DEA-PS-ANN method, analysis of real-life time series was performed using two different data sets. As a result of the analyses, it was concluded that the performance of the proposed method is better than many artificial neural network models available in the literature. In future studies, different mutation operators can be used during the training process of PS-ANN with DEA, or DEA can be used in the training of different neural networks.