1 Introduction

As a data mining field, the analysis of time series has been one of the main research subjects for decades. Traditional approaches to modelling time series, such as the Auto-Regressive (AR) model, Linear Dynamical Systems (LDS) and the Hidden Markov Model (HMM), which rely on estimating the parameters of an assumed model, may fail in the analysis of complex real-world time series. To overcome the restrictions of traditional approaches, such as model assumptions, normality requirements and limits on the number of observations, artificial neural networks (ANNs) have been commonly and successfully utilized in recent years. While a variety of neuron models were put forward in [6, 26, 30, 32, 44], one of the most common methods used for time series prediction is the multilayer perceptron (MLP), built on the neuron model proposed by McCulloch and Pitts [36]. While Zhang et al. [46] and Zhang [45] reviewed the time series literature that uses MLP, Crone and Kourentzes [10] and Crone et al. [11] evaluated the forecasting performance of ANNs in time series analysis. In the time series literature, dynamic ANN models [3, 20] have been used, and hybrid methods [1, 5, 12, 18, 19, 35, 39, 42] have a wide usage area that should not be underestimated. Moreover, comparative [7, 9, 13, 21, 22, 31, 33] and application [4, 14, 17] studies have been presented. To predict various time series, new ANN models have been put forward by Voyant et al. [40], Laboissiere et al. [27], Cheng et al. [8], Kim [24] and Wang et al. [41]. Furthermore, Reyes et al. [34] presented a new earthquake prediction system based on the application of ANNs to predict earthquakes in Chile. Koprinska et al. [25] introduced a new approach based on sequence similarity with neural networks for forecasting electricity demand time series. Martínez-Álvarez et al. [29] explored the application of various data mining techniques to time series forecasting.

MLP has more than one neuron in its hidden layer, and its output is a non-linear function of weighted sums of the inputs. When forecasting time series with MLP, properly determining the architecture is an essential problem because, in particular, the number of neurons in the hidden layer directly affects the performance of multilayer perceptron neural networks. Egrioglu et al. [15] put forward a method to determine the number of neurons in the hidden layer and the inputs of the model. The multiplicative neuron model (MNM), which does not suffer from this type of problem, was introduced by Yadav et al. [43]. The MNM, named the single multiplicative neuron model (S-MNM), has just one neuron. S-MNM uses a multiplicative function in its neuron as an aggregation function, in contrast to MLP, which uses an additive function. This multiplicative structure strengthens the non-linearity characteristic of the model. S-MNM uses fewer parameters than MLP since it has only one neuron in the hidden layer [2]. Although S-MNM has some advantages over MLP, it has difficulty in predicting certain time series, since its single-neuron structure makes it strictly model-based. For the analysis of time series that may contain more complex structures, MLP can achieve outstanding prediction performance thanks to the high compliance with the data it gains by changing its architecture, whereas S-MNM is, in this respect, insufficient. Considering the advantages and disadvantages of MLP and S-MNM together, a model that combines the outstanding sides of both should lead to better prediction performance. From this point of view, in this study, we propose a single multiplicative neuron model with autoregressive coefficients (AC-S-MNM). In AC-S-MNM, the weights and biases are obtained by way of autoregressive equations. Since the weights and biases are determined via autoregressive equations that consider the time index of each observation, the model is data-based. The parameters of the autoregressive equations that generate the weights and biases of AC-S-MNM are specified by utilizing modified particle swarm optimization (MPSO). The performance of the proposed model is demonstrated through several implementations and compared with some other models that have been commonly used in the time series prediction literature.

In the rest of the paper, Sect. 2 describes the proposed model in detail and gives an algorithm for its operation. Various implementations and their results are presented in Sect. 3. Finally, in the last section, a discussion and an overall assessment of the obtained results are presented.

2 Single Multiplicative Neuron Model with Autoregressive Coefficient

S-MNM and MLP have been widely used to predict time series. While MLP has some advantages, such as high compliance with data, the difficulty of specifying its architecture must be considered the downside of this ANN model. Conversely, whereas S-MNM does not have such a problem, it can remain incapable in the prediction of some complex time series because of its strict model-based structure.

AC-S-MNM, proposed in this study, combines the main characteristics of both MLP and S-MNM: high compatibility with data, which is a crucial feature in the prediction of complex time series, and freedom from the architecture selection problem thanks to having just a single neuron. Since the proposed AC-S-MNM model is basically an S-MNM, it has the same structure as S-MNM, as demonstrated in Fig. 1.

Fig. 1 The architecture of AC-S-MNM

In Fig. 1, the function \(\Omega \left( {y,w,b} \right) \) is the product of the weighted inputs and f is the activation function. Here, \(y_{t-1}, y_{t-2}, \ldots , y_{t-q} \) and \(\hat{y}_t \) are the inputs and the output of AC-S-MNM, respectively. Moreover, q is called the model order and n is the number of observations in the training set of the time series. The AC-S-MNM with q inputs given in Fig. 1 has \(2q\) parameters at each time t. Of these, q are the weights corresponding to the inputs \(\left( {w_t^i ,i=1,2,\ldots , q; t=1,2,\ldots , n} \right) \) and q are the biases \(\left( {b_t^i ,i=1,2,\ldots , q;\, t=1,2,\ldots , n} \right) \). The output producing process of AC-S-MNM can be given as an algorithm.

Algorithm 1 The calculation of outputs of AC-S-MNM

Step 1 Autoregressive equations are constituted for weights and biases.

The autoregressive models for the weights and the biases are given in Eqs. (1) and (2), respectively.

$$\begin{aligned} w_t^i =\phi _0^i +\phi _1^i w_{t-1}^i +{ }^w \varepsilon _t^i ,\quad i=1,2,\ldots , q; \quad t=1,2,\ldots , n \end{aligned}$$
(1)
$$\begin{aligned} b_t^i =\theta _0^i +\theta _1^i b_{t-1}^i +{ }^b \varepsilon _t^i ,\quad i=1,2,\ldots , q; \quad t=1,2,\ldots , n \end{aligned}$$
(2)

In total, there are 2q autoregressive models. The initial values of each model are \(w_0^i \) and \(b_0^i ,i=1,2,\ldots , q\). The parameters of these 2q autoregressive models are estimated by using MPSO in an optimization process (see Algorithm 2).

Step 2 Weights and biases are calculated.

For each learning sample, the weights and biases \(\big ( w_1^i ,w_2^i ,\ldots , w_n^i ;b_1^i ,b_2^i ,\ldots , b_n^i, \, i=1,2,\ldots ,q \big )\) are calculated by using the autoregressive equations given in Eqs. (1) and (2). For example, let \(q=2\), \(n=2\) and let the coefficients of the autoregressive models be estimated by MPSO as \(\phi _0^1 =0.1\), \(\phi _0^2 =0.2\), \(\phi _1^1 =0.3\), \(\phi _1^2 =0.4\); \(\theta _0^1 =0.5\), \(\theta _0^2 =0.6\), \(\theta _1^1 =0.7\), \(\theta _1^2 =0.8\); \(w_0^1 =0.1\), \(w_0^2 =0.2\), \(b_0^1 =0.3\) and \(b_0^2 =0.4\). Then the weights and biases are calculated as below:

$$\begin{aligned} \hbox {For }t=1\hbox { and }i=1;\quad w_1^1&=\phi _0^1 +\phi _1^1 w_0^1 +{ }^w \varepsilon _1^1 =0.1+\left( {0.3\times 0.1} \right) +0=0.13\\ b_1^1&=\theta _0^1 +\theta _1^1 b_0^1 +{ }^b \varepsilon _1^1 =0.5+\left( {0.7\times 0.3} \right) +0=0.71\\ \hbox {For }t=1\hbox { and }i=2;\quad w_1^2&=\phi _0^2 +\phi _1^2 w_0^2 +{ }^w \varepsilon _1^2 =0.2+\left( {0.4\times 0.2} \right) +0=0.28\\ b_1^2&=\theta _0^2 +\theta _1^2 b_0^2 +{ }^b \varepsilon _1^2 =0.6+\left( {0.8\times 0.4} \right) +0=0.92\\ \hbox {For }t=2\hbox { and }i=1;\quad w_2^1&=\phi _0^1 +\phi _1^1 w_1^1 +{ }^w \varepsilon _2^1 =0.1+\left( {0.3\times 0.13} \right) +0=0.139\\ b_2^1&=\theta _0^1 +\theta _1^1 b_1^1 +{ }^b \varepsilon _2^1 =0.5+\left( {0.7\times 0.71} \right) +0=0.997\\ \hbox {For }t=2\hbox { and }i=2;\quad w_2^2&=\phi _0^2 +\phi _1^2 w_1^2 +{ }^w \varepsilon _2^2 =0.2+\left( {0.4\times 0.28} \right) +0=0.312\\ b_2^2&=\theta _0^2 +\theta _1^2 b_1^2 +{ }^b \varepsilon _2^2 =0.6+\left( {0.8\times 0.92} \right) +0=1.336 \end{aligned}$$

In the autoregressive models, the error terms \({ }^w \varepsilon _t^i \) and \({ }^b \varepsilon _t^i \left( i=1,2,\ldots , q; \,t=1,2,\ldots , n \right) \) are always taken as zero.
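As a concrete illustration of Steps 1 and 2, the following minimal Python sketch (function and variable names are ours, not from the original) generates the weights and biases via Eqs. (1) and (2) with the error terms set to zero, reproducing the worked example above for \(q=2\) and \(n=2\):

```python
import numpy as np

def generate_weights_biases(phi0, phi1, theta0, theta1, w0, b0, n):
    """Generate w_t^i and b_t^i via Eqs. (1) and (2), with the error
    terms taken as zero; each argument is a length-q coefficient vector."""
    q = len(w0)
    w = np.zeros((n + 1, q)); w[0] = w0  # row 0 holds the initial values w_0^i
    b = np.zeros((n + 1, q)); b[0] = b0  # row 0 holds the initial values b_0^i
    for t in range(1, n + 1):
        w[t] = phi0 + phi1 * w[t - 1]      # Eq. (1) with epsilon = 0
        b[t] = theta0 + theta1 * b[t - 1]  # Eq. (2) with epsilon = 0
    return w[1:], b[1:]  # rows correspond to t = 1, ..., n

# Worked example above: q = 2, n = 2
w, b = generate_weights_biases(
    phi0=np.array([0.1, 0.2]), phi1=np.array([0.3, 0.4]),
    theta0=np.array([0.5, 0.6]), theta1=np.array([0.7, 0.8]),
    w0=np.array([0.1, 0.2]), b0=np.array([0.3, 0.4]), n=2)
# w -> [[0.13, 0.28], [0.139, 0.312]]
# b -> [[0.71, 0.92], [0.997, 1.336]]
```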

Step 3 Values of net are calculated.

$$\begin{aligned} net_t =\prod \limits _{i=1}^q \left( {w_t^i \times y_{t-i} +b_t^i } \right) ;\quad t=1,2,\ldots , n \end{aligned}$$
(3)

Step 4 The outputs of AC-S-MNM are obtained.

For each learning sample, the output value is calculated by passing \(net_t \) through the logistic activation function.

$$\begin{aligned} \hat{y}_t =f\left( {net_t } \right) =\frac{1}{1+e^{-net_t }}; \quad t=1,2,\ldots , n \end{aligned}$$
(4)
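A matching sketch of Steps 3 and 4 (again with our own names) computes \(net_t\) from Eq. (3) and the output from Eq. (4); the lagged inputs in the usage example are illustrative values, not from the original:

```python
import numpy as np

def logistic(net):
    """Logistic activation function of Eq. (4)."""
    return 1.0 / (1.0 + np.exp(-net))

def ac_smnm_output(y_lags, w_t, b_t):
    """Eqs. (3) and (4): y_lags = [y_{t-1}, ..., y_{t-q}]; w_t and b_t
    are the weights and biases for time t; returns y_hat_t."""
    net = np.prod(w_t * y_lags + b_t)  # Eq. (3)
    return logistic(net)               # Eq. (4)

# Using the t = 1 weights and biases from the worked example and
# illustrative (not from the original) lagged inputs (0.5, 0.4):
y_hat_1 = ac_smnm_output(np.array([0.5, 0.4]),
                         np.array([0.13, 0.28]),
                         np.array([0.71, 0.92]))
```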

When the proposed AC-S-MNM obtains the output for each learning sample, it uses the weights and biases produced by the autoregressive equations. At this stage, the coefficients (parameters) of the autoregressive equations must be determined in order to obtain the weights and biases. In our proposed approach, the coefficients of the autoregressive equations are obtained by taking advantage of MPSO in an optimization process.

Particle swarm optimization (PSO) is an evolutionary computation technique proposed by Kennedy and Eberhart [23]. PSO can be regarded as a population-based optimization tool. The particle swarm concept originated as a simulation of a simplified social system [38]. A distinguishing feature of this heuristic algorithm is that it simultaneously examines different points in different regions of the solution space to obtain the global optimum solution. Local optimum traps can be avoided because of this feature of the method [2]. The MPSO algorithm has a time-varying inertia weight, as in [37], and, in a similar way, time-varying acceleration coefficients, as in [28].

In the optimization process, which can also be called the training of AC-S-MNM, the coefficients to be obtained are \(\phi _0^i ,\theta _0^i ,\phi _1^i ,\theta _1^i ,w_0^i \) and \(b_0^i ;i=1,2,\ldots , q\). Thus, each particle of the swarm has \(6\times q\) positions. The structure of a particle in a swarm is illustrated in Fig. 2.

Fig. 2 The structure of a particle in a swarm
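Since each particle carries the \(6\times q\) coefficients, a position vector can be decoded into the six coefficient groups of Eqs. (1) and (2). The exact ordering is fixed by Fig. 2; the contiguous layout assumed in this Python sketch is our own convention and may differ from the original:

```python
import numpy as np

def unpack_particle(x):
    """Split a 6q-dimensional position vector into the six coefficient
    groups of Eqs. (1) and (2); the contiguous ordering is assumed."""
    phi0, phi1, theta0, theta1, w0, b0 = np.split(np.asarray(x), 6)
    return phi0, phi1, theta0, theta1, w0, b0

x = np.random.rand(6 * 2)  # one particle for a model of order q = 2
phi0, phi1, theta0, theta1, w0, b0 = unpack_particle(x)
```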

The optimization process used to obtain the coefficients of the autoregressive equations can be given as an algorithm.

Algorithm 2 The training of AC-S-MNM by taking advantage of MPSO

Step 1 The parameters of MPSO are determined.

In the first step, the parameters which direct the MPSO algorithm are determined. These parameters are:

pn: The number of particles in the swarm.

vm: The bound on the magnitude of the velocities.

\(\left( {c_{1i} ,c_{1f} } \right) \): The interval that includes possible values of the cognitive coefficient \(c_1 \).

\(\left( {c_{2i} ,c_{2f} } \right) \): The interval that includes possible values of the social coefficient \(c_2 \).

\(\left( {w_1 ,w_2 } \right) \): The interval that includes possible values of the inertia parameter w.

\(\textit{itr}_\textit{max}\): The maximum number of iterations.

Step 2 Initial values of positions and velocities are generated.

The initial positions of the kth \(({k=1,2,\ldots , pn})\) particle are randomly generated from the uniform distribution \(\left( {0,1} \right) \) and kept in a vector \({ }_k X\) given as follows:

$$\begin{aligned} { }_k X=\left\{ { }_k x_1 ,{ }_k x_2 ,\ldots , { }_k x_{6q} \right\} , \quad k=1,2,\ldots , pn \end{aligned}$$
(5)

where \({ }_k x_l \left( {l=1,2,\ldots , 6q} \right) \) represents the lth position of the kth particle. Moreover, the initial values of the velocities are randomly generated from the uniform distribution \(\left( {-vm,vm} \right) \) and kept in a vector \({ }_k V\) given below.

$$\begin{aligned} { }_k V=\left\{ { }_k v_1 ,{ }_k v_2 ,\ldots , { }_k v_{6q} \right\} ,\quad k=1,2,\ldots , pn \end{aligned}$$
(6)

where \({ }_k v_l \left( {l=1,2,\ldots , 6q} \right) \) represents the velocity for the lth position of the kth particle.
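A minimal sketch of this initialization (variable names are ours; pn and vm follow the settings reported in Sect. 3, while q and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)  # seed fixed only for reproducibility
pn, q, vm = 30, 2, 100          # pn, vm as in Sect. 3; q is illustrative

X = rng.uniform(0.0, 1.0, size=(pn, 6 * q))  # positions, Eq. (5)
V = rng.uniform(-vm, vm, size=(pn, 6 * q))   # velocities, Eq. (6)
```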

Step 3 Fitness function values are computed.

Mean square error (MSE) is taken as the fitness function in this step. For the kth particle, the MSE value is obtained as in Eq. (7).

$$\begin{aligned} { }_k MSE=\frac{1}{n}\sum \limits _{t=1}^n \left( {y_t -{ }_k \hat{y}_t } \right) ^{2},\quad k=1,2,\ldots , pn \end{aligned}$$
(7)

The outputs, \({ }_k \hat{y}_t \), are calculated by using Algorithm 1 and the positions of the corresponding particle k. The weights and biases are constructed from the positions of the corresponding particle k, as in Fig. 2.

Step 4 \({ }_k Pbest\) and Gbest are determined.

\({ }_k Pbest=\left( {{ }_k p_1 ,{ }_k p_2 ,\ldots , { }_k p_{6q} ;k=1,2,\ldots , pn} \right) \) and \(Gbest=\left( {p_1 ,p_2 ,\ldots ,p_{6q} } \right) \) are specified via the fitness function values calculated in Step 3. \({ }_k Pbest\) is a vector that stores the positions corresponding to the kth particle's best individual performance, and Gbest stores the positions of the particle with the best fitness function value obtained so far.
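A Python sketch of Steps 3 and 4 (our own names; the `fitness` array is assumed to hold the \({ }_k MSE\) values of Eq. (7), computed by running Algorithm 1 with each particle's decoded coefficients):

```python
import numpy as np

def mse(y, y_hat):
    """Fitness function of Eq. (7) for one particle."""
    return np.mean((y - y_hat) ** 2)

def update_bests(X, fitness, pbest, pbest_fit, gbest, gbest_fit):
    """Refresh each particle's personal best (Pbest) and the
    swarm-wide best (Gbest)."""
    improved = fitness < pbest_fit
    pbest[improved] = X[improved]
    pbest_fit[improved] = fitness[improved]
    k = int(np.argmin(pbest_fit))
    if pbest_fit[k] < gbest_fit:
        gbest, gbest_fit = pbest[k].copy(), pbest_fit[k]
    return pbest, pbest_fit, gbest, gbest_fit
```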

Fig. 3 The flow chart of Algorithm 2

Step 5 The parameters of MPSO are updated.

The updated values of the cognitive coefficient \(c_1 \), the social coefficient \(c_2 \) and the inertia parameter w are calculated using the formulas given in Eqs. (8), (9), and (10).

$$\begin{aligned} { }^r c_1= & {} \left( {c_{1f} -c_{1i} } \right) \frac{r}{\textit{itr}_\textit{max} }+c_{1i} \end{aligned}$$
(8)
$$\begin{aligned} { }^r c_2= & {} \left( {c_{2f} -c_{2i} } \right) \frac{r}{\textit{itr}_\textit{max} }+c_{2i} \end{aligned}$$
(9)
$$\begin{aligned} { }^r w= & {} \left( {w_2 -w_1 } \right) \frac{\textit{itr}_\textit{max} -r}{\textit{itr}_\textit{max} }+w_1 \end{aligned}$$
(10)

where r is the current iteration number. Moreover, the indices f and i represent the final and initial possible values of the cognitive and social coefficients, respectively.
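These updates can be written as a small Python helper (a sketch with our own names; the default ranges follow the parameter settings reported in Sect. 3):

```python
def mpso_coefficients(r, itr_max, c1i=2.0, c1f=3.0,
                      c2i=2.0, c2f=3.0, w1=1.0, w2=2.0):
    """Time-varying c1, c2 and w of Eqs. (8)-(10); the default ranges
    follow the parameter settings reported in Sect. 3."""
    c1 = (c1f - c1i) * r / itr_max + c1i          # Eq. (8)
    c2 = (c2f - c2i) * r / itr_max + c2i          # Eq. (9)
    w = (w2 - w1) * (itr_max - r) / itr_max + w1  # Eq. (10)
    return c1, c2, w
```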

Step 6 Values of velocities and positions are updated.

The velocities and positions are updated by using the equations given in Eqs. (11) and (12), respectively.

$$\begin{aligned} { }_k^{r+1} v_l ={ }^r w\times { }_k^r v_l +{ }^r c_1 \times rand_1 \times \left( { }_k p_l -{ }_k^r x_l \right) +{ }^r c_2 \times rand_2 \times \left( p_l -{ }_k^r x_l \right) \end{aligned}$$
(11)
$$\begin{aligned} { }_k^{r+1} x_l ={ }_k^r x_l +{ }_k^{r+1} v_l \end{aligned}$$
(12)

where \(rand_1 \) and \(rand_2 \) are random values from the interval \(\left[ {0,1} \right] \).
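A sketch of Step 6 (our names; the random factors are drawn per position here, a common PSO convention, whereas Eqs. (11) and (12) leave this choice unspecified):

```python
import numpy as np

def update_swarm(X, V, pbest, gbest, c1, c2, w, rng):
    """Velocity and position updates of Eqs. (11) and (12)."""
    rand1 = rng.random(X.shape)  # drawn per position, a common convention
    rand2 = rng.random(X.shape)
    V = w * V + c1 * rand1 * (pbest - X) + c2 * rand2 * (gbest - X)  # Eq. (11)
    X = X + V                                                        # Eq. (12)
    return X, V
```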

Step 7 Check the stopping criteria.

If the predetermined maximum iteration number \(\left( {\textit{itr}_\textit{max} } \right) \) is reached, then the process stops; otherwise, Steps 3 to 7 are repeated.

When \(\textit{itr}_\textit{max} \) is reached, the optimum values of the weights and biases are specified by Eqs. (1) and (2), and the training of AC-S-MNM is completed. The flow chart of Algorithm 2 is given in Fig. 3.
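Putting the steps together, the training loop of Algorithm 2 might be sketched as follows, assuming the helper functions from the earlier fragments and a hypothetical `evaluate_fitness` routine that runs Algorithm 1 for every particle and returns the vector of MSE values from Eq. (7):

```python
import numpy as np

def train_ac_smnm(y, q, pn=30, vm=100, itr_max=300, seed=0):
    """MPSO training loop of Algorithm 2 (a sketch; evaluate_fitness,
    mpso_coefficients, update_swarm and update_bests are the helpers
    defined or assumed above)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, (pn, 6 * q))        # Step 2: positions
    V = rng.uniform(-vm, vm, (pn, 6 * q))     # Step 2: velocities
    pbest = X.copy()                          # Steps 3-4: initial bests
    pbest_fit = evaluate_fitness(X, y, q)
    k = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[k].copy(), pbest_fit[k]
    for r in range(1, itr_max + 1):           # Step 7: stop at itr_max
        c1, c2, w = mpso_coefficients(r, itr_max)                # Step 5
        X, V = update_swarm(X, V, pbest, gbest, c1, c2, w, rng)  # Step 6
        fitness = evaluate_fitness(X, y, q)                      # Step 3
        pbest, pbest_fit, gbest, gbest_fit = update_bests(
            X, fitness, pbest, pbest_fit, gbest, gbest_fit)      # Step 4
    return gbest  # decode with unpack_particle, then apply Eqs. (1)-(2)
```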

3 The Implementations

In order to investigate the performance of the proposed AC-S-MNM, 22 different time series were analysed. These time series and some of their features are given in Table 1. In the implementations of AC-S-MNM and the other methods based on MPSO, the parameters were taken as \(pn=30\), \(vm=100\), \(\left( {c_{1i} ,c_{1f} } \right) =\left( {2,3} \right) \), \(\left( {c_{2i} ,c_{2f} } \right) =\left( {2,3} \right) \), \(\left( {w_1 ,w_2 } \right) =\left( {1,2} \right) \), and \(\textit{itr}_\textit{max} =300\).

Table 1 Time series used in the implementations and some of their features

Twenty of these time series are daily stock exchange series: Dow Jones Futures (DJF), Istanbul Stock Exchange (BIST), National Association of Securities Dealers Automated Quotations (NASDAQ), and Taiwan Stock Exchange Capitalization Weighted (TAIEX). In the application to these data sets, the model order, in other words the number of inputs of the ANN, was varied from two to five; three different test set sizes were used (10, 20 and 40); and each data set was analysed with 30 different random initializations. Taking all of these settings into account, in total \(4\times 3\times 30=360\) different analyses were performed for each data set. In this analysis process, in addition to the proposed model, the single multiplicative neuron model ANN trained by MPSO (SMN-ANN-PSO), the multilayer perceptron ANN trained by MPSO (MLP-ANN-PSO), and the single multiplicative recurrent neuron model ANN trained by MPSO (SMN-R-ANN-PSO) were implemented, and the obtained results were evaluated together in terms of the root mean square error (RMSE) criterion, given in Eq. (13), for both training and test sets. A summary of the obtained results is given in Table 2.

$$\begin{aligned} RMSE=\sqrt{\frac{1}{n}\sum \limits _{t=1}^n \left( {y_t -\hat{y}_t } \right) ^{2}} \end{aligned}$$
(13)

In Table 2, the ranking rates of the proposed model are given with respect to four different values: the mean, minimum, maximum and standard deviation of RMSE. From Table 2, for the DJF time series, it is seen that the proposed AC-S-MNM has the best value with regard to the mean of RMSE in all 360 implementations in both training and test sets. Moreover, the proposed model has the best performance in 86.66% of implementations and the second best performance in 11.67% of implementations for the test sets in the analysis of DJF.

Table 2 Ranking rates of each model

When the implementations on the four different data sets (DJF, BIST, NASDAQ and TAIEX) are considered as a whole, the proposed model has the best values for the test sets in terms of the mean of RMSE in 99.17% of implementations, in terms of the minimum of RMSE in 91.67% of implementations, in terms of the maximum of RMSE in 100% of implementations, and in terms of the standard deviation of RMSE, which is also evidence of the consistency of the models, in 73.33% of implementations (detailed results can be seen in the supplementary tables).

Moreover, as another statistical demonstration of the outstanding performance of the proposed AC-S-MNM, the Kruskal-Wallis H (KW-H) test was performed, and the results obtained from the four different ANN models, including the proposed AC-S-MNM, were compared for the DJF, BIST, NASDAQ and TAIEX data sets. In the comparison, two significance levels, 0.05 and 0.10, were considered, and the p values obtained from the KW-H test were compared to them. According to the KW-H test results, for all analyses on the training and test data sets, there are significant differences among the performances of the ANN models at both the 0.05 and 0.10 significance levels, apart from just one case, the analysis of NASDAQ 2011/test set 10 when the number of inputs is 2 (see the Supplementary Tables). Even in this case, there is a significant difference among the ANN models at the 0.10 significance level (\(P=0.099\)). Considering the KW-H test results together with those given in Table 2, it can be statistically concluded that the proposed AC-S-MNM has the best forecasting performance in 98.33% and 99.17% of all analyses for the training and test sets, respectively.

Secondly, the Australian beer consumption data (AUST), a well-known real-world time series observed between 1957 Q2 and 1994 Q1, was analysed. In the analysis of AUST, shown in Fig. 4, as in previous studies, the number of inputs of the ANN was varied from 4 to 16 and the last 16 observations were taken as the test set.

The best performance of the proposed model for AUST, in terms of RMSE and the mean absolute percentage error (MAPE) given in Eq. (14), is presented in Table 3 together with the best results obtained from 11 different models. While the results of SARIMA (seasonal autoregressive integrated moving average), WMES (Winter's multiplicative exponential smoothing), MLP-ANN, RBF-ANN (radial basis function ANN), L&NL-ANN (linear and non-linear ANN), E-ANN (Elman ANN), MS-ANN (multiplicative seasonal ANN), and R-MNM-ANN (recurrent multiplicative neuron ANN) were taken from [16], the results of SMN-ANN-PSO, MLP-ANN-PSO, and SMN-R-ANN-PSO were obtained in this study by using MATLAB.

$$\begin{aligned} MAPE=\frac{1}{n}\sum \limits _{t=1}^n \left| {\frac{y_t -\hat{y}_t }{y_t }} \right| \end{aligned}$$
(14)
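For reference, the two error criteria of Eqs. (13) and (14) in a short Python sketch (function names are ours):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error, Eq. (13)."""
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error, Eq. (14)."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return np.mean(np.abs((y - y_hat) / y))
```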
Fig. 4 The graph of AUST

Table 3 The performance criteria of models for AUST

When all results in Table 3 are considered together, it is clearly seen that the proposed model has superior forecasting performance, with an RMSE value of 17.2390 and a MAPE value of 3.09%. The graph of the best forecasts, obtained in the case of model order 16, and the observations of the test data is given in Fig. 5. Figure 5 shows that the forecasts of AC-S-MNM are in good agreement with the observations, just as in the scatter plot given in Fig. 6.

Fig. 5 The graph of forecasts obtained from AC-S-MNM and observations for the test set of AUST

Fig. 6 The scatter plot of forecasts obtained from AC-S-MNM and observations for the test set of AUST

Fig. 7 The graph of ELC

Table 4 The performance criteria of models for ELC
Fig. 8 The graph of forecasts obtained from AC-S-MNM and observations for the test set of ELC

Fig. 9 The scatter plot of forecasts obtained from AC-S-MNM and observations for the test set of ELC

Finally, the Turkey electricity consumption data (ELC), observed monthly from the first month of 2002 to the last month of 2013, was used to evaluate the forecasting performance of AC-S-MNM. In the implementation of ELC, shown in Fig. 7, the model order was varied from 2 to 16, and the last 12 observations were taken as the test set.

The best performance of the proposed model for ELC is presented in Table 4 together with the best results obtained from some other models. Considering the results in Table 4, it is clearly seen that AC-S-MNM has superior forecasting performance, with an RMSE value of 6.2437E+08 and a MAPE value of 2.39%. The graph of the best forecasts, obtained in the case of model order 15, and the observations of the test data is given in Fig. 8. From Fig. 8, we clearly see that the forecasts of AC-S-MNM are in compliance with the observations. Moreover, the scatter plot given in Fig. 9 is supportive evidence for the cohesion of the forecasts with the observations.

4 Conclusion and Discussion

Nowadays, researchers commonly take advantage of MLP and S-MNM in time series prediction problems; although these models have various high-quality features, each of them also carries some issues, such as the architecture determination problem and a strict model-based structure. However, a more practical and successful ANN model can be obtained for time series prediction by integrating the superior aspects of S-MNM and MLP while eliminating their weaknesses. In this study, from this point of view, we proposed an ANN model, AC-S-MNM, that does not have the architecture selection problem, just like S-MNM, and is data-based, just like MLP. In AC-S-MNM, the weights and biases are produced through autoregressive equations. Since the time index of each observation is considered in the production of the weights and biases, the model is evaluated as data-based. The coefficients of the autoregressive equations and the initial values of the weights and biases were determined by using MPSO in an optimization process. The proposed model was applied to various real-world time series data sets and the obtained results were compared to the results of some well-known methods. The results show that AC-S-MNM exhibits significantly better performance compared to the existing models. In future studies, other equation structures can be utilized to determine the weights and biases.