1 Introduction

Formally defined, a time series is a sequence of values observed from a variable over time at regular intervals (for instance, every day, every month, or every year) [25]. Accordingly, forecasting a time series can be defined as the task of predicting successive values of the variable at future time steps based on past and present observations.

For many decades, different approaches have been used to model and forecast time series. These techniques can be classified into three different areas: traditional descriptive techniques, modern linear and nonlinear models, and soft computing techniques. Among all the methods developed, ARIMA, proposed by Box and Jenkins [3], is possibly the most widely known and used. Nevertheless, it yields simple linear models that are unable to capture subtle patterns in the time series data.

Newer methods based on artificial neural networks (ANNs), such as the one used in this paper, can instead generate more complex models that are able to capture those subtle variations.

The L-Co-R method [24], developed within the field of ANNs, makes joint use of Radial Basis Function Networks (RBFNs) and evolutionary algorithms (EAs) to automatically forecast any given time series. Moreover, L-Co-R designs adequate neural networks and selects the time lags that will be used in the prediction, following a co-evolutionary [7] approach that allows the main problem to be split into two dependent subproblems. The algorithm evolves two subpopulations under a cooperative scheme in which every individual of one subpopulation collaborates with individuals from the other in order to obtain good solutions.

While previous work [24] focused on 1-step-ahead prediction, the main goal of this paper is to analyze the effectiveness of the L-Co-R method over medium- and long-term horizons, feeding its own previously predicted values back to perform subsequent predictions. Thus, 6 different methods used in time series forecasting have been selected in order to test the behavior of the method.
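
As a minimal illustration of this recursive strategy (not code from the paper), the sketch below feeds each prediction back into the input window; `predict_one` and `lags` are hypothetical names standing in for any fitted one-step model and its selected time lags:

```python
import numpy as np

def recursive_forecast(history, predict_one, lags, horizon):
    """Recursive multi-step forecasting: each prediction is appended to
    the series and reused as an input to obtain the next one."""
    series = list(history)
    forecasts = []
    for _ in range(horizon):
        # Build the input vector from the selected time lags.
        x = np.array([series[-lag] for lag in lags])
        y_hat = predict_one(x)   # one-step-ahead prediction
        forecasts.append(y_hat)
        series.append(y_hat)     # feed the forecast back as if observed
    return np.array(forecasts)
```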

The rest of the paper is organized as follows: Sect. 2 introduces some preliminary topics related to this research; Sect. 3 describes the L-Co-R method; and finally, Sect. 4 presents the experimentation and the statistical study carried out.

2 Preliminaries

Approaches proposed for time series forecasting can be broadly grouped into linear and nonlinear models. Among linear methods, used chiefly for modelling time series, stand out exponential smoothing methods [34] (simple exponential smoothing, Holt's linear methods, and some variations of the Holt-Winters method), state space models [29], and ARIMA models [3]. Nonlinear models arose because linear models proved insufficient in many real applications; among nonlinear methods one finds regime-switching models, which comprise the wide variety of existing threshold autoregressive models [31], such as self-exciting models [32], smooth transition models [8], and continuous-time models [4], among others. Soft computing approaches were later developed to overcome disadvantages of nonlinear models, such as the lack of robustness of complex models and their difficulty of use [9].

ANNs have also been applied successfully [17] and are recognized as an important tool for time series forecasting. Within ANNs, the use of RBFs as activation functions was considered in works such as [5] and [27], and applied to time series by Carse and Fogarty [6], and Whitehead and Choate [33]. Later works, like those by Harpham and Dawson [13] or Du [10], focused on RBFNs for time series forecasting.
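
For reference, a common formulation of an RBFN output, sketched below under the assumption of Gaussian basis functions (the paper does not prescribe this exact form), is a weighted sum of radial activations around learned centers:

```python
import numpy as np

def rbfn_predict(x, centers, radii, weights, bias=0.0):
    """Gaussian RBFN output: f(x) = bias + sum_i w_i * exp(-||x - c_i||^2 / r_i^2)."""
    d2 = ((centers - x) ** 2).sum(axis=1)  # squared distances to each center
    phi = np.exp(-d2 / radii ** 2)         # Gaussian activations
    return bias + phi @ weights
```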

On the other hand, an issue that must be taken into account when working with time series is the correct choice of the time lags used to represent the series. Takens' theorem [30] establishes that, for a sufficiently large dimension d (the minimum dimension capable of representing the underlying relationship), it is possible to build a d-dimensional state space using correctly chosen time lags; moreover, if this space is correctly rebuilt, the dynamics of the reconstructed space are guaranteed to be topologically identical to the dynamics of the real system's state space.
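
A delay embedding of this kind can be built directly from the series; the sketch below pairs each target value with the lagged inputs given by a hypothetical `lags` list:

```python
import numpy as np

def delay_embedding(series, lags):
    """Rebuild a state space from a scalar series: each row pairs the
    inputs [y_{t-l} for l in lags] with the target y_t."""
    max_lag = max(lags)
    X = np.array([[series[t - lag] for lag in lags]
                  for t in range(max_lag, len(series))])
    y = np.asarray(series[max_lag:])
    return X, y
```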

Many methods are based on Takens' theorem (like [19]) but, in general, the approaches found in the literature treat lag selection as a pre- or post-processing step, or as a part of the learning process [1, 23]. In the L-Co-R method, the selection of the time lags is tackled jointly with the network design process; thus, it employs co-evolution to solve both problems simultaneously.

Cooperative co-evolution [26] has also been used to train ANNs, to design neural network ensembles [12], and to build RBFNs [18]. In addition, cooperative co-evolution has been applied to time series forecasting in works such as the one by Xin [20].

3 Description of the Method

This section describes L-Co-R [24], a co-evolutionary algorithm developed to automatically forecast time series while minimizing the prediction error. The algorithm simultaneously builds the RBFNs and the sets of lags that will be used to predict future values. To this end, L-Co-R evolves two populations of different species, in which any member of each population can cooperate with individuals from the other in order to generate good solutions; that is, each individual represents by itself a possible solution to its subproblem. The algorithm is therefore composed of the following two populations:

  • Population of RBFNs: it consists of a set of RBFNs that evolves to design a suitable architecture for the network. This population employs real codification, so every individual represents the set of neurons (RBFs) that composes the net. The number of neurons is variable, so nets can grow or shrink during the evolutionary process. Each neuron of the net is defined by a center (a vector with the same dimension as the inputs) and a radius. The exact dimension of the input space is given by an individual of the population of lags (the one chosen to evaluate the net). A possible representation of both species is sketched after this list.

  • Population of lags: it is composed of sets of lags and evolves to select the past values used to forecast the time series. The population uses a binary codification scheme in which each gene indicates whether that specific lag of the time series will be utilized in the forecasting process. The length of the chromosome is fixed at the beginning by a specific parameter and cannot vary during the execution of the algorithm.
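
As referenced in the first item above, one possible in-memory representation of the two species is sketched below; the class and field names are assumptions for illustration, since the paper does not fix an implementation:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class RBFNIndividual:
    """Variable-length, real-coded net: one (center, radius) pair per neuron.
    The center dimension equals the number of active lags of the collaborator."""
    centers: np.ndarray  # shape (n_neurons, n_active_lags)
    radii: np.ndarray    # shape (n_neurons,)

@dataclass
class LagsIndividual:
    """Fixed-length binary chromosome: gene i set to 1 means lag i+1 is used."""
    genes: np.ndarray    # e.g. np.array([1, 0, 0, 1], dtype=int)

    def active_lags(self):
        return [i + 1 for i, g in enumerate(self.genes) if g == 1]

    def repair(self):
        # At least one gene must be set: the net needs at least one input.
        if not self.genes.any():
            self.genes[np.random.randint(len(self.genes))] = 1
```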

As its fundamental objective, L-Co-R forecasts any time series at any horizon, building appropriate RBFNs designed with suitable sets of lags and reducing hand-made preprocessing steps. Figure 1 shows the general scheme of the L-Co-R algorithm.

Fig. 1 General scheme of the L-Co-R method

L-Co-R performs a process to automatically remove the trend of the time series it works with, if necessary. This procedure is divided into two main phases: preprocessing, which takes place at the beginning of the algorithm, and post-processing, at the end of the co-evolutionary process. Basically, the algorithm checks whether the time series includes a trend and, if so, the trend is removed.
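
The paper does not detail the detrending technique, so the sketch below assumes a simple linear trend fitted by least squares; the post-processing step restores it on the forecasts:

```python
import numpy as np

def remove_linear_trend(series):
    """Preprocessing: fit and subtract a linear trend, keeping the
    coefficients so the trend can be restored in post-processing."""
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series, deg=1)
    return series - (slope * t + intercept), (slope, intercept)

def restore_trend(forecasts, coeffs, start):
    """Post-processing: add the trend back to forecasts made from index `start`."""
    slope, intercept = coeffs
    t = np.arange(start, start + len(forecasts))
    return np.asarray(forecasts, dtype=float) + slope * t + intercept
```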

The execution of L-Co-R starts with the creation of the two initial populations, randomly generated for the first generation; then, each individual of both populations is evaluated. The L-Co-R algorithm uses a sequential scheme in which only one population is active at a time, so the two populations take turns evolving. First, the evolutionary process of the population of lags takes place: the individuals that will belong to the subpopulation are selected; following the CHC scheme [11], the genetic operators are applied; a collaborator for every individual is chosen from the population of RBFNs; and the individuals are evaluated again, with the result assigned as fitness. After that, the best individuals of the subpopulation replace the worst individuals of the population. During the evolution, the population of lags checks that at least one gene of each chromosome is set to one, since the net necessarily needs at least one input to obtain the forecasted value.

Secondly, the population of RBFNs starts its evolutionary process. In the first generation, every net in the population has a randomly chosen number of neurons that may not exceed a previously fixed maximum. As in the population of lags, the individuals for the subpopulation are selected, the genetic operators are applied, every individual chooses a collaborator from the population of lags, and then the individuals are evaluated and the result is assigned as fitness. The fitness function is defined as the inverse of the root mean squared error (RMSE). At the end of the co-evolutionary process, two models formed by a set of lags (from the first population) and a neural network (from the second population) are obtained: one is composed of the best set of lags and its best collaborator, and the other of the best net found and its best collaborator. Then, the two models are trained again, and the final model chosen is the one that obtains the best fitness. This final model produces the future values of the time series, and the forecasted data are then used to find subsequent values.
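
Under this definition, the fitness of a (net, lags) pair could be computed as below; the sketch assumes the RMSE is strictly positive:

```python
import numpy as np

def fitness(y_true, y_pred):
    """Fitness as the inverse of the root mean squared error."""
    rmse = np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
    return 1.0 / rmse  # a perfect fit (rmse == 0) would need special handling
```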

The collaboration scheme used in L-Co-R is the best-collaborator scheme [26]: every individual in either population chooses the best collaborator from the other population. Only at the beginning of the co-evolutionary process is the collaborator selected randomly, because the populations have not been evaluated yet. A sketch of the whole loop follows.
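
Putting the previous paragraphs together, the loop below sketches the sequential cooperative scheme; individuals are plain dicts carrying a "fitness" key, and `breed`, `evaluate`, and `replace` are hypothetical stand-ins for the operators described in this section:

```python
import random

def coevolve(pop_lags, pop_rbfns, breed, evaluate, replace, generations):
    """Sequential cooperative co-evolution: the populations take turns,
    and each offspring is scored with a collaborator from the other one."""
    for gen in range(generations):
        for active, other in ((pop_lags, pop_rbfns), (pop_rbfns, pop_lags)):
            offspring = breed(active)  # selection + genetic operators
            for ind in offspring:
                # Best-collaborator scheme; random choice in the first generation.
                collab = (random.choice(other) if gen == 0
                          else max(other, key=lambda i: i.get("fitness", 0.0)))
                ind["fitness"] = evaluate(ind, collab)  # e.g. 1 / RMSE
            replace(active, offspring)  # best offspring replace the worst
    return (max(pop_lags, key=lambda i: i.get("fitness", 0.0)),
            max(pop_rbfns, key=lambda i: i.get("fitness", 0.0)))
```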

The method has a set of specific operators especially developed to work with the individuals of each population. The operators used by L-Co-R are the following:

  • Population of RBFNs: tournament selection, x_fix crossover, four mutation operators applied at random (C_random, R_random, Adder, and Deleter), and replacement of the worst individuals by the best ones of the subpopulation.

  • Population of lags: elitist selection, the HUX crossover operator, replacement of the worst individuals, and divergence (the population is restarted when it stagnates); a sketch of HUX follows this list.
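
HUX, the crossover of the CHC scheme, exchanges exactly half of the non-matching bits between two binary parents; a minimal sketch, assuming NumPy integer arrays as chromosomes:

```python
import numpy as np

def hux_crossover(parent_a, parent_b, rng=None):
    """HUX: swap exactly half of the differing bits, chosen at random."""
    rng = rng or np.random.default_rng()
    a, b = parent_a.copy(), parent_b.copy()
    diff = np.flatnonzero(a != b)  # positions where the parents disagree
    if len(diff) >= 2:
        swap = rng.choice(diff, size=len(diff) // 2, replace=False)
        a[swap], b[swap] = b[swap], a[swap]
    return a, b
```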

4 Experimentation and Statistical Study

The main goal of the experiments is to study the behavior of the L-Co-R algorithm over 4 different, increasing horizons, and to compare its results against 6 other methods found in the literature using 3 different quality measures.

4.1 Experimental Methodology

The experimentation has been carried out using 20 databases taken from the INE. The data represent observations from different activities and differ in nature, size, and characteristics. The databases have been labeled as: Airline, WmFrancfort, WmLondon, WmMadrid, WmMilan, WmNewYork, WmTokyo, Deceases, SpaMovSpec, Exchange, Gasoline, MortCanc, MortMade, Books, FreeHouPrize, Prisoners, TurIn, TurOut, TUrban, and HouseFin.

To assess the effectiveness of L-Co-R, it has been compared, on the one hand, against 6 methods from the field of time series forecasting: exponential smoothing (ETS), Croston, Theta, Random Walk (RW), Mean, and ARIMA [16]; and, on the other hand, over 4 different horizons, in order to test its effectiveness as the horizon rises: 1, 6, 12, and 24.

An open question when dealing with time series is which measure to use in order to assess the accuracy of the obtained predictions. The Mean Absolute Percentage Error (MAPE) [2] was the first measure employed in the M-competition [21], and most textbooks recommended it. Later, many other measures were proposed [22], such as the Geometric Mean Relative Absolute Error, the Median Relative Absolute Error, the Symmetric Median and Median Absolute Percentage Error (MdAPE), and the Symmetric Mean Absolute Percentage Error, among others. However, these measures share a disadvantage: they are not generally applicable and can be infinite, undefined, or produce misleading results, as Hyndman and Koehler explained in their work [15]. Thus, they proposed the Mean Absolute Scaled Error (MASE), which is less sensitive to outliers, less variable on small samples, and more easily interpreted.

Table 1 Results of the methods L-Co-R, ETS, Croston, Theta, RW, Mean, and ARIMA, with respect to horizon 1 and MAPE
Table 2 Results of the methods L-Co-R, ETS, Croston, Theta, RW, Mean, and ARIMA, with respect to horizon 1 and MASE
Table 3 Results of the methods L-Co-R, ETS, Croston, Theta, RW, Mean, and ARIMA, with respect to horizon 1 and MdAPE
Table 4 Results of the methods L-Co-R, ETS, Croston, Theta, RW, Mean, and ARIMA, with respect to horizon 6 and MAPE
Table 5 Results of the methods L-Co-R, ETS, Croston, Theta, RW, Mean, and ARIMA, with respect to horizon 6 and MASE
Table 6 Results of the methods L-Co-R, ETS, Croston, Theta, RW, Mean, and ARIMA, with respect to horizon 6 and MdAPE

In this work, the measures used are MAPE (i.e., \(mean(\mid p_t\mid )\)), MASE (defined as \(mean(\mid q_t\mid )\)), and MdAPE (i.e., \(median(\mid p_t\mid )\)), taking into account that \(Y_t\) is the observation at time \(t = 1,\ldots,n\); \(F_t\) is the forecast of \(Y_t\); \(e_t = Y_t - F_t\) is the forecast error; \(p_t = 100e_t/Y_t\) is the percentage error; and \(q_t\) is defined as:

$$\begin{aligned} q_t = \displaystyle \frac{e_t}{\displaystyle \frac{1}{n-1} \sum _{i=2}^n \mid Y_i - Y_{i-1} \mid } \end{aligned}$$
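
Implemented directly from these definitions (a sketch assuming NumPy arrays, with MASE scaled by the same series passed in, matching the formula above):

```python
import numpy as np

def mape(y, f):
    """Mean Absolute Percentage Error: mean(|100 * e_t / Y_t|)."""
    return np.mean(np.abs(100.0 * (y - f) / y))

def mdape(y, f):
    """Median Absolute Percentage Error: median(|100 * e_t / Y_t|)."""
    return np.median(np.abs(100.0 * (y - f) / y))

def mase(y, f):
    """Mean Absolute Scaled Error: |e_t| scaled by the mean absolute
    one-step naive error (1/(n-1)) * sum |Y_i - Y_{i-1}|."""
    scale = np.mean(np.abs(np.diff(y)))
    return np.mean(np.abs(y - f) / scale)
```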

Due to its stochastic nature, the results yielded by L-Co-R have been calculated as the average errors over 30 executions for every time series. For each execution, the following parameters are used in the L-Co-R algorithm: lags population size = 50, lags population generations = 5, lags chromosome size = 10%, RBFNs population size = 50, RBFNs population generations = 10, validation rate = 0.25, maximum number of neurons in the first generation = 0.05, tournament size = 3, replacement rate = 0.5, crossover rate = 0.8, mutation rate = 0.2, and total number of generations = 20.

Tables 1, 2, 3, 4, 5, and 6 show the results of L-Co-R and the comparison methods (ETS, Croston, Theta, RW, Mean, and ARIMA) for the measures MAPE, MASE, and MdAPE and for horizons 1 and 6. Due to space limitations, this paper only shows the results for horizons 1 and 6; the results for the remaining horizons, 12 and 24, can be accessed at https://goo.gl/frHK7z.

As mentioned before, every result reported in the tables represents the average of 30 executions for each time series. The best result per database is marked with the character '*'. Considering every horizon tested:

  • Horizon 1: the L-Co-R algorithm obtains the best results in most of the time series. With respect to MAPE, it obtains the best results in 15 of the 20 time series used, as can be seen in Table 1. Regarding MASE, L-Co-R stands out, yielding the best results for 5 time series, as can be observed in Table 2. And concerning MdAPE, L-Co-R obtains better results than the other methods in 12 of the 20 time series, as Table 3 shows.

  • Horizon 6: L-Co-R obtains better results than all the other methods for MAPE and MdAPE, as can be seen in Tables 4 and 6, and the best results in 15 of the 20 time series for MASE, as can be observed in Table 5.

  • Horizon 12: L-Co-R yields the best results in 19, 17, and 18 of the 19 time series (MortCanc does not have enough values for this horizon) with respect to MAPE, MASE, and MdAPE, respectively.

  • Horizon 24: the L-Co-R algorithm obtains better results than the other methods in 17, 16, and 16 of the 17 time series (MortCanc, MortMade, and FreeHouPrize do not have enough values for this horizon) with regard to MAPE, MASE, and MdAPE, respectively.

Thus, the L-Co-R algorithm achieves a more accurate forecast in most of the time series for all of the horizons and quality measures considered.

Table 7 Friedman's test ranking for horizon 1
Table 8 Friedman's test ranking for horizon 6

4.2 Analysis of the Results

To analyze the results in more detail and check whether the observed differences are significant, two main steps are performed: firstly, identifying whether differences exist in general among the methods used in the comparison; and secondly, determining whether the best method is significantly better than the rest. To do this, it must first be decided whether parametric or non-parametric statistical techniques can be used. The adequate use of parametric statistical techniques requires three conditions to be met: independence, normality, and homoscedasticity [28].

Table 9 Friedman's test ranking for horizon 12
Table 10 Friedman's test ranking for horizon 24

Since the former conditions are not fulfilled, the Friedman and Iman-Davenport non-parametric tests have been used. Tables with the results of these tests are available at https://goo.gl/frHK7z. They show, from left to right, the Friedman and Iman-Davenport statistics (\(\chi ^2\) and \(F_F\), respectively), the corresponding critical values for each distribution using a level of significance \(\alpha \) = 0.05, and the p-value obtained for each of the measures utilized. Since the critical values of Friedman and Iman-Davenport are smaller than the statistics, there are significant differences among the methods in all cases.
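
As an illustration of this first step, the sketch below runs the Friedman test and computes the average ranks with SciPy; `errors` is an assumed array holding one quality measure per dataset (rows) and method (columns):

```python
import numpy as np
from scipy import stats

def friedman_ranking(errors, method_names):
    """Friedman test plus the average ranks used to pick the control method."""
    errors = np.asarray(errors)
    chi2, p_value = stats.friedmanchisquare(*errors.T)   # one sample per method
    ranks = stats.rankdata(errors, axis=1).mean(axis=0)  # lower error, lower rank
    ranking = sorted(zip(method_names, ranks), key=lambda pair: pair[1])
    return chi2, p_value, ranking
```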

In addition, the Friedman test provides a ranking of the algorithms, and the method with the lowest rank is taken as the control algorithm. According to Tables 7, 8, 9, and 10, the L-Co-R algorithm turns out to be the control algorithm for all the horizons considered and for the three quality measures used.

In order to check whether the control algorithm shows statistical differences with respect to the other methods, the Holm procedure [14] is used. Tables 11, 12, 13, and 14 present the results of Holm's procedure, showing the adjusted p-values of each comparison between the control algorithm and the rest of the methods for MAPE, MASE, and MdAPE, and for horizons 1, 6, 12, and 24, considering a level of significance of \(\alpha = 0.05\).
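
For reference, Holm's step-down adjustment can be computed as below (a sketch of the standard procedure, not code from the paper): the i-th smallest of the k unadjusted p-values is multiplied by (k - i) and monotonicity is enforced:

```python
import numpy as np

def holm_adjust(p_values):
    """Holm step-down: multiply the i-th smallest p-value (0-indexed) by
    (k - i), cap at 1, and enforce monotonicity over the sorted sequence."""
    p = np.asarray(p_values, dtype=float)
    k = len(p)
    order = np.argsort(p)
    adjusted = np.empty(k)
    running_max = 0.0
    for i, idx in enumerate(order):
        running_max = max(running_max, min(1.0, (k - i) * p[idx]))
        adjusted[idx] = running_max
    return adjusted  # reject the i-th hypothesis if adjusted[i] <= alpha
```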

As can be seen in Tables 11, 12, 13, and 14, there are significant differences between L-Co-R and the rest of the methods in most of the cases. Analyzing each horizon more specifically:

  • Horizon 1: significant differences exist between L-Co-R and the rest of the methods for MAPE. With respect to MASE, there are significant differences between the L-Co-R algorithm and Mean, ARIMA, and Croston, although the same cannot be assured for ETS, RW, and Theta. Regarding MdAPE, L-Co-R shows significant differences with all methods except ARIMA, as can be seen in Table 11.

  • Horizon 6: L-Co-R has significant differences with all methods used, for every measure considered, as Table 12 shows.

  • Horizon 12: there are significant differences among the control algorithm, L-Co-R, and the rest of the methods in all cases, as can be observed in Table 13.

  • Horizon 24: as with horizons 6 and 12, there are also significant differences between L-Co-R and the other methods, as Table 14 shows.

Table 11 Adjusted p values of Holm’s procedure between the control algorithm (L-Co-R) and the other methods for MAPE, MASE, and MdAPE with respect to horizon 1
Table 12 Adjusted p values of Holm’s procedure between the control algorithm (L-Co-R) and the other methods for MAPE, MASE, and MdAPE with respect to horizon 6
Table 13 Adjusted p values of Holm’s procedure between the control algorithm (L-Co-R) and the other methods for MAPE, MASE, and MdAPE with respect to horizon 12
Table 14 Adjusted p values of Holm’s procedure between the control algorithm (L-Co-R) and the other methods for MAPE, MASE, and MdAPE with respect to horizon 24

In conclusion, it is possible to confirm that the L-Co-R method achieves a better forecast in the majority of cases, even when the horizon grows, compared with the other 6 methods utilized and across the 3 different quality measures considered.