1 Introduction

Evolutionary algorithms (EAs) have been successfully used in a wide range of applications. Traditionally, EAs are well suited to problems where the environment is static. For dynamic optimization problems, an effective EA must be able to detect changes and deal with them rapidly when they occur. Classical EAs are not suited to these kinds of problems, since they tend to converge prematurely to a solution. Therefore, when the conditions of the environment change, the converged population needs some time to readapt and move towards the new optimum.

To deal with these limitations, some improvements have been proposed as extensions of the classical EA. These improvements include maintaining diversity, e.g. Cobb and Grefenstette (1993), Grefenstette (1992), using memory schemes, e.g. Karaman et al. (2005), Yang (2007), Simões and Costa (2007b), Barlow and Smith (2008), using multi-populations, e.g. Ursem (2000), Younes et al. (2006), Li and Yang (2012) or anticipating the changes in the environment, e.g. Bosman and La Poutré (2007), Simões and Costa (2008, 2009b), Richter and Yang (2009).

If the environment changes cyclically and past situations reappear later, the use of memory is beneficial, since the memorized solutions can be used to help the EA when a change in the environment happens and a previous situation reappears. In general, memory-based approaches dealing with dynamic environments use the memory after a change is detected. As a consequence, the EA’s performance decreases after the change and then, when the memory is used, it starts increasing. In order to minimize the effects of the environmental changes on the EA’s performance, anticipation mechanisms have been explored by different authors. The goal of these approaches is to use past data to forecast future situations and use the forecasts to help the EA deal with the new environment. The next section describes the most relevant works concerning these issues. The majority of these methods use prediction to estimate where the optimum will move, while the moment of the next change is given to the algorithm. Nevertheless, in real situations, the time when the next change will happen is also unknown. In our previous work, we proposed the use of new prediction techniques in memory-based EAs in order to estimate both the time of the next change and the next environmental transition, with promising results. In Simões and Costa (2008), mechanisms using linear regression and Markov chains are used to estimate the generation when a change in the environment will occur, and to predict to which state (or states) the environment may change, respectively. In Simões and Costa (2009a), we investigated different methods to dynamically adjust the linear predictor in order to achieve higher adaptability and robustness. In Simões and Costa (2009b), a new approach based on a nonlinear regression predictor is proposed to estimate when the next change will occur. In this paper we unify and further extend the use of these techniques in memory-based EAs.
The previously proposed approaches presented several limitations and failed in specific situations. In particular, the linear regression predictor used all available data to make the next prediction. Consequently, as time passed, the computational effort grew, and if the dynamics of the change period was altered, the performance of the EA was compromised. To overcome this problem, in this paper we introduce the concept of a time window, which greatly enhances the linear predictor module, making it more efficient, robust and applicable to a wider set of situations. This new mechanism significantly reduces the computational effort of the previous versions and presents the best performance on the studied benchmark problems. Moreover, we enlarge the experimental setup with new test functions dealing with different and more challenging scenarios. Finally, we present a unified and complete description of the framework, which includes our memory-based evolutionary algorithm, the prediction modules (linear and nonlinear) and the anticipation module.

The rest of the paper is organized as follows. The next section briefly reviews relevant work concerning prediction in dynamic environments. Section 3 gives an overview of different categorizations of dynamic environments, introducing a new one that is used in this work. Section 4 presents the memory-based EA used in this study. The prediction methods introduced in Simões and Costa (2008, 2009a, b) and the new improvements are described in Sect. 5. The benchmarks used, the experimental settings and the performance measures are detailed in Sect. 6. The results obtained are analyzed in Sect. 7. Finally, the relevant conclusions and directions for future work are discussed in Sect. 8.

2 Related work

This section describes related work proposed in the literature concerning anticipation in changing environments using EAs.

Stroud (2001) used a Kalman-extended genetic algorithm (KGA) in which a Kalman filter is applied to the fitness values associated with the population individuals. The goal of this Kalman filter is to determine when to generate a new individual, when to re-evaluate an existing individual, and which one to re-evaluate. This KGA is applied to the problem of maintaining a network configuration with minimized message loss in which the nodes are mobile and the transmission over a link is stochastic. As the nodes move, the optimal network changes, but the information contained within the population of solutions allows efficient discovery of better-adapted solutions.

van Hemert et al. (2001) introduced an EA with a meta-learner to estimate, at time \(t\), how the environment will be at time \(t+\delta \). This approach uses two populations: one searches for the current optimum, and the other uses the best individuals from the past to predict the future best value; individuals from the latter are occasionally sent to the former. The prediction is made based on observations from the past, using two types of predictors: a perfect predictor and a noisy predictor. The prediction module is not actually implemented; instead, a perfect or a noisy predictor is assumed. The idea is tested with two benchmark problems: the knapsack problem and Ošmera’s function.

An integrated system combining prediction, optimization and adaptation techniques is proposed in Schmidt et al. (2005) and applied to a real-world application used to find the best distribution of cars of a particular model across the nation. The problem is complex, and the implemented model addresses the issues of transportation, the volume-sensitivity effect, price depreciation, recent history, current inventory, risk factors and dynamic market changes. The system has three main modules: optimization, prediction and adaptation. The prediction module uses information from the past to provide predictions about the sale prices. In Schmidt et al. (2005), no information is given about which techniques are used to provide the predictions. The other two modules use the information provided by the prediction module and provide an answer to the actual problem. Finally, the adaptation module receives new information about the problem and adapts the parameters of the prediction module in order to decrease the prediction error. Later, in Michalewicz et al. (2007), this model was used in three different case studies.

Bosman and La Poutré (2006, 2007) proposed several approaches focused on the importance of using learning and anticipation in online dynamic optimization. These works analyzed the influence of time-linkage present in problems such as scheduling and vehicle routing. The presence of time-linkage in this kind of problem can influence the overall performance of the system: if a decision is made just to optimize the score at a specific moment, it can negatively influence the results obtained in the future. Bosman’s works proposed an algorithmic framework integrating evolutionary computation with machine learning and statistical learning techniques to estimate future situations. Predictions are made based on information collected from the past. The predictor used is a learning algorithm that approximates either the optimization function or several parameters.

Hatzakis and Wallace (2001) use prediction techniques to forecast the location of the Pareto front in multiobjective problems. This approach stores the location of previous solutions and uses autoregressive models to predict the position of the new optimal solution in the next time step. The changes in the environment are known a priori in order to decide when to make the next prediction. A similar technique for multiobjective optimization was investigated by Zhou et al. (2007). This approach explores prediction techniques to re-initialize the population of an EA. The proposed method uses information from the past to guide future search. Each individual in the population is tracked and its history is modeled through a time series model. Predictions about each individual’s position at the next time step are made using a linear model. These predictions are used to re-initialize the population after a change is detected. Two strategies for population re-initialization are investigated. The first predicts the new location of individuals from the location changes that occurred in the past; the current population is then updated with new individuals generated based on that prediction. The second strategy consists of perturbing the current population with Gaussian noise, whose variance is estimated according to the previous changes.

Rossi et al. (2008) compared different techniques to improve the search for a moving optimum using the information provided by a predictor mechanism based on Kalman filters. The predictor assumes that the changes in the environment are not random and can be learned, helping the EA to keep track of the current optimum.

The prediction methods investigated in the papers cited above are used to estimate the likelihood of particular future situations and to decide what the algorithm must do in the present situation. Since information about the future is typically not available, it is obtained by learning from past situations. The works mentioned are mainly focused on estimating where the optimum will move; the moment when the change happens is either given to the algorithm or detected after its occurrence. Simões and Costa (2008, 2009b) introduced different methods to estimate both the nature of the change and the moment of the next change. With the correct use of the two predictors, the EA starts preparing for the change before it happens, by introducing useful information into the population. The proposed methods are based on Markov chains and on linear and nonlinear regression techniques. The results obtained prove the efficacy of the proposed approaches, which are further investigated in this article.

3 Dynamic environments

Dynamic environments can be classified in different ways. For example, Branke (2002) categorizes the environments using certain parameters of the problem: the frequency of change, the severity of change, the predictability of change and the periodicity of change (i.e. the cycle length). A different categorization, suggested by De Jong (2006), uses a direct description of the problems, classifying them into (1) alternating problems, (2) problems with changing morphology, (3) drifting landscapes and (4) abrupt and discontinuous problems. Weicker (2003) proposes a classification of dynamic environments that aims to establish a mapping between different types of dynamic optimization problems, techniques and performance measures. To achieve this, Weicker compares and classifies the dynamic problems using a mathematical framework. Moreover, Yang and Yao (2008b) propose a classification that uses cyclic, cyclic-with-noise and random environments constructed using the XOR Dynamic Optimization Problem generator. Since we are interested in assessing the effectiveness of two different predictors, in this paper we suggest an alternative way of classifying dynamic environments: we group the changes according to when the environment changes and how the environment changes. In each group we only discuss the types of environments that are studied in this paper.

3.1 When does the environment change?

The time when the changes occur is defined by the change period, which consists of the number of function evaluations between two consecutive changes. Knowing the characteristics of the change period, different decisions can be made concerning the design of the EA. The change period has three main aspects to be considered: (1) the type, (2) the frequency and (3) the predictability.

  1. Types of change period: the change period can be classified as (a) periodic or linear, where the changes are observed at fixed intervals; (b) patterned, where the interval between the changes is not constant but follows a repeated pattern; (c) nonlinear, where the moments at which the changes are observed follow a nonlinear function; and (d) random, where the changes happen at random points without any pattern or periodicity.

  2. Frequency of change: determines how often the environment changes. Changes can occur every generation or at larger intervals. Environments that change faster are typically harder to deal with. This is an important issue, since in most approaches using EAs the algorithm only reacts after the change is detected, and the frequency of change can determine if the EA is able to quickly readapt to the new environmental conditions.

  3. Predictability of the change: this aspect is directly related to the type of change period. If the change period follows a linear or a repeated trend, the moment of the next change can be predicted. However, if the change period is completely random, no prediction is possible. We introduce this aspect since we are interested in designing EAs that can react before the change happens.

3.2 How does the environment change?

Knowing how the environment changes is important in deciding if the incorporation of a memory component will be useful to the EA or if the application of other methods can be more effective.

  1. Types of environmental changes: the environmental changes can be (a) cyclic, where situations from the past reappear in the future in a cyclic manner (in this type of environment, the number of different environments can determine the difficulty of the problem); (b) cyclic with noise, where environments from the past reappear but with small differences introduced by a noise factor; (c) probabilistic, where the transitions between a fixed number of environments are governed by some probability; and (d) random, where the environments change from one state to a completely different state without any correlation with the past.

  2. Severity of change: measures the strength of the modifications in the environment. The environment can change to a completely different state or to a similar one.

  3. Predictability of the new environment: this is directly related to the type of environmental change. If the transitions between the environments are cyclic or follow a trend that can be captured by the algorithm, it is possible to predict which modifications will be observed at the next change. All of the previously mentioned types of environments, except the random ones, present some predictability.

4 Memory-based EAs

In dynamic problems, memory is used to store successful past solutions under the assumption that the optimum may return to its former value. When certain aspects of the problem exhibit some kind of periodic behavior, old solutions can be used to bias the search towards their vicinity and reduce the time needed to recover. The use of memory is beneficial in those types of environments.

Memory-based approaches can be divided into two categories: implicit memory and explicit memory. Implicit memory is characterized by the use of redundant representations: by using diploid or multiploid chromosomes, the best individual of the population is implicitly memorized, and a dominance scheme controls which genes are expressed in the phenotype (Goldberg and Smith 1987; Ng and Wong 1995; Uyar and Harmanci 2002, 2005; Yang 2006b). Approaches using explicit memory need extra space to explicitly store information about the individuals or the environments. In these methods, one population performs the search for the optimum while a second population, called the memory, stores useful information that is reused later (Yang 2006a; Simões and Costa 2007b, 2012; Barlow and Smith 2008).

This work uses the explicit memory-based EA proposed in Yang (2005), called memory-enhanced genetic algorithm (MEGA), enhanced by adding two prediction modules (see Sect. 5). In MEGA, the population and the memory are initialized at random. The time of the next memory update (\(t_m\)) is decided using a random integer generator: if the memory is updated at generation \(t\), the next update will occur at generation \(t_m = t + rand(5, 10)\). In order to store the most relevant information about an environment in the memory, the memory is also updated each time an environmental change is detected. If the memory update is due to \(t = t_m\), the current best individual of the population is stored in the memory; if the update is due to a change detection, the elite of the previous population is stored. To store a new individual in the memory, another must be replaced: if there are random (never-updated) individuals in the memory, one of them is selected for replacement; otherwise, the memory point closest to the individual being stored is replaced, provided the new individual is a better solution. When an environmental change is detected, a new set of individuals is formed by merging the memory and the search population. These individuals are then evaluated in the context of the new environment, and the best \(p\) (population size) individuals are selected to become the new search population, which evolves through selection, crossover and mutation. Throughout this process, the memory remains unchanged. The best individual of the previous population is preserved and transferred to the next population, replacing the worst individual (elitism of size 1). More details about this algorithm can be found in Yang (2005, 2008a).
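As a minimal illustration of the memory-update and change-reaction rules just described (the function names and the Hamming-distance similarity measure are our own assumptions, not taken from Yang (2005); a maximization problem is assumed), the policy can be sketched as:

```python
import random

def next_memory_update(t):
    # The next memory update is scheduled rand(5, 10) generations ahead.
    return t + random.randint(5, 10)

def store_in_memory(memory, individual, fitness):
    """Insert `individual` into the fixed-size memory, MEGA-style."""
    # Replace a random (never-updated) slot first, if any remain.
    for i, (stored, _f) in enumerate(memory):
        if stored is None:
            memory[i] = (individual, fitness)
            return
    # Otherwise replace the closest stored point, but only if the
    # new individual is a better solution.
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    i = min(range(len(memory)), key=lambda j: hamming(memory[j][0], individual))
    if fitness > memory[i][1]:
        memory[i] = (individual, fitness)

def react_to_change(population, memory, evaluate, p):
    # On change detection: re-evaluate the merged set in the new
    # environment and keep the best p individuals as the new population.
    merged = population + [ind for ind, _ in memory if ind is not None]
    merged.sort(key=evaluate, reverse=True)
    return merged[:p]
```

The memory itself is left untouched by `react_to_change`, matching the description above.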

5 Prediction

In this work, the previously described EA is empowered with two prediction modules. The first predictor estimates when the next change will take place; for this task two different methods are studied, one using linear regression and another using nonlinear regression. The second predictor is based on Markov chains and is used to estimate how the next environment will change. The combined and correct application of these two predictors makes it possible to effectively anticipate the change and prepare the EA for the future environmental conditions. Knowing when the next change will happen and how the environment will change, it is possible to retrieve useful information from the memory and introduce it into the population before the environmental change occurs.

5.1 Predicting when

Usually, memory-based EAs for dynamic environments detect the changes when they occur. After the change is detected, information is retrieved from the memory and inserted into the population (Karaman et al. 2005; Simões and Costa 2007b; Yang 2007). Nevertheless, in certain types of dynamic environments some repeated behavior can be observed and it is possible to make predictions about when the next change will happen. For instance, if the environment changes periodically after a fixed number of generations, the generation when the next change will occur can be correctly predicted. Even in non-periodic environments, if some repeated pattern is present, prediction methods can be successfully applied. Two different methods were previously explored: a linear regression predictor (Simões and Costa 2008) and a nonlinear regression predictor (Simões and Costa 2009b). Those methods, which aim to estimate the time of the next change, are described next. We also describe a novel approach, based on the concept of a time window, that greatly improves the efficiency of the prediction module.

5.1.1 Linear regression predictor

Simple linear regression analyzes the relationship between a response variable \(y\) and a single explanatory variable \(x\). This statistical method assumes that for each value of \(x\), the observed values of \(y\) are normally distributed around a mean that depends on \(x\). These means are usually denoted by \(\mu _y\). In general, the means \(\mu _y\) can change according to any sort of pattern as \(x\) changes; in simple linear regression it is assumed that they all lie on a line when plotted against \(x\). Linear regression allows inferences not only for the values of \(x\) present in the data, but also for values of \(x\) not present in the data.

In this work we propose the use of a linear regression predictor to estimate when the next change will happen. If the environment changes periodically at fixed time steps, the generation when the next change will occur can be successfully predicted by linear regression. In this case the explanatory variable \(x\) is the number of the change (1, 2, 3, \(\ldots \)) and the response variable \(y\) is the generation at which that change occurred (10, 20, 30, \(\ldots \), for a periodic change period changing every 10 generations). To predict when the next change will occur using linear regression we proceed as follows: (1) the first two changes of the environment are memorized after they happen (no prediction can be made yet); (2) the \(k\)th change, \(k > 2\), is predicted using the equation of the regression line. The first two changes cannot be predicted since they are needed to compute the slope and the intercept of the regression line. After the first two changes, and using the generations at which those changes occurred, an approximation of the regression line is built and predictions about the next possible moment of change are provided. Then, each time a change occurs, new values for the slope and the intercept are computed and the regression line is updated.

In previous work, this predictor used all available data to calculate the slope and the intercept of the regression line and to provide the estimate for the next change. Therefore, as time passed, the predictor became slower. Another limitation of using all available observations was that, if the type of the change period was other than periodic, the prediction accuracy was seriously affected. In this paper we introduce the idea of a time window, which consists of the number of recent observations that are used to estimate the next value. Instead of using all available data, the linear predictor uses only a fraction of the available information to make the next prediction. Using an appropriate time-window value, the linear regression predictor is faster and also more robust, as the experimental results show.
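The time-window variant of the linear predictor can be sketched as follows (a simplified illustration; the function name and interface are our own, not the paper's):

```python
def predict_next_change(change_generations, window=None):
    """Estimate the generation of the next change by least-squares linear
    regression on the last `window` observed changes.
    `change_generations[i]` is the generation at which change i+1 occurred;
    `window=None` reproduces the original predictor, which uses all data."""
    data = change_generations[-window:] if window else change_generations
    n = len(data)
    if n < 2:
        return None  # at least two observed changes are needed to fit a line
    # x = index of the change, y = generation at which it occurred
    start = len(change_generations) - n
    xs = [start + i + 1 for i in range(n)]
    ys = data
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    next_x = len(change_generations) + 1
    return intercept + slope * next_x
```

For a periodic change period of 10 generations (changes at 10, 20, 30, 40), the predictor returns 50; a small window simply restricts the fit to the most recent trend.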

5.1.2 Nonlinear regression predictor

The linear regression method is not suitable for fitting data that follows a nonlinear pattern. For these cases, nonlinear regression is often used, because it allows a wide range of situations to be modeled. The basic idea of nonlinear regression is the same as that of linear regression, namely to relate a response \(y\) to a vector of predictor variables \(x\) (McCabe and Moore 2003). Nonlinear regression is characterized by the fact that the prediction equation depends nonlinearly on one or more unknown parameters \(\theta \). Nevertheless, this method has a major limitation: the nonlinear function must be known. Moreover, an additional task is needed: the estimation of the parameters of the nonlinear equation. The task of parameter estimation for nonlinear regression is not straightforward. Usually, statistical software using numerical algorithms is used to analyze the data and produce the best choice of parameters for that data (McCabe and Moore 2003). A nonlinear parameter estimation problem is an optimization problem whose goal is to minimize the sum of squared errors given by Eq. 1.

$$\begin{aligned} Sum_{e_i} = \sum _{i=1}^n \left( y_i - f(x_i, \theta ) \right) ^2 \end{aligned}$$
(1)

Rather than minimizing the sum of squared errors, other techniques minimize the sum of absolute deviations. Several function minimization methods are used in parameter estimation, for instance weighted least squares, maximum likelihood, the Quasi-Newton method, the Simplex procedure or Hooke–Jeeves pattern moves (McCabe and Moore 2003; Nash and Walker-Smith 1987). In general, these methods are not easily controllable and require much auxiliary information to work correctly. Another option, more general and easier to apply, is to use a genetic algorithm to evolve a population of individuals that minimize an objective function. This approach was introduced and successfully tested by Pan et al. (1995) and is used in this paper to estimate the parameter vector \(\theta = (\theta _1, \theta _2, \ldots , \theta _n)\), where \(n\) is the number of parameters. The GA uses a population of binary strings which correspond to different values of \(\theta \). The required number of genes is determined by the desired precision and the domain size of each parameter. The fitness function used to evaluate the population is the function given by Eq. 1. Because individuals with higher fitness are selected more often, after some generations the best individual represents the best solution found for \(\theta \). The initial population is generated at random and is evolved using tournament selection, uniform crossover and flip mutation. The best individual of the previous population is transferred to the next population to preserve the best solution found. The desired precision and the parameters’ domains are set at the beginning; while the GA is running, these initial domains can be altered in order to provide faster and more accurate estimates from the known data. This task is controlled by two parameters, \(t_d\) and \(s_d\).
The value of \(t_d\) is randomly chosen among four different values: 1 for a left translation of the domain, 2 for a right translation of the domain, 3 for an increase of the domain and 4 for a decrease of the initial domain. The size of the translation, increase or decrease is given by the parameter \(s_d\) (\(s_d \in [0.0, 1.0]\)). These two parameters are changed during the run and applied to the best individual of the population. If the individual’s fitness improves using the new domain, the domain size is adjusted for all individuals of the next generation.
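A simplified sketch of such a GA-based parameter estimator is given below (function names, encoding length and rates are our own assumptions; the domain-adjustment mechanism controlled by t_d and s_d is omitted for brevity):

```python
import random

def decode(bits, lo, hi):
    """Map a binary string to a real value in [lo, hi]."""
    v = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * v / (2 ** len(bits) - 1)

def sse(theta, f, xs, ys):
    # Sum of squared errors between observations and the model (Eq. 1).
    return sum((y - f(x, theta)) ** 2 for x, y in zip(xs, ys))

def estimate_parameters(f, xs, ys, domains, bits_per_param=12,
                        pop_size=40, generations=100):
    """Estimate the parameter vector theta of model f with a GA using
    tournament selection, uniform crossover, flip mutation and elitism."""
    length = len(domains) * bits_per_param

    def decode_all(chrom):
        return [decode(chrom[i * bits_per_param:(i + 1) * bits_per_param],
                       lo, hi) for i, (lo, hi) in enumerate(domains)]

    def fitness(chrom):
        return -sse(decode_all(chrom), f, xs, ys)  # minimizing the SSE

    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)
        new_pop = [best[:]]  # elitism of size 1
        while len(new_pop) < pop_size:
            p1 = max(random.sample(pop, 2), key=fitness)  # tournament
            p2 = max(random.sample(pop, 2), key=fitness)
            child = [random.choice(g) for g in zip(p1, p2)]  # uniform crossover
            child = [b ^ (random.random() < 0.01) for b in child]  # flip mutation
            new_pop.append(child)
        pop = new_pop
    return decode_all(max(pop, key=fitness))
```

The number of bits per parameter plays the role of the desired precision, and `domains` holds the initial domain of each parameter.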

A predictor based on nonlinear regression is tested to estimate when the next change will happen. While the linear predictor can provide predictions simply by calculating the slope and the intercept, the nonlinear predictor needs at least one candidate function in order to give future estimates. Four different functions are incorporated in the nonlinear predictor, defined by Eqs. 2 through 5.

$$\begin{aligned}&f_1 = \theta _1 + \theta _2x + \theta _3x^2\end{aligned}$$
(2)
$$\begin{aligned}&f_2 = \frac{\theta _1x}{\theta _2 + \theta _3 x}\end{aligned}$$
(3)
$$\begin{aligned}&f_3 = \frac{\theta _1}{1 + e^{(\theta _2 - \theta _3 x)}}\end{aligned}$$
(4)
$$\begin{aligned}&f_4 = \frac{(\theta _1 x)^{\theta _4} + \theta _2 \theta _3}{\theta _3 + x^{\theta _4}} \end{aligned}$$
(5)

Each of these functions, using different values of the vector \(\theta \), can model a wide set of data. For example, function \(f_1\) creates a rapid change period, i.e., with few generations between two changes. Using \(f_2\), the change period is slower; with function \(f_3\), the change period initially changes very quickly but slows down as time proceeds. Function \(f_4\) is used to create a very rapid change period that becomes slower at the end. In the proposed nonlinear regression predictor, a set of \(n\) functions \(f_1, f_2, \ldots , f_n\) can be used to give predictions. At time \(t\), only one function is active, selected using Eq. 1. As we said, the parameter vector \(\theta \) is estimated using a standard GA. Every time a change occurs in the environment and additional information becomes available, the GA is executed to find the vector \(\theta \) that best fits the data. Thus, the vector \(\theta \) is estimated using only the known data. Using these estimated parameters and the selected function, the predictor indicates when the next change will occur. After the \(k\)th change has occurred, the prediction error is computed: the difference between the predicted value and the generation when the change actually occurred. If this error is greater than an established threshold \(\alpha _p\), the module analyzes all available functions to assess whether another function can provide better results. In this step all functions are re-evaluated using Eq. 1 with the data obtained since the \((k-1)\)th change. The function that minimizes the sum of squared errors is selected to provide the next predictions. If more than one function has an equal error, the choice is made at random. Figure 1 shows how the proposed module works.

Fig. 1
figure 1

The nonlinear regression predictor (adapted from Simões and Costa 2009b)
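The function-switching step of the module can be sketched as follows (an illustrative simplification: the paper breaks ties at random, whereas this sketch deterministically keeps the first minimizer; all names are ours):

```python
def select_model(models, thetas, xs, ys, current, error, alpha_p):
    """Keep the active model index unless its last prediction error
    exceeded alpha_p; in that case re-evaluate all candidates on the
    known data (xs, ys) and switch to the model with the smallest
    sum of squared errors (Eq. 1)."""
    if error <= alpha_p:
        return current
    def sq_error(i):
        f, theta = models[i], thetas[i]
        return sum((y - f(x, theta)) ** 2 for x, y in zip(xs, ys))
    return min(range(len(models)), key=sq_error)
```

Here `models` are the candidate functions (e.g. f_1 through f_4) and `thetas` their current GA-estimated parameter vectors.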

In Simões and Costa (2009a, b), this predictor was tested using cyclic, patterned and nonlinear change periods. The predictions provided were accurate for all types of change periods. However, the nonlinear change periods used for testing this module were created with the functions incorporated in the module; therefore, the prediction accuracy depended only on the quality of the parameter estimation made by the GA. In this paper we test the accuracy of this module using different nonlinear change periods obtained with functions other than the four incorporated in the predictor. The goal is to see how the predictor chooses the function to approximate the known data and to measure the quality of the predictions it provides. We also use a change period obtained by a combination of four different nonlinear functions, known and unknown to the module. Section 6 details this information.

5.2 Predicting how

The prediction of how the environment will be modified at the next change, combined with the prediction of the generation when that change will happen, makes it possible to prepare the population before the change. If we know how the next environment will look, we can introduce the appropriate individuals into the population. This way, when the change effectively happens, the EA will be prepared to face the new environmental conditions.

To gather information about the characteristics of the known environments we use a Markov chain. The information stored in the Markov chain is also used to estimate which environment(s) can appear in the next change.

5.2.1 Markov chain predictor

A Markov chain can be defined as a sequence of random variables \(X_1\), \(X_2\), \(X_3\), \(\ldots \) that does not keep memory of the whole past (Norris 1997). In fact, Markov chains are memoryless, meaning that the present state is enough to predict future states, i.e.: \(Pr(X_{n+1} = x |X_n=x_n, \ldots , X_1=x_1) = Pr(X_{n+1} = x | X_n=x_n)\).

A discrete Markov chain model can be defined by the tuple \((S, P, \lambda )\). \(S\) is the state space, a finite or countably infinite set of possible values for the sequence of random variables \(X_1\), \(X_2\), \(X_3\), \(\ldots \) \(P\) is a matrix representing the transition probabilities between states: the element \(p_{ij}\) is the probability of going from state \(X_i\) to state \(X_j\). \(\lambda \) is the initial probability distribution over the states in \(S\): \(\lambda = (p_0, p_1, p_2, \ldots )\), where \(p_i\) is the probability of starting at state \(X_i\).

Markov chains are often described by a directed graph, where the nodes are the states and the edges are labeled by the probabilities of going from one state to another state.

The proposed approach uses two Markov chains: the system Markov chain (SMC) and the algorithm Markov chain (AMC). The SMC is created at the beginning with all the possible states and transition probabilities. This component is used to decide how the environment changes, but all the information contained in it is unknown to the evolutionary algorithm. There is no restriction on the maximum number of states, since this parameter is not related to the memory size: one individual in memory can be connected to more than one state. The AMC is created empty at the beginning and is updated on-line as new information is gathered from the evolutionary process. If a new environment appears, a new state is added to the AMC and the corresponding matrix \(P\) is updated. The prediction about which environment(s) may appear in the future is given according to the information contained in the AMC. In a perfect scenario, at the end of the run, the AMC is equal to the SMC.

Example

The following example shows how the AMC evolves for a maximum number of states equal to 5, using the following SMC defined a priori: \(\lambda = \{1.0, 0.0, 0.0, 0.0, 0.0\}\) and the transition matrix:

$$\begin{aligned} P_{SMC} = \left( \begin{array}{ccccc} 0.00 &amp; 0.50 &amp; 0.00 &amp; 0.50 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.50 &amp; 0.25 &amp; 0.25 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 1.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.25 &amp; 0.00 &amp; 0.75 \\ 1.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ \end{array} \right) \end{aligned}$$

Again, all this information is unknown to the EA and is used to decide how the environment will change.

The AMC starts without any information (\(t=0\)).

$$\begin{aligned} t = 0: \quad P_{AMC} = \left( \begin{array}{ccccc} 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ \end{array} \right) \end{aligned}$$

The first state is chosen using \(\lambda \); in this example it is state 1. Using \(P_{SMC}\), the next state can be state 2 or state 4. Assuming that state 2 is chosen at random, the AMC transition matrix is updated (\(t=1\)).

$$\begin{aligned} t = 1: \quad P_{AMC} = \left( \begin{array}{ccccc} 0.00 &amp; \mathbf{1.00} &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ \end{array} \right) \end{aligned}$$

At this point no predictions are performed, since at least two environmental changes are required to provide predictions. The next state is randomly chosen from the three possibilities (states 3, 4 or 5). Once more, assuming that state 5 is chosen, the AMC transition matrix is updated (\(t = 2\)). From state 5, \(P_{SMC}\) indicates that the next state will be state 1, so at \(t=3\) the AMC transition matrix is updated.

Since state 1 is already present in \(P_{AMC}\), the Markov model can make a prediction (state 2). Using \(P_{SMC}\), the next state can be state 2 or state 4. If state 4 is chosen at random, the prediction fails, but this new transition is used to update the AMC transition matrix (\(t=4\)). Assuming that, from state 4, a transition to state 5 is randomly chosen, \(P_{AMC}\) is updated (\(t = 5\)).

$$\begin{aligned}&t = 2: \quad P_{AMC} = \left( \begin{array}{ccccc} 0.00 &amp; 1.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; \mathbf{1.00} \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ \end{array} \right)\\&t = 3: \quad P_{AMC} = \left( \begin{array}{ccccc} 0.00 &amp; 1.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 1.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ \mathbf{1.00} &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ \end{array} \right)\\&t = 4: \quad P_{AMC} = \left( \begin{array}{ccccc} 0.00 &amp; \mathbf{0.50} &amp; 0.00 &amp; \mathbf{0.50} &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 1.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 1.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ \end{array} \right)\\&t = 5: \quad P_{AMC} = \left( \begin{array}{ccccc} 0.00 &amp; 0.50 &amp; 0.00 &amp; 0.50 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 1.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; \mathbf{1.00} \\ 1.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 &amp; 0.00 \\ \end{array} \right) \end{aligned}$$

State 5 is already present in the AMC, so the module gives state 1 as the prediction. This is in fact the state chosen at \(t=6\), where \(P_{AMC}\) is updated (no changes are made). From state 1, the AMC gives states 2 or 4 as predictions, which are the two real possibilities, and, once more, at \(t=7\) no changes are made to \(P_{AMC}\). At \(t=6\) and \(t=7\) the predictions given by the Markov model are 100 % accurate.

End of example

The use of a Markov model assumes that the environmental information can be measured and stored in a specific state of the chain. In the Markov model used in this work, each state has an integer index and explicitly stores information about the fitness function. This means that each state \(s\) (\(s = 1, 2, \ldots \)) of the SMC corresponds to a different instance of the problem to be optimized (capacity of the knapsack, binary template, system variables, etc.). When a transition occurs, the AMC does not know when it occurred (the change is detected by the algorithm), but once detected, the new environment is automatically identified by the AMC by observing which instance of the problem is now being optimized, for instance by retrieving the new capacity of the knapsack in the new environment.
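As an illustration, the AMC bookkeeping described above can be sketched as follows. This is a minimal sketch, not the original implementation: class and method names are ours, and equally frequent successor states are all returned as predictions.

```python
from collections import defaultdict

class AlgorithmMarkovChain:
    """Sketch of the AMC: starts empty and learns transitions on-line."""

    def __init__(self):
        # counts[i][j] = number of observed changes from environment i to j
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, prev_state, new_state):
        """Record a detected environmental transition."""
        self.counts[prev_state][new_state] += 1

    def predict(self, current_state):
        """Return the most frequently observed successor state(s),
        or [] if the current state was never left before."""
        successors = self.counts.get(current_state)
        if not successors:
            return []
        best = max(successors.values())
        return sorted(s for s, c in successors.items() if c == best)

# Replaying the worked example: transitions 1->2, 2->5, 5->1, 1->4, 4->5
amc = AlgorithmMarkovChain()
for prev, new in [(1, 2), (2, 5), (5, 1), (1, 4), (4, 5)]:
    amc.update(prev, new)
```

With these five observed transitions, `amc.predict(5)` returns `[1]` and `amc.predict(1)` returns `[2, 4]`, matching the predictions at \(t=6\) and \(t=7\) in the example.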

5.3 Anticipation

The two predictors described above are only effective if their outputs are used at the right moment. In order to prepare the population before the changes happen, an anticipation mechanism must be robust and capable of dealing with erroneous predictions. Moreover, the prediction errors should be used to improve the values of subsequent predictions.

The “right moment” to start preparing the population is decided using the values predicted either by the linear predictor or by the nonlinear predictor. Both methods estimate the generation when the next change will be observed. Knowing this value, the system starts the preparation for the change some generations beforehand. If the prediction mechanisms are accurate and the correct information is introduced into the main population before the change, the EA’s performance is not affected by the changes in the environment. When the prediction mechanisms fail and no anticipation is made, the EA’s performance suffers a sudden decrease after a change and the EA takes some time to readapt to the new environment.

A parameter called \(\Delta \) is used to decide how many generations before the predicted moment of change the anticipation starts. The prediction error at time \(t\) (\(e_t\)) is given by \(e_t = g - g'\), where \(g\) is the predicted generation for the occurrence of the next change and \(g'\) is the generation where the change actually happens. Prediction errors can be positive or negative. A negative error indicates that the predicted value for the next change (\(g\)) is smaller than the real value (\(g'\)), i.e., \(g < g'\), and thus the change is successfully anticipated. A positive error means the opposite. In this situation, if the value of \(\Delta \) is greater than this error, the anticipation is still made before the change; otherwise, the change is detected only when it occurs and no effective anticipation is executed.

The value of \(\Delta \) must be chosen so as to cover the prediction error and to guarantee that the preparation for the next change is made before it happens. More explicitly, if the next change is estimated to happen at generation \(g\), then at generation \(g - \Delta \) the Markov model is used to predict the set of possible future environments. At that time, individuals from the memory are retrieved and introduced into the population, replacing the worst ones. In order to be effective, the Markov model must act before the change. Therefore, if the change is observed at generation \(g'\), the value of \(\Delta \) must ensure that \(\Delta > |g - g'|\). In addition, smaller values of \(\Delta \) are better, in the sense that the anticipation starts as close to the change as possible. If the environment changes quickly and \(\Delta \) is large, the anticipation can start at the wrong moment and be ineffective. Thus, the choice of the best value of \(\Delta \) is of critical importance. Different approaches to assign a value to \(\Delta \) were tested in Simões and Costa (2009a): \(\Delta \) constant and \(\Delta \) adjustable. The auto-adjusting methods for \(\Delta \) used the previous errors to decide the new value. Different approaches to adjust the value of \(\Delta \) were tested:

  • using the maximum prediction error

  • using the average of the positive prediction errors

  • using the average of all the prediction errors (absolute value)

  • using the maximum and the average of the positive prediction errors

The results obtained showed that all the methods for adjusting \(\Delta \) were very effective. In this paper we use the method that obtained the best results, which is the method that changes the value of \(\Delta \) using the maximum and the average of the positive prediction errors (Simões and Costa 2009a). In this method, the value of \(\Delta \) is updated as follows: in the first two changes \(\Delta = 5\), in the next \(k\) changes (\(k>2\)) the highest prediction error and the average of the positive errors are used to update \(\Delta \):

$$\begin{aligned} \Delta (k) = \left\{ \begin{array}{l@{\quad }l} 5 &amp; \text{if } k = 1, 2\\ \max \left\{ 2, \frac{\Delta _m(k) + \Delta _{av}(k)}{2}\right\} &amp; \text{if } k > 2 \end{array} \right. \end{aligned}$$
(6)

\(\Delta _m(k)\) computes a value for \(\Delta \) using the highest prediction errors:

$$\begin{aligned} \Delta _m(k) = \left\{ \begin{array}{l@{\quad }l} 5 &amp; \text{if } k = 1, 2\\ \max \{2, e_1, e_2,\ldots , e_k\} &amp; \text{if } k > 2\\ \end{array} \right. \end{aligned}$$
(7)

and \(\Delta _{av}(k)\) is calculated using the average of the positive prediction errors:

$$\begin{aligned} \Delta _{av}(k) = \left\{ \begin{array}{l@{\quad }l} 5 &amp; \text{if } k = 1, 2\\ \max \left\{ 2, \frac{\sum _{i=1}^{k} e_i}{k}\right\} &amp; \text{if } k > 2 \text{ and } e_i > 0\\ \end{array} \right. \end{aligned}$$
(8)

where \(e_i\) is the observed error at the \(i\)th change.
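A minimal sketch of the \(\Delta \)-adjustment scheme of Eqs. (6)–(8), assuming the average in Eq. (8) is taken over the positive errors only, as the surrounding text describes (function and variable names are ours):

```python
def delta(k, errors):
    """Adjust the anticipation offset at the k-th change (Eqs. 6-8)
    from the prediction errors e_1..e_k observed so far."""
    if k <= 2:
        return 5
    # Eq. (7): Delta_m uses the highest prediction error (at least 2)
    delta_m = max([2] + errors[:k])
    # Eq. (8): Delta_av averages the positive prediction errors (at least 2)
    positive = [e for e in errors[:k] if e > 0]
    delta_av = max(2, sum(positive) / len(positive)) if positive else 2
    # Eq. (6): Delta is the mean of the two, never below 2
    return max(2, (delta_m + delta_av) / 2)
```

For instance, with observed errors \(-1, 4, 2\) at \(k=3\), \(\Delta _m = 4\), \(\Delta _{av} = 3\), and the adjusted value is \(3.5\).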

5.4 Putting it all together: prediction in the EA

The proposed computational model, consisting of MEGA enhanced with two predictor modules and one anticipation module, is called PredEA (see Fig. 2). This algorithm is a traditional EA that evolves a population of individuals aiming to optimize the current fitness function. A memory is used to store useful information from the past that is used in future changes. The first prediction module uses information about when previous changes occurred to estimate the generation when the next change will be observed; the predictions are given by one of the techniques previously discussed, based on linear or nonlinear regression. The second module uses a Markov chain to keep track of previous environments and provides predictions on what the environment will look like after the next change. The two predictor modules are managed by a third component, the anticipation module, which uses the information provided by the other two modules and prepares the EA for the next change.

Fig. 2 Architecture of PredEA (adapted from Simões and Costa 2009a)

Memory-based evolutionary algorithm

In this algorithm, a main population of individuals evolves by means of selection, crossover and mutation and is used to find the best solution for the current environment. Another population is used as memory to store the best current individual from time to time. It starts empty and has a limited size (20 % of the global number of individuals). When a change happens (or is predicted), the information stored in the memory is retrieved and used to help the EA readapt to the new environment. When the memory is full, the replacement of memorized individuals is made using the generational replacing strategy proposed by Simões and Costa (2007a). This method replaces a memory individual of the same period by the best individual of the population, if the latter is a better solution. If there are more different environments than the capacity of the memory, the replacement is made using the similar strategy, which replaces the most genetically similar individual found in memory. Instead of updating the memory at a fixed time interval, we create a stochastic time pattern using a random value between 5 and 10 (\(t_m = t + rand(5, 10)\)). The memory is also used to detect changes in the environment: a change occurs when at least one individual in the memory has its fitness changed. When the memory is updated, the best individual of the population is stored and the current state of the Markov model is added as a reference to that individual. If the memory capacity is reached and a solution is replaced, the new solution keeps the links to the Markov model states that the replaced individual had, and a new link to a different state can be added. So, a memory solution can be linked to more than one state.
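The memory-based change detection and the stochastic update-time pattern described above might be sketched as follows (the data layout and names are illustrative assumptions, not the original implementation):

```python
import random

def next_memory_update(t):
    """Stochastic time pattern for memory updates: t_m = t + rand(5, 10)."""
    return t + random.randint(5, 10)

def change_detected(memory, fitness):
    """A change is assumed when at least one memorized individual's
    re-evaluated fitness differs from the fitness stored with it.
    `memory` holds (individual, stored_fitness) pairs."""
    return any(fitness(ind) != stored for ind, stored in memory)
```

For example, with `fitness = sum`, `change_detected([((1, 0, 1), 2)], sum)` is `False` until the fitness function changes the value of the memorized individual.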

Predictor 1 module (P1)

This predictor uses information about when previous changes were observed to estimate when the next change will occur. Two methods are tested for this module: (1) a linear regression predictor, tested using all available observations or only a fraction of them (time-window); (2) a nonlinear regression predictor. This module is activated every time a change is detected, i.e., at change \(k\) the module provides an estimate of the generation where change \(k+1\) may occur.
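A sketch of the time-windowed linear-regression variant of this module, assuming the change generations are regressed against their index by ordinary least squares (names are ours; at least two observations are required):

```python
def predict_next_change(change_gens, window=None):
    """Estimate the generation of change k+1: fit a line to the
    generations of past changes (optionally only the last `window`
    observations, as in PredEA-LR2/PredEA-LR10) and extrapolate
    one step ahead."""
    obs = change_gens if window is None else change_gens[-window:]
    n = len(obs)
    xs = range(1, n + 1)                      # change index
    mean_x, mean_y = sum(xs) / n, sum(obs) / n
    # ordinary least squares slope and intercept
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, obs))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope * (n + 1) + intercept        # predicted generation
```

For a periodic change period the extrapolation is exact: `predict_next_change([10, 20, 30])` gives 40, while the window variant uses only the most recent trend.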

Predictor 2 module (P2)

This module keeps track of the different environments and estimates which environments may appear after the next change. These predictions are made using only the information known so far about previous environmental changes. Each state of the Markov chain corresponds to a different environment; if two states are linked, a change happened from one state to the other. Associated with each transition is a probability value, which is updated every time a change is detected. The initial state is randomly chosen among the existing states. Again, we stress that this information is unknown to the algorithm and the model is updated over time. The information stored about each environment is problem dependent.

Anticipation module (A) This module receives the information provided by the two predictors and decides when to start the preparation of the EA for the next change. This activation must be done at the correct time in order to prepare the population for the next environment(s) predicted by the \(P2\) module. The \(P1\) module estimates the generation when the next change will be observed and the \(A\) module starts the anticipation some generations before. The \(\Delta \) parameter described previously is used by this module to decide how many generations before the predicted moment of change (\(g\)) the anticipation starts. The value of this parameter also covers the prediction errors associated with the \(P1\) estimates. The anticipation process consists of retrieving from memory individuals that were good solutions in the environments that the \(P2\) module indicates as the next to appear. These individuals are inserted into the main population at generation \(g-\Delta \), replacing the same number of worst individuals, so the population size is kept constant. If the \(P2\) module does not provide any prediction, five random individuals from memory are inserted into the population, replacing five randomly selected individuals. The retrieved individuals remain unchanged (i.e., they are not affected by crossover or mutation) until the change happens. Algorithm 1 shows the pseudocode of the PredEA algorithm.
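The anticipation step might be sketched as follows, under the simplifying assumptions that memory entries are (individual, linked-states) pairs and that fitness is to be maximized (names and structure are ours):

```python
import random

def anticipate(population, memory, predicted_states, fitness):
    """At generation g - Delta: retrieve the memory individuals linked
    to the environment(s) predicted by P2 and insert them into the
    population, keeping its size constant."""
    retrieved = [ind for ind, states in memory
                 if set(states) & set(predicted_states)]
    if retrieved:
        # replace the same number of worst individuals (maximization)
        population.sort(key=fitness)              # worst come first
        population[:len(retrieved)] = retrieved
    else:
        # no P2 prediction: five random memory individuals replace
        # five randomly selected population members
        pool = [ind for ind, _ in memory]
        for ind in random.sample(pool, min(5, len(pool))):
            population[random.randrange(len(population))] = ind
    return population
```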

Algorithm 1 Pseudocode of PredEA (parts a and b)

6 Experiments

This section introduces the experiments that were performed to study, compare and validate the proposed prediction mechanisms in different dynamic problems. It provides a description of the problems used as benchmarks, details the different types of dynamic environments, specifies the parameter settings used in the experiments and finally describes how the evaluation and validation of the results was made.

6.1 Problems

In this paper two benchmark problems are used to test the proposed methods: the dynamic knapsack and the dynamic bit-matching problems. Benchmark problems should have a set of good characteristics: they should be simple to implement, analyze and evaluate, computationally efficient, flexible, and should allow conjectures about real-world problems (see Nguyen et al. 2012). The dynamic knapsack and the dynamic bit-matching problems have these characteristics (see Cruz et al. 2011). For the purpose of this investigation, the use of these benchmarks is preferable to the use of artificial general-purpose problem generators (see Rohlfshagen and Yao 2009; Ben-Romdhane et al. 2013). The knapsack problem is an NP-complete combinatorial optimization problem often used as a benchmark. It consists of selecting a number of items for a knapsack with limited capacity. Each item has a value and a weight, and the objective is to choose the items that maximize the total value without exceeding the capacity of the knapsack. In this paper, the knapsack uses 100 items, the initial weights are randomly created in the range [1, 50] and the values are obtained by adding to the corresponding weight a random number in the range [1, 5]. The capacity of the knapsack is set to 60 % of the total weight of all items. The fitness of an individual, using binary representation, is equal to the sum of the values of the selected items if the weight limit is not exceeded. If too many items are selected, the fitness is set to the difference between the total weight of all items and the weight of the selected items, multiplied by \(10^{-10}\), in order to ensure that invalid individuals are distinguished from valid ones. The problem becomes dynamic by changing the capacity of the knapsack.
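A minimal sketch of this fitness function, assuming a 0/1 list encoding of the selected items (names are ours):

```python
def knapsack_fitness(individual, weights, values, capacity):
    """Sum of the selected items' values when the capacity is respected;
    otherwise (total weight - selected weight) * 1e-10, so that invalid
    individuals are distinguishable yet always worse than valid ones."""
    selected_weight = sum(w for bit, w in zip(individual, weights) if bit)
    if selected_weight <= capacity:
        return sum(v for bit, v in zip(individual, values) if bit)
    return (sum(weights) - selected_weight) * 1e-10
```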

The bit-matching problem is a unimodal problem whose goal is to find a solution that matches a given template. Changing the template from time to time makes this problem dynamic. The severity of the change is defined by the number of bits that change in the template, and the difficulty of the problem can be increased by using larger templates.

The maximum number of different environments (\(max\)) is set to \(max = 3, 5, 10, 20\) or \(50\). Depending on the problem, \(max\) different capacities (for the dynamic knapsack problem) or \(max\) different templates (for the dynamic bit-matching problem) are created. The different capacities are generated from the initial capacity of the knapsack by applying variations of 20 % (severity of the change), according to the following equation (\(m\) is the number of items, \(w_i\) is the weight of the \(i\)th item, and \(C\) is the capacity of the knapsack):

$$\begin{aligned} C(t) = \left\{ \begin{array}{l@{\quad }l} 0.6 \times \sum _{i=1} ^{m} {w_i} &amp; \text{if } t = 0\\ C(t-1) - 0.2 \times C(t-1) &amp; \text{if } t \text{ is odd}\\ C(t-1) + 0.2 \times C(t-1) &amp; \text{if } t \text{ is even} \end{array} \right. \end{aligned}$$

The different templates are created using an initial template \(T(0)=\{0\}\). The following templates are obtained by changing \(\frac{l}{max}\%\) (severity of the change) of the bits from the previous template (\(l\) is the chromosome length). Both problems use binary representation with chromosomes of length 100.
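The generation of the \(max\) different environments might be sketched as follows. This is our reading of the description above: in particular, the template function assumes that a severity of \(\frac{l}{max}\,\%\) translates into \(\mathrm{round}(l \cdot (l/max)/100)\) flipped bits.

```python
import random

def capacities(initial, max_states):
    """Capacity sequence C(t): alternately decrease and increase by 20 %."""
    caps = [initial]
    for t in range(1, max_states):
        caps.append(caps[-1] * (0.8 if t % 2 == 1 else 1.2))
    return caps

def templates(length, max_states, rng=None):
    """Template sequence: T(0) is all zeros; each following template
    flips (length/max_states) % of the previous template's bits."""
    rng = rng or random.Random(0)
    n_flip = round(length * (length / max_states) / 100)
    temps = [[0] * length]
    for _ in range(1, max_states):
        t = temps[-1][:]
        for i in rng.sample(range(length), n_flip):
            t[i] ^= 1
        temps.append(t)
    return temps
```

With `length=100` and `max_states=10`, consecutive templates differ in exactly 10 bits.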

6.2 Dynamic environments

According to the classification provided in Sect. 3, three different types of change period are used: (1) periodic (or linear): the environment changes every \(r\) generations, with \(r=10\), \(r=50\), \(r=100\) and \(r=200\); (2) patterned: the moments of change are decided by repeating an established pattern. Four different patterns are included in the study: 5-10-5, 10-20-10, 50-60-70 and 100-150-100. The generations when the environment changes are calculated using \(change(i) = change(i-1) + pattern(i-1\ \hbox {mod}\ K)\), where \(\hbox {mod}\) is the modulo operation, \(K\) is the total number of components in a pattern (in this paper \(K=3\)) and \(i\) is the change index (\(i = 1, 2, \ldots \)). For the first change (\(i=1\)) we assume \(change(i-1)=0\); (3) nonlinear: the change period is defined by a nonlinear function. In order to generate different nonlinear change periods, seven different types of nonlinear functions are used: the four incorporated in the module (Eqs. 2 through 5) and three additional ones, never used before, defined by Eqs. 9 through 11. The goal is to see how the nonlinear predictor approximates data generated by functions that are not included in the module, and how the linear predictor performs in these cases. The three new functions are defined by the following equations:

$$\begin{aligned}&f_5 = \frac{\theta _1x}{1 + e^{\theta _2\theta _3x}}\end{aligned}$$
(9)
$$\begin{aligned}&f_6 = \frac{\theta _1 x^2 + \theta _2 x}{e^{\theta _3 x} + \theta _4}\end{aligned}$$
(10)
$$\begin{aligned}&f_7 = \left\{ \begin{array}{l@{\quad }l} f_4 &amp; \text{if } 1 < x \le 100\\ f_6 &amp; \text{if } 100 < x \le 200\\ f_5 &amp; \text{if } 200 < x \le 300\\ f_1 &amp; \text{if } 300 < x \le 500 \end{array} \right. \end{aligned}$$
(11)
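For illustration, Eqs. (9)–(11) can be evaluated as below. Since \(f_1\) and \(f_4\) (Eqs. 2 and 5) are not shown in this section, \(f_7\) takes them as arguments; all \(\theta \) values in the usage example are placeholders, not the ones from Table 1.

```python
import math

def f5(x, t1, t2, t3):
    """Eq. (9)."""
    return (t1 * x) / (1 + math.exp(t2 * t3 * x))

def f6(x, t1, t2, t3, t4):
    """Eq. (10)."""
    return (t1 * x ** 2 + t2 * x) / (math.exp(t3 * x) + t4)

def f7(x, f1, f4, f5_, f6_):
    """Eq. (11): piecewise combination; f1 and f4 come from Eqs. (2)
    and (5) and are passed in as callables."""
    if 1 < x <= 100:
        return f4(x)
    if 100 < x <= 200:
        return f6_(x)
    if 200 < x <= 300:
        return f5_(x)
    if 300 < x <= 500:
        return f1(x)
    raise ValueError("x outside the range (1, 500]")
```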

For generating the seven nonlinear change periods, the parameter vector \(\theta \) is used with the values of Table 1.

Table 1 Values for \(\theta _i\) used in the experiments

For each of the change period types, two different types of environmental changes are defined (cyclic and probabilistic), and the environment changes between \(max\) different states. For the probabilistic type, the probabilities associated with each state are set at the beginning of the run and correspond to the system Markov chain (SMC) transition matrix. The probabilistic transitions are applied only in certain states, chosen at random; so, for 50 states, only a fraction of them have probabilistic transitions. The number of transitions per state is also chosen at random, between 2 and 5. The severity of the change is kept constant, as described in the previous section. If the Markov model provides accurate predictions, the severity of the change is not relevant to the performance of the algorithm, since the stored information allows the reintroduction of the best individuals for the next change. The severity of the change can affect the performance of the algorithm if the prediction fails, but this experimentation is not included in this study.

6.3 Settings

Four different versions of PredEA are compared with the standard MEGA, which will be called noPredEA. PredEA is tested with the linear regression predictor using all known observations (PredEA-LR), using a time-window of size two (PredEA-LR2) and using a time-window of size ten (PredEA-LR10). Moreover, PredEA is also tested using the nonlinear predictor (PredEA-NLR). A time-window of size 2 or 10 means that the next prediction is made using the 2 or the 10 previous observations, respectively. For the nonlinear regression predictor, the estimation of the parameters is done using all available observations, except when a new function is selected: in this case, at the \(k\)th change, the predictor considers only the observations measured after the \((k-1)\)th change. All these techniques are combined with the Markov chain predictor. In all the experiments, the information concerning the type of change period, the type of environmental change, the change period size, and the number of different states is unknown to the EA. A total of 30 runs are performed with each technique for each problem. Binary representation is used for all the studied problems. The global number of individuals (\(n\)) is set to 100. The memory size \(m\) is set to \(20~\%\) of \(n\). The EA is allowed to evolve for as many generations as necessary to produce 500 environmental changes. The detection of a change is made using the memory: a change occurs when at least one individual in the memory has its fitness changed. This method is not enough to detect changes in the knapsack problem, so an additional mechanism is incorporated: the expected knapsack capacities are compared with the capacities generated by the algorithm, and when a difference is observed, a change in the environment is assumed. The uniform crossover operator is applied with a probability of 70 % and flip mutation is used with a 1 % rate.
The initial population and memory are randomly created and the selection of parents is made using the binary tournament selection method. The next population is formed using the generated offspring, through recombination and mutation, and the best individual (the elite) of previous population is preserved. For the \(P2\) module, the threshold \(\alpha _p\) is set to 10.

The parameters of the GA used to estimate the parameters of the nonlinear regression predictor are the following: population of size 50, uniform crossover rate of 75 % and mutation rate of 1 %. The GA is stopped in one of three cases: if no better solution is found for 10 generations, if the sum of the squared errors is equal to zero, or if a maximum of 1,000 generations is attained. The domain of the parameters, and consequently the chromosome size, depends on the nonlinear function, as shown in Table 2. The precision used for each parameter \(\theta \) is six places after the decimal point. The initial parameter domains are chosen so as to enclose a wide set of nonlinear curves, but they can change as described in Sect. 5.

Table 2 Upper and lower limits for the parameters’ domain and number of genes used to encode each parameter

6.4 Evaluation and validation

To evaluate the different algorithms, the offline performance measure (Branke 2002) is used, which is defined as follows:

$$\begin{aligned} \textit{offline}(t) = \frac{1}{t} \sum _{i=1}^t {best'_{i}} \end{aligned}$$

where \(t\) is the current generation and \({best'_{i}}\) is the maximum fitness observed since the last time step at which a change in the environment occurred. This measure includes the re-evaluation of the memory individuals introduced after the change is detected. The values presented in the next section refer to the average of the offline performance obtained at the end of the 30 runs. This work compares 6 different algorithms, applied to 15 different types of change periods, using 2 different dynamics (cyclic and probabilistic) and 5 different settings of the maximum number of different states, on two benchmark problems, resulting in a total of 1800 different tested situations. All the results obtained are statistically validated using the nonparametric Friedman test at a 0.01 level of significance. If a significance is found, multiple pairwise comparisons are performed using the Nemenyi procedure. For multiple comparisons, the \(p\)-value (0.01) used in the Nemenyi test is adjusted using the Bonferroni correction method. In the statistical tables presented in the next section, each line compares a pair of algorithms using the notation “\(++\)” or “\(-\,\!-\)” when the first algorithm is significantly better than, or significantly worse than, the second one, respectively. The use of “\(\sim \)” indicates that there is no statistical significance in the results obtained by the two algorithms.
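A sketch of the offline measure, assuming the per-generation best fitness values and the set of change generations are available (names are ours):

```python
def offline_performance(gen_best, change_gens):
    """Offline performance at the final generation: the running average
    of best'_t, the best fitness observed since the last change.
    `gen_best[t]` is the best-of-generation fitness at generation t;
    `change_gens` is the set of generations where a change occurred."""
    total, best = 0.0, float("-inf")
    for t, f in enumerate(gen_best):
        if t in change_gens:          # environment changed: reset best'
            best = float("-inf")
        best = max(best, f)
        total += best
    return total / len(gen_best)
```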

7 Results

This section presents the results of all the experiments. First, the prediction accuracy of the proposed methods is analyzed and second, the performance of the different algorithms is compared.

7.1 Prediction accuracy

Prediction accuracy consists of the frequency of correct outcomes reached by the proposed predictors. First we analyze the accuracy for the Markov model predictor, and then for the linear and nonlinear regression predictors.

7.1.1 Accuracy of the Markov model predictor

For the Markov model predictor, a predicted value is considered correct when the module provides the correct value for the next environmental transition.

Table 3 shows the prediction accuracy of the Markov model, based on 500 environmental changes in each run. As the number of different environments increases, the prediction accuracy expectedly decreases, but the attained results are still very good. Moreover, the prediction accuracy is lower for probabilistic dynamics. This happens because the Markov model uses data collected from previous transitions to estimate the future: as the number of environments increases or the transitions become probabilistic, the Markov model needs more time to learn the entire behavior of the environment. Overall, the proposed Markov model predictor provides excellent predictions, well above \(90~\%\), in almost all situations.

Table 3 Accuracy of the Markov model predictions

Figure 3 shows how the EA behaves over time with and without prediction. The example illustrates a typical result for the first 50 environmental changes in the dynamic bit-matching problem with ten different states. We can see that the EA using prediction goes through a learning phase, in which the Markov model acquires the history of possible environmental transitions, and an equilibrium phase, in which the Markov model provides correct predictions. During the equilibrium phase no fitness decrease is observed. On the other hand, the EA without prediction experiences a decrease in its performance every time a change happens. In this case, recovery is achieved only after the change, when the information from memory is introduced into the population.

Fig. 3 Best of generation for PredEA-LR and noPredEA, bit matching problem

7.1.2 Accuracy of the linear and nonlinear predictors

For the linear and nonlinear regression predictors, a value is considered accurate if, using the value of \(\Delta \), the anticipation is made after the previous change and before the real change occurs. If the previous change occurs at generation \(g_{before}\) and the predicted value for the next change is \(g_{next}\), this value is considered accurate if \(g_{next} - \Delta < g_{real}\) and \(g_{next} - \Delta \ge g_{before}\), where \(g_{real}\) corresponds to the generation when the change effectively happens. Figure 4 shows different cases of good and bad predictions.
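This accuracy criterion can be written directly as a predicate (names are ours):

```python
def prediction_is_accurate(g_next, delta, g_before, g_real):
    """A predicted change generation g_next is accurate when the
    anticipation moment g_next - delta falls at or after the previous
    change (g_before) and strictly before the real change (g_real)."""
    start = g_next - delta
    return g_before <= start < g_real
```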

Fig. 4 Examples of good and bad predictions for the linear and nonlinear predictors

The results of Tables 4 and 5 show that the three linear prediction techniques obtain 100 % accuracy in periodic change periods. This is expected, since the data follow a straight line, correctly estimated by these methods. For PredEA-LR, when the change period is patterned, the prediction accuracy is always above 99 % and the prediction errors are negative, indicating that the predicted values correspond to a generation before the real change. Nevertheless, the values of \(\Delta \) are able to cover those errors, leading to the high prediction accuracies presented in Table 4. Even when \(\Delta \) takes larger values, for the 50-60-70 and the 100-150-100 patterned change periods, the predictions are correct; these larger values are due to the prediction errors used for adjusting \(\Delta \). For the situations where the change period follows a nonlinear trend, the PredEA-LR technique fails and the prediction accuracies are very low. This happens because the straight line obtained by the linear regression method uses all the observations, which do not fit the data obtained from the nonlinear functions. The very large prediction errors confirm this.

Table 4 Accuracy of PredEA-LR
Table 5 Accuracy of PredEA-LR2 and PredEA-LR10

The linear regression is also tested using two different sizes for the time-window (PredEA-LR2 and PredEA-LR10). The results for these two methods are shown in Table 5 and indicate that the chosen size of the time-window influences the obtained prediction, especially in nonlinear change periods. The PredEA-LR2 technique obtains high prediction accuracy in all types of change periods. The lowest value (66.16 %) is obtained in the \(NL 4\) change period. Although the value of \(\Delta \) is enough to cover the prediction errors, several erroneous anticipations are performed due to the characteristics of the \(NL 4\) curve. PredEA-LR10 obtains similar results for the periodic and patterned change periods, but its prediction accuracy is slightly worse for the nonlinear change periods.
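The time-window idea can be sketched as follows: fit a least-squares line only to the last \(k\) observed change generations and extrapolate it one step ahead. This is an illustrative sketch of the PredEA-LR2/LR10 principle, not the authors' implementation; it assumes at least two observations inside the window.

```python
def predict_next_change(change_gens, window=2):
    """Sliding-window linear regression for the next change generation.

    change_gens: generations at which past changes were observed.
    Fits a straight line to the last `window` points (index -> generation)
    and extrapolates it one step ahead (requires window >= 2 points).
    """
    pts = change_gens[-window:]
    n = len(pts)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(pts) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, pts))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * n  # the next change has index n
```

With a nonlinear sequence of change generations such as 10, 30, 60, 100, a window of two points fits only the most recent local trend (60, 100) and predicts 140, whereas a line fitted through all four points would be pulled away from the current curvature.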

The results shown in Table 6 refer to the prediction accuracy of PredEA-NLR. The last column of this table indicates which function, incorporated in the module, is chosen to provide the predictions. The nonlinear predictor is 100 % accurate in the periodic change periods and obtains near 100 % accuracy in the patterned change periods. An exception occurs for the 10-20-10 change period (66.76 %). For periodic and patterned change periods, the functions \(f_1\) or \(f_2\) are selected to estimate the time of the next change. The values of the prediction error show that this selection is appropriate. For the nonlinear situations, PredEA-NLR has a higher prediction accuracy for the \(NL 1\) through \(NL 4\) change periods, since those functions are incorporated in the module and are correctly selected to provide the estimated values. For \(NL 5\) through \(NL 7\), the prediction is made using the functions \(f_1\) and \(f_4\) and the accuracy is lower. Despite this lower accuracy, the prediction errors show that the module is able to select the best function available. For the \(NL 7\) change period, the function \(f_4\) is selected first and, at generation 102, the function \(f_1\) is activated and used for the rest of the run.
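The selection step can be pictured as follows: given a set of candidate functions mapping the change index to a change generation, keep the one that best fits the observed change times. This is a simplified, hypothetical stand-in for the module described in the text (which estimates the parameters of each candidate with a GA); the candidate functions here are illustrative, not the paper's \(f_1\) through \(f_4\).

```python
def select_predictor(observations, candidates):
    """Return the candidate function f(i) -> change generation with the
    smallest squared error on the observed change generations.
    Change indices start at 1."""
    def sse(f):
        return sum((f(i) - g) ** 2
                   for i, g in enumerate(observations, start=1))
    return min(candidates, key=sse)

# Hypothetical candidates: a linear and a quadratic trend.
linear = lambda i: 20 * i
quadratic = lambda i: 5 * i * i

best = select_predictor([5, 20, 45, 80], [linear, quadratic])
next_change = best(5)  # extrapolate to the fifth change
```

As in the paper's observations, such a selector can only be as good as its candidate pool: if the data are generated by a function outside the pool, the best available function is still chosen, but its extrapolations degrade.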

The results obtained reveal some limitations of PredEA-NLR, namely the need for a ‘good’ function to provide valid estimations. Moreover, for PredEA-LR, the use of all available information, besides its computational cost, leads to poor prediction accuracies in nonlinear change periods. The use of a time-window in the linear regression predictor (PredEA-LR2, PredEA-LR10) yields the best prediction accuracies in all the studied situations. The size of the time-window is an important choice that should be further analyzed.

Table 6 Accuracy of PredEA-NLR

7.2 Algorithms’ performance

This section sets forth the performance obtained by the different methods. The results refer to PredEA-LR, PredEA-LR2 and PredEA-NLR and are compared with noPredEA. For lack of space, the results of PredEA-LR10 are not reported; they are similar to those of PredEA-LR2.

Figures 5 and 6 show the evolution of the algorithms’ performance over time. These two figures refer to the bit matching problem, for the 10-20-10 and the \(NL 5\) change periods, for 50 different states, and are representative of the remaining cases. Table 7 shows the results for the dynamic bit matching problem, and Table 8 contains the scores concerning the dynamic knapsack problem. The statistical results, obtained using the statistical tests, are in Tables 9 and 10 for the dynamic knapsack problem and the dynamic bit matching problem, respectively. In all tables, adjacent to the number of states is the type of environment: \(C\) for cyclic and \(P\) for probabilistic.

Fig. 5
figure 5

Offline performance over time, 10-20-10, 50 states, cyclic (C) and probabilistic (P) transitions, bit matching problem

Fig. 6
figure 6

Offline performance over time, \(NL 5\), 50 states, cyclic (C) and probabilistic (P) transitions, bit matching problem

Table 7 PredEA and noPredEA results: dynamic bit matching for periodic, patterned and nonlinear change periods and for cyclic (C) and probabilistic (P) transitions between 3, 5, 10, 20 and 50 states
Table 8 PredEA and noPredEA results: dynamic knapsack for periodic, patterned and nonlinear change periods and for cyclic (C) and probabilistic (P) transitions between 3, 5, 10, 20 and 50 states
Table 9 Statistical results: dynamic knapsack for periodic, patterned and nonlinear change periods and for cyclic (C) and probabilistic (P) transitions between 3, 5, 10, 20 and 50 states
Table 10 Statistical results: dynamic bit matching for periodic, patterned and nonlinear change periods and for cyclic (C) and probabilistic (P) transitions between 3, 5, 10, 20 and 50 states

From the analysis of the results it is evident that, in general, all the prediction techniques allow the EA to perform significantly better than the same algorithm without prediction. The only situation where this observation is less consistent is PredEA-LR in the nonlinear change periods, because of the poor accuracy of PredEA-LR in these cases (from 0.00 to 7.63 %, see Table 4). In fact, the results show a relation between the prediction accuracy and the performance obtained by the algorithms. Since the prediction accuracy of the Markov-based model is the same for all the algorithms, the different performances are related to the performance of the linear/nonlinear predictors. For the periodic change periods, all the methods are very accurate (100 %) and the performances obtained are, in the majority of the cases, equivalent. Tables 9 and 10 show that, in periodic change periods, the performances obtained by the three versions of PredEA are, in general, statistically equivalent. For the patterned change periods, the three proposed methods obtain similar performances, except for the 10-20-10 change period. In this case, PredEA-NLR obtains the lowest prediction accuracy (66.76 %) when compared with the other two approaches (99.73 %). Consequently, as the results show, PredEA-NLR achieves significantly worse fitness than PredEA-LR and PredEA-LR2. Figure 5 shows the offline performance measured over time. While the performance of noPredEA remains nearly constant during the entire process, the other algorithms improve their fitness as time passes. Furthermore, PredEA-LR2 is more effective at the beginning of the process than the other algorithms. Figure 5 also shows that PredEA-NLR performs worse than PredEA-LR and PredEA-LR2. For the remaining patterned change periods, the three methods obtain similar prediction accuracies (above 99 %) and the performances obtained are, in general, equivalent.
Finally, for the nonlinear change periods, as stated before, the fitness obtained by PredEA-LR is statistically comparable to the scores achieved by noPredEA. The reason for this is the weak prediction accuracy obtained by the linear predictor in the nonlinear change periods: using all the available information, the linear regression method is unable to fit the data generated by the nonlinear functions. The performances obtained by PredEA-NLR are always statistically better than PredEA-LR’s results. While PredEA-LR fails in the nonlinear change periods, PredEA-LR2 obtains the best performances among all the techniques. Updating the regression line using only the two previous observations is enough to continuously fit the points of the nonlinear curves. This method performs evenly with PredEA-NLR for the \(NL 1\) through \(NL 4\) change periods, and is significantly superior in the remaining nonlinear change periods. Notice that the \(NL 1\) through \(NL 4\) change periods are generated using the nonlinear functions incorporated in the nonlinear regression predictor, which benefits PredEA-NLR. Figure 6 shows the evolution of the offline performance over time for the \(NL 5\) change period. The superiority of PredEA-LR2 over the remaining methods is evident: while PredEA-LR and PredEA-NLR are not able to evolve, PredEA-LR2 continuously improves its performance. Analyzing the plot, we can see that, around generation 2000, the offline measure of PredEA-LR2 for the cyclic case strongly increases. This happens because near generation 2000 the Markov model finishes learning the dynamics of the environment, which, combined with the high prediction accuracy of PredEA-LR2, leads to these superior results. For the probabilistic case, the Markov model takes more time to learn the entire dynamics of the environment, so the performance increases slightly more slowly.
For the nonlinear change periods, PredEA-LR2 not only obtains the best results, but also makes the EA significantly faster. Since the predictions are provided using only two observations, the computational effort is considerably smaller than that of PredEA-LR, which uses all the observations. Concerning computational times, PredEA-NLR is the most expensive method, since the parameter estimation performed by the GA is time consuming and is repeated at every environmental change.

As a global conclusion, we stress that the proposed prediction techniques significantly improve the performance of the EA in the studied dynamic environments. In fact, when the appropriate information is retrieved from the memory and inserted into the population before the change, the performance of the EA significantly increases. Furthermore, the use of an appropriate time-window in the linear regression predictor proved to be the most robust and efficient method, with the lowest computational effort.

8 Conclusions

In the presence of environments that change following a repeated behavior, the use of prediction mechanisms is highly beneficial to the performance of memory-based EAs. Unlike other methods investigated so far, the proposed approaches are able to predict both when the next change will happen and what the environment will look like. By using past data, accurate predictions can be made and the algorithm can anticipate the change by introducing useful information into the population before the change takes place. Two different predictors were used to estimate when the next change would occur: one using linear regression, and another using nonlinear regression. The linear predictor was tested using all the collected data and using only a part of the observations, enclosed by a time-window. All the proposed methods performed better than the algorithm without prediction. Moreover, the linear regression predictor using all the collected data performed well for periodic and patterned change periods, but failed in the presence of nonlinear change periods. On the other hand, the nonlinear predictor provided accurate predictions in all types of environments analyzed, except for the nonlinear change periods generated by functions unknown to the module. The use of a time-window in the linear predictor was the most robust and efficient method: it allowed the EA to obtain the best performances in all types of change periods, with the minimum computational cost. For the periodic and patterned change periods, PredEA-LR2 performed equivalently to PredEA-LR and PredEA-LR10. For the nonlinear change periods, PredEA-LR2 outperformed the remaining techniques; its most relevant results were obtained there. While the other linear regression techniques failed and the nonlinear method found some difficulties, the use of a small time-window made it possible to divide the nonlinear change period into small segments, which were correctly fitted by the linear regression method.

The second predictor was implemented using a Markov chain and estimated how the environment would change. The Markov chain stored information about the environments and the transitions among them; this information was then used to predict which environment(s) might appear at the next change. This predictor performed very well in the situations analyzed: it started by learning the dynamics of the environmental changes and, after that phase, the predictions provided were, for the most part, correct. When the number of different environments increased, the predictor needed more time to acquire all the information necessary to make valid predictions.
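The core of such a predictor can be sketched as a first-order transition-count model (an illustrative structure under our own naming, not the authors' implementation):

```python
from collections import defaultdict

class MarkovPredictor:
    """Minimal first-order Markov-chain state predictor: counts the
    observed transitions between environment states and predicts the
    most frequent successor of the current state."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.current = None

    def observe(self, state):
        # Record the transition current -> state, then advance.
        if self.current is not None:
            self.counts[self.current][state] += 1
        self.current = state

    def predict(self):
        # Most frequently observed successor, or None while still learning.
        successors = self.counts.get(self.current)
        if not successors:
            return None
        return max(successors, key=successors.get)
```

This also illustrates the learning phase reported above: until a state's successors have been observed, the model has no basis for a prediction, and the more distinct states there are, the more transitions must be seen before all rows of the chain are filled in.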

The investigated methods are advantageous for problems where a repeated behavior can be found. In these cases, the investigated prediction techniques can improve the performance of the problem solver. In other types of problems, the proposed approach can be used as a complement to the problem solver, working in the background and trying to identify possible patterns. So, even if a certain problem is mainly non-periodic, if at a certain moment some trend is observed, it will be detected and the benefits of the presented methods can be exploited. The fact that the new individuals chosen from the memory to be inserted into the population do not participate in the evolutionary process until the change effectively occurs makes them harmless if things go differently than predicted. Moreover, the prediction techniques have not yet been studied for problems that have a stochastic nature or present time-linkage dependencies. Examples of real-world problems that present a repeated behavior and where our model could be tested are the class of dynamic shortest path routing problems, e.g., in mobile ad hoc networks or dealing with traffic jams in cities.

The proposed methods, although conferring superior performance to the EA in the studied situations, present several limitations and need to be improved and further tested. As future work, it is crucial to improve the Markov chain module, removing the problem dependence. We are exploring different approaches to measure the environment and store the appropriate information in the Markov model. This information will be used to implicitly identify an environment as a new one or as a known one. Another required improvement concerns the choice of the size of the time-window. In fact, PredEA-LR2 and PredEA-LR10 worked well for the selected functions, but it is not certain that these time-windows are the best choices if different functions are used to generate the change periods. If these two issues are improved, the proposed method can be used and tested in different problems, using different representations, such as real-valued encodings. Moreover, the proposed techniques will be used in other types of algorithms besides memory-based ones, such as multi-population approaches, and will be tested in a real-world application.