1 Introduction

Evolutionary algorithms (EAs) are a class of nature-inspired algorithms that can be applied to solve a wide variety of optimization problems. Runtime analysis of nature-inspired algorithms has advanced considerably in recent years [2, 12], though most analyses focus on static optimization problems, where the objective is simply to find the global optimum in as few steps as possible. Many real-world optimization problems, however, are dynamic: the optimal solution may change as the problem conditions change over time. Algorithms therefore need not only to find the optimum at some point in time, but also to track the optimal solution over time as the problem changes.

The application of EAs to dynamic optimization problems is the subject of the field of evolutionary dynamic optimization, which has attracted much activity in recent years. Many applications of evolutionary algorithms to dynamic problems are considered in the literature [1, 13], and there are runtime analyses building on theoretical studies of evolutionary algorithms for dynamic problems [4, 7, 15]. The utility of a population for tracking problems was studied in evolutionary computation by Jansen and Schellbach [7], while different mechanisms for ensuring population diversity have been considered by Oliveto and Zarges [14]; in particular, a mechanism called genotype diversity was proved to be inefficient on a particular dynamic problem. Recently, Jansen and Zarges [8] compared evolutionary algorithms and artificial immune systems on a bi-stable dynamic optimization problem.

In [9], Kötzing and Molter introduced a dynamic pseudo-Boolean function called Maze, the optimum of which slowly moves from the all-ones to the all-zeroes bit string in \(n\) phases, in each of which the optimum oscillates between two solutions that differ in the value of a single bit. The paper shows that while the Max–Min Ant System is able to track the changes occurring in this fitness function and find the all-zeroes optimum within polynomial time, the (1+1) EA loses track of the optimum and with high probability requires an exponential amount of time to find the all-zeroes optimum.

In this paper, we consider the impact of introducing a population and a simple diversity mechanism to the (1+1) EA, showing that a (2+1) EA with genotype diversity is able to track the optimum of the Maze function. We then generalize the Maze to a function over a finite alphabet, and prove a hierarchy result with respect to the population size. More precisely, for any \(\mu \) and any \(c>0\), there is a variant of the Maze such that a (\(\mu \)+1) EA with genotype diversity will with probability \(1-O(n^{-c})\) succeed in tracking the optimum, whereas an EA with population size \(\mu -1\) will with probability \(1-O(n^{-c})\) lose track of the optimum. Finally, we return to the performance of \(\hbox {MMAS}^{*}\), a Max–Min Ant System variant from ant colony optimization (ACO), and conclude that \(\hbox {MMAS}^{*}\), due to its pheromone memory model, is able to successfully track the optimum of the finite-alphabet version of Maze, even with shorter phase lengths than considered in [9] (provided the alphabet is not too large). Our proofs are based on mixing time arguments, fitness levels with tail bounds, and a new variant of a variable drift theorem, which allows for a tail bound on the probability distribution of the pheromone value in \(\hbox {MMAS}^{*}\).

The paper is structured as follows. Section 2 defines the dynamic fitness function Maze, and the (\(\mu \)+1) EA with genotype diversity and \(\hbox {MMAS}^{*}\) algorithms generalized to larger alphabets. Section 3 proves the positive result for the simple (2+1) EA w. r. t. the classical Maze function on bit strings. The hierarchy result for larger alphabets is proven in Sect. 4. Finally, Sect. 5 shows that \(\hbox {MMAS}^{*}\) is efficient in tracking the optimum for every polynomial alphabet size. We finish with some conclusions.

2 Preliminaries

The Maze dynamic fitness function defined in [9] consists of \(n+1\) phases of \(t_0 = k n^3 \log n\) iterations each, where \(k\) is a sufficiently large constant. During the first phase, which we will for convenience refer to as phase \(0\), Maze is equivalent to OneMax, a fitness function that simply counts the number of \(1\)-bits in an \(n\)-bit string. In each subsequent phase \(i\), the function assigns fitness values \(n+2\) and \(n+1\) to the bit strings \(0^{i-1}01^{n-i}\) and \(0^{i-1}11^{n-i}\), oscillating which of the two receives the higher value in a 0-0-1 pattern: \(0^{i}1^{n-i}\) is favored, i.e. has the higher fitness value, in two iterations out of every three.

The version considered in this paper has been extended to assign fitness values to \(n\)-character strings over a finite alphabet; for \(r=1\), i. e., bit strings, it is exactly equivalent to the original Maze. In this context, OneMax counts the number of literal 1 characters in the string, and the sets OPT\(_p\) and ALT\(_p\) generalize the 0-0-1 oscillation pattern.
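To make the oscillation concrete, the following minimal Python sketch implements the finite-alphabet Maze as just described. It is an illustration, not the paper's literal definition: the exact residues of \(t \bmod 3\) realizing the OPT-OPT-ALT pattern (cf. Sect. 5) and the convention that all ALT\(_p\) members share one fitness value are assumptions.

```python
def make_maze(n, r, t0):
    """Hedged sketch of the finite-alphabet Maze over {0, ..., r}^n."""
    def maze(x, t):
        onemax = sum(1 for c in x if c == 1)
        p = min(t // t0, n)                        # current phase; phase 0 is OneMax
        if p == 0:
            return onemax
        # ALL_p = {0^(p-1) c 1^(n-p) : c in {0, ..., r}}; c = 0 gives OPT_p.
        if any(c != 0 for c in x[:p - 1]) or any(c != 1 for c in x[p:]):
            return onemax                          # strings outside ALL_p
        opt_favored = t % 3 != 2                   # assumed OPT-OPT-ALT pattern
        if x[p - 1] == 0:                          # x = OPT_p
            return n + 2 if opt_favored else n + 1
        return n + 1 if opt_favored else n + 2     # x in ALT_p
    return maze
```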

In this paper, we will examine how a \((2+1)\) Evolutionary Algorithm (EA) performs on the original Maze function, and how the (\(\mu \)+1) EA and \(\hbox {MMAS}^{*}\) algorithms perform on our finite-alphabet version. Rather than only focusing on expected optimization times (i.e. the expected iteration during which OPT\(_n\) is first constructed following the start of the final phase of the Maze), we consider the probability that OPT\(_n\) will be constructed by the algorithms by the end of phase \(n\) of the Maze, as well as what is likely to happen if it is not.

The (\(\mu \)+1) EA with genotype diversity [14, 16] is shown as Algorithm 1. Definition 1 extends the mutation operator mut\(_r\) to support a finite alphabet as in [3, 5]; for \(r=1\), it is equivalent to the standard mutation operator of the (1+1) EA.

Several lemmas throughout this paper state that “a specific event occurs with high probability.” Definition 2 provides a more formal definition of this concept.

[Algorithm 1: the (\(\mu \)+1) EA with genotype diversity]
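Since the pseudocode figure is not reproduced, the following Python sketch shows one plausible reading of Algorithm 1. The uniform parent selection and the tie-breaking in the removal step are assumptions (the precise formulation is in [14, 16]); mut_r is the operator of Definition 1 below.

```python
import random

def mu_plus_one_ea(mu, n, r, fitness, T, rng=random):
    """Hedged sketch of the (mu+1) EA with genotype diversity."""
    pop = [tuple(rng.randrange(r + 1) for _ in range(n)) for _ in range(mu)]
    for t in range(T):
        parent = pop[rng.randrange(mu)]       # uniform parent selection (assumed)
        child = mut_r(parent, r, rng)         # mutation operator of Definition 1
        if child not in pop:                  # genotype diversity: duplicates rejected
            pop.append(child)
            # Remove the worst of the mu+1 genotypes; the fitness of every
            # individual is re-evaluated at the current time t (external clock).
            pop.remove(min(pop, key=lambda y: fitness(y, t)))
    return pop
```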

Definition 1

Let \(\varSigma = \{0, 1, \ldots , r\}\) be a finite alphabet.

The mutation operator \(\text {mut}_r\) creates an image \(y \in \varSigma ^n\) from \(x \in \varSigma ^n\) by independently replacing each character \(x_i\) of \(x\) (\(1 \le i \le n\)) with probability \(1/n\) with a symbol drawn uniformly at random from \(\varSigma \setminus \{x_i\}\).
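A direct transcription of this operator (the tuple representation and function signature are implementation choices):

```python
import random

def mut_r(x, r, rng=random):
    """mut_r of Definition 1: each character is replaced, independently with
    probability 1/n, by a symbol drawn uniformly from the r other symbols."""
    n = len(x)
    return tuple(
        rng.choice([s for s in range(r + 1) if s != xi])
        if rng.random() < 1 / n else xi
        for xi in x)
```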

Definition 2

An event \(E\) is said to occur with high probability if, for every constant \(c > 0\), \({{\mathrm{Pr}}}(E)=1 - O(n^{-c})\). An event \(E\) is said to occur within \(O(f(n))\) iterations with high probability if for every constant \(c>0\) there exists a \(c'>0\), possibly depending on \(c\) but not on \(n\), such that \(E\) occurs within \(c'f(n)\) iterations with probability \(1-O(n^{-c})\).

In Sect. 5, we will consider how the Max–Min Ant System [17] algorithm \(\hbox {MMAS}^{*}\), shown as Algorithm 2, is able to track the optimum of the finite-alphabet Maze function.

[Algorithm 2: \(\hbox {MMAS}^{*}\)]

To use a path-constructing algorithm with a fitness function that assigns values to \(n\)-character strings, we use the construction graph shown in Fig. 1: every \(n\)-character string \(x \in \varSigma ^n\) corresponds to the path from \(v_0\) to \(v_n\) consisting of the edges \(e_{i,c}\) for which \(x_i = c\), and every \(v_0\)-\(v_n\) path corresponds to some \(x \in \varSigma ^n\) in this fashion. We use the standard choice \(\tau _{\text {max}} = 1 - 1/n\), and set \(\tau _{\text {min}} = 1/(rn)\) and \(\rho = \varOmega \left( 1/(rn)\right) \) to accommodate the \((r+1)\)-character alphabet of the extended Maze function. Notably, for \(r > 1\), the sum of the pheromone values on edges leaving a vertex is no longer always equal to \(1\), and, when considering probabilities of selecting a particular edge, we have to use the bounds presented in Lemma 3, similar to [19, Lemma 1] and [18, Lemma 15].

Fig. 1 Construction graph used for \(\hbox {MMAS}^{*}\) on the finite-alphabet Maze function. There are \(r+1\) edges between each pair of vertices \((v_{i-1}, v_{i})\)
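A minimal sketch of solution construction on this graph, together with the \(\hbox {MMAS}^{*}\) main loop of Algorithm 2, may be helpful. It assumes the standard \(\hbox {MMAS}^{*}\) scheme of [17]: only strict improvements replace the (re-evaluated) best-so-far solution, and all pheromone values are clamped to \([\tau _{\text {min}}, \tau _{\text {max}}]\) after the best-so-far path is reinforced; the uniform initialization is also an assumption.

```python
import random

def construct(tau, rng=random):
    """Walk v_0 -> v_n in the graph of Fig. 1: at v_(i-1), edge e_(i,c) is taken
    with probability tau[i][c] / tau_sum (cf. Lemma 3 for bounds on tau_sum)."""
    x = []
    for edges in tau:
        u, acc = rng.random() * sum(edges), 0.0
        for c, value in enumerate(edges):
            acc += value
            if u <= acc:
                x.append(c)
                break
    return tuple(x)

def mmas_star(n, r, rho, fitness, T, rng=random):
    """Hedged sketch of MMAS* (Algorithm 2) with the borders chosen in Sect. 2."""
    tau_min, tau_max = 1 / (r * n), 1 - 1 / n
    tau = [[1 / (r + 1)] * (r + 1) for _ in range(n)]   # uniform initialization
    best = None
    for t in range(T):
        x = construct(tau, rng)
        if best is None or fitness(x, t) > fitness(best, t):
            best = x                                    # strict improvements only
        for i in range(n):                              # evaporate, reinforce, clamp
            for c in range(r + 1):
                v = tau[i][c] * (1 - rho) + (rho if best[i] == c else 0.0)
                tau[i][c] = min(max(v, tau_min), tau_max)
    return best
```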

Lemma 3

The sum of the pheromone values on edges leaving any specific vertex \(v\), \(\tau _\text {sum}\), can be bounded as:

$$\begin{aligned} 1 \le \tau _\text {sum} \le 1 + (\text {deg}^+(v)-1)\tau _{\text {min}} = 1 + 1/n, \end{aligned}$$

where \(\text {deg}^+(v)\) is the out-degree of vertex \(v\).

Proof

Recall that \(\text {deg}^+(v) = r + 1\) and \(\tau _{\text {min}} = 1/(rn)\). We prove these bounds by induction, noting that both hold at initialization, where \(\tau _\text {sum} = 1\).

If, prior to a pheromone update, \(\tau _\text {sum} \ge 1\), and no pheromone values are affected by the \(\tau _{\text {max}}\) border, \(\tau _\text {sum} (1-\rho ) + \rho \ge 1\) after the update; while if there are pheromone values capped at \(\tau _{\text {max}}\) after the update, we note that even if all the other pheromone values are \(\tau _{\text {min}}\), \(\tau _{\text {max}} + r \cdot \tau _{\text {min}} = 1\), proving the first inequality.

If, prior to a pheromone update, \(\tau _\text {sum} \le 1 + 1/n\), an increase of the sum beyond \(\tau _\text {sum} (1-\rho ) + \rho \le \tau _\text {sum} \) can only occur as a consequence of pheromone values being affected by the lower pheromone border (as \(\tau _\text {sum} \ge 1\)), i. e., those for which \(\tau (1-\rho ) \le \tau _{\text {min}} \), increasing \(\tau _\text {sum} \) by at most \(\rho \tau _{\text {min}} \) for each such value. We note that there can be at most \(r\) such values, as the reinforced pheromone value cannot drop below \(\tau _{\text {min}} \), thus the sum of pheromone values after the update is at most \(\tau _\text {sum} (1-\rho ) + \rho + r \cdot \rho \tau _{\text {min}} = \tau _\text {sum} (1 - \rho ) + \rho + \rho /n \le 1 + 1/n\), proving the second inequality. \(\square \)
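The invariant is also easy to check empirically. A small sketch (the value \(\rho = 1/(rn)\) and the uniform start are illustrative assumptions):

```python
import random

def check_tau_sum(r, n, steps=10000, rng=random):
    """Empirical check of Lemma 3 for the r+1 pheromone values of one character."""
    tau_min, tau_max, rho = 1 / (r * n), 1 - 1 / n, 1 / (r * n)
    tau = [1 / (r + 1)] * (r + 1)              # uniform start, sum = 1
    for _ in range(steps):
        best = rng.randrange(r + 1)            # arbitrary reinforced entry
        tau = [min(max(v * (1 - rho) + (rho if c == best else 0.0), tau_min),
                   tau_max)
               for c, v in enumerate(tau)]
        assert 1 - 1e-9 <= sum(tau) <= 1 + 1 / n + 1e-9

check_tau_sum(r=3, n=50)
```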

Note that both algorithms re-evaluate the fitness function when updating the population or the best-so-far solution. Similarly to [8], the considered clock \(t\) is external to the Maze function, making it possible to evaluate many solutions in one clock tick of the Maze; this corresponds to having hardware available to evaluate many solutions in parallel, while the problem changes occur at fixed intervals regardless of the number of parallel evaluations.

3 (2+1) EA on Maze

The (1+1) EA will with high probability require an exponential amount of time to find the \(0^n\) optimum on the Maze function, because there is at least a constant probability of ending each phase \(p > 0\) with \(x^* \ne 0^p1^{n-p}\), which lets the (1+1) EA revert to optimizing OneMax, destroying the \(0\)-prefix constructed so far, and eventually requiring a large mutation to recover [9]. In this section, we will show that a (2+1) EA with genotype diversity avoids this problem by being able to store both of the oscillating individuals in each phase, thereby ensuring that \(0^p1^{n-p}\) is in the population at the start of the next phase. This ensures that at the end of the last phase, the \(0^n\) individual is in the population with high probability.

Theorem 4

The \((2+1)\) EA with genotype diversity will with high probability have the \(0^n\) optimum in the population at the end of the last phase of the Maze.

To show this, we will prove that the \(1^n = \text {OPT}_0\) individual is found with high probability during the initial OneMax phase, and the EA is then able to follow the oscillation process: if OPT\(_{p-1}\) is in the population at the beginning of phase \(p\), OPT\(_{p}\) will be in the population at the end of that phase – meaning that at the end of the final phase, OPT\(_n = 0^n\) will be in the population.

Lemma 5

The \((2+1)\) EA will discover the \(1^n\) individual within \(O(n \log n)\) iterations with high probability.

Proof

Recall that each phase of the Maze lasts \(kn^3\log n\) iterations for some constant \(k>0\). The fitness level method can be applied: partition the possible populations into levels based on the maximum fitness value of any individual within the population. The probability of an iteration leaving the level where the highest fitness value is \(n-i\) is at least \(p_i \ge (1/2) \cdot (i/n) \cdot (1-1/n)^{n-1} \ge i/(2en)\): i. e., that of selecting the best-valued individual in the population as the ancestor, and flipping one of its \(i\) zero bits. Even if the EA starts at the lowest of \(n+1\) fitness levels, it has to leave at most \(n\) levels in order to reach the level where \(1^n\) is in the population; in expectation, this takes \(\mathrm {E}(T) \le \sum _{i=1}^{n} \frac{1}{p_i} \le 2en H_n = O(n \log n)\) iterations, where \(H_n\) denotes the \(n\)th harmonic number.

The high probability result is obtained by applying the tail bounds on fitness levels derived in [20, Theorem 2]: using \(s = 50 n^2 > 4 \pi ^2 e^2 n^2 / 6 \ge \sum _{i=1}^{n} \frac{1}{{p_i}^2}\), and \(h = 1/(2en) \le \min p_i\), the probability that \(1^n\) is found within \(\mathrm {E}(T) + \delta \) iterations is at least \(1 - e^{-\frac{\delta }{4} \cdot \min \{\delta /s, h\}}\). Setting \(\delta = 50 c n \log n\), where \(c > 0\), and observing that \(h < \delta /s\) for sufficiently large \(n\), yields a probability of finding \(1^n\) within \(O(n \log n)=o(kn^3\log n)\) iterations of at least

$$\begin{aligned} 1 - e^{-\frac{50 c n \log n}{4} \cdot \min \left\{ \frac{50 c n \log n}{50 n^2}, \frac{1}{2en}\right\} } \ge 1 - e^{-\frac{50 c \log n}{8e}} > 1 - n^{-2 c}. \end{aligned}$$

\(\square \)
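The tail bound of [20, Theorem 2] used above reappears in Sect. 4; a one-line transcription, evaluated with the parameters from this proof (the concrete \(n\) and \(c\) are hypothetical, and the logarithm is taken as natural):

```python
import math

def level_tail(delta, s, h):
    """[20, Theorem 2]: Pr(T > E(T) + delta) <= exp(-(delta/4) * min(delta/s, h)),
    where s >= sum of 1/p_i^2 and h <= min p_i over the fitness levels."""
    return math.exp(-(delta / 4) * min(delta / s, h))

n, c = 1000, 1
print(level_tail(50 * c * n * math.log(n), 50 * n**2, 1 / (2 * math.e * n)))
```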

Lemma 6

If OPT\(_{p-1}\) is in the population at the beginning of phase \(p\), OPT\(_p\) will with high probability be in the population when phase \(p\) ends.

Proof

As OPT\(_{p-1} \in \text {ALL}_{p}\), it has a fitness value of at least \(n+1\) during phase \(p\), and therefore cannot be removed from the population. The OPT\(_p\) individual can be constructed by selecting OPT\(_{p-1}\) as the ancestor and flipping a single bit, which occurs in each iteration with probability at least \(1/(2en)\). The probability of this occurring within \(n^3 \log n\) iterations is at least:

$$\begin{aligned} 1 - (1 - 1/(2en))^{n^3 \log n} \ge 1 - e^{-\frac{1}{2e}n^2 \log n} = 1 - n^{-\varOmega (n^2)}. \end{aligned}$$

As OPT\(_p\) has a fitness value of at least \(n+1\) during phase \(p\), when constructed, it will replace a OneMax-valued individual in the population, and cannot be removed from the population during phase \(p\). \(\square \)

Combined, Lemmas 5 and  6 prove Theorem 4.

Proof

(of Theorem 4) By applying a union bound on the probabilities of failure during each of Maze ’s \(n+1\) phases, we can conclude that the (2+1) EA finishes phase \(n\) with \(0^n\) in the population with high probability. \(\square \)

4 (\(\mu \)+1) EA and the Finite-Alphabet Maze

While a (2+1) EA with genotype diversity is able to track the optimum of the original Maze, it is interesting to consider whether there exist Maze-like functions for which a larger population is required. In this section, we use the Maze function extended over a finite alphabet \(\varSigma = \{0, 1, \ldots , r\}\), and consider a (\(\mu \)+1) EA with genotype diversity, where \(r \in O(n)\) and \(\mu < n/2\). The phase length is still \(kn^3\log n\) for some sufficiently large constant \(k>0\). We will build toward two results: a population of size \(\mu \le r\) is insufficient to track the optimum, while \(\mu > r\) is sufficient and enables the (\(\mu \)+1) EA to find \(0^n\) in polynomial time. These results are formalized in Theorems 7 and 8.

Theorem 7

If \(r \ge \mu \), \(r \in O(n)\), and \(\mu \le n/(2+\epsilon )\), where \(\epsilon > 0\) is an arbitrarily small constant, the (\(\mu \)+1) EA with genotype diversity will with high probability not find the \(0^n\) optimum on Maze within a polynomial number of iterations.

Theorem 8

If \(\mu , r \in O(n)\) and \(r < \mu \), the (\(\mu \)+1) EA with genotype diversity will with high probability finish the last phase of the Maze with the \(0^n\) optimum in the population, given that an appropriately large constant \(k\) is chosen for the phase length \(t_0\).

As before, we need to verify that \(1^n\) is found during the initial OneMax phase; this is done in Lemma 9. Notably, the constant \(k\) in the phase length \(t_0\) may need to be adjusted based on the high-probability constant \(c\): in order to make sure that \(1^n\) is located initially, the initial phase needs to be sufficiently long. Lemma 10 then shows that once one of the ALL\(_p\) individuals (i. e., those with a non-OneMax value during phase \(p\)) is in the population, the population will with high probability be saturated with ALL\(_p\) individuals before the phase is over. These two lemmas are used in the proofs of both Theorems 7 and 8.

Lemmas 11 and 12 are used for Theorem 7. The former shows that once a population of \(\mu \le r\) individuals is filled with ALL\(_p\) individuals, the probability that OPT\(_p\) is in the population after a small number of additional iterations is at most a constant smaller than 1; the latter states that if a phase \(p\) begins with no ALL\(_p\) individuals in the population (i. e., OPT\(_{p-1}\) was not in the population when phase \(p-1\) ended), the (\(\mu \)+1) EA loses track of the optimum and reverts to optimizing OneMax with at least constant probability. This proof strategy is inspired by the lower bound for the (1+1) EA on the original Maze [9].

Lemma 9

The (\(\mu \)+1) EA will discover the \(1^n\) individual within \(O(\mu r n \log n) = O(n^3 \log n)\) iterations with high probability.

The method used to prove Lemma 5 can be applied to prove Lemma 9 as well: we require that the best-fitness individual in the population is chosen as the ancestor, and the mutation operator changes a single non-\(1\) character into a \(1\), resulting in an individual with a higher OneMax-value than any previously in the population.

Lemma 10

During phase \(p\), once an individual from ALL\(_p\) is in the population, the population will contain \(\min (\mu , r+1)\) individuals from ALL\(_p\) within \(O(\mu r n \log n)\) iterations with high probability.

Proof

The general form of the fitness level method can be applied by partitioning the \(\mu \)-individual populations into levels by the number of ALL\(_p\) individuals they contain, from \(0\) to \(\min (\mu , r+1)\), observing that the number of ALL\(_p\) individuals in the population cannot be reduced during phase \(p\). As the phase starts with a population containing at least one ALL\(_p\) individual, there are at most \(\min (\mu , r+1)-1\) “population levels” that the process may need to pass through before the population is saturated with ALL\(_p\) individuals; the time for this to happen can be bounded as a sum of geometrically distributed waiting times to leave each “level”.

If \(i < \min (\mu , r+1)\) is the number of ALL\(_p\) individuals in the population, the probability of a single iteration creating a new ALL\(_p\) individual (and hence moving to a higher level) by selecting one of the \(i\) as the ancestor and performing a one-character mutation is \(p_i \ge \frac{i}{\mu }\frac{r+1-i}{r}\frac{1}{en}\). Let \(T\) be the number of iterations before the population is saturated with ALL\(_p\) individuals, and \(m = \min (\mu , r+1)-1\) be the number of ALL\(_p\) individuals that need to be added to the population to achieve saturation, thus the expectation \(\mathrm {E}(T)\) is at most:

$$\begin{aligned} \sum _{i=1}^{m} \frac{\mu }{i}\frac{r}{r+1-i}en = r \mu en \sum _{i=1}^{m} \frac{1}{i (r+1-i)} = O(r \mu n). \end{aligned}$$

Applying the tail bounds from [20, Theorem 2]: using \(s = 13 \mu ^2 r^2 n^2 > \sum _{i=1}^{m} \frac{1}{{p_i}^2}\), and \(h = 1/(\mu {}ren) < \min p_i\), the probability that \(m\) ALL\(_p\) individuals are added to the population within \(\mathrm {E}(T) + \delta \) iterations is at least \(1 - e^{-\frac{\delta }{4} \cdot \min \{\delta /s, h\}}\). Setting \(\delta = 13 c \mu r n \log n\), where \(c > 0\), and observing that \(h < \delta /s\) for sufficiently large \(n\), yields a probability of reaching the final level within \(O(\mu r n \log n)\) iterations of at least

$$\begin{aligned} 1 - e^{-\frac{13 c r \mu n \log n}{4} \cdot \min \left\{ \frac{13 c r \mu n \log n}{13 \mu ^2 r^2 n^2}, \frac{1}{e r \mu n}\right\} } \ge 1 - e^{-\frac{13 c \log n}{4e}} > 1 - n^{-c}. \end{aligned}$$

Thus, if an ALL\(_p\) individual exists in the population during phase \(p\), the population will be saturated with ALL\(_p\) individuals in \(O(\mu r n \log n)\) iterations with high probability. \(\square \)

The following lemmas consider the situation for \(\mu \le r\), i. e., a population size too small to contain every individual in ALL\(_p\). In this case, OPT\(_p\) can be removed from the population after being discovered. Interestingly, this happens with constant probability regardless of \(\mu \).

Lemma 11

If, during phase \(p\), the population consists of \(\mu \) individuals from ALL\(_p\), and \(r \ge \mu \), then after \(\varOmega (rn)\) iterations, the probability that OPT\(_p\) is in the population is at most a constant smaller than 1.

Proof

OPT\(_p\) can be replaced by an ALT\(_p\) individual that was not in the population during an iteration that favors ALT\(_p\) individuals over OPT\(_p\). The probability \(p_L\) of OPT\(_p\) being replaced by one of \(r+1-\mu \) ALT\(_p\) individuals not yet in the population can then be bounded:

$$\begin{aligned} p_L = \frac{r + 1 - \mu }{r} \frac{1}{n} \left( 1 - \frac{1}{n}\right) ^{n-1} ,\quad \frac{1}{rne} \le p_L \le \frac{1}{n}. \end{aligned}$$

The OPT\(_p\) individual can be added to the population during any iteration favoring it over the ALT\(_p\) individuals by a single-character mutation of any individual in ALT\(_p\). The probability \(p_W\) of OPT\(_p\) being rediscovered during an iteration favoring it can be bounded:

$$\begin{aligned} p_W = \frac{1}{r} \frac{1}{n} \left( 1 - \frac{1}{n}\right) ^{n-1} ,\quad \frac{1}{rne} \le p_W \le \frac{1}{rn}. \end{aligned}$$

Consider the probability of the OPT\(_p\) individual being in the population after three additional iterations: it is greatest when the first of these three iterations favors ALT\(_p\) individuals, and the subsequent two favor OPT\(_p\). Let \(p_{W3}\) be the probability that OPT\(_p\) is constructed during the two iterations favoring it (being added to the population if it was not already there):

$$\begin{aligned} p_{W3} = p_W + (1 - p_W) \cdot p_W , \quad \frac{1}{rne} \le p_W \le p_{W3} \le 2 p_W \le \frac{2}{rn}, \end{aligned}$$

and \(p_{L3}\) be the probability that OPT\(_p\) is replaced by an ALT\(_p\) individual in the first iteration, and then not constructed in the two following iterations:

$$\begin{aligned} p_{L3} = p_{L} \cdot (1 - p_{W3}) , \quad \frac{1}{2rne} \le \frac{1}{rne} \frac{rn - 2}{rn} \le p_{L3} \le \frac{1}{n} \left( 1-\frac{1}{rne}\right) < \frac{1}{n}, \end{aligned}$$

where the lower bound holds for \(n \ge 4\).

This behavior can be modeled as a two-state Markov chain, where the states represent having or not having OPT\(_p\) in the population, with transition probabilities \(p_{W3}\) and \(p_{L3}\); each step of this Markov chain thus corresponds to three iterations of the (\(\mu \)+1) EA. Let \(\pi _H\) be the steady-state probability of the OPT\(_p\) individual being in the population:

$$\begin{aligned}&\pi _H \cdot p_{L3} = (1-\pi _H) \cdot p_{W3} \\&\frac{1-\pi _H}{\pi _H} = \frac{p_{L3}}{p_{W3}} \ge \frac{1}{4e} \\&\pi _H \le \frac{1}{1 + \frac{1}{4e}} < 0.916. \end{aligned}$$
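For intuition, the stationary probability follows directly from the transition bounds; the concrete \(r\) and \(n\) below are hypothetical:

```python
import math

def pi_high(p_w3, p_l3):
    """Stationary probability of the state "OPT_p in the population" of the
    two-state chain; one chain step corresponds to three EA iterations."""
    return p_w3 / (p_w3 + p_l3)

# Worst case for keeping OPT_p: p_W3 at its upper, p_L3 at its lower bound.
r, n = 5, 100
print(pi_high(2 / (r * n), 1 / (2 * r * n * math.e)))  # = 1/(1 + 1/(4e)) ~ 0.9158
```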

Assume, as the worst case, that the OPT\(_p\) individual is in the population when it becomes saturated with ALL\(_p\) individuals. Markov chain mixing time, \(t(\epsilon )\), can then be used to find the number of steps required to reduce the total variation distance to the steady state \(\pi _H\) to at most some \(\epsilon \) (and hence the number of iterations required to reduce the probability of OPT\(_p\) being in the population to at most some constant smaller than 1), and can be upper-bounded by coupling time as in [18, Corollary 4]:

$$\begin{aligned} t(\epsilon ) \le \min \left\{ t : \max _{x,y\in \varOmega } \Pr (T_{xy} > t) \le \epsilon \right\} , \end{aligned}$$

where \(T_{xy} = \min \{t: X_t = Y_t \mid X_0 = x, Y_0 = y\}\) is the coupling time, i. e., the earliest time at which the two equivalent Markov chains \(X_t\) and \(Y_t\), initialized in different states \(x\), \(y\), are in the same state.

With only two states (and symmetry), the coupling time \(T_{xy}\) is greatest when the chains begin in different states. The probability that the chains remain in different states for at least \(t\) steps is then:

$$\begin{aligned} \max _{x,y \in \varOmega } \Pr (T_{xy} > t) = \left( p_{L3}p_{W3} + (1-p_{L3})(1-p_{W3})\right) ^t. \end{aligned}$$

To get an upper bound on \(t(\epsilon )\), an upper bound on the expression in parentheses is needed. Inserting the appropriate bounds on \(p_{W3}\) and \(p_{L3}\) yields:

$$\begin{aligned}&2 p_{W3} p_{L3} + 1 - p_{L3} - p_{W3} \le \frac{4}{rn^2} + 1 - \frac{3}{2rne} \\&\quad \le 1 - \frac{1}{rne} \left( 1.5 - \frac{4e}{n}\right) < 1 - \frac{1}{rne} \text{(for } n \ge 22\text {)} , \\&\quad t(\epsilon ) \le \min \left\{ t : \left( 1-\frac{1}{rne}\right) ^t \le \epsilon \right\} . \end{aligned}$$

After, e. g., \(t(0.01) < 4.61 rne\) steps of the Markov chain, the probability that coupling has not occurred is at most 0.01. As coupling time is an upper bound for mixing time, and each Markov chain step corresponds to three EA iterations, this means that after at most \(37.60rn\) iterations, the probability that OPT\(_p\) is in the population is at most \(\pi _H + 0.01 < 0.926\), and so the probability that OPT\(_p\) is not in the population is at least 0.074.

Therefore, \(\varOmega (rn)\) iterations after the population is saturated with ALL\(_p\) individuals, the probability that OPT\(_p\) is in the population is at most a constant. \(\square \)

Lemma 11 therefore implies that there is at least a constant probability of phase \(p+1\) beginning with only ALT\(_p\) individuals in the population when \(r \ge \mu \). The following lemma considers the consequences of this: as OPT\(_p\) is the only individual in both ALL\(_p\) and ALL\(_{p+1}\), phase \(p+1\) then begins without any ALL\(_{p+1}\) individuals in the population, leaving the EA with only OneMax-valued individuals at the beginning of phase \(p+1\).

Lemma 12

If, at the beginning of phase \(p \ge n/2 + 4\), the population contains no individuals from ALL\(_p\) and only individuals with fitness values between \(n-p\) and \(n\), then, with at least constant probability, the population at the end of phase \(p\) will consist only of \(1^n\) and one-character mutations of \(1^n\).

Proof

Observe that ALL\(_p\) contains only individuals with fitness values \(n-p\) and \(n-p-1\): thus, in order to construct an ALL\(_p\) individual from the individuals in the population, a mutation must change at least one character to a specific value. If an ALL\(_p\) individual is constructed, it will be accepted, and the optimization process will resume according to Lemma 10. If no such individual is constructed, individuals with higher OneMax-values will continue being accepted into the population, eventually leading to the discovery of \(1^n\) as in Lemma 9.

Until an ALL\(_p\) individual has been constructed, the probability of constructing one can be upper-bounded by the probability of a mutation changing a character at a specific position to a specific value, i. e., \(1/(rn)\).

Let a value-improving mutation be a mutation that produces an individual with a strictly higher OneMax-value than the worst individual currently in the population. Note that even if a value-improving mutation occurs, the resulting individual might already be one of the \(\mu -1\) other individuals in the population, in which case the population is not modified. We will now show that the probability \(p_I\) of such a mutation occurring while the minimum OneMax-value of any individual in the population is below \(n-p+4\) is at least:

$$\begin{aligned} p_I \ge \frac{n}{2} \frac{1}{n} \frac{1}{r} \left( 1 - \frac{1}{n}\right) ^{n-1} \ge \frac{1}{2e r}. \end{aligned}$$

Let \(n-p \le v \le n-p+3\) be the minimum OneMax-value of an individual in the population during an iteration. If the ancestor selected during an iteration has a OneMax-value of at least \(v+2\), any one-character mutation will produce an individual of sufficient value to be accepted; there are \(n \ge n/2\) such mutations. If the selected ancestor has a value of at most \(v+1\), it contains at least \(n - (n - p + 4) \ge n/2\) non-1 characters, and hence at least \(n/2\) one-character mutations can produce an individual with a strictly higher OneMax-value than such an ancestor. Thus, regardless of the ancestor selection, there are at least \(n/2\) possible one-character mutations leading to an acceptable fitness value.

No more than a constant fraction of those value-improving mutations can fail to be accepted due to already being in the population (as \(\mu \le n/(2+\epsilon )\)). This means that the probability of a value-improving mutation occurring and being accepted during a single iteration is at least \(\varOmega (p_I) = \varOmega (1/r)\).

Consider the probability of the minimum OneMax-value of the population rising to at least \(n-p+4\) without any ALL\(_p\) individual being constructed; this requires that at most \(4\mu \) value-improving mutations be accepted (as a value-improving mutation can only introduce an individual with the same fitness value into the population at most \(\mu \) times). This occurs with at least constant probability: let \(A\) be the event that an ALL\(_p\) individual is constructed, and \(V\) be the event that a value-improving mutation is accepted; then:

$$\begin{aligned}&{{\mathrm{Pr}}}(A \mid A \vee V) \le \frac{1/(rn)}{1/(rn) + \varOmega (1/r)} = O(n^{-1}) , \\&{{\mathrm{Pr}}}(4\mu \; V \text {s without } A) \ge (1 - {{\mathrm{Pr}}}(A \mid A \vee V))^{4\mu } = (1 - O(n^{-1}))^{O(n)} = \varOmega (1). \end{aligned}$$

Once the minimum OneMax-value of the population is raised to \(n-p+4\), constructing an ALL\(_p\) individual requires at least three characters to be changed into a \(0\) simultaneously, which occurs with probability at most \(1/(rn)^3\).

Thus, with at least constant probability, if phase \(p\) begins without an ALL\(_p\) individual in the population, no ALL\(_p\) individual is constructed within the \(O(\mu r n \log n)\) iterations required to find the \(1^n\) individual with high probability (per Lemma 9). Furthermore, once the \(1^n\) individual is in the population, one-character mutations of that individual will fill the population in \(O(\mu \log n)\) iterations with high probability, making construction of an ALL\(_p\) individual require a \(p - 2 = \varOmega (n)\) character mutation. \(\square \)

By combining these lemmas, it is now possible to prove Theorems 7 and 8.

Proof

(of Theorem 7) Per Lemma 9, the EA is able to construct the \(1^n\) individual with high probability during phase 0, and therefore also fill the population with one-character mutations of \(1^n\). At the start of each phase \(p\), the fitness value of the individuals in the population is thus with high probability not below \(n-p\): any individual must have been accepted during the previous phase, meaning it was either an individual in ALL\(_{p-1}\), which has a fitness value of at least \(n-p\), or an individual with a OneMax fitness value at least that of an individual already in the population.

Pessimistically assume that at the end of phase \(n/2+3\), the population does not yet consist of \(1^n\) and one-character mutations of it, i.e. the EA has not fully reverted to optimizing OneMax during the first \(n/2+3\) phases. Per Lemma 11, \(\varOmega (n)\) of the remaining phases have at least a constant probability of beginning without an ALL\(_p\) individual in the population, and with all individuals in the population having fitness values between \(n-p\) and \(n\); per Lemma 12, a phase beginning under such circumstances will with at least constant probability end with a population consisting of \(1^n\) and one-character mutations thereof. Once the population is in this configuration, constructing an ALL individual (for this or any future phase) will require at least \(\varOmega (n)\) characters to be mutated correctly in a single iteration, which occurs with probability \((rn)^{-\varOmega (n)}\).

Therefore, with high probability, the EA will find the initial \(1^n\) optimum, but will at some point fail to track the oscillation, revert to optimizing OneMax, and require an exponential number of iterations to find the final \(0^n\) optimum. \(\square \)

Proof

(of Theorem 8) Per Lemma 9, the EA is able to find the \(1^n\) individual with high probability during phase 0. For the subsequent \(n\) oscillation phases, apply Lemma 10: if phase \(p\) begins with OPT\(_{p-1}\) (i. e., an individual from ALL\(_p\)) in the population, then the remaining individuals from ALL\(_p\), including OPT\(_p\), will be added to the population before phase \(p\) ends with high probability. Adding up the failure probabilities in these \(n+1\) phases using a union bound, with high probability none of the phases fail, and phase \(n\) ends with OPT\(_n = 0^n\) in the population – so the (\(\mu \)+1) EA is able to find the \(0^n\) optimum within a polynomial number of iterations with high probability when \(\mu > r\). \(\square \)

5 ACO on Larger Alphabets

Kötzing and Molter [9] study the case where \(r=1\), i. e., the case of bit strings, and show that \(\hbox {MMAS}^{*}\) with overwhelming probability (i. e., probability \(1-2^{-\varOmega (n)}\)) will track the optimum of Maze with oscillation phase length \(t_0=kn^3\log n\) in polynomial time. Their proof can be summarized as follows, where all references to lemmas and theorems are to that paper:

  1. While a bit is oscillating, taking the total over three steps of a so-called OPT-OPT-ALT oscillation (where OPT is favored in the first two steps and ALT in the last step), there is a drift of the pheromone value on the \(0\)-edge of the bit (i. e., the edge associated with OPT) towards its maximum. With overwhelming probability, the pheromone value will reach its upper border in \(O(n^3\log n)\) steps (Lemma 2 in their paper), provided the bit keeps oscillating for that long. To obtain the bound on the probability, a multiplicative drift theorem is used, which, according to [9], “wastes a lot” since the bound on the drift obtained is additive.

  2. Despite the fact that the pheromone value reaches its upper border within the oscillation phase of a bit, the multiplicative drift theorem does not imply that the pheromone value stays close to the border until the end of the oscillation phase. To prove that such a drop is unlikely, a negative drift theorem is applied (Lemma 3), implying that it takes a large polynomial amount of time until the pheromone value drops below \(1-O((\log ^2 n)/n)\).

  3. The transition from “bit \(i\) oscillating” to “bit \(i+1\) oscillating” is analyzed. It is shown that a string outside ALL will be the best-so-far string only temporarily, for at most \(O(\log n)\) steps after the transition, with high probability (Lemma 4). Drift arguments towards the \(0\)-value are applied afterwards.

  4. Using an \(O((n\log n)/\rho )\) bound from the literature on the time from initialization until the all-ones string is found with high probability (Lemma 5), and basically applying union bounds, leads to the result (Theorem 6).

It turns out that the above analysis to a very large extent carries over to larger alphabets. Basically, one can group together all edges belonging to the non-0 entries of a character and identify the sum of their pheromone values with the pheromone on the \(1\)-edge in the binary case. The only thing to pay attention to is the new lower bound on the pheromone values, which may increase the time to reach the upper border by a factor of \(r\).

However, we are going to present a stronger and, as we think, more elegant analysis here. In particular, we contribute a technique that supplements the drift result from Lemma 2 in [9] with a statement on probability distributions (also called occupation probabilities), which makes the application of the negative drift theorem (Lemma 3 in [9]) unnecessary. In addition, the stronger analysis allows us to work with shorter oscillation phase lengths, as detailed below.

We now present the tool by which the statement on occupation probabilities is obtained. The following lemma is a variable drift theorem that is not concerned with the expected first hitting time of a set of target states but with the probability of being in this set at any time \(t\) (however, results on the hitting time can be easily obtained from this probability). It is a spin-off of the variable drift theorem with tail bounds from [10] and goes back to a statement from Hajek’s paper [6] that, to the best of our knowledge, has not been applied in the running time analysis of evolutionary algorithms yet. The lemma requires a bound on the moment-generating function of a potential function \(g\) that is usually derived from the one-step drift (via the function \(h\)).

Lemma 13

Let \((X_t)_{t\ge 0}\) be a stochastic process, adapted to a filtration \((\mathcal {F}_t)_{t\ge 0}\), over some state space \(S\subseteq \{0\}\cup [x_{\min },x_{\max }]\), where \(x_{\min }\ge 0\). Let \(a,b\in \{0\}\cup [x_{\min },x_{\max }]\), \(b> a\). Let \(h:[x_{\min },x_{\max }]\rightarrow \mathbb {R}^+\) be such that \(1/h\) is integrable on \([x_{\min },x_{\max }]\) and define \(g:\{0\}\cup [x_{\min },x_{\max }]\rightarrow \mathbb {R}^{\ge 0}\) by \(g(x) := \frac{x_{\min }}{h(x_{\min })} + \int _{x_{\min }}^x \frac{1}{h(y)} \,\mathrm {d}y\) for \(x\ge x_{\min }\) and \(g(0):=0\).

If there exist \(\lambda >0\), \(\beta <1\) and \(D>0\) such that for all \(t\ge 0\)

$$\begin{aligned} \mathrm {E}\left( e^{-\lambda (g(X_t)-g(X_{t+1}))} \cdot 1\!\!1\left\{ X_t>a\right\} \mid \mathcal {F}_t\right)&\le \beta \quad \text {and} \\ \mathrm {E}\left( e^{-\lambda (g(a)-g(X_{t+1}))} \cdot 1\!\!1\left\{ X_t\le a\right\} \mid \mathcal {F}_t\right)&\le D, \end{aligned}$$

then

$$\begin{aligned} {{\mathrm{Pr}}}(X_t \ge b \mid X_0) < \beta ^{t} \cdot e^{\lambda (g(X_0)-g(b))} + \frac{1-\beta ^t}{1-\beta }De^{\lambda (g(a)-g(b))} \end{aligned}$$

for all \(t > 0\).

Proof

We use ideas implicit in the proof of Inequality 2.6 of [6], which uses the exponential method (a generalized Chernoff bound), and argue

$$\begin{aligned} {{\mathrm{Pr}}}(X_t \ge b \mid X_0) \le {{\mathrm{Pr}}}(g(X_{t}) \ge g(b)\mid X_0)&= {{\mathrm{Pr}}}(e^{\lambda g(X_{t})} \ge e^{\lambda g(b)}\mid X_0) \\&\le \mathrm {E}(e^{\lambda g(X_{t}) - \lambda g(b)}\mid X_0), \end{aligned}$$

where the first inequality uses that \(g(x)\) is non-decreasing, the equality that \(x\mapsto e^{\lambda x}\) is an increasing bijection, and the last inequality is Markov's inequality. Now,

$$\begin{aligned} \mathrm {E}\left( e^{\lambda g(X_{t})}\mid X_0\right)&= \mathrm {E}\left( \mathrm {E}\left( e^{\lambda g(X_{t})}\mid \mathcal {F}_{t-1}\right) \mid X_0\right) \\&= \mathrm {E}\left( e^{\lambda g(X_{t-1})}\,\mathrm {E}\left( e^{-\lambda (g(X_{t-1})-g(X_{t}))} \cdot 1\!\!1\left\{ X_{t-1}>a\right\} \mid \mathcal {F}_{t-1}\right) \right) \\&\quad + \mathrm {E}\left( e^{\lambda g(a)}\,\mathrm {E}\left( e^{-\lambda (g(a)-g(X_{t}))} \cdot 1\!\!1\left\{ X_{t-1}\le a\right\} \mid \mathcal {F}_{t-1}\right) \right) \\&\le \beta \cdot \mathrm {E}\left( e^{\lambda g(X_{t-1})}\right) + D e^{\lambda g(a)} \end{aligned}$$

by our prerequisites; we omitted the condition on \(X_0\) for space reasons in lines 2–4. Inductively (note that this does not assume independence of the \(g(X_{t-1})-g(X_{t})\)), we get

$$\begin{aligned} \mathrm {E}\left( e^{\lambda g(X_{t})}\mid X_0\right) \le \beta ^t e^{\lambda g(X_0)} + \sum _{r=0}^{t-1} \beta ^r D e^{\lambda g(a)}, \end{aligned}$$

altogether

$$\begin{aligned} {{\mathrm{Pr}}}(X_t \ge b \mid X_0) \le e^{\lambda (g(X_0)-g(b))} \beta ^t + \frac{1-\beta ^t}{1-\beta }D e^{\lambda (g(a)-g(b))} \end{aligned}$$

as claimed. \(\square \)
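For later reference, the conclusion of Lemma 13 transcribes directly into a numeric bound (all parameters as in the lemma statement):

```python
import math

def lemma13_bound(t, lam, beta, D, g_x0, g_a, g_b):
    """Upper bound on Pr(X_t >= b | X_0) from Lemma 13."""
    return (beta ** t * math.exp(lam * (g_x0 - g_b))
            + (1 - beta ** t) / (1 - beta) * D * math.exp(lam * (g_a - g_b)))
```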

The preceding lemma will be applied to show Lemma 15, whose purpose is to combine Lemmas 2 and 3 from [9] and which shows that the pheromone value after a certain point of time will come close to its border and stay close with high probability. In fact, due to the strength of the lemma, we can work with a smaller oscillation length per character if \(r\) is not too big. More precisely, we redefine \(t_0:=kr^2 n^2\ln (rn)\) in the Maze function, for some constant \(k\), which is less than the original \(kn^3\log n\) if \(r=o(\sqrt{n})\). By contrast, [9] needs \(\varOmega (n^3\log n)\) oscillations per bit due to the application of the multiplicative drift theorem along with negative drift.
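As a quick check of this comparison (taking both logarithms as natural and using \(\ln (rn) \le 2\ln n\) for \(r\le n\)):

$$\begin{aligned} \frac{kr^2n^2\ln (rn)}{kn^3\ln n} = \frac{r^2}{n} \cdot \frac{\ln (rn)}{\ln n} \le \frac{2r^2}{n}, \end{aligned}$$

which indeed tends to \(0\) whenever \(r = o(\sqrt{n})\).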

To prepare the proof of Lemma 15, we set up an appropriate potential function to define a non-negative stochastic process \((X_t)_{t\ge 0}\), which is partially based on the distance of the pheromone value to its border. However, as in [9], it also includes an indicator random variable based on the best-so-far entry. The following lemma summarizes some crucial properties of the process. Hereinafter, we call the pheromone values of a character \(i\), \(1\le i\le n\), saturated according to the best-so-far solution \(x^*\) if the edge corresponding to \(x^*_i\) has pheromone value \(\tau _{\text {max}} \) and all other edges for character \(i\) have value \(\tau _{\text {min}} \).

Lemma 14

Assume \(\rho \le \frac{1}{7rn}\) and \(\rho =\varOmega (\frac{1}{rn})\). Consider an OPT-OPT-ALT oscillation of a single character, with the best-so-far solution from ALL, and pheromone values of the non-oscillating characters saturated according to the best-so-far solution. Let \(\tau _t\) be the pheromone value on the edge corresponding to the \(0\)-entry for the oscillating character after \(t\) OPT-OPT-ALT oscillations, \(C_t\) be an indicator that the best-so-far solution at the start of the oscillation is OPT, and \(X_t := 1 - 1/n - \tau _t + \tfrac{7}{2}\rho (1 - C_t)\) be a potential function. Then, for \(X_t \ge 1/n\), the following observations hold:

  1. \({{\mathrm{Pr}}}(X_{t+1} > X_t \mid C_t = 1) = O(X_t)\) and \({{\mathrm{Pr}}}(X_{t+1} < X_{t} \mid C_t = 0) = O(1-X_t)\),

  2. \((X_t - X_{t+1} \mid C_t = 1) = O(X_t \rho )\) and \((X_{t+1} - X_{t} \mid C_t = 0) = O((1-X_t) \rho )\),

  3. \(\mathrm {E}(X_t-X_{t+1}\mid C_t = 1) = \varOmega (X_t\rho )\) and \(\mathrm {E}(X_t-X_{t+1}\mid C_t = 0) = \varOmega ((1-X_t)\rho )\).

Proof

Consider the statements conditioned on \(C_t = 1\) first: i. e., those that consider oscillations that begin with OPT as the best-so-far solution. In such oscillations, the first two iterations will always reinforce \(\tau _t\), as the best-so-far solution OPT has the highest possible fitness value; while the last iteration may replace OPT with a solution from ALT if one is constructed.

We note that the potential function \(X_t\) decreases unless an ALT solution is constructed in the third iteration, and upper-bound this probability using \(\tau _t\), noting that iterations reinforcing \(\tau _t\) do not increase the probability that ALT solutions are constructed:

$$\begin{aligned} {{\mathrm{Pr}}}(X_{t+1} > X_t \mid C_t = 1) < 1 - \tau _t = X_t + \tfrac{1}{n} \le 2 X_t , \end{aligned}$$

as \(X_t \ge 1/n\).

If an ALT solution is not constructed, \(\tau _t\) is reinforced thrice during the oscillation, decreasing \(X_t\) by at most:

$$\begin{aligned} (X_t - X_{t+1} \mid C_t = 1)&\le \tau _t(1-\rho )^3 + \rho \left( 1 + (1-\rho ) + (1-\rho )^2\right) - \tau _t \\&= (1-\tau _t) \rho (3 - 3\rho + \rho ^2) < 3\rho \left( X_t + \tfrac{1}{n}\right) \le 6 X_t\rho , \end{aligned}$$

as \(X_t \ge 1/n\).

Finally, for the expected decrease in \(X_t\), bound the probability of constructing ALT in the third iteration using \(p_a \le (1-\tau _t/\tau _\text {sum}){\tau _{\text {max}}}^{n-1} \le (1-\tau _t/\tau _\text {sum})/2\), where \(\tau _\text {sum}\) denotes the sum of pheromone values on the edges belonging to the character. The upper bound equals the probability of selecting an oscillating edge corresponding to an ALT solution, and \(n-1\) other edges with pheromone values \(\tau _{\text {max}}\) corresponding to the remaining characters in the ALT solution, using that \({\tau _{\text {max}}}^{n-1} \le 1/2\) for \(n > 1\). Hence,

$$\begin{aligned} \mathrm {E}(\tau _{t+1} \mid C_t = 1)&= \tau _t(1-\rho )^3 + \rho (1-\rho )^2 + \rho (1-\rho ) + (1-p_{a})\rho , \\ \mathrm {E}(X_{t} - X_{t+1} \mid C_t = 1)&= \mathrm {E}(\tau _{t+1} \mid C_t = 1) - \tau _t - p_{a} \cdot \left( \tfrac{7}{2} \rho \right) \\&= \rho \left( (1 - \tau _t)\left( 3 - 3 \rho + \rho ^2\right) - \tfrac{9}{2}p_{a} \right) \\&\ge \rho \left( (1 - \tau _t)\left( \tfrac{3}{4} - 3 \rho + \rho ^2\right) - \tfrac{9}{4}\tau _t/n \right) = \varOmega (\rho X_t) , \end{aligned}$$

recalling that \(\tau _\text {sum} \le 1 + 1/n\), yielding \((1-\tau _t/\tau _\text {sum}) \le 1 - \tau _t + \tau _t/n\).

In the statements conditioned on \(C_t = 0\), the best-so-far solution is in ALT. The first two iterations will evaporate \(\tau _t\) unless an OPT solution is constructed; while the final iteration will evaporate \(\tau _t\) if an ALT solution is constructed, or if OPT was not constructed during the first two.

We note that \(X_t\) increases unless an OPT solution is constructed during at least one of the first two iterations, and upper-bound this probability using \(\tau _t\), \({\tau _{\text {max}}}^{n-1} \le 1/2\) for \(n > 1\), and a union bound:

$$\begin{aligned} {{\mathrm{Pr}}}(X_t > X_{t+1} \mid C_t = 0) < 2 (\tau _t/2) = \tau _t = 1 - X_t - \tfrac{1}{n} + \tfrac{7}{2}\rho < 1 - X_t. \end{aligned}$$

If an OPT solution is not constructed, \(\tau _t\) is evaporated thrice during the oscillation, increasing \(X_t\) by at most:

$$\begin{aligned} (X_{t+1} - X_t \mid C_t = 0)&= \tau _t - \tau _t(1-\rho )^3 = \tau _t \rho (3 - 3\rho + \rho ^2) \\&< 3\left( 1 - X_t - \tfrac{1}{n} + \tfrac{7}{2}\rho \right) \rho < 3 (1 - X_t) \rho . \end{aligned}$$

Finally, to compute the expected decrease in \(X_t\), we need to derive bounds for three probabilities: \(p_o\), the probability that OPT is constructed at least once in the first two iterations; \(p_f\), the probability that given OPT is constructed in the first two iterations, it was constructed in the first iteration; and \(p_a\), the probability that given that OPT was constructed in the first two iterations, an ALT solution is constructed in the third iteration.

Using \({\tau _{\text {max}}}^{n-1} \ge 1/e\), we upper-bound the probability that OPT is not constructed by \(1-p_o \le (1-\tau _t/(e \tau _\text {sum}))(1-\tau _t(1-\rho )/(e \tau _\text {sum}))\), which provides a lower bound on \(p_o\).

In most cases, OPT is more likely to be constructed in the first iteration compared to the second, as its corresponding pheromone value would decrease due to pheromone evaporation. However, if \(\tau _t(1-\rho )\) drops below \(\tau _{\text {min}} \), and \(\tau _\text {sum} > 1\) is reduced, the second iteration may be more likely to construct OPT than the first. This effect is greatest when \(\tau _t = \tau _{\text {min}} \) and \(\tau _\text {sum}\) is as large as possible, i. e., \(1 + 1/n\): the probability that OPT is constructed during the first iteration can be bounded as \(p_1 \ge \tau _{\text {min}}/(1+1/n)\), while the probability that OPT is constructed in the second iteration given that it was not constructed in the first can be bounded as \(p_2 \le \tau _{\text {min}}/(1 + 1/n - \rho / n)\); i. e., the first iteration is at least \(\frac{p_1}{p_2} = \frac{n+1-\rho }{n+1} \ge \frac{2-1/7}{2} = \frac{13}{14}\) times as likely to construct OPT as the second. The probability that OPT is constructed in the first iteration given that it is constructed at all during the first two iterations is then \(p_f \ge \frac{p_1}{p_1 + (1-p_1)p_2}\), and thus \(p_f > 1/3\) can be used as a very coarse lower bound by substituting \(p_2 \ge p_1 \ge \frac{13}{14} p_2\) and computing the limit as \(p_2\) approaches 0 from above.
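One way to make this bound explicit: \(\frac{p_1}{p_1 + (1-p_1)p_2}\) is non-decreasing in \(p_1\), so

$$\begin{aligned} p_f \ge \frac{p_1}{p_1 + (1-p_1)p_2} \ge \frac{p_1}{p_1 + p_2} \ge \frac{\tfrac{13}{14}p_2}{\tfrac{13}{14}p_2 + p_2} = \frac{13}{27} > \frac{1}{3}, \end{aligned}$$

where the middle inequality drops the factor \(1-p_1 \le 1\).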

Pessimistically assuming that OPT is not constructed in the first iteration, so the pheromone value on the OPT edge evaporates during the first iteration, and lower-bounding the effect of the pheromone evaporation and reinforcement in the second iteration by \(0\), the pheromone value on the OPT edge is at least \(\tau _t(1-\rho )\) during the third iteration, and hence the probability of constructing an ALT solution can be upper-bounded as \(p_a \le (1-\tau _t(1-\rho )/\tau _\text {sum}){\tau _{\text {max}}}^{n-1} \le (1-\tau _t(1-\rho )/\tau _\text {sum})(38/100)\) for \(n > 15\).

Combining the three probability bounds, we consider the expected pheromone value \(\tau _{t+1}\) on the edge corresponding to OPT, and then the distance \(X_{t+1}\), which additionally decreases by \(\tfrac{7}{2}\rho \) if OPT is constructed in either of the first two iterations, and ALT is not constructed in the third iteration:

$$\begin{aligned} \mathrm {E}(\tau _{t+1} \mid C_t = 0)&\ge \tau _t(1-\rho )^3 + p_o \rho (p_f(1-\rho )^2 + (1-\rho ) + (1-p_a)) , \\ \mathrm {E}(X_{t} - X_{t+1} \mid C_t = 0)&\ge \mathrm {E}(\tau _{t+1} \mid C_t = 0) - \tau _t + p_o (1-p_a) \tfrac{7}{2} \rho \\&\ge \rho \left( p_o \left( \tfrac{35}{6} - \tfrac{9}{2} p_a\right) - 3 \tau _t \right) + \rho ^2 \left( 3 \tau _t - \tfrac{5}{3} p_o \right) \\&\quad - \rho ^3 \left( \tau _t + \tfrac{1}{3}p_o \right) \\&= \rho \tau _t\left( \left( \tfrac{35}{6} - \tfrac{9}{2}p_a\right) b - 3 + \rho \left( 3 - \tfrac{5}{3}b \right) - \rho ^2 \left( 1 - \tfrac{1}{3}b \right) \right) \\&= \rho \tau _t\left( \left( \tfrac{1237}{300} + \tfrac{171}{100}\tfrac{\tau _t}{\tau _\text {sum}} \right) b - 3 + \rho \left( 3 - \tfrac{5}{3}b - \tfrac{171}{100}\tfrac{\tau _t b}{\tau _\text {sum}}\right) \right. \\&\quad \left. - \rho ^2 \left( 1 - \tfrac{1}{3}b \right) \right) , \end{aligned}$$

where \(b = \frac{2 - \rho }{e \tau _\text {sum}} - \tau _t \frac{1 - \rho }{e^2 {\tau _\text {sum}}^2} \le p_o / \tau _t\). By bounding \(\frac{2}{e} > b > \frac{2}{e} - \frac{\tau _t}{e^2} - o(1)\),

$$\begin{aligned} \mathrm {E}(X_{t} - X_{t+1} \mid C_t = 0)&\ge \rho \tau _t\left( \left( \tfrac{1237}{300} + 1.71\tau _t/\tau _\text {sum} \right) b - 3 - O(\rho )\right) \\&> \rho \tau _t\left( \left( \tfrac{1237}{300} + 1.71\tau _t/\tau _\text {sum} \right) \left( 2/e - \tau _t/e^2 - o(1)\right) \right. \\&\quad \left. - 3 - o(1) \right) \\&= \rho \tau _t\left( \tfrac{1237}{150e} - 3 + \tau _t\left( \tfrac{3.42}{e\tau _\text {sum}} - \tfrac{1237}{300e^2} - \tfrac{1.71\tau _t}{e^2\tau _\text {sum}} \right) - o(1) \right) \\&> \rho \tau _t\left( 0.033 - o(1)\right) = \varOmega (\rho (1-X_t)) , \end{aligned}$$

by observing that for a sufficiently large \(n\) (and hence a \(\tau _\text {sum}\) sufficiently close to \(1\)), the expression multiplied with the inner \(\tau _t\) is positive. \(\square \)

Lemma 15

Let the assumptions from Lemma 14 hold; recall that \(\tau _t\) denotes the pheromone value on the edge corresponding to the \(0\)-entry for the oscillating character after \(t\) oscillations. For \(t\ge C(r n^2+\alpha n)\), where \(C>0\) is some sufficiently large constant, and all \(\alpha \ge 0\), it holds that

$$\begin{aligned} {{\mathrm{Pr}}}\!\left( \tau _t \le 1-O\!\left( \frac{1}{n}+\frac{\log ^2 n}{rn}\right) \right) \le e^{-\alpha }+e^{-\varOmega (\log ^2 n)}. \end{aligned}$$

Proof

We will apply Lemma 13, using the bounds from Lemma 14. Note that \(0\le X_t\le 1-\tfrac{1}{n}-\tfrac{1}{rn} + \tfrac{7}{2}\rho \le 1-\tfrac{1}{2n}\) since \(\tfrac{1}{rn}\le \tau _t\le 1-\tfrac{1}{n}\) and \(\rho \le \tfrac{1}{7rn}\). We will consider the cases \(C_t=0\) and \(C_t=1\) separately. First, let \(C_t=1\). If \(X_t\ge 1/n\), we get from the third item of Lemma 14 and the choice \(\rho =\varTheta (1/(rn))\) a drift of the \(X_t\)-process according to \(\mathrm {E}(X_t-X_{t+1}\mid X_t;X_t\ge 1/n) = \varOmega (X_t \rho ) \ge \frac{c_1 X_t}{r n}\) for some sufficiently small constant \(c_1>0\).

We therefore set \(x_{\min }:=0\), \(a:=\tfrac{1}{n}\), \(b:=a+ \tfrac{\log ^2 n}{rn}\), \(x_{\max }=1-1/(2n)\) and \(h(x):=\tfrac{c_1}{rn^2}\) in Lemma 13. The choice of \(h(x)\) results in \(\mathrm {E}(X_t-X_{t+1} \mid X_t; X_t> a) \ge h(X_t)\). Moreover, by definition, \(g(x):= \frac{xr n^2}{c_1}\) since \(x_{\min }=0\). Hence, we have the following simple bound on the drift of the \(g(X_t)\)-process:

$$\begin{aligned} \mathrm {E}(g(X_t)-g(X_{t+1})\mid X_t; X_t> a)&\ge \frac{\mathrm {E}(X_t-X_{t+1}\mid X_t; X_t> a)}{c_1/(rn^2)} \\&\ge \frac{c_1 X_t/(rn)}{c_1/(rn^2)} = X_t n. \end{aligned}$$

Note that \(X_t-X_{t+1}\le \tfrac{13}{2}\rho \) since three iterations can change \(\tau _t\) by at most \(3\rho \) and \(C_t\le 1\) holds. This implies \(g(X_t)-g(X_{t+1}) \le c_2 rn^2\rho \) for some constant \(c_2>0\). Hence, as \(\rho =\varTheta (1/(rn))\), we get \(g(X_t)-g(X_{t+1})\le c_3 n\) for another constant \(c_3\ge 1\). An expansion of the exponential function will show for \(\lambda :=\frac{c_4}{n}\), where \(c_4>0\) is a sufficiently small constant, that

$$\begin{aligned}&\mathrm {E}(e^{-\lambda (g(X_t)-g(X_{t+1}))} \cdot 1\!\!1\left\{ X_t>a\right\} \mid X_t)\\&\quad \le \mathrm {E}(e^{-\lambda (g(X_t)-g(X_{t+1}))} \mid X_t; X_t>a) \le 1 - \frac{\lambda }{2} \le e^{-\lambda / 2}, \end{aligned}$$

which then can be used to prove the lemma.

We supply the details for the expansion now. By setting \(c_4\le \frac{1}{c_3}\), we get \(\lambda (g(X_t)-g(X_{t+1})) \le 1\). Using \(e^{-x}\le 1-x+x^2\) for \(x\le 1\), we get for the moment-generating function (mostly omitting the conditions \(X_t; X_t>a\) in expectations for readability)

$$\begin{aligned}&\mathrm {E}\left( e^{-\lambda (g(X_t)-g(X_{t+1}))}\mid X_t; X_t>a\right) \\&\quad \le 1-\lambda \mathrm {E}\left( g(X_t)-g(X_{t+1})\right) + \lambda ^2 \mathrm {E}\left( (g(X_t)-g(X_{t+1}))^2\right) \\&\quad \le 1-\lambda \mathrm {E}\left( g(X_t)-g(X_{t+1})\right) + \lambda ^2 \mathrm {E}\left( |g(X_t)-g(X_{t+1})|\right) (c_3 n) \end{aligned}$$

We already know that \(\mathrm {E}(g(X_t)-g(X_{t+1})) \ge X_t n\). We are left with an estimate for \(\mathrm {E}(|\varDelta |)\), where \(\varDelta :=g(X_t)-g(X_{t+1})\). By the law of total probability (and again using \(|\varDelta |\le c_3 n\), where \(c_3\ge 1\)),

$$\begin{aligned} \mathrm {E}(|\varDelta |) \le \mathrm {E}(\varDelta \mid \varDelta >0) + c_3 n {{\mathrm{Pr}}}(\varDelta <0) \le c_5 X_t n + c_3 n c_6 X_t \le (c_5+c_6) c_3 n X_t, \end{aligned}$$

where we used the first and second item from Lemma 14 and introduced appropriate constants \(c_5,c_6>0\) to cover the implicit constants from \(O\)-notation and the factor \(1/c_1\) from \(g(x)\).

Hence,

$$\begin{aligned} \mathrm {E}(e^{-\lambda (g(X_t)-g(X_{t+1}))})&\le 1- \lambda X_t n + \lambda ^2 (c_5+c_6) X_t (c_3 n)^2\\&\le 1- \lambda X_t n + \lambda \frac{c_4}{n} (c_5+c_6) (c_3^2 n) X_t n. \end{aligned}$$

Choosing \(c_4\le 1/(2c_3^2(c_5+c_6))\), we get from the last bound that

$$\begin{aligned} \mathrm {E}(e^{-\lambda (g(X_t)-g(X_{t+1}))}) \le 1- \lambda X_t n + \frac{\lambda }{2} X_t n = 1- \frac{\lambda }{2} X_t n \le 1-\frac{\lambda }{2}, \end{aligned}$$

which completes the analysis for \(C_t=1\) if \(X_t > a = 1/n\).

If \(C_t=0\), we can redo the above calculations analogously with \(\mathrm {E}(g(X_t)-g(X_{t+1})) \ge (1-X_t) n\) and replace \(X_t\) with \(1-X_t\). We note that \(1-X_t\ge 1/(2n)\), hence still \(\mathrm {E}(X_t-X_{t+1}\mid X_t;X_t\ge 1/n) = \varOmega (1/(r n^2))\). In the estimation of \(\mathrm {E}(|\varDelta |)\), the events \(\varDelta <0\) and \(\varDelta >0\) are swapped. The constants may take different values, but remain constants. Choosing \(c_4\) small enough to cover both cases, we get

$$\begin{aligned} \mathrm {E}(e^{-\lambda (g(X_t)-g(X_{t+1}))}\mid X_t; X_t> a) \le 1-\frac{\lambda }{2}\le e^{-\lambda /2} \end{aligned}$$

regardless of whether \(C_t=0\) or \(C_t=1\).

We are left with the case \(X_t\le a\) in Lemma 13. Pessimistically assuming the maximum change \((13/2)\rho \) of the \(X_t\)-values, we can bound

$$\begin{aligned} \mathrm {E}(e^{-\lambda (g(a)-g(X_{t+1}))} \cdot 1\!\!1\left\{ X_t\le a\right\} \mid X_t) \le e^{c_4c_2 \rho rn} \le D \end{aligned}$$

for some constant \(D>0\). Applying Lemma 13 with \(\beta :=1-\lambda /2 \le e^{-\lambda /2}\),

$$\begin{aligned} {{\mathrm{Pr}}}(X_t \ge b) \le e^{-\frac{tc_4}{2n} + \frac{c_4}{n} \frac{rn^2}{c_1 }} + \frac{D}{\lambda /2} \cdot e^{\lambda (g(a)-g(b))} \end{aligned}$$

since \(X_0\le 1\). Now, if \(t=\frac{2rn^2}{c_1} + \frac{2\alpha n}{c_4}\) then \({{\mathrm{Pr}}}(X_t \ge b) \le e^{-\alpha } + \frac{2D}{\lambda }e^{\lambda (g(a)-g(b))}\). The second term is \(O(n) \cdot e^{-\varOmega (\log ^2(n))} = e^{-\varOmega (\log ^2(n))}\). Setting \(C:=\max \{\tfrac{2}{c_1},\tfrac{2}{c_4}\}\) and noting that \(X_t\ge b\) corresponds to \(\tau _t \le 1-O(\frac{1}{n}+\frac{\log ^2 n}{rn})\), the lemma follows. \(\square \)

We remark here that the statement of Lemma 15 can most likely be strengthened to hold already for \(t\ge C(r n\ln n+\alpha n)\) by using a different \(h(x)\). However, since the bottleneck will be in the analysis of the time needed for phase \(0\), we are content with the present statement.

The following lemma takes the role of Lemma 4 in [9], which analyzes the transition from character \(i\) oscillating to character \(i+1\) oscillating. It applies to the case that the best-so-far solution at the end of phase \(i\) is not OPT\(_i\) but ALT\(_i\), despite the pheromone values favoring OPT\(_i\). Then, when the new phase starts and the best-so-far solution is reevaluated, the fitness function equals OneMax. However, \(\hbox {MMAS}^{*}\) is very likely to recover quickly from this: more precisely, it will sample the solution OPT\(_i\), which is in ALL\(_{i+1}\), again before the pheromones have changed significantly. The lemma can be proven in the very same way as in [9]. In fact, assuming saturated pheromones, the probability of setting a character that is \(0\) in the best-so-far solution to \(1\) is \(1/(rn)\), which is even less than the bound \(1/n\) used in the proof from [9].

Lemma 16

Let \(\rho =\varTheta (\tfrac{1}{rn})\). Assume for \(i\in \{1,\dots ,n\}\) that the current-best solution is \(0^{i-1} 1^{n-i+1}\) and that the pheromones of the first \(i-1\) edges belonging to \(0\)-entries as well as the last \(n-i\) edges belonging to \(1\)-entries all are \(\tau _{\text {max}} =1-1/n\). Finally, assume that the pheromone belonging to the \(i\)-th \(0\)-entry is \(1-O((\log ^2 n)/n)\). Then for all \(c>0\), \(\hbox {MMAS}^{*}\) will sample \(0^i1^{n-i}\) within \(O(\log n)\) iterations with probability \(1-O(n^{-c})\).
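Although the full proof follows [9], the core of the argument is a short calculation, which we sketch under the simplifying assumption that a character is sampled at a given value with probability equal to the corresponding pheromone value (exact for the saturated characters, whose pheromones sum to \(\tau _{\text {max}} +r\tau _{\text {min}} =1\)). Each iteration then samples \(0^i1^{n-i}\) with probability at least

$$\begin{aligned} \left( 1-\frac{1}{n}\right) ^{n-1}\cdot \left( 1-O\left( \frac{\log ^2 n}{n}\right) \right) \ge \frac{1-o(1)}{e} = \varOmega (1), \end{aligned}$$

so the probability that \(k\log n\) iterations all fail to sample it is at most \((1-\varOmega (1))^{k\log n} = n^{-\varOmega (k)}\), which is \(O(n^{-c})\) once \(k\) is chosen large enough.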

Finally, Theorem 5 in [9] states a tail bound on the optimization time of the classical \(\hbox {MMAS}^{*}\) on OneMax, which is used in phase \(0\) of Maze, where the all-ones string is the first target. This theorem mostly carries over to our enlarged search space, see Theorem 18 below, except that the modified lower pheromone border introduces a factor of \(r\) at two places. The following lemma is used in the analysis.

Lemma 17

Assume there is a character whose value remains fixed in all best-so-far solutions of \(\hbox {MMAS}^{*}\). Then the pheromone values of the character will be saturated according to the best-so-far solution after at most \(\ln (rn)/\rho \) steps.

Proof

The edges belonging to the \(r\) entries different from the best-so-far value each have a pheromone value of at most

$$\begin{aligned} \tau _{\text {max}} \cdot \left( 1-\rho \right) ^{t} \end{aligned}$$

or are already capped by their lower border after \(t\) steps, all of which by assumption reinforce the edge belonging to the best-so-far entry. Setting \(t:=\ln (rn)/\rho \), the expression becomes at most \(\tau _{\text {max}} (1-\rho )^{\ln (rn)/\rho } \le e^{-\ln (rn)} = 1/(rn) = \tau _{\text {min}} \). Since by Lemma 3 always \(\tau _\text {sum} \ge 1\), the value for the best-so-far entry must be at least \(1-r\tau _{\text {min}} = \tau _{\text {max}} \). \(\square \)
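The decay process from this proof is easy to visualize with a minimal simulation; the following Python sketch (with illustrative parameter values, not values prescribed by the analysis) compares the observed saturation time of a non-reinforced edge with the bound \(\ln (rn)/\rho \).

import math

# Minimal sketch of Lemma 17: a non-reinforced edge's pheromone decays as
# tau_max*(1-rho)^t until capped at the lower border tau_min = 1/(r*n).
# n, r and rho below are illustrative choices, not prescribed values.
def freezing_time(n, r, rho):
    tau = 1 - 1 / n                # tau_max
    tau_min = 1 / (r * n)
    t = 0
    while tau > tau_min:
        tau *= 1 - rho             # evaporation without reinforcement
        t += 1
    return t

n, r = 100, 5
rho = 1 / (7 * r * n)              # a choice satisfying rho = Theta(1/(rn))
print(freezing_time(n, r, rho))          # observed saturation time
print(math.ceil(math.log(r * n) / rho))  # Lemma 17 bound: ln(rn)/rho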

Theorem 18

For all \(c>0\), with probability \(1-O(n^{-c})\), \(\hbox {MMAS}^{*}\) for the search space \(\{0,\dots ,r\}^n\) optimizes OneMax in \(O(nr\log (rn)/\rho )\) iterations and then saturates the all-ones string in pheromone.

Proof

We will use a fitness-level argument combined with an analysis of the “freezing time”, as commonly used in the analysis of ACO algorithms [12, p. 125]. The number of characters being \(1\) in the best-so-far solution is non-decreasing over time. By Lemma 17, pessimistically assuming no update of the best-so-far solution, the pheromone values of every \(1\)-entry must be saturated after at most \(\ln (rn)/\rho \) steps. This applies to all characters simultaneously, including the entries different from \(1\). Hence, the pheromone values are saturated according to the best-so-far solution after a so-called freezing time of \(O(\ln (rn)/\rho )\) steps (unless an improvement is found before).

Given a current \(\textsc {OneMax} \)-value of \(i\le n-1\), the probability of finding an improvement in such a situation is at least

$$\begin{aligned} \left( {\begin{array}{c}n-i\\ 1\end{array}}\right) {\tau _{\text {max}}}^{i} \tau _{\text {min}} \ge \frac{n-i}{ern} =: p_i \end{aligned}$$

Hence, in the notation of Theorem 2 in [20], we have \(n+1\) fitness levels \(A_0,\dots ,A_{n}\) corresponding to the \(\textsc {OneMax} \)-values, with the probabilities of improvement (assuming saturated pheromones) given by \(p_i\) for \(0\le i\le n-1\). Now, \(s=\sum _{i=0}^{n-1} 1/p^2_i = O(r^2 n^2)\), \(h=\varOmega (1/(rn))\) and \(\sum _{i=0}^{n-1} 1/p_i = O(rn \log n)\). Hence, by setting \(\delta =Cc rn\ln n\) for some constant \(C>0\) in the theorem, the time to reach the last level (without the freezing time) is \(O(rn\log n)\) with probability \(1-n^{-c}\).

On at most \(n\) levels, the pheromone values need to be saturated according to the best-so-far solution in order for the fitness-level argument to apply; one more saturation may be required for the final all-ones string. This accounts for a deterministic term of \(O(n\ln (rn)/\rho ) \) that has to be added to the time bound given by the fitness-level argument. Taking the two bounds together, the theorem follows. \(\square \)
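For illustration, the two ingredients of this proof — the level-wise improvement probabilities and the freezing periods — can be checked numerically. In the following Python sketch, the parameters \(n\), \(r\) and \(\rho \) are arbitrary example values and the implicit constants are set to \(1\):

import math

# Illustrative check of the ingredients of Theorem 18's bound.
n, r = 100, 5
rho = 1 / (7 * r * n)

# improvement probability with saturated pheromones: at least (n-i)/(e*r*n)
for i in range(n):
    exact = (n - i) * (1 - 1 / n) ** i * (1 / (r * n))
    assert exact >= (n - i) / (math.e * r * n)

# expected waiting time over all fitness levels: sum 1/p_i = O(rn log n)
waiting = sum(math.e * r * n / (n - i) for i in range(n))
# at most n+1 freezing periods of ln(rn)/rho steps each (Lemma 17)
freezing = (n + 1) * math.log(r * n) / rho
print(waiting, freezing)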

We note that the proof of Theorem 18 is the only place in our analysis where the strict \(>\)-selection for the update of the best-so-far solution by \(\hbox {MMAS}^{*}\) is used. Otherwise, the arguments would hold for the algorithm using non-strict \(\ge \)-selection, which is often simply called MMAS in the literature.

Putting everything together, we obtain the following theorem, which takes the role of Theorem 6 in [9]. Recall that \(t_0\), the length of the so-called oscillation phase, is the number of iterations during which a character oscillates as OPT-OPT-ALT; however, the very first phase of length \(t_0\), called phase \(0\), has OneMax as its objective function.

Theorem 19

Given any \(r>0\), choose \(\rho \le \frac{1}{7rn}\) and \(\rho =\varOmega (\frac{1}{rn})\). We say that \(\hbox {MMAS}^{*}\) tracks the optimum of the Maze for the \((r+1)\)-character alphabet if the best-so-far solution at the end of every oscillation phase has Hamming distance at most \(1\) to the optimum. Then for all \(c>0\) there is a constant \(k\) such that choosing \(t_0 \ge kr^2 n^2\ln (rn)\) makes \(\hbox {MMAS}^{*}\) track the optimum with probability \(1-O(n^{-c})\).

Proof

We follow the argumentation in [9]. Let \(k'\) be the largest implicit constant in the bounds of Lemmas 15, 16 and Theorem 18 for obtaining a failure probability of \(O(n^{-c-1})\). Let \(k=3k'\). In the following, we assume that the events proved to hold with probability \(1-O(n^{-c-1})\) actually happen and call it a failure otherwise.

Since Maze equals OneMax within phase \(0\), we know from Theorem 18 that \(\hbox {MMAS}^{*}\) finds the all-ones string and saturates all pheromones in the phase with probability \(1-O(n^{-c-1})\). The conditions of Lemma 15 hold for the start of phase 1, where the first character is oscillating. Setting \(\alpha =n\), we get that the pheromone value on the corresponding \(0\)-edge will be at least \(1-O(\log ^2 n/(rn))\) by the end of the phase with probability \(1-o(n^{-c-1})\). The best-so-far solution by the end of phase \(1\) is guaranteed to be in ALL\(_1\); however, it might be ALT\(_1\), which does not belong to ALL\(_2\).

We now analyze the transition to phase \(2\). According to Lemma 16, a solution from ALL\(_2\) will be created in the first third of the phase with probability \(1-O(n^{-c-1})\). Within at most \(O(rn\ln (rn))\) steps of the second third of the phase, \(\hbox {MMAS}^{*}\) by Lemma 17 saturates all pheromones except for the oscillating character corresponding to the solutions from ALL\(_2\). Now the conditions of Lemma 15 are satisfied. By the end of the final third of the phase, the pheromone value on the \(0\)-edge for character \(2\) will be \(1-O(\log ^2 n/(rn))\) with probability \(1-o(n^{-c-1})\). The subsequent phases are analyzed in the very same way as the second one.

By a union bound, the failure probability over all \(n+1\) phases is \(O(n^{-c})\). \(\square \)
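As a rough numerical illustration of the phase budget (with arbitrary example parameters and all implicit constants set to \(1\)), each third of an oscillation phase of length \(t_0\ge kr^2n^2\ln (rn)\) indeed dominates the times required by Lemmas 15–17:

import math

# Rough budget check for Theorem 19: each third of a phase of length
# t0 = k*r^2*n^2*ln(rn) must dominate the times required by Lemma 16
# (O(log n)), Lemma 17 (O(rn*ln(rn))) and Lemma 15 (O(rn^2 + alpha*n)
# with alpha = n). Parameters and constants are illustrative only.
n, r, k = 100, 5, 9.0
third = (k / 3) * r**2 * n**2 * math.log(r * n)
print(third >= math.log(n))              # budget for Lemma 16
print(third >= r * n * math.log(r * n))  # budget for Lemma 17
print(third >= r * n**2 + n * n)         # budget for Lemma 15, alpha = n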

6 Conclusions

We have revisited the analysis of evolutionary algorithms and ACO on the dynamic fitness function Maze [9]. First, we have shown that a (2+1) EA with a simple population diversity mechanism is able to track the optimum, which the (1+1) EA cannot. Subsequently, we have generalized this to a hierarchy result on strings over finite alphabets, where for given \(\mu \), there exists a Maze variant for which a population size of at least \(\mu \) in a (\(\mu \)+1) EA with genotype diversity is sufficient to track the optimum, whereas population size \(\mu -1\) causes the EA to lose track of the optimum. Surprisingly, it turns out that a generalization of \(\hbox {MMAS}^{*}\) to the larger state space does not require a population and is efficient on all functions from the hierarchy. Along the way, we have introduced a variable drift theorem dealing with occupation probabilities, which allows for a more precise and simpler analysis of the pheromone values in \(\hbox {MMAS}^{*}\) compared to [9]. As a subject for future research, it would be interesting to study the benefits and the limitations of ACO on other dynamic problems.