1 Introduction

In real-world optimization tasks, exact evaluation of the objective (i.e., fitness) function of candidate solutions is often impossible; instead, only a noisy evaluation can be obtained, due to a wide range of uncertainties [21]. For example, in machine learning, a prediction model is evaluated only on a limited amount of data, which makes the estimated performance deviate from the true performance; in product design, the design variables can be subject to perturbations caused by manufacturing tolerances, which introduces noise.

In the presence of noise, solving an optimization problem may become more difficult. Evolutionary algorithms (EAs) [5], inspired by natural phenomena, are a type of randomized metaheuristic optimization algorithm. They are likely to be able to handle noise, since the natural processes they mimic have themselves operated in noisy environments. In fact, EAs have been successfully applied to solve many noisy optimization problems [7, 22].

Compared with their applications, the theoretical analysis of EAs lags far behind. In the last two decades, however, much effort has been devoted to running time analysis, an essential theoretical aspect of EAs. Numerous analytical results for EAs solving synthetic as well as combinatorial problems have been derived, e.g., [4, 23]. Meanwhile, several general approaches for running time analysis have been proposed, e.g., drift analysis [10, 12, 20], fitness-level methods [8, 32], and switch analysis [34].

However, previous running time analyses of EAs mainly focused on noise-free optimization, where the fitness evaluation is exact. Only a few pieces of work on noisy evolutionary optimization have been reported, which mainly considered two kinds of noise models, prior and posterior. The prior noise comes from the variation on a solution, e.g., one-bit noise [13] flips a random bit of a binary solution before evaluation with probability p. The posterior noise comes from the variation on the fitness of a solution, e.g., additive noise [19] adds a value randomly drawn from some distribution. Droste [13] first analyzed the (\(1+1\))-EA on the OneMax problem in the presence of one-bit noise and showed that the tight range of the noise probability p allowing a polynomial running time is \(O(\log n/n)\), where n is the problem size. Gießen and Kötzing [19] recently studied the LeadingOnes problem, and proved that the expected running time is polynomial if \(p \le 1/(6en^2)\) and exponential if \(p=1/2\). They also considered additive noise with variance \(\sigma ^2\), and proved that the (\(1+1\))-EA can solve OneMax and LeadingOnes in polynomial time when \(\sigma ^2=O(\log n/n)\) and \(\sigma ^2\le 1/(12en^2)\), respectively.

Although the (\(1+1\))-EA is inefficient under high noise levels, some implicit mechanisms of EAs have been proved to be robust to noise. In [19], it was shown that the (\(\mu \)+1)-EA with a small population of size \(\varTheta (\log n)\) can solve OneMax in polynomial time even if the probability of one-bit noise reaches 1. The robustness of populations to noise was also proved in the setting of non-elitist EAs [9, 26]. However, Friedrich et al. [17] showed the limitation of populations by proving that the (\(\mu \)+1)-EA needs super-polynomial time for solving OneMax under additive Gaussian noise \(\mathcal {N}(0,\sigma ^2)\) with \(\sigma ^2 \ge n^3\). This difficulty can be overcome by the compact genetic algorithm (cGA) [17] and a simple Ant Colony Optimization (ACO) algorithm [16], both of which find the optimal solution in polynomial time with a high probability. ACO was also shown to be able to efficiently find solutions with reasonable approximations on some instances of the single-destination shortest path problem with edge weights disturbed by noise [11, 15, 33].

The ability of explicit noise-handling strategies to cope with noise has also been theoretically studied. Qian et al. [29] proved that the threshold selection strategy is robust to noise: the expected running time of the (\(1+1\))-EA using threshold selection on OneMax under one-bit noise is always polynomial regardless of the noise probability p. For the (\(1+1\))-EA solving OneMax under one-bit noise with \(p=1\) or additive Gaussian noise \(\mathcal {N}(0,\sigma ^2)\) with \(\sigma ^2 \ge 1\), the sampling strategy was shown to be able to reduce the running time from exponential to polynomial [28]. The robustness of sampling to noise was also proved for the (\(1+1\))-EA solving LeadingOnes under one-bit noise with \(p=1/2\) or additive Gaussian noise with \(\sigma ^2 \ge n^2\). Akimoto et al. [2] proved that sampling with a large sample size can make optimization under additive unbiased noise behave as optimization in a noise-free environment. The interplay between sampling and implicit noise-handling mechanisms (e.g., crossover) has been statistically studied in [18].

The noise models considered in the studies mentioned above are summarized in Table 1. We can observe that for the prior noise model, mainly one-bit noise was considered, which flips a random bit of a solution before evaluation with probability p. However, a noise model that can change several bits of a solution simultaneously may be more realistic and needs to be studied, as already mentioned in the first theoretical work on noisy optimization [13].

Table 1 The noise models mainly considered in running time analyses on noisy evolutionary optimization

In this paper, we study the bit-wise noise model, which is characterized by a pair of parameters (p, q). The noise occurs with probability p and, when it occurs, independently flips each bit of a solution with probability q before evaluation. We analyze the running time of the (\(1+1\))-EA solving OneMax and LeadingOnes under bit-wise noise with two specific parameter settings \((p,\frac{1}{n})\) and (1, q). The ranges of p and q for a polynomial upper bound and a super-polynomial lower bound are derived, as shown in the first two rows of Table 2. For the (\(1+1\))-EA on LeadingOnes, we also transfer the running time bounds from bit-wise noise \((p,\frac{1}{n})\) to one-bit noise by using the same proof procedure. As shown in the bottom right of Table 2, our results improve the previously known ones [19].

Note that for the (\(1+1\))-EA on LeadingOnes, the current analysis (as shown in the last column of Table 2) does not cover all the ranges of p and q. We thus conduct experiments to estimate the expected running time for the uncovered values of p and q. The empirical results show that the currently derived ranges of p and q allowing a polynomial running time are possibly tight.

Table 2 For the running time of the (\(1+1\))-EA on OneMax and LeadingOnes under prior noise models, the ranges of noise parameters for a polynomial upper bound and a super-polynomial lower bound are shown below
Table 3 For the running time of the (\(1+1\))-EA using sampling on OneMax and LeadingOnes under prior noise models, the ranges of noise parameters for a polynomial upper bound and a super-polynomial lower bound are shown below

From the results in Table 2, we find that the (\(1+1\))-EA is efficient only under low noise levels. For example, for the (\(1+1\))-EA solving OneMax under bit-wise noise \((p,\frac{1}{n})\), the expected running time is polynomial only when \(p=O(\log n/n)\). We then study whether the sampling strategy can bring robustness to noise. Sampling is a popular way to cope with noise in fitness evaluation [3], which, instead of evaluating the fitness of one solution only once, evaluates the fitness multiple (m) times and then uses the average to approximate the true fitness. We analyze the running time of the (\(1+1\))-EA using sampling under both bit-wise noise and one-bit noise. The ranges of p and q for a polynomial upper bound and a super-polynomial lower bound are shown in Table 3. Our analysis covers all the ranges of p and q. Note that for proving a polynomial upper bound, it is sufficient to show that using sampling with a specific sample size m can guarantee a polynomial running time, while for proving a super-polynomial lower bound, we need to prove that using sampling with any polynomially bounded m fails to guarantee a polynomial running time. Compared with the results in Table 2, we find that using sampling significantly improves the noise-tolerance ability. For example, by using sampling with \(m=4n^3\), the (\(1+1\))-EA now can always solve OneMax under bit-wise noise \((p,\frac{1}{n})\) in polynomial time.

From the analysis, we also find the reason why sampling is or is not effective. Let f(x) and \(f^n(x)\) denote the true and noisy fitness of a solution, respectively. For two solutions x and y with \(f(x)>f(y)\), when the noise level is high (i.e., the values of p and q are large), the probability \(\mathrm {P}(f^n(x) \le f^n(y))\) (i.e., the probability that the truly worse solution y appears to be better) becomes large, which misleads the search direction and then leads to a super-polynomial running time. In such a situation, if the expected gap between \(f^n(x)\) and \(f^n(y)\) is positive, sampling will amplify this trend and make \(\mathrm {P}(f^n(x) \le f^n(y))\) sufficiently small; if it is negative (e.g., on OneMax under bit-wise noise (1, q) with \(q\ge 1/2\)), sampling will continue to increase \(\mathrm {P}(f^n(x) \le f^n(y))\), and obviously will not work. We also note that if the positive gap between \(f^n(x)\) and \(f^n(y)\) is too small (e.g., on OneMax under bit-wise noise (1, q) with \(q= 1/2-1/n^{\omega (1)}\)), a polynomial sample size will not be sufficient and sampling also fails to guarantee a polynomial running time.

This paper extends our preliminary work [27]. Since the theoretical analysis on the LeadingOnes problem is not complete, we add experiments to complement the theoretical results (i.e., Sect. 4.4). We also add the robustness analysis of sampling to noise (i.e., Sect. 5). Note that the robustness of sampling to one-bit noise has been studied in our previous work [28]. It was shown that sampling can reduce the running time of the (\(1+1\))-EA from exponential to polynomial on OneMax when the noise probability \(p=1\) as well as on LeadingOnes when \(p=1/2\). Therefore, our results here are more general. We prove that sampling is effective for any value of p, as shown in the last row of Table 3. Furthermore, we analyze the robustness of sampling to bit-wise noise for the first time.

The rest of this paper is organized as follows. Section 2 introduces some preliminaries. The running time analysis of the (\(1+1\))-EA on OneMax and LeadingOnes under noise is presented in Sections 3 and 4, respectively. Section 5 analyzes the (\(1+1\))-EA using sampling. Section 6 concludes the paper.

2 Preliminaries

In this section, we first introduce the optimization problems, noise models, and evolutionary algorithms studied in this paper, then introduce the sampling strategy, and finally present the analysis tools that we use throughout this paper.

2.1 OneMax and LeadingOnes

In this paper, we use two well-known pseudo-Boolean functions, OneMax and LeadingOnes. The OneMax problem as presented in Definition 1 aims to maximize the number of 1-bits of a solution. The LeadingOnes problem as presented in Definition 2 aims to maximize the number of consecutive 1-bits counting from the left of a solution. The optimal solution of both problems is \(11\ldots 1\) (briefly denoted as \(1^n\)). It has been shown that the expected running time of the (\(1+1\))-EA on OneMax and LeadingOnes is \(\varTheta (n \log n)\) and \(\varTheta (n^2)\), respectively [14]. In the following analysis, we will use \({\mathrm {LO}}(x)\) to denote the number of leading 1-bits of a solution x.

Definition 1

(OneMax) The OneMax Problem of size n is to find an n-bit binary string \(x^*\) such that

$$\begin{aligned} x^*=\mathop {\arg \max }\nolimits _{x \in \{0,1\}^n} \left( f(x)=\sum \limits ^{n}_{i=1} x_i\right) . \end{aligned}$$

Definition 2

(LeadingOnes) The LeadingOnes Problem of size n is to find an n-bit binary string \(x^*\) such that

$$\begin{aligned} x^*=\mathop {\arg \max }\limits _{x \in \{0,1\}^n} \left( f(x)=\sum \limits ^{n}_{i=1} \prod \limits ^{i}_{j=1} x_j\right) . \end{aligned}$$
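For concreteness, the following minimal Python sketch (our own illustration, not part of the original paper; the function names are ours) implements both fitness functions for a solution represented as a sequence of 0/1 values.

```python
# Minimal sketch of the two benchmark fitness functions; a solution x is
# any sequence (list or tuple) of n values in {0, 1}.

def one_max(x):
    """OneMax: the number of 1-bits in x (Definition 1)."""
    return sum(x)

def leading_ones(x):
    """LeadingOnes: the number of consecutive 1-bits from the left (Definition 2)."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count
```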

2.2 Bit-Wise Noise

There are mainly two kinds of noise models: prior and posterior [19, 21]. Let \(f^n(x)\) and f(x) denote the noisy and true fitness of a solution x, respectively. The prior noise comes from the variation on a solution, i.e., \(f^n(x)=f(x')\), where \(x'\) is generated from x by random perturbations. The posterior noise comes from the variation on the fitness of a solution, e.g., additive noise \(f^n(x)=f(x)+\delta \) and multiplicative noise \(f^n(x)=f(x)\cdot \delta \), where \(\delta \) is randomly drawn from some distribution. Previous theoretical analyses involving prior noise [9, 13, 19, 28, 29] often focused on a specific model, one-bit noise. As presented in Definition 3, it flips a random bit of a solution before evaluation with probability p. However, in many realistic applications, noise can change several bits of a solution simultaneously rather than only one bit. We thus consider the bit-wise noise model. As presented in Definition 4, it happens with probability p, and independently flips each bit of a solution with probability q before evaluation. We use bit-wise noise (p, q) to denote the bit-wise noise model with a scenario of (p, q).

Definition 3

(One-bit Noise) Given a parameter \(p \in [0,1]\), let \(f^{n}(x)\) and f(x) denote the noisy and true fitness of a solution \(x \in \{0,1\}^n\), respectively, then

$$\begin{aligned} f^n(x)={\left\{ \begin{array}{ll} f(x) &{} \text {with probability }1-p,\\ f(x') &{} \text {with probability }p, \end{array}\right. } \end{aligned}$$

where \(x'\) is generated by flipping a uniformly randomly chosen bit of x.

Definition 4

(Bit-wise Noise) Given parameters \(p,q \in [0,1]\), let \(f^{n}(x)\) and f(x) denote the noisy and true fitness of a solution \(x \in \{0,1\}^n\), respectively, then

$$\begin{aligned} f^n(x)={\left\{ \begin{array}{ll} f(x) &{} \text {with probability }1-p,\\ f(x') &{} \text {with probability }p, \end{array}\right. } \end{aligned}$$

where \(x'\) is generated by independently flipping each bit of x with probability q.
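As a hedged illustration (ours, not the paper's), the two prior noise models of Definitions 3 and 4 can be simulated as follows; here `f` is any true fitness function, such as those sketched in Sect. 2.1.

```python
import random

def one_bit_noise(f, x, p):
    """One-bit noise: with probability p, flip one uniformly random bit
    of x before evaluating the true fitness f (Definition 3)."""
    if random.random() < p:
        x = list(x)
        i = random.randrange(len(x))
        x[i] = 1 - x[i]
    return f(x)

def bit_wise_noise(f, x, p, q):
    """Bit-wise noise (p, q): with probability p, flip each bit of x
    independently with probability q before evaluation (Definition 4)."""
    if random.random() < p:
        x = [1 - b if random.random() < q else b for b in x]
    return f(x)
```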

To the best of our knowledge, only bit-wise noise (1, q) has been recently studied. Gießen and Kötzing [19] proved that for the (\(1+1\))-EA solving OneMax, the expected running time is polynomial if \(q=O(\log n /n^2)\) and super-polynomial if \(q=\omega (\log n /n^2)\). Besides bit-wise noise (1, q), we also study another specific model bit-wise noise \((p,\frac{1}{n})\) in this paper. Note that bit-wise noise \((p,\frac{1}{n})\) is a natural extension of one-bit noise; their random behaviors of perturbing a solution correspond to the two common mutation operators, bit-wise mutation and one-bit mutation, respectively.

To investigate whether the performance of the (\(1+1\))-EA for bit-wise noise with two scenarios of (p, q) and \((p',q')\) where \(p\cdot q=p'\cdot q'\) can be significantly different, we consider the OneMax problem under bit-wise noise \((1, \frac{\log n}{30n})\) and \((\frac{\log n}{30n}, 1)\). The comparison gives a positive answer. For bit-wise noise \((1, \frac{\log n}{30n})\), we know that the (\(1+1\))-EA needs super-polynomial time to solve OneMax [19], while for bit-wise noise \((\frac{\log n}{30n}, 1)\), we will prove in Theorem 7 that the (\(1+1\))-EA can solve OneMax in polynomial time. Thus, \(p\cdot q\) may not be the only deciding factor, and the analysis of general bit-wise noise without fixing p or q would be complicated. We leave it as future work.

2.3 (\(1+1\)) Evolutionary Algorithm

The (\(1+1\))-EA as described in Algorithm 1 is studied in this paper. For noisy optimization, only a noisy fitness value \(f^n(x)\) instead of the exact one f(x) can be accessed, and thus line 4 of Algorithm 1 is “if \(f^{n}(x') \ge f^{n}(x)\)” instead of “if \(f(x') \ge f(x)\)”. Note that the reevaluation strategy is used as in [11, 13, 19]. That is, besides evaluating \(f^{n}(x')\), \(f^{n}(x)\) will be reevaluated in each iteration of the (\(1+1\))-EA. The running time is usually defined as the number of fitness evaluations needed to find an optimal solution w.r.t. the true fitness function f for the first time [2, 13, 19].

Algorithm 1

((\(1+1\))-EA) Given a function f over \(\{0,1\}^n\) to be maximized, it consists of the following steps:

[Pseudo-code of Algorithm 1: the (\(1+1\))-EA]
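Since the pseudo-code figure is not reproduced here, the following Python sketch (our own reading of Algorithm 1 under the reevaluation scheme described above, assuming standard bit-wise mutation with probability 1/n as used in the later proofs) shows the noisy (\(1+1\))-EA. Here `noisy_f` and `true_f` stand for \(f^n\) and f, and the returned counter follows the running time definition above: one evaluation for the initial solution and two per iteration.

```python
import random

def one_plus_one_ea(n, noisy_f, true_f, optimum_value):
    """Sketch of the (1+1)-EA with reevaluation under noise.
    Returns the number of fitness evaluations until the true optimum is found."""
    x = [random.randint(0, 1) for _ in range(n)]   # uniform initial solution
    evaluations = 1                                # evaluate the initial solution
    while true_f(x) != optimum_value:
        # bit-wise mutation: flip each bit independently with probability 1/n
        x_new = [1 - b if random.random() < 1.0 / n else b for b in x]
        evaluations += 2                           # evaluate x_new and reevaluate x
        if noisy_f(x_new) >= noisy_f(x):
            x = x_new
    return evaluations
```

For example, reusing the sketches from Sects. 2.1 and 2.2, `one_plus_one_ea(n, lambda x: bit_wise_noise(one_max, x, p, 1.0 / n), one_max, n)` would simulate the setting of Sect. 3.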

2.4 Sampling

In noisy evolutionary optimization, sampling has often been used to reduce the negative effect of noise [1, 6]. As presented in Definition 5, it approximates the true fitness f(x) using the average of multiple (m) independent random evaluations. For the (\(1+1\))-EA using sampling, line 4 of Algorithm 1 changes to “if \(\hat{f}(x') \ge \hat{f}(x)\)”. Its pseudo-code is described in Algorithm 2. Note that a sample size of \(m=1\) is equivalent to not using sampling. The effectiveness of sampling was not theoretically analyzed until recently. Qian et al. [28] proved that sampling is robust to one-bit noise and additive Gaussian noise. Particularly, under one-bit noise, it was shown that sampling can reduce the running time exponentially for the (\(1+1\))-EA solving OneMax when the noise probability \(p=1\) and LeadingOnes when \(p=1/2\).

Definition 5

(Sampling) Sampling first evaluates the fitness of a solution m times independently and obtains the noisy fitness values \(f^n_1(x),\ldots ,f^n_m(x)\), and then outputs their average, i.e.,

$$\begin{aligned} \hat{f}(x)=\frac{1}{m}\sum \limits ^{m}_{i=1} f^{n}_i(x). \end{aligned}$$

Algorithm 2

((\(1+1\))-EA with sampling) Given a function f over \(\{0,1\}^n\) to be maximized, it consists of the following steps:

[Pseudo-code of Algorithm 2: the (\(1+1\))-EA with sampling]
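A corresponding sketch of the sampling estimator \(\hat{f}\) of Definition 5 (again ours, with hypothetical names) is given below; in Algorithm 2, both \(\hat{f}(x')\) and \(\hat{f}(x)\) are computed in every iteration, so each iteration costs 2m fitness evaluations.

```python
def sampled_fitness(noisy_f, x, m):
    """Sampling (Definition 5): average of m independent noisy evaluations of x."""
    return sum(noisy_f(x) for _ in range(m)) / m
```

Replacing the acceptance test in the sketch after Algorithm 1 by `sampled_fitness(noisy_f, x_new, m) >= sampled_fitness(noisy_f, x, m)`, and counting 2m evaluations per iteration, yields the (\(1+1\))-EA with sampling.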

2.5 Analysis Tools

The process of the (\(1+1\))-EA solving any pseudo-Boolean function with one unique global optimum can be directly modeled as a Markov chain \(\{\xi _t\}^{+\infty }_{t=0}\). We only need to take the solution space \(\{0,1\}^n\) as the chain’s state space (i.e., \(\xi _t \in \mathcal {X}=\{0,1\}^n\)), and take the optimal solution \(1^n\) as the chain’s optimal state (i.e., \(\mathcal {X}^*=\{1^n\}\)). Note that we can assume without loss of generality that the optimal solution is \(1^n\), because unbiased EAs treat the bits 0 and 1 symmetrically, and thus the 0 bits in an optimal solution can be interpreted as 1 bits without affecting the behavior of EAs. Given a Markov chain \(\{\xi _t\}^{+\infty }_{t=0}\) and \(\xi _{\hat{t}}=x\), we define its first hitting time (FHT) as \(\tau =\min \{t \mid \xi _{\hat{t}+t} \in \mathcal {X}^*,t\ge 0\}\). The mathematical expectation of \(\tau \), \(\mathrm {E}(\tau \mid \xi _{\hat{t}}=x)=\sum \nolimits ^{+\infty }_{i=0} i\cdot \mathrm {P}(\tau =i \mid \xi _{\hat{t}}=x)\), is called the expected first hitting time (EFHT) starting from \(\xi _{\hat{t}}=x\). If \(\xi _{0}\) is drawn from a distribution \(\pi _{0}\), \(\mathrm {E}(\tau \mid \xi _{0}\sim \pi _0) = \sum \nolimits _{x\in \mathcal {X}} \pi _{0}(x)\mathrm {E}(\tau \mid \xi _{0}=x)\) is called the EFHT of the Markov chain over the initial distribution \(\pi _0\). Thus, the expected running time of the (\(1+1\))-EA starting from \(\xi _0 \sim \pi _0\) is equal to \(1+2\cdot \mathrm {E}(\tau \mid \xi _{0} \sim \pi _0)\), where the term 1 corresponds to evaluating the initial solution, and the factor 2 corresponds to evaluating the offspring solution \(x'\) and reevaluating the parent solution x in each iteration. If using sampling, the expected running time of the (\(1+1\))-EA is \(m+2m\cdot \mathrm {E}(\tau \mid \xi _{0} \sim \pi _0)\), since estimating the fitness of a solution needs m independent fitness evaluations. Note that we consider the expected running time of the (\(1+1\))-EA starting from a uniform initial distribution in this paper.

In the following, we give three drift theorems that will be used to analyze the EFHT of Markov chains in the paper. The additive drift theorem [20] as presented in Theorem 1 is used to derive upper bounds on the EFHT of Markov chains. To use it, a function V(x) has to be constructed to measure the distance of a state x to the optimal state space \(\mathcal {X}^*\). Then, we need to investigate the progress on the distance to \(\mathcal {X}^*\) in each step, i.e., \(\mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t)\). An upper bound on the EFHT can be derived by dividing the initial distance by a lower bound on the progress.

Theorem 1

(Additive Drift [20]) Given a Markov chain \(\{\xi _t\}^{+\infty }_{t=0}\) and a distance function V(x), if for any \(t \ge 0\) and any \(\xi _t\) with \(V(\xi _t) > 0\), there exists a real number \(c>0\) such that

$$\begin{aligned} \mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t) \ge c, \end{aligned}$$

then the EFHT satisfies that \(\mathrm {E}(\tau \mid \xi _0) \le V(\xi _0)/c.\)
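As a simple illustration of Theorem 1 (our own example, not taken from the paper), consider the noise-free (\(1+1\))-EA on OneMax and let \(V(x)=n-f(x)\) be the number of 0-bits of x. For a solution with \(i \ge 1\) 0-bits, the distance decreases by at least 1 if exactly one 0-bit is flipped and all other bits are kept, and elitist selection under exact evaluation never lets the distance increase, so

$$\begin{aligned} \mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t) \ge i\cdot \frac{1}{n}\left( 1-\frac{1}{n}\right) ^{n-1} \ge \frac{1}{en}. \end{aligned}$$

Taking \(c=\frac{1}{en}\) and noting \(V(\xi _0) \le n\), Theorem 1 gives \(\mathrm {E}(\tau \mid \xi _0) \le en^2\), a correct though not tight bound compared with the \(\varTheta (n \log n)\) running time stated in Sect. 2.1.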

The negative drift theorem [24, 25] as presented in Theorem 2 was proposed to prove exponential lower bounds on the EFHT of Markov chains, where \(X_t\) is usually represented by a mapping of \(\xi _t\). It requires two conditions: a constant negative drift and exponentially decaying probabilities of jumping towards or away from the goal state. To relax the requirement of a constant negative drift, the negative drift theorem with self-loops [30] as presented in Theorem 3 has been proposed, which takes into account large self-loop probabilities.

Theorem 2

(Negative Drift [24, 25]) Let \(X_t\), \(t\ge 0\), be real-valued random variables describing a stochastic process. Suppose there exists an interval \([a,b] \subseteq \mathbb {R}\), two constants \(\delta ,\epsilon >0\) and, possibly depending on \(l:=b-a\), a function r(l) satisfying \(1\le r(l)=o(l/\log (l))\) such that for all \(t\ge 0\) the following two conditions hold:

$$\begin{aligned}&1. \quad \mathrm {E}(X_t-X_{t+1} \mid a< X_t <b) \le -\epsilon ,\\&2. \quad \mathrm {P}(|X_{t+1}-X_t| \ge j \mid X_t>a) \le \frac{r(l)}{(1+\delta )^j} \;\; \text {for} \; j \in \mathbb {N}_0. \end{aligned}$$

Then there is a constant \(c>0\) such that for \(T:=\min \{t \ge 0: X_t \le a \mid X_0 \ge b\}\) it holds \(\mathrm {P}(T \le 2^{cl/r(l)})=2^{-\varOmega (l/r(l))}\).

Theorem 3

(Negative Drift with Self-loops [30]) Let \(X_t\), \(t\ge 0\), be real-valued random variables describing a stochastic process. Suppose there exists an interval \([a,b] \subseteq \mathbb {R}\), two constants \(\delta ,\epsilon >0\) and, possibly depending on \(l:=b-a\), a function r(l) satisfying \(1\le r(l)=o(l/\log (l))\) such that for all \(t\ge 0\) the following two conditions hold:

$$\begin{aligned} 1.&\; \forall a<i<b: \mathrm {E}(X_t-X_{t+1} \mid X_t=i) \le -\epsilon \cdot \mathrm {P}(X_{t+1} \ne i \mid X_t=i),\\ 2.&\; \forall i >a, j \in \mathbb {N}_0: \mathrm {P}(|X_{t+1}-X_t| \ge j \mid X_t=i) \le \frac{r(l)}{(1+\delta )^j}\cdot \mathrm {P}(X_{t+1} \ne i \mid X_t=i). \end{aligned}$$

Then there is a constant \(c>0\) such that for \(T:=\min \{t \ge 0: X_t \le a \mid X_0 \ge b\}\) it holds \(\mathrm {P}(T \le 2^{cl/r(l)})=2^{-\varOmega (l/r(l))}\).

3 The OneMax Problem

In this section, we analyze the running time of the (\(1+1\))-EA on OneMax under bit-wise noise. Note that for bit-wise noise (1, q), it has been proved that the expected running time is polynomial if and only if \(q=O(\log n/n^2)\), as shown in Theorem 4.

Theorem 4

[19] For the (\(1+1\))-EA on OneMax under bit-wise noise (1, q), the expected running time is polynomial if \(q =O(\log n/n^2)\) and super-polynomial if \(q=\omega (\log n / n^2)\).

For bit-wise noise \((p,\frac{1}{n})\), we prove in Theorems 5 and 6 that the tight range of p allowing a polynomial running time is \(O(\log n/n)\). Instead of using the original drift theorems, we apply the upper and lower bounds for the (\(1+1\))-EA on noisy OneMax derived in [19]. Let \(x^k\) denote any solution with k 1-bits, and \(f^n(x^{k})\) denote its noisy objective value, which is a random variable. Lemma 1 intuitively means that if the probability of recognizing the truly better solution under noisy evaluation is large (i.e., Eq. (1)), the running time can be upper bounded. Particularly, if Eq. (1) holds with \(l=O(\log n)\), the running time can be polynomially upper bounded. Conversely, Lemma 2 shows that if the probability of making a correct comparison is small (i.e., Eq. (2)), the running time can be lower bounded. Particularly, if Eq. (2) holds with \(l=\varOmega (n)\), the running time can be exponentially lower bounded. Both Lemmas 1 and 2 are proved by applying standard drift theorems, and can be used to simplify our analysis. Note that in the original upper bound of the (\(1+1\))-EA on noisy OneMax (i.e., Theorem 5 in [19]), Eq. (1) is required to hold with only \(j=k\), but the proof actually also requires that noisy OneMax satisfies the monotonicity property, i.e., for all \(j< k<n\), \(\mathrm {P}(f^n(x^k)<f^n(x^{k+1})) \le \mathrm {P}(f^n(x^j)<f^n(x^{k+1}))\). We have combined these two conditions in Lemma 1 by requiring Eq. (1) to hold with any \(j \le k\) instead of only \(j=k\).

Lemma 1

[19] Suppose there is a positive constant \(c \le 1/15\) and some \(2 < l \le n/2\) such that

$$\begin{aligned} \begin{aligned}&\forall j\le k<n: \mathrm {P}\left( f^n(x^j)< f^n(x^{k+1})\right) \ge 1-\frac{l}{n};\\&\forall j \le k<n-l: \mathrm {P}\left( f^n(x^j) < f^n(x^{k+1})\right) \ge 1-c\frac{n-k}{n}, \end{aligned} \end{aligned}$$
(1)

then the (\(1+1\))-EA optimizes f in expectation in \(O(n \log n) +n2^{O(l)}\) iterations.

Lemma 2

[19] Suppose there is some \(l \le n/4\) and a constant \(c \ge 16\) such that

$$\begin{aligned} \begin{aligned}&\forall n-l \le k< n: \mathrm {P}\left( f^n(x^k) < f^n(x^{k+1})\right) \le 1-c\frac{n-k}{n}, \end{aligned} \end{aligned}$$
(2)

then the (\(1+1\))-EA optimizes f in \(2^{\varOmega (l)}\) iterations with a high probability.
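The comparison probabilities appearing in Lemmas 1 and 2 can also be estimated numerically. The following Python sketch (our illustration; all names and the number of trials are hypothetical) estimates \(\mathrm {P}(f^n(x^j) < f^n(x^{k+1}))\) for OneMax under bit-wise noise (p, q) by Monte-Carlo simulation; since the noisy OneMax value depends only on the number of 1-bits, any fixed solutions with j and \(k+1\) 1-bits can be used.

```python
import random

def noisy_one_max(x, p, q):
    """OneMax under bit-wise noise (p, q)."""
    if random.random() < p:
        x = [1 - b if random.random() < q else b for b in x]
    return sum(x)

def comparison_probability(n, j, k, p, q, trials=100000):
    """Monte-Carlo estimate of P(f^n(x^j) < f^n(x^{k+1}))."""
    x_j = [1] * j + [0] * (n - j)              # a solution with j 1-bits
    x_k1 = [1] * (k + 1) + [0] * (n - k - 1)   # a solution with k+1 1-bits
    hits = sum(noisy_one_max(x_j, p, q) < noisy_one_max(x_k1, p, q)
               for _ in range(trials))
    return hits / trials
```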

First, we apply Lemma 1 to show that the expected running time is polynomially upper bounded for bit-wise noise \((p,\frac{1}{n})\) with \(p=O(\log n/n)\).

Theorem 5

For the (\(1+1\))-EA on OneMax under bit-wise noise \((p,\frac{1}{n})\), the expected running time is polynomial if \(p =O(\log n/n)\).

Proof

We prove it by using Lemma 1. For some positive constant b, suppose that \(p \le b\log n/n\). We set the two parameters in Lemma 1 as \(c=\min \{\frac{1}{15},b\}\) and \(l=\frac{2b\log n}{c} \in (2,\frac{n}{2}]\).

For any \(j \le k<n\), \(f^n(x^j) \ge f^n(x^{k+1})\) implies that \(f^n(x^j) \ge k+1\) or \(f^n(x^{k+1}) \le k\), either of which happens with probability at most p. By the union bound, we get \(\forall j \le k<n\),

$$\begin{aligned} \mathrm {P}\left( f^n(x^j) \ge f^n(x^{k+1})\right) \le 2p \le \frac{2b\log n}{n}=\frac{lc}{n}\le \frac{l}{n}. \end{aligned}$$

For any \(j \le k < n-l\), we easily get

$$\begin{aligned} \mathrm {P}\left( f^n(x^j) \ge f^n(x^{k+1})\right) \le \frac{lc}{n}< c\frac{n-k}{n}. \end{aligned}$$

By Lemma 1, we know that the expected running time is \(O(n \log n)+n2^{O(2b\log n/c)}\), i.e., polynomial. \(\square \)

Next we apply Lemma 2 to show that the expected running time is super-polynomial for bit-wise noise \((p,\frac{1}{n})\) with \(p=\omega (\log n/n)\). Note that for \(p=1-O(\log n/n)\), we actually give a stronger result that the expected running time is exponential.

Theorem 6

For the (\(1+1\))-EA on OneMax under bit-wise noise \((p,\frac{1}{n})\), the expected running time is super-polynomial if \(p =\omega (\log n/n) \cap 1-\omega (\log n/n)\) and exponential if \(p =1-O(\log n/n)\).

Proof

We use Lemma 2 to prove it. Let \(c = 16\). The case \(p =\omega (\log n/n) \cap 1-\omega (\log n/n)\) is first analyzed. For any positive constant b, let \(l=b\log n\). For any \(k \ge n-l\), we get

$$\begin{aligned} \mathrm {P}\left( f^n(x^k) \ge f^{n}(x^{k+1})\right) \ge \mathrm {P}(f^n(x^k)=k) \cdot \mathrm {P}(f^n(x^{k+1})\le k). \end{aligned}$$

To make \(f^n(x^k)=k\), it is sufficient that the noise does not happen, i.e., \(\mathrm {P}(f^n(x^k)=k) \ge 1-p\). To make \(f^n(x^{k+1})\le k\), it is sufficient to flip one 1-bit and keep other bits unchanged by noise, i.e., \(\mathrm {P}(f^n(x^{k+1})\le k) \ge p\cdot \frac{k+1}{n}(1-\frac{1}{n})^{n-1}\). Thus,

$$\begin{aligned} \mathrm {P}\left( f^n(x^k) \ge f^{n}(x^{k+1})\right) \ge (1-p)\cdot p\frac{k+1}{en}=\omega (\log n/n). \end{aligned}$$

Since \(c\frac{n-k}{n} \le c\frac{l}{n}=\frac{cb\log n}{n}\), the condition of Lemma 2 holds. Thus, the expected running time is \(2^{\varOmega (b\log n)}\) (where b is any constant), i.e., super-polynomial.

For the case \(p =1-O(\log n/n)\), let \(l=\sqrt{n}\). We use another lower bound \(p(1-\frac{1}{n})^{n}\) for \(\mathrm {P}(f^n(x^k)=k)\), since it is sufficient that no bit flips by noise. Thus, we have

$$\begin{aligned} \mathrm {P}\left( f^n(x^k) \ge f^{n}(x^{k+1})\right) \ge p\left( 1-\frac{1}{n}\right) ^{n}\cdot p \frac{k+1}{en}=\varOmega (1). \end{aligned}$$

Since \(c\frac{n-k}{n} \le \frac{c\sqrt{n}}{n}\), the condition of Lemma 2 holds. Thus, the expected running time is \(2^{\varOmega (\sqrt{n})}\), i.e., exponential. \(\square \)

To show that the performance of the (\(1+1\))-EA for bit-wise noise with two scenarios (p, q) and \((p',q')\) where \(p\cdot q=p'\cdot q'\) can be significantly different, we compare the expected running time of the (\(1+1\))-EA for bit-wise noise \( (1,\frac{\log n}{30n}) \) and \((\frac{\log n}{30n},1)\). For the former case, we know from Theorem 4 that the expected running time is super-polynomial, while for the latter case, we prove in the following theorem that the expected running time can be polynomially upper bounded.

Theorem 7

For the (\(1+1\))-EA on OneMax under bit-wise noise \((\frac{\log n}{30n},1)\), the expected running time is polynomial.

Proof

We use Lemma 1 to prove it. For any \( j\le k<n \), \( f^n(x^j)\ge f^n(x^{k+1}) \) implies that the fitness evaluation of \( x^j \) or \( x^{k+1} \) is affected by noise, whose probability is at most \( 2\cdot \frac{\log n}{30n} =\frac{\log n}{15n}\). Thus, we have \( \mathrm {P}(f^n(x^j)<f^n(x^{k+1}))\ge 1-\frac{\log n}{15n}\). It is then easy to verify that the condition of Lemma 1 holds with \( c=\frac{1}{15} \) and \( l=\log n \). Thus, the expected running time is polynomial. \(\square \)

4 The LeadingOnes Problem

In this section, we first analyze the running time of the (\(1+1\))-EA on the LeadingOnes problem under bit-wise noise \((p,\frac{1}{n})\) and bit-wise noise (1, q), respectively. Then, we transfer the analysis from bit-wise noise \((p,\frac{1}{n})\) to one-bit noise; the results are complementary to the known ones recently derived in [19]. However, our analysis does not cover all the ranges of p and q. For those values of p and q where no theoretical results are known, we conduct experiments to empirically investigate the running time.

4.1 Bit-Wise Noise \((p,\frac{1}{n})\)

For bit-wise noise \((p,\frac{1}{n})\), we first apply the additive drift theorem (i.e., Theorem 1) to prove that the expected running time is polynomial when \(p=O(\log n/n^2)\).

Theorem 8

For the (\(1+1\))-EA on LeadingOnes under bit-wise noise \((p,\frac{1}{n})\), the expected running time is polynomial if \(p =O(\log n/n^2)\).

Proof

We use Theorem 1 to prove it. For some positive constant b, suppose that \(p \le b\log n /n^2\). Let \(\theta \in (0,1)\) be some constant close to 0. We first construct a distance function V(x) as, for any x with \({\mathrm {LO}}(x)=i\),

$$\begin{aligned} V(x)=\left( 1+\frac{c}{n}\right) ^n-\left( 1+\frac{c}{n}\right) ^i, \end{aligned}$$
(3)

where \(c=\frac{2b\log n}{1-\theta }+1\). It is easy to verify that \(V(x \in \mathcal {X}^*=\{1^n\})=0\) and \(V(x \notin \mathcal {X}^*)>0\).

Then, we investigate \(\mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)\) for any x with \({\mathrm {LO}}(x)<n\) (i.e., \(x \notin \mathcal {X^*}\)). Assume that currently \({\mathrm {LO}}(x)=i\), where \(0\le i \le n-1\). Let \(\mathrm {P}_{mut}(x,x')\) denote the probability of generating \(x'\) by mutation on x. We divide the drift into two parts: positive \(\mathrm {E}^+\) and negative \(\mathrm {E}^-\). That is,

$$\begin{aligned} \mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)=\mathrm {E}^+-\mathrm {E}^-, \end{aligned}$$

where

$$\begin{aligned}&\mathrm {E}^+=\sum _{x': {\mathrm {LO}}(x')>i}\mathrm {P}_{mut}(x,x')\cdot \mathrm {P}(f^n(x') \ge f^n(x)) \cdot (V(x)-V(x')), \end{aligned}$$
(4)
$$\begin{aligned}&\mathrm {E}^-=\sum _{x': {\mathrm {LO}}(x')<i}\mathrm {P}_{mut}(x,x')\cdot \mathrm {P}(f^n(x') \ge f^n(x))\cdot (V(x')-V(x)). \end{aligned}$$
(5)

For the positive drift \(\mathrm {E}^+\), we need to consider that the number of leading 1-bits is increased. By mutation, we have

$$\begin{aligned} \begin{aligned}&\sum _{x': {\mathrm {LO}}(x')>i}\mathrm {P}_{mut}(x,x')=\mathrm {P}({\mathrm {LO}}(x') \ge i+1)=\left( 1-\frac{1}{n}\right) ^i\frac{1}{n}, \end{aligned} \end{aligned}$$
(6)

since it needs to flip the \((i+1)\)-th bit (which must be 0) of x and keep the i leading 1-bits unchanged. For any \(x'\) with \({\mathrm {LO}}(x') \ge i+1\), \(f^n(x') < f^n(x)\) implies that \(f^n(x') \le i-1\) or \(f^n(x) \ge i+1\). Note that,

$$\begin{aligned} \begin{aligned}&\mathrm {P}(f^n(x') \le i-1)=p\left( 1-\left( 1-\frac{1}{n}\right) ^i\right) , \end{aligned} \end{aligned}$$
(7)

since at least one of the first i leading 1-bits of \(x'\) needs to be flipped by noise;

$$\begin{aligned} \begin{aligned}&\mathrm {P}(f^n(x) \ge i+1)=p\left( 1-\frac{1}{n}\right) ^i\frac{1}{n}, \end{aligned} \end{aligned}$$
(8)

since it needs to flip the first 0-bit of x and keep the leading 1-bits unchanged by noise. By the union bound, we get

$$\begin{aligned} \begin{aligned}&\mathrm {P}(f^n(x') \ge f^n(x)) = 1- \mathrm {P}(f^n(x') < f^n(x))\\&\quad \ge 1-p\left( 1-\left( 1-\frac{1}{n}\right) ^{i+1}\right) \ge 1-p\frac{i+1}{n} \ge 1-p \ge 1-\theta , \end{aligned} \end{aligned}$$
(9)

where the last inequality holds for sufficiently large n, since \(p=O(\log n/n^2)\) and \(\theta \in (0,1)\) is some constant close to 0. Furthermore, for any \(x'\) with \({\mathrm {LO}}(x')\ge i+1\),

$$\begin{aligned} \begin{aligned}&V(x)-V(x') \ge \left( 1+\frac{c}{n}\right) ^{i+1}-\left( 1+\frac{c}{n}\right) ^i=\frac{c}{n}\left( 1+\frac{c}{n}\right) ^i. \end{aligned} \end{aligned}$$
(10)

By combining Eqs. (6), (9) and (10), we have

$$\begin{aligned} \begin{aligned} \mathrm {E}^+ \ge \left( 1-\frac{1}{n}\right) ^i\frac{1}{n} \cdot (1-\theta ) \cdot \frac{c}{n}\left( 1+\frac{c}{n}\right) ^i \ge \frac{(1-\theta )c}{3n^2}\left( 1+\frac{c}{n}\right) ^i, \end{aligned} \end{aligned}$$

where the last inequality is by \((1-\frac{1}{n})^{i} \ge (1-\frac{1}{n})^{n-1} \ge \frac{1}{e}\ge \frac{1}{3}\).

For the negative drift \(\mathrm {E}^-\), we need to consider that the number of leading 1-bits is decreased. By mutation, we have

$$\begin{aligned} \begin{aligned}&\sum _{x': {\mathrm {LO}}(x')<i}\mathrm {P}_{mut}(x,x')=\mathrm {P}({\mathrm {LO}}(x') \le i-1)=1-\left( 1-\frac{1}{n}\right) ^i, \end{aligned} \end{aligned}$$
(11)

since it needs to flip at least one leading 1-bit of x. For any \(x'\) with \({\mathrm {LO}}(x') \le i-1\) (where \(i \ge 1\)), \(f^n(x') \ge f^n(x)\) implies that \(f^n(x') \ge i\) or \(f^n(x) \le i-1\). Note that,

$$\begin{aligned} \mathrm {P}(f^n(x') \ge i)\le p\left( 1-\frac{1}{n}\right) ^{i-1}\frac{1}{n}, \end{aligned}$$
(12)

since for the first i bits of \(x'\), it needs to flip the 0-bits (whose number is at least 1) and keep the 1-bits unchanged by noise;

$$\begin{aligned} \mathrm {P}(f^n(x) \le i-1)=p\left( 1-\left( 1-\frac{1}{n}\right) ^i\right) , \end{aligned}$$
(13)

since at least one leading 1-bit of x needs to be flipped by noise. By the union bound, we get

$$\begin{aligned} \mathrm {P}(f^n(x') \ge f^n(x)) \le p-p\left( 1-\frac{2}{n}\right) \left( 1-\frac{1}{n}\right) ^{i-1} \le p\frac{i+1}{n}. \end{aligned}$$
(14)

Furthermore, according to the definition of the distance function (i.e., Eq. (3)), we have for any \(x'\) with \(0 \le {\mathrm {LO}}(x') \le i-1\),

$$\begin{aligned} V(x')-V(x) =\left( 1+\frac{c}{n}\right) ^i-\left( 1+\frac{c}{n}\right) ^{{\mathrm {LO}}(x')}\le \left( 1+\frac{c}{n}\right) ^i-1. \end{aligned}$$
(15)

By combining Eqs. (11), (14) and (15), we have

$$\begin{aligned} \mathrm {E}^-&\le \left( 1-\left( 1-\frac{1}{n}\right) ^i\right) \cdot p\frac{i+1}{n} \cdot \left( \left( 1+\frac{c}{n}\right) ^i-1\right) \\&\le \left( 1-\frac{1}{e}\right) \cdot p \cdot \left( 1+\frac{c}{n}\right) ^i\le \frac{2p}{3}\left( 1+\frac{c}{n}\right) ^i. \end{aligned}$$

Thus, by subtracting \(\mathrm {E}^-\) from \(\mathrm {E}^+\), we have

$$\begin{aligned}&\mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)\ge \left( 1+\frac{c}{n}\right) ^i \left( \frac{(1-\theta )c}{3n^2}-\frac{2p}{3}\right) \nonumber \\&\quad \ge \left( 1+\frac{c}{n}\right) ^i \left( \frac{2b\log n+1-\theta }{3n^2}-\frac{2b \log n}{3n^2}\right) \ge \frac{1-\theta }{3n^2}, \end{aligned}$$
(16)

where the second inequality is by \(c=\frac{2b\log n}{1-\theta }+1\) and \(p \le b \log n/n^2\). Note that \(V(x)\le (1+\frac{c}{n})^n \le e^c=e^{\frac{2b\log n}{1-\theta }+1}=en^{\frac{2b}{1-\theta }}\). By Theorem 1, we get

$$\begin{aligned} \mathrm {E}(\tau \mid \xi _0) \le \frac{3n^2}{1-\theta } \cdot en^{\frac{2b}{1-\theta }}=O\left( n^{\frac{2b}{1-\theta }+2}\right) , \end{aligned}$$

i.e., the expected running time is polynomial. \(\square \)

Next we use the negative drift with self-loops theorem (i.e., Theorem 3) to show that the expected running time is super-polynomial for bit-wise noise \((p,\frac{1}{n})\) with \(p=\omega (\log n/n) \cap o(1)\).

Theorem 9

For the (\(1+1\))-EA on LeadingOnes under bit-wise noise \((p,\frac{1}{n})\), if \(p =\omega (\log n/n) \cap o(1)\), the expected running time is super-polynomial.

Proof

We use Theorem 3 to prove it. Let \(X_t=|x|_0\) be the number of 0-bits of the solution x after t iterations of the (\(1+1\))-EA. Let c be any positive constant. We consider the interval \([0,c \log n]\), i.e., the parameters \(a=0\) (i.e., the global optimum) and \(b=c \log n\) in Theorem 3.

Then, we analyze the drift \(\mathrm {E}(X_t-X_{t+1} \mid X_t=i)\) for \(1\le i<c \log n\). As in the proof of Theorem 8, we divide the drift into two parts: positive \(\mathrm {E}^+\) and negative \(\mathrm {E}^-\). That is,

$$\begin{aligned} \mathrm {E}(X_t-X_{t+1} \mid X_t=i)=\mathrm {E}^+-\mathrm {E}^-, \end{aligned}$$

where

$$\begin{aligned} \begin{aligned}&\mathrm {E}^+=\sum _{x': |x'|_0<i}\mathrm {P}_{mut}(x,x')\cdot \mathrm {P}(f^n(x') \ge f^n(x)) \cdot (i-|x'|_0),\\&\mathrm {E}^-=\sum _{x': |x'|_0>i}\mathrm {P}_{mut}(x,x')\cdot \mathrm {P}(f^n(x') \ge f^n(x))\cdot (|x'|_0-i). \end{aligned} \end{aligned}$$

Note that the drift here depends on the number of 0-bits due to the definition of \(X_t\). It is different from that in the proof of Theorem 8, which depends on the number of leading 1-bits due to the definition of the distance function (i.e., Eq. (3)).

For the positive drift \(\mathrm {E}^+\), we need to consider that the number of 0-bits is decreased. For mutation on x (where \(|x|_0=i\)), let X and Y denote the number of flipped 0-bits and 1-bits, respectively. Then, \(X \sim B(i,\frac{1}{n})\) and \(Y \sim B(n-i,\frac{1}{n})\), where \(B(\cdot ,\cdot )\) is the binomial distribution. To estimate an upper bound on \(\mathrm {E}^+\), we assume that the offspring solution \(x'\) with \(|x'|_0 <i\) is always accepted. Thus, we have

$$\begin{aligned} \mathrm {E}^+&\le \sum _{x':|x'|_0 <i} \mathrm {P}_{mut}(x,x')(i-|x'|_0)=\sum ^i_{k=1} k \cdot \mathrm {P}(X-Y=k)\nonumber \\&=\sum \limits ^i_{k=1} k \cdot \sum \limits ^i_{j=k} \mathrm {P}(X=j) \cdot \mathrm {P}(Y=j-k)\nonumber \\&=\sum \limits ^i_{j=1} \sum \limits ^j_{k=1} k \cdot \mathrm {P}(X=j) \cdot \mathrm {P}(Y=j-k)\nonumber \\&\le \sum \limits ^i_{j=1} j \cdot \mathrm {P}(X=j)=\frac{i}{n}. \end{aligned}$$
(17)

For the negative drift \(\mathrm {E}^-\), we need to consider that the number of 0-bits is increased. We analyze the \(n-i\) cases where only one 1-bit is flipped (i.e., \(|x'|_0=i+1\)), which happens with probability \(\frac{1}{n}(1-\frac{1}{n})^{n-1}\). Assume that \({\mathrm {LO}}(x)=k \le n-i\). If the j-th (where \(1 \le j \le k\)) leading 1-bit is flipped, the offspring solution \(x'\) will be accepted (i.e., \(f^n(x') \ge f^n(x)\)) if \(f^n(x') \ge j-1\) and \(f^n(x) \le j-1\). Note that,

$$\begin{aligned} \begin{aligned} \mathrm {P}(f^n(x') \ge j-1) = 1-p+p\left( 1-\frac{1}{n}\right) ^{j-1}\ge 1-p\frac{j-1}{n} \ge \frac{1}{2}, \end{aligned} \end{aligned}$$
(18)

where the equality is since it needs to keep the \(j-1\) leading 1-bits of \(x'\) unchanged, and the last inequality is by \(p=o(1)\);

$$\begin{aligned} \mathrm {P}(f^n(x) \le j-1)&=p\left( 1-\left( 1-\frac{1}{n}\right) ^{j}\right) \nonumber \\&=p\left( 1-\frac{1}{n}\right) ^j\left( \left( 1+\frac{1}{n-1}\right) ^j-1\right) \ge \frac{p}{e} \cdot \frac{j}{n-1} \ge \frac{pj}{3n}, \end{aligned}$$
(19)

where the first equality is since at least one of the first j leading 1-bits of x needs to be flipped by noise. Thus, we get

$$\begin{aligned} \mathrm {P}(f^n(x') \ge f^n(x)) \ge \frac{pj}{6n}. \end{aligned}$$
(20)

If one of the \(n-i-k\) non-leading 1-bits is flipped, \({\mathrm {LO}}(x')={\mathrm {LO}}(x)=k\). We can use the same analysis procedure as Eq. (9) in the proof of Theorem 8 to derive that

$$\begin{aligned} \mathrm {P}(f^n(x') \ge f^n(x)) \ge 1-p\frac{k+1}{n} \ge \frac{1}{2}, \end{aligned}$$
(21)

where the last inequality is by \(p=o(1)\). Combining all the \(n-i\) cases, we get

$$\begin{aligned} \mathrm {E}^-&\ge \frac{1}{n}\left( 1-\frac{1}{n}\right) ^{n-1} \cdot \left( \sum ^k_{j=1} \frac{pj}{6n}+\frac{n-i-k}{2}\right) \cdot (i+1-i)\nonumber \\&\ge \frac{1}{en}\left( \frac{pk(k+1)}{12n}+\frac{n-i-k}{2}\right) \ge \frac{pk^2}{36n^2}+\frac{n-i-k}{6n}. \end{aligned}$$
(22)

By subtracting \(\mathrm {E}^-\) from \(\mathrm {E}^+\), we get

$$\begin{aligned} \mathrm {E}(X_t-X_{t+1} \mid X_t=i)\le \frac{i}{n}-\frac{pk^2}{36n^2}-\frac{n-i-k}{6n}. \end{aligned}$$

To investigate condition (1) of Theorem 3, we also need to analyze the probability \(\mathrm {P}(X_{t+1} \ne i \mid X_t=i)\). For \(X_{t+1}\ne i\), it is necessary that at least one bit of x is flipped and the offspring \(x'\) is accepted. We consider two cases: (1) at least one of the k leading 1-bits of x is flipped; (2) the k leading 1-bits of x are not flipped and at least one of the last \(n-k\) bits is flipped. For case (1), the mutation probability is \(1-(1-\frac{1}{n})^k\) and the acceptance probability is at most \(p\frac{k+1}{n}\) by Eq. (14). For case (2), the mutation probability is \((1-\frac{1}{n})^{k}(1-(1-\frac{1}{n})^{n-k}) \le \frac{n-k}{n}\) and the acceptance probability is at most 1. Thus, we have

$$\begin{aligned} \mathrm {P}(X_{t+1} \ne i \mid X_t=i) \le p+\frac{n-k}{n}. \end{aligned}$$
(23)

When \(k<n-np\), we have

$$\begin{aligned}&\mathrm {E}(X_t-X_{t+1} \mid X_t=i)\le \frac{i}{n}-\frac{n-i-k}{6n}\nonumber \\&\quad \le -\frac{n-k}{12n}-\frac{np/2-7c\log n}{6n}\le -\frac{n-k}{12n} \le -\frac{1}{24}\left( p+\frac{n-k}{n}\right) , \end{aligned}$$
(24)

where the second inequality is by \(n-k>np\) and \(i< c \log n\), the third inequality is by \(p =\omega (\log n/n)\) and the last is by \(n-k>np\). When \(k\ge n-np\), we have

$$\begin{aligned}&\mathrm {E}(X_t-X_{t+1} \mid X_t=i)\le \frac{i}{n}-\frac{pk^2}{36n^2}\nonumber \\&\quad \le \frac{c\log n}{n}-\frac{p}{144}\le -\frac{p}{288} \le -\frac{1}{576}\left( p+\frac{n-k}{n}\right) , \end{aligned}$$
(25)

where the second inequality is by \(p=o(1)\) and \(i< c \log n\), the third is by \(p=\omega (\log n/n)\) and the last is by \(n-k \le np\). Combining Eqs. (23), (24) and (25), we get that condition (1) of Theorem 3 holds with \(\epsilon =\frac{1}{576}\).

For condition (2) of Theorem 3, we need to show \(\mathrm {P}(|X_{t+1}-X_{t}|\ge j \mid X_t =i) \le \frac{r(l)}{(1+\delta )^j}\cdot \mathrm {P}(X_{t+1} \ne i \mid X_t=i)\) for \(i \ge 1\). For \(\mathrm {P}(X_{t+1} \ne i \mid X_t=i)\), we analyze the n cases where only one bit is flipped. Using the similar analysis procedure as \(\mathrm {E}^-\), except that flipping any bit rather than only 1-bit is considered here, we easily get

$$\begin{aligned} \mathrm {P}(X_{t+1} \ne i \mid X_t=i) \ge \frac{pk(k+1)}{36n^2}+\frac{n-k}{6n}. \end{aligned}$$
(26)

For \(|X_{t+1}-X_{t}|\ge j\), it is necessary that at least j bits of x are flipped and the offspring solution \(x'\) is accepted. We consider two cases: (1) at least one of the k leading 1-bits is flipped; (2) the k leading 1-bits are not flipped. For case (1), the mutation probability is at most \(\frac{k}{n}\left( {\begin{array}{c}n-1\\ j-1\end{array}}\right) \frac{1}{n^{j-1}}\) and the acceptance probability is at most \(p\frac{k+1}{n}\) by Eq. (14). For case (2), the mutation probability is at most \((1-\frac{1}{n})^k\left( {\begin{array}{c}n-k\\ j\end{array}}\right) \frac{1}{n^{j}}\) and the acceptance probability is at most 1. Thus, we have

$$\begin{aligned}&\mathrm {P}(|X_{t+1}-X_{t}|\ge j \mid X_t =i) \nonumber \\&\quad \le \frac{k}{n}\left( {\begin{array}{c}n-1\\ j-1\end{array}}\right) \frac{1}{n^{j-1}} \cdot p\frac{k+1}{n} +\left( 1-\frac{1}{n}\right) ^k\left( {\begin{array}{c}n-k\\ j\end{array}}\right) \frac{1}{n^{j}}\nonumber \\&\quad \le \frac{pk(k+1)}{n^2} \cdot \frac{4}{2^j}+\frac{n-k}{n}\cdot \frac{2}{2^j} \le \left( \frac{pk(k+1)}{36n^2}+\frac{n-k}{6n}\right) \cdot \frac{144}{2^j}. \end{aligned}$$
(27)

By combining Eq. (26) with Eq. (27), we get that condition (2) of Theorem 3 holds with \(\delta =1\) and \(r(l)=144\).

Note that \(l=b-a=c \log n\). By Theorem 3, the expected running time is \(2^{\varOmega (c \log n)}\), where c is any positive constant. Thus, the expected running time is super-polynomial. \(\square \)

For \(p=\varOmega (1)\), we can use the negative drift theorem (i.e., Theorem 2) to derive a stronger result that the expected running time is exponentially lower bounded.

Theorem 10

For the (\(1+1\))-EA on LeadingOnes under bit-wise noise \((p,\frac{1}{n})\), the expected running time is exponential if \(p=\varOmega (1)\).

Proof

We use Theorem 2 to prove it. Let \(X_t=i\) be the number of 0-bits of the solution x after t iterations of the (\(1+1\))-EA. We consider the interval \(i \in [0,n^{1/2}]\). To analyze the drift \(\mathrm {E}(X_t-X_{t+1} \mid X_t=i)=\mathrm {E}^+-\mathrm {E}^-\), we use the same analysis procedure as Theorem 9. For the positive drift, we have \(\mathrm {E}^+ \le \frac{i}{n}=o(1)\). For the negative drift, we re-analyze Eqs. (20) and (21). From Eqs. (18) and (19), we get that \(\mathrm {P}(f^n(x') \ge j-1) \ge p(1-\frac{j-1}{n})\) and \(\mathrm {P}(f^n(x) \le j-1) \ge \frac{pj}{3n}\). Thus, Eq. (20) becomes

$$\begin{aligned} \mathrm {P}(f^n(x') \ge f^n(x)) \ge \frac{p^2j}{3n}\left( 1-\frac{j-1}{n}\right) . \end{aligned}$$
(28)

For Eq. (21), we need to analyze the acceptance probability for \({\mathrm {LO}}(x')={\mathrm {LO}}(x)=k\). Since it is sufficient to keep the first \((k+1)\) bits of x and \(x'\) unchanged in noise, Eq. (21) becomes

$$\begin{aligned} \mathrm {P}(f^n(x') \ge f^n(x)) \ge p^2\left( 1-\frac{1}{n}\right) ^{2(k+1)}\ge p^2\left( 1-\frac{k+1}{n}\right) ^{2}. \end{aligned}$$
(29)

By applying the above two inequalities to Eq. (22), we have

$$\begin{aligned} \mathrm {E}^- \ge \frac{p^2}{en} \left( \sum ^k_{j=1} \frac{j(n-j+1)}{3n^2}+\frac{(n-i-k)(n-1-k)^2}{n^2}\right) =\varOmega (1), \end{aligned}$$

where the equality is by \(p=\varOmega (1)\). Thus, \(\mathrm {E}^+-\mathrm {E}^-=-\varOmega (1)\). That is, condition (1) of Theorem 2 holds.

Since it is necessary to flip at least j bits of x, we have

$$\begin{aligned} \mathrm {P}(|X_{t+1}-X_{t}|\ge j \mid X_t \ge 1) \le \left( {\begin{array}{c}n\\ j\end{array}}\right) \frac{1}{n^j} \le \frac{1}{j!}\le 2 \cdot \frac{1}{2^j}, \end{aligned}$$

which implies that condition (2) of Theorem 2 holds with \(\delta =1\) and \(r(l)=2\). Note that \(l=n^{1/2}\). Thus, by Theorem 2, the expected running time is exponential. \(\square \)

4.2 Bit-Wise Noise (1, q)

For bit-wise noise (1, q), the proof idea is similar to that for bit-wise noise \((p,\frac{1}{n})\). The main difference led by the change of noise is the probability of accepting the offspring solution, i.e., \(\mathrm {P}(f^n(x') \ge f^n(x))\). We first prove that the expected running time is polynomial when \(q =O(\log n/n^3)\).

Theorem 11

For the (\(1+1\))-EA on LeadingOnes under bit-wise noise (1, q), the expected running time is polynomial if \(q =O(\log n/n^3)\).

Proof

The proof is very similar to that of Theorem 8. The change of noise only affects the probability of accepting the offspring solution in the analysis. For some positive constant b, suppose that \(q \le b \log n /n^3\).

For the positive drift \(\mathrm {E}^+\), we need to re-analyze \(\mathrm {P}(f^n(x') \ge f^n(x))\) (i.e., Eq. (9) in the proof of Theorem 8) for the parent x with \({\mathrm {LO}}(x)=i\) and the offspring \(x'\) with \({\mathrm {LO}}(x') \ge i+1\). By bit-wise noise (1, q), Eqs. (7) and (8) change to

$$\begin{aligned} \mathrm {P}(f^n(x') \le i-1)=1-(1-q)^i; \;\; \mathrm {P}(f^n(x) \ge i+1)=(1-q)^iq. \end{aligned}$$

Thus, by the union bound, Eq. (9) becomes

$$\begin{aligned}&\mathrm {P}(f^n(x') \ge f^n(x))\ge 1-(1-(1-q)^i+(1-q)^iq) \nonumber \\&\quad =(1-q)^{i+1}\ge 1-q(i+1) \ge 1-\theta , \end{aligned}$$
(30)

where the last inequality holds with sufficiently large n, since \(q=O(\log n/n^3)\) and \(\theta \in (0,1)\) is some constant close to 0.

For the negative drift \(\mathrm {E}^-\), we need to re-analyze \(\mathrm {P}(f^n(x') \ge f^n(x))\) (i.e., Eq. (14) in the proof of Theorem 8) for the parent x with \({\mathrm {LO}}(x)=i\) (where \(i \ge 1\)) and the offspring \(x'\) with \({\mathrm {LO}}(x') \le i-1\). By bit-wise noise (1, q), Eqs. (12) and (13) change to

$$\begin{aligned} \mathrm {P}(f^n(x') \ge i)\le q(1-q)^{i-1},\quad \mathrm {P}(f^n(x) \le i-1)=1-(1-q)^i. \end{aligned}$$

Thus, by the union bound, Eq. (14) becomes

$$\begin{aligned}&\mathrm {P}(f^n(x') \ge f^n(x))\le q(1-q)^{i-1}+1-(1-q)^i\nonumber \\&\quad =1-(1-q)^{i-1}(1-2q)\le 1-(1-(i-1)q)(1-2q)\le (i+1)q, \end{aligned}$$
(31)

where the second inequality is by \((1-q)^{i-1} \ge 1-(i-1)q\) and \(1-2q >0\) for \(q=O(\log n/n^3)\).

By applying Eq. (30) and Eq. (31) to \(\mathrm {E}^+\) and \(\mathrm {E}^-\), respectively, Eq. (16) changes to

$$\begin{aligned}&\mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)\ge \left( 1+\frac{c}{n}\right) ^i \left( \frac{(1-\theta )c}{3n^2}-\frac{2q(i+1)}{3}\right) \\&\quad \ge \left( 1+\frac{c}{n}\right) ^i \left( \frac{2b\log n+1-\theta }{3n^2}-\frac{2b n\log n }{3n^3}\right) \ge \frac{1-\theta }{3n^2}. \end{aligned}$$

That is, the condition of Theorem 1 still holds with \(\frac{1-\theta }{3n^2}\). Thus, the expected running time is polynomial. \(\square \)

Next we prove that the expected running time is super-polynomial when q is in the range of \(\omega (\log n/n^2) \cap o(1/n)\).

Theorem 12

For the (\(1+1\))-EA on LeadingOnes under bit-wise noise (1, q), if \(q =\omega (\log n/n^2) \cap o(1/n)\), the expected running time is super-polynomial.

Proof

We use the same analysis procedure as Theorem 9. The only difference is the probability of accepting the offspring solution \(x'\) due to the change of noise. For the positive drift, we still have \(\mathrm {E}^+ \le \frac{i}{n}\), since we optimistically assume that \(x'\) is always accepted in the proof of Theorem 9.

For the negative drift, we need to re-analyze \(\mathrm {P}(f^n(x') \ge f^n(x))\) for the parent solution x with \({\mathrm {LO}}(x)=k\) and the offspring solution \(x'\) with \({\mathrm {LO}}(x')=j-1\) (where \(1 \le j \le k+1\)). For \(j\le k\), to derive a lower bound on \(\mathrm {P}(f^n(x') \ge f^n(x))\), we consider the j cases where \(f^n(x)=l\) and \(f^n(x') \ge l\) for \(0 \le l \le j-1\). Since \(\mathrm {P}(f^n(x)=l)=(1-q)^lq\) and \(\mathrm {P}(f^n(x') \ge l)=(1-q)^l\), Eq. (20) changes to

$$\begin{aligned}&\mathrm {P}(f^n(x') \ge f^n(x)) \ge \sum ^{j-1}_{l=0} (1-q)^lq\cdot (1-q)^l\ge \frac{1-(1-q)^{2j}}{2}\nonumber \\&\quad =\frac{1}{2} (1-q)^{2j}\left( \left( 1+\frac{q}{1-q}\right) ^{2j}-1\right) \ge (1-q)^{2j} \frac{qj}{1-q} \ge \frac{qj}{2}, \end{aligned}$$
(32)

where the last inequality is by \((1-q)^{2j} \ge 1-2qj \ge 1/2\) since \(q =o(1/n)\). For \(j=k+1\) (i.e., \({\mathrm {LO}}(x')={\mathrm {LO}}(x)=k\)), we can use the same analysis as Eq. (30) to derive a lower bound \(1-q(k+1) \ge 1/2\), where the inequality is by \(q=o(1/n)\). Thus, Eq. (21) also holds here, i.e.,

$$\begin{aligned} \mathrm {P}(f^n(x') \ge f^n(x)) \ge \frac{1}{2}. \end{aligned}$$
(33)

By applying Eqs. (32) and (33) to \(\mathrm {E}^-\), Eq. (22) changes to

$$\begin{aligned} \mathrm {E}^- \ge \frac{qk^2}{12n}+\frac{n-i-k}{6n}. \end{aligned}$$

Thus, we have

$$\begin{aligned} \mathrm {E}(X_t-X_{t+1} \mid X_t=i)=\mathrm {E}^+-\mathrm {E}^-\le \frac{i}{n}-\frac{qk^2}{12n}-\frac{n-i-k}{6n}. \end{aligned}$$

For the upper bound analysis of \(\mathrm {P}(X_{t+1} \ne i \mid X_t=i)\) in the proof of Theorem 9, we only need to replace the acceptance probability \(p\frac{k+1}{n}\) in the case of \({\mathrm {LO}}(x') < {\mathrm {LO}}(x)\) with \((k+1)q\) (i.e., Eq. (31)). Thus, Eq. (23) changes to

$$\begin{aligned} \mathrm {P}(X_{t+1} \ne i \mid X_t=i) \le (k+1)q+\frac{n-k}{n} \le nq+\frac{n-k}{n}. \end{aligned}$$

To compare \(\mathrm {E}(X_t-X_{t+1} \mid X_t=i)\) with \(\mathrm {P}(X_{t+1} \ne i \mid X_t=i)\), we consider two cases: \(k < n-n^2q\) and \(k \ge n-n^2q\). By using \(q = \omega (\log n /n^2)\) and applying the same analysis procedure as Eqs. (24) and (25), we can derive that condition (1) of Theorem 3 holds with \(\epsilon =\frac{1}{192}\).

For the lower bound analysis of \(\mathrm {P}(X_{t+1} \ne i \mid X_t=i)\), by applying Eqs. (32) and (33), Eq. (26) changes to

$$\begin{aligned} \mathrm {P}(X_{t+1} \ne i \mid X_t=i) \ge \frac{qk(k+1)}{12n}+\frac{n-k}{6n}. \end{aligned}$$

For the analysis of \(|X_{t+1}-X_{t}|\ge j\), by replacing the acceptance probability \(p\frac{k+1}{n}\) in the case of \({\mathrm {LO}}(x') < {\mathrm {LO}}(x)\) with \((k+1)q\), Eq. (27) changes to

$$\begin{aligned} \mathrm {P}(|X_{t+1}-X_{t}|\ge j \mid X_t =i)&\le \frac{qk(k+1)}{n} \cdot \frac{4}{2^j}+\frac{n-k}{n}\cdot \frac{2}{2^j} \\&\le \left( \frac{qk(k+1)}{12n}+\frac{n-k}{6n}\right) \cdot \frac{48}{2^j}. \end{aligned}$$

That is, condition (2) of Theorem 3 holds with \(\delta =1, r(l)=48\). Thus, the expected running time is super-polynomial. \(\square \)

For \(q=\varOmega (1/n)\), we prove a stronger result that the expected running time is exponentially lower bounded.

Theorem 13

For the (\(1+1\))-EA on LeadingOnes under bit-wise noise (1, q), the expected running time is exponential if \(q=\varOmega (1/n)\).

Proof

We use Theorem 2 to prove it. Let \(X_t=i\) be the number of 0-bits of the solution x after t iterations of the (\(1+1\))-EA. We consider the interval \(i \in [0,n^{1/2}]\). To analyze the drift \(\mathrm {E}(X_t-X_{t+1} \mid X_t=i)\), we use the same analysis procedure as the proof of Theorem 9.

We first consider \(q = \varOmega (1/n) \cap o(1)\). We need to analyze the probability \(\mathrm {P}(f^n(x') \ge f^n(x))\), where the offspring solution \(x'\) is generated by flipping only one 1-bit of x. Let \({\mathrm {LO}}(x)=k\). For the case where the j-th (where \(1 \le j \le k\)) leading 1-bit is flipped, as the analysis of Eq. (32), we get

$$\begin{aligned} \mathrm {P}(f^n(x') \ge f^n(x)) \ge \frac{1-(1-q)^{2j}}{2} \ge (1-q)^{2j} \frac{qj}{1-q}. \end{aligned}$$

If \((1-q)^{2j}<\frac{1}{2}\), \(\frac{1-(1-q)^{2j}}{2} \ge \frac{1}{4}\); otherwise, \((1-q)^{2j} \frac{qj}{1-q} \ge \frac{qj}{2}\). Thus, we have

$$\begin{aligned} \mathrm {P}(f^n(x') \ge f^n(x)) \ge \min \{1/4,qj/2\}. \end{aligned}$$

For the case that flips one non-leading 1-bit (i.e., \({\mathrm {LO}}(x')={\mathrm {LO}}(x)=k\)), to derive a lower bound on \(\mathrm {P}(f^n(x') \ge f^n(x))\), we consider \(f^{n}(x)=l\) and \(f^{n}(x') \ge l\) for \(0 \le l \le k\). Thus,

$$\begin{aligned} \begin{aligned}&\mathrm {P}(f^n(x') \ge f^n(x)) \ge \sum ^{k-1}_{l=0} (1-q)^lq\cdot (1-q)^l+(1-q)^{k+1} \cdot (1-q)^{k}\\&\quad \ge \frac{1-(1-q)^{2k}}{2}+(1-q)^{2k+1}=\frac{1}{2}+(1-q)^{2k}\left( \frac{1}{2}-q\right) \ge \frac{1}{2}, \end{aligned} \end{aligned}$$

where the last inequality is by \(q=o(1)\). By applying the above two inequalities to Eq. (22), we get

$$\begin{aligned} \mathrm {E}^- \ge \frac{1}{en} \left( \sum ^{k}_{j=1}\min \left\{ \frac{1}{4},\frac{qj}{2}\right\} +\frac{n-i-k}{2}\right) . \end{aligned}$$

If \(k \ge \frac{n}{2}\), \(\sum ^{k}_{j=1}\min \{\frac{1}{4},\frac{qj}{2}\}=\varOmega (n)\) since \(q=\varOmega (1/n)\). If \(k <\frac{n}{2}\), \(\frac{n-i-k}{2}=\varOmega (n)\) since \(i \le \sqrt{n}\). Thus, \(\mathrm {E}^{-}=\varOmega (1)\).

For \(q=\varOmega (1)\), we use the trivial lower bound q on the probability of accepting the offspring solution \(x'\), since it suffices that noise flips the first leading 1-bit of x. Then,

$$\begin{aligned} \mathrm {E}^- \ge \frac{1}{en} (k q +(n-i-k) q)=\frac{(n-i)q}{en}=\varOmega (1). \end{aligned}$$

Thus, for \(q=\varOmega (1/n)\), we have

$$\begin{aligned} \mathrm {E}(X_t-X_{t+1} \mid X_t=i) =\mathrm {E}^+-\mathrm {E}^-\le \frac{i}{n}-\varOmega (1)=-\varOmega (1). \end{aligned}$$

That is, condition (1) of Theorem 2 holds. Its condition (2) trivially holds with \(\delta =1\) and \(r(l)=2\). Thus, the expected running time is exponential. \(\square \)

4.3 One-Bit Noise

For the (\(1+1\))-EA on LeadingOnes under one-bit noise, it has been known that the expected running time is polynomial if \(p \le 1/(6en^2)\) and exponential if \(p=1/2\) [19]. We extend this result by proving in Theorem 14 that the expected running time is polynomial if \(p =O(\log n/n^2)\) and super-polynomial if \(p=\omega (\log n/n)\). The proof can be accomplished in the same way as that of Theorems 8, 9 and 10 for bit-wise noise \((p,\frac{1}{n})\). This is because although the probabilities \(\mathrm {P}(f^n(x') \ge f^n(x))\) of accepting the offspring solution are different, their bounds used in the proofs for bit-wise noise \((p,\frac{1}{n})\) still hold for one-bit noise.

Theorem 14

For the (\(1+1\))-EA on LeadingOnes under one-bit noise, the expected running time is polynomial if \(p =O(\log n/n^2)\), super-polynomial if \(p=\omega (\log n/n) \cap o(1)\) and exponential if \(p=\varOmega (1)\).

Proof

We re-analyze \(\mathrm {P}(f^n(x') \ge f^n(x))\) for one-bit noise, and show that the bounds on \(\mathrm {P}(f^n(x') \ge f^n(x))\) used in the proofs for bit-wise noise \((p,\frac{1}{n})\) still hold for one-bit noise.

For the proof of Theorem 8, Eqs. (7) and (8) change to

$$\begin{aligned} \mathrm {P}(f^n(x') \le i-1)=p\frac{i}{n}, \qquad \mathrm {P}(f^n(x) \ge i+1)=p\frac{1}{n}, \end{aligned}$$

and thus Eq. (9) still holds; Eqs. (12) and (13) change to

$$\begin{aligned} \mathrm {P}(f^n(x') \ge i)\le p\frac{1}{n}, \qquad \mathrm {P}(f^n(x) \le i-1)=p\frac{i}{n}, \end{aligned}$$

and thus Eq. (14) still holds.

For the proof of Theorem 9, Eqs. (18) and (19) change to

$$\begin{aligned} \mathrm {P}(f^n(x') \ge j-1) = 1-p\frac{j-1}{n}, \quad \mathrm {P}(f^n(x) \le j-1)=p\frac{j}{n}, \end{aligned}$$

and thus Eq. (20) still holds.

For the proof of Theorem 10, Eq. (28) still holds by the above two equalities; Eq. (29) still holds since the probability of keeping the first \((k+1)\) bits of a solution unchanged in one-bit noise is \(1-p\frac{k+1}{n} \ge p(1-\frac{k+1}{n})\). \(\square \)

4.4 Experiments

In the previous three subsections, we have proved that for the (\(1+1\))-EA solving the LeadingOnes problem, under bit-wise noise \((p,\frac{1}{n})\) the expected running time is polynomial when \(p=O(\log n /n^2)\) and super-polynomial when \(p=\omega (\log n /n)\); under bit-wise noise (1, q) it is polynomial when \(q=O(\log n /n^3)\) and super-polynomial when \(q=\omega (\log n /n^2)\); and under one-bit noise it is polynomial when \(p=O(\log n /n^2)\) and super-polynomial when \(p=\omega (\log n /n)\). However, the current analysis does not cover the full ranges of p and q. We have therefore conducted experiments to complement the theoretical results.

For bit-wise noise \((p,\frac{1}{n})\), we do not know whether the running time is polynomial or super-polynomial when \(p=\omega (\log n/n^2) \cap O(\log n/n)\). We empirically estimate the expected running time for \(p=(\log n/n)^2\), \(\log n/n^{3/2}\) and \(\log n/n\). For each problem size n, we run the (\(1+1\))-EA 1000 times independently. In each run, we record the number of fitness evaluations until an optimal solution w.r.t. the true fitness function is found for the first time. The numbers of evaluations in the 1000 runs are then averaged as an estimate of the expected running time. To clearly show the relationship between the estimated expected running time and the problem size n, we plot the curve of \(\log (\text {estimated expected running time})/\log n\), as shown in Fig. 1. Note that in subfigures (a) and (b), the problem size n ranges from 5 to 100, while in subfigure (c), n ranges from 5 to 45. This is because the expected running time with \(n > 45\) in subfigure (c) is too large to be estimated. We can observe that all the curves continue to rise as n increases, which suggests that the expected running time is \(n^{\omega (1)}\), i.e., super-polynomial, for all three tested values of p. For bit-wise noise (1, q) and one-bit noise, we also empirically estimate the expected running time for values of q and p that are not covered by our theoretical analysis. The results are plotted in Figs. 2 and 3, respectively, and are similar to those observed for bit-wise noise \((p,\frac{1}{n})\).
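The following minimal Python sketch illustrates this estimation procedure (it is an illustration only, not the code used to produce the figures; the evaluation cap and the demonstration parameters are our own arbitrary choices). It simulates the (\(1+1\))-EA on LeadingOnes under bit-wise noise \((p,\frac{1}{n})\), counts two noisy fitness evaluations per iteration, and averages the counts over independent runs.

```python
import math
import random


def leading_ones(x):
    """True LeadingOnes fitness: the number of leading 1-bits."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def noisy_eval(x, p, q):
    """Bit-wise noise (p, q): with probability p, each bit is flipped
    independently with probability q before evaluation."""
    if random.random() < p:
        x = [b ^ 1 if random.random() < q else b for b in x]
    return leading_ones(x)


def one_plus_one_ea(n, p, q, max_evals=10**7):
    """Run the (1+1)-EA under noise and return the number of noisy fitness
    evaluations until the true optimum is first reached (max_evals is a
    safety cap used only for this illustration)."""
    x = [random.randint(0, 1) for _ in range(n)]
    evals = 0
    while leading_ones(x) < n and evals < max_evals:
        # standard bit-wise mutation: flip each bit with probability 1/n
        x_off = [b ^ 1 if random.random() < 1.0 / n else b for b in x]
        # one noisy evaluation of offspring and parent per iteration
        if noisy_eval(x_off, p, q) >= noisy_eval(x, p, q):
            x = x_off
        evals += 2
    return evals


def estimate_ert(n, p, q, runs=1000):
    """Average the evaluation counts over independent runs."""
    return sum(one_plus_one_ea(n, p, q) for _ in range(runs)) / runs


if __name__ == "__main__":
    for n in range(5, 30, 5):  # small sizes and few runs, for a quick demo
        ert = estimate_ert(n, math.log(n) / n, 1.0 / n, runs=100)
        print(n, math.log(ert) / math.log(n))
```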

Fig. 1

Estimated expected running time for the (\(1+1\))-EA on LeadingOnes under bit-wise noise \((p,\frac{1}{n})\), where the y-axis is (the logarithm of the estimated expected running time) divided by \(\log n\). a \(p=(\log n/n)^2\). b \(p=\log n/n^{3/2}\). c \(p=\log n/n\)

Fig. 2

Estimated expected running time for the (\(1+1\))-EA on LeadingOnes under bit-wise noise (1, q), where the y-axis is (the logarithm of the estimated expected running time) divided by \(\log n\). a \(q=(\log n)^2/n^3\). b \(q=\log n/n^{5/2}\). c \(q=\log n/n^2\)

Fig. 3

Estimated expected running time for the (\(1+1\))-EA on LeadingOnes under one-bit noise, where the y-axis is (the logarithm of the estimated expected running time) divided by \(\log n\). a \(p=(\log n/n)^2\). b \(p=\log n/n^{3/2}\). c \(p=\log n/n\)

Therefore, these empirical results suggest that the expected running time is super-polynomial for the ranges of p and q not covered by the theoretical analysis, and thus that the currently derived ranges of p and q allowing a polynomial running time might be tight. A rigorous analysis is not easy: we may need to analyze the transition probabilities between fitness levels more precisely, and to design an ingenious distance function or use more advanced analysis tools. We leave it as future work.

Fig. 4

Estimated expected running time (ERT) for the (\(1+1\))-EA on OneMax under noise. Note that for the three studied noise models, the largest noise levels allowing a polynomial running time derived in theoretical analysis are \(O(\log n/n)\), \(O(\log n/n^2)\) and \(O(\log n/n)\), respectively. a Bit-wise noise \((p,\frac{1}{n})\). b Bit-wise noise (1, q). c One-bit noise

Since the theoretical results are all asymptotic, we also empirically compare the expected running time to see when the asymptotic behaviors can be clearly distinguished. For each problem and each kind of noise, we estimate the expected running time for the largest noise level (denoted by a) allowing a polynomial running time derived in our theoretical analysis, and then compare it with the estimated expected running time for one relatively smaller noise level \(a/\log n\), and two relatively larger noise levels \(a\cdot \log n\) and \(a \cdot \sqrt{n}\). For example, for the OneMax problem under bit-wise noise \((p,\frac{1}{n})\), the largest noise level allowing a polynomial running time is \(O(\log n/n)\); thus we compare the estimated expected running time for \(p=1/n\), \(\log n/n\), \((\log n)^2/n\) and \(\log n/n^{1/2}\). The results are plotted in Figs. 4 and 5. We can observe that for the OneMax and LeadingOnes problems, the asymptotic polynomial behaviors (i.e., ‘green \(\circ \)’ and ‘red \(\times \)’) can be distinguished when the problem size n reaches nearly 200; while the asymptotic behaviors of polynomial (i.e., ‘green \(\circ \)’ and ‘red \(\times \)’) and super-polynomial (i.e., ‘blue \(\bigtriangleup \)’ and ‘black \(*\)’) can be clearly distinguished when n reaches 50 and 100, respectively.

5 The Robustness of Sampling to Noise

From the derived results in the above two sections, we can observe that the (\(1+1\))-EA is efficient for solving OneMax and LeadingOnes only under low noise levels. For example, for the (\(1+1\))-EA solving OneMax under bit-wise noise \((p,\frac{1}{n})\), the optimal solution can be found in polynomial time only when \(p=O(\log n/n)\). In this section, we analyze the robustness of the sampling strategy to noise. Sampling as presented in Definition 5 evaluates the fitness of a solution multiple (m) times independently and then uses the average to approximate the true fitness. We show that using sampling can significantly increase the largest noise level allowing a polynomial running time. For example, if using sampling with \(m=4n^3\), the (\(1+1\))-EA can always solve OneMax under bit-wise noise \((p,\frac{1}{n})\) in polynomial time, regardless of the value of p.
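As a minimal sketch of this strategy (illustration only; noisy_eval is assumed to be a function returning a single noisy fitness value \(f^n(x)\) under the noise model at hand), sampling and the resulting acceptance rule can be written as follows. Each acceptance decision costs 2m noisy fitness evaluations, which is why the running time bounds in this section multiply the number of iterations by 2m.

```python
def sampled_fitness(x, noisy_eval, m):
    """Sampling (Definition 5): average of m independent noisy evaluations."""
    return sum(noisy_eval(x) for _ in range(m)) / m


def accept_offspring(x, x_off, noisy_eval, m):
    """Acceptance rule of the (1+1)-EA using sampling: keep the offspring
    x_off if its estimated fitness is at least that of x.
    Each call costs 2 * m noisy fitness evaluations."""
    return sampled_fitness(x_off, noisy_eval, m) >= sampled_fitness(x, noisy_eval, m)
```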

Fig. 5

Estimated expected running time (ERT) for the (\(1+1\))-EA on LeadingOnes under noise. Note that for the three studied noise models, the largest noise levels allowing a polynomial running time derived in theoretical analysis are \(O(\log n/n^2)\), \(O(\log n/n^3)\) and \(O(\log n/n^2)\), respectively. a Bit-wise noise \((p,\frac{1}{n})\). b Bit-wise noise (1, q). c One-bit noise

5.1 The OneMax Problem

We prove in Theorems 15 and 18 that under bit-wise noise \((p,\frac{1}{n})\) or one-bit noise, the (\(1+1\))-EA can always solve OneMax in polynomial time by using sampling. For bit-wise noise (1, q), the tight range of q allowing a polynomial running time is \(1/2-1/n^{O(1)}\), as shown in Theorems 16 and 17. Let \(x^k\) denote any solution with k 1-bits, and \(f^n(x^{k})\) denote its noisy objective value. For proving polynomial upper bounds, we use Lemma 1, which gives a sufficient condition based on the probability \(\mathrm {P}(f^n(x^j) < f^{n}(x^{k+1}))\) for \(j \le k\). But for the (\(1+1\))-EA using sampling, the probability changes to \(\mathrm {P}(\hat{f}(x^j) < \hat{f}(x^{k+1}))\), where \(\hat{f}(x^j)=\frac{1}{m}\sum ^{m}_{i=1}f^n_i(x^j)\) as shown in Definition 5. Lemma 1 requires a lower bound on \(\mathrm {P}(\hat{f}(x^j) < \hat{f}(x^{k+1}))\). Our proof idea, as presented in Lemma 3, is to derive a lower bound on the expectation of \(f^{n}(x^{k+1})-f^n(x^j)\) and then apply Chebyshev’s inequality. We will directly use Lemma 3 in the following proofs. For proving super-polynomial lower bounds, we use Lemma 2 by replacing \(\mathrm {P}(f^n(x^k) < f^{n}(x^{k+1}))\) with \(\mathrm {P}(\hat{f}(x^k) < \hat{f}(x^{k+1}))\). Let poly(n) denote any polynomial in n.

Lemma 3

Suppose there exists a real number \(\delta > 0\) such that

$$\begin{aligned} \forall j\le k<n: \mathrm {E}\left( f^n(x^{k+1})-f^n(x^j)\right) \ge \delta , \end{aligned}$$

then the (\(1+1\))-EA using sampling with \(m=n^3/\delta ^2\) needs a polynomial number of iterations in expectation for solving noisy OneMax.

Proof

We use Lemma 1 to prove it. For any \(j\le k<n\), let \(Y_{k,j}=f^n(x^{k+1})-f^n(x^j)\) and \(\hat{Y}_{k,j}=\hat{f}(x^{k+1})-\hat{f}(x^j)\). We then need to analyze the probability \(\mathrm {P}(\hat{f}(x^j)<\hat{f}(x^{k+1}))=\mathrm {P}(\hat{Y}_{k,j}>0)\).

Denote the expectation \(\mathrm {E}(Y_{k,j})\) as \(\mu _{k,j}\) and the variance \(\mathrm {Var}(Y_{k,j})\) as \(\sigma ^2_{k,j}\). It is easy to verify that \(\mathrm {E}(\hat{Y}_{k,j})=\mu _{k,j}\) and \(\mathrm {Var}(\hat{Y}_{k,j})=\sigma ^2_{k,j}/m\). By Chebyshev’s inequality, we have

$$\begin{aligned} \mathrm {P}(\hat{Y}_{k,j} \le 0) \le \mathrm {P}(|\hat{Y}_{k,j} -\mu _{k,j}|\ge \mu _{k,j}/2) \le 4\sigma ^2_{k,j}/\left( m\mu ^2_{k,j}\right) . \end{aligned}$$

Since \(\mu _{k,j}\ge \delta >0\), \(\sigma ^2_{k,j}=\mathrm {E}(Y^2_{k,j})-\mu ^2_{k,j} \le n^2\) and \(m=n^3/\delta ^2\), we have

$$\begin{aligned} \mathrm {P}(\hat{Y}_{k,j} \le 0) \le 4/n \le \log n/(15n), \end{aligned}$$

where the last inequality holds with sufficiently large n. Let \(l=\log n\). Then, \(\mathrm {P}(\hat{Y}_{k,j}> 0) \ge 1-\frac{\log n}{15n} > 1- \frac{l}{n}\). Let \(c=\frac{1}{15}\). For \(k<n-l, \mathrm {P}(\hat{Y}_{k,j} >0) \ge 1-c\frac{n-k}{n}\). Thus, the condition of Lemma 1 (i.e., Eq. (1)) holds. We then get that the expected number of iterations is \(O(n\log n)+n2^{O(\log n)}=n^{O(1)}\), i.e., polynomial. \(\square \)

For bit-wise noise \((p,\frac{1}{n})\), we apply Lemma 3 to prove that the (\(1+1\))-EA using sampling with \(m=4n^3\) can always solve OneMax in polynomial time, regardless of the value of p.

Theorem 15

For the (\(1+1\))-EA on OneMax under bit-wise noise \((p,\frac{1}{n})\), if using sampling with \(m=4n^3\), the expected running time is polynomial.

Proof

We use Lemma 3 to prove it. Since \(\mathrm {E}(f^n(x^j))=j(1-\frac{p}{n})+(n-j)\frac{p}{n}=(1-\frac{2p}{n})j+p\), we have, for any \(j \le k < n\),

$$\begin{aligned} \mathrm {E}\left( f^n(x^{k+1})-f^n(x^j)\right) =\left( 1-\frac{2p}{n}\right) (k+1-j)\ge 1-\frac{2p}{n} \ge 1/2, \end{aligned}$$

where the last inequality holds with \(n \ge 4\). Thus, by Lemma 3, we get that the expected number of iterations of the (\(1+1\))-EA using sampling with \(m=4n^3\) is polynomial. Since each iteration takes \(2m=8n^3\) fitness evaluations, the expected running time is also polynomial. \(\square \)
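As a quick numerical sanity check of this expectation (illustration only; the concrete values of n, j, p and the sample size below are arbitrary), a Monte Carlo estimate of \(\mathrm {E}(f^n(x^j))\) under bit-wise noise \((p,\frac{1}{n})\) can be compared against the closed form \((1-\frac{2p}{n})j+p\):

```python
import random


def noisy_onemax(x, p, q):
    """One noisy OneMax evaluation under bit-wise noise (p, q)."""
    if random.random() < p:
        x = [b ^ 1 if random.random() < q else b for b in x]
    return sum(x)


def check_expectation(n=20, j=7, p=0.8, samples=200000):
    """Compare the empirical mean of f^n(x^j) with (1 - 2p/n) * j + p."""
    x = [1] * j + [0] * (n - j)  # any solution with j 1-bits behaves identically
    empirical = sum(noisy_onemax(x, p, 1.0 / n) for _ in range(samples)) / samples
    closed_form = (1 - 2 * p / n) * j + p
    print(round(empirical, 3), round(closed_form, 3))  # the two values should agree closely


if __name__ == "__main__":
    check_expectation()
```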

For bit-wise noise (1, q), we prove in the following two theorems that by using sampling, the tight range of q allowing a polynomial running time is \(1/2-1/n^{O(1)}\).

Theorem 16

For the (\(1+1\))-EA on OneMax under bit-wise noise (1, q) with \(q= 1/2-1/n^{O(1)}\), if using sampling, there exists some \(m = O(poly(n))\) such that the expected running time is polynomial.

Proof

We use Lemma 3 to prove it. Since \(q= 1/2-1/n^{O(1)}\), there exists a positive constant c such that \(q\le 1/2-1/n^c\). It is easy to verify that \(\mathrm {E}(f^n(x^j))=j(1-q)+(n-j)q = (1-2q)j+nq\). Thus, for any \(j \le k < n\),

$$\begin{aligned} \mathrm {E}\left( f^n(x^{k+1})-f^n(x^j)\right) =(1-2q)(k+1-j)\ge 1-2q \ge 2/n^{c}. \end{aligned}$$

By Lemma 3, we get that if using sampling with \(m=n^{3+2c}/4\), the expected number of iterations is polynomial, and then the expected running time is polynomial. Thus, the theorem holds. \(\square \)

Theorem 17

For the (\(1+1\))-EA on OneMax under bit-wise noise (1, q) with \(q=1/2-1/n^{\omega (1)}\) or \(q\ge 1/2\), if using sampling with any \(m=O(poly(n))\), the expected running time is exponential.

Proof

We use Lemma 2 to prove it. Note that for the (\(1+1\))-EA using sampling, we have to analyze \(\mathrm {P}(\hat{f}(x^k)< \hat{f}(x^{k+1}))\) instead of \(\mathrm {P}(f^n(x^k)< f^{n}(x^{k+1}))\).

Let Z denote a random variable which satisfies that \(\mathrm {P}(Z=0)=q\) and \(\mathrm {P}(Z=1)=1-q\). In the following proof, each \(Z_i\) is an independent random variable, which has the same distribution as Z. We have \( f^n(x^k)=\sum ^k_{i=1}Z_i+\sum ^n_{i=k+1}(1-Z_i)\), and then,

$$\begin{aligned}&f^n(x^{k+1})-f^n(x^k)\\&\quad =\sum _{i=1}^{k+1}Z_i+\sum _{i=k+2}^n(1-Z_i) -\sum _{i=n+1}^{n+k}Z_i-\sum _{i=n+k+1}^{2n}(1-Z_i) \\&\quad =\sum _{i=1}^{n+1}Z_i-\sum _{i=n+2}^{2n}Z_i-1. \end{aligned}$$

Since \(\hat{f}(x^k)=\frac{1}{m}\sum ^{m}_{i=1} f^n_i(x^k)\), which is the average of m independent evaluations, we have

$$\begin{aligned}&m\left( \hat{f}(x^{k+1})-\hat{f}(x^k)\right) =\sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+n+1}Z_i-\sum ^{m-1}_{j=0}\sum _{i=2nj+n+2}^{2n(j+1)}Z_i-m\\&\quad =\sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i +\left( \sum ^{m-1}_{j=0}\sum _{i=2nj+3}^{2nj+n+1}Z_i-\sum ^{m-1}_{j=0}\sum _{i=2nj+n+2}^{2n(j+1)}Z_i\right) -m\\&\quad =\sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i +Z^*-m, \end{aligned}$$

where \(Z^*=\sum ^{m-1}_{j=0}\sum _{i=2nj+3}^{2nj+n+1}Z_i-\sum ^{m-1}_{j=0}\sum _{i=2nj+n+2}^{2n(j+1)}Z_i\). To make \(\hat{f}(x^k) \ge \hat{f}(x^{k+1})\), it is sufficient that \(Z^* \le 0\) and \(\sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i \le m\). That is,

$$\begin{aligned} \mathrm {P}\left( \hat{f}(x^k) \ge \hat{f}(x^{k+1})\right) \ge \mathrm {P}(Z^* \le 0) \cdot \mathrm {P}\left( \sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i \le m\right) . \end{aligned}$$
(34)

Since \(Z^*\) is the difference between the sums of equally many independent \(Z_i\), \(Z^*\) has the same distribution as \(-Z^*\). Thus, \(\mathrm {P}(Z^*\le 0)+\mathrm {P}(Z^*\ge 0)=\mathrm {P}(Z^*\le 0)+\mathrm {P}(-Z^*\le 0)=2\mathrm {P}(Z^*\le 0)\ge 1\), which implies that

$$\begin{aligned} \mathrm {P}\left( Z^*\le 0\right) \ge 1/2. \end{aligned}$$
(35)

We then investigate \(\mathrm {P}(\sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i\le m)\). Since \(\sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i\) is the sum of 2m independent random variables which have the same distribution as Z, we have

$$\begin{aligned}&\mathrm {P}\left( \sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i \le m\right) =\sum _{t=0}^m\left( {\begin{array}{c}2m\\ t\end{array}}\right) (1-q)^tq^{2m-t},\\&\mathrm {P}\left( \sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i > m\right) =\sum _{t=m+1}^{2m}\left( {\begin{array}{c}2m\\ t\end{array}}\right) (1-q)^tq^{2m-t} =\sum _{t=0}^{m-1}\left( {\begin{array}{c}2m\\ t\end{array}}\right) (1-q)^{2m-t}q^t. \end{aligned}$$

For any \(t<m\), let \(r= \frac{(1-q)^tq^{2m-t}}{(1-q)^{2m-t}q^t} =(\frac{q}{1-q})^{2m-2t}\). If \(q\ge 1/2\), we have \(r\ge 1\). If \(q=1/2-1/n^{\omega (1)}\), we have

$$\begin{aligned} r&\ge \left( \frac{q}{1-q}\right) ^{2m}=\left( 1-\frac{1-2q}{1-q}\right) ^{2m}\\&\ge \left( 1-\frac{4}{n^{\omega (1)}}\right) ^{2m}\ge \left( \frac{1}{e}\right) ^{2m/(n^{\omega (1)}/4-1)}\ge \frac{1}{e}, \end{aligned}$$

where the first inequality is by \(q\le 1/2\), the second inequality is by \(1-2q=2/n^{\omega (1)}\) and \(1-q \ge 1/2\), and the last is by \(m=O(poly(n))\). Thus, \(\mathrm {P}(\sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i \le m)>1/3\cdot \mathrm {P}(\sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i > m)\), which implies that

$$\begin{aligned} \mathrm {P}\left( \sum ^{m-1}_{j=0}\sum _{i=2nj+1}^{2nj+2}Z_i \le m\right) >1/4. \end{aligned}$$
(36)

By applying Eqs. (35) and (36) to Eq. (34), we get

$$\begin{aligned} \mathrm {P}\left( \hat{f}(x^k) \ge \hat{f}(x^{k+1})\right) \ge 1/8. \end{aligned}$$

Let \(c=16\) and \(l=n/128\). For any \(n-l\le k<n\),

$$\begin{aligned} \mathrm {P}\left( \hat{f}(x^{k})< \hat{f}(x^{k+1})\right) =1-\mathrm {P} \left( \hat{f}(x^{k})\ge \hat{f}(x^{k+1})\right) \le 1- \frac{cl}{n}\le 1- \frac{c(n-k)}{n}, \end{aligned}$$

i.e., the condition of Lemma 2 holds. Thus, the expected number of iterations is \(2^{\varOmega (n/128)}\), and the expected running time is exponential. \(\square \)

For one-bit noise, we show that using sampling with \(m =4n^3\) is sufficient to make the (\(1+1\))-EA solve OneMax in polynomial time.

Theorem 18

For the (\(1+1\))-EA on OneMax under one-bit noise, if using sampling with \(m =4n^3\), the expected running time is polynomial.

Proof

It is easy to verify that the expectation of \(f^n(x^j)\) (i.e., \(\mathrm {E}(f^n(x^j))\)) under one-bit noise is the same as that under bit-wise noise \((p,\frac{1}{n})\). Thus, the proof can be finished in the same way as that of Theorem 15. \(\square \)

From the above analysis, we can intuitively explain why sampling is always effective for bit-wise noise \((p,\frac{1}{n})\) and one-bit noise, while it fails for bit-wise noise (1, q) when \(q=1/2-1/n^{\omega (1)}\) or \(q\ge 1/2\). Consider two solutions x and y with \(f(x) > f(y)\). Under bit-wise noise \((p,\frac{1}{n})\) and one-bit noise, the noisy fitness \(f^n(x)\) is larger than \(f^n(y)\) in expectation, and sampling reinforces this trend, making the probability of accepting the truly worse solution y sufficiently small. Under bit-wise noise (1, q) with \(q=1/2-1/n^{\omega (1)}\), the noisy fitness \(f^n(x)\) is still larger in expectation, but the gap is very small (of order \(1/n^{\omega (1)}\)), so a polynomial sample size is not sufficient to make the probability of accepting the truly worse solution y small enough; when \(q\ge 1/2\), the noisy fitness \(f^n(x)\) is smaller than \(f^n(y)\) in expectation, so sampling reinforces the wrong trend and obviously cannot work.
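Roughly speaking, the Chebyshev argument in the proof of Lemma 3 quantifies this intuition: to push the probability of accepting a truly worse solution below, say, \(1/n\), the sample size must satisfy approximately

$$\begin{aligned} m \gtrsim \frac{4n\sigma ^2}{\mu ^2}, \quad \text {where } \mu =\mathrm {E}\left( f^n(x^{k+1})-f^n(x^j)\right) \text { and } \sigma ^2 \le n^2. \end{aligned}$$

Under bit-wise noise \((p,\frac{1}{n})\) and one-bit noise, \(\mu \ge 1/2\), so \(m=O(n^3)\) suffices; under bit-wise noise (1, q) with \(q=1/2-1/n^{\omega (1)}\), taking \(j=k\) gives \(\mu =1-2q=2/n^{\omega (1)}\), so the required sample size is \(n^{\omega (1)}\), i.e., super-polynomial. This back-of-the-envelope calculation is only an illustration, but it is consistent with Theorems 15–18.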

5.2 The LeadingOnes Problem

The bit-wise noise \((p,\frac{1}{n})\) model is first considered. We prove in Theorem 19 that the (\(1+1\))-EA using sampling can solve the LeadingOnes problem in polynomial time, regardless of the value of p. The proof idea is similar to that of Theorem 8. The main difference is the probability of accepting the offspring solution \(x'\), which is changed from \(\mathrm {P}(f^n(x')\ge f^n(x))\) to \(\mathrm {P}(\hat{f}(x')\ge \hat{f}(x))\) due to sampling. Lemma 4 gives some bounds on this probability, which will be used in the proof of Theorem 19.

Lemma 4

For the LeadingOnes problem under bit-wise noise \((p,\frac{1}{n})\), if using sampling with \(m=144n^6\), it holds that

  1. (1)

    for any x with \({\mathrm {LO}}(x)=i<n\) and y with \({\mathrm {LO}}(y) \le i-2\) or \({\mathrm {LO}}(y)=i-1 \wedge y_{i+1}=0\), \(\mathrm {P}(\hat{f}(x) \le \hat{f}(y))\le 1/n^2\).

  2. (2)

    for any y with \({\mathrm {LO}}(y) <n\), \(\mathrm {P}(\hat{f}(1^n) \le \hat{f}(y))\le 1/(4n^4)\).

Proof

The proof is finished by deriving a lower bound on the expectation of \(f^n(x)-f^n(y)\) (which is equal to the expectation of \(\hat{f}(x)-\hat{f}(y)\)) and then applying Chebyshev’s inequality. We first consider case (1). For any x with \({\mathrm {LO}}(x)=i<n\),

$$\begin{aligned} \mathrm {E}(f^n(x))&\ge (1-p)\cdot i+ \sum _{j=1}^i p\left( 1-\frac{1}{n}\right) ^{j-1}\frac{1}{n}\cdot (j-1)\nonumber \\&\quad +p\left( 1-\frac{1}{n}\right) ^{i}\frac{1}{n}\cdot (i+1)+ p\left( 1-\frac{1}{n}\right) ^{i+1}\cdot i,\nonumber \\ \mathrm {E}(f^n(x))&\le (1-p)\cdot i+ \sum _{j=1}^{i}p\left( 1-\frac{1}{n}\right) ^{j-1}\frac{1}{n}\cdot (j-1) \nonumber \\&\quad +p\left( 1-\frac{1}{n}\right) ^{i}\frac{1}{n}\cdot n +p\left( 1-\frac{1}{n}\right) ^{i+1}\cdot i. \end{aligned}$$
(37)

Note that when flipping the first 0-bit of x and keeping the i leading 1-bits unchanged, the fitness is at least \(i+1\) and at most n. Then for any \(1\le i<n\), we have

$$\begin{aligned}&\mathrm {E}(f^n(x)-f^n(y)\mid {\mathrm {LO}}(x)=i \wedge {\mathrm {LO}}(y)=i-1)\nonumber \\&\quad \ge 1-p+p\left( 1-\frac{1}{n}\right) ^{i-1}\frac{1}{n}\cdot (i-1) +\frac{p}{n}\left( 1-\frac{1}{n}\right) ^{i-1}\cdot \left( \left( 1-\frac{1}{n}\right) (i+1)-n\right) \nonumber \\&\qquad +p\left( 1-\frac{1}{n}\right) ^{i}\left( \left( 1-\frac{1}{n}\right) i-(i-1)\right) \nonumber \\&\quad =1-p+p\left( 1-\frac{1}{n}\right) ^{i-1}\left( \frac{i-1}{n}-\frac{1}{n^2}\right) \ge \left\{ \begin{array}{lcl} -1/n^2, \; &{} \text {if} \;\;\; i=1 \\ 1/(6n), \; &{} \text {if} \;\;\; i\ge 2\\ \end{array} \right. . \end{aligned}$$
(38)

Thus, for any x with \({\mathrm {LO}}(x)=i\) and y with \({\mathrm {LO}}(y) \le i-2\) (where \(2 \le i<n\)), letting z be any solution with \({\mathrm {LO}}(z)={\mathrm {LO}}(y)+1\), we have

$$\begin{aligned} \mathrm {E}(f^n(x)-f^n(y))&=\mathrm {E}(f^n(x)-f^n(z))+\mathrm {E}(f^n(z)-f^n(y))\nonumber \\&\ge \frac{1}{6n} \cdot (i-{\mathrm {LO}}(y)-1)-\frac{1}{n^2} \ge \frac{1}{6n} -\frac{1}{n^2} \ge \frac{1}{12n}, \end{aligned}$$
(39)

where the first inequality is by repeatedly applying Eq. (38), the second inequality is by \({\mathrm {LO}}(y)\le i-2\), and the last holds with \(n \ge 12\).

When \({\mathrm {LO}}(y)=i-1 \wedge y_{i+1}=0\), we have to re-analyze Eq. (38) to derive a tighter lower bound, because applying Eq. (38) directly will lead to a negative lower bound for \(i=1\). We first derive a tighter upper bound on \(\mathrm {E}(f^n(y))\). When flipping the i-th bit of y and keeping the \(i-1\) leading 1-bits unchanged, we can now further consider the flipping of the \((i+1)\)-th bit since we know that \(y_{i+1}=0\), rather than directly using a trivial upper bound n on the noisy fitness. If \(y_{i+1}\) is not flipped, \(f^n(y)= i\); otherwise, \(f^n(y)\le n\). Thus, we get

$$\begin{aligned}&\mathrm {E}(f^n(y) \mid {\mathrm {LO}}(y)=i-1 \wedge y_{i+1}=0)\\&\quad \le (1-p)\cdot (i-1)+ \sum _{j=1}^{i-1}p\left( 1-\frac{1}{n}\right) ^{j-1}\frac{1}{n}\cdot (j-1) \\&\qquad +p\left( 1-\frac{1}{n}\right) ^{i-1}\frac{1}{n}\cdot \left( \frac{1}{n}n+\left( 1-\frac{1}{n}\right) i\right) +p\left( 1-\frac{1}{n}\right) ^{i}\cdot (i-1). \end{aligned}$$

By combining this inequality and the lower bound in Eq. (37), we have that, for any x with \({\mathrm {LO}}(x)=i\) and y with \({\mathrm {LO}}(y) = i-1 \wedge y_{i+1}=0\) (where \(1 \le i<n\)),

$$\begin{aligned}&\mathrm {E}(f^n(x)-f^n(y)) \nonumber \\&\quad \ge 1-p+p\left( 1-\frac{1}{n}\right) ^{i-1}\frac{1}{n}\cdot (i-1) +p\left( 1-\frac{1}{n}\right) ^{i}\left( \left( 1-\frac{1}{n}\right) i-(i-1)\right) \nonumber \\&\qquad +\frac{p}{n}\left( 1-\frac{1}{n}\right) ^{i-1}\cdot \left( \left( 1-\frac{1}{n}\right) (i+1)-\left( i+1-\frac{i}{n}\right) \right) \nonumber \\&\quad =1-p+p\left( 1-\frac{1}{n}\right) ^{i-1}\left( 1-\frac{2}{n}+\frac{i-1}{n^2}\right) \nonumber \\&\quad \ge 1-p+p\cdot \frac{1}{e}\cdot \frac{1}{2} \ge 1-\frac{5}{6}p \ge \frac{1}{6}, \end{aligned}$$
(40)

where the second inequality holds with \(n \ge 4\).

According to Eqs. (39) and (40), we have a unified lower bound 1 / (12n) on \(\mathrm {E}(f^n(x)-f^n(y))\). Denote \(\mathrm {E}(f^n(x)-f^n(y))\) as \(\mu \) and \(\mathrm {Var}(f^n(x)-f^n(y))\) as \(\sigma ^2\). We have \(\mu \ge 1/(12n)\) and \(\sigma ^2 \le n^2\) since \(|f^n(x)-f^n(y)| \le n\). As \(\hat{f}(x)\) is the average of \(m=144n^6\) independent evaluations, it is easy to verify that \(\mathrm {E}(\hat{f}(x)-\hat{f}(y))=\mu \) and \(\mathrm {Var}(\hat{f}(x)-\hat{f}(y))=\sigma ^2/m\). By Chebyshev’s inequality,

$$\begin{aligned} \mathrm {P}(\hat{f}(x) \le \hat{f}(y))\le \mathrm {P}(|(\hat{f}(x)-\hat{f}(y))-\mu |\ge \mu )\le \sigma ^2/(m\mu ^2)\le 1/n^2. \end{aligned}$$
(41)

Thus, case (1) holds.

For case (2), we first analyze \(\mathrm {E}(f^n(1^n)-f^n(1^{n-1}0))\). The expectation of \(f^n(1^n)\) can be easily calculated as follows:

$$\begin{aligned} \mathrm {E}(f^n(1^n))=(1-p)\cdot n+\sum _{j=1}^{n}p\left( 1-\frac{1}{n}\right) ^{j-1}\frac{1}{n}\cdot (j-1) +p\left( 1-\frac{1}{n}\right) ^n\cdot n. \end{aligned}$$

Combining this equality with the upper bound in Eq. (37), we get

$$\begin{aligned}&\mathrm {E}(f^n(1^n)-f^n(1^{n-1}0))\nonumber \\&\quad \ge 1-p+p\left( 1-\frac{1}{n}\right) ^{n-1}\frac{1}{n}\cdot (n-1) +p\left( 1-\frac{1}{n}\right) ^{n}-p\left( 1-\frac{1}{n}\right) ^{n-1} \nonumber \\&\quad =1-p+p\left( 1-\frac{1}{n}\right) ^{n-1}\left( 1-\frac{2}{n}\right) \ge \frac{1}{6}. \end{aligned}$$
(42)

Then, for any y with \({\mathrm {LO}}(y)<n\), we have

$$\begin{aligned} \mathrm {E}(f^n(1^n)-f^n(y))= & {} \mathrm {E}(f^n(1^n)-f^n(1^{n-1}0))\\&+ \sum ^{n-1}_{k={\mathrm {LO}}(y)+1}\, \mathrm {E}\left( f^n(z^{k})-f^n\left( z^{k-1}\right) \right) \ge \frac{1}{6}, \end{aligned}$$

where \(z^{n-1}=1^{n-1}0\), \(z^k \; ({\mathrm {LO}}(y)<k<n-1)\) denotes one solution z with \({\mathrm {LO}}(z)=k\), \(z^{{\mathrm {LO}}(y)}=y\), and the inequality is by applying Eqs. (42) and (38). As in the analysis of \(\mathrm {P}(\hat{f}(x)\le \hat{f}(y))\) in case (1) (i.e., Eq. (41)), we can similarly use Chebyshev’s inequality to derive that (noting that \(\mu \ge 1/6\) here)

$$\begin{aligned} \mathrm {P}(\hat{f}(1^{n})\le \hat{f}(y)) \le \sigma ^2/(m\mu ^2) \le 1/(4n^4). \end{aligned}$$

Thus, case (2) holds. \(\square \)

The following theorem shows that for bit-wise noise \((p,\frac{1}{n})\), using sampling with \(m=144n^6\) is sufficient to make the (\(1+1\))-EA solve LeadingOnes in polynomial time.

Theorem 19

For the (\(1+1\))-EA on LeadingOnes under bit-wise noise \((p,\frac{1}{n})\), if using sampling with \(m=144n^6\), the expected running time is polynomial.

Proof

We use Theorem 1 to prove it. We first construct a distance function V(x) as, for any x with \({\mathrm {LO}}(x)=i\),

$$\begin{aligned} V(x)=\left( 1+\frac{c}{n}\right) ^n-\left( 1+\frac{c}{n}\right) ^i, \end{aligned}$$

where \(c=13\). Then, we investigate \(\mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)\) for any x with \({\mathrm {LO}}(x)<n\). Assume that currently \({\mathrm {LO}}(x)=i\), where \(0\le i \le n-1\). We divide the drift into two parts: positive \(\mathrm {E}^+\) and negative \(\mathrm {E}^-\). That is,

$$\begin{aligned} \mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)=\mathrm {E}^+-\mathrm {E}^-. \end{aligned}$$

The positive drift \(\mathrm {E}^+\) can be expressed as Eq. (4), except that \(\mathrm {P}(f^n(x')\ge f^n(x))\) changes to \(\mathrm {P}(\hat{f}(x')\ge \hat{f}(x))\) due to sampling. To derive a lower bound on \(\mathrm {E}^+\), we only consider the case that the \((i+1)\)-th bit of x is flipped and the other bits remain unchanged, which happens with probability \(\frac{1}{n}(1-\frac{1}{n})^{n-1}\). The only difference between \(x'\) and x is the \((i+1)\)-th bit, and \({\mathrm {LO}}(x')>{\mathrm {LO}}(x)=i\). If \({\mathrm {LO}}(x')=n\), \(\mathrm {P}(\hat{f}(x') \le \hat{f}(x)) \le 1/(4n^4)\) by case (2) of Lemma 4, and then \(\mathrm {P}(\hat{f}(x') \ge \hat{f}(x)) \ge 1-1/(4n^4)\). If \({\mathrm {LO}}(x')<n\), it must hold that \({\mathrm {LO}}(x)\le {\mathrm {LO}}(x')-1 \wedge x_{{\mathrm {LO}}(x')+1}=x'_{{\mathrm {LO}}(x')+1}=0\). By case (1) of Lemma 4, \(\mathrm {P}(\hat{f}(x') \le \hat{f}(x)) \le 1/n^2\), and then \(\mathrm {P}(\hat{f}(x') \ge \hat{f}(x)) \ge 1-1/n^2\). Thus, the probability of accepting the offspring solution \(x'\) is at least \(1/2\). Since \({\mathrm {LO}}(x')>i\), \(V(x)-V(x') \ge (1+\frac{c}{n})^{i+1}-(1+\frac{c}{n})^i=\frac{c}{n}(1+\frac{c}{n})^i\). Then, \(\mathrm {E}^+\) can be lower bounded as follows:

$$\begin{aligned} \mathrm {E}^+ \ge \left( 1-\frac{1}{n}\right) ^{n-1}\frac{1}{n}\cdot \frac{1}{2} \cdot \frac{c}{n}\left( 1+\frac{c}{n}\right) ^i\ge \frac{c}{6n^2}\left( 1+\frac{c}{n}\right) ^i. \end{aligned}$$

For the negative drift \(\mathrm {E}^{-}\), we need to consider \({\mathrm {LO}}(x')<{\mathrm {LO}}(x)=i\). Since \(V(x')-V(x) \le V(0^n)-V(x)= (1+\frac{c}{n})^i-1\), Eq. (5) becomes

$$\begin{aligned} \mathrm {E}^-\le \left( \left( 1+\frac{c}{n}\right) ^i-1\right) \sum _{x': {\mathrm {LO}}(x')<i}\mathrm {P}_{mut}(x,x')\cdot \mathrm {P}(\hat{f}(x') \ge \hat{f}(x)). \end{aligned}$$

We further divide \({\mathrm {LO}}(x')<i\) into two cases. If \({\mathrm {LO}}(x')\le i-2\) or \({\mathrm {LO}}(x')= i-1 \wedge x'_{i+1}=0\), then \(\mathrm {P}(\hat{f}(x') \ge \hat{f}(x)) \le 1/n^2\) by case (1) of Lemma 4. If \({\mathrm {LO}}(x')= i-1 \wedge x'_{i+1}=1\), then \(\sum _{x': {\mathrm {LO}}(x')= i-1 \wedge x'_{i+1}=1}\mathrm {P}_{mut}(x,x') \le 1/n^2\) since it is necessary to flip the i-th and the \((i+1)\)-th bits of x in mutation. Then, we get

$$\begin{aligned} \mathrm {E}^-\le \left( \left( 1+\frac{c}{n}\right) ^i-1\right) \cdot \left( 1\cdot \frac{1}{n^2} +\frac{1}{n^2}\cdot 1\right) \le \frac{2}{n^2}\left( 1+\frac{c}{n}\right) ^i. \end{aligned}$$

By subtracting \(\mathrm {E}^-\) from \(\mathrm {E}^+\), we have, noting that \(c=13\),

$$\begin{aligned} \mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)&\ge \left( 1+\frac{c}{n}\right) ^i \cdot \left( \frac{c}{6n^2}-\frac{2}{n^2}\right) \ge \frac{1}{6n^2}. \end{aligned}$$

Since \(V(x) \le (1+\frac{13}{n})^n \le e^{13}=O(1)\), we have \(\mathrm {E}(\tau \mid \xi _0)=O(n^{2})\) by Theorem 1. Each iteration of the (\(1+1\))-EA using sampling takes \(2m=288n^6\) fitness evaluations; thus the expected running time is polynomial. \(\square \)

For bit-wise noise (1, q), we prove in Theorems 20 and 21 that the expected running time is polynomial if and only if \(q = O(\log n/n)\). The proof of Theorem 20 is similar to that of Theorem 19, which considers bit-wise noise \((p,\frac{1}{n})\). The main difference is the probability of accepting the offspring solution \(x'\) (i.e., \(\mathrm {P}(\hat{f}(x') \ge \hat{f}(x))\)), due to the change of noise. Lemma 5 gives some bounds on this probability, which will be used in the proof of Theorem 20.

Lemma 5

For the LeadingOnes problem under bit-wise noise (1, q) with \(q\le c_0\log n/n\) (where \(c_0\) is a positive constant), if using sampling with \(m=36n^{2c_0+4}\), it holds that

  1. (1)

    for any x with \({\mathrm {LO}}(x)=i<n\) and y with \({\mathrm {LO}}(y) \le i-5c_0\log n\) or \(i-5c_0\log n<{\mathrm {LO}}(y)\le i-1 \wedge y_{i+1}=0\), \(\mathrm {P}(\hat{f}(x) \le \hat{f}(y))\le 1/n^2\).

  2. (2)

    for any y with \({\mathrm {LO}}(y) <n\), \(\mathrm {P}(\hat{f}(1^n) \le \hat{f}(y))\le 1/n^2\).

Proof

The proof is finished by deriving a lower bound on the expectation of \(f^n(x)-f^n(y)\) and then applying Chebyshev’s inequality. We first consider case (1). For any x with \({\mathrm {LO}}(x)=i<n\),

$$\begin{aligned} \begin{aligned} \mathrm {E}(f^n(x))&\ge \sum _{j=1}^i (1-q)^{j-1}q\cdot (j-1) +(1-q)^{i}q\cdot (i+1)+(1-q)^{i+1}\cdot i,\\ \mathrm {E}(f^n(x))&\le \sum _{j=1}^i (1-q)^{j-1}q\cdot (j-1)+(1-q)^{i}q\cdot n+(1-q)^{i+1}\cdot i. \end{aligned} \end{aligned}$$
(43)

By applying these two inequalities, we get, for any \(k<i<n\),

$$\begin{aligned}&\mathrm {E}(f^n(x)-f^n(y) \mid {\mathrm {LO}}(x)=i \wedge {\mathrm {LO}}(y)=k)\nonumber \\&\quad \ge \frac{1}{q}((i-1)(1-q)^{i+1}-i(1-q)^i) +q(1-q)^i\cdot (i+1)+(1-q)^{i+1}\cdot i\nonumber \\&\qquad -\frac{1}{q}((k-1)(1-q)^{k+1}-k(1-q)^k)-q(1-q)^k\cdot n -(1-q)^{k+1}\cdot k \nonumber \\&\quad =(1-q)^i\left( q+1-\frac{1}{q}\right) -(1-q)^k\left( 1-\frac{1}{q}+qn-qk\right) \nonumber \\&\quad =(1-q)^k\left( \left( \frac{1}{q}-q-1\right) (1-(1-q)^{i-k})+q+qk-qn\right) . \end{aligned}$$
(44)

Thus, for any x with \({\mathrm {LO}}(x)=i\) and y with \({\mathrm {LO}}(y)=k\le i-5c_0\log n\) (where \(5c_0\log n\le i<n\)),

$$\begin{aligned} \mathrm {E}(f^n(x)-f^n(y))&\ge (1-q)^k\left( \left( \frac{1}{q}-2\right) (1-(1-q)^{5c_0\log n})-qn\right) \nonumber \\&\ge (1-q)^k\left( \left( \frac{1}{q}-2\right) \left( 1-\frac{1}{e^{5c_0q\log n}}\right) -qn\right) \nonumber \\&\ge (1-q)^k\left( \left( \frac{1}{q}-2\right) \left( 1-\frac{1}{1+5c_0q\log n}\right) -qn\right) \nonumber \\&\ge (1-q)^k\left( \left( \frac{1}{q}-2\right) \frac{5c_0q\log n}{2}-qn\right) \nonumber \\&\ge (1-q)^k\left( \frac{5}{2}c_0\log n-5\frac{(c_0\log n)^2}{n}-c_0\log n\right) \nonumber \\&=(1-q)^k\left( \frac{3}{2}c_0\log n-o(1)\right) \ge \frac{c_0\log n}{en^{c_0}}, \end{aligned}$$
(45)

where the third inequality is by \(e^x\ge 1+x\), the fourth is by \(5c_0q \log n \le 5(c_0\log n)^2/n \le 1\) for sufficiently large n, the fifth is by \(q \le c_0\log n /n\), and the last is by

$$\begin{aligned} (1-q)^k\ge \left( 1-\frac{c_0\log n}{n}\right) ^{n-1}\ge \left( \frac{1}{e}\right) ^{\frac{n-1}{n/(c_0\log n)-1}}\ge \left( \frac{1}{e}\right) ^{c_0\log n+1} = \frac{1}{en^{c_0}}. \end{aligned}$$
(46)

When \(i-5c_0\log n<{\mathrm {LO}}(y)\le i-1 \wedge y_{i+1}=0\) (where \(i\ge 1\)), we calculate \(\mathrm {E}(f^n(x)-f^n(y))\) by

$$\begin{aligned} \mathrm {E}(f^n(x)-f^n(y))=\sum ^{i}_{k={\mathrm {LO}}(y)+1} \mathrm {E}\left( f^n(z^k)-f^n\left( z^{k-1}\right) \right) , \end{aligned}$$

where \(z^i=x\), \(z^k \; ({\mathrm {LO}}(y)<k<i)\) denotes one solution z with \({\mathrm {LO}}(z)=k \wedge z_{i+1}=0\), and \(z^{{\mathrm {LO}}(y)}=y\). We then give a lower bound on \(\mathrm {E}(f^n(z^k)-f^n(z^{k-1}))\), where \({\mathrm {LO}}(y)+1\le k\le i\). For \(\mathrm {E}(f^n(z^k))\), we directly use the lower bound in Eq. (43) to get that

$$\begin{aligned} \mathrm {E}(f^n(z^{k})) \ge \sum _{j=1}^k (1-q)^{j-1}q\cdot (j-1) +(1-q)^{k}q\cdot (k+1)+(1-q)^{k+1}\cdot k. \end{aligned}$$

For \(\mathrm {E}(f^n(z^{k-1}))\), instead of directly using the upper bound in Eq. (43), we derive a tighter one:

$$\begin{aligned} \mathrm {E}(f^n(z^{k-1}))\le \sum _{j=1}^{k-1} (1-q)^{j-1}q\cdot (j-1)+(1-q)^{k-1}q(q\cdot n+(1-q)\cdot i)+(1-q)^{k}\cdot (k-1). \end{aligned}$$

Note that the inequality holds because when the \(k-1\) leading 1-bits of \(z^{k-1}\) remain unchanged and its k-th bit (which must be 0) is flipped, we have \(f^n(z^{k-1})\le n\) if the \((i+1)\)-th bit (which is 0) is also flipped, and \(f^n(z^{k-1})\le i\) otherwise. By combining the above two inequalities, we get

$$\begin{aligned} \mathrm {E}\left( f^n(z^{k})-f^n(z^{k-1})\right)&\ge (1-q)^{k-1}(1+q(k-i-1)-q^2(1+n-i)) \\&=(1-q)^{k-1}(1-o(1)) \ge 1/(2en^{c_0}), \end{aligned}$$

where the equality is by \(k \ge {\mathrm {LO}}(y)+1 >i-5c_0\log n+1\) and \(q =O(\log n/n)\), and the last inequality is by Eq. (46). Thus, we have

$$\begin{aligned} \mathrm {E}(f^n(x)-f^n(y)) \ge (i-{\mathrm {LO}}(y))\cdot \frac{1}{2en^{c_0}} \ge \frac{1}{2en^{c_0}}. \end{aligned}$$
(47)

For case (1), by combining Eqs. (45) and (47), we get a unified lower bound \(1/(2en^{c_0})\) on \(\mathrm {E}(f^n(x)-f^n(y))\). As in the analysis of \(\mathrm {P}(\hat{f}(x) \le \hat{f}(y))\) (i.e., Eq. (41)) in the proof of Lemma 4, we can similarly use Chebyshev’s inequality to derive that (noting that \(m=36n^{2c_0+4}\) here)

$$\begin{aligned} \mathrm {P}(\hat{f}(x) \le \hat{f}(y))\le \sigma ^2/(m\mu ^2) \le 1/n^2, \end{aligned}$$
(48)

where \(\mu \) and \(\sigma ^2\) are the expectation and variance of \(f^n(x)-f^n(y)\), respectively. Thus, case (1) holds.

Then, we consider case (2), that is, we are to analyze \(\mathrm {P}(\hat{f}(1^n)\le \hat{f}(y))\) with \({\mathrm {LO}}(y)<n\). We calculate \(\mathrm {E}(f^n(1^n)-f^n(y))\) as follows:

$$\begin{aligned} \mathrm {E}(f^n(1^n)-f^n(y))=\mathrm {E}(f^n(1^n)-f^n(1^{n-1}0))+\mathrm {E}(f^n(1^{n-1}0)-f^n(y)). \end{aligned}$$

It is easy to derive that

$$\begin{aligned} \mathrm {E}(f^n(1^n))&=\sum _{j=1}^{n}(1-q)^{j-1}q\cdot (j-1)+(1-q)^n\cdot n,\\ \mathrm {E}(f^n(1^{n-1}0))&=\sum _{j=1}^{n-1}(1-q)^{j-1}q\cdot (j-1)+(1-q)^{n-1}q\cdot n+(1-q)^{n}\cdot (n-1). \end{aligned}$$

Then, we have

$$\begin{aligned} \mathrm {E}(f^n(1^n)-f^n(1^{n-1}0))=(1-q)^{n-1}(1-2q) \ge 1/(2en^{c_0}), \end{aligned}$$

where the inequality is by Eq. (46) and \(q=O(\log n/n)\). If \({\mathrm {LO}}(y)\le n-1-5c_0\log n\), by Eq. (45), we directly have \(\mathrm {E}(f^n(1^{n-1}0)-f^n(y))\ge \frac{c_0\log n}{en^{c_0}} \ge 0\). If \({\mathrm {LO}}(y)\ge n-5c_0\log n\), \(\mathrm {E}(f^n(1^{n-1}0)-f^n(y))\) is calculated as follows:

$$\begin{aligned} \mathrm {E}(f^n(1^{n-1}0)-f^n(y)) =\sum ^{n-1}_{k={\mathrm {LO}}(y)+1} \mathrm {E}(f^n(z^k)-f^n(z^{k-1})), \end{aligned}$$

where \(z^{n-1}=1^{n-1}0\), \(z^k \; ({\mathrm {LO}}(y)<k<n-1)\) denotes one solution z with \({\mathrm {LO}}(z)=k\), and \(z^{\mathrm {LO(y)}}=y\). By Eq. (44), we have, for any \({\mathrm {LO}}(y)+1 \le k<n\),

$$\begin{aligned}&\mathrm {E}\left( f^n(z^{k})-f^n(z^{k-1})\right) \ge (1-q)^{k-1}(1-q^2+q(k-1-n)) \\&\quad \ge (1-q)^{k-1}(1-q^2-5c_0q\log n)=(1-q)^{k-1}(1-o(1)) \ge 0, \end{aligned}$$

which implies that \(\mathrm {E}(f^n(1^{n-1}0)-f^n(y))\ge 0\). Then, we get

$$\begin{aligned} \mathrm {E}(f^n(1^n)-f^n(y))=\mathrm {E}(f^n(1^n)-f^n(1^{n-1}0))+\mathrm {E}(f^n(1^{n-1}0)-f^n(y)) \ge \frac{1}{2en^{c_0}}. \end{aligned}$$

As in the analysis of case (1) (i.e., Eq. (48)), we can get that

$$\begin{aligned} \mathrm {P}(\hat{f}(1^n) \le \hat{f}(y)) \le 1/n^2. \end{aligned}$$

Thus, case (2) holds. \(\square \)

The following theorem shows that by using sampling, the (\(1+1\))-EA can solve LeadingOnes under bit-wise noise (1, q) in polynomial time when q is in the range of \(O(\log n/n)\).

Theorem 20

For the (\(1+1\))-EA on LeadingOnes under bit-wise noise (1, q) with \(q=O(\log n/n)\), if using sampling, there exists some \(m= O(poly(n))\) such that the expected running time is polynomial.

Proof

Since \(q=O(\log n/n)\), there exists a positive constant \(c_0\) such that for all n large enough, \(q\le c_0\log n/n\). We prove that if using sampling with \(m=36n^{2c_0+4}\), the expected running time is polynomial.

The proof is similar to that of Theorem 19. The distance function V(x) is defined as, for any x with \({\mathrm {LO}}(x)=i\), \(V(x)=\left( 1+\frac{c}{n}\right) ^n-\left( 1+\frac{c}{n}\right) ^i\), where \(c=30c_0\log n+7\). Assume that currently \({\mathrm {LO}}(x)=i\), where \(0\le i \le n-1\). For the positive drift \(\mathrm {E}^+\), we consider that only the \((i+1)\)-th bit (i.e., the first 0-bit) of x is flipped in mutation. If \({\mathrm {LO}}(x')=n\), \(\mathrm {P}(\hat{f}(x') \le \hat{f}(x)) \le 1/n^2\) by case (2) of Lemma 5. If \({\mathrm {LO}}(x')<n\), it must hold that \({\mathrm {LO}}(x)\le {\mathrm {LO}}(x')-1 \wedge x_{{\mathrm {LO}}(x')+1}=x'_{{\mathrm {LO}}(x')+1}=0\), since x and \(x'\) are the same except the \(({\mathrm {LO}}(x)+1)\)-th bit. By case (1) of Lemma 5, \(\mathrm {P}(\hat{f}(x') \le \hat{f}(x)) \le 1/n^2\). Thus, the probability of accepting the offspring solution \(x'\) is \(\mathrm {P}(\hat{f}(x') \ge \hat{f}(x)) \ge 1-1/n^2 \ge 1/2\). The positive drift then can be lower bounded by

$$\begin{aligned} \mathrm {E}^+ \ge \left( 1-\frac{1}{n}\right) ^{n-1}\frac{1}{n}\cdot \frac{1}{2} \cdot \frac{c}{n}\left( 1+\frac{c}{n}\right) ^i\ge \frac{c}{6n^2}\left( 1+\frac{c}{n}\right) ^i. \end{aligned}$$

For the negative drift \(\mathrm {E}^{-}\), we need to consider \({\mathrm {LO}}(x')<{\mathrm {LO}}(x)=i\). We further divide \({\mathrm {LO}}(x')<i\) into two cases. If \({\mathrm {LO}}(x')\le i-5c_0\log n\) or \(i-5c_0\log n<{\mathrm {LO}}(x')\le i-1 \wedge x'_{i+1}=0\), then \(\mathrm {P}(\hat{f}(x') \ge \hat{f}(x)) \le 1/n^2\) by case (1) of Lemma 5. If \(i-5c_0\log n<{\mathrm {LO}}(x')\le i-1 \wedge x'_{i+1}=1\), we consider the probability of generating \(x'\) by mutation on x. Since it is necessary to flip the \((i+1)\)-th bit of x and at least one 1-bit in positions \(i-5c_0\log n+2\) to i simultaneously,

$$\begin{aligned} \sum _{x': i-5c_0\log n<{\mathrm {LO}}(x')\le i-1 \wedge x'_{i+1}=1}\mathrm {P}_{mut}(x,x') \le \frac{5c_0\log n}{n^2}. \end{aligned}$$

Then, we get

$$\begin{aligned} \mathrm {E}^-\le \left( \left( 1+\frac{c}{n}\right) ^i-1\right) \cdot \left( 1\cdot \frac{1}{n^2} +\frac{5c_0\log n}{n^2}\cdot 1\right) \le \frac{5c_0\log n+1}{n^2}\left( 1+\frac{c}{n}\right) ^i. \end{aligned}$$

By subtracting \(\mathrm {E}^-\) from \(\mathrm {E}^+\), we have, noting that \(c=30c_0\log n+7\),

$$\begin{aligned} \mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x) \ge \left( 1+\frac{c}{n}\right) ^i \cdot \left( \frac{c}{6n^2}-\frac{5c_0\log n+1}{n^2}\right) \ge \frac{1}{6n^2}. \end{aligned}$$

Note that \(V(x) \le (1+\frac{c}{n})^n \le e^c = e^{30c_0\log n+7}= n^{O(1)}\). By Theorem 1, we have \(\mathrm {E}(\tau \mid \xi _0)\le 6n^2\cdot n^{O(1)}\). Since each iteration of the (\(1+1\))-EA using sampling takes \(2m=72n^{2c_0+4}\) fitness evaluations, the expected running time is polynomial. \(\square \)

For bit-wise noise (1, q) with \(q=\omega (\log n/n)\), we apply the negative drift theorem (i.e., Theorem 2) to prove that using sampling still cannot guarantee a polynomial running time.

Theorem 21

For the (\(1+1\))-EA on LeadingOnes under bit-wise noise (1, q) with \(q=\omega (\log n/n)\), if using sampling with any \(m= O(poly(n))\), the expected running time is exponential.

Proof

We use Theorem 2 to prove it. Let \(X_t=|x|_0\) be the number of 0-bits of the solution x after t iterations of the algorithm. We consider the interval [0, n / 50], that is, the parameters \(a=0\) and \(b=n/50\) in Theorem 2. Then, we analyze the drift \(\mathrm {E}(X_t-X_{t+1}\mid X_t=i)\) for \(1\le i<n/50\). As in the proof of Theorem 9, we divide the drift into two parts: positive \(\mathrm {E}^+\) and negative \(\mathrm {E}^-\). That is,

$$\begin{aligned} \mathrm {E}(X_t-X_{t+1}\mid X_t=i)=\mathrm {E}^+ - \mathrm {E}^-. \end{aligned}$$

For the positive drift, we can use the same analysis as that (i.e., Eq. (17)) in the proof of Theorem 9 to derive that \(\mathrm {E}^+\le i/n<1/50\). This is because the offspring solution \(x'\) is optimistically assumed to be always accepted in the analysis of Eq. (17), and thus the change of noise and the use of sampling will not affect the analysis.

For the negative drift, we need to consider that the number of 0-bits is increased. To derive a lower bound on \(\mathrm {E}^-\), we only consider the \(n-i\) cases where only one 1-bit of x is flipped, which happens with probability \(\frac{1}{n}(1-\frac{1}{n})^{n-1}\). Let \(x^j\) denote the solution that is generated by flipping only the j-th bit of x. Then, we have

$$\begin{aligned} \mathrm {E}^-\ge \sum _{j: x_j=1}\frac{1}{n}\left( 1-\frac{1}{n}\right) ^{n-1}\cdot \mathrm {P}(\hat{f}(x^j)\ge \hat{f}(x)) \cdot (i+1-i). \end{aligned}$$

We then investigate \(\mathrm {P}(\hat{f}(x^j)\ge \hat{f}(x))\). Let \(\mathrm {F}(y)\) denote the event that when evaluating the noisy fitness of a solution y, at least one 1-bit in its first n / 3 positions is flipped by noise. Note that there must exist 1-bits in the first n / 3 positions of x, since \(|x|_0=i<n/50\). For any \(x^j\) with \(j>n/3\), its first n / 3 bits are the same as that of x. If both the events \(\mathrm {F}(x)\) and \(\mathrm {F}(x^j)\) happen, \(f^n(x)<n/3 \wedge f^n(x^j)<n/3\), and the last (2n) / 3 bits of x and \(x^j\) will not affect their noisy fitness. Thus, for any \(x^j\) with \(j>n/3\), \(f^n(x^j)\) has the same distribution as \(f^n(x)\) conditioned on \(\mathrm {F}(x)\cap \mathrm {F}(x^j)\). When estimating \(\hat{f}(y)\) of a solution y by sampling, let \(\mathrm {F}_t(y)\) denote the event \(\mathrm {F}(y)\) in the t-th independent noisy evaluation of y. Thus, for all \(j>n/3\), \(\sum _{t=1}^mf^n_t(x)\) and \(\sum _{t=1}^mf^n_t(x^j)\) have the same distribution conditioned on \((\cap _{t=1}^m\mathrm {F}_t(x))\cap (\cap _{t=1}^m\mathrm {F}_t(x^j))\). Since \(\hat{f}(x)=\frac{1}{m}\sum _{t=1}^mf^n_t(x)\) and \(\hat{f}(x^j)=\frac{1}{m}\sum _{t=1}^mf^n_t(x^j)\), we have

$$\begin{aligned} \mathrm {P}(\hat{f}(x^j)\ge \hat{f}(x) \mid (\cap _{t=1}^m\mathrm {F}_t(x))\cap (\cap _{t=1}^m\mathrm {F}_t(x^j)))\ge 1/2. \end{aligned}$$

Since \(|x|_0=i<n/50\), there are at least \(n/3-n/50 \ge n/4\) number of 1-bits in the first n / 3 positions of x, which implies that the probability \(\mathrm {P}(\mathrm {F}_t(x))\) of the event \(\mathrm {F}_t(x)\) happening is at least \(1-(1-q)^{n/4}\). Furthermore, \(\mathrm {P}(\mathrm {F}_t(x^j))=\mathrm {P}(\mathrm {F}_t(x))\) for \(j>n/3\), since x and \(x^j\) have the same first n / 3 bits. Thus,

$$\begin{aligned}&\mathrm {P}((\cap _{t=1}^m\mathrm {F}_t(x))\cap (\cap _{t=1}^m\mathrm {F}_t(x^j)))\ge (1-(1-q)^{n/4})^{2m} \\&\quad \ge \left( 1-\frac{1}{e^{nq/4}}\right) ^{2m} \ge \left( \frac{1}{e}\right) ^{2m/(e^{nq/4}-1)} = \left( \frac{1}{e}\right) ^{2m/n^{\omega (1)}} \ge \frac{1}{e}, \end{aligned}$$

where the equality is by \(q=\omega (\log n/n)\) and the last inequality is by \(m=O(poly(n))\). By the law of total probability, we have, for all \(j>n/3\),

$$\begin{aligned} \mathrm {P}(\hat{f}(x^j)\ge \hat{f}(x))&\ge \mathrm {P}(\hat{f}(x^j)\ge \hat{f}(x) \mid (\cap _{t=1}^m\mathrm {F}_t(x))\cap (\cap _{t=1}^m\mathrm {F}_t(x^j)))\\&\quad \cdot \mathrm {P}((\cap _{t=1}^m\mathrm {F}_t(x))\cap (\cap _{t=1}^m\mathrm {F}_t(x^j)))\ge \frac{1}{2e}. \end{aligned}$$

Then, we can get a lower bound on the negative drift:

$$\begin{aligned} \mathrm {E}^-&\ge \sum _{j: j>n/3 \wedge x_j=1}\frac{1}{n}\left( 1-\frac{1}{n}\right) ^{n-1}\cdot \mathrm{P}(\hat{f}(x^j)\ge \hat{f}(x)) \cdot 1 \\&\ge \left( \frac{2n}{3}-\frac{n}{50}\right) \cdot \frac{1}{en}\cdot \frac{1}{2e} \ge \frac{1}{36}. \end{aligned}$$

By subtracting \(\mathrm {E}^-\) from \(\mathrm {E}^+\), we have, for \(1 \le i< n/50\),

$$\begin{aligned} \mathrm {E}(X_t-X_{t+1}\mid X_t=i)=\mathrm {E}^+-\mathrm {E}^-\le \frac{1}{50}-\frac{1}{36}. \end{aligned}$$

Thus, condition (1) of Theorem 2 holds. It is easy to verify that condition (2) of Theorem 2 holds with \(\delta =1\) and \(r(l)=2\). Note that \(l=b-a=n/50\). By Theorem 2, we get that the expected number of iterations is exponential, and then the expected running time is also exponential. \(\square \)

For the one-bit noise model, we prove in Theorem 22 that the (\(1+1\))-EA using sampling can always solve the LeadingOnes problem in polynomial time. The proof is finished by applying the additive drift theorem. Lemma 6 gives some bounds on the probability \(\mathrm {P}(\hat{f}(x') \ge \hat{f}(x))\) of accepting the offspring solution \(x'\), which will be used in the proof of Theorem 22.

From case (2) of Lemma 6, we can observe that when the solution is close to the optimum \(1^n\), the probability of accepting \(1^n\) is small, which is different from the situation under bit-wise noise (as shown in case (2) of Lemmas 4 and 5). If we directly used the distance function constructed in the proofs of Theorems 19 and 20, this small acceptance probability would make the positive drift \(\mathrm {E}^+\) too small, and the condition of the additive drift theorem could not be satisfied. To address this issue, our idea is to re-design the distance function such that the distance from non-optimal solutions to the optimum \(1^n\) is much larger than the distance between non-optimal solutions. Then, the small probability of accepting \(1^n\) is compensated by the large decrease in distance once \(1^n\) is accepted; thus the positive drift can still be large enough to make the condition of the additive drift theorem hold.

Note that in the proof of Lemma 6, we use Berry–Esseen inequality [31] and Hoeffding’s inequality instead of Chebyshev’s inequality, which was used in the proofs of Lemmas 4 and 5. When the solution x is close to the optimum \(1^n\), the expectation of \(f^n(1^n)-f^n(x)\) is only lower bounded by a negative value. Thus, Chebyshev’s inequality fails to give a useful lower bound on the probability \(\mathrm {P}(\hat{f}(1^n) \ge \hat{f}(x))\), and we apply Berry–Esseen inequality [31] instead. The analysis shows that a moderate sample size \(m=4n^4\log n/15\) suffices to make this probability not too small. With this sample size, to derive a small enough upper bound on the probability \(\mathrm {P}(\hat{f}(x)\le \hat{f}(y))\) for two solutions x and y with \(\mathrm {E}(f^n(x)-f^n(y))>0\), we have to use Hoeffding’s inequality, which is tighter than Chebyshev’s inequality.
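For reference, we recall the two tail bounds in the specific forms in which they are applied below (the Berry–Esseen bound is stated with its absolute constant rounded up to 1, which suffices for Eq. (52)). For independent random variables \(Y_1,\ldots ,Y_m\) taking values in [a, b] with common mean \(\mu \) (Hoeffding’s inequality), respectively i.i.d. with mean 0, variance \(\sigma ^2\) and third absolute moment \(\rho =\mathrm {E}(|Y_i|^3)\) (Berry–Esseen inequality),

$$\begin{aligned} \mathrm {P}\left( \left| \frac{1}{m}\sum ^{m}_{i=1} Y_i-\mu \right| \ge t\right) \le 2e^{-\frac{2mt^2}{(b-a)^2}}, \qquad \sup _{x}\left| \mathrm {P}\left( \frac{\sum ^{m}_{i=1} Y_i}{\sigma \sqrt{m}}\le x\right) -\varPhi (x)\right| \le \frac{\rho }{\sigma ^3\sqrt{m}}. \end{aligned}$$

In the proof below, the \(Y_i\) are the per-evaluation differences \(f^n_i(x)-f^n_i(y)\) (centered by their mean in the Berry–Esseen case), so \(b-a \le 2n\); setting \(t=\mu \) in Hoeffding’s inequality gives exactly the bounds of the form \(2e^{-2m^2\mu ^2/(m(2n)^2)}\) used below.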

Lemma 6

For the LeadingOnes problem under one-bit noise, if using sampling with \(m=4n^4\log n/15\), it holds that

  1. (1)

    for any x with \({\mathrm {LO}}(x)=i<n\) and y with \({\mathrm {LO}}(y) \le i-2\) or \({\mathrm {LO}}(y)=i-1 \wedge y_{i+1}=0\), \(\mathrm {P}(\hat{f}(x) \le \hat{f}(y))\le 2n^{-\frac{2}{15}}\); furthermore, if \(i \le n-4\), \(\mathrm {P}(\hat{f}(x) \le \hat{f}(y))\le 2n^{-\frac{32}{15}}\).

  2. (2)

    for any y, if \({\mathrm {LO}}(y) \le n-4\), \(\mathrm {P}(\hat{f}(1^n) \le \hat{f}(y))\le 2n^{-\frac{8}{15}}\); if \({\mathrm {LO}}(y) \in \{n-3,n-2,n-1\}\), \(\mathrm {P}(\hat{f}(1^n) \ge \hat{f}(y))\ge 1/n^2\).

Proof

The proof is finished by deriving a lower bound on the expectation of \(f^n(x)-f^n(y)\) and then applying Hoeffding’s inequality or Berry–Esseen inequality [31]. We first consider case (1). For any x with \({\mathrm {LO}}(x)=i<n\),

$$\begin{aligned}&\mathrm {E}(f^n(x))\ge (1-p)\cdot i+\sum _{j=1}^i\frac{p}{n}\cdot (j-1)+\frac{p}{n}\cdot (i+1)+p\frac{n-i-1}{n}\cdot i,\nonumber \\&\quad \mathrm {E}(f^n(x))\le (1-p)\cdot i+\sum _{j=1}^{i}\frac{p}{n}\cdot (j-1)+\frac{p}{n}\cdot n+p\frac{n-i-1}{n}\cdot i. \end{aligned}$$
(49)

By applying these two inequalities, we have, for any \(k< i<n\),

$$\begin{aligned}&\mathrm {E}(f^n(x)-f^n(y)\mid {\mathrm {LO}}(x)=i \wedge {\mathrm {LO}}(y)=k)\nonumber \\&\quad \ge (1-p)(i-k)+\frac{p}{n}\left( (i-k)\left( n-\frac{i+k+3}{2}\right) +i+1-n\right) . \end{aligned}$$
(50)

Thus, for any x with \({\mathrm {LO}}(x)=i\) and y with \({\mathrm {LO}}(y)=k \le i-2\) (where \(2 \le i<n\)), we have

$$\begin{aligned} \mathrm {E}(f^n(x)-f^n(y))\ge 2(1-p)+\frac{p}{n}(n-i) \ge \frac{n-i}{n}. \end{aligned}$$

When \({\mathrm {LO}}(y)=i-1 \wedge y_{i+1}=0\), if we directly use Eq. (50), we will get a lower bound \(1-p\), which is 0 for \(p=1\). Since \(y_{i+1}=0\), we actually can get an exact value of \(\mathrm {E}(f^n(y))\):

$$\begin{aligned} \mathrm {E}(f^n(y))= (1-p)\cdot (i-1)+\sum _{j=1}^{i-1}\frac{p}{n}\cdot (j-1)+\frac{p}{n}\cdot i+p\frac{n-i}{n}\cdot (i-1). \end{aligned}$$

By combining this equality and the lower bound in Eq. (49), we have that, for any x with \({\mathrm {LO}}(x)=i\) and y with \({\mathrm {LO}}(y) = i-1 \wedge y_{i+1}=0\) (where \(1 \le i<n\)),

$$\begin{aligned} \mathrm {E}(f^n(x)-f^n(y)) \ge 1-p+\frac{p}{n}(i-1)+\frac{p}{n}+\frac{p}{n}(n-2i) \ge \frac{n-i}{n}. \end{aligned}$$

Thus, we have a unified lower bound \((n-i)/n\) on \(\mathrm {E}(f^n(x)-f^n(y))\) for case (1). Denote \(\mathrm {E}(f^n(x)-f^n(y))\) as \(\mu \). We have \(\mu \ge (n-i)/n\ge 1/n\). Since \(\hat{f}(x)\) is the average of \(m=4n^4\log n/15\) independent evaluations, it is easy to verify that \(\mathrm {E}(\hat{f}(x)-\hat{f}(y))=\mu \). Furthermore, \(|f^n(x)-f^n(y)| \le n\). By Hoeffding’s inequality, we get

$$\begin{aligned} \mathrm {P}(\hat{f}(x)\le \hat{f}(y)) \le \mathrm {P}(|\hat{f}(x)-\hat{f}(y) -\mu |\ge \mu ) \le 2e^{-\frac{2m^2\mu ^2}{m(2n)^2}} \le 2n^{-\frac{2}{15}}. \end{aligned}$$

When \(i \le n-4\), \(\mu \ge (n-i)/n\ge 4/n\). By applying this lower bound to the above inequality, we get

$$\begin{aligned} \mathrm {P}(\hat{f}(x)\le \hat{f}(y)) \le 2n^{-\frac{32}{15}}. \end{aligned}$$

Thus, case (1) holds.

For case (2), we are to analyze \(\mathrm {P}(\hat{f}(1^n) \le \hat{f}(y))\) or \(\mathrm {P}(\hat{f}(1^n) \ge \hat{f}(y))\), where \({\mathrm {LO}}(y)<n\). \(\mathrm {E}(f^n(1^n))\) can be calculated as follows:

$$\begin{aligned} \mathrm {E}(f^n(1^n))= (1-p)\cdot n+\sum _{j=1}^n\frac{p}{n}\cdot (j-1). \end{aligned}$$

By combining this equality and the upper bound in Eq. (49), we get, for any y with \({\mathrm {LO}}(y)=k<n\),

$$\begin{aligned} \mathrm {E}(f^n(1^n)-f^n(y)) \ge (n-k)\left( 1-p+\frac{p}{n}\frac{n-k-3}{2}\right) . \end{aligned}$$

When \(k \le n-4\), \(\mathrm {E}(f^n(1^n)-f^n(y))\ge 2/n\). By Hoeffding’s inequality, we get

$$\begin{aligned} \mathrm {P}(\hat{f}(1^n) \le \hat{f}(y)) \le 2n^{-\frac{8}{15}}. \end{aligned}$$

When \(k \in \{n-3,n-2,n-1\}\),

$$\begin{aligned} \mu =\mathrm {E}(f^n(1^n)-f^n(y))\ge 1-p-p/n. \end{aligned}$$

If \(p\le 1-2/n\), we have \(\mu \ge 1/n\). By Hoeffding’s inequality,

$$\begin{aligned} \mathrm {P}(\hat{f}(1^n) \ge \hat{f}(y)) \ge 1-2n^{-\frac{2}{15}}. \end{aligned}$$
(51)

If \(p> 1-2/n\), \(\mu \ge -1/n\). We then use Berry–Esseen inequality [31] to derive a lower bound on \(\mathrm {P}(\hat{f}(1^n) \ge \hat{f}(y))\). Let \(Y=f^n(1^n)-f^n(y)-\mu \). Note that \(\mathrm {E}(Y)=0\). Denote the variance \(\mathrm {Var}(Y)\) as \(\sigma ^2\). Then, \(\sigma ^2=\mathrm {Var}(f^n(1^n))+\mathrm {Var}(f^n(y))\). For \(\mathrm {Var}(f^n(1^n))\), it can be calculated as follows:

$$\begin{aligned} \mathrm {Var}(f^n(1^n))&=\mathrm {E}((f^n(1^n))^2)-(\mathrm {E}(f^n(1^n))^2 \\&=(1-p)n^2+\frac{p}{n}\frac{(n-1)n(2n-1)}{6}-\left( (1-p)n+\frac{p}{n}\frac{n(n-1)}{2}\right) ^2 \\&\ge \frac{p(n-1)(n-1/2)}{3}-\left( 1+\frac{p(n-1)}{2}\right) ^2 \ge \frac{n^2}{14}, \end{aligned}$$

where the last inequality holds because for \(p>1-2/n\) and n being large enough, the minimum is reached when \(p=1\). Thus, we get \(\sigma ^2\ge n^2/14\). Since \(-2n\le Y\le 2n\), \(\rho =\mathrm {E}(|Y|^3)\le 8n^3\). Note that \(\hat{f}(1^n)-\hat{f}(y)-\mu \) is the average of m independent random variables, which have the same distribution as Y. By Berry–Esseen inequality [31], we have

$$\begin{aligned} \mathrm {P}\left( \frac{(\hat{f}(1^n)-\hat{f}(y)-\mu )\sqrt{m}}{\sigma }\le x\right) -\varPhi (x)\le \frac{\rho }{\sigma ^3\sqrt{m}}, \end{aligned}$$
(52)

where \(\varPhi (x)\) is the cumulative distribution function of the standard normal distribution. Thus, for \(p>1-2/n\),

$$\begin{aligned} \mathrm {P}(\hat{f}(1^n) \ge \hat{f}(y))&=\mathrm{P}(\hat{f}(1^n)- \hat{f}(y)-\mu \ge -\mu )\nonumber \\&\ge 1-\mathrm{P}(\hat{f}(1^n)- \hat{f}(y)-\mu \le -\mu )\nonumber \\&\ge 1-\varPhi \left( \frac{-\mu \sqrt{m}}{\sigma }\right) -\frac{\rho }{\sigma ^3\sqrt{m}} \nonumber \\&=\varPhi \left( \frac{\mu \sqrt{m}}{\sigma }\right) -\frac{\rho }{\sigma ^3\sqrt{m}} \ge \varPhi \left( -\frac{\sqrt{m}}{n\sigma }\right) -\frac{\rho }{\sigma ^3\sqrt{m}} \nonumber \\&\ge \varPhi \left( -\sqrt{\frac{56}{15}\log n}\right) -O\left( \frac{1}{n^2\sqrt{\log n}}\right) \nonumber \\&\ge \frac{1}{\sqrt{2\pi }}\int _{-2\sqrt{\log n}}^{-\sqrt{\frac{56}{15}\log n}}e^{-\frac{t^2}{2}}\mathrm {d}t -O\left( \frac{1}{n^2\sqrt{\log n}}\right) \ge \frac{1}{n^2}, \end{aligned}$$
(53)

where the second inequality is by Eq. (52), the third inequality is by \(\mu \ge -1/n\), the fourth is by \(m=4n^4\log n/15\), \(\sigma ^2 \ge n^2/14\) and \(\rho \le 8n^3\), and the last holds for sufficiently large n. According to Eqs. (51) and (53), we get that, when \({\mathrm {LO}}(y) \in \{n-3,n-2,n-1\}\),

$$\begin{aligned} \mathrm {P}(\hat{f}(1^n) \ge \hat{f}(y)) \ge 1/n^2. \end{aligned}$$

Thus, case (2) holds. \(\square \)

The following theorem shows that for one-bit noise, using sampling with \(m=4n^4\log n/15\) is sufficient to make the (\(1+1\))-EA solve LeadingOnes in polynomial time.

Theorem 22

For the (\(1+1\))-EA on LeadingOnes under one-bit noise, if using sampling with \(m=4n^4\log n/15\), the expected running time is polynomial.

Proof

We use Theorem 1 to prove it. We first construct a distance function V(x) as, for any x with \({\mathrm {LO}}(x)=i\),

$$\begin{aligned} V(x)=\left\{ \begin{array}{ll} n-i/n^6, &{} \quad i\le n-1; \\ 0, &{} \quad i=n. \end{array}\right. \end{aligned}$$

Then, we investigate \(\mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)\) for any x with \({\mathrm {LO}}(x)=i<n\). We divide the drift into two parts: positive \(\mathrm {E}^+\) and negative \(\mathrm {E}^-\). That is,

$$\begin{aligned} \mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)=\mathrm {E}^+-\mathrm {E}^-. \end{aligned}$$

For \(i\le n-4\), the lower bound analysis on \(\mathrm {E}^+\) is similar to that in the proof of Theorem 19. We only consider the case where the \((i+1)\)-th bit of x is flipped and all other bits remain unchanged, which happens with probability \(\frac{1}{n}(1-\frac{1}{n})^{n-1}\). The offspring solution \(x'\) is the same as x except for the \((i+1)\)-th bit, and \({\mathrm {LO}}(x')>{\mathrm {LO}}(x)=i\). According to the definition of V(x), the decrease on the distance is at least \(1/n^6\). If \({\mathrm {LO}}(x')=n\), \(\mathrm {P}(\hat{f}(x') \le \hat{f}(x)) \le 2n^{-\frac{8}{15}}\) by case (2) of Lemma 6. If \({\mathrm {LO}}(x')<n\), it must hold that \({\mathrm {LO}}(x)\le {\mathrm {LO}}(x')-1 \wedge x_{{\mathrm {LO}}(x')+1}=x'_{{\mathrm {LO}}(x')+1}=0\). By case (1) of Lemma 6, \(\mathrm {P}(\hat{f}(x') \le \hat{f}(x)) \le 2n^{-\frac{2}{15}}\). Thus, the probability \(\mathrm {P}(\hat{f}(x') \ge \hat{f}(x))\) of accepting the offspring solution \(x'\) is at least \(1-\max \{2n^{-\frac{8}{15}},2n^{-\frac{2}{15}}\} \ge 1/2\), where the inequality holds for sufficiently large n. Then, \(\mathrm {E}^+\) can be lower bounded as follows:

$$\begin{aligned} \mathrm {E}^+ \ge \left( 1-\frac{1}{n}\right) ^{n-1}\frac{1}{n}\cdot \frac{1}{2} \cdot \frac{1}{n^6}\ge \frac{1}{6n^7}. \end{aligned}$$

For the negative drift \(\mathrm {E}^{-}\), we need to consider \({\mathrm {LO}}(x')<{\mathrm {LO}}(x)=i\). We further divide \({\mathrm {LO}}(x')<i\) into two cases. If \({\mathrm {LO}}(x')\le i-2\) or \({\mathrm {LO}}(x')= i-1 \wedge x'_{i+1}=0\), then \(\mathrm {P}(\hat{f}(x') \ge \hat{f}(x)) \le 2n^{-\frac{32}{15}}\) by case (1) of Lemma 6 (note that \(i\le n-4\) here), and \(V(x')-V(x) \le V(0^n)-V(x)= i/n^6\). If \({\mathrm {LO}}(x')= i-1 \wedge x'_{i+1}=1\), then \(\sum _{x': {\mathrm {LO}}(x')= i-1 \wedge x'_{i+1}=1}\mathrm {P}_{mut}(x,x') \le 1/n^2\) since it is necessary to flip the i-th and the \((i+1)\)-th bits of x in mutation, and \(V(x')-V(x)=1/n^6\). Then, \(\mathrm {E}^{-}\) can be upper bounded as follows:

$$\begin{aligned} \mathrm {E}^- \le \frac{i}{n^6}\cdot 1\cdot \frac{2}{n^{\frac{32}{15}}}+\frac{1}{n^6}\cdot \frac{1}{n^2}\cdot 1=o\left( \frac{1}{n^7}\right) . \end{aligned}$$
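To see the \(o(1/n^7)\) order, note that \(i\le n\), so the first term is at most \(2/n^{5+\frac{32}{15}}=2/n^{\frac{107}{15}}=o(1/n^7)\), while the second term equals \(1/n^8=o(1/n^7)\).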

By subtracting \(\mathrm {E}^-\) from \(\mathrm {E}^+\), we have

$$\begin{aligned} \mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)=\mathrm {E}^+-\mathrm {E}^-\ge \frac{1}{12n^7}. \end{aligned}$$

For \(i\in \{n-3,n-2,n-1\}\), we use a trivial upper bound \(i/n^6\) on \(\mathrm {E}^-\). For the positive drift, we consider that the offspring solution \(x'\) is the optimal solution \(1^n\), whose probability is at least \(\frac{1}{n^3}(1-\frac{1}{n})^{n-3} \ge \frac{1}{en^3}\) since at most three bits of x need to be flipped. The probability of accepting \(1^n\) is at least \(\frac{1}{n^2}\) by case (2) of Lemma 6. The distance decrease is at least \(n-\frac{n-1}{n^6} \ge \frac{n}{2}\). Thus, \(\mathrm {E}^+ \ge \frac{1}{en^3} \cdot \frac{1}{n^2} \cdot \frac{n}{2} \ge \frac{1}{2en^4}\). By subtracting \(\mathrm {E}^-\) from \(\mathrm {E}^+\), we have

$$\begin{aligned} \mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)\ge \frac{1}{2en^4}-\frac{i}{n^6}= \frac{1}{2en^4}-O\left( \frac{1}{n^5}\right) \ge \frac{1}{4en^4}. \end{aligned}$$

By combining the above analyses for \(i\le n-4\) and \(i\in \{n-3,n-2,n-1\}\), we get a unified lower bound \(1/(12n^7)\) on \(\mathrm {E}(V(\xi _t)-V(\xi _{t+1}) \mid \xi _t=x)\). Since \(V(x)\le n\), we have \(\mathrm {E}(\tau \mid \xi _0)=O(n^{8})\) by Theorem 1. Each iteration of the (\(1+1\))-EA using sampling takes \(2m=8n^4\log n/15\) fitness evaluations, thus the expected running time is polynomial. \(\square \)
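To make the analyzed procedure concrete, the following is a minimal Python sketch of the (\(1+1\))-EA using sampling on LeadingOnes under one-bit noise. It only illustrates the setting; the function names and the stopping rule are ours, and the sample size m is kept as a parameter (the theorem uses \(m=4n^4\log n/15\), which is far too large to simulate directly except for very small n).

import random

def leading_ones(x):
    # LeadingOnes: the number of consecutive 1-bits counted from the left.
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def noisy_eval(x, p):
    # One-bit noise: with probability p, a uniformly random bit of x is
    # flipped before the (otherwise exact) fitness evaluation.
    y = list(x)
    if random.random() < p:
        i = random.randrange(len(y))
        y[i] = 1 - y[i]
    return leading_ones(y)

def sampled_eval(x, p, m):
    # Sampling: average of m independent noisy fitness evaluations.
    return sum(noisy_eval(x, p) for _ in range(m)) / m

def one_plus_one_ea_with_sampling(n, p, m, max_iters=10**6):
    # (1+1)-EA: bit-wise mutation with probability 1/n per bit; the
    # offspring replaces the parent if its sampled fitness is not worse.
    x = [random.randint(0, 1) for _ in range(n)]
    for t in range(max_iters):
        if leading_ones(x) == n:  # true optimum reached (used only to measure the hitting time)
            return x, t
        offspring = [1 - b if random.random() < 1.0 / n else b for b in x]
        if sampled_eval(offspring, p, m) >= sampled_eval(x, p, m):
            x = offspring
    return x, max_iters

Each iteration above uses \(2m\) noisy fitness evaluations (m for the parent and m for the offspring), matching the cost accounting in the proof.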

Therefore, we have shown that the (\(1+1\))-EA using sampling can always solve LeadingOnes in polynomial time under bit-wise noise \((p,\frac{1}{n})\) (i.e., Theorem 19) or one-bit noise (i.e., Theorem 22), while under bit-wise noise (1, q), the tight range of q allowing a polynomial running time is \(O(\log n/n)\) (i.e., Theorems 20 and 21). The reason why sampling is ineffective under bit-wise noise (1, q) with \(q=\omega (\log n/n)\) is similar to that observed in the analysis of OneMax under bit-wise noise (1, q) with \(q=1/2-1/n^{\omega (1)}\) or \(q \ge 1/2\). For two solutions x and y with \(f(x) > f(y)\), the calculation of \(\mathrm {E}(f^n(x)-f^n(y))\) in the proof of Lemma 5 shows that when \(q=\omega (\log n/n)\), \(\mathrm {E}(f^n(x)-f^n(y))\) becomes very small (since \((1-q)^{n-1}=1/n^{\omega (1)}\)) or even negative; thus a polynomial sample size is not sufficient to make the probability of accepting the truly worse solution y small enough, and may even increase that probability. For the situation where sampling is effective, the analysis on LeadingOnes differs slightly from that on OneMax. On OneMax, \(\mathrm {E}(f^n(x)-f^n(y))\) is always sufficiently large when \(f(x)>f(y)\); thus sampling can make the probability of accepting the truly worse solution y small enough, and hence works. On LeadingOnes, \(\mathrm {E}(f^n(x)-f^n(y))\) is sufficiently large in most cases (rather than all cases) when \(f(x)>f(y)\), but the few exceptions do not affect the effectiveness of sampling.
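As a rough quantitative illustration of this point (it reuses the Hoeffding bound from the proofs above and only indicates where that argument breaks down; the precise statements are Theorems 20 and 21): for a gap \(\mu =\mathrm {E}(f^n(x)-f^n(y))>0\), the bound

$$\begin{aligned} \mathrm {P}(\hat{f}(x)\le \hat{f}(y)) \le 2e^{-\frac{2m^2\mu ^2}{m(2n)^2}}=2e^{-\frac{m\mu ^2}{2n^2}} \end{aligned}$$

drops below \(n^{-c}\) for a constant \(c>0\) only when \(m=\varOmega (n^2\log n/\mu ^2)\); once \(\mu =1/n^{\omega (1)}\), this required sample size is super-polynomial.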

6 Conclusion

In this paper, we theoretically study the (\(1+1\))-EA solving the OneMax and LeadingOnes problems under bit-wise noise, which is characterized by a pair (p, q) of parameters. We derive the ranges of p and q for which the running time is polynomial and super-polynomial, respectively. The previously known parameter ranges for the (\(1+1\))-EA solving LeadingOnes under one-bit noise are also improved. Considering that the (\(1+1\))-EA is efficient only under low noise levels, we further analyze the robustness of sampling to noise. We prove that for both bit-wise noise and one-bit noise, using sampling can significantly enlarge the range of noise parameters allowing a polynomial running time.

In the future, we shall improve the currently derived bounds on LeadingOnes, as they do not cover the whole range of noise parameters. For proving polynomial upper bounds on the expected running time by using sampling, we only give a sufficiently large sample size, whose tightness will be studied in future work. In our analysis, we consider the bit-wise noise model with one parameter fixed; analyzing the running time under general bit-wise noise is thus also an interesting direction for future work. Note that our analysis has shown that the performance of the (\(1+1\))-EA solving OneMax under bit-wise noise \((1,\frac{\log n}{30n})\) and \((\frac{\log n}{30n},1)\) is significantly different, which implies that \(p \cdot q\) may not be the only deciding factor for the analysis of general bit-wise noise.