
1 Introduction

Evolutionary algorithms (EAs) are widely used in real-world applications and have achieved great success in solving both continuous and discrete optimization problems. As a class of general-purpose optimization algorithms, EAs are designed to search for the optimum without gradient or Hessian information. Thus, they have been regarded as a major approach when the optimization problem is non-differentiable, multi-modal, or black-box. However, a great gap lies between the abundant practice and the weak theoretical foundation. Moreover, most of the theoretical works focus on the discrete domain, such as pseudo-Boolean functions [4, 10, 11, 12, 14, 15], while only a few deal with continuous optimization problems [1, 7, 8].

Note that the theoretical analysis of EAs in the discrete domain cannot be directly extended to the continuous domain. In the following, we briefly summarize the differences between continuous EAs and discrete EAs.

  • The optimization problems they were originally designed to solve. Continuous EAs are designed to solve continuous optimization problems, while discrete EAs are more often used to solve discrete optimization problems, such as combinatorial optimization problems.

  • The encoding used to represent individuals. Individuals are represented by real vectors in continuous EAs, in contrast to binary vectors in discrete EAs.

  • The mutation operators. In discrete EAs, an offspring solution is generated by flipping bits of the parent solution. In continuous EAs, a stochastic vector is first sampled from a given distribution; a new candidate solution is then generated by adding this vector to the parent solution.

  • The stopping criteria. The search space is uncountably infinite in the continuous domain, so the probability of finding the exact optimum is 0. Thus, we focus on the number of steps needed to halve the approximation error, i.e., the distance from the optimum. In the discrete domain, by contrast, the evolution does not stop until the exact optimum or an acceptable suboptimal solution is found.

Most of the previous theoretical works on continuous EAs analyze the expected running time of evolution strategies (ES) on the sphere function. Jägersküpper [7] proved that the (1+1)-ES with Gaussian mutation has a polynomial lower bound \( \varOmega (n) \) on certain function classes (including the sphere function), and derived an upper bound \( O(n) \) when the 1/5-rule is introduced. These two results together confirmed the effectiveness of the 1/5-rule. Moreover, Akimoto et al. proved that the expected running time of the (1+1)-ES with Gaussian mutation adapted by the 1/5-rule on the sphere function is \( \varTheta \left( {\left( {\log \frac{d}{\epsilon }} \right)n} \right) \), where \( d \) denotes the distance of the initial individual from the global optimum and \( \epsilon \) is the parameter of the stopping criterion [2]. The uniform mutation operator was then analyzed by Agapie et al., with a lower bound \( \varOmega (n) \) and an upper bound \( O(e^{cn} ) \) [1].

In this paper, we first prove that the general lower bound on the running time of the (1+1)-ES with uniform mutation inside a hypersphere is \( \varOmega (e^{cn} ) \), which largely improves the known bound \( \varOmega (n) \) given in [1]. Second, we derive a similar result, \( \varOmega (e^{cn} ) \), when the mutation operator is replaced by Gaussian mutation, in contrast to the bound \( \varOmega (n) \) given by Jägersküpper in [7]. Third, we study the effectiveness of the 1/5-rule for the (1+1)-ES using uniform mutation inside a hypersphere for the first time. We prove that the running time of this algorithm has a polynomial upper bound \( O(n) \) after the incorporation of the 1/5-rule.

The rest of the paper is organized as follows. Section 2 introduces some preliminaries, including the studied problem and algorithm, as well as the analysis tools that will be used. Section 3 derives the exponential lower bounds of the (1+1)-ES using uniform and Gaussian mutation operators. Section 4 analyzes the effectiveness of the 1/5-rule for the (1+1)-ES using uniform mutation inside a hypersphere. Section 5 concludes the paper.

2 Preliminaries

In this section, we first introduce the problem and the algorithm studied in this paper. Then, we present the analysis tools used throughout the paper.

2.1 Problem

We consider minimizing the sphere function in this paper, formulated as follows:

$$ f(x) = \sqrt {\mathop \sum \nolimits_{i = 1}^{n} x_{i}^{2} } $$
(1)
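In code, \( f \) is simply the Euclidean norm of \( x \); the following one-line Python sketch (our own illustration, not part of the original analysis) is reused by the later examples:

```python
import numpy as np

def sphere(x):
    """Sphere function (1): Euclidean distance from x to the optimum at the origin."""
    return np.sqrt(np.sum(np.square(x)))  # equivalently: np.linalg.norm(x)
```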

The sphere function has two properties:

  • A minimum exists.

  • \( f\left( x \right) < f\left( y \right) \) if \( \left| x \right| < \left| y \right| \) for any two points \( x \) and \( y \), which means that a mutant closer to the optimum is always accepted.

2.2 (1+1)-ES and 1/5-Rule

Jägersküpper analyzed the time complexity of the (1+λ)-ES on the sphere function and proved that the offspring population helps to reduce the running time of ES [8]. However, a parent population does not help on the sphere function, and even makes the expected running time larger [6]. Our theoretical analysis will focus on the (1+1)-ES, which maintains only one parent solution and one offspring solution in each generation. In every iteration, the (1+1)-ES replaces the current individual by a better candidate offspring and discards a worse one. The process of the (1+1)-ES using uniform mutation inside a sphere is as follows:

Initialization: Create a random initial individual \( \xi_{0} \) satisfying \( d(\xi_{0}, o) = R \), where \( R \) denotes a positive constant and \( o \) is the global optimum.

Mutation: Create a new search point \( \xi^{\prime} = \xi_{t} + \sigma m \), where \( \sigma \) is the scaling factor and \( m \) is an \( n \)-dimensional vector uniformly distributed in a sphere of fixed radius \( r \) (or sampled from the \( n \)-variate standard Gaussian distribution).

Evaluation: If \( f(\xi^{\prime}) \le f(\xi_{t}) \), then \( \xi_{t+1} = \xi^{\prime} \); otherwise \( \xi_{t+1} = \xi_{t} \).

Stopping criterion: \( f(\xi_{t}) \le R/2 \).
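The following Python sketch mirrors the four steps above (a minimal illustration under our own naming; `uniform_in_ball` uses the standard direction-times-radius construction for uniform sampling in a ball):

```python
import numpy as np

def uniform_in_ball(n, r, rng):
    """Sample a vector uniformly distributed inside an n-dimensional ball of radius r."""
    g = rng.standard_normal(n)
    g /= np.linalg.norm(g)                      # uniform direction on the unit sphere
    return r * rng.uniform() ** (1.0 / n) * g   # radius ~ r * U^(1/n) gives uniformity

def one_plus_one_es(n, R, sigma, r, rng, max_iter=10**6):
    """(1+1)-ES minimizing f(x) = |x| with uniform mutation inside a sphere.
    Starts at distance R from the optimum o = 0; stops when f(xi_t) <= R/2.
    Returns the number of generations used."""
    xi = np.zeros(n)
    xi[0] = R                                   # initialization: d(xi_0, o) = R
    for t in range(max_iter):
        if np.linalg.norm(xi) <= R / 2:         # stopping criterion
            return t
        candidate = xi + sigma * uniform_in_ball(n, r, rng)   # mutation
        if np.linalg.norm(candidate) <= np.linalg.norm(xi):   # elitist selection
            xi = candidate
    return max_iter
```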

Among all the self-adaptive strategies for the coefficients of mutation operators, the 1/5-rule is the most commonly used one in both real-world applications and research [3, 13]. This heuristic strategy has shown excellent performance in controlling the mutation strength of ES. The 1/5-rule works as follows: the optimization process is observed for \( n \) steps without changing \( \sigma \); if more than one fifth of the steps in this observed phase have been successful, \( \sigma \) is doubled; otherwise, \( \sigma \) is halved.
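A sketch of this adaptation scheme, reusing `uniform_in_ball` from the previous listing (again our own illustrative code; the phase length \( n \) and the factor 2 follow the description above):

```python
def one_fifth_rule_run(n, R, sigma0, r, rng, max_phases=10**4):
    """(1+1)-ES with the 1/5-rule: sigma is held fixed for a phase of n steps,
    then doubled if more than n/5 steps were successful, and halved otherwise.
    Returns the number of generations until f(xi_t) <= R/2."""
    xi = np.zeros(n)
    xi[0] = R
    sigma = sigma0
    generations = 0
    for _ in range(max_phases):
        successes = 0
        for _ in range(n):                      # one phase: n steps, sigma fixed
            if np.linalg.norm(xi) <= R / 2:
                return generations
            candidate = xi + sigma * uniform_in_ball(n, r, rng)
            if np.linalg.norm(candidate) <= np.linalg.norm(xi):
                xi = candidate
                successes += 1
            generations += 1
        sigma = sigma * 2 if successes > n / 5 else sigma / 2
    return generations
```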

2.3 Analysis Tools

It is not hard to notice the common property shared by the evolving population of an ES and a Markov chain \( \{ \xi_{t} \}_{t \ge 0} \): the future state/population is independent of the past states/populations and affected only by the current state/population. However, it is nearly impossible to construct a transition probability matrix for ES. Thus, another effective model, the renewal process, is introduced for ES. For each \( t = 0,1,2, \ldots \), let \( \xi_{t} \) be the individual of the \( t \)-th generation of the ES and \( d(\xi_{t} ) \) be the distance from \( \xi_{t} \) to the optimum \( o \). Define the stochastic process \( \{ \rho_{t} \}_{t \ge 0} \) by \( \rho_{t} = d\left( {\xi_{t} } \right) - d(\xi_{t + 1} ) \); then \( \{ \rho_{t} \}_{t \ge 0} \) is a renewal process.

Since ES is a class of stochastic algorithms, the first hitting time is a random variable as well. We take the expectation of the first hitting time (FHT) as the measure of the time complexity of ES. The definition of the FHT is as follows.

First Hitting Time. Assume that the initial state set is \( \chi_{I} \) and the final state set is \( \chi^{*} \). The first hitting time \( T \) is defined as \( T = \min \{ t \mid \xi_{t} \in \chi^{*} \wedge \xi_{0} \in \chi_{I} \} \). In this paper, particularly, \( \chi_{I} = \{ \xi \in {\mathbb{R}}^{n} \mid d\left( \xi \right) = R\} \) and \( \chi^{*} = \left\{ {\xi \in {\mathbb{R}}^{n} \mid d\left( \xi \right) < R/2} \right\} \).
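Since \( T \) is a random variable, \( {\text{E}}(T) \) can be estimated empirically by averaging independent runs; a minimal sketch reusing `one_plus_one_es` from Sect. 2.2 (the parameter values here are arbitrary):

```python
rng = np.random.default_rng(0)
# 100 independent runs of the (1+1)-ES; each returns one realization of T
runs = [one_plus_one_es(n=10, R=1.0, sigma=1.0, r=0.3, rng=rng) for _ in range(100)]
print("empirical E(T):", sum(runs) / len(runs))
```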

Several analysis tools with different ideas and principles have been proposed in the discrete domain, and all of them have shown great power in the running time analysis of EAs. Drift analysis constructs a distance function and derives the expected running time by bounding the one-step progress [4]. The fitness level method cuts the search space into different sets or levels based on the fitness values and derives the expected running time by bounding the transition probabilities of solutions moving from the current level to other levels [14]. Switch analysis compares the evolutionary process with another, well-analyzed one and derives the expected running time by estimating the one-step differences of the two [15]. However, these three effective methods cannot be applied to continuous optimization directly, since they are all based on Markov chains with finite state spaces.

Similar to the drift analysis theorem, which was first introduced to analyze the running time of EAs by He and Yao [4], the analysis approach for ES introduced by Jägersküpper also focuses on the difference between two adjacent generations, defined as the progress rate \( \rho_{t} \) [7].

Progress Rate. Assume that the \( t \)-th generation is \( \xi_{t} \) and its distance from the optimum is \( d\left( {\xi_{t} } \right) \); then \( \rho_{t} = d(\xi_{t} ) - d(\xi_{t + 1} ) \) is called the progress rate of the \( t \)-th generation.

Theorem 1

(Lower bound theorem [7]). Let \( \rho_{1} , \rho_{2} , \ldots \) denote random variables with bounded range and let \( T \) be the random variable defined by \( T = \min \{ t \mid \sum\nolimits_{i = 0}^{t} \rho_{i} \ge g\} \) for a given \( g > 0 \). If \( {\text{E}}(T) \) exists and \( {\text{E}}(\rho_{i} \mid T \ge i) \le u \), then \( {\text{E}}(T) \ge g/u \).

Theorem 1 reveals that if the progress rate has an upper bound, we can derive a lower bound on the expected running time.
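As a concrete (hypothetical) numerical instance: if every conditional one-step gain satisfies \( {\text{E}}(\rho_{i} \mid T \ge i) \le u = 10^{-2} \) and the required total progress is \( g = R/2 = 5 \), then Theorem 1 gives

$$ {\text{E}}(T) \ge \frac{g}{u} = \frac{5}{10^{ - 2} }} = 500 $$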

3 Exponential Lower Bound of the (1+1)-ES

In this section, we prove that the expected running time of the (1+1)-ES using uniform mutation inside a hypersphere, and of the (1+1)-ES using Gaussian mutation, has an exponential lower bound when minimizing the sphere function; the results are presented in Theorems 2 and 3, respectively.

3.1 Uniform Mutation Inside a Hypersphere

The lower bound for the (1+1)-ES minimizing the sphere function derived by Agapie et al. is not tight, as can be testified by simulation experiments [1]. The super-polynomial bound by Huang et al., however, concerns uniform mutation inside a hypercube and is based on a very strong assumption on the initialization and the mutation operator [5]. In this subsection, we give a lower bound that is tighter than the state-of-the-art bound. Besides, our method is more general, as it makes no assumption on the coefficients of the mutation operator or on the initialization.

Theorem 2.

For the (1+1)-ES minimizing the sphere function in \( n \)-dimensional Euclidean space using uniform mutation in a hypersphere and elitist selection, the expected number of generations \( T \) needed to halve the approximation error is \( \varOmega (e^{cn} ) \).

Proof.

Denote the global optimum by \( o \) and the current individual by \( c \), i.e., \( \xi_{t} \). Suppose that the distance between \( o \) and \( c \) is \( d(o,c) = R \) and the radius of the mutation operator is \( r \). Due to symmetry, we assume that \( c \) lies on the axis \( ox_{1} \).

As shown in Fig. 1, we call the left sphere \( S_{o} = \{ \xi \in {\mathbb{R}}^{n} \mid d(\xi ,o) \le R\} \) the fitness sphere, since all points inside \( S_{o} \) are better than the current individual \( c \). Accordingly, we call the right sphere \( S_{c} = \{ \xi \in {\mathbb{R}}^{n} \mid d(\xi ,c) \le r\} \) the mutation sphere, since all candidate offspring scatter in it. \( A \) denotes the intersection of the two spheres.

Fig. 1. Intersection of the fitness sphere and the mutation sphere

According to the definition of progress rate, we get

$$ {\text{E}}\left( {\rho_{t} } \right) = \frac{1}{{C_{n} r^{n} }}\int_{A} \left( {R - d\left( {\xi_{t + 1} } \right)} \right){\text{d}}\xi_{t + 1} \le \frac{1}{{C_{n} r^{n} }} \cdot R \cdot V\left( A \right) $$
(2)

where \( C_{n} r^{n} = \frac{{\pi^{n/2} }}{{\Gamma (\frac{n}{2} + 1)}}r^{n} \) is the volume of the \( n \)-dimensional sphere with radius \( r \) and \( V(A) \) is the volume of the intersection of the two hyperspheres. We begin our deduction in an algebraic way. By solving the equations below

$$ \left\{ {\begin{array}{*{20}r} \hfill {\sum\nolimits_{i = 1}^{n} {x_{i}^{2} = R^{2} } } \\ \hfill {\left( {x_{1} - R} \right)^{2} + \sum\nolimits_{i = 2}^{n} {x_{i}^{2} = r^{2} } } \\ \end{array} } \right. $$
(3)

we get

$$ x_{1} = R - \frac{{r^{2} }}{2R} $$
(4)

which is the \( x_{1} \)-coordinate of the hyperplane \( P \) that contains the intersection of the two sphere surfaces.
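This elimination can be replayed symbolically; a sketch using sympy (our own check; `rho2` is shorthand for \( \sum\nolimits_{i = 2}^{n} x_{i}^{2} \)):

```python
import sympy as sp

x1, rho2, R, r = sp.symbols('x1 rho2 R r', real=True)
# Eq. (3): the fitness-sphere and mutation-sphere surfaces, with the tail
# coordinates collapsed into rho2 = sum_{i>=2} x_i^2.
sols = sp.solve([x1**2 + rho2 - R**2,
                 (x1 - R)**2 + rho2 - r**2], [x1, rho2])
print(sols)  # x1 = (2*R**2 - r**2)/(2*R), i.e., R - r**2/(2*R), matching Eq. (4)
```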

  1. \( r > R \). In this case, we have

    $$ V(A) \le V(S_{o} ) = C_{n} R^{n} = C_{n} r^{n} \cdot O(e^{{c_{1} \cdot n}} ) $$
    (5)

where \( c_{1} < 0 \) is a negative constant.

  2. \( r \le R \). In this case, the intersection \( A \) can be viewed as two spherical caps:

    $$ \begin{aligned} V\left( A \right) & = V\left( {A_{1} } \right) + V\left( {A_{2} } \right) \\ & = C_{n} r^{n} \cdot \frac{1}{2}I _{{\left( {1 - \frac{{r^{2} }}{{4R^{2} }}} \right)}} \left( {\frac{n + 1}{2},\frac{1}{2}} \right) + C_{n} R^{n} \frac{1}{2}I _{{\left( {\frac{{r^{2} }}{{R^{2} }} - \frac{{r^{4} }}{{4R^{4} }}} \right)}} \left( {\frac{n + 1}{2},\frac{1}{2}} \right) \\ & <\, C_{n} r^{n} \cdot \frac{{\frac{R}{r}\left( {1 - \frac{{r^{2} }}{{4R^{2} }}} \right)^{{\frac{n + 1}{2}}} }}{{\frac{n + 1}{2} \cdot B\left( {\frac{n + 1}{2},\frac{1}{2}} \right)}} + C_{n} R^{n} \cdot \frac{{\frac{{R^{2} }}{{2R^{2} - r^{2} }}\left( {\frac{{r^{2} }}{{R^{2} }} - \frac{{r^{4} }}{{4R^{4} }}} \right)^{{\frac{n + 1}{2}}} }}{{\frac{n + 1}{2} \cdot B\left( {\frac{n + 1}{2},\frac{1}{2}} \right)}} \\ & = C_{n} r^{n} \cdot \frac{{\frac{{2R^{3} }}{{r(2R^{2} - r^{2} )}}\left( {1 - \frac{{r^{2} }}{{4R^{2} }}} \right)^{{\frac{n + 1}{2}}} }}{{\frac{n + 1}{2} \cdot B\left( {\frac{n + 1}{2},\frac{1}{2}} \right)}} \\ & = C_{n} r^{n} \cdot O(e^{{c_{2} \cdot n}} ) \\ \end{aligned} $$

where \( c_{2} < 0 \) is a negative constant. The second equality is derived by applying Li's formula for the volume of a spherical cap, \( V(C_{a,b} ) = C_{n} a^{n} \cdot \frac{1}{2}I _{{\frac{{2ab - b^{2} }}{{a^{2} }}}} \left( {\frac{n + 1}{2},\frac{1}{2}} \right) \), where \( a \) is the radius of the hypersphere from which the spherical cap is cut and \( b \) is the height of the spherical cap [9]. The first inequality is based on

$$ \begin{aligned} & I_{x} \left( {\frac{n + 1}{2},\frac{1}{2}} \right) \equiv \frac{{B\left( {x;\frac{n + 1}{2},\frac{1}{2}} \right)}}{{B\left( {\frac{n + 1}{2},\frac{1}{2}} \right)}} = \frac{{\int_{0}^{x} t^{{\frac{n - 1}{2}}} (1 - t)^{{ - \frac{1}{2}}} {\text{d}}t}}{{B\left( {\frac{n + 1}{2},\frac{1}{2}} \right)}} \\ & < \frac{{(1 - x)^{{ - \frac{1}{2}}} \int_{0}^{x} t^{{\frac{n - 1}{2}}} {\text{d}}t}}{{B\left( {\frac{n + 1}{2},\frac{1}{2}} \right)}} = \frac{{(1 - x)^{{ - \frac{1}{2}}} x^{{\frac{n + 1}{2}}} }}{{\frac{n + 1}{2} \cdot B\left( {\frac{n + 1}{2},\frac{1}{2}} \right)}} \\ \end{aligned} $$
(6)

The last equality is based on

$$ B\left( {\frac{n + 1}{2},\frac{1}{2}} \right) = \frac{{\Gamma \left( {\frac{n + 1}{2}} \right)\Gamma \left( {\frac{1}{2}} \right)}}{{\Gamma \left( {\frac{n}{2} + 1} \right)}} = \varTheta \left( {\left( {\frac{n}{2} + 1} \right)^{{ - \frac{1}{2}}} } \right) = \varTheta \left( {n^{{ - \frac{1}{2}}} } \right) $$
(7)

Combining the two cases, we get

$$ {\text{E}}\left( {\rho_{t} } \right) = \frac{1}{{C_{n} r^{n} }}\int_{A} \left( {R - d(\xi_{t + 1} )} \right){\text{d}}\xi_{t + 1} \le \frac{1}{{C_{n} r^{n} }} \cdot R \cdot V(A) = O(e^{{c_{3} \cdot n}} ) $$
(8)

where \( c_{3} = \max \left\{ {c_{1} ,c_{2} } \right\} < 0 \) is a negative constant. By the lower bound theorem (Theorem 1), we have

$$ {\text{E}}(T) \ge \frac{R - R/2}{{O(e^{{c_{3} \cdot n}} )}} = \varOmega (e^{cn} ) $$
(9)

where \( c = - c_{3} > 0 \) is a positive constant.
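The key estimate \( {\text{E}}\left( {\rho_{t} } \right) = O(e^{{c_{3} \cdot n}} ) \) can be probed numerically; the sketch below (our own Monte Carlo check, with the arbitrary choice \( R = r = 1 \)) estimates the one-step expected progress and shows it shrinking rapidly as \( n \) grows:

```python
import numpy as np

def expected_progress_uniform(n, R=1.0, r=1.0, samples=200_000, seed=0):
    """Monte Carlo estimate of E(rho_t) for one step of the (1+1)-ES with
    uniform mutation inside a radius-r sphere, starting at distance R."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((samples, n))
    g /= np.linalg.norm(g, axis=1, keepdims=True)          # uniform directions
    m = r * rng.uniform(size=(samples, 1)) ** (1.0 / n) * g  # uniform in the ball
    parent = np.zeros(n)
    parent[0] = R                                # current individual on axis ox_1
    d_child = np.linalg.norm(parent + m, axis=1)
    return np.maximum(R - d_child, 0.0).mean()   # elitism: progress is never negative

for n in (2, 5, 10, 15, 20):
    print(n, expected_progress_uniform(n))       # decays roughly geometrically in n
```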

3.2 Gaussian Mutation

The analysis by Huang et al. for Gaussian mutation is based on a very strong assumption on the initialization [5]. In this subsection, we give a lower bound that is much tighter than the bound \( \varOmega (n) \) given by Jägersküpper [7]. Moreover, our analysis is more general than that of [5], since we make no assumption on the initialization.

Theorem 3.

For the (1+1)-ES minimizing the sphere function in \( n \)-dimensional Euclidean space using Gaussian mutation and elitist selection, the expected number of generations \( T \) needed to halve the approximation error is \( \varOmega (e^{cn} ) \).

Proof.

Denote the global optimum by \( o \) and the current individual by \( c \), i.e., \( \xi_{t} \). Suppose that the distance between \( o \) and \( c \) is \( d(o,c) = R \). Due to symmetry, we assume that \( c \) lies on the axis \( ox_{1} \). As before, we call the sphere \( S_{o} = \{ \xi \in {\mathbb{R}}^{n} \mid d(\xi ,o) \le R\} \) the fitness sphere, since all points inside \( S_{o} \) are better than the current individual \( c \).

According to the definition of progress rate, we get

$$ {\text{E}}\left( {\rho_{t} } \right) = \int_{S_{o}} \varphi (\xi_{t + 1} )\left( {R - d(\xi_{t + 1} )} \right){\text{d}}\xi_{t + 1} \le \int_{S_{o}} \varphi (\xi_{t + 1} )R\,{\text{d}}\xi_{t + 1} $$
(10)

where \( \varphi ( \cdot ) \) is the probability density function of the \( n \)-variate standard Gaussian distribution. Since \( \varphi \left( {\xi_{t + 1} } \right) \le \max \left\{ {\varphi (m) \mid m \in {\mathbb{R}}^{n} } \right\} = (2\pi )^{ - n/2} \), we have

$$ {\text{E}}\left( {\rho_{t} } \right) \le \int_{S_{o}} \left( {2\pi } \right)^{{ - \frac{n}{2}}} R\,{\text{d}}\xi_{t + 1} = \left( {2\pi } \right)^{{ - \frac{n}{2}}} \cdot R \cdot V\left( {S_{o} } \right) = \left( {2\pi } \right)^{{ - \frac{n}{2}}} \cdot \frac{{\pi^{n/2} R^{n + 1} }}{{\Gamma (n/2 + 1)}} $$
(11)

where \( V(S_{o} ) \) is the volume of the fitness sphere \( S_{o} \). Since \( R/2 \le d(\xi_{i} ) \le R \) for all \( i \in [0,T] \), we have \( {\text{E}}\left( {\rho_{t} } \right) \le \frac{{R^{n + 1} }}{{\Gamma (n/2 + 1) \cdot 2^{n/2} }} \). By the lower bound theorem, we have

$$ {\text{E}}\left( T \right) \ge \frac{{R - \frac{R}{2}}}{{\frac{{R^{n + 1} }}{{\Gamma \left( {\frac{n}{2} + 1} \right) \cdot 2^{{\frac{n}{2}}} }}}} = \frac{{2^{{\frac{n}{2} - 1}} }}{{R^{n} }} \cdot\Gamma \left( {\frac{n}{2} + 1} \right) = \varOmega (e^{cn} ) $$
(12)

where \( c > 0 \) is a positive constant, since \( R \) is a positive constant and \( \Gamma \left( {\frac{n}{2} + 1} \right) \) grows faster than any exponential in \( n \).
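The bound from the proof can be tabulated directly; a small sketch (our own numerical illustration, with \( R = 1 \)) evaluates the progress-rate bound \( R^{n + 1} /(\Gamma (n/2 + 1) \cdot 2^{n/2} ) \) and the implied lower bound \( (R - R/2)/u \) on \( {\text{E}}(T) \):

```python
import math

def gaussian_progress_bound(n, R=1.0):
    """Upper bound on E(rho_t) from the proof of Theorem 3."""
    return R ** (n + 1) / (math.gamma(n / 2 + 1) * 2 ** (n / 2))

for n in (2, 5, 10, 20, 40):
    u = gaussian_progress_bound(n)
    print(n, u, 0.5 / u)  # lower bound on E(T): (R - R/2) / u with R = 1
```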

4 On the Effectiveness of 1/5-Rule

Jägersküpper proved that the expected running time of the (1+1)-ES minimizing the sphere function using Gaussian mutation adapted by the 1/5-rule has an upper bound of \( O(n) \) [7]. In this section, we study the effectiveness of the 1/5-rule for the (1+1)-ES using the uniform mutation operator.

Lemma 1

[7]. For an \( n \)-dimensional vector \( m \) with each component independently standard normally distributed, the expectation \( l_{E} = {\text{E}}\left( {|m|} \right) \) exists, and \( {\text{P}}\left\{ {\left| {\left| m \right| - l_{E} } \right| \ge \delta \cdot l_{E} } \right\} \le \frac{{\delta^{ - 2} }}{2n - 1} \).

A similar result holds for uniform mutation inside a hypersphere. Lemma 2 reveals that the length of an \( n \)-dimensional vector uniformly sampled from the unit hypersphere is \( \varTheta (1) \) with overwhelming probability (w.o.p.).

Lemma 2.

For an \( n \)-dimensional vector \( m \) uniformly distributed in the unit hypersphere, the expectation \( l_{E} = {\text{E}}\left( {|m|} \right) \) exists, and \( {\text{P}}\left\{ {\left| {\left| m \right| - l_{E} } \right| \ge \delta \cdot l_{E} } \right\} \le \frac{{\delta^{ - 2} }}{{n^{2} + 2n}} \).

Proof.

By the definition of expectation, we have

$$ {\text{E}}\left( {|m|} \right) = \int_{0}^{1} rA_{n} (r) \cdot \frac{1}{{V_{n} (1)}}{\text{d}}r = \frac{n}{n + 1} $$
(13)

where \( V_{n} \left( 1 \right) = \frac{{\pi^{n/2} }}{{\Gamma \left( {\frac{n}{2} + 1} \right)}} \) is the volume of the unit hypersphere and \( A_{n} \left( r \right) = \frac{{2\pi^{n/2} }}{{\Gamma \left( {\frac{n}{2}} \right)}} \cdot r^{n - 1} \) is the surface area of the \( n \)-dimensional hypersphere of radius \( r \). Similarly, we have

$$ {\text{E}}\left( {|m|^{2} } \right) = \int_{0}^{1} r^{2} A_{n} (r) \cdot \frac{1}{{V_{n} (1)}}{\text{d}}r = \frac{n}{n + 2} $$
(14)

Consequently, we get

$$ {\text{Var}}\left( {|m|} \right) = {\text{E}}\left( {|m|^{2} } \right) - {\text{E}}\left( {|m|} \right)^{2} = \frac{n}{{(n + 1)^{2} (n + 2)}} $$
(15)

Thus, we get

$$ {\text{P}}\left\{ {\left| {\left| m \right| - l_{E} } \right| \ge \delta \cdot l_{E} } \right\} \le \frac{{\delta^{ - 2} }}{{n^{2} + 2n}} $$
(16)

by applying Chebyshev's inequality.
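Both moments, and hence the concentration claimed by Lemma 2, are easy to validate empirically; a minimal sketch (our own check) compares sample statistics of \( |m| \) with \( \frac{n}{n + 1} \) and \( \frac{n}{{(n + 1)^{2} (n + 2)}} \):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (3, 10, 30):
    # |m| for m uniform in the unit ball has the same law as U**(1/n), U ~ U(0,1)
    norms = rng.uniform(size=200_000) ** (1.0 / n)
    print(n, norms.mean(), n / (n + 1),               # E(|m|)   = n/(n+1)
          norms.var(), n / ((n + 1) ** 2 * (n + 2)))  # Var(|m|) = n/((n+1)^2 (n+2))
```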

Intuitively, we partition a run of the (1+1)-ES adapted by the 1/5-rule into phases, each of which lasts \( n \) steps. Thus, within each phase, the radius of the mutation sphere is fixed. Denote the radius and the scaling factor in the \( t \)-th phase by \( r_{t} \) and \( \sigma_{t} \), respectively. Jägersküpper found the following relation between the radius of the mutation sphere and the progress rate; although stated for Gaussian mutation, it can be extended to uniform mutation similarly.

Lemma 3

[7]. Let the (1+1)-ES minimize the sphere function using Gaussian mutation adapted by the 1/5-rule and elitist selection. Then:

  1. If \( r_{t} = \varTheta \left( {d\left( {\xi_{t} } \right)/\sqrt n } \right) \), then w.o.p. \( d\left( {\xi_{t + 1} } \right) = d\left( {\xi_{t} } \right) - \varOmega \left( {d\left( {\xi_{t} } \right)} \right) \), i.e., the distance to the optimum is reduced by a constant fraction in the \( t \)-th phase;

  2. If \( \sigma \) is doubled after the \( t \)-th phase, then \( r_{t} = {\rm O}\left( {d\left( {\xi_{t} } \right)/\sqrt n } \right) \) w.o.p.;

  3. If \( \sigma \) is halved after the \( t \)-th phase, then \( r_{t + 1} = \varOmega \left( {d\left( {\xi_{t + 1} } \right)/\sqrt n } \right) \) w.o.p.

Since the proof of Lemma 3 is based merely on the isotropy of Gaussian mutation and on Lemma 1, we can derive the following result by simply applying the isotropy of uniform mutation and Lemma 2.

Lemma 4.

Let the (1+1)-ES minimize the sphere function using uniform mutation inside a hypersphere, adapted by the 1/5-rule, with elitist selection. Then:

  1. If \( r_{t} = \varTheta \left( {d\left( {\xi_{t} } \right)/\sqrt n } \right) \), then w.o.p. \( d\left( {\xi_{t + 1} } \right) = d\left( {\xi_{t} } \right) - \varOmega \left( {d\left( {\xi_{t} } \right)} \right) \), i.e., the distance to the optimum is reduced by a constant fraction in the \( t \)-th phase;

  2. If \( \sigma \) is doubled after the \( t \)-th phase, then \( r_{t} = {\rm O}\left( {d\left( {\xi_{t} } \right)/\sqrt n } \right) \) w.o.p.;

  3. If \( \sigma \) is halved after the \( t \)-th phase, then \( r_{t + 1} = \varOmega \left( {d\left( {\xi_{t + 1} } \right)/\sqrt n } \right) \) w.o.p.

Theorem 4

[7]. Let the (1+1)-ES minimize the sphere function in \( n \)-dimensional Euclidean space using Gaussian mutation adapted by the 1/5-rule and elitist selection. Given that the initialization ensures \( d(\xi_{0} )/\sigma_{0} = \varTheta (n) \), the number of generations \( T \) needed to halve the approximation error is \( {\rm O}(n) \) w.o.p.

Since the proof of Theorem 4 is based merely on the isotropy of Gaussian mutation and on Lemma 3, we can derive the following result by simply applying the isotropy of uniform mutation and Lemma 4.

Theorem 5.

Let the (1+1)-ES minimize the sphere function in \( n \)-dimensional Euclidean space using uniform mutation in a hypersphere adapted by the 1/5-rule and elitist selection. Given that the initialization ensures \( d(\xi_{0} )/\sigma_{0} = \varTheta (\sqrt n ) \), the number of generations \( T \) needed to halve the approximation error is \( {\rm O}(n) \) w.o.p.
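The linear scaling asserted by Theorem 5 can be probed empirically with `one_fifth_rule_run` from the sketch in Sect. 2.2 (our own illustrative experiment; with \( R = 1 \), choosing \( \sigma_{0} = 1/\sqrt n \) realizes \( d(\xi_{0} )/\sigma_{0} = \varTheta (\sqrt n ) \)):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (5, 10, 20, 40):
    runs = [one_fifth_rule_run(n, R=1.0, sigma0=1.0 / np.sqrt(n), r=1.0, rng=rng)
            for _ in range(20)]
    print(n, np.mean(runs))  # mean generations to halve the error; roughly linear in n
```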

5 Conclusion

In this paper, we first derived a tighter lower bound, \( \varOmega (e^{cn} ) \), for the (1+1)-ES using uniform mutation inside a hypersphere on the sphere function, in contrast to the state-of-the-art bound \( \varOmega ({n}) \). Second, we extended the result to the case where the mutation operator is Gaussian. Third, we studied the effectiveness of the 1/5-rule for the (1+1)-ES using uniform mutation inside a hypersphere for the first time, and proved that the incorporation of the 1/5-rule reduces the time complexity from exponential to polynomial.

A uniform mutation operator distributed in a hypercube is also used by some researchers. Unlike the uniform distribution inside a hypersphere, it is not an isotropic distribution, and the effectiveness of the 1/5-rule on it has not been studied yet. We leave this as future work.