1 Introduction

Evolutionary algorithms (EAs), which are a class of stochastic search algorithms inspired by the principles of natural evolution, have been successfully applied to a large number of optimization problems. Many of these problems are continuous optimization problems and, as a consequence, several interesting EA variants for real-valued optimization have been investigated, such as the evolution strategy (ES) with adaptive encoding (Hansen 2008), differential evolution (DE) (Das et al. 2009; Price et al. 2005; Qin et al. 2009), real-coded genetic algorithms (Herrera et al. 1998; García-Martínez et al. 2008), memetic algorithms (Lozano et al. 2004; Nguyen et al. 2009; Noman and Iba 2008), and hybrid approaches (Vrugt et al. 2009).

Stochastic search algorithms, such as EAs, differ from other optimization algorithms by using random samples to generate new candidate solutions. Traditionally, when EAs are applied to real-valued optimization, new candidate solutions are generated by mutation using multivariate samples taken from Gaussian distributions with zero mean (Beyer and Schwefel 2002). The isotropic Gaussian distribution, which has a finite second moment, maximizes the Boltzmann–Gibbs entropy (and the differential entropy, i.e., the extension of Shannon's concept of information entropy to the continuous case) in the unconstrained real-valued search space (Thistleton et al. 2007) and thus does not favor any direction in the search space. As a consequence, the generation of new candidate solutions by mutation does not require any knowledge about the geometry of the search space.

However, in recent years, researchers have proposed the use of distributions with longer tails and an infinite second moment in EAs. For example, fast evolutionary programming (FEP) (Yao et al. 1999) employs the Cauchy distribution, while evolutionary programming (EP) with Lévy mutation (LEP) (Lee and Yao 2004) uses mutation based on the Lévy distribution. The Lévy distribution is a class of probability distributions with an infinite second moment, which includes the Cauchy distribution, and allows the tail of the distribution to be controlled by changing a scalar parameter α.

The use of mutations taken from heavy-tail distributions implies jumps of scale-free sizes, allowing distant regions of the search space to be reached faster. This property can be interesting when EAs are applied to multimodal or dynamic optimization problems, as it can allow the population to escape faster from local optima. However, some controversy about the benefits of using distributions with heavy tails in EAs has appeared in recent years (Hansen et al. 2006). For example, some researchers argue that most of the proposed algorithms use heavy-tail distributions that are anisotropic, i.e., where some directions of the mutation are privileged in the search space (Obuchowicz 2003). Moreover, in some fitness landscapes, it can be difficult to reach a fair region of the search space by a long jump, as the probability of reaching a point with a worse fitness is generally much larger for a long jump (Hansen et al. 2006).

In this paper, self-adaptation is employed to control the mutation distribution. In this way, the choice of a mutation distribution for a given problem, at a given moment of the evolutionary process, is made by the algorithm rather than by the programmer. In the proposed algorithm, a real parameter that defines the shape of the distribution employed by the mutation operator is encoded in the chromosome of the individuals and is allowed to evolve. For this purpose, the q-Gaussian probability density function (Thistleton et al. 2007), derived from the Tsallis generalized entropy (Tsallis 1988), is employed. The q-Gaussian distribution allows the shape of the distribution to be controlled by setting a single real parameter q, and can reproduce either distributions with a finite second moment, like the Gaussian distribution, or distributions with an infinite second moment, like the Cauchy distribution.

The rest of this paper is organized as follows: Related work and the main contributions of the paper are presented in Sect. 2. The q-Gaussian distribution is briefly introduced in Sect. 3. In Sect. 4, self-adaptation of the mutation distribution in EAs is proposed. Evolutionary programming algorithms with mutations generated from anisotropic and isotropic distributions are presented in Sect. 5. In Sect. 6, a theoretical analysis of the q-Gaussian mutation is provided. In Sect. 7, the q-Gaussian mutation is used in a \((\mu, \lambda)\)-ES with restart. The experimental study based on a suite of benchmark test functions is presented in Sect. 8. In the experiments with EP algorithms, Gaussian, Cauchy, and q-Gaussian mutations generated from anisotropic and isotropic distributions are compared, while in the remaining experiments, a \((\mu, \lambda)\)-ES with q-Gaussian mutation and restart is compared with other continuous optimization EAs. Finally, Sect. 9 concludes this paper with discussions of relevant future work.

2 Related work

In this paper, instead of only controlling the mutation strength parameter that defines the spread of a fixed mutation distribution, as in ESs and EP, the shape of the mutation distribution is also controlled during the evolutionary process. There are three main classes of parameter control in EAs (Eiben et al. 1999): deterministic, where the parameters are changed by deterministic rules; adaptive, where feedback from the optimization process is employed for parameter control; and self-adaptive, where parameters of the EA are encoded in the chromosome and allowed to evolve. The use of self-adaptation to define the mutation strength parameters is usual in EP and ESs. Here, self-adaptation is used to control the parameter q, which allows the shape of the q-Gaussian distribution employed to generate random mutations to be modified.

The use of mutations generated from q-Gaussian distributions in EAs is not new (Iwamatsu 2002; Moret et al. 2006). However, in such algorithms, as in most other algorithms that use heavy-tail distributions, new candidate solutions are produced by generating random deviates for each coordinate of the individual, which implies anisotropic distributions where large steps are generated close to the coordinate axes. Moreover, in such algorithms, the parameter q, which defines the shape of the distribution, is either fixed during the search or starts with a large value and decreases during the search, like the temperature control in simulated annealing. That is, the control is deterministic. In Lee and Yao (2004), the authors proposed two schemes for LEP: in the first scheme, all offspring are generated from a distribution with a fixed α, and in the second, each parent generates five offspring, each of which is generated from a distribution with a different pre-defined value of α. All individuals in LEP use the same pre-fixed values of α during the whole evolutionary process.

Control of the mutation probability density function is not new either. In Davis (1994), the mutation probability density function is generated from a histogram with 101 bars representing the probability values over a range of interest in a one-dimensional space. Self-adaptation is employed to control the heights of the bars, allowing the shape of the mutation distribution to change while the EA runs (Bäck 2000). Experiments indicated the formation of histograms with a peak in the center, suggesting that Gaussian and Cauchy distributions are good candidates as mutation distributions. However, the use of histograms is not suitable when the dimension of the search space is high because the required number of histograms increases exponentially with the dimension. As pointed out in Bäck (2000), controlling the mutation probability density function by a few control parameters would be more interesting.

In Dong et al. (2007), four types of mutation can occur: Cauchy, Lévy, and two types of Gaussian mutation (i.e., the standard Gaussian mutation and the single-point mutation, where in each occurrence only one dimension is changed). A four-element vector containing the probabilities of choosing each type of mutation is added to the individual and is modified according to the performance of each type of mutation.

The solution presented in this paper to control the shape of the mutation distribution employs only one parameter for each individual: the q-Gaussian distribution parameter q. Unlike the strategy used in Dong et al. (2007), controlling the parameter q allows the shape of the distribution to be changed smoothly and continuously, as q is a real parameter and a small change in its value causes a small change in the shape of the mutation distribution. In this way, the main contributions of this paper are: (1) the use of q-Gaussian mutations generated from anisotropic and isotropic distributions is proposed and compared in EP; (2) self-adaptation is used to control the parameter q, which allows the shape of the distribution to change during the optimization process; (3) Gaussian, Cauchy, and q-Gaussian mutations generated from anisotropic and isotropic distributions are compared in a series of experiments based on a suite of benchmark test functions; and (4) an ES with q-Gaussian mutation is proposed and compared with other continuous optimization EAs.

3 The q-Gaussian distribution

One of the most interesting properties of the Gaussian distribution is that it maximizes, under certain constraints, the entropy in the form

$$ S = -\int\limits_{-\infty}^{+\infty}{p(x)\ln(p(x)){\text{d}}x}, $$
(1)

where \(p(x)\) is the distribution density function. This entropy is known as the Boltzmann–Gibbs entropy. While the Gaussian distribution is an attractor for independent systems with a finite second moment, it does not represent correlated systems with an infinite second moment well (Thistleton et al. 2007). In this respect, Tsallis (1988) proposed a generalized entropy form as follows:

$$ S_{q} = {\frac{1 - \int_{-\infty}^{+\infty}{p(x)^{q}{\text{d}}x}}{q-1}}, $$
(2)

where \({q \in \mathbb {R}}\). Equation 2 recovers the entropy form given by Eq. 1 in the limit \(q \rightarrow 1\). The q-Gaussian distribution arises when maximizing the generalized entropy form given by Eq. 2, and it has interesting properties. The parameter q controls the shape of the q-Gaussian distribution. The second-order moment is finite for \(q < 5/3\), and the q-Gaussian distribution reproduces the usual Gaussian distribution when \(q\rightarrow1\). When \(q<1\), the q-Gaussian distribution has compact support, while for \(1<q<3\) it decays asymptotically as a power law. When \(q=2\), the q-Gaussian distribution reproduces the Cauchy distribution, while for \(q=(3+d)/(1+d)\), where \(0<d<\infty\), it becomes a Student's t-distribution with d degrees of freedom (Souza and Tsallis 1997).

When \(-\infty <q <3\), the q-Gaussian distribution density (Thistleton et al. 2007) is given by

$$ p_{q(\bar{\mu}_{q}, \bar{\sigma}_{q})}(x) = {\frac{\sqrt{B_{q}}} {A_{q}}}\;{\text{e}}_{q}^{-B_{q}(x-\bar{\mu}_{q})^2}, $$
(3)

where \(\bar{\mu}_{q}\) and \(\bar{\sigma}_{q}\) are the q-mean and the q-variance, respectively, \(A_{q}\) is the normalization factor, \(B_{q}\) controls the width of the q-Gaussian distribution, and \({\text{e}}_{q}^{-y}\) is the q-exponential function of \(-y\), defined as follows:

$$ {\text{e}}_{q}^{-y}\equiv\left\{\begin{array}{ll} \left(1+(q-1)y\right)^{-{\frac{1} {q-1}}}, & \hbox{if}\;1+(q-1) y \geq 0\\ 0, & \hbox{otherwise}\\ \end{array} \right.. $$
(4)
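
For concreteness, the q-exponential of Eq. 4 can be computed in a few lines. The following sketch (the function name is ours) clips the argument at zero, which realizes the cut-off branch of Eq. 4:

```python
import numpy as np

def q_exp_neg(y, q):
    """q-exponential of -y (Eq. 4); reduces to exp(-y) as q -> 1."""
    y = np.asarray(y, dtype=float)
    if abs(q - 1.0) < 1e-12:
        return np.exp(-y)
    # Clipping at zero implements the "otherwise" branch of Eq. 4 (for q < 1
    # the distribution built on this function has compact support).
    base = np.maximum(1.0 + (q - 1.0) * y, 0.0)
    return base ** (-1.0 / (q - 1.0))

# Sanity check: for q = 2, e_q^{-y} = 1 / (1 + y).
print(q_exp_neg(np.array([0.0, 1.0, 4.0]), q=2.0))   # [1.0, 0.5, 0.2]
```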

Writing \(z=(q-1)y\), the limit of the q-exponential function of \(-y\) when \(q\rightarrow1\) is given by

$$ \lim_{q\rightarrow1} {\text{e}}_{q}^{-y} = \lim_{z\rightarrow0} \left( \left(1+ z \right)^{{\frac{1}{z}}}\right)^{-y}. $$
(5)

The function \((1+ z )^{{\frac{1}{z}}}\) is well known to converge to e when \(z\rightarrow0\). Thus, we have

$$ \lim_{q\rightarrow1} {\text{e}}_{q}^{-y} = {\text{e}}^{-y} ,$$
(6)

i.e., the q-exponential function converges to the exponential function when \(q \rightarrow 1\).

In Eq. 3, the q-mean \(\bar{\mu}_{q}\) and the q-variance \(\bar{\sigma}_{q}\) (Thistleton et al. 2007) are, respectively, defined as follows:

$$ \bar{\mu}_{q} \equiv {\frac{\int{x p(x)^{q}} {\text{d}}x}{\int{p(x)^{q}}{\text{d}}x}}, $$
(7)
$$ \bar{\sigma}_{q}^{2}\equiv{\frac{\int{(x- \bar{\mu}_{q})^{2}p(x)^{q}{\text{d}}x}}{\int{p(x)^{q}}{\text{d}}x}}, $$
(8)

and, respectively, reduce to the usual mean and variance when \(q\rightarrow1\).

In Eq. 3, the normalization factor \(A_{q}\) is given by \(A_{q}= \int_{-\infty}^{+\infty}{e_{q}^{-(x-\bar{\mu}_{q})^2}}{\text{d}}x\) (Umarov et al. 2008) and \(B_{q}\) is given by

$$ B_{q} = \left((3-q) \bar{\sigma}_{q}^{2}\right)^{-1}. $$
(9)

A random variable x taken from a q-Gaussian distribution with q-mean \(\bar{\mu}_{q}\) and q-variance \(\bar{\sigma}_{q}^{2}\) is denoted here by \(x\sim\mathcal{N}_{q}(\bar{\mu}_{q},\bar{\sigma}_{q})\). In this paper, the generalized Box–Müller method proposed in Thistleton et al. (2007), which is very simple (see its pseudo-code in Thistleton et al. 2007) and allows samples to be generated from q-Gaussian distributions for \(-\infty<q<3\), is employed to generate q-Gaussian random variables \(x \sim \mathcal{N}_{q}(0,1)\).
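
The following sketch shows our reading of the generalized Box–Müller method of Thistleton et al. (2007); the auxiliary parameter \(q'=(1+q)/(3-q)\) and the q-logarithm belong to that method, while the function names are ours. For \(q = 1\) it reduces to the standard Box–Müller transform:

```python
import numpy as np

def q_log(u, q):
    """q-logarithm ln_q(u) = (u**(1-q) - 1) / (1-q); reduces to ln(u) as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return np.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

def q_gaussian(q, size=1, rng=None):
    """Draw standard q-Gaussian deviates x ~ N_q(0,1) for q < 3."""
    rng = rng or np.random.default_rng()
    q_prime = (1.0 + q) / (3.0 - q)          # auxiliary parameter of the method
    u1, u2 = rng.random(size), rng.random(size)
    # q_log(u1, q_prime) <= 0 for u1 in (0,1), so the square root is well defined.
    return np.sqrt(-2.0 * q_log(u1, q_prime)) * np.cos(2.0 * np.pi * u2)
```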

Figure 1 presents the empirical q-Gaussian distribution for random variables \(x \sim \mathcal{N}_{q}(0,1)\) with different values of q. It can be observed that larger values of q result in longer tails of the q-Gaussian distribution.

Fig. 1 Empirical distribution for 500,000 observations of a q-Gaussian random variable \(x \sim \mathcal{N}_{q}(0,1)\) for \(q=-0.5\), \(q=1.0\) (Gaussian), \(q=2.0\) (Cauchy), and \(q=2.8\). Only the values of x between −10 and +10 are shown

4 Self-adaptation of the mutation distribution

In an m-dimensional real-valued search space, a new candidate solution is generated by the EA’s mutation operator from individual \({\mathbf{x}}_{i}\) as follows:

$$ \tilde{{{\mathbf{x}}}}_{i} = {{\mathbf{x}}}_{i} + {{\mathbf{Cz}}} , $$
(10)

where \(i=1,\ldots,\mu\), z is an m-dimensional random vector generated from a given multivariate distribution with zero mean, and C is the matrix that defines the mutation strength in each coordinate \(j=1, \ldots, m\). In the simplest case,

$$ {{\mathbf{C}}} = \sigma {{\mathbf{I}}}, $$
(11)

where I is the identity matrix and the single parameter \(\sigma\) defines the mutation strength for all components of \({\mathbf{x}}_{i}\). There are some cases, however, where it is interesting to define a different parameter \(\sigma(j)\) for each component of \({\mathbf{x}}_{i}\). In this way, we have

$$ {{\mathbf{C}}}=\hbox{diag}({\varvec{\sigma}}^{{\rm T}}). $$
(12)

That is, C is a diagonal matrix whose main diagonal is composed of the elements of the vector \({\varvec{\sigma}} = [\sigma(1)\,\sigma(2) \cdots \sigma(m)]^{{\rm T}}\). In the most general situation, e.g., in the covariance matrix adaptation ES (CMA-ES) (Hansen and Ostermeier 2001), C is a matrix whose elements indicate the correlation between the components of z.
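
As a brief illustration of Eqs. 10–12, a minimal sketch of the mutation step with a diagonal C (variable names ours):

```python
import numpy as np

def mutate(x, sigma, z):
    """Eq. 10 with C = diag(sigma) (Eq. 12); Eq. 11 is the special case of a
    constant sigma vector. The elementwise product equals diag(sigma) @ z."""
    return x + sigma * z

rng = np.random.default_rng(0)
m = 5
x = np.zeros(m)
x_new = mutate(x, sigma=0.1 * np.ones(m), z=rng.standard_normal(m))
```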

In general, when self-adaptation is used in the standard EP (Yao et al. 1999) and in ES (Beyer and Schwefel 2002), the mutation strength parameter for each offspring \(i=1, \ldots, \mu\) of the population is multiplicatively updated. If all elements of the mutation parameter vector are equal [i.e., \(\sigma_{i}(j) =\sigma_{i}\) for \(j=1, \ldots, m\), see Eq. 11], then the updated value of the mutation parameter is given by

$$ \tilde{\sigma}_{i} = \sigma_{i} {\text{e}}^{\tau_{b} {\mathcal{N}}(0,1)} , $$
(13)

where \(\tau_{b}\) denotes the standard deviation of the Gaussian distribution used to generate the change in \(\sigma_{i}\). If each element of the vector \(\mathbf{x}_{i}\) has an individual mutation strength parameter, as shown in Eq. 12, \(\sigma_{i}(j)\) is updated according to the following formula:

$$ \tilde{\sigma}_{i}(j) = \sigma_{i}(j) {\text{e}}^{\tau_{b} {\mathcal{N}}(0,1)_{i} + \tau_{c} {\mathcal{N}}(0,1)} , $$
(14)

where \(\tau_{b}\) denotes the standard deviation of the Gaussian distribution used to generate the random deviate \(\mathcal{N}(0,1)_{i}\), which is common to all elements of the vector \(\mathbf{x}_{i}\), and \(\tau_{c}\) is the standard deviation of the Gaussian distribution used to generate a separate random deviate \(\mathcal{N}(0,1)\) for each element \(j=1, \ldots, m\).
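
A minimal sketch of the log-normal update of Eq. 14 (Eq. 13 is the special case with a single scalar strength):

```python
import numpy as np

def update_sigma(sigma, tau_b, tau_c, rng):
    """Log-normal self-adaptation (Eq. 14): one deviate shared by all
    coordinates of the individual plus one separate deviate per coordinate."""
    shared = rng.standard_normal()                  # N(0,1)_i, common factor
    per_coord = rng.standard_normal(sigma.shape)    # N(0,1), one per element
    return sigma * np.exp(tau_b * shared + tau_c * per_coord)
```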

In EAs, the Gaussian distribution is generally employed to generate the m-dimensional vector \(\mathbf{z}\) (Beyer and Schwefel 2002). Here, an m-dimensional random vector generated from the Gaussian distribution is denoted by \(\mathbf{z} \sim \mathcal{N}^{m}\). A Gaussian random vector \(\mathbf{z} \sim \mathcal{N}^{m}\) is generated by sampling m independent Gaussian variables \(\mathcal {N}(0,1)\). It is important to observe that when the same procedure is adopted to generate multivariate random samples with a heavy-tail distribution, some directions in the search space are much more explored than others, i.e., the distribution is highly anisotropic.

To the best of the authors' knowledge, all stochastic search algorithms with the q-Gaussian mutation, like the generalized simulated annealing (Tsallis and Stariolo 1996) and the generalized genetic algorithm (Moret et al. 2006), make use of anisotropic q-Gaussian distributions generated by sampling m independent q-Gaussian variables. Most mutation operators for EAs that are based on heavy-tail distributions, e.g., in fast evolution strategies (Yao and Liu 1997), FEP (Yao et al. 1999), and LEP (Lee and Yao 2004), also make use of anisotropic distributions generated by sampling independent random variables. Random vectors generated by sampling independent random variables taken from a heavy-tail distribution are very interesting for optimization problems with separable functions, as most of the large steps occur close to the coordinate axes (Obuchowicz 2003; Thistleton et al. 2007) and the optimization can be solved by m one-dimensional optimization processes parallel to the coordinate axes. However, the performance of the optimization process can be strongly affected for some non-separable functions.

In this paper, we investigate the use of two multivariate q-Gaussian distributions, the anisotropic q-Gaussian distribution generated by sampling independent q-Gaussian random variables (Sect. 5.1) and the q-Gaussian distribution generated from isotropic distributions (Sect. 5.2), to produce new candidate solutions in EAs. As mentioned earlier, the use of the q-Gaussian distribution allows us to reproduce different distributions by changing only one real parameter q.

We propose to self-adapt the parameter q, which defines the shape of the distribution. Based on the mutation strength self-adaptation (Beyer and Schwefel 2002), we propose to multiplicatively update the parameter q in individual i as follows:

$$ \tilde{q}_{i} = q_{i} {\text{e}}^{\tau_{q} {\mathcal{N}}(0,1)}, $$
(15)

where \(\tau_{q}\) denotes the standard deviation of the Gaussian distribution. In this way, different distributions can be reproduced during the evolutionary process. However, it is not possible to identify the separate influence of a change in \(\varvec{\sigma}_{i}\) or \(q_{i}\) on the fitness of individual i if both the mutation strength vector and the parameter q are mutated in the same generation for individual i. For example, a beneficial mutation in \(\varvec{\sigma}_{i}\) can be masked in the fitness of individual i if the parameter q is mutated to a bad value in the same generation. Here, \(\varvec{\sigma}_{i}\) and \(q_{i}\) are not mutated together (i.e., in the same generation) in each individual. The mutation strength vector \(\varvec{\sigma}_{i}\) is updated for individual i in each generation if a uniform random number in the range [0, 1] is equal to or larger than a real parameter \(r_{q} \in [0,1]\). Otherwise, the value of \(q_{i}\) is updated.
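
The following sketch combines Eq. 14, Eq. 15, and the \(r_{q}\) gate described above; the clipping of q to the bounds later used in Sect. 8.1 is our addition at this point in the text:

```python
import numpy as np

def self_adapt(sigma, q, tau_b, tau_c, tau_q, r_q, rng,
               q_min=0.8, q_max=0.8 * np.e):
    """Mutate either the strength vector (Eq. 14) or q (Eq. 15), never both."""
    if rng.random() >= r_q:
        # Update the mutation strength vector this generation (Eq. 14).
        sigma = sigma * np.exp(tau_b * rng.standard_normal()
                               + tau_c * rng.standard_normal(sigma.shape))
    else:
        # Otherwise update the distribution shape parameter q (Eq. 15),
        # clipped to the bounds used in the experiments (Sect. 8.1).
        q = float(np.clip(q * np.exp(tau_q * rng.standard_normal()),
                          q_min, q_max))
    return sigma, q
```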

Algorithm 1 EP algorithm with q-Gaussian mutation (pseudo-code)

5 Evolutionary programming algorithms with q-Gaussian mutation

In order to test the proposed ideas, an EP algorithm with q-Gaussian mutation, called qGEP in the anisotropic version and IqGEP in the version generated from isotropic distributions, is presented in Algorithm 1. EP was selected to test the q-Gaussian mutation because it uses only mutation as a variation operator, which makes it easier to compare the q-Gaussian mutation with the Cauchy and Gaussian mutations.

The EP algorithm presented in Algorithm 1 differs from Gaussian EP, FEP (Yao et al. 1999), and LEP (Lee and Yao 2004) in two main ways. First, in the proposed algorithm, the q-Gaussian mutation is employed (step 10) instead of the Gaussian (Gaussian EP), Cauchy (FEP), or Lévy (LEP) mutation. Second, a procedure to adapt the parameter q is adopted in the proposed algorithm, i.e., steps 5 to 9 in Algorithm 1. For Gaussian EP, FEP, and LEP, steps 5, 7, 8, and 9 in Algorithm 1 are removed.
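
Since the pseudo-code figure is not reproduced here, the following sketch gives one possible reading of a generation of Algorithm 1, reusing the helpers sketched in Sects. 3 and 4; the tournament-based winning-function selection follows the standard EP scheme with the settings of Sect. 8.1:

```python
import numpy as np

def qgep_generation(pop, fit, f, tau_b, tau_c, tau_q, r_q, rng, tournament=10):
    """One generation of Algorithm 1 (a sketch). pop is a list of (x, sigma, q)
    tuples, fit an array with their fitness values, and f the objective."""
    offspring, off_fit = [], []
    for (x, sigma, q) in pop:
        sigma, q = self_adapt(sigma, q, tau_b, tau_c, tau_q, r_q, rng)  # steps 5-9
        z = q_gaussian(q, size=x.size, rng=rng)   # anisotropic variant (Eq. 16)
        x_new = x + sigma * z                     # step 10: q-Gaussian mutation
        offspring.append((x_new, sigma, q))
        off_fit.append(f(x_new))
    # Winning-function selection: each individual in the union of parents and
    # offspring scores one win per random opponent it beats (minimization).
    union = pop + offspring
    scores = np.concatenate([fit, off_fit])
    wins = [(scores[i] <= scores[rng.integers(len(union), size=tournament)]).sum()
            for i in range(len(union))]
    survivors = np.argsort(wins)[::-1][:len(pop)]
    return [union[i] for i in survivors], scores[survivors]
```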

5.1 EP with q-Gaussian mutation generated from anisotropic distribution

In an EA with q-Gaussian mutation generated from an anisotropic distribution, the vector \(\mathbf{z}\) in Eq. 10 is created by sampling m independent q-Gaussian random variables. Each element \(z(j)\), \(j=1,\ldots,m\), of the random mutation vector \(\mathbf{z}\) is generated as follows:

$$ z(j) \sim {\mathcal{N}}_{q}(0,1) . $$
(16)

Here, an m-dimensional random vector generated from the multivariate anisotropic q-Gaussian distribution is denoted by \(\mathbf{z} \sim \mathcal{M}_{q}^{m}\).
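
In code, Eq. 16 amounts to one independent call of the q-Gaussian sampler of Sect. 3 per coordinate (a sketch; the function name is ours):

```python
def z_anisotropic(q, m, rng):
    """z ~ M_q^m (Eq. 16): m independent q-Gaussian deviates, one per coordinate."""
    return q_gaussian(q, size=m, rng=rng)
```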

In order to investigate the contribution of the scheme of self-adapting the parameter q, the qGEP algorithm is compared, in Sect. 8, with three other approaches where the parameter q is fixed, i.e., the shape of the mutation distribution is fixed during the evolutionary process. For fixed values of q, the q-Gaussian mutation can reproduce the Gaussian and Cauchy mutations. Lévy mutation can still be reproduced, as the Lévy and q-Gaussian distributions are related for some values of q (Rathie and Da Silva 2008). In all approaches described here, Eq. 16 is employed to generate new candidate solutions. The approaches are defined as follows:

  • Algorithm GEP: uses only one fixed parameter \(q=1.0\) for all individuals. That is, in Algorithm 1, the initial value of \(\tilde{q}_{i}\) is equal to 1.0 for \(i=1,\ldots,\mu\) and \(r_{q}=0\). In this way, the Gaussian mutation generated from sampling m independent Gaussian random variables is reproduced.

  • Algorithm CEP: uses only one fixed parameter \(q=2.0\) for all individuals. That is, in Algorithm 1, the initial value of \(\tilde{q}_{i}\) is equal to 2.0 for \(i=1,\ldots,\mu\) and \(r_{q}=0\). In this way, the anisotropic Cauchy distribution is reproduced.

  • Algorithm EP (\(q=1.5\)): uses only one fixed parameter \(q=1.5\) for all individuals. That is, in Algorithm 1, the initial value of \(\tilde{q}_{i}\) is equal to 1.5 for \(i=1,\ldots,\mu\) and \(r_{q}=0\).

  • Algorithm qGEP: uses one evolving q for each individual and \(r_{q}>0\), i.e., Eq. 15 is employed.

5.2 EP with q-Gaussian mutation generated from isotropic distribution

Here, the use of q-Gaussian mutations generated from isotropic distributions is proposed. In Obuchowicz (2003), a method to generate the Cauchy random mutation vector from an isotropic distribution was proposed. For this purpose, the random mutation vector is generated with (1) a random direction uniformly distributed on the surface of the m-dimensional unit hypersphere, and (2) a Euclidean norm obtained from a Cauchy distribution.

Based on the works of Obuchowicz (2003) and Thistleton et al. (2007), we propose to generate the random mutation vector \(\mathbf{z}\) as follows:

$$ \mathbf{z} = r \mathbf{u}, $$
(17)

where \(r \sim \mathcal{N}_{q}(0,1)\) is a random variable with the q-Gaussian distribution, and \(\mathbf{u}\) is a uniform random vector obtained by sampling a random vector with a Gaussian distribution and normalizing it to length one, i.e., \(\mathbf{u}=\mathbf{v} / \|\mathbf{v}\|\), where \(\mathbf{v} \sim \mathcal{N}^{m}\) and \(\|\mathbf{v}\|\) denotes the Euclidean norm of the vector \(\mathbf{v}\). In this paper, an m-dimensional random vector generated from an isotropic distribution with step length given by a q-Gaussian distribution is denoted by \(\mathbf{z} \sim \mathcal{N}_{q}^{m}\).
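
A minimal sketch of Eq. 17, reusing the q-Gaussian sampler of Sect. 3 (the function name is ours): a uniformly random direction obtained from a normalized Gaussian vector is scaled by a scalar q-Gaussian step length.

```python
import numpy as np

def z_isotropic(q, m, rng):
    """z ~ N_q^m (Eq. 17): uniform random direction times a q-Gaussian radius."""
    v = rng.standard_normal(m)
    u = v / np.linalg.norm(v)              # uniform on the unit hypersphere
    r = q_gaussian(q, size=1, rng=rng)[0]  # step length r ~ N_q(0,1)
    return r * u
```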

Figure 2 presents two-dimensional multivariate samples obtained from an anisotropic q-Gaussian distribution, a Gaussian distribution, and a q-Gaussian distribution generated as described in this section. It can be observed that, in the anisotropic q-Gaussian distribution, larger steps occur more often close to the coordinate axes. This effect is more evident in high-dimensional spaces and/or for larger values of q (with \(q<3\)).

Fig. 2 Two-dimensional points from: a random vector \(\mathbf{z} \sim \mathcal{M}_{q}^{m}\) with anisotropic q-Gaussian distribution and \(q = 2.0\) (left), a random vector \(\mathbf{z} \sim \mathcal{N}^{m}\) with Gaussian distribution (center), and a q-Gaussian random vector \(\mathbf{z} \sim \sqrt{2}\mathcal{N}_{q}^{m}\) generated from an isotropic distribution with \(q = 2.0\) (right)

The EP algorithm with q-Gaussian mutation generated from isotropic distribution, called IqGEP here, is compared, in Sect. 8, with three other approaches where the parameter q is fixed. In all approaches, Eq. 17 is employed to generate the new candidate solutions. The approaches are defined as follows:

  • Algorithm IGEP: uses only one fixed parameter \(q=1.0\) for all individuals. That is, in Algorithm 1, the initial value of \(\tilde{q}_{i}\) is equal to 1.0 for \(i=1,\ldots,\mu\) and \(r_{q}=0\). In this way, the Gaussian mutation generated from an isotropic distribution and with step length given by the Gaussian distribution is reproduced.

  • Algorithm ICEP: uses only one fixed parameter \(q=2.0\) for all individuals. That is, in Algorithm 1, the initial value of \(\tilde{q}_{i}\) is equal to 2.0 for \(i=1, \ldots, \mu\) and \(r_{q}=0\). In this way, the Cauchy mutation generated from an isotropic distribution and with step length given by the Cauchy distribution is reproduced.

  • Algorithm IEP (\(q=1.5\)): uses only one fixed parameter \(q=1.5\) for all individuals. That is, in Algorithm 1, the initial value of \(\tilde{q}_{i}\) is equal to 1.5 for \(i=1, \ldots, \mu\) and \(r_{q}=0\).

  • Algorithm IqGEP: uses one evolving q for each individual and \(r_{q}>0\), i.e., Eq. 15 is employed.

6 Analysis of the q-Gaussian mutation

In this section, the impact of changing the mutation strength parameter \(\sigma\) and the q-Gaussian distribution parameter q on the probability of generating a jump \(\sigma x\), where \(x \sim \mathcal{N}_{q}(0,1)\), in the neighbourhood of a point \(x^{*}\) is analysed. The analysis presented here is similar to the analysis of the Gaussian and Cauchy mutations presented in Yao et al. (1999).

When \(\bar{\mu}_{q}=0\) and \(\bar{\sigma}_{q}^{2}=1\), the q-Gaussian distribution density for \(-\infty<q<3\) (see Eq. 3) is given by

$$ p_{q}(x) = {\frac{1}{\sqrt{3-q}\,A_{q}}}\; {\text{e}}_{q}^{\frac{-x^2}{3-q}}, $$
(18)

where, considering \(1+x^{2}(q-1)/(3-q) \geq 0\), the q-exponential is given by

$$ {\text{e}}_{q}^{\frac{-x^2}{3-q}}=\left(1+{\frac{q-1}{3-q}}\, x^{2} \right)^{\frac{1}{1-q}}. $$
(19)

We consider the q-Gaussian mutation applied in an EA in the one-dimensional case, i.e., the mutation produces a jump \(\sigma x\), where \(\sigma\) is the mutation strength parameter and \(x \sim \mathcal{N}_{q}(0,1)\). For simplicity, we consider \(1<q<3\). The probability of reaching the neighbourhood of a point \(x^{*}\) by a jump \(\sigma x\), i.e., the probability that \(x^{*}-\epsilon \leq \sigma x \leq x^{*}+\epsilon\), where \(\epsilon>0\) defines the neighbourhood size, is given by

$$ P_{q}(|\sigma x - x^{*}| \leq \epsilon) = \int\limits_{\frac{x^{*}- \epsilon}{\sigma}}^{\frac{x^{*}+\epsilon}{\sigma}}{p_{q}(x){\text{d}}x}. $$
(20)

The mean value theorem for definite integrals states that there is a number \(\delta\) (\(0<\delta<2\epsilon\)) such that the integral given by Eq. 20 equals the difference between the integration limits multiplied by \(p_{q}((x^{*} - \epsilon + \delta)/ \sigma)\). In this way, Eq. 20 can be written as follows:

$$ P_{q}(|\sigma x - x^{*}| \leq \epsilon) = {\frac{2 \epsilon} {\sigma}} p_{q} \left({\frac{x^{*} - \epsilon + \delta}{\sigma}} \right). $$
(21)

Substituting Eqs. 18 and 19 in Eq. 21, we obtain

$$ P_{q}(|\sigma x - x^{*}| \leq \epsilon) = {\frac{2 \epsilon}{\sigma \sqrt{3-q}A_{q}}} \left(1+ {\frac{q-1}{3-q}} {\frac{c^{2}} {\sigma^{2}}}\right)^{\frac{1}{1-q}}, $$
(22)

where \(c = x^{*} - \epsilon + \delta\).

6.1 The impact of changing σ

Taking the derivative of Eq. 22 with respect to σ, we can write

$$ \begin{aligned} &{\frac{\partial}{\partial \sigma}}P_{q}(|\sigma x - x^{*}| \leq \epsilon) \\ &\quad={\frac{2 \epsilon}{\sqrt{3-q}A_{q}}}{\frac{\partial}{\partial \sigma}} \left({\frac{1}{\sigma}}\left(1+ \frac{q-1}{3-q} \frac{c^{2}} {\sigma^{2}}\right)^{\frac{1}{1-q}}\right), \end{aligned} $$
(23)

and then

$$ \begin{aligned} {\frac{\partial}{\partial \sigma}}P_{q}(|\sigma x-x^{*}| \leq \epsilon)& = {\frac{2 \epsilon}{\sqrt{3-q}A_{q}}}\left({\frac{2q_{a}c^{2}} {(q-1)\sigma^{4}}} \left(1+q_{a}{\frac{c^{2}} {\sigma^{2}}}\right)^{\frac{q}{1-q}}\right.\\ &\quad -\left.{\frac{1}{\sigma^2}}\left(1+ q_{a}{\frac{c^{2}} {\sigma^{2}}}\right)^{\frac{1}{1-q}}\right), \end{aligned} $$
(24)

where \(q_{a} = (q-1)/(3-q)\). After some manipulation, we obtain

$$ \begin{aligned} &{\frac{\partial}{\partial \sigma}} P_{q}(|\sigma x - x^{*}| \leq \epsilon) \\ &\quad={\frac{2 \epsilon}{\sqrt{3-q}A_{q} \sigma^{2}}}\left(1+ q_{a}{\frac{c^{2}}{\sigma^{2}}}\right)^{\frac{1}{1-q}} {\frac{c^{2}-\sigma^{2}}{q_{a}c^{2}+\sigma^{2}}}. \end{aligned} $$
(25)

From Eq. 25, we can write for \(1<q<3\)

$$ {\frac{\partial}{\partial \sigma}} P_{q}(|\sigma x - x^{*}| \leq \epsilon)\left\{{\begin{array}{ll} {> 0\,{\mathrm{if}}\,|c|> \sigma}\\ {< 0\,{\mathrm{if}}\,|c|< \sigma}\\ \end{array}} \right. . $$
(26)

Equation 26 states that an increase in the mutation strength \(\sigma\) results in an increase in the probability of reaching the point c, which is located in the neighbourhood of the point \(x^{*}\), by a jump \(\sigma x\) only if \(|c|>\sigma\). In other words, an increase in the mutation strength is beneficial for reaching the neighbourhood of a point \(x^{*}\) if it is distant (\(|c|>\sigma\)) from the current solution (before the mutation). A similar result was found for the Gaussian and Cauchy mutations (Yao et al. 1999). The above analysis also shows that the rate of change of the probability, given by Eq. 25, depends on the value of q.
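
This sign condition can be checked numerically. The sketch below evaluates Eq. 22 up to its positive constant factor \(2\epsilon/(\sqrt{3-q}A_{q})\), which does not affect the sign of the derivative, and confirms that the finite-difference derivative with respect to \(\sigma\) flips sign at \(|c|=\sigma\):

```python
import numpy as np

def p_reach(sigma, q, c):
    """Eq. 22 up to its positive constant factor, as a function of sigma."""
    x2 = (c / sigma) ** 2
    return (1.0 / sigma) * (1.0 + (q - 1.0) / (3.0 - q) * x2) ** (1.0 / (1.0 - q))

q, c, h = 2.0, 5.0, 1e-6
for sigma in (4.0, 6.0):               # first |c| > sigma, then |c| < sigma
    d = (p_reach(sigma + h, q, c) - p_reach(sigma, q, c)) / h
    print(f"sigma={sigma}: sign of dP/dsigma = {np.sign(d):+.0f}")
# Prints +1 for sigma=4 (|c| > sigma) and -1 for sigma=6 (|c| < sigma), as Eq. 26 states.
```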

6.2 The impact of changing q

Taking the derivative of Eq. 22 with respect to q, we can write

$$ \frac{\partial}{\partial q} P_{q}(|\sigma x - x^{*}| \leq \epsilon) = \frac{2 \epsilon}{\sigma} \frac{\partial}{\partial q} \left({\frac{1}{\sqrt{3-q}A_{q}}} y \right) , $$
(27)

where

$$ y = \left(1+ \frac{q-1}{3-q} \frac{c^{2}} {\sigma^{2}}\right)^{\frac{1}{1-q}} $$
(28)

is the q-exponential given by Eq. 19 at the point \(x=c/\sigma\). We now analyse the derivative of the q-exponential y given by Eq. 28. Applying the natural logarithm to both sides of Eq. 28 and taking the derivative with respect to q, we have

$$ {\frac{\partial y}{\partial q}}= y \frac{\partial}{\partial q} \left({\frac{1}{1-q}} \ln \left(1+\frac{q-1}{3-q} \frac{c^{2}}{\sigma^{2}}\right)\right). $$
(29)

After some manipulation, we can write

$$ {\frac{\partial y}{\partial q}}= \frac{1}{q-1} p^{\frac{1}{1-q}} \left({\frac{\ln(p)}{q-1}}-\frac{2}{(3-q)^{2}} \frac{c^{2}}{\sigma^{2}}p^{-1}\right), $$
(30)

where

$$ p = 1+ \frac{q-1}{3-q} \frac{c^{2}}{\sigma^{2}} , $$
(31)

i.e., \(y=p^{\frac{1}{1-q}}\). In Eq. 30, \(p \geq 1\) for \(1<q<3\). In this way, we can write

$$ {\frac{\partial y}{\partial q}} \left\{{\begin{array}{ll} {> 0\;{\mathrm{if}}\;a> b}\\ {< 0\;{\mathrm{if}}\;a< b}\\ \end{array}} \right., $$
(32)

where \(a = {\frac{\ln (p)}{q-1}}\) and \(b =\frac{2}{(3-q)^{2}} \frac{c^{2}}{\sigma^{2}} p^{-1}\). While \(a<b\) for small values of \(|c|\) (up to the value of \(|c|\) where \(a=b\)), \(a>b\) for larger values of \(|c|\), since \(\lim _{|c| \to \infty} a = +\infty\) and \(\lim _{|c| \to \infty} b={\frac{2}{(3-q)(q-1)}}\).

Figure 3 shows the regions where the derivative of the q-exponential given by Eq. 28 is positive or negative for \(\sigma=3\) and \(|c| \leq 10\). When the derivative is positive, increasing (or decreasing) q by a small value increases (or decreases) the value of the q-exponential at the point c, while the opposite occurs for a negative derivative. It can be observed that the location c where the derivative changes its sign moves according to the value of q. The larger the value of q, the farther away the locations of \(|c|\) where the derivative changes its sign. Figure 3 (and Eq. 30) also suggests that values of q close to 3 are not interesting, as the location where the derivative changes its sign is very far away from the current solution.

Fig. 3 Positive (gray) and negative (white) regions of the derivative of the q-exponential given by Eq. 28 for \(\sigma=3\)

Using the derivative of the q-exponential given by Eq. 28, we can now write, after some manipulation, the derivative of Eq. 22 with respect to q (see Eq. 27):

$$\begin{aligned} {\frac{\partial}{\partial q}} P_{q}(|\sigma x - x^{*}| \leq \epsilon)&= {\frac{2 \epsilon p^{{\frac{1}{1-q}}}}{A_{q}\sigma \sqrt{3-q}(q-1)}} \\&\quad\times\left({\frac{(q-1)(A_{q}-(3-q) A_{q}')} {(3-q)A_{q}}}+ a-b \right).\end{aligned} $$
(33)

As the first term inside the parentheses does not depend on c, the analysis is similar to the one presented above for the q-exponential derivative given by Eq. 30. Equation 33 indicates where a small change in the value of q is beneficial for increasing the probability of reaching the neighbourhood of a point \(x^{*}\) by a jump \(\sigma x\). In other words, an increase in the value of q is beneficial for reaching the neighbourhood of a point \(x^{*}\) if it is distant (at a location where the derivative given by Eq. 33 is positive) from the current solution (before the mutation). Otherwise, the value of q should be decreased.

7 Restart q-Gaussian evolution strategy

In Sect. 5, EP algorithms with q-Gaussian mutation were presented. However, it is important to observe that the proposed self-adapted q-Gaussian mutation can be used in other EAs. In this section, the q-Gaussian mutation is incorporated into a (\(\mu\), \(\lambda\))-ES with recombination. The proposed algorithm, called RqGES, is presented in Algorithm 2.

Algorithm 2 Restart q-Gaussian evolution strategy (RqGES) (pseudo-code)

Besides mutation, which is applied as in the EP algorithms presented in Sect. 5, the proposed RqGES employs intermediate recombination with \(\rho=2\), i.e., two parents are randomly chosen from the parent population and are mixed to generate an offspring. The intermediate recombination is applied to the variables (\(\mathbf{x}_{i}\)) and parameters (\(\sigma_{i}\) and \(q_{i}\)). After recombination, the q-Gaussian mutation generated from the isotropic distribution is used. In multimodal problems, larger population sizes can help the population escape from local optima. Thus, following the scheme proposed in Auger and Hansen (2005a), the population is restarted with a larger number of individuals whenever a convergence criterion is met. Here, as in Auger and Hansen (2005a), the population size is increased by a factor of 2 until a maximum allowed size \({\lambda}_{\max}\).
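
A sketch of the two ingredients added by RqGES, under stated assumptions: intermediate recombination averages the variables and parameters of two randomly chosen parents, and the restart wrapper doubles \(\lambda\) up to \({\lambda}_{\max}\); `es_single_run` is a hypothetical callable standing in for one (\(\mu\), \(\lambda\))-ES run that stops when the convergence criterion is met:

```python
import numpy as np

def recombine(parents, rng):
    """Intermediate recombination with rho = 2: average the variables and the
    strategy parameters (sigma and q) of two randomly chosen parents."""
    i, j = rng.choice(len(parents), size=2, replace=False)
    (x1, s1, q1), (x2, s2, q2) = parents[i], parents[j]
    return (x1 + x2) / 2.0, (s1 + s2) / 2.0, (q1 + q2) / 2.0

def rqges(f, es_single_run, lam=50, lam_max=200, max_evals=300_000, rng=None):
    """Restart wrapper (sketch): rerun the ES with lambda doubled after each
    restart, up to lam_max, keeping the best solution over all runs."""
    rng = rng or np.random.default_rng()
    best_x, best_f, evals = None, np.inf, 0
    while evals < max_evals:
        # es_single_run is hypothetical: one (mu, lambda)-ES run with q-Gaussian
        # mutation that returns (best_x, best_f, evaluations_used).
        x, fx, used = es_single_run(f, lam, max_evals - evals, rng)
        evals += used
        if fx < best_f:
            best_x, best_f = x, fx
        lam = min(2 * lam, lam_max)        # restart with a larger population
    return best_x, best_f
```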

8 Experimental study

In order to investigate the performance of the proposed algorithms, 25 benchmark functions as described in Suganthan et al. (2005) are selected as the test suite in our experiments. The test functions, which should be minimized, are presented in Table 1 and are used with the same parameters as described in Suganthan et al. (2005).

In the test suite, functions \(f_{1}\) to \(f_{5}\) are unimodal and the remaining functions are multimodal. Functions \(f_{15}\) to \(f_{25}\) are hybrid composition functions (Suganthan et al. 2005). Such functions are composed of basic functions, resulting in composite functions that combine the properties of their components, and are given by

$$ f({\mathbf x}) = \sum_{i=1}^{10} \left(w_{i} \left( f_{i}' \left({\mathbf M}_{i} ({\mathbf x}-{\mathbf o}_{i} ) / \lambda_{i} \right) + {\text{bias}}_{i} \right) \right)+f_{{\text{bias}}} , $$
(34)

where \(f_{i}'(.)\) is the normalized i-th basic function, \({\mathbf M}_{i}\), \(\lambda_{i}\), and \(w_{i}\) are the linear transformation matrix, the compression rate, and the weight value for each function \(f_{i}(.)\), respectively, \({\mathbf o}_{i}\) defines the position of the local and global optima, \({\text{bias}}_{i}\) defines which optimum is the global optimum, and \(f_{\text{bias}}\) is the bias in the function value. For example, function \(f_{15}\) is composed of five basic functions: Rastrigin, Weierstrass, Griewank, Ackley, and Sphere. See Suganthan et al. (2005) for details and parameters.

Table 1 Test functions, where the vector \({\mathbf x^{*}}\) is the global optimum and the range is for each element of \({\mathbf x}\)

The functions presented in Table 1 allow comparing the three types of mutation operators described in this paper, i.e., Gaussian, Cauchy, and q-Gaussian mutations, on problems with different properties. For example, while some functions in Table 1 are separable, others are non-separable. Some important properties of the functions are presented in the last column of Table 1. The comparison of Gaussian, Cauchy, and q-Gaussian mutations on EP is performed in Sect. 8.2.

The functions presented in Table 1 also allow comparing the results of the algorithms with other EAs found in the literature. The comparison of RqGES, which presented better performance than qGEP and IqGEP, with other EAs for continuous optimization is performed in Sect. 8.3. The design of all experiments presented in this paper is described in the following section.

8.1 Experimental design

In order to compare the EP algorithms with different types of mutation, each one was executed 25 times for each test function presented in Table 1 with \(m=10\) and \(m=30\) (Suganthan et al. 2005). The same was done for RqGES, but also with \(m=50\). For each run of an algorithm, the individuals of the initial population were randomly chosen with uniform distribution in the range of each function (see Table 1), except for the experiments with functions \(f_{7}\) and \(f_{25}\), where the populations were, respectively, initialized in the ranges \([0, 600]^{m}\) and \([2, 5]^{m}\). The number of fitness evaluations was set to \(10{,}000m\). In the EP algorithms, the population size was set to 100 individuals and the tournament size in the winning function (Lee and Yao 2004) for selection was set to 10. In RqGES, the initial \(\lambda\) was set to 50, \({\lambda}_{\max}=200\), the initial \(\mu\) was set to 15, and \(g_{s}=120\).

As suggested by theoretical and empirical work (Beyer and Schwefel 2002), the parameters \(\tau_{b}\) and \(\tau_{c}\) are defined by \(\tau_{b} = {\frac{1}{\sqrt{2 m}}}\) and \(\tau_{c} = {\frac{1}{\sqrt{2 \sqrt{m}}}}\). Here, we propose to define the parameter \(\tau_{q}\) as follows:

$$ \tau_{q} = {\frac{1}{\sqrt{2 m}}}. $$
(35)

The initial mutation strength parameter \(\sigma_{i}(j)\) was set to \(0.4 |x_{\max}-x_{\min}|/\sqrt{m}\) for the algorithms with anisotropic mutation distributions, as employed in Mezura-Montes and Coello (2004), and to \(0.4 |x_{\max}-x_{\min}|\) for the algorithms with mutations generated from isotropic distributions, where \(x_{\min}\) and \(x_{\max}\) are, respectively, the minimum and maximum allowed values of each element of the solution in the initial population. The initial q-Gaussian parameter q in the algorithms with q-Gaussian mutation was set to 1.0 (the value where the Gaussian distribution is reproduced). In the algorithms with q-Gaussian mutation, \(r_q=0.8\), and the minimum and maximum values of the parameter q were set to 0.8 and \(0.8{\text{e}}\), respectively, i.e., values respectively smaller and larger than the values of q where the Gaussian and Cauchy mutations are reproduced. Experiments, not presented here, using the EP algorithm with different parameter settings (e.g., with the minimum and maximum values of the parameter q set to 0.5 and \(0.5{\text{e}}\), respectively) showed results similar to those presented in the following sections.
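
For reference, the settings above can be collected in one place. The following sketch is ours (the dictionary layout and the `isotropic` flag are not part of the original description):

```python
import numpy as np

def default_params(m, x_min, x_max, isotropic=False):
    """Parameter settings of Sect. 8.1 for the q-Gaussian EP algorithms."""
    sigma0 = 0.4 * abs(x_max - x_min)
    if not isotropic:
        sigma0 /= np.sqrt(m)                  # anisotropic variants only
    return dict(
        tau_b=1.0 / np.sqrt(2.0 * m),
        tau_c=1.0 / np.sqrt(2.0 * np.sqrt(m)),
        tau_q=1.0 / np.sqrt(2.0 * m),         # Eq. 35
        sigma0=sigma0,
        q0=1.0,                               # start from the Gaussian case
        r_q=0.8,
        q_min=0.8,
        q_max=0.8 * np.e,
    )
```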

8.2 Experimental results: evolutionary programming

8.2.1 Experimental results on qGEP

The experimental results of the fitness error \(f({\mathbf{x}}^{\bf best}) -f({\mathbf{x}^{*}})\), where \(f({\mathbf{x}}^{\bf best})\) is the best fitness value found during the run, averaged over 25 runs in the experiments with anisotropic mutation distributions for \(m=10\) and \(m=30\), are presented in Tables 2 and 3, respectively. Figure 4 shows the convergence of the mean best-of-generation fitness error in the experiments with \(m=30\) for functions \(f_{9}\) and \(f_{10}\).

Fig. 4 Mean \(f({\mathbf x^{best}_{g}})-f({\mathbf x^{*}})\), where \({\mathbf x^{best}_{g}}\) is the best-of-generation individual and \({\mathbf x^{*}}\) is the global optimum, in the experiments with functions \(f_{9}\) and \(f_{10}\) with \(m=30\) for the algorithms: GEP (solid line), CEP (dashed line), EP (\(q=1.5\)) (dotted line), and qGEP (dash-dotted line)

Table 2 Experimental results of the error of the best fitness in each run obtained for algorithms GEP, CEP, EP (\(q=1.5\)), and qGEP on the test functions with \(m=10\)
Table 3 Experimental results of the error of the best fitness obtained for algorithms GEP, CEP, EP (\(q=1.5\)), and qGEP on the test functions with \(m=30\)

In Tables 4 and 5, the statistical comparison regarding the fitness error of the best individual found during each run is carried out by the Wilcoxon signed rank test (García et al. 2009). Tables 4 and 5 show the p value of the Wilcoxon signed rank test for each function, which indicates the significance for testing the null hypothesis that the difference between the matched samples of the results of Alg. A and Alg. B comes from a distribution with median equal to zero. For each problem \(f_{i}\), the result of the comparison Alg. A–Alg. B is shown, in parentheses, as "=" when the medians of Alg. A and Alg. B are equal. When the medians are different but the p value is higher than 0.05, i.e., the test indicates that the hypothesis that the median of the difference between the results is zero cannot be rejected at the 5% level, the result is shown as "+" when the median of Alg. A is smaller than that of Alg. B and "−" when the median of Alg. A is larger than that of Alg. B. Otherwise, when the result is statistically significant, the result is respectively shown as "s+" or "s−" when the median of Alg. A is smaller or larger than the median of Alg. B.
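
A sketch of this comparison protocol, assuming SciPy's implementation of the paired Wilcoxon signed rank test (`scipy.stats.wilcoxon`) and the per-run best-fitness errors of the two algorithms as input:

```python
import numpy as np
from scipy.stats import wilcoxon

def compare_runs(err_a, err_b, alpha=0.05):
    """Return '=', '+', '-', 's+', or 's-' for the comparison Alg. A - Alg. B."""
    err_a, err_b = np.asarray(err_a), np.asarray(err_b)
    if np.median(err_a) == np.median(err_b):
        return "="
    _, p = wilcoxon(err_a, err_b)             # matched-pairs signed rank test
    a_better = np.median(err_a) < np.median(err_b)
    if p < alpha:                             # statistically significant
        return "s+" if a_better else "s-"
    return "+" if a_better else "-"
```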

Table 4 Statistical comparison of algorithms GEP, CEP, EP (\(q=1.5\)), and qGEP regarding the error of the best fitness on the test functions with \(m=10\)
Table 5 Statistical comparison of algorithms GEP, CEP, EP (\(q=1.5\)), and qGEP regarding the error of the best fitness on the test functions with \(m=30\)

In the last line of Tables 4 and 5, following the method proposed in García et al. (2009), the comparison Alg. A–Alg. B over the results of all functions is presented, where the p values of the Wilcoxon signed rank test are obtained by comparing the vectors of mean results (shown in Tables 2, 3) obtained for each function by Alg. A and Alg. B, respectively. In the second parentheses, the values in the last line also indicate the difference between the number of times (functions) that the median of Alg. A is smaller than the median of Alg. B (i.e., the number of "+" and "s+" in the respective column) and the number of times (functions) that the median of Alg. B is smaller than the median of Alg. A (i.e., the number of "−" and "s−" in the respective column). For example, the result 5.60E−03 (s+, +22) in the last line, third column of Table 4 indicates that the p value of the Wilcoxon signed rank test for the comparison between qGEP and GEP is 5.60E−03, which means that the result is statistically significant ("s+"). The result also indicates that the difference between the number of times that the median of qGEP is smaller than the median of GEP and the number of times that the median of qGEP is larger than the median of GEP is 22. The analysis of the experimental results shown here is presented in Sect. 8.2.3.

8.2.2 Experimental results on IqGEP

The experimental results of the fitness error averaged over 25 runs in the experiments with mutation generated from isotropic distributions for \(m=10\) and \(m=30\) are presented in Tables 6 and 7, respectively. Figure 5 shows the convergence of the mean best-of-generation fitness error in the experiments with \(m=30\) for functions \(f_{9}\) and \(f_{10}\). In Tables 8 and 9, the statistical comparison regarding the fitness error is carried out by the Wilcoxon signed rank test. Tables 10 and 11 show the p value of the Wilcoxon signed rank test for the comparison between the isotropic and anisotropic algorithms. Table 12 summarizes the results of the comparison among qGEP, IqGEP, and the other investigated algorithms for different groups of functions. In Table 12, the results of the comparison Alg. A–Alg. B indicate the difference between the number of times (functions) that the median of Alg. A is smaller than the median of Alg. B (i.e., the number of "+" and "s+" for the group of functions in the respective tables) and the number of times (functions) that the median of Alg. B is smaller than the median of Alg. A (i.e., the number of "−" and "s−" for the group of functions in the respective tables). The analysis of the experimental results shown here is presented in Sect. 8.2.3.

Fig. 5 Mean \(f({\mathbf x^{best}_{g}})-f({\mathbf x^{*}})\) in the experiments with functions \(f_{9}\) and \(f_{10}\) with \(m=30\) for the algorithms: IGEP (solid line), ICEP (dashed line), IEP (\(q=1.5\)) (dotted line), and IqGEP (dash-dotted line)

Table 6 Experimental results of the error of the best fitness obtained for algorithms IGEP, ICEP, IEP (\(q=1.5\)) and IqGEP on the test functions with \(m=10\)
Table 7 Experimental results of the error of the best fitness obtained for algorithms IGEP, ICEP, IEP (\(q=1.5\)) and IqGEP on the test functions with \(m=30\)
Table 8 Statistical comparison of algorithms IGEP, ICEP, IEP (\(q=1.5\)) and IqGEP regarding the error of the best fitness on the test functions with \(m=10\)
Table 9 Statistical comparison of algorithms IGEP, ICEP, IEP (\(q=1.5\)) and IqGEP regarding the error of the best fitness on the test functions with \(m=30\)
Table 10 Statistical comparison of the EP algorithms with mutation generated from anisotropic or isotropic distributions regarding the best fitness on the test functions with \(m=10\)
Table 11 Statistical comparison of the EP algorithms with mutation generated from anisotropic or isotropic distributions regarding the best fitness on the test functions with \(m=30\)
Table 12 Comparison of the EP algorithms for different groups of functions

8.2.3 Analysis

Some observations can be made from the experimental results presented in the tables and figures of the previous sections. They are described and analysed below.

First, it is observable that the algorithms with the Gaussian mutation, i.e., GEP and IGEP, generally have a better performance than those with the Cauchy mutation, i.e., CEP and ICEP, on unimodal functions (lines for functions \(f_{1}-f_{5}\) in column 2 of Tables 4, 5, 8, 9). Similar results comparing long-tail and Gaussian distributions in the minimization of unimodal functions have been reported in the literature (Lee and Yao 2004; Yao et al. 1999). On a unimodal function, long jumps, which often occur when the Cauchy mutation is employed, generally degrade the performance of the algorithm at the later stage of the evolution because fewer offspring are generated to explore the local neighborhood. For all functions and algorithms, the norm of the mutation strength parameter vector generally remains large in the initial stage of the evolution, allowing faster convergence in the initial steps, and decays to small values in later generations.

It can also be observed that the performance of the proposed algorithms (i.e., qGEP and IqGEP) is generally better than that of the algorithms with Cauchy mutation (i.e., CEP and ICEP), and is, in general, equal to or better than that of the algorithms with Gaussian mutation (i.e., GEP and IGEP) on the unimodal functions (lines for functions \(f_{1}-f_{5}\) in Tables 4, 5, 8, 9). These results can be explained by the fact that, in general, the q-Gaussian mutation generates fewer long jumps than the Cauchy mutation and more long jumps than the Gaussian mutation for \(q \geq 1\). However, when \(q<1\), the q-Gaussian distribution is more compact than the Gaussian distribution, which implies shorter jumps.

Second, while the algorithms with Gaussian mutation (i.e., GEP and IGEP) generally outperform the algorithms with Cauchy mutation (i.e., CEP and ICEP) on the unimodal functions, they are generally outperformed on the multimodal functions, especially on the highly multimodal functions, e.g., functions \(f_{9}\) and \(f_{10}\). These results indicate that the long jumps generated by the Cauchy mutation are advantageous for the individuals to escape from local optima, especially in the later stage of the evolution when the mutation strength parameters have converged to small values.

The proposed algorithms (qGEP and IqGEP) generally outperform the algorithms with Gaussian mutation on the multimodal functions, especially on the highly multimodal (e.g., functions \(f_{9}\) and \(f_{10}\)) or hybrid functions (functions \(f_{15}\)\(f_{20}\)).

The good performance of the proposed algorithms on multimodal functions can be explained by the fact that higher values of the parameter q were occasionally employed, allowing longer jumps (see Fig. 6, where the distribution parameter q of the current best individual in one run of two experiments is plotted against the generation index). On the multimodal functions, after the initial stage, the evolution generally occurs in steps. It can be observed in Fig. 6 that such steps generally coincide with larger changes in the value of q, which indicates that jumps in individuals with higher values of q, or in individuals where the value of q had increased, were occasionally beneficial. Similar results can be observed in the minimization of other multimodal functions.

Fig. 6 \(f({\mathbf x^{best}_{g}})-f({\mathbf x^{*}})\), distribution parameter q, and norm of the mutation strength parameter vector for the current best individual in one run of IqGEP on function \(f_{10}\) with \(m=10\). The last plot shows the differences dq (solid line), between the values of q in two consecutive generations, and \(df\) (dashed line), between the fitness errors in two consecutive generations, both in the range −0.5 to 0.5

These results indicate that an occasional increase in the value of q in the proposed algorithms can lead to longer jumps (compared to those produced by the Gaussian mutation), which can be advantageous since they allow the population to escape from local optima, especially in later stages of the evolution when the mutation strength parameters are generally small. For the algorithms with mutation generated from isotropic distributions, the computed mean change in the best-of-generation fitness produced by advantageous mutations is generally higher for IqGEP on all functions, even when the final fitness is smaller for the other algorithms. For IqGEP, when the mean changes in fitness produced by advantageous mutations are analysed for different intervals of q after the middle of the evolutionary process (i.e., when the population has converged to local optima in most of the runs with multimodal functions), an interesting result can be observed. When these values are fitted by a linear function, the resulting line has a negative slope for the unimodal functions (with the exception of function \(f_{3}\), where the slope is close to zero), indicating that smaller values of q generally result in larger changes in the fitness function, whereas the resulting line generally has a positive slope for the multimodal functions, i.e., the selection of higher values of q resulted in larger changes in the fitness function.

When the performance of the algorithms with q-Gaussian mutation is statistically compared to that of the algorithms with Cauchy mutation on the multimodal functions, it can be observed that the proposed algorithms generally present a better or statistically similar performance (Tables 4, 5, 8, 9). The better performance of the algorithm qGEP is the result of a better compromise between long and local jumps of the candidate solutions, as the tail of the mutation distribution is self-adapted in qGEP. The use of q-Gaussian mutation is advantageous on noisy functions, too. It can be observed that the performance of the proposed algorithms is better when compared to the other algorithms (Tables 4, 5, 9), with the exception of the isotropic algorithms on function \(f_{4}\) with \(m=10\), which is constructed by adding noise to the unimodal function \(f_{2}\).

Following the method proposed in García et al. (2009), we can observe that the performance of the algorithms with q-Gaussian mutation is significantly better than the performance of all other algorithms when all functions are considered (last lines in Tables 4, 5, 8, 9), with the exception of IqGEP when compared to IGEP and ICEP in the experiment with \(m=10\), where the performance of IqGEP is better, but not statistically significantly so.

Finally, the algorithms with mutations generated from isotropic distributions present a better performance than the algorithms with anisotropic mutations on the unimodal functions (see Tables 10, 11). The mutations generated from isotropic distributions outperform the anisotropic mutations on more multimodal functions with \(m=10\) than with \(m=30\). On the functions where the anisotropic mutations present a better performance, good solutions can be found in the subspace formed by the points close to the coordinate axes (Fig. 2), and the optimization can be done mainly by coordinate-wise search steps. Such results agree with the conjecture presented in Hansen et al. (2006): anisotropic mutation distributions with longer tails are interesting on multimodal functions where better optima than the current best local solution may be located in the low-dimensional subspace in which most of the longer jumps occur. When Cauchy and q-Gaussian mutations generated from isotropic distributions are employed, it is more difficult to reach a fair region of the search space by a long jump, especially when \(m=30\), as the number of points that can be reached increases exponentially with the size of the jump. That is, the subspace that is more explored by the algorithm is much larger when mutations generated from isotropic distributions with longer tails are used.

However, mutations generated from isotropic distributions with longer tails are generally better than the anisotropic mutation distributions on functions where cross-component terms have a heavy influence on the function evaluation and where better optima are not located in the subspace explored by the anisotropic algorithm. These facts can be observed when the results of the experiments with the axis-parallel Rastrigin function (\(f_{9}\)), which is separable, and the rotated Rastrigin function (\(f_{10}\)), where cross-component terms have a heavy influence on the function evaluation, are analysed (Figs. 4, 5). While the anisotropic Cauchy mutation presents a better performance than the q-Gaussian mutation on function \(f_{9}\), it presents a worse performance on function \(f_{10}\). On the other hand, IqGEP presents a better performance than ICEP on both functions. In the experiments on function \(f_{9}\), larger jumps to points close to the coordinate axes are advantageous for the population to escape from local optima. In the algorithms ICEP and IqGEP, where the mutations are generated from isotropic distributions, such jumps are rare, while they are common in the anisotropic algorithms CEP and qGEP. However, such jumps are less advantageous on \(f_{10}\), where better optima are located far away from the coordinate axes. As the explored search space is much larger, it is more difficult to reach fair regions, though such regions can occasionally be reached by the longer jumps produced by ICEP and IqGEP. In general, for larger spaces, it is more difficult to reach such fair regions if there are only a few of them. It is important to observe that these results depend on the fitness landscape, its size, and the number and size of the regions where better optima are located.

8.3 Comparison of restart q-Gaussian evolution strategy to other continuous EAs

Tables 13, 14, and 15 present the error of the best fitness, averaged over 25 runs, obtained by RqGES in the experiments with \(m=10\), \(m=30\), and \(m=50\). When the results of RqGES are compared with those obtained by IqGEP, the benefits of using recombination and the restart scheme are clear, mainly on the multimodal functions. The fitness errors of RqGES can also be compared with those obtained by other continuous optimization EAs applied to the problems described in Table 1.

Table 13 Experimental results of the error of the best fitness obtained for algorithm RqGES on the test functions with \(m=10\)
Table 14 Experimental results of the error of the best fitness obtained for algorithm RqGES on the test functions with \(m=30\)
Table 15 Experimental results of the error of the best fitness obtained for algorithm RqGES on the test functions with \(m=50\)

In order to test the performance of RqGES against other EAs, the results obtained are compared with those obtained by the algorithms presented in the Special Session on Real Parameter Optimization of the 2005 IEEE Congress on Evolutionary Computation (Suganthan et al. 2005): BLX-GL50 (García-Martínez and Lozano 2005), BLX-MA (Molina et al. 2005), CoEVO (Pošík 2005), DE (Rönkkönen et al. 2005), DMS-L-PSO (Liang and Suganthan 2005), EDA (Yuan and Gallagher 2005), G-CMA-ES (Auger and Hansen 2005a), K-PCX (Sinha et al. 2005), L-CMA-ES (Auger and Hansen 2005b), L-SADE (Qin and Suganthan 2005), and SPC-PNX (Ballester et al. 2005). Their results for the problems presented in Table 1 with \(m=10\) are available in the respective papers. Eight of those algorithms report results for \(m=30\), and two also report results for \(m=50\). For all algorithms, the same maximum number of fitness evaluations and number of runs presented in Sect. 8.1 were considered. When the results of the algorithms presented in the special session were compared over all functions, G-CMA-ES was identified as the algorithm with the best average performance (García et al. 2009).

Figures 7, 8, and 9 show the mean of the error \(f({\mathbf x^{best}})-f({\mathbf x^{*}})\) (the error between the best fitness found during the run and the fitness at the global optimum) on the test functions with \(m=10\), \(m=30\), and \(m=50\), respectively, for all algorithms (including RqGES). The results shown in those figures were taken from the respective papers (see the previous paragraph) and, for RqGES, from Tables 13, 14, and 15. The results for RqGES with different values of m are also presented in Fig. 10.

Fig. 7 Mean of \(f({\mathbf x^{best}})-f({\mathbf x^{*}})\) on the test functions with \(m=10\). The termination error value \(10^{-8}\) is adopted as the lower limit

Fig. 8 Mean of \(f({\mathbf x^{best}})-f({\mathbf x^{*}})\) on the test functions with \(m=30\)

Fig. 9 Mean of \(f({\mathbf x^{best}})-f({\mathbf x^{*}})\) on the test functions with \(m=50\)

Fig. 10 Mean of \(f({\mathbf x^{best}})-f({\mathbf x^{*}})\) for RqGES on the test functions with \(m=10\), \(m=30\), and \(m=50\)

Tables 16, 17, and 18 present the rank of RqGES, regarding the mean best fitness error, when compared with the results of the other algorithms. A rank i means that RqGES was the i-th best algorithm. When k algorithms present the same mean best fitness error, the mean of the maximum and minimum ranks of that group is assigned and a "t" is indicated in the table (e.g., when RqGES and one other algorithm presented the best performance, the rank was set to 1.5t).
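For reference, the tied-rank rule described above can be sketched as follows; the function name and the example values are hypothetical, and only the mid-rank convention comes from the text.

```python
import numpy as np

def rank_with_ties(errors, target_index=0):
    """Rank of the algorithm at `target_index` on one function; tied
    algorithms share the mean of their minimum and maximum ranks."""
    errors = np.asarray(errors, dtype=float)
    order = np.argsort(errors)
    ranks = np.empty(len(errors))
    i = 0
    while i < len(order):
        # Collect the group of algorithms with the same error value.
        j = i
        while j + 1 < len(order) and errors[order[j + 1]] == errors[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + 1 + j + 1) / 2.0   # mean of min and max rank
        i = j + 1
    tied = (errors == errors[target_index]).sum() > 1
    return ranks[target_index], tied

# Example: RqGES (first entry) ties with one other algorithm for the best error.
rank, tied = rank_with_ties([1e-8, 1e-8, 3.2e-5, 0.7])
print(f"{rank}{'t' if tied else ''}")   # -> 1.5t
```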

Table 16 Rank of the mean of the fitness error of RqGES (when compared with 11 algorithms: BLX-GL50, BLX-MA, CoEVO, DE, DMS-L-PSO, EDA, G-CMA-ES, K-PCX, L-CMA-ES, L-SADE, and SPC-PNX) on the test functions with \(m=10\)
Table 17 Rank of the mean of the fitness error of RqGES (when compared to 8 algorithms: BLX-GL50, BLX-MA, CoEVO, DE, G-CMA-ES, K-PCX, L-CMA-ES, and SPC-PNX) on the test functions with \(m=30\)
Table 18 Rank of the mean of the fitness error of RqGES (when compared with 2 algorithms: G-CMA-ES and L-CMA-ES) on the test functions with \(m=50\)

Tables 19, 20, and 21 present the statistical comparison of RqGES to the other algorithms regarding the results on two groups of functions (\(f_{1}-f_{25}\) and the hybrid composition functions \(f_{15}-f_{25}\)), as in (García et al. 2009). The p value of the Wilcoxon signed rank test presented in the tables is obtained by comparing, for RqGES and Alg. A, the vectors of mean results obtained on each function. The value in the second set of parentheses indicates the difference between the number of functions on which the mean error of RqGES is smaller than that of Alg. A and the number of functions on which the mean error of Alg. A is smaller. A negative value indicates that Alg. A presented a smaller mean error than RqGES on more functions.
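The comparison procedure amounts to the short sketch below, assuming SciPy's implementation of the Wilcoxon signed rank test; the function name and its inputs (one mean error per test function for each algorithm) are illustrative.

```python
import numpy as np
from scipy.stats import wilcoxon

def compare(rqges_means, alg_a_means):
    """Paired comparison over functions: Wilcoxon signed rank p value
    plus the win-loss difference in mean error reported in the tables."""
    rqges = np.asarray(rqges_means, dtype=float)
    alg_a = np.asarray(alg_a_means, dtype=float)
    p_value = wilcoxon(rqges, alg_a).pvalue
    wins = (rqges < alg_a).sum()     # functions where RqGES has the smaller mean error
    losses = (alg_a < rqges).sum()   # functions where Alg. A has the smaller mean error
    return p_value, int(wins - losses)
```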

Table 19 Statistical comparison of RqGES to other algorithms regarding the mean of the fitness error on the test functions with \(m=10\)
Table 20 Statistical comparison of RqGES with other algorithms regarding the mean of the fitness error on the test functions with \(m=30\)
Table 21 Statistical comparison of RqGES to other algorithms regarding the mean of the fitness error on the test functions with \(m=50\)

One can observe that, for the experiments with \(m=10\), the performance of RqGES was significantly worse than the performance of BLX-GL50 and G-CMA-ES for the group \(f_{1}-f_{25}\), while it was significantly better than the performance of K-PCX for the group \(f_{15}-f_{25}\) (Table 19). For the experiments with \(m=30\), RqGES was significantly better than CoEVO in both groups (Table 20). In general, RqGES presented better performance on high-dimensional and multimodal functions.

It can be observed (Tables 16, 17, 18; Figs. 7, 8, 9) that RqGES presents good results on the functions with noise (functions \(f_{4}\) and \(f_{17}\)). On the multimodal function \(f_{17}\), RqGES was the best algorithm for \(m=10\) and \(m=50\), and the second best for \(m=30\) (one can observe in Figs. 8 and 9 that the performance of RqGES, unlike that of the other algorithms, did not deteriorate from function \(f_{16}\) to function \(f_{17}\), which is obtained by adding noise to function \(f_{16}\)). On the unimodal function \(f_{4}\), RqGES is the best algorithm for \(m=50\) and the second best for \(m=30\). RqGES presents the best results on function \(f_{11}\) (except for \(m=50\), where it is the second best algorithm), which is differentiable only on a set of points (Suganthan et al. 2005). The properties of function \(f_{11}\) can also explain the better results of RqGES on the hybrid composition functions where function \(f_{11}\) was used.

Such properties (noise and rugged landscapes) can represent obstacles for algorithms that intensively exploit local information about the search landscape to guide the optimization process, like some of the best EAs used in continuous optimization problems. The good performance of RqGES on such problems can be explained by the occasional use of higher values of q to produce large jumps on the search landscape. However, occasional large jumps cause RqGES to generate fewer candidate solutions close to the best solutions found, when compared with other algorithms, which explains the worse performance of RqGES on the unimodal functions (with the exception of function \(f_{4}\)). Besides, RqGES does not employ additional mechanisms to exploit local information about the search space, which explains its worse performance, when compared with the best algorithms, on unimodal and multimodal functions where such information is useful to guide the optimization process.

On function \(f_{25}\), higher values of q were also useful, as the global optimum lies outside the initialization range. It can also be observed that RqGES performs well on rotated functions, e.g., \(f_{10}\), because the q-Gaussian mutation is generated from an isotropic distribution. Indeed, Figs. 8 and 9 show that RqGES is the only algorithm whose performance did not deteriorate when function \(f_{9}\) was rotated to generate function \(f_{10}\): the error on \(f_{10}\) is higher than on \(f_{9}\) for all algorithms except RqGES.

9 Conclusion and future work

The use of self-adaptation is proposed in this paper to control not only the mutation strength parameter, but also the mutation distribution of EAs. For this purpose, self-adapted q-Gaussian mutations generated from anisotropic and isotropic distributions are employed. The q-Gaussian distribution allows the shape of the mutation distribution to be changed smoothly by setting a real parameter q, and it can reproduce distributions with either finite or infinite second moments. In the proposed method, the parameter q, which defines the shape of the distribution employed by the mutation operator, is encoded in the chromosome of the individual and is allowed to evolve.
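As an illustration of how such a scheme can be implemented, the sketch below draws q-Gaussian deviates with the generalized Box-Müller method of Thistleton et al. (2007) and evolves both the mutation strength and the shape parameter along with the individual. The log-normal update rule for q, the clipping range, and the parameter values are illustrative assumptions, not the exact rules used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(1)

def q_log(x, q):
    # q-logarithm, which reduces to the natural logarithm as q -> 1.
    if abs(q - 1.0) < 1e-12:
        return np.log(x)
    return (x**(1.0 - q) - 1.0) / (1.0 - q)

def q_gaussian(q, size):
    # Standard q-Gaussian deviates for q < 3 via the generalized
    # Box-Mueller method (Thistleton et al. 2007): q = 1 recovers the
    # Gaussian and q = 2 the Cauchy distribution.
    q_prime = (1.0 + q) / (3.0 - q)
    u1 = 1.0 - rng.random(size)          # uniform on (0, 1]
    u2 = rng.random(size)
    return np.sqrt(-2.0 * q_log(u1, q_prime)) * np.cos(2.0 * np.pi * u2)

def self_adaptive_mutation(x, sigma, q, tau=0.5):
    # Illustrative self-adaptation step: sigma and q are perturbed
    # log-normally (as in classical ES self-adaptation) before mutating
    # the solution; q is clipped to keep the q-Gaussian well defined.
    sigma_new = sigma * np.exp(tau * rng.standard_normal())
    q_new = float(np.clip(q * np.exp(tau * rng.standard_normal()), 1e-3, 2.99))
    x_new = x + sigma_new * q_gaussian(q_new, x.shape)
    return x_new, sigma_new, q_new

# Example: mutate a 10-dimensional individual whose current q is 1.0.
x, sigma, q = np.zeros(10), 0.1, 1.0
x_new, sigma_new, q_new = self_adaptive_mutation(x, sigma, q)
```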

In the proposed method, the decision of which distribution is more suitable for a given problem, and at a given moment of the evolutionary process, is made by the algorithm. This property can be observed in the experimental results presented in Sect. 8: the proposed q-Gaussian mutation generally performs similarly to or better than the Gaussian mutation when the Gaussian mutation outperforms the Cauchy mutation, and similarly to or better than the Cauchy mutation when the Cauchy mutation outperforms the Gaussian mutation. Generally speaking, the experimental results indicate the efficiency of the proposed self-adaptation scheme.

In this paper, EP was selected to compare the Gaussian, Cauchy, and q-Gaussian mutations because it uses mutation as its only transformation operator, which makes it easier to analyse the influence of each type of mutation operator. In fact, the q-Gaussian mutation can be used in other types of EAs. The results of the EP with q-Gaussian mutation can be improved if recombination and other heuristics are used, as shown by the experimental results of the restart q-Gaussian ES presented in this paper. RqGES presented competitive performance when compared with some of the best EAs for real-parameter optimization. The possible use of the q-Gaussian mutation in ESs with adaptive encoding (Hansen 2008), which is a form of applying the representation changes given by the covariance matrix adaptation (CMA) in the continuous domain, is a relevant future work. Other future work includes the investigation of other control methods for the q parameter, including self-organization (Tinós and Yang 2007), and the use of the q-Gaussian mutation in dynamic optimization problems (Wang et al. 2009), like the optimization of synaptic weights of evolutionary neural networks in dynamic environments (Tinós and Carvalho 2006).