1 Introduction

Gene expression programming (GEP) (Ferreira 2001) is a genotype/phenotype evolutionary algorithm. As a descendant of genetic algorithms (GAs) and genetic programming (GP), GEP combines their advantageous features and eliminates their main handicaps. The essential difference between GEP and GA/GP is that GEP separates the genotype (a linear structure) from the phenotype (a non-linear structure) of chromosomes, whereas GA and GP each work with a single entity: linear strings of fixed length (chromosomes) in GA, or nonlinear entities of different sizes and shapes (parse trees) in GP.

Judging from the history of life on earth (Smith and Szathmáry 1995), a single-entity system such as GA or GP is condemned to one of two limitations (Ferreira 2001): if it is easy to manipulate genetically, it loses functional complexity (the case of GA); if it exhibits a certain amount of functional complexity, it is extremely difficult to reproduce with modification (the case of GP). In GEP, the individuals are encoded as linear strings of fixed length (chromosomes) which are afterwards expressed as nonlinear entities of different sizes and shapes (the phenotype). GEP is therefore easy to manipulate genetically while still exhibiting a certain amount of functional complexity; it overcomes the representational limitation of GA and avoids the complex genetic operators of GP. These properties have attracted wide attention from researchers around the world. In recent years, GEP has rapidly become a powerful tool for automatic programming, and it has been widely applied to symbolic regression (Peng et al. 2005; Xin et al. 2006), time series prediction (Lopes et al. 2004), data mining (Chi et al. 2003; Bin et al. 2005; Karakasis and Stafylopatis 2008) and many other fields (Shi et al. 2010; Canakci et al. 2009; Teodorescu 2004; Teodorescu and Sherwood 2008).

Up to now, theoretical results on the computational time complexity of EAs are comparatively abundant. Representative works concern specific problems (e.g. Rudolph 1996; Garnier et al. 1999) and simple single-individual EAs such as the (1 + 1) EA (e.g. Droste et al. 2002; Wegener and Witt 2005). It is to be noted that He and Yao have done a series of works on the time complexity of several kinds of EAs on different problems (He and Yao 2002, 2003, 2004; Lehre and Yao 2012). Further, some deeper results on the computation time of EAs, obtained by drift analysis and other tools such as Dynkin's formula from stochastic process theory, were given by Ding and Yu (2005). A survey of the results obtained in the past decade on the time complexity of EAs was presented by Oliveto et al. (2007). After that, a simplified drift theorem was introduced by Happ et al. (2008), and a further simplification particularly suited to the analysis of EAs was presented by Oliveto and Witt (2008). By exploiting the relationship between convergence rates and expected first hitting times, upper bounds on the first hitting times of evolutionary algorithms, based on the work of He and Kang (1999), were given by Yu and Zhou (2008). A general idea for analyzing population-based EAs on unimodal problems using Markov chain theory and Chernoff bounds was given by Chen et al. (2009). However, how to relate the upper or lower bounds of the expected first hitting times to the parameters of EAs is still unresolved.

The individuals of GEP have a linear genotype and a non-linear phenotype. Moreover, GEP not only contains all the genetic operators of traditional EAs but also introduces some new ones. These characteristics bring challenges to time complexity analysis. At present, most studies of GEP focus on realizations, improvements and applications of the algorithm; few involve theoretical analysis, especially of time complexity. To the best of our knowledge, the convergence of GEP under the smallest residual sum of squares was studied by Yuan et al. (2004, 2006). In view of those results, we (Du et al. 2009, 2010) studied the convergence and convergence rate of ME-GEP. We not only proved that the ME-GEP algorithm converges to the global optimal solution in probability but also obtained an upper bound on its convergence rate: by the method of Jordan blocks, the upper bound depends on the second largest eigenvalue of the transition matrix of the associated Markov chain, and this eigenvalue is in turn determined by the parameters of the ME-GEP algorithm.

This paper concerns the average time complexity of ME-GEP on optimization problems with a unique global optimum. A general and easy-to-apply approach to estimating the average time complexity of the ME-GEP algorithm is presented. Using the properties of Markov chains and the artificial fitness level technique, upper and lower bounds on the average time complexity of ME-GEP are obtained. More importantly, we relate the upper bound to the parameters of the ME-GEP algorithm. The results are general, and the analytical methods and techniques adopted here should be broadly applicable and a useful reference for researchers working on the theory of GEP. The rest of this paper is organized as follows: Sect. 2 introduces the GEP and ME-GEP algorithms. Section 3 introduces the characterization of nonnegative matrices and the artificial fitness level technique; in addition, starting with the Markov chain model associated with ME-GEP, some fundamental properties of the transition matrix of ME-GEP are proved. Section 4 gives the upper and lower bounds on the average time complexity of ME-GEP and discusses the relation between the upper bound and the parameters of ME-GEP. As an application of the theoretical results, Sect. 5 explores the upper bounds on the average time complexity of ME-GEP for a function modeling and a clustering problem, verifying the results of Sect. 4. Finally, some short comments on the results and on future work are given.

2 Background

2.1 GEP algorithm

In the following, we will first introduce the genotype, phenotype and genetic operators in GEP.

The genotype is composed of a head and a tail. Let \(x\) be a genotype and set \(x = \{ \hat{x}_1 ,\hat{x}_2, \ldots ,\hat{x}_l \} \), where

$$\begin{aligned} \hat{x}_i \in \left\{ \begin{array}{l@{\quad }l} T \vee F, &{} 1 \le i \le h \\ T, &{} h < i \le l \\ \end{array} \right. . \end{aligned}$$

Here, \(l\) represents the length of the genotype. \(F\) represents the function set, which is composed of operators, primary functions or user-defined functions. \(T\) represents the terminal set, which is composed of constants and variables. Both \(F\) and \(T\) are finite. The elements in the head are from \(F\) or \(T\), and the elements in the tail are from \(T\). Variables \(h\) and \(tal\) denote the lengths of the head and tail, respectively. For any given \(h\), \(tal\) is evaluated by the formula \(tal = h \times (n_{\max }- 1) + 1 \), \(l = h + tal\), where \(n_{\max} = \max _{f \in F} \mathrm{paraNum}(f)\) and \(\mathrm{paraNum}(f)\) is the number of parameters of function \(f\).
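For illustration, the head/tail arithmetic can be sketched in a few lines of Python; the arity table and all names below are our own example, not part of the GEP specification.

```python
def tail_length(h, arity):
    """Compute tal = h * (n_max - 1) + 1, where n_max is the largest
    number of parameters (paraNum) over the function set F."""
    n_max = max(arity.values())
    return h * (n_max - 1) + 1

# Hypothetical function set: four binary operators and unary sqrt "Q".
F_ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}
h = 10
tal = tail_length(h, F_ARITY)
print(h, tal, h + tal)   # 10 11 21: head 10, tail 11, gene length 21
```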

For a given genotype, the phenotype is inferred by the Karva language (Ferreira 2001), which reads the genotype from the start character, consuming symbols according to the number of arguments of each element (functions may have different numbers of arguments, whereas terminals have arity zero). The representation (algebraic expression) is then inferred from the phenotype by inorder traversal. The transformation relation between genotype, phenotype and representation is shown in Fig. 1.

Fig. 1 Transformation relation between genotype, phenotype and representation
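As an illustration of Karva decoding, the following Python sketch builds the phenotype breadth-first, each symbol consuming as many subsequent symbols as its arity, and then recovers the representation by traversal. The symbol set and all function names are our own assumptions.

```python
from collections import deque

ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}   # Q = sqrt; terminals have arity 0

def karva_to_tree(gene):
    """Decode a K-expression breadth-first: each symbol takes as many
    children as its arity; surplus symbols at the end are simply unused."""
    it = iter(gene)
    root = [next(it), []]
    queue = deque([root])
    while queue:
        symbol, children = queue.popleft()
        for _ in range(ARITY.get(symbol, 0)):
            child = [next(it), []]
            children.append(child)
            queue.append(child)
    return root

def to_infix(node):
    """Inorder traversal of the phenotype gives the representation."""
    symbol, children = node
    if not children:
        return symbol
    if len(children) == 1:
        return f"{symbol}({to_infix(children[0])})"
    return f"({to_infix(children[0])}{symbol}{to_infix(children[1])})"

print(to_infix(karva_to_tree(list("Q*+-abcd"))))   # Q(((a+b)*(c-d)))
```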

There are three kinds of genetic operators in basic GEP (Ferreira 2001): mutation, transposition and recombination. Each operator is applied with its own activation probability. Next, we describe these operators as adopted in GEP.

Mutation is allowed to occur anywhere in the chromosome, and each bit flips independently with probability \(p_m\) at time \(t\). In the head of the gene, a function can be flipped into another function or a terminal. In the tail, a terminal can only be flipped into another terminal.

In GEP, there are two recombination operators: one-point and two-point. They act in the same way as the one-point and two-point crossover operators in GA. In all cases of recombination, both offspring are passed on to the next stage of evolution.

IS transposition Any sequence in the chromosome can become an IS element. The chosen sequence copies itself to any position in the head of a chromosome except the start position. A sequence of the same length is deleted from the end of the head to maintain the structural organization of the gene. For example, consider a parent chromosome whose head length is 10 and tail length is 11.

$$\begin{aligned} \hbox {Q}^{**}+ \hbox {abbb}^{*}\hbox {a}\,\, \hbox {a}\quad \mathbf{bba }\quad \hbox {aaabbab} \end{aligned}$$

Suppose that the highlighted sequence “bba” becomes an IS element and its target position is 7. As a result of IS transposition, the last three elements “b*a” of the head are deleted. The offspring is as follows:

$$\begin{aligned} \hbox {Q}^{**}+ \hbox {ab}\quad \mathbf{bba}\quad \hbox {b}\quad \hbox {abbaaaabbab} \end{aligned}$$
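A minimal sketch of IS transposition consistent with the description above; the distribution of the IS element's length and the indexing convention are our own modeling choices.

```python
import random

def is_transposition(gene, h, rng=random):
    """IS transposition sketch: copy a random subsequence (the IS element)
    to a random head position other than the start, then cut the head back
    to h symbols so the gene structure is preserved; the tail is untouched."""
    g = list(gene)
    start = rng.randrange(len(g))                # where the IS element begins
    length = rng.randint(1, len(g) - start)      # its length (a modeling choice)
    element = g[start:start + length]
    target = rng.randint(1, h - 1)               # insertion point inside the head
    head = g[:h]
    new_head = (head[:target] + element + head[target:])[:h]
    return new_head + g[h:]

parent = list("Q**+abbb*a") + list("abbaaaabbab")   # head 10, tail 11, as above
print("".join(is_transposition(parent, 10)))
```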

2.2 ME-GEP algorithm

Although the advantages of a system like GEP are clear from nature, the most important ones should be emphasized. First, the chromosomes (genotype) are simple entities: linear, compact, relatively small, and easy to manipulate genetically (mutate, recombine, transpose, etc.). Second, the Karva language was created to read and express the information encoded in the chromosome; it guarantees that every genotype is converted into a valid phenotype, which allows the implementation of a very powerful set of unconstrained genetic operators acting on the genetic space. These unconstrained operators greatly improve the search capability, but they also make the GEP algorithm difficult to converge. ME-GEP is a GEP algorithm with an elitist strategy; thus, ME-GEP not only retains the efficient search capability but also ensures convergence, as proved in our previous work (Du et al. 2010).

ME-GEP algorithm with population size \(n(n \ge 1)\) can be described as follows:

Step 1 Initialization: randomly generate an initial subpopulation of \(n\) individuals, denote it by \(X(0) = (x_1 (0), \ldots ,x_n (0))\), where \(x_i (0) \in H\), \(i = 1, \ldots ,n\), and let \( t \leftarrow 0 \). The optimal individual in generation 0, denoted by \(x_0 (0)\), is defined as the individual among \(x_1 (0), \ldots ,x_n (0)\) with the highest fitness. The population in generation 0 is defined as \(\xi (0) = (x_0 (0), x_1 (0), \ldots ,x_n (0))\), where \(x_i (0) \in H\), \(i = 0,1, \ldots ,n\).

Step 2 Recombination: generate a new (intermediate) subpopulation by recombination operation based on subpopulation \(X(t)\) with probability \(p_c\)(\(0 < p_c < 1\)) and denote it as \(X{}_C(t)\).

Step 3 Transposition: generate a new (intermediate) subpopulation by transposition operation based on subpopulation \(X{}_C(t)\) with probability \(p_t\)(\(0 < p_t< 1\)) and denote it as \(X{}_T(t)\).

Step 4 Mutation: generate a new (intermediate) subpopulation by mutation operation based on subpopulation \(X{}_T(t)\) with probability \(p_m\)(\(0 < p_m< 1\)) and denote it as \(X{}_{Mu}(t)\).

Step 5 Selection: select \(n\) individuals from subpopulations \(X{}_{Mu}(t)\) and \(X(t)\) according to a certain selection strategy in which every individual has selection probability \(>0\), and obtain the next subpopulation \(X(t+1)\).

Step 6 Find the optimal individual \(x_0 (t + 1)\) among subpopulation \(X(t+1)\) and the previous optimal individual \(x_0 (t )\).

Step 7 Let \(t \leftarrow t + 1\); then the next population \(\xi (t + 1) = (x_0 (t + 1),x_1 (t + 1), \ldots ,x_n (t + 1))\).

Step 8 If \(f(x_0 (t + 1)) = f_{\max }\), then stop; otherwise go to Step 2.

Remark In this paper, we always assume that all the genetic operations are independent of each other and that the fitness function is single-valued. The individual space is denoted by \(H\), with \(|H| = N\).
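For orientation, Steps 1–8 can be summarized in the following Python skeleton; the operator arguments are stubs to be supplied, and all names are ours rather than a reference implementation.

```python
def me_gep(init, recombine, transpose, mutate, select, fitness, f_max):
    """Elitist ME-GEP skeleton following Steps 1-8. `init` returns the
    initial subpopulation X(0) as a list; the operator arguments act on
    whole subpopulations; the elite x0 is carried alongside and only
    ever improves."""
    X = init()                                   # Step 1
    x0 = max(X, key=fitness)
    t = 0
    while fitness(x0) != f_max:                  # Step 8 stopping rule
        XC = recombine(X)                        # Step 2
        XT = transpose(XC)                       # Step 3
        XM = mutate(XT)                          # Step 4
        X = select(XM + X)                       # Step 5: n survivors, all prob > 0
        best = max(X, key=fitness)               # Step 6: update the elite
        if fitness(best) > fitness(x0):
            x0 = best
        t += 1                                   # Step 7
    return x0, t
```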

3 Markov chain model of ME-GEP

In Du et al. (2010), we proved that both \(\{ X(t);t \ge 0\}\) and \(\{ \xi (t);t \ge 0\}\) are homogeneous Markov chains with finite state spaces \(H^n\) and \(V = H^{n + 1}\), respectively. This is because subpopulation \(X(t)\) depends only on subpopulation \(X(t-1)\), and the optimal individual \(x_0 (t)\) depends on both subpopulation \(X(t)\) and the optimal individual \(x_0 (t-1)\); hence population \(\xi (t)\) depends only on population \(\xi (t-1)\). Furthermore, the genetic operators adopted are the same in every generation.

Since the limit behaviors of Markov chain are determined by its corresponding transition matrices, some definitions and symbols about matrix are stated as below.

3.1 The characteristic of nonnegative matrix

Definition 1

Rudolph (1994) A square matrix \(A = [a_{ij} ]_{m \times m}\) is said to be

(1) nonnegative (\(A \ge 0\)), if \(a_{ij} \ge 0\) for all \(i,j \in \{ 1,2, \ldots ,m\}\),

(2) positive (\(A >0\)), if \(a_{ij}> 0\) for all \(i,j \in \{ 1,2, \ldots ,m\}\).

A nonnegative matrix \(A = [a_{ij} ]_{m \times m}\) is said to be

(3) primitive, if there exists a positive integer \(k\) such that \(A^k\) is positive,

(4) stochastic, if \(\sum \nolimits _{j = 1}^m {a_{ij} = 1}\) for all \(i \in \{ 1,2, \ldots ,m\}\),

(5) column-allowable, if it has at least one positive entry in each column.

Clearly, the product of stochastic matrices is again a stochastic matrix, and every positive matrix is also primitive.

Apart from providing a smaller Markov chain transition matrix, artificial fitness levels lead to simplified calculations. Next, the artificial fitness level technique will be introduced.

3.2 Artificial fitness levels

Artificial fitness levels (Oliveto et al. 2007) is a partition technique for the search space based on the fitness function. The whole search space \(S\), viewed as a set of \(|S| = n\) different states, is divided into \(m<n\) subsets \(A_1,\ldots , A_m\) such that for all points \(a \in A_i\) and \(b \in A_j\), \(f(a) < f(b)\) whenever \(i<j\). Thus \(A_m\) contains only optimal search points, and the Markov chain can be constructed over only \(m\) states rather than \(n\).
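A fitness-level partition is easy to construct explicitly; the sketch below groups states by fitness value, using a toy OneMax-style fitness of our own choosing.

```python
from itertools import groupby

def fitness_levels(states, f):
    """Partition `states` into levels A_1 < ... < A_m by increasing fitness,
    so the last level contains exactly the optimal search points."""
    ordered = sorted(states, key=f)
    return [list(group) for _, group in groupby(ordered, key=f)]

# Toy example: 3-bit strings under OneMax; 8 states collapse to 4 levels.
states = [tuple((i >> j) & 1 for j in range(3)) for i in range(8)]
levels = fitness_levels(states, sum)
print([len(A) for A in levels])   # [1, 3, 3, 1]; the last level is optimal
```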

3.3 The transition matrices of ME-GEP

Note that the evolutionary process of ME-GEP can be illustrated by the following formula:

$$\begin{aligned} \xi (t)&= \! (x_0 (t),X(t))\buildrel C \over \longrightarrow \! (x_0 (t),X_C (t))\buildrel T \over \longrightarrow \!\! (x_0 (t),X_T (t))\; \nonumber \\&\buildrel M \over \longrightarrow (x_0 (t),X_{Mu} (t))\buildrel S \over \longrightarrow (x_0 (t),X(t + 1))\; \nonumber \\&\buildrel U \over \longrightarrow \;(x_0 (t + 1),X(t + 1)) = \xi (t + 1), \end{aligned}$$
(1)

where \(C\), \(T\), \(M\), \(S\) and \(U\) represent the recombination, transposition, mutation, selection and update operations, respectively.

Let \(P^X\) and \(P^\xi \) be the transition matrices associated with homogeneous Markov chains \(\{ X(t);t \ge 0\}\) and \(\{ \xi (t);t \ge 0\}\). Then the properties and relations between \(P^X\) and \(P^\xi \) will be analyzed in the following. First, some conclusions of \(P^X\) can be stated as below.

Lemma 1

For any two chromosomes \(x\) and \(y\), the mutation probability from \(x\) to \(y\) is

$$\begin{aligned} P\left\{ x \buildrel M \over \longrightarrow y\right\}&= (1 - p_m )^{h + tal - k_1 - k_2 } p_m ^{k_1 + k_2 } \nonumber \\&\times \,\left( \frac{1}{{|F| + |T|}}\right) ^{k_1 } \left( \frac{1}{{|T|}}\right) ^{k_2 }, \end{aligned}$$
(2)

where \(k_1\) and \(k_2\) are the numbers of bits in which \(x\) and \(y\) differ in the gene head and tail, respectively.

Proof

For \(\forall x,y \in H\), suppose that \(k_1\) bits in the head and \(k_2\) bits in the tail of individual \(x\) must be mutated. Each head bit has \(|F| + |T|\) possible outcomes and each tail bit has \(|T|\); hence the probability that individual \(x\) becomes individual \(y\) after mutation is

$$\begin{aligned}&P\left\{ x \buildrel M \over \longrightarrow y\right\} \\&\quad = (1 - p_m )^{h + tal - k_1 - k_2 } \left( \frac{{p_m }}{{|F| + |T|}}\right) ^{k_1 } \left( \frac{{p_m }}{{|T|}}\right) ^{k_2 } \; \\&\quad = (1 - p_m )^{h + tal - k_1 - k_2 } p_m ^{k_1 + k_2 } \left( \frac{1}{{|F| + |T|}}\right) ^{k_1 } \left( \frac{1}{{|T|}}\right) ^{k_2 } \\ \end{aligned}$$

\(\square \)
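Eq. (2) can be checked numerically on a tiny gene: the brute-force minimum over all pairs must respect the lower bound used later in the proof of Lemma 3. The parameters below are our own.

```python
from itertools import product

h, tal = 2, 2                   # tiny gene so brute force is feasible
F, T = ["+", "Q"], ["a", "b"]
p_m = 0.1

def mutation_prob(x, y):
    """Eq. (2) of Lemma 1 for genes x, y given as symbol tuples."""
    k1 = sum(x[i] != y[i] for i in range(h))
    k2 = sum(x[i] != y[i] for i in range(h, h + tal))
    return ((1 - p_m) ** (h + tal - k1 - k2) * p_m ** (k1 + k2)
            * (1 / (len(F) + len(T))) ** k1 * (1 / len(T)) ** k2)

genes = [head + tail for head in product(F + T, repeat=h)
                     for tail in product(T, repeat=tal)]
p_min = min(mutation_prob(x, y) for x in genes for y in genes)
bound = (((1 - p_m) * p_m) ** (h + tal)
         * (1 / (len(F) + len(T))) ** h * (1 / len(T)) ** tal)
assert p_min >= bound           # the lower bound used in the proof of Lemma 3
print(p_min, bound)
```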

Lemma 2

Through IS transposition, an individual of the last generation can generate any given chromosome with probability

$$\begin{aligned} p_{T} = 2p_{t} /((h - 2) \cdot l \cdot (l + 1)) \end{aligned}$$

Proof

Suppose individual \( X= x_1, x_2, \ldots , x_l\) is selected for IS transposition with probability \(p_t\). Let the beginning bit of the IS string be \(f_j\), \(j = 1,2,\ldots ,l\). There are \(l-j+1\) possibilities for the IS string and \(h-2\) possibilities for its insertion bit (excluding the first and the last bit of the head). Thus IS transposition with beginning bit \(f_j\) can generate \((h - 2) \cdot (l-j+ 1)\) kinds of chromosomes. When IS transposition occurs on \(X\), there are

$$\begin{aligned} \sum \limits _{j = 1}^l {(h - 2) \cdot (l - j + 1) = (h - 2) \cdot l \cdot (l + 1)/2} \end{aligned}$$

kinds of chromosomes. So any chromosome could be generated by IS transposition with the probability of \(2p_t /((h - 2) \cdot l \cdot (l + 1))\) . \(\square \)
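The counting argument of Lemma 2 can be verified directly (the values of \(h\) and \(l\) below are our own):

```python
def is_variant_count(h, l):
    # For each beginning bit j there are l - j + 1 possible IS strings
    # and h - 2 possible insertion bits, as in the proof of Lemma 2.
    return sum((h - 2) * (l - j + 1) for j in range(1, l + 1))

h, l = 10, 21
assert is_variant_count(h, l) == (h - 2) * l * (l + 1) // 2
print(is_variant_count(h, l))   # 1848
```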

Lemma 3

\(P^X\) is primitive, which can be decomposed in a natural way into a product of stochastic matrices \(P^X = C \cdot T \cdot M \cdot S\), where \(C = [c_{ij} ]_{N^n \times N^n }\), \(T = [t_{ij} ]_{N^n \times N^n }\), \(M = [m_{ij} ]_{N^n \times N^n }\) and \(S = [s_{ij} ]_{N^n \times N^n }\) are the intermediate transition matrices induced by recombination, transposition, mutation and selection, respectively.

Proof

Recombination, transposition, mutation and selection operators may be regarded as random total functions whose domain and range are \(H^n\); i.e., each state of \(H^n\) is mapped probabilistically to another state, since \(0 < p_c, p_t, p_m < 1\). Therefore, \(C\), \(T\), \(M\) and \(S\) are stochastic matrices.

According to Lemma 1, we have

$$\begin{aligned}&\!\!\!\!P\left\{ a \buildrel M \over \longrightarrow b\right\} \\&\quad = (1 - p_m )^{h + tal - k_1 - k_2 } p_m ^{k_1 + k_2 } \left( \frac{1}{{|F| + |T|}}\right) ^{k_1 } \left( \frac{1}{{|T|}}\right) ^{k_2 } \\&\quad \ge (1 - p_m )^{h + tal} p_m ^{h + tal} \left( \frac{1}{{|F| + |T|}}\right) ^h \left( \frac{1}{{|T|}}\right) ^{tal}\!, \end{aligned}$$

where \(k_1 \le h,k_2 \le tal.\)

Denote the minimal mutation probability by \(p_{\min m} = \mathop {\min }\limits _{a,b \in H} p\left\{ a\buildrel M \over \longrightarrow b\right\} \); then

$$\begin{aligned} p_{\min m} \ge (1 {-} p_m )^{h + tal} p_m ^{h + tal} \left( \frac{1}{{|F| + |T|}}\right) ^h \left( \frac{1}{{|T|}}\right) ^{tal}\!>\! 0. \end{aligned}$$

For any \(i,j \in H^n \), we have \(m_{ij} \ge p_{\min m}^n > 0\). Thus \(M\) is positive.

According to selection operation, the probability of each individual to be selected is greater than zero. Then for \(\forall i \in H^n \), we have \(s_{ii}> 0\). Thus \(S\) is column-allowable.

Let \(A = [a_{ij} ]_{N^n \times N^n } = T \cdot M\), \(Q = [q_{ij} ]_{N^n \times N^n } = C \cdot A\), \(P^X = [p_{ij}^X ]_{N^n \times N^n } = Q \cdot S\). Since \(C\) and \(T\) are stochastic matrices with at least one positive entry in each row, and \(M>0\), matrix multiplication gives \(a_{ij} = \sum \nolimits _{k = 1}^{N^n } {t_{ik} m_{kj} } > 0\) for all \(i,j \in \{ 1,2, \ldots ,N^n \}\), i.e. \(A>0\). Similarly, \(q_{ij} = \sum \nolimits _{k = 1}^{N^n } {c_{ik} a_{kj} } > 0\), and hence \(p_{ij} ^X = \sum \nolimits _{k = 1}^{N^n } {q_{ik} s_{kj} } > 0\) for all \(i,j \in \{ 1,2, \ldots ,N^n \}\), because \(S\) is column-allowable. Thus \(P^X\) is positive, and since every positive matrix is primitive, \(P^X\) is primitive. \(\square \)

In order to analyze the properties of transition matrix \(P^\xi \), we will use artificial fitness level technique to partition the state space \(V = H^{n + 1}\). First, set \(H = \{ s_1, s_2, \ldots , s_N \} \) with \(f(s_1 ) > f(s_2 ) > \cdots > f(s_N )\); thus \(H\) is a totally ordered set, and the elements in \(H\) can be ordered by \(s_1 \succ s_2 \succ \cdots \succ s_N\). Let

$$\begin{aligned} V_i = \{ (s_i, x_1, \ldots , x_n );\,x_j \in H,j = 1, \ldots n\} , \ \forall \, i = 1, \ldots , N.\nonumber \\ \end{aligned}$$
(3)

It is obvious that \(V_1 \cup \cdots \cup V_N = H^{n + 1}\) and \(V_1\) is the optimal state set. Then the following facts can be reached immediately:

(1) By Steps 2 to 5 of ME-GEP, the optimal individual is not affected by the genetic operators, and the extended transition matrices on \(H^{n + 1}\) for recombination \(C^{\prime }\), transposition \(T^{\prime }\), mutation \(M^{\prime }\) and selection \(S^{\prime }\) can be written as block diagonal matrices:

    $$\begin{aligned} \begin{array}{ll} C^{\prime } = \left( \begin{array}{llll} C &{} &{} &{} \\ &{} C &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} C \\ \end{array}\right) &{} T^{\prime } = \left( \begin{array}{llll} T &{} &{} &{} \\ &{} T &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} T \\ \end{array} \right) \\ M^{\prime } = \left( \begin{array}{llll} M &{} &{} &{} \\ &{} M &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} M \\ \end{array} \right) &{} S^{\prime } = \left( \begin{array}{llll} S &{} &{} &{} \\ &{} S &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} S \\ \end{array} \right) \\ \end{array} \end{aligned}$$
    $$\begin{aligned} C^{\prime }T^{\prime }M^{\prime }S^{\prime }&= \left( \begin{array}{llll} {CTMS} &{} &{} &{} \\ &{} {CTMS} &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} {CTMS} \\ \end{array} \right) \\&= \left( \begin{array}{llll} {P^X } &{} &{} &{} \\ &{} {P^X } &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} {P^X } \\ \end{array} \right) \end{aligned}$$
(2) The update operation of Step 6 only updates the optimal individual. The transition matrix \(U\) can be written in terms of matrices \(U_{rk}\, (1 \le r,k \le N)\) as follows:

    $$\begin{aligned} U = \left( \begin{array}{l@{\quad }l@{\quad }l@{\quad }l} {U_{11} } &{} {U_{12} } &{} \cdots &{} {U_{1N} } \\ {U_{21} } &{} {U_{22} } &{} \cdots &{} {U_{2N} } \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {U_{N1} } &{} {U_{N2} } &{} \cdots &{} {U_{NN} } \\ \end{array} \right) , \end{aligned}$$

where the blocks \(U_{ij}\, (1 \le i,j \le N)\) are \(N^n \times N^n\) matrices.

(3) The transition matrix satisfies \(P^\xi = (C^{\prime }T^{\prime }M^{\prime }S^{\prime }) \cdot U\).

Some useful properties of \(P^\xi \) can be obtained by analyzing matrix \(U\). Let \(n_k = |\{ (x_1, \ldots, x_n ):\mathop {\max }\limits _{1 \le j \le n} f(x_j ) = f(s_k )\} |\); that is, \(n_k\) is the number of subpopulations whose best individual is \(s_k\). Then we have

Lemma 4

For the subspaces \(V_i = \{ (s_i, x_1, \ldots , x_n ){:}\,x_j \in H,\,j = 1, \ldots, n\}\), \(i = 1, \ldots ,N\), of \(H^{n + 1}\), we have \(\sum \nolimits _{k = 1}^{N} n_k = N^n \) and \(n_k = (N - k + 1)^n - (N - k)^n\), where \(k = 1,\ldots ,N\).

Proof

According to the definition of \(V_i\), we can know immediately that the size of each \(V_i\) is equal to \(N^n\); thus it is clear that

$$\begin{aligned} \sum _{k = 1}^N n_k^{} = N^n. \end{aligned}$$

A subpopulation \((x_1 , \ldots ,x_n )\) whose best individual is no better than \(s_k\), i.e. \(\mathop {\max }\limits _{1 \le j \le n} f(x_j ) \le f(s_k )\), cannot contain any of the individuals \(s_1 ,s_2 ,\ldots , s_{k - 1}\); there are \((N - k + 1)^n\) such subpopulations. Likewise, there are \((N - k )^n\) subpopulations that contain none of \(s_1 ,s_2 , \ldots ,s_k\). The difference counts exactly the subpopulations whose best individual is \(s_k\), so we have

$$\begin{aligned} n_k = (N - k + 1)^n - (N - k)^n. \end{aligned}$$

\(\square \)
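Lemma 4 can be confirmed by brute force for small \(N\) and \(n\), identifying individuals with their fitness ranks (1 = best):

```python
from itertools import product

N, n = 4, 3                     # 4 ranked individuals, subpopulations of size 3
for k in range(1, N + 1):
    # the best individual of a subpopulation is the one with the smallest rank
    n_k = sum(1 for pop in product(range(1, N + 1), repeat=n) if min(pop) == k)
    assert n_k == (N - k + 1) ** n - (N - k) ** n
print(sum((N - k + 1) ** n - (N - k) ** n for k in range(1, N + 1)) == N ** n)
```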

Lemma 5

Matrix \(U\) has the following properties:

(1) In each row of matrix \(U\), exactly one entry equals 1 and all others are 0; moreover, \(U_{11}\) is a unit matrix.

(2) \(U_{rk} = 0\) for \(r < k\), \(1 \le r < N\).

(3) \(U_{rk}\, (2 \le r \le N,\,1 \le k \le r - 1)\) are diagonal matrices with \(n_k\) entries equal to 1 on the main diagonal and all remaining entries 0.

(4) \(U_{rr}\, (2 \le r \le N)\) are diagonal matrices with \(\sum \nolimits _{i = r}^N {n_i}\) entries equal to 1 on the main diagonal and all remaining entries 0.

Proof

(1) The upgrade operation is represented by the upgrade matrix \(U\). Let \(b(i)\) be the best individual of the subpopulation \((x_1 , \ldots ,x_n )\) in state \(i =( x_0 ,x_1 , \ldots ,x_n) \in V\), i.e. \(b(i) = \arg \max \{ f(x_j ){:}\,j = 1, \ldots ,n\}\). Then \(u_{ij} = 1\) if \(f(x_0 ) < f(b(i))\), where \(j\mathop = \limits ^\Delta ( b(i),x_1 , \ldots ,x_n) \in V \); otherwise \(u_{ii} = 1\) and \(u_{ij} = 0\, (j \ne i) \). That is, in each row of matrix \(U\) exactly one entry equals 1 and all others are 0. Notice that the subpopulation \((x_1 , \ldots ,x_n )\) is unchanged by the upgrade operation: a state either gets upgraded or remains unaltered, so every nonzero entry lies on the main diagonal of one of the sub-matrices \(U_{ij}\, (1 \le i,j \le N)\). In particular, the optimal individual of subspace \(V_1\) is the global optimal solution, so \(f(x^ * ) \ge f(b(i))\); then \(u_{ii}= 1\), \(u_{ij}= 0\, (j \ne i)\) according to (1), and \(U_{11}\) is a unit matrix.

(2) When \(r < k\), i.e. \(s_r \succ s_k\), the recorded best individual would have to get worse, which the upgrade operation never allows; hence no transition from \(V_r\) to \(V_k\) occurs, and \(U_{rk}= 0\, (r < k,1 \le r < N)\).

(3) For \(1 \le k \le r - 1\) we have \(s_k \succ s_r\). By Lemma 4, the number of states in \(V_r\) whose subpopulation's best individual is \(s_k\) is \(n_k\), and exactly these states are upgraded into \(V_k\). Hence, by (1), the diagonal matrix \(U_{rk}\, (2 \le r \le N,\,1 \le k \le r-1)\) has \(n_k\) entries equal to 1 and all others 0.

(4) For \(k = r\, (2 \le r \le N)\), a state of \(V_r\) remains in \(V_r\) exactly when the best individual of its subpopulation is one of \(s_r\), \(s_{r+1}\), \(\ldots\), \(s_N\). Therefore \(U_{rr}\, (2 \le r \le N)\) are diagonal matrices with \(n_r + n_{r + 1}+ \cdots + n_N\) entries equal to 1 on the main diagonal and all remaining entries 0. \(\square \)

Based on the above, the transition matrix \(P^\xi \) becomes

$$\begin{aligned} P^\xi&= \left( \begin{array}{l@{\quad }l@{\quad }l@{\quad }l} {P^X } &{} &{} &{} \\ &{} {P^X } &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} {P^X } \\ \end{array} \right) \left( \begin{array}{l@{\quad }l@{\quad }l@{\quad }l} {U_{11} } &{} &{} &{} \\ {U_{21} } &{} {U_{22} } &{} &{} \\ \vdots &{} \vdots &{} \ddots &{} \\ {U_{N1} } &{} {U_{N2} } &{} \cdots &{} {U_{NN} } \\ \end{array} \right) \nonumber \\&= \left( \begin{array}{l@{\quad }l@{\quad }l@{\quad }l} {P^X U_{11} } &{} &{} &{} \\ {P^X U_{21} } &{} {P^X U_{22} } &{} &{} \\ \vdots &{} \vdots &{} \ddots &{} \\ {P^X U_{N1} } &{} {P^X U_{N2} } &{} \cdots &{} {P^X U_{NN} } \\ \end{array} \right) = \left( \begin{array}{l@{\quad }l} Y &{} 0 \\ R &{} T \\ \end{array} \right) \nonumber \\ \end{aligned}$$
(4)

where

$$\begin{aligned}&R = \left( \begin{array}{l} {P^X U_{21} } \\ {P^X U_{31} } \\ \vdots \\ {P^X U_{N1} } \\ \end{array} \right) , T = \left( \begin{array}{lll} {P^X U_{22} } &{} &{} \\ \vdots &{} \ddots &{} \\ {P^X U_{N2} } &{} \cdots &{} {P^X U_{NN} } \\ \end{array} \right) , R \ne 0 , T \ne 0,\\&Y = P^X U_{11} = P^X > 0. \end{aligned}$$

According to the properties of absorbing Markov chain (Iosifescu 1980) , we have

$$\begin{aligned} P^\xi = \begin{array}{ll}&\begin{array}{c@{\quad }c@{\quad }c} V_1\quad V{\backslash } V_1 \end{array}\\ \begin{array}{l} {V_1 } \\ {V{\backslash } V_1 } \\ \end{array}&{} \left( \begin{array}{c@{\quad }c@{\quad }c} Y &{}\quad 0 \\ R &{}\quad T \\ \end{array} \right) \end{array} \end{aligned}$$
(5)

where \(Y\) concerns transitions from one absorbing state to another, the region 0 consists entirely of 0's, the matrix \(R\) concerns transitions from transient states to absorbing ones, and the matrix \(T\) concerns transitions from transient states to transient states. The zero block corresponds to the fact that once the Markov chain reaches the absorbing set \(V_1\), it never leaves it.

Let \(P^\xi =\left( p_{ij}^\xi \right) \) and \(P^X= \left( p_{ij}^X \right) \), where \(p_{ij}^\xi \) and \(p_{ij}^X\) are the transition probabilities from state \(i\) to state \(j\). According to Lemma 5, the matrices \(U_{rk}\,(2 \le r \le N,\,1 \le k \le r - 1) \) and \( U_{rr}\, (2 \le r \le N)\) can be written as follows:

$$\begin{aligned} U_{rk} = \left( \begin{array}{lll} {\overbrace{\begin{array}{lll} 0 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 0 \\ \end{array}}^{n_1 }} &{} &{} \\ &{} {\overbrace{\begin{array}{lll} 1 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 1 \\ \end{array}}^{n_k }} &{} \\ &{} &{} {\begin{array}{lll} 0 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 0 \\ \end{array}} \\ \end{array} \right) \end{aligned}$$
$$\begin{aligned} U_{rr} = \left( \begin{array}{lll} {\overbrace{\begin{array}{lll} 0 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 0 \\ \end{array}}^{\sum \nolimits _{i = 1}^{r - 1} {n_i } }} &{} \\ &{} {\overbrace{\begin{array}{lll} 1 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 1 \\ \end{array}}^{\sum \nolimits _{i = r}^N {n_i}}} \\ \end{array} \right) \end{aligned}$$

Then we have

$$\begin{aligned} P^X U_{21}&= \left( \begin{array}{l@{\quad }l@{\quad }l@{\quad }l} {p_{11}^X } &{} {p_{12}^X } &{} {\cdots } &{} {p_{1N^n }^X } \\ {p_{21}^X } &{} {p_{22}^X } &{} \cdots &{} {p_{2N^n }^X } \\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ {p_{N^n 1}^X } &{} {p_{N^n 2}^X } &{} \cdots &{} {p_{N^n N^n }^X } \\ \end{array} \right) \cdot \left( \begin{array}{l} \overbrace{\begin{array}{l@{\quad }l@{\quad }l} 1 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 1 \\ \end{array}}^{n_1} \begin{array}{lll} 0 &{} \cdots &{} 0 \\ \cdots &{} &{} \\ 0 &{} \cdots &{} 0 \\ \end{array} \\ \begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} 0 &{} \cdots &{} 0 &{} 0 &{} \cdots &{} 0 \\ 0 &{} \cdots &{} 0 &{} 0 &{} \cdots &{} 0 \\ \end{array} \\ \end{array} \right) = \left( \begin{array}{l@{\quad }l@{\quad }l@{\quad }l@{\quad }l@{\quad }l} {p_{11}^X } &{} \cdots &{} {p_{1n_1 }^X } &{} 0 &{} \cdots &{} 0 \\ {p_{21}^X } &{} \cdots &{} {p_{2n_1 }^X } &{} 0 &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} {} &{} {} &{} \vdots &{} \vdots \\ {p_{N^n 1}^X } &{} \cdots &{} {p_{N^n n_1 }^X } &{} 0 &{} \cdots &{} 0 \\ \end{array} \right) \end{aligned}$$
(6)
$$\begin{aligned} P^X U_{32}&= \left( \begin{array}{llll} {p_{11}^X } &{} {p_{12}^X } &{} {\cdots } &{} {p_{1N^n }^X } \\ {p_{21}^X } &{} {p_{22}^X } &{} \cdots &{} {p_{2N^n }^X } \\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ {p_{N^n ,1}^X } &{} {p_{N^n ,2}^X } &{} \cdots &{} {p_{N^n ,N^n }^X } \\ \end{array} \right) \cdot \left( \begin{array}{lll} {\overbrace{\begin{array}{lll} 0 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 0 \\ \end{array}}^{n_1 }} &{} &{} \\ &{} {\overbrace{\begin{array}{lll} 1 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 1\\ \end{array}}^{n_2 }} &{} \\ &{} &{} \begin{array}{lll} 0 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 0 \\ \end{array}\\ \end{array} \right) \; = \left( \begin{array}{lllllllll} 0 &{} \cdots &{} 0 &{} \quad {p_{1,n_1 + 1}^X } &{} \cdots &{} {p_{1,n_1 + n_2 }^X } &{} 0 &{} \cdots &{} 0 \\ \vdots &{} &{} \vdots &{} \quad \vdots &{} &{} \vdots &{} \vdots &{} &{} \vdots \\ \vdots &{} &{} \vdots &{} &{} &{} &{} &{} &{} \\ \vdots &{} &{} \vdots &{} \quad \vdots &{} &{} \vdots &{} \vdots &{} &{} \vdots \\ 0 &{} &{} 0 &{} \quad {p_{N^n ,n_1 + 1}^X } &{} &{} {p_{N^n ,n_1 + n_2 }^X } &{} 0 &{} \cdots &{} 0 \\ \end{array} \right) \nonumber \\ \end{aligned}$$
(7)
$$\begin{aligned} P^X U_{22}&= \left( \begin{array}{l@{\quad }l@{\quad }l@{\quad }l} {p_{11}^X } &{} {p_{12}^X } &{} {\cdots } &{} {p_{1N^n }^X } \\ {p_{21}^X } &{} {p_{22}^X } &{} \cdots &{} {p_{2N^n }^X } \\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ {p_{N^n ,1}^X } &{} {p_{N^n ,2}^X } &{} \cdots &{} {p_{N^n ,N^n }^X } \\ \end{array} \right) \cdot \left( \begin{array}{lll} {\overbrace{\begin{array}{lll} 0 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 0 \\ \end{array}}^{n_1 }} &{} \\ &{} {\overbrace{\begin{array}{lll} 1 &{} &{} \\ &{} \ddots &{} \\ &{} &{} 1 \\ \end{array}}^{\sum \nolimits _{i = 2}^N {n_i } }} \\ \end{array} \right) = \left( \begin{array}{lllllllll} 0 &{} \cdots &{} 0 &{} \quad {p_{1,n_1 + 1}^X } &{} \cdots &{} {p_{1,n_1 + n_2 }^X } &{} {p_{1,n_1 + n_2 + 1}^X } &{} \cdots &{} {p_{1,N^n }^X } \\ \vdots &{} &{} \vdots &{} \quad \vdots &{} &{} \vdots &{} \vdots &{} &{} \vdots \\ \vdots &{} &{} \vdots &{} &{} &{} &{} &{} &{} \\ &{} &{} &{} &{} &{} &{} &{} &{} \\ \vdots &{} &{} \vdots &{} \quad \vdots &{} &{} \vdots &{} \vdots &{} &{} \vdots \\ 0 &{} &{} 0 &{} \quad {p_{N^n ,n_1 + 1}^X } &{} &{} {p_{N^n ,n_1 + n_2 }^X } &{} {p_{N^n ,n_1 + n_2 + 1}^X } &{} \cdots &{} {p_{N^n ,N^n }^X } \\ \end{array} \right) \nonumber \\ \end{aligned}$$
(8)

The matrix \(R\) consists of the \(N-1\) sub-matrices \(P^X U_{r1}\,(2 \le r \le N)\), each of the same form as \(P^X U_{21}\) above. The matrix \(T\) contains two kinds of sub-matrices: the off-diagonal blocks \(P^X U_{rk}\,(2 \le k < r \le N)\) and the diagonal blocks \(P^X U_{rr}\, (2 \le r \le N)\); as examples, \(P^X U_{32}\) and \(P^X U_{22}\) are written out in Eqs. (7) and (8).

4 The average time complexity of ME-GEP

In this section, the expected first hitting time (EFHT) is used to estimate the average time complexity of ME-GEP, based on the Markov chain model and the transition matrices of ME-GEP derived above.

The first hitting time (FHT) of ME-GEP is the time at which ME-GEP finds the optimal solution for the first time, and the EFHT is the expected value of this time, i.e. the average number of generations that ME-GEP requires to find the optimal solution; it characterizes the average computational time complexity of ME-GEP.

The first hitting time on \(V_1\) can be defined by

$$\begin{aligned} \tau (V_1 ) = \min \left\{ k;k \ge 0,\xi (k) \in V_1 \right\} \end{aligned}$$
(9)

Let \(E\) denote the expectation operator. The first hitting time \(\tau (V_1 )\) is a random variable, and what we are interested in is its expectation \(E[\tau (V_1 )]\).

Hence, we can obtain the following conclusion of EFHT of ME-GEP:

Theorem 1

Let \(\beta = \min \{ p^X _{ij} ;\, i,j \in \{1, \ldots, N^n\} \}\); then the EFHT of ME-GEP satisfies

$$\begin{aligned} P\{ \xi (0) \notin V_1 \} \frac{1}{{1 - (N^n - n_1 )\beta }} \le E[\tau (V_1 )] < \frac{1}{{(N^n - (N - 1)^n )\beta }}. \end{aligned}$$

Proof

Note that \(P\{ \tau (V_1 ) \ge 0\} = 1 \), \(P\{ \tau (V_1 ) \ge 1\} = P\{ \xi (0) \notin V_1 \}\).

By Markov property of \(\{ \xi (t);t \ge 0\}\), for any \(l \ge 2\), we have

$$\begin{aligned}&\!\!\!P\{ \tau (V_1 ) \ge l\}\\&= P\{ \xi (0) \notin V_1 ,\xi (1) \notin V_1 , \cdots ,\xi (l - 1) \notin V_1 \} \\&= P\{ \xi (0) \notin V_1 \} \prod \limits _{k = 0}^{l - 2} {P\{ \xi (k + 1) \notin V_1 |\xi (k) \notin V_1 \} } \\&= P\{ \xi (0) \notin V_1 \} \prod \limits _{k = 0}^{l - 2} {\frac{{P\{ \xi (k + 1) \notin V_1 ,\xi (k) \notin V_1 \} }}{{P\{ \xi (k) \notin V_1 \} }}} \\&= P\{ \xi (0) \notin V_1 \} \prod \limits _{k = 0}^{l - 2} {\frac{{\sum \nolimits _{i,j \notin V_1 } {P\{ \xi (k + 1) = j|\xi (k) = i\} P\{ \xi (k) = i\} } }}{{\sum \nolimits _{s \notin V_1 } {P\{ \xi (k) = s\} } }}} \\&= P\{ \xi (0) \notin V_1 \} \prod \limits _{k = 0}^{l - 2} {\frac{{\sum \nolimits _{i \notin V_1 } {[P\{ \xi (k) = i\} \cdot \sum \nolimits _{j \notin V_1 } {p_{ij}^\xi } ]} }}{{\sum \nolimits _{s \notin V_1 } {P\{ \xi (k) = s\} } }}}. \\ \end{aligned}$$

Then for any \(i \notin V_1\), according to Eqs. (6)–(8), we have

$$\begin{aligned} (N^n - n_1 )\beta \le \sum \limits _{j \notin V_1 } {p_{ij}^\xi } \le 1 - n_1 \beta . \end{aligned}$$

So that for any \(l \ge 2\), we have

$$\begin{aligned} P\{ \tau ( V_1 ) \ge l\}&\ge P\{ \xi (0) \notin V_1 \} \prod \limits _{k = 0}^{l - 2} {(N^n - n_1 )\beta } \; \\&\ge P\{ \xi (0) \notin V_1 \} \{ (N^n - n_1 )\beta \} ^{l - 1} \; \\ P\{ \tau ( V_1 )&\ge l\} \le P\{ \xi (0) \notin V_1 \} \prod \limits _{k = 0}^{l - 2} {(1 - n_1 \beta )} \; \\&\le P\{ \xi (0) \notin V_1 \} \prod \limits _{k = 0}^{l - 2} {(1 - (N^n - (N - 1)^n )\beta } )\; \\&\le P\{ \xi (0) \notin V_1 \} \{ 1 - (N^n - (N - 1)^n )\beta \} ^{l - 1} \; \\ \end{aligned}$$

Note that \(0 < n_1 \beta < 1\), \(0 < (N^n - n_1 )\beta < 1\). Hence

$$\begin{aligned} E[\tau (V_1 )]&= \sum \limits _{l \ge 1} {P\{ } \tau (V_1 ) \ge l\} = P\{ \xi (0) \notin V_1 \} \\&+ \sum \limits _{l \ge 2} {P\{ } \tau (V_1 ) \ge l\} \end{aligned}$$

So the upper bound for \( E[\tau (V_1 )]\) is

$$\begin{aligned} E[\tau (V_1)]&= \sum \limits _{l \ge 1} {P\{ } \tau (V_1 ) \ge l\} \\&\le P\{ \xi (0) \notin V_1 \} \Bigg \{ 1 + \sum \limits _{l = 2}^{ + \infty } \left\{ 1 - (N^n - (N - 1)^n )\beta \right\} ^{l - 1} \Bigg \} \\&= P\{ \xi (0) \notin V_1 \} \frac{1}{{(N^n - (N - 1)^n )\beta }} \\ \end{aligned}$$

The lower bound for \( E[\tau (V_1 )]\) is

$$\begin{aligned} E[\tau (V_1 )]&= \sum \limits _{l \ge 1} {P\{ } \tau (V_1 ) \ge l\} \\&\ge P\{ \xi (0) \notin V_1 \} \\&+\,\, P\{ \xi (0) \notin V_1 \} \sum \limits _{l = 2}^{ + \infty } {\{ (N^n - n_1 )\beta \} ^{l - 1} } \\&\ge P\{ \xi (0) \notin V_1 \} \{ 1 + \sum \limits _{l = 2}^{ + \infty } {\{ (N^n - n_1 )\beta \} ^{l - 1} } \} \\&\ge P\{ \xi (0) \notin V_1 \} \frac{1}{{1 - (N^n - n_1 )\beta }} \\ \end{aligned}$$

That is,

$$\begin{aligned}&P\{ \xi (0) \notin V_1 \} \frac{1}{{1 - (N^n - n_1 ) \cdot \beta }} \le E[\tau (V_1 )] \\&\quad \le P\{ \xi (0) \notin V_1 \} \frac{1}{{(N^n - (N - 1)^n )\beta }} \\&P\{ \xi (0) \notin V_1 \} \frac{1}{{1 - (N^n - n_1 ) \cdot \beta }} \\&\quad \le E[\tau (V_1 )] < \frac{1}{{(N^n - (N - 1)^n )\beta }} \\ \end{aligned}$$

\(\square \)
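For intuition, the two bounds of Theorem 1 can be evaluated for given \(\beta\), \(N\) and \(n\); the sketch below uses toy values of our own, whereas in practice \(\beta\) comes from the transition matrix \(P^X\).

```python
def efht_bounds(N, n, beta, p_not_opt=1.0):
    """Bounds of Theorem 1. n_1 = N^n - (N-1)^n is the number of
    subpopulations whose best individual is the optimum (Lemma 4);
    p_not_opt stands for P{xi(0) not in V_1} (1.0 is the worst case)."""
    n1 = N ** n - (N - 1) ** n
    lower = p_not_opt / (1 - (N ** n - n1) * beta)
    upper = 1 / (n1 * beta)
    return lower, upper

print(efht_bounds(N=3, n=2, beta=0.01))   # ~ (1.0417, 20.0)
```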

In order to relate the EFHT directly to the parameters of ME-GEP, we give Corollary 1.

Corollary 1

Let \(p_{\min t} = \mathop {\min }\limits _{\forall a,b \in H} p\left\{ a\buildrel T \over \longrightarrow b\right\} \), \(p_{\min s} = \mathop {\min }\limits _{\forall a \in H} p\left\{ a\buildrel S \over \longrightarrow a\right\} \), \( p_{\min m} = \mathop {\min }\limits _{\forall a,b \in H} p\{ a\buildrel M \over \longrightarrow b\}\) ; then we have

$$\begin{aligned} E[\tau (V_1 )] < \frac{1}{{(N^n - (N - 1)^n ) \cdot p_{\min t}^n \cdot p_{\min m}^n \cdot p_{\min s}^n }} \end{aligned}$$

Proof

For any \(i,\delta ,\eta ,k,j \in H^n\), we have

$$\begin{aligned} \sum \limits _{\delta \in H^n } {p_{i\delta }^C } = 1, \quad p_{\delta \eta }^T \ge p_{\min t}^n, \quad p_{\eta k}^M \ge p_{\min m}^n, \quad p_{jj}^S \ge p_{\min s}^n. \end{aligned}$$

Then we have

$$\begin{aligned} p^X _{ij}&= \;P(X(t + 1) = j|X(t) = i)\nonumber \\&= \sum \limits _{\eta ,\delta ,k \in H^n } {p_{i\delta }^C } \cdot p_{\delta \eta }^T \cdot p_{\eta k}^M \cdot p_{kj}^S \\&\ge \sum \limits _{k = j} {\sum \limits _{\;\;\delta ,\eta \in H^n } {p_{i\delta }^C \cdot p_{\delta \eta }^T \cdot p_{\eta k}^M \cdot p_{kj}^S } } \\&\ge p_{jj}^S \cdot p_{\min t}^n \cdot p_{\min m}^n \cdot \sum \limits _{\delta \in H^n } {p_{i\delta }^C }\nonumber \\&\ge p_{\min s}^n \cdot p_{\min t}^n \cdot p_{\min m}^n \\ \end{aligned}$$

So we get

$$\begin{aligned} \beta \ge p_{\min t}^n \cdot p_{\min m}^n \cdot p_{\min s}^n \end{aligned}$$

By Theorem 1, we have

$$\begin{aligned} E[\tau (V_1 )] < \frac{1}{{(N^n - (N - 1)^n ) \cdot p_{\min t}^n \cdot p_{\min m}^n \cdot p_{\min s}^n }} \end{aligned}$$

\(\square \)

Theorem 1 estimates the upper and lower bounds of the EFHT of the ME-GEP algorithm, and Corollary 1 relates the upper bound directly to the parameters of ME-GEP. Thus, Theorem 1 and Corollary 1 can be used to guide the design of ME-GEP algorithms.
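Since the concrete bounds derived in Sect. 5 are astronomically large, it is convenient to evaluate the Corollary 1 bound in \(\log_{10}\) space; a helper sketch (the function name and the toy operator floors are ours):

```python
import math

def log10_efht_upper(N, n, p_min_t, p_min_m, p_min_s):
    """log10 of the Corollary 1 bound
    1 / ((N^n - (N-1)^n) * p_min_t^n * p_min_m^n * p_min_s^n).
    N^n - (N-1)^n is computed exactly with Python integers, which
    math.log10 accepts even when they exceed the float range."""
    diff = N ** n - (N - 1) ** n
    return -(math.log10(diff)
             + n * (math.log10(p_min_t) + math.log10(p_min_m) + math.log10(p_min_s)))

# Toy illustration with made-up operator floors:
print(log10_efht_upper(N=100, n=5, p_min_t=1e-3, p_min_m=1e-4, p_min_s=0.1))
```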

5 Case study

In order to justify our theoretical results, the EFHT of ME-GEP for solving a function modeling problem and a clustering problem is analyzed below.

To guarantee the validity of results, each execution of each algorithm on each parameter set was independently repeated 50 times.

5.1 Function modeling problem

5.1.1 The description of function modeling problem

The function modeling problem can be described as follows: given a group of data \((x_1^i ,x_2^i , \ldots ,x_m^i ,y^i ),\ i = 1,2, \ldots, k\), find a function \(\varphi (x_1 ,x_2 , \ldots ,x_m ) \) in a certain function set \(\Phi (\bar{x}) \), \(\bar{x} = (x_1 , \ldots ,x_m ) \), which satisfies

$$\begin{aligned}&\mathop {Min}\limits _{g(\bar{x}) \in \Phi (\bar{x})} \sum \limits _{i = 1}^k \left( g\left( x_1^i,x_2^i , \ldots ,x_m^i \right) - y^i \right) ^2 \\&\quad = \sum \limits _{i = 1}^k \left( \varphi \left( x_1^i ,x_2^i , \ldots ,x_m^i \right) - y^i \right) ^2, \end{aligned}$$

where \(g(\bar{x})\) is an arbitrary function in set \(\Phi (\bar{x}) \).

Next, the modeling of the amount of gas emitted from coalfaces (Xin et al. 2006) is used as an example. The amount of gas emitted from coalfaces is the main safety risk in coal mine production, yet it is hard to predict because it is affected by many factors and has complicated nonlinear relations with them. Table 1 gives the sampling cases. There are six main factors as input variables: the depth of the coalface \(x_1\), the height of the coalface \(x_2\), the amount of coalfaces \(x_3\), the layer interval between the working coalface and the neighboring coalface \(x_4\), the average daily advance of the working face \(x_5\) and the average daily output \(x_6\). The amount of gas emitted from coalfaces, \(y\), is the output variable. By ME-GEP, a function can be found that accurately models the nonlinear relations between the input variables and \(y\).

Table 1 Sample cases of gas emitted from coalfaces

5.1.2 The EFHT of ME-GEP for modeling the amount of gas emitted from coalfaces

The parameters of the ME-GEP algorithm are designed as follows: let \(n = 80\) and \(h = 15\); then \(tal = 16\) and \(l = 31\). \(F = \{ + , - ,*,/,\mathrm{{sqrt, exp,}}\sin ,\cos ,\ln \}\) and \(T = \{ R,x\}\), where \(R\) is a set of random constants, assumed to have 20 elements. The genetic operators adopted are one-point recombination, IS transposition, selection and mutation. In particular, two mutation operators are considered: uniform mutation, which flips each bit independently with probability \(p_m\), and one-point mutation, which flips one randomly chosen bit with probability \(p_m\). In the selection strategy, each individual is selected with the same probability. Next, the EFHT of ME-GEP with the two mutation operators is analyzed.

The fitness (Ferreira 2001) of ME-GEP is evaluated by Eq. (10).

$$\begin{aligned} f_i = \max \left( M - \sum \limits _{j = 1}^{k} {\left( {\left| {\frac{{C_{ij} - y_j }}{{y_j }}} \right| \cdot {\text{ factor }}} \right) } ,0 \right) . \end{aligned}$$
(10)

where \(k\) is the number of samples and \(M\) is the selection range. For this example, let \(M= 1{,}000\) and factor \(=M/k \approx 56\); \(C_{ij}\) is the value returned by individual \(i\) for sample case \(j\), and \(y_j\) is the target value for sample case \(j\).
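A direct implementation of Eq. (10) might look as follows (argument names are ours):

```python
def fitness(C_i, y, M=1000.0):
    """Relative-error fitness of Eq. (10) for one individual: C_i[j] is the
    model output on sample j, y[j] the target value; factor = M / k."""
    k = len(y)
    factor = M / k
    err = sum(abs((c - t) / t) * factor for c, t in zip(C_i, y))
    return max(M - err, 0.0)

# A perfect individual scores M; errors reduce the score, floored at 0.
print(fitness([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))   # 1000.0
```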

Applying Theorem 1 and Corollary 1, we have

$$\begin{aligned} E[\tau (V_1 )] < \frac{1}{{(N^n - (N - 1)^n ) \cdot p_{\min t}^n \cdot p_{\min m}^n \cdot p_{\min s}^n }} \end{aligned}$$

Let \(p_{\min m1}\) and \(p_{\min m2}\) be the minimum probabilities that an individual \(x\) mutates to an individual \(y\) under uniform mutation and one-point mutation, respectively. Next, we estimate \(N\), \(p_{\min t}\), \(p_{\min m1}\), \(p_{\min m2}\) and \(p_{\min s}\).

(1) The size of individual space \(N\).

Denote the tree height of phenotype by variable \(ht\). Because the individual length is 31, we have \(ht \le 5\). It follows that

$$\begin{aligned} N = |H| \le 9^{(2^0 + 2^1 + 2^2+ 2^3 )} \times (|R| + 6)^{2^4 } = 9^{15} \times 26^{16} \end{aligned}$$

(2) \(p_{\min m1}\)

According to Lemma 1, we have

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\}&= (1 - p_m )^l \left( {{p_m } \over {1 - p_m }}\right) ^{k_1 + k_2 }\\&\times \left( {1 \over {|F| + |T|}}\right) ^{k_1 } \left( {1 \over {|T|}}\right) ^{k_2 } \end{aligned}$$

If \(1-p_m > p_m\), that is \(p_m <{\textstyle {1 \over 2}}\), it follows that

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\}&> (1 - p_m )^l \left( {{p_m } \over {1 - p_m }}\right) ^l \left( {1 \over {|T| + |F|}}\right) ^l \left( {1 \over {|T|}}\right) ^l \\&= \left( {{p_m } \over {(|T| + |F|) \cdot (|T|)}}\right) ^l = \left( {{p_m } \over {910}}\right) ^{31} \end{aligned}$$

It is easy to see that the probability that any individual \(x\) mutates to another individual \(y\) is at least \(\left( \frac{{p_m }}{{910}}\right) ^{31}\) for \(p_m \in (0,0.5)\). Thus, we have \(p_{\min m1} \ge \left( \frac{{p_m }}{{910}}\right) ^{31}\).

(3) \(p_{\min m2}\)

There are two situations for one-point mutation. One is that one bit flips in head. The other is that one bit flips in tail.

For the first situation according to Lemma 1, we have

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\}&= (1 - p_m )^{l-1} {p_m \over {|F| + |T|}}\\&= (1 - p_m )^{l} {p_m \over {1 - p_m}}{1 \over {|F| + |T|}} \end{aligned}$$

If \(1-p_m > p_m\), that is \(p_m <{\textstyle {1 \over 2}}\), it follows that

$$\begin{aligned} P\{ x\buildrel M \over \longrightarrow y\}&> (1 - p_m )^{l} \left( {{p_m } \over {1 - p_m }}\right) ^{l} \left( {1 \over {|T| + |F|}}\right) \\&={{p_m }^{l} \over {|T| + |F|}} = {{p_m }^{31} \over {35}} \end{aligned}$$

For the second situation according to Lemma 1, we have

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\} = (1 - p_m )^{l-1} {p_m \over {|T|}}= (1 - p_m )^{l} {p_m \over {1 - p_m}}{1 \over { |T|}} \end{aligned}$$

If \(1-p_m > p_m\), that is \(p_m <{\textstyle {1 \over 2}}\), it follows that

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\}&> (1 - p_m )^{l} \left( {{p_m } \over {1 - p_m }}\right) ^{l} \left( {1 \over {|T| }}\right) \\&={{p_m }^{l} \over {|T| }} = {{p_m }^{31} \over {26}} \end{aligned}$$

From the above analysis, the probability that any individual \(x\) mutates to another individual \(y\) by one-point mutation is at least \({{p_m }^{31} \over {35}}\) for \(p_m \in (0,0.5)\). Thus, we have \(p_{\min m2} \ge { {p_m }^{31}\over {35}}\).

(4) \(p_{\min t}\)

According to Lemma 2, for \(\forall x,y \in H\), we have

$$\begin{aligned} P\left\{ x\buildrel T \over \longrightarrow y\right\}&= {2 \over {(h - 2) \cdot l \cdot (l + 1)}} \cdot p_t \\&> {2 \over {(l - 2) \cdot l \cdot (l + 1)}} \cdot p_t > {2 \over {l^2 (l + 1)}} \cdot p_t \end{aligned}$$

Thus, we have

$$\begin{aligned} p_{\min t} \ge {2 \over {l^2 (l + 1)}} \cdot p_t= {1\over { {15376} }} \cdot p_t \end{aligned}$$

(5) \(p_{\min s}\)

According to the selection operator, obviously, we have \(p_{\min s} ={\textstyle {1 \over 2n}}\).

According to Corollary 1, the EFHT of ME-GEP with uniform mutation is bounded as follows:

$$\begin{aligned} E[\tau (V_1 )]&< \frac{1}{{(N^n - (N - 1)^n ) \cdot p_{\min t}^n \cdot p_{\min m1}^n \cdot p_{\min s}^n }} \nonumber \\&< {{2^{80} \cdot 80^{80} \cdot 910^{2480} \cdot 15376^{80} } \over {\left( (9^{15} {\times } 26^{16} )^{80} - (9^{15} {\times } 26^{16} - 1)^{80} \right) \cdot p_m^{2480} \cdot p_t^{80} }}\nonumber \\ \end{aligned}$$
(11)

and the EFHT of ME-GEP with one-point mutation is bounded as follows:

$$\begin{aligned} E[\tau (V_1 )]&< \frac{1}{{\left( N^n - (N - 1)^n \right) \cdot p_{\min t}^n \cdot p_{\min m2}^n \cdot p_{\min s}^n }} \nonumber \\&< {{2^{80} \cdot 80^{80} \cdot 35^{80} \cdot 15376^{80} } \over {\left( (9^{15} {\times } 26^{16} )^{80} - (9^{15} {\times } 26^{16} - 1)^{80} \right) \cdot p_m^{2480} \cdot p_t^{80} }}\nonumber \\ \end{aligned}$$
(12)
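As a sanity check on the size of bound (11), its \(\log_{10}\) can be evaluated exactly with Python integers; we plug in \(p_m = 0.04\) and \(p_t = 0.1\), the values used in the experiments below.

```python
import math

N = 9 ** 15 * 26 ** 16           # size bound on the individual space
n, l = 80, 31
p_m, p_t = 0.04, 0.1

diff = N ** n - (N - 1) ** n     # exact big-integer arithmetic
log10_bound = (n * math.log10(2 * 80)             # p_min_s = 1/(2n)
               + n * l * math.log10(910 / p_m)    # p_min_m1 = (p_m/910)^31
               + n * math.log10(15376 / p_t)      # p_min_t  = p_t/15376
               - math.log10(diff))
print(f"E[tau(V1)] < 10^{log10_bound:.0f}")       # a worst-case bound only
```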

5.1.3 Numerical experiment

Formulas (11) and (12) suggest that the upper bound on the EFHT of ME-GEP decreases as the mutation and transposition probabilities increase.

The following experiments were done to observe the actual average first hitting time of ME-GEP, in particular under the two different mutation operators:

(1) with the IS transposition probability fixed at 0.1, the uniform and one-point mutation probabilities were varied;

(2) with the uniform and one-point mutation probabilities fixed at 0.04, the IS transposition probability was varied.

The results obtained are shown in Figs. 2 and 3. The EFHT of ME-GEP decreased as the one-point and uniform mutation probabilities increased (IS transposition probability 0.1), and it decreased as the IS transposition probability increased (one-point and uniform mutation probabilities 0.04). In this example, the experimental results conform to formulas (11) and (12).

Fig. 2 The EFHT of ME-GEP for solving gas emitted from coalfaces modeling with different one-point mutation and IS transposition probabilities

Fig. 3 The EFHT of ME-GEP for solving gas emitted from coalfaces modeling with different uniform mutation and IS transposition probabilities

5.2 Clustering problem

5.2.1 The description of clustering problem

The clustering problem can be described as follows: given a set of observations \((P_\mathrm{{1}}, \ldots , P_\mathrm{{m}})\), where each observation is a \(d\)-dimensional real vector, partition the \(m\) observations into \(q\,(\le m)\) sets \(C=\{C_\mathrm{{1}}, \ldots ,C_\mathrm{{q}}\}\) so as to

minimize \(\sum \limits _{i = 1}^q {\sum \limits _{P_j \in C_i } {||P_j - \mu _i ||^2 } }\), where \(\mu _i\) is the mean of the points in \(C_i\).

In ME-GEP for solving clustering problems, two special clustering operators, '\(\cap \)' and '\(\cup \)', are introduced as binary operators defining the aggregation and segmentation of clusters, respectively.

Let \(O_1\) and \(O_2\) be the centers of clusters \(C_1\) and \(C_2\), respectively; the two operators are defined as follows (a small sketch follows this list).

(1) The aggregation operator '\(\cap \)' is defined by \(O_1 \cap O_2=\) centroid(\(O_1\), \(O_2\)), where the function centroid returns the center vector of \(O_1\) and \(O_2\); \(C_1\) and \(C_2\) are aggregated into a single cluster \(C\) with center \(O =\) centroid\((O_1, O_2)\).

(2) The segmentation operator '\(\cup \)' is defined by \(O_1 \cup O_2 = \{O_1, O_2\}\); \(C_1\) and \(C_2\) remain two distinct clusters with centers \(O_1\) and \(O_2\), respectively.
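A small sketch of the two operators (the tuple representation of centers is our own choice):

```python
def aggregate(O1, O2):
    """The aggregation operator: centroid(O1, O2), the component-wise
    midpoint; the two clusters merge into one cluster with this center."""
    return tuple((a + b) / 2 for a, b in zip(O1, O2))

def segment(O1, O2):
    """The segmentation operator: the clusters stay separate; both
    centers survive."""
    return {O1, O2}

O1, O2 = (0.0, 0.0), (2.0, 4.0)
print(aggregate(O1, O2))   # (1.0, 2.0)
print(segment(O1, O2))     # {(0.0, 0.0), (2.0, 4.0)}
```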

The chromosome is again composed of a head and a tail. The function set is \(F = \{ \cup , \cap \}\), and the terminal set is \(T=\{ x_1, x_2, \ldots , x_i \}\), \(i \in (1,n)\), where each symbol \(x_i\) represents the center of a cluster in the data set.

Based on the above definitions, clusters are easily encoded in the chromosome within the ME-GEP framework.

The main procedure of ME-GEP for solving the clustering problem is described as follows; details can be found in Chen et al. (2007):

Step 1 Initialize the population;

Step 2 Select and aggregate cluster centers by coding and decoding expression trees;

Step 3 Calculate the Euclidean distances between each data object and cluster centers and tag the corresponding data object;

Step 4 Update the cluster centers and compute the fitness of each individual;

Step 5 Preserve the best individual to the next generation;

Step 6 Select individuals with same probability;

Step 7 Mutate and recombine with certain probabilities;

Step 8 Generate a new population; if the optimal solution has been found, then turn to Step 9; otherwise, turn to Step 2;

Step 9 Select the best individual;

Step 10 Auto-clustering algorithm of best individual;

Step 11 Output the result of clusters.

The EFHT of ME-GEP (Chen et al. 2007) for solving two real clustering examples, the Iris and Yeast benchmark databases (Zhou et al. 2003), which differ in data dimension and size, is analyzed as follows:

To solve the clustering problems in these examples, the parameters of ME-GEP are set as in Table 2. The genetic operators adopted include one-point recombination, selection with equal probability, and uniform and one-point mutation operators.

Table 2 Parameter settings of ME-GEP for clustering the two benchmark databases

The fitness of ME-GEP algorithm for solving the clustering problem is evaluated by Eq. (13).

$$\begin{aligned} f = \frac{1}{\sum \limits _{i = 1}^q {\sum \limits _{P_j \in C_i } {||P_j - \mu _i ||^2 } }}, \;\;\; \mu _i= {\frac{1}{{n_i }}}\sum \limits _{P_j \in C_i } {P_j }, \end{aligned}$$
(13)

where \(q\) is the number of clusters, \(C_i\) is the \(i\)th cluster, \(P_j\) is an observation point of cluster \(C_i\) and \(\mu _i\) is the center of the \(i\)th cluster. The bigger an individual's fitness, the more compact and well-separated the clusters, i.e. the better the individual.
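Eq. (13) in code, with clusters given as lists of points (names ours; this sketch assumes at least one point is off its center, so the denominator is positive):

```python
def clustering_fitness(clusters):
    """Eq. (13): reciprocal of the total within-cluster squared distance;
    each center mu_i is the mean of the points assigned to C_i."""
    total = 0.0
    for C in clusters:
        dim = len(C[0])
        mu = [sum(p[d] for p in C) / len(C) for d in range(dim)]
        total += sum(sum((p[d] - mu[d]) ** 2 for d in range(dim)) for p in C)
    return 1.0 / total

clusters = [[(0.0, 0.0), (0.0, 2.0)], [(5.0, 5.0), (7.0, 5.0)]]
print(clustering_fitness(clusters))   # 0.25: total squared distance is 4
```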

5.2.2 The EFHT of ME-GEP for clustering the Iris database

The parameters of ME-GEP for clustering the Iris database are as follows: let \(h = 4\); then \(tal = 5\) and \(l = 9\), with \(T=\{x_1,x_2, x_3, x_4, x_5\}\).

Applying Theorem 1 and Corollary 1, we have

$$\begin{aligned} E[\tau (V_1 )] < \frac{1}{{(N^n - (N - 1)^n ) \cdot p_{\min m}^n \cdot p_{\min s}^n }} \end{aligned}$$

Let \(p_{\min m1}\) and \(p_{\min m2}\) be the minimum probabilities that an individual \(x\) mutates to an individual \(y\) under uniform mutation and one-point mutation, respectively. Next, we estimate \(N\), \(p_{\min m1}\), \(p_{\min m2}\) and \(p_{\min s}\).

(1) The size of individual space \(N\).

Denote the tree height of phenotype by variable \(ht\). Because the individual length is 9, we have \(ht \le 4\). It follows that

$$\begin{aligned} N = |H| \le 2^{(2^0 + 2^1 + 2^2 )} \times 5^{2^3} = 2^{7} \times 5^{8} \end{aligned}$$

(2) \(p_{\min m1}\)

According to Lemma 1, we have

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y \right\} = (1 - p_m )^l \!\!\left( {{p_m } \over {1 - p_m }}\right) ^{k_1 + k_2 }\!\! \left( {1 \over {|F|}}\right) ^{k_1 } \!\!\left( {1 \over {|T|}}\right) ^{k_2 } \end{aligned}$$

If \(1 - p_m > p_m\), that is \(p_m < {\textstyle {1 \over 2}}\), it follows that

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\}&> (1 - p_m )^l \left( {{p_m } \over {1 - p_m }}\right) ^l \left( {1 \over {|F|}}\right) ^l \left( {1 \over {|T|}}\right) ^l \\&= \left( {{p_m } \over {(|F|) \cdot (|T|)}}\right) ^l = \left( {{p_m } \over {10}}\right) ^{9} \end{aligned}$$

From the above analysis, the probability that any individual \(x\) mutates to another individual \(y\) is at least \(\left( \frac{{p_m }}{{10}}\right) ^{9}\) for \(p_m \in (0,0.5)\). Thus, we have \(p_{\min m1} \ge \left( \frac{{p_m }}{{10}}\right) ^{9}\).

(3) \(p_{\min m2}\)

There are two situations for one-point mutation. One is that one bit flips in head. The other is that one bit flips in tail.

For the first situation according to Lemma 1, we have

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\} = (1 - p_m )^{l-1} {p_m \over {|F|}}= (1 - p_m )^{l} {p_m \over {1 - p_m}}{1 \over { |F|}} \end{aligned}$$

If \(1-p_m > p_m\), that is \(p_m <{\textstyle {1 \over 2}}\), it follows that

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\}&> (1 - p_m )^{l} \left( {{p_m } \over {1 - p_m }}\right) ^{l} \left( {1 \over {|F| }}\right) \\&= {{p_m }^{l} \over {|F|}} = {{p_m }^{9} \over {2}} \end{aligned}$$

For the second situation according to Lemma 1, we have

$$\begin{aligned} P\{ x\buildrel M \over \longrightarrow y\} = (1 - p_m )^{l-1} {p_m \over {|T|}}= (1 - p_m )^{l} {p_m \over {1 - p_m}}{1 \over { |T|}} \end{aligned}$$

If \(1-p_m > p_m\), that is \(p_m <{\textstyle {1 \over 2}}\), it follows that

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\}&> (1 - p_m )^{l} \left( {{p_m } \over {1 - p_m }}\right) ^{l} ({1 \over {|T| }})\\&= {{p_m }^{l} \over {|T|}} = {{p_m }^{9} \over {5}} \end{aligned}$$

From the above analysis, the probability that any individual \(x\) mutates to another individual \(y\) by one-point mutation is at least \({{p_m }^{9} \over {5}}\) for \(p_m \in (0,0.5)\). Thus, we have \(p_{\min m2} \ge { {p_m }^{9}\over {5}}\).

(4) \(p_{\min s}\)

According to the selection operator, we clearly have \(p_{\min s} ={\textstyle {1 \over 2n}}\).

According to Corollary 1, the EFHT of ME-GEP with uniform mutation is as follows:

$$\begin{aligned} E[\tau (V_1 )]&< \frac{1}{(N^n - (N - 1)^n ) \cdot p_{\min m1}^n \cdot p_{\min s}^n } \nonumber \\&< \frac{10^{180} \cdot 2^{20} \cdot 20^{20}}{\left( (2^{7} \times 5^{8})^{20} - (2^{7} \times 5^{8} - 1)^{20} \right) \cdot p_m^{180}} \end{aligned}$$
(14)

And the EFHT of ME-GEP with one-point mutation is as follows:

$$\begin{aligned} E[\tau (V_1 )]&< \frac{1}{{(N^n - (N - 1)^n ) \cdot p_{\min m2}^n \cdot p_{\min s}^n }} \nonumber \\&< {{5^{20} \cdot 2^{20} \cdot 20^{20} } \over {\left( (2^{7} \times 5^{8} )^{20} - (2^{7} \times 5^{8} - 1)^{20}\right) \cdot p_m^{180} }} \end{aligned}$$
(15)

5.2.3 The EFHT of ME-GEP for clustering the Yeast database

The parameters of ME-GEP for clustering the Yeast database are as follows: let \(h = 15\); then the tail length is 16 and \(l = 31\). The terminal set is \(T =\{x_1,x_2, \ldots , x_{16} \}\) and the population size is \(n = 30\).

Applying Theorem 1 and Corollary 1, we have

$$\begin{aligned} E[\tau (V_1 )] < \frac{1}{{(N^n - (N - 1)^n ) \cdot p_{\min m}^n \cdot p_{\min s}^n }} \end{aligned}$$

Let \(p_{\min m1}\) and \(p_{\min m2}\) be the minimum probabilities that an individual \(x\) mutates to an individual \(y\) under uniform mutation and one-point mutation, respectively. As before, we estimate \(N\), \(p_{\min m1}\), \(p_{\min m2}\) and \(p_{\min s}\).

(1) The size of individual space \(N\).

Denote by \(ht\) the height of the phenotype tree. Because the individual length is 31, we have \(ht \le 5\). It follows that

$$\begin{aligned} N = |H| \le 2^{2^0 + 2^1 + 2^2 + 2^3} \times 16^{2^4} = 2^{15} \times 16^{16} \end{aligned}$$

(2) \(p_{\min m1}\)

According to Lemma 1, we have

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\} = (1 - p_m )^l \!\!\left( {{p_m } \over {1 - p_m }}\right) ^{k_1 + k_2 } \!\!\left( {1 \over {|F|}}\right) ^{k_1 } \!\!\left( {1 \over {|T|}}\right) ^{k_2 } \end{aligned}$$

If \(1 - p_m > p_m\), that is \(p_m < {\textstyle {1 \over 2}}\), it follows that

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\}&> (1 - p_m )^l \left( {{p_m } \over {1 - p_m }}\right) ^l \left( {1 \over {|F|}}\right) ^l \left( {1 \over {|T|}}\right) ^l \\&= \left( {{p_m } \over {(|F|) \cdot (|T|)}}\right) ^l = \left( {{p_m } \over {32}}\right) ^{31} \end{aligned}$$

It follows that the probability that any individual \(x\) mutates to another individual \(y\) is bounded below by \(\left( \frac{{p_m }}{{32}}\right) ^{31}\), where \(p_m \in (0,0.5)\). Thus, we have \(p_{\min m1} \ge \left( \frac{{p_m }}{{32}}\right) ^{31}\).

(3) \(p_{\min m2}\)

As before, there are two situations for one-point mutation: the mutated position lies either in the head or in the tail.

For the first situation according to Lemma 1, we have

$$\begin{aligned} P\{ x\buildrel M \over \longrightarrow y\}&= (1 - p_m )^{l-1} {p_m \over {|F|}}\\&= (1 - p_m )^{l} {p_m \over {1 - p_m}}{1 \over { |F|}} \end{aligned}$$

If \(1-p_m > p_m\), that is \(p_m <{\textstyle {1 \over 2}}\), it follows that

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\}&> (1 - p_m )^{l} \left( {{p_m } \over {1 - p_m }}\right) ^{l} \left( {1 \over {|F| }}\right) \\&= {{p_m }^{l} \over {|F|}} = {{p_m }^{31} \over {2}} \end{aligned}$$

For the second situation according to Lemma 1, we have

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\} = (1 - p_m )^{l-1} {p_m \over {|T|}}= (1 - p_m )^{l} {p_m \over {1 - p_m}}{1 \over { |T|}} \end{aligned}$$

If \(1-p_m > p_m\), that is \(p_m <{\textstyle {1 \over 2}}\), it follows that

$$\begin{aligned} P\left\{ x\buildrel M \over \longrightarrow y\right\}&> (1 - p_m )^{l} \left( {{p_m } \over {1 - p_m }}\right) ^{l} \left( {1 \over {|T| }}\right) \\&= {{p_m }^{l} \over {|T|}} = {{p_m }^{31} \over {16}} \end{aligned}$$

From the above analysis, the probability that any individual \(x\) mutates to another individual \(y\) is bounded below by \({{p_m }^{31} \over {16}}\) (again the tail case, since \(|T| > |F|\)), where \(p_m \in (0,0.5)\). Thus, we have \(p_{\min m2} \ge { {p_m }^{31}\over {16}}\).
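(In code terms, the illustrative helpers sketched in Sect. 5.2.2 carry over unchanged to this setting with \(l = 31\), \(|F| = 2\) and \(|T| = 16\), i.e. p_min_uniform(p_m, 31, 2, 16) and p_min_one_point(p_m, 31, 16).)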

(4) \(p_{\min s}\)

According to the selection operator, we again have \(p_{\min s} ={\textstyle {1 \over 2n}}\).

According to Corollary 1, the EFHT of ME-GEP with uniform mutation is as follows:

$$\begin{aligned} E[\tau (V_1 )]&< \frac{1}{(N^n - (N - 1)^n ) \cdot p_{\min m1}^n \cdot p_{\min s}^n } \nonumber \\&< \frac{32^{930} \cdot 2^{30} \cdot 30^{30}}{\left( (2^{15} \times 16^{16} )^{30} - (2^{15} \times 16^{16} - 1)^{30} \right) \cdot p_m^{930}} \end{aligned}$$
(16)

And the EFHT of ME-GEP with one-point mutation is as follows:

$$\begin{aligned} E[\tau (V_1 )]&< \frac{1}{(N^n - (N - 1)^n ) \cdot p_{\min m2}^n \cdot p_{\min s}^n } \nonumber \\&< \frac{16^{30} \cdot 2^{30} \cdot 30^{30}}{\left( (2^{15} \times 16^{16} )^{30} - (2^{15} \times 16^{16} - 1)^{30} \right) \cdot p_m^{930}} \end{aligned}$$
(17)

5.2.4 Numerical experiments

Formulas (14)–(17) show that the upper bound of the EFHT of ME-GEP for clustering the Iris and Yeast databases can be decreased by increasing the mutation probability.
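As a quick numerical illustration of this monotonicity (our own sketch; these bound values are not reported in the paper), the right-hand sides of (14) and (15) can be evaluated for increasing \(p_m\). Exact rational arithmetic is used because \(p_{\min m}^n\) underflows double precision; the Yeast bounds (16)–(17) behave the same way but exceed the double-precision range, so only the Iris case is printed:

```python
from fractions import Fraction

def efht_bound(N, n, p_min_m, p_min_s):
    """Right-hand side of Corollary 1:
    E[tau(V1)] < 1 / ((N**n - (N-1)**n) * p_min_m**n * p_min_s**n)."""
    return 1 / ((N ** n - (N - 1) ** n) * p_min_m ** n * p_min_s ** n)

N, n = 2 ** 7 * 5 ** 8, 20   # Iris setting of Sect. 5.2.2
for pm in (Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)):
    b14 = efht_bound(N, n, (pm / 10) ** 9, Fraction(1, 2 * n))  # uniform, (14)
    b15 = efht_bound(N, n, pm ** 9 / 5, Fraction(1, 2 * n))     # one-point, (15)
    print(f"p_m = {float(pm):.1f}: (14) < {float(b14):.3e}, "
          f"(15) < {float(b15):.3e}")  # both bounds shrink as p_m grows
```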

To verify formulas (14)–(17), we conducted the following experiments to observe the actual average first hitting times of ME-GEP, considering in particular the two mutation operators: (1) varying the uniform mutation probability; (2) varying the one-point mutation probability.

The results obtained are shown in Figs. 4 and 5. The EFHT of ME-GEP decreased as the mutation probabilities increased. In both examples, the experimental results conform to formulas (14)–(17).

Fig. 4 The EFHT of the ME-GEP algorithm for clustering the Iris database with different one-point and uniform mutation probabilities

Fig. 5 The EFHT of the ME-GEP algorithm for clustering the Yeast database with different one-point and uniform mutation probabilities

6 Conclusion and future work

This paper studies the computational time complexity of ME-GEP. Using Markov chain theory and the artificial fitness levels technique, upper and lower bounds on the expected first hitting time of ME-GEP are obtained. Furthermore, the relations between the upper bound and the parameters of ME-GEP are given, and these relations are verified on concrete examples of function modeling and clustering problems. The results obtained in this paper are general theoretical results that can be used to guide the design and implementation of GEP. Moreover, the analytical methods and techniques adopted here have broad applicability and offer a useful reference for researchers in related areas of GEP theory.

The crossover operator plays an important role in GEP; its influence on the EFHT of GEP will be discussed in our future work. We will also investigate approaches for judging whether a problem belongs to the ME-GEP-easy class or the ME-GEP-hard class.