1 Introduction

Evolutionary algorithms have solved different problems, which are difficult to solve by using more traditional optimization algorithms. The research areas of evolutionary algorithms have been attracting researchers for many years. The performance of evolutionary algorithms depends on balancing between the exploration and exploitation which are obtained via randomized operations such as mutation and recombination. Mutation introduces new genetic material into an existing individual. So, mutation adds diversity to the genetic characteristics of population. Recombination shares information between two or more individuals [1, 2].

Genetic Programming (GP) is an evolutionary algorithm introduced by Koza [3, 4]. It has tree data structures as genotype. GP has been applied on various problems in different areas such as image processing and pattern recognition [59], biomedical science [1013] to control engineering [1416], robotics [1719] and so on [20]. After introducing GP, researchers have been trying to develop new techniques to improve the performance of GP [21]. One such approach suggested recently by Miller and Thomson is Cartesian Genetic Programming (CGP) [22]. CGP uses directed graphs which are more general than the trees to represent programs. The CGP is implemented with mutation only [23], therefore, it may lack the exploitation ability. As the crossover techniques which were reported in [23] (i.e. like the simple point crossover that was applied in the CGP original integer representation) failed to improve the performance of CGP, Clegg et al. [23] introduced the new crossover. In order to apply this new operator, the CGP representation is modified in which a genotype is a fixed length list of real-valued numbers instead of the integer numbers. Each gene in genotype has a value in the interval [0, 1]. After applying the crossover inspired from the real-valued crossover operator in real-valued GAs, the decoding process from the real-valued genotype to integer based genotype should be performed. The results show that by applying a proper crossover, the performance of CGP can efficiently improve.

Biogeography-based optimization is a new evolutionary algorithm for global optimization introduced by Simon [24]. It is a population-based algorithm in which each solution of the population is a vector of integer. Since BBO has certain features in common with other biology-based optimization algorithms like genetic algorithms (GAs) and particle swarm optimization (PSO), BBO is applicable to the same type of problems which they claim to solve [24]. In [24] Simon compares the BBO performance on 14 benchmark functions with seven widely used population-based optimization methods. Results of his study show that BBO outperforms most of the other algorithms on most of the benchmarks. BBO applies the migration operator to share information among solutions. It has good exploitation ability due to the migration operator [25].

The new features of the proposed method which aims to balance the exploitation and exploration are: (1) Taking the migration operator of the BBO to share information among individuals. Applying this migration operator exempts us from modifying the CGP representation and dominant the lack of exploitation ability. (2) Driving a new mutation operator inspired by the concept of opposition-based learning. This new mutation operator enhances the exploration ability of CGP.

To illustrate the effect of the proposed method, another search method is used instead of the 1 + lambda search method applied with the traditional CGP. Experiments have been tested on some of the symbolic regression problems chosen from literatures. In addition, to fairly compare the proposed method with the traditional CGP three performance criteria have been utilized. These measures are success rate (SR), acceleration rate (AR), and the average of the best found solutions and the corresponding standard deviation as calculated after specific number of function evaluations has been reached. Furthermore, the influence of changing in both the number of nodes and population are investigated. The results demonstrate the importance of the exploration–exploitation trade-off.

The rest of this paper is organized as follows: Sect. 2 reviews the BBO algorithm. Section 3 introduces the traditional CGP and its problems. The proposed method called Balanced Cartesian Genetic Programming (BCGP) is also described in Sect. 3. Section 4 presents the experimental results and discussions. Finally, conclusions and directions for future investigations are given in Sect. 5.

2 Biogeography-based optimization

Biogeography-based optimization is a new population-based algorithm for global optimization which was originally developed by Simon [24]. Each individual (called island or habitat) represents a solution for the problem and is comprised of solution features. These solution’s features are called suitability index variables (SIVs) which are the same as genes in GA [24, 25]. The goodness of each solution is characterized by a habitat suitability index (HSI). HSI is equivalent to fitness in GA. Two main operators of BBO for improving the population are the migration (which includes both immigration and emigration) and mutation [24, 25].

Migration is an operator for probabilistically sharing features among solutions. High-HSI solutions are the good ones and tend to share their features with low-HSI solutions by emigration. Low-HSI solutions are the poor ones and accept new features from high-HSI solutions by the immigration. In BBO, each individual has its own immigration rate λ and the emigration rate μ. These rates indicate the probability that a solution is selected as an immigration or emigration habitat. A good solution relatively has a high μ and low λ and vice versa for the poor solution. BBO has good exploitation ability due to these characteristic of the migration. For each solution in each generation, immigration and emigration rates are adaptively determined based on the fitness of the solution as follows [24, 25]:

$$ \lambda_{i} = I\left( {1 - \frac{k(i)}{{n_{pop} }}} \right) $$
(1)
$$ \mu_{i} = E\left( {\frac{k(i)}{{ \, n_{pop} }}} \right) $$
(2)

where k(i) represents the rank of ith individual in an ordered list sorted based on the fitness of the population from the worst fitness to the best one, and n pop is the number of solutions in the population. E and I are respectively the maximum possible rates of the emigration and immigration which are most of the time set to 1 or close to it. Figure 1 illustrates the above-mentioned explanations for a population sorted based on the fitness of individuals [24, 29]. According to the already mentioned definitions, migration can be expressed as [25]:

$$ H_{i,SIV} \leftarrow H_{j,SIV} $$
(3)

where H i is selected as the immigration solution with immigration rate λ i , and H j is the individual selected as the emigration solution with the emigration rate μ j . Equation (3) means that a solution feature of solution H i is replaced by a feature from solution H j [25].

Fig. 1
figure 1

Linear migration rates plotted versus the sorted population [29]

The mutation in BBO is utilized to modify solution’s features, SIVs. According to the mutation rate, a selected SIV in the ith solution, H i , is replaced by a randomly generated one. This process can be described as follows:

figure a

where D is the number of SIV in each habitat, H i,j is the jth SIV of the individual H i , and m i is calculated from equation (4).

$$ m_{i} = m_{\hbox{max} } \left( {1 - \frac{{p_{i} }}{{p_{\hbox{max} } }}} \right) $$
(4)

where m max is the user defined maximum mutation probability, p max  = arg max p i , i = 1,…,n pop , and p i is the solution probability [24, 25]. More details about mutation are discussed in [24].

3 Balanced Cartesian Genetic Programming

Miller and Thomson developed CGP [22]. Although CGP is from GP family, it represents a program as a directed graph unlike the standard tree based GP. The main reason is the more generality of a graph in comparison with a tree [22]. This directed graph is defined by a rectangular grid of nodes with n r and n c number of nodes in each row and column respectively. n r and n c are user defined parameters. Yu and Miller [27] showed that CGP has been more effective when number of rows was chosen to be one. Hence, we choose the number of rows to be one through this paper.

Genotype in CGP is a fixed length list of integers that encodes the function and the connections of each node in the directed graph [22, 23]. To explain the genotype and corresponding phenotype of a program in CGP more clearly, the following example is considered. This example is borrowed from [23] which implements a function considered as x 6 − 2x 4 + x 2. As shown in Fig. 2, each node of the graph consists of genes. The first gene encodes the node function randomly selected from the function set. In this example, functions and corresponding indexes are {+(0), −(1), *(2), /(3)}. The remaining genes in the node encode where in the graph the node takes its inputs from. The nodes take their inputs from either output of a previous node or from the primitive program inputs (terminals). Each of inputs is labeled with the consecutive number from {0 (x), 1 (1)…}.

Fig. 2
figure 2

Genotype and corresponding phenotype of a CGP for the function x 6 − 2x 4 + x 2. First gene in each node (underlined gene) encodes the function. The function set is {+=0, −=1, *=2, /=3}. The two inputs are 0(x) and 1(1) [23]

The number of previous columns of nodes which may have their output connected to a node in the current column is defined by levels-back parameter. Notice that the primary inputs are treated in the same way as node outputs [22]. In this example levels-back is equal to the maximum number of columns. Thus, nodes can connect to any previous nodes in left.

The program output is taken from the node output 8. Any encoded nodes in genotype can be either connected or disconnected. In Fig. 2, gray nodes are not connected to the program output and are inactive. Therefore, in contrary to the genotype, phenotypes (programs) in CGP have a variable length.

As mentioned above, CGP may lack the exploitation ability because it is implemented with mutation only. Some mutation operators include the point mutation, insert-node and delete-node [28]. Some attempts have been done to introduce an effective crossover for CGP when CGP representation is in its original form. However, all of them are failed to improve the performance of CGP. In [23], the crossover method which improved the performance of CGP was developed. However, to incorporate this type of crossover operator into CGP requires a modification to CGP representation itself [23].

Introducing a proper crossover technique can increase the exploitation of CGP and speed up its convergence considerably. The goal of this paper is to improve the performance of CGP by balancing its exploration and exploitation. To reach this purpose, we propose a new method which is called BCGP (Balanced Cartesian Genetic Programming) which is described in the following subsections.

figure b

3.1 Main procedure of BCGP

By employing the migration operator of BBO and being inspired by the concept of opposition-based learning as mutation, BCGP is developed to improve the performance of traditional CGP (see Algorithm 1). In BCGP, each individual is presented by a D dimensional integer vector. In Algorithm 1, VTR is value to reach, NOG is number of generations, and M generation is the maximum number of generations. n pop is the size of the parent population P, P i , k is the kth gene of the ith individual in the P, and C i is the ith member of the offspring population C. The proposed method is an elitist method. Namely, if applying the proposed method yields a better offspring, parent is replaced with it. It is important to mention that the parent will be replaced with an offspring even if the offspring receives the same fitness value.

3.2 Migration

Based on the explanation mentioned in Sect. 2, migration has good exploitation ability, and it can utilize the population information effectively. In order to make a way for sharing information among solutions in CGP, the migration operator is integrated into CGP. Since CGP uses a vector of integers as a genotype just like BBO, we can use the migration operator without any modifications. Migration can be expressed in BCGP as:

$$ P_{i,gen} \leftarrow P_{j,gen} $$
(5)

where P i is the immigrating individual and P j is the emigrating individual.

3.3 Mutation

In order to improve the exploration ability of the proposed method, the mutation operator which is composed of two types of mutation is integrated into BCGP. These types of mutation are different in terms of power of exploration and stochastic.

3.3.1 Type 1: Simple mutation

In type 1, the value of a gene is replaced by a randomly generated value from a valid integer interval.

3.3.2 Type 2: Opposition-based learning based mutation (OBL mutation)

Initial population in evolutionary algorithms is often created randomly. The computation time is related to the distance of these initial guesses from the optimal solution. The chance of starting with a closer solution is increased by checking the opposition guesses at the same time. This is the main idea behind opposition-based learning proposed by Tizhoosh [26]. In opposition-based optimization, fitter one between guess or opposite guess can be chosen to create better initial population or better generation to accelerate evolutionary algorithms convergence. We use the concept of quasi-opposition-based learning as a mutation. As it has been shown in [30], quasi-opposite points have a better convergence rate than opposite points [29]. The value of the kth gene of the ith member of the offspring population C is updated as follows:

$$ \begin{aligned} op_{k} & = Min_{k}^{C} + Max_{k}^{C} - C_{i,k} \\ M_{k} & = \frac{{Min_{k}^{C} + Max_{k}^{C} }}{2} \\ C_{i,k} & = \left\{ {\begin{array}{*{20}l} {round(M_{k} + (op_{k} - M_{k} ) \times rand(0,1))} \hfill & {if\quad C_{i,k} < M_{k} } \hfill \\ {round(op_{k} + (M_{k} - op_{k} ) \times rand(0,1))} \hfill & {otherwise} \hfill \\ \end{array} } \right. \\ \end{aligned} $$
(6)

where \( Min_{k}^{C} \) is the minimum value of the kth gene in the current offspring population and \( Max_{k}^{C} \) is the maximum value of the kth gene in the current offspring population C. So, the maximum and minimum values of each variable (gene) in the current offspring population are used. As explained in [30], using this interval instead of predefined interval boundaries of variables helps to concentrate on the current reduced search space and use the obtained knowledge of converged population. M k is the middle point, round(x) rounds x to the nearest integer, and rand(0,1) is a random number uniformly distributed between 0 and 1. It is important to notice that in order to employ OBL in integer coding, we add round to the relation of OBL.

The core idea of the proposed mutation type is that when the time increases, the search space is shrunken. OBL mutation operator is able to use the knowledge of the current reduced space. The mutation operator is described in Algorithm 2. m r in Algorithm 2 is the mutation rate.

3.4 Differences between BCGP and CGP

There are three main differences between CGP and BCGP: (1) there are no ways for CGP to share information among individuals while BCGP using the migration operator can utilize the population information effectively. (2) In addition, the mutation operator in BCGP which is composed of two types of mutation tends to increase the diversity of population and improves the exploration ability. (3) Operators shown in Algorithm 1 can balance the exploration and the exploitation of BCGP while CGP may lack the exploitation ability.

figure c

4 Simulation result and analysis on symbolic regression problems

To evaluate the performance of BCGP with traditional CGP, five real-valued symbolic regression problems [3, 2123, 31] which are shown in Table 1 are used. The symbolic regression involves finding a mathematical expression that relates the independent variables to the dependent variable for a given finite sampling interval. In other words, it involves finding the hidden relation among the variable/variables and the target concept.

Table 1 Symbolic regression functions

In order to compare the methods, the following parameters have to be determined: terminal set, function set, mutation rate (m r ), maximum number of nodes, and maximum NOGs, M generation . These parameters used in this study have been discussed in detail in [3, 2123]. They are recommended as follows: terminal set of {1,X}, function set consisting of {+, −, *, /, sin, cos}, mutation rate of 3 %, maximum number of nodes of 50, and for the M generation , we use 500 for our proposed method and 6,250 for traditional CGP. These parameters remain unchanged unless it is mentioned.

As can be seen, the maximum number of generations is set differently because CGP and BCGP employ different search methods. Since the search method over the CGP representation is very important and must be carefully chosen, we apply the (1 + lambda) evolutionary strategyFootnote 1 which most literature used it. Lambda is set to 4 [32]. In order to equate the number of evaluations in each method, the M generation for CGP is set to 6,250.

Moreover, for all regression problems, experiments are repeated 100 times and average results are shown. Note that the fitness value of each individual is defined here as the sum of absolute errors between its values and the true function value over all fitness cases. Fitness cases are randomly selected from specified intervals which are shown in Table 1. The stopping condition is to find a value smaller than the value to reach (VTR) before reaching the maximum number of generations M generation . We set VTR to 10−6.

In order to have a fair comparison, three performance criteria are utilized. These criteria are:

  • The Mean and standard deviation (St-d) of errors in 100 runs.

  • The number of runs for which the algorithm successfully reaches the VTR for each test function is measured as the success rate (SR) [33].

  • In order to compare convergence speeds, another measure called the acceleration rate (AR) is used [33]. As mentioned above, we must employ two different search methods in our proposed method and the traditional CGP. To have a fair comparison, we should compare BCGP and CGP in the same number of evaluations. We show AR as follows, based on the NOG and the number of individuals:

    $$ AR = \frac{{NOG_{CGP} \times lambda}}{{NOG_{BCGP} \times n_{pop} }} $$
    (7)

    where NOG is the number of generations. The number of generations multiplied by the population size is equal to the number of function calls (NFCs) which indicates the convergence speed. A smaller NFC means higher convergence speed. The average number of NFC over 100 runs is used for minimizing the effect of stochastic. AR > 1 means BCGP is faster.

4.1 Influence of migration on the performance of BCGP

In our first experiment, we investigate only the influence of applying migration on the performance of CGP. We call it BCGP1. In our proposed method, the size of the population is 50 and at each generation 50 offspring are created and as discussed in Algorithm 1, the better individual between ith parent and ith offspring survives for the next generation.

Figure 3 depicts average mean errors found by two algorithms over 100 runs for the benchmark functions. The average results of 100 independent runs of BCGP1 and CGP on all benchmark functions are summarized in Table 2. As demonstrated in Fig. 3 and Table 2, applying the migration operator improves the performance of the traditional CGP in terms of convergence speed and accuracy significantly. In all figures, the horizontal axis represents the number of function calls in CGP (or BCGP1), and a vertical axis is the fitness value for the best individual which is averaged over 100 runs. We use number of function calls for comparison between algorithms because they have the same maximum number of function calls.

Fig. 3
figure 3

Mean error curves of BCGP1 and traditional CGP for five regression problems. Results obtained from averaged over 100 independent runs

Table 2 Comparison of BCGP1 and traditional CGP

4.2 Comparison between CGP and proposed method (BCGP)

In this section, we try to prove the effectiveness of our proposed method. The population size for our proposed method is 50 as mentioned in Sect. 4.1. Table 3 summarizes the BCGP and CGP performances on the benchmark functions. It is obvious that BCGP performs significantly better than CGP consistently with respect to all three criteria for all regression problems.

Table 3 Comparison of BCGP and CGP

Figure 4 depicts average convergence curves of CGP, BCGP1 and BCGP (over 100 runs) for all regression problems. According to Fig. 4, we can see that: first, the traditional CGP may trap into local optima while the proposed algorithm can locate a good near-global optimum (because the proposed operator allows CGP to escape from poor local optima). Second, we can see that BCGP is better than BCGP1. The comparison between, BCGP and BCGP1 on all regression according to Fig. 4 illustrates the effectiveness of the OBL mutation. Results indicate that the exploration–exploitation trade-off improves the performance of CGP. Third and in summary, BCGP converges faster than BCGP1 and CGP.

Fig. 4
figure 4

Mean error curves of traditional CGP, BCGP1 and BCGP for five regression problems

4.3 Effect of increasing the number of nodes

This section considers the influence of the number of nodes on the performance of two methods. In order to consider the effect of individual size on the performance of the two methods, similar to [23], the number of nodes is increased from 50 to 100 in all regression problems. Average convergence graphs are shown in Fig. 5. Results of solving the five regression problems are given in Table 4. Compared with CGP, BCGP performs better in terms of the quality of the final solutions and the convergence rate.

Fig. 5
figure 5

Performance comparison between BCGP and traditional CGP with 100 nodes using the average error on the five regression problems

Table 4 Comparison of BCGP and traditional CGP with 100 nodes

4.4 Influence of population size

The increase of individuals in population makes the initial diversity of population rise and allows large parts of the search space to be covered per iteration [1]. In order to consider the influence of population size on the performance of BCGP, the number of population is increased to 150 in BCGP. Other parameters are the same as in Sect. 4.3. The results for n pop=50 and n pop=150 are shown in Table 5. From Table 5, we can conclude that overall Mean criterion is decreased and SR is increased.

Table 5 Influence of population size on the performance of BCGP

4.5 Comparison with the algorithm presented in [23]

In this section, the BCGP is compared with the real coded CGP presented in [23]. In Clegg et al. method, a new crossover technique was used. It was inspired by the real-valued crossover operator found in real-valued GAs. This method was tested on F 1 and F 2 (see Table 1). For two methods, the CGP basic parameters are as follows:

  • Population size: n pop  = 50;

  • Function set: {+,−,*,/};

  • Terminal set: {1,X};

  • Maximum number of nodes: 50;

  • M generation : 500;

Other parameters of BCGP are the same as mentioned in Sect. 4. The method presented in [23] was performed with different rates of crossover (such as 25 and 50 %) which are reported in Fig. 6. Since the maximum number of generations is the same as what used in two algorithms, they are compared on the basis of generations. According to Fig. 6, it is observed that the relative performance of BCGP is better than the method of Clegg et al. in terms of convergence speed and solution accuracy.

Fig. 6
figure 6

Mean error curves of BCGP and Clegg et al. method considered with various crossover rates, for F 1 and F 2 symbolic regression problems

5 Conclusion

This paper highlights the importance of balancing the exploration and exploitation to improve the performance of CGP. CGP utilizes only the random mutation, therefore, it may lack the exploitation ability. In this paper, Balanced Cartesian Genetic Programming (BCGP) is proposed as a method to balance the exploration and exploitation ability of CGP. To dominate the lack of exploitation ability of CGP, the migration operator of BBO is integrated in BCGP. The main reasons for implementing the migration operator to share information among individuals in CGP are based on two considerations. First, this operator is easily applicable in CGP without any modification in CGP representation. Second, the migration operator has good exploitation ability, and it can utilize the population information effectively. Furthermore, a new mutation type namely opposition-based learning based mutation (OBL mutation) is applied in the mutation operator of BCGP. The OBL mutation is defined on the basis of the concept of opposition-based learning to enhance the exploration ability of BCGP. The OBL mutation uses the knowledge of the population to reduce the exploration time as time increases.

The proposed method (i.e. BCGP) were tested on symbolic regression problems and compared with traditional CGP. In this study, three performance criteria have been utilized to fairly compare the algorithms. They are success rate (SR), acceleration rate (AR) and the average of the best found solutions and the corresponding standard deviation as calculated after specific number of function evaluations has been reached. AR is applicable to compare the convergence speeds of two methods when one generation of each algorithm executes the different number of evaluations. Experimental results show that by using OBL mutation and migration operators, BCGP outperforms CGP in terms of accuracy and the convergence speed.

Experiments that we performed verify that our proposed operators improve the CGP performance. For all cases, BCGP performs better than CGP with respect to the performance measures such as SR. Also, the effect of increasing the length of the individual and the population size on the performance of BCGP is investigated.

Our future will consist of applying proposed method on the real-valued representation of CGP which was presented in [23]. Future work can be investigated about how other types of migration are incorporated into CGP to improve its performance. Another direction for future work will be to extend the method so that it can be applicable as the preprocessing method to discover the relation among features by constructing new features in order to facilitate the learning of the target concept in the classification task.