
1 Introduction

Genetic Programming (GP) is an evolutionary algorithm-based methodology inspired by biological evolution. It uses tree-based structures and a suite of genetic operators to generate and evolve a population of solutions to a given problem [3]. GP has produced many novel and outstanding results in areas such as optimization, searching, sorting, quantum computing, electronic design, game playing, and cyberterrorism prevention [4, 6]. One of the main application areas of GP is Machine Learning.

In Machine Learning, generalization and over-fitting are two central challenges that need to be addressed. The generalization error of a learner is directly related to over-fitting, and the generalization problem is often referred to as the problem of over-fitting [7]. Many studies in Machine Learning, including in GP, try to improve the generalization ability of learners by reducing over-fitting [2, 8–12].

Over-fitting can be controlled through the Bias-Variance trade-off [2], where bias is the error on the training data set and variance is the variability of the error across different data sets encountered in the future. Over-fitting is reduced when bias and variance are small simultaneously. Because bias and variance are implicitly contained in L2-norm loss functions (e.g. RMSE, MSE, ...), many studies in Machine Learning have used these functions for learning [13–16]. However, combining bias and variance inside a single L2 error function sometimes makes it difficult to optimize them simultaneously, since bias and variance are two conflicting objectives. To overcome this issue in GP, Alexandros et al. proposed the BVGP method [2], which divides the fitness function into two components, variance and squared bias, in order to bring the variance component into the evolutionary process more directly. However, this method still faces the over-fitting issue on limited training samples, which reduces the generalization ability of the learned model and can make it very sensitive to noise.

In this paper, we propose a variation on the fitness function for GP, called BVGP*, which aims at overcoming the above limitations of BVGP. Through experiments, we demonstrate that BVGP* has two advantages: (1) it can help reduce over-fitting on problems where GP over-fits; (2) it runs faster and finds simpler solutions. The main contribution of this paper is therefore a variation on the fitness function that improves the effectiveness of GP based on the bias-variance decomposition of training errors.

The remainder of this paper is organized as follows. Section 2 briefly presents the background knowledge and related work. The proposed variation on the fitness function is presented in Sect. 3. Section 4 describes the experimental settings and the test problems. Experimental results are given in Sect. 5. Finally, Sect. 6 summarizes the results and presents some directions for future work.

2 Background and Related Work

2.1 Bias-Variance Decomposition

In this section we introduce the statistical background on loss functions and the Bias-Variance Decomposition for regression. The material is based on the book by Trevor Hastie [17].

If we assume that \(Y=f(x)+\varepsilon \), where \(\varepsilon \) is a noise term with \(E(\varepsilon )=0\) and \(Var(\varepsilon )=\sigma ^{2}_{\varepsilon }\), we can derive an expression for the expected prediction error of \(\widehat{f}(x)\) at an input point \(X=x_{0}\) under the L2 loss as follows:

$$\begin{aligned} Err(x_{0}) &= E[(Y-\widehat{f}(x_{0}))^2 \mid X=x_{0}] \\ &= \sigma ^{2}_{\varepsilon }+[E\widehat{f}(x_{0})-f(x_{0})]^2+E[\widehat{f}(x_{0})-E\widehat{f}(x_{0})]^2 \\ &= \sigma ^{2}_{\varepsilon }+Bias^2(\widehat{f}(x_{0}))+Var(\widehat{f}(x_{0})) \\ &= \mathrm{Irreducible\ Error} + Bias^2 + Variance \end{aligned}$$
(1)

The first term is the variance of the target around its true mean \(f(x_{0})\) and cannot be avoided no matter how well we estimate \(f(x_{0})\). The second term is the squared bias, the amount by which the average of our estimate differs from the true mean. The last term is the variance, the expected squared deviation of \(\widehat{f}(x_{0})\) around its mean. The last two terms must be kept small for the prediction model to perform well.
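To make the decomposition concrete, the following small simulation (our own illustrative sketch, not taken from [17] or [2]; the target function, noise level and linear learner are assumptions) estimates the three terms of Eq. (1) for a simple learner and checks that they approximately sum to the expected prediction error at a point \(x_{0}\):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)      # true underlying function (assumed)
sigma = 0.3                               # noise level, Var(eps) = sigma^2
x0 = 0.25                                 # fixed test input
n_train, n_trials = 30, 5000

preds, sq_errs = [], []
for _ in range(n_trials):
    x = rng.uniform(0.0, 1.0, n_train)                # fresh training set
    y = f(x) + rng.normal(0.0, sigma, n_train)
    coeffs = np.polyfit(x, y, deg=1)                  # deliberately simple (biased) learner
    fhat_x0 = np.polyval(coeffs, x0)                  # prediction at x0
    preds.append(fhat_x0)
    y0 = f(x0) + rng.normal(0.0, sigma)               # fresh noisy target at x0
    sq_errs.append((y0 - fhat_x0) ** 2)

preds = np.array(preds)
bias2 = (preds.mean() - f(x0)) ** 2                   # squared bias
var = preds.var()                                     # variance of the estimator
print("Err(x0)                ~", np.mean(sq_errs))
print("sigma^2 + Bias^2 + Var ~", sigma**2 + bias2 + var)
```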

Generalization error is the prediction error over an independent test sample:

$$\begin{aligned} Err_{T}=E[L(Y,\widehat{f}(X)) \mid T] \end{aligned}$$
(2)

where both X and Y are drawn randomly from their joint distribution (population). Here, the training set T is fixed, and test error refers to the error for this specific training set. A related quantity is the expected prediction error:

$$\begin{aligned} Err=E[L(Y,\widehat{f}(X))]=E[Err_{T}] \end{aligned}$$
(3)

The decomposition in Eq. (1) is known as the Bias-Variance Decomposition.

2.2 Bias-Variance Genetic Programming (BVGP)

Bias-Variance Genetic Programming, proposed by Alexandros et al. [2], is a method for addressing the over-fitting issue based on the Bias/Variance Error Decomposition, which aims at relaxing the sensitivity of an evolved model to a particular training dataset. The method uses a fitness function that combines bias and variance as follows:

$$\begin{aligned} fitness = w_{b}Bias(D)+w_{v} Var(D^{*}) \end{aligned}$$
(4)

where \(w_{b}, w_{v}\) are the coefficients for bias and variance, respectively; D is the training data set of size n; \(D^{*}\) consists of B bootstrap datasets randomly drawn from D by the bootstrap re-sampling method; Bias(D) is the mean error on the original dataset (bias); and \(Var(D^{*})\) is the variance of the error over the bootstrap datasets.

The authors separate the regression error into two components, bias and variance, in order to bring the variance error into the evolutionary process more directly.
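A minimal sketch of how the fitness in Eq. (4) can be computed is shown below. This is our own illustration, not the authors' code: `model` is assumed to be a callable mapping inputs to predictions (e.g. a compiled GP tree), RMSE is used as the error function, and the weight and B values are placeholders.

```python
import numpy as np

def rmse(model, X, t):
    """Root mean squared error of an evolved model on a dataset (X, t)."""
    return np.sqrt(np.mean((model(X) - t) ** 2))

def bvgp_fitness(model, X, t, w_b=0.5, w_v=0.5, B=30, rng=None):
    """Eq. (4): w_b * Bias(D) + w_v * Var(D*)."""
    rng = rng or np.random.default_rng()
    n = len(t)
    bias_D = rmse(model, X, t)                    # Bias(D): error on the original training set
    boot_errs = []
    for _ in range(B):                            # B bootstrap resamples of D
        idx = rng.integers(0, n, size=n)          # draw n indices with replacement
        boot_errs.append(rmse(model, X[idx], t[idx]))
    var_Dstar = np.var(boot_errs, ddof=1)         # Var(D*): variance of the bootstrap errors
    return w_b * bias_D + w_v * var_Dstar
```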

3 The Improved Method: BVGP*

In this section, we propose a variation on the fitness function for GP that aims at overcoming the disadvantage of BVGP. The function is based on BVGP and defined as follows:

$$\begin{aligned} fitness = w_{b}Bias(D^{*})+w_{v} Var(D^{*}) \end{aligned}$$
(5)

where bias and variance are both calculated using the bootstrap re-sampling method. We consider f(x) to be the model trained on a dataset \(D = \{(x_{1},t_{1}),..., (x_{N}, t_{N})\}\) and use the bootstrap re-sampling method to randomly draw B datasets with replacement from D, each of the same size as D. We denote by \(D^{*}\) the collection of B bootstrap sample sets, \(D^{*}=\{D^{*b}, b = 1, ..., B \}\). The estimated bias (\(\mu \)) and variance (\(\sigma ^2\)) of the stochastic fitness are computed as follows:

$$\begin{aligned} Bias(D^{*}) = \sum _{b=1}^BBias^{*b}/B \end{aligned}$$
(6)

where \(Bias^{*b}\), the bias on bootstrap sample \(D^{*b}\), is calculated using the RMSE error function over the N pairs \((x_{i}, t_{i})\) of \(D^{*b}\):

$$\begin{aligned} Bias^{*b} = \sqrt{\frac{1}{N}\sum _{i=1}^N(f(x_{i})-t_{i})^2} \end{aligned}$$
(7)

Thus, here we use \(Bias(D^{*})\) rather than the mean error on the original dataset, Bias(D). The variance \(Var(D^{*})\) is the sample variance of the bootstrap errors:

$$\begin{aligned} \sigma ^2=\frac{1}{B-1}\sum _{b=1}^B(Bias^{*b}-Bias(D^{*}))^2 \end{aligned}$$
(8)

As shown in [5], given a data sample, statistical inference is the process of assessing how systems will behave in untested situations. It permits generalizing conclusions beyond the sample to an unseen population from which the sample is drawn. This is inference from statistics to parameters, where statistics are functions on samples and parameters are functions on populations. Note that Bias(D) is a statistic on D, while \(Bias(D^{*})\) is an estimate of the corresponding parameter inferred from this statistic. The bootstrap re-sampling method is used to construct an empirical sampling distribution for estimating \(Bias(D^{*})\) without making any troubling assumptions about sampling models and population distributions. BVGP* optimizes a fitness function based on \(Bias(D^{*})\), while the fitness function of BVGP is based on Bias(D). Therefore, we believe that the generalization ability of BVGP* is better than that of BVGP. The experimental results confirm this for most of the tested problems.
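For comparison with the BVGP sketch above, the following sketch (again our own illustration with assumed helper names and placeholder weights, not the authors' implementation) computes the BVGP* fitness of Eqs. (5)-(8); the only change is that the bias term is now the mean of the bootstrap error distribution instead of the error on D:

```python
import numpy as np

def bvgp_star_fitness(model, X, t, w_b=0.5, w_v=0.5, B=30, rng=None):
    """Eq. (5): w_b * Bias(D*) + w_v * Var(D*)."""
    rng = rng or np.random.default_rng()
    n = len(t)
    boot_errs = []
    for _ in range(B):                                  # B bootstrap resamples of D
        idx = rng.integers(0, n, size=n)                # sample n points with replacement
        resid = model(X[idx]) - t[idx]
        boot_errs.append(np.sqrt(np.mean(resid ** 2)))  # Bias^{*b}: RMSE on D^{*b}, Eq. (7)
    bias_Dstar = np.mean(boot_errs)                     # Bias(D*), Eq. (6)
    var_Dstar = np.var(boot_errs, ddof=1)               # Var(D*) = sigma^2, Eq. (8)
    return w_b * bias_Dstar + w_v * var_Dstar           # fitness, Eq. (5)
```

Note that in this sketch both terms are computed from the same B bootstrap error values, so no separate evaluation on the original dataset D is required.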

4 Experimental Setting

4.1 Problems

In this paper, we use the benchmark regression problems from [2], shown in Table 1. In addition, we use three UCI data sets, shown in Table 2, to test the generalization ability of BVGP*. For the UCI data sets, we randomly divide each original dataset into two parts with a ratio of \(\langle \text {Train sample:Test sample} \rangle = \langle 1:2 \rangle \), as sketched below.
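The following is a hypothetical sketch of such a 1:2 random split; the function name and argument layout are our own assumptions, not part of the original experimental code.

```python
import numpy as np

def split_one_to_two(X, t, rng=None):
    """Randomly split a dataset into <Train:Test> = <1:2>."""
    rng = rng or np.random.default_rng()
    idx = rng.permutation(len(t))
    n_train = len(t) // 3                        # one third for training
    train, test = idx[:n_train], idx[n_train:]   # remaining two thirds for testing
    return X[train], t[train], X[test], t[test]
```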

Table 1. GP benchmark regression problems
Table 2. UCI data sets

4.2 GP System Setup

The evolutionary parameter values for the GP systems are shown in Table 3. These are typical settings often used by GP researchers and practitioners [1].

Table 3. GP systems setup

5 Results and Discussion

In this section we compare the performance of BVGP* with that of GP and BVGP. We evaluate the effectiveness of BVGP* on three aspects: (1) generalization ability; (2) model complexity; and (3) time complexity.

5.1 Generalization Error (Fittest)

For each GP system, we performed one hundred independent runs. The generalization error is the median testing error of the best individual over these runs. Table 4 shows the generalization (testing) error of the fittest individual for GP, BVGP and BVGP*; bold values indicate the best result. For most problems (BEN_1, BEN_2, BEN_3, BEN_4, BEN_5, BEN_7, UCI_1), the fittest error of BVGP* is smaller than that of GP and BVGP, i.e., the generalization ability of BVGP* is better. However, on UCI_2 the generalization ability of BVGP* is much worse than that of GP and BVGP. A possible cause is that the model learned by BVGP* under-fits this problem.

Note that both GP and BVGP use the bias on the original training dataset, Bias(D), as the optimization goal. This leads to over-fitting when the training sample is small, when the training data are noisy, or when the sampling is poor. BVGP*, rather than using Bias(D), uses the mean of the empirical bootstrap error distribution, \(Bias(D^{*})\), as one of its optimization goals. It can therefore avoid the sampling bias issues that lead to over-fitted solutions, as discussed above. This explains why the results of BVGP* are better than those of GP and BVGP on most problems.

Table 4. Summary of fittest error (median). Statistics based on 100 independent runs. Bold values indicate that the method is the best.
Table 5. Evaluation time (median, in milliseconds) and model complexity (median), where model complexity is the number of nodes of the best individual. Statistics based on 100 independent runs. Bold values indicate that the method is the best.
Fig. 1. The evaluation time of genotype (a) is similar to that of genotype (b), although their model complexities are different.

5.2 Model Complexity and Evaluation Time

As in Sect. 5.1, statistics are based on one hundred independent runs for each GP system. Table 5 shows the evaluation time and model complexity of the best individual found by GP, BVGP and BVGP*. Bold values indicate that the corresponding method is the best.

Here, the evaluation time is measured in milliseconds and is affected mainly by model complexity. Similar to the fittest error, on all problems (10/10, see the bold lines) BVGP* is faster than GP because it learned smaller models (see the corresponding lines in the Evaluation time column). Compared to BVGP, BVGP* also learned models of smaller complexity on most problems (7/10, see the bold lines), so its evaluation time is smaller than that of BVGP. Note that on BEN_6 the model complexity of BVGP* is larger while its evaluation time is smaller than that of the other methods. This can happen because the genotype of the model learned by BVGP* contains operators with different evaluation costs: for the two genotypes shown in Fig. 1, the model complexities are different, yet the evaluation times are similar.

6 Conclusion and Future Work

In this paper, we proposed a variation on the fitness function for GP, called BVGP*. It is based on the bias-variance decomposition and on the BVGP method. Analyses of the empirical results show that this approach has two advantages: (1) BVGP* can help reduce over-fitting on problems where GP and BVGP over-fit; (2) it runs faster and finds simpler solutions.

Several future research directions arise from this paper. First, we need a more natural way of representing the fitness that brings the two components directly into the evolutionary process. Second, we need a new selection mechanism corresponding to this fitness representation.