
1 Introduction

Decision trees are a popular machine learning method for solving classification and regression problems. Because of their popularity, many algorithms exist for building decision trees [1, 2]. However, the task of constructing an optimal or near-optimal decision tree is very complex. Most decision tree induction algorithms are based on a greedy top-down recursive partitioning strategy for tree growth. They use different variants of impurity measures, such as information gain [2], gain ratio [3], the Gini index [4] and distance-based measures [5], to select the input attribute associated with each internal node. One major drawback of greedy search is that it usually leads to sub-optimal solutions: the local decisions made at each node are in fact interdependent, so a sequence of locally optimal splits does not generally yield a globally optimal tree.

A popular approach that can partially overcome these problems is the induction of decision trees through evolutionary algorithms (EAs) [15]. In this approach, each “individual” in the evolutionary algorithm represents a solution to the classification problem. Each solution is evaluated by a fitness function, which measures its quality. At each new generation, the best solutions have a higher probability of being selected for reproduction. The selected solutions undergo operations inspired by genetics, such as crossover and mutation, producing new solutions that replace their parents and form a new population. This process is repeated until a stopping criterion is satisfied. Instead of a local search, EAs perform a robust global search in the space of candidate solutions. As a result, EAs tend to cope better with attribute interactions than greedy methods and are less prone to getting stuck in local optima.

In this paper we propose an approach that encodes a decision tree as a homogeneous real-valued vector: feature indices are themselves encoded by real numbers and are decoded by repeatedly locating the minimum value in the vector. This representation makes it possible to apply a large number of different optimization algorithms, such as differential evolution [6] and evolution strategies [7].

2 Related Work

The number of proposed evolutionary algorithms for decision tree induction has grown in the past few years, mainly because they report good predictive accuracy while keeping the comprehensibility of decision trees. The two most common approaches to encoding decision trees for evolutionary algorithms are tree-based encoding and fixed-length vector encoding. They differ in how they encode feature indices, threshold values, leaves, and node operators. The main differences among tree-based approaches are the presence of pointers to nodes and the ability to encode trees of various sizes. Axis-parallel decision trees are the most common type found in the literature, mainly because this type of tree is usually much easier to interpret than an oblique tree. A node in an axis-parallel decision tree can be described by two parameters: the index of the tested feature and a threshold value. A popular approach [8] to encoding such trees is to represent each node with one integer and one real number, but this yields a heterogeneous and more complex representation of the decision tree than the approach proposed in this article, which makes the search for an optimal solution harder. The authors of [9] describe a closely related approach that encodes oblique decision trees as real-valued vectors and optimizes them with differential evolution; in this article we propose a more compact representation designed specifically for axis-parallel decision trees. A more detailed overview of evolutionary methods for constructing decision trees can be found in [10].

3 Proposed Approach

In this paper we propose a new approach to constructing axis-parallel decision trees for classification problems using evolutionary algorithms.

3.1 Real-Valued Vector Representation

In axis-parallel trees each node splits the dataset according to the following rule:

$$\begin{aligned} f(x) = \begin{cases} 1, & \text{if } a_i \le t\\ 0, & \text{otherwise} \end{cases} \end{aligned}$$
(1)

where \(a_i\) is the value of the \(i\)-th feature of the object \(x\) and \(t\) is a threshold. Thus, each node of the tree is described by two parameters: the index of the tested feature and a threshold value.
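As a simple illustration, the splitting rule of Eq. (1) can be sketched in Python as follows; the function name and argument order are ours and not part of the proposed method.

```python
def node_split(x, feature_index, threshold):
    """Axis-parallel split rule of Eq. (1): returns 1 if the tested feature
    of object x does not exceed the threshold, otherwise 0."""
    return 1 if x[feature_index] <= threshold else 0
```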

Suppose we have a fixed-length real-valued vector with values in the segment [0, 1]. This vector consists of two parts of equal length: the first part encodes feature indices, and the second part encodes threshold values. Also suppose that all feature values belong to the segment [0, 1]; if this is not the case, we normalize the features using the maximum and minimum values from the training dataset. To recover the index of a feature from the vector, we find the position of the minimum value in the first part of the vector and take the remainder of its integer division by the number of features. The value at the same position in the second part of the vector is used as the threshold value of the corresponding node. Then the next smallest value in the first part is found, giving the feature index and threshold of the next node, and this operation is repeated until the entire vector has been used. Using the feature indices and threshold values of all nodes, the decision tree without leaves is built by adding the nodes sequentially. After that, the leaves are added to the decision tree using the training dataset and the majority rule. Thus, we can construct a decision tree from a real-valued vector and evaluate its quality.
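The decoding step can be summarized by the following sketch, which recovers the ordered list of (feature index, threshold) pairs from the vector. It assumes a NumPy environment; the helper name decode_nodes is ours, and the subsequent construction of the tree and its leaves is omitted.

```python
import numpy as np

def decode_nodes(vector, n_features):
    """Decode a vector in [0, 1]^(2k) into an ordered list of
    (feature_index, threshold) pairs for the k internal nodes.

    The first half of the vector encodes feature indices and the second
    half encodes thresholds; nodes are recovered in order of increasing
    values in the first half, as described above."""
    vector = np.asarray(vector, dtype=float)
    k = len(vector) // 2
    keys, thresholds = vector[:k], vector[k:]
    nodes = []
    for pos in np.argsort(keys):              # smallest key first
        feature = int(pos) % n_features       # remainder gives the feature index
        nodes.append((feature, float(thresholds[pos])))
    return nodes

# Example: a vector describing three nodes for a dataset with four features.
print(decode_nodes([0.7, 0.1, 0.9, 0.4, 0.3, 0.8], n_features=4))
# -> [(1, 0.3), (0, 0.4), (2, 0.8)]
```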

3.2 Differential Evolution

Differential evolution (DE) [6] is an effective evolutionary algorithm designed to solve optimization problems with real-valued parameters. A population in DE consists of N individuals:

$$\begin{aligned} P = \{x_1, x_2, ..., x_N\} \end{aligned}$$
(2)

The j-th value of the individual \(x^i\) in the initial population is calculated as follows:

$$\begin{aligned} x_j^i = x_j^{\min } + r(x_j^{\max } - x_j^{\min }), \end{aligned}$$
(3)

where \(r \in [0, 1]\) is a uniformly distributed random number.

The evolutionary process implements an iterative scheme to evolve the initial population. At each iteration of this process, known as a generation, a new population of individuals is generated from the previous one. Each individual is used to build a new vector by applying the mutation and crossover operators (a sketch of one full generation is given at the end of this subsection):

  • Mutation. Three randomly chosen individuals are linearly combined as follows:

    $$\begin{aligned} v^i = x^{j_1} + \alpha (x^{j_2} - x^{j_3}), \end{aligned}$$
    (4)

    where \(\alpha \) is a user-specified constant.

  • Crossover. The mutated vector is recombined with the target vector to build the trial vector:

    $$\begin{aligned} u_j^i = \begin{cases} v_j^i, & \text{if } r \le CR \text{ or } j = l\\ x_j^i, & \text{otherwise} \end{cases} \end{aligned}$$
    (5)

    where \(r \in [0,1]\) is a uniformly distributed random number, CR is the crossover rate, and \(l\) is a randomly chosen position that guarantees that at least one component of the trial vector is taken from the mutated vector.

  • Selection. A one-to-one tournament is applied to determine which individual is selected as a member of the new population.

In the final step, when a stop condition is fulfilled, DE returns the best individual in the current population.
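To make the scheme above concrete, the following sketch implements one DE generation (Eqs. 4 and 5) over a population of tree encodings. The fitness function is assumed to decode a vector into a tree and return a score to maximise (e.g. training accuracy); clipping the trial vector back into [0, 1] is our own assumption for keeping the encoding valid, and this is an illustrative sketch rather than the reference implementation.

```python
import numpy as np

def de_generation(population, fitness, alpha=0.5, cr=0.9, rng=None):
    """One DE generation (Eqs. 4-5) over real-valued tree encodings.

    `fitness(x)` is assumed to decode x into a tree and return a score to
    maximise; illustrative sketch only."""
    rng = rng or np.random.default_rng()
    n, d = population.shape
    scores = np.array([fitness(x) for x in population])
    new_population = population.copy()
    for i in range(n):
        # Mutation (Eq. 4): linearly combine three distinct random individuals.
        j1, j2, j3 = rng.choice([j for j in range(n) if j != i], size=3, replace=False)
        v = population[j1] + alpha * (population[j2] - population[j3])
        # Crossover (Eq. 5): mix the mutated vector with the target vector.
        mask = rng.random(d) <= cr
        mask[rng.integers(d)] = True                  # position l always comes from v
        u = np.clip(np.where(mask, v, population[i]), 0.0, 1.0)  # keep encoding in [0, 1]
        # Selection: one-to-one tournament between trial and target vectors.
        if fitness(u) >= scores[i]:
            new_population[i] = u
    return new_population
```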

3.3 Evolution Strategies

Unlike differential evolution, the population in the evolution strategy we use [7] consists of a single individual:

$$\begin{aligned} P \sim x \end{aligned}$$
(6)

The initial individual is calculated as follows:

$$\begin{aligned} x_j = x_j^{\min } + r(x_j^{\max } - x_j^{\min }), \end{aligned}$$
(7)

where \(r \in [0,1]\) is a uniformly distributed random number. We sample several offsets, represented as normally distributed random vectors \(e_1, e_2, ..., e_n \sim \mathcal {N}(0, I)\). Then we shift the individual in the direction of the weighted sum of the offsets, which approximates the gradient:

$$\begin{aligned} x \leftarrow x + \alpha \frac{1}{n\sigma }\sum _{i=1}^{n}f(x+\sigma e_i)e_i, \end{aligned}$$
(8)

where \(\alpha \) and \(\sigma \) are user-specified constants.
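A compact sketch of this update rule (Eq. 8) is given below. The step size, noise scale and number of offsets are placeholder values, and clipping the result back into [0, 1] is our assumption for keeping the tree encoding valid.

```python
import numpy as np

def es_step(x, fitness, n_offsets=50, alpha=0.05, sigma=0.1, rng=None):
    """One evolution-strategy update of the single individual (Eq. 8):
    sample Gaussian offsets, weight them by fitness and move x along the
    estimated gradient. All hyperparameter values are placeholders."""
    rng = rng or np.random.default_rng()
    offsets = rng.standard_normal((n_offsets, x.size))        # e_1, ..., e_n ~ N(0, I)
    scores = np.array([fitness(x + sigma * e) for e in offsets])
    gradient_estimate = (scores @ offsets) / (n_offsets * sigma)  # weighted sum of offsets
    return np.clip(x + alpha * gradient_estimate, 0.0, 1.0)       # keep encoding in [0, 1]
```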

3.4 Construction of Ensembles

Two of the most popular approaches for constructing ensembles of decision trees are bagging and boosting. Random forest is an example of a bagging method, and AdaBoost is an example of a boosting method. In this part of the paper we propose to replace the classical decision tree induction algorithms in these methods with the evolutionary algorithms described earlier. Thus, we obtain two new methods: evolutionary random forest (EvoRF) as the analogue of random forest and EvoBoost as the analogue of AdaBoost. In addition, we consider a method (EvoEnsemble) in which each individual in the population is the whole ensemble, represented as one large real-valued vector obtained by concatenating the vectors of the individual trees. Thus, in this evolutionary ensemble method we optimize the whole ensemble at once, which should in theory lead to a better result.
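For the EvoEnsemble representation, decoding the concatenated vector can be sketched as follows, reusing the single-tree decoder from Sect. 3.1. Splitting the vector into equal per-tree chunks reflects our assumption that all trees in the ensemble have the same encoded length.

```python
import numpy as np

def decode_ensemble(vector, n_trees, n_features):
    """Split one concatenated EvoEnsemble vector into per-tree chunks and
    decode each chunk with decode_nodes (sketched in Sect. 3.1).
    Assumes all trees share the same encoded length; illustrative only."""
    chunks = np.array_split(np.asarray(vector, dtype=float), n_trees)
    return [decode_nodes(chunk, n_features) for chunk in chunks]
```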

4 Experiment

For the experiments we use several popular datasets from the UCI repository. The experiments are divided into two parts.

First, we evaluate the classification accuracy of the methods based on evolutionary algorithms and compare their results with classical methods for solving classification problems. The experiments show that the proposed methods do not exceed the results of classical decision tree construction algorithms on some datasets, but on the vast majority of datasets the use of evolution strategies yields a significant improvement in prediction accuracy of several percentage points (Table 1). Therefore, we decided to use this algorithm to build ensembles in the subsequent experiments.

Table 1. Comparison of popular classification algorithms such as CART [11] and multilayer perceptron (MLP) [12] with the proposed approaches: differential evolution (DE) and evolution strategies (ES).

Second, we evaluate the classification accuracy of several approaches to constructing ensembles of decision trees. For these experiments we use datasets with only two distinct labels; in other words, we solve binary classification problems. The hyperparameters of random forest and AdaBoost, such as the depth of the trees and the maximum number of trees, were selected using grid search, and the same values were then used for their evolutionary analogues. As in the case of a single decision tree, the experiments show that the proposed methods do not exceed the results of classical ensemble construction algorithms on some datasets, but the method that represents the whole ensemble as one real-valued vector shows the best accuracy on most datasets (Table 2).

Table 2. Comparison of random forest (RF) [13] and AdaBoost [14] with the proposed approaches for constructing ensembles of decision trees: the evolutionary version of random forest (EvoRF), EvoBoost, and the evolutionary ensemble (EvoEnsemble).

5 Conclusion and Future Work

In this paper, we have proposed several methods that use different evolutionary algorithms to construct decision trees and their ensembles. The main contribution of this paper is a method for constructing a real-valued vector representation of a decision tree, which allows different evolutionary algorithms to be used for building decision trees and their ensembles. The proposed algorithms show better quality than classical methods such as CART, random forest and AdaBoost on popular datasets from the UCI repository, but achieving these results takes more time than the classical algorithms require. This is because the evolutionary methods build trees and evaluate their quality many times during training, while classical algorithms do so only once.

A detailed analysis of the computational performance of the proposed methods, parallel computation in evolutionary algorithms, initialization of the population with the output of classical decision tree induction algorithms, and an evolutionary analogue of gradient boosting are possible directions for future work.