1 Introduction

Nowadays, data mining has emerged as a very useful tool for extracting knowledge from many sources of information, including text files, large data repositories, and data in the cloud. Machine learning provides data mining with useful procedures for constructing models from data. The most representative machine learning approaches are supervised learning, in which a model is learned from labeled data, and unsupervised learning, where a model is obtained from unlabeled data. The main supervised techniques are classification and regression, while clustering is the main unsupervised technique. In particular, decision trees are a classification method that produces simple models with a high level of interpretability.

Oblique DTs are classifiers that split the data using hyperplanes that are oblique to the axes of the instance space. Oblique DTs are generally more compact and accurate than univariate DTs. In this paper, the application of a differential-evolution-based approach named OC1-DE for inducing oblique decision trees within a recursive partitioning strategy is described. The experimental results show that OC1-DE induces more accurate classifiers than those produced by other induction methods.

In order to describe the implementation of the OC1-DE method, this paper is organized as follows: Sect. 2 describes the characteristics of the decision tree induction process for classification tasks. The use of metaheuristics for inducing oblique decision trees is described in Sect. 3, and details of the differential evolution algorithm are given in Sect. 4. The OC1-DE method is outlined in Sect. 5, and experimental results are presented in Sect. 6. Finally, Sect. 7 presents the conclusions and future work of this proposal.

2 Induction of Decision Trees for Classification

A decision tree (DT) is a classification model constructed from a set of training instances in order to define a simple and effective procedure for predicting the class membership of new, unclassified instances. A DT is a connected acyclic graph whose nodes contain either test conditions (internal nodes) or class labels (leaf nodes) and whose arcs represent the possible outcomes of each test condition. This graph defines a hierarchical evaluation scheme that stands out for its simplicity and its high level of interpretability. These characteristics, along with its predictive power, make the DT one of the most widely used classifiers. It should be noted that two methods for decision tree induction (CART [4] and C4.5 [36]) are considered among the top ten most influential data mining algorithms [46].

The great majority of algorithms for inducing decision trees apply a recursive partitioning strategy that implements some splitting criterion in order to separate the training instances. This partitioning procedure is usually complemented with a pruning mechanism to avoid overfitting and to improve the performance of the classifier. Other strategies, such as a global search in the space of DTs, have also been utilized with the aim of reaching an optimal classifier [19, 29, 44], although it is known that building optimal DTs is NP-hard [18].

According to the number of attributes used in the test conditions of DT internal nodes, two types of DTs can be induced: univariate and multivariate DTs. ID3 [34], C4.5 [36], J48 [45] and C5.0 are methods for inducing univariate DTs, in which a single attribute is evaluated to split the training set. An advantage of univariate DTs is that they are easily interpretable; however, when the instances cannot be separated by axis-parallel hyperplanes, the induced DTs are more complex, as they contain a larger number of internal nodes [37].

On the other hand, a combination of attributes participates in the test condition at each internal node of a multivariate DT. When a linear combination of attributes is used, the induced classifier is known as an oblique DT [31] or perceptron DT [2, 29]. The hyperplanes splitting the training instances in this type of DT have an oblique orientation relative to the axes of the instance space. Oblique DTs are generally smaller and more accurate than univariate DTs, but they are also generally more difficult to interpret [5]. In addition, while finding the best partition using a single attribute requires little computational effort, determining the best partition using an oblique hyperplane is an NP-hard problem [16]. CART [4], OC1 [31], LMDT [41] and LTree [10] are methods for inducing oblique DTs.

The training set used for inducing oblique DTs is a group of instances, where each instance is represented by a real-valued vector \(\mathbf v =\left( v_1, v_2, \dots , v_d,c \right) ^T\) composed of d attribute values and a label c indicating the class membership of the instance. A hyperplane divides the instance space into two half-spaces and is defined as follows:

$$\begin{aligned} \sum _{i=1}^d x_iv_i + x_{d+1} > 0 \end{aligned}$$
(1)

where \(v_i\) is the value of attribute i, \(x_i\) is a real-valued coefficient in the hyperplane and \(x_{d+1}\) represents the independent term.
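
For illustration, the following Java sketch (not part of the original implementation) evaluates the test condition of Eq. (1) to decide into which half-space an instance falls:

```java
/**
 * Minimal sketch: evaluates the oblique test condition of Eq. (1).
 * x holds the d hyperplane coefficients plus the independent term x_{d+1};
 * v holds the d attribute values of one instance.
 */
public final class ObliqueSplit {
    public static boolean aboveHyperplane(double[] x, double[] v) {
        double sum = x[v.length];            // independent term x_{d+1}
        for (int i = 0; i < v.length; i++) {
            sum += x[i] * v[i];              // accumulate x_i * v_i
        }
        return sum > 0;                      // Eq. (1) holds: instance lies above the hyperplane
    }
}
```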

3 Metaheuristics for Inducing Oblique Decision Trees

Metaheuristics are methods that combine local search procedures with higher-level strategies to perform a robust search of a solution space [14]. They have been used for inducing both univariate and multivariate DTs. Metaheuristics can be classified into three categories: single-solution-based procedures, evolutionary algorithms, and swarm intelligence methods.

Although global search is the strategy most commonly used when metaheuristic-based methods are applied to induce DTs, several proposals that use metaheuristics within a recursive partitioning strategy have been developed. Single-solution-based metaheuristics such as stochastic local search (SLS) [31], simulated annealing (SA) [5, 17] and tabu search (TS) [26, 32], as well as evolutionary algorithms such as genetic algorithms (GA) [5, 6, 20, 39] and evolution strategies (ES) [5, 47], have been used for finding good hyperplanes of an oblique DT.

The Simulated Annealing of Decision Trees (SADT) method [17] applies SA to find a near-optimal hyperplane for each internal node of an oblique DT. Starting from a fixed initial hyperplane, SADT iteratively perturbs one coefficient of this hyperplane, and each new value is accepted as a coefficient according to the Boltzmann criterion. The OC1 method [31] implements a two-step process for finding a better hyperplane: first, it uses a deterministic strategy that adjusts the coefficients of a hyperplane individually, finding a locally optimal value for one coefficient at a time. Once this value is reached, OC1 applies SLS in order to escape this local optimum. OC1-AP [30] is an OC1 variant that induces only univariate DTs. OC1 variants such as OC1-SA, OC1-ES and OC1-GA [5] simultaneously modify several coefficients of the best hyperplane produced by OC1-AP, using SA, ES and GA, respectively. Furthermore, TS has been used in two approaches: (1) in conjunction with linear discriminant analysis in the LDTS method [26], and (2) in combination with a discrete support vector machine in the LDSDT\({_{TS}}\) method [32].

Other works have used evolutionary algorithms for inducing oblique DTs: a multi-membered ES is applied for finding near-optimal hyperplanes in the MESODT method [47], and GA is implemented in two proposals: (1) the BTGA method [6], which uses a binary chromosome to represent the hyperplane coefficients, and (2) a dipole-based method [20] that encodes a chromosome by means of a real-valued vector. A special GA known as the HereBoy algorithm [24], which evolves a single chromosome, is used in the HBDT method [39] for inducing an oblique DT.

On the other hand, GA [9, 15, 21, 33, 43], differential evolution [29] and genetic programming (GP) [1, 3, 28] have been used for inducing oblique DTs in a global search strategy.

4 Differential Evolution Algorithm

Differential evolution (DE) is an efficient evolutionary algorithm designed for solving optimization problems in continuous spaces [38]. DE encodes candidate solutions by means of real-valued vectors and applies vector differences to perturb a population of these solutions. At the beginning, an initial population of candidate solutions is generated; the iterative part of DE then follows. At each iteration, a new population is constructed using mutation, crossover and selection operators. Instead of implementing traditional crossover and mutation operators, DE applies a linear combination of several randomly selected candidate solutions to produce a new solution. Finally, when the stop condition is fulfilled, DE returns the best candidate solution in the current population. An advantage of DE is that it utilizes few control parameters: a crossover rate Cr, a mutation scale factor F, and a population size NP.

DE/A/B/C is the notation commonly adopted for identifying DE variants, where A represents the procedure for selecting the candidate solutions that will be used to construct a new solution, B is the number of difference vectors used by the mutation operator, and C identifies the type of crossover operator [7]. The most popular DE variant is DE/rand/1/bin, which combines three randomly selected candidate solutions (\(\mathbf x _{a}\), \(\mathbf x _{b}\) and \(\mathbf x _{c}\)) using Eq. (2) to yield a mutated solution \(\mathbf x _{mut}\).

$$\begin{aligned} \mathbf x _{mut} = \mathbf x _{a} + F \left( \mathbf x _{b} - \mathbf x _{c}\right) \end{aligned}$$
(2)

The mutated solution is used to perturb the values of another candidate solution \(\mathbf x _{cur}\) by means of the binomial crossover operator, as follows:

$$\begin{aligned} x_{new,j} = {\left\{ \begin{array}{ll} x_{mut,j} &{} \text {if } r \le Cr \vee j = k \\ x_{cur,j} &{} \text {otherwise} \end{array}\right. } ; j \in \{1, \dots , d\} \end{aligned}$$
(3)

where \(x_{new,j}\), \(x_{mut,j}\) and \(x_{cur,j}\) are the values at the j-th position of \(\mathbf x _{new}\), \(\mathbf x _{mut}\) and \(\mathbf x _{cur}\), respectively, \(r \in [0,1)\) is a uniformly distributed random number drawn for each position j, and \(k \in \{1, \dots , d\}\) is a uniformly chosen random index that guarantees that at least one position of \(\mathbf x _{new}\) is taken from \(\mathbf x _{mut}\).

Finally, \(\mathbf x _{new}\) is selected as a member of the new population if it has a better fitness value than that of \(\mathbf x _{cur}\).
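
The following Java sketch illustrates one generation of DE/rand/1/bin under the definitions above; it is a minimal illustration rather than the implementation used in this paper, and the fitness function (assumed to be maximized) is supplied by the caller:

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** One generation of DE/rand/1/bin (Eqs. 2 and 3); assumes NP >= 4. */
public final class DERand1Bin {
    private final Random rng = new Random();

    public double[][] nextGeneration(double[][] pop, double F, double Cr,
                                     ToDoubleFunction<double[]> fitness) {
        int np = pop.length;
        int d = pop[0].length;
        double[][] next = new double[np][];
        for (int cur = 0; cur < np; cur++) {
            // select three mutually distinct solutions, all different from x_cur
            int a, b, c;
            do { a = rng.nextInt(np); } while (a == cur);
            do { b = rng.nextInt(np); } while (b == cur || b == a);
            do { c = rng.nextInt(np); } while (c == cur || c == a || c == b);

            int k = rng.nextInt(d);              // position always taken from x_mut
            double[] xNew = new double[d];
            for (int j = 0; j < d; j++) {
                double xMutJ = pop[a][j] + F * (pop[b][j] - pop[c][j]);             // Eq. (2)
                xNew[j] = (rng.nextDouble() <= Cr || j == k) ? xMutJ : pop[cur][j]; // Eq. (3)
            }
            // greedy selection: x_new replaces x_cur only if it has a better fitness
            next[cur] = fitness.applyAsDouble(xNew) > fitness.applyAsDouble(pop[cur])
                    ? xNew : pop[cur];
        }
        return next;
    }
}
```

Note that computing the mutated value position by position, as above, is equivalent to building \(\mathbf x _{mut}\) first and then applying Eq. (3).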

DE has been used in conjunction with support vector machines [25], neural networks [23], Bayesian classifiers [13] and instance-based classifiers [11] to implement classification methods. It has mainly been used for optimizing the parameters of classification methods or for conducting preprocessing tasks in a data mining process.

In the case of DTs, DE has been applied both to finding the parameter settings of a classification method and to participating in the decision tree induction process. In the DEMO algorithm [40], DE finds the J48 parameters in order to yield accurate and small DTs. On the other hand, the TreeDE method [42] induces univariate DTs by mapping full trees to vectors and representing discrete symbols as points in a real-valued vector space. In an improved version of the TreeDE method [22], each candidate solution has two types of genes: one expresses the neighborhood structure between the symbols and the other represents a full tree structure. Finally, in a method for inducing perceptron DTs [29], the coefficients of all possible hyperplanes of an oblique DT with a fixed size are encoded in a matrix, and DE then evolves the matrix values in order to construct optimized oblique DTs.

5 OC1-DE Method for Inducing Oblique Decision Trees

In this work, a method for inducing an oblique DT using DE/rand/1/bin within a recursive partitioning strategy is introduced. This method, named OC1-DE, is similar to OC1 and its variants, but it applies DE to find a near-optimal hyperplane at each internal node of an oblique DT. Since the task of finding a near-optimal hyperplane with real-valued coefficients is an optimization problem in a continuous space, the DE operators can be applied without any modification, and OC1-DE should therefore induce better oblique DTs.

When a recursive partitioning strategy is implemented for inducing oblique DTs, the classification method starts by finding the hyperplane that best splits an instance set into two subsets. This hyperplane is used as the test condition of a new internal node added to the DT. The procedure is recursively applied to each subset until a leaf node is created, either because all instances in the subset have the same class label or because a threshold value of unclassified instances is reached. The quality of a hyperplane is obtained using a splitting criterion that measures the impurity of the partition or some other discriminant value. Finally, a pruning procedure is applied in order to reduce the overfitting of the produced DT and to improve its predictive power.
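
This strategy can be outlined with the following Java sketch, which reuses the ObliqueSplit sketch of Sect. 2; the stopping threshold MIN_INSTANCES is an assumed value, the HyperplaneFinder interface stands in for the DE search of Algorithm 1, and pruning is omitted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative outline of the recursive partitioning strategy (pruning omitted). */
public final class RecursiveInduction {
    record Instance(double[] v, int label) {}

    interface HyperplaneFinder {          // stands in for the DE search of Sect. 5
        double[] find(List<Instance> set);
    }

    static final class Node {
        double[] hyperplane;              // test condition (internal node)
        Node above, below;                // children (internal node)
        Integer label;                    // class label (leaf node)
    }

    static final int MIN_INSTANCES = 5;   // assumed threshold, not taken from the paper

    static Node induce(List<Instance> set, HyperplaneFinder finder) {
        Node node = new Node();
        if (sameClass(set) || set.size() < MIN_INSTANCES) {
            node.label = majorityClass(set);                  // create a leaf node
            return node;
        }
        double[] h = finder.find(set);                        // best split found by DE
        List<Instance> above = new ArrayList<>(), below = new ArrayList<>();
        for (Instance ins : set) {
            (ObliqueSplit.aboveHyperplane(h, ins.v()) ? above : below).add(ins);
        }
        if (above.isEmpty() || below.isEmpty()) {             // degenerate split: stop
            node.label = majorityClass(set);
            return node;
        }
        node.hyperplane = h;
        node.above = induce(above, finder);                   // recurse on each half-space
        node.below = induce(below, finder);
        return node;
    }

    static boolean sameClass(List<Instance> set) {
        return set.stream().map(Instance::label).distinct().count() <= 1;
    }

    static int majorityClass(List<Instance> set) {
        Map<Integer, Long> counts = set.stream()
            .collect(Collectors.groupingBy(Instance::label, Collectors.counting()));
        return counts.entrySet().stream()
            .max(Map.Entry.comparingByValue()).map(Map.Entry::getKey).orElse(-1);
    }
}
```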

The DE implementation for finding a near-optimal hyperplane at each internal node of an oblique DT is shown in Algorithm 1. In a similar fashion to the OC1-GA method [5], the axis-parallel hyperplane that best splits the set of training instances is obtained first (line 2). This hyperplane is copied to \(10\%\) of the initial population, and the remaining hyperplanes are randomly created (line 3). Each random hyperplane is constructed so that at least two instances with different class labels are separated by it. This population is evolved through several generations using the DE operators (lines 4–13), and the best hyperplane in the population is selected (line 14). The algorithm returns the better of the best axis-parallel hyperplane and the best oblique hyperplane produced by DE (line 15).

Algorithm 1. Finding a near-optimal hyperplane using DE.
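
As a hedged sketch of Algorithm 1 (reusing the DERand1Bin class of Sect. 4), the search at one internal node can be expressed as follows; the fitness function represents the splitting criterion evaluated on the current training subset, the best axis-parallel hyperplane is assumed to be supplied by the caller (as obtained by an OC1-AP-style search), and the constraint that each random hyperplane separates at least two instances of different classes is noted in a comment but omitted for brevity:

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/**
 * Hedged sketch of Algorithm 1 for a single internal node; the interface
 * (supplied axis-parallel hyperplane and fitness function) is an assumption.
 */
public final class HyperplaneSearch {
    public static double[] find(double[] axisParallel, int np, int generations,
                                double F, double Cr, ToDoubleFunction<double[]> fitness) {
        Random rng = new Random();
        int len = axisParallel.length;            // d coefficients plus independent term
        double[][] pop = new double[np][];
        for (int i = 0; i < np; i++) {            // line 3: build the initial population
            if (i < np / 10) {
                pop[i] = axisParallel.clone();    // 10% copies of the axis-parallel hyperplane
            } else {
                pop[i] = new double[len];         // random hyperplane; the paper additionally
                for (int j = 0; j < len; j++) {   // requires it to separate at least two
                    pop[i][j] = -1.0 + 2.0 * rng.nextDouble(); // instances of different classes
                }
            }
        }
        DERand1Bin de = new DERand1Bin();
        for (int g = 0; g < generations; g++) {   // lines 4-13: evolve with the DE operators
            pop = de.nextGeneration(pop, F, Cr, fitness);
        }
        double[] bestOblique = pop[0];            // line 14: best hyperplane in the population
        for (double[] x : pop) {
            if (fitness.applyAsDouble(x) > fitness.applyAsDouble(bestOblique)) {
                bestOblique = x;
            }
        }
        // line 15: return the better of the axis-parallel and DE-produced hyperplanes
        return fitness.applyAsDouble(axisParallel) >= fitness.applyAsDouble(bestOblique)
                ? axisParallel : bestOblique;
    }
}
```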

6 Experiments

In order to evaluate the performance of OC1-DE and to compare it with other approaches, two experiments were conducted using several datasets with numerical attributes chosen from the UCI repository [27]. Table 1 shows the description of the datasets used in the experiments. Results were obtained by applying repeated k-fold stratified cross-validation (CV). OC1-DE is compared with the OC1 method [31], three OC1 variants proposed in [5], and the HBDT method described in [39].

The DE version, implemented in the Java language, is the canonical DE/rand/1/bin. The parameters used in the experiments are described in Table 2. The population size is adjusted in accordance with [5], and the number of DE iterations is kept small in order to reduce the running time of the experiments, since DE is applied at each internal node of the induced oblique DT.

Table 1. Description of datasets used in the experiments.
Table 2. Parameters used in experiments with OC1-DE.

In the first experiment, OC1-DE is compared with the OC1 variants applying 5-fold CV. Ten independent runs are conducted for each dataset. For each run, an oblique DT is induced and its test accuracy is calculated. Both the average accuracy and the average DT size across these 10 runs are obtained. Table 3 shows the experimental results: columns 2–9 show both the average accuracy and the average DT size of the induced oblique DTs reported by [5]. The results obtained by OC1-DE are shown in columns 10–11 of this table. It can be observed that the accuracies obtained by OC1-DE are better than those reported for the OC1 variants, although in general OC1-DE produces oblique DTs with more leaf nodes than those produced by the OC1 variants. Fig. 1(a) shows a comparative plot of the average accuracies obtained.

In the second experiment, OC1-DE is compared with the HBDT method. Five independent runs of 10-fold CV are conducted. Results are shown in Table 4: columns 2–5 show both the average accuracy and the average DT size of the induced oblique DTs reported by [39]. The results obtained by OC1-DE are shown in columns 6–7 of this table. It can be observed that both the accuracies and the DT sizes obtained by OC1-DE are comparable with those reported for the HBDT method. OC1 induces oblique DTs with better accuracies for 2 datasets, HBDT constructs DTs with better accuracies for 5 datasets, and the accuracies of the DTs induced using OC1-DE are better than those reported by [39] for 8 datasets. Fig. 1(b) shows a comparative plot of the average accuracies obtained.

Table 3. Accuracy and size obtained for OC1 variants and OC1-DE.
Table 4. Comparison of accuracy and size reported of HBDT and OC1-DE.

In order to evaluate the performance of OC1-DE, a statistical analysis of the results was carried out. Since the conditions for using parametric tests are not guaranteed when analyzing the results of evolutionary algorithms [12], the Friedman test is applied for detecting the existence of significant differences between the performance of two or more methods, and the Nemenyi post-hoc test is used for checking these differences. The Nemenyi test uses the average rank of each classifier and checks, for each pair of classifiers, whether the difference between their ranks is greater than the critical difference [8], defined as follows:

$$\begin{aligned} CD = q_{\alpha } \sqrt{\frac{k(k+1)}{6N}} \end{aligned}$$
(4)

where CD is the critical difference, k is the number of methods, N is the number of datasets, and \(q_{\alpha }\) is a critical value associated with the significance level \(\alpha \).
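
For illustration, assuming the customary significance level \(\alpha = 0.05\) (for which \(q_{0.05} \approx 2.728\) when \(k = 5\) [8]), the critical difference in the first experiment (\(k = 5\) methods, \(N = 7\) datasets) would be:

$$\begin{aligned} CD = 2.728 \sqrt{\frac{5(5+1)}{6 \cdot 7}} = 2.728 \sqrt{\frac{30}{42}} \approx 2.31 \end{aligned}$$

Two methods are then considered significantly different only when the difference between their average ranks exceeds this value.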

Fig. 1. Average accuracies obtained in experiments.

Fig. 2. Comparison of classifiers using Nemenyi post-hoc test.

For the first experiment, the Friedman statistic for 5 methods over 7 datasets is 15.507, and the p-value obtained is 0.003757 for 4 degrees of freedom (dof) of the chi-square distribution. This p-value indicates the existence of statistical differences between these methods, and the Nemenyi post-hoc test is therefore executed. Fig. 2(a) shows the critical differences obtained with the Nemenyi test; it shows that OC1-DE is the best method when compared with the OC1 variants. For the second experiment, the Friedman statistic for 3 methods over 15 datasets is 6.8136 and the p-value is 0.03315 for 2 dof, which also indicates statistical differences between these methods. Fig. 2(b) shows that OC1-DE has better performance than OC1 and a performance comparable with HBDT.

7 Conclusions

In this paper, the OC1-DE method, which uses the differential evolution algorithm to find a near-optimal hyperplane with real-valued coefficients at each internal node of an oblique DT, is introduced. OC1-DE is a recursive partitioning scheme based on OC1 in which the DE operators can be applied without any modification. OC1-DE was evaluated using a stratified cross-validation procedure with several UCI datasets, and statistical tests suggest that OC1-DE achieves better performance than the OC1 variants and the HBDT method. In general, OC1-DE induces more accurate oblique DTs, although their sizes are overall larger than those produced by other methods. Based on our results, future work will be oriented to evaluating other DE variants for inducing oblique DTs and to investigating the effect of several parameter configurations on OC1-DE performance. More experiments will also be conducted to analyze the OC1-DE execution time, as well as to compare the performance of OC1-DE with that obtained by other classification methods such as random forests and support vector machines.