
1 Introduction

Machine learning with a single model is more prone to error and less efficient. The error decomposes into three terms: bias, variance and an irreducible error term. Bias is the error contributed when the model over-simplifies the learning function, and variance is the error contributed when the learner makes the function overly complex. Ensemble learning exploits the observation that combining the outputs of different models produces more accurate results. Ensemble learners succeed when the constituent sub-classifiers are able to compensate for each other's failures and generate diverse predictions. Two popular ensemble learning methods, boosting and bagging, have been applied successfully to classification problems.
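For squared loss, this decomposition can be written explicitly (a standard textbook form, stated here only for illustration; the symbols are local to this formula, with g the target function and ĝ the learned model):

$$\mathbb{E}\big[(y-\hat{g}(x))^{2}\big]=\underbrace{\big(\mathbb{E}[\hat{g}(x)]-g(x)\big)^{2}}_{\text{bias}^{2}}+\underbrace{\mathbb{E}\big[(\hat{g}(x)-\mathbb{E}[\hat{g}(x)])^{2}\big]}_{\text{variance}}+\underbrace{\sigma^{2}}_{\text{irreducible error}}$$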

Boosting and bagging are homogeneous ensemble methods: they use the same base learning algorithm but change the distribution of the data. The data can be distributed by sampling the instances; another approach is to divide the data into feature subsets. Features are selected by evaluating which ones are most relevant for classifying the output, and they can be evaluated individually or as a subset. Creating subsets requires grouping features into groups of different sizes, and evaluating a subset requires more computation and time.

In the proposed method, boosting is used as the base learner inside a bagging algorithm; boosting in turn uses C4.5 [1] as its own base learner. Boosting is affected by variance error, and running multiple boosting models on different data subsets gives the combined model lower variance. The number of instances given to each boosting run remains the same. In the results section, the proposed method is compared with the combined method and with individual bagging and boosting on standard datasets.

Ensembling is the process of merging at least two procedures, called base learners, of homogeneous or heterogeneous nature. This increases the robustness of the system, which accepts the predictions from all the base learners. It resembles a committee of sub-committees arriving at a conclusion on a problem: each has a different outlook and therefore a dissimilar mapping from the problem statement to the desired outcome, so each makes different predictions according to its own outlook.

The final decision is made by accounting for all the predictions; hence the output decision is more precise, less biased and more robust. If one sub-committee decided alone, the final decision might be contradicted by the others. A single learner might not predict with great accuracy, but the combined feedback of multiple learners usually increases the accuracy. In regression and classification problems, Averaging takes the mean of the probability predictions from the models.

Majority voting takes the prediction with the highest vote count across the various models' predictions when predicting the result of a classification problem [2].

In weighted averaging, different weights are assigned to the predictions of the various models before taking the mean, thereby giving higher or lower importance to specific model outputs.
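As an illustrative sketch only (not taken from the paper), the three combining rules can be expressed over the per-class probability outputs of the individual models; the array names below are placeholders, and the functions return class indices rather than labels:

```python
import numpy as np

# probas: list of per-model class-probability arrays, each of shape
# (n_samples, n_classes); weights: one weight per model.

def average_combine(probas):
    """Averaging: take the mean of the models' probability predictions."""
    return np.mean(probas, axis=0).argmax(axis=1)

def majority_vote(probas):
    """Majority voting: each model votes for its most probable class and the
    class with the highest vote count wins."""
    votes = np.stack([p.argmax(axis=1) for p in probas], axis=1)
    return np.array([np.bincount(row).argmax() for row in votes])

def weighted_average(probas, weights):
    """Weighted averaging: model outputs are weighted before the mean, giving
    higher or lower importance to specific models."""
    return np.average(probas, axis=0, weights=weights).argmax(axis=1)
```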

2 Related Work

2.1 Bagging as an Ensemble

The paper “A Bagging Method using Decision Trees in the Role of Base Classifiers” by Kristína Machová, František Barčák and Peter Bednár describes a series of experiments with bagging, a procedure that can be used to enhance the results of a classification algorithm [3]. Their work applies the procedure to classification algorithms employing decision trees, and shows that bagging improves classification performance using a small number of decision trees.

2.2 Boosting as an Ensemble

Thakkar et al. proposed “Boost a Weak Learner to a Strong Learner Using Ensemble System Approach”, which demonstrated the weak learner's potential to reduce the error rate on the testing data and the ability of the boosting algorithm to reduce the error rate of the weak learner [4]. In the experiment they used a decision stump as the weak learner (classifier), and by applying the boosting approach the output shows an improvement in the classifier's accuracy. The boosting meta-algorithm is a simple and well-organized model building strategy. Boosting encourages each new model to become a specialist on the instances handled wrongly by earlier models, and a model's contribution is weighted by its performance rather than giving equal weight to all models. Boosting builds several models from a dataset using a model builder such as a decision tree builder, which by itself may not be a strong model builder. The basic idea is that each instance in the dataset carries a weight: whenever a model misclassifies an instance, its weight is increased (boosted) before the next model in the sequence is built. The final model is then a weighted sum of the models built in the sequence, with each model's output weighted by a small score.
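A minimal scikit-learn sketch in the spirit of [4], boosting a decision stump; the dataset and parameter values are illustrative assumptions, not those used by the cited authors:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A decision stump (depth-1 tree) alone is a weak learner.
stump = DecisionTreeClassifier(max_depth=1)
print("stump accuracy:  ", cross_val_score(stump, X, y, cv=10).mean())

# AdaBoost builds a sequence of stumps (its default base estimator is a
# depth-1 tree), re-weighting misclassified instances so that later stumps
# specialize on the hard cases; each stump's vote is weighted by its accuracy.
boosted = AdaBoostClassifier(n_estimators=50)
print("boosted accuracy:", cross_val_score(boosted, X, y, cv=10).mean())
```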

2.3 Techniques for Combining Bagging and Boosting

In [5], the authors decompose the expected error of a learning procedure on a specific target function and training-set size into three components: (1) a bias term measuring how close the average classifier produced by the learning procedure is to the target function; (2) a variance term measuring how much the individual learned classifiers disagree with each other; and (3) a term measuring the minimum classification error associated with the Bayes-optimal classifier for the target function. For enhancing the prediction of a classifier, the authors suggest combining the bagging and boosting methodologies with sum-rule voting (Vote BB). Each sub-ensemble gives a confidence value for each candidate class; the voters thereby express the degree of their preference, with the confidence weighing the chances of the sub-ensemble's prediction. The confidence values are then summed for each candidate class, and the candidate with the maximum sum wins.
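A minimal sketch of the sum-rule voting idea, assuming scikit-learn's stock bagging and boosting implementations stand in for the sub-ensembles of [5]:

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier

def vote_bb_predict(X_train, y_train, X_test, n_estimators=10):
    """Sum-rule voting (Vote BB): each sub-ensemble returns per-class
    confidence values, the confidences are summed per candidate class,
    and the class with the largest sum is predicted."""
    # Both sub-ensembles use decision trees as base learners (the sklearn default).
    bagging = BaggingClassifier(n_estimators=n_estimators).fit(X_train, y_train)
    boosting = AdaBoostClassifier(n_estimators=n_estimators).fit(X_train, y_train)
    summed = bagging.predict_proba(X_test) + boosting.predict_proba(X_test)
    return bagging.classes_[summed.argmax(axis=1)]
```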

MultiBoosting [6] is another method, proposed by Webb, which can be viewed as wagging of committees formed by AdaBoost. Wagging is a variant of bagging: wagging re-weights every training example, whereas bagging resamples the training set to obtain the datasets.

BagBoo, a scalable hybrid boosting and bagging model [7] proposed by Gorodilov et al., is another method for combining bagging and boosting into a hybrid approach. The essential change relative to plain bagging or boosting is that boosted models are bagged, which makes the resulting model considerably more powerful.

3 Proposed Methodology

A comparative study of the bagging [8,9,10] and boosting [11, 12] algorithms is performed, and the two are combined to eliminate the flaws of each. In the proposed method, f̂ features are selected from the f features originally present. Then s subsets are randomly sampled without replacement from the total set of subsets generated over the f̂ features, with the same number of samples present in every subset. The boosting algorithm is fitted on each training subset, giving s learned hypothesis functions. A prediction is made on each test tuple by all the models: for each class a confidence value, a probability between 0 and 1, is returned; these confidences are summed and the class with the maximum summed confidence is the predicted output. Algorithm 1 presents the random attribute subset selection.

Algorithm 1: Random attribute subset selection

  a. Input:

  • Set of all training examples {Xi, Yi}, where i = 1 to n

  • X consists of the features (X1, X2,…, Xk); Y is the target variable

  b. Procedure:

  • Use AdaBoost as the base learning algorithm

  • Generate all combinations of attributes of length m

  • Subset: select s random subsets

  c. For i = 1 to s do

  • M[i] := AdaBoost(subset[i])

  • end for

  d. Output: argmax Σi M[i]
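The following is a minimal Python sketch of Algorithm 1, assuming NumPy arrays and scikit-learn's AdaBoostClassifier (with its default decision-tree base learner) as the boosting step; the parameter names m and s follow the text, and the class is an illustrative reconstruction rather than the authors' original code:

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import AdaBoostClassifier

class RandomSubsetBoosting:
    """Sketch of Algorithm 1: fit AdaBoost on s random attribute subsets of
    length m and combine the models by summing their class confidences."""

    def __init__(self, m, s, n_estimators=5, random_state=None):
        self.m, self.s = m, s
        self.n_estimators = n_estimators
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        # Generate all attribute combinations of length m, then sample s of
        # them without replacement.
        all_subsets = list(combinations(range(X.shape[1]), self.m))
        chosen = self.rng.choice(len(all_subsets), size=self.s, replace=False)
        self.subsets_ = [list(all_subsets[i]) for i in chosen]
        # Fit one AdaBoost model (decision-tree base learner) per subset.
        self.models_ = [AdaBoostClassifier(n_estimators=self.n_estimators).fit(X[:, cols], y)
                        for cols in self.subsets_]
        self.classes_ = self.models_[0].classes_
        return self

    def predict(self, X):
        # Sum the per-class confidence values of all models and predict the
        # class with the maximum summed confidence.
        confidence = sum(model.predict_proba(X[:, cols])
                         for model, cols in zip(self.models_, self.subsets_))
        return self.classes_[confidence.argmax(axis=1)]
```

For example, `RandomSubsetBoosting(m=4, s=10).fit(X_train, y_train).predict(X_test)` would fit ten boosted models on ten random four-attribute subsets and combine them by summed confidence.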

Figure 1 describes the workflow of the proposed methodology, which starts with training on the dataset, including cleaning by filtering. N bags of the dataset are created, each containing a random set of attributes; boosting is applied to each bag, and the results are later combined to give the predicted value.

Fig. 1 Flow diagram of proposed method

4 Experiment Results

Experiments are performed in Python on a 64-bit Core i5 7th-generation 3 GHz processor with 8 GB RAM. The aim of the experiment is to compare the proposed method of integrating boosting with random feature subset selection against the bagging, boosting and sum-vote combining [5] classifiers. Twenty-two datasets from various domains are selected from the UCI repository [13], with differing numbers of instances, numerical as well as categorical features, and class distributions of two to ten classes. A decision tree induced with the CART algorithm is used as the base sub-classifier for all methods. In the literature [5], the authors compared several base classifiers (decision tree, decision stump, Bayesian algorithm and rule learner) on various datasets and concluded that the decision tree gave the best results; following that literature, the decision tree is used as the base estimator in the proposed method. An equal number of base sub-classifiers is combined in all the models so that the results are equally comparable. Accuracy is estimated as the average over tenfold stratified cross validation: the dataset is split into ten folds by sampling and shuffling without replacement, and the classifier is tested on each fold while being trained on the remaining nine folds. The other evaluation metrics, recall, precision and specificity, are also computed for each dataset.
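A minimal sketch of this evaluation protocol for a binary-class dataset, assuming NumPy arrays and a scikit-learn style classifier; the metric formulas follow the standard confusion-matrix definitions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

def evaluate(model, X, y, n_splits=10):
    """Average accuracy, recall, precision and specificity over stratified
    ten-fold cross validation (binary-class case)."""
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits, shuffle=True).split(X, y):
        model.fit(X[train_idx], y[train_idx])
        tn, fp, fn, tp = confusion_matrix(y[test_idx], model.predict(X[test_idx])).ravel()
        scores.append([(tp + tn) / (tp + tn + fp + fn),  # accuracy
                       tp / (tp + fn),                   # recall (sensitivity)
                       tp / (tp + fp),                   # precision
                       tn / (tn + fp)])                  # specificity
    return np.mean(scores, axis=0)
```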

The experiment first varies the number of features selected and the number of subsets selected, in order to decide the proportion of randomly picked feature subsets from the feature vector that gives the optimal result. The proposed algorithm takes more time to run than bagging and boosting because it runs on additional data subspaces, and increasing the number of bags or subsets consumes more training time. However, it can easily be parallelized to decrease the running time.

Table 1 describes the datasets used, giving the number of instances, categorical features, numerical features and classes. Several well-known datasets are used for the comparison of the results.

Table 1 Description of datasets

Experiments are performed on all the datasets with varying numbers of base estimators. It is found that with 5 base classifiers the optimum result is achieved with the least time and space complexity, as shown in Table 2.

Table 2 Comparative analysis of the number of base classifiers with Breast-cancer, Car and Haberman datasets

Bagging, boosting and the Vote B and B combination are executed on the different datasets; for each method the accuracy, recall, precision and specificity evaluation metrics are estimated and listed in Tables 3, 4 and 5.

Table 3 Result of bagging classifier using decision tree
Table 4 Result of Adaboost classifier using decision tree
Table 5 Result of vote bagging and boosting ensemble

Figure 2 visually represents the results of the vote bagging and boosting ensemble on the datasets.

Fig. 2 Visualization of results of vote bagging and boosting ensemble

A comparison of the three ensemble methods with the proposed method is performed; the result is shown in Table 6, which contains the evaluation metrics, the number of attributes taken and the number of bags, i.e. the subsets selected randomly from the different combinations of attributes.

Table 6 Result of proposed method on datasets

Figure 3 visually represents the results of the proposed method on the different datasets.

Fig. 3 Visualization of results of proposed method on datasets

The proposed method achieves higher accuracy than bagging on 10 of the 22 datasets, than boosting on 15, and than the Vote B and B method on 8. Comparing Tables 4 and 6, the presented classifier gives better precision than boosting alone on 15 of the 22 datasets; precision is better on 4 datasets compared with both bagging and the combined Vote B and B method. This indicates that the positive predictions made by the proposed classifier are more relevant. The accuracy values also depend on the randomly selected features, since some features contribute more to the classifier's prediction than others.

5 Conclusion and Future Scope

The authors compared the ensemble methodologies of bagging and boosting with the proposed methodology, and further compared the proposed methodology with the combination of bagging and boosting using the Vote B and B method. The comparison reveals that in almost all cases the proposed method achieves better results than boosting alone. On 8 datasets (Australian, Breast-cancer, Haberman, Heart-statlog, Heart-h, Liver-IND, Somhappy, Travel Insurance) the proposed combination gave better results, whereas the Vote B and B combination gave better results on the remaining 14 datasets. The authors also note that boosting works better than bagging on noiseless data, whereas bagging is more efficient than boosting on data containing noise. In general, the proposed methodology achieves better accuracy and a lower error rate than bagging and boosting with a decision tree as the base classifier.