
1 Introduction

Machine learning with a single model is more prone to error and less efficient. The error decomposes into three terms: bias, variance and an irreducible error term. Bias is the error contributed when the model over-simplifies the learning function, and variance is the error contributed when the learner makes the function overly complex. Ensemble learning exploits the observation that combining the outputs of different models produces more accurate results. Ensemble learners succeed when the constituent sub-classifiers are able to compensate for each other's failures and generate diverse predictions. Two popular ensemble learning methods, boosting and bagging, have been applied successfully to classification problems.
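For squared loss, this decomposition can be written explicitly (a standard textbook form, stated here only for illustration; the symbols are local to this formula, with g the target function and ĝ the learned model):

$$\mathbb{E}\big[(y-\hat{g}(x))^{2}\big]=\underbrace{\big(\mathbb{E}[\hat{g}(x)]-g(x)\big)^{2}}_{\text{bias}^{2}}+\underbrace{\mathbb{E}\big[(\hat{g}(x)-\mathbb{E}[\hat{g}(x)])^{2}\big]}_{\text{variance}}+\underbrace{\sigma^{2}}_{\text{irreducible error}}$$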

Boosting and bagging are homogeneous ensemble methods: they use the same base learning algorithm but change the distribution of the data. The data can be distributed by sampling the instances; another approach is to divide the data into feature subsets. Features are selected by evaluating which ones are most relevant for classifying the output, and they can be evaluated individually or as a subset. Creating subsets requires grouping features into groups of different sizes, and evaluating a subset requires more computation and time.

In the proposed method, boosting is used as the base learner inside a bagging algorithm; boosting in turn uses C4.5 [1] as its own base learner. Boosting is affected by variance error, and running multiple boosting models on different data subsets gives the combined model lower variance. The number of instances given to each boosting run remains the same. In the results section, the proposed method is compared with the combined method and with individual bagging and boosting on standard datasets.

Ensembling is the process of merging at least two procedures, called base learners, of homogeneous or heterogeneous nature. This increases the robustness of the system, which accepts the predictions from all the base learners. It resembles a committee of sub-committees arriving at a conclusion on a problem: each has a different outlook and therefore a dissimilar mapping from the problem statement to the desired outcome, so each makes different predictions according to its own outlook.

The final decision is made by accounting for all the predictions; hence the output decision is more precise, less biased and more robust. If one sub-committee decided alone, the final decision might be contradicted by the others. A single learner might not predict with great accuracy, but the combined feedback of multiple learners usually increases the accuracy. In regression and classification problems, Averaging takes the mean of the probability predictions from the models.

Majority voting takes the prediction with the highest vote count across the various models' predictions when predicting the result of a classification problem [2].

In weighted averaging, different weights are assigned to the predictions of the various models before taking the mean, thereby giving higher or lower importance to specific model outputs.
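As an illustrative sketch only (not taken from the paper), the three combining rules can be expressed over the per-class probability outputs of the individual models; the array names below are placeholders, and the functions return class indices rather than labels:

```python
import numpy as np

# probas: list of per-model class-probability arrays, each of shape
# (n_samples, n_classes); weights: one weight per model.

def average_combine(probas):
    """Averaging: take the mean of the models' probability predictions."""
    return np.mean(probas, axis=0).argmax(axis=1)

def majority_vote(probas):
    """Majority voting: each model votes for its most probable class and the
    class with the highest vote count wins."""
    votes = np.stack([p.argmax(axis=1) for p in probas], axis=1)
    return np.array([np.bincount(row).argmax() for row in votes])

def weighted_average(probas, weights):
    """Weighted averaging: model outputs are weighted before the mean, giving
    higher or lower importance to specific models."""
    return np.average(probas, axis=0, weights=weights).argmax(axis=1)
```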

2 Related Work

2.1 Bagging as an Ensemble

The paper “A Bagging Method using Decision Trees in the Role of Base Classifiers” by Kristína Machová, František Barčák and Peter Bednár describes a series of experiments with bagging, a procedure that can be used to enhance the results of a classification algorithm [3]. Their work applies the procedure to classification algorithms employing decision trees, and shows that bagging improves classification performance using a small number of decision trees.

2.2 Boosting as an Ensemble

Thakkar et al. proposed “Boost a Weak Learner to a Strong Learner Using Ensemble System Approach”, which demonstrated the weak learner's potential to reduce the error rate on the testing data and the ability of the boosting algorithm to reduce the error rate of the weak learner [4]. In the experiment they used a decision stump as the weak learner (classifier), and by applying the boosting approach the output shows an improvement in the classifier's accuracy. The boosting meta-algorithm is a simple and well-organized model building strategy. Boosting encourages each new model to become a specialist on the instances handled wrongly by earlier models, and a model's contribution is weighted by its performance rather than giving equal weight to all models. Boosting builds several models from a dataset using a model builder such as a decision tree builder, which by itself may not be a strong model builder. The basic idea is that each instance in the dataset carries a weight: whenever a model misclassifies an instance, its weight is increased (boosted) before the next model in the sequence is built. The final model is then a weighted sum of the models built in the sequence, with each model's output weighted by a small score.
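A minimal scikit-learn sketch in the spirit of [4], boosting a decision stump; the dataset and parameter values are illustrative assumptions, not those used by the cited authors:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A decision stump (depth-1 tree) alone is a weak learner.
stump = DecisionTreeClassifier(max_depth=1)
print("stump accuracy:  ", cross_val_score(stump, X, y, cv=10).mean())

# AdaBoost builds a sequence of stumps (its default base estimator is a
# depth-1 tree), re-weighting misclassified instances so that later stumps
# specialize on the hard cases; each stump's vote is weighted by its accuracy.
boosted = AdaBoostClassifier(n_estimators=50)
print("boosted accuracy:", cross_val_score(boosted, X, y, cv=10).mean())
```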

2.3 Techniques for Combining Bagging and Boosting

In [5], the authors decompose the expected error of a learning procedure on a specific target function and training-set size into three components: (1) a bias term measuring how close the average classifier produced by the learning procedure is to the target function; (2) a variance term measuring how much the individual learned classifiers disagree with each other; and (3) a term measuring the minimum classification error associated with the Bayes-optimal classifier for the target function. For enhancing the prediction of a classifier, the authors suggest combining the bagging and boosting methodologies with sum-rule voting (Vote BB). Each sub-ensemble gives a confidence value for each candidate class; the voters thereby express the degree of their preference, with the confidence weighing the chances of the sub-ensemble's prediction. The confidence values are then summed for each candidate class, and the candidate with the maximum sum wins.
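A minimal sketch of the sum-rule voting idea, assuming scikit-learn's stock bagging and boosting implementations stand in for the sub-ensembles of [5]:

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier

def vote_bb_predict(X_train, y_train, X_test, n_estimators=10):
    """Sum-rule voting (Vote BB): each sub-ensemble returns per-class
    confidence values, the confidences are summed per candidate class,
    and the class with the largest sum is predicted."""
    # Both sub-ensembles use decision trees as base learners (the sklearn default).
    bagging = BaggingClassifier(n_estimators=n_estimators).fit(X_train, y_train)
    boosting = AdaBoostClassifier(n_estimators=n_estimators).fit(X_train, y_train)
    summed = bagging.predict_proba(X_test) + boosting.predict_proba(X_test)
    return bagging.classes_[summed.argmax(axis=1)]
```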

MultiBoosting [6] is another method, proposed by Webb, which can be viewed as wagging of committees formed by AdaBoost. Wagging is a variant of bagging: wagging re-weights every training example, whereas bagging resamples the training set to obtain the datasets.

BagBoo, a scalable hybrid boosting and bagging model [7] proposed by Gorodilov et al., is another method for combining bagging and boosting into a hybrid approach. The essential change relative to plain bagging or boosting is that boosted models are bagged, which makes the resulting model considerably more powerful.

3 Proposed Methodology

A comparative study of the bagging [8,9,10] and boosting [11, 12] algorithms is performed, and the two are combined to eliminate the flaws of each. In the proposed method, f̂ features are selected from the f features originally present. Then s subsets are randomly sampled without replacement from the total set of subsets generated over the f̂ features, with the same number of samples present in every subset. The boosting algorithm is fitted on each training subset, giving s learned hypothesis functions. A prediction is made on each test tuple by all the models: for each class a confidence value, a probability between 0 and 1, is returned; these confidences are summed and the class with the maximum summed confidence is the predicted output. Algorithm 1 presents the random attribute subset selection.

Algorithm 1: Random attribute subset selection

  a. Input:

  • Set of all training examples {Xi, Yi}, where i = 1 to n

  • X consists of the features (X1, X2,…, Xk); Y is the target variable

  b. Procedure:

  • Use AdaBoost as the base learning algorithm

  • Generate all combinations of attributes of length m

  • Subset: select s random subsets

  c. For i = 1 to s do

  • M[i] := AdaBoost(subset[i])

  • end for

  d. Output: argmax Σi M[i]
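The following is a minimal Python sketch of Algorithm 1, assuming NumPy arrays and scikit-learn's AdaBoostClassifier (with its default decision-tree base learner) as the boosting step; the parameter names m and s follow the text, and the class is an illustrative reconstruction rather than the authors' original code:

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import AdaBoostClassifier

class RandomSubsetBoosting:
    """Sketch of Algorithm 1: fit AdaBoost on s random attribute subsets of
    length m and combine the models by summing their class confidences."""

    def __init__(self, m, s, n_estimators=5, random_state=None):
        self.m, self.s = m, s
        self.n_estimators = n_estimators
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        # Generate all attribute combinations of length m, then sample s of
        # them without replacement.
        all_subsets = list(combinations(range(X.shape[1]), self.m))
        chosen = self.rng.choice(len(all_subsets), size=self.s, replace=False)
        self.subsets_ = [list(all_subsets[i]) for i in chosen]
        # Fit one AdaBoost model (decision-tree base learner) per subset.
        self.models_ = [AdaBoostClassifier(n_estimators=self.n_estimators).fit(X[:, cols], y)
                        for cols in self.subsets_]
        self.classes_ = self.models_[0].classes_
        return self

    def predict(self, X):
        # Sum the per-class confidence values of all models and predict the
        # class with the maximum summed confidence.
        confidence = sum(model.predict_proba(X[:, cols])
                         for model, cols in zip(self.models_, self.subsets_))
        return self.classes_[confidence.argmax(axis=1)]
```

For example, `RandomSubsetBoosting(m=4, s=10).fit(X_train, y_train).predict(X_test)` would fit ten boosted models on ten random four-attribute subsets and combine them by summed confidence.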

Figure 1 describes the workflow of the proposed methodology, which starts with training on the dataset, including cleaning by filtering. N bags of the dataset are created, each containing a random set of attributes; boosting is applied to each bag, and the results are later combined to give the predicted value.

Fig. 1 Flow diagram of proposed method

4 Experiment Results

Experiments are performed in Python on a 64-bit Core i5 7th-generation 3 GHz processor with 8 GB RAM. The aim of the experiment is to compare the proposed method of integrating boosting with random feature subset selection against the bagging, boosting and sum-vote combining [5] classifiers. Twenty-two datasets from various domains are selected from the UCI repository [13], with differing numbers of instances, numerical as well as categorical features, and class distributions of two to ten classes. A decision tree induced with the CART algorithm is used as the base sub-classifier for all methods. In the literature [5], the authors compared several base classifiers (decision tree, decision stump, Bayesian algorithm and rule learner) on various datasets and concluded that the decision tree gave the best results; following that literature, the decision tree is used as the base estimator in the proposed method. An equal number of base sub-classifiers is combined in all the models so that the results are equally comparable. Accuracy is estimated as the average over tenfold stratified cross validation: the dataset is split into ten folds by sampling and shuffling without replacement, and the classifier is tested on each fold while being trained on the remaining nine folds. The other evaluation metrics, recall, precision and specificity, are also computed for each dataset.
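A minimal sketch of this evaluation protocol for a binary-class dataset, assuming NumPy arrays and a scikit-learn style classifier; the metric formulas follow the standard confusion-matrix definitions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

def evaluate(model, X, y, n_splits=10):
    """Average accuracy, recall, precision and specificity over stratified
    ten-fold cross validation (binary-class case)."""
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits, shuffle=True).split(X, y):
        model.fit(X[train_idx], y[train_idx])
        tn, fp, fn, tp = confusion_matrix(y[test_idx], model.predict(X[test_idx])).ravel()
        scores.append([(tp + tn) / (tp + tn + fp + fn),  # accuracy
                       tp / (tp + fn),                   # recall (sensitivity)
                       tp / (tp + fp),                   # precision
                       tn / (tn + fp)])                  # specificity
    return np.mean(scores, axis=0)
```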

The experiment first varies the number of features selected and the number of subsets selected, in order to decide the proportion of randomly picked feature subsets from the feature vector that gives the optimal result. The proposed algorithm takes more time to run than bagging and boosting because it runs on additional data subspaces, and increasing the number of bags or subsets consumes more training time. However, it can easily be parallelized to decrease the running time.

Table 1 describes the datasets used, giving the number of instances, categorical features, numerical features and classes. Several well-known datasets are used for the comparison of the results.

Table 1 Description of datasets

Experiments are performed on all the datasets with varying numbers of base estimators. It is found that with 5 base classifiers the optimum result is achieved with the least time and space complexity, as shown in Table 2.

Table 2 Comparative analysis of the number of base classifiers with Breast-cancer, Car and Haberman datasets

Bagging, boosting and the Vote B and B combination are executed on the different datasets; for each method the accuracy, recall, precision and specificity evaluation metrics are estimated and listed in Tables 3, 4 and 5.

Table 3 Result of bagging classifier using decision tree
Table 4 Result of Adaboost classifier using decision tree
Table 5 Result of vote bagging and boosting ensemble

Figure 2 visually represents the results of the vote bagging and boosting ensemble on the datasets.

Fig. 2 Visualization of results of vote bagging and boosting ensemble

A comparison of the three ensemble methods with the proposed method is performed; the result is shown in Table 6, which contains the evaluation metrics, the number of attributes taken and the number of bags, i.e. the subsets selected randomly from the different combinations of attributes.

Table 6 Result of proposed method on datasets

Figure 3 visually represents the results of the proposed method on the different datasets.

Fig. 3 Visualization of results of proposed method on datasets

The proposed method achieves higher accuracy than bagging on 10 of the 22 datasets, than boosting on 15, and than the Vote B and B method on 8. Comparing Tables 4 and 6, the presented classifier gives better precision than boosting alone on 15 of the 22 datasets; precision is better on 4 datasets compared with both bagging and the combined Vote B and B method. This indicates that the positive predictions made by the proposed classifier are more relevant. The accuracy values also depend on the randomly selected features, since some features contribute more to the classifier's prediction than others.

5 Conclusion and Future Scope

The authors compared the ensemble methodologies of bagging and boosting with the proposed methodology, and further compared the proposed methodology with the combination of bagging and boosting using the Vote B and B method. The comparison reveals that in almost all cases the proposed method achieves better results than boosting alone. On 8 datasets (Australian, Breast-cancer, Haberman, Heart-statlog, Heart-h, Liver-IND, Somhappy, Travel Insurance) the proposed combination gave better results, whereas the Vote B and B combination gave better results on the remaining 14 datasets. The authors also note that boosting works better than bagging on noiseless data, whereas bagging is more efficient than boosting on data containing noise. In general, the proposed methodology achieves better accuracy and a lower error rate than bagging and boosting with a decision tree as the base classifier.