15.1 The Problem

Ensemble techniques rely on aggregating the outputs of several models to produce a better prediction. Besides voting and bagging, we can use boosting and stacking.

15.2 Boosting

Boosting (or hypothesis boosting) refers to an ensemble method that builds a strong learner out of a combination of weak learners (i.e., learners that perform only slightly better than random guessing). The predictors are trained sequentially, and each subsequent predictor tries to correct its predecessor [1]. The dataset is the same for all algorithms; however, each data instance carries a weight that reflects how well the previous model handled it [2]: in each iteration, the weights of incorrectly classified instances are increased to account for their prediction difficulty. We use a similar strategy when learning a new skill, as we focus our attention on the difficult aspects. Boosting algorithms differ in the way they calculate the weights (Fig. 15.1).

Fig. 15.1 Boosting in action

15.3 Stacking

In stacking (or stacked generalization [3]), the outputs of several algorithms are used as the input of a final model (sometimes called the blender [1]) that makes the final prediction. Practically, we feed the blender with the predicted outcomes of the preceding algorithms. The training dataset is divided into two parts: a subset (a holdout) used to train the blender and a subset used to train the other algorithms. The blender takes the outcomes of the other algorithms as input features and the labels of the holdout subset as targets; it thus learns to predict the labels from the other algorithms’ outcomes.

What we have just described is a stacking mechanism with two layers and one blender. It is possible to create stacking with more than two layers; for instance, with three layers, the dataset is split into three subsets. The first subset is used at layer 1 to generate outcomes, which act as input features for the first blender at layer 2; that blender is trained with the labels of the second subset. The second blender, at layer 3, acts similarly: it is trained on the outcomes of layer 2 and the labels of the third subset.
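
To make the mechanism concrete, the following is a minimal sketch of two-layer stacking using scikit-learn's StackingClassifier on the iris dataset; the choice of base estimators is illustrative, not the book's reference setup. Note that scikit-learn trains the blender (final_estimator) on out-of-fold predictions obtained by cross-validation rather than on a single holdout subset, a common variation of the scheme described above.

```python
# A minimal two-layer stacking sketch with scikit-learn; the dataset and the
# base estimators are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier())],
    final_estimator=LogisticRegression(),  # the blender
    cv=5)  # the blender is trained on out-of-fold predictions of the base models
stack.fit(X_train, y_train)
print("Stacking accuracy on the test set:", stack.score(X_test, y_test))
```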

15.4 Boosting Example

15.4.1 AdaBoost Algorithm

AdaBoost is a boosting algorithm that focuses its attention on the training instances that the predecessor algorithm misclassified [4].

Initially, each instance of the dataset is assigned equal weight. Using the training dataset, AdaBoost trains a weak learner classifier, such as a decision tree with one level (called a decision stump). Then, AdaBoost uses the developed model to make predictions about the training dataset and increases the weight for the misclassified instances. The dataset with the updated weights is then used for training in the next iteration. The process continues until the desired number of classifiers is reached or no further improvement in classification can be made.

Once trained, AdaBoost makes a prediction by collecting the predictions of all the predictors and weighting each one by its predictor’s weight. The predicted class is the one that receives the majority of the weighted votes [1].

At each iteration, AdaBoost focuses on misclassified instances; this strategy improves the performance of the weak classifiers drastically (Fig. 15.2).

Fig. 15.2 Overview of AdaBoost

The following is a summary of the training of the AdaBoost algorithm for a dataset of m instances and N predictors:

  • Initialize the weights \( {w}^{(i)}=\frac{1}{m} \) for i = 1, …, m.

  • For j = 1 to N:

    1. Calculate the jth prediction \( {\hat{y}}_j^{(i)} \) for each instance x(i), i = 1, …, m.

    2. Calculate the jth predictor’s error rate

       $$ {r}_j=\frac{\sum \limits_{i=1,\ {\hat{y}}_j^{(i)}\ne {y}^{(i)}}^m{w}^{(i)}}{\sum \limits_{i=1}^m{w}^{(i)}} $$

       where \( {\hat{y}}_j^{(i)} \) is the jth prediction for the instance i.

    3. Calculate the jth predictor’s weight

       $$ {\alpha}_j=\eta\ \log \frac{1-{r}_j}{r_j} $$

       where η is the learning rate (by default, η = 1).

    4. Update the instances’ weights: for i = 1 to m,

       $$ \mathrm{if}\ {\hat{y}}_j^{(i)}\ne {y}^{(i)}\ \mathrm{then}\ {w}^{(i)}={w}^{(i)}\exp \left({\alpha}_j\right) $$

    5. Normalize the weights: for i = 1 to m,

       $$ {w}^{(i)}=\frac{w^{(i)}}{\sum \limits_{i=1}^m{w}^{(i)}} $$

To predict using AdaBoost, for a new instance, the weak learners calculate in sequence a predicted value of either +1 (for the first class) or –1 (for the second class); each prediction is weighted by its predictor’s weight. AdaBoost then sums the weighted predictions and assigns the instance to the first class if the weighted sum is positive and to the second class otherwise. Classifying an instance x with AdaBoost with N predictors can be summarized as follows:

$$ \hat{y}(x)=\underset{k}{\mathrm{argmax}}\sum \limits_{\substack{j=1 \\ {\hat{y}}_j(x)=k}}^N{\alpha}_j $$
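
To make the procedure concrete, the following is a minimal from-scratch sketch in Python, assuming a binary problem with labels coded as +1 and –1 and using scikit-learn decision stumps as the weak learners; it follows the pseudocode above and is illustrative rather than a production implementation.

```python
# A from-scratch sketch of the AdaBoost training loop and weighted-vote
# prediction described above, for binary labels coded as +1 / -1 and using
# decision stumps (one-level trees) as the weak learners.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, N=50, eta=1.0):
    """Train N decision stumps on (X, y); y must contain only -1 and +1."""
    m = len(X)
    w = np.full(m, 1.0 / m)                       # initialize weights w^(i) = 1/m
    stumps, alphas = [], []
    for _ in range(N):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        y_pred = stump.predict(X)
        misclassified = y_pred != y
        r = np.sum(w[misclassified]) / np.sum(w)  # predictor's error rate r_j
        r = np.clip(r, 1e-10, 1 - 1e-10)          # avoid log(0) or division by zero
        alpha = eta * np.log((1 - r) / r)         # predictor's weight alpha_j
        w[misclassified] *= np.exp(alpha)         # boost the misclassified instances
        w /= np.sum(w)                            # normalize the weights
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Assign +1 if the alpha-weighted sum of stump votes is positive, else -1."""
    weighted_sum = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(weighted_sum > 0, 1, -1)
```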

15.4.2 AdaBoost Example

Download the “Iris” file from the Weka datasets or from the Kaggle website using the following link: https://www.kaggle.com/uciml/iris. Open the file in Weka and choose the AdaBoost algorithm in the Classify tab (Fig. 15.3).

Fig. 15.3 AdaBoost classifier in Weka

Check the AdaBoost parameters and get acquainted with them (you can use the More button for more information). Accept the default parameters. In particular, explore the Classifier parameter; you can choose classifiers other than the decision stump (Fig. 15.4). Choose cross-validation with ten folds (Fig. 15.5) and click the Start button. The output window displays the AdaBoost results (Fig. 15.6).

Fig. 15.4 AdaBoost parameters in Weka

Fig. 15.5 Choosing to train the model using cross-validation with ten folds

Fig. 15.6 AdaBoost results when run on the iris dataset

As requested, AdaBoost was evaluated with tenfold cross-validation. There are 143 (95.33%) correctly classified instances and seven (4.67%) incorrectly classified ones. The root mean squared error (RMSE), which we are trying to minimize, is 0.1729. We can notice that the class Iris-setosa was clearly identified, with a perfect area under the curve (AUC) (i.e., ROC area). The AUCs for Iris-versicolor and Iris-virginica were 0.92 and 0.93, respectively, indicating a high ability of the model to classify both types of irises. The confusion matrix shows five Iris-versicolor instances incorrectly classified as Iris-virginica and two Iris-virginica instances incorrectly classified as Iris-versicolor. The top of the window shows that the classification relied on a petal length threshold of 2.45 to differentiate Iris-setosa from the two other types: if the petal length is <2.45, the flower is classified as Iris-setosa.
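
For readers who prefer Python, the following is a rough scikit-learn analogue of this Weka experiment (AdaBoost with decision stumps evaluated by tenfold cross-validation on the iris dataset). Weka's AdaBoostM1 and scikit-learn's AdaBoostClassifier differ in implementation details, so the scores will not match the Weka output exactly.

```python
# Approximate scikit-learn counterpart of the Weka run above: AdaBoost with
# 10 boosting iterations (the default weak learner is a depth-1 decision
# stump) evaluated with 10-fold cross-validation on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
ada = AdaBoostClassifier(n_estimators=10, random_state=42)
scores = cross_val_score(ada, X, y, cv=10)
print(f"Mean 10-fold cross-validation accuracy: {scores.mean():.4f}")
```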

15.5 Key Terms

  1. Boosting

  2. Hypothesis boosting

  3. Weak learners

  4. Stacking

  5. Blender

  6. Holdout sample

  7. AdaBoost

  8. Decision stump

15.6 Test Your Understanding

  1. Explain how stacking functions.

  2. Describe boosting.

  3. What are some of the challenges in boosting?

  4. What is a decision stump?

  5. Cite some of the hyperparameters of AdaBoost.

15.7 Read More

  1. Abbruzzo, A., Tamburo, E., Varrica, D., Dongarrà, G., & Mineo, A. (2016). Penalized linear discriminant analysis and Discrete AdaBoost to distinguish human hair metal profiles: The case of adolescents residing near Mt. Etna. Chemosphere, 153, 100–106. doi: 10.1016/j.chemosphere.2016.03.029

  2. Barczak, A. L. C., Johnson, M. J., & Messom, C. H. (2008). Empirical evaluation of a new structure for AdaBoost. Paper presented at the Proceedings of the 2008 ACM symposium on Applied computing, Fortaleza, Ceara, Brazil. https://doi.org/10.1145/1363686.1364109

  3. Bartlett, P. L., & Traskin, M. (2007). AdaBoost is Consistent. J. Mach. Learn. Res., 8, 2347–2368.

  4. Cai, W., Qiu, L., Li, W., Yu, J., & Wang, L. (2019). Practical Fall Detection Algorithm based on Adaboost. Paper presented at the Proceedings of the 2019 4th International Conference on Biomedical Signal and Image Processing (ICBIP 2019), Chengdu, China. https://doi.org/10.1145/3354031.3354056

  5. Carreras, X., Màrquez, L., & Padró, L. (2002). Named Entity Extraction using AdaBoost. Paper presented at the proceedings of the 6th conference on Natural language learning - Volume 20. https://doi.org/10.3115/1118853.1118857

  6. Carreras, X., Màrquez, L., & Padró, L. (2003). A simple named entity extractor using AdaBoost. Paper presented at the Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4, Edmonton, Canada. https://doi.org/10.3115/1119176.1119197

  7. Chen, Y., Li, X., & Sun, W. (2020). Research on Stock Selection Strategy Based on AdaBoost Algorithm. Paper presented at the Proceedings of the 4th International Conference on Computer Science and Application Engineering, Sanya, China. https://doi.org/10.1145/3424978.3425084

  8. Du, N., Li, K., Mahajan, S. D., Schwartz, S. A., Nair, B. B., Hsiao, C. B., & Zhang, A. (2011). Gene Co-Adaboost: a semi-supervised approach for classifying gene expression data. Paper presented at the Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine, Chicago, Illinois. https://doi.org/10.1145/2147805.2147892

  9. Frost, J., Beekers, N., Hengst, B., & Vendeloo, R. (2012). Meeting cancer patient needs: designing a patient platform. Paper presented at the CHI ‘12 Extended Abstracts on Human Factors in Computing Systems, Austin, Texas, USA. https://doi.org/10.1145/2212776.2223806

  10. Gutiérrez-Tobal, G. C., Álvarez, D., Del Campo, F., & Hornero, R. (2016). Utility of AdaBoost to Detect Sleep Apnea-Hypopnea Syndrome From Single-Channel Airflow. IEEE Trans Biomed Eng, 63(3), 636–646. doi: 10.1109/tbme.2015.2467188

  11. He, B., Huang, C., Sharp, G., Zhou, S., Hu, Q., Fang, C., … Jia, F. (2016). Fast automatic 3D liver segmentation based on a three-level AdaBoost-guided active shape model. Med Phys, 43(5), 2421. doi: 10.1118/1.4946817

  12. Hsu, K.-W. (2017). Heterogeneous AdaBoost with stochastic algorithm selection. Paper presented at the Proceedings of the 11th International Conference on Ubiquitous Information Management and Communication, Beppu, Japan. https://doi.org/10.1145/3022227.3022266

  13. Hu, W., & Hu, W. (2005). Network-Based Intrusion Detection Using Adaboost Algorithm. Paper presented at the Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence. https://doi.org/10.1109/WI.2005.107

  14. Kadlček, F., & Fučík, O. (2013). Fast and energy efficient AdaBoost classifier. Paper presented at the Proceedings of the 10th FPGAworld Conference, Stockholm, Sweden. https://doi.org/10.1145/2513683.2513685

  15. Memari, N., Ramli, A. R., Bin Saripan, M. I., Mashohor, S., & Moghbel, M. (2017). Supervised retinal vessel segmentation from color fundus images based on matched filtering and AdaBoost classifier. PLoS One, 12(12), e0188939. doi: 10.1371/journal.pone.0188939

  16. Mukherjee, I., Rudin, C., & Schapire, R. E. (2013). The rate of convergence of AdaBoost. J. Mach. Learn. Res., 14(1), 2315–2347.

  17. Park, S. Y., & Chen, Y. (2017). Patient Strategies as Active Adaptation: Understanding Patient Behaviors During an Emergency Visit. Paper presented at the Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, Denver, Colorado, USA. https://doi.org/10.1145/3025453.3025978

  18. Reiss, A., Hendeby, G., & Stricker, D. (2013). Confidence-based multiclass AdaBoost for physical activity monitoring. Paper presented at the Proceedings of the 2013 International Symposium on Wearable Computers, Zurich, Switzerland. https://doi.org/10.1145/2493988.2494325

  19. Rudin, C., & Schapire, R. E. (2009). Margin-based Ranking and an Equivalence between AdaBoost and RankBoost. J. Mach. Learn. Res., 10, 2193–2232.

  20. Saravanakumar, S., & Thangaraj, P. (2019). A Computer Aided Diagnosis System for Identifying Alzheimer’s from MRI Scan using Improved Adaboost. J Med Syst, 43(3), 76. doi: 10.1007/s10916-018-1147-7

  21. Song, X., Rui, T., Zha, Z., Wang, X., & Fang, H. (2015). The AdaBoost algorithm for vehicle detection based on CNN features. Paper presented at the Proceedings of the 7th International Conference on Internet Multimedia Computing and Service, Zhangjiajie, Hunan, China. https://doi.org/10.1145/2808492.2808497

  22. Sun, J., Wang, F., Hu, J., & Edabollahi, S. (2012). Supervised patient similarity measure of heterogeneous patient records. SIGKDD Explor. Newsl., 14(1), 16–24. doi: 10.1145/2408736.2408740

  23. Thongkam, J., Xu, G., Zhang, Y., & Huang, F. (2008). Breast cancer survivability via AdaBoost algorithms. Paper presented at the Proceedings of the second Australasian workshop on Health data and knowledge management - Volume 80, Wollongong, NSW, Australia.

  24. Wang, B., Qi, Z., Chen, S., Liu, Z., & Ma, G. (2017). Multi-vehicle detection with identity awareness using cascade Adaboost and Adaptive Kalman filter for driver assistant system. PLoS One, 12(3), e0173424. doi: 10.1371/journal.pone.0173424

  25. Wang, M. Y., Li, P., & Qiao, P. L. (2016). The Virtual Screening of the Drug Protein with a Few Crystal Structures Based on the Adaboost-SVM. Comput Math Methods Med, 2016, 4809831. doi: 10.1155/2016/4809831

  26. Wang, Q., & Wei, X. (2020). The Detection of Network Intrusion Based on Improved Adaboost Algorithm. Paper presented at the Proceedings of the 2020 4th International Conference on Cryptography, Security and Privacy, Nanjing, China. https://doi.org/10.1145/3377644.3377660

  27. Wang, Y., Ru, J., Jiang, Y., & Zhang, J. (2019). Adaboost-SVM-based probability algorithm for the prediction of all mature miRNA sites based on structured-sequence features. Sci Rep, 9(1), 1521. doi: 10.1038/s41598-018-38048-7

  28. Yang, Y., Liu, C., & Liu, N. (2019). Credit Card Fraud Detection based on CSat-Related AdaBoost. Paper presented at the Proceedings of the 2019 8th International Conference on Computing and Pattern Recognition, Beijing, China. https://doi.org/10.1145/3373509.3373548

  29. Yousefi, M., Yousefi, M., Ferreira, R. P. M., Kim, J. H., & Fogliatto, F. S. (2018). Chaotic genetic algorithm and Adaboost ensemble metamodeling approach for optimum resource planning in emergency departments. Artif Intell Med, 84, 23–33. doi: 10.1016/j.artmed.2017.10.002

15.8 Lab

15.8.1 A Working Example in Python

The heart dataset that will be used for this lab can be downloaded from the following link: https://www.kaggle.com/code/ysthehurricane/heart-failure-prediction-using-adaboost-xgboost/data.

The dataset contains 11 features that will be used to predict heart disease events, plus the target class:

  • Age: person’s age in years

  • Sex: person’s gender

  • ChestPainType: chest pain type

  • RestingBP: resting blood pressure in mm Hg

  • Cholesterol: serum cholesterol in mg/dL

  • FastingBS: blood sugar measurement on fasting

  • RestingECG: electrocardiogram results in resting

  • MaxHR: maximum heart rate achieved

  • ExerciseAngina: exercise-induced angina flag

  • Oldpeak: ST depression induced by exercise relative to rest

  • ST_Slope: the slope of the peak exercise ST segment

  • HeartDisease: target class (1 for having heart disease and 0 for not)

15.8.1.1 Loading Heart Dataset

We start by importing the required libraries and loading the heart dataset (Fig. 15.7). When you run the code, if a required library has not been installed previously, you will receive an error message stating that the module was not found; in that case, install the missing library using pip.

Fig. 15.7 Loading heart dataset into pandas
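
A possible version of the code in Fig. 15.7 is sketched below; it assumes the Kaggle file was saved locally as heart.csv (the filename and path are assumptions).

```python
# Load the heart dataset into a pandas DataFrame; "heart.csv" is an assumed
# local filename for the file downloaded from Kaggle.
import pandas as pd

df = pd.read_csv("heart.csv")
print(df.shape)   # number of rows and columns
print(df.head())  # first few records
df.info()         # column types and missing values
```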

15.8.1.2 Visualizing Heart Dataset

The next step is to explore the heart dataset visually. We have opted to display a heatmap of the correlations between pairs of features (Fig. 15.8).

Fig. 15.8 Visualizing heart dataset in heatmap
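
A possible version of the visualization code in Fig. 15.8 follows; it continues from the previous step (the DataFrame df), and the styling choices are assumptions.

```python
# Plot a heatmap of pairwise correlations between the numeric features;
# continues from the previous step where df was loaded.
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Heart Failure Correlation")
plt.show()
```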

15.8.1.3 Preprocess Data

The next step is to convert string values to numeric ones. We have used the LabelEncoder to do so (Fig. 15.9). Can you achieve the same result using a different approach? Try it.

Fig. 15.9 Preprocess data by mapping string values into numeric ones
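
One way to reproduce the preprocessing in Fig. 15.9 is to encode every string (object) column with a LabelEncoder, as sketched below. As a possible answer to the question above, one-hot encoding with pandas get_dummies would also convert the string columns to numeric ones.

```python
# Convert string (object) columns to numeric codes with LabelEncoder;
# continues from the previous steps (df already loaded).
from sklearn.preprocessing import LabelEncoder

for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])
print(df.dtypes)  # all columns should now be numeric
```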

15.8.1.4 Split and Scale Data

We can now choose the features and the target, split the original dataset into training and testing datasets, and standardize both (Fig. 15.10).

Fig. 15.10 Split and scale heart dataset
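
A sketch of the split-and-scale step in Fig. 15.10 follows; the test size, the random state, and the use of stratification are assumptions.

```python
# Separate features and target, split into training/testing sets, and
# standardize both using a scaler fitted on the training data only.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop("HeartDisease", axis=1)
y = df["HeartDisease"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit the scaler on the training data
X_test = scaler.transform(X_test)        # apply the same scaling to the test data
```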

15.8.1.5 Create AdaBoost and Stacking Models

We will use AdaBoost to create a boosting model with learning_rate=0.01 and n_estimators=500. We will also build a stacking model with k-nearest neighbors and Gaussian naïve Bayes as base classifiers and logistic regression as the metaclassifier. Then, we train both models on the training dataset and make the corresponding predictions on the testing dataset (Fig. 15.11).

Fig. 15.11 Create AdaBoost and stacking models
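
A sketch of the model-building code in Fig. 15.11 follows, using the hyperparameters stated above; the random state is an assumption.

```python
# Build the AdaBoost model and the stacking model described in the text;
# continues from the previous step (X_train, X_test, y_train, y_test).
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

ada_model = AdaBoostClassifier(learning_rate=0.01, n_estimators=500,
                               random_state=42)
stack_model = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()), ("gnb", GaussianNB())],
    final_estimator=LogisticRegression())  # logistic regression as metaclassifier

ada_model.fit(X_train, y_train)
stack_model.fit(X_train, y_train)
ada_pred = ada_model.predict(X_test)      # predictions on the testing dataset
stack_pred = stack_model.predict(X_test)
```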

15.8.1.6 Evaluate the AdaBoost and the Stacking Models

The next step is to evaluate the performance of the AdaBoost and stacking models. For exploration and learning purposes, we show the performance on both the training and testing datasets. Figure 15.12 shows the code, and Figs. 15.13 and 15.14 show the performance results for AdaBoost and stacking, respectively.

Fig. 15.12 Calculating accuracy and confusion matrix for the AdaBoost and stacking models
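
The evaluation code in Fig. 15.12 likely resembles the sketch below, shown here for the testing dataset only; the same calls can be repeated with the training data to reproduce the training-set results.

```python
# Report accuracy, confusion matrix, and classification report for both
# models on the testing dataset; continues from the previous step.
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

for name, pred in [("AdaBoost", ada_pred), ("Stacking", stack_pred)]:
    print(f"{name} accuracy on the testing dataset: {accuracy_score(y_test, pred):.4%}")
    print(confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))
```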

Fig. 15.13 AdaBoost model performance on the training and testing datasets (accuracy on the testing dataset: 86.59%)

Fig. 15.14 Stacking model performance on the training and testing datasets (accuracy on the training dataset: 89.72%)

15.8.1.7 Optimizing the Stacking and AdaBoost Models

The models’ performance on the testing dataset is fair. Let us explore the performance of the optimized models after hyperparameter tuning. The results for the stacking and AdaBoost models are presented in Figs. 15.15 and 15.16, respectively; a sketch of one possible tuning approach follows the figures.

Fig. 15.15 Optimized stacking model performance on the testing dataset (accuracy: 88.04%)

Fig. 15.16 Optimized AdaBoost model performance on the testing dataset (accuracy: 88.04%)
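
The tuning code itself is not shown in the text; the sketch below illustrates one possible approach with GridSearchCV for the AdaBoost model. The parameter grid is an assumption and is not necessarily the grid used to produce Figs. 15.15 and 15.16; a similar search can be set up for the stacking model’s hyperparameters.

```python
# Tune the AdaBoost hyperparameters with a cross-validated grid search;
# the grid values are illustrative assumptions.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300, 500],
              "learning_rate": [0.01, 0.1, 1.0]}
grid = GridSearchCV(AdaBoostClassifier(random_state=42), param_grid,
                    cv=10, scoring="accuracy")
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Accuracy on the testing dataset:", grid.score(X_test, y_test))
```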

15.8.2 Do It Yourself

15.8.2.1 The Heart Disease Dataset Revisited

  1. Have you noticed any possible overfitting in the example above?

  2. Did you obtain the same results when you ran your code? What do you think about those results?

  3. During the evaluation step above, we simply applied the models to the testing dataset. That is not the best option. What is a better approach?

  4. Use cross-validation to redo the evaluation step.

15.8.2.2 The Iris Dataset

Download the iris dataset and do the following:

  1. Load the dataset into pandas.

  2. Visualize the dataset and calculate the highest correlations.

  3. Preprocess the data.

  4. Split the data.

  5. Create an AdaBoost model.

  6. Evaluate the AdaBoost model.

  7. Optimize the AdaBoost model.

  8. Create a stack model.

  9. Evaluate the stack model.

  10. Optimize the stack model.

  11. Compare the results of both models and draw a conclusion.

15.8.3 Do More Yourself