Abstract
Breast cancer is one of the most prevalent cancers in women, and recent advances in data mining have provided more insight into the disease and its prognosis. In this chapter, we present a set of machine learning models for predicting breast cancer survival. The original data, obtained from the data world website, comprise 272 instances, 1564 features, and a binary target variable (195 not survived, 77 survived). From the 1564 features, the top ten were selected using the extreme gradient boosting (XGB) method. Average results were obtained with 5-fold and 10-fold stratified cross-validation, and the accuracy of the two schemes was compared. Six machine learning classifiers were compared and ranked using several performance metrics (accuracy, precision, true positive rate, true negative rate, F1-score, ROC–AUC score). After evaluating the models, we identify random forest (RFC) and XGB as the top classifiers, with overall testing accuracy of 78% and 77.2%, respectively. However, all the classifiers predicted label 0 better (high true negative rate) than label 1 (low true positive rate).
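The gap between overall accuracy and true positive rate reported in the abstract is easier to see with a small worked example. The sketch below computes the listed metrics from a confusion matrix. The class counts (195 and 77) come from the abstract; the assumption that label 0 is the majority "not survived" class, the `ConfusionMatrix` helper, and the individual cell counts are hypothetical and chosen only for illustration. They are not the chapter's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; label 1 is treated as the positive class."""
    tn: int  # label 0 predicted as 0
    fp: int  # label 0 predicted as 1
    fn: int  # label 1 predicted as 0
    tp: int  # label 1 predicted as 1

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tn + self.fp + self.fn + self.tp)

    def true_positive_rate(self) -> float:
        # Also called recall or sensitivity.
        return self.tp / (self.tp + self.fn)

    def true_negative_rate(self) -> float:
        # Also called specificity.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in count form.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Class counts from the abstract. ASSUMPTION: label 0 = not survived (195),
# label 1 = survived (77).
N_NEGATIVE = 195
N_POSITIVE = 77

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = N_NEGATIVE / (N_NEGATIVE + N_POSITIVE)

# HYPOTHETICAL cell counts, not taken from the chapter. They only need to
# sum to the class totals above.
HYPOTHETICAL = ConfusionMatrix(tn=180, fp=15, fn=45, tp=32)

print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
print(f"accuracy:            {HYPOTHETICAL.accuracy():.3f}")
print(f"true negative rate:  {HYPOTHETICAL.true_negative_rate():.3f}")
print(f"true positive rate:  {HYPOTHETICAL.true_positive_rate():.3f}")
print(f"precision:           {HYPOTHETICAL.precision():.3f}")
print(f"F1-score:            {HYPOTHETICAL.f1():.3f}")
```

With these illustrative counts, accuracy is about 0.78 while the true positive rate is only about 0.42, and always predicting label 0 would already reach about 0.72. This is why, on imbalanced data such as this, accuracy is best read alongside the per-class rates. The helper does not guard against empty classes, so a matrix with no positive predictions would raise a division error in `precision`.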
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Tiwari, P., Bhardwaj, P., Keprate, A., Tyagi, A. (2022). Breast Cancer Survival Prediction Using Machine Learning. In: Raza, K. (eds) Computational Intelligence in Oncology. Studies in Computational Intelligence, vol 1016. Springer, Singapore. https://doi.org/10.1007/978-981-16-9221-5_8
DOI: https://doi.org/10.1007/978-981-16-9221-5_8
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-9220-8
Online ISBN: 978-981-16-9221-5
eBook Packages: Intelligent Technologies and Robotics (R0)