Abstract
Studies and statistics show that breast cancer is currently the most common cancer in women and a leading cause of cancer death among them. Early screening and subsequent treatment can raise the chances of survival. In this paper, we aim to demonstrate the detection of breast cancer cases by analyzing features computed from digitized images of breast mass samples with machine learning algorithms. Using machine learning, we hope to ease the process of cancer detection in hospitals so that patients can receive the right treatment as early as possible, before their condition becomes critical. This also opens the door to applying machine learning in medical science to the detection of other types of cancer and other diseases, where detection by conventional means is usually laborious and time-consuming.
1 Introduction
Over the years, doctors have classified breast cancer into various subtypes. In 2020, 2.3 million women were diagnosed with breast cancer and there were 685,000 deaths from it globally. At the end of 2020, 7.8 million women were alive who had been diagnosed with breast cancer in the previous 5 years. According to GLOBOCAN, breast cancer is the predominant cancer in women, accounting for 25.1% of all cancers. Improvements in the medical sector, aided by advanced technology and software, have contributed to a significant decline in breast cancer deaths. Over time, machine learning has helped to improve the accuracy of detecting breast cancer at early stages. Breast cancer becomes more difficult and costly to treat as it progresses to later stages; hence, proper equipment and technology for detection and treatment are needed. This paper applies and compares machine learning algorithms that detect breast cancer in women efficiently and with high accuracy.
2 Related Works
This section reviews the literature in the field of "Breast cancer detection using machine learning." The authors reviewed multiple sources on the analysis of breast cancer using machine learning, as well as datasets from reliable sources for testing the reviewed methods. The most popular methods [1] of breast cancer detection, namely the Naive Bayes classifier, K-nearest neighbors (KNN) [2], logistic regression, the decision tree classifier, the support vector machine (SVM) classifier [3] and the random forest classifier [4, 5], are summarized in Table 1.
3 Proposed Methodology
The purpose of this work is to analyze the dataset using the random forest classifier, logistic regression and the decision tree classifier, and to compare the algorithms in terms of their accuracy in detecting cancer.
3.1 Experimental Setup
The program was written in the Python programming language using Jupyter Notebook. For our analysis, we used the Wisconsin Breast Cancer dataset [14] from the UCI Machine Learning Repository.
We used the NumPy library to work with the data as arrays, the Pandas library to analyze, clean and manipulate the dataset, and the Matplotlib and Seaborn libraries to visualize its statistics. We used scikit-learn (sklearn), a machine learning library for Python, to implement the logistic regression, random forest and decision tree classifiers on the data.
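A minimal sketch of loading the data. It assumes scikit-learn's bundled copy of the Wisconsin Diagnostic Breast Cancer data (`load_breast_cancer`) instead of the Kaggle CSV cited as [14]. The 30 feature columns are the same, but the column names differ. The diagnosis is recoded to the M/B labels used in the figures. The function name `load_wdbc` is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer


def load_wdbc() -> pd.DataFrame:
    """Return the Wisconsin Diagnostic Breast Cancer data as a DataFrame.

    Assumption: scikit-learn's bundled copy is used instead of the Kaggle CSV;
    its feature names read e.g. "mean radius" rather than "radius_mean".
    """
    bunch = load_breast_cancer()
    df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    # scikit-learn encodes 0 = malignant, 1 = benign; map to the M/B labels
    df["diagnosis"] = pd.Series(bunch.target).map({0: "M", 1: "B"})
    return df


df = load_wdbc()
print(df.shape)  # (569, 31): 30 features plus the diagnosis column
print(df["diagnosis"].value_counts())
```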
Logistic Regression: It uses the logistic function to model the relationship between a binary dependent variable and the independent variables (features). It is used for binary classification and is efficient to train. Logistic regression performs well when the classes in the dataset are linearly separable.
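A small sketch of how a fitted model uses the logistic function. The fitted `LogisticRegression` model computes a linear score from the features and passes it through the logistic (sigmoid) function to obtain the probability of class 1. The sketch assumes scikit-learn's bundled copy of the dataset and standardized features, which the paper does not mention. The `sigmoid` helper is written out only for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling helps the solver converge

model = LogisticRegression(max_iter=1000).fit(X, y)

# The model is linear in the features: z = w . x + b ...
z = X @ model.coef_.ravel() + model.intercept_[0]
# ... and the predicted probability of class 1 is sigmoid(z)
p_manual = sigmoid(z)
p_sklearn = model.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))  # True
```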
Decision Tree Classifier: A decision tree consists of decision nodes and leaf nodes. Starting at the root node, it uses an attribute selection measure to find the best attribute on which to divide the data into subsets, and a decision node is created for that split. The process repeats recursively on each subset until no further split is possible, and each leaf node then gives a class prediction.
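The paper does not name its attribute selection measure. This sketch assumes Gini impurity, which scikit-learn's `DecisionTreeClassifier` uses by default (`criterion="gini"`). It shows how a single split is chosen on one numeric feature. Every midpoint between distinct values is tried, and the threshold with the lowest weighted impurity of the two child nodes is kept. `gini` and `best_threshold` are hypothetical helpers, and the data is a toy example.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def best_threshold(x, y):
    """Try every candidate threshold on one feature and return the one with
    the lowest weighted Gini impurity of the children (x <= t, x > t)."""
    best_t, best_score = None, np.inf
    values = np.unique(x)
    # candidate thresholds: midpoints between consecutive distinct values
    for t in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = float(t), float(score)
    return best_t, best_score


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
print(gini(y))               # 0.5
print(best_threshold(x, y))  # (2.5, 0.0)
```

A full tree applies this search to every attribute, splits on the best one, and recurses on each child until the nodes are pure or cannot be split further.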
Random Forest Classifier: It is an ensemble algorithm that combines a group of decision trees. Each tree is trained on a random sample of the data and considers a random subset of the attributes when selecting its splits. The predictions of the individual trees are aggregated (by voting, for classification). A greater number of trees gives a more stable aggregate and thus generally increases the accuracy of predictions on the given dataset. It reduces the overfitting that individual decision trees are prone to.
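To make the ensemble idea concrete, here is a hand-rolled sketch. It is not the authors' code, and it is not how scikit-learn's `RandomForestClassifier` works internally: scikit-learn averages the trees' class probabilities rather than counting hard votes. The sketch assumes bootstrap sampling of rows, `max_features="sqrt"` at each split and a plain majority vote. `fit_forest`, `predict_forest`, the 25 trees and the seeds are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on a bootstrap sample of the rows,
    considering a random subset of sqrt(n_features) features at every split."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_forest(trees, X):
    """Majority vote over the individual trees (labels are 0/1)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
forest = fit_forest(X_train, y_train)
y_pred = predict_forest(forest, X_test)
print("accuracy:", np.mean(y_pred == y_test))
```

An odd number of trees avoids ties in the binary vote.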
3.2 Feature Analysis
First, the features are analyzed for their frequency counts [15], and then their pairwise correlations are computed.
The count plot of the diagnoses, malignant (M) and benign (B), is shown in Fig. 1. The pairwise correlation is shown as a heat map [15,16,17] in Fig. 2.
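A sketch of the two analysis steps with Pandas. It assumes scikit-learn's bundled copy of the dataset with the diagnosis recoded to M/B. The plotting calls are left as comments, and the figures in the paper may have been drawn with different options.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

bunch = load_breast_cancer()
df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
df["diagnosis"] = pd.Series(bunch.target).map({0: "M", 1: "B"})

# Frequency count of each diagnosis (the data behind a count plot like Fig. 1)
counts = df["diagnosis"].value_counts()
print(counts)

# Pairwise Pearson correlation of the 30 numeric features
# (the data behind a heat map like Fig. 2)
corr = df.drop(columns="diagnosis").corr()
print(corr.shape)  # (30, 30)

# With Seaborn installed, the figures could be drawn with, e.g.:
#   sns.countplot(x="diagnosis", data=df)
#   sns.heatmap(corr, cmap="coolwarm")
```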
Next, the classification models are constructed. After training each model with the given algorithms, we used the scikit-learn library to compute its accuracy and confusion matrix and so assess the performance of the algorithms.
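A sketch of this training and evaluation step with scikit-learn. It assumes the following:
- scikit-learn's bundled copy of the dataset;
- an 80/20 split (as stated in Sect. 4) with a hypothetical `random_state` and stratification;
- default hyperparameters;
- feature scaling for logistic regression only (the paper does not say whether features were scaled).

The accuracies it prints depend on these choices and need not match the figures reported in Sect. 4.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# 80% for training, 20% for testing; random_state and stratify are assumptions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

models = {
    "Logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: accuracy = {results[name]:.4f}")
    print(confusion_matrix(y_test, y_pred))
```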
We calculated the accuracy of each model from its confusion matrix as Accuracy = (TrPo + TrNe) / (TrPo + TrNe + FaPo + FaNe), where:
- TrPo: True Positive
- TrNe: True Negative
- FaNe: False Negative
- FaPo: False Positive
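This sketch unpacks scikit-learn's binary confusion matrix into the four counts TrPo, TrNe, FaPo and FaNe. Accuracy is then the share of correct predictions (TrPo + TrNe) among all predictions, and the sketch checks this against `accuracy_score`. The labels are made-up toy values for illustration only. Treating label 1 as the positive class is an assumption. In scikit-learn's copy of the dataset, 0 encodes malignant, so which class counts as "positive" is a labeling choice. `accuracy_from_counts` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def accuracy_from_counts(tr_po, tr_ne, fa_po, fa_ne):
    """Accuracy = (TrPo + TrNe) / (TrPo + TrNe + FaPo + FaNe)."""
    return (tr_po + tr_ne) / (tr_po + tr_ne + fa_po + fa_ne)


# Toy labels for illustration only (1 = positive class, 0 = negative class)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])

# For labels [0, 1], scikit-learn's confusion matrix is [[TN, FP], [FN, TP]]
tr_ne, fa_po, fa_ne, tr_po = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
acc = accuracy_from_counts(tr_po, tr_ne, fa_po, fa_ne)
print(tr_po, tr_ne, fa_po, fa_ne)  # 3 3 1 1
print(acc)  # 0.75
```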
4 Results and Discussions
The machine learning algorithms successfully detected breast cancer in the patients in the dataset. To assess the performance of each algorithm, we evaluated it with a confusion matrix. We used 80% of the dataset to train each model and the remaining 20% to test its accuracy.
The logistic regression algorithm achieved an accuracy of 96.49%, while the decision tree algorithm achieved only 93.85%. The best result was obtained by the random forest classifier, with an accuracy of 97.36%. Hence, the models can determine with high accuracy whether a tumor is benign or malignant (Figs. 3, 4 and 5).
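A worked check on the reported accuracies, as a sketch. It assumes the split was made with scikit-learn's `train_test_split(test_size=0.2)` on all 569 samples of the dataset, which the paper does not state. Under that assumption the test set holds 114 samples, so each reported accuracy corresponds to a whole number of correctly classified samples.

```python
import math

n_samples = 569  # size of the Wisconsin Diagnostic Breast Cancer dataset
# scikit-learn's train_test_split rounds the test share up: ceil(0.2 * 569) = 114
n_test = math.ceil(0.2 * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 455 114

# Approximate number of correct test predictions implied by each reported accuracy
reported = {"Logistic regression": 96.49, "Decision tree": 93.85, "Random forest": 97.36}
correct = {name: round(acc / 100 * n_test) for name, acc in reported.items()}
print(correct)
```

Under these assumptions, the reported accuracies correspond to about 110, 107 and 111 correct test predictions. The random forest and logistic regression results therefore differ by a single test sample.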
5 Conclusions
Breast cancer is one of the most common cancers in women. Early detection and diagnosis of this disease can save lives and allow the patient to undergo suitable treatment. Machine learning has vast applications in the modern healthcare system. One of them, the detection and diagnosis of disease, has been discussed in this paper. Integrating machine learning with medical databases and devices can make the healthcare system more efficient and help organize data better. Healthcare companies such as Tempus are using machine learning on their clinical data to provide personalized treatments for patients. Hence, adoption of machine learning technology in medical fields is expected to increase in the future.
References
[1] Goyal S, Bhatia PK (2020) Comparison of machine learning techniques for software quality prediction. Int J Knowl Syst Sci (IJKSS) 11(2):21–40. https://doi.org/10.4018/IJKSS.2020040102
[2] Goyal S (2021) Handling class-imbalance with KNN (Neighbourhood) under-sampling for software defect prediction. Artif Intell Rev. https://doi.org/10.1007/s10462-021-10044-w
[3] Goyal S (2021) Effective software defect prediction using support vector machines (SVMs). Int J Syst Assur Eng Manag. https://doi.org/10.1007/s13198-021-01326-1
[4] Goyal S, Bhatia PK (2021) Heterogeneous stacked ensemble classifier for software defect prediction. Multimed Tools Appl. https://doi.org/10.1007/s11042-021-11488-6
[5] Goyal S (2021) Predicting the defects using stacked ensemble learner with filtered dataset. Autom Softw Eng 28:14. https://doi.org/10.1007/s10515-021-00285-y
[6] Tahmooresi M, Afshar A, Rad BB, Nowshath KB, Bamiah MA (2018) Early detection of breast cancer using machine learning techniques. J Telecommun Electr Comput Eng (JTEC) 10(3–2):21–27
[7] Nallamala SH, Mishra P, Koneru SV (2019) Breast cancer detection using machine learning way. Int J Recent Technol Eng 8:1402–1405
[8] Bazazeh D, Shubair R (2016) Comparative study of machine learning algorithms for breast cancer detection and diagnosis. In: 2016 5th international conference on electronic devices, systems and applications (ICEDSA). IEEE, Dec 2016, pp 1–4
[9] Agarap AFM (2018) On breast cancer detection: an application of machine learning algorithms on the Wisconsin diagnostic dataset. In: Proceedings of the 2nd international conference on machine learning and soft computing, Feb 2018, pp 5–9
[10] Vaka AR, Soni B, Reddy S (2020) Breast cancer detection by leveraging machine learning. ICT Express 6(4):320–324
[11] Chaurasia V, Pal S (2017) A novel approach for breast cancer detection using data mining techniques. Int J Innov Res Comput Commun Eng 2
[12] Gayathri BM, Sumathi CP, Santhanam T (2013) Breast cancer diagnosis using machine learning algorithms: a survey. Int J Distrib Parallel Syst 4(3):105
[13] Ganggayah MD, Taib NA, Har YC, Lio P, Dhillon SK (2019) Predicting factors for survival of breast cancer patients using machine learning techniques. BMC Med Inform Decis Mak 19(1):1–17
[14] Breast Cancer Wisconsin (Diagnostic) Data Set. Kaggle. https://www.kaggle.com/uciml/breast-cancer-wisconsin-data/version/2
[15] Goyal S, Bhatia PK (2021) Software fault prediction using lion optimization algorithm. Int J Inf Tecnol. https://doi.org/10.1007/s41870-021-00804-w
[16] Goyal S, Bhatia PK (2020) Feature selection technique for effective software effort estimation using multi-layer perceptrons. In: Proceedings of ICETIT 2019. Lecture Notes in Electrical Engineering, vol 605. Springer, Cham, pp 183–194. https://doi.org/10.1007/978-3-030-30577-2_15
[17] Goyal S (2022) FOFS: firefly optimization for feature selection to predict fault-prone software modules. In: Nanda P, Verma VK, Srivastava S, Gupta RK, Mazumdar AP (eds) Data engineering for smart systems. Lecture Notes in Networks and Systems, vol 238. Springer, Singapore. https://doi.org/10.1007/978-981-16-2641-8_46
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Goyal, S., Sinha, M., Nath, S., Mitra, S., Arora, C. (2023). Breast Cancer Detection Using Machine Learning. In: Bhateja, V., Mohanty, J.R., Flores Fuentes, W., Maharatna, K. (eds) Communication, Software and Networks. Lecture Notes in Networks and Systems, vol 493. Springer, Singapore. https://doi.org/10.1007/978-981-19-4990-6_57
DOI: https://doi.org/10.1007/978-981-19-4990-6_57
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-4989-0
Online ISBN: 978-981-19-4990-6
eBook Packages: Engineering, Engineering (R0)