Breast Cancer Detection Using Machine Learning

Goyal, Somya; Sinha, Mehul; Nath, Shashwat; Mitra, Sayan; Arora, Charvi

doi:10.1007/978-981-19-4990-6_57

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 493)

Abstract

Studies and statistics show that breast cancer is currently the most common cancer among women and a frequent cause of death. Early screening and subsequent treatment can improve the chances of survival. In this paper, we aim to demonstrate that breast cancer cases can be detected by analyzing diagnostic data with machine learning algorithms; the data are features computed from digitized images of fine needle aspirates (FNA) of breast masses. Using machine learning, we hope to ease the process of cancer detection in hospitals so that patients can receive the right treatment as soon as possible, before their condition becomes critical. The approach also opens new possibilities for using machine learning in medical science to detect other types of cancer and other diseases, where detection by conventional means is usually laborious and time-consuming.

1 Introduction

Over the years, doctors have classified breast cancer into various subtypes. In 2020, 2.3 million women were diagnosed with breast cancer and the disease caused 685,000 deaths globally. At the end of 2020, 7.8 million women were alive who had been diagnosed with breast cancer in the previous 5 years. According to GLOBOCAN, breast cancer is the most common cancer in women, accounting for 25.1% of all cancers in women. Improvements in the medical sector, aided by advanced technology and software, have contributed to a significant decline in breast cancer deaths in many countries. Over time, machine learning has substantially improved the accuracy of detecting breast cancer at an early stage. Breast cancer becomes more difficult and costly to treat as it progresses to later stages, so proper equipment and technology for detection and treatment are needed. This paper applies and compares machine learning algorithms that detect breast cancer in women efficiently and with high accuracy.

2 Related Works

This section reviews the literature on "Breast cancer detection using machine learning." The authors surveyed multiple published studies that apply machine learning to the analysis of breast cancer, and reviewed datasets from reliable sources for testing the methods. The most popular methods [1] for breast cancer detection are the Naive Bayes classifier, K-nearest neighbors (KNN) [2], logistic regression, the decision tree classifier, the support vector machine (SVM) classifier [3] and the random forest classifier [4, 5]; the reviewed works are summarized in Table 1.

Table 1 Literature work

3 Proposed Methodology

The purpose of this work is to analyze the dataset using the random forest classifier, logistic regression and the decision tree classifier, and to compare these algorithms by their accuracy in detecting cancer.

3.1 Experimental Setup

The program was written in Python in a Jupyter Notebook. We used the Wisconsin Breast Cancer (Diagnostic) dataset [14] from the UCI Machine Learning Repository, which contains 569 samples (357 benign, 212 malignant), each described by 30 real-valued features.

We used the NumPy library to work with the data as arrays, Pandas to analyze, clean and manipulate the dataset, and Matplotlib and Seaborn to visualize the statistics. We used scikit-learn (sklearn), a machine learning library for Python, to implement the logistic regression, random forest and decision tree classifiers on the data.
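Before modeling, the data must be loaded into Pandas and split into features and labels. The sketch below assumes the Kaggle copy of the data [14] has an `id` column, a `diagnosis` column holding `M`/`B` and, in some copies, an empty trailing `Unnamed: 32` column. The helper name `prepare_kaggle_wdbc`, the file path and the encoding 1 = malignant are our own illustrative choices, not the authors' code; a tiny toy table stands in for the real file.

```python
import pandas as pd

def prepare_kaggle_wdbc(df):
    """Hypothetical helper: return features X and labels y (1 = malignant, 0 = benign)."""
    # 'id' carries no diagnostic information; 'Unnamed: 32' is an empty column
    # that appears in some copies of the CSV (assumption), so drop it if present.
    df = df.drop(columns=["id", "Unnamed: 32"], errors="ignore")
    y = df["diagnosis"].map({"M": 1, "B": 0})
    X = df.drop(columns=["diagnosis"])
    return X, y

# Real use (hypothetical path): X, y = prepare_kaggle_wdbc(pd.read_csv("data.csv"))
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "diagnosis": ["M", "B", "B"],
    "radius_mean": [20.0, 12.0, 11.0],   # made-up toy values
    "Unnamed: 32": [None, None, None],
})
X, y = prepare_kaggle_wdbc(toy)
print(list(X.columns), y.tolist())  # ['radius_mean'] [1, 0, 0]
```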

Logistic Regression: It uses the logistic function to model the relationship between a binary dependent variable and one or more independent variables. It is suited to binary outcomes, is efficient to train, and performs well when the classes are (approximately) linearly separable.
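To make the role of the logistic function concrete, the sketch below fits scikit-learn's `LogisticRegression` and checks that its predicted probabilities equal the logistic function applied to the model's linear score w·x + b. Standardizing the features and setting `max_iter=1000` are our assumptions, chosen so that the default solver converges; note that in scikit-learn's bundled copy of the data, label 1 means benign.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

X, y = load_breast_cancer(return_X_y=True)

# Scaling is an assumption here; it helps the lbfgs solver converge.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Linear score w.x + b for every sample ...
scores = model.decision_function(X)
# ... passed through the logistic function gives P(y = 1 | x), i.e. P(benign).
manual_proba = sigmoid(scores)
sklearn_proba = model.predict_proba(X)[:, 1]

print(np.allclose(manual_proba, sklearn_proba))  # True
```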

Decision Tree Classifier: A decision tree consists of decision nodes and leaf nodes. Starting from the root node, the algorithm uses an attribute selection measure (such as Gini impurity or information gain) to find the best attribute and divide the data into subsets; each such split creates a new decision node. The process repeats recursively until the nodes cannot be split further, and the leaf nodes then hold the predicted classes.
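One common attribute selection measure is Gini impurity, which scikit-learn's `DecisionTreeClassifier` uses by default (`criterion="gini"`). The sketch below, with hypothetical helper names `gini` and `best_split`, scores every candidate threshold on a single numeric feature and keeps the split whose children are purest; a full tree repeats this over all features and then recurses into each child.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k**2, where p_k is the share of class k."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def best_split(x, y):
    """Best threshold t for the rule 'x <= t' on one feature, and its weighted Gini.

    Returns (None, inf) if the feature has a single distinct value.
    """
    best_t, best_score = None, float("inf")
    values = np.unique(x)                      # sorted distinct values
    for t in (values[:-1] + values[1:]) / 2:   # midpoints between neighbours
        left, right = y[x <= t], y[x > t]
        # impurity of the two children, weighted by their sizes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = float(t), score
    return best_t, best_score

# Toy feature: small values belong to class 0, large values to class 1
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))  # (6.5, 0.0)
```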

Random Forest Classifier: It is an ensemble algorithm that combines the predictions of many decision trees. Each tree is grown on a random sample of the data, choosing its splits with an attribute selection measure over random subsets of the attributes. A larger number of trees gives a more stable average prediction and thus tends to increase the accuracy of predictions on the given dataset. It reduces the overfitting to which single decision trees are prone.
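The ensemble idea can be sketched by hand: grow several trees on bootstrap samples of the training rows, let each split consider only a random subset of the features, and combine the trees by majority vote. This is a simplified illustration with hypothetical helpers `fit_forest` and `predict_forest`; scikit-learn's `RandomForestClassifier` averages the trees' predicted probabilities instead of counting hard votes, and the 25-tree size and the seeds are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    """Grow n_trees decision trees, each on a bootstrap sample of the rows."""
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        # max_features="sqrt": each split considers a random subset of the features
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        trees.append(tree.fit(X[rows], y[rows]))
    return trees

def predict_forest(trees, X):
    """Majority vote over the trees' 0/1 predictions (use an odd n_trees to avoid ties)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = fit_forest(X_train, y_train)

tree_acc = accuracy_score(y_test, single_tree.predict(X_test))
forest_acc = accuracy_score(y_test, predict_forest(forest, X_test))
print(f"single tree: {tree_acc:.3f}, forest of 25 trees: {forest_acc:.3f}")
```

The exact accuracies depend on the split and the seeds, so this comparison illustrates the mechanism rather than reproducing the paper's results.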

3.2 Feature Analysis

First, the diagnosis labels are counted [15]; then the pairwise correlations between the features are computed.

Fig. 1 plots the number of malignant (M) and benign (B) diagnoses. Fig. 2 shows the pairwise correlations as a heat map [15,16,17].

A bar graph shows the number of diagnoses in the two classes, malignant and benign; benign cases outnumber malignant ones. — **Fig. 1**

A heat map shows the pairwise correlation of the features in the dataset. — **Fig. 2**
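The frequency count and correlation analysis can be sketched as follows. This assumes scikit-learn's bundled copy of the Wisconsin Diagnostic data (which encodes malignant as 0 and benign as 1) rather than the CSV file, and plots with Matplotlib only; with Seaborn, `sns.countplot(x="diagnosis", data=df)` and `sns.heatmap(corr)` draw equivalent figures. The output file name is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; omit this line inside Jupyter
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.frame.copy()  # 30 feature columns plus 'target'
# scikit-learn encodes malignant as 0 and benign as 1
df["diagnosis"] = df["target"].map({0: "M", 1: "B"})

counts = df["diagnosis"].value_counts()
print(counts)  # B: 357, M: 212

# Pearson correlation between every pair of the 30 features
corr = df[list(data.feature_names)].corr()

fig, (ax_count, ax_corr) = plt.subplots(1, 2, figsize=(14, 6))
ax_count.bar(counts.index, counts.values, color=["tab:green", "tab:red"])
ax_count.set_xlabel("diagnosis")
ax_count.set_ylabel("count")

im = ax_corr.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
ax_corr.set_xticks(range(len(corr.columns)))
ax_corr.set_xticklabels(corr.columns, rotation=90, fontsize=6)
ax_corr.set_yticks(range(len(corr.columns)))
ax_corr.set_yticklabels(corr.columns, fontsize=6)
fig.colorbar(im, ax=ax_corr)
fig.tight_layout()
fig.savefig("feature_analysis.png")
```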

Next, the classification models are constructed. After training each model with the given algorithms, we computed its accuracy and confusion matrix with the scikit-learn library to assess its performance.
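The training and evaluation step can be sketched with scikit-learn as follows. The 80/20 split matches Sect. 4, but the random seeds, the 100-tree forest and the feature scaling for logistic regression are our assumptions, so the printed accuracies will not exactly reproduce the reported ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameters below are illustrative assumptions, not the authors' settings.
models = {
    "Logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)          # train on 80% of the data
    y_pred = model.predict(X_test)       # predict the held-out 20%
    results[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: accuracy = {results[name]:.4f}")
    print(confusion_matrix(y_test, y_pred))
```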

We calculated the accuracy of each model from its confusion matrix, using the following notation:

TrPo: true positives
TrNe: true negatives
FaNe: false negatives
FaPo: false positives

$$\text{Accuracy} = \frac{\text{TrPo} + \text{TrNe}}{\text{TrPo} + \text{TrNe} + \text{FaNe} + \text{FaPo}}$$
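scikit-learn's `confusion_matrix` puts actual classes on the rows and predicted classes on the columns, so for 0/1 labels it returns [[TrNe, FaPo], [FaNe, TrPo]], with 1 as the positive class. The sketch below (the helper name `accuracy_from_confusion` is hypothetical) unpacks the matrix in that order, applies the accuracy formula and checks the result against `accuracy_score` on a toy example.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

def accuracy_from_confusion(y_true, y_pred):
    """Accuracy = (TrPo + TrNe) / (TrPo + TrNe + FaNe + FaPo) for 0/1 labels."""
    # With labels=[0, 1], scikit-learn returns [[TN, FP], [FN, TP]]
    # (rows = actual class, columns = predicted class).
    tr_ne, fa_po, fa_ne, tr_po = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return (tr_po + tr_ne) / (tr_po + tr_ne + fa_ne + fa_po)

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(accuracy_from_confusion(y_true, y_pred))  # 0.6
print(accuracy_score(y_true, y_pred))           # 0.6
```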

4 Results and Discussions

We successfully detected breast cancer cases using machine learning algorithms. To assess the performance of each algorithm, we computed its confusion matrix on the test data. We used 80% of the dataset to train each model and the remaining 20% to test its accuracy.
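With the 569 samples of the Wisconsin Diagnostic dataset, an 80/20 split leaves 455 samples for training and 114 for testing, which matches the 114 cases counted in each confusion matrix in Figs. 3, 4 and 5. The sketch below shows such a split; passing `stratify=y` to keep the benign/malignant ratio equal in both parts is our assumption, since the paper does not say how the split was drawn.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # 569 samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(len(X_train), len(X_test))  # 455 114
# stratify=y keeps roughly the same benign (1) / malignant (0) ratio in both parts
print(Counter(y_train.tolist()), Counter(y_test.tolist()))
```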

The logistic regression model achieved an accuracy of 96.49%, while the decision tree reached only 93.85%. The random forest classifier performed best, with an accuracy of 97.36%. Hence, the models can tell with high accuracy whether a case is benign or malignant (Figs. 3, 4 and 5).

A table shows the logistic regression confusion matrix: TrPo = 66, FaPo = 1, FaNe = 3 and TrNe = 44. — **Fig. 3**

A table shows the decision tree confusion matrix: TrPo = 64, FaPo = 3, FaNe = 4 and TrNe = 43. — **Fig. 4**

A table shows the random forest classifier confusion matrix: TrPo = 67, FaPo = 0, FaNe = 3 and TrNe = 44. — **Fig. 5**
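As a check on the reported results, the accuracy formula from Sect. 3.2 can be applied directly to the confusion-matrix cells listed in Figs. 3, 4 and 5.

```python
# Confusion-matrix cells as reported in Figs. 3-5: (TrPo, FaPo, FaNe, TrNe)
reported = {
    "Logistic regression": (66, 1, 3, 44),
    "Decision tree": (64, 3, 4, 43),
    "Random forest": (67, 0, 3, 44),
}

def accuracy(tr_po, fa_po, fa_ne, tr_ne):
    """Accuracy = (TrPo + TrNe) / (TrPo + TrNe + FaNe + FaPo)."""
    return (tr_po + tr_ne) / (tr_po + tr_ne + fa_ne + fa_po)

for name, cells in reported.items():
    print(f"{name}: n = {sum(cells)}, accuracy = {100 * accuracy(*cells):.2f}%")
```

This prints 96.49%, 93.86% and 97.37% for 114 test cases each. These agree with the accuracies in the text to within 0.01 percentage points; the reported 93.85% and 97.36% appear to be truncated rather than rounded. Because accuracy adds the two diagonal cells and divides by the total, it does not depend on which off-diagonal cell is labeled FaPo and which FaNe.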

5 Conclusions

Breast cancer is one of the most common cancers in women. Early detection and diagnosis can save lives by allowing patients to undergo suitable treatment in time. Machine learning has wide applications in modern healthcare, and one such use, the detection and diagnosis of disease, has been discussed in this paper. Integrating machine learning with medical databases and devices will make healthcare systems more efficient and help organize data better. Healthcare companies such as Tempus already apply machine learning to their clinical data to provide personalized treatments for patients. Hence, machine learning is expected to be adopted more widely in medicine in the future.

References

[1] Goyal S, Bhatia PK (2020) Comparison of machine learning techniques for software quality prediction. Int J Knowl Syst Sci (IJKSS) 11(2):21–40. https://doi.org/10.4018/IJKSS.2020040102
[2] Goyal S (2021) Handling class-imbalance with KNN (Neighbourhood) under-sampling for software defect prediction. Artif Intell Rev. https://doi.org/10.1007/s10462-021-10044-w
[3] Goyal S (2021) Effective software defect prediction using support vector machines (SVMs). Int J Syst Assur Eng Manag. https://doi.org/10.1007/s13198-021-01326-1
[4] Goyal S, Bhatia PK (2021) Heterogeneous stacked ensemble classifier for software defect prediction. Multimed Tools Appl. https://doi.org/10.1007/s11042-021-11488-6
[5] Goyal S (2021) Predicting the defects using stacked ensemble learner with filtered dataset. Autom Softw Eng 28:14. https://doi.org/10.1007/s10515-021-00285-y
[6] Tahmooresi M, Afshar A, Rad BB, Nowshath KB, Bamiah MA (2018) Early detection of breast cancer using machine learning techniques. J Telecommun Electr Comput Eng (JTEC) 10(3–2):21–27
[7] Nallamala SH, Mishra P, Koneru SV (2019) Breast cancer detection using machine learning way. Int J Recent Technol Eng 8:1402–1405
[8] Bazazeh D, Shubair R (2016) Comparative study of machine learning algorithms for breast cancer detection and diagnosis. In: 2016 5th international conference on electronic devices, systems and applications (ICEDSA). IEEE, Dec 2016, pp 1–4
[9] Agarap AFM (2018) On breast cancer detection: an application of machine learning algorithms on the wisconsin diagnostic dataset. In: Proceedings of the 2nd international conference on machine learning and soft computing, Feb 2018, pp 5–9
[10] Vaka AR, Soni B, Reddy S (2020) Breast cancer detection by leveraging machine learning. ICT Express 6(4):320–324
[11] Chaurasia V, Pal S (2017) A novel approach for breast cancer detection using data mining techniques. Int J Innov Res Comput Commun Eng 2 (An ISO 3297: 2007 Certified Organization)
[12] Gayathri BM, Sumathi CP, Santhanam T (2013) Breast cancer diagnosis using machine learning algorithms-a survey. Int J Distrib Parallel Syst 4(3):105
[13] Ganggayah MD, Taib NA, Har YC, Lio P, Dhillon SK (2019) Predicting factors for survival of breast cancer patients using machine learning techniques. BMC Med Inform Decis Mak 19(1):1–17
[14] https://www.kaggle.com/uciml/breast-cancer-wisconsin-data/version/2
[15] Goyal S, Bhatia PK (2021) Software fault prediction using lion optimization algorithm. Int J Inf Tecnol. https://doi.org/10.1007/s41870-021-00804-w
[16] Goyal S, Bhatia PK (2020) Feature selection technique for effective software effort estimation using multi-layer perceptrons. In: Proceedings of ICETIT 2019. Lecture Notes in Electrical Engineering, vol 605. pp 183–194. Springer, Cham. https://doi.org/10.1007/978-3-030-30577-2_15
[17] Goyal S (2022) FOFS: firefly optimization for feature selection to predict fault-prone software modules. In: Nanda P, Verma VK, Srivastava S, Gupta RK, Mazumdar AP (eds) Data engineering for smart systems. Lecture Notes in Networks and Systems, vol 238. Springer, Singapore. https://doi.org/10.1007/978-981-16-2641-8_46

Author information

Authors and Affiliations

Manipal University Jaipur, Jaipur, Rajasthan, 303007, India
Somya Goyal, Mehul Sinha, Shashwat Nath, Sayan Mitra & Charvi Arora

Corresponding author

Correspondence to Somya Goyal.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial College of Engineering and Management (SRMCEM), Lucknow, Uttar Pradesh, India
Vikrant Bhateja
School of Computer Applications, KIIT University, Bhubaneswar, Odisha, India
Jnyana Ranjan Mohanty
Facultad de Ingenieria, Autonomous University of Baja California, Mexicali, Baja California, Mexico
Wendy Flores Fuentes
School of Electronics and Computer Science, University of Southampton, Southampton, UK
Koushik Maharatna

About this paper

Cite this paper

Goyal, S., Sinha, M., Nath, S., Mitra, S., Arora, C. (2023). Breast Cancer Detection Using Machine Learning. In: Bhateja, V., Mohanty, J.R., Flores Fuentes, W., Maharatna, K. (eds) Communication, Software and Networks. Lecture Notes in Networks and Systems, vol 493. Springer, Singapore. https://doi.org/10.1007/978-981-19-4990-6_57

DOI: https://doi.org/10.1007/978-981-19-4990-6_57
Published: 28 October 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-4989-0
Online ISBN: 978-981-19-4990-6
eBook Packages: Engineering, Engineering (R0)
