Abstract
Breast cancer is most common among middle-aged women. It is the fourth most dangerous of all cancers. In recent years the number of breast cancer patients has increased significantly, so early diagnosis has become a necessary task in cancer research to facilitate the subsequent clinical management of patients. The best protection against a breast tumor is its early detection: detecting cancer early can stop the tumor from growing and save lives. In machine learning classification, cancer patients are assigned to one of two classes, benign or malignant. Several preprocessing techniques, namely filling missing values, applying the correlation coefficient, and the synthetic minority oversampling technique (SMOTE), are implemented, and tenfold cross-validation is used to estimate accuracy. The main aim of this study is to identify key features in the dataset and to evaluate the performance of different machine learning algorithms: random forest classifier, logistic regression, support vector machine, decision tree, Gaussian Naive Bayes, and k-nearest neighbors. Based on the results, the classification model that gives the highest accuracy is taken as the best model for cancer prediction.
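The preprocessing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: mean imputation is assumed as the way missing values are filled, the function names (`fill_missing_with_mean`, `pearson`, `smote`) are hypothetical, and the five-row toy dataset is made up purely for illustration. The `smote` function shows only the core idea of SMOTE, which is to create a synthetic minority sample by interpolating between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def fill_missing_with_mean(rows):
    """Replace None entries in each column with that column's mean (assumed strategy)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: three features per patient, None marks a missing value.
rows = [
    [5.0, 1.0, 1.0],
    [None, 4.0, 4.0],
    [3.0, 1.0, 1.0],
    [8.0, 10.0, 10.0],
    [6.0, 8.0, None],
]
labels = [0, 0, 0, 1, 1]  # 0 = benign, 1 = malignant

filled = fill_missing_with_mean(rows)

# Correlation of each feature column with the class label.
correlations = [pearson([r[j] for r in filled], labels) for j in range(3)]

# Oversample the minority (malignant) class by one synthetic sample.
minority = [r for r, y in zip(filled, labels) if y == 1]
synthetic = smote(minority, n_new=1, k=1)
```

In practice the balanced data would then be passed to the classifiers listed in the abstract and scored with tenfold cross-validation, typically through an established machine learning library rather than hand-written code like this.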
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Moturi, S., Tirumala Rao, S.N., Vemuru, S. (2021). Risk Prediction-Based Breast Cancer Diagnosis Using Personal Health Records and Machine Learning Models. In: Bhattacharyya, D., Thirupathi Rao, N. (eds) Machine Intelligence and Soft Computing. Advances in Intelligent Systems and Computing, vol 1280. Springer, Singapore. https://doi.org/10.1007/978-981-15-9516-5_37
DOI: https://doi.org/10.1007/978-981-15-9516-5_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9515-8
Online ISBN: 978-981-15-9516-5
eBook Packages: Intelligent Technologies and Robotics (R0)