Risk Prediction-Based Breast Cancer Diagnosis Using Personal Health Records and Machine Learning Models

  • Conference paper
Machine Intelligence and Soft Computing

Abstract

Breast cancer is most common among middle-aged women. It is the fourth most dangerous cancer relative to other cancers. The number of breast cancer patients has increased significantly in recent years, so early diagnosis has become an essential task in cancer research, one that facilitates the subsequent clinical management of patients. The best protection against a breast cancer tumor is its early detection, which can stop the tumor from growing and save lives. In machine learning classification, cancer cases are classified into two types, benign or malignant. Several preprocessing techniques are applied, namely filling missing values, correlation-coefficient analysis, and the synthetic minority oversampling technique (SMOTE), and accuracy is estimated with tenfold cross-validation. The main aim of this study is to identify key features in the dataset and to compare the performance of several machine learning algorithms: random forest classifier, logistic regression, support vector machine, decision tree, Gaussian Naive Bayes, and k-nearest neighbors. Based on the results, the classification model with the highest accuracy is taken as the best model for cancer prediction.
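The steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the data are synthetic stand-ins rather than the paper's dataset, the correlation threshold of 0.5 and the neighbour count of 5 are arbitrary assumptions, only k-nearest neighbors is shown out of the six classifiers, and applying SMOTE inside each training fold (rather than before splitting) is a design choice made here, not something the abstract specifies. All function names are hypothetical.

```python
import math
import random
from statistics import mean


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(X, y, threshold):
    """Indices of features whose absolute correlation with the label meets the threshold."""
    return [j for j, col in enumerate(zip(*X)) if abs(pearson(col, y)) >= threshold]


def smote(X, y, minority, k, rng):
    """Balance a binary dataset by adding synthetic minority samples.

    Each synthetic row lies on the segment between a minority sample and one
    of its k nearest minority neighbours. Needs at least two minority rows.
    """
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    deficit = len(X) - 2 * len(minority_rows)  # majority count minus minority count
    X_out, y_out = list(X), list(y)
    for _ in range(deficit):
        base = rng.choice(minority_rows)
        others = [m for m in minority_rows if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()
        X_out.append([b + gap * (p - b) for b, p in zip(base, partner)])
        y_out.append(minority)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k):
    """Majority label among the k training rows closest to x."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def tenfold_accuracy(X, y, rng, folds=10, k=5):
    """k-NN accuracy under k-fold cross-validation, oversampling each training fold only."""
    order = list(range(len(X)))
    rng.shuffle(order)
    correct = 0
    for f in range(folds):
        test_idx = set(order[f::folds])
        X_tr = [X[i] for i in order if i not in test_idx]
        y_tr = [y[i] for i in order if i not in test_idx]
        X_tr, y_tr = smote(X_tr, y_tr, minority=1, k=k, rng=rng)
        correct += sum(knn_predict(X_tr, y_tr, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)


# Synthetic stand-in data (NOT the paper's dataset): 60 benign (0), 20 malignant (1).
# Features 0 and 1 depend on the label; feature 2 is pure noise.
rng = random.Random(0)
y = [0] * 60 + [1] * 20
X_raw = [[rng.gauss(7 if label else 2, 1), rng.gauss(7 if label else 2, 1), rng.uniform(1, 10)]
         for label in y]
X_raw[3][0] = X_raw[40][1] = X_raw[70][2] = None  # simulate missing entries

X_clean = impute_mean(X_raw)
selected = select_features(X_clean, y, threshold=0.5)  # threshold is an assumption
X_sel = [[row[j] for j in selected] for row in X_clean]
accuracy = tenfold_accuracy(X_sel, y, rng)
print("selected features:", selected)
print("tenfold k-NN accuracy:", round(accuracy, 3))
```

Oversampling only the training portion of each fold keeps synthetic points derived from held-out samples out of the training set, so the accuracy estimate is not inflated by leakage. Comparing the other classifiers listed in the abstract would amount to swapping `knn_predict` for another model inside the same loop.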



Author information

Corresponding author

Correspondence to Sireesha Moturi.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Moturi, S., Tirumala Rao, S.N., Vemuru, S. (2021). Risk Prediction-Based Breast Cancer Diagnosis Using Personal Health Records and Machine Learning Models. In: Bhattacharyya, D., Thirupathi Rao, N. (eds) Machine Intelligence and Soft Computing. Advances in Intelligent Systems and Computing, vol 1280. Springer, Singapore. https://doi.org/10.1007/978-981-15-9516-5_37
