Breast cancer prediction using different machine learning methods applying multi factors

  • Research
  • Journal of Cancer Research and Clinical Oncology

Abstract

Objective

Breast cancer (BC) is a multifactorial disease and one of the most common cancers worldwide. This study compared different machine learning (ML) techniques in order to develop a comprehensive breast cancer risk prediction model based on features drawn from several categories of risk factors.

Methods

The study sample comprised 810 records (115 cancer patients and 695 healthy individuals). Of 85 available attributes, 45 were selected on the basis of expert opinion. The selected attributes cover genetic, biochemical, biomarker, gender, demographic, and pathological factors. Thirteen machine learning models were trained on the selected attributes, and the attribute coefficients and their internal relationships were calculated.
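The sample is imbalanced: 115 of the 810 records (about 14%) are patients. The abstract does not say how the data were partitioned for training and evaluation, so the following is only a minimal sketch, assuming a stratified k-fold split that keeps the patient-to-healthy ratio the same in every fold. The function name `stratified_folds`, the choice of five folds, and the seed are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def stratified_folds(labels, k=5, seed=0):
    """Assign each record index to one of k folds with similar class ratios.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]

    # Group record indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    # Shuffle within each class, then deal indices round-robin so that
    # every fold receives a near-equal share of each class.
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


# Labels matching the class counts in the abstract: 1 = patient, 0 = healthy.
labels = [1] * 115 + [0] * 695
folds = stratified_folds(labels, k=5)

for i, fold in enumerate(folds):
    counts = Counter(labels[j] for j in fold)
    print(f"fold {i}: n={len(fold)}, patients={counts[1]}, healthy={counts[0]}")
```

With these class counts and five folds, each fold holds 162 records, of which 23 are patients and 139 are healthy, so every model in a comparison would be scored on the same class balance.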

Result

Random forest (RF) outperformed the other methods (accuracy 99.26%, precision 99%, and area under the curve (AUC) 99%). An assessment of the impact and correlation of variables using the RF method based on PCA indicated that pathology, biomarker, biochemistry, gene, and demographic factors affected the risk of BC with coefficients of 0.35, 0.23, 0.15, 0.14, and 0.13, respectively (r² = 0.54).
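To put the reported figures in context, the sketch below computes accuracy, precision, and recall from confusion-matrix counts and applies them to a trivial baseline that labels every record "healthy". It also checks that the five reported factor coefficients sum to one. The helper functions and the baseline are illustrative assumptions; the abstract does not give a confusion matrix, and AUC is not computed here because it needs per-record scores.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all records classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Fraction of predicted patients that are true patients."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true patients that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Class counts from the abstract.
n_patients, n_healthy = 115, 695

# Hypothetical baseline: predict "healthy" for every record.
baseline_accuracy = accuracy(tp=0, fp=0, tn=n_healthy, fn=n_patients)
baseline_recall = recall(tp=0, fn=n_patients)
print(f"majority-class accuracy: {baseline_accuracy:.4f}")
print(f"majority-class recall:   {baseline_recall:.4f}")

# Factor coefficients as reported in the abstract.
coefficients = {
    "pathology": 0.35,
    "biomarker": 0.23,
    "biochemistry": 0.15,
    "gene": 0.14,
    "demographic": 0.13,
}
total = sum(coefficients.values())
print(f"sum of coefficients: {total:.2f}")
```

The baseline reaches roughly 85.8% accuracy while detecting no patients at all, which is why precision and AUC are reported alongside accuracy for a sample with this class balance.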

Conclusion

Breast cancer has several risk factors, which medical experts use for early diagnosis. Identifying the relevant risk factors and their effects can therefore increase the accuracy of diagnosis. Considering a broad range of features supports the development of a comprehensive prediction model. In this study, a breast cancer prediction model with 99.3% accuracy was developed using the RF technique and multifactorial features.


Funding

This study was funded by Mashhad University of Medical Sciences (grant number 960336, 960275, 960211, 951122, 940724, 961731).

Author information

Contributions

Study conception and design: EN, MT, AA, MK, GAF, HT. Acquisition of data: EN. Analysis and interpretation of data: EN, HN, MT, RA, AHF. Drafting of manuscript: EN, HN, MD, AM. Critical revision: AA, GAF.

Corresponding authors

Correspondence to Hamed Tabesh or Amir Avan.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of Mashhad University of Medical Sciences and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study.

About this article

Cite this article

Nazari, E., Naderi, H., Tabadkani, M. et al. Breast cancer prediction using different machine learning methods applying multi factors. J Cancer Res Clin Oncol 149, 17133–17146 (2023). https://doi.org/10.1007/s00432-023-05388-5
