Genomic Data Machined: The Random Forest Algorithm for Discovering Breast Cancer Biomarkers

  • Conference paper
Information and Communication Technologies and Sustainable Development (ICT&SD 2022)

Abstract

Advanced data analysis tools and bioinformatics are essential for uncovering the nature of breast cancer, which is the leading cause of cancer death among women. The goal of this study is to identify potential genomic biomarkers that have a significant impact on four prognostic factors: tumour size, lymph node involvement, metastasis, and overall survival status. The Random Forest algorithm was trained on data from The Cancer Genome Atlas Breast Cancer dataset, which contains the expression values of 19,737 genes. To obtain the optimal learning model, the process was repeated 20 times for each indicator, and only the genes with a p-value < 0.05 were taken into further consideration. Several performance metrics (e.g., F1 score) were calculated to check the algorithm's reliability. As a result, 97 genes were included in the extended database and 7 in the final one. The chosen genes have been proven to play a critical role in cancer-related pathways, such as Toll-like receptor and NF-κB, and have effects on cell proliferation, tumour formation, and angiogenesis. Thus, this study demonstrates the potential of machine learning analyses for biomedical purposes and provides machine-generated insights into breast cancer development, setting the groundwork for further in vitro examinations to validate the prognostic potential of these biomarkers.
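The repeated-training procedure summarised in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn, a small synthetic expression matrix in place of the TCGA data, a binary outcome, and impurity-based feature importances averaged over 20 seeded runs. The function name `repeated_forest_ranking` and all parameter values are hypothetical. The abstract does not say how the p-values were computed, so the p-value < 0.05 filter is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def repeated_forest_ranking(X, y, n_repeats=20, n_trees=100, test_size=0.25):
    """Train a Random Forest n_repeats times on different splits.

    Returns the per-gene importance averaged over all runs and the
    F1 score of each run on its held-out split.
    (Hypothetical helper; the split ratio and tree count are assumptions.)
    """
    importances = np.zeros((n_repeats, X.shape[1]))
    f1_scores = np.zeros(n_repeats)
    for seed in range(n_repeats):
        # A fresh stratified split and a fresh forest for every repetition.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed
        )
        model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        model.fit(X_train, y_train)
        importances[seed] = model.feature_importances_
        f1_scores[seed] = f1_score(y_test, model.predict(X_test))
    return importances.mean(axis=0), f1_scores


# Synthetic stand-in for an expression matrix: 200 samples x 50 "genes".
# Only genes 0 and 1 carry signal for the binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

mean_importance, f1_scores = repeated_forest_ranking(X, y)
top_genes = np.argsort(mean_importance)[::-1][:5]
print("Top-ranked gene indices:", top_genes)
print("Mean F1 over runs:", f1_scores.mean())
```

Averaging importances over several seeded runs is one common way to reduce the run-to-run variance of a single forest; whether the study aggregated its 20 repetitions this way is not stated in the abstract.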



Acknowledgements

This work was partly supported by EMBO IG 4728-2020, Jacek Arct and ‘New Technologies for Women’ scholarships, Kyiv School of Economics «Talents for Ukraine» for NK, and Alexander von Humboldt Stiftung (Philipp Schwartz-Initiative) for HF.

Corresponding author

Correspondence to Nadiia Kasianchuk.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Kasianchuk, N., Tsvyk, D., Siemens, E., Ostash, V., Falfushynska, H. (2023). Genomic Data Machined: The Random Forest Algorithm for Discovering Breast Cancer Biomarkers. In: Dovgyi, S., Trofymchuk, O., Ustimenko, V., Globa, L. (eds) Information and Communication Technologies and Sustainable Development. ICT&SD 2022. Lecture Notes in Networks and Systems, vol 809. Springer, Cham. https://doi.org/10.1007/978-3-031-46880-3_25
