Abstract
Essential genes (EGs) are fundamental for the growth and survival of a cell or an organism. Identifying EGs is an important issue in many areas of biomedical research, such as synthetic and systems biology, drug development, and mechanistic and therapeutic investigations. Essentiality is a context-dependent, dynamic attribute of a gene that can vary across cells, tissues, and pathological conditions, and wet-lab experimental procedures to identify EGs are costly and time-consuming. Commonly explored computational approaches are based on machine learning techniques applied to protein-protein interaction networks, but they are often unsuccessful, especially in the case of human genes. From a biological point of view, identifying node essentiality attributes is a challenging task; from a data science perspective, suitable graph learning approaches remain an open problem. Node classification in graph modeling and analysis is a machine learning task that predicts an unknown node property from defined node attributes. The model is trained on both the relationship information and the node attributes. Here, we propose the use of a context-specific integrated network enriched with biological and topological attributes. To tackle the node classification task, we exploit different machine and deep learning models. An extensive experimental phase demonstrates the effectiveness of both the network structure and the attributes associated with the nodes for EG identification.
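As a rough illustration of node classification that uses both relationship information and node attributes, the following minimal sketch builds a toy graph, derives one topological attribute (normalized degree), combines it with one biological attribute and its neighbourhood average, and labels the remaining nodes with a nearest-centroid rule. It is not the chapter's method: the gene names, attribute values, and labels are invented for illustration, and the nearest-centroid rule stands in for the machine and deep learning models mentioned above.

```python
from statistics import mean

# Toy undirected interaction graph (hypothetical gene names).
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"),
         ("g4", "g5"), ("g5", "g6")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# One invented biological attribute per node (e.g. a scaled expression level).
expression = {"g1": 0.9, "g2": 0.8, "g3": 0.7, "g4": 0.4, "g5": 0.2, "g6": 0.1}


def node_features(node):
    """Return (normalized degree, own attribute, mean neighbour attribute)."""
    degree = len(adj[node]) / (len(adj) - 1)              # topological attribute
    neigh_expr = mean(expression[n] for n in adj[node])   # relational signal
    return (degree, expression[node], neigh_expr)


features = {n: node_features(n) for n in adj}

# Invented training labels: 1 = essential, 0 = non-essential.
train_labels = {"g1": 1, "g2": 1, "g5": 0, "g6": 0}


def centroid(label):
    """Average feature vector of the training nodes carrying this label."""
    rows = [features[n] for n, y in train_labels.items() if y == label]
    return tuple(mean(col) for col in zip(*rows))


centroids = {label: centroid(label) for label in (0, 1)}


def predict(node):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features[node], c))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Classify only the nodes whose label is unknown.
predictions = {n: predict(n) for n in adj if n not in train_labels}
print(predictions)
```

On this toy graph, g3 sits closer to the essential centroid and g4 closer to the non-essential one. The neighbourhood average is the simplest possible way to let edges influence the prediction; graph neural networks generalize the same idea with learned aggregation.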
Notes
- 4. Scikit-Learn: https://scikit-learn.org/stable/, PyTorch Geometric: https://pytorch-geometric.readthedocs.io/en/latest/, Imbalanced-learn: https://imbalanced-learn.org/stable/.
- 5. Google Colab notebooks for result reproducibility are available at: https://github.com/giordamaug/EG-identification---Data-Science-in-App-Springer/tree/main/notebook.
Acknowledgements
This work has been partially funded by the BiBiNet project (H35F21000430002) within POR-Lazio FESR 2014-2020. It was also carried out within the activities of the authors as members of the ICAR-CNR INdAM Research Unit and partially supported by the INdAM research project “Computational Intelligence methods for Digital Health”. The work of Mario R. Guarracino was conducted within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE). Mario Manzo thanks Prof. Alfredo Petrosino for his guidance and supervision during the years of working together.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Manzo, M., Giordano, M., Maddalena, L., Guarracino, M.R., Granata, I. (2023). Novel Data Science Methodologies for Essential Genes Identification Based on Network Analysis. In: Dzemyda, G., Bernatavičienė, J., Kacprzyk, J. (eds) Data Science in Applications. Studies in Computational Intelligence, vol 1084. Springer, Cham. https://doi.org/10.1007/978-3-031-24453-7_7
DOI: https://doi.org/10.1007/978-3-031-24453-7_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24452-0
Online ISBN: 978-3-031-24453-7
eBook Packages: Intelligent Technologies and Robotics (R0)