Abstract
Identifying essential proteins is important not only for understanding organism structure at the molecular level but also for drug-target detection and genetic disease prevention. Traditional methods typically predict essential proteins from centrality indices of static protein-protein interaction (PPI) networks, from gene expression profiles, or from both. However, the prediction accuracy of most of these methods leaves room for improvement. In this study, we propose a strategy that increases the prediction accuracy of essential protein identification in three ways. First, RNA-Seq datasets are used to construct integrated dynamic PPI networks; an RNA-Seq dataset is expected to give more accurate predictions than microarray gene expression profiles. Second, a novel integrated dynamic PPI network is constructed by considering both the co-expression pattern and the co-expression level of the RNA-Seq data. Third, a novel two-step strategy is proposed to identify essential proteins from two known centrality indices. Numerical experiments show that the proposed strategy increases prediction accuracy dramatically and that it can be generalized to many existing methods and centrality indices.
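The abstract does not give the exact definitions used, so the following Python sketch is only one hypothetical reading of the pipeline, not the authors' method. It assumes that an interaction is active at a time point when both proteins are expressed at or above a cutoff (standing in for co-expression level) and their expression profiles have a Pearson correlation at or above a second cutoff (standing in for co-expression pattern). It also assumes that the two-step strategy shortlists proteins by one centrality index and re-ranks the shortlist by another. All function names, cutoffs, and toy values are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def dynamic_networks(edges, expr, level_cutoff, pattern_cutoff):
    """Return one list of active edges per time point (assumed activity rule)."""
    n_times = len(next(iter(expr.values())))
    snapshots = []
    for t in range(n_times):
        active = [
            (u, v)
            for u, v in edges
            # Assumed "level" test: both proteins expressed at time t.
            if expr[u][t] >= level_cutoff and expr[v][t] >= level_cutoff
            # Assumed "pattern" test: profiles correlated across all time points.
            and pearson(expr[u], expr[v]) >= pattern_cutoff
        ]
        snapshots.append(active)
    return snapshots


def summed_degree(snapshots):
    """Degree centrality summed over all snapshots."""
    score = {}
    for snapshot in snapshots:
        for u, v in snapshot:
            score[u] = score.get(u, 0) + 1
            score[v] = score.get(v, 0) + 1
    return score


def two_step_rank(primary, secondary, shortlist_size, top_n):
    """Hypothetical two-step ranking: shortlist by primary, re-rank by secondary."""
    shortlist = sorted(primary, key=lambda p: (-primary[p], p))[:shortlist_size]
    reranked = sorted(shortlist, key=lambda p: (-secondary.get(p, 0), p))
    return reranked[:top_n]


# Toy data (made up for illustration): four proteins, three time points.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
expr = {
    "A": [5.0, 6.0, 7.0],
    "B": [4.0, 5.0, 6.5],
    "C": [1.0, 6.0, 8.0],
    "D": [9.0, 2.0, 1.0],
}

snapshots = dynamic_networks(edges, expr, level_cutoff=3.0, pattern_cutoff=0.5)
dynamic_score = summed_degree(snapshots)   # index 1: degree in the dynamic network
static_score = summed_degree([edges])      # index 2: degree in the static network
candidates = two_step_rank(dynamic_score, static_score, shortlist_size=3, top_n=2)
```

Here the two centrality indices are simply degree in the dynamic and in the static network; any pair of known indices could be substituted without changing the structure of the sketch.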
Cite this article
Shang, X., Wang, Y. & Chen, B. Identifying essential proteins based on dynamic protein-protein interaction networks and RNA-Seq datasets. Sci. China Inf. Sci. 59, 070106 (2016). https://doi.org/10.1007/s11432-016-5583-z
DOI: https://doi.org/10.1007/s11432-016-5583-z