Abstract
A challenging problem in systems biology is the reconstruction of gene regulatory networks from postgenomic data. A variety of reverse engineering methods from machine learning and computational statistics have been proposed in the literature. However, deciding which method to adopt for a particular application or data set can be a confusing task. The present chapter provides a broad overview of state-of-the-art methods, with an emphasis on conceptual understanding rather than a deluge of mathematical details, and discusses the pros and cons of the various approaches. Guidance on practical applications, with pointers to publicly available software implementations, is included. The chapter concludes with a comprehensive comparative benchmark study on simulated data and a real-world application taken from current plant systems biology.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Grzegorczyk, M., Aderhold, A., Husmeier, D. (2019). Overview and Evaluation of Recent Methods for Statistical Inference of Gene Regulatory Networks from Time Series Data. In: Sanguinetti, G., Huynh-Thu, V. (eds) Gene Regulatory Networks. Methods in Molecular Biology, vol 1883. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8882-2_3
DOI: https://doi.org/10.1007/978-1-4939-8882-2_3
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8881-5
Online ISBN: 978-1-4939-8882-2
eBook Packages: Springer Protocols