Abstract
The rapid increase in the availability of transcriptomics data generated by RNA sequencing represents both a challenge and an opportunity for biologists without bioinformatics training. The challenge is handling, integrating, and interpreting these data sets. The opportunity is to use this information to generate testable hypotheses about the molecular mechanisms controlling gene expression and biological processes (Fig. 1). A successful strategy for generating tractable hypotheses from transcriptomics data has been to build undirected network graphs based on patterns of gene co-expression. Many examples of new hypotheses derived from network analyses can be found in the literature, spanning different organisms, including plants, and specific fields such as root developmental biology.
To make the process of constructing a gene co-expression network more accessible to biologists, here we provide step-by-step instructions using published RNA-seq experimental data obtained from a public database. Similar strategies have been used in previous studies to advance root developmental biology. This guide includes basic instructions for operating widely used open-source platforms such as Bio-Linux, R, and Cytoscape. Although the data used in this example were obtained from Arabidopsis thaliana, the workflow developed in this guide can be easily adapted to RNA-seq data from any organism.
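As a rough illustration of what "an undirected network graph based on patterns of gene co-expression" means in practice, the following minimal sketch correlates every pair of gene expression profiles and keeps the pairs whose correlation is strong enough. It is not the chapter's protocol, which relies on Bio-Linux, R, and Cytoscape. The use of Python, the Pearson coefficient, the 0.9 cutoff, the gene names, and the expression values are all assumptions made up for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges (gene1, gene2, r) where |r| >= threshold.

    Each unordered gene pair is visited once, so the resulting graph
    is undirected. The threshold is an illustrative assumption.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical gene names and invented expression values across five
# samples; these are NOT data from the chapter.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}

edges = coexpression_edges(toy_expression, threshold=0.9)

# Print a tab-delimited edge list: source, target, correlation.
for g1, g2, r in edges:
    print(f"{g1}\t{g2}\t{r}")
```

In this toy data set, geneA, geneB, and geneC end up connected to each other (positively or negatively correlated), while geneD stays unconnected. A delimited edge list of this kind is the sort of table that network visualization tools such as Cytoscape can typically import, although the exact import steps are outside the scope of this sketch.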
Acknowledgments
Research in our group is funded by Fondo de Desarrollo de Areas Prioritarias (FONDAP) Center for Genome Regulation (15090007), MIISSB Iniciativa Científica Milenio-MINECON, Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT) 1141097, and EvoNet (DE-SC0014377).
Copyright information
© 2018 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Contreras-López, O., Moyano, T.C., Soto, D.C., Gutiérrez, R.A. (2018). Step-by-Step Construction of Gene Co-expression Networks from High-Throughput Arabidopsis RNA Sequencing Data. In: Ristova, D., Barbez, E. (eds) Root Development. Methods in Molecular Biology, vol 1761. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7747-5_21
DOI: https://doi.org/10.1007/978-1-4939-7747-5_21
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7746-8
Online ISBN: 978-1-4939-7747-5
eBook Packages: Springer Protocols