Abstract
Although hundreds of insect genomes have been sequenced, little is known about gene regulatory sequences in any species other than the well-studied Drosophila melanogaster. We provide here a detailed protocol for using SCRMshaw, a computational method for predicting cis-regulatory modules (CRMs, also known as “enhancers”) in sequenced insect genomes. SCRMshaw is effective for CRM discovery throughout the range of holometabolous insects and potentially in even more diverged species, with true-positive prediction rates of 75% or better. The minimal requirements for using SCRMshaw are a genome sequence and training data in the form of known Drosophila CRMs; a comprehensive set of the latter can be obtained from the SCRMshaw download site. For basic applications, a user with only modest computational know-how can run SCRMshaw on a desktop computer. SCRMshaw can be run with a single, narrow set of training data to predict CRMs regulating a specific pattern of gene expression, or with multiple sets of training data covering a broad range of CRM activities to provide an initial rough regulatory annotation of a complete, newly sequenced genome.
Acknowledgments
We thank Kushal Suryamohan for comments on the manuscript. This work was supported by USDA grant 2012-67013-19361 (MSH) and NIH grant 5K22HL125593-02 (MK).
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Kazemian, M., Halfon, M.S. (2019). CRM Discovery Beyond Model Insects. In: Brown, S., Pfrender, M. (eds) Insect Genomics. Methods in Molecular Biology, vol 1858. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8775-7_10
DOI: https://doi.org/10.1007/978-1-4939-8775-7_10
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8774-0
Online ISBN: 978-1-4939-8775-7
eBook Packages: Springer Protocols