CRM Discovery Beyond Model Insects

Insect Genomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1858))

Abstract

Although sequenced insect genomes now number in the hundreds, little is known about gene regulatory sequences in any species other than the well-studied Drosophila melanogaster. We provide here a detailed protocol for using SCRMshaw, a computational method for predicting cis-regulatory modules (CRMs, also called “enhancers”) in sequenced insect genomes. SCRMshaw is effective for CRM discovery throughout the range of holometabolous insects, and potentially in even more diverged species, with true-positive prediction rates of 75% or better. The minimal requirements for using SCRMshaw are a genome sequence and training data in the form of known Drosophila CRMs; a comprehensive set of the latter can be obtained from the SCRMshaw download site. For basic applications, a user with only modest computational know-how can run SCRMshaw on a desktop computer. SCRMshaw can be run with a single, narrow set of training data to predict CRMs regulating a specific pattern of gene expression, or with multiple sets of training data covering a broad range of CRM activities to provide an initial rough regulatory annotation of a complete, newly sequenced genome.

Acknowledgments

We thank Kushal Suryamohan for comments on the manuscript. This work was supported by USDA grant 2012-67013-19361 (MSH) and NIH grant 5K22HL125593-02 (MK).

Author information

Correspondence to Majid Kazemian or Marc S. Halfon.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Kazemian, M., Halfon, M.S. (2019). CRM Discovery Beyond Model Insects. In: Brown, S., Pfrender, M. (eds) Insect Genomics. Methods in Molecular Biology, vol 1858. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8775-7_10

  • DOI: https://doi.org/10.1007/978-1-4939-8775-7_10

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8774-0

  • Online ISBN: 978-1-4939-8775-7

  • eBook Packages: Springer Protocols
