Abstract
GenBank® and the Sequence Read Archive (SRA) are comprehensive databases of publicly available DNA sequences. GenBank contains data for 480,000 named organisms, more than 176,000 of them within the Embryophyta, obtained through submissions from individual laboratories and batch submissions from large-scale sequencing projects. SRA contains reads from next-generation sequencing studies of over 110,000 species. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage for both databases. GenBank and SRA data are accessible through the NCBI Entrez retrieval system, which integrates them with other NCBI data such as genomes, taxonomy, and the biomedical literature. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. This chapter discusses usage scenarios for both GenBank and SRA, ranging from local and cloud analyses to online analyses supported by the NCBI web-based tools. Both GenBank and SRA, along with their related retrieval and analysis services, are available from the NCBI homepage at www.ncbi.nlm.nih.gov.
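As a rough illustration of Entrez-based retrieval, the sketch below builds request URLs for the NCBI E-utilities, the programmatic interface to Entrez. The abstract does not name E-utilities, so treating it as the access route is an assumption here. The helper names (`build_esearch_url`, `build_efetch_url`), the query string, and the accession strings are hypothetical choices for illustration. The code only constructs URLs and sends no requests.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (assumed access route; not named in the abstract).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="nuccore", retmax=5):
    """Return an ESearch URL that looks up record IDs matching an Entrez query.

    db="nuccore" targets the nucleotide database that includes GenBank records;
    db="sra" would target the Sequence Read Archive instead.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(ids, db="nuccore", rettype="gb"):
    """Return an EFetch URL that downloads the given records as GenBank flat files."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Illustrative query and accession strings (hypothetical examples, not from the chapter).
search_url = build_esearch_url("Arabidopsis thaliana[Organism] AND rbcL[Gene]")
fetch_url = build_efetch_url(["NC_000932.1", "U00096.3"])
```

Either URL can then be opened with any HTTP client. Keeping URL construction separate from the network call makes the query parameters easy to inspect before anything is sent.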
Acknowledgements
Funding for this work was provided by the Intramural Research Program of the National Institutes of Health, National Library of Medicine.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Sayers, E.W., O’Sullivan, C., Karsch-Mizrachi, I. (2022). Using GenBank and SRA. In: Edwards, D. (eds) Plant Bioinformatics. Methods in Molecular Biology, vol 2443. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2067-0_1
DOI: https://doi.org/10.1007/978-1-0716-2067-0_1
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2066-3
Online ISBN: 978-1-0716-2067-0
eBook Packages: Springer Protocols