Abstract
The Protein Data Bank (PDB) is the single worldwide archive of experimentally determined macromolecular structure data. Established in 1971 as the first open-access data resource in biology, the PDB archive is managed by the worldwide Protein Data Bank (wwPDB) consortium, which has four partners: the RCSB Protein Data Bank (RCSB PDB; rcsb.org), the Protein Data Bank Japan (PDBj; pdbj.org), the Protein Data Bank in Europe (PDBe; pdbe.org), and BioMagResBank (BMRB; www.bmrb.wisc.edu). The PDB archive currently includes ~175,000 entries. The wwPDB has established a number of task forces and working groups that bring together experts from the community, who provide recommendations on data standards and data validation to improve data quality and integrity. The wwPDB members continue to develop the joint deposition, biocuration, and validation system (OneDep) to improve data quality and accommodate new data from emerging techniques such as three-dimensional electron microscopy (3DEM). Each PDB entry contains a coordinate model and associated metadata for all experimentally determined atomic structures, experimental data for the traditional structure determination techniques (X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy), validation reports, and additional information on quaternary structures. The wwPDB partners are committed to following the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles and have implemented a DOI resolution mechanism that provides access to all the relevant files for a given PDB entry. On average, >250 new entries are added to the archive every week and made available by each wwPDB partner via its FTP area. The wwPDB partner sites also develop data access and analysis tools and make these available via their websites. The wwPDB continues to work with experts in the community to establish a federation of archives for structures determined using integrative/hybrid methods, in which multiple experimental techniques are combined.
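As an illustration of the DOI resolution mechanism mentioned above, the following minimal Python sketch builds the DOI for a PDB entry from its identifier. It assumes the classic four-character PDB ID and the DOI pattern 10.2210/pdbXXXX/pdb, neither of which is spelled out in this abstract; the function names are hypothetical and chosen only for this example. The sketch makes no network request and only constructs the strings.

```python
import re

# Assumption: classic four-character PDB ID (a digit followed by three
# alphanumeric characters). This pattern is not given in the abstract.
_PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def pdb_entry_doi(pdb_id: str) -> str:
    """Return the DOI string for a PDB entry (hypothetical helper).

    Assumes the DOI pattern 10.2210/pdbXXXX/pdb; the ID is lowercased
    here only for consistency, since DOIs are case-insensitive.
    """
    pdb_id = pdb_id.strip()
    if not _PDB_ID.match(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def pdb_entry_doi_url(pdb_id: str) -> str:
    """Return the resolver URL (doi.org) for the entry's DOI."""
    return "https://doi.org/" + pdb_entry_doi(pdb_id)
```

For example, `pdb_entry_doi_url("4HHB")` yields a doi.org link for the haemoglobin entry 4HHB, which can then be opened in a browser or passed to any HTTP client.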
Acknowledgments
The Protein Data Bank in Europe is supported by the European Molecular Biology Laboratory-European Bioinformatics Institute; Wellcome Trust [104948]; Biotechnology and Biological Sciences Research Council [BB/G022577/1, BB/J007471/1, BB/K016970/1, BB/K020013/1, BB/M013146/1, BB/M011674/1, BB/M020347/1, BB/M020428/1, BB/P024351/1]; European Union [284209], ELIXIR, and Open Targets. The RCSB PDB is jointly funded by the National Science Foundation (DBI-1832184), the National Institutes of Health (R01GM133198), and the United States Department of Energy (DE-SC0019749). PDBj is funded by the National Bioscience Database Center of Japan Science and Technology Agency (JST-NBDC), the Basis for Supporting Innovative Drug Discovery and Life Science Research of Japan Agency for Medical Research and Development (AMED-BINDS), and the Joint Usage/Research Center project assigned to the Institute for Protein Research, Osaka University, by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan. BMRB is supported by US National Institutes of Health (NIH) grant R01GM109046. We gratefully acknowledge contributions from John Berrisford, Aleks Gutmanas, Eldon L. Ulrich, Jasmine Young, and John Westbrook, and all wwPDB staff members present and past. We would also like to acknowledge wwPDB collaborators and partners at the EMDB, SASBDB, CCP4, CCPEM, CCPN, and the global structural biology and bioinformatics communities.
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Velankar, S., Burley, S.K., Kurisu, G., Hoch, J.C., Markley, J.L. (2021). The Protein Data Bank Archive. In: Owens, R.J. (ed.) Structural Proteomics. Methods in Molecular Biology, vol 2305. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1406-8_1
DOI: https://doi.org/10.1007/978-1-0716-1406-8_1
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1405-1
Online ISBN: 978-1-0716-1406-8
eBook Packages: Springer Protocols