State-of-the-Art Data Management: Improving the Reproducibility, Consistency, and Traceability of Structural Biology and in Vitro Biochemical Experiments

  • Protocol
Structural Genomics

Abstract

Efficient and comprehensive data management is an indispensable component of modern scientific research and requires effective tools for all but the most trivial experiments. The LabDB system developed and used in our laboratory was originally designed to track the progress of a structure determination pipeline in several large National Institutes of Health (NIH) projects. While initially designed for structural biology experiments, its modular nature makes it easily applied in laboratories of various sizes in many experimental fields. Over many years, LabDB has transformed into a sophisticated system integrating a range of biochemical, biophysical, and crystallographic experimental data, which harvests data both directly from laboratory instruments and through human input via a web interface. The core module of the system handles many types of universal laboratory management data, such as laboratory personnel, chemical inventories, storage locations, and custom stock solutions. LabDB also tracks various biochemical experiments, including spectrophotometric and fluorescent assays, thermal shift assays, isothermal titration calorimetry experiments, and more. LabDB has been used to manage data for experiments that resulted in over 1200 deposits to the Protein Data Bank (PDB); the system is currently used by the Center for Structural Genomics of Infectious Diseases (CSGID) and several large laboratories. This chapter also provides examples of data mining analyses and warnings about incomplete and inconsistent experimental data. These features, together with its capabilities for detailed tracking, analysis, and auditing of experimental data, make the described system uniquely suited to inspect potential sources of irreproducibility in life sciences research.

Acknowledgments

We thank all the users of our data management programs who over many years provided us with numerous complaints, suggestions, and requests that gave us invaluable feedback to improve our tools. This work was supported by the National Institute of General Medical Sciences under Grants GM117080 and GM117325, the National Institutes of Health BD2K program under Grant HG008424, and the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under Contract Nos. HHSN272201700060C and HHSN272201200026C.

Disclosure statement: One of the authors (W.M.) notes that he has also been involved in the development of state-of-the-art software and data management and mining tools; some of them were commercialized by HKL Research, Inc. and are mentioned in the paper. W.M. is the co-founder of HKL Research, Inc. and a member of the board.

The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Corresponding author

Correspondence to Wladek Minor.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Cooper, D.R. et al. (2021). State-of-the-Art Data Management: Improving the Reproducibility, Consistency, and Traceability of Structural Biology and in Vitro Biochemical Experiments. In: Chen, Y.W., Yiu, CP.B. (eds) Structural Genomics. Methods in Molecular Biology, vol 2199. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0892-0_13

  • DOI: https://doi.org/10.1007/978-1-0716-0892-0_13

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0891-3

  • Online ISBN: 978-1-0716-0892-0

  • eBook Packages: Springer Protocols
