State-of-the-Art Data Management: Improving the Reproducibility, Consistency, and Traceability of Structural Biology and in Vitro Biochemical Experiments

Cooper, David R.; Grabowski, Marek; Zimmerman, Matthew D.; Porebski, Przemyslaw J.; Shabalin, Ivan G.; Woinska, Magdalena; Domagalski, Marcin J.; Zheng, Heping; Sroka, Piotr; Cymborowski, Marcin; Czub, Mateusz P.; Niedzialkowska, Ewa; Venkataramany, Barat S.; Osinski, Tomasz; Fratczak, Zbigniew; Bajor, Jacek; Gonera, Juliusz; MacLean, Elizabeth; Wojciechowska, Kamila; Konina, Krzysztof; Wajerowicz, Wojciech; Chruszcz, Maksymilian; Minor, Wladek

doi:10.1007/978-1-0716-0892-0_13

Part of the book series: Methods in Molecular Biology (MIMB, volume 2199)

Abstract

Efficient and comprehensive data management is an indispensable component of modern scientific research and requires effective tools for all but the most trivial experiments. The LabDB system developed and used in our laboratory was originally designed to track the progress of a structure determination pipeline in several large National Institutes of Health (NIH) projects. While initially designed for structural biology experiments, its modular design makes it readily applicable to laboratories of various sizes in many experimental fields. Over many years, LabDB has evolved into a sophisticated system that integrates a range of biochemical, biophysical, and crystallographic experimental data and harvests data both directly from laboratory instruments and through human input via a web interface. The core module of the system handles many types of universal laboratory management data, such as laboratory personnel, chemical inventories, storage locations, and custom stock solutions. LabDB also tracks various biochemical experiments, including spectrophotometric and fluorescent assays, thermal shift assays, isothermal titration calorimetry experiments, and more. LabDB has been used to manage data for experiments that resulted in over 1200 deposits to the Protein Data Bank (PDB); the system is currently used by the Center for Structural Genomics of Infectious Diseases (CSGID) and several large laboratories. This chapter also provides examples of data mining analyses and warnings about incomplete and inconsistent experimental data. These features, together with its capabilities for detailed tracking, analysis, and auditing of experimental data, make the described system uniquely suited to inspect potential sources of irreproducibility in life sciences research.

Acknowledgments

We thank all the users of our data management programs who over many years provided us with numerous complaints, suggestions, and requests that gave us invaluable feedback to improve our tools. This work was supported by the National Institute of General Medical Sciences under grants GM117080 and GM117325, the National Institutes of Health BD2K program under grant HG008424, and the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under Contract Nos. HHSN272201700060C and HHSN272201200026C.

Disclosure statement: One of the authors (W.M.) notes that he has also been involved in the development of state-of-the-art software and data management and mining tools; some of them were commercialized by HKL Research, Inc. and are mentioned in the paper. W.M. is the co-founder of HKL Research, Inc. and a member of the board.

The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Author information

David R. Cooper and Marek Grabowski contributed equally to this work.

Authors and Affiliations

Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
David R. Cooper, Marek Grabowski, Matthew D. Zimmerman, Przemyslaw J. Porebski, Ivan G. Shabalin, Magdalena Woinska, Marcin J. Domagalski, Heping Zheng, Piotr Sroka, Marcin Cymborowski, Mateusz P. Czub, Ewa Niedzialkowska, Barat S. Venkataramany, Tomasz Osinski, Zbigniew Fratczak, Jacek Bajor, Juliusz Gonera, Elizabeth MacLean, Kamila Wojciechowska, Krzysztof Konina, Wojciech Wajerowicz, Maksymilian Chruszcz & Wladek Minor
Center for Structural Genomics of Infectious Diseases, University of Virginia, Charlottesville, VA, USA
David R. Cooper, Marek Grabowski, Ivan G. Shabalin, Magdalena Woinska, Marcin J. Domagalski, Piotr Sroka, Marcin Cymborowski, Mateusz P. Czub, Ewa Niedzialkowska & Wladek Minor
HKL Research, Inc., Charlottesville, VA, USA
David R. Cooper
Department of Chemistry and Biochemistry, University of South Carolina, Columbia, SC, USA
Maksymilian Chruszcz

Corresponding author

Correspondence to Wladek Minor.

Editor information

Editors and Affiliations

Department of Applied Biology and Chemical Technology and the State Key Laboratory of Chemical Biology and Drug Discovery, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Yu Wai Chen
BMI Biotechnology Appraisals & Investments Limited, Wan Chai, Hong Kong
Chin-Pang Bennu Yiu

Cite this protocol

Cooper, D.R. et al. (2021). State-of-the-Art Data Management: Improving the Reproducibility, Consistency, and Traceability of Structural Biology and in Vitro Biochemical Experiments. In: Chen, Y.W., Yiu, CP.B. (eds) Structural Genomics. Methods in Molecular Biology, vol 2199. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0892-0_13

DOI: https://doi.org/10.1007/978-1-0716-0892-0_13
Published: 31 October 2020
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0891-3
Online ISBN: 978-1-0716-0892-0
eBook Packages: Springer Protocols
