Containers for computational reproducibility

  • Primer
From Nature Reviews Methods Primers

Abstract

The fast-paced development of computational tools has enabled tremendous scientific progress in recent years. However, this rapid surge in technological capability also comes at a cost: it increases the complexity of software environments and the potential for compatibility issues across systems. Advanced data-processing and analysis workflows often require specific software versions and operating systems to run smoothly, and discrepancies across machines and researchers can impede reproducibility and efficient collaboration. As a result, scientific teams are increasingly relying on containers to implement robust, dependable research ecosystems. Originally popularized in software engineering, containers have become common in scientific projects, particularly in large collaborative efforts. In this Primer, we describe what containers are, how they work and the rationale for their use in scientific projects. We review state-of-the-art implementations in diverse contexts, with examples from various scientific fields. Finally, we discuss the possibilities enabled by the widespread adoption of containerization, especially in the context of open and reproducible research, and propose recommendations to facilitate seamless implementation across platforms and domains, including within high-performance computing clusters such as those typically available at universities and research institutes.

Fig. 1: Docker architecture.

Acknowledgements

D.M. and K.W. are supported by a Marsden grant from the Royal Society of New Zealand and a University of Auckland Early Career Research Excellence Award awarded to D.M.

Author information

Contributions

Introduction (D.M., K.W. and C.B.); Experimentation (D.M., K.W. and C.B.); Results (D.M., K.W. and C.B.); Applications (D.M., K.W. and C.B.); Reproducibility and data deposition (D.M., K.W. and C.B.); Limitations and optimizations (D.M., K.W. and C.B.); Outlook (D.M., K.W. and C.B.).

Corresponding author

Correspondence to David Moreau.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Nature Reviews Methods Primers thanks Beth Cimini, Stephen Piccolo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

ACM Digital Library: https://dl.acm.org

Amazon Web Services: https://aws.amazon.com

Ansible: https://ansible.com

Astropy: https://astropy.org

ATAC-seq Pipeline: https://github.com/ENCODE-DCC/atac-seq-pipeline

BCI2000 project: https://bci2000.org/

Binder: https://mybinder.org/

Bioconductor: https://bioconductor.org

BioContainers: https://biocontainers.pro

Bismark: https://www.bioinformatics.babraham.ac.uk/projects/bismark/

Breakdancer: https://github.com/genome/breakdancer

CERN Container Registry: https://hub.docker.com/u/cern

Chef: https://chef.io

CodeOcean: http://codeocean.com

Containerd: https://containerd.io/

Docker Hub: https://hub.docker.com

EarthData: https://earthdata.nasa.gov

EcoData Retriever: https://ecodataretriever.org

Ecological Niche Modelling on Docker: https://github.com/ghuertaramos/ENMOD

Ecopath: https://ecopath.org/

EIGENSOFT: https://hsph.harvard.edu/alkes-price/software/eigensoft

Environmental Data Commons: https://edc.occ-data.org

Experiment Factory: https://expfactory.github.io

F1000Research guidelines: https://f1000research.com/for-authors/article-guidelines/software-tool-articles

fmriprep: https://fmriprep.org

FSL project: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki

GATK: https://gatk.broadinstitute.org

gdb: https://github.com/haggaie/docker-gdb

GEANT4: https://geant4.web.cern.ch

GeoServer: https://geoserver.org

GitHub Actions: https://github.com/features/actions

GitHub Container Registry: https://github.com/features/packages

Google Cloud Platform: https://cloud.google.com

GRASS GIS: https://grass.osgeo.org

Jenkins: https://jenkins.io

liftr: https://liftr.me/

LIGO Open Science Centre: https://losc.ligo.org

LXC: https://linuxcontainers.org

Marble Station: https://github.com/marblestation/docker-astro

Mesos: https://mesos.apache.org

NEST: https://nest-simulator.org

NeuroDebian: https://neuro.debian.net

NEURON: https://neuron.yale.edu/neuron

OpenShift: https://openshift.com/

Planet Research Data Commons: https://ardc.edu.au/program/planet-research-data-commons

Podman: https://podman.io/

Puppet: https://puppet.com

QGIS: https://qgis.org

Quay: https://quay.io

Rocker project: https://rocker-project.org/

Rocket: https://github.com/rkt/rkt

Salmon: https://combine-lab.github.io/salmon

SciServer: https://sciserver.org

Singularity: https://sylabs.io/

STAR: https://github.com/alexdobin/STAR

strace: https://github.com/amrabed/strace-docker

SVTyper: https://github.com/hall-lab/svtyper

Supplementary information

Glossary

Clusters

Groups of machines that work together to run containerized applications.

Compute resources

The resources required by a container to run, including central processing units, memory and storage.

Containerization platform

A complete system for building, deploying and managing containerized applications, typically including a container runtime and additional tools and services for tasks such as container orchestration, networking, storage and security.

Container runtime

The software responsible for running and managing containers on a host machine, involving tasks such as starting and stopping containers, allocating resources to them and providing an isolated environment for them to run in.

Continuous Integration/Continuous Deployment

(CI/CD). A software development practice that involves continuously integrating code changes into a shared repository and continuously deploying changes to a production environment.

Dependencies

Software components that a particular application relies on to run properly, including libraries, tools and frameworks.

Distributed-control model

A deployment model in which control is distributed among multiple independent nodes, rather than being centralized in a single control node.

Docker engine

The containerization technology that Docker uses, consisting of the Docker daemon, which runs on the host machine, and the Docker client, which communicates with the daemon to execute commands.

Dockerfiles

Scripts that contain the instructions for building a Docker image.

Environment variables

Variables passed to a container at runtime, allowing the container to configure itself on the basis of their values.

High-performance computing

The use of supercomputers and parallel processing techniques to solve complex computational problems that require a large amount of processing power, memory and storage capacity.

Host operating system

The primary operating system running on the physical computer or server on which virtual machines or containers are created and managed.

Image

A preconfigured package that contains all the necessary files and dependencies for running a piece of software in a container.

Namespaces

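Linux kernel isolation mechanisms that give each container its own view of system resources, such as process IDs, network interfaces and file-system mount points, allowing multiple containers to share the same host kernel without interfering with each other.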

Networking

The process of connecting multiple containers together and to external networks, allowing communication between containers and the outside world.

Orchestration

The process of automating the deployment, scaling and management of containerized applications in a cluster.

Orchestration platform

System for automating the deployment, scaling and management of containerized applications.

Port mapping

The process of exposing the network ports of a container to the host machine, allowing communication between the container and the host or other networked systems.

Production environment

Live, operational system in which software applications are deployed and used by end-users.

Runtime environment

The specific set of software and hardware configurations available to an application when it runs, including the operating system, libraries, system tools and other dependencies.

Scaling

The process of increasing or decreasing the number of running instances of a containerized application to meet changing demand.

Shared-control model

Deployment model in which a single central entity has control over multiple resources or nodes.

Volumes

A storage mechanism for containers, which allows data to persist outside the file system of the container, including after a container has been deleted or replaced.
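
Several glossary terms (Volumes, Port mapping, Environment variables) correspond directly to options of the `docker run` command: `-v` mounts host storage into the container, `-p` maps a host port to a container port, `-e` sets an environment variable and `--rm` deletes the container when it exits. The minimal Python sketch below only assembles that command as an argument list; it does not call Docker. `RunSpec` and `docker_run_args` are hypothetical names, and the image `rocker/rstudio:4.3.1`, port 8787 and the `PASSWORD` variable are assumptions based on the Rocker project's RStudio images, used here purely as placeholders.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class RunSpec:
    """Hypothetical description of one container run (not a Docker API type)."""

    image: str
    volumes: dict[str, str] = field(default_factory=dict)  # host path -> container path
    ports: dict[int, int] = field(default_factory=dict)  # host port -> container port
    env: dict[str, str] = field(default_factory=dict)  # variable name -> value
    remove_on_exit: bool = True


def docker_run_args(spec: RunSpec) -> list[str]:
    """Translate a RunSpec into an argument list for the `docker run` command."""
    args = ["docker", "run"]
    if spec.remove_on_exit:
        # Remove the container when it exits; files in bind-mounted host
        # directories are left in place.
        args.append("--rm")
    for host_path, container_path in spec.volumes.items():
        # Bind-mount a host directory so data outlives the container.
        args += ["-v", f"{host_path}:{container_path}"]
    for host_port, container_port in spec.ports.items():
        args += ["-p", f"{host_port}:{container_port}"]  # port mapping
    for name, value in spec.env.items():
        args += ["-e", f"{name}={value}"]  # environment variable read at runtime
    args.append(spec.image)  # the image name must follow all options
    return args


if __name__ == "__main__":
    spec = RunSpec(
        image="rocker/rstudio:4.3.1",
        volumes={"/home/me/project": "/home/rstudio/project"},
        ports={8787: 8787},
        env={"PASSWORD": "change-me"},
    )
    # Print a shell-safe command line; pass the list to subprocess.run to execute it.
    print(shlex.join(docker_run_args(spec)))
```

Running the script prints `docker run --rm -v /home/me/project:/home/rstudio/project -p 8787:8787 -e PASSWORD=change-me rocker/rstudio:4.3.1`. Keeping the command as a list rather than a single string lets `subprocess.run` execute it without shell-quoting pitfalls.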
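
The Dockerfiles, Image and Dependencies entries can be illustrated with a small generator that writes a Dockerfile whose base image tag and package versions are pinned, which reduces (but does not eliminate) drift when the image is rebuilt later. The sketch below uses only standard Dockerfile instructions (`FROM`, `ENV`, `RUN`, `WORKDIR`, `COPY`, `CMD`); the function name `render_dockerfile`, the script `run_analysis.py` and the package versions are hypothetical, and the pinning check is deliberately crude. Tags can be re-pushed, so pinning by digest (`name@sha256:...`) is the stricter option.

```python
import json


def render_dockerfile(
    base_image: str,
    pinned_packages: dict[str, str],
    env: dict[str, str],
    workdir: str,
    command: list[str],
) -> str:
    """Return Dockerfile text with a pinned base image and pinned pip packages."""
    # Crude check: the last path segment must carry a tag (name:tag) or the
    # reference must carry a digest (name@sha256:...). Untagged names resolve
    # to `latest`, which can point to different images over time.
    last_segment = base_image.rsplit("/", 1)[-1]
    if ":" not in last_segment and "@" not in base_image:
        raise ValueError(f"base image {base_image!r} is not pinned")
    lines = [f"FROM {base_image}"]
    for name, value in env.items():
        lines.append(f"ENV {name}={json.dumps(value)}")  # double-quoted value
    if pinned_packages:
        # Sort so identical inputs always produce byte-identical Dockerfiles.
        specs = " ".join(f"{pkg}=={ver}" for pkg, ver in sorted(pinned_packages.items()))
        lines.append(f"RUN pip install --no-cache-dir {specs}")
    lines.append(f"WORKDIR {workdir}")
    lines.append("COPY . .")  # copy the build context into WORKDIR
    lines.append(f"CMD {json.dumps(command)}")  # exec form: JSON array, no shell
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(
        base_image="python:3.11-slim",
        pinned_packages={"pandas": "2.1.4", "numpy": "1.26.4"},
        env={"PYTHONHASHSEED": "0"},
        workdir="/analysis",
        command=["python", "run_analysis.py"],
    ), end="")
```

For these inputs the generated file starts with `FROM python:3.11-slim`, installs `numpy==1.26.4 pandas==2.1.4` in a single `RUN` layer and ends with the exec-form `CMD ["python", "run_analysis.py"]`. Pinned top-level packages can still pull in unpinned transitive dependencies, so a full lock file is a natural next step.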
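
The Orchestration and Scaling entries describe platforms that adjust the number of running container instances to demand. Many orchestration platforms do this by repeatedly comparing a desired number of instances with the number actually running, then starting or stopping containers to close the gap. The sketch below models only that comparison step; `ScalingAction`, `reconcile` and the queue-based rule in `replicas_for_queue` are hypothetical and do not reproduce the scheduling logic of any specific platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingAction:
    """Hypothetical result of one reconciliation step."""

    start: int  # number of container instances to launch
    stop: int  # number of container instances to terminate


def reconcile(desired: int, running: int) -> ScalingAction:
    """Compare desired and observed instance counts and return the change needed."""
    if desired < 0 or running < 0:
        raise ValueError("instance counts must be non-negative")
    gap = desired - running
    return ScalingAction(start=max(gap, 0), stop=max(-gap, 0))


def replicas_for_queue(queued_jobs: int, jobs_per_replica: int, max_replicas: int) -> int:
    """Hypothetical scaling rule: enough replicas to absorb the queue, capped by cluster size."""
    if jobs_per_replica <= 0:
        raise ValueError("jobs_per_replica must be positive")
    needed = -(-queued_jobs // jobs_per_replica)  # ceiling division on integers
    return min(needed, max_replicas)


if __name__ == "__main__":
    target = replicas_for_queue(queued_jobs=23, jobs_per_replica=5, max_replicas=4)
    print(target, reconcile(desired=target, running=1))
```

With 23 queued jobs, 5 jobs per replica and a cap of 4 replicas, `replicas_for_queue` returns 4; with one replica already running, `reconcile` asks for 3 more and none to stop. A real platform would additionally handle failed instances, resource limits and placement across the nodes of a cluster.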

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Moreau, D., Wiebels, K. & Boettiger, C. Containers for computational reproducibility. Nat Rev Methods Primers 3, 50 (2023). https://doi.org/10.1038/s43586-023-00236-9

  • DOI: https://doi.org/10.1038/s43586-023-00236-9

  • Springer Nature Limited
