Abstract
Like other types of computational research, modeling and simulation of biological processes (biomodels) is still largely communicated without sufficient detail to allow independent reproduction of results. But reproducibility in this area of research could easily be achieved by making use of existing resources, such as supplying models in standard formats and depositing code, models, and results in public repositories.
1 Introduction
Reproducibility is at the core of the scientific process. A basic aspect of science is the ability to establish reproducible effects. Karl Popper’s notion of falsifiability depends on reproducibility: a theory is falsified when reproducible effects refute it (Popper 1959). This means that results of scientific investigations must be reproducible or else support for “discoveries” becomes discredited (e.g., see Maddox et al. 1988). Therefore, scientific reports must describe experiments in sufficient detail to allow other researchers to reproduce them. A recent survey suggests that there is a problem with lack of reproducibility in a large proportion of scientific articles (Baker 2016).
This reproducibility crisis has been highlighted mainly for experimental research. At first glance, it could appear that computational research would not suffer from such problems since, after all, computers follow specific sets of instructions (programs) and thus can be run in a reproducible manner. This idea was formalized in the 1990s by Claerbout and Karrenbach (1992) who described how electronic publications could easily be made into reproducible publications. For example, they could allow the reader to re-execute analyses and plots directly from the data through an appropriately associated program. These authors already highlight, though, that a critical issue underlying computational reproducibility is that the programs used should be open source in order to be available to all readers. Claerbout and Karrenbach were essentially optimistic in how computational resources were going to make publications more reproducible, writing that “With workstations becoming widespread and software available, the burdens imposed on the author to create reproducible results are little more than the task of filing everything systematically” (Claerbout and Karrenbach 1992). Unfortunately this optimism did not materialize.
It is now widely accepted that the problem with reproducibility extends also to computational research (Mesirov 2010; Peng 2011; Stodden et al. 2016). And while some journals have adopted a few measures to minimize this problem (Greenbaum et al. 2017; Guerreiro 2017; Loew et al. 2015; Peng 2009), a large proportion of articles describing computational research are still hard or impossible to reproduce (Hothorn et al. 2009; Hothorn and Leisch 2011; Hübner et al. 2011; Stodden et al. 2018).
2 Which Reproducibility?
It is rather unhelpful that the word ‘reproducibility’ has been used with different meanings in this context, as summarized by Plesser (2018). The terminology proposed by Goodman et al. (2016) seems the most appropriate, and I will follow it here. These authors distinguish three different types of reproducibility:
- reproducibility of methods requires one to be able to exactly reproduce the results using the same methods on the same data;
- reproducibility of results requires one to obtain similar results in an independent study applying similar procedures;
- reproducibility of inferences requires the same conclusions to be reached in an independent replication potentially following a different methodology.
All of these types of reproducibility are desirable but should be addressed differently. The reproducibility of inferences requires that similar conclusions are reached when an independent approach is applied to a problem. Thus, this aspect is perhaps the least problematic since it can be satisfied by clearly describing the problem and the conclusions. At the other end of the spectrum, achieving reproducibility of methods in computational research requires that the exact same software and input data be made available. To achieve reproducibility of results, the methods and algorithms must be well defined and the input data are still required. Claerbout and Karrenbach were right that achieving all of these levels of reproducibility in computational research “merely” requires describing all steps in detail, but therein lies the problem.
3 Biomodels
A subset of computational research concerns the use of dynamic models of biological systems (‘biomodels’). This includes a wide range of models, from those representing basic physicochemical phenomena, such as ligand-receptor binding, all the way to models of disease transmission in populations or of entire ecosystems. Models of biochemical reaction networks or pathways are perhaps the most widely represented in this class. Computational biomodels have been reported since the dawn of computers (e.g., Chance et al. 1960) and are now increasingly used to help understand phenomena and make predictions, as can be witnessed by reading the pages of this journal and many others.
Research with biomodels may be better equipped to deal with reproducibility than other types of computational research. A number of researchers have agreed on various standards and principles that promote reproducibility. Most notable of all is the systems biology markup language (SBML, Hucka et al. 2003), a specification for biomodels that many software packages can read and write, thus promoting reproducibility [see also CellML (Lloyd et al. 2004), NeuroML (Gleeson et al. 2010), PharmML (Swat et al. 2015)]. Standards have also been proposed for model diagrams [SBGN (Le Novère et al. 2009)], simulation specifications [SED-ML (Waltemath et al. 2011)], and data [SBRML (Dada et al. 2010)]. Finally, there are also minimal information recommendations for publishing models [MIRIAM (Le Novère et al. 2005)] and simulations [MIASE (Waltemath et al. 2011)] that, if followed, assure a good level of reproducibility, essentially formalizing the process of “filing everything systematically” advocated by Claerbout and Karrenbach (1992).
SBML allows biomodels to be specified in a manner that describes the biology and mathematics without prescribing the algorithms to be applied. An SBML file describes the transformations (“reactions”) that variables (“species”) can undergo; the kinetics of the transformations are well specified with all necessary constants, and the initial state of the system is also included. In essence an SBML file contains all that is needed for software to construct and solve the equations. Because the algorithms are not prescribed, this allows SBML models to be used in different contexts and analyzed with different formalisms in addition to those used in the original research.
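The separation between model description and algorithm can be illustrated with a minimal sketch. The dictionary below is a toy stand-in for the kind of information an SBML file carries (species with initial values, constants, and reactions with stoichiometry and kinetics); it is not SBML, and the names `model`, `derivatives`, and `simulate_rk4`, the rate constants, and the choice of a fixed-step Runge-Kutta integrator are all assumptions made for illustration.

```python
# Toy declarative model of A <-> B. This is NOT real SBML; it only holds the
# same kinds of information: initial state, constants, reactions, kinetics.
model = {
    "species": {"A": 10.0, "B": 0.0},        # initial amounts (illustrative)
    "parameters": {"k1": 0.5, "k2": 0.1},    # rate constants (illustrative)
    "reactions": [
        # stoichiometry: negative = consumed, positive = produced
        {"id": "forward", "stoich": {"A": -1, "B": +1},
         "rate": lambda s, p: p["k1"] * s["A"]},
        {"id": "backward", "stoich": {"A": +1, "B": -1},
         "rate": lambda s, p: p["k2"] * s["B"]},
    ],
}


def derivatives(model, state):
    """Assemble d(species)/dt from the reaction list (stoichiometry x rate)."""
    dydt = {name: 0.0 for name in state}
    for rxn in model["reactions"]:
        v = rxn["rate"](state, model["parameters"])
        for name, coeff in rxn["stoich"].items():
            dydt[name] += coeff * v
    return dydt


def simulate_rk4(model, t_end, dt):
    """Integrate the assembled equations with classical fixed-step RK4."""
    state = dict(model["species"])
    for _ in range(int(round(t_end / dt))):
        k1 = derivatives(model, state)
        k2 = derivatives(model, {n: state[n] + 0.5 * dt * k1[n] for n in state})
        k3 = derivatives(model, {n: state[n] + 0.5 * dt * k2[n] for n in state})
        k4 = derivatives(model, {n: state[n] + dt * k3[n] for n in state})
        state = {
            n: state[n] + dt / 6.0 * (k1[n] + 2 * k2[n] + 2 * k3[n] + k4[n])
            for n in state
        }
    return state


final = simulate_rk4(model, t_end=50.0, dt=0.01)
print(final)
```

Nothing in `model` refers to the integrator, so the same description could be handed to a different solver or a different formalism without modification.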
With all the standards mentioned above, computational results obtained with biomodels can easily be made reproducible. Publishing the biomodels as SBML files, either as attached supplementary material or by inclusion in a database like BioModels (Chelliah et al. 2015), satisfies reproducibility of results because it makes the model immediately available for simulation with a range of software [e.g., COPASI (Hoops et al. 2006), VCell (Moraru et al. 2008), and many others]. Readers who then use the same software as the original authors also achieve reproducibility of methods. Because SBML does not prescribe the mathematical formalism, in some circumstances it can also facilitate reproducibility of inferences. For example, conclusions may have been obtained from an analysis using the linear noise approximation (e.g., Pahle et al. 2012), and others may reproduce the same conclusions using the Gillespie stochastic simulation algorithm (Gillespie 1976). This is possible because both methods are implemented in software packages that read SBML.
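As a rough illustration of reaching the same conclusion through a different formalism, the sketch below runs a direct-method Gillespie simulation of a toy reversible reaction A <-> B and compares the ensemble mean with the deterministic solution. It does not implement the linear noise approximation. The rate constants, molecule counts, seed, and the helper name `gillespie` are assumptions chosen for illustration, and the deterministic value uses the closed-form solution that exists for this particular linear system.

```python
import math
import random

K1, K2 = 0.5, 0.1                 # illustrative rate constants
INITIAL = {"A": 100, "B": 0}      # illustrative molecule counts

# Same reaction network as a deterministic model would use, but with
# propensities (probability per unit time) instead of rate laws.
REACTIONS = [
    {"stoich": {"A": -1, "B": +1}, "propensity": lambda s: K1 * s["A"]},
    {"stoich": {"A": +1, "B": -1}, "propensity": lambda s: K2 * s["B"]},
]


def gillespie(initial, reactions, t_end, rng):
    """Direct-method stochastic simulation; returns the state at t_end."""
    state = dict(initial)
    t = 0.0
    while True:
        props = [rxn["propensity"](state) for rxn in reactions]
        total = sum(props)
        if total <= 0.0:
            break                          # no reaction can fire any more
        t += rng.expovariate(total)        # exponential waiting time
        if t > t_end:
            break                          # next event falls after t_end
        threshold = rng.random() * total   # pick a reaction by propensity
        cumulative = 0.0
        for rxn, a in zip(reactions, props):
            cumulative += a
            if threshold < cumulative:
                for name, coeff in rxn["stoich"].items():
                    state[name] += coeff
                break
    return state


rng = random.Random(12345)        # fixed seed so the run can be repeated
t_end, runs = 20.0, 200
finals = [gillespie(INITIAL, REACTIONS, t_end, rng) for _ in range(runs)]
mean_A = sum(f["A"] for f in finals) / runs

# Deterministic solution of dA/dt = -K1*A + K2*B with A + B conserved.
total_molecules = INITIAL["A"] + INITIAL["B"]
a_eq = total_molecules * K2 / (K1 + K2)
deterministic_A = a_eq + (INITIAL["A"] - a_eq) * math.exp(-(K1 + K2) * t_end)

print(f"stochastic mean A = {mean_A:.2f}, deterministic A = {deterministic_A:.2f}")
```

Fixing the seed is what makes the stochastic run itself repeatable; agreement between the ensemble mean and the deterministic value is the kind of cross-formalism check the paragraph above describes.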
4 Publication of Electronic Materials
The most basic step toward reproducible research with biomodels is to publish the electronic materials used. This includes any programs used, the input data (usually constants and initial conditions), and results. Where should these materials be published? There are a number of options in current use, listed here in decreasing order of utility to the research community: public repositories; the journal website, as supplementary materials; the author’s Web site; and supply by the author upon a reader’s request.
Supplying materials only when requested is not practical, and there is plenty of evidence that authors too frequently fail to honor such requests (Stodden et al. 2018); additionally, those materials inevitably become lost as authors retire or die. Any results relying on materials “supplied upon request” should be considered non-reproducible, and journal editors should simply not allow this practice.
Publication of materials in authors’ Web sites is only minimally better: for a short while the materials are indeed immediately available to all, but dead links appear at a fast pace. This is due to frequent website redesigns, authors moving to other institutions, retirement, etc.
Publication of materials in public repositories and in journal Web sites (as supplementary materials) is much better because the materials are immediately available and likely to be findable for a longer time. Each option has some advantages over the other, and it is unclear which of the two may be better. Thus, it is recommended that authors follow both whenever possible.
Several public repositories are conveniently available and are being used by a growing number of authors. For code, the most popular are GitHub (https://www.github.com) and CRAN (for R programs, https://cran.r-project.org/), though there are several other options. For models, the most widely used repository is the BioModels database (https://www.ebi.ac.uk/biomodels-main/), which has the advantage of having curators that ensure the models do indeed produce the results that are described in the publications (Le Novère et al. 2006). In the many cases when this is not true, they contact authors and correct the issues. Thus, submission of models to BioModels already ensures a major verification of reproducibility of methods (Chelliah et al. 2015). For data of any kind (which could include programs and models), other repositories could be used: Zenodo (https://zenodo.org/), Dryad (https://datadryad.org//), and FigShare (https://figshare.com/), all of which issue digital object identifiers (DOI) for data sets.
5 Some Recommendations
Hübner et al. (2011) surveyed a sample of some 400 articles reporting research with biomodels and found that only a minority of them described the computational research performed in enough detail for it to be reproduced. Thus, it seems that despite the readily available tools to promote reproducibility described above, authors and journals are not applying them widely. In order to improve the present situation, the list below includes actions that authors should take to make their biomodel research more reproducible. This short list is partly based on the MIRIAM proposal (Le Novère et al. 2005). In addition, it is also important to consult recommendations made for computational research in general (Piccolo and Frampton 2016; Sandve et al. 2013; Stodden et al. 2016).
1. Whenever possible use existing peer-reviewed, actively maintained and open-source software to create and analyze models. That way the algorithms and their implementation have already been reviewed, are available to all readers, and most likely will make step 3 trivial. Remember to cite the software and mention the version number used.
2. If the research required specially written software, deposit the code in a public repository or at a minimum include it as supplementary material in the manuscript. This includes code that may require proprietary software (such as MATLAB, Comsol); your programs need to be published!
3. Whenever possible, encode the model in an accepted standard (SBML, CellML, etc.), include it as supplementary material, and submit it to a repository.
4. If it is not possible to encode the model in a standard, then include the full set of equations, parameter values, and initial conditions in the manuscript (at least in a supplement). Make sure that all algorithms used are specified unequivocally, as well as the software used (including version number).
5. Publish the numerical results as data files, either in a repository, or as supplemental files. (Note that the generic repositories mentioned above allow very large data sets to be deposited.)
Because some authors may disregard these recommendations, whether through ignorance, for the convenience of publishing quickly, or to make their research harder to reproduce, journal editors should enforce them. In particular, items 2, 4, and 5 are essential when 1 and 3 are not possible. As mentioned earlier, item 3 is the most comprehensive and enables all three types of reproducibility. Item 2 will only fulfill reproducibility of methods. Items 4 and 5, without any of the others, do not guarantee reproducibility but at least describe the model and results in detail.
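One way to act on the advice to publish numerical results as data files and to record software versions is to write the results together with a small manifest. The sketch below is only one possible layout, under stated assumptions: the file names, the manifest fields, the placeholder result values, and the hypothetical simulator name and version are all invented for illustration and are not prescribed by any standard mentioned in this article.

```python
import csv
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_results(path, header, rows):
    """Write numerical results as a plain CSV data file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def build_manifest(files, software):
    """Collect versions and checksums to deposit alongside the data files."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "software": software,  # name -> version, supplied by the author
        "files": {Path(f).name: sha256_of(f) for f in files},
    }


workdir = Path(tempfile.mkdtemp())
results_path = workdir / "timecourse.csv"

# Placeholder values standing in for real simulation output.
write_results(results_path, ["time", "A", "B"],
              [[0.0, 10.0, 0.0], [1.0, 6.24, 3.76]])

# "example_simulator" and its version are hypothetical.
manifest = build_manifest([results_path],
                          {"example_simulator": "0.0.0 (hypothetical)"})
manifest_path = workdir / "manifest.json"
manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
print(manifest_path.read_text())
```

Depositing the CSV file and the manifest together lets a reader confirm that the downloaded data match what the authors produced and see which software versions generated them.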
6 Conclusion
Reproducibility is clearly an important aspect of science for both experimental and computational research. Research using biomodels should be communicated in ways that make it reproducible too. A few actions can be taken that will greatly facilitate this objective. Scientific journals should not publish non-reproducible research and thus should promote, or even enforce, such actions.
References
Baker M (2016) 1,500 scientists lift the lid on reproducibility. Nature 533(7604):452–454. https://doi.org/10.1038/533452a
Chance B, Garfinkel D, Higgins J, Hess B (1960) Metabolic control mechanisms. V. A solution for the equations representing interaction between glycolysis and respiration in ascites tumor cells. J Biol Chem 235(8):2426–2439
Chelliah V, Juty N, Ajmera I, Ali R, Dumousseau M, Glont M, Hucka M, Jalowicki G, Keating S, Knight-Schrijver V, Lloret-Villas A, Natarajan KN, Pettit JB, Rodriguez N, Schubert M, Wimalaratne SM, Zhao Y, Hermjakob H, Le Novère N, Laibe C (2015) BioModels: ten-year anniversary. Nucleic Acids Res 43:D542–548. https://doi.org/10.1093/nar/gku1181
Claerbout JF, Karrenbach M (1992) Electronic documents give reproducible research a new meaning. Soc Explor Geophys. https://doi.org/10.1190/1.1822162
Dada JO, Spasić I, Paton NW, Mendes P (2010) SBRML: a markup language for associating systems biology data with models. Bioinformatics 26(7):932–938. https://doi.org/10.1093/bioinformatics/btq069
Gillespie DT (1976) A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 22:403–434
Gleeson P, Crook S, Cannon RC, Hines ML, Billings GO, Farinella M, Morse TM, Davison AP, Ray S, Bhalla US, Barnes SR, Dimitrova YD, Silver RA (2010) NeuroML: a language for describing data driven models of neurons and networks with a high degree of biological detail. PLoS Comput Biol 6(6):e1000815. https://doi.org/10.1371/journal.pcbi.1000815
Goodman SN, Fanelli D, Ioannidis JPA (2016) What does research reproducibility mean? Sci Transl Med 8(341):341ps12. https://doi.org/10.1126/scitranslmed.aaf5027
Greenbaum D, Rozowsky J, Stodden V, Gerstein M (2017) Structuring supplemental materials in support of reproducibility. Genome Biol 18:64. https://doi.org/10.1186/s13059-017-1205-3
Guerreiro M (2017) Forking software used in eLife papers to GitHub. https://elifesciences.org/inside-elife/dbcb6949/forking-software-used-in-elife-papers-to-github
Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, Singhal M, Xu L, Mendes P, Kummer U (2006) COPASI: a COmplex PAthway SImulator. Bioinformatics 22(24):3067–3074. https://doi.org/10.1093/bioinformatics/btl485
Hothorn T, Held L, Friede T (2009) Biometrical journal and reproducible research. Biom J 51(4):553–555. https://doi.org/10.1002/bimj.200900154
Hothorn T, Leisch F (2011) Case studies in reproducibility. Brief Bioinform 12(3):288–300. https://doi.org/10.1093/bib/bbq084
Hübner K, Sahle S, Kummer U (2011) Applications and trends in systems biology in biochemistry: systems biology in biochemical research. FEBS J 278(16):2767–2857. https://doi.org/10.1111/j.1742-4658.2011.08217.x
Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr JH, Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U, Le Novère N, Loew LM, Lucio D, Mendes P, Minch E, Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF, Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD, Stelling J, Takahashi K, Tomita M, Wagner J, Wang J (2003) The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19(4):524–531. https://doi.org/10.1093/bioinformatics/btg015
Le Novère N, Bornstein B, Broicher A, Courtot M, Donizelli M, Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL, Hucka M (2006) BioModels database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res 34(Database issue):D689–91. https://doi.org/10.1093/nar/gkj092
Le Novère N, Finney A, Hucka M, Bhalla US, Campagne F, Collado-Vides J, Crampin EJ, Halstead M, Klipp E, Mendes P, Nielsen P, Sauro H, Shapiro B, Snoep JL, Spence HD, Wanner BL (2005) Minimum information requested in the annotation of biochemical models (MIRIAM). Nat Biotechnol 23(12):1509–15. https://doi.org/10.1038/nbt1156
Le Novère N, Hucka M, Mi H, Moodie S, Schreiber F, Sorokin A, Demir E, Wegner K, Aladjem MI, Wimalaratne SM, Bergman FT, Gauges R, Ghazal P, Kawaji H, Li L, Matsuoka Y, Villéger A, Boyd SE, Calzone L, Courtot M, Dogrusoz U, Freeman TC, Funahashi A, Ghosh S, Jouraku A, Kim S, Kolpakov F, Luna A, Sahle S, Schmidt E, Watterson S, Wu G, Goryanin I, Kell DB, Sander C, Sauro H, Snoep JL, Kohn K, Kitano H (2009) The systems biology graphical notation. Nat Biotechnol 27(8):735–741. https://doi.org/10.1038/nbt.1558
Lloyd CM, Halstead MD, Nielsen PF (2004) CellML: its future, present and past. Prog Biophys Mol Biol 85(2–3):433–450. https://doi.org/10.1016/j.pbiomolbio.2004.01.004
Loew L, Beckett D, Egelman EH, Scarlata S (2015) Reproducibility of research in biophysics. Biophys J 108(7):E1. https://doi.org/10.1016/j.bpj.2015.03.002
Maddox J, Randi J, Stewart WW (1988) High-dilution experiments a delusion. Nature 334(6180):287–290. https://doi.org/10.1038/334287a0
Mesirov JP (2010) Accessible reproducible research. Science 327(5964):415–416. https://doi.org/10.1126/science.1179653
Moraru I, Morgan F, Li Y, Loew L, Schaff J, Lakshminarayana A, Slepchenko B, Gao F, Blinov M (2008) Virtual cell modelling and simulation software environment. IET Syst Biol 2(5):352–362. https://doi.org/10.1049/iet-syb:20080102
Pahle J, Challenger JD, Mendes P, McKane AJ (2012) Biochemical fluctuations, optimisation and the linear noise approximation. BMC Syst Biol 6(1):86. https://doi.org/10.1186/1752-0509-6-86
Peng RD (2009) Reproducible research and biostatistics. Biostatistics 10(3):405–408. https://doi.org/10.1093/biostatistics/kxp014
Peng RD (2011) Reproducible research in computational science. Science 334(6060):1226–1227. https://doi.org/10.1126/science.1213847
Piccolo SR, Frampton MB (2016) Tools and techniques for computational reproducibility. GigaScience. https://doi.org/10.1186/s13742-016-0135-4
Plesser HE (2018) Reproducibility vs. replicability: a brief history of a confused terminology. Front Neuroinform 11:76. https://doi.org/10.3389/fninf.2017.00076
Popper K (1959) The logic of scientific discovery. Hutchinson, London
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten simple rules for reproducible computational research. PLoS Comput Biol 9(10):e1003285. https://doi.org/10.1371/journal.pcbi.1003285
Stodden V, McNutt M, Bailey DH, Deelman E, Gil Y, Hanson B, Heroux MA, Ioannidis JPA, Taufer M (2016) Enhancing reproducibility for computational methods. Science 354(6317):1240–1241. https://doi.org/10.1126/science.aah6168
Stodden V, Seiler J, Ma Z (2018) An empirical analysis of journal policy effectiveness for computational reproducibility. Proc Natl Acad Sci 115(11):2584–2589. https://doi.org/10.1073/pnas.1708290115
Swat M, Moodie S, Wimalaratne S, Kristensen N, Lavielle M, Mari A, Magni P, Smith M, Bizzotto R, Pasotti L, Mezzalana E, Comets E, Sarr C, Terranova N, Blaudez E, Chan P, Chard J, Chatel K, Chenel M, Edwards D, Franklin C, Giorgino T, Glont M, Girard P, Grenon P, Harling K, Hooker A, Kaye R, Keizer R, Kloft C, Kok J, Kokash N, Laibe C, Laveille C, Lestini G, Mentré F, Munafo A, Nordgren R, Nyberg H, Parra-Guillen Z, Plan E, Ribba B, Smith G, Trocóniz I, Yvon F, Milligan P, Harnisch L, Karlsson M, Hermjakob H, Le Novère N (2015) Pharmacometrics markup language (PharmML): opening new perspectives for model exchange in drug development: PharmML - pharmacometrics markup language. CPT: Pharmacometr Syst Pharmacol 4(6):316–319. https://doi.org/10.1002/psp4.57
Waltemath D, Adams R, Beard DA, Bergmann FT, Bhalla US, Britten R, Chelliah V, Cooling MT, Cooper J, Crampin EJ, Garny A, Hoops S, Hucka M, Hunter P, Klipp E, Laibe C, Miller AK, Moraru I, Nickerson D, Nielsen P, Nikolski M, Sahle S, Sauro HM, Schmidt H, Snoep JL, Tolle D, Wolkenhauer O, Le Novère N (2011) Minimum information about a simulation experiment (MIASE). PLoS Comput Biol 7(4):e1001122. https://doi.org/10.1371/journal.pcbi.1001122
Waltemath D, Adams R, Bergmann FT, Hucka M, Kolpakov F, Miller AK, Moraru II, Nickerson D, Sahle S, Snoep JL, Le Novère N (2011) Reproducible computational biology experiments with SED-ML: the simulation experiment description markup language. BMC Syst Biol 5(1):198. https://doi.org/10.1186/1752-0509-5-198
Funding
Funding was provided by National Institute of General Medical Sciences (Grant No. GM080219).
Cite this article
Mendes, P. Reproducible Research Using Biomodels. Bull Math Biol 80, 3081–3087 (2018). https://doi.org/10.1007/s11538-018-0498-z
DOI: https://doi.org/10.1007/s11538-018-0498-z