
1 Theory-Based Simulation

A salient aspect of computational simulation, and the one which has attracted the most substantial philosophical interest so far, is its ability to extend the power and reach of theories in modern science beyond what could be achieved by pencil and paper alone. Work on simulations has concentrated on those built from established background theories or theoretical models, and on the relations between these simulations and theory. Examples have been sourced mainly from the physical sciences, including simulations in astrophysics, fluid dynamics, nanophysics, climate science and meteorology. Winsberg has been foremost in studying theory-driven forms of simulation and in promoting the importance of their philosophical investigation, arguing that such simulations set a new agenda for philosophy of science [5.1, 5.2, 5.3, 5.4, 5.5]. He uses the case of simulation to challenge the longstanding focus of philosophy of science on theories, particularly on how they are justified [5.1, 5.3, 5.5]. Simulations, he argues, cannot simply be understood as novel ways to test theories. They are in fact rarely used to help justify theories; rather, simulations apply existing theories in order to explore, explain and understand real and possible phenomena, or to make predictions about how such phenomena will evolve in time. Simulations thus open up a whole new set of philosophical issues concerning the practices and reliability of much modern science.

Winsberg’s analysis of theory-based simulation shares much with Cartwright’s [5.6] and Morgan and Morrison’s [5.7] challenges to the role of theories. Like them, he starts by strongly disputing the presupposition that simulations are somehow deductive derivations from theory. Simulations are applied principally in the physical sciences when the equations generated from a theory to represent a particular phenomenon are not analytically solvable. The path from a theory to a simulation requires processes of computerization, which transform equations into tractable computable structures through practices of discretization and idealization [5.8]. These practices employ specific transformations and simplifications, in combination with those, such as boundary conditions and symmetry assumptions, used to make the application of theoretical equations to a specific phenomenon tractable. As such, simulations are, according to Winsberg [5.1], better construed as particular articulations of a theory than as derivations from it. They make use of theoretical information, and of the credibility, explanatory scope and depth of well-established theories, to provide warrant for simulations of particular phenomena. Inferences drawn by computational simulations have several features in this regard: they are downward, motley and autonomous [5.9]. Inferences are downward because they move from theory to the real world (rather than from the real world to theory). They are motley because they depend not just on theory but on a large range of extra-theoretical techniques and resources in order to derive inferences: approximation and simplification techniques, numerical methods, algorithmic methods, computer languages and hardware, and much trial and error. Finally, simulations are autonomous, in the sense of being autonomous from both theory and data; simulations, according to Winsberg, are principally used to study phenomena where data are sparse or unavailable. These three conditions on inference from simulation require a specific philosophical evaluation of its reliability.
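
To make these practices concrete, here is a minimal, hypothetical sketch (in Python, and not drawn from the works cited) of the kind of step involved in discretization: a continuous theoretical equation, here the one-dimensional diffusion equation chosen purely for illustration, is turned into a tractable computational structure, with the grid resolution, time step, stability condition, idealized initial profile and boundary conditions standing in for the sorts of extra-theoretical choices a modeler must make.

```python
import numpy as np

# Illustrative sketch only: an explicit finite-difference discretization of the
# 1-D diffusion equation u_t = D * u_xx. All numerical choices below are
# hypothetical placeholders for the modeler's extra-theoretical decisions.
D = 1.0                    # diffusion coefficient (idealized)
nx, nt = 101, 500          # spatial grid points and time steps (modeler's choice)
dx = 1.0 / (nx - 1)
dt = 0.4 * dx**2 / D       # time step chosen to respect the stability limit dt <= dx^2 / (2*D)

x = np.linspace(0.0, 1.0, nx)
u = np.exp(-100.0 * (x - 0.5) ** 2)   # idealized initial condition: a narrow Gaussian pulse

for _ in range(nt):
    u_new = u.copy()
    # central difference in space, forward difference in time (FTCS scheme)
    u_new[1:-1] = u[1:-1] + D * dt / dx**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    u_new[0] = u_new[-1] = 0.0        # simplifying boundary condition: u = 0 at both ends
    u = u_new

print("peak of the diffused profile:", u.max())
```

Even in this toy case, the choice of scheme, the stability constraint and the boundary treatment are not dictated by the theory itself; they are the kinds of choices whose warrant, on Winsberg’s account, must be evaluated on their own terms.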

Such evaluation is complicated by the fact that the relations between theory and the inferences drawn from the simulation model are unclear and difficult to untangle. As Winsberg [5.1, 5.9] suggests, it is a complex task to unpack what role theories play in the final result, given all these intervening steps. The fact that much validation of simulations is done by matching simulation outputs to the data muddies the waters further (see also [5.10]). A well-matched simulation constructed through a downward, motley and autonomous process from a nonetheless well-established theory raises the question of the extent to which the confirmation afforded to the theory flows down to the simulation [5.2]. For instance, although fitting a certain data set might well be the dominant mode of validation of a simulation model, the model could be considered to hold outside the range of that data because it applies a well-accepted theory of the phenomenon thought to hold under very general conditions.

There is widespread agreement that untangling the relations between theories and simulations, and assessing the reliability of simulations built from theories, will require more in-depth investigation of the actual practices scientists use to justify the steps they take when building a simulation model. In the absence of such investigations, discussions of justification are limited to considerations about whether a simulation fits the observational data or not. Among other things, this limitation hides from view important issues about the warrant of the various background steps that transform theoretical information into simulations [5.10]. In general, what is required is an epistemology of simulation that can uncover the rigorous grounds upon which scientists can and do sanction their results and, more generally, the role of theory in modern science.

The concern with practices of simulation has opened up a new angle on the older discussion about the structure of theories. Humphreys [5.11] has used the entanglement of theory and simulation in modern scientific practice to reflect more explicitly upon the proper philosophical characterization of the structure of physical theories. Simulations, like other models, are not logical derivations from theory; treating them as such is a central, but incorrect, feature of the syntactic view. Humphreys also argues, however, that the now dominant semantic view of theories, which treats theories as nonlinguistic entities, is not adequate either. On the semantic view, the syntactic formulation of a theory, and whether different formulations are solvable or not, is not important for the philosophical assessment of relations of representation to the world; relations of representation are in fact sensibly held only by models, not theories. Both Humphreys and Winsberg construe the semantic view as dismissing the role of theories, in favor of models, in both normative and descriptive accounts of science. But as Humphreys [5.12, p. 620] puts it, “the specific syntactic representation used is often crucial to the solvability of a theory’s equations”, and thus to the solvability of models derived from it. Computational tractability, as well as choices of approximation and simplification techniques, will depend on the particular syntax of a theory. Hence both the semantic and syntactic views are inadequate for describing theories in ways that capture their role in science.

2 Simulation not Driven by Theory

Investigations, such as those by Winsberg and others discussed in the previous section, have illustrated the importance of close attention to scientific practice and discovery when studying simulations. Simulation manifests application-intensive, rather than theoretical, processes of scientific investigation. As Winsberg [5.1] suggests, choices about how to model a phenomenon reliably are often developed in the course of the to and fro, blood, sweat and tears of the model-building process itself. Abstract armchair points of view, distant from an understanding of the contingent, but also technical and technological, nature of these practices and their affordances, will not put philosophers in a position to make relevant normative assessments of good simulation practices. What has thus far been established by the accounts of theory-based simulation is that even where there is an established theory of the phenomena, simulation model-building has a degree of independence from theory and theory-building.

However, though the initial focus on theory-based simulation in the study of simulation is not surprising given the historical preference in philosophy of science for treating theory as the principal unit of philosophical investigation, simulations are not a tool of theory-driven science alone. Pushing philosophical investigation into model-building practices outside the domain of theory-driven science reveals whole new practices of scientific model production using computational simulations that are not in fact theory-based, in the sense of the traditional physical sciences. Some of the most compelling and innovative fields in science today, including, for instance, big-data biology, systems biology and neuroscience, and much modeling in the social sciences, are not theory-driven. As Winsberg [5.5] admits (in response to Parker [5.13]), his description of simulation modeling is theory-centric, and neither necessarily applicable to understanding the processes by which simulation models are built in the absence of theory, nor an appropriate framework for assessing the reliability and informativeness of models built that way. This is not to say that the characteristics of theory-based simulation are irrelevant to simulations that are not theory-based. Both share a degree of independence from theory, and there are likely to be further similarities between them, but there are also profound differences.

One kind of simulation that is important in this regard is agent-based modeling. Keller [5.14] has labeled much agent-based modeling as modeling from above, in the sense that such models are not constructed using a mathematical theory that governs the motions of agents; agents follow local interaction rules. In many fields in the social sciences and biology, differential equations cannot be used to aggregate agent or population behavior accurately, but it is nonetheless possible to hypothesize or observe the structure of individual interactions. An agent-based model can be used to run those interactions over a large population to test whether the local structures can reproduce aggregate behavior [5.15]. As noted by Grüne-Yanoff and Weirich [5.16], agent-based modeling facilitates the construction of remarkably complex models within computationally tractable constraints, often going well beyond what is possible with equation-based representations.
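
As an illustration of the kind of model at issue, here is a minimal, generic agent-based sketch in Python, using a Schelling-style relocation rule chosen only because it is a familiar textbook example, not because it appears in the works cited. Each agent follows a single local interaction rule, and the simulation is used to check whether that rule reproduces an aggregate pattern, here measured by a simple segregation statistic.

```python
import random

# Generic agent-based sketch: agents of two types sit on a ring with some empty
# sites. The only rule is local: an agent relocates to a random empty site if
# fewer than half of its occupied neighbors share its type. The aggregate
# question is whether this local rule produces global segregation.
random.seed(0)
N, NEIGHBORHOOD, STEPS = 200, 3, 2000
sites = ['A'] * 80 + ['B'] * 80 + [None] * 40   # hypothetical population and vacancy level
random.shuffle(sites)

def like_fraction(i):
    """Fraction of occupied neighboring sites holding the same type as site i."""
    occupied = [sites[(i + d) % N] for d in range(-NEIGHBORHOOD, NEIGHBORHOOD + 1)
                if d != 0 and sites[(i + d) % N] is not None]
    return sum(a == sites[i] for a in occupied) / len(occupied) if occupied else 1.0

def segregation():
    """Aggregate statistic: mean like-neighbor fraction over all agents."""
    values = [like_fraction(i) for i in range(N) if sites[i] is not None]
    return sum(values) / len(values)

print('segregation before:', round(segregation(), 2))
for _ in range(STEPS):
    i = random.randrange(N)
    if sites[i] is not None and like_fraction(i) < 0.5:        # the local rule
        j = random.choice([k for k in range(N) if sites[k] is None])
        sites[j], sites[i] = sites[i], None                    # relocate the unhappy agent
print('segregation after: ', round(segregation(), 2))
```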

Agent-based models provide one exemplar of simulations that are not theory-driven. From an epistemological perspective, these simulations exhibit weak emergence [5.17]. The underlying mechanisms are thoroughly opaque to the users, and the way in which emergent properties come about cannot simply be reconstructed by studying the simulation processes. This opacity raises questions about the purpose and value of agent-based modeling. What kind of explanation and understanding does an agent-based simulation provide if the multiscale mechanisms produced in a simulation are cognitively inaccessible? Further, how is one to evaluate predictions and explanations from agent-based simulations which, in fields like ecology and economics, commonly simplify very complex interactions in order to create computationally tractable simulations? If a simplistic model captures a known behavior, can we trust its predictions? To address questions such as these we need an epistemology that can evaluate proposed techniques for establishing the robustness of agent-based models. One alternative is to argue that agent-based models require a novel epistemology that is able to rationalize their function as types of fictions rather than as representations [5.18, 5.19]. Another alternative, presented by Grüne-Yanoff and Weirich [5.16], is to argue that agent-based models provide in many cases functional rather than causal explanations of the phenomena they simulate [5.20]. Agent-based simulations rarely control for all the potential explanatory factors that might be relevant to a given phenomenon, and any choice of a particular interaction mechanism is usually thoroughly underdetermined; in practice, all possible mechanisms cannot be explored. But agent-based models can reliably show how particular lower-level capacities, when modeled by suitably general interaction rules, behave in certain ways and can constitute higher-level capacities, no matter how multiply realized those interactions might be. Hence such models, even though greatly simplified, can extract useful information despite a large space of potential explananda.

Nontheory-driven forms of simulation such as agent-based models provide a basis for reflecting more broadly on the role theory plays in the production of simulations, and the warrant a theory brings to simulations based on it. Comparative studies of the kinds of arguments used to justify relying on a simulation should expose the roles well-established theories play. Our investigations of integrative systems biology (ISB) have revealed that not all equation-based modeling is theory-driven, if theory is construed in terms of theory in the physical sciences. The canonical meaning, based on the physical sciences, is something like a background body of laws and principles of a domain.

In the case of systems biology, researchers generally do not have access to such theory, and in fact the kinds of theory they do make use of have a function different from what is usually meant by theory in fields like physics [5.21]. There are certain canonical theories in systems biology of how to represent interactions mathematically, among metabolites for instance, in the form of sets of ordinary differential equations. These posit particular canonical mathematical forms for representing a large variety of interactions (see Biochemical Systems Theory [5.22]). In principle, for any particular metabolic network, if all the interactions and reactants are known, the only work for the modeler is to write down the equations for that network and calculate the parameters. The mathematics will take care of the rest, since the mathematical formulations of interactions are general enough that any potential nonlinear behaviors should be represented if the parameters are correctly fixed.
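
To give a sense of what such a canonical form looks like, the sketch below writes a hypothetical two-metabolite network in the S-system format, one canonical form within Biochemical Systems Theory, in which each variable’s rate of change is the difference of two power-law terms. Every parameter value here is invented for illustration and does not describe any real pathway.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical two-metabolite system in S-system form (a canonical format of
# Biochemical Systems Theory): dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij.
# All rate constants and kinetic orders below are invented for illustration.
alpha = np.array([2.0, 1.0])           # production rate constants
beta = np.array([1.0, 1.5])            # degradation rate constants
g = np.array([[0.0, -0.5],             # kinetic orders for the production terms
              [0.8, 0.0]])
h = np.array([[0.5, 0.0],              # kinetic orders for the degradation terms
              [0.0, 0.6]])

def s_system(t, x):
    production = alpha * np.prod(x ** g, axis=1)
    degradation = beta * np.prod(x ** h, axis=1)
    return production - degradation

# integrate from an arbitrary initial state and report the approximate steady state
sol = solve_ivp(s_system, (0.0, 50.0), [0.5, 0.5])
print('approximate steady state:', sol.y[:, -1])
```

The point of the example is only that the mathematical form is fixed in advance; what remains for the modeler is to decide which interactions to include and to fix the parameters.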

For the most part, however, these canonical frameworks do not provide the basic ontological information from which a representation of a system is ultimately drawn, in the way that, say, the Navier-Stokes equations of fluid dynamics describe fluids and their component interactions in a particular way. In practice, modelers in systems biology need to assemble that information themselves in the form of pathway diagrams, which more or less list the molecules involved, and then make their own decisions about how to represent molecular interactions. A canonical framework is better interpreted as a theory of how to approximate and simplify the information that the systems biologist has assembled about a pathway in order to reliably simulate the dominant dynamics of a network given sparse data and complex nonlinear dynamics. Hence, there is no real theory articulation in Winsberg’s terms: researchers do not articulate a general theory for a particular application. The challenge for systems biologists is to build a higher-level or system-level representation out of the lower-level information they possess. We have found that canonical templates mediate this process by providing a possible structure for gluing together this lower-level information in a tractable way [5.21]. These theories do not offer any direct explanatory value by virtue of their use.

Theory can in fact be used not just to describe a body of laws and theoretical principles, but also to describe principles that instruct scientists on how to reliably build models of given classes of phenomena from a background theory. As Peck puts it [5.18, p. 393]:

“In traditional mathematical modeling, there is a long established research program in which standard methods, such as those used for differential equation modeling, are used to bring about certain ends. Once the variables and parameters and their relationships are chosen for the representation of the model, standard formulations are used to complete the modeling venture.”

What physical scientists often start with, then, is not just the raw theory itself but also well-established rules for formulating the theory and applying it with respect to a particular phenomenon. We might refer to this latter sense of theory as a theory of how to apply a background theory to reliably represent a phenomenon. The two senses of theory are distinct. In the case of the canonical frameworks, what is meant by theory is something closer to the latter than the former sense.

Additionally, the modelers we have studied are never in a position to rely on these frameworks uncritically, and in fact no theory exists that specifies which representational choices will reliably lead to a good representation in all data situations. In integrative systems biology the data situations are varied and complex, and the data are often sparse and rarely adequate for applying a set mathematical framework. This forces researchers in practice into much more intensive and adaptive model-building processes, which certainly share much in common with the back and forth processes Winsberg talks about in the context of theory application. But these processes have the added and serious difficulty that the starting points for even composing the mathematical framework out of which a model should be built are open-ended and need to be decided through thorough investigation of the possibilities with the specific data available.

Canonical frameworks are just one option for modelers and do not drive the model-building process in the way physical theories do. Currently, systems biology generally lacks effective theory of either kind. Modelers have many different choices about how to confront a particular problem, and these do not necessarily involve picking up a canonical framework or sticking to it. MacLeod and Nersessian [5.21] have documented how nontheory-derived model-building processes work in these contexts. Models are strategic adaptations to a complex set of constraints that systems biologists are working under [5.23]. Among these constraints are:

  • Constraints of the biological problem: A model must address the constraints of the biological problem, such as how the redox environment is maintained in a healthy cell. The system involved is often of considerable complexity.

  • Informational/data constraints: There are constraints on the accessibility and availability of experimental data and molecular and system parameters for constructing models.

  • Cost constraints: ISB is data-intensive and relies on data that often go beyond what molecular biologists collect in small-scale experiments. However, such data are very costly to obtain.

  • Collaboration constraints: Constraints on the ability to communicate effectively with experimental collaborators with different backgrounds or in different fields in order to obtain expert advice or new data. Molecular biologists largely do not understand the nature of simulation modeling, do not understand the data needs of modeling, and do not see the cost-benefit of producing the particular data systems biologists ask from them.

  • Time-scale constraints: Different time scales operate with respect to generating molecular experimental data versus computational model testing and construction.

  • Infrastructure constraints: There is little in the way of standardized databases of experimental information or standardized modeling software available for systems biologists to rely upon.

  • Knowledge constraints: Modelers’ lack of knowledge of biological systems and experimental methods limits their understanding of what is biologically plausible and of what reliable extrapolations can be made from the data sets available.

  • Cognitive constraints: Constraints on the ability to process and manipulate models because of their complexity, and thus constraints on the ability to comprehend biological systems through modeling.

Working within these constraints requires researchers to be adaptive problem-solvers. Given the complexity of the systems, the lack of data, and the ever-present problem of computational tractability, researchers have to experiment with different mathematical formulations, different parameter-fixing algorithms and approximation techniques in highly intensive trial and error processes. They build models in nest-like fashion, in which bits of biological information and data, and mathematical and computational techniques, get combined to create stable models. These processes transform not only the shape of the solutions, but also the problems, as researchers figure out what problem can actually be solved with the data at hand. Simulation plays a central exploratory role in the process. This point goes further than Lenhard’s idea of an explorative cooperation between experimental simulation and models [5.8]. Simulation in systems biology is not just for experimenting on systems in order to sound out the consequences of a model [5.8, p. 181]; it plays a fundamental role in incrementally building the model, in learning the relevant known and sometimes unknown features of a system, and in gaining an understanding of its dynamics. Simulation’s role as a cognitive resource makes the construction of representations of complex systems without a theoretical basis possible (see also [5.24, 5.25]).

Similar conclusions have been drawn by Peck for ecology, which shares with systems biology the complexity of its problems and a lack of generalizable theory. As Peck [5.18, p. 393] points out:

“there are no formal methodological procedures for building these types of models suggesting that constructing an ecological simulation can legitimately be described as an art.”

This situation promotes methodological pluralism and creative methodological exploration by modelers. Modelers in these contexts thus focus our attention on the deeper roles (sometimes called heuristic roles [5.5]) that simulation plays in the ability of researchers to explore potential solutions in order to solve complex problems.

These roles have added epistemological importance when it is realized that the downward character of simulation can in fact be reversed, in both senses we have mentioned above. This is a potentially significant difference between cases of theory-driven and nontheory-driven simulation. Consider again systems biology. Firstly, the methodological exploration we witness amongst the researchers we have studied can be rationalized precisely as an attempt by the field to establish a good theory of how to build models of biological systems that work well given a variety of data situations. Since the complexities of these systems and computational constraints make this difficult to know at the outset, the field needs the freedom to explore the possibilities. Lab directors do encourage exploration, and part of the reason they do so is to try to glean which practices work well and which do not, given a lack of knowledge of what will work well for a given problem.

Secondly, systems biology aspires to a theory of biological systems which will detail not only general system-level characteristics of biological systems but also the design principles underlying biological networks [5.26, 5.27]. What is interesting about this theory, if it does emerge, is that it will in fact be theory generated by simulation rather than the other way around. Simulation makes possible the exploration of quite complex systems for generalities that can form the basis of a theory of systems biology. As such, the use of simulations can also be upward, not just downward, to perhaps an unprecedented extent. Upward uses of simulation require analysis that appears to fit better with more traditional philosophical accounts of how theories are in fact justified, only in this case robust simulation models will possibly be the more significant source of evidence, rather than traditional experiment and observation. How this affects the nature and reliability of our inferences to theory, and what kind of resemblance such theory might bear to theory in physics, is something that will need investigation. Thus, further exploration of nontheory-driven modeling practices stands to provide a rich ground not only for investigating the novel practices that are emerging with simulation, but also for exploring the roles and meanings of theory.

3 What is Philosophically Novel About Simulation?

The question of whether or not simulation introduces new issues into the philosophy of science has emerged as a substantial debate in discussions of computational simulation. Winsberg [5.1, 5.3, 5.4, 5.5] and Humphreys [5.11, 5.12] are the major proponents of the view that simulation requires its own epistemology. Winsberg, for instance, takes the view that simulations exhibit “distinct epistemological characteristics …novel to the philosophy of science” [5.9, p. 443]. Winsberg and Humphreys make this assertion on the basis of the points we outlined in Sec. 5.2; namely, 1) the traditional limited concern of philosophy of science with the justification of theory, and 2) the relative autonomy of simulations and simulation-building from theory. The steps involved in generating simulations, such as applying approximation methods designed to achieve computational tractability, are novel to science. These steps do not gain their legitimacy from a theory but are “autonomously sanctioned” [5.1, p. 837]. Winsberg argues, for instance, that while idealization and approximation methods have been discussed in the literature, it has mostly been from a representational perspective, in terms of how idealized and approximate models represent or resemble the world and in turn justify the theories on which they are based. But since simulations are often employed where data are sparse, they cannot usually be justified by being compared with the world alone. Simulations must be assessed according to the reliability of the processes used to construct them, and these often distinct and novel techniques require separate philosophical evaluation. Mainstream philosophy of science, with its focus on theoretical justification, does not have the conceptual resources to account for applications using computational methods. Even where theory is concerned, both Humphreys and Winsberg maintain that neither of the established semantic and syntactic conceptions of theories, conceptions which focus on justification and representation, can account for how theories are applied or justified in simulation modeling.

However, Frigg and Reiss [5.28] have countered that these claims are overblown and that in fact simulation raises no new questions or problems that are specific to simulation alone. Part of the disagreement might simply come down to whether one construes philosophy of science narrowly or broadly, that is, whether one limits philosophical questions to in-principle and normative issues while setting aside practical methodological ones. Another part of the disagreement is over how one construes new issues or new questions for philosophy, since certainly at some level the basic philosophical questions about how representations represent, and what makes them do so reliably, are still the same questions.

To some extent, part of the debate might be construed as a disagreement over the relevance of contexts of discovery to philosophy of science. Classically, contexts of discovery, the scientific contexts in which model-building takes place, are considered irrelevant to normative philosophical assessments of whether those models are justified or not. Winsberg [5.3] and Humphreys [5.12] seem willing to assert that one of the lessons for philosophy of science from simulation is that practical constraints on scientific discovery matter for constructing relevant normative principles – both in terms of evaluating current practice, which in the case of simulation-building is driven by all kinds of practical constraints, and in terms of normatively directing practice sensitively within those constraints.

Part of the motivation for using the discovery/justification distinction to define philosophical interest and relevance is the belief that there is a clear distinction between the two contexts. Arguably, Frigg and Reiss reinforce the idea of a clear distinction by relying on the widespread presupposition that validation and verification are distinct, independent processes [5.4]. Validation is the process of establishing that a simulation is a good representation, a quintessential concept of justification. Verification is the process of ensuring that a computational simulation adequately captures the equations from which it is constructed. Verification, according to Frigg and Reiss, represents the only novel aspect of modeling that simulation introduces, yet it is a purely mathematical exercise that is of no relevance to questions of validation. As such, simulations involve no new issues of justification beyond those of ordinary models. Winsberg [5.3, 5.4], however, counters that there is, in practice, no clear division between processes of verification and validation. The equations chosen to represent a system are not selected simply on the basis of how valid they are, but also on the basis of decisions about computational tractability. Much of what validates a representation in practice occurs at the end stage, after all the necessary techniques of numerical approximation and discretization have been applied, by comparing the results of simulations with the data. As such [5.5]:

“If we want to understand why simulation results are taken to be credible, we have to look at the epistemology of simulation as an integrated whole, not as clearly divided into verification and validation – each of which would look inadequate to the task.”

Hence, what would otherwise seem to be distinct processes of discovery and justification are, in the context of computational simulation, interwoven.
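
To make the contrast concrete, here is a minimal sketch, not drawn from the works cited, of verification treated as a purely mathematical exercise: a simple solver is checked for convergence against the known exact solution of the model equation as the step size shrinks. No experimental data enter at any point; comparison with measurements would be a separate matter of validation.

```python
import numpy as np

# Verification sketch: does a forward-Euler solver faithfully capture the model
# equation dy/dt = -k*y? The check is against the exact mathematical solution,
# not against experimental data (that comparison would be validation).
k, y0, T = 1.0, 1.0, 2.0

def euler_solution(dt):
    y = y0
    for _ in range(int(round(T / dt))):
        y += dt * (-k * y)            # forward Euler update
    return y

exact = y0 * np.exp(-k * T)
for dt in (0.1, 0.05, 0.025, 0.0125):
    error = abs(euler_solution(dt) - exact)
    print(f'dt = {dt:<7} error = {error:.5f}')   # error shrinks roughly linearly with dt
```

Winsberg’s point, on this way of putting it, is that in real simulation practice the choice of equations, the discretization and the comparison with data are not separable in this tidy way.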

Frigg and Reiss are right at some level that simulations do not change the basic epistemological questions connected to the justification of models. They are also right that Winsberg, in his downward, motley and autonomous description of simulation, does not reveal any fundamentally new observations on model-building that have not already been identified as issues by philosophers discussing traditional modeling. However, what does appear to be new in the case of simulation is: 1) the complexity of the philosophical problems of representation and reliability, and 2) the different methodological and epistemological strategies that have become available to modelers as a result of simulation.

Winsberg, in reply to Frigg and Reiss, has clarified what he takes to be novel about theory-based simulation: the simultaneous confluence of the downward, motley and autonomous features of model-building [5.4]. It is the reliability and validity of the complex modeling processes instantiated by these three features that must be accounted for by an epistemology of simulation, and no current philosophical approaches are adequate to the task, particularly not those confined within traditional philosophical boundaries of analysis.

As a first step in this task of assessing the reliability and validity of simulation, philosophers such as Winsberg [5.29] have drawn lessons from comparison with experimentation, which they argue shares much with simulation both in function (enabling, for instance, in silico experiments) and in how the reliability of simulations is generated. Scientific researchers try to control for error in their simulations, and fix parameters, in ways that seem analogous to how experimenters calibrate their devices. Simulations build up credibility over long time scales and may have lives of their own independent of developments in other parts of science. These observations suggest a potentially rich analogy between simulations and Hacking’s account of experimentation [5.29]. In a normative step based on these links, Parker [5.10] has suggested that Mayo’s [5.30] rigorous error-statistical approach to experimentation is an appropriate starting point for a more thorough evaluation of the results of simulations. Simulations need to be evaluated by the degree to which they avoid false positives when testing hypotheses, by successfully controlling for the potential sources of error that creep in during the simulation process. At the same time, a rather vigorous debate has emerged concerning the clarification of the precise epistemological dissimilarities, or disanalogies, between simulation and traditional experimentation (see for instance [5.31, 5.32, 5.33, 5.34, 5.35, 5.36]). This question is of independent philosophical interest for assessing the benefits and value of each as alternatives, but it should also help define the limits of the relevance of experimentation as a model for understanding and assessing simulation practices.

From our perspective, however, the new methodological and epistemological strategies that modelers are introducing in order to construct simulation models and to guarantee their reliability could prove to be the most interesting and novel aspect of simulation with which philosophers will have to grapple. Indeed, while much attention has focused on the contrasts and similarities between simulations, experiments and simulation experiments, no one has called attention to the fact that real-world experiments and simulations are also being used in concert to enhance the ability of researchers to handle uncertain complex systems. One of the labs we have studied conducts bimodal modeling, in which the modelers conduct their own experiments in the service of building their models. We have analyzed the case of one modeler whose model-building, simulation and experimentation were tightly interwoven [5.37]. She used a conjunction of experiment and simulation to triangulate on errors and uncertainties in her model, demonstrating that the two can be combined in practice in sophisticated ways. Her model-building would not have been possible without the affordances of both simulation and her ability to perform experiments precisely adapted to test questions about the model as she was in the process of formulating it. Simulation and experiment closely coupled in this fashion offer the possibility of extending the capacity to produce reliable models of complex phenomena.

Bimodal modeling is relatively easy to characterize epistemologically, since experimentation is used to validate and check the simulations as the model is being constructed; simulations are not relied on independently of experimental verification. Often, however, experimental or any other kind of observational data are hard to come by, for practical or theoretical reasons. More philosophically challenging will be to evaluate the new epistemological strategies researchers are in fact developing for drawing inferences in these often deeply uncertain and complex contexts with the aid of computation. Parker [5.38, 5.39], for instance, identifies the practice of ensemble modeling in climate science and meteorology. No theory of model-building exists that tells climate and weather modelers how to go from physical theory to reliable models. Different formulations that fit the observational data, using different initial conditions, model structures and parameterizations, can be developed from the physical theory. In this situation modelers average over the results of large collections of models, using different weighting schemes, and argue for the validity of these results on the basis that the models collectively represent the possibility space. However, considerable philosophical questions emerge as to the underlying justifiability of these ensemble practices and the probability weightings being relied upon. Background theory can provide little guidance in this context, and in the case of climate modeling there is little chance for predictively testing performance. Further, the robustness of particular ensemble choices is often very low, and justifications for picking out particular ensembles are rarely carefully formulated.
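
The following schematic sketch, with entirely invented numbers, shows the bare form of the ensemble reasoning at issue: a collection of hypothetical model projections is combined under one possible skill-based weighting scheme, and a weighted mean and spread are reported. The philosophical questions Parker raises concern precisely what justifies the weighting scheme and the assumption that the ensemble spans the space of possibilities.

```python
import numpy as np

# Schematic multi-model ensemble with fictional numbers: each 'model' supplies a
# projection and a hindcast error against past observations; the ensemble result
# is a weighted mean, with the spread reported as an uncertainty estimate.
rng = np.random.default_rng(0)
n_models = 12
projections = rng.normal(3.0, 0.8, n_models)        # fictional projections from 12 models
hindcast_error = rng.uniform(0.1, 1.0, n_models)    # fictional skill scores against past data

weights = 1.0 / hindcast_error**2                    # one possible scheme: inverse squared error
weights /= weights.sum()                             # normalize so the weights sum to one

ensemble_mean = np.sum(weights * projections)
ensemble_spread = np.sqrt(np.sum(weights * (projections - ensemble_mean) ** 2))
print(f'weighted ensemble mean: {ensemble_mean:.2f}, spread: {ensemble_spread:.2f}')
```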

The ability to generate and compare large numbers of complex models in this way is a development of modern computational power. In our studies we have also come across novel forms of argumentation, particularly connected with parameter-fixing [5.40]. Because the parameter spaces these modelers have to deal with are so complex, there is almost no chance of getting a best-fit solution. Instead, modelers produce multiple models, often using Monte Carlo techniques, that converge on similar behavior and output. These models have different parameterizations and ultimately represent the underlying mechanisms of the systems differently. However, modelers can nonetheless make specific arguments about network structure and dynamic relationships among specific variables. There is not usually any well-established theory that licenses these arguments; the fact that the models converge on the same relevant results is the motivation for inferring that they are right at least about those aspects of the system for which they are designed to account. Unfortunately, because access to real-world experimentation is quite difficult, it is hard to judge how reliable this technique is at producing robust models. What is novel about this kind of strategy is that it implicitly treats parameter-fixing as an opportunity, not just a problem, for modelers. If, instead of trying to capture the dynamics of whole systems, modelers fix their goals on capturing robust properties and relations of a system, the potential for finding results that work within these constraints in large parameter spaces increases, and from the multiple models obtained modelers can pare down to those that converge. The more complex problem thus seems to allow a pathway for solving a simpler one. Nonetheless, whether we should accept these kinds of strategies as reliable, and the models produced as robust, remains the fundamental question, and an overarching question for the field itself. It is a reasonable reaction to suspect that something important is being given up in the process, which will affect how well scientists can assess the reliability and importance of the models they produce. Whether the power of computational processes can adequately compensate for the potential distortions or errors introduced is one of the most critical and novel epistemological questions for philosophy today.
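
A schematic sketch of the kind of strategy just described, using an invented toy model rather than any of the systems we have studied: many parameterizations are sampled by Monte Carlo, only those that reproduce sparse synthetic data are retained, and the inference targets the derived property on which the accepted parameterizations converge, while the individual parameter values remain underdetermined.

```python
import numpy as np

# Toy illustration of Monte Carlo parameter-fixing: sample many parameter sets,
# keep those that fit the sparse 'data', and ask which derived property the
# accepted models agree on. The model and numbers are invented for illustration.
rng = np.random.default_rng(1)

def simulate(a, b, t):
    # toy model dx/dt = a - b*x with x(0) = 0, using its analytic solution for speed
    return (a / b) * (1.0 - np.exp(-b * t))

t_obs = np.array([1.0, 3.0, 6.0])                                     # sparse observation times
data = simulate(2.0, 0.5, t_obs) + rng.normal(0.0, 0.05, t_obs.size)  # synthetic noisy data

accepted = []
for _ in range(20000):
    a, b = rng.uniform(0.1, 5.0), rng.uniform(0.05, 2.0)              # Monte Carlo sample
    if np.max(np.abs(simulate(a, b, t_obs) - data)) < 0.2:            # crude goodness-of-fit filter
        accepted.append((a, b))

accepted = np.array(accepted)
steady_state = accepted[:, 0] / accepted[:, 1]                        # derived property a/b
print(len(accepted), 'accepted parameterizations')
print('range of parameter a:        ', accepted[:, 0].min().round(2), 'to', accepted[:, 0].max().round(2))
print('range of steady state (a/b): ', steady_state.min().round(2), 'to', steady_state.max().round(2))
```

In a toy case like this, the accepted parameterizations will typically vary more over the individual parameters than over the steady state they jointly imply; it is that kind of convergence which licenses the restricted inferences described above.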

The kinds of epistemological innovations we have been considering raise deeper questions about the purposes of simulation, particularly in terms of traditional epistemic categories like understanding and explanation. Of course, at one extreme, simulation of the purely data-driven kind is purely phenomenological: theory plays no role in its generation and is not sought as its outcome. In other cases, however, some form of understanding at least is sought. In many cases, though, where theory might be thought the essential agent of understanding, the complexity of the equations, and the resulting complexity of the computational processes that instantiate them, simply block any way of decomposing the theory or theoretical model in order to understand how the theory might explain a phenomenon, and thus to assess the accuracy and plausibility of the underlying mechanisms it might prescribe. Humphreys labels this epistemic opacity [5.11]. Lenhard [5.41] in turn identifies a form of pragmatic understanding that can replace theoretical understanding when a simulation model is epistemically opaque. This form of understanding is pragmatic in the sense of being an understanding of how to control and manipulate phenomena, rather than how to explain them using background theoretical principles and laws. Settling for this form of understanding is a choice made by researchers in order to handle more complex problems and systems using simulations, but it is a novel one in the context of physics and chemistry. In systems biology we recognize something similar [5.40]: researchers give up accurate mechanistic understanding of their systems for the more pragmatic goal of gaining network control, at least over specific variables. To do so they use simplification and parameter-fitting techniques that obscure the extent to which their models capture the underlying mechanisms. Mechanistic explanation is thus given up for some weaker form of understanding.

Finally, computational modeling and simulation in the situations we have been considering in this section are driving a profound shift in the nature and level of human cognitive engagement in scientific production processes and their outputs [5.12, 5.24, 5.25, 5.42, 5.43]. So much of philosophy of science has been based on intuitive notions of human cognitive abilities. Our concepts of explanation and understanding are constructed implicitly on the basis of what we can grasp as humans. With simulation and big-data science those kinds of characterizations may no longer be accurate or relevant [5.44].

4 Computational Simulation and Human Cognition

It is on this last point that we turn to consider the ways in which human cognitive processes are implicated in processes of simulation model-building. Computational science, of the nonbig-data or nonmachine-learning kind on which we have focused here, is, as Humphreys calls it, a “hybrid scenario” as opposed to an “automated scenario” [5.12, p. 616]. In his words:

“This distinction is important because in the hybrid scenario, one cannot completely abstract from human cognitive abilities when dealing with representational and computational issues…. We are now faced with a problem, which we can call the anthropocentric predicament, of how we, as humans, can understand and evaluate computationally-based scientific methods that transcend our own abilities.”

Unlike machine-learning contexts, computational modeling is in many cases a practice of using computation to extend traditional modeling practices and our own capabilities, so as to draw insight out of low-data contexts and complex systems for which theory provides at best a limited guide. In this way cognitive capacities are often heavily involved. The hybrid nature of computational science thus motivates the need to understand how human agents cognitively engage with and control opaque computational processes, and in turn draw information out of them. Evaluating these processes – their productiveness and reliability – requires, as a first step, having some understanding of them. As we will see, although computational calculation processes are beyond our abilities, at least in the case of systems biology the use of computation by modelers is often far more integrated with their own cognitive processes and understanding, and thus far more under their control, than we might think.

As we have seen, there are several lines of philosophical research on computational simulation that underscore that it is through the processes of model-building – taken to comprise the incremental and interwoven processes of constructing the model and investigating its dynamics through simulation – that the modeler comes to develop at least a pragmatic understanding of the phenomena under investigation. Complex systems, such as those investigated in systems biology, present perhaps the extreme case, in which these practices are the primary means through which modelers, mostly nonbiologists, develop understanding of the systems. In our investigations, modelers called the building and running of their models under various conditions getting a feel for the model, which enables them to get a feel for the dynamics of the system.

In our investigations we have witnessed that modelers (mainly engineers) with little understanding of biology have been able to provide novel insights and highly significant predictions, later confirmed by biological collaborators, for the systems they are investigating through simulation. How is it possible that engineers with little to no biological training can be making significant biological discoveries? A related question concerns how complete novices are making scientific discoveries through simulations crowdsourced by means of video games such as Foldit and EteRNA, which appear to enable nonscientists to quickly build accurate/veridical structures representing molecular entities they had no prior knowledge of [5.45, 5.46]. Nersessian and Chandrasekharan, individually and together [5.24, 5.25, 5.42, 5.47, 5.48, 5.49], have argued that the answer to this question lies in understanding how computational simulation enhances human cognition in discovery processes. Because of the visual and manipulative nature of the crowdsourcing cases, the answer points in the direction of the coupling of the human sensorimotor system with simulation models. These crowdsourcing models re-represent conceptual knowledge developed by the scientific community (e.g., the structure of proteins) as computational representations with a control interface that can be manipulated through the gamer’s actions. The interface enables these novices to build new representations drawing on tacit/implicit sensorimotor processes. Although the use of crowdsourcing simulations in scientific problem solving is new, the human sensorimotor system has been used explicitly to detect patterns, especially in dynamic data generated by computational models, since the dawn of computational modeling. Entire disciplines and methods have been built using visualized patterns on computer screens. Complexity theory [5.50, 5.51], artificial life [5.52, 5.53] and computational chemistry [5.54, 5.55] provide a few exemplars where significant discoveries have been made.

Turning back now to the computational simulations used by scientists that we have been discussing, all of the above suggests that the model-building processes facilitate a close coupling between the model and the researcher’s mental modeling processes even in the absence of a dynamic visualization. The building process manipulates procedural and declarative knowledge in the imagination and in the representation, creating a coupled cognitive system of model and modeler [5.25, 5.42, 5.43, 5.48, 5.56, 5.57]. This coupling can lead to explicit understanding of the dynamics of the system under investigation. The notion of a coupled cognitive system is best understood in terms of the framework of distributed cognition [5.58, 5.59], which was developed to study cognitive processes in complex task environments, particularly where external representations and other cognitive artifacts and, possibly, groups of people accomplish the task. The primary unit of analysis is the socio-technical system that generates, manipulates and propagates representations (internal and external to people). Research leading to the formation of the distributed cognition framework has focused largely on the use of existing representational artifacts and less so on the building or creation of such artifacts. The central metaphor is that of the human offloading complex cognitive processes, such as memory, to the artifact, which, for example, in the canonical exemplar of the speed bug that marks critical airspeeds for a particular flight, replaces complex cognitive operations with a perceptual operation and provides a publicly available representation that is shared between pilot and co-pilot.

In the research cited above, we have been arguing that offloading is not the right metaphor for understanding the cognitive enhancements provided through the building of novel computational representations. Rather, the metaphor should be that of coupling between internal and external representations. Delving into the modifications of the distributed cognition framework needed to accommodate the notion of a coupled cognitive system would take us too far afield in this review (but see [5.25]). Instead, we will flesh out the notion a bit by noting some of the ways in which building and using simulation models enhance human cognitive capabilities and, in particular, extend the capability of the imagination system for simulative model-based reasoning.

A central, but as yet not well-researched, premise of distributed cognition is, as Hutchins has stated succinctly, that “humans create cognitive powers by creating the environments in which they exercise those powers” [5.58, p. 169]. Since building modeling environments for problem solving is a major component of scientific research [5.49], scientific practices provide an especially good locus for examining the human capability to extend and create cognitive powers. In the case of simulation model-building, the key question is: What are the cognitive changes involved in building a simulation model and how do these lead to discoveries? The key cognitive change is that, over the course of many iterations of model construction and simulation, the model gradually becomes coupled with the modeler’s imagination system (mental model simulation), which enables the modeler to explore different scenarios. The coupling allows what-if questions in the mind of the modeler to be turned into detailed explorations of the system, which would not be possible in the mind alone. The computational model enables this exploration because, as it is incrementally built using many data sets, the model’s behavior, in the systems biology case for instance, comes to parallel the dynamics of the pathway. Each replication of experimental results adds complexity to the model, and the process continues until the model is judged to fit all available data well. This judgment is complex, as it is based on a large number of iterations in which a range of factors such as sensitivity, stability, consistency, computational complexity and so forth are explored. As the model gains complexity it starts to reveal or expose many details of the system’s behavior, enabling the modeler to interrogate the model in ways that are not possible in the mind alone (thought experimenting) or in real-world experiments. It makes evident many details of the system’s behavior that the modeler could not have imagined alone because of the fine grain and complexity of those details.

The parallel between computational simulation experimenting and thought experimenting is one philosophers have commented on, but the current framing of the discussion centers primarily on the issue of interpreting simulations and whether computational simulations should be construed as opaque thought experiments [5.60, 5.61]. Di Paolo et al. [5.60] have argued that computational models are more opaque than thought experiments and, as such, require more systematic enquiry through probing of the model’s behavior. In a similar vein, Lenhard [5.61] has claimed that thought experiments are more lucid than computational models, though it is left unclear what is meant by lucid in this context, particularly given the extensive discussions around what specific thought experiments actually demonstrate. In the context of the discussion of the relation of thought experimenting and computational simulation, we have argued that the discussion should be shifted from issues of interpretation to a process-oriented analysis of modeling [5.47]. Nersessian [5.62] casts thought experimenting as a form of simulative model-based reasoning, the cognitive basis of which is the human capacity for mental modeling. Thought experiments (conceptual models), physical models [5.63] and computational models [5.47, 5.48] form a spectrum of simulative model-based reasoning in that all these types of modeling generate and test counterfactual situations that are difficult (if not impossible) to implement in the real world. Both thought experiments and computational models support the simulation of counterfactual situations; however, while thought experiments are built using concrete elements, computational models are built using variables. Simulating counterfactual scenarios beyond the specific one constructed in the thought experiment is difficult and requires complex cognitive transformations to move away from the concrete case to the abstract, generic case. On the other hand, computational simulation constructs the abstract, generic case from the outset. Since computational models are made entirely of variables, they naturally support thinking about parameter spaces, possible variations to the design seen in nature, and why this variation occurs rather than the many others that are possible.

Thought experiments are a product of a resource environment in science where the only tools available were writing implements, paper (blackboards, etc.) and the brain. Computational models create cognitive enhancements that go well beyond those resources and enable scientists to study the complex, dynamic and nonlinear behaviors of the phenomena that are the focus of contemporary science.

Returning to the nature of the cognitive enhancements created, the coupling of the computational model with the modeler’s imagination system significantly enhances the researcher’s natural capacity for simulative model-based reasoning, particularly in the following ways:

  • It allows running many more simulations, with many variables at gradients not perceivable or manipulable by the mind, which can be compared and contrasted.

  • It allows testing what-if scenarios with changes among many variables that would be impossible to do in the mind.

  • It allows stopping the simulation at various points and checking and tracking its states. If some desirable effect is seen, variables can be tweaked in process to get that effect consistently.

  • It allows taking the system apart as modules, simulating them, and putting them together in different combinations.

  • It allows changing the time in which intermediate processes kick in.

These complex manipulations expose the modeler to system-level behaviors that are not possible to examine either in thought alone or in real-world experimentation. The processes involved in building the distributed model-based reasoning system comprising simulation model and modeler enhance several cognitive abilities. Here we will conclude by considering three (for a fuller discussion see [5.25]). First, the model-building process brings together a range of experimental data. Given Internet search engines and online databases, current models synthesize more data than ever before and create a synthesis that exists nowhere in the literature and would not be possible for modelers or biologists to produce on their own. In effect, the model becomes a running literature review. Thus, modeling enhances the synthesizing and integrating capabilities of the modeler, which is an important part of the answer as to how a modeler with scant biological knowledge can make important discoveries. Second, an important cognitive effect of the model-building is to enhance the modeler’s powers of abstraction. Most significantly, through the gradual process of thousands of runs of simulations and the accompanying analyses of system dynamics, the modeler gains an external, global view of the system as a whole. Such a global view would not be possible to develop just from mental simulation, especially since the interactions among elements are complex and difficult to keep track of separately. The system view, together with the detailed understanding of the dynamics, provides the modeler with an intuitive sense (a feeling for the model) of the biological mechanisms, which enables her to extend the pathway structure in a constrained fashion to accommodate experimental data that could not be accounted for by the current pathway from which the model started. Additionally, this intuitive sense of the mechanism, built from interaction with the model, helps to explain the success of the crowdsourcing models noted above (see also [5.64]).

Finally, the model enhances the cognitive capacity for counterfactual or possible-worlds thinking. As noted in our discussion of thought experimenting, the model-building process begins by capturing the reactions/interactions using variables. Variables provide a place-holder representation which, when interpreted with combinations of numbers for these variables, can generate model data that parallel the known experimental data. One interesting feature of the place-holder representation is that it provides the modeler with a flexible way of thinking about the reactions, as opposed to the experimentalist, who works with only one set of values. Once the model is using the experimental values, the variables can take any set of values, as long as they generate a fit with the experimental data. The modeler is able to think of the real-world values as only one possible scenario, to examine why this scenario is commonly seen in nature, and to envision other scenarios that fit. Thinking in variables supports both the objective modelers often have of altering or redesigning a reaction (such as the thickness of lignin in the plant wall for biofuels) and the objective of developing generic design patterns and principles. More broadly, the variable representation significantly expands the imagination space of the modeler, enabling counterfactual explorations of possible worlds that far outstrip the potential of thought experimenting alone.

A more microscopic focus like this one, on the actual processes by which computational simulation is coupled with the cognitive processes of the modeler, begins to help break down some of the mystery and seeming inscrutability surrounding computation conveyed by the idea that computational processes are offloaded, automated processes from which inferences are derived. The implications of this research into the hybrid nature of simulation modeling are that modelers might often have more control over, and insight into, their models and their alignment with the phenomena than philosophers have realized. Given the emphasis placed in the published scientific literature on fitting the data and on predictive success for validating simulations, we might be missing out on the important role that these processes internal to the model-building or discovery context appear to be playing (from a microanalysis of practice) in supporting the models constructed. Indeed, the ability of computational modeling to support highly exploratory investigative processes makes it particularly relevant for philosophers to have fine-grained knowledge of model-building processes in order to begin to understand why models work as well as they do and how reliable they can be considered to be.