1 Introduction

Simulation, defined as “a method for using computer software to model the operation of real-world processes, systems, or events” (Davis et al. 2007, p. 481), provides a distinctive methodological approach for studying various phenomena in many disciplines. The use of simulation allows researchers to isolate and vary the potentially large number of parameters of the respective system in a controlled environment, while producing massive amounts of data that, for instance, enable researchers to capture non-linear relations with statistical techniques. Thus, scholars have recently reiterated the potential of simulations to advance the information systems (IS) discipline by providing a novel way to investigate IS phenomena (Burton and Obel 2011; Loos et al. 2013; Spagnoletti et al. 2013; Zhang and Gable 2014).

While simulation-based research has substantially contributed to other disciplines, for instance the natural sciences or computer science, its current presence and impact in IS research is comparatively low (Zhang and Gable 2014). This may be due to (i) the nature and peculiar character of IS phenomena as well as (ii) particularities of simulation-based research approaches, both of which warrant closer investigation. First, the social and the technical aspects of IS phenomena are inextricably intertwined, so that boundaries between these two aspects are not clear-cut (Bostrom and Heinen 1977; Lyytinen and Newman 2008). This entails a need to conceptualize IS phenomena as complex socio-technical ensembles (Luna-Reyes et al. 2005; McLeod and Doolin 2012), for which both the social and the technical aspects and their relationships should be considered. Second, compared to theoretical analysis or deduction as well as empirical analysis or induction, simulation is recognized as a distinct third way of doing science (Harrison et al. 2007). Using simulation techniques, analytical reflections can be captured in mathematical models, which generate their own virtual data and thereby overcome the problem of data availability in empirical investigations (Harrison et al. 2007). In such research endeavors, scholars model and translate real-world problems into a virtual (i.e., simulation) world with the aim of deriving meaningful insights about the real-world problems. The creation of scientific knowledge through simulation-based research therefore requires several epistemic considerations (how knowledge can be justified through simulation results) and methodological considerations (regarding the activity of simulating and its relation to theorizing and experimenting) (Rohrlich 1990; Dowling 1999; Grüne-Yanoff and Weirich 2010). These considerations are decisive in IS research due to the complexity of the underlying social systems as well as the uncertainties inherent in incompletely described technical artifacts (Curşeu 2006).

This study first aims to understand how simulation-based research approaches can be applied to complex socio-technical IS phenomena. We adopt a socio-technical perspective to explain IS phenomena and discuss the epistemic inferences that are made in the simulation-based research process (Sargent 2005) of examining IS phenomena. Subsequently, we use the discussed epistemic inferences as a foundation to conduct a systematic literature review on simulation-based research in IS in order to understand how simulation-based approaches are currently employed in IS research. Finally, we discuss and contrast the epistemological implications of simulation-based research with the results of the literature review, allowing us to provide guidance for prospective simulation-based research in IS.

2 Studying IS Phenomena Through Simulation

To understand the challenges of studying IS through simulation-based approaches, it is necessary to discuss the particularities of IS phenomena. In their investigation of the “intellectual core” of the IS discipline, Sidorova et al. (2008) find that contemporary IS research generally includes the social context in which technical artifacts are designed and used. That is, while IS research essentially studies human-made technology-based systems, this analysis requires an understanding of corresponding social systems as well as the interaction between the social and technical systems (Lee 2010; Becker et al. 2015). This is what Orlikowski (1992) calls “duality of technology” – the dialectical interaction between technical artifacts and their social context (Orlikowski and Iacono 2001).

Consequently, our conceptualization of IS phenomena adopts a socio-technical systems perspective to identify and organize the constituent parts of IS (Bostrom and Heinen 1977; Lyytinen and Newman 2008; Wu et al. 2015). This perspective conceptualizes IS as two interrelated subsystems, the technical system and the social system. The technical system is concerned with the processes, tasks, and technologies needed to acquire, store, and transform information into outputs, such as products or services. The social system is concerned with the relationships among people and the attributes of these people, such as attitudes, skills, and values. Recent contributions to the socio-technical perspective conceptualize IS as a mutually interactive socio-technical ensemble comprising actors, tasks, technology, and structure, reflecting both technical and social aspects and the relations among them, which are embedded in and influenced by an external environment (Lyytinen and Newman 2008; Gregoriades and Sutcliffe 2008; Wu et al. 2015). Table 1 gives an overview of the constituent components of IS as socio-technical systems.

Table 1 Constituent components of IS as socio-technical systems (Lyytinen and Newman 2008)

What makes socio-technical systems distinct and subject to closer investigation is the overall behavior of such systems, which depends on a diverse set of often non-linear and dynamic mechanisms that relate to both the social and technical subsystems (Luna-Reyes et al. 2005; McLeod and Doolin 2012). Lyytinen and Newman (2008) argue that changes in IS, as socio-technical systems, occur due to misalignments among the systems’ constituent socio-technical components (as shown in Table 1). They posit that change is not solely or even mainly incremental and cumulative, but rather episodic and punctuated. As such, along with evolutionary (first-order) changes that fix the misalignment among components, a socio-technical system experiences revolutionary (second-order) changes (i.e., punctuations) over time, by which components of the system are re-configured and the system eventually exhibits new, emergent properties. Due to the complexity of socio-technical systems and their non-deterministic behavior, studying such systems through conventional research methods (e.g., case study, survey) is challenging. McKelvey (2002) argues that for such complex systems, “no empirical study or experiment could successfully and completely control all the complexities that might affect the designated parameters” (McKelvey 2002, p. 758). Simulations, however, allow researchers to study complex phenomena in a controlled setting and to examine the controlled interaction of many parameters in idealized socio-technical models (Curşeu 2006). Consequently, simulation-based research approaches hold tremendous potential to advance scientific knowledge on complex, longitudinal, and nonlinear IS phenomena (Davis et al. 2007).

2.1 Epistemic Particularities of Simulating Socio-Technical Systems

Realizing the potential of simulation-based research approaches in the IS discipline requires an understanding of how a simulation relates to real-world IS phenomena (Frank and Troitzsch 2005). Much has been written on the epistemic particularities of simulation (Humphreys 1990; Winsberg 1999; Davis et al. 2007; Grüne-Yanoff and Weirich 2010), and prominent authors have even challenged the need for a distinct simulation epistemology in the philosophy of science (Frigg and Reiss 2008). Due to the breadth of this ongoing debate, we focus our discussion on the aspects that are both fundamental for an analysis of epistemic inferences in simulation-based research and particularly relevant for IS as socio-technical systems. For instance, many studies are concerned with IS artifacts that “do not yet exist but which are not only imaginable but also useful from today’s perspective” (Frank et al. 2014, p. 40). Considering these particularities, it is worthwhile to reflect on some of the more general discourses on the epistemic status of simulation in the specific context of socio-technical IS.

A key aspect of simulation-based research is that many simulation techniques require a mechanism-based explanation of the investigated phenomena (Hedström and Ylikoski 2010). At its core, the concept of mechanism-based explanations implies that “proper explanations should detail the cogs and wheels of the causal process through which the outcome to be explained was brought about” (Hedström and Ylikoski 2010, p. 50). Simulations fundamentally rely on such mechanism-based explanations in the form of models – purposefully constructed abstractions that describe the simulation behavior and that, at least partially, aim at representing a real-world system or phenomenon (Becker et al. 2005; Frank et al. 2014). It is important to note that such abstractions are not always simplifications, as researchers are often required to hypothesize and detail hidden, but nevertheless relevant, causal mechanisms in the development of simulation models (Frank 2014).

In the context of simulation-based research, models broadly aim at two kinds of explananda (Hedström and Ylikoski 2010). First, they may focus on empirical facts, for example by building simulations as targeted IS artifacts that accurately predict or classify real-world phenomena such as web-browsing paths of customers (Kuo et al. 2005) or purchase decisions (Chang et al. 2006; Sun et al. 2008). In this case, the epistemic credibility of the simulation model fundamentally depends on a match between simulation output and observable empirical data, and thus general discussions on epistemology in IS apply (e.g., Frank 2011). Simulation models, however, might also focus on highly stylized theoretical explananda (Hedström and Ylikoski 2010), aimed at mechanism-based theory development. In this case, simulation models gain epistemic credit from existing theoretical models, and thus do not necessarily closely resemble any particular real-world phenomenon (Bichler et al. 2016).

These approaches to establishing the epistemic credit of simulations are related to fundamental issues in the philosophy of science. Regarding empirical grounding, the key issue is that of faulty inductive generalization, or inductive fallacy (Johnson 1996): how can one conclude the truth of a general statement, for example in the form of a scientific theory, based on a limited set of specific observations (inductive reasoning)? Instead, using a theoretical grounding, one might argue that by detailing the causal processes in the simulation model, the general statement is necessarily true as a logical consequence of a priori true assumptions (deductive reasoning). This, however, raises a new issue: how does a simulation-based result then differ from a logical tautology? How can one create new insights beyond what is already assumed in the definition of the simulation model?

The practical answer to this dilemma is that most simulation-based research employs both inductive and deductive reasoning. In general, simulations are not just logical deductions based on true mathematical principles (Winsberg 2003). Instead, several, often highly specific, modelling decisions are assumptions rather than scientific truths, and some calculations and transformations, for example in artificial neural networks, cannot reasonably be traced by humans (Humphreys 2008; Winsberg 2009; Grüne-Yanoff and Weirich 2010). To counteract the loss of epistemic credit due to such inferences, researchers again compare their simulation results to empirical data.

Philosophically, simulations therefore do not resolve the centuries-old epistemic issues involved with inductive (and potentially fallacious) and deductive (and potentially tautological or infinitely regressing) reasoning (Johnson 1996; Frigg and Reiss 2008; Gregor and Hovorka 2011). In practice, however, simulations can draw upon, and to a certain extent require, both approaches. Regarding induction, Davis et al. (2007) argue that developing scientific theory through simulation-based approaches requires a suitable “simple theory” (Davis et al. 2007, p. 482, Table 1) as a basis for the assumptions in the simulation model. Such “simple theory” must at least provide the basic concepts and processes that describe a phenomenon (Davis et al. 2007). On the other hand, if a theory already clearly describes all constructs and processes in detail, it may be difficult to extend such a theory by simulation, since most results will just be logical consequences of the theory. Regarding deduction, simulation-based research can handle massive volumes of data, thereby avoiding faulty generalizations due to small sample groups. Simulations thus shine in settings where such large datasets are easily available for validation purposes. Furthermore, we reiterate the potential of simulations to study complex, longitudinal, and nonlinear IS phenomena (Davis et al. 2007) by isolating specific parameters and studying their interactions in purposefully constructed abstract models (Curşeu 2006; Frank et al. 2014).

2.2 Epistemic Inferences of Simulation

Following the previous discussion, simulations both deductively gain epistemic credit by using existing scientific theory in their construction and inductively gain epistemic credit by comparing simulation results with empirical data (Winsberg 1999; Sargent 2005). On the one hand, the use of simulation thus entails a careful reflection of real-world problems, as well as of existing theoretical understandings of these problems, in simulation models. On the other hand, it also requires an accurate evaluation and validation of simulation experiments and a thorough interpretation of the resulting insights’ implications for the given real-world problems.

Therefore, from an epistemological vantage point, each step of a simulation process can be questioned in terms of how appropriately the real-world problem is reflected in the simulation model (transferring existing real-world knowledge to the simulation world), how reliable the simulation experiments are (simulation), and how valid and meaningful the resulting insights are for the given real-world problems (transferring resultant knowledge from the simulation and making sense of this knowledge in the real-world context). To elaborate on these inherent challenges of simulation-based research, we first discuss different simulation techniques, then outline the constituent steps of a simulation-based research process, and eventually discuss epistemic inferences in the different steps of such a process.

The term simulation, in our adopted definition by Davis et al. (2007), refers to a very diverse class of methods, each with its own epistemic capabilities and restrictions. Hence, we distinguish different simulation techniques in our analysis, namely analytical simulations, stochastic processes, system dynamics, genetic algorithms, artificial neural networks, and general agent-based simulations, in line with previous literature on simulation-based research in IS (e.g., Davis et al. 2007; Spagnoletti et al. 2013; Zhang and Gable 2014). Analytical simulation refers to simulation models that are directly based on a mathematical description of a system and that focus on the use of formal models (e.g., game theory and auction theory). Stochastic processes are similar regarding the formal nature of the underlying mathematical models, but differ in that at the center of such simulations is a model of a stochastic process, for example characterized by a random walk (Ripley 2009). Such simulations are frequently employed to study the consequences of changes to highly stylized socio-economic market models (e.g., Xiao and Dong 2015). System dynamics refers to a specific modelling approach that employs the concept of stocks and flows to simulate the behavior of complex systems over time (Roberts et al. 1994; Sterman 2002). Genetic algorithms refer to an optimization approach with roots in biology that models a system as a set of heterogeneous entities – called candidate solutions – which iteratively evolve and adapt over time to better match a given fitness function (Whitley 1994; Davis et al. 2007). Similarly, artificial neural network refers to a specific machine learning technique that is commonly employed in simulations as a component that adapts to and learns from input data (Graupe 2013). Finally, we distinguish other, general agent-based models, referring to any computational model that represents the actions and interactions of autonomous agents (Macal and North 2009).
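To make the distinction between these techniques more tangible, the following minimal sketch implements the arguably simplest case named above, a stochastic process simulation built around a random walk (cf. Ripley 2009). All names and parameter values are illustrative and not taken from any of the cited studies:

```python
import random

def random_walk_price(n_steps=1000, start=100.0, volatility=0.5, seed=42):
    """Simulate a stylized market price as a simple random walk.

    Each step adds a uniformly distributed shock -- a deliberately
    idealized model of a socio-economic market in the spirit of the
    stylized market models discussed above.
    """
    rng = random.Random(seed)
    price = start
    trajectory = [price]
    for _ in range(n_steps):
        price += rng.uniform(-volatility, volatility)
        trajectory.append(price)
    return trajectory

# Repeated runs with different seeds produce the virtual data that
# simulation-based research then analyzes statistically.
runs = [random_walk_price(seed=s) for s in range(100)]
final_prices = [run[-1] for run in runs]
print(sum(final_prices) / len(final_prices))
```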

Following Sargent (2005), the simulation model development process comprises distinct steps, starting with the selection of the real-world system to be simulated (problem entity) and ending with the actual implementation of the simulation model on a computer (see the upper part of Fig. 1). Researchers can then employ the implemented simulation model to conduct simulation experiments and use the obtained results in combination with empirical data or extant system theories to create new knowledge (lower part of Fig. 1).

Fig. 1 Constituent steps of a simulation-based research process

The term problem entity is used to refer to the real-world system in which the investigated phenomenon of interest is situated. Researchers generally rely on scientific theory (system theories), which describes extant knowledge about the problem entity, to derive a conceptual model as a foundation for the design of the simulation. Sargent’s (2005) model is quite general, which ensures that it is applicable to a wide range of simulation-based research and makes it a useful starting point for our subsequent analysis. Furthermore, it has been widely used (Law 2008; Bratley et al. 2011), and although other simulation process models may use slightly different terminology, they are very similar in their structure (e.g., Davis et al. 2007).

The relation between models and theory is the subject of an ongoing debate in the philosophy of science and in IS (Frigg and Hartmann 2012; Bichler et al. 2016). A common position is that models are independent of theory in their construction and their functioning (Winsberg 1999; Morgan and Morrison 1999; Grüne-Yanoff and Weirich 2010). In the context of simulation-based research, this means that scholars are required to interpret and refine scientific theory, which is rarely precise enough to directly translate into computerized simulation models. Along this process, one can distinguish different types of models that are used in simulation-based research (Winsberg 2003; Sargent 2005; Küppers and Lenhard 2005). We use the term conceptual model (Sargent 2005) to refer to the basic conceptual understanding of the problem entity that scholars gain through interpreting the respective scientific theory. Winsberg (2003) refers to this as a principle model: “The simulationist begins by choosing a principle model – a model that characterizes the system in terms of both the arrangement of its constituent parts, and the rules of evolution that govern the changes of state that the system undergoes over time” (Winsberg 2003, p. 108).

The term simulation model specification refers to the mathematical model that represents and reflects the conceptual model in the simulation. Since any computer implementation of a simulation necessitates a precise mathematical definition of all employed constructs and relations, researchers have to make simplifications and assumptions to fill in those parts of the conceptual model that lack the required mathematical precision. This simplification, which introduces specific assumptions into the simulation model, should rely on stylized facts so that only abstract and relevant aspects of the real world enter the simulation model (Bichler et al. 2016). Finally, we use the term simulation model to refer to the executable simulation, i.e., the actual implementation on a computer.
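As a purely hypothetical illustration of this specification step, a conceptual statement such as “technology adoption accelerates with the number of prior adopters” could be made mathematically precise as a Bass-style diffusion equation; the fixed market potential $m$ and the constant innovation and imitation coefficients $p$ and $q$ are exactly the kind of simplifying assumptions that the conceptual model itself does not state:

$$\frac{dA(t)}{dt} = \left(p + q\,\frac{A(t)}{m}\right)\left(m - A(t)\right)$$

where $A(t)$ denotes the cumulative number of adopters at time $t$.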

Scientific simulations that represent socio-technical systems are not simply calculations, but instead “involve a complex chain of inferences that serve to transform theoretical structures into specific concrete knowledge” (Winsberg 1999, p. 275). This process of knowledge-generation through simulation is described in Fig. 1, following Sargent (2005). We focus our analysis on the links between knowledge of the real-world system (in the form of scientific system theories and empirical data) and the simulation world (steps 1, 5, and 6 in Fig. 1), as well as the links between two intermediate constructs within the simulation world (steps 2, 3, and 4 in Fig. 1). Each of these creates another layer of distance between reality and the obtained simulation results, thus requiring epistemic justification.

Conceptual Modeling (1) Since “any reasonably comprehensive simulation of organizations must be constructed from insights made with regard to how organizations have been observed to operate” (Kulik and Baker 2008, p. 88), we analyze the construction of conceptual models based on scientific theory. The importance of a theory-informed conceptualization is underlined by Sargent’s (2005) choice to include the abstraction from a system theory to a conceptual model as a fundamental step in the simulation model development process. Here, system theory refers to any kind of theory from which the conceptual model can be derived, such as the IS modeling theories already discussed in the IS modeling literature (Bichler et al. 2016). While a unified modeling theory is still missing, depending on the targeted conceptual model, researchers can employ a wide range of theories, such as fuzzy set theory or alternative uncertainty theories, including stochastics (Bichler et al. 2016).

Occasionally, IS researchers rely on formal theories to logically derive models from a set of fundamental axioms (Bichler et al. 2016). Consequently, such models are not necessarily based on observations in the real world. In most cases, however, the construction of conceptual models in simulation-based IS research is “an activity that often brings us beyond the original theoretical principles themselves” (Winsberg 2003, p. 118), since conceptual modeling usually involves creativity and intuition (Winsberg 2003; Frank and Troitzsch 2005).

Philosophers of science often use the term autonomous models (Morgan and Morrison 1999), in the sense that models “function as instruments of investigation, [as they] are partially independent of both theories and the world” (Morgan and Morrison 1999, p. 10). This view on partial independence still recognizes that the process of model construction is limited by the concepts and languages employed in a scientific domain (Frank 2011; Loos et al. 2013). Rather, following Winsberg (2003), the view “that models are autonomous or independent of theory is meant to emphasize the fact that there is no algorithm for reading models off of theory”, and thus involves creativity and intuition (Winsberg 2003, p. 106). Furthermore, and particularly relevant for IS, researchers often develop models that have no clear, direct empirical grounding (Frank 2011). For example, many studies aim to design IS artifacts that overcome existing practices, therefore necessitating the development of models that, at the time of model development, lack an empirical original (Frank 2011).

On the one hand, the autonomy of models therefore requires researchers to argue that the creative, intuitive modeling choices lead to an accurate description of the investigated phenomenon in the results obtained from the final simulation model (in step 4). On the other hand, it is often this very process that brings about new scientific knowledge, for example by providing additional evidence for or against intuitive refinements of extant theory. Researchers are thus often required to revisit this step after comparing the resulting insights of the simulation model with extant theories or new empirical studies, which makes simulation-based research an iterative process (Sargent 2005).

Specifying (2) The act of specifying concerns the development of a mathematical representation of the conceptual model. Similar to the previous step, researchers are required to make assumptions in translating a conceptual model into a computational model (Poile and Safayeni 2016), a transformation process that generally involves “idealizations, approximations, and even self-conscious falsifications” (Winsberg 2003, p. 108). As implemented simulation models correspond to precise mathematical constructs, a preliminary step in their creation requires that “specific rules replace general laws” (Pias 2011, p. 35). Winsberg (2003) refers to this creative process as ad hoc modeling: “Ad hoc modeling includes such techniques as simplifying assumptions, removal of degrees of freedom, and even substitution of simpler empirical relationships for more complex, but also more theoretically-founded laws” (Winsberg 2003, p. 109). As such, the resultant simulation model often only reflects stylized facts, i.e., interesting patterns in empirical data that focus on statistical relations between observable phenomena while abstracting from details (Houy et al. 2015). This endeavor simplifies empirical data and makes the specific assumptions of the simulation model explicit. However, due to the imprecisions introduced by ad hoc modeling, the simulation model does not carry the full epistemic credentials of the governing theory and thus requires additional justification and validation. One promising way to ensure the validity of simulation models is to employ stylized facts based on existing system theories (Houy et al. 2015; Bichler et al. 2016). By doing so, the resultant simulation model incorporates both scientific theory and real-world empirical data, essentially mediating between theory and real-world systems (Bichler et al. 2016).

Implementing (3) From an epistemic perspective, the implementation of a specified model on a computer requires a researcher to argue that the simulation model is an error-free and correct software implementation of the simulation model specification on the given physical hardware. A formal, algorithmic code verification is often not feasible for complex software (D’Silva et al. 2008; Ashish and Aghav 2013). Instead, researchers usually argue that the process of implementing the simulation model was rigorous, for instance by adhering to commonly recommended processes and software engineering principles.
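As a minimal illustration of what such rigor can look like in practice, the following sketch verifies a hypothetical simulation step against two invariants stated in its specification; the diffusion function and its invariants are invented for this example:

```python
import unittest

def diffuse(stock_a, stock_b, rate):
    """One step of a hypothetical two-stock diffusion model.

    Specification: a fraction `rate` of the difference flows from
    stock A to stock B; the total quantity is conserved.
    """
    flow = rate * (stock_a - stock_b)
    return stock_a - flow, stock_b + flow

class DiffuseVerification(unittest.TestCase):
    def test_conservation(self):
        # The specification demands that no quantity is created or lost.
        a, b = diffuse(10.0, 2.0, 0.25)
        self.assertAlmostEqual(a + b, 12.0)

    def test_equilibrium_is_fixed_point(self):
        # Equal stocks must remain unchanged.
        self.assertEqual(diffuse(5.0, 5.0, 0.25), (5.0, 5.0))

if __name__ == "__main__":
    unittest.main()
```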

Experimenting (4) The central epistemic issue when conducting simulation experiments relates to the inability of researchers to analytically trace how the simulation results were obtained. This is often referred to as epistemic opacity: it is simply impossible for humans to follow and understand the millions of calculations that are performed by the computer to obtain the results of simulation experiments (Humphreys 1990; Grüne-Yanoff and Weirich 2010). This is particularly true, and even more decisive, for simulations that aim to understand emergent phenomena, a common goal in the exploration of socio-technical systems (Kochanowicz et al. 2013). The idea of emergence in socio-technical simulation experiments has been nicely described by Coleman (1994): to understand associations between observed macro-level socio-technical phenomena, one investigates how the occurrence of a given macro-level phenomenon affects individual elements of the analyzed socio-technical system and how these individual elements interact and influence each other. In consequence, these interactions again aggregate to the observed macro-level phenomena (Coleman 1994; Boero and Squazzoni 2005; Manzo 2007; Hedström and Ylikoski 2010).

As it is not possible for a human to follow all the calculations that model these situational, action-formation, and transformational mechanisms, one has to instead rely on additional techniques, such as graphical visualization, to argue for the relation between simulation design and observed output data from simulation experiments (Trier 2008; Lee et al. 2015). IS researchers may rely on the extant knowledge from related disciplines, such as research on the design of autonomous software agents and multi-agent simulations (Birdsey and Szabo 2014; Doan et al. 2014), to structure their simulation experiments.
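To make this macro-micro-macro logic concrete, the following sketch implements a single, highly stylized micro rule – preferential attachment, one of the emergence mechanisms taken up again in Sect. 4 (cf. Johnson et al. 2014) – whose repeated application produces a skewed macro-level degree distribution that no individual rule states explicitly. Network size and random seed are illustrative:

```python
import random
from collections import Counter

def preferential_attachment(n_nodes=2000, seed=1):
    """Grow a network where each new node attaches to an existing node
    with probability proportional to its current degree (the micro rule)."""
    rng = random.Random(seed)
    # `targets` repeats each node once per incident edge, so uniform
    # sampling from it is degree-proportional.
    targets = [0, 1]
    degrees = Counter({0: 1, 1: 1})
    for new_node in range(2, n_nodes):
        partner = rng.choice(targets)
        degrees[new_node] += 1
        degrees[partner] += 1
        targets.extend([new_node, partner])
    return degrees

# Macro-level observation: a heavily skewed degree distribution emerges,
# e.g., many nodes of degree 1 and very few highly connected hubs.
degrees = preferential_attachment()
print(Counter(degrees.values()).most_common(5))
```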

Validating and Predicting (5) Validation of simulation results can either be done by comparing them with data obtained from empirical studies (this step, validating and predicting, in Fig. 1) or by evaluating them in the context of already established scientific theory (step 6, validating and hypothesizing, in Fig. 1). Using the first approach, the epistemic credit of simulation results is largely related to the ability of the simulation to reproduce or predict characteristics of the socio-technical phenomenon under investigation (Boero and Squazzoni 2005). Consequently, a large number of validation techniques relies on a direct comparison with real-world observations to argue that simulation results constitute scientific knowledge.

In addition to using empirical data for the validation of simulation results, researchers also develop simulations with the goal to predict future empirical data. Both testing a simulation’s predictive validity and using a simulation to predict are complicated by epistemic issues related to the predictive precision of a simulation model. In general, due to abstractions and simplifications in the simulation model development process, the results of simulation experiments do not show a precise one-to-one correspondence with empirical data. Instead, simulation experiments rather aim to create statistical estimates of phenomena or observations that suggest the presence of certain conceptual relations (Küppers and Lenhard 2005). Thus, scholars have argued that in the empirical validation of simulation results “adequacy replaces proof” (Pias 2011, p. 35), meaning that no direct relation to reality is established, but simulation results are instead judged against experience and high-level observations. Similarly, “performance beats theoretical accuracy” (Küppers and Lenhard 2005, p. 6) when using simulations to predict, meaning that simulation models are evaluated against their utility rather than against a perfect correspondence with reality. In consequence, this lack of predictive precision requires researchers to be very careful in framing and positioning their results from an epistemic perspective.
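The following minimal sketch illustrates this notion of adequacy: repeated stochastic runs are aggregated into statistical estimates and judged against a tolerance band around empirical values, rather than against a one-to-one correspondence of individual trajectories. The simulator and the empirical values are invented for this example:

```python
import random
import statistics

def simulate_final_value(seed, n_steps=1000, start=100.0, vol=0.5):
    """One stochastic run of a stylized random-walk model."""
    rng = random.Random(seed)
    value = start
    for _ in range(n_steps):
        value += rng.uniform(-vol, vol)
    return value

# Hypothetical empirical estimates the simulation should reproduce.
empirical_mean, empirical_sd = 100.0, 9.0

# Aggregate many runs into statistical estimates.
finals = [simulate_final_value(seed=s) for s in range(500)]
sim_mean, sim_sd = statistics.mean(finals), statistics.stdev(finals)

# "Adequacy replaces proof": judge the estimates against tolerance bands
# instead of demanding exact correspondence with empirical data.
adequate = (abs(sim_mean - empirical_mean) < 1.0
            and abs(sim_sd - empirical_sd) < 2.0)
print(f"mean={sim_mean:.2f} sd={sim_sd:.2f} adequate={adequate}")
```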

Validating and Hypothesizing (6) In addition to relying on empirical observations, validation of simulation results can also be done by comparing them with established scientific theory. To that end, simulation results are contrasted with a theory-informed understanding of the phenomenon of interest, in order to test whether the simulation is able to adequately recreate extant scientific knowledge. Respective validation techniques aim to facilitate this process of data generation and interpretation, for example by describing common types of visualizations or specific parameterization strategies.

Furthermore, simulation results can also be used to advance and refine extant knowledge, for example by evaluating the influence of certain parameterization choices and modeling decisions made during simulation development to derive novel hypotheses and propositions in a given theoretical framework (Epstein 2008). Contrary to the traditional epistemic process commonly employed in the natural sciences, epistemological inference in simulation-based research is “downwards”, starting with abstract theory and then moving to empirical observations (Winsberg 2001). This gives rise to the issue of equifinality: different initial conditions and different processes can lead to the same final result, which poses a potential problem for deducing novel theory from complex simulations (Epstein 1999; Davis et al. 2007; Harrison et al. 2007; Weinhardt and Vancouver 2012; Poile and Safayeni 2016). How can one argue about theoretical constructs and causal processes based on the observation that “two black boxes are able to reach the same outcome” (Poile and Safayeni 2016, p. 4)? Researchers are therefore usually required to employ multiple validation techniques to understand not only the structure of the simulation results, but also how these results were generated (Sargent 2005). Equifinality, and associated concepts and problems, are the subject of ongoing discussions within the IS discipline on emergence in complex socio-technical systems (Lyytinen and Newman 2008; Lee et al. 2015; Prat et al. 2015). Additionally, related research on the verification of multi-agent simulations proposes several techniques and frameworks to deal with issues related to the black-box nature of simulation experiments (e.g., Doan et al. 2014; Montali et al. 2014; Aminof et al. 2016; Jamroga et al. 2016).

3 Literature Review Approach

Relying on the preceding discussion of epistemological particularities of simulation-based IS research, we now investigate how IS scholars conduct such studies. To this end, we opt for a structured literature review, following the suggestions of Webster and Watson (2002) and vom Brocke et al. (2015). With this literature review, we specifically aim to not only summarize, but to analyze and critically examine the status quo of simulation-based IS research in the context of ongoing discussions on simulation epistemology in the philosophy of science (Rowe 2014).

3.1 Literature Selection

To make our review of the pertinent literature as comprehensive as possible, we opt for a literature selection procedure that determines a representative set of papers from the large body of related publications (vom Brocke et al. 2015). We therefore adopt the list of 21 top IS journals analyzed by Lowry et al. (2013) as well as conference papers presented at the two most influential international and European IS conferences (i.e., ICIS and ECIS). To identify relevant publications, we conducted a search via the ISI Web of Science using the different simulation techniques introduced in Sect. 2.2 as well as the general term simulation. In addition, we searched the AIS electronic Library (AISeL) for papers presented at ECIS or ICIS. We used the following search string for the fields abstract, title, and keywords, across all selected journals, for publications up to the year 2016:

“neural network” OR “system dynamic*” OR “NK fitness landscape” OR “genetic algorithm*” OR “cellular automat*” OR “stochastic process*” OR “simulation*”

In total, we retrieved 697 publications in this first step. To select the most relevant and influential papers from this database, we first only included papers that have, on average, at least one citation per year since their publication. However, as papers published in 2015/2016 are too recent to have accumulated a large number of citations, and as papers published in the AIS basket of top journals are crucial for our review, we included all papers from these two groups. This resulted in 255 papers, for which we then read the abstract and introduction sections to exclude papers that do not develop a computer simulation model of their own. In this step, we excluded, for example, papers that only discuss the use of simulation in general, only reference other simulation studies, or use simulation to refer to human experiments that do not involve computation. The final result is a database of 175 relevant papers (see Appendix A1; available online via http://link.springer.com).
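For illustration, the inclusion rules described above can be expressed as a simple filter over the search results. The file name, field names, and the truncated outlet list are hypothetical stand-ins; they do not describe the actual tooling used in this review:

```python
import csv

CURRENT_YEAR = 2016
# Placeholder excerpt: the actual outlet list follows Lowry et al. (2013).
BASKET_JOURNALS = {"MIS Quarterly", "Information Systems Research"}

def keep(paper):
    """Apply the inclusion rules: one citation per year on average,
    or published in 2015/2016, or published in a basket journal."""
    age = max(CURRENT_YEAR - int(paper["year"]), 1)
    cites_per_year = int(paper["citations"]) / age
    recent = int(paper["year"]) >= 2015
    basket = paper["outlet"] in BASKET_JOURNALS
    return cites_per_year >= 1.0 or recent or basket

with open("search_results.csv", newline="", encoding="utf-8") as f:
    selected = [p for p in csv.DictReader(f) if keep(p)]
print(len(selected), "papers retained for abstract screening")
```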

3.2 Analysis Framework and Coding Procedure

For coding the selected papers, we developed a comprehensive analysis framework, based on the preceding discussion of epistemic inferences. We thereby focus on the links between knowledge of the real-world system, in the form of scientific system theories or empirical data, and the simulation world. We distinguish three parts to form the analysis framework and to structure the subsequent discussion of results:

Real world to simulation world First, we investigated how researchers went about constructing conceptual models (step 1, modelling in Fig. 1). Since scientific knowledge about real-world systems is essentially based on scientific theories (Bichler et al. 2016), we in particular analyzed how researchers employed extant system theories in both the design of conceptual simulation models and the interpretation of simulation results (Sargent 2005; Davis et al. 2007; Poile and Safayeni 2016). In coding our results, we relied on the list of 174 theories used in IS research compiled by Lim et al. (2013). Due to the large number of theories and the large number of papers in our database, we derived regular expressions for each theory (e.g., “Resource dependence theory” was replaced by “resource.?dependenc” to capture instances where “resource-dependency” is written with a hyphen and a “y”), which were then used in combination with text-mining software to color-code relevant parts of the analyzed papers. We then read the respective papers to see how the theories are used and only included a theory in our coding if it is explicitly referenced and employed in constructing the simulation model or in evaluating simulation results.
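The following sketch illustrates this pattern-based pre-coding. Only the resource dependence pattern is taken from the example above; the second pattern and the matching logic are hypothetical simplifications of the actual procedure:

```python
import re

# Excerpt of theory patterns in the style of the example above; the full
# list would cover the 174 theories compiled by Lim et al. (2013).
THEORY_PATTERNS = {
    "Resource dependence theory": re.compile(r"resource.?dependenc", re.I),
    "Technology acceptance model": re.compile(r"technology.?acceptance", re.I),
}

def candidate_theories(full_text):
    """Return theories whose pattern occurs in a paper's full text.

    Matches only flag candidate passages for color-coding; inclusion in
    the final coding required reading how the theory is actually used.
    """
    return [name for name, pattern in THEORY_PATTERNS.items()
            if pattern.search(full_text)]

print(candidate_theories("We draw on resource-dependency arguments ..."))
# -> ['Resource dependence theory']
```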

Simulation world Second, we analyzed how researchers translated their conceptual model into a simulation model, relating to the steps specifying (step 2) and implementing (step 3) in Fig. 1. Based on the preceding discussion of IS as socio-technical systems (see Sect. 2), we classified simulation-based research according to the socio-technical system components (i.e., actors, tasks, structure, technology, and environment) that were included in the simulation model, to understand which aspects of the investigated IS the model focuses on and where potential simplifications may take place. We additionally coded papers according to the employed simulation technique (stochastic processes, analytical, system dynamics, genetic algorithms, artificial neural networks, and agent-based), based on the discussion in Sect. 2.2. Coding mainly relied on reading the content of the methodology sections of the selected papers, reaching into other sections whenever the model was discussed in a different part.

Simulation world to real world Third, we studied how the simulation was used to investigate and validate against real-world phenomena, combining the steps experimenting (step 4), validating and predicting (step 5), and validating and hypothesizing (step 6). Since simulations may fulfill very different purposes in scientific research (Boero and Squazzoni 2005), we coded simulations by their intended use, as this significantly affects the development process of the simulation model (Davis et al. 2007). Following Harrison et al. (2007) and Axelrod (1997), we distinguish the following uses for simulation: prediction (how model variables are related), proof (show that certain system behavior exists), discovery (discover unexpected consequences of interactions), explanation (explain why the system behaves in a certain way), critique (test existing theories), prescription (suggest how to best interact with or within the system), and empirical guidance (derive hypotheses for empirical testing). During coding, we relied on the descriptions given by Harrison et al. (2007, pp. 1238–1239) to investigate the introduction, results, and discussion sections of the selected papers. Note that it is possible for a simulation to serve multiple purposes. For example, Wöhner et al. (2015) use a simulation model to first predict the effects of different parametrizations on the behavior of managed wikis, and subsequently employ the obtained data to prescriptively argue how this concept can be used to overcome related issues, such as online harassment and cyberbullying.

Finally, we relied on Sargent (2005) to distinguish different validation techniques that researchers may employ in the process of simulation-based research (see Table 2). For coding, we read the methodology, results, and discussion sections of the selected papers, as well as related appendices.

Table 2 Validation techniques in simulation-based research

To ensure the uniformity of the coding and to avoid ambiguity, the coding scheme was discussed intensively among all authors in a series of five workshops, totaling 11.5 h, to reach a common understanding of each element of the coding scheme. In the workshops, the authors discussed the underpinning criteria for each of the coding scheme’s elements, relying on the arguments provided in the referenced studies upon which the analysis framework, and consequently the coding scheme, is built. This discussion resulted in the first version of a detailed coding guideline. We then conducted a pilot coding, in which two of the authors coded the same set of papers independently based on the initial coding guideline. We discussed the few disagreements among coders in the pilot coding and adjusted the coding guideline accordingly. This revised guideline was then used by the first author to code all 175 selected papers, including a re-coding of the pilot papers. Finally, we followed the recommendations of Lombard et al. (2002) and Saldaña (2013) to formally assess the reliability of the coding. To this end, we had two researchers independently code a random, nonoverlapping sample of 10 papers each (20 papers in total), thereby exceeding the suggested reliability sample size of 10% of the full sample (Lombard et al. 2002, p. 601). Appendix A2 provides the details of this intercoder reliability analysis. Overall, we reached 93.54% intercoder agreement across all items, and the additionally calculated indices (i.e., Krippendorf’s α and Cohen’s κ) suggest that the coding is highly reliable (Lacy and Riffe 1996; Lombard et al. 2002; Neuendorf 2016).
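For readers who wish to reproduce such an assessment, the following sketch computes Cohen’s κ from two coders’ nominal codes. The example codes are illustrative, not the actual coding data of this study:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' nominal codes on the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement expected from the coders' marginal distributions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Illustrative codes for six papers (simulation technique item).
a = ["agent", "ann", "agent", "sysdyn", "agent", "ann"]
b = ["agent", "ann", "agent", "sysdyn", "ann", "ann"]
print(round(cohens_kappa(a, b), 3))  # 0.739
```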

4 Current State of Simulation-Based Research in IS

Table 3 provides an overview of the selected 175 papers, showing during which period these papers were published and which simulation techniques are employed. Most papers in this database are published in Decision Support Systems (65 papers), followed by Information Systems Research (33 papers) and the Journal of Management Information Systems (18 papers); see Table 7 in Appendix A1 for a detailed overview by outlet.

Table 3 Overview of the use of simulation techniques over time

We focus our discussion on the most interesting insights resulting from a cross-element analysis of prior research, to illustrate how different modelling choices during simulation development are interrelated and how they influence simulation use and validation. The structure follows the introduced elements of the analysis framework in the previous section: we first examine the use of theories as a foundation for simulation model development, then investigate several aspects related to the development of the simulation model itself, and finally describe how IS scholars have used and validated simulation models in their research. Table 4 summarizes our findings about simulation-based research in IS.

Table 4 Summary of the literature review findings

Real world to simulation world Our analysis of simulation-based IS research reveals that, similar to the trend in the IS discipline in general, the development of theory-based conceptual models in simulation-based studies is growing (Fig. 2). Almost 80% of the analyzed papers published between 2015 and 2016 employ at least one theory in constructing their conceptual model, and compared to the 1990s and 2000s, the use of theories in simulation-based IS research has increased substantially in the 2010s.

Fig. 2 Use of theories in simulation-based IS research

A particularity of IS research is that, due to its multidisciplinary nature, a wide range of theories from reference disciplines, such as the organization, management, and computer sciences, is used in addition to native IS theories to guide both theory building and theory testing (Straub 2012; Lim et al. 2013). The same can be observed for simulation-based studies: the most frequently employed theories are both (i) commonly used or native IS theories (e.g., game theory, the technology acceptance model, competitive strategy, or portfolio theory) and (ii) discipline-specific theories (e.g., auction and queuing theories in economics and computer science), which are easily applicable to the specific research questions of a given study. In effect, researchers exploit a wide range of theories that can be easily translated into mathematical models and that help them to systematically derive different scenarios subject to simulation (Finding 1 in Table 4).

Table 5 distinguishes different simulation techniques to investigate the average number of theories used in a single publication as well as the percentage of publications that use at least one theory. Combining these two data points allows us to make several interesting observations. First, we note that theories are mostly exploited in agent-based and analytical simulations (see Table 5). For these simulation techniques, it is often the case that the complex socio-technical real-world system requires the use of multiple, complementary theories to describe different parts of the conceptual model (Finding 2 in Table 4). The same is true for system dynamics models in terms of using multiple theories; however, most system dynamics models in our sample rely on adapting and combining existing models instead of referencing system theories. Nevertheless, those system dynamics studies that do employ system theories often use multiple theories.

Table 5 Use of theories by simulation technique

These simulation techniques (agent-based, analytical, and system dynamics) can then be contrasted with genetic algorithms and artificial neural networks. We find that most studies rely on exactly one theory to justify input/output parameters (in the case of artificial neural networks) or parameters that define a candidate solution (in the case of genetic algorithms). Since the corresponding simulation models are essentially abstract, mathematical black boxes, they do not require multiple theories to translate complex socio-technical interactions into holistic conceptual models (Finding 3 in Table 4).

Simulation world To analyze which components of a socio-technical system are covered by simulation models, we compare the percentages of publications that include a socio-technical system component for different simulation techniques (Fig. 3). Unsurprisingly, the coverage of socio-technical system components often directly follows the nature of the employed simulation technique. For example, the autonomous agents in agent-based models usually match the socio-technical description of actors, and system dynamics models rely on a precisely defined structure.

Fig. 3 Socio-technical components modeled by simulation technique

In Fig. 3, we can first observe a larger coverage of socio-technical system components in agent-based and system dynamics simulation models, which explicitly model relations (Parunak et al. 1998; Carley 2001; Dooley 2002), when compared to stochastic processes and analytical simulations. To adequately represent a real-world problem, agent-based and system dynamics simulation models need to take a large number of factors into account, relating to different socio-technical components of the respective phenomenon and their interactions. Since the overall behavior of the simulation model emerges as a consequence of the modelled interactions, it is often not clear a priori which factors are important and which factors may be ignored (Wu et al. 2015). Consequently, such models generally cover a comparatively large number of socio-technical system components (Finding 4 in Table 4).

On the other end of this spectrum are artificial neural networks, which do not require an explicit description of socio-technical interactions. Instead, the underlying models of this technique rely on abstract mathematical models that do not resemble any real-world socio-technical system components (Graupe 2013). Hence, we frequently find simulations of IS that employ artificial neural networks and only cover a limited number of socio-technical system components (Finding 5 in Table 4).

Simulation world to real world We now analyze how simulations are validated and used in IS research. Figure 4 shows the percentages of papers that employ different validation techniques (not including animation, degenerate tests, and Turing tests, which are not explicitly reported in any of the 175 analyzed papers), grouped by simulation technique. From this figure, we can see that agent-based simulations in particular rely on a wide range of different validation techniques to establish the credibility of the simulation model. The complex socio-technical nature of the investigated real-world IS phenomena augments the difficulties that researchers face in following the emergent process by which results are obtained in agent-based simulation experiments. Consequently, a single paper generally reports the use of multiple, complementary validation procedures, for example data-driven techniques (e.g., historical data validation, variability-sensitivity analysis) as well as established knowledge (e.g., comparison to other models, face validity) and graphical support tools (e.g., operational graphics, traces). The same can be observed, to a lesser extent, for analytical and stochastic process simulations that model complex socio-technical interactions over time (Finding 6 in Table 4).

Fig. 4 Employed validation techniques by simulation technique

Again, we note that most system dynamics models in our sample adapt or combine existing models in their construction, which is reflected in the high percentage for comparison to other models in Fig. 4. The credibility of the extant models is then used to argue for modeling choices in a new simulation model (Finding 7 in Table 4). Regarding genetic algorithms, we find a surprisingly high percentage of publications that employ traces, i.e., track the behavior of specific objects during a simulation experiment. In these cases, researchers often follow the mutations and the development of a successful candidate solution and then use these observations to argue that the simulation model behaves as intended (Finding 8 in Table 4). Finally, simulations that rely on artificial neural networks commonly employ historical data (see Fig. 4) to establish epistemic credibility in the simulation model. These papers essentially perform some type of cross-validation, in which a dataset is split into training and validation parts. If the dataset is sufficiently large and of sufficient quality, this often suffices to suggest that the simulation model behaves as intended, requiring no further validation efforts (Finding 9 in Table 4).
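A minimal sketch of this validation pattern follows, with a trivial baseline standing in for a fitted artificial neural network and invented records standing in for historical data:

```python
import random

def train_validation_split(records, validation_share=0.2, seed=7):
    """Split historical data into training and validation parts."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_share))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical historical records: (input features, observed label).
records = [((i % 3, i % 5), i % 2) for i in range(100)]
train, validation = train_validation_split(records)

# Stand-in for a fitted artificial neural network: a baseline that
# always predicts the majority label observed during training.
labels = [label for _, label in train]
majority = max(set(labels), key=labels.count)

hits = sum(majority == label for _, label in validation)
print(f"validation accuracy: {hits / len(validation):.2f}")
```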

Figure 5 shows the percentage of papers employing a simulation technique for a specific use. More direct simulation techniques (stochastic processes, analytical) are often used to conduct proofs, in the sense that the simulation is employed to show that the modeled processes can produce a certain type of behavior (Harrison et al. 2007). Similarly, such simulation techniques can be employed to critique extant theoretical explanations of the observed phenomena, for example by describing more accurate or more parsimonious models. These simulation uses generally require a very precise mathematical description of the observed IS phenomena at the outset of the study (Finding 10 in Table 4).

Fig. 5 Simulation use by simulation technique

Similarly, genetic algorithms can be employed to demonstrate that certain adaptation strategies work (proof) or are better than previously suggested strategies (critique). Nevertheless, genetic algorithms are predominantly used to provide empirical guidance by tracing the development of a successful candidate solution to uncover the factors that determined its success. This technique is also frequently used to suggest more efficient designs of IS (prescription), generated through genetic mutations (Finding 11 in Table 4).

Artificial neural networks are mostly used to predict one or more output variables based on a set of multiple input variables. Along the same line, this simulation technique is frequently used to provide empirical guidance by testing multiple configurations of input variables and using the results to identify the most important predictors for the studied phenomena. Another common use is the application of artificial neural networks in the design of specific IS (i.e., the simulation use prescription) that manage certain task activities more efficiently (Finding 12 in Table 4). For example, Kim and Street (2004) develop an artificial neural network that prescribes optimal customer targets for marketing.

In contrast, agent-based simulations are generally employed to discover unexpected consequences of the modeled interactions, and to explain the processes that produce the observed behavior. For example, Johnson et al. (2014) use simulation to study emergence mechanisms of social networks. The researchers start with several highly stylized descriptions of such mechanisms (e.g., social network structure emerges only through the rule of preferential attachment), which by themselves do not adequately represent the complex socio-technical interactions involved in the observed phenomena. Consequently, the researchers propose a blended, multi-theoretic model that better captures observed distributions in online networks (Finding 13 in Table 4).

5 Discussion

We now reflect on our findings from the literature review (see Table 4) in light of the preceding discussion of the epistemic particularities of simulation-based research in IS. We follow the structure of the preceding section (real world to simulation world, simulation world, and simulation world to real world). The goal is to point out the choices and consequences that researchers face, and to evaluate from an epistemic perspective the actual decisions taken in extant research.

Real world to simulation world Our review reveals that scholars increasingly refer to extant system theories in simulation-based research (see Fig. 2) to support partially creative and intuitive modeling choices in the development of the conceptual model (Finding 1) (Winsberg 1999; Frank and Troitzsch 2005). Researchers generally opt for one of the following two choices for employing theories: (i) a rather comprehensive multi-theoretic foundation or (ii) a parsimonious employment of system theories. The former, multi-theoretic approach to simulation model development is particularly used in agent-based and complex analytical simulations for discovery and explanation of the respective phenomenon of interest (Findings 2 and 13). In contrast, the latter approach – to employ system theories parsimoniously – is common for artificial neural networks and genetic algorithms that are used for prediction and prescription. Such simulations usually rely on a single specific theory to support the development of the simulation model (Findings 3, 11, and 12).

To better understand this division in the observed simulation-based IS studies, we analyze it in light of the dichotomy between inductive and deductive approaches to research (Johnson 1996). In our context, the epistemic credit of a simulation model may be established deductively through extant theory or inductively through empirical data (Winsberg 1999; Sargent 2005; Davis et al. 2007). Some simulation techniques, for instance agent-based and analytical simulations, facilitate mechanism-based descriptions grounded in theory, since they make it easy to combine different theories to describe different parts of the simulation model (Curşeu 2006; Davis et al. 2007; Hedström and Ylikoski 2010). Consequently, such simulation-based studies purposefully combine complementary theories, i.e., a synergistic combination of theories that aims at a comprehensive analysis of the phenomenon of interest (Tiwana and Bush 2007). This is particularly important for agent-based, system dynamics, and complex analytical simulations that are used to study emergent phenomena. In such cases, the relation between minor deviations in the simulation model and resultant changes in simulation outputs is difficult to capture with standard statistical techniques (Hedström and Ylikoski 2010; Grüne-Yanoff and Weirich 2010; Houy et al. 2012). Thus, instead of such variation-based approaches, these studies often employ a combination of theories to describe individual components of a system (Woodard and Clemons 2014). The use of complementary theories helps scholars to justify modeling choices and also facilitates the interpretation of newly discovered insights in the simulation results (Morgan and Morrison 1999; Woodard and Clemons 2014; Bichler et al. 2016). An example of such research is the work of Nan (2011), in which several theories of IT use (e.g., the structurational theory of technology and technology acceptance models) are combined with a complex adaptive systems approach to develop an agent-based simulation model. Nan (2011) uses system theories to justify a wide range of modeling choices, ranging from models of the mental activities of employees to organizational structures and environmental factors.

Conversely, other simulation techniques, such as artificial neural networks, remain epistemically opaque in their operation but allow researchers to easily verify the simulation results against large sets of empirical data (Sargent 2005; Grüne-Yanoff and Weirich 2010). In these cases, researchers only need to hypothesize the investigated relationships a priori; they do not need to detail and argue for the internal mechanisms that cause the observed effects. Instead, the simulation gains epistemic credibility inductively by validating the simulation results against empirical data. As an example, Rivkin (2001) investigates the optimal level of strategic complexity in organizations by using a single, highly stylized fact. This stylized fact is based on case studies and prior theoretical work that links strategic complexity with a performance measure. This essentially follows the recommendation of Davis et al. (2007) to start with simple theory (i.e., a game-theoretic description of strategic maneuvers) that addresses the phenomenon of interest and then directly translate this theory into a computational representation for a suitable simulation technique. Researchers may consult Bichler et al. (2016), who elaborate on the development of stylized facts based on existing system theories, as well as Houy et al. (2015), who describe how stylized facts can be derived from literature and data.

Simulation world Figure 3 shows that there are significant differences regarding the coverage of socio-technical components across simulation techniques (Findings 4 and 5). Some simulation techniques, such as agent-based simulation or system dynamics, rely on a description of interaction patterns at a local level to model system behavior (Bonabeau 2002). For these simulation techniques, we generally find a rather comprehensive coverage of socio-technical IS components in the simulation models. On the other hand, if a simulation technique relies on grounding the simulation model in a theoretically derived, abstract, and formal mathematical model, a more selective coverage of socio-technical components may be better suited. Such a selective coverage facilitates the verification of the implemented simulation model and the subsequent validation and interpretation of simulation results (Adner et al. 2009). Thus, artificial neural networks and, to a lesser extent, analytical simulations, genetic algorithms, and stochastic processes may employ simulation models that cover only a limited or minimal number of socio-technical components.

This insight is in line with existing discourses on the creation of conceptual models: ontologically adequate conceptual models that provide mechanism-based explanations of emergent phenomena often require researchers to hypothesize and detail hidden, but nevertheless relevant, causal mechanisms in simulation models (Frank 2011; Frank et al. 2014). Such conceptual models need to consider all potentially relevant aspects of complex systems, including behaviors and co-evolutionary structures (McKelvey 2002; Houy et al. 2012), which holds particularly for simulation techniques that are commonly used to study emergent socio-technical phenomena.

For example, Nan (2011) covers socio-technical system components to a wide extent in the developed simulation model – comprising employees, tasks, information technology artifacts, organizational structures, and environmental factors – which is then used to experiment and to discover new explanations for the investigated phenomenon. In contrast, simulation techniques that remain epistemically opaque in their operation, for instance artificial neural networks, hide these details in the simulation model. Researchers may consult extant literature on socio-technical IS modeling (e.g., Lyytinen and Newman 2008; Bednar and Sadok 2015; Wu et al. 2015; Beese et al. 2015) and IS modeling in general (e.g., Houy et al. 2012; Frank et al. 2014; Frank 2014) to guide their simulation endeavors.

Simulation world to real world The adequacy of a validation strategy depends on the specific simulation technique, the intended simulation use, the characteristics of the empirical target, and the corresponding epistemic challenges (Boero and Squazzoni 2005). In general, it is recommended to combine internal verification with empirical validation in a circular process (Sargent 2005). This helps scholars not only to iteratively develop and test the simulation model, but also to validate the obtained simulation results (Axtell et al. 1996; Edmonds and Hales 2003; Boero and Squazzoni 2005; Burton and Obel 2011). Literature on simulation model validation generally suggests the use of multiple validation techniques to build confidence in the connection of the simulation model to the underlying real-world system (Sargent 2005; Davis et al. 2007; Harrison et al. 2007). This is in line with the results of our literature review in Fig. 4, which shows that different simulation techniques rely on different validation strategies that generally comprise multiple, distinct validation techniques (Findings 6, 7, 8, and 9). More precisely, we find that artificial neural networks rely on the availability of large datasets and that system dynamics models rely on adapting and combining existing models. Other simulation techniques, for example agent-based simulations, rely on several complementary validation techniques as well as on the available empirical data.
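
The following minimal Python sketch illustrates such a circular process under strongly simplifying assumptions: a toy growth model is repeatedly run, checked against an internal invariant (verification), compared with an assumed empirical summary statistic (validation), and recalibrated. The model, the reference value, and the tolerance are all hypothetical placeholders.

import random
import statistics

EMPIRICAL_MEAN = 42.0   # assumed summary statistic of the real-world target

def run_model(growth_rate, steps=100):
    level = 1.0
    trajectory = []
    for _ in range(steps):
        level += growth_rate * level * random.uniform(0.9, 1.1)
        trajectory.append(level)
    return trajectory

growth_rate = 0.01
for iteration in range(20):
    output = run_model(growth_rate)
    # Verification: the implementation must respect a known invariant.
    assert all(x > 0 for x in output), "implementation error: negative levels"
    # Validation: compare simulated output with the empirical reference.
    gap = statistics.mean(output) - EMPIRICAL_MEAN
    if abs(gap) < 1.0:
        break
    # Revise the model (here: a single parameter) and iterate.
    growth_rate *= 0.9 if gap > 0 else 1.1
print("calibrated growth_rate =", round(growth_rate, 4))

Real studies would, of course, verify more than one invariant and validate against richer empirical patterns, but the circular interleaving of development, verification, and validation remains the same.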

Especially in the context of complex and dynamic socio-technical IS phenomena, the extent to which the simulation model accurately predicts and captures the essential behavior of the real-world system is often unclear. Some simulation approaches, such as artificial neural networks (see, e.g., Olson et al. 2012; Wang and Chuang 2016), offer a straightforward way towards inductively establishing epistemic credibility in the simulation model. To validate artificial neural networks, researchers may thus refer to established guidelines, for example consulting Arlot and Celisse (2010) on cross-validation procedures. In contrast, epistemic opacity (i.e., the inability of researchers to analytically trace how simulation results are obtained) poses notable difficulties for simulation techniques that aim at the discovery of novel phenomena and unexpected consequences (Harrison et al. 2007). Consequently, such simulation-based research additionally relies on deductive reasoning to argue for the epistemic credit of the simulation.
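
As an illustration of such an inductive validation strategy, the following sketch applies k-fold cross-validation (cf. Arlot and Celisse 2010) to a small neural network using scikit-learn; the synthetic data merely stand in for the empirical dataset a real study would use.

import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))   # stand-in for empirical predictors
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=500)

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
scores = cross_val_score(model, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
# Epistemic credit rests on out-of-sample fit, not on tracing internal mechanisms.
print("mean out-of-sample R^2:", scores.mean())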

Interpreting novel and unexpected simulation results requires a precise understanding of the dynamic processes in the simulation model (Burton and Obel 2011). These processes are often reflected in the action-formation and transformational mechanisms of the conceptual model, such as organizational and individual learning, decision-making, and imitation (Coleman 1994; Burton and Obel 2011). Issues in understanding emergent results of simulation experiments are related to existing discussions on computational models that go “beyond what is to explore possibilities and examine boundaries to what might be” (Burton and Obel 2011, p. 1197), essentially combining the issues arising from equifinality and the opacity of simulation experiments. If multiple sets of assumptions may have led to the same outcome (equifinality), how can one argue that the observed outcome justifies the assumptions? And since humans are unable to follow all calculations in a simulation experiment in detail (opacity), such studies require a solid theoretical grounding and should employ multiple validation techniques – essentially combining both inductive and deductive reasoning – to establish credibility in the simulation results (Winsberg 2001).

Simulation uses In line with the work of Harrison et al. (2007) on simulation modeling in organizational research, we find that researchers in IS employ simulations for a variety of purposes, for which different simulation techniques may be suitable (Findings 10, 11, 12, and 13). Whereas stochastic processes and analytical simulations are frequently used to critique or prove, agent-based simulations generally aim at the discovery of new phenomena or at explaining previously made observations. Artificial neural networks and genetic algorithms, in turn, are frequently employed in the design of IS, fulfilling a more prescriptive purpose.

Considering the earlier discussion on epistemic inferences in simulation-based research, we find a fundamental difference between using simulation to predict empirical observations (step 5 in Fig. 1) and using simulation to hypothesize (step 6 in Fig. 1). This difference is reflected in the choice of simulation technique and consequently in the design of the simulation model. For example, criticizing extant theory or proving novel theory necessitates coherent and intelligible inferences in modeling decisions. This is complicated mostly by epistemic issues related to the predictive precision of a simulation model: due to highly stylized abstractions and simplifications, the simulation model does not necessarily show a precise correspondence with empirical data. Thus, instead of claiming to create indubitable proof, many simulation experiments aim to create statistical estimates of phenomena or observations that suggest the presence of certain conceptual relations, captured, for example, through stylized facts (Küppers and Lenhard 2005). Simulations that aim to predict or prescribe therefore often focus on their utility rather than on their correspondence with reality, and are mainly conducted through artificial neural networks and genetic algorithms.

In contrast, epistemically opaque simulations offer the potential to study unexpected emergent phenomena that are not inherently obvious in the design of the simulation model. For example, agent-based simulations generally aim to explore or discover new phenomena in complex socio-technical IS. Researchers usually test how different initial conditions and different processes lead to specific results in simulation experiments (Epstein 1999; Davis et al. 2007; Harrison et al. 2007; Weinhardt and Vancouver 2012; Poile and Safayeni 2016). Due to issues related to equifinality and epistemic opacity, these simulations need to be designed in a way that allows theoretical constructs to be traced back to the underlying modeling choices.
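
A minimal sketch of such a structured experiment is given below: a toy diffusion model is run across a small factorial design of initial conditions and process parameters, with replications per cell, so that outcome differences can be traced back to specific modeling choices. The model and all parameter values are hypothetical.

import itertools
import random
import statistics

def simulate(initial_adopters, imitation, seed, n=100, steps=50):
    rng = random.Random(seed)
    adopters = initial_adopters
    for _ in range(steps):
        p = 0.01 + imitation * adopters / n   # innovation plus imitation pressure
        adopters += sum(rng.random() < p for _ in range(n - adopters))
    return adopters

results = {}
for init, imit in itertools.product([1, 5, 10], [0.1, 0.3]):       # factorial design
    outcomes = [simulate(init, imit, seed) for seed in range(30)]  # 30 replications
    results[(init, imit)] = statistics.mean(outcomes)

for cell, mean_adopters in sorted(results.items()):
    print(cell, "->", round(mean_adopters, 1))

Replicating each design cell with independent random seeds separates systematic effects of the varied conditions from stochastic noise, which is what makes emergent results attributable to specific assumptions despite the opacity of any single run.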

For example, Hua et al. (2011) propose an agent-based simulation model that investigates how combinations of operational decisions propagate through complex supply chain networks and ultimately lead to the bankruptcy of organizations. To obtain their results, they conduct a series of highly structured simulation experiments that allow them to distinguish important decisions from unimportant or unrelated factors (Hua et al. 2011). Scholars interested in building such simulations may rely on a variety of extant knowledge, including, for example, research on visualization techniques (Zhuge 2006; Trier 2008) or on multilevel modeling and analysis (Bélanger et al. 2014; Frank 2014).

6 Conclusion

This study starts with the premise that although the presence of simulation-based research in the IS discipline is relatively low, it has recently started to gain recognition within the IS community. This motivates us to consider the particularities both of IS, as a multidisciplinary research field, and of simulation, as a third way of doing science alongside theoretical and empirical analyses. We first discuss the complex socio-technical nature of IS phenomena, which needs to be considered in simulation-based research, and the epistemic implications of using simulation-based research approaches. Building on these discussions, we derive an analysis framework to investigate the status quo of simulation-based research in IS. We finally synthesize the extracted findings on the current use of simulation in IS research and elaborate on them with regard to ongoing discussions on the epistemic particularities of simulation-based research. In doing so, we aim not only to consolidate existing implicit knowledge about the use of simulation in IS, but also to guide prospective simulation-based IS research.

Accordingly, we briefly summarize the key insights from this study. First, the use of theoretical lenses (both theories frequently used in IS and native IS theories) is recommended to develop theory-informed simulation models. Researchers face a choice between a comprehensive multi-theoretic foundation for simulation model development and a more parsimonious use of theory. The multi-theoretic approach is particularly well suited for deductively establishing epistemic credibility in complex simulation models, whereas the parsimonious approach generally relies on inductive arguments based on empirical observations. Second, some simulation techniques require a comprehensive coverage of system components to account for the socio-technical nature of IS phenomena, while other simulation techniques can more easily isolate and focus on specific components. Simulations that aim to provide mechanism-based explanations are required to detail hidden causal mechanisms in the simulation models. In contrast, simulations that remain epistemically opaque in their operation, e.g., artificial neural networks, allow researchers to hide these details. Third, IS scholars should consciously and purposefully employ different validation techniques to ensure the reliability and validity of the obtained simulation results. We find that a purely inductive or purely deductive approach is only rarely possible. Instead, simulations generally must be validated using multiple, complementary deductive and inductive techniques, which counteract the loss of epistemic credit due to the different inferences made in the simulation process. Fourth, the choice of simulation techniques by IS scholars should be in line with the intended type of theorizing in the given study. Criticizing extant theory or proving novel theory necessitates coherent and intelligible inferences in modeling decisions (e.g., in analytical simulation models). In contrast, emergent and epistemically opaque simulations (e.g., agent-based simulations) offer the potential to discover new phenomena and explore the underlying mechanisms.

Finally, we want to discuss two important limitations of this research. First, and most notably, owing to the breadth of the investigated subject, this research is inevitably selective in the related topics it discusses and limited in the depth of its discussions. For each single simulation technique, there exists a wealth of published knowledge, and we often only scratch the surface in this paper. While we try to provide pointers to further information for readers, well-versed experts in a specific simulation technique will most likely have a deeper understanding of the corresponding intricacies than we present in this analysis.

Furthermore, a literature review is methodologically limited to an investigation of the status quo of a phenomenon, since it only considers previously published research. However, analyzing and critically examining the results of our literature review in the context of ongoing discussions in the philosophy of science allows us not only to summarize prior research, but also to critically examine its contributions and to provide additional explanations for the observed patterns in the reviewed papers (Rowe 2014). Consequently, by studying a representative sample of simulation-based research in IS, we can draw on the tacit knowledge of experienced simulation researchers to facilitate future simulation-based IS studies by less experienced scholars.