Introduction

Policymakers have identified science, technology and innovation (STI) as the most important policy targets for the future of our societies (OECD 2009a, b; European Commission 2002, 2008). Related STI Policy frameworks, such as the funding programme of the European Commission, are of ever-increasing importance: “Horizon 2020 is the biggest EU Research and Innovation programme ever with nearly €80 billion of funding available over 7 years (2014 to 2020)—in addition to the private investment that this money will attract. It promises more breakthroughs, discoveries and world-firsts by taking great ideas from the lab to the market. (…) Seen as a means to drive economic growth and create jobs, Horizon 2020 has the political backing of Europe’s leaders and the Members of the European Parliament. They agreed that research is an investment in our future and so put it at the heart of the EU’s blueprint for smart, sustainable and inclusive growth and jobs. By coupling research and innovation, Horizon 2020 is helping to achieve this with its emphasis on excellent science, industrial leadership and tackling societal challenges. The goal is to ensure Europe produces world-class science, removes barriers to innovation” (29 DEC 2015; https://ec.europa.eu/programmes/horizon2020/en/what-horizon-2020).

This contribution starts from the challenges of complexity, uncertainty, and agency, which rule out reliable prediction of social systems, especially where new knowledge (scientific discoveries, emergent technologies, and disruptive innovations) acts as a radical game-changer. Next, it will introduce requirements for STI Policy agendas stemming from current network modes of knowledge production, which decisively shape the need for scientific policy advice in this field. It will then show how experimental methods of agent-based simulation can address the identified requirements for policy advice, with reference to simulation-based policy advice projects in the STI field using the agent-based simulation platform SKIN. The article finishes with a discussion of validation and quality assessment issues for social simulation models, and of the applicability and limitations of computational experiments in the social sciences.

Science, technology and innovation for society

The role of knowledge, and of STI, for modern economies is confirmed by income distributions and the share of knowledge-intensive industries in different world regions. The correlation is significant: high-tech regions coincide with high-income regions (cf. Krueger et al. 2004).

“R&D expenditures and intensity have been found to have a significant effect on per capita GDP growth” (OECD 2009b: 5). The extensive evidence for this correlation has long been monitored and documented in much detail by international and national institutions (e.g. OECD 2009a, b; European Commission 2002, 2008). However, these analyses provide evidence using correlations from econometric data. They do not tell us much about causal chains and mechanisms, about the traceable line from investment to result. Empirical evidence proving a direct and immediate profitability of STI investment is scarce. There are even studies suggesting that R&D intensity is negatively associated with innovation and economic growth (Jordan and O’Leary 2007).

Especially in a time of diminished public resources and difficult capital markets, this scarcity of evidence is hard to accept. The strong need to justify public and private investments produces a tendency in STI Policy, business management, and public discourse to expect that current investments in R&D, higher education institutions, science-industry networks etc. will immediately produce a flow of products and processes with high commercial returns. The requirement is to see value for money, and ultimately money for money: if there is a considerable investment as input, there must be a considerable, beneficial, and short-term output that can be traced directly to this input.

This expectation still feeds on one of the first conceptual policy frameworks, the so-called linear model of innovation, which was fundamental to post-war innovation policy. It assumed that innovation—as if through an input–output pipe—could be triggered directly by investing in basic scientific research, which would immediately be followed by applied research and technology development, and would end with production and diffusion bringing products and services to the market (cf. Bush 1945). However, STI policymakers who had put large amounts of money into the R&D end of this pipe and sat at the output end waiting for the benefits were all too often disappointed. Due to this situation of “market failure”, the linear model and theoretical frameworks favoring it were heavily criticized, e.g. in discussions of the principal-agent theory of policymaking (van der Meulen 1998; Kassim 2003), or in promoting the garbage-can model of policymaking (Mucciaroni 1992). Although the linear model still haunts practitioner expectations and public discussions, STI policymakers have long since favored the so-called “neo-liberal model”, which better accommodates issues of open innovation (Chesbrough 2003), innovation networks, etc.

The linear model of innovation, that assumes that research leads directly to innovation, has proved to be insufficient to explain innovation performance and to design appropriate innovation policy responses. (European Parliament 2006: 18).

On the practitioner side and in the critical public mind, however, the disappointments and legitimacy problems arising from missing outputs were considerable and showed the limits of steering, control, and policy functions. Where this does not amount to a fundamental apprehension against the importance of knowledge and innovation (Jordan and O’Leary 2007), responsible innovation managers report frustration with the messy and complicated features of the innovation process, which simply ‘does not seem to compute’.

Knowledge has always been a challenge to economic growth theory (cf. Hanusch and Pyka 2010). At first it figured as a residual variable alongside labor and capital (Solow 1956, 1957); later, “knowledge and innovation” advanced to a productivity factor in its own right, e.g. in the New Growth Theory of Paul Romer, which was used by the OECD in its famous 1996 paper on the knowledge-based economy (OECD 1996). In New Growth Theory (e.g. Romer 1990; Grossman and Helpman 1991), the continuously increasing factor of human capital, i.e. the sum of all technological capabilities of human beings in the production process, secured the usage of capital with constant marginal productivity, leading to limitless and continuous growth. This framework would have offered the best fit to the expectations mentioned: we invest in technology, research, and learning, and we will get direct and ever-increasing economic returns. However, empirical economic research quickly falsified the general applicability of framing the relation between technological innovation and economic growth in this way.

The world model of the Club of Rome (Meadows et al. 1972) allowed five variables to grow exponentially (population, industrialization, pollution, food production, and resource depletion), while the ability of technology to increase resource availability was assumed to grow only linearly, discretely, and incrementally. Although some have since argued the opposite (cf. Turner 2008; Nørgård et al. 2010), critics generally agreed that the model had been proven wrong historically, did not take in enough variables and feedbacks, relied on simplistic dynamics, was based on limited data, and generally failed in prediction (cf. Sandbach 1978; Hayes 2012). One of the strongest critics was Nobel Prize winner Robert Solow, who argued that the role of new knowledge and technology was seriously under-estimated (Newsweek, March 13, 1972, p. 103).

In empirical reality, growth processes are never continuous. They are specific to technologies and sectors, showing multiple layers of small cycles; they stagnate, slow down, are characterized by time-delays, break up, and go on—sometimes incrementally, sometimes in radical jumps. Neo-Schumpeterian approaches in economics concluded: if we are interested in this fine granularity of growth processes, we have to look deep into the real dynamics of innovation, i.e. at the micro level, because the success and failure of empirical innovation processes determine the movements in productivity (Nelson and Winter 1982). Accordingly, economic growth can be observed on the macro level, but an explanation for growth cannot be found there.

This quick look into the recent history of growth theory already gives reason to doubt the expectation of immediate economic returns on R&D investments. Castells (2000) took up the analogous discussion of the economic profitability of the ICT revolution and elaborated on the reasons why there is no direct input–output relation:

  (a) we have to account for the lag effects—knowledge and new technology need quite some time to enter the market and to diffuse widely;

  (b) we have to account for serious productivity measurement problems, especially where the service sector is concerned. “The focus on non-technological innovation has been most prominent in the services sector, which now accounts for more than 70 % of GDP in OECD countries. Indeed, empirical evidence shows that innovation in this sector takes different forms than in the manufacturing sector. Services firms innovate through informal R&D, the purchasing and application of existing technologies, as well as the introduction of new business models. There is a growing recognition that innovation encompasses a wide range of intangible activities, in addition to R&D. Efforts to improve measures of such innovative activity, or show that R&D needs to be supported by a complementary range of other investments, are still underway” (OECD 2009b: 7). The same holds for qualitative improvements of technologies as in most cases it is not the amount of output produced that increases but the technical features improve, thus offering higher quality;

  (c) we have to account for sector-specific productivity—aggregated productivity figures might not tell the true story.

It is important to look at the reasons why empirical growth rates depart from a linear relationship with R&D investment and what the consequences are for STI Policy and innovation management. Refuting a simple causal connection between innovation and productivity does not imply that there is none at all. We just have to take up the challenge to investigate it for what it is: a complex empirical phenomenon. The task is to enter the turbulent layers of small innovation cycles and the innovation dynamics of innovation networks.

Following insights from innovation economics and economic sociology (cf. Ahrweiler 2010), it would indeed be surprising to see immediate and easily measurable output following any improvements in the area of knowledge, research, and learning. Although growth can be observed on the systems level, it cannot be explained or controlled on the systems level. We have to investigate the non-linearities and path-dependencies of sector-specific productivity located in institutional contexts (cf. Saviotti 2010). Geography, too, exerts a strong influence (cf. Cooke, Heidenreich and Braczyk 2004; Ebersberger and Becke 2010). Ultimately, growth as a system-level phenomenon is produced by a complex interaction pattern on the micro level of innovative actors in networks (cf. Allen et al. 2010). This is why we have to investigate the role of collaborative arrangements in innovation.

For business innovation management this means difficult decisions: the true uncertainty (Knight 1921) of knowledge availability, access, and transfer, of technology absorption, of financial risk, of regulatory barriers, institutional impediments, of market access, and profitability counteracts all predictability (Pyka and Ahrweiler 2008). STI Policy needs to accept and handle the complex features of innovation (cf. Rossi et al. 2010). This implies resisting the temptation of false expectations concerning short-term economic rewards for R&D investment. “Governments (…) need to focus on medium to long-term actions to strengthen innovation. A broad range of policy reforms will be needed in OECD economies and non-OECD economies to respond to the changing nature of the innovation process and strengthen innovation performance to foster sustainable growth and address key global challenges.” (OECD 2009a: 12).

It is understandable that the linear model still influences the expectation structures of STI policymakers, business managers, and the public. Simple messages about causes and effects always go down well where there is a need for control. However, in this case, it has been made very clear over the past decades that “they do not compute” (Buchanan: “This Economy Does Not Compute”, New York Times, 1 Oct 2008: A29). The task at hand is to develop a complexity-adapted way to support, on the one hand, STI Policy design and analysis (cf. Squazzoni and Boero 2010), and, on the other, to understand and analyze the self-organizing coordination mechanisms, which arise in and between participating innovative actors in R&D networks.

Planning and prediction as a policy challenge

There is a tension in current policymaking between the obvious necessities of planning and the opacity of its impacts. Planning is the process of contemplating and organizing the set of activities and measures required to achieve a desired goal. Creating and following a plan implies scenario analysis, i.e. forecasting the most likely developments and preparing for challenges and conditions that could potentially arise. Planning has obvious advantages, and they all center on the possibility of asking “what if”-questions for evaluating different scenarios and using predictive information before implementing decisions about activities which will affect the future in the empirical world.

Implementing measures realizing planned objectives is usually risky: there is no linearity between the suggested measure and its desired effect. Analytical approaches which attempt to offer guidance and support have to acknowledge that any forecasts and predictions on planning success or failure are difficult if not impossible. The real-world implementation of planning measures can also turn out to be expensive. If the implemented measure is a failure, it will not only have occasioned production and roll-out costs, it might even prove to be harmful and lead to some un-intended, very costly side-effects. Last but not least, the time and efforts wasted on the failure might have been better used for a more appropriate set of activities. Furthermore, objectives can change or disappear in the middle of the plan implementation process, and quick and intelligent response is needed.

Even in the highly-controlled settings of game theory, the future is unpredictable and cannot be planned accurately. A famous example is the so-called El Farol problem put forward by mathematician Brian Arthur (Arthur 1994). As the story goes, there are 100 jazz fans in Santa Fe, New Mexico, who like to visit the Thursday jazz concerts at the El Farol Bar in town, where enjoying yourself becomes impossible when there are more than 60 people in the room. Although the attendance numbers of each concert are published in the weekly newspaper, the jazz lovers use their own prediction rules for estimating the visitor numbers of the next concert (such as “same as last week” or “half as many as last week”), which they update according to their reliability. Whenever the best prediction rule forecasts a visitor number above 60, a fan will stay at home. The problem is that the other people are clever as well: if one person forecasts a crowded bar, others may arrive at a number well above 60 too and stay at home accordingly, so that person could actually have gone. However, when everyone uses the same logic (“I think that you think that I think…”) and the same strategy, the result is an unhappy gathering of everybody.

In other words, we create and change the world we want to predict. There is no analytical solution to the problem that would allow us to plan. However, if we simulate the number of weekly guests as a global property emerging from the individual decisions of the jazz fans, we get something that looks like a random process fluctuating around 60. Yet it is not a random process at all: the number of weekly guests is a completely deterministic function of the individual predictions, which are themselves deterministic functions of past guest numbers.
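To make this determinism concrete, a minimal sketch of an El Farol simulation follows, built from the description above (100 fans, a threshold of 60, competing prediction rules updated by their reliability). The specific rule pool, the memory length, and the random seeding of the first week are illustrative assumptions, not Arthur's original specification.

```python
import random

THRESHOLD = 60     # the bar is enjoyable only if at most 60 people show up
N_FANS = 100
MEMORY = 5         # how many past weeks a rule may look at (illustrative assumption)

# A small pool of prediction rules: each maps the attendance history to a forecast.
RULES = [
    lambda h: h[-1],                                   # "same as last week"
    lambda h: h[-1] // 2,                              # "half as many as last week"
    lambda h: sum(h[-MEMORY:]) // len(h[-MEMORY:]),    # average of recent weeks
    lambda h: 100 - h[-1],                             # "mirror of last week"
]

class JazzFan:
    def __init__(self):
        # each fan holds a personal subset of predictors and tracks their past errors
        self.rules = random.sample(RULES, k=2)
        self.errors = [0] * len(self.rules)

    def decide(self, history):
        best = min(range(len(self.rules)), key=lambda i: self.errors[i])
        return self.rules[best](history) <= THRESHOLD  # go only if the forecast is uncrowded

    def update(self, history, actual):
        for i, rule in enumerate(self.rules):
            self.errors[i] += abs(rule(history) - actual)

fans = [JazzFan() for _ in range(N_FANS)]
history = [random.randint(0, 100)]   # seed week; after this, everything is deterministic
for week in range(52):
    attendance = sum(fan.decide(history) for fan in fans)
    for fan in fans:
        fan.update(history, attendance)
    history.append(attendance)
    print(f"week {week:2d}: {attendance} guests")
```

After the random initialisation, each week's attendance is a deterministic function of the shared history, yet the resulting series is hard to forecast for any individual fan, which is exactly the point of the example.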

The bad news is that the El Farol problem is a tiny problem. It deals with a limited number of homogeneous individuals featuring a limited number of attributes (loving jazz music, hating crowds), taking a limited number of possible states (in the bar, at home), and following a limited number of rules (prediction rules) in a stable context. El Farol is a tiny, closed, completely deterministic world to operate in. Nevertheless, planning and prediction in any analytical sense are impossible.

Enter “the real world” we have to deal with: heterogeneous individuals with many attributes and properties in permanently changing contexts, displaying a multitude of behaviors—interacting, learning, creating, anticipating, changing their minds, adapting, forgetting, ignoring, experimenting, testing, choosing etc. Above all, those individuals are also inventing, creating new knowledge, developing new technologies, and innovating—to solve problems, to surprise somebody, to make life better, or for whatever reasons they imagine.

How about planning and prediction here? According to popular definitions, planning requires some sort of prediction: it requires knowledge about the future. In the El Farol world, we cannot even plan our next visit to the bar because of severe problems in forecasting guest numbers, let alone prepare for whatever may happen.

Are we in any better situation because we have scientists, professional planners, policy makers, future analysts, and suchlike people to provide forecasts for whole societies, especially with reference to new technology? And would it be a good idea to abstract from interacting individuals but go for statistics, aggregate variables, and correlations on the macro level for modeling?

It is sometimes said that small time horizons and/or relying on associative and narrative knowledge would offer a bit of leeway to say something about the future. However, how should that work? Surprises can arise at any time. Why should associative people telling stories have better access to anything than policy analysts? There is no really convincing reason why this sort of planning escapism could work.

The general verdict says we cannot predict and plan due to a long list of features such as:

  • complexity,

  • emergence,

  • surprise,

  • self-reference,

  • choice,

  • long causal chains,

  • un-intended effects,

  • multi-level feedbacks,

  • high contingency/ambiguity,

  • randomness,

  • deciding in turbulent environments based on uncertainty and incompleteness of knowledge,

  • no central definition and control of objectives, desired futures, and strategies,

  • and so on (no claim for completeness).

Each of these features per se—but, of course, even more so their combination—prevents any knowledge of the future. Taken seriously, this reduces the options for planning to nil. There is no certainty of prediction. Analytical approaches have to acknowledge that any forecasts and predictions of planning success or failure are difficult if not impossible. Planning is futile.

Policy modelling for complex social systems

Complexity science has provided some new mathematical approaches and tools to challenge our common belief that there is only something to learn about the future when looking at a deterministic system, which would naturally exclude any real-world social phenomenon. Only deterministic systems—this belongs to our usual set of convictions—can be predicted and manipulated.

Complexity science says it is even worse: not even deterministic systems (see the El Farol example above) can be accurately predicted! For complex social phenomena, the situation scales up. Scholars from complexity science (Bar-Yam 1997, 2004; Braha et al. 2008; Casti 1995; Flake 1999; Stewart 1989; Waldrop 1992) locate social processes in turbulent environments with high uncertainty and ambiguity. They assign to social processes characteristics such as multi-scale dynamics with high contingency and non-linearity, emergence, all kinds of feedbacks, pattern formation, path dependency, recursive closure, and self-organization (Frenken 2006; Lane et al. 2009). Scholars such as Brian Arthur (cf. Arthur 1989, 1998), building on mathematical concepts originating from physics and engineering science (Gell-Mann 1994; Kauffman 1993, 1995; Prigogine and Stengers 1984; Holland 1995), impressively demonstrated what this means for the predictability of social dynamics, namely non-predictability. However, their message is that this is not the end of the story. All of these features may prevent us from “solving” a complex social equation analytically, since we know neither the relevant variables (nor could we handle the relevant numbers even if we knew them) nor the applicable functions for calculating a future state; but they do not rule out simulating the phenomenon in question using all the knowledge we have.

In a simulation, we can deal with many variables, many interactions, much feedback, randomness etc. We can mimic what we see as relevant processes on the computer and observe what the model is doing. In a simulation, we can actually exploit the features which have originally prevented us from “solving” the situation analytically, to understand the situation and its “production algorithm” by seeing it at work.

This is especially the case for simulations where we “grow” macro parameters from micro-level dynamics. Here we can implement the action and interaction of actors (agents) on the micro level, observe long causal chains, un-intended effects, and multi-level feedbacks, and see macro parameters emerging from these dynamics. In fact, this addresses the core issue of most social science theories—the micro–macro problem, which in its origins already asked how “social order” emerges from individual behavior (cf. Weber 1921), or—in more modern terms—how macro features arise from micro dynamics (cf. Giddens 1988).

This is where empirical research enters the scene. The El Farol situation cannot be solved analytically, but it can be simulated and understood. We can observe what happens and follow it through—step-by-step and in many different runs—to see the options and possibilities of what can happen and what cannot.

And what if, for example, we make the El Farol simulation a little more realistic using some empirical information? What if we look for the empirical distribution of the relevant cognitive capacities (the prediction rules people actually use) and give it to our 100 jazz fans? This would probably change the fluctuation around 60, which by itself does not help us plan at all, but it does help us decide whether to visit the next concert or not. The more we know empirically about the micro dynamics, the more realistic our computer simulation can be—and the better for our planning purposes.

Would this be the “carte blanche” to announce that prediction and planning is possible for the social realm? Certainly not, but it might be a wake-up call to look again carefully and in detail at the long list of reasons why we cannot predict and plan, and to dis-entangle the set of issues from the general verdict, which says that nothing is possible (planning and prediction) because anything is possible (in terms of future). We also need to re-assess where we are simply doing what we have done before, and where we are doing something different and new. What does it imply to inform our artificial computer worlds populated by agents with insights from empirical research?

In the past decades, the task set of STI Policy has changed considerably. Knowledge has not only been advanced as the central resource for economic growth, creating new jobs and markets; research and innovation are also expected to address societal challenges and to solve societal problems such as climate change, health care, and food and energy supply. STI Policy is supposed not only to ensure the production and availability of the precious resource “knowledge” but also to organize the combination and interaction of the knowledge fields required to address complex questions, and to organize their use so that new knowledge can be translated into action.

Scientific policy advice (cf. Weingart and Lentsch 2009; Wrasai and Swank 2007; Jasanoff 2004; Weaver et al. 2001) directed towards STI Policy is required to provide systematic monitoring and impact assessment of STI contexts. This includes ex-ante evaluation and assessment of potential futures, options, developments, and scenarios for these contexts to inform political debates and decisions, including decisions about a resource that is by definition not yet available: new knowledge, the emergence of the new in research and innovation.

Interestingly, it is exactly this area, which reliably exemplifies the difficulty of modelling the future because the complexity of social reality refutes any “blue-print for social engineering on the grand scale” (Popper 1972: 267), where the call for simulation experiments is getting louder. Simulation studies are tendered by political actors to evaluate ex ante the impacts of STI Policy: “Policy impact simulation: An important goal of evaluation research is to make evaluations relevant to policy options for intervention in RTD and innovation. Evaluations must relate observed parameters and impacts to the characteristics of the intervention. It must be possible to deduce what could and should be changed in the intervention to improve impacts. Accordingly, much more use should be made of ex-ante network analysis to simulate the impacts of intervention policy changes” (European Commission Workshop Report “Using Network Analysis to Assess Systemic Impacts of Research”, March 2009: 18).

The demand for simulation refers to “what if”-questions of policy interventions (ex-ante evaluation), which can only be answered if development scenarios are realistically modelled for experimentally estimating policy options for possible “futures”. This is about identifying potentials, chances, and options, but also about avoiding undesirable developments in terms of an “early warning system”. Policy strategies and the related effects and impacts are subject to experimental testing. Further below, this contribution will show how simulation studies answer these needs.

Social simulation using agent-based modelling

In social science, empirically-based experiments are hardly possible—if they can take place at all, then only in a very limited way. The usual characteristics of experiments are (1) reproducibility, (2) a controlled and thoroughly understood experimental setting, and (3) a controlled and thoroughly understood experimental process. Social science has problems with all three aspects. Such experiments would need to deal with social contexts, i.e. social interactions between people and groups of people. To reproduce an experiment with identical initial conditions, the reproduction would need to happen with exactly the same people, because individuals differ in their socialization and experience backgrounds. This crowd, however, would now have the experience and insights from the first run of the experiment, which would probably change its behavior during the second run and has most definitely already changed the initial conditions.

Social interactions have features which cannot be directly observed, such as expectations, learning, knowledge flows, choice between alternative options of how to act, deciding under uncertainty, etc. Furthermore, interaction processes are non-linear: they are characterized by much feedback (e.g. between micro and macro level), many loops (e.g. in defining and re-defining action contexts), long causal chains, un-intended effects, and self-reference (see above).

These characteristics of the social world generate an infinite excess of possibilities and options (not everything is possible, but too many things are) in processing interaction contexts. This makes planning and prediction rather difficult if not impossible. They also prevent a controlled and thoroughly understood experimental setting, and a controlled and thoroughly understood experimental process, in the social realm. Reproducible experiments cannot be guaranteed at any time. The existence of empirical “un-observables” alone, and even more so our missing understanding of the fundamental processes of interaction contexts, makes this impossible.

Computer simulations share some but not all of the difficulties of empirically-based social experiments. Models and simulations have considerably improved with regard to their capacity to represent complex interaction contexts. They can help to understand their social dynamics and to identify potential access points for intervention on the micro level of actors. Today, we are able to model complex non-linear dynamics; this includes the modeling of possible and likely extrapolations in time horizons and experiments with parameter changes. Simulations representing computational worlds as “artificial societies” (cf. Doran and Gilbert 1994) can now rely on quantitative methods of informing models by huge datasets coming from e-humanities and BigData technologies for representing empirical structures in great detail. Furthermore, agent-based simulations are closely connected to qualitative methods of interpretative sociology. With this support from the empirical research realm, scenario analysis becomes much more than an event corridor between “best case” and “worst case”—simulation experiments rather aim at understanding the micro dynamics on the actor level that lead to observable structure.

Nevertheless, it would be too optimistic to conclude that, because simulation experiments are confined to a closed world—the computer—and are constructed from software programs written by programmers, they fulfill all the requirements for a controlled experimental setting and process listed above. Among other issues, there are computational limitations: “It is impossible to determine whether portions of the code have ever been executed by black-box testing. Code that has not been executed during testing is a sleeping bomb in any software package. Certainly, code that has not been executed has not been tested” (Cole 2000: 23f). Each simulation program contains software bits that have never been subject to any testing and that therefore cannot be claimed to be understood or “under control”.

Simulations are used in many scientific disciplines and cover a wide range of functions. Concerning the latter, the computational representation of real-world systems in order to experiment with parameter variations for predictive purposes about the future behavior of the system is only one of the common functions of computer simulations (cf. Gilbert and Troitzsch 2005). Within scientific disciplines, simulation applications show huge methodological diversity. Just for sociology, Gilbert and Troitzsch list the advantages and limitations of seven popular simulation techniques with current examples in “Simulation for the Social Scientist” (2005). Among them are the well-known equation-based system dynamics models, micro simulations, queuing models from engineering science, cellular automata, and multi-agent systems.

Given the purpose of this contribution, we will only look at the latter in more detail. This type of simulation is used to model complex systems of interacting agents (cf. Epstein and Axtell 1996; Bonabeau 2001; North and Macal 2007; Macal and North 2009). Each agent of an agent-based model (ABM) is an independent, autonomous computer program with properties (variables) and behaviors (algorithms, “rules”). In multi-agent systems the agent programs interact with each other and with an environment implemented in the system. An “agent” can be anything with “agency” (having properties as a unit and having behavior): a human being, collective actors such as organizations, households or states, but also other objects such as cars as agents in traffic simulation.

Using ABMs, we can relate the dynamic behavior and the structure of a system to the properties and behaviors of individual agents and their interaction. This type of modelling is especially appropriate where the mutual responsiveness between the micro behaviors of agents and the macro behavior of the system is under investigation. Here, it is possible to trace system behavior back to the combination of individual actions and decisions on the actor level, and to see how changes on the system level affect the behaviors of agents. There are ABMs with very simple, homogeneous agents, each of which has only a few properties and simple behaviors but which can produce complex system behavior through their interaction (example: segregation behavior in US-American cities in the Schelling model, cf. Schelling 1971).
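As an illustration of this first type of ABM, the following is a minimal sketch in the spirit of the Schelling segregation model mentioned above. The grid size, the similarity threshold, and the share of empty cells are illustrative choices, not Schelling's original parameters.

```python
import random

SIZE, SIMILARITY_THRESHOLD, EMPTY_SHARE = 20, 0.3, 0.1

# 1 and 2 are the two groups, 0 is an empty cell
grid = [[random.choice([1, 2]) if random.random() > EMPTY_SHARE else 0
         for _ in range(SIZE)] for _ in range(SIZE)]

def unhappy(x, y):
    """An agent is unhappy if fewer than SIMILARITY_THRESHOLD of its neighbours are alike."""
    me = grid[x][y]
    neighbours = [grid[(x + dx) % SIZE][(y + dy) % SIZE]
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    occupied = [n for n in neighbours if n != 0]
    if not occupied:
        return False
    return sum(n == me for n in occupied) / len(occupied) < SIMILARITY_THRESHOLD

for step in range(50):
    movers = [(x, y) for x in range(SIZE) for y in range(SIZE)
              if grid[x][y] != 0 and unhappy(x, y)]
    empties = [(x, y) for x in range(SIZE) for y in range(SIZE) if grid[x][y] == 0]
    if not movers:          # nobody wants to move: a segregated pattern has emerged
        break
    random.shuffle(movers)
    for (x, y) in movers:   # each unhappy agent relocates to a random empty cell
        if not empties:
            break
        nx, ny = empties.pop(random.randrange(len(empties)))
        grid[nx][ny], grid[x][y] = grid[x][y], 0
        empties.append((x, y))
```

Even with such simple, homogeneous agents, clusters of like agents emerge on the grid, an example of macro structure growing out of micro behavior.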

However, there is also a large community using “intelligent” agents in the wake of Artificial Intelligence approaches and so-called “expert systems”. Here, many heterogeneous types of agents are represented, equipped with a large number of properties—among them, for example, anticipation and learning—with individual and changing knowledge bases and a multitude of behavioral options. These heterogeneous, complex agent types interact in dynamic environments. This second approach is the one of choice whenever the aim is to model human or organizational behavior as realistically and in as much detail as possible, especially where the objective is to change this behavior. Only when we understand and computationally represent the properties and behaviors of agents and the resulting dynamics can we identify where changes on the micro level, i.e. the level of agents, lead to changes on the system level. Although ABMs are used in many scientific disciplines, ABMs of this second type are mostly found in social science due to their capacity to mimic complex human and social behavior. They go by the label of “social simulation”.

To represent empirically observable and analysed behaviors of actors on the micro level of social phenomena within a simulation, the computational agents of the simulation—their properties and behaviors—have to be informed (calibrated) by empirical data. The more we know theoretically and empirically about the case to be modelled, the richer and more detailed its representation on the computer can be. With such a simulation, we build “artificial worlds” using software that follows the knowledge we have about these worlds. This is where social simulation has to rely on social theory and empirical social research.

For example, agent-based simulation with intelligent agents used to represent a particular social phenomenon as realistically as possible is closely connected to the hermeneutic approaches and qualitative methods of interpretative sociology. The latter serves as an “informant” for calibrating agents: it is necessary to understand actors and their behaviours in order to model agents. To use simulations as a “social laboratory”, computational agents have to have relevant action orientations, knowledge, intentions, strategies, fears, hopes, etc.; they need options for behavior to act and interact which their empirical “pendants”—the actors—also use. For this, agents in simulations often have highly complex “interiors”, such as so-called Belief-Desire-Intention (BDI) structures (cf. Wooldridge 2000; Balke and Gilbert 2014). These are agent architectures which show how the conceptual level of action orientation translates into actual behaviour. The architectures, drawn from social theory, then need to be calibrated with empirical details from qualitative research, which works with methods such as case studies, interviews, and document and discourse analysis. Of course, quantitative social research also plays an important role in calibrating agent models. If, for example, we want to simulate a bigger social context such as the EU-funded research landscape in Europe, a detailed representation of the current landscape will need statistical data about the number and type of funded organisations, projects, thematic areas etc. Here, the “artificial societies” (Doran and Gilbert 1994) represented in simulations can today rely on quantitative methods that use huge amounts of data from the e-humanities and BigData technologies for a detailed representation of empirical structures.
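To illustrate what such a BDI “interior” can look like in code, here is a minimal sketch; the attribute names, the simple deliberation cycle, and the example goals are illustrative assumptions rather than the architecture of any particular BDI framework.

```python
from dataclasses import dataclass, field

@dataclass
class BDIAgent:
    beliefs: dict = field(default_factory=dict)     # what the agent holds true about its world
    desires: list = field(default_factory=list)     # goals it would like to achieve
    intentions: list = field(default_factory=list)  # goals it has committed to

    def perceive(self, observation: dict):
        """Update beliefs from (possibly empirically calibrated) observations."""
        self.beliefs.update(observation)

    def deliberate(self):
        """Commit to those desires that seem achievable under current beliefs."""
        self.intentions = [d for d in self.desires
                           if self.beliefs.get(f"{d}_feasible", False)]

    def act(self):
        """Translate the first committed intention into behaviour."""
        return self.intentions[0] if self.intentions else "wait"

# usage: an organisation that wants to publish and to find a project partner
agent = BDIAgent(desires=["publish_result", "find_partner"])
agent.perceive({"find_partner_feasible": True})   # e.g. informed by interview data
agent.deliberate()
print(agent.act())    # -> "find_partner"
```

Qualitative research then supplies the content of these beliefs, desires, and feasibility judgements for the actors being modelled.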

Integral parts of these simulations are also the complex, non-linear interactions between agents and environment (e.g. EU research policy, availability of funding, etc.); environmental conditions need to be calibrated quantitatively and qualitatively, too. Models from social simulation calibrated this way can help to understand social dynamics of the empirical system and identify possible access points for intervention on the micro level of actors. For this, both structural and procedural aspects of the social phenomenon to-be-modelled need to be informed empirically. The more empirical knowledge that goes into the simulation of both aspects, the more similar the “computer world” becomes to our empirical world of experience. It becomes a “sociotope”, an artificial world, which resembles the empirical one in decisive aspects. The quality of a simulation is partly decided by its “recognition value” with the stakeholders: if the stakeholders recognise essential elements of their every-day experience in the simulation, they accept and value the possibility to learn from and with the simulation and gain additional knowledge for shaping the empirical system (cf. Ahrweiler and Gilbert 2005).

To fulfil this expectation, the calibrated model needs to be “similar” to the empirical system at a certain point in time; it needs to produce structures and dynamics that the empirical system shows or has shown without further intervention in the next time periods (zero hypothesis). We then can use this reproduction of the empirical system by simulation as benchmark, as “baseline scenario”, to experiment with interventions.

If we can observe a qualitative correspondence between the empirical structures and the structures produced by the agent-based model (similarity of dynamics, iso-morphy of structures), we can call our simulation experiments “history-friendly”:

“‘History-friendly’ models are formal models which aim to capture—in stylized form—qualitative theories about mechanisms and factors (…) They present empirical evidence and suggest powerful explanations. Usually these “histories” (…) are so rich and complex that only a simulation model can capture (at least in part) the substance, above all when verbal explanations imply non-linear dynamics” (Malerba et al. 1999). Interventions can then target procedural aspects of the agent system, e.g. changing agent behaviour, or structural aspects, e.g. changing the number of agents in the starting configuration, or environmental conditions such as available resources.

The advantages of using techniques from social simulation for innovation research are confirmed by many agent-based models (cf. Ahrweiler 2010: 233–315). These models implement, for example, the interaction of knowledge and actors, of outputs and organizations, and of network formation and evolution. They simulate the interdependencies of existing innovation policies and funding strategies, and of future innovation policy scenarios and alternative technology paths to improve innovation performance. For the [SKIN] model to understand and describe the structures and dynamics of knowledge-intensive industries, for example, their networking behaviors would have to be constantly monitored; for some of the most important features here (the creation and diffusion of knowledge), such observations are difficult if not impossible. In the face of this challenge, an agent-based simulation continuously produces dynamic data derived from its underlying theoretical framework (mostly innovation economics, science and technology studies, and economic sociology) and from empirical calibration data. With this, an ABM offers observation and experimental opportunities which are not available in the empirical field. A recent overview and critical discussion of existing simulation models of innovation is provided by Watts and Gilbert (2014).

Simulation experiments using the SKIN model

The agent-based simulation platform SKIN (an acronym for Simulating Knowledge Dynamics in Innovation Networks) works with heterogeneous, “intelligent”, and complex agent types, which act and interact in a computational world resembling the empirical world as closely as possible. There is a close relationship between theory, empirical data, and simulation. Due to this, SKIN claims to be relevant for providing policy advice. SKIN reproduces the research and innovation worlds of empirical actors on the computer. By calibrating the model with empirical data sets, it allows realistic and detailed experiments to answer “what if”-questions of STI Policy.

The SKIN model

The SKIN model is concerned with simulating knowledge profiles, science and research landscapes, and innovation networks on different scales. The “basic SKIN model” has been presented elsewhere (cf. Pyka et al. 2007; Gilbert et al. 2007; Ahrweiler et al. 2011a). On its most general level, SKIN is an ABM with knowledge-intensive organizations as agents, which try to produce new basic or applied knowledge, and/or which try to produce new products and processes via innovation. Agents are located in permanently changing, complex social environments where their efforts need to find approval; e.g. in the market if they target innovation, or in the scientific community if they try to publish their research results.

SKIN agents are knowledge-intensive, learning organizations. Each agent owns an individual dynamic knowledge profile. In the model, an agent’s individual knowledge base—a vector in a multi-dimensional space—is called its “kene” (cf. Gilbert 1997), which the agent uses as source and object for its research and innovation activities. The abstract knowledge profile can be “fed”, i.e. calibrated or informed, by empirical data. “Data points” are “units of knowledge” (e.g. core competences, capabilities, codified and tacit knowledge, explicit and implicit knowledge), which are produced, used, and available.

For example, we can work here directly with publication, patent, or other source data for specific actors and contexts. Using methods from bibliometrics, scientometrics, patent analysis etc., structural knowledge profiles of organizations can be collected, analyzed, and evaluated. Interpretative social science can furthermore contribute to shedding light on knowledge profiles by making the context of meaning and the connectivity to actions accessible and “understandable” via interviews with actors, case studies, and document/discourse analysis. Using this modeling approach, SKIN represents and simulates the knowledge profiles of organizations active in research and innovation; in aggregation and extrapolation, the knowledge profiles of countries, regions, municipalities, and clusters can be re-constructed and simulated. Simulating knowledge profiles belongs to every SKIN application. The kene is dynamic: an agent can learn—either alone, by incremental or radical research, or together with other agents, by exchanging and improving knowledge in partnerships and networks.
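A minimal sketch of how a kene and the two learning modes just mentioned could be represented in code follows. In line with the description above of the kene as a vector of knowledge units, each unit is assumed here to carry a capability (a research field), an ability within that field, and an expertise level; the concrete fields, values, and update rules are illustrative, not the published SKIN specification.

```python
import random
from dataclasses import dataclass

@dataclass
class KnowledgeUnit:
    capability: int   # broad research field, e.g. an index into a taxonomy
    ability: float    # a specific skill within that field
    expertise: int    # how experienced the agent is with this unit

class Agent:
    def __init__(self, kene):
        self.kene = kene                      # the agent's dynamic knowledge profile

    def incremental_research(self):
        """Learning alone: slightly vary one ability and raise its expertise."""
        unit = random.choice(self.kene)
        unit.ability += random.uniform(-1, 1)
        unit.expertise += 1

    def learn_from(self, partner):
        """Learning in a partnership: adopt one knowledge unit from a partner."""
        self.kene.append(random.choice(partner.kene))

# calibration hook: a kene could be built, e.g., from the patent classes of a firm
firm = Agent([KnowledgeUnit(capability=c, ability=random.uniform(0, 10), expertise=1)
              for c in [3, 3, 7, 12]])
```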

Within these collaborative arrangements, SKIN agents have a large number of strategies and mechanisms available, for example to choose partners, to engage in partnerships, to initiate knowledge exchange, to generate collaborative knowledge outputs, or to distribute innovation rewards (see the sketch below). These interactions and the resulting social structures can be calibrated by empirical data as well. Information on the structures and dynamics of the science and research landscape on the actor and system level is broadly available for countries, regions, sectors, and clusters. “Data points” are actors, interactions, and networks in research and innovation. Social Network Analysis (SNA) is a common tool to analyze this type of empirical data, identifying and visualizing central actors (hubs), clusters, the position and role of new entries in the research and innovation landscape etc. However, it only addresses the structural aspects of the science and research landscape. The actors, processes, and causal chains producing these network structures remain invisible between the “snapshots” of two successive network states. Information about actors (their expectations, objectives, competences, strategies, cooperation behavior etc.) and about their action contexts, the processes, cultures, and institutional frameworks they are embedded in, must again be made transparent, accessible, and “understandable” with the help of complementary qualitative methods such as interviews with actors, case studies, and document or discourse analysis.
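One plausible partner-choice mechanism of the kind listed above is sketched here: agents prefer partners whose capabilities complement their own while sharing at least one field as common ground. This is an illustrative rule, not the specific mechanism implemented in SKIN; the organisation names and capability sets are made up.

```python
class Org:
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = set(capabilities)   # e.g. derived from a kene profile

def attractiveness(me, candidate):
    """Prefer partners who bring new capabilities but share at least one field,
    so that there is common ground for communication."""
    if not me.capabilities & candidate.capabilities:
        return 0
    return len(candidate.capabilities - me.capabilities)

def choose_partner(me, population):
    candidates = [o for o in population if o is not me]
    best = max(candidates, key=lambda c: attractiveness(me, c), default=None)
    return best if best and attractiveness(me, best) > 0 else None

population = [Org("uni", {1, 2, 3}), Org("sme", {3, 4}), Org("ldf", {3, 4, 5, 6})]
print(choose_partner(population[0], population).name)   # -> "ldf"
```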

Summarizing, agents in any SKIN application interact on both the knowledge level and the social level, and the two levels are inter-linked in many different ways. SKIN is all about actors, knowledge, and networks. This general architecture is quite flexible, which is why the SKIN model has been called a “platform” (cf. Ahrweiler et al. 2014). It features applications as different as modelling the Vienna biotech cluster (Korber and Paier 2014), the simulation of Irish university-industry networks (Ahrweiler et al. 2011b), and the ex-ante evaluation of EU-funded research projects and the research landscape they produce (Ahrweiler et al. 2015).

Example: a SKIN simulation study for European STI policy

This last example will be discussed here in more detail. It concerns a contractual research study, tendered by the former Directorate General for Information Society and Media (DG INFSO; now DG CONNECT) of the European Commission, on the ex-ante evaluation of potential policy interventions for the new research funding scheme Horizon 2020 in the area of information and communication technologies (ICT).

For many years, the Evaluation Unit of DG INFSO had a tradition of tendering studies on the impact assessment of EU funding in ICT, mostly using Social Network Analysis (SNA) as a methodological tool for the evaluation of funded projects and organizations in the Framework Programmes (FP). These studies targeted the structuring effects of FP ICT networks (RAND Europe 2004–2005), their international reach (CESPRI 2005), the linkages between EU research and deployment and regional innovation systems (CESPRI 2006), the ex-post evaluation of the IST thematic priority for FP6 (IST-FP6), or ICT Network Impact on Structuring a Competitive ERA (SMART 2009/0034). SNA was supposed to show central actors (hubs) and clusters, analyze and visualize the position and role of particular actor types (e.g. new member state actors, SMEs etc.), look at the cohesion and density of networks, etc. (for results of the mentioned studies cf. Breschi and Cusmano 2004; Breschi et al. 2007; Cassi et al. 2008). The empirical data for these SNA exercises were always provided by DG INFSO.

However, by the end of FP7, a certain dissatisfaction with the methodology used for evaluation purposes had become obvious within DG INFSO. SNA only captured the structural aspects of the research landscape; the actors, processes, and causes producing these structures remained invisible between the frozen snapshots of two successive network states. Policymakers became convinced that it would be useful to know about these procedural aspects in order to find appropriate options and access points for interventions and changes. Furthermore, SNA only allows the evaluation of the structures produced by certain funding policies “ex post”; ex-ante evaluation is possible only in the very limited ways of statistical modelling. The future was not appropriately addressed.

The following discussions in DG INFSO led to their request for the policy impact simulation already quoted at the end of “Planning and prediction as a policy challenge” section (European Commission 2009).

The task of the tender study DG INFSO commissioned accordingly consisted of the usual network analysis for impact assessment of EU-funded ICT research in the Seventh Framework Programme (FP7) and also of a simulation based on these data and findings for ex-ante evaluation of policy interventions for the new Framework Programme called Horizon 2020.

Agents in INFSO-SKIN are research organizations such as universities and research institutions, research departments of big firms, and small and medium enterprises. The model (cf. Fig. 1) simulates the social context in EU-funded research: Calls of the European Commission specify the funding conditions, such as the desired expertise and capability combination of research consortia, the minimum number of partners in project consortia, the duration of projects, the deadlines for proposal submission, the thematic areas, etc. The research organizations in the European Research Area build proposal consortia following these requirements and submit proposals, which will be evaluated. Successful consortia start with their project work and produce research results for the scientific community and deliverables for the Commission.
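The Call-proposal-project cycle just described can be sketched in code roughly as follows. The attribute names, the greedy consortium-building rule, the evaluation criterion (coverage of the requested capabilities), and all parameter values are illustrative assumptions, not the actual INFSO-SKIN implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class Call:
    required_capabilities: set    # expertise combination the Call asks for
    min_partners: int             # minimum consortium size
    n_funded: int                 # how many proposals get funded

@dataclass
class Org:
    name: str
    capabilities: set

def form_consortium(initiator, population, call):
    """The initiator invites partners until the Call requirements are covered."""
    consortium, covered = [initiator], set(initiator.capabilities)
    candidates = sorted(population, key=lambda o: len(o.capabilities - covered), reverse=True)
    for candidate in candidates:
        if covered >= call.required_capabilities and len(consortium) >= call.min_partners:
            break
        if candidate in consortium:
            continue
        consortium.append(candidate)
        covered |= candidate.capabilities
    return consortium, covered

def evaluate(proposals, call):
    """Rank proposals by how well they cover the Call and fund the best ones."""
    ranked = sorted(proposals, key=lambda p: len(p[1] & call.required_capabilities), reverse=True)
    return ranked[:call.n_funded]

population = [Org(f"org{i}", set(random.sample(range(20), 4))) for i in range(30)]
call = Call(required_capabilities=set(range(8)), min_partners=3, n_funded=5)
proposals = [form_consortium(o, population, call) for o in random.sample(population, 10)]
funded = evaluate(proposals, call)
print(f"{len(funded)} of {len(proposals)} proposals funded")
```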

Fig. 1 Flowchart of INFSO-SKIN, calibrated with empirical data on 1183 EU-funded research projects, 3783 funded organisations (universities and research institutions (RES agents), research departments of big firms (large diversified firms, LDF agents), and small and medium enterprises (SME agents)), and 11244 project participations between 2007 and 2012. Source: Ahrweiler et al. 2015

The procedural aspects to inform agent properties and behaviors for this specific social context had already been subject to investigation and covered by quantitative and qualitative studies on actors in EU-funded science and research within a number of previous projects conducted by the study team, among them the EU project “Network Models, Governance, and R&D Collaboration Networks” (NEMO). This was used to model the agents and their social interaction context as realistically as possible (cf. Scholz et al. 2010).

Using a dataset provided by DG INFSO on details of funded projects answering the Calls 1–6 of the European Commission, the simulation model INFSO-SKIN was supposed to re-produce and evaluate the research landscape following funding policies of the Seventh Framework Programme (FP7).

Calibration aimed at computationally reproducing the structures of the empirically observed research networks before starting any simulation experiments—in other words, at reproducing the database with the model. The data set made it possible to calibrate the knowledge bases, the social configurations, and the contexts of agents at a given point in time. Since time series data were available for the Calls and Work Programmes of the EC, the simulations could be validated step-by-step by comparing the artificially produced simulation data with the empirical data. The following figure shows how INFSO-SKIN reproduced the empirical database with simulation data.

The Emp/Sim table shows which output parameters were of special interest to the clients of the study (the simulation produces many more). For them, it was interesting how the policy changes to be tested would affect the number of participants and participations in research projects (Participants), the number of submitted proposals answering Calls (Proposals), the number of funded projects (Projects), the knowledge landscape (Knowledge and Capabilities), and the network measures for the whole programme as the funded structure of the European Research Area (Network).
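The step-by-step validation described above amounts to comparing simulated and empirical values of such output parameters Call by Call. A minimal sketch of this kind of comparison follows; the numbers are illustrative placeholders, not the DG INFSO data, and the 10 % tolerance is an assumption.

```python
# empirical benchmarks per Call (illustrative numbers, not the DG INFSO data)
empirical = {"call1": {"proposals": 950, "projects": 310, "participations": 2100},
             "call2": {"proposals": 880, "projects": 295, "participations": 1980}}

# corresponding outputs of a calibrated simulation run (again illustrative)
simulated = {"call1": {"proposals": 920, "projects": 324, "participations": 2045},
             "call2": {"proposals": 901, "projects": 288, "participations": 2010}}

TOLERANCE = 0.10   # accept simulated values within 10 % of the empirical ones

def validate(empirical, simulated, tolerance=TOLERANCE):
    report = {}
    for call, observed in empirical.items():
        for indicator, emp_value in observed.items():
            sim_value = simulated[call][indicator]
            deviation = abs(sim_value - emp_value) / emp_value
            report[(call, indicator)] = (deviation, deviation <= tolerance)
    return report

for key, (dev, ok) in validate(empirical, simulated).items():
    print(f"{key}: deviation {dev:.1%} -> {'ok' if ok else 'check calibration'}")
```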

This model, extrapolated into the future without any policy changes, was taken as an empirically grounded benchmark for further experiments to answer the evaluative questions about potential policy changes by DG INFSO.

For the selected set of “what if”-questions, the benchmark question is the so-called zero-hypothesis: what if there are no changes? This is just an extension of the time horizon: the Baseline Scenario. Answering the benchmark question is important for two reasons, both related to the fact that we do not have data about the future: (1) To test the sustainability and stability of network structures by extending time lines, and (2) To use this scenario as a benchmark for comparing its outputs to results of further experiments—this time with policy changes.

We tested the following questions or potential policy interventions against this “baseline scenario”: (1) What if more/fewer/different knowledge fields received funding (question concerning prioritization of research funding)?; (2) What if bigger/smaller project groups than now were to be funded?; (3) What if more/less money was made available for particular programs/project types/actors?; (4) What if policy efforts which try to attract small and medium enterprises to participate in EU-funded research were finally met with success? Results of simulation experiments show likely scenarios following from these interventions as policy options for Horizon 2020 (cf. Ahrweiler et al. 2015) and were presented to the European Cabinet.
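Such “what if” experiments are typically organised as parameter variations around the baseline scenario: the same model is run repeatedly with changed settings, and the averaged outputs are compared against the baseline. The sketch below illustrates this experimental logic with a toy stand-in for the simulation run; the parameter names, the toy dynamics, and the number of repetitions are illustrative assumptions, not the INFSO-SKIN model.

```python
import random
import statistics

def run_model(n_themes=8, avg_consortium_size=6, sme_share=0.15, seed=0):
    """Stand-in for a full simulation run; returns aggregate output indicators.
    (Illustrative toy dynamics, not the INFSO-SKIN rules.)"""
    rng = random.Random(seed)
    knowledge_flows = sum(rng.random() * avg_consortium_size for _ in range(n_themes * 10))
    sme_participations = int(1000 * sme_share * rng.uniform(0.8, 1.2))
    return {"knowledge_flows": knowledge_flows, "sme_participations": sme_participations}

def experiment(overrides, n_runs=20):
    """Average the indicators over repeated runs with different random seeds."""
    runs = [run_model(seed=s, **overrides) for s in range(n_runs)]
    return {k: statistics.mean(r[k] for r in runs) for k in runs[0]}

baseline = experiment({})                              # zero hypothesis / Baseline Scenario
scenarios = {"fewer themes": {"n_themes": 4},
             "larger consortia": {"avg_consortium_size": 10},
             "more SMEs": {"sme_share": 0.30}}

for name, overrides in scenarios.items():
    result = experiment(overrides)
    for indicator, value in result.items():
        change = (value - baseline[indicator]) / baseline[indicator]
        print(f"{name:16s} {indicator:18s} {change:+.1%} vs. baseline")
```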

The next sections present, as examples, the simulation results for the evaluative questions (1) and (4), which were produced by parameter variations in the simulation experiments.

The policy background of question (1) had been a “technology push approach” of the European Commission: “Current EU funding programmes have put considerable effort in tackling societal challenges, predominately through a thematic technology push. Bringing researchers from across Europe together in collaborative networks has been at the heart of this approach and will continue to be vital in sustaining a European research fabric. Experience has shown, however, the limitations of this approach in achieving the necessary flexibility, creativity and cross-disciplinary research needed” (European Commission DG Research and Innovation (Ed.) (2011). Green Paper on a Common Strategic Framework for EU Research and Innovation Funding. Analysis of public consultation. Luxemburg: Publications Office of the European Union, p. 8). Stakeholders of DG INFSO asked: what if there were going to be changes in thematic areas of funding? How would the current research landscape in ICT of the ERA react to this? What if more/less/other thematic areas were going to be funded than the eight chosen thematic areas in FP7? (Fig. 2).

Fig. 2 Empirical dataset for calibrating INFSO-SKIN. Source: DG INFSO

The next figure shows results for one of the many output parameters, namely how knowledge flows between agents will be affected if more or fewer thematic areas are funded in the future compared to the present state (one dimension of the output parameter “Knowledge”, cf. the table in Fig. 3 below).

The simulation experiments produced results that were surprising and counter-intuitive for the DG INFSO stakeholders. The expectation had been that a prioritization of research funding (fewer themes, same money) would result in much more drastic changes to the knowledge base and other output parameters. In contrast, the simulation showed a remarkable resilience of the research landscape and its knowledge base. The conclusion drawn was that the prioritization of research is basically a political discussion and decision as long as it stays within a certain realm, a corridor which could be precisely located in the simulation (Fig. 3).

Fig. 3 Validating INFSO-SKIN. Left: Emp STREP means empirical data in FP7 (ICT; funding instrument STREP, i.e. small targeted project consortia); Sim STREP shows the similarity of the simulated data for this area and funding instrument. Blue entries mean that there were no empirical data for this category; the simulation, however, produces artificial data for it. Right: network visualizations of the empirical and the simulated FP7 network. Source: Final Report 2011; European Commission. (Color figure online)

Simulation results concerning question (4) also provided interesting insights for the stakeholders from DG INFSO. The policy background for this question had been the long-standing attempt of the European Commission to integrate innovative, research-intensive SMEs into EU-funded research: “Through their flexibility and agility, SMEs play a pivotal role in developing novel products and services. Outstanding and fast growing SMEs have the potential to transform the structure of Europe’s economy by growing into tomorrow’s multinational companies (….) although particular attention has been paid to increasing SME involvement throughout FP7, SMEs are still finding it challenging to participate” (Green Paper on a Common Strategic Framework for EU Research and Innovation Funding: Analysis of public consultation 2011, p. 10). The Evaluation Unit of DG INFSO had already commissioned several tender studies to investigate the reasons for this “policy failure”: why EU funding was not as attractive to SMEs as expected, and why the measures taken had not been as successful as hoped. Furthermore, a discussion had started among the stakeholders about whether the policy efforts and costly incentive structures to draw SMEs into EU research were really worthwhile and would pay off as expected (Fig. 4).

Fig. 4 Example result for question (1) above. Y-axis: knowledge flows between agents; X-axis: time line of the funding instrument; red line = baseline scenario, green line = more themes, blue line = fewer themes. Source: Final Report 2011; European Commission. (Color figure online)

Would the effects on the European Research Area indeed be as positive as expected if there was more SME participation? There were certain doubts: a case for simulation with INFSO-SKIN. The related simulation experiments started with considerably more research-intensive and highly specialized SMEs in the starting population than could be seen in the empirical distribution. The simulation showed that these “additional” SMEs participated over-proportionally in proposals and, especially, in successful project consortia. Furthermore, they had positive effects on knowledge and network parameters. This result supported the SME policy advocates in the stakeholder group, who represented the Green Paper position, against the critics of these policies within the group.
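The logic of this experiment, i.e. seeding the starting population with more research-intensive SMEs than empirically observed and then checking whether they participate over-proportionally in consortia, can be sketched as follows. This is illustrative Python only; the agent attributes, weights and partner-choice rule are hypothetical stand-ins, not the INFSO-SKIN rules.

```python
import random

rng = random.Random(7)

def make_population(n_agents=200, sme_share=0.15, sme_capability=0.9):
    """Toy agent population: each agent has a type and a capability score."""
    pop = []
    for _ in range(n_agents):
        is_sme = rng.random() < sme_share
        base = sme_capability if is_sme else 0.5
        pop.append({"sme": is_sme, "capability": max(rng.gauss(base, 0.1), 0.01)})
    return pop

def form_consortia(pop, n_projects=40, size=6):
    """Toy partner choice: higher-capability agents are selected more often."""
    weights = [a["capability"] for a in pop]
    return [rng.choices(pop, weights=weights, k=size) for _ in range(n_projects)]

# "What-if": seed the starting population with more research-intensive SMEs
# than the empirical share (both shares are invented numbers)
pop = make_population(sme_share=0.30)
consortia = form_consortia(pop)

pop_share = sum(a["sme"] for a in pop) / len(pop)
part_share = (sum(a["sme"] for c in consortia for a in c)
              / sum(len(c) for c in consortia))
print(f"SME share in population: {pop_share:.2f}")
print(f"SME share in consortia:  {part_share:.2f} (over-proportional if larger)")
```

Comparing the SME share in formed consortia with the SME share in the seeded population is the kind of check that underpins the statement about over-proportional participation.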

Summarizing, the simulation results informed stakeholder discussions about the likely future effects of policy changes. Some of these effects were surprising and counter-intuitive; new knowledge was generated for the stakeholders. Complex contexts were made available and accessible via experimentation, and the simulations helped stakeholders to practice dealing with them. A further gain for the stakeholders was the insight into so-called empirical “un-observables”, which were made accessible and observable within the simulation. Supporting the generation of new knowledge and facilitating knowledge flows between actors (learning, diffusion) are central targets of STI Policy, related to important overall objectives (scientific excellence of European research, cohesion in a high-quality European research landscape, high learning and innovation capacity of European research organizations etc.). The simulations enabled observation of knowledge gains and knowledge flows, and showed the success or failure of the policy measures targeting them.

Conclusions and outlook

Experiments can be used to give an indication of the likely effect of a wide variety of planning measures. Using the above methods, we can deal with most of the complexity features on the “verdict” list of reasons that refute predictability and planning (see Footnote 5) presented in the “Science, technology and innovation for society” section above. Our ability to do this rests on two pillars: new (mathematical) tools and (interpretative) social science.

Social simulations, such as the one presented above, indeed offer a construction environment similar to a “social laboratory”, in which the stream of data is, on the one hand, produced by the lab and, on the other, also analyzed and interpreted by it. This applies to any laboratory approach with its specific relation between theory and data (cf. Latour and Woolgar 1979; Knorr-Cetina 1984). The big advantage of computer simulations is that, here, the construction machinery is explicit, codified (indeed by “code”), visible, and can be controlled (that is, “written”) by the observer (with the limitations already mentioned). Models and simulations are the second-order constructions of modelers and simulators; unlike everyday analogous model construction, however, they are laid out as an algorithm, i.e. codified, explicit, observable, testable, and open to manipulation and control.

Using computer simulations, stakeholders, for example from STI Policy, can use scenario modeling as a worksite for their own reality constructions. Experiments can point to the likely effects of many different planning details. “What if” questions can be posed (ex-ante evaluation), an option otherwise hardly available in the policy worlds of planning and prediction. Empirical “un-observables”, such as knowledge flows and learning, can be observed in the model: we can watch what they are doing. This is an important advantage: simulations mostly have to provide insights precisely into issues that empirical observation does not reveal, or does not reveal sufficiently. For example, we cannot directly observe “learning”; we usually look for selected indicators which measure the consequences of learning, and this then allows the conclusion that learning must have taken place. In social simulation, i.e. in a theory of learning running on the computer, these processes can be observed together with the data they produce. They become “observables” (for remaining limitations cf. Knepell and Arangno 1993).

A “realistic” ABM with its artificial data comes into contact with empirical data in at least four ways: (1) quantitative and qualitative empirical data are used to calibrate the model; (2) data are processed in simulation experiments to produce particular scenarios (sensitivity analyses, ex-ante evaluation); (3) simulations produce artificial data, which need to be analyzed and interpreted, and which need to be validated against empirical data; and (4) simulation models are evaluated and validated by their users (cf. Ahrweiler and Gilbert 2005, 2015): for the stakeholders to trust the model (and its results), they need to understand the mechanisms represented in the model, feel that they have had an input into the design of the agent rules and characteristics, and agree that the baseline simulations of FP7 are sufficiently close to what they observe actually happening.

Studies using the SKIN platform have demonstrated that validation is easier in cases where the simulation model looks as similar as possible to the world experienced by STI policymakers in their daily practices and routines. The simulation must display the same degree of complexity, the same structures and processes identified as relevant by the stakeholders, the same objects of concern, and the same areas of intervention. Below a certain “similarity threshold”, the model is discarded as a “toy model”, which is not realistic and is under-determined by empirical data. In the eyes of the stakeholders, the quality of the model increases the more of its features can be validated against empirical data, and that means against more than just anecdotal evidence. This is, of course, independent of the fact that there will always be necessary selection and abstraction processes in model building, empirical “un-observables” for which we will never obtain validation data, and random and probabilistic features of the model, all of which lead to its empirical under-determination. When interacting with stakeholders from STI Policy, it is important to find the appropriate trade-off between the empirical under-determination of the model and its credibility and trustworthiness.
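A minimal sketch of such a validation check, assuming a simple category-by-category comparison of simulated against empirical counts and an explicitly chosen similarity threshold, is given below. All figures and the aggregation rule are invented for illustration; the actual INFSO-SKIN validation used richer statistics and network comparisons, as in Fig. 3.

```python
# Hypothetical counts, e.g. funded projects per funding instrument (illustrative only)
EMPIRICAL = {"STREP": 318, "IP": 112, "NoE": 14}
SIMULATED = {"STREP": 301, "IP": 126, "NoE": 11}

SIMILARITY_THRESHOLD = 0.85   # below this, stakeholders dismiss it as a "toy model"

def similarity(emp, sim):
    """Mean relative agreement across categories (1.0 = perfect match)."""
    scores = []
    for cat, e in emp.items():
        s = sim.get(cat, 0)
        scores.append(1 - abs(e - s) / max(e, s, 1))
    return sum(scores) / len(scores)

score = similarity(EMPIRICAL, SIMULATED)
verdict = "acceptable" if score >= SIMILARITY_THRESHOLD else "rejected as toy model"
print(f"similarity = {score:.2f} -> {verdict}")
```

The point of such a check is not the particular statistic but that the threshold and the compared features are chosen together with the stakeholders, so that passing it carries weight for them.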

To trust the quality of the simulations means to trust the process that produced their results. This process is not only the one incorporated in the simulation model itself; it is the whole interaction between stakeholders, study team, model, and findings (cf. Ahrweiler and Gilbert 2015), and it is the relevant assessment mechanism for the quality of the model. This clearly indicates further areas for work: the entire interaction process between STI policymakers, researchers, data, model, and findings needs to be addressed and investigated systematically in order to understand its dynamics and improve its efficiency. It is a co-design process.

Requirements coming from the stakeholders also point to other large areas that need further work when using simulation studies for policy. In the case of INFSO-SKIN, the policymakers could not watch the running model (a run lasted 48 h), and they did not want to look at huge amounts of data presented in Excel sheets or at a multitude of tables and charts. New visualization tools and interactive technologies were needed to present the simulation experiments and their results in an attractive, customized, and efficient fashion.

Agent-based simulation can help to shed light into the darkness of the future: not by predicting it, but by coping with the challenges of complexity, by aiding understanding of the dynamics of the system under investigation, and by finding potential access points for planning its future, thus offering “weak prediction”. There is a restriction here as well: it is impossible to predict a particular system state in the future. Statements such as “this is what the biotech industry in the US will look like 10 years from now” are simply unsound. The type of knowledge produced instead is confined to statements such as “this class of future scenarios is more likely to happen than alternative ones given certain conditions,” or “in this parameter setting, the system reacts strongly to any intervention on parameter x,” etc. It is mandatory to point out this difference to the stakeholders so that they understand the limitations and caveats of policy modelling and do not over-rely on model results.
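The following sketch illustrates what “weak prediction” means operationally, assuming a stochastic model run many times per scenario: the output is a distribution of outcomes per scenario class and a rough sensitivity of the outcome to a parameter x, not a point forecast of a future system state. The toy response function and all numbers are invented.

```python
import random
from statistics import mean, pstdev

def run_once(x, seed):
    """Stand-in for one stochastic simulation run with intervention parameter x."""
    rng = random.Random(seed)
    return 100 * x ** 0.5 + rng.gauss(0, 8)   # toy, noisy response

def outcome_distribution(x, replications=200):
    samples = [run_once(x, seed) for seed in range(replications)]
    return mean(samples), pstdev(samples)

# "Weak prediction": compare classes of scenarios, not single future states
for label, x in [("baseline", 1.0), ("intervention on x", 1.5)]:
    m, s = outcome_distribution(x)
    print(f"{label:>18}: outcome ~ {m:.1f} +/- {s:.1f}")

# Local sensitivity: how strongly does the outcome react to a small change in x?
eps = 0.05
m_lo, _ = outcome_distribution(1.0 - eps)
m_hi, _ = outcome_distribution(1.0 + eps)
print(f"approx. sensitivity d(outcome)/dx at x=1: {(m_hi - m_lo) / (2 * eps):.1f}")
```

Reporting spreads and sensitivities rather than single numbers is one concrete way of communicating the caveats mentioned above to stakeholders.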

Recognizing these predictive limitations has to be complemented by a reluctance to formulate normative statements: policy decisions remain decisions under uncertainty even if the contexts are more transparent and accessible after simulation. This means that the responsibility of democratically legitimized political actors as decision makers cannot be replaced by any “policy recommendations” coming from a scientific project. Final policy decisions should be made on the basis of expert political opinion and value discussions informed by scientific advice. In the case of the simulation study above, future discussions with the former stakeholders will reveal the extent to which the model results affected the actual process of finalizing Horizon 2020 policies. Stakeholder feedback will help us understand the study’s utility and impact, and provide a means to optimize the model, tailor its performance more closely to the needs of policymakers, and make it a better fit for what is required to support data-driven decision making.