1 From ‘grand challenges’ to evaluation challenges: a new form of research impact assessment

Since the 1950s, various methods have been developed to assess the socio-economic impact of publicly funded research. On the one hand, Griliches’ (1958) seminal work led to a major stream of research on the computation of internal rates of return on public R&D, or of its cost-benefit ratio. On the other hand, the first case study analyses were developed in the post-World War II era and focused on retrospective aspects such as historical descriptions, research event studies, matched comparisons (Bozeman and Kingsley 1997, p. 34), or some combination of these. All these evaluation methods aimed to analyze the links between R&D and innovation, or economic growth more generally, based on the assumption that science has a direct effect on the economy.
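
To make the logic of these early assessments concrete, the following minimal sketch (in Python, with purely hypothetical figures) illustrates the kind of calculation behind rate-of-return and cost-benefit studies of public R&D: a stream of public outlays is compared with a stream of estimated economic benefits to yield a benefit-cost ratio and an internal rate of return.

```python
# Minimal sketch, using purely hypothetical figures, of a Griliches-style
# rate-of-return / cost-benefit calculation for a public R&D programme.

def npv(rate, cash_flows):
    """Net present value of a stream of yearly amounts (year 0 first)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-6):
    """Internal rate of return via bisection (assumes one sign change in NPV)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical programme: three years of public R&D outlays, then diffusion benefits.
costs    = [10, 10, 10, 0, 0, 0, 0, 0]    # million euros per year
benefits = [0, 0, 0, 5, 10, 15, 15, 10]   # estimated yearly economic benefits
net_flows = [b - c for b, c in zip(benefits, costs)]

discount_rate = 0.04
print(f"Benefit-cost ratio at 4%: {npv(discount_rate, benefits) / npv(discount_rate, costs):.2f}")
print(f"Internal rate of return: {irr(net_flows):.1%}")
```

The point of the sketch is simply that such methods reduce research impact to a discounted monetary comparison, an assumption that the broader approaches discussed below call into question.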

In recent years, the focus of research impact assessment (RIA) has broadened to include a wider range of non-economic impacts, based on the expectation that science is important for society as a whole. The societal impact of research and its ability to address ‘grand challenges’ is an item on the agendas of innovation policy makers and of large national and international public research organizations. In Europe, Horizon 2020 focuses on excellent science, industrial leadership, and tackling societal challenges. In the US, the National Science Foundation (NSF) claims that every grant has the potential to advance knowledge and benefit society. Many research funding organizations (the European Commission, the NSF in the US, the Dutch Technology Foundation…) evaluate both the scientific merit of proposals and their broader impacts on society, or their potential to produce broader societal effects (Bornmann 2013). Public research organizations (the French National Institute for Agronomic Research, the Consultative Group on International Agricultural Research, the Biotechnology and Biological Sciences Research Council in the UK…), higher education and research institutions (the Research Excellence Framework in the UK—Martin 2011), and academic scientists more generally are in charge not only of producing scientific knowledge, but also of addressing societal challenges, diffusing knowledge to socio-economic partners, and contributing to public debate and policy decisions. The 2015 Lund Declaration discusses how the European Research and Innovation Area might be reorganized and reoriented to better address global challenges: aligning national and European strategies, instruments, resources and actors; supporting frontier research, interdisciplinary collaboration, the mobility of world-class scientists, and research infrastructures; developing global partnerships with top scientists and innovators; and reinforcing open innovation and the role of end-users.

The current ‘grand challenges’ differ from previous mission-oriented policies (e.g. the Manhattan and Apollo projects), which supported the military-industrial complex (Gassler et al. 2008) and the development of specific technological capabilities. “Today’s grand challenges are broader in nature and require efforts that are structured for the long run” (Foray et al. 2012, p. 1698). The technologies that will need to be developed will require the involvement of numerous stakeholders, complex evolving networks, public–private interactions, and contributions from end-users. Funding for research and development (R&D) and innovation activities, and the resources needed to develop and diffuse the new technologies, will be provided by both governments and several other central actors in the value chain. These new technologies will make it possible to unlock current technological paradigms and trajectories, and will require social changes. As Kuhlmann and Rip (2014) rightly point out, grand challenges are about system transformations. They involve changes to social, economic, and technical systems, and objectives which are not given at the outset but emerge along the pathway. This new framework will involve new policy rationales, new policy requirements related especially to the design of research programs, and new methods of evaluation and RIA (Amanatidou et al. 2014).

The complex nature of grand challenges poses new challenges for evaluation (Amanatidou et al. 2014). Such challenges call for multi-disciplinary approaches; scientific, technological, and social innovations; multi-level governance; policy coordination; multi-actor engagement; and a long-term vision. The evaluation challenges include dealing with or overcoming the existing scientific and technological fragmentation, the multiple impact types, the multiple levels of policy action, the existing policy silos, the broader set of stakeholders, and the identification of research impact. RIA involves collective and system-level learning, and should be seen as a tool to guide complex transformation dynamics (the formulation of coherent sets of policies and programs, new organizational forms, new social norms…). An awareness of the conceptual changes at stake is imperative. RIA must be designed and conducted to enable a better understanding of the impact generating mechanisms, i.e. the various chains involved in the translation of research results into impacts (Joly et al. 2015). System transformation is a long and complex process with multiple causes and consequences, which calls for new RIA approaches suited to the interactions between research, innovation, and society.

2 The need to develop an integrated evaluation system

Against this background, this Special Issue gathers papers that discuss the methodological challenges and the transformations to RIA methodologies in practice. Evaluating the various impacts generated by publicly funded research (conducted within R&D programs or within public research organizations) requires the mobilization of different methodologies. Evaluation should be seen as a multifaceted exercise that provides relevant information to the various stakeholders involved in different types of decision processes. RIA objectives are not related only to accountability and allocation; they also involve advocacy and learning. Thus, a major issue will be to better link evaluation approaches and strategies both to learning and continuous improvement, and to the more conventional role of evaluation in policy justification (Shapira and Kuhlmann 2003). There is a need for a system of evaluation based on several different approaches that can be integrated within a comprehensive management and political system. The many existing RIA methods can be grouped into two sets of approaches based on the types of impacts considered and on their theoretical foundations. We suggest that these approaches should be considered complementary.

The first set of approaches was developed to assess the economic impact of publicly funded research. Econometric methods are used to estimate the impact of research expenditure on productivity gains in order to compute cost-benefit ratios or rates of return on investment in public research (Alston et al. 2009). Quantitative methods have generally been applied to the assessment of the economic impacts of specific public R&D programs such as the Advanced Technology Program (Ruegg and Feller 2003) or the EU Framework Programs (Policy Research in Engineering Science and Technology PREST 2002). Control group, counterfactual, cost-benefit, econometric, and input–output approaches are rarely applied to the evaluation of innovation policies across Europe (Edler et al. 2012). This set of quantitative approaches mostly estimates a wide range of economic impacts such as cost savings, rates of adoption of technologies, the impact of increased product quality on sales, the economic efficiency of alliances, impacts on firms’ productivity, input–output additionality, etc. It is assumed that these quantitative approaches are useful for justifying existing public R&D programs at the national, regional, or industry level, although there is little evidence of their effectiveness. They help to highlight impacts on, and improved performance of, participants’ practices (improved partnership management, increased production process efficiency…), but they do not add to our understanding of the processes and mechanisms that generate wider overall impacts.
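
As an illustration of the underlying econometric logic only (not of any specific study cited above), the following sketch regresses synthetic productivity data on a lagged stock of public R&D built with the perpetual-inventory method. The depreciation rate, the gestation lag, and all figures are assumptions made for the example.

```python
# Illustrative sketch of the econometric logic, estimated on synthetic data:
# productivity is regressed on a lagged stock of public R&D built with the
# perpetual-inventory method. Depreciation rate, lag, and all figures are assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
years = 40
rd_spending = rng.uniform(8, 12, size=years)       # hypothetical yearly public R&D

# Perpetual-inventory knowledge stock with an assumed 15% depreciation rate.
delta, k, stock = 0.15, 0.0, []
for s in rd_spending:
    k = (1 - delta) * k + s
    stock.append(k)
stock = np.array(stock)

# Synthetic log productivity responding to the knowledge stock with a 3-year lag.
lag = 3
log_tfp = (0.02 * np.arange(years)
           + 0.05 * np.roll(np.log(stock), lag)
           + rng.normal(0, 0.01, years))

# Drop the first `lag` years, where the lagged stock is undefined, and estimate by OLS.
y = log_tfp[lag:]
X = sm.add_constant(np.column_stack([np.arange(years)[lag:], np.log(stock)[:-lag]]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # the last coefficient proxies the elasticity of productivity to public R&D
```

The estimated elasticity is what such studies convert into a rate of return; the reduced-form specification is also where the criticisms discussed below apply.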

The second set of approaches addresses the societal value (or “public values”, Bozeman 2003) of research, and considers economic impacts as well as environmental, health, political, and other impacts. State-of-the-art methods combine qualitative case studies and quantitative indicators (Donovan 2011; Douthwaite et al. 2003; Joly et al. 2015; Spaapen and Van Drooge 2011). They allow for the complexity of impact generation mechanisms related to networks of knowledge translation (Callon 1986) that evolve along the various stages of an impact pathway. Broader impact approaches improve our understanding of the impact generating mechanisms and of the roles of beneficiaries and co-innovators in the funded projects, analyzed via detailed case studies. These approaches are useful for learning purposes, and the lessons derived from these analyses help in anticipating and identifying critical points and how they can be overcome along future impact pathways. However, they do not provide an overall assessment of the economic efficiency of research.

The complementarity between these approaches can also be demonstrated at a more methodological level. They are reciprocal and invite dialogue, rather than being opposed. Some of the hypotheses of macro-economic models, such as the appropriate lag structure, could be informed and grounded empirically in case study analyses. Quantitative approaches could extend the types of impacts measured beyond economic benefits alone. Case studies generally consider a wider variety of impacts than quantitative approaches but are rather weak on their measurement. Efforts could be made to achieve coherent measurement of economic benefits across quantitative and qualitative approaches to facilitate scaling up from the case study level to the meso or macro levels. These approaches would then be complementary and would provide more robust and original measures.

The idea of complementarity between the approaches is particularly daring in this context, since the two sets of methods have important differences that should not be overlooked. Quantitative methods used to assess economic impacts are usually based on a simplified and reduced representation of research and innovation processes, in which investment increases the stock of knowledge, which in turn increases productivity. This focus on increased productivity and social welfare tends to overlook possible negative side-effects, and assumes that economic growth automatically translates into social progress. These approaches treat knowledge, resources, and projects as additive, and assume that their impact can be attributed to specific organizations, projects, or geographic regions. They primarily serve the objectives of accountability and (supposedly) budget allocation. In contrast, RIA methods developed recently to assess broader impacts acknowledge that innovation is complex, interactive, and produced in systemic contexts, that there is a shift from mode 1 to mode 2 production of knowledge (Gibbons et al. 1994), and a shift from pure competitiveness to a need to address societal challenges. In these system-oriented approaches, actors contribute to the generation of societal impacts within complex and evolving productive configurations or networks. The main aim of these approaches is to understand the impact generating mechanisms and processes, and to support policy and organizational learning.

The papers included in this Special Issue try to address some of the methodological challenges referred to above. It is time to overcome the opposition between quantitative and qualitative approaches. This is a key issue both for the social sciences and for RIA.

3 Methodological challenges and complementarity issues: from theory to practice

The six papers in this Special Issue develop, discuss, or assess new RIA methodologies that focus on and highlight the dynamics of research, innovation, and societal change. They often consider a variety of impacts and adopt complementary approaches to increase robustness: qualitative and quantitative evaluation, multi-objective analysis, and multi-level analysis. For analytical reasons, we organize what follows around four topics: opening the black box of impacts (3.1), assessing broader impacts and their measurement (3.2), formulating comprehensive evaluation designs (3.3), and lessons from implementation (3.4). The papers are grouped under these topics to highlight their main focus and strengths in relation to the issues central to RIA discussed above.

In his paper, Feller adopts a reflexive stance and reminds us that we should not forget history. He observes that the societal impact of research was already being discussed as far back as the 1970s, and that it is still used today as a baggy term. Hence, it is necessary to clarify what is meant by societal impacts, why there is an urgent need for them to be assessed, and how this should be done. All of these issues are tackled in different ways in the papers in this Special Issue.

3.1 Impact generating mechanisms: opening the black box of impacts

The papers in this Special Issue generally recognize the nonlinear nature of impact pathways. The paths to solving challenge-oriented problems are often long and complex, and involve a variety of stakeholders whose interactions and networks evolve over time. A major challenge, whatever the method developed, is enrolling this variety of stakeholders, and especially end users, in RIA studies, so as to allow an understanding not only of the different impacts but also of their generating mechanisms and turning points.

Van Drooge and Spaapen’s paper focuses on the new arrangements needed for the governance (evaluation and monitoring) of transdisciplinary research collaborations (TDCs) set up to cope with grand societal challenges. The governance of these new distributed forms of cooperation needs to consider the dynamic processes involved in TDCs (learning, network evolution, the involvement of various stakeholders in different ways at different steps, the co-existence of various interests). A key challenge for researchers involved in TDCs is to conduct multidisciplinary and responsible science, and to involve stakeholders from the agenda-setting phase through to the allocation decisions and evaluation steps. This entails “a participatory and distributed approach to evaluation in which stakeholders are empowered and committed”. The authors adopt an evaluation approach based on participatory impact pathway analysis (PIPA), with the intent of developing mutual learning, involving stakeholders from the beginning, and planning the successive steps in the development of the project up to its impacts. PIPA is based on a series of participatory workshops in which all participants jointly develop, adapt (if necessary), and agree on a theory of change. Participants create and maintain a shared vision and understanding of the whole project and its impact on society. They build logical frames and a common understanding of the pathways to impact, the causalities, and the underlying mechanisms. Evaluation is considered an integral part of governance, and its objective is to improve the collaborative understanding of the joint process and of progress towards the common societal goal.

The BETA-EvaRIO method (Bach and Wolff) was developed to evaluate the socio-economic impacts of research infrastructures (RI). It focuses on the evaluation of the various types of effects generated by the different forms of learning experienced during RI-based activities (building, operating, and using the RI) performed by different actors (RI operators, suppliers, and users). One of the novelties of this method is that it evaluates the extent to which performing RI-based activities increases the capacity of the different actors. The fundamental hypothesis is that gaining more capacity increases the potential for the generation of future economic effects. The capacity effect corresponds to changes in resources and competences (related to science and technology, management activities and organizational change, external ties, and reputation), and to the ability to make these resources evolve (human capital). Exploitation of this capacity change might increase the performance of RI-related activities and/or allow the generation of economic benefits in activities other than RI-related ones. In other words, the main impact generating mechanisms at stake are the accumulation of knowledge and competences and the learning processes leading to capacity changes. Finally, the method combines elements of attribution and contribution: attribution is linked to the way the economic effects are measured, while contribution refers to the complexity of the process underlying impact creation (the interactions among multiple actors, projects, knowledge bases…).

Morgan Jones et al. analyze the UK government’s experience of implementing its national research assessment exercise, the Research Excellence Framework (REF 2014), which for the first time included the evaluation of research impacts. UK universities submitted 6975 case studies across 36 units of assessment. Each case study describes impacts that occurred between 2008 and 2013. The authors provide a detailed description of the whole assessment process and of the methodology used to assess both the submission process related to the impact component of the REF and the evaluation process. They discuss how this RIA exercise took account of some important evaluation challenges: the time lag between the research phases and the first impacts, the non-linearity of the impact pathway, engagement with end users, and the attribution vs contribution debate. The REF uses a research window of 20 years (1993–2013), which was considered too long by some actors and too short by others. The impact window (impact is considered only if it occurred between 2008 and 2013) was heavily debated, since impacts can occur at any point. The evaluators interviewed criticized the linear thinking about impacts that was forced by the case study template. Although evaluators recognized the complex and chain-linked nature of impact pathways, it was difficult for them to provide information about the iterative and reflexive nature of these effects. The REF opted for a contribution approach, since one of the aims of the assessment was to demonstrate in what way the research considered contributed to the impact, not in what proportion. The contribution should be decisive, even if marginal (without this research, the impact would not have occurred). The authors consider that attribution should be assessed only if the evaluation is linked to the allocation of funding.

Although this dimension is not developed in the paper by Gaunand, Colinet, Joly and Matt, their contribution draws on the ASIRPA project, which aims explicitly at understanding impact mechanisms (Joly et al. 2015). The measurement of impacts is positioned within the frame of an approach based on standardized case studies, which allows the understanding of such mechanisms to be linked to an assessment of the diversity and magnitude of the impacts produced.

3.2 Broader impacts and metrology

Evaluating the societal impact of research requires the set of evaluated impacts to go beyond economic impacts. This is not straightforward since, apart from scientific and economic impacts, appropriate assessment methodologies are scarce. As Feller notes in his paper, the sum of the efforts devoted to RIA has led to a greater emphasis on metrics and methods for assessing scientific and economic impacts relative to societal impacts, to the further disadvantage of the latter. The categories of impact considered vary from study to study but usually involve economic, health, environmental, social, and political dimensions (others include culture, public services, etc.). There are at least two challenges linked to the broader impact issue: (1) to clearly define each category of impact, since the boundaries are not always clear-cut, and (2) to define the appropriate metrics.

In their paper, Morgan Jones et al. underline that the HEFCE (Higher Education Funding Council for England) identifies 8 main categories of impact, while analysis of the impacts identified in the case studies yields 60 impact topics. Providing verifiable evidence of impact and developing a shared understanding of impacts were considered by the evaluators involved in the case studies to be the overriding challenges. These challenges include the provision and collection of particular types of evidence, connections with the users of the research, and the perception that quantitative indicators were preferred over qualitative evidence of impact. Impacts related to policy and cultural changes, and to greater policy awareness, were found to be particularly difficult to grasp. Demonstrating and providing evidence of these impacts is especially difficult since such data are not collected routinely, and the evaluators were advised to look for quantitative evidence. There are also problems related to how much the evaluation exercise depends on the abilities and willingness of beneficiaries and end users to provide accurate information.

The paper by Gaunand et al. presents an original methodology for assessing the political impact of research. The paper draws on the most up-to-date literature to develop an understanding of the different ways in which research can contribute to policy. An expert panel is used to produce a rating scale, and its successful implementation shows that it can be regarded as a generic tool. The authors make no claim that the measure is objective, but they do show that it is robust. Gaunand et al. consider that the advantages of assigning an ordinal measure to the political impact of research outweigh the risks of misuse of an impact number. These advantages include influencing political agenda-setting by showing what really matters, providing opportunities for scaling up analyses of multidimensional impacts and identifying impact-generating mechanisms, and learning about and promoting discussion of the value systems reflected in the assessment.
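
By way of illustration only, an expert-panel ordinal scale of this kind might be represented as follows; the level descriptions and case ratings in the sketch are hypothetical and are not those used by Gaunand et al.

```python
# Purely illustrative: one way an expert-panel ordinal scale for political impact
# could be represented. The level descriptions and case ratings are hypothetical
# and are not those used by Gaunand et al.
from statistics import median

POLITICAL_IMPACT_SCALE = {
    1: "research is cited in a policy debate",
    2: "research informs an advisory or expert report",
    3: "research shapes the framing of a policy option",
    4: "research contributes to the design of a regulation or programme",
    5: "research is decisive for an adopted policy instrument",
}

# Panel ratings assigned to a set of standardized case studies (invented values).
case_ratings = {"case_A": 2, "case_B": 4, "case_C": 3, "case_D": 5, "case_E": 4}

# With ordinal data, report the distribution and the median rather than a mean.
levels = sorted(case_ratings.values())
print("ratings:", levels, "| median level:", median(levels),
      "=", POLITICAL_IMPACT_SCALE[median(levels)])
```

The value of such a representation lies less in the single number than in the distribution across cases, which supports the scaling-up and learning uses the authors describe.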

3.3 Formulating comprehensive evaluation designs

As suggested by Feller, substantive ex post assessment of societal impacts is less a matter of choosing a specific technique or bundle of mixed methods from those listed above, and more one of formulating a comprehensive but flexible evaluation design grounded in relevant social science research. From this perspective, the papers in this Special Issue explore three types of complementarity: combining qualitative and quantitative evaluation methods, pursuing multiple evaluation objectives (accountability, learning, advocacy), and considering various levels of aggregation. We contend that developing methods that incorporate complementary evaluation elements increases not only the robustness of the method but also its significance.

3.3.1 Combining qualitative and quantitative evaluation methods

The current state of the art in RIA methods involves combining qualitative elements and quantitative metrics.

The evaluation of the Austrian START program (Seus and Bührer) is based on a multi-method approach. The START program supports individual post-doctoral researchers through the allocation of grants (of up to 1.2 million euros) for a maximum period of 5 years. The evaluation is aimed at understanding to what extent the START initiative affects the scientific performance, career paths, and rate of advancement of grantees over time. The authors use a quasi-experimental design with a counterfactual (use of control and comparison groups), bibliometric analysis, an online survey, interview-based case studies, documentation review, and expert workshops. The experimental design enables assessment of how much of the measured effect is attributable to the policy instrument being evaluated. Careful attention is given to the choice of the control and comparison (unsuccessful applicants) groups. The bibliometric analysis assesses the scientific performance of START grantees. The online survey provides quantitative information on their career progress. A series of interviews was conducted with grantees, representatives of host institutions, START project group members, institutional stakeholders, and START committee members. The interviews provide the information needed to understand the cause-and-effect mechanisms in the funding process. This mixed-method approach allows the limitations of each individual method to be overcome, permits data triangulation, and provides robust results. Triangulation was especially useful for contrasting results in the analysis of grantees’ career development.
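
The following minimal sketch (with invented numbers, not data from the START evaluation) illustrates the quasi-experimental core of such a design: a bibliometric outcome for grantees is contrasted with that of a comparison group of unsuccessful applicants.

```python
# Minimal sketch with invented numbers (not data from the START evaluation):
# the quasi-experimental comparison of grantees with a comparison group of
# unsuccessful applicants on a bibliometric outcome.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
grantees   = rng.poisson(lam=14, size=60)    # e.g. publications in the 5 years after award
applicants = rng.poisson(lam=11, size=120)   # unsuccessful applicants, same window

diff = grantees.mean() - applicants.mean()
t, p = stats.ttest_ind(grantees, applicants, equal_var=False)
print(f"mean difference: {diff:.1f} publications (t = {t:.2f}, p = {p:.3f})")
# The difference in means is not by itself causal: the credibility of the attribution
# rests on how the comparison group is constructed and on the qualitative interviews.
```

As the closing comment indicates, the quantitative comparison only becomes informative when combined with the careful group construction and the interview material described above.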

In Van Drooge and Spaapen’s paper, the PIPA approach combines logic frameworks with quantitative and qualitative indicators. The indicators relate to the aim for change and to the different project steps (inputs, activities, outputs, outcomes, impacts). They are used to monitor the collaborative process and to inform the stakeholders about successes or failures. The objective of the indicators is not to compare projects or measure scientific excellence, but to understand the collaboration dynamics and the societal changes achieved.

The BETA-EvaRIO evaluation approach combines quantitative and qualitative metrics. Socio-economic effects (direct and indirect) are described in monetary terms and express the added value generated by income and the cost reductions enabled by efficiency gains, new R&D research contracts, time savings, etc. The capacity effects and the effects on performance are assessed using quantitative indicators and qualitative observations. The wide range of data used was collected via direct interviews, surveys, RI archives, and existing S&T databases. This mixed approach allowed data triangulation and increased the robustness of the results. It relates economic performance, expressed in monetary terms, to rich explanations, or at least hypotheses, about the underlying learning mechanisms and complex causalities.

3.3.2 Multi-objective evaluation

Evaluation studies may be conducted for several different reasons and objectives, such as advocacy, accountability, and learning.

The BETA-EvaRIO method (Bach and Wolff) explicitly claims to meet all of these goals. The quantification of direct and indirect effects can be considered a proxy for calculating ‘returns on investment’. The information collected during the interviews and the list of indicators create learning opportunities related to how the impacts were generated and how they might be improved.

The main objective of the UK REF exercise (Morgan Jones et al.) is to achieve a better allocation of funding in the UK higher education system. The REF is an ex post assessment aimed at evaluating the quality and impact of research conducted in UK higher education institutions. The focus of the evaluation is on outcomes rather than on the processes and mechanisms leading to impact. The authors claim that assessing the impact generation process would require a different kind of evaluation study, one focused on understanding the underlying mechanisms. Morgan Jones et al. suggest that, in future, more consideration should be given to the learning objectives of the evaluation. They consider this especially important for achieving policy makers’, researchers’, funders’, and other stakeholders’ objective of supporting research with the greatest potential impact.

In the case of the paper by Gaunand et al., the assessment of impacts takes account of both advocacy and learning. Their experience shows that if the assessment exercise involves the researchers concerned, it contributes to developing a culture of impact. As the authors point out, the implementation of their original methodology contributed to a better appraisal of the mechanisms through which the actions of researchers and managers generate political impact.

3.3.3 Multi-level approaches

In most cases, each method is designed to evaluate impacts at a specific level (micro, meso, or macro), and it is difficult to aggregate or scale up micro- and meso-level evaluation results to the macro level.

The multi-method approach used to evaluate the impact of the START program (Seus and Bührer) presents results aggregated at three different levels. At the micro level, the authors analyze the effects of the program on the funded grantees, through changes in scientific performance and career development. The quantitative techniques applied to evaluate impacts at the micro level allow the attribution problem to be overcome. At the meso level, the authors consider the impacts on START group members and higher education institutions (i.e. on the indirect beneficiaries). Here, the problems linked to attributing the effects to the program increase, and qualitative information is required to understand the program’s effect at this more aggregate or indirect level. At the macro level, the study looks at the impact of START on the Austrian research system by analyzing the regional and disciplinary distribution of grants and the grantees’ success in obtaining European Research Council grants (an indication of the international strength of the Austrian research system). Establishing a direct causal link between the program and meso- or macro-level impacts is a risky and difficult task.

The BETA-EvaRIO approach considers four types of impacts (direct impacts, capacity impacts, indirect impacts, and impacts on RI performance) and three types of actors (RI operators, RI suppliers, and RI users), which could entail different aggregation strategies. The method is designed to evaluate micro-level effects at the level of individual actors. The quantification of direct and indirect effects in monetary terms allows these effects to be summed for a category of actors to provide a meso-level measure (assuming that inconsistencies and double counting are absent). The heterogeneity of metrics and the nature (indivisibilities, overlaps, combinations) of the other categories of effects prevent such aggregation, making macro-level analysis infeasible.
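
A minimal sketch of this aggregation logic, with invented figures, might look as follows: monetized direct and indirect effects can be summed across actors of a given category, whereas capacity effects are left out because they are not additive.

```python
# Minimal sketch, with invented figures, of the meso-level aggregation that the
# monetized (direct and indirect) effects allow, provided double counting is ruled out.
micro_effects = {          # million euros, by RI user (hypothetical values)
    "user_1": {"direct": 0.8, "indirect": 1.5},
    "user_2": {"direct": 0.3, "indirect": 0.9},
    "user_3": {"direct": 1.1, "indirect": 0.4},
}

meso_totals = {
    kind: sum(effects[kind] for effects in micro_effects.values())
    for kind in ("direct", "indirect")
}
print(meso_totals)  # capacity effects are not additive and are therefore not summed here
```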

3.4 Lessons from implementation

Some of the papers in this Special Issue not only develop interesting methods but also implement them in a particular context. They provide several important lessons.

In the case of the PIPA approach (Van Drooge and Spaapen), a new form of accountability emerges. Instead of accounting for the ex post individual performance of the various research participants and innovation projects, the evaluation of TDCs should consider the shared responsibility within the project. Stakeholders have different interests and ideas about the ability to control (using appropriate indicators) project quality and relevance, and they also have different expectations about the goals, organization, and impacts of research and innovation projects. The study underlines that achieving a shared understanding and establishing shared responsibility take time. Mutual learning is fundamental for their achievement.

The REF exercise (Morgan Jones et al.) has had effects on the UK higher education system. It has induced changes to practices and culture at the individual level (greater appreciation of colleagues’ work outside academia, higher self-esteem among researchers involved in third-mission activities…) and at the institutional level (impact as a criterion for promotion, the creation of department strategies, the planning of impact, the use of impact case studies for marketing and advocacy at the regional level…). There has been some concern about a possible shift in the research agenda from blue-sky research towards more applied and impact-driven research. This can be alleviated through the provision of appropriate incentives by the leaders of higher education institutions, who are responsible for ensuring balance. Finally, the representativeness of the impacts captured by this exercise could be increased by including a larger number of case studies.

As several of the papers allude to, impact assessment is fundamental to the renewal of the social contract between science and society. Commissioner Carlos Moedas stated emphatically at the German National ERA Conference (October 2016, Berlin) that:

“We have an obligation and an incentive to be much better at understanding and communicating the impact of what we do. Not only to ministers of finance, but to the general public!”

Societal impact is indeed the cornerstone of this renewed social contract. However, as Feller suggests, since societal impact is a rather loosely defined term, it would be fanciful to try to homogenize definitions and analytical perspectives into the basis for a common evaluation approach, method, and set of metrics. Hence, it is necessary to cope with a diversity of ways to assess impact. This could be a major problem if the aim of impact assessment is taken to be the provision of an objective measure that allows marginal adjustments to research budgets. However, if impact assessment is seen as allowing conscientious attention to the transformative effects of research, the diversity of the available tools is a precious asset.

The outcome is likely to be a functionally and methodologically segmented cluster of non-comparable, sometimes competing approaches to assessing societal impacts—in other words, a continuation of the status quo.