Introduction

Policy appraisal processes have become an established part of the policy making landscape. Research is commissioned, stakeholders consulted and policy impacts assessed with the various aims of protecting the environment, making ‘better’ regulation and mainstreaming a neo-liberal approach to policy (Turnpenny et al. 2009: 640). Such ex ante analysis is especially likely in knowledge-dense or technically complex policy problems, where decision-makers experience sizeable knowledge deficits and struggle to predict the consequences of their activities. So far, the growing academic interest in appraisal has focussed on categorising analytical tools and procedures and on explaining their diffusion, use and non-use (Nilsson et al. 2008; Radaelli 2004, 2005; Turnpenny et al. 2008, 2009). A key strand of consensus that has developed is that the gap between the rational-analytic promise of policy appraisal and the reality of the ‘policy mess’ results in significant barriers to decision-makers’ learning (Hertin et al. 2009). This paper aims to expand on this finding by exploring whether and how appraisal makes institutions think differently (Radaelli 2007) and, specifically, the depth of learning that policy appraisal engenders, and how we can account for the survival of policies known to pose significant countervailing risks.

Rather than adding to the rational–analytical accounts of appraisal use that dominate the nascent literature, the institutional context of policy appraisal is explored with a view to getting under the skin of the ‘policy and politics’ of policy appraisal (Turnpenny et al. 2009: 640). Specifically, the paper goes beyond the conventional consideration that ‘institutions matter’ and uses path dependence analysis to explore a specific proposition: policy appraisal processes, which are designed to help decision-makers think and learn, may actually reinforce limited learning forms in government. The discussion rests on the assertion that a lack of synchronicity exists between making and delivering policy to a political timetable on the one hand and producing knowledge that is robust and clear enough to guide policymakers on the other. The proposition advanced here is that, in issue areas marked by policy urgency and technical complexity, this temporal disjuncture can result in an array of evidence and signals about potentially countervailing risks that decision-makers are unable to weigh and navigate in the time they have. In such circumstances, we can expect decision-makers to fall back on early policy frames and institutionalised ways of thinking. The information produced by appraisal will be heavily filtered by institutional processes associated with the evolution of the technologies in question, the rules and hierarchy in political life, and the norms that inform political actors’ internal representations of issues. These forces impact upon the depth of learning that is possible and, in particular, reinforce the tendency towards limited forms of organizational learning already present in the political world.

The first section of the paper sets out the proposition. Here what is being explained, organizational learning, is outlined using Argyris and Schön’s (1974, 1978) seminal model. Their account, which contrasts shallow ‘single-loop’ learning with deep ‘double-loop’ learning, is used as the basis for scoping out the dependent variable: the learning form associated with policy appraisal. Three temporal challenges that underpin the policy-knowledge development interface are then outlined and related to the two learning types. Drawing on institutional analysis from new institutional economics (NIE), the paper explores how the results of policy appraisals in technically complex issues are mediated by institutions. Specifically, it focuses on the ‘rules of the game’ that are constructed and reproduced to ensure stable and predictable political interactions (North 1990, 1994; Pierson 2004). Using the NIE conceptualisation, section two of the paper explores how policy appraisal evidence that both supports and undermines a policy goal can be filtered through four positive feedback processes familiar to NIE analysis: large set-up costs, learning by doing, coordination effects and adaptive expectations (Arthur 1988). Empirically, this is applied to UK biofuels policy, and specifically the interpretation of policy appraisal evidence that emerged in the development of the Renewable Transport Fuel Obligation (RTFO) between 2004 and 2008. The paper concludes by summarising the findings and reflecting on the wider significance of the characteristics of positive feedback for the depth of learning that policy appraisal can generate, and the measures that can be taken within government to disrupt these forces of inertia.

While the paper offers some early evidence on state responses to climate change in general and biofuels in the UK in particular, this case study illustrates the learning challenges decision-makers face when policy appraisal processes produce evidence of anomalies between the stated goals of policy and its potential consequences. In this way, the case is treated as illustrative of the high level of complexity and temporal pressures that increasingly confront decision-makers attempting to engage not only with technologies to address sustainable development, but also with knowledge-dense issues more generally.

The major limitation of the account is that when analysing a ‘live’ issue not all learning can be captured, and so hard results are necessarily limited. What learning gets left out? It is not only policy analysts who produce appraisals, and the decision-makers attempting to decipher the resulting evidence, who face temporal challenges. Learning processes have their own temporal dimension—with enlightenment and policy oriented learning happening over protracted periods of time (Sabatier 1988; Weiss 1979). Research asking what depth of learning appraisal has stimulated is itself looking at the ‘snapshot’ rather than the moving picture (Turnpenny et al. 2009: 468).

The proposition: policy appraisal, the rules of the game and single-loop learning

Single and double-loop learning in complex organizations

Before we explore the type of learning that policy appraisals can stimulate, we first need to outline key forms of organizational learning more generally. What sort of learning is possible within government? Arguably the most influential work on learning in complex organizations is that of Argyris and Schön (1974, 1978). All organizational life is marked by a paradox: the pressure for stability and predictability on the one hand and the necessity for change on the other. In complex multi-level, multi-layered settings, this paradox creates tensions in how decision-makers deal with situations where something is predicted to go wrong, or where there is the potential for damaging countervailing risks that are difficult to resolve. This focus on complexity, and the definition of learning as the detection and correction of error, make Argyris and Schön’s thesis, which distinguishes two depths of learning, a good fit with analysis of what government learns from policy appraisal.

Action in organizations is encapsulated by the idea of ‘theories-in-use’, which comprise three linked components (Argyris and Schön 1974, 1978). These can be described and related to policy action in this way:

  • Governing variables that represent the objective or policy goal to be achieved,

  • Action strategies that comprise the policy instruments and tools deployed to deliver those objectives, and

  • Consequences, both intended and unintended, that result from the goals set and action taken to reach them.

When the consequences match the policy goal, an organization’s theory-in-use is confirmed. Where there is a mismatch between intention and outcome, one of two learning types is triggered in response: single-loop or double-loop. The difference between single and double-loop learning can be captured in the neat shorthand of ‘doing things better’ versus ‘doing things differently’ (Hayes and Allinson 1998). Organizations that first look for another action strategy with which to achieve their goals are engaged in single-loop learning. Such learning is thermostatic, based on adjustment rather than fundamental change. This constrained character has led some scholars to argue that when they engage in single-loop policy adjustment, decision-makers are not actually learning at all (Haas 1990: chap. 1). In double-loop learning, by contrast, the frames and norms that underpin policy goals are problematized and often disrupted. Double-loop learning is expansive; it requires a willingness to question the appropriateness of goals and ‘revalue’ them (Haas 1990: 24). Figure 1 offers a simple illustration of the two learning types.

Fig. 1 Theories-in-use and single and double-loop learning. Source: Smith (2001)

How does this thesis relate to decision-makers’ context? The political world is not efficient in the way the economic sphere aims to be; rather, the complexity of the tasks outstrips humans’ information-processing capacities (Simon 1957). This opacity, and the cognitive limitations experienced by decision-makers, make the political world particularly prone to single-loop learning (Lindblom 1959; North 1990, 1994; Pierson 2000, 2004; Simon 1957). Issues have multiple linkages, the presence and consequences of which are often unclear and difficult to calculate in a time frame that is politically tenable. Even where a problem is easy to diagnose, solutions can be difficult to identify and develop: decision-makers do not have an endless supply of ‘plan Bs’ at their disposal (Allison 1971). Decision-makers aim to reduce uncertainty in the short-term, and as a result may downplay the significance of dissonant information resulting from policy appraisals, preferring to argue that the benefits outweigh the drawbacks until proven otherwise.

While Argyris and Schön’s is a prescriptive account, where double-loop learning should be the goal for every organization, it is worth noting that no such assumption is followed here. In politics, there are many conceptions of what makes ‘good’ policy, ‘what works’ and constitutes ‘policy success’ (Lindblom 1959; Marsh and McConnell 2008; Parsons 2004)—ranging from the rational–analytic view that underpins double-loop learning to highly politicised definitions where power and material interests displace learning. More usually, the political world tends towards adaptive behaviour. To establish themselves as credible and legitimate actors, decision-makers engage in patterns of behaviour and construct institutions that emphasise stability and predictability. A world of double-loop learning, in which goals and underlying assumptions are readily and publicly questioned, is one of low trust and instability rather than calm continuity. Institutions offer a way to avoid such uncertainty, by reproducing and reinforcing existing policies and power structures. There is also evidence that adaptive learning is actually advantageous in particular issues—notably, complex and chronic problems where knowledge is evolving and inconclusive (Gunderson and Light 2006).

The research question

What depth of learning can policy appraisal stimulate? Policy appraisal tools and processes are intended to help decision-makers learn and institutions think (Owens et al. 2004; Turnpenny et al. 2009). They exist both as a panacea for the inherent ambiguity of the political world described above and as a source of authoritative justification for policy changes that might otherwise be undesirable. Can policy appraisal processes counter the single-loop tendencies of the political world? To understand the types of learning that policy appraisal can stimulate, we need to understand the limits within which policy appraisal operates. The proposition is that where policy problems are urgent and potential solutions involve complex technology and an emerging evidence base, policy appraisal processes may not encourage deep learning. Specifically, it is argued that there are three temporal challenges associated with policy appraisal processes that reduce decision-makers’ capacity to engage with evidence, especially on countervailing risks, and exacerbate the tendency towards single-loop, adaptive behaviours.

The first challenge is the reality that policy appraisals may help shape and justify policy goals, but they do not precede them. While appraisal happens ‘upstream’ in the policy process, policy goals are often well established by the time reports have been commissioned, consultations started and analysis of evidence begun. This is especially likely in multi-level decision-making structures or situations where a policy problem and its potential solutions are technically complicated (Dunlop 2007, 2009; Dunlop and James 2007). Where policy is being constructed in a context of complexity and uncertainty, decision-makers may find themselves appraising policy options for delivering goals they cannot easily revisit or retract. The epistemic inputs that are most relevant to decision-makers are those that represent ‘useable knowledge’ (Haas 2004; Lindblom and Cohen 1979), which helps them refine policy strategy rather than those disruptive to overall policy objectives. In such circumstances, there may be a high potential for anomalies and inefficiencies in policies to persist, even where they are detected by appraisal because decision-makers lack the scope to reflect on them.

The second challenge concerns the different standards that underpin knowledge creation and policy development. For the former, the standard is wide validation and epistemic consensus; for the latter, the delivery of political preferences is commonly the primary goal. These contrasting motivations mean that the timetables that govern knowledge creation and policy construction are distinct, with the former being more protracted and open-ended than the latter. Policy appraisal is an artificial construct, which aims to bridge this temporal gap and offer a compromise that can result in an evidence-base for policy. In policy appraisal, evidence is produced against the clock. To catch decision-makers’ attention, and warrant further consideration, it needs to exist in a digestible and clear form before policy has been implemented. However, the arrival of a scientific consensus will not always coincide with the policy timetable. Binding the production of evidence to the timetable of policy development reduces the certainty of what is produced, because its scope is necessarily restricted to making predictions at one particular juncture about what the impacts of policy might be. The tendency is towards capturing the ‘snapshot’ as opposed to the ‘moving picture’ (Pierson 1996), with policy appraisal processes conflicting with the cumulative character of knowledge production (Kuhn 1962). And so, any synchronicity between appraisal and epistemic consensus becomes a matter of chance and not design. In this view, the snapshots produced by appraisal processes may offer few clues as to how different aspects of knowledge fit together, leaving the form or even existence of a bigger picture unclear. Such de-contextualisation may lead decision-makers to dismiss as conjectural early indicators of problems which are substantiated later.

The third temporal challenge found at the policy-knowledge interface concerns information overload. The policy legitimation function served by appraisal ensures a plurality of evidential inputs; however, the restricted length of time that exists for the interpretation of these inputs can leave decision-makers overloaded with evidence about a huge array of potential countervailing risks that might be triggered by the policy they are developing (Graham and Weiner 1995). This creates validation difficulties in knowing what weight to attach to a piece of evidence, thus increasing, rather than reducing, uncertainty about the costs of certain courses of action. Such uncertainty, in turn, reinforces existing patterns of thinking and initial policy frames and, in doing so, exacerbates the political tendency towards single-loop learning. In this way, by addressing one capacity problem, the much discussed lack of information available to decision-makers (see Turnpenny et al. 2009 on ‘type 2’ research on policy appraisal), policy appraisal processes, and the temporal limits they place on knowledge development, can actually give rise to others, notably too much evidence to sift in too little time. In short, policy appraisal processes may increase, not decrease, uncertainty and complexity in decision-making, ‘endarkening’ rather than enlightening (Weiss 1979: 430).

The analytical framework: explaining the impact of policy appraisal

How can we explain the impact of policy appraisal in knowledge-dense policy dilemmas? The temporal tension that lies at the heart of policy appraisal, between knowledge production and policy development, increases the importance of existing institutionalised ‘rules of the game’. We know that when faced with a wide range of conflicting signals, and complex or incomplete information, decision-makers rely on existing modus operandi and habits of thinking to simplify, interpret and weigh evidence about the potential impact of a policy. North conceptualises these formal procedures and informal norms and understandings as ‘humanly devised constraints that shape human interactions’ (1990: 3). The second aspect of the proposition explored here involves explaining how the evidence yielded by appraisals is interpreted in knowledge-dense policy problems. This is done using insights from new institutional economics (NIE) (Arthur 1994; North 1990), and its extensions in political analysis (Pierson 2004). Specifically, the mediating influence of three aspects of these rules is explored.

First, they encapsulate the tendency in complex, knowledge-intensive sectors for particular technological ‘solutions’ to gain an early advantage and become locked-in even where they are found to be sub-optimal (Arthur 1994; David 1985; Romer 1986, 1990). In the evolution of technologies, small events may exert disproportionately large and long-lasting effects (Arthur 1988). So, for example, where a technology appears to offer the main answer to an urgent problem or fill a profitable gap in the market, economic, political and cognitive resources that are invested in its development ensure that it can persist even in the face of evidence of deleterious effects or inefficiency. Thus, having an early niche or ‘being fastest out of the gate’ can lead to ‘monopolistic domination’, and path dependence, as the costs of changing become prohibitive (North 1990: 94).

Second, this argument can be extended to institutional development around policies (North 1990; Pierson 2004). To navigate their way through complex policy problems, decision-makers create formal constraints—systemic structures, rules and procedures—that enhance stability, and deliberately bind them (and their successors) to particular policy goals. This encourages continuity, and enhances predictability in the uncertain political world. Over time, the institutions and policies which embody these rules become resistant to fundamental change as they become reinforced by organizations and interest groups with an interest in keeping the existing constraints (North 1990: 99). We should be careful to distinguish between policies and the policy appraisal of them. Policies concern the goals and tools that have been used to signal to actors about what is to be achieved and how (Pierson 2004; Pierson and Skocpol 2002). The incentives and opportunity structures that flow from them often precede any role for policy appraisal.

These rules, and the power asymmetries and opportunity structures they give rise to, both reflect and reinforce norms and cognitive frames that dominate thinking around an issue, and provide policymakers with ‘mental maps’ (Argyris and Schön 1974; Denzau and North 1994) about what is technically, systemically and politically feasible and desirable. These maps, which are often based on first impressions (Mannheim 1952), represent important tools for intendedly rational decision-makers to navigate ambiguous political and technological terrain (Denzau and North 1994; Simon 1957). These subjective constructions of the contribution made by a particular technology to the resolution of a problem, and how to harness that solution procedurally, represent the third component of the rules of the game. It is difficult to convince decision-makers that these cognitive shortcuts may no longer be valid, because these ways of thinking both pre-date, and inform, the construction of formal procedures and technology selection [in a process akin to the idea of ‘sedimentation’ (Tolbert and Zucker 1996)]. Even where a policy initiative is new or novel, aspects of the rules of the game that surround it will be well established in layers of underlying values and understandings.

The array of new and conflicting information yielded by policy appraisal, about the consequences of a course of action, is filtered through this ‘institutional matrix’ of inter-dependent technical, procedural and cognitive constraints (North 1990: 95). Significantly, as actors commit to them, these rules generate self-reinforcing activity (Arthur 1994) creating an inertial tendency toward initial policy choices and frames; ‘[T]he farther into a process we are, the harder it becomes to shift from one path to another’ (Arthur 1994 in Pierson 2004: 18). Thus, the positive feedback created by institutional rules and routines creates homeostasis and inflexibility. Events, mindsets and decisions that happen early in policy development—i.e. as the issue is being framed—exert a disproportionately large influence (Pierson 2000). The importance of this bias, towards starting points and initial policy frames, reinforces the problem that policy appraisal often comes too late in the sequence of policy development, and casts doubt on whether appraisal alone could ever enable deep, double-loop learning.

We should be clear about the type of learning that is possible in an environment of self-reinforcing investment, rules and beliefs. The argument is not that these rules of the game prevent learning, and ensure the preservation of the status quo. Path dependence does not mean that, once set, policy paths are inevitable and unchangeable. Organizational learning does result from the new information yielded from policy appraisals but, most commonly, such learning takes an adaptive form with institutions attempting to correct previous dysfunctional decisions by making amendments at the margins (Cheung 1996; Crozier 1962; Kreuger 1996; March and Simon 1957). Indeed, in extreme cases, where corrective measures are not taken, the institution itself may cease to exist (Genschel 1997). But the cumulative logic of the rules of the game places limits on decision-makers’ interpretations, narrowing the political and economic choices they draw from appraisal and resulting in policy adaptations that are usually, but not always, derivative (North 1990: 94–95; Pierson 1996).

The research method: scoping single and double-loop learning

How can we scope our dependent variable, and capture the learning that results from appraisal? At its simplest, the absence or presence of single or double-loop learning is identified in terms of how decision-makers respond to information that predicts a mismatch between goals and consequences. Where strategies are adapted but underlying goals defended, single-loop learning has occurred; where underlying goals are challenged and, in extreme cases, actually changed, the learning is double-loop. This needs to be nuanced a little further, however. Decision-makers’ learning across the course of policy appraisal is dynamic, not static: narrow understandings may widen over time as knowledge develops. While this may not result in a switch from single to double-loop learning, learning over time may change their propensity and ability to engage in deeper learning. This issue of the extent to which double-loop learning could take place needs to be scoped out.

Argyris and Schön (1978) differentiate two models that describe the manner in which learning is approached. Of specific interest here are the underlying values and indicators of theories-in-use that either inhibit or enhance the possibility of double-loop learning (Argyris and Schön 1978). Model I inhibits double-loop learning. Here, responses to new and dissonant information are defensive. Actors deploy strategies that control the environment and discourage in-depth or external testing of ideas. Model II enhances the possibility of double-loop learning. It involves engagement in ‘abnormal discourse’ (Rorty 1979) and exploration in the inquiry, design and implementation of corrective action. The indicators, elaborated by Argyris and Schön and those using their thesis (summarised in Table 1), allow us to track the learning associated with policy appraisal across time. Specifically, they illuminate the extent to which the single-loop learning, most associated with policy appraisal, is the type that encourages or discourages deeper learning.

Table 1 The manner of learning: governing values and indicators associated with theories-in-use that inhibit and enhance double-loop learning

The rules of the game and policy appraisal: positive feedback, single-loop learning and biofuels policy in the UK

The proposition that policy appraisal evidence in complex issues tends to produce single-loop learning requires empirical exploration. Specifically, the extent to which policy appraisal processes are mediated by technical, economic and systemic factors endogenous to issues and institutions, and by the cognitive biases and ‘mental maps’ they produce, all of which exert positive feedback, is explored through an examination of biofuels policy development in the UK. Learning is explored in terms of individual decision-makers in government departments, as well as scientists and stakeholders involved in the policy process (see Etheridge 1981, 1985 and Levy 1994 for a similar micro-level approach where governmental learning is equated with the sum of what and how individuals learn). Analysis follows a ‘process-tracing’ approach (Berman 2001; George 1997), with actors’ perceptions of how the ‘rules of the game’ around biofuels influenced what was learned from policy appraisal outputs identified through interviews with key actors (see Footnote 1). When they are asked how they address a mismatch between goals and (predicted) outcomes, members of organizations are prone to rationalise their behaviour (Argyris and Schön 1974: 6–7). To avoid such espoused accounts, interviews and analysis used the indicators outlined earlier to guide questioning. This is accompanied by analysis of documentary evidence: policy appraisal documentation, predominantly scientific reports, parliamentary enquiries, legislation, internal reports and government publications.

Analysis of the case makes an empirical contribution to our limited knowledge of the challenges decision-makers face in trying to develop policy in circumstances where new and often conjectural information, about the deleterious effects of a favoured course of action, is emerging after the policy goals have been set and delivery instruments selected. We know how government would ideally like to narrow the gap between policy and epistemic timetables—a plethora of guidance exists about learning technologies such as horizon scanning, scenario planning, stakeholder consultation and impact assessment. We know less about how decision-makers keep pace with, verify, weigh and respond to unclear, unanticipated or unexpectedly strong signals that arise from these appraisal processes.

Biofuels have been heralded as offering solutions to various global problems (energy insecurity, rural poverty and, most notably, climate change), and generous subsidies have been deployed by governments across the world to stimulate their production. In April 2008, the Renewable Transport Fuel Obligation (RTFO) (see Footnote 2) came into force in the UK. This requires that biofuels make up 2.5% by volume of road transport fuel sales, increasing by 1.25 percentage points a year to 5% by 2010/11. Amid concerns about the carbon savings yielded by biofuels, and their potentially deleterious impact on sustainability, the RTFO requires that transport fuel suppliers report on the environmental performance of their biofuels.

The RTFO was the result of four years of policy development where appraisal was extensive. This exploration can be divided into two distinct phases. The first covers the period between 2004 and 2007, when policy was being developed by the Department for Transport (DfT). Here appraisal (predominantly commissioned reports, stakeholder consultations and impact assessments) focussed on the direct effects of increased biofuels production, where the estimated GHG emissions reductions and implications for land use change (LUC) were particular concerns. Rather than re-examining the fundamental policy goal to increase biofuel production and use, the DfT used the evidence to develop detailed policy strategy. The policy goal had been set in the 2003 EU Biofuels Directive (2003/30/EC), leaving member states researching and consulting on: the selection and design of the specific mechanism deployed to encourage industry (RTFO) (DfT 2004: 7); what targets should be set and when (DfT 2004: 4); public labelling (DfT 2004: s8); and best practice in relation to sustainability criteria (DfT 2004: s7.5). However, while appraisal focused on developing policy instruments, it is important to be clear that throughout the appraisals, decision-makers were aware that increased biofuel production raised potentially significant and environmentally deleterious countervailing risks. The thorny questions that exist about the level and costs of the CO2 emissions reductions that biofuels yield were well known (for an example of an early intervention see the European Environmental Bureau’s [EEB] (2002) statement).

By 2007, these concerns had intensified, with appraisal inputs becoming more numerous both within government (notably, responses to the Department for Transport consultations rose from 129 in the first consultation in 2004 to 6,335 in the 2007 exercise) (DfT 2004, 2007) and beyond it, where interventions from NGOs, academics, journalists and international agencies, particularly on indirect effects like food price rises and the displacement of agriculture onto uncultivated land, came thick and fast. Decision-makers struggled to know both how to process the often inconsistent and conjectural evidence and the weight to attach to the risks being signalled. Because biofuels were an emerging technology, the evidence on the magnitude of their unintended effects (both direct and indirect), and the carbon abatement costs associated with them, was nebulous, and conflicting signals were abundant. Thus, in the manner described earlier, decisions about detailed aspects of the design of biofuels policy were being made ahead of the production of concrete substantive knowledge about the consequences of the overall policy goal.

Questions and evidence relating to the countervailing risks implied by biofuels, especially their indirect effects on staple food supplies and prices and deforestation, gathered and gained widespread international attention in the run-up to the RTFO’s implementation. This led to calls for a review, and in some cases a moratorium, on all policies aimed at increasing the use of biofuels (EAC 2008; see Footnote 3). Aware that the science had started to move very quickly, and was more than the DfT could assess, the government’s Chief Scientific Adviser and the Chief Scientific Advisers (CSAs) of the DfT and the Department for Environment, Food and Rural Affairs (DEFRA) intervened, advising Ministers of the need to take stock and get advice from outside the circle of government (Bob Watson interview; RTFO Programme Director interview; LCVP Director interview). Particularly pivotal was the public declaration of Professor Bob Watson, the DEFRA CSA and former Intergovernmental Panel on Climate Change (IPCC) chair, that the policy should be examined very carefully before any implementation: ‘it is absolutely ridiculous to have a policy that causes further problems’ (BBC 2008a, b).

While it did not suspend implementation in April 2008, in February of that year the DfT commissioned a review of the evidence chaired by Professor Ed Gallagher, the Chair of the Renewable Fuels Agency (RFA) (the independent agency created to implement the RTFO). The Gallagher Review represented the second phase of appraisal, though with the policy already being implemented this was more post factum than ex ante. It was prepared in rapid response mode: commissioned in late February, it reported to government in May and was published in July 2008. Gallagher focussed on six questions associated with the controversial and conjectural evidence on indirect effects by interviewing key scholars, commissioning technical reports and holding stakeholder workshops (RFA 2008). The overall findings, which were reviewed and commented on by officials at the DfT, DEFRA and Cabinet Office and the relevant CSAs, were entirely supportive of the policy objective to increase biofuels use and production: ‘there is a future for a sustainable biofuels industry’ (RFA 2008). Its recommendations focused on adaptation of existing strategy, rather than any overhaul of the main policy objective. The three most significant recommendations, outlined by the Secretary of State for Transport in July 2008, concerned amending strategy:

  • government should slow down the rate of increase in the RTFO to 0.5% per annum so that the RTFO reaches 5% in 2013/14 rather than 2010/11 as planned,

  • until controls on land-use change were set and enforced internationally, the UK should press for the European Union’s (EU) 10% by 2020 target to be kept under regular review in the light of the emerging evidence, and

  • the sustainability criteria for biofuels being negotiated in the EU should address indirect, as well as direct, effects on land use (Kelly 2008).

While decision-makers’ responses to both the RTFO appraisals and the Gallagher Review bore the hallmarks of single-loop learning, the manner of decision-makers’ learning in the second phase of appraisal can be distinguished from that of the first. Though government action post-Gallagher was limited to changes in policy strategy, given its previous firm stance against any slowdown in biofuels adoption, the changes were significant and suggest that more radical action could not be ruled out were more damning evidence to be presented in the future. Moreover, when commissioning Gallagher, the Minister had been clear that the question of a moratorium should be addressed even though it would be difficult to implement (DfT Senior Policy Officer interview; Bob Watson interview). Of course, the fact that the body conducting the review (the RFA) had been created to implement the RTFO made it unlikely that such drastic action would be recommended. However, giving public recognition to a moratorium as one possible and plausible policy option is an important step towards enhanced learning. A third indicator suggestive of enhanced learning was that, by focussing on indirect effects, Gallagher crystallized for decision-makers that some aspects of biofuels’ impacts were intangible, and could not be rationalised within existing arrangements (Bob Watson interview).

The empirical puzzle here concerns why the principles that underpinned the Renewable Transport Fuel Obligation (RTFO) were not challenged in the first phase of appraisal, despite the mounting evidence against increasing the use and production of biofuels. Why did the UK government decide to do things ‘better’ rather than do things ‘differently’? The biofuels case is now analysed through the four self-reinforcing mechanisms identified by Arthur (1988) which dominate policy development, and pose substantial hurdles to the ability of policy appraisal evidence to trigger deep learning and policy change.

Large set-up costs

Any new policy initiative entails start-up costs. Where these are substantial, decision-makers have an immediate incentive to stand by that policy choice, even in the face of criticism and evidence of the significant countervailing risks to which it may give rise. The novelty and technical complexity of biofuels meant that the economic and institutional set-up costs associated with the RTFO were especially high, leaving evidence of countervailing risks interpreted in the ‘win don’t lose’ terms that inhibit double-loop learning.

Decision-makers who believe in a policy goal often design the policy in a way that enables it to withstand challenge and makes it difficult to dismantle. Though the DfT did not present them as a ‘silver bullet’, decision-makers there consciously accentuated the positive on biofuels (DfT Senior Policy Officer interview). This was driven, in part, by the initial promise of the technology and the lack of many emissions reduction initiatives, from elsewhere in Whitehall, for the government’s planned 2005 Climate Change Bill. The pressure on the DfT to throw its weight behind biofuels would also have been intensified both by the fact that transport was the only sector where emissions were on an upward path in the 1990s, and by the unattractiveness of alternative ‘solutions’ like reducing speed limits and traffic volume (Footnote 4). Accordingly, the aim was to secure industry commitment to the technology by providing stable long-term support for biofuels, and the RTFO was designed in a way that made it difficult to switch off (unlike duty incentives). As a result, high costs were incurred in terms of the time spent constructing the legislation.

By late 2007, as the evidence on deleterious impacts was growing, the RTFO was being prepared for its final parliamentary passage in October, before its implementation the following April. The institutional time pressures led to a strong sense among decision-makers that the emerging evidence casting doubt on the efficacy of biofuels had ‘missed the boat’ (DfT Policy Officer interview), and that any revisions would have to come later as the policy matured. Even if there had been strong political will to suspend the legislation, achieving this would have been logistically impossible for at least its first year given the parliamentary time required to rescind legislation.

Decision-makers were also very aware of the sunk costs, in both economic and reputational terms, which had been incurred by the UK government and the transport fuels industry. Generous duty incentives had been offered since 2002 (for biodiesel) and 2005 (for bioethanol), and the industry had invested on the assumption that the RTFO would come into force. Moreover, it had agreed to a carbon and sustainability (C&S) reporting system that offered no guarantees of being the same two years down the line, when differential rewards through certificates came on stream. This was seen as a huge commitment by the industry and a sign of its willingness to shoulder its share of the risk (Hyman, UK Environmental Industries Commission [EIC] in EAC 2008). Against this backdrop, any radical re-thinking of policy would not only have been legally and economically questionable but would also have fatally undermined the DfT’s credibility in the fuel sector.

Sunk costs may also be cognitive. This is most clearly seen in the equivocation of key environmental stakeholders in response to the evidence of direct and indirect risks of biofuels. The 2003 Biofuels Directive enjoyed support from a wide range of policy stakeholders. Until 2006, environmental NGOs, the agricultural lobby and the fuel industry endorsed biofuels as the best hope the transport sector had of making a meaningful contribution to greenhouse gas (GHG) emissions reductions (Footnote 5). Against the backdrop of this early enthusiasm, environmental NGOs found it difficult to adjust their initially positive stance and in the run-up to the RTFO’s implementation were noticeably unclear on how the government should respond. Such vacillation reflects the fact that many of these organizations were themselves struggling to weigh the risk tradeoffs. For example, the fact that agrifuels can be economically beneficial to local communities of the South led to considerable debate within Friends of the Earth (FoE) about its position and resulted in a compromise that they should not be condemned outright (Griffiths [FoE] in EAC 2008: Ev48). One effect of this was the tacit reinforcement of the government’s position that the RTFO should be implemented as per its design.

Learning by doing

The dilemma which all product or policy developers face is gauging when what they are making is ‘good enough’ to be released to the market or society. Rather than wait for perfection that may never be achieved, the conviction that something can be good enough is rooted in the belief that interaction with the world beyond, and adoption by others, will make a product or policy improve over time (Rosenberg 1982). Only after this process of maturation, when the appropriate standards for a product or activity have been identified, can the main protagonists look back and wish they had done things differently (Williamson 1993). The basis of this logic is the idea of experiential learning. Experiential learning, or learning by doing, is by far the most common form for humans (Mocker and Spear 1982). Such learning creates snowball effects, where the knowledge that is gained from how systems operate will increase the future effectiveness of those systems. This is the promise of future gains, where inefficiencies found in a policy or technology at its inception can be ironed out through implementation and iteration. When it comes to policy, belief in this promise serves to ‘lock-in’ decision-makers’ original goals. The conviction that the RTFO marked the start of an important learning curve is a strong theme in the government reports and interviews. Future decision-makers would use the experiential knowledge gained from its implementation to: inform later revisions of the RTFO; take a lead role in developing assurance and train of custody schemes on the international stage (DfT Senior Policy Officer interview; industry stakeholder interview); and boost the UK’s ability to exploit second, third and fourth generation biofuel technologies (Footnote 6).

The attachment to developing policy through experience, where the aim is to rationalise contrary evidence within the policy goal (and learning is single-loop), pervaded arguments about the establishment of C&S reporting. As evidence filtered into government about the deleterious potential of biofuels, and the actual levels of carbon savings they create, the fact that the RTFO was coming into force without legally enforceable C&S standards was controversial. Taking carbon savings first, the government was candid about having revised down its estimates, from an expectation in 2005 that 1 million tons per year would be saved by 2010 to 700,000 tons per year (Transport Minister in EAC 2008: Ev111). This uncertainty is linked to the fact that carbon calculation is an emerging area of science, too incomplete for levels to be linked to any fiscal rewards under the RTFO. Decision-makers’ response to this was to begin the process of developing a calculation methodology, able to differentiate between the different abatement costs of crops, to be road-tested through the reporting requirements before it was hard-wired into the RTFO in 2010. Their focus was not on more fundamental questions about the relatively high cost of CO2 reduction implied by biofuels.

On sustainability, especially problematic was that information on country of origin and land-use change could be recorded as ‘unknown’. Critics argued that inclusion of this category meant that the biofuels industry was not incentivised to behave sustainably, and that the data gleaned would be very weak (EAC 2008). Decision-makers’ expectation, however, was that unsustainable behaviour would be rare on two counts. First, it was argued that very little of the fuel produced and supplied into the UK market would come from land which had been deforested during 2006 and 2007, making an early UK contribution to deleterious effects unlikely (Archer, Low Carbon Vehicle Partnership [LCVP] in EAC 2008: Ev85). Second, extensive stakeholder consultation and piloting of the scheme suggested that the reporting mechanism offered a strong signal to industry to source biofuels that save the most carbon because these would be rewarded under the future mandatory scheme planned for 2011 (Footnote 7) (Furness, DfT Head of Biofuels in EAC 2008: Ev111). Thus here, the tacit knowledge (Polanyi 1967) that resulted from decision-makers’ relationships with fuel producers and observation of the importance of the shadow of the future in the market was viewed as providing a sufficient counter to emerging evidence of the possible countervailing risks created by biofuels. Similarly, the importance of learning by doing on data collection was emphasized as a necessity associated with the technology, and a virtue of the data capture targets set for the RFA (rising from 50% in the first year of the scheme to 90% in the third year). Over the first few years of the scheme, the challenge of passing data through the supply chain could be ironed out as those chains matured (Archer [LCVP] in EAC 2008: Ev85).

A further line of defence of the reporting arrangements centred upon them as a potential model for future mandatory international schemes to manage biofuels sustainability (Furness [DfT] in EAC 2008: Ev117). Here learning by doing was promoted as an important source of both political and economic advantage. The reporting requirements of the RTFO made it the most advanced national scheme for managing biofuels’ sustainability and carbon savings, and it was hoped that this would enable the UK to play an influential role in the development of such standards in the forthcoming EU Renewable Energy Directive (CEU 2008). Economically, UK fuel producers and suppliers believed that their detailed knowledge of the sustainability issues around biofuels and early commitment to a train of custody scheme would leave them well-placed to adjust quickly to the international standards that followed, and to claim first-mover advantage (Hyman [EIC] in EAC 2008: Ev26).

Learning by doing, and the belief that ‘innovation will spur further innovation’ (Pierson 2004: 24), is embedded in the argument that second generation biofuels made from non-food materials, thought to be more sustainable than the first generation, would only get off the ground if a developed market existed, making first generation biofuels an essential learning curve (Wenner, Renewable Energy Association [REA] in EAC 2008: Ev111). Warnings made in the 2006 Stern Report on Climate Change, about the UK’s previous hesitation to commit to renewable technologies, were also influential in the belief that innovations must be allowed to mature over time. Waiting for the perfect technology in the past explained the UK’s poor performance on renewables (Hilton [EIC] in EAC 2008: Ev26, DfT Senior Policy Officer interview), and on biofuels it was already a laggard when compared with its Western European neighbours (Bomb et al. 2007). In this way, conceptions of past failures and the need to learn from experience helped justify the way in which contrary evidence was rationalised and the RTFO portrayed as a necessary step on the road towards the UK claiming a commercial advantage in more promising and greener technologies. This ‘strategy of small losses’ (Sitkin 1992; see also Wildavsky 1988 on trial-and-error learning) was confirmed by the DfT Head of the Biofuels Programme, who was explicit that, in light of the emerging evidence of countervailing risks, the promise of the second generation fuels served as the main justification for enduring the costs of the first (Furness [DfT] in EAC 2008: Ev110–111).

The Gallagher Review similarly rejected calls for a moratorium on biofuels on the grounds that it would ‘reduce the ability of the biofuels industry to invest in new technologies … [and] … make it significantly more difficult for the potential of biofuels to be realised’ (RFA 2008: 66). What should be noted about the Gallagher intervention, however, is that while these options were rejected, the possibility of a moratorium or suspension was openly discussed, signalling the potential for deeper policy learning in government (RFA 2008: 65–66).

Coordination effects

Coordination effects occur when the benefits that an organization receives from an activity increase as others adopt the same behaviour. The benefits are increased and, importantly, the drawbacks reduced if they ‘fit’ with the activities of others (Pierson 2004: 25). This feature of positive feedback can be seen in the development of the RTFO in three particular respects: the increased investment in biofuels in the UK; the ‘fit’ with the approach of cross-national competitors, and the ‘shadow of hierarchy’ (Scharpf 1997) cast by both the EU and World Trade Organisation (WTO).

Coordination effects are enhanced where the development of a technology envelops other sectors, creating linked infrastructures. When externalities become networked in this way, the economic stakes increase exponentially, and lobbies in favour of a policy grow. The UK biofuels industry developed alongside the policy. When unfavourable evidence began to emerge and filter through via appraisal, this created huge disincentives for decision-makers to act in a way that might threaten both the biofuels industry itself and its linked infrastructure.

The use of generous fuel duty incentives in the UK mirrored action in Spain, the Netherlands and Sweden (DfT 2004: s6.5) and there is much evidence of cross-national lesson drawing in the development of biofuels policy in Europe. DfT officials worked particularly closely with their counterparts in the Netherlands and the DG Transport and Energy (DG Tren) of the European Commission (CEU), to explore the implications of the emerging evidence on biofuels’ negative impacts (DfT Senior Policy Officer interview; Greg Archer LCVP interview). Such mirroring of behaviour and close association can foster intersubjective understandings, where policy goals become validated and reinforced by peers.

Coordinative effects may also be enforced: the result of commitments made in the past or delegation of authority to hierarchy. The hierarchical dimension of political life is very important in the story of UK biofuels policy, where decisions were made and appraisals considered in the shadow of the EU and WTO. Taking the EU first, the UK was legally obliged to comply with the Biofuels Directive, and so adopted the indicative target for 2010 that 5.75% by energy content of transport fuel sales across the EU should be made up of biofuels. Decision-makers in the DfT were conscious throughout the development of the RTFO that they were against the clock and that infraction proceedings, which had been escaped in 2004 because of the promise of the RTFO, loomed large if the UK failed to meet its obligations (DfT Senior Policy Officer interview). Having taken so long to develop the RTFO, decision-makers viewed reaching that target as difficult enough, but further delay ‘would risk putting us fundamentally at odds with what the Directive requires’ (Furness [DfT] in EAC 2008: Ev116). This pressure was intensified further in March 2007 when the EU agreed the heroic target of 10% by 2020.

A further shadow of hierarchy informed the design of policy strategy. The belief that the main risk facing the RTFO was the potential for it to become ‘bogged down in WTO legal arguments for years and years’ was long-held by decision-makers and industry stakeholders (Archer [LCVP] in EAC 2008: Ev85). Accordingly, decision-makers rejected arguments that criteria being piloted by the Roundtable on Sustainable Palm Oil (RSPO) could serve as the basis for early mandatory sustainability standards, preferring instead to establish a C&S reporting regime which included the highly controversial ‘unknown’ category. It was argued that without this, the reporting arrangements could be considered a de facto barrier to trade and the scheme susceptible to challenge under WTO rules, because it was harder for countries of the South to provide evidence on the presence or absence of land-use change (E4tech 2005; DfT Policy Officer interview; Archer [LCVP] in EAC 2008: Ev85).

The hierarchical dimension of coordinative effects raises important issues about how decision-makers order risks and, specifically, which risks they classify as most hazardous. In this case, the risks of reforming the RTFO in a manner which contravened either EU or WTO obligations were seen as of a much higher order of magnitude than the UK’s potential contribution to the deleterious impacts of biofuels. Thus, though the UK could have reduced targets in the original formulation of the RTFO, it chose not to. And, while it was free to impose standards unilaterally, the preference was that this should happen Europe-wide. The benefits of coordination meant that the European Commission would shoulder the risk, and be liable for any challenge if any of the standards set were believed to be incompatible with WTO rules (Furness [DfT] in EAC 2008: Ev122).

Gallagher’s intervention, and the government’s response to it, signalled a change in tone regarding how deferential decision-makers were to the targets imposed from above. Specifically, the UK’s move to scale back its own targets and to push debate further in the EU on the suitability of the 10% by 2020 target suggests an openness to internal, if not radical, change that had not existed in the run-up to the RTFO’s implementation.

Adaptive expectations

Just as business organizations are under pressure to ‘pick the right horse’ (Pierson 2004: 24), decision-makers addressing urgent policy problems must set goals and select strategies that can command broad acceptance. Such decisions are made taking into account the best evidence available at the time. Once established, the positive expectations associated with a policy become self-fulfilling as they breed investment (notably economic, political and cognitive) which feeds back positively to the policy. In such circumstances, evidence that questions the wisdom of such extensive investment should be expected to meet substantial resistance. This was the case in biofuels. As one policymaker put it, had the full reach of the deleterious effects of biofuels been known at the outset, the UK would still have developed a policy to develop biofuels, but it would probably not have been an obligation-based one (DfT Senior Policy Officer interview). By 2007, as the signals of countervailing risks intensified, it was thought to be ‘too late’ for the UK to reconsider. The political, material and cognitive costs of policy suspension, let alone termination or reversal, were simply too high.

The collective nature of politics is important to how expectations about a policy develop and are reproduced: actors change their actions in light of expectations about how others will act (Pierson 2004: 25, 33). EU targets, rather than independent market demand, were the impetus for UK biofuels policy. This left the DfT needing to foster the development of an industry as well as a policy (DfT Policy Officer interview). In the early days of policy development, the DfT worked hard to bring fuel stakeholders on board. It was argued that the sector’s responsibility for a quarter of UK GHG emissions and the dearth of renewable technologies from which to choose meant the transport sector had to embrace the best technology on offer. In 2003, this was biofuels. As the RTFO developed, so the renewable fuel lobby became more established and united, and industry behaviour changed. While the DfT was far from captured by these actors, they did represent an important source of institutional friction (Olson 1981). This made it unlikely that policy appraisal evidence pointing to reduced GHG emissions savings, and harmful effects of biofuels, would precipitate dramatic policy change. Industry had contributed significantly to the design of the RTFO, and invested heavily in changing its practices, in readiness for its implementation (Hyman [EIC] in EAC 2008: Ev21). The industry’s political authority was arguably enhanced by the fragmented and uncertain response of the environmental stakeholders and made the RTFO’s passage inevitable (see “Large set-up costs” section).

Decision-makers’ expectations were also influenced by the ways in which other governments were responding to the evidence on biofuels. This links to the intersubjective understandings that are fostered by policy officers discussing how to address the unintended consequences of biofuels with their contemporaries in other states (see “Coordination effects” section). It also has an economic dimension. The economic returns around biofuels would still have increased even if the UK had abandoned the RTFO entirely. Decision-makers and industry stakeholders were especially conscious that schemes already set up in the Netherlands and Germany were less stringent than the proposed RTFO (Wenner [REA] in EAC 2008: Ev23–24, National Farmers’ Union [NFU] in EAC 2008: Ev67), and that if UK standards were set too high this could stymie the growth of the industry, and hand a competitive advantage to another country.

Post-Gallagher, decision-makers’ interpretation of the flexibility of the targets changed. The review convinced decision-makers that they could revisit and adjust their targets, because the weight of evidence was such that their European partners would make similar moves. While the slowdown has been criticised both as too modest and as sending out the wrong signal to the nascent industry, in terms of learning it is symptomatic of freer thinking and a broader understanding of choice than was in evidence pre-Gallagher.

Conclusions

This paper is concerned with the analysis of policy appraisal systems and, in particular, the depth of learning they can stimulate in relation to complex and urgent policy problems. Analysis suggests the usefulness of accounts that attend to the temporal tensions that exist between policy and knowledge development. The case study findings illustrate the proposition that, where policy and knowledge development timetables are out of sync, existing technical, procedural and cognitive rules of the game can condition the interpretation of findings from the policy appraisals, in ways that inhibit deep learning. Evidence thrown up by appraisals on countervailing risks can be too conjectural, or unclear, to force decision-makers to reconsider the premises on which policy is based, and engage in deep forms of learning, in the time available to them. The biofuels case is underscored by the sense that the appearance of evidence lagged too far behind policy development to trigger any fundamental re-thinking.

What have we learned about the relative importance of each of the four feedback mechanisms? In this case, two orders of feedback existed. The first order is the coordinative effects of the multilevel and hierarchical context, within which UK biofuels policy was developed, which created particularly intense feedback. The shadows of hierarchy cast by the EU and, to a lesser extent the WTO, conditioned decision-makers’ understandings of ‘the boundaries of the possible’ (Majone 1989) on biofuels. The result was a context favourable to second order mechanisms that operated at the domestic level. In response to EU pressure, and anticipated WTO sanctions, significant costs were sunk into biofuels resulting in resource distributions that reinforced a bias towards adjustive or ‘single-loop’ learning processes. Dissonant information was rationalised away, with the promise of ‘learning by doing’, and the perception that it was ‘too late’ to reconsider became policymakers’ accepted mantra.

That two orders of feedback were identified, operating at two levels of decision-making, has significance beyond the biofuels case. Action on climate change needs to be coordinated at the supranational level. However, the biofuels example illustrates that one of the risks of such collective action is the inability of states to engage fully with the results of the policy appraisals they conduct. Attenuating this risk is further complicated by the speed with which path dependent processes appear able to become established around the governance of new sustainability technologies. These concerns must, of course, be tempered by the fact that this case, and indeed climate change governance as a whole, is very much a moving target. It is quite conceivable that decision-makers involved in initiatives such as the RTFO will apply lessons learned in this instance to future iterations of biofuels policy, and to similarly complex technologies.

The value of using analytical insights from NIE to explain how appraisal evidence was interpreted is that it offers a political account focussed on the behaviour of the decision-makers at the heart of policymaking. This eschews functional arguments that assume a level of rationality that simply does not exist when the issues at stake are complex, knowledge-dense and urgent (Pierson 2004: 46). That decision-makers’ interpretations are mediated by paths they do not entirely choose or control, reducing their ability and desire to engage in deep learning, does not mean however that the outlook for appraisal is bleak. Recall Weiss’s (1987: 48) famous advice to evaluation researchers not to be overwhelmed by knowledge of political constraints, but rather to treat them as ‘a precondition for useable evaluation research’. The aim here is the same. The main useable insight generated into the policy and politics of policy appraisal concerns the measures that can be taken to enable decision-makers to learn how to engage in different depths of learning. The biofuels case highlights both an additional appraisal procedure and a government actor that may help facilitate such ‘deutero-learning’ (Argyris and Schön 1974, 1978).

The first is that deeper learning may result from reviews of policy appraisal conducted by ‘knowledge brokers’ (Litfin 1994; Sabatier 1988) located beyond the immediate circle of government. The biofuels case brings into relief the confusion that appraisal processes may create, and illustrates that policy appraisal does not always result in consensus or coincide with a period of normal science. Commissioning research, and inviting views, on the RTFO uncovered a wealth of uncertainties. However, while learning throughout was single-loop, important differences in the style of government learning between the first and second phase of appraisal were detected. These suggest that appraisals that are conducted in the public eye and beyond the immediate circle of government may enable moves towards enhanced learning. In the absence of any consensus as a North Star with which decision-makers can orient themselves to the epistemic constellations around biofuels, Gallagher’s intervention allowed them to step back from the issue and reflect upon the interpretations that had become locked-in during the RTFO’s development. These small changes in tone may appear to be but trifles, but their importance is potentially huge. Following the path dependence logic, once established, policies are difficult to change. Gallagher-style reviews conducted by ‘critical friends’, trusted by government, represent an additional appraisal form that may help decision-makers make tentative steps off sub-optimal paths.

The need to have a second appraisal should not be taken as evidence that the first phase was ineffective. On the contrary, the biofuels case illustrates that the endarkened state that existed by 2007 represented an opportunity as much as a threat to policy. The wider reflection, and enhanced learning, that resulted from the Gallagher Review would not have been possible without the confusion generated by the earlier appraisal processes.

The second practical insight concerns the question of who is best placed to trigger such reflective processes. Enhanced types of learning are costly: while positive feedback allows inefficient policies to survive, the disruptive nature of double-loop learning means that it cannot be encouraged in all cases where the consequence appears to jar with the objective. In the biofuels case, Chief Scientific Advisers (CSAs) within government departments emerged as important catalysts for the Gallagher Review. The role of these actors, and their interventions in policy appraisal processes, warrants further research. Their unique professional position, spanning the boundary between science and politics, may give them the right blend of epistemic credibility and political authority for their advice to be trusted on when model II learning should be initiated.