We must rely more on long-term, ecosystem-scale experiments and real case histories, and less on small-scale experimentation.

−David Schindler (2012)

The Modern Challenge

Since the beginning of the Anthropocene, Homo sapiens has transformed the planet’s ecosystems via exploitation, deforestation, pollution, eutrophication and climate change (Estes and others 2011; Ehrlich and Ehrlich 2013). Not only have we eradicated apex predators in many parts of the world (Myers and others 2007; Atwood and others 2015), but we are a uniquely voracious “hyperkeystone” species and super-predator, consuming adult prey at rates up to 14 times those of other predators (Darimont and others 2015; Worm and Paine 2016). Humanity failed to anticipate many of the ecological problems that it has caused, including the depletion of the ozone layer and bio-accumulation of DDT (Kates and Clark 1996). It has been argued that this lack of foresight was partly due to an overreliance on small-scale, short-term experiments that failed to capture the complex nature of large-scale perturbations and net ecosystem responses (Carpenter 1989, 1990; Pace and others 1998; Schindler 1998; Table 1). Although small-scale, controlled studies provide a useful means to test hypotheses, there is an urgent need to replicate findings at the level of ecosystems to better anticipate ecological “surprises” (Knowlton 1992; Clark and Gelfand 2006; Miao and Carstenn 2006; Chave 2013). However, a significant challenge remains: how does one robustly test hypotheses at large scales?

Table 1 Contrasting Outcomes from Small- and Large-scale Experiments

The Modern Solution?

The solution to this problem may lie in ecologists making greater use of large-scale but unreplicated “natural experiments” (LUNEs). LUNEs have been generated by natural phenomena such as El Niño, hurricanes and tsunamis (Kessler 2011; Zoback and Gorelick 2012). However, these “accidental” experiments (sensu Hillerislambers and others 2013) are also a by-product of scientific and economic activities, including fisheries, the fur trade, large-scale infrastructure projects such as the Panama Canal, carbon capture and storage projects and catastrophic oil spills such as Deepwater Horizon (Estes and Palmisano 1974; Smith and others 2004; Madin and others 2010). LUNEs, regardless of their origin, can provide unique insights into ecosystem-level processes that complement the findings of classical laboratory and field experiments (Hillerislambers and others 2013).

However, LUNEs constitute a Faustian bargain: in exchange for rare insights into net ecosystem responses, the ecologist may have to sacrifice treatment replication (Hargrove and Pickering 1992; Johnson 2006). Replication has traditionally been considered a prerequisite for extrapolating results to other systems and, where manipulations are included, for inferring cause (Fisher 1926). Yet, the replication of LUNEs is typically challenging for ethical and practical reasons (how does one mimic a tsunami or build a life-size replica of the Panama Canal?). In the deepest sense, ‘fully controlled’ replication of any natural experiment is impossible, because the initial conditions have not been fully described. Furthermore, the “treatments” that create LUNEs are often unique to a single location (Johnson 2002), and even if true replicates exist, the number required to preclude a Type II error may be prohibitively high due to the variability inherent in ecosystems (Carpenter 1989).
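The scaling behind this last point can be illustrated with a standard power calculation. The sketch below is a minimal example, assuming a two-sided, two-sample z-test with equal variances; the function name and the default values for alpha and power are hypothetical choices, not taken from the studies cited above. It shows that the number of replicates per treatment grows with the square of the ratio of ecosystem variability to effect size.

```python
import math
from statistics import NormalDist


def replicates_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate replicates needed per treatment (two-sided, two-sample z-test).

    effect: smallest difference in means worth detecting (assumed value)
    sd:     among-ecosystem standard deviation (assumed equal in both groups)
    power:  1 minus the accepted Type II error rate
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for Type I error
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


# Tripling the variability relative to the effect multiplies the requirement ninefold.
low_variability = replicates_per_group(effect=1.0, sd=1.0)   # 16 ecosystems per treatment
high_variability = replicates_per_group(effect=1.0, sd=3.0)  # 142 ecosystems per treatment
```

Under these assumptions, a design that is feasible for mesocosms quickly becomes unattainable for whole lakes, reefs or catchments.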

LUNEs may nonetheless contribute a modern solution to the current “replication crisis” affecting science (Stroebe and Strack 2014; Nosek and others 2015). Relatively few attempts are typically made to replicate research findings and a disconcerting proportion of studies have turned out to be irreproducible (McNutt 2014). In response, some critics have sought to redefine the role of replication in science, arguing that exact replications should no longer be treated as all-powerful, “one-off” confirmations or rejections of a theory (Stroebe and Strack 2014). Instead, scientists should prioritise “conceptual” replications that test the real-world relevance of theories in diverse field contexts (Hüffmeier and others 2016). LUNEs offer a way to conduct such conceptual replications, as exemplified recently by Atwood and others (2015).

The Powers of LUNEs

LUNEs have led to new insights into a range of ecological issues, including the role of apex predators and prey within ecosystems, the effectiveness of ocean “geo-engineering”, the impact of climate change on marine communities, habitat fragmentation and eutrophication (Carpenter 1990; Vanni and others 1990; Rowan and others 1997; Kessler 2011; Gabric and others 2015). Yet, LUNEs remain an under-utilised minority in the scientific literature relative to classical experiments, that is, small-scale, controlled and replicated studies (Hillerislambers and others 2013). Ecologists and journal referees largely continue to view the latter as the scientific “gold standard” (Grossman and Mackenzie 2005), even though some of the most influential, manipulative field experiments in the history of ecology lacked replication (Schindler and others 1971, 2008; Raffaelli and Moller 1999). As a result, few LUNEs are conducted in the first place and fewer still are accepted for publication. However, in the context of emerging environmental challenges, LUNEs are uniquely positioned to answer challenging, time-sensitive ecological questions (Worm and Paine 2016).

Power 1: Testing Logistically and/or Ethically Challenging Hypotheses

In rare circumstances, scientists have conducted controlled and randomised experiments at large spatial scales (Naeem and others 1994; Ewers and others 2011). However, in most cases, conducting classical experiments at large spatial scales is prohibitively expensive, time-consuming, labour-intensive or unethical. As a result, LUNEs provide one of the very few means available to scientists to test hypotheses at ecologically relevant scales (Carpenter 1990; Hargrove and Pickering 1992). For example, the construction of the Panama Canal created a unique opportunity to investigate competitive exclusion in previously isolated communities of freshwater fishes (Smith and others 2004). The Deepwater Horizon oil spill provided Kessler (2011) with a unique and otherwise unobtainable simulation of the effects of climate change on subsea deposits of methane hydrates. Similarly, the invasions of Guam by brown tree snakes Boiga irregularis and Christmas Island by yellow crazy ants Anoplolepis gracilipes yielded insights into the ecological roles of keystone prey species that were unreplicable due to the ethical challenges around reducing the abundance of protected, endemic species (O’Dowd and others 2003; Rogers and others 2012).

Power 2: Systems That Have No Replicates

In many cases, replicating a LUNE is not only ethically unconscionable but also impossible because the system is unique (Power and others 1998; Schindler 1998). For example, it is impossible to replicate natural experiments that explore the effect of invasive species on fauna endemic to specific islands or locations (O’Dowd and others 2003; Rogers and others 2012). Similarly, true replicates for LUNEs that examine lake and river processes remain elusive due to differences between water bodies in species composition, water chemistry and other factors (but see Schindler and others (1978) for statistical/partial mitigation measures). To circumvent this problem, studies may compare upstream, “control” stretches of river to downstream, “treatment” stretches that have been experimentally manipulated (Fraser and Gilliam 1987; Hildrew and others 2004).

The task of finding true replicates for natural experiments conducted on coral reefs can also be daunting. Ruppert and others (2013) compared two reef systems in northwestern Australia that were similar with respect to disturbance history, productivity, habitat structure and a number of other characteristics, the primary difference being that one of the systems had a history of targeted shark exploitation. This study generated valuable evidence consistent with the hypothesis that depletion of sharks may lead to trophic cascades on coral reefs. However, it is difficult to imagine an exact “replicate” pair of reef systems due to the unique disturbance histories of coral reefs, the scarcity of pristine sites and the nature of fishing, which typically leads to the removal of not just sharks but also their prey (Pauly and others 1998).

Power 3: Meta-Analyses

Given the challenges of replicating LUNEs, the gradual accumulation of independent, unreplicated studies may be the only way to gather evidence at the ecosystem level (Carpenter and others 1995; Johnson 2002, 2006). Like lawyers, ecologists can then “build a case” in favour of or against hypotheses (McArdle 1996). Ultimately, such studies can fuel meta-analyses in which drivers common to several systems become more evident when the studies are considered together (Cottenie and Meester 2003; Worm and Paine 2016). Although meta-analyses based on unreplicated studies have been criticised (Hurlbert 2004), they have provided insights into a range of important ecological issues including trophic cascades and overfishing (Micheli 1999; Shurin and others 2002; Prevedello and others 2013).
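One simple way to “build a case” from unreplicated studies can be sketched as follows. This is a minimal illustration rather than the procedure of any study cited above: each hypothetical study contributes one treatment value and one control value, the effect is expressed as a log response ratio, and the studies themselves serve as the replicates. Because unreplicated studies provide no within-study variance, the summary is unweighted, and the exact sign test asks only whether the direction of the effect is consistent. All data values and function names are invented for illustration.

```python
import math
from statistics import mean, stdev

# Hypothetical (treatment, control) values, one pair per unreplicated study.
studies = [(12.0, 8.0), (5.0, 4.0), (30.0, 15.0), (9.0, 10.0), (7.0, 3.5), (4.4, 4.0)]


def log_response_ratios(pairs):
    """Effect size per study: ln(treatment / control)."""
    return [math.log(t / c) for t, c in pairs]


def summarise(effects):
    """Unweighted mean effect and its standard error across studies."""
    return mean(effects), stdev(effects) / math.sqrt(len(effects))


def sign_test_p(effects):
    """Two-sided exact sign test: is the direction of the effect consistent?"""
    nonzero = [e for e in effects if e != 0]
    n = len(nonzero)
    k = sum(e > 0 for e in nonzero)
    tail = min(k, n - k)
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


effects = log_response_ratios(studies)
mean_effect, standard_error = summarise(effects)
p_value = sign_test_p(effects)  # five of six studies agree in direction: p = 0.21875
```

With only six studies, even near-unanimous agreement in direction does not reach conventional significance, which illustrates why a sufficient number of LUNEs must accumulate before a meta-analysis becomes informative.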

Meta-analyses of LUNEs have played a particularly important role in demonstrating the effectiveness of marine reserves (Edgar and others 2014). Many reserves are established precisely because a site is unique or contains endemic fauna (Allison and others 1998), precluding replication. Before-after-control-impact (BACI) analyses, in which samples are collected from “control” and “treatment” sites before and after a reserve is established, have played a key role in circumventing this problem (Bence and others 1996). However, relatively few reserves are subject to long-term monitoring programmes. Moreover, BACI analyses may be compromised by temporal autocorrelation among successive samples (Stewart-Oaten and others 1986). Meta-analyses of unreplicated studies can provide a solution, but considerable time will be required to amass a sufficient number of studies to “fuel” the analysis, particularly if there is a publication bias towards replicated studies (Lester and Halpern 2008).
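The logic of a BACI comparison can be expressed as a difference of differences. The sketch below is a minimal illustration with invented survey values and a hypothetical function name: the change at the control site is subtracted from the change at the treatment site, so that regional trends affecting both sites cancel out. It does not address the temporal autocorrelation noted above, which would require a model of the dependence among successive samples.

```python
from statistics import mean


def baci_effect(control_before, control_after, impact_before, impact_after):
    """Change at the treatment site minus the change at the control site."""
    change_at_impact = mean(impact_after) - mean(impact_before)
    change_at_control = mean(control_after) - mean(control_before)
    return change_at_impact - change_at_control


# Hypothetical fish biomass surveys (arbitrary units) around reserve establishment.
effect = baci_effect(
    control_before=[10.0, 12.0, 11.0],
    control_after=[12.0, 12.0, 12.0],
    impact_before=[9.0, 11.0, 10.0],
    impact_after=[18.0, 20.0, 19.0],
)
# Biomass rose by 9 inside the reserve and by 1 at the control site: effect = 8.
```

A positive value is consistent with a reserve effect, but with a single control and a single treatment site it cannot by itself exclude site-specific explanations.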

Power 4: Ecological Gradients

LUNEs provide a particularly powerful tool when they operate across an ecological gradient. This approach allows a regression model to be fitted to the data and predictions to be made about how one variable changes in response to another, and it may identify “tipping points” within ecosystem function (Lennon 2011). Indeed, Kreyling and others (2014) recommend that such regression analyses be used to develop simulation models, which, in turn, can be validated by further experiments and ultimately generate data that are more informative about ecosystem processes than those produced by small-scale, manipulative experiments. Regression analyses based on LUNEs have proven to be particularly useful in exploring the role of large piscivores on coral reef systems. For example, a LUNE created by a gradient in fishing pressure along the Northern Line Islands in the central Pacific has been used to study the relationships between predator biomass and prey biomass, behaviour and condition, providing important validations of hypotheses generated at smaller scales (Madin and others 2010; Walsh and others 2012).
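A gradient analysis of this kind can be sketched in a few lines. The example below is illustrative only: the data are invented, the function names are hypothetical, and the breakpoint search (fitting two separate least-squares lines and choosing the split with the smallest combined residual error) is one simple way of locating a candidate tipping point, not the method used by the studies cited above.

```python
from statistics import mean


def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def sse(x, y, slope, intercept):
    """Sum of squared residuals around a fitted line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def best_breakpoint(x, y, min_points=3):
    """Gradient value at which two separate lines fit better than any other split."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        left = sse(xs[:k], ys[:k], *ols(xs[:k], ys[:k]))
        right = sse(xs[k:], ys[k:], *ols(xs[k:], ys[k:]))
        if best is None or left + right < best[0]:
            best = (left + right, (xs[k - 1] + xs[k]) / 2)
    return best[1]


# Hypothetical gradient: predator biomass is stable under light fishing, then declines.
fishing_pressure = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
predator_biomass = [10.0, 10.0, 10.0, 10.0, 10.0, 9.0, 7.0, 5.0, 3.0, 1.0]

overall_slope, overall_intercept = ols(fishing_pressure, predator_biomass)
tipping_point = best_breakpoint(fishing_pressure, predator_biomass)  # 5.5
```

In real data, a candidate breakpoint should be compared against a single-line model before it is interpreted as a tipping point, because a two-line model always fits at least as well.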

Power 5: Qualitative Arguments

Some have argued that the most robust LUNEs use evidence-based arguments rather than inferential statistics to describe ecological patterns. For example, even though Likens and others’ (1970) large-scale deforestation experiment at Hubbard Brook was unreplicated, it met with approval from critics because it “convincingly demonstrated the effects of the experimental variables, without resorting to inferential statistical tests that would have been inappropriate” (Hurlbert 1984). Similarly, of the more than nine large-scale, unreplicated iron and phosphorus enrichment ocean-based experiments that have been conducted globally, only two used inferential statistics, drawing criticism from Hale and Rivkin (2007). Moreover, LUNEs provide a powerful test of hypotheses when the effect is consistent with a priori hypotheses and its magnitude is ecologically meaningful (Stewart-Oaten and others 1992; Moss and others 1996). For instance, when there is a single perturbation in an otherwise unmanipulated, well-described lake, it is possible to infer causality (Schindler, D.W., pers. comm.).

The Pitfalls of LUNEs

Although LUNEs offer unique insights into ecological processes, they are fallible and must be replicated in a range of systems before becoming the basis for changes in policy. However, it is increasingly common for the results of unreplicated studies to be treated as definitive (an “ill-informed” strategy; Ioannidis 2005). In contrast, we would argue that ecologists who conduct LUNEs should present their results as either “consistent” or “inconsistent” with the tested hypothesis, while thoroughly exploring and ranking alternative explanations. In addition, it is important that other scientists and the public be aware of the limitations of LUNEs and view the results in the context of other studies.

A cautionary example of the pitfalls associated with the interpretation of LUNEs is provided by the recent critique of Myers and others (2007), a study that linked the collapse of a scallop fishery in the Atlantic Ocean to declines in large sharks due to overfishing. This LUNE led not only to changes in fishing policy but also to a vigorous campaign to reduce numbers of the cownose ray Rhinoptera bonasus, a slow-growing species with poor resilience to fishing. Recently, however, the findings were questioned (Grubbs and others 2016). We would argue that both studies, despite their seemingly contradictory results, are crucial steps in the accumulation of the large-scale evidence needed to test hypotheses about ecological processes. Indeed, Worm and Paine (2016) recently argued that the findings of Myers and others (2007) and Grubbs and others (2016) may be reconciled by taking into account non-trophic and/or behavioural factors.

Concluding Remarks

Despite their lack of replication, LUNEs have a unique power, not attainable in any other way, to test hypotheses at large scales and in complex systems. Other disciplines routinely rely on unreplicated events: cosmologists get by with just the one Big Bang (Gadgil and Bossert 1970), seismologists rely on seismic waves generated by unpredictable earthquakes to constrain deep Earth structures (Ritsema and Van Heijst 2000) and the single submarine slump that entombed unique Cambrian fauna allows the Burgess Shale to be a key calibrant of evolutionary radiation models (Morris 1989). Of course, there are limitations to what conclusions can be drawn from LUNEs and the findings must be viewed in the context of classical experiments, models and long-term monitoring programmes. However, as noted by Lennon (2011), “unreplicated results do not equal lies, just as replicated results do not equal truth”. The potential pitfalls introduced by diminished control can be mitigated by a willingness to explore and rank alternative interpretations, in addition to the appropriate use of statistical tests. In light of the recent “replication crisis”, LUNEs also provide a way to conceptually replicate the findings of smaller-scale, controlled experiments.

Demonstration that human activities are altering ecological processes is often necessary for policy changes to occur (Morrisette 1989). In cases where rapid action is required to avert environmental disaster, we would therefore argue that LUNEs and other large-scale studies should be prioritised (Figure 1). An illustration of this point is the dramatic decline in water quality in rivers and lakes in North America in the 1960s. Whereas small-scale experiments implicated nitrogen and carbon, David Schindler’s large-scale, unreplicated and manipulative experiments on lakes eventually persuaded the Canadian government and several US states to ban phosphorus (Schindler 1998). Similarly, the discovery of a hole in the ozone layer precipitated a global consensus to ban chlorofluorocarbons, even though evidence based on small-scale lab experiments and modelling had existed for almost 20 years (Morrisette 1989). These examples suggest that when an environmental threat is immediate and significant, policy-makers will act on large-scale evidence and that these actions are largely appropriate. Our review suggests that we must urgently fast-track such LUNEs if we are to anticipate ecosystem-level feedbacks to contemporary perturbations.

Figure 1 A new approach to replication (adapted from Hüffmeier and others 2016).