Introduction

Pit lakes can form when open cut mining operations cease dewatering and the pit voids fill with net groundwater and/or surface water inflows. These novel lakes are not well understood at the ecosystem level, with site-specific characteristics peculiar to their regional and local contexts of climate, biota, hydrology/hydrogeology, and geology. Pit lake water quality and ecology can evolve independently of management actions or through either incidental or deliberate biological intervention and manipulation.

In some jurisdictions, regulators explicitly require experimental demonstration of pit lake sustainability and risk management as part of mining approvals (CEMA 2012; DMP and EPA 2015; Jones and McCullough 2011; Williams 2009). Alternatively, corporate or industry standards may promote principles of sustainability, such as maintaining regional, or even reclaiming lost, local values (APEC 2018; DIIS 2016; ICMM 2019; IRMA 2018). In parallel, water quality guidelines are generally moving toward demonstration of environmental responses to toxicants across more than one line of evidence, including scale (ANZG 2018).

Furthermore, leading practice advises a risk-based approach to managing pit lake mine closure legacies (DIIS 2016; Doupé and Lymbery 2005; McCullough and Van Etten 2011; Vandenberg and McCullough 2017). For example, closure objectives might stipulate that planned pit lakes will achieve acceptable water quality for release, for long-term presence in the regional environment, and the establishment of a self-sustaining ecosystem that provides either general or specific end use values (de Lange et al. 2018; McCullough et al. 2018, 2020; Vandenberg and McCullough 2017). If strategically planned, experiments can provide empirical data with which to validate predictions generated from numerical models, and can refine models for future full-scale validation. Finally, experiments may also enable tangible demonstration of management interventions, including different adaptive management strategy options to achieve sustainability (Nixdorf et al. 2010).

In addition to supporting full-scale design, numerical models informed by different scales of study can reveal knowledge gaps and experimental needs, and bridge findings from experiments conducted at different spatial and temporal scales. Using models to consolidate common understandings of pit lake system (PLS) research findings at different scales is useful for understanding complex real-life systems. In particular, numerical models provide a structure that allows researchers to quantitatively represent physical, chemical, and biological processes in a pit lake.

The use of studies at different scales provides a better understanding of processes of interest and delivers research outcomes in a more timely and economic manner than full-scale pit lake experimentation alone. Across all these scales, as spatial scale increases, relevance to the region’s planned, large-scale pit lakes at closure increases, whereas the ability to robustly address knowledge gaps across multiple design options decreases (Fig. 1). Additionally, both cost and research timelines increase as the scale of experimentation expands.

Fig. 1 Typical scales of study for pit lakes showing inverse relationship between relevance and replicability

Consideration of scale is fundamental to a pit lake experimental approach, given the large size of some pit lakes coupled with their long presence in the environment, and often the collective impact from the regional development of pit lake districts (McCullough and Van Etten 2011). However, pit lake planning and design is often undertaken based on findings from much smaller scale and duration experiments.

Most smaller-scale experiments involve isolation and manipulation of only a small part of the pit lake environment, for example, in replicate test tubes, bottles, columns, or other enclosures. This is not only a limitation of small-scale systems, but a feature of all systems when attempting to control for single variables. Experimental manipulations would typically consist of the addition or removal of expected aquatic organisms, addition of chemical amendments, or alterations of the fundamental physical environment, followed by incubation for various times. Results are then extrapolated to whole systems from these differently scaled studies. However, such extrapolation may be questionable when important physico-chemical features of the proposed systems and their communities are missing from the experiment (Schindler 1998) or when they are present, but do not vary according to anticipated ambient conditions. Therefore, smaller-scale experiments alone can yield erroneous conclusions about community and ecosystem processes (Carpenter 1996).

Conversely, whole-ecosystem experiments conducted at larger scales cannot be exactly replicated and are expensive and difficult to execute. As a result, many ecologists favour smaller scales in order to obtain statistical confidence in study results (Schindler 1998). The use of various scales raises the question of whether the balance between realism and replication implicit in the chosen experimental scale is adequate for the intended purpose. The critical consideration is: what is the fundamental research question, and how well does the chosen experimental scale answer it (Hurlbert 1984; Hurst and Pacey 2004)? Modelling may be used to combine results from different scales. Additionally, constructing a pilot-scale pit lake will directly integrate different research scales and yield empirical data. While both of these methods have inherent limitations to understanding the role of single or multiple interacting variables, a full-scale system that is constructed along with a numerical model provides compelling lines of evidence when developed in an integrated and iterative manner.

Most of the literature we have considered deals with spatial scale. However, temporal scale must also be considered when experimenting with and applying results from different spatial scales. While some processes are time invariant (or nearly so) across spatial scales, other processes will take more time to establish in larger systems, particularly those that will attain a state of equilibrium with ambient conditions, because the larger systems will be subject to a larger set of driving variables, as well as processes that require the establishment of biological communities at multiple trophic levels.

Although various enclosure experiments are conducted across a range of mine water issues, we constrained our review to enclosure experiments that particularly sought to better understand aspects of pit lake systems (PLS). Similarly, although many case studies are contained in consulting or industry-funded reports, we constrained our review to peer-reviewed and published literature. We reviewed pit lake literature from peer-reviewed conference and journal papers and theses, first determining what typically scaled PLS enclosures had been used for. We particularly sought case studies where scaled experiments had led to the realisation of a full-scale pit lake. Together with the pit lake and broader aquatic experimental literature, we describe what limitations and opportunities might be unique to each particular scale for future pit lake research. Finally, we advise a multi-scale consideration of pit lake research questions, contributing collectively to a multiple-lines-of-evidence (MLE) or a similar approach to understanding research findings from full-scale enclosure experiments.

Overview of Typical Experimental Scales

Pit lake research can be conducted in experimental systems spanning many orders of magnitude: from millilitres in a test tube to millions of litres in large enclosure experiments, and billions of litres in field-scale experimental lake systems, through to full-scale pit lakes either deliberately constructed for experimentation or resulting from mining activities. A range of terms is used in the literature to describe the various scales at which pit lake experimentation has been undertaken, with little consistency. Following our review findings, we define four main scales of enclosure that have been used for published pit lake research:

  • microcosms (up to tens of litres);

  • mesocosms (hundreds to thousands of litres);

  • macrocosms (experimental ponds, tens of thousands of litres); and,

  • pilot scale (millions of litres).
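The four volume classes above can be expressed as a simple lookup. The sketch below is illustrative only: the function name `classify_enclosure` and the numeric cut-offs (100 L, 10,000 L, and 1,000,000 L) are assumptions chosen to match the approximate ranges in the list, which does not define exact boundaries between scales.

```python
def classify_enclosure(volume_litres: float) -> str:
    """Return the enclosure scale for a volume in litres.

    The cut-offs are assumed for illustration; the review gives only
    approximate volume ranges for each scale.
    """
    if volume_litres <= 0:
        raise ValueError("volume must be positive")
    if volume_litres < 100:          # up to tens of litres
        return "microcosm"
    if volume_litres < 10_000:       # hundreds to thousands of litres
        return "mesocosm"
    if volume_litres < 1_000_000:    # tens of thousands of litres and above
        return "macrocosm"
    return "pilot scale"             # millions of litres


for volume in (0.05, 2_000, 50_000, 5_000_000):
    print(f"{volume:>12,.2f} L -> {classify_enclosure(volume)}")
```

Volumes between the listed ranges (for example, hundreds of thousands of litres) are assigned here to the macrocosm class, which is a judgment call rather than a definition from the literature.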

Microcosms

Microcosms are miniature constructed ecosystems in which environmental constraints are imposed primarily for the controlled study of ecological and geochemical processes (Drake and Kramer 2012). The two main types of microcosms are biological and geochemical microcosms. These types may overlap where an understanding of biological responses to chemistry is sought (Stierle and Stierle 2014). In particular, small laboratory-based containers may be used to determine dose–response relationships in toxicological studies, e.g. for pit lakes affected by acid and metalliferous drainage (AMD) (Neil 2008; Neil et al. 2009; Stierle et al. 2006) or by other mine waters, such as tailings (Dompierre et al. 2016).

Geochemistry microcosms are used in both static and kinetic testing programs (ASTM 2013). Tests can be performed in the laboratory or in the field. Laboratory tests are designed to standardise reaction rates relative to field conditions e.g. remediation experiments, whereas field-scale tests are performed to confirm that the results of the laboratory tests are representative of reaction rates in site conditions.

Sometimes called ‘bottle’ experiments, biological microcosms are small volume containers of lake water suitable for replicating a statistically more powerful number of samples for each treatment factor. Microcosm experiments investigating ecological processes may operate with artificial communities assembled from cultures, such as single-species experiments in batch and continuous cultures (Neil 2008). Biological microcosms may also include lake sediments, tailings, natural microbial assemblages, and chemical amendments. Conditions such as oxygen and redox may be artificially controlled to replicate one component of a pit lake, such as the tailings-water interface. Examples of biological microcosms are shown in Fig. 2.

Fig. 2 Microcosm experiment studying pit lake biogeochemistry and the effect of two different substrates at two different loadings, including interaction effects and with a control (left) (McCullough and Lund 2011)

In the water treatment industry, bench-scale tests are an essential step toward developing a pilot and then full-scale treatment system (Tchobanoglous and Burton 1991). Variables such as dosing rates and reaction times are determined from theoretically derived rates that are varied over a range of expected values. Owing to the limitations of small-scale tests mentioned above, the rates from bench-scale tests are considered approximate, but provide reasonable starting points for setting up the pilot system, which can undergo further testing and optimization. Bench-scale tests are often carried out on sample sizes of one to a few litres.

Because of their small size, microcosms are typically maintained in a laboratory facility to control ambient conditions. This controlled environment can reduce the need for replication. Microcosms may even be deployed within a pit lake from floating structures or from jetties to more accurately provide realistic ambient conditions (Larratt et al. 2007). Microcosms provide for greater statistical power with which to experimentally test the effect of independent variables on pit lake waters and substrates. Large numbers of microcosms can be incorporated into experimental designs. Time scales for the maximum duration of an experiment are generally in the order of hours to months, with the lower limit for chemical reactions, intermediate times for biotic reactions, and the upper limit for biogeochemical reactions.

Microcosms in Pit Lake Research

Microcosms can focus on fundamental pit lake processes primarily influenced by geochemical and microbiological processes. Physical mixing in the natural environment will be either accounted for, with accompanying assumptions and limitations clearly recognised, or excluded from these smaller scale tests e.g. the unexpectedly high performance of AMD remediation of the Berkeley pit lake (Gammons and Icopini 2019; Tucci and Gammons 2015). Similarly, ecological community processes other than short-term primary production experiments should be restricted to larger experimentation scales.

Microcosm-Scale Opportunities

Research using microcosms persists, despite their limitations, because their smaller size confers advantages that often take precedence over their shortcomings (Gamble 1990). For example, the highest numbers of replicates and controls can be achieved in the smallest enclosures, affording strong statistical power (Stewart-Oaten 1995). An important opportunity afforded by smaller-scale tests is the ability to design tests that can differentiate biotic from abiotic processes (Chen et al. 2013), which is useful in designing adaptive management strategies (e.g., in-pit subaqueous waste disposal; Lapakko et al. 2013).
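The link between replicate number and statistical power can be illustrated with a short calculation. The sketch below is not drawn from the cited studies: it assumes a two-sided, two-sample z-test with a known, equal standard deviation in both treatments, and the function name `two_sample_power` and the standardised effect size of 1.0 are hypothetical. Designs with few replicates would normally use a t-test, for which power is somewhat lower.

```python
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the difference in treatment means divided by the
    common standard deviation (assumed known and equal in both groups).
    """
    if n_per_group < 1:
        raise ValueError("need at least one replicate per group")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative hypothesis.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of rejecting in either tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for n in (3, 6, 12, 24):
    print(f"n = {n:>2} per treatment: power = {two_sample_power(1.0, n):.2f}")
```

Under these assumptions, power rises steadily with the number of replicates per treatment, which is the practical advantage of enclosures small enough to be run in large numbers.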

Microcosms enable reasonable exploration of fundamental pit lake biogeochemical and microbiological processes (Drake and Kramer 2012), even those involving more complex interactions with climate such as water chemistry and sunlight exposure (Friese et al. 2002). Laboratory-based microcosms can be used to assess the dynamics of algal and other microbial populations and simple food webs over multiple generations of their communities (Fyson et al. 1998a; Read et al. 2009). In particular, numerous microcosm studies have been successfully used to interpret the influence of chemotrophic bacterial communities (Bozau et al. 2007; Frömmichen et al. 2004; Fyson et al. 2006; Geller et al. 2009; Koschorreck 2011; Kumar et al. 2011a, c, 2013; McCullough and Lund 2011; McCullough et al. 2006; Read et al. 2009; Wendt-Potthoff et al. 2010) and algal communities (Corzo et al. 2018; Fyson et al. 2003; Kumar et al. 2011b, 2016) on pit lake metalliferous geochemistry and water quality, as well as anaerobic biodegradation of recalcitrant hydrocarbons in oil sands pit lakes (Chen et al. 2013; Chi Fru et al. 2013; Siddique et al. 2011, 2014a, b, 2015).

Microcosm-Scale Limitations

Reasonably realistic microcosms can often simulate many fundamental responses of entire natural ecosystems (Buikema and Voshell 1993). The applicability of results from microcosm studies to nature depends on realistic imitation, particularly of the interaction of species and environmental variables. Unless they can be adequately designed to mimic major ecosystem processes and community compositions, smaller-scale experiments can give highly replicable and statistically powerful, but spurious, answers (Schindler 1998). Therefore, if the relevant and intrinsic limitations of their scale as an experimental tool are not carefully considered, conclusions drawn from microcosm studies can be likened to the “right answer to the wrong question”, namely, when the scale or other complications make the microcosm unable to test the hypothesis to the level of rigour required.

Within microcosms, a lack of habitat variation, the high ratio of surface area to volume, and the microcosm container (i.e. “wall effects”) can lead to challenges in scaling up microcosm results to the full-scale environment. In particular, the interaction between habitat size and food abundance is consequential to aquatic animals and choice of scale in experiments may affect results (Wynn and Paradise 2001).
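The surface-area-to-volume effect can be made concrete with simple geometry. The sketch below assumes an upright cylindrical container filled to a given depth and computes the wetted container area (side wall plus floor) per unit water volume; the function name `wall_area_to_volume` and all example dimensions are hypothetical and are not taken from the cited studies.

```python
import math


def wall_area_to_volume(radius_m: float, depth_m: float) -> float:
    """Wetted container area (side wall plus floor) per unit water volume,
    in 1/m, for an upright cylinder filled to depth_m."""
    if radius_m <= 0 or depth_m <= 0:
        raise ValueError("dimensions must be positive")
    wetted_area = 2 * math.pi * radius_m * depth_m + math.pi * radius_m ** 2
    volume = math.pi * radius_m ** 2 * depth_m
    return wetted_area / volume  # simplifies to 2 / radius + 1 / depth


# Hypothetical (radius, depth) pairs in metres for each enclosure scale.
examples = {
    "bottle microcosm": (0.05, 0.20),
    "tank mesocosm": (0.80, 1.00),
    "pond macrocosm": (3.00, 2.00),
}
for name, (radius, depth) in examples.items():
    print(f"{name}: {wall_area_to_volume(radius, depth):.1f} m^-1")
```

Because the ratio scales as 2/radius + 1/depth, each water parcel in a small bottle is exposed to far more container surface than in a tank or pond, which is one reason wall effects weigh more heavily at the microcosm scale.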

Other limitations of microcosms are:

  • the small volumes can limit the number of samples that can be analysed during the incubation time;

  • the limited ability to prepare true replicates of small volumes if the sample itself is inherently heterogeneous (e.g. a stratified material used as backfill, such as tailings); and,

  • the inability to directly answer research questions relating to larger organisms or larger physical processes.

Mesocosms

A variety of experimental systems are described under the umbrella term “mesocosm” (Stewart et al. 2013). Mesocosms are medium-sized experimental enclosures of larger volume than microcosms (Odum 1984). They are generally stored either indoors in cooler latitudes, or outdoors in more temperate and tropical environments (Fig. 3). Mesocosms operate with natural species assemblages, allow a degree of replication and control of experimental manipulations, but are limited in temporal scale. Because of their smaller size relative to pilot or demonstration systems, they can often be constructed in higher numbers, above ground and near the laboratory facilities, allowing for both good replication of experimental treatments as well as regular and intense sampling activities. A typical experimental design would include up to a dozen mesocosms with time scales of weeks to months.

Fig. 3 Manipulative experimental mesocosms: 12 fibreglass 2000 L enclosures (Lund and McCullough 2009)

Mesocosm systems are often run in close proximity to a full-scale system of interest to provide similar source materials and relevant field conditions. Mesocosms often achieve this comparison by including shallow sediments that incorporate some basic pit lake sedimentary geochemical processes and interaction with overlying waters, and also some benthic ecosystem diversity and function (Lund and McCullough 2009).

The use of mesocosms to study both the marine and fresh water planktonic environment has been a major trend of the last decade. These have usually been employed to examine the effect of a controlled change to the environment, such as pH, light, temperature, zooplankton invertebrates or, most commonly, nutrients (Watts and Bigg 2001). The use of mesocosms, essentially larger microcosms exposed to more environmental variation, often has the goal of considering many such parameters simultaneously (Drake and Kramer 2012). Environment Canada provided major reviews of mesocosm research and concluded that, in most cases, laboratory toxicity tests were good predictors of effects in natural habitats (DOE 2010).

Mesocosms are often large enough to enable simple ecosystems to develop that can then be experimented on. Mesocosms have been used extensively in aquatic ecology/ecotoxicology studies of pit lakes to understand the effects of addition or generation of acid and metalliferous drainage (Kuznetsov et al. 2014), nutrients, and organic matter on water quality and biological communities (Lund and McCullough 2009; McCullough and Horwitz 2010). Additionally, the larger size allows for more samples, or larger samples, to be withdrawn for replicate analysis over time relative to microcosms.

Mesocosms in Pit Lake Research

Mesocosms typically do not provide adequate volume or environmental realism for physical limnological processes, such as water column stratification, or higher-level ecological processes, such as direct effects on higher trophic levels or large-bodied species. Instead, mesocosms should be used to expand microcosm-scale experiments. This can be achieved temporally by allowing experiments to run longer with less confounding imposed by the enclosure than in their microcosm counterparts. Mesocosm experiments can also be used to extend and validate microcosm experiments undertaken under controlled conditions in more field-realistic environments (Caquet et al. 1996). Mesocosms should also be used to include primary and even smaller secondary consumers in food-web studies, such as in biomagnification assessments. Geochemical experiments probably do not require this level of scale for fundamental processes that are less scale-sensitive; but collecting geochemical data from mesocosm scale experimentation should be regularly undertaken both to validate these processes at this higher scale and to better inform biological and physical processes. Similarly, sediment–water interface experiments may be less variable and confounded at this scale.

Mesocosm-Scale Opportunities

The purpose of scaling results from mesocosm experiments to ecosystems is usually to address larger-scale ecological problems and management strategies. This may be particularly true for fundamental geochemical processes and ecological functions that can identify trends, threshold levels, and interrelationships that might be manipulated in the course of a particular treatment (Gamble 1990). Mesocosms more closely mimic the full-scale environment than microcosms and, as such, have been successfully used to test the validity of microcosm findings. Mesocosms can achieve this by accommodating both water and lake sediment (Neil et al. 2009), including in-pit waste disposal (Han et al. 2009). Whilst it is often the case that pit lakes have depauperate littoral and catchment zones (Vandenberg and McCullough 2017), this is by no means absolute. Certain commodity types (sand and, to a lesser extent, coal mines, for example) have less steep slopes and more extensive littorals (Schultze et al. 2010), and shorelines can be modified during closure to increase littoral zone extent (McCullough et al. 2019). Drainage basin size (catchment area) can also be markedly increased, e.g. through flow-through closure design connecting pit lakes to significant regional waterways (McCullough and Schultze 2018; Schultze et al. 2011). All lakes have sediment, even if this constitutes mixed cobbles and talus overlying a hard rock benthos. However, some pit lakes may also develop an extensive soft sediment through organic decomposition processes and accrual of fine sediments from catchment inflows (Blodau et al. 2000; Oldham et al. 2009; Pal et al. 2014; Read et al. 2009).

Mesocosms can be considered a valid tool for pit lake ecological and geochemical studies in that they are more realistic than small-scale laboratory microcosms (Gamble 1990), but retain experimental utility and may be the only way to investigate effects on a multi-trophic scale (Neil et al. 2009). In particular, mesocosm toxicological experiments can incorporate multi-species interactions such as competition and predation, enabling comparison and contrast with simpler single-species mine water toxicity tests of smaller scale (McCullough 2006; Van Dam et al. 2014).

Some ecological trophic and competitive interactions are also insensitive to spatial scales (Warwick et al. 1988) such as simple manipulations of direct interactions in pelagic systems at timescales relevant to phytoplankton growth. For example, algal response to nutrient enrichment varies little across spatial scales at a given depth or light intensity (Spivak et al. 2010), and results from small-scale experiments that examine the direct response of lake algae to nutrient enrichment or metal toxicity can be scaled up and applied to larger, more natural aquatic systems.

Mesocosms have successfully been deployed in situ in mine pit lakes as floating structures and ex situ as containers, either nearby or at more distant laboratory facilities. In situ mesocosms have been referred to as ‘limnocorrals’ (Martin et al. 2003; Whittle 2004). PLS mesocosm studies have primarily addressed biological remediation of AMD (acid and metalliferous drainage) contamination (McCullough 2008). Mesocosm studies have evaluated chemical responses to biological processes, such as phytoplanktonic algae (Dessouki et al. 2005), microbial sediment processes (Bozau et al. 2007; Koschorreck et al. 2002a, b, 2003, 2007), and a combination of both (Lund and McCullough 2009; Neil et al. 2009; Sackmann 2006).

Mesocosm-Scale Limitations

Mesocosm dimensions, including volume, depth, radius, and wall area, can affect abiotic processes, including light availability, gas exchange, and surface area. Artificial mixing regimes may lead to the creation of water column stratification and increased sedimentation (Watts and Bigg 2001). These can then have run-on effects which, in turn, influence geochemical and biological processes (Striebel et al. 2013). Loss of inorganic and organic material and nutrients to growth on container walls can also be a problem (Williams and Egge 1998). Mesocosms tend to be more sensitive to environmental influences than open pit lakes because containers are small and easily influenced by differences such as biotic colonisation by organisms and environmental variables such as temperature (Watts and Bigg 2001) and self-shading from the walls.

Comparisons across experiments, and extrapolations to larger scales, are further complicated by the use of mesocosms with varying dimensions, or by studies that do not cite experimental dimensions. Consequently, mesocosm experiments have been criticised as being unrealistic simplifications with limited relevance to natural ecosystems (Schindler 1998). Additionally, results of even fundamental ecological process experiments, such as toxicant or nutrient limitation, from mesocosm systems may have limited relevance to natural ecosystems by failing to account for long-term changes in biological community dynamics and biogeochemical processes (Carpenter 1996). Mesocosm experiments are generally conducted over a longer duration than microcosm experiments, in which time founder effects from differing assemblages of pioneer species and interactions between trophic levels can occur. These effects can lead populations within replicate mesocosms to diverge from one another, even though the physico-chemical conditions are practically identical (Gamble 1990).

Macrocosms (Experimental Ponds)

Experimental ponds, technically known as macrocosms, have generally been considered as the final scale in lake research prior to field scale (Odum 1984). Macrocosms can be either sectioned-off portions of an existing pit lake or constructed ponds that are often sunk into the excavated ground to accommodate their size (Fig. 4).

Fig. 4 Manipulative experimental macrocosms formed by sectioning of a pit lake arm by watertight curtains (Lund et al. 2006)

Experimental ponds can be useful for examining higher-scale components of ecosystem responses such as micro- and macro-invertebrate and plant communities as well as basic ecosystem interactions. Indirect effects of stressors can be observed in macrocosms through changes in abundance or biomass of plants and animals, such as fish, amphibians, and macroinvertebrates, in response to changes in food, substrate, or habitat (deNoyelles et al. 1994). Although the definition of scale differs by discipline (Watts and Bigg 2001), the scientific lake research and mine pit lake literature include studies ranging in scale from simple plastic film or mesh bags of less than 5 m³ to pond systems of several thousand m³. Historically, some enclosures isolated the water column from the benthos, but there is now a growing emphasis on benthic enclosures that include both aquatic ecosystem components (Kovalenko et al. 2013). Pond sizes of 100–1000 m² surface area and 6 m depth should be sufficient for most experimental purposes, with the exception being physical processes such as hydrodynamics that require larger systems (Caquet et al. 1996). These ponds provide the opportunity for scale-up from microcosm and mesocosm experiments to incorporate more realism by encompassing more environmental variables and ecological response scales. They also allow for multi-year trials of adaptive management and ecosystem experiments using a wide range of mine waters, wastes such as tailings or over/inter burdens, and other potential backfill materials.
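For comparison with the litre-based volumes used elsewhere in this review, the pond dimensions quoted above can be converted to an approximate volume. The sketch below assumes vertical walls and uniform depth, so it gives an upper bound; ponds with sloping banks would hold less. The function name `pond_volume_litres` is hypothetical.

```python
def pond_volume_litres(surface_area_m2: float, depth_m: float) -> float:
    """Upper-bound pond volume in litres, assuming vertical walls and uniform depth."""
    if surface_area_m2 <= 0 or depth_m <= 0:
        raise ValueError("dimensions must be positive")
    return surface_area_m2 * depth_m * 1000.0  # 1 cubic metre = 1000 litres


for area in (100.0, 1000.0):
    print(f"{area:.0f} m2 x 6 m -> {pond_volume_litres(area, 6.0):,.0f} L")
```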

Macrocosms in Pit Lake Research

Experimental ponds can be used for the great majority of manipulative experimentation, whether replication is required or not. Their large physical scale means that ecological experiments involving species at higher trophic levels, such as fish and amphibians, or larger organisms, including aquatic macrophytes, are one goal of these structures. Similarly, phytoplankton/zooplankton community interactions will also be more valid at this scale. However, because of the greatly increased cost and loss of replication at this scale, macrocosm ponds should be restricted to experiments that require this scale in their design to minimize the effects of confounding factors, or to experiments validating findings from smaller-scale studies. For example, geochemical data should be collected from macrocosm pond studies, although geochemical processes should already have been studied and refined in smaller-scale studies. This duplication allows for testing of scaling assumptions and pseudo-replicate sampling of replicate ponds (Hurlbert 1984).

Macrocosm Opportunities

Experimental ponds retain a strong element of environmental realism and applicability, whilst permitting laboratory-like manipulations and replication (McCullough 2009). A greater diversity and complexity of biological assemblages can be incorporated, including aquatic macrophytes, amphibians, and fish. Depending on the depth and width, some physical processes such as water column mixing may also be able to be incorporated. Macrocosms can also accommodate experiments running over longer durations than smaller-scale tests, e.g. months to years.

Large lake enclosures extending from the surface to a few meters deep, up to hundreds of thousands of litres in volume, have been used successfully for microbial studies in acidic lakes (Koschorreck et al. 2002a, b, 2007).

Experimental ponds offer the smallest scale for field-testing adaptive management strategies. Testing adaptive management at this scale allows for optimization prior to full-scale implementation, which may in turn lead to cost savings. These systems can be readily pumped out and restarted to allow for new experiments over time. Additionally, the experimental pond is the smallest scale that is likely to gain acceptance of adaptive management strategies by regulators and stakeholders.

Macrocosm Limitations

Because of their larger size (relative to microcosms or mesocosms), space and cost typically limit macrocosm use. Even if simple in-ground constructions are used, macrocosms require sufficient hydrogeological integrity to prevent groundwater and other hydraulic connectivity, including inter-pond and local groundwater seepage and contamination (Lund et al. 2006). Their large size can also complicate sampling, including through more demanding occupational health and safety (OH&S) requirements. Macrocosm-scale experiments can still omit important pit lake full-scale variables, such as wind mixing and currents.

Pilot-Scale Pit Lakes

Due to their relatively small size and duration compared to full-scale pit lakes, small enclosures and short-term experiments particularly limit the scale of physical processes and ecological complexity (e.g. number of trophic levels able to be studied) (Petersen et al. 2009). As a result, many scientists now consider that accurate management decisions cannot be made with confidence without ecosystem-scaled studies (Schindler 1998). This view is increasingly common regarding ecological studies, which has increased the focus on extrapolating findings from small-scale experiments to natural ecosystems at more realistic scales.

Large-scale, unreplicated natural experiments (LUNEs), such as pilot-scale pit lakes, have been found to be useful in testing hypotheses at ecologically realistic scales. However, this scale of experimentation is relatively rare, in the field of ecology in particular, because of its lack of replication. Nevertheless, pilot-scale pit lakes can be a crucial next step in the understanding of ecological processes, extrapolating from small-scale experiments to relevant scales (Barley and Meeuwig 2017).

Pilot-scale pit lakes should be constructed at an appropriate scale, depth, and shape to reasonably demonstrate conditions analogous to the expected pit lakes. In metalliferous mines, this may mean relatively deep and steep-sided bathymetry. In coal, sand, and oil sands mining areas, they instead would have large surface areas.

One of the few examples of a pilot-scale pit lake is the Base Mine Lake (BML) project (Dompierre and Barbour 2016, 2017; Dompierre et al. 2016; Hurley 2017; Morandi et al. 2015, 2016, 2017). The principal goal of the BML project is to demonstrate the “water-capped tailings” closure strategy pioneered by Syncrude. Most of the studies to date have focused on microbial (Richardson et al. 2020) and geochemical interactions at the tailings-water interface (Dompierre and Barbour 2016; Dompierre et al. 2016, 2017; Rudderham 2019; Samadi 2019), resuspension (or lack thereof) of tailings into the water column (Hurley 2017; Lawrence et al. 2016; Tedford et al. 2019), and detoxification of the overlying water column (Morandi et al. 2015, 2016, 2017; Mori et al. 2019; White and Liber 2018). BML is a density-stratified aquatic system, with an initial 5 m water column comprising mainly OSPW placed over a 40 m fine fluid tailing (FFT) zone. Over time, the water column will deepen as the tailings densify. While BML will provide a pilot-scale demonstration case for the oil sands industry and will answer many important questions regarding oil sands mine closures, it represents a pit lake with unique properties that make it challenging to transfer its operational conditions to the general design of pit lakes in other closure scenarios. For example, BML will employ a closure strategy that has the following unique aspects:

  • a high volume of tailings is added to the pit prior to lake filling (≈ 80% of the total lake volume);

  • shallow water column (5 m, initially);

  • rapid lake filling (< 1 year); and,

  • lake filling occurs during mine operations, so water can be flushed through the cap with the outflow used in operations until acceptable discharge criteria are achieved.

The monitoring and research associated with these objectives will demonstrate the overall pit lake concept for the industry, although other operators will need to demonstrate their closure plans as well. To that end, other operators such as Suncor are constructing similar facilities on their leases (Suncor 2018).

Another pilot-scale study was the bioremediation of an acid pit lake in northern Queensland, Australia (Fig. 5). Laboratory (Kumar et al. 2011c; McCullough and Lund 2011; McCullough et al. 2006), macrocosm (McCullough et al. 2008a), and finally pilot scale (McCullough et al. 2008b) studies were used in concert, first to demonstrate the potential of bioremediation and then to demonstrate that:

  • microbial sulfate reduction would remediate high AMD waters;

  • bulk and readily available wastes could be used as sources of organic materials; and

  • products formed through alkalinity generation would be stored in the lake sediment.

Fig. 5 Demonstration pit lake scale experiment formed by sectioning of a pit lake by a waste rock causeway for a control lake (far side) and manipulated lake (near side) (McCullough et al. 2008b)

Pilot Systems in Pit Lake Research

Pilot scale often represents the final scale of study in pit lake research, with volumes reaching millions of litres (Bozau et al. 2007). This scale of study is therefore often geared toward demonstrating that pit lake closure plans can meet regulatory commitments, achieve acceptable water quality, and develop sustainable aquatic ecosystems (i.e. what might be considered regulatory knowledge gaps). The anticipated maximum experimental duration for these lakes is ≈ 20 years, depending on how challenging the substrate, climate, and other factors will render chemical and biotic effects, such as ecological succession.

Studies comparing geochemical model predictions made prior to pit lake formation with actual pit lake water quality show that the models frequently fail to predict actual water quality (Eary 1998; Kuipers et al. 2006). Pilot-scale PLS can therefore be a useful tool for validating and calibrating such water quality models.
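As a minimal sketch of what such a validation step might involve, the following compares model-predicted and observed values of a single water quality variable using mean bias and root-mean-square error. The function name and the pH values are hypothetical and purely illustrative; they are not drawn from any of the cited studies, and a real validation would normally consider many variables, depths, and dates.

```python
import math


def validation_metrics(predicted, observed):
    """Return (mean bias, RMSE) for paired predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equal-length, non-empty series")
    # Positive error means the model over-predicts the observed value.
    errors = [p - o for p, o in zip(predicted, observed)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse


# Hypothetical values for illustration only (not data from any study).
predicted_ph = [6.5, 6.8, 7.0, 7.1]  # model predictions made before filling
observed_ph = [5.5, 5.8, 6.0, 6.1]   # monitoring results at matching dates

bias, rmse = validation_metrics(predicted_ph, observed_ph)
print(f"mean bias = {bias:.2f} pH units, RMSE = {rmse:.2f} pH units")
```

A consistent bias of this kind would suggest recalibrating model inputs, whereas a large RMSE with little bias would point to processes missing from the model.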

Pilot-Scale Opportunities

Many researchers believe that qualified decisions for ecosystem management cannot be made with confidence unless the limitations of mesocosm studies are understood and full ecosystem scales are studied (Ahn and Mitsch 2000). Until this full-scale is realised, many environmental processes may still be omitted from study and left to best judgement and estimates. Pilot scale is the only scale of study that allows interactions with the broader catchment to be incorporated into the pit lake. These may include the local broader catchment, including waste materials such as overburden dumps, tailings storage facilities, and other mining landforms, as well as the broader watershed where flow-through or other local or even regional interaction is occurring. Complex questions of the specific responses of entire ecosystems may only be able to be answered by full-scale experimentation (McCullough 2015; McCullough and Schultze 2018).

Whilst small-scale studies can suffer from significant variability between replicates, pilot-scale studies may detect more subtle changes due to lower variability and sensitivity to noise at this scale (Eberhardt and Thomas 1991). Consequently, pilot-scale water bodies are primarily intended to verify that pit lake closure plans can achieve acceptable water quality and develop sustainable aquatic ecosystems. There are currently few studies of aquatic macrophytes of full-scale pit lakes (Kamberović and Barudanović 2012; Oťaheľová and Oťaheľ 2006; Pal et al. 2014), with most studies only undertaken at smaller scale.

Pilot-Scale Limitations

While experiments conducted at the ecosystem scale are considered the most realistic, such experiments suffer from limitations including low replication and reduced experimental control (Hurlbert 1984; Stewart-Oaten et al. 1992). For instance, there is likely to be only one or very few pilot-scale tests to demonstrate chosen strategies prior to proceeding to full-scale; either observational (McCullough et al. 2008a) or even manipulative (McCullough et al. 2008b) pilot-scale PLS experiments may have only one treatment (often a single large enclosure within a pit lake) and one control (often the surrounding pit lake) (Bozau et al. 2007), which greatly limits their interpretation as to the effects of the treatment of interest.
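One common way of summarising a design with a single treatment and a single control is a before-after-control-impact (BACI) contrast. This is offered here only as an illustrative sketch, not as the method used in the studies cited above. The function name and the pH values are hypothetical.

```python
from statistics import mean


def baci_effect(control_before, control_after, impact_before, impact_after):
    """Change in the treated unit minus the change in the control unit."""
    control_change = mean(control_after) - mean(control_before)
    impact_change = mean(impact_after) - mean(impact_before)
    return impact_change - control_change


# Hypothetical pH readings for illustration only (not data from any study).
control_before = [3.0, 3.1, 2.9]   # surrounding pit lake, before treatment
control_after = [3.1, 3.0, 3.2]    # surrounding pit lake, after treatment
treated_before = [3.0, 3.0, 3.0]   # enclosure, before treatment
treated_after = [5.0, 5.2, 4.8]    # enclosure, after treatment

effect = baci_effect(control_before, control_after, treated_before, treated_after)
print(f"BACI contrast = {effect:.2f} pH units")
```

Repeated samples taken within one enclosure are not independent replicates of the treatment, so a contrast of this kind describes what happened in that enclosure but cannot by itself separate the treatment effect from other differences between the two water bodies.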

The large size of pilot-scale pit lakes can also complicate sampling, including greater OH&S considerations such as the need for boats and possibly underwater sampling techniques (Ross and McCullough 2011). Recent advances in drone sampling technology (Castendyk et al. 2019) may reduce these limitations. However, these technologies currently do not permit biotic sampling.

Importantly, construction and modification costs may be very high. Therefore, it is critical to plan field-scale developments early in the research program by selecting the right filling materials to achieve the objectives of the project. Similarly, it is important to engage with stakeholders and regulators prior to construction to confirm that the pilot-scale system will achieve the desired outcomes in terms of providing a credible demonstration.

Integrating Multiple Scales of Study

Scale is fundamental to both experimentation and theory, particularly in the biological sciences (Petersen et al. 2009). A lack of realism is inherent to all experimental science (Drake and Kramer 2012), where scale is an implicit component of all study designs that sample a subset of a given population. However, small-scale experiments using ‘model organisms’ in microcosms or mesocosms have been shown to be a useful approach to begin addressing complex ecosystems (Benton et al. 2007). The main experimental approaches in pit lake studies can therefore be presented along a gradient of scale: microcosms with an artificial mixture of species in batch culture, mesocosms and macrocosms with more natural mixes of species, and unenclosed field experiments (Fig. 6). For instance, mesocosms are a powerful tool to link large field studies close to natural conditions with controlled small-scale laboratory experiments (Striebel et al. 2013). Selecting an appropriate scale of experimentation is not only a question of technical and financial feasibility but a consideration of the inevitable trade‐offs between realism and control.

Fig. 6 Different scales of study contribute different types of knowledge about pit lake physical, chemical and biotic ecosystems. As experimental physical scale increases, validity of study results to full scale pit lakes increases through more bio-physico-chemical processes being incorporated

Equally, full-scale modelling of pit lake attributes is often undertaken with assumptions of smaller-scale characteristics. Typical examples include predictive modelling of long-term geochemical conditions, such as water quality, and more recently, other conditions, even shoreline erosion (Fig. 7), where assumptions must be made with regard to small-scale attributes (McCullough et al. 2019). In particular, the geochemical evolution of pit lakes, and how that can substantially affect biological evolution, may change at different scales of biological complexity and biota. Simple factors such as pH and TDS can determine what organisms will survive in a specific pit lake. Modelling studies at larger scale can benefit from smaller-scale studies directed toward the pit lake environment, supporting their use of equation constants e.g. for geochemical dissolution (Castendyk et al. 2015a, b; Nixdorf et al. 2010; Watson et al. 2016), water balance (McCullough et al. 2013; McJannet et al. 2017, 2019), hydrodynamics (Hurley 2017; Lawrence et al. 2016; McCullough et al. 2011; Nguyen 2004), cohesivity (McCullough et al. 2019), and other physico-chemical assumptions.

Fig. 7 Bed shear change predictions carrying assumptions of nature and strength of field-scale sediment cohesiveness (McCullough et al. 2019)

Comparing different scales to each other and to modelling results helps indicate which processes or combinations of processes can be scaled (that is, are general processes) and which cannot. Modelling can then further inform the different study scales through sensitivity analyses highlighting primary drivers of water quality and ecological processes and areas of knowledge considered important (Castendyk and Webster-Brown 2007a, b). These drivers and areas of knowledge should then receive greater research attention to advance their understanding. As an example of a process that does not scale readily, modelling pit lake water quality has been criticised for the inaccuracies inherent in scaling geochemical reactions from typical scales of laboratory static and kinetic test-work to field scale (Eary 1999; Pilkey and Pilkey-Jarvis 2012).

Problems with appropriate scaling of pit lake studies can be difficult to deduce without direct comparisons with much larger scale or even whole-pit lake experiments. Depending on the research question being explored, potential problems arising from studies undertaken at singular scales include:

  • Too small spatial scales that do not include whole ecological communities or incorporate physical processes. For example, elimination of fundamental littoral–pelagic and catchment–lake interactions, such as organic matter diagenesis and nutrient incorporation into foodwebs (Schindler 1998) or water column mixing frequency, timing, and duration (Boehrer and Schultze 2006).

  • Too small spatial scales that do not capture the heterogeneity or stochasticity of the system of study.

  • Temporal scales too short to assess slow-responding organisms and complex pit lake biogeochemical processes. For example, ecological succession in a new lake is expected to take many years with longer durations required for higher trophic levels as the food-chain below them becomes established (Lund and McCullough 2011).

Many experiments (ecological and physical, in particular) are sensitive to scale, as the size and duration of the experimental scale will likely exclude or distort important features of the ecosystems (Carpenter 1996). Both larger scale macrocosm and mesocosm manipulations have limitations, particularly for ecological research questions, due to limited generality and applicability of results to even larger and more complex pit lake systems with different physical parameters. Assemblage compositions and responses involving indirect food web interactions and processes usually occur over longer temporal scales (e.g. numerical responses of consumers) and may be more sensitive to variations in spatial scale (i.e. environmental connectivity to other ecological communities within or outside of the pit lake). Physical studies will be influenced by regional climatic conditions and local wind patterns, including the effects of nearby waste and other mining landforms (Huber et al. 2008).

Microcosm studies of ecological processes, in particular, have been criticized for being unrealistic. Scaling rules have been developed for some processes to help translate experimental results from these small enclosures to entire ecosystems (Petersen and Hastings 2001). Even identical studies of limnological processes across wide ranges of lake sizes reveal that scaling correction is necessary when extrapolating from small lakes to large ones (Schindler 1998). As a result, mesocosms and macrocosms are often better suited for testing large numbers of single variables with replication that provides reasonable statistical power for pit lake ecological questions. These small-to-medium scale experiments have become increasingly popular because they provide an important bridge between very tightly controlled microcosm experiments (which can suffer from limited realism) and the greater biological complexity of natural systems (Stewart et al. 2013).
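The relationship between replication and statistical power noted above can be illustrated with a simple calculation. The sketch below uses a normal approximation for a two-sided comparison of two equally sized groups. The function name is hypothetical, and the standardised effect size (difference in means divided by a common standard deviation) is an assumed input. The approximation tends to overstate power when there are very few replicates, so it should be read as indicative only.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    effect_size is the standardised difference between group means;
    n_per_group is the number of replicate enclosures per treatment.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative hypothesis.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of exceeding either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Illustrative comparison across replicate numbers for an assumed effect size.
for n in (1, 3, 6, 12, 20):
    print(f"{n:>2} replicates per treatment: power ~ {approx_power(1.0, n):.2f}")
```

Under these assumptions, the one or two units typically available at pilot scale give little power to detect even a large effect, whereas the replicate numbers achievable with mesocosms can give reasonable power.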

Because of their more realistic geometries, mesocosm and macrocosm experiments may realise similar results for algal and invertebrate studies (de Szalay et al. 1996). Some organisms are too large and some processes too slow, to include in smaller-scale experiments. For instance, lake mixing processes and contaminant bioaccumulation and biomagnification effects at high trophic levels, such as fish and birds, require larger-scale systems.

Although pit lakes are expected to yield relatively simpler ecosystems than their natural analogue counterparts (Lund et al. 2013; Van Etten et al. 2014), the basic dimensions of spatial and temporal scale and complexity with commensurate levels of replication are still needed to answer research questions (Hurlbert 1984). The choice of the appropriate experimental scale is therefore a trade‐off between realism and control. Unenclosed field manipulations have the highest degree of realism, but the least degree of control. Small‐scale ecological experiments with single or a few species rely on a ‘model organism’ concept, and are biased against the detection of slow and space-requiring processes (Sommer 2012). As a result of these trade-offs, there is no one single scale for a pit lake study that is suitable for examining ecological processes and outcomes. Instead, different enclosure scales and field scales form just one part of a study-scale jigsaw, with conclusions more widely accepted if they are supported by experiments at a variety of scales (Fig. 8).

Fig. 8 Integration of studies of different scales together

The results and conclusions of each study scale can then be compared with related studies at different scales by linking their findings as general principles that would then go on to provide input variables, e.g. constants, to empirical models of key pit lake processes. Enclosures can also be linked together, either simultaneously (such as by flow-through) or in time (such as by being undertaken sequentially) (Petersen and Englund 2005). In this manner, different enclosures can represent different components of a pit lake, such as the benthic, pelagic, littoral, or even riparian zones.

Different enclosure and field scales therefore form just one part of a framework of studies that often need to come together to answer fundamental process questions in the complex systems of pit lakes. Physical process studies will rarely be reliable at small scale and will require larger scales (macrocosm and upward) with modelling to extrapolate temporally. Geochemical processes can be reasonably demonstrated at very small scales with fundamental processes, but benefit from inclusion of other parameters, such as sediment interaction (mesocosm scale and upward) and physical processes (macrocosm and upward). Once parameterized, modelling is also often able to be scaled for well-established abiotic geochemical processes (Parkhurst and Appelo 1999) and even biogeochemical processes, although in the latter case, the findings may be limited to the encountered experimental conditions (Bozau et al. 2007). Biological processes can be demonstrated by microbiological (bacterial and phytoplanktonic) communities at only microcosm scale with micro-and macroinvertebrates becoming reasonably demonstrated at mesocosm scale, and vertebrates, such as fish, only at macrocosm scale. Riparian and other catchment biological processes require full-demonstration pit lake (DPL) scale study.

Integration of different research scales to achieve a multi-scale understanding of pit lake closure issues will necessitate incorporation of different research studies and indeed research programs. As a result, a pit lake research program must maintain a degree of flexibility that allows different researchers to answer research questions in (equally) valid and complementary ways. It is also important that the approach taken by a single research discipline does not compromise or preclude other types of research at the facility.

One way that study findings from different scales can be integrated is through the MLE approach. This formal methodology provides support for a conceptual model by undertaking different but complementary investigations representing entirely separate fields of science. If conclusions from each study converge, this supports the conceptual model, providing assurance in the model and allowing it to be communicated confidently to stakeholders. We recommend that an MLE approach be applied, beginning at smaller scales and moving progressively larger, to maximize the demonstrable validity of findings at the full-pit-lake scale. The MLE approach does not need to be formal or rigid; rather, it can be an holistic approach applied to each experimental question being studied at smaller scales.

Conclusions

Our review was limited because many studies have not been published in the primary peer-reviewed literature, to which we restricted our review. Many studies have been undertaken either as internal organisational reports or as commercial-in-confidence consulting reports (including the authors’ own). As a result, it is likely that our review has omitted some studies, especially smaller-scale studies that preceded larger scale studies, e.g. some of the full-scale pit lake remediation studies described in Geller et al. (1998, 2013). However, we found that different scales of study present different opportunities and limitations for understanding PLS (Table 1). Most of these studies were directed toward in situ remediation of acid mine drainage (Klapper 2003; Klapper et al. 1996). Very few ecological studies have been undertaken, and these have been directed primarily toward oil sands pit lakes (Quagraine et al. 2005). Although still low in replication, published smaller-scale studies present greater replication and thus statistical power than larger-scale studies. However, there are few small-scale studies that have been concomitantly matched with larger-scale studies, and this remains a significant knowledge gap. Equally, there are some full-scale pit lake studies that were never undertaken at smaller scales, e.g. Harrington (2002) and Lu (2004).

Table 1 Summary of published pit lake studies at various scales

Our review found that smaller scale microcosms and mesocosms are ideal for testing single PLS variables, under well-defined conditions, with replication providing reasonable statistical power. Variables and processes can be isolated, controlled and tested to answer a number of questions such as:

  • What are the toxicological thresholds for constituents of concern in pit lake waters for aquatic species; including synergistic or competitive toxicological effects of COPC mixtures?

  • What are biogeochemical generation and fate processes for water quality?

  • What is the role of nutrient limitation and stimulation on trophic status?

In contrast, experimental ponds are more suited to answering higher level questions, such as:

  • What adaptive management strategies can be applied to improve sustainability and thus success of pit lakes?

  • How will water quality change over time with interactions with sediments?

  • What is the toxicity of this water to the site-specific ecological communities and the effects of more complex ecological interactions e.g. of competition, predation? That is, what macrophyte, phytoplankton, macroinvertebrate, zooplankton, and fish assemblages can successfully be established in water composition representative of pit lakes?

  • Are there risks of bioaccumulation or biomagnification in pit lake food chains?

  • What water quality variables drive successful ecological rehabilitation of pit lakes?

  • What is the optimal residence time for pit lakes that require bioremediation for water quality improvement prior to discharge?

  • What different types of mine wastes can be safely stored under the water column?

Ideally, an overall PLS research programme will be systematically planned from conception to completion, with at least an anticipation of which types of questions will be answered by each scale of study. In an integrated PLS research programme, the types of experimental systems best employed need to be carefully examined in the context of the specific knowledge gaps to be addressed. In most cases, the information gathered at each stage of experimentation can be integrated into a conceptual and often even a numerical model that can reveal remaining knowledge gaps. An important decision should be the desirability of sacrificing spatial and temporal scales so as to obtain replication against a view that appropriate scale must always have priority over replication (Oksanen 2001). Some processes simply will not scale well, such as more complex physical and biological studies. This has especially been the case where small-scale e.g. microcosm studies have overestimated larger scale e.g. macrocosm/pilot scale study outcomes (Geller et al. 2009; Geller and Schultze 2013). The PLS programme can then adapt to findings over time, moving to progressively larger scales to maintain economic efficiency. In this way, knowledge gaps can be addressed using an appropriate scale of study that reduces time, effort, and cost, while maximizing flexibility and options for the largest investments of the pilot and full-scale systems.

However, models can be more reliable when validated by bench-top column experiments (i.e. microcosm) and field-based tank experiments (i.e. mesocosm) of the “pit lake in a bucket” approach (Castendyk et al. 2015b).

This review showed how few pit lake experiments at smaller scale have resulted in outcomes at larger scale, and the need for future research at this scale. Few scaled experiments have been realised as full-scale pit lake outcomes, so the reliability of translating experimental results to real-life examples remains unknown. Our review found that there are very few studies of either smaller or full scale pit lakes, and none that we are aware of where the thesis of the smaller scale experiment was validated at the full scale. For example, manipulative bioremediation or toxicity experimental tests at smaller scales have often not been validated by pilot-scale treatment or exposure experiments. Equally, we did not find either observational or manipulative experiments of full scale pit lakes that have had robust manipulative experiments of any smaller scale undertaken prior to their formation.

There are now a large number of pit lakes forming, some of which are in early stages of biological and chemical evolution. Some of these lakes will not move beyond very simple systems, constrained by poor water quality and salinising and/or acidophilic reactions (Lund and McCullough 2011). If undertaken intensively, monitoring and investigation of the geochemical, biological and ecological aspects of some of these pit lakes could provide real examples against which model ecosystem scales could be compared for validation of those experimental models. If small-scale models are not useful for predicting actual pit lake ecosystems, perhaps they will help to better define what type of experiments are helpful. Pit lakes are now forming around the world, from relinquished and abandoned mine voids and from pit voids at operations that have ceased dewatering activities. An appropriately matched scale of study to understand pit lake ecosystem evolution would therefore examine these pit lakes that are now forming.

Finally, when faced with complex questions and decision making, environmental management often requires a diversity of evidence rather than single studies (Cook et al. 2012). This MLE approach achieves robust understanding of poorly understood systems with multiple studies occurring at different, but complementary, temporal and spatial scales (Hall and Giddings 2000). It is more effective and reliable to have multiple, independent lines of evidence converging on a single conclusion to develop an understanding of pit lake issues and processes impacting PLS management and have demonstrable and sustainable conclusions for stakeholders.