Introduction

The notion of ecosystem health in rivers has been widely adopted in environmental policy where the management objective is frequently to “maintain river health”. In this context, the term ‘health’ is considered synonymous with human health, in that, like a healthy human, a healthy ecosystem is free from stress and disease with its component parts functioning appropriately (Karr, 1999). The inclusion of this idea in policy comes despite ongoing debate and no general consensus on an appropriate definition of ecosystem health or means by which to measure it (Norris & Thoms, 1999). Nevertheless, the concept has underpinned some major developments in stream ecology, and led to the development of effective tools for stream bioassessment (e.g. Wright et al., 2000).

While the state of rivers has long been on political and social agendas, aquifers and groundwater have largely been out of sight and out of mind. In reality, aquifers often represent a vertical and lateral continuation of riverine systems (Ward, 1989), and so, the two should be managed holistically rather than as separate resources (Tomlinson et al., 2007). As global demand for water increases, pressure on aquifers from water abstraction and contamination is also increasing (Danielopol et al., 2003), so it is timely that the condition of aquifers is being considered in water management directives across the world (e.g. EU Groundwater Directive 2006/118/EC). However, unlike river systems, even a notional understanding of groundwater health, its assessment and place in policy remains elusive and a consistent definition and methods by which to measure it are needed.

The aims of this article are (1) to provide a definition of groundwater health and a description of the likely attributes of healthy groundwater ecosystems, (2) to develop a framework for assessing groundwater health and (3) to demonstrate the efficacy of the framework using field data. We focus here particularly on ecosystems occurring in aquifers per se (sensu Humphreys, 2006) and do not consider hyporheic zones. For discussions of hyporheic zones, see Boulton et al. (2010).

Methods

Framework development

Our definition of groundwater health and discussion of the likely attributes of healthy groundwater ecosystems (provided in the “Results” section) were developed from a detailed search and review of the aquatic sciences and ecosystem health literature. Our initial aim was to provide a definition of ecosystem health consistent with that applied to linked surface waters. Accordingly, we drew heavily on the river health literature. To identify the likely attributes of healthy groundwater ecosystems, we examined the current paradigms of ecosystem structure and function provided in review (e.g. Humphreys, 2006, 2008) or biogeographical/biodiversity (e.g. Dole-Olivier et al., 2009a, b) papers, and considered these paradigms in the light of environmental impact/stressor-response studies that highlighted differences between reference (nominally healthy) and impacted (nominally unhealthy) sites. From these comparisons, we identified the common attributes of undisturbed or ‘healthy’ sites and, further, considered indicators or metrics that were able to discriminate these from disturbed or ‘unhealthy’ sites.

The resulting definition of groundwater health dictated the necessary components of our framework, which was structured to incorporate all aspects of our definition, and also ensure its broad applicability and utility under different circumstances of knowledge and resource availability. The final step in the process was to complete a case study to examine the application of the framework on a field-collected dataset.

Framework testing: case study

Our case study uses data obtained from groundwater sampling in an alluvial aquifer in the Gwydir River catchment in north-west NSW, Australia. Samples were collected in January 2007 (summer) as part of a broader study on the impacts of land use on groundwater ecosystems. Samples were collected from nine sites in an agricultural area surrounding the town of Moree (29°28′S, 149°54′E). Sites were divided into ‘reference’ (nominally undisturbed) sites located away from irrigated cropping (four sites) and five ‘test’ sites. Of the ‘test’ sites, four were located in areas of irrigated agriculture and one (site 5) was located away from agricultural activities. In practice, a greater number of reference sites should be used for benchmark setting but the small number used here is suitable for our illustrative purposes.

Sampling methods were based on those outlined in Hancock & Boulton (2009). Groundwater (300 l) was pumped from each bore using an inertia pump (Waterra Powerpack II, Waterra Pumps Ltd, ON, Canada) and passed through a 63-μm sieve to collect stygofauna. Removing 300 l was sufficient to purge the bore, so water samples were then collected (in sterile amber glass or clean plastic containers) for chemical and microbial analyses and refrigerated or frozen until analysed. Water samples were analysed for 55 pesticides (including major metabolites), nutrients and metals at the NSW Office of Water Environment Laboratory in Arncliffe, NSW, Australia. Field meters were used on site to measure dissolved oxygen (YeoKal 609, Yeokal Electronics, Brookvale, NSW, Australia), conductivity (TPS LC84) and pH/ORP/temperature (TPS LC80A, TPS, Springwood, QLD, Australia) of bore water samples.

Stygofauna collected in the sieve at the time of sampling were preserved in 100% ethanol, stained with Rose Bengal, and later sorted under a microscope (×60 magnification). Stygofauna were identified to the lowest taxonomic level using relevant keys and identifications were confirmed by taxonomic experts. Cotton strip assays (CSA) comprising 4 cm × 10 cm calico cotton strips were placed in each bore after sampling and left for 6 weeks to measure microbial activity (Lategan et al., 2010). Tensile strength loss of the CSA was tested using a pneumatic tensiometer Universal Testing Machine (UTM Instron 6022 10-kN load frame) with flat plate grips and specialized software (DOLI EDC120 Germany). See Lategan et al. (2010) for detailed methods.
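Tensile strength loss in cotton strip assays is commonly expressed as a percentage of the strength of strips that were never deployed. A minimal sketch of that calculation follows; the function name, the use of undeployed control strips as the baseline and the example values are illustrative assumptions, not the procedure of Lategan et al. (2010).

```python
from statistics import mean


def tensile_strength_loss(control_strengths, deployed_strength):
    """Return the percentage tensile strength loss of a deployed strip.

    control_strengths: breaking strengths of undeployed control strips
    deployed_strength: breaking strength of the strip retrieved from a bore
    Units only need to be consistent; all values used here are hypothetical.
    """
    baseline = mean(control_strengths)
    if baseline <= 0:
        raise ValueError("control strength must be positive")
    return 100.0 * (baseline - deployed_strength) / baseline


# Hypothetical example: three control strips and one strip left in a bore.
loss_percent = tensile_strength_loss([400.0, 420.0, 380.0], 300.0)
print(f"Tensile strength loss: {loss_percent:.1f}%")
```

Higher percentages would indicate greater cellulose decomposition over the deployment period, and hence greater microbial activity.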

In addition to CSA, microbial activity was assessed using Biolog™ Ecoplates, which measure microbial activity through carbon utilization (Preston-Mafham et al., 2002). The Biolog™ Ecoplates were inoculated with 150 μl of the sample per well and incubated in the dark at 20°C. Colour development was measured at 590 nm after 6 days of incubation.
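The text does not state how the absorbance readings were summarised. One widely used summary is average well colour development (AWCD), with the number of substrates used serving as a simple measure of functional richness. The sketch below shows both under stated assumptions: the function names, the positive-well threshold of 0.25 and the absorbance values are hypothetical, and a real plate has many more substrate wells than the five shown.

```python
def awcd(substrate_od, control_od):
    """Average well colour development.

    Each substrate reading is corrected by the water-control well and
    negative corrected values are set to zero before averaging.
    """
    corrected = [max(od - control_od, 0.0) for od in substrate_od]
    return sum(corrected) / len(corrected)


def substrate_richness(substrate_od, control_od, threshold=0.25):
    """Count substrates whose corrected absorbance exceeds a threshold.

    The threshold is an assumed value chosen for illustration only.
    """
    return sum(1 for od in substrate_od if od - control_od > threshold)


# Hypothetical absorbance readings (590 nm) for five substrate wells.
readings = [0.10, 0.60, 0.40, 0.05, 1.10]
control = 0.10

plate_awcd = awcd(readings, control)
plate_richness = substrate_richness(readings, control)
print(f"AWCD = {plate_awcd:.2f}, substrates used = {plate_richness}")
```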

Results

This section is presented in three parts, each relating to a specific aim of this article. In the first part, we define groundwater health and describe the likely attributes of healthy groundwater ecosystems. We then incorporate these attributes into a framework for assessing groundwater ecosystem health (part 2), which is subsequently tested by way of a case study (part 3).

Defining groundwater ecosystem health

Ecosystem health is a term widely used to describe the overall functioning and condition of an ecosystem. Ecosystem health is difficult to define and there has been much debate surrounding the definition of the term and the merits of the overall concept (Rapport et al., 1998; Karr, 1999; Vugteveen et al., 2006). While some argue that ecosystem health is undefinable (Scrimgeour & Wicklum, 1996), less scientific than related concepts such as ecosystem integrity, and not a measurable ecological property (Suter, 1993), the concept has merit in being readily interpretable by managers, politicians, stakeholders and the general public, such that it is now commonplace in public policy and dialogue (Vugteveen et al., 2006). Given this broad adoption and identification in policy, the concept must be embraced, and the science necessary to underpin the concept performed.

Integrated management of aquifers and surface waters is needed (Tomlinson et al., 2007) and may be fostered by a consistent definition of ecosystem health across both environments. Accordingly, we propose a definition of groundwater health, based on that for rivers by Vugteveen et al. (2006) but modified to articulate more clearly the role of providing ecosystem goods and services (collectively ‘ecosystem services’), because this is the key attribute setting apart assessments of ecosystem health and ecosystem integrity/condition (Boulton, 1999). We define groundwater health as “an expression of an aquifer’s ability to sustain its ecological functioning (vigour and resilience) in accordance with its organisation while maintaining the provision of ecosystem goods and services”.

As a note of caution, the provision of ecosystem services is only one part of a healthy ecosystem, and the notion of a healthy aquifer as one that meets the water supply needs of a society must be tempered by the need for sustainability, such that an aquifer should not be considered unhealthy if the sustainable yield is insufficient to meet water needs. Conversely, an aquifer that meets human water needs may be in a degraded state because of that process. Consequently, we need to consider the provision of ecosystem goods not in terms of the ability to meet human needs, but in terms of sustainability. To this end, assessing the health of aquifers by their ability to provide water may be addressed by indicators of sustainability (see Vrba & Lipponen, 2007).

Attributes of healthy groundwater ecosystems

By our definition, a healthy groundwater ecosystem is one that can sustain its ecological structure (organisation) and function while sustainably providing ecosystem services. Necessary under this definition is an understanding of the biological, physical, and chemical attributes of healthy groundwater ecosystems, the ecosystem functions (including those that underpin ecosystem services), and the ecosystem’s resilience (its ability to maintain its integrity when stressed; see Brand & Jax, 2007).

The physical and chemical attributes of healthy groundwater ecosystems

The physical structure of the aquifer matrix is perhaps the principal factor shaping groundwater ecosystems and biotic distribution (Dole-Olivier et al., 2009a, b). In addition to limiting the body size of fauna (Pospisil, 1994), the matrix influences water flow and chemistry, including the distribution of nutrients, carbon and oxygen throughout the aquifer, and the ionic composition of the groundwater (Dole-Olivier et al., 1994; Mulholland & DeAngelis, 2000; Gibert, 2001). The structure of the aquifer matrix is largely determined by the geology and/or geological setting, but can be modified by anthropogenic activities, contributing to the death of invertebrates and aerobic microbes (Boulton, 2000). Indeed, such environmental attributes may themselves be considered indicators (sensu Dale & Beyer, 2001), although we use them here primarily to define the aquifer environment, requiring that they be similar across the sites being examined (Table 1). However, environmental attributes are considered indicators in that variations from natural conditions may be considered as stress indicators (see below).

Table 1 Environmental factors influencing groundwater biota

Aquifers are characterised by a lack of light and stable environmental conditions (relative to surface environments). Environmental temperatures, at least for shallow unconfined aquifers, usually vary little from the mean annual surface temperature (Jones & Mulholland, 2000). The absence of light precludes primary production by photosynthesis, so groundwater ecosystems are generally reliant on allochthonous carbon from surface environments. Consequently, organic carbon concentrations in pristine aquifers are generally low (Edmunds & Shand, 2008), with a tendency to decrease with depth (Gounot, 1994) and distance along groundwater flow paths (Datry et al., 2005).

Healthy groundwater ecosystems will have a natural regime of groundwater flow, pressure, depth, availability (timing) and quality to which biotic elements of the ecosystem have evolved (Eamus & Froend, 2006). Changes to the natural groundwater regime will interfere with habitat availability, water flow, the flux of nutrients and energy, and alter groundwater/surface water interactions (Gibert et al., 1994). For example, lowering the water table can reduce available habitat, and disconnect aquifers from surface processes (e.g. Hancock, 2009), and if rapid, can strand fauna in upper layers of the aquifer, leading to death through desiccation (Tomlinson, 2008). Raising the water table can cause changes in water chemistry as groundwater moves upwards through saline sediments (e.g. Pannell & Ewing, 2006) with effects on biota likely as water quality changes.

Healthy groundwater should be generally low in nutrients and heavy metals (within the range of background concentrations) and free from synthetic chemicals. The presence of nitrogen and metals above background concentrations, and the presence of synthetic chemicals (such as pesticides or petroleum hydrocarbons) in groundwater are clear indicators of anthropogenic disturbance, can cause significant biological changes and therefore indicate potentially impaired ecosystem health (Table 2). Nitrate can occur naturally in groundwater, but sewage pollution and the widespread use of nitrogenous fertilizers in agriculture, combined with their relatively high solubility and leaching capacity, have led to widespread contamination (Almasri, 2007), which has been linked to impacts on fauna (Stein et al., 2010).

Table 2 Potential indicators of groundwater ecosystem health

The biological attributes of healthy groundwater ecosystems

Microbial assemblages are the foundation of aquifer ecosystems, capturing energy, and forming the basis of the foodweb (Gibert et al., 1994; Humphreys, 2006). The majority of microbes are sparsely dispersed as single cells or small colonies attached to sediment surfaces, providing a food source for invertebrates (Novarino et al., 1997; Humphreys, 2006). Generally less than 1% of available sediment surfaces are colonized by bacteria (Griebler et al., 2002; Anneser et al., 2010), with healthy, undisturbed aquifers tending to have very low microbial diversity and activity relative to surface waters (Griebler & Lueders, 2009), due mainly to naturally low concentrations of nutrients, carbon and oxygen (Gounot, 1994). It also appears that most microbes inhabiting aquifers are attached rather than being free-living (Gounot, 1994; Griebler & Lueders, 2009; Anneser et al., 2010), although the ratio of attached to free-living bacteria can change with contamination (e.g. Griebler et al., 2002).

Unlike surface waters, groundwater ecosystems rarely support vertebrates and generally lack primary producers and herbivores (Humphreys, 2006). The invertebrate fauna (stygofauna) usually represent the highest trophic level within aquifers. Groundwater ecosystems are relatively simple, typified by low α diversity (few species at any one locality) with a ‘truncated’ functional and taxonomic diversity (Gibert & Deharveng, 2002), creating a system with (generally) low horizontal (within trophic level) and vertical (between trophic level) diversity (sensu Duffy et al., 2007) in a given location, and concomitant short food chains. However, isolation of stygofauna has created a fauna dominated by short-range endemic species (e.g. Eberhard et al., 2009), providing high β diversity of invertebrates (Humphreys, 2008), although this appears not to be the case for microbial assemblages (Griebler & Lueders, 2009).

Where the void size permits, a healthy aquifer is expected to have invertebrates present (e.g. Danielopol et al., 2000; Hancock & Boulton, 2008). Invertebrate assemblages in groundwater are generally dominated by crustaceans which often make up more than 50% of the total species richness and abundance (Table 3). Amphipods, syncarids and copepods appear particularly common in healthy groundwater ecosystems (Gibert et al., 2009) and may be useful as indicators of broader biodiversity (e.g. Stoch et al., 2009). Invertebrate assemblages also often include mites, oligochaetes and rarely insects and molluscs (Humphreys, 2006). However, disturbance can cause a shift in the structure and composition of biotic assemblages; for example, oligochaetes, nematodes, ostracods and cyclopoid copepods may become dominant in sites with organic enrichment (Table 2).

Table 3 Potential benchmarks of groundwater ecosystem health for Tier 1 assessments

Most biota found in groundwater ecosystems are highly evolved, obligate groundwater-dwelling animals (stygobites) not found in surface environments. Generally, healthy groundwater will have a relatively high proportion of stygobites in comparison to non-groundwater adapted surface species (stygoxenes) (Malard, 2001; Stein et al., 2010). Altered groundwater conditions can favour the colonization of aquifers by stygoxenes, thereby reducing the ratio of stygobites to stygoxenes in the stygofauna assemblage (see Table 2). Even though direct impacts of stygoxenes on groundwater assemblages may not be evident (e.g. Jasinska et al., 1993), exotic species pose a considerable threat to fauna (e.g. Proudlove, 2001), often leading to shifts in ecosystem structure and function.
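The compositional attributes described in this section (crustacean dominance and the proportion of stygobites) reduce to simple fractions of a sample's taxon list. The sketch below illustrates the arithmetic under stated assumptions: the record layout, function name and counts are hypothetical, and the stygobite measure is expressed as a fraction of total abundance rather than a ratio so that a sample with no stygoxenes does not cause division by zero.

```python
# Crustacean groups named in the text, plus ostracods and isopods.
CRUSTACEAN_GROUPS = {"Amphipoda", "Syncarida", "Copepoda", "Ostracoda", "Isopoda"}


def composition_metrics(records):
    """Summarise a stygofauna sample.

    records: list of dicts with keys 'group' (higher taxon),
    'stygobite' (True for obligate groundwater taxa) and 'count'.
    """
    total_taxa = len(records)
    total_count = sum(r["count"] for r in records)
    if total_taxa == 0 or total_count == 0:
        raise ValueError("sample contains no animals")

    crustaceans = [r for r in records if r["group"] in CRUSTACEAN_GROUPS]
    stygobite_count = sum(r["count"] for r in records if r["stygobite"])
    return {
        "crustacean_richness_fraction": len(crustaceans) / total_taxa,
        "crustacean_abundance_fraction":
            sum(r["count"] for r in crustaceans) / total_count,
        "stygobite_abundance_fraction": stygobite_count / total_count,
    }


# Hypothetical sample from a single bore.
sample = [
    {"group": "Amphipoda", "stygobite": True, "count": 12},
    {"group": "Copepoda", "stygobite": True, "count": 30},
    {"group": "Syncarida", "stygobite": True, "count": 5},
    {"group": "Acarina", "stygobite": True, "count": 2},
    {"group": "Oligochaeta", "stygobite": False, "count": 1},
]
metrics = composition_metrics(sample)
print(metrics)
```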

An apparently universal characteristic of healthy groundwater ecosystems is spatial and temporal heterogeneity of biota. Whereas environmental conditions such as water temperature may vary little over time, patterns in microbial activity and invertebrate abundance can vary considerably over weeks, months (Hose, unpublished data), seasons (Hancock & Boulton, 2009) and years (Eberhard et al., 2009). The significance of this heterogeneity is that it becomes difficult to detect ecological change using routine statistical approaches, or to predict the distribution and abundance of biota based on environmental attributes (Stanford & Gibert, 1994; Stein et al., 2010).

The ecosystem services of healthy groundwater ecosystems

Recent literature has highlighted the ecosystem services provided by groundwater ecosystems (Boulton et al., 2008; Griebler & Schmidt, 2009). Improvement to water quality, such as the removal of nitrogen, breakdown of organic contaminants and the assimilation of DOC, is perhaps the most valuable ecosystem service provided by groundwater ecosystems. This service is largely provided by the microbial assemblages (Griebler, 2001; Griebler & Lueders, 2009).

Stygofauna are considered to provide the service of maintaining porosity within aquifers (through bioturbation and burrowing), thereby enhancing the flow of water (Boulton et al., 2008). It has also been speculated that they contribute to water quality improvement by grazing on microbes, which in turn promotes microbial activity and hence purification capacity (Gounot, 1994; Chapelle, 2001). However, the low abundance of stygofauna, coupled with the low density of microbes, suggests that stygofauna grazing is likely to have little effect on water quality (Christian Griebler, Helmholtz Centre, Munich, personal communication). Overall, the provision of ecosystem services from stygofauna remains largely speculative and untested.

Groundwater ecosystem health and stress

Groundwater invertebrate assemblages are likely to have limited stability and resilience because they have evolved under very stable environmental conditions, with the biota accordingly having limited ranges of tolerance for environmental conditions (Humphreys, 2006). Furthermore, ecosystem function and stability are positively related to functional diversity of biota (Hulot et al., 2000; McCann, 2000); hence the truncated functional diversity among groundwater invertebrates (Gibert & Deharveng, 2002), in which specialist predators and grazers are rare and omnivores dominate, may render the ecosystem particularly unstable. The corollary is that the shared or similar function of stygofauna allows for functional redundancy, meaning one organism may take over a function if another declines, creating a situation of ‘biological insurance’ where removal of particular taxa is unlikely to cause a net loss in function (Griffiths et al., 2000).

The resilience and stability of microbial assemblages are largely dependent on the composition and structure of the assemblage, the nature of the disturbance, and the metrics used for its assessment (Botton et al., 2006). The paradigm of assemblage richness leading to greater stability and resilience, which has been shown to hold for soil microbial systems (e.g. Griffiths et al., 2001, 2004), does not always hold true in groundwater (Hashsham et al., 2000). Indeed, it may well be that temporal flexibility in composition provides greater functional stability (Fernandez et al., 2000), or that a specific component of the microbial assemblage is most responsible for stability and resilience (Botton et al., 2006). Various systems have been suggested to measure resilience of ecosystems. While the trophic network and modelling approaches of Ulanowicz (1992) and Costanza & Mageau (1999) have merit, our limited understanding of groundwater ecology and trophic links limits their application at the present time.

A summary of attributes of healthy groundwater ecosystems

Despite their inherent heterogeneity, there appear to be some basic attributes common to healthy groundwater ecosystems. As a generalisation, we expect healthy groundwater ecosystems to have:

  1. Where present, an invertebrate fauna dominated by crustaceans, with other groups present;
  2. High ratio of stygobites to stygoxenes;
  3. Absence of exotic species;
  4. Low levels of microbial diversity;
  5. Low levels of microbial activity;
  6. High ratios of attached to suspended microbes;
  7. Low concentrations of nitrogen and dissolved organic carbon (DOC);
  8. Absence of synthetic chemicals.

Development of a framework for groundwater ecosystem health

The measurement of ecosystem health is, in essence, the recording of relevant attributes at a site and the comparison of those to the values of attributes expected in the absence of disturbance. Current approaches to assessing ecosystem health can be divided broadly into predictive models and multimetric indices, which differ in how the ‘expected’ attributes are determined. Predictive models are used to predict the expected biological attributes based on environmental data, either from multivariate models based on data from a large number of undisturbed or reference sites (see Wright et al., 2000) or the application of environmental filters (sensu Chessman & Royal, 2004). Deviation of the observed attributes from the expected attributes is used to indicate changes in health. In contrast, multimetric models measure a series of attributes at a large number of undisturbed or ‘reference’ sites, with the variations in those attributes used to represent the range of acceptable conditions (Bailey et al., 2004) to which the observed values are compared.
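The multimetric idea of comparing observed values with the range seen at reference sites can be sketched as follows. This is only an illustration: taking the minimum and maximum of the reference values as the acceptable range is an assumption made here for simplicity (percentile bounds are another option), and the indicator names and values are hypothetical.

```python
def reference_ranges(reference_sites):
    """Derive an acceptable (low, high) range for each indicator.

    reference_sites: list of dicts mapping indicator name -> measured value.
    The range is simply the min and max across reference sites (assumed rule).
    """
    indicators = reference_sites[0].keys()
    return {
        name: (
            min(site[name] for site in reference_sites),
            max(site[name] for site in reference_sites),
        )
        for name in indicators
    }


def outside_range(test_site, ranges):
    """Return the indicators at a test site that fall outside the range."""
    return sorted(
        name
        for name, (low, high) in ranges.items()
        if not low <= test_site[name] <= high
    )


# Hypothetical measurements at four reference sites and one test site.
reference = [
    {"nitrate_mg_l": 0.10, "doc_mg_l": 1.0},
    {"nitrate_mg_l": 0.30, "doc_mg_l": 1.4},
    {"nitrate_mg_l": 0.20, "doc_mg_l": 0.8},
    {"nitrate_mg_l": 0.25, "doc_mg_l": 1.2},
]
test_site = {"nitrate_mg_l": 1.40, "doc_mg_l": 1.1}

ranges = reference_ranges(reference)
flagged = outside_range(test_site, ranges)
print(ranges)
print("Outside reference range:", flagged)
```

With few reference sites the min-max range is narrow and sensitive to single values, which is one reason a larger reference set is preferable for benchmark setting.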

Both predictive and multimetric approaches are widely used in river assessment (e.g. Wright et al., 2000; Hering et al., 2006), and indeed, both have been applied with some success in groundwater ecosystems (Castellarini et al., 2007b; Steube et al., 2009). Each approach has advantages and disadvantages in the information contained (or lost) and the subsequent interpretation of the final metric (see Bonada et al., 2006). However, the heterogeneous nature of groundwater biota and the current poor understanding of biotic relationships to environmental gradients in aquifers complicate the development of predictive models. Consequently, we have adopted the multimetric approach for ecosystem health assessment, but acknowledge that predictive models may prove more suitable as our knowledge of groundwater ecosystems grows.

The process of developing a multimetric index follows three main steps: (1) indicators are chosen, (2) benchmarks for those indicators are set, and (3) the outcomes for each indicator are combined into a single, final statistic (Hering et al., 2006). This approach is particularly appropriate for ecosystem health assessment because it allows a range of different indicators (reflecting the various components of ecosystem health) to be incorporated into a final indicator value. The remainder of this paper describes and discusses the steps to create a multimetric index for ecosystem health, as part of a tiered framework for assessing ecosystem health (Fig. 1), culminating in a case study to demonstrate its efficacy in discriminating disturbances.
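The three steps can be illustrated with a minimal scoring sketch. The pass/fail scoring, the equal weighting of indicators and all names and benchmark values below are assumptions chosen for illustration; they are not the scoring scheme of this framework or of Hering et al. (2006).

```python
# Steps 1 and 2 (assumed): chosen indicators with benchmark ranges (low, high).
benchmarks = {
    "nitrate_mg_l": (0.0, 0.5),
    "stygobite_fraction": (0.8, 1.0),
    "crustacean_fraction": (0.5, 1.0),
}


def score_indicators(observed, benchmarks):
    """Score each indicator 1 if it meets its benchmark range, else 0."""
    return {
        name: 1 if low <= observed[name] <= high else 0
        for name, (low, high) in benchmarks.items()
    }


def combined_index(scores):
    """Step 3 (assumed rule): the proportion of indicators meeting benchmarks."""
    return sum(scores.values()) / len(scores)


# Hypothetical observations from one test site.
observed = {
    "nitrate_mg_l": 1.4,
    "stygobite_fraction": 0.9,
    "crustacean_fraction": 0.6,
}

scores = score_indicators(observed, benchmarks)
index = combined_index(scores)
print(scores)
print(f"Combined index: {index:.2f}")
```

Keeping the per-indicator scores alongside the combined value preserves the information on which component of health has departed from its benchmark.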

Fig. 1 A tiered framework for assessing groundwater ecosystem health

Step 1. Selecting indicators

This step may be the most difficult to complete, but among the most important in the framework. It involves choosing the attributes of groundwater ecosystems that are to be measured and assessed as indicative of an aspect of health. Ideally, the indicators chosen should encompass all aspects of ecosystem health. Thus, the attributes used to assess ecosystem health should include those that reflect (1) the condition of the ecosystem and (2) the level of stress the ecosystem is under (Vugteveen et al., 2006). Ideally, these should also include those that reflect current (‘snapshot’) conditions (e.g. water quality) and those that summarise conditions over a longer period (e.g. biota, sediment quality).

Importantly, the indicators used must be sensitive to disturbance, broadly applicable across aquifers and regions, and ideally simple and easy to measure. They should also cover the main biotic groups (microbes, microfauna, meiofauna and macrofauna) either separately or together at the ecosystem level. Accordingly, we propose several categories of indicators and require that indicators from each category are included in the final metric of ecosystem health. Rules for indicator selection are outlined in Fig. 1, with examples of each given in Table 2 and discussed in the following sections. Importantly, if more than one indicator is included per category, indicators should not duplicate the information provided (e.g. measuring total nitrogen and nitrate concentrations), as this both wastes resources and may bias the final metric.
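One simple way to screen candidate indicators for duplicated information is to look for pairs that are strongly correlated across sites. The sketch below does this with Pearson's correlation coefficient; the approach, the 0.9 cut-off and the site values are assumptions for illustration and are not part of the selection rules in Fig. 1.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("an indicator with no variation cannot be correlated")
    return cov / sqrt(var_x * var_y)


def redundant_pairs(indicators, threshold=0.9):
    """List indicator pairs whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(indicators), 2)
        if abs(pearson_r(indicators[a], indicators[b])) >= threshold
    ]


# Hypothetical values of three candidate indicators at five sites.
candidates = {
    "total_nitrogen": [1.0, 2.0, 3.0, 4.0, 5.0],
    "nitrate": [0.8, 1.7, 2.9, 3.6, 4.9],
    "doc": [2.0, 1.0, 3.0, 1.5, 2.5],
}
pairs = redundant_pairs(candidates)
print("Potentially redundant pairs:", pairs)
```

A flagged pair is a prompt to keep only one of the two indicators, with the choice guided by cost and ecological relevance rather than by the correlation alone.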

Ecosystem condition indicators

In accordance with our definition of ecosystem health, indicators of ecosystem condition must cover both the function (such as metabolism and resilience) and organisation (structure and composition). Of the many potential indicators of ecosystem condition, few have been applied in freshwater systems (Vugteveen et al., 2006). Table 2 lists a range of indicators of function and organisation that have been applied to groundwaters.

(i) Functional indicators The functioning of aquatic ecosystems may be considered in terms of activity, metabolism or primary production and includes the measurement of ecosystem services and resilience. As primary production is generally absent, groundwater ecosystems are reliant on and limited by external sources of carbon for energy. As a result, DOC concentrations in groundwater reflect the energy input and potential ecosystem activity. Although not all DOC is bioavailable, DOC concentrations in groundwater have been correlated with important ecosystem functions such as denitrifying activity (Cannavo et al., 2004), and broader microbial activity (Mauclaire et al., 2000). Interestingly though, DOC is often not well correlated with invertebrate abundance over a moderate DOC gradient. In cases of significant DOC enrichment (e.g. stormwater infiltration or contamination), however, strong positive and negative relationships between DOC and invertebrates have been observed (Masciopinto et al., 2006). Given its relationship to microbial activity, it seems that DOC may be a useful indicator of ecosystem function. However, in light of the uncertainty of the relationship across moderate gradients and the uncertain bioavailability of some DOC, it may be more viable to measure ecosystem activity more directly.

Knowledge of microbial assemblages in soil and surface waters, and the commonality of some taxa with groundwater (Griebler & Lueders, 2009), provides perhaps the strongest base for assessing ecosystem function and activity in aquifers. There are numerous methods for detecting microbial activity and diversity in groundwaters (see Goldscheider et al., 2006 for extensive summary), many of which have great potential, particularly as they can capture the in situ activity of microorganisms that cannot be cultured. For example, adenosine tri-phosphate (ATP) activity is being promoted as a robust technique for estimating microbial activity and biomass in groundwaters (Eydal & Pedersen, 2007). Given the many methods available to assess microbial assemblages, it is not surprising that there is little consensus as to the most appropriate methods for use in aquifers. Furthermore, any single method is rarely sufficient to characterise microbial assemblages and rather, a combination of techniques is preferred.

Currently, the most reliable techniques for establishing microbial activity in groundwater are based on the study of the community as a whole, rather than individuals (Goldscheider et al., 2006). Microbial enzyme activity is routinely measured in soils and surface waters, including the hyporheos (e.g. Claret & Boulton, 2003), but has seen limited application in aquifers, perhaps due to the lower levels of microbial activity in aquifers compared to surface waters (Griebler & Lueders, 2009). The activity of enzymes, as determined by the metabolism of particular carbon substrates, can provide a metabolic profile, indicating both the level of activity and functional diversity of the assemblage (Preston-Mafham et al., 2002). So-called community-level physiological profiling (e.g. Biolog™ Ecoplates) has been used to successfully indicate differences between polluted and non-polluted groundwater samples (Fliermans et al., 1997; Röling et al., 2000; de Lipthay et al., 2004).

A further measure of functional activity is the rate of organic matter decomposition. The deployment of organic matter such as leaf litter or cotton strips in soil and streams is commonplace (e.g. Boulton & Boon, 1991; Boulton & Quinn, 2000), with the loss in biomass or tensile strength (in the case of cotton strips) over time used as an indicator of microbial activity. Although this technique has not often been applied to groundwater, recent studies have shown it is a useful surrogate for microbial activity in groundwaters (Lategan et al., 2010). However, its dependence on a relatively small component of the microbial flora may limit its use in broad-scale groundwater monitoring (Lategan et al., 2010). The method has merit in being simple, cheap (Lategan et al., 2010) and able to distinguish sites of different agricultural land uses (Korbel, unpublished data).

Most functional indicators used in groundwater to date are based on microbial activity, with invertebrate activity rarely quantified. Simple estimates of invertebrate biomass are rarely quoted, but this may well reflect uncertainty in how well the biomass from bore samples reflects that in the broader aquifer (Hakenkamp & Palmer, 1992; Hahn & Matzke, 2005; Hancock & Boulton, 2009). The relatively greater abundances of some taxa within the bores may reflect preferential colonisation by those taxa (Hahn & Matzke, 2005) and may mean that biomass from bores is not a particularly meaningful index, at least until the relationship between bore and aquifer samples can be better quantified.

The measurement of ecosystem services is essential in any assessment of ecosystem health. Inherently, these services fall under the functional indicator group of indices. The simplest measure of ecosystem services is to determine the presence or absence (or abundance) of specific taxa that provide services (Boulton et al., 2008; Creuze des Chatelliers et al., 2009). For water quality improvements, the genes or proteins related to microbial functions can be quantified (Groffman et al., 2006). However, the service of flow improvement in aquifers is more difficult to quantify. Presumably, the greater the stygofauna abundance or biomass, the greater the level of this service provided, suggesting that simple biomass measures may be a suitable surrogate measure of this service.

(ii) Organisational indicators Organisational indicators should reflect the structure and composition of the biotic community present. The simplest measures of community organisation are taxonomic richness and abundance. Numerous studies have shown differences in the abundance, relative abundances, richness and diversity of microbial assemblages across natural (e.g. Findlay et al., 1993; Findlay & Sobczak, 2000; Franklin et al., 2000) or contamination gradients (Cho & Kim, 2000; Griebler et al., 2002; Humphries et al., 2005; Anneser et al., 2010). The diversity and abundance of microbial assemblages can be measured through a variety of techniques including molecular profiling (Shi et al., 1998; Cho & Kim, 2000; Anneser et al., 2010; Stein et al., 2010), laboratory culturing and identification or morphotyping (e.g. de Lipthay et al., 2004, but with the likelihood of missing some uncultivable taxa; see Griebler & Lueders, 2009), and measuring the ability of microbes to utilise different carbon sources (e.g. Biolog™ Ecoplates; Röling et al., 2000).

Invertebrate richness in groundwater has proven to be sensitive to changes in water quality (e.g. Sinton, 1984; Culver et al., 1992) supporting their use as indicators of groundwater health (e.g. Stein et al., 2010). Although few ecotoxicology tests have been completed for groundwater species, various groups are believed to be more pollution tolerant (Ward et al., 1992; Notenboom et al., 1994; Pospisil, 1994; Lafont et al., 1996) and others more pollution sensitive (e.g. Amphipoda, Notenboom et al., 1994). Recent studies also suggest that particular invertebrate taxa may provide good surrogates for overall richness and diversity within an aquifer (Galassi et al., 2009a; Stoch et al., 2009).

Incorporating both richness and abundance, diversity indices are used throughout ecology for the comparison of biotic assemblages (Washington, 1984). Many are readily applicable to the quantitative data likely to arise from groundwater studies, whether from morphological or molecular-based taxonomies or from morphospecies and functional classifications (e.g. Claret et al., 2001). Diversity indices have been used for analysis of interstitial (hyporheic) faunas (e.g. Claret et al., 1999; Marmonier et al., 2000; Mary & Marmonier, 2000), but appear little used for other groundwater ecosystems. A potential limitation of diversity indices may be that data for different biological groups cannot be combined if they are quantified using different methods (see Müller et al., 2002).
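As an illustration of how richness and abundance are combined in a single index, the following minimal sketch computes the Shannon index and Pielou's evenness from counts of individuals per taxon. These two indices are chosen here only as familiar examples and are not necessarily those used in the studies cited above; the function names and the sample counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

def pielou_evenness(counts):
    """Evenness J' = H' / ln(S), where S is the number of taxa present."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0  # evenness is undefined for fewer than two taxa; report 0
    return shannon_index(counts) / math.log(richness)

# Hypothetical counts of individuals per taxon from a single bore sample.
sample_counts = [12, 5, 3, 0, 1]
print(f"richness = {sum(1 for c in sample_counts if c > 0)}")
print(f"H' = {shannon_index(sample_counts):.3f}")
print(f"J' = {pielou_evenness(sample_counts):.3f}")
```

Because the index depends only on proportions, counts obtained by different quantification methods would need to be analysed separately, in line with the limitation noted above.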

Stress indicators

A stressor may be present prior to any change in ecosystem health, but will be absent from healthy, undisturbed systems. As a result, the presence of stressors may be an actual or early indicator of impaired ecosystem health. Examples of likely stressors in groundwater ecosystems are provided in Table 2, which serves as a starting point for the range of stress indicators that may be used. The number of possible stressors acting in an ecosystem is likely to be many, and it will be beyond the capacity of most research and monitoring programs to measure all of these. Instead, selection of stress indicators should be targeted at those that pose the greatest risk, i.e., those that are most likely to occur or likely to be most severe in their impact.

Step 2. Setting benchmarks and the reference condition

The multimetric approach adopted here requires benchmarks for indices to be set according to the values recorded at a number of reference sites. Importantly, reference sites must be matched with test sites, having similar environmental conditions (see Table 1) but free from disturbance. However, as with most ecosystems, there may no longer be any aquifers free from human disturbance, in which case minimally disturbed sites or ‘natural target conditions’ should be identified and used for reference (Griebler et al., 2010). In the same way, the reference condition approach (Bailey et al., 2004) applied in river health assessment identifies reference sites as being the ‘best available’ (i.e. not necessarily undisturbed or pristine), but this approach has been criticised for providing ill-defined health benchmarks (Chessman & Royal, 2004). However, methods for measuring ecosystem health in the absence of reference sites (e.g. Chessman & Royal, 2004) require intimate knowledge of the physiological and behavioural traits and environmental tolerances of groundwater taxa. Such data are currently lacking for all groundwater taxa, leaving the reference condition approach as the current best means for benchmark setting.

The use of reference sites in groundwater is complicated by the limited availability of bores, especially in undisturbed locations (Griebler et al., 2010; Stein et al., 2010). Most bores used for water supply or monitoring are located in agricultural, mining or urban areas, with few located in undisturbed areas suitable for use as reference sites. Furthermore, the expense of constructing new bores solely for monitoring purposes in remote locations is likely to be prohibitive, and issues of groundwater contamination from construction further complicate the search for adequate reference locations (Chapelle, 2001). Finally, the hydrological complexity of aquifers means that the connectivity of different subterranean water bodies cannot be guaranteed without detailed, expensive hydrological investigations, thereby further complicating reference site selection.

Heterogeneity of biota within and between aquifers (e.g. Stanford & Gibert, 1994; Danielopol et al., 2000; Griebler & Lueders, 2009) and over time also complicates selection of sites. The effect of this heterogeneity is to limit the ability to detect deviation from reference condition using routine statistical approaches. In order to maximise the similarity among reference and test sites, it is desirable that they are within the same aquifer. However, this may be complicated by the large number of reference sites likely to be needed to provide statistical power, and the potential hydrological connectivity of nearby sites may invalidate them as independent reference sites.

Temporal variation in reference site conditions may be accounted for by sampling reference and test sites at the same time, although it is desirable that repeated samples of both test and reference sites are made in order to (1) more reliably assess biodiversity (e.g. Hancock & Boulton, 2009) and (2) identify the breadth of conditions encountered under the ‘reference condition’. Repeated temporal sampling of both test and reference sites is consistent with a BACI-type experimental design (Underwood, 1997), which is widely recommended for environmental impact assessment. However, targeted and detailed quantitative analysis of selected indicators may be necessary in such instances, because the combined multimetric index of ecosystem health may not have the fine discriminatory power necessary for environmental impact studies.

In the multimetric framework provided, benchmarks are set using data collected from the reference sites, thus taking into account natural background levels of parameters. A variety of statistical analyses that include the range, maximum or minimum values, percentile ranges or a measure of central tendency (e.g. mean, median, mode) are used to set upper and/or lower benchmark values for individual indices. Benchmarks may also be determined based on expert judgement where necessary (Hering et al., 2006). The type of analysis used is determined by the nature of the indicator being studied and the type of data available. Data quality is critical in setting benchmark values; quality control procedures should include screening the data for extreme or outlying values that may be removed before analysis.
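The benchmark-setting options described above can be sketched as follows, assuming a small set of reference-site values for a single index. The function names, the choice of the 10th and 90th percentiles and the reference values are all hypothetical and serve only to illustrate the alternatives (range versus percentile range); they are not the benchmarks used in this study.

```python
import statistics

def range_benchmark(reference_values):
    """Lower and upper benchmark taken as the minimum and maximum reference value."""
    return min(reference_values), max(reference_values)

def percentile_benchmark(reference_values, lower_pct=10, upper_pct=90):
    """Lower and upper benchmark taken as percentiles of the reference data."""
    # With n=100, statistics.quantiles returns the 1st to 99th percentile cut points.
    cuts = statistics.quantiles(reference_values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

# Hypothetical values of one index recorded at reference sites.
reference_values = [0.4, 0.6, 0.5, 0.9]

print("range benchmark:     ", range_benchmark(reference_values))
print("percentile benchmark:", percentile_benchmark(reference_values))
print("median:              ", statistics.median(reference_values))
```

A percentile range is less sensitive than the full range to a single extreme reference value, which is one reason the data should be screened for outliers before either is calculated.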

Step 3. Generation of multimetric index

Results of indices from test sites are compared to the set benchmarks and then, with the goal of providing a single measure of ecosystem health, the individual indices are aggregated. Common approaches are to normalise and weight each index according to its ecological importance before aggregating them into a single value (Hering et al., 2006). However, limited knowledge of how groundwater ecosystem structure and function respond to various perturbations makes it difficult to put in context any deviation from reference condition. Accordingly, we have adopted a simple approach based on pass/fail criteria for each metric. The number of indices that ‘fail’ when compared with benchmarks is summed, and the proportion of failed comparisons is used both as a final indicator of ecosystem health and as a means of ranking sites.
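The pass/fail aggregation can be sketched as below. This is a minimal illustration under stated assumptions: each benchmark is represented as a (lower, upper) pair in which either bound may be absent, and the index names, benchmark values and site values are hypothetical rather than taken from the case study.

```python
def index_fails(value, lower=None, upper=None):
    """Return True when a test-site value falls outside its benchmark bounds."""
    if lower is not None and value < lower:
        return True
    if upper is not None and value > upper:
        return True
    return False

def health_score(site_values, benchmarks):
    """Return (number of failed indices, proportion of failed indices) for one site."""
    failed = sum(
        index_fails(site_values[name], *benchmarks[name]) for name in benchmarks
    )
    return failed, failed / len(benchmarks)

# Hypothetical benchmarks as (lower, upper); None means that bound is not used.
benchmarks = {
    "nitrate_mg_per_l": (None, 1.0),
    "taxon_richness": (2, 6),
    "cotton_strength_loss_pct": (5.0, 20.0),
}
# Hypothetical index values at two test sites.
sites = {
    "site_A": {"nitrate_mg_per_l": 0.8, "taxon_richness": 3, "cotton_strength_loss_pct": 12.0},
    "site_B": {"nitrate_mg_per_l": 4.2, "taxon_richness": 9, "cotton_strength_loss_pct": 15.0},
}

scores = {name: health_score(values, benchmarks) for name, values in sites.items()}
# Rank sites from fewest to most failed indices (by proportion failed).
ranked = sorted(scores, key=lambda name: scores[name][1])
for name in ranked:
    failed, proportion = scores[name]
    print(f"{name}: {failed} of {len(benchmarks)} indices failed ({proportion:.2f})")
```

Every index carries equal weight in this count, which reflects the absence of information on which to base a weighting rather than a judgement that all indices are equally important.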

Combining the multimetric index with a tiered framework approach

The above section has outlined a range of potentially relevant and sensitive indicators, which could together provide an index of groundwater ecosystem health. We now draw on these, and our earlier discussion of the generic biotic and abiotic attributes of healthy groundwater ecosystems, and merge these into a framework for assessing the ecosystem health of groundwater (Fig. 1).

We propose a two-tiered system for health assessment, with the choice of tier dependent on the level of information required and the information, resources and expertise available (cf. Tomlinson et al., 2007). The first tier provides a preliminary assessment that is based and benchmarked on the generic features of healthy groundwater ecosystems described earlier (Table 3), although where existing local data are available (e.g. water quality), Tier 1 benchmarks may be adjusted based on local conditions. This level serves as a preliminary screen for groundwater health. It is intentional that Tier 1 indices require minimal technical expertise for sample collection and analysis, with more complex tasks (such as chemical analyses) done routinely by analytical laboratories. A ‘No’ answer to any of the questions or exceedance of threshold values (Table 3) should prompt a Tier 2 assessment (Fig. 1), although Tier 1 assessment will indicate a deviation from reference condition and allow ranking of sites, albeit with coarse resolution.
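The Tier 1 decision rule (any ‘No’ answer or any threshold exceedance prompts Tier 2) can be expressed as a short screening function. The sketch below is illustrative only: the question names, the measured variable and the threshold value are hypothetical placeholders and do not reproduce the contents of Table 3.

```python
def tier1_screen(answers, measurements, thresholds):
    """Return the list of Tier 1 triggers; an empty list means no Tier 2 prompt."""
    # Any question answered 'No' (False) is a trigger.
    triggers = [question for question, is_yes in answers.items() if not is_yes]
    # Any measured value above its threshold is also a trigger.
    triggers += [
        name for name, value in measurements.items()
        if name in thresholds and value > thresholds[name]
    ]
    return triggers

# Hypothetical Tier 1 responses for one site.
answers = {"stygofauna_present": True, "synthetic_chemicals_absent": False}
measurements = {"nitrate_mg_per_l": 0.3}
thresholds = {"nitrate_mg_per_l": 1.0}

triggers = tier1_screen(answers, measurements, thresholds)
print("Tier 2 assessment prompted:" if triggers else "No Tier 2 prompt:", triggers)
```

Returning the list of triggers, rather than a single yes/no result, keeps the coarse ranking of sites possible, since sites can be ordered by the number of triggers recorded.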

Tier 2 assessments require more detailed investigation, including the selection of indicators and sampling of reference sites in order to set locally relevant benchmarks. Tier 2 assessments will require greater effort, cost and expertise, but will provide a more robust assessment of health.

Application of the framework: a case study

Tier 1 assessment

Samples were collected from the five test sites (for details see “Methods”) and compared against the generic benchmarks proposed in the framework (Table 3). The Tier 1 assessment indicated that three of the five test sites failed the nitrate benchmark, with another site containing synthetic chemicals. Thus, Tier 1 assessment indicated that four of the five sites had impaired ecosystem health, triggering a Tier 2 assessment. The test site located outside the agriculture region (Site 5) was in similar health to the reference site; however, for illustrative purposes a Tier 2 assessment was also performed on this site.

Tier 2 assessment

1. Choice of indicators Microbial and invertebrate assemblages were examined as indicators of ecological condition, along with water quality variables as indicators of condition and stress (Table 4). The indices chosen required one sampling visit per site, with an additional visit to retrieve cotton strips left in situ (see Lategan et al., 2010). However, we recommend that any definitive study of groundwater health should involve more than one sampling occasion at each site, in order to address likely biotic heterogeneity (e.g. Hancock & Boulton, 2009). Indices chosen included six representing ecological condition (four functional and two organisational indices) and three stress indices.

Table 4 Indices and their thresholds developed as a case study of Tier 2 indicators for aquifer ecosystem health assessment in the Gwydir R valley, NSW, Australia

2. Benchmark setting and multimetric analysis Benchmarks were set for each of the nine indices based on the range of data collected or the presence/absence of particular conditions among the four ‘reference’ sites. The indices for each of the five ‘test’ sites were calculated and compared to the reference benchmarks, and the number of fails for each site was summed (Table 4) and used as the basis for site health ranking. Site 3 had the most non-compliance, with six out of nine indices failing benchmark levels. All other impacted sites had fewer failed indices (Table 4).

Discussion

From the case study, it is evident that the tiered framework provided is able to distinguish between impacted and non-impacted sites (Table 4). Both Tier 1 and Tier 2 assessments indicated that the test sites located in irrigated cropping regions had impaired ecosystem health and the ‘reference condition’ test site (Site 5) was in relatively good health (Table 4).

Benchmarks associated with invertebrate assemblages were the most frequently exceeded in the case study. Only low richness and abundances of invertebrates were recorded across reference sites, and indeed, some reference sites had no invertebrates, but this probably reflects the heterogeneity of the assemblages and highlights the need to sample a bore more than once to adequately assess the richness of that site (Eberhard et al., 2009; Hancock & Boulton, 2009). The richness and abundance of invertebrates at test sites were generally greater than at reference sites, which is consistent with the intermediate disturbance hypothesis (Connell, 1978) which predicts higher richness and abundance in the presence of mild nutrient enrichment (here DOC and nitrate concentrations).

Further in the case study, the abundance index was exceeded at all nominally impacted sites, although abundances at Sites 1 and 2 were close to the threshold values (Table 4). Using the range for this index, together with the small sample size, may be too limiting in that it may not adequately account for the range of natural variability. Here a 95% confidence interval, having a potentially greater threshold range, may better discriminate Sites 1, 2 and 5 (having relatively lower abundances, similar to the reference sites) from Sites 3 and 4, which were clearly higher. The taxonomic richness index may be similarly conservative and may also be better served with a broader threshold range. Water quality variables appeared to be useful indicators of condition, particularly nitrate and the presence of pesticides.
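To illustrate why a 95% confidence interval can give a broader threshold than the observed range when there are very few reference sites, the sketch below compares the two for a set of four values. It assumes that the interval intended is a t-based confidence interval for the reference mean, which is one possible reading; the abundance values are hypothetical and are not the case study data.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for small degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def range_threshold(values):
    """Threshold bounds taken as the observed minimum and maximum."""
    return min(values), max(values)

def ci95_threshold(values):
    """Threshold bounds taken as a t-based 95% confidence interval for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = T_CRIT_95[n - 1] * statistics.stdev(values) / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical invertebrate abundances at four reference sites.
reference_abundance = [0, 1, 2, 3]

lo, hi = ci95_threshold(reference_abundance)
print("range threshold: ", range_threshold(reference_abundance))
# A negative lower bound has no meaning for abundance, so it is reported as zero.
print(f"95% CI threshold: ({max(lo, 0.0):.2f}, {hi:.2f})")
```

With only four reference values the large critical value of t makes the interval wider than the observed range, so a site slightly above the maximum reference abundance would no longer fail. This behaviour reverses as the number of reference sites grows, because the interval for the mean narrows while the range can only widen.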

Conclusion

Our definition and framework provide an initial structure to guide ecosystem health assessments in groundwaters. There remains much work to do in order to further develop the indicators and metrics and indeed, there are a number of indicators already in use in groundwaters and many more developed for surface waters that would be readily applicable. What is currently lacking is a detailed understanding of how indicators, or changes in indicator values, either together or separately, reflect the overall structure and functioning of groundwater ecosystems. Indeed, these limitations have constrained our final ecosystem health metric to being a count of failed indicators, rather than a grading of each indicator and a weighted total of those values.

Our framework outlines a series of indicator types that are necessary to cover the major components of groundwater ecosystems. The framework is flexible, both in the level of detail and effort required (i.e. the tier used), and the specific indicators used, which can be selected to suit local conditions, issues and ecosystem types. In our case study, the framework could distinguish between nominally healthy and impacted sites at both tiers of assessment. However, this framework requires broader application and testing, particularly in other regions and aquifer types. We hope that it can be widely applied, and improved and refined by the diversity of inputs as a result.