Introduction

Halting the loss of biodiversity on Earth is a prerequisite to avoiding impairment to, or loss of, ecosystem services that are either directly or indirectly dependent on biodiversity. According to the Millennium Ecosystem Assessment (MA 2005b), biodiversity loss is linked to “the degradation of many ecosystem services [and] could grow significantly worse during the first half of this century […]”. Ecosystem services, such as food and wood production, self purification and nutrient cycling, are based on ecosystem functions and underlying processes (e.g., production and decomposition of organic material). There is some evidence that ecosystem functions are related to biodiversity (Loreau 2000; Diaz et al. 2006; Luck et al. 2009), even if these linkages are still not well-understood and difficult to quantify (see Srivastava and Vellend 2005 for a review). Despite this knowledge gap, large-scale assessment and monitoring of biodiversity and ecosystem services in general require the development and application of indicators. Suitable indicators can help detect changes in ecosystems and the subsequent provision of ecosystem services over time and, thus, help policy and decision makers take appropriate actions to stop and reverse negative trends. But what exactly is a suitable indicator?

In the context of this study, ‘suitable’ refers to various characteristics that render an indicator more or less suited for its specific purpose, i.e. biodiversity indication with a particular focus on, or relationship to, ecosystem service indication. As a general prerequisite, indicators need to be reliable measures capable of simplifying complex relationships and they should be ecologically interpretable (for general suitability criteria see also McGeoch 1998; Dale and Bayeler 2001; Duelli and Obrist 2003). Further, in order to easily communicate indication results and trends to policy and decision makers, indicators should be quantifiable and transparent (Balmford et al. 2005). Last but not least, indicators need to fit the purpose of indication. This purpose refers to what Failing and Gregory (2003) named ‘endpoints’ of indication. Endpoints in biodiversity and ecosystem service indication can be manifold and ultimately determine the (type of) indicator to be applied—be it abiotic or biotic measures, single keystone species or entire community measures, measures of taxonomic and functional diversity, market prices or measures of the production rate of a good. A logical core criterion in biodiversity assessment and monitoring, however, is the biological relevance of indicators, i.e. indicators should be proven to show a measurable and quantifiable relationship to biodiversity (e.g., Balmford et al. 2005). Furthermore, this relationship should ideally be direct and should account for different aspects (or types) of biodiversity as, for instance, outlined by Noss (1990): taxonomic richness, composition and structure, functional (trait) diversity, and genetic diversity.

Conceptual frameworks can help facilitate and structure indicator development, such as the Driver-Pressure-State-Impact-Response (DPSIR) framework (e.g., EEA 2007) and the SMART approach (Shahin and Mahbod 2007). The DPSIR framework is applied in large-scale ecosystem assessment and biodiversity monitoring to capture and describe the relationships between society and the environment. Knowledge of the relationships between the D, P, S, I and R components enables policy makers to link indicators for societal drivers, environmental pressures and biological diversity, which assists the identification of appropriate policy actions to halt the loss of biodiversity (see Rounsevell et al. 2010). The SMART approach defines five criteria that could be applied to set management goals, i.e. the goals should be specific, measurable, attainable, realistic and time-sensitive (e.g., Shahin and Mahbod 2007). Three attributes (specific, measurable and time-sensitive) also apply to the indicators necessary to measure and assess progress towards these management goals.

However, despite the availability of such conceptual frameworks and a considerable body of literature on indicator development (see Feld et al. 2009 for a recent review), many indicators of biodiversity and ecosystem services do not meet these general suitability criteria. The review by Feld et al. (2009) revealed several shortcomings with current indicator initiatives and indication approaches, including, amongst others, a lack of biodiversity indicators that account for the different components of biodiversity (e.g., a focus on species richness only), a lack of direct linkages of indicators to biodiversity and a lack of indicators to quantify ecosystem services (see also Feld et al. 2008 for a more detailed report). Some of these ongoing shortcomings of ecosystem indication towards the 2010 goal have also been criticised by Balmford et al. (2005) in a generic capacity, and much of the criticism remains relevant.

In 2003 the Convention on Biological Diversity (CBD) proposed a list of indicators to track trends in ecosystem biodiversity and related ecosystem services at the global scale (UNEP/CBD/COP7 2003). The European initiative to “Streamline European Biodiversity Indicators by 2010” (SEBI 2010, EEA 2007) has recently presented its first list of 26 European Biodiversity Headline Indicators (EEA 2009, review draft Annex 4, 19 Dec 2009). However, only very few of the proposed headline indicators show a direct relationship to the State of, and Impact on, biodiversity, while the majority of indicators rather refer to the Pressure and Response components of the DPSIR framework. Furthermore, bioindication (i.e. the use of biological indicators) is often restricted to relatively fine scales, namely regional to local, while area and other abiotic measures are applied at larger scales up to the global scale (Feld et al. 2009). Comparatively few studies directly indicate and assess regulating and supporting ecosystem services (i.e. ecosystem processes that underpin many provisioning services) (e.g., Díaz et al. 2007; Lara et al. 2009), while provisioning and, in particular, market-related services such as food and timber are easily indicated and assessed by monetary measures or simply by the volume/weight of a good.

An almost neglected issue in biodiversity and ecosystem service indication is the setting of benchmark or reference values. The reference condition approach is a key element in European ecosystem monitoring and management frameworks (e.g., Directive 2000/60/EC) and allows for the indication of the deviation of current conditions from desired natural (target) conditions (e.g., Nijboer et al. 2004). The necessity for setting reference conditions (e.g., targets for conservation, rates of ecosystem services) has already been addressed by Balmford et al. (2005) in the context of the CBD indicators of biodiversity, but the approach continues to be missing in the list of CBD and SEBI 2010 indicators. Consequently, the indicators will continue to focus on the detection of trends until such targets or thresholds are identified and applied in biodiversity and ecosystem service assessment.

Given these ongoing shortcomings in indicator development, an attempt is made in this study to contribute to existing conceptual frameworks. First, a set of indicator suitability criteria is framed, based on the indication requirements as outlined above. The criteria are then applied to a selection of existing indicators of biodiversity and ecosystem services in order to evaluate their practicability. We discuss the rationale behind these criteria and their justification for successful future ecosystem indication and show how the criteria might help render future indicators of biodiversity and ecosystem services more relevant towards the 2010 goal. Finally, suggestions are made for actions to improve existing indicators to meet the criteria and to ultimately meet the purpose of indication.

The framework of suitability criteria

We defined seven criteria to assess the general suitability of existing indicators of biodiversity and ecosystem services. The criteria are listed in Table 1, while their interdependence is illustrated in Fig. 1. These criteria provide a general checklist for indicator development and testing, which, if applied consistently, will assist the development and application of indicators of biological relevance across spatial scales.

Table 1 Indicator suitability criteria and rationale
Fig. 1 Schema showing the interdependence of the indicator suitability criteria. Note that the criteria do not represent a hierarchical order from the top to the bottom. ① For many indicators the spatial scale refers to the scale of application of the indicator, which might be different from the scale of measurement. Upscaling of fine-scale indicators (e.g., local/regional richness measures) is often easily possible by aggregation, in particular if expressed as a relative measure. In contrast, downscaling of broad-scale indicators (e.g., area of organic farming at national/sub-continental scale) is often difficult or even impossible if targeting fine scales (local, a single patch or farm). ② Indicators derived from remote sensing (e.g., ecosystem area and fragmentation using CORINE data) are easily upscaled, whereas downscaling is limited by the spatial resolution of the data source. ③ The definition of threshold or reference values for biodiversity or ecosystem service rates provides the opportunity to derive relative measures, such as the ratio of observed to expected values (O/E). ④ O/E ratios and other relative measures, in contrast to trend monitoring, provide a means for assessment, even of a single field measure, and ⑤ could be easily aggregated to larger spatial scales (e.g., relative number of endangered species per ecosystem at regional, national and sub-global scales). ⑥ The application of indicators in routine monitoring requires the availability of both data and protocols to sample or otherwise derive such data. This is likely to differ considerably between countries. Remote sensing offers both data and the procedures to derive and interpret suitable measures (indicators), which are largely comparable up to the global scale.

The first criterion addresses the purpose of indication and tests whether the (ultimate) purpose of indication (sensu Failing and Gregory 2003) has been defined and, if so, whether the indicator might potentially meet the purpose for which it was developed. For instance, biodiversity monitoring and assessment constitute two different purposes: while target (reference or benchmark) biodiversity values would be required for an assessment (i.e. valuation), biodiversity monitoring does not necessarily require such targets. Other purposes might include ecosystem service indication, ecological quality assessment or ecosystem management (Table 1). Distinguishing between biodiversity and ecosystem service indication might be difficult, since both are frequently reported to be closely associated (e.g., Srivastava and Vellend 2005). Yet, to what degree biodiversity actually affects ecosystem service provision remains unclear.

The second criterion addresses the indicator type and is linked to the purpose of indication. Following the DPSIR framework, five basic objectives are distinguished: indication of Drivers, Pressures, States, Impacts and Responses. In general, with regard to indicator type, current biodiversity policies seem to focus on indicators of biodiversity status and trends and on possible environmental pressures on biodiversity (e.g., UNEP/CBD/COP7 2003; EEA 2007). However, SEBI 2010 and the CBD also list a few response indicators, for instance, “sustainable use”, “benefit sharing” and “public opinion”. Nevertheless, there is a need for indicators which directly assess and monitor (policy) response to halt and reverse negative trends in biodiversity and ecosystem service provision to inform future biodiversity policies. Following the circular DPSIR scheme, the response should then lead to a reduction in the strength of drivers and pressures. Trends in the strength of drivers and pressures in turn might be used to measure the success of policy response measures. Overall, indicators are required that cover the different steps from threat to action.

The third criterion refers to the association of an indicator with specific biodiversity attributes or ecosystem service categories. According to Noss (1990), biodiversity attributes include compositional, structural and functional aspects, although genetic aspects are also an important additional category. The classification of ecosystem services follows the Millennium Ecosystem Assessment (MA 2005a) and distinguishes provisioning (e.g., food, water, fuel), regulating (e.g., water and air regulation), cultural (e.g., recreation, spiritual values) and supporting services (e.g., nutrient cycling, photosynthesis).

The fourth criterion, spatial scaling and scalability across scales and ecosystems, is related to the spatial extent to which indicators are actually applied, or might be applied by up- and downscaling of indicator values to broader and finer spatial scales, respectively. The indication may be confined to particular ecosystems, for instance, in the case of measuring the amount of dead wood in old-growth forests as a proxy for disturbance or naturalness, but it may also be defined across ecosystems, such as measuring nitrogen emissions as a proxy for environmental pollution and eutrophication in both aquatic and terrestrial ecosystems. Indicators applicable across ecosystems are considered useful for cross-system comparison of status and trends. In this sense, the question of spatial scaling is very important in ecosystem indication (Niemi and McDonald 2004). However, regarding spatial scaling in ecosystem monitoring, it is also important to pay attention to the scales of sampling and indicator application (e.g., Comin et al. 2004; Johnson et al. 2007; Kail and Hering 2009). Sampling and data gathering are frequently carried out ‘on-site’, i.e. at the local scale, and site-specific results can often easily be mathematically upscaled (e.g., combining local nitrogen contents into regional or national mean values) and are usually illustrated in maps at the regional (e.g., entire river basin) or larger scales. With the suitability criterion ‘Scalability’ we therefore address the potential applicability of indicators across spatial scales. In a more detailed analysis of biological indicators of biodiversity, Feld et al. (2009) found that many indicators are frequently used at local to regional scales (e.g., alpha and beta diversity indices, richness measures, genetic variability) while abiotic indicators, such as measures of area and fragmentation, are applicable at regional (landscape) and broader scales. The authors revealed a strong operational linkage of indicator types (abiotic vs. biotic) and spatial scales (local vs. regional and larger scales). While statistical upscaling of GIS-derived landscape metrics to compare larger areas of interest is theoretically possible, some studies do advise caution in the actual methodologies used to upscale field-based point monitoring to landscape scales (e.g., Dawson et al. 2003). Downscaling, however, is largely limited by data availability and resolution (EEA 2006). A pixel size of 25 × 25 m in CORINE maps, for instance, sets the theoretical minimum area at 0.0625 ha (625 m²), while the minimum area for a land cover class to be recorded is 25 ha. Biological indicators, in contrast, are different across ecosystems and spatial scales, which often renders such species- and population-based measures incomparable across regions and ecosystems (e.g., Huston 1999; Nortcliff 2002). The broadly applied number of International Union for Conservation of Nature (IUCN) red-listed taxa (Rodrigues et al. 2006), for instance, does not allow for comparisons across ecosystems in the same region or comparisons of similar ecosystems in different regions. Generalisations, however, might be used across ecosystems, such as the relative proportion of threatened or otherwise listed taxa.
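The two scaling operations described above can be illustrated with a minimal sketch. It assumes that upscaling means taking an unweighted mean of site values per region, which is only one of several possible aggregation rules; the function names and the example regions are hypothetical and not part of any of the cited frameworks.

```python
from collections import defaultdict
from statistics import mean

M2_PER_HA = 10_000  # square metres per hectare


def pixel_area_ha(side_m: float) -> float:
    """Area of a square pixel in hectares, given its edge length in metres."""
    return side_m * side_m / M2_PER_HA


def upscale_mean(site_values):
    """Aggregate local values to regional means.

    site_values: iterable of (region, value) pairs, e.g. local nitrogen
    contents. The unweighted mean is an assumption made for illustration.
    """
    grouped = defaultdict(list)
    for region, value in site_values:
        grouped[region].append(value)
    return {region: mean(values) for region, values in grouped.items()}


# Hypothetical local measurements from two regions
sites = [("basin_A", 1.0), ("basin_A", 3.0), ("basin_B", 5.0)]
regional = upscale_mean(sites)
corine_pixel = pixel_area_ha(25)  # 25 m edge length, as quoted in the text
```

Note the asymmetry: the regional means can always be computed from the site values, whereas the site values cannot be recovered from the regional means, which mirrors the limits to downscaling discussed above.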

The concept of reference conditions (also referred to as the reference condition approach) is addressed by the fifth criterion. It is applied in various ecosystems, for example, in rivers (Wright et al. 1993; Reynoldson et al. 1997; Directive 2000/60/EC; Bailey et al. 2007), lakes, estuaries and coastal waters (for example Directive 2000/60/EC), forests and grasslands (Swetnam et al. 1999) and drylands (Boer and Puigdefabregas 2003). The approach is particularly prominent in aquatic ecosystems, where reference conditions have been defined and introduced to ecosystem assessment and monitoring for more than a decade (Wright et al. 1993; Davis and Simon 1995). At the European scale, the concept is currently being transferred to soil ecosystem assessment and monitoring (Breure et al. 2005). The rationale is to define threshold values or attributes that reflect natural conditions including natural variability. The conditions of a test site are compared to these thresholds and the deviation from the reference is used to assess, for instance, the status of naturalness or disturbance. The application of standard reference conditions would enable conservationists to set quality targets and to evaluate ecosystem conditions against these targets, a prerequisite for assessment sensu stricto.
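As a minimal sketch of this rationale, a reference band can be derived from a set of reference sites and a test site compared against it. The use of the mean plus or minus k standard deviations to represent natural variability, and the multiplier k itself, are assumptions made here for illustration; the frameworks cited above define their own procedures.

```python
from statistics import mean, stdev


def reference_band(reference_values, k=2.0):
    """Return (lower, upper) bounds around the reference-site mean.

    k is a hypothetical tolerance multiplier standing in for
    natural variability; it is not a value taken from the text.
    """
    centre = mean(reference_values)
    spread = stdev(reference_values)
    return centre - k * spread, centre + k * spread


def deviation_from_reference(test_value, band):
    """Distance of a test site outside the band; 0.0 if it lies within."""
    lower, upper = band
    if test_value < lower:
        return lower - test_value
    if test_value > upper:
        return test_value - upper
    return 0.0


# Hypothetical taxon richness at three reference sites
band = reference_band([10.0, 12.0, 14.0], k=1.0)
impaired = deviation_from_reference(7.0, band)
within = deviation_from_reference(11.0, band)
```

A test site inside the band is treated as indistinguishable from reference conditions, while the size of any deviation outside it can serve as a measure of disturbance.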

The sixth criterion refers to the generation of consistent and comparable data using standardised sampling protocols, for example, standards to determine physico-chemical parameters or standardised field methods to sample organisms or characteristics of them. This requirement is well acknowledged by numerous national, European (EN Standards) and International Standards (ISO Standards) on sampling, processing and analysis of various abiotic and biotic parameters. However, to our knowledge, standards on data generation for assessing and monitoring biodiversity and ecosystem services are currently lacking. Data also needs to be comparable in terms of sampling effort, and numerical and spatial scaling. Thus, sufficiently comprehensive, yet easily applicable and standardised sampling protocols are likely to help gather this data. A high level of data quality and comparability is also likely to facilitate comparisons across regions and even ecosystems and might better enable up- and down-scaling across different spatial scales. Thus, data quality and scalability are, to some degree, interrelated (Fig. 1). The common protocols for sampling and sample processing, for instance, that were developed and applied within various recent EU-funded projects on the implementation of the European Water Framework Directive set standards for the generation of consistent data (Hering et al. 2003; Furse et al. 2006). Consequently, the resulting lists of taxa, physico-chemical parameters and hydromorphological attributes of streams and rivers were comparable at the pan-European scale, as well as at ecoregional and national scales.

The seventh criterion addresses the applicability of remote sensing to obtain the required data for indication. Remote sensing data (e.g., ecosystem and habitat area, vegetation status, degree of fragmentation) usually provides a comparable data source (EEA 2007). Satellite images are available at regional to global scales and across multiple time-scales. Satellite-derived remote-sensing indicators, such as vegetation indices, enable us to compare and scale-up data measured from fieldwork to multiple spatial scales. Indicators that are based upon remote sensing data allow for cross-comparisons of biodiversity and ecosystem services at comparatively broad scales and across different ecosystems (e.g., Nagendra 2001; Duro et al. 2007).

Applying the suitability criteria to selected indicators

The criteria are applied to a selection of 24 indicators (listed in Table 2) from those listed in Feld et al. (2008) and EEA (2009). Our aim was to test the practicality of the criteria against a selection of indicators that (i) explicitly address biodiversity assessment and/or ecosystem service indication and (ii) have been widely adopted within and/or across terrestrial and aquatic ecosystems. In their recent review of more than 600 publications addressing more than 500 indicators of biodiversity and ecosystem services, Feld et al. (2009) have shown that the vast number of published indicators can be reduced significantly by a focus on relevant studies. Thus, indicators that do not explicitly address biodiversity and/or have not been applied beyond the scope of a publication were not considered in the selection. Hence, the selection of indicators is not intended to be representative of all ecosystem indicators, but rather to capture a range of relevant indicators to test the defined set of seven suitability criteria. We believe that the general trends presented in this study would hold true with another or larger selection of (relevant) indicators. Nevertheless, we are aware that further assumptions that might be derived from the findings presented below deserve further consideration to prove, or otherwise, the underlying hypotheses.

Table 2 Selected indicators to test the indicator framework

The first general trend obvious from Table 2 refers to the indicator type. With only a few exceptions (No. 12, 15, 16 and 17), indicators of status and trends in biodiversity dominate the selection. This is in line with a major finding of Feld et al. (2009), based on a review of more than 500 indicators published in peer-reviewed journals. Both abiotic (e.g., 1, 3, 5, 18) and biological indicators (e.g., 2, 4, 6, 10) can be found in the status and trends group, but abiotic measures do not necessarily fit the purpose of indication as they lack a direct link to biodiversity (e.g., 5, 7, 12, 15, 18, 22). Among indicators that directly address biodiversity, richness measures dominate (9, 10, 13, 14 and 21), a finding that is supported by previous studies (e.g., Loreau et al. 2002; de Bello et al. 2006; Feld et al. 2009). The most frequent richness measure is the number of selected sensitive or threatened species, although this component only provides an incomplete overview of the overall biodiversity of a system. Functional (diversity) measures make use of ecological traits (e.g., 19, 24), but are rarely found; the same applies to measures of genetic diversity (No. 4 only in our selection).

Several indicators directly address the provision of ecosystem services (4, 8, 10, 12, 15, 20 and 24). If ordered according to the spatial scales of application, indicators of supporting services seem to be confined to the local scale (8, 10, 24), while indicators of provisioning services range from the regional to the global scale (4, 12, 15, 20). Whether or not this is an artefact of the small and presumably biased selection of indicators would be worth testing with an extended list, as such a pattern may help to identify gaps, but also limitations in ecosystem service indication.

Interestingly, reference values could be defined for at least 18 out of the 24 indicators listed in Table 2. Yet, their actual application is limited to indicators No. 5 and 8, for both of which reference values exist, either defined as biological benchmarks (Breure et al. 2005) or as nitrogen ‘critical load’ (EEA 2007, 2009, review draft Annex 4, 19 Dec 2009). For indicator No. 12 benchmarks theoretically exist (as good/high chemical status) as a consequence of the implementation of the EU Water Framework Directive (Directive 2000/60/EC), but these benchmarks have not been applied in the context of the Biodiversity Headline Indicators. For most of the indicators listed in Table 2 sampling protocols and/or data already exist. At the pan-European scale, data coverage and comparability are very good if based on remote sensing. The application of remote sensing data, however, is largely restricted to indicators of area and fragmentation (e.g., 1, 3, 7, 11, 18). Most biological measures still require on-site sampling and measuring.

Discussion

The suitability criteria defined in this study (Table 1) are not independent of each other. In particular, data quality (based on the application of standardised protocols for sampling/data generation) and scaling/scalability of indicators are linked to some degree (see also Fig. 1). For instance, if national indicator values are to be compared and analysed at the regional or continental scale (i.e. upscaling), standardised protocols are required at the national level to allow comparison. On the other hand, the application of O/E ratios (Fig. 1, ③) facilitates the comparison of results, as O/E ratios can be calculated based on different sampling protocols, while they are usually expressed as values ranging from 0 to 1 (or 0 to 100%). Despite this interrelation of suitability criteria, the following discussion is structured by the criteria for clarity.

Is the purpose of indication met?

It is not a trivial exercise to derive the purpose from biodiversity indicators, as many references lack a clear statement of the endpoint(s) of indication sensu Failing and Gregory (2003). The endpoints of the CBD Biodiversity Headline Indicators and of SEBI 2010, for example, can be defined in various ways. “SEBI 2010 was established in 2005 as a process to select and streamline a set of biodiversity indicators to monitor progress towards the 2010 target of halting biodiversity loss and help achieve progress towards the target” (EEA 2007). This essentially requires that biodiversity indicators are suitable to monitor status and trends in biodiversity and should help to detect progress towards halting the loss of biodiversity.

In this study, several CBD/SEBI 2010 indicators are listed in Table 2 (No. 1–7, 12, and 15). The indicators fit the purpose by definition, but interestingly, only a few out of the total of 26 indicators of SEBI 2010 (not all listed in Table 2) directly refer to biodiversity; the majority do not, which also applies to other indicators of biodiversity in Table 2. Typical examples of the latter are area-based proxies of biodiversity, whose relation to biological diversity is derived from island biogeography theory (species-area relationship according to MacArthur and Wilson 1967). “Trends in extent of biomes, ecosystems and habitats”, “connectivity/fragmentation of ecosystems” and “coverage of protected areas” follow the assumption that biodiversity increases with increasing area, although there remain some important questions on the nature of this relationship over different spatial scales (Lomolino 2001). The latter indicator also belongs to the group of (policy) response indicators; a typical policy response is habitat protection and management.

A shortcoming of this assumption might be that island biogeography theory does not refer to the different components of biodiversity (e.g., according to Noss 1990), but exclusively considers species richness. Consequently, area-based proxies rather focus on the richness component and tend to neglect other compositional, structural, functional and genetic components of biodiversity. At present, only one SEBI 2010 indicator refers to the functional component: the Marine Trophic Index (Pauly and Watson 2005). This aspect continues to be omitted by existing biodiversity indicator initiatives, although several studies have implied a significant role of single species’ or groups of species’ functions in ecosystem service provision (Diaz and Cabido 2001; Tilman et al. 2001; Scheu 2003; Lavelle et al. 2006; Luck et al. 2009). As a similar finding is obvious for indicators of the genetic component of diversity, the conclusion must be that, regarding the purpose “to monitor the status and trends in biodiversity”, many proposed indicators are likely to fail to fully meet the purpose as their linkage to biodiversity is too much focused on species richness (see also Balmford et al. 2005). If we went one step further and implied that biodiversity protection is not an end in itself, but rather aims to assure the sustainable provision of ecosystem goods and services, we could argue that current biodiversity indicators rarely account for underlying ecosystem functions and processes to provide such services. Thus, more effort should be spent on the development and testing of service indicators, in particular of those indicators addressing regulating and supporting services.

Are relevant spatial scales sufficiently addressed?

This study shows that upscaling is often possible with both abiotic and biotic indicators. The only exception found in this study was “key indicator species” (No. 16 in Table 2), which cannot be easily scaled up or down due to their often limited distribution. Of particular note is the large (and growing) group of indicators derived from remote sensing data, namely vegetation indices and measures of habitat area and fragmentation/connectivity. Here, both sampling and application usually refer to broader scales, while upscaling to the global scale is easily possible. In contrast, downscaling is limited, for instance, by image resolution (EEA 2007; Nagendra and Rocchini 2008) and the suitability of metrics (Bailey et al. 2007; Buyantuyev and Wu 2007).

A number of species-level global biodiversity indicators are currently under development to address the gap in biodiversity indicators capable of measuring trends at a global scale. These include the Living Planet Index (LPI, Loh et al. 2005), the IUCN Sampled Red List Index (SRLI, Rodrigues et al. 2006), the Biodiversity Intactness Index (Scholes and Biggs 2005) and site-based approaches for monitoring population trends. Nevertheless, the usage of biological indicators of biodiversity is often restricted to the regional or finer scales. One reason for this might be the relatively high costs and efforts for extensive fieldwork to obtain the necessary data of a sufficient quality. This is surely true, but the European Water Framework Directive has impressively shown that site-based species- or genus-level biological indication of aquatic ecosystems is applicable even at the pan-European scale. The implementation of the Water Framework Directive in 2000 initiated a concerted action by all member states to develop new biological assessment systems to evaluate the ecological quality of their marine and fresh waters. And although the member states finally developed different indication systems, the results are subject to a pan-European comparison called ‘Intercalibration Exercise’ (e.g., Birk et al. 2006). Among the new indicators, the trait-based (functional) approaches in particular continue to gain increasing attention in freshwaters (e.g., Feld and Hering 2007; Dolédec and Statzner 2008; Verberk et al. 2008a, b; Feld et al. 2009), grasslands (e.g., Moretti et al. 2008) and soil ecosystems (e.g., Mulder et al. 2005; Parisi et al. 2005; Winding et al. 2005). One important assumption for the broad-scale application of traits is that, unlike with species, the functional characteristics of a community are similar in the same ecosystems across different regions (Baird et al. 2008).

Is the reference condition approach applicable and is it applied?

The reference condition approach is rarely implemented in biodiversity and ecosystem service assessment, although the approach could be easily applied in this field of indication. Indicators such as “trends in extent of biomes, ecosystems and habitats” or “trends in abundance and distribution of selected species” could be assessed against reference values defined for the area of biomes or for the abundance and distribution of targeted species. Conservation targets could be defined as benchmarks for habitat management whilst desired service rates could be set as benchmarks for ecosystem service assessment (see also Balmford et al. 2005). One of the advantages of this approach is that assessments can be made without the need to measure long-term trends and changes. By comparison with a biodiversity reference condition, immediate assessment would be feasible and, moreover, the results would allow deviation from the reference to be estimated. This deviation expresses the status of biodiversity, which could still be poor even if the long-term trends show an increase. An increase from bad to poor biodiversity conditions, for instance, would indicate a positive trend, but would still mean a considerable deviation of the current status from the desired or targeted conditions. This example also illustrates that mere trends do not suffice to indicate and assess the status of biodiversity or ecosystem services. Thus, although the ‘status and trends of the components of biodiversity’ are considered one indicator category in the context of the CBD and SEBI 2010, it might be wise not to confuse status and trend indicators.
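The distinction between trend and status can be made concrete with a short sketch. The five-class ordinal scale and the default target class ‘good’ are assumptions chosen for illustration (the class names follow common usage in European water assessment); the function names are hypothetical.

```python
# Hypothetical ordinal scale, ordered from worst to best
STATUS_CLASSES = ["bad", "poor", "moderate", "good", "high"]


def trend(earlier: str, later: str) -> str:
    """Direction of change between two assessments."""
    diff = STATUS_CLASSES.index(later) - STATUS_CLASSES.index(earlier)
    if diff > 0:
        return "improving"
    if diff < 0:
        return "declining"
    return "stable"


def gap_to_target(current: str, target: str = "good") -> int:
    """Number of classes by which the current status falls short of the target."""
    return max(0, STATUS_CLASSES.index(target) - STATUS_CLASSES.index(current))


# The example from the text: an increase from bad to poor
observed_trend = trend("bad", "poor")
remaining_gap = gap_to_target("poor")
```

In this example the trend is positive, yet the status remains two classes below the assumed target, which is the information a trend indicator alone would not convey.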

Biodiversity reference conditions do not necessarily have to be static; they might account for natural dynamics as well as for shifting biodiversity baselines, for instance, due to the impact of climate change. Furthermore, a common understanding of the term ‘reference conditions’ among scientists, practitioners and policy makers will be needed. In the field of aquatic ecosystem assessment and monitoring in Europe, the Water Framework Directive (WFD) unambiguously defined reference conditions as ‘no, or only very minor, anthropogenic alterations to the values […] normally associated with […] undisturbed conditions’ (Directive 2000/60/EC, Annex V 1.2). In other words, reference conditions according to the WFD refer to natural environmental and biological conditions. In contrast, targets for conservation and management of ecosystems might deviate significantly from such natural conditions, for instance, if these are considered not attainable due to socio-economic restrictions. Stoddard et al. (2006) provided a useful summary of different targets for stream monitoring that are often confused with reference conditions.

Such quality targets are applied with the SEBI 2010 indicator “critical load exceedance for nitrogen” (No. 5 in Table 2), for which nitrogen thresholds are defined, above which a severe impact on biodiversity is likely to occur (biodiversity damage, EEA 2009, review draft Annex 4, 19 Dec 2009). For landscape metrics, a maximum allowable fragmentation rate could be defined that is ecosystem-specific in order to set the threshold below which the system is still sufficiently connected.

In mathematical terms, the reference condition approach can be expressed as the ratio of (actual) Observed to Expected (reference) values, i.e. as an O/E ratio. Such normalised O/E ratios could be defined for almost all kinds of indicators and for the multiple components of biodiversity. The O/E ratios could be used to assess and compare the status of biodiversity across regions and across ecosystems. Hence, upscaling and aggregation of comparable results across multiple spatial scales would be feasible. Three other advantages render O/E ratios particularly valuable for ecosystem assessment: First, by providing a standardised and comparable measure, they would facilitate predictive modelling of biodiversity trends under changing environmental conditions. Second, they would allow appropriate biodiversity targets for the protection and restoration of ecosystems to be set and, third, they would provide a suitable measure to assess progress with respect to the endpoints of indication, i.e. to assess the effectiveness of policy response.
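
As a minimal sketch of the O/E idea described above, the following Python snippet computes normalised ratios for two sites. The function name `oe_ratio`, the site names and all numbers are hypothetical; they only illustrate how a ratio makes sites with different reference values comparable and are not taken from any monitoring data set.

```python
def oe_ratio(observed: float, expected: float) -> float:
    """Return the Observed/Expected ratio for a single indicator value."""
    if expected <= 0:
        raise ValueError("expected (reference) value must be positive")
    return observed / expected


# Hypothetical taxon richness at two sites with different reference values.
sites = {
    "lowland_stream": {"observed": 18, "expected": 30},
    "mountain_stream": {"observed": 36, "expected": 40},
}

# One normalised ratio per site, comparable despite different references.
ratios = {
    name: oe_ratio(values["observed"], values["expected"])
    for name, values in sites.items()
}

for name, ratio in ratios.items():
    print(f"{name}: O/E = {ratio:.2f}")
```

A ratio close to 1 indicates conditions near the reference, whereas values well below 1 indicate a larger deviation, regardless of the absolute richness observed at each site.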

Although the advantages of using the reference condition approach in biodiversity assessment are considerable, we do not want to conceal the difficulties in adopting this approach. It is unlikely that references can be defined for all indicators; natural values cannot always be expected for individual indicators as they do not necessarily exist. For example, indicators such as ecosystem key species, umbrella species or IUCN red-listed taxa mainly account for the presence of these species/taxa. The setting of reference values would require at least the additional definition of natural (=expected) abundances for the taxa. Furthermore, reference values (i.e. the values or conditions of specific parameters at reference sites) based upon species populations, relative abundance or presence/absence monitoring, for example, are often likely to be subject to natural inter-annual variability and dynamic changes (through climate variability or biotic interactions, for example), which need to be sufficiently understood in order to provide a sound basis for ecosystem assessment.

Are data availability and data quality sufficient?

Our test of selected indicators points to some general problems that frequently occur with bioindication at national or larger scales. Over large areas within a region of interest, data are often patchy and, hence, do not sufficiently represent the region as a whole. The patchy availability of data is often the result of the comparatively large effort needed to obtain the data and the costs connected with field studies. These problems and limitations do not apply to indicators based on remote sensing data. In remote sensing, large areas are easily and quickly scanned by aerial photographs, maps and satellite imagery. The data obtained are comparable, having been radiometrically calibrated, and can be analysed across different spatial scales using a set of standard GIS-based software tools at low cost. This may explain the comparatively advanced status of indicators based on remote sensing data, whose history is rather young if compared to classical bioindication (e.g., the saprobic system, Kolkwitz and Marsson 1902, 1908). Nevertheless, applying our framework to selected examples revealed that the use of remote sensing itself is limited in bioindication (Table 2). Of the 24 indicators tested, 18 biological examples cannot make use of remote sensing, simply because the size of these biological entities is far below the resolution of satellite imagery and even aerial photography, or because they are not detectable in the spectral domain. Bioindication is therefore usually linked to small-scale sampling. The subsequent application (upscaling) at larger scales and the comparison of results, however, strongly depend on the comparability of sampling and sample processing methods. In other words, standardised sampling protocols are required. Such protocols frequently exist for national/regional monitoring schemes, but with few exceptions (e.g., Römbke et al. 2006) are largely lacking at the continental and global scale, save for the group of landscape indicators derived from remote sensing.

Whether data on other abiotic indicators, for example on nitrogen deposition, is available at a pan-European scale, in particular if reduced forms of nitrogen are to be included, is questionable. The report on the CBD Conference of the Parties makes the following statement concerning the availability of nitrogen data: “nitrogen additions can be estimated […] for some countries and watersheds. Some data is also available for nitrogen loads in aquatic ecosystems.” (UNEP/CBD/COP/7 2003, p. 14). This implies that data availability on nitrogen deposition is rather patchy and limited to some watersheds and aquatic ecosystems. The same is likely to apply to the indicator “area under sustainable management”; the data required for this indicator is available for production systems, which are often complemented by certification schemes (e.g., for sustainable forest management). However, “data availability and reliability is variable” (UNEP/CBD/COP/7 2003). The same is likely to apply to national statistics on fish landings, which are used to calculate the Marine Trophic Index. Such (commercial) statistics do not cover representative samples of fish communities (e.g., http://www.jncc.gov.uk/page-4248), but are likely to be biased towards high-value target species.

How to streamline and improve future ecosystem indication

Develop direct bioindicators

One of the main challenges in ecosystem indication remains the development of suitable direct indicators, i.e. indicators that directly refer to the component of biodiversity or to the functions and processes behind a certain ecosystem service. Scientists often discover that certain ecosystem services are provided by a small number of species or a functional group of species rather than by the whole diversity present in that ecosystem (e.g., Walker 1992; Heemsbergen et al. 2004; Luck et al. 2009; de Bello et al. 2010). This can be illustrated by the global production of food and raw materials in agricultural systems, which is dominated by a few dozen cultivated plant species. The inherent self-purification capacity of aquatic systems is largely provided by bacteria, fungi and benthic algae, which constitute the biofilm and purify the water by processing organic compounds and nutrients. In contrast, benthic macroinvertebrates, macrophytes or fish may contribute to this function, but presumably to a lesser degree than their overall contribution to biodiversity in these ecosystems might suggest.

On the other hand, scientists hypothesise that a certain level of biodiversity will be needed beyond the actual community of organisms that provide a service to compensate for a loss of species (Walker 1995; Diaz and Cabido 2001; Rosenfeld 2002). This ‘redundant’ biodiversity is assumed to be crucial to sustain service provision under changing environmental conditions, for example, under a changing climate or increasing demand for agricultural land for biofuel production. In order to better address ecosystem functioning, biodiversity indication and monitoring should directly address the functional and process-related components of biodiversity. Ideally, the indication should be based on direct linkages between biodiversity and ecosystem services. Special emphasis should be placed on important ecosystem-specific regulating and supporting services.

Develop broad-scale bioindicators and validate abiotic metrics derived from remote sensing

A first step towards broad-scale bioindication should concentrate on the usefulness of different components of biodiversity at these scales. In theory, both structural (e.g., age structure of fish communities) and functional measures (e.g., trophic relationships in grassland communities) are potentially suited for broad-scale application. Even taxonomic richness and other species-based metrics may become useful if they are expressed as an index of, for instance, relative richness compared to a reference value (see below). The consideration of reference (benchmark) values in ecosystem assessment would provide a general means to broaden the scale of application.

In a second step, further effort should be spent on identifying linkages between functional measures (e.g., traits) and regulating and supporting ecosystem services. Although not always directly beneficial to human well-being, services like self-purification, waste treatment, water, erosion and air quality regulation, nutrient cycling, and photosynthesis provide the basis for many marketable (provisioning) services. Yet, applicable biological indicators for these services are largely missing, while the role of biodiversity for service provision is still unclear (Srivastava and Vellend 2005).

In contrast to the status of bioindication, both the development and the application of landscape indicators derived from remote sensing data are comparatively advanced (e.g., Gobin et al. 2004; EEA 2006, 2007). The Normalised Difference Vegetation Index (NDVI; Tucker 1979), for instance, constitutes a well-described and widely applied indicator of green leaf biomass, which has been used to estimate changes in vegetation health, leaf area and forest canopy cover from landscape to global scales (e.g., Myneni et al. 1995; Ares et al. 2001; Ingram and Dawson 2005). Further examples have been reported by Dormann et al. (2007) and Hendrickx et al. (2007).
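
NDVI is conventionally calculated from reflectance in the near-infrared (NIR) and red bands as (NIR - Red) / (NIR + Red). The following Python sketch illustrates that calculation for single pixel values. The function name `ndvi` and the example reflectances are hypothetical, and real applications would operate on calibrated image bands rather than individual numbers.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalised Difference Vegetation Index for one pixel.

    nir and red are surface reflectances in the near-infrared and red bands.
    Returns NaN where both reflectances are zero, because the index is
    undefined in that case.
    """
    total = nir + red
    if total == 0:
        return math.nan
    return (nir - red) / total


# Hypothetical reflectance pairs (NIR, red) for two surface types.
dense_vegetation = ndvi(nir=0.50, red=0.10)
bare_soil = ndvi(nir=0.20, red=0.20)

print(f"dense vegetation: {dense_vegetation:.2f}")
print(f"bare soil:        {bare_soil:.2f}")
```

The index ranges from -1 to 1 for non-negative reflectances, with higher values reflecting stronger NIR reflectance relative to red, as is typical of green leaf biomass.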

It is acknowledged that a number of organisms not detectable by remote sensing have a strong affinity with a dominant species that creates and maintains large-area physical structures over long (including evolutionary) time periods. In addition to the forest example above, sphagnum bogs, wetlands, savannas, salt marshes and coral reefs create habitats that provide food sources, micro-environments and protection for a whole community of species that are specific to these habitat types (see, for example, Jones and Lawton 1995). The identification and classification of these macro structures by remote sensing is possible (e.g., Yang and Prince 2000; Ozesmi and Bauer 2002; Silvestri et al. 2003; Mumby et al. 2004; Harris and Bryant 2009), and quantitative assessments of biodiversity have also been made using the species-area relationship, discussed earlier, and the extent of habitat derived from remote sensing (Turner et al. 2003; Jha et al. 2005). A reliable indication of the status and trends of biodiversity and ecosystem service provision beyond habitat mapping and based on remote sensing data, however, requires more research effort to validate the results. This applies in particular to the validation, by ground truthing, of statistically significant relationships between landscape metrics and measures of components of biodiversity. The knowledge of these relationships at the landscape scale might provide a widely applicable and cost-effective tool for biodiversity monitoring (e.g., Lengyel et al. 2008).
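
To illustrate how habitat extent derived from remote sensing can feed into a species-area calculation, the sketch below uses the common power-law form S = c·A^z. This form, the exponent value z = 0.25 and the habitat areas are assumptions chosen for illustration only; the studies cited above may have used other formulations and parameter values.

```python
def fraction_species_remaining(
    area_original: float, area_remaining: float, z: float = 0.25
) -> float:
    """Expected fraction of species retained after a change in habitat area.

    Assumes the power-law species-area relationship S = c * A**z, so the
    constant c cancels out and the fraction is (A_remaining / A_original)**z.
    The default z is an illustrative assumption, not an empirical estimate.
    """
    if area_original <= 0:
        raise ValueError("area_original must be positive")
    if area_remaining < 0:
        raise ValueError("area_remaining must not be negative")
    return (area_remaining / area_original) ** z


# Hypothetical habitat extents (km^2) from two mapping dates.
fraction = fraction_species_remaining(area_original=1600.0, area_remaining=100.0)
print(f"expected fraction of species remaining: {fraction:.2f}")
```

Estimates of this kind depend strongly on the chosen exponent, so they would need the same ground-truth validation as any other indicator derived from landscape metrics.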

Develop and apply reference conditions

In contrast to trend monitoring, ecosystem assessment evaluates (or values) status, which requires the additional knowledge of a reference or benchmark value with which to compare the observed condition. Reference conditions are required both for different components of biodiversity and for (desired) rates of different ecosystem services. If assessment were based on normalised O/E ratios, the application of reference conditions would render comparisons of biodiversity and service levels feasible within and across ecosystems. O/E values of different components of biodiversity might be combined to produce an overall multi-metric index of biodiversity. Thus, we recommend the development and application of the reference condition approach in both biodiversity and ecosystem service assessment.
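
One possible way to combine component ratios into a multi-metric index is sketched below in Python. The combination rule (capping each Observed/Expected ratio at 1 and taking the unweighted mean), the component names and all values are assumptions made for illustration; the text does not prescribe a particular aggregation method.

```python
from statistics import mean
from typing import Mapping


def multimetric_index(
    observed: Mapping[str, float], expected: Mapping[str, float]
) -> float:
    """Unweighted mean of capped Observed/Expected ratios across components."""
    ratios = []
    for component, reference in expected.items():
        if reference <= 0:
            raise ValueError(f"reference for {component!r} must be positive")
        # Cap at 1 so that exceeding one reference cannot mask a deficit
        # in another component (an assumption of this sketch).
        ratios.append(min(observed[component] / reference, 1.0))
    return mean(ratios)


# Hypothetical component values for one site and its reference condition.
observed_values = {
    "taxonomic_richness": 18.0,
    "functional_diversity": 0.45,
    "structural_diversity": 2.4,
}
reference_values = {
    "taxonomic_richness": 30.0,
    "functional_diversity": 0.50,
    "structural_diversity": 2.0,
}

index = multimetric_index(observed_values, reference_values)
print(f"multi-metric index: {index:.2f}")
```

Other choices, such as weighting components or using the minimum ratio, would give different index values and would need to be justified for the assessment purpose at hand.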

Develop and apply standardised protocols to gain representative data of high quality

The development and application of biological indicators of the different components of biodiversity require high-quality data on the genetic, structural and functional characteristics of species and communities in all ecosystems. To achieve this quality, standardised protocols that facilitate the gathering of comparable data at all relevant scales will be necessary.

Conclusions

The application of our framework of suitability criteria to current and widely used indicators of biodiversity and ecosystem services revealed scope for improvement. We suggest that more effort should be spent on the expansion of direct biological indicators of biodiversity and the development of thresholds or benchmarks. Justifiable benchmark values of a specific component of biodiversity (e.g., structural and functional diversity) or of specific processes underlying ecosystem functions and services (e.g., productivity, decomposition rate) would offer a sound basis for the assessment of both components.

In order to streamline future indication and to better address the implementation of biodiversity conventions, concerted effort is required at the international level. This would include the coordination of related activities (e.g., monitoring, indicator development, ecosystem management) and the provision of financial resources. The European Water Framework Directive may serve as an example of such a concerted effort. Since 2000, the directive has driven and supported the development of new indication systems towards an integrated assessment and management of European waters—rivers, lakes, marine and ground waters. A tremendous amount of research has been funded by the European Commission, but also by individual countries, to develop novel indicators and to render assessment results comparable between Member States.

A ‘European Biodiversity or Ecosystem Service Directive’ (see also Harrison et al. 2010) might provide the appropriate framework to foster and coordinate biodiversity indication and monitoring at the pan-European scale, in particular to improve our tools and knowledge, specifically to:

  • measure structural and functional components of diversity in all ecosystems at relevant spatial scales,

  • set comparable reference thresholds/quality targets for components of biodiversity,

  • identify and measure key ecosystem functions and processes,

  • identify the linkage of these functions/processes to ecosystem service provision (incl. provisioning, regulating and supporting services),

  • identify (critical) service provision rates needed to sustain human well-being,

  • assess the status and trends of biodiversity and ecosystem services in all ecosystems (e.g., by O/E ratios), and

  • develop cost-effective, easily understandable, broadly applicable and integrated multi-metric indicators of biodiversity and ecosystem services to address policy makers, decision makers and the public.