Introduction

Given increasing impacts on ecosystems around the world (Ngo et al. 2019) and limited resources for conservation (Evans et al. 2012; Waldron et al. 2013), it is critical that conservation actions are effective and cost-efficient. Whether the goal of a conservation program is to preserve remaining natural resources or restore them, measuring the quantity and quality of the species, the ecosystem, or the natural resources is a necessary tool in determining efficacy. It is also important to identify which human activities are impacting these natural resources and how much negative effect they have. Therefore, it is valuable to understand the type and amount of conservation actions necessary to ameliorate or reverse the anthropogenic impacts upon the system (Bull et al. 2017).

This is particularly important in the case of biodiversity mitigation (BM) programs (Maron et al. 2012; Gonçalves et al. 2015). The process of BM involves measuring anthropogenic impacts and addressing them through conservation action. However, a BM program may be less effective at achieving conservation goals (Maron et al. 2012, 2016) if it ineffectively measures impacts or restoration, either by requiring metrics that poorly assess biodiversity (Business and Biodiversity Offsets Programme (BBOP) 2012; Brownlie et al. 2013; Pilla 2014) or by failing to properly implement these measurements.

To address the need for better outcomes in the BM process, conservation science has created increasingly rigorous systems that define when and how conservation actions are applied. The Mitigation Hierarchy (International Finance Corporation 2012; M Ekstrom et al. 2015) is one example of such a system that has become a cornerstone in conservation policy worldwide (Arlidge et al. 2018). It defines sequential conservation actions that range from avoiding all impacts whenever possible, to minimizing necessary impacts to the system, and in cases where impacts cannot be avoided, implementing biodiversity offsets (BO) which recreate lost biodiversity in another location. However, while the Mitigation Hierarchy defines what types of conservation actions are appropriate, it does not define what metrics are appropriate to measure biodiversity impacts and conservation outcomes.

The goal of many BM programs is to achieve a “no net loss” of biodiversity (CITE). Whether or not a BM program achieves this goal depends, in part, on the metrics used to measure impacts and restoration actions (Gordon et al. 2011; Habib et al. 2013; van Teeffelen et al. 2014; Bull et al. 2017; Gamarra and Toombs 2017). Poorly measuring impacts or restoration may lead to incorrect management choices or, in a worst-case scenario, failure of the program to achieve its goal (Maron et al. 2012; Gonçalves et al. 2015; Van Bochove et al. 2016; Gourevitch et al. 2018). While the consequences of poorly chosen metrics may be severe in any type of BM program, they may be particularly so in the case of biodiversity offsetting (BO). The intent of BO programs is to create new resources, or restore damaged resources (habitat, ecosystems, etc.) as direct recompense for destroying or damaging these resources in another location. Improper measurement has great potential for harm under BO as areas of land may be reduced in function or converted completely and newly restored areas may not match or exceed the lost function (Maron et al. 2016). In part due to this, BO has a mixed to poor record of success. Studies regularly show that BO frequently fails to achieve the goal of “no net loss” when one area is impacted and another restored (Brownlie et al. 2013; Bezombes et al. 2019; zu Ermgassen et al. 2019).

Failure to achieve “no net loss” in mitigation is often attributed to a failure of governments to mandate a suitably comprehensive conservation plan, a failure to plan for long-term viability, or a failure to develop a plan that comprehensively measures the correct elements of biodiversity within the focal system to assure the desired result (Bull et al. 2013). Although BO programs are a part of mitigation policy or practice in more than 45 countries, the number of transactions remains low (Madsen et al. 2011). Based on our experience with designing BO programs, and through communications with government agencies and practitioners around the world, the lack of strong policies with clear goals and required metrics that assure success is one reason for the limited success of these programs. This lack of clearly defined metrics also hinders researchers’ and practitioners’ ability to compare observed results from programs with similar conservation goals.

However, this failure to consistently develop adequate metrics for BM programs is not surprising. Ecosystems are notoriously complex, consisting of thousands of interacting biotic and abiotic processes (Pimm 1984; Cadenasso et al. 2006; Duffy et al. 2007). Determining which elements of any ecosystem may be particularly important to that system’s continuing healthy function or which are particularly impacted by human stressors is critical to any conservation action, particularly in the case of more invasive forms of mitigation such as BO (Van Bochove et al. 2016).

We see a clear need for a common language describing the metrics required by mitigation policy that addresses known elements of the focal systems (e.g., species richness, genetic diversity, or soil permeability).

To address this need, we propose a Biodiversity Metrics Framework (henceforth the “Framework”). We specifically designed the Framework to serve the following two purposes. First, the Framework is intended to aid in creating mitigation policy and programs. In this application, the Framework serves as a tool to ensure known conservation issues within the focal system (e.g., inbreeding depression, soil pH necessary for survival) are addressed by required metrics. Second, the Framework may be used as a tool for evaluating conservation policies, both determining the degree to which known conservation issues are addressed by a policy and facilitating easy comparison between policies with respect to metrics employed. We believe that the Framework, if regularly implemented in creation and review of mitigation policies, may become another cornerstone tool in conservation programs, which in turn may reduce the risk of further ecosystem degradation and species extinction.

The biodiversity metrics framework

The Framework is based upon Noss’ Hierarchy of Biodiversity (Fig. 1). Noss’ Hierarchy is well-known (cited by more than 1400 published works as of 2019) and provides a scaffold describing the three attributes of biodiversity (composition, structure, and function; Jerry and Franklin 1993) at four different scales ranging from genetic to regional inter-ecosystem (Noss 1983). Noss’ Hierarchy also provides indicators corresponding to these elements of biodiversity and even suggests common techniques to measure the various elements of biodiversity within an ecosystem. For example, a measure of structure at the regional/landscape-scale (the top, center cell in Fig. 1) may include spatial heterogeneity (an example of an indicator), measured by applying spatial analysis to remotely sensed data (i.e., satellite or aerial imagery).

Fig. 1

Noss’ Hierarchy of Biodiversity (adapted from Noss 1990). The red, green, and blue columns each represent one of the three primary attributes of biodiversity (Franklin et al. 1981). Rows represent the scales from the largest (Landscape) to the smallest (Genetic). Cells (the nexus of a row and column) are each one “element” of biodiversity within the hierarchy as described by Noss. Indicators of each element (samples from the original Noss manuscript) appear in each cell. For in-depth descriptions of each element as well as recommended methods for assessing them, see Noss (1990)

The Framework itself consists of two components: The Biodiversity Scorecard (“Scorecard,” see Fig. 2) and the Definitions & Descriptions (“D&Ds,” see Fig. 3). The Scorecard is an organizational tool and serves as a visual introduction to a description of a mitigation program. The D&Ds follow the Scorecard and describe the mitigation program, highlighting which metrics are required by policy and information from published scientific literature describing specific components of biodiversity that are critical to conserving the focal system. The D&Ds are divided into sections that correspond to, and serve as the basis for, each cell of the Scorecard. The user would first read the Scorecard for a quick review of the comprehensiveness of the mitigation policy and then read the D&Ds to gain a more in-depth understanding. We discuss the creation and interpretation of both of these components below, beginning with the D&Ds, as the Scorecard uses information derived from them.

Fig. 2

An example application of the Biodiversity Metrics Scorecard. Row and column definitions are adapted from Noss’ hierarchy of biodiversity (Noss 1990) and define nested elements of biodiversity ranging in scale from genetic to landscape (inter-ecosystem). Numbers within the cells describe the number of metrics that assess that particular aspect of biodiversity. Summary cells (rightmost column and bottom row) describe how comprehensively the metrics prescribed by the offsetting policy/program address the ecosystem. Summary cells list the percent of the cells within the respective row or column that have any metric required by policy. Yellow highlighted cells indicate elements of biodiversity identified in literature review of published research as critical for the focal species or system. Mismatch (cells with yellow highlights, but no metric) indicates areas where conservation targets are not explicitly addressed by metrics employed by the mitigation program. The purple letters “A,” “B,” and “C” are for reference within the manuscript only and are not part of the Scorecard

Fig. 3

Example of a subsection of the “Definitions & Descriptions” (D&D) section. The header of this subsection refers to the cell (intersection of row and column) of the Biodiversity Scorecard (Fig. 2). The description (subtext within the header) is text describing elements of biodiversity adapted from Noss (1990). The alphabetically listed sections (A & B) allow the reader to identify known threats or impacts to the system (A), and which metrics are required by policy to address the impacts in a mitigation program (B). Peer-reviewed evidence for the threats/impacts particularly important to the focal system and the efficacy of required metrics for assessing each threat should be cited as well. For clarity, each cell of the Biodiversity Scorecard should have a subsection within the D&D regardless of whether or not there are metrics addressing that element of the hierarchy. A complete D&D for Utah Prairie Dog (provided for example only) with instructions is available in Appendix S1

Definitions and descriptions

The purpose of the Definitions and descriptions (D&Ds) is to summarize the best available science regarding the conservation of the focal system, and to link metrics required by the policy to known impacts on the system. Each subsection of the D&Ds focuses on one element (one cell in the Biodiversity Scorecard) in Noss’ hierarchy (see Fig. 3 as an example of a D&D for the species-level composition cell in the Scorecard). Each of these subsections lists (A) threats or impacts to the focal system, and (B) metric(s) required by the policy or plan to measure each threat or impact as part of mitigation. While we recommend in-depth descriptions of each of these later in a policy document, each subsection of the D&D should be concise with numbered lists (see Fig. 3 and Appendix S1 for examples) linking threats to the focal system and metrics that measure these impacts or actions related to these threats. For example, if the first enumerated threat to the focal species is extirpation through spread of disease when the population becomes too dense, the first listed metric employed should correspond to this threat. Relevant, peer-reviewed, scientific evidence for both the nature of the threat and the viability of the metric as a tool to measure it should be cited for each enumerated point. While metrics described in the D&Ds may directly measure the element in question (e.g., a field count of the population as a means of measuring abundance), they may also be indirect, or proxy, assessments. As an example of an indirect metric, the total wintering monarch butterfly population in North America is estimated based on the number of hectares of forest that are occupied during their winter torpor rather than a rigorous count of each individual. Here, the justification for this indirect assessment should be included in the cited scientific literature in the D&D for species-level composition.

The process of creating the D&Ds is the same whether it is for the purposes of creating a policy document, reviewing a policy, or comparing several policies. In each case, we recommend that the author first perform a literature review of the focal system, focused particularly upon peer-reviewed articles to assure information is based upon the best possible science. If the focal system has a paucity of information, the literature review should include the closest possible analogous systems (e.g., closely related species, similar hydrology to that of the focal system, but in a different geographic location, etc.). If using analogs, the author should provide notes in the manuscript (i.e., the policy or review of policy) justifying the use of these analogs and stating potential differences between the focal system and the chosen analogous systems. We recommend highlighting and annotating all literature used in this review process to match that of the relevant subsection of the D&D and Scorecard (e.g., use “Species-Level Composition” or “C2” as annotation for highlighted text in published scientific texts). This technique may also be employed when creating a Framework as a review of an existing policy as a way to annotate the policy document, highlighting both the scientific evidence of impacts and also the metrics required by the policy to measure mitigation of these impacts.

In the example provided (Fig. 3), we give one subsection of a D&D for the Utah prairie dog (Cynomys parvidens). A complete Framework for this mitigation program is provided in the supplemental material to this publication (Appendix S1). Literature review of risks for this species identified two conservation concerns for Species-level Composition: population size and population density. Here, the United States Forest Service (2015) finds that the species’ total population continues to decline and that local populations are declining below minimum viability thresholds. This threat is enumerated as “1” in the “Threats or Impacts” section of the D&D. The metric required by the offset policy to address this threat is to perform annual population surveys at both the impact and offset sites. This metric corresponds to the first threat in the section above, and therefore is also listed as “1.” The second threat identified in the literature review for this species relevant to this D&D subsection is sylvatic plague. This disease is associated with high population densities of prairie dogs (the disease is primarily spread by close contact and fleas). In this case, there is no metric required as part of the mitigation program to address this threat. We note this in the line marked “2.” In addition, we state that population density could be calculated for this program because both colony area and annual population surveys are required as part of this mitigation program.
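The numbered pairing of threats and metrics in this example can be sketched in code. The following is a minimal illustration only, not part of the Framework itself: the class names, field names, and the survey values (120 animals, 8 ha) are hypothetical and chosen purely to show how an unaddressed threat might be flagged and how density could be derived from the two measurements the program already requires.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Threat:
    """One numbered entry in a D&D subsection (names are hypothetical)."""
    number: int
    description: str
    metric: Optional[str] = None  # None when the policy requires no metric


@dataclass
class DDSubsection:
    """One D&D subsection, corresponding to one Scorecard cell."""
    cell: str
    threats: List[Threat] = field(default_factory=list)

    def unaddressed(self) -> List[int]:
        """Return the numbers of threats that have no required metric."""
        return [t.number for t in self.threats if t.metric is None]


def population_density(count: int, colony_area_ha: float) -> float:
    """Derive density (individuals per hectare) from count and colony area."""
    if colony_area_ha <= 0:
        raise ValueError("colony area must be positive")
    return count / colony_area_ha


subsection = DDSubsection(
    cell="Species-level Composition",
    threats=[
        Threat(1, "declining population size",
               "annual population surveys at impact and offset sites"),
        Threat(2, "sylvatic plague at high population density"),  # no metric
    ],
)

gaps = subsection.unaddressed()          # threat 2 has no metric
density = population_density(120, 8.0)   # hypothetical survey values
```

In this sketch, threat 2 is reported as unaddressed even though the inputs needed to compute density are already collected, which mirrors the note recorded on line “2” of the example D&D.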

The biodiversity scorecard

The Biodiversity Scorecard (Fig. 2) is designed to be a summary table describing the metrics required to mitigate the focal system. As described below, each cell (the nexus of a row and column) represents an element of biodiversity within the focal system. Highlighted areas and numbers in the cells of the Scorecard refer to information in the D&Ds. As such, we recommend that the Scorecard precede the D&Ds in any policy document (or formal review of policy or comparison of policies).

As with Noss’ hierarchy, the Scorecard is composed of three columns and four rows of cells describing elements of biodiversity. The three columns represent the three primary attributes of ecosystems: composition, structure, and function (Jerry and Franklin 1993). The four rows describe the focal system at a range of scales from within a species (genetic and population scale) to the interactions of patches of ecosystems within a landscape. Each cell (the nexus of a row and column) represents one element of Noss’ hierarchy. For example, the cell in Fig. 2 labeled “A” describes the species-level composition of the focal system. Any number within a cell (in the case of the “A” cell in Fig. 2, “1”) describes the number of metrics required by the conservation policy to measure this element. Cells highlighted in yellow (for example, in Fig. 2 cell “A,” but not “B”) represent elements of biodiversity identified by a literature review as of particular conservation significance for the focal system. For example, the primate species Golden Lion Tamarin (Leontopithecus rosalia) is particularly vulnerable to extinction due to inbreeding depression (Dietz et al. 2000). When creating a Scorecard for a conservation program focused on this species of Tamarin, an author would place a yellow highlight in the Function cell at the Genetic scale, because Noss (1990) classifies inbreeding depression as this type of element. Also, see Noss (1990) for in-depth descriptions of each of the elements and biodiversity indicators that describe them.

In addition to describing both the elements of biodiversity considered important for the conservation of the focal system (yellow highlighted cells) and the elements that the policy requires to be measured (numbers within cells), the Scorecard provides an accounting of the comprehensiveness of the metrics required by the policy. The bottom row and rightmost column of the Scorecard summarize the percent of Noss’ elements that are addressed by the required metrics. Each value gives the number of cells in a given row (scale level) or column (attribute) that contain at least one required metric, expressed as a percent of the cells in that row or column. While any cell may have more than one metric addressing it, and thus a number higher than one (e.g., if population counts of two different species are required by a policy, the species-scale composition cell would have the number “2” recorded), the summary values count only whether a cell is addressed, because we intend the summary rows and columns to illuminate the comprehensiveness of the mitigation policy with respect to the breadth and depth of an ecosystem. This is because one noted cause of poorer than expected outcomes in BM is the failure to account for the complexities of the focal system when taking action (Bull et al. 2013; Gelcich et al. 2017). This is particularly the case when engaging in offsetting practices that involve the creation or restoration of ecosystems or habitat, such as wetland mitigation or other similarly intensive practices (Brown and Veneman 2001; Robb 2002; BenDor 2009; Vaissière and Levrel 2015).
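The summary cells can be illustrated with a short calculation. The sketch below assumes the Scorecard is stored as a mapping from (scale, attribute) pairs to metric counts; that representation, the function name, and the example counts are our own assumptions for illustration, not a prescribed format.

```python
# Rows (scales) and columns (attributes) of the Scorecard grid.
SCALES = ["landscape", "community-ecosystem", "species", "genetic"]
ATTRIBUTES = ["composition", "structure", "function"]


def summarize(metric_counts):
    """Return percent of cells with at least one metric, per row and column.

    metric_counts maps (scale, attribute) -> number of required metrics.
    Cells missing from the mapping are treated as having zero metrics.
    """
    row_pct = {}
    for scale in SCALES:
        covered = sum(
            1 for attr in ATTRIBUTES if metric_counts.get((scale, attr), 0) > 0
        )
        row_pct[scale] = 100.0 * covered / len(ATTRIBUTES)

    col_pct = {}
    for attr in ATTRIBUTES:
        covered = sum(
            1 for scale in SCALES if metric_counts.get((scale, attr), 0) > 0
        )
        col_pct[attr] = 100.0 * covered / len(SCALES)

    return row_pct, col_pct


# Hypothetical policy: two species counts and one canopy-structure metric.
example_counts = {
    ("species", "composition"): 2,
    ("community-ecosystem", "structure"): 1,
}
row_pct, col_pct = summarize(example_counts)
```

Note that the species-level composition cell holds two metrics but still counts once toward its row and column, so the summaries reflect breadth of coverage rather than the total number of metrics.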

A reader interpreting the Scorecard should first note the yellow highlighted cells showing the elements of biodiversity identified by published research as particularly valuable in conserving the system. Then, the user should note the yellow highlighted cells that lack a number inside the cell (e.g., the cell marked “C” in Fig. 2). These are instances where research has identified an element of biodiversity as noteworthy in the conservation of the system that the policy does not measure. Alternately, there may be instances where a cell contains a number and yet no yellow highlight. This indicates that a metric, or multiple metrics if the number is greater than one, required by the mitigation policy does not correspond to an element of biodiversity identified by peer-reviewed literature as particularly valuable to the focal system. In either case, a mismatch between identified risks for the focal system (yellow highlights) and metrics required to assess the risks (numbers within the cells) may indicate that a mitigation policy is at risk of decreased success. Because the information provided within the Scorecard itself is limited, an evaluation of a mitigation policy should begin with reading the Scorecard, rather than end with it. Evaluation should then continue with reviewing the D&Ds and finally any in-depth descriptions of metrics (if needed) and guides for application as part of a mitigation project.
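The two kinds of mismatch described here reduce to set differences between the highlighted cells and the cells that contain a metric. The following sketch assumes cells are identified by (scale, attribute) pairs; the function name and the example Scorecard are hypothetical.

```python
def find_mismatches(highlighted, metric_counts):
    """Compare literature-identified cells against cells with metrics.

    highlighted   : set of (scale, attribute) cells flagged by literature review
    metric_counts : dict mapping (scale, attribute) -> number of required metrics

    Returns two sorted lists:
      - highlighted cells that no required metric addresses
      - cells with a metric that the literature review did not highlight
    """
    measured = {cell for cell, n in metric_counts.items() if n > 0}
    unmeasured_priorities = sorted(highlighted - measured)
    unhighlighted_metrics = sorted(measured - highlighted)
    return unmeasured_priorities, unhighlighted_metrics


# Hypothetical Scorecard contents for illustration only.
highlighted_cells = {
    ("species", "composition"),
    ("genetic", "function"),
}
required_metrics = {
    ("species", "composition"): 1,
    ("landscape", "structure"): 1,
}

gaps, extras = find_mismatches(highlighted_cells, required_metrics)
# gaps   -> highlighted but unmeasured: [("genetic", "function")]
# extras -> measured but not highlighted: [("landscape", "structure")]
```

Either list being non-empty is only a prompt to consult the D&Ds; it does not by itself show that a policy will fail.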

Applying the framework

We envision two main applications for the Framework: as a tool aiding in the creation of policy documents and as a means of evaluating the comprehensiveness of the metrics required by a policy or comparing policies. The primary use for the Biodiversity Framework is as a tool that organizes mitigation policy documents in a standardized fashion. While the Framework is not intended to replace a comprehensive policy document, it is a means of concisely summarizing the often necessarily complex policy documents describing the mitigation metrics required in a policy. Additionally, in utilizing the Scorecard portion of the Framework, policy creators may identify flaws in the mitigation policy before it is adopted. A mismatch between vulnerabilities of the focal system and metrics (either a case where research identifies a particular vulnerability and there are no metrics required to assess it in the policy, or a case where a metric is required that does not address a vulnerability) may indicate risk that the policy may not meet conservation goals. As previously stated, a leading cause of mitigation programs failing to achieve goals is a lack of comprehensive metrics that address both short- and long-term vulnerabilities to a system (Bull et al. 2013; Gonçalves et al. 2015; Maron et al. 2016). By using the Framework as a tool to create policy, mismatches may be identified, and hopefully rectified, before policy is adopted. Alternately, if a mismatch is not addressed by requiring additional metrics, the policy could flag these mismatches and suggest means of addressing them in the future if desired mitigation outcomes are not being achieved.

The second application of the Framework is as a means of assessing existing mitigation policies. Reviews of the relative success or failure of mitigation policies are common. Although poor mitigation outcomes are frequently attributed to poor choice of metrics for the focal system (at least as a contributing factor), no tool exists to readily identify which metrics may have been poorly chosen or which particular vulnerabilities of the focal system were not addressed by metrics. Because the Scorecard considers the composition, structure, and function of a focal system and assesses the comprehensiveness of these attributes represented in metrics outlined in mitigation policy, it may predict long-term consequences for other elements of the focal ecosystem that may not have been addressed by metrics. For example, consider an offsetting program that relocates a threatened arboreal primate species to a newly restored area of habitat. If the policy requires population counts of the species (the Species-level Composition element on the Scorecard) but neglects to measure canopy connectedness (Community-Ecosystem-level Structure), the species may establish in the short-term, but ultimately become extirpated at that location. Lastly, it is likely of interest to compare mitigation policies for the same, or similar, focal systems. Whether the same focal system is being mitigated by different policies (for example, one species found in multiple countries) or two different, but similar focal systems, a tool for evaluating metrics may provide a valuable first step in comparison. Comparing the Scorecards for two different programs with different success rates may suggest that a metric employed in one program but absent from another may be particularly critical to success, or at least a point worthy of further investigation.
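A comparison of two Scorecards can be sketched as follows. This is an illustrative example only: the function name, the dictionary representation, and both programs are hypothetical, loosely modeled on the arboreal primate scenario above.

```python
def compare_scorecards(program_a, program_b):
    """List the Scorecard cells measured by one program, the other, or both.

    Each argument maps (scale, attribute) -> number of required metrics.
    """
    cells_a = {cell for cell, n in program_a.items() if n > 0}
    cells_b = {cell for cell, n in program_b.items() if n > 0}
    return {
        "only_a": sorted(cells_a - cells_b),
        "only_b": sorted(cells_b - cells_a),
        "shared": sorted(cells_a & cells_b),
    }


# Hypothetical programs: A measures population counts and canopy
# connectedness; B measures population counts only.
program_a = {
    ("species", "composition"): 1,
    ("community-ecosystem", "structure"): 1,
}
program_b = {
    ("species", "composition"): 1,
}

comparison = compare_scorecards(program_a, program_b)
# comparison["only_a"] lists the canopy-structure cell that B lacks.
```

If program A had the better outcome, the cell listed under "only_a" would be a candidate for further investigation rather than a demonstrated cause.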

Discussion

We began this research as an investigation of which metrics were used to measure biodiversity impacts and conservation outcomes in BO programs around the world. Specifically, we set out to document the comprehensiveness of these metrics with respect to Noss’ Hierarchy of Biodiversity, hoping to identify potential biases in required metrics that may lead to poorer than expected outcomes in this particularly sensitive type of mitigation. In spite of the fact that BO programs are now in use or nascent in at least 45 countries (Madsen et al. 2011), policy documents defining the BO programs in these countries rarely define concise metrics.

As one example of how the Framework may add clarity and transparency, consider the biodiversity offset program at the Rio Tinto QIT Minerals Site in Madagascar. Rio Tinto has partnered with IUCN and other NGOs to develop and help administer this program, and it is often considered a model of success for a private company volunteering to mitigate the considerable impacts of mining through an offset program (Temple et al. 2012; Bidaud et al. 2015). Documentation describing methods for this program provides programmatic goals (e.g., “no net loss”) and describes how offsetting “currency” is calculated by assessing quality multiplied by quantity or land area as a percent of the total remaining geographic range of focal species (Temple et al. 2012), but it does not provide clear methods or tools to assess whether these metrics are adequate for the goals of the program. If the Framework were to have been used in the case of Rio Tinto, it could fill this gap. For example, the Scorecard would provide an overall sense of whether the metrics were comprehensive enough to address the conservation goal, and the D&D would provide the details of how the metrics were derived, and the peer-reviewed scientific research justifying the need for each metric. It may be that these considerations were made during the development of the Rio Tinto program, but since they were not clearly presented and made available to the public through any documentation, it is difficult for others to evaluate the program. The Framework addresses this problem by clearly organizing all the necessary information within a sound and foundational science framework (i.e., Noss 1990). This is particularly valuable for conservation-focused agencies within the Malagasy government or NGOs concerned with the long-term viability of the species and ecosystems impacted by this program as it may shed light on particularly useful metrics in conservation, or those that fall short.

Without a standardized system of review and comparison of the metrics employed in BM, it is unclear whether successes and failures can be attributed to poorly executed policies or poorly designed ones (Curran et al. 2015). Further, both the lack of clarity in policies and poor success in achieving offsetting goals were consistently stated in our communications with practitioners as reasons why BO was not more readily employed.

The Biodiversity Metrics Framework elucidates which biodiversity components are and are not assessed in biodiversity mitigation programs, identifying mismatches between the stated goals of policy and the metrics employed to measure them. By doing so, the Framework may flag accidental oversights that could lead to inaccurate measurements of program outcomes. Since the Framework is grounded in Noss’ Hierarchy, it can provide scientific justification for the metrics required by mitigation in an organized fashion that can be applied to any BM policy.

To be most effective, we believe the Framework should be used in conjunction with other mitigation planning tools such as the Mitigation Hierarchy (ten Kate et al. 2004; BBOP and UNEP 2015), the Metrics Decision Tree (Gamarra et al. 2018), and the site selection protocol described in Kiesecker et al. (2009) to increase accountability and the likelihood of successful conservation. Each of these other tools may prove critical to the success of a conservation program: The Mitigation Hierarchy provides guidance as to what type of conservation action is preferred to best assure “no net loss.” The Metrics Decision Tree defines which general types of metrics may be applicable given available data on a focal species. Kiesecker’s site selection tool identifies target areas based on landscape metrics. The benefit of adding the Framework to this set of BM tools is that it provides a means of organizing and explicitly clarifying which metrics are appropriate for the focal system based upon the best available science. This clarity is necessary to make sure that biodiversity is appropriately and consistently measured under a policy and is a necessary component for each of the mitigation planning tools described above.

We present the Biodiversity Metrics Framework as a working draft of this tool that will hopefully be improved as it is applied. It is limited to addressing elements of biodiversity, rather than a comprehensive assessment of the effectiveness of a policy. As such, the Framework does not measure socioeconomic factors such as cost-effectiveness, support by stakeholders, or a myriad of other metrics that might be used to assess a mitigation policy. It also does not assess compliance, which is another considerable challenge in achieving mitigation goals (Walker et al. 2009; van Teeffelen et al. 2014; Lindenmayer et al. 2017). However, the Framework serves to standardize, and make more transparent, one portion of the creation and evaluation of mitigation policies.

Better BM driven by better accounting of losses and gains is increasingly necessary in the face of continuing human impacts and clear indications that current efforts at ameliorating these impacts have failed (Barnosky et al. 2011; Alvarado-Quesada et al. 2014; Ceballos et al. 2015). Our intent in developing the Framework is to advance the ongoing conversation among the scientific and practitioner communities focused on the improvement of metrics and BM policy. The Framework offers a tangible example of how such a science-based tool could contribute to a more comprehensive assessment of biodiversity, and improve the effectiveness and legitimacy of BM.