1.1 Introduction

An index of biological integrity (IBI) is a tool that may be used by biologists, regulators, planners, and others to ascertain the condition of a habitat type or resource with respect to its biological communities (Karr and Chu 2000; Simon et al. 2000; Miltner et al. 2004). In brief, IBIs are composed of metrics, or characteristics of floral or faunal taxa in a pre-defined system (e.g., floodplain wetland, emergent wetland), that respond minimally to natural variation but in a predictable fashion to human disturbances. Yet the development and adoption of the IBI as a tool are still in a state of flux, because the approach has been continually refined and extended to new geographic regions, habitats, and indicator taxa over the past 30 years.

1.2 Background

There has been a 50 % global decline in wetlands since 1900, with some regions such as New Zealand and California losing up to 90 % due to anthropogenic activities (Spiers 1999). Many countries have begun programs to track the trends and conditions of the remaining wetlands over time, including Australia’s Index of Wetland Condition and Ireland’s Integrated Constructed Wetland program, which have been developed over the last 23 years. These programs incorporate aspects of biodiversity, water management, and landscape fit to help track and define the roles wetlands play in the greater ecosystem.

The monitoring and tracking of aquatic resources in the United States began in earnest in 1972 with the Clean Water Act, which made it necessary for states to evaluate the condition of their water resources. This law was born out of the environmental movement, spurred on in part by Rachel Carson’s Silent Spring (1962). This book about the effects of pesticides and industrial chemicals on animal populations and the environment, together with the fires on the Cuyahoga River (Ohio, USA) in 1969 that were featured in Time Magazine, finally turned public opinion. The government was spurred into action and formed the U.S. Environmental Protection Agency (USEPA), as proposed by Richard Nixon and approved by Congress in 1970. Prior to this time, there was no federal agency tasked with regulating environmental pollutants on a national level. One of the first tasks was for states to develop criteria to establish thresholds for specific contaminants that indicate impairment, which then could be approved by the USEPA. These thresholds were used as surrogates to quantify the level of “physical, chemical, and biological integrity” of the nation’s wetlands, rivers, and lakes. Integrity has always been a somewhat loosely defined term whose meaning depended on the status of the science and on advances in water research. As regulations developed as a result of the law, chemical monitoring of the nation’s waters became standard in water monitoring programs. Contaminant thresholds were used to determine National Pollutant Discharge Elimination System (NPDES) permit standards and to establish water quality standards. However, relying only on water quality as an indicator of biological integrity may be limiting, and the regulatory process has continued to evolve and progress as these shortcomings were recognized (Table 1.1).

Table 1.1 Positive and negative aspects related to chemical water quality monitoring for regulatory biological integrity purposes

Chemical monitoring is a sample from a point in time. It is not indicative of chronic conditions, and results can fluctuate dramatically over geographic regions or over time, in part depending on the biogeochemistry of the underlying region. Additionally, the cumulative effects of pollutants can be overlooked: they are often greater than the sum of their parts, and chemical thresholds offer no way to measure the cumulative impact of these factors on biological integrity. Moreover, chemical monitoring does not capture the other stressors that may affect biological communities, such as flow alterations, habitat degradation, or heated effluent, whose effects may not show up as responses measured against chemical thresholds.

In response to these shortcomings, the U.S. Fish and Wildlife Service (USFWS) developed Habitat Evaluation Procedures (HEP) and Habitat Suitability Indices (HSI) as resource planning tools in the late 1970s and early 1980s. The HEPs were intended to document the quality and quantity of available habitat for a selected species. They provide a relative measure of comparison between different areas at the same point in time, or a relative measure of comparison of the same place at different points in time.

In order to make like-comparisons, each HEP is based on a specific HSI model developed for a target species of interest (e.g., snapping turtle (Chelydra serpentina), American woodcock (Scolopax minor)). HSIs are based on the life-history and habitat preferences of the target species as reported in the literature. This requires an intensive study, or literature search, of the target species throughout its range and a scientific understanding of the natural variation of habitat features throughout that range. A great deal of time and effort has been invested in developing these HSI models since the 1980s. The result is a set of models for over 150 species, including plants, macroinvertebrates, fish and wildlife; however, development of these models has all but ceased and they are much less prevalent today than previously (USFWS 1980).

For each variable considered in the HSI, a sub-index value is generated from 0 indicating no habitat is suitable, to 1.0 indicating the area has the habitat characteristics associated with the potential carrying capacity of the target species. These sub-index values are then averaged for a total HSI value for the area of interest that was surveyed. The HSI value is then multiplied by the area of available habitat to determine Habitat Units (HUs), which are the unit of measurement in Habitat Evaluation Procedures.
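The arithmetic described in the preceding paragraph can be sketched in a few lines of Python. This is a minimal illustration only, assuming a simple arithmetic mean of the sub-index values as the text describes (individual published HSI models may combine their variables differently). The function names, the three sub-index values, and the 12-hectare area are hypothetical and are not drawn from any particular model.

```python
from statistics import mean


def habitat_suitability_index(sub_indices):
    """Average sub-index values (each between 0 and 1.0) into one HSI value."""
    values = list(sub_indices)
    if not values:
        raise ValueError("at least one sub-index value is required")
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"sub-index {value} is outside the 0 to 1.0 range")
    return mean(values)


def habitat_units(hsi, area):
    """Multiply the HSI by the area of available habitat to get Habitat Units."""
    if area < 0:
        raise ValueError("area must be non-negative")
    return hsi * area


# Hypothetical sub-index values for three habitat variables at one site.
site_sub_indices = [0.75, 0.5, 0.25]
hsi = habitat_suitability_index(site_sub_indices)

# Hypothetical area of available habitat (hectares assumed).
hus = habitat_units(hsi, area=12.0)

print(f"HSI = {hsi:.2f}, Habitat Units = {hus:.1f}")
```

Because Habitat Units scale with both quality and area, a large area of mediocre habitat and a small area of excellent habitat can yield the same value, which is one reason the comparison is described above as relative.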

The HEP has been a useful tool, but like chemical monitoring it has a number of inherent strengths and shortcomings (Table 1.2). Despite its broad appeal for managing individual species habitat preferences and its widespread use in environmental impact assessments, its overall utility was limited in terms of permitting and holistic decision-making (Roloff and Kernohan 1999). Many of these models have not been field-tested, calibrated, and verified, which can be grounds for litigation in a regulatory context. The models can be improved through field verification and regional specificity, as demonstrated with the Louisiana waterthrush (Seiurus motacilla) in Pennsylvania riparian corridors (Brooks 1997). But even with improved models, in terms of wetland permitting, individual species are not considered in the context of the Clean Water Act, and the question of biological integrity is not settled by HSI models. For example, a farm pond may not be indicative of a naturally functioning wetland system, but may score a high value for American bullfrogs (Lithobates catesbeianus) and a low value for marsh wrens (Cistothorus palustris). What does this mean in terms of the wetland’s overall biological integrity? Appropriate onsite or offsite wetland mitigation certainly cannot be based on one animal species’ habitat score. Moreover, despite the extensive amount of work that went into creating these HSI models, many of them have not been field verified across the entire distributional range of each species. We know that a species' realized niche may vary across its range depending on other factors such as competition, food, and shelter resources. These niche factors change not only across a species' distributional range, but also at smaller spatial scales. A depression wetland without flowing water is different from a floodplain wetland with a first order stream that occasionally overflows its banks. Without research literature that supports the determination of each species’ optimal carrying capacity for each wetland type, the comparison becomes ambiguous and essentially meaningless for ensuring biological integrity in a regulatory setting.

Table 1.2 Positive and negative aspects related to habitat suitability indices (HSI) models for regulatory ‘biological integrity’ purposes

1.3 The Evolution of Indices of Biological Integrity

As monitoring programs of the nation’s waters progressed, researchers used diatoms and macroinvertebrates as indicators of long-term water quality, as these organisms responded directly and in a predictable fashion to impairment, even when the impairment is not evident at all times (as is the case with many chemicals) (Barbour et al. 1996, 1999; Gerritsen et al. 2000; Hill et al. 2003). These assemblages were effective at identifying impaired waterbodies, and were not focused strictly on one species. The variety of species, all having different preferred habitat requirements, functional trophic levels, and sensitivities to impairment, allowed for a robust data collection effort that varied predictably not only with disturbance but also between habitat types (e.g., high order versus low order rivers); thus, determining a measure of integrity across ranges and habitat types was possible.

Yet, there were still problems with using macroinvertebrates and diatoms as indicators of biological integrity. The life histories of many of these organisms were unknown, as were their tolerance levels to pollutants (Batzer et al. 2001). Baseline research answering these tolerance questions and identifying scientifically vetted thresholds was lacking. Moreover, simply due to the number of species collected, as well as the subtle and complicated morphological cues that are used to identify each species, identification was expensive and required an intensive time commitment. It also became difficult to convey the importance of water quality to the general public in terms of “bugs and slime,” although volunteer programs such as Save Our Streams are challenging this notion. However, people did care about fish, and when James Karr introduced the first fish IBI in 1981, it began a new chapter in using macro-organisms as biological assays to indicate the condition of ecosystem and habitat components (Karr 1991).

Using higher organisms as the basis and taxa of interest had a number of advantages that became apparent in later IBI iterations. The life histories and trophic levels of many larger organisms are well known and documented, as are the physical habitat characteristics that are preferred by each ecological guild. This enabled scientists to track and adjust the biological metrics according to habitat type and region. As fish were the first taxa that were successful at indicating variation attributed to human impairment, and because the Clean Water Act expressly mandates the tracking of biological integrity in waterways of the United States, fish IBIs were naturally adapted and tested across the United States for both large and small river and stream systems. However, it is important to keep in mind that IBIs were initially only being used to assess running waters (i.e., rivers and streams).

As the number of fish and stream IBIs increased, researchers began to examine alternatives for creating indices for both upland and other aquatic habitats. Birds were among the next logical choices; they are relatively conspicuous, much is known about their life histories, they tend to have specific habitat needs, and they are relatively easy to sample. Avian-based indices of biological integrity were developed for riparian areas associated with streams (Croonquist and Brooks 1991), which naturally led to avian assemblages being used to measure the condition of non-wetland habitats (Bradford et al. 1998; Canterbury et al. 2000). In fact, it stands to reason that when designing and developing an IBI, many researchers considered adopting the methods already used by citizen groups to minimize the training necessary to capture data for the IBI (why reinvent the wheel?). This also allows data collected in previous years to be evaluated without having to adapt them to fit new protocols. Vegetation and other taxa have been evaluated (Galatowitsch et al. 1999; Mack 2004), but for the majority of aquatic habitat assessments, fish remained the taxa of choice for IBIs in lakes, ponds, and estuaries (Karr 1991; Moyle and Randall 1998; O’Connor et al. 2000; Simon et al. 2000; Teels et al. 2004; Miltner et al. 2004; Veraat et al. 2004). However, the use of amphibian species, including streamside salamanders and frogs, became more prevalent, as they are also commonly sampled by volunteer groups and are known for their sensitivity to toxins as well as to the surrounding landscape conditions (Micacchion 2004).

1.4 Taxa Groups of Interest

Taxa groups are used to gauge the response to disturbance in wetland IBIs and, accordingly, will respond differently depending on scale. If we consider each wetland to be the bull’s eye in the center of a target, we can imagine each taxonomic group that might be used in an IBI as having concentric rings that represent zones of impairment influence radiating out from the wetland area. Some groups of animals are more susceptible to localized sources of impairment within the wetland, or even within a small fraction of the wetland (e.g., sediment accumulation within the foraging area of shorebirds), whereas others may be affected by regional or landscape level influences (e.g., lack of forested cover for some warbler species). This is an over-simplification, as in reality even the distance to impairments varies from species to species.

Avian species are among the most conspicuous wetland species and, relatively speaking, are easy to monitor through the commonly implemented point-count surveys (Weller 1988). Many states have Breeding Bird Survey (BBS) routes, and routes located within or at the edge of a wetland can often serve as measurements of wetland biological integrity. Callback surveys can augment these surveys to locate the often difficult-to-detect wetland-obligate birds such as American bitterns (Botaurus lentiginosus), sora (Porzana carolina), or other rails. Avian species may respond to structural changes in habitat, as well as indicate wetland functions. For example, the moist-soil management strategies used to promote waterfowl forage are a function of manipulating water levels (Anderson and Smith 2000; Taft et al. 2002). Functionally, this means that a wetland managed in this way has the capacity to provide habitat and to attenuate and moderate flood events.

Amphibians have long been heralded as harbingers of ecological change, as their populations have declined globally while anthropogenic impacts have increased (Wake 1991; Wyman 1990). However, exactly what is behind these population collapses is not entirely clear. Their permeable skin and egg masses, their reliance on terrestrial, wetland, and aquatic habitats (with limited dispersal capacity), and their relatively short-cycled population characteristics demonstrate exactly why this taxon is so susceptible to human stressors (Blaustein et al. 1994). Because of this sensitivity, however, numerous combinations of factors affect populations, making it all the more important to use the appropriate disturbance index when assessing population trends. Amphibian populations, in general, are well known to fluctuate wildly from year to year, even when the environment appears to remain the same (Pechmann et al. 1991). As such, their use as indicator species for tracking human impairment trends can be problematic. However, they are among the easiest taxa on which to collect some data, and this can be done with volunteers conducting call surveys during seasonal windows dictated by the North American Amphibian Monitoring Program (NAAMP). This dataset and the Breeding Bird Survey dataset are some of the most extensive collections of presence or absence and relative abundance data (based on the call surveys) available over such a wide area for any vertebrate taxa.

With so many factors known to influence amphibian reproduction, a detailed disturbance index that is sensitive to only one type of impact (e.g., chemical impairment, unstable or flashy hydrology), or one in which the impacts can be parsed out (e.g., buffer zone vegetation alteration versus wetland sedimentation), may be better suited to this taxon. This will likely result in a limited number of suitable metrics, but will also yield a better and more consistent response than trying to pool all amphibian genera and all the possible stressors together. For example, an IBI that focuses on adult stage ambystomid salamanders would likely be more sensitive to the upland stressors around wetlands than to the biological quality of the wetland itself. Furthermore, in the future, we may find that groups of amphibian species explain more about wetland function than condition. The presence of American bullfrog and northern green frog (Lithobates clamitans) tadpoles would speak to a semi-permanently flooded water regime, which means anaerobic conditions that would facilitate the buildup of organic material (carbon sequestration). This niche-like focus is not necessarily the ideal scenario for conducting IBI work, but considering the larval taxonomic identification skills needed to build an amphibian IBI (Micacchion 2004), this revised approach may yield more clues about wetland function and specific contaminants than just biological integrity.

Plant communities require a highly skilled botanist to compile the complete species list and inventory used to form an IBI, but they are also among the most consistent and responsive groups that indicate varying levels of stressors (Miller et al. 2006). Because of their site fidelity, plants cannot avoid stressors; they are bound to adapt or perish. There are certainly many metrics possible from plant communities, but one of the most commonly used that is based on this site fidelity concept is the Floristic Quality Index (FQI) (Miller and Wardrop 2006; Rentch and Anderson 2006). With this measure, each plant species is assigned a known Coefficient of Conservatism (CoC) from 0 to 10 that is indicative of the plant’s site fidelity. Those with unique and specialized habitat requirements are assigned higher scores, whereas the generalist species are assigned the lower scores. Invasive species are not considered or are assigned a value of 0, depending on the regional formula. This is an index based on the presence or absence of species, whereas other metrics may depend on abundance or coverage data (e.g., percent of graminoid species).
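As a concrete illustration of how CoC values can be turned into an index, the sketch below uses one widely used formulation: mean CoC multiplied by the square root of the number of species scored. The text above does not give a formula and notes that regional formulas differ, so the formula, the function name, and the CoC values here are all assumptions made for illustration. Invasive species are marked with None and are either dropped or scored as 0, mirroring the two regional conventions mentioned above.

```python
from math import sqrt


def floristic_quality_index(coc_values, count_invasive_as_zero=False):
    """Compute mean CoC * sqrt(number of species scored).

    coc_values: CoC scores (0 to 10) for each species present at the site;
                None marks an invasive species with no assigned score.
    count_invasive_as_zero: if True, invasive species are scored as 0;
                            if False, they are left out of the calculation.
    """
    scores = []
    for coc in coc_values:
        if coc is None:
            if count_invasive_as_zero:
                scores.append(0)
            continue
        if not 0 <= coc <= 10:
            raise ValueError(f"CoC {coc} is outside the 0 to 10 range")
        scores.append(coc)
    if not scores:
        return 0.0
    mean_coc = sum(scores) / len(scores)
    return mean_coc * sqrt(len(scores))


# Hypothetical site: four native species plus one invasive species.
site_coc = [8, 6, 4, 2, None]

fqi_excluding = floristic_quality_index(site_coc)
fqi_as_zero = floristic_quality_index(site_coc, count_invasive_as_zero=True)

print(f"Invasives excluded:   FQI = {fqi_excluding:.2f}")
print(f"Invasives scored as 0: FQI = {fqi_as_zero:.2f}")
```

The two conventions give different values for the same species list, so scores are only comparable among sites evaluated with the same regional formula.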

With plants, we know quite a bit about individual life histories, and know that some species are indicators of specific stressors (Mahaney et al. 2004; Magee and Kentula 2005). For example, cattail (Typha spp.) is tolerant of high levels of nutrients, and fox sedge (Carex vulpinoidea) and touch-me-not (Impatiens spp.) can withstand high levels of sedimentation. With this knowledge, we no longer need to determine which impact is affecting the wetland, and can instead focus on finding the source of the sediment and working to correct it, with the expectation of observing plant communities shift over time. However, the operative words are “over time”, which, in addition to the expertise required, is a drawback to using plant communities as bioindicators. The response to disturbance is often delayed as species struggle to adapt, and depending on the dispersal characteristics of some of the more tolerant plants that indicate stressors, it may take longer than one growing season to observe meaningful shifts in population dynamics (Koning 2005).

Macroinvertebrates would seem to be an easy pick for a taxa group to use to evaluate biological integrity in a wetland. They have been used with great success in evaluating streams and, as previously mentioned, were among the first taxa to be used in an IBI (Hilsenhoff 1988). The same would seemingly hold true for wetlands: macroinvertebrates can be easy to sample, are represented by a diverse number of Families with varying life histories and documented, well-established responses to certain stressors, and have already been classified by Family into Functional Feeding Groups (FFGs). However, in reality, we are still attempting to unlock the cues and consistent biological responses. Because macroinvertebrates have relatively short lifespans compared to other species, seasonal variations in temperature, precipitation, and hydrology make it difficult to determine a consistent sampling time frame. Even within a wetland, macroinvertebrate communities vary between water regimes, which fluctuate by nature, so sampling needs to be stratified to account for such variations or be comprehensive enough to capture all the changing parameters in one setting. When seasonally-flooded wetlands dry up in late summer, there is often rapid colonization by terrestrial macroinvertebrates that may confound the response signal when looking for indicators based on aquatic or wetland assemblages (Batzer 2004). Furthermore, even if an area does not dry up, flashiness in the hydrology (perhaps due, in part, to impervious surfaces) may only be evident for a day or two, yet it can disrupt life stages and leave no trace a week or so later or during a sampling event. Another issue at hand is the resolution of the data collected. Family-level identification may not be sufficient, and identification at the genus or species level would certainly be more telling (as it is in streams and rivers); however, finding people with the expertise to do family-level identification is difficult, and even more so at the genus level for wetland macroinvertebrates (Bailey et al. 2001).

We might want to consider further investigations into evaluating macroinvertebrate communities as indicators of wetland function (Batzer and Wissinger 1996; Brady et al. 2002). Just as some larval amphibians require semi-permanently flooded water for over a year before metamorphosis, the relative complexity and balance of macroinvertebrates can offer clues into function (Cummins and Merritt 2001). More predators may indicate a more consistent hydrology that can support multiple trophic levels of macroinvertebrates. If there is plenty of organic matter, but not many collectors or shredders, it may be an indication of some toxic effect in the water that is impairing community structure. We still have a lot to learn about macroinvertebrate communities in wetlands, but the key will be determining a consistent strategy for sampling the variability associated with both yearly weather fluctuations and wetland heterogeneity.

There are undoubtedly many other taxa that can be used to detect trends in wetland impairment; among the most commonly considered are algal and/or bacterial communities (Hill et al. 2003). Despite the research being conducted, the level of expertise that is required to evaluate and assess these communities is certainly a barrier to widespread adoption. Metrics may be locale specific, as these communities vary widely regionally. Furthermore, and perhaps more importantly, it is not easy to relay this information to the public.

Each taxa group used to track biological integrity has positives and negatives. The key is knowing the resources available and using them accordingly. Table 1.3 contains a comprehensive list of different metrics that can be drawn from common taxa groups. Researchers must assess how easily the information is collected. Does it take a trained person, or can volunteers collect the data? How accurate are the results? Call counts for amphibians are easy, but the presence or absence and relative abundance data lack resolution. Finally, one must remember that ultimately these results must be communicated to the public. This is an often overlooked aspect of creating IBIs; they need to be presented in a way that lets people know what type of work is going on and how it may affect their daily lives. If you can talk to stakeholders in terms that they understand (e.g., how wetland impacts lead to more or fewer waterfowl, how wetland grasses and flowers relate to clean water, and what this all means in terms of sewage treatment costs), then they are more likely to show an interest in the condition of the wetlands in their community.

Table 1.3 A comprehensive list of potential metrics by taxa group that can be used to develop wetland indices of biological integrity

1.5 Designing and Building an IBI

The monitoring of each taxonomic group of interest is done by volunteer groups and by agency and academic personnel. Although there are variations in the level of detail of the data collected, standardized census techniques have generally been developed for each taxa group regardless of geographic location. This allowed for the development of regional IBIs for rivers, streams, and lakes. Each IBI had to be developed not only for a specific geographic location, but also for variations within that classification (e.g., high-gradient vs. low-gradient streams; large vs. small lakes). However, wetlands have yet another layer of complexity due to numerous variations in landscape settings coupled with two systems for classifying wetland type.

The Cowardin et al. (1979) wetland classification is the standard for the National Wetlands Inventory (NWI) and is the most commonly used wetland classification system in the United States. It is based primarily on vegetative structure, with special modifiers describing water regime or other characteristics. However, comparing the biological attributes of wetlands within the same structural class (i.e., emergent to emergent, scrub-shrub to scrub-shrub, forested to forested) does not always reveal a large number of metrics that could discriminate between levels of human impairment.

The Cowardin et al. (1979) classification scheme does provide a basis for comparing one wetland to another, but how valuable is this in terms of comparing ecological integrity? An emergent depression in the middle of a farm field functions differently than an emergent floodplain bench along a flowing stream or river. When comparing one large river system to another, or one small system to another small system, are we not really comparing two systems that function alike in a landscape context? The river continuum concept (Vannote et al. 1980) states that the biological structure and function of a river’s ecosystem change in a predictable fashion as it increases in size from upstream to downstream. Likewise, the hydrogeomorphic (HGM) classification scheme (Brinson 1993) uses a wetland’s source of hydrology, placement in the landscape, and underlying geology to describe classes of wetlands based on their potential function in the landscape. For example, regardless of vegetative structure, a floodplain wetland typically will receive and attenuate overland flooding, and slow down and retain sediments from both uplands and upstream on a landscape level. A basin wetland with no outlet sequesters and processes nutrients and toxins. Biological communities of species that depend on a wetland to perform these functions will therefore respond predictably as the level of function is changed due to human impairment. Therefore, it may be more relevant, in some cases, to compare wetlands based on HGM landscape classifications rather than Cowardin structural classes.

When developing an IBI, it is not necessarily imperative to decide first which classification system you will be using, as the data can still be collected and then categorized and examined for patterns post-hoc. However, it is important to compare like to like and to maximize your sampling effort. In terms of ensuring a meaningful representative sample, the best available data often are found in the NWI, so the Cowardin classifications may be used to select and randomize wetlands and make sure that all known wetland vegetative types are sampled in accordance with their frequency. There are currently efforts to use Geographic Information Systems (GIS) to classify wetlands remotely based on function, but this is a long way from any sort of national standardization. The HGM classes can then be determined as each site is sampled. The take-home message is to consider the unit of interest that the IBI will be designed around, compare apples to apples, and ensure that the classes of each unit make sense biologically.
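The site-selection idea in this paragraph, randomizing wetlands within each Cowardin vegetative type and sampling each type in accordance with its frequency, can be sketched as a proportional stratified random sample. This is an illustrative sketch under stated assumptions, not a prescribed protocol: the function name, the wetland identifiers, the inventory counts, and the rule of sampling at least one site per class are hypothetical. PEM, PFO, and PSS are the Cowardin codes for palustrine emergent, forested, and scrub-shrub wetlands.

```python
import random
from collections import defaultdict


def stratified_sample(wetlands, n_sites, seed=None):
    """Randomly select wetlands in proportion to each class's frequency.

    wetlands: list of (wetland_id, cowardin_class) tuples, e.g. from an
              inventory table.
    n_sites:  target number of sites; the total returned can differ slightly
              because quotas are rounded and every class gets at least one.
    Returns a dict mapping each class to its list of selected wetland ids.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for wetland_id, cowardin_class in wetlands:
        by_class[cowardin_class].append(wetland_id)

    total = len(wetlands)
    selected = {}
    for cowardin_class in sorted(by_class):
        ids = by_class[cowardin_class]
        quota = max(1, round(n_sites * len(ids) / total))
        quota = min(quota, len(ids))  # cannot sample more sites than exist
        selected[cowardin_class] = rng.sample(ids, quota)
    return selected


# Hypothetical inventory: 60 emergent, 30 forested, 10 scrub-shrub wetlands.
inventory = (
    [(f"PEM-{i:03d}", "PEM") for i in range(60)]
    + [(f"PFO-{i:03d}", "PFO") for i in range(30)]
    + [(f"PSS-{i:03d}", "PSS") for i in range(10)]
)

sites = stratified_sample(inventory, n_sites=20, seed=42)
for cowardin_class, ids in sites.items():
    print(cowardin_class, len(ids), ids[:3])
```

HGM class is not an input here because, as noted above, it is typically determined in the field as each selected site is visited.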

1.5.1 The Disturbance Gradient

No matter what unit is being assessed by an IBI, or which taxa group is used (e.g., avian, amphibian, or vegetation communities), it is important to select a disturbance gradient that adequately captures and quantifies variation attributed to impaired or impacted systems. This disturbance gradient may be a simple checklist of stressors that are present (Jacobs 2010), a GIS derived gradient (e.g., percent forested area within a given area) (Brooks et al. 2006), a series of questions with multiple choice answers indicative of increasing levels of disturbance (Mack 2001), or some combination of these (Collins et al. 2008). Each gradient has relative advantages and disadvantages, and the most effective gradient will likely vary with the species of interest used to gauge the level of impairment (Table 1.4).

Table 1.4 Positives and negatives associated with different types of disturbance gradients used to test the response of taxa groups to human impairment

When selecting a disturbance gradient, it is important to consider the overall context. We know that the Clean Water Act calls for ways to measure the integrity of wetlands, but measuring biological communities as surrogates of integrity is only part of the mandate. Is it really reasonable to go out and intensively sample these communities? The time, energy, and effort expenditures involved with this sampling are prohibitive in terms of both cost and logistics. Birds need to be sampled during the breeding season when males are vocalizing, and two point-counts, at minimum, are needed to assess basic population characteristics. Amphibian communities are stochastic and explosive in numbers and respond to local atmospheric conditions; it is impossible to sample on every warm, rainy spring night, so some species may be missed. IBIs are considered a Level 3 assessment, or intensively collected data that must be calibrated and paired with the Level 2 assessment, or rapidly collected field data. These rapid assessment data can be collected over a more generic time period, and then paired with the temporally-specific IBI data. Many of the disturbance gradients used to calculate IBI scores and effectiveness are actually these Level 2 data (Mack 2001). Rapid assessments can also be much more than just the disturbance gradient; they can form the basis for comparing and validating other Level 3 assessments (intensively collected data) such as hydrogeomorphic models. The message remains that one should think of all aspects of the project and its implications and select a disturbance gradient that may be used for multiple scenarios, so that data can be leveraged and future research does not start from square one.

For example, a stressor checklist is among the easiest disturbance gradients to fashion and fill out. It can serve as Level 2 data to provide a sense of wetland biological condition, but its utility is limited in terms of determining levels of other functions (e.g., floodwater attenuation, carbon sequestration, nutrient processing) in the wetland. Multiple choice based disturbance gradients, or rapid assessments, do give us a relative level of the impairment that is occurring in the wetland, but the scoring systems are categorical and can be problematic in measuring a response signature of multiple biological communities.
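To make the checklist idea concrete, the sketch below scores each site by counting the stressors observed and then checks how a candidate biological metric responds along that gradient. The chapter does not prescribe a statistic; Spearman rank correlation is an assumption chosen here because checklist scores are ordinal and often tied. The stressor names, site data, and metric values are hypothetical.

```python
def checklist_score(stressors_present, checklist):
    """Count how many stressors from the checklist were observed at a site."""
    return sum(1 for stressor in checklist if stressor in stressors_present)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sd_x * sd_y)


# Hypothetical stressor checklist and site observations.
CHECKLIST = ["ditching", "fill", "mowing", "livestock grazing", "road in buffer"]
sites = [
    {"stressors": set(), "metric": 30},
    {"stressors": {"mowing"}, "metric": 25},
    {"stressors": {"mowing", "ditching"}, "metric": 18},
    {"stressors": {"fill", "ditching", "road in buffer"}, "metric": 12},
    {"stressors": set(CHECKLIST), "metric": 5},
]

scores = [checklist_score(site["stressors"], CHECKLIST) for site in sites]
metric = [site["metric"] for site in sites]
rho = spearman_rho(scores, metric)
print(f"checklist scores = {scores}, Spearman rho = {rho:.2f}")
```

In this made-up example the metric declines steadily as stressors accumulate, so rho is close to -1. A metric that showed no consistent trend along the gradient would be a weak candidate for an IBI.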

1.5.2 After the Disturbance Gradient, the Nuts and Bolts of Building an Index of Biological Integrity

Once a disturbance gradient has been settled upon, the first step of actually building an IBI is to designate reference sites. Reference is a slippery term. Does it refer to pre-colonial conditions that no longer exist, or is it the best modern-day equivalent? It is impossible to establish a baseline condition for wetlands based on true habitat and landscape variables, so reference sites are used as examples of the best sites and the worst sites captured in your disturbance gradient (USEPA 2002). However, be aware that what is considered reference can vary due to geographic location, and is specific to the type of classification system the IBI is being based upon. Just as terrain varies regionally, the reference standards are likely to vary from region to region (see breakout box).

West Virginia has extremely variable terrain from the Appalachian Highlands to the banks of the Ohio and Potomac Rivers. Many people consider the Canaan Valley National Wildlife Refuge (CVNWR) to be the premier wetland system in the state. However, this wetland complex is somewhat of an anomaly in the state due to its size and its location on the top of the Allegheny Mountains. It would not make a good reference site if we were comparing it to all wetland conditions in the state, although it is comparable to other high elevation wetlands. The inherent differences between the relatively unimpaired high elevation systems of CVNWR and unimpaired systems along the Ohio River will undoubtedly produce a lot of “noise” that can render IBI development impossible. For example, the pickerel frog (Lithobates palustris) is common and found throughout West Virginia; however, it is less likely to be found in the high-elevation wetlands of Canaan Valley than in the floodplains and swales of the low-lying Ohio River and other large systems. As such, basing a metric on the pickerel frog would be misleading due to the inherent variation in habitat requirements and range throughout the study area.

There is no one way to determine reference sites, and it will undoubtedly vary between researchers and projects based on background knowledge. Before determining which population characteristics, or metrics, are applicable to an IBI, the question of reference and stressed sites will need to be settled. It can be as simple as identifying the top and bottom 25 % sites based on the disturbance gradient (Barbour et al. 1995). Bear in mind that there is variability even among reference conditions, as some things are not captured by the disturbance gradient. That being said, it is critical to have an adequate sample of both good and poor condition sites from which to begin comparing metric effectiveness (Chipps et al. 2006).
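The top and bottom 25 % rule mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `designate_sites`, the site labels, and the `higher_is_better` flag are hypothetical, and ties in the disturbance score are broken arbitrarily by sort order.

```python
def designate_sites(disturbance_scores, fraction=0.25, higher_is_better=True):
    """Label each site 'reference', 'stressed', or 'intermediate'.

    disturbance_scores: dict mapping site name -> disturbance-gradient score.
    fraction: share of sites placed in each tail (0.25 = top and bottom 25 %).
    higher_is_better: True when a high score means little disturbance.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    # Order sites from most disturbed to least disturbed.
    ordered = sorted(disturbance_scores, key=disturbance_scores.get,
                     reverse=not higher_is_better)
    n_tail = int(len(ordered) * fraction)  # sites per tail, rounded down
    labels = {site: "intermediate" for site in ordered}
    for site in ordered[:n_tail]:
        labels[site] = "stressed"
    for site in ordered[len(ordered) - n_tail:]:
        labels[site] = "reference"
    return labels


# Hypothetical rapid-assessment scores for eight sites.
scores = {"A": 4, "B": 10, "C": 15, "D": 20, "E": 25, "F": 30, "G": 35, "H": 39}
labels = designate_sites(scores)
```

With eight sites, two fall in each tail, so a real project would need a considerably larger sample to obtain an adequate number of reference and stressed sites.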

Once the reference and stressed sites have been determined, box-and-whisker plots are commonly used to compare metrics, or characteristics of the sampled population, between the categories (Fig. 1.1). A thorough literature search of the taxa group of interest should yield a large number of candidate metrics that should be tested for inclusion in the IBI. The bigger the group of metrics, the more likely it will be to find appropriate metrics that are responsive to the disturbance gradient. Do not be discouraged by having too many metrics, or by many of the metrics not showing a consistent response between reference and stressed sites. This is to be expected; however, it is also imperative that the metrics make biological and logical sense. For example, metrics developed for the playas of the Great Plains, such as the number of waterfowl, may not be applicable in the Appalachian Highlands. The Highlands are not on the route of any major waterfowl flyways, and there are many habitat differences that do not support the large numbers of wintering waterfowl. Therefore, it would be permissible to omit this metric in favor of a more biologically meaningful one, such as the number of neotropical migrants like common yellowthroat (Geothlypis trichas) and yellow warbler (Dendroica petechia) that are more common in West Virginia. IBI development entails a clear, consistent step-wise process that should eliminate nonresponsive metrics, redundant metrics, or those that may vary based on the classification or wetland setting.

Fig. 1.1 A visual comparison of metric values, examining the interquartile range and median, is the first step used to eliminate nonresponsive metrics and yields a narrative rating of discriminatory power (Barbour et al. 1996). Metrics are classified as excellent, good, fair, or poor. The excellent rating indicates that there is no overlap between interquartile ranges, whereas the good rating may have some overlap, but the median metric score does not overlap with the interquartile range. Fair and poor metrics should be removed from further analysis.
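The narrative rating described for Fig. 1.1 can be approximated numerically. The sketch below is hedged in two ways: the quartiles use linear interpolation (other conventions exist), and "good" is interpreted as overlapping interquartile ranges where neither group's median falls inside the other group's interquartile range, which is an assumption about the wording above. Fair and poor are not separated because the text does not define them. The function names are hypothetical.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    pos = (len(data) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def box_plot_rating(reference_values, stressed_values):
    """Rate a metric from the overlap of the two interquartile ranges."""
    r_lo, r_med, r_hi = (percentile(reference_values, p) for p in (25, 50, 75))
    s_lo, s_med, s_hi = (percentile(stressed_values, p) for p in (25, 50, 75))
    iqr_overlap = r_lo <= s_hi and s_lo <= r_hi
    if not iqr_overlap:
        return "excellent"
    # Assumed reading of "good": neither median lies inside the other IQR.
    ref_median_inside = s_lo <= r_med <= s_hi
    str_median_inside = r_lo <= s_med <= r_hi
    if not ref_median_inside and not str_median_inside:
        return "good"
    return "fair or poor"


# Hypothetical metric values at five reference and five stressed sites.
rating = box_plot_rating([6, 8, 10, 12, 14], [2, 4, 7, 9, 11])
```

A numeric rating of this kind is a complement to, not a replacement for, actually looking at the plots.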

Discrimination efficiency, or the effectiveness of the metric value, is one way to quantify how well a metric separates reference from stressed sites. Metrics rated good or excellent based on the box-and-whisker results are retained. After this visual screening, a discrimination efficiency value is calculated (Eq. 1.1), and metrics with a value less than 60 % are discarded because of their inability to consistently differentiate between reference and stressed conditions (Maxted et al. 2000).

$$ \mathrm{Discrimination\ Efficiency} = 100 \times \left( a/b \right) $$
(1.1)

where,

  • a = the number of stressed sites scoring below the 25th percentile of the reference sites

  • b = the total number of stressed sites.
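As a worked illustration of Eq. 1.1, the following sketch computes discrimination efficiency for a metric that decreases with disturbance. The 25th percentile is obtained by linear interpolation, which is an assumption (percentile conventions differ slightly between software packages), and the function names and example values are hypothetical.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    pos = (len(data) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def discrimination_efficiency(reference_values, stressed_values):
    """Eq. 1.1: 100 * (a / b).

    a = number of stressed sites scoring below the 25th percentile of
        the reference sites; b = total number of stressed sites.
    """
    if not stressed_values:
        raise ValueError("at least one stressed site is required")
    threshold = percentile(reference_values, 25)
    a = sum(1 for value in stressed_values if value < threshold)
    b = len(stressed_values)
    return 100.0 * a / b


# Hypothetical species-richness values.
reference = [8, 9, 10, 11, 12]   # 25th percentile = 9
stressed = [3, 5, 7, 8, 10]      # four of five fall below 9
de = discrimination_efficiency(reference, stressed)  # 80.0, above the 60 % cutoff
```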

Redundant metrics among those that discriminate between reference and stressed sites may be eliminated using Spearman’s R correlation (Hughes et al. 1998). Metrics with an R-value >0.80 are considered correlated, although this is a subjective value and sometimes 0.70 or 0.90 is used instead (Hughes et al. 1998). This rank correlation is preferred over Pearson’s R correlation (raw numbers, not ranked) because it does not rely on normal distribution assumptions. Of each correlated pair of metrics, the one with the greater discrimination efficiency between reference and stressed sites is retained for inclusion in the IBI. If correlated metrics have the same discrimination efficiency, then both metrics can be retained for further screening to determine which metric is best suited for inclusion in the IBI.
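The redundancy screen can be sketched as follows. Spearman's correlation is computed here as Pearson's correlation on average ranks, using only the standard library. Comparing the absolute value of the correlation with the cutoff is an assumption (so that strongly negative pairs are also flagged), and the metric names and efficiency values are hypothetical. A metric with no variation among sites would cause a division by zero and should be removed beforehand.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_redundant(metrics, efficiency, cutoff=0.80):
    """Keep the metric with the greater discrimination efficiency in
    each correlated pair; keep both when efficiencies are equal."""
    dropped = set()
    for m1, m2 in combinations(sorted(metrics), 2):
        if m1 in dropped or m2 in dropped:
            continue
        if abs(spearman(metrics[m1], metrics[m2])) > cutoff:
            if efficiency[m1] == efficiency[m2]:
                continue  # tie: retain both for further screening
            dropped.add(m1 if efficiency[m1] < efficiency[m2] else m2)
    return [name for name in metrics if name not in dropped]


# Hypothetical metric values at five sites and their Eq. 1.1 efficiencies.
metrics = {"richness": [1, 2, 3, 4, 5],
           "diversity": [2, 4, 5, 8, 10],
           "pct_insectivore": [5, 1, 4, 2, 3]}
efficiency = {"richness": 70, "diversity": 85, "pct_insectivore": 65}
kept = drop_redundant(metrics, efficiency)
```

Here richness and diversity are perfectly rank-correlated, so only the one with the greater discrimination efficiency survives.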

To ensure that each remaining metric is not responding to regional or classification-based influences, we can test with a simple two-way analysis of variance (ANOVA). This may require multiple tests depending on the number of categorical classifications for the waterbody type. For example, a metric can be tested to ensure that the population’s signature response does not vary by ecoregion (Brooks et al. 1998); a second ANOVA can determine if the unit of interest varies due to some secondary classification (e.g., Cowardin classification or HGM setting). If we are designing a wetland IBI for floodplain wetlands, should we not ensure that the metrics will be consistent regardless of whether it is an emergent floodplain or a forested floodplain? Metrics may need to be transformed so as not to violate normality assumptions (skewness and kurtosis between −1 and 1); however, in some cases, normality assumptions may need to be overlooked, as the violation of normality may be a function of too few samples (Miller et al. 2006). Based on these analyses and results, any metrics that do respond to regional or secondary wetland classification differences should be omitted from the final IBI if the desire is to have a state-wide or larger area of impact.

Metrics that pass these preliminary filters may then be evaluated for a cumulative effect with a multivariate analysis of variance (MANOVA). This screening checks for a cumulative interactive effect, which may exist even though each metric was tested individually against the previously mentioned classification and regional effects. If such an effect is found, a metric may be omitted based on best professional judgment. If an omission is necessary, the remaining IBI metrics should be re-screened to ensure that no significant influence remains.

After this series of screenings to finalize the metrics in an IBI, it is necessary to assign scoring values to each of the metrics. There are two general approaches to scoring metric values, continuous and discrete, and each has its relative advantages and disadvantages (Table 1.5). Either approach can be implemented in multiple ways, although all metrics must be on the same scale or scoring system.

Table 1.5 The characteristics of different scoring techniques used to score IBI

Discrete scoring essentially involves taking the range of values, then breaking them up based on some measure, and assigning a value to each scoring category or bin. These bins are typically determined by subjective percentile ranking (0–25 %, 26–50 %, etc.). Scoring values are then assigned to each bin. These values are subjective and may follow a pattern such as 1, 3, 5 if there are only three bins, or 3, 6, 9, 12 if there are four bins. However, continuous scoring typically fares better in comparisons than discrete scoring methods for metrics (Blocksom 2003). If choosing to base scoring on a continuous system, the integer metrics, such as richness, are then normalized (0–1) to allow scoring comparisons with other metrics (Eq. 1.2).

$$ \mathrm{Normalized\ value} = \frac{\mathrm{metric\ value}}{\mathrm{maximum\ metric\ value\ observed\ in\ the\ data}} $$
(1.2)

Other metrics that respond positively to human impairment, such as the percentage of a tolerant species, need to be inverted (Eq. 1.3) to enable a consistent response for all metric values.

$$ \mathrm{Inverted\ metric\ value} = \left| 1 - \left( \mathrm{metric\ responding\ positively\ to\ human\ impairment} \right) \right| $$
(1.3)
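Equations 1.2 and 1.3 translate directly into code. The sketch below assumes that percentage metrics have already been expressed as proportions between 0 and 1 before inversion; the function names and example values are hypothetical.

```python
def normalize(values):
    """Eq. 1.2: divide each metric value by the maximum observed value."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("maximum observed value must be positive")
    return [value / peak for value in values]


def invert(proportions):
    """Eq. 1.3: |1 - p| for metrics that increase with human impairment.

    Assumes each p is a proportion in [0, 1], not a percentage.
    """
    return [abs(1.0 - p) for p in proportions]


# Hypothetical species richness at three sites.
richness = normalize([4, 8, 16])          # [0.25, 0.5, 1.0]
# Hypothetical proportion of tolerant species at two sites.
tolerant_inverted = invert([0.25, 1.0])   # [0.75, 0.0]
```

After both steps, a higher value indicates better condition for every metric, which is what allows the metrics to be combined later.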

After these transformations, metrics can be scaled to a continuous 0–10 scale (Blocksom 2003). The influence of outlier values can be mitigated by using the best standard value (BSV) of each metric, taken to be the 95th percentile of the observed values. Metric scores are standardized by dividing the raw metric value by the range in that metric (Hill et al. 2003) and multiplying by 10 (Eq. 1.4).

$$ \mathrm{Metric\ score} = 10 \times \left( \frac{\mathrm{raw\ metric\ value}}{\mathrm{95th\ percentile} - \mathrm{low\ metric\ value}} \right) $$
(1.4)
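A minimal sketch of Eq. 1.4 follows. Three choices here are assumptions rather than part of the equation: "low metric value" is read as the minimum observed value, the 95th percentile is obtained by linear interpolation, and scores are optionally capped to the 0–10 range so that sites above the best standard value do not exceed 10. The function names are hypothetical.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    pos = (len(data) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def metric_scores(raw_values, cap=True):
    """Eq. 1.4: 10 * raw / (95th percentile - low metric value)."""
    best_standard_value = percentile(raw_values, 95)
    low = min(raw_values)  # assumed meaning of "low metric value"
    spread = best_standard_value - low
    if spread <= 0:
        raise ValueError("metric has no usable range")
    scores = [10.0 * value / spread for value in raw_values]
    if cap:
        # Assumed step: keep scores on the stated 0-10 scale.
        scores = [min(10.0, max(0.0, score)) for score in scores]
    return scores


# Hypothetical raw metric values 0, 1, ..., 20 at 21 sites.
raw = list(range(21))
scores = metric_scores(raw)  # the 95th percentile (19) maps to a score of 10
```

Note that Eq. 1.4 divides the raw value itself, not the raw value minus the low value, so when the low value is well above zero the uncapped scores can exceed 10 for many sites.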

Using the metrics appropriate for each classification, IBIs are formed by summing all metrics selected for inclusion to a single composite score. There is no set number of metrics, and they may vary by each classification. For example, the number of suitable metrics that could consistently discriminate between reference and stressed conditions in a depression wetland will likely be different than that of a floodplain wetland. After these resulting IBIs are derived, statistical tests should be performed to ensure a meaningful and significant response to disturbance. This may simply be done based on linear regression or some other dose-response type analysis.
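The summation and the follow-up regression can be illustrated with the standard library alone. This sketch fits an ordinary least-squares line of IBI score on disturbance score and reports the coefficient of determination; it does not compute a p-value, which would require a statistics package. All names and numbers are hypothetical.

```python
def composite_ibi(metric_scores_by_site):
    """Sum the scored metrics at each site into one IBI value."""
    return {site: sum(scores.values())
            for site, scores in metric_scores_by_site.items()}


def linear_regression(x, y):
    """Ordinary least squares of y on x: (slope, intercept, r_squared)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary among sites")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical 0-10 metric scores at three sites.
site_metrics = {
    "site1": {"pct_insectivore": 3.0, "diversity": 2.5},
    "site2": {"pct_insectivore": 6.0, "diversity": 5.5},
    "site3": {"pct_insectivore": 9.0, "diversity": 8.0},
}
ibi = composite_ibi(site_metrics)
# Hypothetical disturbance scores (higher = less disturbed) for the same sites.
disturbance = {"site1": 8, "site2": 20, "site3": 35}
sites = sorted(ibi)
slope, intercept, r2 = linear_regression([disturbance[s] for s in sites],
                                         [ibi[s] for s in sites])
```

A positive slope with a high coefficient of determination would be consistent with the IBI tracking the disturbance gradient, although three sites are far too few for any real inference.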

The disturbance gradient and the distribution of the IBI scores for the reference sites may be used to set numeric thresholds describing wetland condition with regard to biological integrity (Gerritsen et al. 2000). For example, categorical threshold limits for IBI scores, if set using the 75th, 25th, and 5th percentiles for all sites, may indicate good (>75 %), fair (74–25 %), poor (24–5 %), and very poor (<5 %); however, we should caution that these categories and thresholds are completely subjective and reliant upon the researcher capturing the full range of wetland condition from the very best to the very worst.
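The percentile-based categories in this example can be sketched as below. How scores falling exactly on a percentile are assigned is an assumption here (only scores strictly above the 75th percentile are rated good), as is the use of linear interpolation for the percentiles; the function names are hypothetical.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between closest ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    pos = (len(data) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def make_classifier(ibi_scores):
    """Return a function mapping an IBI score to a condition category,
    using the 75th, 25th, and 5th percentiles of the supplied scores."""
    p75 = percentile(ibi_scores, 75)
    p25 = percentile(ibi_scores, 25)
    p5 = percentile(ibi_scores, 5)

    def classify(score):
        if score > p75:
            return "good"
        if score >= p25:
            return "fair"
        if score >= p5:
            return "poor"
        return "very poor"

    return classify


# Hypothetical IBI scores 0, 1, ..., 100 from 101 sites.
classify = make_classifier(list(range(101)))
category = classify(80)  # "good"
```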

1.6 West Virginia Avian Wetland Index of Biological Integrity Case Study

In 2005, West Virginia began conducting its own IBI research (Veselka 2008; Veselka et al. 2010a, b). One of the taxa groups studied was avian assemblages because they are generally conspicuous creatures with a long history of species-specific recorded life histories that are conducive to determining assemblage patterns in response to disturbance. Moreover, birds are commonly censused by volunteers using methods described in the Breeding Bird Survey (BBS) so that the responsive IBI could easily be derived using existing data and familiar methods over the entire state. Over the course of two summer field seasons, 151 wetlands were surveyed for birds twice each between 15 April and 1 June, but for the purpose of this case-study we will only consider Floodplain and Scrub-shrub wetlands (Table 1.6), stratified across all three aquatic ecoregions (Woods et al. 1999). Due to the high number of classes associated with regional HGM subclasses (Cole et al. 1997), classes were combined into designated HGM management classes to bolster sample size and to facilitate meaningful comparisons as most general environmental practitioners without a wetland background would likely find many of these subclass designations confusing and likely overlapping (riparian depression versus headwater floodplain).

Table 1.6 Total number of sites by regional hydrogeomorphic (HGM) subclass, designated HGM management class, and Cowardin class by ecoregion for use in developing class specific avian wetland indices of biological integrity (AW-IBI) in West Virginia, USA from 2005 to 2006

At each wetland, in addition to assigning Cowardin and HGM classifications, a disturbance index was recorded. This disturbance gradient denoted the relative levels of human impairment visible in each wetland according to the Ohio Rapid Assessment Method (Mack 2001), an established methodology capable of differentiating various levels of disturbance. These scores theoretically ranged from a low of 4, representing poor conditions, to a high of 39, indicative of no visible signs of human impairment, based on upland buffer width, amount and intensity of surrounding land use, hydrologic modifications, substrate alteration, and habitat alteration.

The reference and stressed designations were based on the top and bottom 25 % of sites scored. Candidate avian IBI metrics were pulled from the literature and compared using box-and-whisker plots, then the Spearman’s R correlation, and finally evaluated for ecoregion or classification effects based upon the series of ANOVAs and MANOVAs. In the case of floodplain wetlands, there were a total of 22 candidate metrics derived from the literature, of which only four made it through all the analysis screenings to be included in the final floodplain bird-based wetland IBI. These metrics were the percentage of permanent resident and edge tolerant birds, the percent of omnivorous birds, the Shannon-Weaver diversity index, and percent of insectivorous birds.

The scrub-shrub wetlands also had four metrics. Three of them were the same responsive metrics found in the floodplain-based IBI (i.e., percent edge tolerant and residential birds, percent omnivorous, and percent insectivorous), and the fourth was the percent of habitat-specific neotropical migrants. In each case, the four metrics were normalized from 0 to 10 and summed to generate a combined score. The hypothesis was that the higher the score, the greater the biological integrity of the wetland. Moreover, due to the dual classification schemes, we combined the two indices, averaging the values of the like metrics, to create a new type of wetland IBI that is based on two distinct classifications. This enabled a finer level of resolution, providing a new level of specificity for wetlands in evaluating biological integrity. Considering there is no additional field work required for this specificity, only additional categorization and analyses, the increase in a consistent response signature is impressive (Table 1.7).

Table 1.7 The relation between avian-wetland indices of biological integrity (AW-IBI) and the Ohio Rapid Assessment Method derived disturbance gradient (Mack 2001)

So how will IBI data be used in the future? What does monitoring tell us and how can this be used to protect and maintain the conditions of wetlands? Going back to the Clean Water Act, we remember that no net loss not only pertains to area, but wetland function as well. Indices of biological integrity tell us about the condition of a wetland, but we must remember that not every permit or wetland alteration will be required to conduct a bioassessment – the assessments are too limited to sampling windows and timeframes. So how will all these bioassessments be used to ensure ‘no net loss’?

We need to interpret IBIs in the correct context. What do the scores mean and how can we be assured they are meaningful? The most common way of defining bins of integrity is using breaks based on the percentile values of score (USEPA 2002). For example, the top 15 % of scores may indicate optimal conditions, the next 25 % may be suboptimal, followed by breaks defining marginal and poor condition; but is there a better way to define these breaks?

IBIs are part of the U.S. Environmental Protection Agency’s three-tiered approach for evaluating wetlands. This approach consists of landscape level or GIS-based analyses on wetlands (Level 1), rapid assessments (Level 2) designed to take two people less than 4 h to complete, and data-intensive studies like IBIs or hydrogeomorphic (HGM) functional assessments (Level 3) that can take place over a single or many seasons with multiple visits. By incorporating these multiple scales that can depict disturbances, we can isolate IBI characteristics that respond to local impairments, as well as a sense of landscape thresholds (e.g., percent impervious surface) that can overpower local influences.

A rapid assessment will generate a disturbance index or a numeric value for a function such as provision of wildlife habitat. We are then able to look at the entire sample of sites, link IBI values to the conditions we see based on rapid assessments, and use a statistical process such as threshold or break-point analysis to determine meaningful thresholds.
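One very simple form of break-point analysis is sketched below: sites are ordered along the rapid-assessment score, and the single split that minimizes the pooled within-group sum of squares of the IBI values is reported. This is a stand-in chosen for brevity, not a specific method named in the text; formal threshold analyses would also assess uncertainty around the break. The function name, the `min_group` parameter, and the data are hypothetical.

```python
def best_breakpoint(disturbance, ibi, min_group=3):
    """Find the disturbance value that best splits IBI scores in two.

    Returns (threshold, pooled_sum_of_squares), or None when no split
    leaves at least min_group sites on each side.
    """
    pairs = sorted(zip(disturbance, ibi))

    def sum_of_squares(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for i in range(min_group, len(pairs) - min_group + 1):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical disturbance scores
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sum_of_squares(left) + sum_of_squares(right)
        if best is None or total < best[1]:
            # Place the threshold midway between the neighbouring sites.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, total)
    return best


# Hypothetical rapid-assessment scores and IBI values at ten sites.
disturbance = [5, 8, 10, 12, 15, 25, 28, 30, 33, 36]
ibi = [10, 11, 9, 10, 10, 30, 31, 29, 30, 30]
threshold, pooled_ss = best_breakpoint(disturbance, ibi)  # threshold = 20.0
```

A break found this way is only as meaningful as the sample is large and representative, which is why the approach depends on a substantial initial data set.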

This requires a large initial sample of both Level 2 and Level 3 data. Furthermore, we are able to use this large sample size to compare with the GIS-based analyses. Depending on the strength of the relation between GIS and rapid-assessment functional metrics, one might wonder why we should bother with rapid assessments at all. GIS data are static and not always updated on a timely basis. Moreover, GIS data come from a number of sources, and most are not uniform. A continuous monitoring program based on Level 2 assessments is representative of what is actually happening on the ground, and the relation between the GIS data and the Level 2 data should be reexamined periodically (or when the GIS data are updated). This allows a better-calibrated GIS estimate that should be used for landscape-level planning purposes, not site-specific regulatory criteria. On the other hand, it is also critical to revisit the Level 2-derived functions to be sure that they capture the variations in Level 3 IBI or functional HGM evaluations.

This USEPA process creates a framework for wetland rapid evaluation that is scientifically reinforced in two manners (landscape and intensive assessments). Hence, when wetlands are now slated to be destroyed, filled or modified due to some regulated activity, we can evaluate the wetland pre-impact (and potentially post-impact if not completely destroyed) to quantify exactly what services were lost and need to be replaced within the same watershed (eight-digit hydrologic unit code) to achieve the ‘no net loss.’ These data are then used to guide mitigation activities and steer the wetland enhancement or creation design process towards features that facilitate the functions that were lost on the landscape. This ensures that floodplain wetlands are replaced with similar functioning floodplain wetlands. This is in contrast to some historical mitigation projects that were variations of impounded streams which maximize wetland area relative to a project site and lack design variation regardless of the wetland context it is replacing.

We should caution that there are some services that may need extra attention not heeded in a watershed approach. For example, floodwater attenuation or abatement is a localized service dictated by a wetland’s proximity to susceptible downstream human development or other resources. The position of the wetland in the watershed certainly dictates the extent of the floodwater attenuation potential. Furthermore, even if the replacement wetland provides a greater extent of floodwater attenuation, it may not serve society if the replacement wetland is located in an area of the watershed not inhabited by humans. One can see in this scenario how a mitigated wetland, even if it still functions the same as the one replaced, would not protect communities from economic damages caused by flooding that could have otherwise been avoided if the onsite function was retained. This opens the door for future work that may actually break up the services wetlands provide into local and watershed services, enabling multiple mitigation ratios that differ between onsite enhancement of one function (e.g., sediment stabilization or floodwater attenuation) and offsite mitigation or banking for others (e.g., provision of wildlife habitat or carbon sequestration).

Currently most mitigation ratios are based on the Cowardin type of the wetland that is being replaced. For example, a forested wetland takes more time to mature than an emergent wetland, so a forested wetland will have a higher impact mitigation ratio (3:1) than an emergent wetland (2:1) (WVSWVM 2011). The idea behind mitigation ratios incorporates the time that it takes for a wetland to mature, and the likelihood that that type of wetland can be reproduced; the increased area is meant to compensate and provide for a better replacement of wetland function (as it is not quantified in a wetland delineation). Furthermore, there is a temporal penalty for the delay between the wetland impact and the mitigation completion. Remember, no net loss applies to physical, chemical, and biological integrity, and there needs to be a timeline associated with the replacement. With no way to quantify or define these values, the additional area is meant to compensate for functions that currently are not being quantified. This is based on the assumption that the additional area will include these ecosystem services so they are not lost. However, with the advent of multi-purpose Level 2 rapid assessment tools, function can now be evaluated and counted. With this capacity, future permitting of required mitigation may be quantified in terms of functions. This would enable a mix of onsite and offsite compensatory mitigation projects, with ratios playing a role in ensuring local protection and preservation of the impacted wetland’s functions. This will ultimately be determined by legislative action. These laws, and potential court challenges, define the need to be attentive to the details that support good science, and also to be a good communicator about the importance of IBIs and what they say about wetland condition.