Introduction

Estimates of extent and distribution form the foundation of wetland monitoring, policy, and management (Euliss et al. 2008). Comprehensive mapping of all wetland resources has long been the default approach for determining regional or statewide wetland extent (Rebelo et al. 2009). However, the time and money required to comprehensively map wetlands limit the frequency at which this can occur and/or restrict the approach to small regions and subregions. In contrast, probability-based mapping can be completed quickly and cost-effectively for large geographic areas at regular intervals (Dahl 2011; Kloiber 2010; Ståhl et al. 2010; Lackey and Stein 2013). Probabilistic mapping estimates wetland extent by mapping randomly selected plots and extrapolating the resultant area to the geographic area from which the plots were selected. While probabilistic mapping cannot replace all comprehensive mapping applications, probability-based (also referred to here as design-based) methods can provide area-wide estimates of wetland extent and density. These estimates can also be supplemented by model-based interpolation methodologies, which can estimate wetland extent and density at every location in an area-wide grid (Aubry and Debouzie 2001; Gregoire 1999). As a result, probabilistic mapping can provide regional, statewide, and national estimates of wetland extent (status) and changes in extent (trends) over time (Dahl 2011; Kloiber 2010; Nusser and Goebel 1997; Ståhl et al. 2010).

Several existing state and national programs utilize probability-based sampling and mapping to estimate wetland extent and distribution. Examples include the status and trends (S&T) component of the National Wetland Inventory (NWI-S&T), the Minnesota Wetland S&T program (MN-S&T), and the National Inventory of Landscapes in Sweden (NILS) (Dahl 2011; Kloiber 2010; Ståhl et al. 2010). Each of these programs uses probabilistic sampling to select a sample of cells from a regular grid covering the entire target area. Then, wetland maps are produced for all selected cells and area-wide estimates are produced based on the sampling theory utilized to select the sample plots. The implicit assumption is that a design-based approach will approximate the results of comprehensive mapping, within statistical confidence limits (as defined by the design-based approach utilized).
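The extrapolation step shared by these programs can be illustrated with a minimal sketch. It assumes equal-sized plots drawn by simple random sampling and a normal-approximation 95 % interval, which is a simplification: the programs cited above use their own sampling designs and variance estimators. The function name, plot values, and region size are hypothetical and are not taken from any of these programs.

```python
import math
import statistics


def estimate_total_area(plot_wetland_ha, plot_size_ha, region_size_ha, z=1.96):
    """Extrapolate wetland area from sampled plots to a whole region.

    Assumes equal-sized plots selected by simple random sampling
    (a simplification; no finite population correction is applied).
    """
    # Wetland density (fraction of plot area that is wetland) for each plot.
    densities = [area / plot_size_ha for area in plot_wetland_ha]
    mean_density = statistics.mean(densities)
    # Standard error of the mean density across plots.
    se_density = statistics.stdev(densities) / math.sqrt(len(densities))
    total = mean_density * region_size_ha
    half_width = z * se_density * region_size_ha
    return total, (total - half_width, total + half_width)


# Hypothetical wetland areas (ha) mapped in five 1600-ha (16 km2) plots.
sample = [80.0, 16.0, 160.0, 48.0, 96.0]
total, (low, high) = estimate_total_area(sample, 1600.0, 1_000_000.0)
print(f"Estimated wetland area: {total:.0f} ha (95% CI {low:.0f} to {high:.0f} ha)")
```

With only a handful of plots the interval is wide relative to the estimate, which is the practical reason confidence limits need to be reported alongside any design-based total.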

The ability to extrapolate the relationship between design-based estimates and the actual “true” wetland area with a known level of certainty is critical for making informed decisions based on the results of S&T programs. This is particularly important when estimating trends, as statistical and methodological uncertainty can obscure subtle changes in area or distribution. There are two important assumptions about design-based estimates that have been largely untested in existing S&T programs. First, probabilistic and comprehensive mapping should be compared directly to verify how closely design-based estimates approximate actual wetland extent. Second, variability between independent mapping teams should be tested to determine how much potential error is introduced during the normal photointerpretation and mapping process.

This study compared probabilistic and comprehensive estimates of wetland and stream extent for two representative regions in California. The comparison was used to answer the following questions: (1) How well do design-based estimates of wetland extent approximate results from comprehensive mapping? and (2) How much does inter-mapper variability contribute to overall error in design-based estimates? The results of this analysis provide insight into potential sources of error associated with design-based mapping programs that should be considered when reporting confidence levels associated with status and trends programs.

Methods

General approach

Two sets of analyses were conducted to answer the two main study questions and provide critical information about confidence levels that can be expected from the California S&T program. First, the overall accuracy of the design-based estimates was evaluated by comparing probabilistic mapping to comprehensive mapping in two regions of California (validation plots). Second, intermapper variability was evaluated by selecting twenty targeted plots representing all wetland types across California and comparing the results from three independent mapping teams, each of which mapped the same twenty plots.

Plot selection

Two validation areas were selected for the comparison of probabilistic estimates of wetland extent to totals derived from comprehensive maps (Fig. 1). These areas in the central and southern coast of California were selected based on two criteria. First, both areas have comprehensive, high-quality wetland maps available for analysis that were produced from freely available imagery dated from 2005 or later. In both cases, the comprehensive wetland maps were available for download from the National Wetlands Inventory (NWI). Second, the two areas are ecologically and geographically distinct and have different types and densities of anthropogenic land use. This allowed the mapping and estimation protocols to be tested in two distinct settings. It is important to note that comprehensive maps were necessary to conduct the comparative analysis; however, comprehensive maps are not necessary for implementation of a probabilistic mapping program.

Fig. 1 Study areas and sample plot locations by mapping team. The third team did not produce sample plot maps in the south coast region

Validation plots were selected using a protocol previously optimized for California aquatic resources (wetlands, streams, and deepwater; Lackey and Stein 2013, 2014). Briefly, plots were selected using generalized random tessellation stratified (GRTS) sampling. First, a 4 km square grid (16 km²) was produced in ArcGIS for each study area (ESRI 2010). Next, grid cells were converted to points and exported as a shapefile, and the GRTS sample draw was performed in R using the spsurvey package (Kincaid and Olsen 2011; R Development Core Team 2011). No stratification was used, and 30 plots were selected in each region (total of 60 plots). Sample draws were then divided among the three mapping teams (Fig. 1; mapping team 3 did not produce maps for the south coast). Division was consistent with GRTS theory, so that area-wide estimates could be produced from only the sample plots produced by each team.
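The grid-then-draw workflow described above can be sketched as follows. This is not the GRTS algorithm and does not reproduce the spsurvey draw used in the study: it substitutes a plain simple random sample from Python's standard library so that the steps (tile the area, list candidate cells, draw 30, divide among teams) are visible. The bounding box, seed, team labels, and equal split are hypothetical.

```python
import random


def make_grid_centers(xmin, ymin, xmax, ymax, cell_km=4.0):
    """Return centre coordinates (km) of square cells tiling a bounding box."""
    centers = []
    y = ymin
    while y + cell_km <= ymax:
        x = xmin
        while x + cell_km <= xmax:
            centers.append((x + cell_km / 2, y + cell_km / 2))
            x += cell_km
        y += cell_km
    return centers


def draw_plots(centers, n_plots=30, seed=0):
    """Draw plots without replacement (simple random sample, NOT GRTS)."""
    rng = random.Random(seed)
    return rng.sample(centers, n_plots)


# Hypothetical 80 km x 60 km study area: 20 x 15 = 300 candidate cells.
cells = make_grid_centers(0.0, 0.0, 80.0, 60.0)
plots = draw_plots(cells, n_plots=30, seed=42)

# Hypothetical equal split of the ordered draw among three mapping teams.
teams = ["team1", "team2", "team3"]
team_plots = {team: plots[i::3] for i, team in enumerate(teams)}
print(len(cells), len(plots), {t: len(p) for t, p in team_plots.items()})
```

A real GRTS draw additionally enforces spatial balance, which a simple random sample does not guarantee for a sample of this size.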

In addition to the 60 validation plots, 20 intermapper variability plots were selected in a targeted manner to represent the range of wetland types throughout California. The 20 intermapper plots were each 4 km², consistent with previously established S&T protocols for California (Lackey and Stein 2014). The plots encompassed all major ecoregions in the State and were selected to include multiple wetland types (and transition zones) in each plot. Each of the three mapping groups was asked to independently map the same 20 plots using the established S&T protocols. Targeted selection of the intermapper plots ensured a robust representation of the range of circumstances likely to be encountered during implementation of the S&T program.

Map production

Mapping teams used the California Aquatic Resources Mapping Standards for both validation and intercalibration maps (CWMW 2014). Wetland and stream mapping was done directly in ArcGIS using the 2005 National Agriculture Imagery Program (NAIP) base imagery. Both the natural color and color infrared (CIR) NAIP images were used, each with 1-m pixel resolution. As ancillary data to support map interpretation, mapping teams also used the USGS Topographic Quadrangle digital raster graphic (DRG), the 10-m digital elevation model (DEM), the National Elevation Dataset (NED), the National Hydrography Dataset (NHD), and soil maps from the Natural Resources Conservation Service.

Once fully delineated and classified, maps were transferred to a different mapping professional and were completely reviewed for delineation and classification. The reviewing mapper performed edits directly within a copy of the draft map geodatabase, resulting in a final, QAQC’d geodatabase for each plot.

Classification

Wetland polygons were classified using the California Aquatic Resources Classification System (CARCS; Table 1). CARCS is a functional classification derived from the hydrogeomorphic (HGM) classification system used by the US Army Corps of Engineers (Brinson 1993). CARCS combines hierarchical classification, based on hydrogeomorphology and landscape connection, with optional modifiers for vegetation, anthropogenic influence, flow regime, and substrate (not shown in Table 1). The pre-existing comprehensive maps were primarily classified according to the U.S. Fish and Wildlife Service classification used by the NWI (Cowardin et al. 1979). Therefore, a crosswalk was developed to facilitate comparisons between sample plot maps and the pre-existing comprehensive maps (Table 2).

Table 1 California aquatic resources classification system (CARCS)
Table 2 Cross-walk between CARCS and the NWI wetland classification system

A notable deficiency in the crosswalk exists for the riverine and palustrine NWI subtypes. Under the CARCS classification system, all functionally riverine wetlands and streams are classified as such. However, wetlands and streams are only classified as riverine under Cowardin et al. if they are (i) scoured or unvegetated or (ii) intermittent streams with no vegetation differences between the streambed and surrounding upland. Indeed, a portion of wetlands classified as riverine under CARCS would instead be classified as palustrine under Cowardin et al. (1979) if they are vegetated and have spectral or other obvious vegetation differences from upland.

Fortunately, the comprehensive South Coast maps were also mapped and classified using a form of the HGM classification that included a fluvial/non-fluvial designation. This allowed us to determine which palustrine wetlands in the comprehensive map were functionally fluvial and therefore equivalent to riverine wetlands in the probabilistic map. Therefore, map analysis (described below) that considered classification was conducted in two ways for the South Coast. First, analysis was conducted according to the crosswalk in Table 2. Second, analysis was conducted according to the crosswalk in Table 3, which considers the fluvial portion of palustrine wetlands in the comprehensive map to be equivalent to riverine wetlands in the sample maps.

Table 3 Cross-walk between CARCS and the NWI wetland classification system, with fluvial designation
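The effect of the fluvial designation on class totals can be sketched in a few lines. This is a hypothetical simplification, not a reproduction of Tables 2 or 3: it models only the rule stated above, that the fluvial portion of palustrine wetlands in the comprehensive map is treated as equivalent to riverine, and passes every other class through unchanged. The function names, polygon records, and areas are invented for illustration.

```python
from collections import defaultdict


def comparison_class(nwi_system, fluvial=None):
    """Return the class a comprehensive-map polygon is compared against.

    Hypothetical simplification: fluvial=None means no fluvial
    designation is used; fluvial=True moves palustrine polygons
    into the riverine group.
    """
    system = nwi_system.lower()
    if system == "palustrine" and fluvial:
        return "riverine"
    return system


def area_by_class(polygons, use_fluvial):
    """Sum polygon area (ha) by comparison class."""
    totals = defaultdict(float)
    for system, fluvial, area_ha in polygons:
        key = comparison_class(system, fluvial if use_fluvial else None)
        totals[key] += area_ha
    return dict(totals)


# Hypothetical comprehensive-map polygons: (NWI system, fluvial flag, area in ha).
polygons = [
    ("Palustrine", True, 12.0),
    ("Palustrine", False, 30.0),
    ("Riverine", False, 5.0),
]
print(area_by_class(polygons, use_fluvial=False))
print(area_by_class(polygons, use_fluvial=True))
```

Total area is unchanged between the two runs; only the split between palustrine and riverine moves, which is why the choice of crosswalk matters for by-class comparisons but not for overall extent.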

Probabilistic estimates versus comprehensive totals of wetland extent

To further identify potential sources of uncertainty, the newly produced sample maps were directly compared to the existing comprehensive maps to determine the quantitative differences in mapping methodologies. This comparison considered only the portion of the comprehensive maps that was within the boundaries of the sample plots. Therefore, paired differences could be calculated between the sample maps and the comprehensive maps. This pairing controlled for differences between plots and isolated the effect of methodology and inter-mapper variability. The fractional difference between comprehensive and sample maps ($f_{method}$) was calculated for each plot:

$$f_{method} = \frac{Area_{sample}}{Area_{comprehensive}} - 1 \tag{1}$$

and an average $f_{method}$ was then calculated for the entire study area. Given our emphasis on statistical estimation, we considered it appropriate that positive and negative differences would partially cancel out in the calculation of a mean $f_{method}$.
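A minimal sketch of Eq. 1 and its study-area mean follows. The paired plot areas are hypothetical, and raising an error when the comprehensive area is zero is an assumption made here for illustration; the handling of such plots is not described in the text. The mean absolute difference is shown only to make the cancellation of positive and negative values visible.

```python
import statistics


def fractional_difference(area_sample, area_comprehensive):
    """Eq. 1: f_method = Area_sample / Area_comprehensive - 1."""
    if area_comprehensive <= 0:
        # Assumption: plots with no comprehensive-map wetland area are undefined.
        raise ValueError("comprehensive area must be positive")
    return area_sample / area_comprehensive - 1.0


# Hypothetical (sample, comprehensive) wetland areas in ha for three plots.
paired = [(12.0, 10.0), (8.0, 10.0), (15.0, 10.0)]
f_values = [fractional_difference(s, c) for s, c in paired]

mean_f = statistics.mean(f_values)
mean_abs_f = statistics.mean([abs(f) for f in f_values])
print(f"mean f_method = {mean_f:.3f}, mean |f_method| = {mean_abs_f:.3f}")
```

Because the signed mean lets over- and under-mapping offset each other, it describes bias in the area estimate rather than typical per-plot disagreement.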

Intermapper variability

Intermapper variability was evaluated by comparing results from the three teams for each of the 20 plots selected for this portion of the study. Agreement between the three teams for both total area and area of each wetland class was expressed in terms of the coefficient of variation among all teams and the correlation between pairs of teams. Agreement in wetland classification was evaluated at the major class, class, and type levels to determine how the depth of classification may affect certainty estimates. Following the initial mapping of all 20 plots by all teams, the three teams reviewed their results and agreed upon a consensus “true map” for each plot. This “true map” was then compared to the initial map generated by each team to calculate estimates of producer’s and user’s accuracy. Producer’s accuracy (the complement of the error of omission) measures the percent of wetland features that are correctly mapped as wetlands. User’s accuracy (the complement of the error of commission) measures the percent of polygons mapped as wetlands that are actually wetlands. Results of this exercise were used to develop recommendations for future refinement of mapping protocols to reduce ambiguity and improve consistency (Fig. 2).
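The two agreement measures named above can be sketched as follows. The per-plot areas and team labels are hypothetical, and computing the coefficient of variation on each team's total area is an assumption about how the statistic was applied; the text does not specify the exact calculation. Pearson's r is written out by hand so the example depends only on the standard library.

```python
import math
import statistics
from itertools import combinations


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical wetland area (ha) mapped by each team in the same four plots.
areas = {
    "team1": [10.0, 25.0, 40.0, 5.0],
    "team2": [12.0, 24.0, 38.0, 6.0],
    "team3": [9.0, 27.0, 43.0, 5.0],
}

# CV of total mapped area among teams (assumed application of the statistic).
totals = [sum(v) for v in areas.values()]
cv = coefficient_of_variation(totals)

# Correlation of per-plot areas for every pair of teams.
pairwise_r = {
    (a, b): pearson_r(areas[a], areas[b]) for a, b in combinations(areas, 2)
}
print(f"CV of totals = {cv:.1f} %")
for pair, r in pairwise_r.items():
    print(pair, f"r = {r:.3f}")
```

The two measures answer different questions: the CV reflects disagreement in overall totals, while the paired r reflects whether teams rank plots the same way, so a team that consistently maps more area can still correlate highly with the others.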

Fig. 2 Sample estimates and comprehensive totals for all wetlands, palustrine, and riverine types. In the Central Coast (asterisk), the crosswalk in Table 2 (solid line) compared depressions and slopes against palustrine and riverine. In the South Coast (double asterisk), the Table 2 crosswalk was used (dashed line) in addition to the Table 3 crosswalk (dotted line), which compares the fluvial portion of palustrine wetlands against riverine wetlands

Results

Probabilistic estimates versus comprehensive totals of wetland extent

Overall sample-based estimates of total wetland extent were statistically equivalent to comprehensive totals in the Central Coast (8 % higher) and significantly lower than comprehensive totals in the South Coast (40 % lower). However, comparison by wetland class and plot-by-plot (Fig. 3) showed much greater differences, suggesting that methodological differences were more substantial than indicated by comparison of overall area estimates.

Fig. 3 Average fractional difference between comprehensive and sample maps ($f_{method}$) for all wetlands, palustrine, and riverine types. In the Central Coast (asterisk), the crosswalk in Table 2 was utilized while the South Coast (double asterisk) utilized both crosswalks (Tables 2, 3)

In the Central Coast, plot-by-plot comparisons showed an average of 66 % more mapped wetland area in sample plots versus pre-existing comprehensive maps (Fig. 3). Palustrine area in individual sample plot maps was an average of 55 % lower, while riverine area was an average of 370 % higher (these plot-by-plot differences were reflected in the sample estimates). However, these differences were influenced to an unknown extent by the difficulty of cross-walking the classifications used for the sample and comprehensive maps (Table 2). This pattern of results in the Central Coast suggests that substantial methodological differences were offset by the random sample draw.

In the South Coast, plot-by-plot comparisons showed the sample plot maps were less than 5 % different from existing comprehensive maps for the same region (Fig. 3). Similar comparisons by sample type, using the crosswalk from Table 2, showed palustrine wetland area was slightly lower on average in individual sample plot maps and riverine area was significantly higher. Using the improved crosswalk (Table 3) brought riverine wetlands for sample plot maps within 2 %, on average, of existing comprehensive maps, while palustrine wetlands became slightly, but non-significantly, higher (35 %). This pattern of results in the South Coast suggests that the sample estimate was affected by a non-representative sample (resulting from the relatively low number of plots) rather than by methodological differences. The greater familiarity of the south coast team with the riverine resources may have also contributed to the higher levels of accuracy (relative to the central coast team).

Intermapper variability

Estimates of total wetland area from the 20 plots mapped by all three mapping teams were similar among teams. The overall coefficient of variation was 5.7 % and the paired r-values for inter-team comparisons were 0.93, 0.94, and 0.97 (Fig. 4). Overall producer’s accuracy (i.e. a measure of the percent of wetland features that are correctly mapped as wetlands) was 95.2 %, while overall user’s accuracy (i.e. a measure of the percent of polygons mapped as wetlands that are actually wetlands) was 99 % (Table 4). This means there was a slightly higher risk of underestimating wetland area than overestimating it. Concordance between teams was lower for the overall length of riverine wetlands, with paired r-values between teams of 0.78, 0.85, and 0.90. One of the three teams consistently mapped greater stream length than the others, which contributed to the relatively lower r-values.
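The asymmetry between under- and overestimation follows from the complements of the two reported accuracies, as the short sketch below works through. The 95.2 % and 99 % values are the overall figures reported above; the helper function and its confusion counts are hypothetical and only illustrate how the two accuracies are defined.

```python
def accuracy_from_counts(true_pos, false_neg, false_pos):
    """Return (producer's, user's) accuracy in percent from confusion counts.

    true_pos:  wetland features correctly mapped as wetland
    false_neg: wetland features missed (omission)
    false_pos: features mapped as wetland that are not wetland (commission)
    """
    producers = 100.0 * true_pos / (true_pos + false_neg)
    users = 100.0 * true_pos / (true_pos + false_pos)
    return producers, users


# Reported overall accuracies (percent) from the text above.
producers_accuracy = 95.2
users_accuracy = 99.0

# Error rates are the complements of the accuracies.
omission_error = 100.0 - producers_accuracy
commission_error = 100.0 - users_accuracy
mean_accuracy = (producers_accuracy + users_accuracy) / 2

print(f"omission error   = {omission_error:.1f} %")
print(f"commission error = {commission_error:.1f} %")
print(f"mean accuracy    = {mean_accuracy:.1f} %")

# Hypothetical counts, for illustration of the definitions only.
p, u = accuracy_from_counts(true_pos=90, false_neg=10, false_pos=5)
```

Because the omission error is larger than the commission error, missed wetlands outweigh falsely mapped ones, which is the basis for the statement about underestimation.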

Fig. 4 Relationship between mapping agencies for wetland/open water area. Each point represents a different 400 ha plot that both agencies determined had wetland and/or open water present

Table 4 Producer’s and user’s accuracy for overall wetland extent and by wetland class

Concordance between mapping teams varied by wetland class (Fig. 5). R-values for paired comparisons between mapping teams by class ranged from 0.66 to 0.94. There was consistently high agreement within the depressional, lacustrine, and estuarine classes; all three teams correctly identified the presence of these wetland types in 75, 80, and 89 % of the plots, respectively. Agreement was lowest for slope wetlands, where all three teams correctly identified this wetland type in only 14 % of the plots. In several cases, differences between teams were based on a specific substitution resulting from misinterpretation of mapping standards. For example, Team 2 substituted a slope classification for a riverine classification on several plots (Fig. 5). Once these differences were rectified, overall accuracy improved. The final producer’s accuracy ranged from 77 % for estuarine wetlands to 100 % for slope wetlands. Final user’s accuracy ranged from 88.8 % for riverine wetlands to 100 % for estuarine and lacustrine wetlands (Table 4). Error rates increased substantially for classification levels below wetland class; therefore, we recommend that comparisons and data quality objectives only be applied at the overall wetland area and major class levels.

Fig. 5 Total area within each wetland class by mapping team

Discussion

Our findings suggest that extreme care should be exercised when compiling multiple independent mapping efforts to produce composite wetland maps and associated estimates of extent or change. Although routinely done due to resource and time constraints, compilation of wetland maps from disparate mapping efforts will compound errors associated with spatial and temporal variability, methodological error, and intermapper variability in interpretation and classification. The cumulative error may result in erroneous conclusions, particularly for trend (i.e. change over time) estimates. As shown in this study, probabilistic and comprehensive mapping may produce different results due to these methodological, interpretation, and classification differences. For example, in our study, comprehensive maps and sample maps differed plot-by-plot by up to ±50 %. Similarly, the State of Washington Wetland Change Analysis Program compared wetland change estimates from three different mapping programs covering the same region and found that estimates ranged from nearly identical to as much as 30 % different in estimated change in area (http://www.ecy.wa.gov/programs/sea/wetlands/StatusAndTrends.html). Differences may be much higher if different classification systems have to be rectified or cross-walked. Brooks et al. (2011) estimated reclassification accuracy between the National Wetlands Inventory classification (Cowardin et al. 1979) and the hydrogeomorphic classification (Brinson 1993) at approximately 60 %, leading to high rates of uncertainty when merging maps produced using different classification systems. Consequently, we recommend that such consolidation of disparate mapping efforts not be used for trend assessment, since the error rates will most likely be far higher than the estimated rates of change in wetland area.

Intermapper variability is the largest source of error in most mapping programs. Wetlands tend to be more dynamic than other ecosystems, with conditions changing based on tidal, seasonal, inter-annual, or decadal cycles. This makes wetland mapping and classification particularly difficult. Fully automated image interpretation can have wide ranges of accuracy and precision depending on topographic and vegetation conditions (Corcoran et al. 2011; Hirano et al. 2003), and may not be appropriate for many ongoing wetland assessment programs. Consequently, manual interpretation remains the standard method used by most programs. This approach, by definition, involves individual judgment on issues such as the location of the wetland-upland boundary, overall extent of an individual wetland polygon, interpretation of wetland type or class, and transition between one wetland type and another (e.g. riverine to estuarine transitions in coastal areas). Decisions are particularly challenging in systems that have been subject to recent disturbance or are undergoing active restoration; however, these are the areas where change assessment may be most critical for decision makers.

We have demonstrated that independent teams can produce consistent maps within ranges of acceptable error, but only with development and application of standardized protocols, training, and rigorous quality control measures, including regular team intercalibration. Team intercalibration is particularly important because it allows all teams to maximize proficiency based on the collective experience of all mappers. Following implementation of all training and quality control measures, our overall accuracy (average of user’s and producer’s accuracy) was 97 % for total wetland area and ranged from 7 to 100 % for individual wetland classes. These results are within, to slightly below, the producer’s and user’s accuracy ranges of Federal Geographic Data Committee (FGDC) standards and are consistent with other programs. For example, the Minnesota wetland change assessment program reports that the classification process correctly distinguishes between wetland and upland 94 % of the time and correctly classifies the more detailed land cover types 89 % of the time (Kloiber 2010). However, maintaining this level of accuracy requires a strict quality control program that includes documentation of standard operating procedures, consistent training of photo-interpreters, in-office secondary review by senior photo-interpreters, and field verification of wetland maps (Kloiber et al. 2012). Furthermore, we found that consistent classification below the wetland class level is challenging, and error rates rise to a level where interpretation at these deeper classification levels should be done with extreme caution.

Wetland loss rates in some portions of the United States have declined over the past 30 years, dropping below 5 % overall loss in many regions (Dahl 2011). Although the loss rates are low, these losses can still represent tens of thousands of ha, including critical losses of rare and threatened wetlands that provide important functions and habitat. As loss rates fall, it is increasingly important that accurate error rates be included in any reporting of wetland status and trends. Our results suggest that 6 % is not an unexpected error rate for probabilistic estimation of wetland loss (which is the method used in the national wetland status and trends program). This means that there may be little to no difference between the reported rate of loss and the uncertainty associated with those estimates. The national status and trends program, administered by the U.S. Fish and Wildlife Service, reports its procedural error rates as 3–5 %, which is within the same range as the 0–4 % reported rate of change (Dahl 2011). Similarly, a New York State assessment of wetland trends using a randomized sampling approach reported that error rates were within the same range as the estimated change in wetland area (Huffman and Associates Inc. 2000).

Probabilistic mapping is an efficient approach to estimate wetland status and trends over large spatial scales where comprehensive mapping is not practical. However, once a probabilistic program is adopted, it is critical to implement rigorous and ongoing quality control and mapper intercalibration programs and to report error estimates along with estimates of change. Furthermore, once a probabilistic program is adopted, it may be difficult to make comparisons with previous wetland estimates based on comprehensive mapping. For example, in California, the 2010 State of the State’s Wetlands report estimated that there were 2.9 million acres (1.1 million ha) of wetlands statewide (CNRA 2010). However, preliminary application of the probabilistic mapping methods developed for California suggests an estimated wetland density of 3.24, or an extrapolated area estimate of 3.2 million acres (1.3 million ha; Lackey and Stein 2013). This 3.5 % difference in initial estimates should not automatically be interpreted as an increase in wetland area given the possibility for methodological, inter-mapper, and statistical errors in both estimates. Such issues should be carefully considered and disclosed when embarking on state or regional mapping efforts. Furthermore, once an approach is adopted, changes to classification systems, mapping protocols, or overall mapping strategy should be made with extreme caution, as they may preclude meaningful comparisons of change over time.