Introduction

Wetlands provide a vast array of ecological services, functions, and values. From flood protection and the maintenance of water quality, to carbon sequestration, the mediation of biogeochemical cycles, and numerous economic, aesthetic, and cultural benefits, wetlands are paramount to human health and well-being (Mitsch and Gossilink 2000, Barbier 2011). Notwithstanding the widely-recognized need to identify and manage wetland resources in a sustainable way (Christensen 1996, Turner 1989), techniques to efficiently inventory and realistically model wetland presence at the landscape-level remain elusive (Finlayson et al. 1999, Rebelo et al. 2009, Meixler and Bain 2010).

The National Wetland Inventory (NWI) produced by the United States Department of Interior Fish and Wildlife Service is the most widely used source of spatial wetland data in the United States and has been utilized to predict impacts from sea-level rise, to undertake wetland restoration planning, and to perform ecological modeling for a variety of wildlife species and habitats (Fish and Wildlife Service 2009). Although the NWI has been applied to a diverse range of studies, its creation and continued development are dependent on the time-consuming aerial interpretation of vegetative cover by individual human analysts (Dahl 2011).

Vegetative land cover can play an important role in wetland identification (Adam et al. 2010); however, vegetative evidence in the absence of other corroborative data may underestimate wetland presence or extent. For example, indicators of hydrology as derived from elevation models may be valuable in determining wetland presence (Lang et al. 2012, Lang et al. 2013) and the ability of soil chemistry and morphology to reflect hydrologic gradients is a key tenant of hydropedology (Pennock et al. 2014). Indeed, the robustness of the hydrology, soils, and vegetation triad of wetland indicators is well-established and even has been legally formalized into the regulatory definition of wetlands as implemented in Section 404 of the Clean Water Act (33 U.S.C. §1251 et seq. 1972). In Florida for example, Chapter 62-340 of the Florida Administrative Code details that delineation of the landward extent of wetlands or other surface waters for regulatory purposes requires consideration of a site’s hydrology, soils, and vegetation. The essence of the rule is that if evidence for “two out of three” of these indicators is present at a location, then that area may be designated as a wetland or surface water. Even though elevation and soils data are increasingly available at no cost, reliance on qualitative methods bars the inclusion of such ancillary information from the wetland identification process. Photographic interpretive techniques, like those used in development of the NWI, may not fully exploit the potential time-savings, scientific rigor, and ecological realism available through quantitative landscape ecology and spatial statistics.

Landscapes are spatially heterogeneous. The charge of the landscape ecologist is to explain this heterogeneity in terms of pattern and process with explicit examination of spatial structure, variability, and scale (Wiens 1989). In spite of the need for consideration of spatial structure in ecological modeling (Hoeting 2009), efforts to improve on the NWI have focused largely on aggregation approaches in which NWI data are merged with selected attributes from National Resources Conservation Service (NRCS) soil maps, the National Hydrography Dataset (NHD), or similar sources using attribute look-up tables or “spatial join” procedures in a Geographic Information Systems (GIS) environment (see e.g. Galbraith et al, 2003; Reif et al, 2009; Dvorett et al, 2012). Spatial processes are often excluded in these efforts and the selection criteria used by researchers in choosing what attributes to amalgamate are rarely provided or detailed. In a few instances, investigations have included spatial statistics in a post hoc fashion to help describe aggregated results (see e.g. Mccauley and Jenkins, 2005; Martin et al, 2012); however, the unambiguous consideration of spatial dependence, spatial autocorrelation, or other latent structural processes as potential explanatory variables in predicting wetland presence is uncommon. Beyond neglecting to leverage the explanatory power of spatial processes, the majority of wetland modeling studies generate results under the patch matrix perspective (however, see Murphy et al, 2007). That is, most wetland models reduce real-world environmental gradients to dichotomous units of wetland presence or absence.

Since the publication of Patches and Structural Components for A Landscape Ecology, (Forman and Godron, 1981), patch matrix has been the prevailing method used to represent landscape pattern. Despite wide-ranging endorsement and implementation, the patch matrix model’s basis in delineation of discrete ecological units or “patches” is problematic under many modeling scenarios. For instance, patch matrix boundaries are often subjectively determined, they may not be relevant to focal ecological processes, and they often fail to represent landscapes in an ecologically realistic context (McGarigal 2005, McGarigal et al. 2009, Evans and Cushman 2009, Lausch et al. 2015). Furthermore, the homogenization or simplification of landscape structure diminishes measurable spatial heterogeneity and thereby decreases the total amount of information available to unravel the pattern and process relationship (McGarigal 2005). In contrast to the patch matrix model’s categorization of landscapes as patchworks of discrete ecological units, the gradient-based approach aims to quantify landscape heterogeneity through analysis of continuous value ranges.

Continuous values better represent environmental variability precisely because they capture the trends, gradations, and transitions found within and among landscape components (McGarigal 2005). In the case of wetlands, the gradient-based perspective allows for wetlands to be modeled and represented as integrated components of the larger hydrologic system, complete with ecotones, riparian zones, transitional areas, and hydrologic connections (Murphy et al. 2007). As part of a larger theoretical framework, movement from the patch matrix model to a gradient-based methodology tracks the historical trajectory of landscape ecology. That is, landscape ecology has matured from a once descriptive science, concerned predominantly with the taxonomy of homogeneous types or units, to one focused on the reciprocity of pattern and process as observed in multiple dimensions and restrained by scale (Wiens 1989, Wu and Loucks 1995, Wu and Hobbs 2002, 2007).

Capable of accommodating both fixed and random effects, Bayesian probability models may offer resolution to many of the difficulties encountered when predicting wetland presence. The tiered configuration (hierarchy) of the probability models allow for incorporation of “known” environmental variables as well as the “unknown” effects associated with latent processes like spatial correlation. Within the hierarchical model framework, latent structural processes can be assimilated into models via a random-effect term that serves to quantify the uncertainty remaining after accounting for the effects of fixed environmental covariates (Elsner et al. 2016). Until recently, fitting of Bayesian hierarchical models has been restricted to computationally demanding Markov chain Monte Carlo (MCMC) simulation; however, integrated nested Laplace approximation (INLA) uses an approximation for inference and therefore provides a newly accessible and fast alternative to MCMC (Rue et al. 2009).

The goal of the current study is to employ techniques from spatial statistics to predict the probability of wetland presence as a continuous gradient and with explicit consideration of spatial structure. As a first step to accomplishing this task, explanatory variables linked to soils, hydrology, and vegetation are extracted from freely available government datasets. Following verification of each individual variable’s significance in explaining the presence of wetlands, the variables are fitted to a hierarchical model. In addition to the soil, hydrologic, and vegetative fixed effects, a Gaussian random-effect is stacked into the model to incorporate latent spatial structure as a stochastic process. Finally, model robustness is evaluated through examination of its sensitivity to changes in scale, extent, and geographic location.

Methods

Domain and Data

Model fitting is conducted for a primary domain in northwest Florida. Model robustness is then assessed by applying the best performing model from the primary domain to several other resolutions, areal extents, and geographic locations.

The primary domain has an extent of 100 km 2 and is centered on Wakulla Springs State Park, approximately 23 km south of Tallahassee, Florida [Fig. 1]. The area is bounded by a 10 km by 10 km square with a northwest corner located at 30.23 N Latitude and −84.25 W Longitude. The domain occurs in the Southeast Coastal Plain and is within the Gulf Coastal Lowlands physiographic province. Elevation over the area is mostly uniform between three and nine meters above sea level with infrequent increases to a maximum of approximately fourteen meters. The Wakulla River and its associated floodplain transect the study area from the northwest to the southeast corners. Dominant land cover over the domain includes natural upland communities, natural wetland communities, silvaculture, agricultural, and developed areas (commercial and residential).

Fig. 1
figure 1

Primary Domain (Domain 1). The 100 km 2 domain is located in Wakulla County, Florida. The background map is produced using functions in the ggmap package (Kahle and Wickham 2013)

Fig. 2
figure 2

Domain 4 located in northwest Florida. The 100 km 2 domain is found in Leon County, Florida

The second and third domains overlap that of the primary study area, but encompass larger areal extents. These domains double and then triple the area of the primary domain to 200 km 2 and 300 km 2 respectively and are used to evaluate model sensitivity to changes in areal extent. The fourth domain [Fig. 2] is also located within the Gulf Coastal Lowlands physiographic province in northwest Florida, but does not overlap the primary study area. The fifth and final domain [Fig. 3] straddles the Peace River in peninsular Florida about 80 km due east of the City of Bradenton. Both the fourth and fifth domains exhibit an areal extent of approximately 100 km 2.

Analysis and modeling are performed using the open-source R language for statistical computing (R Core Team 2016) with freely-available government data. Incorporated data includes the 1/3 arc sec digital elevation model from the United States Geological Service (USGS) [http://ned.usgs.gov/], surface reflectance from the National Aeronautics and Space Administration and USGS Landsat 8 collaboration [http://landsat.usgs.gov/landsat8.php], land use data from the Florida Department of Environmental Protection [http://www.dep.state.fl.us/gis/datadir.htm], and Soil Survey Geographic Database soils data (SSURGO) from the NRCS [http://websoilsurvey.sc.egov.usda.gov/App/HomePage.htm]. Copies of the R code and all data used in the study are available at http://github.com/JMHumphreys/WetlandPatchwork http://github.com/JMHumphreys/WetlandPatchwork.

Preliminary Analysis

To begin, soils data representing the primary domain are spatially subset based on their coincidence with natural upland and wetland land uses as identified by overlaying land use data. In conjunction with the sub-setting of soils data, a binary value (1, 0) designating each soil mapping unit’s (SMU) geographic association with a wetland (1) or upland (0) is added to the soils spatial object. In addition to the newly added vector specifying each SMU’s affiliation to a wetland or upland land use, the full complement of the dataset’s native attributes are maintained. To reveal underlying data structure, native soil attributes are then transformed using principal component and correspondence analyses.

Continuous numeric attributes from the soils spatial object, such as those estimating depth to water table and water storage, are explored using a Principal Components Analysis (PCA). In a similar fashion, categorical and nominal soil attributes (e.g. soil taxa, drainage classification, etc) are evaluated through Multiple Correspondence Analysis (MCA). Both the PCA and MCA are performed using the FactoMineR package (Lê et al. 2008). Next, the decomposed variables from the PCA and MCA are assessed for statistical significance by regressing each onto the binomial land use vector that was generated during the initial sub-setting of soils data. The regression is carried-out using a non-spatial model as implemented using the r-INLA package (Rue et al. 2009). Those decomposed soil variables with significant non-zero model coefficients as determined by the credible interval are retained for further investigation.

To supplement the decomposed soil variables, estimates for Available Water Capacity (AWC) and Organic Mass (OM) are calculated using the soilDB package (Beaudette et al. 2016). This is done by aggregating attributes by the horizon-level and then by the soil profile before performing a weighted average across each SMU based on the percent composition attribute. AWC is expressed as a volume fraction and, after adjusting for salinity and inclusive fragments, represents the difference in tension between water contents at field capacity (typically, one-tenth to one-third bar) and fifteen bars (Veihmeyer and Hendrickson 1927). OM describes the amount of decomposed organic matter present as a weighted percentage of soil material. As with the decomposed soils data, the significance of AWC and OM in predicting wetland presence within the domain is verified using non-spatial regression. There is concern in regard to the stability of the model coefficients due to correlation between attributes used in estimating AWC and those decomposed from the original soils dataset; however, these potentially confounding effects are addressed during final model fitting and are discussed below.

Having identified potential variables linked to soils, a Soil Adjusted Vegetation Index (SAVI) is constructed as a proxy for the wetland vegetation found in the domain using surface reflectance from the National Aeronautics and Space Administration and USGS Landsat 8 collaboration (30m 2 resolution). The fluctuation of chlorophyll concentration within wetland vegetation is correlated to site-specific hydrology, and because chlorophyll reflects the red and near-infrared spectra disproportionately, remotely sensed reflectance data can be applied to identify possible wetlands (Adam et al. 2010). The SAVI quantifies the ratio of the red and near-infrared bands of surface reflectance by dividing the difference of the two by their sum plus a standard correction factor (Masek et al. 2006). Following calculation of the SAVI, its ability to predict wetland presence is confirmed using non-spatial regression as previously described.

Recognizing that digital elevation data can be used to identify surface flow and wetland hydrology (Moore et al. 1993), two topographic indexes are derived. First, a simple Topographic Position Index (TPI) is calculated as the difference between the value of a target cell and the mean value of its eight neighboring cells. Secondly, a Compound Topographic Index (CTI) is given as the natural log of the upstream contributing area divided by the tangent of the slope for each cell. The explanatory power of both the TPI and CTI are checked through non-spatial regression.

As a final step before model fitting, a spatial adjacency graph is constructed for the study domain using the spdep package (Bivand and Piras 2015). Determination of neighbor contiguity during graph construction is restricted to a minimum of two neighboring cells and a maximum of eight (i.e. a “queen” configuration).

Model Fitting and Selection

The probability of wetland presence (p s ) at cell s is given as

$$ \pi(p_{s}) \sim \text{Binomial}(\mu_{p}, \eta), $$
(1)

where the presence or absence of a wetland at a location (either 0 or 1) is described by a binomial distribution with mean

$$ \mu_{p} = \frac{\exp(\eta)}{1 + \exp(\eta)}, $$
(2)

and a mean probability (μ p ) that is linearly related to the fixed and random effects as

$$\begin{array}{@{}rcl@{}} {\eta}&=&\gamma_{0} + \gamma_{V1}\text{V1}_{s} + \gamma_{V2}\text{V2}_{s}\\ &&+\gamma_{MCAV3}\text{MCAV3}_{s} + \gamma_{MCAV5}\text{MCAV5}_{s}\\ &&+ \gamma_{OM}\text{OM}_{s} + \gamma_{TPI}\text{TPI}_{s} +\gamma_{CTI} \text{CTI}_{s} + u_{s}, \end{array} $$
(3)

using the logit link. Throughout fitting of candidate sub-models, Eq. 3 (specifying sub-model “model2”) is modified iteratively to examine various covariates.

The fixed effects include the first and second decomposed soil variables from the PCA (V1 s ) and (V2 s ), the third and fifth decomposed soil variables from the MCA (MCAV3 s ) and (MCAV5 s ), Organic Mass (OM s ), the Topographic Position Index (TPI s ), and the Compound Topographic Index (CTI s ). The random effect (u s ) follows a Besag formulation (Besag 1975):

$$ u_{i} | u_{j}, i \neq j, \tau \sim N\left( \frac{1}{m_{i}} \sum\limits_{i \sim j} u_{j} , \frac{1}{m_{i} \tau}\right) $$
(4)
Fig. 3
figure 3

Domain 5 located in southwest Florida. The 100 km 2 domain is found in Hardee County, Florida

where N is the normal distribution with mean \(1/m_{i} \cdot {\sum }_{i \sim j} u_{j}\) and variance 1/(m i τ) where m i is the number of neighboring cells of cell i and τ is the precision; ij indicates cells i and j are neighbors.

Vague Gaussian priors with known precision are assigned to the γ’s. The priors and the likelihood are combined with Bayes rule to obtain the posterior distributions for the model parameters. The integrals cannot be solved analytically; therefore, the method of INLA is used as a fast alternative to MCMC simulation for models that have a latent Gaussian structure (Rue et al. 2009). This is done with functions from the r-INLA package (Rue et al. 2014).

The variables developed during the preceding preliminary analysis are evaluated for possible collinearity prior to initiating construction of the spatial model. Collinearity diagnostics for independent variables as described by Belsley et al. (1980) are implemented using the perturb package (Hendricks and Pelzer 1991). Initial conditioning of the variable matrix produces an overall condition index score of 28.45 and individual decomposition proportions less than 0.500. These values fall below the maximum thresholds proposed by Belsley et al. (1980), suggesting that collinearity will not bias the model.

To quantify latent structural processes, the model is first fitted (sub-model “model0”) with the random-effect term. The random-effect term quantifies spatial dependencies through incorporation of the adjacency graph. The adjacency graph provides a spatial index and neighbors list for all regions within the study domain. Having defined neighbor contiguity during preliminary analysis as being limited to a minimum of two points, the primary domain includes 40,000 regions, with an average of 7.94 links per region. The four corners of the square domain are found to be the least connected regions with three links each.

Next, the fixed covariates identified during preliminary analysis as being significant are added (Sub-model “model1”). Sub-model “model2” is then constructed by retaining the non-zero covariates resulting from model1. To investigate the influence of the random-effect term, sub-model model2 is re-fit without the random-effect term (sub-model “model3”).

Posterior distributions for the candidate models are re-approximated to ascertain the Watanabe-Akaike information criteria (WAIC) and log-conditional predictive ordinance (LCPO), which are methods of cross-validated skill. These statistics measure the relative quality of the model given the available data and both are scaled such that the lower the value, the better the model. After fitting all models, comparison of the WAIC and LCPO and consideration of non-significant covariates reveal that sub-model model2 is the best performing model. Although model1 produces a slightly less WAIC than does model2 (5664.46 and 5667.77 respectively), two of model1’s covariates (AWC and SAVI) are found to be non-significant as determined by the credible interval and the LCPO is found to be identical. Four candidate models are fitted in total for the primary domain, these are summarized with WAIC and LCPO in Table 1.

Table 1 Comparison of model results for the primary domain

To undertake formal prediction of wetland presence for the primary domain, several steps are required. Firstly, all data used in fitting the best performing sub-model (model2) is duplicated. Next, the response variable of the duplicate dataset is set to “NA” and then re-combined with the original data as required by the r-INLA package. The combined dataset, now twice the length used for initial fitting, is re-fit using the model specification from model2. This process leverages model fixed −effects to perform prediction of wetland presence while controlling for spatial processes.

As a means of gauging predictive accuracy, results for spatial and non −spatial models are compared using a Brier Score. The Brier score is a proper score function that measures the accuracy of probabilistic predictions for binary outcomes and is comparable to the mean squared error. The Brier Score is calculated as the sum of the differences of predicted probabilities of wetland presence and the value one (p) for all cells identified as having wetlands by the land cover training data used during pre-processing. Overall model accuracy can in-turn be estimated by a scaled Brier Score given as B r i e r s c a l e d =(1−B r i e r/B r i e r m a x )×100 % , where \(Brier_{max} = \bar p\times (1 - \bar p)\). The scaled Brier Score measures model accuracy over the more intuitive range from 0 % to 100 % such that the lower the score, the more accurate the prediction. Applied in this manner, the Brier Score is comparable to Pearson’s R 2 statistic (Hu et al. 2006).

Results

Results from the preliminary analysis indicate that four decomposed soil variables are significant predictors of wetland presence as determined by the credible interval. These variables are retained for model fitting and include the V1 and V2 variables resulting from the PCA and the MCAV3 and MCAV5 variables from the MCA. The continuous soil attributes listed in Table 2 were found to be significantly correlated (p-value < 0.001) to both V1 and V2 and are displayed with Pearson correlation coefficient (r). Table 3 provides the categorical soil attributes that best describe MCAV3 and MCAV5 with corresponding coefficient of determination (R 2). For the categorical attributes, an ANOVA with one factor is conducted for each dimension by regressing the categorical attributes onto the coordinates of the individuals and a F-test is performed to evaluate variable influence on the attribute (Lê et al. 2008).

Initial fitting of the model with the spatial random-effect term in the absence of fixed effects results in a map of smoothed densities relative to the domain mean [Fig. 4a]. Warm colors indicate cells with a density that exceeds the mean density for the domain and cool colors highlight cells where densities fall below the mean. On the whole, the map denotes several areas of elevated density. A linear pattern extends from near the northwest corner and traces a path diagonally across the domain to the southeast corner concomitant with the Wakulla River floodplain. The most concentrated area of density is located in the southeast corner of the domain. Geographically, the southeast corner approaches the confluence of the Wakulla River with the Gulf of Mexico. The southwest and northeast corners demarcate comparatively less-dense areas.

Table 2 Dimension descriptions for the V1 and V2 decomposed soil variables retained for model fitting
Table 3 Dimension descriptions for the MCAV3 and MCAV5 decomposed soil variables retained for model fitting
Fig. 4
figure 4

Spatially structured random effects for Domain 1. a Structured random effects present prior to adding fixed effects (model0). b Structured random effects after accounting for fixed effects (model3). Percentages indicate change above (warm colors) and below (cool colors) the domain mean density

Adding the fixed effects identified with the selected model (model2) allow for remapping of the random-effect while controlling for the soil and hydrologic fixed effects [Fig. 4b]. Regions of elevated density still evident in Fig. 4b provide an estimate for the structural processes that remain after accounting for the indicators of wetland presence specified by the selected model. In comparison to the raw random-effects shown in Fig. 4a, the corrected map for sub-model model2 depicted in Fig. 4b displays a distinct decrease in density across the domain as a whole and particularly along the linear pattern associated with the Wakulla River.

Posterior densities and the corresponding credible intervals for model fixed effects are summarized in Table 4. The first decomposed soil variable (V1) has a posterior mean of 0.4453[(0.0533, 0.8399) 95 % credible interval]. This translates to a 60.95 % [(1/(1 + exp(−0.4453))) × 100 %] increase in the probability of wetland presence for each whole number increase in the V1 value while holding all other fixed effects constant. Comparatively, the posterior mean of the second soil variable (V2) indicates a 12.69 % increase in the probability of wetland presence for each whole number decrease in V2. The decomposed soil variables resulting from the MCA analysis of categorical data (MCAV3 and MCAV5) indicate that the probability of wetland presence increases 98.86 % and 56.00 % respectively for each unit increase while holding other effects constant. The posterior mean of OM indicates that for each unit decrease in organic mass, the probability of wetland presence increases 44.53 % holding others constant. For each whole unit decrease in the Topographic Position Index (TPI), the probability of wetland presence increases 39.04 % with other effects constant. The posterior mean of the Compound Topographic Index (CTI) translates to a 86.22 % increase in wetland presence for each increase in the CTI index value while holding other fixed effects constant.

Table 4 Summary of posterior mean and 95 % Credible Interval for sub-model model2 fixed effects

To gauge predictive accuracy, results for spatial and non −spatial models are compared using a scaled Brier Score. The lower the Brier score, the better calibrated the model. Resulting scaled Brier Scores indicate that the spatially explicit model produces substantially increased predictive accuracy over its non −spatial equivalent across all domains. In the case of the primary domain (Domain 1), the spatial model achieves a scaled Brier Score of 90.7 % compared to a 75.8 % for the non −spatial model. Results for the three predicted domains are summarized in Table 5.

Table 5 Summary of scaled Brier Scores
Fig. 5
figure 5

Comparison of posterior distributions. a Posterior distributions of fixed effects at 30 m, 50 m, and 70 m spatial resolutions. b Posterior distributions of fixed effects at extents of 100 km 2, 200 km 2, and 300 km 2

After constructing a model for the primary domain, model robustness is evaluated by changing the spatial resolution, areal extent, and geographic location of the study area. To examine the influence of resolution, the primary domain is reevaluated using cells with a 30 m edge length, and then with cells having a 70 m edge length. The 30 m resolution is selected as it is the resolution of the SAVI covariate. Figure 5a depicts the posterior distributions of model fixed effects at 30 m and 70 m resolutions relative to the 50 m resolution used when fitting the original model. Generally, refining spatial resolution results in an increase to the absolute value of the TPI, MCAV3, and OM fixed effects while a decrease in absolute value is noted for CTI. Although the responses of individual covariates differ, all of model2’s fixed effects remain significant at both the 30 m and 70 m resolutions.

Doubling the domain areal extent from 100 km 2 to 200 km 2 does not result in the loss of any significant fixed effects [Fig. 5b]. That is, all covariates remain significant at both the 200 km 2 and 300 km 2 areal extents. No general pattern is noted in regard to the influence of extent on the absolute value of fixed effects as each of the covariates uniquely respond to the change; however, a sign change is apparent for several covariates (V1, MCAV3, and MCAV5).

To evaluate model performance in other physical landscapes, the selected model from the primary domain (Domain 1) is applied for two other locations in Florida. Firstly, model fitting and prediction of wetland presence is carried out for a different 100 km 2 area (Domain 4) in the same physiographic province as the primary domain and secondly for a 100 km 2 area (Domain 5) in peninsular Florida. Echoing the prediction procedure for the primary domain, both spatial and non −spatial models are implemented to better understand the contribution of the spatial random-effect term.

As with the primary study area, fitting of the fourth and fifth domains with the spatial random-effect term in the absence of fixed effects reveals regions of elevated density. In both cases, subsequent control of fixed effects reduces total domain density. Comparison maps of random-effects for the fourth domain are provided in Fig. 6 and maps for the fifth domain are shown in Fig. 7.

Fig. 6
figure 6

Spatially structured random effects for Domain 4. a Structured random effects present prior to adding fixed effects (model0). b Structured random effects after accounting for fixed effects (model2). Percentages indicate change above (warm colors) and below (cool colors) the domain mean density

Fig. 7
figure 7

Spatially structured random effects for Domain 5. a Structured random effects present prior to adding fixed effects (model0). b Structured random effects after accounting for fixed effects (model3). Percentages indicate change above (warm colors) and below (cool colors) the domain mean density

The wetland maps produced for each of the five study domains identify probable wetlands at locations not identified as wetlands by the National Wetlands Inventory (NWI). Review of aerial imagery for locations with a relatively high predicted probability of presence (e.g. probability greater than 0.75) verify evidence of wetlands in the majority of these cases. Figure 8 compares model predicted wetlands for the primary domain (Domain 1) to those identified by the NWI for the same area. The map located at the bottom (C) of Fig. 8 displays the difference between model predicted wetlands and those identified by the NWI. This illustration was created by reclassifying locations identified as wetlands by the NWI to 0 %. Comparison maps for Domain 4 and Domain 5 are provided in Figs. 9 and 10 respectively. To aid in interpretation of Fig. 10, a Flood Insurance Rate Map (FIRM) produced by the Federal Emergency Management Agency (FEMA) is also provided [Fig. 10c] to illustrate the base floodplain (i.e., 100 year floodplain, Flood Hazard Zones “A” and “AE”). FIRM data is freely available from the Federal Emergency Management Agency [https://msc.fema.gov/portal]. Assuming locations with a model predicted probability of wetland presence greater than the arbitrary threshold of 0.50 to represent realized wetlands, models for all domains identify a greater area of wetland (number of cells) than does the NWI. Beyond comparison of total area, model predicted wetlands across all domains include numerous flow ways, transitional areas, and ecotone gradients outside of areas identified by the NWI as wetlands. Typically, predicted probabilities for transitional areas were between 0.30 and 0.49. Because the probability of wetlands at these locations fall below the arbitrary threshold of 0.50, these extents are not included in the areal comparison provided above for illustrative purposes.

Fig. 8
figure 8

Comparison of model predicted wetlands for Domain 1 (northwest Florida) with wetlands identified by the NWI. Top map (a) depicts the extent of wetlands as predicted by the model. The map at center (b) displays wetland extents as identified by the NWI. The map at bottom (c) displays wetland extents identified by the model but not identified by the NWI (e.g. NWI wetlands reclassified as 0 %). All maps represent approximately 100 km 2 and are composed of square cells with an edge length of 50 m

Fig. 9
figure 9

Comparison of model predicted wetlands for Domain 4 (northwest Florida) with wetlands identified by the NWI. Map (a) depicts the extent of wetlands as predicted by the model. Map (b) displays wetland extents as identified by the NWI. All maps represent approximately 100 km 2 and are composed of square cells with an edge length of 50 m

Fig. 10
figure 10

Comparison of model predicted wetlands for Domain 5 (southwest Florida) with wetlands identified by the NWI and FEMA FIRM High Risk flood hazard areas (A and AE). Top map (a) depicts the extent of wetlands as predicted by the model. Map at center (b) displays wetland extents as identified by the NWI. Map at bottom (C) displays FEMA FIRM “High Risk” flood Zones A and AE (i.e., base floodplain). All maps represent approximately 100 km 2 and are composed of square cells with an edge length of 50 m

Discussion

The identification and inventory of wetlands is essential to natural resource management. To be effective in these endeavors, it is critical that the procedures used to detect and document wetlands be time efficient, accurate, and economical. Aerial photographic interpretation of land cover by individual human analysts necessitates hours of assessment, introduces human error, and fails to include the best available soils and hydrologic data. Furthermore, photographic interpretation results in products that assume the patch matrix perspective. Under the patch matrix view, real-world landscape gradients are reduced to dichotomous patches of wetland presence or absence. The current study leverages probabilistic models to predict wetland presence as a continuous gradient (from not likely to certain) with explicit consideration of spatial processes.

The presented model incorporates freely available elevation, land use, and soils data and is executed as a single R-code script. In the experience of the authors, identification of wetlands over a 100 km 2 area via photographic interpretation necessitates approximately 40 work hours by an experienced human analyst. By comparison, the presented model can evaluate 100 km 2 at 50 meter resolution in approximately 50 minutes (Intel Core i7 4712HQ x4 2.3GHz 16GB) while incorporating ancillary data and accounting for latent spatial processes. Model results demonstrate an ability to consistently capture training data derived from heads-up assessment with greater than 90 % accuracy as judged by the scaled Brier Score, to out perform non-spatial linear models, and to identify wetland extents, ecotones, and hydrologic connections not included in either the training data or the NWI.

Although, the provided model is reasonably robust to changes in resolution, areal extents between 100 km 2 and 300 km 2, and region-specific physical conditions, it is advisable that model fitting procedures be repeated for each unique study domain as a means of calibrating the model to local biophysical conditions. It may also be the case that the environmental covariates explored in the current study are not available for some locations outside of the United States, or that other topographic or vegetative indices are found to offer greater explanatory power due to region-specific characteristics. That is, caution should be used when exporting the best performing model from one region to another. The application of the best performing model from the primary domain to other locations in this study was for the sole purpose of evaluating model sensitivity. Others are encouraged to utilize the presented modeling approach, to apply a gradient-based perspective, to leverage spatial structure towards prediction, and to widely explore other potential environmental covariates.

Finally, several aspects of this study warrant further investigation and it is the authors’ hope that the presented work inspires exploration by other researchers. For example, although decomposition of soils data facilitated the development of several strong predictors of wetland presence, results from the sensitivity analysis bring to light issues in regard to consistent use of such an approach. Among these are issues relating to the shift of a given dimension between positive and negative signs at differing spatial extents, changes in the magnitude of effect sizes over different study domains, and the overall repeatably of what is essentially a data mining technique. Many of these potentially confounding factors were not fully explored in this study, as the presented model framework is motivated by prediction rather than explanation or causality.