Introduction

The surface elevation table (SET)–marker horizon (MH) approach (SET-MH, together) is a method for quantifying surface elevation change through measurements of both surface and subsurface processes that control wetland soil elevation (Cahoon et al. 2002a, b, 1995; Callaway et al. 2013; Lynch et al. 2015). The SET-MH approach has been widely used for documenting and interpreting trends in surface elevation dynamics over time, comparing elevation trends between different wetland types (Howard et al. 2020; Krauss et al. 2010; McKee and Vervaeke 2018), differentiating between surface and subsurface contributions to surface elevation change (McKee 2011; McKee et al. 2007; Stagg et al. 2016), understanding the effects of disturbance on surface elevation change (Cahoon 2006; Yeates et al. 2020), and assessing wetland vulnerability to sea-level rise (Cahoon et al. 2006; Jankowski et al. 2017; Saintilan et al. 2022; Sasmito et al. 2016), in addition to many other applications (Webb et al. 2013). Traditionally, rates of surface elevation change derived from SET data have been estimated from simple linear regression analyses where the independent variable is time since the first measurement and the dependent variable is cumulative surface elevation change relative to the first measurement (Cahoon and Lynch 1997; Krauss et al. 2010) (Fig. 1a). However, several recent studies have shown that elevation change dynamics in coastal wetlands can include both linear and non-linear relationships. For example, surface elevation data can include abrupt changes or disjunct patterns resulting from the impact of extreme events, such as hurricanes or floods (Fig. 1b) (Feher et al. 2020; Moon et al. 2022; Osland et al. 2020; Whelan et al. 2009), or from planned management or restoration activities (Fig. 1c) (Anisfeld et al. 2016; Cahoon et al. 2019; Krauss et al. 2017). 
Additionally, surface elevation data can include repeating, cyclical patterns related to short-term seasonal effects (Fig. 1d) (e.g., growing season vs. non-growing season, or dry season vs. wet season) (Cahoon et al. 2011; Whelan et al. 2005), or long-term climate cycles (e.g., El Niño vs. La Niña) (Rogers and Saintilan 2008). Thus, additional methods can help to quantify rates of surface elevation change when complex non-linear patterns are apparent in the data.

Fig. 1 Examples of surface elevation change data with linear (a) or non-linear patterns resulting from (b) a sudden increase in surface elevation resulting from sediment deposition during a hurricane, (c) a rapid increase in surface elevation resulting from the growth of mangroves planted during restoration, or (d) a cyclic trend resulting from water table fluctuations during the dry vs. wet seasons

Generalized additive models (GAMs) are an extension of generalized linear models (GLMs) that replace the linear predictor with a sum of smooth functions of the predictors (James et al. 2013). These smooth terms are penalized regression splines that are fit by restricted maximum likelihood and are composed of basis functions that can be conceptualized as a continuous set of connected piecewise polynomial functions (Hastie et al. 2009; Wood 2017). GAMs have previously been used to analyze other complex ecological processes, such as cyclical changes in water quality (Murphy et al. 2019), the spatial distribution of species and communities (Guisan et al. 2002), long-term patterns in wetland area change (Couvillion et al. 2017), and historical environmental conditions based on palaeoecological time series (Simpson 2018), among multiple other applications. In contrast to traditional linear models, GAMs offer greater flexibility in modeling because the underlying form of the relationship between variables is not specified prior to estimation but is instead derived from the data during model fitting (Yee and Mitchell 1991). Additionally, GAMs can include diverse combinations of both linear and non-linear components that reveal hidden or unexpected patterns in the data while improving the model fit (Simpson 2018). Compared to non-parametric modeling or other data-driven methods (e.g., artificial neural networks or random forests), GAMs maintain a high level of interpretability since many well-known tools for model selection and inference in linear models are applicable to GAMs (Wood 2020a). GAMs are sometimes described as an interpretable machine learning method (Molnar 2022) because the form of the relationship between variables is learned from the data.
Thus, GAMs can be considered semi-parametric models that combine the interpretability of parametric models with the flexibility of non-parametric models (Guisan et al. 2002).
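
The structure described above can be written compactly for the single-predictor case considered in this paper. The notation below is illustrative and is not taken from the cited sources: y_i is the response at time t_i, the b_k are spline basis functions, K is the basis dimension, and λ is the smoothing parameter that penalizes wiggliness. The squared second-derivative penalty shown is one common choice and is assumed here for illustration.

```latex
y_i = \beta_0 + f(t_i) + \varepsilon_i, \qquad
f(t) = \sum_{k=1}^{K} \beta_k \, b_k(t), \qquad
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}}
  \sum_{i=1}^{n} \bigl( y_i - \beta_0 - f(t_i) \bigr)^2
  + \lambda \int f''(t)^2 \, dt
```

Under this penalty, a very large λ pushes f toward a straight line, whereas a small λ permits a more flexible curve. This is why a penalized smooth can reduce to a linear fit when the data do not support anything more complex.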

Here, we demonstrate the value of GAMs for analyzing non-linear patterns of surface elevation change in coastal wetlands. First, we compared rates of surface elevation change calculated from a simple linear model and a GAM in order to illustrate the utility of GAMs for calculating surface elevation change from long-term SET data with an apparent linear trend. Second, we compared rates of surface elevation change calculated from a simple linear model, a segmented model (also known as a piecewise model), and a GAM to demonstrate the utility of GAMs for calculating surface elevation change from SET data with apparent non-linear trends. Finally, we illustrate how the GAM approach can be used to effectively quantify and compare rates of surface elevation change at both the site- and landscape-level scales.

Methods

Surface Elevation Change Data

We quantified and compared rates of surface elevation change using surface elevation table (SET) data from five sites (NE Florida Bay-7, Shark River-3, Shark River-1, Shark River-4, and Shark River-2) located within Everglades National Park (USA). Data from these sites were originally presented and analyzed as part of a regional synthesis of surface elevation change data from mangrove forests and coastal marshes in the Greater Everglades region (Feher et al. 2019, 2022a). Data from these five specific sites were selected for our model comparisons due to (a) the relatively long length of the data records (> 10 years), (b) the consistency of data collection (bi-annual or annual), and (c) the variety of linear and non-linear trends apparent in the data from these sites. Further details on the SET-MH approach can be found in Cahoon et al. (2002a), Cahoon et al. (2002b), Callaway et al. (2013), and Lynch et al. (2015).

The first site, NE Florida Bay-7, is a tall, highly productive mixed mangrove forest located in the western basin of Florida Bay (Coronado-Molina et al. 2012). SET measurements were conducted on an annual basis between 1997 and 2019, for a total of 23 sampling events. Visual inspection of the SET data from NE Florida Bay-7 suggested a linear increase in surface elevation over time. We compared rates of surface elevation change for NE Florida Bay-7 that were estimated from either a simple linear model or a generalized additive model (GAM) to demonstrate the utility of GAMs for calculating surface elevation change from SET data with an apparent linear trend.

Shark River-3 is located along the lower Shark River within a mature mangrove forest dominated by a mix of red (Rhizophora mangle), white (Laguncularia racemosa), and black (Avicennia germinans) mangroves. Three replicate SETs were used to monitor surface elevation change on a bi-annual to annual basis between 1998 and 2021, for a total of 66 sampling events. Previous work by Feher et al. (2020) found that surface elevation change at Shark River-3 was characterized by abrupt changes due to sediment inputs from the passage of Hurricanes Wilma and Irma in 2005 and 2017, respectively. Indeed, visual inspection of the SET data from Shark River-3 shows a sudden increase in elevation following Hurricane Wilma in 2005, followed by a short period of elevation loss, and then a secondary period of elevation gain prior to another sudden increase in elevation following Hurricane Irma in 2017. We compared rates of surface elevation change for Shark River-3 that were estimated from a simple linear model, a segmented model, and a generalized additive model to demonstrate the utility of GAMs for calculating surface elevation change from SET data with apparent non-linear trends.

Shark River-1 is located along the lower Shark Slough within a freshwater sawgrass (Cladium jamaicense) dominated marsh. Three replicate SETs were used to monitor surface elevation change at Shark River-1 on a bi-annual to annual basis between 1999 and 2018, for a total of 39 sampling events. Shark River-4 is located along the lower Shark River within a mature mangrove forest dominated by a mix of red, white, and black mangroves. Three replicate SETs were used to monitor surface elevation change at Shark River-4 on a bi-annual to annual basis between 2006 and 2021, for a total of 31 sampling events. Shark River-2 is located along the lower Shark River within a mature mangrove forest dominated by a mix of red, white, and black mangroves interspersed with sawgrass (C. jamaicense). Three replicate SETs were used to monitor surface elevation change at Shark River-2 on a bi-annual to annual basis between 1998 and 2021, for a total of 55 sampling events. We calculated rates of surface elevation change from site-level GAMs for Shark River-1, Shark River-4, and Shark River-2 to illustrate how the GAM approach can be used to effectively quantify and compare rates of surface elevation change across landscape-level scales. Note that for sites with multiple replicate SETs, data were averaged to the site-level prior to analyses. Further details about site conditions, data collection, and data preparation can be found in Feher et al. (2022a).

Simple Linear Models: Calculating Rates of Change

In the site-specific simple linear models for NE Florida Bay-7 and Shark River-3, surface elevation change on each sampling date was the response variable and time relative to the establishment of the SET-MH site in decimal years was the explanatory variable (Table 1). We used the function “lm” from the R package “stats” to fit a separate linear model for each site (R Core Team 2021).
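
As a rough illustration of the calculation described above, the following sketch fits an ordinary least-squares line to cumulative elevation change against decimal years and returns the slope (the rate of change) with its standard error. It is written in Python with NumPy purely for illustration, whereas the analyses in this paper used the R function “lm”. The function name `linear_rate` and the data values are hypothetical.

```python
import numpy as np

def linear_rate(years, elevation_change):
    """Return (slope, standard error of slope) from an ordinary least-squares fit."""
    t = np.asarray(years, dtype=float)
    y = np.asarray(elevation_change, dtype=float)
    X = np.column_stack([np.ones_like(t), t])        # intercept column + time
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                        # residual degrees of freedom
    sigma2 = resid @ resid / dof                     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # coefficient covariance matrix
    return beta[1], np.sqrt(cov[1, 1])

# Hypothetical data: decimal years since site establishment and cumulative change (mm)
years = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
sec_mm = np.array([0.0, 4.5, 7.9, 12.6, 16.1, 20.4])
slope, se = linear_rate(years, sec_mm)
print(f"rate = {slope:.2f} ± {se:.2f} mm/year")
```

The slope is in millimeters per year because time is expressed in decimal years and elevation change in millimeters.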

Table 1 Regression model forms for simple linear regression, segmented regression, and generalized additive models (GAM) with sample R code

Segmented Models: Calculating Rates of Change

Segmented regression models represent nonlinearity in the relationship between two variables by fitting separate regression slopes within distinct intervals of the independent variable domain (Toms and Lesperance 2003). To determine time-period specific rates of surface elevation change for Shark River-3, we first divided the data into four time periods: (1) Pre-Wilma: a consistent period of elevation gain leading up to the landfall of Hurricane Wilma in October 2005; (2) Post-Wilma #1: a consistent period of linear elevation loss following Wilma between October 2005 and spring of 2008; (3) Post-Wilma #2: a consistent period of linear elevation gain between spring of 2008 and the landfall of Hurricane Irma in September 2017; and (4) Post-Irma. We did not assess rates of change for the Post-Irma period because this period is still underway and will require additional data collection for accurate rate determination. Whereas the breakpoints for the Pre-Wilma and Post-Irma periods corresponded to the landfall dates of Hurricane Wilma and Hurricane Irma (October 2005 and September 2017, respectively), we used the “segmented” function from the R package “segmented” to estimate the breakpoint location between the Post-Wilma #1 and Post-Wilma #2 periods (Muggeo 2022). We then used the “lm” function from the R package “stats” to fit a linear model where surface elevation change on each sampling date was the response variable and the interaction between time-period, which was coded as a categorical predictor, and time relative to the establishment of the SET-MH site in decimal years, which was coded as a continuous predictor, was the explanatory variable (Table 1). Note that this is equivalent to fitting a simple linear model where the breakpoints are parameterized as a series of indicator functions for each time interval.
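
The interaction model described above gives each time period its own intercept and slope within a single least-squares fit. A minimal sketch of that idea follows, in Python with NumPy for illustration only (the analyses in this paper used R). The function name `segmented_rates`, the breakpoints, and the data are hypothetical, and the breakpoints are assumed to be known already rather than estimated.

```python
import numpy as np

def segmented_rates(years, elevation_change, breakpoints):
    """Fit one intercept and one slope per time period in a single
    least-squares model (period indicator x time interaction).
    Returns the slopes, one per period, in chronological order."""
    t = np.asarray(years, dtype=float)
    y = np.asarray(elevation_change, dtype=float)
    bp = np.asarray(breakpoints, dtype=float)
    # Period index per observation; a point exactly on a breakpoint
    # is assigned to the later period.
    period = np.searchsorted(bp, t, side="right")
    n_periods = len(bp) + 1
    indicators = np.eye(n_periods)[period]           # one-hot period membership
    X = np.hstack([indicators, indicators * t[:, None]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[n_periods:]

# Hypothetical record: slow gain, rapid loss, then steady gain
t = np.arange(0.0, 12.0, 0.5)
y = np.where(t < 4.0, 1.0 * t,
             np.where(t < 6.0, 4.0 - 8.0 * (t - 4.0), -12.0 + 4.0 * (t - 6.0)))
slopes = segmented_rates(t, y, breakpoints=[4.0, 6.0])
print(slopes)
```

Because each period has its own intercept, this formulation does not force the fitted lines to meet at the breakpoints.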

GAM Models: Calculating Rates of Change

In the site-specific GAMs for each of the five sites, surface elevation on each sampling date was the response variable, and the explanatory variable was time relative to the establishment of the SET-MH site in decimal years represented by a penalized thin-plate regression spline (i.e., the smooth term) that was estimated by restricted maximum likelihood (REML) (Table 1). We used the “gam” function from the R package “mgcv” to fit a separate GAM model for each site (Wood 2020b), starting with a maximum basis size (K) of three to minimize over-fitting. In simple terms, the basis dimension is the maximum possible degrees of freedom allowed for the smooth term in the model. We used the “gam.check” function from the “mgcv” package to ensure that each fitted model conformed to the assumptions of the GAM approach, and to determine if the initial basis was adequate to represent any non-linear patterns in the data (Wood 2017, 2020b). If the K-index values for a model indicated that the initial basis size of three was too low for the smooth term, we increased the basis size by increments of one until the model fit stabilized. A model was considered stabilized when subsequent increases to the basis size did not affect the effective degrees of freedom for the smooth term or improve the model’s smoothing parameter selection score (Wood 2020b). To make comparisons to sea-level rise, managers frequently require a single estimate for the rate of wetland surface elevation change, which has traditionally been derived from the slope coefficient generated in simple linear regression analyses. However, a comparable single rate is not a standard output of GAMs. 
Thus, to generate a single, overall estimate of the rate of surface elevation change for each site, we used the “derivatives” function from the R package “gratia” to approximate, by finite differences, the first derivative of the smooth term for time at 200 equally spaced points in each site-specific model, and we calculated the rate of surface elevation change as the mean of those 200 first derivatives (Simpson 2021). Similarly, the standard error for each rate of surface elevation change estimated from the GAM models was calculated as the mean of the standard errors of the same 200 first derivatives.
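
The averaging step described above can be sketched as follows, in Python with NumPy for illustration only. The smooth here is a hypothetical stand-in function rather than a fitted GAM, and the function name `mean_rate` and the step size `eps` are assumptions, not the internals of the “gratia” package.

```python
import numpy as np

def mean_rate(smooth, t_min, t_max, n=200, eps=1e-7):
    """Average forward finite-difference first derivatives of `smooth`
    evaluated at `n` equally spaced times between t_min and t_max."""
    t = np.linspace(t_min, t_max, n)
    derivs = (smooth(t + eps) - smooth(t)) / eps     # slope at each time point
    return derivs.mean()

# Hypothetical stand-in for a fitted smooth: a trend plus a slow oscillation (mm)
def example_smooth(t):
    return 3.0 * t + 4.0 * np.sin(0.5 * t)

rate = mean_rate(example_smooth, 0.0, 20.0)
net_change_rate = (example_smooth(20.0) - example_smooth(0.0)) / 20.0
print(rate, net_change_rate)
```

Because the average of a derivative over an interval equals the net change divided by the interval length, the mean of many equally spaced derivatives should be close to the net change in the fitted curve divided by the length of the record.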

Model Comparisons

For the site-level comparisons for NE Florida Bay-7 and Shark River-3, we compared the rates of surface elevation change estimated from each of the different model types. For the region-wide comparisons between Shark River-1, -4, and -2, we compared the shapes of the GAM models for each site using the effective degrees of freedom (EDF), which describes the degree of non-linearity or wiggliness of the fitted GAM curve (Wood 2017). An EDF of 1 indicates a linear relationship, whereas an EDF greater than 1 but less than 2 represents a weakly non-linear relationship, and an EDF greater than 2 represents a highly non-linear relationship (Zuur et al. 2009). Thus, we used the EDF of the GAM models to assess the degree of non-linearity or complexity of the fitted GAM curves. To facilitate model comparisons, rates of surface elevation change and model r2 values are presented in the following section as well as in Table 2.
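
The EDF thresholds described above can be expressed as a small helper, sketched here in Python for illustration. The function name `classify_edf` and the tolerance `tol` (which treats an EDF marginally above 1 as linear) are assumptions, as is the choice to label an EDF of exactly 2 as weakly non-linear. The EDF values are those reported for the three sites in the Results.

```python
def classify_edf(edf, tol=0.05):
    """Label a smooth by its effective degrees of freedom (EDF)."""
    if edf <= 1.0 + tol:
        return "linear"
    if edf <= 2.0:
        return "weakly non-linear"
    return "highly non-linear"

# EDF values reported in the Results for the landscape-level comparison
site_edf = {"Shark River-1": 1.0, "Shark River-4": 3.2, "Shark River-2": 8.9}
labels = {site: classify_edf(edf) for site, edf in site_edf.items()}
for site, label in labels.items():
    print(f"{site}: {label}")
```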

Table 2 Site-level rates of surface elevation change estimated from the three different model types and model r2 values

Results

For NE Florida Bay-7, the rate of surface elevation change estimated from the simple linear model was 4.13 ± 0.24 mm year−1 (F1,21 = 295.6, p < 0.001, r2 = 0.93) (Fig. 2a), whereas the rate of surface elevation change estimated from the generalized additive model (GAM) was 4.11 ± 0.24 mm year−1 and the GAM model EDF was 1.0 (p < 0.001, r2 = 0.93) (Fig. 2b) (Table 2).

Fig. 2 Surface elevation change (SEC) at the SET-MH site NE Florida Bay-7 fit with (a) a simple linear model and (b) a generalized additive model (GAM)

For Shark River-3, the rate of surface elevation change estimated from the simple linear model was 4.07 ± 0.22 mm year−1 (F1,64 = 353.3, p < 0.001, r2 = 0.84) (Fig. 3a). For the Shark River-3 segmented model, the breakpoint between the Post-Wilma #1 and Post-Wilma #2 periods was estimated to occur on May 22, 2008. The rate of surface elevation change estimated from the segmented linear model was −0.25 ± 0.57 mm year−1 for the Pre-Wilma period, −9.24 ± 2.78 mm year−1 for the Post-Wilma #1 period, and 4.00 ± 0.74 mm year−1 for the Post-Wilma #2 period (F5,54 = 115.1, p < 0.001, r2 = 0.91) (Fig. 3b). The rate of surface elevation change estimated from the Shark River-3 GAM model was 3.67 ± 1.62 mm year−1 and the GAM model EDF was 6.6 (p < 0.001, r2 = 0.93) (Fig. 3c) (Table 2).

Fig. 3 Surface elevation change (SEC) at the SET-MH site Shark River-3 fit with (a) a simple linear model, (b) a segmented linear model, and (c) a generalized additive model (GAM)

For Shark River-1, the rate of surface elevation change estimated from the GAM model was 3.95 ± 0.56 mm year−1 and the model EDF was 1.0 (p < 0.001, r2 = 0.57) (Fig. 4a). The rate of surface elevation change at Shark River-4 estimated from the GAM model was 2.67 ± 1.20 mm year−1 and the model EDF was 3.2 (p < 0.001, r2 = 0.87) (Fig. 4b). For Shark River-2, the rate of surface elevation change estimated from the GAM model was 2.76 ± 2.09 mm year−1 and the model EDF was 8.9 (p < 0.001, r2 = 0.89) (Fig. 4c) (Table 2).

Fig. 4 Surface elevation change (SEC) fit with GAMs at the SET-MH sites (a) Shark River-1, (b) Shark River-4, and (c) Shark River-2. Note that as the form of the fitted GAM increases in complexity from top to bottom, the effective degrees of freedom (EDF) of each site-level model also increase

Discussion

Here, we introduced the rationale and methodology for using generalized additive models to analyze patterns of surface elevation change in coastal wetlands. In the subsequent sections, we discuss the advantages of GAMs over other commonly used approaches for analyzing surface elevation table data.

First, previous studies that utilized the SET-MH approach have typically used simple linear regression to quantify rates of surface elevation change (Callaway et al. 2013). However, our analyses illustrate how GAMs can be used to model linear or non-linear relationships in surface elevation change due to the data-driven nature of the GAM approach (Yee and Mitchell 1991). For example, the plots of the fitted values derived from both the simple linear regression and the GAM model for NE Florida Bay-7 are similar, indicating that there is a linear relationship between surface elevation change and time (Fig. 2). Additionally, the rates of surface elevation change for NE Florida Bay-7 obtained from the simple linear model and the GAM model were similar (4.13 vs. 4.11 mm year−1, respectively) and the variation explained by each model was identical (r2 = 0.93) (Fig. 2). Finally, the GAM model for NE Florida Bay-7 had an EDF of 1.0, indicating that a linear form was the best fit for the data at this site, despite the model being initially parameterized with a maximum basis dimension (i.e., K) of 3.0. Thus, GAM models fitted to data with linear trends are comparable to the simple linear regression models that have been traditionally used to analyze SET data (Russell et al. 2022), but with the added advantage that the relationship is not assumed to be linear prior to model fitting.

Second, several recent studies have shown that elevation change dynamics in coastal wetlands can include complex non-linear patterns resulting from random or irregular events such as hurricanes, shrink-swell cycles, or planned restoration activities (Cahoon et al. 2019; Feher et al. 2020; Moon et al. 2022; Osland et al. 2020). Our analyses demonstrate the advantages of using GAM models, as opposed to simple linear regression analyses, to model trends in SET data with abrupt changes or disjunct patterns. For example, the plot of the fitted values derived from the simple linear regression for Shark River-3 suggests that the linear model may not be ideal for capturing the apparent non-linear trends in the SET data from this site (Fig. 3a). Because we knew that surface elevation at Shark River-3 had been impacted by two rapid sediment deposition events (Hurricanes Wilma and Irma in 2005 and 2017, respectively), we used a segmented linear regression to estimate different linear rates of surface elevation change for the three distinct time intervals described in Feher et al. (2020): Pre-Wilma, Post-Wilma #1, and Post-Wilma #2 (Fig. 3b). Although the segmented model represents an improvement over the linear model in that it appears to better fit the data and allows for the calculation of different trends over time, there are several disadvantages to the segmented regression approach. First, performing a segmented linear regression requires either background knowledge of the number and location of the breakpoints prior to model fitting, or, if the breakpoints are not known beforehand, using a detection algorithm to estimate breakpoints that may or may not be sensible for the data (Cahoon et al. 2019; Feher et al. 2020). Additionally, the use of a segmented regression makes it difficult to estimate a single, definitive long-term rate of surface elevation change that may be needed to determine if the site is keeping pace with sea-level rise (Cahoon et al. 2006; Lynch et al. 2015).
Similarly, rates of surface elevation change derived from segmented regression are not easily compared to rates of surface elevation change from other sites that were estimated from either linear models or from different segmented linear models fitted with different site-specific parameters (Osland et al. 2020; Yeates et al. 2020). Given the shortcomings of the linear and segmented regression methods for estimating rates of surface elevation change from SET data with apparent non-linear trends, we then applied the GAM approach to the surface elevation change data from Shark River-3. A preliminary visual inspection of the predicted GAM values for Shark River-3 illustrates that the GAM model is likely better suited to capture the full suite of non-linear patterns in the surface elevation data from this site as compared to either the simple linear or segmented model (Fig. 3c). Additionally, a comparison of the r2 values between the three models also suggests that the GAM model explains a higher proportion of the variance in the data (linear r2 = 0.84; segmented r2 = 0.91; GAM r2 = 0.93). Most importantly, the GAM approach yields a single value for the long-term annual rate of surface elevation change at Shark River-3 that is straightforward in its interpretation and can be easily compared to rates of change from other SET sites in the region or to current or future rates of sea-level rise (Feher et al. 2022a).

Third, our analyses illustrate how the GAM approach can be used to effectively quantify and compare rates of surface elevation change across different coastal wetlands at a landscape-level scale. We fit site-specific GAM models to the SET data from three different sites (Shark River-1, -4, and -2) located along the Shark River in Everglades National Park. Since GAMs do not require prior knowledge of the potential form of the relationship before model fitting, we were able to fit a site-specific model to the SET data from each site with minimal subjectivity or preliminary consideration of unspecified trends (Hastie and Tibshirani 1986). As implemented in the R package “mgcv,” the GAM approach allowed for relatively simple “tuning” of the parameters for each model as we used the “gam.check” function to identify an appropriate basis size (i.e., K) for each site-specific model (Table 3) (Wood 2020b). Additionally, the flexibility of GAMs allowed us to select site-specific models that incorporated both linear and non-linear patterns as needed since GAM models are additive in nature and do not require the selection of a single model form (Yee and Mitchell 1991). Thus, despite the obvious differences in shape and effective degrees of freedom (EDF) among the three fitted site-specific models (Fig. 4), the GAM approach enabled the calculation of site-specific values for the annual rate of surface elevation change that could be easily compared among the three sites.

Table 3 Site-level surface elevation GAM model parameters. “K” represents the basis dimension used in the model and “REML” (i.e., restricted maximum likelihood) represents the smoothing parameter selection score estimated by restricted maximum likelihood

Finally, we emphasize that our intention with this communication is not to imply that GAMs are a perfect solution to all of the many issues that can arise in the analyses of SET-MH data but rather to introduce the GAM method as another option or “tool in the toolbox” for dealing with some of these issues and to provide a demonstration of its application to real-world data. In fact, with a few exceptions, we note that most of the previous studies that have used SETs to measure rates of surface elevation change have successfully utilized simple linear models since these authors were usually most interested in quantifying and comparing rates of change between only a few sites that were installed in a relatively small, uniform area as part of a single cohesive study where non-linear trends were not readily apparent. Therefore, we suggest that future research efforts that endeavor to use the GAM method carefully consider whether the use of the technique is both statistically and biologically justifiable for the specific situation or if simpler model forms (i.e., simple linear, polynomial, or segmented regression) could be sufficient. In this sense, researchers may wish to use the GAM method for initial, exploratory data analyses to determine if non-linear trends are present and to examine how the presence of these trends could influence the estimation of surface elevation change. Additionally, while we have limited our discussion of the GAM method to models where the single explanatory variable was time relative to the establishment of the SET-MH site, the process of model selection and refinement for manipulative studies that include multiple covariates would undoubtedly be more complex and study-specific than the process that we have detailed here (Pedersen et al. 2019; Simpson 2018).
Thus, while few studies have used GAM models with SET data (with the exceptions of Feher et al. (2022a) and Moon et al. (2022)), we suggest that the method described here is not dissimilar from other ecological publications that have utilized GAMs to detect significant rate changes (Fewster et al. 2000; Large et al. 2013; Mariën et al. 2022) or to predict responses at specific values within the original model domain (Drexler and Ainsworth 2013; Wood and Augustin 2002; Yee and Mitchell 1991).

Conclusion

Whereas past studies that utilized the SET-MH approach have most often quantified rates of surface elevation change using simple linear regression analyses, several recent studies have shown that elevation patterns often include a diverse combination of linear and non-linear relationships (Feher et al. 2022a; Moon et al. 2022). Our analyses show that GAMs provide a relatively simple and flexible approach to analyzing non-linear patterns of surface elevation change in coastal wetlands. More specifically, GAMs minimize the potential for bias (i.e., underfitting) that can occur with linear models by allowing for additive combinations of complex, unknown non-linear terms (Wood 2017). Similarly, GAMs also minimize the potential for overfitting (i.e., high variance) that can occur with polynomial or segmented models by imposing a penalty on the smooth term (Pedersen et al. 2019; Wood and Augustin 2002). By utilizing GAMs, we were able to effectively quantify and compare rates of surface elevation change across landscape-level scales, while minimizing subjectivity and incorporating both linear and non-linear patterns. Note that although we have focused on using GAMs for analyzing SET data, the GAM approach documented here can also be applied to marker horizon (MH) data to quantify rates of vertical accretion (Feher et al. 2022a; Moon et al. 2022). Finally, although we have attempted to illustrate some of the range and possibilities for applying GAMs to the analysis of surface elevation change in coastal wetlands, this paper should be viewed as an application of the technique to SET-MH data, and we refer the reader to the rich literature on GAMs for further information (Appendix).