Highlights

  • Simple terrain metrics of hillslope position and shape reduce variance in FS estimates

  • Ridgetop FS was highest in a wet year, while valley floor FS was highest in a drought

  • Growing season FS estimated from hourly sampling was about 1.25x greater than from weekly sampling

Introduction

Soil CO2 efflux (FS), the release of CO2 from soils to the atmosphere, is the second largest flux in the global carbon (C) cycle (Friedlingstein and others 2020). An estimated 90 Pg CO2-C diffuse from soils to the atmosphere each year, roughly 10 times larger than annual global anthropogenic CO2 emissions (Schlesinger and others 2013), and potentially increasing at a rate of about 0.1 Pg C year−1 (Bond-Lamberty and Thomson 2010; Hashimoto and others 2015). Despite its importance, the land surface exchange of CO2 with terrestrial ecosystems bears the largest uncertainty bounds in current global C budgets (Todd-Brown and others 2013; Friedlingstein and others 2014). In fact, FS is often estimated as a residual from other, better-known variables in the global C budget (for example, Le Quéré and others 2016).

Spatial heterogeneity contributes to the uncertainty in FS. Although FS has been measured extensively in the past century (Jian and others 2021), these point measurements are scaled to ecosystem, landscape, or global estimates using modeling frameworks that assume flat terrain (for example, Dai and others 2004; Mao and others 2016): an assumption violated by over 50% of the global land surface (Rotach and others 2014). Areas of complex terrain can be significant terrestrial sinks of atmospheric CO2; for example, Reyes and others (2017) estimate about 15% of C sequestration in the conterminous US occurs in topographically complex areas. Complex terrain influences soil temperature, as well as the lateral distribution of water, sediments, nutrients, and C, all of which may influence FS. Research from the Susquehanna Shale Hills Critical Zone Observatory (CZO) (where our field site is located) revealed significant relationships between topography and soil organic C storage (Andrews and others 2011), soil pCO2 (Hasenmueller and others 2015; Hodges and others 2019), and aboveground and belowground tree C storage (Smith and others 2017; Orr 2016). This complements a growing body of evidence linking topography to the spatial distribution of C fluxes (for example, Pacific and others 2011; Shi and others 2018; Smeglin and others 2020) and the response of these fluxes to climatic changes (for example, Riveros-Iregui and McGlynn 2009; Berryman and others 2015; Reyes and others 2017).

Despite the significance of topography as a mediator of C cycling, explicit study of FS in complex terrain remains limited. The only thorough case studies we are aware of were in the US Rocky Mountain range (for example, Pacific and others 2008; Riveros-Iregui and McGlynn 2009; Riveros-Iregui and others 2012; Berryman and others 2015). At these sites, lateral redistribution of soil water from non-convergent (water shedding) to convergent (water accumulating) areas led to bidirectional responses of FS to interannual precipitation variability: landscape positions receiving high drainage had higher cumulative FS in a drought year, whereas positions with low drainage had higher FS in a non-drought year (Riveros-Iregui and others 2012). Although such work has provided a nascent understanding of mechanisms underpinning FS variability across climate and topography, the pervasiveness of complex terrain on the global land surface calls for expanded exploration beyond these (sub)alpine ecosystems (Reyes and others 2017). For example, the idea that soil saturation decreases FS is well established in laboratory incubations and wetlands (for example, Doran and others 1991). Yet very few field studies identify which upland areas may be impacted by this process and to what extent. Additionally, many Earth System Models do not capture the lateral redistribution of water that drives these patterns (Clark and others 2015).

New understanding of FS in complex terrain may be advanced by higher temporal resolution data. The spatial distribution of soil moisture can change rapidly in complex terrain as preferential flow paths redistribute rainwater. For example, at the Shale Hills CZO, preferential soil water flow paths cause high moisture following rain events to be fleeting on ridgetops and planar slopes as water is drained to convergent landscape positions in swales and the valley floors (Lin and others 2006). These rainfall events rapidly alter soil pCO2 at Shale Hills (for example, Hodges and others 2019) and FS in other forests in complex terrain (for example, Riveros-Iregui and others 2008). Thus, rainfall events can change FS within sub-weekly timescales, with the magnitude and lag time of response related to topographic positions (Petrakis and others 2017; Riveros-Iregui and others 2008). Yet, with so few high temporal resolution FS measurements, we lack a generalized understanding of how topography mediates the FS response to moisture change. It is possible that a unit change in soil moisture produces the same change in FS at a ridgetop and a valley floor, but this assumption has rarely been tested, and it may be wrong for soils that remain saturated. For example, extended anaerobic conditions in convergent areas may lead to lower FS at a given soil moisture than non-convergent areas with brief saturation.

There are inherent tradeoffs between temporal and spatial resolution when designing sampling schemes of FS in complex terrain (Lovett and others 2005). Explorations of FS across topographic gradients often use manual chambers, allowing for many replicates across space (for example, Riveros-Iregui and others 2012; Savage and Davidson 2001). However, collecting and processing samples from manual methods is labor intensive, which leads to sampling frequencies rarely finer than weekly (Riveros-Iregui and McGlynn 2009; Riveros-Iregui and others 2012) and, more often, as low as fortnightly (Berryman and others 2015) or monthly (Hanson and others 1993; Wang and others 2019; Jiang and others 2020). Further, manual sample collection typically excludes nighttime fluxes (for example, Hanson and others 1993; Riveros-Iregui and McGlynn 2009; Riveros-Iregui and others 2012; Berryman and others 2015; Wang and others 2019; Jiang and others 2020). Thus, while manual methods may capture spatial heterogeneity in FS at longer timescales (Savage and Davidson 2003), they miss the fine temporal responses.

By contrast, automatic chambers enable continuous FS observations at hourly or sub-hourly temporal resolutions that capture nighttime fluxes and short-term responses to rain (Savage and Davidson 2003; Ruehr and others 2009; Görres and others 2016). While automatic chambers have been increasingly used to measure FS, many of these studies do not explicitly consider topographic variation (Makita and others 2018; Courtois and others 2019). Studies that do consider topography have often been limited in execution, such as one automatic chamber (Ruehr and others 2010) or one year of measurement (Liu and others 2006; Ruehr and others 2010; Tian and others 2019), with notable exceptions focused on tropical forests or plantations (Rubio and Detto 2017; Yan and others 2019). Overall, automatic chambers remain an underutilized tool in identifying the timing of spatial controls on ecosystem-level FS that may be necessary for scaling FS responses to global change.

In this study, we present one of the first multi-year, continuous datasets of FS in a temperate deciduous forest in complex terrain. Within this dataset, we analyze FS across three years representing a gradient from drought to record precipitation. We use these data to ask: how does topography influence the response of FS to interannual precipitation variability? We hypothesize that (1) adding terrain variables to standard soil temperature and moisture predictors will explain significantly more variance in estimates of growing season and daily FS across climate variability, and (2) automated chamber methods will provide the same FS estimates as manual sampling when aggregated to the growing season temporal scale and catchment spatial scale.

Methods

Site Description

We designed our soil CO2 efflux (FS) sampling scheme to capture topographic variability (Figure 1) in the Shale Hills watershed (40°40′N, 77°54′W) of the Susquehanna Shale Hills Critical Zone Observatory (He 2019). The Shale Hills watershed is a small (0.08 km2), forested, first-order catchment underlain by Rosehill shale bedrock. Catchment topography includes steep planar slopes alternating with areas of convergent flow, known as swales (Brantley and others 2018). This convergence influences productivity in the mature oak-dominated (Quercus sp.) deciduous broadleaf forest, with evidence of greater aboveground carbon uptake and storage in swales and valley floors compared to ridgetops and planar slopes (Smith and others 2017). Hillslope curvature and position also drive soil carbon, texture, and depth, with valley floor positions having deeper and wetter soils with a greater clay content than the ridgetop soils (Lin and others 2006; Supplemental Table 1). We sampled across convergent (swale) and non-convergent (planar) slopes, as well as positions along these hillslopes (ridgetop, midslope, and valley floor, by elevation). Though the landscape can also be considered a continuous variable (for example, Riveros-Iregui and McGlynn 2009), we use these categories as one approach to define replicates, discuss trends across topography, and propose a method for upscaling FS in global models. Overall, this sampling design enables us to analyze FS within nested scales: at the level of chambers, of landscape positions, and by the presence or absence of convergent flow.

Figure 1
Location of soil CO2 efflux (FS) measurements on an elevation map of Shale Hills with 2-m topographic contour lines. FS was measured at four landscape positions: ridgetops (yellow), planar midslopes (green), swales (blue), and valley floors (gray). Measurements were collected using both automatic chambers (where squares represent the soil collar) and manual methods (where circles represent the 10-m-diameter circular macroplots, within which three soil collars were averaged to the macroplot scale).

Shale Hills has a humid continental climate with a mean annual temperature of 10 °C and mean annual precipitation of 1050 mm (NOAA 2007). However, annual precipitation from 2016 to 2018 deviated from this average: 2016 was a drought year at 719 mm, 2017 was near average at 988 mm, and 2018 was a wet year at 1275 mm (Xiao and Li 2018). Put in a state historical context, 2016 was the second driest year in Pennsylvania in the last two decades and 2018 was the wettest year on record (NOAA 2021). We targeted our analyses across these three years to explore the response of FS to rapid and significant change in interannual water availability.

Automatic Soil CO2 Efflux Collection

FS was measured hourly in two replicates across four landscape positions (ridgetop, planar midslope, swale midslope, and valley floor) using an automated soil respiration instrument, the LI-8100A Soil CO2 Flux System (LI-COR Biosciences Inc., Lincoln, NE, USA). Two LI-8100A Flux Systems were each linked to four opaque long-term soil respiration chambers (8100-104) fitted to a LI-8150 Multiplexer. These chambers measure FS by closing over a soil collar installed to ~ 5 cm depth and continuously calculating the change in CO2 concentrations within the chamber over 120 s, allowing 20 s for chamber closure, 30 s as a “dead band” to reach steady mixing immediately after closure, and 30 s after measurement for air to purge sampling lines of moisture. After completing the measurement, the chamber moves 180° from the soil collar to preserve the natural CO2 gradient between soils and the atmosphere. Data were downloaded approximately weekly, at which time we removed any plant growth within the chambers and debris that could affect the chamber’s closure. Otherwise, all new litter inputs were allowed to accumulate in the collars. Measurements were stopped prior to forecasted snowfall to avoid damaging the automated systems. Samples were taken every hour from July 2015 to December 2018 which, when accounting for missing data from technical issues and inclement weather, led to a total of 177,477 observations. This base dataset is publicly available through COSORE (Bond-Lamberty and others 2020).

FS was estimated using SoilFluxPro software (version 4, LI-COR Biosciences). FS was calculated from both an exponential and a linear regression of CO2 concentration in the chamber over time. The best-fitting model was determined by comparing the coefficient of determination (R2) and the normalized sums of the squares of the residuals for both fits. All calculations discarded the first 30 s of the CO2 concentration curves to account for disturbances of soil surface pressure from the chamber movement (Courtois and others 2019).

We removed FS estimates from the base dataset using the following quality control pipeline: (1) incomplete entries (n = 1506); (2) fluxes whose best-fitting regression between time and CO2 concentration had an R2 < 0.90 (n = 13,394), following literature precedent (Courtois and others 2019; Savage and others 2014); (3) entries with known problematic data according to the field technician error log (n = 1264); and (4) entries with physically implausible values (fluxes < −1 or > 50 μmol m−2 s−1) (n = 34). Additionally, fluxes that differed by more than 5 μmol m−2 s−1 from adjacent observations were flagged as “spikes” (Rubio and Detto 2017). The regression between time and CO2 concentration for each “spike” was individually reviewed, and the observation was removed if there was evidence of measurement error (n = 479), such as implausibly high starting CO2 concentrations (suggesting that not enough time elapsed between chamber closings to preserve the CO2 concentration gradient) or erratic concentrations (suggesting an improper seal between the chamber and soil collar). After applying these criteria, 90.6% of original flux measurements were retained (Figure 2).
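
The screening steps above can be illustrated with a minimal Python sketch. It is not the code used in this study: the dictionary keys (`flux`, `r2`, `in_error_log`) are hypothetical, and the spike rule assumes that a flux is flagged only when it differs from every adjacent observation by more than the threshold, which is one possible reading of the criterion. Flagged spikes were reviewed by hand in the study, so the sketch only flags them. The last lines recompute the retained fraction from the counts reported in the text.

```python
# Illustrative sketch only; field names and the spike rule are assumptions.

def passes_basic_qc(obs, r2_min=0.90, flux_min=-1.0, flux_max=50.0):
    """Apply quality control steps 1-4 to one observation stored as a dict."""
    if obs.get("flux") is None or obs.get("r2") is None:
        return False  # step 1: incomplete entry
    if obs["r2"] < r2_min:
        return False  # step 2: poor fit of CO2 concentration against time
    if obs.get("in_error_log", False):
        return False  # step 3: listed in the technician error log
    if not (flux_min <= obs["flux"] <= flux_max):
        return False  # step 4: physically implausible flux
    return True


def flag_spikes(fluxes, threshold=5.0):
    """Return indices of fluxes that differ from all adjacent values by > threshold."""
    flagged = []
    for i, flux in enumerate(fluxes):
        neighbors = []
        if i > 0:
            neighbors.append(fluxes[i - 1])
        if i < len(fluxes) - 1:
            neighbors.append(fluxes[i + 1])
        if neighbors and all(abs(flux - n) > threshold for n in neighbors):
            flagged.append(i)  # candidate for manual review, not automatic removal
    return flagged


# Consistency check using the counts reported in the text.
removed = 1506 + 13394 + 1264 + 34 + 479
retained_percent = 100.0 * (1 - removed / 177477)
```

Keeping the automatic filters separate from the spike flag mirrors the distinction in the text between rule-based removal and individually reviewed observations.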

Figure 2
Time series of soil CO2 efflux across three years of measurement. Black lines indicate quality controlled observations, while gray lines represent gap-filling through a regression model with modeled 5-cm soil temperature and volumetric soil water content (see Eq. 2). Year tick marks correspond to January 1.

Manual Soil CO2 Efflux Collection

FS was manually measured weekly to biweekly in 2016–2017 between 0900 and 1400 h because prior research suggested this time window may minimize diel effects (Davidson and others 1998). Measurements were collected at 50 macroplot sites (Figure 1) spanning ridgetops (n = 7), planar midslopes (n = 21), swale midslopes (n = 13), and valley floors (n = 9). Within these 10-m-diameter circular macroplots, FS was measured at three soil collars with the same LI-COR 8100 analyzer used for continuous observations. At each sampling time, soil collar FS was averaged to the macroplot scale to account for spatial autocorrelation. Spatial measurements were checked for quality, such that values indicating a malfunction (for example, unreasonable chamber temperature, initial CO2, or pressure) were removed. We calculated growing season estimates as a linear interpolation between daily observations (for example, Pacific and others 2008), which were summed for the 180 days from May 9 to October 15 (the earliest and latest sampling dates found in both years).
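
The interpolation and summation step can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function names are hypothetical, sampling days are assumed to be integer days of year in increasing order, and the conversion from μmol CO2 m−2 s−1 to g C m−2 day−1 assumes a molar mass of 12.011 g C mol−1 and 86,400 s per day.

```python
# Illustrative sketch only; names and the unit conversion are assumptions.

# 1 umol CO2 m-2 s-1 expressed as g C m-2 day-1
UMOL_S_TO_G_C_DAY = 1e-6 * 12.011 * 86400


def interpolate_daily(days, fluxes):
    """Linearly interpolate sparse fluxes to every day from the first to the last sample."""
    daily = []
    for i in range(len(days) - 1):
        d0, d1 = days[i], days[i + 1]
        f0, f1 = fluxes[i], fluxes[i + 1]
        for d in range(d0, d1):
            weight = (d - d0) / (d1 - d0)
            daily.append(f0 + weight * (f1 - f0))
    daily.append(fluxes[-1])  # include the final sampling day
    return daily


def seasonal_total(days, fluxes):
    """Sum interpolated daily fluxes (umol m-2 s-1) as g C m-2 over the sampled period."""
    return sum(f * UMOL_S_TO_G_C_DAY for f in interpolate_daily(days, fluxes))
```

For example, `seasonal_total([130, 137, 144], [2.0, 3.5, 3.0])` would integrate three weekly measurements over a 15-day window; the values here are invented for illustration.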

Co-located Timeseries and Geospatial Data

To understand controls on FS, we leveraged co-located time series data available from the Shale Hills CZO (https://czo-archive.criticalzone.org/shale-hills/data/datasets/). For climate variables, this included hourly precipitation from an OTT Pluvio weighing rain gauge gap-filled with data from the National Atmospheric Deposition Program (Xiao and Li 2018). Hourly air temperature was measured in the automated chambers and gap-filled with regional Daymet climate data (Thornton and others 2020) adjusted for our study site using the R “daymetr” package (version 1.6) (Hufkens and others 2018). For metrics of plant productivity, an indicator of autotrophic respiration, we used 90th percentile daily green chromatic coordinate (GCC), an estimate of canopy greenness from PhenoCam imagery (Richardson and others 2018). For biophysical controls on heterotrophic respiration, we monitored soil moisture using ECH2O EC-5 or GS1 (Decagon, METER Group Inc, Pullman, WA, USA) sensors and soil temperature using 8150-203 soil temperature probes (LI-COR) at 5-cm soil depth co-located with each chamber. However, the sensors often failed or recorded physically impossible data.

Instead, we modeled hourly soil moisture and temperature at a 5-cm depth using the Penn State Integrated Hydrologic Model with a surface heat flux module (Flux-PIHM; Shi and others 2013). Flux-PIHM is a physically based, spatially distributed, land surface hydrologic model that simulates lateral water flows (Shi and others 2013), which are critical to capturing heterogeneous FS in complex topography (Riveros-Iregui and others 2012). In the Shale Hills watershed, Shi and others (2015) found that Flux-PIHM simulates the dynamic and spatial structure of observed soil moisture. Specifically, the Shale Hills watershed model domain was decomposed into a triangular network of 532 grids. Flux-PIHM simulations used a surface elevation map from lidar measurements (Guo 2019), a soil map and soil hydraulic properties from an extensive soil survey (Lin and others 2006), and a vegetation map from a survey of more than 2000 trees (Eissenstat and others 2013). Meteorological forcing data from Phase 2 of the North American Land Data Assimilation System (NLDAS-2; Xia and others 2012) were used for the model simulation. The model was calibrated using discharge, groundwater level, soil moisture, soil temperature, and surface heat flux measurements. Model-predicted spatial patterns of soil moisture at a 5-cm depth have been validated using field measurements (Shi and others 2015), which showed that the model is able to capture the observed macro-spatial pattern of soil moisture at Shale Hills. We used the modeled soil temperature and liquid soil water content of the topsoil layer (0–10 cm) at the corresponding grids where automated chambers were located. Note that soil moisture, here, is the volume of liquid water, which excludes the frozen soil moisture content.

Additionally, we gathered static (or slow changing) variables using soil samples and remote sensing products (Supplemental Table 1). In June 2020, we collected two soil cores within 1 m of each chamber to a 15-cm depth using a 5.08-cm-diameter split corer. Each soil core was split into two layers for uniform processing: “surface” mineral soils from 0 to 5 cm, and “deeper” mineral soils from 5 to 15 cm. We collected O horizons within a 24-cm ring centered around each core. Samples were placed on ice and transported back to the laboratory where they remained at 4 °C until processing. Soils were dried to constant mass to determine water content gravimetrically. Soil bulk density was calculated both with and without rock volume (Throop and others 2012). O horizon samples were ground to 1 mm using a Wiley Mill, and mineral soils were sieved at 2 mm. Soil texture was determined using the rapid method from Kettler and others (2001). Soils were analyzed for total C at Penn State Agricultural Analytical Laboratories using the combustion method (Nelson and Sommers 1996).

We estimated topographic variables in ArcGIS Pro using a 3-m Digital Elevation Model (Guo 2019). Soil depth was calculated as elevation minus bedrock elevation, determined from ground penetrating radar (Lin 2019). Three types of curvature were calculated using the Curvature function: profile, or curvature parallel to the slope, which relates to rates of erosion and deposition; planform, or curvature perpendicular to the slope, which relates to flow convergence; and standard, which combines both types of curvature into a standard value. Lastly, topographic wetness index (TWI), an indicator of the influence of local topography on water movement and accumulation, was calculated using the equation:

$$ TWI = \ln\left( \alpha / \tan\beta \right) \tag{1} $$

where \(\alpha\) is the upslope contributing area calculated using a D-infinity algorithm (Tarboton 1997) and \(\beta\) is the slope angle (Beven and Kirkby 1979).
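
As a worked illustration of Eq. (1), the short Python sketch below computes TWI for a single grid cell. It is not the ArcGIS workflow used in this study: the function name is hypothetical, the slope is assumed to be supplied in degrees, and the minimum slope used to avoid division by zero on flat cells is an assumption.

```python
import math

# Illustrative sketch only; the minimum-slope floor is an assumption.


def topographic_wetness_index(contributing_area, slope_degrees, min_slope_degrees=0.1):
    """Compute TWI = ln(alpha / tan(beta)) for one grid cell."""
    slope = max(slope_degrees, min_slope_degrees)  # guard against tan(0) = 0
    return math.log(contributing_area / math.tan(math.radians(slope)))
```

Under this definition, a cell with a large contributing area and a gentle slope (for example, a valley floor) receives a higher TWI than a steep cell with little upslope area (for example, a planar midslope).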

Estimating Automatic Growing Season Efflux

To sum continuous FS across time, we gap-filled our observations using a regression model from Sullivan and others (2010) based upon modeled soil temperature and water content:

$$ \ln\left( F_{S} \right) = \beta_{0} + \beta_{1} T + \beta_{2} \theta + \beta_{3} \theta^{2} \tag{2} $$

where FS is soil CO2 efflux (μmol m−2 s−1), βn is a parameter coefficient, T is soil temperature at 5 cm (°C), and θ is volumetric soil water content at 5 cm (m3 m−3). We attempted to fit simpler equations, but we determined Eq. (2) was the preferred method to gap-fill based on Akaike’s Information Criterion (Akaike 1974), adjusted R-squared, root mean square error, and mean absolute error. All variables in the final models were significant (p-value < 0.001). Across all chambers and years, the fraction of FS from gap-filling averaged 29.0% in the growing season and 30.4% in total (Supplemental Table 2). Gap-filled data are displayed in Figure 2. Once the dataset was gap-filled, we calculated growing season fluxes from automatic chambers by summing all hourly fluxes between May 9 and Oct 15 of each year (Table 1), as well as annual fluxes summed for the calendar year (Supplemental Table 3). We also simulated annual estimates from manual sampling for 2016–2018 by randomly choosing one observation from the automatic gap-filled dataset each week between 0900 and 1200 h in hours without rain, linearly interpolating between daily observations, and summing for the calendar year (Supplemental Figure 1).
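
To make the form of Eq. (2) concrete, the following Python sketch fits the four coefficients and predicts a missing flux. It is a minimal illustration, not the code used in this study: it assumes an ordinary least squares fit on the log-transformed fluxes, and the function names are hypothetical.

```python
import math

# Illustrative sketch only; an ordinary least squares fit is assumed.


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_gapfill(temps, thetas, fluxes):
    """Fit ln(FS) = b0 + b1*T + b2*theta + b3*theta^2 via the normal equations."""
    rows = [[1.0, t, th, th * th] for t, th in zip(temps, thetas)]
    y = [math.log(f) for f in fluxes]  # requires positive fluxes
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict_flux(beta, temp, theta):
    """Back-transform the fitted log-linear model to a flux in umol m-2 s-1."""
    return math.exp(beta[0] + beta[1] * temp + beta[2] * theta + beta[3] * theta ** 2)
```

A negative θ² coefficient in this form gives a flux that peaks at intermediate soil moisture, and the log transform restricts the fit to positive fluxes.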

Table 1 Growing Season FS Mean ± Standard Error (g C m−2 180 day−1) by Landscape Position and Sampling Method

We focused our statistical analyses on growing season estimates to align automatic measurements with manual measurements. Further, we ensured that our analyses were robust to gap-filling methods by calculating growing season FS using three additional methods (Supplemental Figure 1) from the R package “FluxGapsR” (version 0.1.0) (Zhao and others 2020). One method, singular spectrum analysis, is independent of soil moisture and temperature (Zhao and others 2020), which ensures FS estimates are not an artifact of the Flux-PIHM model.

Data Analysis

We first assessed the impact of topographic positions, climatic variability, and sampling methods on growing season FS through total least squares (TLS) regression and repeated measures analysis of variance (ANOVA). We regressed manual and automatic growing season FS from all gap-filling methods using TLS regression, which accounts for variability within both estimates. Further, we explored the impact of landscape position and sampling method across 2016 and 2017 growing season FS using a three-way repeated measures ANOVA. Significant interaction effects were explored separately for manual methods and automatic methods in post-hoc two-way and one-way repeated measures ANOVAs.
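
For readers unfamiliar with TLS regression, the sketch below shows the closed-form solution for two variables. It is a minimal illustration, not the code used in this study: it assumes equal error variance in both variables (orthogonal regression), and the function name is hypothetical.

```python
import math

# Illustrative sketch only; equal error variance in x and y is assumed.


def tls_fit(x, y):
    """Return (slope, intercept) minimizing perpendicular distances to the line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the TLS slope is undefined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = y_mean - slope * x_mean
    return slope, intercept
```

Unlike ordinary least squares, this fit is symmetric: swapping the two variables returns the reciprocal slope, so neither sampling method has to be treated as the error-free predictor.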

However, TLS regression and repeated measures ANOVA have limitations for exploring controls on daily FS, so we complemented them with a Random Forest (RF) model of daily FS predictions. RF is a supervised machine learning algorithm used for classification and regression (Breiman 2001). A detailed description is available in Hoffman and others (2018). Briefly, RF is built upon the concept of recursive partitioning, a nonparametric method that creates decision trees by recursively splitting observations on predictor values at a series of nodes into clusters with similar responses. This method is well suited to both our time series and geospatial data, because it is relatively free of assumptions regarding the distribution of variables and regarding the relationships between predictor and response variables. However, individual decision trees are sensitive to the training data and prone to overfitting. RF offers more robust predictions by constructing many independent regression trees and generating the mean prediction across all trees. An ensemble of trees is grown using bootstrapped samples of observations split at a user-defined number of randomly chosen predictor variables. From this ensemble, RF algorithms provide several useful outputs, two of which we display: variable importance score, which ranks predictor variables based on the contribution of each variable to overall model accuracy, and partial dependence plots, which explore the relationship between one predictor (or two interacting predictors) and the response averaged across all observations (Strobl and others 2007).

We built the RF model using the R package “randomForest” (version 4.7-1.1) (Breiman and Cutler 2018). We trained the RF model to predict observed daily FS from days without any gap-filled hours (n = 3966) with 16 predictor variables: mean daily soil water content (m3 m−3); mean daily soil temperature (°C); cumulative 3-week precipitation (mm); mean daily air temperature (°C); planform curvature; profile curvature; standard curvature; elevation (m); total soil depth (m); topographic wetness index; 90th percentile daily green chromatic coordinate; percent soil carbon in the O Horizon, 0–5 cm, and 5–15 cm; and percent clay at 0–5 cm and 5–15 cm. Hourly values were aggregated to mean daily values for soil moisture and soil temperature, because green chromatic coordinate data were not reliable at all hourly timesteps (for example, photographs cannot be collected at night). Precipitation and air temperature were aggregated to the timestep with the largest Spearman rank correlation coefficient (Benjamini-Hochberg-adjusted p-value < 0.001) (Spearman 1904; Benjamini and Hochberg 1995). We optimized the number of trees (ntree) and the number of predictor variables considered at each node for splitting (mtry), such that ntree = 1000 and mtry = 5. After optimization, the RF model was retrained on two subsets of the data: swales and valley floors, which we call convergent (n = 1851), and planar midslopes and ridgetops, which we call non-convergent (n = 2115). These datasets were randomly split into 70% for model training and 30% for model validation. We repeated this split 10 times to estimate the uncertainty of variable importance scores from subsampling the training data (as in Saha and others 2021). We assessed model performance through percent variance explained and ordinary least squares linear regression between observed and predicted daily FS for the validation dataset. All statistical analyses were performed in R (version 4.1) software (R Core Team 2018), and code for the Random Forest model is available via GitHub (https://github.com/MWKopp/Ecosystems2022).
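
The idea behind a permutation-based variable importance score (percent increase in mean square error) can be sketched in a few lines of Python. This is a conceptual illustration only and does not reproduce the “randomForest” calculation, which permutes out-of-bag observations tree by tree and scales the result. The function names are hypothetical, and `model` stands for any function that maps a row of predictors to a predicted flux.

```python
import random

# Conceptual sketch only; not the out-of-bag calculation used by randomForest.


def mean_square_error(model, rows, targets):
    """Mean squared difference between model predictions and observed targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, column, seed=0):
    """Percent increase in mean square error after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = mean_square_error(model, rows, targets)  # must be non-zero
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return 100.0 * (mean_square_error(model, permuted, targets) - baseline) / baseline
```

Shuffling a predictor that the model does not rely on leaves the error unchanged, whereas shuffling an influential predictor such as soil temperature would be expected to raise it substantially.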

Results

Growing Season Soil CO2 Efflux from Continuous Measurements

Our first hypothesis was that adding terrain variables to standard soil temperature and moisture predictors would explain significantly more variance in estimates of growing season and daily FS across climate variability. We first tested this hypothesis through repeated measures ANOVA of growing season FS from continuous measurements across landscape positions (Supplemental Table 4). Growing season FS from automatic chambers ranged from 610 ± 63 to 1350 ± 139 g C m−2 180 day−1 across all topography and years (Table 1). However, measurements within a landscape position varied widely—and, for ridgetop and valley floors, significantly—across years. Two-way repeated measures ANOVA found support for a significant effect of year on growing season FS (F value = 14.539, p value = 0.009) and a significant interaction effect between year and landscape position (F value = 8.643, p value = 0.012). These significant effects seem to be driven by two trends: (1) FS from ridgetop and valley floors show a bidirectional response to interannual climate variability, and (2) FS from non-convergent flow paths varied more across years. Specifically, growing season FS from ridgetops increased by 463 g C m−2 180 day−1 between average estimates from the driest year (2016) to wettest year (2018), while valley floors decreased by an average of 208 g C m−2 180 day−1 (Table 1). Though this response is less clear for midslopes, planar midslopes (non-convergent) also showed the highest growing season FS in the wettest year, with an increase of 424 g C m−2 180 day−1 relative to the driest year (Table 1). Convergent flow paths varied widely within years (evidenced by relatively high standard errors in Table 1), which may have masked responses across years for swale midslopes. In short, data from automatic chambers support capturing topography as a significant interactive predictor of interannual growing season FS in the Shale Hills catchment.

Random Forest Modeling of Daily Soil CO2 Efflux

Our next test of the first hypothesis was to explore the predictive power of topographic, soil, and climate variables in modeling daily FS from automatic chambers using a Random Forest (RF) approach. We trained a RF model to predict daily FS using 16 variables from days without any missing (that is, gap-filled) hours of automatic data. Using all data, the overall final RF model explained 77.8% (± 0.02) of the variation in the data using 13 variables. Topographic wetness index (TWI), standard curvature, and green chromatic coordinate (GCC) were removed from the final model, because other variables accounted for these mechanisms. For example, standard curvature is a combination of planform and profile curvature, and GCC was highly correlated with air temperature. To compare predictors of FS and their interactions across topography, we retrained this model on two subsets of the overall data: convergent (swales and valleys) and non-convergent areas (planar midslopes and ridgetops). These models explained 76.92% (± 0.03) of the variation in data from convergent areas and 79.87% (± 0.02) of the variation from non-convergent areas.

We compared predictions from the final RF models with observations in our validation dataset to assess model performance. Pearson correlations between predicted and observed daily FS showed strong positive correlation from convergent (r = 0.82, p value < 0.001) and non-convergent areas (r = 0.89, p value < 0.001). Ordinary least squares linear regression between observations and predictions yielded an average slope of 1.00 (± 0.00) and 1.02 (± 0.02) for convergent and non-convergent areas, respectively. As such, we consider model performance to be robust.

To understand the variables driving RF model predictions, we calculated variable importance scores and partial dependence plots. Variable importance scores are a metric that ranks predictors based on the relative contribution of each variable to overall model accuracy. For convergent flow areas, the most important variables influencing FS were 5-cm soil temperature and mean daily air temperature, which showed a median percent increase in mean square errors across models of 71.07 ± 0.10 and 54.26 ± 0.10, respectively (Figure 3). For non-convergent areas, the most important variable was also soil temperature (68.89 ± 0.15); however, this was closely followed by 3-week antecedent precipitation (64.29 ± 0.19) and 5-cm volumetric soil water content (64.15 ± 0.22). Whereas variable importance scores explore the relationship between all predictors, partial dependence plots explore the relationship between one predictor (here, soil moisture) or the interaction of two predictors (here, soil moisture and temperature) and the response averaged across all observations. For non-convergent areas, soil moisture and daily FS displayed a monotonically increasing relationship with a greater amplitude of change in FS (Figure 4a). For convergent areas, soil moisture and daily FS displayed a parabolic relationship (Figure 4b). These relationships remain even when accounting for the interactive effects of soil temperature (Figure 4c and Figure 4d). Overall, RF models suggest that accounting for convergent flow may change the relative importance of and relationship between dominant predictors and daily FS.

Figure 3

Variable importance scores from Random Forest models for daily soil CO2 efflux (FS) predictions trained on only observed data from (A) non-convergent areas (ridgetops and planar midslopes) and (B) convergent areas (swales and valley floors). Variables are ranked by relative importance for predicting FS, where a greater percent increase in mean square error indicates greater importance in the model. Ten variable importance scores were calculated by a Random Forest model built on ten separate random subsamples of training data. These ten variable importance scores are represented as a box plot corresponding to each variable to estimate uncertainty. Random Forest models were trained on observation only (that is, not gap-filled) data from automated flux chambers.

Figure 4

Partial dependence plots of dominant predictors on daily FS from Random Forest models. Partial dependence plots show the average relationship between modeled 5-cm soil moisture and daily FS across (a) non-convergent areas (ridgetops and planar midslopes) and (b) convergent areas (swales and valley floors). Multipredictor partial dependence plots show the average interactive effect of modeled 5-cm soil moisture and temperature on daily FS across (c) non-convergent areas and (d) convergent areas. White areas in multipredictor plots are regions outside of the observed range used to train the Random Forest model. Random Forest models were trained on observation only (that is, not gap-filled) data.

Comparing Growing Season Estimates Across Automatic and Manual Methods

Our second hypothesis was that automated chamber methods will provide the same FS estimates as manual sampling when aggregated to the growing season temporal scale and catchment spatial scale. However, three-way repeated measures ANOVA (Supplemental Table 4) found a significant effect of method (F value = 7.280, p value = 0.021) on growing season FS estimates, as well as a significant interaction effect between year (2016–2017) and method (F value = 7.866, p value = 0.019).

To test the magnitude of the method effect, and its interaction with year, we compared growing season FS estimates from both methods using total least squares linear regression. We found growing season FS estimates from automatic methods to be 1.25 (± 0.08) times greater than manual estimates across all landscape positions and both years (Figure 5). This effect tended to be greater in a dry year (1.45 ± 0.06) than an average year (1.11 ± 0.06) and greater in convergent (1.27 ± 0.11) than non-convergent (1.21 ± 0.10) areas (Supplemental Table 5).
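For readers unfamiliar with total least squares, the sketch below shows the closed-form fit for a single predictor, which minimizes perpendicular rather than vertical distances to the line. It assumes equal error variance in both methods and includes an intercept; whether the regressions reported here were set up the same way is not stated, and the function name and example values are hypothetical.

```python
from math import sqrt


def tls_fit(x, y):
    """Total least squares (orthogonal) regression of y on x.

    Returns (slope, intercept). Assumes equal error variance in x and y.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    slope = (syy - sxx + sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return slope, mean_y - slope * mean_x


# Hypothetical growing season totals (g C m-2) for illustration only.
manual = [600.0, 700.0, 800.0, 900.0]
automatic = [1.25 * value for value in manual]

slope, intercept = tls_fit(manual, automatic)
```

Unlike ordinary least squares, this fit treats both methods as subject to measurement error, which suits a comparison in which neither method is the error-free reference.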

Figure 5

Regression between estimated growing season FS (g C m−2 180 days−1) from manual and automatic chamber methods. The red dashed line indicates a theoretical 1:1 line. The black line indicates the total least squares linear regression between methods (slope = 1.25) with a 95% confidence interval shaded in gray. Regression coefficients are similar regardless of gap-filling methods for automatic methods (see Supplemental Table 5). Vertical standard error lines are greater (that is, more variability along the y axis), because automatic chambers have fewer replicates (n = 2) relative to manual sampling (n = 7 to 21, depending on landscape position).

We ensured that this difference was not an artifact of gap-filling automatic estimates by repeating regressions with three other gap-filling methods, both across and within years; regardless of treatment of automatic data, estimates from automatic chambers were consistently greater than from manual methods (regression slope with a 95% confidence interval > 1 in Supplemental Table 5). Although manual data were only collected in 2016 and 2017, simulating annual FS estimates for manual methods by randomly drawing from the automated chamber dataset across 2016–2018 also found consistently lower estimates than automatic methods (Supplemental Figure 1).
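The idea of simulating manual sampling from an automated record can be sketched as follows. This is a simplified illustration, not the procedure used in this study: it samples at one fixed daytime hour once per week instead of drawing at random, and the hourly series is synthetic. The nighttime peak hour, mean flux, and amplitude are all assumptions chosen only to show the mechanism.

```python
import math


def synthetic_hourly_flux(n_days, mean_flux, amplitude, peak_hour):
    """Synthetic hourly flux with a sinusoidal diurnal cycle (illustrative only)."""
    return [
        mean_flux + amplitude * math.cos(2.0 * math.pi * ((h % 24) - peak_hour) / 24.0)
        for h in range(n_days * 24)
    ]


def full_record_total(hourly):
    """Seasonal total using every hourly observation."""
    return sum(hourly)


def simulated_manual_total(hourly, sample_hour=10, every_n_days=7):
    """Seasonal total scaled up from one daytime sample every `every_n_days` days."""
    n_days = len(hourly) // 24
    samples = [hourly[d * 24 + sample_hour] for d in range(0, n_days, every_n_days)]
    return (sum(samples) / len(samples)) * len(hourly)


# Assumed values: 180-day season, flux peaking at 22:00 (hypothetical).
hourly = synthetic_hourly_flux(n_days=180, mean_flux=0.2, amplitude=0.05, peak_hour=22)

automatic_total = full_record_total(hourly)
manual_total = simulated_manual_total(hourly)
```

When the assumed diurnal peak falls at night, a daytime-only sample misses the higher fluxes and the scaled total falls below the full-record total. Moving `peak_hour` toward the sampling hour reverses the bias, so the direction of the effect depends on the diurnal pattern at a given site.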

Discussion

We present one of the first multi-year continuous soil CO2 efflux (FS) datasets to capture the interactions of both complex terrain and significant precipitation variability in a temperate deciduous forest. Leveraging this dataset, we found a bidirectional response of FS across the catchment to increasing interannual water availability. We discuss the mechanisms driving this response as well as their implications for predicting and monitoring FS in complex terrain.

Using Automated Chambers to Estimate FS Responses Across Topography and Climate

The significance of complex terrain for C storage and fluxes has generated a pressing need to understand and predict the response of FS to climate variability across topographic gradients (for example, Rotach and others 2014; Senar and others 2018). Yet maximizing spatial coverage in monitoring FS has relied on manual sampling methods (for example, Riveros-Iregui and others 2012; Savage and Davidson 2001), which limit sampling frequencies to coarse timescales (for example, Hanson and others 1993) that may miss significant short-term (sub-daily) responses to climatic disturbances. Automatic chambers offer an opportunity to monitor FS at a fine temporal resolution; however, their cost limits spatial replication. With a sampling design that explicitly accounts for terrain, we find that hillslope position is a significant control on interannual FS (Supplemental Table 4) beyond what can be captured by instantaneous soil moisture and temperature.

A key finding is that terrain position determines the direction of response to traditionally measured soil and climatic predictors of FS. Specifically, growing season FS from ridgetops at Shale Hills increased with increasing interannual water availability, while valley floors showed decreasing annual FS in increasingly wet years (Table 1). These results not only corroborate previous research that interannual precipitation variability leads to a bidirectional response of FS in complex terrain (Riveros-Iregui and others 2012; Berryman and others 2015), but expand this exploration from drought/non-drought comparisons in semiarid and (sub)alpine forests to record annual precipitation in a humid temperate forest. While we found a considerable range in both hourly FS (0.00 g C m−2 h−1 to 1.78 g C m−2 h−1) and growing season FS (610 ± 63 to 1350 ± 139 g C m−2 180 day−1), this range is comparable to other FS observations in temperate forests (Giasson and others 2013). Moreover, these estimates are comparable to observations from forests in complex terrain (Berryman and others 2015; Riveros-Iregui and McGlynn 2009) and modeled estimates for our study site (Shi and others 2018). In short, continuously monitoring a few key positions in complex terrain identified a bidirectional response of FS to interannual climate variability within comparable ranges to manual monitoring across an order of magnitude more spatial replicates (for example, Riveros-Iregui and others 2009; Primka 2021).

Untangling Mechanisms of the Bidirectional Response: Moisture- Versus Diffusion-Limited FS

Our work advances FS research by showing that monitoring soil moisture and temperature variation is not enough to estimate and predict FS—landscape context is critical for knowing how soil moisture affects FS. We hypothesize that the bidirectional response of FS to interannual water availability hinges on the spatial distribution of mechanisms dominantly limiting FS: diffusion limitations in areas receiving convergent flows, and water limitations to biological activity in non-convergent areas.

In convergent areas, such as swales and valley floors, FS responds based on a parabolic relationship with soil water content (Riveros-Iregui and others 2012). Generally, FS peaks at intermediate soil moisture conditions (Doran and others 1991), which we confirm for our site in Random Forest models (Figure 4b, d). In topographic positions with wetter soils, such as the deep, clay-rich valley floors at Shale Hills (Lin and others 2006), persistent high soil moisture reduces diffusivity, and oxygen availability limits aerobic respiration (Hodges and others 2019). However, these wet sites could dry under reduced hydrologic connectivity, such as during a summer drought, which could promote a large release of CO2 from enhanced microbial and root respiration (Davidson and others 1998; Senar and others 2018). This is one likely explanation for the flush of FS from the valley floor in 2016. Under the dry conditions of 2016, valley floor soils may have dried enough to increase oxygen diffusion into the soil surface, increasing aerobic respiration. A concurrent study at Shale Hills measured soil pO2 at a nearby valley floor position and found a marked increase in %O2 at all soil depths in the drought summer of 2016 relative to the summer of 2017 (Hodges and others 2019). This increase in soil pO2 may have allowed for the breakdown of available C substrates. Alternatively, increased diffusivity in 2016 may have allowed accumulated soil pCO2 to move from soil storage into the atmosphere (Hasenmueller and others 2015). In contrast, increased precipitation in 2018 may have led to soils that were too wet for maximum FS. Under saturated conditions, there is limited diffusion of pO2 for aerobic respiration; and, even when microbial communities switch to anaerobic respiration (evidenced by redox features in Lin and others 2006, and direct measurements of pCO2/pO2 in Hodges and others 2019), limited diffusion leads to a build-up of pCO2 rather than a flush of FS (Hasenmueller and others 2015). This may explain the decrease in FS from convergent areas at high volumetric soil water content (Figure 4b). In short, shifts in biological activity and diffusivity may explain interannual FS variability in valley floor positions at Shale Hills, leading to large fluxes in drought years and lower fluxes in wet years.

By contrast, interannual variability in FS from non-convergent areas, such as ridgetops and planar midslopes, may reflect water limitation to biological activity rather than limitation by low O2 or slow CO2 diffusion. Whereas convergent areas display a parabolic relationship with soil moisture, daily FS from non-convergent areas monotonically increased with increasing soil moisture (Figure 4a). Ridgetops at Shale Hills have thinner, sandier soils that drain quickly (Lin and others 2006). In a dry year, water in soil pores may be disconnected, limiting dissolved organic carbon (DOC) supply for microbial activity and lowering heterotrophic respiration (Papendick and Campbell 1981). Similarly, drought stress on trees could limit photosynthesis or C allocation to new or maintained root growth, lowering autotrophic respiration (Bryla and others 1997; Wang and others 2014). Supporting this hypothesis, minirhizotron data from the same spatially distributed sites in this study showed decreased root tip production in drier years relative to wetter years in 2016–2018 (Primka IV and others 2022). In a wet year, water in soil pores is connected, which allows water and DOC to reach microbial communities, increasing heterotrophic respiration. Additionally, tree roots may also access shallow water near the soil surface, upon which most trees at Shale Hills depend for water uptake (Gaines and others 2015), such that growth and maintenance root respiration are not water limited. Together, these sources contribute to an increase in FS in wet years such that ridgetop FS equals or exceeds FS from valley floors (Table 1; Supplemental Table 3), despite valley floor soils having greater C storage (Andrews and others 2011) and soil pCO2 (Hasenmueller and others 2015; Hodges and others 2019). Overall, contrasting limiting factors on FS across convergent and non-convergent areas lead to opposing responses of FS across interannual climatic variability.

Implications for Predictions: Random Forest Models Unveil Topography-Mediated Interactions with Soil Moisture

Random Forest (RF) models are among the machine learning tools rapidly improving predictions of soil greenhouse gas emissions (for example, Saha and others 2021). For example, Lu and others (2021) found RF models outperformed ten common process-based terrestrial ecosystem models for global FS predictions. As such, coupling automatic chamber data with RF models offers one of the best methods to model complex interactions among drivers of FS at fine temporal scales (Lu and others 2021). Our RF models found soil temperature, soil moisture, and climate variables were dominant predictors of FS, but their relative importance (Figure 3) and relationship with daily FS (Figure 4) differed between areas receiving convergent flow or not.

We expected that soil temperature would have great predictive power, because soil temperature is the most common predictor used to model FS (for example, Arrhenius 1889; van’t Hoff 1898; Lloyd and Taylor 1994). While soil temperature did have high importance values in RF models across the landscape, soil moisture and 3-week antecedent precipitation were nearly as important for predicting daily FS from non-convergent areas (Figure 3). Other topographic (elevation and curvature) and soil (texture, C, and total depth) characteristics had low importance values in non-convergent areas (Figure 3a). By contrast, daily FS from convergent areas had higher importance values for soil and air temperature, with moderate predictive power from moisture variables (soil and 3-week precipitation) and some predictive power from curvature, surface and O horizon soil C, and total soil depth (Figure 3b). If further studies find that these results hold true in other ecosystems, then simple and remotely sensed terrain metrics may improve which predictors we choose to scale FS from small but topographically complex catchments to larger scale models.

Relative to soil temperature, the relationship between FS and soil moisture varies widely across studies, which has hampered the development of empirical equations translating soil moisture parameters into reliable FS predictions (Lou and others 2006). Generally, optimum FS is predicted to occur at intermediate soil moisture, whether in statistical correlations (for example, Doran and others 1991) or more complex mechanistic models (for example, Davidson and others 2011). Beneath some soil moisture threshold, FS is most limited by slow diffusion of soluble C substrates into extracellular enzymes and by microbes involved in decomposition (Papendick and Campbell 1981), which can lead to dormancy in microbes and diminish heterotrophic respiration (Fierer and Schimel 2002). Similarly, drought conditions can decrease photosynthesis, which decreases translocation of photosynthates to the rhizosphere for root respiration (Ruehr and others 2009). Under these dry soil conditions, FS has a positive—sometimes even linear (Jassal and others 2008)—relationship with soil water availability yet has little response to soil temperature (Suseela and others 2012).

However, we suggest that some non-convergent areas may never reach or remain at volumetric soil water contents above this intermediate optimum long enough to decrease FS, leading to a relationship which appears monotonically increasing rather than parabolic (Figure 4a), even when accounting for interactive effects with soil temperature (Figure 4c). Despite training RF models with data from 2018, the wettest year on record in the state of our study site, FS from non-convergent areas does not display the decrease expected by limited diffusion of soluble-C and O2. Overall, the flush of FS from water-limited non-convergent soils in a wet year may suggest a shift in the topographic positions dominantly contributing to catchment-level FS at Shale Hills as the climate transitions toward wetter conditions (Ning and others 2012).

Implications for Methods: Targeting Control Points of Between-Method Variability

A current hypothesis in FS research is that manual and automatic chamber methods, although imbued with different biases (Yao and others 2009), balance a spatial and temporal tradeoff that produces similar estimates, particularly when scaling up across time (Savage and Davidson 2003). While both methods capture interannual variability in growing season FS across the Shale Hills watershed, the choice of method significantly affected the magnitude of estimates (Supplemental Table 4). Estimates from automatic chambers averaged 1.25 ± 0.08 times greater than from manual methods (Figure 5). Underpinning this variability is an interactive effect between methods, climate, and landscape positions. We find the difference between methods was most pronounced in a dry year (2016) and in areas receiving convergent flow (swales and valley floors) (Supplemental Table 5). Specifically, we find estimates from automatic chambers in a dry year to be 1.45 ± 0.06 times greater than manual estimates across the catchment, which is significantly more than in an average year (1.11 ± 0.06). These findings caution that the assumption of consistent FS estimates across sampling methods may hold true under average climatic conditions in well-drained landscapes but may be violated in dry years and in areas receiving convergent flow.

There are several explanations for differences between methods at this site. First, automatic chambers may be biased by aspect. Shale Hills is a V-shaped catchment with a north- and south-facing slope. Although manual methods captured variability across aspect, automatic chambers were located on the south-facing slope, which may have greater FS from relatively greater SOC storage (Andrews and others 2011) and more solar radiation leading to warmer soils. As such, automatic chambers may overrepresent, and thus overestimate, the “hot spots” in the catchment. However, a more likely explanation is that manual methods may be biased by an underestimation of diurnal variation. A growing body of research finds that accounting for nighttime fluxes leads to higher FS estimates from automatic chambers, whether from lags in response to physical and biological changes (Makita and others 2018; Phillips and others 2011) or from measurement bias (Brændholt and others 2017). At Shale Hills, there is preliminary evidence of pronounced diurnal variation (Kopp, unpublished data), which automatic chambers more accurately capture (Yao and others 2009). This explanation is further supported by our simulated manual sampling, which found notably lower annual FS estimates from all automatic chambers when excluding nighttime observations (Supplemental Figure 1). These findings support previous suggestions that the fine temporal resolution of automatic chambers combined with the spatial distribution of manual methods complement landscape-scale monitoring of greenhouse gas emissions (Savage and others 2014). In complex terrain, we further refine this suggestion to strategically place automatic chambers at ecosystem control points (Bernhardt and others 2017) disproportionately responsive to climatic variability, such as valley floors (activated in dry years) and ridgetops (activated in wet years).

Though such ecosystem control points may be relatively rare on the landscape, their pronounced contribution to between-method variation has implications for scaling FS across space. To consider these implications, we performed a simple spatial scaling exercise to estimate average catchment-scale growing season FS. We weighted the average growing season FS from convergent or non-convergent flow paths by their relative area within Shale Hills (that is, 22% of total catchment area is convergent, while 78% is non-convergent, as in Smith and others 2017). In 2016 (a dry year), we estimate average catchment-scale growing season FS to be 813 and 578 g C m−2 180 day−1 from automatic and manual methods, respectively. In 2017 (an average year), we estimate average catchment-scale FS to be 824 and 760 g C m−2 180 day−1 from automatic and manual methods, respectively. These estimates are consistent with our within-year regressions between methods and emphasize that the choice of method could lead to similar catchment-scale estimates in an average growing season or to about 28.9% error when failing to capture “hot moments,” such as from convergent areas in a dry year or at night. This error may be substantial even when hot moments are confined to patches that represent a small fraction of the total catchment area. In short, FS monitoring designs in complex terrain may need to account for significant interactive effects between methods and landscape positions, particularly as interannual climatic conditions become increasingly variable.
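The scaling arithmetic in this paragraph can be written out as a short sketch. The area fractions and catchment-scale totals come from the text above; the per-position FS values passed to `area_weighted_mean` are hypothetical placeholders, because the underlying convergent and non-convergent averages are not restated here. The 28.9% figure is reproduced when the difference is expressed relative to the automatic estimate, which is the assumption made in `relative_difference`.

```python
AREA_FRACTIONS = {"convergent": 0.22, "non_convergent": 0.78}  # from the text


def area_weighted_mean(flux_by_position, area_fractions):
    """Catchment-scale FS as the area-weighted mean of position-level FS."""
    if abs(sum(area_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    return sum(flux_by_position[k] * area_fractions[k] for k in area_fractions)


def relative_difference(automatic, manual):
    """Difference between methods as a fraction of the automatic estimate."""
    return (automatic - manual) / automatic


# Hypothetical position-level values (g C m-2 per 180 days), for illustration only.
example_flux = {"convergent": 1000.0, "non_convergent": 800.0}
example_catchment = area_weighted_mean(example_flux, AREA_FRACTIONS)

# Catchment-scale estimates reported in the text.
dry_year = relative_difference(automatic=813.0, manual=578.0)
average_year = relative_difference(automatic=824.0, manual=760.0)

print(f"dry year (2016): {dry_year:.1%}")
print(f"average year (2017): {average_year:.1%}")
```

By the same calculation, the between-method difference in the average year is under 8%, compared with roughly 29% in the dry year.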

Conclusions

We use one of the first multi-year FS sampling schemes that captures both fine spatial and temporal heterogeneity to demonstrate that hillslope position and shape can explain variance of daily, seasonal, and interannual FS estimates from a temperate deciduous forest in complex terrain. By capturing fine spatial heterogeneity, we find that landscape context is critical for understanding how FS (bidirectionally) responds to soil moisture. We hypothesize this response hinges on the spatial distribution of limiting factors on FS—slow diffusion limits FS from areas receiving convergent flows, and water availability for biota limits FS from non-convergent areas. Although soil saturation limitations on FS are well known in wetland and laboratory soil incubations, our work contributes to an understanding of when and where this process occurs in upland soils and the factors that govern its spatial distributions. Even in upland soils, our results show that accounting for convergent flow paths can change the relative importance and relationship of predictors of daily FS. Further, by capturing fine temporal heterogeneity, we find that the choice of sampling frequency has a significant effect on growing season FS estimates.

Moreover, our findings could have implications for scaling FS to global predictions in Earth System Models (ESM) by demonstrating how sub-grid topographic heterogeneity can lead to significant spatiotemporal variability of FS within less than 1 km2. In a review of 16 ESMs, Todd-Brown and others (2013) found that most ESMs could not reproduce grid-scale spatial heterogeneity of soil C or its decomposition. The authors posited that this poor performance was due, in part, to inadequate representations of topographic and soil moisture interactions, with some models assuming all soil C decomposition has a monotonically increasing relationship with soil moisture regardless of landscape position (Todd-Brown and others 2013). Our research suggests that sub-grid FS may display a parabolic relationship with soil moisture depending on lateral redistribution of water, yet this redistribution is rarely included in ESM land models (Clark and others 2015). Clark and others (2015) recommend using ESMs that capture sub-grid soil moisture heterogeneity, such as the Catchment model (Koster and others 2000) or the tiled hydrology implementation of the LM3 model (Subin and others 2014), to resolve uncertainties in land–atmosphere fluxes. We further suggest that, if our results hold true elsewhere, ESMs could incorporate simple and remotely sensed terrain metrics to partition the parabolic relationship between sub-grid soil moisture and FS into the full range (that is, decreasing FS at high soil moisture for convergent areas) or a range drier than the inflection point (that is, monotonically increasing FS for non-convergent areas). Future work should test how this approach might improve current uncertainty in spatial patterns of soil C and its decomposition across global ecosystems.