Introduction

Phenology is defined as “the art of observing life cycle phases or activities of plants and animals in their temporal occurrence throughout the year” (Gratani et al. 1986). Many studies have shown that plant phenology is sensitive to temperature, and therefore the timing of phenophases has been used as an indicator of climate change in many parts of the world including North America, Europe, and Asia (Schwartz 1998; Sparks et al. 2000; Defila and Clot 2001; Matsumoto et al. 2003; Menzel 2003; Chmielewski et al. 2005; Donnelly et al. 2006; Ge et al. 2015). In situ observations of tree phenophases such as bud burst and leaf out in spring, fruiting in summer, and leaf coloration and fall in autumn are commonly used for this purpose (Beaubien and Johnson 1994; Linkosalo 1999; Ahas and Aasa 2006; Menzel et al. 2006). A number of long-term phenological networks in Europe, such as the International Phenological Gardens network, have been used to demonstrate the impact of rising temperature on forest trees (Chmielewski and Rötzer 2001; Menzel et al. 2006). However, these datasets are often focused on a small number of species with limited spatial coverage and tend to be temporally and spatially discontinuous, making it difficult to determine large-scale and long-term trends in the timing of leaf phenophases (Schwartz 1994; Peng et al. 2017). In addition, observations are made on a few discrete individuals which may not be representative of phenological response to climate at community and ecosystem levels (Diaz and Cabido 1997; Donnelly et al. 2017).

In contrast, satellites have the ability to monitor vegetation on a continuous basis across a large geographical area. Therefore, satellite data, as a substitute for in situ observations, is becoming one of the most widely used approaches in landscape-scale forest phenological research (Reed et al. 1994; Zhang et al. 2001; Zhang et al. 2004). Numerous studies have utilized satellite imagery-based normalized difference vegetation index (NDVI) and/or enhanced vegetation index (EVI) to estimate the start and end of greenness or the timing of deciduous tree phenophases, such as bud burst, leaf expansion, and leaf fall (Moulin et al. 1997; White et al. 1997; Liang et al. 2011; White et al. 2014). The spatial and temporal coverage afforded by satellite data facilitates monitoring of vegetation response to climate change at the continental or global scale (Reed 2006; Xiao et al. 2006; White et al. 2009; Liu et al. 2016). However, since a satellite sensor integrates the vegetation signal at the pixel level, some important details on land cover and community composition within a pixel may be overlooked (Liang and Schwartz 2009). This may result in discrepancies between satellite-derived phenology and direct observation data, although such discrepancies in forest ecosystems are argued to be smaller than in grassland, cropland, savanna, and heterogeneous areas (Donnelly et al. 2018; Zhang et al. 2018). Therefore, in order to improve the accuracy and reliability of satellite data in estimating forest phenophases, further research is required.

Carbon flux measurements are another approach used in large-scale phenological studies (Richardson et al. 2010; Peng et al. 2017). Plants influence atmospheric carbon concentration by taking in carbon dioxide through photosynthesis and releasing it through respiration, which generally results in forests acting as a major carbon sink (Goulden et al. 1996; Schimel et al. 2015). Therefore, the timing of when photosynthesis exceeds respiration in spring is closely related to leaf out and expansion, which in turn impacts the seasonal and annual carbon budget in forest ecosystems (Goulden et al. 1996; Angert et al. 2005; Richardson et al. 2010). Multiple studies have extracted deciduous forest phenological transition dates using variations in carbon flux (Desai et al. 2005; Garrity et al. 2011; Wu et al. 2013; Liu et al. 2017). For example, the transition from dormant buds to budburst corresponds to the start of CO2 uptake from the atmosphere (Richardson et al. 2009). Similar to satellite data, carbon flux measurements differ in scale from in situ observations, as the footprint of a carbon flux tower is 1.1~5 km2 (Chen et al. 2011). Therefore, it is not possible to isolate the contribution of individual species and communities to overall carbon flux. Data quality can also be affected by topography, vegetation type, and non-vegetation activities (Baldocchi 2003; Solaymani 2017), highlighting the need to improve carbon flux accuracy in estimating deciduous forest phenology.

In situ observations, satellite data, and carbon flux measurements are all widely used in forest phenological research, with the spring season receiving more attention than autumn (Beaubien and Johnson 1994; Schwartz 1998; Schaber and Badeck 2003; Menzel et al. 2006; Guo et al. 2015; Donnelly et al. 2017). Although an advance in spring leaf development and flowering has been broadly observed, the trend toward a delay in autumn leaf coloration and leaf fall has been less consistent (Defila and Clot 2001; Chmielewski et al. 2005; Menzel et al. 2006). In addition, a distinct NDVI threshold of 0.6 to 0.7 corresponds to spring leaf expansion, but no corresponding threshold has yet been found for autumn defoliation (Nagai et al. 2010). Furthermore, both NDVI and EVI2 (two-band EVI, similar to the three-band EVI but without the blue band) show higher correlation with the PhenoCam-derived green chromatic coordinate and vegetation contrast index for green-up than for leaf coloration (Zhang et al. 2018). Nevertheless, even though autumn phenology is less understood, its influence on seasonal and annual carbon budgets cannot be ignored. For example, delayed autumn senescence and an extension to the growing season have explained 50% of the annual carbon flux variation in a deciduous forest (Dragoni et al. 2011). Although photosynthesis and respiration both increase with warmer temperatures in autumn, CO2 release through respiration may offset 90% of the increased spring carbon uptake by photosynthesis (Piao et al. 2008). Therefore, more studies of forest autumn phenology are required to improve our understanding of its relationship with carbon exchange.

In order to address some of the shortcomings in scale between in situ observations, satellite-derived NDVI and EVI, and carbon flux measurements of autumn phenology, we used datasets from all three approaches from a temperate mixed forest in northern Wisconsin, USA. The aims of this study were to explore (i) the accuracy and reliability of satellite data/carbon flux measurements in estimating direct observations of autumn forest phenology and (ii) the environmental and physiological factors that may influence these differences. The results will help improve the understanding of the effectiveness of different approaches to estimate autumn forest phenology.

Materials and methods

Study area

The study area was located in the Park Falls Ranger District of the Chequamegon-Nicolet National Forest of northern Wisconsin, where the vegetation is mixed temperate forest comprising deciduous (70%) and coniferous (30%) species (Haugen et al. 1998). The area is located within the footprint of a 447-m WLEF AmeriFlux tower (i.e., W Lee E. Franks TV tower at 45.94°N, 90.27°W), which has been operated by the Chequamegon Ecosystem-Atmosphere Study group (ChEAS) to record carbon flux since 1995 (Desai et al. 2005). In situ phenological observations were available for two 625 m × 625 m study sites primarily composed of 80-year-old mature hardwood forest, each with slightly different species composition. The north study site was dominated by sugar maple (Acer saccharum), red maple (Acer rubrum), and basswood (Tilia americana), whereas the south study site was dominated by quaking aspen (Populus tremuloides), speckled alder (Alnus incana), red maple (Acer rubrum), and white birch (Betula papyrifera) (Fig. 1 and Table A1, online resource).

Fig. 1

Map of north and south study sites generated from a QuickBird (2.4 m) false color composite image (September 27, 2012). Black dots represent location of the plots where phenological observations were recorded on trees. The blue lines outline the boundaries of the study sites, with the WLEF Flux Tower in the middle of the image

Data collection and processing

Field data

Field data (Table A2, online resource) were collected during 2010–2013. Specifically, autumn phenological data (leaf coloration and leaf fall) were recorded in 2010 and 2012 at the north site and in 2010, 2012, and 2013 at the south site. For each tree, leaf coloration and leaf fall phenophases were recorded using the following protocol: 800 = leaf coloration < 10%; 810 = leaf coloration 10~50%; 850 = leaf coloration 50~90%; 890 = leaf coloration > 90%; 900 = leaf fall < 10%; 910 = leaf fall 10~50%; 950 = leaf fall 50~90%; 990 = leaf fall > 90% (Schwartz and Liang 2013; Yu et al. 2016). Afterward, field observations were scaled up to the site level in a two-step process (Liang and Schwartz 2009), so that they became comparable in scale with satellite data and carbon flux measurements (details of field data collection and upscaling can be found in the supplemental material).

In 2010 and 2012, field observations began after leaf coloration (LC) had started. In particular, when observations began, the population phenology of all species except for tamarack (Larix laricina, 2010) and basswood (2012) were greater than 800 (LC < 10%) in the north site, while all species were greater than 800 in the south site. In 2013, observation ended before the full leaf coloration (FLC) and full leaf fall (FLF) of speckled alder were reached. Therefore, it was difficult to identify the exact start of LC and LF (leaf fall) dates in 2010 and 2012 and FLC and FLF dates in 2013. Given this situation, three transition dates were calculated instead of the directly observed phenophases based on a logistic model described in Zhang et al. (2003):

$$ y(t)=\frac{c}{1+{e}^{a+ bt}}+d $$
(1)

where t is time in days, y(t) is the leaf coloration or leaf fall value at time t, a and b are fitting parameters, and c + d is the maximum phenophase value. Subsequently, the change of curvature (i.e., the derivative of the curvature of function (1)) was calculated. For each time series, two minimum and one maximum change of curvature values were determined, which represented three transition dates, i.e., LCt1, LCt2, and LCt3 (first, second, and third transition of leaf coloration) and LFt1, LFt2, and LFt3 (first, second, and third transition of leaf fall).
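For reference, the curvature and its rate of change can be written out explicitly. This is a sketch that assumes the standard definition of the curvature of a plane curve, which we take to be the one intended in Zhang et al. (2003); the symbols K, K′, z, and s are introduced here for illustration only:

$$ K(t)=\frac{y^{\prime\prime }(t)}{{\left(1+{y}^{\prime }{(t)}^2\right)}^{3/2}},\kern1em {K}^{\prime }(t)=\frac{y^{\prime\prime\prime }(t)\left(1+{y}^{\prime }{(t)}^2\right)-3\,{y}^{\prime }(t)\,{y}^{\prime\prime }{(t)}^2}{{\left(1+{y}^{\prime }{(t)}^2\right)}^{5/2}} $$

With z = a + bt and s = 1/(1 + e^z), the derivatives of Eq. (1) follow from ds/dt = −b s(1 − s):

$$ {y}^{\prime }=- cb\,s\left(1-s\right),\kern1em {y}^{\prime\prime }=c{b}^2\,s\left(1-s\right)\left(1-2s\right),\kern1em {y}^{\prime\prime\prime }=-c{b}^3\,s\left(1-s\right)\left(1-6s+6{s}^2\right) $$

Under these assumptions, the transition dates correspond to the local extremes of K′(t). The offset d does not appear in any derivative, so it does not affect their timing.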

Satellite data

The MOD13Q1 006 vegetation index dataset, acquired via the Google Earth Engine (GEE) platform (https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD13Q1), was selected for deriving satellite-based phenology. GEE is a cloud-based platform enabling users to process large datasets with limited lines of code (Gorelick et al. 2017). In contrast to conventional data processing procedures, GEE simplifies data processing by removing the need for image clipping, reprojection, and format conversion. Furthermore, when estimating phenology with satellite data, the vegetation index (VI) value of each pixel was weighted by the area of intersection within the plot (Liang et al. 2011), which is also feasible via GEE-provided functions.
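The area weighting itself is a weighted mean and can be illustrated outside GEE. The following is a minimal sketch in plain Python, not the GEE workflow used in this study; the function name `area_weighted_vi`, the pixel values, and the overlap areas are hypothetical.

```python
def area_weighted_vi(vi_values, overlap_areas):
    """Mean VI of a site, weighting each pixel by its area of overlap with the site."""
    if len(vi_values) != len(overlap_areas):
        raise ValueError("vi_values and overlap_areas must have the same length")
    # Pixels that do not intersect the site (zero area) carry no weight.
    pairs = [(v, w) for v, w in zip(vi_values, overlap_areas) if w > 0]
    total_area = sum(w for _, w in pairs)
    if total_area == 0:
        raise ValueError("no pixel overlaps the site")
    return sum(v * w for v, w in pairs) / total_area


# Hypothetical example: four 250 m pixels, one fully inside the site,
# two half inside, and one outside (areas in square metres).
pixel_vi = [0.80, 0.70, 0.60, 0.50]
overlap_m2 = [62500.0, 31250.0, 31250.0, 0.0]
site_vi = area_weighted_vi(pixel_vi, overlap_m2)
```

In this example the fully covered pixel contributes half of the total weight, and the pixel outside the site is ignored.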

The VI values obtained were normalized using Eq. (2), as described in White et al. (1997):

$$ V\;{I}_{ratio}=\frac{V\;I-V\;{I}_{\mathrm{min}}}{V\;{I}_{\mathrm{max}}-V\;{I}_{\mathrm{min}}} $$
(2)

The normalization process did not have a significant influence on data quality (shown by the mostly insignificant differences in fitting parameters and goodness of fit, Table A3, online resource), while it reduced the complexity of the model by fixing the parameters c and d at 1 and 0, respectively. The normalized time series (NDVInor for normalized NDVI and EVInor for normalized EVI) was also fitted with Eq. (1), and in this case y(t) was the VI value at time t. Similarly, three transition dates were retrieved from the change of curvature, corresponding to the start of autumn (SOA), middle of autumn (MOA), and end of autumn (EOA).
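The normalization and the retrieval of transition dates from fitted parameters can be sketched as follows. This is an illustrative sketch under stated assumptions, not the processing code used in this study: VImin and VImax are taken over the supplied series, curvature follows the standard plane-curve definition, the extremes are located on a 0.1-day grid, and the parameter values (a = −28, b = 0.1) are hypothetical. It is intended for normalized series (c = 1, d = 0), where the slope of the curve is small.

```python
import math


def normalize_vi(values):
    """Eq. (2): rescale a VI series to the 0-1 range using its own min and max."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("series is constant; cannot normalize")
    return [(v - vmin) / (vmax - vmin) for v in values]


def curvature_change(t, a, b, c=1.0):
    """Rate of change of curvature of Eq. (1), from analytic derivatives."""
    s = 1.0 / (1.0 + math.exp(a + b * t))
    y1 = -c * b * s * (1 - s)                            # first derivative
    y2 = c * b ** 2 * s * (1 - s) * (1 - 2 * s)          # second derivative
    y3 = -c * b ** 3 * s * (1 - s) * (1 - 6 * s + 6 * s * s)  # third derivative
    return (y3 * (1 + y1 ** 2) - 3 * y1 * y2 ** 2) / (1 + y1 ** 2) ** 2.5


def transition_dates(a, b, c=1.0, start=200.0, end=365.0, step=0.1):
    """Return (first, middle, last) transition days for a fitted logistic curve."""
    mid = -a / b  # inflection point of the logistic curve
    if not start < mid < end:
        raise ValueError("inflection point lies outside the search window")
    n = int(round((end - start) / step)) + 1
    days = [start + i * step for i in range(n)]
    # The outer transitions are the extremes whose sign is opposite
    # to that of the curvature change at the inflection point.
    sign = 1.0 if curvature_change(mid, a, b, c) >= 0 else -1.0
    left = [(sign * curvature_change(t, a, b, c), t) for t in days if t < mid]
    right = [(sign * curvature_change(t, a, b, c), t) for t in days if t > mid]
    return min(left)[1], mid, min(right)[1]


# Hypothetical normalized curve declining around day 280.
soa, moa, eoa = transition_dates(a=-28.0, b=0.1)
scaled = normalize_vi([0.3, 0.5, 0.9])
```

With these hypothetical parameters the middle transition falls at −a/b (day 280) and the outer transitions lie roughly 23 days on either side, so a smaller |b| (slower progression) widens the interval between the first and last transitions.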

Carbon flux data

Carbon flux data for WLEF/Park Falls was downloaded from the AmeriFlux website (http://ameriflux.lbl.gov/sites/siteinfo/US-PFa#doi) for the study period 1997–2016 excluding 2005 and 2010 as data for these years were unavailable or unreliable due to equipment failure. For the remaining data, only daytime records reflecting photosynthetic productivity were utilized since nighttime gross primary production (GPP) is usually zero and net ecosystem exchange (NEE) tends to reflect nighttime respiration. A double-logistic function was applied to derive the same three transitions: SOA, MOA, and EOA. The function is described in Eq. (3), which originates from the double logistic function in Fisher et al. (2006) and the two-section logistic models in Zhang et al. (2003). Non-normalized carbon flux datasets were analyzed since normalization significantly reduced data quality (P < 0.01 for goodness of fit of normalized and non-normalized datasets).

$$ Flux(t)=a+\frac{d}{1+{e}^{ct+b}}-\frac{g}{1+{e}^{ft+e}} $$
(3)

Where Flux(t) is the carbon flux index value at time t and a, b, c, d, e, f, and g are fitting parameters (b, c, and d are spring parameters while e, f, and g are autumn parameters) all estimated by non-linear regression. In agreement with field data and satellite data, SOA, MOA, and EOA were derived from the minimum and maximum values of change of curvature.
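Eq. (3) can be written as a short function to make the role of each parameter visible. This is a minimal sketch with hypothetical parameter values chosen to resemble a GPP-like seasonal curve; it does not perform the non-linear regression and does not use values fitted in this study.

```python
import math


def double_logistic_flux(t, a, b, c, d, e, f, g):
    """Eq. (3): baseline plus a spring logistic term minus an autumn logistic term."""
    spring = d / (1.0 + math.exp(c * t + b))
    autumn = g / (1.0 + math.exp(f * t + e))
    return a + spring - autumn


# Hypothetical parameters: baseline 0.5, amplitude 8, spring midpoint near
# day 130 and autumn midpoint near day 280 (flux units are arbitrary here).
params = dict(a=0.5, b=13.0, c=-0.1, d=8.0, e=28.0, f=-0.1, g=8.0)
spring_mid = -params["b"] / params["c"]
autumn_mid = -params["e"] / params["f"]
mid_summer = double_logistic_flux(205.0, **params)
```

In this example the curve rises from the baseline a in spring, plateaus near a + d in mid-summer, and returns to a + d − g after autumn; the ratios −b/c and −e/f give the days on which each logistic term reaches half of its amplitude.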

Transition dates comparison

SOA, MOA, and EOA were derived from both satellite and carbon flux datasets. For field data, equivalent transition dates (LCt1/LCt2/LCt3 and LFt1/LFt2/LFt3) were also computed. In years for which field data were available, the bias error (Soudani et al. 2008) between field observations and the indirect measurements was calculated for each transition date:

$$ Bias\kern0.17em error=\frac{\sum_i^n\left({M}_i-{O}_i\right)}{n} $$
(4)

Where Mi is the transition date for satellite data or carbon flux indices, Oi is the transition date of the equivalent field observation, and n is the number of paired comparisons.
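As a worked illustration of Eq. (4), the following sketch computes the bias error for two paired transition dates. The function name and the day-of-year values are hypothetical and are not taken from Table 1.

```python
def bias_error(indirect_dates, observed_dates):
    """Eq. (4): mean of (M_i - O_i) over n paired transition dates, in days."""
    if len(indirect_dates) != len(observed_dates):
        raise ValueError("both series must contain the same number of dates")
    if not indirect_dates:
        raise ValueError("at least one pair is required")
    differences = [m - o for m, o in zip(indirect_dates, observed_dates)]
    return sum(differences) / len(differences)


# Hypothetical day-of-year values for two site-years.
soa_indirect = [262, 270]   # M_i: start of autumn from an indirect approach
lct1_field = [272, 276]     # O_i: first leaf coloration transition in the field
bias = bias_error(soa_indirect, lct1_field)  # (-10 + -6) / 2 = -8.0
```

A negative value means that the indirect approach places the transition earlier than the field observation, and a positive value means later.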

Long-term field data proxy

In order to make direct comparisons between field observations and indirect measures of autumn phenology, it was necessary to extend the in situ time series to the same timespan as the indirect approaches (1997–2017). One method is to use the average of available field records as representative of long-term records. The effectiveness of this method was examined using the z values (the mean subtracted from a data point, then divided by the standard deviation) of long-term transition dates and progression rates derived from EVInor, NDVInor, NEE, and GPP. Since indirect determination of autumn phenology may be used as a proxy for field observations, the effectiveness of average field transitions and progression rates could be supported if field-observed years were consistent within the long-term trend, i.e., z values are within a threshold (the threshold is − 3~3, as only 0.2% of data points would fall outside this range and be considered outliers).
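The z value screening can be sketched as follows. This is an illustrative sketch only: the text does not state whether the sample or population standard deviation was used, so the sample standard deviation is assumed here, and the 20-year series of transition dates is hypothetical.

```python
import statistics


def z_scores(values):
    """z value of each data point: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption, see lead-in
    if sd == 0:
        raise ValueError("series is constant; z values are undefined")
    return [(v - mean) / sd for v in values]


def outlier_flags(values, threshold=3.0):
    """True where the absolute z value exceeds the threshold."""
    return [abs(z) > threshold for z in z_scores(values)]


# Hypothetical 20-year series of transition dates (day of year),
# with one unusually late year at the end.
dates = [280.0] * 19 + [300.0]
z = z_scores(dates)
flags = outlier_flags(dates)
```

In this example only the final year is flagged. A field-observed year would be treated as consistent with the long-term record when its flag is False.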

For transition dates, all z values in 2010, 2012, and 2013 were less than three, suggesting these years were consistent within long-term records (Table A4, online resource). Therefore, the average of six direct observation transition dates in years with available field data (Table A1, online resource) was taken as representative of a long-term field record. For progression rates, the z value derived from NDVInor for the south site in 2013 was an outlier (3.62). However, the z values derived from the other three approaches in 2013 were relatively small (− 0.80~0.25), suggesting that 2013 could not be considered an abnormal year (Table A4, online resource). Therefore, the average progression rate of years with available field data was also taken as representative of a long-term field record.

Statistical analysis

In a two-section logistic model (Eq. (1)), the absolute value of parameter b (or the absolute value of c and f in a double-logistic function) is the rate of vegetation growth or senescence progress, while the absolute value of a/b (or the absolute value of b/c and e/f in a double-logistic function) is the peak of growth or senescence progress, which corresponds to MOS (middle of spring) or MOA (Beck et al. 2006; Zhang 2015). For satellite data and carbon flux data, linear regression of progression rate parameters (b or f) was conducted, where the constant was manually set to zero since the intercept did not have any physical meaning. Therefore, the slope represented how many times greater the rate derived from one approach was than the rate derived from another approach. RMSE and the P value derived from the t test were used to evaluate the quality of the model because they are applicable to a regression model without a constant term.
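A regression with the constant fixed at zero has a closed-form slope, which can be sketched as follows. This is a minimal illustration rather than the statistical software used in this study: the RMSE is computed with n in the denominator (an assumption), the P value is omitted, and the rate values are hypothetical.

```python
import math


def fit_through_origin(x, y):
    """Least-squares slope of y = slope * x (no intercept) and the RMSE of the fit."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain a non-zero value")
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    squared_residuals = [(yi - slope * xi) ** 2 for xi, yi in zip(x, y)]
    rmse = math.sqrt(sum(squared_residuals) / len(x))
    return slope, rmse


# Hypothetical progression rates (absolute values of b or f) for three years.
rates_evi = [0.05, 0.10, 0.15]
rates_ndvi = [0.10, 0.20, 0.35]
slope, rmse = fit_through_origin(rates_evi, rates_ndvi)
```

Here the slope is read as the factor by which the rate from the second approach exceeds the rate from the first; a slope close to one would indicate similar progression rates.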

Results

Interannual comparison between satellite-, carbon flux-, and field-derived autumn phenology

Transition dates

For both the north and south sites, SOA and MOA transition dates derived from all four indirect approaches (NDVInor, EVInor, NEE, and GPP) were consistently earlier than the equivalent transition dates for LC and LF derived by direct observation. Overall, the greatest disparity (− 83~4 days, Table 1) was recorded between NEE and field observations, while NDVInor showed the smallest difference (− 12~− 4 days). In addition, the bias errors between SOA and LCt1 (− 78~− 4 days) and LFt1 (− 83~− 12 days) were generally larger than those between MOA and LCt2 (− 49~− 7 days) and LFt2 (− 56~− 9 days). EOA derived from EVInor or GPP was generally later (0~26 days) than direct observations, while the EVInor bias error was smaller than that of GPP by 13~14 days. In contrast, NDVInor consistently showed an earlier EOA than LCt3 and LFt3 by 5~6 days and 12~14 days, respectively. The bias error between NEE-derived EOA and LCt3 and/or LFt3 was even smaller (67~70 and 35~43 days smaller than the early and mid-season respectively, Table 1).

Table 1 Comparison of bias error (average number of days between indirect measurements and direct observation), modeled autumn progression rate for each approach and their average ratios to field observation rates of autumn phenology.

Autumn phenology progression rates

In general, the autumn progression rates derived from indirect approaches were slower than those from direct observations except for NDVInor in 2013 at the south site (a ratio lower than one indicates the progression rate derived from an indirect approach was faster than direct observations, and vice versa, Table 1). For both sites, the average ratios between NDVInor and direct observations were consistently close to one (0.88~1.35), while average ratios for NEE were greater (8.49~9.88), suggesting that NDVI was better able to capture direct observations than the other indirect approaches. Furthermore, for both sites and both phenophases, satellite-derived progression rates were in closer agreement with direct observations than carbon flux measurements (Table 1).

Both MOA and autumn progression rate derived from NDVInor were closer to that of direct observation than the other indirect measurements (the bias error between MOA and LCt2 was − 7 days, between MOA and LFt2 was − 9~− 11 days, and the ratio was 0.88~1.35). At both sites, NEE-derived MOA showed greater bias error from LCt2 and LFt2 than either GPP or EVInor (the bias error between MOA and LCt2 was − 39~− 49 days, between MOA and LFt2 was − 44~− 56 days). The magnitude in bias error between GPP/EVInor derived MOA and LCt2 and/or LFt2 varied across sites. Conversely, the difference in progression rate between field data and either NEE or GPP was greater than EVInor, while only small differences existed between the former two. Therefore, NEE was considered to be the least reliable approach in capturing direct observations given the lower performance compared to either EVInor or GPP (Table 1).

Long-term comparison between satellite- (2000–2017), carbon flux- (1997–2016), and field-derived autumn phenology

Transition dates

In general, SOA derived from indirect approaches was earlier than LCt1 and LFt1. The greatest bias error was recorded for NEE in the north site and EVInor in the south site, while the lowest was for NDVInor (both sites, Fig. 2a, b). Similarly, MOA was earlier than LCt2 and LFt2, while the bias error for both sites was greatest for NEE and smallest for NDVInor (Fig. 2c, d). In contrast, EOA tended to be later than both LCt3 and LFt3 by 5~23 days and 8~15 days, respectively, apart from NDVInor which occurred on the same day as LFt3 at the south site and was earlier at the north site by 2 days. Bias error was smallest for NDVInor (0~7 days) and largest for GPP (14~23 days) which exceeded NEE by 6 days at both sites and exceeded EVInor by 4 days at the north site and 3 days at the south site (Fig. 2e, f).

Fig. 2

Autumn phenology derived from satellite data (2000–2017) and carbon flux measurements (1997–2016) compared with field observation (the north site: 2010 and 2012; the south site: 2010, 2012, and 2013). SOA = start of autumn, MOA = middle of autumn, EOA = end of autumn. Dotted lines show the average first, second, and third transition of leaf coloration, while the dot-dash lines show the average first, second, and third transition of leaf fall. The star in each box represents mean values and open circles represent outliers. Numbers below each label show absolute bias error between mean values of each approach and LC/LF stage

Autumn phenology progression rates

In contrast to the short-term comparisons, all four indirect approaches underestimated the rate of LC and LF at both sites over the long term. Ratios between field data and NDVInor remained the lowest for both sites and phenophases (1.93~2.12), while those between field data and NEE were consistently the largest (5.74~7.23). Conversely, the order of EVInor and GPP differed between phenophases, whereby EVInor ratios were lower than GPP for LC (5.50~5.97 versus 5.89~6.38) and larger for LF (5.64~6.19 versus 5.07~5.56). This result suggests that EVInor, NEE, and GPP were less effective than NDVInor at capturing LC and LF observations.

Linear regression analysis between satellite and carbon flux-derived progression rates at both sites revealed significantly (P < 0.01) positive relationships with low RMSE (0.021~0.138), resulting in the following order: NEE ≈ GPP ≈ EVInor < NDVInor (Fig. 3). Specifically, the progression rate derived from NDVInor tended to be faster than from EVInor, NEE, or GPP (NDVInor rate was 1.159~2.053, 1.553~2.551, and 1.933~2.300 times greater than NEE, GPP, and EVInor rate, respectively), while the progression rates derived from these three approaches were relatively similar (EVInor rate was −0.114~0.015 and 0.142~0.246 times faster than NEE and GPP rate, respectively).

Fig. 3

Linear relationships of autumn phenology progression rates at two forest sites (north or south) in northern Wisconsin, USA, derived from four indirect measures (NDVInor, EVInor, NEE, and GPP). For either site, two of the four approaches were selected, and their rate parameters were regressed ((a) to (k))

The bias error between MOA and LCt2 and/or LFt2 was smallest for NDVInor (1~10 days) and largest for NEE (18~25 days), while the order of EVInor and GPP differed between sites (Fig. 2c, d). NDVInor-derived progression rate was also closest to LC and LF, suggesting that this method was more effective than EVInor, NEE, and GPP at capturing changes in phenology. In addition, MOA derived from EVInor, NEE, and GPP were generally a few days later (8~25 days) than LCt2 and LFt2, indicating delayed autumn progression compared to direct observations. Furthermore, the rate of LC and LF was 5.07~7.23 times faster than EVInor-, NEE-, and GPP-derived progression rates. This resulted in an earlier SOA, later EOA, and longer autumn duration for indirectly estimated phenology compared to directly observed autumn phenology. Considering the similarity in progression rates and inconsistent bias error order, the ability of EVInor, NEE, and GPP was similar in terms of capturing autumn canopy change signals.

Field phenology corresponding with satellite-derived transition dates

Although the phenological stages of LC and LF corresponding to SOA, MOA, and EOA derived from indirect approaches could be estimated using the fitted field observation curve, the results may not necessarily represent true field phenology. Since LC had already started (i.e., LC > 800) when field observations began in 2010 and 2012, the phenological stage before LCt1 was overestimated. In addition, field observations ended before FLC and FLF in 2013, resulting in the underestimation of LC and LF progression stages after LCt3 and LFt3. However, slopes could nonetheless identify the rate of phenology progression. For example, the flat line before LCt1 indicates LC had not yet started, and the flat line after LCt3 and LFt3 indicates FLC and FLF had already occurred (Fig. 4).

Fig. 4

Modeled field phenology correspondence with transition dates derived from four indirect approaches. Dot-dash lines are the fitted curves of field phenology. Circles and stars represent SOA, MOA, and EOA derived from EVInor and NDVInor, respectively. Triangles represent three transitions of field phenology (LCt1, LCt2, LCt3 and LFt1, LFt2, LFt3). The three dotted lines in each chart represent 10%, 50%, and 90% of LC and LF based on the observation scheme

EVInor-derived SOA and MOA tended to occur earlier than LCt1 and LFt1 except for the north site in 2010, when MOA occurred 1 day after LCt1 (Fig. 4). For this exception, LC progressed by 11% from SOA to MOA, suggesting that LC had just started when MOA was reached. For all other situations, only slight increases from SOA to MOA were found for LC (0%~2%) and LF (0%~4%), suggesting that LC and LF had not started by the time MOA was reached, whereas EVInor-derived EOA occurred after LCt3 and LFt3, except for the north site in 2010 where EOA occurred 1 day earlier than LFt3. In all other situations, the flat lines at EOA indicated that FLC and FLF had already occurred. Therefore, EVI was able to capture LC and LF progression, both of which tended to overlap with the EVI decline after MOA.

Similarly, NDVInor-derived SOA and MOA tended to occur before or close to LCt1 and LFt1 except for the south site in 2013, when SOA was 3 days later than LCt1 (Fig. 4). This pattern suggests that LC had just started or had not started yet by the time MOA was reached. In contrast to EVInor, NDVInor-derived EOA tended to occur later than LCt1 while earlier than LCt3 and LFt3. LC progression corresponding to EOA varied between 20% and 80%, while LF progression varied between 0% and 53%. Therefore, NDVI tended to decrease earlier than or close to field-observed canopy change, while it stopped declining before FLC and FLF.

Variation in carbon flux during senescence

In 2012 and 2013 in both sites, NEE increased and GPP decreased before LCt1 and LFt1 occurred and continued after FLC and FLF were reached (Fig. 5). During observed LC and LF progression, GPP decreased and NEE increased as expected. In particular, NEE increased by 0.25 μmolCO2 m−2 s−1 at the north site (2012) and by 0.39 and 0.92 μmolCO2 m−2 s−1 at the south site (2012 and 2013, LCt1–LCt3). NEE increased by 0.27 μmolCO2 m−2 s−1 at the north site (2012) and by 0.28 and 0.88 μmolCO2 m−2 s−1 at the south site (2012 and 2013, LFt1–LFt3). Meanwhile, GPP decreased by 0.55 μmolCO2 m−2 s−1 at the north site and by 0.84~3.09 μmolCO2 m−2 s−1 at the south site from LCt1 to LCt3, and by 0.55 μmolCO2 m−2 s−1 at the north site and by 0.57~3.2 μmolCO2 m−2 s−1 at the south site from LFt1 to LFt3. However, the change of NEE and GPP during field-observed autumn phenology was only a small fraction of the entire variation throughout the season (6%~17% of NEE and 5%~30% of GPP), suggesting that other factors may also impact carbon flux.

Fig. 5

NEE increase and GPP decrease variation. The solid lines show NEE, while dot-dash lines show GPP. The units of NEE and GPP are both μmolCO2 m−2 s−1. The vertical dotted lines correspond with LCt1, LCt2, LCt3 and LFt1, LFt2, LFt3, while vertical dash-dot lines are EOA derived from EVInor. Hollow circles and black asterisks are SOA, MOA, and EOA derived from NEE and GPP, respectively

Discussion

Discrepancies between satellite-derived autumn phenology and field observations

There are numerous methods used to determine the timing and duration of the autumn phenology season including direct observation, satellite indices, and carbon flux measurements, each with their own uncertainties. In the current work, the SOA derived from satellite data was consistently earlier than the equivalent dates determined by direct observations. However, the decline in NDVInor and EVInor prior to MOA had little or no overlap with LC and LF phenophases. One reason for this mismatch in timing may be related to satellites observing the upper surface of the canopy from above while direct observations are generally recorded by looking up into the canopy from below; thus, these methods are not necessarily viewing the same part of the tree canopy. Furthermore, different parts of a tree could color at different times, the pattern of which depends on the species in question.

Koike (1990) classified 30 tree species in northern Japan into outer-type species, whereby coloration began on the outer part of the crown and in situ observation may detect the start of autumn progression later than satellite observation from above, and inner-type species, whereby the reverse occurred. Therefore, the existence of outer-type species in the study sites could contribute to the earlier SOA observed by satellite data. However, whether the species in our study area were “inner-type” or “outer-type” was not known, since there was no information reported for aspen (Populus tremuloides), which had the largest population in the south site. Furthermore, although maple (Acer mono Bunge) was reported to be an outer-type species (Koike et al. 2001), the species of maple (Acer saccharum or Acer rubrum) which had large populations in the current study area were different.

Similar discrepancies in the timing of autumn phenology have been reported by Donnelly et al. (2018) albeit for different species in Ireland and over a longer time period (1970–2017) whereby the start of coloration was earlier when derived from EVI2 compared to ground observations. Some of the explanations they put forward included the different parameters observed by satellite data and field records, the heterogeneous landscape covered by satellite pixels, and the scale difference between satellite data and field observations. These results highlight the need for further research to help understand and explain these discrepancies.

Evaluating effectiveness of EVI and NDVI at capturing in situ autumn phenology

Generally, EVI started to decrease earlier than NDVI and ended later, resulting in lower autumn progression rates and a longer autumn duration. The earlier SOA derived from EVI than NDVI is likely due to the higher sensitivity of EVI than NDVI for canopies with high leaf area indices (LAI) (Gamon et al. 1995; Huete et al. 2002). In particular, NDVI will saturate and remain stable throughout the middle of the growing season (Motohka et al. 2010), and therefore early LC and LF may not be detected. EVI, in contrast, avoids this issue due to its overall lower values and has been reported to be better able to detect canopy change in the early period of LC and LF (Huete et al. 2002).

Given the sensitivity of NDVI to soil (Gao et al. 2000) and litter (Van Leeuwen and Huete 1996) and to subtle changes in leaf spectral properties (Motohka et al. 2010; Junker and Ensminger 2016) during LC, it was not surprising that NDVI-derived EOA occurred earlier than LCt3 and LFt3. In particular, NDVI has been reported to vary with soil lightness and land cover when LAI is constant (Gao et al. 2000) and to be susceptible to noise produced by soil and litter (Van Leeuwen and Huete 1996). In addition, change in maple leaf color in late autumn could delay the decrease in NDVI. Although NDVI showed an obvious decrease when leaves turned from green to yellow during early autumn (2004–2008) in a Japanese deciduous forest (Motohka et al. 2010), an increase was reported when maple (Acer saccharum Marsh.) leaves in Canada changed from yellow to red during late autumn (Junker and Ensminger 2016). Therefore, the increase in NDVI resulting from red leaves may offset the decrease caused by yellow leaves, resulting in a stabilizing trend in NDVI prior to FLC and FLF.

In contrast to NDVI, EVI did not finish declining until nearly 2 weeks after LCt3 was reached and in some instances even 1 week after LFt3. This agrees with results reported for a 35-year time series in which the EVI2 decrease ended later and lasted longer than the overall field record of many species in Ireland (Donnelly et al. 2018). Since EVI has been shown to be insensitive to the late autumn phenology of conifers, which do not change color (Huete et al. 2002; Yuan et al. 2018), the delayed decline recorded in the current study may have resulted from the influence of late-coloring understory shrubs, which remain green after canopy species become leafless (Xu et al. 2007). Compared to NDVI, EVI has reduced sensitivity to background soil while retaining sensitivity to vegetation by incorporating the blue band (Huete et al. 2002), thus enabling it to detect understory vegetation change under an open canopy in late autumn with limited noise interference.

These results suggest that the proportion of deciduous and coniferous species, the pattern in which color changes occur in deciduous species, and the actual color (yellow/red) may have a significant impact on satellite-derived autumn phenology, and they highlight the continued need for in situ ground observations of trees and shrubs (Donnelly et al. 2019) with which to validate satellite-derived vegetation indices.

Evaluating the effectiveness of carbon flux measurements at capturing in situ autumn phenology

During early autumn, NEE started to increase and GPP started to decrease before LCt1. When only considering years for which both flux and field data were available, NEE-derived SOA occurred 9–11 weeks prior to field-observed LCt1 and 6 weeks earlier when the long-term datasets were used. This pattern was in close agreement with a reported reduction in maximum carboxylation rate (an indicator of photosynthetic capacity) of five deciduous species in Tennessee, USA, which occurred 6–8 weeks before observed leaf senescence during 1997 to 1998 (Wilson et al. 2000b). Furthermore, the follow-up study suggested that this reduction started during mid-summer when both leaf nitrogen content and leaf area were constant (Wilson et al. 2001). In southern Wisconsin, USA, leaf photosynthetic capacity per unit area for red maple and sugar maple peaked in summer during 1986 to 1989 (Reich et al. 1991), supporting our findings in northern Wisconsin.

The decline in carbon flux before LCt1 was reached may at least in part be a result of asynchronous canopy senescence whereby color change starts at the outer part of the canopy for some species, resulting in an earlier start to carbon flux decline than field-observed LC and LF. In addition, photosynthesis has been reported to decline, in response to physiological and environmental changes, before visible symptoms of chlorophyll degradation and LC are detected (Reich et al. 1991; Wilson et al. 2000b; Bauerle et al. 2012). In terms of physiological changes, the net photosynthesis per unit area of pin oak (Quercus ellipsoidalis), red maple, and sugar maple in Wisconsin peaks in summer and then declines with increasing leaf age (Reich et al. 1991). Similarly, the photosynthesis of white oak (Quercus alba L.), chestnut oak (Quercus prinus L.), and red maple in Tennessee shows the same temporal pattern with increasing leaf age and leaf thickness (Wilson et al. 2000a). In terms of environmental changes, a decrease in photosynthetic capacity for 23 tree species (both deciduous and coniferous) after the summer solstice (when leaves are still green) resulted from shortening photoperiod (Bauerle et al. 2012), while drought could also reduce autumn photosynthetic capacity of deciduous trees, as has been reported in Michigan and Tennessee (Weber and Gates 1990; Wilson et al. 2000a; Busch et al. 2008).

During late autumn, NEE continued to decline and GPP to increase after FLC and FLF were reached, but this signal could not be attributed to deciduous tree activity. The carbon flux signal may have resulted from a combination of coniferous and/or understory shrub photosynthesis, as chlorophyll would likely still have been active. Unfortunately, since phenology of understory vegetation and conifers was not recorded in this study, it was not possible to determine their contribution to either the carbon flux- or the satellite-derived signals. However, a previous study (carried out in the eastern USA on 73 common species in deciduous forests) suggested that some non-native understory species remained active after trees were leafless (Fridley 2012). Similarly, the late-leafing native species in a New York secondary growth forest were also active after FLF in 2004 (Xu et al. 2007). Evergreen species, on the other hand, lack observable autumn phenology, although photoperiod and temperature can influence the autumn photosynthetic capacity of conifers such as Pinus banksiana (Tanja et al. 2003; Richardson et al. 2009).

Explaining the consistency between NDVI-derived and field-observed autumn phenology

Although the timing of SOA and MOA derived from NDVInor showed closer agreement with field-observed LCt1, LCt2, LFt1, and LFt2 than that derived from EVInor, NEE, or GPP, this relationship does not necessarily imply that NDVI was a better predictor of field phenology during early autumn. Because LC was observed from the bottom of the canopy, where coloration may be recorded after it has already started at the top, and because NDVI decline also began after LC had started at the top of the canopy, the close agreement between LC and NDVI likely reflects a slight delay in both methods. However, this phenomenon did not persist into late autumn, when the bias errors between EOA derived from the indirect approaches and LCt3 and/or LFt3 were generally closer to one another than in early autumn. Therefore, the overall higher consistency between NDVI and field observations resulted from (i) the delay in both NDVI decline and field-observed LC during early autumn relative to LC at the top of the canopy and (ii) the reduced difference among indirect approaches during late autumn.
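To make the notion of bias error concrete, the comparison between an indirectly derived transition date and its field-observed counterpart can be sketched as follows. This is a minimal illustration, assuming bias is defined as the mean of derived minus observed day of year; the function name and the example dates are hypothetical and are not taken from the study.

```python
from statistics import mean


def mean_bias_error(derived_doy, observed_doy):
    """Mean of (derived - observed) in days.

    Negative values mean the indirect approach places the transition
    earlier than the field observation; positive values mean later.
    """
    if not derived_doy or len(derived_doy) != len(observed_doy):
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(d - o for d, o in zip(derived_doy, observed_doy))


# Hypothetical day-of-year values for three years (assumed, not data).
ndvi_soa = [262, 265, 260]
field_lct1 = [266, 268, 265]

print(mean_bias_error(ndvi_soa, field_lct1))  # negative: NDVI-derived SOA is earlier
```

A small bias in this sense shows only that two series are offset by a similar amount; it does not by itself show that either one tracks the true onset of coloration at the top of the canopy.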

Relationship between satellite data and carbon flux measurements

During early autumn, there was higher temporal consistency between EVI- and carbon flux-derived SOA than between NDVI- and carbon flux-derived SOA. In particular, EVI, NEE, and GPP all started to decrease, while NDVI remained stable partly due to the fact that NDVI becomes saturated at high LAI. Furthermore, NDVI is not as efficient as EVI in estimating chlorophyll content during early autumn (Gitelson and Merzlyak 1994; Junker and Ensminger 2016). In particular, NDVI becomes less sensitive with increasing chlorophyll content in sugar maple in Canada (Junker and Ensminger 2016) and in grassland, shrubs, and trees in California (Gamon et al. 1995). Similarly, NDVI became saturated when leaf chlorophyll content was low in horse chestnut (Aesculus hippocastanum L.) and Norway maple (Acer platanoides L.; Gitelson and Merzlyak 1994). Compared to NDVI, EVI has been shown to have a similar relationship with chlorophyll content but with a higher saturation threshold (Schlemmer et al. 2013), and when chlorophyll content was high, EVI crop canopy noise was lower than that of NDVI (Peng et al. 2017). These results indicate that EVI has higher sensitivity to chlorophyll content than NDVI in highly vegetated areas, which contributes to its closer temporal agreement with carbon flux variation during early autumn.

The better temporal consistency between EVI and carbon flux measurements than between NDVI and carbon flux measurements during late autumn arose because both EVI and the carbon flux indices continued to decrease after FLC and FLF, while NDVI stopped declining before FLC and FLF were reached. This was likely due to the higher sensitivity of EVI to sparse vegetation than NDVI. Therefore, EVI decline had higher temporal consistency with NEE and GPP than NDVI in both early and late autumn, leading to the overall similarity in autumn duration and progression rates. Similarly, Wu et al. (2017) suggested that NDVI-derived phenology had poor correlation with carbon flux-derived end of growing season, especially for mixed forests, and Peng et al. (2017) suggested that EVI-based spring onset had higher consistency with carbon flux measurements than NDVI. The consistency between EVI and carbon flux measurements was also supported by reports estimating GPP with EVI in Amazonian and North American forests (Huete et al. 2006; Harris and Dash 2010).
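For readers who want to compare autumn duration and progression rate across indices, one possible way to derive these quantities from a normalized, declining time series is sketched below. The threshold values (0.9, 0.5, 0.1), the function names, and the definition of progression rate as normalized decline per day are all assumptions made for illustration; they are not necessarily the definitions used to derive SOA, MOA, and EOA in this study.

```python
def crossing_day(days, values, threshold):
    """First day a declining series falls below `threshold`.

    Uses linear interpolation between consecutive samples and returns
    None if the series never crosses the threshold.
    """
    points = list(zip(days, values))
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        if v0 >= threshold > v1:
            return d0 + (v0 - threshold) * (d1 - d0) / (v0 - v1)
    return None


def autumn_metrics(days, values, start=0.9, mid=0.5, end=0.1):
    """Transition dates, duration, and progression rate (hypothetical definitions)."""
    soa = crossing_day(days, values, start)
    moa = crossing_day(days, values, mid)
    eoa = crossing_day(days, values, end)
    if None in (soa, moa, eoa):
        raise ValueError("series does not cross all thresholds")
    duration = eoa - soa
    return {
        "SOA": soa,
        "MOA": moa,
        "EOA": eoa,
        "duration": duration,                # days between SOA and EOA
        "rate": (start - end) / duration,    # normalized decline per day
    }


# Hypothetical normalized index sampled every 10 days (assumed, not data).
days = [240, 250, 260, 270, 280, 290, 300]
index = [1.0, 0.95, 0.8, 0.5, 0.3, 0.1, 0.0]
print(autumn_metrics(days, index))
```

Under these definitions, a series that starts declining earlier and ends later yields a longer duration and a lower progression rate, which is the pattern described for EVI relative to NDVI.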

Overall, the difference between these approaches in capturing field phenology may, at least in part, be explained by the difference in what they measure, i.e., satellite data is a measure of vegetation spectral properties while carbon flux is a measure of photosynthetic activity. Although EVI is more sensitive than NDVI under high chlorophyll content, neither is a good predictor of leaf chlorophyll content, partly due to the absence of a green band. In particular, the green band is argued to be sensitive to chlorophyll (Gitelson and Merzlyak 1994; Datt 1998), whereby indices that include the green band show higher consistency with chlorophyll content than NDVI (Gitelson et al. 1996; Lichtenthaler et al. 1996; Motohka et al. 2010) and lower noise than EVI (Peng et al. 2017). Furthermore, the photosynthetic capability of vegetation may vary even when vegetation spectral properties are similar (Prince 1991) due to changes in photoperiod, temperature, drought, leaf age, leaf thickness, etc., as discussed earlier.

Conclusion

The transition dates and progression rates of autumn phenology in a northern Wisconsin mixed temperate forest were derived from field data, satellite data, and carbon flux measurements. Our results suggest that the timing of autumn transition dates and progression rates derived from NDVI were closer to those of field observations than those derived from EVI, NEE, or GPP, while the reliability of the latter three was similar. The consistency between NDVI and field observation was, at least in part, a result of inaccuracies of both approaches. Indirect approaches tended to derive earlier SOA than LCt1, while EVI, NEE, and GPP continued to decrease after FLC and FLF. The advanced SOA derived from satellite data could be influenced by species-specific canopy senescence patterns and the sensitivity of vegetation indices, while the advanced SOA derived from carbon flux measurements was more likely due to trees responding to physiological and environmental changes. EVI could possibly detect change in understory activity in late autumn, as indicated by the decline after FLC and FLF, while the decrease in carbon flux measurements during this period could have resulted from both understory and conifers. There were a number of limitations associated with this study relating to the limited availability of in situ field data in terms of the length of the time series, a lack of conifer and shrub phenological observations, a lack of detailed data on canopy senescence patterns, and missing data on the extremes of the season, all of which could help, at least in part, explain the observed discrepancies between in situ and remotely derived autumn phenology.