Introduction

Estuaries are highly productive coastal environments, which provide a broad range of ecosystem services to human communities (Beck et al. 2001; Martinez et al. 2007). They supply essential nursery habitats for several diadromous fishes, and support commercial and recreational fisheries targeting multiple species (Costa et al. 2002; Pihl et al. 2002). In addition, estuarine environments are important for nutrient cycling and filtration processes, and also function as natural storm protection (Martinez et al. 2007; Barbier et al. 2011). However, due to their accessibility to human populations, estuaries are often subject to heavy anthropogenic pressures (Edgar et al. 2000). They form a focal interface between terrestrial, river, and ocean environments, and tend to concentrate and retain nutrients and pollutants (Barbier et al. 2011; Feyrer et al. 2015). Along with substantial landscape and flow regime modification, this has led to the eutrophication and degradation of ecosystem function in many estuaries worldwide (Cloern et al. 2016).

In addition to these management challenges, there is increasing recognition of the potential for climate change impacts on estuaries. As estuaries are transitional systems between land and ocean, they will incorporate complex drivers from both environments, across multiple temporal and spatial scales (Scavia et al. 2002; Gillanders et al. 2011; Cloern et al. 2016). Changes to temperature and precipitation cycles in watersheds, including snowmelt dynamics, will affect freshwater inflow to estuaries, including shifts in flow peaks, extreme events, and droughts (Gleick and Adams 2000; Milly et al. 2005; Vigano et al. 2015; Demaria et al. 2016; Lee et al. 2016). Along with land use patterns, changes to flow cycles may modify nutrient and sediment inputs to estuaries, affecting eutrophication, habitats, and the ability to meet management targets for water quality (Howarth et al. 2002; Scavia et al. 2002; Chen et al. 2014). Rising sea levels will result in increased coastal inundation and altered circulation patterns and flushing characteristics (Scavia et al. 2002). Warming air temperatures will drive increases in estuarine water temperatures, with impacts on physiological stress, phenology, migration, and recruitment of estuarine-dependent species (Hare and Able 2007; Wagner et al. 2011; Bell et al. 2014; Peer and Miller 2015; Wilber et al. 2016). Many diadromous fishes are already endangered, threatened, or vulnerable as a result of habitat loss, overfishing, and pollution (Jelks et al. 2008; Limburg and Waldman 2009). Their reliance on estuarine habitats will likely render them highly vulnerable to the effects of climate change (Lynch et al. 2014; Hare et al. 2016).

There is clearly a need to develop and refine projections of future conditions in estuaries for the purposes of planning and management. However, current general circulation models (GCMs) have typical spatial resolutions of >100 km. This is too coarse to resolve processes at the scale of most estuaries or even the scale of most watersheds (Stock et al. 2011). Thus, GCMs are generally downscaled to a more appropriate spatial scale, using dynamical or statistical downscaling methods (Wood et al. 2004; Wagner et al. 2011; Flato et al. 2013). Statistical downscaling works by assuming that the local-scale climate is a product of both large-scale climatic processes and smaller-scale local processes. Relationships between these can then be used to develop future projections of local conditions (e.g., Cannon and Whitfield 2002; Johnson and Weaver 2009; Gaitán et al. 2014; Dixon et al. 2016). The advantages of statistical downscaling are primarily that it is computationally simpler and faster than dynamical downscaling and that it incorporates bias correction inherently. A key feature is that the computational simplicity allows for rapid generation of an ensemble of projections spanning a range of climate futures, GCMs, greenhouse scenarios, and internal climate variations (Hawkins and Sutton 2011). Incomplete exploration of this ensemble of climate futures is a primary limitation of the past generation of climate impact assessments on marine resources (Cheung et al. 2016; Payne et al. 2016). The main disadvantages of statistical downscaling are that stationarity is assumed (i.e., that the relationships between regional and local-scale processes remain constant as climate changes) and that long historical observational datasets are required (Wilby and Wigley 1997; Diaz-Nieto and Wilby 2005; Benestad et al. 2008; Dixon et al. 2016). 
There is also a wide range of different downscaling methods to choose from, further expanding the range of climate futures that can be considered (Hessami et al. 2008; Chen et al. 2013; Gaitán et al. 2014).

Statistical downscaling is likely to be well suited for estuarine environments, as many have comprehensive time series data available for key variables such as temperature and salinity (Feyrer et al. 2015; Cloern et al. 2016; Schulte et al. 2016). However, many modeling steps are required to get from coarse resolution atmospheric outputs from GCMs to projections of biologically relevant environmental conditions in the estuary itself. There are multiple choices to be made regarding input variables and model structure, with complex and interacting sources of uncertainty. Different studies have approached these questions differently: using one-to-many GCMs and a range of downscaling methods and hydrological models, across estuaries with different hydrological and environmental characteristics (e.g., Maurer and Duffy 2005; Wilby and Harris 2006; Vicuna and Dracup 2007; Chen et al. 2014; Thompson et al. 2015; Brown et al. 2016).

The Chesapeake Bay (Fig. 1) is the largest estuary in the USA, supporting multiple biological communities, ecosystems, and human use activities and providing essential habitat for a large number of economically important fish and invertebrate species (Richards and Rago 1999; Sharov et al. 2003; Najjar et al. 2010). This importance has motivated decades of physical and biological observations to track and understand the response of the Chesapeake Bay to changes in land use, climate, and other potential stressors (Hagy et al. 2004; Kimmel and Roman 2004; Kemp et al. 2005; Najjar et al. 2010). Water quality in the Chesapeake Bay is strongly tied to the timing and magnitude of freshwater inflow events, through influences on nutrient and sediment delivery, flushing times, and stratification (Gibson and Najjar 2000; Glibert et al. 2001; Wood et al. 2002; Paerl 2006; Paerl and Otten 2013; Lee et al. 2016). Historical land use practices in the watershed have led to significant eutrophication of the bay, with consequent declines in water clarity, submerged aquatic vegetation, and bottom oxygen concentrations in warmer months (Sprague et al. 2000; Boesch et al. 2001; Langland et al. 2004; Kemp et al. 2005). Harmful algal blooms have also been increasing in incidence and severity, as a consequence of both nutrient enrichment and warming waters (Glibert et al. 2001; Paerl and Otten 2013). These events can contribute to hypoxia, fish kills, and seafood contamination (Tango et al. 2005). Water temperature in the bay can also drive phenology and recruitment of various fish species (Hare and Able 2007; Bell et al. 2014), while streamflow may trigger migration behaviors of diadromous species (Tommasi et al. 2015). There is therefore a substantial need to assess potential climate change impacts on streamflow, temperature, and salinity regimes in the Chesapeake Bay and how these may interact with current management issues.

Fig. 1 Study area with bathymetry for the Chesapeake Bay, Susquehanna River watershed, and major tributaries shown. The location of the Thomas Point buoy is shown in yellow, the locations of eight weather stations providing air temperature observations are in red, and grid point locations for the WD GCM are shown in purple, to highlight the coarse spatial resolution of GCMs (Color figure online)

While the data richness of Chesapeake Bay is conducive to statistical downscaling, the dynamical complexity poses challenges. The bay is nearly 300 km long and up to 48 km wide. While it has a relatively deep (>20 m) and narrow central channel with reasonably well-characterized dynamics, much of the bay consists of shallow and often complex habitat, with an average depth of only 6.5 m across the entire estuary (Hagy et al. 2005). More than half of the freshwater inflow comes from the Susquehanna River, which drains a 71,250-km² watershed stretching across urban, suburban, and rural areas of the states of Maryland, Pennsylvania, and New York (Schubel and Pritchard 1986; Hagy et al. 2005). This primary watershed combines with numerous smaller ones to shape salinity and circulation conditions across the bay (Guo and Levinson 2007; Shen and Wang 2007; Reay and Moore 2009; Xu et al. 2012). While up-estuary portions of the Chesapeake Bay are strongly shaped by these river flows, down-estuary conditions reflect a mix of river input and continental shelf conditions. These dynamical complexities may impact the feasibility of statistical downscaling approaches.

In this study, we developed a simple modeling framework to obtain projections of surface temperature and salinity in the Chesapeake Bay from multiple GCMs, downscaled using different statistical downscaling techniques. A key novel feature of this framework for the Chesapeake Bay is the ability to assess spatial structure of surface temperature and salinity. Our approach attempts to strike a balance between capturing the primary estuarine responses to climate drivers and maintaining the computational efficiency required to rapidly assess a range of climate futures. We assess the sufficiency of this approach in capturing past variations in the surface hydrography of the Chesapeake Bay, and then examine primary drivers of estuarine conditions and contributions to uncertainty from spread in GCM projections vs. downscaling methods. Potential consequences of both future change and the uncertainty around these projections are then discussed for the Chesapeake Bay ecosystem.

Methods

A conceptual diagram of the overall modeling framework is shown in Fig. 2a. Daily air temperature and total precipitation from GCMs were statistically downscaled, and fed into a simple water balance model to derive Susquehanna River streamflow. Daily air temperature was also used to estimate daily surface water temperature at the Thomas Point buoy (Fig. 1). Streamflow and water temperature were then combined with other variables to project spatial patterns of surface temperature and salinity across the Chesapeake Bay, at monthly resolution. Each step of this process is described in more detail below.

Fig. 2 (a) Schematic representation of the statistical framework developed in this study. Models are boxed and model outputs are un-boxed. (b) Conceptual model of a linear model tree (after Solomatine and Dulal 2003)

Thomas Point Water Temperature Model

Before the full framework could be used to generate future projections for the Chesapeake Bay, the development of several models relating air temperature and rainfall to water temperature and salinity using historical observations was required. The first and simplest of these predicted daily surface water temperature from daily air temperature (Fig. 2a). The best long-term time series of air and water temperature in the Chesapeake Bay is from the Thomas Point buoy, which has been recording since fall 1985 (Fig. 1). Several studies (e.g., Hare and Able 2007; Hare et al. 2010; Tommasi et al. 2015) have used daily air temperature as a direct proxy for water temperature in estuaries and rivers, due to their low heat capacity relative to deeper ocean waters, often yielding a high correlation between the two variables. However, there are some issues with this approach. Firstly, the relationship between air and water temperature is often non-linear, leveling off at very cold and very warm temperatures (Mohseni et al. 2003). As climate change projections will require extrapolation of present-day relationships, behavior at extremes is important to quantify. Leveling off at temperature extremes was clearly evident at Thomas Point. The daily air and water temperatures were strongly correlated (R² = 0.89, RMSE = 3.56 °C), but the relationship was obviously non-linear, particularly at cold temperatures. Differences between daily air and water temperatures could exceed 5 °C at cooler times of year (Fig. 3).

Fig. 3 Modeling daily surface water temperatures at the Thomas Point buoy: using daily air temperatures at Thomas Point (light gray), using a 17-day moving mean of air temperatures (dark gray), and using the non-linear equation from Mohseni et al. (2003) applied to a 17-day moving mean of air temperatures (black). The 1:1 ratio denoting perfect fit is shown as a black dashed line

Water temperatures integrate air temperatures over the preceding days to weeks, and may therefore lag them considerably, particularly during times of rapid air temperature change (Letcher et al. 2016). To account for this, we determined the optimum integration time for air temperature to predict daily water temperature. Correlations were calculated between daily water temperature (1985–2015) and moving means of daily air temperature, the latter tested at all values between 2 and 21 days. Of all the values tested, a 17-day moving mean best improved the correlation (R² = 0.98, RMSE = 3.08 °C) (Fig. 3).
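
The window search described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a trailing (backward-looking) moving mean and uses the squared Pearson correlation as the selection score, and the function names are illustrative rather than taken from the study's code.

```python
import math

def trailing_mean(values, window):
    """Mean of each value and the (window - 1) values before it.
    Entries are None until a full window is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def best_window(air, water, windows=range(2, 22)):
    """Return (window, R^2) for the moving mean of air temperature
    that correlates best with daily water temperature."""
    best = None
    for w in windows:
        smoothed = trailing_mean(air, w)
        pairs = [(s, t) for s, t in zip(smoothed, water)
                 if s is not None and t is not None]
        r2 = r_squared([p[0] for p in pairs], [p[1] for p in pairs])
        if best is None or r2 > best[1]:
            best = (w, r2)
    return best
```

Here `air` and `water` would be aligned daily series; the default range covers the 2- to 21-day windows tested in the text.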

Once the lag issue had been addressed, the problem of non-linearity was solved using the exponential equation from Mohseni et al. (2003):

$$ {T}_w=\mu +\frac{\alpha -\mu}{1+{e}^{y\left(\beta -{T}_a\right)}} $$

This equation was optimized using the non-linear least squares (nls) function in R 3.2.1 (R Core Team 2015). The best fit between 17-day air temperatures and water temperatures was achieved where μ = −6.53, α = 37.08, y = 0.1, and β = 14.49. While this equation was initially designed for unsmoothed daily data, it was also useful for further improving the fit of the 17-day air temperature vs. daily water temperature model, particularly at cooler temperatures (R² = 0.99, RMSE = 3.04 °C) (Fig. 3). This improvement in correlation was statistically significant at p < 0.001 (Fisher r-to-z transformation).
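
As an illustration, the fitted curve can be evaluated directly from the parameter values reported above. The sketch below is not the study's code; the function name is hypothetical, and the argument `steepness` corresponds to the coefficient written as y in the equation.

```python
import math

# Parameter values reported in the text for Thomas Point
# (fitted against the 17-day moving mean of air temperature).
MU, ALPHA, STEEPNESS, BETA = -6.53, 37.08, 0.1, 14.49

def water_temperature(air_temp_17d, mu=MU, alpha=ALPHA,
                      steepness=STEEPNESS, beta=BETA):
    """T_w = mu + (alpha - mu) / (1 + exp(steepness * (beta - T_a)))."""
    return mu + (alpha - mu) / (1.0 + math.exp(steepness * (beta - air_temp_17d)))
```

The curve is bounded below by μ and above by α, and at an air temperature equal to β it returns the midpoint of the two, which is what produces the leveling off at very cold and very warm temperatures.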

Water Balance Model

A wide array of hydrological models are available for estimating streamflow from temperature, precipitation, and other variables. We chose to test the sufficiency of a simple water balance model to maximize efficient and rapid consideration of multiple potential climate futures. Our water balance model requires only air temperature and precipitation, and runs at a monthly resolution, using Java code provided by McCabe and Markstrom (2007) adapted into R 3.2.1. The model assigns precipitation in the watershed to snow or rain, depending on temperature. Results are aggregated for the whole watershed, and no routing models are included. Snow is stored in snowpack, with melt rates determined by temperature, while rain and snowmelt contribute to runoff after soil moisture storage is saturated. Runoff across the entire watershed area then becomes streamflow at the mouth of the Susquehanna River. Actual evapotranspiration is determined from potential evapotranspiration (PET), soil moisture storage, and soil moisture storage withdrawal. Of the many methods available for calculating PET, we chose to use the Hamon equation, which has previously shown low error and bias in US watersheds (Vorosmarty et al. 1998). All model parameters were set at the values recommended by McCabe and Markstrom (2007), except for the maximum snowmelt proportion, which was set to 0.7 instead of 0.5 (within the range of parameter uncertainty). This adjustment gave predictions slightly closer to observations. Extractive water use in the catchment is currently <5% (Najjar 1999; SRBC 2013), and so, we did not consider this in the water balance model.
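
The monthly bookkeeping described above can be sketched as follows. This is a simplified illustration in the spirit of the McCabe and Markstrom (2007) model, not a port of their code: apart from the maximum snowmelt proportion of 0.7 given in the text, all default values (temperature thresholds, soil moisture capacity, direct runoff fraction, runoff factor) and the constants in the Hamon expression are assumptions that should be checked against the original model documentation.

```python
import math

def snow_fraction(temp_c, t_snow=-10.0, t_rain=3.3):
    """Share of precipitation falling as snow, linear between two thresholds."""
    if temp_c <= t_snow:
        return 1.0
    if temp_c >= t_rain:
        return 0.0
    return (t_rain - temp_c) / (t_rain - t_snow)

def hamon_pet(temp_c, day_length_hours, days_in_month):
    """Hamon potential evapotranspiration (mm per month); constants assumed."""
    d = day_length_hours / 12.0
    sat_vapor_density = 4.95 * math.exp(0.062 * temp_c) / 100.0
    return 13.97 * days_in_month * d * d * sat_vapor_density

def monthly_step(state, temp_c, precip_mm, pet_mm, soil_capacity=150.0,
                 direct_frac=0.05, melt_max=0.7, runoff_frac=0.5,
                 t_snow=-10.0, t_rain=3.3):
    """Advance (snowpack, soil moisture, surplus) by one month.
    Returns the new state, runoff (mm), and actual evapotranspiration (mm)."""
    snowpack, soil, surplus = state
    snow = precip_mm * snow_fraction(temp_c, t_snow, t_rain)
    rain = precip_mm - snow
    direct = direct_frac * rain          # runoff from impervious surfaces
    rain -= direct
    snowpack += snow
    melt_frac = (temp_c - t_snow) / (t_rain - t_snow) * melt_max
    melt_frac = min(melt_max, max(0.0, melt_frac))
    melt = melt_frac * snowpack
    snowpack -= melt
    water_in = rain + melt
    if water_in >= pet_mm:
        aet = pet_mm
        soil += water_in - pet_mm
        if soil > soil_capacity:         # saturated soil spills into surplus
            surplus += soil - soil_capacity
            soil = soil_capacity
    else:
        # Withdrawal shrinks as the soil dries out.
        withdrawal = min(soil, (pet_mm - water_in) * soil / soil_capacity)
        soil -= withdrawal
        aet = water_in + withdrawal
    runoff = runoff_frac * surplus
    surplus -= runoff
    return (snowpack, soil, surplus), runoff + direct, aet
```

Runoff here is a depth in mm per month; converting it to streamflow would require multiplying by the watershed area, which is omitted from the sketch.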

Historical precipitation and air temperature were obtained for the Susquehanna River watershed from the NOAA/National Centers for Environmental Prediction (NCEP) Global Historical Climatology Network (GHCN) Climate Anomaly Monitoring System (CAMS) 0.5° monthly temperature dataset (Fan and van den Dool 2008) and the Climate Prediction Center (CPC) Unified Gauge-Based 0.25° Analysis of Daily Precipitation (Chen et al. 2008). All grid points inside the watershed were averaged by year and month, from 1970 to 2006, and used to drive the water balance model. Results were compared to monthly streamflow observations at Conowingo Dam, which is ~16 km north of where the Susquehanna River opens into the northern end of the Chesapeake Bay (USGS station 01578310, obtained from 1970 to 2006 from http://waterdata.usgs.gov/nwis/dv/?referred_module=sw). Although the Potomac, James, and other rivers also deliver freshwater into the Chesapeake Bay, time series of monthly flow from all major rivers were highly correlated (e.g., Susquehanna vs. Potomac: R² = 0.65, 1970–2006, USGS). As a result, we only modeled flow from the Susquehanna River for our framework, to maximize simplicity and minimize multicollinearity in predictive models.

While modeled monthly streamflow from the water balance model (1970–2006) was well correlated with observed streamflow at Conowingo Dam (R² = 0.76), predictions were biased low, particularly during winter (January–March) and spring (April–June) (Fig. 4). This suggested a problem with snow measurements, which are well known to be affected by wind-induced under-catch and bias from the placement of gauges (Hayhoe et al. 2007). Larsen and Peck (1974) and Adam and Lettenmaier (2003) suggested that snow under-catch varied strongly by region, but averaged around 50% at wind speeds typical of the Susquehanna watershed in winter-spring (4–7 m s−1: NCEP/NCAR Monthly Reanalysis). Iterative testing of a snow catch ratio in the water balance model showed that a value of 0.55 was optimal (i.e., precipitation classified as snow by the water balance model should be divided by 0.55). This simple correction improved the bias of the model substantially, and also slightly improved the fit (R² = 0.8) (Fig. 4). However, very high flow events (e.g., Hurricane Agnes, 1972) were still somewhat under-estimated by the model. Note that the snow correction was not required for projections from the GCMs, as it relates only to bias in actual precipitation observations from gauges.
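
The under-catch correction amounts to inflating only the snow share of gauge precipitation. A minimal sketch, assuming the snow share for the month is already known from the temperature-based partition (the function and argument names are illustrative):

```python
def correct_snow_undercatch(precip_mm, snow_share, catch_ratio=0.55):
    """Divide the snow part of observed precipitation by the catch ratio,
    leaving the rain part unchanged."""
    snow = precip_mm * snow_share
    rain = precip_mm - snow
    return rain + snow / catch_ratio
```

With the optimal ratio of 0.55, a month recorded as entirely snow is scaled up by a factor of about 1.8, while a rain-only month is unchanged.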

Fig. 4 Water balance model results. Top: observed mean monthly Susquehanna River streamflow at Conowingo Dam (black), modeled monthly streamflow from the water balance model (red), and modeled monthly streamflow from the water balance model with the snow under-catch correction (green). Bottom: observed versus modeled monthly Susquehanna River streamflow 1970–2006 (Color figure online)

Spatial Temperature and Salinity Models

Spatial models were built using historical conductivity, temperature, and depth (CTD) cast data from the Chesapeake Bay Program (http://www.chesapeakebay.net/data), the University of Maryland Chesapeake Biological Laboratory cruise database (hjort.cbl.umces.edu), and the Smithsonian Environmental Research Center database (https://serc.si.edu/environmental-data), from 1986 to 2015. Surface temperature and salinity values were extracted from individual casts at all stations within the Chesapeake Bay and major tributaries. Stations <0.5 km from shore were excluded. This represented a small overall proportion of the pelagic, subtidal Chesapeake Bay, but excluded sampling stations subject to finer-scale nearshore variability, which our models would likely not be able to capture (e.g., Breitburg 1990). The CTD dataset was used to build models of spatially resolved surface temperature and salinity within the Chesapeake Bay, given estimates of Thomas Point surface temperature and streamflow (derived as described above) and a small set of additional predictors described below.

Daily mean near-surface air temperature at Thomas Point was strongly correlated to air temperature at ten other nearby buoys in the Chesapeake Bay (75.8–77° W and 37–39.5° N) for years 2010–2014 (R² = 0.83 to 0.93), confirming highly coherent daily air temperature variation across the Chesapeake Bay. The estimated Thomas Point buoy temperature thus served as the primary predictor of surface water temperature at other latitudes and longitudes via spatial covariance. We also included three other predictors supported on both mechanistic and theoretical grounds: (a) the 30-day trend in Thomas Point temperature change to account for seasonal hysteresis in spatial temperature covariance (Letcher et al. 2016), (b) freshwater inflow, which can have a pronounced cooling effect in the upper portions of the bay relative to moderating ocean effects in its lower reaches (Preston 2004), and (c) the time of day to account for the diurnal signal in CTD casts (Table 1). Variables (a) and (b) are point measurements, taken at only one spatial location (i.e., Thomas Point and the Conowingo Dam, respectively). We therefore included latitude and longitude as predictors in the model, to allow the modeling of surface temperature and salinity in two-dimensional space. The same predictor variables were used for the spatial model of surface salinity, except that predicted Thomas Point temperature was not included.

Table 1 Predictor variables included in predictive model trees for surface temperature and salinity

The Cubist package in R 3.2.1 (Kuhn et al. 2015) was used to create statistical model trees to predict surface temperature and salinity across the Chesapeake Bay (Fig. 2b). These models are similar to standard regression tree models, in that they split the training data into increasingly similar subsets based on the values of predictor variables, before arriving at predicted values at terminal nodes. Model trees differ, however, in that the values at the terminal nodes are described using multivariate linear equations (terminal linear models), rather than fixed values. These linear equations predict the value of the target outcome (in this case, temperature or salinity), based on a subset of the predictor variables. The model tree is thus reduced to a set of conditional linear regression equations or “rules,” which can then be either eliminated via pruning or combined for simplification (Kuhn et al. 2015). As a result of their use of linear equations at tree nodes, model trees can extrapolate beyond the range of training datasets, while many other machine learning techniques cannot (Quinlan 1992, 1993). This characteristic is particularly important for modeling temperature under future climate change, as warming will lead to novel conditions outside the range of historical datasets. Model tree training can be refined using “committees,” which operate similarly to boosting for boosted regression trees (Elith et al. 2008). Essentially, multiple model trees are constructed, each one learning from the deficiencies of the previous one, and the final predicted value is a mean from all trees (Kuhn et al. 2015).
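
To make the structure concrete, the sketch below shows how a rule-based model tree with committees might produce a prediction. It is a conceptual illustration in Python, not the Cubist implementation: the rule representation is hypothetical, and averaging the predictions of all rules that cover a case is an assumption about how overlapping rules are combined.

```python
def predict_rules(rules, x):
    """Each rule is (condition, intercept, coefficients). Every rule whose
    condition covers x contributes a linear prediction; these are averaged."""
    preds = [intercept + sum(coef * x[name] for name, coef in coefs.items())
             for condition, intercept, coefs in rules if condition(x)]
    if not preds:
        raise ValueError("no rule covers this case")
    return sum(preds) / len(preds)

def predict_committee(committee, x):
    """Final prediction is the mean over all member rule sets."""
    return sum(predict_rules(rules, x) for rules in committee) / len(committee)
```

Because each terminal model is a linear equation, a predictor value warmer than anything in the training data still yields a prediction that continues along the fitted slope, which is the extrapolation property noted above.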

Machine learning predictive models are flexible and powerful, but can be prone to overfitting if appropriate steps are not taken to control this. They are generally expected to perform much better on training data than on unseen test or validation datasets (e.g., Elith et al. 2008), with high skill on training data and low skill on validation data indicative of overfitting, and potentially an overly complex model. To determine the best model configurations, model trees were therefore trained on the first 20 years of CTD data (1986–2005), and validated on the last 10 years (2006–2015). The optimal number of control rules and committees was determined by assessing root-mean-square error (RMSE) only on the unseen test data. The model configuration that gave the best results on the test data was thus considered to be sensitive enough to capture important relationships and interactions, but general enough to avoid overfitting. The optimum configuration to predict surface temperature was 100 committees and a maximum of 20 rules, while for surface salinity, it was 100 committees and a maximum of 15 rules. The importance of the predictor variables to both the conditional splits of the model tree, and the linear models at the terminal nodes, was reported as percentages, as described in Kuhn et al. (2015). Each predictor is scored separately, with a maximum attainable score of 100%, so the scores across all variables do not sum to 100%.
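
The selection procedure can be sketched as a temporal hold-out loop. In this illustration `fit` is a hypothetical stand-in for whatever routine trains a model from a configuration (for example, a number of committees and a maximum number of rules) and returns a prediction function; the record layout is likewise assumed.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def split_by_year(records, last_training_year=2005):
    """Earlier years train the model; later years are held out for testing."""
    train = [r for r in records if r["year"] <= last_training_year]
    test = [r for r in records if r["year"] > last_training_year]
    return train, test

def select_configuration(configs, fit, train, test, target="value"):
    """Return the configuration with the lowest RMSE on the held-out years."""
    best_config, best_score = None, float("inf")
    for config in configs:
        model = fit(train, **config)     # hypothetical training routine
        score = rmse([r[target] for r in test], [model(r) for r in test])
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Scoring only on the held-out years means that a configuration flexible enough to memorize the training period gains no advantage from doing so.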

As surface temperature and salinity both have very strong spatial characteristics in the Chesapeake Bay, the predictive power of the two model trees was assessed across seven major subregions: the James, Rappahannock, Potomac, and Patuxent Rivers, and the Upper, Mid, and Lower portions of the bay main-stem (Fig. 1). The Upper Bay was defined as all main-stem waters north of the Patuxent River mouth, the Lower Bay was defined as waters south of the Rappahannock River mouth, and the Mid Bay was all main-stem waters in between. In addition, surface temperature results were assessed using monthly anomalies, to remove the effect of the very strong seasonal temperature signal (i.e., the observed monthly mean surface temperature for each subregion was subtracted from both observed and modeled values before comparison).
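
The anomaly calculation can be illustrated with a short sketch. The record layout (dictionaries with `subregion`, `month`, `observed`, and `modeled` keys) is an assumption made for the example.

```python
from collections import defaultdict

def monthly_climatology(records):
    """Mean observed value for each (subregion, calendar month) pair."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in records:
        key = (r["subregion"], r["month"])
        totals[key][0] += r["observed"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

def to_anomalies(records):
    """Subtract the observed climatology from both observed and modeled values."""
    clim = monthly_climatology(records)
    out = []
    for r in records:
        base = clim[(r["subregion"], r["month"])]
        out.append((r["observed"] - base, r["modeled"] - base))
    return out
```

Using the observed climatology for both series means that any seasonal bias in the model is retained in the modeled anomalies rather than being removed.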

Selecting GCMs

Our aim was to consider a range of plausible futures for the Chesapeake Bay, using GCMs from the 5th Coupled Model Intercomparison Project (CMIP5). All GCMs (n = 33) available which included 2-m air temperature and total precipitation were assessed for inclusion. Model output was accessed through the NOAA Climate Change Web Portal (Scott et al. 2016), which re-grids GCMs to a common spatial resolution, and models were compared for the Chesapeake Bay and Susquehanna River watershed region (36–43° N, 74–80° W). Late twentieth-century (1956–2005) climatologies of air temperature and precipitation from GCMs were compared to observations from the CAMS temperature dataset and the CPC precipitation analysis for the same region and time period. As model hindcast skill within a region of interest often differs among variables, selecting a subset of “best models” can be difficult (Overland et al. 2011; Sheffield et al. 2013). We instead focused on culling models with the highest hindcast error and then selecting models that encompassed the range of future temperature and precipitation projections for use in our study. Outlier models were defined as those with an annual air temperature error of >2 °C and/or a precipitation error of >400 mm year−1. Seven of the 33 candidate GCMs were removed using these criteria: FGOALS-S2, MIROC-ESM, MIROC-ESM-CHEM, CAN-ESM2, and ACCESS1-3 were excluded for warm temperature bias (>2 °C) versus observations. FGOALS-G2 was too cool, and rainfall in CMCC-CESM was too high. Adjusting total observed rainfall upwards with the snow correction described above did not change the model selections.
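
The culling rule is a simple threshold test on hindcast error. The sketch below assumes the errors are signed differences between each model's climatology and observations, so absolute values are compared against the thresholds; the model names and values in any usage would be supplied by the analyst.

```python
def is_outlier(temp_error_c, precip_error_mm_yr,
               max_temp_error=2.0, max_precip_error=400.0):
    """True if either hindcast error exceeds its threshold."""
    return (abs(temp_error_c) > max_temp_error
            or abs(precip_error_mm_yr) > max_precip_error)

def retain_models(errors):
    """errors maps model name -> (temperature error, precipitation error)."""
    return sorted(name for name, (t_err, p_err) in errors.items()
                  if not is_outlier(t_err, p_err))
```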

The spread of future projections from the remaining models highlighted a broad range of potential futures for the region under RCP8.5 (Fig. 5). All GCMs projected warming temperatures, and some increase in mean annual precipitation in the Susquehanna River watershed. The extent of these trends, however, varied considerably among models. We chose four GCMs to capture the range of potential futures: GFDL-CM3 (more warming, greater precipitation increase: hereafter WW (warm wet) model), IPSL-CM5A-LR (more warming, less precipitation increase: hereafter WD (warm dry) model), GFDL-ESM2G (less warming, less precipitation increase: hereafter CD (cool dry) model), and MRI-CGCM-3 (less warming, greater precipitation increase: hereafter CW (cool wet) model) (Fig. 5). Each of these GCMs had a spatial resolution of >90 km over the study area. While INMCM4 and FIO-ESM were both more extreme examples of cooler, drier GCMs than GFDL-ESM2G, they had very strong seasonal bias compared to historical observations (winter several degrees too warm, summer in INMCM4 also too cold). This bias persisted after downscaling, and so we selected ESM2G instead.

Fig. 5 Two-meter air temperature and total precipitation anomalies for the Susquehanna River watershed from 26 GCMs under RCP8.5: 1956–2005 versus 2050–2099. The ensemble mean from all GCMs is shown in black and extended to both axes with the black dashed line. The four GCMs chosen to represent the range of potential futures for the Chesapeake Bay are labeled (Color figure online)

Daily air temperature and precipitation were extracted from each GCM, at all grid points contained within the Susquehanna River watershed (Fig. 1). In addition, air temperature was extracted for the closest grid point to the Thomas Point buoy. Where several grid points were close to this location, the one where the mean late twentieth-century seasonal air temperature cycle most closely resembled that observed at Thomas Point (1985–2000) was chosen.

Statistical Downscaling

Statistical downscaling was used to derive (a) estimates of Thomas Point air temperature and (b) estimates of air temperature and precipitation across the Susquehanna River watershed from coarse resolution GCM data. We note that the latter case involves deriving anomalies across a scale that can include several GCM grid cells but which nonetheless remains poorly resolved by GCMs.

Four statistical downscaling methods were applied to outputs from the four GCMs: bias-corrected quantile mapping (BCQM), change factor quantile mapping (CFQM), equidistant quantile mapping (EDQM), and the cumulative distribution function transform (CDFt). These statistical models are all based on quantile mapping but differ in their implementation of the bias correction step. We acknowledge that there are more sophisticated ways in which to derive quantile mapping functions (e.g., Cannon 2011) and that there are many more statistical downscaling techniques that also could have been used (e.g., Zorita and von Storch 1999; Wilby et al. 2002, 2003; Fasbender and Ouarda 2010; Zeng et al. 2011; Gaitán et al. 2014). However, we chose to include the selected methods, which are conceptually simple and relatively low cost, to demonstrate the flexibility of our modeling framework. Our approach also investigates whether the downscaled projections from these related methods may differ, with implications for the use of multiple downscaling methods for impact assessment (Gaitán 2016).

Of the aforementioned quantile methods, the most commonly employed is the BCQM (Ho et al. 2012). It derives separate bias corrections at each position (quantile) within the cumulative distribution function (CDF). In essence, the bias, defined as the difference between the observations and climate model during the historical period, is used as a correction factor for the model output during the future. The second method, the CFQM (see Ho et al. 2012 for details), is much less commonly used in practice. In essence, the change in the model from the historical to the future period is applied to the historical observations, again separately at each position in the CDF.
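
A minimal sketch of the two approaches follows, assuming the general quantile mapping formulations given by Ho et al. (2012): BCQM passes each future model value through the historical model CDF and reads off the matching observed quantile, while CFQM passes each historical observation through the historical model CDF and reads off the matching future model quantile. The empirical CDF and quantile helpers use simple linear interpolation and are illustrative choices, not the implementation used in the study.

```python
import bisect

def ecdf(sample):
    """Empirical CDF with linear interpolation; returns values in [0, 1]."""
    s = sorted(sample)
    n = len(s)
    def cdf(x):
        if x <= s[0]:
            return 0.0
        if x >= s[-1]:
            return 1.0
        hi = bisect.bisect_right(s, x)   # s[hi - 1] <= x < s[hi]
        lo = hi - 1
        frac = (x - s[lo]) / (s[hi] - s[lo])
        return (lo + frac) / (n - 1)
    return cdf

def quantile_fn(sample):
    """Inverse of ecdf: maps a probability in [0, 1] back to a value."""
    s = sorted(sample)
    n = len(s)
    def quantile(p):
        pos = min(max(p, 0.0), 1.0) * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])
    return quantile

def bcqm(obs_hist, gcm_hist, gcm_future):
    """Bias correction: future model values mapped onto observed quantiles."""
    cdf_gcm_hist = ecdf(gcm_hist)
    q_obs = quantile_fn(obs_hist)
    return [q_obs(cdf_gcm_hist(x)) for x in gcm_future]

def cfqm(obs_hist, gcm_hist, gcm_future):
    """Change factor: historical observations mapped onto future model quantiles."""
    cdf_gcm_hist = ecdf(gcm_hist)
    q_gcm_future = quantile_fn(gcm_future)
    return [q_gcm_future(cdf_gcm_hist(o)) for o in obs_hist]
```

Note that values falling outside the range of the historical model sample are clipped to the lowest or highest quantile in this sketch, which is one reason tail behavior needs separate attention in practice.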

Since there is no obvious theoretical argument to favor either of the previous two strategies, Li et al. (2010) introduced the equidistant cumulative distribution function matching method (EDQM), combining aspects of the CFQM and BCQM methodologies. A criticism of the BCQM method is that it assumes that the bias computed for the historical period is applicable to the future (Li et al. 2010). Conversely, one could criticize the CFQM approach since it assumes that the change factor computed for the climate model is equally applicable to the observations. The EDQM method deals with these issues by applying a correction that consists of two terms: one for bias correction and the other as a change factor.
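
For reference, the equidistant correction of Li et al. (2010) is commonly written as follows. The notation here is adapted to the subscripts used for the CDFt method in this section (obsH: historical observations; gcmH and gcmF: historical and future GCM output), and the symbol x_corr for the corrected value is introduced only for this illustration:

$$ {x}_{\mathrm{corr}}={F}_{\mathrm{obsH}}^{-1}\left({F}_{\mathrm{gcmF}}(x)\right)+\left[x-{F}_{\mathrm{gcmH}}^{-1}\left({F}_{\mathrm{gcmF}}(x)\right)\right] $$

The first term is the observed value at the quantile that x occupies in the future GCM distribution, and the bracketed term is the modeled change at that same quantile.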

The fourth method, cumulative distribution function transform (CDFt; Michelangeli et al. 2009), uses the following function to obtain the CDF of the downscaled variable:

$$ F_{\mathrm{obsF}}(x)=F_{\mathrm{obsH}}\left(F_{\mathrm{gcmH}}^{-1}\left(F_{\mathrm{gcmF}}(x)\right)\right) $$

This method uses a series of post-processing refinements to improve the tail behavior of the projections (unlike the raw BCQM method, described above). Specifically, the time series from the coarse resolution GCMs are transformed to have the same mean as the local historical time series, thus preventing the downscaled projections from getting out of range when implementing the transform equation. Hence, the final output maintains the initial mean from the corresponding GCM output (historical or future).

We used the CDFt R package (Vrac and Michelangeli 2009) to obtain the downscaled time series. This implementation needs two parameters to be defined: npas and dev. The coefficient of development, dev, is used to extend the range of data on which the quantiles will be calculated, while npas is the number of quantiles to be empirically estimated (default value is 100). Here, we used dev = 1 and the default value of npas.
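As a rough illustration of the CDFt transform equation (and not of the CDFt R package itself), the composition of CDFs can be sketched as follows. The helper names and the synthetic series are assumptions introduced here, and the mean-shifting refinement and the npas and dev settings described above are deliberately omitted.

```python
from bisect import bisect_right


def ecdf(sample, x):
    """Empirical CDF of `sample` at x, kept strictly inside (0, 1)."""
    s = sorted(sample)
    return (bisect_right(s, x) + 0.5) / (len(s) + 1)


def quantile(sample, p):
    """Empirical quantile using linear interpolation between order statistics."""
    s = sorted(sample)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def cdft_cdf(x, obs_hist, gcm_hist, gcm_future):
    """Estimate F_obsF(x) = F_obsH(F_gcmH^{-1}(F_gcmF(x)))."""
    p_future = ecdf(gcm_future, x)          # F_gcmF(x)
    x_equiv = quantile(gcm_hist, p_future)  # F_gcmH^{-1}(...)
    return ecdf(obs_hist, x_equiv)          # F_obsH(...)


# Synthetic series (assumed values, for illustration only).
obs_hist = [float(v) for v in range(10)]
gcm_hist = [v + 2.0 for v in obs_hist]
gcm_future = [v + 3.0 for v in gcm_hist]

print(cdft_cdf(8.0, obs_hist, gcm_hist, gcm_future))
```

When the future and historical model series are identical, this sketch returns the historical observed CDF unchanged, which is the behavior the equation implies.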

Each grid point in the Susquehanna River watershed from the CPC precipitation analysis (n = 119, 1970–2005) was assigned to the closest grid point for each GCM (n = 2–6: Fig. 1). A mean of all CPC grid points assigned to each GCM point was then used as the historical precipitation observations for downscaling. Daily air temperature observations were obtained from eight weather stations for 1970–2005. Similarly to precipitation, each weather station was assigned to the closest grid point for each GCM and a mean taken to provide historical air temperature observations for downscaling. Air temperatures at Thomas Point were obtained from only one grid point from each GCM, and were downscaled using historical observed air temperatures at the Thomas Point buoy (1985–2015).
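The assignment of fine-scale observation points to their closest GCM grid point, followed by averaging, can be sketched as below. The great-circle distance metric, the function names, and the coordinates and values are all assumptions made for illustration; the distance measure actually used is not specified in the text.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def mean_by_nearest_gcm_point(fine_points, gcm_points):
    """Assign each (lat, lon, value) observation to its nearest GCM grid point
    and return {gcm_index: mean of assigned values}."""
    groups = defaultdict(list)
    for lat, lon, value in fine_points:
        nearest = min(
            range(len(gcm_points)),
            key=lambda i: haversine_km(lat, lon, gcm_points[i][0], gcm_points[i][1]),
        )
        groups[nearest].append(value)
    return {i: sum(vals) / len(vals) for i, vals in groups.items()}


# Hypothetical GCM grid points (lat, lon) and observation points (lat, lon, value).
gcm_points = [(40.0, -77.0), (42.0, -76.0)]
fine_points = [(40.1, -77.1, 2.0), (40.3, -76.8, 4.0), (41.9, -76.1, 6.0)]

print(mean_by_nearest_gcm_point(fine_points, gcm_points))
```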

Temperature and precipitation outputs from statistically downscaled GCMs were then run through the framework shown in Fig. 2a, to give projections of surface temperature and salinity across the Chesapeake Bay. As the water balance model ran at monthly resolution, all predictor variables input to the model trees were also aggregated to month and year before this analysis. Results thus provided projections of surface temperature and salinity at monthly resolution. Model outputs were compared between the late twentieth century (1970–1999) and the late twenty-first century (2071–2100), to show possible temperature and salinity futures from different GCMs.
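A minimal sketch of the aggregation of daily values to monthly resolution is given below. The function names are hypothetical, and the choice of reducer (a mean for temperature, a sum for precipitation) is an assumption rather than a detail taken from the study.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate_monthly(daily, reducer):
    """Group (date, value) pairs by (year, month) and reduce each group."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    return {key: reducer(values) for key, values in sorted(buckets.items())}


def mean(values):
    return sum(values) / len(values)


# Synthetic daily series (assumed values): 1.0 for each day of January 2000,
# 2.0 for each day of February 2000.
start = date(2000, 1, 1)
daily = [(start + timedelta(days=i), 1.0 if i < 31 else 2.0) for i in range(60)]

monthly_mean = aggregate_monthly(daily, mean)   # e.g., air temperature
monthly_total = aggregate_monthly(daily, sum)   # e.g., precipitation
print(monthly_mean, monthly_total)
```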

Results

Spatial Surface Temperature and Salinity Estimates

Both the surface temperature and surface salinity models reproduced historical spatiotemporal variability across the Chesapeake Bay reasonably well, despite the dynamical complexity of the system. The predictive power of the surface temperature model on monthly anomalies (using only the out-of-sample test years 2006–2015) was highest in the Lower Bay (i.e., most seaward: R² = 0.74) and James River (R² = 0.77) and lowest in the Rappahannock River (R² = 0.62) (Table 2). RMSE averaged 1.32 °C, and was highest in the Rappahannock River (1.51 °C) and lowest in the Lower Bay (1.14 °C). The most important variables for the conditional splits in the model tree were the modeled water temperature at Thomas Point (75%) and the seasonal (30 day) change in air temperature (71%). These two variables were also most important to the terminal linear models, with scores of 100 and 76%, respectively.

Table 2 R² values between observed and modeled surface temperature (monthly anomalies) and surface salinity in seven regions of the Chesapeake Bay, from the unseen test data not used to build the predictive model trees

The surface salinity model gave the best results in the Upper Bay (R² = 0.76), and was weakest in the Lower Bay and the Patuxent River (R² = 0.55 and R² = 0.42, respectively) (Table 2). RMSE averaged 2.21, and was highest in the James River (3.00) and lowest in the Upper Bay (1.80). Absolute RMSE in the Potomac River was comparable to other regions (2.02), but constituted the highest percentage of observed mean annual salinity (49.5%) (Table 2). The most important variables for the conditional splits in the model tree were latitude (97%), longitude (90%), and streamflow (41%). These three variables were also most influential for the terminal linear models, with scores of 97, 93, and 97%, respectively.
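The skill statistics reported above can be computed from paired observed and modeled values as in the following sketch. It assumes that R² denotes the squared Pearson correlation between observed and modeled values, which the text does not state explicitly; the function names and the example numbers are hypothetical.

```python
import math


def r_squared(obs, mod):
    """Squared Pearson correlation between observed and modeled values
    (an assumed definition of the R² reported in the text)."""
    n = len(obs)
    mean_o, mean_m = sum(obs) / n, sum(mod) / n
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(obs, mod))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_m = sum((m - mean_m) ** 2 for m in mod)
    return cov * cov / (var_o * var_m)


def rmse(obs, mod):
    """Root mean square error between observed and modeled values."""
    return math.sqrt(sum((o - m) ** 2 for o, m in zip(obs, mod)) / len(obs))


def rmse_percent_of_mean(obs, mod):
    """RMSE expressed as a percentage of the observed mean."""
    return 100.0 * rmse(obs, mod) / (sum(obs) / len(obs))


# Hypothetical paired values from a held-out test period.
observed = [1.0, 2.0, 3.0, 4.0]
modeled = [2.0, 3.0, 4.0, 5.0]

print(r_squared(observed, modeled), rmse(observed, modeled),
      rmse_percent_of_mean(observed, modeled))
```

The example shows why both statistics are reported: a model with a constant offset has a perfect R² but a non-zero RMSE, and the same absolute RMSE is a larger share of the mean in low-salinity regions.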

Example time series of monthly observed and modeled surface temperature (anomalies) and surface salinity for the Upper Bay, Lower Bay, and Potomac River show that the model trees generally tracked observed values well (Fig. 6). As expected, model skill was high during the 1986–2005 training period, and degraded somewhat in the unseen 2006–2015 testing period. However, R² statistics between observed and predicted surface temperature anomalies stayed above 0.6 for the test period in all zones of the Chesapeake Bay. Monthly time series of surface salinity in the test period were also reasonably well represented in the James River, Mid Bay, Potomac River, Rappahannock River, and Upper Bay (R² > 0.6). There was a more marked degradation of skill between training and testing time periods in the Lower Bay (R² = 0.55) and Patuxent River (R² = 0.42) (Table 2). Some point locations in the bay were sampled repeatedly over the 30-year time series, and contained >500 observations. The skill of the models on unseen test data was similar whether the whole zone (e.g., Mid Bay) was aggregated together or if these long-term station locations within each zone were assessed separately. For example, a station at 76.292° W, 38.319° N in the Mid Bay (CB5.1: Fig. 6) with 634 observations in the dataset had an out-of-model validation R² between observed and modeled surface salinity of 0.61: the same value for the Mid Bay as a whole (Table 2). Similarly, a station at 76.602° W, 38.425° N with 564 observations in the Patuxent River (LE1.1) had an R² between observed and modeled surface salinity of 0.40, whereas the value for the river as a whole was 0.42. The models were thus capturing seasonal and interannual variability at point locations in the Chesapeake Bay with acceptable skill and not simply reflecting (for example) climatological salinity gradients within specific zones.

Fig. 6

Observed and modeled monthly surface temperature anomaly and surface salinity in the Upper and Lower Chesapeake Bay and Potomac River, 1986–2015. Results from one well-sampled station in the Mid Bay are also shown (station CB5.1). Time series and R² statistics are shown for the dataset used to train the models (1986–2005) and the out-of-sample “test” dataset (2006–2015)

To further assess the ability of the model trees to capture not only climatological patterns, but also anomalous years, spatial fields of both temperature and salinity were compared for a cool September (2011) vs. a warm September (2008) and a dry September (2010) vs. a wet September (2011) (Fig. 7). September was chosen as it was a generally well-sampled month across the time series, both spatially and temporally. We note that September 2011 was particularly wet due to the effects of Tropical Storm Lee (Cheng et al. 2013). Hindcasts from the model trees captured not only the higher surface temperatures during a warm year (Fig. 7c, d) and lower salinities during a wet year (Fig. 7g, h), but also the general spatial patterns of these phenomena. In particular, both observations and models highlighted the downriver and down-bay movement of isohalines during times of high Susquehanna River flow (Fig. 7g, h). The surface temperature predictions were somewhat more biased than those for surface salinity, with the western rivers showing a cool bias in 2011 and the Upper Bay showing a warm bias in 2008 (Fig. 7a–d). However, the overall spatial structure was reproduced reasonably well.

Fig. 7

Observed and modeled September surface temperature and salinity for a warm year (2008) versus a cool year (2011) and a wet year (2011) versus a dry year (2010). Results are interpolated (kriging) between CTD station locations (shown in black). a Sept. 2011: observed. b Sept. 2011: modeled. c Sept. 2008: observed. d Sept. 2008: modeled. e Sept. 2010: observed. f Sept. 2010: modeled. g Sept. 2011: observed. h Sept. 2011: modeled (Color figure online)

Projected Atmospheric Temperature and Precipitation Changes

Comparison of the four statistical downscaling methods used to link projected GCM-scale changes in air temperature and precipitation to the Chesapeake Bay showed that downscaled changes were similar in magnitude to those from the corresponding GCMs (2–5.5 °C, ~0–200 mm year⁻¹ rainfall) (Fig. 8). However, overall trends and separation among the four GCMs were less clear for precipitation, which showed much stronger interannual variability than temperature. In addition, the contribution of downscaling methods to variability in projections was much less than the contribution of inter-GCM variability, for the four GCMs examined here (Fig. 8). Choice of downscaling method, however, could exert a significant impact by the end of the twenty-first century for GCMs with the largest projected warming. The mean difference between the warmest and coolest downscaling methods for 2071–2100 was 0.7 °C for the WW model and 0.8 °C for the WD model (Fig. 8a).

Fig. 8

a, b Projections of mean Susquehanna River watershed 2-m air temperature and total precipitation from four GCMs under RCP8.5, using four statistical downscaling methods. c, d Ten-year moving means of air temperature and precipitation from each GCM, with the overall range from a and b shown in gray (Color figure online)

Surface Temperature and Salinity Futures for the Chesapeake Bay

Annual mean projections of bay-wide averages of surface temperature showed a clear warming trend similar to that of air temperature (Fig. 9a). Projections of surface salinity and Susquehanna River streamflow were more variable; however, the two wetter models (WW and CW) showed little change to annual mean salinity or streamflow by the end of the century. In contrast, the two dry models (WD and CD) projected significantly decreased mean annual streamflow and thus increasing surface salinity (Fig. 9b, c). When this analysis was repeated for each season (graph not shown), the CW model also showed a significant decrease in winter salinity and increase in streamflow, by 2100 (linear regression, p < 0.05, Durbin-Watson test p > 0.05).
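The trend analysis mentioned above pairs a linear regression with a Durbin-Watson check for autocorrelated residuals. The sketch below computes only the least-squares slope and the Durbin-Watson statistic; it does not compute the p-values reported in the text, which would require a statistical library. Function names and data are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest little first-order
    autocorrelation; values near 0 or 4 suggest positive or negative
    autocorrelation, respectively."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical annual series with an upward trend and alternating noise.
years = [0, 1, 2, 3, 4, 5]
values = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]

slope, intercept = linear_fit(years, values)
residuals = [v - (intercept + slope * t) for t, v in zip(years, values)]
print(slope, durbin_watson(residuals))
```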

Fig. 9

Ten-year moving means of modeled a surface water temperature, b surface salinity, and c Susquehanna River flow at Conowingo Dam from four GCMs under RCP8.5. A mean of the four statistical downscaling methods is shown, with overall range in gray. a, b Means across the entire Chesapeake Bay (Color figure online)

Projections from each GCM averaged across all downscaling methods showed warming of Chesapeake Bay surface waters during all seasons by the last 30 years of the twenty-first century (Fig. 10). However, the extent of overall warming and its seasonal distribution varied among models. The WW model projected the strongest warming in estuarine water temperatures between 1970–1999 and 2071–2100: >5 °C in all seasons. Warming in this model was particularly strong in winter (January–March) and fall (October–November), when compared to the other GCMs. In contrast, the WD model showed the weakest warming in winter: 2.7 °C between 1970–1999 and 2071–2100 vs. >4 °C in all other seasons. The CW model projected the weakest warming overall: from 2.3 °C in winter to 3.0 °C in fall by the end of the century. Both the WW and WD models projected mean summer (July–September) surface temperatures across the bay of >30 °C by 2071–2100, compared to a historical (1970–1999) mean of 25.5–26 °C. In contrast, the CW model projected mean summer water temperatures of 28.0 °C for 2071–2100 and the CD model 28.9 °C. The strong winter warming in the WW model resulted in projections of mean winter surface temperatures of 11.5 °C, much warmer than for 1970–1999, where the mean was 6.3 °C. In contrast, the CW, WD, and CD models projected winter surface temperatures of 8.7–9.0 °C by the end of the century (Fig. 10). Projections of winter warming and streamflow changes were more variable than those for summer, across all models.

Fig. 10

Mean monthly surface temperature, Susquehanna River streamflow, and surface salinity for 1970–1999 (black) and 2071–2100 (red) under RCP8.5, from each GCM, averaged across all downscaling methods. Mean values for each time period are shown in bold lines; thin lines represent ±one standard deviation among the 30 years in each time period (Color figure online)

Warming in the WW and WD models resulted in projected conditions well outside the range of historical variability during summer. For example, the maximum observed surface water temperature at the Thomas Point buoy (1985–2015) was 29.9 °C, recorded on August 4, 2006. By the end of the twenty-first century (2071–2100), this value was projected to be exceeded on 63% of days in July, in both the WW and WD models. For August, this value was exceeded on 100% of days in the WW model and 87% of days in the WD model. In contrast, surface temperature at Thomas Point was projected to exceed 29.9 °C on only 14% of days in August in the CW model by the end of the century and 37.6% of days in the CD model. Thus, summer conditions in the Chesapeake Bay were projected to be strongly novel in the WW and WD models, but less so in the CW and CD models.
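The exceedance percentages above amount to counting the days on which a projected temperature is above the historical maximum. A minimal sketch follows; the function name and the daily values are hypothetical, and only the 29.9 °C threshold is taken from the text.

```python
def percent_days_exceeding(daily_temps, threshold):
    """Percentage of days with temperature strictly above `threshold`."""
    exceeding = sum(1 for t in daily_temps if t > threshold)
    return 100.0 * exceeding / len(daily_temps)


HISTORICAL_MAX_C = 29.9  # maximum observed at the Thomas Point buoy, 1985-2015

# Hypothetical projected daily surface temperatures (°C) for one month.
projected = [29.0, 30.0, 31.0, 29.9]

print(percent_days_exceeding(projected, HISTORICAL_MAX_C))
```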

Changes in streamflow varied strongly among models. While the wetter WW and CW models projected increases in late winter/early spring streamflow from the Susquehanna River and stable mean flows during other seasons, the WD model projected decreased streamflow across most seasons by the end of the century (Fig. 10). This was despite modest increases in precipitation, and was due to increased evapotranspiration associated with warming atmospheric conditions. The different models were in much closer agreement for summer, with all showing either stable or slightly decreasing streamflow. Surface salinity changes reflected differences in streamflow patterns across models. Changes were close to neutral for much of the spring through fall in all models. Projections diverged, however, in the winter and early spring, with more saline conditions prevailing for the drier WD model and fresher conditions for the WW and CW models.

Overall, choice of statistical downscaling method contributed less to variability in projections than choice of GCM (Fig. 8). However, a comparison of the coolest (BCQM) and warmest (CFQM) methods for the WW model highlighted considerable discrepancies in projections at certain times of year (Fig. 11). The CFQM method was 1.4 °C warmer than the BCQM method during the summer, and Susquehanna River streamflow was ~200 m³ s⁻¹ less during winter-early spring (January–April). Differences in salinity between the two downscaling methods were small, with slightly lower salinity for BCQM.

Fig. 11

a, c, e Mean monthly surface temperature, Susquehanna River streamflow, and surface salinity for 1970–1999 (black) and 2071–2100 (colored) under RCP8.5 for the WW model only, for the BCQM (blue) and CFQM (pink) statistical downscaling methods. Mean values for each time period are shown in bold lines; thin lines represent ±one standard deviation (omitted from future projections for clarity). b, d, f As for a, c, e, but only future change (2071–2100 minus 1970–1999) is shown, to highlight differences between downscaling methods (Color figure online)

Projected increases in surface temperature were largely spatially coherent across the Chesapeake Bay, but some spatial structure was apparent. We focus on the summer period, when the Chesapeake Bay is most likely to experience warm conditions beyond previously recorded highs, which could stress the physiological limits of contemporary marine communities. Results from other seasons are shown in the Supplementary Material (Fig. S1). In each case, the two GCMs with the smallest and largest changes in temperature are shown. Reduced warming was associated with the southern end of the Chesapeake Bay, due to the moderating influence of continental shelf waters (Fig. 12). Maximum warming was associated with the Upper Bay, particularly in the WW model.

Fig. 12

Projected changes in surface temperature in the Chesapeake Bay from the WW (a) and CW models (b) during summer: 2071–2100 minus 1970–1999, averaged across all downscaling methods. The two GCMs shown had the smallest and largest changes in temperature of the four considered. Results are shown on a common scale (4.1 °C range, a, b), to highlight the difference between the two models, and on a model-specific scale (1.5 °C range, c, d), to highlight spatial structure

Stronger spatial structure was apparent in the projected changes in surface salinity. We focus on the winter period (January–March) where projected changes in streamflow were highest, as were contrasts between the models. Results from other seasons are shown in the Supplementary Material (Fig. S2). In each case, the two GCMs with the smallest and largest changes in salinity are shown. Changes in the WW model were strongest in the Upper Bay and in mid-low reaches of some western rivers. These are currently transition zones between oligohaline and mesohaline waters, or mesohaline and polyhaline waters, and salinity decreases in these areas represent mean downstream movement of isohalines. The weakest salinity change was in upstream portions of rivers, where conditions are currently tidal fresh to oligohaline, and so increasing streamflow cannot decrease salinity substantially (Fig. 13). In contrast, the decreased streamflow in the WD model leads to projected increases in salinity throughout the bay. Similarly to the WW model, changes were strongest in the Upper Bay and the mid reaches of some western rivers, representing a particularly large percentage change given current low salinities in these areas. Weakest changes in salinity were projected for tidal fresh to oligohaline reaches of rivers, which were expected to stay largely fresh despite reduced streamflow.

Fig. 13

Projected changes in surface salinity in the Chesapeake Bay from the a WW and b WD models during winter: 2071–2100 minus 1970–1999. The two GCMs shown had the smallest and largest changes in salinity of the four considered

Discussion

Modeling Framework: Uncertainty and Complexity

A key challenge for understanding potential climate change impacts on natural resources and environments is the development of projections at the appropriate spatial scale. GCMs generally have coarse spatial resolution and inherent bias in the simulation of important processes (Wood et al. 2004; Xu and Yang 2012). Downscaling using either dynamical or statistical techniques usually improves projections, but each family of methods is subject to advantages and disadvantages. Dynamical downscaling has the advantage of explicitly and mechanistically representing physical processes controlling regional climate (Hellström et al. 2001). However, it is computationally expensive, which makes comparison of multiple GCMs and emission scenarios more difficult. In addition, the dynamically downscaled model will usually inherit the bias of the parent GCM, and addressing this issue is complex (Xu and Yang 2012). This is an important consideration for projections involving hydrological simulations, which are sensitive to bias in both the mean and spatial distribution of watershed properties (Wood et al. 2004).

Statistical downscaling has the disadvantage of relying on empirical relationships between coarse- and fine-scale processes and assuming that these relationships will remain valid when projected into the future (i.e., stationarity: Schmith 2008; Michelangeli et al. 2009; Cannon 2010; Kallache et al. 2011; Gaitán and Cannon 2013; Dixon et al. 2016). However, advantages include inherent bias correction and a much lower computational cost than dynamical downscaling. As a result, the statistical framework presented here allowed consideration of a range of projected surface temperature and salinity futures for the Chesapeake Bay under climate change. While relatively simple, our approach was able to reproduce historical conditions well at monthly resolution, and facilitated the easy and rapid comparison of multiple GCMs and statistical downscaling methods. To establish the methodology, here we tested four GCMs spanning the range of plausible future temperature and precipitation projections from CMIP5 and one CO2 concentration pathway; however, the statistical framework could easily ingest outputs from other models and scenarios.

We build on previous studies of potential climate change impacts to the Chesapeake Bay by using statistical model trees to consider both surface temperature and salinity in two dimensions. Although the use of one-dimensional air temperature as a proxy for water temperature (e.g., Pilgrim et al. 1998; Hare and Able 2007; Tisseuil et al. 2012; Jacobs et al. 2015) is a reasonable strategy in shallow estuarine environments, the more complex approach used in this study confers several advantages. Firstly, non-linearities in the air-temperature vs. water-temperature relationship could be accounted for, in a way that allowed for future extrapolation. Secondly, the influence of streamflow on surface water temperature could be included, although this effect was minor compared to that of air temperature in the Chesapeake Bay. This may not be the case in other estuaries more influenced by snowmelt, however. More importantly, projections of Chesapeake Bay salinity changes exhibited spatial contrasts within the bay, which have not been previously estimated, though changes in estuarine salinity have effects on marine resources comparable to those imposed by more often emphasized temperature shifts (Rome et al. 2005; Constantin de Magny et al. 2009; Jacobs et al. 2014, see additional discussion below).

While more complex than a linear air temperature proxy model, our approach was much less complicated than many other examples in the literature. Our water balance model was one of the simplest available, and did not include complex soil dynamics, flow routing, nutrient and sediment dynamics, or groundwater inflow (e.g., Hayhoe et al. 2007; Chen et al. 2014; Demaria et al. 2016; Lee et al. 2016), which may have led to an overestimation of watershed evapotranspiration (Milly and Dunne 2011) and an inability to capture extreme flow events (e.g., hurricanes, floods). We also did not consider hydrodynamics of the estuary in three dimensions, which usually requires additional parameters, such as wind fields (Gibson and Najjar 2000; Valle-Levinson et al. 2001; Xu et al. 2012; Lee et al. 2013; Irby et al. 2016). Considering the effects of sea level rise on temperature and salinity in the Chesapeake Bay was also beyond the scope of this study, even though these may be substantial (Hong and Shen 2012; Ross et al. 2015). This was largely because the impacts of sea level rise on estuarine conditions are spatially and temporally complex (e.g., Hilton et al. 2008) and because projections of the future magnitude of sea level rise are extremely divergent and uncertain (Church et al. 2013; Grinsted et al. 2015). Our projections of surface temperature and salinity would also not represent potential changes throughout the entire water column, which are more relevant to organisms which do not live at the surface. Despite these simplifications, our approach was able to reproduce observations in the Chesapeake Bay at a monthly resolution with good accuracy, using only air temperature and watershed precipitation as inputs. However, different or modified models may be required for other watersheds with different characteristics (e.g., smaller, more arid, more influenced by snowmelt etc.). We also note the importance of continued development of dynamical approaches that allow extrapolation beyond surface properties and to biogeochemical impacts such as hypoxia (Bever et al. 2013; Brown et al. 2013; Testa et al. 2014; Feng et al. 2015). Applications of statistical and dynamical tools in concert may maximize the benefits of large ensembles that statistical approaches facilitate, along with the stronger mechanistic linkages enabled by dynamical approaches.

An advantage of the simplicity of our statistical framework was the ability to include projections from multiple GCMs, downscaled using multiple methods. We found that the choice of GCM contributed much more to overall uncertainty than choice of downscaling method. However, this was largely a product of the decision to include GCMs with widely diverging futures, but to use statistical downscaling methods with reasonably similar characteristics. Previous studies (e.g., Wood et al. 2004; Wilby and Harris 2006; Chen et al. 2011; Mandal et al. 2016) have found that the choice of statistical downscaling method can contribute considerable uncertainty to future projections in some systems. However, the suite of GCMs, hydrological models, and statistical downscaling methods differed between each study. Clearly, the relative contribution of model and downscaling methods to projection uncertainty depends on both the method and model selected, and will be region specific (Johnson et al. 2012). A more exhaustive assessment of their relative influence on projection uncertainty will require consideration of a wider range of downscaling methods. Ideally, downscaling approaches would be explicitly considered within a full suite of uncertainty sources impacting projections (Hawkins and Sutton 2011; Cheung et al. 2016).

While in our study the choice of GCM was more influential than the choice of statistical downscaling method, the two warmer models (WW and WD) showed some divergence of projections from different downscaling methods in the later twenty-first century. Results diverged primarily at the tails of the distributions, particularly where future conditions were outside ranges experienced in the historical period (e.g., summer temperatures: Fig. S3). This was primarily a result of the different bias correction procedures used in each technique (see “Methods” section). Many climate change impact studies include only one statistical downscaling method, without consideration of the effect of this choice on overall propagation of error or variance. Our results suggest that the choice of downscaling method can introduce considerable variability once conditions diverge from current observations, even among closely related methods. For example, projected mean annual air temperatures in the Susquehanna River watershed by the end of the twenty-first century in the WD model differed by 0.8 °C between the four statistical downscaling methods. The more conservative EDQM method projected mean temperatures from 2071 to 2100 of 15.4 °C, while the warmer CFQM method projected a mean of 16.2 °C. The other methods (BCQM and CDFt) were intermediate between the two.

Potential Impacts from Projected Temperature and Salinity Changes

Results from this study highlighted several potential changes to surface temperature and salinity in the Chesapeake Bay by the end of the century. All GCMs projected substantial warming under RCP8.5; however, the magnitude of warming between the late twentieth and late twenty-first centuries varied markedly. For example, the WW model projected an increase in summer surface temperatures of >5 °C, while the CW model projected as little as ~2 °C. The models also disagreed on which season would see the strongest warming. In all models, the magnitude of summer warming was much less variable than that for winter. Changes to salinity were even less certain, particularly for winter and fall. While the warmer, drier WD model projected increases in salinity for both these seasons, the WW and CW models showed salinity decreases for winter. In contrast, all models projected weaker changes to salinity for summer.

These findings are broadly consistent with many of those from Najjar et al. (2010), who reviewed potential climate change impacts on the Chesapeake Bay using CMIP3 GCMs. Apart from one outlier (CCSR), a seven-member ensemble also projected a ~2–5.5 °C increase in temperature over the Chesapeake Bay watershed area in this earlier study. However, similarly to the present study, precipitation projections were much more variable, ranging from a ~30% decrease to a ~20% increase within seasons. Variability in freshwater inflow to the Chesapeake Bay is largely driven by precipitation, rather than evapotranspiration (Najjar 1999; Gibson and Najjar 2000; Najjar et al. 2010). Uncertainty in precipitation projections from the GCMs is thus the main driver of uncertainty in projected streamflow and surface salinity within the bay. This uncertainty in projected future precipitation, and thus streamflow, is a common thread in many other studies from the NE USA (Najjar et al. 2010; Johnson et al. 2012) and other locations (Schneider et al. 2013; Bosshard et al. 2014).

The key addition of our study relative to previous GCM syntheses (including Najjar et al. 2010) is inference of the implications of these large-scale changes on spatial temperature and salinity patterns in Chesapeake Bay. These are, to our knowledge, the first spatially resolved salinity and temperature projections for the bay and the first comparison of projection uncertainty at the scale of the estuary itself. For summer warming, results highlight a strong coherent warming signal with uncertainty bounds mimicking the range of projected changes in surface air temperature. Weaker warming was projected for the Lower Bay, potentially due to the moderating influence of nearby continental shelf waters. For salinity, results highlight regions likely subject to larger salinity changes. For the majority of projections showing the potential for altered streamflow during the winter and early spring, mesohaline regions are likely to experience the largest absolute changes in salinity; however, oligohaline areas may experience the largest percentage changes.

The strong temperature and salinity gradients in the Chesapeake Bay result in distinct spatial distributions of resident organisms, driven by species-specific physical tolerances (Atwood et al. 2001; Cotton et al. 2003; Jung and Houde 2003; Kimmel et al. 2006). Climate-driven changes in temperature and salinity will therefore alter spatiotemporal habitat availability, and species may need to change their spatial distributions and migratory patterns if conditions begin to exceed tolerable ranges (Wood et al. 2002; Najjar et al. 2010). For example, some coldwater species such as winter flounder (Pseudopleuronectes americanus) are currently only present in the Chesapeake Bay during cooler months. Laboratory experiments on adults suggest a relatively cool temperature preference of 13–14 °C, and they have been observed to stop feeding at temperatures >23 °C (Olla et al. 1969; Periera et al. 1999). Projections from this study suggest mean spring surface water temperatures of 21.5 °C (CW) to >23 °C (WW) by the end of the century under RCP8.5 (Fig. 9), compared to a recent historical mean of ~18 °C. Under this scenario, if the rest of the water column warms at a comparable rate to surface waters, this species may spend less time in the Chesapeake Bay or eventually be excluded altogether. This is particularly likely if future conditions follow projections suggested by the warmer WW and WD models. While projections from our study show some slight thermal refugia in the Lower Bay and lower James River, most of the Chesapeake Bay is projected to warm substantially. Similarly, juvenile Atlantic sturgeon (Acipenser oxyrhynchus) are physiologically stressed by temperatures of >28 °C in their first few years of life in the bay (Niklitschek and Secor 2005). While recent historical and present-day summer surface temperatures average ~25–26 °C, these may increase to between 27 and 29 °C (CW) and >30 °C (WW) by the end of the century. The differences in projected temperature in the Chesapeake Bay among different climate models could thus encompass the difference between a moderate and potentially lethal change in conditions for some species.

Hales and Able (2001) showed that young-of-the-year black sea bass (Centropristis striata) could not tolerate water temperatures of <2–3 °C. This limit is shared by other species that occur in the Chesapeake Bay: summer flounder (Paralichthys dentatus), striped bass (Morone saxatilis), Atlantic croaker (Micropogonias undulatus), weakfish (Cynoscion regalis), and spot (Leiostomus xanthurus) also appear to have lethal lower temperature limits of ~2–3 °C (Schwartz 1964; Malloy and Targett 1991; Atwood et al. 2001; Lankford and Targett 2001; Rome et al. 2005). In the case of blue crab (Callinectes sapidus), this temperature limit also interacts with salinity, with cold, low-salinity conditions least favorable for survival (Rome et al. 2005; Hines et al. 2010). As a result of these tolerance limits, recruitment variability in some of these species (e.g., Atlantic croaker: Hare and Able 2007) has been directly linked to overwintering mortality of juveniles in estuarine habitats, with warmer winters being more favorable. During the monitoring period of the Thomas Point buoy (1985–2016), mean monthly surface temperatures were <3 °C in 48% of years for January, 54% of years for February, and 3% of years for December. For 2071–2100, projected mean monthly surface temperature at the Thomas Point location fell below 3 °C on only one occasion: a January in the CW GCM. Our results thus suggest that under RCP8.5, even using the most conservative GCM considered, the overwintering mortality restriction on recruitment for many fish species may be completely removed in the Chesapeake Bay.
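The frequency-of-exceedance calculation described above (the percentage of years in which a given month's mean surface temperature fell below 3 °C) can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name and the input layout (one record per year and month) are assumptions, and the temperatures are invented rather than taken from the Thomas Point buoy record.

```python
from collections import defaultdict

def percent_years_below(monthly_means, threshold_c=3.0):
    """Percentage of years, per calendar month, with mean temperature below threshold_c.

    monthly_means: iterable of (year, month, mean_temp_c) records, assumed to
    contain at most one record per (year, month) pair.
    """
    below = defaultdict(int)
    total = defaultdict(int)
    for _year, month, temp_c in monthly_means:
        total[month] += 1
        if temp_c < threshold_c:
            below[month] += 1
    return {month: 100.0 * below[month] / total[month] for month in total}

# Invented monthly means (deg C) for four hypothetical years; NOT buoy data.
toy_records = [
    (2001, 1, 2.1), (2002, 1, 3.8), (2003, 1, 2.9), (2004, 1, 4.2),
    (2001, 2, 1.5), (2002, 2, 2.7), (2003, 2, 3.4), (2004, 2, 2.2),
]

result = percent_years_below(toy_records)
for month in sorted(result):
    print(f"month {month}: {result[month]:.0f}% of years below 3 deg C")
```

Applying the same function to a historical record and to a projected record would give the kind of comparison reported in the text, under the assumption that both are expressed as monthly means at a single location.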

Species that rely on environmental cues for spawning initiation and migration may also shift their phenology in response to changing temperature and flow characteristics. For example, temperature influences movement of striped bass within the Chesapeake Bay, and both temperature and flow regimes drive immigration and emigration of river herring (Alosa aestivalis, Alosa pseudoharengus) in and out of the bay, across multiple life stages (Peer and Miller 2015; Tommasi et al. 2015). Adult spawning activity may be associated with specific conditions most favorable for larval survival: larval striped bass survive best at 15–20 °C, and current spawning activity peaks from April to June, when surface temperatures currently average ~18 °C across the Chesapeake Bay (Rutherford and Houde 1995; Secor and Houde 1995). These temperatures are projected to increase to between 21.4 °C (CW) and 23.3 °C (WW) by the end of the century. To ensure spawning success, striped bass will therefore have to acclimate to warmer temperatures or shift their spawning season to earlier in the year. Spawning activity in some species is also timed to maximize food availability for larvae and to take advantage of seasonal blooms in primary productivity caused by temperature and flow patterns. If the timing of favorable temperatures shifts at a different rate to the timing of favorable feeding conditions, a mismatch may result, with implications for recruitment success (Wood et al. 2002).

Future temperature and salinity changes will also impact important benthic habitats, such as submerged aquatic vegetation (SAV). In the Chesapeake Bay, SAV primarily comprises seagrasses in higher salinity zones and freshwater angiosperms in lower salinity locations (Dennison et al. 1993; Kemp et al. 2004). SAV beds provide essential habitat for multiple life stages of vertebrate and invertebrate organisms in the bay, and also assimilate nutrients and reduce turbidity (Kemp et al. 2005). However, they are highly sensitive to changes in salinity regimes and water quality within the estuary. Future changes to salinity in the bay may shift the spatial distributions of different types of SAV, depending on their physiological tolerances. In addition, any decrease in water quality driven by changing streamflow regimes (e.g., Lee et al. 2016) may reduce light penetration and lead to the loss of SAV beds.

Climate change is likely to influence the abundance and distribution of not just economically and ecologically beneficial species, but also nuisance and pathogenic organisms that create management issues. A good example of this in the Chesapeake Bay is the occurrence of Vibrio spp., which cause potentially severe illness in humans through foodborne and environmental exposure (Ralston et al. 2011). Vibrios are currently most abundant at warmer water temperatures in the Chesapeake Bay and are associated with species-specific salinity ranges (Kaneko and Colwell 1973, 1978; Constantin de Magny et al. 2009; Jacobs et al. 2014, 2015). Laboratory experiments show optimum temperatures for these species of 37–39 °C: much warmer than for the vertebrates described above and much warmer than currently observed water temperatures in the Chesapeake Bay (Kelly 1982; Miles et al. 1997; Sedas 2007). Projections from the statistical framework therefore suggest that Vibrio concentrations in the bay are likely to increase. Jacobs et al. (2014) found that 96% of positive water samples for Vibrio vulnificus were collected at temperatures warmer than 15 °C. Surface temperature in the Chesapeake Bay is currently above this threshold between May and October, on average. By the end of the century, the CW model projected an extension of this time period to April through October, while the WW, WD, and CD models projected favorable conditions for April through November. Projected changes in surface salinity varied among models, but the increasing summer salinity in tributaries projected by the WD and CW models would also likely move Vibrio hotspots upstream.
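The season-length comparison above amounts to finding the calendar months in which mean surface temperature exceeds the 15 °C threshold. A minimal sketch follows; the monthly climatology is invented for illustration, the function name is an assumption, and the uniform 3 °C offset is a deliberate simplification rather than a representation of any of the GCM projections used in this study.

```python
def months_above(monthly_climatology_c, threshold_c=15.0):
    """Return the sorted calendar months whose mean temperature exceeds threshold_c."""
    return [
        month
        for month, temp_c in sorted(monthly_climatology_c.items())
        if temp_c > threshold_c
    ]

# Invented bay-wide mean surface temperatures (deg C) by month; NOT study data.
baseline = {
    1: 3.0, 2: 3.0, 3: 7.0, 4: 13.0, 5: 18.0, 6: 23.0,
    7: 26.0, 8: 26.0, 9: 23.0, 10: 17.0, 11: 12.5, 12: 6.0,
}

# Simplifying assumption: the same warming offset applied to every month.
warming_offset_c = 3.0
warmed = {month: temp_c + warming_offset_c for month, temp_c in baseline.items()}

print("baseline months above 15 deg C:", months_above(baseline))
print("warmed months above 15 deg C:  ", months_above(warmed))
```

With these toy values the favorable window lengthens at both ends of the season. Actual projections would differ by month and by model, so the offset here should be read only as a stand-in.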

In conclusion, we found that an empirical modeling framework using statistically downscaled air temperature and precipitation was able to reproduce monthly historical flow, surface temperature, and surface salinity characteristics in the Chesapeake Bay. While choice of GCM contributed a large amount of uncertainty to future projections, downscaled global climate models suggest a 2–5.5 °C increase in surface water temperatures in the Chesapeake Bay by the end of the century in all seasons. Projections of streamflow were more uncertain, but suggest that flow may increase in the winter and spring and decrease in the fall, with subsequent impacts on surface salinity. These changes have implications for organisms that currently use the bay as feeding, spawning, or nursery habitat, particularly those that are currently approaching their upper thermal limits during summer. In contrast, limits on the recruitment of several species currently imposed by cold winters may be largely removed. There were multiple uncertainties associated with our study, including the simplification or exclusion of important physical and biological processes. However, the results presented here provide a simple starting point for investigation of climate change impacts on spatial characteristics of the Chesapeake Bay and potentially other estuaries around the world.