1 Introduction

Because of its limited water exchange with the open ocean, the semi-enclosed Baltic Sea, located in northern Europe in the transition zone between maritime and continental climates (Fig. 1), has suffered from anthropogenic riverborne nutrient loads caused by intensified agriculture and waste-water discharges since the 1950s (Gustafsson et al. 2012). As a result, the Baltic Sea is today the world's largest marine ecosystem threatened by anthropogenically induced permanent hypoxia (e.g., Conley et al. 2009). A common definition of hypoxic area, which is also used in this study, is the extent of bottom water with oxygen concentrations below 2 mL O₂ L⁻¹.

Fig. 1

Bottom topography of the Baltic Sea and the locations of the monitoring station Gotland Deep (BY15, white square) and the Swedish lightships (black triangles) Sydostbrotten (L1), Finngrundet (L2), Grundkallen (L3), Svenska Björn (L4), Landsort (L5), Kopparstenarna (L6), Falsterborev (L7), Svinbådan (L8) and Fladen (L9). The Baltic proper comprises the Arkona, Bornholm and Gotland sub-basins. An open boundary of the model domain is located in the northern Kattegat (black line). The tide gauge station Landsort is located close to the lightship L5 with the same name

Although many coastal seas worldwide suffer from increasing human impact (Diaz and Rosenberg 2008), the Baltic Sea is especially threatened by hypoxia because of its specific hydrodynamic characteristics. Limited water exchange and strong vertical stratification due to large freshwater supply from a catchment area that is about four times as large as the sea surface of the Baltic Sea cause a freshwater residence time of about 35 years (Meier and Kauker 2003). Because of this long residence time, the response of the Baltic Sea hydrodynamics and biogeochemical cycles to perturbations is slow (e.g., Gustafsson et al. 2012).

In addition, decreasing oxygen concentrations of the bottom water cause increased phosphate fluxes from the sediment, which in turn lead to intensified cyanobacteria blooms and nitrogen fixation and expanding hypoxic bottom areas. Increased nitrogen removal with increasing hypoxia by denitrification might lower the nitrogen (N) to phosphorus (P) ratio further favoring cyanobacteria over other phytoplankton groups. This positive feedback is the so-called “vicious circle” (Vahtera et al. 2007).

Several modeling studies have previously addressed eutrophication (i.e., the excess of nutrients) and deoxygenation due to the decomposition of organic matter both in the water column and in the sediments. For instance, Gustafsson and Omstedt (2009) and Lehmann et al. (2014) used simplified oxygen consumption models to investigate the temporal variability of oxygen depletion during 1958–2006 and 1970–2010, respectively. While Gustafsson and Omstedt (2009) focused on sensitivity experiments with reduced and increased runoff and decreased remineralization rates, Lehmann et al. (2014) calculated the spatial variability of hypoxic area. Neither study considered the change of oxygen consumption over time, and hence neither could simulate centennial changes from pristine to contemporary ecosystem conditions. In this study, ecosystem states prior to 1900 are considered unaffected by elevated, anthropogenic nutrient loads and are called pristine (Savchuk et al. 2008).

Based on the more complex biogeochemical model by Eilola et al. (2009), Väli et al. (2013) investigated the drivers of variations in halocline depth and its impact on hypoxia during 1961–2007. They showed a significant correlation between the depth of the halocline and the oxygen concentration of the bottom water. However, the simulation was too short to address long-term changes in biogeochemical cycling.

In many earlier studies, simulations with Baltic Sea circulation models were too short as well because long records of high-resolution atmospheric forcing data were lacking. Available reanalysis data sets either start after 1948, like the NCEP/NCAR reanalysis data (Kalnay et al. 1996), or suffer from inconsistencies (Krueger et al. 2013). Hence, atmospheric surface fields were reconstructed for the Baltic Sea region based upon historical observations to generate longer data sets, e.g., by Kauker and Meier (2003) (see also Meier and Kauker 2003), Hansson and Gustafsson (2011) and Schenk and Zorita (2012).

Kauker and Meier (2003) used a “redundancy analysis” to reconstruct daily sea level pressure (SLP) and monthly surface air temperature, dew-point temperature, precipitation, and cloud cover over the Baltic Sea region. Predictors were daily SLP at 19 stations and monthly coarse-gridded air temperature and precipitation fields available for the period 1902 to 1998. The second input was a gridded, high-resolution atmospheric data set, based on synoptic stations, which was available since 1970.

Schenk and Zorita (2012) applied regional climate model results and the analog-method to reconstruct atmospheric surface fields for the period 1850–2008 (for details see Sect. 2). The advantage of the analog-method is that the high-resolution data set of variables during the reference period is dynamically consistent. Based on these atmospheric surface fields Gustafsson et al. (2012) performed the first comprehensive reconstruction of the Baltic Sea environment since 1850 by resolving the Baltic Sea into 13 dynamically interconnected and horizontally integrated sub-basins. They found a delayed response in eutrophication indicators to the massive nutrient load increase, an intensification of the pelagic cycling and a shift from spring to summer primary production. Based on the same data sets, Meier et al. (2012b) showed that three different coupled physical-biogeochemical models of varying complexity realistically reproduce deep water oxygen conditions during 1850–2008.

A statistical approach has also been applied to investigate Baltic Sea hypoxia. Carstensen et al. (2014) reconstructed oxygen and stratification conditions for 1898–2012 and found a 10-fold increase of hypoxia, which is primarily linked to the increased nutrient loads from land. However, according to their results, increased respiration due to higher temperatures has also contributed to the worsening of oxygen conditions during the past two decades.

Meier et al. (2017) investigated the impact of historical eustatic sea level rise on hypoxia. They performed a simple sensitivity experiment for 1850–2008 assuming that the global mean sea level was 0.24 m lower compared to 1960 and studied the impact on biogeochemical cycles and hypoxia in the Baltic Sea. They found that the impact of historical global mean sea level rise since 1850 on hypoxic area and on mean dissolved inorganic phosphorus (DIP, phosphate) concentrations was relatively small. The historical global mean sea level rise increased hypoxic area by at most 8%, a value reached in 1961.

Following the approach by Gustafsson et al. (2012) and Meier et al. (2012b, 2017), we aim in this study to disentangle the impact of various drivers on eutrophication and hypoxia on time scales longer than 4 years, i.e., the decadal time scale following Meier and Kauker (2003), using a high-resolution, coupled physical-biogeochemical circulation model for the Baltic Sea. In particular, we focus on the question of whether physical or biogeochemical drivers are more important for changes in nutrient pools, spring and cyanobacteria blooms, Secchi depth, oxygen depletion and hypoxic area since 1850. Physical drivers are, for instance, changes in air temperature, cloudiness, river runoff, wind speed and global mean sea level, whereas biogeochemical drivers are nutrient loads from the atmosphere and land as well as from the North Sea (Fig. 2). Further, we investigate whether the “vicious circle” was affected by changes in the hydrodynamics. This research question is also relevant for other coastal seas with an anthropogenic impact, such as the Chesapeake Bay (e.g., Murphy et al. 2011) or the Gulf of Mexico (e.g., Rabalais et al. 2007).

Fig. 2

Sketch of physical and biogeochemical processes with relevance for climate variability of the Baltic Sea ecosystem. In this study, we focus on the impact of heat fluxes, solar radiation, wind induced mixing, river runoff, riverborne nutrient loads and atmospheric deposition, nutrient fluxes across the lateral boundary towards the North Sea, and global mean sea level on eutrophication and hypoxia in the Baltic Sea

The study is organized as follows. In Sect. 2, the method is explained, including a description of the numerical model and the reconstructed historical atmospheric, hydrological and nutrient load forcing. In addition, observational data sets used for model evaluation and the experimental strategy of one reference and 12 sensitivity simulations are presented. In Sect. 3, results of the long-term hindcast for 1850–2008 are evaluated and the sensitivity experiments are analyzed. Finally, in Sects. 4 and 5, the results are discussed and conclusions of the study are drawn, respectively.

2 Method

2.1 Model description

We used the three-dimensional, coupled physical-biogeochemical circulation model RCO-SCOBI which consists of the physical Rossby Centre Ocean (RCO) model (Meier et al. 2003; Meier 2007) and the Swedish Coastal and Ocean Biogeochemical (SCOBI) model (Eilola et al. 2009; Almroth-Rosell et al. 2011). In RCO, the ocean circulation model is coupled to a Hibler-type sea ice model and the subgrid-scale mixing in the ocean is parameterized using a k-ɛ turbulence closure scheme with flux boundary conditions (Meier 2001). A flux-corrected, monotonicity preserving transport (FCT) scheme is embedded without explicit horizontal diffusion (Meier 2007). The model domain comprises the Baltic Sea and Kattegat with lateral open boundaries in the northern Kattegat (Fig. 1). In case of inflow, temperature, salinity, nutrients (phosphate, nitrate, ammonium) and detritus model results are nudged towards observed climatological profiles and, in case of outflow, a modified Orlanski radiation condition is used (Meier et al. 2003). Horizontal and vertical resolutions amount to 3.7 km and 3 m, respectively.

SCOBI simulates the dynamics of nitrate, ammonium, phosphate, oxygen and hydrogen sulfide (as negative oxygen concentration according to 1 mL H₂S L⁻¹ = −2 mL O₂ L⁻¹), three phytoplankton groups (including nitrogen fixing cyanobacteria), zooplankton and detritus (one pool limited by the Redfield ratio) (Eilola et al. 2009). The sediment contains nutrients in the form of benthic nitrogen and benthic phosphorus. Processes like assimilation, remineralization, nitrogen fixation, nitrification, denitrification, grazing, mortality, excretion, sedimentation, resuspension and burial are considered. Resuspension of organic matter is calculated with the help of a simplified wave model (Almroth-Rosell et al. 2011). In this study, the parameterization of the temperature dependent remineralization from the Ecological Regional Ocean Model (ERGOM) (Neumann et al. 2002; Neumann and Schernewski 2008) was used (see Meier et al. 2012a, b). Further, oxygen consumption was higher than in previous versions because of a re-formulation of oxygen consumption rates through denitrification under anoxic and oxic conditions. Due to the induced changes, burial rates and the critical bottom stress for the calculation of resuspension were re-calibrated.

Fluxes of heat, incoming long- and shortwave radiation, momentum, and matter between atmosphere, ocean and sea ice are parameterized using bulk formulae adapted to the Baltic Sea region (Meier 2002). Inputs to the bulk formulae are state variables of the atmospheric planetary boundary layer like 2 m air temperature, 2 m specific humidity, 10 m wind, cloudiness and mean sea-level pressure, and ocean variables like sea surface temperature (SST), sea surface salinity (SSS), sea ice concentration, albedo, and water and sea ice velocities. In the sensitivity experiments, changing air temperature requires changing specific humidity because both quantities are connected via the Clausius-Clapeyron equation. As described by the bulk formulae, the relationship between air-sea fluxes and state variables is highly nonlinear. For instance, 10 m wind speed affects the momentum fluxes for mean currents and turbulent kinetic energy as well as the sensible and latent heat fluxes.

RCO-SCOBI has previously been evaluated and applied in numerous long-term studies (e.g., Meier 2005, 2007; Eilola et al. 2011; Meier et al. 2003). In the following, we describe the reconstructed atmospheric, hydrological, nutrient load, and lateral boundary forcing data for a historical simulation 1850–2008 and a set of sensitivity experiments.

2.2 Atmospheric forcing

Using regionalized reanalysis data for 1958–2007 together with historical station data of daily sea-level pressure and monthly air temperature observations, multivariate three-hourly, High RESolution Atmospheric Forcing Fields (HiResAFF) for the period 1850–2008 were reconstructed based upon the analog-method (Schenk and Zorita 2012). This technique consists of searching for the atmospheric surface fields that are most similar to the historical observations in a library of predictands from the calibration period 1958–2007. The predictands or analogs are multivariate atmospheric forcing fields of 2 m air temperature, 2 m specific humidity, 10 m wind, precipitation, total cloud cover and mean sea-level pressure taken from the Rossby Centre Atmosphere Ocean model, RCAO (Döscher et al. 2002), with a horizontal resolution of 0.25° × 0.25° (~25 km) interpolated onto a regular geographical grid. RCAO covers northern Europe and was driven by ERA40 reanalysis data at the lateral boundaries. Due to shortcomings in the heat fluxes, mean monthly air temperature fields were taken from an atmosphere-only simulation with RCA3 driven with observed SSTs (Samuelsson et al. 2011). Although the applied analog-method identifies only daily-mean atmospheric fields (in case of air temperature linearly interpolated), appropriate variables from the RCAO simulation with 3-hourly resolution were used instead of daily averages. Unlike other applications of RCA3 (Höglund et al. 2009; Meier et al. 2011b), the 10 m wind speed was not corrected.
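The core search step of an analog method can be sketched as a nearest-neighbour lookup. The following is a minimal illustration only: the Euclidean distance, the function name `find_analog` and the three-station example values are assumptions made here, not the similarity measure or data of Schenk and Zorita (2012).

```python
from math import sqrt


def find_analog(target, library):
    """Return the index of the library pattern closest to `target`.

    Similarity is measured by Euclidean distance (an assumption for
    this sketch; the published method may use a different measure).
    """
    best_index, best_dist = -1, float("inf")
    for i, pattern in enumerate(library):
        dist = sqrt(sum((t - p) ** 2 for t, p in zip(target, pattern)))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


# Hypothetical predictor patterns from the calibration period
# (e.g. SLP anomalies in hPa at three stations, one row per day).
library = [
    [-5.0, 2.0, 1.0],
    [3.0, 3.5, -1.0],
    [0.5, -2.0, 4.0],
]
# Hypothetical historical observation for one day.
historical_obs = [2.5, 3.0, -0.5]

analog_day = find_analog(historical_obs, library)
```

In such a scheme, the full multivariate high-resolution fields stored for `analog_day` would then be used as the reconstructed forcing for the historical day, which is why the reconstructed variables remain mutually consistent.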

2.3 River runoff

For the calculation of monthly mean river flows several data sets were merged (Fig. 3). For 1850–1900, 1901–1920, 1921–1949, 1950–2004, 2005–2008 reconstructions by Hansson et al. (2011a), Cyberski and Wroblewski (2000) and Mikulski (1986), observations from the BALTEX Hydrological Data Center (BHDC) (Bergström and Carlsson 1994), and hydrological model results (Graham 1999) were used, respectively. Bengt Carlson (Swedish Meteorological and Hydrological Institute SMHI, pers. comm.) compiled observations after 1990. In the Bothnian Bay and the Bothnian Sea, runoff data are observations up through 2004; in the Danish straits and Kattegat up through 2003; in the Gulf of Finland and the Gulf of Riga up through 2002; and in the Baltic proper up through 1993. For the period 1994–1996, in the Baltic proper replacement stations were used. All other missing values in the observations during 1997–2004 have been replaced by hydrological model results (Graham 1999). For 1850–1900 and 1921–2008, monthly mean runoff data to sub-basins as described above were compiled by Thomas Neumann (Leibniz Institute for Baltic Sea Research Warnemünde IOW, pers. comm.) and by Phil Graham (SMHI, pers. comm.), respectively, and differ slightly from the reconstruction by Meier and Kauker (2003) during the overlap period 1902–1998 because minor errors in the data set were corrected.

Fig. 3

Reconstructed annual mean river runoff (in 10⁴ m³ s⁻¹) to the Baltic Sea without Kattegat based upon (1) Hansson et al. (2011a) (1850–1900, green), (2) Cyberski and Wroblewski (2000) (1901–1920, red), (3) Mikulski (1986) (1921–1949, blue), (4) observations from the BALTEX Hydrological Data Center (BHDC) (Bergström and Carlsson 1994) (1950–2004, black), and (5) hydrological model results (Graham 1999) (2005–2008, yellow). During 1997–2004 missing values in the observations have been replaced by model results (see text). The horizontal dotted line denotes the total mean for the period 1850–2008 of 14,100 m³ s⁻¹

For 1500–1995, Hansson et al. (2011a) reconstructed river runoff to three sub-basins of the Baltic Sea (the northern and southern regions and the Gulf of Finland) using air temperature and atmospheric circulation indices. Because of the applied stepwise regression, inter-annual variability of river discharge is significantly smaller than in observations (Fig. 3).

From sub-basin discharges and runoff of the 30 most important rivers of the BHDC during 1950–1998 (27 rivers to the Baltic Sea and 3 rivers to the Kattegat) climatological mean ratios were calculated and applied for the entire period 1850–2008. For homogeneity reasons, the available higher spatial resolution of river discharge for most of the period 1950–2004 was not utilized. From the resulting time series for 1850–2008, a climatological annual mean runoff to the Baltic Sea without Kattegat of 14,100 m³ s⁻¹ was calculated, in accordance with results by Meier and Kauker (2003). A statistically significant trend in the annual mean runoff was not detected.

2.4 Nutrient loads and atmospheric deposition

The reconstruction of historical nutrient loads of nitrogen and phosphorus followed Gustafsson et al. (2012). Bioavailable loads are shown in Fig. 4. For 1970–2006, nutrient loads from rivers and point sources were compiled from the Baltic Environmental and HELCOM databases (Savchuk et al. 2012). Estimates of pre-industrial loads for 1900 were based upon Savchuk et al. (2008). Between selected reference years of these two environmental states (1970–2006 and around 1900), nutrient loads were linearly interpolated taking intensified agriculture since the 1950s into account. Similarly, atmospheric loads were estimated (Ruoho-Airola et al. 2012). Nutrient loads contain both organic and inorganic phosphorus and nitrogen, respectively. For riverine organic phosphorus and nitrogen loads bioavailable fractions of 100 and 30% are assumed, respectively. Nutrient loads after 2006 were set to the values of the year 2006. As loads were calculated from runoff and annual mean nutrient concentrations (Eilola et al. 2011), the seasonal cycle is determined by the river discharge. Compared to the nutrient loads from Gustafsson et al. (2012) slight deviations occur because loads were calculated from concentrations and slightly differing river runoff data sets (Fig. 3). In addition, some differences may occur because of the constant N:P-ratio of organic matter used in the RCO–SCOBI model that by definition may cause limitations on the relative supplies of organic nitrogen and phosphorus (Eilola et al. 2011). The inter-annual variability of the climatological mean atmospheric deposition of nitrogen (Eilola et al. 2009) used in the period 1970–1990 is by definition lower than in the annual loads described by Gustafsson et al. (2012).

Fig. 4

Annual mean bioavailable nutrient loads of (a) phosphorus and (b) nitrogen (in 10³ ton year⁻¹) from land and atmosphere to the whole Baltic Sea

2.5 Lateral boundary data

Daily mean sea level elevations at the lateral boundary were calculated from the reconstructed, meridional sea level pressure gradient across the North Sea following Gustafsson and Andersson (2001). To avoid underestimation of extremes the correlations were calculated by Gustafsson et al. (2012) for various frequency bands using Empirical Orthogonal Functions (EOFs). The mean value of the time series was subtracted and replaced at the lateral boundary by the mean sea level in the Nordic height system 1960 (NH60) (Ekman and Mäkinen 1996), calculated by linear interpolation from the mean sea level at Frederikshavn in Denmark (− 11.1 cm) and Ringhals in Sweden (− 1.4 cm). Hence, the mean sea surface height at the lateral boundary is tilted reflecting the geostrophically balanced outflow of brackish water from the Baltic Sea (see Meier et al. 2004, their Fig. 1).

In case of inflow, temperature, salinity, nutrients (phosphate, nitrate, ammonium) and detritus values are nudged towards observed climatological seasonal (winter DJF, spring MAM, summer JJA, autumn SON) mean profiles for 1980–2005 at the monitoring station Å17 located at 58° 16.5′ N and 10° 30.8′ E in the southern Skagerrak. For instance, winter phosphate and nitrate concentrations at the sea surface at Å17 amount to 0.508 mmol P m⁻³ and 5.62 mmol N m⁻³, respectively. Corresponding figures for summer are 0.0532 mmol P m⁻³ and 0.134 mmol N m⁻³. Nutrient concentrations before 1900 were assumed to be only 85% of present-day concentrations (Savchuk et al. 2008). A linear decrease of nutrient concentrations from 1950 and back in time to 1900 was assumed (Gustafsson et al. 2012). Between 1850 and 1900 nutrient concentrations are constant at the 1900 level.
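The temporal scaling of the boundary nutrient concentrations described above can be written as a piecewise-linear factor. This is a sketch under the assumption, implied but not stated explicitly in the text, that concentrations from 1950 onward equal the present-day climatology (factor 1.0); the function name is hypothetical.

```python
def boundary_nutrient_factor(year):
    """Fraction of present-day boundary nutrient concentration.

    - 1900 and earlier: constant at 85% of present-day values
    - 1900 to 1950: linear increase from 85% to 100%
    - 1950 and later: 100% (assumption made for this sketch)
    """
    pristine_fraction = 0.85
    if year <= 1900:
        return pristine_fraction
    if year >= 1950:
        return 1.0
    return pristine_fraction + (1.0 - pristine_fraction) * (year - 1900) / 50.0


# Winter surface phosphate at the boundary station (mmol P m^-3),
# value taken from the text; scaled to pristine conditions.
winter_phosphate_present = 0.508
winter_phosphate_1850 = winter_phosphate_present * boundary_nutrient_factor(1850)
```

With these numbers the pristine winter surface phosphate concentration at the boundary is about 0.43 mmol P m⁻³.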

2.6 Initial conditions

Initial conditions were estimated based upon Savchuk et al. (2008). After a spin-up simulation for 1850–1902 utilizing the reconstructed forcing as described above, the calculated physical and biogeochemical variables at the end of the spin-up simulation on 1 January 1903 were used as initial conditions for 1 January 1850.

2.7 Evaluation data sets

2.7.1 Sea level data

One of the longest sea level records in the Baltic Sea is measured at the tide gauge station Landsort Norra located at the latitude 58.7689 °N and the longitude 17.8589 °E in the northern Baltic proper (Fig. 1). Landsort is located close to the nodal line of the first seiche extending between the western Baltic Sea and the Gulf of Finland (Neumann 1941). Hence, the tide gauge Landsort measures the Baltic Sea volume changes. Some of these volume changes are caused by saltwater inflows into the Baltic Sea (Matthäus and Franck 1992; Fischer and Matthäus 1996). The tide gauge is operated by SMHI. During 1886–2006, a mareograph recorded hourly values. Since 2004 sea level is measured automatically every 10 min and even hourly minimum and maximum values are recorded. We compared simulated and observed sea level data based upon a temporal resolution of 2 days because we only were focusing on low-frequency volume changes.

2.7.2 Lightship data

To evaluate temperature and salinity we used digitized data from nine Swedish lightships archived by SMHI (Lindkvist and Lindow 2006) at locations covering the Bothnian Sea, Northern Baltic proper, Kattegat and Sound (see Fig. 1). Most of the stations cover the period 1880–1970. However, there are exceptions. Some stations recorded only during shorter periods in the late nineteenth and early twentieth centuries (Landsort and Kopparstenarna), and some lightships started the measurements already in the 1860s (Finngrundet and Falsterborev). Measurements during the World Wars are sometimes missing. The lightship data were sampled daily, sometimes even several times per day. Here, we focused on the long-term mean state, and thus calculated monthly mean cycles from the original data. Sampling during the winter months is sparse and at some locations even completely missing. The data for the winter months should therefore be interpreted with caution.

Water temperature was measured using a mercury-in-glass thermometer. Salinity was determined by two different methods. (1) Before 1923, salinity was estimated using a hydrometer (Svansson 1974). The specific weight and temperature of a water sample were measured and then converted into salinity using an indirect method (Svansson 1974). (2) After 1923, salinity was measured using the titration method, which is a more precise and robust method compared to the hydrometer-derived salinity (Svansson 1974).

2.7.3 Annual maximum ice extent

During 1720–1996, 84.6% of the variability in annual maximum ice extent is attributed to the variations in regional air temperature during winter over the Baltic Sea (Tinz 1996). Hence, annual maximum ice extent and winter air temperature are strongly correlated although the correlation was not stationary over time (Omstedt and Chen 2001). In the present study, we utilized observed maximum ice extent to evaluate winter air temperature of HiResAFF.

Observations of annual maximum ice extent from the Finnish Meteorological Institute (FMI) were used (Source: Jouni Vainio, FMI, http://en.ilmatieteenlaitos.fi/ice-winter-in-the-baltic-sea, updated from Seinä and Palosuo 1996; Seinä et al. 2001). For comparison, annual maximum ice extent was calculated from simulated sea ice concentration output every second day by integrating the area of all grid boxes with an ice concentration larger than 0.1. We corrected simulated maximum ice extent because the area of the model domain is restricted to the Baltic Sea and Kattegat and does not include Skagerrak which is included in the observations. The model area of the Baltic Sea and Kattegat amounts to 420,172 km². However, observed areas of the Baltic Sea, Kattegat and Skagerrak amount to 386,680, 29,320 and 31,570 km², respectively (Lindkvist et al. 2003). Hence, we increased the simulated ice extent by 6.4%, although Skagerrak is never completely ice covered.
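The calculation of the simulated annual maximum ice extent can be illustrated as follows. This is a minimal sketch: the flat list-of-cells layout, the function name and the example values are assumptions, while the 0.1 concentration threshold and the 6.4% area correction are taken from the text.

```python
def annual_max_ice_extent(concentration_fields, cell_areas_km2,
                          threshold=0.1, correction=1.064):
    """Annual maximum ice extent in km^2.

    concentration_fields: one list of ice concentrations (0..1) per
        output time, each with one value per grid box.
    cell_areas_km2: area of each grid box.
    Grid boxes count as ice covered if concentration > threshold.
    The result is scaled by `correction` (+6.4%) to account for the
    Skagerrak area missing from the model domain.
    """
    extents = [
        sum(area for conc, area in zip(field, cell_areas_km2)
            if conc > threshold)
        for field in concentration_fields
    ]
    return correction * max(extents)


# Hypothetical domain with four grid boxes and two output times.
areas = [10.0, 10.0, 10.0, 10.0]
fields = [
    [0.0, 0.05, 0.2, 0.9],   # two boxes above the threshold
    [0.5, 0.11, 0.3, 0.0],   # three boxes above the threshold
]
extent = annual_max_ice_extent(fields, areas)
```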

2.7.4 Monitoring data

For the evaluation of the model simulation, observations of temperature, salinity and oxygen, phosphate and nitrate concentrations from two different public databases were used: (1) the Swedish Ocean Archive (SHARK, http://sharkweb.smhi.se) and (2) the Baltic Environmental Database (BED, http://nest.su.se/bed) operated by SMHI and by the Baltic Nest Institute—Sweden (BNI, http://www.balticnest.org), respectively. We focused on data from the monitoring station Gotland Deep (BY15) that is representative for the Baltic proper. We compared long-term mean simulated and observed profiles of both annual and seasonal mean quantities. We focused, inter alia, on the July temperature as an important indicator for cyanobacteria blooms. Further, temporal changes in annual mean simulated water temperature and salinity in 1.5 and 200 m depth were investigated and compared with observations that were compiled by Gustafsson and Rodriguez Medina (2011). Annual mean observations of SST before 1970 were not displayed because the pronounced seasonal cycle is not resolved by available historical observations. In addition, annual mean oxygen and hydrogen sulfide concentrations in 200 m were investigated. Hydrogen sulfide was considered as negative oxygen equivalents. Further, winter mean nutrient (phosphate and nitrate) concentrations in 1.5 m depth were calculated from model results and observations in January and February before the onset of the spring bloom.

2.7.5 Nutrient pools and hypoxic area

Pools of DIP and dissolved inorganic nitrogen (DIN, ammonium and nitrate) were computed from all available oceanographic data in BED (Gustafsson and Rodriguez Medina 2011) using the data assimilation system (DAS) (Sokolov et al. 1997; Gustafsson et al. 2017). Savchuk (2010) discussed limitations of the method caused by uneven and sparse temporal and spatial data coverage.

Annual mean hypoxic areas in the Baltic proper were calculated from simulated oxygen and hydrogen sulfide concentrations and compared with the results by Hansson et al. (2011b) and Hansson and Andersson (2016). Hansson et al. (2011b) inter- and extrapolated hypoxic depths obtained from observed profiles of regular SMHI monitoring cruises in autumn over the Baltic Sea area. As shown by Väli et al. (2013), these estimates very likely underestimate hypoxic areas due to the lack of data.
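How a hypoxic area follows from gridded bottom concentrations can be sketched in a few lines. The 2 mL O₂ L⁻¹ threshold and the conversion 1 mL H₂S L⁻¹ = −2 mL O₂ L⁻¹ are those given in this paper; the flat cell layout, function names and example values are assumptions for illustration.

```python
HYPOXIA_THRESHOLD = 2.0  # mL O2 per L, definition used in this study


def oxygen_equivalent(o2, h2s):
    """Express oxygen and hydrogen sulfide (both in mL/L) as one signed
    oxygen value, with 1 mL H2S per L counted as -2 mL O2 per L."""
    return o2 - 2.0 * h2s


def hypoxic_area(bottom_oxygen, cell_areas_km2, threshold=HYPOXIA_THRESHOLD):
    """Sum the area of all bottom cells with oxygen below the threshold."""
    return sum(area for oxy, area in zip(bottom_oxygen, cell_areas_km2)
               if oxy < threshold)


# Hypothetical bottom cells: oxic, hypoxic, sulfidic, exactly at threshold.
o2 = [6.0, 1.5, 0.0, 2.0]
h2s = [0.0, 0.0, 0.5, 0.0]
areas = [10.0, 10.0, 20.0, 10.0]

bottom = [oxygen_equivalent(a, b) for a, b in zip(o2, h2s)]
area = hypoxic_area(bottom, areas)
```

Because the threshold is a strict inequality, the cell at exactly 2 mL O₂ L⁻¹ is not counted in this sketch.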

2.8 Experimental strategy

We performed a 159-year reference simulation for 1850–2008 with the forcing data described above (henceforth referred to as REF). In addition to REF, 12 sensitivity experiments were carried out with the same experimental setup as in REF but with modified forcing data to identify the main drivers of eutrophication, cyanobacteria blooms and hypoxia (Table 1).

Table 1 List of sensitivity experiments

Six sensitivity experiments of Table 1 were designed to investigate the impacts on eutrophication and hypoxia from hydrodynamic changes in:

(a) water temperature (TAIR1, TAIR2), wind induced mixing (WIND) and solar radiation (CONST). Water temperatures larger than 16 °C and calm and cloudless conditions are prerequisites for cyanobacteria blooms (e.g., Kahru and Elmgren 2014) and the chosen combination of changes in water temperature, wind induced mixing and solar radiation is expected to lead to unfavorable conditions for cyanobacteria blooms in CONST;

(b) vertical stratification due to decadal freshwater variations (RUNOFF);

(c) mean sea level (MSLD).

The impacts of biogeochemical drivers from changing nutrient loads were studied in:

(d) OBC (importance of the North Sea boundary);

(e) HIGH (development under maximum historical loads);

(f) LOW (development under pristine conditions).

Combinations of both hydrodynamic and biogeochemical drivers were investigated with the help of three experiments focusing on the questions:

(g) Could climatic conditions favorable for cyanobacteria blooms lead to hypoxia even under pristine nutrient loads (CYANO)?

(h) Could stronger stratification and climatic conditions favorable for cyanobacteria blooms lead to hypoxia even under pristine nutrient loads (MSLR)?

(i) Could a weaker stratification caused by 20% increased freshwater supply hamper eutrophication caused by the corresponding 20% increase in nutrient loads (FRESH)?

Note that simulated changes in TAIR2 and FRESH reflect results typical for future projections (e.g., Meier et al. 2012a, b).

In the following, the experimental design is described in detail. In TAIR1 and TAIR2, the model was forced during all years by 3-hourly air temperature and specific humidity from 1904 and 2008, respectively. The years 1904 and 2008 represent cold and warm years of the period 1850–2008, respectively. In WIND the wind velocity components and in CONST (CONSTant) all atmospheric forcing variables (air temperature, specific humidity, cloudiness and wind velocity) except precipitation were taken from 1904 during the entire simulation period. Hence, in CONST the inter-annual variations of hydrodynamic variables were only controlled by precipitation, runoff and sea level elevations at the lateral boundary in Kattegat. RUNOFF was forced by climatological monthly mean river runoff of the period 1850–2008, i.e. the inter-annually varying flow in each river was replaced by a climatological mean seasonal cycle.
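The construction of a climatological mean seasonal cycle, as used for the runoff forcing in RUNOFF, can be sketched as a per-month average over all years. The data layout (one list of 12 monthly means per year) and the numbers are hypothetical.

```python
def monthly_climatology(monthly_flows):
    """Average each calendar month over all years.

    monthly_flows: list with one entry per year, each a list of
        12 monthly mean flows (e.g. in m^3 s^-1).
    Returns a list of 12 climatological monthly means.
    """
    n_years = len(monthly_flows)
    return [sum(year[m] for year in monthly_flows) / n_years
            for m in range(12)]


# Hypothetical two-year record for a single river.
flows = [
    [100.0] * 12,
    [200.0] * 12,
]
climatology = monthly_climatology(flows)

# In a RUNOFF-like setup, every simulated year would reuse the same cycle.
forcing_any_year = list(climatology)
```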

In FRESH (FRESHwater), river runoff was increased by 20% during 1850–2008. Hence, simulated salinity decreased and nutrient loads increased because riverborne nutrient loads were calculated from the products of nutrient concentrations and river volume flow.
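Because loads are the product of concentration and volume flow, scaling the runoff scales the loads by the same factor. A minimal sketch with hypothetical values and a hypothetical function name:

```python
def nutrient_load(concentration, runoff, runoff_factor=1.0):
    """Riverborne nutrient load as concentration times volume flow.

    concentration: nutrient concentration (e.g. mmol m^-3)
    runoff: river volume flow (e.g. m^3 s^-1)
    runoff_factor: multiplier on the runoff (1.2 for a FRESH-like case)
    Returns the load in concentration units times flow units
    (e.g. mmol s^-1).
    """
    return concentration * runoff * runoff_factor


# Hypothetical river: 2 mmol m^-3 at 100 m^3 s^-1.
load_ref = nutrient_load(2.0, 100.0)
load_fresh = nutrient_load(2.0, 100.0, runoff_factor=1.2)
relative_increase = load_fresh / load_ref - 1.0
```

With unchanged concentrations, a 20% increase in runoff gives a 20% increase in load, which is the coupling the FRESH experiment relies on.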

In LOW and HIGH, nutrient loads and atmospheric deposition from 1850 and 1985, respectively, were used for the entire period 1850–2008. Hence, LOW was forced by constant pristine loads whereas HIGH applies the highest ever-recorded loads since 1850 for the entire simulation period. In OBC (Open Boundary Conditions), we studied the sensitivity of the biogeochemical cycling in the Baltic Sea to North Sea nutrient concentrations by doubling the climatological nutrient concentrations at the lateral boundary in case of inflow. The latter experiment is rather speculative because historical measurements are missing. The winter concentrations of phosphate in Skagerrak show no statistically significant trend since 1950. Nitrate observations are only available for later years. Hence, we used a factor of two as an exaggerated upper bound for long-term changes in possibly reinforced eutrophication. On the other hand, under pristine conditions nutrient concentrations in Skagerrak were estimated to be 85% of present concentrations, corroborated by results of several studies (see Savchuk et al. 2008 and references therein).

In both CYANO (CYANObacteria) and MSLR (Mean Sea Level Rise), the model was forced during all years with three-hourly air temperature, specific humidity and cloudiness from 2008. According to Kahru and Elmgren (2014) the warm and calm summer 2008 was characterized by a record-high cyanobacteria bloom. Further, in both simulations pristine nutrient loads and atmospheric deposition from 1850 were used. In addition, the mean sea level in MSLR was increased by 1.5 m compared to the mean sea level in NH60 which is used in RCO (see Sect. 2.5). In CYANO, the wind velocity components during all years of the simulation were taken from 2008, which was dominated by low wind speeds favoring cyanobacteria blooms.

The impact of global mean sea level rise was investigated earlier (Meier et al. 2017) and will not be repeated here. However, the results by Meier et al. (2017) were compared with the results of this study. In the sensitivity experiment MSLD (Mean Sea Level Decrease) by Meier et al. (2017), the mean sea level was reduced by 0.24 m compared to REF during the entire simulation period, thereby neglecting past eustatic sea level rise. This sea level rise in the Baltic Sea during 1850–2008 was estimated to be 0.24 m or 1.5 ± 0.53 mm year⁻¹ (Gornitz 1995). Consequently, salinity and thus vertical stratification decreased slightly as shown by Meier et al. (2017).
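The two ways of stating the eustatic sea level rise can be checked against each other with a short calculation. The sketch below assumes that the rate applies uniformly over the 158 years between 1850 and 2008; the variable names are illustrative only.

```python
# Values quoted in the text (Gornitz 1995).
rate_mm_per_year = 1.5
rate_uncertainty = 0.53

# Assumption: a constant rate over the whole period 1850-2008.
years = 2008 - 1850

rise_m = rate_mm_per_year * years / 1000.0
rise_low_m = (rate_mm_per_year - rate_uncertainty) * years / 1000.0
rise_high_m = (rate_mm_per_year + rate_uncertainty) * years / 1000.0

print(f"total rise: {rise_m:.2f} m "
      f"(range {rise_low_m:.2f} to {rise_high_m:.2f} m)")
```

Under this assumption, the central rate of 1.5 mm per year integrates to a rise that rounds to the 0.24 m applied in MSLD.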

2.9 Analysis of results

We focused on the long-term development of eutrophication and hypoxia on centennial time scale. Hence, temporal evolutions were low-pass filtered with a cutoff period of 4 years (see Meier and Kauker 2003). Volume averaged quantities were always calculated for the entire Baltic Sea and Kattegat. For the analysis of the annual maximum ice extent, REF was prolonged until July 2015 with the help of (1) high-resolution atmospheric forcing data for the period 2008–2015 (Geyer 2014) and (2) climatological mean runoff for the period 2009–2015 calculated from the reconstructed runoff for 1850–2008 (Sect. 2.3). Nutrient loads after 2006 were set to the values of the year 2006. For 2008–2015, at the open boundary daily sea level data from the Swedish tide gauge Smögen in Skagerrak from SMHI were used. For details of the prolongation of the simulation the reader is referred to Meier et al. (2018).
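As an illustration of the smoothing, the caption of Fig. 12 refers to 4-year running means. A minimal sketch of such a running mean applied to annual values follows. It is only one simple form of low-pass filter, and the handling of the series ends (no padding) is an assumption rather than a documented choice of this study.

```python
def running_mean(values, window=4):
    """Unweighted running mean over `window` consecutive values.

    Returns len(values) - window + 1 averages; the ends are not padded.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    # Slide the window: add the incoming value, drop the outgoing one.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


# Hypothetical annual values, for illustration only.
annual = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0]
smoothed = running_mean(annual, window=4)
print(smoothed)
```

A window of 4 annual values suppresses year-to-year fluctuations while retaining the decadal and centennial changes that are the focus of the analysis.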

3 Results

3.1 Evaluation of the reference simulation

3.1.1 Sea level

Simulated sea levels at the station Landsort are in good agreement with observations (Fig. 5; Table 2). Volume changes during the large saltwater inflow in December 1951 are well reproduced by RCO-SCOBI (Fig. 5). For the period 1902–1998, the Mean Error (ME) and Root Mean Square Error (RMSE) calculated from the model output every second day amount to −1.8 and 10.3 cm, respectively, which are close to the corresponding values of the reconstruction by Meier and Kauker (2003) (Table 2). For the same period, correlations and explained variances of the two reconstructions are also of similar size. Further, sea level variability in our simulation is rather close to that in observations: we found standard deviations in model results and observations of 18.7 and 19.5 cm, respectively.
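The error measures used in this evaluation follow standard definitions, sketched below for paired model and observation values. The function name and the sample numbers are hypothetical. Explained variance is left out because its exact definition is not given in this section.

```python
import math


def error_statistics(model, obs):
    """Mean error, root mean square error and Pearson correlation."""
    n = len(model)
    if n != len(obs) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    diffs = [m - o for m, o in zip(model, obs)]
    me = sum(diffs) / n                                # mean error (bias)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)    # root mean square error
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    r = cov / math.sqrt(var_m * var_o)                 # Pearson correlation
    return {"ME": me, "RMSE": rmse, "R": r}


# Hypothetical sea levels in cm, for illustration only.
model_cm = [2.0, -5.0, 11.0, 0.0]
obs_cm = [4.0, -4.0, 12.0, 4.0]
stats = error_statistics(model_cm, obs_cm)
print(stats)
```

A negative ME, as reported for Landsort, means that the simulated sea level is on average lower than the observed one.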

Fig. 5

40-day running mean sea level (in cm) at Landsort for the period 1950–1959. Solid line: model results. Dotted line: observations

Table 2 Mean values and model errors of two-daily sea level at Landsort for 1902–1998

3.1.2 Sea surface temperature and salinity

Based on the long-term monthly mean seasonal cycle of SST we found an overall good agreement between simulated and lightship data, both in terms of the mean state and the variability (Fig. 6). However, the SSTs are overestimated in summer in the northern and underestimated in autumn in the southern Baltic Sea. The maximum monthly summer (autumn) SST bias in the northern (southern) Baltic Sea amounts to 2.5 °C (− 1.5 °C). In general, biases in most of the other months are lower than 0.5 °C. The correlation, R, between model results and observations of the long-term monthly time series is generally good (R = 0.93–0.98).

Fig. 6

Simulated and observed monthly mean sea surface temperature (in °C) from the Swedish lightships Sydostbrotten, Finngrundet, Grundkallen, Svenska Björn, Landsort, Kopparstenarna, Falsterborev, Svinbådan and Fladen (for the locations see Fig. 1) during 1860–1972 (for the exact operation periods of the lightships see top of the panels)

For SSS, the comparison between simulated and lightship data yields a much more scattered result (Fig. 7). At stations in the southern Baltic Sea the simulated seasonal cycle is reasonable, but simulated salinity is too low by 0.1–3.4 g kg⁻¹. In the northern Baltic Sea simulated and observed seasonal cycles agree reasonably well at some stations (Sydostbrotten, Svenska Björn and Finngrundet), whereas there is a pronounced disagreement at other stations. From the comparison of lightship data that cover only the early period (e.g., Landsort and Kopparstenarna), and thus rely on the hydrometer method, with lightship data from other locations that cover the full period (e.g., Sydostbrotten and Svenska Björn), it is evident that the hydrometer measurements affect the results. These hydrometer measurements apparently yield questionable salinity values in the northern parts of the Baltic Sea. This is also seen in the model–observation correlations at the northern stations, which are R = −0.07 to 0.47 for the hydrometer period and R = 0.54 to 0.67 for the titration period. At the southern stations the model–observation correlations are generally higher (R = 0.67–0.73) and the differences between the hydrometer and titration periods are less pronounced. Hence, we conclude that measurements of salinity on lightships before 1923 in the northern Baltic Sea (at low salinities) are not reliable.

Fig. 7

As Fig. 6 but for sea surface salinity (in g kg− 1)

3.1.3 Annual maximum ice extent

Simulated and observed annual maximum ice extents correspond reasonably well during 1850–2015 (Fig. 8). The mean maximum ice extent in the model is somewhat underestimated. Mean simulated and observed annual maximum ice extents amount to 186 × 10⁹ m² and 196 × 10⁹ m², respectively (Table 3). However, more important is the relatively large RMSE of 67.3 × 10⁹ m² caused by the underestimation of severe ice winters, in particular, during the reconstruction period. Hence, correlation coefficient and explained variance only amount to 0.80 and 62%, respectively.

Fig. 8

Annual maximum ice extent (in 10⁹ m²) during 1850–2015: observations (red) and model results (black). The straight line denotes the area of the model domain comprising the Baltic Sea and Kattegat of 420 × 10⁹ m². The differences between model results and observations are shown in the upper panel (black). The blue line denotes the mean error of −9.8 × 10⁹ m²

Table 3 Mean, maximum and minimum values and model errors of annual maximum ice extent for the whole period 1850–2015, for the simulation period 1903–1998 by Meier and Kauker (2003), for the reconstruction period 1850–1957, and for the analog-pool period 1958–2007

Compared to the earlier reconstruction of atmospheric forcing fields for 1902–1998 by Kauker and Meier (2003), we found that HiResAFF resulted in larger errors of simulated annual maximum ice extent (RMSE, R, VAR, see Table 3). Severe ice winters were also underestimated during the reference period of the analog-method (1958–2007); e.g., maxima in ice extent in model results and observations amount to 348 × 10⁹ m² and 405 × 10⁹ m², respectively (Table 3). We therefore speculate that there are two reasons for the shortcomings. First, the air temperatures of the regional atmosphere model are too warm causing, inter alia, overestimated SSTs in summer in the northern Baltic Sea (Fig. 6) and artificial melting of sea ice. Second, as cold winters during the reference period are missing, the analog-method cannot reproduce the overall colder climate conditions during the reconstruction period. Hence, the number of severe ice winters during the nineteenth and the first half of the twentieth centuries is underestimated.

3.1.4 Temporal changes in physical and biogeochemical variables

In Fig. 9, simulated and observed annual mean water temperature, salinity, and oxygen, hydrogen sulfide and nutrient concentrations in 0 and 200 m depth at the monitoring station Gotland Deep are shown. Observed decadal variations and long-term trends are well reproduced by model results. Surface and deep water temperatures show increasing trends whereas no statistically significant trends in salinity are found in agreement with earlier studies (Meier and Kauker 2003). SSTs in July are well simulated both in pre-industrial and recent climate conditions whereas annual mean SSTs during 1902–1932 are too warm (Fig. 10) because of too warm winter SSTs in accordance with the results of the model-data comparison using lightship data at Landsort (Fig. 6). Annual and seasonal mean water temperatures in the deep water at BY15 are well simulated (Fig. 10).

Fig. 9

Simulated and observed annual mean water temperatures (in °C) (a, b) and salinities (in g kg− 1) (c, d) at Gotland Deep at 1.5 and 200 m depths, annual mean oxygen concentrations (in mL O2 L− 1) at 200 m depth (e), and winter (January–February) mean phosphate (in mmol P m− 3) (f) and nitrate (in mmol N m− 3) (g) concentrations at 1.5 m depth. For comparison, observations from monitoring cruises at Gotland Deep (green diamonds, in panel a since 1970 only) are shown. Surface observations are taken from data extrapolated to 0 m

Fig. 10

31-year mean vertical profiles of temperature (in °C), salinity (in g kg− 1) and oxygen concentration (in mL O2 L− 1) at Gotland Deep (BY15) for 1902–1932 and 1976–2006: annual mean observations (black solid line), July mean observations (black dash-dotted line), annual mean model results (red line) and July mean model results (green line). The grey shaded areas and the grey dashed lines indicate the ± 1 standard deviations of the annual and July mean observations, respectively. Correspondingly, the red shaded areas indicate the ± 1 standard deviations of the annual mean model results

The pronounced decadal variations in simulated salinity caused by variations in river runoff and westerly wind (Meier and Kauker 2003) are well reproduced although simulated deep water salinity is systematically lower than in observations (Fig. 9). In addition, we found good model results for vertical salinity profiles although simulated halocline depths are too shallow and bottom salinities are too low (Fig. 10). This shortcoming seems to be model specific because it was already reported earlier from RCO simulations driven by other atmospheric forcing fields (e.g., Meier 2001).

Simulated oxygen and hydrogen sulfide concentrations (the latter as negative oxygen equivalents) in 200 m depth at Gotland Deep show an overall negative trend due to increasing eutrophication and large decadal variations due to saltwater inflows (Fig. 9). The temporal variations since the 1970s are well reproduced by the model whereas deep water oxygen concentrations during preindustrial periods, e.g. 1902–1932, are overestimated compared to the monitoring data (Fig. 10). This model shortcoming was already reported from earlier RCO-SCOBI versions and other Baltic Sea models (Meier et al. 2012b).

Simulated surface phosphate concentrations in winter follow observations well and indicate the increasing eutrophication since the 1950s in the Baltic proper (Fig. 9). In particular, decreasing phosphate concentrations during the stagnation period of the 1980s (Conley et al. 2002) are well reproduced. However, during the 2000s simulated phosphate concentrations overestimate observations.

Elevated surface nitrate concentrations were observed since about 1975 (Fig. 9). However, corresponding simulated nitrate concentrations underestimate observations after the 1980s. As the first nitrate observations became available at the end of the 1960s, it is unclear whether the elevated nitrate concentrations already at the beginning of the 1960s are realistic in the model.

3.1.5 Nutrient pools and hypoxic area

After a spin-up period at the beginning of the simulation, model results indicate an increase of both DIN and DIP pools in the water column towards the end of the twentieth century, especially after World War II (not shown). Nutrient input to the Baltic Sea started to decrease again around 1985 (Fig. 4), while the Baltic proper phosphorus content in the model continued to increase until about 2008. However, the increase of DIN was interrupted in the model by a period of low oxygen concentrations and strong denitrification caused by a stagnant period in the early 1970s (not shown). The DIN pool increased again after the strong inflow that improved the deep water oxygen conditions in the mid-1970s and remained at about the same level until 2008 although the nitrogen loads continued to decrease. Compared to the DIP pool the DIN pool seems to adjust more rapidly to changing loads. The temporal development of the pools is similar to the results shown by Gustafsson et al. (2012).

The simulated pool of DIP is larger than the corresponding pool estimated from BED (not shown). The difference might be explained by too much pelagic DIP in the model (Fig. 9). Some differences between simulated and observation-derived nutrient pools may also be caused by unavoidable under-sampling of BED data as discussed, e.g., by Gustafsson et al. (2017). The simulated pool of DIN is relatively close to results from BED although the simulated DIN pool during the 2000s overestimates the corresponding BED pool (not shown).

According to our model results, hypoxic area increased considerably between 1950 and 1970 from very low values and amounted to about 90,000 km² in 2008 (Fig. 11). Although data are sparse, we argue that the timing of the unprecedented and rapid increase in hypoxic area is well simulated. Hypoxic area estimated from observations almost doubled during the 1960s, i.e. it increased from about 36,000 km² in 1960 to about 69,000 km² in 1968. Similar values are found from model results. However, after about 1970 simulated hypoxic areas overestimate corresponding areas derived from observations. For 1960–2008, the mean simulated and observation-based hypoxic areas amount to 83,000 and 59,000 km², respectively. Hence, the mean bias of 23,000 km², or about 39% of the mean observed area, is considerable. Similarly, simulated anoxic area is overestimated (not shown). Nevertheless, the observed reduction in hypoxic area during the stagnation period in the 1980s is simulated although the simulated decrease is smaller than the estimates from observations. Further, the continuous increase in hypoxic area observed since 1998 is not captured by the model.
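How a hypoxic area follows from gridded bottom oxygen can be sketched as below, using the threshold of 2 mL O2 L⁻¹ from the definition adopted in this study. The grid layout, the cell areas and the use of None for land cells are hypothetical choices for illustration and do not describe the actual model output format.

```python
HYPOXIA_THRESHOLD = 2.0  # mL O2 L^-1, definition of hypoxia used in this study


def hypoxic_area(bottom_oxygen, cell_area):
    """Sum the areas of wet cells with bottom oxygen below the threshold.

    bottom_oxygen: 2-D list of bottom oxygen values, None marks land cells
    cell_area:     2-D list of cell areas with the same shape
    """
    total = 0.0
    for row_o2, row_area in zip(bottom_oxygen, cell_area):
        for o2, area in zip(row_o2, row_area):
            if o2 is not None and o2 < HYPOXIA_THRESHOLD:
                total += area
    return total


# Hypothetical 2 x 3 grid; negative values stand for hydrogen sulfide
# expressed as negative oxygen equivalents.
bottom_oxygen = [
    [6.5, 3.1, None],
    [1.8, -0.5, 2.0],
]
cell_area = [
    [13.0, 13.0, 13.0],   # km^2
    [13.5, 13.5, 13.5],   # km^2
]
area_km2 = hypoxic_area(bottom_oxygen, cell_area)
print(area_km2)
```

In this sketch a cell exactly at the threshold is not counted, because the definition refers to concentrations below 2 mL O2 L⁻¹.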

Fig. 11

Annual maximum simulated (black) and observed (red) hypoxic areas (in 10⁹ m²) in the Baltic proper for 1850–2008. Data by Hansson et al. (2011b) were calculated from observed profiles of oxygen concentrations. For the year 1967 data are missing

3.2 Results of the sensitivity experiments

In the sensitivity experiments TAIR1 and TAIR2 no trends in volume averaged water temperature are found because by definition all years have the same seasonal cycle of air temperature, taken from the selected individual years 1904 and 2008, respectively (Fig. 12; Table 4). Altered salinity and nutrient concentrations (RUNOFF, FRESH, MSLD, LOW, HIGH, OBC) hardly change the temperature evolution compared to REF. Constant cloudiness compared to TAIR1 or TAIR2 changes temperature only slightly (not shown). WIND has the same temperature variability but is warmer than REF, perhaps because of a reduced latent heat flux. Although CONST does not have trends in air temperature, cloudiness and wind, there is still a drift to higher temperatures, while the decadal variability in temperature is reduced. A possible explanation might be the reduced latent heat flux that affects the deep water temperature on a time scale of about 35 years (e.g., Meier and Kauker 2003). In TAIR2, CYANO and MSLR, water temperature during the initial period of about 1850–1880 is on average about 2 °C higher than in REF. Compared to TAIR1, water temperatures in these experiments are higher during the entire period 1850–2008 because 2008 was an exceptionally warm year.

Fig. 12

4-year running mean, volume averaged annual temperature (in °C), salinity (in g kg− 1), DIN (in mmol N m− 3), DIP (in mmol P m− 3), phytoplankton concentration (in mg CHL m− 3), cyanobacteria concentration (in mg CHL m− 3) and dissolved oxygen concentration (in mL O2 L− 1), and hypoxic area (in km2) (from upper left to lower right): REF (black solid), TAIR1 (dark green solid), TAIR2 (dark green dashed), WIND (magenta solid), CONST (magenta dashed), RUNOFF (blue solid), FRESH (light blue solid), LOW (yellow solid), OBC (orange solid), HIGH (red solid), CYANO (turquoise solid), MSLR (gray solid), and MSLD (gray dashed). Note that the temperature curves of REF, RUNOFF, LOW, OBC, HIGH, and MSLD are on top of each other. Similarly, the salinity curves of REF, LOW, OBC, and HIGH are indistinguishable

Table 4 Brief summary of the results of the sensitivity experiments (see Sects. 4.2, 4.3 and 4.4 of the discussion)

As expected, LOW, HIGH, OBC and even TAIR1 follow approximately the same salinity evolution as REF (Fig. 12). At the end of the simulation salinities in MSLD and FRESH are approximately 0.4 and 1.4 g kg⁻¹ lower than in REF, respectively. The sensitivity experiment TAIR2 shows slightly lower salinity than TAIR1 and REF because 2008 is warmer than both 1904 and most years during 1850–2005, so that evaporation from the ocean is reduced because the annual mean air-sea temperature difference is smaller (Meier and Döscher 2002). We found that in RUNOFF decadal variability in salinity is reduced compared to REF, confirming earlier results by Meier and Kauker (2003). In WIND and CONST salinities are drifting to significantly lower values (maximum differences amount to about 1.5 g kg⁻¹) because wind and sea level variations in Kattegat do not fit together in these experiments. Hence, the intensities of saltwater inflows are smaller than in REF. Salinities in CYANO and MSLR are on average about 1 g kg⁻¹ lower and 2 g kg⁻¹ higher than in REF, respectively. As in WIND and CONST, wind and sea level variations in Kattegat in CYANO do not fit together, causing a reduced salt flow from Kattegat into the Baltic Sea. In MSLR, global mean sea level is 1.5 m higher than in REF, causing an increased cross section in the Danish straits and, thus, an enhanced salt flow into the Baltic Sea (cf. Meier et al. 2017).

In REF, the temporal evolution of DIN and DIP concentrations in the water column shows the well-known increases in nutrient concentrations since the 1950s indicating eutrophication (Fig. 12). During the last two decades of the simulation DIN decreases while DIP still increases although nutrient loads have decreased since the 1980s (Fig. 4). Whereas in all experiments (except LOW, CYANO and MSLR) DIP concentrations in the water column increase with increasing external (< 1980s) or internal loads (> 1980s), the behavior of DIN is more complicated than that of DIP (Fig. 12). The results of HIGH and LOW, CYANO and MSLR for both DIP and DIN form the envelopes of all simulations at least during the past two decades. In OBC and TAIR2, DIP concentrations are significantly larger compared to REF, whereas in WIND and CONST DIP concentrations are significantly lower. At the beginning of the FRESH simulation, DIN concentrations are larger compared to REF because of the larger nitrogen loads from land. In TAIR2, DIN concentrations are increasing with time. In CYANO, DIN and DIP concentrations are as low as in LOW or even lower (DIN). However, with both significantly increased temperature and salinity as in MSLR, DIP concentrations are slightly elevated compared to LOW, enabling the growth of cyanobacteria (see below).

DIN maxima in HIGH during the 1860s and in WIND and CONST during the 1960s are characterized by DIP limited regimes (Fig. 13). During 1900–1970, DIN/DIP ratios in WIND and CONST are also much higher than the Redfield ratio. Only during the 1980s did the regimes in these two simulations become DIN limited, which possibly explains the delay in the development of oxygen depletion. In FRESH, DIN/DIP ratios during 1850–1910 are larger than in REF. During the entire period 1850–2008 the sensitivity experiments TAIR2, OBC, CYANO and MSLR, and even HIGH after 1870, are DIN limited.
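The distinction between DIN and DIP limited regimes can be expressed as a simple comparison of the DIN to DIP ratio with the Redfield ratio of 16:1. The sketch below is a simplified illustration; the function name, the treatment of a ratio exactly equal to 16 and the sample concentrations are assumptions.

```python
REDFIELD_N_TO_P = 16.0  # atomic N:P ratio in marine phytoplankton


def limiting_nutrient(din, dip):
    """Classify the regime from winter DIN and DIP concentrations.

    din, dip: concentrations in mmol N m^-3 and mmol P m^-3.
    A ratio above the Redfield ratio indicates a surplus of nitrogen
    (DIP limited); otherwise the regime is treated as DIN limited.
    """
    if dip <= 0.0:
        raise ValueError("dip must be positive")
    ratio = din / dip
    return "DIP limited" if ratio > REDFIELD_N_TO_P else "DIN limited"


# Hypothetical winter surface concentrations, for illustration only.
regime_a = limiting_nutrient(din=4.0, dip=0.5)    # ratio 8
regime_b = limiting_nutrient(din=10.0, dip=0.4)   # ratio 25
print(regime_a, regime_b)
```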

Fig. 13

4-year running, winter (January to March) mean DIN to DIP ratio in 0 to 10 m of depth (left panel) and Secchi depth (in m, right panel) at Gotland Deep (BY15). Line colors of the experiments as in Fig. 12. Note that the empirically developed atomic ratio of nitrogen and phosphorus in marine phytoplankton (the so-called Redfield ratio) amounts to 16:1

Because of elevated nutrient levels since the 1950s, phytoplankton (sum of diatoms, flagellates and cyanobacteria) and cyanobacteria concentrations increased in REF (Fig. 12). Compared to the phytoplankton concentration, the concentration of cyanobacteria increased later with a delay of more than 10–20 years. All sensitivity experiments except HIGH, LOW, CYANO and MSLR show approximately the same behavior with the largest and smallest increases in TAIR2 and CONST, respectively. In HIGH, we found the highest phytoplankton and cyanobacteria concentrations of all experiments. Note that the rapid increase in cyanobacteria concentration is delayed until about 1860, whereas phytoplankton concentration increases immediately after the start of the integration. Correspondingly, the lowest values are found in LOW throughout the entire period and in CYANO and MSLR during the last decades. At the end of the simulations, cyanobacteria concentrations are considerably lower than in REF in TAIR1, WIND and CONST and higher than in REF in TAIR2 and OBC. These differences are caused by significantly lower water temperatures since the 1970s (TAIR1, CONST) and significantly deeper mixed layer (WIND) or both (CONST) (Fig. 12). In TAIR2, the water temperature is about 2 °C higher, which causes extensive cyanobacteria blooms to set in about 15 years earlier and leads to 50% higher concentrations at the end of the investigated period compared to REF. In FRESH, cyanobacteria concentrations follow the evolution of REF approximately, whereas phytoplankton concentrations have been noticeably larger than in REF since the 1980s. In MSLR cyanobacteria concentrations have been somewhat elevated since the 1860s whereas in CYANO cyanobacteria concentrations are as small as in LOW. Due to the higher mean sea level, stratification in MSLR is much stronger than in CYANO, causing reduced ventilation, elevated phosphate concentrations and, thus, increased cyanobacteria concentrations.
Results for Secchi depth in the eastern Gotland Basin are very similar to the results of volume averaged phytoplankton concentrations because changes in simulated Secchi depth are calculated from the inverted changes in detritus concentration (Fig. 13).

Volume averaged oxygen concentrations follow the temporal evolution of phytoplankton concentrations approximately (Fig. 12). Interestingly, oxygen concentrations in REF fall below the decreased level in MSLR only during the 1960s. In HIGH, hypoxia develops already during the 1860s, while in LOW, CYANO and MSLR hypoxic areas are always close to zero (Fig. 12). In REF, TAIR1, WIND and RUNOFF hypoxic areas increase considerably between the 1950s and 1970s, and the temporal evolutions differ slightly only after the 1970s. In OBC, FRESH and TAIR2 the rise in hypoxic area occurs about 5 years earlier, and in CONST about 10 years later. In CONST, hypoxic area decreases again after the maximum in the 1970s.

In all simulations except HIGH and LOW, the benthic nitrogen and phosphorus pools increase continuously since about 1900 (not shown). In HIGH, large increases in the benthic pools are found already during the first 10–20 years of the integration. In LOW, in particular the benthic phosphorus pool decreases continuously with time due to an export of phosphorus from the Baltic Sea into the North Sea indicating an imbalance from steady-state conditions between initial conditions, pristine nutrient supply and lateral boundary conditions. Note that at the start of the simulation in 1850 nutrient concentrations represent estimated conditions of the year 1900.

Results of the sensitivity experiments are summarized in Table 4 and will be discussed below.

4 Discussion

4.1 Biases of the reconstruction

Sea level variations at Landsort are dominated by volume changes of the entire Baltic Sea and are well simulated (Fig. 5). Mean error, RMSE, correlation and explained variance are relatively close to the values obtained by Meier and Kauker (2003, their Table 2) indicating that both reconstructions (Kauker and Meier 2003 and this study) are characterized by realistic wind-driven water exchange between the Baltic Sea and North Sea (Fig. 5; Table 2).

Further, simulated and observed annual maximum ice extents are well correlated (Fig. 8; Table 3). During the overlap period 1903–1998 the quality of the simulated ice extent in our study is only slightly worse than the simulated ice extent by Meier and Kauker (2003). However, the results for the reconstruction period 1850–1957 are not satisfactory. In particular, severe ice winters are underestimated. As during the reference period (1958–2007) the regional climate model underestimated cold winters, the reconstruction method failed to reproduce severe ice winters before 1957. Our results illustrate that the quality of the results of the analog-method depends on accurate model data sets during the reference or calibration period and on the assumption that the climate does not change in the reconstruction compared to the reference period. However, also the historical observations might be biased. The accuracy of the measurements may decrease back in time and the data are not homogeneous (Meier and Kauker 2003).

Compared to long-term monitoring profile data of temperature, salinity and oxygen, hydrogen sulfide and nutrient concentrations (Figs. 9, 10) and lightship data of temperature and salinity (Figs. 6, 7), the results of our coupled physical-biogeochemical model are satisfactory. However, it is important to note that before 1950 only a few oxygen measurements are available. Reliable nutrient measurements are not available before 1970. Nevertheless, both the seasonal cycle and long-term variations in temperature and salinity are simulated reasonably well compared to observations when salinity measurements from the hydrometer method are excluded. The decadal variability in salinity before 1900 is small and very likely underestimated due to shortcomings in the runoff reconstruction. The reconstruction of nutrient loads before 1945 is based on estimates only (Savchuk et al. 2008). Hence, both evaluation and forcing data are getting less reliable when we are moving back in time and it is difficult to judge for the historical period whether model-data discrepancies are caused by biased forcing data, model biases, measurement errors or biases due to undersampling (e.g., in winter).

The timing of the transition from oligotrophic to eutrophic states in biogeochemical cycling is simulated well (Fig. 11). Before the 1940s simulated oxygen concentrations in the deep water are too high (Figs. 9, 10), very likely causing an underestimation of hypoxic area during the early period. However, historical observations of hypoxic area prior to 1960 do not exist. Compared to the results of the statistical model by Carstensen et al. (2014), we also found smaller hypoxic area, in particular prior to 1960. For the period after 1960 during eutrophied conditions, our model shows good performance compared to observations although hypoxic areas are overestimated, especially during the stagnation period in the 1980s and early 1990s. Nevertheless, we conclude that the model is suitable for sensitivity and attribution studies despite the identified shortcomings.

4.2 Sensitivity experiments with removed centennial or decadal variations of the forcing

The results of the attribution experiments TAIR1, WIND, CONST, RUNOFF and MSLD suggest that for biogeochemical cycling in the Baltic Sea observed decadal variations and centennial trends in temperature, solar radiation, wind induced mixing, river runoff and sea level are less important than changes in nutrient loads.

Nutrient loads are very important drivers of eutrophication and hypoxia as shown in LOW and HIGH. As in OBC the applied change of nutrient concentrations at the lateral boundary is exaggerated, we may conclude that the sensitivity of the Baltic Sea ecosystem to the lateral boundary condition is relatively small. Hence, our assumption that nutrient concentrations at the lateral boundary before 1900 amount to 85% of present-day concentrations is not critical for the results of our experiments within the Baltic Sea.

Although in the Baltic Sea long-term changes in hypoxia are mainly caused by elevated nutrient loads, on inter-annual to decadal time scales changes in hypoxia are more controlled by hydrodynamical changes, e.g., saltwater inflows (cf., Lehmann et al. 2014) and stagnation periods (cf., Conley et al. 2002). However, high-frequency changes of hypoxia are not topic of the present study.

4.3 Sensitivity experiments under future climate conditions

In addition to nutrient supply, air temperature is an important driver under eutrophied conditions. An increase of air temperature by 2 °C in TAIR2 relative to TAIR1 amplifies cyanobacteria blooms, i.e., causes an increase in the average cyanobacteria concentration by 50% compared to REF in 2008, and leads to a further expansion of hypoxic area.

In FRESH, the nutrient supply increases by 20%. Hence, the phytoplankton concentrations during the spring bloom increase but not the cyanobacteria concentrations because the primary production is phosphorus limited. In particular, during the earlier part of the simulation period DIN concentration increases compared to REF because of reduced denitrification efficiency. Although stratification decreases with increased river runoff and, by that, vertical ventilation with oxygen should be improved, the increased nutrient supply is the dominating factor for oxygen depletion. The combined responses of TAIR2 and FRESH illustrate the changes in earlier scenario simulations of future climate with warming and increased volume and nutrient flows (e.g., Meier et al. 2011a, b, 2012b). However, note that our sensitivity experiments are only idealized and do not represent consistent projections of future climate.

In WIND, salinity decreases because the transport of saltwater into the Baltic Sea decreases. Hence, mixed layer depth increases hindering phytoplankton blooms in general and cyanobacteria blooms in particular. In combination with cold water temperatures (CONST) primary productivity, cyanobacteria blooms and hypoxic area are considerably reduced. Despite unfavorable weather conditions for cyanobacteria blooms in CONST, we still found oxygen depletion and hypoxia because of the elevated nutrient supply. WIND and CONST are not scenarios of future climate but illustrate the response of biogeochemical cycles to the large changes in vertical stratification.

4.4 Sensitivity experiments under pristine nutrient conditions

In the experiments CYANO and MSLR we followed the opposite strategy and investigated whether hypoxia would have developed during the past 150 years if climate conditions had been favorable for cyanobacteria blooms (like warm water, clear sky, weak winds and strong vertical stratification with limited deep water ventilation) but nutrient supplies had remained at pristine levels. We found that the processes of the “vicious circle” will not cause hypoxia if nutrient loads are low although cyanobacteria concentrations in MSLR increase slightly compared to LOW. This result might be explained by model biases, for instance, in the temperature dependency of biogeochemical processes (e.g., growth rates, remineralization rates, sediment–water fluxes) or in the magnitude of burial of nutrients stored in the sediments. If the permanent sink of phosphorus is small, a phosphorus pool in the sediments may build up over the years, even under small external supply. If this phosphorus pool in the sediments is finally large enough, randomly generated or climatically induced anoxic conditions may cause a release of large amounts of phosphorus to the water column triggering cyanobacteria blooms and the “vicious circle” such that hypoxia will occur permanently. However, our model does not support this scenario.

5 Conclusions

The main conclusions of this study are:

  1. The presented reconstruction of Baltic Sea climate variability for 1850–2008 has good quality compared to available observations. Transient biogeochemical cycling from oligotrophic to eutrophic conditions is simulated realistically.

  2. The excessive nutrient loads during the twentieth century that have accelerated since the 1950s caused the long-term evolution of eutrophication and hypoxia in the Baltic Sea. The observed increases in temperature and sea level had a small impact as well, whereas changes in cloudiness during the twentieth century did not affect the long-term evolution of eutrophication and hypoxia considerably. However, if the nutrient concentrations at the lateral boundaries in Skagerrak had doubled, hypoxia would have developed about 5 years earlier. If water temperatures had been as cold and the mixed layer as deep as in the year 1904, the occurrence of cyanobacteria would have been about 50% less and hypoxia would have developed about 10 years later. Hence, in the past nitrogen fixation by cyanobacteria and the “vicious circle” played an important role for the development of hypoxia.

  3. Without elevated nutrient concentrations, hypoxia would not have occurred during the twentieth and twenty-first centuries according to our model. Even under physical conditions favorable for the development of cyanobacteria, like warm water, clear sky, weak winds and strong vertical stratification of salinity causing oxygen depletion, hypoxia would not have developed during 159 years under pristine nutrient conditions.