Introduction

Eutrophication is one of the most prevalent threats to freshwater resources, with consequences for public health, food security, biodiversity and other ecosystem services, highlighting the urgency of understanding controls on water quality. Nutrients, particularly total phosphorus, have long been recognized as a primary factor influencing water quality (Quinlan et al. 2020; Schindler 1997). Hydrology and climate are tightly coupled, and climate change is affecting water quality through warming temperatures and changing precipitation patterns, which can increase nutrient inputs from land (Michalak 2016). For example, regions that are experiencing increases in precipitation are likely to see a rise in eutrophication driven by increases in nutrient runoff (Sinha et al. 2017). These increases in algal biomass have further implications for greenhouse gas sources (Beaulieu et al. 2019) and sinks (Webb et al. 2019), as well as for the presence of cyanobacterial toxins (Hayes and Vanni 2018). Understanding how climate drivers interact with nutrients to impact water quality is important for environmental management, and improving water quality can subsequently mitigate the consequences of climate change on aquatic ecosystems (Vaughan et al. 2019).

Climate is increasingly being recognized as an important driver of water quality at local to regional scales. Synchronous changes in lake algal biomass associated with climate-mediated variability in the hydrological cycle suggest that climate plays an important role in chlorophyll a (chla) dynamics (Baines et al. 2000). In general, increases in nutrient loading from greater precipitation exacerbate water quality problems through increases in chla, changes in phytoplankton community composition (Jeppesen et al. 2009), and decreases in water clarity (Rose et al. 2017). In the northeastern United States, summer air temperatures and winter precipitation were significant predictors of summer algal biomass (Collins et al. 2019). Temperature has been associated with increased chla in other regional studies (Woelmer et al. 2016). The dominance of cyanobacteria, which may generate harmful algal blooms (HABs), increases at warmer temperatures despite no changes to nutrient levels or total nitrogen:total phosphorus ratios (McQueen and Lean 1987). Warmer air temperatures contribute to earlier ice-out, leading to a longer growing season and higher chla levels (Preston et al. 2016), and changes in the onset and strength of thermal stratification can lead to increases (e.g. Jöhnk et al. 2008) or decreases in algal biomass (e.g. Verburg and Hecky 2003). Additionally, while temperature is generally positively correlated with algal biomass, the response can vary depending on lake trophic status, trophic interactions, and resource availability (Kraemer et al. 2017).

Solar radiation is important for primary production, and recent changes in solar radiation at Earth’s surface have the potential to influence chla. Broadly, Earth’s surface underwent a period of declining solar radiation from around 1950 until the mid-1980s, referred to as ‘global dimming’, after which solar radiation increased and stabilized (Wild et al. 2005; Sanchez-Lorenzo et al. 2015), creating a recent period of ‘global brightening’ (Wild et al. 2015). These large-scale changes have been attributed to changes in the transparency of the atmosphere owing to anthropogenic aerosol emissions and/or cloudiness (Gan et al. 2014; Long et al. 2009; Mateos et al. 2014). Recent trends in brightening are most apparent in regions that have long been industrialized, whereas newly industrialized regions have seen continued dimming (Wild 2012). For example, from 1980 to 2010 in Europe and North America, average surface solar radiation increased by up to 8 W m−2 per decade (Wild 2012). In contrast, surface solar radiation decreased by up to 10 W m−2 per decade in India (Wild 2012). Patterns in solar radiation brightening and dimming vary spatially (e.g. Wild et al. 2012) and temporally (e.g. Sanchez-Lorenzo et al. 2015). For example, there may be increases in solar radiation in one season but decreases in another (Deng et al. 2019; Sanchez-Lorenzo et al. 2015). Long-term trends can be substantially greater than the variation over the 11-year sunspot cycle, which amounts to only approximately 1 W m−2 yet is known to have a wide range of effects (Engels and van Geel 2012). Declines in solar radiation are associated with similar reductions in photosynthetically active radiation (PAR), contributing to “underwater darkening” and decreases in Secchi depth across lakes (Zhang et al. 2020), and have the potential to influence algal growth and biomass as well as other components of lake ecosystems.

Lake basin morphology and hydrology have the capacity to influence water quality, particularly with respect to interactions with other drivers. Factors such as lake depth and surface area influence physical, chemical, and biological processes within the lake, with consequences for mixing regimes, nutrient availability, and residence times. Deeper lakes typically have both lower total phosphorus (TP) and lower chla concentrations (Canfield et al. 2016; Fergus et al. 2016; Liu et al. 2012). For example, deeper lakes had clearer water in Wisconsin, although this only occurred in watersheds with low levels of agriculture (Rose et al. 2017). Lake surface area has also been identified as an important contributor to chla (Woelmer et al. 2016). Among deeper lakes in eastern China, those with a smaller surface area had higher chla concentrations compared to larger lakes (Huang et al. 2014). Longer residence times have been associated with lower chla concentrations in Europe (Noges 2009), but with higher concentrations in eastern China (Liu et al. 2012). However, none of these previous studies have comprehensively considered lake geomorphometric and hydrologic characteristics in the context of both climate and nutrient factors.

Current trajectories in climate, the continuing influence of nutrient management legacies (Sharpley et al. 2013), and the growing demand for clean freshwater underscore the importance of understanding drivers of water quality. Patterns in water quality can be highly variable, driven by many factors that operate at different spatial scales (Fergus et al. 2016). Nutrient concentrations, particularly phosphorus, are well-established as key predictors of chla (Prairie et al. 1989; McCauley et al. 1989; Abell et al. 2012; Quinlan et al. 2020), but macro-scale analyses have focused primarily on temperature and precipitation, while also excluding many geomorphometric variables (e.g. Collins et al. 2019). Additionally, previous studies have not included potentially important climate factors, such as solar radiation (Wild 2012) and other variables that can influence solar radiation at more local scales (e.g. cloudiness). Essentially, the full suite of climate and lake geomorphic variables has not been incorporated into studies of lake chla across a wide range of lake conditions.

We used a large lake chla database from 2561 lakes in 7 countries to explore how nutrient, climate, and geomorphic factors influence water quality. Using a similar dataset, Quinlan et al. (2020) used a single-predictor model incorporating co-variation of additional predictors, but here we formally partition that variation. Such large macroscale ecological studies pose challenges such as the incorporation of interactions across scales (Soranno et al. 2010; Heffernan et al. 2014), diverse data objects (Heffernan et al. 2014; Levy et al. 2014), and varied data-intensive approaches (Levy et al. 2014), yet can reveal interactions among scales that are important features of ecological systems (Peters et al. 2007; Soranno et al. 2010). In this study we asked: in addition to total phosphorus, which climatic and lake geomorphometric factors influence water quality? We used this lake chla dataset with a more complete set of climate variables to examine the relative importance of nutrient inputs, climate conditions, and lake morphology factors on chla concentrations for lakes worldwide. We explored the relative roles of temperature, solar radiation, cloud cover, and precipitation at various seasonal scales, as well as a wide range of geomorphic factors (depth, residence time, volume, surface area, elevation, and watershed area).

Methods

Data acquisition and dataset compilation

Assembling large-scale data into one unified dataset can be challenging because of the variety of thematic datasets (e.g., hydrological, geological, morphological, meteorological), data sources (e.g., in situ sampling, remote sensing, laboratory processing), and jurisdictions from which the data were collected, each with its own protocols, measurement techniques, and research purposes (Soranno et al. 2010; Heffernan et al. 2014; Levy et al. 2014). Further, the datasets may contain corrupted, missing, or meaningless data, which makes it difficult to extract the required results and reduces the power of the models (Kelling et al. 2009; Michener and Jones 2012; Levy et al. 2014). Checking data consistency and completeness is an important part of the QA/QC process to reduce data gaps and noise. However, once the data are prepared, data-driven models can allow large amounts of knowledge to be extracted while minimizing cost and time and maximizing the accuracy, speed, reliability, and comprehensibility of the models produced (Vargas et al. 2011).

From a large lake chla database that included total phosphorus (Filazzola et al. 2020), we used 2561 lakes for which we were also able to acquire climate and solar radiation data (Fig. 1; Table S1). Although the distribution of the lakes is heavily biased towards North America, our dataset contained a wide range for all variables, with an average of three orders of magnitude for water quality and geomorphic factors (range 2–6 orders of magnitude; Table S1). Each chla measurement corresponded with the year in which the measurement was taken and the lake’s latitude and longitude. In instances where the same lake was sampled in multiple locations, we took the median values of measured data for that lake. When a lake was sampled multiple times within the same year (e.g. monthly), we extracted data from the time when chla values were greatest. If a lake was sampled over multiple years, we selected data from the most recent year because it most closely resembles the most recent status of that lake. TP and climate data were collected in the same manner and matched with corresponding chla values. The Laurentian Great Lakes were also excluded because of significant spatial heterogeneity in water quality due to their size. While this dataset includes samples collected with a variety of methodologies, we do not anticipate that the international multi-jurisdictional nature of the data will influence analyses, as Hanna and Peters (1991) showed that differences in protocol and sampling depth for TP and chla did not add variation to TP-chla response models (Filazzola et al. 2020; Quinlan et al. 2020). If lake geomorphometrics were not included in the dataset, we extracted them from HydroLAKES (Messager et al. 2016). HydroLAKES provides the shoreline polygons of all global lakes with a surface area of at least 10 ha, as well as estimates of the shoreline length, average depth, water volume and residence time (Messager et al. 2016). The unique IDs in the lake chla dataset match those in HydroLAKES for this purpose (Fig. S3; Supplementary Information Table 1).
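The record-selection rules described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code (the paper's analyses were run in R): the function name `select_lake_records`, the record fields, and the sample values are all hypothetical, and the order in which the three rules are applied (median across sites, then peak chla within a year, then most recent year) is an assumption.

```python
from collections import defaultdict
from statistics import median


def select_lake_records(samples):
    """Reduce raw samples to one record per lake (hypothetical helper).

    Each sample is a dict with keys: lake_id, year, date, site, chla, tp.
    """
    # Rule 1: several sites sampled on the same lake and date -> median.
    by_visit = defaultdict(list)
    for s in samples:
        by_visit[(s["lake_id"], s["year"], s["date"])].append(s)

    visits = defaultdict(list)  # (lake, year) -> one summary per date
    for (lake, year, date), rows in by_visit.items():
        visits[(lake, year)].append({
            "date": date,
            "chla": median(r["chla"] for r in rows),
            "tp": median(r["tp"] for r in rows),
        })

    # Rule 2: several dates within a year -> keep the date of peak chla.
    peak = {key: max(rows, key=lambda r: r["chla"])
            for key, rows in visits.items()}

    # Rule 3: several years -> keep the most recent year.
    latest = {}
    for (lake, year), row in peak.items():
        if lake not in latest or year > latest[lake]["year"]:
            latest[lake] = {"year": year, **row}
    return latest


# Made-up example data for two lakes.
samples = [
    {"lake_id": "A", "year": 2005, "date": "2005-07-01", "site": 1, "chla": 10.0, "tp": 30.0},
    {"lake_id": "A", "year": 2008, "date": "2008-06-15", "site": 1, "chla": 4.0, "tp": 20.0},
    {"lake_id": "A", "year": 2008, "date": "2008-06-15", "site": 2, "chla": 6.0, "tp": 24.0},
    {"lake_id": "A", "year": 2008, "date": "2008-08-15", "site": 1, "chla": 9.0, "tp": 28.0},
    {"lake_id": "B", "year": 2001, "date": "2001-07-10", "site": 1, "chla": 2.0, "tp": 8.0},
]
selected = select_lake_records(samples)
```

In this toy example lake A is represented by its August 2008 visit, because 2008 is the most recent year and the August chla exceeds the June median across the two sites.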

Fig. 1

Map showing 2561 lake locations with water chemistry and climate data (air temperature, precipitation, solar radiation, and cloud cover)

For climate variables, we extracted air temperature (°C), precipitation (mm), and cloud cover (%) for each lake and sampling year from the University of East Anglia’s Climate Research Unit (CRU) online open access database (Harris et al. 2020). CRU interpolates air temperature and precipitation from meteorological station measurements and summarizes the database on a grid at a 0.5° latitude and longitude resolution. Data on incoming shortwave solar radiation in the near infrared (0.2–4 μm) were obtained from CLARA A-1, which is gridded at a 0.25° × 0.25° resolution and covers 1982 to 2009 (Karlsson et al. 2013). Using the monthly mean temperature, precipitation, cloud cover and solar radiation values, we calculated the mean, maximum, and standard deviation annually, for each season, and at a lag of 1 year. For the scope of our study we used spring (March, April, May for the Northern Hemisphere, and September, October, November for the Southern Hemisphere) and summer (June, July, August for the Northern Hemisphere, and December, January, February for the Southern Hemisphere) climate variables because that is when algal biomass is most abundant.
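As an illustration of the seasonal summaries described above, the following Python sketch computes the mean, maximum, and standard deviation of monthly values for a hemisphere-specific season. The names `SEASON_MONTHS` and `seasonal_stats` and the input values are hypothetical, and the sketch makes a simplifying assumption: all twelve months come from one dictionary, so it does not handle the Southern Hemisphere summer spanning two calendar years.

```python
from statistics import mean, stdev

# Season definitions taken from the text; the dictionary layout is illustrative.
SEASON_MONTHS = {
    "north": {"spring": (3, 4, 5), "summer": (6, 7, 8)},
    "south": {"spring": (9, 10, 11), "summer": (12, 1, 2)},
}


def seasonal_stats(monthly, latitude, season):
    """Summarize monthly climate values for one lake and one season.

    monthly: dict mapping month number (1-12) to a monthly mean value.
    latitude: lake latitude in decimal degrees (negative = Southern Hemisphere).
    """
    hemisphere = "north" if latitude >= 0 else "south"
    values = [monthly[m] for m in SEASON_MONTHS[hemisphere][season]]
    return {
        "mean": mean(values),
        "max": max(values),
        "sd": stdev(values),  # sample standard deviation
    }


# Made-up monthly values: month m simply has the value m.
monthly = {m: float(m) for m in range(1, 13)}
north_summer = seasonal_stats(monthly, latitude=45.0, season="summer")
south_spring = seasonal_stats(monthly, latitude=-38.0, season="spring")
```

Keying the season on latitude keeps the same call valid for lakes in either hemisphere, which matters for a dataset that spans several countries.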

Data analyses

Chlorophyll a was tested against total phosphorus (TP), climate (solar radiation, summer and spring air temperature, precipitation, and cloud cover) and geomorphometry (lake mean depth, elevation, area, residence time, total volume, and watershed area). We assessed normality of all variables using a Kolmogorov–Smirnov test and a Q–Q plot, using the “ks.test” and “qqnorm” functions in R. The primary variables of interest (chla and TP) were log10 transformed before applying the randomForest function (package = randomForest) with iteration = 5 (Liaw and Wiener 2002). Kendall’s Tau correlation coefficients were calculated between all variables using the “cor” function in R.
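The log10 transformation and rank correlation described above can be illustrated with a short Python sketch. It is not the authors' R code: `kendall_tau_a` is a hypothetical helper that computes the simple tau-a statistic without any correction for ties (an assumption; the variant used in the paper is not stated here), and the TP and chla values are invented.

```python
import math
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    No tie correction is applied; tied pairs count as neither.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Made-up TP and chla values (same units assumed for each variable).
tp = [5.0, 12.0, 30.0, 80.0, 200.0]
chla = [1.2, 3.5, 2.9, 20.0, 45.0]

log_tp = [math.log10(v) for v in tp]
log_chla = [math.log10(v) for v in chla]

tau_log = kendall_tau_a(log_tp, log_chla)
tau_raw = kendall_tau_a(tp, chla)
```

Because Kendall's tau depends only on ranks, the value is the same for raw and log10-transformed data; the transformation matters for the regression and random forest steps rather than for the rank correlations.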

We used (i) random forests; (ii) linear models; and (iii) principal component analysis to evaluate the relationships between chla and nutrient, climatic, and geomorphometric factors. The random forest was used because it is a powerful exploratory analysis and helped us identify the most influential predictor variables. We then used linear models to gauge how strong the relationships between chla and its strongest drivers were. Finally, the PCA was used to identify synergies between predictor variables, which were not evident from the random forest or linear models. Each of the three analyses provides distinct insights that help elucidate the relationship between chla and its drivers. The three approaches also led to similar conclusions, which suggests the results were robust.

First, we conducted a random forest analysis to identify the nutrient, climatic, and geomorphometric factors that influence chla at a statistically significant threshold. Decision trees are the basic building blocks of a random forest. A decision tree is a machine learning algorithm that fits complex datasets with interacting predictor variables (Ali et al. 2012; Ben-Haim and Tom-Tov 2010). Random forests are classified as strong learners because they are built as an ensemble of decision trees (Ali et al. 2012; Breiman 2001). They work through bagging, or bootstrap aggregation, in which subsets of the training data are randomly sampled with replacement, a model is fit to each of these resampled data sets, and the predictions are then aggregated (Ali et al. 2012; Breiman 1996, 2001). For our analysis we conducted 5 iterations consisting of 500 trees, with proximity = TRUE as a similarity measure that accounts for the number of times pairs of observations end up in the same terminal node. Separate cross-validation is not required because each bootstrap sample leaves out a portion of the data, and these out-of-bag (OOB) observations are used to calculate an OOB error for the model. This provides an internal check against overfitting in the random forest model.
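To make the bagging and out-of-bag idea concrete, here is a deliberately simplified Python sketch. It is not the randomForest implementation used in the paper: each "tree" is a one-split regression stump on a single predictor, there is no random selection of predictors at each split, and the data are synthetic. The helper names `fit_stump` and `bagged_oob_mse` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds (both sides non-empty)
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bagged_oob_mse(xs, ys, n_trees=200, seed=1):
    """Bagging with an out-of-bag (OOB) estimate of mean squared error."""
    rng = random.Random(seed)
    n = len(xs)
    oob_preds = [[] for _ in range(n)]
    for _ in range(n_trees):
        # Bootstrap sample: n draws with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        # Predict only for observations this stump never saw.
        for i in range(n):
            if i not in in_bag:
                oob_preds[i].append(stump(xs[i]))
    errors = [(mean(p) - ys[i]) ** 2 for i, p in enumerate(oob_preds) if p]
    return mean(errors)


# Synthetic step-shaped response, for illustration only.
xs = [i / 10 for i in range(40)]
ys = [1.0 if x >= 2.0 else 0.0 for x in xs]
oob_mse = bagged_oob_mse(xs, ys)
```

Each observation is scored only by the stumps whose bootstrap sample excluded it, so the resulting error behaves like a held-out estimate without a separate cross-validation loop.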

We then investigated the relationship between chla and TP, and the climate variables mentioned previously. Linear regression models the relationship between the predictor and response variables by fitting a linear equation ($Y = \beta_0 + \beta_1 X + \varepsilon$) to observed data (Bhattacharya and Burman 2016). We report linear regression results here because linearity checks showed that the only relationship which exhibited non-linearity was chla-TP, and generalized additive models (GAMs) showed similar results to the linear models. To test the variables influencing chla, we fitted linear regression models (function: lm) with chla as the response variable and the climate, nutrient, and lake morphometric variables as the predictors. We ran separate models for each of the predictors. Finally, we also calculated the Akaike information criterion (AIC) value for each corresponding model, where a lower AIC score indicates a more parsimonious model.
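A single-predictor regression with adjusted R² and AIC, as described above, can be sketched as follows. This is an illustrative Python version rather than the authors' R workflow: `fit_simple_lm` is a hypothetical helper, the data are invented, and the AIC is computed from the Gaussian log-likelihood while counting the residual variance as a third parameter (an assumption about the convention used).

```python
import math
from statistics import mean


def fit_simple_lm(x, y):
    """Ordinary least squares for y = b0 + b1 * x, with adjusted R2 and AIC."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - rss / tss
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor

    # Gaussian log-likelihood evaluated at the ML variance estimate rss / n.
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    k = 3  # intercept, slope, residual variance (assumed convention)
    aic = 2 * k - 2 * loglik
    return {"intercept": intercept, "slope": slope,
            "r2": r2, "r2_adj": r2_adj, "aic": aic}


# Made-up response and two candidate predictors.
y = [0.1, 0.9, 2.1, 2.9, 4.0]
x_informative = [0.0, 1.0, 2.0, 3.0, 4.0]
x_weak = [3.0, 0.0, 4.0, 1.0, 2.0]

fit_good = fit_simple_lm(x_informative, y)
fit_poor = fit_simple_lm(x_weak, y)
```

Fitting one model per predictor and comparing AIC values, as in this toy comparison, ranks predictors by how well each one alone accounts for the response.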

Finally, we conducted a Principal Components Analysis (PCA) in R (function prcomp) to summarize the relationships between chla, climate, and nutrient levels with a reduced number of dimensions. The previous two models showed which predictor variables influence algal growth in lakes and how strongly they influence it; the PCA, in contrast, emphasizes major directions of variation and helps identify the relationships among predictor variables and how they may have combined effects on algal growth in lakes. The PCA is an algorithm that generates principal components (ordination axes) along which the data show the maximum amount of variation. One of the outputs of a PCA is a two-dimensional ordination plot, which allows for the visualization of relationships among predictor variables and among individual observations (Legendre and Legendre 2012). We did not include the geomorphometric variables because there were missing data and we did not want to compromise our sample size, which would have been reduced by 37%. Further, we wanted to focus on the impacts that climate and nutrients have on lake primary production, as these were the major factors explaining variation overall. We generated the PCA visual using the ‘fviz_pca_var()’ function from the package = factoextra.
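The idea of a principal component as the direction of maximum variation can be illustrated with a small standard-library Python sketch. It is not equivalent to the prcomp call used in the paper: it extracts only the first component by power iteration on a correlation matrix, it assumes the variables are standardized (whether the authors scaled their variables is not stated here), and the data and helper names are invented.

```python
import math
from statistics import mean, stdev


def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    m, s = mean(col), stdev(col)
    return [(v - m) / s for v in col]


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(matrix, iterations=500):
    """Leading eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(vi * mvi for vi, mvi in zip(v, mv))
    return eigenvalue, v


# Made-up values for four variables measured on six lakes.
log_chla = [0.1, 0.4, 0.5, 0.9, 1.3, 1.6]
log_tp = [0.7, 1.0, 1.2, 1.5, 1.9, 2.3]
temp = [14.0, 15.5, 17.0, 16.5, 19.0, 21.0]
cloud = [70.0, 66.0, 60.0, 63.0, 55.0, 50.0]

corr = correlation_matrix([log_chla, log_tp, temp, cloud])
eigenvalue, loadings = first_component(corr)
# For a correlation matrix the total variance equals the number of variables.
proportion_pc1 = eigenvalue / len(corr)
```

In an ordination plot the loadings give the arrow directions: variables whose loadings share a sign on an axis point the same way along it, while opposite signs (here cloud cover versus the others) point in opposing directions.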

Results

The random forest analysis using nutrient concentrations, climatic variables, and lake geomorphometric characteristics as predictors explained 60% of variation in lake chla levels. The random forest regression had a mean of squared residuals of 0.12, suggesting that the model is a good fit (Fig. 2). Total phosphorus was the most important factor influencing lake chla concentrations, explaining approximately 42% of the variation that was explained (Fig. 2). Climatic variables, comprising summer and spring air temperatures, solar radiation, precipitation, and cloud cover, cumulatively explained 38% of the total variation that was explained (Fig. 2). Notably, the most influential climatic variables were summer and spring air temperature, followed by spring solar radiation, which was more influential than precipitation and cloud cover. Geomorphometric characteristics accounted for the least amount of variation in chla and contributed 20% of the total variation that was explained, with lake watershed area being the most important (Fig. 2).

Fig. 2

Stacked bar plot representing the proportion of variation in chla explained by each variable in the random forest model. Variables are separated into groups characterized by nutrients (total phosphorus), climate (spring and summer solar radiation, cloud cover, precipitation, and air temperature), and morphology (elevation, lake area, mean depth, residence time, volume, watershed area)

The linear models supported the results from the random forest (Fig. 3). The TP-chla relationship was best explained by a linear model (P < 0.001, R2adj = 0.47, AIC = 6581.4) which suggests a fairly strong model fit and a significant positive relationship (Fig. 3). The linear models also supported the influential role of solar radiation (P < 0.001, R2adj = 0.041, AIC = 8367.04) (Fig. 3), summer air temperature (P < 0.001, R2adj = 0.14, AIC = 8085.00), and summer precipitation (P < 0.001, R2adj = 0.019, AIC = 8425.50) as positive correlates of chla. Conversely, summer cloud cover (P < 0.001, R2adj = 0.045, AIC = 8356.17) was negatively associated with chla.

Fig. 3

Chlorophyll a concentrations plotted against the most influential climate and geomorphometric variables, with linear models for a total phosphorus (R2adj = 0.47), b spring solar radiation (R2adj = 0.041), c summer air temperature (R2adj = 0.14), and d summer cloud cover (R2adj = 0.045)

The first two axes of the PCA explained 53.7% of the variation, with principal axis one explaining 33.8% and principal axis two explaining 19.9% of the variation (Fig. 4). Chlorophyll a was closely associated with spring and summer temperature, TP, and also associated with spring solar radiation (Fig. 4 and S1).

Fig. 4

The most variation was explained by Principal Component 1 (Dim1), followed by the second principal component axis (Dim2). The arrows depict the environmental variables considered in the analysis. The longer the arrow, the more influential the variable. The angle between arrows suggests the correlation between variables. The acronyms are as follows: TP (total phosphorus), Temp (temperature), Ppt (precipitation), Srad (solar radiation), Cldcvr (cloud cover). TP, spring and summer temperature, and spring Srad are closely aligned with chla, and summer Srad is more distantly aligned with chla. Spring and summer cloud cover are orthogonal to chla

Discussion

Our study examined chla concentrations across a large gradient of lakes to identify the most influential factors associated with algal biomass and attempted to understand how these potential drivers may synergistically influence the abundance of lake algae. Our models confirmed that total phosphorus (TP) was the most influential driver of chla concentrations, yet the combined impact of climate variables (temperature, solar radiation, precipitation, and cloud cover) was equally important. In particular, solar radiation had a critical influence on summer chla when incorporated into our models (see Supplementary Information for the models without solar radiation for comparison). Lake geomorphometric characteristics also explained some variation in our models, but to a much lesser extent than TP and climate variables.

Our results underscore the role of total phosphorus as a primary predictor of chla in freshwater lakes even when analyses are global in scale, maximizing climate gradients. We found that TP explained approximately 40% of the variation in chla concentrations. Total phosphorus has been well established as a predictor of chla, including at large scales (Abell et al. 2012; McCauley et al. 1989; Prairie et al. 1989; Quinlan et al. 2020). Freshwater ecosystems naturally act as nutrient sinks and also receive phosphorus inputs from a number of human activities, e.g. excess P-fertilizer use (Carpenter 2005), P-containing pesticides (Arbuckle and Downing 2001), and domestic and industrial sewage (Smith and Schindler 2009). Our results showing that the correlation between TP and chla concentration was moderate (⍴ = 0.5; Fig. S1) and that TP explained approximately 40% of the variation in chla concentrations are consistent with previous examinations of the TP-chla relationship at global scales (Abell et al. 2012; Quinlan et al. 2020). Across a large gradient of phosphorus concentrations, the TP-chla relationship is sigmoidal, structured by breakpoints related to limitation by nitrogen among other factors (Quinlan et al. 2020). Variation within the global TP-chla relationship is large and the nature of the relationship varies among regions (Filstrup et al. 2014; Quinlan et al. 2020). This variation is postulated to be due to a wide array of factors in addition to nitrogen limitation, both when phosphorus is very high (e.g. Filstrup and Downing 2017) and very low (e.g. Morris and Lewis 1988), as well as watershed land use (Filstrup et al. 2014). For example, the accumulation of algal biomass could be limited by cold water temperatures (Butterwick et al. 2005) or shading from dense algal blooms (Agustí et al. 1990). Alternatively, algal biomass could be greater than expected due to a combination of lake depth and transparency allowing for specific species to thrive (Hennemann and Petrucio 2016).

In addition to exploring the chla-TP relationship, we were able to examine the importance of climate variables for predicting chla levels. The combination of climate variables considered in our study explained as much variation as TP in global chla concentrations. For example, past studies have suggested that chla concentrations were lower in regions with less surface level solar radiation (Zhang et al. 2020). Light limitation of algal growth has been demonstrated in many lakes by experiments (e.g. Dubourg et al. 2015) and can vary by season (e.g. Kolzau et al. 2014). For example, in western Lake Erie, solar radiation was a top predictor of chla concentrations in spring and summer compared to other climate and nutrient factors, explaining 38% and 25% of the total variation in those seasons, respectively (Tian et al. 2017). Increases in winter and spring solar radiation over the past two decades were drivers of phytoplankton abundance and composition in those seasons in Lake Taihu (Deng et al. 2019). Increases in solar radiation have also been linked to warming of lake water temperatures around the globe (O’Reilly et al. 2015; Schmid and Köster 2016; Zhong et al. 2016). We observed strong correlations between solar radiation and air temperature across a wide range of lakes (Fig. S1). Solar radiation contributes to lake heat budgets and water temperatures (Schmid et al. 2014), thus our lake temperatures were also likely warmer with greater solar radiation. Warmer water temperatures increase algal growth rates and have been linked to increases in chla and algal blooms (Ho et al. 2019). Since we did not have water temperature data, we could not examine whether the importance of solar radiation for predicting chla was related to water temperature, increased light availability, or both. Additionally, we were unable to explore trends in chla concentrations within lakes, which could also facilitate understanding the relative roles and interactions of these different drivers. We suggest that future studies investigate the contribution of global brightening as well as water temperature to chla trends in lakes around the world.

The ability to identify the major drivers of lake chla depends on both the suite of factors incorporated into the analyses and the time scales that are involved. The climate variables found to predict chla levels and their relative importance have varied significantly among studies. For example, summer and spring temperature, as well as winter precipitation, were the most important predictors of chla among lakes in the northeastern region of the United States (Collins et al. 2019). However, a similar study found that water clarity in the northeast US correlated more strongly with summer precipitation, and not with maximum air temperatures, while in the midwest there was no relationship to summer precipitation and a negative correlation with maximum air temperatures (McCullough et al. 2019). Most prior studies have been regional in scale and investigated only temperature and precipitation (e.g. Collins et al. 2019), using lag times of conditions ranging from the prior two weeks (Lennard 2019) to the previous year (Collins et al. 2019). In addition, lake trophic status can be an important consideration, as a study of 188 lakes found that the relationship between water temperature and chla was positive for eutrophic lakes, but negative for oligotrophic ones (Kraemer et al. 2017). Thus, while it is difficult to compare specific findings across studies, temperature and precipitation are consistently key overarching factors, and it is clear that lake water quality is sensitive to climate in ways that are highly variable within and across regions.

Our results showed that the relationships between chla and lake geomorphometric and hydrological factors were less important in the broader context of climate and nutrients. Although geomorphic factors (area, mean depth, and elevation) and hydrology (residence time) did play a role in determining chla, these factors were much less important than climate and total phosphorus. Lake mean depth was the most important of these factors, but in combination, morphology and residence time only contributed about 20% of the variation in chla (Fig. 2). Mean depth and water residence time have explained a much greater proportion of the variation in other studies, but this may be because all lakes considered in those studies were relatively deep (Liu et al. 2012). Ultimately, lake depth affects the mixing regime and degree of stratification, which is probably the more proximal factor influencing the TP-chla relationship. Chlorophyll a was negatively correlated with mean depth, as has been found in other studies (Fig. 3f) (Canfield et al. 2016; Fergus et al. 2016). Shorter residence times are often associated with a flushing effect, leading to relatively lower chla concentrations (Wagner et al. 2011; Filstrup et al. 2014). Longer residence time can allow for the development of greater biomass in lakes that already have high nutrient concentrations (e.g. Huang et al. 2014) but does not appear to influence chla concentrations in lakes that have low nutrient concentrations (e.g. Nõges 2009). Overall, our results are consistent with various regional-scale studies that indicate an influence of lake morphology and hydrology on chla but highlight that these factors are relatively minor compared to climate and total phosphorus.

Conclusion

While our study underscored the importance of TP as a driver of chla concentrations, we also found that the role of climate was just as important as TP. The primary role of TP in chla has been well established, but the variation within this TP-chla relationship is large (Quinlan et al. 2020). Our results indicate that the factors contributing to this variation are primarily composed of a combination of climate and geomorphometric variables, but spring and summer climate variables also clearly emerged as important factors influencing summer chla, further highlighting their role in ecosystem structure at various spatiotemporal scales (Collins et al. 2019). Geomorphometric and hydrological features were also important but had a reduced role relative to climate in terms of explaining overall variation in summer chla. The relatively similar contributions of climate and nutrients suggest that management plans focused primarily on nutrient reductions may also need to consider the implications of ongoing climate change in order to achieve desired outcomes (Trolle et al. 2011; Paerl et al. 2016). Our study highlights the specific aspects of climate that should be taken into consideration, and fortunately future projections for these climate variables are readily available from existing climate models (Flato et al. 2013). Changes in temperature and precipitation during spring and summer seasons are occurring across many parts of the world, and these changes will clearly present challenges for management of water quality if not incorporated into water quality management frameworks.