Introduction

Disruption of the biogeochemical cycle of phosphorus (P) is a major aspect of global environmental change (Smith et al. 1999). The use of P in fertilizers and detergents dramatically increased its bioavailability in the natural environment with corresponding deleterious effects on inland waters, such as increased productivity and altered biological structure (Carpenter et al. 1998), which can be broadly termed “cultural eutrophication”. As well as the unambiguous biological/ecological effects, there are also socio-economic costs associated with eutrophication, namely increased water treatment costs and major loss of amenity value (Dodds et al. 2009). The role of P in the eutrophication of lakes and rivers was recognised in the early-1970s (Schindler 1977) and since that time considerable effort has been made to reduce the availability of P in the natural environment, primarily through the use of sewage treatment plants to remove it from urban wastewater (e.g. Chapra 1977).

Today, however, phosphorus is back on the environmental agenda. Excessive use in agriculture has led to P saturation of soils and a resultant increase in diffuse losses of soluble P (Carpenter et al. 1998). Loss of P from soils in its soluble form, as opposed to particulate P loss which requires soil erosion, is more difficult to control and has correspondingly greater environmental impacts (Sharpley et al. 1994). It is also exacerbated by the links to climate forcing and increased hydrological variability (Jeppesen et al. 2009).

Given the effect of excessive P on the biological structure of freshwater ecosystems, considerable effort has been put into restoration and the re-establishment of natural, background conditions (Bennion and Battarbee 2007). Restoration focuses on reducing P-loading on lakes to arrive at a pre-disturbance P concentration. As most lake monitoring post-dates increases in P-loss from catchments, a number of different approaches have been used to estimate the background P-concentration in lakes, notably export coefficient modelling, geochemical P analysis and diatom-based inference models (see Rippey and Anderson 1996 for a review).

Arguably, there are problems associated with all of these approaches but the acceptance that diatom community structure reflects epilimnetic TP has seen widespread application of diatom-based reconstructions, using so-called transfer functions based on weighted averaging (WA) models (e.g. Hall and Smol 1992; Anderson and Rippey 1994; Bennion et al. 1996). While a number of studies have successfully validated the reconstruction against historic water quality records (Bennion et al. 1995; Lotter 1998), others have identified problems with the approach, most notably the underestimation of present-day TP values in shallow hypereutrophic lakes (Fritz et al. 1993; Bennion et al. 2001; Sayer 2001; Dong et al. 2008) and, perhaps more importantly from a restoration perspective, the over-estimation of the pre-disturbance, background concentration at levels that are inexplicable in terms of land management and phosphorus export (Bennion et al. 2001).

Originally, these problems with diatom-inferred P were seen as the result of spatial heterogeneity in diatom sedimentation (Adler and Hubener 2007; Anderson 1998), errors due to the temporal variability in water chemistry used in the calibration models (e.g. Stauffer 1988), or bias caused by uneven or truncated sampling along the P gradient (Telford and Birks 2011). To correct for the latter two problems, considerable effort was put into amalgamating regional datasets in order to sample taxa across the full P gradient and thereby derive more accurate estimates of species’ optima (Bennion et al. 1996). Clearly, however, these attempts have been unsuccessful or at least have not provided realistic inferred values at all sites (Anderson and Odgaard 1994; Bradshaw et al. 2005). Further, a growing body of literature over the last 20 years has called for increasing caution in interpreting these results, owing either to confounding secondary gradients or to the prevalence of diatom species that are not responding directly to epilimnetic TP (e.g., Fritz et al. 1993; Bennion 1995; Bennion et al. 2001, 2010; Sayer 2001; Dong et al. 2008).

Juggins (2013) highlights a number of problems with the transfer function approach when the underlying assumptions are violated. Those assumptions are (1) that TP is, or is linearly related to, an ecologically important determinant of species distribution and abundance, and (2) that the effect of secondary environmental variables or gradients is negligible, or that their co-variation with TP is constant in space and time. In this paper we examine these assumptions in the context of quantitative TP reconstructions using examples from culturally impacted eutrophic lakes in North America and Europe. First, we examine a selection of published TP training sets and assess the degree to which (1) TP is a main driver of diatom distribution, and (2) the resulting transfer functions model an ecological response to TP. Second, we examine the use of these training sets to reconstruct TP and explore the validity of the assumption that down-core changes in diatom assemblage composition are driven by TP and that the methods are insensitive to the effects of other gradients or variables.

Methods

Training sets

Table 1 lists the characteristics of the published TP training sets used in this paper. The first two training sets (Minnesota: MN and Northwest European: NW) are used to investigate the strength of the diatom-TP relationship (see below). The remaining six are used to examine the repeatability of the modelled species responses to TP in different regions of Europe and North America. The original Minnesota (MN) diatom calibration set contained 55 lakes (Ramstack et al. 2003) and was subsequently expanded with an additional 90 sites to extend the TP gradient across four natural ecoregions as well as the intensively urbanised Minneapolis/St. Paul metropolitan area (Edlund and Kingston 2004; Reavie et al. 2005; Edlund 2005). The Northwest European dataset (NW: Bennion et al. 1996) is an amalgamation of six smaller regional datasets from southeast England, the English Midlands, Wales, Northern Ireland, Denmark, and Sweden. The lakes are mostly lowland, shallow, small, slightly acid to alkaline waters with agricultural activity and/or forestry in the catchments. The combined dataset spans a long TP gradient from oligotrophic to hypereutrophic waters. The remaining training sets and their analytical methods are described in Lotter et al. (1998: SW), Wunsam and Schmidt (1995: CE), Gregory-Eaves et al. (1999: AL), Reavie and Smol (2001: ON), Enache and Prairie (2002: QU) and Shinneman et al. (2010: NE).

Table 1 List of diatom-TP training sets and their characteristics

Sediment cores

Details of the five lake-sediment diatom profiles used here are listed in Table 2. Lotus Lake is located in Polk County, Wisconsin, within 15 km of the MN/WI border in the North Central Hardwood Forests ecoregion (Omernik 1987). The lake has a maximum depth of 4.5 m and an approximate surface area of 1.0 km2. The lake catchment is approximately 11 km2 with approximately one-third forested and one-quarter under agriculture. During 2005 and 2007 it maintained an average summer TP concentration of 159 μg L−1 (Ramstack and Edlund 2008). Lotus and nearby Horse Lake are the headwaters to Cedar Lake, which is a US federal 303(d) listed impaired water body. Sediment coring, diatom analysis and 210Pb chronology are described in Ramstack and Edlund (2008). Winona Lake is located within the city of Alexandria, MN, also in the North Central Hardwood Forests ecoregion (Omernik 1987). The lake is approximately 3 km long and less than 0.5 km wide; the surface area is 0.89 km2 and the maximum depth is 2.4 m. The lake was surrounded by farmland following European settlement, but since the 1970s it has become largely surrounded by residential development. The lake is considered hypereutrophic, with current average TP values well over 200 μg L−1 (MPCA 2012). Since the 1960s Winona Lake has been the receiving water body for municipal wastewater treatment effluent. The Alexandria Waste Water Treatment Plant (WWTP) began operation in 1967 and was replaced in 1978 by the Alexandria Lake Area Sanitary District (ALASD) WWTP. Since 1994, tertiary treatment has been used at the plant in an effort to reduce nutrient loading to the lake. The site was cored in 2008 as part of a larger project to characterize the trophic and sedimentation history of a suite of lakes in the region to help the Minnesota Pollution Control Agency set management goals. Sediment coring, diatom analysis and 210Pb chronology are described in Edlund and Ramstack (2009).

Table 2 List of lake core-sites and their characteristics

Knud Sø is a relatively large (0.19 km2) and deep (maximum depth 92 m), monomictic, mesotrophic lake situated on Jutland, Denmark. Present day TP concentration is 25 μg L−1. The lake has a predominantly agricultural catchment but the eutrophication of the lake was largely driven by sewage disposal from the small town of Ry, which was effectively eliminated in 1974. There is extensive monitoring data that records the recovery of the in-lake TP concentration from the onset of wastewater treatment in the early 1980s. The sediment core from Knud Sø was collected in 1998 using a 1-m Mackereth corer and dated using 210Pb (Appleby 2001). Vængsø is a shallow, eutrophic lake located some 20 km from Knud Sø and is typical of the many shallow lakes found in Denmark (Jeppesen et al. 1991). The lake has a mean depth of approximately 1 m and a very short retention time (<20 days). Point source nutrient inputs ceased in 1981 but with no immediate effect. The TP concentration has been reduced since the 1980s following biomanipulation (Lauridsen et al. 1994). In general, both point and diffuse nutrient inputs to Danish lakes have been reduced since the early 1980s as a result of the implementation of catchment management plans (Jeppesen et al. 1999). Further information on the limnology of Vængsø is provided by Lauridsen et al. (1993) and details of core retrieval, 210Pb dating and diatom analysis can be found in Anderson and Odgaard (1994). Lough Augher is a small (0.09 km2) eutrophic lake in Co. Tyrone (Northern Ireland, UK). The lake is monomictic and underwent eutrophication as a result of wastewater disposal from a creamery located on the shore of the lake. Details of core dating and the diatom record of this eutrophication and the associated ecological recovery following the redirection of the creamery effluent are provided by Anderson (1989) and Anderson and Rippey (1994).

Numerical methods

We test the assumptions of the transfer function approach described above by using redundancy analysis (RDA) and partial RDA to assess the significance, strength, and independence of TP as a predictor of overall diatom assemblage composition. Specifically, we use RDA with a Monte Carlo permutation test (Leps and Smilauer 2003) to identify which environmental variables in the training sets are significantly related to diatom distribution. An RDA with the subset of significant variables is then used to identify the main species/environment relationships in the NW and MN datasets and to test the assumption that TP is, or is linearly related to, an important ecological variable in these systems.

The strength and independence of the TP/diatom relationship is quantified using RDA and partial RDA to partition the variation in the diatom assemblages into components reflecting the total and unique contributions that can be attributed to (1) TP, (2) pH/alkalinity, and (3) lake maximum depth.
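The partitioning into total and unique effects can be illustrated with a minimal Python sketch. This is not the code used for the analyses reported here, which were run in R; the function name `explained_fraction`, the synthetic site data and the 12 hypothetical taxa are assumptions made purely for illustration. The sketch treats RDA as a multivariate linear regression of the centred species data on the centred predictors, and partial RDA as the same regression after the predictors have been residualised on the covariables.

```python
import numpy as np

def explained_fraction(Y, X, Z=None):
    """Fraction of the total variance in Y explained by X.

    If covariables Z are given, X is first residualised on Z, so the
    result is the unique (conditional) effect of X, as in a partial RDA.
    Y, X and Z are 2-D arrays with one row per site.
    """
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    total = (Yc ** 2).sum()
    if Z is not None:
        Zc = Z - Z.mean(axis=0)
        # Keep only the part of X that is not linearly predictable from Z.
        Xc = Xc - Zc @ np.linalg.lstsq(Zc, Xc, rcond=None)[0]
    coef = np.linalg.lstsq(Xc, Yc, rcond=None)[0]
    return ((Xc @ coef) ** 2).sum() / total

# Synthetic training set (assumption): correlated TP, alkalinity and depth.
rng = np.random.default_rng(0)
n_sites, n_taxa = 80, 12
alk = rng.normal(size=n_sites)
tp = 0.7 * alk + 0.5 * rng.normal(size=n_sites)
depth = -0.6 * tp + 0.6 * rng.normal(size=n_sites)
Y = (np.outer(tp, rng.normal(size=n_taxa))
     + np.outer(alk, rng.normal(size=n_taxa))
     + np.outer(depth, rng.normal(size=n_taxa))
     + rng.normal(size=(n_sites, n_taxa)))

TP = tp[:, None]
others = np.column_stack([alk, depth])
total_tp = explained_fraction(Y, TP)             # marginal (total) effect
unique_tp = explained_fraction(Y, TP, Z=others)  # conditional (unique) effect
print(f"total TP effect:  {100 * total_tp:.1f} %")
print(f"unique TP effect: {100 * unique_tp:.1f} %")
```

In this formulation the variance explained jointly by TP and a covariable equals the total effect of the covariable plus the unique effect of TP, which is the identity on which the partitioning rests. Permutation tests of significance are not included in the sketch.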

The ordination-based variance partitioning provides a summary of the strength of community response to individual or multiple environmental variables. We also assess the importance of TP as an explanatory variable by assessing the proportion of taxa in the training sets that individually exhibit a significant response along the TP gradient using generalised additive models (GAMs: Yee and Mitchell 1991). These models provide a flexible approach that can fit a range of response forms (linear, unimodal, skewed). We tested two types of responses: (1) the significance of the overall response to the measured TP gradient, and (2) the significance of the unique explanatory power of TP, once the effect of alkalinity has been excluded. GAM modelling was restricted to taxa with more than 10 occurrences. The smoothing parameter was assessed using generalised cross-validation and the significance of the fitted model assessed using an F test (Wood 2006).

If TP is an ecologically important variable controlling diatom abundance and distribution we would expect to observe a similar response of a taxon along the TP gradient in different regions. We test this hypothesis by comparing optima (i.e. the transfer function coefficients) from different regions. However, it is known that estimates of species optima can be biased by uneven or truncated sampling along the TP gradient (Mohler 1983; Telford and Birks 2011). We therefore calculate Spearman’s rank correlation between optima derived from the eight training sets listed in Table 1. This is a less stringent test and simply requires consistency in the order of optima in different regions, not their absolute values. Taxa with fewer than ten occurrences are excluded and correlations are only calculated between datasets with at least seven taxa in common.
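The comparison of optima can be sketched as follows, again in Python and for illustration only. The function names, the dictionary layout and the use of abundance-weighted means as optima are assumptions of this sketch, not a description of the original R workflow; the thresholds of ten occurrences and seven shared taxa follow the text above.

```python
import numpy as np

def wa_optima(abund, tp):
    """Weighted-averaging optimum of each taxon: the abundance-weighted
    mean of TP over sites (abund is sites x taxa, tp has one value per site)."""
    abund = np.asarray(abund, dtype=float)
    tp = np.asarray(tp, dtype=float)
    return (abund * tp[:, None]).sum(axis=0) / abund.sum(axis=0)

def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v))
    ranks[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        tied = v == val
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]

def compare_optima(opt_a, opt_b, occ_a, occ_b, min_occ=10, min_shared=7):
    """Rank correlation of optima for taxa shared by two training sets.

    opt_* map taxon name -> optimum; occ_* map taxon name -> number of
    occurrences. Returns None when too few taxa are shared.
    """
    shared = sorted(t for t in opt_a
                    if t in opt_b
                    and occ_a[t] >= min_occ and occ_b[t] >= min_occ)
    if len(shared) < min_shared:
        return None
    return spearman([opt_a[t] for t in shared], [opt_b[t] for t in shared])
```

Because only ranks enter the calculation, two regions in which the same taxa occupy the same order along the TP gradient give a correlation of 1 even when the absolute optima differ, which is why the test is less stringent than a comparison of the optima themselves.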

Past estimates of TP concentrations were derived for each of the sediment core samples using either a weighted averaging (WA) model with inverse deshrinking (Lotus and Winona) or a two-component WA partial least squares model (Knud Sø, Vængsø and Augher), based on the MN and NW training sets respectively. Choice of model type was based on the original training set publication and verified using a randomisation t test (van der Voet 1994; Juggins and Birks 2012). We also project the sediment cores onto an ordination of the training set, constrained by the full set of environmental variables, to provide a qualitative estimate of the core trajectories in relation to the dominant environmental gradients represented in the species–environment ordination. We calculate two measures to estimate the extent to which down-core changes in diatom assemblages are related to TP. The first (λRP) is the variance in the diatom data that is accounted for by the diatom-inferred TP (DI-TP) reconstruction (calculated as the variance explained by the first axis of an RDA of the sediment assemblages constrained by DI-TP), divided by the variance explained by the first axis of a PCA of the same data: the division expresses the variance accounted for by DI-TP as a fraction of the maximum explainable and facilitates comparison between sites. The second is the correlation between DI-TP and axes 1 and 2 of a PCA of the sediment assemblages. We would expect both λRP and the correlation with the PCA axes to be high if down-core changes in species abundance are systematically related to changing nutrient concentrations, as reflected in DI-TP. Finally, we calculate down-core diatom species turnover as the Bray–Curtis distance between adjacent levels, and the down-core rate of change as this measure divided by the time interval between samples. Species data were square-root transformed prior to RDA and WA(PLS).
All numerical analyses were performed using R statistical software (R Core Team 2012), with packages vegan (Oksanen et al. 2012), rioja (Juggins 2012) and mgcv (Wood 2006).
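The two down-core summaries described above, species turnover with its rate of change and λRP, can be sketched in Python as below. This is an illustrative reconstruction under stated assumptions rather than the R code used for the analyses: the function names are hypothetical, Bray–Curtis distances are computed on untransformed abundances, and the square-root transformation is applied to both the RDA and the PCA in λRP so that the two use the same data.

```python
import numpy as np

def bray_curtis(a, b):
    """Bray-Curtis distance between two assemblages (non-negative abundances)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.abs(a - b).sum() / (a + b).sum()

def turnover_and_rate(assemblages, ages):
    """Turnover between adjacent levels and turnover per unit time.

    assemblages: levels x taxa, ordered down-core; ages: one date per level.
    """
    assemblages = np.asarray(assemblages, dtype=float)
    turnover = np.array([bray_curtis(assemblages[i], assemblages[i + 1])
                         for i in range(len(assemblages) - 1)])
    interval = np.abs(np.diff(np.asarray(ages, dtype=float)))
    return turnover, turnover / interval

def lambda_rp(assemblages, di_tp):
    """Variance explained by RDA axis 1 (constrained by DI-TP alone)
    divided by the variance explained by PCA axis 1 of the same data."""
    Y = np.sqrt(np.asarray(assemblages, dtype=float))  # square-root transform
    Yc = Y - Y.mean(axis=0)
    x = np.asarray(di_tp, dtype=float)
    x = x - x.mean()
    # A single constraint gives one RDA axis: the fitted values of the
    # regression of every taxon on DI-TP.
    fitted = np.outer(x, x @ Yc) / (x @ x)
    rda_axis1 = (fitted ** 2).sum()
    # PCA axis 1 variance is the largest squared singular value.
    pca_axis1 = np.linalg.svd(Yc, compute_uv=False)[0] ** 2
    return rda_axis1 / pca_axis1
```

Defined this way λRP cannot exceed 1, and it equals 1 only when DI-TP is perfectly collinear with the first principal component of the fossil assemblages, which is the sense in which the ratio expresses DI-TP as a fraction of the maximum explainable variance.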

Results

Training sets

Figure 1 shows the results of the RDA analysis of the NW and MN training sets using a subset of significant environmental variables. Comparison of the variance explained by axis 1 (NW = 11.2 %, MN = 13.6 %) with that from an unconstrained principal components (PCA) ordination (NW = 15.1 %; MN = 17.4 %) indicates that the set of significant environmental variables accounts for a major part of the underlying main gradients in the diatom data. For NW the first RDA axis represents a composite gradient of nutrients (TP, NO3) and alkalinity/pH/conductivity, reflecting the strong correlation of these variables in the training set (r = 0.57–0.76). Depth is negatively correlated to TP (r = −0.47) and contributes to both axes 1 and 2. A similar pattern is shown by MN although in this training set the correlations between TP and pH/alkalinity are slightly lower (r = 0.51–0.58) and there is some separation of these gradients in their contribution to axis 2 (Fig. 1). Depth is strongly negatively correlated to TP (r = −0.78).

Fig. 1

RDA ordination biplots of the NW and MN training sets, showing sites and environmental variables, variance explained by each axis (λ1, λ2), and (inset) correlations between TP and selected variables

Results of the RDA analyses indicate that in both the NW and MN training sets, TP is strongly correlated to axis 1, which represents the main gradient of species turnover in the training set data. Results of the variance partitioning are illustrated in Fig. 2 and quantify the strength of this relationship. The figure displays two components of explained variance for each variable/gradient: the marginal or total effect, and the conditional effect, or the fraction of variation that can be uniquely attributed to each variable. For both training sets the total effects of TP and pH/alkalinity are similar, with both gradients individually accounting for 9–11 % of the variance in the diatom data. Lake maximum depth accounts for a total of 5.3 % in NW and 8.8 % in MN. For TP and pH/alkalinity the unique effect is substantially smaller than the total effect, with unique TP effects of 3.9 and 2.5 % for NW and MN respectively. This is not a surprising result, given that a strong positive correlation between TP and pH/alkalinity has previously been demonstrated in natural lakes (Vighi and Chiaudani 1985). Thus, although TP has a strong total effect in these training sets only a relatively small fraction of this total effect (25–40 %) can be uniquely attributed to TP. The remainder (60–75 %) is confounded with other correlated variables, primarily pH, alkalinity and lake depth.

Fig. 2

Bar charts showing the percentage of the variance in the square-root transformed diatom data in the NW and MN training sets explained by different hydrochemical and physical gradients. Total height and darker shaded areas show, respectively, the total and unique fractions of variance explained by each gradient. All components are significant (p ≤ 0.05), see text for explanation

The ordination analyses and variance partitioning examine the strength of the whole diatom assemblage response to TP and other variables. Results of the individual species response modelling using GAMs are listed in Table 3. For NW, 15 of the 22 planktonic taxa (68 %) exhibit a significant response along the TP gradient whereas only 47 of 119 benthic taxa (39 %) do so. Similarly, for MN, 18 of the 26 planktonic taxa (69 %) but only 48 of 114 (42 %) benthic taxa show a significant response to TP. Species response modelling with TP as a single explanatory variable essentially tests the significance of the total TP effect on individual species distribution. The final row in Table 3 lists the number of taxa with significant responses to TP with the effect of the confounding alkalinity gradient removed. Only 6 (27 %) planktonic and 22 (18 %) benthic taxa show significant responses in the NW training set, and only 7 (27 %) planktonic and 25 (22 %) benthic taxa show significant responses in the MN training set.

Table 3 Results of the GAM species response modelling, showing the total number of planktonic and benthic diatom taxa with more than 10 occurrences in each training set (N) and the number that exhibit significant (N sig: p ≤ 0.05) response along the TP gradient or the TP gradient after accounting for the effects of alkalinity (TP | Alkalinity)

The comparison of species optima between the MN and NW training sets analysed here and the six additional training sets listed in Table 1 is shown in Fig. 3. Considering all taxa, there is a significant correlation in the species optima in nine of the possible 28 training set inter-comparisons, with the highest correlations between NW and MN (r = 0.45) and MN and ON (r = 0.44). There are only 6 significant correlations when benthic taxa alone are compared and the correlations are generally lower (mean = 0.14). For planktonic taxa 9 of the 14 possible comparisons are significant and the correlations are high (mean = 0.54).

Fig. 3

Scatterplot matrix showing the relationship between species weighted averaging optima for the training sets listed in Table 1. The three columns of numbers in the bottom left cells show number of taxa compared (top), the Pearson’s product moment correlation (middle) and associated p value (bottom) for all taxa (left), planktonic taxa only (middle) and benthic taxa only (right). Significant correlations (p ≤ 0.05) are asterisked. Filled circles planktonic taxa. Grey lines show 1:1 relationship. See Table 1 for training set codes

TP reconstructions and sediment core trajectories

Summary diatom stratigraphy, TP reconstructions, and assemblage trajectories projected onto the ordination of the training sets are shown in Fig. 4. Summary statistics for the reconstructions are listed in Table 4. The key changes at each site are as follows. At Lotus Lake the basal diatom assemblages are dominated by small fragilarioid taxa. Aulacoseira ambigua increases gradually from around 1900, followed by an increase in Fragilaria crotonensis and, later, Asterionella formosa. Diatom-inferred TP (DI-TP) increases from c. 35 μg L−1 at the base to c. 50 μg L−1 at the core top, significantly lower than the measured value of c. 160 μg L−1. The core trajectory shows most variation on axis 2 with no systematic trend along axis 1, the nutrient gradient. The fraction of variance in the fossil assemblages explained by the TP reconstruction (λRP) for the whole core is low (0.37) but substantially higher for the post-1950 section and there is only weak or moderate correlation between DI-TP and PCA axes 1 and 2 respectively. At Winona Lake the basal assemblages are dominated by fragilarioid and other benthic taxa. These are replaced from the 1940s onwards by a succession of planktonic taxa, from A. ambigua and A. granulata to, after 1970, Cyclotella meneghiniana and Stephanodiscus taxa. DI-TP increased steadily from 40 μg L−1 in 1800 to c. 100 μg L−1 at present, somewhat lower than the measured value of over 200 μg L−1. λRP and the correlation between DI-TP and PCA axes are high for both the whole core and post-1950 section and the core clearly tracks the nutrient gradient in the training set ordination. The assemblages at Knud Sø are characterised by the planktonic taxa Aulacoseira subarctica, A. islandica, Stephanodiscus parvus and S. alpinus throughout the core, with increases in small Cyclotella species and F. crotonensis c. 1850, and a large increase and then decrease in S. parvus between 1950 and 1980.
DI-TP fluctuates between 50 and 60 μg L−1 from 1750 to 1850, is relatively stable at 40 μg L−1 until the late 1950s when it rises to c. 65 μg L−1 before falling to c. 30 μg L−1 at the top of the core. The peak TP concentrations in the 1980s and subsequent decline are in close agreement with historical water quality data at this site (Fig. 4). λRP and the correlation between DI-TP and PCA axes are low for the whole core but larger, and the latter significant, for the post-1950 section. The lower part of the core shows most variation along axis 2 of the training set ordination although levels corresponding to the increase and subsequent decrease of DI-TP in the upper part of the core track axis 1. At Vængsø the total species turnover is low, with assemblages throughout the core characterised by small fragilarioid taxa and the planktonic forms Cyclostephanos dubius, A. formosa, and F. crotonensis, and a succession in Cyclotella taxa from C. meneghiniana to C. radiosa around 1900. DI-TP fluctuates between 100 and 130 μg L−1 before 1850, drops to 80–100 μg L−1 between 1850 and 1900 and gradually increases to c. 130 μg L−1 at the top of the core. λRP and the correlation between DI-TP and PCA axes are low for the whole core but higher and significant for the post-1950 levels. Most of the variability in the core trajectory is along training set axis 2. The basal diatom assemblages at Lough Augher are dominated by the benthic taxa Achnanthes minutissima, Cocconeis placentula and small fragilarioid forms. Starting around 1900 these are replaced by a succession of planktonic taxa including Aulacoseira ambigua, A. formosa and, after 1940, small Stephanodiscus taxa. λRP and the correlation between DI-TP and PCA axes are high and significant for the whole core and post-1950 levels, and the core trajectory tracks along training set axis 1, although the upper levels also show considerable variability along axis 2.

Fig. 4

Summary diagrams of the five study sites listed in Table 2, showing (from left) stratigraphic diagrams with relative abundance of selected taxa, DI-TP reconstructions, PCA axis 1 (solid line) and 2 (dashed line) scores, and core trajectories projected onto the training set RDA ordinations. Core tops and bottoms are identified by solid circles and squares respectively. Environmental variables are shown enlarged in Fig. 1. The dashed line from 1971 to the present on the Knud Sø DI-TP reconstruction represents TP concentrations from monitoring data

Table 4 Summary statistics for the core reconstructions, showing the variance in diatom data explained by DI-TP as a fraction of the total explainable variance (λRP), and the correlation between DI-TP and PCA axes 1/axis 2 of the diatom data, both calculated for the whole core and post-1950 levels

Discussion

The strength of the diatom-TP relationship and effect of confounding variables

Quantitative TP reconstruction using diatoms is underpinned by two fundamental assumptions (Juggins and Birks 2012). The first is that TP is, or is linearly related to, an ecologically important determinant of species distribution and abundance. That is, spatial and temporal variation in species (relative) abundance is systematically related to variation in TP concentration between sites or within a site through time. In most ecological systems species distribution is a complex function of multiple, often inter-correlated, environmental factors. The second assumption requires that the effect of these secondary environmental variables or gradients is either negligible, or that their joint distribution with TP is constant in space and time.

Constrained ordination of the MN and NW training sets clearly demonstrates that TP is associated with the main biological gradient in these datasets (Fig. 1). This is not surprising as the training sets were designed explicitly to capture biological variation along the TP gradient. We conclude therefore, that TP is, or is related to, an ecologically important variable in these data. However, results of the variance partitioning clearly demonstrate that the TP–diatom relationship is strongly confounded with other variables, notably alkalinity and depth and only a relatively small fraction (25–40 %) of the variance accounted for by TP can be uniquely attributed to TP. It is impossible to determine from the variance partitioning alone if the confounded, or shared, fraction is due to the direct ecological effect of TP, or if TP is acting as a surrogate for other, underlying, causal factors that are simply correlated to TP in the training set. The hypothesis that TP has a direct ecological effect and is not acting as a surrogate can, however, be tested by comparing optima from different regions: if TP is a direct ecological determinant we would expect taxa to show similar responses to TP in different regions. The results of this test indicate that for planktonic taxa there are indeed consistent patterns in the optima between most regions. However, there is either poor or no repeatability in the optima of benthic taxa between different training sets, even in the rank correlation test, which simply requires consistency in the order of optima, not their absolute values. There are two possible explanations for the observed lack of consistency in benthic species response to TP. The first is that TP, or its correlates, is not an ecologically important determinant for many taxa. This interpretation is supported by the GAM species response modelling that indicates less than half of benthic taxa in these training sets exhibit a significant response to TP (Table 3). 
The second is that even where there is an observed relationship, TP is acting as a surrogate for other variables and does not, itself, have a direct ecological effect. This interpretation is consistent with the observation that only c. 50 % of the benthic taxa that have a significant response to TP still exhibit a significant response once the effect of alkalinity has been removed.

All the training sets listed in Table 1 exhibit an apparently strong TP–diatom relationship (as quantified by the squared correlation between observed and diatom-inferred TP in the training set). The lack of spatial transferability of these models, together with the magnitude of the shared component of explained variance in the MN and NW training sets suggests that at least part of the apparently strong TP–diatom relationship is the result of surrogacy: the strength of the relationship is boosted by the correlation between TP and other co-varying causal factors. The relationship between TP and these co-varying factors varies spatially, leading to the lack of spatial consistency in model parameters.

TP transfer functions are predominantly based on all species, with the consequence that they include many taxa that are either not related to TP, or whose relationship with TP is confounded by other factors. The TP optima for the first group of taxa are meaningless and, because they are included in the model, will introduce spurious errors into the reconstruction. For the second group, assumption two, that the co-variation of TP with any confounding factors is invariant in time, is crucial. TP models are predominantly generated from lakes in impacted landscapes, and the relationships between TP and other chemical and limnological variables have changed profoundly over the recent past. Indeed, the comparison between training sets indicates that these relationships are spatially heterogeneous. It is unlikely they will be time invariant and assumption two is, therefore, almost always violated. The consequence is that changes in co-varying factors, independent of TP, will be manifest as spurious and misleading fluctuations in the TP reconstruction. Anderson (2000) described this effect in diatom-temperature reconstructions and the warning is equally relevant here.

Even in moderately deep and productive lakes benthic diatoms can form a substantial component of the total assemblages. In oligotrophic and mesotrophic lakes the littoral zone (the benthos) contributes most to whole lake production and, where whole-basin diatom accumulation rates have been calculated, this remains the case after eutrophication (Anderson 1989). Ecological controls on benthic diatoms in meso-eutrophic lakes are numerous and reflect habitat complexity, alternative and diverse nutrient sources, an intense and localized microbial loop and, perhaps most importantly, light (Stevenson et al. 1996; Bennion et al. 2010). Neither the environmental sampling associated with diatom training set development, nor the WA models themselves, reflect this ecological complexity. A greater percentage of planktonic diatoms are significantly associated with TP than benthic forms in the training sets (70 vs. 40 %; Table 3) and planktonic taxa generally show consistency in optima between regions (Fig. 3). Indeed, lakes where validation exercises have been undertaken and show good agreement between monitored and modelled data tend to be deep, stratifying lakes with a predominantly planktonic flora (Bennion et al. 1995; Rippey et al. 1997; Lotter 1998; see below for a discussion). Ecological arguments for the lack of benthic diatom response to TP have led some to suggest that benthic diatoms should be omitted from TP-reconstruction models (Siver 1999). This is seldom done as TP-models generally exhibit lower prediction errors when all taxa are included (Bradshaw and Anderson 2001; Philibert and Prairie 2002). Results here suggest that this decrease in prediction error is misleading and the result of boosting the apparent explanatory power of the TP gradient by co-varying factors.
Importantly, our results show that statistical significance in a constrained ordination and low apparent training set errors, even under internal cross-validation, are not sufficient to demonstrate a statistically or ecologically sound model or reconstruction.

Ecological species turnover

Eutrophication results in considerable change in the ecological/community structure of lakes, change that is clearly reflected in the relative abundance of diatoms, especially planktonic forms (Bradbury 1975). These “classic” temporal eutrophication sequences, for example, from Cyclotella → Aulacoseira → Asterionella/Fragilaria crotonensis to small Stephanodiscus species (Bradbury 1975; Anderson 1989), reflect the known ecology of the main planktonic diatom groups associated with lake trophic status (e.g. Willén 1991). These species changes, together with increasing percentage abundance of planktonic diatoms and diatom accumulation rates, provided the basis of interpreting sediment diatom records in terms of changed lake productivity up until the late 1980s (e.g. Anderson 1989). DI-TP reconstructions attempt to model species trends and community turnover (cf. Fig. 5a) that have been well known and interpreted subjectively by diatomists for decades and quantify them in terms of changing TP concentrations (reviewed in Hall and Smol 2010).

Fig. 5

a Diatom community turnover and rate-of-change (ROC) for core Lough Augher AA2 (see Anderson and Rippey 1994 for details); b DI-TP for core AA2 (see Fig. 4) and manure P-loading for the Blackwater catchment (data from Foy et al. 2003). c Geochemical P accumulation rate for core AA10 together with the whole basin geochemical accumulation rate (data from Rippey and Anderson 1994). The AA10 core was taken in 12-m water depth and was influenced by redox changes; the whole basin rate includes shallow water cores where P released from deep-water sediment can accumulate. However, the overall P-loss rate increases with anoxia and so the geochemical record does not faithfully reflect the loading history

Diatom abundance is a complex function of many interacting variables (light, alkalinity, inorganic carbon, silica, nitrogen, biotic factors as well as phosphorus) and while nutrients are undoubtedly important their significance can change over a range of timescales, e.g. seasonal to decadal (Bradshaw et al. 2002). Weighted averaging transfer functions simplify this ecological complexity and attempt to model assemblage change, which is the result of diverse ecological and biological drivers, as a function of a single variable, TP. The emphasis placed on TP, as opposed to biologically available P, is justified in many studies, despite its ecological naivety (Reynolds and Davies 2001), because it is readily interpretable in a management context. The core trajectories displayed in Fig. 4 illustrate the complex and varying diatom response to enrichment. Lough Augher and, to some extent, Winona Lake follow an Aulacoseira → Asterionella/Fragilaria crotonensis/Stephanodiscus sequence and their trajectories track the TP gradient in the training set ordination, albeit with some variation, especially at Lough Augher, along ordination axis 1. The DI-TP reconstructions at these sites are plausible. The overall trends and inflection points are consistent with the known history of enrichment and absolute values for the uppermost level are consistent with monitored data in terms of lake status class (e.g. hypereutrophic at Winona Lake and eutrophic at Lough Augher).
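The way a weighted averaging (WA) transfer function collapses an assemblage onto a single variable can be illustrated with a minimal sketch. It assumes the simplest form of WA (taxon optima as abundance-weighted means of log10 TP, and reconstructions as abundance-weighted means of those optima) and omits the deshrinking step and tolerance down-weighting used in practice. The training set, taxa and sample values are hypothetical and are not data from this study.

```python
import numpy as np

def wa_optima(Y, x):
    """WA optimum of each taxon: abundance-weighted mean of x over training lakes."""
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    return (Y * x[:, None]).sum(axis=0) / Y.sum(axis=0)

def wa_reconstruct(Y_fossil, optima):
    """Inferred x for each fossil sample: abundance-weighted mean of taxon optima."""
    Y_fossil = np.asarray(Y_fossil, dtype=float)
    return (Y_fossil * optima).sum(axis=1) / Y_fossil.sum(axis=1)

# Hypothetical training set: 4 lakes x 3 taxa (relative abundances), TP in ug/L.
Y_train = np.array([[0.8, 0.2, 0.0],
                    [0.5, 0.4, 0.1],
                    [0.1, 0.5, 0.4],
                    [0.0, 0.2, 0.8]])
tp_train = np.array([10.0, 30.0, 80.0, 150.0])

# Optima are estimated on log10(TP) and back-transformed after reconstruction.
optima = wa_optima(Y_train, np.log10(tp_train))

# Two hypothetical core samples: one dominated by taxon 1, one by taxon 3.
fossil = np.array([[0.7, 0.3, 0.0],
                   [0.0, 0.3, 0.7]])
di_tp = 10 ** wa_reconstruct(fossil, optima)
print(di_tp.round(1))
```

Every taxon receives an optimum and contributes to the inferred value in proportion to its abundance, whether or not TP actually controls its distribution; nothing in the calculation distinguishes a TP-driven change in abundance from one driven by another variable.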

This “classic” trophic diatom sequence is not observed at Lotus Lake, Knud Sø or Vængsø and at these sites the core trajectory primarily tracks along ordination axis 2. It is likely that TP is not the primary controlling variable at these sites and that the TP transfer functions and resulting reconstructions are confounded by other variables, including depth and alkalinity. At Vængsø, Anderson and Odgaard (1994) highlighted the fact that DI-TP values in the 1980s agreed well with measured values but the reconstruction failed to identify the most recent recovery. This poor fit was attributed to the strong depth gradient in the small Danish training set and dataset expansion to remove the depth gradient was suggested. However, a more fundamental problem with the Vængsø reconstruction is the basal (pre-1900) values of more than 100 μg TP L−1, largely driven by Cyclostephanos dubius, values which are environmentally unlikely (see below). Bradshaw and Anderson (2003) identified nitrogen and silica as possible important controlling variables for this planktonic diatom based on results of contemporary algal monitoring. This species is also common in high alkalinity lakes and has been abundant for a considerable length of time in lakes in Denmark and southern Sweden (Bradshaw and Anderson 2003). Thus, long-term diatom dynamics at this site are probably not driven primarily by phosphorus, invalidating the DI-TP reconstructions.

At Knud Sø, the DI-TP values at the base of the core are similarly high but here the presence of Aulacoseira subarctica may have led to erroneous values. This diatom is associated with low transparency waters, often due to high DOC concentration (Gibson et al. 2003); changing abundance of this species may therefore reflect changes in a secondary gradient (light, mixing) uncorrelated with lake TP. Anderson et al. (2012) invoked a similar explanation at White Lough where DI-TP was unchanged prior to 1960 (reflecting dominance of A. subarctica) despite unambiguous evidence for increased catchment P-loading in this period. Finally, at Lotus Lake the basal assemblages are dominated by small fragilarioid taxa that are responding to variables other than TP. These are replaced following increased P loading by Aulacoseira ambigua, but the reconstructed core-top values of 40–50 μg TP L−1 grossly underestimate the monitored values of c. 160 μg TP L−1. Aulacoseira ambigua is another planktonic diatom whose abundance may be influenced more strongly by light and mixing, especially in this shallow lake, than by nutrients per se. This polymorphic diatom is found across a range of trophic levels in NW Europe, Scandinavia and North America. Its “optimum” ranges from 12 to 170 μg TP L−1 in the training sets examined in this study, which suggests a more complex response to environmental change. Clearly, changes in diatom assemblages at Lotus Lake, Knud Sø and White Lough are being driven by factors other than TP, resulting in unreliable and misleading reconstructions.

Calibrating diatoms against a disturbance gradient: the dominance of SRP in aquatic systems post-1950

Diatom-inferred TP values have been compared with monitored data at a number of sites, often favourably (Bennion et al. 1995; Lotter 1998). At Knud Sø there is also good agreement between reconstructed values and monitored data from 1971 to 1990 (Fig. 4) yet the basal DI-TP values are clearly too high for early nineteenth century land management and associated P-inputs to lakes. A compelling explanation for this apparent contradiction is that the DI-TP model is calibrated along a disturbance gradient, in this case a high SRP environment, but the resultant model is applied historically to periods where the limnological conditions were different, i.e. a natural, low SRP world. In many European lakes today SRP is an important fraction of TP and can often reach ~100 μg P L−1, a concentration well above that at which it is limiting to algal growth (Reynolds 1984). Historically, however, both TP and SRP would have had much lower concentrations in aquatic ecosystems (Newman 1995; Reynolds and Davies 2001). Inputs of P to natural ecosystems by weathering are low (0.05–1.0 kg P ha−1 year−1) (Newman 1995) and the efficiency of P-cycling by terrestrial ecosystems means that natural P-loss to water is very low (Hobbie and Likens 1973). The DI-TP values at Vængsø and Knud Sø imply that inputs (and hence loss rates) were considerably greater than this. In the absence of industrially-derived fertilizers, the use of which increased substantially after 1950 (Smil 2000), the subsidies would be derived from animal manure. These have to be considerable, however, to offset the loss-rates by crop/animal export from the farm/catchment (Newman 1995). Bradshaw et al. (2005) reported DI-TP values >100 μg TP L−1 in the Medieval Period for a small eutrophic Danish lake (Dallund Sø), which even allowing for some internal loading are not sustainable from a catchment-lake mass balance perspective.
A more parsimonious explanation is that the DI-TP values, which at Dallund Sø and Vængsø are driven by Cyclostephanos dubius (TP “optimum” of 150 μg TP L−1), are erroneous. Factors other than TP were driving the changing abundance of this diatom in the past and the DI-TP model is erroneously reconstructing these changes as P-dynamics.

Biologically available P has increased throughout NW Europe and North America since 1950 with the advent of phosphorus use in detergents, increased output in sewage, agricultural intensification and the rise of soil-P saturation in western agro-systems (Carpenter et al. 1998; Smil 2000). Loss rates from catchments have increased correspondingly and the associated impact on lake productivity and ecological structure is profound (Smith et al. 2006). Support for this concept of calibrating diatoms against high SRP disturbance gradients is provided by monitoring data from Knud Sø where peak SRP concentration during winter (i.e. prior to biological uptake) was 60–90 μg P L−1 in the period 1973–1990 while in 1997 it was <5 μg P L−1 following implementation of catchment P management strategies (Jeppesen et al. 1999). The range of SRP in the Northern Irish lakes used by Anderson (1998) is ~5 to >300 μg L−1. The calibration of diatoms against an SRP-rich gradient therefore provides an explanation of why DI-TP models can have apparently good model performance statistics and can be validated against monitoring data but produce erroneously high TP values for periods prior to the onset of widespread P-utilisation and the increase of SRP in the natural environment (Smil 2000; Reynolds and Davies 2001). Support for the hypothesis that the apparent functionality of DI-TP models is a result of the post-1950 rise in SRP availability is provided by the ratio λR/λP and correlations between DI-TP and PCA axes 1 or 2 (Table 4). Both measures are generally higher (and for the correlations, significant) for the post-1950 sections than for the whole core, suggesting a stronger coupling between assemblage change and DI-TP for this period.
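The two diagnostics used in this comparison can be sketched as follows. The sketch assumes that λR is the proportion of variance in the core assemblages explained by DI-TP when it is used as a single constraining variable, and that λP is the proportion explained by the first PCA axis, so that the ratio cannot exceed 1. The square-root transformation, the function name and the example data are all assumptions made for illustration.

```python
import numpy as np

def recon_strength(Y_core, di_tp, sqrt_transform=True):
    """Return (lambda_R / lambda_P, r with PCA axis 1, r with PCA axis 2)."""
    Y = np.asarray(Y_core, dtype=float)
    if sqrt_transform:                      # assumed transformation of abundances
        Y = np.sqrt(Y)
    Yc = Y - Y.mean(axis=0)                 # centre each taxon
    x = np.asarray(di_tp, dtype=float)
    xc = x - x.mean()
    total = (Yc ** 2).sum()                 # total variance (sum of squares)

    # PCA via SVD: lambda_P is the share of variance on the first axis.
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    lam_P = s[0] ** 2 / total

    # Single-constraint RDA: regress every taxon on DI-TP, sum the fitted variance.
    fitted = np.outer(xc, xc @ Yc) / (xc @ xc)
    lam_R = (fitted ** 2).sum() / total

    scores = U[:, :2] * s[:2]               # sample scores on PCA axes 1 and 2
    r1 = np.corrcoef(x, scores[:, 0])[0, 1]
    r2 = np.corrcoef(x, scores[:, 1])[0, 1]
    return lam_R / lam_P, r1, r2

# Hypothetical core: 12 levels x 6 taxa, with unrelated DI-TP values.
rng = np.random.default_rng(0)
Y_core = rng.dirichlet(np.ones(6), size=12)
di_tp = rng.uniform(20, 120, size=12)
ratio, r1, r2 = recon_strength(Y_core, di_tp)
print(round(ratio, 2), round(r1, 2), round(r2, 2))
```

Applying the function separately to the whole core and to the post-1950 levels would give the kind of paired comparison described above. The sign of each correlation is arbitrary because PCA axes have no inherent direction.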

DI-TP and determining environmentally-realistic historic TP values: implications for lake management

Defining background nutrient conditions is seen as an extremely important contribution made by palaeolimnology to environmental management (Battarbee and Bennion 2012). Given the problems with geochemical P records (see Rippey and Anderson 1996 for a discussion and other references) considerable emphasis has been placed on the power of diatom-based models (e.g. Bennion and Battarbee 2007). Determining the correct historical epilimnetic TP value using diatom models is important in terms of setting restoration targets: erroneously high background levels will result in catchment management strategies/programmes/initiatives that are insufficient to bring about improvements in lake ecology. Redox-driven changes associated with eutrophication (due to increased organic load) generally reduce the relevance of geochemical phosphorus records although whole-basin accumulation rates can provide a better approximation of the external P-load (Fig. 5c; Rippey and Anderson 1996). The DI-TP and single core geochemical P records at Augher track each other reasonably well (Fig. 5) until the mid-1940s when they diverge, mainly due to the onset of severe anoxia (Anderson and Rippey 1994). The DI-TP record suggests sustained enrichment until the late-1970s when the creamery waste was diverted (Anderson 1989). Interestingly, the measures of diatom community change (Fig. 5a) all show increased temporal variability after ~1930, when the manure P-loading estimates (Foy et al. 2003) start to increase. The latter reconstruction of P-dynamics in the Blackwater catchment (which includes Augher) offers an independent measure of external P-loading; the diatom-based variables all tend to reflect this eutrophication trajectory, certainly in terms of timing and trend. Given the strong point source inputs from the creamery a better agreement is unlikely. The question remains, however, about the level of DI-TP values in the nineteenth century and their validity given low intensity agriculture.
The identification of the errors and inaccuracies associated with diatom-based TP transfer functions (above) requires an evaluation of both their use in environmental management and their comparison with alternative methods for determining background TP levels.

Independent validation of DI-TP records for pre-disturbance/background conditions is difficult because all other methods (export coefficient and dynamic export models) have their own set of problems or because monitoring data are temporally limited (Anderson 1995). May et al. (2012) derived a catchment P loading using diatom-inferred TP values for Loch Leven that was compared with a monitoring-derived loading for 1970–2010 and suggested there was good agreement. However, no comparative analysis was attempted (i.e., for the most recent period) and errors in the DI-TP values were not considered. While the diatom data pick up the main trends of the lake recovery at Loch Leven there is considerable difference in individual values for the most recent period, with reconstructed values differing by up to ±40 μg TP L−1 (see Figure 2 in Battarbee et al. 2012). A similar mismatch between monitored data and DI-TP values was also observed at Esthwaite Water where DI-TP and monitored data differed by 10–40 μg TP L−1 for the post-1980 period (Bennion et al. 2005), suggesting that some circumspection is needed when interpreting the DI-TP values at this site. Examples where monitoring data and DI-TP values are in better agreement have largely been limited to reconstructions from deep, less productive lakes (e.g. Bennion et al. 1995; Lotter 1998).
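A paired comparison of the kind that is often missing from such validation exercises could be as simple as the following sketch, which summarises the differences between DI-TP and monitored TP for matched years. The choice of summary statistics and the function name are assumptions, and the values are hypothetical rather than data from Loch Leven or Esthwaite Water.

```python
import numpy as np

def agreement(monitored, di_tp):
    """Summarise DI-TP minus monitored TP for paired samples (same units)."""
    m = np.asarray(monitored, dtype=float)
    d = np.asarray(di_tp, dtype=float)
    diff = d - m
    return {
        "bias": diff.mean(),                   # mean over- or under-estimate
        "rmse": np.sqrt((diff ** 2).mean()),   # typical size of the mismatch
        "max_abs": np.abs(diff).max(),         # worst single-sample mismatch
    }

# Hypothetical paired annual means (ug TP / L) for three matched years.
stats = agreement(monitored=[60.0, 50.0, 40.0], di_tp=[80.0, 45.0, 70.0])
print(stats)
```

A reconstruction can follow the monitored trend closely and still show a large bias or maximum difference, which is why agreement in trend alone is a weak test of the absolute values.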

DI-TP models: fit for purpose?

It is important to retain a perspective of the management relevance of diatom-inferred TP models in the light of the above discussion. Many DI-TP models will provide erroneous estimates when applied to diatom assemblages from sediments deposited prior to the twentieth century increase in SRP in catchment runoff (~1930 to 1950; Fig. 5) due to other factors influencing changing diatom abundance. However, while the absolute values have little credibility in an environmental context (e.g. the basal values at Knud Sø; Fig. 4), the trends at some lakes (e.g. Winona and Augher; Fig. 4) provide an ecologically and environmentally plausible summary of P-driven change at these sites. Given the constraints on geochemical P from single sediment cores from deep, stratifying lakes where redox conditions change with eutrophication, diatom records (either as DI-TP or qualitative biostratigraphies) undoubtedly provide some independent confirmation of eutrophication effects. It is clear that diatoms reflect P-driven eutrophication, and some of this signal is quantifiable (particularly after 1950; Fig. 4; Table 3), but there has to be some concern about the “background”, pre-disturbance values or reconstructed values of >100 μg TP L−1 for primitive agricultural systems. Clearly the confounding roles of alkalinity, inorganic carbon and the light/DOC gradients all come to the fore in controlling diatom abundance in natural, low SRP environments.

The problems of surrogacy and confounding variables are not confined to diatoms and phosphorus. Anderson (2000) and Velle et al. (2010) have argued that diatom- and chironomid-temperature transfer functions suffer the same limitations. Furthermore, Juggins (2013) demonstrates that these problems are more widespread and are the result of violating the fundamental assumptions of the approach. When the underlying assumptions are not met, these problems thus pervade any attempts at quantitative reconstruction from biological proxies. Experiments with simulated data show that WA-based reconstructions are particularly susceptible to the effects of changes in secondary variables that can lead to spurious reconstructions that are misleading in terms of both trends and absolute values (Juggins 2013).
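The sensitivity of WA to secondary variables can be illustrated with a small simulation. This is a minimal sketch and not the experimental design of Juggins (2013): it assumes hypothetical taxa with Gaussian responses to TP and to one secondary variable Z (both scaled 0–1), a training set in which TP and Z are perfectly correlated, and a "core" in which TP is held constant while Z drifts. No deshrinking is applied.

```python
import numpy as np

# Hypothetical taxa: true optima on a 5 x 5 grid over TP and Z.
u, v = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
tp_opt, z_opt = u.ravel(), v.ravel()
TOL = 0.25  # common tolerance on both gradients (assumed)

def assemblage(tp, z):
    """Relative abundances of all 25 taxa at each (tp, z) pair."""
    d2 = (tp[:, None] - tp_opt) ** 2 + (z[:, None] - z_opt) ** 2
    a = np.exp(-d2 / (2 * TOL ** 2))
    return a / a.sum(axis=1, keepdims=True)

# Training set: Z co-varies perfectly with TP across lakes.
tp_train = np.linspace(0, 1, 41)
Y_train = assemblage(tp_train, tp_train)
wa_opt = (Y_train * tp_train[:, None]).sum(axis=0) / Y_train.sum(axis=0)

# Core: true TP is constant at 0.5 while Z changes from 0.2 to 0.8.
z_core = np.linspace(0.2, 0.8, 13)
tp_core = np.full_like(z_core, 0.5)
Y_core = assemblage(tp_core, z_core)
di_tp = (Y_core * wa_opt).sum(axis=1) / Y_core.sum(axis=1)

print(di_tp.round(3))  # inferred TP for each core level; true TP is 0.5 throughout
```

Because the WA optima absorb the training-set correlation between TP and Z, the inferred TP rises steadily through the core even though true TP never changes, giving a reconstruction that is wrong in both trend and absolute values.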

DI-TP models are clearly problematic at some sites. We suggest that the simple steps we have outlined in this paper can be used to determine the utility of each training set and subsequent reconstructions. These steps include (1) RDA and partial RDAs of the training set to assess the strength and independence of TP as an explanatory variable and to determine the total and unique effects of TP and other highly correlated variables, (2) projecting each sediment core onto an RDA of the training set to determine if changes in the sedimentary diatom community composition correlate with the training-set TP gradient, and (3) assessing the strength of the TP signal in the sediment core (expressed as DI-TP) using λR/λP and the correlation between DI-TP and PCA axes 1 and 2. The diatom stratigraphy should also be examined, especially in light of the results of the GAM modelling presented here and what is known about the lack of a predictable response of many benthic species to TP. If the TP reconstruction appears plausible after these steps, one should finally apply “expert opinion” to determine if the reconstruction is ecologically credible.
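Step (1) can be sketched as a simple variance partitioning. The sketch assumes that the total effect of TP is the proportion of assemblage variance explained by an RDA with TP as the only constraint, and that the unique effect is the additional proportion explained by TP once the co-varying variables have been fitted (equivalent to a partial RDA with those variables as covariables). It uses unadjusted proportions and plain least squares rather than a dedicated ordination package; the function names and the simulated training set are hypothetical.

```python
import numpy as np

def explained(Y, X):
    """Proportion of variance in Y explained by a linear fit on X (as in RDA)."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ coef
    return (fitted ** 2).sum() / (Yc ** 2).sum()

def tp_effects(Y, tp, covars):
    """Return (total effect of TP, unique effect of TP given covars)."""
    tp = np.asarray(tp, dtype=float).reshape(-1, 1)
    covars = np.asarray(covars, dtype=float)
    if covars.ndim == 1:
        covars = covars[:, None]
    total = explained(Y, tp)
    unique = explained(Y, np.hstack([tp, covars])) - explained(Y, covars)
    return total, unique

# Hypothetical training set: taxa respond only to Z, but TP is correlated with Z.
rng = np.random.default_rng(1)
z = rng.normal(size=60)
tp = z + 0.3 * rng.normal(size=60)
Y = np.outer(z, [1.0, -0.5, 0.8, 0.3]) + 0.1 * rng.normal(size=(60, 4))

total, unique = tp_effects(Y, tp, z)
print(round(total, 3), round(unique, 3))
```

In this constructed case TP explains a large share of the variance on its own yet almost nothing once Z is accounted for, which is the pattern that should prompt caution before a training set is used for reconstruction.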