Introduction

Small, shallow lakes are the most common freshwater ecosystems in the world (Wetzel 2001; Downing et al. 2006). Understanding the ecology of these systems has important implications for lake management and the preservation of biodiversity (Oertli et al. 2002; Scheffer et al. 2006). Not surprisingly, there has been a recent surge of interest in paleolimnological reconstructions of shallow lakes (e.g. Brenner et al. 2006; Denys 2006). As researchers have come to appreciate the central role that macrophytes play in structuring shallow lake ecosystems (Carpenter and Lodge 1986; Jeppesen et al. 1998; Tan and Özesmi 2006) and have become interested in exploring the theory of alternative stable states (Scheffer 1989, 1998; Scheffer et al. 1993), studies on macrophyte ecology have increased dramatically. For example, an ISI Web of Science search for the keyword “macrophyte” found that in the four top journals accounting for ∼25% of macrophyte studies between 1967 and 2007 (Hydrobiologia, Aquatic Botany, Freshwater Biology, and Archiv für Hydrobiologie), there has been a sudden increase in macrophyte studies from an average of ∼2% of all publications per year (between 1967 and 1990), to an average of ∼10% of all publications (between 1991 and 2007).

Despite the interest in macrophyte ecology, relatively little is known regarding the long-term dynamics of whole-lake macrophyte biomass. The few studies that have attempted to reconstruct past macrophyte cover, biomass, or community over longer time scales (decades to centuries) have been primarily qualitative and have employed plant macrofossil and/or pollen analyses (e.g. Sayer et al. 1999; Birks 2000; Sand-Jensen et al. 2000; Odgaard and Rasmussen 2001; Egertson et al. 2004; Davidson et al. 2005; McGowan et al. 2005; Rasmussen and Anderson 2005; Väliranta 2006; Zhao et al. 2006). Although plant macrofossil and pollen analyses are excellent tools to provide an indication of what macrophyte taxa were present in the lake, one cannot obtain quantitative estimates for either macrophyte biomass or cover using these techniques. This is because macrofossils are distributed unevenly in lake sediments and there is differential preservation of macrofossils between macrophyte taxa (Zhao et al. 2006), making it difficult to quantitatively interpret macrofossils as plant cover. Pollen analysis also presents its own problems for obtaining quantitative estimates of macrophyte cover. The aquatic pollen assemblage underestimates macrophyte diversity and abundance because the majority of aquatic plants often reproduce vegetatively (Zhao et al. 2006). For these reasons, an approach based on the relative abundance of diatom taxa is being evaluated in this study for obtaining quantitative estimates of macrophyte cover.

There is good evidence that the presence of macrophytes influences diatom assemblages, and diatoms have been used to qualitatively reconstruct changes in macrophyte cover within a lake (Moss 1978). Recently, there have been a few attempts to define the relationship between diatoms and macrophytes quantitatively. For example, Reavie and Smol (1997) showed that in the St. Lawrence River diatom assemblages collected from macrophytes are distinguishable from epilithic and epiphytic diatom assemblages on filamentous algae. These data were then used in a logistic regression model to semi-quantitatively infer changes in macrophyte habitat over the last century in two fluvial lakes (Reavie et al. 1998). Similarly, Dixit et al. (1999) demonstrated that macrophyte density was a significant explanatory variable of diatom community composition in a 257-lake dataset for the northeastern USA. In fact, they found that macrophyte density explained differences in diatom community composition that were independent of lake depth, total phosphorus, and other limnological variables. This suggests that it should be possible to construct a quantitative macrophyte biomass model based on diatom assemblages.

To test whether diatom assemblages preserved in profundal surface sediments can distinguish between lakes with extensive, moderate, or sparse macrophyte cover, we analyzed a dataset assembled as part of the U.S. Environmental Protection Agency’s Environmental Monitoring and Assessment Program-Surface Waters (EMAP-SW; Larsen et al. 1991; Dixit et al. 1999). This dataset also included diatom counts from the top and bottom 1 cm of sediment cores from 136 lakes. This “top–bottom” approach is well suited for regional studies, as it allows for a relatively rapid comparison between recent and pre-impact (top and bottom 1 cm of sediment cores respectively) limnological conditions for many lakes (Smol 2002). Thus, we used these diatom data to infer changes in macrophyte cover for lakes across the northeastern United States.

The EMAP-SW dataset

The complete EMAP-SW dataset consists of a suite of environmental and physical data for 370 lakes and reservoirs in the northeastern USA (Maine, New Hampshire, Vermont, Massachusetts, Connecticut, Rhode Island, New York, and New Jersey). These lakes were sampled through a stratified random sampling design from a dataset of all lakes in the region that were at least 1 m deep and had a surface area between 1 and 10,000 ha (Larsen et al. 1991). As a result, the lakes sampled represent a subset of all northeast lakes (as defined above) with a known statistical uncertainty (Larsen et al. 1991; Dixit et al. 1999). For a complete description of lake selection methods used in EMAP-SW, see Larsen et al. (1991). Sediment cores for diatom analysis were recovered from the deep, central portion of 238 of the lakes and reservoirs (159 lakes and 79 reservoirs) using a Glew (1989) gravity corer (Dixit et al. 1999). For a full description of coring techniques, sediment dating, and diatom analysis, see Dixit et al. (1999). To determine if the sediment cores extended to pre-industrial conditions (pre-1850), ²¹⁰Pb activity and pollen analyses (the ratio of ragweed and grass pollen to other pollen types) were conducted on the bottom sediments of the cores (Dixit et al. 1999). Diatoms were analyzed from the top 1 cm and bottom 1 cm of the sediment cores following standard taxonomic procedures (Dixit et al. 1999).

Macrophyte cover was also estimated for 239 lakes following a standardized, semi-quantitative procedure (Baker et al. 1997). At each lake, the habitat of the lake margin was examined at 10 evenly spaced, predetermined stations 10 m from the shoreline. Macrophyte cover was visually examined in an area 15 m wide by 10 m out from the shoreline, and each lake was ranked as having sparse (<10% cover), moderate (10–40% cover), or extensive (>40% cover) macrophyte cover. This method of quantifying macrophyte cover provides only a coarse estimate but offers the advantage of being a relatively quick and inexpensive technique for generating estimates from a large number of lakes.

Methods

We screened the EMAP-SW dataset to include only those lakes that had both estimates of macrophyte cover and diatom assemblage data. Of the 370 lakes in the EMAP-SW dataset, 215 fit these criteria and were retained for further analysis. Across these 215 lakes, a total of 468 diatom taxa were identified in the surface sediment samples (Dixit et al. 1999). Of the 468 diatom taxa, 198 were considered to be common using the criteria of Dixit et al. (1999), which include only those taxa that are found in at least 10 samples and reach a relative abundance of at least 1% in one lake. All analyses were carried out on two datasets: the full complement of diatom taxa and the common diatom taxa. We found no appreciable differences between the two datasets, however, and thus present only the results of the analysis on the common diatom taxa.
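The screening rule for common taxa can be sketched as follows. This is a minimal illustration and not the code used by Dixit et al. (1999): the function name `common_taxa`, the data layout (a dictionary mapping each taxon to a list of percent relative abundances, one value per lake), and the reading of "found in" as a non-zero abundance are all assumptions.

```python
def common_taxa(abundance, min_samples=10, min_peak=1.0):
    """Return the taxa that occur in at least `min_samples` samples and
    reach at least `min_peak` percent relative abundance in one sample.

    abundance: dict of taxon name -> list of percent relative abundances,
               one value per lake (hypothetical layout).
    """
    kept = []
    for taxon, values in abundance.items():
        # A taxon counts as "found" in a sample when its abundance is non-zero
        occurrences = sum(1 for v in values if v > 0)
        if occurrences >= min_samples and max(values) >= min_peak:
            kept.append(taxon)
    return sorted(kept)
```

Both conditions must hold, so a taxon that is widespread but never exceeds 1% is dropped, as is a taxon that is locally abundant but present in fewer than 10 samples.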

Analysis of similarity (ANOSIM) based on a Bray-Curtis similarity index was performed using the computer program PRIMER (version 5) to test whether the surface sediment diatom assemblages differ significantly between the a priori groups of macrophyte cover classified as sparse (group 1), moderate (group 2), or extensive (group 3). ANOSIM provides an R statistic that is a measure of the average rank similarities arising from all pairs of samples between different groups (Clarke and Warwick 2001). The R statistic ranges from −1 to 1, with R = 1 when all samples within a group are more similar to each other than to any samples from any other group, and R approaching zero when similarities between and within groups are on average the same. Significance of the R statistic is established using a permutation test (999 permutations, P < 0.05).
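The logic of the test can be illustrated with a short sketch. This is not the PRIMER implementation; it is a minimal version under stated assumptions: dissimilarities (rather than similarities) are ranked, which gives the same R, ties receive their mean rank, and all function names are hypothetical.

```python
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def anosim_r(ranks, pairs, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of sample pairs."""
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_w = sum(within) / len(within)
    mean_b = sum(between) / len(between)
    return (mean_b - mean_w) / (len(pairs) / 2)


def anosim(samples, groups, n_perm=999, seed=0):
    """Return (R, permutation P value) for samples with group labels."""
    pairs = list(combinations(range(len(samples)), 2))
    dissim = [bray_curtis(samples[i], samples[j]) for i, j in pairs]
    ranks = average_ranks(dissim)
    r_obs = anosim_r(ranks, pairs, groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # reassign group labels at random
        if anosim_r(ranks, pairs, labels) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)
```

The P value is the proportion of label permutations that yield an R at least as large as the observed one, so with very few samples even a perfect separation (R = 1) may not reach P < 0.05.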

Given that lakes in the moderate cover class (10–40% cover) are likely difficult to distinguish from those in the sparse (<10% cover) or extensive (>40% cover) classes, only lakes grouped as sparse or extensive macrophyte cover were included in model development (n = 145). These lakes are primarily shallow (median depth 2.6 m) and small (median surface area 30.1 ha; Fig. 1). The distributions of the lake morphometric parameters used in selecting the EMAP-SW lakes (i.e., mean depth and lake surface area) are not significantly different from those of the original data set (two-tailed Kolmogorov–Smirnov test: D = 0.082, P = 0.51; D = 0.080, P = 0.55 for mean depth and lake surface area respectively). Using CANOCO (version 4.5; ter Braak and Šmilauer 2002), a correspondence analysis (CA) was used to explore the relationship among the lakes based on their diatom assemblages. For the CA, diatom taxa were square root transformed to reduce the influence of dominant taxa, and rare species were downweighted.
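For readers unfamiliar with the D values reported for the Kolmogorov–Smirnov test, the statistic is the largest vertical distance between the empirical cumulative distributions of the two samples. The sketch below computes D only; the P value is not derived here, and the function name `ks_statistic` is hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: the largest vertical distance
    between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both distributions past every observation equal to x
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

A small D, such as the values near 0.08 reported above, indicates that the subset and the full dataset have closely matching distributions of the variable in question.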

Fig. 1 Distribution of lakes in the sparse/extensive macrophyte data set (n = 145) along gradients of (A) mean depth (m), (B) surface area (km²), (C) turbidity (NTU), and (D) total phosphorus (μg/l)

Logistic regressions are well suited for constructing predictive models based on binary data (Osborne and Tigar 1992) and have been used successfully in paleolimnological studies (Reavie and Smol 1997). Due to the binary nature of the sparse/extensive lake cover classification, we employed a logistic modeling approach similar to that used by Reavie and Smol (1997) in their presence/absence modeling of habitat type for the St. Lawrence River. Logistic regression models are similar to linear regression models but the logistic regression model is constrained to a value between zero and one (Osborne and Tigar 1992; Reavie and Smol 1997) and takes the form

$$ y = \frac{\mathrm{e}^{a + bx}}{1 + \mathrm{e}^{a + bx}} \qquad (1) $$

where y is the predicted response value and a + bx is the linear predictor, in which a is the intercept, b is the regression coefficient, and x is the independent variable. Using a random number table, two thirds of the sites (97 lakes) were selected to calibrate the macrophyte sparse/extensive model and 48 sites were set aside for an independent cross validation of the model. Sample CA axis scores that were found to be significant predictors of macrophyte cover based on an ANOVA (P < 0.05) were selected as independent variables in a logistic regression model to classify macrophyte cover for each sample as either sparse (0) or extensive (1). The performance of the logistic model was evaluated on the 48 sites set aside, using the classification accuracy, the correlation coefficient, and the significance (P < 0.05) of the relationship.
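Equation 1 and the calibration/validation split can be sketched as follows. This is a minimal illustration rather than the software used in the study: the coefficients are fitted here by Newton-Raphson maximum likelihood, a seeded pseudo-random shuffle stands in for the random number table, and all function names are hypothetical.

```python
import math
import random


def logistic(a, b, x):
    """Eq. 1: y = e^(a + b x) / (1 + e^(a + b x)), arranged to avoid overflow."""
    z = a + b * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, n_iter=25):
    """Estimate the intercept a and coefficient b by Newton-Raphson
    maximum likelihood for a single predictor and a 0/1 response."""
    a = b = 0.0
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = logistic(a, b, x)
            w = p * (1.0 - p)
            g_a += y - p          # gradient with respect to a
            g_b += (y - p) * x    # gradient with respect to b
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        if det == 0.0:
            break  # singular system; stop rather than divide by zero
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def split_sites(site_ids, n_calibration, seed=0):
    """Randomly assign sites to calibration and validation sets."""
    rng = random.Random(seed)
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_calibration], shuffled[n_calibration:]
```

With 145 sites, `split_sites(range(145), 97)` returns 97 calibration sites and 48 validation sites; `fit_logistic` would then be applied to the calibration CA axis scores and their 0/1 cover classes.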

To infer past macrophyte cover for the EMAP-SW lakes, the diatom assemblages from the bottom 1 cm of the cores were plotted passively within the CA conducted on the surface sediment diatom assemblages. This provided CA axis scores for the bottom sediment samples without their diatom assemblages influencing the distribution of the sites in the ordination space. The logistic regression model was then applied to the CA axis scores for the bottom 1 cm samples to obtain the probability of past macrophyte cover being sparse or extensive. Change in macrophyte cover between past and present was calculated by subtracting inferred past macrophyte cover from inferred present macrophyte cover.
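The top–bottom comparison can be sketched as below. Several details are assumptions made for illustration: the change is computed on the 0/1 classes rather than on the probabilities, a probability of exactly 0.5 is assigned to the sparse class, the function names are hypothetical, and the coefficients used in any call are placeholders rather than the fitted values from this study.

```python
import math


def extensive_probability(a, b, score):
    """Probability of extensive cover from a CA axis score (Eq. 1)."""
    z = a + b * score
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(probability, cutoff=0.5):
    """1 = extensive, 0 = sparse."""
    return 1 if probability > cutoff else 0


def inferred_change(a, b, top_score, bottom_score):
    """Inferred present class minus inferred past class:
    +1 = increase, 0 = no change, -1 = decrease in macrophyte cover."""
    present = classify(extensive_probability(a, b, top_score))
    past = classify(extensive_probability(a, b, bottom_score))
    return present - past
```

For example, with placeholder coefficients a = 0 and b = 2, a core whose top sample scores 0.8 and whose bottom sample scores −0.6 on the CA axis would be inferred to have shifted from sparse to extensive cover.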

Results and discussion

Diatoms as indicators of macrophyte cover

Analysis of similarity on the three a priori groups of macrophyte cover (sparse, moderate, and extensive) indicated that the diatom assemblages within a group are significantly more similar to each other than they are to the diatom assemblages of other groups (R > 0, P < 0.05 for all comparisons; Table 1). This suggests that diatom assemblages from profundal sediment cores could be used to quantitatively reconstruct macrophyte cover within a lake. Although all three groups of macrophyte cover differed significantly in their respective diatom assemblages, the differences between the diatom assemblages from the moderate macrophyte cover group and those from either the sparse or extensive cover groups are relatively small (Table 1). This result, however, is not surprising considering how the estimates of macrophyte cover were collected. For example, it is no doubt difficult to distinguish between 8 and 12% cover using the EMAP method, and yet this boundary separates the sparse and moderate cover lakes. For this reason, we have included only lakes classified as having either sparse or extensive macrophyte cover in our analyses below, to contrast the largest changes in lake margin habitat (n = 145).

Table 1 ANOSIM results for sites with sparse (group 1), moderate (group 2), and extensive (group 3) macrophyte cover

A CA was used to collapse the large amount of species composition data for each site into sample scores that can be more easily applied in logistic regression modeling (Reavie and Smol 1997). Sites that were set aside for model cross validation and samples from the bottom of the sediment cores were plotted passively in the CA. The mean CA axis 2 scores for the sparse and extensive sites are significantly different from one another (t-test, P < 0.0001), with most of the extensive cover sites (84%) having positive axis 2 scores (Fig. 2). Although it is difficult with this binary dataset to speculate on which characteristics of macrophyte beds favor certain diatom taxa, the known autecology of the diatoms is consistent with the differentiation of taxa separated along CA axis 2. For example, we found that diatom taxa identified by Reavie and Smol (1997) as associated with macrophytes had positive CA axis 2 scores. These taxa included Cocconeis placentula, Gomphonema gracile, and Navicula capitata, which have also been reported to grow on macrophytes in other studies (e.g. van Dam and Mertens 1993; Sayer et al. 1999; see Appendix 1 for a complete list of diatom taxa).

Fig. 2 Correspondence analysis of the diatom assemblages (sample scores) for extensive (squares) and sparse (circles) macrophyte cover sites from the northeastern United States

Binary logistic model

The CA axis 2 sample scores were used as the independent variable in a logistic regression model to predict the macrophyte cover in a lake as either sparse or extensive (Fig. 3). When this model was applied to the 48 sites that were set aside for an independent cross validation, it correctly assigned sites as either sparse or extensive macrophyte cover 79% of the time (r² = 0.32, P < 0.0001, RMSEP = 0.19), indicating that this model is robust in inferring macrophyte cover as either sparse or extensive for lakes in the northeastern USA. Although we recognize that this model is an oversimplification of the relationship between diatom taxa and macrophytes, it does predict large changes in macrophyte cover with a high degree of accuracy and thus can be used to provide the first estimates of pronounced shifts in macrophyte cover for lakes in the northeastern USA.
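The three validation summaries can be computed as in the sketch below. The exact definitions used in the study are not stated, so the following are assumptions: accuracy is the share of sites whose class at the 0.5 cutoff matches the observed class, r² is the squared Pearson correlation between model scores and observed 0/1 classes, and RMSEP is the root mean squared difference between the two. The function name is hypothetical.

```python
import math


def validation_summary(predicted, observed, cutoff=0.5):
    """Return (accuracy, r_squared, rmsep) for model scores in [0, 1]
    compared with observed classes coded 0 (sparse) or 1 (extensive).
    Assumes both inputs vary, otherwise r_squared is undefined."""
    n = len(observed)
    classes = [1 if p > cutoff else 0 for p in predicted]
    accuracy = sum(c == o for c, o in zip(classes, observed)) / n
    rmsep = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r_squared = cov * cov / (var_p * var_o)
    return accuracy, r_squared, rmsep
```

Applied to the 48 held-out sites, a function of this kind would return the classification accuracy alongside the two continuous measures of fit.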

Fig. 3 Logistic regression model used to infer macrophyte cover (r² = 0.32, P < 0.0001, RMSEP = 0.19). Closed circles are the actual model score of the sample and open circles are the classification of the site as either sparse (0) or extensive (1) macrophyte cover based on the model score cutoff of 0.5 (i.e., >0.5 = extensive; <0.5 = sparse)

Our model forms part of a growing body of literature demonstrating that macrophyte densities can be inferred semi-quantitatively from fossil assemblages. For example, Reavie et al. (1998) used diatoms to reconstruct changes in the littoral habitat of two fluvial lakes along the St. Lawrence River, Quebec, Canada. Analysis of cladoceran and chironomid remains also holds potential. Ogden (2000) found that the proportion of Chydoridae to total cladocerans in surface sediments was positively related to macrophyte cover in Australian floodplain lakes. Ogden (2000) then applied this relationship to reconstruct changes in macrophyte cover in lakes over a gradient of agricultural land-use intensities. Davidson (2006) also used cladoceran remains to semi-quantitatively reconstruct planktivorous fish and macrophyte abundance in two shallow, eutrophic lakes in England. Similarly, in a study of 25 Danish lakes, chironomid remains were shown to differ significantly (P < 0.001) among lakes of varying macrophyte classes (Brodersen et al. 2001). Finally, employing a relatively new paleolimnological indicator, Odgaard and Rasmussen (2001) demonstrated that the imprints of cell patterns on leech (Piscicola geometra) egg-cocoons could also be used as an indicator of the presence of macrophytes.

Inferred changes in macrophyte cover

We applied the binary logistic model to the diatom assemblages from the bottom of the cores to infer past macrophyte cover in 136 lakes. Of the 136 sites with bottom samples, 88 were classified as having a bottom age of pre-1850 and 49 were classified as having a bottom age of post-1850 (Dixit et al. 1999). The division between the pre- and post-1850 sites is largely a division between natural lakes and reservoirs because 95% of the natural lakes used in this dataset had bottom samples of pre-1850 age and 81% of the bottom samples from reservoirs were post-1850. Because the sediment cores recovered from natural lakes and reservoirs represent different time periods, we have treated them separately in our comparisons with modern data.

When inferred top and bottom macrophyte cover were compared, the majority of lakes showed no change in inferred macrophyte cover (i.e., 84% of natural lakes showed no change and 83% of reservoirs showed no change). Given that most of the natural lakes and reservoirs (>90% and ∼88% respectively) are currently oligotrophic or mesotrophic (based on the OECD boundaries; OECD 1982), we were not surprised by the relatively small percentage of sites showing a change in macrophyte cover. When we compared the inferred changes in macrophyte cover to the modern total phosphorus (TP) values for the same lakes, we found no significant relationship (P = 0.63) between modern TP and inferred changes in macrophyte cover (Fig. 4). Many other factors such as nitrogen, inorganic carbon, trophic interactions, mechanical disturbance, and lake morphometry, however, have also been shown to be important in predicting how macrophyte cover will respond to TP levels within a lake (Scheffer 1998). Such a lake-dependent response in macrophyte cover to nutrient load has also been observed in several eutrophic and hypereutrophic systems that have lost macrophyte cover and shifted to a turbid-water state (Scheffer 1998). The interaction between these other factors and TP levels has made it difficult to define a critical TP concentration where a shift to the turbid-water state will occur (Scheffer 1998).

Fig. 4 Comparison between inferred change in macrophyte cover (inferred modern macrophyte cover minus inferred past macrophyte cover) and modern total phosphorus (TP) values (μg/l) for natural lakes (squares) and reservoirs (circles). Modern TP values have been ln (x + 1) transformed. An increase in macrophyte cover indicates that the inferred modern cover of macrophytes is greater than it was in the past whereas a decrease means that macrophytes were more abundant in the past

Extrapolating results to all lakes in the northeastern USA

Although long-term quantitative data on macrophyte cover are lacking for lakes in eastern North America, lake users often report an increase in macrophyte cover in their largely oligo/mesotrophic lakes (e.g. RAPPEL 2004). In these nutrient-limited systems there tends to be a weak, although significant, positive correlation between TP and macrophyte cover (Bachmann et al. 2002). In the EMAP-SW dataset the mean TP of sites with extensive macrophyte cover is significantly greater than that of sites with sparse macrophyte cover (P = 0.001). An increase in macrophyte cover with increasing TP is in contrast to what has been observed in the nutrient-rich shallow lakes of Europe, where many have lost their macrophyte cover as a result of eutrophication (Scheffer 1998).

One of the advantages of the EMAP-SW dataset is that the lakes were selected using a randomized sampling design. As a result, studies on the dataset can be scaled up to represent all lakes in the sampling area (Larsen et al. 1991). In this study we have analyzed a subset of the entire EMAP-SW dataset (i.e., those sites with both diatom and macrophyte cover data). It is still possible to scale up our results, however, as the distributions of lake morphometric parameters used in selecting the EMAP-SW lakes (i.e., mean depth and lake surface area) are not significantly different from those of the original data set. Thus, the 83 natural lakes and 53 reservoirs used in this study represent ∼2,576 natural lakes and ∼3,189 reservoirs in the northeastern USA. Based on our results, we estimate that ∼355 natural lakes have increased in macrophyte cover while ∼100 have decreased across the northeastern USA. Reservoirs show the opposite trend, with ∼276 reservoirs declining in macrophyte cover and only ∼56 reservoirs increasing in macrophyte cover. These estimates suggest that, in the sites where macrophyte cover has changed dramatically, most natural lakes have had an increase in macrophyte cover while most reservoirs have had a decline in cover. This difference in the pattern of inferred macrophyte changes between natural lakes and reservoirs may be caused by the difference in the time covered by the sediment cores analyzed. For the natural lakes, 95% of the cores date back to pre-1850, and thus may represent a period before large-scale human impact on lakes and their catchments. Reservoirs, on the other hand, are by definition man-made, and the base of these sediment cores may represent a period of higher human impact on the system than present. This is plausible for the northeastern USA as agricultural activity peaked in the region around 1870 (Waisanen and Bliss 2002). To properly test this hypothesis, however, more detailed down-core analyses at these sites are required.
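The scaling step can be illustrated with a short sketch. It assumes that each sampled lake carries an expansion weight, that is, the number of lakes in the target population that it represents under the sampling design; the weights shown in any call are placeholders and not EMAP-SW values, and the function name is hypothetical.

```python
def estimated_total(weights, flags):
    """Estimate how many lakes in the target population share a condition
    by summing the expansion weights of the sampled lakes that show it.

    weights: lakes represented by each sampled lake (hypothetical values)
    flags:   True where the sampled lake shows the condition, for example
             an inferred increase in macrophyte cover
    """
    return sum(w for w, f in zip(weights, flags) if f)
```

Summing the weights of all sampled lakes, regardless of condition, gives the size of the represented population, which is how a few dozen sampled lakes can stand for several thousand lakes in the region.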

Conclusions

This study has shown that sedimentary diatom assemblages are reliable indicators for distinguishing between sparse and extensive macrophyte cover. A logistic regression model based on diatom assemblages was able to correctly classify the macrophyte cover of an independent dataset of lakes 79% of the time. This has led to the first semi-quantitative estimates of changes in macrophyte cover across the northeastern USA. For the sites with an inferred change in macrophyte cover, the majority of natural lakes had an increase in macrophyte cover, while the majority of reservoirs had a decline in cover. Understanding how macrophyte cover has changed in a lake through time has important implications for lake management and the preservation of biodiversity. We believe that with continuous estimates of macrophyte cover or biomass density, such models could be further refined to increase the resolution of our inferences and allow us to assess more subtle changes in macrophyte communities.