Introduction

Diatoms (Class: Bacillariophyceae) are unicellular, eukaryotic organisms that are ubiquitous across the aquatic continuum (Smol and Stoermer 2010), occupying diverse aquatic habitats that span broad environmental gradients (e.g., pH, nutrient levels, metal(loid) concentrations, ionic strength, lake depth; Round et al. 1990). These siliceous algae, with at least 30,000 extant species (Mann and Vanormelingen 2013), form the base of most freshwater and marine food webs and play a key role in aquatic biogeochemical processes (Smol and Stoermer 2010). Because diatoms respond rapidly and predictably to changes in the chemical and physical conditions of their surroundings, have fast reproduction rates, and have siliceous frustules which readily preserve in most sediments (Julius and Theriot 2010), they are ideal bioindicators of contemporary and past environmental conditions across timescales spanning seasons to millennia (Smol and Stoermer 2010).

Diatoms have been extensively used as bioindicators in paleolimnological studies, tracking long-term limnological responses to anthropogenic stressors including acidification, eutrophication, climate impacts, industrial contamination, and salinization (Smol 2008; Smol and Stoermer 2010). For example, during the acid rain debates of the 1970s to 1990s, diatom-based approaches provided unequivocal qualitative (i.e., species changes) and quantitative (i.e., reconstructions of lakewater pH) evidence that many low-alkalinity lakes in northeastern North America and Europe acidified because of anthropogenic activities (Battarbee 1990; Cumming et al. 1992; Dixit et al. 1992).

Diatom autecological information is commonly derived from surface-sediment calibration sets (also known as training sets) that relate the modern distribution of taxa to current environmental measurements (Birks et al. 1990; Fritz et al. 1993; Wilson et al. 1994; Hall and Smol 1996; Ramstack et al. 2003; Chen et al. 2008; Juggins and Birks 2012). This approach provides estimates of taxon optima and tolerances, which can then be used to infer past limnological conditions related to shifts in the composition of downcore sedimentary diatom assemblages. Generally, a minimum of 40 to 70 sample sites are used in calibration datasets to encompass the gradient of limnological characteristics observed within a study area (Reavie and Juggins 2011). Localized calibration datasets can also be merged to create larger calibration sets, capturing broader environmental conditions that can yield a more accurate estimate of a taxon’s ecological range (Juggins and Birks 2012). For example, to assess eutrophication trends in a lake in the province of British Columbia (Canada), Cumming et al. (2015) developed a 251-lake calibration dataset along a total phosphorus (TP) gradient of 2–227 µg L−1 and quantified TP optima and tolerances for 48 common diatom taxa. Similarly, large databases such as the European Diatom Database (EDDI; Battarbee et al. 2001a, b), as well as the “Omnidia” software (Lecointe et al. 1993), provide diatom optima and tolerances that are widely applied throughout Europe. Optima and tolerances calculated using large datasets are often more robust and have the potential to be applied in regions where localized calibration sets are not available because developing such sets is time consuming and requires taxonomic expertise. Diatom autecological information derived from large calibration sets can be an attractive and useful tool for biomonitoring assessments and lake management.

In this study, we synthesized surface-sediment diatom relative abundance data from 20 multi-lake studies conducted between 1987 and 2019 at the Paleoecological Environmental Assessment and Research Lab (PEARL) at Queen’s University (Kingston, Ontario, Canada) to estimate optima for lakewater pH, TP concentration, and maximum lake depth for common diatom taxa in Ontario lakes (Table 1). This dataset comprises 546 samples from 464 unique sites (some lakes were resampled in different years and, at some lakes, multiple depositional basins were sampled). The sampled lakes span Ontario’s three ecozones (Hudson Bay Lowlands, Ontario Shield, Mixedwood Plains; Fig. 1), and are therefore diverse in terms of limnological and physical features.

Table 1 Details of the studies included in this synthesis, separated by lead researcher
Fig. 1

Map of Ontario showing the ecozones and locations of the study lakes, and corresponding violin plots depicting the distribution of lakewater pH, total phosphorus (TP) concentration, and maximum depth. Note that TP and depth are shown on a log10 scale. The violin plots illustrate the median as a dot and the interquartile range as whiskers

We focused on the relationships between the most common diatom taxa in Ontario and lakewater pH, TP concentrations, and maximum depth. These limnological variables were selected as they have well-established relationships with diatoms, are often correlated to other important limnological variables (e.g., buffering capacity, light), and are commonly measured when conducting spatial surveys. Moreover, lakewater pH and TP concentration are important water-quality variables to measure lake acidification and eutrophication, respectively, which are major stressors for freshwater ecosystems (Schindler 1988; Anderson et al. 2002; Schindler et al. 2008; Keller 2009; Van Staden et al. 2022). The strong response of diatom taxa to changes in TP concentrations and pH makes them excellent bioindicators and forms the basis of quantitative inference models for reconstructing past environmental conditions (Birks et al. 1990; Hall and Smol 1996; Battarbee et al. 2001a, b; Reavie and Smol 2001; Tremblay et al. 2014; Cumming et al. 2015). In contrast, quantitatively reconstructing lake depth using diatoms from a set of calibration lakes has many challenges and is therefore not a common practice. However, quantitative reconstructions are not the goal of this study, and we rationalize including an exploration of diatom distributions across maximum lake depth as the relationship between lake depth and key taxa can help clarify diatom responses to acidification and eutrophication across a diverse suite of lakes. For instance, diatom-assemblage responses to increasing TP concentrations could be vastly different in a shallow lake relative to a deeper lake, because environmental factors associated with lake depth (e.g., light availability, habitat structure) are substantially different between the two systems.
Our objectives were three-fold: 1) to identify the most common diatom taxa observed in the surface sediment of Ontario lakes; 2) to estimate the environmental optima for lakewater pH, TP concentration, and maximum depth for the most common taxa; and 3) to estimate the environmental range across Ontario where the common diatoms occur and flourish. By refining diatom autecology and advancing our understanding of the environmental conditions in which diatom taxa are distributed, this study will contribute to the more effective use of diatoms as bioindicators in environmental assessments.

Methods

Site selection and sample collection

We merged 20 Ontario surface sediment calibration datasets and multi-lake studies collected by a single research program (PEARL at Queen’s University, Kingston, ON), thereby minimizing methodological variability (Table 1). All studies used comparable methods for water chemistry and sediment collection, as well as for diatom preparation and enumeration. Sediment cores were retrieved from the deepest basin of each lake (or where the deepest point was identified using a depth sounder near the centre of the lake) using a Glew-type gravity corer (Glew 1989, 1991) or a modified Kajak-Brinkhurst gravity corer (used in Dixit et al. 1991, 1992). Occasionally, multiple cores were collected from lakes with complex bathymetry (i.e., with two or more depositional areas), such as Lake Manitou (Nelligan et al. 2020) and Lake Nipissing (Favot 2021). The top-most sediment samples were sectioned at 0.25, 0.5, or 1 cm intervals using a vertical extruder (Glew 1988) and correspond to limnological data collected from the surface waters of the same lake at the time of coring, or as part of ongoing water-quality-monitoring programs. Water-chemistry measurements, including TP concentration and pH, were analyzed in a variety of laboratories, but all used similar well-established methodologies (OME 1979, 1981, 1983; Environment Canada 1994a, b; Janhurst 1998). In some cases, lakewater pH was measured while collecting the sediment core using calibrated handheld pH meters. We categorized the limnetic TP concentrations following the classification scheme for Canadian waters by Wetzel (2001), where lakes with TP concentrations < 12 µg L−1 are oligotrophic, between 12 and 24 µg L−1 are mesotrophic, between 24 and 100 µg L−1 are eutrophic, and > 100 µg L−1 are hypereutrophic. Lake-depth measurements were taken concurrently with sediment-core retrieval. In most cases, sediment cores were collected from the deepest point of the lake, thus corresponding with maximum depth.
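The TP classification quoted above can be written as a small lookup. The following Python sketch is illustrative only: the function name is hypothetical, and because the text does not say which class owns the shared boundary at 24 µg L−1, assigning that value to the lower (mesotrophic) class is an assumption.

```python
def trophic_status(tp_ug_per_l: float) -> str:
    """Classify a lake from total phosphorus (µg/L) using the thresholds quoted in the text.

    Assumption: a value of exactly 24 is placed in the mesotrophic class,
    because the text does not specify how that shared boundary is handled.
    """
    if tp_ug_per_l < 0:
        raise ValueError("TP concentration cannot be negative")
    if tp_ug_per_l < 12:
        return "oligotrophic"
    if tp_ug_per_l <= 24:
        return "mesotrophic"
    if tp_ug_per_l <= 100:
        return "eutrophic"
    return "hypereutrophic"


# Ecozone median TP values reported in the Results (µg/L).
for zone, tp in [("Ontario Shield", 6.7),
                 ("Mixedwood Plains", 11.2),
                 ("Hudson Bay Lowlands", 14.8)]:
    print(zone, trophic_status(tp))
```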

Diatoms from all waterbodies used in this study (546 samples, 464 unique sites) were prepared following standard techniques used at PEARL as described in Rühland and Smol (2002). Briefly, 0.2 to 0.5 g wet sediment (or 0.02 to 0.05 g freeze-dried sediment) was subsampled and placed into glass scintillation vials. To remove the organic components, ~ 15 ml of a 1:1 molecular weight ratio of concentrated sulphuric and nitric acid was added to each sample, heated to ~ 80 °C for at least 2 h, and rinsed repeatedly with deionized water to achieve circumneutral pH before preparing permanent microscope slides using Hyrax® or Naphrax® mounting medium.

Diatoms were identified to at least the species level (often to variety) using a research-grade compound microscope under oil immersion at 1000× magnification. For each sample, at least 250 diatom valves, but often more than 400, were enumerated. Taxonomic identifications were based on a variety of standard sources including Krammer and Lange-Bertalot (1986–1991), Patrick and Reimer (1966), Camburn and Charles (2000), Reavie and Kireta (2015) and published articles (e.g., Koppen 1975; Siver and Kling 1997).

Taxonomic harmonization

All contributing taxonomists were trained at the Paleoecological Environmental Assessment and Research Laboratory (PEARL) at Queen’s University, Kingston, Ontario. However, since diatom identification has evolved and advanced over the study period, we used a series of conservative harmonization techniques to establish taxonomic units that helped minimize Type I errors. We consider taxonomic units to be a group of morphologically similar taxa that are generally representative of comparable ecological conditions.

First, we re-calculated relative abundances from raw count data to avoid errors that may have been introduced through data transformations. Next, imprecise taxa identifications were omitted from further analyses (e.g., those that were identified only to genus level, aff. (affinis), cf. (conferre), unknowns). Taxon names were then updated to the most current nomenclature based on recent publications and online databases (e.g., Spaulding et al. 2022; Guiry and Guiry 2022). Finally, some taxa were grouped into taxonomic complexes to address potential concerns that may arise due to differences in: 1) taxonomic precision; 2) identification of taxa that are morphologically similar and challenging to distinguish using light microscopy; and 3) nomenclature over time. Diatom taxa that were grouped with their varieties include: Achnanthidium minutissimum, Asterionella ralfsii, Aulacoseira perglabra, Cocconeis placentula, Fragilaria capucina, Frustulia rhomboides, Pinnularia microstauron, Sellaphora pupula, and Staurosirella pinnata. Other taxon groupings based on identification imprecision include Discostella stelligera/pseudostelligera, Pantocsekiella comensis/gordonensis, Fragilaria delicatissima/tenera/nanana, Navicula cryptocephala/cryptotenella, the Cymbella gaeumannii complex (Encyonopsis falaisensis, Encyonema gaeumannii, E. perpusillum), the Fragilaria ulna complex (F. ulna varieties, Ulnaria acus, Synedra acus var. angustissima), the Fragilaria virescens complex (Fragilariforma virescens varieties, Stauroforma exiguiformis), and the Lindavia bodanica complex (Lindavia bodanica varieties, L. comta varieties, L. affinis, L. intermedia).
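The first steps of this harmonization (relative abundances from raw counts, removal of imprecise identifications, and grouping into complexes) can be sketched as follows. This is a minimal Python illustration, not the authors' workflow: the counts are invented, the name markers used to flag imprecise identifications are a hypothetical convention, and only one of the complexes listed above is mapped.

```python
# Hypothetical raw valve counts for one sample (illustrative numbers only).
raw_counts = {
    "Discostella stelligera": 120,
    "Discostella pseudostelligera": 30,
    "Asterionella formosa": 90,
    "Navicula sp.": 20,          # identified to genus only
    "Eunotia cf. incisa": 10,    # uncertain identification
    "Tabellaria flocculosa strain IIIp": 30,
}

# One of the taxonomic complexes named in the text, expressed as a name mapping.
COMPLEXES = {
    "Discostella stelligera": "Discostella stelligera/pseudostelligera",
    "Discostella pseudostelligera": "Discostella stelligera/pseudostelligera",
}

# Assumed naming convention for imprecise identifications.
IMPRECISE_MARKERS = (" sp.", " spp.", " cf. ", " aff. ", "unknown")


def is_imprecise(name: str) -> bool:
    """Return True when the name carries one of the assumed imprecision markers."""
    lowered = name.lower()
    return any(marker in lowered for marker in IMPRECISE_MARKERS)


def harmonize(counts: dict) -> dict:
    """Percent abundance per taxonomic unit, relative to all valves counted.

    Percentages are computed from the full raw count first; imprecise
    identifications are then dropped and the remaining taxa are merged
    into their complexes.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counted valves")
    units = {}
    for name, n in counts.items():
        if is_imprecise(name):
            continue
        unit = COMPLEXES.get(name, name)
        units[unit] = units.get(unit, 0.0) + 100.0 * n / total
    return units


print(harmonize(raw_counts))
```

Because the percentages are taken before the imprecise identifications are omitted, the retained units need not sum to 100%.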

After harmonization, taxon names were updated to the most recently accepted entity. Notable updates include all Synedra spp. renamed to Fragilaria spp. (as per Alexson et al. 2022), Cyclotella tripartita and Lindavia tripartita to Pantocsekiella tripartita (as per Ács et al. 2016), Lindavia ocellata to Pantocsekiella ocellata (as per Ács et al. 2016), Cyclotella michiganiana and Lindavia michiganiana to Pantocsekiella michiganiana (as per Schultz and Dreßler 2022), and Cymbella microcephala to Encyonopsis microcephala (as per Krammer 1997). The species authorities for the common diatom taxa, and the individual taxa within a given complex, are provided in Supplementary Table 1.

Statistical analyses

Histograms of the key limnological variables were used to visualize the distribution of lakewater pH, TP concentration, and depth in the surface-sediment-calibration set (Supplementary Figures S1-S5). Total phosphorus concentration and maximum depth were right-skewed and therefore were normalized using a log10 transformation. We used violin plots to visualize the distribution of the three environmental variables across ecozones (Fig. 1), created using the R package ‘ggplot2’ v. 3.3.5 (Wickham 2016) in the R workspace v.1.4.1717 (R Core Team 2020).

We focused subsequent analyses on “common taxa” in the harmonized dataset. We defined “common taxa” as those with a Hill’s N2 diversity index (Hill 1973) greater than 10, occurring in at least 10% of samples (i.e., 55 occurrences), and having a relative abundance of at least 10% in at least one sample. These cut-off criteria were selected to create an informative and broadly applicable dataset that considers taxonomic changes and potential misidentification of rare taxa resulting from multiple researchers over three decades.
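The three cut-off criteria can be expressed compactly. The Python sketch below assumes the usual definition of Hill's N2 for a taxon (the reciprocal of the sum of squared proportions of its abundance across samples) and uses a strict inequality for N2, following the wording above; the function names and the example abundances are hypothetical.

```python
def hills_n2(abundances):
    """Hill's N2 for one taxon: effective number of samples in which it occurs."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    return 1.0 / sum((a / total) ** 2 for a in abundances)


def is_common(abundances, min_n2=10.0, min_occurrence_fraction=0.10,
              min_peak_abundance=10.0):
    """Apply the three cut-offs to one taxon's percent abundances across all samples."""
    if not abundances:
        return False
    occurrences = sum(1 for a in abundances if a > 0)
    return (hills_n2(abundances) > min_n2
            and occurrences >= min_occurrence_fraction * len(abundances)
            and max(abundances) >= min_peak_abundance)


# Hypothetical taxa in a 100-sample dataset (percent abundances).
widespread = [12.0] * 20 + [0.0] * 80   # moderate abundance in 20 samples
localized = [50.0] * 2 + [0.0] * 98     # very abundant, but in only 2 samples

print(hills_n2(widespread), is_common(widespread))
print(hills_n2(localized), is_common(localized))
```

A taxon that is abundant in only a few samples has a low N2 and fails the filter even though its peak abundance is high, which is the behaviour the criteria are meant to produce.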

We used hierarchical logistic regression modelling with niche types and species responses, sensu Huisman et al. (1993), to visualize the distribution of taxa along gradients of lakewater pH, TP concentration, and depth. Huisman et al. (1993) outlined five potential response shapes (termed models) applied to an explanatory variable: (I) flat response (i.e., no relationship), which we define as a null response; (II) monotone sigmoid reaching the peak at one end of the gradient; (III) monotone sigmoid with plateau; (IV) unimodal symmetric; and (V) unimodal skewed. Two additional response curves were added by Jansen and Oksanen (2013), representing bimodal ecological response curves. However, considering that unimodal response curves are more representative of our sites across ecozones in Ontario, we restricted our results to models I-V. The most parsimonious model for each response variable was determined using Akaike Information Criterion (AIC), following bootstrapping with 100 permutations to ensure model stability. Model selection was performed using the R package ‘eHOF’ v.1.11 (Jansen and Oksanen 2013).
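To make the five response shapes concrete, the Python sketch below writes out the commonly cited parameterization of the Huisman et al. (1993) models on a gradient scaled to 0–1 and selects among hypothetical fits by AIC. It does not fit the models or reproduce the bootstrapping done in ‘eHOF’; the parameter names (a, b, c, d, m) and the log-likelihood values are assumptions for illustration and should be checked against the package documentation.

```python
import math

# Number of free parameters in each model (I-V).
N_PARAMS = {"I": 1, "II": 2, "III": 3, "IV": 3, "V": 4}


def hof_response(model, x, a=0.0, b=0.0, c=0.0, d=0.0, m=1.0):
    """Expected response at scaled gradient position x; m is the maximum attainable value."""
    if model == "I":      # flat (null) response
        return m / (1 + math.exp(a))
    if model == "II":     # monotone sigmoid
        return m / (1 + math.exp(a + b * x))
    if model == "III":    # monotone sigmoid with plateau
        return m / ((1 + math.exp(a + b * x)) * (1 + math.exp(c)))
    if model == "IV":     # unimodal symmetric (same slope b on both sides)
        return m / ((1 + math.exp(a + b * x)) * (1 + math.exp(c - b * x)))
    if model == "V":      # unimodal skewed (different slopes b and d)
        return m / ((1 + math.exp(a + b * x)) * (1 + math.exp(c - d * x)))
    raise ValueError(f"unknown model: {model}")


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate a more parsimonious fit."""
    return 2 * n_params - 2 * log_likelihood


def most_parsimonious(log_likelihoods):
    """Return the model label with the lowest AIC."""
    return min(log_likelihoods,
               key=lambda model: aic(log_likelihoods[model], N_PARAMS[model]))


# Hypothetical maximized log-likelihoods for one taxon along one gradient.
fits = {"I": -120.0, "II": -110.0, "III": -109.5, "IV": -101.0, "V": -100.8}
print(most_parsimonious(fits))
```

In this invented case model V has the highest likelihood, but its extra parameter is not justified by the small improvement over model IV, so AIC favours the symmetric model.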

Lakewater pH, TP concentration, and depth optima were calculated for the common diatom taxa with significant ecological responses (i.e., models II-V) using a weighted-average (WA) approach (ter Braak and Looman 1986) and assessed through bootstrapping with 1000 Monte Carlo permutations. Optima were not calculated for the model I responses as this model represents a lack of response across the gradient of a limnological variable. We also calculated niche borders to identify a taxon’s range of ecological responses in place of using a Gaussian standard deviation approach. Because ecological responses are often skewed in distribution (e.g., models II, III, V), borders associated with curve maxima provide a more representative description of a taxon’s tolerance to a specific variable (Jansen and Oksanen 2013). Here, the outer borders were calculated by multiplying the maximum response with the coefficient e^−2, following Heegaard (2002). All niche borders were truncated to the range of the measured gradient of the dataset and did not extrapolate beyond the presented data. Optima and borders were calculated using the R packages ‘analogue’ v.0.17–5 (Simpson and Oksanen 2020) and ‘eHOF’ (Jansen and Oksanen 2013), respectively.
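The two quantities described here can be illustrated with a short Python sketch. It assumes the standard abundance-weighted mean for the WA optimum and locates outer borders on a discrete grid where a fitted curve stays at or above the maximum response multiplied by exp(−2). The function names, the example abundances, and the Gaussian curve are hypothetical; the bootstrapping and the routines in ‘analogue’ and ‘eHOF’ are not reproduced.

```python
import math


def wa_optimum(env_values, abundances):
    """Weighted-average optimum: mean of the environmental values weighted by abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("taxon is absent from all samples")
    return sum(x * y for x, y in zip(env_values, abundances)) / total


def outer_borders(grid, responses):
    """Lowest and highest grid positions where the response is at least max * exp(-2).

    Because the grid only covers the measured gradient, the borders are
    truncated to that gradient by construction.
    """
    threshold = max(responses) * math.exp(-2)
    inside = [x for x, r in zip(grid, responses) if r >= threshold]
    return min(inside), max(inside)


# Hypothetical TP values (µg/L) and percent abundances of one taxon in four lakes.
tp = [4.0, 8.0, 16.0, 32.0]
abundance = [10.0, 30.0, 10.0, 0.0]

# TP is log10-transformed before averaging, then back-transformed for reporting.
log_optimum = wa_optimum([math.log10(v) for v in tp], abundance)
print(10 ** log_optimum)

# Hypothetical fitted curve along a pH gradient measured from 4.0 to 9.0.
ph_grid = [4 + i / 10 for i in range(51)]
curve = [math.exp(-0.5 * ((x - 8.5) / 0.5) ** 2) for x in ph_grid]
print(outer_borders(ph_grid, curve))
```

In the pH example the upper border coincides with the end of the measured gradient, which mirrors the truncation rule stated above.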

Results

The majority of the 546 surface sediment samples in our regional dataset are from the Ontario Shield ecozone (412 lakes; ~ 75% of total), followed by Mixedwood Plains (87 lakes; ~ 16%), and finally Hudson Bay Lowlands (47 lakes; ~ 9%; Fig. 1; Table 2). The number of sites examined differs for each environmental variable because lakewater pH and TP concentration were not measured at every site. Ontario Shield lakes had the lowest median pH (6.7), followed by Hudson Bay Lowlands (7.4), and Mixedwood Plains (8.5; Fig. 1; Table 2). Median TP concentration was the lowest in Ontario Shield lakes (6.7 µg L−1), followed by Mixedwood Plains (11.2 µg L−1), and Hudson Bay Lowlands (14.8 µg L−1; Fig. 1; Table 2). Median depth was much shallower in the Hudson Bay Lowlands sites (1.6 m) in comparison to the Mixedwood Plains (13.4 m) and Ontario Shield (17.2 m; Fig. 1; Table 2). Based on median values of the full dataset of 464 unique sites, lakes were generally circumneutral (pH = 6.9), oligotrophic (TP = 8.0 µg L−1), and moderately deep (depth = 14.8 m; Fig. 1; Table 2).

Table 2 Comparison across ecozones of the three environmental variables examined in this study

Following data harmonization there were 728 unique diatom taxonomic units in the full lake set. After applying the cut-off criteria to identify common taxa (Hill’s N2 ≥ 10, no. of occurrences ≥ 55, relative abundance ≥ 10% in at least one sample), 52 diatom taxonomic units were considered common in the lake set and are described in Table 3. The top ten taxa and taxon groups include: (1) the Achnanthidium minutissimum complex (occurring in 88.1% of samples); (2) the Discostella stelligera/pseudostelligera complex (83.2%); (3) Asterionella formosa (77.8%); (4) the Lindavia bodanica complex (76.6%); (5) the Fragilaria delicatissima/tenera/nanana complex (69.2%); (6) Staurosirella pinnata (69.0%); (7) Tabellaria flocculosa strain IIIp (65.8%); (8) Aulacoseira ambigua (63.4%); (9) the Fragilaria capucina complex (59.3%); and (10) Pseudostaurosira brevistriata (55.9%).

Table 3 Lakewater pH, TP concentration, and depth optima of the 52 most common taxa in our 546-sample diatom dataset

With the exception of the Cymbella gaeumannii complex and Platessa conspicua, all common taxa (96%, 50/52) had a significant ecological response to pH (models II–V; Table 3). The most frequent diatom response to pH, as determined using AIC, was the symmetrical and unimodal response of model IV (43%; Table 3). Most of the common taxa (83%, 43/52) had a significant ecological response to log10-transformed TP, with the asymmetric and unimodal response of model V being the most frequent (30%; Table 3). For maximum lake depth, 85% (44/52) of the taxa had an ecologically significant response to log10-transformed depth, with the most common being the asymmetrical and unimodal response of model V (30%; Table 3).

In our Ontario dataset, diatom taxa generally had highly variable pH border ranges, indicating a broad range of tolerance spanning multiple pH units (Fig. 2). The taxa most indicative of acidic conditions were the Frustulia rhomboides complex (pH optimum: 5.80), Asterionella ralfsii (6.01), Tabellaria flocculosa strain III (6.05), Eunotia exigua (6.05), and the Fragilaria virescens complex (6.06; Fig. 2; Table 3). On the other end of the pH gradient, taxa with the highest pH optima included Pantocsekiella michiganiana (8.47), Pantocsekiella comensis/gordonensis (8.44), Stephanodiscus minutulus (8.41), Amphora pediculus (8.16), and Encyonopsis microcephala (8.06; Fig. 2; Table 3). Several common taxa had narrow border ranges relative to the dataset and unimodal responses to pH, suggesting that they are effective as pH indicators, including Pantocsekiella michiganiana (pH optimum = 8.47; range = 8.27–8.76), Staurosira construens var. venter (7.22; 6.42–7.85), and Staurosirella pinnata (7.41; 6.37–8.76). Other common taxa displayed relatively wide pH border lengths and a unimodal response (IV, V) to pH, including Asterionella formosa (optimum = 6.9; range = 5.88–9.26), the Lindavia bodanica complex (7.23; 4.15–9.26), and Tabellaria flocculosa strain IIIp (6.56; 4.15–8.19) (Fig. 5).

Fig. 2

Caterpillar plot of pH optima, illustrating common taxa with ecologically significant responses (50/52 taxa). The dots represent optima and tails are the outer borders

Generally, diatom taxa had broad border ranges for TP concentration (Fig. 3). The taxa indicative of the lowest TP concentrations were Pantocsekiella tripartita (TP optimum: 3.38 µg L−1), the Aulacoseira perglabra complex (5.13 µg L−1), Eunotia exigua (5.63 µg L−1), Asterionella ralfsii (5.63 µg L−1), and Pantocsekiella ocellata (5.77 µg L−1). The taxa indicative of relatively higher TP concentrations in our lake set were Nitzschia fonticola (14.29 µg L−1), Pinnularia interrupta (13.94 µg L−1), Staurosira construens var. venter (13.73 µg L−1), Eolimna minima (13.50 µg L−1), and Staurosira construens (12.82 µg L−1) (Fig. 3; Table 3). Common taxa showing a unimodal response with the narrowest borders in the lake set included Asterionella ralfsii (TP optimum = 5.63 µg L−1; range = 1.10–24.04 µg L−1) and Pantocsekiella michiganiana (8.01 µg L−1; 4.08–21.38 µg L−1) (Fig. 5). More often, unimodal responses to TP concentration were accompanied by broad border ranges, as observed with Asterionella formosa (TP optimum = 7.44 µg L−1; range = 1.74–96.8 µg L−1), the Lindavia bodanica complex (7.67 µg L−1; 1.10–96.8 µg L−1), and Tabellaria flocculosa strain IIIp (7.12 µg L−1; 1.20–96.8 µg L−1; Fig. 5).

Fig. 3

Caterpillar plot of TP optima, illustrating taxa with ecologically significant responses (43/52 taxa). The dots represent optima and tails are the outer borders. Note that TP is shown on a log10 scale

The taxa most indicative of shallow water conditions were Pinnularia interrupta (depth optimum = 1.9 m), Staurosira construens var. venter (2.5 m), Psammothidium sacculus (2.8 m), Eolimna minima (2.9 m), and Encyonema hebridicum (3.0 m). Taxa most indicative of deeper-water conditions were Stephanodiscus minutulus (depth optimum: 27.0 m), Pantocsekiella ocellata (26.8 m), Pantocsekiella tripartita (24.7 m), the Lindavia bodanica complex (23.2 m), and Fragilaria crotonensis (22.1 m; Fig. 4; Table 3). Common taxa that displayed unimodal responses with narrow borders relative to the length of the depth gradient, suggesting that they are strong indicators of water depth, included Aulacoseira nygaardii (depth optimum = 9.2 m; range = 2.2–11.6 m), Staurosira construens var. venter (2.5 m; 0.8–3.9 m), and Pantocsekiella tripartita (24.7 m; 10.0–95.0 m; Fig. 5; Fig. S6). Several taxa displayed unimodal responses with broad depth-border lengths that spanned almost the entire depth gradient of our lake set, including Aulacoseira ambigua (depth optimum = 13.6 m; range = 1.2–95.0 m), Pantocsekiella michiganiana (13.8 m; 1.0–77.1 m), and Staurosirella pinnata (6.18 m; 0.7–63.5 m; Fig. 5).

Fig. 4

Caterpillar plot of depth optima, illustrating taxa with ecologically significant responses (43/52 taxa). The dots represent optima and tails are the outer borders. Note that depth is shown on a log10 scale

Fig. 5

Ecological response curves of the 52 common taxa in this dataset. Curves reflect the ideal model based on hierarchical logistic regression modelling. The optima and selected models are shown in the top left of each panel. The optimum is shown with a solid vertical line and the outer borders are dashed vertical lines. Note that TP and depth are shown on a log10 scale

Discussion

This 546-sample (464 unique sites) dataset, collating diatom information from 20 projects in Ontario (Table 1), incorporates a broad range of samples from remote sub-Arctic sites to lakes in more populated areas in the southernmost part of the province (Fig. 1). The distribution of lakes in the dataset is indicative of the management concerns in Ontario since the 1980s. Nearly 40% of the lakes in our dataset were selected to examine the effects of acidification caused by smelting operations (Dixit et al. 1989, 1991, 2002; Greenaway et al. 2012), as well as its subsequent recovery (Cheng et al. 2022). Approximately 45% of lakes were selected to describe the effects of eutrophication related to recreation, urbanization, shoreline development, and tourism (Hall and Smol 1996; Reavie and Smol 2001; Werner and Smol 2005; Hadley et al. 2013; Barrow et al. 2014; Wilkins 2021). Around 13% of the lakes were in the sub-Arctic Hudson Bay Lowlands and were originally used to study the recent responses to warming and to retrospectively assess baseline conditions before the onset of major mining operations (Rühland et al. 2014; Hargan et al. 2016; Hargan unpublished). Recent diatom-based studies have aimed to address newly developing management concerns using smaller lake sets, such as lake-trout management in response to climate warming (Nelligan et al. 2016, 2020), increasing prevalence of cyanobacterial blooms in minimally impacted lakes (Favot et al. 2019; Favot 2021), and the effects of chloride on diatom assemblages (Valleau 2021; Valleau et al. 2022).

The high representation of study lakes in the Ontario Shield compared to the Hudson Bay Lowlands (Fig. 1; Table 2) is not surprising given the size of this ecozone in relation to the province and the remoteness of the Hudson Bay Lowlands. Given that Ontario spans three terrestrial ecozones that vary in bedrock geology, vegetation, climate, and catchment development, it is understandable that measurements from the 464 sites across this large province have a broad range in lakewater pH and TP concentration. Differences in pH among the geologically diverse ecozones are largely a reflection of the differences in underlying bedrock and catchment characteristics, as well as the cumulative effects of regional smelting activity in many of the Ontario Shield lakes in the study (Dixit et al. 1991; Cheng et al. 2022). Total phosphorus concentrations were low in Ontario Shield lakes and generally higher in the Mixedwood Plains and Hudson Bay Lowlands, largely reflecting differences in geological setting. While many of the study lakes from the Mixedwood Plains and Ontario Shield are relatively deep, the lakes in the Hudson Bay Lowlands are distinctly shallow, which is typical of this sub-Arctic region. Due to the uneven distribution of sites among the three ecozones (i.e., the majority of sites are in the Ontario Shield), optima of diatom taxa were not calculated for each ecozone separately. Many previously published individual calibration sets, however, provide region-specific optima for the most common diatom taxa (Hall and Smol 1996; Reavie and Smol 2001; Werner and Smol 2005; Hadley et al. 2013).

Significant ecological responses to pH were observed for all but two of the most common diatom taxonomic units (the Cymbella gaeumannii complex; Platessa conspicua), and model IV (symmetrical and unimodal) was the most common response type. The strong relationship between diatoms and pH has been observed in several datasets from Canada and around the world, and robust pH transfer functions have been developed to track acidification and subsequent pH recovery (Birks et al. 1990; Dixit et al. 2002). Species such as Asterionella ralfsii, the Frustulia rhomboides complex and Eunotia exigua recorded the lowest pH optima in our dataset and have been identified as key indicators of lake acidification in previous investigations (Camburn and Charles 2000; Dixit et al. 2002). Planktonic taxa such as A. ralfsii, Tabellaria flocculosa strain III, Eunotia zasuminensis, and several filamentous Aulacoseira taxa (A. distans, A. lirata, A. perglabra) have low pH optima and were more common in deeper lakes (> 10 m; Table 3). Meanwhile, benthic taxa such as the Frustulia rhomboides complex, Eunotia incisa, and the Fragilaria virescens complex also have relatively low pH optima, but had lower depth optima (< 10 m; Table 3). Although acidobiontic Eunotia exigua is a benthic taxon, it had a depth optimum of 15.3 m. This benthic taxon may thrive in the littoral zones of deep, clear, acidic lakes, highlighting that maximum lake depth is a coarse measure that cannot account for the complexities in lake morphology or the light environment, which can influence the relative availability of littoral habitat.

Significant ecological responses to TP concentrations were observed for 43 of the 52 most common taxonomic units in this dataset. A strong understanding of how diatom assemblages respond to changes in lakewater TP concentrations, together with the development of transfer functions, has helped recognize eutrophication trajectories in lakes (Ramstack et al. 2004; Cumming et al. 2015). In our Ontario dataset, the majority of lakes were considered oligotrophic to mesotrophic in TP concentrations, and therefore “low” and “high” TP optima are described within this context. Generally, epiphytic and benthic taxa (Cocconeis placentula, Eolimna minima, Nitzschia fonticola, Nitzschia perminuta, Pinnularia interrupta, Sellaphora pupula, Staurosira construens, and Staurosira construens var. venter) had the highest TP optima in our Ontario lakes (TP optima range = 12.00–14.29 µg L−1). Relatively high optima for TP concentrations were also recorded for some planktonic taxa, such as Fragilaria crotonensis (TP optimum = 9.64 µg L−1), the Fragilaria ulna complex (TP optimum = 10.34 µg L−1), and Stephanodiscus minutulus (TP optimum = 9.78 µg L−1).

Although our study lakes are distributed across large environmental gradients (Fig. 1; Table 2), the majority are circumneutral (Fig. S1), oligotrophic (Fig. S2), and reach a maximum depth of between 10 and 25 m (Fig. S4). Therefore, the weighted-average (WA) optima described in this study may suffer from the ‘edge effect’ for taxa indicative of eutrophic or deep conditions, where optima are poorly estimated due to the truncation of their ecological response curves at the extremes of the gradient (Simpson and Hall 2012). Given that the majority of the lakes in our study are skewed towards low TP concentration and relatively shallow maximum depth, WA optima for taxa that more commonly thrive in eutrophic or very deep conditions may be underestimated. Another factor that may affect our WA optima is high seasonal variability in surface-water-TP concentration. For example, in oligotrophic Precambrian Shield lakes, TP concentrations may decline in the summer months when many lakes in the study were sampled (Clark et al. 2010). The taxa most likely affected by edge effects are represented by Model II responses, where the taxa optima are outside of the measured range in our dataset. These optima should be viewed with an understanding of the distribution, and the knowledge that there may not be sufficient data to accurately estimate the optima.
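The edge effect described above can be demonstrated numerically. In the Python sketch below, a hypothetical taxon has a symmetric Gaussian response with its true optimum near the upper end of a sampled log10 TP gradient. All values are invented, and the gradient is sampled evenly to isolate the effect of truncation; the real dataset is additionally skewed towards low TP.

```python
import math


def gaussian_response(x, optimum, tolerance, peak=1.0):
    """Symmetric unimodal response of a hypothetical taxon along a gradient."""
    return peak * math.exp(-0.5 * ((x - optimum) / tolerance) ** 2)


def wa_optimum(env_values, abundances):
    """Weighted-average optimum: mean of the gradient values weighted by abundance."""
    return sum(x * y for x, y in zip(env_values, abundances)) / sum(abundances)


true_optimum = 1.8   # log10 TP, about 63 µg/L (invented)
tolerance = 0.4      # log10 units (invented)

# Sampled gradient stops at log10 TP = 2.0 (100 µg/L), just above the optimum.
truncated = [i / 100 for i in range(0, 201)]
# Extended gradient covers the response curve equally on both sides of the optimum.
extended = [i / 100 for i in range(0, 361)]

wa_truncated = wa_optimum(
    truncated, [gaussian_response(x, true_optimum, tolerance) for x in truncated])
wa_extended = wa_optimum(
    extended, [gaussian_response(x, true_optimum, tolerance) for x in extended])

print(wa_truncated, wa_extended)
```

With the truncated gradient, the upper half of the response curve is largely unsampled, so the weighted average is pulled below the true optimum; with the extended gradient the estimate recovers it.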

Notably, taxa that are often considered strong indicators of eutrophication were not common in our dataset (i.e., were not included in the group of 52 common taxa). For example, indicators of eutrophic waters, such as Stephanodiscus hantzschii and Cyclostephanos invisitatus, were only observed in 16 lakes (~ 3% of the lake set) and 9 lakes (~ 2%), respectively, and therefore did not meet the criteria required for undertaking detailed analysis. Although the TP concentration gradient of the Ontario dataset is large (1.1–96.8 µg L−1; Table 2), the majority of lakes are oligotrophic (60% of sampled sites have TP < 10 µg L−1) and therefore taxa that indicate highly eutrophic conditions (TP > 24 µg L−1) were not commonly observed (Fig. 1).

In general, the most common diatoms had broad ecological responses to lakewater pH, TP concentration, and maximum depth (Fig. 5). This is not surprising given that these three commonly and easily measured variables integrate other limnological factors important for diatom growth, potentially confounding the apparent response of diatoms to any single variable. For example, TP concentration and depth are associated with light penetration, and TP levels and pH may be associated with dissolved organic carbon (DOC) concentrations. Dissolved organic carbon is directly linked to the depth of light penetration, which can be a key variable driving variation in diatom communities (Gushulak et al. 2017). Other factors that diatoms respond to and that were not measured in our study include the length of the growing season, thermal stratification and turbulent mixing patterns, which are increasingly important with accelerated climate warming (Rühland et al. 2015). For instance, the relative abundances of two of the most common planktonic diatom taxa in the dataset (D. stelligera/pseudostelligera and A. formosa) have been associated with climate-mediated changes to lake thermal properties in many Ontario lakes (Enache et al. 2011; Rühland et al. 2013; Hadley et al. 2013; Barrow et al. 2014; Sivarajah et al. 2016, 2018).

In the 546-sample dataset, the most common ecological response curve was Model IV (~ 31%), followed by Model V (~ 30%), Model II (~ 16%), Model I (~ 14%) and Model III (~ 9%). This distribution generally agrees with earlier work, which noted that symmetrical response curves are the most common, followed by monotonic responses, skewed and null responses, and finally plateau responses (Oksanen and Minchin 2002; reviewed in Birks et al. 2012). A key difference in our study is the larger number of Model I (null) responses and Model V (skewed, unimodal) responses. We propose two potential reasons for this discrepancy. First, the loss of some taxonomic resolution during the necessary harmonization step could have led to null or skewed response curves (e.g., the Fragilaria capucina complex; Fig. S6). Our conservative harmonization approach was required because the data were collected over three decades by multiple researchers. Second, high variability in taxon relative abundances may have resulted in null response curves because a strong response curve could not be fit to the data. This occurred when taxa were found in low relative abundance across the majority of lakes and were rarely observed at high abundance (e.g., Amphora libyca, Cocconeis placentula; Fig. S6), or when taxa were found in high abundance but in relatively few lakes (e.g., Eunotia zasuminensis, Nitzschia fonticola; Fig. S6). These issues can be addressed by a pre-established study design assessing lake conditions, such as the ongoing Natural Sciences and Engineering Research Council of Canada (NSERC) Lake Pulse network, in which lakes were selected in even proportions with respect to trophic status, human stressors, and other factors (Huot et al. 2019).
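
For readers less familiar with the five response shapes, the sketch below writes them out, assuming that Models I to V follow the hierarchical Huisman-Olff-Fresco formulation as described by Oksanen and Minchin (2002), with the environmental variable rescaled to the range 0 to 1. The function name and all parameter values are hypothetical and chosen only to show each shape; they are not fitted to our data.

```python
import math


def hof_response(x, model, a=0.0, b=0.0, c=0.0, d=0.0, max_abundance=1.0):
    """Expected abundance at gradient position x (assumed rescaled to 0-1)
    for HOF Models I-V; a, b, c, d are shape parameters."""
    if model == "I":      # null: flat, no response to the gradient
        denom = 1 + math.exp(a)
    elif model == "II":   # monotonic: sigmoidal increase or decrease
        denom = 1 + math.exp(a + b * x)
    elif model == "III":  # plateau: monotonic change levelling off below the maximum
        denom = (1 + math.exp(a + b * x)) * (1 + math.exp(c))
    elif model == "IV":   # symmetric unimodal: same slope b on both flanks
        denom = (1 + math.exp(a + b * x)) * (1 + math.exp(c - b * x))
    elif model == "V":    # skewed unimodal: different slopes b and d
        denom = (1 + math.exp(a + b * x)) * (1 + math.exp(c - d * x))
    else:
        raise ValueError(f"unknown HOF model: {model!r}")
    return max_abundance / denom


# Hypothetical parameter sets, one per model, for illustration only.
examples = {
    "I": dict(a=-1.0),
    "II": dict(a=-3.0, b=6.0),
    "III": dict(a=-3.0, b=6.0, c=-1.0),
    "IV": dict(a=-6.0, b=12.0, c=6.0),
    "V": dict(a=-6.0, b=12.0, c=6.0, d=4.0),
}

gradient = [i / 10 for i in range(11)]
curves = {
    model: [hof_response(x, model, **params) for x in gradient]
    for model, params in examples.items()
}

for model, values in curves.items():
    print(model, " ".join(f"{v:.2f}" for v in values))
```

In this formulation each model is a simplification of the next, which is why a taxon with noisy or sparse abundance data tends to be assigned a simpler shape such as Model I.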

The development of large biological datasets can help to explore a variety of questions and serve as a starting point for future research. In this study, we quantified environmental optima for the most common diatoms reported in Ontario lakes. Future studies could explore how environmental optima for individual species differ among geographic regions by comparing the results from this study to previously published work from Ontario. Similarly, this large dataset could be used to develop an Ontario-wide diatom-based transfer function to reconstruct past changes in TP concentrations and pH. The strength and performance of the new model could be assessed against existing models and measured data from monitoring programs to determine if a transfer function with more than 500 sites can reconstruct environmental variables more accurately. The availability of a large dataset also enables the use of innovative techniques that require large datasets to reconstruct limnological variables. For example, the moving window approach selects a subset of assemblages (40, 60 … 200) from the large calibration dataset (400+ sites) that are similar to the fossil sample to reconstruct environmental variables (Hübener et al. 2008, 2009). Selecting a subset of modern assemblages increases the likelihood of finding good modern analogues for the fossil assemblages, which can greatly improve the reconstruction of environmental variables.
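
The logic of the moving window approach can be sketched as follows. This is a simplified illustration, not the procedure of Hübener et al. (2008, 2009): the use of squared chord distance as the dissimilarity measure, the plain WA inference without deshrinking, the function names, and the six-lake toy dataset are all assumptions made for the example.

```python
import math


def squared_chord_distance(p, q):
    """Dissimilarity between two assemblages given as proportions."""
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))


def select_window(fossil, modern_assemblages, window_size):
    """Indices of the window_size modern samples most similar to the fossil sample."""
    ranked = sorted(
        range(len(modern_assemblages)),
        key=lambda i: squared_chord_distance(fossil, modern_assemblages[i]),
    )
    return ranked[:window_size]


def moving_window_wa(fossil, modern_assemblages, modern_env, window_size):
    """Infer an environmental value for one fossil sample using WA optima
    estimated only from the selected window of modern samples."""
    window = select_window(fossil, modern_assemblages, window_size)
    numerator = 0.0
    denominator = 0.0
    for k, fossil_abundance in enumerate(fossil):
        weights = [modern_assemblages[i][k] for i in window]
        total = sum(weights)
        if total == 0 or fossil_abundance == 0:
            continue  # taxon absent from the window or from the fossil sample
        optimum = sum(w * modern_env[i] for w, i in zip(weights, window)) / total
        numerator += fossil_abundance * optimum
        denominator += fossil_abundance
    if denominator == 0:
        raise ValueError("no fossil taxa are represented in the selected window")
    return numerator / denominator


# Hypothetical calibration set: 6 lakes x 3 taxa (proportions), with TP in µg/L.
modern = [
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.0],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.0, 0.2, 0.8],
]
modern_tp = [4.0, 6.0, 10.0, 20.0, 40.0, 60.0]
fossil = [0.75, 0.25, 0.0]

window = select_window(fossil, modern, window_size=3)
estimate = moving_window_wa(fossil, modern, modern_tp, window_size=3)
print("selected lakes:", sorted(window))
print(f"inferred TP: {estimate:.1f} µg/L")
```

Because the optima are recalculated from only the most similar modern samples, the inferred value is constrained to the environmental range of those analogues rather than being pulled towards the mean of the entire calibration set.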

Conclusions

By combining many surface sediment samples collected from Ontario lakes that were analyzed in the same lab using a similar methodology, we identified the overall patterns of diatom species response curves to three key limnological variables. Such analyses can help refine diatom autecology and advance the understanding of how common taxa respond to environmental variables that are important for lake management. By modelling ecological response curves for Ontario’s 52 commonly encountered taxa, we contribute to the application of diatoms as bioindicators for environmental assessment. The information from this dataset can also be useful for inferring past limnological conditions (particularly TP and pH) and/or biological recovery trajectories at sites that have been impacted by multiple environmental stressors. These data may be particularly valuable in understudied regions with similar limnological environments and diatom communities where calibration sets are not available. Moreover, lakes can undergo dramatic changes through time, thereby requiring large lake sets spanning large environmental gradients to increase the likelihood of finding modern analogues for downcore diatom assemblages. The taxon-specific optima for lakewater pH and TP concentrations can help to qualitatively assess past trends in these key limnological variables, which can serve as important management and biomonitoring tools.