Introduction

The study of modern species-environment relationships as a key to understand past environmental changes has become a central tenet in paleolimnology. The use of biological indicators to infer paleoenvironmental conditions has a long history, and evolved from qualitative studies based on a few indicator taxa to the calibration of thousands of taxa to a multitude of environmental variables (Smol 1992; Birks 1998, 2012; Juggins and Birks 2012). Such calibration data sets revolutionized paleolimnology by providing the means of deriving quantitative reconstructions of past environmental variables from the fossil assemblages preserved in lake sediments (Birks 1998).

Statistical calibration models (i.e. transfer functions) are based on surface-sediment samples collected from a large number of lakes covering the environmental gradient of interest for the reconstruction. In each lake, a single surface-sediment sample is usually collected from the center or deepest zone of the lake basin, and the death assemblages from the topmost sediments are analyzed (Juggins and Birks 2012). In doing so, a single death assemblage is assumed to provide a good estimate of the species composition of the lake (Adler and Hübener 2007). This is a rough approximation, however, as practical considerations and resource limitations usually do not allow intensive sampling of each lake across spatial and temporal scales. Moreover, this simplified methodology only works when the between-lake variability in diatom assemblage composition exceeds the internal variability within any of the lakes in the dataset (Link et al. 1994; Bunbury and Gajewski 2008). If the average composition of diatom assemblages differs significantly between two lakes of contrasting environmental characteristics, and no overlap among their multivariate dispersions is evident, then single samples can be faithfully used for environmental bioindication, provided the environmental variable of interest explains a significant portion of the diatom variance. If, on the other hand, the variability of diatom composition within these lakes is larger than the between-lake variance, then the value of single samples for representing environmental differences among lakes is highly dependent on the sampling design. In such cases, the role of sampling effort in capturing environmental differences becomes a key issue for diatom-based paleoenvironmental studies.

The distribution of death assemblages in lake sediments is a complex process, regulated not only by the ecology and distribution of the original living communities but also by taphonomic factors (Birks and Birks 1980). The point of deposition of sub-fossils and their possible resuspension and sedimentation can be influenced by lake morphometry, aquatic vegetation, waves, wind intensity and direction (Blais and Kalff 1995; Whitmore et al. 1996; Gilbert and Lamoureux 2004; Heggen et al. 2012). The source community and organism life-form are also critical issues, as floating remains (such as plankton) may tend to be easily reworked and redistributed through the lakes sub-environments, whereas benthic and periphytic taxa may be more easily sunk and incorporated into the sediments near their source areas (De Nicola 1986; Frey 1988; Heggen et al. 2012; Hassan 2015). Overall, all these factors influence the degree of variability exhibited by death assemblages, leading to unpredictable differences among lakes and proxies. This problem has been recognized both in biomonitoring and paleolimnology, and a series of studies addressing the impact of spatial variability of death assemblages on sample representativity have been published in the last decades (Anderson 1990; Link et al. 1994; Weilhoefer and Pan 2006; Eggermont et al. 2007; Bunbury and Gajewski 2008; Dennis et al. 2010; Heggen et al. 2012; Bennett et al. 2014).

Although the importance of diatom sample representativity has been addressed for deep, dimictic lakes (Charles et al. 1991; Adler and Hübener 2007) and rivers (Weilhoefer and Pan 2006), the extent of this problem in shallow lakes has received little attention (Adler and Hübener 2007). This is particularly true for the shallow, hypereutrophic and environmentally heterogeneous lakes characterizing the South American Pampas (Diovisalvi et al. 2015; García-Rodríguez et al. 2009). These lakes are characterized by an extensive development of the littoral zone relative to the pelagic zone, which provides a wide suite of microhabitats that promote the growth of diverse and productive periphytic and benthic communities (Wetzel 2001) and exerts differential taphonomic biases during diatom deposition (Hassan 2015). This environmental complexity gives rise to unpredictable variability in the distribution of diatom death assemblages in these lakes, leading to uncertainty about the sampling effort needed to faithfully represent within-lake composition. In this paper, the degree of variability in the composition of sedimentary diatom assemblages is addressed in three Pampean shallow lakes that differ in their environmental characteristics, particularly in conductivity and hardness (Cristini et al. 2017). Previous studies found significant relationships between diatom assemblage composition, conductivity, pH, and depth in these lakes, encouraging their use as indicators of past environmental fluctuations (Hassan 2015). Therefore, they offer an opportunity to assess the effect of within-lake variability on sampling representativity and its subsequent impact on between-lake comparisons. The results are expected to provide insights into the role of diatom variability in the formation of sedimentary assemblages in shallow lakes and to inform a discussion of the implications for diatom-based calibration and paleoenvironmental studies.

This study is part of a larger project focused on addressing the nature and magnitude of the taphonomic biases suffered by diatom remains in shallow lakes. In a previous work, the compositional and environmental fidelity exhibited by diatom death assemblages was evaluated through the multivariate analysis of live-dead data sets collected in three lakes from the Southern Pampas (Hassan 2015). Death assemblages captured environmental gradients better than live assemblages, with conductivity and depth being the environmental variables that explained the largest significant portions of diatom variance (12.26 and 10.66%, respectively; p < 0.001, Hassan 2015). In the present contribution, the extent to which single sedimentary samples represent the within-lake average diatom composition is evaluated in the same three shallow lakes.

Study site

The lakes are located in the southeastern Pampa plain (Buenos Aires province, Argentina), close to the Atlantic coast: (1) Las Mostazas (37°9′57″S; 57°14′50″W), (2) Los Carpinchos (37°3′34″S; 57°19′56″W) and (3) Nahuel Rucá (37°37′21″S; 57°25′42″W). They are situated along a north-south gradient covering a distance of approximately 60 km (Fig. 1). As with most Pampean lakes, the studied sites are very shallow (max. depth = 90–150 cm; Table 1), showing considerable inter-annual differences in area and depth between wet and dry periods (Bohn et al. 2011). Their shallowness favors the interaction between sediments and the water column through wind-driven turbulence. The dominant wind direction in the region is South-Southeast, making the fetch of Las Mostazas (1.4 km) longer than those of Nahuel Rucá (1.15 km) and Los Carpinchos (0.65 km; González-Sagrario et al. 2018). Like most lakes on the continental plain in the southern part of the Pampean region, they show typical pfanne or wanne profiles and tend to have rounded or slightly elongated shapes (Diovisalvi et al. 2015). Their littoral zones are densely populated by emergent and submersed macrophytes, forming a ring that usually surrounds the lakes entirely and clearly differentiates littoral from open-water areas (Fig. 1). Compared with other lakes worldwide, Pampean shallow lakes show limnological characteristics that depart from those of most lakes in temperate regions, with higher phosphorus, nitrogen and chlorophyll a concentrations and much lower transparency; they therefore stand as extremes of the trophic-state continuum (Diovisalvi et al. 2015). This is the case for the studied lakes, which exhibited very eutrophic, turbid and alkaline conditions during the sampling period (Table 1). According to a recent study (Cristini et al. 2017), the three lakes differ significantly in their ionic composition, with Las Mostazas characterized by higher conductivity and higher concentrations of chloride, carbonates, bicarbonates, calcium carbonate and calcium, as it is a closed basin set in saline soil. Nahuel Rucá, on the other hand, is the most stable lake, as its water level is regulated by a floodgate. It exhibits the lowest conductivity and pH among the three studied lakes (Table 1). The sediments from the three lakes are muddy and organic, with maximum organic matter contents ranging between 30 and 60% (Cristini et al. 2017).

Fig. 1

Location of the studied lakes: Nahuel Rucá (NR), Los Carpinchos (LC) and Las Mostazas (LM). Lines indicate the zones of the open waters (dotted line) and littoral areas (continuous line) covered in the samplings

Table 1 Summary of environmental information taken during diatom samplings by Hassan (2015)

Materials and methods

Sampling strategy

In each lake, sampling was carried out on a seasonal basis over one year. Given the very low water levels, Las Mostazas lake became very muddy during the summer, making it impossible to access the sampling area. This was mostly a consequence of the settlement of suspended sediments in its extensive and very shallow littoral area during the dry season. Hence, the Las Mostazas data set included data from only 3 field trips. The sampling was designed to allow comparison between the two main habitats represented in each lake: the highly vegetated littoral zone and the open-water area (Fig. 1). The differentiation between littoral and open-water samples was based on the presence or absence of emergent macrophytes at the sampling point. Littoral samples were collected in the surroundings of Schoenoplectus californicus (Meyer) Sojak mats, which were absent from the zone classified as open waters. Zones with well-established mats were selected for sampling, in order to avoid the regions of the lake that could be subjected to occasional desiccation. The percentage of each lake area covered by macrophytes was assessed from Google Earth® satellite images using KML polygons. The proportion of littoral area ranged between 50.2 and 55.3% (Table 1), in agreement with previous works on Nahuel Rucá and other Pampean lakes, in which macrophyte coverage was always around 50% (Federman 2003). Consequently, the sampling was designed to cover littoral and open-water areas in similar proportions (50/50). On each visit, six surface-sediment samples were randomly collected from each of the littoral and open-water areas using a piston corer. Hence, the final data set comprised a total of 132 surface-sediment samples (Los Carpinchos: 48; Nahuel Rucá: 48; Las Mostazas: 36). The top 1 cm of each core was sliced in the field and preserved with 96% alcohol.

In the laboratory, the fresh samples stored in alcohol were oxidized with 30% hydrogen peroxide and washed several times with distilled water. After homogenization, a subsample was transferred to a coverslip and air dried, and permanent slides were made using Naphrax®. On each slide, at least 500 diatom valves were counted across random transects using a Leica DM500 light microscope at X1000 magnification. Diatom taxa were identified according to Lange-Bertalot et al. (1996), Krammer and Lange-Bertalot (1997, 1999, 2004a, b), Metzeltin and Lange-Bertalot (1998, 2007), Rumrich et al. (2000), Metzeltin et al. (2005), Levkov (2009) and Sar et al. (2009).

Data analyses

Within-lake variability in diatom composition

In order to assess within-lake variability in diatom assemblage composition, surface-sediment samples from each lake were compared with Monte Carlo-simulated assemblages drawn from the same species pool (Heggen et al. 2012). All diatom valves counted in each lake were pooled and randomly resampled 1000 times in a Monte Carlo procedure, with a minimum sample size of 500 valves. These randomized data sets were then averaged in order to calculate a simulated mean assemblage composition for each lake. The simulations were run using the sample function in R version 3.3.2 (R Development Core Team 2016). The ecological distances between the original samples and the simulated mean assemblage were measured with the Bray Curtis dissimilarity (Bray and Curtis 1957). NMDS (Non-metric Multidimensional Scaling) based on Bray Curtis distances was used to summarize the multivariate data of each lake in a two-dimensional ordination showing the variability of littoral and open-water samples from each season around the simulated mean assemblage. Permutational multivariate analysis of variance (PERMANOVA) tests were applied to search for differences between littoral and open-water samples within each lake, and the diatom taxa responsible for the differences between both lake compartments were identified through Similarity Percentage (SIMPER) routines. This method computes the average dissimilarity between inter-group samples, and then assesses the separate contribution of each species to this dissimilarity (Clarke 1993).
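
The resampling step described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in the R code used for the study, and it rests on assumptions that the text does not settle: valves are drawn without replacement (the default behaviour of R's sample function), every draw contains exactly 500 valves, and the function names and the pooled counts in the usage example are hypothetical.

```python
import random
from collections import Counter


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two assemblages (dicts: taxon -> abundance)."""
    taxa = set(p) | set(q)
    num = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)
    den = sum(p.get(t, 0.0) + q.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


def simulated_mean_assemblage(pooled_counts, n_valves=500, n_draws=1000, seed=0):
    """Average relative abundances over repeated random draws from the pooled valves.

    Assumption: draws are made without replacement and always contain n_valves valves.
    """
    rng = random.Random(seed)
    # One list entry per counted valve, labelled with its taxon.
    pool = [taxon for taxon, count in pooled_counts.items() for _ in range(count)]
    totals = Counter()
    for _ in range(n_draws):
        totals.update(rng.sample(pool, n_valves))
    grand_total = n_valves * n_draws
    return {taxon: totals[taxon] / grand_total for taxon in pooled_counts}


if __name__ == "__main__":
    # Hypothetical pooled valve counts for one lake (not data from the study).
    pooled = {"Cyclotella meneghiniana": 900, "Nitzschia amphibia": 600,
              "Cocconeis placentula": 300}
    mean_assemblage = simulated_mean_assemblage(pooled)
    # Distance of one hypothetical sample (relative abundances) to the simulated mean.
    sample = {"Cyclotella meneghiniana": 0.7, "Nitzschia amphibia": 0.2,
              "Cocconeis placentula": 0.1}
    print(bray_curtis(sample, mean_assemblage))
```

Because the draws come from the pooled counts, the simulated mean converges on the pooled relative abundances; the value of the procedure lies in giving each lake a single reference composition against which individual samples can be measured.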

Spatial differences in within-lake variability were also assessed through box-plots comparing the Bray Curtis distances to simulated mean assemblage for littoral and open waters samples separately. Additionally, temporal differences in within-lake variability were assessed through box-plots comparing the Bray Curtis distances among seasons for the three lakes. Significant differences in median values among littoral and open waters and among the four seasons were tested using Kruskal–Wallis tests followed by Mann–Whitney post-hoc comparisons. Differences in the variance between littoral and open waters were tested using F tests (Zar 2010). All the indices, analyses and graphs were performed using the software Past v. 3.11 (Hammer et al. 2001).

Between-lake differences in diatom composition

Relative abundances, richness, Shannon-Wiener diversity, and Simpson (1-D) evenness were used to compare different aspects of diatom assemblages among the three studied lakes (Magurran 2004). Significant differences in the median values of these indices were tested using Kruskal–Wallis analysis of variance followed by Mann–Whitney post-hoc tests (Zar 2010). In order to analyze the interaction between within-lake and between-lake variability in the studied shallow lakes, the datasets from the three lakes were ordered in a two-dimensional space using principal coordinates analysis (PCoA) with Bray Curtis as the dissimilarity measure. Significant differences between the centroids of the multivariate areas covered by the three lakes were tested with PERMANOVA, and interpreted as indicative of differences in the mean composition of diatom assemblages among lakes. The diatom taxa responsible for the differences among the three lakes were identified through Similarity Percentage (SIMPER) routines. Kruskal–Wallis, PERMANOVA and SIMPER tests were performed using the software Past v. 3.11 (Hammer et al. 2001).
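
For reference, the three indices named above can be computed from a single valve count as in the following Python sketch. It is not the Past implementation used in the study; the natural logarithm for the Shannon index and the function name are assumptions made for illustration.

```python
import math


def diversity_indices(counts):
    """Return (richness, Shannon H', Simpson 1-D) for a list of valve counts per taxon.

    Assumption: H' uses the natural logarithm; taxa with zero counts are ignored.
    """
    present = [c for c in counts if c > 0]
    total = sum(present)
    proportions = [c / total for c in present]
    richness = len(present)
    shannon = -sum(p * math.log(p) for p in proportions)
    simpson = 1.0 - sum(p * p for p in proportions)
    return richness, shannon, simpson
```

A sample split evenly between two taxa gives a richness of 2, H′ = ln 2 and 1-D = 0.5, whereas a sample consisting of a single taxon gives zero for both H′ and 1-D.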

A test of homogeneity of multivariate dispersions (HMD, Anderson et al. 2006) was also applied in order to test for differences in variance among the three lakes. Dispersions were represented by the Bray Curtis distances of samples to their centroid in a Euclidean space, calculated with PCoA. If two lakes differed in the location of their multivariate centroids, the degree of overlap between the areas covered by these lakes was then considered indicative of the probability of collecting a sample poorly representative of that between-lake difference. Here, poorly representative samples were defined as those single diatom assemblages lying in a portion of the multivariate diagram in which two or more lake polygons overlap. HMD and PCoA were conducted in R version 3.3.2 (R Development Core Team 2016). Dissimilarity matrices were constructed based on the Bray Curtis index and tested under 999 permutations using the betadisper function in the package “vegan” version 2.0–7 (Oksanen et al. 2013).
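
The dispersion measure behind this test, the distance of each sample to its group centroid, can be obtained directly from a within-group distance matrix without running the ordination. The Python sketch below is an illustration and not the vegan code: it uses the identity that, in the principal coordinate space of a group of n samples, the squared distance of sample i to the centroid equals (1/n) Σ_j d_ij² − (1/(2n²)) Σ_j Σ_k d_jk². The function name is hypothetical, and taking the absolute value before the square root is a pragmatic choice for non-Euclidean dissimilarities such as Bray Curtis, for which the difference can be slightly negative.

```python
import math


def distances_to_centroid(d):
    """Distance of each sample to the group centroid, from a square distance matrix d.

    d is a list of lists with d[i][j] the dissimilarity between samples i and j
    of the same group (symmetric, zero diagonal).
    """
    n = len(d)
    total_sq = sum(d[j][k] ** 2 for j in range(n) for k in range(n))
    correction = total_sq / (2.0 * n * n)
    result = []
    for i in range(n):
        mean_sq = sum(d[i][j] ** 2 for j in range(n)) / n
        # abs() guards against small negative values (assumption, see lead-in).
        result.append(math.sqrt(abs(mean_sq - correction)))
    return result
```

For three points lying on a line at positions 0, 1 and 2, the function returns distances of 1, 0 and 1 to the centroid at position 1. The mean of these distances per lake is the kind of quantity reported as D in the HMD results; the permutation test itself is not reproduced here.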

Effect of replicated sampling on representativity

Because the average assemblage composition differed significantly among lakes according to the PERMANOVA results, a second modelling approach was applied to the original datasets from the three lakes in order to assess the minimum number of replicated samples needed to faithfully capture that difference. Monte Carlo resampling was applied to the original data sets by randomly selecting n samples 100 times, with n even and ranging between 2 and 20 samples. For each run, resampling was constrained to choose equal numbers of littoral and open-water samples using the ddply function in the package “plyr” version 1.8.4 (Wickham 2016).

The capability of the simulated groups of replicates to capture the average composition of diatom assemblages within each lake was then evaluated by comparing the Bray Curtis distances of their centroid to the simulated mean assemblage. Those replicated mean assemblages lying within a Bray Curtis distance threshold of 0.3 from the simulated mean assemblage were considered good representatives of the within-lake average composition (Kelly 2001). In order to evaluate the impact of increasing the number of replicates on between-lake comparisons, PCoA ordinations of the three lakes were repeated based on the previously developed models constructed with 2 and 4 replicated samples. The minimum number of replicates representative of between-lake differences was defined as the one that led to no multivariate overlap between the modeled mean abundances of the three lakes. The models, analyses and graphs were prepared using R version 3.3.2 (R Development Core Team 2016).
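
The stratified resampling and the 0.3 threshold can be sketched as follows. This is a Python illustration of the procedure as described, not the plyr-based R code of the study. It assumes that the centroid of a group of replicates is the arithmetic mean of their relative abundances, that samples are drawn without replacement, and that each sample is a list of relative abundances in a fixed taxon order; all function names are hypothetical.

```python
import random


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    num = sum(abs(a - b) for a, b in zip(p, q))
    den = sum(a + b for a, b in zip(p, q))
    return num / den if den else 0.0


def mean_profile(samples):
    """Taxon-wise arithmetic mean of a list of abundance vectors."""
    return [sum(column) / len(samples) for column in zip(*samples)]


def replicate_distances(littoral, open_water, n, target, n_runs=100, seed=0):
    """Bray-Curtis distances between the mean of n replicates and a target assemblage.

    Each run draws n/2 littoral and n/2 open-water samples at random.
    """
    if n % 2 != 0:
        raise ValueError("n must be even to split replicates equally between habitats")
    rng = random.Random(seed)
    half = n // 2
    distances = []
    for _ in range(n_runs):
        chosen = rng.sample(littoral, half) + rng.sample(open_water, half)
        distances.append(bray_curtis(mean_profile(chosen), target))
    return distances


def share_within_threshold(distances, threshold=0.3):
    """Fraction of runs whose replicate mean lies within the distance threshold."""
    return sum(1 for d in distances if d <= threshold) / len(distances)
```

Running replicate_distances for n = 2, 4, ..., 20 and plotting the resulting distances against n gives the kind of curve used to identify the smallest number of replicates that stays below the threshold.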

Results

Within-lake variability in diatom composition

A total of 146 diatom species were identified in the three lakes (ESM1). The distribution of the dominant taxa in littoral and open waters sediments is presented in ESM 2–4. Assemblages in Las Mostazas were dominated by the tychoplanktonic Cyclotella meneghiniana Kützing, accompanied by epiphytic (Cocconeis placentula Ehrenberg, Epithemia adnata (Kützing) Brébisson and E. sorex Kützing) and benthic taxa (Nitzschia amphibia Grunow, Hippodonta hungarica (Grunow) Lange-Bertalot, Metzeltin and Witkowski and Navicula peregrina (Ehrenberg) Kützing). Differences in assemblages composition between littoral and open waters were significant (F: 11.89; p < 0.0001), and mainly related to a higher percentage of N. amphibia (SIMPER contribution: 18.4%) in littoral sediments and a dominance of C. meneghiniana (19.37%) and N. peregrina (6.5%) in open waters (ESM2). This was also evidenced by the NMDS plot, in which a clear separation between littoral and open-water samples was observed (Fig. 2b). Bray Curtis similarity to within-lake simulated mean assemblage composition ranged between 0.54 and 0.87 (Fig. 2a). Littoral and open-water samples did not differ in the mean Bray Curtis similarities of samples to the simulated mean assemblage (H: 0.09, p = 0.9) nor in its variance (F = 1.74, p = 0.3; ESM5). There were no significant differences in the Bray Curtis similarities of samples among seasons in this lake (H: 0.57, p = 0.75; ESM5).

Fig. 2

Analyses of variability of diatom composition around the simulated average assemblage composition. a Boxplots showing Bray–Curtis distances to the simulated mean assemblage of samples from Las Mostazas (LM), Los Carpinchos (LC) and Nahuel Rucá (NR); b–d NMDS plots based on Bray–Curtis distances of the original datasets from littoral (circles) and open waters (squares) and the average composition of the modeled dataset (white cross) in Las Mostazas (b), Los Carpinchos (c) and Nahuel Rucá (d)

Diatom assemblages in Los Carpinchos Lake were also dominated by Cyclotella meneghiniana, accompanied by tychoplanktonic (Aulacoseira granulata (Ehrenberg) Simonsen and A. granulata var. angustissima (Müller) Simonsen), epiphytic (Cocconeis placentula, Epithemia adnata, E. sorex, Pseudostaurosira americana Morales and Planothidium delicatulum (Kützing) Round and Bukhtiyarova) and benthic taxa (Nitzschia amphibia, Hippodonta hungarica, Navicula veneta Kützing and N. peregrina). Diatom assemblage composition differed significantly between littoral and open-water compartments of the lake (F: 4.46; p < 0.001), as a consequence of the higher dominance of C. meneghiniana (SIMPER contribution: 14.3%), C. placentula (7.9%), N. amphibia (6.5%), A. granulata (5.5%), and N. peregrina (5%) in open waters (ESM3). Ordination of samples in the NMDS plot also showed a partial separation between littoral and open-water samples, although they showed similar dispersions around the simulated mean assemblage (Fig. 2c). Accordingly, littoral samples did not differ from open-water ones in their Bray Curtis similarities to the simulated mean assemblage (H = 2.3, p = 0.12) nor in their variance (F = 1.05, p = 0.92; ESM5). Also, no significant differences in the Bray Curtis similarities among seasons were found in this lake (H = 2.3, p = 0.5; ESM5).

In Nahuel Rucá Lake, assemblages composition differed significantly between littoral and open waters (F: 3.56; p < 0.001), as the tychoplanktonic taxa C. meneghiniana, A. granulata and A. granulata var. angustissima were more abundant in open waters than in littoral sediments (SIMPER cumulative contribution: 25.5%). The benthic taxa N. amphibia, N. veneta and H. hungarica, together with the epiphytics E. adnata, E. sorex and P. americana, exhibited their higher relative abundances in littoral sediments (SIMPER cumulative contribution: 24.8%). The compositional dissimilarity between littoral and open-water samples was not evident in the NMDS plot, where samples from both lake subenvironments occupied mostly overlapping multivariate spaces (Fig. 2d). No significant differences between the Bray Curtis distances (H = 1.74; p = 0.18) or the variances (F = 2.07; p = 0.31) of both sets of samples to the simulated mean assemblage were recorded (ESM5). However, Bray Curtis similarities to the simulated mean assemblage were significantly higher in autumn in this lake (H = 10.21; p < 0.05; ESM5).

Between-lake differences in diatom composition

Analyses of diversity indices indicated that Las Mostazas was the least diverse of the three studied lakes, as it exhibited significantly lower richness (R = 35 ± 7), Shannon diversity (H′ = 2.31 ± 0.25) and evenness values (E = 0.83 ± 0.04) than Los Carpinchos and Nahuel Rucá (p < 0.0001; ESM6). On the other hand, no significant differences between these two lakes were observed, as assemblages from Los Carpinchos and Nahuel Rucá exhibited similar values of richness (Los Carpinchos = 54 ± 10, Nahuel Rucá = 57 ± 10), Shannon diversity (Los Carpinchos = 2.95 ± 0.43, Nahuel Rucá = 3.07 ± 0.42) and evenness (Los Carpinchos = 0.89 ± 0.08, Nahuel Rucá = 0.90 ± 0.08; ESM6).

According to the HMD tests, variance in Las Mostazas assemblages (D = 0.2339) was significantly lower than in Nahuel Rucá (D = 0.3348; p < 0.0001) and Los Carpinchos (D = 0.3121; p < 0.0001), while these two lakes did not show significant differences among them. PERMANOVA tests indicated that mean assemblages composition differed significantly among the three lakes (F = 24; p < 0.0001). According to SIMPER results, these differences were a consequence of both (1) differences in the relative abundances of the dominant diatom taxa (e.g. C. meneghiniana, C. placentula and N. amphibia; ESM7), and (2) the presence of some taxa exclusively represented in one lake (e.g. Staurosira longirostris in Nahuel Rucá, ESM7).

Examination of the PCoA biplot, however, revealed a strong degree of overlap between the multivariate spaces occupied by these lakes (Fig. 4a). The percentage of samples ordered in the multivariate space exclusive to each dataset varied among lakes: 61% of Las Mostazas samples (n = 22; 11 littoral, 11 open waters), 29% of Los Carpinchos samples (n = 14; 7 littoral, 7 open waters), and 50% of Nahuel Rucá samples (n = 24; 5 littoral, 19 open waters) lay outside the overlapping area of the multivariate space and can be considered fully representative of the assemblages from each lake (Fig. 4a). A nearly even distribution of representative samples across seasons was found in Las Mostazas (Spring: 9, Autumn: 6, Winter: 7) and Nahuel Rucá (Spring: 6, Summer: 6, Autumn: 6, Winter: 6) lakes, while in Los Carpinchos these were mostly collected in winter (Spring: 2, Summer: 1, Autumn: 4, Winter: 7). The degree of overlap between the two lakes representing the two extreme conductivity conditions (the brackish Las Mostazas and the freshwater Nahuel Rucá, Table 1) was almost negligible, with only 1 open-water sample from Las Mostazas (autumn) ordered inside the multivariate space occupied by Nahuel Rucá samples (Fig. 4a).

Effect of replicated sampling on representativity

Monte Carlo resampling revealed a strong influence of the number of replicated samples on the representativity of diatom assemblages in the three lakes (Fig. 3). The distance between the average composition of the simulated groups of replicates and the simulated mean assemblage decreased as the number of replicates increased. Collecting two replicates (one from littoral and one from open waters) led to Bray Curtis distances to the centroid lower than 0.3 in Las Mostazas, while four replicated samples were needed to reach this threshold in Los Carpinchos and Nahuel Rucá (Fig. 3). Overall, collecting two samples from each subenvironment led to replicated groups of samples whose average composition closely resembled the average composition of the whole dataset from each lake, suggesting that this sampling intensity would be enough to capture a representative sample of the within-lake mean assemblage composition.

Fig. 3

Plots used to assess the effect of replicated sampling on representativity. Through Monte Carlo simulations, resamplings of 1–20 samples were generated, with half of the samples chosen from the littoral and half from open waters. The Bray–Curtis distances of the replicated groups of samples to the modeled average composition of diatom assemblages were then calculated. A critical distance to the centroid of 0.3 was used to separate poor (dotted line) and good representative samples

The degree of overlap among the three lakes decreased markedly when the models based on 2 replicated random samples were used in the PCoA ordination (Fig. 4b). The multivariate space occupied by Las Mostazas lake showed no overlap with Nahuel Rucá, while only one of the samples from Los Carpinchos lay in the overlapping area. The polygons of Los Carpinchos and Nahuel Rucá lakes still showed a high overlap, although this was strongly reduced when compared to the original dataset (Fig. 4a). Hence, although lower, there is still a chance of collecting poorly representative groups of samples when only one littoral and one open-water sample are collected in these lakes. No overlap among the multivariate spaces occupied by the three lakes was observed when 4 simulated replicates were used in the PCoA (Fig. 4c). Hence, the results indicate that the probability of obtaining faithful representatives of the average assemblage composition when collecting 4 samples (2 littoral and 2 open waters) is high, as the average composition of the sampling groups is located in a narrow area around the centroid of the diatom assemblage distribution of each lake.

Fig. 4

PCoA plots used to assess the effect of representativity on between-lake comparisons. Ordinations of Las Mostazas (triangles), Los Carpinchos (circles) and Nahuel Rucá (squares) lakes were performed based on the original datasets (a), modeled pools of 2 samples (b) and modeled pools of 4 samples (c). Littoral (filled symbols) and open waters (empty symbols) are indicated for the original dataset plot (a)

Discussion

Within-lake variability on diatom assemblages composition

Diatom assemblages preserved in surface sediments from Pampean shallow lakes were characterized by a high variability in species composition. This was evidenced by large dispersions of samples around the centroid in NMDS plots, related to large Bray Curtis distances between some samples and the modelled mean abundance composition. Examination of SIMPER results pointed to the patchy distribution of some taxa as the main cause of this high variability, as evidenced by the differences in proportional abundances of dominant species. Heterogeneity of diatom communities on scales of centimeters or greater has been associated with changes in light and current regimes, grazing, successional stages and variation in substratum type and composition (Stevenson et al. 1985; Ledger and Hildrew 1998; Sommer 2000; Soininen 2003). Small-scale biotic interactions, such as competition, grazing, colonization processes, variations in recruitment, or low movement ability also affect the composition of assemblages in both marine and freshwater habitats (Černá 2010). Physical and chemical parameters responsible for small-scale patchiness of microhabitats, substrate complexity and heterogeneity, and water currents also play an important role (Passy 2001; Soininen 2003; Moreno-Ostos et al. 2008; Černá 2010). In Pampean shallow lakes, these small-scale processes would be responsible for the patchy distribution of the living diatom communities associated with the observed within-lake variability. Moreover, additional variability can be added to the original community during the formation of sedimentary assemblages by taphonomic processes (Hassan 2015). Given their shallowness and high nutrient content, these lakes are prone to frequent resuspension cycles and intensive grazing by benthic invertebrates, which can cause breakage of diatom frustules and diffusion of dissolved silica, making them more susceptible to differential fragmentation and dissolution (Bennion et al. 2010; Hassan 2015; Hassan and De Francesco 2018). The deposition of resuspended particles follows complex dynamics in these lakes, as their flat profiles and very shallow depths favor sediment trapping in peripheral embayments and sheltered areas, such as macrophyte patches, rather than focusing in the deepest point (Whitmore et al. 1996). Hence, the observed variability in diatom assemblage composition can be explained by a combination of the biological and ecological processes that structure the original communities and the physical and chemical taphonomic processes that act during the deposition of their dead remains.

Variability was high within both littoral and open-water assemblages from the three lakes, signaling that spatial heterogeneity of diatom assemblages composition was more related to micro-scale environmental diversity than to bathymetric position. Several studies pointed out the presence of highly variable surface sediment diatom assemblages unrelated to bathymetry in shallow waters (De Nicola 1986; Earle et al. 1988; Wolfe 1996; Weilhoefer and Pan 2006). In these studies, lateral environmental gradients, such as patchiness in substrate availability, were pointed out as the main forcings influencing the specific composition of communities in shallow, complex environments, as they exert strong influence not only by providing substrates for colonization, but also by changing the light environment and by mediating the available nutrients (Weilhoefer and Pan 2006). Besides their similar variability around mean composition, diatom assemblages from littoral and open water environments differed significantly in their taxonomic composition in the three lakes. This compositional difference was not surprising, however, as these shallow lakes are known to support distinctive littoral and open-water habitats that promote the growth of diverse and productive diatom assemblages adapted to the intrinsic characteristics of each lake subenvironment. In open waters, dominance of planktonic and tychoplanktonic taxa, such as C. meneghiniana and A. granulata, is promoted by intermittent mixing events that frequently resuspend the sediments, preventing their loss by sinking (Bennion et al. 2010; Hassan and De Francesco 2018). On the other hand, the development of extensive littoral zones covered by macrophytes provides a variety of habitats for epipelic and epiphytic taxa, leading to the growth of characteristic communities (Wetzel 2001, Rojas and Hassan 2017). 
These compositional differences have implications for paleoenvironmental studies, particularly for assessing past variations in macrophyte coverage and littoralization during the Holocene. Hence, compositional differences between littoral and open-water diatom assemblages highlight the relevance of representing the main within-lake subenvironments when assessing the composition of diatom assemblages in these environmentally heterogeneous shallow lakes.

Within versus between-lake variability

Multivariate comparisons of diatom assemblages among the three lakes revealed significant differences both in their variance and in their average composition. Previous studies indicated conductivity and pH as the main environmental drivers of compositional differences in diatom death assemblages between these three lakes (Hassan 2015). These differences were mainly related to differences in the relative abundances of dominant taxa, rather than to species replacement among lakes. Hence, the brackish Las Mostazas lake was characterized by higher proportions of C. meneghiniana and C. placentula, which are known to display wider environmental tolerances and opportunistic strategies in the region (Rojas and Hassan 2017; Hassan and De Francesco 2018). This lake also exhibited the lowest diversity and richness, which can be related to its higher values of conductivity, pH and hardness. Because these lakes are naturally exposed to episodes of drought and flooding, their environmental differences can become crucial in periods of drought, in which strong evaporation leads to a substantial increase in ionic concentration (Cristini et al. 2017; Tietze and De Francesco 2017), as became evident during summer, when it was not possible to reach Las Mostazas lake as a consequence of low water levels. Under these circumstances, conductivity in Las Mostazas reaches values above 6 mS cm−1 (Cristini et al. 2017), surpassing the salinity tolerance of many freshwater diatoms and posing physiological limitations to the development of their populations (Snoeijs and Weckström 2010). Hence, assemblages in this lake became mostly dominated by taxa adapted to higher conductivities and capable of coping with environmental changes under periods of stress, resulting in a reduced variability in composition around the average when compared to the more stable Los Carpinchos and Nahuel Rucá lakes.

Diatom assemblages from Pampean shallow lakes clearly exemplified the effect that within-lake variability exerts on between-lake comparisons. Even though the multivariate clouds of the three lakes differed significantly in their mean position, there is a high chance of collecting samples that are poorly representative of that difference, as evidenced by the high overlap among their multivariate spaces in the PCoA. The extent and characteristics of this overlap can be related to environmental similarities between these lakes, as the overlap increased with the environmental similarity of the lakes. Accordingly, assemblages from Los Carpinchos, which showed intermediate conductivities, overlapped strongly with both Las Mostazas and Nahuel Rucá, while these two extreme lakes shared only a small part of their multivariate space. Hence, it becomes evident that the impact of within-lake variability in obscuring between-lake differences is related to environmental characteristics, suggesting that the problems associated with sampling representativity tend to diminish along longer environmental gradients. Consequently, the role of sampling effort in faithfully capturing assemblage composition becomes more important when between-lake comparisons are intended to capture subtle environmental differences among sites.

The results also indicated that the comparison of diatom assemblages can be significantly affected when within-lake variability differs among sites. Accordingly, lakes exhibiting higher internal variability (Nahuel Rucá and Los Carpinchos) also showed a higher proportion of samples lying in the overlapping areas of the plot. If overlooked, this variability might influence the outcome of multivariate analyses, and the consequent detection and quantification of differences between communities (Podani et al. 1993; Cao et al. 2002b; Schmera and Eros 2006). As differences among communities can be underestimated at low representativities (Cao et al. 2007), the impact of within-lake variability on between-lake comparisons can have noteworthy consequences for paleolimnological studies, not only for the construction of calibration sets but also for qualitative and semi-quantitative interpretations. Ordination techniques are widely used in paleolimnology, both to assess similarities in assemblage composition between adjacent points (modern and fossil samples) and to detect the underlying latent structure in the data (ter Braak and Prentice 1988; Legendre and Birks 2012). At low representativities, such ordination patterns could become more related to sampling biases than to real spatial or environmental gradients, with a consequent impact on the quality of the inferences derived from the datasets. Hence, obtaining representative sample sets for assessing within-lake diatom assemblage composition should be an essential first step in constructing reliable and robust datasets for paleolimnological reconstructions from biological data in these heterogeneous environments.
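
As an illustration of the comparison discussed above, the following minimal Python sketch contrasts the mean pairwise Bray-Curtis dissimilarity within lakes with that between lakes. The function names and the abundance values are hypothetical and are not taken from the dataset or analysis code of this study; the sketch only shows how within-lake dissimilarity can exceed between-lake dissimilarity, in which case single samples may represent lake differences poorly.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den


def within_between(samples, lakes):
    """Mean pairwise dissimilarity within lakes and between lakes."""
    within, between = [], []
    for i, j in combinations(range(len(samples)), 2):
        d = bray_curtis(samples[i], samples[j])
        (within if lakes[i] == lakes[j] else between).append(d)
    return mean(within), mean(between)


# Hypothetical relative abundances (%) of three taxa in four samples.
samples = [
    [60, 30, 10],  # lake A
    [20, 50, 30],  # lake A
    [50, 30, 20],  # lake B
    [10, 40, 50],  # lake B
]
lakes = ["A", "A", "B", "B"]

w, b = within_between(samples, lakes)
print(f"within-lake: {w:.3f}, between-lake: {b:.3f}")
```

In this invented example the within-lake mean is larger than the between-lake mean, which is the situation in which a single sample per lake is most likely to fall in the region of overlap.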

Effect of replicate sampling on diatom representativity

Because diatom assemblages were found to be heterogeneously structured and exhibited high within-lake variability in Pampean shallow lakes, the accuracy of estimates of ecological parameters (such as diversity, richness and relative abundances) will greatly depend on sample representativity, and consequently, on sampling effort. The importance of sample size for ecological estimations has been widely recognized, as have the difficulties that this sampling-effort or spatial-scale effect poses for community estimates and interpretations (Cao et al. 2002a). The observed small-scale patchiness in the distribution of diatom assemblages highlights the relevance of sampling design for obtaining representative sedimentary samples. The use of single samples as representatives of the average community overlooks within-site variability, which can constitute a significant part of the total variation (Link et al. 1994; Barker et al. 2010; Dennis et al. 2010). Hence, sampling should be focused on elucidating and representing within-lake average assemblage composition, which can only be achieved through replication (Link et al. 1994). The importance of replication as a way to cope with all sources of variability that lead to observation errors has been widely recognized in ecological and monitoring studies (Link et al. 1994; Cao et al. 2002a, b; Weilhoefer and Pan 2006; Barker et al. 2010; Dennis et al. 2010), but scarcely considered in paleolimnological and paleoenvironmental research (Earle et al. 1988; Charles et al. 1991; Bennington and Rutherford 1999; Heiri et al. 2003).

In Pampean shallow lakes, the strong spatial variability exhibited by sedimentary diatom assemblages constitutes a potential source of error that must be dealt with in order to obtain a full representation of within-lake average composition. Given the high prevalence of poorly representative assemblages, single-sample based studies should be avoided. Here, multi-sample collection and replication arise as a promising approach to reduce uncertainties in sampling designs. Previous studies have pointed out the advantages of collecting multiple samples from different parts of the lakes in order to improve paleoenvironmental reconstructions, particularly for quantitative inferences (Dixit and Evans 1986; Jones and Flower 1986; Earle et al. 1988; Charles et al. 1991). In the present study, simulation allowed us to evaluate the effect of replication on improving assemblage representativity, leading to the conclusion that collecting four samples, equally distributed between the two main lake subenvironments (i.e. littoral and open-water areas), would noticeably reduce the variability in diatom assemblage composition around the average. In doing so, the average within-lake assemblage composition can be faithfully captured, even in highly variable lakes. These findings are in agreement with the previous results obtained by Weilhoefer and Pan (2006), who simulated the compositing of diatom samples and its impact on assemblage richness in morphologically complex wetlands, demonstrating that at least five composite samples were needed to characterize the diatom assemblage adequately.

The impact of replicated sampling on between-lake comparisons was also evident in the dataset, as results of the PCoA demonstrated that collecting four samples was enough not only to capture within-lake average assemblage composition, but also to accurately reflect the significant compositional differences among lakes detected by the PERMANOVA. It is usually accepted that increasing sampling effort asymptotically increases the representativeness of the samples and also the separation of samples originating from different communities in multivariate spaces (Cao et al. 2002b; Schmera and Eros 2006). This is the case for the studied lakes, in which replication proved useful not only to avoid the problems related to differences in within-lake variability, but also to capture between-lake differences along short environmental gradients. The impact of environmental gradient length on these comparisons was evidenced by the results of the Monte Carlo resamplings: whereas only 2 replicates were necessary to completely avoid overlap between the average assemblage compositions of Nahuel Rucá and Las Mostazas, 4 replicates were required to capture the compositional differences of Los Carpinchos. Moreover, stratifying the sampling to cover the main lake subenvironments also made it possible to capture within-lake differences between littoral and open-water assemblages, highlighting the importance of replication for capturing the main compositional characteristics of sedimentary diatom assemblages related to both local and regional environmental gradients.
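
The replicate resampling idea described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Monte Carlo procedure actually used in this study: the function names, the number of iterations, the fixed seed and the abundance values are all hypothetical. The sketch draws an equal number of littoral and open-water samples without replacement, averages them, and measures the Bray-Curtis distance between that average and the lake-wide average composition.

```python
import random
from statistics import mean


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den


def mean_composition(samples):
    """Average abundance of each taxon across samples."""
    return [mean(col) for col in zip(*samples)]


def replicate_error(littoral, open_water, n_per_zone, n_iter=1000, seed=0):
    """Mean distance between stratified replicate averages and the lake average."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    target = mean_composition(littoral + open_water)
    dists = []
    for _ in range(n_iter):
        subset = (rng.sample(littoral, n_per_zone)
                  + rng.sample(open_water, n_per_zone))
        dists.append(bray_curtis(mean_composition(subset), target))
    return mean(dists)


# Hypothetical relative abundances (%) of three taxa per sample.
littoral = [[10, 60, 30], [20, 50, 30], [5, 70, 25], [30, 40, 30]]
open_water = [[70, 20, 10], [60, 25, 15], [80, 10, 10], [50, 30, 20]]

errors = {n: replicate_error(littoral, open_water, n) for n in (1, 2, 3, 4)}
for n, err in errors.items():
    print(f"{2 * n} samples ({n} per subenvironment): mean distance {err:.3f}")
```

With these invented data, the mean distance to the lake average shrinks as more replicates are pooled and reaches zero when every sample is included, which mirrors the asymptotic behaviour described in the text. The number of replicates at which the distance becomes acceptably small depends on the data and on the threshold chosen.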

Temporal changes did not seem to have an impact on diatom assemblage representativity, as no clear relationship between variability and season was evident in the dataset. Except for littoral samples from Los Carpinchos, which were mostly collected in winter, representative samples were almost equally distributed among seasons in these lakes. Bray-Curtis distances to average composition were also similar among seasons, autumn samples from Nahuel Rucá being the only exception. This is not surprising, however, as death assemblages constitute time-averaged representations of living communities and integrate dead-valve inputs over long periods of time, capturing demographic and environmental stochasticity and leading to time-averaged species richness (Kidwell 2002; Tomasových and Kidwell 2011). In a previous example, Bunbury and Gajewski (2008) demonstrated no significant effect of temporal variability at the scale of years on the results of diatom-based transfer functions, associated with high time-averaging in shallow lakes. Although sedimentation rates for Pampean lakes are unknown, age models for Nahuel Rucá suggested a span of 2.5–3 years for the first centimeters of sediment (Stutz et al. 2014). Moreover, the low live-dead fidelity exhibited by diatom composition in surface sediments suggested that time-averaging in death assemblages exceeds the timeframe of 1 year (Hassan 2015). Hence, results suggest that representative samples in these lakes can be regarded as independent of sampling season, since replication over spatial rather than temporal scales was required to capture average composition. Therefore, collecting a minimum of two littoral and two open-water samples in a single season would be enough to faithfully represent the average composition of diatom assemblages in these lakes, as demonstrated by the Monte Carlo modeled replications.
Certainly, more samples will always provide more information and reduce uncertainties, although the time and resources invested in obtaining such information will also depend on the balance between the costs of collecting and analyzing a higher number of samples and the improvements obtained (Heiri et al. 2003; Bennett et al. 2014). Moreover, even as the actual number of samples required will vary according to lake size and habitat complexity (Weilhoefer and Pan 2006), the minimum threshold of 4 samples identified in the present study seems to work well for lakes differing both in their environmental and compositional variability, encouraging the application of the results to other lakes and environments. Nevertheless, further studies covering different habitats and regions are needed in order to improve our knowledge of within and between-lake diatom assemblages variability and to evaluate the intensity of the sampling efforts required under contrasting environmental situations.

Conclusions

Diatom assemblages from the surface sediments of Pampean shallow lakes exhibited strong within-lake variability in composition. This spatial heterogeneity implied large variations in the representativity of single assemblages when compared to average composition, as a number of samples departed significantly from the lake's mean assemblage composition and plotted in the areas of the PCoA plot shared by two lakes. This finding impacts directly on the use of surface-sediment assemblages as modern analogues in paleoenvironmental reconstructions, as low representativities could mask true ecological patterns within datasets, leading to errors in estimates of ecological parameters and biasing the outcomes of multivariate ordinations. This problem is particularly important for diatoms, as they exhibit high variability and diversity when compared to other bioindicators, such as plant remains, cladocera or chironomids (Heggen et al. 2014). Clearly, choosing single samples as a proxy for diatom assemblage composition in these lakes would lead to unpredictable errors in between-lake comparisons of assemblages and cannot be recommended.

Under this scenario, replication arises as an alternative to cope with representativity biases. Collection of a number of replicated samples from within-lake subenvironments significantly improved the representativity of the obtained diatom assemblages, as it allowed us to capture good estimates of their mean composition in the modeled data. As the source of variability was mainly the patchy distribution of certain diatom taxa, collecting replicate samples from different points of the lakes integrated mean diatom composition better than single assemblages. In the present study, a minimum of four samples was required in order to reduce uncertainties to negligible levels in shallow lakes, half of them collected from each of the two dominant lake subenvironments (littoral and open waters). However, as diatom-assemblage variability strongly depends on the intrinsic characteristics of each lake, the required number of samples for other lakes would be variable and should be addressed individually. In the present work, the lower variability exhibited by Las Mostazas diatom assemblages led to a lower threshold in the number of replicated samples required. Although further work is needed in order to evaluate whether these findings are applicable to other lakes or regions, it becomes evident that within-lake variability needs to be carefully considered if sedimentary assemblages are being used as modern analogues in paleoenvironmental research.

The obtained results have strong implications for the quantitative reconstruction of past environmental changes from fossil diatoms in these lakes. If compositional variability is assumed to have been similar in past and modern environments, then fossil assemblages from a single core should be regarded as one of an equally wide range of possible past compositions. Moreover, compositional changes between some successive sedimentary levels could in some cases be a consequence of past within-lake variability rather than a response to paleoenvironmental changes. Under these circumstances, including multiple samples in calibration sets would improve the chances of good analogy between modern and fossil assemblages, and using MAT (Modern Analog Technique, Overpeck et al. 1985) as the transfer function method could lead to reliable quantitative estimations of past conductivity fluctuations. Alternatively, the widely used WA (Weighted Averaging, ter Braak and Looman 1986) and WA-PLS (Weighted Averaging-Partial Least Squares, ter Braak and Juggins 1993) methods can be applied to down-weight species according to their indicative power, which makes it possible to suppress species with wide and overlapping environmental optima (Juggins and Birks 2012). Nevertheless, replication would still be needed in order to faithfully capture the optima and tolerances of single species in these environmentally heterogeneous lakes, as calculation over single samples could lead to biased inferences of species distributions over environmental gradients. For instance, the presence of S. longirostris in half of the samples of Nahuel Rucá indicates that when a single sample is collected there is a 50% chance of overlooking the presence of this indicator taxon in the lake, while collecting 4 replicated samples allowed us to capture this taxon in all the simulated resamplings.
Hence, it can be concluded that including multiple samples from each lake when constructing calibration sets would be the best option to obtain reliable paleoenvironmental reconstructions from single sediment cores in these environmentally heterogeneous shallow lakes.
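
The effect of replication on the chance of detecting a patchily distributed taxon can be illustrated with a short calculation. The Python sketch below is a simplified model under stated assumptions and does not reproduce the resampling design of this study: the function names, the pool sizes and the distribution of the taxon between subenvironments are hypothetical. It compares independent draws with a given occurrence frequency against a stratified draw without replacement from a finite pool of samples.

```python
from math import comb


def p_detect_independent(p, n):
    """Chance of finding a taxon in at least one of n independent samples,
    when each sample contains it with probability p."""
    return 1 - (1 - p) ** n


def p_miss(total, with_taxon, n):
    """Chance that n samples drawn without replacement all lack the taxon."""
    return comb(total - with_taxon, n) / comb(total, n)


def p_detect_stratified(lit_total, lit_with, ow_total, ow_with, n_per_zone):
    """Detection chance when n_per_zone samples are drawn from each of the
    littoral and open-water pools (all counts are hypothetical inputs)."""
    miss = (p_miss(lit_total, lit_with, n_per_zone)
            * p_miss(ow_total, ow_with, n_per_zone))
    return 1 - miss


# A taxon present in half of the samples, one sample collected.
single = p_detect_independent(0.5, 1)          # 0.5
# Same occurrence frequency, four independent samples.
four = p_detect_independent(0.5, 4)            # 0.9375
# Hypothetical pool of 4 littoral and 4 open-water samples in which the
# taxon occurs in every littoral sample and in no open-water sample.
stratified = p_detect_stratified(4, 4, 4, 0, 2)  # 1.0

print(single, four, stratified)
```

Under independent draws, four samples would still miss the taxon occasionally, whereas a stratified design always detects a taxon that is confined to one subenvironment, because that subenvironment is always sampled. This is one way in which a taxon present in half of the samples can be captured in every stratified resampling, although whether it applies to a given lake depends on how the taxon is actually distributed.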