Introduction

Ecological status, an expression of the quality of the structure and functioning of aquatic ecosystems, has been adopted by the EU Water Framework Directive (WFD) to assess the quality of surface waters (European Commission 2000). The ecological status of surface waters is defined on the basis of the degree of deviation from type-specific pre-impact conditions. Palaeolimnological techniques have been recognised as one of the most powerful methods in the reconstruction of reference conditions for surface waters (European Commission 2000; Irvine et al. 2001).

Palaeolimnological methods can be used directly to reconstruct past biological assemblages and indirectly to reconstruct past nutrient levels quantitatively through transfer functions (Battarbee et al. 2005). The remains of diatoms (Bacillariophyceae) preserved in lake sediments have been widely used as biological indicators of water quality (Stoermer and Smol 1999; Battarbee et al. 2001). Diatom transfer functions can provide quantitative estimates of historical water quality from sedimentary sequences of diatoms, and thus a means of determining the level of current and past anthropogenic impacts (e.g., Cumming et al. 1992; Bennion et al. 2004). The development of transfer functions to model relationships between the composition of diatom assemblages and water chemistry in a training set of lakes has received renewed interest of late, because of their potentially useful role in supporting lake management, conservation and restoration (e.g., Battarbee et al. 2005; Denys 2006). Transfer functions have been developed for a range of factors that influence conditions within a lake, notably pH (Birks et al. 1990; Dixit et al. 1993) and total phosphorus (TP) (Hall and Smol 1992; Bradshaw and Anderson 2001), but also climate (Lotter et al. 1997; Schmidt et al. 2006) and salinity (Fritz 1990; Sylvestre et al. 2001). In the Irish Ecoregion (Ecoregion number 17) eutrophication and acidification are among the main causes of reduced water quality (McGarrigle et al. 2002; Jennings et al. 2003; Toner et al. 2005), while diatom remains increasingly feature in studies of relatively recent changes in lake conditions, as a result of heightened awareness of the value of palaeolimnological approaches and improved investment in environmental research (e.g., Leira et al. 2006; Taylor et al. 2006). Both trends look set to continue into the future, and there is thus a pressing need to develop pH and TP transfer functions that are robust and directly applicable to the Irish Ecoregion.

To date lakes on the island of Ireland have been included in either local (Anderson et al. 1993) or European-wide (Cameron et al. 1999) training sets that have then been used to derive pH and TP transfer functions. The diatom-phosphorus model developed by Anderson et al. (1993) was based on 43 lakes from a limited geographical area (mainly the counties Down and Armagh in the north of the island) with the majority of lakes being eutrophic and hypertrophic, and thus at the upper end of the TP gradient. Only one Irish lake (Lough Maam) was included in a European-wide diatom-pH training set (Cameron et al. 1999). Preliminary diatom training sets linking diatom remains in surface sediments to lake water quality (specifically pH and TP) in the Irish Ecoregion were developed and applied in Leira et al. (2006) and Taylor et al. (2006). However, the former comprised 35 lakes, all of which had been previously identified by the Environmental Protection Agency in Ireland as candidate reference lakes (CRLs) because of their perceived location in catchments that had been little impacted by humans, and most of which had low nutrient levels (<20 μg l−1 TP), while the latter comprised a limited number of impacted mesotrophic and eutrophic lakes in addition to the 35 CRLs. In neither case was the construction of transfer functions described in detail.

This paper describes the development of transfer functions for inferring pH and TP based on surface sediment diatoms and related environmental data from an expanded training set comprising 72 lakes in the Irish Ecoregion, and provides the first comprehensive investigation of surface sediment diatoms across the Irish Ecoregion. The transfer functions for pH and TP discussed are intended not only to assist in the interpretation of diatom-based palaeolimnological studies, but also to enhance the role of palaeolimnological approaches in the implementation phase of the WFD.

Study sites and field data collection

A total of 72 lakes was selected to provide both pH (5.1–8.5) and TP (4.0–142.3 μg l−1) gradients for a range of lake types in the Irish Ecoregion (Table 1). The selected lakes are located in 12 counties across the island, but mainly along its western seaboard (Fig. 1). A lake typology scheme for the Ecoregion comprising 12 typology classes was proposed by the EPA, based on alkalinity, mean depth and lake area (Toner et al. 2005). The 72 lakes selected in the current study encompass the range of lake types that are well-represented in the Irish Ecoregion, with each category accommodating three to 11 lakes.

Table 1 List of the 72 study lakes in the diatom training set with selected environmental variables from the Irish Ecoregion
Fig. 1 Location of 72 study lakes included in the diatom training set for the Irish Ecoregion (not all sites are labelled)

Bathymetric surveys were carried out using a handheld echo-sounder and a portable GPS. A Renberg gravity corer (Renberg 1991; HTH Teknik, Vårvågen 37, SE-95149 Luleå) was used to extract sediments from 58 lakes in the summers of 2003 and 2004. Sediment cores were sub-sampled in the field using a vertical extruder immediately after coring, with the top 0.5 cm used for diatom analysis. An additional 14 lakes were sampled using an Ekman grab, with sediment samples (the top 2–3 cm) collected from the profundal zone (Chen 2006). Sub-samples were kept cool and out of direct sunlight before analysis in the laboratory.

Physico-chemical data for the training set lakes are summarised in Table 2. Most lakes in the diatom training set are small, with lake areas <200 ha; catchment areas <20,000 ha; and catchment to lake area ratios <100:1. The mean depths of lakes in the training set vary between 0.7 and 19.8 m, while maximum depths range from 1.1 to 45.7 m. All related chemical and physical data for the training set were assembled from Irvine et al. (2001), Toner et al. (2005) and Wemaëre (2005). Chemical data were mainly collected during the summer season for most of the lakes with a sampling frequency varying from one to nine times per year. Land cover data were extracted from the Irish National CORINE 2000 database.

Table 2 Summary statistics for the 17 environmental variables of 72 Irish lakes

Laboratory methods

Diatom analysis

Surface sediment samples were prepared for diatom analysis using standard methods (Battarbee et al. 2001). The procedure involved: oxidation of organic matter by adding 5 ml of 30% hydrogen peroxide (H2O2) to 0.1 g of wet sediment in a water bath at 80°C for around 4 h; adding 5–10 drops of 10% hydrochloric acid (HCl) to eliminate the remaining H2O2 and carbonates; repeated washing with distilled water and centrifuging at 1200 rpm for 4 min; drying 5 ml of the slurry on a coverslip; and mounting with Naphrax® (a resin of high refractive index) on a hotplate at 100–150°C for around 10 min.

At least 300–500 valves were identified and counted by G. Chen and M. Leira along transects at 1000× magnification under oil immersion using a phase-contrast microscope. Diatom nomenclature and taxonomy mainly follow Krammer and Lange-Bertalot (1986, 1988, 1991a, b, 2000), together with other supplementary references (Foged 1977; Stevenson et al. 1991; Lange-Bertalot and Metzeltin 1996; Prygiel and Coste 2000; Håkansson 2002). Frequent meetings and discussions between the two analysts enabled a high level of agreement in diatom identification.

Numerical methods

Multivariate analyses were employed to examine the variation in diatom assemblages and their distribution along measured environmental gradients. Missing environmental data were replaced with the mean value of the corresponding variable prior to data transformation and multivariate analysis (Legendre and Legendre 1998). Normalizing transformations (log10(x + 1) and square root) of the environmental data reduced the influence of measurement units and extreme values (Lepš and Šmilauer 2003). Diatom data were square root transformed to stabilize variance, and rare species were down-weighted.
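The pre-processing steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the workflow used in the study: the function names and the example TP values are hypothetical, and the choice of which variable receives which transformation is assumed here rather than taken from the text.

```python
import math

def fill_missing_with_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def log10_plus_one(values):
    """log10(x + 1) transformation, defined for x >= 0."""
    return [math.log10(v + 1.0) for v in values]

def square_root(values):
    """Square root transformation, defined for x >= 0."""
    return [math.sqrt(v) for v in values]

# Hypothetical TP measurements (ug/l) for four lakes, one of them missing.
tp = [4.0, None, 142.3, 20.0]
tp_filled = fill_missing_with_mean(tp)   # mean substitution comes first
tp_log = log10_plus_one(tp_filled)       # then the normalizing transformation
```

Mean substitution is applied before transformation, as in the text, so that the substituted value is transformed in the same way as the measured ones.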

A standardised principal components analysis (PCA) was used to explore the variation within the measured environmental data as they are often measured in different and non-comparable units. A detrended correspondence analysis (DCA) of diatom assemblages (3.72 SD) indicated that the diatom taxa were generally behaving in a unimodal pattern along the measured environmental gradients (ter Braak and Prentice 1988). Correspondence analysis (CA) and canonical correspondence analysis (CCA) were therefore employed to explore the distribution of diatom taxa in more detail. Analysis of similarities (ANOSIM) based on Bray–Curtis Similarity (999 permutations) was used to test the difference between groups of lakes, classified according to alkalinity, lake size, water depth as well as CRL status (Clarke and Warwick 2001). Dissimilarity values (R) range from 0 (minimal separation) to 1 (maximal separation). Co-linearity between environmental variables was identified using variance inflation factors (VIF) and variables with VIF >20 were deleted (ter Braak and Šmilauer 2002). Forward selection was used to select those environmental variables that appeared to exert greatest influence on the diatom data (Legendre and Legendre 1998). Monte Carlo permutation tests were used to test the significance of each variable and only those variables with p < 0.05 under 999 permutations were accepted (Lepš and Šmilauer 2003). All the ordination analyses, as well as forward selection, were performed using the computer package Vegan in version 2.2.1 of the R program (R Development Core Team 2006). ANOSIM was performed using PRIMER (version 5.2.9).
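For readers unfamiliar with ANOSIM, the following Python sketch shows how the R statistic and its permutation p-value can be derived from Bray–Curtis dissimilarities. It is an illustrative re-implementation under stated assumptions, not the PRIMER routine used in this study: the function names and the toy assemblages are hypothetical, and the code assumes at least two groups and no sample with zero total abundance in both members of a pair.

```python
import random
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def anosim_r(samples, groups):
    """R = (mean between-group rank - mean within-group rank) / (n(n-1)/4)."""
    pairs = list(combinations(range(len(samples)), 2))
    ranks = average_ranks([bray_curtis(samples[i], samples[j]) for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    n = len(samples)
    return (sum(between) / len(between) - sum(within) / len(within)) / (n * (n - 1) / 4.0)

def anosim_p(samples, groups, permutations=999, seed=0):
    """Permutation test: share of label shuffles with R >= the observed R."""
    rng = random.Random(seed)
    observed = anosim_r(samples, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(samples, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Toy data: two lakes dominated by taxon 1 and two dominated by taxon 2.
samples = [[10, 0], [9, 1], [0, 10], [1, 9]]
groups = ["CRL", "CRL", "impacted", "impacted"]
r_obs = anosim_r(samples, groups)
```

In this toy case every between-group dissimilarity exceeds every within-group dissimilarity, so R takes its maximum value of 1.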

Calibration analysis expresses the values of environmental variables as a function of species (ter Braak 1987) and the regression function so constructed is the transfer function used to predict environmental variables from diatom data. The length of the first axis of detrended CCA (DCCA) constrained by the environmental variable of interest generally indicates the degree of species turnover, with a length of >2 SD suggesting unimodal-based methods are appropriate (Birks 1998). The dataset was analysed using both weighted averaging (WA) and weighted averaging partial least squares (WA-PLS) (ter Braak and Juggins 1993) techniques, as diatom assemblages were found to vary unimodally in response to the main environmental variables. Several WA models were run, including inverse or classical deshrinking and with or without species tolerance down-weighted (Birks 1995). WA-PLS is not discussed further here as it was outperformed by WA when applied to the Irish Ecoregion training set. Calibration analyses were implemented in the programme C2 (version 1.4.2) (Juggins 2003).
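The logic of WA regression and calibration with inverse or classical deshrinking can be illustrated with a short Python sketch. This is a simplified illustration, not the C2 implementation: the function names and toy data are hypothetical, tolerance down-weighting is omitted, and the code assumes that every taxon occurs in at least one training lake and that every sample has a non-zero total abundance.

```python
def wa_optima(abundances, env):
    """WA regression: each taxon optimum is the abundance-weighted mean of env."""
    optima = []
    for k in range(len(abundances[0])):
        total = sum(row[k] for row in abundances)
        optima.append(sum(row[k] * x for row, x in zip(abundances, env)) / total)
    return optima

def wa_initial(row, optima):
    """WA calibration: abundance-weighted mean of the optima of the taxa present."""
    return sum(y * u for y, u in zip(row, optima)) / sum(row)

def linear_fit(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_wa(abundances, env, deshrink="inverse"):
    optima = wa_optima(abundances, env)
    initial = [wa_initial(row, optima) for row in abundances]
    if deshrink == "inverse":
        b0, b1 = linear_fit(initial, env)    # observed regressed on initial estimates
    elif deshrink == "classical":
        b0, b1 = linear_fit(env, initial)    # initial estimates regressed on observed
    else:
        raise ValueError("deshrink must be 'inverse' or 'classical'")
    return {"optima": optima, "b0": b0, "b1": b1, "deshrink": deshrink}

def predict_wa(model, row):
    x0 = wa_initial(row, model["optima"])
    if model["deshrink"] == "inverse":
        return model["b0"] + model["b1"] * x0
    return (x0 - model["b0"]) / model["b1"]

# Toy training set: three lakes, two taxa (relative abundances) and observed pH.
abundances = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
ph = [5.0, 6.5, 8.0]
model_inv = fit_wa(abundances, ph, deshrink="inverse")
model_cla = fit_wa(abundances, ph, deshrink="classical")
```

Averaging is applied twice (once for the optima and once for the inferred values), which shrinks the range of the initial estimates; the deshrinking regression restores that range. The two options differ only in which variable is treated as the response.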

Data manipulation can critically influence model performance (Birks 1995). Both untransformed and log10(x + 1)-transformed TP data, and both untransformed and square root transformed diatom data, were explored, while the leave-one-out jack-knifing approach was used in the cross-validation of models (Birks 1995). The performance of each combination of data transformation and model cross-validation was assessed before an optimal format was selected for model development. Root mean square error (RMSE), coefficient of determination (r²) and the bias between estimated and observed values were calculated as measures of the performance of the inference model, as is routine (Smol 2002). A model with a low prediction error in cross-validation (RMSEP), high predictability and low bias is preferred in model selection. In addition, the removal of outlier sites from the training set can improve model performance (Hall and Smol 1992; Gasse et al. 1995). The removal of two outlier sites from the 72-lake diatom training set improved the performance of the WA models significantly, as has been found elsewhere (e.g., Tibby 2004). Lough Veagh was identified as an outlier site mainly due to a measured TP value below detection limits (<4 μg l−1), which generated a large difference between observed and predicted values of TP. Lough Caragh was the second outlier identified: the surface sediment diatom assemblage in Lough Caragh was characterised by Aulacoseira subarctica (O. Müller) Haworth (49.5%), a meso-eutrophic species, while the measured TP value was only 5.5 μg l−1.
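The leave-one-out procedure and the performance measures named above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and toy data are hypothetical, the stand-in model is a simple WA without deshrinking, r² is taken as the squared Pearson correlation between observed and predicted values, and bias is taken as the mean of predicted minus observed values.

```python
import math

def performance(observed, predicted):
    """RMSE, squared Pearson correlation and mean bias (predicted - observed)."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return {"rmse": rmse, "r2": sxy * sxy / (sxx * syy), "mean_bias": sum(residuals) / n}

def leave_one_out(fit, predict, samples, observed):
    """Refit the model n times, each time predicting the single lake left out."""
    predictions = []
    for i in range(len(samples)):
        model = fit(samples[:i] + samples[i + 1:], observed[:i] + observed[i + 1:])
        predictions.append(predict(model, samples[i]))
    return predictions

def fit_simple_wa(samples, observed):
    """Stand-in model: WA optima only (None for taxa absent from the subset)."""
    optima = []
    for k in range(len(samples[0])):
        total = sum(row[k] for row in samples)
        weighted = sum(row[k] * x for row, x in zip(samples, observed))
        optima.append(weighted / total if total else None)
    return optima

def predict_simple_wa(optima, row):
    used = [(y, u) for y, u in zip(row, optima) if u is not None and y > 0]
    return sum(y * u for y, u in used) / sum(y for y, _ in used)

# Toy training set: four lakes, two taxa, observed pH.
lakes = [[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.1, 0.9]]
ph = [5.5, 6.5, 7.0, 8.0]
jack = leave_one_out(fit_simple_wa, predict_simple_wa, lakes, ph)
jack_stats = performance(ph, jack)   # jack_stats["rmse"] is the RMSEP
```

Because each prediction comes from a model that never saw the lake in question, the RMSE computed from these predictions is the RMSEP, which is typically larger than the apparent RMSE of a model fitted to all lakes.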

Results

Features of environmental data

PCA of the 17 environmental variables (not illustrated here) revealed two main gradients along axes one and two. According to the broken stick model (Jackson 1993), axes one and two were the only PCA axes to account for substantial levels of variation: axis one accounted for 33.8% of the total variance in environmental data, while axis two accounted for 19.5%. Axis one is highly correlated with alkalinity, conductivity, pH, TP, and land cover variables (peat and pasture). Physical variables (catchment area, lake area and maximum depth) and colour appear to be important components of axis two. Strong co-linearity occurs between some environmental variables, including chlorophyll-a and TP (r = 0.79, p < 0.001), and also among pH, alkalinity and conductivity. There is a significant but relatively weak correlation between pH and TP (r = 0.37, p < 0.01), with all lakes with pH <6.0 having TP values below 20 μg l−1 and therefore falling in the oligotrophic to mesotrophic range. Eutrophic lakes, by comparison, are not acidic.
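The broken stick criterion can be checked with a short calculation. In the sketch below (function names are hypothetical), the expected share of variance for axis k out of p axes is (1/p) Σ 1/i for i = k to p; an axis is retained while its observed share exceeds this expectation. Only the shares for axes one and two are reported in the text, so only those two are compared here.

```python
def broken_stick(p):
    """Expected variance share of each of p axes under the broken stick model."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]

def axes_retained(observed_shares, p):
    """Count leading axes whose observed share exceeds the expected share."""
    count = 0
    for obs, exp in zip(observed_shares, broken_stick(p)):
        if obs <= exp:
            break
        count += 1
    return count

expected = broken_stick(17)          # 17 environmental variables
observed = [0.338, 0.195]            # shares reported for axes one and two
n_axes = axes_retained(observed, 17)
```

With 17 variables the expected shares for the first two axes are roughly 20% and 14%, so the reported values of 33.8% and 19.5% both exceed their expectations.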

Surface sediment diatoms

In total 602 diatom taxa were identified in the surface sediments of 72 lakes, 233 of which had abundances ≥1% in at least three sites (Appendix 1). To reduce the influence of rare species, only these 233 diatom taxa were used for multivariate analysis and model construction. Achnanthidium minutissimum (taxon code = 10; taxon codes and authorities are listed in Appendix 1) is the most common species, occurring in 70 of the 72 lakes, and has the highest Hill’s number (N2 = 37.9) (Hill 1973). Asterionella formosa (14), Cocconeis placentula var placentula (59), Puncticulata radiosa (200), Staurosira construens var venter (215), Stauroforma exiguiformis (219) and Tabellaria flocculosa (231) are also common, occurring in more than half of the 72 lakes and receiving high Hill’s numbers (>10).
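Hill's N2 for a taxon can be read as its effective number of occurrences: N2 = 1 / Σ p², where p is the share of the taxon's total abundance found at each site. The sketch below illustrates this measure together with the screening rule described above (abundance ≥1% in at least three sites); the function names and example values are hypothetical.

```python
def hills_n2(abundances):
    """Effective number of occurrences of one taxon across sites (Hill's N2)."""
    total = sum(abundances)
    if total == 0:
        return 0.0
    return 1.0 / sum((y / total) ** 2 for y in abundances)

def is_common(percentages, min_pct=1.0, min_sites=3):
    """True if the taxon reaches min_pct (%) in at least min_sites sites."""
    return sum(1 for y in percentages if y >= min_pct) >= min_sites

# A taxon spread evenly over four sites has N2 = 4; one confined to a
# single site has N2 = 1, however abundant it is there.
even = hills_n2([5.0, 5.0, 5.0, 5.0])
single = hills_n2([10.0, 0.0, 0.0])
```

A taxon can therefore occur in many lakes and still have a low N2 if most of its abundance is concentrated in a few of them.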

The first two CA axes of the 233 diatom species with abundances ≥1% at three or more sites explained 21.7% of total variance in the diatom data (Fig. 2). Most of the 72 sites displayed a distribution strongly associated with the first two CA axes. Lakes Dan (DAN), Dunglow (DUN), Nahasleam (NAH) and Upper (UPE), which are clustered at the far right side of axis 1, have high abundances of non-planktonic Eunotia incisa var incisa (86), Frustulia saxonica (119), Stauroforma exiguiformis (219) and Tabellaria flocculosa (231). A second cluster of lakes, including Annaghmore (ANN), O’Flynn (OFL), Rea (REA) and Talt (TAL), is located at the upper end of CA axis 2 and is mainly associated with high abundances of non-planktonic taxa, such as Amphora pediculus (20), Pseudostaurosira brevistriata (203), P. pinnata (227) and Staurosira construens var venter (215). A third cluster, including Atedaun (ATE), Ballyteige (BAT), Morgan (MOR) and Rosconnell (ROS), located at the lower end of axis 2 is associated with planktonic taxa indicative of more productive waters (e.g., Asterionella formosa (14), Aulacoseira subarctica (30) and Stephanodiscus parvus (226)).

Fig. 2 CA plots showing the 72 sites (CRLs = triangles; non-CRLs = open circles) and 233 common taxa (labelled by numbers). Not all sites are labelled; refer to Table 1 and Appendix 1 for site and species codes respectively

CRLs and impacted lakes are clearly separated on the CA plot (Fig. 2). Diatom assemblages of CRLs are mainly composed of non-planktonic taxa, while eutrophic and planktonic diatoms typically characterise impacted lakes. A strong dissimilarity between CRLs and impacted lakes (R value = 0.50, p < 0.01) is evident in the ANOSIM results. Levels of similarity between lakes grouped according to contrasting alkalinity, lake depth and lake area are illustrated in Table 3. The comparison between lakes with low and high alkalinities showed a strong dissimilarity (R = 0.78, p < 0.01), while medium alkalinity lakes showed a closer similarity to those with high alkalinity than to low alkalinity sites. In contrast, groups with contrasting lake depth or lake area showed either no statistically significant difference or a significant difference accompanied by very high similarity.

Table 3 ANOSIM results for different groups of lakes

Diatom–environment relationships

VIF of the 17 environmental variables reported in CCA output show that catchment area, lake area and catchment area: lake area ratio are strongly dependent on other variables. Catchment area and forestry are not significant at the p-level of 0.05, and were removed before further data analysis. Alkalinity, conductivity and pH independently explained the highest variances (8.2, 7.7 and 7.4% respectively) in the diatom data, but their VIF values suggest that moderate co-linearity exists for these three variables. Each of the two nutrient variables, TP and chlorophyll-a, explains about 6% of total variance.

Automatic forward selection identified pH and chlorophyll-a from the remaining environmental variables; together these explained 13.4% of the total variance. However, automatic selection can be misleading when several correlated variables explain similar levels of variation, because small changes in the data can change the selection results. For example, chlorophyll-a on its own accounted for 6.0% of total variance, similar to TP (5.9%). The strong correlation between chlorophyll-a and TP is to be expected, given that TP is known to exert a strong influence on algal biomass and overall trophic status (OECD 1982). Forward selection with TP manually selected along with pH and maximum depth explained 15.8% of the total variance in the diatom data. TP and pH shared only 0.7% of the total variance, indicating that these two variables exert strong influence on diatom assemblages independently. There were also strong species–environmental correlations (0.85–0.95) for the three constrained CCA axes. The diatom taxa display a triangular formation in the CA and CCA plots, indicating that the internal structure of the diatom assemblages was captured by the measured environmental variables. Variance in the diatom assemblages is captured by the first two ordination axes of both CA and CCA, which were strongly correlated with pH and TP respectively.

Generalist species are located near the origin of the CCA plot, including Achnanthidium minutissimum (species code 10), Eolimna minima (92), Mayamaea atomus (144) and Navicula radiosa (175) (Fig. 3). Acidophilic taxa (e.g., Eunotia exigua (80), Eunotia incisa var incisa (86), Fragilaria virescens (124) and Frustulia saxonica (119)) locate mainly to the right of CCA axis one. Some taxa (e.g., Staurosira elliptica (218), Stephanodiscus alpinus (213) and Ulnaria ulna (233)) have their highest abundances in alkaline waters. Planktonic Aulacoseira granulata var angustissima (26), Cyclostephanos invisitatus (50) and Stephanodiscus hantzschii f. tenuis (221) are positioned towards the high end of the TP gradient, together with epiphytic Fragilaria ulna Sippen angustissima (123) and Fragilaria ulna var acus (122). These taxa are also associated with shallow lakes. In contrast a number of taxa (e.g., Brachysira vitrea (35), Cyclotella schumanni (63), Eunotia rhynchocephala var rhynchocephala (96) and Peronia fibula (192)) are more abundant in deep lakes.

Fig. 3 CCA plot of 233 diatom taxa constrained by TP, pH and maximum depth (represented by arrows). See Appendix 1 for interpretations of diatom species codes

Diatom-inferred pH and TP transfer functions

In the development of transfer functions, one means of assessing the viability of selected environmental variables is the ratio of the first two eigenvalues of partial CCA (i.e. the ratio of λ1 to λ2) (e.g., Hall and Smol 1992). Variables with high λ1/λ2 ratios (>1) generally produce viable and strong calibration models. In the current work, pH, conductivity and alkalinity generated the highest ratios, although only pH produced a λ1/λ2 ratio >1, while TP produced a λ1/λ2 ratio of 0.673. The latter, although relatively low, compares favourably with λ1/λ2 ratios for TP from other studies, e.g., 0.50 (Gregory-Eaves et al. 1999), 0.45 (Wunsam and Schmidt 1995), 0.42 (Reavie et al. 1995) and 0.4 (Hall and Smol 1992).

The use of log-transformed TP data in WA modelling led to enhanced inference model performance when compared with untransformed data. In addition, the removal of data from two outlier sites (loughs Caragh and Veagh) substantially improved performance in terms of jack-knifed r² and RMSEP. Performances of pH and log-transformed TP models based on raw and square root transformed diatom data of 70 lakes are reported in Table 4. The WA model with classical deshrinking and based on transformed species data was the optimum for pH (jack-knifed r² = 0.89, RMSEP = 0.32), while the optimum TP model was based on untransformed diatom data with tolerance down-weighted and inverse deshrinking (jack-knifed r² = 0.74, RMSEP = 0.21 (log10 μg l−1)). Optimum models for pH and TP show good correspondence between the observed and diatom-predicted values (Fig. 4).

Table 4 Summary of WA models for 70 Irish lakes
Fig. 4 WA calibration models for pH (with species data square-root transformed and classical deshrinking) and TP (log10(x + 1) transformed, with species data untransformed, tolerance down-weighted and inverse deshrinking)

Discussion

Diatom distribution and species responses

Diatom assemblages from impacted lakes were generally more homogeneous when compared with those of reference lakes, presumably because of a loss of habitats and biodiversity as a result of human activity. Among the three physico-chemical variables proposed for classifying lakes in the Irish Ecoregion (Toner et al. 2005), alkalinity is shown to have a significant influence on diatom assemblages, while no significant effects are attributable to lake depth and area for the 72 lakes included in this study. A strong influence of alkalinity and only a weak influence of depth on diatom assemblages were also reported by Bennion et al. (2004) for lakes in the UK, including Northern Ireland.

The length of the TP gradient (4–142 μg l−1) in the training set used in the current study is comparable with those of training sets in the published literature (e.g., Gregory-Eaves et al. 1999; Kauppila et al. 2002; Miettinen 2003; Ramstack et al. 2003), while the distribution of individual taxa along the TP gradient generally corresponds well with previously published work (Wunsam and Schmidt 1995; Lotter et al. 1998; Bradshaw and Anderson 2001). However the pH–conductivity gradient in the current study explained the largest single share of total variance in the diatom assemblages, and this has also been found in other diatom training sets constructed to infer TP (e.g., Reavie and Smol 2001). The ratios of λ 1/λ 2 for pH were higher than for TP in the current study, as has been reported in other studies (e.g., Dixit and Smol 1994; Dalton 1999; Enache and Prairie 2002). Generally there is a strong relationship between diatoms and pH due to the direct physiological influence of pH on diatoms. A weaker diatom–TP relationship may be a result of factors such as physical mixing and silica availability (Reynolds 1984).

TP optima for the 233 common taxa are all below 100 μg l−1, and only 19 taxa have TP optima above 40 μg l−1 in this study (see Appendix 1). Furthermore, 112 (almost 50%) of the common taxa have TP optima of less than 10 μg l−1. The relatively small number of taxa with high TP optima reflects the predominance of lakes in the current training set that are located towards the lower end of the TP gradient. By comparison, diatoms used in the development of a TP transfer function based on data from lakes in Northern Ireland (Anderson 1997) generally show higher TP optima than those in the current study, owing to the larger number of eutrophic and hypertrophic lakes and a broader TP gradient (15–800 μg l−1). Thus, when compared with the current study, optima were slightly higher for Aulacoseira subarctica (30) and Puncticulata radiosa (200), taxa with TP optima at the low end of the TP gradient, and much higher for Stephanodiscus hantzschii (220) and S. parvus (226), taxa with high TP optima.

Results from a survey of modern epilithic algae in samples from 32 lakes in Ireland across a TP gradient of 3.6–90.5 μg l−1 (DeNicola et al. 2004) provide a second basis for comparison with diatom TP optima determined in the current study. Accordingly, 8 of the 10 taxa common to both studies are mainly benthic or littoral dwellers in lakes, including Gomphonema parvulum var parvulum (135), Staurosirella construens var venter (215) and S. pinnata (227). Their TP optima according to results from the two studies are generally in close agreement, e.g., 25.6 and 25.3 μg l−1 for Fragilaria gracilis (110), 23.1 and 23.6 μg l−1 for S. pinnata and 13.3 and 19.4 μg l−1 for Tabellaria flocculosa (231). However, Aulacoseira ambigua (2) and Asterionella formosa (14) display relatively large differences, with TP optima of 22.7 and 36.0 μg l−1 respectively in the current study in comparison with 50.3 and 21.9 μg l−1 respectively in DeNicola et al. (2004). Both Aulacoseira ambigua and Asterionella formosa are commonly found in open waters and the large difference in calculated TP optima could be due to sampling differences between the two studies. Surface sediments are expected to integrate a far greater variety of environmental conditions, and therefore potentially far more complex assemblages of diatoms, than is likely to be accommodated within the range of rock surfaces sampled by DeNicola et al. (2004).

The taxa Eunotia incisa var incisa (86), E. exigua (80), Peronia fibula (192) and Pinnularia subcapitata var subcapitata (204) are typically acidophilous in the Irish Ecoregion, while Amphora pediculus (20), Diploneis elliptica (69), Karayevia clevei (140) and Stephanodiscus neoastreae (225) are most abundant where pH >8.0. The pH optima in published European training sets (Birks et al. 1990; Cameron et al. 1999; Bradshaw and Anderson 2001) are generally lower than the WA-inferred pH values calculated in the current study, presumably because the former are based upon a pH gradient that accommodates highly acidified sites.

Diatom inference models

The optimum diatom pH and TP models for the Irish Ecoregion that underpin this paper are comparable with most other diatom pH and TP transfer functions in regard to training set size, environmental gradients and model performances. The pH model developed has a similar predictive capability and yielded similar errors to two European-wide pH models (Birks et al. 1990; Cameron et al. 1999). These two models were based on a larger number of training set lakes (167 and 118) and accommodated a similar range of pH values (4.3–7.3 and 4.5–8.0) when compared with the current study. The optimum TP model described here had a higher predictive capability than the diatom-inferred TP model developed for Northern Ireland (Anderson and Rippey 1994). The differences in performance could be partly due to the larger number of lakes in the current study, despite a narrower range of TP values. TP transfer functions based on a smaller number of lakes often display weaker predictability and/or higher prediction errors (Reavie et al. 1995; Bradshaw and Anderson 2001; Tibby 2004). Moreover, the TP model developed in the current research also performs well when compared with TP transfer functions derived from a similar TP gradient (e.g., Miettinen 2003). The use of forward selection and variance partitioning confirms that pH and TP are tracking unique water quality characteristics in the current training set, suggesting that both models can be applied simultaneously in down-core analysis to reconstruct lake water pH and TP independently.

Log-transformation of TP data improved the performance of the models, possibly because a more normalised distribution of log-transformed TP would strengthen the unimodal response of diatom species. Untransformed biological data generally outperformed the square root transformed data in the TP model but showed a reverse influence in the pH model. It therefore appears that untransformed ecological data provide valuable information on the strength of relationship between biological assemblages and environmental variables. This phenomenon was also observed by Koster et al. (2004).

Although the diatom-inference models developed in the current research are robust, with high predictability and low prediction errors and bias, there are still many possible sources of error, as highlighted by Anderson (1997). These include taxonomic harmonization, spatial variability of biological assemblages in surface sediments and the influence of unmeasured variables. Additional sources of error are sampling differences (both sediment core top and Ekman grab samples were used in the current study) and the low frequency of water quality measurements. Many of the lakes included in the training set were only sampled for hydrochemistry on relatively few occasions. Increasingly, studies suggest that diatom-TP training sets may be much less effective when applied to shallow lakes (Bennion et al. 1995; Sayer 2001). In shallow lakes, macrophytes can colonise large areas, while light, substrate conditions, grazing and the prevalence of non-planktonic diatoms can further complicate the relationship between diatoms and trophic status (Bennion et al. 2005). As no significant difference in diatom assemblages was found between shallow and deep lakes, additional sampling is needed to construct diatom-nutrient models specific to shallow lakes in the Irish Ecoregion.

Conclusions

Surface sediment diatoms from 72 lakes in the Irish Ecoregion displayed a strong response to measured environmental gradients. Acidity and nutrient gradients were the most important in controlling diatom distribution. Selected candidate reference lakes (CRLs) in the Irish Ecoregion showed strong dissimilarity in diatom assemblages when compared with impacted lakes. Of the criteria used to classify lakes in the Irish Ecoregion, alkalinity was found to have a significant influence on diatom assemblages. Diatom-inferred WA models with strong predictability and relatively low prediction errors were developed for pH and TP. Manipulation of both TP and diatom data significantly influenced the performances of the models. The pH and TP transfer functions developed in the current research can be readily and most appropriately applied to the reconstruction of lake water quality, the determination of reference conditions, the setting of restoration targets and in ecological assessment in the Irish Ecoregion.