Introduction

The Water Framework Directive requires European Union (EU) member states to intercalibrate national methods to ensure consistent application across the EU. The first attempt at such an intercalibration for benthic diatoms focussed on countries in temperate regions of western and central Europe and the Baltic states (Kelly et al., 2009b). The ability to compare national approaches was aided by the widespread adoption of European standards for sampling and analysing diatoms in rivers (CEN, 2003, 2004) and by the use of broadly similar metrics for assessing ecological status, mostly based on the weighted average equation of Zelinka & Marvan (1961). However, Kelly et al. (2009b) also describe a number of challenges that could not be overcome in this first phase of comparison and which may have contributed uncertainty to the exercise. One of these was the lack of consistent approaches to diatom identification. The nature of the intercalibration exercise was such that data were accepted at face value with no attempt at taxonomic harmonisation. However, diatom taxonomy is in a state of flux, with the result that a single entity may be treated in a number of different ways, depending on the analyst and national conventions. In some cases, such issues are straightforward cases of synonymy [e.g. Synedra ulna (Nitzsch) Ehrenberg, Fragilaria ulna (Nitzsch) Lange-Bertalot and Ulnaria ulna (Nitzsch) Compère], but there were also several cases where new species have been described and the national adoption of these varies, with some analysts using only the Süßwasserflora von Mitteleuropa (Krammer & Lange-Bertalot, 1986–1991) as their standard Floras, whilst others supplement these with more recent monographs. 
There are also differences in the extent to which subspecific taxa are reported: Cocconeis placentula Ehrenberg varieties, for example, were regarded by some countries as being difficult to separate and offering little extra ecological information over species-level identification, whilst others regarded the autecological signals of the members of the C. placentula complex to be sufficiently distinctive to merit their separation (e.g. Monnier et al., 2007b).

The intercalibration was based on a hybrid metric, itself composed of two metrics (Indice de Polluosensibilité Spécifique—Specific Pollution sensitivity Index, IPS: Coste in CEMAGREF, 1982 and Trophienindex—Trophic Index, TI, Rott et al., 1999). Values for samples, calculated using national metrics, were converted to the corresponding value for the ‘Intercalibration Common Metric’ (ICM) and a regression between national metric and ICM was then used to convert high/good and good/moderate status boundaries from the national metric to the ICM (Kelly et al., 2009b). The boundary values for different countries can then be compared directly. As both of the component metrics of the ICM require fine-scale diatom taxonomy, it is possible that differences in taxonomic conventions will influence the relationship between the national metric and the ICM and, as a result, affect the final comparison of boundaries between ecological status classes.

However, intercalibration is, by necessity, based upon data collected by member states to their own data standards. That these differ between countries reflects the dynamic state of diatom taxonomy; not all genera described in recent years are universally accepted [see, for example, Cox (2003) on Hippodonta and Monnier et al. (2007a) on Psammothidium]. There are, in short, both practical and conceptual reasons why a consistent application of fine-level taxonomic concepts is not possible. This, in turn, raises the possibility that taxonomic conventions themselves may affect the value of the boundaries. One alternative is to harmonise taxonomy at a level at which all participants in the intercalibration do agree and this is the idea explored in this paper. The crudest approach might be to ignore interspecific differences altogether, adopting a genus-level index such as the Indice Diatomique Générique (IDG; Rumeau & Coste, 1988; Coste & Ayphassorho, 1991). Another option is to merge the taxa that analysts regard as being particularly troublesome and which are known to contribute to variability in ring-test results (Kelly et al., 2002; Prygiel et al., 2002; Kahlert et al., 2009). A further possibility is to base assessments only on taxa which are widespread and/or abundant in datasets (Prygiel et al., 1996; Lavoie et al., 2009). A final possibility, also explored here, is to adjust taxonomic conventions back to those in use before the taxonomic upheavals of the past 30 years. In this study, results using modern taxonomy (mainly Krammer & Lange-Bertalot, 1986–1991) have been compared with those obtained using the concepts of Hustedt (1930). The purpose of these analyses is largely the pragmatic one of ensuring that intercalibration of ecological status class boundaries is not compromised by taxonomic inconsistency. 
However, by exploring the extent to which fine-level taxonomy has the capacity to influence ecological assessments it is also possible that light will be shed on broader issues of how recent developments in our understanding of diatom diversity may affect our understanding of diatom ecology and stream functioning.

Materials and methods

Dataset

This exercise is based on data submitted by the Grand-duchy of Luxembourg to the first round of the intercalibration exercise (Kelly et al., 2009b). This dataset was chosen as it represents a small but geologically variable area and was analysed to a high degree of taxonomic resolution by a small group of analysts. The development of the Luxembourg diatom assessment system is described in more detail in Rimet et al. (2004).

Effectiveness of genus- and species-level assessments and effect of removing uncommon and infrequent taxa

The metric used for diatom assessments in Luxembourg is the IPS and the scale of redundancy in this dataset was explored by comparing the relationship between the IPS and datasets in which groups of taxa had been merged into broader categories. The most extreme of these was the IDG; another involved the merger of taxa that analysts regard as being ‘difficult’, and which are consequently likely to contribute to variability between analyses. The IDG was developed as a simpler alternative to the IPS (Prygiel et al., 1999) and the two indices have subsequently developed in tandem such that it is reasonable to assume that the main difference between IDG and IPS is the taxonomic level (rather than biogeographical or methodological factors). The list of merged taxa (Table 1) is based on Kahlert et al. (2009) plus informal discussions with colleagues involved in the intercalibration process (Kahlert et al., 2012). Revised values for sensitivity for the IPS were calculated as the average sensitivity for those taxa in the dataset weighted by the relative frequency of records of each taxon in the dataset [so if 23 records of taxon A, with a sensitivity of 2.4, are merged with 8 records of taxon B, which has a sensitivity of 2.1, the sensitivity of taxon AB will be ((23 × 2.4) + (8 × 2.1))/(23 + 8) ≈ 2.3]. Indicator values for these merged taxa were set to 1. In addition, taxa which were only identified to genus level and those which have a largely planktonic habit (Asterionella formosa Hassall, Cyclostephanos, Fragilaria crotonensis Kitton, Stephanodiscus, Thalassiosira) were removed altogether.
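The record-weighted averaging described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `merged_sensitivity` and the input layout are assumptions made here, not part of the original analysis.

```python
def merged_sensitivity(records):
    """Record-weighted mean sensitivity for a merged taxon.

    `records` is a list of (number_of_records, sensitivity) pairs,
    one pair for each taxon being merged (hypothetical layout).
    """
    total_records = sum(n for n, _ in records)
    if total_records == 0:
        raise ValueError("cannot merge taxa that have no records")
    return sum(n * s for n, s in records) / total_records


# Worked example from the text: 23 records of taxon A (sensitivity 2.4)
# merged with 8 records of taxon B (sensitivity 2.1).
taxon_ab = merged_sensitivity([(23, 2.4), (8, 2.1)])
print(round(taxon_ab, 1))  # 2.3
```

The division applies to the whole weighted sum, which is why the bracketing of the numerator matters in the worked example.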

Table 1 List of taxa omissions and amalgamations used in the analyses reported in this paper

The effectiveness of taxa streamlining was evaluated in three ways (Fig. 1):

Fig. 1

Schematic diagram illustrating the approach used to assess similarity between the IPS (the ‘original’ index) and variants based on simplified taxonomy (‘modified index’). The effectiveness of the revised index is evaluated either as (a) the similarity between the original and modified metrics (evaluated as Lin’s concordance correlation coefficient, ρc) or (b) as the proportion of samples that are correctly classified as either ‘good or better’ status (top right quadrant) or ‘moderate or worse’ (bottom left quadrant) status (dotted lines indicate hypothetical good/moderate status class boundary). See text for more details

  (a) as the concordance correlation coefficient, ρc (Lin, 1989) between the original and revised metrics. This is a modification of correlation analysis which assesses the deviation from a perfect 1:1 relationship between the two variables. This was calculated by means of the epiR package (Stevenson, 2010) within the R statistical package (R Development Core Team, 2006).

  (b) as the proportion of samples that are correctly classified as either ‘good or better’ status (top right quadrant) or ‘moderate or worse’ (bottom left quadrant) status. In the example illustrated in Fig. 1, six samples (indicated by closed circles) are classified differently depending on the method used (top left and bottom right quadrants). For simplicity, the ecological status boundaries used by the neighbouring Belgian region Wallonia were used, as these are not dependent upon a typology; and,

  (c) as the ability of the metrics to predict the underlying pressure gradient (expressed as the coefficient of determination, R²). The first axis of a Principal Component Analysis (PCA) of water chemistry samples [see Rimet et al. (2004) for methods] collected at the same time as the diatom sample was used as a proxy for the pressure gradient. This was calculated by the Vegan package (Oksanen et al., 2007) within R. The first axis accounted for 66.2% of the total variation in the chemical data and was highly correlated with nitrite-N (\( \text{NO}_2^{-} \)-N), ammonia-N (\( \text{NH}_4^{+} \)-N), total P (TP) and P as orthophosphate (\( \text{PO}_4^{3-} \)-P) (Table 2).

Table 2 Summary statistics for nutrient concentrations (mg l⁻¹) in the dataset used for this study (based on spot samples), along with correlations between these and axis 1 of a Principal Component Analysis of these variables
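The first two of the three evaluations listed above can be illustrated with a short Python sketch. It is not the epiR implementation used in the study: the function names are hypothetical, the concordance coefficient is computed with 1/n variances following the usual statement of Lin's formula, a sample lying exactly on the boundary is assumed to count as ‘good or better’, and the index values are invented for illustration.

```python
from statistics import fmean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired values.

    Uses 1/n (population) variances and covariance; other software may
    apply a different small-sample convention.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov_xy = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("undefined when both series are constant and equal")
    return 2 * cov_xy / denominator


def classification_agreement(original, modified, boundary):
    """Share of samples on the same side of the boundary under both indices."""
    if len(original) != len(modified) or not original:
        raise ValueError("inputs must be paired and non-empty")
    same_side = sum(
        (o >= boundary) == (m >= boundary) for o, m in zip(original, modified)
    )
    return same_side / len(original)


# Hypothetical index values for five samples (not data from the study).
original_ips = [16.0, 14.5, 12.0, 9.5, 13.2]
modified_ips = [15.5, 12.8, 12.4, 10.0, 13.0]

print(lin_ccc(original_ips, modified_ips))
print(classification_agreement(original_ips, modified_ips, boundary=13.0))
```

Unlike Pearson's r, the concordance coefficient is penalised by the squared difference between the two means, so a modified index that tracks the original perfectly but with a constant offset still scores below 1.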

Many taxa are only present in a few samples and/or at low relative abundances. Their contribution to weighted average-based metrics was evaluated by computing the IPS after screening the dataset to remove these taxa. Three criteria were evaluated:

  (a) only include taxa the maximum relative abundance of which in the dataset exceeds a threshold. Thresholds from ‘present’ (ca. 0.149% ≈ 1 valve counted, expressed as relative abundance) to 20% were tested;

  (b) only include taxa the representation of which in the dataset exceeds a threshold. Thresholds from a single record to present in 20% of samples were tested;

  (c) only include taxa the maximum relative abundance of which in a sample exceeds a threshold. Thresholds from ‘present’ to 10% of the total relative abundance were tested.
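The three screening criteria above might be expressed as follows. This is a sketch under stated assumptions: each sample is taken to be a mapping from taxon name to relative abundance (%), ‘exceeds’ is read as strictly greater than, criterion (c) is read as a filter applied within each sample, and all function names and data are hypothetical.

```python
def taxa_by_max_abundance(samples, threshold):
    """Criterion (a): taxa whose peak abundance (%) in the dataset exceeds threshold."""
    peak = {}
    for sample in samples:
        for taxon, pct in sample.items():
            peak[taxon] = max(peak.get(taxon, 0.0), pct)
    return {taxon for taxon, pct in peak.items() if pct > threshold}


def taxa_by_frequency(samples, threshold):
    """Criterion (b): taxa recorded in more than `threshold` % of samples."""
    counts = {}
    for sample in samples:
        for taxon, pct in sample.items():
            if pct > 0:
                counts[taxon] = counts.get(taxon, 0) + 1
    n_samples = len(samples)
    return {t for t, c in counts.items() if 100.0 * c / n_samples > threshold}


def screen_within_sample(sample, threshold):
    """Criterion (c): keep only taxa exceeding threshold (%) in this sample."""
    return {taxon: pct for taxon, pct in sample.items() if pct > threshold}


def screen_dataset(samples, keep):
    """Drop every taxon not in `keep` from every sample."""
    return [{t: p for t, p in s.items() if t in keep} for s in samples]


# Hypothetical three-sample dataset with placeholder taxa A, B and C.
samples = [
    {"A": 60.0, "B": 39.0, "C": 1.0},
    {"A": 80.0, "B": 20.0},
    {"A": 98.5, "C": 1.5},
]

keep = taxa_by_max_abundance(samples, 2.0)   # C never exceeds 2 %
print(sorted(keep))                          # ['A', 'B']
print(screen_dataset(samples, keep)[2])      # {'A': 98.5}
```

Because a weighted-average index divides by the summed abundances of the taxa that contribute to it, the screened samples do not need to be rescaled to 100% before the index is recomputed.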

Having performed these tests on the IPS, the study was then extended to the TI (Rott et al., 1999) and the ICM (Kelly et al., 2009b). The sensitivity values for the TI were re-calculated in the same way as for the IPS; indicator values were set to 1 for all taxa except the Amphora pediculus (Kützing) Grunow, C. placentula and Nitzschia palea (Kützing) W. Smith complexes, where these were set to 2. In addition, Mayamaea and Fistulifera species retained their separate identities for this streamlined version of the TI. In order to calculate the ICM, the IPS and TI were both converted to Ecological Quality Ratios (EQRs) by dividing observed IPS and TI values by the ‘expected’ reference values used for the EU intercalibration exercise (Kelly et al., 2009b), set at 15.65 for IPS and 2.44 for TI [note that the TI needs to be inverted (i.e. 4 − observed TI) before the EQR calculation]. The ICM is the average of the IPS-EQR and TI-EQR.
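The ICM calculation can be written out as a short Python sketch. The reference values are those quoted in the text; the function names are hypothetical, and the sketch assumes that the TI reference of 2.44 is already expressed on the inverted scale, so that it is divided into (4 − observed TI).

```python
IPS_REFERENCE = 15.65   # expected IPS at reference condition (from the text)
TI_REFERENCE = 2.44     # expected TI reference, assumed to be on the inverted scale
TI_SCALE_MAX = 4.0      # the TI is inverted as (4 - observed TI)


def ips_eqr(ips):
    """Ecological Quality Ratio for an observed IPS value."""
    return ips / IPS_REFERENCE


def ti_eqr(ti):
    """Ecological Quality Ratio for an observed TI value (inverted first)."""
    return (TI_SCALE_MAX - ti) / TI_REFERENCE


def icm(ips, ti):
    """Intercalibration Common Metric: mean of the two EQRs."""
    return (ips_eqr(ips) + ti_eqr(ti)) / 2


# Hypothetical sample with IPS = 13.0 and TI = 2.5.
print(round(icm(13.0, 2.5), 3))
```

A sample whose IPS and inverted TI both equal their reference values has an ICM of 1, and values fall towards 0 as the two indices depart from reference conditions.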

Comparison between ‘modern’ and ‘Hustedt’ taxonomy

In order to examine the extent to which recent taxonomic developments had given deeper insights into diatom ecology, detrended correspondence analyses (DCA; Hill, 1979) were performed, using two variants of the dataset: first, with the ‘modern’ taxonomy (i.e. the dataset as supplied to the intercalibration exercise) and then with the taxonomy adjusted to the conventions in Hustedt (1930). In many cases, there was direct agreement between taxa in Hustedt and those recognised today. In other instances, Hustedt (1930) had a simpler system recognising, for example, only a single taxon ‘Cymbella ventricosa Kützing’, whereas modern analysts recognised several species including Encyonema minutum (Hilse in Rabenhorst) D.G. Mann and E. silesiacum (Bleisch in Rabenhorst) D.G. Mann. E. reichardtii (Krammer) D.G. Mann would probably also have been included in the C. ventricosa of Hustedt (1930), as would several more recently described or resurrected taxa, including E. lange-bertalotii Krammer and E. ventricosum (C. Agardh) Rabenhorst (Krammer, 1997), though these latter species were not encountered in this study. Several of the smaller naviculoid diatoms [e.g. Navicula ingenua Hustedt, Sellaphora radiosa (Hustedt) H. Kobayasi (Syn.: N. joubaudii H. Germain), Craticula minusculoides (Hustedt) Lange-Bertalot, Microcostatus pseudomuralis (Hustedt) Lange-Bertalot and Fistulifera saprophila (Lange-Bertalot et Bonik) Lange-Bertalot] were not described in 1930, and here they have either been allocated to a plausible category based on visual inspection of images in Hustedt (1930) or, if this revealed no plausible matches, recorded at the appropriate genus-level.
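Resetting counts to an older taxonomy amounts to a lookup followed by a sum. The sketch below illustrates this with the Encyonema example given above and the Planothidium example mentioned in the Discussion; the lookup table is deliberately tiny, the names `HUSTEDT_NAME` and `to_hustedt` are hypothetical, and passing unlisted taxa through unchanged is an assumption made for illustration rather than a statement about how each taxon was handled in the study.

```python
from collections import defaultdict

# Illustrative lookup only: modern name -> name under Hustedt (1930).
HUSTEDT_NAME = {
    "Encyonema minutum": "Cymbella ventricosa",
    "Encyonema silesiacum": "Cymbella ventricosa",
    "Planothidium lanceolatum": "Achnanthes lanceolata",
    "Planothidium frequentissimum": "Achnanthes lanceolata",
}


def to_hustedt(sample, lookup=HUSTEDT_NAME):
    """Re-label one sample's valve counts and sum counts that share a name.

    Taxa missing from the lookup keep their modern name (an assumption).
    """
    merged = defaultdict(float)
    for taxon, count in sample.items():
        merged[lookup.get(taxon, taxon)] += count
    return dict(merged)


# Hypothetical valve counts for a single sample.
modern_sample = {
    "Encyonema minutum": 12,
    "Encyonema silesiacum": 8,
    "Navicula gregaria": 30,
}
print(to_hustedt(modern_sample))
# {'Cymbella ventricosa': 20.0, 'Navicula gregaria': 30.0}
```

The converted samples can then be passed to the same ordination as the modern dataset, so that any difference between the two results reflects the taxonomy alone.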

Results

Data overview

A total of 339 diatom taxa representing 62 genera were found in 247 benthic, mainly epilithic, samples collected in watercourses over the whole territory of the Grand-duchy of Luxembourg from 1994 to 2003. Of these taxa, 85 (25%) were only found in one sample and 224 (66%) were never present in the dataset with a relative abundance >1%. The most commonly recorded taxa were Navicula gregaria Donkin (239 samples), Achnanthidium minutissimum (Kützing) Czarnecki (237), Mayamaea permitis (Hustedt) Bruder et Medlin (225), Eolimna minima (Grunow) Lange-Bertalot (220) and N. lanceolata (C. Agardh) Ehrenberg (218).

Effectiveness of genus- and species-level assessments

There was a poor correlation between the IDG and the IPS (ρc = 0.362; Table 3): the relationship is good when IPS is high, but fans out as IPS decreases. Nonetheless, 56% of samples still get the same classification (i.e. ‘good status or better’ versus ‘moderate status or worse’) using the IDG as when the IPS is used. The residuals from this relationship were then plotted against the total number of valves belonging to each of the 10 most abundant genera in the dataset. Two types of response were observed (Fig. 2):

Table 3 Comparison between IPS and simplified indices for assessment of ecological status
Fig. 2

Analysis of residuals of relationship between IPS and IDG for the genera Navicula and Nitzschia. Navicula shows a pattern of decreasing residuals as the proportion of Navicula in the sample increases whereas some residuals appear to increase as the proportion of Nitzschia in a sample increases

  • Seven of the 10 most abundant genera showed patterns like that for Navicula, i.e. a tendency for residuals to decrease as the proportion of the genus in the sample increases (NB: ‘Navicula’ = Navicula sensu stricto plus others that have not yet been assigned to ‘new’ genera).

  • The other three (Achnanthidium, Amphora, Nitzschia) also showed a pattern of decreasing residuals; however, for these, there was also a distinct ‘tail’ of samples where the residuals increased as the proportion of that genus increased.

Examination of those samples responsible for high residuals revealed that the species which predominated were Amphora pediculus, Nitzschia inconspicua Grunow, N. archibaldii Lange-Bertalot, Achnanthidium minutissimum and A. pyrenaicum (Hustedt) H. Kobayasi. Replacing the values for these species in the IDG by their respective values for the IPS led to an improvement in the relationship between IDG and IPS, from ρc = 0.362 to 0.598 (Fig. 3a).

Fig. 3

Relationship between (a) modified IDG and original IPS (y = 0.37 + 8.34x; R² = 0.621) and (b) modified IPS and original IPS (y = 0.33 + 0.97x; R² = 0.994). The modified IDG has replaced sensitivity and indicator values for Achnanthidium minutissimum, A. pyrenaicum, Amphora pediculus, Nitzschia inconspicua and N. archibaldii by their respective values in the IPS. Changes to the modified IPS are listed in Table 1. Horizontal and vertical dashed lines indicate the position of the good/moderate status boundary, assumed here to be IPS = 13; diagonal line indicates slope = 1

If this improvement can be gained from modifying a genus-level index, what will be the effect when a species-level index is modified? Whereas modification to the IDG involved adding extra taxa, the next modification involved merging taxa that are differentiated in the IPS, following Table 1. In this case, the R² was very close to 1 (Table 3), suggesting very little difference between an IPS based on fine-level taxonomy and one based on more pragmatic categories.

The modified IDG led to the proportion of samples which were classified in the same way as by the IPS increasing from 56% (with the unmodified IDG) to 84%. The modified IPS saw a further increase, with 98% of sites being classified in the same way as by the original IPS. There was no relationship between the first axis of the PCA and the IDG, a weak relationship for the modified IDG and strong relationships for both the modified IPS and original IPS (Table 3).

Effect of removing uncommon and infrequent taxa

Removing taxa from the IPS calculation based on the number of records or their maximum relative abundance in the dataset had little effect until each criterion reached about 10%, after which, the deviation from the unmodified IPS gradually increased (Fig. 4). Nonetheless, even when the criteria were set at 20%, ρc was still 0.96 based on the maximum relative abundance and 0.98 based on the number of records in the dataset. The effect of removing taxa based on their representation in the sample led to more significant declines: excluding all taxa with a relative abundance <10% led to ρc = 0.90.

Fig. 4

Effect of removing taxa from IPS calculations on Lin’s concordance correlation coefficient (ρc). Three methods for reducing taxa numbers were used: removing taxa based on the number of records in the database in which the taxon occurs (filled circle), on the maximum relative abundance of the taxon in the database (open circle) and on the maximum relative abundance of the taxon in the sample (triangle)

In light of these results, a working threshold of 2% maximum relative abundance in the dataset was set as a minimum criterion for inclusion in IPS calculations and when this filter was applied along with the taxa merges described in Table 1, ρc between the original and revised IPS was 0.9976.

Effect on TI and ICM

The modifications described for the IPS were then applied to the TI. ρc for the relationship between the ‘original’ and ‘revised’ TI was 0.982 (Fig. 5a) whilst ρc between the ICM calculated on the two unmodified metrics and that ICM computed using modified metrics was 0.977 (Fig. 5b).

Fig. 5

Relationship between (a) ‘original’ and ‘revised’ TI and (b) ‘original’ and ‘revised’ ICM

Comparison between ‘modern’ and ‘Hustedt’ taxonomy

A final test of the potential for taxonomic simplification was to reset all nomenclature to that used in Hustedt (1930) and then to compare ordinations based on ‘modern’ taxonomy with those based on ‘Hustedt’ taxonomy. Some of these changes mirror those described in Table 1 (e.g. ‘Cymbella ventricosa’ merges several more recently described taxa).

Although the first two axes of DCAs based on both ‘Hustedt’ and ‘modern’ taxonomy were correlated with the IPS, axis 2 was, in both cases, more strongly correlated with the pressure gradient (Pearson’s r = 0.625 and 0.704, respectively). Low values for axis 2 were associated with pollution-tolerant taxa such as Nitzschia amphibia Grunow and Eolimna minima, whilst pollution-sensitive taxa tended to have high scores on axis 2. Axis 1 is more difficult to disentangle, but may reflect differential responses to the pressure gradient: taxa such as Amphora pediculus and Platessa conspicua (Ant. Mayer) Lange-Bertalot in Krammer & Lange-Bertalot, which are sensitive to organic pollution but tolerant of elevated inorganic nutrients, have high scores, whilst more organic-pollution-tolerant taxa such as Luticola goeppertiana (Bleisch) D.G. Mann and Nitzschia paleacea Grunow have low scores. Axis 2 for both Hustedt and modern taxonomy actually had stronger relationships with the first axis of the PCA than did the IPS (see Table 4). Overall, however, both axes showed a strong association (Pearson’s r = 0.898; Fig. 6).

Table 4 Pearson’s correlation coefficient between axis 1 of a PCA of water chemistry variables, IPS values generated from diatom samples and the first two axes of Detrended Correspondence Analyses based on modern and ‘Hustedt’ taxonomy (see text for more details)
Fig. 6

Comparison between the second axes of DCAs based on ‘Hustedt’ (Hustedt, 1930) and ‘modern’ taxonomy (r = 0.948)

Discussion

Results presented in this paper allow two conclusions: one relevant to the work presented by Kelly et al. (2009b) and one relevant to future intercalibration exercises, and indeed, for diatom-based monitoring more generally. The observation, that there is little difference between IPS and TI computed using detailed taxonomy and these same indices with ‘difficult’ taxa amalgamated and ‘rare’ taxa removed, suggests that variation due to differing taxonomic conventions is probably not a significant contributor to the variation between national ecological status boundaries described in Kelly et al. (2009b). Moreover, these same results, seen from a different perspective, also suggest that detailed taxonomy does not add a great deal to the sensitivity of weighted average metrics such as the IPS and TI. Indeed, an ordination based on modern taxonomy gives very similar results to one based on taxonomic conventions current 80 years ago (Fig. 6).

This is not to question recent evidence that the number of diatom species is much higher than previously thought (Mann, 1999), rather to note that many of these newly recorded species are either relatively infrequent in datasets such as that reported here or that too little information is currently known about their ecological spectra for them to make a significant contribution to our understanding of the response of phytobenthic communities to pollution. Of the 20 most frequently recorded taxa in the Luxembourg dataset, 15 can be identified directly from Hustedt (1930). Two taxa, Planothidium lanceolatum (Brébisson ex Kützing) Lange-Bertalot and P. frequentissimum (Lange-Bertalot) Lange-Bertalot, were recorded by Hustedt as a single entity, Achnanthes lanceolata (Brébisson ex Kützing) Grunow in Van Heurck, and only three small naviculoid diatoms (Eolimna subminuscula (Manguin) Gerd Moser, Lange-Bertalot et Metzeltin, F. saprophila and M. permitis) are not included in Hustedt (1930).

A comparison between the IDG and the IPS (Table 3) suggests that there are limits to the extent to which taxonomy can be simplified. Although there are reports of stronger relationships between genus- and species-level assessments (Kelly et al., 1995, 2009b; Kwandrans et al., 1998; Wu & Kow, 2002), there are also reports of poor relationships similar to that observed here (Prygiel & Coste, 1993; Rimet et al., 2005; Feio et al., 2009). What is interesting is how adjusting the sensitivities of just a few species can improve this relationship (Table 3). This, in turn, provides an interesting parallel to the subsequent ‘simplification’ of the species-level assessment.

Kelly et al. (2009a) criticise the approach adopted by many EU States for the development of WFD-compatible assessment systems, suggesting that these were insufficiently grounded in ecological theory. ‘Ecological status’ is defined as an expression of ‘the structure and functioning of aquatic ecosystem’ (WFD, Article 2; European Union, 2000), whilst weighted-average metrics such as the IPS and TI are optimised solely in terms of correlations between species composition and chemical variables. Non-diatom algae are ignored altogether in most assessment systems (exceptions include Austria: Rott et al., 1999; Pfister & Pipp, 2009, and Germany: Schaumburg et al., 2004). This means that ‘splitting’ diatom taxa and refining sensitivity values becomes little more than a statistical ‘fix’ designed to improve the relationships with the chemical environment. Whilst Kelly (2006) and Kelly et al. (2008) demonstrate that diatoms provide a cost-effective summary of the status of the broader phytobenthos community, there would seem to be limits to the extent to which the diatom assemblages should be dredged for every last scrap of data, whilst other algal groups are ignored. Indeed, the reliance on cleaned valves to reveal the detail on which fine-scale taxonomy is based means that it is not possible to know for certain whether a diatom present in very low numbers (one or two valves) in an analysis ever actually grew at the sample site (rather than just being washed in from elsewhere in the catchment). Bearing in mind that the definition of ecological status emphasises ‘structure and function’, there do seem to be good grounds for focussing on those diatoms which are quantitatively important and, perhaps, for future generations of assessment tools to give greater weight to non-diatom algae.

Recent developments in diatom taxonomy have taken place against a vigorous debate about whether or not diatoms are distributed ubiquitously. Finlay et al. (2002) and Heino et al. (2010) cite Baas-Becking (1934), who postulated that for microorganisms ‘everything is everywhere’ but that the environment will determine whether they can survive, whilst others (Telford et al., 2006; Medlin, 2007; Vyverman et al., 2007) argue that ecological gradients (such as pH) and biogeographic, historical and molecular concepts are relevant to diatoms. Whilst this knowledge can help to track the spread of invasive taxa (Coste & Ector, 2000; Blanco et al., 2008; Blanco & Ector, 2009), we suggest a further possibility: that members of a species complex may be biogeographically distinct yet still play similar roles in the overall phytobenthos community.

Conclusion

In practical terms, analyses reported in this paper demonstrate that focussing on a level of identification on which all analysts agree, and ignoring taxa that are never found at >2% relative abundance in a dataset, does not lead to a loss of ecological information when viewed at a continental scale. Interestingly, adopting the approaches described here does not lead to better precision (Kahlert et al., 2012), suggesting that other aspects of analytical procedure may be more significant sources of uncertainty than identification conventions. These results do not mean that member states should not adopt their own taxonomic conventions, and indeed, regional assessments may gain from a detailed knowledge of taxa with localised distributions and distinct ecologies (Potapova & Charles, 2002). The message from this paper, however, is that such nuances are not necessary for continent-wide comparisons, which are required for the EU’s intercalibration exercise.