Introduction

The contamination of aquatic ecosystems by mixtures of industrial and household chemicals, pesticides, and other anthropogenic contaminants is still a major environmental problem (Fuerhacker 2009; Silva et al. 2012; Smital et al. 2012). Contaminants are known to affect ecosystem functions and services but the extent and impact of the effects are still relatively poorly characterized (Rockström et al. 2009). To model and predict impacts of toxic chemicals on aquatic biota, several approaches have been suggested (Solomon et al. 2008).

One of the methods broadly used for prospective ecotoxicological risk assessment and the setting of environmental quality standards (EQS) is species sensitivity distribution (SSD) (Posthuma et al. 2002; Van Straalen and Denneman 1989). This statistical model combines ecotoxicity values (e.g., chronic no-effect concentrations such as no observed effect concentration [NOEC] or acute concentrations causing an effect in 50 % of organisms [EC50]), which are available for different species in databases and public literature. The uncertainty in setting EQS using this probabilistic risk assessment approach is then expected to be smaller in comparison with the approach using assessment factors applicable for a limited number of EC50 or NOEC values (EC 2011; Vighi et al. 2006). Since SSD is a quantitative method, it can also be applied in retrospective risk assessment to predict a fraction of the community which is likely to be impacted by a certain chemical or mixture (Fig. 1). To describe the risk, the SSD approach uses values called the “potentially affected fraction” (PAF) for single substances and the “multiple substance potentially affected fraction” (msPAF) for mixtures (Posthuma et al. 2002) and has been employed in some case studies in Europe (Carafa et al. 2011; Comte et al. 2010; Faggiano et al. 2010) and the USA (Schuler and Rand 2008).

Fig. 1

SSD as a cumulative distribution function. The figure shows the derivation of the concentration affecting “p%” of species (the hazard concentration HCp, used in probabilistic risk assessment and for the setting of EQS) as well as the estimation of the fraction of species affected by a certain concentration (PAF—retrospective risk assessment). Black dots denote the NOEC or EC50 values for different biological species
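The two readings of the SSD curve described in the caption can be illustrated with a minimal sketch, assuming a log-normal SSD parameterized by the mean and standard deviation of the log10-transformed toxicity values. The function names and the example parameters (μ = 2.0, σ = 0.8) are hypothetical, and Python is used purely for illustration (the study itself used R and WinBUGS).

```python
from math import log10
from statistics import NormalDist


def hazard_concentration(mu, sigma, p=0.05):
    """Prospective use: concentration affecting a fraction p of species (HCp).

    mu and sigma describe the normal distribution of log10 toxicity values.
    """
    return 10 ** NormalDist(mu, sigma).inv_cdf(p)


def paf(concentration, mu, sigma):
    """Retrospective use: fraction of species affected at a given concentration."""
    return NormalDist(mu, sigma).cdf(log10(concentration))


# Hypothetical SSD parameters (log10 of EC50 values), not taken from the study
mu, sigma = 2.0, 0.8
hc5 = hazard_concentration(mu, sigma)      # concentration on the x-axis of Fig. 1
fraction = paf(hc5, mu, sigma)             # reading the curve back gives about 0.05
```

Both functions read the same cumulative curve in opposite directions, which is why a single fitted SSD can serve prospective and retrospective assessment.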

Despite the fact that SSD is broadly used and recommended by authorities for setting EQS within the prospective risk assessment (e.g., EC 2003, 2011; Xiaowei et al. 2012; Caquet et al. 2013), many different statistical methods and data validation approaches have been used in SSD (especially within the retrospective risk assessment of mixtures), which can lead to highly variable results.

The basic approach is a classical parametric statistical approach assuming a certain distribution of primary data. The toxicity data are fitted, for example, to log-normal (Wagner and Lokke 1991), log-logistic (Van Straalen and Denneman 1989), or log-triangular (US Environmental Protection Agency (EPA) 1985) distributions. Since some datasets hardly fit classical distributions, nonparametric bootstrapping has also been suggested for SSD (Duboudin et al. 2004; Newman et al. 2000; Wheeler et al. 2002). More recently, the use of Bayesian statistics has become a popular approach thanks to its advantages such as the easy derivation of confidence intervals and the possibility of incorporating expert knowledge (Aldenberg and Jaworska 2000; Grist et al. 2006; Hayashi and Kashiwagi 2010).
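As an illustration of the nonparametric alternative, the following is a simplified bootstrap sketch for the HC5, assuming that the HC5 is taken as the empirical 5th percentile of resampled log10 EC50 values. It is not the exact procedure of any of the cited studies; the function name, the number of resamples, and the percentile interpolation rule are assumptions made for the example.

```python
import random
from math import log10
from statistics import quantiles


def bootstrap_hc5(ec50_values, n_boot=2000, seed=1):
    """Return (lower, median, upper) of bootstrapped HC5 estimates.

    The bounds are the 5th and 95th percentiles of the bootstrap
    estimates, i.e. an approximate 90 % interval.
    """
    rng = random.Random(seed)
    logs = [log10(v) for v in ec50_values]
    estimates = []
    for _ in range(n_boot):
        # Resample species with replacement, same size as the original set
        resample = rng.choices(logs, k=len(logs))
        # First of 19 cut points = empirical 5th percentile of the resample
        estimates.append(quantiles(resample, n=20, method="inclusive")[0])
    cuts = quantiles(estimates, n=20, method="inclusive")
    return 10 ** cuts[0], 10 ** cuts[9], 10 ** cuts[18]


# Hypothetical EC50 values (one per species), not taken from the study
ec50 = [3, 8, 15, 40, 90, 200, 450, 1000, 2500, 6000]
lower, median, upper = bootstrap_hc5(ec50)
```

With few species, the empirical 5th percentile stays close to the lowest observed value, which is one reason why small datasets limit this approach.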

In addition to variability in statistical methods, the outcomes of SSD may be affected by the application of different data validation approaches, such as using different sets (NOECs vs. EC50s) and considering different taxonomic groups and/or exposure durations. The influence of these factors on SSD is briefly described in the following paragraphs.

It was originally recommended to use chronic NOEC values for SSD modeling (Bachman 2009; EC 2003; Sijm et al. 2002). However, this approach suffers from a low number of available NOEC values; in practice, acute ecotoxicity data (EC50) are more often available. Also, the high uncertainties related to the way NOECs are derived (post-analysis of variance multiple comparison tests) and the overall NOEC concept (the derived value depends on actually tested concentrations) have been discussed critically, and some authors suggest that NOEC should not be used in ecotoxicology (Crane and Newman 2000; Jager 2012; Laskowski 1995; Organization for Economic Cooperation and Development (OECD) 2006b; Van der Hoeven 1997). It is possible to replace NOEC with the more robust chronic EC10 (Crane and Newman 2000; EC 2011), but these values are in general still missing in ecotoxicity databases. Therefore, acute EC50 values are often used in retrospective risk assessment and they can be combined with uncertainty factors (acute to chronic factor [ACF]) to extrapolate to chronic effects (De Zwart 2002; Van den Brink et al. 2006).

According to European legislation (EC 2003, 2011), algae, macrophytes, invertebrates, and vertebrates should all be included in the SSD calculation, and values for at least 10 species should be used. This approach is generally accepted, but specific recommendations differ among other agencies. For example, the US EPA recommends at least eight species, the Dutch RIVM recommends four species, and the OECD recommends from five to eight species (Kefford et al. 2005). In practice, the number of species is defined by toxicity data, which are available in databases, and often <10 species are used (Maltby et al. 2009; Schuler and Rand 2008). The proper representation of species and taxa within an SSD is one of the major assumptions of this method (Forbes and Calow 2002), and using species weighting (Duboudin et al. 2004; Hayashi and Kashiwagi 2010) or ecotoxicity data for ecologically relevant species has been suggested (Hickey et al. 2008; Kefford et al. 2005; Xiaowei et al. 2012). In the case of substances with a specific toxic mode of action (TMoA) such as pesticides, some authors include all available species across all taxonomic groups within a single SSD curve (Faggiano et al. 2010). However, other authors (Maltby et al. 2005; Van den Brink et al. 2006) recommend that only species from the most sensitive taxonomic group should be used (e.g., primary producers for herbicides, insects for insecticides, etc.) and that merging data from different taxonomic groups is not correct. A slightly different approach is suggested by EC (2011), where the most sensitive group is used only in the case of an apparent gap between the sensitivity distributions of the most sensitive vs. other taxa or if there is a poor log-normal fit of the SSD. Another problem discussed in the literature is the use of freshwater vs. marine species in SSD. 
The general assumption of SSD is a random and representative selection of taxa from the target group (Forbes and Calow 2002), which should logically distinguish between freshwater and marine organisms. However, several studies have shown no systematic or statistically significant difference in the sensitivities of organisms from different aquatic compartments and, therefore, using ecotoxicity data from both freshwater and marine taxa (which do not differ significantly) within one SSD has been recommended (De Zwart 2002; Maltby et al. 2005; Raimondo et al. 2008; EC 2011).

Another important step in data selection for SSD is the choice of exposure duration. As mentioned previously, EC50s are nowadays the most commonly used values in SSD modeling. However, some authors require strictly acute values from very short exposures (e.g., <1 day for algae; De Zwart 2002), while others use different exposure duration ranges (e.g., Van den Brink et al. 2006 used 1–7 days for algae and 2–28 days for macrophytes; Duboudin et al. 2004 used a maximum of 2 days for algae; Brock et al. 2004 used 1–5 days for algae and 4–28 days for macrophytes; Schuler and Rand 2008 used a maximum of 7 days for all primary producers; Caquet et al. 2013 used 1 day or more for primary producers). Furthermore, some studies have not specified the exposure durations behind the ecotoxicity data used for SSD calculations (Newman et al. 2000; Comte et al. 2010). The major problem is thus the poor definition of “acute,” especially for taxa such as algae, where relatively short exposures lasting a few days in fact span several generations.

Besides the three major sources of variability mentioned previously, other validation steps may affect the final SSD outcome such as (1) checking the purity of tested substances (especially for pesticide preparations), (2) deleting outliers and replicated values, which can often be found in ecotoxicity databases, or (3) checking of variable test conditions (e.g., pH, temperature).

Some studies critically investigated the impact of various data validation approaches on SSD outcomes in prospective risk assessment design, where SSD is commonly used for the estimation of EQS with respect to individual chemicals (Maltby et al. 2005; Van den Brink et al. 2006; Wheeler et al. 2002). However, to our knowledge, no critical assessment of different data validation approaches in the retrospective ecotoxicological risk assessment of mixtures has been provided. Therefore, the aim of the present study was to primarily investigate the impact of different data validation approaches on the outcomes of SSD applied in a retrospective case study. As a model case, seven important herbicides (differing in their modes of action) were selected and long-term 1998–2009 monitoring data from the Scheldt river basin, Belgium were used to calculate msPAF values. The objective was to assess variability in the SSD outputs (i.e., values and ranges in msPAF) when combining real monitoring data with various input toxicity data for different taxa (all taxonomic groups vs. primary producers), when using different exposure durations (selected periods vs. all available durations), when testing the influence of different purities of tested pesticides (pure only vs. all available data), and when using different validation procedures (the removal of replicates and outliers). Further, by synthesizing various opinions discussed in the literature, we tried to outline and discuss important steps in the data preparation and validation and demonstrate key factors affecting the SSD outcomes when applied in retrospective design.

Materials and methods

SSD model

Seven different herbicide compounds in this study were selected mainly because of the availability of long-term monitoring data (1998–2009) and the availability of ecotoxicity data covering different chemical structures and TMoA (Reade and Cobb 2002). The selection served as a model for the assessment of different input data on retrospective ERA outcomes. Bentazone (benzothiadiazinone class) is a photosynthesis inhibitor in photosystem II, which binds to the “C” site of D1 protein; glyphosate (glycine derivative) is a selective inhibitor of essential aromatic amino acid synthesis in chloroplasts; isoproturon (urea derivative) is an inhibitor of photosynthesis in photosystem II, which binds to a “B” site of D1 protein; 2-methyl-4-chlorophenoxyacetic acid (MCPA) and mecoprop are phenoxy-carboxylic acid herbicides, which act as synthetic auxins; simazine and terbuthylazine are triazine herbicides acting as inhibitors of photosynthesis in photosystem II (binding to an “A” site of D1 protein).

Ecotoxicity data for these seven herbicides were obtained from the US EPA ecotoxicity database ECOTOX (http://cfpub.epa.gov/ecotox, last data acquired during November 2011) and from open literature (Supplementary Table S5). The original ecotoxicity database contained a very low number of suitable chronic NOEC or EC10 values; often the same NOEC value was repeatedly reported for different species by the work of a single author. Consequently, only EC50 values were selected for further processing. With respect to reported endpoints, we considered only toxicity data related to growth and biomass (for primary producers) and mortality and immobilization (for other organisms). Assessment of this original dataset (EC50 values) showed no difference in sensitivities between freshwater and saltwater species (Mann–Whitney U test, Z = −0.47, p = 0.64), and data from both compartments were combined together.

Original datasets were further processed and validated using different approaches, which resulted in 11 different datasets used for the modeling of SSD (Table 1 and Supplementary Table S5). Detailed explanations of the individual datasets are provided in the “Results and discussion” section.

Table 1 Different datasets of EC50 values used according to different data selection and validation approaches

A log-normal approximation was applied, and SSD curves were modeled using the Bayesian approach in R and WinBUGS according to Hayashi and Kashiwagi (2010). For SSD calculations, model 1 of Hayashi and Kashiwagi (2010) was used, and no expert knowledge was incorporated.

EU guidance (EC 2011) recommends a minimum of 10 species toxicity values for SSD modeling. This requirement was fulfilled in all datasets for glyphosate, isoproturon, simazine, and terbuthylazine and also in some subdatasets for bentazone and MCPA (see Table 2 and Supplementary Table S1). In the present study, a minimum of five species EC50 values was applied for each SSD, in agreement with some previous studies (Van den Brink et al. 2006; Schuler and Rand 2008). This minimum is also in accord with Dutch RIVM (a minimum of four values; RIZA (1999)) and US EPA (five to eight values). In the case of some datasets of mecoprop (Supplementary Table S1), the number of species EC50 values was lower than five. However, we did not discard mecoprop from the study because this herbicide is frequently detected in European rivers, and the potential risks should not be overlooked. The SSD parameters of mecoprop were, therefore, estimated according to Aldenberg and Luttik (2002) by using the standard deviation of the substance with the same TMoA (MCPA for mecoprop).

Table 2 Parameters of the normal distribution fitted to the decimal logarithm of primary producer EC50 values (mean μ and standard deviation σ), HC5 values, and number of data points in SSD models derived according to different data selection; selected “most appropriate” dataset is in italic text

Monitoring data

Concentrations of herbicides were obtained from the Flemish Environment Agency (VMM; the Flemish government, Belgium; http://www.vmm.be) and covered chemical monitoring in the Belgian part of the Scheldt river basin between 1998 and 2009. The monitoring was performed at monthly intervals and included 37 sampling stations (Fig. 2). If a concentration was reported as below the detection limit (which was not constant), a value of half the respective detection limit was assigned. The number of samples collected and analyzed per year varied from 92 (in 1998) to 266 (in 2005). Not all stations and herbicides were monitored each year. Between 1998 and 2000, only isoproturon, simazine, and terbuthylazine were analyzed systematically. After 2001, other herbicides were included, and from 2005, the total yearly number of samples settled at around 250 (see Supplementary Table S4).
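The substitution of non-detects described above can be sketched as follows. The record layout (a measured value or `None` for a non-detect, paired with the detection limit that applied to that sample) is a hypothetical assumption, since the format of the monitoring data is not described here.

```python
def impute_nondetects(samples):
    """Replace non-detects by half of the sample-specific detection limit.

    samples: list of (concentration, detection_limit) pairs, where
    concentration is None when the result was below the detection limit.
    """
    return [dl / 2.0 if value is None else value for value, dl in samples]


# Hypothetical records in ug/l; the detection limit varies between samples
records = [(0.5, 0.1), (None, 0.1), (None, 0.02)]
concentrations = impute_nondetects(records)
```

Keeping the detection limit with each record matters because the limit was not constant over the monitoring period.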

Fig. 2

Positions of the studied localities within the Belgian part of the Scheldt river basin

Risk characterization

SSDs derived for individual substances (based on acute EC50) were transformed to “chronic SSD” using an ACF of 10, as chronic impacts could be expected with respect to the long-lasting occurrence of several herbicides in the River Scheldt as demonstrated by monitoring data. An ACF of 10 was used according to De Zwart (2002), who correlated averaged chronic NOEC values and acute EC50 values for 89 different compounds and derived this “average” factor. Despite the uncertainty (the 95 % confidence interval for ACF in the study of De Zwart 2002 ranged from 8 to 94), a factor of 10 is generally recommended and used in the SSD of mixtures as a median factor for different substances. For example, Van den Brink et al. (2006) justified its use for a mixture of nine herbicides. A more appropriate way of transferring acute EC50 to chronic NOEC would be to estimate the unique ACF for each substance. However, a lack of NOEC values in the database did not allow us to use this approach and an ACF of 10 was used in the present study.
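A minimal sketch of how a single ACF can be applied to a log-normal SSD, assuming the transformation amounts to dividing every acute EC50 by the ACF. On the log10 scale this shifts the mean by log10(ACF) and leaves the standard deviation unchanged. The function name and the example values are hypothetical.

```python
from math import log10


def acute_to_chronic_mu(mu_acute, acf=10.0):
    """Shift the mean of log10(EC50) values by log10(ACF).

    Dividing each EC50 by a constant subtracts log10(constant) from
    every log value, so only the mean of the SSD moves.
    """
    return mu_acute - log10(acf)


# Hypothetical acute SSD mean (log10 ug/l); with ACF = 10 the curve
# moves one log unit towards lower concentrations
mu_chronic = acute_to_chronic_mu(2.0)
```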

To characterize the potential impact of herbicides on a community, msPAF values were calculated. msPAFs combine SSDs for individual herbicides using a mixed-model approach with concentration addition (CA) and response addition (RA) models (De Zwart and Posthuma 2005; Traas et al. 2002).

First, the concentration addition msPAFs (msPAFCA) were calculated for those substances which share the same TMoA (i.e., MCPA + mecoprop and simazine + terbuthylazine). The CA approach assumes that substances with the same TMoA act on the same target, that these substances’ SSDs have a similar standard deviation, and that only the means are different (i.e., the substances differ in their potencies) (Aldenberg and Luttik 2002; Cedergreen et al. 2008). The CA model for SSDs based on log-normal EC50 approximation is calculated as follows:

$$ \mathrm{msPAF}_{\mathrm{CA}}=\frac{1}{\sigma_{\mathrm{TMoA}}^{2}\sqrt{2\pi}}\exp\left(\frac{-\left[\log\left(\sum\nolimits_{i=1}^{n}\mathrm{HU}_{\mathrm{TMoA},i}\right)\right]^{2}}{2\sigma_{\mathrm{TMoA}}^{2}}\right) $$
(1)

where σ_TMoA is the average standard deviation of the SSDs for herbicides within the same TMoA and HU_TMoA is a hazard unit for each substance calculated by the following equation:

$$ \mathrm{HU}_{\mathrm{TMoA}}=\frac{\left(\mathrm{environmental}\,\mathrm{concentration}\right)_{\mathrm{TMoA}}}{10^{\mu}} $$
(2)

where μ is the mean of SSD for the respective chemical substance (mean of the log(EC50) values).
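The concentration addition step can be sketched as follows. Equation 1 as printed has the form of a normal density, whereas this sketch assumes that the affected fraction is read from the cumulative log-normal SSD (as in Fig. 1), evaluated at the logarithm of the summed hazard units with mean 0 and the average standard deviation of the TMoA group. That reading, the function names, and the example values are assumptions for illustration only.

```python
from math import log10
from statistics import NormalDist, mean


def hazard_unit(concentration, mu):
    """Equation 2: environmental concentration scaled by 10**mu."""
    return concentration / 10 ** mu


def mspaf_ca(concentrations, mus, sigmas):
    """msPAF for substances sharing one TMoA (concentration addition).

    ASSUMPTION: the fraction is taken from the cumulative normal
    distribution of log10(sum of hazard units), with standard
    deviation equal to the average sigma of the group.
    """
    hu_sum = sum(hazard_unit(c, m) for c, m in zip(concentrations, mus))
    if hu_sum <= 0:
        return 0.0  # nothing detected, nothing affected
    return NormalDist(0.0, mean(sigmas)).cdf(log10(hu_sum))


# Hypothetical two-substance group: each contributes 0.5 hazard units,
# so the sum equals 1 and half of the species are predicted to be affected
fraction = mspaf_ca([50.0, 5.0], [2.0, 1.0], [0.7, 0.9])
```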

In the second step, the response addition msPAFs (msPAFRA) for herbicides (or groups of herbicides) with different TMoA were calculated. RA (the term “independent action” is used for single-species dose–response models) assumes that substances with different TMoA do not interact and thus act independently. The final mixture effect is, therefore, based only on the combined probabilities calculated for all substances in the mixture (Cedergreen et al. 2008; De Zwart and Posthuma 2005). The solution of the RA model for SSD according to Traas et al. (2002) is:

$$ \mathrm{msPA}{{\mathrm{F}}_{\mathrm{RA}}}=1-\prod\nolimits_{i=1}^n {\left( {1-\mathrm{msPA}{{\mathrm{F}}_i}} \right)} $$
(3)

where msPAF_i stands for msPAFCA (for a group of herbicides with the same TMoA) or PAF (for a single herbicide with unique TMoA).
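Equation 3 translates directly into code. In the minimal sketch below, the input list holds one fraction per TMoA group (an msPAFCA value or a single-substance PAF); the function name and the example values are hypothetical.

```python
from math import prod


def mspaf_ra(fractions):
    """Response addition: 1 minus the product of the unaffected fractions."""
    return 1.0 - prod(1.0 - f for f in fractions)


# Hypothetical fractions for two independent TMoA groups:
# 1 - (0.9 * 0.8) is approximately 0.28
combined = mspaf_ra([0.1, 0.2])
```

Because the unaffected fractions are multiplied, the combined value can never be lower than the largest individual fraction.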

For data management, SSD derivation, and msPAF calculations, we used R 2.14.0 (R Development Core Team, Vienna, Austria), WinBUGS 1.4.3 (Imperial College and MRC, UK), Statistica 10 (StatSoft, Tulsa, OK, USA), and Microsoft Excel 2010.

Results and discussion

Although SSD is often used for prospective risk assessment and setting the EQS (EC 2003, 2011), the method has been only poorly characterized in terms of its application for the retrospective risk assessment of mixtures at the river basin scale. To achieve the goal of the present paper, i.e., to characterize the impacts of different data processing and validation approaches on the outcomes of SSD, we first assessed the influence of the selection of input EC50 data on SSD outcomes. Five herbicides were selected for this first assessment, one from each TMoA group. Further, the most appropriate EC50 datasets were tested with respect to the pesticide monitoring case of the Scheldt river basin.

Influence of different data validation approaches on the SSD

Different exposure durations

The exposure duration of experiments from which ecotoxicity values (mostly EC50) are derived is one of the most important factors, but variable ranges have been used by different authors (De Zwart 2002; Van den Brink et al. 2006; Schuler and Rand 2008). In the present exercise, herbicide ecotoxicity data obtained with the target organisms (primary producers) were selected. Data were validated by (1) selecting EC50 values only and (2) checking and removing replicates and outlying values, which were commonly found in ecotoxicity databases. The outliers, i.e., EC50 values outside of the 3σ interval of the SSD, were traced back to the original articles and their reliability was assessed case by case. Finally, (3) for species with more than one EC50, the values were averaged by a geometric mean so that the data were equally representative among species (a single data point for each species). Five different categories were then defined (Table 1 and Supplementary Table S2). Dataset no. 1 combined all available ecotoxicity data, from very short (usually in vitro biochemical) experiments lasting minutes to hours up to experiments lasting more than 3 months (96 days). The very short, ecologically irrelevant values (<1 day), which could bias overall risks, were excluded in dataset no. 2, which then contained data from 1- to 96-day-long ecotoxicological experiments. The other datasets (nos. 3, 4, and 5a) contained different short-term EC50 values because the term “acute” is not well defined and substantial differences exist among primary producers (microalgae vs. macrophytes). Two datasets did not discriminate between algae and macrophytes, and the categories were 1–4 days (dataset no. 3; i.e., the general “acute” range in ecotoxicology) and 1–7 days (dataset no. 5a, selected for pragmatic reasons as more ecotoxicity values were available). Dataset no. 4 included exposure durations reflecting OECD guidelines (OECD 2006a, 2011) with data from <1 to 3 days for algae and from <1 to 7 days for macrophytes.
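The validation steps described above (flagging values outside the 3σ interval and reducing each species to one geometric mean) can be sketched as follows. The sketch assumes that the 3σ rule is applied to log10-transformed EC50 values using their sample mean and standard deviation; flagged values are only returned for manual checking, mirroring the case-by-case assessment. Function names and data are hypothetical.

```python
from collections import defaultdict
from math import log10
from statistics import mean, stdev


def flag_outliers(ec50_values, k=3.0):
    """Return EC50 values whose log10 lies outside mean +/- k * SD.

    Needs at least two values; flagged values are candidates for
    checking against the original articles, not automatic deletions.
    """
    logs = [log10(v) for v in ec50_values]
    mu, sd = mean(logs), stdev(logs)
    return [v for v, x in zip(ec50_values, logs) if abs(x - mu) > k * sd]


def species_geomeans(records):
    """Collapse (species, EC50) records to one geometric mean per species."""
    by_species = defaultdict(list)
    for species, ec50 in records:
        by_species[species].append(log10(ec50))
    # The mean of the logs, back-transformed, is the geometric mean
    return {sp: 10 ** mean(logs) for sp, logs in by_species.items()}


# Hypothetical records (ug/l): species A was tested twice
points = species_geomeans([("A", 10.0), ("A", 1000.0), ("B", 5.0)])
```

Note that a 3σ rule based on the sample standard deviation cannot flag any value in a set of ten or fewer, so it is only informative for larger datasets.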

Example comparisons of the SSD curves for five herbicides and three exposure categories are presented in Fig. 3, and detailed characteristics of the modeled SSD values (number of data points [N], mean, standard deviation, and hazard concentration predicting the impact on 5 % of the community [HC5]) are presented in Table 2. As is apparent, for example, with bentazone, glyphosate, and MCPA, the narrower the range of exposure durations, the smaller the number of EC50 values and the higher the uncertainty of SSD (demonstrated by broader 90 % confidence intervals, especially on the left tail of the curve; see Fig. 3 and HC5 values in Table 2). For these statistical reasons, the largest number of values in dataset no. 1 (<1–96 days) could be the best choice. However, it is obvious (for example, with simazine or MCPA; Fig. 3 and Table 2) that many long (>7 days) and/or too short exposures (<1 day) increased the standard deviation of the SSD curve. Large uncertainties and standard deviations were also observed for datasets with very small numbers of EC50 values (dataset nos. 3 and 4; an extreme case was, for example, MCPA—only a single EC50 value was available for dataset no. 4). Therefore, dataset no. 5a (exposure durations, 1–7 days) was found to be a good compromise representing short-term acute exposures by a sufficiently large number of data and reasonably small confidence intervals.

Fig. 3

SSD for five herbicides (bentazone, glyphosate, isoproturon, MCPA, and simazine): comparison of SSD curves modeled from data with different categories of exposure durations. Each point is the geometric mean of EC50 values for a single species. Normality (Anderson–Darling goodness-of-fit test, p > 0.05) was accepted in all datasets except for simazine (dataset nos. 1 and 4)

Purity of the herbicide formulation

Many ecotoxicological experiments tested pesticides as commercial formulations, which, apart from the active ingredient, also contain various additives such as detergents or stabilizers, and the final EC50 value may not necessarily represent the effect of the pesticide itself (Beggel et al. 2010; Cedergreen and Streibig 2005; Pereira et al. 2000). To check the influence of this bias within the original ecotoxicity database, the data from the selected dataset no. 5a (1–7 days) were further validated and only the values for pure herbicides (≥90 % active ingredient) were selected for dataset no. 5b. As shown in Table 2, no major differences were observed between SSDs from dataset nos. 5a vs. 5b, except for bentazone. For this herbicide (Fig. S1), better normality was achieved with dataset no. 5b (the p value of the Anderson–Darling goodness-of-fit test increased from 0.09 to 0.54), while the HC5 value decreased approximately twofold (Table 2). Taken together, we recommend detailed data validation that also considers the purity of herbicides used in the original ecotoxicological experiments. This validation step provided more relevant data (effects of additives in the pesticide formulation excluded), and the statistical assumptions (the normality) were not affected or even improved.

Dealing with more ecotoxicological data values for one species

The compilation of ecotoxicity data (EC50 values) for SSD modeling assumes that all taxa (species) are equally represented in the dataset (Van den Brink et al. 2008), and it is recommended to use only a single toxicity value (EC50) for each species (Suter et al. 2002). However, two different approaches to deriving this single value have been discussed. The first approach, which was also used in the present study (dataset no. 5b), considers intraspecies and interlaboratory variability and estimates the “average” sensitivity of the species by using the geometric mean of all available EC50 values (Maltby et al. 2005; Raimondo et al. 2008). The other approach focuses on the worst case by selecting the “most sensitive” endpoint, i.e., the lowest EC50 value for each species (Hayashi and Kashiwagi 2011; Schuler and Rand 2008). To assess the actual differences, a specific dataset no. 5c (“the most sensitive” approach) was prepared and compared with the corresponding dataset no. 5b. It should be mentioned that neither of these approaches in the present study considered differences in life stages or test conditions (such as pH, hardness, or dissolved organic carbon) (Hayashi and Kashiwagi 2011). As expected, using lower EC50 values in dataset no. 5c shifted SSDs to lower concentrations (see HC5 values in Table 2 and Fig. S2 in the Electronic supplementary material). With respect to normality, we recorded one improvement (glyphosate, where the p value of the Anderson–Darling goodness-of-fit test increased from 0.16 to 0.64) and four deteriorations (e.g., bentazone). The two approaches thus lead to different SSD outcomes. Using only the smallest EC50 values gives more weight to sensitive life stages and clones and therefore seems more conservative. However, it could also overestimate the actual risk because discarding part of the available data gives more weight to potential outliers (see, for example, bentazone and glyphosate in Supplementary Fig. S2). 
The use of the “averaged” all-species values (geometric mean) is probably less conservative, but it may minimize the influence of experimental errors from experimental ecotoxicity tests (Wheeler et al. 2002).

Finally, the approach based on the geometric means of validated EC50 values from primary producers with an exposure duration of 1–7 days (dataset no. 5b) was selected for the present case study on the Scheldt river basin as described in the following sections. This, in our opinion, was the most appropriate dataset and was used for msPAFs calculation; the results were then compared with msPAFs derived by other approaches suggested in the literature (the use of nonvalidated datasets, the use of complete ecotoxicity data for taxa other than primary producers alone, and the use of the most sensitive EC50 endpoint).

Risks of herbicide mixtures for the Scheldt river basin localities

Rather than assessing the complex ecotoxicological status in the Scheldt river basin, the objective of the present study was to investigate SSD suitability in the retrospective ERA by testing the influence of input data on the range and variability of predicted msPAF (see the “The role of different data validations on retrospective risk assessment—msPAFRA ” section). Nevertheless, the described approach provided simple comparisons of relative risks between localities and years. In spite of uncertainties (e.g., a limited number of herbicides, a relatively low number of EC50s for some compounds, a single ACF for all substances, the selected statistical method or data validation approach), predicted msPAF values for herbicide mixtures provided reasonable prioritization of localities and the identification of the most affected ones (see also Supplementary Table S3).

Estimated risks (msPAF) for individual monitored sites, months, and years within the Scheldt river basin, Belgium, during the 12-year monitoring period are presented in Fig. 4. A clear decrease in estimated risks (the lowering of msPAFs) was observed during the years 1998–2009 (Figs. 4 and 5, Supplementary Table S3), which reflected the decreasing concentrations of some herbicides (isoproturon and simazine). Yearly means of msPAFRA (all localities taken together) were significantly correlated with simazine and isoproturon concentrations (Spearman rank order correlation R_S = 0.7 and 0.67, respectively, p < 0.05).

Fig. 4

Calculated risks (chronic msPAFRA values) for 37 localities within the Scheldt river basin; a presentation of the results during the 1998–2009 monitoring, b time dependency of msPAF for localities with at least one msPAF > 0.05 and for the years 2007–2009 (split into two plots for better differentiation). Each point represents the calculated msPAF (monitoring was not done each month); the red line indicates msPAFRA = 0.05

Fig. 5

Yearly mean msPAFRA values (a) and concentrations of herbicides (b) at all localities together. Herbicides with higher toxicity (i.e., HC5 < 100 μg/l) are shown in red

Until 2004, the monthly levels of predicted risk often exceeded an msPAF of 0.05 at several sites. This 5 % threshold is used in the prospective SSD for deriving predicted no-effect concentrations as the HC5 (the hazardous concentration of a substance affecting 5 % of species). If the msPAF exceeds 0.05, the ecosystem is considered significantly affected (EC 2011). The present study shows that, at many localities, the msPAF reached 0.4 and, for seven localities, the predicted risks were even higher (Supplementary Table S3). Less pronounced effects were derived for the more recent years 2005–2009, when none of the msPAFs exceeded 0.3. The highest values were found at localities 28–36, which are mostly situated in regions with a higher proportion of arable land (Supplementary Table S3 and Fig. 2). Clear relationships between msPAF and the periods of herbicide application were recorded, with elevated msPAFRA values from March to July and in October and November (Fig. 4).

The derived msPAFRA corresponded to previous studies. For example, Schuler and Rand (2008) investigated the acute effects of 11 PSII inhibitors (+ norflurazon) on aquatic primary producers in Florida and predicted impacts on up to 27 % of the community (msPAF = 27 %). In the Scheldt river basin, Comte et al. (2010) predicted the acute effects of chemical mixtures on invertebrates with maximum msPAFs up to 25 %. Carafa et al. (2011) predicted the acute effects of the mixture of 60 different contaminants in Catalonia, Spain, with maximum msPAF up to 100 %. Lower msPAFs (<5 %) were reported by Faggiano et al. (2010), who predicted the acute effects of 26 pesticides at the Adour-Garonne basin, France.

In our study, the importance of actual herbicide ecotoxicity for the prediction of risk was clearly revealed. While the measured concentrations (yearly means) differed by a maximum of two orders of magnitude across all sampling sites, the SSD curves of individual herbicides differed up to three orders of magnitude (see Table 2 and Supplementary Table S4). The final msPAFRA value combining the measured concentrations with ecotoxicity indices clearly provides more realistic risk prediction than concentrations alone.

The investigated samples contained herbicide mixtures with highly variable concentrations and ratios. Our further investigation aimed to elucidate the influence of individual chemicals on the overall msPAF, thus identifying hazardous compounds with the highest contribution to overall risk. A subset of the monitoring data was selected to cover periods of elevated pesticide concentrations (March to July + November) during two particular years with different levels of contamination (1999 vs. 2009), and the relative importance of each herbicide (or class with the same TMoA) was calculated as a weighted rank. For each sample, the PAFs of individual herbicides (or TMoA classes) were ranked (the higher the PAF value, the bigger the influence of the respective compound on msPAF). The obtained rank values (i) were then weighted by dividing by the number of herbicides (n) that were actually determined in the specific sample (not all pesticides were present in each sample). The final value (weighted rank = i/n) indicates the relative contribution of the individual compound (or TMoA class) to the total msPAF (the smaller the value of the weighted rank, the higher the importance of an individual herbicide; see Fig. 6). During 1999, monitoring data for only three individual pesticides from two TMoA classes were available (isoproturon and simazine + terbuthylazine), and as is apparent from Fig. 6, the influences of these herbicides on the msPAFRA were clearly comparable. During 2009, simazine + terbuthylazine still had the highest importance (the lowest weighted ranks), while the group of MCPA + mecoprop appeared to be the least hazardous. For isoproturon, two peaks in the distribution of ranks were observed (Fig. 6), which can be linked to the specific application of this herbicide during the autumn months.
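
The weighted rank calculation described above can be sketched as follows. This is a minimal illustration, assuming that rank 1 is assigned to the highest PAF and that ties are broken by input order (the text does not specify tie handling); the function name and the PAF values are hypothetical.

```python
def weighted_ranks(pafs):
    """Return weighted rank i/n for each compound or TMoA class in one sample.

    `pafs` maps each compound (or class) actually determined in the sample
    to its PAF. The highest PAF gets rank i = 1, so a smaller weighted rank
    means a larger contribution to the msPAF.
    """
    n = len(pafs)
    ordered = sorted(pafs, key=pafs.get, reverse=True)
    return {name: (i + 1) / n for i, name in enumerate(ordered)}

# Hypothetical PAF values for a single sample.
sample = {
    "simazine + terbuthylazine": 0.20,
    "isoproturon": 0.05,
    "MCPA + mecoprop": 0.01,
}
ranks = weighted_ranks(sample)
```

Dividing by n keeps the values comparable across samples in which different numbers of herbicides were determined, since the least influential compound always receives a weighted rank of 1.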

Fig. 6

The influence of individual pesticides (or groups of pesticides with the same TMoA) on predicted risks (all localities together). The graphs show weighted ranks of PAF values in the years 1999 (upper panel) and 2009 (lower panel); for calculations, see the “Risks of herbicide mixtures for the Scheldt river basin localities” section. While in 1999 both compared pesticides had the same impact on predicted risks (similar percent distributions of weighted ranks), in 2009, simazine + terbuthylazine were the most problematic (very high percent fractions at the two lowest, i.e., most influential, weighted rank categories <0.2 and 0.2–0.4)

The role of different data validations on retrospective risk assessment—msPAFRA

Risk predictions described in the previous paragraphs were based on the selected—and, in our opinion, the most appropriate—dataset of ecotoxicity values (i.e., dataset no. 5b), which included only validated acute data (exposure of 1–7 days, purity ≥90 %, a single geometric mean EC50 for each species, and the exclusion of replicates and outliers) with respect to a group of primary producers (i.e., the most sensitive target taxa). However, other authors have suggested different approaches (Comte et al. 2010; Faggiano et al. 2010; Schuler and Rand 2008), and the major objective of the present paper was to investigate actual changes in the retrospective msPAFs after the use of different ecotoxicity datasets. For this exercise, values of msPAF were calculated for the years 2007, 2008, and 2009, which are relevant to the current exposure situation. The results are shown in Figs. 7 and 8.

Fig. 7

Predicted risks (msPAFRA values) at 37 localities of the Scheldt river basin (2007, April to June)—influence of the EC50 datasets on the SSD outcomes (EC50 values for primary producers vs. all taxa; validated vs. nonvalidated datasets; “geomean” vs. “most sensitive” endpoint approach)

Fig. 8

Relationships between predicted risks (msPAFRA values) at 37 localities of the Scheldt river basin (2007–2009, April to June) using different EC50 datasets. The msPAFRA values predicted using the reference dataset no. 5b (primary producers, geomean, validated data) are compared with other selected dataset nos. 5c, 6a, and 6b. The dashed line indicates a 1:1 relationship

Comparison of validated vs. nonvalidated datasets for primary producers (dataset no. 5b vs. 6b; Figs. 7 and 8) showed no major difference in the final msPAFs, although slightly lower msPAF values were obtained using nonvalidated data. With respect to individual compounds, the mean sensitivity (mean of the SSD) shifted to either higher (e.g., bentazone, glyphosate, terbuthylazine) or lower concentrations (other compounds), and the variability increased (higher standard deviation of the SSD) when the larger nonvalidated datasets for primary producers were used (for example, for MCPA, the standard deviation almost doubled; see Table 2).

More pronounced differences were observed when EC50 values for all taxa (validated data; dataset no. 6a) were included in the risk calculations. Smaller msPAF values (i.e., msPAF < 0.1 derived originally with the primary producer dataset no. 5b) generally increased when using the larger dataset no. 6a (see Fig. 7; Wilcoxon matched pair test, p < 0.05). In contrast, the larger original msPAFs calculated with dataset no. 5b (msPAF > 0.1) were lowered when dataset no. 6a was used (Fig. 8). Similar trends were also observed when all available ecotoxicity data (all taxa, all exposure durations, nonvalidated; dataset no. 6c) were used for msPAF estimation. The pronounced differences in msPAFRA can be attributed to changes in the SSD parameters of individual herbicides. Use of dataset no. 6a instead of no. 5b resulted in an increase in both the mean and the variability of the single-substance SSDs (see example of simazine in Supplementary Fig. S3). The only exception was the SSD for isoproturon, for which mostly ecotoxicity data for primary producers were available (see Table 2). Differences observed between these two approaches (dataset no. 5b vs. 6a) were statistically significant and may have a strong impact on predicted risks (large numerical differences in msPAFRA).
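
One way to see why low msPAFs rise and high msPAFs fall when the SSD becomes wider is to evaluate two log-normal SSDs at a low and a high concentration. The sketch below assumes a log-normal SSD; the two parameter sets are hypothetical stand-ins for a narrow primary producer SSD and a wider all-taxa SSD with a higher mean, and do not reproduce the values in Table 2.

```python
import math
from statistics import NormalDist

def paf(conc, mu_log10, sigma_log10):
    """PAF at concentration `conc`, assuming a log-normal SSD (log10 scale)."""
    return NormalDist(mu_log10, sigma_log10).cdf(math.log10(conc))

# Hypothetical SSD parameters (mean, standard deviation of log10 EC50).
narrow_ssd = (2.0, 0.5)  # e.g., primary producers only
wide_ssd = (2.3, 1.2)    # e.g., all taxa: higher mean, larger variability

low_conc, high_conc = 6.3, 100.0  # hypothetical exposure concentrations

paf_low_narrow = paf(low_conc, *narrow_ssd)
paf_low_wide = paf(low_conc, *wide_ssd)
paf_high_narrow = paf(high_conc, *narrow_ssd)
paf_high_wide = paf(high_conc, *wide_ssd)

# The flatter (wider) curve lies above the narrow one in the lower tail
# and below it near the centre of the distribution.
low_risk_increases = paf_low_wide > paf_low_narrow
high_risk_decreases = paf_high_wide < paf_high_narrow
```

With these illustrative parameters the two curves cross between the two concentrations, which mirrors the qualitative pattern reported above but says nothing about where the crossover lies for the actual datasets.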

Differences in msPAF values were also observed between datasets based on mean sensitivity values (“geomean” dataset no. 5b—primary producers, no. 6a—all taxa) and the “most sensitive” values (dataset nos. 5c and 6d). For primary producer data, msPAFRA from dataset no. 5c were on average two times higher than values predicted from dataset no. 5b (see Fig. 8). Less pronounced differences were observed when data for all taxa were used (“geomean” dataset no. 6a vs. the “most sensitive” dataset no. 6d).
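
The difference between the two endpoint approaches can be illustrated by how several EC50 values for one species are reduced to a single SSD input. The sketch below is only an assumption about the data handling, using hypothetical species labels and EC50 values; the function name is not from the study.

```python
from collections import defaultdict
from statistics import geometric_mean

def one_value_per_species(records, approach):
    """Reduce (species, EC50) records to one EC50 per species.

    approach="geomean" takes the geometric mean of all EC50 values of a
    species; approach="most_sensitive" takes the lowest EC50.
    """
    by_species = defaultdict(list)
    for species, ec50 in records:
        by_species[species].append(ec50)
    if approach == "geomean":
        return {s: geometric_mean(v) for s, v in by_species.items()}
    if approach == "most_sensitive":
        return {s: min(v) for s, v in by_species.items()}
    raise ValueError(f"unknown approach: {approach}")

# Hypothetical EC50 records (same concentration unit throughout).
records = [("species_A", 10.0), ("species_A", 1000.0), ("species_B", 50.0)]

geomean_values = one_value_per_species(records, "geomean")
most_sensitive_values = one_value_per_species(records, "most_sensitive")
```

Because the lowest EC50 of a species can never exceed its geometric mean, the "most sensitive" inputs place the SSD at lower or equal concentrations, which is consistent with the higher msPAFRA values reported for dataset no. 5c.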

Overall, testing different datasets within the retrospective risk assessment of herbicides clearly confirmed the critical influence of taxa selection on the final msPAF values. This is in agreement with previous discussions of prospective SSD and HC5 calculations (Maltby et al. 2005; Van den Brink et al. 2006; Wagner and Lokke 1991) as well as studies of species weighting (Duboudin et al. 2004; Hayashi and Kashiwagi 2010). Duboudin et al. (2004) suggested weighting taxa according to their relevant environmental proportions, and Hayashi and Kashiwagi (2010) developed a Bayesian model which, in addition to taxa weighting, gives specific means and variances for different taxonomic groups. Our study also confirmed the importance of another factor affecting the predicted risk value, namely the decision to use “average” or “the most sensitive” data for individual species. While the average sensitivity (the “geomean” in our study) is often used in SSD, use of the “most sensitive” data is required by some conservative guidance documents for probabilistic risk assessment. No simple recommendation can be derived at this point, and case-by-case evaluation is needed. Possible improvements could be achieved by preselecting ecotoxicity data according to experimental conditions such as appropriate water hardness, pH, organic matter, or temperature (EC 2003; Hayashi and Kashiwagi 2011). However, practical use of this demanding approach is limited because the required experimental details are often missing from both databases and original research studies.

In contrast, the present study with herbicide mixtures demonstrated that data validation (i.e., the manual deletion of replicates and outliers in the database, searching for new data in public literature, the detailed selection of exposure duration, or the checking of pesticide purity) has only a minor influence on the predicted msPAF values in practice, and these time-consuming procedures can therefore potentially be omitted.

Conclusions

The present study demonstrates the application of the SSD approach for the retrospective risk assessment of herbicides at the river basin scale. Relatively high impacts of herbicide mixtures on aquatic primary producers were predicted by acute msPAF (often up to 40 % or more of the community affected), but the risk appeared to decrease during the 1998–2009 period. It was found that different approaches to validating the original EC50 datasets (e.g., testing different exposure durations, investigating different purities of the studied herbicides, removing outlying and replicate values) substantially affected the sensitivity distributions at the level of individual studied herbicides. However, these demanding validation procedures had only a minor impact on the outcome of the retrospective risk assessment of the mixtures in the field case study. Therefore, for practical applications of SSD in retrospective ERA, the use of raw nonvalidated data seems to provide robust results, especially when few ecotoxicity values are available for certain compound(s). It was demonstrated that (1) the selection of the appropriate taxonomic group for the SSD calculation and (2) the decision whether to use “average” or the “most sensitive” data for individual species are the critical factors affecting the retrospective msPAF values.