1 Introduction

To date, the capability of analytical methods is commonly related to the analysis of matrix-matched samples or spiked homogenised natural samples collected at a representative site. The involved uncertainty of the obtained analyte concentration values therefore reflects mainly the contribution of the analytical process (EURACHEM/CITAC Guide 2000). But particularly in environmental research, the knowledge of analyte concentration in an individual subsample is not sufficient or even misleading. The interpretation of environmental impacts rather requires the specification of a batch of material. With this respect, the term measurand is used, and its uncertainty also includes the effects arising from the process of sampling and the preparation of the test samples (Ramsey and Ellison 2007; ISO 11648-1 2003). In accordance with that, the uncertainty of the measurand includes particularly any necessary allowance for heterogeneity of the bulk.

The presented investigations focused on a case study monitoring the content of environmental essential macronutrients (P, K, S, Ca and Mg) in agricultural crops which were cultivated in a setup of lysimeter trials. The objective of this trial was both to evaluate the impact of different mineral fertiliser application rates and to test the influence of different land management intensities (fertiliser application rate linked with different irrigation regimes) on the uptake of macronutrients by plants (grassland, arable land), respectively.

The interpretation of the results will focus on the analytical results for phosphorus because it is of particular importance for groundwater and surface water pollution via runoff by rain water, especially under the aspect that phosphorus is the limiting nutrient for algae blooming in aquatic environment.

To examine the impact of land management activities on the environment, it proves of particular concern to control the budget of the essential macronutrients in plants by means of an appropriate analytical method.

In the present paper, we focused on X-ray fluorescence analysis as an approved and cost-effective analytical method of solid-state analysis.

To evaluate the appropriateness of the proposed method, the results of recently published works (Meissner et al. 2002; Godlinski et al. 2004) were used as a reference to assess relevant performance requirements of the shortlisted analytical procedure. Particularly in the framework of the fitness for purpose concept (Ramsey and Ellison 2007; Lyn et al. 2007a), it proves important to be aware of the minimum variations of analyte concentrations in the plant material which can be just confirmed by the analytical method in dependence on the sample size. The corresponding investigations require reliable estimates for the measurement uncertainty (including both analytical and sampling components). For this purpose, the analytical data provided by a survey across several lysimeter stations were subjected to analysis of variance (ANOVA).

2 Materials and Methods

2.1 Lysimeter Trial

The presented investigations based on a set of different non-weighable lysimeters each with a depth of 125 cm and a surface area of 1 m2. These lysimeters had been filled with soil types that are representative for the River Elbe catchment (Leinweber et al. 1999).

The lysimeters were run since 1991 under treatments reflecting land use and soil management.

The results of a lysimeter trial, including grass as well as arable land used lysimeters were considered as a case study. Fresh matter and the dry matter yields were determined after harvesting the biomass grown up on the lysimeters. The annual nutrient withdrawal at last was calculated from the nutrient concentrations determined by the analytical method and the amount of total dry matter as estimated before.

2.2 Sampling Protocol

Within this study, a sample was taken to represent the entire biomass of a lysimeter station (i.e. the sampling target). The application of the specified sample protocol aimed at the extraction of each two duplicates D i,1 and D i,2 from each sampling target L i (see Fig. 1).

Fig. 1
figure 1

Nested sampling design of the sampling target L i . D i,1 and D i,2 denote the duplicates taken from the lysimeter Li. A i,1,1, A i,1,2 and A i,2,1, A i,2,2 are the test samples (intended for analysis) prepared from both duplicates D i,1 and D i,2

The harvested material was air-dried (at 80°C), afterwards milled in an ultracentrifugal mill Z1 (Retsch, Haan, Germany; applying a sieve with a mesh size of 0.25 mm) and finally homogenised by shaking for about 5 min. From this partly prepared sampling target, one composite sample was taken that should represent the typical composition of the primary sample. For this purpose, a number of grab samples (increments of about 0.5 g) were taken from the prepared target at random by means of a spatula. From each sampling target in this way, a mass of each 4 g was selected for sample preparation. In the same way, a second sample was taken to reflect the inevitable variations in the sampling protocol caused, e.g. by the process of collecting a combined sample and the effect of small-scale heterogeneities in the sampling target. Each duplicate sample was analysed twice, yielding at last four analytical results (A i,1,1, A i,1,2 and A i,2,1, A i,2,2 ) evaluating a target.

2.3 Sample Preparation

The total amount of each extracted duplicate was divided into two equal portions (each 2 g), yielding each two separate test portions (T i,1,1, T i,1,2 and T i,2,1, T i,2,2) for the duplicates D i,1 and D i,2. Sample preparation was accomplished by mixing the test portions (2 g) with stearine wax [Hoechst wax for X-ray fluorescence (XRF) analysis] as binder in a ratio 80:20 (w/w) and by afterwards pressing the mixed powder into pellets with a diameter of 32 mm and a total mass of 2.5 g.

2.4 Analysis of Variance

The database required for the ANOVA was performed by analytical runs applied to the nested sampling design (duplicate method), as shown in Fig. 1. In order to separate the variance components associated with the processes of sampling (denoted as factor D) and analysis (denoted as factor A) from the variation between the lysimeters (denoted as factor L), the appropriate data sets were processed by nested ANOVA (Massart et al. 1997; Dehouck et al. 2003), yielding the sum of squares SS A , SSD(L) and SS L . Dividing the SS by the corresponding degrees of freedom, df, results in the mean squares MS, which at last will be utilised to compute the corresponding variance components as listed in Table 1 at which l, d and a denote the number of investigated lysimeters (l = 16), the number of replicates for each sample (d = 2) as well as the number of analysis (a = 2).

Table 1 Notation and calculation of variance components provided by ANOVA (l is the number of lysimeters (l = 16), d is the number of sample replicates (d = 2) and a is the number of analytical replicates (a = 2)

The involved mean squares MSA thereby represent the error terms due to the analytical variation MSL the effect of the between lysimeter variation and MSD(L) the effect of factor D (sampling) nested in factor L, as it is also obvious from Fig. 1. Assuming that the sources of variation are independent, the total variance was approached by the expression \( s_{\rm T}^2 = s_{{\rm samp} }^2 + s_{{\rm ana} }^2 + s_{\rm L}^2 \) (Thompson and Maguire 1993).

The combined standard uncertainty of the measurement, u c, finally results from the combination of the variances of sampling and analysis according to \( {u_{\rm c} } = {s_{meas}} = \sqrt {s_{{\rm samp} }^2 + s_{{\rm ana} }^2} \), or expressed in terms of \( r = s_{{\rm ana} }^2/s_{{\rm samp} }^2 \):

$$ {s_{meas}} = {s_{ana}}\sqrt {\frac{{1 + r}}{r}} . $$
(1)

The actual degrees of freedom for \( s_{ana}^2 \), related to the hierarchical duplicated design used, amounts to df ana = 32. In contrast to this, the degrees of freedom for the combined estimates as \( s_{samp}^2,\,s_{\rm L}^2 \) or \( s_{meas}^2 \), which all result from linear combinations of the independent mean squares MSD(L), MSA and MSL, have to be approached by means of effective degrees of freedom \( d\widehat{f} \) (Lyn et al. 2007b). In particular, the interesting combined measurement variance \( s_{meas}^2 \) is obtained from:

$$ s_{meas}^2 = \frac{{{{{\rm MS} }_{D(L)}} + \left( {a - 1} \right){{{\rm MS} }_A}}}{a}. $$
(2)

Consequently, the actual number of degrees of freedom for \( s_{meas}^2 \) is not simply the sum of degrees of freedom (\( d{f_{samp}} + d{f_{ana}} \)) of the involved variance components, but it may be approximated rather by effective degrees of freedom \(d\widehat{f}_{{meas}} < df_{{samp}} + df_{{ana}} \) obtained from the Welch–Satterthwaite equation (Satterthwaite 1946) according to:

$$ d{\hat f_{meas}} = \frac{{{{\left( {\frac{1}{a} \times {{{\rm MS} }_{D(L)}} + \frac{{a - 1}}{a} \times {{{\rm MS} }_A}} \right)}^2}}}{{\frac{1}{{{a^2}}}\frac{{{\rm MS}_{D(L)}^2}}{{l \times \left( {d - 1} \right)}} + {{\left( {\frac{{a - 1}}{a}} \right)}^2}\frac{{{\rm MS}_A^2}}{{l \times d \times \left( {a - 1} \right)}}}}. $$
(3)

In consideration of \( s_{ana}^2 = {{\rm MS}_A} \) (Table 1) as well as the specification a = 2 (duplicate analysis), Eq. 2 can be rearranged yielding \( {{\rm MS}_{D(L)}} = s_{{\rm ana} }^2 + 2 \times s_{samp}^2 \), by what Eq. 3 can also be expressed in terms of the ratio r, leading to the more clearly arranged expression:

$$ d{\widehat{f}_{meas}} = \frac{{4 \times {{\left( {1 + r} \right)}^2}}}{{\frac{{{{\left( {2 + r} \right)}^2}}}{{l \times \left( {d - 1} \right)}} + \frac{{{r^2}}}{{l \times d}}}}. $$
(4)

Referring to the investigated sample design (l = 16, d = a = 2), the calculated values \( d{\widehat{f}_{meas}} \) (4) therefore cover at maximum the range of \(16 \leqslant d\widehat{f}_{{meas}} < 42.7\) in dependence on the variation of the ratio r in the interval \(0 < r < \infty \), which corresponds to the total scope of experimental conditions \( s_{{\rm ana} }^2 \ll s_{samp}^2 \) up to \( s_{ana}^2 \gg s_{samp}^2 \).

2.5 Minimum Detectable Concentration Difference and Sample Size

In the present paper, the concept of the validation parameters critical concentration CCα—“decision limit”—and the critical concentration CCβ—“detection capability”—(Council Directive 96/23/EC 2002) was used to define the performance parameter Δx min (minimum detectable concentration difference at a permitted concentration level x mean).

The decision limit, CCα, corresponds to an individual measurement result. By means of the critical concentration, CCα, the result x can be classified as compliant or non-compliant in dependence whether x < CCα or x ≥ CCα. At and above CCα (x ≥ CCα), consequently, it can be concluded with an error probability p ≤ α that the measurement result x is non-compliant (see Fig. 2).

Fig. 2
figure 2

Schematic diagram for the derivation of minimum detectable concentration difference, ∆x min, at the permitted concentration level x mean in consideration of a measurement variance \( s_{meas}^2 \), decision limit CCα, detection capability CCβ and the non-centrality parameter \( \delta = {t_{1 - \alpha }} + {t_{1 - \beta }} \)

The detection capability on the other hand represents a basic feature of the method and is defined as that concentration at and above which (x ≥ CCβ) the analytical method is able to detect true analyte concentrations in a sample with a probability p ≥ 1 − β.

In the present study, CCβ was used to define the critical parameter \( \Delta {x_{\min }} = \left| {{x_{mean}} - {{{\rm CC} }_\beta }} \right| \) which corresponds to the minimum detectable concentration difference around x mean (see Fig. 2). For true variations Δx ≥ Δx min accordingly, it can be concluded that the method is able to confirm Δx with a statistical certainty p ≥ 1 − β. On the other hand, true variations Δx ≥ Δx min remain undetected with an error probability pβ.

Based on the definition of the non-centrality parameter \( \delta = \delta \left( {d{{\widehat{f}}_{meas}};\alpha; \beta } \right) \) (ISO 11843), the minimum detectable concentration difference, Δx min, is given by \( \Delta {x_{\min }} = \delta \left( {d{{\widehat{f}}_{meas}};\alpha; \beta } \right) \times {s_{meas}} \). In consideration of the recommended significance levels \( \alpha = \beta = 0.05 \) (ISO 11843), the non-centrality parameter can be approached by the expression:

$$ \delta \approx {t_{1 - \alpha }}\left( {d{{\widehat{f}}_{meas}}} \right) + {t_{1 - \beta }}\left( {d{{\widehat{f}}_{meas}}} \right) = 2 \times {t_{0.95}}\left( {d{{\widehat{f}}_{meas}}} \right) $$
(5)

as it should become evident after examination of the sketch in Fig. 2. Taking into account n replicates of the measurements, it follows \( \Delta {x_{\min, n}} = \delta \times {s_{meas}}/\sqrt n \), or expressed in terms of r, finally, it can be written as:

$$ \Delta {x_{\min, n}} = \delta \times {s_{ana}}\sqrt {\frac{{1 + r}}{{n \times r}}} . $$
(6)

In consideration of an accepted minimum detectable concentration difference \( \Delta {x_{acc}} \), the required minimum sample size n consequently can be obtained from the relation:

$$ n = \frac{{{\delta^2}\left( {d{{\widehat{f}}_{meas}};\alpha = \beta = 0.05} \right) \times s_{ana}^2}}{{\Delta x_{acc}^2}} \times \left( {\frac{{1 + r}}{r}} \right), $$
(7)

where \( {t_{\left( {1 - \alpha } \right) = 0.95}} \) and \( {t_{\left( {1 - \beta } \right) = 0.95}} \) are the one-sided tabulated values of the t statistic at the significance levels α = β = 0.05 and for the corresponding effective degrees of freedom \( d{\widehat{f}_{meas}} \).

3 Experimental

3.1 Analytical Method

The XRF measurements were performed by a wavelength dispersive XRF spectrometer (S4 Pioneer, Bruker axs) which was controlled by the software package SPECTRAplus. The spectrometer operating conditions were vacuum and 23-mm collimator mask in conjunction with the analysing crystals OVO55, Ge and LIF100.

The calculation of the analyte concentrations is based on individual calibration curves which were adjusted by means of certified reference plant materials B211 (maize), B214 (wheat strew), B215 (cabbage), B223 (grass), B227 (wheat strew), B229 (lucerne)—all from Wageningen Agricultural University—and SRM1515 (apple leaves), SRM1547 (peach leaves) and SRM1575 (pine needles)—all from NIST, which had been also used in a former study (Wennrich et al. 1995).

Analytical runs applied to the nested sample design provided the required data pool to accomplish variance analysis (ANOVA).

3.2 Results of ANOVA

In Table 2, exemplarily, the pool of analytical data is displayed for the determination of phosphorus in grass, as obtained by the application of the duplicate method to the investigated setup of lysimeter stations. After subjecting the appropriate data sets to ANOVA, the involved variance components were available.

Table 2 Phosphorus contents in grass samples (specified in mg/kg) taken from the grassland lysimeter setup (L1…L16)

3.2.1 Measurement Uncertainty

In Table 3, the results of ANOVA are summarised. The gained estimates for the respective uncertainty components refer to both setup of lysimeter trials (grass and field crops growing) including all analytes under study.

Table 3 ANOVA applied to the pool of analytical data provided by the application of the specified sample protocol to the investigated setup of lysimeter stations (Table 2)

The evaluation of the obtained data indicates clearly that most of the variability (∼99% of the total variance \( s_{\rm T}^2 \)) arose due to the between-location variance which is related to the true variations of analyte concentration between the respective lysimeter stations L i . In the framework of the fitness for purpose concept, the investigated method meets the commonly accepted fitness for purpose criterion \(s{^{2}_{{meas}} } \mathord{\left/ {\vphantom {{^{2}_{{meas}} } {s^{2}_{{\text{T}}} }}} \right. \kern-\nulldelimiterspace} {s^{2}_{{\text{T}}} } < 20\% \) (Ramsey and Argyraki 1997).

The ratio \( {s_{samp}}/{s_{ana}} \) was found to be less or at most close by 1. This result confirms that the homogeneity of the primary sample (entire biomass of a lysimeter) can be considerably improved by subjecting the raw material to a set of material operations (e.g. crushing, grinding and mixing) as specified in the sampling protocol (Section 2.2). On these conditions, it was achieved that sampling contributes only minor or in almost to the same degree as the analytical variance to the combined measurement uncertainty.

Only for the determination of sulphur in grass did the relevant ratio \( {s_{{\rm samp} }}/{s_{ana}} \) somewhat exceeded the level of 1, suggesting a more significant heterogeneous distribution of this analyte in the primary sample.

As mentioned above, the number of degrees of freedom for the estimated measurement variance \( s_{meas}^2 \) is not simply the sum of the degrees of freedom of the involved variance components but rather result from the Welch–Satterthwaite Eqs. 3 and 4.

In consideration of the specific setup of investigated lysimeter stations and the calculated estimates \( s_{ana}^2 \) and \( s_{samp}^2 \) (see Table 3), Eq. 4 yields exemplarily for the determination of phosphorus in grass \( d{\widehat{f}_{meas}} = 30.1 \) degrees of freedom, which indicates that also in consideration of the effective degrees of freedom, the corresponding t distribution becomes close to a normal distribution.

3.2.2 Detection Capability vs. Sample Size

Based on the estimated variance components and the effective degrees of freedom, the minimum detectable concentration differences, Δx min, as a function of the related sample size n was calculated according to Eqs. 6 and 7. In Table 4, the corresponding results are summarised and refer to n = 2 replicates of the measurements, yielding the minimum detectable concentration differences Δx min,2.

Table 4 Minimum detectable concentration differences Δx min,n (quoted as mg/kg or as percentage of nominal values) based on the non-centrality parameter \( \delta \left( {d{{\widehat{f}}_{meas}},\alpha = \beta = 0.05} \right) \approx 2 \times {t_{0.95}}\left( {{{\widehat{f}}_{meas}}} \right) \)and n = 2 replicate measurements

Exemplarily, the analytical results for phosphorus across the entire setup of the investigated lysimeters are displayed in Fig. 3.

Fig. 3
figure 3

Analytical results for phosphorus (target mean) across the investigated setup of lysimeter stations Li [grass (triangles), field crops (circles)]. The error bars correspond to the minimum detectable concentration differences, ∆x min,2 , as calculated from Eq. 6

The included error bars at each data point represent the estimated minimum detectable concentration differences, Δx min,2, at the actual concentration level x.

In Fig. 4, a graphical view is given, demonstrating the dependence of Δx min on the ratio r for the parameters n = 2 and n = 4 (replicate measurements), denoted by the functions Δx min,2 and Δx min,4. The presented functions thereby base upon the specific experimental design used for the estimation of the measurement uncertainty s meas (duplicate method with l = 16, d = n = 2) and are related to an error I probability α = 0.05 (decision limit) and an error II probability β = 0.05 (detection capability), respectively. The evaluation of the displayed functions provides substantial information to assist a costumer in the framework of the benefit–cost analysis.

Fig. 4
figure 4

Minimum detectable concentration difference, ∆x min, for phosphorus refer to n = 2 and n = 4 replicate measurements versus the ratio \( r = s_{{\rm ana} }^2/s_{{\rm samp} }^2 \) represented by the functions ∆x min,2(r) (squares) and ∆x min,4(r) (cricles)

In this context, in Fig. 4 for instance, the point with the coordinates (1.53;158) was marked by arrows corresponding to the actual measurement conditions for the determination of phosphorus in grass (based on the number of replicates n = 2 and the calculated uncertainty components, \( {s_{ana}} = 45.3{\rm mg} /kg \) and \( {s_{samp}} = 36.6{\rm mg} /kg \) or rather r = 1.53). Evaluating the graph in Fig. 4, the detection capability for phosphorus in grass can be directly derived, reading out the function value \( \Delta {x_{\min, 2}}\left( {1.53} \right) \approx 140{\rm mg} /kg \).

Additionally, by means of the displayed functions \( \Delta {x_{\min, 2}}(r) \) and \( \Delta {x_{\min, 4}}(r) \), the costumer will be immediately enabled to assess the experimental effort, which will be required to adjust the performance parameter \( \Delta {x_{min}} \) to any accepted value \( \Delta {x_{acc}} \). For example, maintaining the experimental conditions as mentioned above (\( r = 1.53 \)), but accepting the increased cost, associated with n = 4 replicate measurements, the detection capability would, according to the corresponding graph in Fig. 4, be upgraded to an improved value \( \Delta {x_{\min, 4}}(1.53) \approx 99{\rm mg} /kg \).

Using n = 4 replicate measurements but accepting \( \Delta {x_{acc}} \approx 140\,{\rm mg} /kg \), the relevant function Δx min,4 takes that value already at r = 0.43 (see Fig. 4), which corresponds to an increased permitted sampling uncertainty component \( {s_{samp}} = {s_{ana}}/\sqrt r \, \approx 69{\rm mg} /kg \); in other words, the necessary sampling requirements will be reduced noticeably.

4 Conclusion

Based on the results of recently published works (Meissner et al. 2002; Godlinski et al. 2004), the findings of the presented study suggest the capability of the specified XRF method to investigate effects of land management activities on the environment. Variations of the environmental essential macronutrients in agricultural crops as occurring for instance within the scope of lysimeter trials (3% to 5% percentage of x mean) could be confirmed by the analytical method with a probability p(x) ≥ 95% already at sample sizes n ≥ 2. In the framework of the fitness for purpose concept, it was found that the method specified meets the commonly accepted fitness for purpose criterion \({s^{2}_{{{\text{meas}}}} } \mathord{\left/ {\vphantom {{s^{2}_{{{\text{meas}}}} } {s^{2}_{{\text{T}}} < 20\% }}} \right. \kern-\nulldelimiterspace} {s^{2}_{{\text{T}}} < 20\% }\).

For all of the investigated analytes, there was clearly a dominance of the between-target variance which was found to be approximately 99% of the total variance.

The required estimates for measurement uncertainties were provided by the application of the duplicate method to the plant material taken from an adequate number of cultivated targets (lysimeter trial). The corresponding t distributions were close to a normal distribution (\( d{\widehat{f}_{{\rm meas} }} \approx 30 \)).