Introduction

Legislation in the European Union demands the labelling of food products consisting of or containing “more than 0.9 % genetically modified organisms” (GMOs), provided that the GMO has been placed on the market in accordance with Community legislation [1]. This enforces the necessity to develop and validate reliable GMO-specific quantification methods. Most methods for the detection and quantification of GMOs are based on event-specific quantitative real-time PCR (qPCR) targeting one of the junction regions of the inserted transgenic DNA. These event-specific qPCR methods are developed and proposed by the biotech companies that request authorisation of the GMO event in Europe. As a ratio between genetically modified (GM)-specific and species-specific entities has to be determined, a DNA region around the integration site (the so-called 5′ or 3′ insert-to-plant junction) and a taxon-specific reference sequence are targeted. The performance of the event-specific qPCR method is further evaluated and approved by the European Union Reference Laboratory for GM Food and Feed (EURL-GMFF) for being suitable for the quantification of GMOs in view of (EC) No 1830/2003 [1]. Those qPCR methods are preferably applied by GMO testing laboratories including European National Reference Laboratories (NRLs). In parallel, reference materials (RMs) for the calibration and quality control of the qPCR methods are requested by the EU legislation [2]. The Institute for Reference Materials and Measurements (IRMM) provides a number of certified reference materials (CRMs) suitable for the quantification of GMOs such as ERM-BF415, ERM-BF427 and ERM-BF425 for the GM events maize NK603, maize 98140 and soya 356043, respectively. The CRMs ERM-BF415 [3] and ERM-BF427 [4] are a series of materials consisting of mixtures of GM NK603 and 98140 maize powders with conventional maize powders, respectively. The CRM ERM-BF425 [5] is a series of materials composed of GM modified soya 356043 seed powder and conventional soya seed powder. The mass fraction CRMs are intended to be used for quality control, for example, to check for any bias in the measurement or for calibration of the qPCR measurement. Beside the GM content expressed as mass fraction, one of the CRM of each series (namely ERM-BF415e [6], ERM-BF427c [7] and ERM-BF425c [8]) is additionally certified for their GMO content expressed as DNA copy number ratio.

The IRMM is committed to produce CRMs based on most advanced metrological concepts and a certified value with an uncertainty fit for the intended purpose. Therefore, three independent interlaboratory comparisons were launched for the quantification and certification of the GMO content of ERM-BF415e, ERM-BF427c and ERM-BF425c expressed in DNA copy number ratios. Laboratories were asked to extract DNA from those CRMs, to determine the total DNA concentration and to perform the qPCR measurements. At the time when certification studies are carried out, the GM events can still be in the authorisation process in Europe. This means that the sequences for the primers and probes for the detection of the GM event are still confidential. As a consequence, the laboratories have been asked to analyse the samples without knowing which event they analysed. They were requested to submit raw data in the form of cycle threshold (Ct) values applying automatic threshold settings. The subsequent scrutiny of the data and its evaluation was done by IRMM.

There is typically one validated qPCR method for the quantification of the GM content available per event, and this method is used as reference method as defined in Regulation (EC) No 882/2004 [9]. For qPCR measurements, usually repeatability (relative repeatability standard deviation (RSD r )) of up to 25 % and reproducibility (relative reproducibility standard deviation (RSD R )) of below 35 % are accepted [10]. In order to minimise the influence of repeatability and reproducibility effects in the characterisation study, a large number of up to 42 data sets were created per interlaboratory comparison. For each data set, the mean measured GM content expressed as DNA copy number ratio was calculated. Finally, the GM content of ERM-BF415e, ERM-BF427c and ERM-BF425c was established as the mean of means.

In order to find out the best approach for the establishment of the certified values and their uncertainties, the mean of means and its standard deviation were calculated following different approaches: by arithmetic mean from all data sets, by robust statistical procedures and by arithmetic mean from data sets after scrutiny of raw data.

The analysis of raw data from the three large interlaboratory comparisons allowed to investigate the best approach for the certification of matrix CRMs. This study defines a number of measurement parameters that guarantee the quality of the qPCR raw data that are selected to derive a certified value and its related uncertainty.

For the first time, the impact of such quality criteria and data assessment on reported results has been evaluated on a large number of data, and their applicability and relevance not only for RM certification campaigns but also for routine GMO measurements are discussed in this manuscript.

Materials and methods

Experimental set-up of interlaboratory comparisons

Three independent interlaboratory comparisons were carried out on two maize CRMs ERM-BF415e and ERM-BF427c and one soya CRM ERM-BF425c, respectively. The CRMs are each composed of a mixture of genetically modified and conventional seed powders. The CRMs were initially certified for their GM content expressed as a mass fraction with certified values and expanded uncertainty of 19.6 ± 0.9 g kg−1 for ERM-BF415e, 10.0 ± 1.1 g kg−1 for ERM-BF425c and 20.0 ± 0.8 g kg−1 for ERM-BF427c. Afterwards, they were additionally certified for their GM content expressed as a DNA copy number ratio based on the interlaboratory comparison data and their evaluation described here. Expert laboratories with proven experience and quality management systems in place provided in total 42 data sets for ERM-BF415e and ERM-BF425c, respectively, and 35 data sets for ERM-BF427c. Three different DNA extraction methods were applied per CRM. For ERM-BF415e, DNA was extracted either by applying a modified cetyltrimethylammonium bromide (CTAB) extraction method (adopted from ISO21571:2005 [11]), the modified CTAB method combined with a Genomic-tip 20/G purification step (Qiagen, Benelux B.V., Venlo, NL) or the GENESpin extraction method (GeneScan Analytics GmbH, Freiburg, DE) [6]. For ERM-BF247c, DNA was extracted by using a modified CTAB method [12], the DNeasy plant mini kit (Qiagen, Benelux B.V., Venlo, NL) or the GENESpin DNA kit (GeneScan Analytics GmbH, Freiburg, DE) [7]. The DNA from ERM-BF425c was extracted by using a modified protocol based on the DNAExtractor kit (Eurofins GeneScan GmbH, Freiburg, DE) with a Genomic-tip 20/G purification step, the CTAB method combined with a Genomic-tip 20/G purification step, originally validated for cotton 3006-210-23/281-24-236 [13] or a slightly modified protocol of the GENESpin extraction kit (Eurofins GeneScan GmbH, Freiburg, DE) [8].

For each independent analysis, two bottles of the matrix CRMs ERM-BF415e, ERM-BF427c or ERM-BF425c were provided. The laboratories did not know which specific GM event was analysed. From each bottle, three analytical samples were extracted and analysed (N = 2, n = 3), further referred to as “unknowns” (U1-U3 for bottle 1, U4-U6 for bottle 2). U1–U3 were analysed on day 1, and U4–U6 were analysed on day 2, that is, the complete analysis of one data set had to be spread over at least 2 days. The total DNA concentration as well as the quality of the DNA extract was estimated by UV spectrometry at 280, 260 and 230 nm or by fluorometry. The DNA concentration of each extract was adjusted to 20 ng/μL with nuclease-free water (referred to as “undiluted extracts”). Afterwards, all extracts were diluted: 1.5-, 3- and 4-fold for ERM-BF427c samples (Fig. 1), 1.5-, 3-, 4- and 6-fold for ERM-BF415e samples and 3-, 6-, 9 and 12-fold for ERM-BF425c samples. The dilution series were analysed by qPCR in triplicate for ERM-BF415e and ERM-BF427c and in quadruplicate for ERM-BF425c according to the EURL-GMFF published event-specific qPCR methods [11, 14, 15]. For the quantification of the maize NK603 event in ERM-BF415e, hmg served as reference gene instead of targeting adh1. This change of the reference gene avoids an erroneous quantification of the copy number as reported previously by Broothaerts et al. [16]. Event-specific primer pairs and labelled TaqMan® probes for the three studies were provided by IRMM as given in Table 1.

Fig. 1
figure 1

Example for the analysis scheme followed by the participating laboratories during the copy number study. Undiluted extract U1 is given as an example. The same steps have been followed for all other samples and their dilutions. One data set comprises of six DNA extractions and their dilutions analysed in triplicates (or quadruplicates (not shown)) by qPCR, using the plasmid DNA calibrant CRMs ERM-AD415, ERM-AD427 or ERM-AD425

Table 1 Primer and probe sequences for event-specific qPCR methods

Plasmid DNA calibrant CRMs ERM-AD415, ERM-AD427 and ERM-AD425 with known DNA copy number ratios were used for the calibration of qPCR experiments [1719]. Dilution series were prepared by the participating laboratories for the calibrants ERM-AD415 and ERM-AD427 with concentrations ranging from 10 to 105 copies μL−1 and for ERM-AD425 with concentrations ranging from 5 to 2 × 105 copies μL−1. The qPCR experiments using TaqMan Universal mastermix were carried out according to the manufacturer’s instructions (Applied Biosystems, Foster City, CA, USA) on 96-well microtiter plates with all runs performed for 45 cycles. One laboratory adapted the set-up for the performance in 384-well plates for ERM-BF415e and ERM-BF425c. The laboratories were requested to submit raw data in the form of Ct values. These data were scrutinised and evaluated by IRMM. The mean Ct values for the undiluted DNA extracts and their dilutions for each unknown were converted into transgene and endogene copy numbers via the calibration curves by the following equation:

$$ {\text{DNA copy number}} = V \cdot 10^{{\frac{{{\text{Ct}}_{\text{mean}} - y}}{m}}} $$

where V = sample volume of PCR, Ctmean = mean cycle threshold from triplicate or quadruplicate measurement, y = ordinate intercept, m = slope.

Copy number ratios were calculated from transgene and endogene copy numbers for each unknown for each day. Only the copy number ratios of the undiluted extracts were used for each accepted unknown to calculate the mean GM content expressed as relative DNA copy number ratio.

Data acceptance criteria

All data sets needed to fulfil the minimum performance requirements in order to be accepted (Fig. 2): (1) the calibration curves had to consist of 3 or more accepted points from each dilution; (2) Ct values for DNA extracts and their dilutions needed to fall inside the working interval of the calibration curve; (3) the copy number ratio measured on an undiluted extract was used for the calculation of the mean GM content if at least one more copy number ratio measured in its dilution series could be calculated; and (4) a minimum of two DNA copy number ratios per day was needed to fulfil this criterion, that is, a minimum of four copy number ratios per data set were accepted.

Fig. 2
figure 2

Acceptance criteria applied for data evaluation. Ct cycle threshold, R 2 coefficient of determination, ε PCR efficiency, RSD r relative repeatability standard deviation and RSD ip relative standard deviation for intermediate precision

For the calculation of the certified values, additional data acceptance criteria were established. The data had also to pass the following internal controls: (1) only data sets with negative results on non-template controls (NTC) were accepted; (2) data points were rejected if any inconsistency in the results within a dilution series was noticeable in the trend of reported Ct values, that is, the more diluted a sample is, the higher the Ct values should be; and (3) means of the triplicate or quadruplicate measurement results were not accepted if the Ct variation between one triplicate or quadruplicate result was higher than 1.5 of the mean Ct of the remaining data for replicates at the particular dilution point (ΔCt). Dilution points were also excluded if one single amplification signal was absent among the replicates.

The calibration curves were needed to fulfil the minimal performance requirements stipulated by the EURL-GMFF in the method validation guidelines [10]. In particular, these selection criteria apply to the linearity and slope, that is, PCR efficiencies in the calibration curves. Calibration curves with a R 2 above 0.98 were accepted. For the certification campaigns, control limits for PCR amplification efficiencies (ε) were established in a method-dependent approach. For each event, transgene and endogene, mean PCR efficiencies and their standard deviations were calculated. Genomic DNA extracted from young plant leaves was also used as calibrant to allow a comparison between plasmid and genomic DNA calibrant, but those data have been submitted in a separate paper [20]. Control limits have been calculated for both calibrants per transgene and endogene as the mean plus or minus one standard deviation, that is, four intervals were generated. The minimum and maximum values of all intervals were then applied as the lower and higher limits of the PCR efficiency per interlaboratory study. The resulting cut-off limits for PCR efficiency applied in the interlaboratory comparisons are given in Table 2.

Table 2 Lower and higher limits of PCR efficiency (ε) applied as acceptance criteria for transgene and endogene calibration curves

Data sets considered for the certified value also needed to fulfil the minimum requirements regarding precision. Two types of relative standard deviations (RSD) were considered: (1) the relative repeatability standard deviation (RSD r ) between the results on three analytical samples within one day and (2) the RSD for intermediate precision (RSD ip ) obtained from one laboratory on two bottles over two days. RSD r and RSD ip had both to be below 25 %.

Calculation of means

Means were calculated following three different approaches. Firstly, the arithmetic means from all available data sets for a given measurement target were calculated, in this manuscript referred to as “meanall”. Secondly, as large numbers of data sets were available, the GM content was estimated by determining the median and calculating a robust mean. The Huber’s H15 estimator for robust mean (referred to as “robust mean” hereafter) was chosen as the data sets were of medium size, roughly symmetric and unimodal but contaminated with outliers [21]. As a third approach, the raw data, that is, Ct values provided by the laboratories, were subjected to a data filtering using the quality criteria described here. That means only data based on pre-set values for the coefficient of determination (R 2), method-dependent control limits for PCR efficiencies for the calibration curves and a maximum threshold for repeatability in the measurement of the GM content were included in the calculation. As a consequence, extreme values were not taken on board. The arithmetic mean was only calculated from those data sets that fulfilled all quality criteria. In this manuscript, we refer to this value as “meanscrutinised”.

Arithmetic means, medians, robust means and standard deviation (s) were calculated using Microsoft Excel and the Microsoft Excel Add-in for robust statistics [22]. For the calculation of meanall, meanlog, robust mean, robust meanlog and median, a total of 37, 35 and 39 qPCR data sets, respectively, were considered for ERM-BF415e, ERM-BF427c and ERM-BF425c. From them, 21, 20 and 15 data sets for ERM-BF415e, ERM-BF427c and ERM-BF425c, respectively, were accepted for the calculation of the meanscruitinised.

Results

The comparison presented in this manuscript is based on different approaches of calculating the GM content expressed as DNA copy number ratios of the matrix CRMs ERM-BF415e, ERM-BF427c and ERM-BF425c.

Impact on calibration

The impact of applying data acceptance criteria on the calibration curves was investigated. Mean R 2 of the calibration curves was always above 0.98 regardless of applying data quality criteria or not (data not shown). Few calibration curves were rejected for technical reasons, in the cases that a calibration curve consisted of less than three dilution points or that a higher dilution was not reflected in higher Ct values (data not shown).

Looking at the mean ε, a total of 66 out of 476 calibration curves were rejected when the data quality criteria were applied, which are described in Table 2 and Fig. 2. The mean ε and s increased slightly for the PCR targets hmg and 356043, while for all other investigated targets, mean ε decreased when the quality criteria were applied (Table 3). A mean ε above 100 % was observed for the le1 PCR target when no limits for ΔCt, R 2 and ε were applied. PCR efficiencies above 100 % are an indication for potential inhibition of the polymerase during the amplification process [23]. However, when the quality criteria are applied, the mean ε moved below 100 % for the le1 target. In all cases, s has been much lower when ΔCt, R 2 and ε were within the established limits (Table 3).

Table 3 Mean PCR efficiencies (ε) ± s, Ct is the cycle threshold, R 2 is the coefficient of determination, εmin and εmax are given in Table 1, n = number of accepted calibration curves

Impact on mean measured copy number ratios and s

The above comparison demonstrated that mean R 2 and ε were all acceptable even if no acceptance criterion regarding linearity was applied to improve the data sets. However, the data evaluation of the three interlaboratory comparisons implied that the failure of a single calibration curve for either the transgene or the endogene on one day triggered the omission of the whole data set. Therefore, filtering the data by using the impact of quality criteria described in the “Materials and methods” section on the mean DNA copy number ratios and s was assessed as follows.

Firstly, data sets needed to fulfil the minimum requirements only (Fig. 2). Following this approach, all 35 data sets were retained for ERM-BF427c, and 37 and 39 data sets (from 42 each) were accepted for ERM-BF415e and ERM-BF425c, respectively (Table 4). Calculated mean DNA copy number ratios (meanall) are given in Table 4. Large standard deviations were obtained (Table 4), corresponding to values for relative reproducibility standard deviations (RSD R ) of 88, 41 and 31 % for ERM-BF415e, ERM-BF427c and ERM-BF425c, respectively (Table 5). Hence, under these evaluation criteria, the quantification methods for the maize events NK603 and 98140 did not satisfy the method performance requirements for this precision parameter set by the ENGL that should be below 35 % [10].

Table 4 DNA copy number ratios ± s
Table 5 Comparison of the relative reproducibility standard deviation (RSD R ) calculated for meanall, for data sets filtered by ENGL requirements, for meanscrutinised (the certified values), robust means, medians and RSD R as reported in the EURL-GMFF validation reports [3234]

Secondly, raw data additionally needed to fulfil the requirements for repeatability: the RSD criterion was applied allowing an RSD r and an RSD ip of less than 25 % (Fig. 2). Following this approach, the mean GM content was estimated to be lower than meanall for ERM-BF415e and ERM-BF425c. The RSD criterion did not have an impact on the mean for ERM-BF427c. However, allowing a RSD maximum value of 25 % as a quality criterion alone led to a drastically decreased s for ERM-BF415e and ERM-BF425c. The RSD criterion alone did not have an impact on s for ERM-BF427c (Table 4).

As a third approach, raw data from the three interlaboratory comparisons were scrutinised using method acceptance criteria from the EURL-GMFF (R 2 ≥ 0.98, ε min = 90 %, ε max = 110 % and RSD r  ≤ 25 % [10]). Doing so, only few data sets fulfilled the requirements: 4 out of 42 for ERM-BF415e, 23 out of 35 for ERM-BF427c and 12 out of 42 for ERM-BF425c. The mean DNA copy number ratios were lower than the meanall for all three CRMs (Table 4). Also the standard deviations obtained following this approach were much lower than the ones when all data sets were accepted, corresponding to RSD R of 29, 27 and 25 % for ERM-BF415e, ERM-BF427c and ERM-BF425c, respectively. Thus, the method performance requirement for this precision parameter set by the ENGL was met. However, the RSD R values obtained following this approach were higher than the ones reported in the validation reports from the EURL-GMFF (Table 5). Moreover, only a few data sets could be retained, especially for ERM-BF415e and ERM-BF425c. While mean PCR efficiencies fell in the interval of 90 to 110 %, many individual calibration curves could not satisfy this requirement. As the three studies were designed so that the rejection of a single calibration curve led to the rejection of the complete data set, many data sets had to be omitted. With the exception of measurement results for the le1 target, all mean PCR efficiencies were lower than 100 % (Table 3). Therefore, the PCR efficiency ranges were determined per GM event in a method-dependent approach as described in the “Materials and methods” section (Table 2).

As a fourth approach, the quality acceptance criteria for data sets described above were applied in a sequential manner, comparing DNA copy number ratios when no specific quality criterion was in place (meanall) to DNA copy number ratios obtained when data were successively filtered. The mean calculated DNA copy number ratios changed when raw data were subjected to scrutiny (Table 4). Interestingly, for the two maize CRMs, meanall was clearly higher than meanscrutinised.

For all three CRMs, the sequential application of quality criteria on qPCR raw data also showed that s was lowered with the number of quality criteria in place (Table 4). The lowest s was obtained when all quality criteria were applied. In this comparison, the linearity of the calibration curve had the largest single impact on s for the calculated DNA copy number ratio of ERM-BF415e. For the DNA copy number ratio for ERM-BF427c, s improved most when a minimum ε was established. For ERM-BF425c, the RSD criterion had the largest single impact, as s decreased drastically when only data sets with RSD < 25 % were considered (Table 4).

The sequential application of quality criteria also allowed to identify for which reason the majority of data sets was filtered out. Table 6 summarises the number of data sets that were rejected out for particular reasons described above. For the maize CRMs ERM-BF415e and ERM-BF427c, most data sets were not accepted due to not fulfilling the set requirements for linearity and PCR efficiency of the calibration curves. For the soya CRM ERM-BF425c, most data sets were rejected because they failed internal control requirements (i.e. they showed positive results for NTCs or dilution inconsistencies) and second most for not fulfilling the established requirements for calibration curves.

Table 6 Quality criteria as described in Fig. 2 and the number of data sets (N d) that were rejected on the basis of these for measurement data on ERM-BF415e, ERM-BF427c and ERM-BF425c

The DNA copy number ratios per analysed sample and s are presented graphically for ERM-BF415e (Fig. 3), ERM-BF427c (Fig. 4) and ERM-BF425c (Fig. 5). The upper part of the graphs (A) show the results when minimum quality criteria were applied only. In these cases, also extreme values with large s were obtained. In all three cases, the DNA copy number ratios per sample followed a right-skewed distribution. In the lower part of the graphs (B), the measured DNA copy number ratios from scrutinised data sets are presented. For all three materials, extreme values were not considered for the estimation of the GM content of the samples when quality criteria were applied to the raw data. Large standard deviations were not obtained after the quality criteria were in place. Therefore, this approach was followed for the calculation of the certified value (Table 4). The RSD R values for the certified values are 12, 14 and 15 % for ERM-BF415e, ERM-BF427c and ERM-BF425c, respectively (Table 5).

Fig. 3
figure 3

DNA copy number ratios (%) for DNA extracted from ERM-BF415e (maize NK603). Error bars represent standard deviation s. The upper graph a (x) shows all DNA copy number ratios for samples fulfilling the minimum requirements for acceptance of a data set. The lower graph b (Δ) shows the DNA copy number ratios for samples additionally fulfilling the quality criteria regarding internal controls, calibration curve and RSD as described in Fig. 2. In graph a and b, the bold line marks meanscrutinised (i.e. the certified value), and the dotted line represents meanscrutinised plus or minus 1 s. n is the number of samples. 1The DNA copy number ratio ± s obtained for this sample was 13.16 ± 2.54 %

Fig. 4
figure 4

DNA copy number ratios (%) for DNA extracted from ERM-BF427c (maize 98140). Error bars represent standard deviation s. The upper graph a (x) shows all DNA copy number ratios for samples fulfilling the minimum requirements for acceptance of a data set. The lower graph b (Δ) shows the DNA copy number ratios for samples additionally fulfilling the quality criteria regarding internal controls, calibration curve and RSD as described in Fig. 2. In graph a and b, the bold line marks meanscrutinised (i.e. the certified value), and the dotted line represents meanscrutinised plus or minus 1 s. n is the number of samples

Fig. 5
figure 5

DNA copy number ratios (%) for DNA extracted from ERM-BF425c (soya 356043). Error bars represent standard deviation s. The upper graph a (x) shows all DNA copy number ratios for samples fulfilling the minimum requirements for acceptance of a data set. The lower graph b (Δ) shows the DNA copy number ratios for samples additionally fulfilling the quality criteria regarding internal controls, calibration curve and RSD as described in Fig. 2. In graph a and b, the bold line marks meanscrutinised (i.e. the certified value), and the dotted line represents meanscrutinised plus or minus 1 s. n is the number of samples

Comparison of calculated DNA copy number ratios

The large number of data sets allowed to establish a reliable system for scrutiny of raw data from qPCR. The calculation and comparison of mean DNA copy number ratios revealed that the scrutiny of raw data as proposed here was advantageous for obtaining results with suitable standard deviations. In all investigated cases, mean DNA copy number ratios were higher when all data sets were accepted (meanall) than after scrutiny of raw data (meanscrutinised) (Tables 2, 7). The latter was assigned as the certified values. In the following, it was investigated which evaluation approaches are relevant in order to obtain reliable results from a smaller number of data sets.

Table 7 Comparison of the certified value ± standard deviation s to arithmetic meanall ± s, arithmetic meanlog ± s, robust mean ± s, robust meanlog ± s and median ± median absolute deviation (MAD) of DNA copy number ratios when data sets fulfil the minimum requirements only

It has been already reported that results from proficiency tests of laboratories quantifying GM content by qPCR show a right-skewed distribution [2426]. The DNA copy number ratios were transformed into the logarithm to base 10 to follow a normal distribution. For all three GM materials investigated here, the DNA copy number ratios were lower when calculated as arithmetic mean of all data sets after log-transformation (arithmetic meanlog) than by using untransformed data (arithmetic meanall) (Table 7). For data on ERM-BF415e, the standard deviation was also smaller after log-transformation (Table 7). For data of the two maize CRMs ERM-BF415e and ERM-BF427c, DNA copy number ratios after log-transformation are still slightly higher than the certified values, while this parameter is slightly lower than the certified value for the soya CRM ERM-BF425 (Table 7).

The robust means calculated from all acceptable data sets were good estimates of the certified values. Their standard deviations were also smaller compared to those of the arithmetic means of all data sets (Table 7). Robust means and s calculated from log-transformed data were similar to robust means and s from untransformed data. In all cases investigated here, the median gave the best approximation of the certified value. The median absolute deviations (MAD) as a robust measure of variability, which itself is the median of the absolute deviations from the data’s median, were much smaller compared to the standard deviations from the meanall and robust mean (Table 7).

The corresponding RSD R values are given in Table 5. They are in line with the observation above: RSD R values are smaller for the median than for the robust mean.

Discussion

Regulation (EC) No 1830/2003 of the European Union specifies the labelling of GMOs within the food chain [1]. It demands that food and feed products consisting of or containing more than 0.9 % GMOs authorised in Europe are labelled accordingly. Consequently, analytical methods that are able to identify and quantify GMOs are required. According to Regulation (EC) No 882/2004 [27], the EURL-GMFF shall provide NRLs with such analytical methods, including validated reference methods. In the EU, the quantification of GMO in food and feed is based on one reference method per event [28], which is the method to be applied and referred to in the case of legal dispute. Therefore, it is of outmost importance that these analytical parameters are identified, which are crucial for the adequate performance of the measurement and which have to be checked to guarantee that the methods are well under control.

The comparison presented above is based on the evaluation of raw data obtained from the blind analysis of samples in the frame of three independent interlaboratory studies. The participating laboratories were requested to follow strictly the protocols for DNA extraction, dilutions of calibrants and DNA extracts and the set-up of the qPCR. They had to submit their qPCR raw data and did not have the possibility to treat their data prior submission. They were also not specifically asked to apply quality checks.

By evaluating in parallel a large number of qPCR raw data of the independent interlaboratory comparisons, it was possible to identify factors that influence the reported measurement results.

Certainly, the measurement results depend on the calibration. However, even if calibration curves were not subjected to any filtering by applying any quality criteria, the large majority of calibration curves were within the limits recommended by EURL-GMFF and commonly accepted for linearity and efficiency of qPCR. Only for the PCR target NK603, two calibration curves were not accepted as no signals after amplification were detected for some dilution points (Table 3). The large standard deviation for PCR efficiencies that is observed when no selection according to quality criteria is performed (Table 3) indicates that individual calibration curves may not necessarily be suitable for calibration. As a consequence, when all calibration curves regardless of their linearity or PCR efficiency were accepted, extreme values of DNA copy number ratios with large standard deviations were obtained (Table 4, Figs. 3a, 4a, 5a). Data followed a positively skewed distribution. Also, the meanall DNA copy number ratio was estimated to be higher than the certified value.

The characteristics of the calibration curves can be improved if measures are taken regarding a minimum value for ΔCt of a triplicate measurement in addition to the R 2 and PCR efficiency criteria. In this way, standard deviations for mean PCR efficiencies would be improved. However, when quality requirements for ΔCt, R 2 and PCR efficiency were in place, a number of calibration curves could not satisfy these and consequently were rejected. In the set-up of the three interlaboratory comparison studies, the rejection of a single calibration curve for either transgene or endogene triggered the exclusion of the complete data set. The rejection of calibration curves was the main reason for rejection of data sets for the two maize materials ERM-BF415e and ERM-BF427c and the second most reason in the case of the soya material ERM-BF425c (Table 6). Likewise, when the calibration curves met the quality criteria established here, mean measured DNA copy number ratios were in agreement with the certified values. This was not achieved if the RSD criterion or the technical criterion alone was used as quality criteria (Table 4).

The ENGL defined method performance criteria that need to be met before a method is considered to be ready to enter the full validation process. A method is accepted if the slope of the calibration curve is in the range of (−3.1 ≥ slope ≥ −3.6) corresponding to a PCR efficiency range of 90–110 % [10]. Except for the PCR target NK603, mean PCR efficiencies measured in the three interlaboratory comparisons met this requirement, regardless of the data scrutiny level (Table 3). While mean PCR efficiencies fell in the interval of 90–110 % (with the exception for the NK603 PCR target), PCR efficiencies of individual calibration curves did not necessarily fall in this interval. This can be monitored if the PCR efficiency interval of 90–110 % was applied as a quality criterion. If only data sets were considered whose calibration curves showed PCR efficiencies between 90 and 110 %, many data sets were omitted for further calculation: Only 4 out of 42 data sets for ERM-BF415e, 23 out of 35 data sets for ERM-BF427c and 11 out of 42 data sets for ERM-BF425c were accepted (Table 4). On the other hand, the standard deviations of the mean measured DNA copy number ratios were large (Table 4) resulting in higher RSD R values than those obtained during method validation studies (Table 5). As there was a large number of data sets available that allowed us to evaluate the raw data in detail, cut-off limits for PCR efficiencies were established in a method-dependent approach for the certification of ERM-BF415e, ERM-BF427c and ERM-BF425c (Table 2). Yet, for a reliable measurement, it is more important that the PCR efficiencies demonstrated by the calibration curve and the PCR efficiency measured for the diluted test sample are in the same range [20, 29].

A tendency to overestimate GM levels by laboratories and a right-skewed distribution of data has been noted [24, 26].

The assigned value for the proficiency testing material is usually calculated using a robust statistical procedure of log-transformed data [23, 26]. Therefore, in a part of the study presented here, DNA copy number ratios have been calculated after scrutiny of data and were compared to those calculated by robust statistics. For the three materials ERM-BF415e, ERM-BF427c and ERM-BF425c, robust means (of both, log-transformed and untransformed data) and medians are close to the certified values. The lowest standard deviations were achieved when DNA copy number ratios were calculated from technically acceptable data sets only. The data selection based on well-defined acceptance criteria describing the quality of the calibration curves (i.e. R 2 of above 0.98 and PCR efficiencies within a method dependent defined interval) as well as defined minimum requirements in terms of repeatability, intermediate precision and reproducibility as described in Fig. 2 improve calculation results considerably. After scrutiny of raw data, mean DNA copy number ratios followed a normal distribution (Figs. 3b, 4b, 5b).

The data scrutiny by applying the quality criteria described here is not only important for certification campaigns for RMs but also in routine analysis. The comparison presented above shows that, if a large number of data sets are available, reliable results could also be achieved by applying robust statistical procedures. The robust mean and median of unevaluated data sets can be good estimators of the certified value (Table 7). Such large numbers of data sets are usually not available in routine analysis, and consequently, the scrutiny of data as described here is strongly advised.

Identifying and excluding technically invalid data is especially important in view of the new European legislation on low-level presence of GMOs in feed imports [30]. The requirements of this regulation are technically demanding as it sets a non-compliant limit of 0.1 % related to the mass fraction of GM material in feed. This is currently the lowest level considered by the EURL-GMFF and is referred to as minimum required performance limit (MRPL) [10]. The regulation applies to GM material in feed that is authorised in a third country and for which an authorisation procedure for the European Union is pending. It also applies to GM material for which the authorisation has expired. This means that only those GMOs for which detection and quantification methods were accepted by the EURL-GMFF and that show an RSD r value of maximum 25 % at the level of 0.1 % related to mass fraction of the GM material are covered by this regulation. EU Member states are obliged to declare animal feed as non-compliant with Regulation (EC) 1829/2003 at the MRPL level of 0.1 % or higher taking into account the uncertainty for the measurement [30, 31].

The comparison of calculating DNA copy number ratios from qPCR raw data by different approaches demonstrates that the lowest RSD R values are achieved only when data were scrutinised and technically invalid data sets were not considered for the reported result. As shown here, RSD R of 12, 14 and 15 % was obtained for the measurement of the GM content of ERM-BF415e, ERM-BF427c and ERM-BF425c, respectively (Table 5). These RSD values for reproducibility were obtained for matrix materials containing GMO mass fractions well above the MRPL of 0.1 %, but at the same time, they are also well below the maximum value for reproducibility recommended by the ENGL, which is 35 % and even 50 % for containing GMO mass fractions of below 0.2 % [10]. Considering this, the approach for scrutiny of raw data as presented here could be followed for routine analysis to lower the measurement uncertainty.

Conclusion

Taken together, the studies demonstrated that the declared GM content is not only dependent on the ability of the method to deliver good quality data. Indeed, the importance of data evaluation and the impact of data scrutiny after the actual measurement on the reported result have been demonstrated here. For obtaining the most reliable and best possible results, the following measures should be taken: Firstly, raw data, that is, the Ct values should be verified against technical aspects. This means, for instance that NTC should not show an amplification, the variation between replicate measurements should be small and the trend in dilution should be reflected in the trend of measurement results. The latter two criteria are applicable to both the calibrant and the test samples.

Secondly, it is crucial for the measurement system that the calibration is under control which can be monitored by R 2 and PCR efficiency. The comparison has shown that amplification efficiency is slightly suboptimal in most of the methods compared here. PCR efficiencies were always below 100 %. Therefore, PCR efficiency intervals were chosen in a method-dependent approach and applied as acceptance criteria in order to better reflect the real rather than the ideal PCR amplification.

Lastly, repeatability, intermediate precision and reproducibility should be under control. These parameters can easily be monitored by measuring powder CRMs such as ERM-BF415e, ERM-BF427c and ERM-BF425c. The powder CRMs are intended to be used as quality control material and have been subjected to homogeneity and stability studies leading to an uncertainty fit for this intended purpose.

We have demonstrated that technically invalid data can be identified by following this straight forward approach and can then be excluded for the calculation of the result. In this way, the GM content can be quantified more reliably than calculating the robust mean or median from all data.