Introduction

Celiac disease (CD) patients are affected by a chronic immune-mediated inflammatory disease of the upper small intestine triggered by the ingestion of gluten proteins from wheat, rye, barley, and possibly oats (Ludvigsson et al. 2013; Wieser et al. 2014). The only effective therapy for CD is a strict lifelong adherence to a gluten-free diet (GFD) with an average tolerable gluten intake between 10 and 50 mg daily (Bruins Slot et al. 2015b). A safe GFD is essential not only for CD patients but may also be indicated in cases of wheat allergy and non-celiac gluten sensitivity (NCGS) (Scherf et al. 2016). Taking all wheat sensitivities together, an estimated 5 % of the population rely on a GFD, not counting the growing number of people who voluntarily adopt a GFD mostly for lifestyle reasons (Scherf 2015). According to the Codex Alimentarius, gluten-free foods, either made from naturally gluten-free ingredients or from specially processed gluten-containing sources, must not exceed gluten levels of 20 mg/kg in the product as sold to the consumer (Codex Standard 118–1979 2015).

Enzyme-linked immunosorbent assays (ELISAs) are most commonly used in gluten-free management along the production chain and for legal compliance testing to ensure the safety of these dietary products for CD patients. The advantages of ELISAs are their specificity, sensitivity, and suitability for comparatively fast routine analyses (Scherf and Poms 2016). However, a number of analytical challenges remain within one test kit, because of influences from the food matrix (e.g., polyphenols, high fat contents), the processing conditions during food manufacturing, and gluten extraction procedures prior to the ELISA measurement. When comparing results from different gluten test kits, additional sources of variation are different extraction procedures, test formats, reference materials used for calibration, and antibody specificities (Diaz-Amigo and Popping 2013; Rosell et al. 2014) (Table 1). Comparative studies of commercially available gluten ELISA kits have been performed with a variety of test kits. In the absence of an independent reference method and a certified reference material, the performance of different test kits is typically evaluated after the analysis of spiked or incurred materials with defined gliadin or gluten contents. These may be prepared by adding the Prolamin Working Group (PWG)-gliadin reference material (van Eckert et al. 2006) into a model cookie recipe based on gluten-free corn flour (Bugyi et al. 2013) or amaranth flour (Török et al. 2015); by mixing gluten or wheat flour into corn flour (Sharma 2012) or a cornbread recipe (Sharma et al. 2013); by blending wheat flour and buckwheat flour (Alvarez and Boye 2014) or whole wheat, whole rye, or barley flour and rice flour (Bruins Slot et al. 2015a); or by spiking sausage meat, pastries, and infant semolina with wheat flour, spelt whole flour, wheat whole grain flakes, or wheat semolina (Scharf et al. 2013). 
After comparison of the values determined by different gluten test kits with the target values, gliadin or gluten recoveries may be calculated. For most test kits, the precision of analytical results within one kit was satisfactory, but the results for the same sample analyzed by different test kits varied considerably. For example, the gliadin recoveries of seven test kits ranged from 103 to 624 % in the cookie powder mixture spiked with 10 mg gliadin/kg and from 17 to 326 % in the resulting baked cookie (Bugyi et al. 2013). Because CD patients rely on correct labeling to identify safe products, it is essential that a high degree of confidence can be placed in a gluten-free claim. However, this degree of confidence is, in part, dependent on the characteristics of the assay used to determine the gluten content (Thompson and Mendez 2008), because of the aforementioned divergence of results between different ELISA test kits.

Table 1 Overview of the commercially available ELISA test kits for gluten detection used in this study according to the kit manuals

Therefore, the aim of this investigation was to compare seven commercially available ELISA test kits for gliadin/gluten detection using a set of 30 wheat starches (WSt), 14 of them declared as gluten-free (GfW). This selection of test kits covered seven extraction procedures, five sandwich and two competitive formats, and four monoclonal antibodies (mAb) and two polyclonal antibodies (pAb; Table 1). Apart from one spiked sample with defined gluten content, the target gluten contents were unknown, but the focus was on using a larger number of real samples and a comparatively simple food matrix to compare the test kits. Additional reasons for the in-depth investigation of WSt were that the inclusion of GfW as part of a GFD is considered safe (Collin et al. 2004; Peräaho et al. 2003) but has been the subject of controversy (Hischenhuber et al. 2006; Kupper 2005; Thompson 2001) and that WSt poses specific analytical challenges. Most antibodies used in current ELISA test kits are specific for the alcohol-soluble prolamin fraction of gluten (gliadin in the case of wheat), whereas the alcohol-insoluble glutelin fraction (glutenin in the case of wheat) is inadequately targeted (Scherf and Poms 2016). Therefore, the measured gliadin content is commonly multiplied by a factor of 2 to obtain the gluten content, based on a gliadin-to-glutenin (gli/glu) ratio of 1 (Codex Standard 118–1979 2015). With flours, this doubling usually results in an overestimation of gluten (Wieser and Koehler 2009), but in WSt, it may lead to an underestimation of gluten contents, because gli/glu ratios as low as 0.3 have been reported as a result of gliadin depletion during washing (Scherf 2016). With unknown target gluten contents, the main aim was not to calculate gluten recoveries but to highlight the divergence of results of different gluten ELISA test kits, even in a food matrix as simple as WSt.
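The effect of the fixed factor of 2 can be illustrated with a short calculation. If gluten is taken as the sum of gliadin and glutenin, the exact factor for converting gliadin into gluten is 1 + 1/(gli/glu). The following Python sketch evaluates this for the Codex assumption (gli/glu = 1) and for the ratio of 0.3 mentioned above; the function name is hypothetical and the calculation is only an idealized illustration, not part of any kit protocol.

```python
def gliadin_to_gluten_factor(gli_glu_ratio: float) -> float:
    """Factor that converts a gliadin content into a gluten content.

    Assumes gluten = gliadin + glutenin, so that
    gluten / gliadin = 1 + glutenin / gliadin = 1 + 1 / (gli/glu).
    """
    if gli_glu_ratio <= 0:
        raise ValueError("gli/glu ratio must be positive")
    return 1.0 + 1.0 / gli_glu_ratio


# Codex assumption: gli/glu = 1 gives the familiar factor of 2
codex_factor = gliadin_to_gluten_factor(1.0)

# Gliadin-depleted wheat starch: gli/glu = 0.3 gives a factor of about 4.33
starch_factor = gliadin_to_gluten_factor(0.3)

# Fraction of the true gluten content that is reported when the fixed
# factor of 2 is applied to a sample with gli/glu = 0.3 (about 0.46)
fraction_found = codex_factor / starch_factor
```

Under these assumptions, a starch with a gli/glu ratio of 0.3 would be reported at less than half of its actual gluten content.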

Material and Methods

Material

Thirty samples of WSt were either purchased or kindly donated by starch manufacturers. The sources listed in alphabetical order were Baktat (Mannheim-Neckarau, Germany), Bezgluten (Koniusza, Poland), Cargill (Krefeld, Germany), Crespel & Deiters (Ibbenbüren, Germany), Kröner-Stärke (Ibbenbüren, Germany), Merck (Darmstadt, Germany), Querfood (Putzbrunn, Germany), Roquette (Lestrem, France), Sacchetto (Lagnasco, Italy), Sigma-Aldrich (Steinheim, Germany), Tereos Syral (Aalst, Belgium), VWR (Darmstadt, Germany), and Weizenin (Unilever, Hamburg, Germany). The WSt were either native or modified (oxidized, octenyl succinate-modified, pregelatinized). With the exception of four samples without specification, all were food grade. The WSt were grouped into gluten-free WSt (GfW) as declared by the manufacturer and WSt without specification regarding the gluten content (W), which included all four non-food-grade samples (W1, W9, W10, W11). Ammonium heptamolybdate tetrahydrate, ammonium vanadate, ethanol, and nitric acid were purchased from Merck (Darmstadt, Germany) and perchloric acid from Fluka (Buchs, Switzerland). Pure water for ELISA measurements was obtained from B. Braun (Melsungen, Germany).

Determination of Moisture, Crude Protein, and Phosphorus Contents of Wheat Starches

Moisture contents of WSt were determined in triplicate after drying for 2 h at 130 °C, cooling for 1 h at 22 °C in a desiccator, and weighing. The nitrogen contents of WSt were measured by the Dumas combustion method using a FP-328 nitrogen analyzer (Leco, Moenchengladbach, Germany; n = 6) (ICC Standard No. 167 2000). The nitrogen content was multiplied by a factor of 5.7 to calculate the crude protein (CP) content (Koenig et al. 2015). Quantitation of phosphorus in WSt was performed in triplicate using the vanadomolybdate method reported by Noda et al. (2004) to correct for non-protein nitrogen originating from lysophospholipids (Kasarda et al. 2008). Only corrected CP contents were used for further calculations.

ELISA

The gluten contents of the 30 WSt were analyzed by the seven commercially available ELISA test kits for gliadin/gluten detection that are listed in Table 1 together with their abbreviations. During the course of this investigation, the manufacturer of the α20c ELISA test kit made changes to the assay protocol, detection principle, and conversion factor from peptide equivalents to gluten. Therefore, results for the “new” procedure (α20cnew; lot no. RN5317 of November 2015) were compared to those of the “old” procedure (α20cold; lot no. QN5640 of July 2015). The WSt were extracted in triplicate strictly according to the manufacturers’ instructions for starch samples (Table 1). Each extract was applied to two wells of the 96-well plate (n = 3 × 2). Additional dilutions of sample extracts were made as appropriate and the samples re-analyzed to fall within the respective ranges of quantitation. The subsequent ELISA procedure was also carried out exactly as described by the manufacturers for each test kit. All ELISA measurements were performed in a separate room to avoid gluten contamination, and surfaces, vials, and equipment had been cleaned with 60 % ethanol. The absorbance was read using an Expert 96 microplate reader (Asys Hitech, Eugendorf, Austria). Calibration was performed with the respective standard provided in each kit, and calibration curves were constructed as stated in each test kit manual (cubic spline function for R5 and R5c using the Rida®Soft Win Software, R-Biopharm; ready-to-use spreadsheet for G12 provided by Romer Labs; third-order polynomial fit for Skerritt and pAb2 using Microsoft Excel 2010; logarithmic function for α20c using Microsoft Excel 2010; 4-parametric fit for pAb1 using the Rida®Soft Win Software). The calibration solutions of the G12 and Skerritt ELISAs directly resulted in gluten contents. 
The values for gliadin contents obtained after quantitation with the R5, R5c, and pAb1 ELISAs were multiplied by a factor of 2 to yield gluten contents. The concentrations of peptide equivalents determined by the α20cold ELISA were multiplied by 20,000 and those by the α20cnew ELISA were multiplied by 50,000 to obtain gluten contents. The analyses by the pAb2 ELISA yielded wheat protein contents that were multiplied by the recommended factor of 0.85 to obtain gluten contents.
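The kit-specific conversions described above can be summarized in a small lookup, sketched here in Python. The factors are those stated in the text; the dictionary layout, the ASCII key names (a20c_old and a20c_new stand for α20cold and α20cnew), and the function name are illustrative assumptions. The input value is assumed to be in whatever unit the respective kit manual prescribes for the factor to apply.

```python
# Factors for converting each kit's reported quantity into gluten,
# as stated in the text. Keys are illustrative ASCII abbreviations.
GLUTEN_FACTORS = {
    "R5": 2.0,             # gliadin -> gluten
    "R5c": 2.0,            # gliadin -> gluten
    "pAb1": 2.0,           # gliadin -> gluten
    "G12": 1.0,            # calibrated directly in gluten
    "Skerritt": 1.0,       # calibrated directly in gluten
    "pAb2": 0.85,          # wheat protein -> gluten
    "a20c_old": 20_000.0,  # peptide equivalents -> gluten (old procedure)
    "a20c_new": 50_000.0,  # peptide equivalents -> gluten (new procedure)
}


def to_gluten(kit: str, reported_value: float) -> float:
    """Convert the quantity reported by a kit into a gluten content."""
    return reported_value * GLUTEN_FACTORS[kit]


# Hypothetical readings, chosen only to illustrate the arithmetic
gluten_r5 = to_gluten("R5", 6.7)       # 6.7 gliadin -> 13.4 gluten
gluten_pab2 = to_gluten("pAb2", 20.0)  # 20.0 wheat protein -> 17.0 gluten
```

Keeping the factors in one table makes it obvious that a change such as the one between the old and new α20c procedures alters every reported gluten content for that kit by a fixed ratio of 2.5.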

WSt Samples with Defined Gluten Content

For assay control, WSt samples with defined gluten contents were prepared by mixing wheat flour (cv. Akteur, 2013) with WSt W9, which had been confirmed to be gluten-free by all seven ELISA test kits, although it was not declared as such. The wheat flour had 13.3 % moisture (ICC Standard No. 110/1 1976), 0.50 % ash (ICC Standard No. 104/1 1990), and 13.1 % CP (ICC Standard No. 167 2000). First, albumin/globulin, gliadin, and glutenin contents of the wheat flour were quantitated (n = 3) after sequential extraction as described by Wieser et al. (1998) and reversed-phase high-performance liquid chromatography with UV detection at 210 nm (Thanhaeuser et al. 2014) using the calibration reference PWG-gliadin (van Eckert et al. 2006). The flour contained 1.30 ± 0.04 % albumins/globulins, 7.59 ± 0.26 % gliadins, and 3.72 ± 0.08 % glutenins, resulting in 11.31 % gluten as sum of gliadins and glutenins. Then, WSt W9 and the wheat flour were mixed to yield a gluten content of 1000 mg/kg, and this mixture was further diluted down to 20 mg/kg (W9 + 20), which is the legislative threshold value for gluten (Codex Standard 118–1979 2015). The mixtures were shaken overhead for 24 h (Don et al. 2014). The W9 + 20 sample was analyzed 10 times by R5 ELISA to ensure that it was homogeneous. Having ascertained that the coefficient of variation was 7.0 % and thus acceptable for homogeneity (Scharf et al. 2013; Thompson et al. 2006), W9 + 20 was also analyzed with the other ELISA test kits.
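The mixing ratios implied by this preparation follow directly from the gluten content of the flour. The Python sketch below reproduces the arithmetic using the gliadin and glutenin contents given above; it assumes that W9 contributes no gluten, and all variable and function names are illustrative.

```python
GLIADIN_PCT = 7.59   # % gliadins in the wheat flour (from the text)
GLUTENIN_PCT = 3.72  # % glutenins in the wheat flour (from the text)

gluten_pct = GLIADIN_PCT + GLUTENIN_PCT       # 11.31 % gluten
gluten_mg_per_kg_flour = gluten_pct * 10_000  # 1 % corresponds to 10,000 mg/kg


def flour_fraction_for_target(target_mg_per_kg: float) -> float:
    """Mass fraction of flour needed in a flour/starch mixture to reach
    the target gluten content, assuming the starch is free of gluten."""
    return target_mg_per_kg / gluten_mg_per_kg_flour


# About 8.84 g of flour per kg of mixture gives 1000 mg gluten/kg
flour_g_per_kg_premix = 1000 * flour_fraction_for_target(1000)

# Diluting the 1000 mg/kg premix to 20 mg/kg corresponds to a factor of 50
dilution_factor = 1000 / 20

# Composition of the 20 mg/kg sample, split by the flour's protein fractions
gliadin_in_sample = 20 * GLIADIN_PCT / gluten_pct    # about 13.4 mg/kg
glutenin_in_sample = 20 * GLUTENIN_PCT / gluten_pct  # about 6.6 mg/kg
```

The last two values match the composition of 13.4 mg gliadins/kg and 6.6 mg glutenins/kg stated for the W9 + 20 sample.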

Data Analysis

Mean values (n = 6), absolute (SD; Tables 2 and 3), and relative standard deviations (RSD) were calculated for all quantitative values of each WSt sample. The overall RSD values were 0.1–14.3 % (mean 5.5 %) for the R5, 1.5–16.0 % (mean 9.6 %) for the G12, 1.9–25.1 % (mean 6.3 %) for the Skerritt, 1.8–11.8 % (mean 5.4 %) for the pAb1, 0.9–20.7 % (mean 7.2 %) for the pAb2, 0.2–13.9 % (mean 4.5 %) for the R5c, 2.6–13.0 % (mean 6.5 %) for the α20cold, and 0.4–11.1 % (mean 4.9 %) for the α20cnew assay. Statistical significances of differences between the kits within one WSt sample were assessed by one-way analysis of variance (ANOVA) with Tukey’s test (p < 0.05). Pearson’s product moment correlations were calculated between all possible combinations of the gluten contents analyzed by seven ELISA test kits (only α20cnew) and the CP contents. Correlation coefficients >0.78, 0.67–0.78, 0.54–0.66, and <0.54 were considered as strong, medium, weak, and no correlations, respectively (Thanhaeuser et al. 2014). First, all 30 samples were included. However, due to the exceptionally high CP and gluten contents in the WSt samples W8, W11, and W15, these three WSt were excluded from further correlation analyses, because they had a disproportionate influence on the results (all correlation coefficients ≥0.85). For further differentiation of the WSt samples and the test kits, three correlation analyses were made, one with all remaining 27 WSt samples, the second with the 10 WSt samples that had gluten contents ranging from 20 to 100 mg/kg by R5, and the third with the 17 WSt samples that had gluten contents below 20 mg/kg by R5. All statistical analyses were carried out with SigmaPlot 12.0 (Systat Software, San Jose, CA, USA).
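The two descriptive rules in this section, the RSD and the classification of correlation coefficients, can be expressed compactly. The following Python sketch is illustrative only: it assumes the sample standard deviation (n − 1 in the denominator), which the text does not specify, and it assigns coefficients falling in the unstated gap between 0.66 and 0.67 to the "weak" class. The six readings are hypothetical.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent.

    Uses the sample standard deviation (n - 1); this is an assumption.
    """
    return 100.0 * stdev(values) / mean(values)


def correlation_strength(r: float) -> str:
    """Classify a Pearson coefficient using the cut-offs given in the text.

    Values between 0.66 and 0.67 are treated as 'weak' (assumption).
    """
    if r > 0.78:
        return "strong"
    if r >= 0.67:
        return "medium"
    if r >= 0.54:
        return "weak"
    return "none"


# Hypothetical gluten readings (mg/kg) from 3 extracts x 2 wells
readings = [18.0, 19.0, 20.0, 21.0, 22.0, 20.0]
example_rsd = rsd_percent(readings)  # about 7.07 %

labels = [correlation_strength(r) for r in (0.91, 0.70, 0.60, 0.12)]
```

For the hypothetical readings, the mean is 20.0 mg/kg and the sample standard deviation is about 1.41 mg/kg, which gives an RSD close to 7 %.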

Table 2 Gluten contents of 14 wheat starch samples that were declared as gluten-free (GfW1–14) quantitated by the R5, G12, Skerritt, pAb1, pAb2, R5c, α20cold, and α20cnew ELISA test kits, respectively
Table 3 Gluten contents of 16 wheat starch samples without specification regarding the gluten content (W1–16) quantitated by the R5, G12, Skerritt, pAb1, pAb2, R5c, α20cold, and α20cnew ELISA test kits, respectively

Results

Determination of Moisture, Crude Protein, and Phosphorus Contents of Wheat Starches

The moisture contents of WSt ranged from 12.8 ± 0.3 % (GfW4 and W9) to 2.8 ± 0.2 % (W15), and there was no correlation between moisture and gluten contents regardless of the ELISA kit used (p > 0.574). All following values are given on an “as is” basis, because the gluten contents are also reported in such a way, as required by legislation (Codex Standard 118–1979 2015). The nitrogen contents of the WSt were corrected for non-protein nitrogen from lysophospholipids after quantitation of the phosphorus contents. The phosphorus contents were in the range of 16 mg/100 g (GfW6) to 121 mg/100 g (W15), with an overall average value (n = 30) of 40 ± 2 mg/100 g, similar to those reported earlier for WSt samples (Kasarda et al. 2008). This resulted in corrected CP contents from as low as 0.06 ± 0.02 % (W3), 0.12 ± 0.02 % (W5), and 0.12 ± 0.03 % (W9) up to 2.02 ± 0.07 % (W8) and 2.04 ± 0.06 % (W15). The CP contents of GfW samples were much lower ranging from 0.14 ± 0.04 % (GfW1) to 0.32 ± 0.06 % (GfW6), with an average content (n = 14) of 0.23 ± 0.05 %. The average over all samples (n = 30) was 0.35 ± 0.47 %, mostly due to the two exceptions W8 and W15 with very high CP contents.

Comparative Analysis of Gluten-Free Wheat Starches

In total, 14 samples that had been declared as gluten-free were analyzed (Table 2). For these comparisons, the threshold was set strictly at 20.0 mg gluten/kg, so that samples containing 19.5 ± 0.6 mg/kg (GfW9 by pAb1) or 19.1 ± 2.4 mg/kg (GfW12 by pAb2) would still be considered gluten-free, as opposed to a sample containing 20.3 ± 0.6 mg/kg (GfW12 by pAb1) that would no longer be considered gluten-free. Judging by the R5 ELISA, 12 out of 14 GfW were confirmed to contain less than 20 mg gluten/kg; one was just a bit over the threshold (GfW11) and one was clearly over the threshold (GfW8). Compared to the R5, the G12 ELISA found lower values for all but one GfW (GfW13), where the difference to the R5 was not significant. Therefore, all GfW samples would have been compliant with the regulatory threshold if analyzed by the G12 ELISA, even the GfW8 sample, which was below the limit of quantitation (LOQ) of the test kit. The analyses by Skerritt ELISA resulted in unexpectedly high values for all GfW that were significantly higher than those of all other test kits in 12 out of 14 cases. According to this kit, only 1 out of 14 samples (GfW5) would have been gluten-free. The pAb1 test kit gave five results that agreed with the R5 (GfW1, GfW4, GfW6, GfW7, GfW10), three that were lower than the R5 (GfW2, GfW8, GfW11), and six that were higher than the R5 (GfW3, GfW5, GfW9, GfW12-14). All in all, 10 out of 14 GfW were gluten-free with this test kit, 2 were right at the threshold (GfW3, GfW12), and 2 were slightly above the threshold (GfW13, GfW14). With the pAb2 ELISA, eight results were not significantly different from the R5 (GfW1, GfW3, GfW4, GfW7, GfW9, GfW10, GfW13, GfW14), three were lower (GfW2, GfW6, GfW8), and two were higher (GfW11, GfW12) than the R5. One result could not be judged (GfW5), because it was below the LOQ of the R5 ELISA (<5.0 mg/kg). Still, the value found by the pAb2 ELISA was 2.2 mg/kg, which would also agree with the R5 ELISA. 
Judging by the pAb2 test kit, 12 out of 14 GfW were gluten-free, 1 was just above the threshold (GfW13), and 1 contained about twice the allowable amount (GfW11). In comparison to the R5 sandwich ELISA, the competitive R5 assay gave lower values for four samples (GfW7, GfW8, GfW10, GfW11), higher values for four samples (GfW3–5, GfW9), and the same values for five samples (GfW1, GfW2, GfW12–14). GfW6 could not be judged, because the content was 8.7 mg/kg by R5, which was already below the LOQ (<10.0 mg/kg) of the R5c. Again, 12 out of 14 samples were gluten-free, but the 2 samples with gluten contents above 20 mg/kg were different for both R5 assays, i.e., GfW8 and GfW11 for R5, but GfW3 and GfW4 for R5c. With one exception, all GfW samples contained gluten below the LOQ (<5.0 mg/kg) of the α20cold assay, and according to this test kit, all GfW were gluten-free. The α20cnew assay provided significantly different results compared to the α20cold assay in 13 GfW, except GfW5. Seven α20cnew results were not significantly different from the R5 (GfW1, GfW3, GfW6, GfW10, GfW11, GfW13, GfW14), two were lower (GfW2, GfW8), and four were higher (GfW4, GfW7, GfW9, GfW12) than the R5; the gluten content of GfW5 was below the respective LOQs of both kits. In total, 12 out of 14 samples were gluten-free judging by the α20cnew kit.
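The decision rule applied throughout this section compares the mean gluten content with the strict threshold of 20.0 mg/kg and ignores the standard deviation. A minimal Python sketch of that rule is given below; the function name is hypothetical, and the treatment of a value of exactly 20.0 mg/kg as no longer gluten-free is an assumption, because the text does not state the boundary case.

```python
THRESHOLD_MG_PER_KG = 20.0  # strict threshold used for the comparisons


def is_gluten_free(mean_gluten_mg_per_kg: float) -> bool:
    """Classify a sample from its mean gluten content only.

    The standard deviation is deliberately ignored, as in the text.
    A value of exactly 20.0 mg/kg is treated as not gluten-free (assumption).
    """
    return mean_gluten_mg_per_kg < THRESHOLD_MG_PER_KG


# Mean values quoted in the text
examples = {
    "GfW9 by pAb1": is_gluten_free(19.5),   # True
    "GfW12 by pAb2": is_gluten_free(19.1),  # True
    "GfW12 by pAb1": is_gluten_free(20.3),  # False
}
```

Because GfW12 falls on different sides of the threshold with pAb1 and pAb2, although the two mean values differ by only about 1 mg/kg, the example also shows how sensitive the classification is near 20 mg/kg.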

When comparing the individual samples, one (GfW5) was classified as gluten-free by all test kits, even the Skerritt, and five were classified as gluten-free by all test kits except the Skerritt (GfW1, GfW2, GfW6, GfW7, GfW10). Not taking the Skerritt test kit into account, five samples were not gluten-free by one test kit (GfW4 by R5c, GfW8 by R5, GfW9 by α20cnew, GfW12 and GfW14 by pAb1), two were not gluten-free by two test kits (GfW3 by pAb1 and R5c, GfW13 by pAb1 and pAb2), and one was not gluten-free by three test kits (GfW11 by R5, pAb2, and α20cnew).

Comparative Analysis of Wheat Starches Without Specification of the Gluten Content

Table 3 shows the results for the other 16 WSt that had no specification regarding the gluten content. It was obvious that the gluten contents in those starches covered a very wide range from below 5 mg/kg (W9) up to more than 10,000 mg/kg (W8). According to the R5 ELISA, 5 out of 16 WSt were gluten-free (W1, W2, W5, W9, W16), 8 contained gluten between 20 and 100 mg/kg (W3, W4, W6, W7, W10, W12–14), and the remaining 3 WSt had very high gluten amounts above 400 mg/kg (W8, W11, W15). When analyzed with the G12 ELISA, 7 out of 16 WSt were gluten-free (W1–3, W5, W9, W12, W16), 5 had gluten contents between 20 and 100 mg/kg (W4, W6, W7, W10, W13), and 4 WSt contained more than 100 mg/kg (W8, W11, W14, W15). Compared to the R5 ELISA, the G12 assay found no significant differences for six WSt (W1, W2, W4, W8, W9, W15), lower values for six WSt (W3, W5–7, W12, W13), and higher values for four WSt (W10, W11, W14, W16). As seen before with the GfW samples, the Skerritt ELISA yielded exceptionally high values for all samples, which were 4 times (W15) up to 41 times (W16) as high as those of the R5 ELISA and also higher than any of the other results. Both assay controls provided by the manufacturer were also analyzed. The negative assay control was below the LOQ of the test kit (<5.0 mg/kg), and the positive assay control was between 5 and 40 mg/kg (19.0 ± 2.3 mg/kg), as indicated in the kit manual, so that any contamination or systematic error could be ruled out. The values for gluten were 5.89 % in W8, 1.00 % in W11, and 3.09 % in W15 and thus even higher than the CP contents (2.02 % in W8, 0.39 % in W11, and 2.04 % in W15), which seemed to make little sense. The results of the pAb1 ELISA showed that 4 out of 16 WSt were gluten-free (W2, W5, W9, W16), 6 had gluten contents between 20 and 100 mg/kg (W1, W3, W4, W10, W12, W14), and the remaining 6 (W6–8, W11, W13, W15) had gluten contents above 100 mg/kg. 
Three results of the pAb1 ELISA agreed with those of the R5 ELISA (W9, W10, W16), five results were lower (W8, W11, W12, W14, W15), and eight results were higher (W1–7, W13) than those of the R5 ELISA. Only 3 out of 16 WSt were gluten-free (W2, W9, W16) when analyzed with the pAb2 ELISA, 6 (W1, W3–5, W7, W12) had 20–100 mg gluten/kg, and 7 had more than 100 mg gluten/kg (W6, W8, W10, W11, W13–15). In comparison to the R5 ELISA, the pAb2 ELISA provided the same results for 3 WSt (W8, W9, W15), a lower result for 1 WSt (W12), and higher results for 12 WSt (W1–7, W10, W11, W13, W14, W16). The results of the pAb2 were the same as those of the pAb1 ELISA in only 6 out of 16 cases (W1, W3, W4, W6, W9, W12), which pointed to distinct differences between these two pAb-based sandwich ELISAs. With the R5c assay, 2 out of 16 WSt were gluten-free (W2, W9), 6 contained 20–100 mg gluten/kg (W1, W3, W5, W12, W14, W16), and 8 contained more than 100 mg/kg (W4, W6–8, W10, W11, W13, W15). Four results of the R5c agreed with the R5 ELISA (W3, W8, W9, W12), 2 were lower (W11, W15), and 10 were higher (W1, W2, W4–7, W10, W13, W14, W16) than with the R5 ELISA. Six out of 16 WSt were gluten-free (W1–3, W5, W9, W16) according to the α20cold test kit, 4 had gluten contents between 20 and 100 mg/kg (W4, W10, W12, W14), and 6 above 100 mg/kg (W6–8, W11, W13, W15). Only one result (W9) was not significantly different from the R5 assay, eight results were lower (W1–3, W5, W10, W12, W14, W16), and seven higher (W4, W6–8, W11, W13, W15) with the α20cold compared to the R5 ELISA. With the α20cnew kit, 4 out of 16 WSt were gluten-free (W2, W5, W9, W16), 6 contained 20–100 mg gluten/kg (W1, W3, W4, W10, W12, W14), and 6 more than 100 mg gluten/kg (W6–8, W11, W13, W15). When comparing the α20cnew to the R5 assay, three results for gluten content were not significantly different (W3, W9, W16), four were lower (W8, W11, W12, W15), and eight (W1, W4–7, W10, W13, W14) were higher with the α20cnew kit. 
The results for W2 could not be judged, because the gluten content determined by the R5 assay was below the LOQ of the α20cnew assay. The values for gluten content determined by α20cold and α20cnew agreed in only 4 out of 16 cases (W2, W6, W9, W12). Both competitive assays (R5c and α20cnew) gave the same results in 5 out of 16 samples.

Not considering the results of the Skerritt ELISA because of their exceedingly high gluten contents, seven WSt samples (W1–3, W5, W9, W12, W16) would be classified gluten-free by at least one of the remaining six test kits. W2 and W9 were gluten-free by all test kits and W16 was gluten-free by all test kits except the R5c. W5 was gluten-free with four out of six kits but not with the pAb2 and the R5c test kits, although the R5c result was more or less at the threshold of 20 mg/kg. According to the R5, G12, and α20cold ELISAs, W1 was gluten-free (12.8–16.2 mg/kg), but the pAb1, pAb2, R5c, and α20cnew ELISAs indicated gluten clearly above the regulatory threshold (38.7–55.2 mg/kg). Two test kits (G12 and α20cold) yielded results below 20 mg/kg for the W3 sample, but the other test kits gave values at (R5) or above (pAb1, pAb2, R5c, α20cnew) 20 mg/kg. W12 was only gluten-free with the G12 ELISA, whereas all other ELISAs gave higher values of 26.8 up to 70.8 mg/kg. The other nine samples contained gluten above the threshold of 20 mg/kg with all ELISAs, although the exact quantities varied depending on the ELISA test kit, e.g., from 46.8 mg/kg (R5) to 104.8 mg/kg (R5c) in W4 and from 37.5 mg/kg (α20cold) to 282.8 mg/kg (pAb2) in W14.

Comparative Analysis of the Defined W9 + 20 Sample

Due to the rather high variation of results between the seven different ELISA test kits, one sample with a defined gluten content of 20 mg/kg, composed of 13.4 mg gliadins/kg and 6.6 mg glutenins/kg, was prepared by mixing wheat flour and WSt W9 (W9 + 20). After confirmation of the requirements for homogeneity (Thompson et al. 2006), the sample W9 + 20 was also analyzed with all ELISA test kits (Table 3). The R5 and pAb2 ELISAs were closest to the target value of 20 mg gluten/kg. The G12, pAb1, α20cold, and α20cnew assays tended to underestimate the gluten content and found 66.5–82.0 % of the target concentration. In contrast, the Skerritt and R5c assays found 227.5 and 191 % of the target concentration, resulting in an overestimation. The measured gluten concentrations in the unknown samples GfW1–14 and W1–16 given in Tables 2 and 3 were not corrected with these recoveries for reasons discussed later.
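The recoveries quoted here relate a measured gluten content to the target of 20 mg/kg. The Python sketch below shows the calculation in both directions; the function names are illustrative, and the back-calculated concentrations are derived from the percentages in the text rather than copied from Table 3.

```python
TARGET_MG_PER_KG = 20.0  # defined gluten content of the W9 + 20 sample


def recovery_percent(measured: float, target: float = TARGET_MG_PER_KG) -> float:
    """Recovery as a percentage of the target concentration."""
    return 100.0 * measured / target


def measured_from_recovery(recovery_pct: float, target: float = TARGET_MG_PER_KG) -> float:
    """Concentration implied by a reported recovery."""
    return recovery_pct / 100.0 * target


# Concentrations implied by the recoveries stated in the text
skerritt_mg_per_kg = measured_from_recovery(227.5)  # about 45.5 mg/kg
r5c_mg_per_kg = measured_from_recovery(191.0)       # about 38.2 mg/kg
low_end_mg_per_kg = measured_from_recovery(66.5)    # about 13.3 mg/kg
high_end_mg_per_kg = measured_from_recovery(82.0)   # about 16.4 mg/kg
```

On this basis, the four underestimating assays would have classified the 20 mg/kg sample as gluten-free, whereas the Skerritt and R5c assays would have placed it at roughly twice the threshold.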

Correlations Between Crude Protein and Gluten Contents for Each ELISA Test Kit

The CP contents were correlated to the gluten contents measured by each ELISA test kit considering all WSt samples (n = 27) except the three WSt with uncharacteristically high CP and gluten contents (W8, W11, W15; Fig. 1). The range of CP contents included in the correlations was thus from 0.06 % (W3) to 0.46 % (W14). The equations for the linear regression lines y = b₀ + b₁x were calculated for each of the ELISA test kits and were given by y = 20.46 + 37.11x (r² = 0.015) for the R5, y = 2.45 + 115.14x (r² = 0.105) for the G12, y = 25.10 + 66.56x (r² = 0.009) for the pAb1, y = −36.44 + 363.50x (r² = 0.236) for the pAb2, y = 31.96 + 82.57x (r² = 0.016) for the R5c, y = 39.57 + 88.92x (r² = 0.033) for the α20cold, y = 25.48 + 67.91x (r² = 0.019) for the α20cnew, and y = 14.16 + 1572.45x (r² = 0.074) for the Skerritt assay. None of the correlations between CP and gluten contents was significant, indicating that the analysis of the CP content is not suitable as an independent analytical parameter to predict gluten contents of WSt.

Fig. 1 Correlations between crude protein and gluten contents for each ELISA test kit including all WSt samples (n = 27) except W8, W11, and W15. Missing values were below the respective limit of quantitation of the assay. a R5 ELISA, n = 25, r = 0.121; b G12 ELISA, n = 19, r = 0.324; c pAb1 ELISA, n = 26, r = 0.094; d pAb2 ELISA, n = 27, r = 0.486; e R5c ELISA, n = 23, r = 0.126; f α20cnew ELISA, n = 24, r = 0.139; and g Skerritt ELISA, n = 27, r = 0.272
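For a simple linear regression, the coefficient of determination equals the square of Pearson's r, so the correlation coefficients in Fig. 1 can be cross-checked against the coefficients of determination quoted with the regression equations. The Python sketch below performs this check with the published values; the key a20c_new stands for α20cnew, and the tolerance of 0.001 is an assumption that allows for rounding of r to three decimals.

```python
# (r from Fig. 1, coefficient of determination from the regression equations)
reported = {
    "R5": (0.121, 0.015),
    "G12": (0.324, 0.105),
    "pAb1": (0.094, 0.009),
    "pAb2": (0.486, 0.236),
    "R5c": (0.126, 0.016),
    "a20c_new": (0.139, 0.019),
    "Skerritt": (0.272, 0.074),
}

TOLERANCE = 0.001  # allows for rounding of the published values (assumption)

# True where the squared correlation coefficient matches the published value
consistent = {
    kit: abs(r * r - r_squared) < TOLERANCE
    for kit, (r, r_squared) in reported.items()
}

# Largest share of gluten variance explained by crude protein (pAb2)
max_explained = max(r_squared for _, r_squared in reported.values())
```

All seven pairs are consistent, and even for the pAb2 assay, crude protein accounts for less than a quarter of the variance in the measured gluten contents.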

Correlations Between Gluten Contents Analyzed by Different ELISA Test Kits

The gluten contents obtained after analysis by the different ELISA test kits (only α20cnew) were correlated in all 21 possible combinations in order to see how well the results corresponded to one another. When the 27 WSt samples except W8, W11, and W15 were included, 12 strong, 3 medium, and 4 weak correlations were observed, leaving 2 cases without correlation (Table 4). The best agreements (r > 0.90) were seen between the α20cnew and R5c, the α20cnew and R5, the R5c and pAb1, and the R5c and R5 assays. The similarity of results between the R5c and R5 assays was according to expectations, because the same antibody is used in both test kits. The results of all test kits corresponded to a greater or lesser extent to those of the R5 kit, with weak (G12), medium (pAb2), or strong (Skerritt, pAb1, R5c, α20cnew) correlations. Despite the exceptionally high values for gluten observed with the Skerritt assay, the results of this kit were highly correlated to those of the R5, G12, pAb2, and R5c assays. This indicates that the material used for calibration in this kit may be unsuitable for WSt samples.

Table 4 Pearson’s product moment correlation coefficients (r) for all possible combinations of the seven ELISA test kits, correlation analysis including all WSt samples (n = 27) except W8, W11, and W15

The second correlation analysis only included the 10 out of 27 WSt samples that had gluten contents between 20 and 100 mg/kg by R5 (Table 5). Here, eight correlations were strong, two were medium, three were weak, and the other eight combinations were not correlated to one another. Again, the α20cnew and R5c and the R5c and pAb1 assays showed the best concordance (r > 0.90). The results of the pAb1, R5c, and α20cnew assays were still highly correlated to those of the R5, but those of the Skerritt were only weakly correlated and those of the G12 and pAb2 assays were not correlated to the R5. The results of the G12 and pAb2 assays were highly correlated to each other in both cases (Tables 4 and 5).

Table 5 Pearson’s product moment correlation coefficients (r) for all possible combinations of the 7 ELISA test kits, correlation analysis including only the 10 WSt samples with gluten contents from 20 to 100 mg/kg by R5

The picture changed again when only the 17 WSt with gluten contents below 20 mg/kg by R5 were included into the correlation (Table 6). There were only four strong, five medium, and two weak correlations in this case. The other 10 combinations were not correlated. The results of the pAb1 and pAb2 assays were strongly correlated in this case, which was interesting, because they were not correlated in the other two cases (Tables 4 and 5). Apparently, the differences in specificities and sensitivities of the various test systems were most obvious at values between the regulatory threshold of 20 mg gluten/kg and the respective LOQs of the kits.
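The number of pairwise comparisons and the tallies quoted for the three correlation analyses can be cross-checked with a few lines of Python. The sketch below uses only the counts stated in the text; the list of kit labels and the dictionary layout are illustrative, and a20c_new stands for α20cnew.

```python
from itertools import combinations

kits = ["R5", "G12", "Skerritt", "pAb1", "pAb2", "R5c", "a20c_new"]

# All unordered pairs of the seven test kits
pairs = list(combinations(kits, 2))
n_pairs = len(pairs)  # 7 * 6 / 2 = 21

# Numbers of strong, medium, weak, and absent correlations from the text
tallies = {
    "Table 4 (n = 27)": {"strong": 12, "medium": 3, "weak": 4, "none": 2},
    "Table 5 (n = 10)": {"strong": 8, "medium": 2, "weak": 3, "none": 8},
    "Table 6 (n = 17)": {"strong": 4, "medium": 5, "weak": 2, "none": 10},
}

# Each analysis should account for every pair exactly once
totals = {name: sum(counts.values()) for name, counts in tallies.items()}

# Share of kit pairs that were strongly correlated in each analysis
share_strong = {
    name: counts["strong"] / n_pairs for name, counts in tallies.items()
}
```

The share of strongly correlated kit pairs drops from 12 of 21 for all 27 samples to 4 of 21 for the samples below 20 mg/kg, which puts a number on the loss of agreement near the regulatory threshold.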

Table 6 Pearson’s product moment correlation coefficients (r) for all possible combinations of the 7 ELISA test kits, correlation analysis including only the 17 WSt samples with gluten contents below 20 mg/kg by R5

Discussion

The comparative analysis of 30 WSt samples with 7 commercial gluten ELISA test kits revealed discrepancies in gluten quantitation, even in this comparatively simple food matrix. Judging by the available literature (Allred and Ritter 2010; Bruins Slot et al. 2015a; Bugyi et al. 2013; Rallabhandi et al. 2015; Sharma et al. 2015), these results were not unexpected, mostly because of the different characteristics of the ELISA test kits (Table 1) that cause systematic deviations (Sykes et al. 2012). As the first step, the gluten extraction procedure has an important influence on the analytical results. Some kits only use aqueous ethanol (pAb1, R5c, α20cold, α20cnew) which is neither intended nor suitable for extraction of polymeric gluten proteins. Still, due to calibration and/or correction factors, these kits showed no obvious tendency to underestimate gluten contents of WSt compared to those kits (R5, G12, pAb2) that use reducing agents for extraction. These disparities between ELISA systems affect analytical results for gluten contents in raw materials and food products and have to be considered carefully, especially in relation to legislative requirements for the correct labeling of gluten-free foods. At the moment, the R5 Mendez method is recommended as type 1 method for the determination of gluten (Codex Standard 234–1999 2014), but due to recent progress in method development and validation studies (Don et al. 2014), efforts are underway to review this recommendation and add the G12 mAb-based method or others (Codex Committee of Methods of Analysis and Sampling 2015). A recent comparison of results obtained with the G12 compared to the R5 ELISA concluded that both assays gave comparable gluten contents in a variety of routine food samples (Hochegger et al. 2015). 
The aforementioned study did not include any WSt samples, but the data presented here indicated that the R5 and G12 assays yielded significantly different gluten contents in 23 out of a total of 31 WSt, including the control sample (W9 + 20). Regarding the identification of the gluten-free status of WSt, the R5 and G12 assays were comparable in 26 out of 30 cases, despite the weak or absent correlations between the two assays, a discrepancy that may be due to the properties inherent in correlation analyses. Apart from the R5 ELISA, the US Food and Drug Administration mentions the pAb2 ELISA (“Morinaga method”) as an example of a scientifically valid method that can be used to detect gluten in a variety of food matrices (US Department of Health and Human Services 2014). Although the results of the R5 and pAb2 assays were the same and quite accurate for the control sample in the present investigation, the gluten contents of the other WSt samples were significantly different in 18 out of 31 cases, resulting in medium or no correlations. The Skerritt, R5c, and α20cold assays have also undergone full validation studies (Scherf and Poms 2016). The exceptionally high gluten values reported for WSt by the Skerritt test kit could not be fully explained, because the assay control provided by the manufacturer gave a realistic value close to 20 mg gluten/kg. The recovery of the W9 + 20 sample was 228 %, but even so, the results of the real samples were between 2 times (GfW5) and 25 times (W14) as high as the mean value of the other six test kits, even resulting in gluten contents exceeding CP contents. Overestimation of gluten contents by up to six times the spiked level (Sharma 2012), fourfold to tenfold higher values than those of R5-based ELISAs (Allred and Ritter 2010), and recoveries above 150 % (Bruins Slot et al. 2015a) have been reported before for this assay. However, this tendency to overestimate gluten contents appeared to be even more pronounced when analyzing WSt samples.
The Skerritt mAb has been shown to react strongly with high-molecular-weight glutenins (Sharma 2012). This feature may have contributed to the high gluten values measured for all WSt, because the relative content of glutenins may be higher in WSt than in wheat flours as a result of gliadin depletion during the washing procedure (Scherf 2016). Only one recent study also included the pAb1 and α20cold assays in its comparison of ELISA test kits (Bruins Slot et al. 2015a). In that study, the pAb1 assay detected PWG-gliadin in spiked food products quite accurately, although it tended to overestimate gluten contents. Here, the findings were different, because the gluten content of the W9 + 20 sample was quantitated with a lower recovery of 73 %. Other than that, this kit did not show any obvious tendency to overestimate or underestimate the gluten contents of the unknown WSt samples. The results of the pAb1 kit were significantly different from those of the pAb2 kit in 18 out of 31 cases and were strongly correlated only for the WSt with gluten contents below 20 mg/kg, but not in the other cases. Although both test kits use a pAb, the characteristics of the pAbs and the overall test procedures seemed to result in different gluten contents in the majority of cases. The recovery for gluten in the W9 + 20 sample was 67 % for the α20cold and 82 % for the α20cnew assay, which is comparable to the 80 % reported before for gliadin in a flour mixture analyzed with the α20cold kit (Bruins Slot et al. 2015a). The most obvious feature of the α20cold assay in the present study was its inability to detect gluten in all but one GfW, whereas the α20cnew assay identified gluten contents above its LOQ in 13 out of 14 GfW.
Considering only the WSt samples with values above the LOQs and excluding W8, W11, and W15 (n = 10), the results of the α20cold and α20cnew assays were strongly correlated (r = 0.894), but the α20cnew kit with its new system appears to be more robust and better suited for the analysis of gluten in WSt, especially GfW, than the α20cold kit. The R5 and R5c assays both use the same mAb, but compared to the R5, the analysis by R5c resulted in the same or higher gluten contents in 9 and 15 out of 31 cases, respectively. This may be expected, because the competitive format requires only one epitope for antibody binding and not two as in the sandwich format, so that smaller gluten peptides may be recognized as well. It is possible that some of the gluten proteins contained only one epitope, especially if the overall contents are low. The two assays showed strongly correlated results, with the exception of the WSt with gluten contents below 20 mg/kg, where the differences between the two formats were most striking and resulted in no correlation at all (r = 0.079).
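To illustrate the kind of pairwise comparison discussed above, the following minimal Python sketch computes Pearson's product-moment correlation coefficient for paired results of two kits. The function name `pearson_r` and the example values are hypothetical and are not data from this study; the sketch assumes that results below the LOQ have already been excluded, so that only complete pairs remain.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient for paired values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


# Hypothetical gluten contents (mg/kg) of the same five samples measured with two kits
kit_a = [12.0, 25.0, 40.0, 8.0, 60.0]
kit_b = [15.0, 22.0, 47.0, 9.0, 55.0]
r = pearson_r(kit_a, kit_b)
print(f"r = {r:.3f}")
```

Note that a coefficient of this kind describes only the linear association between two series of results. Two kits can agree on whether most samples lie below 20 mg/kg and still correlate weakly, in particular when the values span a narrow range.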

The apparent lack of correlation (no correlation in 10 out of 21 pairs; Table 6) between the results of different test systems was most noticeable for the 17 WSt with gluten contents below 20 mg/kg by R5. The different specificities and sensitivities of the various test systems were most apparent at those trace levels between the respective LOQs of the kits and the regulatory threshold of 20 mg gluten/kg. Another issue to highlight was the lack of correlation between the CP and gluten contents, irrespective of the kit used. In gluten-containing products (1 to 10.6 % of gluten in the product), a rough estimation of gluten contents may be possible by taking 80 % of total protein (Assor et al. 2014), not taking variations due to growing conditions or cultivar into account, but this is not possible in the case of WSt, where the average gluten content was around 0.3 %. Furthermore, these kits are designed for the analysis of trace amounts of gluten, which is why the measurements tend to be less accurate when contamination with gluten is very high, as in W8 and W15.

All in all, the comparative analysis of gluten in 30 unknown WSt and 1 defined control by 7 different commercial ELISA kits (considering only α20cnew) yielded 2 significantly different results in 2 cases (GfW10, W9), 3 significantly different results in 6 cases (GfW1, GfW2, GfW5, GfW6, GfW9, GfW13), 4 significantly different results in 12 cases (GfW7, GfW8, GfW14, W1–3, W6, W8, W10, W12, W16, W9 + 20), 5 significantly different results in 7 cases (GfW3, GfW4, GfW11, GfW12, W4, W5, W15), and even 6 significantly different results in 4 cases (W7, W11, W13, W14), not counting the results below the LOQs. There were 13 WSt for which at least 1 kit found gluten values below 20 mg/kg and at least 2 kits found gluten values above 20 mg/kg (GfW3, GfW4, GfW8, GfW9, GfW11–14, W1, W3, W5, W12, W16). Such conflicting results would directly affect the labeling of the WSt, even though WSt is usually not used per se, but as an ingredient for baked goods.
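The criterion used above to identify samples with a conflicting gluten-free status (at least 1 kit below and at least 2 kits above 20 mg/kg) can be written as a simple rule. The following Python sketch is an illustration only: the function name, the example values, and the convention of treating a result below the LOQ (represented as `None`) as below the threshold are assumptions, not part of the study.

```python
THRESHOLD_MG_PER_KG = 20.0  # regulatory threshold for gluten-free foods


def has_conflicting_status(results, threshold=THRESHOLD_MG_PER_KG,
                           min_below=1, min_above=2):
    """Return True if at least `min_below` kits report gluten below the
    threshold and at least `min_above` kits report gluten above it.

    `results` maps kit name -> gluten content in mg/kg. None stands for a
    result below the kit's LOQ and is counted as below the threshold
    (an assumption of this sketch).
    """
    values = list(results.values())
    below = sum(1 for v in values if v is None or v < threshold)
    above = sum(1 for v in values if v is not None and v > threshold)
    return below >= min_below and above >= min_above


# Hypothetical results for one sample (not data from this study)
sample = {"R5": 14.2, "G12": 31.5, "pAb1": 18.0, "pAb2": 26.7,
          "R5c": 22.1, "Skerritt": 95.0, "α20cnew": None}
print(has_conflicting_status(sample))
```

Requiring at least two kits above the threshold, as in the text, prevents a single outlying kit from marking a sample as conflicting on its own.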

One important step toward standardizing the results of different test kits would be to have generally accepted reference materials similar to the PWG-gliadin standard (van Eckert et al. 2006), but containing total gluten from wheat, rye, and barley. Then, the validity of certain calculation factors that have to be introduced during data analysis (e.g., factor 2 from gliadin to gluten, factor 500 from gliadin peptide equivalents to gluten, or factor 0.85 from wheat protein to gluten) could also be verified. These reference materials could also be used to produce standardized assay control samples. Because a certified reference material is lacking, it is common practice to spike food samples with the well-characterized prolamin fraction of 28 European wheat cultivars (PWG-gliadin; van Eckert et al. 2006) or to mix gluten-free materials with gluten-containing flours at defined amounts, as done here. Even though one of the most commonly produced German winter wheat cultivars with a typical gliadin and glutenin profile was used for spiking purposes, its gluten composition may not adequately reflect that of the WSt samples.
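The role of the calculation factors mentioned above can be illustrated with a short Python sketch. The three factors are those named in the text; the function name, the dictionary keys, and the assumption that each factor is applied by simple multiplication of the calibrant-based result are illustrative and do not describe the data analysis of any specific kit.

```python
# Conversion factors named in the text. Applying them by multiplication is an
# assumption of this sketch, not a description of any kit's software.
FACTORS_TO_GLUTEN = {
    "gliadin": 2.0,                        # gliadin -> gluten
    "gliadin_peptide_equivalents": 500.0,  # gliadin peptide equivalents -> gluten
    "wheat_protein": 0.85,                 # wheat protein -> gluten
}


def to_gluten(value, measured_as):
    """Convert a result expressed in the kit's calibrant units to gluten."""
    try:
        factor = FACTORS_TO_GLUTEN[measured_as]
    except KeyError:
        raise ValueError(f"no conversion factor defined for {measured_as!r}") from None
    return value * factor


# Hypothetical example: 10 mg gliadin/kg corresponds to 20 mg gluten/kg with factor 2
print(to_gluten(10.0, "gliadin"))
```

Because each factor scales the final result directly, any bias in a factor carries over proportionally to the reported gluten content, which is why verifying the factors against an accepted reference material matters.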

The main reason why gluten contents were presented without recovery correction was that the gluten composition (relative amounts of gliadins to glutenins as well as individual gluten protein types, i.e., ω5-, ω1,2-, α-, and γ-gliadins and low- and high-molecular-weight glutenin subunits) of the W9 + 20 sample was not necessarily the same as that of the WSt, because the WSt had already undergone processing to partially or almost completely remove gluten. During the course of WSt purification, gliadins were eliminated more extensively than glutenins (Scherf 2016), and depending on the respective sensitivities and specificities of the antibodies, this may have a substantial influence on the results. Another reason was that, even if the results had been recovery corrected, most major differences would have remained, albeit with other numbers. Except for one kit that seemed to overestimate gluten contents in all WSt (Skerritt), none of the other kits (only considering α20cnew) showed any obvious flaws that would make it less well suited for the analysis of gluten in WSt. More research comparing not only defined samples (Scharf et al. 2013) but also large numbers of real samples is needed to help elucidate the characteristics of the various ELISA kits and to judge which kit is best suited for as many different food matrices as possible. These extensive comparisons will also become necessary if more than one test for gluten analysis is recommended by legislation in the future, not only to establish guidelines on how to proceed in case of conflicting results but also to guarantee that a high level of confidence can be placed in the analytical results to ensure the safety of gluten-free foods for CD patients. With the development of improved reference materials and novel methods based on LC-MS/MS, aptamers, or multiplex assays, many of the current uncertainties in gluten and allergen measurements may be overcome in the near future.

Conclusion

The comparison of gluten contents in 30 WSt samples analyzed with 7 commercially available ELISA test kits revealed significant differences between the kits for the same WSt sample. The main reasons are different extraction procedures, test formats, and reference materials as well as the specificities and sensitivities of the mAbs and pAbs. The divergence of results for gluten contents even in a comparatively simple food matrix such as WSt is rather concerning, because GfW is commonly used as an ingredient for gluten-free products. As such, the analytical methods used in quality control should be able to reliably detect gluten to ensure that each batch of GfW is safe to be used in foods for CD patients. As seen in this study, the variability of results between test kits may lead to a sample being labeled gluten-free according to kit x, whereas the same sample would be found to contain more than 20 mg gluten/kg if analyzed with kit y. As more kits are being devised and marketed, in-depth comparative studies using a large number of real samples are needed to test the strengths and weaknesses of each kit to ensure the safety of products for CD patients.