Introduction

Polychlorinated dibenzo-p-dioxins (PCDD) and polychlorinated dibenzofurans (PCDF) are persistent organic pollutants (POPs) widely recognized by the scientific community as being toxic and a risk to the health of wildlife and humans. Thus, risk assessment programs have been organized all around the world to evaluate the presence of these contaminants in various environmental compartments and food products. In the field of food safety, specific international regulations have been put in place to establish some rules and requirements for these monitoring plans. One example of this is the definition of a toxic equivalent value (TEQ) that can characterize the total dioxin content of a sample. This approach is justified by the need for a compliant/noncompliant decision for samples presenting a concentration below or above this limit. However, this quantitative procedure may minimize the qualitative information given by the abundance of each PCDD/PCDF congener: the specific characteristic contamination profile of the sample.

The main difficulty in interpreting this profile comes from the relative complexity of datasets arising from a large number of samples. Indeed, similarities or differences between samples are not easy to identify from raw data tables. One possible way to deal with this issue is provided by multivariate statistical analysis, such as principal component analysis (PCA) or linear discriminant analysis (LDA). These techniques have already been used to investigate the main factors that influence variations in dioxin congener profiles in postcombustion chambers of waste incinerators. Nonoptimal conditions (for example, reduced temperatures) were found to result in increased contents of lower chlorinated (mono- to tri-CDD) dibenzo-p-dioxins [1], while the chlorine content of the waste was shown to be a major influence on the relative amounts of PCDDs and PCDFs [2]. The variability of the congener profile in relation to the environmental compartment (air, soil, sediment) from which the sample originates has also been widely investigated using PCA and/or cluster analysis [3–8], with results demonstrating specific behaviors for the different congeners when transferred between these environmental media. This variability led to a congener-specific bioaccumulation of PCDD/PCDF in the environment [9, 10], which was suggested to provide a signature that could be used to identify the source of contamination [6, 11–15].

Compared to these numerous applications in the environmental sciences, the use of multivariate analysis to assess variations in the PCDD/PCDF contamination profile in the field of food safety has been rather limited, although some data have been published regarding the discrimination of egg samples from various locations [16] and the existence of different profiles for various dairy products [17]. However, important modifications of these profiles can occur during the transfer of the different congeners from the environment to living organisms and their products. Some hypotheses have been proposed to explain these modifications of the dioxin profile from the source to the final biological matrix, invoking the specific physicochemical properties (volatility, solubility,…) of the different congeners [18] and/or different reactions (photosensitivity, association with various environmental chemicals,…). Last, but not least, one main source of variability is provided by the metabolic biotransformations and bioaccumulation that occur in living animals. In this context, the objective of the present study was to use multivariate statistical techniques, and especially LDA, to analyze a large dataset from dioxin analyses performed on various food products of animal origin collected in France. The aim was to demonstrate differences between these samples in terms of their dioxin contamination profiles, depending on the sample type and/or origin. The final challenge of this study was to investigate the causes and possible consequences of any such differences, and to discuss some possible practical applications of these findings.

Experimental

Samples

The present study is based on 501 food sample analyses, including 176 fish, 112 milk, 54 muscle, 30 liver, 46 oil, 35 egg, 32 cheese, 14 fat and 2 butter samples. These samples covered different animal species as well as different collection locations. They were collected over a two-year period within the remit of an official French monitoring survey.

Reagents and chemicals

Organic solvents such as pentane, hexane, cyclohexane, isooctane, toluene, acetone, dichloromethane, diethylether, ethanol and methanol were of Picograde quality and were provided by Promochem (Molsheim, France). Acetic and sulfuric acids were purchased from SDS (Peypin, France). Sodium sulfate and potassium oxalate were from Merck (Darmstadt, Germany). Silica gel was from Fluka (Buchs, Switzerland). Native 12C- and 13C-labeled PCDD/PCDF congeners were provided by Promochem. Standard solutions were prepared in toluene and stored in the dark at <6 °C.

Sample preparation

For solid matrices, 10–20 g aliquots of fresh samples (corresponding to 0.5–1.5 g of the fat equivalent) were lyophilized, powdered, and transferred into accelerated solvent extraction (ASE) cells. The pressure and temperature were set to 100 bar and 120 °C respectively. Four successive extraction cycles (5 min each) were performed using a mixture of toluene/acetone 70:30 (v/v) as the extraction solvent. The extract was evaporated to dryness, permitting gravimetric determination of the fat content. For liquid samples, 2–10 mL aliquots (corresponding to 0.5–1.5 g of fat equivalent) were mixed with 4 mL of a dipotassium oxalate solution (35%), followed by 200 mL ethanol, 100 mL ether and 140 mL pentane. The upper organic layer was treated with a sodium sulfate solution (3%) and evaporated to dryness. For all samples, extracts were dissolved in hexane and a classical three-step purification process was then performed, using activated silica, florisil and celite/carbon stationary phases in that order. 13C-labeled internal standards were introduced into all samples before extraction.

GC-HRMS analysis

GC-HRMS detection was performed on a Hewlett-Packard 6890 gas chromatograph (Palo Alto, CA, USA), equipped with a DB-5MS column (30 m×0.25 mm i.d., 0.25 μm film thickness), and coupled to a Jeol (Peabody, MA, USA) JMS-700D high-resolution mass spectrometer. The GC program was as follows: 120 °C (3 min), 20 °C/min until 170 °C (0 min), and then 3 °C/min until 275 °C (7 min). The injector and transfer line temperatures were both set to 280 °C. Acquisition was performed in the SIM mode with a resolution better than 10,000 (10% valley). The electron impact ionization energy was 38–40 eV and the ion source temperature was set to 280 °C. The 17 monitored PCDD/PCDF congeners were identified on the basis of their molecular ions [M]+• and the corresponding 37Cl isotopic contributions. Quantification was performed using 13C-labeled analogs as internal standards.
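As a quick check on the oven program described above, the total chromatographic run time can be worked out from the hold times and ramp rates. The short sketch below does this arithmetic; the function name and the tuple layout are illustrative choices, not part of the original method description.

```python
def oven_program_minutes(initial_temp, initial_hold, ramps):
    """Total run time (min) of a GC oven program.

    ramps is a list of (rate in degC/min, target temp in degC, hold in min).
    """
    total = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold  # time to ramp, then hold
        temp = target
    return total

# Program from the text: 120 degC (3 min), 20 degC/min to 170 degC (0 min),
# then 3 degC/min to 275 degC (7 min)
run_time = oven_program_minutes(120.0, 3.0, [(20.0, 170.0, 0.0), (3.0, 275.0, 7.0)])
print(f"{run_time:.1f} min")
```

With the values given in the text, this corresponds to 3 + 2.5 + 35 + 7 = 47.5 min per injection.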

Statistical analysis

Each PCDD/PCDF congener was considered to be a statistical variable and each analyzed sample to be an observation. The value assigned to each variable was the concentration measured for the corresponding congener, expressed relative to the sum of the 17 congener concentrations. For each observation, additional informative variables were introduced, including the amount of extracted fat, the calculated WHO-TEQ, the nature/species of the sample, and its collection location. The resulting data table contained 501 samples (rows) and 22 variables observed for each sample (columns). Multivariate statistical analyses were performed on this dataset to reveal and study the variability across the different samples in terms of their global contamination profiles. These analyses were carried out using Statistica software (v.5.5, Statsoft, Inc., Tulsa, OK, USA), and they included principal component analysis (PCA), hierarchical clustering, and step-by-step incremental linear discriminant analysis (LDA).
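The normalization step described above (each congener expressed relative to the sum of the 17 congeners) can be sketched as follows. This is a minimal illustration in Python with NumPy rather than the Statistica workflow used in the study; the function name and the demonstration values are hypothetical.

```python
import numpy as np

def relative_profile(conc):
    """Divide each congener concentration by the sample's total.

    conc: array of shape (n_samples, n_congeners), absolute concentrations.
    Returns an array of the same shape whose rows sum to 1.
    """
    conc = np.asarray(conc, dtype=float)
    totals = conc.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("each sample needs a positive total concentration")
    return conc / totals

# Made-up concentrations for two samples (rows) and 17 congeners (columns)
rng = np.random.default_rng(0)
demo_conc = rng.uniform(0.01, 5.0, size=(2, 17))
demo_profiles = relative_profile(demo_conc)
```

Working with proportions removes the effect of the overall contamination level, so that two samples with the same pattern but different total contents obtain the same profile.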

Results and discussion

Overview of the data set

Principal component analysis (PCA) was first performed on the entire dataset constituted by the 17 congener concentrations measured in the 501 investigated food samples. The three principal axes extracted by the PCA explained 69.47% of the total variance. The projections of the 17 variables onto these axes are shown in Fig. 1. The gravity centers (average products) of the nine food product classes (fish, milk, muscle, liver, fat, egg, cheese, oil, butter) were also plotted. These results show the existence of a group of highly correlated variables, corresponding to all hexa-PCDD/F congeners except 1,2,3,7,8,9-HxCDF; this group was particularly abundant in milk, butter and fat samples. A second observation was a correlation between 2,3,7,8-TCDF and 1,2,3,7,8-PeCDF; these two lower chlorinated PCDFs were present at particularly high levels in fish samples. Finally, the PCA revealed significantly different PCDD/F profiles between fish, dairy products (milk, butter, cheese) and meat (muscle, liver) samples. In terms of relationships between the congeners, some correlations were clearly apparent, but the majority of the observed projections could not be interpreted unambiguously.
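For readers who wish to reproduce this kind of overview, the sketch below shows one way to obtain the component scores, the variable projections and the class gravity centers with NumPy. It assumes that the variables are standardized (correlation-matrix PCA), which the text does not state explicitly, and it runs on randomly generated profiles, so the numbers it produces have no relation to the 69.47% reported above.

```python
import numpy as np

def pca(X, n_components=3):
    """PCA on standardized variables via singular value decomposition."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)                   # variance fractions
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    # Correlation of each variable with each retained component
    loadings = Vt[:n_components].T * s[:n_components] / np.sqrt(len(Z) - 1)
    return scores, loadings, explained[:n_components]

# Made-up relative profiles: 120 samples x 17 congeners, three classes
rng = np.random.default_rng(1)
labels = np.repeat(["fish", "milk", "muscle"], 40)
X = rng.dirichlet(np.ones(17), size=labels.size)

scores, loadings, explained = pca(X)
# Gravity center of a class = mean of its sample scores
centroids = {c: scores[labels == c].mean(axis=0) for c in np.unique(labels)}
print("variance explained by 3 axes:", explained.sum())
```

Plotting the rows of loadings gives a variable map of the type shown in Fig. 1, on which the class centroids can be overlaid after suitable scaling.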

Fig. 1
Representation of the 17 congener variables and gravity centers (average points) of the nine food product classes on the axes extracted by means of PCA (C1: 2,3,7,8-TCDD; C2: 1,2,3,7,8-PeCDD; C3: 1,2,3,4,7,8-HxCDD; C4: 1,2,3,6,7,8-HxCDD; C5: 1,2,3,7,8,9-HxCDD; C6: 1,2,3,4,6,7,8-HpCDD; C7: OCDD; C8: 2,3,7,8-TCDF; C9: 1,2,3,7,8-PeCDF; C10: 2,3,4,7,8-PeCDF; C11: 1,2,3,4,7,8-HxCDF; C12: 1,2,3,6,7,8-HxCDF; C13: 1,2,3,7,8,9-HxCDF; C14: 2,3,4,6,7,8-HxCDF; C15: 1,2,3,4,6,7,8-HpCDF; C16: 1,2,3,4,7,8,9-HpCDF; C17: OCDF)

In a subsequent stage, a hierarchical clustering analysis of the variables was performed on the same dataset in order to highlight the relationships between the 17 congeners in terms of correlations. For this purpose, a classical aggregation procedure based on the Ward method and the 1-Pearson metric was used. The resulting hierarchical tree is shown in Fig. 2. This analysis first confirmed the correlations already observed with the PCA, but also revealed additional links between the congeners. A first synthetic cut-off at an aggregation distance of around 1.5 led to three variable clusters corresponding to 2,3,7,8-TCDF/1,2,3,7,8-PeCDF, Cl7 and Cl8 PCDD/F, 1,2,3,7,8,9-HxCDF, and Cl4 to Cl6 PCDD/F. More precisely, strong correlations appeared between some PCDD/F analogs (1,2,3,4,6,7,8-HpCDD/F) or homologs (1,2,3,4,7,8-HxCDF/1,2,3,6,7,8-HxCDF; 1,2,3,6,7,8-HxCDD/1,2,3,7,8,9-HxCDD).
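The clustering of variables described above can be sketched with SciPy as follows. This is only an illustration under stated assumptions: the data are synthetic (two blocks of three correlated variables), the function name is hypothetical, and SciPy documents the Ward update as strictly defined for Euclidean distances, so applying it to a 1 - Pearson dissimilarity mirrors the procedure described in the text rather than a formally Euclidean analysis.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_variables(X, cutoff):
    """Ward clustering of the columns of X using 1 - Pearson r as distance."""
    X = np.asarray(X, dtype=float)
    dist = 1.0 - np.corrcoef(X, rowvar=False)   # variables are columns
    np.fill_diagonal(dist, 0.0)                 # remove rounding residue
    condensed = squareform(dist, checks=False)  # square -> condensed form
    tree = linkage(condensed, method="ward")
    groups = fcluster(tree, t=cutoff, criterion="distance")
    return tree, groups

# Synthetic data: variables 0-2 follow one latent factor, 3-5 another
rng = np.random.default_rng(3)
f1, f2 = rng.normal(size=(2, 500))
noise = 0.1 * rng.normal(size=(500, 6))
demo = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise

tree, groups = cluster_variables(demo, cutoff=1.0)
print(groups)
```

The array returned as tree can be passed to scipy.cluster.hierarchy.dendrogram to draw a tree of the kind shown in Fig. 2, and the cutoff plays the role of the aggregation distance at which the tree is cut.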

Fig. 2
Result of the hierarchical clustering performed on the 17 congener variables

In summary, investigating the dataset by means of unsupervised (descriptive) analyses such as PCA and hierarchical clustering confirmed the existence of significant variability between the PCDD/F profiles of various food products, and revealed some linear relationships between the 17 congeners that remain open to question. The main purpose of the present study was to focus on the first aspect: possible objective discrimination between different samples on the basis of their contamination profile. It was therefore decided to perform linear discriminant analysis (LDA) using the sample nature or animal species as discriminating factor. For the second aspect (the relationships between the 17 congeners), additional work would now be necessary to explain the reported findings. The nature of the contamination source and the original emission profile, the influence of the chlorination level of the emitted PCDD/F on its transfer from the environment to living organisms, the possibility of degradation from one congener to another by dechlorination reactions, and congener-specific bioaccumulation factors in different tissues are examples of topics whose study may help interpret these observations.

Variation in the contamination profile across different food matrices

Figure 3 shows the projections of various food samples onto the first two axes extracted by the LDA using the matrix nature as discriminating factor. The first obvious discrimination was between milk and fish samples, with meat and egg samples being located in a third area. The details of this LDA are displayed in Table 1. These results revealed that the main congeners correlated with the first axis, which differentiates milk and fish samples, are broadly the Cl6 compounds (except 1,2,3,7,8,9-HxCDF) and 2,3,7,8-TCDF, in accordance with the results of the PCA. The Cl6 congeners appeared to be more abundant in milk, while 2,3,7,8-TCDF was found to be present at higher levels in fish. The main congeners correlated with the second axis, responsible for a slight separation between meat/egg samples and the others, are more specifically 1,2,3,4,6,7,8-HpCDD and OCDD (lower in meat/egg than in milk/fish), as well as Cl4 compounds (2,3,7,8-TCDF/D) and 2,3,4,7,8-PeCDF (higher in meat/egg than in milk/fish). A clear difference also appeared between meat, fat and liver samples, mainly based on the congeners OCDD and 2,3,7,8-TCDD. Three Cl6 compounds (1,2,3,6,7,8-HxCDD, 1,2,3,7,8,9-HxCDD, 2,3,4,6,7,8-HxCDF) also contributed to this discrimination. Overall, the heavier congeners (OCDD) seem to be more abundant in liver than in fat, while 2,3,7,8-TCDD was found to be less abundant in liver. This observation may be linked to the existence of different accumulation factors for PCDD/F: not only strong lipophilic properties leading to high contents in fat, but also, for instance, specific protein binding, which may explain their high content in other organs and tissues such as liver. Finally, a strong discrimination was observed between the different fatty products. Indeed, fat and butter samples are clearly distinguished from oil samples on the basis of the congeners 2,3,4,7,8-PeCDF and 1,2,3,6,7,8-HxCDD (less abundant in oil). Moreover, fish oil samples are separated from all of the others by their lower contents of 2,3,4,7,8-PeCDF and 2,3,7,8-TCDF.
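A discriminant analysis of this kind can be sketched with scikit-learn as shown below. Several assumptions should be noted: the profiles are simulated, the "dominant congener" assigned to each class is invented purely to make the classes separable, and the plain LinearDiscriminantAnalysis estimator is used instead of the step-by-step incremental variant run in Statistica, so no variable selection takes place.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
n_congeners = 17
# Hypothetical column index of the congener made dominant in each class
dominant = {"fish": 7, "milk": 3, "egg": 9}

profiles, labels = [], []
for name, idx in dominant.items():
    alpha = np.ones(n_congeners)
    alpha[idx] = 20.0                       # push mass onto one congener
    profiles.append(rng.dirichlet(alpha, size=50))
    labels += [name] * 50
X = np.vstack(profiles)
y = np.array(labels)

# Relative profiles sum to 1, so one column is redundant; drop the last one
X_used = X[:, :-1]

lda = LinearDiscriminantAnalysis(n_components=2)
scores = lda.fit(X_used, y).transform(X_used)  # coordinates on the two axes
accuracy = lda.score(X_used, y)                # resubstitution accuracy
print(scores.shape, accuracy)
```

The two columns of scores are the coordinates that would be plotted in a figure such as Fig. 3. The accuracy printed here is computed on the training data only and would need cross-validation before being interpreted as predictive performance.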

Fig. 3
Representation of various food samples on the two principal axes obtained via LDA based on the matrix nature as discriminating factor

Table 1 Details for the LDA based on the matrix nature as discriminating factor

These results confirmed the existence of specific profiles in different food products, which differed independently of their total dioxin content. Some of these differences could have been noticed directly from a few isolated sample profiles, without any statistical analysis. Nevertheless, the LDA enabled us to validate these observations for a large number of samples, and to provide a diagnostic model that is not restricted to the most obvious sources of variability but takes all of the congeners into account, thereby improving the discrimination. This observation has to be validated on a larger scale before it can be used in practice. For instance, finely characterizing typical contamination profiles for a variety of food products available to consumers could be useful from a toxicological point of view, providing valuable information for risk assessors. From an analytical point of view, examining the overall contamination profile of a sample would allow a hypothesis of external cross-contamination between samples to be ruled out, thus serving as an additional quality criterion. Another application could be as a supporting technique for the interpretation of interlaboratory studies and proficiency tests: the final WHO-TEQ value could be moderated on the basis of the obtained profiles. Indeed, although the present regulation is only based on the WHO-TEQ value, being able to efficiently monitor all of the congeners remains important, because the final result may be highly dependent on only one or two of the congeners. In other words, a good analytical method for dioxin analysis is expected to have good repeatability not only in terms of the TEQ value, but also for each individual congener.
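The dependence of the TEQ on one or two congeners can be illustrated with a short calculation. The sketch below assumes the WHO 1998 toxic equivalency factors (the text does not specify which TEF scheme was applied) and uses an invented sample with concentrations in pg/g fat; the function name is hypothetical.

```python
import numpy as np

CONGENERS = [
    "2,3,7,8-TCDD", "1,2,3,7,8-PeCDD", "1,2,3,4,7,8-HxCDD",
    "1,2,3,6,7,8-HxCDD", "1,2,3,7,8,9-HxCDD", "1,2,3,4,6,7,8-HpCDD", "OCDD",
    "2,3,7,8-TCDF", "1,2,3,7,8-PeCDF", "2,3,4,7,8-PeCDF",
    "1,2,3,4,7,8-HxCDF", "1,2,3,6,7,8-HxCDF", "1,2,3,7,8,9-HxCDF",
    "2,3,4,6,7,8-HxCDF", "1,2,3,4,6,7,8-HpCDF", "1,2,3,4,7,8,9-HpCDF", "OCDF",
]
# ASSUMPTION: WHO 1998 TEFs, listed in the same order as CONGENERS
TEF = np.array([1, 1, 0.1, 0.1, 0.1, 0.01, 0.0001,
                0.1, 0.05, 0.5, 0.1, 0.1, 0.1, 0.1, 0.01, 0.01, 0.0001])

def teq_contributions(conc):
    """Return the total TEQ and each congener's fractional contribution."""
    weighted = np.asarray(conc, dtype=float) * TEF
    total = weighted.sum()
    return total, weighted / total

# Invented sample (pg/g fat), same congener order as above
conc = [0.2, 0.5, 0.1, 0.4, 0.1, 1.0, 5.0,
        0.3, 0.2, 1.2, 0.3, 0.3, 0.05, 0.2, 0.5, 0.1, 0.5]
total, shares = teq_contributions(conc)
top = CONGENERS[int(np.argmax(shares))]
print(f"TEQ = {total:.3f} pg/g fat, largest contributor: {top}")
```

In this invented example, 2,3,4,7,8-PeCDF and 1,2,3,7,8-PeCDD together account for about 73% of the TEQ, while OCDD, the most abundant congener, contributes almost nothing. A small error on either of the two dominant congeners would therefore shift the final value far more than a large error on OCDD.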

Variability in the contamination profile according to different animal species

Figure 4 shows the projections of various food samples onto the two principal axes extracted by the LDA based on the animal species as discriminating factor. For compliant meat samples, characterized by a WHO-TEQ value lower than the maximal authorized level, the first axis appeared to be discriminant with respect to the different ruminant species, while the second axis permitted differentiation between ruminants and poultry samples. The details of this LDA are presented in Table 2. These results suggest that the contamination profile variability observed along the first axis is mainly due to the congeners 1,2,3,7,8-PeCDD, 2,3,4,7,8-PeCDF, 1,2,3,4,7,8-HxCDD and 1,2,3,6,7,8-HxCDD, which were found to be less abundant in porcine meat than in bovine and ovine meat. The poultry group was separated using the same congeners except for the last one, but this group also had a higher content of 2,3,7,8-TCDF.

Fig. 4
Representation of various food samples on the two principal axes extracted via LDA based on the animal species as discriminating factor

Table 2 Details of the LDA based on animal species as discriminating factor

Interpreting these results would require more information, especially regarding the source of contamination, the sample collection location and other sample details, because one main source of variability should be the source of exposure, with its qualitative and quantitative aspects. Nevertheless, other factors influencing the final dioxin content in biological samples should be linked to physical and chemical parameters (congener properties) and metabolic parameters (organism physiology, enzymatic material). These parameters can lead to differential transfer, distribution, biotransformation and storage of the different PCDD/F congeners in the organism, which consequently affect the observed contamination profile. In this way, the proposed statistical approach may be a useful tool for biologists and toxicologists that can help to reveal hidden patterns that need to be investigated and confirmed by conventional experimental protocols.

Conclusion

The goal of this study was to investigate the variability of the PCDD/PCDF contamination profile across a wide array of food products of animal origin using multivariate statistical analysis. The results demonstrated the existence of differences between the analyzed samples in terms of congener-specific patterns. Variability depending on the sample nature (fish, meat, milk, fatty products) was first demonstrated. Variability depending on the animal species for meat and milk samples (bovine, ovine, porcine, caprine and poultry) was then observed. While some of these discriminations can be identified via a univariate approach (observation of the 17 congener profiles for several isolated samples), the use of LDA permitted us to confirm and depict these results on the basis of a large dataset. Moreover, the power of the statistical techniques we used permitted us to reveal other differences not accessible with a univariate approach. The origin(s) of the observed differences as well as their significance remain to be investigated, both in terms of environmental factors and transfer to living organisms.

Obviously, such findings are only possible if the applied method is proven to be highly repeatable for all 17 congeners, whatever the biological matrix. Indeed, a difference observed between two samples in terms of the dioxin content must be clearly attributable to a factor external to the analytical process. The analytical method used for this study was fully validated according to current European criteria, and the quantification process was based upon the use of a 13C-labeled internal standard for each monitored congener.

The results from this study may have potential practical applications. However, further work regarding the occurrence and transfer of PCDD/PCDF throughout the food chain is needed first in order to obtain a better understanding of these observations. Precise characterization of the contamination profiles from specific sources and/or food products should be of great interest to scientists in the fields of contaminant analysis, toxicology and metabolism, as well as to regulatory bodies and risk assessors in charge of final decisions regarding the potential risks associated with these substances.