Introduction

Populations from Mediterranean countries present lower rates of mortality by coronary heart disease and cancer than those from the US and northern Europe, which can be ascribed to the typical Mediterranean diet [1, 2]. Olive oil is one of the most important constituents of the Mediterranean diet, considering its demonstrated health benefits [3].

Because the procedure used to produce extra virgin olive oil is complex, it is a high-priced food, and authentication is therefore a key objective of organizations associated with olive oil. Authentication covers many different aspects, including adulteration, characterization, mislabelling and misleading origin. The adulteration of extra virgin olive oil with cheap vegetable oils is a frequent problem for regulatory agencies, oil suppliers and consumers. The most common adulterants found in extra virgin olive oil are refined olive oil, olive pomace oil, synthetic olive oil–glycerol products, seed oils such as sunflower, soy, corn and rapeseed, and nut oils such as hazelnut and peanut oil [4–7]. In addition to the economic fraud, adulteration may sometimes cause serious damage to health, as happened in 1981 during the Spanish toxic oil syndrome, which affected over 20,000 people.

Several methods have been proposed for the qualitative and/or quantitative analysis of adulteration of virgin olive oil with other edible vegetable oils. The simplest methods, which can easily be implemented routinely in laboratories, are those based upon the analysis of a particular compound (a marker). These methods are useful for the qualitative detection of a particular type of adulterant, such as the determination of filbertone, which is used to identify the adulteration of olive oil with hazelnut oil [8], and also for quantitative purposes in the determination of a given fatty acid, a triglyceride or a parameter such as ΔECN42 [9]. Another option is the direct analysis of the sample without sample pretreatment other than dilution. This option is based upon the use of detection systems such as fluorescence [10, 11], Raman [12], infrared [13, 14] and nuclear magnetic resonance spectrometries [15, 16] as well as mass spectrometry [17, 18], which provide characteristic fingerprints of the pure or adulterated sample. The large amount of data in these fingerprints cannot be processed without chemometric tools (exploratory and classification analysis approaches), which enable adulterated samples of virgin olive oil to be detected. Some methods [10–14, 16] also provide quantitative information on the proportion of the adulterant; multivariate regression approaches are a great help in this respect.

A third alternative is to analyse a specific fraction of the oil sample; in this case sample preparation steps are mandatory. This is the case when liquid–liquid extraction is used for the analysis of polar compounds [19] or headspace sampling for that of volatile compounds [20]. Another possibility is to perform a chromatographic separation in order to obtain the specific profile of the target compounds [21–24]. In this case, the most time-consuming step is not the chromatographic step but the development of chemometric models, since a great number of samples need to be analysed; on the other hand, a shortcoming of spectroscopic detection techniques without sample preparation is that the spectral differences between most vegetable oils are quite small. The peak areas of compounds characteristic of the sample composition can provide a more selective fingerprint. In the case of edible oils, one of the most important fractions characteristic of each vegetable oil is the lipid fraction. Therefore, the determination of the lipid profile (fatty acids, triglycerides, etc.) by gas chromatography with flame ionization or mass spectrometry detection can provide useful information on the adulteration of virgin olive oil by edible vegetable oils.

The main limitation of methods based on the analysis of a specific fraction is that so far such methods have focused on the qualitative or quantitative detection of adulteration without considering the type of vegetable oil involved. The aim of the research presented here was to propose a step-by-step method for the detection of adulteration, the identification of the adulterant and its quantitation in extra virgin olive oil with four different vegetable oils (sunflower, corn, peanut and coconut oils). In order to obtain this high information level, a method based on the chromatographic analysis of the lipid fraction (fatty acids and a characteristic compound of the olive oil, such as squalene) with subsequent treatment of the data (chromatographic peak areas) was applied. Thus, the results from a common analysis performed in order to characterize olive oil quality (determine the lipid profile) can be efficiently employed for authentication. Therefore, mixtures of olive oil with the different vegetable oils were prepared in order to aid in the development of chemometric models with prediction capabilities.

Experimental

Instruments and apparatus

A vortex from Ika-Works (Wilmington, NC, USA) and a centrifuge from Selecta (Barcelona, Spain) were used in the derivatization step. A Varian (Palo Alto, CA, USA) CP 3800 gas chromatograph coupled to a Saturn (Sugar Land, TX, USA) 2200 ion trap mass spectrometer equipped with a VF-23 ms FactorFour capillary column (60 m × 0.25 mm, 0.25 μm), also from Varian, was used for the analysis of the fatty acids and squalene in the extracts.

Unscrambler 7.8 (Camo Process AS, Oslo, Norway) and Pirouette (Infometrix Inc., Woodinville, WA, USA) were used for data processing.

Reagents

The reagents used were n-hexane (HPLC grade) from Panreac (Barcelona, Spain), for the dilution of samples and standard solutions, and 0.5 M sodium methylate in methanol (Panreac) as a derivatization reagent in order to hydrolyse and transform the fatty acids into their corresponding methyl esters (FAMEs). All necessary safety precautions (gloves, mask, fume-hood, etc.) were employed when using the organic solvents.

Decanoic acid methyl ester, dodecanoic acid methyl ester, tetradecanoic acid methyl ester, hexadecanoic acid methyl ester, trans-hexadecenoic acid methyl ester, cis-hexadecenoic acid methyl ester, heptadecanoic acid methyl ester, octadecanoic acid methyl ester, octadecenoic acid methyl ester, trans-octadecenoic acid methyl ester, trans,trans-octadecadienoic acid methyl ester, cis,trans-octadecadienoic acid methyl ester, trans,cis-octadecadienoic acid methyl ester, cis,cis-octadecadienoic acid methyl ester, eicosanoic acid methyl ester, cis,cis,cis-octadecatrienoic acid methyl ester, eicosenoic acid methyl ester, docosanoic acid methyl ester, docosenoic acid methyl ester, tricosanoic acid methyl ester, tetracosanoic acid methyl ester, pentacosanoic acid methyl ester and hexacosanoic acid methyl ester were from Sigma–Aldrich (St. Louis, MO, USA) and were used as standards to obtain a representative profile for fatty acids in oils. Squalene from Sigma was also employed because of its high content in olive oil in comparison to other edible vegetable oils and fats [25].

Samples

Eight different samples of extra virgin olive oil were used in this research. The adulterants were selected according to their use in Spain. Thus, four types of sunflower oil and two types each of corn, peanut and coconut oil were used to prepare, by weighing, adulterated samples at twenty-five different levels. Although it is difficult to envisage proportions of adulterant outside the range 5–50%, the mixtures prepared ranged from 0.01% to 100% adulterant oil in order to demonstrate the capabilities of the proposed method. The number of samples employed to develop chemometric models was 566.
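The gravimetric preparation of a binary mixture can be sketched as follows. This is only an illustrative calculation: the function names, the 10 g batch size and the 5% target level are hypothetical and are not taken from the study.

```python
def adulterant_percent(mass_olive_g, mass_adulterant_g):
    """Return the adulterant content of a binary mixture as % (w/w)."""
    total = mass_olive_g + mass_adulterant_g
    if total <= 0:
        raise ValueError("total mass must be positive")
    return 100.0 * mass_adulterant_g / total


def masses_for_level(total_g, level_percent):
    """Return (olive oil, adulterant) masses in g for a target % (w/w) level."""
    if not 0.0 <= level_percent <= 100.0:
        raise ValueError("level must lie between 0 and 100 %")
    adulterant = total_g * level_percent / 100.0
    return total_g - adulterant, adulterant


# Hypothetical 10 g batch adulterated at 5 % (w/w)
olive_g, adulterant_g = masses_for_level(10.0, 5.0)
level = adulterant_percent(olive_g, adulterant_g)
```

At the lowest level mentioned in the text (0.01%), a 10 g batch would contain only 1 mg of adulterant, which gives an idea of the weighing precision such mixtures demand.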

Preparation of fatty acid methyl esters (FAMEs)

Pure or adulterated oil (0.1 g) was diluted to 5 mL with n-hexane and homogenized for 30 s in a vortex. Then, 0.5 mL of sodium methylate in methanol was added and shaken vigorously for 3 min in the vortex and centrifuged for 2 min at 2000 rpm. The supernatant was transferred to a test tube and evaporated to dryness under an N2 stream. n-Hexane (0.5 mL) was used to reconstitute the residue, which was shaken for 1 min. Finally, 1 μL of the solution thus obtained was injected into the chromatograph.

GC–MS separation–detection

Helium at a constant flow-rate of 1 mL/min was used as carrier gas for the GC–MS determination of the FAMEs. The column temperature program was 50 °C, held for 2 min, then increased at 5 °C/min to 250 °C, before finally being held for 15 min. Injections (1 μL each) used a 1:1 split ratio with the injector temperature set to 250 °C. The chromatogram was obtained within 40 min.

The ion trap mass spectrometer was operated using electron impact ionization (EI) in the positive mode. For EI experiments, the instrumental parameters were set to the following values: filament emission current, 80 μA; transfer line, ion trap and manifold temperatures, 210, 170 and 50 °C, respectively. The storage window was set to between 40 m/z and 600 m/z and the selected-ion monitoring (SIM) ion preparation mode was used. The scan time for data acquisition was set to 1.0 s with three microscans performed per second.

Visualization of outliers

Principal component analysis (PCA) using the peak areas of 24 compounds (the 23 fatty acids plus squalene) was used to reduce the number of collinear variables, thus representing the samples in a new, reduced n-dimensional space. Outliers, once identified, were examined in order to decide whether they provided useful information; if not, they were removed.
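A minimal sketch of this projection step is given below, assuming autoscaled peak areas and a PCA computed by singular value decomposition with NumPy. The data are synthetic (30 hypothetical samples with 24 variables), and the preprocessing choice is an assumption, since the text does not state how the peak areas were scaled before PCA.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Autoscale X (samples x variables) and return scores, loadings, explained variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # leave constant columns unscaled
    Z = (X - mean) / std
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)    # fraction of variance per component
    return scores, loadings, explained[:n_components]


# Synthetic stand-in for a peak-area matrix: 30 samples x 24 compounds
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
mixing = rng.normal(size=(2, 24))
X = latent @ mixing + 0.05 * rng.normal(size=(30, 24))

scores, loadings, explained = pca_scores(X, n_components=2)
```

Plotting the two columns of `scores` against each other gives a score plot of the kind used to look for sample groupings and outliers.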

Qualitative analysis

The qualitative classification analysis was based on the use of the soft independent modelling of class analogy (SIMCA) and K nearest neighbours (KNN). These algorithms are based on the multivariate similarity of a specific sample to others that comprise the training set.

Quantitative analysis

The quantitative determination of adulteration was carried out by partial least squares regression (PLSR). This methodology consists of first developing a calibration step in which the equations are obtained, and then validating the calibration model according to statistical parameters.
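For readers unfamiliar with PLSR, the calibration step can be illustrated with a compact single-response PLS (PLS1) sketch based on the NIPALS algorithm. It is not the implementation of the commercial software used in the study; the data are synthetic, and the names `areas`, `percent` and `true_coef` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centred X and y residuals
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f                          # weights: covariance of X with y
        w = w / np.linalg.norm(w)
        t = E @ w                            # scores
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        c = (f @ t) / tt                     # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in the original X space
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def pls1_predict(model, X):
    coef, intercept = model
    return np.asarray(X, dtype=float) @ coef + intercept


# Synthetic stand-in for peak areas (rows: samples) and % adulterant
rng = np.random.default_rng(0)
areas = rng.normal(size=(40, 5))
true_coef = np.array([1.0, 2.0, 0.0, -1.0, 0.5])
percent = 50.0 + areas @ true_coef + 0.01 * rng.normal(size=40)

model = pls1_fit(areas, percent, n_factors=5)
predicted = pls1_predict(model, areas)
```

With as many factors as variables the model coincides with ordinary least squares; in practice fewer factors are retained, and their number is chosen by cross-validation.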

Results and discussion

The proposed method is novel because it applies a chemometric strategy to chromatographic peak areas in order to detect, identify and quantify the adulteration of olive oil with different edible vegetable oils. An exploratory data analysis method (PCA) was applied to the whole sample set for qualitative purposes and to aid in the identification of the adulterant; using PCA, relationships between groups of samples could be identified as well as possible outliers. Classification analysis by supervised pattern recognition methods (KNN and SIMCA) was then used to distinguish between extra virgin olive oil and adulterated oils, as well as to identify the adulterant used. These methods consisted of selecting the training set (comprising a group of samples belonging to a well-known class) in order to establish a model, which was then validated with an independent test set. After this, multivariate regression (PLSR) was used to develop equations for the quantitative determination of the level of adulteration of the olive oil with different edible vegetable oils.

Exploratory data analysis

Principal component analysis (PCA) was applied to the data matrix corresponding to the peak areas of the FAMEs and squalene. Once the samples were in the new space defined by the principal components, their leverage values were computed. Samples with leverage values higher than 0.5 were considered to be outliers, and these were examined to see whether or not they provided useful information; if they did not, they were removed. Seven outliers (possibly occurring due to operational errors) were detected and removed from the data matrix. However, sample groups were not distinguished in the plot (see Fig. 1), which can be explained by the high variability between the compound profiles of the adulterated samples.
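The leverage screening can be sketched as follows. The definition used here (1/n plus the normalized squared scores summed over the retained components) is one common convention and is an assumption, as the text does not give the formula; only the 0.5 limit comes from the text. The data are synthetic, with one deliberately deviating sample.

```python
import numpy as np


def pca_leverage(X, n_components):
    """Leverage of each sample in a mean-centred PCA model with n_components PCs."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    # With scores T = U * S, the term t_ia^2 / (t_a' t_a) reduces to U[i, a]^2
    return 1.0 / X.shape[0] + np.sum(U[:, :n_components] ** 2, axis=1)


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))                 # synthetic "peak areas"
X[0] += 50.0                                 # one grossly deviating sample

h = pca_leverage(X, n_components=2)
flagged = np.flatnonzero(h > 0.5)            # candidate outliers to inspect
```

Flagged samples are candidates for inspection rather than automatic removal, in line with the procedure described above.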

Fig. 1 Samples in the space determined by the first two principal components. The circles highlight outliers

Detection of adulteration and identification of adulterants

After outlier removal, SIMCA and KNN were used for qualitative classification purposes by randomly defining the training (to construct the models) and validation sets (to estimate the prediction capacity). 75% of the samples were placed in the training set and 25% in the validation set. While KNN is based on the distances between pairs of samples, SIMCA develops a principal component model for each class of the training set; when a new sample is projected into the PC space of each class, this sample is then assigned to the class it best fits. SIMCA is a more realistic predictive approach than KNN, as KNN assigns every sample to exactly one class while SIMCA provides three possible prediction outcomes: the sample fits only one predefined category; it does not fit any predefined category; it fits into more than one predefined category. For this reason, SIMCA was the first approach used in this study.
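The three possible SIMCA outcomes can be illustrated with a deliberately simplified sketch: one PCA model per class, with a sample accepted by a class when its residual distance to that class model is below a limit. The acceptance limit used here (a multiple of the root-mean-square training residual) is an assumption that stands in for the F-test normally used by SIMCA software, and the class data are synthetic.

```python
import numpy as np


class ClassModel:
    """PCA model of one class with a residual-distance acceptance limit."""

    def __init__(self, X, n_components, limit_factor=3.0):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.loadings = Vt[:n_components].T
        rms = np.sqrt(np.mean(self.residual(X) ** 2))
        self.limit = limit_factor * rms      # assumed limit, not the SIMCA F-test

    def residual(self, X):
        Xc = np.atleast_2d(np.asarray(X, dtype=float)) - self.mean
        E = Xc - Xc @ self.loadings @ self.loadings.T
        return np.sqrt(np.sum(E ** 2, axis=1))

    def accepts(self, x):
        return bool(self.residual(x)[0] <= self.limit)


def simca_predict(models, x):
    """Return one class name, 'unassigned', or an 'ambiguous' label."""
    fits = [name for name, model in models.items() if model.accepts(x)]
    if len(fits) == 1:
        return fits[0]
    return "unassigned" if not fits else "ambiguous: " + "/".join(fits)


rng = np.random.default_rng(0)


def make_class(center, direction, n=20):
    """Synthetic class: a centre, one direction of variation and small noise."""
    center = np.asarray(center, dtype=float)
    direction = np.asarray(direction, dtype=float)
    return center + rng.normal(size=(n, 1)) * direction + 0.1 * rng.normal(size=(n, 6))


models = {
    "pure olive oil": ClassModel(make_class([10, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0]), 1),
    "other": ClassModel(make_class([0, 10, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0]), 1),
}

inside = simca_predict(models, [10, 0.5, 0, 0, 0, 0])
outside = simca_predict(models, [0, 0, 0, 0, 0, 10])
```

A KNN classifier would force the second sample into one of the two classes, whereas this scheme can leave it unassigned.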

The training set was divided into two classes, one for pure olive oil and one for the remaining samples (other pure vegetable oils and adulterated samples). The next step was to develop a PCA model for each class; after outlier removal, each model was optimized in order to estimate its number of factors (PCs, chosen independently for each model). The optimum number of factors was found to be three for all of the models generated. The variance explained was 100%, with a significance level of 5% providing the best results. The SIMCA model identified unadulterated extra virgin olive oil with an accuracy of 100%. In addition, 91% of the samples adulterated at any level were properly classified. The misclassified samples were not assigned to an erroneous category but remained undefined.

The next objective was to check whether SIMCA enabled us to identify the type of vegetable oil used in each adulterated sample. The original training and validation sets—75% (417 samples) and 25% (142 samples)—were therefore divided into five groups: one for the pure extra virgin olive oil samples and one for each pure adulterant oil (sunflower, corn, peanut and coconut oils). After the same optimization process as used in the previous SIMCA analysis, the prediction capacity of the SIMCA models was found to be unsatisfactory, as the technique correctly classified from 50 to 60% of the samples for all models. In addition, some samples were classified into different groups, which can be explained by the great variability of each class. Therefore, the SIMCA model generated can be efficiently employed as a qualitative tool to detect whether an extra virgin olive oil is adulterated or not, but it does not provide reliable information about what type of vegetable oil has been used as the adulterant.

The latter objective was then tackled using KNN, which, as previously indicated, is a less realistic predictive approach than SIMCA. All KNN models were constructed using variance scaling as the data preprocessing technique, with a maximum of 10 neighbours. Optimization of the model showed that the optimal number of neighbours was five, which gave 0% prediction error for all models built. The prediction abilities of the proposed models were always higher than 88%, which illustrates the ability of KNN analysis to identify the kind of vegetable oil used in the adulteration of virgin olive oil samples.
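The KNN procedure (variance scaling, then testing neighbour counts up to 10) can be sketched as below. The selection of k by leave-one-out error is an assumption about how the optimization might be done, and the two-variable data set with three adulterant classes is synthetic.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]        # ties go to the class met first (nearest)


def loo_error(X, y, k):
    """Leave-one-out misclassification rate for a given k."""
    wrong = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        y_rest = [label for j, label in enumerate(y) if j != i]
        wrong += knn_predict(X[keep], y_rest, X[i], k) != y[i]
    return wrong / len(X)


# Synthetic data: two variables on very different scales, three classes
rng = np.random.default_rng(0)
centers = {"sunflower": (0.0, 0.0), "corn": (10.0, 0.0), "peanut": (0.0, 10000.0)}
X_raw, y = [], []
for label, (a, b) in centers.items():
    for _ in range(12):
        X_raw.append([a + rng.normal(0, 0.5), b + rng.normal(0, 500.0)])
        y.append(label)
X_raw = np.array(X_raw)

std = X_raw.std(axis=0, ddof=1)              # variance scaling: divide by column SD
X = X_raw / std

errors = {k: loo_error(X, y, k) for k in range(1, 11)}
best_k = min(errors, key=errors.get)
unknown = knn_predict(X, y, np.array([9.5, 200.0]) / std, 5)
```

Without the scaling step the second variable would dominate the distances, which is why a preprocessing choice such as variance scaling matters for KNN.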

Therefore, qualitative analysis can be efficiently accomplished by combining the SIMCA and KNN models. Thus, SIMCA enables us to detect whether a sample is adulterated or not while KNN provides the type of adulterant used in the fraud. This can be explained by noting that SIMCA has a more restrictive character, which does not cope well with the large sample variability present when constructing the model. By contrast, although KNN can be affected by data collinearity, the results showed that this approach had high prediction capabilities in our research. One advantage of the proposed method is that supervised analysis based on SIMCA and KNN establishes a classification criterion by assigning an unknown sample to a specific class of olive oil adulterated with another oil. This is an advantage over qualitative methods, which are mostly based on unsupervised analysis (PCA, hierarchical cluster analysis). The assignment of an unknown sample to a specific category could not be assured using the methods reported previously.

Quantitative analysis

Equation calibrations

Although our main interest was in being able to detect whether an olive oil was adulterated or not, quantitative analysis was also important since it could be used to evaluate the fraud economically. Table 1 provides information on the reference data, which includes ranges, mean values, standard deviations (SDs) and numbers of samples. The standard laboratory error (SEL), which corresponds to the standard error that arises when preparing a sample by weighing, is also included. As can be seen, the reference values range from 100% adulterant oil (0% extra virgin olive oil) to 0.01% adulterant (99.99% extra virgin olive oil). Also, the standard deviation for each set and compound is higher than the SEL; therefore, the theoretical R² is 0.99 for each adulteration according to the Mark and Workman equation [26, 27].
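The text does not reproduce the equation itself. In the form commonly quoted in the calibration literature, the maximum attainable coefficient of determination is (SD² − SEL²)/SD²; the sketch below assumes that form, and the SD and SEL values are hypothetical because the contents of Table 1 are not available here.

```python
def theoretical_r2(sd, sel):
    """Upper bound on R^2 given the reference SD and the laboratory error (assumed form)."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (sd ** 2 - sel ** 2) / sd ** 2


# Hypothetical values, both in % adulterant
r2_max = theoretical_r2(sd=30.0, sel=3.0)
```

Under this assumed form, an SEL that is one tenth of the SD already yields a theoretical R² of 0.99, so a laboratory error much smaller than the spread of the reference values leaves little room for the reference method to limit the calibration.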

Table 1 Reference data for quantitative analysis (ranges, means, standard deviations and SELs, expressed as %)

Partial least squares regression (PLSR) was used to develop the calibration equations based on the calibration dataset. The calibration (used for model development) and validation sets (used to test equations) were defined independently. Again, 75% of the samples were placed in the calibration set and 25% in the validation set. The validation set was selected by performing PCA, calculating the subsequent H distance, and then selecting the samples that were most strongly separated from the others (H < 0.6) and those with the highest number of neighbours. The results obtained, based on the statistical parameters described below, were similar no matter which mathematical preprocessing approach was used.

The cross-validation procedure was used to calibrate the equations. The minimum value of the standard error of cross-validation (SECV) determined the number of PLS factors in each equation, thus avoiding overfitting problems. Possible outliers were studied during cross-validation prediction using Student's t statistic, with the limit set to 2.5. Table 2 shows the statistical results obtained for the calibration step, including statistical parameters such as the SECV and the determination coefficient (R²), the number of samples used for the calibration set for each adulterant oil after the removal of calibration outliers, and the PLS factors used to construct the equations. The number of cross-validation groups was set to 5, and the maximum number of PLS factors was 22 for adulterations with sunflower oil and 11 for the rest. Selection was based on the following rule: one PLS factor per ten samples of the training set plus two. The criteria proposed by Shenk and Westerhaus [28], based on the values of R² and SECV, were used in this section. Thus, R² values higher than 0.90 and SECV values lower than 1.5×SEL indicate excellent precision. R² values of between 0.70 and 0.90 indicate good precision, as do SECV values between 2×SEL and 3×SEL. Finally, R² values of <0.70 indicate that the equation can only be used for screening purposes, which enables us to distinguish between low, medium and high values of adulteration. If the R² value is lower than 0.50, the equation only discriminates between high and low values.
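The rules in this paragraph translate directly into a few small functions. The sketch below is illustrative only: the calibration-set sizes (200 and 90) are hypothetical values chosen so that the "one factor per ten samples plus two" rule reproduces the stated maxima of 22 and 11, the SECV list is invented, and the handling of the boundary values 0.90, 0.70 and 0.50 is an assumption.

```python
def max_pls_factors(n_calibration_samples):
    """One PLS factor per ten calibration samples, plus two."""
    return n_calibration_samples // 10 + 2


def select_n_factors(secv_by_factor):
    """Pick the number of factors with the lowest SECV.

    secv_by_factor[i] is the SECV obtained with i + 1 factors.
    """
    best = min(range(len(secv_by_factor)), key=secv_by_factor.__getitem__)
    return best + 1


def rate_by_r2(r2):
    """Classify an equation by its R^2 following the criteria quoted in the text."""
    if r2 > 0.90:
        return "excellent precision"
    if r2 >= 0.70:
        return "good precision"
    if r2 >= 0.50:
        return "screening: low, medium and high values"
    return "screening: high and low values only"


# Hypothetical calibration-set sizes and SECV curve
limits = (max_pls_factors(200), max_pls_factors(90))
n_factors = select_n_factors([4.1, 2.3, 1.2, 1.4, 1.9])
rating = rate_by_r2(0.95)
```

Capping the number of factors before searching for the SECV minimum keeps the search from drifting toward overfitted models when the cross-validation curve is flat.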

Table 2 Results obtained using the proposed equations (ranges, means, standard deviations and SEPs expressed as μg g−1)

External validation

After the calibration equations had been developed, they were tested using the validation set; thus, statistical parameters such as the standard error of prediction (SEP) and R² were obtained. Table 2 shows values for SEP, R² for validation, slope and bias. These parameters enabled us to evaluate the analytical quality of the equations; thus, the values of slope and bias were useful to distinguish systematic errors and to study the correlation between the reference data and those provided by the proposed method. Slope and bias were evaluated to test whether they were statistically equal to 1 and 0, respectively; a significance level of 0.5% was applied. The nonsignificant ranges are shown in the slope and bias columns (the values for these parameters were always within these ranges). The slope of the correlation was always lower than 1 for corn and sunflower, which means that the values provided by the proposed method were systematically higher than the reference values, considering that the proposed method corresponded to the abscissa in the correlation plots. For adulteration with peanut and coconut oils, the slope of the correlation was always higher than 1 and so the opposite situation applied.

The most important result achieved in this part of the research was that the SEP values were lower than 1.5×SECV for all of the vegetable oils tested; therefore, taking into account the SEP criterion, the equations developed were robust. The R² values for the correlation between the reference values and those obtained with the proposed method were always higher than 0.90. According to the R² criteria, all types of adulteration could be determined with excellent precision.
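The validation statistics discussed in this section can be computed as in the following sketch. Conventions differ between software packages, so the choices made here are assumptions: bias is taken as the mean of predicted minus reference, SEP is bias-corrected with n − 1 in the denominator, and the slope is that of the reference values (ordinate) regressed on the predicted values (abscissa), as described in the text. The data are invented.

```python
from math import sqrt


def validation_stats(reference, predicted):
    """Return bias, SEP, slope and R^2 for an external validation set."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    bias = mean_pred - mean_ref
    sep = sqrt(sum((p - r - bias) ** 2 for r, p in zip(reference, predicted)) / (n - 1))
    sxx = sum((p - mean_pred) ** 2 for p in predicted)
    syy = sum((r - mean_ref) ** 2 for r in reference)
    sxy = sum((p - mean_pred) * (r - mean_ref) for r, p in zip(reference, predicted))
    return {"bias": bias, "sep": sep, "slope": sxy / sxx, "r2": sxy ** 2 / (sxx * syy)}


def is_robust(sep, secv):
    """SEP criterion quoted in the text: SEP below 1.5 x SECV."""
    return sep < 1.5 * secv


# Invented example in which every prediction is twice the reference value
stats = validation_stats([0, 10, 20, 30, 40], [0, 20, 40, 60, 80])
```

In this invented case the slope is 0.5, which illustrates the statement above that a slope below 1 corresponds to predicted values that exceed the reference values.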

The most notable aspect of the methodology proposed here is the quality of the statistical parameters for the quantification step, taking into account the extended range over which the adulteration can be determined (0.01–100%). As far as the authors know, no methods based on IR detection (the technique most commonly used to detect adulteration) capable of quantifying adulteration at these low levels have been reported so far. In fact, the main limitation of infrared detection is its lack of sensitivity, although this is not as important in adulteration detection as in other analytical fields. Another advantage of the proposed method is the number of variables (the fatty acid profile and squalene) used to develop the chemometric models, which provide a characteristic fingerprint of a given sample. This makes the methodology more robust than other reported methods that detect adulteration by quantifying a particular marker [8, 9], which could be corrected.

Conclusions

A sequential analytical method using chromatographic peak areas has been developed for the detection, identification and quantitation of adulteration in extra virgin olive oil for the first time. The method enables us to: (1) distinguish between pure extra virgin olive oil samples and those adulterated with one of the vegetable oils tested (sunflower, corn, peanut and coconut oils); (2) identify the adulterant used for the fraud; (3) determine the content of the latter in the adulterated sample. The results obtained permit the qualitative detection of adulteration and the identification of the type of adulterant involved with accuracies of above 91 and 88%, respectively. For quantitative analysis, the accuracy of the equations has been demonstrated using the Shenk and Westerhaus criteria, since it is possible to estimate the percentage of adulterant with excellent precision. The only data required were those obtained in a routine analysis of olive oil quality based on the determination of the lipid fraction by gas chromatography–mass spectrometry.