Introduction

In recent years, there has been a growing interest among consumers in the safety and traceability of food products. In particular, there has been an increasing focus on the geographical origin of raw materials and finished products, for several reasons including specific sensory properties, perceived health values, confidence in locally produced products and media attention [1]. As a result of these factors, the European Union recognised and supported the differentiation of quality products on a regional basis [2], introducing an integrated framework for the protection of geographical origin for agricultural products and foodstuffs by specific regulation [3]. This regulation permits the application of the following labels to a food product: protected designation of origin (PDO), protected geographical indication and traditional speciality guaranteed.

In order to assure quality and to safeguard consumers from commercial fraud, there is an increasing need for analytical tools able to verify whether a product sold with a specific label is actually compatible with that claim.

The present work focused on building efficient class models for the Chianti Classico PDO olive oil, potentially useful to guarantee the authenticity of this product and to prevent possible fraud, using fast and easy-to-automate methodologies. Such an achievement may be of great interest both for the Chianti Classico PDO olive oil consortium, which has been protecting and promoting the PDO appellation since its creation in 2001 [4, 5], and for consumer associations.

Classical chemical analyses, including the determination of fatty acids, sterols and triterpenic alcohols, followed by a proper application of chemometric pattern recognition strategies, have been widely shown to be excellent tools for recognising the geographical origin of olive oils [6–8]. Nevertheless, these traditional analyses have several drawbacks, the most significant of which are low speed, the necessity for sample pre-treatments, a requirement for highly skilled personnel and destruction of the sample.

Several fast and non-destructive instrumental methods have been proposed to overcome these hurdles [9]. Among them, infrared and UV–Vis spectroscopy, artificial noses and artificial tongues have proven to be successful analytical methods for the characterisation of olive oil [10–13]. All these techniques give non-specific fingerprints of oil samples. This means that the instrumental signals are not employed to verify the presence or absence of particular chemical compounds in a sample, nor to measure single chemical or physical properties, but to obtain a comprehensive, multivariate description of the sample, analogous to a fingerprint. Such non-specific instrumental fingerprints embody complex, multivariate information, which may be successfully used for authentication purposes by means of a proper multivariate pattern recognition analysis.

As for the chemometric strategies, classification and, above all, class-modelling techniques represent appropriate tools for verifying the authenticity of a product. Discriminant classification methods, such as linear discriminant analysis (LDA) and partial least squares discriminant analysis, require at least two classes to be considered. Such techniques build a delimiter between these classes and then always assign new samples to one of them, even when the samples belong to other classes that were not considered. Class-modelling techniques, instead, study each class individually, providing an answer to the general question: “Is this sample, claiming to be of class X, really compatible with the class X model?”, which is essentially the question to be answered in food authentication [14].

In the present study, class models for the Chianti Classico PDO olive oil were built using both classical and state-of-the-art chemometric methods, comparing and discussing their relative performances and their applicability.

Experimental

Olive oil samples

A total of 57 extra virgin olive oils, all obtained from the 2008–2009 olive harvest, were analysed. This sample set comprised 23 authentic (i.e., of guaranteed provenance) oils from different producers associated with the Chianti Classico PDO consortium and 34 oils from a nearby production area (Maremma). The latter samples share a number of features (e.g., the olive cultivars) that are closely related to those of Chianti Classico PDO. For this reason, they can be considered an excellent set of test samples for assessing the performance of the Chianti Classico models.

The samples were collected directly and personally from known and reliable farmers and oil mills.

Instrumentation and procedures

Olive oil samples were analysed by means of four fingerprinting instruments: an NIR spectrophotometer, a UV–Vis spectrophotometer, an artificial nose and an artificial tongue.

NIR measurements were performed by means of an FT near-infrared spectrophotometer based on a polarisation interferometer (Buchi NIRFlex N-500) in the 4,000–10,000 cm⁻¹ range at 4 cm⁻¹ resolution. The samples were analysed in transmittance mode using quartz cuvettes with a 5 mm path length. Samples were measured under thermostated conditions at 35 ± 0.5 °C.

UV–Vis absorption spectra were obtained in the range 190–1,100 nm, at 1 nm resolution, using an Agilent 8453 spectrophotometer. The radiation source is a combination of a deuterium-discharge lamp for the ultraviolet wavelength range and a tungsten lamp for the visible wavelength range. The measurement cells employed were quartz cuvettes with 5 mm path length.

In order to remove oil turbidity that could affect NIR and UV–Vis spectra, all the oil samples were centrifuged at 3,000 rpm for 30 min before being analysed.

The artificial nose (headspace mass spectrometer) used in this study was assembled in our laboratory and consists of an automatic headspace sampler and introduction system directly coupled to a quadrupole mass detector, without any interposed chromatographic separation. An extensive description of the instrumental configuration and of the analytical procedure can be found elsewhere [15]. Signals were recorded between 50 and 150 m/z.

Artificial tongue analyses were performed with an Amel 2059 potentiostat/galvanostat equipped with an Amel 7800 interface and function generator. For the measurements, a two-electrode cell was employed. The working electrode was a Pt microdisk (r = 12.5 μm), coupled with an Ag wire pseudo-reference electrode. Olive oil samples were first filtered through paper to remove particulate matter and then mixed with the room temperature ionic liquid trihexyl(tetradecyl)phosphonium bis(trifluoromethylsulfonyl)imide, at 0.1 M concentration, in order to provide sufficient conductivity to the matrix [13]. The cyclic voltammograms (three consecutive cycles) were recorded by applying the following conditions: initial potential = 0.0 V; low potential = −3.0 V; high potential = +3.0 V; final potential = 0.0 V; initial scan polarity: negative; scan rate = 20 mV s⁻¹; T = 21 ± 1 °C. The complex current/potential profiles obtained were used, in the same way as the spectral profiles, to characterise the oil samples.

All four analytical techniques allowed measurements to be taken directly from the oil samples, without complex sample pre-treatments, resulting in a significant reduction in time and costs. Two series of measurements were taken for all the samples, randomising the order of analysis. Finally, for each instrumental technique, each sample was represented by the average of its two replicate signals.

Data analysis

Digital signals were organised in four data matrices, one for each instrumental technique. Each matrix was arranged in 57 rows—corresponding to the samples analysed—and as many columns as signal elements, i.e., instrumental variables.

NIR spectra were reduced to the range 4,400–8,900 cm⁻¹ and UV–Vis spectra were reduced to the range 290–730 nm, in order to eliminate regions that were non-informative and/or characterised by a low signal-to-noise ratio. For the same reasons, artificial nose mass spectra were reduced to the range 45–90 m/z. All these signals were then pre-treated by the standard normal variate transformation [16] and, subsequently, by column centering. Artificial tongue cyclic voltammograms were pre-treated by computing the differences between the current values evaluated at the same potential (forward scan minus backward scan) [13] and, subsequently, by column autoscaling.
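The pre-treatments named above can be illustrated with a minimal Python sketch. It is not the V-PARVUS implementation: the function names and the toy numbers are assumptions introduced only for illustration, and the voltammogram step assumes that the forward and backward scans have already been aligned on the same potential grid.

```python
from math import sqrt


def snv(signal):
    """Standard normal variate: scale one signal (row) by its own mean and SD."""
    n = len(signal)
    mean = sum(signal) / n
    sd = sqrt(sum((v - mean) ** 2 for v in signal) / (n - 1))
    return [(v - mean) / sd for v in signal]


def centre_columns(matrix):
    """Subtract from each variable (column) its mean over all samples."""
    n = len(matrix)
    col_means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, col_means)] for row in matrix]


def autoscale_columns(matrix):
    """Column centering followed by division by the column standard deviation."""
    n = len(matrix)
    centred = centre_columns(matrix)
    col_sds = [sqrt(sum(v * v for v in col) / (n - 1)) for col in zip(*centred)]
    return [[v / s for v, s in zip(row, col_sds)] for row in centred]


def forward_minus_backward(forward, backward):
    """Current differences at the same potential (scans assumed aligned)."""
    return [f - b for f, b in zip(forward, backward)]


# Toy spectra (hypothetical values): SNV row by row, then column centering.
spectra = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0]]
pretreated = centre_columns([snv(row) for row in spectra])
```

Note that SNV acts on each signal separately, whereas centering and autoscaling act on each variable across the samples, so the order of the two steps matters.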

As a first step, principal component analysis (PCA) has been applied as an unsupervised exploratory tool, useful to visualise the actual sample distribution in the multivariate space [17].

Then, a number of classical and novel supervised class-modelling methods were applied in order to build models for verifying the authenticity of the Chianti Classico PDO olive oil.

Soft independent modelling of class analogy (SIMCA) was the first class-modelling technique introduced into chemometrics [18]; the central feature of this method is the application of PCA to the sample category studied (e.g., a PDO food product), generally after within-class autoscaling or centering. SIMCA models are defined by the range of the sample scores on a selected number of low-order principal components (PCs), referred to as the SIMCA inner space. Model shapes therefore correspond to rectangles (two PCs), parallelepipeds (three PCs) or hyper-parallelepipeds (more than three PCs).

Unequal dispersed classes (UNEQ), originating in the work of Hotelling [19], was introduced into chemometrics by Derde and Massart [20]. This technique derives from quadratic discriminant analysis; it is based on the hypothesis of a multivariate normal distribution in each category studied and, consequently, on the use of the T² statistic to define a class space. The UNEQ model is the class centroid, i.e., the vector of the mean values of each variable, computed from the objects of the modelled class. The boundary of the class space around the centroid is an ellipse (two variables), an ellipsoid (three variables) or a hyper-ellipsoid (more than three variables). The dispersion of a class space is defined by the critical value of the T² statistic at a selected confidence level. UNEQ should be applied in cases when the ratio between the number of objects in a given category and the number of the variables measured is three or greater. In cases involving many variables (such as spectral and voltammetric data), it is possible to apply UNEQ following a preliminary reduction in variable number by PCA.
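The idea of a class space delimited by a T² boundary around the centroid can be sketched for the two-variable case (for instance, two PC scores). The sketch below computes a squared Mahalanobis-type distance from the class centroid; the exact scaling of the statistic and the derivation of its critical value follow the cited references and are not reproduced here, so the critical value is passed in as a user-supplied, hypothetical number. All names and data are illustrative assumptions.

```python
def centroid(rows):
    """Class centroid: mean of each variable over the class objects."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_2d(rows, mean):
    """Sample variances and covariance of a two-variable class."""
    n = len(rows)
    sxx = sum((r[0] - mean[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mean[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in rows) / (n - 1)
    return sxx, sxy, syy


def t2_distance(sample, mean, cov):
    """(x - m)' S^-1 (x - m), using the closed-form inverse of a 2x2 matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = sample[0] - mean[0], sample[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def accepts(sample, mean, cov, t2_critical):
    """A sample is accepted if it falls inside the elliptical class space."""
    return t2_distance(sample, mean, cov) <= t2_critical


# Toy class of four objects (hypothetical scores on two PCs).
training = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
m = centroid(training)
s = covariance_2d(training, m)
t2_critical = 5.0  # hypothetical value, NOT derived from a confidence level
inside = accepts((3.0, 1.0), m, s, t2_critical)
outside = accepts((4.0, 4.0), m, s, t2_critical)
```

Because the covariance matrix of the class must be inverted, the sketch also makes clear why a sufficient number of objects per variable is needed.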

For comparison with these two classical techniques, two recently developed methods were also applied in the present study: multivariate range modelling (MRM) and CAIMAN analogues modelling methods (CAMM).

MRM [21] follows a simple and intuitive way of building class models by employing the ranges of the predictors. In order to take into account intercorrelations between variables in the dataset, several additional features are computed as linear combinations of the original predictors and employed to build the models: LDA canonical variables represent a typical example. A basic feature of MRM is its capability of providing, by definition, class models with 100% sensitivity: this property is very important when a model is built for a food protection consortium, since the products of all affiliated producers must be recognised and accepted by the model. A further advantage of MRM lies in its outcome, which is easily understandable and interpretable even by people with a limited knowledge of multivariate analysis.
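The range-based principle can be illustrated with a short Python sketch. It is a simplified reading of the description above, not the published MRM algorithm: the additional features are supplied by the user as hypothetical weight vectors (in practice they might be, for example, LDA canonical variables), and all names and data are assumptions.

```python
def expand(row, combos):
    """Append linear combinations of the original predictors to one sample."""
    extra = [sum(w * v for w, v in zip(weights, row)) for weights in combos]
    return list(row) + extra


def build_range_model(training_rows, combos=()):
    """Store the minimum and maximum of every (original or derived) feature."""
    expanded = [expand(row, combos) for row in training_rows]
    lows = [min(col) for col in zip(*expanded)]
    highs = [max(col) for col in zip(*expanded)]
    return lows, highs


def accepts(model, row, combos=()):
    """Accept a sample only if every feature lies within the class ranges."""
    lows, highs = model
    features = expand(row, combos)
    return all(lo <= v <= hi for v, lo, hi in zip(features, lows, highs))


# Toy class with two strongly correlated predictors (hypothetical values).
training = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
combos = [(1.0, -1.0)]  # hypothetical extra feature: x - y

plain_model = build_range_model(training)
combo_model = build_range_model(training, combos)

suspect = (1.0, 3.0)  # inside both single-variable ranges, far from the trend
accepted_plain = accepts(plain_model, suspect)
accepted_combo = accepts(combo_model, suspect, combos)
```

Every training object lies within its own class ranges, which is why a model of this kind has 100% sensitivity on the training set by construction; the derived feature is what lets the model reject a sample that respects each single range but violates the correlation between predictors.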

CAMM [22] is a family of powerful class-modelling techniques which builds the class models using distances (leverages or Mahalanobis distances from the class centroids) as predictors. Such new variables may be used separately or in combination with the original predictors; the class models and the class spaces are defined as in the case for UNEQ.

Data analysis was performed by means of the chemometric package V-PARVUS [23].

Results and discussion

Exploratory analysis

Figures 1, 2, 3 and 4 show the PCA score plots of Chianti Classico and Maremma olive oil samples, corresponding to the four different instrumental data matrices. The Cartesian axes correspond, in each case, to the PCs presenting the greatest Fisher weight, which is an estimate of the ratio between inter-class and intra-class variance [24], used as a measure of the power for distinguishing between the two categories of oil considered.
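As an illustration of how PCs might be ranked by Fisher weight, the sketch below uses one common two-class form of the ratio: the squared difference between the class means divided by the sum of the class variances. This specific formula is an assumption and may differ from the definition in [24]; the function names and scores are likewise hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Sample variance of the scores of one class on one PC."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def fisher_weight(scores_a, scores_b):
    """Assumed two-class form: (mean_a - mean_b)^2 / (var_a + var_b)."""
    between = (mean(scores_a) - mean(scores_b)) ** 2
    within = variance(scores_a) + variance(scores_b)
    return between / within


def rank_components(class_a_by_pc, class_b_by_pc):
    """Return PC indices sorted from the most to the least discriminant."""
    weights = [fisher_weight(a, b) for a, b in zip(class_a_by_pc, class_b_by_pc)]
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)


# Hypothetical scores of two classes on three PCs (one list per PC).
class_a = [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]]
class_b = [[1.5, 2.5, 3.5], [9.0, 10.0, 11.0], [5.0, 6.0, 7.0]]
order = rank_components(class_a, class_b)
```

The first two indices of `order` would then identify the PCs to use as plot axes.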

Fig. 1 PCA score plot of NIR data; red: Chianti Classico samples, green: Maremma samples

Fig. 2 PCA score plot of UV–Vis data; red: Chianti Classico samples, green: Maremma samples

Fig. 3 PCA score plot of artificial nose data; red: Chianti Classico samples, green: Maremma samples

Fig. 4 PCA score plot of artificial tongue data; red: Chianti Classico samples, green: Maremma samples

These plots clearly show that the two spectroscopic techniques allow a direct and well-resolved differentiation of the Chianti Classico oils from the Maremma samples. In fact, the samples of the two categories are evidently clustered and distinguishable in the space defined by the first two PCs (Figs. 1 and 2).

On the other hand, for the artificial nose and tongue data, the distinction between Chianti Classico and Maremma samples is not perfectly defined and the samples of the two classes partially overlap. Nevertheless, two regions, each characterised by a relatively high density of objects belonging to one of the two classes, are noticeable on the plots (Figs. 3 and 4).

This suggests that artificial nose and tongue signals also contain information potentially useful for the characterisation of the Chianti Classico olive oil although, in this case, a higher degree of data complexity is evident.

Class-modelling

Since in all four data sets, corresponding to the four analytical techniques, the number of variables was very high compared with the number of samples, the class-modelling methods were always applied to the scores of low-order principal components (two PCs for UV–Vis and NIR spectroscopy, seven PCs for artificial nose and tongue).

Model validation was performed by a cross-validation scheme with five cancellation groups. The evaluation parameters computed are sensitivity (i.e., the percentage of samples of the modelled class correctly accepted by the class model) and specificity (i.e., the percentage of samples extraneous to the modelled class correctly rejected by the class model). Sensitivity may be considered an experimental evaluation of the confidence level of a model, while specificity may be considered an experimental measurement of the ability of the model to reject non-pertinent samples. In order to take into account the balance between these two parameters, a model efficiency value was computed as the geometric mean of sensitivity and specificity.
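The three evaluation parameters defined above can be written out as a short Python sketch. The acceptance flags are hypothetical, and the way samples are assigned to the five cancellation groups (here, every fifth sample) is an assumption, since the assignment scheme is not specified in the text.

```python
from math import sqrt


def sensitivity(accepted_target):
    """Percentage of modelled-class samples accepted by the class model."""
    return 100.0 * sum(accepted_target) / len(accepted_target)


def specificity(accepted_other):
    """Percentage of extraneous samples rejected by the class model."""
    rejected = sum(1 for flag in accepted_other if not flag)
    return 100.0 * rejected / len(accepted_other)


def efficiency(sens, spec):
    """Geometric mean of sensitivity and specificity (both in percent)."""
    return sqrt(sens * spec)


def cancellation_groups(n_samples, n_groups=5):
    """Assumed scheme: sample i goes to group i mod n_groups."""
    return [list(range(g, n_samples, n_groups)) for g in range(n_groups)]


# Hypothetical acceptance flags collected over the cross-validation rounds.
target_flags = [True, True, True, False]   # modelled class
other_flags = [False, False, False, True]  # extraneous class

sens = sensitivity(target_flags)
spec = specificity(other_flags)
eff = efficiency(sens, spec)
groups = cancellation_groups(57)
```

Using the geometric mean penalises unbalanced models: a model with 100% sensitivity and 0% specificity has an efficiency of 0%, not 50%.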

As for SIMCA, UNEQ and CAMM, class models for Chianti Classico PDO olive oil have been built at a 95% confidence level. After that, new models forced to retain all the objects of the training set of the modelled class (corresponding to 100% sensitivity) have been developed as well. MRM provides, by definition, models with 100% sensitivity. Therefore, MRM outcomes should be compared with those obtained with the other methods for the forced models.
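One simple way to picture a forced model is to enlarge the class boundary until it reaches the most distant training object of the modelled class. The sketch below follows this reading; it is an assumption about how the enlargement could be carried out, not a description of the software used in this study, and the distances and critical value are hypothetical.

```python
def forced_threshold(training_distances, critical_value):
    """Enlarge the boundary, if needed, to include every training object."""
    return max(critical_value, max(training_distances))


def percent_accepted(distances, threshold):
    """Percentage of samples whose distance falls within the boundary."""
    return 100.0 * sum(1 for d in distances if d <= threshold) / len(distances)


# Hypothetical distances from the class model.
training = [0.5, 1.2, 2.0, 7.5]  # modelled class
others = [6.0, 9.0, 12.0]        # extraneous class
critical = 5.0                   # hypothetical boundary at the chosen confidence level

forced = forced_threshold(training, critical)
sens_before = percent_accepted(training, critical)
sens_after = percent_accepted(training, forced)
spec_before = 100.0 - percent_accepted(others, critical)
spec_after = 100.0 - percent_accepted(others, forced)
```

In this toy case the sensitivity rises from 75% to 100%, while the specificity falls because one extraneous sample now lies inside the enlarged class space; this is the trade-off that a forced model has to be checked for.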

Tables 1, 2, 3 and 4 report the modelling results for the characterisation of the Chianti Classico PDO olive oil with the four analytical techniques employed in this study, respectively.

Table 1 Modelling results of NIR data
Table 2 Modelling results of UV–Vis data
Table 3 Modelling results of artificial nose data
Table 4 Modelling results of artificial tongue data

As expected from the results of the exploratory analysis, the spectroscopic techniques provided the most efficient models, with cross-validation efficiency values generally higher than 90%. Using spectral data, all the class-modelling methods investigated performed very well, not only at the 95% confidence level but also in the case of the forced models. Indeed, it is quite common for such models, enlarged to accept all the samples belonging to the modelled class (100% sensitivity), to fall to very low specificity values, often unacceptable for practical purposes. Conversely, the forced models built for the Chianti Classico oil from NIR and UV–Vis data showed specificities generally higher than 95%, thus providing very satisfactory efficiencies (Tables 1 and 2).

Artificial nose and tongue data, owing to the higher complexity of the information embodied, as noted in the exploratory analysis, yielded models characterised by a generally lower efficiency (Tables 3 and 4).

Nevertheless, for both data sets there is an appreciable difference between the performances of MRM and CAMM and those of UNEQ and SIMCA.

In particular, UNEQ and SIMCA models were characterised by cross-validation efficiency values lower than 50%: such models have no practical interest. Conversely, as for artificial nose data, MRM provided a model with well-balanced sensitivity and specificity values in cross-validation, and 70% efficiency. Instead, in the case of artificial tongue data, the most efficient model was obtained by CAMM (80% cross-validation efficiency).

Figures 5, 6, 7 and 8 show the Coomans’ plots corresponding to the CAMM models obtained for the four analytical techniques, respectively. The graphs confirm that CAMM is able to provide efficient models for the characterisation of Chianti Classico PDO olive oil using data from any of the analytical techniques investigated in this study.

Fig. 5 Coomans’ plot of CAMM model for NIR data; red: Chianti Classico samples, green: Maremma samples

Fig. 6 Coomans’ plot of CAMM model for UV–Vis data; red: Chianti Classico samples, green: Maremma samples

Fig. 7 Coomans’ plot of CAMM model for artificial nose data; red: Chianti Classico samples, green: Maremma samples

Fig. 8 Coomans’ plot of CAMM model for artificial tongue data; red: Chianti Classico samples, green: Maremma samples

The exploratory and modelling results indicate that the spectroscopic techniques worked better than the artificial nose and tongue for characterising the Chianti Classico PDO olive oil. Nevertheless, the comparison of the modelling outcomes shows that the novel methods applied (MRM and, mainly, CAMM) are effectively capable of extracting the useful information from complex data and of providing acceptable models even in the case of overlapping distributions.

Conclusions

The present study proves the usefulness of four fingerprinting techniques in the authenticity verification of an Italian PDO olive oil. In particular, it is possible to conclude that, among them, NIR and UV–Vis spectroscopy provided directly the most valuable information for the problem studied. Conversely, the artificial nose and the artificial tongue employed for the same purpose supplied more complex outcomes. Furthermore, four class-modelling methods (UNEQ, SIMCA, MRM and CAMM) were compared. This comparison showed that the classical strategies, UNEQ and SIMCA, worked very well on well-resolved data, but they failed in supplying efficient models in the case of complex data distribution. Instead, the novel methods, MRM and CAMM, are capable of extracting the useful information from complex data, so that they should be preferred when partially overlapping classes are being modelled.