Introduction

The non-ionizing regions of the electromagnetic spectrum yield useful information when they interact with biological materials. The mid-infrared (MIR) and near-infrared (NIR) regions have been the most widely applied, especially when a clear distinction is needed between structurally similar food matrices [1]. Olive oil is a food material that shows no distinct visual differences regardless of olive origin, variety, harvest time, or quality properties. Elaborate chemical analyses are usually needed to elucidate the intrinsic differences among olive oils. However, olive oil, like all other lipid-based foods, shows a specific fingerprint pattern under MIR and NIR spectroscopy. Any modulation of this pattern provides hints that can be used in olive oil characterization [2]. Similarly, odor-responsive, non-specific electronic nose (e-nose) sensors have been used effectively in the field of olive oil [3,4,5]. Apart from mimicking the human olfactory system, the e-nose is, like spectroscopy, a waste-free and rapid analytical technique. With the advent of these techniques, the need for expensive, technically demanding, and time-consuming classical chemical analyses of olive oil might be reduced.

The main areas of olive cultivation and olive oil production in Turkey are on the west coast of the Aegean region. The most economically significant and widely distributed varieties include Ayvalik, Memecik, and Erkence. The oils obtained, especially from Ayvalik and Memecik, overlap to some degree in chemical characteristics, but also show many differences [6]. Extensive chemical evaluations are required to establish these differences clearly, because oils obtained from the same geographical location are difficult to classify [7]. There have been various attempts to confirm the variety, geographical origin, and harvest season of olive oils using different chemical analyses or combinations of methods. Examples include gas chromatography (GC), nuclear magnetic resonance (NMR), high-performance liquid chromatography (HPLC), electrophoresis, and differential scanning calorimetry (DSC) [8,9,10,11,12]. These studies have largely been successful, but the time, energy, and resources that these analytical methods require make them infeasible for online process monitoring and quick decision making.

In practice, consumers rely on physical means of assessing olive oil before making a purchase: color, appearance, and labels are the only observable cues available to them. Adulteration, misrepresentation, and mislabeling of olive oils have become global phenomena and have eroded consumer confidence. However, it is almost impossible for a single analytical method to determine unequivocally the varietal, geographical, and harvest time differences in olive oil at the same time. Oils of different varieties may still be wrongly classified if some of their chemical properties overlap. Therefore, attention has shifted toward more rapid, simple, and cost-effective spectroscopic and electronic sensing methods. There are reports on the classification of olive oils with respect to sensory qualities, varietal differences, and geographical origins using MIR spectroscopy [13,14,15]. In the study of Inarejos-García et al. [16], NIR spectroscopy was applied to classify olive oils of different varieties and sensory properties. The e-nose has also been applied to discriminate olive oils by sensorial, varietal, and geographical differences with a reasonable level of success [17,18,19].

Relatively few reports exist on the combined application of these methods in extra virgin olive oil classification. Sinelli et al. [20] classified extra virgin olive oils using NIR-MIR data fusion, while Forina et al. [21] and Oliveri et al. [22] combined NIR, UV, and e-nose to evaluate specific monovarietal extra virgin olive oils. Borràs et al. [23] used spectroscopic data from different sources to predict the sensory state of olive oils. So far, no report has combined or compared the performances of NIR, MIR, and e-nose techniques in the classification of olive oils produced on the Aegean coast of Turkey. Thus, this study aims to determine and compare the discriminative abilities of NIR, MIR, and e-nose on different varieties of Turkish olive oil. The proposed methodology can provide a basis for effective and rapid methods for the differentiation of olive oils.

Materials and methods

Olive oil samples

Sixty-three olive oil samples belonging to the Ayvalik (A), Memecik (M), and Erkence (E) olive varieties grown on the west coast of Turkey, obtained in the 2012 to 2015 harvest years, were analyzed (Table 1). Ayvalik, Memecik, and Erkence oils were obtained from the Edremit Bay area, Aydin, and the Karaburun peninsula of Izmir, respectively. The defined growing and production areas are close to each other and small relative to the whole Aegean coastline; however, they are the main olive oil producing and exporting regions of Turkey. Olive oil samples were kept in dark glass containers, and the headspaces were flushed with nitrogen prior to refrigerated storage (4 °C).

Table 1 Olive oil samples of 2012–2015 harvest years

FT-NIR spectroscopy

Fourier transform (FT)-NIR spectra of the oil samples were collected in transmission mode with an FT-NIR spectrometer (MPA, Bruker Optics, Ettlingen, Germany), using disposable vials of 8 mm path length. A spectral range of 12,500–4000 cm−1 was used, with 8 cm−1 resolution, 10 kHz scanner velocity, and 16 scans for both background and sample. Spectra were taken in duplicate at room temperature. Instrument control was performed with the OPUS software (v. 6.5, Bruker Optics, Ettlingen, Germany).

FT-IR spectroscopy

An FT-IR spectrometer (VERTEX 70, Bruker Optics, Ettlingen, Germany) with an incorporated deuterated triglycine sulphate (DTGS) detector was used to collect FT-IR data over the range 4000–700 cm−1. The operational conditions of the instrument were: 16 cm−1 resolution; 7.5 kHz scanner velocity; 32 scans for both background and sample; room temperature. The samples were positioned on a multiple-reflection Attenuated Total Reflectance (ATR) germanium crystal, and measurements were made in duplicate. Instrument control and data acquisition were carried out with the OPUS software (v. 6.5, Bruker Optics, Ettlingen, Germany).

E-nose data acquisition

The aroma fingerprints of the olive oil samples were evaluated with a portable electronic nose, PEN2 (Win Muster Air sense Analytics Inc., Schwerin, Germany). The system consists of a sampling/injection compartment, a detector unit with embedded sensors, and pattern recognition software (Win Muster v.1.6) for data acquisition. The system has ten metal oxide semiconductor (MOS) sensors, identified as: W1C (aromatic), W5S (broad-range), W3C (aromatic), W6S (hydrogen), W5C (aromatic-aliphatic), W1S (broad-methane), W1W (sulphur-organic), W2S (broad-alcohol), W2W (sulphur-chloride), and W3S (methane-aliphatic) [24]. The numerical signal outputs of the system are expressed as resistivity in Ω (ohms). The oil sample (2 g) was placed in a 30 mL Pyrex® vial fitted with a pierceable silicone/Teflon disk in the cap. After 10 min of equilibration at 40 °C, the sample headspace was pumped over the sensor surfaces for 60 s (injection time) at a flow rate of 300 mL/min, and the sensor signals were recorded during this time. After sample analysis, the system was purged for 180 s with filtered air prior to the next sample injection, to allow the instrument baseline to be re-established. Sensor drift was evaluated using a standard solution of 1% ethanol included in each measurement cycle. Each olive oil sample was analyzed in duplicate.

Data analysis

In the multivariate classification models, the following data matrices were used, each containing 63 oil samples:

  (a) NIR data matrix included the range 9000–4000 cm−1 that constituted the most informative segments of the spectrum.

  (b) MIR (4000–700 cm−1) and reduced MIR spectra (3090–2750 and 1874–700 cm−1) were considered separately.

  (c) NIR-MIR matrix consisted of concatenated spectra created by merging NIR and MIR reduced data.

  (d) E-nose data matrix composed of ten variables (MOS sensors).
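
The construction of matrices (a) to (c) can be sketched as range selection followed by concatenation. The snippet below is only an illustration under stated assumptions: the function names (`select_ranges`, `fuse`), the coarse wavenumber grids, and the dummy absorbance values are hypothetical and are not taken from the study; only the wavenumber limits come from the list above.

```python
# Wavenumber limits quoted in the text, as (high, low) pairs in cm-1.
NIR_RANGES = [(9000, 4000)]
MIR_REDUCED_RANGES = [(3090, 2750), (1874, 700)]


def select_ranges(wavenumbers, spectrum, ranges):
    """Keep the points whose wavenumber lies inside any (high, low) range."""
    kept = []
    for wn, value in zip(wavenumbers, spectrum):
        if any(low <= wn <= high for high, low in ranges):
            kept.append(value)
    return kept


def fuse(nir_wn, nir_spec, mir_wn, mir_spec):
    """Concatenate the selected NIR points and the reduced MIR points."""
    nir_part = select_ranges(nir_wn, nir_spec, NIR_RANGES)
    mir_part = select_ranges(mir_wn, mir_spec, MIR_REDUCED_RANGES)
    return nir_part + mir_part


# Hypothetical coarse grids and dummy values, for illustration only.
nir_wn = list(range(12500, 3999, -500))    # 12500 ... 4000 cm-1
nir_spec = [float(i) for i in range(len(nir_wn))]
mir_wn = list(range(4000, 699, -100))      # 4000 ... 700 cm-1
mir_spec = [float(i) for i in range(len(mir_wn))]

fused = fuse(nir_wn, nir_spec, mir_wn, mir_spec)
print(len(fused))  # number of variables in the fused row
```

Each fused row then holds the NIR variables followed by the reduced MIR variables for one sample; real spectra would of course have far denser wavenumber grids than this toy example.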

The replicated spectra were averaged prior to the multivariate analysis. All spectral matrices were pretreated with standard normal variate (SNV) in conjunction with second-order derivative (2der) filtering. These methods were selected because of the nature of the data and prior experience of their ability to separate light scattering from light absorbance. SNV is a simple row-oriented transformation that corrects for scattering by centering and scaling each individual spectrum. This minimizes scaling differences among spectra by bringing every spectrum to the same scale [25]. A second-order derivative can be calculated from the differences between consecutive points, with a specified gap distance, or by Savitzky–Golay polynomial fitting; Savitzky–Golay fitting with 15 points was used here. The smoothing built into the polynomial fit suppresses random noise and thereby improves the signal-to-noise ratio of the derivative spectra [26].
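
The two pretreatments can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, the polynomial order (quadratic) is an assumption because the text states only the 15-point window, and the edge points that lack a full window are simply dropped. Libraries such as SciPy offer an equivalent filter (`scipy.signal.savgol_filter` with `deriv=2`).

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum (one row)."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in spectrum]


def savgol_second_derivative(spectrum, window=15, spacing=1.0):
    """Second derivative from a local quadratic least-squares fit.

    Assumption: quadratic polynomial (a cubic gives the same weights for a
    symmetric window). Only points with a full window are returned, so the
    output is shorter than the input by window - 1 points.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    offsets = range(-half, half + 1)
    mean_sq = sum(i * i for i in offsets) / window
    centred = [i * i - mean_sq for i in offsets]
    norm = sum(c * c for c in centred)
    # The fitted quadratic coefficient is sum(c*y)/norm; its second
    # derivative is twice that, divided by the squared point spacing.
    weights = [2.0 * c / (norm * spacing ** 2) for c in centred]
    result = []
    for k in range(half, len(spectrum) - half):
        segment = spectrum[k - half:k + half + 1]
        result.append(sum(w * y for w, y in zip(weights, segment)))
    return result


def pretreat(spectrum, window=15):
    """SNV followed by the Savitzky-Golay second derivative."""
    return savgol_second_derivative(snv(spectrum), window=window)


# Sanity check on a parabola: the second derivative of x**2 is 2 everywhere.
parabola = [float(i * i) for i in range(20)]
print(savgol_second_derivative(parabola, window=15))
```

Applying SNV before the derivative is one possible order of operations; the text does not state which order was used.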

PCA (Principal Component Analysis) was applied as a qualitative inspection tool to visualize possible patterns, clusters, or trends among the samples. This step usually precedes any class modelling or discriminant analysis. PCA transforms a number of mutually correlated variables into a smaller number of uncorrelated variables called principal components (PCs), thereby reducing the dimensionality of the data [27].
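
To make the transformation concrete, the following sketch extracts principal components one at a time with the NIPALS iteration, a scheme commonly used in chemometrics. It is an assumption that this resembles what the commercial software does internally; the function names and the toy data are hypothetical. The sketch mean-centres the columns, returns scores, unit-length loadings, and the fraction of the total sum of squares explained by each PC.

```python
import math


def centre_columns(X):
    """Subtract the column mean from every column of X (list of rows)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def nipals_pca(X, n_components, tol=1e-12, max_iter=1000):
    """Return (scores, loadings, explained) for the first n_components PCs."""
    E = centre_columns(X)
    n_vars = len(E[0])
    total_ss = sum(v * v for row in E for v in row)
    scores, loadings, explained = [], [], []
    for _component in range(n_components):
        # Start from the residual column with the largest sum of squares.
        col_ss = [sum(row[j] ** 2 for row in E) for j in range(n_vars)]
        start = col_ss.index(max(col_ss))
        t = [row[start] for row in E]
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            if tt == 0.0:
                raise ValueError("no variance left to extract")
            # Loading: project the columns of E onto the score, then normalise.
            p = [sum(row[j] * ti for row, ti in zip(E, t)) / tt
                 for j in range(n_vars)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Score: project the rows of E onto the loading.
            t_new = [sum(a * b for a, b in zip(row, p)) for row in E]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        # Deflate: remove the extracted component from the residual matrix.
        E = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(E, t)]
        scores.append(t)
        loadings.append(p)
        explained.append(sum(v * v for v in t) / total_ss)
    return scores, loadings, explained


# Toy data (hypothetical): four samples, two variables.
toy = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
scores, loadings, explained = nipals_pca(toy, n_components=2)
print(explained)  # fraction of total variance carried by PC1 and PC2
```

The cumulative sum of `explained` corresponds to the explained variance figures usually quoted alongside PCA score plots.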

As a supervised classification technique, OPLS-DA (Orthogonal Projections to Latent Structures Discriminant Analysis) was applied to classify the samples on the basis of varietal differences. OPLS-DA evaluates the relationship between two data matrices (X and Y), where Y is user-created class information for the oil samples, defined as follows: class 1 (Ayvalik), class 2 (Memecik), and class 3 (Erkence). OPLS-DA modifies the classical partial least squares model by separating the systematic variation in the X data matrix into two distinct parts: a predictive part correlated to the Y variable (class information in this case) and an orthogonal part uncorrelated to Y [28]. OPLS-DA models were validated using both a cross validation procedure with seven cancellation groups (seven-fold CV) and an external test set composed of 23 spectra, which left 40 spectra (17 A, 15 M, and 8 E) for calibration. The external validation set was randomly selected and proportionate to the total number of observations in each varietal class. The computed statistical outputs were reported in terms of number of PCs used (PC_p + PC_o, where p and o stand for predictive and orthogonal components, respectively), coefficients of determination for calibration (\(R_{\text{cal}}^{2}\)) and cross validation (\(R_{\text{CV}}^{2}\)), and percentage of correctly classified samples (confusion matrices) for both calibration and prediction models.
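
The split of X into a predictive and an orthogonal part can be illustrated with a single-response sketch of the orthogonal filtering step described for OPLS in the chemometrics literature. This is a simplified illustration, not the software's implementation: it handles one centred y column (for example, membership of one variety versus the rest) and removes one orthogonal component, whereas the models in this study use three classes. All names and the toy data are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def xt_times(X, v):
    """Return X' v for X stored as a list of rows."""
    return [sum(row[j] * vi for row, vi in zip(X, v)) for j in range(len(X[0]))]


def x_times(X, w):
    """Return X w for X stored as a list of rows."""
    return [dot(row, w) for row in X]


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def remove_one_orthogonal_component(X, y):
    """One OPLS-style filtering step on column-centred X and centred y.

    Returns (X_filtered, t_ortho, p_ortho): the y-orthogonal score and
    loading, and X with that component subtracted.
    """
    w = unit(xt_times(X, y))                 # predictive weight, parallel to X'y
    t = x_times(X, w)                        # predictive score
    p = [v / dot(t, t) for v in xt_times(X, t)]          # X loading
    # Part of the loading that is orthogonal to the predictive weight.
    w_ortho = unit([pj - dot(w, p) * wj for pj, wj in zip(p, w)])
    t_ortho = x_times(X, w_ortho)
    p_ortho = [v / dot(t_ortho, t_ortho) for v in xt_times(X, t_ortho)]
    X_filtered = [[v - ti * pj for v, pj in zip(row, p_ortho)]
                  for row, ti in zip(X, t_ortho)]
    return X_filtered, t_ortho, p_ortho


# Toy, already centred data (hypothetical): four samples, three variables.
X_toy = [[1.0, 2.0, 0.5], [2.0, -1.0, 1.5], [-1.0, -2.0, -1.5], [-2.0, 1.0, -0.5]]
y_toy = [0.5, 0.5, -0.5, -0.5]   # centred dummy variable for one class

X_filtered, t_ortho, p_ortho = remove_one_orthogonal_component(X_toy, y_toy)
print(dot(t_ortho, y_toy))  # close to zero: the removed score is uncorrelated to y
```

Because the orthogonal weight is constructed to be perpendicular to X'y, the removed score carries systematic variation in X that is uncorrelated to the class variable, which is the property the paragraph above describes.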

All statistical analyses were carried out with SIMCA software (v. 13, Umetrics, Umea, Sweden), and model overfitting was avoided by using the autofit (automatic fitting) function embedded in the software.

Results and discussion

Spectra and e-nose data interpretation

Representative FT-IR and FT-NIR spectra as well as e-nose signals of the olive oils are presented in Fig. 1. Each significant FT-IR spectral band (Fig. 1a) represents the vibrational response of the chemical composition of the oil at the molecular level. The two main band groups (3006–2854 and 1746–1654 cm−1) as well as the fingerprint region (1464–983 cm−1) represent a number of vibrational modes, depending on the predominant and most responsive functional groups of each region. A small shoulder-like band at 3006 cm−1 is due to the stretching vibration of the cis double bond, as expected in any natural edible oil. Asymmetric and symmetric stretching vibrations of the CH2 aliphatic functional group are responsible for the bands at 2924 and 2854 cm−1. Oils consist mainly of esters of fatty acids (triacylglycerols), and the band representing the ester carbonyl functional group is found at 1746 cm−1, while the weak band at 1654 cm−1 is associated with C=C vibration. Bending and rocking vibrations are the common modes of vibration within the fingerprint region (1464–983 cm−1) [29]. There are four major informative regions in the FT-NIR spectra (Fig. 1b). The range 4800–4500 cm−1 indicates a combination of CH stretching vibration with other vibrational modes. Two conjoined peaks within 6000–5300 cm−1 represent the first overtone of the stretching vibration of methyl, methylene, and ethylene groups. Variation in the frequency of this region is an index of the oxidative quality of olive oil. In the 7400–6250 cm−1 region, there are CH stretching vibrations. Finally, the 8700–8100 cm−1 band is the second overtone of the CH stretching vibration of methyl, methylene, and ethylene groups, whether interrupted or conjugated [30].

Fig. 1 Representative FT-IR (a), FT-NIR (b), SNV+2der pretreated FT-NIR-IR spectra (c), and e-nose signals (d) of olive oil samples

Three e-nose sensors (W5S, W1S, and W2S) are relevant in the discrimination of the oil samples (Fig. 1d). These sensors are responsive to broad-range, broad-methane, and broad-alcohol flavor compounds, respectively. Other important groups of sensors that enhance the discriminant ability of the e-nose data matrix are those responsive to aromatic compounds (W1C, W3C, and W5C) and to sulphur-organic compounds (W1W and W2W) [31].

PCA and OPLS-DA results

PCA score plots of the four data matrices, showing the projection patterns of the observations, are given in Fig. 2. The observation common to all models is that PCA suffices to distinguish oils of the Erkence variety, whereas Ayvalik and Memecik olive oil samples overlap to varying degrees. The PCA model of the FT-IR data with ten PCs explained 57% of the total variance, with clear separation of E olive oils. A similar output was obtained with the FT-NIR spectra, where the first seven PCs explained 83% of the data variation. FT-NIR showed a slight separation between M and A oils, compared to the inseparable cluster in the FT-IR model. The model of the combined FT-NIR-IR spectra was very similar to the one obtained with FT-NIR spectra alone, explaining 81% of the total variance using six PCs. This is similar to the PCA model of concatenated NIR and MIR spectra reported in the literature [32]. Alamprese et al. [33] found the NIR region to be the most significant part of the fused signals in meat species discrimination. E-nose data showed a somewhat different projection, with E oils partially grouped in the positive part of PC1. Although the PCA model had a good explained variance (98%, five PCs), the e-nose data, unlike the spectral models, failed to provide a distinct separation among the samples. The failure to separate Ayvalik and Memecik oils can be explained by the similar threshold responses of their volatiles.

Fig. 2 Score plots of PCA models of FT-IR (a), FT-NIR (b), combined FT-NIR-IR (c), and e-nose (d) data sets

OPLS-DA results

In Tables 2 and 3, the varietal classification rates (as percentages), number of PCs used, and coefficients of determination of the OPLS-DA models are given for FT-IR and FT-NIR spectra (separately or combined), as well as for e-nose data. The varietal discrimination (A, M, and E) with FT-IR and FT-NIR spectra shows at least 90% correct classification in each class for both calibration and external validation models. In calibration, FT-NIR spectra gave 100% correct classification of samples for each varietal class, compared to 94% for the Ayvalik variety obtained with FT-IR spectra, with one misclassified sample. Similar observations exist in the literature, where NIR was found to give better overall model outcomes than MIR [34]. In any case, the results obtained in prediction are equal for the two spectroscopic techniques, with the A variety being the most difficult to fully discriminate. An observation common to both calibration and prediction models of all the data matrices is that no oil of the Erkence variety was misclassified. This is consistent with the previous observations about the PCA models. In spite of the smaller number of observations in the E class compared to A and M, FT-NIR, FT-IR, combined FT-NIR-IR spectra, and e-nose data were consistent in correctly differentiating E samples. Coefficients of determination were relatively similar in the FT-NIR and FT-IR models (Table 3), whereas they were higher for the combined FT-NIR-IR spectra. As can be seen in Fig. 3, the combination of FT-NIR and FT-IR spectra resulted in a more distinct class separation than the models of FT-NIR and FT-IR spectra separately. Even though the results of the combined spectra are the same as those of FT-NIR in terms of correct classification in calibration, cross validation, and prediction, it can be concluded that modelling the most informative details from both FT-NIR and FT-IR spectroscopy improves the separation of different olive oils, as also stated in the work of Borràs et al. [23].
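
The percentages of correctly classified samples follow directly from a confusion matrix, as the short sketch below shows. The matrix is hypothetical and serves only to reproduce the arithmetic: the class sizes (17 A, 15 M, 8 E) are borrowed from the counts quoted in the Data analysis section and are treated here as an assumption, and the class to which the single misclassified Ayvalik sample is assigned is invented for illustration.

```python
def correct_classification_rates(confusion, labels):
    """Percentage of each true class (row) assigned to the correct column."""
    rates = {}
    for i, label in enumerate(labels):
        row_total = sum(confusion[i])
        rates[label] = 100.0 * confusion[i][i] / row_total
    return rates


labels = ["A", "M", "E"]
# Hypothetical confusion matrix: rows = true class, columns = assigned class.
# One Ayvalik sample is (arbitrarily) assigned to Memecik.
confusion = [
    [16, 1, 0],
    [0, 15, 0],
    [0, 0, 8],
]

rates = correct_classification_rates(confusion, labels)
for label in labels:
    print(label, round(rates[label]))
```

With 16 of 17 samples correct, the rate for the first class rounds to 94%, which matches the order of magnitude of a single misclassification reported above.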

Table 2 OPLS-DA calibration and validation model results: correct classification rates of the oil samples
Table 3 Individual and fused matrix OPLS-DA calibration model parameters
Fig. 3 Score plots of OPLS-DA models: FT-IR (a), FT-NIR (b), and combined FT-NIR-IR (c) spectra

Satisfactory class modelling and prediction outcomes were obtained from the OPLS-DA models of the e-nose data (Table 2). In calibration, correct classification rates for A (94%) and E (100%) samples were similar to those obtained with FT-IR spectra. In prediction, better results were obtained for A (90%) and E (100%) than for M (56%). This observation further confirms the quality characteristics of the oil samples. In the e-nose score plot (Fig. 4a), Ayvalik olive oils are localized at the lower right corner of the control ellipse, with slight overlap with the Memecik and Erkence varieties near the origin. The variables corresponding to the projection of the observations on the score plane are shown in the loading plot (Fig. 4b). The W1S, W2S, and W3S sensors, which are specific for partially aromatic, alcoholic, methane, and sulphur-organic compounds, are largely responsible for the discrimination of Ayvalik olive oils. Similarly, the W6S, W2W, and W5S sensors, which are sensitive to hydrogen, sulphur-organic compounds, and broad-range volatiles, seem significant for the characterization of the Memecik oil samples located in the first quadrant of the score plot. Erkence samples, located in the negative part of PC1 (Fig. 4a), are discriminated by the responses of the W1C, W3C, and W5C sensors, which are sensitive to aromatic compounds.

Fig. 4 Score (a) and loading plots (b) of OPLS-DA model of e-nose data

The classification capability of the OPLS-DA model of the e-nose data was lower than that of the FT-NIR and FT-IR models. However, e-nose results are relevant to the sensory attributes of olive oils, which are as important as the chemical characteristics.

Conclusion

The results of varietal classification of Turkish olive oils with rapid and non-destructive spectroscopic (FT-IR and FT-NIR) and e-nose techniques combined with chemometric analysis are satisfactory. The best performances in terms of correct classification of samples were observed for FT-NIR spectra and combined FT-NIR-IR spectra. The discriminative capacity of the e-nose is comparatively low, but the technique could be a useful substitute for, or complementary step to, the subjective human method in sensory analysis. These techniques are recommended for rapid, inexpensive, non-specific, and waste-free authentication of olive oils, which can span varietal, harvest time, geographical origin, and quality differences.