Introduction

Extra virgin olive oil (EVOO) is a non-polar liquid phase formed after the milling of fresh fruits of Olea europaea. EVOO is obtained exclusively by cold pressing of olive paste, without any additives and without manipulations other than decantation, centrifugation, and filtration [1,2,3]. This source of fat is typical of the Mediterranean countries; however, EVOOs are nowadays also produced in South America, Australia, or China [4].

Triacylglycerols (99%), diacylglycerols, and free fatty acids are the major oil compounds [5,6,7,8]. The balance of these components, together with other minor bioactive oil substances such as phenols, vitamins, sterols, and pigments, is responsible for the health benefits of EVOO [9,10,11]. Although phenols (PPs) are only minor oil components (50–1500 mg/kg), they are very valuable bioactive substances that also play an important role in the organoleptic properties of the oil [12]. Although only 2% of PPs are transferred to the olive oil during its production, EVOO contains more than 36 structurally different phenolic compounds. Besides the major secoiridoids, olive oil also contains phenolic acids, phenol alcohols, flavonoids, lignans, and aldehydes [11, 13,14,15,16,17,18].

Climatic conditions, soil quality and composition, altitude and latitude, technological aspects (fertilization, irrigation), olive tree cultivar (variety), olive fruit maturity, and the whole oil production process (including production, processing, and storage) significantly influence EVOO composition and hence its quality and its sensory, nutritional, physical, and chemical properties [4, 11, 13, 18,19,20,21,22,23,24,25,26].

Unfortunately, EVOO is a frequently counterfeited product [27, 28]. Adulteration of EVOO concerns not only good manufacturing practice but also the misdeclaration of geographical origin [13, 26, 27, 29]. Therefore, many analytical methods have been developed for the determination of various organic compounds (mainly fatty acids and triglycerides) in order to differentiate and classify olive oils [26, 27, 29,30,31,32]. Although PPs are minor compounds in olive oils, their assessment could provide information about the geographical origin, technological practice, and other factors, which could help to verify the authenticity of olive oils [33, 34].

Reversed-phase high-performance liquid chromatography (RP-HPLC) combined with mass spectrometric (MS), spectrophotometric (UV), fluorescence, or electrochemical detection is a frequently used technique for the analysis of PPs in olive oils. However, the separation is usually very long (30–90 min), and the experiment is further prolonged by necessary and generally time-consuming pre-treatment steps [solid-phase or liquid–liquid (micro)extraction] [10, 13, 24, 35,36,37]. A few studies deal with the identification and quantification of PPs using gas chromatography, nuclear magnetic resonance, or infrared spectroscopy [2, 34, 38,39,40,41].

Generally, analytical techniques create large data sets that require the use of advanced chemometric tools to extract the maximum amount of useful information. Discriminant analysis (DA; linear or partial least squares), soft independent modelling of class analogy (SIMCA), factor analysis (FA), and principal component analysis (PCA) are the most common multivariate statistical techniques [26, 32].

The aim of this work was to evaluate EVOO samples produced in different parts of Italy in terms of the composition and total amount of PPs using multivariate statistical methods. For this purpose, 68 EVOOs from 9 Italian regions, namely Apulia, Calabria, Campania, Garda (Lombardy, Veneto, and Trentino-Alto Adige), Liguria, Lazio, Tuscany, Umbria, and Sicily, were analysed by reversed-phase liquid chromatography coupled with a diode array detector and a mass spectrometer (RP-HPLC-DAD-ESI/MS), and the results were assessed by PCA, FA, and DA. Multivariate statistical methods make it possible to distinguish EVOOs, from a single data set, according to different interrelated parameters (geographical origin, ripeness of the olives, pedoclimatic conditions, manufacturing practice, etc.) that affect the content of phenolic compounds.

Materials and methods

Chemicals and reagents

Standards of apigenin (purity ≥ 99%), ethyl gallate (≥ 96%), hydroxytyrosol (≥ 90%), luteolin (≥ 97%), tyrosol (≥ 95%), and oleuropein (≥ 98%), as well as HPLC–MS grade acetonitrile, formic acid, n-hexane, and methanol, were purchased from Merck KGaA (Darmstadt, Germany). Sample extracts were filtered through a 0.45 µm nylon filter from Agilent (Santa Clara, USA) prior to analysis. Water (resistivity above 18 MΩ·cm) was obtained from a Milli-Q SP Reagent Water System (Millipore, Bedford, MA, USA).

Samples and extraction procedure

EVOO samples were provided directly by producers from 9 Italian regions. Because too few EVOOs were obtained from some regions, Italy was divided into three supra-regions (northern Italy, southern Italy, and Sicily, Fig. 1) for the purpose of multidimensional statistical treatment. The northern Italy group comprises 14 samples from Lazio, 15 samples from Tuscany, 2 samples from Umbria, 4 samples from Garda (Lombardy, Veneto, and Trentino-Alto Adige), and 2 samples from Liguria. The southern Italy group includes 18 samples from Apulia, 2 samples from Calabria, and 1 sample from Campania. The last 10 samples were obtained from Sicily and form a separate group. EVOOs from all the regions carried different designation marks (blend, PDO, and PGI). The sample numbering and the geographical and botanical origin of the samples are listed in detail in Table 1. An illustrative map of Italy showing the cities of sample origin, together with the division into the three supra-regions, is shown in Fig. 1. All EVOOs were from the 2017 campaign. Information about the growth conditions of the olive trees, olive collection, maturity, and sample processing is not known.

Fig. 1 Map of the Italian region with the sample representation (Zoner Photo Studio 12)

Table 1 List of EVOO samples with description of their botanical and geographical origin and total phenol content

EVOOs were stored in the dark at room temperature and processed immediately after the bottles were opened. The phenolic fraction was extracted from the EVOOs according to Ricciutelli et al. [42] with slight modifications. In short, 1.0 g of EVOO was dissolved in 1 mL of n-hexane, extracted four times with 1 mL of a water:methanol mixture (2:3; v:v), and centrifuged (5 min, 4000 rpm). The polar phases were collected, and the residual oil was removed with 1 mL of n-hexane. Then, a known amount of internal standard (IS = ethyl gallate) was added. The final extract was evaporated to dryness under vacuum, re-dissolved in 0.5 mL of HPLC-grade methanol, filtered through a 0.45 µm nylon filter, and injected into the HPLC system three times (n = 3). All samples were extracted and analysed in triplicate (n = 9 for each individual sample).
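
The step from a concentration measured in the final extract to a content in the oil can be illustrated with a short calculation. This is only a sketch based on the volumes and masses stated above (0.5 mL of final extract, 1.0 g of oil); it assumes complete transfer of the analyte into the extract and applies no recovery correction. The function name and the example concentration are hypothetical.

```python
def oil_content_mg_per_kg(c_extract_mg_per_l, extract_volume_ml=0.5, oil_mass_g=1.0):
    """Convert a concentration in the final extract (mg/L) to a content in oil (mg/kg).

    Assumes complete transfer of the analyte into the final extract
    (no recovery correction); this is an illustrative simplification.
    """
    analyte_mass_mg = c_extract_mg_per_l * (extract_volume_ml / 1000.0)  # mg in the extract
    oil_mass_kg = oil_mass_g / 1000.0
    return analyte_mass_mg / oil_mass_kg


# Hypothetical extract concentration of 200 mg/L
example_content = oil_content_mg_per_kg(200.0)
```

With the default volumes, the content in mg/kg is numerically half of the extract concentration in mg/L.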

Standard solutions

Stock standard solutions of apigenin, hydroxytyrosol, luteolin, tyrosol, and oleuropein were prepared with a concentration of 1000 mg/L in methanol.

Quantification was performed by internal standard calibration, where each calibration solution contained the IS (c = 50 mg/L). The calibration data were measured at 6 concentration levels (0.1–100 mg/L), with five measurements per level (n = 5). Oleacein and the isomers of elenolic acid were quantified using the oleuropein calibration curve. Verbascoside isomers and acetoxypinoresinol were quantified using the hydroxytyrosol calibration curve. Tyrosol glucoside and ligstroside aglycone were quantified using the tyrosol calibration curve. The remaining compounds were quantified using the available standards.
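
The principle of internal standard calibration can be sketched as follows. The example fits a straight line to the ratio of analyte to IS peak areas as a function of analyte concentration and then inverts the line for an unknown. All numbers are synthetic, the intermediate calibration levels are assumed (only the range 0.1–100 mg/L and the number of levels are given above), and the sketch assumes that the IS concentration in the sample extract matches that of the calibration solutions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(area_analyte, area_is, slope, intercept):
    """Return the analyte concentration (mg/L) from the analyte/IS area ratio."""
    return (area_analyte / area_is - intercept) / slope


# Assumed calibration levels (mg/L) and synthetic, noise-free area ratios
levels = [0.1, 1.0, 5.0, 10.0, 50.0, 100.0]
ratios = [0.02 * c + 0.001 for c in levels]

slope, intercept = fit_line(levels, ratios)

# Hypothetical peak areas for an unknown sample extract
unknown_conc = quantify(area_analyte=1203.0, area_is=3000.0,
                        slope=slope, intercept=intercept)
```

Because the area ratio rather than the raw analyte area enters the calibration, variations in injection volume or detector response affect the analyte and the IS alike and largely cancel.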

HPLC analysis

HPLC analyses were performed using a Shimadzu (Kyoto, Japan) instrument consisting of two LC-20AD binary solvent pumps, an SPD-M20A photodiode array detector (DAD), and an LCMS-2020 quadrupole mass spectrometer (MS) equipped with an electrospray ionization (ESI) source operating in negative-ion mode. The DAD was set at 280 nm. The ESI source was operated under the following conditions: desolvation line temperature, 280 °C; nebulizing gas (nitrogen) flow rate, 1.5 L/min; drying gas (nitrogen) flow rate, 5 L/min; heat block temperature, 300 °C. The mass scan range was m/z 100–800 and the event time was 0.5 s.

Separations were performed on an Ascentis Express C18 (150 × 4.6 mm, 2.7 μm) analytical column (Merck Life Science, Merck KGaA, Darmstadt, Germany) with a mobile phase composed of 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (B) under the following gradient conditions: 0 min, 10% B; 4 min, 35% B; 12 min, 47% B; 12.5 min, 60% B; 16 min, 75% B; 21 min, 100% B, at a flow rate of 1 mL/min. The injection volume was 5 μL.
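
For orientation, the gradient program above can be expressed as a small lookup that returns the mobile phase composition at any time. This sketch assumes linear segments between the listed time points, which the text does not state explicitly; the names `GRADIENT` and `percent_b` are illustrative.

```python
# Gradient program from the text: (time in min, % B)
GRADIENT = [(0.0, 10.0), (4.0, 35.0), (12.0, 47.0),
            (12.5, 60.0), (16.0, 75.0), (21.0, 100.0)]


def percent_b(t_min, program=GRADIENT):
    """Return % B at time t_min, assuming linear segments between program points."""
    if t_min <= program[0][0]:
        return program[0][1]
    if t_min >= program[-1][0]:
        return program[-1][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


composition_at_2_min = percent_b(2.0)   # halfway through the first segment
composition_at_8_min = percent_b(8.0)   # halfway through the second segment
```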

Statistical analysis

The HPLC method was validated by determining the extraction recoveries, coefficients of linearity, limits of detection (LODs), and limits of quantification (LOQs) for the standards used [43]. The recoveries were in the range 84.3–99.5%. The coefficients of linearity were within the range 0.9994–0.9999. LODs and LOQs were determined as the concentrations yielding a signal-to-noise ratio (S/N) of 3 and 10, respectively. The LOD and LOQ values were within the ranges 5.2–39.9 µg/L and 17.5–133.0 µg/L, respectively.
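
The signal-to-noise definition of LOD and LOQ can be illustrated with a short calculation. The sketch assumes that S/N scales linearly with concentration near the limits, so that a single low-level measurement can be extrapolated; the measured concentration and S/N used here are hypothetical and are not taken from the validation data.

```python
def limit_from_sn(conc, signal_to_noise, target_sn):
    """Concentration expected to give target_sn, assuming S/N is proportional to concentration."""
    return conc * target_sn / signal_to_noise


# Hypothetical measurement: a 100 µg/L standard observed with S/N = 25
lod = limit_from_sn(100.0, 25.0, target_sn=3.0)    # µg/L
loq = limit_from_sn(100.0, 25.0, target_sn=10.0)   # µg/L
```

Under this proportionality assumption, the LOQ is always 10/3 times the LOD for the same compound.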

One-way analysis of variance (ANOVA) was used to determine the statistical significance of the individual variables for the description of the system (QC Expert 2.5, Trilobyte, Czech Republic). Each variable was significant; therefore, all data were used for the multivariate statistical treatment using three methods: PCA, FA, and linear DA (Statistica 12, StatSoft, Inc.; Tulsa, OK, USA). FA and PCA give information on how the objects are related to each other. Objects located close to each other are similar, and objects located far away are outliers. In FA, the emphasis is on the transformation from the underlying factors to the observed variables, whereas in PCA, the emphasis is on the transformation from the observed variables to the principal components. Therefore, PCA and FA were performed to find the relation between individual samples of Italian extra virgin olive oils depending on PP content and profile. Discriminant analysis classifies objects into groups based on known group membership [44]. In this study, the influence of geographical origin on phenolic content and profile was investigated using DA.
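
As an illustration of the screening step, the F statistic of a one-way ANOVA for a single variable can be computed as below. This is a minimal sketch with synthetic values, not the QC Expert calculation; the grouping factor and the data are assumptions made for the example. Converting F into a p-value requires the F distribution, which is left to statistical tables or a statistics library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group size times squared deviation of group mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Synthetic contents of one phenolic variable in three hypothetical groups
synthetic_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(synthetic_groups)
```

A large F value indicates that the variation between group means is large relative to the variation within groups, which is the sense in which a variable is considered significant for describing the system.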

The amount of monitored phenolic compounds in the individual samples was determined in triplicate (the extraction was performed three times). All data were subjected to multivariate statistical treatment, where 80% of the data were used for model building and 20% of the data were used for cross-validation.
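
The 80/20 partition can be sketched as follows. The text does not specify how the data were partitioned, so this example makes an assumption: it splits at the level of sample numbers (1–68), which keeps all replicates of a sample on the same side of the split. The seed and function name are illustrative.

```python
import random


def split_samples(sample_ids, test_fraction=0.2, seed=0):
    """Randomly split sample identifiers into model-building and validation sets."""
    ids = list(sample_ids)
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return sorted(ids[n_test:]), sorted(ids[:n_test])


# 68 samples numbered 1..68, as in Table 1
train_ids, test_ids = split_samples(range(1, 69))
```

With 68 samples this gives 54 samples for model building and 14 for validation. With groups as small as the 10 Sicilian samples, a split stratified by supra-region might be preferable so that each group is represented in both sets.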

Results and discussion

Selected PPs were determined in 68 EVOO samples from 9 Italian regions using an optimized HPLC/DAD/MS method [43]. A typical chromatographic separation, obtained for sample number 30 from Catanzaro in the Calabria region, is shown in Fig. S1. For the purpose of discriminant analysis, the samples were divided into three groups (Northern part of Italy, Southern part of Italy, and Sicily) according to their geographical origin (Table 1, Fig. 1). The Liguria, Garda (Lombardy, Veneto, and Trentino-Alto Adige), Lazio, Tuscany, and Umbria regions belong to the Northern part of Italy. The Campania, Apulia, and Calabria regions belong to the Southern part of Italy. Sicily was treated as a separate group.

Quantity of PPs in the EVOOs

Olive oil is known for the presence of many derivatives, isomers, and other forms of phenolic compounds, especially secoiridoids [19]. In total, 40 different forms of PPs (Table S1), namely hydroxytyrosol (HTY; M = 154 g/mol), tyrosol (TY; M = 138 g/mol), tyrosol glucoside (TY-Glu; M = 300 g/mol), four elenolic acid isomers (EA is.; M = 242 g/mol), three verbascoside isomers (VER is.; M = 624 g/mol), apigenin (AP; M = 270 g/mol), luteolin (LU; M = 286 g/mol), oleacein (HTY-EDA; M = 320 g/mol), eleven oleuropein aglycone isomers (HTY-EA der.; M = 378 g/mol), three oleuropein isomers (HTY-EA is.; M = 540 g/mol), nine ligstroside aglycone isomers (TY-EA is.; M = 362 g/mol), three isomers of ligstroside aglycone derivatives (TY-EA der.; M = 394 g/mol), and acetoxypinoresinol (AOP; M = 416 g/mol), were identified in the EVOO samples according to their UV and mass spectra and information from the literature [17, 37, 38, 41]. Because standards of the individual isomers are not commercially available, quantification was performed by spectrophotometric detection using structurally similar analogues (see Section “Standard solutions”). The mass spectrometer was used to identify the individual phenolic compounds and to confirm their presence in the individual samples. Derivatives and isomers of TY-EA and HTY-EA were the most significant compounds, with the highest contribution to the total phenol content. TY-EA isomers and derivatives were quantified with medians of 106 mg/kg and 89 mg/kg, respectively, and HTY-EA isomers and derivatives with medians of 44 mg/kg and 228 mg/kg, respectively. On the other hand, VER isomers and AP were the minor compounds, with medians of 1.9 mg/kg and 1.7 mg/kg, respectively.

The total phenol content of the studied olive oil samples is listed in Table 1. Samples from Apulia and Tuscany contained the highest amounts of PPs, with an average of about 700 mg/kg; samples 21 (Apulia) and 61 (Tuscany) were the richest EVOOs in terms of phenol content (1220 mg/kg and 1110 mg/kg, respectively). The pedoclimatic aspects, which are related to the geographical origin, together with the technological aspects are the main parameters influencing the total phenol content in EVOO samples. On the other hand, the botanical origin influences mainly the phenolic profile, and its contribution to the total phenol content is lower [45,46,47,48,49]. Because of the high number of variables influencing the content of PPs in oil, samples with average or below-average phenol content also occur in these generally “PPs rich” regions (Apulia, Tuscany). The samples from northern Italian regions such as Garda and Liguria, together with the samples from Sicily, contained low amounts of phenolic substances, with an average of 355 mg/kg. The lowest amounts of PPs were observed in samples 3 and 8 (260 mg/kg and 257 mg/kg, respectively), both obtained from Tonda Iblea olives from the southernmost part of Sicily. However, no relationship was found between the content of PPs and the PDO or PGI denomination marks.

Since olive varieties are in most cases bound to a specific location (i.e., Sicilian olive cultivars do not grow in the other parts of Italy), we cannot unambiguously state whether the total phenol content is more closely related to the geographical origin or to the olive cultivar, a question that is currently much debated. This applies especially to Italy, where more than 500 olive cultivars occur. Not every cultivar is suitable for every environment; therefore, different PP amounts and compositions could be obtained from the same cultivar grown in different locations [47]. All Sicilian EVOOs prepared from typical Sicilian cultivars (Tonda Iblea or Nocellara del Belice) showed the lowest total phenol content in comparison with the other Italian cultivars (average amount around 350 mg/kg), which is in agreement with a previous study [50]. The northernmost analysed samples were EVOOs 65 and 68 from the southern part of the Garda region, which were also poor in phenol content. Both were made from the Leccino cultivar; sample 65 was a mixture of Leccino and Casaliva. Their low content could be caused by the geographical position, which also implies different pedoclimatic conditions [45, 48, 49].

Factor analysis

Factor analysis finds the underlying structure in a data matrix and allows the visualization of the variance between the olive oils. Figure 2 shows the factor loadings after varimax rotation. The factor loadings can be interpreted as correlations between the factors and the variables, where the first factor shows most of the highest loadings. From Fig. 2, it is evident that the derivatives of HTY-EA and TY-EA are useful for the description of factor 1, because they are located close to -1 on the axis of factor 1 and close to 0 on the axis of factor 2. Derivatives and isomers of HTY-EA and TY-EA contribute most to the total phenol content of the EVOOs; therefore, factor 1 is connected with the phenol content. The objects EA, HTY-EDA, and HTY suit the description of factor 2, because they are located close to 1 on the axis of factor 2 and close to 0 on the axis of factor 1. It has been found [51] that EA isomers and HTY-EA derivatives are indicators of the ripeness of the olives. The EA content increases as the olives ripen, while the HTY-EA content decreases. This is caused by the degradation of HTY-EA to EA and HTY during ripening. Therefore, a decrease of HTY-EA was observed in the case of over-ripened olives [48, 49]. Thus, factor 2 is connected to the ripeness of the olives.

Fig. 2 The factor loading plot after normalized varimax rotation demonstrates a similarity of the objects (STATISTICA, StatSoft)

Using FA, the EVOO samples could be divided into two groups (Fig. 3). The samples with high phenol content are located in the left part of the factor score plot (group A), and the samples with low amounts of phenolic compounds are located in the right part of the plot (group B). The distribution of samples along the axis of factor 2 is given by the content of EA and HTY-EA and is connected to the ripeness of the olives. The samples located at the top of the factor score plot contain high amounts of EA and lower amounts of HTY-EA, which indicates that these olive oils may have been prepared from over-ripened olives (Fig. 3).

Fig. 3 The factor scatterplot of the first and the second factor indicates two clusters of the EVOO samples (STATISTICA, StatSoft)

The samples with high phenol content in group A (Fig. 3) were mainly from Apulia (15, 19, 22–27, and 29), but samples from Tuscany (48, 50, 58, and 60) and Lazio (40 and 45) were also present. The larger group B contains the samples with average or below-average amounts of PPs. The lower right corner of group B (Fig. 3) contains mostly samples from the northern Italian regions (Garda and Liguria; samples 63–68), which have lower total PP contents. Almost all samples from Sicily (samples 1–10) are found in the right part of group B with insignificant values of factor 2. This means that these EVOOs were prepared from well-ripened olives but contain low amounts of phenolic compounds. Furthermore, five outliers (samples 1, 21, 49, 57, and 61) were found in the factor score plot (Fig. 3). Samples 21 and 61 showed the highest amounts of PPs of all analysed olive oils and are therefore located on the far left. Samples 49 and 57, located at the top of the factor score plot, contain exceptionally high amounts of EA (and also lower contents of HTY-EA) compared with the other EVOOs. These oils were probably produced from over-ripened olives; however, their total phenol content appears reasonable. The last outlier was sample 1, found in the upper right corner of Fig. 3. This sample contained a very high amount of elenolic acid (similar to sample 49), while its total phenol content was far below the mean value. Consequently, this sample was probably prepared from over-ripened olives; moreover, the olives may have been grown, or the oil processed and stored, under inappropriate conditions.

The results of the factor analysis were confirmed by PCA (data not shown; the principal component score plot is identical to the factor score plot) and correspond well with the raw data shown in Table S1.

Discriminant analysis

The olive oil samples were divided into three groups (A–C) according to their geographical origin using linear discriminant analysis (Fig. 4). The discrimination function is given in the supplementary material in Table S2, together with the significance of the individual variables for the discrimination of the samples into three groups (Table S3). Group A in Fig. 4 includes the EVOOs from the northern part of Italy (samples 32–68). Sample 40 lies somewhat apart from the others, which could be caused by its higher amounts of luteolin and apigenin. The chemical composition of these EVOOs is similar, and the olive cultivars are typical for northern Italian regions (Casaliva, Leccino, Garda Orientale, Moraiolo, Toscano, Carboncella, Lavagnina, Taggiasca, Chianti classico, Olivastra Seggianese, and their mixtures). Most samples have PDO or PGI denomination marks. Group B (Fig. 4) comprises the southern Italian EVOOs (samples 11–31), mainly prepared from the Coratina olive cultivar. These samples came from the coast of the Adriatic Sea (all Apulian samples; 12–29), except for sample 11 from the inland part of Campania and samples 30 and 31 from the coast of the colder Ionian Sea. These two samples (30 and 31) could be considered slight outliers of group B, which may be caused by their geographical position (the southernmost Italy) with a different sea type and special olive cultivars (Carolea, Lametia). The influence of sea type and of the geographical position of the olive-growing area (coast versus inland) on the quality of Spanish olive oils was investigated and reported by Sayago et al. [26]. Apulian sample 29 from the Coratina cultivar was located slightly apart from the rest of the samples of group B. This could be caused by technological aspects of cultivation (olive growth, irrigation, plant health condition, olive harvest, and collection) and processing (mainly crushing, malaxation, pressure of extraction, centrifugation, and storage) [45, 47,48,49, 52].

Group C includes the Sicilian EVOOs, which are made from the typical Sicilian olive cultivars Nocellara del Belice and Tonda Iblea. Group C (Fig. 4) is not clustered as well as the previous two groups because of the smaller number of samples from this part of Italy. However, the main difference between these samples and the samples from the other two groups is in their position along Root 2 (Fig. 4), probably due to the different climatic conditions of the island. Linear DA was also successfully applied in the study of Becerra-Herrera et al. [13], where Spanish EVOO samples were distributed according to their geographical origin and cultivars. Furthermore, the influence of pedoclimatic conditions (geographical area, province, climatic condition, irrigation, and altitude) on the content of fatty acids in EVOO samples from the Italian region of Lazio was investigated by PCA and linear DA [50].
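
The classification principle behind linear DA can be sketched for three groups as follows. This is not the Statistica model of this study: it uses only two hypothetical variables and synthetic values, a pooled within-group covariance matrix, and prior probabilities proportional to group size. Each group receives a linear score, and an object is assigned to the group with the highest score.

```python
import math


def mean_vec(rows):
    """Mean of a list of 2-dimensional observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_cov(groups, means):
    """Pooled within-group 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    n_total = 0
    for label, rows in groups.items():
        m = means[label]
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        n_total += len(rows)
    dof = n_total - len(groups)
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def fit_lda(groups):
    """Return per-group linear score coefficients (weights, constant)."""
    means = {k: mean_vec(v) for k, v in groups.items()}
    s_inv = inv2(pooled_cov(groups, means))
    n_total = sum(len(v) for v in groups.values())
    model = {}
    for k, m in means.items():
        w = [s_inv[0][0] * m[0] + s_inv[0][1] * m[1],
             s_inv[1][0] * m[0] + s_inv[1][1] * m[1]]
        b = -0.5 * (w[0] * m[0] + w[1] * m[1]) + math.log(len(groups[k]) / n_total)
        model[k] = (w, b)
    return model


def predict(model, x):
    """Assign x to the group with the highest linear discriminant score."""
    return max(model, key=lambda k: model[k][0][0] * x[0] + model[k][0][1] * x[1] + model[k][1])


# Synthetic, well-separated data for two hypothetical variables
synthetic = {
    "north": [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3)],
    "south": [(4.0, 1.0), (4.2, 1.2), (3.8, 0.9), (4.1, 0.7)],
    "sicily": [(2.5, 4.0), (2.4, 4.3), (2.7, 3.8), (2.6, 4.1)],
}
model = fit_lda(synthetic)
assigned = predict(model, (2.5, 4.1))
```

The scatterplot of discriminant roots in Fig. 4 is a projection of the same kind of model; the sketch only covers the assignment of objects to groups, not the computation of the roots.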

Fig. 4 Scatterplot of the linear discriminant score gives a visual impression of how well the Fisher linear discriminant functions classify the data. The EVOO samples are divided into three clusters according to their geographical origin: Northern Italy-group A, Southern Italy-group B, and Sicily-group C (STATISTICA, StatSoft)

Conclusion

In the present study, 40 phenolic compounds in 68 Italian EVOO samples were characterised by high-performance liquid chromatography with diode array and mass spectrometric detection. The obtained data set was then subjected to multivariate statistical analysis (factor analysis, principal component analysis, and linear discriminant analysis) to distinguish EVOO samples based on their PP content and profile.

The factor analysis enabled the evaluation of individual EVOO samples on the basis of two main factors that were useful for the description of the studied EVOOs. The first one was related to the total phenol content in EVOOs, given mainly by the quantities of oleuropein and ligstroside derivatives and isomers. The second factor corresponded to the degree of ripeness of the olives, given mainly by the amount of elenolic acid isomers. Based on these facts, the factor analysis divided the samples into two groups with five outliers. Two outliers were samples containing very high amounts of PPs in comparison with the others. The other three outliers were probably prepared from over-ripened olives, because their amount of elenolic acid isomers was significantly higher than in the other samples.

The linear discriminant analysis was used to classify EVOOs according to their geographical origin. Although the phenolic profile is affected by many factors such as irrigation, storage, ripeness, climatic conditions, botanical origin, etc., all EVOOs in our study were clearly divided into three groups (Northern Italy, Southern Italy, and Sicily) without any outliers. This is the first time that Italian EVOOs have been grouped according to their geographical origin on the basis of PP profile and content using multivariate statistical data treatment.

Based on our findings, we can conclude that the PP profile and content are a very useful tool for the evaluation of EVOOs in terms of their quality, geographical origin, and good manufacturing practice.