Introduction

Virgin olive oil (VOO) is the main source of dietary fat in the countries of the Mediterranean basin, where olive oil production is concentrated [1]. Olive oil makes only a minor contribution (2–2.5%) to total global vegetable oil production, but its economic importance is several times greater because it commands a higher price than other oils such as soybean, rapeseed or sunflower oil; this high price reflects its quality [2].

In recent years, extra virgin olive oil has acquired a distinguished position on the shelves of retail and delicatessen stores all over the world. The growing demand is related to some consumers' preference for the ingredients of the Mediterranean diet and to the nutritional benefits expected from valuable phytochemicals such as monounsaturated fatty acids (MUFA), mainly oleic acid, phenolic antioxidants, squalene and carotenoids [3]. In addition, the high content of oleic acid together with low amounts of linoleic and linolenic acids makes an important contribution to the high oxidative stability of this vegetable oil [4]. Consequently, VOO has a well-balanced fatty acid composition that makes it both fairly stable against thermo-oxidation and well suited to human health [2].

Furthermore, olive cultivation is widespread throughout the Mediterranean region and is important to the rural economy, local heritage and the environment [5]. For these reasons, improving the quality and distinctiveness of products and safeguarding the genetic heritage of the cultivated species have become new objectives of research worldwide [6].

The fatty acid content of VOO is affected by several factors, mainly olive cultivar [7], harvest period [8] and year of cultivation [9]. To a lesser extent, effects of altitude [10], agronomic factors [11], storage temperature of olives after harvest [12] and oil extraction system [13] have also been demonstrated. The fatty acid composition differs between stages of maturity and is influenced by environmental conditions that affect lipid metabolism in plants. For example, longer days and lower temperatures, whether or not related to an increase in altitude, increase the unsaturation of the oil's fatty acids, mainly the proportion of linoleic acid. This is generally explained by the fact that low temperatures promote the activity of acyl-desaturases [14].

In Tunisia, the olive oil sector plays an important role in the economy, providing both employment and export revenue. With an annual production of 170,000 tonnes, Tunisia is the fourth largest producer of olive oil [15]. Within the framework of the Tunisian Project, our research in an experimental orchard focused on the effects of crossbreeding under controlled conditions on the quality of the resulting olive oil. Chemlali Sfax, the main olive variety cultivated in Tunisia, is widespread in arid and semi-arid regions owing to intrinsic qualities such as hardiness, productivity and oil content [16]. Besides their organoleptic characteristics, Chemlali Sfax oils have high levels of palmitic and linoleic acids [17].

To improve its oil quality, a genetic cross was made [18] between Chemlali Sfax (female or seed stock) and other varieties (autochthonous and introduced) known for a fatty acid composition superior to that of Chemlali Sfax [16]. Studies of genetic variability, heritability and correlations between traits can show the extent to which certain traits are genetically determined and which of them matter most in the selection and creation of new cultivars [19].

Major VOO components such as triacylglycerols, fatty acids and sterols [20], or minor components such as tocopherols, volatiles and phenolics [17], have been combined with chemometrics to classify and characterize VOOs. Fatty acid profiles in particular are extremely useful for characterizing and discriminating an olive cultivar or its geographical origin [16, 17, 20], as well as the harvest period [21], even for oils produced in close geographical proximity or harvested at different maturity stages [22].

Multivariate statistical methods such as principal component analysis (PCA), hierarchical cluster analysis (HCA), discriminant analysis (DA) and classification analysis (CA) are currently used extensively to classify and characterize VOOs by detecting compositional differences for authenticity evaluation. This classification is of great importance to producers, importers and consumers [23–25]. PCA, one of the simplest and most widely used of these methods, reduces the number of variables to a few components that account for most of the variability between cultivars; this improves separation and makes it easier to identify the factors that contribute most to the distinction between groups [21].

The purpose of this investigation was to evaluate the variability of, and correlations among, fatty acids in olive oils obtained from new olive cultivars. This information can be used to select the best hybrids for oil quality on the basis of their fatty acid profiles. Moreover, because the trees were grown under controlled pedoclimatic conditions, the experiments help clarify the role of the genetic factor in the synthesis and accumulation of fatty acids. Finally, PCA of the fatty acid profiles was used to classify and discriminate virgin olive oils from controlled crosses, with the aim of obtaining a reliable indicator for the discrimination and classification of oils.

Materials and Methods

Plant Material and Growing Areas Selected

Olive samples were collected from descendants obtained by crossbreeding the main Tunisian cultivar Chemlali Sfax (female or seed stock) with other varieties (autochthonous and introduced). These descendants were cultivated on the farm at Taous (23 km from Sfax) using standard growing techniques. Trials were conducted over three consecutive seasons (2003/2004, 2004/2005 and 2005/2006) on drip-irrigated 8-year-old trees spaced at 2 m × 4 m, giving a plant density of approx. 1,250 trees ha⁻¹. The studies were carried out on fruits (Olea europaea L.) from about 540 new cultivars (out of 1,850 obtained from intervarietal crossbreeding) derived from traditional genetic improvement techniques.
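The quoted planting density follows directly from the tree spacing. A quick check, assuming each tree occupies one 2 m × 4 m rectangle of the grid and using 1 ha = 10,000 m²:

```latex
\text{density} = \frac{10\,000\ \mathrm{m^2\,ha^{-1}}}{2\ \mathrm{m} \times 4\ \mathrm{m}}
             = \frac{10\,000}{8}\ \mathrm{trees\,ha^{-1}}
             = 1\,250\ \mathrm{trees\,ha^{-1}}
```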

Parent trees were chosen for their high productivity and oil content, different geographical origins, and differences in earliness of bearing and oleic acid content. The experimental design in all three years was a randomized block with three replicates of one tree each. Olives were picked randomly at full maturity by hand, using rakes. Only healthy fruits, free of any infection or physical damage, were processed.

Extraction and Determination of Fatty Acid Composition

Lipids were extracted from the olives by the method of Allen and Good [26], and the fatty acids were then converted to fatty acid methyl esters (FAME) following the analytical methods described in Commission Regulation (EC) No 1989/2003 [27] and by the International Olive Council [28]. The FAME were prepared by vigorously shaking a solution of oil in hexane (0.1 g in 2 ml) with 0.2 ml of 2 N methanolic potassium hydroxide. The methyl esters were then analyzed by capillary gas chromatography on a Hewlett-Packard HP 5890 chromatograph (Palo Alto, USA) equipped with a flame ionization detector linked to an HP ChemStation integrator. Separation was carried out on an HP-Innowax fused silica capillary column (30 m × 0.25 mm, 0.25 μm film) with nitrogen as carrier gas at a flow rate of 1 ml/min. The column temperature was programmed from 180 to 250 °C. The injector and detector temperatures were 250 and 280 °C, respectively. Results are expressed as relative percentages of total FAME, as previously described by Dabbou et al. [16].
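Results are reported as percentages of FAME. A common way to obtain such figures from a flame ionization chromatogram is area normalization, in which each FAME peak area is divided by the sum of all integrated FAME peak areas. The sketch below assumes that approach with no response-factor correction; the function name `area_percent` and the peak areas are hypothetical and are not values from this study.

```python
def area_percent(peak_areas: dict[str, float]) -> dict[str, float]:
    """Convert integrated FAME peak areas into relative percentages (area normalization).

    Assumes every FAME peak has the same detector response factor,
    i.e. no correction factors are applied.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical integrated areas (arbitrary units), for illustration only
areas = {"C16:0": 150.0, "C18:1": 600.0, "C18:2": 200.0, "others": 50.0}
composition = area_percent(areas)
print(composition)  # {'C16:0': 15.0, 'C18:1': 60.0, 'C18:2': 20.0, 'others': 5.0}
```

If response factors were determined for individual esters, each area would be multiplied by its factor before normalization.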

Statistical Analysis

Results are reported as mean values and standard deviations. Correlations between fatty acids were assessed with the Pearson correlation coefficient, which measures the strength of the linear relationship between two quantitative variables; correlations were considered significant at p < 0.05.
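A minimal sketch of the Pearson coefficient and its significance test, written in plain Python. The function names are hypothetical, and the p-value uses a normal approximation to the t distribution with n − 2 degrees of freedom, which is an assumption that is reasonable for large samples such as n = 540 but is not the exact test a statistics package would run.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0.

    Computes t = r * sqrt((n - 2) / (1 - r^2)) and approximates the
    t distribution (n - 2 df) by a standard normal (large-n assumption).
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def critical_r(n: int, z: float = 1.96) -> float:
    """Smallest |r| reaching two-sided p < 0.05 under the same normal approximation."""
    return z / math.sqrt(n - 2 + z * z)


r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
print(r, approx_two_sided_p(-0.668, 540), critical_r(540))
```

With n = 540, `critical_r(540)` is roughly 0.084, so almost any non-trivial correlation passes p < 0.05; the magnitude of r (for example −0.668 for oleic versus palmitic acid) is the more informative figure.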

The classification and discrimination of olive oil hybrids using fatty acid profiles were achieved by PCA. PCA is a standard chemometric method for data reduction: it captures the main features of a multivariate data set by constructing, from the original variables, a new set of orthogonal axes called principal components (PCs), and extracts information from them. Each PC is a linear combination of the original variables, and conversely the original variables can be expressed as linear combinations of the PCs [29]. The coefficients by which the original variables are multiplied to obtain a PC are called loadings; the numerical value of a variable's loading on a given PC shows how much that variable has in common with the component [30]. Because the PCs are uncorrelated with one another, a few of them can express much of the total variability in the data set [31]. By construction, the first PC (PC1) lies along the direction of maximal variance in the data set. The second PC, orthogonal to the first, captures the next largest possible variance, the third is orthogonal to the first two, and so on. Therefore, by retaining only the most important PCs, the original matrix can be greatly simplified without substantial loss of information. Each PC is defined by an eigenvector of the variance–covariance matrix (here, of the standardized variables, i.e., the correlation matrix), and the variance along that eigenvector is its eigenvalue [32]. Eigenvalues were used to determine the variance explained by the major PCs. The loadings of the original variables were then projected onto the factorial plane formed by the first and second components, which together explained 57.24% of the variability. To visualize the discrimination, PCA plots mapped the variables (fatty acids) through their loadings in the space defined by PCs with eigenvalues >1.0, following Kaiser's rule [31, 33].

Analysis of variance was performed for all measured traits to test the significance of variation among accessions. Pearson correlation coefficients and PCA were calculated using XL-Stat-Pro 7.5 (2007) for Windows (Addinsoft, NY, USA).
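To make the procedure concrete, here is a minimal NumPy sketch of PCA on the correlation matrix, with Kaiser's rule applied. It is not the XLSTAT implementation used in the study. Software differs in whether "loadings" means the raw eigenvector coefficients or those coefficients scaled by the square root of the eigenvalue; this sketch assumes the scaled form, whose entries are the correlations between each variable and each PC. The function name `pca_correlation` and the synthetic data are illustrative assumptions.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = samples, columns = variables).

    Returns eigenvalues (descending), loadings (variable-PC correlations),
    sample scores, % variance explained, and a Kaiser mask (eigenvalue > 1).
    """
    X = np.asarray(X, dtype=float)
    # z-score each variable so that PCA works on the correlation matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # reorder so PC1 has the largest eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale eigenvectors by sqrt(eigenvalue): entry [j, k] = corr(variable j, PC k)
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    scores = Z @ eigvecs                  # coordinates of each sample on the PCs
    explained = 100.0 * eigvals / eigvals.sum()
    kaiser = eigvals > 1.0                # Kaiser's rule: keep PCs with eigenvalue > 1
    return eigvals, loadings, scores, explained, kaiser


# Synthetic data (not from this study): 200 samples, 4 variables;
# variables 0 and 1 share a common factor with opposite signs.
rng = np.random.default_rng(0)
f = rng.normal(size=200)
X = np.column_stack([
    f + 0.3 * rng.normal(size=200),
    -f + 0.3 * rng.normal(size=200),
    rng.normal(size=200),
    rng.normal(size=200),
])
eigvals, loadings, scores, explained, kaiser = pca_correlation(X)
print(np.round(explained, 1), kaiser)
```

Because every variable is standardized, the eigenvalues sum to the number of variables, so each PC's share of the variance is its eigenvalue divided by that number.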

Results and Discussion

Fatty Acid Profiles and Correlations

Gas chromatographic analysis of the oils from 540 olive hybrids, collected in the 2003/2004, 2004/2005 and 2005/2006 crop years, showed a fatty acid composition similar to that of extra virgin olive oil samples (Table 1). Twelve fatty acids were considered in this study: palmitic (C16:0), palmitoleic (C16:1, ω7), margaric (C17:0), margaroleic (C17:1, ω8), stearic (C18:0), oleic (C18:1, ω9), linoleic (C18:2, ω6), linolenic (C18:3, ω3), arachidic (C20:0), behenic (C22:0) and lignoceric (C24:0) acids.

Table 1 Fatty acid composition (%) from lipid extracts of olive oils obtained through controlled crossbreeding (n = 540)

Among the main fatty acids, palmitic acid ranged from 5.33% to 24.93%, stearic acid from 1.20% to 8.92%, oleic acid from 29.83% to 78.75%, linoleic acid from 3.94% to 37.09% and linolenic acid from 0.00% to 3.31% (Table 1). These results agree with those of other studies, and the sum of oleic and linoleic acids accounts for nearly 80% of the total fatty acids detected in the olive oil samples [17]. Table 1 also shows a wide range of variation for the fatty acid sums, with SFA, MUFA and PUFA contents of 10.38–31.88%, 33.07–79.21% and 4.39–37.68%, respectively. The oleic to linoleic acid (O/L) ratio is a quality index used to assess the stability and rancidity of olive oil: a high O/L ratio is associated with high stability and low rancidity [34]. The genetic factor (cultivar) was the main contributor to the total variance for all the fatty acids and ratios evaluated, which confirms previous work on olive breeding [34, 35]. Furthermore, with regard to the legal limits [15], 95 of the 540 hybrids analyzed were classified as extra virgin olive oil (Table 1).
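The fatty acid sums and ratios discussed here can be derived from each oil's profile. The sketch below assumes the usual grouping of the acids listed in this section into SFA, MUFA and PUFA, defines UFA as MUFA + PUFA, and takes P/L to be the palmitic/linoleic ratio, which is an assumption because the text does not define it. The function name and the example profile are hypothetical; the profile values were simply chosen inside the ranges reported above and are not a sample from the study.

```python
# Assumed grouping of the fatty acids listed in the text (% of total FAME)
SFA = ("C16:0", "C17:0", "C18:0", "C20:0", "C22:0", "C24:0")
MUFA = ("C16:1", "C17:1", "C18:1")
PUFA = ("C18:2", "C18:3")


def fatty_acid_indices(profile: dict[str, float]) -> dict[str, float]:
    """Compute fatty acid class sums and quality ratios from one oil's profile."""
    sfa = sum(profile.get(k, 0.0) for k in SFA)
    mufa = sum(profile.get(k, 0.0) for k in MUFA)
    pufa = sum(profile.get(k, 0.0) for k in PUFA)
    return {
        "SFA": sfa,
        "MUFA": mufa,
        "PUFA": pufa,
        "O/L": profile["C18:1"] / profile["C18:2"],  # oleic / linoleic
        "P/L": profile["C16:0"] / profile["C18:2"],  # assumed: palmitic / linoleic
        "MUFA/PUFA": mufa / pufa,
        "UFA/SFA": (mufa + pufa) / sfa,
    }


# Hypothetical profile, for illustration only
profile = {
    "C16:0": 15.0, "C16:1": 2.0, "C17:0": 0.1, "C17:1": 0.2, "C18:0": 2.5,
    "C18:1": 60.0, "C18:2": 18.0, "C18:3": 0.8, "C20:0": 0.4, "C22:0": 0.1,
    "C24:0": 0.1,
}
idx = fatty_acid_indices(profile)
print({k: round(v, 2) for k, v in idx.items()})
```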

Pearson correlation coefficients between fatty acid variables are given in Table 2. Oleic and palmitic acids were strongly negatively correlated (r = −0.668, p < 0.05): an increase in one is accompanied by a decrease in the other. The increase in oleic acid content results from the active biosynthesis of triacylglycerols (TAG) during fruit ripening, which involves a fall in the relative percentage of palmitic acid in the oil [36]. This negative relationship can be explained by the fatty acid biosynthetic pathway in the developing olive: palmitoyl-ACP is elongated to stearoyl-ACP, which is then desaturated to form oleic acid. Oleoyl-phosphatidylcholine desaturase (a Δ12-fatty acid desaturase) then converts oleic acid into linoleic acid [2]. By contrast, a positive correlation between oleic and palmitic acids was observed in previous studies of sesame oil lipids [37]. In addition, palmitic acid was strongly positively correlated with palmitoleic acid and SFA (r = 0.712 and r = 0.955, respectively; p < 0.05), and strongly negatively correlated with MUFA (r = −0.612, p < 0.05) and the UFA/SFA ratio (r = −0.937, p < 0.05). The positive correlation between palmitic and palmitoleic acids arises because the latter is produced by desaturation of the former: Δ9-stearoyl-ACP desaturase, which acts mainly on stearoyl-ACP, can also use palmitoyl-ACP as a substrate to produce palmitoleic acid. Therefore, increased stearoyl-ACP desaturase activity may result in the accumulation of palmitoleic acid, which has important pharmaceutical applications [2].

Table 2 Pearson correlation coefficients between percentage levels of fatty acid composition (p < 0.05) from lipid extracts of olive oils obtained through controlled crossbreeding (n = 540)

Positive correlations were also observed between linolenic and behenic acids (r = 0.554), oleic acid and the O/L ratio (r = 0.802), the O/L and P/L ratios (r = 0.778), palmitic acid and SFA (r = 0.625), oleic acid and MUFA (r = 0.992), MUFA and the O/L ratio (r = 0.825), and linoleic acid and PUFA (r = 0.998). The MUFA/PUFA ratio was positively correlated with oleic acid, the O/L and P/L ratios and PUFA (r = 0.800, 0.998, 0.794 and 0.827, respectively), and the UFA/SFA ratio with oleic acid and MUFA (r = 0.660 and 0.615, respectively). All these correlations were significant at p < 0.05.

Moreover, a strong negative correlation was observed between oleic and linoleic acids. This can be explained by the activity of oleate desaturase, which converts oleic acid into linoleic acid. Meanwhile, the absolute palmitic acid content may remain constant as triglyceride biosynthesis continues with the formation of oleic acid; the net result is that palmitic acid remains constant while linoleic acid increases [2].

Principal Component Analysis (PCA)

The data matrix of the variables analyzed (i.e., the fatty acid composition of the olive oil samples) was subjected to PCA using all the samples in the study.

The eigenvalues and the cumulative variance are given in Table 3; the eigenvalues indicate the relative importance of each dimension. To select the number of PCs, the Kaiser criterion [33] was followed and only PCs with eigenvalues higher than 1.00 were retained. Most of the variation in the data set is explained by the first few PCs. Table 4 shows the most significant PCs generated from the olive oil fatty acid data and their loadings. Only five eigenvalues were higher than one, so the first five PCs, together accounting for 79.95% of the total variance, were retained. Thus, the full set of variables in this study could be replaced by only five linear combinations without substantial loss of information. The first two components (PC1 and PC2) had the highest eigenvalues, 6.63 (36.84% of the total variance) and 3.67 (20.40%). The third, fourth and fifth PCs (PC3, PC4 and PC5) had eigenvalues of 1.73, 1.26 and 1.10, and accounted for 9.56%, 7.06% and 6.09% of the variance, respectively. The subsequent PCs (eigenvalues < 1) explained progressively less variance. The loadings of each variable on these five PCs identify the variables that contribute most to them. Loadings >0.60 on PC1 and PC4 and >0.50 on PC2, PC3 and PC5 are marked in boldface in Table 4; they represent significant contributions of individual fatty acid variables to the variability explained by each PC.
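The cumulative figures follow from the individual percentages reported for each PC. A short check in Python, using only the eigenvalues and percentages given in the text (the variable names are illustrative):

```python
from itertools import accumulate

# Eigenvalues and % of variance for PC1-PC5, as reported in the text (Table 3)
eigenvalues = [6.63, 3.67, 1.73, 1.26, 1.10]
pct_variance = [36.84, 20.40, 9.56, 7.06, 6.09]

# Kaiser's rule: retain PCs whose eigenvalue exceeds 1
retained = [k for k, lam in enumerate(eigenvalues, start=1) if lam > 1.0]

# Running total of explained variance, rounded to two decimals
cumulative = [round(c, 2) for c in accumulate(pct_variance)]

# Share of variance carried by PC3-PC5 together
pc3_to_pc5 = round(sum(pct_variance[2:]), 2)

print(retained)     # [1, 2, 3, 4, 5]
print(cumulative)   # [36.84, 57.24, 66.8, 73.86, 79.95]
print(pc3_to_pc5)   # 22.71
```

Only the first five eigenvalues are listed because, as stated above, all later PCs had eigenvalues below 1.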

Table 3 Variance explained (individual and cumulative percentages) and eigenvalues of the first five principal components
Table 4 Eigenvalue analysis of the correlation matrix loadings of the significant principal components (PCs)

The projection of the variable loadings onto the plane defined by the first two PCs (Fig. 1) allows visualization of the variables and their correlations, which correspond to the cosine of the angle between their respective vectors (an angle of 90° indicates no correlation) [38]. The projections (loadings) of the variables on the PC1 and PC2 axes represent their contributions to these components. PC1 describes 36.84% of the variance in the data set. It has high positive loadings for oleic acid (0.98), O/L ratio (0.89), MUFA (0.98), MUFA/PUFA (0.89) and UFA/SFA (0.60), and high negative loadings for linoleic acid and PUFA (−0.87), as seen from their positions relative to one another and to the x-axis. The P/L ratio also had a positive loading on PC1. PC2 showed high positive loadings for palmitic acid (0.76), palmitoleic acid (0.59), P/L (0.82) and SFA (0.77). The plot of the sample scores on the plane of these two PCs did not clearly separate the samples by cultivar. PC3 was mainly described by linolenic (0.77), behenic (0.77) and lignoceric acid content (0.50), PC4 by stearic acid (0.75), and PC5 by margaric (C17:0; 0.51) and margaroleic (C17:1; 0.67) acids. Together, PC3, PC4 and PC5 accounted for a further 22.71% of the total variance. These results are consistent with a study on the genetic improvement of olive oil by León et al. [34], which found that the percentages of C18:1, C18:2 and SFA were the main contributors to the total variation. Further, Diraman et al. [9], studying the fatty acid profiles of 64 Turkish VOOs, found that PC1 and PC2 explained 64.57% and 29.06% of the variance, respectively, whereas Bianchi et al. [32], applying PCA to fatty acid data, found that the first two PCs explained 63.7% of the variance.

Fig. 1

Results of the PCA. Only five eigenvalues higher than 1 were found. The first accounted for 36.84% and the second for 20.40% of the total variance (cumulative variance of 57.24%). The graph shows the projection (loadings) of the variables (fatty acids) on the plane of the first two principal components.

Specific patterns of correlation between the variables can be visualized by comparing loading plots between PCs (Fig. 1). A loading projection shows the position of the variables relative to one another in two-dimensional space, together with their correlations. Variables that lie close to one another and far from the plot origin are positively correlated (e.g., C18:1 and MUFA), while variables on opposite sides of the origin are negatively correlated (e.g., C16:0 and UFA/SFA). The loading plots generated from the data of Table 4 thus express the relationship between two variables through the angle between them at the origin: the correlation coefficient between two variables is approximated by the cosine of the angle between their respective vectors on the plot [39]. These results agree well with the eigenvalues reported in Table 3.
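The cosine rule can be checked numerically. With all PCs retained, the inner products of the scaled loading vectors reproduce the correlation matrix exactly; in a two-PC plane the cosine of the angle only approximates r, and the approximation is good for variables lying far from the origin (i.e., well represented by PC1 and PC2). The NumPy sketch below uses synthetic data and hypothetical helper names, and does not reproduce Fig. 1.

```python
import numpy as np


def loadings_from_corr(R):
    """Variable-PC correlations (eigenvectors scaled by sqrt(eigenvalue)) of R."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # PC1 first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))


def plane_cosine(loadings, i, j, pcs=(0, 1)):
    """Cosine of the angle between variables i and j in the plane of the given PCs."""
    a = loadings[i, list(pcs)]
    b = loadings[j, list(pcs)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic data (not from the study): x0 and x1 rise together,
# x2 moves against them, x3 is unrelated noise.
rng = np.random.default_rng(1)
f = rng.normal(size=500)
X = np.column_stack([
    f + 0.2 * rng.normal(size=500),
    f + 0.2 * rng.normal(size=500),
    -f + 0.2 * rng.normal(size=500),
    rng.normal(size=500),
])
R = np.corrcoef(X, rowvar=False)
L = loadings_from_corr(R)

print(round(R[0, 1], 3), round(plane_cosine(L, 0, 1), 3))  # close pair: r vs cosine
print(round(R[0, 2], 3), round(plane_cosine(L, 0, 2), 3))  # opposite pair
```

For a variable plotted near the origin, the in-plane angle is unreliable because most of its variance lies on PCs that are not shown.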

Conclusion

This research focused on Tunisian olive oil hybrids obtained through controlled crossing and grown in an experimental field in the Sfax region, and showed how the individual fatty acids associated with the generated PCs contribute to the total variability of the olive oil hybrids. Pearson correlation coefficients between fatty acid variables revealed inverse associations between C18:1 and C18:2 and between C18:1 and C16:0, while C16:0 and C16:1 were positively correlated. Eigenvalue analysis of the correlation matrix loadings of the first two PCs showed that PC1 was mainly driven by oleic and linoleic acids, the O/L, P/L, MUFA/PUFA and UFA/SFA ratios, MUFA and PUFA, while PC2 was driven by palmitic and palmitoleic acids, the P/L ratio and SFA. Together, PC1 and PC2 explained more than 57% of the total variability in the data set. Furthermore, this study indicates that combining experimental GC fatty acid data with a chemometric approach (here, PCA) can be successfully used for the final selection of the best genotypes to be registered as new Tunisian cultivars.