Introduction

The olive tree (Olea europaea L.) is the most important oil-producing crop in the Mediterranean basin. Olive is a diploid species (2n = 46) [1]; most cultivars are self-incompatible, and the flowers are wind pollinated [2, 3]. The cultivated olive is an evergreen, outcrossing, vegetatively propagated tree with a very wide genetic patrimony that results from both plant longevity and the scarcity of genotype turnover through centuries of cultivation. Because of the beneficial properties of this crop for human health [4], the consumption of olive oil is increasing worldwide.

Some olive oils are recognized as being of higher quality because they derive from well-defined geographical areas; these oils command better prices and are generally legally protected.

Olive oil is usually traded as a blend from different cultivars and from different provenances. Some olive oils are blends of high- and low-grade oils. Yet, they are traded as being of high quality. That is why a well-documented traceability system has become a requirement for quality control in the olive oil chain.

According to Regulation (EC) No 178/2002, traceability is the ability to identify and trace a product or a batch of products at all stages of production and marketing. Traceability is important for commercial reasons and plays a considerable role in the assurance of public health.

For these reasons, several analytical techniques are used to detect adulteration of virgin olive oil and to establish its authenticity. Most chemical analyses are of limited significance because oil components vary widely with environmental conditions. Neither morphological characteristics nor analyses of the chemical composition of fatty acids and secondary metabolites can provide reliable results for oil traceability [5].

Nowadays, molecular markers allow the detection of DNA polymorphisms and make it possible to distinguish cultivars effectively, without any environmental influence. Genetic markers based on DNA polymorphisms have been widely developed and used for germplasm characterization in the last decade, including restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), and microsatellites or simple sequence repeats (SSRs).

As a result of their high, locus-specific polymorphism, reproducibility, and transferability, microsatellites have become the markers of choice for genetic studies in olive. These markers have been used for applications such as cultivar discrimination [6–9], paternity analysis, for example to identify pollen donors in an olive (O. europaea L.) grove [10], and identification of the varietal composition of olive oil [5, 11]. Microsatellite analysis has also been used in the agrofood industry for the identification of virgin olive oil [12].

In Tunisia, olive is cultivated throughout the country, and Tunisia is a major producer and exporter of olive oil. Therefore, olives have a great commercial, economic and social importance.

Our objective is to identify and characterize reliably the commercial cultivars grown in Tunisia and, more generally, across the Mediterranean basin.

The aim of this research was to evaluate the possibility of identifying virgin olive oils from different cultivars by the analysis of SSR markers. For this purpose, twenty-two virgin olive oils were prepared in the laboratory from cultivars grown in different geographical regions and were then analyzed using eight microsatellites. The practical novelty of this work was to establish a classification key with the minimum number of markers that discriminate all the cultivars.

Materials and methods

Plant material

A total of twenty-two olive tree cultivars were used in this study (Table 1). These cultivars were selected from different geographical regions of Tunisia, from north to south.

Table 1 List of olive accessions studied

For each cultivar, 2 trees were used, and from each tree, DNA was extracted from young leaves and from olive oil.

Production of oil

Olive oil was produced by grinding 2.5 kg of stoned olives and extracting the oil by mechanical means. The procedure for monovarietal oil production followed the standard methods used in oil factories: milling, malaxation for 30 min at 25 °C, centrifugation at 2,000×g for 3 min, and recovery of the oil by natural decantation. Samples were stored at 4 °C until DNA extraction.

DNA extraction

DNA was extracted separately from leaves using the CTAB protocol followed by two purification steps. Young leaves were frozen and powdered under liquid nitrogen using a mortar and pestle, and total DNA was extracted from the leaves of the twenty-two olive cultivars using the CTAB method described by Rekik et al. [8]. An additional purification step was introduced: it consisted of washing and eluting once with the QIAamp DNA stool kit (Qiagen) to eliminate contaminant molecules and to generate high-quality DNA for specific, reproducible, and consistent amplifications.

DNA was extracted from oil using the QIAamp DNA stool kit (Qiagen). For each olive oil sample, DNA was extracted from 200 μL of oil following the protocol for pathogen detection [13].

DNA was quantified by Hoechst H33258 dye incorporation detected with a spectrofluorometer (Tecan GENIOS Plus), with a dilution series of Lambda DNA (D150A, Promega) as the calibration standard. Genomic DNA was stored undiluted in TE buffer (10 mM Tris–HCl pH 8, 1 mM EDTA pH 8) at −20 °C.
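As an illustration of how a dilution series can serve as a calibration standard, the following minimal sketch fits a straight line to the standard readings and converts a sample reading into a concentration. The concentrations, fluorescence values, and function names are hypothetical and are not taken from this study; a linear dye response over the measured range is also an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(reading, slope, intercept):
    """Invert the calibration line to get a concentration from a reading."""
    return (reading - intercept) / slope


# Hypothetical standards (ng/uL) and their fluorescence readings.
standards_ng_per_ul = [0, 25, 50, 100]
standard_readings = [100, 600, 1100, 2100]

slope, intercept = fit_line(standards_ng_per_ul, standard_readings)
sample_conc = concentration(900, slope, intercept)  # hypothetical sample reading
print(f"sample concentration: {sample_conc:.1f} ng/uL")
```

Readings outside the range covered by the standards should not be extrapolated from such a line.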

Microsatellite markers

Eight microsatellite (SSR) markers were used in this study (Table 2). Three markers (DCA1, DCA3, DCA4) from the primer set designed by Sefc et al. [14], three markers (GAPU59, GAPU71A, GAPU71B) from Carriero et al. [15], and two markers (UDO12, UDO09) from Cipriani et al. [16] were selected for their high polymorphism in many olive cultivars [9] and in Tunisian cultivars [5, 8].

Table 2 Summary of genetic diversity observed by typing 22 samples using 8 simple sequence repeat markers

PCRs and capillary sequencer

PCRs were performed in a 15 μL volume containing 10 ng of olive oil genomic DNA or 20 ng of young-leaf genomic DNA, 2 mM MgCl2, 0.05 mM of each dNTP, 0.1 μM of forward primer (labeled with FAM fluorescent dye), 0.4 μM of reverse primer, 0.5 unit of GoTaq (GoTaq Flexi DNA polymerase, Promega), and 1× GoTaq buffer. Amplifications were performed on a Veriti thermal cycler (Applied Biosystems, 96 well). For the DCA1, DCA3, GAPU59, GAPU71B, and UDO12 primers, the program was 95 °C for 5 min (1 cycle); 35 cycles of 95 °C for 30 s, 50 °C for 45 s, and 72 °C for 45 s; and a final step at 72 °C for 10 min. For the GAPU71A, UDO09, and DCA4 primers, the program was the same except that annealing was at 57 °C.
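The two cycling programs differ only in annealing temperature, which can be made explicit by writing them as data. The sketch below encodes the parameters stated above; the data layout and function names are illustrative assumptions, and the computed duration counts programmed hold times only (ramp times are ignored).

```python
# Annealing temperature (deg C) per marker, as stated in the text.
ANNEALING_C = {
    "DCA1": 50, "DCA3": 50, "GAPU59": 50, "GAPU71B": 50, "UDO12": 50,
    "GAPU71A": 57, "UDO09": 57, "DCA4": 57,
}


def program(marker):
    """Return the cycling program as stages of (temperature C, seconds) steps."""
    anneal = ANNEALING_C[marker]
    return [
        {"cycles": 1, "steps": [(95, 300)]},                         # initial denaturation
        {"cycles": 35, "steps": [(95, 30), (anneal, 45), (72, 45)]},
        {"cycles": 1, "steps": [(72, 600)]},                         # final extension
    ]


def hold_time_minutes(stages):
    """Sum of programmed hold times, excluding temperature ramps."""
    seconds = sum(s["cycles"] * sum(t for _, t in s["steps"]) for s in stages)
    return seconds / 60


print(hold_time_minutes(program("GAPU59")))  # 85.0
```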

Five microliters of PCR product were mixed with 0.3 μL of a 420 bp size marker and 14.7 μL of deionized H2O, centrifuged at 2,000 rpm for 1 min, denatured at 94 °C for 3 min, cooled on ice, and analyzed on a 3130xl Genetic Analyzer capillary sequencer (Applied Biosystems).

Data analysis

The alleles detected for each microsatellite were recorded into a data matrix for the presence (1) and absence (0) of bands (each allele representing a band). Allele frequencies and heterozygosities (both observed and expected under Hardy–Weinberg equilibrium) were calculated using the GDA program [17]. The power of discrimination (PD) was calculated for each SSR locus according to Brenner and Morris [18]:

$$ {\text{PD}} = 1 - \sum\limits_{i = 1}^{g} {p_{i}^{2} } $$

where p_i is the frequency of the ith genotype at the locus and the sum is taken over all g genotypes.
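The PD formula can be sketched in a few lines. The genotype calls below are hypothetical allele sizes chosen only to exercise the formula; the function name is an assumption of this sketch.

```python
from collections import Counter


def power_of_discrimination(genotypes):
    """PD = 1 - sum of squared genotype frequencies at one locus."""
    # Sort each allele pair so that (206, 214) and (214, 206) count as one genotype.
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    n = len(genotypes)
    return 1 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical genotype calls (allele sizes in bp) for four cultivars.
calls = [(206, 214), (214, 206), (206, 206), (208, 214)]
pd_value = power_of_discrimination(calls)
print(pd_value)  # 0.625
```

With three distinct genotypes at frequencies 0.5, 0.25, and 0.25, the sum of squares is 0.375, giving PD = 0.625.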

The combined power of discrimination over all loci was then calculated as:

$$ 1 - \prod\limits_{l = 1}^{L} {(1 - {\text{PD}}_{l} )} $$

where l indexes the loci and the product is taken over all L loci. The probability of null alleles was estimated according to the formula of Brookfield [19]:

$$ {\text{r}} = \left( {{\text{He}} - {\text{Ho}}} \right)/\left( { 1 + {\text{He}}} \right). $$
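The combined power of discrimination and the null-allele estimate are both one-line computations, sketched below. The two PD inputs are the minimum and maximum values reported in this study; the He and Ho inputs are hypothetical, and the function names are assumptions of this sketch.

```python
from math import prod  # available from Python 3.8


def combined_pd(pd_values):
    """Combined PD over loci: 1 - product of (1 - PD_l)."""
    return 1 - prod(1 - pd for pd in pd_values)


def brookfield_r(he, ho):
    """Null-allele frequency estimate r = (He - Ho) / (1 + He)."""
    return (he - ho) / (1 + he)


two_locus_pd = combined_pd([0.582, 0.791])   # about 0.9126
r_deficit = brookfield_r(he=0.6, ho=0.5)     # 0.0625, heterozygote deficit
r_excess = brookfield_r(he=0.5, ho=1.0)      # negative, heterozygote excess
print(two_locus_pd, r_deficit, r_excess)
```

A negative estimate arises whenever Ho exceeds He, so it signals an excess of heterozygotes rather than a meaningful null-allele frequency.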

The data matrix was converted into a matrix of similarity (S) values using the Jaccard coefficient [20]. For a pair of cultivars, i and j, this coefficient is calculated as:

$$ {\text{S}}_{ij} = \frac{{n_{ij} }}{{n_{ij} + n_{i} + n_{j} }} $$

where n_i is the number of bands present in cultivar i and absent in cultivar j, n_j is the number of bands present in j and absent in i, and n_ij is the number of bands shared by the two cultivars i and j.
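As a minimal sketch of this coefficient, each cultivar can be represented by the set of bands it carries, where a band is a (marker, allele size) pair. The two profiles below are hypothetical, and the helper names are assumptions of this sketch.

```python
def band_set(profile):
    """Flatten {marker: (allele, allele)} into a set of (marker, allele) bands."""
    return {(marker, allele) for marker, alleles in profile.items() for allele in alleles}


def jaccard(bands_i, bands_j):
    """S_ij = n_ij / (n_ij + n_i + n_j); undefined if both sets are empty."""
    n_ij = len(bands_i & bands_j)   # bands shared by i and j
    n_i = len(bands_i - bands_j)    # bands present in i only
    n_j = len(bands_j - bands_i)    # bands present in j only
    return n_ij / (n_ij + n_i + n_j)


# Hypothetical two-marker profiles (allele sizes in bp).
cv_a = {"GAPU59": (208, 214), "DCA1": (206, 206)}
cv_b = {"GAPU59": (208, 212), "DCA1": (206, 230)}

similarity = jaccard(band_set(cv_a), band_set(cv_b))
print(similarity)  # 0.4
```

Here two bands are shared, one is specific to the first profile, and two are specific to the second, so S = 2/5. Shared absences do not enter the coefficient.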

A tree was then inferred using the unweighted pair group method with arithmetic average (UPGMA) clustering algorithm. All analyses were done using the NTSYSpc program, version 2.1 [21].
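The clustering step can be illustrated with a small pure-Python version of UPGMA. This is a sketch of the general algorithm, not of the NTSYSpc implementation; it works on distances (1 − S), recomputes cluster averages from the original pairwise values for clarity rather than speed, and uses hypothetical labels and distances.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage clustering.

    names: list of labels; dist: {frozenset({a, b}): distance}.
    Returns the merges in order as (cluster_1, cluster_2, join_distance).
    """
    def avg(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical Jaccard distances (1 - S) between four cultivars.
d = {
    frozenset("AB"): 0.1, frozenset("AC"): 0.6, frozenset("AD"): 0.7,
    frozenset("BC"): 0.5, frozenset("BD"): 0.8, frozenset("CD"): 0.2,
}
merges = upgma(["A", "B", "C", "D"], d)
for c1, c2, height in merges:
    print(c1, c2, round(1 - height, 3))  # join level expressed as similarity
```

Cutting such a tree at a similarity threshold keeps together only the clusters joined at or above that similarity.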

Results and discussion

Marker characteristics and cultivar identification

A high degree of polymorphism was detected, with a total of 47 alleles, ranging from 3 alleles for UDO12 to 7 alleles for GAPU59, GAPU71B, UDO09, and DCA3 (Table 2). The average number of alleles per locus was approximately 6 (47 alleles over 8 SSR markers).

The power of discrimination (PD) illustrates the ability of primer pairs to discriminate between cultivars. PD was calculated from the results obtained with leaf DNA, because DNA from some monovarietal oils failed to amplify (Table 2). PD values ranged from 0.582 for UDO09 to 0.791 for GAPU59.

For the discrimination of these 22 oils with the lowest number of markers, only SSRs having the highest values of PD were kept to produce an identification key.

The observed (Ho) and expected (He) heterozygosities ranged from 0.333 to 1 and from 0.455 to 0.784, respectively, in oil DNA. For all markers except two (DCA1 and DCA4), Ho was higher than He, indicating high genetic variability among the cultivars studied. In our research, the average expected heterozygosity was 0.649. This result is in line with Diaz et al. [3] and Alba et al. [11], who reported that the microsatellites developed in olive are characterized by medium levels of heterozygosity due to the specificity of this outcrossing species.
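For readers less familiar with these statistics, the sketch below computes Ho as the proportion of heterozygous individuals and He as one minus the sum of squared allele frequencies. The genotypes are hypothetical, and the He formula used here carries no small-sample correction, so it may differ slightly from the estimator implemented in the GDA program.

```python
from collections import Counter


def heterozygosities(genotypes):
    """Return (Ho, He) for one locus from {cultivar: (allele, allele)}."""
    n = len(genotypes)
    ho = sum(1 for a, b in genotypes.values() if a != b) / n
    allele_counts = Counter(a for g in genotypes.values() for a in g)
    total = 2 * n  # diploid: two allele copies per individual
    he = 1 - sum((c / total) ** 2 for c in allele_counts.values())
    return ho, he


# Hypothetical genotype calls (allele sizes in bp) at one locus.
locus = {
    "cv_A": (206, 214),
    "cv_B": (206, 206),
    "cv_C": (208, 214),
    "cv_D": (206, 214),
}
ho, he = heterozygosities(locus)
print(ho, he)  # 0.75 0.59375
```

In this toy example Ho exceeds He, the same direction of difference reported above for six of the eight markers.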

The frequency of null alleles (r) varied from 0 to 0.1 (Table 2). Note that for the majority of markers studied (6 of 8), Ho exceeded He, so the estimate of r was not positive, indicating an excess of heterozygotes at these markers.

Patterns of genetic diversity or genetic relationship between cultivars

To elucidate genetic relationships among the 22 olive cultivars, a dendrogram was produced using UPGMA cluster analysis and the Jaccard similarity coefficients over the 8 SSR loci (Fig. 1). Cutting the dendrogram at a similarity value of 0.8 divides the genotypes into four main groups. The first group corresponds to the 8 cultivars at the top of the dendrogram, including ‘Chemlali’ cultivars from different regions and ‘Zalmati’. All these cultivars have small-sized fruit (0.7–1 g) and seem to be clones of the same genotype. The second group contains ‘Picholine’, ‘Ascolana’, and ‘Coratina’. The third group contains ‘Oueslati’, ‘Zarrazi’, and ‘Chemlali Tataouine’. These cultivars, except ‘Chemlali Tataouine’, share many features, notably medium-sized fruits (1.6–3 g) and a dual end use (oil and table olive). The fourth group corresponds to the three ‘Chetoui’ cultivars together with ‘Rkhaymi’, which also appear to be clonally related. These varieties have medium-sized fruits (1.6–2.8 g) and are grown in the north of Tunisia. ‘Manzanilla’ and ‘Arbequina’, which originated from Spain, and ‘Koroneiki’, from Greece, each form a separate single-cultivar group; they have a medium fruit size of 3.5 g. In addition, ‘Chemchali’, which originated from Tunisia (Gafsa region), was separated from the other groups; it has a medium fruit size of 3 g.

Fig. 1 Dendrogram of 22 olive cultivars generated by unweighted pair group method with arithmetic average (UPGMA) cluster analysis using Jaccard similarity coefficients from simple sequence repeat markers

Genotyping and characterization of olive oil

Eight SSR markers were analyzed on the set of leaf and oil samples from the 22 cultivars. They revealed a total of 47 alleles, ranging from 3 to 7 alleles per marker (Table 2). Of these, 40 alleles were found in leaf samples and 7 supplementary alleles were present only in oil (Table 3).

Table 3 Allele size (bp) and frequencies for each SSR locus in 22 olive oil genotypes

In general, the SSR markers employed gave repeatable and acceptable amplifications on both leaf and oil DNA, but in a few cases the DNA extracted from olive oil samples failed to amplify. This failure can be explained by the low concentration and the degradation of DNA from oil.

Concerning oil samples, a total of 176 PCRs (22 oil samples × 8 SSR markers) were performed; amplification failed in 31 cases, a failure rate of 17.6% (31/176). This result is consistent with the findings of Breton et al. [22] and Alba et al. [11], who reported 20 and 14.3% failed amplifications of olive oil samples, respectively.

Amplification failure can be caused by the high degradation of the DNA recovered from these oil samples; for this reason, we used a method suited to extracting high-quality DNA from oil samples [5, 13].

Another critical step is the choice of SSR markers, which plays a very important role in the success of SSR amplification from degraded oil DNA. We used SSR markers that amplify short sequences (100–300 bp, including the flanking regions), because fragments longer than 300 bp could not be amplified from such degraded DNA. This result is similar to those obtained by Pasqualone et al. [12], Testolin and Lain [13], and Doveri et al. [1].

Notwithstanding these unsuccessful PCR amplifications, microsatellites are overall suitable, efficient, and feasible tools for the analysis of DNA extracted from olive oil and for the study of traceability.

After discarding the 31 cases of unsuccessful PCR, the majority of the genotype profiles showed the same patterns in leaf and oil DNA. Nevertheless, in a number of cases, the DNA extracted from monovarietal olive oils showed mismatches with the DNA extracted from leaves. For example, the profile of DNA isolated from ‘Chetoui thibar’ monovarietal oil shows numerous mismatches with the leaf profile: loss of the 210 and 228 bp alleles for GAPU71A; loss of the 117 bp allele for GAPU71B; loss of the 115 bp allele for UDO09; two new alleles of 212 and 214 bp for GAPU71A; appearance of the 120 and 140 bp alleles for GAPU71B; and appearance of the 101 bp allele for UDO09.

Other mismatches between leaf and oil patterns were revealed for other cultivars: ‘Chemchali Gafsa’ oil (loss of the 244 bp allele with DCA1; additional 212 and 101 bp alleles with GAPU59 and UDO09, respectively), ‘Zarrazi Zarzis’ oil (loss of the 157 bp allele with UDO12 and an additional 99 bp allele with UDO09), ‘Chemlali Tataouine’ oil (additional 255, 210, and 101 bp alleles with DCA3, GAPU59, and UDO12, respectively), and ‘Arbequina’ oil (additional 212 and 126 bp alleles with GAPU59 and GAPU71B, respectively; Fig. 2). When the SSR profiles obtained from monovarietal oil-derived DNA were compared with those from genomic DNA extracted from leaves, profiles were generally consistent for some markers (such as GAPU71A and DCA4), although notable non-concordance was recorded for others (such as GAPU71B, GAPU59, UDO12, and UDO09). These partially inconsistent profiles between leaves and oil are likely due to contamination of monovarietal oils by other cultivars grown in the same block, or to contamination at the mill during oil extraction. Moreover, the lack of some peaks in oil profiles can be caused by the high degradation of DNA extracted from olive oils, due to the mechanical stress associated with olive crushing. Alternatively, additional peaks in oil profiles relative to the alleles present in leaf profiles could originate from pollinator DNA present in the genome of the seed embryo, since most cultivated olives are self-incompatible and the flowers are wind pollinated. These results are in line with previous studies. Doveri et al. [1] observed different profiles in oil and leaves of the cultivar ‘Leccino’ and proposed that the additional alleles in oil patterns originated from the paternal contribution (pollinator). Likewise, Alba et al. [11] analyzed ten SSR markers in seven Italian cultivars, showed that the oil DNA gave some fragments that were not detectable in DNA from leaf tissues, and concluded that these additional peaks have a paternal origin. This was already pointed out by Ben-Ayed et al. [5], who analyzed the genetic profiles for six SSR markers in olive oil and leaves from two major Tunisian cultivars, ‘Chemlali’ and ‘Chetoui’, and suggested that the additional alleles in the oil profile originate from the pollinator contribution present in the seed of the whole fruit (olive). These results contrast with those of Muzzalupo et al. [23], who found no differences in the DNA fingerprints of oils extracted with or without pits from the cultivar ‘Ogliarola salentina’. These differences between studies may be explained by the low number of SSR markers and cultivars studied or by the sensitivity of the electrophoretic tools.
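The leaf-versus-oil comparison described above amounts to two set differences per marker. The following minimal sketch uses the ‘Chetoui thibar’ alleles listed in the text for two markers; the profiles are partial (only alleles named in the text are encoded), and the function name and the use of a missing marker to represent a failed amplification are assumptions of this sketch.

```python
def compare_profiles(leaf, oil):
    """Per marker, list alleles lost in oil and alleles found only in oil.

    A marker missing from the oil profile (failed PCR) maps to None.
    """
    report = {}
    for marker, leaf_alleles in leaf.items():
        oil_alleles = oil.get(marker)
        if oil_alleles is None:
            report[marker] = None
            continue
        report[marker] = {
            "lost": sorted(set(leaf_alleles) - set(oil_alleles)),
            "additional": sorted(set(oil_alleles) - set(leaf_alleles)),
        }
    return report


# Partial 'Chetoui thibar' profiles (bp), restricted to alleles named in the text.
leaf = {"GAPU71A": {210, 228}, "UDO09": {115}}
oil = {"GAPU71A": {212, 214}, "UDO09": {101}}

report = compare_profiles(leaf, oil)
print(report["GAPU71A"])  # {'lost': [210, 228], 'additional': [212, 214]}
```

Lost alleles are compatible with DNA degradation, whereas additional alleles are the ones that need an explanation such as contamination or a paternal contribution.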

Fig. 2 Capillary electrophoresis electropherograms showing the amplification of the microsatellite GAPU71B: a blank sample, b DNA extracted from Oueslati leaves, c DNA extracted from Oueslati oil, d DNA extracted from Arbequina leaves, e DNA extracted from Arbequina oil. Peaks correspond to SSR alleles. The minor peaks in graph e correspond to the embryo’s alleles. The profile represents the region between 110 and 150 bp. The x-axis reports the size of the DNA fragments, and the y-axis the fluorescence intensity of the signal

To support the hypothesis of a paternal contribution, two cases are given by the ‘Chemchali Gafsa’ and ‘Arbequina’ oil profiles obtained with GAPU59, where the supplementary allele 120 bp is also present in all the other cultivars except ‘Chemlali Tataouine’. On the other hand, the allele 238 bp is present in the ‘Zalmati Zarzis’ oil profile with the DCA1 marker and absent from all the other profiles; this allele perhaps originates from pollination by cultivars that are not included in this report but are present in the field where this cultivar grows.

It is interesting to note that the ‘Arbequina’ oil profile shows two paternal alleles, 212 and 126 bp, for the GAPU59 and GAPU71B markers, respectively. These two alleles were also scored in the ‘Oueslati Kairouan’, ‘Zarrazi Zarzis’, and ‘Koroneiki’ cultivars, which suggests that these cultivars could pollinate ‘Arbequina’. Moreover, alleles from the ‘Chemchali’ cultivar were detected in the oil samples from two Chemlali cultivars (‘Chemlali Sidi Bouzid’ and ‘Chemlali Chàal’) for marker GAPU59, and the inverse was seen for the same marker, which suggests that these cultivars can cross-pollinate each other. Likewise, ‘Coratina’ may cross-pollinate with ‘Ascolana’ and ‘Chemlali Monastir’, because alleles from ‘Chemlali Monastir’ and ‘Ascolana’ were detected in the samples from ‘Coratina’ for the UDO09 marker, and the inverse was seen for the same marker.

In contrast, for other cultivars, such as ‘Oueslati Kairouan’ and ‘Koroneiki’, the oil profiles match exactly the genotype of the leaf profiles (mother), which indicates either pollination by another tree of the same clone or self-pollination.
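The reasoning used to propose pollen donors can be sketched as a simple filter: a cultivar is a candidate if its leaf profile carries every allele that appears only in the oil. In the sketch below, the oil-only alleles are those reported for ‘Arbequina’; the leaf profiles are partial and list only the alleles that the text attributes to each cultivar, ‘cv_X’ is a hypothetical cultivar added as a negative case, and the function name is an assumption.

```python
def candidate_donors(extra_alleles, leaf_profiles, exclude):
    """Cultivars whose leaf profile carries every oil-only (marker, size) allele."""
    donors = []
    for name, profile in leaf_profiles.items():
        if name == exclude:
            continue
        if all(size in profile.get(marker, set()) for marker, size in extra_alleles):
            donors.append(name)
    return sorted(donors)


# Alleles found only in the 'Arbequina' oil profile, as reported in the text.
extra = [("GAPU59", 212), ("GAPU71B", 126)]

leaf_profiles = {
    "Oueslati Kairouan": {"GAPU59": {212}, "GAPU71B": {126}},  # partial profile
    "Zarrazi Zarzis": {"GAPU59": {212}, "GAPU71B": {126}},     # partial profile
    "Koroneiki": {"GAPU59": {212}, "GAPU71B": {126}},          # partial profile
    "cv_X": {"GAPU59": {208}, "GAPU71B": {126}},               # hypothetical
}

donors = candidate_donors(extra, leaf_profiles, exclude="Arbequina")
print(donors)  # ['Koroneiki', 'Oueslati Kairouan', 'Zarrazi Zarzis']
```

Sharing alleles is compatible with a cultivar being a pollen donor but does not prove it; unsampled trees in the same field may carry the same alleles.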

Minimum number of markers to discriminate all the cultivars

A minimum set of markers was therefore chosen for rapid identification of the 22 olive varieties. This included GAPU59 and GAPU71A, chosen for their high discriminating power (DCA1 has a higher power of discrimination than GAPU71A, but PCR amplification with DCA1 did not always succeed for the ‘Ascolana’ cultivar). The identification key for the 22 olive accessions is shown in Fig. 3. Specific allele profiles at locus GAPU59 were first assigned to 5 varieties (‘Chemlali Tataouine’, ‘Chemchali’, ‘Picholine de Languedoc’, ‘Arbequina’, ‘Manzanilla’). The varieties ‘Koroneiki’, ‘Zalmati’, ‘Chemlali Sidi Bouzid’, ‘Chemlali Sousse’, ‘Chemlali Monastir’, ‘Chemlali Nabeul’, ‘Chemlali Chàal’, ‘Chemlali Blettech’, and ‘Chemlali Dokhane’ were differentiated by GAPU71A, whereas the accessions ‘Oueslati’ and ‘Ascolana’ were distinguished by GAPU71B.
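One way to search for a minimum marker set is to test marker subsets of increasing size until every cultivar has a unique combined genotype. The sketch below does this exhaustively, which is feasible for eight markers. It is not the procedure used to build the key in Fig. 3, which ranked markers by PD; the marker names are real but the genotype calls and cultivar labels are invented for illustration.

```python
from itertools import combinations


def minimal_marker_set(profiles, markers):
    """Smallest marker subset giving every cultivar a unique combined genotype."""
    for k in range(1, len(markers) + 1):
        for subset in combinations(markers, k):
            keys = [tuple(p[m] for m in subset) for p in profiles.values()]
            if len(set(keys)) == len(keys):
                return subset
    return None  # some cultivars are identical at all markers


# Hypothetical genotype calls (bp); not data from this study.
profiles = {
    "cv_A": {"GAPU59": (208, 214), "GAPU71A": (210, 228), "GAPU71B": (117, 117)},
    "cv_B": {"GAPU59": (208, 214), "GAPU71A": (212, 214), "GAPU71B": (117, 117)},
    "cv_C": {"GAPU59": (210, 212), "GAPU71A": (210, 228), "GAPU71B": (117, 126)},
    "cv_D": {"GAPU59": (210, 212), "GAPU71A": (212, 214), "GAPU71B": (117, 117)},
}

best = minimal_marker_set(profiles, ["GAPU59", "GAPU71A", "GAPU71B"])
print(best)  # ('GAPU59', 'GAPU71A')
```

In this toy data set no single marker separates all four cultivars, but the first pair tested does. Clonally identical cultivars would make the function return None, since no marker subset can separate them.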

Fig. 3 Microsatellite-based identification key of the monovarietal oils from 22 cultivars. Microsatellite name and dimensions of the amplicons in base pairs are reported at each branch

In conclusion, this research indicated that SSR markers are a suitable and effective tool to characterize olive oil varieties. In particular, capillary electrophoresis on an automatic sequencer with fluorescent labeling greatly facilitates the identification and discrimination of olive oil cultivars, compared to other marker systems or SSR genotyping techniques. Moreover, capillary electrophoresis can differentiate alleles with very small differences in molecular weight and can detect DNA that is present in very low amounts or partially degraded, as is the case for DNA extracted from olive oil.

The focal point of this paper concerns the traceability of origin and authenticity of olive oil by genomic approaches that can serve for protecting consumers from fraud and producers from unfair competition in the specific market of extra virgin olive oil.

The outcome of analyzing olive oil from many different cultivars could be an ‘oil SSR profile’ that serves as an identity card, enabling the setup of a reference data bank for oils.