Introduction

The olive tree (Olea europaea L. subsp. europaea), the most representative tree in Mediterranean countries, has been cultivated in this area for approximately 6000 years. It is estimated that there are more than 1200 local and old olive cultivars distributed in 54 countries worldwide, and they can be found in a variety of environmental conditions and growing systems (Bartolini 2008). The world catalogue of olive cultivars collects information on 139 cultivars from 23 olive-growing countries that cover almost 85% of the olive crop area. As the olive is a clonally propagated crop, its cultivars have traditionally been maintained in ex situ field collections (Belaj et al. 2001, 2004b). Evaluation and conservation of olive germplasm are being performed in approximately 100 institutions with a regional, national or international scope (Bartolini 2008).

There is a controversy about the true origin of Mediterranean olive trees. The work of Besnard et al. (2013) supports the existence of three long-term shelters during the Quaternary glaciations in the Mediterranean that could have played a key role for the preservation of the genetic diversity of this plant species. These three shelters would be the Middle East (including Cyprus), the Aegean area and the Strait of Gibraltar. However, the genome of Olea is not only related to the existence of areas of glacial retreat since it is also strongly influenced by both the biogeographical conditions of the Mediterranean basin and the human influence. The comparison of the geographical distribution of plastid genome diversity between wild and cultivated olive trees indicates the cradle of the first domestication in the northern Levant that was followed by dispersal across the Mediterranean basin, in parallel with the expansion of civilization and human exchanges in this part of the world (Besnard et al. 2013). According to these authors, humans have widely dispersed the chlorotype present in the East, where 90% of the olive cultivars share the same “eastern-like” chlorotype (Besnard et al. 2013).

The relationships among cultivated olive trees, wild forms and related subspecies need to be extensively explored to gain a better understanding of the genomic profile of wild populations and related subspecies (Green et al. 1989; Bartolini 2008; Belaj et al. 2010; Díez et al. 2011; Besnard et al. 2013; Barazani et al. 2014). A better understanding of their genetic structure would be the first step to clarify these issues (Green et al. 1989; Angiolillo et al. 1999; Belaj et al. 2011). The presence of synonyms (the same cultivar with different names) and homonyms (different cultivars with the same name) in olive cultivars, together with the recent and extensive diffusion of some cultivars (Arbequina, Frantoio, Koroneiki, etc.) out of their areas of origin, imposes the need for reliable and efficient tools for olive cultivar identification. Morphological and biological characters have long been used for descriptive purposes to distinguish olive cultivars (Cantini et al. 1999; Barranco et al. 2000, 2005; León et al. 2004). Agronomic characterisation has also allowed the classification of different olive cultivars (Barranco and Rallo 2000). However, the use of morphological characterisation is questionable because the expression of most morpho-biological traits is strongly affected by environmental conditions, the age of the trees, the training system and the phenological stage of the plants. Nevertheless, the morphological approach is still the initial step for the description and classification of olive germplasm (Rotondi et al. 2003).

A core collection consists of a limited number of selected accessions representing the genetic spectrum of the whole olive cultivar community. According to Brown (1995), it should include as much genetic diversity as possible. In 2012, Belaj et al. defined a core collection of 36 olive cultivars that covers the genetic spectrum of the World Olive Germplasm Bank of Córdoba (WOGBC), located at IFAPA (Instituto de Investigación y Formación Agraria y Pesquera de Andalucía) “Alameda del Obispo”, Córdoba, Spain. It was designed using molecular markers (DArTs, SSRs and SNPs) and agronomic traits. Given its high average genetic distance and good representation of the different regions of the Mediterranean area in a relatively small number of varieties, the core collection from Belaj et al. (2012) is recognised as the most suitable for studies of olive genetic improvement.

In the last decades, new molecular techniques have allowed the design of different genetic markers, employed, for example, to elucidate the variability of some crops (Trujillo et al. 1995, 2014; Belaj et al. 2001, 2004b; Fendri et al. 2010; Muzzalupo et al. 2010; Haouane et al. 2011; Atienza et al. 2013; Beghè et al. 2015) or for phylogenetic studies (Baldoni et al. 2006; Rubio de Casas et al. 2006; Belaj et al. 2007; Erre et al. 2010; Besnard et al. 2013). Other authors have combined the use of both morphological traits and molecular markers for identification purposes (Fendri et al. 2010; D’Imperio et al. 2011; Trujillo et al. 2014).

As pointed out by Belaj et al. (2018), most of the current identification efforts in plant germplasm collections are based on DNA markers. These authors compiled from several studies (Bracci et al. 2011; De Lorenzis et al. 2015; Mason et al. 2015; Marrano et al. 2017) the desirable properties that a molecular marker should fulfil for olive identification: availability of many polymorphisms, co-dominant inheritance, frequent occurrence, easy accessibility, low cost, quick and high throughput, high reproducibility and transferability among different laboratories and detection platforms. At present, SNP and SSR markers are the molecular markers most commonly employed to identify olive cultivars. A single nucleotide polymorphism (SNP) is a variation in a single nucleotide that occurs at a specific position in the DNA chain. SNPs are sequence-based and distinguished according to the nucleotide present at each given position, which gives them high reproducibility among laboratories and detection techniques (Bracci et al. 2011; Bevan et al. 2017). In recent years, the information available about SNPs in the olive tree has increased considerably (Reale et al. 2006; Consolandi et al. 2007; Muleo et al. 2009; Hakim et al. 2010; Belaj et al. 2012; Dominguez-Garcia et al. 2012; Kaya et al. 2013; Biton et al. 2015; Ipek et al. 2016; Belaj et al. 2018).

Microsatellites or SSR markers are regions of DNA consisting of tandemly repeated units of mono-, di-, tri-, tetra-, penta- or hexa-nucleotides arranged throughout the genomes of most eukaryotic species (Powell et al. 1996). Over the past 20 years, SSRs have been the most widely used markers for genotyping plants as they are highly informative, co-dominant, multi-allele genetic markers that are experimentally reproducible and exhibit relatively high transferability among related species (Mason 2015). The advent of the genomic period has resulted in the production of vast amounts of publicly available DNA sequence data, including large collections of Expressed Sequence Tags (ESTs), a rich source of SSRs, with many advantages over SSR markers from genomic DNA, and a large number of applications, since they reveal polymorphisms not only within the source taxon but in related taxa as well (Ellis and Burke 2007). SSRs with core repeats from 3 to 6 nucleotides long, have been designed in some woody species such as Prunus (Aranzana et al. 2003; Dettori et al. 2015), Vitis vinifera (Riaz et al. 2004; Cipriani et al. 2008), Malus domestica (Silfverberg-Dilworth et al. 2006) among others, and they are quickly increasing to the detriment of the classical di-nucleotide SSR.

In olive tree, microsatellites have been used for many purposes (Bracci et al. 2011) including paternity analysis (De la Rosa et al. 2004; Díaz et al. 2007a, b), construction of linkage maps (De la Rosa et al. 2003; Wu et al. 2004), cultivar traceability in olive oil (Martins-Lopes et al. 2008; Alba et al. 2009), DNA fingerprinting of cultivars (Sefc et al. 2000; Sarri et al. 2006; Baldoni and Belaj 2009), phylogenetic studies (Belaj et al. 2007; Erre et al. 2010), phylogeography and population genetics (Belaj et al. 2007; Khadari et al. 2008) and admixture events detection (Besnard et al. 2007; Díez et al. 2015). The high level of SSR transferability among olive tree subspecies (Rallo et al. 2003), combined with the level of polymorphism, makes SSR the current markers of choice for identification and variability studies (Trujillo et al. 2014; Belaj et al. 2012; Fendri et al. 2010; Muzzalupo et al. 2010; Haouane et al. 2011).

Despite their utility in the olive tree, fewer than 100 good and polymorphic SSR markers have been developed to date (Sefc et al. 2000; Rallo et al. 2000; Carriero et al. 2002; Cipriani et al. 2002; De la Rosa et al. 2002; Gil et al. 2006), and they have been extensively used by many researchers (e.g., De la Rosa et al. 2004; Belaj et al. 2004a; Sarri et al. 2006; Díaz et al. 2006; D’Imperio et al. 2011). Haouane et al. (2011) used ten of the SSRs previously reported by different authors (Sefc et al. 2000; Carriero et al. 2002; Cipriani et al. 2002; De la Rosa et al. 2002) to study the genetic structure of the core collection from the World Olive Germplasm Bank of Marrakech. El Bakkali et al. (2013) employed a set of 17 markers from the previous authors to construct new core collections. De la Rosa et al. (2002) and Belaj et al. (2011) used 8 SSRs from Cipriani et al. (2002), together with agro-morphological traits, to analyse the variability of wild olive trees. Some more recent papers (Adawy et al. 2015; Mariotti et al. 2016; Arbeiter et al. 2017) have presented new sets of SSR markers, mainly di- or tri-nucleotides.

Despite the advances in SNP technology, microsatellites remain the predominant molecular tool for the identification and characterization of olive cultivars. In their review, Sebastiani and Busconi (2017), highlighting the article by Dominguez-Garcia et al. (2012), concluded that SNPs are less polymorphic than microsatellites, although they showed an interesting level of polymorphism for studying some cultivars from Algeria. These authors also recognized the need to develop more SNPs to make them as discriminative as SSRs. Belaj et al. (2018) identified a new set of 1043 EST-SNP markers but, according to the authors, they display lower levels of genetic diversity than SSRs.

The main objectives of this work were to design and develop a highly efficient and reproducible set of long core repeat EST-SSRs, to use it in multiplex PCR, and to validate it in the identification of the 36 olive cultivars of the Belaj et al. (2012) core collection. The markers were designed to make multiplexing easier. In particular, the primers were designed to generate a wide range of allele sizes, labelled with four different fluorophores, under a single, standard set of PCR conditions. This set represents a potent tool to discriminate any other olive cultivar in the world, and is useful for studies of population genetic structure, genetic mapping and evolutionary processes. SSR markers with core repeats 4 to 6 nucleotides long are the tool of choice in current SSR analyses, especially if they can be multiplexed.

Materials and methods

EST-SSR marker retrieval and primer design

The initial data for this work were taken from three cDNA libraries, sequenced through Sanger technology, in the framework of the OLEAGEN project (Muñoz-Mérida et al. 2013). The first library was derived from buds taken from young and adult branches of 10 seedlings from the cross of cultivars “Picual” and “Arbequina”. The second library came from “Lechín de Sevilla” fruit mesocarp at different maturation stages (green with lignified endocarp, turning and purple). The last was generated from young leaves and stems of “Lechín de Sevilla” plus seeds from fruit at two different maturation stages (turning and purple) from a progeny of “Picual” and “Arbequina”.

Identification of SSRs in the three libraries was carried out using MIcroSAtellite software (MISA, https://www.pgrc.ipk-gatersleben.de/misa, Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany) (Thiel et al. 2003), following the methodology previously described by De la Rosa et al. (2013). Primer design was carried out with Oligo 7 software (Primer Analysis Software Oligo 7.60. Molecular Biology Insights, Inc.; Cascade, CO 80809, USA).
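The kind of search performed at this step can be illustrated with a minimal sketch. It is not MISA and does not reproduce its parameters: the function name, the minimum of three repeats and the example sequence are assumptions chosen only for illustration. The sketch looks for perfect tandem repeats with a core unit of 4 to 6 nucleotides and discards units that are themselves a shorter motif repeated.

```python
import re

def find_long_core_ssrs(seq, min_unit=4, max_unit=6, min_repeats=3):
    """Return (start, unit, n_repeats) for perfect tandem repeats in seq.

    min_repeats is an illustrative threshold, not the one used in the study.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One unit of unit_len bases followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are just a shorter motif repeated (e.g. ATAT = AT x 2).
            divisors = [k for k in range(1, unit_len) if unit_len % k == 0]
            if any(unit == unit[:k] * (unit_len // k) for k in divisors):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits

# Made-up sequence: AAGT repeated four times, flanked by unrelated bases.
example = "GGC" + "AAGT" * 4 + "CCTA"
print(find_long_core_ssrs(example))
```

A dedicated tool such as MISA also handles compound and interrupted repeats, which this sketch ignores.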

Plant material, DNA extraction and PCR amplification conditions

Total genomic DNA was extracted from 100 mg of fresh leaves of the 36 olive cultivars of the core collection, obtained from the WOGBC. DNA extraction was carried out using a commercial kit, Phytopure, following the manufacturer’s instructions (GE Healthcare). DNA quality was assessed by electrophoresis on 1% (w/v) agarose gels and quantification was performed using a microplate reader (BioTek, model Synergy HT). The first step of EST-SSR selection included only eight olive cultivars from the core collection: “Abbadi Abou Gabra-842”, “Arbequina”, “Chemlal de Kabylie”, “Frantoio”, “Koroneiki”, “Manzanilla de Sevilla”, “Maari” and “Picual”. PCR amplification and reproducibility of the 40 initially designed non-fluorescent primer pairs were tested using the selected cultivars. SSR amplification was carried out in a Bio-Rad thermal cycler (MyCycler™), in a final volume of 10 µl containing 100 ng of genomic DNA, 0.25 U of AmpliTaq Gold® DNA polymerase, 2.5 mM MgCl2 (final concentration), 1 mM each dNTP mix (Roche), and 0.5 µM each of the forward and reverse non-fluorescent primers. The program used for PCR amplification was as follows: initial denaturation at 95 °C for 5 min; 35 cycles of denaturation at 95 °C for 30 s, annealing at 50 °C to 55 °C for 30 s and extension at 72 °C for 30 s; and a final extension at 72 °C for 60 min. Amplification was confirmed by electrophoresis in 2% agarose gel. From the initial set of 40 primer pairs, only 24 were selected on the basis of their polymorphism (Table 1).

Table 1 Set of 24 SSR markers assayed with 8 olive cultivars of the core collection. The 8 finally selected are highlighted in bold type

The next step was the fluorescent dye-labelling of one of the two primers (forward or reverse) from each of the 24 selected SSR markers with FAM, NED, PET or VIC fluorophores (Applied Biosystems, Foster City, CA, USA) (Table 1). The expected fragment-size range for each locus (Table 1) was established according to the OLEAGEN database information. SSR markers with non-overlapping amplicon size ranges were labelled with the same fluorescent dye. In contrast, markers whose amplicon size ranges overlapped were each labelled with a different fluorophore.
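The labelling rule described above (markers may share a dye only if their expected size ranges do not overlap) can be sketched as a greedy assignment. This is an illustration, not the procedure actually followed by the authors: the function name, the marker names and the size ranges below are hypothetical and are not taken from Table 1.

```python
DYES = ("FAM", "NED", "PET", "VIC")

def assign_dyes(size_ranges, dyes=DYES):
    """Map each marker to a dye so that markers sharing a dye never overlap in size.

    size_ranges: {marker: (min_bp, max_bp)} with expected amplicon sizes.
    """
    assignment = {}
    last_end = {dye: None for dye in dyes}  # largest size already used on each dye
    # Process markers by increasing lower bound, reusing a dye whenever it is free.
    for marker, (low, high) in sorted(size_ranges.items(), key=lambda kv: kv[1]):
        for dye in dyes:
            if last_end[dye] is None or low > last_end[dye]:
                assignment[marker] = dye
                last_end[dye] = high
                break
        else:
            raise ValueError(f"no dye available for {marker}")
    return assignment

# Hypothetical markers and size ranges, for illustration only.
ranges = {"M1": (100, 140), "M2": (120, 160), "M3": (150, 200), "M4": (210, 260)}
dyes_by_marker = assign_dyes(ranges)
print(dyes_by_marker)
```

In this toy example M1 and M2 overlap and receive different dyes, while M3 and M4 can reuse the dye of M1 because their ranges lie above it.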

Fluorescent-labelled markers were initially tested on the eight selected olive cultivars and amplification was checked by electrophoresis in 2% agarose gel. The evaluation of polymorphism was carried out with an automated sequencer 3500 genetic analyzer (Life Technologies) using the internal standard GeneScan™ 600 LIZ® Size Standard v2.0 (Life Technologies).

The final number of SSR markers selected to identify the 36 olive cultivars was ten: eight specially designed for this work and two previously described by De la Rosa et al. (2013) (Table 1).

PCR multiplex design

Once the set of ten SSR markers had been separately tested, and their allelic profiles described, the possibility of multiplexing the PCR was studied. Amplification reactions were performed in a total volume of 10 µl with 100 ng of template DNA, 0.5 µM of primers labelled with either 6-FAM, NED, PET or VIC fluorophores, 0.5 µM of unlabelled primers, 1 mM each dNTP mix (Roche), 2 mM MgCl2, 1× Buffer Gold and 0.25 U of AmpliTaq Gold® DNA polymerase. A touchdown PCR program was used, with the following steps: initial denaturation at 95 °C for 5 min, 20 cycles of denaturation at 95 °C for 30 s and annealing at 65 °C, 10 cycles of denaturation at 95 °C for 30 s and annealing at 55 °C for 30 s, 10 cycles of denaturation at 95 °C for 30 s and annealing at 50 °C, and a final extension at 72 °C for 30 min. Labelled amplification products were resolved on an automated sequencer 3500 genetic analyzer (Life Technologies) using the internal standard GeneScan™ 600 LIZ® Size Standard v2.0 (Life Technologies). The PCR fragments were detected with the GeneMarker analysis software version 2.00.

As mentioned above, all primer pairs were designed so that they could be mixed freely in multiplex reactions without giving rise to errors of interpretation (Fig. 1). The optimal number of markers that can be combined in the same PCR tube without loss of amplification is five or fewer. With more than five markers, some of them fail to amplify.

Fig. 1

a Amplicon size range of the 10 SSR markers employed to characterise the 36 olive cultivars of the core collection defined by Belaj et al. (2012). Each amplicon range is represented with the same colour as the fluorophore linked to the marker. Multiplexing with no more than 5 different markers at the same time is possible because markers that overlap in amplicon size are linked to fluorophores that produce a signal of a different colour. Markers named Oleagen were developed by De la Rosa et al. (2013). b Example of an electropherogram resulting from a multiplex PCR with five SSR markers

During the development of the PCR assays, successful and reproducible amplifications were usually obtained in all genotypes analysed. In order to exclude errors due to DNA concentration or quality, two additional tests were carried out in genotypes where no amplification was achieved. When a cultivar did not show any allele for a marker (null), the analysis was repeated once more to confirm the result.

Data analysis

The allele profiles and fragment analysis of the olive cultivars were characterised using GeneMarker software. The electropherograms displayed fluorescent signal intensities as a single line trace for each dye colour: FAM, VIC, NED and PET. After the raw data files had been uploaded, their processing included the application of a sizing standard, filtering of noisy peaks, and comparison to a known allelic panel. The common size standard used was LIZ600.

The peaks from each of the ten primer pairs employed to identify the 36 olive cultivars were analysed. Those genotypes showing a single peak at a given locus were recorded as homozygous. The absence of any peak in the expected range size was checked twice before it was confirmed as null genotype. The statistical analysis of heterozygosity, number of alleles and their frequency, and values of the polymorphic information content (PIC) was performed using PowerMarker V3.0 software (Liu and Muse 2005) (https://www.powermarker.net) and GenAlex V6.5 (Peakall and Smouse 2012) (https://biology-assets.anu.edu.au/GenAlEx/Welcome.html).

Summary statistics (Table 2) were calculated with PowerMarker V3.0 software (Liu and Muse 2005), except for Dj, for which GenAlex V6.5 (Peakall and Smouse 2012) was used. Each SSR marker was characterised by the following parameters. Na is the number of actual alleles per SSR marker. Ne is the number of effective alleles, that is, the number of equally frequent alleles that it would take to achieve the same expected heterozygosity as in the studied population. Ne allows comparison of populations where the number and distribution of alleles differ drastically. Major allele frequency is the frequency of the most common allele at a particular locus in a population, expressed as a fraction of one. Genotype number is the number of different allelic combinations found. Ho and He are the observed and expected heterozygosity, respectively. When the heterozygosity is high, the effective number of alleles is also high. Gene diversity is defined as the probability that two randomly chosen alleles from the population are different. PIC (polymorphic information content) is a parameter often used to measure the discriminatory capacity of an SSR marker. PIC takes into consideration the number of alleles present at a marker locus and the frequency of these alleles. Consequently, loci with a large number of alleles usually have higher PICs, although it is possible to have a large number of alleles and a relatively small PIC if one or two of the alleles predominate. PIC values range from 0 to 1. According to Botstein et al. (1980), three categories are defined: high (PIC > 0.5), moderate (0.5 > PIC > 0.25), and low (PIC < 0.25). From the PIC parameter, Tessier et al. (1999) defined Dj as a way to evaluate the efficiency of a primer for the purpose of identifying varieties.
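As an illustration of how several of these parameters follow from allele frequencies, the sketch below uses the standard textbook formulas: He = 1 − Σp², Ne = 1/Σp², and the PIC of Botstein et al. (1980). It is only a sketch under stated assumptions: PowerMarker and GenAlex may apply sample-size corrections that are not reproduced here, Dj is not computed, and the function names and genotypes are invented for the example.

```python
from collections import Counter
from itertools import combinations

def locus_summary(genotypes):
    """Summary statistics for one locus.

    genotypes: list of (allele_a, allele_b) tuples, null genotypes already removed.
    Uses the plain (uncorrected) formulas; software packages may differ slightly.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    p = [n / total for n in counts.values()]
    sum_p2 = sum(x * x for x in p)
    he = 1.0 - sum_p2                      # expected heterozygosity
    ne = 1.0 / sum_p2                      # effective number of alleles
    pic = he - sum(2 * a * a * b * b for a, b in combinations(p, 2))
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return {"Na": len(p), "Ne": ne, "Ho": ho, "He": he,
            "PIC": pic, "major_allele_freq": max(p)}

def pic_class(pic):
    """Botstein et al. (1980) categories; exact boundary values fall in the lower class here."""
    if pic > 0.5:
        return "high"
    if pic > 0.25:
        return "moderate"
    return "low"

# Invented genotypes (allele sizes in bp) for four cultivars at one locus.
toy = [(100, 100), (100, 106), (106, 112), (100, 112)]
stats = locus_summary(toy)
print(stats, pic_class(stats["PIC"]))
```

With allele frequencies of 0.5, 0.25 and 0.25, the toy locus gives He = 0.625, Ne ≈ 2.67 and PIC ≈ 0.555, which falls in the high category.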

Table 2 Statistical analysis of the quality of SSR markers tested in this work, their peaks definition and the number of cultivars with null genotype that they produce

Results and discussion

Selection and checking of the SSR markers

As described in Materials and methods, the 40 initial non-fluorescent hexa-, penta- and tetra-nucleotide SSRs were reduced, after PCR amplification assays, to a set of 24 selected primer pairs (Table 1), which were then linked to a fluorescent dye. The selection relied on their amplification capacity and the high polymorphism they showed in agarose gels. From the set of 24 primer pairs that at first sight seemed to be polymorphic, 16 generated the same allelic pattern in all the olive cultivars assayed and were therefore rejected. In summary, from the 40 initial SSRs, only 8 were chosen. The final set of SSR markers selected to characterise the cultivars of the WOGBC olive core collection was composed of ten markers: the 8 newly described ones mentioned above, plus 2 hexa-nucleotide markers (Oleagen-H20.1 and Oleagen-H2) previously described by De la Rosa et al. (2013) and tested here for the first time on all 36 cultivars of the core collection.

Table 3 shows the amplification results of the set of ten SSR markers on the 36 olive cultivars studied, and the allele size range of every SSR marker employed. Differences of 1 bp between alleles from different cultivars were checked by re-amplification to establish whether a coding error had occurred. A total of 98 different genotypes were produced. The null genotypes can be the result of mutations in the flanking region at primer binding sites (Guichoux et al. 2011). Nulls have not been taken into account in the general record of alleles. Nevertheless, the absence of alleles in a cultivar has been considered as one genotype (null).

Table 3 Genotype of the 36 cultivars studied at each of the 10 SSR markers tested. SSR markers are grouped according to their quality. Oleagen H2 and Oleagen-H20.1 were previously designed by de la Rosa et al. (2013)

Multiplex SSR sets coupled with fluorescent detection systems have already been shown to be relevant and have been successfully applied in plant genetic studies. Nevertheless, most of these studies are based on SSR markers not originally developed to be multiplexed and thus, in general, very few of these markers are available for multiplexing (Merdinoglu et al. 2005). In contrast, our approach is based on the specific design of SSR markers for multiplexing (pre-PCR) and multiloading (post-PCR). The optimisation of this multiplex PCR was one of the most time-consuming tasks in this work and had several critical steps: (i) establishing the new touchdown-PCR conditions to save costs and working time, (ii) determining the best combinations of grouped fluorescent-labelled markers, and (iii) determining the optimal number of markers in each multiplex PCR that permits good amplification of all of them.

The development of multiplex PCR has been considered especially important because it represents a clear benefit by reducing laboratory work and consumption of expensive reagents without compromising test accuracy (Guichoux et al. 2011). On the one hand, the method is faster, because up to five different markers can be amplified in each PCR tube with very good results. On the other hand, multiplex PCR allows considerable cost savings in reagents and sequencing processes. Touchdown PCR is the technique of choice in the majority of multiplex PCR experiments (Hill et al. 2009; Guichoux et al. 2011), as it allows the amplification of heterogeneous SSR primers with different annealing temperatures by reducing this parameter in successive annealing cycles.

Figure 1a shows the amplicon size range of each of the 10 SSR markers. Each amplicon range is represented with the same colour as the fluorophore linked to its marker. As seen in this figure, markers producing amplicons of the same size are labelled with fluorophores of different colours. Figure 1b shows an example of an electropherogram resulting from a multiplex PCR with five different SSR markers.

Evaluation of SSR markers polymorphism and discrimination power

A total of 36 cultivars, the core collection, that have been chosen from the WOGBC and whose origins are the main Mediterranean olive-cultivating countries, were genotyped with the set of ten selected SSR markers.

SSR markers were classified (Table 2) on the basis of the allelic profile they produced, using as selection criteria: a) the presence of sharp peaks, b) the number of different alleles and genotypes they revealed and c) the number of cultivars with a null genotype they produced. ESM_1 shows the allele combinations obtained with the best markers (Oleagen-H2, Oleagen-H20.1 and Olea42.31). They are considered the best because they provide electropherograms in which alleles are clearly defined and they do not give a null genotype in any of the cultivars assayed. ESM_1 also includes the allele combinations obtained with Olea9.4. This last marker also produces very well defined peaks, but it gives a null genotype in ten of the cultivars. The second group in order of quality (ESM_2) consists of Olea39, Olea40.13, Olea41.2, Olea42.34 and Olea42.9. These five markers provide clearly distinguishable alleles, although double peaks sometimes appear for a single allele. According to Clark (1988) and Esselink et al. (2003), double peaks are caused by the non-template addition of a nucleotide (generally an adenine) to PCR fragments by the Taq polymerase. When the adenylation is incomplete, the resulting electropherogram shows one peak from the original fragment and an additional peak 1 bp longer corresponding to the adenylated fragment (Guichoux et al. 2011). The marker that gave the worst results in terms of peak definition was Olea42.30 (Table 2 and ESM_3). It produced electropherograms that were difficult to interpret, at least in some cultivars, because of their stuttered peaks. Reproducible and polymorphic amplification products were obtained, displaying from 4 to 7 different alleles per locus (Table 2).

It has been suggested that research assessing an array of possible primer pairs should select those associated with di-nucleotide repeats over longer motifs (tri-, tetra-, or penta-nucleotide motifs) to ensure higher levels of genetic variation (Levinson and Gutman 1987; Grist et al. 1993; Chakraborty et al. 1997; Sup Lee et al. 1999; Ellegren 2004). In fact, most (48–67%) microsatellite markers found in many species are di-nucleotide repeats, although they are less frequent in coding regions (Li et al. 2002). Tri-nucleotide and hexa-nucleotide repeats are thought to be more common in coding regions because they do not cause frameshifts (Tóth et al. 2000; Ellegren 2004).

All SSR markers employed in this work have shown a low number of alleles compared to di-nucleotide SSRs although they are in accordance with previous studies on the olive tree (De la Rosa et al. 2013). Even though long core SSR markers have a lower number of alleles than di-nucleotides (Nishio et al. 2011; Poncet et al. 2006; Rahemi et al. 2012), they are more appreciated because they produce wider distances among alleles and less stuttered peaks, contributing to a more reliable scoring of microsatellites (Dettori et al. 2015).

Oleagen-H2 and Oleagen-H20.1 SSR markers have been previously employed with some of the olive tree core collection cultivars (‘Arbequina’ ‘Frantoio’, ‘Manzanilla de Sevilla’, ‘Koroneiki’ and ‘Picual’) (De la Rosa et al. 2013). This allows us to confirm the reproducibility of the results obtained with them by both laboratories. All results obtained in this research agreed with those reported in the work above mentioned, which provides evidence of the quality of the markers. Oleagen-H2 and Oleagen-H20.1 showed the same allele profile in the common cultivars assayed except for minor displacements of no more than one or two nucleotides in the position of the peak.

Interestingly, 17 of the 36 cultivars studied could be identified using only one marker, because that marker reveals a specific and exclusive genotype (ESM_4). The Jabali cultivar is a striking case because it shows an exclusive genotype with five different markers. Among the SSRs, Oleagen-H2 is the one that produces the most exclusive genotypes (nine), followed by Olea9.4 with six. Notably, Oleagen-H2 and Olea9.4 are hexa-nucleotide markers, which reinforces the idea of higher cultivar discriminating ability of polynucleotide microsatellites compared to di-nucleotide ones. Olea42.31 is the only marker that does not reveal any exclusive genotype.
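Counting exclusive genotypes amounts to finding, for each marker, the genotypes carried by exactly one cultivar. A minimal sketch follows, assuming a complete genotype table with one entry per cultivar and marker. The function name, cultivar names, marker names and allele sizes are invented and do not come from Table 3.

```python
from collections import Counter

def exclusive_genotypes(table):
    """Return {marker: [cultivars whose genotype at that marker is unique]}.

    table: {cultivar: {marker: genotype}}; every cultivar is assumed to have
    an entry for every marker (a null genotype can be encoded as None).
    """
    markers = sorted({m for profile in table.values() for m in profile})
    result = {}
    for marker in markers:
        counts = Counter(profile[marker] for profile in table.values())
        result[marker] = sorted(
            cultivar for cultivar, profile in table.items()
            if counts[profile[marker]] == 1
        )
    return result

# Invented genotypes, for illustration only.
toy_table = {
    "cv_A": {"m1": (100, 106), "m2": (200, 200)},
    "cv_B": {"m1": (100, 100), "m2": (200, 200)},
    "cv_C": {"m1": (100, 100), "m2": (200, 205)},
}
exclusive = exclusive_genotypes(toy_table)
print(exclusive)
```

Here cv_A is identified by marker m1 alone and cv_C by marker m2 alone, whereas cv_B has no exclusive genotype at either marker.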

Using only four SSR markers, every cultivar can be differentiated from the rest of the core collection. Moreover, for 34 of the 36 cultivars, three SSR markers (one trio) are enough for this purpose. There are three possible trios: Oleagen-H2, Olea41.2 and Olea42.9 (ESM_5), the best one; Oleagen-H2, Oleagen-H20.1 and Olea41.2 (ESM_6); and Oleagen-H2, Olea42.34 and Olea42.9 (ESM_7). With each trio, there are only two cultivars that cannot be distinguished from one another (Chenge (Shengeh) and Abou Satl Mohazam when using the first trio; Chemlal de Kabylie and Koroneiki when using the second trio; and Myrtolia and Mastoidis when using the third trio). In these cases, there are several SSR markers that can be used as an additional fourth marker to differentiate between those cultivars. ESM_8 shows the different possibilities of using a fourth marker to differentiate the only two cultivars with the same genotype. The presence of null alleles can lead to a misinterpretation when they are in heterozygosis, because the single peak they show might be scored as a homozygous genotype instead of a heterozygous one that includes a null allele (Dakin and Avise 2004). The advantage of the three trios proposed here is that none of them includes a marker that produces a null genotype.
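The search for small marker sets that separate all cultivars can be sketched as an exhaustive ranking of marker combinations by the number of cultivar pairs they leave unresolved. This is an assumption about how such trios could be found, not a description of the authors' procedure. The function names and the genotype table are invented.

```python
from itertools import combinations

def indistinguishable_pairs(table, markers):
    """Return cultivar pairs that share the same multilocus genotype over `markers`."""
    groups = {}
    for cultivar, profile in table.items():
        key = tuple(profile[m] for m in markers)
        groups.setdefault(key, []).append(cultivar)
    return [pair for members in groups.values()
            for pair in combinations(sorted(members), 2)]

def rank_marker_sets(table, size):
    """Rank every marker combination of the given size, fewest unresolved pairs first."""
    markers = sorted({m for profile in table.values() for m in profile})
    ranked = [(len(indistinguishable_pairs(table, combo)), combo)
              for combo in combinations(markers, size)]
    return sorted(ranked)

# Invented genotype table: four cultivars, three markers.
toy_table = {
    "cv_A": {"m1": (100, 100), "m2": (200, 200), "m3": (300, 300)},
    "cv_B": {"m1": (100, 100), "m2": (200, 206), "m3": (300, 300)},
    "cv_C": {"m1": (100, 104), "m2": (200, 200), "m3": (300, 300)},
    "cv_D": {"m1": (100, 104), "m2": (200, 206), "m3": (300, 305)},
}
ranking = rank_marker_sets(toy_table, 2)
print(ranking)
print(indistinguishable_pairs(toy_table, ("m1", "m3")))
```

In the toy table, the pair (m1, m2) resolves all four cultivars, while (m1, m3) leaves cv_A and cv_B indistinguishable, which mirrors the need for an additional marker in the unresolved cases described above. With ten markers there are only 120 possible trios, so exhaustive enumeration is cheap.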

Now that the discriminating power of this set of ten markers has been demonstrated, the next step is to use this tool to genotype the hundreds of cultivars that are present in the WOGBC. When genotyping new cultivars not included in the core collection, the method of choice is to employ the entire set of ten markers, once the multiplex PCR has been perfected. Although the ten SSR markers can be mixed freely, to avoid confusion in the sizes of some alleles it is advisable not to mix Olea40.13 with Olea42.30, nor Oleagen-H2 with Oleagen-H20.1, because of the proximity between their allele sizes and the fluorophores used with them.

Polymorphism and diversity study. The capability of SSRs to identify the olive cultivars from the core collection.

Statistical analysis of the SSR markers, shown in Table 2, reveals a number of alleles per SSR (Na) varying from 4 to 7, with an average of 5.25. These values could be considered relatively low, but they are common in long core repeat molecular markers (De la Rosa et al. 2013; Cipriani et al. 2008). Ne values, the effective number of alleles, ranged from 2.17 to 5.02, with an average of 3.17. Ho and He ranged from 0.35 to 0.86 and from 0.54 to 0.80, respectively, with averages of 0.64 and 0.66. As these parameters can take values from zero (no heterozygosity) to nearly 1.0 (for a system with a large number of equally frequent alleles), these results are similar to those described by De la Rosa et al. (2013) for Olea europaea and Cipriani et al. (2008) for Vitis vinifera.

The major allele frequency ranged from 0.28 for Olea9.4 to 0.64 for Olea42.30. Gene diversity was high overall, ranging from 0.54 for Olea42.31 to 0.80 for Oleagen-H2. Regarding PIC values, all microsatellites employed in this research showed a value higher than 0.5, which indicates a high discriminatory capacity, except for two of them: Olea42.9 (PIC = 0.47) and Olea42.31 (PIC = 0.49).

As mentioned in Material and Methods, both Dj and PIC values are based on allele frequencies and therefore take similar values. Dj ranged from 0.55 for Olea42.31 to 0.82 for Oleagen-H2. The three markers with the highest Dj values were Oleagen-H2 (0.82), Olea9.4 (0.78) and Oleagen-H20.1 (0.76). In general, the discriminatory power of the SSRs tested in the present research is similar to that described by De la Rosa et al. (2013).
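For readers who wish to reproduce this kind of summary, the sketch below computes Ne, He and PIC for a single locus from its allele frequencies. It assumes the commonly used definitions (He = 1 − Σp², Ne = 1/Σp², and PIC = He − Σ over i<j of 2p_i²p_j²); the exact formulas applied in this work are those given in Material and Methods, which may differ in detail. The function name `diversity_stats` and the input frequencies are hypothetical.

```python
def diversity_stats(freqs):
    """Compute Ne, He and PIC for one locus from its allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)   # expected homozygosity, sum of p_i^2
    he = 1.0 - homozygosity                    # expected heterozygosity (gene diversity)
    ne = 1.0 / homozygosity                    # effective number of alleles
    # PIC subtracts the term sum over i<j of 2 * p_i^2 * p_j^2 from He.
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return {"Ne": ne, "He": he, "PIC": he - cross}

# Hypothetical locus with four equally frequent alleles.
stats = diversity_stats([0.25, 0.25, 0.25, 0.25])
print(stats)
```

Under these definitions Ne = 1/(1 − He), so He values of 0.54 and 0.80 correspond to Ne values of about 2.17 and 5.0, which is consistent with the ranges reported above.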

Comparison of data from this work and the results from other core collections

Over the last decade, several core subsets have been proposed for both annual species, e.g. Arabidopsis thaliana (McKhann et al. 2004), Oryza sativa (Zhao et al. 2010), Triticum aestivum (Balfourier et al. 2007) and Zea mays (Franco et al. 2005), and perennial species, e.g. Annona cherimola (Escribano et al. 2008), Malus domestica (Richards et al. 2009), Prunus armeniaca (Wang et al. 2011) and Vitis vinifera (Le Cunff et al. 2008), using different eco-geographical, agro-morphological, biochemical or molecular data.

Trujillo et al. (2014) used 33 out of 77 (43%) di-nucleotide SSRs previously developed by the same authors to identify, together with morphological markers, the WOGBC. The same 33 sequences were used again by Díez et al. (2015), who discarded some of them and retained 72% of those previously chosen. Muzzalupo et al. (2014) employed 11 SSRs designed by the same authors to analyse the genetic biodiversity of Italian olive trees. Chalak et al. (2015) worked on the genetic diversity of Lebanese olive trees, using 12 out of the 17 SSRs previously employed by Haouane et al. (2011). Ipek et al. (2015) have also recently employed 20 of the same microsatellites to study the fatty acid content of olive oil and the genetic diversity of an olive tree core collection. Las Casas et al. (2014) carried out a molecular characterization of Sicilian olive cultivars using 8 new di-nucleotide SSR markers that they designed themselves. However, molecular discrimination in all of these core subsets was based on di-nucleotide SSRs.

In recent years, De la Rosa et al. (2013) designed a set of 8 new hexa-nucleotide SSRs based on 54 core sequences with long repeats, which is 15% of those previously chosen. These SSR loci were generated on the basis of ESTs in the frame of an olive genomic project (Muñoz-Mérida et al. 2013). Among other woody crops, only the study of Cipriani et al. (2008) has used polynucleotide SSRs. They tested a set of 94 long core repeat SSRs from Vitis vinifera and selected 38 of them (40.4%).

The present work started with 40 new long-core repeat sequences, but only 8 of them (20%) were finally selected to discriminate the 36 cultivars of the core collection developed by Belaj et al. (2012). The excellent sharpness and discriminatory capability of the electropherograms obtained with these SSR markers highlight their quality (ESM_1 to ESM_3). The final set consisted of 10 SSRs: 8 specifically designed for this work plus 2 previously designed by De la Rosa et al. (2013). As proof of the discrimination power of the set, 94.4% of the cultivars can be discriminated from the rest of the core collection using only three of these markers (ESM_5 to ESM_7).

The sequences of the microsatellite loci developed in this study represent a new and informative set of markers that can be easily combined for multiplexing and multi-loading according to the needs of any user, and are thus suitable for large-scale genetic analyses in olive. This work has made it possible to obtain a set of SSR markers that produce reliable allelic profiles of the 36 olive cultivars of the core collection from the WOGBC, and it represents a powerful tool for genetics and plant breeding.