Introduction

Olive (Olea europaea L. subsp. europaea, var. europaea) is one of the most important oil crops. Its cultivation is mainly concentrated along the Mediterranean basin, although it is now spreading to many new areas (Baldoni and Belaj 2010). Olive is a diploid species (2n = 46), highly heterozygous, with a medium-to-large genome (haploid size = 1500 Mb) (Loureiro et al. 2007).

Growing interest is currently directed towards the genetic diversity of olive germplasm, which has been surveyed in different geographical regions, including some characterized by severe climatic and ecological conditions (Hosseini-Mazinani et al. 2014; Mousavi et al. 2014). In addition, the relationships between cultivated varieties and related taxa, such as wild trees (O. europaea subsp. europaea var. sylvestris) and related subspecies, have also been analyzed in depth (Besnard and El Bakkali 2014; Díez et al. 2014), not only to understand the phylogenetic relationships within the genus but also to discover traits of great agronomical and environmental interest (Kaya et al. 2016).

Cultivar identification is a primary concern for olive growers, breeders, and scientists. Up to now, the high genetic variability of cultivated olive and the small morphological differences among varieties, coupled with the use of poorly effective markers and the confusion in genotype assignment within olive collections, have often led to conflicting information about varietal identity, masking the level of variability and leading to misinterpreted relationships among cultivars (Atienza et al. 2013; Díez et al. 2012). Recently, several efforts have been made to solve this issue by applying molecular genotyping to the accessions held by the main olive collections and by developing more effective markers (Belaj et al. 2012; Biton et al. 2015; Haouane et al. 2011; Trujillo et al. 2014).

Fingerprinting based on microsatellite markers is a powerful and reliable tool for characterizing plant varieties, and these markers can be exploited for several molecular investigations such as cultivar identification, paternity testing, kinship, or population structure analyses (Chankaew et al. 2014; Cipriani et al. 2010; Ellegren 2004; Shen et al. 2014; Zhang et al. 2015). According to their location in the genome, microsatellite regions may be divided into neutral genomic simple sequence repeat (SSR) markers and transcript-tagged SSRs, located on expressed sequence tags (ESTs), with a potential functional value (Bradbury et al. 2013; Hinchliffe et al. 2011). SSR markers show significant advantages, such as reproducibility, locus specificity, and the low quantity of template required; nevertheless, several disadvantages have also been recorded, mainly stutter products, allele binning problems, and allele miscalling, which arise from the wide use of di-nucleotide microsatellites (Cabezas et al. 2011; Kaur et al. 2015; Targońska et al. 2015; This et al. 2004). The majority of discrepancies among laboratories in scoring di-nucleotide microsatellites are due to the binning process, in which raw allele lengths are converted into allele classes whose size is then expressed by an integer (Baldoni et al. 2009; Weeks et al. 2002). These problems may be partly solved by discarding di-nucleotide SSRs in favor of microsatellites with longer motifs (Amos et al. 2007), as done in human fingerprinting (Butler 2007) and in fruit crop genotyping (Cipriani et al. 2008; Dai et al. 2015).
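The binning step described above can be illustrated with a minimal sketch. It assumes a simple rule in which a raw fragment length is rounded to the nearest allele class spaced one motif length apart from a reference allele; the function name, the reference size of 150 bp, and the raw size of 151.1 bp are hypothetical, and real binning software may use empirically calibrated bins instead.

```python
def bin_allele(raw_size, reference, motif_len):
    """Round a raw fragment length (bp) to the nearest allele class.

    Allele classes are assumed to sit at reference + k * motif_len.
    This is an illustrative rule, not the procedure of any specific tool.
    """
    steps = round((raw_size - reference) / motif_len)
    return reference + steps * motif_len


# Hypothetical raw call of 151.1 bp for a fragment whose true size is 150 bp
raw = 151.1
print(bin_allele(raw, reference=150, motif_len=2))  # di-nucleotide classes
print(bin_allele(raw, reference=150, motif_len=3))  # trinucleotide classes
```

With classes only 2 bp apart, the same sizing error pushes the call into the neighboring class, whereas the wider spacing of a trinucleotide motif absorbs it.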

The majority of SSRs currently identified for olive genotyping carry di-nucleotide repeats (Carriero et al. 2002; Cipriani et al. 2002; de la Rosa et al. 2002; Diaz et al. 2006; Rallo et al. 2000; Sabino Gil et al. 2006; Sefc et al. 2000), mostly AG/CT repeats. In order to establish a common set of markers able to produce reliable data for olive cultivar discrimination, Baldoni et al. (2009) provided a consensus list of di-nucleotide SSR markers. But, despite the use of highly robust fingerprinting protocols, genotyping errors may still occur (Atienza et al. 2013; Díez et al. 2012; This et al. 2004) and the standardization of protocols and the exchange of information concerning the genetic profile of reference varieties are still required.

The frequency of microsatellites with long core repeats is lower than that of microsatellites carrying shorter motifs (Fungtammasan et al. 2015); thus, their identification requires additional effort. Previous studies highlighted that, in plants, microsatellite regions occur more frequently in transcribed regions than in genomic DNA (Morgante et al. 2002). A high frequency of trinucleotide repeats is reported in exons, which contain almost no tetranucleotide repeats (Toth et al. 2000; Varshney et al. 2005), whereas mutability tests showed that the allelic variability of exon SSRs is lower than that of intronic repeats (Li et al. 2011). Massive EST and genomic sequence data have made information on repeat region abundance and position available, and bioinformatic analyses have significantly accelerated the identification and selection of these regions, reducing the time needed for their application (Acuna et al. 2012; Li et al. 2012; Shiferaw et al. 2012; Yang et al. 2012; Duran et al. 2013; Cubry et al. 2014).

SSRs located on gene exons may potentially control important agronomic traits (Zeng et al. 2010; Zhang et al. 2012; Boccacci et al. 2015; Dutta et al. 2011). Furthermore, since EST-SSR markers stand on expressed sequences, they show a high transferability across taxa (Scott et al. 2000; Zhang et al. 2005; Aggarwal et al. 2007; Luro et al. 2008), representing a valuable resource for comparative genomics, biodiversity and evolutionary studies (De Keyser et al. 2009).

SSR markers with repeat motifs longer than a di-nucleotide have been developed from genome sequences in several crop species, such as grape, cranberry, citrus, celery and common bean (Biswas et al. 2012; Chen et al. 2014; Cipriani et al. 2008; Fu et al. 2013; Zhu et al. 2012), from gene sequences, as in sesame (Zhang et al. 2012), or from EST data, as done for sunflower, castor bean, grape, tea, pea and alfalfa (Heesacker et al. 2008; Huang et al. 2011; Qiu et al. 2010; Wang et al. 2013; Yao et al. 2012; Xu et al. 2012). In fact, EST-derived SSRs have been well documented in several plant species (Lima et al. 2008; Feng et al. 2009; Du et al. 2013; Yao et al. 2012; Xu et al. 2012; Zhou et al. 2014; Ferrao et al. 2015). Recently, some EST-SSRs have also been identified in olive and used for paternity testing and mapping purposes (Essalouh et al. 2014; de la Rosa et al. 2013; Khadari et al. 2014).

The present study took its cue from the availability of several EST collections of O. europaea and addressed the following issues: i) identifying new polynucleotide repeats through the screening of these sequences, ii) testing the polymorphism and applicability of these regions as new markers for cultivated olive fingerprinting, and iii) validating their transferability on a wide set of related Olea taxa.

Materials and methods

Plant material and DNA extraction

A set of 32 olive (O. europaea subsp. europaea var. europaea) cultivars was first selected to test the amplifiability and applicability of the EST repeats as potential markers (Table S1). Then, to assay their ability to capture the widest range of variability within cultivated olives, another 47 varieties were added to the first group, chosen on the basis of their distribution and genetic variability (Sarri et al. 2006; Baldoni et al. 2009). Moreover, in order to test the transferability of markers to other olive-related forms, four samples of wild olives (O. europaea subsp. europaea var. sylvestris), two O. europaea subsp. cuspidata, two subsp. laperrinei, samples of the polyploid subspecies cerasiformis and maroccana (Besnard et al. 2008), and one Olea paniculata genotype were also included in the study, for a total of 90 samples (Table S1). DNA samples of this plant material were derived from the CNR–IBBR Olive Collection (Perugia, Italy) and from the World Olive Germplasm Bank (WOGB), IFAPA (Cordoba, Spain). Two clones of cv. frantoio, collected from two different growing sites and previously verified as a single genotype with well-characterized di-nucleotide SSRs, were included in the analysis in order to test the reliability of the new markers.

Identification of repeated polymorphic motifs in the EST collections

In order to identify the most effective SSR markers from the EST sequences, a specific pipeline has been established, as reported in Fig. 1.

Fig. 1 Pipeline used for the selection of new EST-SSR markers

SSR motifs were searched within the EST collections from flower and fruit tissues of different olive varieties (Alagna et al. 2009; Alagna et al. 2016; Corrado et al. 2012). The adaptor-trimmed 454 read sequence data were assembled by using the GS De Novo Assembler software (Roche Diagnostics Corporation, Basel, Switzerland). Repeated motifs were searched using the Perl script program MISA (http://pgrc.ipk-gatersleben.de/misa) within each EST collection, by applying the following parameters: mononucleotide repeats (MNRs) ≥10, di-nucleotide (DNR) ≥6, from trinucleotide to hexanucleotide (TNR, TTNR, PNR, and HNR) ≥5, and distance between two SSRs ≤100 bp. Only polymorphic repeated motifs between varieties were selected for further analyses.
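The repeat thresholds listed above can be sketched as a small regular-expression search. This is not MISA itself, only a hedged illustration that applies the same minimum repeat counts to perfect repeats; the function name and the test sequence are hypothetical, and the rule on the distance between two SSRs (compound repeats) is deliberately left out.

```python
import re

# Minimum number of repeat units per motif length, as given in the text
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_n in sorted(min_repeats.items()):
        # One motif of `size` bases followed by at least min_n - 1 more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. AAA, AGAG)
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


# Hypothetical sequence: a (GAA)6 trinucleotide and an (AG)7 di-nucleotide repeat
example = "ACGT" + "GAA" * 6 + "TTGCA" + "AG" * 7 + "C"
print(find_ssrs(example))
```

Filtering out motifs that are multiples of a shorter unit keeps a long di-nucleotide run from being reported a second time as a tetra- or hexanucleotide repeat.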

Selection of most valuable polynucleotide repeats as potential markers and functional annotation of corresponding putative genes

Transcripts containing polynucleotide repeats were aligned by BLASTN to the olive genome scaffolds of the cv. leccino made available to the OLEA project partners (oleagenome.org) in order to identify the genomic regions containing the target SSRs. A first screening was carried out on the selected repeats in order to discard regions characterized by multiple calls, showing flanking regions inadequate for primer construction or any other serious drawback.

A second screening was performed on the remaining 80 SSRs: primers were designed by using Primer3 v. 0.4.0 (http://frodo.wi.mit.edu), with a GC content higher than 20% and a common melting temperature (Tm) of 60 °C, so as to obtain fragments with an expected length of approximately 100 to 350 bp and thus facilitate their discrimination when multiplexing. Analyses were carried out on a subset of five cultivars (leccino, chemlali, izmir sofralik, koroneiki, picual), in order to discard non-amplifiable loci and markers showing multi-band PCR products or unexpected amplicon lengths. Polymerase chain reactions were performed in a volume of 25 μl containing 25 ng of DNA, 10× PCR buffer, 200 μM of each dNTP, 10 pmol each of forward (with an 18 bp tail at the 5′ end) and reverse primers, and 2 U of PerfectTaq DNA Polymerase (Q5 High-Fidelity DNA Polymerase, New England Biolabs). The fluorescent tail (10 pmol) was annealed to the forward primer using a double-step PCR: the first step consisted of an initial denaturation at 95 °C for 5 min, followed by 35 cycles of 95 °C for 30 s, 60 °C for 30 s, and 72 °C for 25 s, and a final elongation at 72 °C for 30 min; the second step (for tail annealing) consisted of 17 cycles under the same conditions as the first step except for the annealing temperature (Tm = 52 °C). Negative (no template DNA) and positive (cv. leccino DNA) controls were included in each amplification run, in order to detect non-specific products and verify the success of the PCR reaction, respectively. All amplifications were performed with the PCR System 9600 (Applied Biosystems, Foster City, CA). PCR products were initially electrophoresed on 2% agarose gels in order to check the amplicons and then loaded on an ABI 3130 genetic analyzer (Applied Biosystems-Hitachi, Foster City, CA), by using the internal GeneScan™ 500 LIZ Size Standard (Applied Biosystems). All amplifications and runs were replicated twice in order to test their repeatability. Output data were analyzed by GeneMapper 3.7 (Applied Biosystems-Hitachi). Non-amplifiable loci, multi-band PCR products (three or more alleles, or smearing when visualized on agarose gel), and amplicons with unexpected lengths were discarded.

The final selection of best markers was performed on a set of 32 varieties (Table S1) on 26 selected loci, using the aforementioned amplification conditions and applying the following criteria: (1) signal level, which was ranked as strong, medium, and weak; (2) stuttering level, which was scaled as low (or no stuttering), medium, and high; and (3) number of amplified alleles detected through fragment analysis.

All alleles at each locus were sequenced, in order to verify repeat motif features and base composition and confirm fragment length. Homozygous fragments were directly sequenced through the BigDye Terminator technique (Applied Biosystems). Heterozygous alleles showing different lengths were cloned into Escherichia coli XL1 Blue strain by using pGEM-T Easy Vector System I (Promega). All amplifications and cloning products were sequenced on an ABI 310 Genetic Analyzer (Applied Biosystems-Hitachi, Foster City, CA).

To verify the transferability of these loci to other forms related to cultivated olive, only 10 EST-SSRs, showing the best diversity scores, were further applied to the wider sample set of 90 genotypes.

In order to compare the discrimination power of the newly developed SSR markers with that of di-nucleotide SSRs, the same diversity indices were calculated for the best-ranked di-nucleotide loci on the same set of 79 cultivars (Baldoni et al. 2009; Mariotti unpublished data).

To predict the entire open reading frame (ORF) of each original transcript, genomic scaffolds containing the target SSR regions were analyzed by Softberry FGENESH (http://linux1.softberry.com) and were then functionally annotated by using the Blast2GO software (Conesa et al. 2005, http://www.blast2go.com). The ExPASy translate tool (Gasteiger et al. 2003) was used to predict the protein sequence, starting from the genomic scaffold containing the validated 26 EST-SSRs. The coding regions were aligned in the NCBI database (BLASTX, non-redundant protein sequences), and those that did not show significant similarity, versus known proteins, were aligned to the nucleotide collection using BLASTN (Standard Nucleotide BLAST). The BLASTX hits were searched on Gene Ontology (GO) terms.

Data analysis

GenAlEx 6.501 (Peakall and Smouse 2012) was used to calculate the number of alleles (Na); number of effective alleles (Ne); Shannon’s information index (I); observed, expected, and unbiased expected heterozygosity (Ho, He, and uHe, respectively); fixation index (F); and genetic differentiation among populations (Fst). To evaluate the ability of new SSRs to assess molecular diversity and their potential use in fingerprinting analyses, the polymorphism information content (PIC) was computed at each locus by using Cervus 3.0.3 (Kalinowski et al. 2007), while FreeNa was applied to estimate the presence of null alleles (Dempster et al. 1977). Hardy-Weinberg equilibrium was estimated with GENEPOP 4.2 (Raymond and Rousset 1995; Rousset 2008) through chi-squared tests. Two-dimensional principal component analysis (PCA) was carried out with MultiVariate Statistical Package (MVSP) version 3.22 (Kovach Computing Services, Anglesey, Wales, UK), starting from a square matrix obtained by GenAlEx. Neighbor-joining (NJ) dendrogram was calculated with the Darwin software version 6 (Perrier and Jacquemoud-Collet 2006) using 10,000 bootstrap replications and 50% cutoff and visualized with FigTree v.1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).
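For readers who want to see what the per-locus indices listed above measure, the following sketch computes them from a list of diploid genotypes using the standard textbook definitions (Ne = 1/Σp², I = −Σp ln p, He = 1 − Σp², uHe = He·2N/(2N − 1), F = 1 − Ho/He, and the PIC formula of Botstein et al.). It is not the GenAlEx or Cervus code; the function name and the four example genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def locus_diversity(genotypes):
    """Diversity indices for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid sample.
    """
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)
    ho = sum(a != b for a, b in genotypes) / n   # observed heterozygosity
    he = 1 - sum_p2                              # expected heterozygosity
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return {
        "Na": len(counts),
        "Ne": 1 / sum_p2,
        "I": -sum(p * log(p) for p in freqs),
        "Ho": ho,
        "He": he,
        "uHe": he * 2 * n / (2 * n - 1),
        "F": 1 - ho / he if he > 0 else float("nan"),
        "PIC": pic,
    }


# Hypothetical allele sizes (bp) for four samples at one trinucleotide locus
sample = [(150, 150), (150, 153), (153, 156), (150, 156)]
for name, value in locus_diversity(sample).items():
    print(name, round(value, 3))
```

A negative F, as in this toy data set, indicates an excess of heterozygotes relative to Hardy-Weinberg expectations, while a positive F indicates a deficit.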

Results

Identification of repeated polymorphic motifs in EST collections

From the EST collections, all kinds of repeats, with motifs from one to n nucleotides, were considered in order to estimate the frequency of each type of repeat along the transcript sequences. Fruit EST transcriptomes, represented by a total of 102,133 unigenes, contained 6637 repeats (1 repeated sequence every 3.45 kb), while flower EST collections were composed of 106,598 unigenes, in which 12,762 total repeats (1 repeat every 3.41 kb) were detected (Table 1). In both cases, a high percentage of mononucleotide motifs was observed. Of the 19,399 repeats identified in total, 1738 had trinucleotide or longer repeated motifs. Only simple sequence repeats with 3 to 6 nucleotide motifs were selected, using the minimal length of SSR repeats (3 × 6 = 18 bp for trinucleotides, 4 × 5 = 20 bp for tetranucleotides, 5 × 4 = 20 bp for pentanucleotides, and 6 × 4 = 24 bp for hexanucleotides). Since the present work aimed to validate EST-SSRs with motifs longer than two bases, mononucleotide and di-nucleotide repeats were not examined.

Table 1 Total repeats derived from fruit and flower EST collections

EST-SSRs screening

From the first marker screening, which was carried out with the MISA software considering polymorphisms among cultivars, 174 potential EST-SSRs were identified (Fig. 1). Selected SSR regions detected in different EST collections but related to the same locus were discarded in order to avoid redundancy. Eighty loci were preselected for primer design and further laboratory analyses, which made it possible to discard those showing amplification drawbacks.

Technical details of new EST-SSR markers

The 26 EST-SSRs finally selected as best markers were named Olive EST (OLEST) SSRs (Fig. 1) and kept as the core set of new markers. Their primer sequences, repeat patterns, and allele sizes are listed in Table 2. Accession numbers of submitted sequences (one allele for each locus), predicted genes including the microsatellite region, and corresponding biological processes are shown in Table 4. Among the selected EST-SSRs, twenty-five contained trinucleotide repeats and only one (OLEST11) had a tetramer motif. The most common microsatellite motif was GAA (15.4%) followed by TGG (11.5%).

Table 2 Primer list, repeat patterns, and allele size for the 26 EST-SSR-developed markers
Table 3 Diversity indices for each locus resulting from the analysis of 32 cultivars

Sequencing analyses of the alleles detected at all 26 new EST-SSR markers, when amplified on 32 olive cultivars, confirmed the amplicon lengths obtained from fragment analysis and highlighted other sequence polymorphisms than those expected. In OLEST1, the CTT motif was interrupted by an ATT; in OLEST7, OLEST14, OLEST22, and OLEST26, some deletions of 10 to 25 bp were observed; OLEST17 included two 3 bp indels; OLEST10 and OLEST24 presented alleles with 3 bp indels (TGA and TTT, respectively). In OLEST26, the allele 306 was distinguished by a 10 bp insertion.

All selected loci were polymorphic in the 32 analyzed cultivated varieties (Tables 3 and S2). Na and Ne per locus were on average 4.65 and 2.81, respectively, with Na ranging from a minimum of 2 for OLEST6 and OLEST11 to a maximum of 10 for OLEST16. Mean values of I, Ho, He, uHe, and F were 1.07, 0.53, 0.56, 0.57, and 0.01, respectively. Ho and He values were very similar at all loci, except for OLEST5, for which the Ho value was considerably lower than He. The frequency of null alleles was high only for OLEST5 and OLEST12, while it was negligible or moderate for the other loci. The chi-squared test, which was applied to detect deviations from Hardy-Weinberg equilibrium, gave significant values for seven out of 26 markers, all characterized by a heterozygote deficit. PIC values ranged from 0.18 (OLEST25) to 0.82 (OLEST14), with an average of 0.51.
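The idea behind the chi-squared test for Hardy-Weinberg equilibrium can be sketched as a plain goodness-of-fit statistic comparing observed genotype counts with those expected from the allele frequencies. This is only an illustration of the principle, not the GENEPOP procedure; the function name and the genotype data are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi2(genotypes):
    """Return (chi-squared statistic, degrees of freedom) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid sample.
    Degrees of freedom are taken as k(k - 1)/2 for k alleles.
    """
    n = len(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freq), 2):
        # Expected count: n * p^2 for homozygotes, n * 2pq for heterozygotes
        expected = n * (freq[a] ** 2 if a == b else 2 * freq[a] * freq[b])
        chi2 += (observed[(a, b)] - expected) ** 2 / expected
    k = len(freq)
    return chi2, k * (k - 1) // 2


# Hypothetical locus with two alleles and no heterozygotes at all
deficit = [(150, 150)] * 20 + [(153, 153)] * 20
print(hwe_chi2(deficit))
```

A complete absence of heterozygotes, as in this toy example, gives a large statistic, which corresponds to the heterozygote deficit pattern reported for the deviating loci.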

Transferability of OLEST SSR markers within the Olea genus

Based on their allele number and discrimination power, ten EST-SSR loci (OLEST1, OLEST7, OLEST9, OLEST12, OLEST14, OLEST15, OLEST16, OLEST20, OLEST22, and OLEST23) were applied to a larger number of cultivars (79) and to other related forms, such as wild plants, related subspecies, and the O. paniculata species, for a total of 90 genotypes (Table S1); new alleles were detected at all observed loci except OLEST23 (Table S3), due to the new variation captured in the larger set of varieties and to the variation specific to related taxa. The number of alleles ranged from two (OLEST6) to ten (OLEST18), with an average of 5.23. The gap between the longest and shortest alleles ranged from a maximum of 57 bp for OLEST1 to 16 bp for OLEST15. In most cases, the alleles private to the related taxa were shorter (OLEST1, OLEST12, and OLEST15) or longer (OLEST9, OLEST14, OLEST20, and OLEST22) than those observed within the cultivated samples.

Most of the EST-SSR markers were amplified correctly, except for OLEST16 and OLEST17. In particular, OLEST16 did not amplify in the O. europaea subspecies (except for subsp. cerasiformis) and in two out of four wild genotypes, while OLEST17 was detectable in the subsp. cuspidata and wild samples. Subsp. maroccana (known to be hexaploid) showed amplification problems for five out of 26 SSRs (OLEST10, OLEST14, OLEST16, OLEST23, and OLEST24). For each EST-SSR, a maximum of two alleles was detected in all accessions, including the subspecies known to be polyploid (maroccana and cerasiformis).

The information index increased in most cases, except for OLEST15, OLEST22, and OLEST23. Values of Ho and He remained quite similar to those found in the restricted set of varieties, with Ho generally lower than expected, especially for OLEST9 and OLEST12, whereas it was higher for OLEST22. By contrast, fixation index values varied considerably, decreasing for OLEST7 and increasing for OLEST9 and OLEST15. Most of the selected markers showed zero or low null allele frequencies, four loci showed moderate values, and none showed high frequencies (higher than 0.2), as indicated by Chapuis and Estoup (2007). GENEPOP highlighted a deviation from Hardy-Weinberg equilibrium (due to heterozygote deficit) for six out of ten loci, and PIC values remained high, comparable to those previously obtained for the 32 cultivars. When the ability of the new SSRs to differentiate between the group of cultivars and the other related taxa, considered as a separate population, was verified, an Fst value of 0.064 was obtained, indicating a moderate genetic differentiation.

By analyzing the entire data set, PCA described about 37% of variance (21.58 and 15.02% for first and second axes, respectively), revealing that wild olives, subspecies, and the O. paniculata samples were grouped apart from all varieties; in particular, the wild plants were clearly separated, lying at the extreme of the graph, far from all other genotypes (Fig. 2).

Fig. 2 Principal component analysis (PCA) carried out by using the whole sample set, analyzed through ten EST-SSRs, and amplified in all genotypes, by using MVSP version 3.22 (Kovach Computing Services, Anglesey, Wales, UK). Coordinates 1 and 2 represent the 21.58 and 15.02% of the total variance, respectively

Moreover, the ten EST-SSRs were also able to fully distinguish the 79 cultivars, except for the two clones of cv. frantoio, as expected. Neighbor-joining analysis grouped them into three main clusters (Fig. 3). Cultivars oueslati, elmacik, and zalmati, which in the multivariate analysis were positioned close to wilds and subspecies, were clustered apart from the other cultivars in the neighbor-joining analysis. Finally, three EST-SSR markers characterized by the highest allele number and PIC values (OLEST1, OLEST14, and OLEST16) were sufficient to distinguish all the 79 cultivars.
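Whether a given subset of markers discriminates all samples can be checked by grouping samples that share an identical multilocus profile, as in the sketch below. The function name, sample names, locus labels, and allele sizes are all hypothetical and are not taken from the study data.

```python
from collections import defaultdict


def indistinguishable_groups(profiles, loci):
    """Return groups of samples sharing the same genotype at every locus.

    profiles: dict mapping sample name -> {locus: (allele_a, allele_b)}
    loci: the subset of loci used for discrimination
    """
    groups = defaultdict(list)
    for name, genotype in profiles.items():
        # Sort alleles so that (150, 153) and (153, 150) compare as equal
        key = tuple(tuple(sorted(genotype[locus])) for locus in loci)
        groups[key].append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical profiles: two clones of one cultivar plus two other cultivars
profiles = {
    "clone_1": {"locA": (150, 153), "locB": (201, 201)},
    "clone_2": {"locA": (153, 150), "locB": (201, 201)},
    "cv_X": {"locA": (150, 153), "locB": (201, 207)},
    "cv_Y": {"locA": (156, 156), "locB": (201, 207)},
}
print(indistinguishable_groups(profiles, ["locA", "locB"]))
print(indistinguishable_groups(profiles, ["locB"]))
```

An empty result means every sample has a unique profile for the chosen loci; clones of the same genotype are expected to remain grouped whatever loci are used.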

Fig. 3 Relationships among the 79 olive cultivars assessed by constructing a neighbor-joining dendrogram and performed with the Darwin software, applying 1000 replicates and bootstrap cutoff of 50%. Starting data set was represented by the ten best EST-SSRs

When the data of the ten best di-nucleotide SSR markers were considered for the same cultivar sample set, they showed a higher mean number of alleles and slightly higher values for I, Ho, He, uHe, and PIC, with a lower frequency of null alleles (Table S4).

Functional annotation

A BLAST search in GenBank, performed for the 26 OLEST SSRs, revealed high identities for many EST-SSRs. Considering the low level of information available for these EST-SSRs, the markers were also functionally annotated. Of the 26 unigenes, 18 showed similarity to known functional proteins and eight remained unknown (Table 4). In particular, the predicted gene related to OLEST21 showed 73% identity with OesDHN (a dehydrin gene isolated and characterized from an oleaster, Chiappetta et al. 2015), and in silico annotation showed an insertion of 107 bp. Moreover, OLEST11 and OLEST13 were related to cysteine-type endopeptidase inhibitor activity and defense responses, respectively; OLEST18 is involved in the biological process of proteolysis, and OLEST25 is related to an ethylene-activated signaling pathway.

Table 4 GenBank accession numbers (AN) of new EST-SSRs, related predicted genes, and putative biological process

Discussion

In the last decades, the development of more effective molecular markers has been promoted for genotyping purposes in several fruit crop species (Guo et al. 2014; Jiao et al. 2014; Sun et al. 2015), and especially for the olive (Biton et al. 2015; Dominguez-Garcia et al. 2012; Kaya et al. 2013; Torkzaban et al. 2015). Despite the availability of a large set of molecular tools, olive fingerprinting still remains a difficult task, mainly due to the several weaknesses of most widely used markers (Bracci et al. 2011) that include the lack of sequence information, the difficulties in distinguishing among alleles, and the impossibility to share data among labs.

The development of highly reliable markers, which are potentially linked to traits of interest, may improve theoretical and applied research on genotyping, mapping, and marker-assisted breeding (Cubry et al. 2014; El-Rodeny et al. 2014; Liu et al. 2014; Kalia et al. 2011; Ramchiary et al. 2011; Shirasawa et al. 2011; Zhang et al. 2014; Zhao et al. 2012).

In order to identify new and effective SSR markers for olive, a deep screening of fruit and flower EST collections has been performed. In silico analysis and laboratory procedures made it possible to validate 26 promising OLEST SSRs, discarding markers characterized by inadequate flanking regions, multiple calls, and uncertain amplifications. Twenty-five of the selected EST-SSRs showed trinucleotide repeated motifs and only one was a tetranucleotide, likely reflecting the distribution of repeat regions among transcribed and repeated DNAs in many other plants (Morgante et al. 2002; Varshney et al. 2005). The absence of longer repeated patterns within the selected set probably depends on their low frequency (288 long repeats found in the EST collections versus 1450 trinucleotidic patterns) and their low polymorphism level, as already observed in other plant species (Boccacci et al. 2015; Long et al. 2015; You et al. 2015).

Tandem repeats occurring in coding regions can result in the variation of the polynucleotide sequence, thereby causing changes in protein products. In fact, many human diseases have been reported to be associated with trinucleotide repeats (Duran et al. 2013; King 2012). No clear information is yet available about how EST-SSRs in plant genomes can change the function of genes or their expression rate (Asadi and Monfared 2014).

Variations in EST allele sequences are more strictly related to protein function, since they may be responsible for important changes such as shifts in reading frame, unexpected stop codons, or different protein lengths and structures, as detected in other species (Qi et al. 2010; Emebiri 2010). The presence of variations in the coding regions increases the functional value of these new EST markers, which are mainly involved in important biological processes, such as defense and response to abiotic and biotic stresses. For example, the sequence similarity detected between the predicted dehydrin from the OLEST21 locus and the OesDHN dehydrin gene (Chiappetta et al. 2015), involved in drought tolerance, supports the potential of the OLEST SSRs as functional markers. Further studies will help to clarify their potential role as markers linked to traits of main interest.

Their capacity to capture olive variability was first tested on a restricted set of olive cultivars, representative of the wide variability of cultivated olive (Belaj et al. 2012; El Bakkali et al. 2013), and then extended to a larger set of cultivars and other related forms, in order to test their capacity to capture molecular variation at a wider scale. Their mean values of polymorphism information content fall in an intermediate class (Xie et al. 2010), highlighting promising levels of variability, as also confirmed by their observed heterozygosity.

Based on the main genetic diversity indices (Na, Ho, and PIC), ten OLEST markers were further selected within the initial set as the best candidates to assess genetic variability over a high number of varieties and wild-related forms. In fact, only three of the new SSR markers (OLEST1, OLEST14, and OLEST16) may be sufficient to discriminate all the analyzed genotypes. On the other hand, even the less polymorphic EST-derived markers, which highlight the presence of private or rare alleles, could find useful application and represent a valuable resource for genotyping, for the detection of specific cultivars, or for DNA testing of olive oils obtained by blending different varieties.

The transferability of the 26 new OLEST SSR markers developed on cultivated olives was assessed on related taxa within the Olea genus, including wild plants (O. europaea subsp. europaea var. sylvestris), O. europaea subspecies, and related Olea species, as previously tested for other markers (Besnard et al. 2011; Besnard and El Bakkali 2014; Rallo et al. 2003). Most of them were easily amplifiable and detectable across all genotypes, and just a few showed problems of amplification or scoring, probably due to polymorphisms (insertions/deletions or base mutations) in primer regions or, regarding subsp. maroccana and cerasiformis, to ploidy level. However, the majority of the OLEST SSR markers proved highly suitable for discriminating among related Olea taxa, clearly distinguishing olive varieties from related taxa (wilds, subspecies, and subgenera), and could be applied as useful tools to investigate comparative genomics, genetic differentiation, and evolutionary dynamics within the Olea genus (Ma et al. 2010; Qiu et al. 2010; Varshney et al. 2005).

Compared to the ten best-ranked di-nucleotide SSRs (Baldoni et al. 2009), the ten most informative OLEST-SSRs showed a lower variability and a higher frequency of null alleles, probably because EST-derived SSRs are associated with transcribed regions and thus reflect a lower genetic variability than genomic, neutral, and randomly selected markers such as di-nucleotide SSRs (Hu et al. 2011; Leonarduzzi et al. 2016). However, both kinds of markers are well suited to characterizing olive variability, and their combined application could efficiently contribute to exploring the olive germplasm and to resolving the variety identification problems arising from the use of di-nucleotide markers. The best primer sequences and amplification protocols for genotyping with these new markers have been released, in order to allow for a clear discrimination among genetic profiles and the use of multiplexing strategies.

The 26 EST-SSR markers developed in this work represent new and particularly useful tools for olive genotyping. In particular, ten OLEST markers proved to be highly polymorphic and effective in discriminating olive cultivars and are proposed as a new set of markers for discriminating genotypes within the Olea genus.