Introduction

Vitis vinifera L., one of the most widely cultivated species of agricultural and economic interest (Vivier and Pretorius 2002), is of Indo-European origin, and its distribution area extends from Central Asia to the Mediterranean Basin (Zohary et al. 2000). Over the centuries, different events (such as domestication and outbreeding) have significantly shaped the large genetic pool of grapevine cultivars around the world, making grapevine one of the most heterozygous species, with deletions, insertions, inversions, and single nucleotide polymorphisms in its genome (This et al. 2006; Jaillon et al. 2007; Velasco et al. 2007). The accumulation of spontaneous mutations and natural or artificial crossing have driven grapevine evolution since its domestication (This et al. 2006; Forni 2012). Mutations can arise in the shoot apical meristem layers and have different fates: (1) when they occur in somatic cell lines, they are preserved by asexual propagation, or (2) when they arise in germinal cell lines, they are inherited by the progeny through sexual reproduction (D’Amato 1997). Because grapevine is propagated mainly through cuttings, somatic mutations can accumulate over time, generating genetic variation and giving rise to new cultivars and clones (This et al. 2006). The resulting polymorphisms are often detectable in the genome as single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs), and several examples have already been reported. The insertion of the Gret1 retrotransposon in the VvMybA1 promoter region is one such example, detected in cultivars of the Pinot family that differ in berry color (Kobayashi et al. 2004; Yakushiji et al. 2006; Vezzulli et al. 2012). The Chardonnay musqué clone 44–60 Dijon, which differs from its mother clone by a SNP in the candidate gene for muscat flavor (Emanuelli et al. 2010), and Carignan, whose clones with different cluster shape harbor an insertion of the Hatvine1-rrm transposable element in the VvTFL1A promoter (Fernandez et al. 2010), further confirm these phenomena. Owing to this wide genetic variability, clonal selection is the most common method of grapevine improvement: it exploits intra-varietal genetic diversity to identify high-performing clones in a specific environment, which are then officially registered, propagated, and distributed to the viticulture market.

To detect polymorphisms and distinguish among grapevine cultivars, different molecular markers have been used, such as RAPD (Moreno et al. 1995), AFLP (Cervera et al. 1998), SSR (Bowers et al. 1996; Sefc et al. 1999), ISSR (Moreno et al. 1998), S-SAP (D’Onofrio et al. 2010), REMAP (Castro et al. 2012), and SAMPL (Cretazzo et al. 2010). Microsatellites, or simple sequence repeats (SSRs), are highly reproducible and, being codominant and multiallelic, highly informative; they have played a predominant role in evaluating genetic diversity in several crops, such as maize (Reif et al. 2006), rice (Thomson et al. 2007), wheat (Laidò et al. 2013), peach (Dirlewanger et al. 2002), olive (Cipriani et al. 2002), and citrus (Barkley et al. 2006). In grapevine as well, genotyping has been based mainly on SSRs, which have proved useful for cultivar identification, for establishing relationships among cultivars, synonyms, and homonyms (Cipriani et al. 2010; Laucou et al. 2011; Emanuelli et al. 2013), and for parentage analysis (Lacombe et al. 2013). However, SSRs are of limited help for clonal selection and identification, as they cannot always discriminate among clones/biotypes of the same cultivar (González Techera et al. 2004; Pelsy et al. 2010). Thus, also considering that SSR detection is time-consuming and costly, other markers need to be developed to detect somatic mutations and clonal variation.

The advent of high-throughput next-generation sequencing (NGS) technologies, which sequence entire genomes more efficiently, has enabled large-scale SNP identification in crops and the development of efficient SNP genotyping platforms (Schmid et al. 2003; Dereeper et al. 2011; Chagné et al. 2012; Ganal et al. 2012; Peace et al. 2012; Verde et al. 2012; Gardner et al. 2014; Yu et al. 2014; Tayeh et al. 2015; Jiang et al. 2016; Melo et al. 2016). The main advantages of SNP markers are their abundance in the genome, their ability to reveal variation at the single-base level, and their simple bi-allelic nature. In addition, high-throughput multiplexed SNP assays are very useful tools for evaluating genome-wide allelic variation for genetic diversity, population structure, and parentage analysis in crops. Although the polymorphism information content (PIC) of SNPs is lower than that of SSRs, the large number of SNPs identifiable in the genome and their reproducibility make them ideal for developing marker panels for genetic variation studies and cultivar identification (Myles et al. 2011; Sim et al. 2012; Mercati et al. 2015; Winfield et al. 2015; Kurokawa et al. 2016).
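The lower PIC of SNPs follows from the standard PIC formula, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², which for a bi-allelic locus cannot exceed 0.375. The minimal Python sketch below evaluates the formula for a SNP with equal allele frequencies and for a hypothetical SSR locus with eight equally frequent alleles; the function name and the allele frequencies are illustrative assumptions, not data from this study.

```python
from itertools import combinations

# Illustrative sketch: the function name and allele frequencies are hypothetical.


def pic(freqs):
    """Polymorphism information content of one locus.

    freqs: allele frequencies (must sum to 1).
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


snp_max = pic([0.5, 0.5])       # best case for a bi-allelic SNP
ssr_example = pic([1 / 8] * 8)  # hypothetical SSR with 8 equally frequent alleles
print(round(snp_max, 3), round(ssr_example, 3))  # 0.375 0.861
```

Even at its theoretical maximum a single SNP is far less informative than a multiallelic SSR, which is why SNP panels compensate with a much larger number of loci.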

The grapevine genome sequence has been available since 2007 (Jaillon et al. 2007; Velasco et al. 2007), and large-scale SNP discovery and genotyping have been reported (Lijavetzky et al. 2007; Pindo et al. 2008), leading to the identification of more than four hundred thousand SNPs, a subset of which was validated on a 9K genotyping array (Myles et al. 2010, 2011). Exploiting SNP frequency and discrimination power, informative SNP sets were also identified for cultivar identification (Cabezas et al. 2011) and for clonal variation studies (Carrier et al. 2012). More recently, within a large-scale grape genome resequencing effort (GrapeReSeq Consortium), a high-throughput 18K SNP genotyping chip was developed (Le Paslier et al. 2013).

Among European grapevine regions, Sicily has been devoted to viticulture since ancient times and is characterized by a rich ampelographic heritage including cultivars of local, national, and international interest. This genetic diversity has probably been preserved by the traditional technique of on-field grafting, in contrast to the widespread use, in other European grapevine regions, of pre-grafted, certified cultivars derived from a narrow genetic pool. In the past decade, more than 3000 accessions were collected around the island and included in a clonal selection program. Based on morphological traits (48 Organisation Internationale de la Vigne et du Vin (OIV) descriptors used in the GrapeGen06 European project; www.eu-vitis.de/index.php) (Maul et al. 2012), several plants from different biotypes of each cultivar were selected (Maitti et al. 2009). In viticulture, the term biotype refers to morphological variation within a cultivar. Single plants or different clonal lines are classified in the same biotype if they share a similar phenotypic expression of morphological traits, such as bunches and/or leaves, that differs slightly from the most frequent phenotype of the cultivar. This morphological variability is usually attributed to the long-term cultivation of a variety in a specific geographical area (Campostrini et al. 1995). These morphological traits affect the qualitative characteristics of the resulting grapes and musts. A typical example is the different oenological aptitude of Pinot noir clones: biotypes with large berries are used to produce sparkling wine, whereas those with small berries are used for red wine, as reported in the catalogue of vines grown in France (http://plantgrape.plantnet-project.org). Within the Sicilian clonal selection program, the genetic variability among the main cultivars was also assessed using a selected SSR panel (Carimi et al. 2010, 2011; De Lorenzis et al. 2014).

In the present study, a grapevine core collection from Sicily was characterized with the high-throughput Vitis18kSNP genotyping array to assess the genetic relationships among cultivars and to discriminate among biotypes of the same cultivar. Our results clearly distinguished the Sicilian cultivars and provided additional information on their genetic relationships through parentage analysis. In some cases, SNP genotyping also appeared able to differentiate among biotypes of the same cultivar. Furthermore, the present study allowed us to isolate a small panel of 12 highly informative SNPs that may become a rapid diagnostic tool for Sicilian cultivar identification.

Material and methods

Plant material

A panel of 101 samples from 21 biotypes belonging to 10 Sicilian grapevine cultivars, included in the grapevine collection of Regione Sicilia (Marsala, Italy), was analyzed. Based on the ampelographic traits recorded for each cultivar in the experimental field, from 1 to 3 biotypes per cultivar were considered (Table 1 and supplementary file, Table S1). For each cultivar, clones already registered in the National Register of Grapevine Cultivars (http://catalogoviti.politicheagricole.it/) were included in the panel. Catarratto Bianco Comune and Catarratto Bianco Lucido are registered in the National Catalogue as distinct cultivars, even though they show an identical genetic profile at 11 SSR loci (Crespan et al. 2008). The former corresponds to Catarratto biotypes A and B and the latter to biotype C. Pinot Noir and Sangiovese were included in the analysis as international and national reference varieties, respectively.

Table 1 List of the major Sicilian V. vinifera cultivars analyzed by Vitis18kSNP array

Ampelographic analysis and SSR genotyping

Ten plants were cloned from each of the 21 biotypes (belonging to 10 Sicilian cultivars) and used for the ampelographic analysis to assess intra-cultivar variability (Table 1). Forty-two ampelographic traits (supplementary file, Table S2), related to the young shoot, shoot, young and mature leaf, woody shoot, bunch, and berry, were recorded as specified by the OIV (http://www.oiv.int/) during the spring-summer seasons of 2011 and 2012. Observations were carried out at different times during the growing season, for example at flowering for OIV 1 or between berry set and véraison for OIV 65, as reported in the second edition of the OIV Descriptor List for Grape Varieties and Vitis Species (http://www.oiv.int/en/technical-standards-and-documents/description-grape-varieties/oiv-descriptor-list-grape-varieties-and-vitis-species-2nd-edition).

A detailed description of each trait and its expression is reported at http://www.eu-vitis.de/docs/descriptors/mcpd/WP2-DESCRIPTORS-v4.pdf. Finally, a heatmap describing the variation of the OIV descriptors was generated with the ggplot2 R package (https://cran.r-project.org/web/packages/ggplot2/index.html). Each descriptor was recorded on a 1–9 scale, and different colors and gradients were associated with the scale values and combinations of each category.

All 101 clones and 2 reference cultivars were analyzed with 9 SSRs (VrZag62, VrZag79, VVMD5, VVMD7, VVMD25, VVMD27, VVMD28, VVMD32, VVS2) (Thomas and Scott 1993; Bowers et al. 1996, 1999a; Sefc et al. 1999), suggested as a standard set for grapevine genotyping within the GrapeGen06 European project, to assess the common genetic background of biotypes belonging to the same cultivar. Genomic DNA was extracted from 0.1 g of young leaf tissue (1–2 cm in diameter) using the Qiagen DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). DNA quality (260/230 and 260/280 ratios) and concentration were checked with a NanoDrop spectrophotometer (Thermo Scientific, Waltham, MA). Multiplexed amplification reactions were performed in a 25 μl final reaction volume as described in De Lorenzis et al. (2013). The amplification products were separated on an ABI PRISM 310 Genetic Analyser (Applied Biosystems by Life Technologies, Foster City, USA), and alleles were sized with GENEMAPPER 4.0 (Applied Biosystems by Life Technologies). Pinot Noir and Sangiovese were included as reference varieties for allele standardization.

SNP array analysis and reproducibility

The whole panel (101 Sicilian grapevine genotypes and 2 reference varieties) was genotyped with the custom Vitis18kSNP array (Illumina Inc., San Diego, CA), designed by the GrapeReSeq Consortium, which assays 18,071 SNPs (Le Paslier et al. 2013). DNA samples, extracted as described above, were sent to Fondazione Edmund Mach (San Michele all’Adige, Trento, Italy) and to TraitGenetics GmbH (Gatersleben, Salzlandkreis, Germany) for genotyping. Two hundred nanograms of genomic DNA were used as template for each reaction, following the manufacturer’s instructions (Illumina Inc.). Because genotypes were processed by two different service providers (Fondazione Edmund Mach and TraitGenetics GmbH), one sample per biotype from eight of the ten Sicilian cultivars was genotyped twice (once in each laboratory), starting from two independent DNA extractions, to measure the reproducibility of the Vitis18kSNP assay. Differences between duplicated SNP profiles were evaluated, and cluster analysis was performed with the unweighted pair-group method with arithmetic averages (UPGMA) in Molecular Evolutionary Genetics Analysis (MEGA) version 5 (Tamura et al. 2011) to establish the reproducibility threshold of our system. The percentage of genetic similarity between the replicates of each biotype was determined by cluster analysis, and the lowest similarity value grouping the replicates of the same biotype was used as the reproducibility threshold (Okitsu et al. 2013).

Reproducibility was high, with a threshold above 99 %. Among cultivars, Inzolia showed the highest percentage of discordant loci (0.36 %, 41 SNP loci) and Carricante the lowest (0.12 %, 14 SNP loci; Table S3). These results confirmed the reliability of the data produced.
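The replicate comparison can be illustrated with a minimal Python sketch. It assumes genotype calls coded as "AA", "AB", "BB" (None for a no-call), skips loci not called in both replicates, and, as a simplification of the UPGMA-based procedure described above, takes the lowest pairwise similarity among replicate pairs as the threshold. The function names and toy profiles are hypothetical.

```python
# Illustrative sketch: function names and toy profiles are hypothetical.


def compare_replicates(rep1, rep2):
    """Compare two SNP profiles of the same biotype genotyped in two laboratories.

    rep1, rep2: genotype calls ("AA", "AB", "BB"); None marks a no-call.
    Returns (loci compared, discordant loci, percent similarity).
    """
    if len(rep1) != len(rep2):
        raise ValueError("both profiles must cover the same SNP loci")
    compared = discordant = 0
    for g1, g2 in zip(rep1, rep2):
        if g1 is None or g2 is None:
            continue  # a no-call in either replicate is not informative
        compared += 1
        if g1 != g2:
            discordant += 1
    if compared == 0:
        raise ValueError("no locus was called in both replicates")
    similarity = 100.0 * (compared - discordant) / compared
    return compared, discordant, similarity


def reproducibility_threshold(replicate_pairs):
    """Lowest percent similarity observed across all replicate pairs."""
    return min(compare_replicates(a, b)[2] for a, b in replicate_pairs)


# Toy example with hypothetical profiles
lab1 = ["AA", "AB", "BB", "AB", None]
lab2 = ["AA", "AB", "AB", "AB", "AA"]
print(compare_replicates(lab1, lab2))  # (4, 1, 75.0)
```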

SNP data processing and population structure analysis

Raw SNP data were scored with the Genotyping Module 1.9.4 of the GenomeStudio Data Analysis V2011.1 software (Illumina Inc.). The dataset was filtered and standardized using the following criteria: (i) samples with low SNP call quality (p50GC < 0.54) were removed; (ii) only SNPs with a GenTrain score higher than 0.6 were retained; (iii) monomorphic SNPs were removed; and (iv) SNPs with more than 20 % no-calls (NCs) were deleted. For all analyses, SNPs with minor allele frequency (MAF) < 0.05 and missing rate > 0.20 were removed.
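The filtering criteria above can be expressed as a short pipeline. The Python sketch below is not GenomeStudio code: it assumes that per-sample p50GC values, per-SNP GenTrain scores, and genotypes coded as B-allele dosages (0, 1, 2, or None for a no-call) have already been exported into plain Python structures, and the function name and toy data are hypothetical.

```python
# Illustrative sketch: the function name, data layout, and toy data are hypothetical.


def filter_dataset(genotypes, p50gc, gentrain, min_p50gc=0.54,
                   min_gentrain=0.6, max_missing=0.20, min_maf=0.05):
    """Apply the sample- and SNP-level filters described in the text.

    genotypes: dict sample -> list of B-allele dosages (0, 1, 2) or None per SNP
    p50gc:     dict sample -> p50GC call-quality value
    gentrain:  list with one GenTrain score per SNP
    Returns (kept sample names, kept SNP indices).
    """
    # (i) remove samples with low call quality (p50GC < 0.54)
    samples = [s for s in genotypes if p50gc[s] >= min_p50gc]
    if not samples:
        return [], []
    kept_snps = []
    for j, score in enumerate(gentrain):
        # (ii) retain only SNPs with GenTrain score > 0.6
        if score <= min_gentrain:
            continue
        calls = [genotypes[s][j] for s in samples]
        called = [c for c in calls if c is not None]
        # (iv) drop SNPs with > 20 % no-calls (also the missing-rate filter)
        if not called or (len(calls) - len(called)) / len(calls) > max_missing:
            continue
        p = sum(called) / (2 * len(called))  # B-allele frequency
        maf = min(p, 1 - p)
        # (iii) monomorphic SNPs have MAF = 0; SNPs with MAF < 0.05 are also dropped
        if maf == 0 or maf < min_maf:
            continue
        kept_snps.append(j)
    return samples, kept_snps


genotypes = {
    "s1": [0, 1, 2, 0, None],
    "s2": [0, 1, 2, 1, 0],
    "s3": [0, 2, 1, None, 0],
    "bad": [1, 1, 1, 1, 1],
}
p50gc = {"s1": 0.60, "s2": 0.70, "s3": 0.55, "bad": 0.40}
gentrain = [0.8, 0.5, 0.9, 0.7, 0.9]
print(filter_dataset(genotypes, p50gc, gentrain))  # (['s1', 's2', 's3'], [2])
```

In the toy data, SNP 0 is monomorphic, SNP 1 fails the GenTrain filter, and SNPs 3 and 4 each have one no-call out of three samples (33 %), so only SNP 2 survives.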

The main genetic parameters describing population genetic diversity, including observed (Ho) and expected (He) heterozygosity (Nei 1973), MAF, and the inbreeding coefficient (F), were computed with PEAS 1.0 software (Xu et al. 2010).
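PEAS computes these statistics internally. The Python sketch below only illustrates the standard definitions for bi-allelic SNPs (He = 1 − p² − q² = 2pq, Ho = proportion of heterozygotes), with the panel-wide F taken as 1 − mean(Ho)/mean(He). Whether PEAS averages in exactly this way is an assumption, but the summary values reported in Table 2 are consistent with it: 1 − 0.313/0.284 ≈ −0.102. Function names and toy data are hypothetical.

```python
# Illustrative sketch: function names and toy data are hypothetical.


def locus_stats(dosages):
    """Observed/expected heterozygosity and MAF for one bi-allelic SNP.

    dosages: B-allele counts (0, 1, 2) per individual; None = missing.
    """
    called = [d for d in dosages if d is not None]
    n = len(called)
    p = sum(called) / (2 * n)            # B-allele frequency
    q = 1 - p
    he = 1 - p ** 2 - q ** 2             # Nei's gene diversity (= 2pq)
    ho = sum(1 for d in called if d == 1) / n
    return ho, he, min(p, q)


def summarize(loci):
    """Mean Ho, He, MAF over loci, and F = 1 - mean(Ho) / mean(He)."""
    stats = [locus_stats(locus) for locus in loci]
    k = len(stats)
    mean_ho = sum(s[0] for s in stats) / k
    mean_he = sum(s[1] for s in stats) / k
    mean_maf = sum(s[2] for s in stats) / k
    f = 1 - mean_ho / mean_he            # F < 0 means excess heterozygosity
    return mean_ho, mean_he, mean_maf, f


# One all-heterozygous locus and one all-homozygous locus balance out (F = 0)
print(summarize([[1, 1, 1, 1], [0, 0, 2, 2]]))  # (0.5, 0.5, 0.5, 0.0)
```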

The SNP dataset was used to investigate the genetic relationships among biotypes of ten major Sicilian cultivars by both principal coordinates analysis (PCoA) and cluster analysis. PCoA was performed with SNPRelate, an R package for large-scale calculations (Zheng et al. 2012). To avoid the effect of large clusters of SNPs in linkage disequilibrium (LD), an LD-pruned SNP set was first selected using an LD threshold of 0.2. The pruned SNP set was then analyzed with the snpgdsPCA function in SNPRelate. A phylogenetic tree was constructed with the UPGMA method implemented in MEGA 5.0 (Tamura et al. 2011), and bootstrap analysis was performed with 100 resamplings.
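In SNPRelate, LD pruning is normally done with the snpgdsLDpruning function before snpgdsPCA. The Python sketch below is not SNPRelate's algorithm; it is a simplified greedy pruning that keeps a SNP only if its absolute Pearson correlation on allele dosages with every previously kept SNP on the same chromosome, within a window, is at most 0.2. The 500 kb window, the use of dosage correlation instead of SNPRelate's LD estimators, and the function names are assumptions for illustration.

```python
from math import sqrt

# Illustrative sketch: window size, LD measure, and function names are assumptions.


def pearson_r(x, y):
    """Pearson correlation between two dosage vectors (no missing values)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy / sqrt(sxx * syy)


def ld_prune(snps, threshold=0.2, window_bp=500_000):
    """Greedy LD pruning.

    snps: list of (chromosome, position, dosages), sorted by chromosome and position.
    A SNP is kept only if |r| <= threshold with every already-kept SNP on the
    same chromosome within window_bp. Returns the indices of kept SNPs.
    """
    kept = []
    for i, (chrom, pos, dosages) in enumerate(snps):
        in_ld = False
        # walk back through kept SNPs, nearest first
        for k in reversed(kept):
            k_chrom, k_pos, k_dosages = snps[k]
            if k_chrom != chrom or pos - k_pos > window_bp:
                break  # all earlier kept SNPs are even farther away
            if abs(pearson_r(dosages, k_dosages)) > threshold:
                in_ld = True
                break
        if not in_ld:
            kept.append(i)
    return kept


toy = [
    ("1", 100, [0, 1, 2, 0, 1, 2]),
    ("1", 200, [0, 1, 2, 0, 1, 2]),      # perfect LD with the first SNP: pruned
    ("1", 300, [1, 2, 1, 1, 0, 1]),      # uncorrelated: kept
    ("1", 900_000, [0, 1, 2, 0, 1, 2]),  # outside the window: kept
    ("2", 100, [0, 1, 2, 0, 1, 2]),      # other chromosome: kept
]
print(ld_prune(toy))  # [0, 2, 3, 4]
```

The pruned set would then be the input for the principal component decomposition.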

Population structure analysis (Pritchard et al. 2000), a Bayesian approach that infers the correlation among genotypes under an admixture model, was performed with the fastSTRUCTURE package (Raj et al. 2014). Individuals were assigned to K populations/genetic clusters based on their multilocus profiles. Genetic clusters were assembled to minimize intra-cluster LD, and the proportion of membership of each genotype was estimated. The admixture model without prior population information was employed. Values of K from 1 to 10 were tested, and the most likely K was chosen by running the algorithm for multiple choices of K (Raj et al. 2014). For each run, the initial burn-in period was set to 50,000 iterations with 500,000 Markov chain Monte Carlo (MCMC) replications. The admixture proportions for the most likely K were visualized with DISTRUCT software (Rosenberg 2004).

Highly informative SNPs for varietal identification were also sought. The R package Genome Association and Prediction Integrated Tool (GAPIT) (Lipka et al. 2012) was used with default parameters to identify private SNP profiles for the cultivars analyzed. For the GAPIT analysis, 12 categories (one for each cultivar) were assigned based on SSR profiles and ampelographic data. A mixed linear model (MLM) approach (Yu et al. 2006; Zhang et al. 2010) was adopted, and a kinship matrix was calculated. Multiple testing was addressed with the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995), controlling the false discovery rate (FDR) at 0.05, and missing data were imputed by major allele substitution. To identify specific SNP profiles related to the germplasm studied, each SNP was tested in turn with an F test (H0: no association between the SNP profile and cultivar), and P values were obtained (Lipka et al. 2012). The selected SNPs were randomly verified by Sanger sequencing following standard protocols, to confirm the marker profiles identified by GAPIT. PCR was carried out in a 20 μl volume containing 50 ng of genomic DNA, 1× PCR buffer (Promega), 0.2 mM of each dNTP (Roche), 0.2 units of Taq DNA polymerase (Promega), and 0.20 μM of each primer of the specific pair. The cycling program was 94 °C for 5 min; 30 cycles of 94 °C for 30 s, 60 °C for 30 s, and 72 °C for 30 s; and a final extension at 72 °C for 10 min. The fragments were then resolved on an ABI PRISM 310 Genetic Analyser (Applied Biosystems by Life Technologies, Foster City, USA). The primer pairs are listed in the supplementary file (Table S4).
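GAPIT applies the FDR correction internally; the Python sketch below only illustrates the Benjamini-Hochberg step-up procedure itself, rejecting every hypothesis up to the largest rank k for which p_(k) ≤ (k/m)·FDR. The function name and p-values are hypothetical.

```python
# Illustrative sketch: the function name and p-values are hypothetical.


def benjamini_hochberg(pvalues, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per p-value: True if the test is declared significant
    while controlling the false discovery rate at `fdr`.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ranks by ascending p
    # largest rank k such that p_(k) <= (k / m) * fdr
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            k_max = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        significant[i] = rank <= k_max  # reject all hypotheses up to rank k_max
    return significant


print(benjamini_hochberg([0.01, 0.045, 0.025, 0.005, 0.5]))
# [True, False, True, True, False]
```

Because the procedure steps up, a p-value that misses its own rank threshold can still be rejected: with p = [0.03, 0.04], 0.03 exceeds 0.025 at rank 1, but 0.04 ≤ 0.05 at rank 2, so both tests are significant.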

Parentage analysis

The identity-by-descent (IBD) index (the probability that two genotypes descend from a single ancestral genotype rather than being identical by chance) was computed for each pair of genotypes with PLINK 1.07 (Purcell et al. 2007) to infer relationships among non-redundant individuals. The filtered SNP dataset was used, and the most frequent SNP profile of each cultivar was chosen for parentage analysis. MAF and LD (r²) thresholds were set to 0.01 and 0.05, respectively. Four parameters were considered: Z0 (probability of sharing 0 alleles IBD), Z1 (probability of sharing 1 allele IBD), Z2 (probability of sharing 2 alleles IBD), and PI-HAT, the relatedness measure computed as PI-HAT = P(IBD = 2) + 0.5 × P(IBD = 1). In parent-offspring pairs, Z0 and Z2 are expected to be 0 and Z1 to be 1 (PI-HAT = 0.5), while in second-degree pairs, Z0 and Z1 are expected to be 0.5 and Z2 to be 0 (PI-HAT = 0.25). Therefore, pairs of genotypes with a PI-HAT value close to 0.5 are related at the first degree or closer. From the PLINK results, a circular plot was drawn reporting the first- and second-degree relationships among varieties.
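PLINK reports Z0, Z1, Z2, and PI_HAT for each pair in its IBD output. The Python sketch below recomputes PI-HAT from Z1 and Z2 and assigns a pair to the relationship whose expected (Z0, Z1, Z2) vector is nearest. The nearest-expected-vector rule, the "unrelated" (1, 0, 0) and "identical/clone" (0, 0, 1) reference vectors, and the function names are illustrative assumptions, not the criterion used by PLINK or in this study.

```python
# Illustrative sketch: classification rule, categories, and names are assumptions.

# Expected (Z0, Z1, Z2) for reference relationship categories
EXPECTED_Z = {
    "identical/clone": (0.0, 0.0, 1.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "second-degree": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def pi_hat(z1, z2):
    """PI-HAT = P(IBD = 2) + 0.5 * P(IBD = 1)."""
    return z2 + 0.5 * z1


def classify_pair(z0, z1, z2):
    """Return the category whose expected (Z0, Z1, Z2) is nearest (squared Euclidean)."""
    observed = (z0, z1, z2)

    def distance(category):
        return sum((o - e) ** 2 for o, e in zip(observed, EXPECTED_Z[category]))

    return min(EXPECTED_Z, key=distance)


print(classify_pair(0.02, 0.95, 0.03), pi_hat(0.95, 0.03))
```

A pair whose values fall midway between two expected vectors, as reported for Carricante-Sangiovese, would be assigned arbitrarily by such a rule, so borderline pairs are better inspected individually.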

Results

Ampelographic analysis

Morphological traits were scored twice, during the spring-summer seasons of 2011 and 2012, recording 42 of the 48 OIV descriptors suggested by the European GrapeGen06 project (Maul et al. 2012) on ten cloned plants of each biotype. A detailed description of the 42 OIV descriptors, reporting the main traits discriminating among biotypes, is included in the supplementary file (Table S2). For easy viewing of the ampelographic results, a heatmap representing the expression level of each OIV descriptor in each biotype is provided (Fig. 1). The biotypes clustered according to their cultivar, and differences among biotypes of the same cultivar were also visible, although they concerned only a few descriptors. Descriptors OIV 6, 51, 65, 67, 68, 75, 76, 83–1, 84, 93, 94, 101, 202, 204, 206, 208, 220, 221, 223, and 241 clearly distinguished the cultivars, whereas discrimination among biotypes of the same cultivar was much less evident (Fig. 1, Table S2). The descriptors with the largest differences in expression among biotypes of the same cultivar were OIV 202 (length of bunch), ranging from 3 (short) to 9 (very long), and OIV 204 (density of bunch), ranging from 3 (loose) to 9 (very dense) across samples. Catarratto showed the highest number of discriminating traits: 14 OIV descriptors distinguished biotype C from biotypes A and B. The other cultivars showed fewer differing ampelographic traits, ranging from 1 (Grecanico) to 4 (Inzolia and Nero d’Avola), mainly related to bunch morphology (Fig. 1).

Fig. 1

Forty-two OIV descriptors recorded for each biotype of ten Sicilian grapevine cultivars. The expression level of OIV descriptors has been represented by a heatmap. Different colors and gradients represent the categories for each descriptor reported in detail in Table S2

SSR-based true-to-typeness cultivar determination

Nine SSR markers selected within the European GrapeGen06 project were used to establish the genetic profile of each biotype and confirm its assignment to its cultivar. Twelve SSR profiles were obtained, one for each cultivar, including the two reference cultivars, Pinot Noir and Sangiovese (supplementary file, Table S5). Biotypes of the same cultivar shared the same SSR profile, confirming their common background. The true-to-typeness of each cultivar was determined by comparison with publicly available standardized SSR profiles (Italian Vitis Database (IVD), http://www.vitisdb.it; Vitis International Variety Catalogue (VIVC), http://www.vivc.de/index.php).

SNP analysis and genetic relationship assessment among cultivars and biotypes

The genetic relationships among the main Sicilian grapevine cultivars and intra-varietal genetic variation were investigated in more depth with the high-throughput Vitis18kSNP array. Inspection of the SNP dataset showed that 554 loci (3 %) did not amplify in any genotype and that 14,794 loci (82 %) had a GenTrain (GT) score higher than 0.6 (Table 2). After removing SNPs with more than 20 % no-calls (NC), the final dataset comprised 14,755 of the 18,071 loci. The 11,411 polymorphic loci represented about 77 % of the final SNP panel (Table 2). Expected and observed heterozygosity were quite similar (0.284 and 0.313 for He and Ho, respectively), with a mean inbreeding coefficient of −0.102 (Table 2). The mean MAF was 0.210, and 3643 of the 11,411 SNP loci (about 32 %) had a MAF lower than 0.100.

Table 2 Summary statistics of genetic variation obtained by Vitis18kSNP array in 101 Sicilian grapevine clones belonging to 21 biotypes and 10 cultivars

The SNP profiles of samples belonging to the same cultivar were nearly identical, although the number of divergent SNP loci among plants of the same cultivar ranged from eight (Nerello Cappuccio) to 247 (Catarratto), with a mean of 52.5. The polymorphic loci for each cultivar are listed in the supplementary file (Table S6). Almost all polymorphisms (99.9 %) were due to changes from homozygous to heterozygous genotypes. Only in a few cases (0.1 %) was the polymorphism a change from one homozygous genotype to the other, probably owing to natural nucleotide variation leading to the fixation or loss of a new mutation at specific loci. Position 14879091 on chromosome 7 was the polymorphic locus shared among biotypes of eight of the analyzed cultivars. The largest number (4) of shared polymorphic SNP loci was found for the Frappato-Nero d’Avola pair (2:16685674; 1:9046299; 5:10489883; 11:6480292), followed by the Catarratto-Perricone, Frappato-Grecanico, and Perricone-Zibibbo pairs, each sharing three polymorphic SNP loci.
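The classification of intra-cultivar differences into homozygous/heterozygous and homozygous/homozygous changes can be sketched as follows. The Python sketch assumes calls coded as "AA", "AB", "BB" (None for a no-call); because no ancestral reference is available, it counts changes symmetrically rather than by direction. The function name and toy profiles are hypothetical.

```python
from collections import Counter

# Illustrative sketch: the function name and toy profiles are hypothetical.


def classify_changes(clone_a, clone_b):
    """Count genotype differences between two clones of the same cultivar.

    Calls are "AA", "AB", "BB"; None marks a no-call. Without an ancestral
    reference the direction of change is unknown, so changes are counted
    symmetrically.
    """
    counts = Counter()
    for g1, g2 in zip(clone_a, clone_b):
        if g1 is None or g2 is None or g1 == g2:
            continue
        if (g1 == "AB") != (g2 == "AB"):
            counts["homozygous<->heterozygous"] += 1
        else:
            counts["homozygous<->homozygous"] += 1  # AA vs BB
    return counts


changes = classify_changes(["AA", "AB", "BB", "AA", None, "BB"],
                           ["AB", "AB", "AA", "AA", "BB", "AB"])
print(dict(changes))
```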

Multivariate and population structure analyses

Multivariate PCoA and cluster analysis were used to investigate the genetic distances among cultivars and biotypes using 11,411 polymorphic SNP loci. The PCoA scatter plot of the first two coordinates, including all cultivars analyzed, is shown in Fig. 2a. The first two coordinates explained 32.52 and 21.31 % of the total variability, respectively. Most samples (96 %) grouped according to their biotype and/or cultivar, except for Catarratto, which showed the highest genetic polymorphism among genotypes. Cluster analysis correctly assigned all samples (100 %) to their cultivar (Fig. 2b) and, like PCoA, showed Grecanico and Nero d’Avola to be the two most distant cultivars based on SNP genetic diversity. Cluster analysis also showed that discriminating among biotypes of the same cultivar was difficult, as the sub-clusters within each cluster included different biotypes. Bootstrap values for the clusters, ranging from 95 to 99 %, indicate that misclassification is unlikely.
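UPGMA, as implemented in MEGA, repeatedly merges the two closest clusters and recomputes distances to the new cluster as size-weighted averages. The dependency-free Python sketch below shows that logic on a toy distance matrix; the sample names and distances are hypothetical, bootstrap resampling is omitted, and distances between clones could be taken, for example, as 1 minus the proportion of identical SNP calls (an assumption).

```python
# Illustrative sketch: sample names and distances are hypothetical; no bootstrap.


def upgma(dist, labels):
    """UPGMA clustering of a symmetric distance matrix.

    dist: list of lists with pairwise distances; labels: one name per row.
    Returns (tree as nested tuples, list of merges as (left, right, distance)).
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {frozenset((i, j)): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = tuple(pair)
        height = d.pop(pair)
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # size-weighted average distance from the new cluster to the others
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        merges.append((tree_a, tree_b, height))
        next_id += 1
    tree = next(iter(clusters.values()))[0]
    return tree, merges


labels = ["Cat_A", "Cat_B", "Grecanico", "Nero"]
dist = [
    [0.00, 0.01, 0.40, 0.60],
    [0.01, 0.00, 0.42, 0.62],
    [0.40, 0.42, 0.00, 0.70],
    [0.60, 0.62, 0.70, 0.00],
]
tree, merges = upgma(dist, labels)
print([round(m[2], 2) for m in merges])  # [0.01, 0.41, 0.64]
```

In this toy case the two biotypes of the same cultivar join first at a very small distance, as the clones of each cultivar do in Fig. 2b.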

Fig. 2

Genetic relationships among 101 Sicilian grapevine clones, arranged in 10 cultivars and 21 biotypes, and two reference cultivars, obtained from 11,411 polymorphic SNPs. a Principal coordinates analysis (PCoA) based on the Vitis18kSNP array; the percentage of variability explained by each coordinate is reported in brackets, and a zoomed-in view highlights the differences among clones of the same cultivar. b Dendrogram generated using the UPGMA method; bootstrap values are shown as percentages. a–c: different biotypes of the same cultivar; *reference cultivar

The allelic profiles were used in the model-based clustering method implemented in fastSTRUCTURE to determine the likely number of genetic groups (K) within the Sicilian germplasm collection. Running the algorithm for multiple choices of K revealed a clear optimum at K = 3. The genetic structure of the analyzed panel is shown in Fig. 3. Eighty of the 101 clones (about 76 %) were assigned to a cluster at K = 3 using a >80 % membership threshold. Notably, no biotype/genotype was assigned to a group different from that of its cultivar. Nearly all cultivars showed 100 % membership in one group: Nero d’Avola showed a private genetic structure (green pool; Fig. 3); Carricante, Frappato, and Perricone, together with Nerello Cappuccio and Zibibbo, were members of the purple group (Fig. 3); and Grillo, Catarratto, and Grecanico formed the third group (blue; Fig. 3). The analysis could not assign Inzolia to a specific genetic pool, since its membership proportions for groups 1, 2, and 3 were 38, 24, and 38 %, respectively.
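The >80 % assignment rule can be illustrated with a short Python sketch applied to a matrix of admixture proportions (fastSTRUCTURE writes such a Q matrix to its output files). The sample names and group indices below are hypothetical; only the Inzolia proportions are taken from the text.

```python
# Illustrative sketch: the function name, sample names, and group indices are hypothetical.


def assign_to_clusters(q_matrix, threshold=0.80):
    """Assign each individual to its main genetic group if membership > threshold.

    q_matrix: dict sample -> list of admixture proportions (one per group, summing to 1).
    Returns dict sample -> 0-based group index, or None if the sample stays admixed.
    """
    assignment = {}
    for sample, q in q_matrix.items():
        best = max(range(len(q)), key=q.__getitem__)
        assignment[sample] = best if q[best] > threshold else None
    return assignment


q = {
    "Nero_1": [0.0, 1.0, 0.0],
    "Inzolia_1": [0.38, 0.24, 0.38],  # admixed: no group exceeds 80 %
    "Grillo_1": [0.05, 0.10, 0.85],
}
print(assign_to_clusters(q))  # {'Nero_1': 1, 'Inzolia_1': None, 'Grillo_1': 2}
```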

Fig. 3

Admixture proportions of Sicilian grapevine cultivars, as estimated by fastSTRUCTURE (K = 3) based on 11,411 polymorphic SNPs from the Vitis18kSNP array. Individuals are grouped by cultivar of origin. Each vertical bar represents a sample (101 clones belonging to 10 Sicilian cultivars), and the color proportions of each bar represent the posterior probability of assignment of that individual to each of the three groups of genetic similarity. Assignment probabilities range from 0 to 100 %

Default criteria, namely high genotyping success (missing rate < 0.20) and MAF > 5 %, were adopted to select an informative SNP dataset (7235 SNPs) from the 11,411 polymorphic SNPs used in the genetic diversity analysis. This panel was used to identify putative private profiles for each cultivar. Twelve categories, one for each cultivar (ten Sicilian and two reference cultivars), were assigned based on the SSR profiles and ampelographic analysis. Specific marker profiles associated with the 12 categories were identified with an MLM, controlling for relatedness through kinship values. The MLM was chosen to study the links between marker profiles and cultivars because it improves the ability to detect genotype-phenotype associations in the presence of population stratification and multiple levels of relatedness, increasing the statistical power of the analysis. The kinship matrix was calculated as the percentage of shared alleles, displaying the clustering of cultivars and the dissimilarity among genotypes (Fig. 4). This analysis confirmed a private genetic cluster for each investigated cultivar, as already suggested by the previous analyses: samples of the same cultivar clustered in a common branch, with Grecanico and Nero d’Avola being the two most distant varieties (Fig. 4). Although a large set of markers providing good coverage of the whole genome was used, neither the kinship matrix nor the cluster analysis discriminated among biotypes of the same cultivar. However, this approach allowed us to identify a set of highly informative markers (12 SNPs, in both coding and non-coding regions) whose combined profiles (Table 3) discriminate all the cultivars included in the present study. The only cultivar without a single private profile was Catarratto, which showed three different profiles, one of them prevalent (14 of 17 samples); nevertheless, all three profiles discriminated this cultivar from the others (Table 3). To verify the private profiles identified with the high-throughput genotyping system and the MLM approach, the isolated SNPs were randomly tested on the same cultivars and validated by Sanger sequencing. Analysis of the sequences around the most informative SNPs confirmed the private profile of each cultivar.
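The kinship matrix is described above as the percentage of shared alleles. The Python sketch below illustrates that definition as identity-by-state allele sharing between dosage profiles (at each locus, two individuals share 2 − |d1 − d2| of their two alleles). It is not GAPIT's kinship routine, and the function and sample names are hypothetical.

```python
# Illustrative sketch: not GAPIT's kinship routine; names are hypothetical.


def allele_sharing(d1, d2):
    """Proportion of alleles shared identical-by-state by two dosage profiles.

    d1, d2: B-allele dosages (0, 1, 2); None = missing. At each locus two
    individuals share 2 - |d1 - d2| of their 2 alleles.
    """
    pairs = [(a, b) for a, b in zip(d1, d2) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no locus genotyped in both individuals")
    shared = sum(2 - abs(a - b) for a, b in pairs)
    return shared / (2 * len(pairs))


def kinship_matrix(profiles):
    """Pairwise allele-sharing matrix as a dict keyed by (sample, sample)."""
    names = list(profiles)
    return {(x, y): allele_sharing(profiles[x], profiles[y]) for x in names for y in names}


k = kinship_matrix({"clone_1": [0, 1, None], "clone_2": [1, 1, 2]})
print(k[("clone_1", "clone_2")])  # 0.75
```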

Fig. 4

Kinship analysis of ten Sicilian and two reference (international and national) grapevine cultivars, visualized as a heatmap based on 11,411 polymorphic SNPs from the Vitis18kSNP array. The color gradient displays the genetic similarity among genotypes: red indicates the most similar clones, and white the least similar. The cultivars are indicated with the colors used in the PCoA and clustering analysis. Reference varieties (* and bold); #Catarratto prevalent profile

Table 3 SNP private profiles of 10 Sicilian grapevine cultivars at 12 highly informative loci out of 18,071, able to discriminate among the analyzed varieties

Parentage analysis

The SNP dataset and IBD probabilities were used to investigate parentage among cultivars, assigning each pair to the appropriate relationship category, such as parent-offspring or second degree. The most informative relationships (first and second degree) are displayed in the circular plot (Fig. 5). A complete list of relationship categories for each pair of genotypes is given in Table S7. Ten of the 12 cultivars were related to other cultivars in the panel, for a total of 7 relationships: five were classified as parent-offspring (PO), with Z0, Z1, Z2, and PI-HAT values close to the theoretical values (0, 1, 0, and 0.5, respectively), and the remaining two pairs showed a second-degree relationship, with relatedness values close to the theoretical values (0.5, 0.5, 0, and 0.25).

Fig. 5

Circular plot displaying the first- and second-degree relationships among ten Sicilian grapevine cultivars based on polymorphic SNPs from the Vitis18kSNP array. The cultivars are represented as circles placed at the vertices of a 12-sided polygon; the black lines form the outline of the polygon and do not correspond to any parentage relationship among cultivars. Filled circles: Sicilian grapevine cultivars; empty circles: reference varieties; green lines: first-degree relationships; blue lines: second-degree relationships

Catarratto showed the highest number of relationships within the analyzed panel: two PO (with Grecanico and Grillo) and one second-degree (with Nero d’Avola) (Fig. 5 and supplementary file, Table S7). Among Sicilian cultivars, Carricante and Nerello Cappuccio showed no relationships with the other cultivars. As expected, Pinot Noir showed no parentage relationships with any of the other cultivars, while Sangiovese showed two PO relationships with Sicilian cultivars (Frappato and Perricone). Catarratto, Inzolia, and Nero d’Avola were linked to each other by second-degree relationships, although the Catarratto-Inzolia pair showed relatedness values deviating considerably from the theoretical values (supplementary file, Table S7). Finally, the Carricante-Sangiovese pair showed empirical values intermediate between the theoretical values for a second-degree relationship and those for unrelated genotypes (supplementary file, Table S7).

Discussion

The genetic variability among 21 biotypes belonging to 10 major Sicilian autochthonous grapevine cultivars (101 clones) and their relationships were investigated using 42 ampelographic OIV descriptors, 9 SSR loci, and 18,701 SNP loci. SSR analysis, first used to assign the clones to their cultivars, detected ten genetic profiles (one per cultivar) but could not distinguish among biotypes of the same cultivar.

As expected, the ampelographic analysis (42 OIV descriptors) discriminated well among cultivars (Fig. 1 and Table S2). In addition, OIV descriptors differed among biotypes of the same cultivar, with up to 14 descriptors varying within a single cultivar. In particular, the Catarratto biotypes were clearly distinguished by 14 OIV descriptors (Fig. 1). Density of bunch (OIV 204), the most variable descriptor, distinguished among biotypes within the other cultivars (Fig. 1). This evidence is consistent with the definition of biotype and with the human tendency to select plants for morphological and agronomic differences that could ultimately influence grape quality, as occurred with the selection and maintenance of plants with fleshy and large berries, or with white berries (This et al. 2006). However, although the 42 OIV descriptors were useful for discriminating among biotypes of the same cultivar, scoring them is time-consuming and the results are largely influenced by the environment (Tessier et al. 1999). The highest morphological variability, observed in Catarratto, was also confirmed by the genetic analysis, which, however, could not resolve sub-clusters grouping the clones of each biotype.
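Counting the descriptors that vary within a cultivar, as in the "up to 14 descriptors" reported above, can be sketched as follows. The input layout (biotype mapped to descriptor code and OIV score), the function name `variable_descriptors`, and the scores in the usage example are hypothetical; only the descriptor OIV 204 comes from the text.

```python
from typing import Dict, List


def variable_descriptors(scores: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the OIV descriptor codes whose scores differ among biotypes.

    `scores` maps biotype name -> {descriptor code -> OIV score}; a descriptor
    missing for a biotype is simply ignored for that biotype.
    """
    codes = set()
    for profile in scores.values():
        codes.update(profile)
    variable = []
    for code in sorted(codes):
        values = {profile[code] for profile in scores.values() if code in profile}
        if len(values) > 1:  # at least two different scores -> variable descriptor
            variable.append(code)
    return variable
```

With the hypothetical scores `{"A": {"OIV204": 7, "other": 2}, "B": {"OIV204": 5, "other": 2}}`, only `"OIV204"` is returned, because the second descriptor has the same score in both biotypes.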

Thus, high-throughput SNP genotyping could provide an additional tool for studying the genetic diversity and population structure of the Sicilian grapevine germplasm. The Vitis18kSNP array, developed through NGS technologies, is a very useful tool for discovering genome-wide allelic variation and could replace SSR markers for cultivar identification. After removing SNPs with an NC rate from 20 to 100 % and, likewise, loci with a GT score lower than 0.6, the analyses were carried out on the remaining amplified loci, which provided good coverage of the whole genome (79 %). Since about 25 % of the loci on the Vitis18kSNP array were identified in other Vitis species (V. aestivalis, V. berlandieri, V. labrusca, V. cinerea, V. lincecumii, and Muscadinia rotundifolia), the percentage of SNP loci that did not show any fragment amplification (3 %) appeared reasonable compared with previous reports (Bekele et al. 2013; De Lorenzis et al. 2015). The percentage of polymorphic loci was high, and the expected (He) and observed (Ho) heterozygosity values, which were very similar to each other, were lower than those reported for Sicilian collections analyzed with SSR markers (Carimi et al. 2010; De Lorenzis et al. 2014). These results are expected given the bi-allelic nature of SNPs, although the much larger number of loci analyzed confers a higher overall discriminating power. Similar results were reported for 700 grapevine cultivars analyzed with both 22 SSRs and 384 SNPs (Emanuelli et al. 2013). Mean MAF values were also comparable with those reported by Lijavetzky et al. (2007), who analyzed a collection of about 300 V. vinifera accessions (MAF = 0.24), and by Emanuelli et al. (2013) (MAF = 0.25). The negative value of F was consistent with the high heterozygosity values, indicating an excess of heterozygotes, probably due to the prevalence of outcrossing.
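The locus filtering and the diversity statistics discussed above can be sketched in Python for bi-allelic SNPs coded as 0, 1, or 2 copies of the alternative allele. This is a minimal sketch under stated assumptions: loci with an NC rate of 20 % or more and loci with a GT score below 0.6 are discarded, He is the uncorrected estimate 2p(1 - p), and F is computed as 1 - Ho/He from the mean values. The function names are hypothetical, and the estimators used in the study may differ.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence

# Assumed genotype coding: 0, 1 or 2 copies of the alternative allele; None = no call
Genotype = Optional[int]


def filter_loci(calls: Dict[str, Sequence[Genotype]],
                gt_scores: Dict[str, float],
                max_nc_rate: float = 0.2,
                min_gt_score: float = 0.6) -> List[str]:
    """Keep loci whose no-call rate is below max_nc_rate and whose GT score is >= min_gt_score."""
    kept = []
    for locus, genotypes in calls.items():
        nc_rate = sum(g is None for g in genotypes) / len(genotypes)
        if nc_rate < max_nc_rate and gt_scores[locus] >= min_gt_score:
            kept.append(locus)
    return kept


def locus_stats(genotypes: Sequence[Genotype]) -> Dict[str, float]:
    """MAF, expected and observed heterozygosity for one bi-allelic SNP (called samples only)."""
    called = [g for g in genotypes if g is not None]
    p = sum(called) / (2 * len(called))  # alternative-allele frequency
    return {
        "maf": min(p, 1 - p),
        "he": 2 * p * (1 - p),  # expected heterozygosity under Hardy-Weinberg
        "ho": sum(g == 1 for g in called) / len(called),  # observed heterozygosity
    }


def diversity_summary(calls: Dict[str, Sequence[Genotype]], loci: List[str]) -> Dict[str, float]:
    """Mean MAF, He and Ho over the retained loci, and F = 1 - Ho/He."""
    stats = [locus_stats(calls[locus]) for locus in loci]
    he = mean(s["he"] for s in stats)
    ho = mean(s["ho"] for s in stats)
    return {
        "maf": mean(s["maf"] for s in stats),
        "he": he,
        "ho": ho,
        "F": 1 - ho / he,  # negative F indicates an excess of heterozygotes
    }
```

Because F compares observed with expected heterozygosity, a panel in which heterozygous calls are more frequent than Hardy-Weinberg proportions predict yields F < 0, which is the pattern interpreted above as a sign of outcrossing.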

As with SSRs, SNP analysis confirmed the correct assignment of each of the 101 clones to its own cultivar (Figs. 2 and 4). In addition, SNP polymorphisms were detected among plants of the same cultivar, ranging from 8 (Nerello Cappuccio) to 247 (Catarratto) loci. However, these polymorphisms did not group the clones of each biotype into the same cluster, indicating a lack of correlation between genetic and morphological diversity. For example, the three Catarratto biotypes showed marked differences in morphological (Table S2) and agronomic traits (data not shown) and large variability in their SNP profiles, yet these profiles could not distinguish biotypes A and B (Catarratto Bianco Comune) from biotype C (Catarratto Bianco Lucido), as previously reported by Crespan et al. (2008) based on SSR analysis. The same authors reported the synthesis of epicuticular waxes covering the berry skin as the only trait discriminating among Catarratto biotypes (Crespan et al. 2008); we therefore conclude that neither the chosen SSR loci nor the SNP loci covered this mutation. In contrast, one or a few SNP loci have been reported to discriminate among clones of the same cultivar, such as the SNP at the DXS locus in Muscat and non-Muscat aromatic cultivars (Emanuelli et al. 2010) or the SNP in the GAI gene, which affects the number of leaf hairs, reduces plant height, and promotes flowering (Boss and Thomas 2002). More recently, the sequencing of Pinot Noir clones showed that highly polymorphic Gypsy-like elements are the major source of somatic mutational events (about 85 % of the total polymorphic sites), followed by SNPs (11 %) and indels (4 %) (Carrier et al. 2012). Consistent with this, the role of the Gret1 retrotransposon insertion in the promoter region of the VvMybA1 gene in determining the absence of color in the berry skin has been demonstrated (Kobayashi et al. 2004; Yakushiji et al. 2006; Vezzulli et al. 2012). Further, the insertion of the Hatvine1-rrm transposable element in the VvTFL1A promoter was reported to cause differences in cluster shape in the cultivar Carignan (Fernandez et al. 2010).
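Counting the SNP loci that vary among clones of one cultivar (from 8 to 247 in this study) amounts to checking, locus by locus, whether the called genotypes agree. A minimal sketch, assuming genotype lists aligned to the same locus order, coded 0/1/2 with `None` for no-calls; the function name `polymorphic_loci` is hypothetical.

```python
from typing import Dict, Optional, Sequence


def polymorphic_loci(clone_calls: Dict[str, Sequence[Optional[int]]]) -> int:
    """Count loci at which the clones of one cultivar do not all share the same genotype.

    `clone_calls` maps clone ID -> genotype calls (0/1/2, None = no call),
    all aligned to the same locus order. No-calls are ignored.
    """
    profiles = list(clone_calls.values())
    n_loci = len(profiles[0])
    if any(len(p) != n_loci for p in profiles):
        raise ValueError("all clones must be genotyped at the same loci")
    count = 0
    for i in range(n_loci):
        observed = {p[i] for p in profiles if p[i] is not None}
        if len(observed) > 1:  # two or more distinct genotypes among clones
            count += 1
    return count
```

Ignoring no-calls avoids counting missing data as variation, but it also means that a genuine difference carried by a clone with a no-call at that locus goes undetected; in practice, genotyping errors would also need to be separated from true somatic variants.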

As highlighted above, the SNP-based genetic relationships among cultivars identified by cluster analysis (Fig. 2b) were consistent with their distribution in the PCoA (Fig. 2a), except for Pinot Noir, which appeared in the dendrogram as the cultivar most divergent from the others, in agreement with its French origin (Bowers et al. 1999b).
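For readers unfamiliar with PCoA, the sketch below shows classical principal coordinates analysis of a pairwise genetic distance matrix using NumPy. The allele-sharing distance used here (mean absolute difference in allele counts, divided by 2) is an illustrative choice and not necessarily the distance used for Fig. 2; missing calls are assumed to have been removed or imputed beforehand, and the function names are hypothetical.

```python
import numpy as np


def allele_sharing_distance(g: np.ndarray) -> np.ndarray:
    """Pairwise distance between samples (rows) with genotypes coded 0/1/2.

    Distance = mean absolute allele-count difference / 2, so it lies in [0, 1].
    """
    diff = np.abs(g[:, None, :] - g[None, :, :])
    return diff.mean(axis=2) / 2


def pcoa(d: np.ndarray, n_axes: int = 2) -> np.ndarray:
    """Classical principal coordinates analysis (Gower/Torgerson scaling)."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_axes]
    vals = np.clip(eigvals[order], 0, None)  # negative eigenvalues carry no Euclidean axis
    return eigvecs[:, order] * np.sqrt(vals)
```

Typical use is `pcoa(allele_sharing_distance(genotypes))`, which returns one row of coordinates per sample; the axes are defined only up to sign, so plots from different software may appear mirrored.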

The fastSTRUCTURE analysis inferred three groups from the SNP dataset (Fig. 3), the two largest of which together included six Sicilian cultivars. As already reported (De Lorenzis et al. 2014), the genetic structure did not discriminate between cultivars from the western (Catarratto, Grecanico, Grillo, Inzolia, Perricone, and Zibibbo) and eastern (Carricante, Frappato, Nerello Cappuccio, and Nero d’Avola) areas of Sicily. At K = 3, two main groups of related genotypes were distinguished: Frappato, Perricone, and Zibibbo in the first group (purple) and Grillo, Catarratto, and Grecanico in the second (blue). A third, distinct genetic structure was assigned to Nero d’Avola, the most important and widespread red-berried cultivar in Sicily, as already observed in the PCoA and in agreement with its presumed origin (Calabria).
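The assignment of cultivars to fastSTRUCTURE groups, and the recognition of admixed profiles such as that of Inzolia, can be sketched from per-cultivar ancestry proportions (Q) at K = 3. The 0.8 membership threshold, the function name `assign_groups`, and the example proportions are assumptions for illustration and are not values taken from Fig. 3.

```python
from typing import Dict, Sequence


def assign_groups(q: Dict[str, Sequence[float]], threshold: float = 0.8) -> Dict[str, str]:
    """Assign each cultivar to its majority ancestry group, or label it 'admixed'.

    `q` maps cultivar -> ancestry proportions for the K inferred groups
    (summing to 1), e.g. mean admixture proportions estimated at K = 3.
    The threshold is an arbitrary illustrative cut-off.
    """
    labels = {}
    for name, props in q.items():
        best = max(range(len(props)), key=lambda k: props[k])  # index of the largest proportion
        labels[name] = f"group {best + 1}" if props[best] >= threshold else "admixed"
    return labels
```

For example, hypothetical proportions of `[0.9, 0.05, 0.05]` would place a cultivar in group 1, whereas `[0.4, 0.35, 0.25]` would be labeled admixed. The threshold is a convention rather than a biological constant, so borderline cultivars deserve inspection of the full bar plot.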

The admixed genetic structure of Inzolia appeared consistent with the best-known hypothesis about its origin and spread around the Mediterranean Basin, an important finding of our analysis. Indeed, molecular evidence has already supported the hypothesis that Inzolia, alias Ansonica, was first introduced into Sicily by the Greeks in the fourth century B.C. and then spread to the Island of Giglio (off the coast of Tuscany) (Labra et al. 1999). Thus, the genetic structure of Inzolia could result from human-mediated exchanges between Greece and Magna Graecia throughout history. Greek domination could have influenced the genetic structure of grapevine varieties through the introduction of foreign varieties used for genetic improvement, which gave rise to the autochthonous cultivars of Sicily (Pastena 2009).

Parentage analysis highlighted significant relationships among cultivars, some of which confirmed previous reports, such as the cross “Catarratto × Zibibbo” from which Grillo derived (Di Vecchi-Staraz et al. 2007; Cipriani et al. 2010; De Lorenzis et al. 2014). In addition, the first-degree relationship between Catarratto and Grecanico (Di Vecchi-Staraz et al. 2007; Lacombe et al. 2013), as well as the first-degree relationships of Sangiovese with Frappato and Perricone, were confirmed (Di Vecchi-Staraz et al. 2007; Gasparro et al. 2013; De Lorenzis et al. 2014). Finally, we found the first evidence of second-degree relationships between Catarratto and Nero d’Avola and between Inzolia and Nero d’Avola, probably due to a shared pedigree (Fig. 5).
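A complementary way to check a proposed cross, such as Grillo deriving from “Catarratto × Zibibbo”, is to count Mendelian inconsistencies in the parent-parent-offspring trio. The sketch below is not the IBD-based method used in this study; it assumes 0/1/2 genotype coding with `None` for no-calls, and the function names are hypothetical. In practice a small number of inconsistencies must be tolerated because of genotyping errors and somatic mutations.

```python
from typing import Optional, Sequence

# Alleles a parent can transmit, by genotype (copies of the alternative allele)
GAMETES = {0: (0,), 1: (0, 1), 2: (1,)}


def compatible(child: int, parent1: int, parent2: int) -> bool:
    """True if the child genotype can arise from one gamete of each parent."""
    return any(a + b == child for a in GAMETES[parent1] for b in GAMETES[parent2])


def mendelian_errors(child: Sequence[Optional[int]],
                     parent1: Sequence[Optional[int]],
                     parent2: Sequence[Optional[int]]) -> int:
    """Count loci at which the trio is Mendelian-inconsistent, skipping no-calls."""
    errors = 0
    for c, a, b in zip(child, parent1, parent2):
        if c is None or a is None or b is None:
            continue
        if not compatible(c, a, b):
            errors += 1
    return errors
```

A true trio should show an error count close to the genotyping error rate times the number of loci, whereas a wrong parent typically produces many inconsistencies at loci where the two candidates are opposite homozygotes.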

The availability of the genome sequence and of high-throughput genotyping platforms has enabled a wide range of applications to clarify the relationships between genotype and phenotype (Yang et al. 2011). Recently, Carrier et al. (2012) presented the first genome-wide analysis of polymorphism among Pinot Noir clones to identify polymorphisms involved in somatic mutations. Another study, using Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), resolved the complex heterozygous genome, isolating a set of mapped marker loci useful for breeding programs. Cabezas et al. (2011), using a resequencing strategy on selected genotypes, developed a set of 48 stable SNP markers with a uniform genome distribution for grapevine genotyping.

In summary, single-nucleotide mutations (SNPs) and transposable elements can generate somatic variation in grapevine; therefore, the newly available high-throughput approaches, such as SNP-array genotyping, RAD-seq, and GBS, are very powerful technologies for investigating inter-varietal diversity and the population structure of local varieties. Nevertheless, given the complexity of the genome and the difficulty of identifying the different biotypes within a cultivar, integrating different methods might in some cases be the best, albeit more expensive, approach.

Finally, the present study identified, by GAPIT analysis, private SNP profiles for each cultivar analyzed. In particular, a set of 12 highly polymorphic SNPs scattered across the genome can discriminate the main Sicilian cultivars, each of which showed a private 12-SNP profile (Table 3). These specific SNP profiles discriminated all Sicilian cultivars and the reference cultivars. The quality and repeatability of the SNP panel were evaluated by Sanger sequencing.
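Cultivar identification with a small diagnostic panel, such as the 12-SNP panel in Table 3, reduces to checking that every reference profile is unique and then matching an unknown sample against the references. In this sketch the panel contents, the function names, and the mismatch tolerance are hypothetical; the actual profiles are those reported in Table 3.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Profile = Tuple[int, ...]  # genotypes at the panel SNPs, coded 0/1/2


def all_profiles_unique(panel: Dict[str, Profile]) -> bool:
    """True if no two cultivars share the same reference profile."""
    return len(set(panel.values())) == len(panel)


def identify(sample: Sequence[Optional[int]], panel: Dict[str, Profile],
             max_mismatches: int = 0) -> List[str]:
    """Return cultivars whose reference profile matches the sample.

    No-calls (None) in the sample are skipped; up to `max_mismatches`
    differing loci are tolerated.
    """
    matches = []
    for name, reference in panel.items():
        mismatches = sum(1 for s, r in zip(sample, reference)
                         if s is not None and s != r)
        if mismatches <= max_mismatches:
            matches.append(name)
    return matches
```

Allowing one mismatch makes the lookup robust to an occasional genotyping error, but only if the reference profiles differ from each other at several loci; otherwise a sample may match more than one cultivar.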

Conclusion

In this paper, the genetic diversity of ten widespread Sicilian grapevine cultivars was assessed by 42 OIV descriptors, 9 standard SSRs, and the Vitis18kSNP array. The OIV descriptors were utilized for cultivar and biotype morphological characterization. The SNP array was then adopted for genotyping 101 clones from 21 biotypes belonging to the 10 cultivars.

OIV descriptors and SNP datasets were able to distinguish among cultivars, whereas distinguishing among biotypes of the same cultivar proved more complex. In the near future, considerable effort should be devoted to analyzing the location and function of each SNP that is polymorphic among biotypes of the same cultivar. Particular attention will be paid to Catarratto, which revealed the largest intra-varietal genetic diversity. Although both classes of markers were informative, ampelographic analysis is time-consuming and largely influenced by the environment and could therefore be replaced by the SNP array. Given its suitability for lab automation and its cost-effectiveness, the SNP array represents a very useful tool for investigating genetic diversity. The development of SNP databases for grapevine cultivars could also help replace SSRs for true-to-type cultivar assignment. Cluster and parentage analyses based on the Vitis18kSNP array confirmed a high number of genetic relationships among Sicilian cultivars. These results indicate that selection has been practised over the years in Sicily, to date considered the largest and oldest winegrowing region in Italy, increasing the genetic diversity of its grapevine germplasm.

Finally, the panel of 12 SNPs scattered across the genome can be proposed as a fast, low-cost genotyping system to recognize and safeguard the Sicilian grapevine cultivars. This study could represent a starting point for implementing the same system and extending it to other national and international grapevine cultivar collections.