Introduction

Grape (Vitis vinifera L.) is one of the most economically important fruit crops worldwide. For table grapes, large berries and seedlessness are the most appreciated traits (Doligez et al. 2013). Two types of genetically mediated seedlessness exist in grape, parthenocarpy and stenospermocarpy (Bouquet and Danglot 1996; Cabezas et al. 2009; Ingrosso et al. 2011; Sarikhani et al. 2009). In parthenocarpy, the ovule develops without fertilization, yielding small berries that completely lack seeds. In stenospermocarpy, although pollination occurs, the seeds fail to develop because of early degeneration of the endosperm and abnormal development of the integuments (Pratt 1970; Ledbetter and Ramming 2011). Unlike parthenocarpy, stenospermocarpy can produce berries of a size compatible with commercial requirements (Ledbetter and Burgos 1994). However, the development of new stenospermocarpic varieties involves the generation and selection of large numbers of hybrids each year, which is both expensive and time-consuming (Karaagac et al. 2012). Therefore, breeding of stenospermocarpic varieties tends to focus on identifying and exploiting molecular markers associated with genes that control seedlessness (Striem et al. 1994, 1996).

Many models for the genetic basis of seedlessness in grape have been proposed based on the traits associated with the molecular markers. Currently, the most widely accepted model involves three independent and complementary recessive genes, which are controlled by a dominant regulator locus called seed development inhibitor (SDI) (Bouquet and Danglot 1996). The seedless phenotype is determined by the presence of a dominant allele at the SDI locus, which has been located on chromosome 18 by quantitative trait locus (QTL) mapping (Cabezas et al. 2009; Doligez et al. 2002; Mejia et al. 2007). The VvAGL11 gene is an orthologue of a MADS-box gene involved in ovule differentiation, which has been suggested as the most probable candidate gene at the SDI locus for seedlessness (Mejía et al. 2011; Bergamini et al. 2013).

Various studies have sought to identify genes for seedlessness through the comparison of gene expression profiles in seeded and seedless grape varieties. Based on differential gene expression patterns in seeded and seedless cultivars, Wang et al. (2011) found that VvCBP1 was essential for embryo development in seedless grape. In addition, Nwafor et al. (2014) produced a list of genes that were differentially expressed in a seedless mutant compared to the wild-type. Although numerous studies have been carried out, our understanding of the genetic basis of seedlessness remains limited.

Genome-wide association study (GWAS) has been widely used to identify QTL and candidate genes due to its ability to analyse genetic controls of complex traits (Yan et al. 2011; Verslues et al. 2014; Mamidi et al. 2013). In this study, we performed specific-locus amplified fragment sequencing (SLAF-seq) and identified single nucleotide polymorphisms (SNPs) in 199 grape accessions. Subsequently, we applied GWAS for detection of trait loci and candidate genes associated with seedlessness.

Materials and methods

Mapping population

In the present study, 199 grape accessions were evaluated (Supplementary Table 1). These accessions were obtained from the National Grape Germplasm Repository at Zhengzhou Fruit Research Institute of Chinese Academy of Agricultural Sciences (113°42′E and 34°42′N). Each accession was represented by three plants, and a completely randomized selection with twenty replicates was used. For each accession, the row spacing was 2.5 m, and the planting distance was 1.0 m. The plants were cultivated under the same environmental conditions, without any growth regulator, and all plants were 12 years old. Seeds were scored repeatedly during four growing seasons (2012–2015) according to the criteria of the OIV (International Organisation of Vine and Wine; Anonymous 1983). Among the 199 accessions, 124 were seeded and 75 were seedless (OIV descriptor 241).

SLAF-seq and SNP identification

Genomic DNA from each grape accession was isolated from fresh leaves using a previously described method (Murray et al. 1980). SNP genotyping was performed using SLAF-seq. Different restriction enzyme combinations were evaluated using in silico digestion-site prediction to obtain more than 400,000 sequencing tags of 300–500 bp per genome that were distributed evenly in unique genomic regions. Sequencing tags of 314–464 bp were selected as SLAF labels. Double digestion with the enzymes RsaI and HaeIII (New England Biolabs, NEB, USA) was found to be the most suitable. The SLAF tags were evenly distributed throughout the genome. The sequence reads at both ends of each fragment in each library were generated using Illumina HiSeq™2500 (Illumina, Inc.; San Diego, CA, USA), with a barcode approach to identify each sample.

All reads were processed for quality control and then filtered with Seqtk (https://github.com/lh3/seqtk). High-quality paired-end reads were mapped onto the reference grape genome (PN40024) (Jaillon et al. 2007) using the Short Oligonucleotide Alignment Program 2 (SOAP2, Version_2.20) (Li et al. 2009b; Li and Durbin 2010). The reference sequences were downloaded from the grape database at ftp://ftp.ensemblgenomes.org/pub/release-23/plants/fasta/vitis_vinifera/dna/. After aligning the obtained genomic sequences to the reference genome using the Burrows Wheeler Aligner (BWA, Version_0.7.10-r789) software package (Li and Durbin 2010), SNPs were called with the Genome Analysis Toolkit (GATK, Version_3.2) (McKenna et al. 2010) and SAMtools (Version_1.1) (Li et al. 2009a) software packages. The SNPs were filtered to retain those with a minor allele frequency (MAF) >0.05 and an integrity of each SNP >0.8.
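The two filtering criteria can be illustrated with a minimal Python sketch. It assumes that "integrity" means the fraction of accessions with a called genotype at a SNP and that genotypes are coded as alternate-allele counts (0, 1, 2) with None for missing calls. The function names are hypothetical and are not part of the GATK or SAMtools pipelines used in the study.

```python
def allele_stats(genotypes):
    """Return (integrity, maf) for one biallelic SNP.

    genotypes: alternate-allele counts per accession (0, 1 or 2),
    with None marking a missing call (assumed encoding).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0, 0.0
    # Integrity: share of accessions with a genotype call (assumed definition).
    integrity = len(called) / len(genotypes)
    # Each diploid accession contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return integrity, maf


def passes_filter(genotypes, min_maf=0.05, min_integrity=0.8):
    """Apply the thresholds stated in the text: MAF > 0.05 and integrity > 0.8."""
    integrity, maf = allele_stats(genotypes)
    return maf > min_maf and integrity > min_integrity


if __name__ == "__main__":
    polymorphic = [0, 1, 2, 1, 0, 1, 2, 0, 1, None]
    monomorphic = [0] * 10
    print(passes_filter(polymorphic), passes_filter(monomorphic))
```

Both comparisons are strict, matching the ">" signs in the text, so a SNP called in exactly 80% of accessions would be rejected under this reading.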

Population structure evaluation

SNPs that passed filtering (integrity of each SNP >0.8 and MAF >0.05) were used for population structure evaluation. A phylogenetic tree of the grape accessions based on these SNPs was constructed using the neighbour-joining algorithm (Saitou and Nei 1987) in the MEGA5 software package (Tamura et al. 2011). Population structure of the grape accessions was analysed with the ADMIXTURE software package (Alexander et al. 2009). Principal component analysis (PCA) (Steinherz et al. 1999) was performed using the cluster software package (De Hoon et al. 2004).

Linkage disequilibrium (LD) analysis and genome-wide association mapping

LD between pairs of SNPs was estimated as r² using the TASSEL version 3.0 software package (Bradbury et al. 2007). The 414,223 SNPs that remained after filtering (integrity of each SNP >0.8 and MAF >0.05) were used for the association analysis with a general linear model (GLM) and a compressed mixed linear model (MLM) in TASSEL. Significance was assessed against Bonferroni-corrected thresholds at α ≤ 0.1 (corresponding to P ≤ 2.4 × 10⁻⁷) and α ≤ 0.01 (corresponding to P ≤ 2.4 × 10⁻⁸) (Holm 1979).
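The two significance thresholds follow directly from dividing α by the number of tested SNPs. The short Python sketch below reproduces that arithmetic. The function name is illustrative only and is not part of TASSEL.

```python
import math

N_SNPS = 414_223  # SNPs retained after filtering, as stated in the text


def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    for alpha in (0.1, 0.01):
        p_cut = bonferroni_threshold(alpha, N_SNPS)
        # -log10(P) is the scale normally used on a Manhattan plot y-axis.
        print(f"alpha={alpha}: P <= {p_cut:.2e}, -log10(P) >= {-math.log10(p_cut):.2f}")
```

Rounded to two significant figures, the cutoffs are 2.4 × 10⁻⁷ and 2.4 × 10⁻⁸, in agreement with the values given above.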

Results

SLAF-seq and SNP identification of grape accessions

An association map was constructed for the 199 grape accessions to enable fine mapping of seedlessness genes. Sequencing of the SLAF libraries yielded approximately 404 million paired-end reads, and 88.65% of these were successfully mapped to the grape reference genome. The average Q30-value of these reads was 90.93%, and GC content was 39.05%. A total of 421,204 high quality SLAF tags were obtained from the 199 genotypes, of which 327,872 were polymorphic. In total, 4,180,905 SNPs were identified from these SLAF-seq tags; 414,223 SNPs were selected by the criteria MAF >0.05 and integrity of each SNP >0.8 (Table 1). These 414,223 SNP markers covered all 19 chromosomes. The largest numbers of SNPs were located on chromosomes 14 (26,175 SNPs) and 18 (25,948 SNPs); the smallest numbers were observed on chromosomes 2 (16,078 SNPs) and 17 (16,228 SNPs).

Table 1 Distribution and frequency of SNPs identified through the SLAF-seq approach in grapevine

LD and population evolution

The extent of LD determines the mapping resolution and the marker density required for GWAS. r² was calculated for SNP pairs across a range of physical distances. The 414,223 SNPs detected in the 199 accessions were used for pair-wise analyses. The physical distance at which r² decayed to half of its initial value was 11.57 kb (Fig. 1a). Given a reference genome size of 486,265,422 bp, we estimated the SNP density to be approximately one SNP per 1.17 kb.
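The marker density estimate can be checked with a few lines of Python. This sketch only restates the arithmetic from the figures given in the text, and it assumes the SNPs are spread evenly across the genome, which is a simplification. The function names are illustrative.

```python
GENOME_SIZE_BP = 486_265_422  # reference genome size stated in the text
N_SNPS = 414_223              # filtered SNPs
LD_DECAY_KB = 11.57           # distance at which r2 falls to half its initial value


def mean_snp_spacing_kb(genome_size_bp, n_snps):
    """Average physical distance between adjacent SNPs, in kb."""
    return genome_size_bp / n_snps / 1000


def markers_per_ld_block(ld_decay_kb, spacing_kb):
    """Rough number of SNPs expected within one LD decay distance."""
    return ld_decay_kb / spacing_kb


if __name__ == "__main__":
    spacing = mean_snp_spacing_kb(GENOME_SIZE_BP, N_SNPS)
    print(f"one SNP per {spacing:.2f} kb")
    print(f"about {markers_per_ld_block(LD_DECAY_KB, spacing):.1f} SNPs per LD decay distance")
```

Under the even-spacing assumption, the average spacing is well below the LD decay distance, so several markers are expected within each LD block.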

Fig. 1 Genetic diversity and population structure of 199 accessions. a Linkage disequilibrium (LD) pattern of the grape genome; b phylogenetic tree; c population structure. Each colour represents a group, and each row represents a stakeholder value. d Diagram shows the value of 199 samples based on clustering from 1 to 20; e principal components analysis (PCA). (Color figure online)

The relationships among the grape accessions were assessed by analysis of the population structure, construction of a phylogenetic tree, and PCA. The population structure analysis was based on the 414,223 SNPs identified in the 199 grape accessions. Little divergence was indicated between the grape groups. The phylogenetic tree showed that the 199 accessions could be clustered into 12 branches (Fig. 1b). Different numbers of populations (K) were explored to reveal the hierarchical population structure. As shown in Fig. 1c and d, this analysis estimated the most likely number of populations at K = 12. A three-dimensional projection of each sample from the PCA was plotted on a scatter plot. Most accessions clustered together, although several accessions were clearly separated from the main group (Fig. 1e).

Genome-wide association analyses of loci for seedlessness traits

The genetic basis for seedlessness was investigated using an association panel from the 199 genotypes and 414,223 SNP markers. This analysis suggested that a major genetic locus for seedlessness might be located on Chr 18. The GLM and MLM analyses found 294 and 82 SNPs, respectively, which were significantly related to seed formation, with gene variances of 18% and 24%, respectively (Table 2; Fig. 2). From the MLM analysis, 74 SNPs were found to be located at 25.8–29.1 Mb on Chr 18; rs1826891824 was significantly associated with seed formation and may be an important locus for seed formation. Comparison of the positions of the SNPs detected by GWAS with previous QTL studies showed that the SNP rs1826891824 practically overlapped the simple sequence repeat (SSR) marker VMC7f2, which is associated with seedlessness (Fig. 3).

Table 2 Details of loci associated with seedlessness via GWAS based on GLM and MLM
Fig. 2 Genome-wide association scan for seedlessness. a Manhattan plots for the MLM; the x-axis shows SNPs along each chromosome; the y-axis is the −log10 (P-value) for the association. The different colors indicate the 19 different and unmapped chromosomes of grape. Red and blue horizontal lines indicate the genome-wide significance and extreme significance threshold, respectively. b Quantile–quantile plot for MLM. The horizontal axis shows −log10-transformed expected P-values, while the vertical axis shows −log10-transformed observed P-values. c Manhattan plots for the GLM; d Quantile–quantile plot for GLM. (Color figure online)

Fig. 3 Regions of the genome showing strong association signals near previously identified SSR markers. The vertical red line indicates the SNP with the lowest P-value. The blue horizontal dashed lines indicate the genome-wide significance and extreme significance thresholds. (Color figure online)

To identify candidate genes underlying each peak SNP associated with the seedless phenotype, gene models located within the LD decay distance upstream and downstream of each peak SNP in the reference grape genome were considered to be seedlessness candidate genes in the present study. Hundreds of grape genes were identified in the flanking regions of the peak SNPs from the MLM analysis; however, most of these had no functional annotation or belonged to families of unknown function. The candidate genes that had functional annotations are summarized in Supplementary Table 2.
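The window-based candidate selection described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the window is taken to be the 11.57 kb LD decay distance on each side of the peak SNP, a gene counts as a candidate if any part of it overlaps the window, and the gene identifiers and coordinates in the example are invented.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_snp(genes, chrom, pos, window_bp=11_570):
    """Return genes overlapping [pos - window_bp, pos + window_bp] on chrom."""
    lo, hi = pos - window_bp, pos + window_bp
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


if __name__ == "__main__":
    # Hypothetical annotation records, for illustration only.
    annotation = [
        Gene("geneA", "chr18", 1_000_000, 1_004_000),
        Gene("geneB", "chr18", 1_020_000, 1_023_000),
        Gene("geneC", "chr05", 1_001_000, 1_002_000),
    ]
    hits = genes_near_snp(annotation, "chr18", 1_008_000)
    print([g.gene_id for g in hits])
```

An overlap test is used instead of requiring the whole gene to sit inside the window, so genes that straddle the window boundary are retained. Whether the original analysis handled boundary genes this way is not stated in the text.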

Discussion

GWAS with LD analysis provides an effective tool for the identification of genetic loci controlling quantitative traits, and has been widely applied to identify trait-associated genes. To date, a number of reduced representation sequencing methods have been developed, such as genotyping by sequencing (GBS) (Elshire et al. 2011), type IIB restriction site associated DNA (2b-RAD) (Wang et al. 2012), and SLAF-seq (Sun et al. 2013). We chose the latter for this study because of its advantages, such as lower sequencing costs, higher genotyping accuracy, and an efficient detection system. In this study, SNP markers with an average density of one SNP per 1.17 kb were obtained. Because this spacing is well below the LD decay distance of 11.57 kb, the marker density was sufficient for association mapping of the seedlessness trait.

Our data indicate that the major genetic locus responsible for seedlessness is likely located on Chr 18, suggesting that Chr 18 is a very important chromosome for seedlessness in grape. The SSR marker VMC7f2 is located on Chr 18 and has been confirmed to be linked to the inhibition of seed development. Adam-Blondon et al. (2001) reported that SCC8 was a useful marker for seedlessness in grape. Additionally, Korpás et al. (2009) have shown that both SCC8 and SCF27 are linked to SDI, and are necessary but not sufficient loci for the seedless phenotype in grape. Interestingly, association analyses found that the SSR marker VMC7f2 was closely related to this QTL, and was therefore useful for selection of seedlessness. Cabezas et al. (2009) reported a major effect QTL located on Chr 18 which explained 50% of the phenotypic variation for fresh seed weight. In the present study, the SNP rs1826891824 (P-value = 1.43E-9, phenotypic variation 16.3%) was found to be associated with the seedlessness trait and to practically overlap with the seedlessness SSR marker VMC7f2. The SNP rs1826891824 was 3.2 kb distant from VIT_18s0041g01880 (VvAGL11: MADS-box protein SEEDSTICK) (Nwafor et al. 2014).

MADS-box genes encode transcription factors with a highly conserved DNA-binding domain termed the MADS domain, which are involved in the regulation of many aspects of plant growth and development, such as embryonic development, floral organ determination, ovule development and fruit ripening (Wang et al. 2015; Immink et al. 2009). MADS-box genes generally comprise large families, among which VvAGL11 has been proposed to be the major functional candidate gene for seedlessness through variations in its promoter region (Mejía et al. 2011). Additionally, VvAGL11 shows homology to the Seedstick/Agamous-like 11 (STK/AGL11) gene, which is expressed during Arabidopsis seed development. VvAGL11 is the gene closest to the seedlessness SSR marker VMC7f2 and lies within the SDI locus, suggesting that VvAGL11 may be a functional candidate gene for the seedlessness trait (Costantini et al. 2007; Mejía et al. 2011; Bergamini et al. 2013).

Some of the genes detected in the seedlessness association analysis have previously been reported to be associated with berry development and berry size traits (Boss et al. 2002; Houel et al. 2010; Guillaumie et al. 2011; Doligez et al. 2013; Wang et al. 2016; Muñoz-Espinoza et al. 2016). Two SNP loci (rs1826914496 and rs176825328) associated with the genes VvAG3 (a MADS-box ovule identity MIKC gene expressed in flowers and berries) and EXPA (an alpha-expansin whose expression is linked to berry development) were detected in the seedlessness trait analysis. As berry growth and seed formation are known to be related, the correlation between the subtraits of seedlessness and berry weight/development observed at the phenotypic and genetic levels may be due to pleiotropy and may be directly or indirectly affected by growth regulators produced by seeds (Mejía et al. 2011). Similarly, the correlation between flowering time and seedlessness traits observed at the SNP locus rs188117546 (VVC2892A: expressed in very early stages of floral development) (Doligez et al. 2013) may arise because the genes controlling the two traits function independently of each other.

Minor loci were detected on other chromosomes; these may be due to genotype-environment interactions or to limited detection power resulting from the combination of a moderate population size and at least one major locus responsible for most of the phenotypic variance. In the present study, some loci were consistent with previously reported loci or related genes for seed traits. Nwafor et al. (2014) identified the gene VIT_12s0059g00560 as associated with seed development; this gene is located 43 kb downstream of the SNP rs125367605. Additionally, Royo et al. (2016) reported that the gene VIT_05s0020g02350 plays an important role in seed development; this gene is located 40 kb downstream of the SNP rs54076883.

Hundreds of positional candidate genes were found using SNPs, some of which might be valuable in grape because functional annotation indicated that their potential orthologues were involved in determining seed contents. Genes encoding ubiquitin protein, abscisic aldehyde oxidase, ethylene responsive transcription factor, zinc finger protein, somatic embryogenesis receptor, and MADS-box proteins were also identified by GWAS. Studies of transgenic tobacco and potato expressing a ubiquitin extension protein promoter-GUS fusion showed that the uidA gene is expressed in meristematic tissues, pollen and ovules (Garbarino and Belknap 1994; Callis et al. 1990). Hanania et al. (2009) reported that overexpression of the ubiquitin extension protein S27a in carpels and integuments may lead to embryo abortion and seedlessness in grape. Abscisic aldehyde oxidase catalyzes the final step in the biosynthesis of abscisic acid, which plays a major role in seed development (González-Guzmán et al. 2004). Zinc finger protein genes constitute a large and complex gene family in plant genomes and have been described as functioning in seed development (Zhou et al. 2012). Furthermore, some genes are expected to control processes associated with seed development, such as hormone pathways (ethylene, gibberellin and auxin), sugar biosynthesis and transport, cell division or elongation, and signal transduction (Ledbetter and Ramming 2011; Peng et al. 2007; Lijavetzky et al. 2012). Finally, MADS-box transcription factors play an important role in floral and ovule development, which may be associated with seedlessness traits (Wang et al. 2015).

Conclusions

A high-density map of SNP markers from 199 grape accessions was developed using SLAF-seq technology and used for association mapping to investigate the genetic control of seedlessness traits. A major locus with the largest effect was found on Chr 18, along with some minor loci on other chromosomes. This study demonstrates that SLAF-seq and GWAS are powerful approaches for the dissection of complex traits in grape.