Introduction

The eggplant (Solanum melongena L.), also known as aubergine or brinjal, belongs to the Solanaceae family. It is one of the most important cultivated solanaceous crops, with a global production of 52.3 million tons from 1.8 million ha (FAO Statistics 2017, http://www.fao.org/). Eggplant is cultivated and consumed worldwide. China is the largest producer, accounting for more than half of global production (32.8 million tons from 785,000 ha), followed by India, Egypt, and Turkey (FAO Statistics 2017, http://www.fao.org/). Eggplant is also regarded as one of the healthiest vegetables because of its high content of vitamins, minerals, and bioactive compounds with antioxidant properties, such as chlorogenic acid and anthocyanins (Cericola et al. 2013; Plazas et al. 2013).

Eggplant is an Old World species that is hypothesized to have been domesticated from the wild species Solanum insanum (Ranil et al. 2017; Taher et al. 2017). Eggplant cultivation was documented in China 2000 years ago, and continual selection and breeding have produced varieties with diverse fruit shapes and colors (Hurtado et al. 2012). However, as in other domesticated crops, domestication outside the natural habitat of its wild ancestor and anthropogenic selection resulted in a drastic reduction in genetic diversity (Acquadro et al. 2017; Flint-Garcia 2013). Relying on a limited number of varieties as breeding material risks genetic erosion (Munoz-Falcon et al. 2009; Cericola et al. 2013). Although several studies have reported on eggplant genetic diversity (Cericola et al. 2013; Hurtado et al. 2012; Naegele et al. 2014), they focused mostly on germplasm collections rather than on cultivated varieties and hybrids, and tested only a limited number of DNA markers. Comprehensive studies of the genetic variation among commercial eggplant hybrids, which could clarify their genetic basis and relationships, are therefore lacking.

Over the past decades, eggplant cultivation has continued to increase, and hundreds of commercial varieties have been released into the seed market. Rapid variety identification and authentication methods are therefore needed to protect the economic interests of eggplant breeders, seed producers, and eggplant growers. The traditional method of variety identification, based on morphological characteristics assessed through field inspection, is time-consuming and easily influenced by environmental conditions. Genotyping with molecular markers has proven to be a reliable alternative that is more accurate and accessible (Munoz-Falcon et al. 2009; Gao et al. 2012; Jamali et al. 2019). Among molecular markers, SNPs, a third-generation marker system with high density across the genome and high genetic stability, have been widely used for genetic background selection and marker-assisted breeding, as well as for map-based cloning and QTL mapping (Cheng et al. 2016; Wu et al. 2018a). SNPs have also been successfully used to establish DNA fingerprints in several crops, such as rice and maize (Shirasawa et al. 2004; Tian et al. 2015), but they have rarely been applied to eggplant cultivars.

A large number of SNPs have been discovered in eggplant using RAD sequencing (Barchi et al. 2011), RNA sequencing (Gramazio et al. 2016), genotyping-by-sequencing (Acquadro et al. 2017), and whole-genome resequencing (Barchi et al. 2019a). Nevertheless, compared with other major Solanaceae crops such as tomato and potato (Kim et al. 2014; Hamilton et al. 2011; Vos et al. 2015), genome-wide SNPs in eggplant remain relatively unexplored. The draft eggplant genome SME_r2.5.1 was published by Hirakawa et al. (2014), and a well-assembled, chromosome-anchored genome of eggplant line 67/3 was later constructed (Barchi et al. 2019b). These resources make it possible to resequence diverse eggplants to explore genetic variation and genome-wide SNPs. For example, a recent study used resequencing data from eight S. melongena accessions and one S. incanum accession to identify SNPs, which were then used to genotype 422 eggplant accessions with Single Primer Enrichment Technology (SPET) (Barchi et al. 2019a). Studies have shown that surveying a wider gene pool is necessary to avoid polymorphism bias and yields markers better suited to a much wider range of germplasm and varieties (Wang et al. 2014; Vos et al. 2015). Variomes derived from large resequencing data sets have proven useful and efficient for selecting suitable SSR markers, termed perfect SSRs, for genotyping in cucumber (Yang et al. 2019); the perfect SSRs were selected based on their polymorphism, stable motif, and flanking sequence conservation in the cucumber variome. In this study, we used an eggplant variome based on resequencing data from 45 eggplant lines to select 219 informative genome-wide perfect SNPs with conserved flanking sequences. A newly established high-throughput SNP genotyping technique, target SNP-seq, was then successfully applied to genotype 377 eggplant varieties.
Further genetic analysis of the 377 eggplant varieties provided valuable insight into the effect of human selection on population structure and genetic diversity. A genome-wide association study identified several SNPs associated with fruit shape. This study provides useful information for eggplant variety management and for future marker-assisted breeding.

Material and methods

Plant material and DNA isolation

A total of 45 eggplant lines from China, South Asia, Japan, and Europe were resequenced, and 377 commercial eggplant varieties varying in fruit shape and color, provided by the Beijing Seed Management Station, were used to establish DNA fingerprints (Table S1). Total DNA was extracted from fresh true leaves using a CTAB-based method (Fulton et al. 1995).

Genome-wide SNP discovery

Resequencing data from the 45 eggplant lines (Genome Sequence Archive, http://bigd.big.ac.cn/, accession number CRA001645) were used for genome-wide SNP discovery. All high-quality reads were mapped to the reference genome (Barchi et al. 2019b) using BWA, and PCR duplicates were removed from the mapped reads. SNPs were called across the 45 lines with GATK using stringent filtering criteria. Detected SNPs meeting the following criteria were selected as candidate perfect SNPs (Yang et al. 2019): (1) minor allele frequency (MAF) > 0.4; (2) missing rate < 0.2; (3) heterozygosity < 0.2; and (4) no sequence variation within the 100-bp flanking regions. A minimal set of SNPs representing the total genetic diversity of the 45 lines across the genome was then selected (Henning et al. 2015; Yang et al. 2019) for genotyping by target SNP-seq.
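The four filtering criteria above can be sketched in Python as follows. This is a minimal illustration, not the GATK-based pipeline used in the study: it assumes genotypes have already been called and encoded per line as alternate-allele counts (0, 1, 2, or `None` for missing), and that `variant_positions` lists every variant detected in the variome (not only the filtered SNPs), so that the flanking check sees all nearby variation. The names `SnpRecord` and `select_candidate_perfect_snps` are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SnpRecord:
    chrom: str
    pos: int
    # Alternate-allele count per resequenced line: 0, 1, 2, or None if missing.
    genotypes: List[Optional[int]]


def maf(genotypes: List[Optional[int]]) -> float:
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def missing_rate(genotypes: List[Optional[int]]) -> float:
    return sum(g is None for g in genotypes) / len(genotypes)


def heterozygosity(genotypes: List[Optional[int]]) -> float:
    called = [g for g in genotypes if g is not None]
    return sum(g == 1 for g in called) / len(called) if called else 0.0


def has_flanking_variant(snp: SnpRecord, variant_positions: Dict[str, List[int]], flank: int = 100) -> bool:
    # variant_positions[chrom] must be sorted; any other variant within +/- flank bp disqualifies the SNP.
    positions = variant_positions.get(snp.chrom, [])
    lo = bisect.bisect_left(positions, snp.pos - flank)
    hi = bisect.bisect_right(positions, snp.pos + flank)
    return any(p != snp.pos for p in positions[lo:hi])


def select_candidate_perfect_snps(snps, variant_positions, min_maf=0.4, max_missing=0.2, max_het=0.2, flank=100):
    return [
        s for s in snps
        if maf(s.genotypes) > min_maf
        and missing_rate(s.genotypes) < max_missing
        and heterozygosity(s.genotypes) < max_het
        and not has_flanking_variant(s, variant_positions, flank)
    ]


snps = [
    SnpRecord("E01", 1000, [0, 0, 2, 2, 0, 2, None, 0, 2, 2]),  # passes all filters
    SnpRecord("E01", 5000, [0] * 9 + [2]),                        # MAF too low
    SnpRecord("E02", 300, [0, 0, 2, 2, 0, 2, None, 0, 2, 2]),    # another variant 50 bp away
]
variants = {"E01": [1000, 5000], "E02": [300, 350]}
print([s.pos for s in select_candidate_perfect_snps(snps, variants)])  # [1000]
```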

SNP genotyping by target SNP-seq

We designed a multiplex PCR panel with primers targeting the sequences flanking 150-bp regions that each contain one of the selected perfect SNPs. Target SNP-seq library construction followed the target SSR-seq protocol (Yang et al. 2019): a multiplex PCR was followed by a second PCR that added a universal adaptor and a sample-specific barcode to the amplified fragments of each sample. The purified PCR products of all samples were then pooled, and the resulting target SNP-seq library was sequenced on the Illumina HiSeq X Ten platform (Molbreeding Biotechnology Company, Shijiazhuang, China).

Raw sequencing reads were demultiplexed and assigned to samples according to their barcodes using the Illumina bcl2fastq pipeline (Illumina, San Diego, CA, USA). The reads of each sample were then mapped to the reference genome of eggplant line 67/3 (Barchi et al. 2019b) using BWA with default parameters (Li and Durbin 2009). SNP variants were identified and genotype data extracted using GATK (McKenna et al. 2010). At each SNP locus, the allele with the most reads was considered the major allele and the allele with the second-most reads the minor allele. A locus was called homozygous when the read frequency of the major allele exceeded 0.8, and heterozygous when the read frequencies of both the major and minor alleles exceeded 0.3 (Yang et al. 2019).
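The read-frequency rules above can be expressed as a small genotype caller. This sketch assumes per-allele read counts for one locus in one sample are already available (for example, tallied from the BWA alignments). How the original pipeline treated loci that meet neither rule, or low-depth loci, is not stated, so this version simply leaves them uncalled. `call_genotype` is a hypothetical name.

```python
from typing import Dict, Optional


def call_genotype(allele_counts: Dict[str, int],
                  hom_threshold: float = 0.8,
                  het_threshold: float = 0.3) -> Optional[str]:
    """Call a diploid genotype from per-allele read counts at one SNP in one sample."""
    total = sum(allele_counts.values())
    if total == 0:
        return None  # no reads: missing genotype
    # Major allele = most reads; minor allele = second-most reads.
    ranked = sorted(allele_counts.items(), key=lambda kv: kv[1], reverse=True)
    major, major_n = ranked[0]
    minor, minor_n = ranked[1] if len(ranked) > 1 else ("", 0)
    major_freq, minor_freq = major_n / total, minor_n / total
    if major_freq > hom_threshold:
        return major * 2  # homozygous, e.g. "AA"
    if major_freq > het_threshold and minor_freq > het_threshold:
        return "".join(sorted(major + minor))  # heterozygous, alleles in alphabetical order, e.g. "AG"
    return None  # falls between the two rules: left uncalled


print(call_genotype({"A": 950, "G": 50}))             # AA
print(call_genotype({"G": 520, "A": 460, "T": 20}))   # AG
print(call_genotype({"A": 700, "G": 200, "T": 100}))  # None
```

Sorting the two alleles of a heterozygote gives one canonical spelling ("AG" rather than "GA"), so calls from different samples can be compared directly.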

Population structure analysis of eggplant varieties

SNP genotype data were analyzed with the model-based program STRUCTURE (Pritchard et al. 2000). For each K (the assumed number of populations) from 1 to 10, ten independent runs were performed under the admixture model, each with a burn-in of 10,000 steps followed by 100,000 Markov chain Monte Carlo (MCMC) iterations. The most likely K was determined with ΔK (Evanno et al. 2005), which is based on the second-order rate of change of the estimated log probability of the data, LnP(D), with respect to K, as implemented in STRUCTURE HARVESTER (Earl and vonHoldt 2012). Each variety was assigned to a group according to its membership probability (Q). Principal component analysis (PCA), principal coordinate analysis (PCoA), and an unrooted neighbor-joining tree based on Nei's standard genetic distance (Nei 1978) were computed with the ape and poppr packages in R (Kamvar et al. 2014).
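As a worked illustration of the Evanno method, ΔK(K) = |mean L(K+1) − 2·mean L(K) + mean L(K−1)| / sd[L(K)], where L(K) is the LnP(D) value of each replicate run at a given K. The sketch below assumes the second-order difference is taken on run means and divided by the sample standard deviation of LnP(D) across runs at that K. The LnP(D) values in the example are invented for illustration and are not results from this study; `evanno_delta_k` is a hypothetical name.

```python
from statistics import mean, stdev
from typing import Dict, List


def evanno_delta_k(lnp_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K for each interior K, given LnP(D) values from replicate runs per K.

    Ks must be consecutive integers; Delta K is undefined for the smallest and largest K.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        # Second-order rate of change: L(K+1) - 2 L(K) + L(K-1), on run means.
        second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
        sd = stdev(lnp_by_k[k])  # requires at least two runs per K
        delta[k] = abs(second_diff) / sd if sd > 0 else float("inf")
    return delta


# Illustrative (invented) LnP(D) values, two runs per K.
example = {1: [-1000.0, -1002.0], 2: [-800.0, -804.0], 3: [-790.0, -792.0], 4: [-785.0, -789.0]}
delta_k = evanno_delta_k(example)
best_k = max(delta_k, key=delta_k.get)
print(best_k)  # 2
```

Dividing by the spread among runs penalizes a K whose likelihood jump is large but unstable across replicates.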

Marker polymorphism and population differentiation analysis in eggplant varieties

Minor allele frequency (MAF), gene diversity (GD), observed heterozygosity (Ho), and polymorphic information content (PIC) were calculated with a Perl script (Yang et al. 2019). To assess population differentiation, an analysis of molecular variance (AMOVA) among and within groups was conducted with the poppr R package (Kamvar et al. 2014), and pairwise fixation indices (Fst) between groups were estimated with the hierfstat R package.
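The per-marker statistics above follow standard definitions: GD = 1 − Σ pᵢ², PIC = GD − Σ_{i<j} 2 pᵢ² pⱼ², MAF is the frequency of the rarer allele, and Ho is the fraction of heterozygous calls. For a biallelic SNP with equal allele frequencies, GD reaches its maximum of 0.5 and PIC its maximum of 0.375. The sketch below is an assumed reimplementation with these textbook formulas, not the Perl script of Yang et al. (2019). Genotypes are two-character strings such as "AG", with `None` for missing calls, and the MAF line assumes biallelic markers. `marker_stats` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional


def allele_frequencies(genotypes: List[Optional[str]]) -> Dict[str, float]:
    counts = Counter(allele for g in genotypes if g for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def marker_stats(genotypes: List[Optional[str]]) -> Dict[str, float]:
    """MAF, GD, PIC, and Ho for one SNP; needs at least one called genotype."""
    called = [g for g in genotypes if g]
    freqs = list(allele_frequencies(called).values())
    gd = 1 - sum(p * p for p in freqs)                                    # gene diversity
    pic = gd - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))  # polymorphic information content
    maf = min(freqs) if len(freqs) > 1 else 0.0                           # assumes a biallelic marker
    ho = sum(g[0] != g[1] for g in called) / len(called)                  # observed heterozygosity
    return {"MAF": maf, "GD": gd, "PIC": pic, "Ho": ho}


print(marker_stats(["AA", "GG", "AG", "AG", None]))
# {'MAF': 0.5, 'GD': 0.5, 'PIC': 0.375, 'Ho': 0.5}
```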

Core SNP set development for variety identification

A Perl-based method developed by Yang et al. (2019) was used to select a core set of SNPs that represents the total genetic diversity and has high discriminating power. A saturation curve of discrimination power was plotted from pairwise comparisons of variety genotypes.
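Core-set selection of this kind can be approximated by a greedy procedure. The Perl method of Yang et al. (2019) is not described here, so the sketch below is a generic greedy alternative, not that method. At each step it adds the SNP that separates the most still-indistinguishable variety pairs, then records the fraction of varieties that differ from every other variety at one or more chosen SNPs; these fractions are the points of a saturation curve. Missing calls never count as a difference. `greedy_core_set` and `identified_fraction` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# matrix[variety][snp] -> genotype string such as "AG", or None if missing
Matrix = Dict[str, Dict[str, Optional[str]]]


def distinguishes(g1: Optional[str], g2: Optional[str]) -> bool:
    # Two varieties differ at a SNP only if both are called and the calls differ.
    return g1 is not None and g2 is not None and g1 != g2


def identified_fraction(matrix: Matrix, snps: List[str]) -> float:
    """Fraction of varieties that differ from every other variety at one or more of `snps`."""
    names = list(matrix)
    unique = sum(
        all(any(distinguishes(matrix[v][s], matrix[w][s]) for s in snps) for w in names if w != v)
        for v in names
    )
    return unique / len(names)


def greedy_core_set(matrix: Matrix, snps: List[str], target: float = 0.95) -> Tuple[List[str], List[float]]:
    pending = set(combinations(list(matrix), 2))  # variety pairs not yet told apart
    remaining, chosen, curve = list(snps), [], []
    while remaining and pending:
        # Pick the SNP that separates the most unresolved pairs (ties: first in list order).
        best = max(remaining, key=lambda s: sum(distinguishes(matrix[a][s], matrix[b][s]) for a, b in pending))
        resolved = {(a, b) for a, b in pending if distinguishes(matrix[a][best], matrix[b][best])}
        if not resolved:
            break  # no remaining SNP adds discriminating power
        pending -= resolved
        chosen.append(best)
        remaining.remove(best)
        curve.append(identified_fraction(matrix, chosen))  # one point of the saturation curve
        if curve[-1] >= target:
            break
    return chosen, curve


toy = {
    "v1": {"s1": "AA", "s2": "CC", "s3": "GG"},
    "v2": {"s1": "AA", "s2": "TT", "s3": "GG"},
    "v3": {"s1": "GG", "s2": "CC", "s3": "GG"},
    "v4": {"s1": "GG", "s2": "TT", "s3": "GG"},
}
print(greedy_core_set(toy, ["s1", "s2", "s3"]))  # (['s1', 's2'], [0.0, 1.0])
```

Greedy selection is not guaranteed to find the smallest possible core set, but it is simple and usually gets close.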

Core eggplant varieties selection

Within each population, a pairwise comparison matrix was built by counting the differential SNPs between each pair of varieties (Yang et al. 2019). Fewer differential SNPs indicate closer kinship. In each group, the 20% of varieties with the closest kinship to the others were designated core varieties.
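The core-variety step can be sketched as follows. How kinship was summarized from the pairwise matrix is not specified, so this sketch assumes that each variety is ranked by its mean number of differential SNPs to all other varieties in the same population, and that missing calls are ignored. `core_varieties` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple


def differential_snps(g1: List[Optional[str]], g2: List[Optional[str]]) -> int:
    # Count SNPs at which both varieties are called and their genotypes differ.
    return sum(1 for a, b in zip(g1, g2) if a is not None and b is not None and a != b)


def core_varieties(population: Dict[str, List[Optional[str]]],
                   fraction: float = 0.2) -> Tuple[List[str], Dict[str, float]]:
    """Return the `fraction` of varieties with the fewest mean differential SNPs, plus the means."""
    names = list(population)
    mean_diff = {}
    for v in names:
        diffs = [differential_snps(population[v], population[w]) for w in names if w != v]
        mean_diff[v] = sum(diffs) / len(diffs)
    n_core = max(1, round(fraction * len(names)))
    core = sorted(names, key=lambda v: mean_diff[v])[:n_core]  # fewest differences = closest kinship
    return core, mean_diff


pop = {
    "a": ["AA", "CC", "GT"],
    "b": ["AA", "CC", "GG"],
    "c": ["AA", "CT", "GT"],
    "d": ["TT", "CC", "GT"],
    "e": ["TT", "TT", "TT"],
}
core, means = core_varieties(pop)
print(core, means["a"], means["e"])  # ['a'] 1.5 2.75
```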

GWAS analysis

GWAS was performed with TASSEL v5.2.25 (Bradbury et al. 2007) using a mixed linear model (MLM, PCA + K model) that accounts for both population structure and kinship (Yu et al. 2006). The three fruit shape classes, round, oval, and long, were scored 1, 2, and 3, respectively. Associations with an adjusted p value below 0.005 were declared significant. For each significantly associated marker, a general linear model with all fixed-effect terms was used to estimate R2, the proportion of phenotypic variation explained by the marker.
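To illustrate what the per-marker R2 measures, the sketch below scores fruit shape as in the text (round = 1, oval = 2, long = 3) and computes the share of phenotypic variance explained by the genotype classes at a single marker, that is, the between-group sum of squares divided by the total sum of squares. This is a deliberately simplified one-way model without the PCA covariates or other fixed effects of the TASSEL analysis, so it should not be expected to reproduce the reported values. `marker_r2` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Optional

SHAPE_SCORE = {"round": 1, "oval": 2, "long": 3}


def marker_r2(genotypes: List[Optional[str]], shapes: List[str]) -> float:
    """R2 of fruit-shape score explained by genotype class at one SNP (one-way model)."""
    pairs = [(g, SHAPE_SCORE[s]) for g, s in zip(genotypes, shapes) if g is not None]
    scores = [y for _, y in pairs]
    grand = mean(scores)
    ss_total = sum((y - grand) ** 2 for y in scores)
    groups = defaultdict(list)
    for g, y in pairs:
        groups[g].append(y)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    return ss_between / ss_total if ss_total else 0.0


print(marker_r2(["AA", "AA", "GG", "GG"], ["round", "round", "long", "long"]))  # 1.0
print(marker_r2(["AA", "GG", "AA", "GG"], ["round", "round", "long", "long"]))  # 0.0
```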

Results

Genome-wide perfect SNPs discovery based on resequencing data of 45 eggplant lines

The reads from the 45 resequenced eggplant lines were mapped to the newly assembled eggplant reference genome (Barchi et al. 2019b) to discover genome-wide polymorphic SNPs. Of the 26,029,890 SNP loci obtained, 93,582 had a MAF above 0.4 and a missing rate and heterozygosity below 0.2, and were selected as candidate SNP loci. After examination of the 100-bp sequences flanking these 93,582 SNPs in the resequencing data, 1925 SNPs with no variation in their flanking sequences were chosen as perfect SNPs. A modified minimal marker protocol (Henning et al. 2015; Yang et al. 2019) was then applied to select the minimal number of SNPs representing the genetic diversity of the 45 lines. Of the 251 SNPs selected, 219 passed multiplex PCR panel design and were used for target SNP-seq genotyping. The 219 SNPs were distributed across all 12 chromosomes (Table S3), and a phylogenetic tree of the 45 lines built from these 219 SNPs showed classifications and relationships largely consistent with those obtained using all genome-wide SNPs (Fig. S1).

Genotyping analysis of eggplant varieties using target SNP-seq

The 219 selected perfect SNPs were successfully genotyped in the 377 eggplant varieties using target SNP-seq. The average read depth per SNP across the 377 varieties was 1041×, and 58% of samples were sequenced at a depth greater than 1000× (Fig. S2A). Of the 377 varieties, 372 (98.7%) had an alignment rate above 98% (Fig. S2B). Among aligned reads, the rate of alignment to target SNP regions exceeded 95% in every variety, averaging 97.4% (Fig. S2C). Only three SNP markers had any missing data, and the highest missing rate was just 0.53%, demonstrating that the 219 selected SNPs were highly conserved and reliably genotyped. In addition, the target SNP-seq uniformity index, the proportion of targets with coverage above 10% of the mean depth, was calculated for each variety to infer genotyping accuracy (Fig. S2D; Nishio et al. 2015). Almost all varieties (99.5%) had a uniformity index above 98%, indicating high accuracy of target SNP-seq.
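Read literally, the uniformity index for one variety is the fraction of target SNP amplicons whose read depth exceeds 10% of that variety's mean target depth. A minimal sketch under that reading, assuming per-target depths are already available for the sample (`uniformity_index` is a hypothetical name):

```python
from typing import Sequence


def uniformity_index(depths: Sequence[int], fraction: float = 0.1) -> float:
    """Share of targets with depth above `fraction` x the sample's mean target depth."""
    mean_depth = sum(depths) / len(depths)
    threshold = fraction * mean_depth
    return sum(d > threshold for d in depths) / len(depths)


# Nine well-covered targets and one weak target (50 reads vs. a threshold of about 90.5 reads).
print(uniformity_index([1000] * 9 + [50]))  # 0.9
```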

Using these 219 SNPs, a unique genetic fingerprint was established for each of the 377 eggplant varieties. PIC values of the 219 SNPs in the eggplant varieties ranged from 0.078 to 0.375 (mean 0.313), and 68.64% of SNPs had a PIC value above 0.3 (Fig. 1a). A total of 119 SNPs (54.3%) had a MAF above 0.3, and the mean MAF was 0.311 (Fig. 1b). Observed heterozygosity (Ho) averaged 0.224, with 57.9% of SNPs above 0.2 (Fig. 1c). The mean gene diversity (GD) of the 219 SNPs was 0.398, ranging from 0.0813 to 0.5 for individual markers (Fig. 1d). These results indicate that the 219 perfect SNPs are informative, have good discriminating capacity, and are suitable for variety identification and genetic diversity analysis.

Fig. 1

Genetic diversity analysis with 219 perfect SNPs in the 377 eggplant varieties. Polymorphic information content (PIC; a), minor allele frequency (MAF; b), observed heterozygosity (Ho; c), gene diversity (GD; d)

Population structure analysis in eggplant varieties

The population structure of the 377 eggplant varieties was analyzed using the 219 perfect SNPs. The varieties include major contemporary varieties that originated in China, as well as varieties imported from other countries such as the Netherlands and France. Model-based structure analysis identified K = 2 as the best K value (Fig. 2a). At K = 2, the varieties separated according to origin and geographic distribution: the 299 varieties in Pop1 mostly originated in China and East Asia, whereas the 78 varieties in Pop2 were mainly introduced from Europe or share ancestry with European varieties, such as the cultivars ‘Jin Qie 320’ and ‘17z36’ (Fig. 2b). Both Pop1 and Pop2 contain eggplants with diverse fruit colors and shapes. To further analyze the genetic structure of the varieties, the population structure at K = 3 was examined. Pop1 was divided into two subpopulations, Pop1A and Pop1B, that clearly differed in fruit shape (Fig. 2b). PCA and PCoA were also conducted to assess population structure (Fig. 2c). The two-dimensional PCA and PCoA plots clearly showed three clusters of varieties, consistent with Pop1A, Pop1B, and Pop2 as inferred by structure analysis at K = 3. An unrooted neighbor-joining tree constructed from pairwise genetic distances (Fig. 3) was likewise in accordance with the population structure inferred by structure analysis, PCA, and PCoA. Together, these analyses suggest that the 377 eggplant varieties can be classified into three populations. Admixed varieties (membership coefficient for their own population below 0.8) were observed in all three populations. Pop1A and Pop2 contained few admixed varieties, whereas Pop1B contained relatively more, with membership coefficients from both Pop1A and Pop2.
Interestingly, Pop1A consisted mainly of round- and oval-fruited eggplants, whereas most varieties in Pop1B had elongated fruits. This shows that fruit shape is strongly correlated with population structure among Asian eggplant varieties, and that fruit shape is a major selection trait that has shaped their genetic structure.

Fig. 2

Population structure, PCA, and PCoA analysis of the 377 eggplant varieties. (a) ΔK plot derived from the 219-SNP genotyping results. (b) Population classification at K = 2 (Pop1 in orange, Pop2 in green) and at K = 3 (Pop1A in red, Pop1B in blue, Pop2 in green). (c) Principal component analysis and principal coordinate analysis based on the 219-SNP genotyping results. Varieties in Pop1A, Pop1B, and Pop2 are colored as in (b)

Fig. 3

Unrooted neighbor-joining tree of 377 eggplant varieties based on the 219 perfect SNPs. The varieties in Pop1A, Pop1B, and Pop2 are colored as in Fig. 2

AMOVA was conducted to assess the population structure of the 377 eggplant varieties based on the 219 perfect SNPs. Differences among the three proposed populations (Pop1A, Pop1B, and Pop2) accounted for 36.9% of the variation, differences among varieties accounted for 48.4%, and the smallest share (14.7%) occurred within populations (Table 1). Pairwise Fst values between the three populations were also estimated to test for significant differentiation. Within Pop1, the round- and oval-fruited eggplants in Pop1A formed a population distinct from the long-fruited eggplants in Pop1B (Fst = 0.2614). Pop2 was genetically differentiated from both Pop1A (Fst = 0.5023) and Pop1B (Fst = 0.3966); its lower Fst with Pop1B indicates a closer relationship with that population (Table S4). This relatively close relationship between Pop1B and Pop2 may partly reflect greater gene exchange between them, as suggested by their admixture.
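The idea behind a pairwise differentiation index can be illustrated with Nei's G_ST, computed per biallelic SNP from the mean within-population expected heterozygosity (H_S) and the pooled expected heterozygosity (H_T), then combined across SNPs as (ΣH_T − ΣH_S) / ΣH_T. This is a simplified stand-in: hierfstat includes other estimators, such as Weir and Cockerham's, which correct for sample size, so this sketch should not be expected to reproduce the Fst values reported above. `pairwise_gst` is a hypothetical name, and genotypes are two-character strings with `None` for missing calls.

```python
from typing import List, Optional

Genotypes = List[Optional[str]]


def allele_freq(calls: Genotypes, allele: str) -> Optional[float]:
    alleles = [a for g in calls if g for a in g]
    return alleles.count(allele) / len(alleles) if alleles else None


def pairwise_gst(pop1: List[Genotypes], pop2: List[Genotypes]) -> float:
    """Nei's G_ST between two populations; each is a list of per-variety SNP genotype lists."""
    hs_sum = ht_sum = 0.0
    for i in range(len(pop1[0])):
        g1 = [variety[i] for variety in pop1]
        g2 = [variety[i] for variety in pop2]
        alleles = sorted({a for g in g1 + g2 if g for a in g})
        if len(alleles) != 2:
            continue  # skip monomorphic (or non-biallelic) SNPs
        p1, p2 = allele_freq(g1, alleles[0]), allele_freq(g2, alleles[0])
        if p1 is None or p2 is None:
            continue  # no calls in one population
        hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
        p_bar = (p1 + p2) / 2
        ht = 2 * p_bar * (1 - p_bar)                        # heterozygosity of the pooled population
        hs_sum += hs
        ht_sum += ht
    return (ht_sum - hs_sum) / ht_sum if ht_sum else 0.0


fixed = pairwise_gst([["AA"], ["AA"]], [["GG"], ["GG"]])   # alternative alleles fixed
shared = pairwise_gst([["AA"], ["GG"]], [["AG"], ["AG"]])  # identical allele frequencies
print(fixed, shared)  # 1.0 0.0
```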

Table 1 Analysis of molecular variance (AMOVA) among populations and within populations in eggplant varieties

Core SNP set for genetic diversity analysis and variety identification

A small set of highly informative SNP markers is commonly selected as a core marker set to make genetic diversity analysis and variety identification simpler and faster. In this study, 36 SNPs were chosen as the core SNP set, which could differentiate 95% of the 377 eggplant varieties (Fig. 4a). PCA and PCoA using the 36 markers showed three well-separated clusters, consistent with the analysis using all 219 perfect SNPs (Fig. 4b), and the neighbor-joining tree built from the 36 core SNPs also clearly classified the varieties into three groups (Fig. 4c). These 36 core SNPs are therefore sufficient to represent the genetic diversity of the 377 varieties and to identify varieties efficiently.

Fig. 4

Genotyping and genetic analysis of eggplant varieties based on 36 core SNPs. (a) Saturation curve of variety discrimination by the 219 perfect SNPs in the 377 eggplant varieties. (b) PCA and PCoA analysis using 36 core SNPs. (c) Unrooted neighbor-joining tree of 377 eggplant varieties based on the 36 core SNPs. The varieties in Pop1A, Pop1B, and Pop2 are colored as in Fig. 2

SNP polymorphism and genetic diversity within populations

The polymorphism of the 219 genome-wide perfect SNPs was further evaluated within each of the three populations. Pop1A had the fewest polymorphic markers (180 SNPs), whereas Pop1B and Pop2 each had 215 polymorphic SNPs (Table 2). The SNPs that were monomorphic in Pop1B were polymorphic in Pop2, and those monomorphic in Pop2 were polymorphic in both Pop1A and Pop1B, further indicating differentiation among the three populations. The lower proportion of polymorphic SNPs in Pop1A in turn resulted in lower average GD and PIC values (0.248 and 0.199) than in Pop1B (0.311 and 0.248) and Pop2 (0.307 and 0.248) (Table 2), demonstrating a narrower genetic background in the round- and oval-fruited varieties of Pop1A. We also assessed the inbreeding coefficient of the three populations. Pop2 had the lowest average inbreeding coefficient (0.065), Pop1B a moderate one (0.154), and Pop1A a markedly higher one (0.442) (Table 2). The high inbreeding coefficient of Pop1A corresponded to its low average Ho, whether calculated over all 219 SNPs (0.124) or over its 180 polymorphic SNPs (0.152) (Table S7). This indicates frequent inbreeding among the round- and oval-fruited eggplants in Pop1A, which may explain their low genetic diversity as detected by the 219 SNPs.
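A common estimator of the inbreeding coefficient is F = 1 − Ho/He, where He is the expected heterozygosity (gene diversity) under Hardy-Weinberg equilibrium; F increases as heterozygotes become rarer than expected. The estimator behind Table 2 is not stated, so the sketch below, which pools Ho and He across SNPs polymorphic in the population, is an assumption and is not expected to reproduce the reported values exactly. `inbreeding_coefficient` is a hypothetical name.

```python
from typing import List, Optional


def inbreeding_coefficient(snp_calls: List[List[Optional[str]]]) -> float:
    """F = 1 - sum(Ho) / sum(He) over SNPs polymorphic in the population.

    snp_calls[i] holds the genotype strings (e.g. "AG", or None) of all varieties at SNP i.
    """
    ho_sum = he_sum = 0.0
    for calls in snp_calls:
        called = [g for g in calls if g]
        alleles = [a for g in called for a in g]
        if not alleles:
            continue
        freqs = [alleles.count(a) / len(alleles) for a in set(alleles)]
        he = 1 - sum(p * p for p in freqs)  # expected heterozygosity
        if he == 0:
            continue  # monomorphic in this population: uninformative
        ho_sum += sum(g[0] != g[1] for g in called) / len(called)
        he_sum += he
    return 1 - ho_sum / he_sum if he_sum else float("nan")


# No heterozygotes at SNP 1, half heterozygous at SNP 2: F = 1 - 0.5 / 1.0
print(inbreeding_coefficient([["AA", "GG", "AA", "GG"], ["AG", "AA", "GG", "AG"]]))  # 0.5
```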

Table 2 Marker polymorphism within the defined three population groups

Genetic similarity and core varieties analysis

To further understand gene exchange and the genetic background of the 377 varieties, a genetic similarity matrix was built within each population based on the number of differential SNP genotypes between varieties, with fewer differential SNP genotypes indicating a closer relationship. Varieties in Pop1B and Pop2 showed higher average numbers of differential SNPs (95.72 and 96.81, respectively), whereas Pop1A displayed the lowest average (73.09), further indicating its narrower genetic background (Fig. S3). The 20% of varieties with the fewest differential SNPs in each subgroup were selected as core eggplant varieties (Table S6): 24 varieties represent Pop1A, including ‘Niu Xin Qie’ and ‘Jiu Ye’; 36 represent Pop1B, including ‘Hei Jao Zi’, ‘Jin Qie 218’, and ‘Ya Shu16–2’; and 16 represent Pop2, including ‘Brigitte’ and ‘Sharapova 10-203’.

Genome-wide association analysis of fruit shape in 377 eggplant varieties

The 219 genome-wide SNPs were used in a genome-wide association study to identify SNPs associated with fruit shape (round, oval, or elongated) in the 377 eggplant varieties. Five SNPs on chromosomes 3, 6, and 7 were strongly associated with fruit shape (p < 0.005) (Table 3; Fig. S4). Using the eggplant reference genome, we mapped previously identified fruit trait QTLs to the genome and compared their locations with those of the SNPs identified here. Three of the five newly identified SNPs co-localized with fruit trait QTLs from previous QTL mapping and association studies (Doganlar et al. 2002; Frary et al. 2014; Portis et al. 2014, 2015). SmSNP152 on chromosome 3 matched fs7.1, which is associated with fruit shape (Doganlar et al. 2002; Frary et al. 2014), as well as fdmaxE03.ML (Portis et al. 2014) and E03.2 (Portis et al. 2015), which are associated with fruit diameter in eggplant. The syntenic tomato region of SmSNP152 (Hirakawa et al. 2014) contains one SUN homolog and two OVATE homologs (Table S8). SUN and OVATE family genes have been well studied as regulators of fruit elongation in tomato (Liu et al. 2002; Xiao et al. 2008; Wu et al. 2018b). SmSNP025 and SmSNP034, both on chromosome 7, also mapped to known QTL regions underlying fruit length and shape in eggplant and are syntenic with regions of the tomato genome containing OVATE-like genes (Portis et al. 2014). SmSNP133 and SmSNP191 did not match any published eggplant QTLs, but their syntenic tomato regions also harbor SUN homologs (Table S8). These results suggest that SUN and OVATE homologs may play key roles in controlling fruit shape in eggplant.

Table 3 SNPs associated with eggplant fruit shape uncovered by association analysis

Discussion

Genome-wide perfect SNP discovery and its efficient utility in variety identification

In this study, we showed that the method for selecting perfect SSRs (Yang et al. 2019) can be adapted to select perfect SNPs from the variome of 45 eggplant lines. Validation with the novel target SNP-seq method in a larger collection of 377 eggplant varieties demonstrated good performance and efficiency, with an extremely low missing rate for the 219 selected perfect SNPs. This shows that target SNP-seq is a reliable and efficient method for high-throughput genotyping: simultaneous amplification of hundreds of target sites by multiplex PCR and pooling of multiple samples in a single sequencing run greatly reduce the time and cost of large-scale genotyping. The high PIC and MAF values of the 219 perfect SNPs also indicate that they are informative and could be used for variety identification and authentication in a wider range of eggplant cultivars. Variomes thus proved useful for marker selection and evaluation, reducing the number of markers that need to be validated and ensuring marker quality. Combining variomes with a high-throughput genotyping method such as target SNP-seq could speed up molecular marker development and genotyping for variety identification, as well as molecular breeding, in many other crops.

Fruit shape as a major selection trait in eggplants

Model-based population structure analysis, PCA, and PCoA detected three differentiated populations, Pop1A, Pop1B, and Pop2, within the 377 eggplant varieties. The collection was first divided into two main groups: Pop1, comprising eggplants originating from China and East Asia, and Pop2, comprising eggplants with Western European ancestry. This is consistent with Western Europe and China being two secondary centers of eggplant diversity, and with previous genetic structure analyses (Cericola et al. 2013; Taher et al. 2017). The molecular differentiation between these groups could be attributed to human selection, mutation and recombination, and environmental adaptation in two different geographic regions. Within Pop1, two subpopulations, Pop1A and Pop1B, were detected and found to be correlated with fruit shape: most eggplants in Pop1A have round or oval fruit, whereas Pop1B is composed of long-fruited eggplants. This provides genetic evidence that fruit shape is a major selection trait in eggplant, consistent with a previous study showing a correlation between phylogenetic classification and fruit shape within clades (Barchi et al. 2011). Fruit morphology, as a major target of human selection and breeding, has therefore had a significant impact on the genetic structure of eggplants. Similar patterns are observed in other vegetable crops such as tomato and pepper, in which selection for fruit shape, together with market specialization and environmental adaptation, is an important factor shaping genetic structure (Sim et al. 2009; Gonzalez-Perez et al. 2014).

In this study, we observed markedly lower genetic diversity in the round- and oval-fruited eggplants of Pop1A, as indicated by their lower average GD and PIC values (Table 2). The low Ho and high inbreeding coefficient of Pop1A further suggest a high level of inbreeding, which could be the main reason for its narrow genetic background (Table 2). Pop1A also showed closer genetic similarity, with the lowest average number of differential SNPs (Fig. S3). These findings suggest that limited genetic material was used in breeding programs for round- and oval-fruited eggplants, so these varieties face a high risk of genetic erosion that requires urgent attention from breeders. The few Pop1A varieties with partial membership in Pop1B according to structure analysis all have elongated fruit. This suggests that it may be difficult to preserve round or oval fruit shape when genetic material from long-fruited eggplants is introduced into round- or oval-fruited eggplants. Genes promoting fruit elongation may be dominant, so crossing round- or oval-fruited eggplants with long-fruited eggplants may be more likely to produce long-fruited progeny. Marker-assisted selection (Zhou et al. 2003) could quickly identify progeny with the desired fruit shape, allowing distant genetic material to be introduced into round- or oval-fruited eggplants while preserving their fruit shape. This would require an understanding of the genetic basis of fruit shape in eggplant and the identification of associated markers and genes for use in marker-assisted selection and molecular breeding.

Conservation of SUN and OVATE-like gene function in fruit shape control

Genotyping the 377 varieties with 219 genome-wide SNPs provided an opportunity to conduct an association analysis of fruit shape and discover SNPs for marker-assisted selection. Five associated SNPs were detected. Using location information from the new reference genome, SmSNP025, SmSNP034, and SmSNP152 were found to co-localize with QTL regions identified in previous studies (Table 3) (Doganlar et al. 2002; Frary et al. 2014; Portis et al. 2014, 2015). In particular, the SmSNP152 region on chromosome 3 has been associated with fruit shape in QTL mapping studies using two independent F2 populations in different locations (Doganlar et al. 2002; Frary et al. 2014; Portis et al. 2014) and in one genome-wide association study (Portis et al. 2015). The repeated detection of this region highlights its importance in fruit shape control. Moreover, its identification in a population derived from a cross between cultivated eggplant, S. melongena, and its wild relative Solanum linnaeanum (Doganlar et al. 2002) suggests that it may have been selected during domestication. These results show that the 219 genome-wide SNPs are useful for association studies: we confirmed three QTLs and identified two novel regions associated with fruit shape in eggplant. The five associated SNPs should be valuable for marker-assisted selection, and further QTL mapping and GWAS with higher marker density in these regions could help identify the causal genes.

SUN and OVATE-like genes are two important gene families controlling fruit elongation, and they have been mapped in several crops, including tomato, pepper, cucumber, and melon (Liu et al. 2002; Xiao et al. 2008; Zygier et al. 2005; Pan et al. 2017; Che and Zhang 2019; Monforte et al. 2014). SUN encodes a protein that promotes fruit elongation (Xiao et al. 2008), whereas OVATE encodes a negative regulator of growth, as its null mutation results in elongated fruit (Liu et al. 2002; Wu et al. 2018b). SUN and OVATE family genes have both been shown to influence microtubule organization and dynamics, with opposite effects on cell shape and organ morphology (Lazzaro et al. 2018). In this study, examination of the syntenic tomato regions of the five associated SNPs showed that each contains SUN or OVATE-like genes. In particular, the syntenic tomato region of SmSNP034 contains SlOFP20, which has been fine-mapped, cloned, and shown to contribute to tomato fruit shape (Wu et al. 2018b). The repeatedly mapped region of SmSNP152 contains two OVATE-like genes and one SUN-like gene. The association between eggplant fruit shape and regions containing SUN and OVATE-like genes suggests that their function in fruit shape control is conserved in eggplant.

Conclusion

We showed that the eggplant reference genome and the variome of diverse genetic materials can be used to select 219 genome-wide perfect SNPs for variety identification and genetic analysis. The correlation between fruit shape and genetic structure among Asian eggplant varieties demonstrates that fruit shape is a major selection trait. The narrow genetic background detected with the 219 perfect SNPs in round- and oval-fruited Asian eggplants indicates a risk of genetic erosion and an urgent need to broaden the genetic materials used in breeding programs. Our association study identified both previously detected and novel genomic regions associated with fruit shape in eggplant. Overall, this study shows that genome-wide perfect SNPs are a valuable tool for variety identification and genetic analysis.