Introduction

Soybean mosaic virus (SMV) is a single-stranded, positive-sense RNA virus of the genus Potyvirus (Adams et al. 2005). SMV-infected soybean plants show obvious mosaic symptoms on their leaves and suffer severe yield loss. Based on disease symptoms in specific soybean cultivars, SMV isolates have been divided into seven strains (G1 to G7) in America (Cho and Goodman 1979) and twenty-two strains (SC1 to SC22) in China (Wang et al. 2003; Guo et al. 2005; Zhan et al. 2015; Li et al. 2010). However, the relationship between the American and Chinese SMV strains has not yet been reported. Breeding and planting soybean cultivars with broad-spectrum resistance is an effective and eco-friendly way to reduce the yield loss caused by SMV. Four resistance loci, Rsv1, Rsv3, Rsv4, and Rsv5 (Resistance to SMV), have been reported to confer resistance to several SMV strains (Kiihl and Hartwig 1979; Buss et al. 1997; Hayes et al. 2000; Klepadlo et al. 2017). A different naming system has been used in China. Resistance loci on chromosome 2, including RSC7 and RSC8 (Resistance to SMV in China), derived from Kefeng No. 1, confer resistance to SC1-SC13, SC16, SC18-SC21, and G1-G7 (Li et al. 2010; Yan et al. 2015; Wang et al. 2011b; Cai et al. 2014). Resistance loci on chromosome 13, including RSC3 and RSC14, derived from Qihuang No. 1, confer resistance to SC1-SC8, SC11-SC14, SC16-SC18, SC20-SC22, and G1-G7 (Li et al. 2010; Cai et al. 2014; Zheng et al. 2014; Ma et al. 2011). The resistance locus RSC4 on chromosome 14, derived from Dabaima, confers resistance to SC4 and SC10 (Wang et al. 2011a; Li et al. 2012). RSC15 is a recently identified locus on chromosome 6, derived from RN-9, that confers resistance to the most virulent strain, SC15 (Ren et al. 2017). Among the different loci, clusters of resistance genes containing nucleotide-binding site (NBS) and leucine-rich repeat (LRR) domains have been identified at Rsv1 and Rsv3 (Wang et al. 2011a; Hayes et al. 2004; Yang et al. 2013; Ma et al. 2016). Recently, the Rsv4 gene, which encodes an RNase H family protein, was isolated from Peking by map-based cloning; the Rsv4 protein inhibits virus multiplication by degrading dsRNA, thereby conferring broader resistance to potyviruses (Ishibashi et al. 2019). Silencing the chaperone gene GmHSP40.1 near Rsv4 enhanced the susceptibility of soybean plants to SMV (Liu and Whitham 2013). Taken together, it is of great significance to explore resistance genes in resistant germplasm for breeding soybean varieties with broad-spectrum resistance.

Much work has been done on mapping resistance loci and predicting candidate genes. However, few Rsv or Rsc genes have been cloned and functionally characterized. One likely reason is that many resistance loci were mapped to large regions using biparental mapping populations with low genetic diversity and low mapping resolution. Therefore, high-density markers and new methods are required to identify resistance genes. Many putative genes have been identified in plants via genome-wide association studies (GWAS) based on populations with broad heritable diversity and high-density single-nucleotide polymorphism (SNP) markers. Three causal genes, LOC_Os01g62780, LOC_Os11g08410, and LOC_Os08g37890, associated with days to heading, plant height, and panicle length, respectively, were identified in rice by GWAS and are therefore important for improving agronomic traits (Yano et al. 2016). Soybean GmMYB29, an R2R3-type MYB transcription factor, was identified by GWAS and shown to function in isoflavone biosynthesis (Chu et al. 2017). Two homeologous genes, Tof12 and Tof11, were identified by GWAS and provided evidence that flowering phenology was a key domestication trait in soybean (Lu et al. 2020; Gong 2020). Che et al. (2017) identified 104 SNPs significantly associated with resistance to SMV strain SC7 in a soybean mutant panel.

Few studies have used GWAS to uncover SMV resistance loci in natural soybean panels. The objective of this study was to screen for SMV-resistant germplasm in a natural soybean population and to identify important SMV resistance loci across multiple environments using GWAS. This information could be used for marker-assisted selection in soybean resistance breeding programs.

Material and methods

Plant materials and SC3 strain

A natural population of 219 soybean accessions (195 landraces and 24 elite varieties) collected from 26 provinces and six ecological regions in China was used for GWAS (Wang et al. 2016). Viruliferous leaves carrying SMV strain SC3 were stored at −80 °C. All soybean accessions and the SMV strain were provided by the National Center for Soybean Improvement (Nanjing Agricultural University, Nanjing, China). The resistance evaluation experiments were conducted in 2010, 2011, and 2012 in an aphid-free greenhouse at the Jiangpu Experimental Station of Nanjing Agricultural University (32° 12′ N, 118° 37′ 48″ E), and these three environments were designated E1, E2, and E3, respectively. In each environment, all accessions were planted in a randomized complete block design.

Resistance evaluation and phenotype investigation

About thirty seedlings of each accession were planted in round pots filled with sand. The unifoliate leaves were inoculated with an SC3 inoculum prepared by grinding viruliferous leaves with a mortar and pestle in 0.1 M sodium phosphate buffer (pH 7.2–7.4), and were finally rinsed with tap water. Foliar symptoms of SC3 infection were monitored weekly after inoculation and counted continuously from 2 to 4 weeks post inoculation; the disease rate (DR) phenotype was then scored according to a previously reported standard (Pu et al. 1983). DR was defined as the ratio of the number of plants with systemic mosaic symptoms to the total number of plants. Based on this standard, accessions with DR under 10% were classified as resistant and those with DR over 10% as susceptible.
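The DR definition and the 10% cutoff described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and because the text does not say how an accession with DR of exactly 10% is scored, the sketch assumes it is treated as susceptible.

```python
def disease_rate(mosaic_plants: int, total_plants: int) -> float:
    """Return DR: plants with systemic mosaic symptoms / total plants."""
    if total_plants <= 0:
        raise ValueError("total_plants must be positive")
    if not 0 <= mosaic_plants <= total_plants:
        raise ValueError("mosaic_plants must lie between 0 and total_plants")
    return mosaic_plants / total_plants


def classify_accession(dr: float, threshold: float = 0.10) -> str:
    """Classify an accession from its DR.

    DR under the threshold is resistant. A DR of exactly 10% is not
    covered by the text; treating it as susceptible is an assumption.
    """
    return "resistant" if dr < threshold else "susceptible"


# Example with made-up counts: 2 symptomatic plants out of 30 seedlings.
example_dr = disease_rate(2, 30)
example_class = classify_accession(example_dr)
```

With about thirty seedlings per accession, an accession is scored as resistant only if at most two plants show systemic mosaic symptoms.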

Phenotypic data analysis

Phenotypic data analyses, including descriptive statistics, best linear unbiased prediction (BLUP), analysis of variance (ANOVA), and broad-sense heritability of the disease rates, were performed in R (R Core Team 2018). To minimize the effects of variation among the three environments, BLUP values were computed from the phenotypic data using the lme4 package in R (Merk et al. 2012) with the model $Y_{ik} = \mu + G_i + Y_k + GY_{ik} + \varepsilon_{ik}$, where $Y_{ik}$ is the phenotypic value, $\mu$ is the overall mean, $G_i$ is the effect of the $i$th genotype, $Y_k$ is the effect of the $k$th year, $GY_{ik}$ is the genotype × year interaction effect, and $\varepsilon_{ik}$ is the residual error. The broad-sense heritability ($h^2$) of the phenotypic data was estimated as $h^2 = \sigma_g^2 / (\sigma_g^2 + \sigma_{gy}^2/n + \sigma_e^2/(nr))$, where $\sigma_g^2$ is the genotypic variance, $\sigma_{gy}^2$ is the genotype × environment interaction variance, $\sigma_e^2$ is the residual error variance, $n$ is the number of environments, and $r$ is the number of replications (Knapp et al. 1985).
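As a worked illustration of the heritability formula above, the following Python sketch evaluates it for given variance components. The function name and the variance components are hypothetical (they are not the estimates from this study), and the number of replications is not stated in the text, so r = 3 is an assumption; only n = 3 environments comes from the study design.

```python
def broad_sense_heritability(var_g: float, var_gy: float, var_e: float,
                             n_env: int, n_rep: int) -> float:
    """h2 = var_g / (var_g + var_gy / n + var_e / (n * r))."""
    if n_env <= 0 or n_rep <= 0:
        raise ValueError("n_env and n_rep must be positive")
    phenotypic_var = var_g + var_gy / n_env + var_e / (n_env * n_rep)
    if phenotypic_var <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / phenotypic_var


# Hypothetical variance components, for illustration only.
h2_example = broad_sense_heritability(
    var_g=0.060,   # genotypic variance (made up)
    var_gy=0.030,  # genotype x environment variance (made up)
    var_e=0.018,   # residual variance (made up)
    n_env=3,       # three environments, as in the study design
    n_rep=3,       # assumption: replication number is not given in the text
)
```

Because the interaction and residual variances are divided by n and nr, adding environments or replications raises the heritability of the genotype means even when the variance components themselves are unchanged.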

GWAS analysis

The 219 soybean accessions were genotyped using the NJAU 355K SoySNP array as previously reported (Wang et al. 2016). A total of 292,053 high-quality SNP markers were identified, providing approximately one SNP per 3.3 kb across the 20 soybean chromosomes. In this study, 207,608 SNPs with minor allele frequency (MAF) ≥ 5% were retained, and the phenotypes from the three environments together with the BLUP values were used for association analysis. GWAS was performed using a compressed mixed linear model (CMLM) implemented in the R package Genome Association and Prediction Integrated Tool (GAPIT) (Lipka et al. 2012). The Bonferroni threshold for SNPs significantly associated with DR was set at P ≤ 1/207,608 = 4.82e-06, that is, −log10(P) ≥ 5.32. Linkage disequilibrium (LD) block and haplotype analyses were performed using Haploview 4.2 (Barrett et al. 2005).
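The threshold arithmetic above can be checked with a short Python sketch. The function and constant names are hypothetical; the numerator of 1 follows the 1/207,608 expression in the text (a conventional Bonferroni correction would use a significance level such as 0.05 in its place).

```python
import math


def bonferroni_threshold(n_tests: int, alpha: float = 1.0) -> float:
    """Return alpha / n_tests; alpha = 1.0 mirrors the 1/n cutoff in the text."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


N_SNPS = 207_608                              # SNPs with MAF >= 5%
P_CUTOFF = bonferroni_threshold(N_SNPS)       # about 4.82e-06
NEG_LOG10_CUTOFF = -math.log10(P_CUTOFF)      # about 5.32


def is_significant(p_value: float, cutoff: float = P_CUTOFF) -> bool:
    """True when a SNP's P value meets the genome-wide cutoff."""
    return p_value <= cutoff
```

Working on the −log10 scale is equivalent: a SNP passes when −log10(P) is at least `NEG_LOG10_CUTOFF`.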

Putative gene prediction and expression analysis

To explore the putative genes underlying significant association signals, predicted genes in the genomic regions within the LD decay distance (130 kb) of the significant SNPs were retrieved based on the annotation of the soybean reference genome Glyma.Wm82.a1.v1.1 (http://phytozome.net). Genes with annotations related to biotic stress or disease resistance were selected as putative genes. In addition, a BLASTP analysis was performed against the Arabidopsis and rice genomes using the amino acid sequences of the soybean putative genes.
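The window-based gene search described above can be sketched as a simple interval overlap test. This is a minimal sketch under stated assumptions: the `Gene` record, the function name, the gene identifiers, and all coordinates are hypothetical, and the 130-kb LD decay distance is assumed to be applied on both sides of the SNP (which would match the 260-kb region shown for chromosome 8).

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Minimal gene annotation record (hypothetical layout)."""
    gene_id: str
    chrom: str
    start: int
    end: int


def genes_near_snp(genes: List[Gene], chrom: str, snp_pos: int,
                   window: int = 130_000) -> List[str]:
    """Return IDs of genes overlapping [snp_pos - window, snp_pos + window]."""
    lower, upper = snp_pos - window, snp_pos + window
    return [g.gene_id for g in genes
            if g.chrom == chrom and g.end >= lower and g.start <= upper]


# Made-up annotation and SNP position, for illustration only.
annotation = [
    Gene("geneA", "Gm08", 1_000_000, 1_004_000),
    Gene("geneB", "Gm08", 1_120_000, 1_131_000),  # overlaps the window edge
    Gene("geneC", "Gm08", 1_300_000, 1_305_000),  # outside the window
    Gene("geneD", "Gm20", 1_000_000, 1_004_000),  # wrong chromosome
]
hits = genes_near_snp(annotation, "Gm08", snp_pos=1_000_500)
```

Genes returned by such a search would then still need to be filtered by annotation, as described in the text.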

Two expression datasets downloaded from public databases were used for the expression analysis of putative genes. One dataset, covering seed development, vegetative tissues (leaves, roots, stems, and seedlings), and reproductive tissues (floral buds), was downloaded from the Plant Expression Database (https://www.plexdb.org, GEO accession number: GSE29163). The other, containing six tissues at different developmental stages, was downloaded from SoyBase (https://www.soybase.org/soyseq/). The multi-array comparative analysis method is a widely used preprocessing algorithm for gene expression microarrays. The normalized microarray values were log2-transformed. To visualize the expression data, two heat maps were constructed with HemI software (Deng et al. 2014).

To evaluate whether the three candidate genes were induced by the SC3 strain, the resistant accession Kefeng No.1 and the susceptible accession Nannong 1138-2 were planted in an aphid-free greenhouse, and leaves from three biological replicates were collected in liquid N2 at 0, 1, 2, and 3 days after SC3 inoculation or mock inoculation with sodium phosphate buffer. Total RNA was isolated with the Ultrapure RNA Kit (CWBIO). Genomic DNA removal and complementary DNA synthesis were performed with a first-strand cDNA synthesis kit (Takara). Quantitative real-time PCR (qRT-PCR) was performed using the LightCycler 480 (Roche). The PCR cycling conditions were 95 °C for 30 s, followed by 40 cycles of 95 °C for 10 s and 60 °C for 60 s. The soybean tubulin gene (Glyma.05G157300) was used as an internal control for calculating the relative expression levels of candidate genes.
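The text does not state how relative expression was calculated from the qRT-PCR data. One common choice when a single internal control gene is used is the 2^(−ΔΔCt) method, sketched below purely as an assumption; the function name and the Ct values are hypothetical, and the method presumes roughly 100% amplification efficiency for both genes.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change by the 2^(-ddCt) method (assumed, not stated in the text).

    The reference gene would be the internal control (tubulin here);
    "control" would be the mock-inoculated sample at the same time point.
    """
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Made-up Ct values: the target amplifies one cycle earlier after inoculation.
fold_change = relative_expression(
    ct_target_treated=24.0, ct_ref_treated=20.0,
    ct_target_control=25.0, ct_ref_control=20.0,
)
```

A fold change above 1 indicates upregulation relative to the mock-inoculated control, and a value below 1 indicates downregulation.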

Results

Significant phenotypic variation in soybean natural population

The disease rates for SMV strain SC3 in the three environments were used as phenotypic traits and analyzed by descriptive statistics, ANOVA, and broad-sense heritability to determine the diversity of resistance in a natural soybean population. A total of 219 accessions, representing a broad range of the genetic variation of cultivated soybean in China, were collected from three ecological habitats. As expected, a high level of variation in DR was observed in this panel (Table 1). DR ranged from 0 to 1, with mean values ranging from 0.64 to 0.70 across the three environments. Moreover, the phenotypic values exhibited large coefficients of variation (CV): 52.94, 39.39, and 52.83 in the three environments, respectively. The frequency distribution of DR in the three environments was skewed (Fig. 1), suggesting that DR is a complex quantitative trait controlled by multiple genes. The ANOVA showed that genotype and the genotype × environment interaction significantly influenced DR. To explore whether DR could be stably inherited, the broad-sense heritability was calculated in this population and was found to be high, at 79.02%. Collectively, these results indicated that DR varied widely and was highly heritable in this population, and was mainly affected by genetic factors.

Table 1 Descriptive statistics, ANOVA and broad heritability of disease rates to SC3 in three environments
Fig. 1 Phenotypic analysis of disease rates to SC3 in three individual environments. a Frequency distribution of DR in three environments; E, environment. b Box plot of DR in three environments. Bold line is the median value; black dots are DR values of the 219 soybean accessions

Significant loci associated with DR

We retained 207,608 polymorphic SNPs with MAF ≥ 0.05 for GWAS analysis. A CMLM was used to correct for population structure and kinship. In the quantile-quantile (QQ) plots (Fig. 2), the points in the lower left corner represent loci of low significance that are not associated with the disease rate trait, so their observed −log10(P) values should agree with the expected −log10(P) values. These loci indeed lay on the diagonal, indicating that the CMLM was suitable for the disease rate trait in GWAS. A total of twenty-four SNPs exceeded the significance threshold (−log10(P) ≥ 5.32) across the three environments and BLUP (Table 2). Of these, ten SNPs on chromosome 13 significantly associated with DR were detected at least twice. The other fourteen significant SNPs, each identified in only a single environment, were located on chromosomes 2, 8, 13, and 20 (Table 2, Fig. 2).
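The QQ-plot reasoning above compares each observed −log10(P) with the value expected if no SNP were associated, in which case P values are uniformly distributed. A minimal Python sketch of that comparison follows; the function name is hypothetical, and the plotting position (rank − 0.5)/n is one common convention assumed here, not necessarily the one used by the GWAS software.

```python
import math
from typing import List, Tuple


def qq_points(p_values: List[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) -log10(P) pairs, most significant first.

    Expected values assume uniform P values under the null hypothesis,
    using the plotting position (rank - 0.5) / n (an assumed convention).
    """
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    expected = [-math.log10((rank + 0.5) / n) for rank in range(n)]
    return list(zip(expected, observed))


# P values that exactly follow the null expectation fall on the diagonal.
null_points = qq_points([0.9, 0.1, 0.5, 0.3, 0.7])

# Replacing the smallest P value with a strong signal lifts only that
# point above the diagonal, while the remaining points stay on it.
signal_points = qq_points([0.9, 1e-8, 0.5, 0.3, 0.7])
```

In a well-calibrated model, most points follow the diagonal and only the truly associated SNPs deviate upward at the right-hand end of the plot.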

Fig. 2 Manhattan plots and quantile-quantile plots for three environments and BLUP in GWAS results. E, environment; BLUP, best linear unbiased prediction. Horizontal lines indicate the Bonferroni-adjusted significance threshold, −log10(P) ≥ 5.32

Table 2 SNPs significantly associated with disease rate trait in soybean

Four SNPs identified by GWAS within or near previously fine-mapped QTLs associated with resistance to SMV on chromosome 2 (Yan et al. 2015; Wang et al. 2011b; Karthikeyan et al. 2017) were close to each other and explained about 31.48% of the phenotypic variation. On chromosome 13, a total of fifteen SNPs fell into two SNP clusters. One cluster included four SNPs (AX-93813964, AX-93813987, AX-93814000, and AX-93939503) located within or near previously identified resistance QTLs, such as Rsv1, Rsc-pm, and RSC11 (Yang et al. 2013; Shi et al. 2008; Bai et al. 2009); the other cluster, containing eleven SNPs, was located near several resistance QTLs, such as RSC18B, Rsc-ps, and Rsv5 (Klepadlo et al. 2017; Yang et al. 2013; Li et al. 2015). Remarkably, the phenotypic variation explained by these 15 adjacent SNPs ranged from 25.54 to 36.27%, suggesting that these loci are major QTLs with large effects. In addition, five SNPs significantly associated with resistance to SC3 on chromosomes 8 and 20 had not been reported previously and explained 31.50–32.65% of the phenotypic variation, possibly representing novel major loci underlying soybean resistance to SC3 (Fig. 3a, b). A new SNP locus, AX-93753793, on chromosome 8 was significantly associated with DR in E1 (Fig. 3a–c). Comparison of the allelic variants showed that the superior allele at this locus was associated with a decrease in average DR from 0.79 to 0.32 (Fig. 4a). Furthermore, the resistant accession Kefeng No.1 and the susceptible accession Nannong 1138-2 fell into different groups based on allelic variation at AX-93753793. On chromosome 20, four SNP loci were significantly detected in E1 (Fig. 3d–f). Based on the alleles at these four SNPs, the accessions of the natural population used in this study were classified into three haplotypes (Hap1–Hap3): Hap1 (ACCA), Hap2 (CTTG), and Hap3 (ATTG). Among them, Hap2 contained 19 soybean accessions with an average DR of less than 0.1, suggesting that Hap2 is the favorable haplotype for resistance to SC3 (Fig. 4b). These results indicated that a genetically diverse association panel and abundant genome-wide SNP markers are both important for mapping resolution, enabling GWAS to recover previously identified SMV resistance QTLs and to detect novel loci.
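The haplotype comparison above amounts to grouping accessions by their alleles at the four chromosome 20 SNPs and averaging DR within each group. The following Python sketch illustrates the idea; the function name is hypothetical and the DR values are made up (only the three haplotype strings come from the text).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Sequence, Tuple


def summarize_haplotypes(
    records: Iterable[Tuple[Sequence[str], float]]
) -> Dict[str, Tuple[int, float]]:
    """Map each haplotype string to (number of accessions, mean DR).

    Each record is (alleles at the four SNPs in order, DR of the accession).
    """
    groups = defaultdict(list)
    for alleles, dr in records:
        groups["".join(alleles)].append(dr)
    return {hap: (len(values), mean(values)) for hap, values in groups.items()}


# Made-up accessions, for illustration only.
toy_records = [
    (("A", "C", "C", "A"), 0.80),
    (("A", "C", "C", "A"), 0.70),
    (("C", "T", "T", "G"), 0.00),
    (("C", "T", "T", "G"), 0.10),
    (("A", "T", "T", "G"), 0.50),
]
summary = summarize_haplotypes(toy_records)
```

The haplotype with the lowest mean DR is the candidate favorable haplotype; group differences would still need a significance test, as in Fig. 4.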

Fig. 3 Regional plots of novel significant association loci on chromosomes 8 and 20. a Local Manhattan plot of a 260-kb region harboring the significant SNP on chromosome 8. The red dot is the most significant SNP, AX-93753793. b The soybean reference genome (Williams 82 a1.v1.1) in the 260-kb region on chromosome 8. Candidate genes are shown in red. c The linkage disequilibrium heat map. d Local Manhattan plot of a 396-kb region on chromosome 20; the four significant SNPs are shown as red dots. The soybean reference genome (e) and linkage disequilibrium heat map (f) in the 396-kb region on chromosome 20; the candidate gene is shown in red

Fig. 4 Superior allele analysis and haplotype analysis of SNPs significantly identified in GWAS. a Superior allele analysis on chromosome 8. b Haplotype analysis on chromosome 20. Box edges represent 25% quantile (top) and 75% quantile (bottom), bold line is the median value, square dot is the mean value, and black dot is an outlier. **, significant at P ≤ 0.01

Prediction of candidate genes and expression patterns analysis

A total of 24 SNPs significantly associated with DR were detected across the three environments and BLUP. More than two-thirds (19/24) of the SNPs were located within or close to previously reported loci. Fifteen SNPs on chromosome 13 associated with resistance to SC3 were flanked by the known major resistance loci Rsv1 and Rsv5. Previous studies reported that the resistance genes at the Rsv1 locus contain typical nucleotide-binding site and leucine-rich repeat domains. Four genes (Glyma13g25920, Glyma13g25950, Glyma13g25970, and Glyma13g26000) homologous to the reported resistance genes 5gG3 and 3gG2 were predicted as candidate genes (Li et al. 2017). The co-location of previously reported SC3 resistance loci with SNPs significantly identified in this study provided strong evidence for the credibility of the GWAS results.

The high density of SNP markers used in this panel made it possible to predict candidate genes. A new SNP locus, AX-93753793, on chromosome 8 explained 31.69% of the DR variation. We found a total of 28 genes in the LD region (260 kb) of SNP AX-93753793 in the Williams 82 reference genome, and then compared the genetic variation of these genes between Kefeng No.1 and Nannong 1138-2 based on previously reported whole-genome re-sequencing data (Ma et al. 2019). The results showed that AX-93753793 was located in an LD block harboring two candidate genes, encoding an aldolase-type TIM barrel family protein (Glyma.08g175800) and a TCP-1/cpn60 chaperonin family protein (Glyma.08g175900), respectively. An indel variation in the 3′-UTR of Glyma.08g175800 and a 14-base frameshift insertion in the exonic region of Glyma.08g175900 were found (Table 3). Another previously unreported SC3 resistance locus associated with DR was identified on chromosome 20. In this region, Glyma.20g190000, encoding 3′-phosphoinositide-dependent protein kinase 1, was regarded as a candidate gene. Two different indel variations in the 5′-UTR of Glyma.20g190000 were identified between Kefeng No.1 and Nannong 1138-2 (Table 3). Glyma.20g190000 is homologous to OsPdk1 in rice. OsPdk1 was reported to regulate basal disease resistance: overexpression of OsPdk1 enhanced basal resistance against both the fungus Magnaporthe oryzae and the bacterial pathogen Xanthomonas oryzae pv. oryzae (Matsui et al. 2010). Therefore, these three genes (Glyma.08g175800, Glyma.08g175900, and Glyma.20g190000), located under the peak SNPs of the newly found SC3 resistance loci, were identified as candidate genes.

Table 3 Different variations of candidate genes in Kefeng No.1 and Nannong 1138-2

Moreover, elucidating the expression patterns of candidate genes across different soybean tissues would help in understanding their potential functions. Public expression data deposited in GEO were used for expression analysis. The three genes Glyma.08g175800, Glyma.08g175900, and Glyma.20g190000 were expressed in all soybean tissues examined, including leaf, root, stem, flower, seedling, mature embryo, dry seed, and cotyledon (Fig. 5a). However, they showed different expression patterns: Glyma.08g175800 was highly expressed in leaf, flower, and seedling but weakly expressed in root; Glyma.08g175900 was expressed in all tissues, with the highest level in leaf; the expression level of Glyma.20g190000 in flower and cotyledon was twice as high as that in other tissues, and it also showed a medium expression level in leaf and seedling. To verify the reliability of these expression patterns, a second dataset, RNA-seq data downloaded from SoyBase, was examined and gave similar results (Fig. 5b).

Fig. 5 Digital expression profiles of three candidate genes in different tissues. a Expression levels of three genes in eight different tissues from microarray data (GEO accession number: GSE29163). b Expression levels of three genes in different tissues and at different stages based on RNA-seq data. The values used in heatmaps from the microarray data were log2-transformed (Glyma08g18750, Glyma08g18760, and Glyma20g33140 correspond to Glyma.08g175800, Glyma.08g175900, and Glyma.20g190000, respectively, in the a2 version of the soybean Williams 82 reference genome)

The expression of the three candidate genes in leaves after SC3 or mock inoculation is shown in Fig. 6. Compared with mock inoculation, the expression of Glyma.08g175800 was upregulated in Kefeng No.1 at 1, 2, and 3 days after SC3 inoculation and significantly upregulated in Nannong 1138-2 at 3 days after SC3 inoculation. In contrast, the expression of Glyma.08g175900 and Glyma.20g190000 was downregulated in both Nannong 1138-2 and Kefeng No.1 at 1, 2, and 3 days after SC3 inoculation relative to mock inoculation. All three candidate genes thus responded to the SC3 strain, which may suggest specific functions in SMV resistance. This expression analysis provides an important basis for further functional study of the SC3 candidate genes.

Fig. 6 Expression analysis of three candidate genes in leaves after SC3 inoculation. Error bars indicate the standard deviation. ∗Significant at P ≤ 0.05; ∗∗Significant at P ≤ 0.01

Discussion

Factors determining association analysis

To improve the resolution of GWAS for rapid identification of new loci associated with DR to SC3, we employed high-density SNP markers in this study. SNPs have been extensively applied in genetic mapping because of their wide and uniform distribution in the genome, which reduces the chance of missing important QTLs. A SoySNP50K array was used for GWAS to detect QTLs related to soybean seed protein and oil content (Hwang et al. 2014). A total of 62,423 SNPs with MAF ≥ 5% were used for GWAS to identify major and minor loci associated with dynamic plant height and node number (Chang et al. 2018). In this study, a higher-density NJAU 355K SoySNP array containing 292,053 SNPs was used for GWAS. These high-quality SNPs were strictly screened from an SNP database that represents the population diversity of wild and cultivated soybean. For 93.6% of the SNPs, the spacing is less than 9 kb, and the average SNP spacing is approximately 3.3 kb across the 20 chromosomes (Wang et al. 2016). With this high-density SNP array, the resolution is much higher than with the 1536 SNPs used in a previous study (Liu et al. 2016). Genotyping with high-density markers and phenotyping in multiple environments can improve the power of GWAS to detect potential SC3 resistance loci.

LD is the theoretical basis for mapping candidate loci or genes in GWAS. LD extends over much shorter distances in cross-pollinated crops than in self-pollinated crops, because recombination accumulates over generations in cross-pollinated crops. In maize, a cross-pollinated crop, LD was estimated to extend only 2 kb, and such a short LD distance improves localization accuracy to the gene level (Gore et al. 2009). Soybean is a strictly self-pollinated crop, so its genome exhibits strong LD, which was estimated to extend about 500 kb in a previous report (Hyten et al. 2007). Based on the NJAU 355K SoySNP array, LD analysis in a large population of 367 soybean accessions showed that the distance at which LD decays to half of its maximum value was 130 kb in cultivated soybeans (Wang et al. 2016). In this study, this relatively short LD decay distance enhanced mapping resolution and substantially narrowed the genomic regions to be searched, helping to identify candidate genes underlying SNP loci. Population structure is another important factor that determines the power of GWAS. A complicated population structure can lead to unequal allele frequencies among subpopulations, which may result in false associations between SNP loci and target traits. In this study, we selected a panel without strong population structure (Wang et al. 2016). The same population has been used to identify genetic loci for water-soluble protein content via GWAS (Zhang et al. 2017). As shown in the quantile-quantile and Manhattan plots (Fig. 2) for BLUP and the three individual environments, we identified significant SNP signals for DR after employing a CMLM to account for population structure and genetic relatedness.

Novel SNP loci for DR

Developing soybean accessions with broad-spectrum resistance to diverse SMV strains, and identifying novel resistance loci and genes, are necessary for reducing the damage caused by SMV. Recently, the cloning of the major gene Rsv4 (Ishibashi et al. 2019), which confers broad resistance to potyviruses, has advanced our understanding of the genetic variation and resistance mechanism in the soybean defense response against SMV. The genomic regions of Rsv4, RSC8, and RSC7, which are significantly associated with SMV resistance, have previously been fine-mapped to neighboring regions on chromosome 2 (Ishibashi et al. 2019; Wang et al. 2011b; Yan et al. 2015). Thus, it is not surprising that four SNPs located in the Rsv4-linked regions were also identified in our study. In addition, fifteen SNPs located in a 2.6-Mb region identified in our study are close to Rsv1 (Shi et al. 2008), Rsv5 (Klepadlo et al. 2017), Rsc-pm (Yang et al. 2013), and Rsc-ps (Yang et al. 2013), and these loci may also contribute to broad SMV resistance. However, studies of the Rsv1 locus are difficult because many R genes (containing NBS-LRR domains) cluster together or occur as multiple alleles of the same gene (Yang et al. 2013; Ma et al. 2016). Previous reports indicate that 5gG3 and 3gG2 were strong candidate genes tightly linked to the Rsv1 locus in PI 96983 (Hayes et al. 2004; Gore et al. 2002). A recombinant line containing only the 3gG2 gene inherited from PI 96983 showed resistance to certain SMV strains, as did the cultivars Marshall and Ogden; it was therefore speculated that the diverse Rsv1 alleles in different cultivars arose from recombination of major R genes (Hayes et al. 2004). Therefore, characterizing the structural variations caused by recombination in the Rsv1 region with the help of sequencing techniques may help resolve the genetic architecture of this genomic region.

In addition, the loci on chromosomes 8 and 20 significantly identified in this study were novel loci with average R2 values as high as 32.10%. These five SNPs were located in previously reported biotic stress-related QTLs, namely Fusarium lesion length 1-1 (Ellis et al. 2012), SCN 34-4 (Winter et al. 2007), SCN 43-6, SCN 43-8, SCN 44-12, and SCN 44-8 (Jiao et al. 2015), suggesting that the two novel loci on chromosomes 8 and 20 may be important for resistance to pathogens and soybean cyst nematode. We identified favorable alleles and haplotypes at these two novel loci (Fig. 4). A few accessions in Hap1 had low DR, perhaps because they possess other resistance loci. The average DR in Hap1 was 0.73, which suggested that most accessions in Hap1 were susceptible to SMV (Fig. 4b). Nineteen accessions in Hap2 were identified as resistant germplasm with average disease rates of less than 0.1. These results will provide insights for accelerating the development of molecular markers and breeding for SMV resistance.

Candidate resistance genes for SC3

Most reported resistance loci, such as Rsv1, were derived from a limited number of resistant soybean cultivars (Kiihl and Hartwig 1979), whereas exploration of large populations and identification of novel genes are lagging. Hence, novel resistance genes identified via GWAS in soybean germplasm populations are necessary for SMV management. In contrast with the Rsv1 and Rsv5 loci, no NBS-LRR genes were found in the regions of the two novel loci identified in our study, suggesting that these loci likely contain new classes of resistance genes. Three candidate genes were identified underlying the two novel loci. Glyma.08g175800 is annotated as an aldolase involved in the hydrogen peroxide (H2O2) biosynthetic process and is highly expressed in leaf. A lethal systemic hypersensitive response caused by reactive oxygen leads to necrosis in leaves (Hernández et al. 2016). Local cell death caused by reactive oxygen can not only prevent SMV from spreading to adjacent cells, but also act as a diffusible signal to induce antiviral production and pathogenesis-related defense responses in adjacent plant tissues (Chen et al. 2017). In addition, Glyma.08g175900 encodes a member of the TCP-1/cpn60 chaperonin family, proteins of which have been reported to play important roles in apoptosis and immune responses in human cells (Tsan and Gao 2009). Furthermore, its homologous proteins in Arabidopsis and maize were significantly upregulated after pathogen infection (Hwang et al. 2012; Wu et al. 2013). The high expression level of Glyma.08g175900 in soybean leaf might suggest a potential role in defense against SMV infection. Another gene, Glyma.20g190000, annotated as a 3′-phosphoinositide-dependent protein kinase, has not been studied in soybean, but its homolog PDK1 in tomato is involved in the regulation of cell death. Silencing PDK1 induces whole-plant death in tomato, and further research showed that PDK1 phosphorylates an AGC kinase that negatively regulates cell death (Devarenne et al. 2006). Moreover, overexpression of OsPdk1 in rice enhanced the defense response against pathogens (Matsui et al. 2010). These novel candidate genes identified in this study provide valuable gene resources for revealing the mechanism of resistance to SMV.