Introduction

Soybean (Glycine max) is one of the most important commercial crops in the world. About 30 major diseases have been reported in soybean. Root and stem rot (Phytophthora root rot, PRR) was ranked among the five most serious diseases of soybean in the United States during 2003–2005 (Wrather and Koenning 2006). Soybean root and stem rot is caused by the soil-borne oomycete pathogen Phytophthora sojae (Kaufmann and Gerdemann) (Schmitthenner 1985; Zhu et al. 2003) and can occur throughout the entire period of soybean growth, resulting in huge losses in agricultural production (Wrather and Koenning 2006).

There are many methods to control this disease; seed coating agents, reducing soil moisture and screening for resistance resources are among the most effective (Dorrance et al. 2003; Dorrance et al. 2007). To date, no soybean genotype conferring complete immunity to every strain of P. sojae has been found. Partial resistance, however, is more durable against the changing diversity of P. sojae strains. Most research has focused on partial resistance to PRR (Dorrance et al. 2003; Lee et al. 2014; Sun et al. 2014a). So far, 21 Rps (resistance to P. sojae) genes/alleles have been detected on four chromosomes. Ten Rps genes/alleles were located on the short arm of chromosome 3, including Rps1 (containing five alleles; Rps1a, Rps1b, Rps1c, Rps1d and Rps1k), Rps7, Rps9, RpsYu25, RpsYD29, RpsUN1, and an unnamed Rps gene in the Japanese cultivar ‘Waseshiroge’ (Demirbas et al. 2001; Weng et al. 2001; Gao et al. 2005; Sugimoto et al. 2011; Sun et al. 2011; Wu et al. 2011a; Lin et al. 2013; Zhang et al. 2013). Rps3 (containing three alleles; Rps3a, Rps3b and Rps3c) has been mapped on chromosome 13, and is linked with Rps8 (Demirbas et al. 2001; Sandhu et al. 2005; Gordon et al. 2006; Lee et al. 2013). Rps2 and RpsUN2 have been mapped on chromosome 16 (Demirbas et al. 2001; Lin et al. 2013). RpsJS, Rps4, Rps5 and Rps6 are linked and located on chromosome 18 (Demirbas et al. 2001; Sandhu et al. 2004; Lee et al. 2013, 2014; Sun et al. 2014b). Among these Rps genes/alleles, only RpsJS and RpsYD29 have been finely mapped (Zhang et al. 2013; Sun et al. 2014b) and only one gene, which encodes a coiled coil-nucleotide binding site–leucine rich repeat protein, has been cloned (Gao et al. 2005).

All previous studies on PRR resistance in soybean have been performed using traditional linkage mapping. Association mapping is another effective approach to connect structural genomics and phenomics in plants, if information on population structure and linkage disequilibrium (LD) is available (Jin et al. 2010). With high-density single nucleotide polymorphisms (SNPs), genome-wide association mapping has been successfully used to analyze complex trait variation in Arabidopsis (Atwell et al. 2010; Horton et al. 2012), rice (Huang et al. 2010; Yang et al. 2014b), maize (Kump et al. 2011; Tian et al. 2011; Yang et al. 2014a), and soybean (Wang et al. 2008; Hao et al. 2012a; Wen et al. 2014; Zhang et al. 2014; Zhao et al. 2015) to identify alleles and genes for disease resistance and agronomic traits (Wen et al. 2014; Zhao et al. 2015). Here, we performed a genome-wide association study (GWAS) of soybean using 59,845 high-quality SNPs to identify significant markers and genes associated with PRR resistance. Our results could help to further enhance durable resistance based on the genes we identified and improve our understanding of the genetic basis of partial resistance to P. sojae.

Materials and methods

Sampling and genotyping

A total of 472 soybean accessions from the Yangtze-Huai soybean breeding germplasm population were obtained from the National Center for Soybean Improvement, Nanjing, Jiangsu Province, China.

Sequenced RAD (restriction-site-associated DNA) markers were used in the present study. All genotyping was done at BGI Tech, Shenzhen, China. Soybean genomic DNA was extracted from young leaves using a modified cetyltrimethylammonium bromide (CTAB) method (Allen et al. 2006), with an average concentration of 90 µg/µl. The sequences of the 472 accessions were obtained on an Illumina HiSeq 2000 instrument using the MSG (multiplexed shotgun genotyping) method (Andolfatto et al. 2011). DNA fragments between 400 and 600 bp were generated by TaqI digestion, giving a total of 1322.73 million paired-end reads of 90 bp read length (including a 6 bp index), or 110.87 Gb of sequence, with most at approximately 3.86× depth and 4.57 % coverage. All the sequence reads were aligned against the reference genome of Williams 82 (genome assembly 1 annotation version 1.1) (Schmutz et al. 2010) using the SOAP2 short read alignment software (Li et al. 2009) with parameters including sequence similarity, paired-end relationships, and sequence quality. RealSFS (Yi et al. 2010) was used for population SNP calling, based on Bayesian estimation of the site frequency at every site. The fastPHASE software (Scheet and Stephens 2006) was used for SNP genotype imputation after heterozygous calls had been set to missing. In this study, we used 279 accessions with 59,845 SNPs for the GWAS (see Additional file 1).

Partial resistance evaluation

An isolate of P. sojae, P7076 (with virulence to Rps1a, Rps1b, Rps1c, Rps1d, Rps1k and Rps3a), was obtained from the Key Laboratory of Monitoring and Management of Plant Diseases and Insects, Ministry of Agriculture, Nanjing Agricultural University, China. The 472 soybean accessions were first screened using the modified hypocotyl inoculation technique (Sun et al. 2011). The 279 accessions found to be susceptible were then evaluated with the slant board assay, following a previously published method (Wu et al. 2011b). In brief, to reduce variation among materials, ten 7-day-old seedlings of each accession were placed on trays and wounded 20 mm below the crown using a scalpel. Lesion lengths were measured from the inoculation site to the edge of the lesion margin. Details of the procedures have been described previously (Sun et al. 2014a).

The soybean accessions were evaluated, and compared with two controls (var. ‘Conrad’ and var. ‘Sloan’) with known partial resistance characteristics. Conrad has high levels of partial resistance to P. sojae, whereas Sloan is moderately susceptible (Burnham et al. 2003). The experiment was repeated three times.

Genetic diversity and polymorphism information content (PIC)

Nei’s gene diversity index (H) (Nei 1973), and the polymorphism information content (PIC) per locus were calculated with an in-house R script. Nei’s gene diversity (H) was calculated based on the formula

$$H = 1 - \sum_{i = 1}^{n} \left( \frac{n_{i}}{N} \right)^{2}$$

where n_i is the number of accessions carrying the ith allele at the locus (so that n_i/N is the frequency of that allele), n is the number of alleles at this locus and N is the total number of accessions.
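The gene diversity formula can be illustrated with a minimal Python sketch. This is not the in-house R script used in the study; the function name `nei_gene_diversity`, the use of `None` for missing calls and the example locus are all assumptions made for illustration.

```python
from collections import Counter


def nei_gene_diversity(alleles):
    """Nei's gene diversity H = 1 - sum((n_i / N)^2) for a single locus.

    `alleles` holds one allele call per accession. None marks a missing
    call and is skipped (an assumption of this sketch, not of the paper).
    """
    called = [a for a in alleles if a is not None]
    total = len(called)
    if total == 0:
        raise ValueError("no called alleles at this locus")
    counts = Counter(called)
    return 1.0 - sum((n_i / total) ** 2 for n_i in counts.values())


# Hypothetical biallelic SNP: 6 accessions carry A and 4 carry G.
example_locus = ["A"] * 6 + ["G"] * 4
h_value = nei_gene_diversity(example_locus)  # 1 - (0.6^2 + 0.4^2)
```

For a biallelic SNP, H reaches its upper bound of 0.5 when both alleles are equally frequent, which matches the upper end of the expected heterozygosity range reported in the Results.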

The PIC for each marker was calculated based on the formula

$$PIC = 1 - \sum_{i = 1}^{n} p_{i}^{2} - \sum_{i = 1}^{n-1} \sum_{j = i + 1}^{n} 2p_{i}^{2} p_{j}^{2}$$

where p_i and p_j are the relative frequencies of the ith and jth alleles of the SNP marker and n is the number of alleles at the marker (Botstein et al. 1980).
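A short Python sketch of the PIC calculation of Botstein et al. (1980) follows. The function name `pic` and the example frequencies are hypothetical; the double sum runs over all unordered allele pairs (i < j).

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content from allele frequencies.

    `freqs` lists the frequency of each allele at one marker and is
    expected to sum to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Correction term over all unordered pairs of distinct alleles.
    pair_term = sum(2.0 * (p_i ** 2) * (p_j ** 2)
                    for p_i, p_j in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term


# A biallelic SNP with equal allele frequencies gives the largest PIC
# possible for two alleles: 1 - 0.5 - 2 * 0.25 * 0.25.
pic_balanced = pic([0.5, 0.5])
```

The value obtained for a balanced biallelic marker, 0.375, is the theoretical maximum for SNPs and coincides with the upper end of the PIC range reported in the Results.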

Population genetic study

STRUCTURE analysis and a maximum likelihood (ML) tree were used to infer the population stratification of our materials. The population structure was inferred using the software program STRUCTURE 2.2 (Pritchard et al. 2000) based on 3113 SNPs with minor allele frequency (MAF) >20 % and physical distance >60 kb (Wang 2014). A burn-in period of 100,000 iterations was followed by 200,000 MCMC iterations, using a model that allowed for admixture and correlated allele frequencies (He and Park 2013). At least four independent runs of STRUCTURE were performed, setting K from 1 to 10, and an average likelihood value, L(K), across all runs was calculated for each K. The most likely number of subpopulations was then determined using the Delta K method proposed by Evanno et al. (2005). Principal component analysis (PCA) was done using GCTA software (Yang et al. 2011). We applied a maximum likelihood method to construct the phylogenetic tree from the SNP-based genetic distance using RAxML software (Stamatakis 2006), with 100 bootstrap replicates, based on 59,845 SNPs (MAF > 0.05) and the GTR+G model, which was selected from the result of jModelTest (Posada 2008). The linkage disequilibrium parameter (r²), which estimates the degree of LD between pair-wise SNPs (59,845 with MAF > 0.05), was calculated using the software TASSEL 4.0 (Bradbury et al. 2007) with 1000 permutations. The LD decay rate was measured as the chromosomal distance at which r² dropped to half its maximum value.
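The Delta K step can be sketched in Python as follows. This uses one common formulation of the Evanno et al. (2005) statistic, in which the absolute second-order difference of the mean L(K) is divided by the standard deviation of L(K) across runs; whether the study used exactly this variant is not stated. The function name `delta_k` and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(ln_prob_by_k):
    """Return {K: Delta K} from {K: [L(K) of each independent run]}.

    Delta K is only defined where both K - 1 and K + 1 were also run,
    and where the runs at K are not all identical (non-zero spread).
    """
    avg = {k: mean(runs) for k, runs in ln_prob_by_k.items()}
    result = {}
    for k in sorted(avg):
        if k - 1 in avg and k + 1 in avg:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            spread = stdev(ln_prob_by_k[k])
            if spread > 0:
                result[k] = second_diff / spread
    return result


# Hypothetical L(K) values from four runs at each K (not the study's data).
runs = {
    1: [-1000.0, -1002.0, -998.0, -1000.0],
    2: [-800.0, -802.0, -798.0, -800.0],
    3: [-500.0, -501.0, -499.0, -500.0],
    4: [-480.0, -484.0, -476.0, -480.0],
    5: [-470.0, -475.0, -465.0, -470.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)  # K with the highest Delta K
```

Because the statistic needs both neighbours of K, it cannot be evaluated at the smallest or largest K tested, which is one reason to run a range wider than the expected number of subpopulations.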

Genome-wide association study

In this study we used two different models to test associations between the SNPs (MAF > 0.05) and the disease assessment criteria. The first was a general linear model (GLM), which contained only the tested SNP as a fixed effect. The second was a mixed linear model (MLM) in which, in addition to the SNP being tested, the Q matrix and the kinship matrix were included as fixed and random effects, respectively. Kinship was calculated using TASSEL 4.0. Both models were run in TASSEL 4.0 (Bradbury et al. 2007). To obtain reliable results, a false discovery rate (FDR) ≤ 0.05 was used to identify significant associations.
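The FDR criterion can be illustrated with a small Python sketch. The text does not say which FDR procedure was applied; the Benjamini-Hochberg step-up procedure is assumed here because it is the most widely used one. The function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag the tests that are significant at FDR <= alpha.

    Benjamini-Hochberg step-up rule: find the largest rank k such that
    p_(k) <= (k / m) * alpha, then declare the k smallest P values
    significant. Flags are returned in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical per-SNP P values (not results from this study).
flags = benjamini_hochberg([0.041, 0.001, 0.216, 0.008, 0.06])
```

The step-up rule means that a P value can be declared significant even if it misses its own rank threshold, provided that a larger P value further down the sorted list meets its threshold.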

Results

Genetic diversity, population structure and linkage disequilibrium analysis

A total of 60,656 SNPs were used to analyze the genetic diversity in the 279 soybean accessions. The allele frequencies of most alleles were less than 10 % and the average MAF was 0.217 (range 0.014–0.498) (Fig. 1a). The expected heterozygosity, nucleotide diversity and PIC of the 60,656 SNPs averaged 0.306, 0.307 and 0.251, with ranges of 0.028–0.499, 0.028–0.500 and 0.027–0.375, respectively (Fig. 1).

Fig. 1 Distribution of genetic diversity of 60,656 SNPs across 279 accessions. a Distribution of MAF, b distribution of expected heterozygosity, c distribution of PIC values, and d distribution of nucleotide diversity (color figure online)

In this study, we used Bayesian model-based methods and principal component analysis to analyze the population structure. The estimated likelihood values, LnP(D), did not indicate an exact K value. Therefore, an ad hoc quantity (∆K) was used to overcome the difficulty of interpreting the real K value. Using this approach, an identifiable peak in ∆K indicated the true value of K. For the 279 accessions, the highest value of ∆K was at K = 3 (Fig. 2a). This suggested that the 279 accessions could be divided into three major subpopulations (Fig. 2a, b). Principal component analysis gave the same result (Fig. 2c). Using the maximum likelihood method, a phylogenetic tree was constructed that was also consistent with the STRUCTURE results (Fig. 2d). Interestingly, we found that the backgrounds of the three subpopulations were consistent with three main parents of the populations. In subpopulation Q1 (including 73 accessions), most soybean accessions were derived from the main parent 86–4. Subpopulation Q2 included 75 accessions that were mainly descended from the parent TongDou. There were 99 accessions in subpopulation Q3, whose main parent was 88–48. The 32 remaining accessions showed admixture patterns. The corresponding Q-matrix at K = 3 was used for the subsequent genome-wide association analysis. The results of AMOVA indicated that 71.52 % of the genetic variation was within subpopulations, whereas 28.48 % of the total genetic variation was among subpopulations (Table 1). The population pair-wise FST was 0.2848 (P < 0.001) between the three subpopulations (Table 1).

Fig. 2 Population structure of 279 soybean accessions. a Determination of the K value in the STRUCTURE analysis. The green line is the log-likelihood of the data, L(K), as a function of K (number of groups used to stratify the sample). The red line shows the values of ∆K, the model value used to detect the true K of the three groups (K = 3). b Model-based clustering for each of the 279 accessions based on the 3113 SNPs used to build the Q matrix. c PCA plot of the first two components of the 279 accessions. d Maximum likelihood tree of the 279 accessions. Red corresponds to Q1, green to Q2, and blue to Q3 (color figure online)

Table 1 Analysis of molecular variance (AMOVA) and FST for the three subpopulations inferred from STRUCTURE
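As a consistency check on the AMOVA figures, the fixation index can be read as the among-subpopulation share of the total genetic variance. Assuming the standard variance-component definition (the symbols σ² are introduced here for illustration only), the reported percentages give:

$$F_{ST} = \frac{\sigma_{among}^{2}}{\sigma_{among}^{2} + \sigma_{within}^{2}} = \frac{28.48}{28.48 + 71.52} = 0.2848$$

This agrees with the FST value reported for the three subpopulations.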

Different crops have different extents of LD, which can range from a few hundred base pairs to many kilobases (http://www.extension.org). LD generally decays faster in cross-pollinated crops than in self-pollinated crops (Li et al. 2014). The LD decay distance in our materials was approximately 480 kb, the distance at which r² dropped to half its maximum value (r² = 0.47) (Fig. 3). It has been reported that there is high LD in the soybean genome (Lam et al. 2010; Hao et al. 2012b; Wen et al. 2014). The LD decay estimates in soybean are higher than those in rice (75–150 kb), and much greater than those in maize (1.5–10 kb) (Yan et al. 2009; Huang et al. 2010). This is consistent with an earlier suggestion that LD extends over a much longer distance in self-pollinated species than in cross-pollinated species (Zhu et al. 2008).

Fig. 3 Genome-wide average LD decay (measured as genotypic r²) estimated in the population of 279 soybean accessions
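The half-decay criterion used to summarize the LD curve can be sketched in Python as below. This is an illustration only: the input is assumed to be a list of distance bins with their mean r², sorted by distance, and linear interpolation between bins is a choice of this sketch rather than a documented step of the analysis. The curve values are toy numbers.

```python
def half_decay_distance(ld_curve):
    """Distance at which mean r2 first falls to half of its maximum.

    `ld_curve` is a list of (distance_kb, mean_r2) tuples sorted by
    increasing distance. Returns None if r2 never drops that far.
    """
    half = max(r2 for _, r2 in ld_curve) / 2.0
    for (d_prev, r_prev), (d_next, r_next) in zip(ld_curve, ld_curve[1:]):
        if r_prev > half >= r_next:
            # Interpolate linearly between the two neighbouring bins.
            fraction = (r_prev - half) / (r_prev - r_next)
            return d_prev + fraction * (d_next - d_prev)
    return None


# Toy LD curve (distance in kb, mean r2); not data from this study.
toy_curve = [(0, 0.8), (10, 0.6), (20, 0.4), (30, 0.2)]
decay_kb = half_decay_distance(toy_curve)
```

In this toy curve the maximum r² is 0.8, so the half-decay distance is the point where the curve reaches 0.4.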

Phenotypic analysis

In our study, 279 soybean lines were inoculated with P. sojae isolate P7076 to assess partial resistance by measuring lesion length. The cultivar ‘Conrad’ has high levels of partial resistance to P. sojae, whereas ‘Sloan’ is moderately susceptible (Burnham et al. 2003). Conrad and Sloan were used as controls to verify that the experimental conditions were suitable. We evaluated the maximum, minimum and average phenotypic values, and performed ANOVA using SAS 9.2. The maximum value was approximately 8–10 times the minimum value. Phenotypic analysis showed that there were significant differences among materials, whereas there was no significant difference between repeats (Table 2).

Table 2 Descriptive statistics, ANOVA and broad-sense heritability for trait

Genome-wide association analysis

In this study, the population of 279 soybean accessions with 59,845 SNPs (MAF > 0.05) was used to carry out association analysis. Both the GLM, which accounts for neither population structure (Q) nor familial relatedness (K), and the MLM (Q+K) were used for association mapping. The quantile–quantile (QQ) plots showed that the MLM was better than the GLM at reducing false positive results (Fig. 4b, d). However, the same strong peak (SNP 35123596) was found on chromosome 13 by both models (Fig. 4a, c). This SNP may be significantly associated with partial resistance to P. sojae.

Fig. 4 Genome-wide association study of P. sojae in the population. a Manhattan plot of the MLM for P. sojae association. The horizontal red line indicates the genome-wide significance threshold (−log10(P) > 6). b Quantile–quantile (QQ) plot of the MLM for P. sojae association. c Manhattan plot of the GLM for P. sojae association. d Quantile–quantile plot of the GLM for P. sojae association (color figure online)
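The quantities plotted in a QQ plot and the significance line of a Manhattan plot can be illustrated with the following Python sketch. The expected quantiles use the (rank − 0.5)/n convention, which is one of several in use and is an assumption here; TASSEL's own plotting code is not reproduced. The function names and P values are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a QQ plot.

    Expected values assume P values uniform on (0, 1) under the null,
    using the (rank - 0.5) / n plotting position. P values must be > 0.
    """
    n = len(p_values)
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected = (rank - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points


def exceeds_threshold(p_value, threshold=6.0):
    """True if -log10(P) lies above the genome-wide threshold line."""
    return -math.log10(p_value) > threshold


# Hypothetical P values (not results from this study).
points = qq_points([0.5, 1e-7, 0.02, 0.3])
strong_hit = exceeds_threshold(1e-7)
```

Points that follow the diagonal indicate P values behaving as expected under the null hypothesis, whereas a curve that lifts off the diagonal early suggests inflation caused by uncorrected structure or relatedness.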

Possible candidate gene identification and analysis

Based on the LD decay distance, we extended the region of interest 500 kb upstream and downstream of the significantly associated SNP position. Within this region, there were three SNPs (SNPs 34976382, 35564109, 35324537) in the exons of three genes (Glyma13g32980, Glyma13g33900, Glyma13g33512) (Table 3). Glyma13g32980 is a coat protein I (COPI)-related gene, while Glyma13g33900 and Glyma13g33512 encode a 2OG-Fe(II) oxygenase family protein and a pentatricopeptide repeat (PPR)-containing protein, respectively. In addition, two genes encoding leucine-rich repeat (LRR) domains (Glyma13g33536, Glyma13g33740) (Table 3), which are important in plant responses to various diseases (Avrova et al. 2004; Gao and Bhattacharyya 2008), were near the strong peak. Two other genes (Glyma13g33243, Glyma13g33260), which encode a Gpi16 subunit and a Zn-finger protein, respectively, were also located in this region. These genes are probably involved in natural variation in partial resistance to P. sojae. In the next stage, we will investigate the functions of these genes and examine their association with partial resistance to P. sojae in soybean.
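The candidate window follows directly from the peak position and the flanking distance given above. The short Python sketch below recomputes it and checks that the three exonic SNP positions fall inside; the variable and function names are illustrative only.

```python
PEAK_POSITION = 35_123_596   # significantly associated SNP, chromosome 13
FLANK_BP = 500_000           # 500 kb on each side, based on LD decay

# Positions of the three exonic SNPs reported in the text.
EXON_SNP_POSITIONS = [34_976_382, 35_564_109, 35_324_537]


def flanking_window(position, flank):
    """Return the (start, end) of the window around a SNP, in bp."""
    return max(1, position - flank), position + flank


window_start, window_end = flanking_window(PEAK_POSITION, FLANK_BP)
snps_in_window = [pos for pos in EXON_SNP_POSITIONS
                  if window_start <= pos <= window_end]
```

Under these inputs the window spans positions 34,623,596 to 35,623,596 on chromosome 13, and all three exonic SNPs lie within it.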

Table 3 The candidate genes related to soybean P.sojae resistance in this study

Discussion

PRR is one of the most serious diseases in soybean, and has caused great losses in soybean production in recent years. Here, we performed a genome-wide association study of soybean PRR resistance among 279 accessions. Based on the population structure, PCA and phylogenetic analysis, these 279 accessions were clearly divided into three subgroups (Fig. 2) that corresponded to three different parents. The GLM and MLM were both applied in our GWAS. Our results were consistent with previous reports that the MLM is superior to the GLM (Huang et al. 2010; Yang et al. 2010; Hao et al. 2012a): we found that the MLM greatly reduced false positives (Fig. 4b, d).

In recent years, genome-wide association mapping has been successfully used to analyze complex trait variation in several crops (Breseghello and Sorrells 2006; Zhao et al. 2007; Huang et al. 2010; Yang et al. 2010; Hao et al. 2012a; Wang et al. 2012). However, the GWAS approach had not been applied to study PRR resistance in soybean. Here, using GWAS to dissect the genetic architecture of PRR resistance in 279 accessions, we found the same strong peak on chromosome 13 using both models (Fig. 4a, c). This SNP may be significantly associated with partial resistance to P. sojae.

Within the 500-kb flanking regions of the significantly associated position, we found seven genes that are probably involved in natural variation in partial resistance to P. sojae. These genes encode a COPI-related protein (Glyma13g32980), a 2OG-Fe(II) oxygenase family protein (Glyma13g33900), a PPR protein (Glyma13g33512), LRR domain proteins (Glyma13g33536, Glyma13g33740), a Gpi16 subunit (Glyma13g33243) and a Zn-finger protein (Glyma13g33260) (Table 3). Three SNPs were located in exon regions within three of these genes (Glyma13g32980, Glyma13g33900, Glyma13g33512) (Table 3). According to previous research, the Arabidopsis DMR6 gene, which encodes a 2OG-Fe(II) oxygenase, negatively regulates the expression of defense response genes, and loss of DMR6 function enhances resistance to downy mildew (van Damme et al. 2008). Here, we suggest that Glyma13g33900, which contains a 2OG-Fe(II) domain, may be related to soybean P. sojae resistance. In addition, two genes (Glyma13g33536, Glyma13g33740) encode LRR domains (Table 3), which are characteristic of R genes that are important in plant responses to various diseases (Lupas et al. 1991; Cheng et al. 2010). Two further genes were found near the strong peak: one encodes a Gpi16 subunit, which plays a vital role in GPI anchoring in the yeast Saccharomyces cerevisiae, and the other encodes a Zn-finger protein, a class of important transcription factors in plant defense responses (Fraering et al. 2001; Lee et al. 2006) (Table 3).

Previous studies have refined the map locations of known loci related to SDS and SWM resistance in soybean by GWAS (Wen et al. 2014; Zhao et al. 2015). Our results demonstrate that genome-wide association mapping is also a powerful tool for developing soybean accessions with partial resistance to P. sojae. The new alleles and candidate genes we identified are probably involved in natural variation in partial resistance to P. sojae. In future research, the functions of these genes will be examined in relation to partial resistance to P. sojae in soybean.