Introduction

Soybean (Glycine max) is grown globally mainly for its protein and oil. However, salinization of arable land worldwide has caused soybean yield reductions of more than 20% (Beecher 1994; Blumwald and Grover 2006; Katerji et al. 2003). Therefore, maximizing soybean yield potential depends in part on improving salt tolerance. Soybean germplasm varies widely in its response to salt stress (Shao et al. 1986). Salt-tolerant soybean lines (chloride excluders) accumulate less chloride in the leaves than salt-sensitive lines (chloride includers) (Lee et al. 2004; Ledesma et al. 2016), and chloride excluders have higher leaf chlorophyll concentrations than chloride includers under salt stress (Patil et al. 2016). Evaluation of salt tolerance in the greenhouse is time-consuming, labor-intensive, and costly (Valencia et al. 2008; Ledesma et al. 2016), and selection of salt-tolerant lines in the field is not accurate because the salt concentration varies across the field. Bi-parental quantitative trait locus (QTL) mapping has been used to investigate the genetic basis of salt tolerance in soybean, and simple sequence repeat (SSR) markers significantly associated with salt tolerance have been reported (Lee et al. 2004; Hamwieh et al. 2011).

Compared to bi-parental QTL mapping, genome-wide association mapping locates QTLs more precisely. The application of genome-wide association studies (GWAS) has benefited from advances in next-generation sequencing (NGS). In soybean, GWAS has been carried out to identify markers associated with iron deficiency chlorosis (Mamidi et al. 2011), chlorophyll concentration (Hao et al. 2012), seed protein and oil content (Hwang et al. 2014), resistance to sudden death syndrome (SDS) (Wen et al. 2014, 2015), grain yield, lodging, seed coat color, pubescence, flower color (Wen et al. 2015), flowering time, maturity dates, and plant height (Wen et al. 2015; Zhang et al. 2015).

An important factor to consider in the application of GWAS is the extent of linkage disequilibrium (LD). LD refers to the degree of non-random association of alleles at different loci. The structure of LD across the genome determines the resolution of association mapping (Zhu et al. 2008). In soybean, the average LD (R²) decayed to 0.2 within 360 kilobase pairs (kb) in euchromatic regions but only within 9600 kb in heterochromatic regions (Hwang et al. 2014). A high marker density is required for GWAS in regions with low LD (Hwang et al. 2014). The other problem confronting GWAS is the potential for spurious associations caused by population structure and familial relatedness. To control the false association rate, a general linear model (GLM) that accounts for population structure (Pritchard and Donnelly 2001) was initially employed. Mixed linear model (MLM)-based methods such as the unified mixed model (Yu et al. 2006) and compressed MLM (Zhang et al. 2010) have since been used to correct for both genetic relatedness and population structure.

Although salt tolerance in soybean has been studied using various germplasms including domesticated soybean (G. max) and wild soybean (Glycine soja), the major soybean QTL conferring salt tolerance has been consistently mapped on chromosome (Chr.) 3 (Lee et al. 2004; Hamwieh and Xu 2008; Hamwieh et al. 2011; Guan et al. 2014; Qi et al. 2014; Concibido et al. 2015). Thus, including a worldwide collection of diverse soybean plant introductions (PIs) in GWAS offers a potentially broad genetic basis for uncovering the mechanisms of salt tolerance. The availability of SoySNP50K has paved the way for identification of single-nucleotide polymorphism (SNP) markers associated with salt tolerance in soybean using GWAS. The GWAS of salt tolerance at vegetative 1 (V1) stage (one set of unfolded trifoliolate leaves) was initially carried out by Huang (2013) using 192 diverse soybean germplasm lines, 61% of which originated from the USA. A total of 62 SNP markers on Chr. 2, 3, 5, 6, 8, and 18 from SoySNP50K data were significantly associated with leaf scorch score (LSS) at V1 stage under salt stress (Huang 2013). Kan et al. (2015) conducted GWAS on an association mapping panel consisting of 191 soybean landraces for salt tolerance at germination stage. One SNP on Chr. 9 and seven SNPs on Chr. 2, 3, 9, 12, and 13 were significantly associated with germination index ratio and germination rate ratio, respectively (Kan et al. 2015). Patil et al. (2016) performed GWAS on a panel of 106 soybean lines for salt tolerance at V2 stage (two sets of unfolded trifoliolate leaves). A total of 19 and 11 SNPs on Chr. 3 from SoySNP50K data were associated with LSS and leaf chlorophyll concentration, expressed as soil plant analysis development (SPAD) value, respectively (Patil et al. 2016). In this study, we collected 283 PI lines originating from 29 countries worldwide to provide a wide genetic basis for GWAS. Two indicators of salt tolerance, leaf chloride concentration and leaf chlorophyll concentration, were used for salt tolerance evaluation at V1 stage. The objectives of this study were to uncover novel genomic regions and SNP markers for salt tolerance and to confirm the previously identified genomic regions and SNP markers associated with salt tolerance by GWAS.

Materials and methods

Plant materials and evaluation of salt tolerance in greenhouse

A total of 283 PIs (Supplement Table 1) were obtained from the USDA Soybean Germplasm Collection. Maturity group (MG) of these PIs ranged from 000 to VIII. Forty-nine percent of PIs were from USA, Russia, China, Germany, and Bulgaria; the rest were from 24 other countries. All 283 PIs were planted in a randomized complete block design with two replications in December 2014 and June 2015 in a greenhouse (25 ± 2 °C, 14-h photoperiod) of the Rosen Center at the University of Arkansas. Chloride excluder (salt-tolerant) cultivar Osage (Chen et al. 2007) and chloride includer (salt-sensitive) cultivar Dare (Brim 1966) were used as checks. For each line, ten seeds were sown in a 3.5-in. plastic pot (Plasticflowerpots.net, Lake Worth, FL) containing approximately 300 g loamy sand (Kibler, AR) as the growth medium. Soil particle analysis based on a 2-h hydrometer method described by Arshad et al. (1996) showed that the sandy loam consisted of 66% sand, 26% clay, and 8% silt. Pots were placed in trays (17 3/4″ × 25 1/2″ × 1″; US Plastic Corp., Lima, OH) for watering and salt treatment. At VC stage (unifoliolate leaves unrolled), plants were thinned to five per pot. Seedlings were fertilized once per week with Miracle-Gro® All Purpose Plant Food (The Scotts Miracle-Gro Company, Marysville, OH). Salt treatment using 120 mM NaCl solution was initiated at V1 stage (one set of unfolded trifoliolate leaves). Salt treatments were performed for 2 h per day and continued until the checks showed contrasting leaf symptoms, which took 12–18 days. After the last day of treatment, leaf chlorophyll concentration was measured three times on the top secondary fully expanded leaves. Leaf chlorophyll concentrations, expressed as SPAD value, were measured with a chlorophyll meter (Konica Minolta SPAD-502). Subsequently, the plant leaves in each pot were harvested and dried in a forage dryer. The leaf samples were ground to a fine powder using a coffee bean grinder (Krups®, Shelton, CT). Chloride was extracted from the powdered samples and quantified by inductively coupled plasma–optical emission spectroscopy (ICP-OES) as described by Wheal and Lyndon (2010). The chloride concentration was converted into mg kg−1.

Analysis of variance and heritability

The descriptive statistics for the leaf chloride concentration and leaf chlorophyll concentration of the populations and checks were obtained from JMP 9.0 (SAS Institute Inc. 1989). Broad-sense heritability (H²) of leaf chloride concentration and leaf chlorophyll concentration was calculated using the following equation (Greene et al. 2008): \( H^2=\frac{\sigma_g^2}{\sigma_g^2+\frac{\sigma_{g\times e}^2}{e}+\frac{\sigma^2}{re}} \), where \( \sigma_g^2 \) is the genetic variance, \( \sigma_{g\times e}^2 \) is the genotype × year interaction variance, \( \sigma^2 \) is the error variance, r is the number of replications, and e is the number of years. Analysis of variance (ANOVA) and the estimation of variance components were performed using the PROC GLM procedure of SAS 9.4 (SAS Institute 2004), where genotype was considered as a random effect and replication nested within year was used as a random effect.
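The heritability formula above can be written as a short function. This is a minimal sketch, not the SAS procedure used in the study; the function name and the variance components in the example call are illustrative assumptions, not values from the study.

```python
def broad_sense_heritability(var_g, var_gxe, var_error, n_reps, n_years):
    """Return H^2 = var_g / (var_g + var_gxe / e + var_error / (r * e)).

    var_g     : genetic variance
    var_gxe   : genotype x year interaction variance
    var_error : error variance
    n_reps    : number of replications (r)
    n_years   : number of years (e)
    """
    if n_reps <= 0 or n_years <= 0:
        raise ValueError("n_reps and n_years must be positive")
    denominator = var_g + var_gxe / n_years + var_error / (n_reps * n_years)
    return var_g / denominator


# Illustrative variance components only (hypothetical, not from the study),
# with two replications and two years as in the experimental design.
h2 = broad_sense_heritability(var_g=4.0, var_gxe=1.0, var_error=2.0,
                              n_reps=2, n_years=2)
print(round(h2, 2))  # 4 / (4 + 0.5 + 0.5) = 0.8
```

Because the interaction and error terms are divided by e and re, adding years or replications shrinks the non-genetic part of the denominator and raises the estimate.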

Genotyping and quality control

Genotypic data of ~42,509 SNPs for a total of 283 soybean genotypes were obtained from the Illumina Infinium SoySNP50K BeadChip database (Song et al. 2013). A total of 290 SNPs that were not assigned to any chromosome were excluded from further analysis. Markers with a missing rate >2% or a minor allele frequency (MAF) <5% were also excluded. Subsequently, a total of 33,009 SNPs (Supplement Table 2) with MAF ≥5% across 283 genotypes were employed in the association analysis with leaf chloride concentration and leaf chlorophyll concentration.
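The marker filters described above can be sketched per SNP as follows. This is an illustration under stated assumptions: genotypes are coded as 0, 1, or 2 copies of one allele with None for a missing call, a marker is dropped if it fails either threshold, and the function name `snp_passes_qc` is hypothetical.

```python
def snp_passes_qc(calls, max_missing=0.02, min_maf=0.05):
    """Apply missing-rate and MAF filters to one SNP.

    calls: genotype codes across lines; 0/1/2 = copies of one allele,
           None = missing call (assumed coding, for illustration only).
    Returns (keep, missing_rate, maf).
    """
    n_lines = len(calls)
    observed = [c for c in calls if c is not None]
    missing_rate = (n_lines - len(observed)) / n_lines
    if not observed:
        return False, missing_rate, None
    allele_freq = sum(observed) / (2 * len(observed))
    maf = min(allele_freq, 1 - allele_freq)
    keep = missing_rate <= max_missing and maf >= min_maf
    return keep, missing_rate, maf


# Three made-up SNPs scored on 100 lines.
snp_a = [0] * 50 + [2] * 50               # MAF 0.50, no missing calls
snp_b = [0] * 98 + [2] * 2                # MAF 0.02, fails the MAF filter
snp_c = [0] * 50 + [2] * 45 + [None] * 5  # 5% missing, fails the missing filter
results = [snp_passes_qc(s)[0] for s in (snp_a, snp_b, snp_c)]
print(results)  # [True, False, False]
```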

Linkage disequilibrium estimation

Pairwise LD between markers was calculated as the squared correlation coefficient (R²) of alleles using 33,009 SNPs. The calculation of LD was based on 10-megabase-pair (Mb) windows using the R package synbreed (Wimmer et al. 2012). The R² was calculated separately for euchromatic and heterochromatic regions in each chromosome because of the substantial difference in recombination rate between these two regions. The physical lengths of euchromatic and heterochromatic regions (Supplement Table 2) were obtained from the Gmax 1.01 reference genome (Grant et al. 2010). The LD decay rate of the population, defined as the chromosomal distance where the LD decays to half of its maximum value (Huang et al. 2010), was calculated using an R script developed by Marroni et al. (2011).
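As an illustration of the squared correlation coefficient used as the LD measure, the sketch below computes it for a single pair of markers. It is not the synbreed implementation; the 0/1/2 genotype coding, the use of None for missing calls, and the function name `genotype_r2` are assumptions made for the example.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs' genotype codes.

    Lines with a missing call (None) at either SNP are skipped.
    Returns NaN when either SNP is monomorphic among the shared lines.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return (cov * cov) / (var_a * var_b)


# Made-up genotypes for four lines.
complete_ld = genotype_r2([0, 0, 2, 2], [0, 0, 2, 2])  # identical patterns
no_ld = genotype_r2([0, 0, 2, 2], [0, 2, 0, 2])        # independent patterns
print(complete_ld, no_ld)  # 1.0 0.0
```

Repeating this over all marker pairs within a window and plotting R² against physical distance gives the decay curve from which the half-maximum distance is read.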

Population structure

All 33,009 SNP markers were sorted by chromosome and physical position, and every tenth SNP marker was selected; the resulting 3301 SNP markers were used to infer the population structure with STRUCTURE 2.3.4 (Pritchard et al. 2000). The hypothetical number of subpopulations (K) was set from 1 to 10, and five independent iterations were performed for each K. Admixture and allele frequency correlated models were used. The burn-in period was 10,000 iterations, followed by 25,000 Markov chain Monte Carlo (MCMC) replications. The optimum value of K was determined by plotting the rate of change in the log probability of data (∆K) against the successive K values (Evanno et al. 2005). The K value at which ∆K reached its maximum was considered optimum. The population structure (Q matrix) was generated from the STRUCTURE result. Missing genotypes were imputed by k nearest neighbors with Euclidean distance, and the kinship matrix (K) was calculated by the centered-IBS method (Endelman and Jannink 2012) using TASSEL 5.0.
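The ∆K criterion can be sketched as below. This is a simplified illustration under stated assumptions, not the exact procedure run in the study: ∆K is taken as the absolute second difference of the mean log probability across runs, divided by the sample standard deviation of the runs at that K. The log probabilities are made up, and only two runs per K are shown for brevity.

```python
from statistics import mean, stdev


def evanno_delta_k(ln_prob_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    ln_prob_by_k: dict mapping K to a list of ln P(D) values, one per
                  independent STRUCTURE run (at least two runs per K).
    """
    delta_k = {}
    for k in sorted(ln_prob_by_k):
        if k - 1 not in ln_prob_by_k or k + 1 not in ln_prob_by_k:
            continue  # second difference is undefined at the ends
        spread = stdev(ln_prob_by_k[k])
        if spread == 0:
            continue  # delta K is undefined when runs are identical
        second_diff = abs(mean(ln_prob_by_k[k + 1])
                          - 2 * mean(ln_prob_by_k[k])
                          + mean(ln_prob_by_k[k - 1]))
        delta_k[k] = second_diff / spread
    return delta_k


# Hypothetical ln P(D) values, two runs per K.
runs = {
    1: [-1000.0, -1002.0],
    2: [-950.0, -952.0],
    3: [-800.0, -802.0],
    4: [-790.0, -792.0],
    5: [-785.0, -787.0],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(best_k)  # 3
```

∆K cannot be computed at the smallest or largest K tested, which is one reason the tested range extends well beyond the expected number of subpopulations.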

Genome-wide association analysis

GLM considering population structure (Q matrix) and MLM accounting for population structure (Q matrix) and kinship (K matrix) were implemented in TASSEL 5.0 (Bradbury et al. 2007) for genome-wide association analysis. For the GLM analysis, the equation was y = μ + Xα + Pβ + e; for the MLM analysis, the equation was y = μ + Xα + Pβ + Zu + e, where y is the N × 1 vector of best linear unbiased predictors (BLUPs) of genetic effect (N is the population size), μ is the overall mean value, X is the incidence matrix relating the plant introduction lines to the marker effect α, P is the incidence matrix relating the plant introduction lines to the population structure effect β, Z is the incidence matrix relating the plant introduction lines to the kinship effect u, and e is the random error term. For the GLM with Q matrix model, 10,000 permutation runs were conducted to identify the SNP markers significantly associated with leaf chloride concentration and leaf chlorophyll concentration (Bradbury et al. 2007). SNPs with −log10(P) > 4.1 (P < 7.9 × 10−5) in GLM and SNPs with −log10(P) > 2.1 (P < 7.9 × 10−3) in MLM were considered significant. For MLM, the optimum compression level was used in TASSEL 5.0 (Bradbury et al. 2007). Manhattan plots of −log10(P) values for each SNP vs. chromosomal position were generated from the TASSEL results.
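The two ways of stating each significance threshold are equivalent, as a short calculation shows. The sketch below only converts between −log10(P) cutoffs and P values; the helper names are illustrative and are not part of TASSEL.

```python
import math

GLM_NEG_LOG10_CUTOFF = 4.1  # threshold stated for the GLM analysis
MLM_NEG_LOG10_CUTOFF = 2.1  # threshold stated for the MLM analysis


def p_value_cutoff(neg_log10_cutoff):
    """Convert a -log10(P) cutoff into the equivalent P-value cutoff."""
    return 10 ** (-neg_log10_cutoff)


def is_significant(p_value, neg_log10_cutoff):
    """True when -log10(P) exceeds the cutoff."""
    return -math.log10(p_value) > neg_log10_cutoff


glm_p = p_value_cutoff(GLM_NEG_LOG10_CUTOFF)  # about 7.9e-05
mlm_p = p_value_cutoff(MLM_NEG_LOG10_CUTOFF)  # about 7.9e-03
print(f"{glm_p:.1e} {mlm_p:.1e}")

# A marker with P = 1e-3 passes the MLM cutoff but not the GLM cutoff.
print(is_significant(1e-3, MLM_NEG_LOG10_CUTOFF),
      is_significant(1e-3, GLM_NEG_LOG10_CUTOFF))
```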

Results

Phenotypic data

The leaf chloride concentrations of 283 PIs ranged from 21,985 to 106,399 mg kg−1 with an average of 64,056 mg kg−1 in 2014 and ranged from 6295 to 83,350 mg kg−1 with an average of 35,730 mg kg−1 in 2015 (Supplement Table 3 and Supplement Fig. 1). The average leaf chloride concentration of chloride excluder Osage was 32,636 and 7383 mg kg−1 in 2014 and 2015, respectively (Supplement Table 3). In contrast, the average leaf chloride concentration of chloride includer Dare was 89,209 and 47,525 mg kg−1 in 2014 and 2015, respectively (Supplement Table 3). Leaf chlorophyll concentrations were significantly negatively correlated with leaf chloride concentrations under salt stress. The chloride excluder Osage had higher leaf chlorophyll concentrations than chloride includer Dare (Supplement Table 3). The leaf chlorophyll concentrations of 283 PIs ranged from 16.4 to 43.7 with an average of 29.9 in 2014 and ranged from 15.6 to 44.9 with an average of 32.3 in 2015 (Supplement Table 3 and Supplement Fig. 2). Both genotype and year variances were significant. However, both leaf chloride concentrations and leaf chlorophyll concentrations from the two years were highly correlated, and most of the genotypes in the population ranked similarly between the two years, as reflected by the non-significant genotype × year variance components. As a result, relatively high broad-sense heritability (H²) estimates (0.76 for leaf chloride concentration, 0.65 for leaf chlorophyll concentration) with the year as the environment factor were obtained based on variance components.

Distribution of SNP markers, linkage disequilibrium, and population structure

A total of 33,009 SNPs were employed for GWAS analysis of salt tolerance traits, resulting in a marker density of 59 SNPs per Mb in euchromatic regions and 18 SNPs per Mb in heterochromatic regions (Supplement Table 2). MAF of SNPs ranged from 0.05 to 0.50 with an average of 0.30 (Supplement Fig. 3). The LD decay distance was 348 and 4838 kb in euchromatic and heterochromatic regions, respectively (Supplement Table 4). STRUCTURE analysis indicated that the calculated ∆K reached its maximum at K = 3, suggesting that the PIs are most likely grouped into three subpopulations (Supplement Fig. 4 and Fig. 1). Significant divergence among subpopulations and average distances among individuals within the same subpopulation were obtained (Supplement Table 5). None of the subpopulations had PI lines exclusively from one country (Supplement Table 5 and Supplement Fig. 4).

Fig. 1 QQ plot for GLM analysis (expected −log10(P) plotted against observed −log10(P))

Genome-wide association analysis

For GLM, a total of 45 SNPs distributed on Chr. 2, 3, 7, 8, 10, 13, 14, 16, and 20 were significantly associated with both leaf chloride concentration and leaf chlorophyll concentration in 2014, 2015, and combined years (Table 1). The GWAS analysis based on the average phenotypic data over years indicated that the major alleles on Chr. 2 and Chr. 7 decreased the leaf chloride concentration by up to 17,492 mg kg−1 and increased the leaf chlorophyll concentration by up to 6.6; meanwhile, three major alleles on Chr. 3 decreased the leaf chloride concentration by up to 14,809 mg kg−1 and increased the leaf chlorophyll concentration by up to 4.6. In contrast, the other 36 major alleles on Chr. 3, 8, 10, 13, 14, 16, and 20 increased the leaf chloride concentration by up to 26,345 mg kg−1 and decreased the leaf chlorophyll concentration by up to 6.9. Overall, the significant SNPs associated with salt tolerance explained 8–52% of phenotypic variation in leaf chloride concentrations and 8–42% of phenotypic variation in leaf chlorophyll concentrations. For MLM, a total of 47 SNPs on Chr. 3 were significantly associated with both leaf chloride concentrations and leaf chlorophyll concentrations in 2014, 2015, and combined years. Among those 47 markers, 27 significant SNPs on Chr. 3, which were stable across years and traits, were detected in both GLM and MLM (Table 2). The SNP markers ss715581136 on Chr. 2 and ss715637438 on Chr. 20, which were significantly associated with both leaf chloride concentrations and leaf chlorophyll concentrations in 2014, 2015, and combined years in GLM (Table 1), were also significantly associated with leaf chloride concentrations in 2014, 2015, and combined years in MLM (Table 2 and Fig. 2). Likewise, the SNP markers ss715618712 on Chr. 14 and ss715624611 on Chr. 16, which were significantly associated with both leaf chloride concentrations and leaf chlorophyll concentrations in 2014, 2015, and combined years in GLM (Table 1), were also significantly associated with leaf chlorophyll concentrations in 2014, 2015, and combined years in MLM (Table 2 and Fig. 3).

Table 1 List of SNPs significantly associated with both leaf chloride concentrations (mg kg−1) and leaf chlorophyll concentrations over two years based on GLM (−log10 P ≥ 4.1, P < 7.9 × 10−5)
Table 2 List of common significant SNPs between GLM and MLM
Fig. 2 Manhattan plot of −log10(P) values for each SNP against chromosomal position for leaf chloride concentrations in 2014 (a), 2015 (b), and combined chloride over 2014 and 2015 (c)

Fig. 3 Manhattan plot of −log10(P) values for each SNP against chromosomal position for leaf chlorophyll concentrations (SPAD values) in 2014 (a), 2015 (b), and combined chlorophyll over 2014 and 2015 (c)

Discussion

Increases in NaCl concentration have been significantly associated with leaf chloride concentrations (Ledesma et al. 2016) and leaf chlorophyll concentrations (Lenis et al. 2011). Both leaf chloride concentrations and leaf chlorophyll concentrations, which are more objective than visual scoring of leaf scorch, were used as the indicators of salt tolerance in this study. Leaf chloride concentrations were negatively correlated with leaf chlorophyll concentrations under salt stress (Patil et al. 2016). The significant negative correlation between leaf chloride concentrations and leaf chlorophyll concentrations in this study indicated that reliable phenotypic data were generated. An important feature of GWAS is the broad genetic variation in the mapping population, which consists of diverse germplasm resources. In order to capture the possible alleles relating to salt tolerance, 283 plant introduction lines were collected from 29 different countries. The mapping population showed a wide range of leaf chloride concentrations and leaf chlorophyll concentrations (Supplement Table 3), indicating that salt tolerance is controlled by a few quantitative trait loci. The genotype × year interaction was not significant for leaf chloride concentrations or leaf chlorophyll concentrations, as expected for controlled environments in the greenhouse. However, the year effect was significant. This is likely because pots were exposed to the salt treatment for different durations in the two years: the treatments were continued until the checks started to show contrasting foliar symptoms, which took 18 days in 2014 and 12 days in 2015. Significant differences among genotypes were present for both indicators of salt tolerance.

The mapping resolution and marker density required for an effective association analysis are determined by the LD decay distance (Zhu et al. 2008). A short LD decay distance allows high-resolution mapping but requires a greater number of markers (Zhu et al. 2008). The LD decay distances in euchromatic and heterochromatic regions, which were 348 and 4838 kb, respectively (Supplement Table 4), were similar to those reported by Zhang et al. (2016). The 33,009 SNPs gave an average marker distance of 16.9 and 55.6 kb in euchromatic and heterochromatic regions, respectively, which were much shorter than the corresponding LD decay distances. Therefore, the high density of SNPs in this study provided robust genotypic data for the association analysis.
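The average marker distances follow directly from the marker densities reported in the Results (59 and 18 SNPs per Mb), and comparing them with the LD decay distances gives a rough count of markers per LD block. The sketch below is simple arithmetic on those reported values; the function names are illustrative.

```python
def mean_spacing_kb(snps_per_mb):
    """Average distance between adjacent markers, in kb."""
    return 1000.0 / snps_per_mb


def markers_per_ld_block(ld_decay_kb, snps_per_mb):
    """Approximate number of markers falling within one LD decay distance."""
    return ld_decay_kb * snps_per_mb / 1000.0


# Densities and LD decay distances as reported in this study.
eu_spacing = mean_spacing_kb(59)                # about 16.9 kb
het_spacing = mean_spacing_kb(18)               # about 55.6 kb
eu_per_block = markers_per_ld_block(348, 59)    # about 20.5 markers
het_per_block = markers_per_ld_block(4838, 18)  # about 87.1 markers

print(round(eu_spacing, 1), round(het_spacing, 1))
print(round(eu_per_block, 1), round(het_per_block, 1))
```

On these figures, each LD decay distance spans roughly 20 markers in euchromatin and more than 80 in heterochromatin, which is the sense in which the marker spacing is much shorter than the decay distance.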

In GWAS analysis, population structure and relative kinship may cause spurious associations between traits and markers (Yu et al. 2006). GLM corrects for population structure, while MLM takes both population structure and familial relatedness into account. Both GLM and MLM control genomic inflation effectively and have been widely used in GWAS of soybean traits (Wen et al. 2014; Wen et al. 2015; Zhang et al. 2016). In this study, the three subpopulations suggested by STRUCTURE analysis were used to remove the spurious associations caused by population structure. MLM was also conducted at the optimum compression level since compressed MLM has been demonstrated to be more powerful and effective in association studies (Zhang et al. 2010).

A major soybean QTL conferring salt tolerance has been consistently mapped on Chr. 3 (Lee et al. 2004; Hamwieh et al. 2011; Huang 2013; Guan et al. 2014; Concibido et al. 2015; Patil et al. 2016). In our study, the major QTL on Chr. 3 was confirmed and narrowed down to a region of 1.18 Mb, which was flanked by SNP markers ss715585943 and ss715586102. Moreover, the salt tolerance gene Glyma03g32900 (40,623,066–40,634,451) reported by Guan et al. (2014) and Patil et al. (2016) was also located near the SNP marker ss715586154 (40,440,832). A total of 18 out of 31 significant SNPs for both leaf chloride concentrations and leaf chlorophyll concentrations on Chr. 3 can be considered major SNPs, each explaining more than 10% of phenotypic variation. SNPs significantly associated with both leaf chloride concentrations and leaf chlorophyll concentrations over years were also detected on Chr. 2, 7, 8, 10, 13, 14, 16, and 20 in GLM. Huang (2013) reported that three SNPs on Chr. 2 and two SNPs on Chr. 8 were significantly associated with leaf scorch score under salt stress. The significant SNP on Chr. 2 identified in our study is around 1.4 Mb away from those detected by Huang (2013). However, the significant SNP on Chr. 8 in our study is about 18 Mb away from those reported by Huang (2013). The SNP ss715581136 on Chr. 2 and the SNPs ss715618138 and ss715618731 on Chr. 14 can also be considered major SNPs since they explained more than 10% of phenotypic variation for both leaf chloride concentrations and leaf chlorophyll concentrations (Table 1). The major alleles for 80% of significant SNPs in our population contributed to the increase of leaf chloride concentration in soybean under 120-mM NaCl treatment, which is consistent with soybean being generally sensitive to salt stress (Launchli 1984). Minor alleles of most of the significant SNPs on Chr. 3 contributed to the decrease of leaf chloride concentrations under salt stress, which was in agreement with previously reported results (Huang 2013). However, minor alleles of three significant SNPs on Chr. 3, three SNPs on Chr. 7, and one SNP on Chr. 13 accounted for an increase in leaf chloride concentrations under salt stress in our study (Table 1). Overall, the significant SNP markers on Chr. 2, 3, 14, 16, and 20, which were identified in both GLM and MLM (Table 2), are highly recommended for marker-assisted selection in breeding salt-tolerant soybean lines. Moreover, we found that the SNPs and QTLs for salt tolerance at germination stage (Kan et al. 2015) were probably different from those identified at vegetative stages (V1 and V2) of soybean, because none of the SNPs or QTLs for salt tolerance at germination stage (Kan et al. 2015) was confirmed in our study or in previously reported studies (Lee et al. 2004; Huang 2013; Concibido et al. 2015; Patil et al. 2016). Although one salt tolerance SNP at germination stage has also been identified on Chr. 3 (Kan et al. 2015), the physical position of this SNP was more than 35 Mb away from the previously reported major salt tolerance QTL at vegetative stages (Lee et al. 2004; Huang 2013; Concibido et al. 2015). In addition, the salt tolerance QTL on Chr. 13 indicated by the current GWAS analysis was different from the QTL on Chr. 13 indicated by bi-parental QTL mapping analysis (Zeng 2016); the QTL on Chr. 15 identified in the bi-parental QTL mapping analysis (Zeng 2016) was not confirmed by the current GWAS analysis.

In summary, a genome-wide association analysis was conducted using high-density SNP markers and two indicators of salt tolerance in a mapping population consisting of diverse germplasm lines from around the world. With the implementation of GLM and MLM, the major salt tolerance QTL for vegetative stage was confirmed on Chr. 3; a minor salt tolerance QTL for vegetative stage was confirmed on Chr. 2; and seven novel salt tolerance QTLs for vegetative stage were identified on Chr. 7, 8, 10, 13, 14, 16, and 20. The newly identified significant SNP markers for salt tolerance at vegetative stage will help breeders develop salt-tolerant varieties by assisting in parent line selection, trait introgression, and evaluation of germplasm.