Introduction

Soybean (Glycine max (L.) Merr.) is an important crop worldwide in terms of economic and nutritional value. One of the most destructive pests of soybean is the soybean cyst nematode (SCN; Heterodera glycines Ichinohe), which has infested the main soybean cultivation regions of China. Eight races (viz., 1, 2, 3, 5, 6, 7, 9 and 14) have been found (Liu 2005). Race 3 is predominant in the northeast (including Heilongjiang, Jilin, Liaoning, and Inner Mongolia); race 1 is predominant in Shandong, Hebei, and Shaanxi; and race 4 predominates in Shanxi and Beijing city (Wu et al. 1982; Liu et al. 1984, 1989; Shang and Liu 1989; Li et al. 1991; Lu et al. 2006). The total area of SCN-infected soybean in China increases each year, resulting in substantial yield losses. Cultivation of resistant soybean cultivars combined with rational crop rotation is the most effective means of control. However, breeding resistant cultivars by conventional methods is difficult, because phenotyping lines for SCN resistance is time-consuming and expensive. The resistance is conditioned by different genes (Arelli et al. 1992); for example, the SCN resistance sources PI437654 and PI88788 each have a different functional SCN resistance allele at or close to rhg1 (Brucker et al. 2005). SCN race determination is just as complex (Caviness 1992), because SCN field populations are variable (Riggs and Schmidt 1988). The use of genetic markers linked to SCN resistance genes would therefore be a major improvement in breeding programs.

It is generally believed that the gene rhg1 is essential for the development of resistant cultivars regardless of SCN race (Cregan et al. 1999; Ruben et al. 2006). rhg1 was identified by classical genetic analysis as a recessive gene (Caldwell et al. 1960; Arelli et al. 1992) which seems to be common to major SCN resistance sources and provides the major portion of resistance to race 3 and race 14 (Concibido et al. 2004). Major quantitative trait loci (QTL) for resistance have been mapped across many resistance sources, including Peking, PI437654, PI90767, PI88788, PI209332, PI89772, and PI404198A (Concibido et al. 1994, 1996, 1997, 2004; Webb et al. 1995; Chang et al. 1997; Heer et al. 1998; Prabhu et al. 1999; Meksem et al. 2001a; Yue et al. 2001; Guo et al. 2006a), revealing several loci and interactions among them (Huang et al. 1997; Meksem et al. 2001a; Concibido et al. 2004). The QTL conferring the greatest level of resistance mapped to a genomic region containing rhg1 in linkage group (LG) G (Concibido et al. 2004; Guo et al. 2006a), and QTL clusters on each end of LG G were confirmed by meta-analysis, a statistical method for assessing whether QTLs detected on a linkage group map in different studies are located at the same locus or are linked (Goffinet and Gerber 2000; Guo et al. 2006b). To date, 12 molecular markers, including seven RFLP (restriction fragment length polymorphism), one RAPD (random amplified polymorphic DNA), one AFLP (amplified fragment length polymorphism) and two SSR (simple sequence repeat) markers, linked to the rhg1 locus and to the QTL underlying resistance have been reported (Concibido et al. 2004; Guo et al. 2006a). The most closely linked marker is BACR-Satt309, an SSR marker that maps 0.4 cM proximal to the rhg1 locus (Cregan et al. 1999). It is now frequently used in marker-assisted selection (MAS) and conveniently distinguishes Peking or PI437654 from PI88788. However, it cannot be used for MAS in populations developed from susceptible southern USA cultivars crossed with PI88788 or PI209332, because these genotypes all share the same allele size at Satt309.

The candidate gene of rhg1 has been cloned and sequenced (AF506516, Lightfoot and Meksem 2000; Hauge et al. 2001). As a member of the RHG1 protein-receptor-like kinase (RLK) gene family, it has an N-terminal signal peptide (1–48), an extracellular domain with ten leucine-rich repeats (LRR, 141–471), two trans-membrane domains (TM, 40–60; 485–507), and a cytoplasmic serine/threonine kinase domain (STYKc, 569–840) (Ruben et al. 2006; Afzal and Lightfoot 2007; Afzal et al. 2008). LRR-containing RLKs, which form the largest group of RLKs in plants, are predicted to play a central role in signaling during pathogen recognition in plant defense mechanisms and in developmental regulation. A series of studies (Jia et al. 2000; Arai et al. 2005) showed that amino acid substitutions in various LRR-containing RLKs led to diverse phenotypes, including resistance. Jia et al. (2000) studied the resistance specificity of Pi-ta (an LRR gene against rice blast) and discovered that substitution of an alanine by a serine at position 918 in the LRR domain resulted in loss of resistance. The sequence of rhg1 now makes it possible to survey the SNP variation at this locus and to use these SNPs as markers for SCN resistance (Meksem et al. 2001b; Hofmann et al. 2002; Qiu et al. 2003; Cahill and Schmidt 2004; Ruben et al. 2006).

Linkage disequilibrium (LD)-based association analysis, an alternative approach for studying the genetics of complex traits, has received increasing attention from plant geneticists in recent years. SNPs have emerged as promising molecular markers for association studies because of their abundance in the genome, their low mutation rates, and their suitability for high-throughput genotyping (Collins et al. 1997; Kim et al. 2004). A SNP may even represent the functional change itself, although such cases are very rare. In fact, it is not always necessary to know the functional change, as association analysis can identify a closely linked allele for use in MAS.

The purpose of this study was to determine the sequence variation in rhg1 and to use this information to identify SNPs in or near the gene. Using an agarose-based assay for these SNPs, we aimed to determine their occurrence in a set of 70 soybean genotypes and to identify haplotypes and SNPs associated with SCN resistance.

Materials and methods

Plant materials

Eight soybean genotypes, Peking, PI437654, Sangutiaoheidou, Huipizhiheidou, Xiaolimoshidou, You1298, Suinong 14, and Guxin, were initially chosen for SNP discovery through cloning and sequencing analysis. Peking and PI437654, originating from China, are well-known resistant sources in the USA; Sangutiaoheidou, Huipizhiheidou and Xiaolimoshidou are from a Chinese SCN-resistant core collection. You1298 is moderately resistant only to race 5. Guxin and Suinong 14 are susceptible to all races. An additional 62 genotypes from China (58) and the USA (4) were selected for haplotype analysis using SNP markers (Fig. 2). Among these, 23 genotypes were identified by Xie et al. (1998) and the Coordinative Group of Evaluation of SCN (1993) as resistant to SCN races 1, 2, 3, 5 and 14 (Table 4).

Genomic PCR cloning

DNA was extracted from leaf tissue of five to ten plants of each genotype following the method of Xie et al. (2003). Based on the sequence of rhg1, AF506516, which was obtained from the SCN-resistant cultivar “Forrest”, four pairs of primers (Table 1) were designed using Primer Designer ver. 2.0 (Scientific and Educational Software, Cary, NC, USA) to produce overlapping fragments covering the complete 4,956 bp. The PCRs were carried out using Ex Taq polymerase (TaKaRa Biotechnology (Dalian)), a 3′ → 5′ proof reading polymerase, in a reaction volume of 20 μl, containing 50 ng genomic DNA, 2 μl 10× PCR buffer, 1.5 μl 2 μM forward and reverse primers, 1.5 μl 2 mM dNTPs, and 1 unit Ex Taq polymerase. PCR was performed on a PTC-225 Peltier Thermal Cycler (MJ Research) using the program: initial 3 min denaturation at 95°C; 32 amplification cycles, each of 30 s at 94°C (denaturation), 30 s at the optimized annealing temperature (Table 1), 1.5 min at 72°C (extension); and a final 8 min at 72°C. The target fragments were excised and purified with a Tianwei DNA Fragment Purification Kit (Tianwei, P.R. China) after the PCR product was separated on 1% agarose gel. The products were ligated into pMD 18-T (TaKaRa Biotechnology (Dalian)) and used to transform E. coli Electro-Cell TOP 10 (Tianwei, P. R. China) according to the protocol from the supplier. For each product, 12 white colonies were sub-cultured for plasmid DNA isolation using a Plasmid Purification Kit (Tianwei). Ten plasmid DNAs for each product were mixed in equal amounts based on measurement by a Lambda 35 UV–visible spectrometer (Perkin Elmer, USA). The clone mixtures were sequenced in both directions using an ABI 3730 sequencer (Applied Biosystems, Foster City, CA, USA). To obtain a good quality sequence for the full length of each fragment, two additional internal sequencing primers were designed for each fragment (Table 1). 
For each genotype, the resulting full-length sequences of both strands were aligned using Seqman (DNAstar, Madison, WI, USA) to obtain correct sequences. All DNA sequences from the eight genotypes were aligned with the reference sequence AF506516 to display the SNPs, and they were also translated into amino acid sequences and compared. The sequences were deposited in GenBank/EMBL under accession numbers EU733524 to EU733528 and EU740426 to EU740428.
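As a quick check on the reaction mix described above, the final concentration of each component follows from the dilution rule C1·V1 = C2·V2. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the stated 1.5 μl of 2 μM primer applies to each primer separately.

```python
def final_concentration(stock_conc: float, stock_volume_ul: float,
                        reaction_volume_ul: float) -> float:
    """Dilution rule C1*V1 = C2*V2; result is in the same unit as stock_conc."""
    if reaction_volume_ul <= 0:
        raise ValueError("reaction volume must be positive")
    return stock_conc * stock_volume_ul / reaction_volume_ul


REACTION_VOLUME_UL = 20.0  # total reaction volume given in the text

# 1.5 ul of 2 uM primer stock (assumed to be per primer) in a 20 ul reaction
primer_um = final_concentration(2.0, 1.5, REACTION_VOLUME_UL)

# 1.5 ul of 2 mM dNTP stock in a 20 ul reaction
dntp_mm = final_concentration(2.0, 1.5, REACTION_VOLUME_UL)

print(f"primer: {primer_um:.2f} uM, dNTPs: {dntp_mm:.2f} mM")
```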

Table 1 Primer pairs used for cloning the rhg1 gene based on AF506516

SNP detection

SNPs were routinely determined as ACAS (agarose gel-based co-dominant allele-specific) PCR markers, also called allele-specific (AS)-PCR markers (Kanazin et al. 2000). This is a simple, low cost, and highly reproducible marker system which can be developed on the basis of single or a few nucleotide changes. It has been applied in, e.g., Arabidopsis (Drenkard et al. 2000), soybean (Jeong and Maroof 2004; Kim et al. 2005; Yuan et al. 2007), apple (Gao et al. 2005), maize (Shin et al. 2006), rice (Hayashi et al. 2004, 2006) and lettuce (Moreno-Vázquez et al. 2003). For each SNP locus, two allele-specific forward (or reverse) primers (each matching one of the SNP-variants at the 3′-end) and two normal reverse (or forward) primers with matching melting temperatures were designed (Fig. 1). The primers were designed in such a way that the difference in length between the two PCR fragments was at least 50 bp, so that the alleles could be easily separated on agarose gels. PCR reactions were performed as described above. The two amplified products for the same SNP locus were then mixed and separated on standard 1.5% agarose gels. Finally, the genotypes of the SNP were assessed according to the molecular weights of the PCR-amplified products.
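The scoring step at the end of this procedure can be sketched in Python. This is a minimal illustration rather than the authors' workflow: the function name, the example fragment sizes (300 and 200 bp), and the 20 bp sizing tolerance are hypothetical. Only the requirement that the two products differ by at least 50 bp comes from the text.

```python
from typing import Iterable

MIN_SIZE_DIFFERENCE_BP = 50.0  # design rule stated in the text


def call_acas_genotype(band_sizes_bp: Iterable[float],
                       allele1_size_bp: float,
                       allele2_size_bp: float,
                       tolerance_bp: float = 20.0) -> str:
    """Score one sample at one SNP from the band sizes read off the gel.

    allele1_size_bp and allele2_size_bp are the expected product sizes of
    the two allele-specific reactions (hypothetical values in the example).
    """
    if abs(allele1_size_bp - allele2_size_bp) < MIN_SIZE_DIFFERENCE_BP:
        raise ValueError("allele-specific products must differ by >= 50 bp")
    bands = list(band_sizes_bp)
    has1 = any(abs(b - allele1_size_bp) <= tolerance_bp for b in bands)
    has2 = any(abs(b - allele2_size_bp) <= tolerance_bp for b in bands)
    if has1 and has2:
        return "heterozygous"   # both allele-specific products present
    if has1:
        return "allele1"
    if has2:
        return "allele2"
    return "no call"            # PCR failure or unexpected band size


# Hypothetical example: expected products of 300 bp and 200 bp
print(call_acas_genotype([305], 300, 200))
print(call_acas_genotype([298, 204], 300, 200))
```

Because each allele yields its own product, a lane with no band is scored as a failed reaction rather than as the alternative allele, which is the practical advantage of a co-dominant assay.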

Fig. 1 Schematic diagram showing the ACAS-PCR approach used to validate SNPs, taking rhg1-689 as an example. The R6 and R7 primers contained the SNP variants at the 3′-end and, in addition, had a mismatch with the nonspecific allele within the 2nd–3rd nucleotides from the 3′-end. F4 and F5 were normal primers. The two amplified products for the same SNP locus, (1) and (2), were then mixed and separated on standard 1.5% agarose gels

Haplotype analysis

We tested the hypothesis of selective neutrality using Tajima’s D test (Tajima 1989) and the F test of Fu and Li (1993). Analyses of diversity and tests of neutrality were conducted in DnaSP 4.0 (Rozas and Rozas 1999).
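Tajima’s D contrasts two estimates of diversity: the mean number of pairwise differences (π) and the number of segregating sites (S) scaled by a harmonic-number term. The following is a minimal sketch of the standard formula from Tajima (1989), given for orientation only. It is not the DnaSP implementation, the function names are hypothetical, and the sequences are assumed to be aligned, of equal length, and free of gaps.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence, Tuple


def segregating_sites_and_pi(seqs: Sequence[str]) -> Tuple[int, float]:
    """Return (S, pi): segregating sites and mean pairwise differences."""
    n = len(seqs)
    if n < 2 or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need >= 2 aligned sequences of equal length")
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    total = sum(sum(x != y for x, y in zip(a, b))
                for a, b in combinations(seqs, 2))
    return s_sites, total / (n * (n - 1) / 2)


def tajimas_d(n: int, s_sites: int, pi: float) -> float:
    """Tajima's D from sample size n, segregating sites S and pi."""
    if n < 4 or s_sites < 1:
        raise ValueError("D is defined here for n >= 4 and S >= 1")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s_sites + e2 * s_sites * (s_sites - 1)
    return (pi - s_sites / a1) / sqrt(variance)
```

A positive D, as reported in the Results, means that π exceeds the value expected from S, that is, an excess of intermediate-frequency variants.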

LD was evaluated for each pair of SNP loci (including the six ACAS-PCR markers) using TASSEL (www.maizegenetics.net/bioinformatics/tasselindex.htm). Two values, a standardized disequilibrium coefficient (D′, Hedrick 1987) and a squared allele-frequency correlation (r², Weir 1996), were calculated for each locus pair. The significance (P value) of the LD for each locus pair was determined by 100,000 permutations.
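For two biallelic loci scored on homozygous lines, both LD measures can be computed directly from allele and haplotype frequencies. The sketch below illustrates the textbook definitions and a simple permutation test. It is not TASSEL's code; the function names, the 0/1 allele coding, and the small default number of permutations are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def ld_stats(a: List[int], b: List[int]) -> Tuple[float, float]:
    """Return (|D'|, r^2) for two biallelic loci coded 0/1, one value per line."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("loci must be scored on the same non-empty sample")
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    r2 = d * d / denom
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, r2


def permutation_p(a: List[int], b: List[int],
                  n_perm: int = 1000, seed: int = 0) -> float:
    """P value for r^2 obtained by shuffling one locus across lines."""
    rng = random.Random(seed)
    observed = ld_stats(a, b)[1]
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ld_stats(a, shuffled)[1] >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)          # add-one estimate, never exactly 0
```

D′ reaches 1 whenever at least one of the four possible two-locus haplotypes is absent, whereas r² reaches 1 only when two haplotypes are present, so r² is the stricter measure of how well one SNP predicts the other.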

Results

SNPs in rhg1 among eight sequenced genotypes

The alignment of the rhg1 allele sequence included 4,995 nucleotide positions including gaps. A total of 42 polymorphisms were identified among eight soybean genotypes, 37 of which were SNPs (Table 2) and the remainder consisted of InDels (data not shown). Among the 37 SNPs, 11 (29.7%) were singleton variable loci found in only one genotype, and 26 (70.3%) were parsimony informative loci found in at least two genotypes. No SNPs were detected in the 5′-noncoding region, nine were in exon 1, one in the intron, two in exon 2, and 25 in the 3′-noncoding region. Seven of the 11 SNPs in the coding regions led to changes in amino acid codons (six in exon 1 and one in exon 2). We also included the soybean rhg1 reference sequence AF506516 in the analysis. AF506516 is the sequence from Forrest which has the SCN resistance from Peking, so the expectation was that the sequence from Peking in this study should match AF506516. However, three single-base indels which were found in exon 1 resulted in three frame-shifts and, as a consequence, in a substantial amino acid change: from a 27 amino acid (VTYEDRKRSPSSCWWLMLKQVGRLEGN, AF506516) peptide to a new 26 amino acid peptide (ATMRTEKGVPPVAGGDVEAGGEAGGK). Furthermore, three single-base deletions were observed in the other seven sequenced genotypes.

Table 2 SNPs in the candidate rhg1 gene identified from eight soybean accessions

Development of ACAS-PCR markers

Five ACAS-PCR primer sets worked well and behaved co-dominantly, but the set for the 2564G > A SNP was dominant because one pair of primers produced no PCR product. A new primer pair was therefore designed on the sequence of the Actin gene (sequence number in GenBank: V00450) and included as a positive control in the multiplex PCR. This marker was named ADAS (agarose gel-based dominant allele-specific)-PCR marker. The sequences of the markers, annealing temperatures and the sizes of the PCR products after optimization of DNA concentrations and PCR programs for all five co-dominant and one dominant SNP markers are listed in Table 3. The genotypes of the SNPs detected by the ACAS-PCR markers in the eight sequenced genotypes were the same as in the sequencing.

Table 3 Description of AS-PCR markers for the soybean rhg1 gene

Among the six SNPs selected for developing AS-PCR markers, three were in exon 1, two were in exon 2, and one was in the 3′-noncoding region (indicated in bold in Table 2). Of the five SNPs in the coding regions, two (689C > A and 757C > T) were located between the N-terminal signal peptide domain and the leucine-rich repeat domain; the others were located in the serine/threonine kinase domain. Four SNPs (at positions 689, 757, 2,564 and 3,995) form one haplotype present in five resistant genotypes (Peking, PI437654, Sangutiaoheidou, Xiaolimoshidou, and Huipizhiheidou), and another haplotype in the susceptible genotypes Suinong 14 and Guxin. The SNP at position 2,233 characterized all resistant genotypes except Huipizhiheidou. Finally, the SNP at position 2,868 was present in all genotypes except You1298, the line with moderate resistance only to race 5.

Clustering and haplotype diversity

The six SNPs formed nine distinct haplotypes among the 70 genotypes. The haplotypes (HAPs) from different cultivars were displayed in a neighbor-joining tree (Fig. 2). Genotypes with HAP 1 to HAP 6 clustered in one group, and those with HAP 7 to HAP 9 clustered in another, supported by a bootstrap value of 84%. The haplotype in the G. soja genotype Guxin was the same as that in several other SCN-susceptible landraces and cultivars. All known resistant genotypes (23) were included in the first cluster, while the eight susceptible genotypes were all placed in the second cluster. The frequency of different haplotypes ranged from 1.4 to 24.3%. Of the nine haplotypes, HAP 9 (A-T-A-A-T-C) was the most common and HAP 7 (A-T-G-G-C-A) was unique, being present only in Lee.

Fig. 2 Neighbor-joining tree resulting from six SNP marker genotypes as found in 70 soybean accessions, including 69 G. max accessions and one G. soja accession (Guxin). Bootstrap values of 40% or higher are shown on the branches. The haplotypes are juxtaposed on the right of the tree. For each site in the haplotype, a dash represents the more common nucleotide. The sites in each haplotype are arranged in order of their location in the rhg1 sequence: bp 689, 757, 2,233, 2,564, 2,868, and 3,995. Symbols were added before each accession name to show SCN response, as indicated

The nucleotide diversity for the six SNP loci among the 70 genotypes was 0.449 and haplotype diversity was 0.841. The nine haplotypes detected were far fewer than the 64 (2⁶) expected if the six biallelic loci were in linkage equilibrium with each other. This may indicate that natural selection or directional selection for resistance to SCN has been occurring.
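The two quantities in this paragraph can be made concrete with a short sketch. It assumes Nei's unbiased estimator of haplotype (gene) diversity, H = n/(n − 1) · (1 − Σ pᵢ²), which is a common definition; the source does not state which estimator was used. The function names and the toy input are hypothetical.

```python
from collections import Counter
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity for a sample of haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def max_haplotypes(n_biallelic_loci: int) -> int:
    """Number of distinct haplotypes possible for independent biallelic loci."""
    return 2 ** n_biallelic_loci


# Six biallelic SNPs could in principle form 2**6 combinations
print(max_haplotypes(6))

# Toy sample (hypothetical, not the study data): two haplotypes, two copies each
print(haplotype_diversity(["A-T", "A-T", "C-C", "C-C"]))
```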

Tests of selection

Neutrality tests were conducted using two statistics, Tajima’s D and Fu and Li’s F. Tajima’s D (2.78) was significant at P = 0.01. Fu and Li’s test with the G. soja genotype as an outgroup showed that F (1.97) was significant at P = 0.05. Both tests indicated that the polymorphisms deviate from neutrality.

Linkage disequilibrium between the six AS-PCR markers

To survey linkage disequilibria between the polymorphisms at the six SNP marker loci, P values for LD were determined using Fisher’s exact test, and two measures of LD (D′ and r²) were estimated (Fig. 3). The average values of D′ and r² for pairwise locus comparisons were 0.815 and 0.403, respectively. Most (60%) of the locus pairs were in significant linkage disequilibrium (P < 0.0001), and the average values of D′ and r² for the nine significant locus pairs were 0.869 and 0.587, respectively. The pair 689C > A and 757C > T, which lie close to each other and adjacent to the LRR domain (556–676), had the strongest linkage disequilibrium, with the highest D′ and r² values. No recombination event was detected between them. Locus pairs involving 2868T > C were not in significant LD, nor was the pair 2233G > A and 3995A > C. The minimum number of recombination events was four (Rm = 4), detected between 757C > T and 2233G > A, 2233G > A and 2564G > A, 2564G > A and 2868T > C, and 2868T > C and 3995A > C.
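A minimum number of recombination events (Rm) is commonly obtained with the four-gamete test of Hudson and Kaplan (1985): if all four allele combinations occur at a pair of biallelic sites, at least one recombination event (or a recurrent mutation) must have occurred between them. The source does not state which software produced Rm, so the following is only a sketch of that general approach, with hypothetical function names and haplotypes given as equal-length strings.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def four_gamete_pairs(haplotypes: Sequence[str]) -> List[Tuple[int, int]]:
    """Site pairs (i, j), i < j, at which all four gametes are observed."""
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs


def minimum_recombination_events(haplotypes: Sequence[str]) -> int:
    """Rm: largest number of non-overlapping intervals showing four gametes."""
    intervals = sorted(four_gamete_pairs(haplotypes), key=lambda p: p[1])
    rm = 0
    last_end = 0
    for start, end in intervals:
        # Intervals may share an endpoint; they must not overlap in between.
        if start >= last_end:
            rm += 1
            last_end = end
    return rm


# Toy haplotypes over three sites (hypothetical, not the study data)
print(minimum_recombination_events(["000", "011", "101", "110"]))
```

Choosing intervals greedily by their right endpoint gives the largest set of disjoint intervals, which is the lower bound that Rm represents.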

Fig. 3 Linkage disequilibrium matrix for the six SNP loci based on the 70 genotypes studied. Pair-wise calculations of LD (r²) are displayed above the diagonal with the corresponding P values for Fisher’s exact test displayed below the diagonal

Association between SNP or haplotype and SCN resistance

The frequencies of 689C, 757C, 2564G, and 3995A in the 23 SCN-resistant genotypes were 1.00, 1.00, 1.00, and 0.96, respectively. SNP 2564G was also observed in the unique haplotype of the susceptible genotype Lee, but the other four were absent from all the eight genotypes comprising the susceptible population. Therefore, 689C, 757C, 2564G, and 3995A can be considered resistance-associated SNPs. The haplotype of Heidou 2, resistant to race 2, was identical to the others in the resistant group at the first three SNPs, but had SNP 3995C. The distribution of haplotype 689C–757C in the 23 resistant and eight susceptible genotypes was compared with the distribution of alleles present at the Satt309 SSR locus (Table 4), where a total of five alleles with estimated lengths of 125, 128, 131, 134, and 149 bp have been detected. In previous studies (Cregan et al. 1999; Wang et al. 2003), the 128 and 134 bp alleles were considered markers for “resistance” alleles, and the 131 and 149 bp alleles markers for “susceptibility” alleles. In the present study, the 128 and 134 bp alleles coincided with the 689C-757C haplotype, and the 131 and 149 bp alleles coincided with the 689A-757T haplotype. The 125 bp allele occurred in individual resistant and susceptible genotypes in previous studies (Cregan et al. 1999; Wang et al. 2003), but our results show that the SNPs at 689C > A and 757C > T can successfully separate the resistant (for example, PI88788) from the susceptible genotypes (for example, Lee). Therefore, our haplotypes may be more efficient than Satt309 in MAS.
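The association is reported descriptively in the source. One conventional way to attach a P value to such a 2 × 2 table is a one-sided Fisher’s exact test, sketched below. This is an illustration added here, not an analysis reported by the authors. The counts are taken from the text (all 23 resistant genotypes carry 689C–757C and all eight susceptible genotypes carry 689A–757T), the function name is hypothetical, and the test ignores relatedness among accessions, which would weaken any such inference.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact P for the table [[a, b], [c, d]].

    Probability, with margins fixed, of a top-left count >= a.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)
    upper = min(row1, col1)
    numer = sum(comb(row1, k) * comb(row2, col1 - k)
                for k in range(a, upper + 1))
    return numer / denom


# Rows: resistant, susceptible; columns: 689C-757C, 689A-757T
p_value = fisher_exact_greater(23, 0, 0, 8)
print(f"one-sided P = {p_value:.3g}")
```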

Table 4 Responses of 31 soybean accessions to soybean cyst nematode races 1, 2, 3, 5, and 14, haplotype types based on the 689C > A and 757C > T SNPs and the allele size at locus Satt309

Discussion

Association between SNP and SCN resistance

In this study, a total of 37 SNPs were detected in the rhg1 candidate gene. Six SNPs that were predicted to be associated with resistance were selected first for the association study. LD analysis revealed high LD (mean D′ = 0.815) among pairs of SNP loci in the rhg1 candidate gene. Association analysis identified a haplotype of two SNPs (689C > A and 757C > T) that was present only in resistant genotypes. Its association with disease resistance was superior to that of the microsatellite marker (Satt309) used up to now. The ACAS marker we developed for genotyping 689C > A will be highly useful in breeding soybeans for resistance to SCN by MAS.

689C > A is non-synonymous as it alters K (Lys) to Q (Gln) at amino acid position 115. Ruben et al. (2006) also found that 689C > A was one of five SNPs predicted to generate amino acid changes after analyzing partial cDNA sequences of the rhg1 candidate gene from 112 SCN-resistant PI genotypes, from 34 sequences derived from cultivars and published on GenBank by Afzal et al. (2004), and from the sequences released in the patent issued to Hauge et al. (2001). Although 689C > A is not one of the two potential QTNs (quantitative trait nucleotides) detected by Ruben et al. (2006), its amino acid position (115K/Q) is located in an important functional region (between the TM1 and LRR domains). Similarly, one SNP was discovered between the TM1 and LRR domains in the NTS1/GmNARK gene, resulting in a stop codon (AAA to TAA) (Arai et al. 2005). The relationship of the change (689C > A) to the functionality of the resistance remains to be determined.

The Lee haplotype may not be distinguishable with Satt309, but breeders usually cross resistant with susceptible lines and avoid crosses with lines that are identical by descent to Lee derivatives, because they generally know the allelic state of each parent and the phenotype of the line. In addition, Satt309 would be non-informative in a cross between Lee and a resistant line derived from PI88788, whereas it would be informative in a Lee by Peking-type cross. Compared with the haplotype consisting of the two SNPs, Satt309 has the added benefit of distinguishing Rhg1 derived from Peking, Hartwig, PI437654, etc., from Rhg1 derived from PI88788 and from lines that are ostensibly identical by descent. Therefore, combining the two SNPs identified in this study with Satt309 would improve the efficiency of marker-assisted selection.

Interestingly, the three single-base indels relative to the patented sequence AF506516 (Lightfoot and Meksem 2000) led to the loss of one amino acid and a change in a stretch of 26 amino acids. It is unlikely that this was caused by sequencing errors, because we sequenced a bulk of ten clones in both directions for each of eight genotypes and examined over 200 genotypes by PCR sequencing (unpublished data). The sequence encoding the stretch of 26 amino acids was used as a BLAST query and completely matched scaffold_121:1714112–1714189 of the Williams 82 sequence in Phytozome (http://www.phytozome.net/search.php?show=blast). The matched sequence was located between Sat_168 and Satt309 on linkage group G, which corresponds to the position found in a genetic study (Cregan et al. 1999). These results further confirm our finding of the indels. Alternatively, the resistance associated with the SNPs might involve another gene, for example Rhg4, if such genes are positively correlated with the SNPs, because rhg1 from Peking alone does not give resistance (Meksem et al. 2001a). That the resistance of the 23 genotypes differed among SCN races may be further evidence for the existence of resistance genes other than rhg1.

Molecular evolution of the rhg1 candidate gene

Selection always increases the LD levels of targeted genes through genetic linkage. As a key SCN resistance gene, rhg1 must have been a target of selection during domestication and subsequent selective breeding, because farmers and breeders would have selected SCN-resistant cultivars. Directional selection reduces the level of polymorphism through rapid fixation of adaptive mutations (Gupta et al. 2001). This may account for the two co-segregating loci (689C > A and 757C > T) present in all 23 resistant genotypes and absent from all eight susceptible genotypes examined in this study. Selection for this gene is also supported by the fact that the gene was found in many accessions, and even in the related species G. soja.

Advantage of ACAS-PCR markers

To eliminate the need to sequence a large number of samples, ACAS-PCR markers were developed for genotyping selected SNPs based on the principles of AS-PCR, a promising simple procedure for assaying SNPs (Drenkard et al. 2000). Unlike AS-PCR, which cannot distinguish the absence of a PCR product caused by reasons other than primer specificity, ACAS-PCR markers not only increase the robustness of the assay but may also enable heterozygotes to be identified (though no heterozygous locus was detected in this study). Moreover, ACAS-PCR markers are agarose gel-based, enabling rapid assays of large numbers of samples. Five ACAS-PCR markers and one ADAS-PCR marker, which used an EST as an internal positive control, were developed and used to genotype SNPs in the 70 soybean genotypes.