Introduction

Bread wheat (Triticum aestivum L.) is one of the most important food crops, feeding about 40 % of the world’s population (Gupta et al. 2008). Selection for high grain yield is an important objective in all wheat breeding programs. Grain yield is a complex trait that is usually controlled by multiple quantitative trait loci (QTLs). It is also influenced by environmental factors, which makes it difficult to manipulate and improve in breeding programs. Improvement is therefore slow, owing to the lack of genetic information about the number, location and contribution of each QTL to the final expression of yield (Koebner and Snape 1999).

Grain yield can be dissected into a number of components. Grain number (GN), as one of three major components of yield in wheat, has steadily increased over the period. During the past decades, numerous studies on QTLs (or genes) for grain number have been reported in wheat. Araki et al. (1999) detected QTLs for spikelet number per spike (SNS) on chromosome 4AS. Kato et al. (2000) found a minor QTL for SNS on chromosome 5A. Li et al. (2002) detected QTLs for spike length (SL) on chromosomes 1AL, 1BS, 4AL and 7AL. QTLs for grain number per spike (GNS) were detected on 4A and 7D by Börner et al. (2002), by Huang et al. (2004) on 1D, 2A, 3D, 6A, 7A and 7D in an elite winter cultivar, by Narasimhamoorthy et al. (2006) on 3B and 3D, by Cuthbert et al. (2008) on 1A, 2D, 3B, 5A and 7A, and by Deng et al. (2011) on 4B. Kumar et al. (2007) found coincident QTLs for SNS on 2DS and 5AL, and QTLs for GNS on 1AL and 1BL in two populations. From the above analytical results, QTLs (or genes) for grain number were primarily located on chromosomes 1A, 1B, 1D, 2A, 2D, 3B, 3D, 4A, 5A, 6A, 7A and 7D.

All the above QTLs for grain number were detected by linkage mapping using F2, RIL or DH populations. With huge reductions in the cost of high-throughput genomic technologies (Atwell et al. 2010; Huang et al. 2010) and improvements in statistical methods (Zhu et al. 2008), association mapping (also named LD mapping) has become a reality in plants (Thornsberry et al. 2001). As an alternative to traditional linkage analysis, association mapping has gradually gained favor in plant genetic research because of several advantages: (1) Association mapping reduces research time by using existing germplasm populations rather than segregating populations. (2) It can identify novel and superior alleles because more alleles are involved and their effects can be compared in the same diverse population. Associated markers can be used to assist breeders in introducing the best alleles into new varieties. (3) Using the historical recombination events over time, association mapping in a relatively small population of individuals can give a fine mapping resolution that requires very large populations in traditional biparental crosses. Association analysis can even detect associated genetic variation within a gene so as to prove its function (Thornsberry et al. 2001; Zhu et al. 2008; Cockram et al. 2010; Su et al. 2011). (4) Association mapping has lower costs in that only one set of genotypes is required to assess many different traits in many different environments, while replicating the phenotypic measurements to reduce environmental noise (Atwell et al. 2010). In wheat, Yao et al. (2009) detected loci contributing to spikelets per spike and grains per spike on chromosome 2A by association mapping using 125 markers, and demonstrated that association mapping can enhance QTL information and achieve higher resolution. Neumann et al. (2011) performed a genome-wide association mapping study in wheat considering 20 traits including GN and thousand kernel weight (TKW), and those loci provide opportunities for further wheat improvement, based on a marker approach. Wang et al. (2012) used association analysis to provide useful information for marker-assisted selection in breeding wheat for TKW. Each of these studies indicated that association analysis is an effective complementary approach to dissect important quantitative traits in crops compared with the traditional linkage mapping.

In this study, we used the diverse Chinese wheat mini core collection (MCC) (Hao et al. 2008) to undertake a genome-wide association analysis of grain number using 531 SSR markers randomly located on all 21 chromosomes. The purposes of the study were (1) to search for QTLs controlling grain number in a diverse germplasm gene pool, (2) to identify markers and wheat germplasm with favored alleles or genotypes for use in predictive molecular breeding, and (3) to consider the genetic mechanism of grain yield in bread wheat.

Materials and methods

Phenotypic assessment

A set of Chinese wheat MCC (Zhang et al. 2007; Hao et al. 2008, 2011) was chosen for genome-wide association of grain number (GN) with SSR markers. The MCC contains 262 wheat accessions, including 157 landraces, 88 modern varieties, and 17 introduced lines, representing 1 % of the national collection but more than 70 % of the entire genetic diversity (Hao et al. 2008). All accessions were planted in a randomized complete block design in four environments, viz. 2002, 2005 and 2006 in Luoyang, Henan province, and 2010 in Shunyi, Beijing (named GN02, GN05, GN06 and GN10, respectively). Each accession was planted in a 2 m two-row plot with 30 cm between rows, and 40 seeds planted per row. Ten plants from the middle of each plot were used in investigating GN and TKW (Wang et al. 2012). Mean values of GN and standard errors were analyzed by SPSS 16.0 (http://www.brothersoft.com/downloads/spss-16.html). The mixed mean GN (MGN) was estimated by the best linear unbiased predictor (BLUP) method according to Bernardo (1996a, b, c).
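The BLUP step can be illustrated with a minimal sketch. It assumes a balanced accession-by-environment table with no missing values, treats accessions as random and environments as fixed, and estimates variance components by the method of moments. The function name `blup_genotype_means` and the example values are hypothetical, and the procedure is not necessarily the one used by Bernardo (1996a, b, c) or by the authors.

```python
import numpy as np

def blup_genotype_means(y):
    """Shrunken (BLUP-like) accession means from a balanced table.

    y: 2-D array, rows = accessions, columns = environments.
    Assumes no missing cells and one value per accession per environment.
    """
    y = np.asarray(y, dtype=float)
    n_gen, n_env = y.shape
    grand = y.mean()
    gen_means = y.mean(axis=1)
    env_means = y.mean(axis=0)

    # Two-way ANOVA without replication: accession + environment + residual.
    ss_gen = n_env * np.sum((gen_means - grand) ** 2)
    resid = y - gen_means[:, None] - env_means[None, :] + grand
    ss_err = np.sum(resid ** 2)
    ms_gen = ss_gen / (n_gen - 1)
    ms_err = ss_err / ((n_gen - 1) * (n_env - 1))

    # Method-of-moments estimate of the accession variance (floored at 0).
    var_gen = max((ms_gen - ms_err) / n_env, 0.0)

    # Shrinkage factor applied to each accession's deviation from the grand mean.
    denom = var_gen + ms_err / n_env
    shrink = var_gen / denom if denom > 0 else 0.0
    return grand + shrink * (gen_means - grand)

# Hypothetical grain numbers for three accessions in two environments.
example = [[40.0, 42.0], [50.0, 54.0], [60.0, 58.0]]
mgn = blup_genotype_means(example)
```

The shrinkage pulls each accession mean toward the grand mean, more strongly when the residual variance is large relative to the variance among accessions.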

SSR genotyping

Genomic DNA was extracted from leaves of ten seedlings in each accession according to Sharp et al. (1989) and fingerprinted by PCR amplifications that identified alleles at 531 SSR loci. Genetic map positions for most of the markers (512 loci) can be found in Hao et al. (2011). The loci were distributed evenly across all 21 wheat chromosomes. The primer sequences and genetic locations of the loci were obtained from http://www.shigen.nig.ac.jp and http://wheat.pw.usda.gov, Röder et al. (1998) and Somers et al. (2004). The annealing temperature for each primer pair was obtained from Röder et al. (1998) and GrainGenes (http://wheat.pw.usda.gov). The amplified products were separated on an ABI 3730 DNA Analyzer (Applied Biosystems, Foster City, CA, USA) after the purification. Fragment sizes were determined using an internal size standard (GeneScan™-500 LIZ, Applied Biosystems), and the outputs were analyzed using GeneMapper software (http://www.appliedbiosystems.com.cn/). The minimum allelic frequency (MAF) was set as 0.05 for statistical analysis.
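The allele frequency threshold can be sketched as follows. This is a minimal illustration that assumes one allele call (fragment size) per accession at each locus and treats alleles below the threshold as missing; both are assumptions, since the text does not state how rare alleles were handled. The function name `filter_rare_alleles` is hypothetical.

```python
from collections import Counter

def filter_rare_alleles(calls, min_freq=0.05):
    """Mask alleles whose frequency at one locus is below min_freq.

    calls: list of allele sizes, one per accession; None marks missing data.
    Returns a new list in which rare alleles are replaced by None.
    """
    observed = [a for a in calls if a is not None]
    if not observed:
        return list(calls)
    counts = Counter(observed)
    total = len(observed)
    keep = {a for a, c in counts.items() if c / total >= min_freq}
    return [a if a in keep else None for a in calls]

# Hypothetical locus: allele 112 occurs once among 31 calls (about 3 %).
calls = [110] * 30 + [112]
filtered = filter_rare_alleles(calls)
```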

Association analysis

In order to define the degree of genetic covariance between pairs of individuals, a kinship (K) analysis was conducted on genotypic data with SPAGeDi software (Hardy and Vekemans 2002). Calculation of pairwise kinship coefficients was according to Loiselle et al. (1995) with 10,000 permutation tests. Negative values between individual pairs were then set to 0, as this indicated that they were less related than random individuals (Yu et al. 2006). To reduce the risk of false or spurious associations, population structure was estimated by STRUCTURE v2.2 software (Pritchard and Rosenberg 1999; Pritchard et al. 2000), based on 42 unlinked loci from both arms of each chromosome with a burn-in period equal to 50,000 iterations and a run of 500,000 replications of Markov Chain Monte Carlo (MCMC) after burn in (Wang et al. 2012).
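The treatment of negative kinship estimates can be shown in a few lines. The sketch below assumes the pairwise coefficients have already been exported from SPAGeDi into a square matrix; the function name `clamp_negative_kinship` and the example values are hypothetical.

```python
import numpy as np

def clamp_negative_kinship(k):
    """Set negative pairwise kinship estimates to 0, leaving other entries unchanged."""
    k = np.asarray(k, dtype=float)
    return np.where(k < 0, 0.0, k)

# Hypothetical 3 x 3 kinship matrix with one negative pair.
k_raw = np.array([[1.00, 0.12, -0.03],
                  [0.12, 1.00, 0.05],
                  [-0.03, 0.05, 1.00]])
k_used = clamp_negative_kinship(k_raw)
```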

The mixed linear model (MLM) module with Q + K of the TASSEL 2.1 software package (http://www2.maizegenetics.net/) (Bradbury et al. 2007; Zhang et al. 2010a, b) was used for genome-wide association of GN in each trial and MGN. The relative value of the favored allele for GN (R²) was calculated according to the equation R² = (SSA − f_A × MSE)/SST, where SSA is the sum of squares between groups of favored alleles and others, f_A is the degrees of freedom of the group with the favored alleles, MSE is the error mean square, and SST is the total sum of squares (Agrama et al. 2007; Zhang et al. 2010a, b).
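The R² equation can be made concrete with a small worked sketch. It assumes two groups (accessions carrying the favored allele versus all others) and takes the between-group degrees of freedom as 1; that reading of the degrees-of-freedom term is an assumption, as are the function name `favored_allele_r2` and the example values.

```python
import numpy as np

def favored_allele_r2(favored, others):
    """Adjusted share of phenotypic variation explained by a favored allele.

    Implements (SS_between - df_between * MSE) / SS_total for two groups.
    df_between is taken as 1 (two groups), which is an assumption.
    """
    favored = np.asarray(favored, dtype=float)
    others = np.asarray(others, dtype=float)
    values = np.concatenate([favored, others])
    grand = values.mean()

    ss_between = (len(favored) * (favored.mean() - grand) ** 2
                  + len(others) * (others.mean() - grand) ** 2)
    ss_total = np.sum((values - grand) ** 2)
    ss_error = ss_total - ss_between

    df_between = 1
    df_error = len(values) - 2
    mse = ss_error / df_error
    return (ss_between - df_between * mse) / ss_total

# Hypothetical grain numbers for the two allele groups.
r2 = favored_allele_r2([60, 62, 64], [48, 50, 52])
```

For these hypothetical values the between-group sum of squares is 216, the error mean square is 4 and the total sum of squares is 232, so the result is 212/232.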

Results

Phenotypic assessment

Grain number (GN) in the Chinese wheat MCC was determined over four growing seasons and two environments, including Luoyang, Henan Province, in 2002, 2005, 2006, and Shunyi, Beijing, in 2010. Grain number per spike showed high variation, ranging from 22.4 to 100.4 with an average of 49.8 (Table 1). There were significant correlations among environments, although there were also clear differences (Table 2). This indicated that the grain number had high heritability, but with obvious interactions with environment. Comparing landraces with modern varieties, there were minor differences in GN means in all environments, except GN-05 (Table 1). Mean grain number (MGN) of modern varieties, calculated with BLUP methods based on multiple environments, was not significantly different (P = 0.21) from that of landraces, suggesting that GN might not have undergone strong selection in breeding in comparison with 1,000-kernel weight (Wang et al. 2012).

Table 1 Comparison of grain number between landraces and modern varieties in the Chinese wheat MCC grown in four environments
Table 2 t Tests and correlation analysis of grain number among different environments

Genome-wide association analysis of grain number

Population structure showed that the MCC entries comprised two sub-populations. One group was mainly the landraces, and the other included modern varieties and introduced lines (Wang et al. 2012). We therefore used the MLM model (Yu et al. 2006) to undertake marker/grain number association analysis. Thirty-two loci were significantly (P < 0.05) associated with MGN. Association analysis between 531 SSR markers and grain number measured in four individual environments was performed to further identify SSR loci significantly associated with phenotypic values in multiple environments. Among the 32 loci associated with MGN, four were detected in three environments, viz. Xgwm311-2A, Xgwm131-3B, Xcfd52-5D and Xcfe273-6A; six loci were detected in two environments; and 17 were detected in one environment. Five loci were not associated in any single environment (Fig. 1; Table S1).
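For reference, the Q + K mixed model of Yu et al. (2006) is usually written as below. This is the general form from that paper, given here as a reminder of what each term contributes; the exact parameterization used in the TASSEL run is not stated in the text.

```latex
\[
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha}
             + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
  \qquad
  \operatorname{Var}(\mathbf{u}) = 2\mathbf{K}V_g,
  \quad
  \operatorname{Var}(\mathbf{e}) = \mathbf{R}V_R
\]
% y: phenotypic observations (here GN in one trial, or MGN)
% beta: fixed effects other than marker and population structure
% alpha: fixed marker effects (tested for association)
% v: fixed population-structure effects (Q matrix from STRUCTURE)
% u: random polygenic background effects (K matrix from the kinship analysis)
% e: residual effects
```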

Fig. 1 Genome-wide association analysis of grain number per spike and SSR loci in the A (a), B (b) and D (c) genomes of wheat. GNs collected from four trials were used to estimate mean values (MGN). GN-02, GN-05, GN-06 and GN-10 are grain numbers for 2002, 2005, 2006 in Luoyang (Henan province) and 2010 in Shunyi (Beijing), respectively

Among the 27 loci associated with MGN in at least one trial, we found breeder-favored alleles with strong positive effects at 23 loci on 14 different chromosomes. Chromosome 3D had four loci, viz. Xgwm108, Xgwm2, Xcfd64 and Xbarc42. The genetic distances among these four loci are more than 10 cM, indicating that they may not relate to a single yield gene. The allelic effect on MGN at each locus was estimated by ANOVA (SPSS 16.0). Significant or extremely significant differences in MGN were detected between varieties carrying favored alleles and varieties with other alleles. The two loci with the strongest effects were detected on chromosomes 3B (Xgwm131) and 6A (Xcfe273), each of which accounted for more than 10 % of the total variation (R² > 10 %) (Table 3).

Table 3 Favored alleles at the 23 SSR loci significantly (P < 0.05) associated with MGN, their frequencies, phenotypic effects and R²

Distribution of favored alleles at associated loci

Based on the mini core collection entries, the frequencies of favored alleles at each locus in the MCC were estimated for landraces and modern entries. The frequencies of five favored alleles were much higher in modern varieties than in the landraces and only one favored allele, Xgwm249 on 2A, was present as the favored allele in modern varieties (Fig. 2). The frequencies of favored alleles at the 23 loci in modern varieties were not significantly different from those in landraces. The favored alleles at eight loci did not increase in frequency in modern varieties compared with landraces, or even decreased. The frequencies of three favored alleles were higher than 50 % in modern varieties. However, two loci (Xgwm131 and Xcfe273) explaining the highest levels of phenotypic variation were at low frequency in modern varieties (Table 3; Fig. 2).

Fig. 2 Pairwise comparisons of favored alleles between landraces and modern varieties at 23 loci in the Chinese wheat MCC

Positive selection of favored alleles at key loci was also clearly implicated by changes in number and frequency. Modern varieties tended to have greater numbers of favored alleles than landraces (Fig. 3a), indicating that in modern breeding there was an accumulation of favored alleles in selected varieties. The best modern variety (GN 61) had 13 favored alleles at 23 marker loci, whereas the best landrace (GN 60) had 10. These results illustrate the reliability of identifying favored alleles. However, no modern cultivar had favored alleles at all 23 loci, indicating further opportunities for improvement of GN by marker-assisted selection. Importantly, the favored alleles explaining the highest amounts of phenotypic variation (Xgwm311₅₀₀, Xgwm131₁₁₀ and Xcfe273₃₀₆) did not have high frequencies in modern varieties; indeed, in some instances the frequencies were lower.

Fig. 3 Comparison of frequencies (a) and phenotypic values (b) of varieties containing different numbers of favored alleles at 23 loci between landraces and modern varieties in the Chinese wheat MCC

We also analyzed grain numbers of varieties containing different numbers of favored alleles in landraces and in modern varieties. The phenotypic values of varieties containing more favored alleles were much higher than those of varieties possessing fewer favored alleles in both sets of accessions (Fig. 3b).
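The comparison by number of favored alleles can be sketched as follows. The example assumes one allele call per locus per accession; the allele sizes for Xgwm131 and Xcfe273 are taken from the text, while the accessions, grain numbers and function names are hypothetical.

```python
def count_favored_alleles(genotype, favored):
    """Count loci at which an accession carries the favored allele.

    genotype: dict mapping locus name -> allele size for one accession.
    favored: dict mapping locus name -> favored allele size.
    """
    return sum(1 for locus, allele in favored.items()
               if genotype.get(locus) == allele)

def mean_gn_by_count(accessions, favored):
    """Mean grain number of accessions grouped by number of favored alleles.

    accessions: list of (genotype dict, grain number) pairs.
    """
    groups = {}
    for genotype, gn in accessions:
        n = count_favored_alleles(genotype, favored)
        groups.setdefault(n, []).append(gn)
    return {n: sum(v) / len(v) for n, v in sorted(groups.items())}

# Two loci for brevity; the study used 23.
favored = {"Xgwm131": 110, "Xcfe273": 306}
accessions = [
    ({"Xgwm131": 110, "Xcfe273": 306}, 60.0),
    ({"Xgwm131": 110, "Xcfe273": 300}, 50.0),
    ({"Xgwm131": 112}, 40.0),
    ({"Xgwm131": 110, "Xcfe273": 298}, 52.0),
]
summary = mean_gn_by_count(accessions, favored)
```

Running the same grouping separately on landraces and modern varieties would give the two series compared in Fig. 3b.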

Relationship of GN and TKW

In an earlier study by our group on loci influencing 1,000-kernel weight in wheat, TKW was phenotyped and subjected to association analysis (Wang et al. 2012). In the present study, we therefore further analyzed the relationship between GN and TKW.

In order to avoid the effect of population structure, all accessions were divided into landraces and modern varieties for evaluating the relationship between grain number (GN) and thousand kernel weight (TKW). While there was no significant correlation within landraces (Fig. 4a), an extremely significant positive correlation occurred between GN and TKW in modern varieties (Fig. 4b). Varieties from the 1960s showed lower GN (<53) and TKW (<40 g), those from the 1970s showed higher GN (>53) and lower TKW (<40 g), and those from the 1980s showed higher GN (>53) and TKW (>40 g). We also analyzed the relationship between the numbers of favored alleles for GN and TKW, and similar results were obtained (Fig. 4c, d). Varieties from the 1960s contained fewer favored alleles for GN (<7) and TKW (<9), varieties from the 1970s contained more favored alleles for GN (>7) but fewer for TKW (<9), and those after the 1980s contained more favored alleles for both GN (>7) and TKW (>9).
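The within-group correlation analysis can be sketched as below. The text does not name the correlation coefficient used, so Pearson's r is an assumption here, as are the function names and the example records.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient; returns NaN if either variable is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def correlation_by_group(records):
    """Correlate GN with TKW separately within each group.

    records: list of (group, gn, tkw) tuples, e.g. group = "landrace" or "modern".
    """
    by_group = {}
    for group, gn, tkw in records:
        gns, tkws = by_group.setdefault(group, ([], []))
        gns.append(gn)
        tkws.append(tkw)
    return {g: pearson_r(gns, tkws) for g, (gns, tkws) in by_group.items()}

# Hypothetical records: (group, GN, TKW in g).
records = [
    ("modern", 48.0, 36.0), ("modern", 52.0, 39.0), ("modern", 56.0, 43.0),
    ("landrace", 50.0, 30.0), ("landrace", 46.0, 33.0), ("landrace", 54.0, 31.0),
]
r_by_group = correlation_by_group(records)
```

Splitting by group before correlating keeps differences between the two sub-populations from being mistaken for a relationship between the traits.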

Fig. 4 Relation between GN and TKW in landraces (a, c) and modern varieties (b, d). Modern varieties from different periods are labeled by different colored dots, corresponding to the 1940s (dark green circles), 1950s (medium green circles), 1960s (light green circles), 1970s (light orange circles), 1980s (medium orange circles) and 1990s (dark orange circles)

We selected 11 significantly associated loci explaining high levels of phenotypic variation (5 loci with R² > 5 % for GN and 6 loci with R² > 10 % for TKW (Wang et al. 2012)) to evaluate the phenotypic effect of favored alleles on GN and TKW (Fig. 5). The favored alleles at six loci (Xcfa223, Xgwm156, Xcfa2257, Xgwm2, Xgwm131 and Xwmc304), accounting for 55 % of the analyzed loci, showed significant positive effects on both GN and TKW. Favored alleles at only two loci (Xwmc17 and Xgwm311), accounting for 18 % of the loci, showed inverse effects on GN and TKW, and these effects were not significant.

Fig. 5 The phenotypic values of Favored alleles at 11 associated loci explaining high levels of phenotypic variation. Asterisks mark instances where the phenotypic effects of Favored alleles on TKW and GN were significantly higher than the phenotypic effects for Others

Discussion

Advantages of association mapping of complex traits

Finding genetic variation underlying complex traits in crops not only provides insights into genetic pathways, but also provides targets for marker-assisted selection in breeding (Clark 2010). Linkage mapping in biparental crosses has been the most commonly employed method to identify marker-phenotype associations. To date, many plant genes have been identified (Harjes et al. 2008) and cloned based on linkage mapping, especially in Arabidopsis thaliana and Oryza sativa. Association mapping (also known as linkage disequilibrium mapping) is an alternative approach that captures multiple historical recombination events among a selected panel or population of unrelated individuals (Myles et al. 2009). Many candidate markers associated with agronomic and environmentally adaptive traits in plants have recently been identified by this method (Wilson et al. 2004; Zhang et al. 2005; Breseghello and Sorrells 2006; Cockram et al. 2010; Huang et al. 2010).

An important advantage of association mapping, as mentioned and reviewed many times (e.g. Thornsberry et al. 2001; Zhu et al. 2008; Atwell et al. 2010; Clark 2010; Huang et al. 2010), is that association mapping is highly efficient in detecting multiple QTLs in the same genetic network at the same time, and provides opportunities to understand the pleiotropic effects of some chromosome regions, QTLs or genes. The populations used can be so diverse that most of the variation related to many target traits at different loci can be included and detected in the same mapping population. Results from the present study confirm the high efficiency of association mapping in QTL detection. We detected more than 100 QTLs related to grain number using a population of 262 individuals. Previously reported QTLs related to grain number based on linkage mapping in biparental populations were not more than 10 (Huang et al. 2004; Wang et al. 2009). We compared the associated QTLs with the reported QTLs according to genetic distances of the markers, and found there was high consistency between linkage analysis and association mapping. Associations of candidate markers with traits in this study were also detected in previous linkage studies (Table 3). Clearly, some novel markers and QTLs were also detected in this study, such as Xbarc42 and Xcfe172 on 3D, Xcfd52 on 5D and Xcfe273 on 6A.

Another advantage of association mapping relative to linkage mapping is that many variants may be present at one locus, enabling identification of the best of several alleles rather than the better of two. Moreover, association approaches are amenable to high-throughput genomics that can be used to characterize all of the genes in a genome (Buckler and Thornsberry 2002). In this way varieties or individuals with the most useful alleles can be identified. In addition, association analysis allows detection of QTLs for a wide array of different traits in the same population and so permits the discovery of pleiotropic regions. In this study, we found that Xwmc304 was associated with grain number. Wang et al. (2012) found that this marker was also associated with TKW. Importantly, Xwmc304₁₂₆ was not only a favored allele for grain number, but was also favored for TKW. Both grain number and TKW are important components of grain yield. For a long time it was believed that these traits were negatively correlated, i.e. small-seeded varieties normally produce more seeds. However, alleles such as Xwmc304₁₂₆ can be favored since they improve both traits.

Thus, whole genome association can provide a profile of chromosome regions or loci contributing to yield, their relative contributions to the phenotypic variation of target traits, and the relative phenotypic effects of different alleles. It identifies the candidate regions or markers that should be considered in breeding programs and in dissecting genetic variation related to target traits.

Frequencies and genetically additive effects of favored alleles indicate potential for yield increases by selection of loci associated with GN

Improvement in grain productivity can be achieved by increasing both grain weight and grain number. TKW increased from a mean of 31.5 g in the 1940s to 44.64 g in the 2000s, a 2.19 g increase in each decade (Zhang et al., data not shown). A large difference in TKW was found between landraces and modern varieties in the Chinese wheat MCC (Wang et al. 2012). Meanwhile, the increase in grain number was relatively slow and only minor differences existed between the landraces and modern varieties in the MCC. Our results indicate that there are five favored alleles whose frequencies were much higher in modern varieties than in the landraces (Fig. 2). The frequencies of most of the favored alleles in modern varieties were not significantly different from those in landraces, and favored alleles at eight loci did not increase in frequency in modern varieties relative to landraces; some even decreased. Moreover, the favored alleles contributing most to the phenotypic variation occurred at low frequencies in modern varieties. Donmez et al. (2001) suggested that the number of kernels per spike was more amenable to genetic improvement than kernel weight. There seems to be potential for improving yield by selection for grain number.
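The per-decade TKW gain quoted above can be checked with a short calculation, assuming six decade intervals between the 1940s and the 2000s (the interval count is an assumption, since the underlying data are not shown):

```latex
\[
  \frac{44.64\ \text{g} - 31.5\ \text{g}}{6\ \text{decades}}
  = \frac{13.14\ \text{g}}{6\ \text{decades}}
  = 2.19\ \text{g per decade}
\]
```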

Molecular breeding by design aims to select the most valuable genotypes or alleles and to combine them in developing a desired variety. Wang et al. (2012) detected linear correlations between TKW and favored alleles. We also found linear correlations between grain number and favored alleles in this study. The evidence indicates that QTLs or genes have additive effects that should not be ignored in practice, and that breeders could improve grain number by combining more favored alleles. Frisch et al. (2010) found that predictions based on 50 chosen genes were as accurate as predictions based on 5,000 random genes. Therefore, it is feasible to increase grain number in wheat by marker-assisted selection.

Genetic basis of the relationship between GN and TKW in Chinese wheat

Grain number and grain weight are important aspects of the domestication syndrome, which partly explain the process of domestication of crop plants. Traditionally, there was a negative correlation between these traits, evident from repeated observations that small-seeded plants normally produce more seeds. In wheat, this negative correlation also exists (Jia 1984; Gaju et al. 2009). However, some recent studies showed there was no correlation between the two traits (Wang et al. 2001; Brancourt-Hulme et al. 2003), and sometimes even showed positive correlation (Aycicek and Yildirim 2006). Clearly, the relationship between the two traits depends on the actual experimental materials. In the present study, we found that there was no correlation between the two traits in Chinese landraces, but there was a significantly positive correlation in modern varieties. We concluded that selection played a major role in this difference. For landraces in China, natural selection made them produce more spikelets, larger grain numbers and lower TKW. However, during modern wheat breeding for high yield, artificial selection played a major role. Many studies found that grain weight per spike was a leading contributor to improved yield, and TKW was the main component of grain weight per spike. Therefore, the main focus was on TKW which underwent the greatest response among all yield components (Hu and Zhao 1995). The trend in grain number per spike has changed in different production areas and different periods. Study of varieties from the 1980s showed that changes in grain number per spike among varieties were quite small (Ting 1979; Qian et al. 1989), whereas studies after 1990 (Tian 1991; Hu and Zhao 1995) showed an increasing trend in grain number, and even more so after 2000 (Zhang et al., data not shown).

It is normally assumed that genotypic correlations between complicated traits such as yield and its component traits arise from pleiotropy. Pleiotropic effects of QTLs and epistatic loci appeared to be responsible for the negative correlation between kernel weight and GN. In this study, we undertook a genome-wide association analysis of GN using the same 531 SSR markers as Wang et al. (2012). Among the associated markers, only 30 loci were simultaneously associated with GN and TKW, accounting for 17 % of all associated loci, and most of the associated loci were associated with only one of the two traits. The favored alleles at these single-trait loci should increase the phenotypic value of one trait without negatively affecting the phenotypic value of the other. In other words, a significant proportion of the identified QTLs affected GN but not TKW. Thus, selection of these QTLs was likely a major factor in changing the relationship between GN and TKW over time. More detailed information about the loci involved in different yield components and their genetic relationships will be helpful in improving yield potential by deliberately choosing these loci for marker-assisted selection.