Introduction

As the world’s population rapidly increases and cultivable land decreases, improving crop yield is essential to addressing world hunger. Wheat (Triticum aestivum) is one of the major food crops, and its agronomic characteristics, especially grain yield, have been widely studied (Börner et al. 2002; Groos et al. 2003; Li et al. 2007). For an individual plant, kernel yield consists of three components: number of spikes per plant, number of kernels per spike, and kernel weight. Kernel weight is mainly determined by kernel length (KL), kernel width (KW), and kernel thickness (KT) (Xu et al. 2002; Breseghello and Sorrells 2006). Among them, kernel width is the most important component of thousand-kernel weight (TKW) (Sun et al. 2009; Gegas et al. 2010). Therefore, enhancing kernel width in breeding programs is an important approach to increasing crop yield.

Quantitative trait locus (QTL) analysis has been widely used to study specific yield-related traits in bread wheat. In the last decade, QTLs for kernel shape (KL and KW) have been detected on several wheat chromosomes (Dholakia et al. 2003; Breseghello and Sorrells 2007; Sun et al. 2009; Ramya et al. 2010). Numerous QTLs for TKW have been reported on almost all 21 chromosomes of wheat (Börner et al. 2002; Dholakia et al. 2003; Groos et al. 2003; McCartney et al. 2005; Quarrie et al. 2005; Kumar et al. 2006; Li et al. 2007; Sun et al. 2009; Ramya et al. 2010). To date, a few genes have been cloned in wheat (Yan et al. 2000, 2004; Fu et al. 2009); among them, only TaGW2 is associated with TKW (Su et al. 2011). The isolation of important QTLs by map-based cloning is hampered by the large and complex genome of wheat, the high proportion of repeated nucleotide sequences, and the lack of an annotated genomic sequence (Arumuganathan and Earle 1991; Nabila et al. 2004; Flavell et al. 1974).

In rice, several yield-related QTLs, such as GS3 (Fan et al. 2006), GW2 (Song et al. 2007), GW5 (Weng et al. 2008), Ghd7 (Xue et al. 2008), and GIF1 (Wang et al. 2008), have been isolated. Comparative genomics based on QTL mapping revealed that gene sequence and order are conserved among different grass species (Ahn et al. 1993; Gale and Devos 1998; Varshney et al. 2005) and that some homologous genes share similar functions across species (Kojima et al. 2002). Thus, isolated rice genes may provide promising candidates for cloning the corresponding genes in other species using comparative genomics. For example, GS3 is a rice gene controlling kernel size. Recently, an orthologous maize gene, ZmGS3, was cloned and is thought to be involved in maize kernel development (Li et al. 2010).

The OsGW2 gene on rice chromosome 2 was the first gene identified as controlling KW and kernel weight in rice (Song et al. 2007); it encodes a previously unknown RING-type protein with E3 ubiquitin ligase activity. This gene is reported to negatively regulate kernel width through the control of cell division in the spikelet hull. In slender-kernel varieties, OsGW2 encodes a complete protein with normal function. In contrast, in a wide-kernel variety, a single-nucleotide deletion in the fourth exon led to a premature stop, resulting in a non-functional protein with a truncation of 310 amino acids. In maize, Li et al. (2010) found two homologs of the rice OsGW2 gene through comparative genomics, ZmGW2-CHR4 and ZmGW2-CHR5. One single nucleotide polymorphism (SNP) in the promoter region of ZmGW2-CHR4 was highly associated with KW and TKW, but no coding-region variation similar to that of rice OsGW2 was found in either gene; thus, different mechanisms were proposed for the phenotypic variation caused by these two genes. Recently, a gene orthologous to rice OsGW2, TaGW2, was cloned in bread wheat and located on the homoeologous group 6 chromosomes. Two SNPs were detected in the promoter region of TaGW2-6A, and one of them was related to KW and TKW (Su et al. 2011). However, no divergence in the coding region was found among the wheat varieties used in that study.

Interestingly, we recently identified a single-nucleotide insertion in the coding region of TaGW2 from a large-kernel wheat variety, Lankaodali. An F2:3 population (Lankaodali × Chinese Spring) and 22 additional varieties were used to study the function of the gene in kernel size. The objectives of this study were to (1) identify the mutation site of TaGW2 in Lankaodali, (2) develop allele-specific PCR markers based on the functional mutation site to distinguish the TaGW2 mutant and wild-type alleles, (3) determine the relationship between the mutant allele and kernel size through association analysis between genotypes and corresponding kernel phenotypes, and (4) detect whether other varieties carry the same mutation using newly developed markers.

Materials and methods

Plant materials

Chinese Spring (CS) has small kernels (TKW = 27.75 ± 0.62 g) and Lankaodali (LK) has very large kernels (TKW = 57.49 ± 0.88 g). They were used to obtain the genomic DNA, cDNA and intron sequences of TaGW2. Chinese Spring group 6 nulli–tetrasomic lines were used to determine the chromosomal location of the TaGW2 mutant allele: N6A-T6D (nullisomic for 6A, tetrasomic for 6D), N6B-T6D (nullisomic for 6B, tetrasomic for 6D) and N6D-T6B (nullisomic for 6D, tetrasomic for 6B).

A total of 327 F2 plants were obtained from the cross Lankaodali × Chinese Spring. F3 seeds from each F2 individual, together with the parents, were planted in the wheat-growing season of 2010 at three locations (Yangling, Qianxian, and Qishan) in Shaanxi, China, with two replicates in each field experiment. Thirty seeds per family were individually hand-planted in a 2-m row, with rows 25 cm apart. Field plots were managed in the same way as for commercial production. Seeds harvested from the F2:3 families were evaluated for KL, KW and TKW.

Twenty-two additional wheat varieties with various kernel sizes were used for the validation of the developed markers. These varieties were planted separately in the 2009 and 2010 wheat-growing seasons on the experimental farm of Northwest A&F University, Yangling, Shaanxi. Field management followed the same agricultural practice as that of F2:3 families, and so did the seed harvesting and the kernel traits analysis.

Measurements of kernel traits

The F2:3 family seeds were harvested from each of the two replicates in three plots, and 30 randomly selected kernels were measured for KL and KW using a ruler. Two independent samples of 200 kernels were weighed and converted to TKW for final data analysis. The kernel traits of the 22 additional wheat varieties were measured using the same methods. Statistical analysis was conducted using SPSS 17.0 for Windows (SPSS Inc., Chicago, IL).
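The conversion from 200-kernel sample weights to TKW is a simple scaling. A minimal sketch follows, assuming the two sample weights are averaged before scaling; the averaging step, the function name, and the example weights are illustrative assumptions rather than details given in the text.

```python
def tkw_from_samples(sample_weights_g, kernels_per_sample=200):
    """Convert weights of fixed-size kernel samples to thousand-kernel weight (g)."""
    if not sample_weights_g:
        raise ValueError("at least one sample weight is required")
    mean_weight = sum(sample_weights_g) / len(sample_weights_g)
    # Scale the mean sample weight from `kernels_per_sample` kernels to 1,000 kernels
    return mean_weight * 1000 / kernels_per_sample


# Hypothetical weights (g) of two independent 200-kernel samples
example_tkw = tkw_from_samples([11.4, 11.6])
print(f"TKW = {example_tkw:.2f} g")
```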

DNA extraction, RNA extraction, primer design and PCR amplification

Genomic DNA of Chinese Spring, Lankaodali, the 327 F2 plants and the 22 additional varieties was extracted from young leaves of field-grown plants using a modified CTAB method (Chen and Ronald 1999). Total RNA from young leaves of Chinese Spring and Lankaodali was isolated using TRIzol reagent according to the manufacturer’s instructions (TaKaRa Biotechnology, Dalian, China). RNA concentration and quality were determined using a nucleic acid and protein analyzer (Biophotometer plus, Eppendorf, Germany). All primers used in this study (Table 1) were designed with Primer Premier 5.0 software (Premier Biosoft International, Palo Alto, CA) and obtained from TaKaRa Biotechnology (Dalian) Co., Ltd. (http://www.takara.com.cn). Briefly, a 25 μL PCR reaction mix contained about 100 ng genomic DNA, 2.5 μL 10× PCR buffer (Mg2+ plus, 15 mM), 2 μL dNTP mixture (2.5 mM each), 1 μL of each primer (10 μM), and 0.125 μL of Taq DNA polymerase (Takara, 5 U/μL). PCR was performed at 94 °C for 5 min, followed by 35 cycles of 94 °C for 45 s, 60 °C (or the optimal annealing temperature for each primer pair listed in Table 1) for 45 s, and 72 °C for 1.5 min, with a final extension at 72 °C for 10 min. The amplified products were separated on 1 % agarose gels in 1× TAE buffer for 1 h at 100 V and visualized by ethidium bromide staining under UV light.
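The final concentrations implied by this recipe follow from the dilution rule C1·V1 = C2·V2. The sketch below works them out for the 25 μL reaction; it assumes that "15 mM" refers to the Mg2+ concentration in the 10× buffer, and the function and variable names are illustrative only.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul=25.0):
    """Dilution rule C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Stock concentrations and volumes as stated in the text
reaction = {
    "dNTP (mM each)": final_concentration(2.5, 2.0),
    "each primer (uM)": final_concentration(10.0, 1.0),
    "Mg2+ (mM)": final_concentration(15.0, 2.5),   # assumes 15 mM Mg2+ in the 10x buffer
    "PCR buffer (x)": final_concentration(10.0, 2.5),
}
taq_units = 0.125 * 5.0  # volume (uL) x activity (U/uL)

for name, value in reaction.items():
    print(f"{name}: {value:.2f}")
print(f"Taq polymerase (U per reaction): {taq_units:.3f}")
```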

Table 1 Primers for sequencing, prokaryotic expression and AS-PCR

Isolation of the complete coding region and all the introns of TaGW2

The complete cDNA sequence of the rice OsGW2 gene (Song et al. 2007) was obtained from GenBank (accession No. EF447275.1). The rice sequence was used as a BLAST query against the wheat EST databases of the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/) and The Institute for Genomic Research (TIGR, http://compbi.dfci.harvard.edu/cgi-bin/tgi/Blast/index.cgi). Many EST sequences showed high identity, and they were assembled into a single contig sequence using the ContigExpress component of the Vector NTI software (Invitrogen, Carlsbad, CA). A specific primer pair, TaGW2-P1 (Table 1), was designed based on the assembled contig sequence to amplify the full-length TaGW2 cDNA sequence. A total of 1 μg RNA was reverse-transcribed into first-strand cDNA using oligo (dT) primers following the instructions of the RevertAid First Strand cDNA Synthesis Kit (Fermentas, China). Two microliters of cDNA was used as a template for PCR amplification with the primer pair TaGW2-P1 using the PCR procedure described above. PCR products were analyzed on an agarose gel, and bands of the expected size were excised from the gel and eluted using a gel extraction kit (Biomiga, San Diego, CA). The purified fragments were subsequently ligated into the pGEM-T Easy Vector and transformed into competent E. coli DH5α cells according to the manufacturer’s instructions (Promega, Madison, WI, USA). After PCR and restriction enzyme analysis, ten positive clones were sequenced on an ABI 3730 (Shanghai Sangon Biotechnology Co., Ltd., Shanghai, China).

Seven introns of TaGW2 were also cloned and sequenced to study the variation of introns between two parents, because introns of some genes were thought to be involved in the functional regulation (Lavett 1984; Bornstein et al. 1987). The rice OsGW2 had been reported to contain seven introns, and their exon/intron boundaries were obtained from NCBI. Five primer pairs (Table S1) were designed in the vicinity of corresponding splicing sites of TaGW2 on the basis of conserved intron positions among orthologous genes (Fedorov et al. 2002; de Roos 2007) to isolate the introns of TaGW2 using the same procedures described for cloning of coding region.

The cDNA sequences of TaGW2 from the Chinese Spring and Lankaodali were translated to protein sequences using EditSeq module of DNASTAR software (DNASTAR; Lasergene, Madison, WI). Molecular weights (MW) and functional structural motifs were predicted for TaGW2 proteins using the ProtParam tool (http://au.expasy.org/tool/protparams.html) and the SMART domain prediction server (Simple Modular Architecture Research Tool; http://smart.embl-heidelberg.de/). The nucleotide sequences and the deduced amino acid sequences were aligned using ClustalW2 program (http://www.ebi.ac.uk/Tools/clustalw2/index.html).

Prokaryotic expression analysis of TaGW2

Based on the TaGW2 cDNA sequence, a second primer pair, TaGW2-P2 (Table 1), supplemented with BamHI and HindIII sites, was designed to amplify the coding region. The cDNA sequences from the two parental varieties were used as templates for PCR. PCR amplification, cloning, and sequencing followed the procedures described above. After confirmation by sequencing, the recombinant pGEM-T Easy Vectors containing either the TaGW2-Lankaodali allele with the mutation site or the corresponding TaGW2-Chinese Spring allele were separately digested with BamHI and HindIII restriction enzymes (TaKaRa Biotechnology, Dalian, China). The expression vector pET32a (+) (Invitrogen, Carlsbad, CA) was digested with the same restriction enzymes. The two TaGW2 fragments were then purified and ligated into pET32a (+) using T4 DNA ligase (TaKaRa Biotechnology, Dalian, China) at a molar ratio of 3:1 at 4 °C overnight. The resulting expression plasmids were named pET32a-TaGW2-LK for Lankaodali and pET32a-TaGW2-CS for Chinese Spring. Both recombinant plasmids were subsequently transformed into the competent E. coli Rosetta-gami B (DE3) expression host strain (Invitrogen, Carlsbad, CA), and positive clones were selected by PCR and restriction digestion.

Positive clones harboring pET32a-TaGW2-LK and pET32a-TaGW2-CS were cultured overnight at 37 °C in 5 mL LB medium containing 100 μg mL−1 ampicillin. One milliliter of each culture was subsequently transferred into 20 mL fresh LB medium and grown at 37 °C until the OD600 of the bacterial culture reached 0.6, and then 1 mM IPTG was added to induce expression of the recombinant protein. During further culture at 30 °C, samples were collected at five time points (0, 2, 4, 6, and 8 h) after induction. The cells were harvested by centrifugation at 10,000 rpm for 10 min at 4 °C, mixed with 1× SDS loading buffer, and boiled at 100 °C for 5 min. Total protein and target protein were detected by SDS-PAGE (Ylä-Herttuala et al. 1989). A standard molecular weight marker was used to determine the molecular weight of the target protein, and E. coli Rosetta-gami B (DE3) carrying an empty pET32a (+) vector was analyzed as a negative control.

Development of allele-specific PCR markers and application

To distinguish the mutant and wild-type alleles, six modified allele-specific PCR (AS-PCR) primers were designed based on the amplification refractory mutation system (ARMS) technique (Newton et al. 1989). To enhance the specificity of the primers, an additional mismatch was introduced by replacing one of the three bases closest to the 3′-end of each AS-PCR primer. Among the six primers, F4 and F6 were designed as upstream primers according to the mutation site in Lankaodali, with the mismatch at the second and third nucleotide from the 3′-end, respectively. Primer R4 was also designed from Lankaodali as a downstream primer, with the mismatch at the third nucleotide from the 3′-end. To determine the genotype at the SNP locus, primer F5 was designed according to the corresponding site in Chinese Spring, with the mismatch at the second nucleotide from the 3′-end. In addition, a pair of primers, F3 and R3, was designed from the sequence common to Lankaodali and Chinese Spring as an internal positive control to improve the reliability of allele-specific PCR; this pair generated a product containing the mutation site. Four primer sets were thus formed, each combining the common primers with one of the four allele-specific primers. Any one of the primer sets TaGW2-AS-P1 (F3/F4/R3), TaGW2-AS-P3 (F3/F6/R3) and TaGW2-AS-P4 (F3/R4/R3) can be combined with primer set TaGW2-AS-P2 (F3/F5/R3) to distinguish the different genotypes. All allele-specific primer sequences and other related information are listed in Table 1.
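The deliberate-mismatch step of ARMS primer design can be illustrated with a small helper that replaces one base counted from the 3′-end. This is a hedged sketch only: the 20-nt primer below is a made-up sequence, not one of the primers in Table 1, and the function name and the replacement bases are assumptions chosen for illustration.

```python
def add_arms_mismatch(primer, pos_from_3prime, new_base):
    """Replace the base at a position counted from the 3' end (1 = terminal base)."""
    if not 1 <= pos_from_3prime <= len(primer):
        raise ValueError("position outside primer")
    idx = len(primer) - pos_from_3prime
    if primer[idx] == new_base:
        raise ValueError("replacement base must differ from the original base")
    return primer[:idx] + new_base + primer[idx + 1:]


# Hypothetical allele-specific primer; its 3'-terminal base pairs with the variant site
perfect_match = "GCTGAAGCTTCAGGATCCAT"

second_from_end = add_arms_mismatch(perfect_match, 2, "C")  # as described for F4 and F5
third_from_end = add_arms_mismatch(perfect_match, 3, "G")   # as described for F6 and R4
print(second_from_end)
print(third_from_end)
```

The 3′-terminal base is left untouched because it carries the allele specificity; only the neighbouring base is destabilized.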

The standard PCR procedure described above was slightly modified for allele-specific PCR. Taking primer set TaGW2-AS-P1 as an example, 0.4 μL F3 primer (10 μM), 0.6 μL F4 primer (10 μM) and 1 μL R3 primer (10 μM) were added to the standard PCR reaction, in which the template DNA was increased from 100 to 200 ng and the number of amplification cycles was increased from 35 to 37, with other conditions unchanged. Furthermore, the annealing temperature of all four primer sets was optimized by gradient PCR.

Using the optimized PCR conditions, the four primer sets were first tested on Lankaodali and Chinese Spring to identify the primers that amplify polymorphic products between the two varieties. The selected primers were then used to amplify DNA from the F2 population and the other wheat varieties.

Results

Sequence analysis of TaGW2

Using the primer set TaGW2-P1, a 1,275 bp fragment was amplified from the cDNA of Chinese Spring. Three orthologous sequences were identified among 10 cDNA clones, corresponding to the A, B, and D genomes of bread wheat. A comparative search with one of the three sequences in the NCBI database showed that it shared 87.0 % nucleotide identity with OsGW2 in rice (GenBank Accession EF447275.1), 84.0 % with a sequence in maize (GenBank Accession FJ573211.1), 96.0 % with a sequence in barley (GenBank Accession EU333863.1), and 84.0 % with a sequence in sorghum (GenBank Accession XM_002453553.1).

All three TaGW2 coding sequences were then translated into the corresponding proteins of 424 amino acid residues. Protein function prediction showed that all three sequences contain a RING domain of 43 amino acids spanning residues 61 to 103. Amino acid sequence alignment revealed that the three protein sequences shared a completely identical RING domain, with only one amino acid difference between the wheat TaGW2 and rice OsGW2 proteins in this domain. Seven introns of TaGW2 were isolated from the genomic DNA of Chinese Spring, and the intron positions were found to be consistent with those of the rice OsGW2 gene. These results indicated that GW2 genes are highly conserved among different grass species.

SNP identification for TaGW2

Alignments of the TaGW2 coding sequences identified single-nucleotide variations between Chinese Spring and Lankaodali in all three sequences from the A, B and D genomes. Some single-nucleotide substitutions were found in two of the three sequences, but they did not cause any change in the amino acid sequence. However, comparison of the third sequence revealed an important variation between the two varieties: an insertion of nucleotide T at the 977-bp position in the eighth exon of the TaGW2 allele in Lankaodali. This single-base insertion shifted the open reading frame (ORF) and created a stop codon (TAG) at the 984-bp position (Fig. 1a, b). The premature termination resulted in a 96-aa truncation at the C-terminus of the predicted protein in Lankaodali (Fig. 1c). Although seven other single-nucleotide changes were detected and one of them caused a single amino acid change in the protein chain (Fig. 1a, c), only the 1-bp insertion is expected to disrupt the function of the TaGW2 protein, and it may therefore be responsible for the phenotypic variation. Thus, the sequence carrying the 1-bp insertion was selected as the major target for further investigation. A subsequent mapping assay located the TaGW2 mutant allele on chromosome 6A.
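How a single-base insertion shifts the reading frame and exposes a premature stop codon can be shown on a toy sequence. The sketch below uses the standard genetic code; the 18-nt ORF is invented for illustration and is not the TaGW2 sequence, and the helper names are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate from the first base until the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_base(cds, position, base):
    """Insert `base` so that it becomes nucleotide number `position` (1-based)."""
    return cds[:position - 1] + base + cds[position - 1:]


wild_type = "ATGGCAGCTAGCGGTTAA"         # toy ORF, not TaGW2
mutant = insert_base(wild_type, 5, "T")  # single T insertion shifts the frame

print(translate(wild_type))
print(translate(mutant))
```

In the toy example the insertion brings an out-of-frame TAG into frame, shortening the product; in the text the same kind of event removes 96 of the 424 predicted residues.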

Fig. 1

SNP identification of TaGW2-6A for Chinese Spring and Lankaodali and structure of the TaGW2-6A gene. a Eight SNP loci were detected by comparing the TaGW2-6A sequences from Chinese Spring (TaGW2-6A-CS) and Lankaodali (TaGW2-6A-LK). The T-base insertion site is indicated by an asterisk. b Organization of the TaGW2-6A gene, showing the positions of the coding regions (black boxes), the translation start codon (ATG) and the translation stop codon (TAG or TAA). The upper diagram represents TaGW2-6A of Lankaodali, in which the T-base insertion and the subsequent stop codon in the eighth exon are indicated. c Alignment of the predicted amino acid sequences of the two TaGW2-6A proteins. The RING domain is boxed, and altered amino acids at the SNP sites are also displayed

Expression analysis of TaGW2-6A

To verify the effect of the mutation in TaGW2-6A, prokaryotic expression experiments were performed to compare the products of the two TaGW2-6A alleles from the two varieties. During construction of the TaGW2-6A expression vectors, the TaGW2-6A sequences of the two varieties were first confirmed by sequencing, and the recombinant expression plasmids pET32a-TaGW2-LK and pET32a-TaGW2-CS were then used separately for the expression assay (Fig. 2a, b). TaGW2-6A was predicted to encode a 47.2 kDa protein in Chinese Spring and a 37.1 kDa protein in Lankaodali. SDS-PAGE analysis showed that the Trx-TaGW2-6A fusion protein was successfully expressed in E. coli Rosetta-gami B (DE3) when induced by IPTG. A 67.6 kDa Trx fusion protein, comprising the 20.4 kDa TrxA protein and the 47.2 kDa TaGW2-6A protein, was detected for pET32a-TaGW2-CS, whereas a smaller Trx fusion protein of about 57.5 kDa, comprising the 20.4 kDa TrxA protein and the 37.1 kDa truncated TaGW2-6A protein, was expressed from pET32a-TaGW2-LK (Fig. 2c). The expression result confirmed the prediction and further verified the presence of the T-base insertion in Lankaodali.
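The expected band sizes are simple sums of the fusion partner and the target protein. The sketch below reproduces that arithmetic and, as a rough cross-check, estimates protein mass from residue counts using an average of about 110 Da per residue; that average is a common rule of thumb assumed here for illustration, not a value from the text, and the function names are hypothetical.

```python
TRXA_TAG_KDA = 20.4  # mass of the TrxA fusion partner as stated in the text


def fusion_mass_kda(target_kda, tag_kda=TRXA_TAG_KDA):
    """Expected mass of the Trx fusion protein on SDS-PAGE."""
    return round(target_kda + tag_kda, 1)


def rough_mass_kda(n_residues, avg_residue_da=110.0):
    """Rule-of-thumb estimate; dedicated tools sum exact residue masses instead."""
    return n_residues * avg_residue_da / 1000


full_length_fusion = fusion_mass_kda(47.2)  # Chinese Spring allele
truncated_fusion = fusion_mass_kda(37.1)    # Lankaodali allele

print(full_length_fusion, truncated_fusion)
print(rough_mass_kda(424), rough_mass_kda(424 - 96))
```

The rule-of-thumb estimates land close to the predicted 47.2 and 37.1 kDa values, which is the level of agreement such an approximation can be expected to give.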

Fig. 2

Results of prokaryotic expression. a PCR amplification of the TaGW2-6A gene. Lanes 1 and 2 show the amplified products of TaGW2-6A-LK and TaGW2-6A-CS, respectively. b Identification of the recombinant vectors pET32a-TaGW2-LK and pET32a-TaGW2-CS by restriction enzyme digestion. c SDS-PAGE of fusion proteins expressed in E. coli Rosetta-gami B (DE3). Lanes 1–5 total protein of E. coli Rosetta-gami B (DE3) containing pET32a after induction with IPTG for 0, 2, 4, 6, and 8 h, respectively. Lanes 6–10 total protein of E. coli Rosetta-gami B (DE3) containing pET32a-TaGW2-LK after induction with IPTG for 0, 2, 4, 6, and 8 h, respectively. Lanes 11–15 total protein of E. coli Rosetta-gami B (DE3) containing pET32a-TaGW2-CS after induction with IPTG for 0, 2, 4, 6, and 8 h, respectively. Lane M protein molecular mass standards. The corresponding fusion proteins are indicated by arrows

Development of AS-PCR markers for TaGW2-6A

The 1-bp insertion identified at the 977th base pair of TaGW2-6A in Lankaodali was used to develop PCR-based markers to distinguish the different alleles of this gene. Four sets of primers (Table 1) were designed, including three sets for the mutated TaGW2-6A allele and one for the wild-type allele. Each set comprises a pair of common primers and a specific primer with an introduced mismatch in the vicinity of the mutation site (Fig. S1a). Three of the four primer sets, all except TaGW2-AS-P1 (F3/F4/R3), effectively differentiated the TaGW2-6A alleles of the two varieties. Primer sets TaGW2-AS-P3 (F3/F6/R3) and TaGW2-AS-P4 (F3/R4/R3) generated specific fragments of 318 bp and 260 bp, respectively, in Lankaodali, while primer set TaGW2-AS-P2 (F3/F5/R3) generated a specific 317 bp fragment in Chinese Spring. All three primer sets generated a common PCR fragment of 538 bp (Fig. S1b). An annealing temperature of 57 °C was found to be optimal for all three primer sets (Fig. S1b). Either of the two primer sets TaGW2-AS-P3 (F3/F6/R3) and TaGW2-AS-P4 (F3/R4/R3), combined with primer set TaGW2-AS-P2 (F3/F5/R3) in two complementary PCRs, can be used to differentiate the wild-type genotype (tt), the heterozygous genotype (Tt), and the homozygous mutant genotype (TT) at the TaGW2-6A locus.
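The decision logic for combining the two complementary PCRs into a genotype call can be written down explicitly. The sketch below is one plausible encoding, assuming that absence of the common 538 bp control band marks a failed reaction; the function name and the "failed PCR" and "ambiguous" labels are illustrative assumptions.

```python
def call_genotype(mutant_band, wildtype_band, control_band=True):
    """Combine two complementary AS-PCRs into a genotype call.

    mutant_band:   allele-specific band seen with TaGW2-AS-P3 or TaGW2-AS-P4
    wildtype_band: allele-specific band seen with TaGW2-AS-P2
    control_band:  common 538 bp band from F3/R3 (internal positive control)
    """
    if not control_band:
        return "failed PCR"
    if mutant_band and wildtype_band:
        return "Tt"
    if mutant_band:
        return "TT"
    if wildtype_band:
        return "tt"
    return "ambiguous"


print(call_genotype(mutant_band=True, wildtype_band=False))   # Lankaodali-type pattern
print(call_genotype(mutant_band=False, wildtype_band=True))   # Chinese Spring-type pattern
print(call_genotype(mutant_band=True, wildtype_band=True))    # both allele-specific bands
```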

Chromosomal locations of the TaGW2 gene with the insertion mutation site

As described above, primer set TaGW2-AS-P2 (F3/F5/R3) was designed for Chinese Spring according to the TaGW2 sequence from the same chromosome as the mutated TaGW2 gene for Lankaodali. Thus, it was used for chromosomal localization using Chinese Spring group 6 nulli–tetrasomic families. The target PCR product was detected in N6B-T6D and N6D-T6B, but not in N6A-T6D (Fig. S2), indicating that the mutant TaGW2 allele was located on chromosome 6A of bread wheat.
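The inference from nulli–tetrasomic lines is an elimination argument: the product disappears only in the line that lacks the chromosome carrying the target. A minimal sketch of that reasoning, using the band pattern reported in the text (the function name is hypothetical):

```python
def locate_chromosome(band_present_by_line):
    """Return the chromosomes whose nullisomic line lacks the PCR product."""
    missing = []
    for line, band_present in band_present_by_line.items():
        nullisomic = line.split("-")[0]     # e.g. "N6A" from "N6A-T6D"
        if not band_present:
            missing.append(nullisomic[1:])  # strip the leading "N" to get "6A"
    return missing


# Band pattern as described in the text (Fig. S2)
observed = {"N6A-T6D": False, "N6B-T6D": True, "N6D-T6B": True}
located = locate_chromosome(observed)
print(located)
```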

The relationship between the mutant TaGW2 allele and kernel traits

To explore whether the insertion mutation leads to the large-kernel phenotype of Lankaodali, an F2 population of 327 individuals derived from the cross between Lankaodali and Chinese Spring was genotyped for the insertion mutation site using the two primer sets TaGW2-AS-P4 (F3/R4/R3) and TaGW2-AS-P2 (F3/F5/R3) (Fig. S3). Three genotypes were detected, with 75 F2 individuals being TT, 184 Tt and 68 tt. The ratio of the three genotypes did not deviate significantly from the 1:2:1 ratio expected for single-gene segregation in an F2 population (χ² = 5.44, P > 0.05). Accordingly, all three groups of F2 individuals (TT, Tt and tt) were advanced into F2:3 families (Table S2).
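The goodness-of-fit statistic can be recomputed from the genotype counts given above. The sketch below uses only the standard library; for exactly two degrees of freedom the chi-square upper-tail probability has the closed form exp(−χ²/2), which is what the helper relies on. Function names are illustrative.

```python
import math


def chi_square_gof(observed, ratio):
    """Chi-square goodness-of-fit statistic against an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


def p_value_df2(stat):
    """Upper-tail probability of a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)


# TT, Tt, tt counts from the F2 population, tested against 1:2:1
stat, expected = chi_square_gof([75, 184, 68], [1, 2, 1])
p_value = p_value_df2(stat)
print(f"expected = {expected}")
print(f"chi2 = {stat:.2f}, P = {p_value:.3f}")
```

The recomputed statistic matches the reported 5.44, and the P value lies above 0.05 but not far from it, so the fit to 1:2:1 is acceptable rather than close.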

Kernel traits of the three groups tested in three environments were analyzed to determine the relationship between genotype and kernel phenotype (Table 2). Significant differences (P < 0.001) in KW, TKW, and KL between tt and the other two genotypes (TT and Tt) were observed across all three environments, indicating that the single-base-pair insertion is associated not only with KW and TKW but also with KL, although the absolute difference in KL was smaller. The difference between the TT and Tt genotypes was significant (P < 0.05) for KW in Yangling and in the combined analysis, but not for TKW or KL, suggesting that an allele dosage effect was detectable only for KW. Compared with the tt genotype, the average increase in KW was 0.13 mm for the Tt genotype and 0.18 mm for the TT genotype. Similarly, the average increase in TKW was 2.92 g (Tt) and 3.94 g (TT).
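The reported increases over the tt class can be turned into additive and dominance effects using the usual quantitative-genetics convention (a is half the difference between homozygotes, d is the deviation of the heterozygote from their midpoint). This is an illustrative calculation under that convention, not an analysis performed in the study; the function name is an assumption.

```python
def additive_and_dominance(increase_het, increase_hom):
    """Effects relative to the tt class.

    a = half the TT minus tt difference; d = deviation of Tt from the homozygote midpoint.
    """
    a = increase_hom / 2
    d = increase_het - a
    return a, d, d / a


# Average increases over tt reported in the text: (Tt, TT)
traits = [("KW (mm)", 0.13, 0.18), ("TKW (g)", 2.92, 3.94)]

for trait, het, hom in traits:
    a, d, ratio = additive_and_dominance(het, hom)
    print(f"{trait}: a = {a:.3f}, d = {d:.3f}, d/a = {ratio:.2f}")
```

A d/a value between 0 and 1 corresponds to partial dominance of the mutant allele, with 0 meaning purely additive action and 1 meaning complete dominance.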

Table 2 Kernel trait analyses of the 327 F2:3 families with three different genotypes and the two parental cultivars

The frequency distributions of KW, KL and TKW in the 327 F2:3 families are illustrated in Fig. 3. The KW of the 68 F2:3 families with the tt genotype ranged from 2.78 to 3.29 mm, with an average of 3.10 ± 0.02 mm. Taking 3.20 mm as the demarcation line, 78 % of these 68 families had a KW below 3.20 mm. The KW of the 75 F2:3 families with the TT genotype ranged from 3.03 to 3.62 mm, with an average of 3.28 ± 0.01 mm, and 72 % of these families had a KW above 3.20 mm. The KW of the 184 F2:3 families derived from heterozygous F2 plants ranged from 2.92 to 3.53 mm, with an average of 3.23 ± 0.09 mm, and 65 % of them had a KW above 3.20 mm (Fig. 3a). Moreover, the KW and TKW of several families with TT and Tt genotypes exceeded those of Lankaodali, indicating that there is an opportunity to select high-yielding wheat varieties from genotypes carrying the mutant TaGW2-6A allele. The TKW distribution was similar to that of KW, whereas KL showed a more or less normal distribution (Fig. 3b, c). These results demonstrated that the mutation site of TaGW2-6A was highly associated with kernel development, especially KW. Its mechanism of action may be similar to that of OsGW2 in rice.

Fig. 3

Frequency distribution of kernel width (a), kernel length (b) and 1,000-kernel weight (c) in the F2:3 families. The mean values of three traits were used to generate the graphs

TaGW2-6A mutation in other wheat germplasm

To evaluate whether the mutation found in Lankaodali also exists in other wheat germplasm, the AS-PCR markers were used to screen 22 additional varieties with various kernel sizes. Besides the primer sets TaGW2-AS-P4 and TaGW2-AS-P2, we also used TaGW2-AS-P3 to verify the results (Fig. S4). The Lankaodali PCR banding pattern (TT) was found only in Sichuandali and Wanmai 38. Sichuandali has kernel traits similar to those of Lankaodali, but it has a wider KW (3.79 ± 0.02 mm), longer KL (8.43 ± 0.10 mm), and greater TKW (65.15 ± 0.15 g) than the other varieties (Fig. 4). Wanmai 38 also showed a wide KW (3.48 ± 0.005 mm), but its KL (6.91 ± 0.005 mm) was the shortest among the three varieties with the TT genotype (Fig. 4). To further confirm the PCR results, these two varieties were also sequenced. TaGW2-P3 was used to amplify the eighth exon, and the sequence alignment showed that Lankaodali, Sichuandali, and Wanmai 38 shared the identical insertion mutation site (Fig. S5), consistent with the PCR result. On the other hand, Mingxian 169, with a very small KW (2.80 ± 0.03 mm) and TKW (27.25 ± 0.75 g), had the same PCR bands as Chinese Spring (Fig. 4), which was also confirmed by sequencing (Fig. S5). Kernels of the other varieties with the tt genotype were all smaller than those of Lankaodali and Sichuandali (Table S3). Taken together, these results indicated that the insertion mutation in TaGW2 is closely associated with large kernel size, and that the AS-PCR primers can be used effectively in marker-assisted breeding programs to identify large-kernel individuals.

Fig. 4

Kernels of five varieties used in this study. SC Sichuandali, LK Lankaodali, WM Wanmai 38, CS Chinese Spring, MX Mingxian 169

Discussion

The isolation of the rice OsGW2 gene from chromosome 2 provided the opportunity to identify its homologous genes in bread wheat. Using a comparative genomics approach, both Su et al. (2011) and this study (GenBank Accession HQ404374.1) isolated the full-length cDNA sequences of TaGW2 from wheat. In rice, a 1-bp deletion in the fourth exon of OsGW2 was found in a large-kernel variety and was associated with large kernel size. In wheat, a total of three orthologous sequences were identified, one for each of the three wheat genomes. By comparing the three orthologous sequences of TaGW2 between small- and large-kernel varieties, Su et al. (2011) did not find any variation in the coding regions. Instead, a base substitution in the promoter region of TaGW2-6A was found to be associated with kernel development (Su et al. 2011). In this study, no functional variation was detected in the coding sequences from chromosomes 6B and 6D either. However, further analysis of the sequence from 6A identified a 1-bp insertion in the eighth exon of TaGW2 in a large-kernel variety, Lankaodali. This insertion results in a truncated and presumably non-functional protein, similar to the effect of the deletion in rice OsGW2, suggesting that TaGW2-6A has a function similar to that of OsGW2 in rice. This finding differs from that of Su et al. (2011), and the discrepancy may be due to the different materials used in the two studies.

Lankaodali carries natural variation in the TaGW2-6A coding region and is a unique material for studying the function of TaGW2 in wheat. The genetic analysis showed that most F2:3 families carrying the homozygous mutant TaGW2 allele had larger kernels than families carrying the homozygous wild-type allele, while the heterozygous families showed an intermediate type, with most families inclining toward the large-kernel parent (Fig. 3). This result indicates that the insertion of the T nucleotide is associated with increased kernel size and that the mutant allele is dominant or partially dominant. Comparison of the deduced amino acid sequences encoded by TaGW2-6A and rice OsGW2 identified one amino acid difference in the functional domain. The prokaryotic expression confirmed that the mutated TaGW2-6A gene encodes a truncated protein. In rice, loss of OsGW2 function increases cell numbers and the grain milk filling rate, resulting in a larger spikelet hull and enhanced grain width, weight and yield (Song et al. 2007). We therefore conclude that TaGW2 negatively regulates kernel width and weight, consistent with the role of rice OsGW2, and that the two genes affect kernel width and weight through a similar mechanism. However, Su et al. (2011) reported that TaGW2 regulates kernel development through differences in TaGW2 expression levels between varieties with contrasting kernel sizes, rather than through a direct change in the protein itself.

In this study, we located the mutated TaGW2 from Lankaodali on wheat chromosome 6A, which agrees with Su et al. (2011) and with several other studies that reported stable QTLs for kernel width and yield on chromosome 6A (Sun et al. 2009; Snape et al. 2007). Su et al. (2011) further located TaGW2 to a region close to the centromere on the short arm of the homoeologous group 6 chromosomes, which agrees with the positions reported for kernel width and yield QTLs (Sun et al. 2009; Snape et al. 2007). Further research is required to determine whether TaGW2 is the key gene underlying these QTLs. More interestingly, wheat group 6 shows almost complete colinearity with rice chromosome 2 (Distelfeld et al. 2004; Sorrells et al. 2003; Gale and Devos 1998). In another report, a rice QTL for kernel weight corresponding to a KW QTL was found in the syntenic region of wheat chromosome 6A by comparative sequence analysis, but the OsGW2 locus on rice chromosome 2 did not correspond to any of the identified wheat QTLs (Gegas et al. 2010). That is not unexpected, because the functional variation is due to a 1-bp deletion in rice OsGW2 but to a 1-bp insertion in wheat TaGW2-6A.

In rice, two other QTLs for KW, GW5/qSW5 (Weng et al. 2008; Shomura et al. 2008) and GS5 (Li et al. 2011), have also been isolated. GW5/qSW5 was mapped to rice chromosome 5, and a 1,212-bp deletion was highly correlated with the KW phenotype. Unlike previously reported genes affecting kernel size in rice, GS5 functions as a positive regulator of kernel size: higher expression is correlated with larger kernels. The isolation of these KW genes in rice suggests that wheat may also have more than one gene controlling KW, and the QTLs for kernel width detected on several different wheat chromosomes support this view. Thus, the overlap in kernel size between the TT and tt genotypes in this study may be due to other modifier QTLs or to environmental effects on gene expression.

As an evolutionarily and agriculturally important trait, large kernel size has consistently been a target of selection during domestication and crop improvement, so genes underlying its variation are likely to have been preferentially selected over the long-term process of domestication. In rice, 38 of 180 varieties screened were found to carry the C-A mutation in the GS3 gene, and GS3 was suggested to be involved in the domestication of kernel size (Fan et al. 2006; Takano-kai et al. 2009). In this study, screening 22 varieties with various kernel sizes identified two varieties besides Lankaodali that carry the 1-bp insertion (Fig. S4, S5). Both varieties carrying the mutated allele (TT) are of the large-kernel type, like Lankaodali, and they were developed in different geographic areas of China from completely different pedigrees, so they are unrelated to each other (Table S4). More large-kernel varieties with the same mutation can be expected if more varieties are screened. These results indicate that the insertion mutation in TaGW2 has been utilized in modern wheat breeding, but that its frequency is still low. The mutation may therefore be a variation recently introduced into the wheat gene pool, like the C-A mutation in rice GS3.

It is difficult for breeders to improve quantitative traits such as kernel length and width effectively using conventional selection methods (Wang et al. 2011a). Molecular markers make selection for these traits feasible; thus, developing markers for them is essential. Several methods are available for developing markers for SNP genotyping (Kwok 2001; Jenkins and Gibson 2002; Semagn et al. 2006). However, most of these methods cannot be used in routine breeding programs because of their technical complexity and high cost, and two have been widely used in breeding: cleaved amplified polymorphic sequences (CAPS) and allele-specific PCR (AS-PCR) (Hayden et al. 2009; Zhao et al. 2007; Jeong and Maroof 2004). Both methods have proven reliable (Wang et al. 2011b). Of the two, AS-PCR has the advantages of being simpler, faster and less expensive (Hayashi et al. 2004). In this study, no suitable restriction site was found in the vicinity of the T-base insertion site, so the CAPS method could not be applied. As an alternative, we developed several AS-PCR primer sets to discriminate the mutant TaGW2 allele from the wild-type allele. The AS-PCR results showed clear polymorphism among F2 progenies and among different varieties, and sequence analysis further verified the AS-PCR genotyping results. These AS-PCR markers can be used directly to assist selection for kernel weight improvement in wheat breeding. In conclusion, we demonstrated that the insertion mutation in TaGW2 is highly associated with large kernel size, and that the AS-PCR markers developed in this study can be used in marker-assisted breeding in bread wheat.
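The allele-discrimination logic behind AS-PCR genotyping can be sketched as follows. This is a simplified illustration under stated assumptions, not the primer design used in this study: the template and primer sequences are invented, the function names are hypothetical, and an exact string match stands in for successful primer annealing and 3'-end extension, which in practice also depend on reaction conditions.

```python
# Hypothetical toy templates (NOT real TaGW2 sequence).
WILD_TEMPLATE = "GATTACAGGCCAGTCCGA"
# Mutant template: a single T inserted after position 10 of the wild-type template.
MUTANT_TEMPLATE = WILD_TEMPLATE[:10] + "T" + WILD_TEMPLATE[10:]

# Allele-specific primers share their 5' portion and differ only at the 3' end,
# which sits on the polymorphic site (assumed design, for illustration only).
WT_PRIMER = "TACAGGCC"   # 3' end matches the wild-type base
MUT_PRIMER = "TACAGGCT"  # 3' end matches the inserted T


def amplifies(primer, templates):
    """Crude stand-in for PCR: a product forms if the primer matches any template exactly."""
    return any(primer in template for template in templates)


def call_genotype(templates):
    """Call TT (homozygous mutant), tt (homozygous wild type) or Tt from two reactions."""
    mutant_band = amplifies(MUT_PRIMER, templates)
    wild_band = amplifies(WT_PRIMER, templates)
    if mutant_band and wild_band:
        return "Tt"
    if mutant_band:
        return "TT"
    if wild_band:
        return "tt"
    return "no call"


print(call_genotype([MUTANT_TEMPLATE]))                 # homozygous mutant plant
print(call_genotype([WILD_TEMPLATE]))                   # homozygous wild-type plant
print(call_genotype([WILD_TEMPLATE, MUTANT_TEMPLATE]))  # heterozygous plant
```

Running the mutant-specific and wild-type-specific reactions separately is what allows heterozygotes to be distinguished from either homozygote in this sketch; a single allele-specific reaction would behave as a dominant marker.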