Introduction

As the most widely cultivated crop with the highest trading value, common wheat (Triticum aestivum, 2n = 6x, genomes AABBDD) provides ~20% of our daily calorie and protein supply, thus playing a critical role in global food security and the rural economy. Although wheat production has reached a record of 736 million tons (FAO Stat 2015), demand for wheat grain continues to increase due to the ever-growing world population. At the same time, the rate of wheat genetic yield gain has slowed down, and it is further impeded by the negative impact of climate change. Against this backdrop, the International Wheat Yield Partnership (IWYP) was established with the overarching goal of increasing wheat yields by 50% by 2034. To meet this goal, annual wheat yield increases must be raised from the current level of less than 1% to at least 1.7% (http://iwyp.org/). This quantum leap in genetic gain of grain yield requires identifying the genes and gene networks underlying yield and yield components. Grain yield is a product of grain number (GN) per unit area of land and grain size (GS), which is positively correlated with grain weight (GW). Increasing GN was extensively and intensively explored in the past 100 years of wheat breeding (Fischer 2008); this approach has nearly reached its upper limit and leaves little room for further yield increase because of the GN–GS trade-off (Griffiths et al. 2015). The recent surge of studies on wheat GW or GS further corroborates the importance of this trait for future yield improvement. In this review, we summarize progress of GS/GW studies in the model plants Arabidopsis (Arabidopsis thaliana (L.) Heynh) and particularly rice (Oryza sativa L.), and discuss the GS/GW research in wheat. 
Our recent work on identifying potential GS candidate genes and their organization in the genomes of wheat and barley (Hordeum vulgare L., 2n = 2x, genome HH), as well as the prospect of utilizing GS homologs in improving wheat and barley yield potential with reverse genetics approaches, will also be discussed.
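The relationship between the annual rates and the 50% target quoted above can be checked with simple compound-growth arithmetic. The sketch below assumes that yield gains compound annually at a constant rate; the function names are illustrative and the starting year of the target period is not specified here. Under that assumption, a 1.7% annual gain reaches a 50% increase in roughly 24 years, whereas a 1% annual gain would need roughly 41 years.

```python
import math


def years_to_reach(total_gain, annual_rate):
    """Years of constant compound growth needed for a given total fractional gain."""
    return math.log(1.0 + total_gain) / math.log(1.0 + annual_rate)


def cumulative_gain(annual_rate, years):
    """Total fractional gain after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years - 1.0


# Assumption: constant annual rates, compounded once per year.
years_at_target_rate = years_to_reach(0.50, 0.017)   # rate called for in the text
years_at_current_rate = years_to_reach(0.50, 0.010)  # upper bound of the current rate

print(f"1.7% per year: {years_at_target_rate:.1f} years to +50%")
print(f"1.0% per year: {years_at_current_rate:.1f} years to +50%")
```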

Grain size determination in model plants

The last decade witnessed a blossoming of genetic research on grain (seed) size in the model plants rice and Arabidopsis. More than 40 genes controlling GS were identified, mainly by use of forward genetics approaches, i.e. map-based cloning of quantitative trait loci (QTL) and screening of T-DNA tagging libraries. These genes are mainly associated with three genetic pathways: proteasomal degradation, G-protein signaling and phytohormone signaling (Reviewed by Li and Li 2015; Orozco-Arroyo et al. 2015; Zuo and Li 2014). These pathways are generally conserved between rice and Arabidopsis despite a few differences in functional mode (Fig. 1). In addition to these three pathways, newly identified genes, such as FER in Arabidopsis (Yu et al. 2014) and APG (Heang and Sassa 2012), GIF1 (Wang et al. 2008), HGW (Li et al. 2012a), IPA1 (Jiao et al. 2010; Miura et al. 2010), GS2 (Hu et al. 2015), GS5 (Li et al. 2011b), GLW7 (Si et al. 2016), GW5 (Duan et al. 2017; Weng et al. 2008), GW7 (Wang et al. 2015a), GW8 (Wang et al. 2012) and SRS5 (Segami et al. 2012) in rice, function in unknown pathways, which may eventually be integrated into the three major GS pathways. For example, recent studies indicated that GS2 (Che et al. 2016) and GW5 (Liu et al. 2017) are involved in brassinosteroid signaling. Of these GS genes, IPA1 (Jiao et al. 2010), GLW7 (Si et al. 2016), GS2 (Hu et al. 2015) and GW8 (Wang et al. 2012) are positive GS regulators, but their expression is under negative regulation by microRNA species, miR396 for the GS2 and miR156 for the remaining three genes (Supplementary Table 1). At the cellular level, increase of the grain size could be attributed to an increase in cell number, such as GS5 (Li et al. 2011b); to an expansion of cell size, such as FER (Yu et al. 2014) and GLW7 (Si et al. 2016); or to both, such as GS2 (Hu et al. 2015).

Fig. 1

Major GS regulatory genes and genetic pathways in the model plants rice and Arabidopsis. The components in blue were from Arabidopsis, those in red from rice, and those in black from both rice and Arabidopsis. APG, GS2, GS5, FER, GIF1, GLW7, GW7, GW8, HGW, IPA1 and SRS5 function in unknown pathways. References to individual genes can be found in Supplementary Table 1. Modified from Zuo and Li (2014) with permission

Identification of GS regulators in wheat: a candidate gene approach

GS was a subject of selection during the domestication of wheat and modern wheat breeding, and significant variation has been observed among the morphological subspecies and cultivars; this variation is controlled by major genes and a large number of QTL (Gegas et al. 2010). Compared to rice, wheat has a large, polyploid genome, which impedes map-based cloning of QTL. The availability of draft genome sequences of wheat, however, has led to the development of a powerful approach to identifying GS genes by combining genome-wide QTL mapping and candidate gene analysis. More recent studies focus on validating the wheat homologs of rice GS regulators by QTL mapping and association analysis. Down-regulation mutations in the wheat homologs of rice GW2, TaGW2-6A and TaGW2-6B (Jaiswal et al. 2015; Qin et al. 2014; Su et al. 2011), and TaGW2 RNAi in transgenic wheat (Hong et al. 2014), increased GS and GW. Mutations in TaGW2-6A and TaGW2-6B are additive in increasing GS (Qin et al. 2014). Linkage mapping also showed that TaGW2-6A coincides with a GS QTL (Simmonds et al. 2014). An 18-bp deletion in intron 2 of TaCKX6-D1 (Gn1-3.4) (Zhang et al. 2012) and a novel allele of TaCKX6a02 (Gn1-3.5) (Lu et al. 2015), wheat homologs of rice Gn1 (OsCKX2), were tightly associated with thousand-grain weight (TGW), suggesting that the Gn1 ortholog is involved in the regulation of both GN and GS in wheat. A G>T substitution in the coding sequence (cds) of TaGS5-3A, the wheat homolog of rice GS5, was significantly correlated with larger GS and greater TGW (Ma et al. 2015; Wang et al. 2015b). Variations in wheat homologs of TGW6 on chromosome arms 3AL (Hanif et al. 2016) and 4AL (Hu et al. 2016) individually explained ~17% of TGW variation, and the low-expression alleles are associated with low auxin content and high TGW (Hu et al. 2016). A recent study showed that a wheat homolog of rice GS3 on chromosome arm 7DS is associated with grain weight and grain length (Zhang et al. 2014).

Genome organization of GS candidate genes in wheat

Considering the parallels between the GS pathways in the dicot Arabidopsis and monocot rice (Fig. 1) and the association of variation of the GS regulator homologs with GS phenotypes in wheat, we hypothesize that the genetic pathways underlying GS control are conserved in plants, particularly rice and wheat after 40 million years of coevolution, and these GS homologs are important genomic resources for improving wheat yield potential. Based on the similarity to protein sequences and expression patterns of GS regulators of the model plants, mainly rice, we identified wheat GS and GN candidate genes (Supplementary Table 1). Sequences of 30 rice and two Arabidopsis proteins encoded by the GS and GN genes/QTL were used as queries for searching the proteomes of the diploid ancestors of wheat, i.e. Aegilops tauschii Coss. (2n = 2x, genome DD) (Jia et al. 2013) and T. urartu Tumanian ex Gnadilyan (2n = 2x, genome AA) (Ling et al. 2013), in the NCBI nr database. A total of 65 wheat proteins showing >60% similarity over 50% of their length were selected. Their cds were used as queries to search the wheat gene expression database WheatExp (http://wheat.pw.usda.gov/WheatExp) for chromosome arm location and tissue specificity. Forty-five orthologous genes showed expression patterns similar to their rice homologs, mainly in grain and/or spike, and were identified as wheat GS and GN candidate genes. An ortholog of rice GS3 is present in wheat and codes for a short protein that contains only the plant-specific organ size regulation (OSR) domain, which is both necessary and sufficient for functioning as a negative regulator (Mao et al. 2010), suggesting that the wheat GS3 homolog is functional. Wheat homologs were found for DST and PGL1, but the former is predominantly expressed in the leaf and stem and the latter in the root. Multiple gene members were found for the Gn1 (CKX2) and TGW6 families. 
Twenty-three wheat GS/GN candidate genes are homologous to the rice negative regulators APG, DEP1, FUWA, GL3, Gn1 (CKX2), GW2, GW5, GW7 and TGW6, and three candidate genes are homologous to Arabidopsis negative regulators EOD1 and FER. Notably, wheat homologs of GS2, GLW7, GW8 and IPA1 contain the conserved miRNA recognition sites.
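The sequence-based selection step described above (>60% similarity over 50% of the protein length) amounts to a two-threshold filter on homology search hits. The following is a minimal sketch of such a filter, not the procedure actually used: the record fields, the protein identifiers and the choice to measure coverage against the subject (wheat) protein length are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str         # rice or Arabidopsis GS/GN protein used as the query
    subject: str       # candidate protein from a wheat progenitor (hypothetical ID)
    similarity: float  # percent similarity over the aligned region
    aligned_len: int   # number of aligned residues
    subject_len: int   # full length of the subject protein


def passes(hit, min_similarity=60.0, min_coverage=0.5):
    """Keep hits with >60% similarity that cover at least half of the subject protein."""
    coverage = hit.aligned_len / hit.subject_len
    return hit.similarity > min_similarity and coverage >= min_coverage


# Toy records; the values are invented to exercise both thresholds.
hits = [
    Hit("GW2", "prot_A", 85.0, 400, 420),   # passes both thresholds
    Hit("GS3", "prot_B", 72.0, 90, 300),    # similar, but covers only 30% of the protein
    Hit("TGW6", "prot_C", 55.0, 330, 340),  # good coverage, similarity too low
]

selected = [h.subject for h in hits if passes(h)]
print(selected)
```

Candidates that pass such a filter would still need the expression-based check described in the text before being treated as GS/GN candidate genes.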

To gain insight into their genome organization, we placed the GS homologs on the high density genetic map of A. tauschii (Luo et al. 2013), the D genome progenitor of common wheat, by aligning them with its genomic (Jia et al. 2013) or BAC scaffolds and the extended sequences of the genetic markers. We anchored 33 GS homologs on the mapped marker sequences, and located 12 GS homologs to chromosome arms or regions based on their matches with chromosome arm survey sequences of the common wheat cultivar Chinese Spring (The International Wheat Genome Sequencing Consortium 2014) (Fig. 2). HGW, PGL2 and TGW6-7.3 were placed in the proximal region of 7DS based on their positions relative to TGW6-7.1 and TGW6-7.2 on chromosome 7H of barley. Similarly, SRS5-1, LP, D61, FER4, EOD1 and GW8 were placed in the most likely intervals on chromosome arms 1DL, 3DL, 5DL and 6DL (Fig. 2). The position of GW2 on chromosome arm 6DS is based on its 6AS homoeolog (Simmonds et al. 2014; Su et al. 2011). Of the 33 anchored GS homologs, 26 are located in the proximal regions surrounding centromeres, where recombination is largely suppressed. Another feature of the GS homologs in the wheat genome is their clustered distribution. Related to this is the amplification of the Gn1 (CKX2) and TGW6 families (Fig. 2). All these features pose a challenge for map-based candidate gene analysis of GS in the wheat genome, particularly in the proximal regions. Mapping of GS regulators in the barley genome showed similar chromosome arm locations except for GLW7 and GW7, which are located on the short arm of the group-2 chromosomes in wheat but on the long arm of chromosome 2H in barley (Supplementary Table 1). Homologs of FUWA are associated with the centromeric retrotransposon Cereba (Li et al. 2004) and are located on the short arm of chromosomes 6A and 6D and on the long arm of chromosome 6B in wheat and on the long arm of chromosome 6H in barley (Supplementary Table 1), suggesting a location in the centromeric region of the group-6 chromosomes (Fig. 2). Other discrepancies of homoeologous chromosome arm locations were caused by the fixed rearrangements among chromosome arms 4AL, 5AL and 7BS (Naranjo et al. 1987).

Fig. 2

Genome organization of GS candidate genes in wheat. The candidate GS genes are placed on the high density linkage map of the seven D-genome chromosomes of A. tauschii. The chromosome designation is indicated at top, and the centromere positions are indicated by the white dots. A scale bar of 10 cM is indicated in the upper right corner. Positions of the GS gene loci that matched the D-genome marker sequences are indicated by horizontal bars, and GS gene loci with no matches in the D-genome marker sequences are placed in the most likely intervals based on their positions relative to other GS loci on homoeologous chromosomes of barley. Positive regulators are in black, negative regulators are in red, and microRNA-regulated positive regulators are in green. The functions in GS regulation of the loci indicated by asterisks are validated by association analysis (Supplementary Table 1)

Variations in gene amplification among the A, B and D genomes of common wheat and differences in gene expression among homoeologs of an orthologous locus are observed. For example, Gn1-3.4 and TGW6-3.2 are paralogous duplications in the D genome, but Gn1-3.2 was deleted in the A genome (Supplementary Table 1). Of the 121 GS homoeologous loci in common wheat, eight loci were not expressed in any tissue of Chinese Spring (Supplementary Table 1). Interestingly, expression of TaCKX6-D1 (Gn1-3.4) was detected by quantitative RT-PCR (Zhang et al. 2012) but not by RNA-seq (Supplementary Table 1), which raises questions about the reliability of the association analyses using GS homologs, particularly for those located in proximal regions with suppressed recombination.

Reverse genetic approaches for increasing yield potential

Identifying the wheat homologs of the GS regulators will not only facilitate candidate gene analysis, but also open the door to improving wheat yield using reverse genetics approaches that complement forward genetics approaches, such as QTL cloning. Wheat is one of the earliest domesticates, and landraces, as members of the primary gene pool, are invaluable genetic resources for wheat improvement. During the long history of cultivation, beneficial alleles of the GS homologs may have accumulated in wheat landraces. In a pilot experiment, we sequenced TGW6-7.1 from 4AL of six accessions representing six subspecies of tetraploid wheat T. turgidum L. (2n = 4x, genomes AABB) and identified one nonsense mutation and two missense mutations in the portion coding for the strictosidine synthase domain (Li et al. unpublished data). These mutations can be transferred into elite common wheat cultivars by marker-assisted backcrossing, and the resultant near isogenic lines can be used to evaluate the effect of the mutations on GS and yield and also serve as beneficial germplasm for breeding high-yielding varieties. At the same time, allele mining of the GS homologs will also facilitate the development of functional markers for breeding wheat varieties with increased grain yield. In this respect, sequence capture can be a cost-effective approach to discovering natural variations in the GS/GN candidate genes and their regulatory elements.

Targeting Induced Local Lesions in Genomes (TILLING), a powerful reverse genetics tool that combines chemical mutagenesis and molecular detection (McCallum et al. 2000), was successfully applied in diploid wheat (T. monococcum L., 2n = 2x, genome AmAm) (Rawat et al. 2012) and polyploid wheat (Krasileva et al. 2017; Slade et al. 2005; Uauy et al. 2009). Screening a TILLING population of the tetraploid wheat Kronos identified a mis-splicing mutation in TaGW2-A1 on 6AS, which led to increases in grain weight (6.6%), width (2.8%) and length (2.1%) in tetraploid and hexaploid wheat across 13 experiments (Simmonds et al. 2016). Compared to diploid wheat (Rawat et al. 2012), the mutation rate is significantly higher in polyploid wheat owing to genetic buffering (Slade et al. 2005; Uauy et al. 2009). For the best phenotypic effect, mutations may be developed for all three homoeologs of one orthologous gene individually and then combined into one genotype. More than 10 million mutations in protein-coding regions have been cataloged in 2735 mutant lines of the durum wheat Kronos and the common wheat Cadenza by exome capture sequencing (Krasileva et al. 2017). Searches of the wheat TILLING database (http://dubcovskylab.ucdavis.edu/wheat-tilling/sample-search or http://www.wheat-tilling.com/ ) revealed mutations for most GS/GN candidate genes, except for 24 gene loci (15 negative and nine positive regulators) in Cadenza and 12 gene loci (eight negative and four positive regulators) in Kronos without an identified mutation (Supplementary Table 2). The most striking case was the TGW6 gene family, where 13 of 16 loci in Cadenza and five of 10 loci in Kronos showed no mutations. This is probably due to deletion of these loci in Cadenza and/or Kronos. Deletion of TGW6-7.1 is also detected on chromosome arm 4AL of the common wheat cultivar Fielder (Li et al. unpublished data). Compared to the landrace Chinese Spring, Cadenza, Fielder and Kronos are improved wheat varieties, implying that the TGW6 genes have likely been selected against during modern wheat breeding for bigger grains. Once updated to match the current version of wheat gene models (http://plants.ensembl.org/index.html) and improved in annotation, the wheat TILLING database can be more conveniently searched to identify GS candidate gene mutations. Due to the huge number of mutations present in each mutant line (Krasileva et al. 2017), a considerable amount of time will be needed to separate a specific mutation, purify the genetic background and achieve homozygosity of recessive mutations of the gene of interest.

For traditional mutagenesis, mutations occur randomly; the precise positions of mutations within a gene developed by TILLING are largely unpredictable. In contrast, genome editing technologies using engineered nucleases, such as zinc finger nucleases (ZFN) (Carroll 2011), TAL effector nucleases (TALEN) (Cermak et al. 2011; Li et al. 2011a; Mahfouz et al. 2011) and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (Cas) systems (Cong et al. 2013; Mali et al. 2013), target specific DNA sequences for precise mutagenesis and generate mutations in an isogenic background. These systems operate in a similar fashion to create a double-strand break (DSB) at a preselected and defined genomic site. The DSB is subsequently repaired by non-homologous end joining (NHEJ) or homologous recombination (HR) (Symington and Gautier 2011). Although HR is error-free and leads to gene correction or replacement, NHEJ is error-prone and causes insertions, deletions and other mutations at the cleavage locus. When a negative regulator is targeted, the NHEJ mutations can be of agricultural importance. For example, NHEJ mutations of the rice BADH2 gene (Shan et al. 2015), soybean FAD2 gene (Haun et al. 2014) and potato vacuolar invertase gene (Clasen et al. 2015) significantly improved the end-product quality, and NHEJ mutations of the disease susceptibility genes SWEET13 and SWEET14 of rice (Li et al. 2012b; Zhou et al. 2015) and MLO of wheat (Wang et al. 2014) led to broad-spectrum disease resistance. A majority of the GS genes are negative regulators or are negatively regulated by microRNAs (Figs. 1, 2), which makes them ideal targets for developing knockout or overexpressing mutants, respectively, by genome editing to improve wheat genetic yield potential. For polyploid wheat, another advantage of genome editing over other approaches is that the three homoeologs of one orthologous gene can be targeted simultaneously by a guide RNA that is developed from a conserved sequence. Compared to traditional genetic engineering, a significant advantage of genome editing technologies is that their end products, the edited mutants, can be non-GMO; such mutants are not regulated by USDA-APHIS under 7 CFR §340 and can be directly used in breeding programs as new germplasm (Weeks et al. 2016). Therefore, the NHEJ mutants of the negative GS regulators may be directly used as novel germplasm in wheat breeding once the transgene is purged from the genetic background.
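The idea of designing one guide RNA from a sequence conserved among the three homoeologs can be illustrated with a small search for shared target sites. The sketch below makes several assumptions that are not stated in the text: it uses the 20-nt protospacer plus NGG PAM rule of the commonly used Streptococcus pyogenes Cas9, scans only the given strand, requires a perfect match in all copies, and uses invented toy sequences rather than real wheat genes. A practical design would also scan the reverse strand and screen for off-target sites elsewhere in the genome.

```python
import re


def protospacers(seq, length=20):
    """Collect all `length`-nt sites followed by an NGG PAM on the given strand."""
    seq = seq.upper()
    # The lookahead allows overlapping candidate sites to be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    return {m.group(1) for m in pattern.finditer(seq)}


def shared_targets(homoeologs, length=20):
    """Return protospacers present, with a PAM, in every homoeolog sequence."""
    site_sets = [protospacers(s, length) for s in homoeologs.values()]
    if not site_sets:
        return []
    return sorted(set.intersection(*site_sets))


# Toy sequences (hypothetical): identical 20-nt site + AGG PAM, different flanks.
core = "GATTACAGATTACAGATTAC"
toy = {
    "A": "TTT" + core + "AGG" + "CCA",
    "B": "TCT" + core + "AGG" + "TCA",
    "D": "TAT" + core + "AGG" + "ACA",
}
conserved = shared_targets(toy)

# A single-base difference in one homoeolog removes the shared site.
toy_snp = dict(toy, D="TAT" + core[:-1] + "T" + "AGG" + "ACA")
conserved_with_snp = shared_targets(toy_snp)

print(conserved, conserved_with_snp)
```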

Author contribution statement

WL conceived the project, and WL and BY wrote the paper.