Introduction

Intermediate wheatgrass (Thinopyrum intermedium) is a cool-season perennial Triticeae grass, native to parts of Eastern Europe and western Asia, that has been widely used for soil conservation and forage production in North America and other temperate regions of the world (Jensen et al. 2016; Zair et al. 2018). Intermediate wheatgrass (IWG) ranks among the highest-yielding forage and biomass crops that can be grown across the Upper Midwest, Great Plains, and Intermountain regions of the USA and Canada (Harmoney 2015; Kenneth and Kevin 2001; Larson et al. 2017; Lee et al. 2009; Monono et al. 2013; Pearson et al. 2015; Robins 2010; Wang et al. 2014). At least 15 IWG cultivars have been released for purposes of soil conservation and forage production in Canada and the USA, with some of the first cultivars selected directly from plant introductions from Russia in 1932 (Jensen et al. 2016; Knowles 1977; Pearson et al. 2015). However, selection for better fertility and seed yields was critical for development of the first widely successful North American cultivar “Oahe” (Knowles 1977; Ross 1963), which is still grown today. Modern IWG forage cultivars have also undergone selection for disease resistance, forage quality, forage yield, and other traits (Jensen et al. 2016; Krupinsky and Berdahl 2000; Vogel et al. 2005).

Species of the genus Thinopyrum, including IWG, are considered the closest perennial relatives of common wheat (Triticum aestivum) and comprise a useful part of its tertiary gene pool (Ceoloni et al. 2015; Lang et al. 2018; Li et al. 2017; Liu et al. 2017; Uzma et al. 2015; Zair et al. 2018). Common wheat and IWG have similar allohexaploid genomes (2n = 6x = 42). However, unlike wheat, IWG is usually self-incompatible, meaning that most populations are highly heterogeneous and most individuals are highly heterozygous (Jensen et al. 1990, 2016; Kantarski et al. 2017; Zhang et al. 2016). Wheat-Thinopyrum hybrids have also been utilized for the development of perennial grain crops (Curwen-McAdams and Jones 2017; Hayes et al. 2018), but commercial production of these plant materials has not yet been realized because it has been difficult to introgress and stabilize the complex trait of perenniality in wheat. However, parallel efforts to directly domesticate IWG itself as a perennial grain crop (Cox et al. 2006, 2010; Wagoner 1990) have led to small-scale production and utilization of IWG grain or flour for baking, beverages, and other edible food products (DeHaan and Ismail 2017). Although current grain yields are modest, IWG has potential to be used as a multipurpose perennial forage and grain crop (Bell et al. 2015; Cattani and Asselin 2017; Jungers et al. 2017; Ryan et al. 2018). These burgeoning efforts to domesticate IWG (Cattani 2017; DeHaan et al. 2018; Zhang et al. 2016) aim to diversify and enhance the quantity and quality of food products (DeHaan and Ismail 2017; Marti et al. 2016) while providing improved soil conservation, water quality, carbon sequestration, nutrient management, and other ecosystem services (Culman et al. 2013). These goals have been described as the “ecological intensification of agriculture” (DeHaan et al. 2018).

A high-density genotyping-by-sequencing (GBS) consensus linkage map (Kantarski et al. 2017) and draft genome sequence were developed to accelerate the domestication of IWG (DeHaan et al. 2018). The IWG consensus map had three homoeologous sets of seven linkage groups (LG01 to LG21) with 10,029 GBS markers showing colinear alignments to the seven chromosomes of diploid barley (Kantarski et al. 2017). The consensus map was constructed using seven full-sib populations derived from 13 heterozygous individuals from the third (C3) and fourth (C4) cycles of selection (DeHaan et al. 2018) including one population derived by self-pollination of one outstanding C3 individual, C3_3471; a biparental population derived from two C4 parents, M26 and M35, descended from C3_3471 and two other C3 grandparents; and another biparental population derived from two C4 parents, C4_2856 and C4_5353, also descended from C3_3471 and three other C3 grandparents (Kantarski et al. 2017). The C3_3471 individual was identified in the third cycle of selection as the first predominantly free-threshing and non-shattering IWG plant, which also had exceptionally long and heavy seeds. The IWG draft genome sequence was developed by sequencing a haploid twin-seedling (Namikawa and Kawakami 1934) from C4_5353, which was also a parent of two mapping populations (Kantarski et al. 2017). The draft sequence includes 21 chromosome sequences ranging in size from 250.8 to 802.6 megabases (MB), totaling 9765 MB, and 59,064 scaffolds up to 8.8 MB in size, with a final assembly of 10,919 MB containing 152,296 annotated gene models. The 21 chromosome sequences were numbered CHR01 to CHR21 according to alignments with the 21 LGs of the IWG GBS consensus map (Kantarski et al. 2017). The IWG consensus map has been used to identify QTLs and markers associated with seed size in two biparental populations, M26 × M35 and C3-2331 × C3-2595, and one association-mapping (AM) population (Zhang et al. 2017). However, the M26 × M35 population was developed mainly to investigate the genetic control of the non-shattering and free-threshing seed traits of C3_3471.

The inflorescence of IWG is fundamentally like wheat in that they both have one sessile spikelet at each rachis node and multiple florets per spikelet with up to one seed per floret and variable levels of fertility. Seed disarticulation in wild IWG plants normally occurs in two different ways: (1) abscission above the junction of the rachis and spikelet base producing a complex diaspore with a relatively long wedge-shaped rachis internode extending away from the spikelet similar to that produced by the brittle rachis of barley (Pourkheirandish et al. 2015) and certain types of wheat that produce wedge-shaped diaspores (Li and Gill 2006), and (2) abscission below the junction of the rachilla and floret base, within the spikelet, producing a less complex diaspore that includes a short rachilla internode extending up from the base of the floret adjacent to the palea. Threshing of semidomesticated IWG forage varieties normally produces seeds with the lemma and palea attached to the pericarp, like those of hulled barleys. However, free-threshing forms of IWG, such as C3_3471, and other domesticated forms of wheat and barley produce naked grains comprised of the caryopsis detached from the lemma, palea, and spikelet glumes. At least two different genes have been identified, the barley nud gene (Taketa et al. 2008) and wheat Q gene (Simons et al. 2006), which are responsible for the free-threshing trait of naked barleys and common wheat, respectively.

The domestication and improvement of IWG as a perennial grain crop currently focuses on increasing seed size, fertility, inflorescence capacity, stem and inflorescence compactness, seed retention in the field, percentage of naked seeds after threshing, and uniformity of maturity (DeHaan et al. 2016, 2018; Zhang et al. 2016). One of the primary objectives of the study herein was to identify and map QTLs controlling these traits in the M26 × M35 population using the GBS consensus map (Kantarski et al. 2017). Another major objective of this study was to compare M26 × M35 QTLs, C3-2331 × C3-2595 QTLs, and AM markers to the annotated IWG draft genome sequence and identify possible candidate genes (CGs) controlling relevant domestication and improvement traits (Doebley et al. 2006; Kovach et al. 2007; Lenser and Theißen 2013; Meyer and Purugganan 2013; Tang et al. 2010) such as the percentage of free-threshing naked seeds (Simons et al. 2006; Taketa et al. 2008), seed shattering (Doust et al. 2014; Li and Gill 2006; Pourkheirandish et al. 2015), and grain yield (Nadolska-Orczyk et al. 2017) traits in wheat, barley, rice, and other grain crops.

Materials and methods

Plant materials and field evaluations

The terms “parent,” “hybrid,” or “progeny” are used to describe genetically unique individuals, generally referred to as “genets,” depending on context, whereas the terms “propagule” or “plot” may refer to different clonal replicates of the same genet. A total of 266 full-sib progenies, two C4 parents (M26 and M35), and two known C3 grandparents (C3_3471 and C3_3941) comprising one of seven families used to construct a high-density linkage map (Kantarski et al. 2017) were clonally propagated into replicated field plots in Kansas (KS) and Utah (UT). Propagules were planted in grids with 1-m spacing in UT (1.0 m² plots) and 3-ft spacing in KS (0.81 m² plots). Although these propagules eventually grow large enough to cover most plots, tillage between propagules was used to control weeds and help maintain plot integrity. The C4 M26 genet was the maternal seed parent for 128 progenies, whereas the C4 M35 genet was the maternal seed parent for the other 138 progenies. The M26 genet originates from a cross of C3_3471 as the seed parent and C3_3941 as the pollen parent in the third cycle of selection for grain production traits (DeHaan et al. 2018; Zhang et al. 2016). The M35 genet originates from a cross of C3_3941 as the seed parent and an unknown C3 pollen parent. Two propagules from each of 221 progenies, two C4 parents, and two known C3 grandparents were transplanted at the KS location (38.771517° N/−97.569408° W) on November 28, 2012. Three clones from each of 253 progenies, two parents, and two known grandparents were transplanted into a field in UT (41.695957° N/−111.831358° W) on May 6, 2013.

A total of 17 seed production traits were measured on each plot once per harvest year for at least two years (2014 and 2015) at both locations (KS and UT), and most traits were also evaluated at the KS location in 2013. The KS location was planted early enough in 2012 to enable sufficient plant growth, vernalization, and flowering to evaluate seed production traits in 2013. Conversely, the UT location was planted in the spring of 2013, which did not allow sufficient growth, vernalization, flowering, and seed set to evaluate seed production traits in 2013.

Four traits measured in the field include crown circumference (CRCI), number of inflorescences per crown (INCR), stem length (STLE), and one Zadoks (Zadoks et al. 1974) maturity rating (ZAMA) on each plot, once per year. The CRCI and INCR traits were evaluated by measuring the distance (cm) around the outermost stems at the base of each propagule, just above the soil surface, and then counting the number of inflorescences in each plot in 2014 at both locations. These traits were not evaluated in 2015 in KS because by then it was too difficult to discern the natural edges of different propagules. However, in UT, propagules were just beginning to grow into cultivated spaces between rows in 2015, so we devised a faster and possibly more effective way to estimate CRCI and INCR. First, we counted the number of stems (S) through a 10-cm-wide section of the widest undisturbed portion of each crown using an open-ended rectangular quadrat with marked distances along each arm to measure the maximum diameter (d) of the crown. Estimates of CRCI, from UT in 2015, were then determined based on the formula for the circumference of a circle (C = π·d). Estimates of INCR, from UT in 2015, were then determined using the formula for the area of a circular crown [A = π·(0.5·d)²] multiplied by the tiller density [S/(10·d)]. The total (stretched) STLE was measured (cm) from the soil surface to the upper tip of the inflorescence. The peduncle and inflorescence from ten of the tallest culms were harvested from each plot for subsequent measurements of 13 seed and inflorescence traits in the laboratory.
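The 2015 UT estimates of CRCI and INCR can be illustrated with a minimal Python sketch. It assumes that S in the tiller-density term is the number of stems counted in the 10-cm-wide strip and that d is in cm; the function and argument names are hypothetical and not part of the original analysis.

```python
import math

def estimate_crci_incr(diameter_cm, stems_in_strip, strip_width_cm=10.0):
    """Estimate crown circumference (CRCI, cm) and inflorescences per crown (INCR).

    diameter_cm    -- maximum crown diameter d (cm)
    stems_in_strip -- stems S counted in a strip of strip_width_cm across the crown
                      (assumed meaning of S; not stated explicitly in the text)
    """
    if diameter_cm <= 0:
        raise ValueError("crown diameter must be positive")
    crci = math.pi * diameter_cm                                       # C = pi * d
    crown_area = math.pi * (0.5 * diameter_cm) ** 2                    # A = pi * (0.5 * d)^2
    tiller_density = stems_in_strip / (strip_width_cm * diameter_cm)   # S / (10 * d)
    incr = crown_area * tiller_density
    return crci, incr

# Hypothetical plot: 40-cm crown diameter with 50 stems counted in the strip.
crci, incr = estimate_crci_incr(diameter_cm=40.0, stems_in_strip=50)
print(f"CRCI = {crci:.1f} cm, INCR = {incr:.1f} inflorescences")
```

With a 10-cm strip the INCR estimate simplifies algebraically to π·d·S/40, so it scales linearly with both the crown diameter and the strip count.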

Five stem and inflorescence traits including the stem width (STWI), number of spikelets per inflorescence (SPIN), inflorescence length (INLE), seed shattering (SESH), and number of florets per spikelet (FLSP) were measured on a subset of three of ten harvested culms from each plot. The INLE was measured from the rachis node subtending the lowest spikelet to the uppermost point of the inflorescence. The SESH at the KS location was determined based on the percentage of disarticulation after bending spikelets to a 90° angle from the rachis and dropping inflorescences from a height of 25 cm. The SESH at the UT location was determined by the percentage of disarticulation after three repeated strikes of each stem to a table surface. The STWI was measured on the thinnest part of each stem just below the inflorescence.

For UT field evaluations, six seed traits were measured after removing seed from ten harvested culms using a LD 180 laboratory thresher (Wintersteiger Inc., Salt Lake City, Utah, USA). Seeds and chaff were further separated using a General Seed Blower (Seedburo Equipment Co., Des Plaines, Illinois, USA). The percentage of seeds threshed out naked (SENA), seed area (SEAR), seed length (SELE), seed width (SEWI), and total number of seeds were determined using SmartGrain phenotyping software (Tanabata et al. 2012) from images of cleaned seeds from all ten inflorescences taken with a scanning device against a blue-paper background. The SENA was determined by first counting the number of hulled seeds and then counting the number of naked seeds based on color recognition of lighter-colored hulls and darker-colored pericarps, respectively. Measurements of SEAR, SELE, and SEWI were also taken from the darker-colored naked seeds, excluding broken parts less than 2.0 mm long and clumps of seeds that exceeded 8.0 mm in length, based on visual validation of SmartGrain image annotations. The total seed yield per inflorescence (SYIN) and average seed mass (SEMA) were determined by dividing the mass of all cleaned seeds (mg) by the number of inflorescences harvested (10) and the total number of seeds, respectively.

For KS field evaluations, six seed traits were measured after removing seed from ten harvested culms using a spike-tooth small bundle thresher. Additional cleaning was performed by hand sieving with a 12/64-inch round-hole sieve and aspirating with an STS-WM2 Air Separator (Seed Tech Systems, Wilton, CA, USA). The percentage of seed threshed out naked (SENA) was visually estimated for each sample. Approximately 20 naked seeds from each sample were photographed, and the images were analyzed in ImageJ (Schneider et al. 2012). Image analysis provided seed area (SEAR), seed length (SELE), seed width (SEWI), and number of seeds. This subsample was weighed, and the seed number data were used to calculate average seed mass (SEMA). The total seed yield per inflorescence (SYIN) was determined by dividing the mass of all cleaned seeds by the number of inflorescences harvested (10).

Two other seed- and inflorescence-related traits, the number of seeds per spikelet (SESP) and the number of seeds per floret (SEFL), were calculated based on measurements of the average number of seeds per inflorescence, SPIN, and FLSP as described above.
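The derived seed traits can be sketched in Python as follows. SYIN and SEMA follow the divisions described for the UT samples. The exact expressions used for SESP and SEFL are not given in the text, so the ratios below (seeds per inflorescence divided by SPIN, and that result divided by FLSP) are an assumption, and all names are hypothetical.

```python
def derived_seed_traits(seed_mass_mg, n_seeds, n_inflorescences, spin, flsp):
    """Return SYIN, SEMA, SESP, and SEFL for one plot sample.

    seed_mass_mg     -- mass of all cleaned seeds (mg)
    n_seeds          -- total number of seeds in the sample
    n_inflorescences -- number of inflorescences harvested (10 in this study)
    spin             -- spikelets per inflorescence
    flsp             -- florets per spikelet
    """
    seeds_per_inflorescence = n_seeds / n_inflorescences
    sesp = seeds_per_inflorescence / spin   # assumed formula for seeds per spikelet
    return {
        "SYIN": seed_mass_mg / n_inflorescences,  # seed yield per inflorescence (mg)
        "SEMA": seed_mass_mg / n_seeds,           # average seed mass (mg)
        "SESP": sesp,
        "SEFL": sesp / flsp,                      # assumed formula for seeds per floret
    }

# Hypothetical sample: 1200 mg of seed, 200 seeds, 10 inflorescences.
print(derived_seed_traits(1200.0, 200, 10, spin=20.0, flsp=5.0))
```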

Data analyses

Least squares means (LSMEANS) for each genet and least significant differences (LSD) among genets were determined using the SAS version 9.4 (SAS Institute, Cary, North Carolina, USA) MIXED procedure with genets, years, locations, and the three-way interaction term as fixed effects and replications as random effects. The overall means and LSD categories (p ≤ 0.05) for the grandparents and parents were determined using the repeated option to model covariance structure between years. Segregation of the trait values used for QTL analysis of the full-sib progeny, from the LSMEANS procedure, was measured in part by relative standard deviations (standard deviation among trait values/mean over all trait values), which reflect the magnitude of trait variability, relative to the overall mean, among 68% of the most typical individuals in a normally distributed population. The broad-sense heritability (H) estimates and standard errors were determined on a single-plot basis and entry (genet) mean basis for randomized complete block designs in multiple environments using a SAS MIXED procedure with environments (five location × year combinations), replications within environments, genet, and genet × environment as random effects (Holland et al. 2010). Pearson correlation tests were performed for 136 pairwise comparisons of the 17 traits, based on individual plot measurements, using environments (locations and years) as a grouping factor for the statsBy function of the R (R Core Team 2017) psych package (Revelle 2018). Significance thresholds for the trait correlation tests were adjusted using a Bonferroni correction to control for multiple testing.
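Two of the summary quantities above are simple enough to show directly. The following Python sketch computes a relative standard deviation and the Bonferroni-adjusted per-test threshold for the 136 trait pairs. It assumes a sample (n − 1) standard deviation and a family-wise level of 0.05, neither of which is spelled out for these steps in the text, and it does not reproduce the SAS or R analyses.

```python
import math
import statistics

def relative_sd(values):
    """Standard deviation among trait values divided by their mean."""
    return statistics.stdev(values) / statistics.mean(values)

n_traits = 17
n_pairs = math.comb(n_traits, 2)       # 136 pairwise trait comparisons
bonferroni_alpha = 0.05 / n_pairs      # per-test threshold (assumed family-wise 0.05)

# Hypothetical LSMEANS for three genets.
print(relative_sd([9.0, 10.0, 11.0]), n_pairs, bonferroni_alpha)
```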

Two different approaches for QTL detection based on models for a two-way pseudo-testcross (TWPT) and cross-pollinated (CP) plants, more fully described in the following paragraphs, were performed using MapQTL version 6 (Van Ooijen 2009). All of the map files and locus data used for these QTL analyses were based on the integrated GBS consensus map (Kantarski et al. 2017) with a total of 3156 markers from the M26 × M35 family. This map included 1699 markers that were heterozygous in M26 (progeny genotypes lm or ll), 1087 markers that were heterozygous in M35 (progeny genotypes nn or np), and another 370 markers that were heterozygous in both M26 and M35 (progeny genotypes hh, hk, or kk). The quantitative trait data were based on LSMEANS trait estimates for each genet, within and among environments, as described above. All TWPT and CP QTL analyses were initially performed using the same single-QTL interval mapping procedure to identify possible QTL markers that were subsequently used as cofactors in the first round of restricted multiple-QTL model (rMQM) mapping (Van Ooijen 2009). A second round of rMQM mapping was also performed for all TWPT and CP QTL analyses using cofactors for QTLs that were significant after the first round of rMQM mapping. All QTL analyses were performed using a maximum likelihood mixture model, with up to 20 iterations, to determine the LOD likelihood ratio statistic (Van Ooijen 2009). A permutation test for each CP and TWPT analysis was used to determine significance thresholds for the LOD statistic corresponding to a genome-wide p value of 0.05 (5%) to identify putative QTLs and cofactors (Van Ooijen 2009). Only those QTLs exceeding the 5% genome-wide LOD threshold in the second round of rMQM mapping were considered significant.
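The idea behind a genome-wide permutation threshold can be sketched in a few lines of Python. This is not the MapQTL procedure: it swaps in a simple single-marker test on two-class (A/B) markers in place of mixture-model interval mapping, and the data are simulated. Phenotypes are shuffled relative to genotypes, the maximum LOD over all markers is recorded for each shuffle, and the 95th percentile of those maxima is taken as the 5% genome-wide threshold.

```python
import math
import random

def marker_lod(genotypes, phenotypes):
    """Single-marker LOD: (n / 2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_marker = 0.0
    for cls in set(genotypes):
        ys = [y for g, y in zip(genotypes, phenotypes) if g == cls]
        cls_mean = sum(ys) / len(ys)
        rss_marker += sum((y - cls_mean) ** 2 for y in ys)
    if rss_null == 0.0:
        return 0.0               # no phenotypic variance to explain
    if rss_marker == 0.0:
        return float("inf")      # marker classes explain all variance
    return 0.5 * n * math.log10(rss_null / rss_marker)

def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """LOD threshold at genome-wide level alpha from shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_lods.append(max(marker_lod(g, shuffled) for g in markers))
    max_lods.sort()
    idx = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return max_lods[idx]

# Simulated example: 20 markers, 100 progeny, large effect at the first marker.
rng = random.Random(1)
markers = [[rng.choice("AB") for _ in range(100)] for _ in range(20)]
phenotypes = [(10.0 if g == "A" else 0.0) + rng.gauss(0.0, 1.0) for g in markers[0]]
threshold = permutation_threshold(markers, phenotypes, n_perm=200)
qtl_lod = marker_lod(markers[0], phenotypes)
print(f"threshold LOD = {threshold:.2f}, QTL marker LOD = {qtl_lod:.2f}")
```

Taking the maximum over all markers within each permutation is what makes the threshold genome-wide rather than per-marker.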

The two-way pseudo-testcross (TWPT) QTL analysis utilized a doubled haploid model with separate maps for each parent (Van Ooijen 2009), M26 and M35. Briefly, the lm and ll genotypes from 1699 M26 markers were changed to A and B, respectively, and the nn and np genotypes from 1087 M35 markers were changed to B and A, respectively. The remaining 370 markers with genotypes hh, hk, or kk were deleted. A map with 42 LGs including 1699 M26 markers in 21 LGs and 1087 M35 markers in another 21 LGs was assembled using marker positions from the integrated consensus map (Kantarski et al. 2017). Only one genotypic effect, the difference between A or B marker alleles from one parent, is fitted at any given map position in the TWPT approach. The TWPT design is a simplified model that has possible theoretical advantages because it enables rMQM mapping of each LG from each parent using QTL marker cofactors from other LGs of both parents, including one homologous LG from the other parent (Van Ooijen 2009). Another practical advantage of the TWPT design is that it is relatively easy to identify useful QTL cofactors when the parental LGs are separated since some QTLs may not be heterozygous in both parents. Thus, to identify TWPT rMQM cofactors, we initially selected one marker with the highest LOD score over all environments for each M26 or M35 LG having at least one significant TWPT QTL. However, the automatic cofactor selection procedure of MapQTL 6 was also used to identify the final set of TWPT rMQM cofactors.
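The genotype recoding for the TWPT maps can be illustrated with a short Python sketch. The mapping of codes follows the description above; the data layout (one list of progeny genotype codes per marker, with None for missing calls) and the function name are assumptions made for illustration.

```python
M26_CODES = {"lm": "A", "ll": "B"}   # markers heterozygous in M26 only
M35_CODES = {"nn": "B", "np": "A"}   # markers heterozygous in M35 only

def recode_twpt(marker_genotypes):
    """Recode one marker for the TWPT analysis.

    Returns (parent, recoded_genotypes), or None for markers that are
    heterozygous in both parents (hh, hk, kk), which are dropped.
    Missing calls (None) are kept as None.
    """
    observed = {g for g in marker_genotypes if g is not None}
    if not observed:
        return None                                   # no usable calls
    if observed <= set(M26_CODES):
        return "M26", [M26_CODES.get(g) for g in marker_genotypes]
    if observed <= set(M35_CODES):
        return "M35", [M35_CODES.get(g) for g in marker_genotypes]
    return None

print(recode_twpt(["lm", "ll", None, "lm"]))
print(recode_twpt(["nn", "np", "np"]))
print(recode_twpt(["hh", "hk", "kk"]))
```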

The second approach for QTL analysis utilized an integrated map containing 3156 markers in 21 LGs for both M26 and M35 parents (Kantarski et al. 2017), which was constructed using the cross-pollinator (CP) model (Van Ooijen 2006). In full-sib CP families, one or more QTLs may be heterozygous in one or both parents with up to four possible alleles per QTL. The CP QTL approach always fits four possible QTL alleles designated a and b corresponding to marker alleles l and m, respectively, of the first parent (M26) and QTL alleles c and d corresponding to marker alleles n and p, respectively, of the second parent (M35). The CP QTL analysis (Van Ooijen 2009) has theoretical and practical advantages in that three possible genotypic effects are fitted including the difference between a and b QTL alleles of the first parent (α), the difference between c and d QTL alleles of the second parent (γ), and the intralocus interaction (τ) as deviations from the overall mean (µ). Thus, in a cross of ab × cd QTL alleles, the expected progeny phenotypes are modeled (Van Ooijen 2009) as follows:

$$\begin{aligned} ac & = \mu - \alpha - \gamma - \tau \\ ad & = \mu - \alpha + \gamma + \tau \\ bc & = \mu + \alpha - \gamma + \tau \\ bd & = \mu + \alpha + \gamma - \tau \\ \end{aligned}$$

If the parents (M26 and M35) are heterozygous for the same two QTL alleles, a and b, then τ would represent a dominance deviation term. However, this is never assumed to be the case because the MapQTL 6 CP model always fits separate effects, α and γ, for both parents. Since more than 72% of the markers in the integrated CP map were not informative in one parent or the other parent (Kantarski et al. 2017), the interval mapping procedure can assign relatively high LOD scores to M26 or M35 marker loci that may not be directly associated with the QTL. Thus, in addition to the TWPT rMQM cofactors, the Kruskal–Wallis procedure of MapQTL 6 was used to help identify additional CP rMQM cofactors for each CP QTL. However, we did not consider more than one informative marker per parent per LG for use as a possible CP rMQM cofactor.
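The CP parameterization can be made concrete with a small Python sketch that builds the four expected genotype means from µ, α, γ, and τ and then inverts the same four equations to recover the effects from genotype means. The function names and numerical values are hypothetical, and this is only the algebra of the model, not the maximum likelihood fit performed by MapQTL.

```python
def cp_expected_means(mu, alpha, gamma, tau):
    """Expected progeny means for the ab x cd cross under the CP model."""
    return {
        "ac": mu - alpha - gamma - tau,
        "ad": mu - alpha + gamma + tau,
        "bc": mu + alpha - gamma + tau,
        "bd": mu + alpha + gamma - tau,
    }

def cp_effects(means):
    """Solve the four CP equations for (mu, alpha, gamma, tau)."""
    ac, ad, bc, bd = (means[k] for k in ("ac", "ad", "bc", "bd"))
    mu = (ac + ad + bc + bd) / 4
    alpha = (bc + bd - ac - ad) / 4   # first-parent (M26) contrast
    gamma = (ad + bd - ac - bc) / 4   # second-parent (M35) contrast
    tau = (ad + bc - ac - bd) / 4     # intralocus interaction contrast
    return mu, alpha, gamma, tau

means = cp_expected_means(mu=50.0, alpha=5.0, gamma=2.0, tau=1.0)
print(means)
print(cp_effects(means))
```

Because the three contrasts are mutually orthogonal, each effect is recovered independently of the other two.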

Comparative mapping

The two-LOD drop-off intervals for M26 × M35 QTLs with the highest LOD values for each trait and each LG, including parent-specific TWPT QTLs, were graphed onto the integrated M26 × M35 GBS consensus map (Kantarski et al. 2017) using the R (R Core Team 2017) LinkageMapView package (Ouellette et al. 2018). Another 44 SEMA, SEAR, SELE, and SEWI QTLs, detected over one or more years in the C3-2331 × C3-2595 (UMN) family (Zhang et al. 2017), were included on this graph for comparison to M26 × M35 QTLs detected in this study. The probability (Pxy) that at least one of the x most significant QTL markers was the same as one of the y most significant association-mapping markers, with a total of s shared markers, was calculated as one minus the probability of not having any of the same markers. The latter equals the product of one minus the chance of drawing one of y markers x times (where the total number of possible markers, s, decreases from i = 0 to x − 1 without replacement) or, equivalently, of drawing one of x markers y times (where the total number of possible markers, s, decreases from i = 0 to y − 1 without replacement) as follows:

$$1 - \left( {\mathop \prod \limits_{i = 0}^{x - 1} \left( {1 - \left( {\frac{y}{s - i}} \right)} \right)\,{\text{or}}\, \mathop \prod \limits_{i = 0}^{y - 1} \left( {1 - \left( {\frac{x}{s - i}} \right)} \right)} \right) = Pxy.$$
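As a minimal sketch, the probability Pxy can be computed in Python as below. The function name is hypothetical. The check at the end confirms numerically that the two products in the formula give the same value and that the result matches one minus the hypergeometric probability of no overlap, C(s − y, x)/C(s, x).

```python
import math

def prob_shared_marker(x, y, s):
    """Probability that at least one of x markers coincides with one of y
    markers when both sets are drawn from the same s shared markers."""
    if not (0 <= x <= s and 0 <= y <= s):
        raise ValueError("x and y must lie between 0 and s")
    p_none = 1.0
    for i in range(x):
        if y >= s - i:
            return 1.0            # overlap is unavoidable
        p_none *= 1.0 - y / (s - i)
    return 1.0 - p_none

# Hypothetical example: 2 QTL markers and 3 AM markers among 10 shared markers.
p = prob_shared_marker(2, 3, 10)
print(round(p, 4))
assert math.isclose(p, prob_shared_marker(3, 2, 10))
assert math.isclose(p, 1 - math.comb(7, 2) / math.comb(10, 2))
```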

Sequences of GBS markers (Kantarski et al. 2017), including 51 significant AM markers (Zhang et al. 2017), were aligned to the pre-publication “Thinopyrum intermedium C4-5353-T1 Annotated Standard Draft” sequence, available on Phytozome (https://phytozome-next.jgi.doe.gov/info/Tintermedium_v2_1), using a Basic Local Alignment Search Tool (BLAST) with an expect value (E) threshold ≤ 1e−10 (Altschul et al. 1990). Graphical comparisons of the 21 LGs (LG01 to LG21) and 21 corresponding chromosome sequences (CHR01 to CHR21), based on GBS marker alignments, were performed using LinkageMapView by normalizing LG lengths, measured in centimorgans (cM), and CHR lengths, measured in nucleotide mega-bases (MB).
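Filtering marker alignments by the E threshold might look like the following Python sketch. It assumes the alignments were saved in the standard 12-column tabular layout of NCBI BLAST+ (`-outfmt 6`) and that one best hit per marker is wanted; neither detail is stated in the text, and the marker names in the example are made up.

```python
def best_hits(tabular_lines, max_evalue=1e-10):
    """Keep the most significant hit per query with E <= max_evalue.

    Expects BLAST+ tabular columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue                          # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        sstart = int(fields[8])
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        current = best.get(query)
        # Lower E value wins; ties are broken by the higher bit score.
        if current is None or (evalue, -bitscore) < (current["evalue"], -current["bitscore"]):
            best[query] = {"subject": subject, "sstart": sstart,
                           "evalue": evalue, "bitscore": bitscore}
    return best

example = [
    "marker1\tCHR01\t100.0\t64\t0\t0\t1\t64\t1000\t1063\t1e-25\t120",
    "marker1\tCHR08\t95.3\t64\t3\t0\t1\t64\t500\t563\t1e-15\t90",
    "marker2\tCHR02\t90.0\t40\t4\t0\t1\t40\t700\t739\t1e-05\t40",
]
print(best_hits(example))
```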

Seed-yield and domestication orthogenes of wheat, barley, rice, maize, and other plants (Doebley et al. 2006; Doust et al. 2014; Hackauf and Wehling 2005; Lenser and Theißen 2013; Li and Gill 2006; Meyer and Purugganan 2013; Nadolska-Orczyk et al. 2017; Pourkheirandish et al. 2015; Simons et al. 2006; Taketa et al. 2008; Tang et al. 2010) including Arabidopsis (Balanzà et al. 2016) and Lolium (Manzanares et al. 2016; Shinozuka et al. 2010) were aligned to the IWG draft genome sequence using BLAST or BLASTX (Altschul et al. 1997) with a minimum significance threshold of E ≤ 1e−20. At least three BLAST or BLASTX hits corresponding to three possible orthogenes on three homoeologous chromosomes of IWG (Kantarski et al. 2017) were considered for each CG. If the three most significant IWG BLAST hits were located on chromosome sequences with known orthology to chromosomes harboring barley, Lolium, rice, or wheat CGs (Klaas et al. 2011; La Rota and Sorrells 2004; Thorogood et al. 2017; Tulpan and Leger 2017), then only these hits were considered as possible orthogenes. Additional IWG BLAST hits having similar E values were considered when queried using Arabidopsis genes allowing for ancient duplications and many possible genome rearrangements that have occurred between dicots, such as Arabidopsis, and monocots such as IWG. Only those IWG BLAST hits corresponding to annotated IWG gene models, supported by IWG transcripts, were considered as possible IWG CGs.

Results

Phenotypic and genotypic variation

Significant differences (p ≤ 0.05) were detected among the C3 grandparents and C4 parents for all 17 traits, with up to four levels of difference for SEWI, SYIN, and ZAMA (Table 1). Moreover, significant effects (p ≤ 0.05) of genet, year, location, and three-way interaction term were detected for all traits except FLSP, SEFL and INLE in the overall analysis of the grandparents, parents, and 266 full-sib progenies. In most cases, those effects were highly significant (p ≤ 0.0001); therefore, all further analyses were conducted for each location × year environment. Only the year effect was not significant for INLE, and only the location effect was not significant for FLSP and SEFL. The standard deviations among the progeny trait estimates were within 4–8% of the overall trait means for SELE, SEWI, and ZAMA (Table 2). Conversely, the standard deviations varied from 38 to 86% of the trait means for SEFL, SESH, SESP, and SYIN (Table 2). However, broad-sense heritability (H) estimates for traits with lower variability tended to be greater than H estimates for traits with high variability (Table 2). The relative standard deviations for SENA varied from 17 to 94% among environments, in part because the overall means of this trait also varied widely among environments (Table 2). These data, in Tables 1 and 2, demonstrate significant genetic variation among the C3 grandparents, C4 parents, and full-sib progeny, respectively, for all 17 traits.

Table 1 Trait means with least significant differences (LSD) for cycle-3 (C3) grandparents and cycle-4 (C4) parents of the full-sib M26 × M35 population
Table 2 Trait means, relative standard deviations (RSD), ranges, and broad-sense heritabilities with standard errors (H ± SE) for 266 M26 × M35 progenies across five location (KS or UT) × year (2013, 2014, and 2015) environments and overall average across five environments (AVG)

Significant correlations (p < 0.05) were detected for 100 of the 136 possible pairwise comparisons among 17 traits, controlling for multiple testing (Table 3). Although there were up to 12 possible plot measurements taken for each of the 270 genets (two replications in KS evaluated over 3 years and three replications in UT evaluated over 2 years), with 3240 possible measurements per trait, the maximum number of observations was limited to 2570 measurements because not all genets were present in all five replications. Relatively strong positive correlations were observed among the four seed-size traits (SEAR, SELE, SEMA, and SEWI) and between the two fertility traits, SEFL and SESP (Table 3). Seed yield per inflorescence (SYIN) is a complex trait that showed relatively strong and significant correlations with fertility traits, SEFL and SESP, and moderate correlations with seed size (SEAR, SELE, SEMA, and SEWI), inflorescence capacity (SPIN and FLSP), and seed disarticulation (SENA and SESH) traits (Table 3). The percentage of naked seeds (SENA) was negatively correlated with SEWI, SESH, and FLSP (Table 3), which was especially true in the KS evaluations where these correlation coefficients were −0.25, −0.34, and −0.12, respectively.

Table 3 Trait correlations among parents and progenies of M26 × M35 family

QTL analysis

Permutation tests showed that the minimum LOD thresholds required to control for a 5% genome-wide error rate (p < 0.05) were 3.5 and 4.7 or less for the TWPT and CP QTL analyses, respectively, for each trait. Using these LOD thresholds for all 17 traits, there were a total of 210 significant QTLs detected among six QTL analyses for each of five different environments and averages over all five environments (Table 4; Supplemental Document 1). A subset of 56 QTLs were significant in only one analysis, whereas 55 QTLs were significant among two or more analyses (Supplemental Document 1). A total of 57 QTLs were significant in the analysis of trait averages over all five environments but only one QTL, for SELE on LG06, was significant across all six analyses (Table 4). Considering the most significant QTL with maximum LOD on each LG and each trait, there were at least 111 distinct QTLs including 19 M26 and 20 M35 QTLs detected using the TWPT model and 72 QTLs detected using the CP model (Tables 4 and 5; Supplemental Document 1). Only three distinct QTLs were detected for CRCI, and up to ten distinct QTLs were detected for ZAMA (Table 5). There were at least two significant QTLs on each of the 21 LGs, with up to 12 significant QTLs on LG06 (Table 5).

Table 4 Linkage groups (LGs) with significant QTL effects in M26 × M35 population across five location (KS or UT) × year (2013, 2014, and 2015) environments and overall average across five environments (AVG)
Table 5 Peak LOD scores for each linkage group (LG) and homoeologous group (HG) of M26 × M35 family

For most traits except INLE, the total magnitudes of maximum QTL effects (Max-total) and overall average QTL effects (Avg-total), within and among five environments, were not the same for both M26 and M35 parents (Table 6). The Max-total and Avg-total M35-γ effects were greater than corresponding M26-α effects for SEMA, FLSP, SEFL, SESP, STWI, ZAMA, CRCI, and INCR (Table 6). Conversely, the Max-total and Avg-total M26-α effects were greater than corresponding M35-γ effects for SEAR, SENA, SYIN, and STLE (Table 6). The Avg-total M26-α effects were also greater than corresponding M35-γ effects for SELE, SESH, and SPIN, but this was not true for the Max-total effects of these three traits (Table 6). A more detailed examination of SENA QTL effects indicates that the magnitude of M26-α QTL effects was equal to or greater than the M35-γ effects for six of nine SENA QTLs including the LG02 and LG11 SENA QTLs, which had relatively large M26-α effects (Table 6). Thus, the Avg-total M26-α SENA effects (28% of population mean) were about 133% greater than corresponding M35-γ effects (12% of population mean), with a total estimated effect of 52% of the SENA mean (Table 6). The LG02 SENA QTL was most significant in the KS15 analysis (Table 4), where progenies with LG02 SENA QTL genotypes ac and bd had SENA values 39% greater than the population mean and 25% less than the population mean, respectively (Table 6; Supplemental Document 1). If the LG02 M26 a and b alleles are additive, then it could be inferred that the SENA means for progeny with an aa genotype would be 42% over the population mean (Table 2), with a M26-α SENA effect of 21% of population mean (Table 6). However, there are no progenies or parents that are homozygous for the four possible QTL alleles (a, b, c, or d) in the M26 × M35 family so it is not possible to determine the effects of all QTL genotypes. Nevertheless, the Avg-total SENA QTL effects, 68% of population mean (Table 6), were enough to account for the range of 23.2–83.5% SENA variation (55.3% average) among progenies (Table 2). Moreover, the Avg-total M26-α effects, 28% of population mean (Table 6), would also be enough to account for differences of 28.7% and 94.1% SENA between the parents of M26, C3_3941, and C3_3471 (Table 1), if they were homozygous for small- and large-effect alleles, respectively, and if the M26 a and b alleles are additive.

Table 6 Estimates of maximum QTL effects (α, γ, τ) based on six QTL analyses of the average trait values within or among five environments (Tables 4, 5), total magnitudes of maximum (Max) QTL effects, and total magnitudes of average (Avg) QTL effects over five environments reported as a percentage of population mean where α is difference between QTL alleles of the M26 parent, γ is difference between QTL alleles of the M35 parent, and τ is intralocus interaction

The highest LOD score for all 17 traits was for the LG11 SESH QTL (Table 5), which had relatively strong but opposite effects on both SESH and SENA (Table 6), meaning that it has the desirable, synergistic effect of decreasing seed shattering and increasing the percentage of free-threshing (naked) seed. In fact, this LG11 QTL was the only QTL that had synergistic M26-α SESH and SENA QTL effects and synergistic M35-γ SESH and SENA QTL effects (Table 6). Only one other QTL, on LG10, had synergistic SESH and SENA M26-α effects (Table 6). The total magnitude of the LG10 and LG11 SESH and SENA M26-α effects, 23% and 26%, respectively (Table 6), would be enough to account for most SESH and SENA differences between the parents of M26, C3_3941, and C3_3471 (Table 1), if they were homozygous for desirable and undesirable alleles, respectively, and if the M26 a and b alleles are additive. Two other QTLs on LG12 and LG14 had synergistic SESH and SENA M35-γ effects. Thus, four QTLs on LG10, LG11, LG12, and LG14 had synergistic M26-α or M35-γ effects on SESH and SENA (Table 6), which account for negative correlations between these traits (Table 3). Only one QTL, on LG06, had the same positive or negative effect on SESH and SENA (Table 6). In general, the relative directions of QTL effects for different traits (Table 6) were consistent with trait correlations (Table 3). For example, nine of the 13 LGs associated with seed-size traits (SEAR, SELE, SEMA, and SEWI) had the same directional effects for α or γ on more than one of these traits, which was consistent with positive correlations among these traits (Table 3). Likewise, six of the ten LGs associated with SEFL and SESP had the same directional α or γ effects for both traits, consistent with the positive correlation between these two traits (Table 3).

The percent variation explained (PVE) by M26 was substantially greater than M35 for SEAR, SELE, SENA, SPIN, FLSP, and STLE (Table 7). Conversely, the PVE by M35 was substantially greater than M26 for SEMA, SEWI, SEFL, SESP, INLE, STWI, ZAMA, CRCI, and INCR (Table 7). Thus, the relative PVE by M26 and M35 (Table 7) closely mirrors the relative magnitudes of the average M26-α and M35-γ QTL effects among all five environments (Table 6). However, the combined PVE by M26 and M35 in the CP approach was greater than the combined PVE by M26 and M35 in the TWPT approach for SEMA, SEWI, SPIN, ZAMA, and CRCI (Table 7), presumably because the CP model includes effects of intralocus interactions (τ).

Table 7 Total percent variation explained (PVE) by M26 × M35 QTL markers using a two-way pseudo-testcross (TWPT) or cross-pollinator (CP) approach across five location (KS or UT) × year (2013, 2014, and 2015) environments and overall averages across five environments (AVG) including AVGs by parent (M26 or M35)

Alignment of IWG QTL intervals and association-mapping (AM) loci to IWG draft genome sequence

A subset of 3608 (93.6%) of the 3856 GBS markers comprising the M26 × M35 linkage map had significant matches to the IWG physical map. Of these 3608 matches, 2977 (82.5%) showed synteny with homologous chromosome sequences such as LG01 and CHR01 (Fig. 1), 239 (6.6%) mapped to different members of the same homoeologous group (HG) such as LG01 and CHR02 or CHR03, 213 (5.9%) mapped to different genetic and physical HGs, and 179 (5.0%) matched unmapped scaffolds. Substantial collinearity among GBS markers of the 21 linkage groups and corresponding chromosome sequences was also discernable (Fig. 1). A subset of 1072 (29.7%) of the 3608 M26 × M35 GBS marker alignments were located within 1072 (0.7%) of the 159,905 IWG gene models. A slightly higher portion of 37 (33%) of the 111 most significant M26 × M35 QTL markers were located within IWG gene models. A total of 9403 (93.7%) of the 10,029 GBS markers from the IWG consensus map (Kantarski et al. 2017) showed significant matches to the draft genome sequence, including 3223 (34.3%) located within annotated IWG gene models.
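The breakdown of marker alignments above can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; the category labels are paraphrases introduced here for illustration and are not the authors' terms.

```python
# Counts of M26 x M35 GBS marker alignments as quoted in the text.
total_markers = 3856
categories = {
    "same chromosome (e.g. LG01 and CHR01)": 2977,
    "other chromosome, same homoeologous group": 239,
    "different homoeologous group": 213,
    "unmapped scaffold": 179,
}

# The four categories should partition the set of aligned markers.
aligned = sum(categories.values())
shares = {name: 100 * count / aligned for name, count in categories.items()}

print(f"aligned: {aligned} of {total_markers} ({100 * aligned / total_markers:.1f}%)")
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

Because the four categories partition the 3608 aligned markers, their percentages should sum to 100%.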

Fig. 1

Alignment of quantitative trait loci (QTLs), association-mapping markers, and possible candidate genes to 21 linkage groups (LG01–LG21) and 21 chromosome sequences (CHR01–CHR21) corresponding to seven homoeologous groups (HG1–HG7) of allohexaploid (2n = 6x = 42) intermediate wheatgrass. The 2-LOD drop-off intervals for 111 M26 × M35 QTLs (Tables 4, 5, 6) and 44 C3-2331 × C3-2595 (UMN) seed-size QTLs (Zhang et al. 2017) are indicated by filled and shaded box plots, respectively, on the right side of each linkage group. The physical positions of UMN seed-size AM markers (Zhang et al. 2017) and possible candidate genes (Table 8) are shown in bold and italic text, respectively, on the left side of each chromosome. Graphed portions of each linkage group and chromosome sequence are scaled in centimorgans (cM) and nucleotide megabases (MB), respectively, and normalized to comparable lengths

A total of 93 GBS markers had the highest LOD score for the 111 most significant M26 × M35 QTLs, and the two-LOD drop-off intervals for many of the M26 × M35 QTLs were identical or very similar (Supplemental Document 1). The 111 M26 × M35 QTLs collapsed into 72 QTL intervals spanning different regions or different lengths of the linkage map (Fig. 1). However, these 72 QTL intervals covered no more than 36 non-overlapping regions of the 21 integrated LGs. One region of LG06 had significant effects on 11 different traits including all seed-size, fertility, and seed harvest traits (Fig. 1).
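The way QTLs collapse into distinct intervals and then into non-overlapping regions can be sketched as a standard interval merge within each linkage group. This is a minimal illustration, not the authors' procedure: the function name, the tuple layout, and the example intervals are hypothetical, and intervals that merely touch are assumed to belong to the same region.

```python
from collections import defaultdict

def merge_intervals(qtls):
    """Collapse (lg, start_cM, end_cM) intervals into non-overlapping regions per LG."""
    by_lg = defaultdict(list)
    for lg, start, end in qtls:
        by_lg[lg].append((start, end))
    regions = {}
    for lg, spans in by_lg.items():
        merged = []
        for start, end in sorted(spans):
            # Assumption: overlapping or touching intervals form one region.
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        regions[lg] = merged
    return regions

# Hypothetical 2-LOD drop-off intervals for illustration only (not study data).
example = [
    ("LG06", 10.0, 35.0),
    ("LG06", 30.0, 50.0),
    ("LG06", 80.0, 95.0),
    ("LG11", 5.0, 20.0),
    ("LG11", 5.0, 20.0),  # a second trait sharing an identical interval
]

regions = merge_intervals(example)
n_qtls = len(example)                               # all QTLs
n_intervals = len(set(example))                     # distinct intervals
n_regions = sum(len(r) for r in regions.values())   # non-overlapping regions
print(n_qtls, n_intervals, n_regions)
```

In this toy input, five QTLs reduce to four distinct intervals and three non-overlapping regions, the same kind of reduction as the 111 QTLs, 72 intervals, and 36 regions reported above.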

A total of 44 SEMA, SEAR, SELE, and SEWI QTLs from the biparental UMN C3-2331 × C3-2595 population (Zhang et al. 2017) were mapped to 22 QTL intervals corresponding to 21 non-overlapping regions of the IWG consensus map (Fig. 1). Similarly, 25 M26 × M35 seed-size QTLs were mapped to 20 QTL intervals corresponding to only 14 non-overlapping regions of the same IWG consensus map (Fig. 1). A total of 69 seed QTLs from both families, M26 × M35 and C3-2331 × C3-2595, mapped to only 26 non-overlapping regions on 20 LGs with correspondence between families on LG06, LG08, LG09, LG11, LG14, LG15, LG17, and LG20 (Fig. 1). A total of 51 AM markers mapped to 18 of the 21 LGs, excluding LG06, LG07, and LG19 (Zhang et al. 2017). No significant seed-size QTLs were detected on LG05, in the M26 × M35 or C3-2331 × C3-2595 families, but there were eight AM markers on this LG. Five LGs (LG02, LG06, LG15, LG17, and LG20) had significant QTL effects on all four seed-size traits (SEMA, SEAR, SELE, and SEWI), but only LG06 had significant QTL effects on all four seed-size traits in both M26 × M35 and C3-2331 × C3-2595 families. Only LG03, LG05, LG13, and LG20 had significant AM markers for all four seed-size traits. Only LG20 had significant QTL and AM markers for all four seed-size traits.

Two of the 25 most significant M26 × M35 and 44 most significant C3-2331 × C3-2595 UMN seed-size QTL markers, TP678810 and TP693406, were also among the 51 most significant seed-size AM markers. The TP678810 GBS polymorphism was the most significant marker for the LG17 SELE QTL in the M26 × M35 family (Supplemental Document 1), and it was also associated with SEMA, SEAR, and SELE in the UMN AM population (Zhang et al. 2017). The TP693406 polymorphism was the most significant marker for a SEWI QTL (Ti_QSws.umn_4.1) in the C3-2331 × C3-2595 family and was associated with SELE in the AM population (Zhang et al. 2017). The coincidence of these two markers, ranking among the 69 most significant QTL markers and 51 most significant AM markers, raised questions about the probability of this happening by chance considering that there were 3856 M26 × M35 markers, 2167 C3-2331 × C3-2595 markers, and 4873 AM markers used in these experiments (Zhang et al. 2017). The probability of having at least one identical marker between the M26 × M35 QTL experiment and AM experiment was calculated as 13.9% using the formula for Pxy where only x = 12 of the most significant M26 × M35 QTL markers, y = 14 AM markers, and s = 1139 markers were common to both experiments. The probability of having at least one identical marker in the C3-2331 × C3-2595 QTL experiment and AM experiment was calculated as 18.9% using the formula for Pxy where only x = 23 of the most significant C3-2331 × C3-2595 markers, y = 10 of the most significant AM markers, and s = 1115 markers were common to both experiments. The probability of both occurrences was 2.6%, which suggests that these coincidences occurred more frequently than expected by chance alone. These two shared markers, LG17 TP678810 and LG04 TP693406, were located within two IWG gene models, Thintv21245331 m.g and Thintv21054937 m.g, respectively.
The LG17 Thintv21245331 m.g IWG gene model corresponds to a GDSL lipase (GLIP) gene, and the LG04 Thintv21054937 m.g IWG gene model encodes a putative xylogalacturonan beta-1,3-xylosyltransferase, xylogalacturonan deficient (XGD), gene (Supplemental Document 2). A total of 22 (43%) of the 51 most significant AM markers (Zhang et al. 2017) were located in annotated IWG gene models, whereas only 34.3% of all 9403 GBS alignments to the IWG draft genome sequence were located in structurally annotated gene models, suggesting that some of the most significant AM markers may be located in functionally relevant genes.
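The formula for Pxy is not reproduced in this section. As a hedged illustration, the sketch below assumes that Pxy is the hypergeometric probability that at least one of x significant QTL markers coincides with one of y significant AM markers when both sets are drawn from the same s shared markers, and that the two comparisons are independent. The function name is hypothetical. Under these assumptions the results come out close to the 13.9%, 18.9%, and 2.6% quoted above.

```python
from math import comb

def prob_at_least_one_shared(x, y, s):
    """P(at least one of x markers is among y markers drawn at random from s)."""
    # Complement of the chance that all y draws miss the x marked markers.
    return 1 - comb(s - x, y) / comb(s, y)

# Values of x, y, and s as quoted in the text.
p_m26 = prob_at_least_one_shared(x=12, y=14, s=1139)  # M26 x M35 vs. AM
p_umn = prob_at_least_one_shared(x=23, y=10, s=1115)  # C3-2331 x C3-2595 vs. AM

# Assumption: the two coincidences are independent events.
p_both = p_m26 * p_umn

print(f"M26 x M35 vs. AM: {100 * p_m26:.1f}%")
print(f"UMN family vs. AM: {100 * p_umn:.1f}%")
print(f"both: {100 * p_both:.1f}%")
```

The expression is symmetric in x and y, so it does not matter which experiment is treated as the fixed set and which as the random draw.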

Alignments of 42 domestication genes of wheat, barley, rice, and other plants (Table 8) to the draft genome sequence of IWG detected significant homology to 142 annotated gene models of IWG (Fig. 1; Supplemental Document 2). Most of these genes showed evidence of orthology based on functional annotations and known patterns of synteny among species (Supplemental Document 2). Moreover, alignments of IWG chromosome sequences to the IWG QTL map, based on GBS markers, revealed 98 CGs with possible alignments to one or more relevant IWG QTLs (Table 8, Fig. 1; Supplemental Document 2).

Table 8 Description of 42 candidate genes located in 87 different chromosome regions containing one or more relevant quantitative trait loci (QTL) identified in the intermediate wheatgrass M26 × M35 or C3-2331 × C3-2595 (UMN) families, listed by homoeologous group (HG)

Discussion

IWG QTLs correspond to candidate genes (CGs) on seven homoeologous groups (HGs)

One of the ultimate goals in many QTL studies is to identify genes and mutations that underlie functionally important traits, initiated here by the identification of IWG loci corresponding to domestication orthogenes of closely related species such as wheat, barley, and rice. The development of an annotated draft genome sequence for IWG greatly facilitated this effort with identification of 142 possible domestication orthogenes, including 98 possible CG-QTL alignments. Discussion of these alignments was organized according to the seven homoeologous groups of wheat, barley, and IWG (Kantarski et al. 2017), which also have well-defined relationships to the ten chromosomes of maize and 12 chromosomes of rice (La Rota and Sorrells 2004; Tulpan and Leger 2017).

Homoeologous group 1 was interesting in part because it produced the highest SENA LOD score (Table 5). Two possible orthologs of the Seedstick gene of Arabidopsis (Balanzà et al. 2016), designated STKa and STKb, were found on each of the three IWG HG1 chromosomes (Fig. 1). The Seedstick gene encodes a MADS-box transcription factor required for seed abscission in Arabidopsis, with seeds remaining attached to the funiculus after fruits dehisce in stk mutants (Balanzà et al. 2016). Although patterns of synteny between Arabidopsis and IWG are not easy to discern (Tulpan and Leger 2017), the functional annotation of the putative IWG STK orthogenes matches that of Arabidopsis (Supplemental Document 2). The IWG STKa and STKb loci on CHR02 align to a SENA QTL on LG02, which had the most consistent SENA effects across environments (Table 4) and the highest LOD score (Table 5). Orthologs of the rice grain weight chromosome-5 QTL (GW5) gene (Shomura et al. 2008; Weng et al. 2008) aligned to seed-size QTL on LG01, LG02, and LG03 (Table 8). One of three possible IWG orthologs of the maize barren inflorescence 4 (Bif4) gene (Galli et al. 2015) located on CHR01, CHR02, and CHR03 aligned to a SPIN QTL on LG03. The maize Bif1 and Bif4 genes encode AUXIN/INDOLE-3-ACETIC ACID (Aux/IAA) proteins required for early steps of inflorescence development (Galli et al. 2015).

Homoeologous group 2, particularly LG06, produced the highest M26 × M35 LOD scores for SEMA, SEAR, SELE, and SYIN. The LG06 SELE QTL had the second highest LOD score of 12.0 in this experiment (Table 5), and it was the only QTL that was significant across all five environments (Table 4). Homoeologous group 2 was also remarkable in that it showed significant M26 × M35 QTL effects for all traits except CRCI and INCR (Table 5) and the greatest overall number of seed-size effects (22) in the M26 × M35 family (four QTLs), C3-2331 × C3-2595 UMN family (seven QTLs), and AM population (11 QTLs). The grain incomplete filling (GIF) cell-wall invertase gene of rice (Li et al. 2013; Wang et al. 2008; Yan et al. 2011), grain length chromosome-7 QTL (GL7) gene (Wang et al. 2015), sucrose synthase 2 (SUS2) gene (Hou et al. 2014; Jiang et al. 2011), and xylogalacturonan-deficient (XGD) gene of Arabidopsis (Jensen et al. 2008) were present on CHR04, CHR05, and CHR06 with alignments to seed-size and seed-yield traits on LG04 and LG06 (Fig. 1, Table 8). A GL7 locus on LG06 is an interesting candidate for the major-effect SELE QTL associated with this linkage group (Fig. 1, Tables 5 and 6). The XGD locus on CHR04 was of particular interest because it contained the most significant SEWI QTL marker in the IWG C3-2331 × C3-2595 family, TP693406, which was also associated with SELE in the AM population (Zhang et al. 2017). The xgd1 mutant of Arabidopsis has decreased levels of xylose and pectic xylogalacturonan, which are important components of cell walls and reproductive tissues that may affect processes of plant development such as pectin degradation during fruit ripening (Jensen et al. 2008). The clustered primary branch 1 (CPB1) gene encodes a cytochrome P450 protein involved in brassinosteroid biosynthesis, which is associated with inflorescence architecture, seed-size, and plant-height traits including the DWARF11 (D11) phenotype of rice (Wu et al. 2016).
Three CPB1 loci were found on CHR05, CHR06, and one scaffold that presumably should map to CHR04 where it might align to inflorescence and seed-size QTLs on LG04. The CHR05 and CHR06 CPB1 genes aligned to STLE and seed-size QTLs on LG05 and LG06, respectively. Candidate genes closely associated with the barley gibberellin-insensitive Semidwarf 3 (SDW3) gene (Vu et al. 2010) were present on CHR04, CHR05, and CHR06 including one that also aligned to the relatively broad STLE QTL interval on LG05 (Fig. 1). The six-rowed spike 1 (Vrs1) gene encodes a homeobox-leucine zipper protein that reverts the rudimentary lateral spikelets of two-row barleys into fully developed spikelets of the six-row barleys (Komatsuda et al. 2007). Homoeologous Vrs1 genes were present on CHR04, CHR05, and CHR06 with two copies of this gene on CHR06. The Vrs1 genes on CHR04 and CHR06 align to SPIN and fertility (SEFL and SESP) QTLs on LG04 and LG06, respectively (Fig. 1, Table 8). The barley ZEOCRITON (ZEO) spike density gene, located on the long arm of barley chromosome 2H (Houston et al. 2013), was present on the long arm of IWG CHR04, the short arm of IWG CHR06, and a paralogous locus on IWG CHR21 (Fig. 1). The IWG ZEO loci did not align to INLE QTLs, but there was an INLE QTL aligned to the long arm of IWG CHR05, which may not be fully and correctly assembled.

Homoeologous group 3 had the highest LOD scores for SEWI, SPIN, FLSP, and INLE (Table 5). A total of 17 loci corresponding to six HG3 CGs (Table 8) were syntenous and colinear among CHR07, CHR08, and CHR09 (Fig. 1). Mutations of the maize Barren inflorescence 1 (BIF1) gene (Galli et al. 2015) and Barren stalk 1 (BA1) gene (Gallavotti et al. 2004) reduce the number of spikelets due to defects in auxin signaling. The barley chromosome-3H six-rowed spike 4 (VRS4) gene, an ortholog of the maize RAMOSA2 inflorescence architecture gene, is associated with spikelet fertility and determinacy (Koppolu et al. 2013). The BA1, BIF1, and VRS4 genes aligned within or near SPIN and FLSP QTL intervals on LG07 and LG09. The Grain number chromosome-1 QTL (GN1) gene (Ashikari et al. 2005; Zhang et al. 2012a) is a cytokinin oxidase gene associated with a major grain-number QTL and grain weight in rice. The GN1 gene was present on CHR07 and two CHR08 loci aligned to SEAR and SPIN QTLs on LG07 and a SEWI QTL on LG08. The brassinosteroid-insensitive 1 (BRI1) gene is responsible for the uzu semidwarfing mutation (Chono et al. 2003), which has pleiotropic effects on spike length (Chen et al. 2016) and aligned to INLE and STLE QTLs on LG07 and LG09. The barley DENSO (Semidwarf 1) gene encodes a gibberellic acid (GA)-20 oxidase enzyme required for GA biosynthesis (Jia et al. 2009), which can have pleiotropic effects on heading date and possibly grain size but not spike length (Kuczyńska et al. 2014). The DENSO gene aligned within or near STLE and seed-size QTLs on LG08 and LG09. The rice grain size chromosome-5 QTL (GS5) serine carboxypeptidase gene plays a major role in regulating grain size and weight in rice and possibly wheat (Li et al. 2011; Ma et al. 2016). The GS5 gene maps to wheat 3A, 3B, and 3D (Ma et al. 2016) and IWG CHR07, CHR08, and CHR09 (Fig. 1), which align to seed-size QTLs on LG07, LG08, and LG09.

Homoeologous group 4 produced a total of 23 M26 × M35 QTLs, only slightly fewer than HG5, and produced the highest LOD scores for STLE, SESH, ZAMA, and INCR (Table 5). This HG was exceptional in that it produced the highest LOD score for any trait, which was associated with the LG11 SESH QTL (Table 5). Moreover, significant QTL effects for SESH, SENA, SEFL, and ZAMA were detected on all three HG4 linkage groups (LG10, LG11, and LG12). Orthologs of the wheat “green revolution” reduced height (RHT) gene (Peng et al. 1999) were found on CHR10 and CHR12 including one that aligns to the high-LOD STLE QTL on LG10, which also overlaps with other QTLs controlling other traits (Fig. 1). Despite efforts to find seed-shattering CGs from wheat, barley, or rice that correspond to IWG HG4, the only relevant gene that we found was the Arabidopsis Seuss (SEU) gene (Balanzà et al. 2016), which aligned to the SESH and SENA QTLs on LG10 and the SESH QTL on LG12. Many genes controlling photoperiod and flowering date have been identified in temperate cereals including wheat and barley (Cockram et al. 2007). The vernalization 2 (VRN2) gene is a major determinant of flowering time in wheat (Yan et al. 2004) and barley (Dubcovsky et al. 2005; Karsai et al. 2005; Laurie et al. 1995). The VRN2 gene is located on barley chromosome 4HL and a region of wheat chromosome 4AL that was translocated to 5AL (Devos et al. 1995; Dubcovsky et al. 1998). The VRN2 gene was located on IWG CHR10 (HG4) and CHR15 (HG5), which was not unexpected because LG12 (HG4) and LG15 (HG5) of IWG display a reciprocal translocation (Kantarski et al. 2017) similar to the 4AL/5AL translocation of Triticum monococcum, T. aestivum and other Triticeae species (Devos et al. 1995; Dubcovsky et al. 1998; Larson et al. 2012). A third VRN2 gene was found on an unmapped scaffold_264 that probably belongs to CHR11, which is colinear with barley chromosome 4H (Kantarski et al. 2017).
However, none of the ZAMA QTLs aligned to the CHR10-, CHR15-, or predicted CHR11-VRN2 loci. Two Phytochrome genes, designated PHYA and PHYB (Mathews and Sharrock 1996), map to CHR10, CHR11, and CHR12 with alignments to ZAMA QTLs on LG10 and LG11. The LG10 ZAMA QTL could be a pleiotropic effect of the LG10 STLE QTL, but there was no alternative explanation for the alignment of the LG11 ZAMA QTL to the CHR11 PHYB gene.

Homoeologous group 5 was exceptional in that it displayed the greatest number of QTLs, 25 in total, and that it displayed significant effects for SEFL and SESP on all three HG5 linkage groups including LG13, LG14, and LG15 (Table 5). The Q gene on wheat chromosome 5AL confers the free-threshing trait of domesticated wheat and has pleiotropic effects on glume shape, glume tenacity, rachis fragility, spike length, plant height, and heading date (Faris and Gill 2002; Simons et al. 2006). Orthogenes of Q were found on the long arm of CHR13, CHR14, and CHR15 with alignment to INLE, SESH, and SENA QTLs on LG14 and possibly another SENA QTL on LG15 (Fig. 1). Other potentially important CGs located near Q include the Phytochrome C (PHYC) and Vernalization 1 (Yan et al. 2003) genes, which aligned to a ZAMA QTL on LG14 (Fig. 1). Orthologs of the Constans 3 (CO3) CONSTANS-like flowering gene (Griffiths et al. 2003) also align to ZAMA QTLs on LG14 and LG15. Homologs of the rice grain length chromosome-3 QTL (GL3) gene (Zhang et al. 2012b) were present at two loci on CHR13, CHR14, and CHR15 with alignments to seed-size QTLs on all three corresponding linkage groups (Fig. 1). Two closely linked homologs of the rice thousand-grain weight 6 (TGW6) gene (Ishimaru et al. 2013), located on CHR15, also aligned to seed-size QTLs on LG15. Homologs of the maize barren inflorescence 2 (BIF2) gene (McSteen et al. 2007) and the dense and erect panicle 1 (DEP1) gene of rice and barley (Huang et al. 2009; Wendt et al. 2016) show alignments to INLE and SPIN QTLs on LG13 and LG14.

Homoeologous group 6 was tied with HG2 for having the highest SEAR LOD score, and all three HG6 linkage groups (LG16, LG17, and LG18) had significant ZAMA effects (Table 5). Three CONSTANS-like flowering orthogenes, designated CO2, CO5 and CO7 (Griffiths et al. 2003), were each present on CHR16, CHR17, and CHR18 except that CO5 was not present on CHR16. At least one or two of these CONSTANS genes aligned to each of the ZAMA QTLs on LG16 and LG17, but none aligned to the ZAMA QTL on the distal long arm of LG18. Orthologs of the growth-regulating factor 4 (GRF4) gene (Sun et al. 2016), Grain weight chromosome-2 QTL (GW2) gene (Song et al. 2007; Su et al. 2011), and a GDSL lipase (GLIP) gene aligned to seed-size QTLs on LG16, LG17, and LG18. The rice GRF4 gene encodes a growth and cytokinin-biosynthesis regulator that increases grain size and inflorescence length and decreases seed shattering (Sun et al. 2016). The GLIP locus on LG17 contained one of the most significant SELE QTL markers in the M26 × M35 family, TP678810, which was also associated with SEMA, SEAR, and SELE in the UMN AM population (Zhang et al. 2017). Some GLIP genes (Jiang et al. 2012) have roles in seed size and metabolism (Clauß et al. 2011; Huang et al. 2015; Lai et al. 2017; Ma et al. 2018; Tiwari et al. 2016).

Homoeologous group 7 was remarkable in that it was associated with a total of 21 seed-size QTLs, only one less than HG2, and the highest number of M26 × M35 seed-size QTLs (Table 5). Compared to the other HGs, more seed-size CGs and CG-QTL alignments were also found in HG7. Six relevant seed-size CGs including the rice chromosome-8 fertilization independent seed (FEI) gene (Kapazoglou et al. 2010; Nallamilli et al. 2013), rice grain width chromosome-8 QTL (GW8) gene (Wang et al. 2012), glucan, water-dikinase (GWD) gene (Ral et al. 2012; Shu and Rasmussen 2014), sucrose synthase 1 (SUS1) gene (Hou et al. 2014; Jiang et al. 2011), thousand-grain weight 6 (TGW6) gene (Ishimaru et al. 2013), and wheat thousand-grain weight chromosome-7A (TGW7A) gene (Hu et al. 2016) all aligned to seed-size QTLs in HG7 (Table 8). The maize teosinte glume architecture 1 (tga1) gene (Wang et al. 2005) and barley nud gene (Taketa et al. 2008), responsible for naked grains, were both found on LG19, LG20, and LG21 of HG7 but no significant IWG SENA QTLs were detected in HG7.

The S and Z self-incompatibility (SI) genes

The S and Z SI genes have been shown to affect seed-size and fertility traits in perennial ryegrass (Lolium perenne) if there is insufficient genetic variability in populations such as a full-sib mapping family (Studer et al. 2008), which raises concerns about possible effects of these genes in the full-sib IWG QTL mapping populations. The S and Z SI genes could also be yield-limiting factors in the broader IWG grain breeding populations, which purportedly incurred a population bottleneck of as few as 14 individuals (Wagoner 1990; Zhang et al. 2016). The S and Z genes are located on chromosomes that correspond to homoeologous groups 1 and 2, respectively, in both Lolium and Secale (Hackauf and Wehling 2005; Manzanares et al. 2016; Shinozuka et al. 2010; Thorogood et al. 2017). Seed fertility QTLs (SEFL and SESP) were detected on ten linkage groups (Table 5) including LG01 (HG1) and LG06 (HG2). In fact, LG06 displayed the highest LOD scores for several seed-size (SEMA, SEAR, and SELE), seed-fertility (SEFL and SESP), and seed-yield (SYIN) traits. Identification of S and Z genes has been difficult (Thorogood et al. 2017), but it is thought that a domain of unknown function (DUF247) gene is the pollen component of the S locus on HG1 (Manzanares et al. 2016). Another paralogous DUF247 gene (Shinozuka et al. 2010) and closely linked ubiquitin-specific protease (USP) gene (Hackauf and Wehling 2005) are considered the best candidates for the Z locus (Thorogood et al. 2017). Putative orthologs of the S (DUF247) gene were found on IWG CHR01, CHR02, and CHR03 of HG1 with alignments to seed-size QTLs (Fig. 1). Moreover, two markers with significant seed-mass effects in the AM population (Zhang et al. 2017) were located about 5 MB and 68 KB from the LG01 and LG02 S loci, respectively (Fig. 1). However, none of the putative IWG S genes aligned to SESP or SEFL fertility QTLs (Fig. 1).
Tightly linked DUF247 and USP genes on IWG CHR01 (HG1) and CHR06 (HG2) were homologous to tightly linked DUF247 and USP genes corresponding to the Z ortholoci on HG2 of Secale (Hackauf and Wehling 2005) and Lolium (Shinozuka et al. 2010). The putative Z locus on IWG CHR06 shows possible alignment near seed-size, fertility, and seed-yield QTLs on LG06, and it is presumably orthologous to the Z locus on HG2 of Secale (Hackauf and Wehling 2005) and Lolium (Shinozuka et al. 2010). Linkage group 6 (LG06) was the only LG that had significant QTLs for all four seed-size traits in both M26 × M35 and C3-2331 × C3-2595 families, and LG06 also had the highest LOD scores for SEFL and SESP (Table 5). However, it was also interesting that CHR06 was one of only three chromosomes that did not have any significant seed-size AM effects (Zhang et al. 2017), indicating that diversity of the putative IWG CHR06 Z locus was not limiting seed size in the AM population. Tightly linked DUF247 and USP genes on IWG CHR01 aligned to SESP and SYIN QTLs on LG01 (Fig. 1), but it is not absolutely clear whether this CHR01 (HG1) IWG locus is orthologous to the HG2 Z locus of Secale and Lolium (Hackauf and Wehling 2005; Shinozuka et al. 2010). Further research is needed to determine whether the S or Z SI genes are affecting grain size and grain yields in the IWG grain breeding populations, and future experimental research should be designed to consider this question.

Prospects for gene identification

Recent advancements in DNA sequencing and genomics have enabled scientists to pinpoint genes and chromosome regions that distinguish crops from their wild progenitors, with promising applications to a multitude of additional crop species (Tang et al. 2010). Development of high-density GBS linkage maps (Kantarski et al. 2017) and a draft genome sequence will facilitate identification of CGs associated with IWG domestication QTLs, but it is also recognized that most of these QTLs span large regions of the genome containing many other genes (Fig. 1). Additional sequencing of M26, M35, C3_3471, and other IWG reference plants will facilitate identification of mutations in these CGs, some of which may have large or dramatic effects (Tang et al. 2010). Moreover, relatively low levels of linkage disequilibrium and high levels of outcrossing coupled with recent advances in DNA sequencing and genotyping will facilitate genetic testing of CG variants by association mapping in IWG (Zhang et al. 2017). This approach was used to identify or test CGs of wheat (Hou et al. 2014; Hu et al. 2016; Jiang et al. 2011; Ma et al. 2016; Su et al. 2011; Zhang et al. 2012a), barley (Shu and Rasmussen 2014), maize (Gallavotti et al. 2004), and rice (Li et al. 2011; Wang et al. 2015; Weng et al. 2008; Zhang et al. 2012a) examined in this study (Table 8). However, it is likely that many of the IWG QTLs and AM effects are caused by other genes not examined in this study or perhaps not recognized in any other crop species. Positional cloning was used to identify many CGs of wheat (Faris et al. 2003; Ishimaru et al. 2013; Yan et al. 2003, 2004), barley (Komatsuda et al. 2007; Vu et al. 2010), rice (Ashikari et al. 2005; Huang et al. 2009; Li et al. 2011; Song et al. 2007; Sun et al. 2016; Wang et al. 2008, 2012, 2015; Weng et al. 2008; Wu et al. 2016; Zhang et al. 2012b), and Lolium (Manzanares et al. 2016; Shinozuka et al. 2010) examined in this study (Table 8).
Positional cloning has been particularly successful in rice in part because of its relatively small genome size, but recent advances in DNA sequencing will accelerate progress in large-genome species including wheat, barley, and IWG.

Comparisons of QTL results from the M26 × M35 and C3-2331 × C3-2595 families identified possible targets for positional cloning. The M26 × M35 family was specifically developed to examine the unique combination of seed-retention and naked-seed traits found in C3_3471, by crossing C3_3471 with a more primitive plant, C3_3941, and testing for segregation of QTLs from the F1 hybrid parent, M26. Results of this experiment indicate that this unique combination of SESH and SENA traits, observed in C3_3471 (parent of M26), involved a combination of at least two QTLs located on LG10 and LG11, which had synergistic M26-α effects on both traits (Table 6). The LG11 SESH QTL had the highest LOD score for any trait (Table 5), and it may be a reasonable target for positional cloning especially if this QTL could be isolated in an otherwise isogenic or neutral background. Other possible targets for positional cloning are the SENA QTL on LG02 and seed-size and seed-yield QTLs on LG06, but there are also some potentially good CGs for these QTLs that should be considered as discussed above. The SENA QTL on LG02 had relatively large effects as a percentage of the mean (Table 6), meaning that it may have high breeding value. However, the LOD score of the LG02 SENA QTL was not so exceptional (Table 5), meaning that it may be difficult to obtain the mapping precision required for positional cloning unless measurements can be improved or the experimental complexity can be reduced. Relatively strong seed-size and seed-yield QTLs were associated with LG06 in both M26 × M35 and C3-2331 × C3-2595 families (Fig. 1, Table 5), and the LG06 SELE QTL was the only QTL that was significant across all five environments (Table 4), but additional research is needed to determine whether these LG06 QTLs were caused by the Z SI gene or some other candidate genes such as GIF, SDW3, SUS2, or GL7 (Fig. 1) before positional cloning should be considered.

Implications for IWG domestication and improvement

Transgressive genetic variation, where progenies (Table 2) exceed parents (Table 1), and 111 significant QTLs were observed for 17 traits related to seed size, reproductive fertility, inflorescence capacity, stem and inflorescence compactness, seed retention in the field, percentage of naked seeds after threshing, and maturity in the M26 × M35 family. Theoretically, fixation of the optimum QTL genotypes could improve trait means in the M26 × M35 family (Table 2) by an average of 12–118% across all five environments (Table 6), but this would require a uniform hybrid with one copy of the best M26 QTL allele and one copy of the best M35 allele for each QTL. It may be possible to make greater improvements by fixing the best M26 allele or the best M35 QTL allele, as homozygous genotypes, but the breeding values of these genotypes are unknown because all progenies contained one M26 allele (a or b) and one M35 allele (c or d). Thus, marker-assisted selection for most of these traits may be challenging considering the number and complexity of QTL effects and the fact that the M26 × M35 family is a small sample of the genetic variation present in genetically heterogeneous IWG grain breeding populations (Zhang et al. 2016). However, comparisons of QTLs and traits from the M26 × M35 family to those of fully domesticated grain crops, such as wheat, will elucidate pathways for successful IWG domestication.
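The genotype constraint described above can be illustrated with a minimal sketch. It enumerates the four genotype classes possible at a single QTL in a full-sib family (one M26 allele, a or b, paired with one M35 allele, c or d) and computes the gain from fixing the best class relative to the family mean. The class means are hypothetical values chosen only for illustration, not data from this study, and equal class frequencies are assumed.

```python
from itertools import product

# Alleles at one QTL: M26 carries (a, b) and M35 carries (c, d), as in the text.
m26_alleles = ("a", "b")
m35_alleles = ("c", "d")

# Every full-sib progeny inherits one M26 allele and one M35 allele,
# so homozygous classes such as "aa" or "cc" never occur in the family.
genotype_classes = ["".join(pair) for pair in product(m26_alleles, m35_alleles)]

# HYPOTHETICAL class means for an arbitrary trait (not data from this study).
class_means = {"ac": 6.0, "ad": 7.5, "bc": 5.0, "bd": 5.5}

# Family mean, assuming the four classes are equally frequent.
family_mean = sum(class_means.values()) / len(class_means)

# Best genotype class and its gain over the family mean, in percent.
best_class = max(class_means, key=class_means.get)
gain_percent = 100 * (class_means[best_class] - family_mean) / family_mean

print(genotype_classes)                    # ['ac', 'ad', 'bc', 'bd']
print(best_class, round(gain_percent, 1))  # ad 25.0
```

Because only these four classes are observed, the values of homozygous genotypes cannot be estimated from the family, which is why their breeding values remain unknown.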

Grain weights of wheat normally range from about 27 to 60 mg per grain among wheat landraces and cultivars (Abbo et al. 2014; Gegas et al. 2010; Ma et al. 2016) with up to 65 seeds per spike (Wang et al. 2010). Seed yield potential of wheat normally ranges from about 1500 to 2400 mg per spike, but up to 5000 mg per spike has been reported for Tibetan Triple-Spikelet wheats that have up to 121 seeds per spike (Yang et al. 2005). With up to 37 spikelets per inflorescence and four seeds per spikelet observed on some IWG genets, the total number of seeds possible on each spike (148) exceeds that of wheat or barley. With nearly 9 mg per seed on some M26 × M35 IWG genets (Table 2) and up to 12 or 14 mg per seed on other IWG plants (Cattani and Asselin 2018; Zhang et al. 2017), the maximum theoretical seed yield per spike would be nearly 2100 mg. The best M26 × M35 genets produced no more than 525 mg of seed per spike, but up to 932 mg of seed per spike has been reported for some IWG plants (Cattani 2017). With an average of about seven (Table 2) to nine (Cattani and Asselin 2018) florets per spikelet, up to 37 spikelets per head (Table 2), and grain weights of 12 to 14 mg, the maximum theoretical seed yield per spike for IWG would be from 3100 to 4600 mg if every floret produced seed. However, we never observed more than 49% floret fertility in the M26 × M35 family (Table 2). Fertility is considered one of the limiting factors in grass seed production (Armstead et al. 2008), and we observed a relatively large number of QTLs with relatively large QTL effects for SEFL and SESP traits in the M26 × M35 family (Tables 5 and 6). These findings indicate that selection for greater fertility may be a promising avenue to improve IWG seed yields, with potential for matching the seed yield per spike of wheat, but domesticated crops often have fewer and larger fruits or grains compared to their progenitors (Doebley et al. 2006).
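The per-spike ceilings quoted above follow from multiplying spikelets per spike, seeds (or florets) per spikelet, and seed weight. The short sketch below reproduces that arithmetic using only figures stated in the text; the function name is illustrative.

```python
def max_seed_yield_mg(spikelets_per_spike, seeds_per_spikelet, mg_per_seed):
    """Upper bound on seed yield per spike (mg) if every counted seed site fills."""
    return spikelets_per_spike * seeds_per_spikelet * mg_per_seed

# Ceiling from observed seed set: 37 spikelets x 4 seeds per spikelet at 14 mg per seed.
observed_sites = 37 * 4
observed_ceiling = max_seed_yield_mg(37, 4, 14)

# Ceiling if every floret set seed: 7 to 9 florets per spikelet, 12 to 14 mg per seed.
full_low = max_seed_yield_mg(37, 7, 12)
full_high = max_seed_yield_mg(37, 9, 14)

print(observed_sites)       # 148
print(observed_ceiling)     # 2072
print(full_low, full_high)  # 3108 4662
```

These values correspond to the rounded figures in the text: 148 seeds per spike, nearly 2100 mg per spike, and roughly 3100 to 4600 mg per spike at full floret fertility.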

Seed retention, free-threshing (naked) seed, and increased seed size are all key domestication or improvement traits of most grain crops (Abbo et al. 2014; Doebley et al. 2006; Gegas et al. 2010; Kovach et al. 2007; Lenser and Theißen 2013; Liu et al. 2016; Meyer and Purugganan 2013) including IWG. Compared to other traits evaluated in this experiment, IWG seed shattering (SESH) was controlled by a relatively small number of QTLs with relatively large QTL effects (Table 6), including the highest LOD score in this study (Table 5), which is similar to observations in other grasses and grain crops (Doebley et al. 2006; Doust et al. 2014; Kovach et al. 2007; Larson and Kellogg 2009; Pourkheirandish et al. 2015; Simons et al. 2006). Conversely, genetic factors controlling seed-threshing (SENA), seed-size (SEMA, SEAR, SELE, and SEWI), and other traits such as maturity (ZAMA) were substantially more complex. Compared to other traits, SENA and ZAMA were controlled by a relatively large number of QTLs (Tables 5 and 6) that explained more phenotypic variation in the M26 × M35 family (Table 7). Although we only detected four SEMA QTLs, there were a total of at least 12 loci affecting at least one of the four seed-size traits (SEMA, SEAR, SELE, and SEWI). Interestingly, the relative magnitudes of Avg-total QTL effects (Table 6) and PVE (Table 7) for SENA and SEMA were similar to the expected selection gains of 181% and 60% over five cycles of selection for these two traits, respectively (DeHaan et al. 2018). The average seed weights of IWG forage varieties, ranging from about 2–8 mg per seed (Berdahl and Frank 1998; Schulz-Schaeffer and Haller 1987), overlap with those of IWG plants selected for grain production, which range from 3 to 14 mg per seed as reported in this and other studies (Cattani and Asselin 2018; Zhang et al. 2017). Similar overlap in seed sizes exists among wild and domestic forms of barley, wheat, and rice (Abbo et al. 2014; Fuller 2007; Gegas et al. 2010), but kernel weight has reportedly increased tenfold during the domestication of maize (Liu et al. 2016). Evidence from this and other studies demonstrates that IWG seed size can be improved with some rapid initial gains (Cattani and Asselin 2018; DeHaan et al. 2018; Zhang et al. 2017), but additional improvements in seed size may be slow and incremental or require new discoveries and breakthroughs. Conversely, relatively strong QTL effects for SENA and SESH, discussed above, suggest that free-threshing (naked) seed and strong seed retention are attainable domestication traits (Abbo et al. 2014; Doebley et al. 2006) that will clearly distinguish wild and domestic forms of IWG.

Conclusions

A total of 111 QTLs were detected for 17 variable traits in the M26 × M35 family including several large-effect QTLs responsible for critical IWG domestication and improvement traits related to fertility, inflorescence architecture, plant height, seed retention, seed size, and seed threshing. The magnitude of M26 × M35 QTL effects, heritabilities, and range of phenotypic variation observed in this and other studies (Cattani and Asselin 2018; Zhang et al. 2017) demonstrate the potential for fixing critical domestication traits, including seed retention and free-threshing (naked) seed, and improving other important grain production traits in IWG. With up to four possible alleles for each M26 × M35 QTL and even greater complexity in genetically heterogeneous breeding populations, the prospects for marker-assisted selection are not certain. However, identification of genes and loci directly associated with critical IWG domestication and improvement traits will enable better management and utilization of IWG germplasm. A total of 42 domestication orthogenes, including the wheat free-threshing Q (Simons et al. 2006) and reduced-height green revolution (Peng et al. 1999) genes, aligned to one or more relevant QTLs in this experiment. Closely linked DUF247 and USP genes on IWG CHR06 corresponding to the Z self-incompatibility locus of Secale and Lolium (Hackauf and Wehling 2005; Shinozuka et al. 2010) showed possible alignment to seed-size, fertility, and seed-yield QTLs on IWG LG06, suggesting that diversity of SI genes may be a limiting factor for seed production in the full-sib IWG families. A large-effect QTL, with a LOD score of nearly 15 for seed shattering, had synergistic effects resulting in greater seed retention and more free-threshing (naked) seed in the M26 × M35 family. Although no candidate genes were associated with this large-effect QTL, recent advancements in genome sequencing and genotyping provide useful approaches, as used in other crops (Tang et al. 2010), to pinpoint genes or loci responsible for domestication traits in IWG. Herein, two DNA markers with relatively strong effects on IWG seed size across independent QTL and association-mapping experiments (Zhang et al. 2017) were located directly within IWG GDSL lipase (Clauß et al. 2011; Huang et al. 2015; Jiang et al. 2012; Lai et al. 2017; Ma et al. 2018; Tiwari et al. 2016) and xylogalacturonan (XGA) xylosyltransferase (Jensen et al. 2008; Zhang et al. 2015) orthogenes, which may be plausible candidate genes for these IWG seed-size QTLs. Results of this study demonstrated the increasing power of high-density genotyping (Kantarski et al. 2017), genome sequencing, and QTL mapping to elucidate pathways for the domestication and improvement of a new and genetically complex perennial grain crop.

Author contribution statement

SL and LD conceived the M26 × M35 QTL experiment and conducted field evaluations. SL performed QTL analyses and wrote the manuscript. JP and TK genotyped the M26 × M35 plants. XZ and JA conducted C3-2331 × C3-2595 QTL and AM experiments and provided unpublished data from these experiments. TK, SL, XZ, LD, JA, and JP developed linkage maps used for all QTL and AM analyses. JP, LD, KD, and SL conceived the IWG genome sequencing project as described in JGI Proposal Id. 1997. JS, JG, JJ, and SS assembled and annotated the IWG draft genome sequence. JC and MR assisted with data management and data analysis. KJ provided guidance and support for field and trait evaluations in Utah. All authors read and reviewed the manuscript.