Introduction

Flowering time in the grasses is a complex trait that controls adaptation of plants to their environment by tailoring vegetative and reproductive growth phases to local climatic effects. In determinate plants, such as cereal crops, flowering needs to occur when conditions for pollination and seed development are optimal, and consequently most plants restrict flowering to a specific time of the year. This control is commonly achieved by using two environmental cues, day length (photoperiod) and temperature, in addition to endogenous cues, primarily related to development. The response of the plants to these different cues is controlled by four interacting pathways: photoperiod, temperature, autonomous and plant hormonal pathways (e.g. Mouradov et al. 2002; Amasino and Michaels 2010). The photoperiod pathway gives broad adaptation to a particular climatic region based on day length cues, with flowering occurring faster either as day length declines (short day plants) or as day length increases (long day plants). The temperature pathway controls the fine scale progression to flowering, reacting to local changes in temperature that may occur in one season and not the next. In contrast, the autonomous pathway promotes flowering indirectly by facilitating responsiveness to cues that actively promote flowering (Simpson 2004), and the hormonal pathway controls the timing of flowering in relation to reproductive development. Farmers have exploited variation in flowering time to expand the areas where particular crops are grown, well beyond their natural range, and to fine-tune adaptation of particular varieties to specific environmental and management conditions.

Sorghum is a short day C4 grass native to Africa, grown as a crop for its grain, forage and biomass. Under domestication, its range as a cereal crop extends from the equator up to approximately 50° of latitude, and it is grown at elevations up to 2,500 m. To achieve such wide adaptation, flowering time in cultivated grain sorghum varieties ranges from as little as 45 days to more than 120 days. To create varieties that can be grown effectively in temperate environments, breeders in developed countries have systematically eliminated photoperiodism from elite grain sorghum germplasm (Chantereau et al. 2001).

Because of its quantitative nature, economic importance and ease of scoring, the genetic basis of flowering time has been extensively studied in sorghum. Classical genetic studies have identified six major maturity loci: Ma1, Ma2, Ma3 and Ma4 (Quinby 1967), and Ma5 and Ma6 (Rooney and Aydin 1999). The latter two interact epistatically to increase photoperiod sensitivity and lengthen the duration of the vegetative phase. Sorghum is grown as a hybrid crop in the developed world, and as a result the degree of dominance and heterosis for flowering time is of practical importance. If photoperiod effects are excluded, studies indicate that the genetic control of flowering time in sorghum is largely additive, although a degree of dominance is also observed, leading to hybrid vigour for early flowering, with estimates of mid-parent heterosis varying from 1 to 6 % (Kirby and Atkins 1968; Liang and Walter 1968; Wenzel 1988).

The underlying genes and molecular mechanisms controlling flowering have been more extensively studied in the model dicot Arabidopsis (e.g. Greenup et al. 2009; Imaizumi 2010), with over 80 genes in the four parallel gene pathways described and characterised (e.g. Flowers et al. 2009; Higgins et al. 2010; Putterill et al. 2004). Comparative genomics studies have identified conserved homologues of the genes in these pathways in other cereals, including rice, Brachypodium and barley and in particular, aspects of the photoperiod and autonomous pathways are well conserved across species (e.g. Higgins et al. 2010; Lagercrantz 2009; Colasanti and Coneva 2009; Cockram et al. 2007; Jung and Muller 2009).

Although the number of genes present in the flowering time pathway of the Arabidopsis model provides a likely upper limit to the number of flowering genes present in sorghum, it is likely that the number of genes that have allelic series and can therefore be detected as QTL is lower. Theoretically, only a relatively small number of genetic loci with allelic series would be required to generate the range of phenotypic variation in flowering observed in cultivated sorghum. The actual number of loci and their allelic diversity will be a product of the diversity within the wild population and selection pressures that occurred during domestication. Potentially, these pressures may have reduced variation at some loci; alternatively, the maintenance of variation at many loci may have been favoured because of the flexibility it provides farmers to fine-tune their varieties while allowing selection for QTL for other traits linked to flowering time loci.

Since the advent of molecular marker technologies over two decades ago, 14 QTL and association mapping studies for flowering time in sorghum have been undertaken (Brown et al. 2006; Chantereau et al. 2001; Crasta et al. 1999; Feltus et al. 2006; Hart et al. 2001; Kebede et al. 2001; Kim 2003; Lin et al. 1995; Mannai et al. 2011; Murray et al. 2008; Parh 2005; Ritter et al. 2008; Shiringani et al. 2010; Srinivas et al. 2009). The majority of these sorghum QTL studies have been based on crosses between parental lines selected for their diverse phenotypes (often interspecific crosses). To date, these 14 studies have identified 78 QTL for flowering time, representing approximately 27 unique genomic regions, with an average of 5.8 QTL detected per study. In addition, a recent positional cloning study (Murphy et al. 2011) identified the causal gene at the Ma1 locus in sorghum. However, most of what we know to date about the genetic architecture of flowering time in sorghum is based on traditional QTL linkage mapping in bi-parental populations, and while such bi-parental crosses provide valuable insights, the current view of the genetic architecture of flowering time in sorghum, particularly as it pertains to cultivated sorghum, is very limited. A more global investigation of the genetic control of this complex trait requires resources that enable multiple alleles to be detected simultaneously. A large sorghum backcross nested association mapping (BC-NAM) population has recently been developed (Jordan et al. 2011). The BC-NAM population captures a substantial proportion of the global genetic diversity of sorghum inbred lines as measured by DArT markers (Jordan et al. 2011) and provides the power to resolve complex, quantitative traits. The sorghum BC-NAM panel consists of a set of 4,000 BC-RILs from crosses between the elite line R931945-2-2 and 56 other diverse inbreds. A subset of this resource has been genotyped with DArT markers. The high allele diversity and large sample size provide power for detection and resolution of QTL. A comparable NAM resource has recently been developed in maize (Buckler et al. 2009; McMullen et al. 2009) consisting of 25 recombinant-inbred line (RIL) populations, based on 25 founder lines each crossed in a half-sib design to the common reference parent B73. The power of the maize NAM resource has been successfully demonstrated through the genetic dissection of a number of complex traits, including flowering time (Buckler et al. 2009), northern (Poland et al. 2011) and southern (Kump et al. 2011) leaf blight resistance, leaf architecture (Tian et al. 2011) and kernel composition (Cook et al. 2012).

The objective of the current study was to use the sorghum BC-NAM population to dissect the genetic architecture of flowering time in sorghum, to compare the QTL identified with genomic regions identified in previous sorghum and maize flowering time studies, and to highlight the implications of our findings for sorghum domestication and breeding.

Materials and methods

Germplasm

The study used a backcross nested association mapping population (BC-NAM) previously developed by the Department of Agriculture, Fisheries and Forestry (DAFF), formerly the Queensland Department of Primary Industries (QDPI) (Jordan et al. 2011). The development of the population is described in detail by Jordan et al. (2011), and the 24 BC-NAM families used in the current study are detailed in Table 1. The 24 diverse lines used for population development for the current study included 16 lines from breeding programs around the world, 6 landraces converted to temperate adaptation by the sorghum conversion program (Stephens et al. 1967), one landrace and one weedy species. The sorghum BC-NAM used for this study is a reference design, with each subpopulation being produced by crossing a single elite parent (R931945-2-2) with a diverse line and backcrossing the resulting F1 to a genetic male sterile (ms3) version of the elite parent to produce a large BC1F1 population. These seeds were grown in a single long row of well-spaced plants. Fifty to one hundred plants were selected from each row on the basis of appropriate agronomic type. The selection focused predominantly on choosing plants with height and maturity within the acceptable range required for performance in Australian cropping systems. Each selected BC1F1 plant was self-pollinated and the resulting BC1F2 seed from each plant was grown in a 4-m row containing approximately 40 plants. One fertile plant was selected from each row and self-pollinated. Selection was again imposed for appropriate agronomic type; this reduced genetic variance but did not generate non-normality in the phenotype within individual populations, so the data remain valid for analysis. This process of head to row selection was repeated once more to produce BC1F4 seed. Selection for male fertility at each generation ensured that most rows were homozygous for the fertile allele of ms3. The BC1F4 seed from between 30 and 90 lines from each population was grown in single rows, and top cross hybrid seed was produced by taking bulk pollen from each BC1F4 row and crossing it to a cytoplasmic genetic male sterile line (B923296). The female parent was chosen to represent female parental lines typically used in Australia with good general combining ability for grain yield, sorghum midge resistance and stay green.

Table 1 Description of the characteristics of the diverse parental lines used to develop the 24 BC-NAM families included in the current study

Phenotypic data and testing

Data in this paper were produced from trials grown in the Australian summer cropping seasons of 2004/05, 2005/06, 2006/07 and 2007/08. Hybrids were evaluated each year in trials planted at ten locations across the 4 years with 20 trials overall in Queensland and Northern NSW covering the region between 29 and 32 degrees of latitude (Table 2; ESM Table S1). Trials were planted between November and February each season. Days to flowering was defined as the time taken from planting for 50 % of the plants in a plot to commence flowering. There were 14, 4, 14 and 12 BC-NAM families in each of the four seasons, respectively, making an unbalanced set of data for an overall comparison of BC-NAM families.

Table 2 Description of the phenotypic field trial data, including the total number of genotypes per trial, the number of BC-NAM families per trial, the number of BC-NAM individuals per trial, the mean days to flowering (DTF) per trial, the genetic variance, heritability and loading per trial site

Statistical analysis of breeding trials

The trials used partially replicated designs (Cullis et al. 2006) with the number of genotypes ranging from 674 to 1,197 across the locations, with 30 % of the genotypes replicated two or more times and single plots of the remaining 70 %. Each individual trial had a different design and each design attempted to diminish the possibility of spatial error effects occurring within each trial. Concurrence of genotypes and populations between years allows groups of trials from multiple years to be analysed as a single multi-environment trial (MET). A MET containing the 20 trials was analysed by fitting a linear mixed model using the package ASReml (Butler et al. 2009) and the R statistical software (R Development Core Team 2012). The model consists of a fixed effect for flowering at each trial, random effects for trial by genotype and the spatial error for each individual trial using procedures, such as those discussed in Smith et al. (2001). The genotype by environment (GxE) interaction was examined by fitting a second-order factor analytic structure to the trial by genotype interaction. The MET included additional genotypes that were in the trials, but not included for this study. Although some trials included only a very small number of BC-NAM families, for example trials from 2006, all the data from these trials was included to establish a more extensive analysis of the relationship between each trial. The analysis resulted in a genetic variance for each trial along with a set of loadings that represent factor analytic loadings. The flowering times for the trials showed high levels of correlation across sites thus ranking the genotypes similarly. As a result, a single estimate of flowering time was produced for each genotype using the MET analysis.

Generalised repeatability estimates on a hybrid mean basis were calculated for flowering time using the method proposed by Cullis et al. (2006) (Table 2). This method was used because the standard definitions of heritability assume balanced data, where trials are analysed using models with independent random effects for blocks and plots and are not applicable to the partially replicated designs used in this study. For example, the standard definitions of heritability do not allow for the use of spatial models that were used for these analyses. Such models, by their nature, imply complex variance covariance structures pertaining to experimental units.
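
As an illustration of the kind of calculation involved, the generalised measure of Cullis et al. (2006) is commonly written as one minus the average variance of a difference between two genotype BLUPs divided by twice the genetic variance. The sketch below assumes that form; the function name and the input values are hypothetical and are not taken from the trials reported here.

```python
def generalised_heritability(avg_var_blup_difference, genetic_variance):
    """Generalised heritability in the form usually attributed to Cullis et al. (2006).

    avg_var_blup_difference: average variance of the difference between two
        genotype BLUPs (obtained from the fitted mixed model).
    genetic_variance: estimated genetic variance for the trial.
    Both inputs are assumed to come from a mixed-model fit; they are not computed here.
    """
    if genetic_variance <= 0:
        raise ValueError("genetic variance must be positive")
    return 1.0 - avg_var_blup_difference / (2.0 * genetic_variance)


# Hypothetical values, for illustration only
example_h2 = generalised_heritability(avg_var_blup_difference=4.0, genetic_variance=10.0)
print(round(example_h2, 2))
```

Because the measure is built from the prediction error of BLUP differences, it does not require balanced data or independent block and plot effects, which is the property the paragraph above relies on.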

Markers

Five plants of each of the 1,389 individuals from the 24 BC-NAM families (BC1F4 generation) were bulked for DNA extraction and screened with 1,005 DArT markers, following the methodology described in Mace et al. (2008) and Jordan et al. (2011). A subset of 932 high quality markers, with minimal missing data across all lines, were selected for subsequent analyses, spanning 95 % of the consensus genetic linkage map distance (Mace et al. 2009a), averaging 0.6 markers/1 cM (Table 3). The number of polymorphic markers per BC-NAM family ranged from 192 to 489 (ESM Table S2).

Table 3 Details of DArT markers used genome-wide in BC-NAM data analysis

QTL analyses

Owing to the small size of the individual populations and the selection imposed for maturity and height genes during their development, the standard mapping approaches typically applied to bi-parental crosses were not appropriate. Therefore, we used two different approaches, one novel and one already established, and compared the results of the two methodologies.

Multi-populational genomic regional QTL analysis (mpQTL)

The novel mpQTL analysis method calculated an integrated probability statistic to highlight regions of the genome that make a key contribution to the variation in flowering time in the BC-NAM families studied, through a two-step process. The first step involved single marker analysis across the individual BC-NAM families. Not all BC-NAM families were present in all trial years (ESM Table S1), and markers were also unbalanced across BC-NAM families, with polymorphism varying across the BC-NAM non-recurrent parental (NRP) lines (ESM Table S2); therefore, an individual subset of polymorphic markers was selected for each BC-NAM family. A linear mixed model was then fitted for each individual marker within each BC-NAM family. For a more extensive description of this model, see ESM file S1. Each marker was fitted as a fixed effect, and random effects included trial by genotype, spatial effects within each trial and error effects within each trial. For each marker, the model estimates a statistic that can be translated into a P value that tests the hypothesis that the two marker alleles have equal flowering effect within the specific BC-NAM family. At the completion of these analyses, a table of P values for marker by BC-NAM family was generated.

To combine the results of the single marker analysis across BC-NAM families, the most significant marker was selected from a sliding window of length 5 cM and a step of 1 cM. This stepwise process was performed on each BC-NAM family separately to generate a series of probability values spaced 1 cM apart along each chromosome. Fisher’s combined probability test (Fisher 1932) was then applied to combine the results of the single marker regression across all the BC-NAM families to generate a single probability value representing the approximate presence of significance at each 1 cM point along each LG. In order to overcome the imbalance across the BC-NAM families at each marker location, a false discovery rate adjustment was made to the Fisher P values to allow a consistent 0.01 % significance value (−log Fisher P value of >3) to declare QTL significance.
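
The second step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the 5 cM window is centred on each 1 cM grid point, omits the false discovery rate adjustment, and uses hypothetical function names and marker P values. Fisher's statistic is X = −2 Σ ln p, which under the null hypothesis follows a chi-square distribution with 2k degrees of freedom for k combined P values.

```python
import math


def window_min_p(marker_p, start_cm, end_cm, window=5.0, step=1.0):
    """Smallest P value among markers inside a window centred on each grid point.

    marker_p: list of (cM position, P value) for one family on one chromosome.
    Grid points with no marker inside the window are omitted.
    Centring the window on the grid point is an assumption of this sketch.
    """
    out = {}
    n_steps = int(round((end_cm - start_cm) / step))
    for i in range(n_steps + 1):
        pos = start_cm + i * step
        in_window = [p for cm, p in marker_p if abs(cm - pos) <= window / 2.0]
        if in_window:
            out[pos] = min(in_window)
    return out


def fisher_combined_p(p_values):
    """Fisher's combined probability for k independent P values (each > 0)."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Closed-form chi-square survival function for even degrees of freedom 2k
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def combine_families(per_family_windows):
    """Combine per-family window P values at every grid point where data exist."""
    positions = sorted(set().union(*[w.keys() for w in per_family_windows]))
    combined = {}
    for pos in positions:
        ps = [w[pos] for w in per_family_windows if pos in w]
        combined[pos] = fisher_combined_p(ps)
    return combined


# Hypothetical single-marker P values for two families on one chromosome
family_a = [(1.0, 0.20), (3.5, 0.01), (9.0, 0.60)]
family_b = [(2.0, 0.04), (8.0, 0.50)]
windows = [window_min_p(f, 0.0, 10.0) for f in (family_a, family_b)]
combined = combine_families(windows)
for pos in sorted(combined):
    print(pos, round(-math.log10(combined[pos]), 2))
```

Because only families with a marker inside the window contribute at a given grid point, the number of P values combined varies along the chromosome, which is the imbalance the adjustment described above is intended to address.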

Association mapping (AM)

To assist with the confirmation of our results, a secondary, more standard analysis was performed. A mixed model marker–trait association analysis was performed using the statistical package GenStat (VSN International 2011). Population structure was accounted for as a fixed effect using the subpopulation grouping option (equating to each BC-NAM family). In the majority of cases, there was little or no pedigree information for the exotic parents and hence our parentage structure only has a single generation which is equivalent to the sub-population grouping. The model for marker–trait association included an intercept term, the effects associated with each BC-NAM family, random effects of genotype and the fixed effects of the tested markers. The Wald statistical test was then used for each marker, individually, to test the null hypothesis that the marker’s effect was zero. A 1 % false discovery rate (−log10 P = 2) was used to declare QTL significance.
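
The structure of this test can be illustrated with a deliberately simplified stand-in. The sketch below fits BC-NAM family as a fixed effect by centring within family and tests one marker with a Wald statistic; it uses ordinary least squares rather than the GenStat mixed model, omits the random genotype effect, and all names and data are hypothetical.

```python
import math
from collections import defaultdict


def wald_marker_test(families, genotypes, phenotypes):
    """Wald test of a single marker effect with family fitted as a fixed effect.

    families: family label per line.
    genotypes: marker score per line (e.g. 0 = recurrent allele, 1 = donor allele).
    phenotypes: flowering time per line.
    Returns (effect estimate, Wald statistic, P value from chi-square with 1 df).
    Simplification: ordinary least squares, no random genotype effect.
    """
    groups = defaultdict(list)
    for fam, g, y in zip(families, genotypes, phenotypes):
        groups[fam].append((g, y))

    centred = []
    sxx = sxy = 0.0
    for rows in groups.values():
        g_mean = sum(g for g, _ in rows) / len(rows)
        y_mean = sum(y for _, y in rows) / len(rows)
        for g, y in rows:
            xc, yc = g - g_mean, y - y_mean  # removes the family effect
            centred.append((xc, yc))
            sxx += xc * xc
            sxy += xc * yc
    if sxx == 0.0:
        raise ValueError("marker is monomorphic within every family")

    beta = sxy / sxx
    rss = sum((yc - beta * xc) ** 2 for xc, yc in centred)
    df_resid = len(centred) - len(groups) - 1
    if df_resid <= 0 or rss == 0.0:
        raise ValueError("not enough residual information for a Wald test")
    sigma2 = rss / df_resid
    wald = beta ** 2 / (sigma2 / sxx)
    p_value = math.erfc(math.sqrt(wald / 2.0))  # chi-square (1 df) upper tail
    return beta, wald, p_value


# Hypothetical data: two families, one marker
fams = ["A"] * 4 + ["B"] * 4
geno = [0, 0, 1, 1, 0, 1, 1, 1]
dtf = [60.0, 61.0, 62.5, 63.0, 58.0, 59.5, 60.5, 60.0]
effect, wald_stat, p = wald_marker_test(fams, geno, dtf)
print(round(effect, 3), round(-math.log10(p), 2))
```

A marker would be declared significant in this sketch when −log10 P exceeds the chosen threshold, mirroring the threshold stated above.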

Projecting maize QTL onto the sorghum consensus map

A total of 75 significant markers were identified by Buckler et al. (2009) associated with flowering time in maize (36 significant marker/trait associations for days to anthesis (DA; male flowering) and 39 significant marker/trait associations for days to silking (DS; female flowering)), representing a total of 46 unique genomic locations. The significant markers identified were sequence mapped with an in silico mapping strategy analogous to e-PCR (Schuler 1998), using BLAST similarity (Altschul et al. 1990) between the maize marker and the sorghum whole genome sequence (WGS), conducted via Phytozome (http://www.phytozome.net) and requiring hits with E ≤ 1e−10 based on BLASTn. The predicted cM locations of the identified physical locations of the maize markers on the sorghum genome were calculated using a framework map based on the 504 sequenced markers on the consensus map, averaging 1 marker/3.1 cM or 1 marker/1.3 Mbp, as detailed in Mace and Jordan (2011).
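
One simple way to predict a cM location from a physical location is to interpolate linearly between the two flanking framework markers. The sketch below assumes this approach, which may differ from the procedure in Mace and Jordan (2011); the function name and the framework coordinates are hypothetical.

```python
from bisect import bisect_right


def predict_cm(physical_bp, framework):
    """Predict a genetic position (cM) from a physical position (bp).

    framework: list of (bp, cM) pairs for sequenced markers on one chromosome,
        sorted by bp with unique bp values.
    Positions outside the framework are clamped to the nearest end marker,
    which is an assumption of this sketch.
    """
    bps = [bp for bp, _ in framework]
    i = bisect_right(bps, physical_bp)
    if i == 0:
        return framework[0][1]
    if i == len(framework):
        return framework[-1][1]
    (bp0, cm0), (bp1, cm1) = framework[i - 1], framework[i]
    fraction = (physical_bp - bp0) / (bp1 - bp0)
    return cm0 + fraction * (cm1 - cm0)


# Hypothetical framework markers on one chromosome: (bp, cM)
framework_map = [(1_000_000, 2.0), (3_000_000, 8.0), (6_000_000, 11.0)]
print(predict_cm(2_000_000, framework_map))
print(predict_cm(4_500_000, framework_map))
```

Linear interpolation assumes a constant recombination rate between adjacent framework markers, so predictions are least reliable where framework markers are sparse or recombination varies sharply, as in heterochromatic regions.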

Results

Phenotypic assessment of flowering time in BC-NAM families

A summary of the results from the MET analysis is presented in Table 2. There was a difference of 30 days between the earliest flowering trial (Bil2007; 46.9 days) and the latest flowering trial (Dal2008; 77.4 days). The individual trial repeatabilities were all greater than 75 %, with half of the trials having a repeatability greater than 80 %, indicating that in general the analysis explained the majority of the genetic variation within each trial. The second-order factor analytic model explained 81 % of the total variation in these data with only two factor components; the first of these factors explained 76 % and had an average trial loading of 0.84 and an average between-trial correlation of 0.72. Trial loadings were all higher than 0.7; these values indicate that in general all sites ranked the genotypes in a similar way, indicating minimal genotype by environment interaction, so that an across-site average for each genotype would be representative of the performance of these genotypes and could be used for further study.
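
The two summary quantities quoted for the factor analytic model can be computed from the estimated loadings and trial-specific variances. The sketch below assumes the usual factor analytic form, in which the between-trial genetic covariance matrix is the product of the loadings matrix with its transpose plus a diagonal matrix of specific variances; the function name and the numbers are hypothetical and are not the estimates in Table 2.

```python
def fa_summary(loadings, specific_var):
    """Summarise a factor analytic model for trial-by-genotype effects.

    loadings: one list of factor loadings per trial (covariance scale).
    specific_var: one trial-specific variance per trial.
    Returns (proportion of genetic variance explained by the factors,
             average pairwise between-trial genetic correlation).
    """
    n_trials = len(loadings)

    def cov(i, j):
        shared = sum(a * b for a, b in zip(loadings[i], loadings[j]))
        return shared + (specific_var[i] if i == j else 0.0)

    explained = sum(sum(a * a for a in row) for row in loadings)
    total = sum(cov(i, i) for i in range(n_trials))
    correlations = [
        cov(i, j) / (cov(i, i) * cov(j, j)) ** 0.5
        for i in range(n_trials)
        for j in range(i + 1, n_trials)
    ]
    return explained / total, sum(correlations) / len(correlations)


# Hypothetical one-factor example with three trials, each with total variance 1
example_loadings = [[0.90], [0.80], [0.85]]
example_specific = [0.19, 0.36, 0.2775]
prop_explained, mean_corr = fa_summary(example_loadings, example_specific)
print(round(prop_explained, 3), round(mean_corr, 3))
```

When loadings are uniformly high across trials, the pairwise correlations are also high, which is the basis for treating an across-site average as representative.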

Figure 1 shows a boxplot representation of the mean and variance of the predicted average across-site flowering times for each BC-NAM family. The vertical line at 60.73 represents the average flowering time for the recurrent parent genotype R931945-2-2. All families except SC326-6 have an average flowering time less than that of the recurrent parent.

Fig. 1

Box-plots of BLUPs of days to flower (DTF) per BC-NAM family based on data from 20 locations, the vertical line representing the mean of the recurrent parent, R931945-2-2

QTL results

QTL were mapped across the 1,389 individuals comprising the 24 BC-NAM families using the AM and mpQTL approaches. The AM approach identified 25 QTL and the mpQTL method 34 QTL, with 19 out of a total of 40 unique QTL being in common (Table 4; Fig. 2). In a given region of the genome, the integrated probability statistic, calculated through the mpQTL analysis method, increased with the number of BC-NAM families for which the region displayed significant QTL effects, in addition to the proximity of the QTL positions estimated in the different families and the level of significance in different families. When the stringency of the thresholds was relaxed slightly (−log10 P > 1.5 for the AM and −log Fisher P > 2 for mpQTL analysis), a further 12 QTL were found to be in common, giving 31 of the 40 unique QTL in common across the two methodologies.

Table 4 Summary of QTL identified in BC-NAM families, detailing location, probability values (significant P values in bold), total range of significant effect size in days to flowering (DTF)
Fig. 2

QTL analysis results for flowering time in the BC-NAM population across the genome. The vertical lines indicate the breaks between chromosomes. a QTL analysis results based on AM; b QTL analysis results based on mpQTL analysis

All 40 QTL are presented in this study. The majority of flowering time QTL were shared among multiple families, with 35 of the 40 QTL showing significant effects in 3 to 12 families and each QTL being detected, on average, in 5.6 families (Table 4). The 40 QTL identified were distributed across all chromosomes; the highest number of QTL per chromosome was 6 (for SBI-03 and SBI-04), while the lowest was 2 (SBI-10). At least one QTL was located in the heterochromatic region of each chromosome, with the exception of SBI-02. The average length of the support intervals for QTL positions was 4.1 cM, as determined through the extent of the QTL region greater than the defined statistical thresholds in the mpQTL method (ESM Table S3).

Relative to the recurrent parent, significant allelic effects ranged from −1.46 to 1.78 days, averaging −0.70 days across significant alleles contributing to earliness and 0.78 days across significant alleles contributing to lateness (Table 4; Fig. 3). At 75 % of the QTL, the NRPs contributed both positive (later) and negative (earlier) allele effects relative to R931945-2-2, suggesting that allelic series of at least three alleles each exist at these QTL (ESM Figure S1). Figure 4 details an example of a putative allelic series for one of the QTL on SBI-10, QDTF_NAM_10_61, which has allele effects ranging from −1.45 days (contributed by SC56-14E) up to 0.76 days (contributed by parent SC23). At 6 QTL, the diverse NRP lines only contributed negative (earlier) allele effects, and at 4 QTL, only positive (later) allele effects were detected (Table 4). The total range of QTL allele effect sizes per QTL, across all BC-NAM families segregating at each locus, ranged from 0.61 days up to 3.08 days (Table 4), with an average range of allele effects per QTL of 1.72 days.

Fig. 3

Heat-map for flowering time QTL effects by chromosomal position and donor parent

Fig. 4

Allelic series for QTL QDTF_NAM_10_61 across 24 BC-NAM families. Allele effect estimates are relative to R931945-2-2 allele days to flowering (asterisk indicates significant effect)

A common consequence of using small populations such as the BC-NAM families for QTL analysis is the tendency for QTL effects to be overestimated (Beavis 1994). To determine the impact of population size on allele effects estimated, we compared the effect sizes determined in the S. arundinaceum BC-NAM subset family (96 individuals) with those determined in a larger (214 individuals) population derived from the same cross. The allele effect sizes between the two populations were similar, having an average difference of just 0.17 days, and with no apparent bias towards larger estimates of effect sizes in the small population (ESM Figure S4). We believe this is in part due to the high heritability of the trait simplifying effects of the backcross structure on the composition of individual genotypes within each NAM family.

In terms of allele occurrence, the QTL identified solely by the AM methodology were rarer than those identified by the mpQTL method. For QTL detected solely by the AM methodology, on average two populations had a significant marker (p < 0.05) underlying the QTL region, compared to 5.3 for QTL detected solely by the mpQTL methodology and 6.1 for QTL detected by both methodologies. The most extreme example of this was QDTF_NAM_6_23, a very rare QTL allele detected as significant (p = 0.0165) by the AM methodology in only one population (Rio). The AM methodology also detected alleles with smaller effect size on average, compared to the mpQTL methodology (the average total range of allelic effects of QTL detected solely by the AM methodology was 1.4 days compared to 1.7 days for QTL detected solely by the mpQTL methodology). The mpQTL methodology, owing to the 5 cM sliding window analysis step which attempted to compensate for the disparity in common markers across populations, could identify more populations with a significant marker in a given region than the AM approach. As a consequence, the confidence intervals (CI) of the QTL identified solely by the mpQTL methodology were on average larger (3.6 cM) than the CI of QTL detected solely by the AM methodology (0.2 cM).

The number of QTL detected with significant allelic effects within each BC-NAM family ranged from 2 (SC62C) to 18 (SC56-14E and Tx2903), with an average of 9.45 QTL detected per BC-NAM family (Fig. 5). The total additive effect of all the significant effect QTL alleles per BC-NAM family, calculated by summing significant additive positive and negative QTL effects, ranged from 19.85 days (SC56-14E) to just 1.92 days (SC62C), with an average of 6.9 days summed across positive and negative QTL allele effects per BC-NAM family (Fig. 5). The BC-NAM family with the largest summed negative (early) effect was S. arundinaceum, with 16 significant early effect QTL with a total additive effect of −16.5 days. The S. arundinaceum population had only one significant positive (late) effect QTL, with an estimated allele effect of 1.72 days. In contrast, the BC-NAM family with the largest summed positive (late) effect was SC56-14E, with 9 significant late QTL with a total additive effect of +10.77 days. This population also had nine significant early QTL with a total additive effect of −9.08 days. One BC-NAM family, Sureno, only had significant late effect QTL alleles (three significant QTL with a total additive effect of 1.29 days) and one BC-NAM family, SC62C, only had significant early effect QTL alleles (two significant QTL with a total additive effect of −1.92 days). Overall, the BC-NAM families contributed approximately twice as many significant negative (early) effect QTL (146 in total; average of 6.08 per BC-NAM family) compared to significant positive (late) effect QTL (81 in total; average of 3.37 per BC-NAM family).

Fig. 5

Sums of the estimated positive (late) and negative (early) QTL allele effects for each BC-NAM family. Number of QTLs next to each bar

We compared flowering time QTL detected in the BC-NAM population to those reported in 13 previous sorghum studies (Table 5). The QTL identified in the 14th study (Murray et al. 2008) could not be projected onto the consensus map because the genetic linkage map in that study was constructed predominantly with AFLP markers, which are not sequence-based and hence not readily comparable across studies. Nine unique sorghum bi-parental populations and one diversity research set were used across the 13 studies; from these, a total of 70 QTL were projected onto the sorghum consensus map (Mace and Jordan 2011), representing 27 unique genomic locations (62 of the QTL representing 20 meta QTL, and 8 unique QTL). On average, 5.8 QTL were detected per study, significantly fewer than the number detected per BC-NAM family in the current study (average of 9.45 QTL/BC-NAM family). Of the 40 QTL detected in the BC-NAM, 22 co-located with previously identified sorghum QTL, based on the genetic distance between the QTL mid-points being less than 10 cM (ESM Fig. S2). A further 4 of the 40 detected QTL were within 8 cM or less of the CI of a previously identified sorghum QTL; 16 QTL detected in the BC-NAM populations were novel and had not previously been identified in sorghum. In total, 22 of the 27 unique genomic regions previously identified as associated with flowering time co-located with the BC-NAM QTL. Only five genomic regions, containing eight previously identified flowering time QTL, have been reported in previous studies but were not detected in the BC-NAM population.
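
The mid-point distance criterion used for co-location can be sketched as a simple count. The function name and the QTL positions below are hypothetical and serve only to illustrate the rule; the distance threshold is a parameter so that stricter cut-offs can be applied in the same way.

```python
def count_colocated(query_mids, reference_mids, max_dist_cm):
    """Count query QTL whose mid-point lies less than max_dist_cm from the
    mid-point of any reference QTL on the same chromosome.

    query_mids, reference_mids: dict mapping chromosome name to a list of
        QTL mid-points (cM) on a common consensus map.
    """
    hits = 0
    for chrom, mids in query_mids.items():
        refs = reference_mids.get(chrom, [])
        for mid in mids:
            if any(abs(mid - ref) < max_dist_cm for ref in refs):
                hits += 1
    return hits


# Hypothetical mid-points (cM), keyed by chromosome
bc_nam_qtl = {"SBI-01": [12.0, 60.5], "SBI-06": [8.0]}
previous_qtl = {"SBI-01": [15.5, 90.0], "SBI-06": [8.6]}
for threshold in (10.0, 5.0, 1.0):
    print(threshold, count_colocated(bc_nam_qtl, previous_qtl, threshold))
```

The comparison is only meaningful when both sets of QTL have first been projected onto the same map, which is why studies that could not be projected were excluded.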

Table 5 Details of previously identified flowering time QTL from 13 other studies projected onto the consensus map

We further evaluated the correspondence of the sorghum BC-NAM QTL with the maize NAM QTL recently described (Buckler et al. 2009). Using a multiple-family joint stepwise regression method, Buckler et al. (2009) identified 36 QTL for days to anthesis (DA; male flowering) and 39 QTL for days to silking (DS; female flowering), the traits comparable to days to flowering in sorghum, representing a total of 46 unique genomic locations across the 75 significant marker–trait associations. Of the 40 sorghum BC-NAM QTL for flowering time detected in this study, 37 (92.5 %) were less than 10 cM away from the projected location of at least one significant maize marker (based on the mid-point of the sorghum QTL); 30 (75 %) were less than 5 cM away from at least one significant maize marker and 15 (37.5 %) were less than 1 cM away from at least one significant maize marker (Table 6; ESM Fig. S3; ESM Table S3). Each of the 37 sorghum QTL corresponding closely to maize QTL co-located with, on average, 3.05 maize QTL, ranging from one (for seven sorghum QTL) to six (for one sorghum QTL). Of note, one sorghum QTL (QDTF_NAM_6_8) was less than 1 cM away from four syntenous maize QTL, while a further 13 sorghum QTL were less than 5 cM away from at least three syntenous maize QTL, corresponding to the known genomic synteny between the species. Figure 6 demonstrates the general finding of this study that the sorghum QTL correspond to multiple maize QTL detected in Buckler et al. (2009). Figure 6a details the correspondence of QTL detected on SBI-01 with maize flowering time QTL; four of the five sorghum BC-NAM DTF QTL co-locate within 10 cM of at least six maize flowering time QTL. Both Fig. 6a and b demonstrate the apparent conservation of QTL order between the species; however, there were a number of examples of striking differences in the relative distances between the QTL. In some cases, increased recombination per unit of physical distance in sorghum suggests that some maize QTL may be the result of multiple linked flowering time genes, as demonstrated by Fig. 6b.

Table 6 Correspondence to previously identified QTL for flowering time in Sorghum (SbQTL) and Zea mays (Zm Chr1–10), detailing the number of previously identified QTL whose mid-point was less than 10 cM away from the sorghum flowering time QTL detected in the current study
Fig. 6 Sorghum/maize flowering time QTL synteny. a Between sorghum SBI-01 and maize chromosomes 1, 5 and 9. b Between sorghum SBI-06 and maize chromosomes 2 and 10. Sorghum QTL indicated as green segments on sorghum chromosome. Projected locations of the mid-point of the sorghum QTL onto the syntenic maize chromosomes indicated by lines between chromosomes. Locations of the maize DA (blue) and DS (red) significant markers indicated. Co-ordinates of segments of maize chromosomes detailed (based on maize NAM map) and orientation of maize chromosomes, relative to sorghum chromosome, indicated by arrows. Asterisks indicate maize QTL with mid-points less than 10 cM from the location of the sorghum QTL mid-point. For graphical display purposes, when the QTL CI was based on a single marker only, the average CI distance (4.1 cM) was used

We also predicted the genetic linkage map locations (cM) of the sorghum BC-NAM QTL on the maize NAM map (Buckler et al. 2009), based on the projection of the mid-points of the sorghum QTL (Fig. 7). These were calculated based on 838 sequenced markers on the consensus maize NAM map (available via http://www.panzea.org). The 40 sorghum QTL corresponded in total to 83 unique loci on the maize genome, with each QTL having at least two locations. When considering an average CI of 4 cM, 54 % (45 of the 83) of the sorghum QTL locations projected onto the maize genome co-located with one or more maize NAM QTL for flowering time. A χ² goodness-of-fit test indicated that the degree of correspondence between the sorghum and maize QTL was greater than would be expected by chance and highly significant (p = 0.00142).
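As an illustration of the kind of goodness-of-fit test described above, the following sketch compares an observed number of co-located projections with the number expected under a chance rate. Only the observed counts (45 of 83) come from the text; the chance rate used here is a hypothetical placeholder, because the expected proportion is not stated in this section, so the output is not expected to reproduce the reported p value.

```python
import math


def chi_square_gof_two_class(observed_hits, total, expected_rate):
    """Chi-square goodness-of-fit for hit / no-hit counts (1 degree of freedom)."""
    observed = [observed_hits, total - observed_hits]
    expected = [total * expected_rate, total * (1.0 - expected_rate)]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# HYPOTHETICAL chance rate of co-location; not a value reported in the study.
ASSUMED_CHANCE_RATE = 0.35

statistic, p_value = chi_square_gof_two_class(45, 83, ASSUMED_CHANCE_RATE)
print(f"chi-square = {statistic:.2f}, p = {p_value:.5f}")
```

In practice the chance rate would be derived from the fraction of the maize map covered by the QTL confidence intervals, or estimated by permutation.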

Fig. 7 Genetic linkage NAM map of the maize genome with maize significant markers for flowering time highlighted in red (DS: female flowering) and blue (DA: male flowering) and 4 cM CI highlighted in grey around each significant marker. The predicted locations (based on the projection of the mid-point of the QTL with 4 cM CI) of the sorghum NAM QTL for flowering time are highlighted to the right of each chromosome in dark green, where the CIs overlap with a maize QTL, or light green, when the CIs do not overlap

Discussion

This study reports on the use of the sorghum BC-NAM population to identify QTL for flowering time, using two different QTL analysis methodologies. In total, 1,389 BC-NAM individuals, from 24 families, were evaluated for flowering time using data from 20 locations over 4 years. We identified 40 QTL for flowering time from the BC-NAM population and demonstrated a high degree of correspondence between the sorghum NAM QTL for flowering time and the maize NAM QTL for flowering time, in addition to previously identified flowering time QTL in sorghum. Putative allelic series of multiple, relatively small effect alleles were detected at 75 % of the QTL detected in this study, with the NRPs contributing both positive (later) and negative (earlier) allele effects relative to R931945-2-2. The significant individual QTL allele effect sizes ranged from 0.32 days up to 1.78 days. Both rare and common QTL alleles were detected, ranging from a very rare QTL allele detected in only one population to very common QTL alleles detected in 12 populations. The integration of the two QTL analysis methodologies used in the current study allowed for the identification of QTL with both rare and common alleles and with small and large effects, and also accounted for independent polymorphism patterns across the BC-NAM families. The identification of syntenous QTL in maize provided additional precision and, in many cases, permitted the identification of underlying candidate genes. The results of this study have contributed to an enhanced understanding of the genetic architecture of flowering time in sorghum, through the identification of QTL in common with previous studies, in both sorghum and maize, and of novel QTL, providing unique insights into the genetic control of this trait and its implications for domestication and plant breeding.

Genetic architecture of flowering time in sorghum is controlled by a relatively large number of loci with multiple alleles with small effects

In any QTL experiment, the design of the population or populations studied and the environments where phenotypic data are collected influence the QTL that are detected, as well as the accuracy of estimates of their locations and effects. The impact of these factors needs to be understood in order for data from experiments to be interpreted appropriately. In this study we used multiple backcross derived populations with a common parent, R931945-2-2. The design allows the effects of QTL from each NRP to be calculated relative to R931945-2-2. Selection during population development, applied to ensure genotypes fell within a phenotypic range that enabled adaptation to Australian sorghum growing environments, resulted in the removal of genes with very large effects, such as major photoperiod response genes that are known to be present in a number of the NRPs, as well as removing some allelic combinations that showed strong epistatic effects, such as the known interaction between Ma5 and Ma6 (Chantereau et al. 2001). However, even in the absence of large effect QTL, the analyses undertaken were sufficiently powerful to detect multiple small effect QTL. The backcrossing design also reduces the potential contribution of epistatic interactions between alleles from the NRPs to the genetic variance of the individual BC-NAM families. In addition, our use of F1 hybrids to evaluate the lines potentially influenced the QTL we detected, due to the potential for dominance of genes in the tester to mask QTL from the NRPs.

The genetic architecture of flowering time in the BC-NAM population was characterised by relatively small allelic effects. Although a few QTL with relatively large QTL effects were detected (ten QTL with a total putative allelic series effect range of >2 days), most of the QTL detected had total estimated allele effect ranges of between 1 and 2 days. The largest individual allele effect size was +1.78 days (QDTF_NAM_6_43). Overall, the effect sizes determined in the BC-NAM population were comparable to the allele effect sizes reported for the previously identified sorghum flowering time QTL (Table 5); the majority (almost 85 %) of the QTL also had QTL allele effect sizes between −2 and +2 days. The largest flowering time QTL allele effect reported for sorghum to date is for Ma1, the major photoperiod-sensitivity locus in sorghum, which was reported to have an additive effect of 40.3 days in an interspecific mapping population (S. bicolor × S. propinquum) (Lin et al. 1995). However, this QTL is part of the photoperiod pathway and alleles with such large effects would have been removed from our populations by the selection imposed during population development, as evidenced by our finding that the QTL we identified that co-located with Ma1 had a much smaller effect size (1.81 days) than previously reported in the literature. This example also highlights the issue of the multiple environmental cues influencing flowering time, specifically temperature and photoperiod. In our study, despite the BC-NAM population being grown in environments with a range of different photoperiods, the genetic correlations between environments were high, indicating low levels of re-ranking of the genotypes, which in turn indicates that the majority of the QTL were associated with the temperature component of flowering rather than photoperiod response. Of the 13 QTL studies on flowering time in sorghum to date, only two have specifically separated the effects of photoperiod and temperature in order to identify QTL for photoperiod sensitivity (Chantereau et al. 2001; Mannai et al. 2011). Three of the QTL identified in the BC-NAM population correspond to previously described photoperiod sensitivity QTL (two on SBI-01, QDTF_NAM_1_53 and QDTF_NAM_1_112, and one on SBI-10, QDTF_NAM_10_44). A known flowering time candidate gene in the photoperiod response pathway underlies the QTL on SBI-10 (FT; Sb10g003940); however, as multiple QTL co-locate in this genomic region from multiple studies with different environments and photoperiods, it is possible that either this gene influences both the temperature and photoperiod pathways or that it is linked to other flowering time genes in the temperature pathway. The results indicate that the relatively small QTL effects detected were predominantly the result of the removal of photoperiod sensitivity QTL, and possibly other large effect QTL, by selection imposed during the development of the population. The remaining QTL would be enriched for QTL mainly associated with the other components of the pathway.

This study has revealed that the combined effect of relatively small allelic substitutions across many QTL can lead to the substantial differences in flowering time observed across the BC-NAM population and, by inference, cultivated sorghum more generally. We observed large numbers of polymorphic flowering loci. In four BC-NAM families, 15 or more QTL were detected in the current study: 18 QTL in each of the SC56-14E and Tx2908 BC-NAM families, 17 QTL in the S. arundinaceum BC-NAM family and 15 in the LR2931-2 BC-NAM family. The genomes of the NRPs were found to be a mosaic of loci for both late and early flowering time effects, leading to total ranges of combined QTL allele effects per population between 19.8 days (SC56-14E) and 1.9 days (SC62C). It should be noted, however, that because the populations were backcross derived and selected for maturity, individuals with extreme phenotypes were excluded from the lines that were evaluated. As a result, the phenotypic range of flowering time observed in the populations (Fig. 1) was expected to be much less than the range that might be expected if the parental alleles were segregating randomly at a frequency of 0.5.
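One way to read the "total range of combined QTL allele effects" for a family is as the spread between the sum of all early (negative) allele effects and the sum of all late (positive) allele effects. The sketch below works through that arithmetic under this assumed definition, which is not stated explicitly in the text; the function name and effect values are hypothetical and are not estimates from the study.

```python
def combined_effect_range(allele_effects):
    """Return (earliest, latest, total range) in days for one family.

    ASSUMPTION: the total range is the sum of positive (late) effects minus
    the sum of negative (early) effects; the study may define it differently.
    """
    latest = sum(e for e in allele_effects if e > 0)
    earliest = sum(e for e in allele_effects if e < 0)
    return earliest, latest, latest - earliest


# Hypothetical allele effects (days, relative to the recurrent parent).
family_effects = [0.5, -0.75, 1.25, -0.5, 0.25]

earliest, latest, total_range = combined_effect_range(family_effects)
print(f"earliest {earliest:+.2f} d, latest {latest:+.2f} d, range {total_range:.2f} d")
```

Under this reading, a family carrying many small alleles of mixed sign can still span a wide potential range even though no single allele exceeds 2 days.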

In general, the numbers of positive and negative alleles detected in a particular NRP were roughly equal, suggesting that balancing selection for a particular flowering time was occurring in the breeding populations where the line was developed. The most extreme exception to this general observation was the S. arundinaceum population, where 19 alleles for early flowering were observed and only one allele for late flowering. This population was the only one that involved a cross between a wild and a domesticated sorghum. The accession is an introduced weed in Australia with genes for photoperiod response. This accession was collected at the southern end of the weed’s natural range, where the presence of photoperiod genes would potentially prevent the plant flowering at a time when environmental conditions allowed seed production. The weed population was introduced after settlement and has only been in this location for a relatively short period of time (<150 years). The high number of early alleles probably reflects selection for early maturity genes to mitigate the maladaptation imposed by the presence of large effect photoperiod sensitivity genes. In the BC-NAM families based on cultivated varieties, the detection of large numbers of loci with multiple functional alleles indicates the adaptive benefit of maintaining variation at multiple loci even in cultivated sorghum.

The comparison of our BC-NAM QTL results with previous QTL studies for flowering time in sorghum revealed that 60 % of the QTL identified in the sorghum BC-NAM population co-located with previously identified flowering time QTL in sorghum. The remaining 40 % (16 in total) were novel QTL for flowering time, likely identified due to the broader range of genetic diversity captured by the BC-NAM, novel segregation between recurrent parent and/or tester, and its greater statistical power compared to that achieved in standard, bi-parental mapping studies. However, they may also be due to unique untested environments or statistical anomalies in our approach. In contrast, five genomic regions have been reported in previous studies but were not detected in our BC-NAM population. The QTL in these regions could potentially be related to photoperiod and hence less likely to be detected in the BC-NAM panel; indeed, in two-thirds of the QTL absent from the BC-NAM panel for which underlying flowering time candidate genes were identified, candidate genes from the photoperiod pathway were present (Table S4). Alternatively, these QTL could have been eliminated when selecting for a reduced maturity range in the BC-NAM population. Two of the five QTL not detected in the BC-NAM population were identified in multiple previous studies; however, the remaining three were only detected in single studies, and could represent low frequency alleles contributing novel QTL only detected in specific genetic backgrounds or with specific environmental sensitivity. A range of other factors can also contribute to lack of congruency between studies, including mapping resolution, which is influenced by the magnitude of linkage disequilibrium in the mapping population or diversity set, and the QTL analysis methods used.

Our study validated the majority (22 of the 27) of previously identified unique genomic regions associated with flowering time in sorghum, in addition to identifying 16 novel QTL not previously discovered. Including the QTL unique to previous sorghum QTL studies, the total number of unique genomic regions identified to date associated with flowering time in sorghum is currently just over 43. This is substantially fewer than the approximately 80 genes currently identified in the flowering time pathway of Arabidopsis. However, the majority of flowering time QTL have been identified in temperate sorghum, where major effect genes for photoperiod sensitivity have been removed, so it seems likely that these QTL are predominantly associated with temperature rather than photoperiod. Given that 38 genes are known in the photoperiod pathway of Arabidopsis, the total number of 43 flowering time QTL identified for sorghum is comparable to the 42 genes currently identified in the temperature, autonomous and plant hormonal pathways in Arabidopsis. This could indicate that a very high proportion of the QTL involved with the genetic control of flowering time associated with temperature have been identified and hence that this study significantly contributes to our global understanding of the genetic architecture of flowering time in sorghum.

High degree of correspondence between sorghum and maize flowering time QTL, as determined through NAM populations

This study revealed a remarkably high degree of correspondence between flowering time QTL identified in the sorghum and maize NAM populations; two-thirds (50 out of 76) of the significant markers identified for days to anthesis (DA) and days to silking (DS) in the maize NAM panel (Buckler et al. 2009), when projected onto the sorghum consensus map, were within 10 cM of the mid-point of at least one sorghum QTL detected in this study. This is in agreement with a previous study (Lin et al. 1995) that also noted the conservation of genomic regions associated with short-day flowering between many members of the Poaceae family, including sorghum, maize and rice. As noted previously, all the sorghum QTL corresponded to at least two locations in maize (three sorghum QTL corresponded to three locations), as expected based on previous studies that have identified that colinear arrangements of chromosomal segments of sorghum (and rice) match maize at a ratio of 1:2, typical of a whole genome duplication event in maize (Messing 2009). Seventy-five percent of the sorghum QTL detected in this study corresponded to a maize flowering time QTL at both syntenic genomic locations. Duplicated regions of genomes frequently accumulate mutations faster than a functional single-copy gene over generations and often lose function (Taylor and Raes 2004). However, the correspondence of the majority of the sorghum QTL to maize QTL in both expected syntenic regions indicates that, in the case of the flowering time genes underlying these QTL, both duplicated genes are likely to be functional. This potentially indicates that the maintenance of functional alleles at duplicated loci is being favoured by natural and/or artificial selection, in contrast to the general observation of biased fractionation in regions of the maize genome derived from whole genome duplication (Woodhouse et al. 2010).

The results of this study highlight the value of using sorghum as a tool to facilitate the dissection of the genetic control of quantitative traits in maize and vice versa. In particular, in situations when the maize syntenous QTL is in the heterochromatin but the corresponding sorghum syntenous QTL is in the euchromatin (e.g. multiple sorghum QTL corresponding to the heterochromatic region on maize chromosome 2, Fig. 7), the information on the potential candidate genes from the shorter physical fragment underlying the sorghum QTL can be used to inform potential candidates in maize. Similarly, we can use maize as a model for sorghum, e.g. the sorghum QTL in the heterochromatic region of SBI-05 (QDTF_NAM_5_67) corresponds to two maize QTL, both for DS, on two duplicated regions on maize chromosome 4 (DS_4_40.4 and DS_4_105). The mpQTL probability plots based on log Fisher P show a sharp decrease in the P value in the middle of this QTL region; however, it does not fall below the set threshold (Fig. 8). The additional evidence of the two maize QTL provides support for further dissecting the larger sorghum QTL on SBI-05 into two separate QTL; this approach also identified another sorghum QTL on SBI-02 (QDTF_NAM_2_147) that could be further dissected into two separate QTL based on the evidence of multiple maize QTL underlying it. The high degree of correspondence between the maize and sorghum NAM flowering time studies, combined with the high proportion of phenotypic variance explained in the NAM study (89 %), provides further evidence that the majority of the genetic regions controlling flowering time in the sorghum BC-NAM population have been identified.

Fig. 8 Segment of SBI-05 showing location of sorghum NAM QTL QDTF_NAM_5_67, based on log Fisher P plot, and two co-locating maize NAM QTL. Grey region on chromosome bar indicates heterochromatic region of SBI-05

The genetic architecture of flowering time, therefore, has been demonstrated to have a number of similarities between maize and sorghum, based on the number and location of QTL identified for flowering time in both the maize and sorghum NAM populations; however, differences in the magnitude of estimated QTL allele effects were observed. In maize, the majority of QTL allele effects were modest, with ~80 % of QTL alleles affecting DS by less than 0.5 days. In contrast, in sorghum only 22 % of QTL alleles had an effect size of less than 0.5 days, with 18 % having an effect size of greater than 1 day (compared to 2 % in maize). Given that there appears to be a selective advantage to maize in maintaining alleles at both duplicated loci, it could be hypothesized that, if similar pressures are acting on sorghum, we might see larger effect sizes and a greater number of alleles in each putative allelic series. Additionally, the larger range in most putative allelic series in sorghum in comparison to maize could have been influenced by the selection we carried out for a particular maturity range in our BC-NAM population. Evidence for the putative allelic series, with both early and late effects relative to R931945-2-2, was found in 75 % of the QTL identified in our study, compared to between 69 and 72 % in the maize NAM study. In contrast, in other studies using the maize NAM population to dissect the genetic architecture of other complex traits, the proportion of QTL identified with allelic series was much smaller, e.g. only 37 % of QTL with both positive and negative effects for southern leaf blight resistance (Kump et al. 2011) and only 31–43 % with allelic series for kernel composition traits (Cook et al. 2012). The similarity between the proportion of flowering time QTL identified in the sorghum and maize NAM populations with putative allelic series, despite differences in the magnitude of allele effects, likely reflects the adaptive importance of maintaining variability for this trait, to enable the plant to adjust flowering time in response to changing environmental conditions.

The synteny of QTL within, and particularly across, taxa provides opportunities to fast-track candidate gene identification, through the identification of common candidate genes in syntenous QTL regions. By integrating information on genes involved in flowering time across cereal crops from the reviews of Higgins et al. (2010) and Cockram et al. (2007) with previously identified flowering time genes in sorghum (Mace et al. 2009b), we identified 77 genes involved in flowering time and mapped these sequences in silico on the sorghum genome. This yielded a total of 221 loci, of which 58 corresponded to 18 sorghum BC-NAM QTL for flowering time (ESM Table S4). To demonstrate the utility of this approach, we looked at the closest pair of QTL syntenous between maize and sorghum when projected onto the sorghum genome: QDTF_NAM_9_25, whose mid-point was just 0.03 cM away from the projected location of the significant marker for the maize QTL DS_6_6.9. One candidate gene for flowering time was located within 3 cM of the syntenous sorghum (QDTF_NAM_9_25) and maize (DS_6_6.9) QTL: APETALA2, a floral identity gene (SbAP2 or Sb09g002080), which in Arabidopsis is thought to be a floral repressor, causing late flowering when overexpressed (Putterill et al. 2004). Such an approach, however, assumes that both QTL positions and annotation and gene prediction in the genome databases are accurate. In particular, larger QTL intervals are more likely to encompass multiple candidate genes (e.g. of the five sorghum BC-NAM QTL with CI > 10 cM, four had five or more underlying candidate genes for flowering time), so any gene within the interval may be considered a candidate until the interval can be narrowed by either further genetic analysis or cross taxa comparison. In addition, it is possible that QTL represent multiple genes in tight linkage. Nevertheless, our preliminary analysis of candidate genes underlying syntenous QTL in sorghum and maize indicates that unification across taxa provides additional power and resolution to identify candidate genes underlying QTL intervals influencing a complex trait such as flowering time.
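The candidate gene search described above amounts to an interval lookup: for each QTL, list the mapped flowering time genes that fall inside its confidence interval, optionally extended by a flanking distance. A minimal sketch follows; the class names, the `flank_cm` parameter and all identifiers and positions are hypothetical placeholders chosen for illustration, not values from ESM Table S4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qtl:
    name: str
    chromosome: str
    start_cm: float
    end_cm: float


@dataclass(frozen=True)
class Gene:
    name: str
    chromosome: str
    position_cm: float


def candidates_for(qtl, genes, flank_cm=0.0):
    """Names of genes on the QTL's chromosome inside its interval (plus flank)."""
    low = qtl.start_cm - flank_cm
    high = qtl.end_cm + flank_cm
    return sorted(
        g.name for g in genes
        if g.chromosome == qtl.chromosome and low <= g.position_cm <= high
    )


# Hypothetical QTL and gene positions (cM); NOT data from the study.
qtl_a = Qtl("qtl_A", "SBI-09", 23.0, 27.0)
genes = [
    Gene("gene_1", "SBI-09", 25.5),
    Gene("gene_2", "SBI-09", 29.0),
    Gene("gene_3", "SBI-01", 25.0),
]

print(candidates_for(qtl_a, genes))               # genes inside the interval
print(candidates_for(qtl_a, genes, flank_cm=3.0))  # interval widened by 3 cM
```

Widening the flank increases the number of candidates returned, which mirrors the point made above that larger intervals tend to encompass more candidate genes.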

Implications for plant breeding

Flowering time is a fundamental trait for plant breeders developing varieties with adaptation to particular target environments and is often the subject of early generation selection in sorghum. Changes in climate brought about by global warming will likely require plant breeders to change the maturity genes present in their elite breeding populations (e.g. Craufurd and Wheeler 2009). Achieving such change efficiently will require detailed knowledge of the genetic architecture of flowering time and its association with other traits. It is critically important that breeders know if particular flowering time alleles are linked to genes for other traits of interest. In such circumstances, early generation selection may aid or restrict progress for those traits. An analysis of the genome-wide distribution of the flowering time QTL known to date in sorghum also indicates that they are not evenly distributed on a centimorgan basis (ESM Fig. S2), with flowering time QTL occurring near the centromere of every chromosome, in the heterochromatic regions, which are known to suppress recombination (e.g. Akhunov et al. 2003). Heterochromatic enrichment (on a centimorgan basis) for QTL has been previously observed (Mace and Jordan 2011) and creates conditions favourable for heterosis to occur via pseudo-overdominance. Consequently, an understanding of the impact of flowering time and of favourable and unfavourable genetic linkage will be critical for the development of efficient selection strategies, particularly those involving molecular markers.

One further application of detailed knowledge of the flowering time network in sorghum is the capacity to use crop models to predict the consequences of particular combinations of flowering time QTL in particular environments, including likely future climate change scenarios. Quantitative crop growth simulation models such as APSIM-sorghum (Hammer et al. 2010) can link changes in the allelic values of genes directly to their phenotypic consequences at the crop level (van Oosterom et al. 2006). Simulation models represent a very important resource for the modern breeder (Sun et al. 2011), facilitating the link between genotype and phenotype and capturing and connecting research at molecular, organism and crop system scales. Until very recently, the level of understanding of the dynamics and functionality of gene networks that determine the link between the genotypic expression of a particular gene (or QTL) and its consequence at the phenotypic level has been lacking and mostly limited to a few model species (van Oosterom et al. 2006; Hammer et al. 2004). Our improved global understanding of the genetic architecture of sorghum, through a meta-analysis of existing QTL studies in addition to the data generated from the current study using the BC-NAM population, provides the required input data for the sorghum crop simulation model to dynamically simulate the phenotypic effect of changes in any given genotype, and hence to predict the phenotypic consequences of new combinations of genes.

Conclusions

The successful resolution of flowering time genetic architecture in sorghum demonstrates the power of the sorghum BC-NAM resource to catalogue allelic variants at a maximal number of QTL and understand their functional allelic variation and distribution across diverse germplasm. The use of individual populations precludes the generation of such a comprehensive catalogue of both QTL and alleles as it is virtually impossible to create a single bi-parental population that segregates for allelic variants at all genetic loci influencing a complex phenotype (Lin et al. 1995).

The sum of the allele effects presented in this study indicates that the QTL identified cause significant variation in flowering time across the BC-NAM families, with a maximum summed additive effect of QTL alleles of almost 20 days in an individual family. This study provides important information for sorghum breeders and researchers to further manipulate flowering time, which is likely to be of increasing importance in response to climate change.

The utility of the BC-NAM approach presented here is exemplified by the correspondence with QTL previously identified in sorghum and with QTL in different taxa. In particular, the significant correspondence of the flowering time QTL identified in the maize and sorghum NAM resources highlights the synergies of studying traits across maize and sorghum, which provide exciting new opportunities and real efficiencies for gene discovery and the dissection of the genetic architecture of many quantitative traits, including but not limited to flowering time.