Introduction

Sorghum (Sorghum bicolor (L.) Moench) is an annual C4 plant belonging to the botanical family Poaceae under the Andropogoneae tribe (Clifford et al. 1990). It is the 5th most important cereal crop globally (FAO 2019) and a dietary staple for over 750 million people in the semi-arid regions of the world (FAO 2018). Because of its ability to cope with unfavorable growing conditions, sorghum will continue to feed the world’s expanding populations under the changing climate (Paterson 2008). Therefore, continuous improvement of sorghum cultivars for high yield is one of the main goals of sorghum breeding programs.

Yield is a polygenic trait and is affected by many factors such as plant phenology, morphology, and other physiological indices (Nadolska-Orczyk et al. 2017). Uncovering the genetic basis of these traits is critical for their effective manipulation, thus making the crop more efficient and resilient under a changing climate (Cattivelli et al. 2008). During the last two decades, extensive efforts have been made to identify genomic regions/quantitative trait loci (QTLs) underlying traits of agronomic interest in sorghum through bi-parental linkage mapping studies (Crasta et al. 1999; Haussmann et al. 2002; Rama Reddy et al. 2014; Sanchez et al. 2002; Subudhi et al. 2000; Sukumaran et al. 2016; Tao et al. 2000; Tuinstra 1997; Xu et al. 2000). However, this approach provides low mapping resolution, limited allelic diversity, and population specificity of detected QTLs (Feltus et al. 2006; Gupta et al. 2005; Korte and Farlow 2013). These limitations thus partly contributed to the slow transfer of knowledge from bi-parental QTL studies to practical applications in plant breeding.

In recent years, genome-wide association studies (GWAS) have been widely used to identify genomic regions controlling traits of interest. Although prone to false-positive results, GWAS offers high resolution and broad allele coverage, which make it an important addition to the toolkit for genetic dissection of complex traits (Fang et al. 2017; Li et al. 2012; Ma et al. 2018; Zhao et al. 2011; Zhu et al. 2008). Sorghum is an ideal crop for association mapping studies due to its moderate linkage disequilibrium and self-pollination system (Hamblin et al. 2005). Several studies in sorghum have recently used association mapping to uncover the genetic control of important traits, including flowering time (Bouchet et al. 2017; Zhao et al. 2016); plant height, panicle length, panicle exertion, tiller number, and seed number (Shehzad and Okuno 2014; Zhao et al. 2016); culm length and number of panicles (Shehzad and Okuno 2014); inflorescence trait components (Morris et al. 2012); grain fill duration, panicle weight, and harvest index (Boyles et al. 2015); and grain yield (Boyles et al. 2016). However, most of these studies had limitations. Firstly, most used germplasm that had gone through the sorghum conversion program (Morris et al. 2012; Zhao et al. 2016), which reduced genomic diversity in regions targeted for selection and hence limited the success of dissecting the underlying loci for various traits in sorghum (Morris et al. 2012). Secondly, they were based on single-locus GWAS (SL-GWAS) methods, which are limited in detecting marginal-effect quantitative trait nucleotides (QTNs) (Wang et al. 2016), and hence the multiple QTNs controlling complex traits could not be effectively identified in sorghum.

To overcome the major limitations of SL-GWAS, a series of multi-locus GWAS methods, including mrMLM (Wang et al. 2016), FASTmrMLM (Tamba et al. 2017), FASTmrEMMA (Wen et al. 2017), ISIS EM-BLASSO (Tamba et al. 2017), pLARmEB (Zhang et al. 2017), and pKWmEB (Ren et al. 2018), have emerged as powerful tools for QTN detection and QTN effect estimation for complex traits (Wang et al. 2016; Li et al. 2017; Chang et al. 2018; Peng et al. 2018). These methods have already been successfully used to dissect the genetic basis of important traits in several crops, such as maize (Zhang et al. 2018), rice (Liu et al. 2020), and barley (Hu et al. 2018). In addition, Ethiopia is a center of origin and diversity of sorghum and harbors tremendous genetic diversity in the crop for various traits (Snowden 1936; Stemler et al. 1977). The availability of such diverse germplasm provides an opportunity for new insight into the genetic architecture of important traits, and applying this knowledge in sorghum breeding programs might advance efficient genetic improvement of this crop.

In this study, we used multi-locus GWAS to investigate the genetic control of nine important agronomic traits in a natural population of 304 sorghum accessions, using 79,754 high-quality SNP markers. We aimed to identify QTNs detected in common by multiple methods and then to deduce potential candidate genes that can be further validated and used in marker-assisted selection (MAS) to enhance the efficiency of cultivar development.

Materials and methods

Plant materials and phenotyping

A total of 304 diverse sorghum accessions were collected from farmers’ fields in the major sorghum growing regions of Ethiopia (Amhara, Oromia, Southern Nations, and Tigray). The complete list of accessions and relevant information have been reported previously (Wondimu et al. 2021). These accessions were evaluated for important agronomic traits in two environments, Kobo (North Ethiopia, altitude: 1400 m) and Mieso (East Ethiopia, altitude: 1380 m), during the 2018 cropping season. The meteorological data for the two environments are given in Supplementary Table 1.

In brief, all accessions were sown at the two field sites in an alpha lattice design with two replications, with a plot size of 4.5 m² consisting of 2 rows spaced 75 cm apart. Fertilizer was applied at the rate of 100 kg/ha DAP at planting and 50 kg/ha urea at about 35 days after planting. Data were collected for nine agronomic traits following the standard sorghum descriptor (IBPGR and ICRISAT 1993). Days to 50% flowering (DF) was recorded as the number of days from emergence until 50% of the panicles in a plot were at mid-anthesis. Plant height (PH) was measured at the flowering stage from the ground surface to the tip of the main panicle; panicle exertion (PE) was measured as the length between the base of the flag leaf and the base of the panicle; and number of tillers per plant (TN) was counted on the main stalk at full bloom. At maturity, the main panicles of ten randomly selected, previously tagged plants were cut and oven dried at 70 °C for 72 h. Before threshing, all panicles were weighed to obtain an average panicle weight (PWT); the panicles were then manually threshed, and the weights of grain yield per panicle (GYP) and of a hundred seeds (HSW) were recorded. Structural panicle mass (SPM) was calculated as the difference between PWT and GYP, and grain number per panicle (GNP) was estimated as the ratio of GYP to HSW multiplied by 100.
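The two derived traits follow directly from the weighed quantities. The short Python sketch below illustrates the arithmetic only; the function name and the example values are hypothetical and are not taken from the study data.

```python
def derived_panicle_traits(pwt_g, gyp_g, hsw_g):
    """Return (SPM, GNP) from panicle weight (PWT), grain yield per
    panicle (GYP), and hundred seed weight (HSW), all in grams."""
    if hsw_g <= 0:
        raise ValueError("hundred seed weight must be positive")
    spm = pwt_g - gyp_g            # structural panicle mass (g): PWT - GYP
    gnp = gyp_g / hsw_g * 100.0    # grain number per panicle: (GYP / HSW) x 100
    return spm, gnp


# Hypothetical plot means, for illustration only.
spm, gnp = derived_panicle_traits(pwt_g=80.0, gyp_g=60.0, hsw_g=3.0)
print(spm, gnp)
```

Because HSW is the weight of 100 seeds, dividing GYP by HSW gives the number of hundred-seed units per panicle, and multiplying by 100 converts this to a seed count.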

Phenotypic data analysis

Summary statistics were calculated for each trait at each environment. Phenotypic data from each environment were analyzed by a single environment linear mixed model with sorghum accessions fitted as fixed effects. The model was illustrated as:

$$Y_{ijk} = \mu + g_{i} + r_{k} + b_{jk} + \varepsilon_{ijk}$$

where Y_ijk is the phenotypic value of genotype i in block j of replication k; μ is the general mean; g_i is the fixed effect of genotype i; r_k is the random effect of replication k; b_jk is the random effect of block j in replication k; and ε_ijk is the random non-genetic (residual) effect, with ε_ijk ~ N(0, σ²).

To assess the effects of genotype (G), environment (E), and G × E interaction for each trait, the two environments were combined, and the genetic effect associated with accessions was decomposed into two components, the genetic effect of accessions and the interaction effect between accessions and environment (G × E effect). The linear mixed model was:

$$Y_{ijkl} = \mu + E_{i} + r_{j}(E_{i}) + b_{k}(E_{i} r_{j}) + G_{l} + E_{i} G_{l} + \varepsilon_{ijkl}$$

In this case, the new terms E_i and E_iG_l are the random effects of environment and of the environment by genotype interaction, respectively. Fixed and random effects in the model were tested using the F-test and the likelihood ratio test (Neyman and Pearson 1928), respectively. Variance components were estimated using the residual maximum likelihood (REML) method (Harville 1977). Broad-sense heritability (h²) for each trait was then calculated using the formula given by Allard (1999). All mixed model analyses were performed using the REML algorithm of SAS v9.2 (SAS Institute Inc 2008).
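The heritability formula itself is cited rather than reproduced here. As an illustration only, the sketch below uses a common entry-mean expression for multi-environment trials, h² = σ²G / (σ²G + σ²GE/e + σ²ε/(e·r)), where e is the number of environments and r the number of replications. Whether this is exactly the expression applied in this study is an assumption, and the variance components in the example are hypothetical.

```python
def broad_sense_heritability(var_g, var_ge, var_e, n_env, n_rep):
    """Entry-mean broad-sense heritability (assumed form, see lead-in).

    var_g  : genotypic variance component
    var_ge : genotype-by-environment variance component
    var_e  : residual variance component
    n_env  : number of environments
    n_rep  : number of replications per environment
    """
    if n_env < 1 or n_rep < 1:
        raise ValueError("n_env and n_rep must be at least 1")
    phenotypic = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    if phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / phenotypic


# Hypothetical variance components with two environments and two replications.
h2 = broad_sense_heritability(var_g=40.0, var_ge=20.0, var_e=40.0, n_env=2, n_rep=2)
print(round(h2, 3))
```

With these illustrative inputs the denominator is 40 + 10 + 10 = 60, so the function returns 2/3.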

SNP genotyping

The 304 sorghum accessions were genotyped using the genotyping-by-sequencing (GBS) methodology (Elshire et al. 2011), as briefly described in our previous work (Wondimu et al. 2021). The raw data for all accessions across 115,501 SNPs are publicly available at figshare (https://doi.org/10.25387/g3.12813224). Filtering the 304 samples by minor allele frequency (MAF > 5%) yielded a total of 79,754 high-quality SNP markers for the current genome-wide association study.
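The MAF filter can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and the SNP names and calls are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from 0/1/2 alternate-allele counts; None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(snp_matrix, threshold=0.05):
    """Keep SNPs whose MAF is strictly greater than the threshold."""
    return {snp: calls for snp, calls in snp_matrix.items()
            if minor_allele_frequency(calls) > threshold}


# Hypothetical genotype calls, for illustration only.
snps = {
    "snp_a": [0, 0, 1, 2, None, 1],   # alt freq 4/10, MAF 0.4 -> kept
    "snp_b": [0, 0, 0, 0, 0, 0],      # monomorphic, MAF 0.0 -> dropped
    "snp_c": [1] + [2] * 19,          # alt freq 39/40, MAF 0.025 -> dropped
}
kept = filter_by_maf(snps)
print(sorted(kept))
```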

LD

Pairwise linkage disequilibrium (LD), measured as the squared allele frequency correlation (r²) between each pair of SNPs, was estimated separately for each chromosome and across the ten chromosomes in TASSEL 5.0 using a sliding window of 50 bp (Bradbury et al. 2007). An r² value of 0.1 was used as the LD decay threshold (Nordborg et al. 2002; Palaisa et al. 2003; Remington et al. 2001). The LD decay curve for each chromosome and for the whole genome was fitted using a non-linear regression model in R software (R Core Team 2019), as described by Remington et al. (2001).
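To make the quantities concrete, the sketch below computes r² for one SNP pair as a squared Pearson correlation of genotype codes and locates a decay distance from binned r² values. The binning step is a deliberately simplified stand-in for the non-linear regression described above, not a reimplementation of it; the function names, bin size, and data are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two SNP genotype vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: correlation undefined, treated as no LD
    return sxy ** 2 / (sxx * syy)


def decay_distance(pairs, threshold=0.1, bin_size=10_000):
    """Start (bp) of the first distance bin whose mean r2 is below threshold.

    pairs: iterable of (distance_bp, r2). Returns None if no bin qualifies.
    """
    bins = {}
    for dist, r2 in pairs:
        bins.setdefault(dist // bin_size, []).append(r2)
    for b in sorted(bins):
        if mean(bins[b]) < threshold:
            return b * bin_size
    return None


# Hypothetical genotype vectors and pairwise results, for illustration only.
perfect_ld = r_squared([0, 0, 2, 2], [0, 0, 2, 2])
no_ld = r_squared([0, 2, 0, 2], [0, 0, 2, 2])
pairs = [(5_000, 0.6), (15_000, 0.3), (25_000, 0.12), (35_000, 0.08), (45_000, 0.05)]
print(perfect_ld, no_ld, decay_distance(pairs))
```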

ML-GWAS

Multi-locus genome-wide association (ML-GWAS) analyses were performed using three datasets: (i) Kobo-2018 (E1), (ii) Mieso-2018 (E2), and (iii) the combined Kobo-2018 and Mieso-2018 dataset (Em). Best linear unbiased estimators (BLUEs) of the genotypic values for each of the nine traits in the two environments (E1 and E2) and in their combined dataset (Em) were estimated using the REML algorithm, as described above. Marker-trait association analyses were performed using six ML-GWAS methods, namely mrMLM (Wang et al. 2016), FASTmrMLM (Tamba and Zhang 2018), FASTmrEMMA (Wen et al. 2018), pLARmEB (Zhang et al. 2017), pKWmEB (Ren et al. 2018), and ISIS EM-BLASSO (Tamba et al. 2017), implemented in the “mrMLM.GUI” R package (https://cran.r-project.org/web/packages/mrMLM/index.html). The population structure of these accessions has previously been estimated as six subpopulations (Wondimu et al. 2021) using ADMIXTURE analysis (Alexander et al. 2009). The co-ancestry coefficient matrix (Q) of the 304 accessions is publicly available at figshare (https://doi.org/10.25387/g3.12813224). The kinship matrix (K), an estimate of the level of relatedness among individuals, was calculated internally within the mrMLM.GUI package. The population structure (Q) and kinship (K) matrices were then included in all the tested models to minimize the identification of false-positive associations and to increase statistical power. All GWAS parameters were set at their default values. The critical threshold for significantly associated QTNs was set at LOD ≥ 3.0 for all six multi-locus models, as described in previous studies (Tamba et al. 2017). The resulting -log10(P) values from the ML-GWAS approaches were used to draw the Manhattan and Q-Q plots using the mrMLM.GUI package in R software (R Core Team 2019).
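For readers more familiar with P-value thresholds, a LOD score can be related to a P-value through the standard likelihood ratio relationship LRT = 2 ln(10) × LOD, compared against a chi-square distribution with one degree of freedom. The sketch below applies this relationship; that the package uses exactly this conversion internally is an assumption, and the function names are hypothetical.

```python
import math


def lod_to_p(lod):
    """P-value for a LOD score under a chi-square test with 1 df.

    LRT = 2 * ln(10) * LOD, and for 1 df the upper-tail probability
    of a chi-square value x equals erfc(sqrt(x / 2)).
    """
    if lod < 0:
        raise ValueError("LOD must be non-negative")
    lrt = 2.0 * math.log(10.0) * lod
    return math.erfc(math.sqrt(lrt / 2.0))


def neg_log10(p):
    """-log10(P), the quantity shown on Manhattan and Q-Q plots."""
    return -math.log10(p)


p_at_threshold = lod_to_p(3.0)
print(p_at_threshold, neg_log10(p_at_threshold))
```

Under this relationship, the LOD ≥ 3.0 threshold corresponds to a P-value of roughly 2 × 10⁻⁴, which is considerably less stringent than a Bonferroni correction over 79,754 markers.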

Identification of reliable/stable QTNs and candidate genes

We considered a QTN reliable when it was detected by at least three multi-locus GWAS methods and/or in at least two situations (E1, E2, and Em). QTNs that were consistently detected across at least two situations were further regarded as stable QTNs and were examined in more detail in this study. To define the regions of interest for selecting potential candidate genes, we used the average LD decay distance, within which flanking SNP markers remained in LD (r² > 0.1). All genes with known putative functions present in the association regions were extracted from the most recently annotated sorghum reference genome v3.1 (McCormick et al. 2017), available at Phytozome (https://phytozome.jgi.doe.gov). Promising candidate genes for each trait were then mined through comprehensive analysis of the gene annotation information.
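The two criteria can be expressed as a small classification routine. The sketch below is illustrative only: the record layout, function name, and SNP identifiers are hypothetical, and it assumes that the methods detecting a QTN are counted across all situations pooled, which the criteria as stated leave open.

```python
from collections import defaultdict


def classify_qtns(detections, min_methods=3, min_situations=2):
    """Return (reliable, stable) sets of (qtn_id, trait) keys.

    detections: iterable of (qtn_id, trait, method, situation) records,
    one per significant association (LOD >= 3).
    """
    methods = defaultdict(set)
    situations = defaultdict(set)
    for qtn, trait, method, situation in detections:
        key = (qtn, trait)
        methods[key].add(method)
        situations[key].add(situation)

    reliable, stable = set(), set()
    for key in methods:
        multi_situation = len(situations[key]) >= min_situations
        # Reliable: at least three methods and/or at least two situations.
        if len(methods[key]) >= min_methods or multi_situation:
            reliable.add(key)
        # Stable: detected in at least two situations.
        if multi_situation:
            stable.add(key)
    return reliable, stable


# Hypothetical detection records, for illustration only.
hits = [
    ("snp_1", "DF", "mrMLM", "E1"),
    ("snp_1", "DF", "pLARmEB", "E1"),
    ("snp_1", "DF", "pKWmEB", "E1"),      # three methods, one situation
    ("snp_2", "PH", "mrMLM", "E1"),
    ("snp_2", "PH", "mrMLM", "Em"),       # one method, two situations
    ("snp_3", "TN", "FASTmrEMMA", "E2"),  # one method, one situation
]
reliable, stable = classify_qtns(hits)
print(sorted(reliable), sorted(stable))
```

By construction every stable QTN is also reliable, while a QTN found by three or more methods in a single situation is reliable but not stable.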

Results

Phenotypic variation

The distributions of the nine agronomic traits measured in the sorghum accessions evaluated in this work are depicted using histograms (Fig. 1). Two-way ANOVA showed significant (p < 0.05) genotype (G) and genotype by environment (G × E) interaction effects for all the traits studied (Table 1), suggesting wide genetic variability among the Ethiopian sorghum accessions, which provides opportunities for effective selection. As for heritability estimates, the traits DF, PH, PE, and HSW presented relatively high heritability values (h² > 0.5), while TN, PWT, GYP, SPM, and GNP had moderate heritability estimates (Table 1).

Fig. 1 Histogram showing the distribution of the nine agronomic traits evaluated in two different environments. DF, days to flowering (days); PH, plant height (cm); TN, number of tillers per plant (no.); PWT, panicle weight (g); GYP, grain yield per panicle (g/panicle); SPM, structural panicle mass (g); HSW, hundred seed weight (g); GNP, grain number per panicle (no.); PE, panicle exertion (cm). Environments: E1, Kobo-2018; E2, Mieso-2018

Table 1 Two-way analysis of variance and descriptive statistics for nine agronomic traits of sorghum accessions evaluated in two environments

Comparing the mean performance of the accessions in each of the environments (Table 1), mean days to flowering (DF) was slightly earlier in E1 (96 days) than E2 (99 days); however, mean plant height (PH, 311.45 cm) and mean panicle exertion (PE, 9.08 cm) were relatively higher in E1 than the 270.24 and 6.56 cm observed in E2 (Table 1).

Structural panicle mass (SPM) and grain number per panicle (GNP) had greater variation in E2 than in E1. However, the remaining traits displayed more consistent variation between the two environments. The complete phenotypic data of all accessions in two environments (E1 and E2) and their combined data (Em) are provided in Supplementary Table 2.

Whole genome patterns of LD

Characterizing patterns of LD is critical for the design of association studies (Mather et al. 2007) and the interpretation of association peaks (Huang et al. 2010). In general, there was a rapid LD decay with increasing physical distance along the 10 sorghum chromosomes (Fig. 2 and Supplementary Fig. S1). At a threshold value of 0.1, LD decays within 60–80 kb on chromosomes 5, 6, 7, and 9 but within 80–100 kb on chromosomes 1, 2, 3, 4, 8, and 10 (Supplementary Fig. S1). On average, LD decays to background levels (r² < 0.1) within 100 kb (Fig. 2).

Fig. 2 Genome-wide LD (r²) decay in the 304 Ethiopian sorghum accessions. Average r² values (squared allele frequency correlation between pairs of SNPs) were plotted against the corresponding physical distance between markers. The vertical solid green line represents the average genome-wide LD decay point (i.e., LD decay = 64,550 base pairs)

This LD decay estimate is higher than previously reported values in sorghum of 10–30 kb (Wang et al. 2013) and 10–15 kb (Hamblin et al. 2005). This difference may be due to the low coverage of the genome by the markers and the small number of genotypes in previous studies. Since sorghum is largely self-pollinated, we expect higher levels of LD than in outcrossing species (Flint-Garcia et al. 2003). Accordingly, the extent of LD in sorghum is similar to that of rice (∼65–150 kb) (Mather et al. 2007), another self-pollinated crop, but much greater than maize (∼2 kb) (Yan et al. 2009), which is an out-crosser. Although we expect mapping resolution to range widely across the genome depending on the chromosome, the overall modest LD decay rate (< 100 kb) makes this Ethiopian collection suitable for GWAS.

QTNs identified by ML-GWAS

To explore the genetic factors associated with nine agronomic traits, we conducted ML-GWAS based on a total of 79,754 high-quality SNP markers (the genomic distribution of the SNP markers used in this study is shown in Fig. 3) and BLUEs from three datasets (E1, E2, and Em). Using six ML-GWAS models, a total of 338 QTNs distributed on 10 chromosomes were identified as significantly associated with nine agronomic traits, based on a LOD score threshold of ≥ 3, in three situations/environments (E1, E2, and Em), as summarized in Table 2. A full list of the QTNs significantly associated with the phenotypes in each environment (E1 and E2) and in the combined dataset (Em) is presented in Supplementary Table 3, while the Manhattan and Q-Q plots of the ML-GWAS results are reported in Supplementary Figs. S2 and S3. Of the identified QTNs, 66, 110, and 162 were identified in the E1, E2, and Em situations, respectively (Table 2). Among the ML-GWAS models, mrMLM identified the greatest number of significant QTNs (192), whereas FASTmrEMMA identified the lowest number (78). Chromosome 1 had the highest number of identified QTNs (49), followed by chromosome 9 (41) and chromosome 3 (39). Overall, the LOD values ranged from 3.01 to 8.37, and the proportion of phenotypic variance explained (r²) by each QTN ranged from 0.45 to 25.92% (Table 2).

Fig. 3 Genomic distribution of the 79,754 high-quality SNP markers across the ten sorghum chromosomes and their corresponding density

Table 2 Summary of significant QTNs identified in two environments and their combined data using six ML-GWAS methods

To obtain accurate results, only QTNs showing repeatability (i.e., detected by at least three different ML-GWAS models and/or in two different situations/environments) were considered reliable. Using these criteria, we identified a total of 121 reliable QTNs significantly associated with nine agronomic traits, as presented in Supplementary Table S4. The 121 QTNs identified each explained a low percentage of phenotypic variation (PVE): DF (n = 13, PVE = 0.60–21.60%), PH (n = 13, PVE = 2.04–16.28%), TN (n = 9, PVE = 1.01–25.92%), PWT (n = 15, PVE = 1.54–15.65%), GYP (n = 30, PVE = 0.79–13.64%), SPM (n = 12, PVE = 2.20–11.97%), HSW (n = 13, PVE = 0.01–16.54%), GNP (n = 6, PVE = 1.85–11.33%), and PE (n = 10, PVE = 1.86–17.76%). Additionally, a total of 29 QTNs were significantly associated with more than one trait (Supplementary Table S4). For instance, the traits DF and PH shared a common QTN (S10_13295281) mapped on chromosome 10 that on average explained ~4.50% of the variation for these traits, whereas GYP and GNP had seven common QTNs (S1_22881870, S1_28143445, S2_58161802, S3_12356222, S7_63176270, S9_38639556, and S10_47554177) on chromosomes 1, 2, 3, 7, 9, and 10, accounting for 2.04–13.64% of the total phenotypic variance for these traits. The traits PWT, GYP, and GNP also shared four common QTNs (S1_70244848, S8_6755616, S8_48609940, and S9_438623) mapped on chromosomes 1, 8, and 9 (Supplementary Table S4).

Identification of stable QTNs and candidate genes

A total of 46 QTNs consistently detected in at least two environments (E1, E2, and Em) were regarded as stable QTNs (Table 3). These stable QTNs were distributed across the 10 sorghum chromosomes, with chromosome 10 showing the lowest number of associations and chromosome 8 the highest (10 QTNs associated with seven traits).

Table 3 List of stable QTNs co-detected in at least two environments for nine sorghum agronomic traits

Among the 46 stable QTNs detected in at least two environments, 7, 9, and 13 were detected by three, four, and five ML-GWAS methods, respectively (Table 3). Moreover, 7 QTNs (S1_56717177, S1_56748133, S7_42021189, S8_43981111, S8_6755616, S9_57542210, and S10_13295281) were identified by all six ML-GWAS methods as associated with five agronomic traits in at least two environments, with LOD score values ranging from 3.05 to 11.42 (Table 3). Interestingly, 2 QTNs (S8_43981111 for DF and S9_48893285 for PE) with moderate effects (r² = ~6%) were consistently detected across all situations/environments (E1, E2, and Em). The region containing one stable QTN on chromosome 8 (S8_6755616, LOD = 3.14–7.75; r² = 1.45–5.21%) was significantly associated with PWT and GYP in two environments (E1 and Em).

To further understand the genetic basis of the agronomic traits, we searched for candidate genes within 100 kb upstream and downstream of the positions of the above 46 stable QTNs, as suggested by the LD decay analysis in this study (Fig. 2). The complete list of candidate genes in proximity to the stable QTNs is reported in Supplementary Table S5. For instance, two putative candidate genes, Sobic.001G266200 and Sobic.007G193300, surrounding significant QTNs associated with DF are annotated as F-box and MADS-box family proteins, respectively, which are involved in multiple developmental processes in plants (Saha et al. 2015).
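The window-based search can be sketched as an interval overlap query. The example below is a minimal illustration under assumed inputs: the gene record layout, identifiers, and coordinates are hypothetical and do not correspond to the sorghum v3.1 annotation.

```python
def genes_in_window(qtn_chrom, qtn_pos, genes, flank_bp=100_000):
    """IDs of genes overlapping [qtn_pos - flank_bp, qtn_pos + flank_bp].

    genes: iterable of dicts with keys "id", "chrom", "start", "end" (bp).
    """
    lo = max(0, qtn_pos - flank_bp)
    hi = qtn_pos + flank_bp
    return [g["id"] for g in genes
            if g["chrom"] == qtn_chrom and g["start"] <= hi and g["end"] >= lo]


# Hypothetical annotation records, for illustration only.
annotation = [
    {"id": "gene_a", "chrom": 8, "start": 920_000, "end": 925_000},      # inside
    {"id": "gene_b", "chrom": 8, "start": 1_099_000, "end": 1_105_000},  # overlaps edge
    {"id": "gene_c", "chrom": 8, "start": 1_150_000, "end": 1_155_000},  # outside
    {"id": "gene_d", "chrom": 2, "start": 990_000, "end": 995_000},      # other chromosome
]
candidates = genes_in_window(qtn_chrom=8, qtn_pos=1_000_000, genes=annotation)
print(candidates)
```

Genes that only partly overlap the window are kept here; restricting the result to genes lying entirely inside the window would be an equally defensible choice.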

Two candidate genes (Sobic.001G013800 and Sobic.003G324400) were also identified for PH on chromosomes 1 and 3, respectively, with the first gene encoding a Ser/Thr protein phosphatase family protein and the second encoding an Ethylene responsive transcription factor (AP2/ERF) family protein (Supplementary Table S5). Interestingly, several candidate genes, including Sobic.009G075400 (Protein RALF-like 4), Sobic.008G102200 (Photosystem II reaction center protein), Sobic.004G053400 (similar to Auxin responsive protein-like), Sobic.008G037300 (similar to Terminal flower1/TF1), and Sobic.009G237900 (Plastocyanin-like domain protein), were identified adjacent to the stable QTNs associated with TN, PWT, SPM, HSW, and PE, respectively. Further examples are given in Fig. 4 and Supplementary Table S5.

Fig. 4 Linkage groups and chromosomal positions of stable QTNs and candidate genes identified for sorghum agronomic traits. The stable QTNs and candidate genes are labeled on the right side of the chromosomes, and trait name abbreviations indicate the different traits. QTNs and candidate genes on each chromosome are highlighted with colors. The intervals between adjacent loci on the chromosomes denote the physical distance in megabases

Discussion

Although several studies have already examined the genetic basis of important agronomic traits in sorghum using GWAS (Bouchet et al. 2017; Boyles et al. 2016; Morris et al. 2012; Zhao et al. 2016), panels composed exclusively of sorghum accessions from the center of origin and diversity have not been sufficiently explored (Girma et al. 2019). Moreover, very few studies have implemented the ML-GWAS approach to identify genetic variants in sorghum. ML-GWAS has become a powerful means of identifying genomic regions underlying traits of interest, particularly complex traits controlled by multiple genes of small effect (Wen et al. 2018; Zhang et al. 2019). Hence, the associated genomic regions reported herein provide valuable knowledge that could be further investigated to advance understanding of the genetic control of traits of economic and adaptive importance.

In this study, we identified a total of 121 reliable QTNs detected by at least three ML-GWAS models and/or in two different environments (Supplementary Table 4). A comparison of the six ML-GWAS methods revealed that mrMLM was more powerful and robust than the other five models in the detection of reliable QTNs for agronomic traits. Most of the QTNs identified in this study were observed in only one environment, consistent with the significant genotype by environment (G × E) interaction effects observed for all the traits studied (Wondimu et al. 2020; Table 1). The presence of G × E interaction is one of the main challenges in selecting QTNs in breeding programs, as the expression of these QTNs depends on the evaluation environment (Wu et al. 2020). On the other hand, the stable QTNs identified herein provide great prospects for future genetic improvement of the traits evaluated in this study through the accumulation of favorable alleles. Genetic correlations between traits can be ascribed to gene linkage and/or pleiotropy (Saltz et al. 2017). In this study, a total of 29 pleiotropic QTNs associated with more than one trait were detected (Supplementary Table S4). Among these, one QTN (S10_13295281) on chromosome 10 was associated with DF and PH. Another four pleiotropic QTNs (S1_70244848, S8_6755616, S8_48609940, and S9_438623) mapped on chromosomes 1, 8, and 9 were associated with PWT, GYP, and GNP. The presence of pleiotropic effects of these QTNs on different agronomic traits was previously suggested by our phenotypic correlation analysis (Wondimu et al. 2020).

As most of the agronomic traits studied are controlled by polygenes, the effects of most of the QTNs identified in this study were small, confirming the quantitative nature of the traits (Gupta et al. 2020). Although SL-GWAS methods have been widely adopted, they are limited in detecting marginal-effect QTNs (Wang et al. 2016); ML-GWAS methods can mitigate this limitation by estimating the effects of all markers at the same time (Cui et al. 2018).

To explore the spectra of candidate genes, we focused on physical intervals supported by the LD decay information (i.e., 100 kb upstream and downstream of associated QTNs; Fig. 2). One of the stable QTNs discovered in this study is S7_62550036, which explained ~4.0% of the variation in flowering time (DF) (Supplementary Table 5). This marker is in close proximity to the Sobic.007G193300 gene, which encodes a MADS transcription factor family protein. MADS family members take part widely in key regulatory pathways of plant growth and reproduction, including flower formation (Callens et al. 2018). In rice, the OsMADS family is involved in controlling flowering time and the development of flower organs (Yu et al. 2014). Another QTN (S3_65025755) with an important effect on plant height (PH) is located near the Sobic.003G324400 gene, which encodes an Ethylene responsive transcription factor (AP2/ERF) family protein; proteins of this family have been reported to limit internode elongation by down-regulating gibberellin biosynthesis genes in rice (Qi et al. 2011). The candidate gene Sobic.008G102200, associated with panicle weight (PWT), encodes a Photosystem II reaction center protein, which is important for light harvesting during photosynthesis (Pietrzykowska et al. 2014). Thus, its possible role in photosynthesis might in theory explain its association with panicle weight, as panicle yield can be determined by factors regulating photosynthetic rate (Ramamoorthy et al. 2017). The QTN (S8_3450030) associated with hundred seed weight (HSW) is very close to Sobic.008G037300 (Terminal flower1/TF1), a gene that functions in the control of flowering time and floral architecture (Alvarez et al. 1992). Mutations in TFL1 accelerate flowering and result in higher seed weight in Arabidopsis (Hanano and Goto 2011). Another gene, Sobic.009G237900, encoding a plastocyanin-like domain (Cu_bind_like) protein, was found near S9_57542210, which is associated with PE (Supplementary Table S5). Previous studies have indicated that the phytocyanin gene family is involved in key plant activities, including apical bud organ development (Fedorova et al. 2002).

Other candidates emerging from our search include genes putatively involved in biotic and abiotic stress responses, kinase activity, transport, and signal transduction (Supplementary Table 5). For instance, Sobic.001G266700 (zinc finger domain; C3HC4 zinc finger) and Sobic.004G028600 (Leucine-rich repeat receptor-like protein kinase/LRR-RLKs) were located near QTNs (S1_50707856 and S4_2182692, respectively) significantly associated with DF and TN. Previous studies identified a C3H4-type zinc finger member as the gene most strongly upregulated by various abiotic stresses, including drought (Ali-Benali et al. 2012). It has also been proposed that LRR-RLKs might be involved in early responses to drought and in ABA perception (Osakabe et al. 2005).

Conclusions

This study involved field-based phenotyping and genotyping-by-sequencing of an Ethiopian sorghum landrace collection representing a wide range of genetic variation that has evolved under diverse environmental conditions. This approach helped identify valuable loci and potential candidate genes underlying genetic variation in nine important agronomic traits of sorghum. We present a list of important QTNs and candidate genes that offer opportunities for identifying specific genes associated with complex traits and for elucidating their underlying biological functions. Functional validation of these newly discovered candidate genes is needed to confirm the association results observed in the present study and may provide a foundation for engineering alternative alleles of still greater value. Overall, the results reported herein advance our understanding of the genetic mechanisms underlying complex traits and support the development of new DNA marker tools for efficient genetic improvement of this crop through molecular breeding.