Introduction

The Food and Agriculture Organization (FAO) estimates that the world’s population will surpass 9 billion by 2050 (Nations and United Nations. 2019). To meet the demand for food and feed, the annual genetic gain of maize must be accelerated to more than 2%, which is a major challenge under climate change (Prasanna et al. 2021). As precipitation, temperature, and humidity change, crop diseases have become a key factor limiting genetic gain. Stalk rot and ear rot are the two major maize diseases with the greatest impact under climate change (Prasanna et al. 2021).

Stalk rot is a complex disease that can be caused by Fusarium verticillioides (F.v), F. graminearum (Gibberella), Colletotrichum graminicola (Anthracnose), and Pythium aphanidermatum, as well as by some bacterial species of Erwinia, which produce symptoms similar to those caused by Fusarium spp. (Chambers 1987). Fusarium stalk rot (FSR), caused by F.v, is one of the most destructive maize diseases worldwide, especially in tropical and subtropical zones (Savary et al. 2019; Chivasa et al. 2021). In South and Central America, FSR incidence usually exceeds 50% (Christensen et al. 2014). Generally, FSR incidence ranges from 30 to 70%, and in some years it can reach 90% in India, China, and the Philippines (Duan et al. 2019). FSR can cause 38–100% yield loss in maize; in addition, the pathogen can produce low-molecular-weight secondary metabolites known as mycotoxins in the grain and plant tissue, which can be harmful or fatal to humans and other animals (Maize AICRP 2014; Subedi et al. 2016; Mueller et al. 2022). As one of the most aggressive pathogens, F.v can infect any part of the maize plant throughout the cropping season, and it overwinters on residues of maize or rotation crops (Munkvold 2003; White 1999). Currently, field control of FSR is relatively inefficient because effective fungicides against FSR are scarce (Zhu et al. 2021; Holland et al. 2020). Therefore, developing and deploying maize varieties with genetic resistance to FSR is the most cost-effective and environmentally friendly approach.

Novel genetic tools, such as genome-wide association studies (GWAS), provide an opportunity to dissect the genetic architecture of FSR resistance and thereby develop FSR-resistant maize varieties more rapidly and effectively. Various genetic studies have reported that FSR resistance is a complex quantitative trait, and hundreds of QTL (quantitative trait loci) and genomic regions associated with resistance to stalk rot caused by different pathogens have been reported, including qRfg1 (Yang et al. 2010; Wang et al. 2017), qRfg2 (Zhang et al. 2012; Ye et al. 2018), qRfg3 (Ma et al. 2017), and Rgsr8.1 (Chen et al. 2017) for Gibberella stalk rot; Rpi1 (Yang et al. 2005), RpiQI319-1 and RpiQI319-2 (Song et al. 2015), and RpiX178-1 and RpiX178-2 (Duan et al. 2019) for Pythium stalk rot; and Rcg1 (Jung et al. 1994) for Anthracnose stalk rot. The causal genes of qRfg1 and qRfg2 have been cloned, and functional markers have been developed for marker-assisted selection to improve stalk rot resistance.

Recently, a GWAS identified a key genomic region at 168 Mb on chromosome 6 conferring FSR resistance, with phenotypic variance explained (PVE) values ranging from 6.16 to 8.38%; this region was further validated by linkage mapping in two F2:3 populations (Rashid et al. 2022). The candidate gene in this region is annotated as a nucleic acid-binding protein that plays an integral role in gene-silencing pathways and in responses to diverse abiotic stresses in maize (Zhai et al. 2019; Qian et al. 2011). However, no other GWAS of maize FSR resistance has been reported, and further GWAS in different genetic and breeding populations are required for a comprehensive understanding of the genetic architecture of FSR resistance.

Genomic prediction (GP), another novel genomic tool, offers opportunities to improve breeding efficiency and accelerate the development of maize varieties with genetic resistance to FSR. GP, also known as genomic selection (GS), is an attractive alternative to conventional breeding or marker-assisted selection (Meuwissen et al. 2001). In GS, the effects of all molecular markers across the genome are estimated and used to predict the genomic estimated breeding values (GEBVs) of selection candidates (Vivek et al. 2017; Jannink et al. 2010; Meuwissen et al. 2001). Prediction accuracy is commonly used to evaluate the effectiveness of GS for improving a target trait, and key factors reported to affect it include trait heritability, prediction model, marker density, genotype-by-environment interaction (G × E), and the relationship between the training and testing populations (Edriss et al. 2017; Crossa et al. 2017; Guo et al. 2020; Mageto et al. 2020). However, the use of GP to improve FSR resistance has not yet been reported. Its potential needs to be investigated by estimating the prediction accuracy of FSR resistance in different breeding scenarios and by assessing how different factors affect that accuracy.

In this study, GWAS and GP analyses were performed on 562 tropical and subtropical maize inbred lines that were evaluated for FSR under artificial inoculation in four environments and genotyped by genotyping-by-sequencing (GBS). The main objectives were to: (1) dissect the genetic architecture of FSR resistance by identifying significantly associated single-nucleotide polymorphisms (SNPs) and stable genomic regions conferring FSR resistance, and by estimating the genetic effects of favorable alleles and haplotypes; (2) explore the potential of GP for improving FSR resistance by evaluating prediction accuracy in different breeding scenarios, using various cross-validation schemes and prediction models within and across populations; and (3) investigate the effects of key factors on the prediction accuracy of FSR resistance, including incorporating G × E into the prediction, using phenotypic data combined within the same year or the same location, and selecting subsets of SNPs based on both genome coverage and SNP P-value thresholds.

Materials and methods

Plant materials

In the present study, 562 tropical and subtropical maize inbred lines from two populations were used to dissect the genetic architecture of FSR resistance by GWAS and to estimate the prediction accuracy of FSR resistance under different scenarios. The first population, designated the CIMMYT maize lines (CML) panel, consists of 280 tropical and subtropical inbred lines developed by the International Maize and Wheat Improvement Center (CIMMYT). From 1984 to 2023, CIMMYT developed and released a total of 647 CMLs, which represent a substantial portion of the genetic diversity of CIMMYT maize germplasm (Wu et al. 2016). In the present study, lowland tropical and mid-altitude/subtropical lines numbered from CML300 to CML603 were selected for screening for FSR resistance.

The second population, designated the Drought Tolerant Maize for Africa (DTMA) panel, consists of 282 tropical and subtropical inbred lines developed by CIMMYT. These lines originated from different CIMMYT breeding programs and include lines with tolerance or resistance to a range of abiotic and biotic stresses (Yuan et al. 2019).

Experimental design

The CML and DTMA populations were screened for FSR resistance at two CIMMYT experimental stations: Agua Fria (AF), Puebla, Mexico (97°38′ W, 20°28′ N; 110 m above sea level), and Tlaltizapan (TL), Morelos, Mexico (99°07′ W, 18°4′ N; 940 m above sea level). The CML population was planted at AF and TL in the summer seasons of 2018 and 2019, and the DTMA population in the summer seasons of 2014 and 2019. All experiments used a randomized complete block design with three replications per location and a single-row plot per replication. Each plot was 2.5 m long with 11 plants, with 0.80 m between rows and 0.25 m between plants within a row.

An environment was defined as a combination of year and location. Therefore, each population was evaluated in four environments, giving 12 observations per line (four environments × three replications). For example, the CML population was screened in four environments, designated 2018AF, 2018TL, 2019AF, and 2019TL.

Artificial inoculation and evaluation

All 562 inbred lines were artificially inoculated with Fusarium verticillioides (F.v), the main pathogen causing maize stalk rot in Mexico (Prasanna et al. 2021). The fungus was cultured on fresh potato dextrose agar plates into which sterile toothpicks had been inserted. The culture was incubated at 25 °C for 2 weeks, and the colonized toothpicks were used for inoculation (Lal and Singh 1984). Fourteen days after flowering, all plants in each plot were inoculated by inserting a colonized toothpick into a hole drilled in the first stem segment (approximately 0.1 m above the soil surface). A recent study reported that toothpick inoculation is effective and performs similarly to other widely used methods, such as soil inoculation, drilling inoculation, and needle injection (Asiedu et al. 2024).

At harvest, the plants were cut at ear height, approximately 0.50 to 1.00 m above the ground, and the stalks were split longitudinally through the inoculation points. Disease severity was estimated as follows:

$$\text{FSR severity (\%)} = \frac{\text{visible lesion area}}{\text{whole longitudinal cut area}} \times 100\%$$

FSR severity ranges from 0 to 100%. A severity close to 0% (no visible disease symptoms or lesions on the stalk) indicates the highest level of resistance to FSR, whereas a severity close to 100% indicates little or no resistance.
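The severity score is a simple ratio of two areas. A minimal Python sketch, assuming both areas are measured on the same split stalk in the same units; the function name `fsr_severity` and the example areas are hypothetical:

```python
def fsr_severity(lesion_area: float, cut_area: float) -> float:
    """FSR severity (%) = visible lesion area / whole longitudinal cut area x 100."""
    if cut_area <= 0:
        raise ValueError("cut_area must be positive")
    if not 0 <= lesion_area <= cut_area:
        raise ValueError("lesion_area must lie between 0 and cut_area")
    return lesion_area / cut_area * 100.0


# Hypothetical areas (cm^2) measured on one split stalk
print(fsr_severity(12.5, 50.0))  # 25.0
```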

Phenotypic data analysis

For the CML and DTMA populations, the best linear unbiased estimates (BLUEs) and broad-sense heritability (H²) of FSR severity were obtained from single-environment analyses and from the combined analysis across environments (CombinedENV) with META-R software (https://hdl.handle.net/10883/20997) (Alvarado et al. 2020), using a mixed linear model. META-R fits the mixed linear model with the ‘lmer’ function of the R package ‘lme4’ (Bates et al. 2015), and the variance components were estimated by restricted maximum likelihood (REML). The models were as follows:

$$\mathrm{Individual \,environment}: {y}_{ijk}=\mu +{g}_{i}+{e}_{j}+{r}_{k}{e}_{j}+{\varepsilon }_{ijk}$$
$$\mathrm{Combined \,environments}: {y}_{ijk}=\mu +{g}_{i}+{e}_{j}+{ge}_{ij}+{r}_{k}{e}_{j}+{\varepsilon }_{ijk}$$

where \({y}_{ijk}\) is the FSR severity, µ is the overall mean, and \({g}_{i}\), \({e}_{j}\), and \({ge}_{ij}\) are the effects of the ith genotype, the jth environment, and the interaction between the ith genotype and the jth environment, respectively; \({r}_{k}{e}_{j}\) is the effect of the kth replication within the jth environment; and \({\varepsilon }_{ijk}\) is the residual effect of the ith genotype, jth environment, and kth replication. Genotype was treated as a fixed effect, and all other effects were treated as random. The single-environment model contains no \({ge}_{ij}\) term (interaction between genotype and environment).

Environments with an estimated heritability below 0.40 were excluded from the CombinedENV analysis. The H² of FSR severity in the individual-environment and CombinedENV analyses was calculated as:

$$\mathrm{Individual \,environment \,analysis}: {H}^{2}=\frac{{V}_{g}}{{V}_{g}+\frac{{V}_{e}}{k}}$$
$$ \mathrm{CombinedENV \,analysis}: {H}^{2}=\frac{{V}_{g}}{{V}_{g}+\frac{{V}_{{\text{ge}}}}{j}+\frac{{V}_{e}}{jk}}$$

where \({V}_{g}\) is genetic variance, \({V}_{ge}\) is the variance of interaction between genotype and environment, \({V}_{e}\) is error variance, \(j\) is the number of environments, and \(k\) is the number of replications within each environment.
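Both heritability formulas can be evaluated directly once the variance components are available. The sketch below is illustrative only: the function names and the variance components are hypothetical, not estimates from META-R or from this study.

```python
def h2_individual(v_g: float, v_e: float, k: int) -> float:
    """Broad-sense heritability within one environment with k replications."""
    return v_g / (v_g + v_e / k)


def h2_combined(v_g: float, v_ge: float, v_e: float, j: int, k: int) -> float:
    """Broad-sense heritability across j environments with k replications each."""
    return v_g / (v_g + v_ge / j + v_e / (j * k))


# Hypothetical variance components
print(round(h2_individual(v_g=10.0, v_e=6.0, k=3), 3))               # 0.833
print(round(h2_combined(v_g=10.0, v_ge=4.0, v_e=12.0, j=4, k=3), 3))  # 0.833
```

Because \(V_{ge}\) is divided by \(j\) and \(V_e\) by \(jk\), adding environments or replications reduces the weight of these terms in the denominator when the variance components are held fixed.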

In addition, descriptive statistics of the phenotypic data were computed in IBM SPSS Statistics, version 22.0 (IBM Corp. 2022). The distributions of FSR severity in the individual-environment and CombinedENV analyses were plotted in R (R Core Team 2020) using the ‘ggplot2’ package (Wickham 2016). Pearson correlations of FSR severity among the individual and combined environments in the CML and DTMA populations were calculated from the BLUE values and visualized in R with the package ‘ggcorrplot’ (Wickham et al. 2016). Moreover, the 10 lines with the lowest FSR severity and the 10 lines with the highest FSR severity were identified within each population.
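The correlation and ranking steps can be sketched in Python with NumPy. This is a minimal illustration on simulated data, assuming the BLUEs are arranged as a lines × environments matrix; the study used R, and here the line mean across environments stands in for the CombinedENV BLUE, which is a simplification.

```python
import numpy as np

# Hypothetical BLUEs: 50 lines x 4 environments (stand-ins for, e.g., 2018AF, 2018TL, 2019AF, 2019TL)
rng = np.random.default_rng(1)
line_ids = [f"L{i:03d}" for i in range(50)]
genetic = rng.normal(55.0, 12.0, size=50)
env_blues = genetic[:, None] + rng.normal(0.0, 8.0, size=(50, 4))

# Assumption: the line mean across environments stands in for the CombinedENV BLUE
combined = env_blues.mean(axis=1)

# Pearson correlation matrix among the four environments plus the combined column
corr = np.corrcoef(np.column_stack([env_blues, combined]), rowvar=False)

# Lowest severity = most resistant; highest severity = most susceptible
order = np.argsort(combined)
top10_resistant = [line_ids[i] for i in order[:10]]
bottom10_susceptible = [line_ids[i] for i in order[::-1][:10]]

print(np.round(corr, 2))
print(top10_resistant)
print(bottom10_susceptible)
```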

Genotyping, GBS, and SNP calling

Total genomic DNA was extracted from bulked young leaves of each line using a CTAB procedure (Doyle and Doyle 1987). Genotyping was performed at the Cornell University Biotechnology Resource Center (Ithaca, NY). Genomic DNA was digested with the restriction enzyme ApeKI, and genotyping-by-sequencing (GBS) libraries were constructed at 96-plex and sequenced on an Illumina HiSeq2000 (Elshire et al. 2011). SNP calling was performed with the TASSEL GBS pipeline, in which the GBS version 2.7 TOPM (tags on physical map) file downloaded from Panzea (www.panzea.org) was used to anchor reads to the B73 RefGen_v2 maize reference genome (Glaubitz et al. 2014; Wang et al. 2020). In total, 955,690 SNPs were called for each inbred line; 955,120 of them were evenly distributed on the ten maize chromosomes, and the remaining 570 SNPs had no position information.

GWAS analysis

Quality control (QC) of the genotypic data was performed before GWAS to ensure the reliability of downstream analyses. The combined population of all 562 inbred lines is referred to as CombinedPOP. In each of the CML, DTMA, and CombinedPOP populations, the raw GBS data were filtered in TASSEL 5.0 to retain SNPs with a minor allele frequency (MAF) above 0.05, a missing data rate below 30%, and a heterozygosity rate below 5%. Missing genotypes were then imputed with the LD KNNi method (Money et al. 2015) using default parameters in TASSEL 5.0 (Bradbury et al. 2007). The imputed GBS data and the BLUE values of FSR severity were used for GWAS in all three populations.
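The three site filters can be expressed as a column-wise mask over a genotype matrix. A minimal NumPy sketch, assuming genotypes coded 0/1/2 (1 = heterozygous) with NaN for missing calls; it mirrors the stated thresholds but is not TASSEL's implementation, the function name `qc_mask` is hypothetical, and imputation is not shown.

```python
import numpy as np


def qc_mask(geno, min_maf=0.05, max_missing=0.30, max_het=0.05):
    """Boolean mask of SNPs (columns) passing MAF, missing-rate and heterozygosity filters.

    geno: lines x SNPs array coded 0/2 = homozygous, 1 = heterozygous, np.nan = missing.
    """
    called = ~np.isnan(geno)
    n_called = called.sum(axis=0)
    missing_rate = 1.0 - n_called / geno.shape[0]
    with np.errstate(invalid="ignore", divide="ignore"):
        p = np.nansum(geno, axis=0) / (2.0 * n_called)  # frequency of the allele coded 2
        het_rate = (geno == 1).sum(axis=0) / n_called   # share of called genotypes that are heterozygous
    maf = np.minimum(p, 1.0 - p)
    return (maf > min_maf) & (missing_rate < max_missing) & (het_rate < max_het)


# Hypothetical 4 lines x 4 SNPs
geno = np.array([
    [0, 2, 0, np.nan],
    [0, 2, 2, np.nan],
    [2, 1, 0, 0],
    [2, 2, 2, np.nan],
], dtype=float)
print(qc_mask(geno))  # [ True False  True False]
```

The second SNP fails the heterozygosity filter and the fourth fails both the missing-rate and MAF filters.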

The Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK) model (Huang et al. 2019) was used to detect associations between SNPs and FSR severity because it effectively reduces false positives. In addition to incorporating principal components (PCs) and kinship (K) as covariates to reduce false positives, BLINK iteratively incorporates associated markers as covariates to control for confounding among individuals. Moreover, the markers used as covariates in BLINK are selected according to linkage disequilibrium, optimized using the Bayesian information criterion (BIC), and re-examined over multiple iterations to reduce false positives. BLINK consists of two fixed-effect models and one filtering process (Huang et al. 2019).

GWAS with the BLINK model was run with GAPIT version 3 (Wang et al. 2021) in R, and PC analysis and kinship matrix calculation were conducted with default parameters. The default BLINK threshold for declaring a SNP significantly associated with FSR resistance was used, i.e., the Bonferroni threshold of 0.05/n, where n is the number of SNPs. The first two principal components were plotted in R with the package ‘ggplot2’ (Wickham et al. 2016). Manhattan and quantile–quantile (QQ) plots of the GWAS results were drawn with the ‘complot’ package (Van den Ende et al. 2019) in R.
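The Bonferroni threshold is a single division. The sketch below applies it to the post-QC SNP counts reported in the Results and reproduces the thresholds quoted there; the function name is hypothetical.

```python
def bonferroni_threshold(n_snps: int, alpha: float = 0.05) -> float:
    """Genome-wide significance threshold alpha / n."""
    return alpha / n_snps


# Post-QC SNP counts reported for the three populations
for panel, n in {"CML": 215_914, "DTMA": 209_111, "CombinedPOP": 221_190}.items():
    print(panel, f"{bonferroni_threshold(n):.1e}")
# CML 2.3e-07
# DTMA 2.4e-07
# CombinedPOP 2.3e-07
```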

Genetic effect of the favorable allele

For each SNP significantly associated with FSR resistance, the allele with the lower average FSR severity was designated the favorable allele, and the other allele, with the higher average FSR severity, was designated the unfavorable allele. The effect of each favorable allele was calculated as follows:

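$$\text{Effect of each favorable allele} = \bar{y}_{\text{favorable}} - \bar{y}_{\text{unfavorable}}$$

where \(\bar{y}_{\text{favorable}}\) and \(\bar{y}_{\text{unfavorable}}\) are the average FSR severities of the lines carrying the favorable and the unfavorable allele, respectively. Because the favorable allele has the lower average severity, the effect is negative.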
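A minimal sketch of the favorable-allele calculation for one SNP, assuming homozygous inbred lines (heterozygous calls removed beforehand) and FSR severity BLUEs as input; the function name and example data are hypothetical.

```python
import numpy as np


def favorable_allele_effect(genotypes, severity):
    """Return (favorable allele, effect) for one biallelic SNP.

    Effect = mean severity of lines with the favorable allele
             - mean severity of lines with the unfavorable allele (always <= 0).
    """
    genotypes = np.asarray(genotypes)
    severity = np.asarray(severity, dtype=float)
    alleles = np.unique(genotypes)
    if alleles.size != 2:
        raise ValueError("expected exactly two allele classes at this SNP")
    means = {a: severity[genotypes == a].mean() for a in alleles}
    fav = min(alleles, key=lambda a: means[a])  # lower mean severity = favorable
    unfav = alleles[alleles != fav][0]
    return str(fav), means[fav] - means[unfav]


allele, effect = favorable_allele_effect(["A", "A", "G", "G", "G"],
                                         [40.0, 44.0, 60.0, 55.0, 65.0])
print(allele, effect)  # A -18.0
```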

Candidate gene analysis

The average linkage disequilibrium (LD) decay on each chromosome was measured in TASSEL 5.0 using a sliding window of 50 SNPs. The squared Pearson correlation coefficient (r²) between pairs of SNPs was used to measure LD on each chromosome, and the average distance at which r² decayed to 0.1 across the ten chromosomes was taken as the LD decay distance in the CML, DTMA, and CombinedPOP populations (Yan et al. 2009). LD decay was plotted against physical distance (kb) in R with the package ‘ggplot2’ (Wickham et al. 2016).
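The r² statistic and a decay distance can be sketched as follows. The `r_squared` function follows the definition above; `ld_decay_distance` is a simple binned heuristic (the first distance bin whose mean r² drops below 0.1), an assumption for illustration rather than the exact procedure used with TASSEL. Both function names are hypothetical.

```python
import numpy as np


def r_squared(snp_a, snp_b):
    """Squared Pearson correlation between two SNP genotype vectors (0/1/2 coding)."""
    r = np.corrcoef(snp_a, snp_b)[0, 1]
    return r * r


def ld_decay_distance(distances_kb, r2_values, threshold=0.1, bin_kb=1.0):
    """Lower edge (kb) of the first distance bin whose mean r2 falls below threshold."""
    distances_kb = np.asarray(distances_kb, dtype=float)
    r2_values = np.asarray(r2_values, dtype=float)
    bins = np.floor(distances_kb / bin_kb).astype(int)
    for b in np.unique(bins):  # np.unique returns the bins in ascending order
        if r2_values[bins == b].mean() < threshold:
            return float(b * bin_kb)
    return None  # r2 never decays below the threshold in the observed range


# Hypothetical exponential decay of r2 with distance
d = np.arange(0.5, 10.0, 1.0)
print(ld_decay_distance(d, np.exp(-d / 2)))  # 5.0
```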

Based on the LD decay distance, a genomic region was defined as the interval spanning the physical position of each significant SNP ± the LD decay distance. Overlapping or partially overlapping regions were merged into a single region. Genes located within these genomic regions were considered putative candidate genes conferring FSR resistance and were annotated using NCBI (https://www.ncbi.nlm.nih.gov) and MaizeGDB (https://www.maizegdb.org).
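Defining SNP ± LD-decay intervals and joining overlapping ones is a standard interval-merging step. A minimal sketch for SNPs on a single chromosome; the function name and positions are hypothetical, and 3,600 bp corresponds to the CML decay distance reported in the Results.

```python
def merge_regions(snp_positions, ld_decay_bp):
    """Build SNP +/- LD-decay intervals on one chromosome and merge overlapping ones.

    Returns a sorted list of (start, end) tuples in bp.
    """
    intervals = sorted((max(0, p - ld_decay_bp), p + ld_decay_bp) for p in snp_positions)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:  # overlaps or touches the previous region
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(merge_regions([1000, 5000, 7000, 20000], ld_decay_bp=3600))
# [(0, 10600), (16400, 23600)]
```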

Haplotype analysis within the genomic regions conferring FSR resistance

Haplotype blocks in the genomic regions associated with FSR resistance were built with LDBlockShow (Dong et al. 2021) based on the standardized disequilibrium coefficient (D′) (Flint-Garcia et al. 2003), using the SNPs within each genomic region that were significant at a P-value threshold of 10⁻³.
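For two biallelic SNPs, D′ is the linkage disequilibrium coefficient D scaled by its maximum possible value given the allele frequencies. A sketch of this standard formula, assuming the haplotype frequencies are known (in inbred lines they can be read from homozygous genotypes); LDBlockShow performs this calculation internally, and the function name here is hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Normalized LD coefficient |D'| for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


print(d_prime(0.5, 0.5, 0.5))   # 1.0  (only AB and ab haplotypes present)
print(d_prime(0.25, 0.5, 0.5))  # 0.0  (loci independent)
```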

Genomic prediction analysis

A five-fold cross-validation (CV) scheme with 20 replications was used to randomly generate training and validation sets and to assess prediction accuracy. Prediction accuracy was defined as the average Pearson correlation between the true breeding values and the genomic estimated breeding values in the testing population (Liu et al. 2021). GP was performed with genome-wide SNPs and the BLUE values of FSR severity from single-environment analyses in the CML, DTMA, and CombinedPOP populations, respectively. GP analyses were conducted with the BGLR library (Pérez et al. 2014) in R, and the deviance information criterion (DIC) was calculated for each model. A lower DIC indicates a better balance between model fit and model complexity (Tomohiro 2011).
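The cross-validation loop and accuracy definition can be sketched in Python. The study fitted models with BGLR in R; this sketch swaps in ridge regression (an rrBLUP-like model) with a fixed, hypothetical shrinkage parameter `lam`, uses simulated data, and measures accuracy as the correlation between observed and predicted values in each held-out fold.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge-regression marker effects (rrBLUP-like); returns predictions for X_test."""
    mu = y_train.mean()
    x_mean = X_train.mean(axis=0)
    Xc = X_train - x_mean
    # Dual form: beta = Xc' (Xc Xc' + lam I)^-1 (y - mu), cheaper when markers >> lines
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y_train)), y_train - mu)
    beta = Xc.T @ alpha
    return mu + (X_test - x_mean) @ beta


def cv_accuracy(X, y, n_folds=5, n_reps=20, lam=1.0, seed=0):
    """Mean Pearson r between observed and predicted values over all folds and replications."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_reps):
        folds = rng.permutation(len(y)) % n_folds  # random, balanced fold labels
        for f in range(n_folds):
            test = folds == f
            pred = ridge_predict(X[~test], y[~test], X[test], lam)
            accuracies.append(np.corrcoef(y[test], pred)[0, 1])
    return float(np.mean(accuracies))


# Hypothetical simulated panel: 200 inbred lines x 300 SNPs coded 0/2
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(200, 300)) * 2.0
g = X @ rng.normal(0.0, 0.1, size=300)
y = g + rng.normal(0.0, 0.5 * g.std(), size=200)
print(round(cv_accuracy(X, y, n_reps=20, lam=75.0), 2))
```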

Two CV schemes were applied. CV1 mimics the breeding scenario of predicting newly developed lines that have not been evaluated in any environment. CV2 mimics sparse testing, in which some lines are evaluated in some environments but not in others (Mageto et al. 2020).
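The practical difference between the two schemes is which unit is assigned to folds. A minimal sketch, assuming line × environment records indexed by line ID; the function name is hypothetical and the exact fold construction used in the study may differ.

```python
import numpy as np


def assign_folds(line_ids, scheme, n_folds=5, seed=0):
    """Assign line-by-environment records to CV folds.

    CV1: folds are drawn over lines, so all records of a line are held out together
         (the line is unobserved in every environment).
    CV2: folds are drawn over records, so a line can be tested in one environment
         while its records from other environments stay in the training set.
    """
    rng = np.random.default_rng(seed)
    line_ids = np.asarray(line_ids)
    if scheme == "CV1":
        lines = np.unique(line_ids)
        line_fold = dict(zip(lines, rng.permutation(len(lines)) % n_folds))
        return np.array([line_fold[lid] for lid in line_ids])
    if scheme == "CV2":
        return rng.permutation(len(line_ids)) % n_folds
    raise ValueError("scheme must be 'CV1' or 'CV2'")


# Hypothetical data: 30 lines, each observed in 4 environments
line_ids = np.repeat([f"line{i}" for i in range(30)], 4)
cv1 = assign_folds(line_ids, "CV1")
cv2 = assign_folds(line_ids, "CV2")
```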

To compare the prediction accuracy of phenotypic prediction and GS, and to assess whether incorporating G × E improves prediction accuracy, three prediction models were applied. M1 is a phenotypic prediction model that includes the effects of environments and lines. M2 is a standard GP model that includes the effects of molecular markers. M3 extends M2 by incorporating G × E. Details of the three models are described in Method S1.
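Method S1 is not reproduced here. As an illustration only, one common way to add G × E to a genomic model is a reaction-norm kernel: the Hadamard (element-wise) product of a record-level genomic relationship kernel and an environment kernel, which can then be supplied to a kernel-based regression such as BGLR's RKHS option. The sketch below builds these kernels in NumPy; whether M3 uses exactly this form is an assumption, and all names and data are hypothetical.

```python
import numpy as np


def vanraden_g(X):
    """VanRaden-style genomic relationship matrix from 0/1/2-coded SNPs (lines x SNPs)."""
    p = X.mean(axis=0) / 2.0
    W = X - 2.0 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))


def gxe_kernels(G, line_idx, env_idx, n_envs):
    """Record-level kernels K_G, K_E and K_GE = K_G * K_E (element-wise product)."""
    Z_g = np.eye(G.shape[0])[line_idx]  # record-to-line incidence matrix
    Z_e = np.eye(n_envs)[env_idx]       # record-to-environment incidence matrix
    K_G = Z_g @ G @ Z_g.T
    K_E = Z_e @ Z_e.T                   # 1 if two records share an environment, else 0
    return K_G, K_E, K_G * K_E


rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5, 40)) * 2.0  # hypothetical: 5 inbred lines, 40 SNPs
G = vanraden_g(X)
line_idx = np.repeat(np.arange(5), 2)       # each line observed in 2 environments
env_idx = np.tile(np.arange(2), 5)
K_G, K_E, K_GE = gxe_kernels(G, line_idx, env_idx, n_envs=2)
```

The resulting `K_GE` keeps genomic covariance only between records from the same environment, which is what allows marker effects to vary among environments.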

To evaluate the effects of year and location on prediction accuracy, the FSR severity data within the CML and DTMA populations were also analyzed by combining data from the same location (CombinedAF and CombinedTL) or the same year (Combined2018 and Combined2019 in the CML population; Combined2014 and Combined2019 in the DTMA population). Within each population, prediction accuracy was then estimated using the BLUE values of FSR severity from each of these combined datasets.

To investigate GP accuracy based on SNPs significantly associated with FSR resistance, SNP subsets detected by GWAS at P-value thresholds of 10⁻³, 10⁻⁴, and 10⁻⁵ were used for GP with M3 in CV2. Only unique SNPs were used, so a SNP detected in more than one GWAS analysis was included once.
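Pooling significant SNPs across GWAS runs amounts to a thresholded set union. A minimal sketch, assuming each GWAS result is available as a mapping from SNP ID to P-value; the function name and example values are hypothetical.

```python
def select_unique_snps(gwas_results, p_threshold):
    """Unique SNP IDs with P < p_threshold in at least one GWAS analysis.

    gwas_results: iterable of dicts mapping SNP ID -> P-value (one dict per analysis).
    """
    selected = set()
    for result in gwas_results:
        selected.update(snp for snp, p in result.items() if p < p_threshold)
    return sorted(selected)


# Hypothetical results from two GWAS analyses
runs = [
    {"S1_100": 2e-4, "S2_200": 5e-6, "S3_300": 0.02},
    {"S1_100": 8e-5, "S4_400": 3e-4},
]
for t in (1e-3, 1e-4, 1e-5):
    print(t, select_unique_snps(runs, t))
# 0.001 ['S1_100', 'S2_200', 'S4_400']
# 0.0001 ['S1_100', 'S2_200']
# 1e-05 ['S2_200']
```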

GP accuracy was also estimated across the CML and DTMA populations by using one population as the training set and the other as the testing set. Both the genome-wide SNPs and the SNPs significantly associated with FSR resistance at a P-value threshold of 10⁻³ were used, with all three prediction models and the CV2 scheme.

Results

Phenotypic variation of FSR severity and correlation analysis

FSR severity showed wider variation and higher mean values in the CML population than in the DTMA population in the individual-environment and CombinedENV analyses, except in 2019TL (Table 1, Fig. 1a and c). In the CombinedENV analysis, FSR severity in the CML population ranged from 29.17 to 92.50%, with an overall mean of 56.24%, whereas in the DTMA population it ranged from 17.41 to 79.86%, with an overall mean of 46.70%. These phenotypic differences suggest that the two populations differ in their genetic variation for FSR resistance and that disease pressure differed among years.

Table 1 Summary information of Fusarium stalk rot (FSR) severity in the populations of CIMMYT Maize Line (CML) and Drought Tolerant Maize for Africa (DTMA) in each individual environment and the combined environment analyses
Fig. 1

a The violin plot for the distribution of Fusarium stalk rot (FSR) severities in the CML population. b The phenotypic correlations of FSR severity among different environments in the CML population. c The violin plot for the distribution of FSR severities in the DTMA population. d The phenotypic correlations of FSR severity among different environments in the DTMA population. The individual environment was defined as a combination of year and location. Agua Fria and Tlaltizapan were abbreviated as AF and TL, respectively. The combined analysis across all environments was abbreviated as CombinedENV. Correlation coefficients with ** represent extremely significant correlations, and correlation coefficients with * represent significant correlations

The estimated heritabilities of FSR severity were moderate to high in both populations, ranging from 0.67 to 0.85 in the CML population and from 0.53 to 0.79 in the DTMA population, excluding 2014AF, which had the lowest heritability (0.38). In the CombinedENV analysis, the heritability of FSR severity was 0.77 in the CML population and 0.55 in the DTMA population (Table 1).

The Pearson correlation coefficients of FSR severity among all individual and combined environments were positive and ranged from low to high (Fig. 1b and d). In both populations, the correlations between CombinedENV and the individual environments were higher than those among the individual environments. In the CML population, correlations among individual environments ranged from 0.26 to 0.56, and those between CombinedENV and the individual environments ranged from 0.56 to 0.82. In the DTMA population, the corresponding ranges were 0.15 to 0.44 and 0.64 to 0.77.

Within each population, the ten most resistant lines (lowest FSR severity) and the ten most susceptible lines (highest FSR severity) were identified based on the BLUE values from the CombinedENV analysis (Supplementary Table 1). The FSR severity of the ten most resistant lines ranged from 29.17% to 37.50% in the CML population (CML552, CML596, CML581, CML601, CML389, CML478, CML582, CML600, CML307, and CML401) and from 17.41% to 28.15% in the DTMA population (DTMA175, DTMA143, DTMA180, DTMA155, DTMA187, DTMA192, DTMA146, DTMA261, DTMA60, and DTMA145). The FSR severity of the ten most susceptible lines ranged from 78.33% to 92.50% in the CML population (CML511, CML360, CML590, CML334, CML467, CML584, CML585, CML591, CML329, and CML362) and from 67.78% to 79.86% in the DTMA population (DTMA107, DTMA241, DTMA166, DTMA65, DTMA231, DTMA98, DTMA13, DTMA120, DTMA126, and DTMA29).

Population structure analysis and LD decay distance

After QC, 215,914, 209,111, and 221,190 SNPs were retained for further genetic analysis in the CML, DTMA, and CombinedPOP populations, respectively. The high-quality SNPs were evenly distributed across the ten chromosomes in all three populations. The average MAF was 0.22, 0.19, and 0.20, and the average missing rate was 9%, 3%, and 6%, in the CML, DTMA, and CombinedPOP populations, respectively (Fig. S1).

Population structure in the three populations is illustrated by PCA plots, in which the first two principal components (PC1 and PC2) together explained 5.6%, 8.1%, and 4.7% of the genotypic variation in the CML, DTMA, and CombinedPOP populations, respectively (Fig. 2a–c). In all three populations, the lines were divided into two groups, tropical and subtropical, based on their pedigree information.
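PC1 and PC2 are computed from the genotype matrix, so the percentage each explains refers to variation among markers. A minimal NumPy sketch of that calculation on simulated two-group data; GAPIT's own PCA may center or scale markers differently, so this is illustrative only, and the function name is hypothetical.

```python
import numpy as np


def pca_variance_explained(X, n_pcs=2):
    """PC scores and proportion of marker variance explained, via SVD of centered genotypes."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = s**2 / np.sum(s**2)
    return U[:, :n_pcs] * s[:n_pcs], var_ratio[:n_pcs]


# Hypothetical two-group panel: allele frequencies differ between the groups
rng = np.random.default_rng(3)
freq_a, freq_b = rng.uniform(0.1, 0.9, size=(2, 100))
X = np.vstack([rng.binomial(1, freq_a, size=(30, 100)),
               rng.binomial(1, freq_b, size=(30, 100))]) * 2.0
scores, ratio = pca_variance_explained(X)
print(np.round(ratio * 100, 1))  # % of marker variance explained by PC1 and PC2
```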

Fig. 2

a, b, c Principal component (PC) analysis plots in the populations of CML, DTMA, and the CombinedPOP consisting of all 562 inbred lines. The tropical lines and subtropical lines in PCA plots were colored in different colors. d, e, f The linkage disequilibrium decay plots in the populations of CML, DTMA, and the CombinedPOP

The average LD decay distance at r² = 0.10 across the ten chromosomes was 3.60 kb, 3.47 kb, and 2.83 kb in the CML, DTMA, and CombinedPOP populations, respectively (Fig. 2d–f).

Significantly associated SNPs and genomic regions conferring FSR resistance detected by GWAS and annotation of candidate genes

In total, 15 SNPs significantly associated with FSR resistance were detected across the three populations at the P-value threshold of 0.05/n (n is the number of genome-wide SNPs used for GWAS), i.e., 2.3 × 10⁻⁷, 2.4 × 10⁻⁷, and 2.3 × 10⁻⁷ in the CML, DTMA, and CombinedPOP populations, respectively (Table 2, Fig. 3a, c and e). The QQ plots from the three GWAS analyses indicated that population structure was well controlled and that the BLINK model was effective in identifying reliable SNPs conferring FSR resistance (Fig. 3b, d and f). The 15 SNPs were distributed on all chromosomes except chromosomes 9 and 10. Five were detected in the CML population, on chromosomes 1, 2, 3, and 5; seven in the DTMA population, on chromosomes 1, 3, 4, 6, 7, and 8; and three in CombinedPOP, on chromosomes 1 and 4. The P-values of the 15 SNPs ranged from 8.27 × 10⁻¹³ to 1.99 × 10⁻⁷, and their phenotypic variance explained (PVE) values ranged from 0.94 to 8.30%, with an average of 3.63% (Table 2). S6_112215613, detected in the DTMA population, had the lowest P-value (8.27 × 10⁻¹³), with a PVE of 2.09% and a MAF of 0.32. Four of the 15 SNPs had PVE values greater than 5%. S2_41485521, detected in the CML population, had the largest PVE (8.30%), with a P-value of 1.99 × 10⁻⁷ and a MAF of 0.11. S4_211481644, detected in CombinedPOP, had the second largest PVE (6.98%), with a P-value of 1.66 × 10⁻⁷ and a MAF of 0.05.
These results suggest that FSR resistance in tropical maize is controlled by multiple QTL with small to medium effects and is strongly influenced by the genetic background of the populations studied.

Table 2 Information of significantly associated SNPs conferring FSR resistance detected by GWAS using BLINK model in populations of CML, DTMA, and CombinedPOP
Fig. 3

a–f Manhattan and quantile–quantile (QQ) plots of the GWAS results using the BLINK model in the CML (a, b), DTMA (c, d), and CombinedPOP (e, f) populations. The green bar shows the stable genomic region conferring FSR resistance at ~250 Mb on chromosome 1 detected in all three populations. g Distribution of the annotated candidate genes by physical position in this region

For all 15 SNPs significantly associated with FSR resistance, FSR severity differed significantly or extremely significantly between lines carrying the favorable and unfavorable alleles (Fig. 4). In the CML population, the genetic effect of each SNP (the difference between the favorable and unfavorable alleles) ranged from −7.19 to −14.12%, with an average of −9.62%; in the DTMA population, it ranged from −4.29 to −9.65%, with an average of −7.09%; and in CombinedPOP, it ranged from −5.26 to −12.22%, with an average of −8.47%. The favorable allele frequencies of 11 of the 15 SNPs were greater than 0.50, whereas those of the remaining four SNPs (S2_41485521 and S3_165448326 in the CML population, and S6_112215613 and S8_21865355 in the DTMA population) were less than 0.50.

Fig. 4

The distribution, effect, and comparison of FSR severity between lines carrying favorable and unfavorable alleles for five significantly associated SNPs in the CML population (green and pink), seven SNPs in the DTMA population (indigo and yellow), and three SNPs in CombinedPOP (rose madder and blue)

Based on the LD decay distance within each population and the physical positions of the significant SNPs, 13 key genomic regions conferring FSR resistance were identified (Table 2): five in the CML population, in bins 1.09, 2.04, 2.06, 3.05, and 5.06; seven in the DTMA population, in bins 1.07, 1.09, 3.07, 4.05, 6.04, 7.01, and 8.03; and three in CombinedPOP, in bins 1.04, 1.08, and 4.09. The 15 population-specific regions reduce to 13 because the regions at ~250 Mb on chromosome 1 detected in the three populations overlap and were merged into one. In total, nine haplotype blocks were built in the key genomic regions other than the region at ~250 Mb on chromosome 1 detected across all three populations (Fig. S2).

Within these genomic regions, 26 putative candidate genes conferring FSR resistance were identified (Supplementary Table 2). GRMZM2G414537, GRMZM2G059106, and GRMZM2G042027 are associated with transmembrane transport, which may be involved in pathogen invasion and host defense (Sailer et al. 2018). In addition, AC213890.4_FG004, GRMZM2G070323, and GRMZM2G002555 are involved in protein phosphorylation, which acts as a switch or coordinator that helps plants respond to specific stresses by regulating the expression of functional genes (Yao and Xu 2017).

A stable genomic region associated with FSR resistance at ~ 250 Mb on chromosome 1

Across all three populations, a stable genomic region conferring FSR resistance was consistently detected at ~250 Mb on chromosome 1, spanning a 0.95-Mb interval from 250,089,724 to 251,044,933 bp (Table 2) based on the B73 RefGen_v2 reference genome. In addition, a significantly associated SNP, S1_253271793, located close to this region was identified by GWAS in the DTMA population using the phenotypic data from the 2019AF environment (data not shown), further supporting the importance of this region. The P-values of the three significantly associated SNPs detected in this region ranged from 1.04 × 10⁻⁸ to 2.73 × 10⁻⁸, with PVE values ranging from 2.16 to 5.18%, and the differences in genetic effects between the favorable and unfavorable alleles ranged from −7.94 to −9.65% (Table 2 and Fig. 4).

In total, 21, 16, and 20 haplotype blocks were identified in the stable genomic region at ~250 Mb on chromosome 1 in the CML, DTMA, and CombinedPOP populations, respectively. Among them, 28 unique haplotype blocks were detected in at least two populations, including nine detected in all three populations. Eight of these nine haplotype blocks carried favorable genetic effects conferring FSR resistance. The physical positions and the D′ value of each pair of SNPs within each of these eight haplotype blocks are shown in Fig. 5. The eight haplotype blocks were evenly distributed across this region, ranging in length from 3 bp to 110.24 kb and containing 2 to 10 SNPs. The average D′ value between pairs of SNPs within each haplotype block was higher than 0.90, indicating that the SNPs within each block were strongly linked (Fig. 5).
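The D′ values used to confirm strong linkage within haplotype blocks follow Lewontin's normalization of the disequilibrium coefficient D by its maximum possible value given the allele frequencies. A minimal sketch for two biallelic SNPs, assuming phased 0/1 haplotypes (for fully inbred lines, each line can be treated as one haplotype); `d_prime` is a hypothetical helper rather than the software used in this study.

```python
import numpy as np


def d_prime(snp_a, snp_b):
    """Return |D'| between two biallelic SNPs from phased 0/1 haplotypes."""
    a = np.asarray(snp_a, dtype=float)
    b = np.asarray(snp_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()  # frequencies of allele "1" at each SNP
    p_ab = np.mean(a * b)          # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b           # disequilibrium coefficient D
    # Lewontin's D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:                 # a monomorphic SNP: D' is undefined
        return float("nan")
    return abs(d) / d_max
```

Two SNPs that always co-occur, e.g. `d_prime([0, 0, 1, 1], [0, 0, 1, 1])`, give 1.0, while independent SNPs such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give 0.0.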

Fig. 5

Eight haplotype blocks carrying favorable genetic effects conferring FSR resistance located in the stable genomic region at ~ 250 Mb on chromosome 1 detected in all three populations

In total, 12 putative candidate genes were identified in this genomic region. Their distribution by physical position is shown in Fig. 3g, and their annotated functions related to responses to biotic or abiotic stresses in crops are listed in Supplementary Table 2. Among them, GRMZM2G457357, GRMZM2G364069, and GRMZM5G829103 were associated with zinc ion binding and zinc finger proteins. Zinc finger proteins maintain a finger-like spatial configuration by binding zinc ions through amino acids in the peptide chain; under abiotic stress, they are widely distributed on the plasma membrane and may function as sensors or abscisic acid (ABA) receptors in abiotic stress signaling (Han et al. 2020). GRMZM2G457357 and GRMZM2G027991 belong to the ubiquitin–proteasome system (UPS), a rapid regulatory mechanism for selective protein degradation that plays crucial roles in plant adaptation to stresses such as drought, salinity, cold, nutrient deprivation, and pathogens (Xu et al. 2019). GRMZM2G070323 and AC213890.4_FG004 have phosphoprotein phosphatase activity, as mentioned above. The remaining putative candidate genes were directly associated with intracellular signal transduction or responses to abiotic stresses. Together, the annotated functions of these candidate genes suggest that the genomic region at ~250 Mb on chromosome 1 is involved in responses to biotic and abiotic stresses and, in particular, in FSR resistance.

Prediction accuracies of FSR severity estimated from different breeding scenarios

Prediction accuracies of FSR severity estimated under different breeding scenarios, comprising two CV schemes and three prediction models, are shown in Fig. 6a–c. Prediction accuracies estimated with CV2 were higher than those estimated with CV1 across all populations and prediction models, indicating that predictions benefit from records of the same lines already observed in other environments. Across all CV schemes and populations, the GP model M2 achieved higher accuracies than the phenotypic prediction model M1, and the GP model M3, which incorporates G × E, achieved the highest accuracies.
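The two CV schemes differ only in what is masked. A minimal sketch, assuming records are (line, environment) pairs: CV1 assigns folds by line, so every record of a test line is hidden (untested lines), while CV2 assigns folds to individual line-by-environment records, so a test line is usually still observed in other environments (sparse testing). Accuracy is taken here as the Pearson correlation between predicted and observed values within each environment, averaged over environments; the fold count, seed, and function names are hypothetical.

```python
import numpy as np


def assign_folds(lines, scheme, n_folds=5, seed=1):
    """Assign each (line, environment) record to a CV fold.

    lines: sequence of line IDs, one per record.
    scheme: "CV1" masks lines entirely; "CV2" masks individual records.
    """
    rng = np.random.default_rng(seed)
    lines = np.asarray(lines)
    if scheme == "CV1":
        # All records of the same line share one fold.
        unique = np.unique(lines)
        line_fold = dict(zip(unique, rng.permutation(len(unique)) % n_folds))
        return np.array([line_fold[ln] for ln in lines])
    if scheme == "CV2":
        # Records are assigned independently, so a line can be split across folds.
        return rng.permutation(len(lines)) % n_folds
    raise ValueError("scheme must be 'CV1' or 'CV2'")


def mean_within_env_accuracy(envs, observed, predicted):
    """Average Pearson correlation between predicted and observed per environment."""
    envs = np.asarray(envs)
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    r = [np.corrcoef(pred[envs == e], obs[envs == e])[0, 1] for e in np.unique(envs)]
    return float(np.mean(r))
```

In each round, the model is fitted to the records outside one fold and predicts the records inside it; the accuracy function is then applied to the pooled out-of-fold predictions.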

Fig. 6

a–c Prediction accuracies of FSR severity estimated using three prediction models, M1, M2, and M3, in two cross-validation (CV) schemes in the CML, DTMA, and CombinedPOP populations. d–e Distribution of FSR severity in the environments used for GP in the CML and DTMA populations. f Prediction accuracy of FSR severity estimated in the CML (left) and DTMA (right) populations using CV2 and M3

In CV1, which mimics the breeding scenario of predicting newly developed lines that have never been tested in any environment, the accuracies of FSR severity estimated with the phenotypic prediction model M1 were close to zero in all three populations. The accuracies estimated with the GP model M2 were 0.36, 0.29, and 0.34 in the CML, DTMA, and CombinedPOP populations, respectively, and those estimated with the GP model M3 incorporating G × E were 0.40, 0.36, and 0.36, respectively.
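The near-zero CV1 accuracy of M1 has a simple explanation if the three models are written in the reaction-norm form commonly used for G × E genomic prediction. The formulation below is an assumption for illustration, and the terms and variance structures used in this study may differ: E_i is the effect of environment i, L_j an independent line effect, g_j a genomic line effect with covariance proportional to the genomic relationship matrix G, (gE)_ij the genomic G × E interaction, Z_g and Z_E incidence matrices linking records to lines and environments, and ∘ the element-wise (Hadamard) product.

```latex
\begin{aligned}
\text{M1:}\quad & y_{ij} = \mu + E_i + L_j + \varepsilon_{ij}\\
\text{M2:}\quad & y_{ij} = \mu + E_i + L_j + g_j + \varepsilon_{ij}\\
\text{M3:}\quad & y_{ij} = \mu + E_i + L_j + g_j + (gE)_{ij} + \varepsilon_{ij}\\[4pt]
& L \sim N(0, \sigma^2_L I),\quad g \sim N(0, \sigma^2_g G),\quad
gE \sim N\!\left(0,\ \sigma^2_{gE}\,[Z_g G Z_g^{\top}] \circ [Z_E Z_E^{\top}]\right)
\end{aligned}
```

Under this formulation, a line with no records (CV1) has its predicted L_j shrunk to the prior mean of zero, so M1 predicts μ + E_i for every untested line in environment i and cannot rank them. M2 and M3 instead predict g_j, and in M3 also (gE)_ij, from the line's genomic relationships with tested lines, which is consistent with their moderate CV1 accuracies.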

In CV2, which mimics the sparse-testing scenario in which some newly developed lines are observed in some environments but not in others, the accuracies of FSR severity estimated with the phenotypic prediction model M1 were 0.48, 0.29, and 0.48 in the CML, DTMA, and CombinedPOP populations, respectively. The accuracies estimated with the GP model M2 were 0.51, 0.34, and 0.51, respectively. The highest accuracies among the three models were obtained with the GP model M3 incorporating G × E: 0.55, 0.42, and 0.53 in the CML, DTMA, and CombinedPOP populations, respectively.

Prediction accuracies of FSR severity estimated with the phenotypic data from four individual environments and the combined phenotypic data from the same location or same year

Prediction accuracies of FSR severity estimated with BLUE values from combined analyses within the same location or the same year were higher than those estimated with BLUE values from the combined analysis across all four environments. In the CML population, the prediction accuracy in M3 and CV2 estimated with BLUE values from the combined analysis across the four environments was 0.55; it increased to 0.60 with BLUE values combined within years (Combined_2018 and Combined_2019) and to 0.62 with BLUE values combined within locations (Combined_AF and Combined_TL) (Fig. 6d and f). A similar trend was observed in the DTMA population (Fig. 6e and f).
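Combining environments into BLUE values can be sketched with ordinary least squares. This sketch assumes a fixed-effects model y = line + environment + error fitted to the records of the environments being combined (for example, AF and TL within a year, or 2018 and 2019 within a location), and returns each line's least-squares mean, i.e., its line coefficient plus the average environment effect. The function name is hypothetical, and the study's combined analysis may include further terms such as replicates or blocks.

```python
import numpy as np


def combined_blues(lines, envs, y):
    """Least-squares means of lines from the model y = line + environment + error."""
    lines, envs = np.asarray(lines), np.asarray(envs)
    y = np.asarray(y, dtype=float)
    line_ids, env_ids = np.unique(lines), np.unique(envs)
    # One indicator column per line (no separate intercept) ...
    X_line = (lines[:, None] == line_ids[None, :]).astype(float)
    # ... plus environment indicators, dropping the first level as the reference.
    X_env = (envs[:, None] == env_ids[None, 1:]).astype(float)
    X = np.hstack([X_line, X_env])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    env_effects = np.concatenate([[0.0], beta[len(line_ids):]])
    # Least-squares mean: line coefficient plus the average environment effect.
    return dict(zip(line_ids.tolist(), beta[: len(line_ids)] + env_effects.mean()))
```

With lines A and B tested in both environments and line C only in E2 (y = 10, 20, 30, 40, 25), the BLUE of C is 20 rather than its raw mean of 25, because E2 is estimated to score 10 units higher than E1; for balanced lines such as A and B, the BLUE equals the raw mean.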

Prediction accuracies of FSR severity estimated with different numbers of significant SNPs detected by GWAS

Prediction accuracies of FSR severity estimated with the genome-wide SNPs and with 2105, 197, and 26 significantly associated SNPs selected at P-value thresholds of 10⁻³, 10⁻⁴, and 10⁻⁵, respectively, are shown in Fig. 7a–c. In all three populations, the 2105 SNPs selected at P < 10⁻³ gave the highest prediction accuracies, and the genome-wide SNPs gave higher accuracies than the 197 and 26 SNPs selected at P < 10⁻⁴ and P < 10⁻⁵. With the 2105 SNPs selected at P < 10⁻³, the prediction accuracies in CV2 and M3 were 0.69, 0.60, and 0.69 in the CML, DTMA, and CombinedPOP populations, respectively.
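Selecting the marker subsets amounts to filtering GWAS P-values against successive thresholds. A minimal sketch follows; the function name and example P-values are hypothetical. One design note: if the P-values come from a GWAS that included the lines later used for testing, accuracies from the selected subset may be optimistic, so the sketch takes the P-values as an input that could be recomputed from the training lines of each CV fold.

```python
import numpy as np


def select_snps(snp_ids, p_values, thresholds=(1e-3, 1e-4, 1e-5)):
    """Return, for each threshold, the SNP IDs with P < threshold (nested subsets)."""
    ids = np.asarray(snp_ids)
    p = np.asarray(p_values, dtype=float)
    return {t: ids[p < t].tolist() for t in thresholds}
```

Because the thresholds are nested, each stricter subset is contained in the looser one, mirroring the 2105, 197, and 26 SNP sets.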

Fig. 7

a–c Prediction accuracies of FSR severity estimated with different numbers of significantly associated SNPs selected from GWAS results at P-value thresholds of 10⁻³ (2105 SNPs), 10⁻⁴ (197 SNPs), and 10⁻⁵ (26 SNPs) in the CML, DTMA, and CombinedPOP populations using CV2 and M3. d–e Prediction accuracies of FSR severity estimated with the 221,190 genome-wide SNPs and with the 2105 SNPs detected by GWAS at the P-value threshold of 1 × 10⁻³ in M1, M2, and M3, training on the CML population to predict the DTMA population or training on the DTMA population to predict the CML population

Prediction accuracies of FSR severity estimated across different populations

Prediction accuracies of FSR severity estimated across populations, i.e., training on the CML population to predict the DTMA population, or vice versa, are shown in Fig. 7d–e. Moderate across-population accuracies were observed with the GP models M2 and M3 (the latter incorporating G × E) when either the genome-wide SNPs or the 2105 significantly associated SNPs selected at P < 10⁻³ were used for prediction. When the CML population was used to predict the DTMA population, the across-population accuracies in CV2 and M3 were 0.35 with the genome-wide SNPs and 0.49 with the 2105 SNPs. When the DTMA population was used to predict the CML population, the corresponding accuracies were 0.44 and 0.59, respectively.
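Training on one population and predicting another can be sketched with ridge-regression BLUP (RR-BLUP), which is equivalent to GBLUP for a suitable shrinkage parameter. This is a single-environment simplification that ignores the environment and G × E terms of M2 and M3; the shrinkage value, the 0/1/2 marker coding centered on training means, the function name, and the simulated data are all assumptions for illustration.

```python
import numpy as np


def rr_blup_predict(X_train, y_train, X_test, lam=1.0):
    """Predict test phenotypes from marker effects shrunk by ridge penalty lam."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    center = X_train.mean(axis=0)  # center markers on the training population
    Z, Zt = X_train - center, X_test - center
    mu = y_train.mean()
    # Ridge solution: beta = (Z'Z + lam * I)^-1 Z'(y - mu)
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y_train - mu))
    return mu + Zt @ beta


# Hypothetical example: two simulated "populations" sharing the same marker effects.
rng = np.random.default_rng(0)
effects = rng.normal(0, 1, 50)
pop_a = rng.integers(0, 3, (200, 50))
pop_b = rng.integers(0, 3, (100, 50))
y_a = pop_a @ effects + rng.normal(0, 3, 200)
y_b = pop_b @ effects + rng.normal(0, 3, 100)
pred_b = rr_blup_predict(pop_a, y_a, pop_b, lam=10.0)
accuracy = np.corrcoef(pred_b, y_b)[0, 1]  # across-population prediction accuracy
```

In real data, differences in allele frequencies and LD between the training and testing populations typically reduce such across-population accuracies relative to this idealized simulation.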

Discussion

Consistent with observations reported for maize stalk rots caused by other pathogens (Pè et al. 1993; Duan et al. 2019; Mu et al. 2019; Jung et al. 1994), this study showed that FSR resistance in maize is a complex quantitative trait with medium to high heritability, significantly affected by G × E, and controlled by multiple loci with minor effects. In total, 15 SNPs significantly associated with FSR resistance were identified across the three populations, with P-values ranging from 8.27 × 10⁻¹³ to 1.99 × 10⁻⁷ and PVE values ranging from 0.94 to 8.30%. These findings extend our understanding of the genetic architecture of FSR resistance in tropical maize.

In a previous study, Rashid et al. (2022) used 342 tropical/subtropical maize inbred lines in the CAAM panel to conduct GWAS for mapping FSR resistance. Although both studies focused on the genetic dissection of FSR resistance in tropical maize, they differed in mapping populations, FSR screening environments, and validation approaches, which resulted in the detection of different loci significantly associated with FSR resistance. The CAAM panel used by Rashid et al. (2022) is adapted to Asian tropical ecologies and has predominantly yellow kernels, whereas the CML and DTMA populations in the present study represent a broader genetic diversity of tropical maize germplasm developed by CIMMYT, mostly with white kernels. The CAAM panel was screened for FSR resistance at two locations in India, whereas the CML and DTMA populations were screened at two locations in Mexico. The validation approaches also differed: the peak at 168 Mb on chromosome 6 detected by GWAS was validated by subsequent QTL mapping in Rashid et al. (2022), whereas the peak at ~250 Mb on chromosome 1 detected in the present study was validated across two independent GWAS populations. Despite these differences, the importance of the genomic region at 168 Mb on chromosome 6 identified by Rashid et al. (2022) was partially confirmed in the present study: six SNPs highly linked with FSR resistance in the region between 162 and 168 Mb on chromosome 6 were detected by the mixed linear model in GWAS using the populations of the present study (Supplementary Table 3). The P-values of these six SNPs ranged from 2.88 × 10⁻⁵ to 7.07 × 10⁻⁴, and their PVE values ranged from 2.60 to 6.13%, indicating consistency of results across studies.

In addition, the stable genomic region conferring FSR resistance at ~250 Mb on chromosome 1 identified in the present study provides valuable information for investigating the possibility of developing trait markers for deployment in breeding programs. Several QTL associated with stalk rot resistance caused by different pathogens have also been reported on chromosome 1. In our previous study, a genomic region in bin 1.06 conferring FSR resistance was identified in different GWAS panels developed by CIMMYT (Song et al. 2024). In addition, QTL in bins 1.03 and 1.09 associated with Pythium stalk rot resistance were reported by Song et al. (2015) and Duan et al. (2019), respectively. In this crucial genomic region conferring stalk rot resistance, several putative candidate genes highly associated with responses to abiotic and biotic stresses have also been reported.

GP, also known as GS, has been reported to be an effective tool for crop improvement, although its prediction accuracy is strongly affected by trait heritability, prediction model, marker density, genotype-by-environment interaction, the relationship between training and testing populations, and other factors (Sitonik et al. 2019; Nyaga et al. 2019; Zhang et al. 2017). Extensive research has evaluated the potential of GP to improve breeding efficiency for various traits, including resistance to major maize diseases (Yu et al. 2022; Guo et al. 2020; Beyene et al. 2015; Oakey et al. 2016). In previous studies, prediction accuracy reached 0.70 for resistance to northern corn leaf blight, ranged from 0.46 to 0.86 for resistance to maize lethal necrosis, and ranged from 0.46 to 0.67 for resistance to Fusarium ear rot (Technow et al. 2013; Sitonik et al. 2019; Kuki et al. 2020; Liu et al. 2021; Holland et al. 2020). In the present study, GP accuracies of FSR severity estimated with the genome-wide SNPs were moderate, ranging from 0.29 to 0.51 in different breeding scenarios across two CV schemes and three populations. Moderate accuracies with the genome-wide SNPs were also observed across populations. These results suggest that GP has considerable potential for improving FSR resistance.

To further improve the prediction accuracy of FSR resistance, the present study investigated key factors affecting prediction accuracy, focusing on how to sample the most informative molecular markers and how to incorporate environment and G × E effects into prediction models. GP using genome-wide SNPs is generally expected to achieve the highest prediction accuracy (Massman 2013; Lorenz et al. 2011; Budhlakoti et al. 2020); however, low-density marker panels can also be cost-effective for implementing GS (Zhang et al. 2015; Abed et al. 2018; Werner et al. 2018). In the present study, the 2105 significantly associated SNPs selected at P < 10⁻³ gave higher prediction accuracies of FSR severity than the genome-wide SNPs in all three populations, whereas the 197 and 26 SNPs selected at P < 10⁻⁴ and P < 10⁻⁵ gave lower accuracies than the genome-wide SNPs. These results indicate that considering both genome coverage and the P-value threshold when selecting a subset of markers could improve GP accuracy. The total PVE values estimated with the different SNP datasets showed similar trends (Fig. 8): in all populations, the total PVE estimated with the 2105 SNPs selected at P < 10⁻³ was higher than that estimated with the genome-wide SNPs, whereas the total PVE values estimated with the 197 and 26 SNPs were lower. These results indicate that a subset of SNPs capturing more of the genotypic information underlying FSR resistance could achieve higher prediction accuracy. Furthermore, the variance components estimated with the different SNP datasets were consistent with the highest prediction accuracies being obtained with the 2105 SNPs selected at P < 10⁻³ in all populations, as this dataset had the smallest residual variance and the lowest deviance information criterion (DIC) value, indicating a better model fit (Supplementary Table 4).

Previous studies showed that GP models incorporating G × E achieved higher prediction accuracy (Burgueño et al. 2012; Guo et al. 2013; Jarquín et al. 2017; Zhang et al. 2015; Monteverde et al. 2018). In the present study, the variance components of the random effects in each prediction model and the percentage of the total variance explained by each random effect were estimated (Supplementary Tables 5 and 6). As shown in Supplementary Table 5, the GP model M3 incorporating G × E had the lowest percentage of total variance explained by the residual, the highest percentage explained jointly by the environment and G × E, and the lowest DIC value among the three models in both populations. These results support the conclusion that incorporating G × E in M3 improves prediction accuracy. Similarly, Supplementary Table 6 shows that GP models using phenotypic data combined within the same year or the same location had a lower percentage of total variance explained by the residual and a higher percentage explained jointly by the environment and G × E than GP models using phenotypic data from individual environments. These results enhance our understanding of how GP can be exploited to improve FSR resistance.
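For reference, the DIC used to compare models is conventionally computed from posterior samples of the model parameters θ, where D is the deviance, D̄ its posterior mean, and p_D the effective number of parameters; a lower DIC indicates a better balance between fit and complexity. Because DIC measures penalized in-sample fit, it complements rather than replaces the cross-validated accuracies.

```latex
D(\theta) = -2\log p(y \mid \theta), \qquad
\bar{D} = \mathrm{E}_{\theta \mid y}\left[D(\theta)\right], \qquad
p_D = \bar{D} - D(\bar{\theta}), \qquad
\mathrm{DIC} = \bar{D} + p_D = 2\bar{D} - D(\bar{\theta})
```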

Fig. 8

a, d, e, h Distribution across chromosomes of the genome-wide SNPs (a) and of the 2105 (d), 197 (e), and 26 (h) SNPs selected from GWAS results at P-value thresholds of 10⁻³, 10⁻⁴, and 10⁻⁵, respectively. b, c, f, g Total phenotypic variation explained (PVE) estimated in the CML, DTMA, and CombinedPOP populations using the genome-wide SNPs (b), 2105 SNPs (c), 197 SNPs (f), and 26 SNPs (g), respectively