1 Introduction

Maize is one of the most important crops after rice and wheat and has numerous industrial uses. It is a diploid species (2n = 20) from the tribe Maydeae of the family Poaceae, with its primary center of origin in Mexico and Central America. Since its domestication, maize has undergone artificial and natural selection for centuries. Selection for morphological traits has been the foundation of crop improvement. Thousands of years of conscious selection by farmers have led to the development of landraces that are adapted to specific climatic conditions and harbor valuable alleles for various traits related to quality and yield. Systematic corn breeding to develop hybrids started in the early 1900s (East 1908; Shull 1909). Selection was practiced even before that time, providing several open-pollinated cultivars through mass selection. This improved germplasm served as source germplasm for deriving inbred lines for hybrid breeding (Hallauer et al. 1988). Some European countries saw a tremendous expansion of corn area, aided by selection for early maturity (Trifunovic 1978). During the earlier phases of crop improvement, unconscious selection was practiced for a few loci, followed by selection of many loci through mass selection. After the advent of Mendelian genetics, selection for a few loci resulted in improved disease resistance and the development of dwarf wheat varieties that enabled the green revolution. Later, the development of the breeder’s equation (Lush 1937) and linear mixed model methods (Henderson et al. 1959) provided new tools for plant breeding. With the advent of sequencing technologies, genomic data could be associated with phenotypic data, which helped identify causal genomic regions through QTL (quantitative trait loci) mapping and association mapping. This facilitated the transfer of QTLs using the marker-assisted selection (MAS) approach.
A detailed description of all these plant breeding phases is provided by Ramstein et al. (2019), which is recommended for further information. The major drawback of MAS is its unsuitability for quantitative traits, where QTLs have only minor effects and therefore show larger QTL × environment interactions. QTL and association mapping strategies are also challenged by the difficulty of identifying rare and small-effect QTLs for important traits. The use of genome-wide markers to predict the breeding value of genotypes was proposed by Meuwissen et al. (2001) and is called genomic selection (GS). GS contrasts with MAS, in which a few loci with significant effects are targeted. The goal of GS is to predict the breeding and/or genetic values of the genotypes. Although obtaining genotypic information is still a significant financial bottleneck for implementing GS in breeding programs, the falling cost of genotyping is a strong motivation for adopting this advanced tool in crop improvement. Statistically, GS is a supervised learning problem in which a set of individuals (inbreds, hybrids, or segregating genotypes) with both genotypic and phenotypic information acts as a training set. A suitable prediction model is fitted to this training set, and the fitted model is used to predict the breeding values of unphenotyped individuals using only their genotypic data. A common misunderstanding about GS is that it predicts phenotypic values, which it does not. Instead, it predicts genomic estimated breeding values (GEBVs), which do not directly represent phenotypic values. Nevertheless, GEBVs may be used to rank genotypes for the trait used to fit the model. The correlation between the observed phenotypic values and the GEBVs gives an indication of the accuracy of the predictions.
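The training-and-prediction workflow described above can be sketched as follows. This is a minimal illustration on simulated data, not the method of any cited study: the function names (`simulate_population`, `fit_ridge`, `predict_gebv`), the marker coding (-1/0/1), the population sizes, and the fixed penalty `lam` are all assumptions made for the example. In practice the penalty would usually be estimated from variance components rather than set by hand.

```python
import numpy as np


def simulate_population(n_lines, n_markers, n_qtl, h2, rng):
    """Simulate marker scores coded -1/0/1 and phenotypes with heritability h2."""
    X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float) - 1.0
    effects = np.zeros(n_markers)
    qtl = rng.choice(n_markers, size=n_qtl, replace=False)
    effects[qtl] = rng.normal(0.0, 1.0, size=n_qtl)
    g = X @ effects  # simulated true genetic values
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)
    y = g + rng.normal(0.0, noise_sd, size=n_lines)
    return X, y, g


def fit_ridge(X, y, lam):
    """Fit ridge-penalized marker effects on the training set.

    Uses the n x n (dual) system, which is cheaper when markers outnumber lines.
    """
    col_means = X.mean(axis=0)
    Xc = X - col_means
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(X.shape[0]), y - y.mean())
    return col_means, Xc.T @ alpha


def predict_gebv(X_new, col_means, beta):
    """GEBVs of new lines from genotypes only (deviations from the training mean)."""
    return (X_new - col_means) @ beta


rng = np.random.default_rng(1)
X, y, g = simulate_population(n_lines=600, n_markers=800, n_qtl=20, h2=0.7, rng=rng)
train, test = np.arange(400), np.arange(400, 600)

col_means, beta = fit_ridge(X[train], y[train], lam=200.0)  # lam is an assumed value
gebv = predict_gebv(X[test], col_means, beta)

# Predictive ability: correlation of GEBVs with observed phenotypes of the test lines
print("r(GEBV, phenotype): ", np.corrcoef(gebv, y[test])[0, 1])
# Only available in simulation: correlation with the true genetic values
print("r(GEBV, true value):", np.corrcoef(gebv, g[test])[0, 1])
# GEBVs are used mainly to rank candidates, not as phenotype forecasts
print("top 10 test lines:  ", test[np.argsort(gebv)[::-1][:10]])
```

Note that the GEBVs are expressed as deviations from the training mean, which is one reason they rank genotypes well without being direct estimates of phenotype.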

The selection of an appropriate prediction model is an essential aspect of GS. Several models have been proposed for GS considering different statistical factors (Crossa et al. 2017). GS involves several complexities that need to be addressed to obtain acceptable prediction accuracies. One of these is the large number of markers (p) relative to the population size (n): when p exceeds n, least squares estimates of marker effects are not unique and are therefore impractical. Solutions include dimensionality reduction, penalized regression, and variable selection, to mention a few. Another critical challenge is accounting for genotype × environment interaction in GS models. GS methods can be classified based on different criteria. A major classification scheme divides GS models into parametric, semiparametric, and nonparametric models. Among many others, parametric models include ridge regression BLUP (rrBLUP), genomic BLUP (gBLUP), compressed BLUP (cBLUP), and super BLUP (sBLUP), collectively called BLUP models (Endelman 2011; Pérez and de Los Campos 2014; Wang et al. 2018). Another group of parametric models is the Bayesian models, comprising Bayesian ridge regression, Bayesian LASSO, and the Bayesian alphabet (BayesA, BayesB, and BayesC) (Pérez and de Los Campos 2014). Principal components can be integrated into parametric models to account for population structure (Merrick and Carter 2021). Semiparametric methods include reproducing kernel Hilbert spaces (RKHS) regression (Gianola and van Kaam 2008), which is expected to capture complex gene interactions. Random forest and support vector machine regressions are among the nonparametric methods for GS. The choice of model depends on various factors, including the composition of the training population. Epistatic interactions also affect prediction accuracies, which can be improved slightly by using models that capture epistasis, such as EG-BLUP.
The predictive abilities of models also depend on crop and trait genetics.
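The p versus n problem and the penalized-regression remedy mentioned above can be illustrated with a small numerical sketch. The function names (`marker_form`, `line_form`), the simulated data, and the penalty value are assumptions made for illustration. The unscaled matrix XX' stands in here for a genomic relationship matrix; a scaled relationship matrix would only rescale the penalty.

```python
import numpy as np


def marker_form(X, y, lam):
    """Ridge marker effects: solve the p x p system (X'X + lam*I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def line_form(X, y, lam):
    """Fitted genetic values of the lines from the n x n system, with K = XX'."""
    K = X @ X.T
    return K @ np.linalg.solve(K + lam * np.eye(X.shape[0]), y)


rng = np.random.default_rng(0)
n, p = 50, 400  # far more markers than lines
X = rng.integers(0, 3, size=(n, p)).astype(float) - 1.0
X -= X.mean(axis=0)  # centre each marker column
y = rng.normal(size=n)
y -= y.mean()

# X'X is p x p but its rank cannot exceed n - 1 after centring,
# so the unpenalized normal equations have no unique solution.
print("rank of X'X:", np.linalg.matrix_rank(X.T @ X), "of", p)

# With a ridge penalty both systems are solvable and give the same fitted values.
fitted_markers = X @ marker_form(X, y, lam=10.0)
fitted_lines = line_form(X, y, lam=10.0)
print("same fitted values:", np.allclose(fitted_markers, fitted_lines))
```

The agreement of the two forms reflects the known equivalence between marker-effect ridge regression and a relationship-matrix formulation, which is why the n x n system is usually preferred when markers greatly outnumber lines.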

2 GS in Maize for Biomass, Yield, and Yield-Related Traits

Yield is a complex phenotype governed by several loci with small to medium effect sizes in maize (Chen et al. 2017). Prediction of yield using GS may be made at several levels depending on the study’s objectives. Since hybrids are the main cultivar type in maize, GS can be employed to identify high-yielding parents in segregating or double haploid populations, or to predict hybrid yield and thereby narrow down candidates for field trials.

2.1 Prediction of Per Se and Hybrid Performance in Segregating Generations

The development of parental lines is a continuous process in a maize hybrid breeding program. Pedigree breeding is a popular method of developing new inbreds, in which two or more selected parents are crossed to generate a segregating population from which desired segregants are selected based on their per se performance or their combining ability with chosen testers. Early generation testing helps identify better lines early, reducing the cost of evaluating and advancing many lines from a cross. GS may be applied at early generation testing to reduce the cost of evaluating testcrosses. In one such attempt, shelling percentage was predicted with higher accuracy than yield in testcrosses of an F2 population developed with the primary aim of improving shelling percentage (Sun et al. 2019). To improve grain yield and stover quality traits, genome-wide selection was implemented on a testcross population from 223 recombinant inbreds, and the results were compared with those of marker-assisted recurrent selection (MARS) in the same population (Massman et al. 2013). GS resulted in significantly higher realized gains than MARS for the yield + stover index. GS was also employed to predict hybrid performance in testcrosses in order to predict general combining ability (GCA) for grain yield (Burdo et al. 2021). Genomic-estimated GCA of the inbred lines was computed and found to have a higher correlation with testcross values than phenotypic GCA. The primary motivation was to identify the best combiner lines in the early generations of inbred line development and so avoid an exhaustive and practically infeasible scheme of testing all parental line candidates. Such GS-assisted line selection can save substantial costs by narrowing down the best candidates for field-based testcross evaluation for yield.

2.2 GS in Inbred Lines

Inbred lines are homogeneous and homozygous populations used as parents in maize hybrid breeding. GS in maize inbred populations primarily aims at parental selection and at assessing the feasibility of GS for a specific crop-trait scenario. Stalk strength is an important agronomic trait in maize and is related to stalk lodging and grain yield. A set of inbred lines belonging to two recombinant inbred line (RIL) populations was subjected to GS for rind penetrometer resistance (RPR), an indicator of stalk strength in maize. High prediction accuracy for RPR was observed when a multivariate model was used and when QTLs were fitted as fixed effects in the model (Liu et al. 2020). The authors explained that fixed-effect and multivariate models might better capture the genetic variance of the trait, and probably both the additive and nonadditive interaction effects. The husk is a part of the maize ear that indirectly affects grain yield by reducing susceptibility to ear rot (Warfield and Davis 1996), contributing limited photosynthesis, and affecting kernel dehydration after physiological maturity. Suitable husk characteristics are important for obtaining optimum yield in specific agroecologies. In an association mapping panel of 498 inbred lines, GS models were used to predict husk-related traits (Cui et al. 2020). The highest prediction accuracy was observed for husk thickness. Diverse association mapping panels are highly likely to contain subpopulations. In predicting husk-related traits, training models within subpopulations gave higher prediction accuracies than modeling across subpopulations, provided that the subpopulation was large enough. Similarly, kernel oil content was predicted with good accuracy (0.68) in a set of maize inbred lines to assess the feasibility of GS for this trait (Hao et al. 2019).

2.3 GS for Double Haploid-Based Breeding Programs

In commercial maize breeding programs, double haploid (DH) line development has become a routine scheme. The convenience of generating a large number of DH lines in a much shorter time than with the traditional pedigree method has made it possible to generate thousands of DH lines every year. The phenotypic evaluation of this vast novel germplasm resource is a challenging and costly task. GS may help narrow down good candidates for field evaluation for specific traits, thus saving substantial financial resources. Scientists at CIMMYT (International Maize and Wheat Improvement Center) evaluated a scheme to predict yield and agronomic traits for a set of 3068 tropical DH lines at the early stages of the pipeline (Beyene et al. 2019). The experiment was conducted over multiple years, and it suggested that adding 10–30% of the lines from the following year to the existing training set of the previous year can significantly increase prediction accuracies and substantially reduce costs compared with testcross formation and multilocation testcross evaluation. A comparative study at CIMMYT showed that testcrosses from DH lines selected based on GEBVs performed better than those from DH lines selected based on phenotypic values for yield and yield-related traits (Beyene et al. 2019). The gain from GS was realized as a 32% cost reduction and as time savings. A useful overview of the different phases of GS in a breeding program is given by Fu et al. (2022).

2.4 Rapid Cycling Genomic Selection

Time is an essential factor in the breeder’s equation. The time taken to complete a breeding cycle depends on factors such as the crop species and the availability of off-season nurseries. Breeders have to optimize their programs to incorporate GS in a way that shortens breeding cycles. Maize can be grown in multiple seasons in tropical climates, which provides opportunities to use off-season nurseries. Several studies have reported efforts to shorten the generation interval to increase genetic gain per unit of time (Beyene et al. 2015; Gaynor et al. 2017; Massman et al. 2013; Vivek et al. 2017). Rapid cycling genomic selection (RCGS) was implemented in a multi-parental tropical maize population, using 18 founder lines, for 4 cycles (2 cycles per year) to select for grain yield (Zhang et al. 2017). A slight reduction in genetic diversity was reported after four cycles compared with the base population. The authors suggested that RCGS can be adopted in tropical maize breeding programs without a rapid loss of genetic diversity while achieving higher genetic gains in a short period. A good compilation of information on RCGS is available elsewhere (Volpato et al. 2021).
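The role of cycle time can be made concrete with the breeder’s equation in its commonly written per-year form, in which expected gain equals selection intensity × selection accuracy × additive genetic standard deviation, divided by the length of the breeding cycle in years. The sketch below uses this form. The function name and all input values are hypothetical and chosen only for illustration; they are not taken from the studies cited above.

```python
def genetic_gain_per_year(intensity, accuracy, additive_sd, cycle_years):
    """Expected response per year: intensity * accuracy * additive_sd / cycle_years."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * additive_sd / cycle_years


# Hypothetical inputs: phenotypic selection is assumed more accurate but slower,
# rapid-cycling GS less accurate but with two cycles per year.
phenotypic = genetic_gain_per_year(intensity=1.4, accuracy=0.7, additive_sd=1.0, cycle_years=2.0)
rapid_gs = genetic_gain_per_year(intensity=1.4, accuracy=0.5, additive_sd=1.0, cycle_years=0.5)

print(f"phenotypic selection: {phenotypic:.2f} units per year")
print(f"rapid-cycling GS:     {rapid_gs:.2f} units per year")
```

Under these assumed values, the shorter cycle more than compensates for the lower accuracy, which is the rationale behind rapid cycling schemes.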

3 GS for Abiotic and Biotic Stress Tolerance

Abiotic stresses are among the critical challenges of today’s climate change era. Maize suffers substantial losses in biomass and yield when subjected to environmental stresses such as drought, heat, salinity, and waterlogging. Evaluating breeding material for stress tolerance takes roughly twice the effort of evaluating yield and yield-related traits alone, because the same germplasm must be evaluated under both control and stress environments. Implementing GS may save time and resources in stress tolerance-oriented breeding programs, thus increasing genetic gain per unit of time. Genomic selection was conducted by CIMMYT on eight bi-parental populations to estimate genetic gain for grain yield under managed drought conditions (Beyene et al. 2015). The study suggested the superiority of the GS approach over conventional pedigree-based phenotypic selection for increasing genetic gains for yield in a drought environment. In another attempt to improve drought tolerance in tropical maize using GEBV-based selection, researchers at CIMMYT (Vivek et al. 2017) demonstrated the superiority of genomic selection over phenotype-based selection in two bi-parental populations. Markers were used to generate a stable source population by selecting drought-tolerance alleles without selecting under drought stress.

Erratic rainfall patterns due to climate change pose a risk of drought and waterlogging within the same crop growth period. It is therefore important to improve drought and waterlogging tolerance simultaneously in the same genetic background. This is theoretically possible because of potential links between the molecular mechanisms imparting tolerance to moisture-deficit and moisture-excess environments. Rapid cycling genomic selection was implemented to select for combined stress tolerance in a multi-parent yellow synthetic population (Das et al. 2020). The approach enables the development of a breeding population through molecular markers even in the absence of the target stress in one season. No yield penalty under optimal moisture conditions was observed when selecting for yield under drought and waterlogging environments. Little effort has been made to improve cold and heat tolerance using genomic selection. Salinity is another critical environmental stress that causes substantial yield reduction in maize. In a first attempt to predict salinity tolerance for biomass-related traits in maize, GS was implemented on a set of diverse inbred lines for shoot- and root-related traits (Singh et al. 2019). With leading efforts by CIMMYT, good progress in improving abiotic stress tolerance using genomic selection is expected.

Plant diseases pose severe threats to global food security, and with climate change, pathogens are expected to evolve rapidly and overcome the existing tolerance of crops to them. Efforts to map R genes have been undertaken in the past (Carson et al. 2004; Collins et al. 1998; Kuki et al. 2018). Despite such efforts, few large-scale marker-assisted selection (MAS)-based gene pyramiding studies are available for transferring these mapped genomic regions. Genomic selection can be a good approach where no or very few major QTLs are available for quantitative disease resistance. To select genotypes with reduced disease severity against northern leaf blight (NLB), caused by Setosphaeria turcica, the G-BLUP model was implemented (Technow et al. 2013), with prediction accuracies of up to 0.706 (dent corn) and 0.690 (flint corn). High prediction accuracies using GWAS-detected SNP markers were observed for Fusarium ear rot (FER), another destructive fungal disease of maize (Liu et al. 2021). The feasibility of GS has been studied for a few other maize diseases, such as tar spot complex (Cao et al. 2021), lethal necrosis (Gowda et al. 2015), and Gibberella ear rot (Riedelsheimer et al. 2013). With a carefully designed implementation strategy, GS may deliver higher genetic gains while saving time and resources, as it does for other traits.

4 GS for Pre-breeding

Elite cultivars of major crops suffer from a narrow genetic base. This poses a risk of being unable to cope with the new challenges of climate change. Crop wild relatives harboring valuable alleles for traits of interest are an essential resource for crop improvement, and introgression of these alleles into cultivated backgrounds, popularly known as pre-breeding, is a challenging task. Despite the difficulties, several attempts to introgress valuable alleles into active breeding germplasm have been made for a few traits in some crops (Barrantes et al. 2016; Dutra et al. 2018; Fulop et al. 2016; Grewal et al. 2020; dos Santos et al. 2022; Singh et al. 2021; Wang et al. 2017). Most marker-assisted introgressions for pre-breeding have been successful for qualitative traits and for traits controlled by major alleles. For quantitative traits, the traditional introgression approaches are not ideal. A novel origin-specific genomic selection (OSGS) scheme was proposed (Yang et al. 2020) in which, in a bi-parental population, separate marker effects are predicted for alleles according to their origin (wild vs. elite). The scheme aims at increasing the contribution of favorable exotic alleles in a bi-parental cross between wild and elite lines. It was validated on two nested association mapping populations, one of barley and one of maize. In another interesting study, a design was evaluated that aimed at initiating a pre-breeding program to harness polygenic variation from landraces using genomic selection (Gorjanc et al. 2016). The study suggests introgressing favorable alleles from landraces in a phased manner. Thus, genomic selection may help broaden the genetic base of breeding populations.

5 Take-Home Message

Genomic selection is a powerful tool that is gaining popularity in maize breeding programs for predicting the breeding values of individuals. It is a practically proven tool and can bring remarkable success in maize improvement if used wisely. With the falling cost of DNA sequencing, an upward trend is expected in its adoption as an essential breeding tool. However, for developing countries in general and public sector breeding programs in particular, DNA sequencing of many maize inbreds is still not feasible. Hence, an appropriate strategy for choosing parameters such as the number of markers and the population size, in the context of specific traits, is recommended.