Keywords

4.1 Introduction

Maize is entitled as ‘Drosophila of Plant Breeding’ owing to the vast amount of genetic diversity and amenable to undertake basic and applied studies. It is a member of Poaceae family with a chromosome number of 2n = 20 and genome size of 2.5–2.7 Gb (Haberer et al. 2005). The existence of both dicliny and dichogamy made the crop highly cross-pollinated. Additionally, maize is one of the three major staple food crops. Maize has offered countless benefits to mankind since the prehistoric era. Maize as a crop showed its ability to support and uplift the farmer’s standard of living, serve as a soil fertility indicator, generate income and feed the growing population. In addition to sustaining food and nutritional security, maize also serves as raw material for versatile industries such as starch and glucose production, biofuel processing, ethanol production and other sub-by-products.

Photosynthetically efficient (C4), day-neutral and highly adaptive nature of the maize makes it suitable for most agro-climatic regions. Presently, maize is being grown in 169 countries across the globe (Anonymous 2019). Globally, maize is cultivated in 197.20 million hectares with a production of 1.15 billion tonnes and has a productivity of 5.82 tonnes/ha (Anonymous 2019). The major maize-growing countries in the world are the USA, China, Brazil and India. The USA is the major global corn-growing country with 30.21% of global production and accounting for 32.95 million hectares of the area under cultivation (Anonymous 2019). The low productivity in the developing world can be attributed to various biotic and abiotic stresses. Maize is highly sensitive to many pests, diseases and abiotic stresses like drought, salinity, nutrient deficiency and temperature stresses (Fig. 4.1). In addition, climate change induced abiotic constraints to have a wide range of yield-reducing effects on all field crops, including maize. Thus, they should be given high priority in maize improvement programmes (Gazal et al. 2018). Further, intensive cultivation of potential hybrids and varieties resistant to major diseases and pests leads to resurgence and increased crop vulnerability to minor pests and diseases.

Fig. 4.1
An illustration depicts the percentages of grain loss in maize with various stress factors. The factors are as follows. Heat, nutrient deficiency, drought, salinity, waterlogging, maize lethal necrosis virus, northern and southern corn leaf blight, ear rot, sorghum downy mildew, and others.

The potential per cent grain yield losses due to various abiotic and biotic stress factors in maize. The abiotic stress factors, diseases, insect pests and weeds are represented with orange, magenta, olive green and light green colour texts

Presently, more than 60 diseases are reported in maize. The major diseases which are severely affecting the production and productivity of maize are Northern and Southern corn leaf blights (NCLB, SCLB), sorghum downy mildew (SDM), brown spot (BS), polysora rust (PR), brown stripe downy mildew (BSDM), pre- and post-flowering stalk rots (PSR) and ear rots (ER) (Hooda et al. 2018). NCLB is the major foliar fungal disease affecting global maize production (Technow et al. 2013) and is reported to cause a yield loss of >50% (Raymundo and Hooker 1981; Perkins and Pedersen 1987). Maydis leaf blight or SCLB is the most severe disease in warm and wet temperate and tropical areas of the world, and yield losses are reported up to 70% (Hooda et al. 2018). SDM is a major foliar disease of maize with global distribution and is prevalent in different altitudes and agro-ecological systems in the American, African, Australian and Asian subcontinents (Wongkaew et al. 2014; Lukman 2012). Ten different downy mildew pathotypes are known to uniquely infect the maize crop in tropical regions (Hooda et al. 2018). The SDM causes severe yield losses of 30–40% in maize (Rashid et al. 2018). Maize ER disease is prevalent in all the maize-growing areas worldwide. It is known to be caused by more than 20 fungal species, viz. Aspergillus flavus, Cladosporium spp., Fusarium graminearum, Fusarium verticillioides, Penicillium spp., Trichothecium roseum, etc. (Zummo and Scott 1990; Görtz et al. 2008; Guo et al. 2020). However, the most important ER fungi occurring globally are Fusarium verticillioides (F. moniliforme Sheldon) that causes Fusarium ear rot (FER) and Fusarium graminearum that causes Gibberella ear rot (GER) (Mesterhazy et al. 2012). Along with causing yield loss, Fusarium ear rot also produces fumonisins (mycotoxins) that affect grain quality and consumers’ health. These mycotoxins are known to cause higher rates of oesophageal cancer (Munkvold and Desjardins 1997).

Recently, the synergistic interaction of maize chlorotic mottle virus (MCMV) and sugarcane mosaic virus (SMV) resulted in serious maize lethal necrosis virus (MLND) incidences in maize-growing areas of Eastern Africa, the USA, parts of Latin America and China (Wangai et al. 2012; Gowda et al. 2015). Maize plant shows susceptibility to this disease at all the growth stages and leading to the death of plants in severe condition (Gowda et al. 2015). Tar spot (TS) is one of maize’s most destructive foliar diseases, prevalent in tropical and subtropical areas of South and Central America. Under a favourable environment, TS is reported to cause significant grain yield losses (Cao et al. 2017). Cultivating resistant cultivars is considered the most effective method to manage the disease incidences in maize (Dingerdissen et al. 1996).

Along with several biotic stresses, the maize crop is also severely affected by various abiotic stresses. Among several abiotic stresses, drought exerts the most harmful effects on maize production, resulting in substantial yield loss in rainfed areas accounting for 74% of maize-growing areas (Nepolean et al. 2014; Wang et al. 2019). Under the present era of climate change, drought has become more recurrent and unpredictable. Similarly, heat and waterlogging stresses are other important abiotic stresses in maize. Temperature regimes beyond the threshold level (max. 35 °C to min. 23 °C) result in heat stress in maize (Mallikarjuna et al. 2020). The one-degree rise above 30 °C in each day above was seen to lower the final grain yield of maize by 1% and 1.7%, in optimum and drought conditions, respectively (Lobell et al. 2011). Additionally, 4–5 °C increase in air temperature during the kernel development stage drastically reduces the kernel number per ear up to 73% (Cárcova and Otegui 2001).

Flooding or excessive soil moisture or waterlogging is one of the impeding abiotic stresses for maize production in South and South-East Asia owing to erratic rainfall patterns (Zaidi et al. 2004). Maize is a highly resource-demanding crop. Further, in maize production systems of developing world like sub-Saharan Africa, low nitrogen stress is one of the widespread problems, especially among marginal farming community (Ertiro et al. 2020). Phosphorus is a vital nutrient that mostly gets fixed in the soil largely by aluminium and calcium ions and becomes unavailable to the plants. Phosphorus starvation will severely affect the growth and development of maize, thereby decreasing the biomass and yield (Yu et al. 2018). The management of these stresses necessitates the development of stress-resilient maize cultivars in addition to system-specific management practices.

4.2 Genetic Improvement Strategies for Stress Tolerance in Maize

Genetic improvement of crops for enhanced stress tolerance is the most effective approach in managing unwarranted stress occurrence. The breeders employ various conventional and improved molecular tools in maize to improve the genetic tolerance for several abiotic and biotic stresses. The genetic improvement of maize through conventional breeding methods was mainly based on (i) selection and backcrossing, (ii) extensive screening for stress tolerance to derive improved stress-resilient germplasm via recurrent selection, (iii) utilization of alien genetic variation (pre-breeding activities) and (iv) breeding for early maturity and varieties with adaptation to specific ecologies (Gazal et al. 2018). The conventional methods of plant improvement are successful in increasing maize production by exploiting hybrid vigour, using male sterility systems through backcross methods, population improvement schemes for deriving the good inbred lines, synthetics and composites. As in conventional breeding, selection depends mainly on the phenotype that is highly sensitive to the environment; thus, selection efficiency is low. The advancement in molecular biology, i.e. marker technologies, supplemented the conventional breeding approaches by increasing the selection efficiency and contributed substantially to crop improvement. Molecular marker technology helps in reshaping the breeding activities and facilitates rapid gains from selection (Jannink et al. 2010; Liu et al. 2020).

Currently, the role of marker-assisted selection (MAS) in improving the polygenic traits is limited. However, it has been effectively utilized to improve traits with large effect alleles linked to markers (Zhong et al. 2007). The major limitation in improving quantitative traits is having the same QTL or genomic region expressing target traits across environments owing to QTL × environment interactions. Similarly, the genetic background of inbred lines limits the QTL expression across the germplasm set in a crop (Bernardo 2016). MAS and marker-assisted recurrent selection (MARS) depend mainly on significantly linked markers, tagged gene(s) or mapped quantitative trait loci (QTL). Furthermore, the MAS or MARS has main disadvantage in capturing the significant marker-QTL associations with minor effects (Heffner et al. 2009; Xu et al. 2012). Thus, marker-assisted selection has two components: first is to identify the QTL, and second is to estimate the effects (Jannink et al. 2010). QTL identification using linkage mapping is carried out using biparental populations, but the power of detecting marker-trait association is poor because of the presence of chromosomes with low recombination rates and tedious and time-consuming nature (Guo et al. 2020).

To resolve the issue associated with linkage mapping for QTL detection, the concept of association mapping started during the early twentieth century to facilitate the identification of marker-trait association in non-biparental populations and for fine mapping of genomic regions with higher recombination rates. Nevertheless, even though it is advantageous, it has a drawback in identifying rare QTLs with minor genetic effects governing the economically important characteristics and is greatly influenced by the environment (Jannink et al. 2010). For instance, resistance for NCLB disease in maize showed many QTL dispersed throughout the genome (Van Inghelandt et al. 2012; Poland et al. 2011; Wisser et al. 2006; Ranganatha et al. 2021). Furthermore, high cross-pollination in maize resulted in the rapid decay of linkage disequilibrium (LD). Hence, maize demands a large number of polymorphic SNPs distributed throughout the genome (Gowda et al. 2015). To overcome the disadvantages associated with the above breeding approaches, Meuwissen et al. (2001) proposed the genomic selection (GS) to capture the total additive genetic variance using genome-wide molecular markers and to enhance the genetic gain for quantitative traits (Poland and Rutkoski 2016).

4.3 Genomic Selection in Maize: Need and Importance

With the advent of third-generation sequencing, longer sequence reads can be generated in a short period and at a significantly lower cost per run, which are subsequently helpful in the creation of fixed SNP-genotyping arrays that encompasses set of genome-wide dispersed genic and non-genic SNPs (Varshney et al. 2014). The cost of genotyping has reduced significantly relative to phenotyping costs; thus, GS becomes an attractive selection decision tool in breeding activities (Atanda et al. 2021). GS arose with an intention to utilize the available high-density parallel NGS technologies. Unlike other methods, GS capitalizes on all marker loci with and without significant trait association, thereby giving unbiased estimates of marker-trait association, and it is assumed that casual polymorphism would be coherent across the families, so the marker effects based on population-wide estimates would be meaningful (Jannink et al. 2010; Meuwissen et al. 2001; Guo et al. 2020). GS enhances the genetic gain through improved prediction accuracy of genomic estimated breeding values, shortening generation intervals and effective utilization of existing germplasm via genome-guided selection (Sonesson et al. 2010; Schierenbeck et al. 2011; Pryce et al. 2012). In different sets of maize, Arabidopsis and barley germplasm, the GS reduces the selection time by almost half per cycle compared to the phenotypic selection for most of the traits (Lorenzana and Bernardo 2009). The effectiveness of GS in predicting complex traits has been proven in various crops, including maize (Zhao et al. 2012; Rutkoski et al. 2014; Zhang et al. 2015). Further, GS with linkage and association mapping has improved breeding efficiency (Cao et al. 2017). However, GS is mainly used to predict the additive genetic value of the line, and non-additive genetics are often disregarded (Robertson et al. 2019).

GS can be employed to predict breeding values of the individuals with or without phenotypic information of their own. GS with phenotypic information improve the accuracy of selection, and without phenotypic information, it shortens the breeding cycle length by eliminating the need for phenotyping of the candidates before selection. Further, it is also possible to predict breeding values for a very large number of individuals, which cannot be phenotypically assessed, resulting in increased selection intensity. Additionally, GS can also be applied at several stages in the breeding process to enhance the genetic gain from selection (R total) (Poland and Rutkoski 2016).

In the genomic selection, marker effects are estimated based on the training set of genotypes, which are both phenotyped and fingerprinted with dense marker data. Based on these estimated marker effects, the individuals related to the training population that is only genotyped but not phenotyped are selected (Zhao et al. 2012). The estimated GEBVs are not the function of underlying genes; instead, they are the ideal selection criterion (Jannink et al. 2010). The major problem in the development of the prediction model is over-fitting, and such models can exaggerate minor variations in the data, and the prediction ability decreases (Jannink et al. 2010). Hence, the application of GS in breeding pipelines is influenced by several factors when the trait of interest is affected by a large number of loci. The training population size, genetic diversity and genetic relationship with the breeding or test population, i.e. the individuals of the training population or a close relative or distant relatives of the individuals of the breeding/test population, are the most important among other factors (Pszczola 2012). The heritability of the trait under selection, i.e. complex traits with low heritability and small marker effects, is suitable for genomic prediction/selection, whereas the oligogenic traits can be predicted accurately with few markers with relatively large effects (Daetwyler et al. 2010). The prediction accuracy is low for a complex trait(s) with a large number of markers when these markers are not in linkage disequilibrium with the QTL/genomic regions. However, the accuracy increases when the heritability of the trait and training population size increases (Isidro et al. 2015).

4.4 Genetic Resources for Genomic Selection in Maize for Stress Tolerance

Germplasm in crop plants serves as a valuable resource for crop improvement activities as they exhibit a high level of genetic diversity in many important agronomic traits. The usefulness of genetic resources or germplasm collections in achieving the improvement in grain yield and agronomic performance was unequivocally established by numerous reports. Presently, more than seven million crop accessions are presently preserved in the global gene banks worldwide, which represent the paramount but largely untapped opportunities for breaking productivity bottlenecks to accelerate genetic gain for yield and other traits (Wang et al. 2017). The major hindrances in utilizing the crops’ genetic resources comprise the availability of larger germplasm collections and the lack of an integrated method to exploit the available germplasm resources. Recent advancements in high-throughput genotyping and phenotyping tools, along with evolving biotechnological tools, create opportunities to employ exotic germplasm in plant improvement programmes.

Phenotyping is the current bottleneck in plant breeding compared to genotyping, especially with the decline in genotyping cost by more than 100-fold in the last two decades. Therefore, the phenotyping cost needs to be optimized within the breeding programme. While designing the implementation of the GS scheme into the breeding cycle, the breeders need to select the optimal method for the selection of the training population so that the prediction accuracy increases and reduce the phenotyping cost with improvement in precision (Akdemir and Isidro-Sánchez 2019).

Various panels, training populations and biparental populations have been used to predict GEBVs for various stress-resilient traits in maize. Cao et al. (2017) have used the Drought Tolerant Maize for Africa (DTMA) association mapping panel to implement GS and GWAS analysis for tar spot complex in maize. The DTMA panel carries 282 tropical and subtropical maize inbreds developed at CIMMYT and comprised of lines with resistance or tolerance to an array of biotic and abiotic stresses, which affects the maize production, improved nitrogen use efficiency and grain nutritional quality. At the University of Hohenheim, 2 elite mapping panels comprising 130 dent and 114 flint lines of European origin were used to investigate the GWAS and genomic predictions for Gibberella ear rot (Han et al. 2018) and NCLB resistance in maize (Technow et al. 2013). Similarly, for Fusarium ear rot resistance, genomic predictions were performed in a panel of 874 lines encompassing the previous DTMA panel, CML lines and SYN_DH population (Liu et al. 2021) and tropical maize core collection (Ertiro et al. 2020). Three DH populations (CML550 × CML504, N = 219; CML550 × CML511, N = 110; CML550 × CML494, N = 229) and IMAS (Improved Maize for African Soil) panel were employed in genomic prediction for maize chlorotic mottle virus and maize lethal necrosis resistance (Sitonik et al. 2019) and nitrogen use efficiency (Ertiro et al. 2020). For insect pest resistance, Badji et al. (2021) employed a diverse tropical maize panel composed of 341 DH and inbred lines.

For drought and heat stresses, genomic predictions were carried in the diverse maize gene pools. A recent study employed 3068 DH lines derived from 54 biparental and test crosses generated by crossing an agronomically elite line with lines of drought-tolerant and farmer-preferred traits (Beyene et al. 2021). Further, a multi-parent yellow synthetic maize population and rapid cycle genomic selection were employed to simultaneously improve drought and waterlogging stress tolerance in maize (Das et al. 2020). Many of the previous GWAS panel results can be employed to constitute the testing population to predict genomic breeding values for various abiotic stress tolerances like drought and heat (Shikha et al. 2017; Seetharam et al. 2021). Further, maize wild relatives and landraces harbour various abiotic and biotic stress tolerance genes (Table 4.1). Applying appropriate breeding tools like DH technology coupled with genomic selections could bring these valuable genes into breeders’ disposal for their rapid utility in the current breeding pipelines to deliver stress-resilient maize hybrids.

Table 4.1 Genetic resources for various stress tolerance in maize to develop training sets in genomic selection (Modified from Mammadov et al. 2018)

4.5 Statistical Models in Genomic Selection

In GS, various statistical methods have been employed to estimate marker effects which are classified into parametric, semi-parametric and nonparametric models. The accuracy of marker effects estimation using various statistical methods is a function of the target trait’s genetic architecture (Daetwyler et al. 2010), the population structure (Habier et al. 2007; Zhong et al. 2009) and the marker’s density (Meuwissen and Goddard 2010). The genomic selection models can be categorized into parametric, semi-parametric and nonparametric methods.

4.5.1 Classification of Statistical Models in Genomic Selection

4.5.1.1 Parametric Models in Genomic Selection

Linear least square regression model: Genomic selection is focused on predicting individuals’ breeding value by modelling the association between individuals’ genotype and phenotype. Linear least square regression (LLSR) is the simplest parametric model. The major problem associated with the LLSR model is that it is difficult to perform the estimation with much higher number of markers than the number of individuals with phenotypic information. Although an alternative approach, a subset of markers can be used; still the poor prediction accuracies are obtained if the ratio of the markers’ number and the individuals’ numbers is very large or has multicollinearity (Howard et al. 2014). Therefore, Meuwissen et al. (2001) suggested the modifications to the LLSR model to eliminate the problem of more independent variables (predictor) than dependent variables (regressands). However, it fails to fully take advantage of all the markers’ information since the final model is based on markers with a significant effect only.

Ridge regression (RR): Multicollinearity between the marker data negatively affects the performance of variable selection methods. Ridge regression can be used when a large amount of marker information is available, so it can overcome the ‘p > n’ problem of the least square regression model (Howard et al. 2014). RR in GS was implemented with an assumption of random marker effects (mj’s j = 1. . .p), and markers were drawn from a group with normal distribution and Var(mj) = \( {\sigma}_m^2 \), where \( {\sigma}_m^2=\frac{\sigma_a^2}{n_{k\ {\sigma}_a^2}} \) represents the additive component of genetic variance expressed among individuals and nk is the number of marker loci (Meuwissen et al. 2001; Habier et al. 2007). The key property of RR is that it won’t select a subset of predictors in contrast to other methods such as LASSO and elastic net (de Vlaming and Groenen 2015).

Best linear unbiased prediction (BLUP): The concept of BLUP theory and the mixed model formulation and their utility in animal and plant breeding were discussed by Henderson (1949) and Henderson et al. (1959). The BLUPs are useful to deal with unbalanced datasets, for instance, multilocational datasets, a discrepancy in the number of individuals, etc. (Bernardo 2010). Genomic BLUP (GBLUP) is based on a genomic relationship matrix that explains genetic relationships between individuals, which are calculated from genotypes at single-nucleotide polymorphisms (SNPs), whereas traditional pedigree BLUP (Henderson 1975) uses pedigree relationship matrix with a genomic relationship matrix (Habier et al. 2013).

Least absolute shrinkage and selection operator (LASSO): LASSO is a compelled form of ordinary least squares, which is developed to overcome the limitations of linear least squares by Tibshirani (1996) and in GS first implemented with cross-validation by Usai et al. (2009). LASSO is indifferent to closely correlated markers, i.e. LASSO picks one among the highly correlated markers and ignores the remaining (Wang et al. 2018). Being a penalized regression-based approach, LASSO gives better estimates when the number of markers is greater than the number of individuals (p > n) (Budhlakoti et al. 2020).

Bayesian alphabet models: The Bayesian alphabet models in the genomic selection were started with BayesA and BayesB models (Meuwissen et al. 2001). Later several models, viz. BayesCπ and BayesDπ (Habier et al. 2011), fast EM-BayesA (Sun et al. 2012), fast BayesB (Meuwissen et al. 2009), BRR (Bayesian ridge regression on markers) (VanRaden 2008), Bayesian LASSO (Park and Casella 2008), etc., were derived.

In BayesA and BayesB models, the data and the variances of the marker positions need to be modelled. The main difference between BayesA and BayesB is the prior for the variance components, i.e. in contrast to BayesA, BayesB assumes that not all the markers contribute to the genetic variation. The BayesCπ gives a more sensible formulation of the mixture. However, it poses the same spirit and limitations as BayesB (Gianola 2013). Park and Casella (2008) used the idea from Tibshirani (1996) to connect with Bayesian analysis to come up with Bayesian LASSO. Bayesian LASSO generates the models with non-null regression coefficients even if p > n (Gianola and Fernando 2020). In other words, LASSO results in the sparse model, whereas Bayesian LASSO yields an effectively sparse specification like BayesB (Meuwissen et al. 2001). Yi and Xu (2008) first used the Bayesian LASSO model for QTL mapping followed by subsequent applications in genomic prediction by various researchers (de Los Campos et al. 2009; Legarra et al. 2011; Lehermeier et al. 2013).

4.5.1.2 Semi-Parametric Models

Reproducing kernel Hilbert space (RKHS): Gianola et al. (2006) proposed this semi-parametric model proposed by coalescing the best qualities of a nonparametric model with a mixed model framework (Howard et al. 2014). The RKHS model combines a genomic relationship matrix (G) and pedigree-based numerator relationship matrix (A) in a kernel matrix while making weaker assumptions on the compatibility of G and A (Rodríguez-Ramilo et al. 2014).

4.5.1.3 Nonparametric Models

Nadaraya-Watson estimator: Using Silverman’s (1986) nonparametric kernel estimator, which is used in the estimation of p (x), Nadaraya (1964) and Watson (1964) estimated the conditional expectation function. The estimator is just a weighted sum of observations yi, i = 1, 2, 3… ..n and is called Nadaraya-Watson’s equation. Nadaraya-Watson estimator is one of the most widely used nonparametric models for genomic selection. In the presence of additive effects, the prediction of Nadaraya-Watson estimator model is poor compared to other nonparametric models. However, in the presence of epistatic interactions, the performance of Nadaraya-Watson estimator was significantly better than the parametric methods (Howard et al. 2014).

Support vector machine (SVM) regression: Vapnik (1995) and Cortes and Vapnik (1995) proposed and discussed SVM approach. The SVM is originally employed in classification and regression analysis as supervised learning method. Here, the training dataset is used to create a maximum marginal classifier that results in the biggest possible separation between the comparing classes of observations. In plant breeding, the SVM regression explains the association between the marker’s genotypes and the phenotypes which can be modelled with a linear or nonlinear mapping function that takes samples from a predictor space to an abstract, multi-dimensional feature space (Hastie et al. 2009; Long et al. 2011).

Neural networks (NNs): NNs are types of nonparametric GS models. NNs are originally developed to understand how neurons of the human brain interact, work and conduct computations (Bain 1873; James 1890; Hastie et al. 2009). The feed-forward model is a basic NN model, which is a two-stage network with three types of layers, i.e. an input layer, a hidden layer and an output layer. Nonparametric nature of NNs able to model both linear and complex nonlinear functions permits the quantification of additivity and epistasis interactions (Howard et al. 2014).

4.5.2 Genomic Selection Models: Predictive Abilities and Accuracies

Presently, there are various statistical models available to estimate genomic estimated breeding values. The selection of appropriate models is most crucial for effective genomic selection. Some models are fit better for extremely quantitative traits, while some are performing good for traits which fall between qualitative and quantitative nature (Poland and Rutkoski 2016).

An optimal model should give the highest possible prediction accuracy, limit overfitting on the training dataset and be based on maximum marker-QTL LD rather than on kinship (Habier et al. 2007). This makes models easy to implement as these are consistent across the broad range of phenotypes and datasets and computationally efficient (Heslot et al. 2012). Further, the prediction ability of GS models can be increased by correcting the field spatial variation, which includes use of blocking, with resolvable incomplete block designs such as the alpha-lattice being popular in early-generation testing (Patterson and Williams 1976; Ward et al. 2019).

Furthermore, genetic architecture and heritability have the utmost influence on estimates of prediction accuracy and mean squared error (MSE). Parametric methods give somewhat superior estimates than nonparametric methods for traits with additive genetic architecture. However, when the genetic architecture of the target trait is entirely under the interaction component, parametric methods fail to provide accurate estimates (Howard et al. 2014; Momen and Morota 2018). The parametric, semi-parametric and nonparametric models showed increased prediction accuracies with heritability and the number of markers and individuals. However, an inverse association was observed with the increase in the number of QTLs from 50 to 200 (Sahebalam et al. 2019).

4.6 Genomic Selection Strategies for Stress Tolerance in Maize

The genomic selection strategies can be grouped into three categories other than regular or basic GS in cereals. These strategies can be employed for the GS of desired traits depending on the germplasm relatedness, trait phenotyping and resources (Robertson et al. 2019).

4.6.1 Across-Breeding Cycle Genomic Selection

Across-breeding cycle GS necessitates the good association between training and test datasets. The relationship between training and the test data and high association between the datasets in subsequent years are most pre-requisite for across-breeding cycle GS. The association of training and test datasets can be achieved by including common parents in crossing plans for subsequent years, and/or the crossings must be based on the progeny of previous years that were used as parents (Robertson et al. 2019). In many cases, the varieties released by other breeders or germplasm of special interests enter the breeding pipelines as a source for germplasm diversification. At this juncture, to ensure the breeding materials with sufficient genetic relatedness to implement the across-breeding cycle GS is more challenging. Additionally, without any modifications of across-breeding cycle GS, 6 years is required to use the lines from the respective breeding programme as new parents (Robertson et al. 2019; Michel et al. 2016).

4.6.2 Within-Breeding Cycle Genomic Selection

In within-breeding cycle GS method, the lines from the same breeding cycle are used to constitute the training population for GS, for example, to predict GEBVs of the sister lines with missing phenotypic datapoints. GS within-breeding cycle is important when the aim is to reduce the phenotyping or environments or measuring the expensive and complex traits on selected portion of the progenies to predict for the rest. Generally, high GS prediction accuracies in the same generation are often associated with high genetic relatedness between lines, since multiple lines from each family are being tested. Therefore, prediction accuracy of GS selection is higher within the breeding cycle or generation (Robertson et al. 2019).

4.6.3 Genomic Selection Using Untested Parents for Breeding

In GS with untested parents’ method, the untested parents refer to those lines which are started being used as parents without being tested in the field. It is a drastic way to use genomic selection wherein the phenotyping testing is skipped, at least for the portion of breeding programmes. Here, novel parental lines are selected solely on GEBVs. The use of untested parents can often significantly shorten the breeding cycle and allow faster genetic gain per year, especially when the breeding cycles are large owing to extensive phenotyping. In dairy cattle breeding, Schaeffer (2006) suggested the use of untested parents for the selection of bulls. Presently, use of untested parents in predicting the GS is more popular and revolutionizing dairy cattle breeding programme. In the case of agricultural crops and cereals where the extensive phenotyping is required, the use of untested parents could similarly revolutionize cereal’s breeding approaches (Robertson et al. 2019).

4.7 Genomic Selection for Abiotic Stress Tolerance in Maize

Drought, heat, salinity, waterlogging and mineral nutrient stresses are the major abiotic constraints limiting maize production worldwide (Edmeades et al. 1989). The climate change effects resulted in increased frequency of moderate to severe drought, high air temperature and erratic rainfalls with high intensity. The major focus of maize research in the present scenario is to improve abiotic stress tolerance. However, identifying genetic components that provide abiotic stress tolerance is challenging and resource demanding.

The traits imparting abiotic stress tolerance are governed by several QTL with small individual effects on overall trait expression, which makes it difficult for its identification, modifications and introgression into elite cultivated varieties. Thus, marker-assisted selection and QTL mapping using linkage analysis fail significantly in bringing significant changes. Hence, with the advent of high-throughput genotyping, genomic selection is now being used in breeding for abiotic stress resistance in maize (Pace et al. 2015; Table 4.2).

Table 4.2 Genomic selection in maize for various abiotic stress tolerance traits using different training populations and GS models

4.7.1 Drought Tolerance

The genomic selection was attempted for drought tolerance in 240 maize subtropical lines employing 29,619 SNPs and assessed the genomic prediction accuracies with 7 GS models, i.e. BayesA, BayesB, elastic net, LASSO, random forest, reproducing kernel Hilbert space and ridge regression, for different agronomic target traits under drought stress environments. Of these seven genomic selection models, BayesB has been shown to have the highest prediction accuracy for the dataset. From the top 1053 SNPs, 77 SNPs were found to be associated with 10 drought-responsive transcription factors, which are associated with different physiological and molecular functions. Thus, these drought-related SNPs can be further employed for the development of drought-resilient maize cultivars (Shikha et al. 2017).

Rapid cycle genomic selection (RCGS) for drought resulted in a genetic gain of 110 and 135 kg ha−1 year−1 in multi-parent yellow synthetic populations MSY-1 and MSY-2, respectively. The higher genetic gain for the trait of interest in biparental populations could be due to a change in the population structure of the base population. Further, the genetic diversity of MSY-1 and MSY-2 did not change significantly even after two cycles of GS, indicating that RCGS can be effectively used to achieve high genetic gains without loss of genetic diversity (Das et al. 2020).

Genomic prediction in 210 maize inbred lines under drought and well-watered conditions was conducted using all the SNPs, random SNPs and trait-associated SNPs. The investigation revealed the greater prediction accuracies with trait-associated SNPs across drought and well-watered conditions and all the traits such as grain yield, plant height, ear height, date of anthesis and silking and anthesis-silking interval (Wang et al. 2019). Recently, Beyene et al. (2021) employed genomic selection in 3068 DH lines derived from 54 biparental populations generated by crossing elite inbred line with lines showing tolerance to drought tolerance and other farmer-preferred traits. The study demonstrated that increasing the training set with genotyping and phenotyping data from the previous year along with combining 10–30% lines from the year of testing results in enhanced prediction accuracies. Additionally, Cerrudo et al. (2018) showed the superiority of GS over MAS for grain yield and physiological traits in the maize DH population across the water stress regimes.

4.7.2 Heat Tolerance

Along with drought stress, maize production is also constrained by damage caused by heat stress, which is more predominant in the present circumstances because of global warming. Inghelandt et al. (2019) carried out an experiment to assess the diversity and QTL and predict the genomic values for heat tolerance-associated traits. The genome-wide prediction models’ ability was tested by employing a large number of KASP (Kompetitive allele-specific PCR genotyping) and RAD (restriction site-associated DNA sequencing) SNPs. Additionally, the intra-, inter- and mixed pools prediction accuracies were also examined. Results from the study confirmed that the prediction ability of genome-wide prediction models was found high for within-population calibrations; hence, use of such approaches for selecting heat tolerance at the seedling stage is most preferred (Inghelandt et al. 2019). Genomic prediction for combined drought and heat stress in a panel of 300 maize lines of tropical and subtropical origin revealed that the genomic prediction accuracies obtained from marker trait-associated SNPs were comparatively greater (0.28 to 0.75) than those obtained from the genome-wide SNPs (0.13 to 0.64) for most of the targeted traits (Yuan et al. 2019).

4.7.3 Waterlogging Tolerance

Waterlogging tolerance in maize can be assessed easily by means of yield reduction. Three genomic selection models, viz. RR-BLUP, Bayesian RR and Bayesian LASSO, were employed in 92 sampled families from 390 S1 families tested for waterlogging tolerance. The prediction accuracies from the three models were closer to 0 for crop yield susceptibility index and ranged from 0.16 to 0.44 for yield per se under normal and stressed conditions indicating the necessity of employing larger populations in genomic predictions (Paril et al. 2017). Under waterlogging stress, phenomic and genomic selection showed a genetic gain of 80 and 90 kg ha−1 in populations MSY-1 and MSY-2, respectively, whereas rapid cycle genomic selection resulted in a gain of 90 (MSY-1) and 43 kg ha−1 (MSY-2) (Das et al. 2020).

4.7.4 Nutrient Use Efficiency

Maize is a fertilizer-responsive crop and shows increased grain yield per unit fertilizer application associated with better nutrient use efficiencies. These nutrients are vital for plants to carry out many of the metabolic processes. Any deficiency of these nutrients inhibits plant growth and development, thereby affecting plant yield. Few studies were undertaken to predict genomic breeding values for major nutrients like nitrogen and phosphorus use efficiency traits in maize.

Under low phosphorus stress, the genomic prediction was undertaken in a maize panel with 410 maize inbred lines for 11 agronomic traits employing 5 classical models, viz. RR-BLUP, GBLUP and three Bayesian models (BayesA, BayesB and BayesC). The prediction accuracy was assessed by fivefold cross-validation. The predictive ability of all five models was comparable, although GBLUP outperformed the others. The prediction accuracies significantly varied between contrasting phosphorus environments. Under normal phosphorus conditions, the prediction accuracies were ranging from 0.40 (ASI in 2015) to 0.76 (days to tasselling in 2014), with a mean of 0.53, whereas, under low phosphorus stress, the predictions were 0.06 (ASI in 2015) to 0.73 (days to tasselling in 2015), with a mean of 0.45. Furthermore, traits with higher heritability mostly showed better prediction accuracies than those with relatively low heritability (Xu et al. 2018).

Genome breeding values were predicted in testcross progenies of 411 inbred lines selected from the IMAS panel and crossed with tester CML539 under both low and optimum nitrogen conditions. Moderate to high GEBVs were observed under both optimum and low nitrogen conditions. Under optimum nitrogen condition, the GEBVs of 0.42, 0.62, 0.59, 0.48, 0.60, 0.54, 0.29 and 0.52 were predicted for grain yield, anthesis date, ASI, plant height, ear height, ears per plant and senescence, respectively, whereas, under low nitrogen, the corresponding GEBVs were 0.45, 0.67, 0.64, 0.53, 0.64, 0.63, 0.42 and 0.24 (Ertiro et al. 2020).

4.8 Genomic Selection for Biotic Stress Tolerance in Maize

Maize production is limited by biotic stresses commonly induced by insect pests and/or diseases (Lodha et al. 2013). Maize is plagued by pests, including stem borers, pink borers, shoot fly, termites and various storage pests. In maize, resistance to various biotic stresses is controlled by various QTLs with small or minor effects (Gazal et al. 2018). Hence, the marker-assisted selection cannot serve the purpose. Thus, researchers are facing the genomic predictions and association mapping for the resistance breeding in maize. Presently, quite a good number of investigations were available on genomic perfections for biotic stresses, viz. insect pests and diseases, by fungal and viral pathogens (Table 4.3).

Table 4.3 Genomic selection in maize using different training populations for biotic stress tolerance

4.8.1 Fungal Diseases

Resistance to many fungal pathogens is complexly inherited. Thus, genomic prediction seems to be the viable option. Genomic predictions for NCLB resistance in the two heterotic groups (N = 197) of maize through the BLUP model showed greater prediction accuracies (~0.70) for both dent and flint heterotic groups (Technow et al. 2013). The application of the RR-BLUP model in F2:3 populations derived from crosses CM212 × MAI 172 (population 1) and CM202 × SKV 50 (population 2) achieved the prediction accuracies of 0.83 (population 1) and 0.79 (population 2) for NCLB resistance, respectively (Balasundara et al. 2021).

A total of five biparental DH populations (N = 635) phenotyped for Gibberella ear rot incidence and three grain yield component traits were used to predict the GEBVs employing the RR-BLUP model. The prediction accuracies ranged from 0.20 to 0.39 among the DH populations. Within DH populations, the prediction accuracies were in agreement with theoretical expectations for the target traits showing moderate to high heritability. In contrast, the prediction accuracies are declined by 42% when full-sib lines were replaced by half-sib lines (Riedelsheimer et al. 2013).

Two hundred and thirty-eight maize lines were clustered to identify the lines resistant to ear rot using 23,154 DArTseq markers. Bayesian stochastic search variable approach and RR-BLUP methods were employed to carry out genomic predictions, and both methods presented the equivalent predictive abilities (dos Santos et al. 2016). For FER resistance, the genomic predictions in a panel of 509 maize lines showed similar prediction ability of five GS models, viz. BayesA (0.355), BayesB (0.338), BayesC (0.357), GBLUP (0.367) and RR-BLUP (0.351) (Guo et al. 2020). Similar work on Fusarium ear rot and fumonisin contamination in maize using 449 S0:1 lines derived from recurrent selection population was subjected to GBLUP, BayesCπ, Bayesian LASSO and extreme gradient boosting models.

The prediction accuracies showed a maximum value of 0.46 for FER and 0.67 for fumonisin (Holland et al. 2020). Further, the prediction accuracies for FER resistance estimated with genome-wide markers across the environments in the CML population, DTMA-AM panel and SYN_DH population and across the populations were 0.46, 0.53, 0.32 and 0.57, respectively. These prediction accuracies were improved (CML, 0.74; DATM, 0.62; SYN_DH, 0.63; and across populations, 0.65) when the models were framed with FER resistance-associated SNPs (Liu et al. 2021). However, quite low prediction accuracies were reported for FER (0.34) and starburst (0.4) in 320 tropical maize inbred lines using GBLUP, Bayesian LASSO and BayesC prediction models with 5000-fold cross-validations (Kuki et al. 2020).

In the case of tar spot disease, genomic predictions showed moderate to high prediction accuracy in different populations (DTMA, 0.55; pop1, 0.58; pop2, 0.74; and pop3, 0.69) employing several training populations and marker densities. When half of the population was included in the training set with 500 to 1000 SNPs, the prediction accuracy was more than 0.50 (Cao et al. 2017). There are no large effect resistant genes nor any practical control methods available to control Goss’s wilt and leaf blight diseases. Additionally, the GWAS was not effective to identify variants that are significantly associated with Goss’s wilt. However, genomic prediction with RR-BLUP showed prediction accuracy of 0.69, indicating the possible scope of GS in improving Goss’s wilt and leaf blight diseases in maize (Cooper et al. 2019).

In maize, grey leaf spot (GLS) is one of the major diseases. GLS resistance is genetically controlled by multiple genes with cumulative effects. The genomic prediction was performed in biparental populations and association panel consisting of 410 maize lines employing RR-BLUP with fivefold cross-validation. The prediction accuracies within populations were low to moderate, i.e. 0.39, 0.37, 0.56, 0.30, 0.29 and 0.38 for IMAS association panel, DH pop1, DH pop2, DH pop3, F3 pop4 and F3 pop5, respectively. Further, across the populations, the prediction accuracy was greatly increased to 0.84. GP results further consolidated the resistant line development by incorporating both major and minor effect genes (Kibe et al. 2020).

4.8.2 Virus Diseases

Recently, the viral diseases are gaining importance owing to sudden outbreaks and devastating effects on maize production in the developing world. Among the viral diseases of maize, maize lethal necrosis (MLND) is the most prominent one. MLMD resulted through synergistic interaction of two viruses, viz. maize chlorotic mottle virus (MCMV) and sugarcane mosaic virus (SCMV). The fivefold cross-validation of ridge regression best linear unbiased prediction (RR-BLUP) model revealed higher prediction accuracy of 0.56 and 0.36 for IMAS-AM (N = 380) and DTMA-AM (N = 235) panels, respectively. Importantly, the addition of SNPs associated with MLND resistance in the prediction model improved the prediction accuracy from 0.41, which increased to 0.56 in the panels (Gowda et al. 2015). Genomic prediction for the MLND using 1400 diverse inbred lines showed an increase in prediction accuracy for disease severity and AUDPC with an increase in marker density and training population size from 500 to 6300 and 230 to 915, respectively (Nyaga et al. 2020). Similarly, genomic prediction for MLND which was carried out with RR-BLUP in three doubled haploid populations with fivefold cross-validation showed high prediction accuracy for the populations with high heritability and large population size compared to the others (Sitonik et al. 2019).

4.8.3 Bacterial Diseases

In maize, Goss’s bacterial wilt and leaf blight are the major bacterial diseases. Under severe disease incidences, yield losses of >40 per cent have been seen in susceptible maize hybrids (Carson and Wicks 1991). Resistance to Goss’s wilt is an intricate and polygenic trait with no large effect resistance genes or major QTL. Goodman maize diversity panel consisting of 223 diverse maize lines was evaluated to identify the genomic regions associated with Goss’s wilt resistance by using the genomic prediction model RR-BLUP. The prediction accuracy of 0.69 was recorded (Cooper et al. 2019).

4.8.4 Insect Pests

Insect damage on maize plants in the field and stored grains severely affects food security in many countries across the globe (Demissie et al. 2008). Fall armyworm and stem borers are the major pests of maize that impede maize production in the field, and maize weevils are the major category of storage pests, causing the severe yield loss of up to 10 to 90 per cent. This, in turn, affects the grain marketability, and consumer health concerns may arise due to probable contamination of the grains with aflatoxins and mycotoxins (Tefera et al. 2019; Munyiri et al. 2013).

Individual and joint-population QTL analyses and genome-wide predictions with GBLUP for European corn borer stem damage resistance showed the superiority of the GBLUP model with the prediction accuracy of 0.70 over the QTL model despite the detection of QTL with large effects. The genomic trained model based on DH line per se performance was effective in predicting stalk breakage in test crosses (Foiada et al. 2015). Genomic prediction with 16 GP models on BLUPs and BLUEs for fall armyworm and maize weevil resistance was employed in 341 doubled haploid and inbred lines with ten- and fivefold cross-validation. The prediction accuracy realized with BLUPs was at least as twice as those with BLUEs. Additionally, genomic prediction models showed similar predictive abilities for all the studied traits, and a highly positive correlation (0.92) was witnessed between training population size and prediction accurecies in the random-based training set approach, and the reverse was seen in the pedigree-based training set approach (−0.44), owing to degree of kinship between the training and the breeding populations (Badji et al. 2021).

4.8.5 Weeds

Weeds cause devastating effects on maize yield potential. The striga (Striga hermonthica) parasitism is one of the major hurdles in the maize production system of sub-Saharan Africa. The genomic selection for striga resistance showed impressive gains in grain yield under striga-infested (498 kg ha− 1 cycle− 1 or 16.9% cycle− 1) and optimal environments (522 kg ha− 1 cycle− 1 or 12.6% cycle− 1), respectively. Additionally, the study revealed an enhanced genetic gain of grain yield per cycle in striga-infested condition was associated with enhanced plant and ear heights, resistance to root lodging, husk cover, ear parameters and striga tolerance level (Badu-Apraku et al. 2019).

4.9 Integrating Genomic Selection with Contemporary Maize Breeding Tools for Stress Tolerance

With improvement in biotechnological tools, the selection of plants has become more accurate and precise owing to the integration of both phenotypic and genotypic criteria in the selection process. Traditional marker-assisted selection methods with QTL or MAS served as a complementary tool to accelerate the selection in maize breeding programmes (Ribaut and Ragot 2007; Mayor and Bernardo 2009; Tuberosa and Salvi 2009; Beyene et al. 2016). But the identification of the QTLs that are showing expression constitutive across environments and populations with different genetic background is essential to use them in MAS (Bernier et al. 2008). G × E interactions reduce the correlation between the traits and QTL detected among the target environments (Bolanos and Edmeades 1996; Tuberosa et al. 2002). In practical breeding, the QTL identified for the target trait usually changes with different genetic backgrounds (Rong et al. 2007) and maize between the inbred lines per se and their testcross hybrids (Mei et al. 2005; Szalma et al. 2007). Numerous QTL mapping experiments conducted in the past have limited application in actual breeding because of the low marker densities in those studies which resulted in poor genetic resolution. Recent advances in genotyping techniques, such as genotyping by sequencing, have resulted in the availability of thousands of SNPs that are equally scattered throughout the genome (Elshire et al. 2011; Poland et al. 2012). The high-resolution genetic maps with high-density SNPs reduce the confidence interval of surrounding QTL, thereby allowing high-precision mapping.

Recently, MAS-based GS (GS-MAS) is considered as an upcoming strategy in maize breeding (Meuwissen et al. 2001). The GS-MAS allows the major benefit of capturing the minor effects in selection process. The traditional QTL-MAS demands the use of the flanking markers of target QTL or gene; however, the GS-MAS requires the large number of genome-wide distributed markers (Peng et al. 2014). For complex traits controlled by many QTLs with minor effects or low heritability, simulation and empirical analyses suggested the superiority of GS-MAS over QTL-MAS (Bernardo and Yu 2007; Mayor and Bernardo 2009; Heffner et al. 2010; Guo et al. 2013). Proper integration of GS-MAS in the breeding workflow can partially replace the field testing and reduce the line development time and cost of breeding activities (Heffner et al. 2010).

Phenotyping of the large-scale breeding material like doubled haploids is highly resource demanding and often exceeds the phenotyping capacity to evaluate all the lines in multi-environment trials. Therefore, partial use of genotypic data to select DH lines while improving the genetic gains for the key traits along with phenotypic selection can significantly save resources (Beyene et al. 2021). Hybrid breeding is also an evergreen area in maize research. Therefore, the application of genomic prediction in the pre-screening of hybrids could improve the efficiency and efficacy of maize hybrid breeding programmes. Among the various prediction models available, Bayesian models offer great flexibility for predicting and studying the hybrid performance (Alves et al. 2019). Additionally, bringing all the contemporary breeding tools on a platform with GS could enhance the genetic gain and efficiency of GS for stress resilience in maize. Especially, integrating GS with rapid generation advancement methods like doubled haploid (DH) technology, speed breeding coupled with precision phenotyping and high-throughput genotyping assisted by decision support tools could be useful in the rapid delivery of stress-resilient maize cultivars (Fig. 4.2).

Fig. 4.2
An illustration depicts genomic selection with speed breeding and D H technology. Maize germplasm is introduced to technologies. High throughput phenotyping for stress tolerance and high throughput genotyping undergo genomic selection that leads to improved maize hybrids.

Schematic representation of integrating genomic selection with rapid generation advancement tools like DH technology, speed breeding, high-throughput genotyping and novel high-throughput phenotyping for stress tolerance to enhance the efficiency and pace of stress-resilient maize breeding

4.10 Major Challenges for Genomic Selection in Maize Stress Tolerance Breeding

Genomic selection is yet to be popular among the plant breeding community, which necessitates more evidence for sensible and successful utility in ongoing breeding programmes. In fact, most of the studies on GS application rely on statistical models and simulations, which requires appreciable knowledge of both statistical genomics and quantitative genetics. Furthermore, many of the abiotic and biotic stress tolerance/resistances show complex inheritance and challenge the accuracy of GS as much as phenotypic selection. Since the statistical models in GS are trained with phenotypic data, therefore, the reliability and successful utility of GS for stress resilience breeding depend on well-replicated phenotypic data (Juliana et al. 2018).

The applicability of GS is limited within its scope. The performance of GS is generally low when GS models are trained with completely unrelated germplasm or with lines evaluated in non-correlated environments (Juliana et al. 2018; Ertiro et al. 2020). The major goal of GS is to reduce the repeated phenotyping cost and accelerate the genetic gain. The GS requires high-throughput genotyping to capture the genomic contribution towards GEBVs for target traits. The necessity for genotyping with a large number of markers in every generation of selection adds considerably to the price of breeding programmes. Although NGS cost is reduced very significantly, still the prices are not affordable by many plant breeders of developing and underdeveloped worlds to incorporate the GS in their regular breeding programme.

Changes in the gene frequencies and interactions in the breeding generations influence the marker effects and subsequently GEBVs. Therefore, an amendment of the trained GS model in the breeding cycle with the addition/deletion of markers is required. Additionally, the accuracy of GEBVs has been evaluated with the additive component-based simulation models. However, these models ignore interaction components that do not seem to be realistic in practical plant breeding. Therefore, there is a need to develop statistical models which consider interaction effects in addition to additive genetic components.

The successful implementation of GS in stress-resilient breeding requires intensive infrastructure in terms of high-throughput genotyping and phenotyping, which are rapid, reliable and easy. Unfortunately, most of the moderate-sized public sector breeding programmes in the developing world don’t possess high-end phenotypic platforms. Further, planning, integration and execution of GS with ongoing breeding programmes require breeders to reorient their strategies in their breeding programmes.

4.11 Prospects

Presently, GS is one of the most promising breeding methods for accelerating the development and release of new cultivars; as a consequence, the use of GS to shape the gene pools and breeding populations from gene bank accessions demands further focused investigation, especially given the vulnerability of elite inbred lines and hybrids to climate change-induced stresses. Furthermore, GS is mainly practised for a single trait; developing models to practise selection for multiple traits and including the component of G × E interaction would be more beneficial.

Genomic prediction requires the marker information that covers the entire genomic region. Thus, it becomes necessary to genotype the breeding material extensively. With the advancement in next-generation sequencing technologies, the genotyping has become easy and less resource driving. The most employed genotyping platforms like Ion Torrent, AmpSeq, GBTS and SNP-seq are reported to genotype thousands of SNPs at a time. A recent technique termed target SNP-seq conglomerates the benefits of high-throughput sequencing and multiplex PCR amplification. The genome-wide SNPs employed in the SNP-seq are poses the conserved flanking sequences, which facilitated capturing through PCR amplification. Furthermore, SNP-seq is suitable in developing countries owing to gain in several hundred SNPs while sequencing the SNP location with approximately a thousand times coverage within a short time and reduced cost.

Efficiency of genomic prediction is adversely affected by outliers, which may occur due to erroneous data imputation and outlying responses. Outlier detection in high-dimensional genomic data is difficult. Therefore, combining p-values based strategies to obtain a single p-value have been found to be very useful. The prediction accuracy of breeding values can be improved by considering the group means or group sums as a substitute to individual records for several traits which are difficult to phenotype but are economically important. For some of the economical and difficult-to-quantify traits, utilizing group means or group sums as an alternative to individual records can increase breeding value prediction accuracies. These prediction accuracies increase with increasing relationships between the group members.