Introduction

Maize (Zea mays L.) has become adapted “to the broadest range of climatic conditions of all crops, from 40S in Chile to 50N in Canada and Russia, from sea level in the West Indies to elevations above 3400 m in the Andes” (Bouchet et al. 2013). Global maize agriculture was significantly enabled through adaptation to temperate environments that initially occurred during a 2000-year period following introduction into the North American continent ca. 4000 years before present (BP) (Bouchet et al. 2013; Swarts et al. 2017). Record global maize grain production of 1054 million metric tonnes was achieved during 2016/17, rising by approx. 15 mio. metric tonnes/year since 1961. Eighty-five percent of maize grain is produced by nine countries and the European Union. China and the USA collectively accounted for 56% of global maize production in 2017/18 (National Corn Growers Association 2018). In addition, global production of silage maize increased > fourfold since 1961 to 18 million tonnes contributed by a doubling of area under production to 1.4 million ha and a > twofold increase in yield to 12.8 tonnes/ha (FAOSTAT 2018). Population growth and greater demand for animal products, particularly in developing countries, continue to increase demands for maize usage as food, feed and fuel (ethanol) and other industrial raw materials.

Maize occupies approximately equal areas of production in the tropics and temperate environments, yet the majority (70%) of maize production occurs under temperate conditions (Edmeades et al. 2017). Most global production is provided by hybrid maize (Duvick 2005a, b; Masuka et al. 2017a, b). Hybrids developed by CIMMYT yield > 20% more than OPVs under optimal conditions, and the disparity is magnified to 30– > 60% under abiotic and biotic stress conditions (Masuka et al. 2017a). However, open-pollinated varieties (OPVs) provide the majority of seed supply in some regions provided by the formal breeding sector (e.g., West Africa), albeit with much regional variation (Kassie et al. 2012) and due to many cited issues including seed supply (Pixley 2006; Gaffney et al. 2016). More resources in terms of breeding support over a longer period of time have been directed toward maize improvement in temperate climates than have been applied, to date, to the improvement in maize production in the tropics (Edmeades et al. 2017) and heterotic patterns are not firmly established in tropical maize populations (Betran et al. 2003; Reif et al. 2003; Wen et al. 2011). Nonetheless, mean rates of yield increase in tropical environments are now similar to those in temperate climates at 74–75 kg/ha/year, i.e., at 1% or 2.3% in temperate and tropical environments, respectively. However, there are major disparities between South America and SE Asia (128–142 kg/ha/year) and sub-Saharan Africa, Central America and the Caribbean (27–40 kg/ha/year) (Edmeades et al. 2017). Figure 1 shows trends in maize production and yields for countries or regions that collectively provide ca. 70% of global maize grain production. Global trends in maize yields during 1961–2008 are shown in Fig. 2 (Ray et al. 2012). Low rates of yield gain are usually due to several interrelated factors that can change with circumstances and which include a lack of uptake of improved varieties, poor soil fertility, weeds, pests, disease and droughts (Fischer et al. 2014).

Fig. 1
figure 1

Trends in maize yields 1961–2014 for the five top producing maize countries globally. Data from FAOSTAT

Fig. 2
figure 2

Changes in maize yields for individual countries and regions on a global basis 1961–2008 Fig. 2a from Ray et al. (2012)

Comparing the contribution of breeding (genetic gain) to increased yields among different studies is challenging due to specific contextual issues including germplasm, breeding strategies, cultivation and harvesting practices, initial yield levels, methodology and length of study. Genetic gain data achieved using single-cross hybrids for eight countries with rates of gain ranging from 50 to 194 kg/ha/year (Smith et al. 2014). Stagnating or declining maize yields are not due to a lack of potential genetic gain. For example, results from CIMMYT hybrid maize breeding program for eastern and southern Africa demonstrate rates of genetic gain of 109.4 kg/ha/year (optimal conditions), 12 kg/ha/year (low N), 23–32 kg/ha/year (drought) and 141.3 kg/ha/year under maize streak virus infestation (Masuka et al. 2017b). Accelerated adoption of improved drought-tolerant maize varieties could generate from US$362–US$590 during 7 years and use of low-N-tolerant varieties has similar financial gross benefits including US$100–US$136 in benefits to producers (Masuka et al. 2017a, b). Strategies to improve maize productivity in China include higher seed quality and the development of hybrids that preserve individual plant yield under higher planting densities (Ci et al. 2011; Li et al. 2011; Niu et al. 2013; Qin et al. 2016).

Rates of production have increased since 1991 in Iowa, Illinois and Minnesota, while remaining stable for Indiana and decreasing in Nebraska (Fig. 3). The rate of genetic gain for maize (Smith et al. 2014), updated to include hybrids released during 1930s–2016, indicates a single inflection point during the 1960s when single-cross hybrids replaced three- and four-way hybrids. The rate of genetic gain during 1930–1960s was 55.5 kg/ha/year, which then rose to 99.3 kg/ha/year in subsequent decades. In addition to advancing yield, maize breeders have added genetic contributions to defense against insect pests and enhanced environmental quality by facilitating conservation tillage. The rate of genetic gain per se may have been temporarily reduced, while selection for defensive traits increased for factors including improved germination and seedling growth in cold, wet and disease-infested soils, agro-ecological conditions that are associated with conservation tillage. Other trait deficits arise as particular hybrids gain market share, e.g., root lodging, brittle snap, disease, insect pests.

Fig. 3
figure 3

Maize yields on farms during 1963–2017 in Iowa (rain-fed), Nebraska (rain-fed and irrigated), during 1947–2017 in Kansas (rain-fed and irrigated) and National Corn Grower Winners (mean of top 3 per class) reported during 1988–217 (irrigate conventional tillage, irrigated no-tillage and = rain-fed, all tillage classes)

US corn grower contest winning yields have increased. However, harvests of > 25–30 t/ha are from plots located at lower latitudes, with longer growing seasons, more solar radiation and microclimates that reduce heat stress compared to the central Corn Belt. In contrast, annual contest winning yields obtained on irrigated plots in Nebraska (mean top three, each year 2015–2017) are 19.3 t/ha, similar to contest winning yields already achieved since the early 1980s (Duvick and Cassman 1999). Consequently, there is no firm evidence to show that yield potential (Yp) has increased due to changes in physiology underpinned by genetic change. The primary means by which maize yields have increased is through increased abiotic and biotic stress tolerances which avoid barrenness while allowing increased planting densities. Newer hybrids outperform older hybrids regardless of weather, drought or low nutrient stress. Nonetheless, physiological studies show that increased grain production per plant and the ability to yield under higher planting densities are independent. Consequently, advances in both attributes might provide successful breeding strategies (Gonzalez et al. 2018). Even though most selection has been practiced under fertile conditions, Haegele et al. (2013) found a relatively high rate of genetic gain under low-N conditions. Most genetic gain in high N environments could be explained by improvements in grain yield under low N. Mastrodomenico et al. (2018) found large genetic variation for most N-use traits among maize inbred lines with expired plant variety protection (PVP). Progress in NUE through breeding appears thus feasible (Haegele et al. 2013; Mastrodomenico et al. 2018).

Forecasting future climates and yield responses are notoriously complex and speculative. Heat, precipitation and solar radiation affect yield trends over years and affect annual deviations due to extreme events from year to year. Amelioration of weather effects through breeding is feasible given the repertoire of available genetic diversity for maize for climatic trends, while breeding for annual stability is more challenging, if not impossible to remedy either through genetics or agronomic practices. Enhanced levels of CO2 do not stimulate photosynthetic accumulation of biomass due to the C4 physiology of maize, although higher levels of CO2 can mitigate negative effects of drought stress (Leakey et al. 2006; Markelz et al. 2011). The primary effects of climate change will likely be driven by changes in heat and precipitation with various models predicting both increase and decrease in maize yield. Taken together, future breeding efforts will continue to focus on exploiting current Yp, which plateaued in yield contests of past decades, rather than increasing the Yp itself.

The objectives of this manuscript are to review important areas that have contributed to progress in maize breeding and will likely determine its future success. Specifically, we will report on progress in (1) genome analysis, (ii) germplasm diversity characterization and utilization, (3) manipulation of genetic diversity by transformation and genome editing, (4) inbred line development and hybrid seed production, (5) understanding and prediction of hybrid performance, (6) novel breeding methodologies and (7) synthesis of opportunities and challenges for future maize breeding.

Maize pan-genome: evolution of genomic resources facilitating gene identification and genetic mapping

Maize is the most widely grown agricultural crop in the world and a pre-eminent experimental model plant. This dual importance of maize is largely due to its complex and diverse genome, which has allowed researchers to better understand genetics, cytogenetics and genomics, and has offered a rich pool of genetic diversity to help breeders create improved germplasm. Maize has a medium-sized genome among the grasses with approximately 2.4 billion base pairs. There are between 30,000 and 40,000 genes in maize, with a large proportion being syntenically conserved with related grasses.


Maize maps Early maps (cytogenetic maps) (Birchler 1980) were based on the cytogenetic position of a gene relative to its location on a stained chromosome. Beginning in the late 1980s, the first “molecular” maps (Burr et al. 1988; Helentjaris et al. 1986; Weber and Helentjaris 1989; Gardiner et al. 1993) were developed using small DNA fragments (probes) that detected restriction fragment length polymorphisms (RFLPs). This early DNA mapping technique improved the coverage of genetic maps to a few hundred markers across the genome. Additional molecular marker approaches to create maps included simple sequence repeat (SSR) markers, random amplified polymorphic DNA (RAPD) markers (Lanza et al. 1997) and amplified fragment length polymorphism (AFLP) markers (Castiglioni et al. 1999). The evolution of DNA markers as important breeding tools from RFLPs to SSRs/AFLPs and now to single-nucleotide polymorphisms (SNPs) has enabled high-resolution mapping at much lower costs. Composite genetic maps merged data from distinct mapping populations to create high-resolution linkage maps (Beavis and Grant 1991).


Progress in sequencing technologies Sanger sequencing (Sanger et al. 1977) was used for most of the early sequencing projects. Sanger sequencing produces reads of up to one kilobase (kb) in length. To produce longer stretches of sequence, a “shotgun” approach was used, where overlapping DNA was cloned and sequenced and then assembled to create contigs (i.e., contiguous sequences). These contigs were assembled to scaffolds, the framework for extended sequences. As sequencing technology steadily improved, more complex DNA entities were sequenced: genes, mitochondria, chloroplasts, single chromosomes, viruses, etc. (Heather and Chain 2016). The first plant genome (Arabidopsis) was sequenced in 2000 (Initiative 2000) and the first crop (rice) in 2002 (Goff et al. 2002; Yu et al. 2002) (Fig. 4). Prior to the first maize genome sequence, bacterial artificial chromosomes (BACs) were used to understand the structural organization of maize and to create a physical map with anchor markers on genetic maps (Yim et al. 2002). Other popular sequence data in the maize community include expressed sequence tags (EST), genome survey sequences (GSS) and random clone inserts (Dong et al. 2003). This groundwork eventually led to the maize genotype B73 being sequenced (Schnable et al. 2009), based on Sanger sequencing using the shotgun approach. B73 was chosen because it is a key founder line for various elite inbreds used in public and private hybrid cultivars.

Fig. 4
figure 4

Timeline of sequencing technologies, major genomes and maize genomics. The figure shows three timelines. The first timeline list the release dates of major sequencing technologies focusing on the early first-generation technologies (1950–1990) and the next-generation sequencing technologies (2000s). The second timeline shows the release dates of four major genomes (yeast, arabidopsis, human and rice) and the first reported pan-genome (bacteria). The third timeline shows the release dates of maize genomes and major genotype datasets

The second generation of sequencing technologies, also known as next-generation sequencing (NGS), was led by sequencers that were commercialized by 454 Life Sciences (Ronaghi et al. 1998) and Illumina (Voelkerding et al. 2009). These sequencers allowed for parallelized DNA amplification, which greatly increased the amount of DNA sequenced per run cycle. Current Illumina sequencers can sequence between 50- and 500-bp reads, and can produce up to 100 Gb of total sequence per lane. Many of the recent maize genome projects (Hirsch et al. 2014; Unterseer et al. 2017; Yang et al. 2017a, b) are primarily based on Illumina sequences. A third generation of sequencing technologies is emerging, such as the single-molecule real-time (SMRT) platform from Pacific Biosciences (PacBio) (Eid et al. 2009). PacBio reads can be in the 10,000 s of bp and can even reach over 100,000 bp. The major disadvantage (besides cost) is that PacBio sequences have higher error rates. Therefore, they are generally used in combination with Illumina data in bioinformatic assembly approaches. Other emerging third-generation technologies (with leading commercial providers in parentheses) include: nanopore sequencing (Clarke et al. 2009) (Oxford Nanopore Technologies), assembly of synthetic long reads (Eisenstein 2015; Li et al. 2015) (Illumina and 10× Genomics), high-throughput optical mapping (Schwartz et al. 1993) (BioNano Genomics) and chromosome conformation capture sequencing (Belton et al. 2012; Putnam et al. 2016) (Dovetail Genomics). Each of these technologies is contributing to higher-quality, more contiguous genome sequence assemblies at lower costs. The original full-genome sequencing projects took years to complete with costs measuring as high as $3 billion (human genome project). Current sequencing efforts are measured in weeks and in thousands of dollars.


B73 Maize Sequencing Project, and subsequent versions and challenges Maize is a difficult genome to sequence and assemble due to its complex and repetitive composition. Maize has hundreds of thousands of long terminal repeats accounting for about 85% of the genome (Huang et al. 2012). In addition, maize is a paleopolyploid. One genome duplication event occurred around 5–12 million years ago, and another one predates the divergence of cereal crops around 70 million years ago (Schnable and Freeling 2011; Woodhouse et al. 2010). The first maize genome sequence (Schnable et al. 2009) was completed by the Maize Genome Sequencing Consortium (MGSC). This genome assembly (B73 RefGen_v1) contained 2048 Mb in 125,325 sequence contigs (N50 of 40 kb), forming 61,161 scaffolds (N50 of 76 kb), and was anchored to a high-resolution genetic map (Wei et al. 2009). The structural annotation included a total of 32,540 high-confidence protein-coding genes. Since then, the B73 sequence has undergone three major updates (RefGen_v2, RefGen_v3, RefGen_v4). The latest B73 RefGen_v4 (Jiao et al. 2017) was based on PacBio sequencing and high-resolution optical mapping and is the most accurate assembly of maize to date.

For the past 8 years, B73 has been the only publicly available sequenced maize genome and has been the focus and the representative Z. mays reference genome at the public information resources MaizeGDB (Andorf et al. 2016), Gramene (Tello-Ruiz et al. 2018), Ensembl Plants (Bolser et al. 2017), GenBank (Benson et al. 2013), EMBL-EBI (Chojnacki et al. 2017) and Phytozome (Goodstein et al. 2012). The structural annotations for B73 are used in most maize experiments and are cited in the majority of maize publications.


HapMap and Diversity projects To examine the genetic diversity of maize, thousands of maize lines have been genotyped and aligned to B73. SNPs are usually determined relative to B73 (Jiao et al. 2012; Ganal et al. 2011; Lu et al. 2015; Romay et al. 2013; Wallace et al. 2014). The most comprehensive dataset generated began in 2009 as a complementary study to the B73 reference genome. Millions of sequence variations across 27 maize genome lines were identified to create a first-generation haplotype map of maize (HapMap) (Gore et al. 2009). The study found areas of suppressed recombination near centromeres and hundreds of regions associated with geographic adaptation. HapMaps are especially important in maize due to the high variation across any two maize lines, including extensive presence/absence variation at the gene level between inbred lines. Subsequent versions, HapMap2 (Chia et al. 2012) and HapMap3 (Bukowski et al. 2018), expanded the number of lines and increased resolution by additional SNPs. HapMap2 found 55 million SNPs among 103 maize and teosinte (i.e., wild maize) lines. HapMap3 identified 83 million variant sites for 1218 maize lines.


Maize pan-genome In addition to B73 RefGen_v4, complete reference-quality maize genomes released in the past year include W22 (Springer et al. 2018), PH207 (Hirsch et al. 2014), CML247 (Lu et al. 2015), Mo17 (Sun et al. 2018), Z. mays Mexicana (Yang et al. 2017a, b) and the European Flint lines EP1 and F7 (Unterseer et al. 2017). Soon several more maize and Zea lines are expected to be released, including B104 (USDA-ARS/Iowa State University), Ki11 and NC350 (Doreen Ware, USDA-ARS) and a Zea diploperennis genome (Matthew Hufford, Iowa State University). At this rate, dozens, if not hundreds, of sequenced maize genomes will be available in the near future.

These newly assembled genomes are integrated into a maize pan-genome, a concept first illustrated in bacteria (Donati et al. 2010; Liu et al. 2014; Tettelin et al. 2005). Pan-genomes contain two types of gene models: core genes (or accessory genes) (Segerman 2012) and pan-genes or dispensable genes (Li et al. 2014; Vernikos et al. 2015). For maize, core genes are found in all lines of maize, whereas pan-genes are found at least once but not in all maize lines. An initial maize pan-genome study (Hirsch et al. 2014) found that approximately 38% of all annotated B73 reference genes were present in all 503 of the maize inbred lines examined. This percentage is low in comparison with the percentage of core genes found in 3000 cultivars in one rice study (47%) (Sun et al. 2017) and to the percentage of core genes (80%) found in a soybean pan-genome made up of seven cultivars (Li et al. 2014). However, as the number of core genes and pan-genes are relative to the number of cultivars studied, we predict the expected number of pan and core genes in maize will change as more maize accessions are sequenced and compared.

When constructing a pan-genome, accuracy depends on assembly and annotation quality, orthologue detection methods and diversity of the selected lines (Golicz et al. 2016). Visual representation of a pan-genome is still an open question with many challenges (Consortium 2018; Golicz et al. 2016). A large and complex genome such as maize presents more challenges compared to small genomes. Diversity involves differences not only in sequence but also in the position of orthologues within the pan-genome, as well as in copy number. Certain gene families are positionally dynamic and tend to reside in tandem arrays. The simplest approach to create a maize pan-genome is to select a reference genome against which all other genomes will be compared. This is the approach taken with initial maize pan-genomes, though this approach limits the study of full maize diversity, since it excludes genes that might be present in other maize lines but not in the reference genome. An ideal maize pan-genome would include positional information for all contributing maize lines, representing the total positional diversity in maize. De Bruijn graphical representations (Paten et al. 2017), such as a practical haplotype graph (PHG), may offer solutions to this problem.


Gene and quantitative trait locus (QTL) mapping One of the driving forces to establish genetic maps, sequence genomes, etc. is the interest in the identification of genes or QTL affecting traits of interest. While genetic mapping has been used for more than 100 years, a limitation has been the low number of genetic markers, until the advent of molecular markers in the 1980s. In biparental populations using few hundred markers covering the genome, genes, mutant phenotypes and QTL were mapped based on their linkage to RFLP markers. Genetic fine mapping along with physical maps enabled map-based gene isolation in the 1990s. Progress in molecular techniques was accompanied by progress in the development of statistical methods. For QTL mapping, the initial simple and interval mapping has been replaced by composite and multiple interval mapping approaches (Jeffrey and Lübberstedt 2014). A limitation of biparental populations (e.g., F2-derived) of low genetic resolution was initially addressed by repeated intercrossing before the development of mapping families. The intermated B73 and Mo17 (IBM) population (Lee et al. 2002) was established as community resource, with more than 2000 markers and representing two of the major heterotic groups used in US maize germplasm (Coe et al. 2002; Cone et al. 2002; Lee et al. 2002). Subsequently more sophisticated multiparental mapping populations were developed, capitalizing on sequence-based markers at high density. Most prominently, the maize nested association mapping (NAM) population was created (Yu et al. 2008) by crossing B73 to 25 diverse maize lines (a.k.a. founder lines). For each of the 25 F1s, 200 recombinant inbred lines (RILs) were developed. With a total of 5000 RILs, and combined linkage and association mapping, the maize NAM population has been a valuable public resource to map genes for various complex traits such as flowering time (Buckler et al. 2009) and disease resistance (Kump et al. 2011).

Based on concepts in human genetics, candidate gene-based and later genome-wide association studies (GWAS) have become increasingly popular, enabled by high-density markers and sequenced genotypes. The main strength of GWAS populations is their low LD and thus the ability to map loci at high resolution, potentially pinpointing causative genes. For example, the above-mentioned HapMaps were used to identify SNPs associated with agriculturally important traits including leaf architecture (Tian et al. 2011; Li et al. 2012b), resistance to Northern and Southern Leaf Blight (Kump et al. 2011; Poland et al. 2011), plant height, flowering time (Li et al. 2016; Peiffer et al. 2014) and ear rot disease resistance (Zila et al. 2014). Other GWAS panels in maize include the Ames panel (Romay et al. 2013; Pace et al. 2015) and the BGEM panel (Sanchez et al. 2018; Vanous et al. 2018). The identification of genes of interest did undergo a major paradigm shift from being an exclusively forward genetic approach to being increasingly a reverse genetic approach based on rapidly accumulating sequence and gene function information. Moreover, with an increasing number of genotyped panels and mapping populations, gene and QTL mapping efforts shifted from genotyping to phenotyping and analysis of large datasets.

The Maize Genetics and Genomics Database (MaizeGDB) integrates mapping data from a wide range of genetic maps currently hosting over 2000 genetics maps. A universal composite map (i.e., Genetic Map) is updated and maintained at MaizeGDB (Andorf et al. 2016; Lawrence et al. 2008). Table 1 shows a snapshot (August 2018 release) of the counts of major data types available at MaizeGDB. While the numbers of mapping studies and identified QTL and loci are overwhelming, the statement of Bernardo (2009) is certainly still valid: “…the vast majority of the favorable alleles at these identified QTL reside in journals on library shelves rather than in cultivars that have been improved through the introgression or selection of these favorable QTL alleles.” Alternative approaches to marker-assisted selection such as genomic selection (see below), not requiring mapped loci, seem to be of more practical relevance for plant breeders at this time.

Table 1 Data types available at MaizeGDB. The table lists 12 major data types at MaizeGDB with the counts of each. Genomes are the most recent version of reference-quality genome assemblies. Gene models are the structural gene predictions for each genome assembly. Genes are loci in the maize genome identified as a gene. Annotated genes are genes that are associated with a phenotype or have a gene ontology term. Genetic maps are maps created by the maize community. Loci/QTL are points, probes, QTL, etc. identified in the maize genome that have functional or regulatory relevance. Markers are loci in the genome used as molecular markers. SNPs are single-nucleotide polymorphisms in the maize genome. Count information is for an allele for each position per germplasm. Germplasm are records of germplasm or genetic stocks. Gene expression is based on RNA-Seq experiments. Phenotype terms are unique descriptions for phenotypes

Maize germplasm: diversity and utilization

Origins and taxonomic organization The origin of maize was hotly debated until the late 1970s after which genetic studies, including the use of molecular markers and comparative DNA sequence data allowed breakthroughs in the taxonomy and phylogeny of maize and its wild relatives, including the identification of specific loci involved in the domestication process. Maize was domesticated in the tropical lowlands of southwest Mexico with subsequent introgression from teosinte (Matsuoka et al. 2002; Piperno et al. 2009; van Heerwaarden et al. 2011; Hufford et al. 2013). Maize diversified under genetic drift and selection as it was carried through a diverse habitats during its spread by humans both south and north from its origin, including its arrival in the southwestern region of North America by 2260 BC (Merrill et al. 2009). The initial selection for adaptation to a temperate environment then occurred during the subsequent 2000 yrs in North America (Bouchet et al. 2013; Swarts et al. 2017). Further introductions occurred into the USA, Europe, Africa and Asia in the 16th C (Vigouroux et al. 2008; Bedoya et al. 2017; Edmeades et al. 2017). Further climatic adaptation leads to the development of European flint landraces which involved different genetic loci to those associated with temperate adaptation of dent germplasm in North America (Unterseer et al. 2016).

Characterization of germplasm provides an improved basis to inform plant breeders and conservators about genetic resource diversity. Morphological descriptions of races or “group(s) of related individuals with enough characteristics in common to permit their recognition as a group” (Anderson and Cutler 1942) were initiated in 1919, with further studies during 1943–1952 culminating in a series of “race bulletins” for Mexico, central and South America 1952–1963 (Brown and Goodman 1977). Other publications describing maize races for Europe and Asia are cited by Brown and Goodman (1977). Collectively, some 285 maize races have been described in the several “Races of maize” publications, see, for example, Wellhausen et al. (1952), although Hallauer and Miranda (1981) conjectured these collections may represent 130 distinct types. In contrast, more than 300 maize races are reported to be represented in the collection maintained at CIMMYT (www.genebanks.org/resources/crops/maize/). Six races have achieved global economic importance: Mexican Dents, Corn Belt Dents (CBD), Tusons, Caribbean Flints, Northern Flints and Flours, and the Catetos (Argentinean Flints), although several other races are important regionally (Goodman 1978). Comparisons utilizing molecular marker data mostly support previously assigned racial groupings (Liu et al. 2003: Lu et al. 2011; Mir et al. 2013). Comparisons of molecular marker or DNA sequence data allow global views of taxonomic and phylogenetic relationships of maize genetic resources (Lu et al. 2009). This capability allows genetic diversity present in one country or region to be compared in a global context, both quantitatively and particularly when linked to phenotypes, in qualitative terms also. This holds true, for example, for African maize in a global context (Westengen et al. 2012), similarly for maize utilized in China (Zhang et al. 2016) and likewise for maize cultivated in Europe (Tenaillon and Charcosset 2011; Brandenburg et al. 2017), including development of breeding strategies to further broaden the genetic base of European maize with continued introductions of US Corn Belt germplasm (Reif et al. 2010). Similarly, this is valid for the CBDs, which have achieved global usage and which comprise a relatively diverse germplasm base due to their origins from highly differentiated Northern Flints (NF) and Southern Dents (SD) (Doebley et al. 1988; Dubreuil et al. 1996; Troyer 1999, 2006; Vigouroux et al. 2008; Bauer et al. 2013; Giraud et al. 2014; Unterseer et al. 2016). Nevertheless, the CBDs comprise a minority of germplasm diversity that is represented among maize landraces globally (Mir et al. 2013), most of which is found in tropical maize (Lu et al. 2011). Patterns of genetic diversity and phylogenies of maize in the American continent where maize was domesticated and diversified during later millennia are available to conservators, geneticists and developers of new germplasm (Vigouroux et al. 2008; Bedoya et al. 2017). Comparisons of genetic diversity between tropical, subtropical and temperate maize germplasm are available (Reif et al. 2003; Laborda et al. 2005; Lu et al. 2011). Comparisons of genetic diversity within and among breeding programs can be made (Inghelandt et al. 2010; Nelson et al. 2016). Molecular marker technology has evolved rapidly and substantially from the mid-1970s when 21 isozymic loci and several blocks of zein coding loci were available as molecular markers through to today when DNA sequence data are utilized. The culmination of these developments to the use of sequence data is very important because opportunities are now available to genetically characterize inbred lines, hybrids and populations, including landraces utilizing a common global genetic “language.”


Studies of hybrid diversity It is challenging, yet vital to monitor genetic diversity during selection (NRC 1972, 1993; Rogers and McGuire 2015; Brown and Hodgkin 2015; CGC 2018). However, the ideal is difficult, if not impossible to achieve due to (1) the complexity of the maize genome, (2) G × G and G × E interactions in phenotypic expressions and (3) the difficulty in predicting useful future traits. Consequently, pedigree and molecular markers or sequence data provide useful surrogates. Temporal changes for maize diversity deployed on farms are available for France (Le Clerc et al. 2005), for Zimbabwe, Zambia and Malawi (Magorokosho 2006), and for the USA, although during the past 2–3 decades only for inbred lines after expiration of their terms of PVP and utility patent protection have expired (Nelson and Goodman 2008; Romay et al. 2013; Beckett et al. 2017). Historically, most studies of diversity have focused on US hybrid maize due to its relative longevity in cultivation and the importance of the CBDs to global maize production, not only in the USA but in many other countries. Six pedigree-based studies on the use of public inbred lines carried out between 1956 and 1986 (Darrah and Zuber 1986), showing change in diversity in time, although by 1984 only three publicly bred inbreds contributed >1% to hybrid seed production. These surveys seemed to indicate that diversity of the US maize crop might be narrowing (NRC 1972; Zuber and Darrah 1981). However, more positive responses were received about available diversity (Duvick 1984), reflecting the understanding that diversity resides in breeding programs as a whole rather than just the commercial portfolio at any single place and time (Duvick 1984).

The optimal way to meaningfully monitor diversity is to apply molecular markers directly to hybrids coupled with information on their relative usage. Molecular marker surveys of widely used hybrids during the 1986 and 1990 harvests (Smith and Smith 1991; Smith et al. 1992) showed that many hybrids (46–48%) grouped with others, including open pedigreed hybrids at > 95% thresholds of similarity. Nonetheless, 5–6 out of 18 companies had > 50% of their widely grown hybrids that could be categorized as “unique.” For the 1990 harvest (Smith et al. 1992), US hybrids were associated into two large clusters: The first comprised 91% of the hybrids developed by Pioneer Hi-Bred International (now Corteva Agriscience), two were > 95% similar, three hybrids developed by DeKalb, and one each by Funk Seeds and Asgrow (now all part of Bayer Crop Science). The second comprised all other hybrids including seven subgroups of hybrids.


Status of genetic diversity in US maize breeding and agriculture today Although legal restrictions have stymied assessments of temporal trends in cultivated maize diversity, some insights can still be gleaned from published data. Mikel and Dudley (2006) collated pedigrees of proprietary US inbred lines from information in the PVP and patent databases. Comparisons of a series of progenitor lines indicate that major contributions are from public breeding programs and by Pioneer Hi-Bred International (PHI), including via Holden Foundation Seeds (now Bayer Crop Science). When additional inbreds are added, then stiff stalk contributions from DeKalb and one inbred developed by Northrup King (now Syngenta) occur. These data beg the question: What happened to diversity previously developed by at least five other breeding programs as exemplified by the molecular marker-based assessments of uniqueness of several hybrids widely used during the 1986–1990 time frame (Smith and Smith 1991; Smith et al. 1992)? Some diversity seems to have been lost, and thus, major breeding programs are becoming genetically more similar. There is continued heavy usage of B73, PH207 and PH595 descendents, sourced, not only via PHI hybrids 3180, 3535, 3737, but also directly via proprietary inbred lines (Garing 2000; Larkins 2000). Continued development of US hybrids has increasingly apportioned genetic diversity between heterotic groups (Feng et al. 2006). In contrast, van Heerwaarden et al. (2012) demonstrated a narrowing of SNP haplotype diversity within each of three US heterotic pools.


Genetic diversity in US maize: present situation and future Given evidence of an overall narrowing of genetic diversity in US maize germplasm, the more effective use of a broader genetic resource base is an important strategy to pursue. Greater usage of CBDs globally is one example, but risks of further global dependence upon a relatively limited genetic base should be hedged by utilizing programs that can more effectively characterize hitherto underutilized, including exotic genetic resources. However, there are very few examples where exotic germplasm has been successfully incorporated into the CBDs with marginal use of temperate exotic germplasm (usually 2–6%, occasionally 12–25%) and minimal use of tropical germplasm (0.1–5%) in a few hybrids (Goodman 1999, 2005). The Maize Crop Germplasm Committee (MCGC) expressed concerns about the vulnerability of genetic diversity in US maize stating that “the genetic health of the maize crop is a matter of National security” (MCGC 2016). Chief recommendations included: (1) international collaborations to screen for resistance to diseases not yet found in the USA, (2) genetic diversity of the U.S. maize crop should be evaluated using DNA-based tools, (3) regeneration and characterization must be increased, (4) additional collections of landraces, populations, wild relatives and inbred lines from programs are needed before closure and (5) expansion of germplasm enhancement. A reduction in useful genetic diversity will ultimately result in a decline in the rate of genetic improvement unless remedial measures are taken. The rate of decline accelerates as inbred development becomes more effective (Gaynor et al. 2017). Programs to increase diversity require “long-term commitment and appropriate breeding strategies, and may be assisted by DNA marker technologies” (Holland 2004). Programs designed specifically to adapt and characterize exotic germplasm for the purpose of identifying new useful diversity are termed “pre-breeding.” The international scope of breeding programs provides prospects whereby breeding in one location is “pre-breeding” for other global locations. For example, introgression of temperate germplasm into tropical hybrids developed in Brazil may, in turn, help identify useful tropical germplasm for use in temperate locations. An exotic germplasm introduction program initiated at NCSU takes advantage of cycles of inbreeding to develop tropical inbred lines as a means of reducing alleles that may have deleterious effects. Further cycles of pre-breeding then occur in a temperate climate (Goodman 2005; Nelson and Goodman 2008; Gardner 2012; Nelson et al. 2016). Readers are recommended to Smith et al. (2017) for further information on the importance and challenges of utilizing wild and other exotic genetic diversity in maize breeding, case studies demonstrating the use of exotic genetic resources and critical issues faced by genebank curators now and in the immediate future. New breeding strategies including the use of molecular markers and precision phenotyping provide improved opportunities to more effectively access diversity (Yu et al. 2016) and see Seeds of Discovery (SeeD) (Prasanna 2012). Breeding programs that are primarily aimed at developing improved hybrids can also be modified to reverse the decline in genetic variance (Gaynor et al. 2017; Voss-Fels and Snowdon 2016).

Manipulation of genetic diversity: mutagenesis, transformation, genome editing

Induced mutagenesis Manipulation of genetic diversity in maize can be achieved through various approaches such as hybridization with sexually compatible wild relatives (Mangelsdorf 1961), treatment with physical and chemical mutagens (Bird and Neuffer 1987), transposable elements (May et al. 2003; Neuffer et al. 2009), transgenesis (Wang et al. 2003a, b) and genome editing (Gao 2018) (Table 2). All these approaches can cause gene mutations and rearrangements. Besides its utility in broadening genetic variation, mutagenesis is also a useful tool to understanding gene function. For instance, X-rays were used to induce mutations at the yellow–green (Yg) locus (Dollinger 1954), and ethyl methanesulfonate (EMS) was used to determine the function of the Sh1 protein involved in endosperm development (Chourey and Schwartz 1971).

Table 2 Progress in the use of mutagenesis, transgenesis and genome editing to broaden genetic diversity in maize

Attempts to introduce physical changes to the maize genome using irradiation started as early as the 1930s (Stadler and Sprague 1936). This work helped demonstrate that shorter wavelengths (~ 2600 Å) of non-ionizing radiation such as UV are more effective in inducing DNA damage (Stadler and Sprague 1936). In addition, toxic compounds such as mustard gas were evaluated for their effectiveness as mutagens in maize (Gibson et al. 1950). However, more success was attained with UV radiation (Stadler and Uber 1942) and ionizing radiation, such as X-rays and gamma rays (Neuffer 1957; Sarvella and Grogan 1967). In this early work (Gibson et al. 1950; Sarvella and Grogan 1967; Stadler and Uber 1942), there were limitations with respect to exposing target tissue and cells to the mutagen to effectively generate heritable mutations. For example, maize pollen was the main target for the application of UV radiation and mustard gas likely due to the abundance and ability to pass the mutations to the progeny.

Induction of mutations in maize with chemical mutagens, for example, EMS and various N-nitroso compounds has also been carried out for a while (Amano and Smith 1965; Bird and Neuffer 1987; Williams 2016). EMS is more effective than non-ionizing and ionizing radiation (Neuffer and Fiscor 1963; Neuffer et al. 2009). When EMS is used in combination with carriers such as paraffin (Neuffer and Coe 1978) or mineral oil (Neuffer 1994), less damage to the germ cells is experienced, resulting in increased mutation frequency (Brunelle et al. 2017). TILLING (Targeting Induced Local Lesions IN Genomes) uses chemical mutagenesis methods to create libraries of mutagenized seed that are later screened using high-throughput approaches for the discovery of useful mutations. With an increase understanding of sequence—function relationships—valuable alleles might be identified in respective TILLING populations in a more targeted way by reverse genetic approaches. Till et al. (2004) obtained 17 independent EMS-induced mutations from a population of 750 maize plants derived from mutagenized pollen. Such novel mutations are useful in dissecting complex traits such as seed number (Bommert et al. 2013).

Transposable elements or transposons are DNA sequences capable of migrating around the genome and may induce various chromosomal mutations and genetic variation. The early works of Rollins Emerson (1917), Barbara McClintock (1950), Marcus Rhoades (1938) and Peter Peterson (1953) on transposable elements paved the way for their widespread use in maize research. Recent discoveries suggest a role for transposable elements in maize gene regulation in response to stress conditions (Makarevitch et al. 2015). Examples of transposable element systems in maize include activator–dissociation (Ac-Ds), suppressor–mutator (Spm) and Robertson’s mutator (Mu) (Robertson 1957). Although the use Ac-Ds, Spm and Mu transposon systems became useful methods to study gene function in maize (Neuffer et al. 2009), the loss of transposon activity can lead to the suppression of large numbers of mutations in the F2 population, making it difficult to understand the function genes of interest (May et al. 2003). In recent years, transposon-based genetic resources such as mapped Ac/Ds families and UniformMu have been established and are available through MaizeGDB (https://www.maizegdb.org/) and Maize Genetics Cooperation Stock Center (http://maizecoop.cropsci.uiuc.edu/). UniformMu (McCarty et al. 2013) is a population of mutator-induced mutants in a highly uniform W22 background. Over 95,871 unique germinal insertions were mapped in over 14,000 seed stocks. Insertion positions are available for both B73 and W22. The Ac/Ds (Vollbrecht et al. 2010) resource is a collection of sequence-tagged Ds insertions in W22-derived inbred lines generated by aligning 2072 Ds flanking sequences against B73. Both resources display the insertions as genome browser tracks at MaizeGDB with tools to order the seed stock containing the insertion.


Transgenesis Despite the important value of induced mutations, the process of mutation breeding is labor-intensive and time-consuming. Some of the breeding objectives can be achieved through in vitro tissue culture processes and genetic transformation (Lee and Phillips 1987). First attempts to transform maize involved direct injection of DNA into tissue, but without success (Coe and Sarkar 1966). Twenty years later, progress in biotechnology resulted in stable transformation of maize (Fromm et al. 1986) and the discovery that Agrobacterium enabled transfer of DNA to maize cells (Grimsley et al. 1987). The first transgenic maize was developed by protoplast transformation (Rhodes et al. 1988), albeit infertile. The first fertile transgenic maize was developed by particle bombardment of embryogenic suspension culture (Gordon-Kamm et al. 1990) and protoplast transformation (Golovkin et al. 1993). Since then, transgenic maize plants primarily carrying insect resistance and herbicide tolerance traits have been commercialized in many countries and are one of the major successes of this technology in the last century. In 2016, transgenic maize occupied 59.7 mio. hectares globally (ISAAA 2017).

In the mid-1990s to early 2000s, more robust protocols were developed for Agrobacterium-mediated maize transformation (Ishida et al. 1996) and polyethylene glycol (PEG)-mediated protoplast transformation (Wang et al. 2000). Maize transformation protocols are highly genotype dependent. Therefore, the identification of transformable genotypes was a primary challenge. High type II callus production (Hi II) became one of the most widely used families for maize transformation because of its ability to produce highly transformable calluses (Zhao et al. 2001; Frame et al. 2002). However, Hi II is a segregating family, which complicates gene function studies. For that reason, the discovery that overexpression of maize genes encoding baby boom and Wuschel morphogenic regulators can enable leaf cell transformation of recalcitrant genotypes (Lowe et al. 2016) is an important milestone for maize transformation.


Genome editing Site-directed nuclease systems have been developed for targeted genome editing for more than two decades. The application of genome editing technologies is expected to generate new genetic variation in maize for both basic research and development of improved commercial germplasm. Current genome editing tools use nucleases to induce DNA double-strand breaks (DSBs). These tools include zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and clustered regularly interspaced palindromic repeats (CRISPR)-CRISPR-associated (Cas)-CRISPR-Cas systems (Georges and Ray 2017).

The application of ZFN technology resulted in reduction in seed phytate content via specific targeting of one of the inositol phosphate kinase (IPK) homologues (Shukla et al. 2009). However, a challenge with the ZFN technology is its limited ability to generate a high frequency of mutations, making it difficult to identify the mutated alleles (Puchta and Hohn 2010).

TALENs are similar to ZFNs and comprise a non-specific Fok1 nuclease fused to a DNA-binding domain. However, the biggest challenge with the use of TALENs is engineering a highly specific TALE domain to avoid off-target DNA cleavage. Such non-specific DNA editing may have deleterious results making it difficult to obtain a desirable mutation. Nonetheless, using a combination of gene promoters, heritable and site-specific DNA changes in the maize glossy2 (gl2) locus were generated by the TALEN approach using Agrobacterium transformation of the B104 inbred (Char et al. 2015).

In maize, CRISPR technology has been applied to modify various traits such as male sterility, lignin biosynthesis, herbicide tolerance, secondary metabolism, grain composition and drought tolerance (Chilcoat et al. 2017). The first use of Cas9/gRNA for genome editing in maize targeted the maize IPK gene (ZmIPK) using PEG-mediated protoplast transformation (Liang et al. 2014). Similar experiments involved transformation of immature embryos of the B73 inbred line (Xing et al. 2014). In addition, CRISPR/Cas9 has been used to induce mutations and replace or add genes in maize using biolistic maize transformation (Svitashev et al. 2015). Char et al. (2017) showed that co-infection of two Agrobacterium strains harboring distinct Cas9/gRNA can generate transgenic plants with mutation rates as high as 70%. Even though the CRISPR-Cas technology enabled modification of various traits in maize (Svitashev et al. 2015, 2016; Shi et al. 2017; Char et al. 2017), the native maize GOS2 promoter was used by Shi et al. (2017) to both replace and supplement the native ARGOS8 promoter. Variants with altered expression of ARGOS8, a negative regulator of ethylene responses, showed yield gains under drought stress, with no yield penalty under well-watered conditions (Shi et al. 2017).


Future directions Maize harbors a vast amount of spontaneous mutations that can be leveraged to create adapted varieties and inbred lines (Bird and Neuffer 1987). However, this genetic variation may not be available in elite germplasm. Genome editing methods are precise and faster to attain desirable genome changes without lengthy backcross procedures. Coupled with genotype-independent maize transformation, genome editing technologies could become a prevalent approach to introduce specific genetic changes for organizations with negotiated access to genome editing technologies and financial resources to address regulatory requirements.

Inbred line and hybrid seed production

Maize has a convenient reproductive organization with separate male and female flowers on the same plant enabling both inexpensive self-pollination for inbred line development and controlled cross-pollination for hybrid seed production. This is likely one of the reasons why hybrid breeding was first implemented in maize after discovery of heterosis. Efficient hybrid breeding requires methods that (a) quickly generate homozygous and homogeneous lines and (b) enable cost-efficient seed production. Inbred lines developed by continuous self-pollination have been largely replaced by doubled haploid (DH) lines. Hybrid seed production was initially done by individual hand crosses and benefited from mechanization and mechanical detasseling in isolation nurseries, which are being replaced by natural or transgenic male sterility systems.


Inbred line development A major breakthrough in inbred line development has been the discovery of haploid plants and the concept of DH lines, which require only two instead of six or more generations to develop. Haploid plants are smaller and less vigorous than corresponding diploid plants (Chase 1952). Spontaneous parthenogenetic or androgenetic maize haploids occur at very low frequencies. Randolph (1940) discovered 23 parthenogenetic diploids among 17,165 individuals in the progeny of tetraploid maize, a frequency of about 1:750. Einset (1942) found two monoploids among 1916 plants, a frequency of 1:958. Chase found 43 monoploids among 38,684 seedlings using a dominant gene for purple plumule (Chase 1949). Stadler (1949) obtained a frequency of about 1:100 with a diploid multiple recessive tester. It took decades from the initial discovery of haploids to routine use of DH lines in maize breeding programs. Three main factors limited use of DH technology: low haploid induction rate, difficulty in the identification of haploid kernels and limited genome doubling capabilities. Besides haploid induction in vivo, in vitro techniques have been evaluated. However, maize turned out to be highly recalcitrant apart from a limited number of genotypes (Geiger 2009).


Haploid induction Chase (1949, 1951) suggested that haploids could be used for line development in hybrid breeding. Identification and use of haploid inducer Stock 6 was a major breakthrough for DH technology in the 1950s, with a maternal haploid induction rate of 2.3% (Stock 6 used as male). From the first use of inducer line Stock 6 to modern inducers, induction rates increased from 2% to close to 15% over about five decades. Rotarenco et al. (2010) reported the highest induction rate (14.5%) in their recently developed inducer lines derived from Stock 6 and MHI. At Iowa State University, induction rates above 15% were obtained for F6:7 families from the cross of unrelated inducers with > 10% induction rates each (Frei, personal communication). We conclude that inducer lines with haploid-inducing capacity in excess of 20% are likely in the near future. Moreover, genes involved in maize haploid induction have been identified (e.g., Kelliher et al. 2017; Liu et al. 2017). A deletion in Matrilineal (MTL) causes significantly increased haploid induction rates in maize. Gene editing can thus be used to increase haploid induction rates in any genotype accessible to genome editing in maize.


Identification of haploid kernels About 90% of offspring from inducer crosses are regular diploids and thus undesirable. Coe and Sarkar (1964) developed several marker systems including R1-Navajo (R1-nj) and applied them to facilitate the identification of haploid seed. The R1-nj gene is a dominant anthocyanin color marker gene, which expresses in the aleurone as well as the embryo. It enables the identification of kernels with haploid embryos based on (lack of) color. The R1-nj genetic marker that has to be sorted manually is still widely used. Expression of the R1-nj color marker can vary depending on genetic donor background and environmental factors (Liu et al. 2016). To overcome the shortcomings of R1-nj, alternative methods for automated sorting based on high oil, color, seed weight and near-infrared spectroscopic differences are under development (Liu et al. 2016). This includes the development of haploid inducers with high oil content, to facilitate haploid–diploid discrimination (Melchinger et al. 2015).


Genome doubling In the 1950s, colchicine was introduced to generate DH lines. Colchicine is an effective method for plant somatic genome doubling. Different protocols were established for maize Eder and Chalyk (2002). As colchicine is toxic to humans and the environment, alternative chemicals such as herbicides, such as nitrous oxide and trifluralin, have been proposed for genome doubling (Kato 2002; Häntzschel and Weber 2010).

Haploids may become fertile spontaneously by haploid genome doubling (SHGD). Barnabás et al. (1999) reported that SHGD rates ranged from 0 to 21.4% among maize germplasm. After in vivo haploid induction and planting in the field, maize haploids usually show a high degree of haploid female fertility (HFF) (Chase 1952; Chalyk 1994; Kleiber et al. 2012). More than 90% haploid ears with kernels are obtained after crossing haploid plants with regular diploid maize pollen (Chalyk 1994; Geiger et al. 2006). The average haploid male fertility (HMF) rate is usually below 10% (Chase 1949; Chase 1952; Chalyk 1994) which limits the number of DH lines produced in a population without colchicine treatment. Thus, methods to improve HMF are of interest to maize breeders. Geiger and Schönleben (2011) found significance within population variation for HMF, corroborated in studies of temperate and tropical germplasm (Zabirova et al. 1993; Chalyk 1994; Kleiber et al. 2012). Major QTL affecting HMF has been identified (Ren et al. 2017).


Future DH breeding schemes Further improvement in DH technology will reduce costs for inbred line development considerably. Breeding strategies that make best use of breeder genetic, technical and monetary resources have been proposed (Gordillo and Geiger 2008; Geiger 2009). Major breeding programs combine DH technology with genomic selection (GS) (see below), to maximize genetic gain per breeding cycle. With doubling rates exceeding 17%, the costs for GS at the haploid stage would be lower than conducting GS one generation later, at the diploid stage (Wu et al. 2015). Thus, if SHGD works well, maize breeding programs using both DH technology and GS can be accelerated.


Controlled pollinations With the first commercial hybrid seed production in 1923, manual detasseling of seed parents was employed to maximize hybrid purity and to avoid hand pollinations in hybrid seed production fields (Wych 1988). Manual detasseling has later been advanced to mechanical detasseling, or a combination of mechanical followed by manual detasseling for control (Wych 1988). Manual and mechanical detasseling remains an important method of hybrid seed production today; its use depends on the availability of alternative biological mechanisms, which allow hybrid seed production at lower costs. Manual detasseling was increasingly replaced by the use of cytoplasmic male sterility (cms) in the 1950s to 1970s, but gained renewed importance with the advent of Southern corn leaf blight, which eliminated the use of T-cytoplasm as a primary cms source for hybrid seed production.

cms in maize was first described by Rhoades (1931). Three major sources of cms have been recognized: cms-T (Texas), cms-C (Charrua) and cms-S (USDA) (Gabay-Laughnan and Laughnan 1994). While cms is caused by defects in mitochondrial DNA and, thus, maternally inherited, fertility in hybrids needs to be restored. This is accomplished by crossing cms females with males, carrying matching genic inherited restoration of fertility (Rf) genes. Rf1 and Rf2 restore the fertility of cms-T, Rf3 the fertility of cms-S, and Rf4 and Rf5 the fertility of cms-C (Gabay-Laughnan and Laughnan 1994). While actual seed production using cms is less costly compared to mechanical detasseling, both cms and Rf genes need to be introduced into the respective female and male parents, respectively. Moreover, cms and any biological systems for pollen control may be affected by environmental conditions. Fertility of cms females has been observed under some conditions (Jugenheimer 1985), which leads to self-pollination of females and reduced yield of respective hybrid seed lots. Alternative genic male sterility and chromosomal–genic systems have been developed (Duvick 1965), but the majority of seed produced using biological pollen control has been based on cms systems (Jugenheimer 1985).


Combination of DH technology and cms conversion Paternal haploid induction in maize is mediated by the gene ig1 (indeterminate gametophyte), which increases the frequency of haploids in its progeny (Kermicle 1969). Homozygous ig1 mutants show several embryological abnormalities including egg cells without a nucleus. After fusion with one of the two paternal sperm cells, such an egg cell may develop into a haploid embryo possessing the maternal cytoplasm and only paternal chromosomes. In selected genetic backgrounds, the HIR ranges from 1% to 2% (Kermicle 1994). Because of low frequency of haploids, this system is not widely used to derive DH lines. However, the ig1/ig1 genetic stock can be useful for the conversion of an inbred line to its cms form. For this purpose, ig1/ig1 inducer lines with various cms-inducing cytoplasms have been created (Pollacsek 1992; Schneerman et al. 2000).


Transgenic methods aiding seed production Male sterility and fertility restoration were among the first transgenic traits available, including the Barnase/Barstar system (Mariani et al. 1990). A team at Corteva Agriscience developed the seed production technology (SPT) system in maize (Wu et al. 2016), which has been deregulated by USDA APHIS in 2011 and is thus available for commercial hybrid seed production in maize. The maize SPT maintainer line is a homozygous recessive male sterile transformed with a SPT construct containing (1) a complementary wild-type male fertility gene to restore fertility, (2) an α-amylase gene to disrupt pollination and (3) a seed color marker gene. The sporophytic wild-type allele enables the development of pollen grains, carrying the recessive allele. Only half carry the SPT transgenes. Pollen grains with the SPT transgenes exhibit starch depletion resulting from expression of α-amylase and are unable to germinate. Pollen grains that do not carry the SPT transgenes are non-transgenic and are able to fertilize homozygous mutant plants, resulting in non-transgenic male-sterile progeny for use as female parents. Because transgenic SPT maintainer seeds express a red fluorescent protein, they can be detected and efficiently separated from seeds that do not contain the SPT transgenes by mechanical color sorting. Alternative systems have been or are being developed. Monsanto’s Roundup Hybridization System (RHS) utilizes a transgenic maize trait (MON87427) that exhibits tolerance to glyphosate in all plant tissues except male reproductive tissues (Feng et al. 2014). Thus, genotypes carrying this event can be used as females, and pollen sterility can be induced by glyphosate application at flowering. The multicontrol sterility (MCS) system (Zhang et al. 2018) is based on the male sterility 7 (ms7) mutation in maize and uses color and herbicide tolerance to discriminate between male-sterile and fertile seeds.


Future It is likely that transgenic mechanisms (including those generated by genome editing) will increasingly be used to produce maize hybrid seeds to overcome the need (and costs) of detasseling. Primary concerns are (1) environmental stability of male sterility systems, i.e., ability to produce high-purity hybrid seed independent of environmental variation, and (2) regulatory acceptance of using transgenic hybrid seed production systems, which will likely differ substantially among countries.

Hybrid performance—hypotheses and prediction

Hybrid performance and heterosis played an important role in the history of maize breeding. Consequently, long-term research questions relate to the biological underpinnings of heterosis and on developing methods to predict the hybrid performance of various combinations of inbred lines.


Hypotheses regarding mechanisms of heterosis Observations of hybrid vigor in maize stretch back to Darwin who was the first to systematically describe the phenomenon (Darwin 1889). Darwin corresponded with Asa Gray at Harvard throughout his experiments who was a mentor to James Beal (Singleton 1941). Shortly after leaving Harvard for a position at the Michigan Agricultural Research Station, Beal conducted the first experiment in which one variety of maize was detasseled and then pollinated by another (Singleton 1941). Across multiple years of trials, crossed plants were consistently found to out-yield open-pollinated individuals (Singleton 1941). Starting in 1905, George H. Shull began a series of experiments at Cold Spring Harbor Laboratory in which lines of maize were self-pollinated. Inbreeding resulted in a marked decrease in plant vigor, whereas a dramatic rise in vigor was observed, when selfed lines were crossed. These results were published in two seminal papers that laid the groundwork for hybrid maize breeding (Shull 1908, 1909) and were supported by substantial further work on the effects of inbreeding by Edward East (1908).

Almost immediately, those observing the phenomenon of heterosis in maize proposed causal biological mechanisms. Shull (1908) and East (1908) posited superior performance in hybrids was caused by heterozygosity itself, which acted as a physiological stimulus. This explanation of heterosis is known as the overdominance hypothesis. In contrast, the dominance hypothesis, first developed by Davenport (1908) and then clearly articulated by Bruce (1910), attributes heterosis to the masking of effects of deleterious alleles by dominant or partially dominant alleles, with each inbred line providing its own complement of dominant, favorable alleles. While the dominance hypothesis was generally supported during the early portion of the twentieth century, a surge of support for overdominance grew in the 1940s leading up to the first Heterosis Conference at Iowa State University in the summer of 1950 (Gowen 1952). During this conference, it was generally concluded that the dominance hypothesis explains the loss of vigor due to inbreeding and subsequent recovery upon crossing, but is insufficient to explain the marked increase in vigor of hybrids relative to open-pollinated varieties, which could only be explained by overdominance (Crow 1999). Following the Heterosis Conference, the pendulum shifted again toward support for the dominance hypothesis with accumulating evidence for both high mutation rates requiring complementation of deleterious alleles in maize and for improved performance over time of inbred lines per se, presumably due to purging of deleterious alleles (Crow 1998). In addition, examples of a previously proposed mechanism, pseudo-overdominance (Jones 1917), were discovered, in which favorable, dominant alleles found in repulsion phase in a particular linked chromosomal region led to a signal that could be mistaken for overdominance. For instance, Moll and colleagues (1964) found that signatures of statistical overdominance in early cycles of selection quickly disappeared, with later cycles characterized by dominance, presumably due to decreasing linkage. Pseudo-overdominance was clearly illustrated when Stuber and co-authors found that a QTL for heterosis fractionated during fine mapping into two dominant QTL in repulsion phase (Stuber et al. 1992; Graham et al. 1997).

While dominance and overdominance have been the primary explanations for heterosis, epistasis has often been described as an additional mechanism. However, epistasis appears to play a minor role in heterosis in maize (Melchinger et al. 1986; Garcia et al. 2008), but may be more important in self-pollinating, homozygous species such as rice in which dominance is likely less pervasive and epistatic interactions may be more stable (Garcia et al. 2008).

More recently, our understanding of heterosis in maize has been informed by genomic data. For example, with complex, quantitative traits, genomic data have linked heterosis to the combined effects of a number of loci (Stuber et al. 1992; Giraud et al. 2017). Separate traits (e.g., yield, plant height) have demonstrably unique genetic architectures of heterosis in the same hybrid-parent triplet, confirming a multigenic nature of heterosis (Flint-Garcia et al. 2009). Genomic data have also led to an elaboration of existing hypotheses regarding the mechanisms of heterosis. Comparison of sequences in different maize inbreds revealed a surprising level of presence/absence variation (PAV) in which sequences found in one inbred are lacking in another. For example, investigation of the bz locus in maize (Fu and Dooner 2002; Wang and Dooner 2006) revealed that inbred lines share only 50% of their sequence in this chromosomal region. These findings suggest that dominance, previously attributed solely to complementation of slightly deleterious alleles, may also involve complementation of absent sequence. Recently, Baldauf et al. (2018) had demonstrated that many more genes were actively expressed in hybrids than in their inbred parents. In several instances, this was due to absence of the gene in a particular inbred; in many more cases, the gene was present in the inbred but not expressed. Such scenarios could lead to expression complementation. Similarly, recent work has shown that dysregulation of expression (i.e., aberrantly low or high levels of expression of a given gene) can be caused by deleterious alleles (Kremling et al. 2018). At such loci, hybrids may experience mid-parent values of expression within an optimal range resulting in an increase in fitness in hybrids relative to the inbred parents as proposed by Springer and Stupar (2007).

While accumulating evidence in the molecular and genomic era continues to favor dominance as the prevailing mechanism driving heterosis in maize, the phenomenon is observed in highly quantitative traits that interact in complex pathways to produce a given phenotype. In all likelihood, diverse mechanisms including overdominance and epistasis play at least a minor role in heterosis and a single, unifying mechanism cannot be determined (Kaeppler 2012). A further complicating factor has been the observed trend of decreasing percentage of heterosis over time (Fig. 5), with a concomitant increase in the yield performance of hybrids and parental inbred lines (Troyer and Wellin 2009). This finding may be linked to the continued purging of deleterious alleles among inbred lines with each heterotic group but the general separation between heterotic groups.

Fig. 5
figure 5

Adapted from Troyer and Wellin (2009), Crop Science 49:1969–1976

Changes in hybrid yield, inbred yield, heterosis and percent heterosis along year of hybrids. Percent heterosis is calculated as heterosis over hybrid yield.


Development of methods to predict hybrid performance Despite gaps in our understanding of the mechanism of heterosis, substantial progress has been made in predicting hybrid performance. As hybrid breeding programs became established, the number of inbreds within each heterotic group increased dramatically. Soon, it became unfeasible to phenotypically evaluate performance of all single-cross hybrids due to the overwhelming number of pairwise combinations of inbred lines. Unfortunately, the evaluation of inbred lines per se has proved to be an ineffective predictor of hybrid performance due to the prevalence of strong dominance effects (e Gama and Hallauer 1977; Smith 1986). Therefore, the focus in evaluation of hybrid performance has since shifted to model-based prediction using both pedigree and genomic data.

In his pioneering work, Rex Bernardo modified the classical best linear unbiased prediction (BLUP) approach of Henderson (1975), predicting the performance of single-cross hybrids based on both yield data from related single crosses and a relationship matrix derived from molecular marker data from the parental inbreds (Bernardo 1994). This approach is now commonly known as genomic BLUP or GBLUP. In a subsequent study, Bernardo implemented this procedure to predict maize single-cross traits including yield, moisture, stalk lodging and root lodging in a population large enough to approximate a modern commercial breeding program (Bernardo 1996a). While prediction accuracies were high when both parents were tested in single-cross combinations, they dropped considerably when parents were not tested (Bernardo 1996b). Several statistical modifications and improvements upon the basic GBLUP approach have since been made (cf., Zhao et al. 2015; Desta and Ortiz 2014). For example, the ridge-regression BLUP approach (RR-BLUP; Whittaker et al. 2000) can predict the effects of individual markers on hybrid performance and Bayesian methods allow for a range of variances of individual marker effects (Zhao et al. 2015). Such methods have been implemented in maize using a process known as genomic selection (GS), in which high-density marker data are employed without pre-screening in order to determine genotypic values (Piepho 2009). GS has been shown to predict single-cross hybrid performance in maize at high accuracy even in germplasm from the early stages of a hybrid maize breeding program (Kadam et al. 2016). Quite recently, increasing attention has been paid to incorporating data into hybrid performance prediction that reflect intermediary steps between genotype and phenotype such as expression and metabolomic data (Schrag et al. 2018). Finally, genomic selection models have been shown to improve when modified to include annotation of deleterious alleles (Yang et al. 2017a, b).

Breeding project designs

A brief history The first modern maize breeders (George Shull, Edward East, Donald Jones, Henry Wallace, Perry Holden, Raymond Baker and George Sprague) were cognizant of evidence for response to selection provided by William Beal, Charles Darwin, Isaac Hershey, George Krug, Jake Leaming and Robert Reid (Kingsbury 2009) as well as the theoretical basis for response to selection (Fisher 1930). They designed breeding systems based on their research objectives, their understanding of heterosis and constraints imposed by reproductive biology and available resources. Due to its flexible reproductive biology, designs of maize breeding projects were more numerous than those developed for other crops (Comstock et al. 1949; Hull 1945; Jenkins 1940). Let us consider these designs from the perspective of two distinct objectives: to improve average population performance for a particular trait and to develop hybrids for sales to corn farmers.

Academic maize breeders designed recurrent population improvement projects using numerous locally adapted, open-pollinated (Leaming, Midland, Hays Golden, Golden Glow, Krug, Reid, Dawes, Iowa Ideal, Indian Chief, Jarvis, Burr’s White, Lancaster, Kolkmeier) and synthetic populations (BS, BSSS, BSCB, CGSyn, EZS, NDS, VCBS). The purpose of these projects was to evaluate responses to selection in local environments using various types of selection units including: mass selection of individual plants, half-sib family selection, full-sib family selection and self-pollinated family selection. These same selection units also were evaluated for performance in hybrid combinations using reciprocal recurrent selection projects. Depending on the genetic variability in the founder populations, a wide range of responses were observed for all of the methods (Hallauer et al. 2010). Some of the recurrent selection projects were coupled with line development projects (Fig. 6a) that occasionally produced exceptional lines used in production of hybrids broadly grown by farmers: e.g., B13, B37, B73, B84.

Fig. 6
figure 6

Adapted from (Gaynor et al. 2017) (color figure online)

a Depiction of relationship between recurrent population improvement projects and line development projects; b depiction of maize hybrid development as consisting of parallel line development pipelines (red and yellow) within heterotic groups and a hybrid evaluation and commercialization pipeline (green). Lines advanced to late stages with desirable attributes are used in crossing nurseries to recurrently initiate the development of novel replicable lines; c depiction of maize hybrid development pipelines modified to include trait introgression within heterotic groups; d depiction of maize hybrid development pipelines modified to include introgression of non-negotiable traits for hybrid sales and rapid cycling through genomic selection for population improvement.

The basic design of hybrid maize development pipelines (Fig. 6b) was well established by the 1970s. The goals of these projects were to: (1) maximize additive genetic variance and minimize contributions from non-genetic variance through the development of replicable homozygous lines within heterotic groups (Eberhart 1970); (2) evaluate lines per se for parental attributes and in hybrid combinations for agronomic attributes from two or more heterotic groups using replicated small plot field trials; (3) improve breeding lines genetically within each heterotic group by recycling lines with desirable agronomic attributes; (4) identify the best hybrid-environment combinations for selected hybrids using large-scale, on-farm, field trials requiring practical aspects of preparing foundation, registered and certified seed (Fehr 1991).

The predicted response to selection or predicted genetic gain for each cycle of breeding has several forms:\( \Delta G_{c} = \iota \sigma_{p} h^{2} = \iota \sigma_{a} h =\iota \sigma_{a} \rho \). For maize breeders, the predicted genetic gain per year, \( \Delta G_{\text{t}} = \Delta G_{\text{c}} /{\text{years}} \) (Eberhart 1970; Hallauer and Miranda 1981), has been a metric for making decisions about proposed modifications to breeding methods (Fehr 1991). Based solely on the criterion of cycle time, population improvement projects in the 1970s were faster than hybrid development projects. They intermated selected lines every three to 5 years, while line and hybrid development projects intermated selected lines every seven to 10 years. The advantages of the pipelines model are that they provide opportunities for evaluation and selection across years (stages) which enabled greater selection intensities, ι, for multiple traits and they produced greater correlations, ρ, between selection units and response units (Holland et al. 2003). Perhaps more importantly, the pipeline models were designed to develop, evaluate and disseminate hybrids to farmers, i.e., they were economically sustainable. Indeed, many land-grant universities in the USA initially supported line and hybrid development projects, but only a few remain because public maize breeding did not capture the value of their released lines and hybrids in the marketplace. Currently most maize breeding projects in the USA and Europe are supported by revenues from sales of commercial seed. There is significant effort on behalf of nonprofit organizations to implement maize line and hybrid pipelines in an economically sustainable manner for developing countries (Gary Atlin, personal communication).

The basic form of the maize hybrid development pipelines has been sufficiently robust to incorporate a large number of technological innovations such as expanded evaluation phases for germination tests, and disease and insect nurseries to protect genetic gains (Dicke and Guthrie 1988; Smith and White 1988). In the 1990s, the pipelines became longer with the introduction of new pipeline segments to accommodate marker-assisted introgression of transgenic events from poorly adapted, but transformable, lines (Fig. 6c).

Backcross-enabled introgression has been practiced for a long time (Johnson and Eldredge 1953; Wilcox and Cavins 1995), and marker-assisted introgression will likely continue for breeding teams that have not negotiated access to enabling technologies for maize genome editing (Lowe et al. 2016) or lack sufficient resources to address regulatory requirements. Potential target alleles for introgression have been discovered using both forward and reverse genetic approaches in germplasm resources and gene banks (Blumel 2015; Kumar et al. 2010; Leung et al. 2015) and have been catalogued in MaizeGDB (Lawrence et al. 2008). Emerging high-throughput technologies have been proposed to increase the pace of genetic discoveries (Yu et al. 2016), although the “turbocharged” discovery process will require the development of automated curation processes for MaizeGDB to keep pace.

While some technological innovations added components to the pipelines, other technological innovations were adopted to reduce time required to develop new hybrids and coincidently reduce time per breeding cycle (Li et al. 2018). For example, the development of “winter nurseries” in tropical locations and subsequent development of continuous nurseries in tropical and high-latitude locations reduced the time to develop replicable homozygous lines. The time to develop replicable homozygous lines was further reduced with the development of doubled haploid capabilities. Also the number of years required for hybrid field evaluations was reduced through the use of locations at equivalent latitudes in opposite hemispheres (Cooper et al. 2014). Continuous nurseries and continuous field trial evaluations required information technologies, logistical software systems and development of high-throughput seed handling and transfer capabilities to assure timely tracking and delivery of seed (Serhatli et al. 2018). As a consequence, the time required to develop and deliver a new hybrid was reduced from more than a dozen years in the 1970s to about 7 years, while the time required per breeding cycle was reduced from about 10 years to as little as 5 years by the end of the first decade of the twenty-first century. The associated costs for implementing these innovations are not clear, not even for stockholders.

Discoveries of genome organization, maize diversity, genetic signaling networks and metabolic pathways, as well as development of digitized phenomics, precision envirotyping (Xu 2016), genomic selection (Cooper et al. 2014; Meuwissen et al. 2001), genome editing (Shi et al. 2017; Svitashev et al. 2015) and speed breeding (Watson et al. 2017) coupled with eco-physiological crop models (Technow et al. 2015) have potential to modify or completely redesign maize genetic improvement models. A comprehensive descriptive review of how these discoveries and technical innovations could affect the components of \( \Delta G_{t} \) was provided by (Xu et al. 2017). As with pre-molecular breeding, the overriding theme of most is to reduce time per cycle of breeding. With the exception of genomic selection, reductions in time have been associated with increased costs. A new conceptual framework is needed to address the trade-offs between time and cost because corn growers are questioning the value of continuously escalating costs for seed and other inputs (Serhatli et al. 2018).


Exploring possible modifications to maize breeding projects The current ratio of annual genetic gains to seed costs is not sustainable, as evidenced by global consolidation of breeding and ag chemical organizations. As a consequence, maize breeding programs will be expected to modify or redesign their development pipelines to double the rates of genetic gain with fewer resources (lower costs). For the last 20 years, suggested modifications to maize (plant) breeding pipelines have been investigated using simulations rather than descriptive assessments of impacts on the components of \( \Delta G_{t} \). Simulations are used because the functional relationships for annual genetic gains of a single-trait, as well as its multiple-trait version, \( \Delta H_{c} = \iota r_{IH} \sigma_{H} = \sum {a_{i} } \Delta G_{{c_{i} }} \)(Hazel 1943; Smith 1936), depends on simplifying assumptions about the genetic architecture (infinitesimal additive model), genome organization (no linkage), population structure (Hardy–Weinberg equilibrium) and nature of selection intensity (single-stage truncation).

Since none of the assumptions underlying the breeder’s equation are correct for any specific plant breeding project, computer simulations using models for genetic architectures, genetic linkage, population structures, multistage selection and assortative mating were developed (Cooper and Podlich 2002; Fraser and Burnell 1970; Peccoud et al. 2004; Tinker and Mather 1993) to explore specific impacts of modifications to breeding pipelines (Cress 1967; Li et al. 2012a; Mi et al. 2014; Podlich and Cooper 1998; St Martin and Skavaril 1984; Sun et al. 2011). The initial simulation studies focused on discoveries about nonadditive genetic architectures and dependencies on environmental signals (Cooper et al. 2002; Cooper and Podlich 2002; Peccoud et al. 2004) or resource allocations in terms of numbers of plots and environments (Longin et al. 2007; Wang et al. 2004, 2007). Also, simulation experiments have been reported for marker-assisted backcross introgression (Cameron et al. 2017; Chevalet and Mulsant 1992; Frisch et al. 1999; Herzog and Frisch 2011; Herzog et al. 2014; Hillel et al. 1990; Hospital 2001; Hospital and Charcosset 1997; Peng et al. 2014a, b; Visscher et al. 1996), accuracy of genomic prediction methods (Daetwyler et al. 2010; Heslot et al. 2012; Howard et al. 2014), genomic selection across stages within line and hybrid development pipelines (Longin et al. 2007, 2015; Marulanda et al. 2016; Mi et al. 2014) and genomic selection across stages and cycles (Bernardo and Yu 2007; Gaynor et al. 2017; Gorjanc et al. 2018; Heffner et al. 2009; Jannink 2010).

Most simulation studies can be characterized as exploratory in silico experiments. They not only allow the investigators to accommodate deviations to the assumptions underlying the theoretical breeder’s equation, but also allow the investigators to explore replicated combinations of factors and modifications to pipelines that could affect outcomes (Gaynor et al. 2017; Podlich and Cooper 1998; Sun et al. 2011). A clear advantage of this approach is that sample sizes, numbers of replications, numbers of factors and modifications can be very large, and the time required to conduct an in silico experiment is very small, thus allowing the investigators to screen a large number of possible designs before investing in possible expensive and time-consuming product development projects.

As with all experimental approaches, simulation experiments are subject to an investigator bias that constrains the consideration of possible factors and modifications to development projects. Nonetheless, an innovative design has emerged for integrated recurrent genomic selection and conventional line development projects (Gaynor et al. 2017). The innovation (Fig. 6d) is referred to as a two-part program consisting of a conventional product development component that develops and screens inbred lines using established pipelines and a genetic improvement component enabled by rapid cycling with recurrent genomic selection. Models used in the population improvement part are created using training sets consisting of phenotypic and genotypic data from established lines. Furthermore, drift between the rapid cycling genetic improvement part and the training set from development pipelines as well as the loss of genetic diversity through selection can be reduced by selecting crosses that balance the trade-offs between maintaining genetic diversity and genetic gain (Gorjanc et al. 2018).

Future maize breeding teams, however, should be careful to avoid confusing outcomes from exploratory in silico experiments with designing projects for specific breeding objectives. Exploratory in silico investigations include discovery objectives. In contrast, a systematically designed project will explicitly state objectives in terms that can be optimized.


An engineering approach to design optimal maize breeding projects Merriam-Webster (https://www.merriam-webster.com/) considers optimization to be the process of designing a system to be the most effective and efficient possible, as determined by mathematical models and computational procedures. Operations research (OR) is a subdiscipline of applied mathematics devoted to the study of optimization of complex systems. OR was created to provide quantitative risk assessments of proposed military activities for uncertain and dynamic conditions in WWII and has since been used to design optimal manufacturing, transportation, energy and communications systems and networks. Some of the first civilian applications were in agriculture (Boles 1955; Heady 1954; Heady and Pesek 1954; Rendel and Robertson 1950; Robertson 1957), but with one exception (Johnson et al. 1988) OR was ignored for designing plant breeding systems until about 10 years ago (Akdemir et al. 2018; Akdemir and Sanchez 2016; Byrum et al. 2016, 2017; Cameron et al. 2017; Canzar and El-Kebir 2011; De Beukelaer et al. 2015; Han et al. 2017; Xu et al. 2011).

OR approaches are based on systematic development of mathematical programming (MP) models. MP models are comprised of: clear measurable objectives that are translated into objective functions consisting of decision variables and constraints (Winston et al. 2003). Often objectives are not complementary, and the optimization problem consists of multiple competing objectives such as minimizing time while assuring the probability of success is > 0.95 for an enforced budget constraint. In a MP model, the components of annual genetic gain, such as described in Table 1 of (Xu et al. 2017), would be considered decision variables, while the “subcomponents” and “contributors” would be considered constraints. Decision variables are parameters of the objective functions that can be controlled, e.g., numbers of years to complete a cycle, number of progenies to grow per evaluation stage, number of markers or phenotypes to assay and possible selection intensities. Constraints are limitations on the decision variables, e.g., budget restrictions, time required for stages of phenology, reproductive biology, assay deadlines, size and number of field plots.

After translating the objectives into mathematical functions, and the decision variables and constraints into a MP model, the model may be recognized as belonging to a family of models that have been previously solved analytically. For example, an optimization model to maximize annual genetic gains is likely to be of the same form as nonlinear programming models (Winston et al. 2003). If it is, then there exists a large library of algorithms to solve nonlinear programming models some of which use KKT conditions (Karush 1939; Kuhn and Tucker 1951) providing a set of feasible solutions that have been proved to be among the best possible. Most of these algorithms have been implemented in solver software packages such as GUSEK, R, MATLAB and even EXCEL.

The allure of analytic solutions supported by mathematical proofs to find a best possible breeding design should be tempered by recognizing that optimization MP models for plant breeding projects will seldom consist of a single objective, nor will there be known functional relationships among constraints because stochastic processes rather than mechanistic functions generate envirotypes, annual budgets and genotype to phenotype relationships. The relationships among these and their impact on meeting the objectives will need to be investigated with simulations similar to those used for exploratory in silico experiments.


Examples of optimal designs for plant breeding projects As previously noted, there are many published exploratory simulation experiments for backcross introgression of a single allele from homozygous donors to homozygous recipients (Frisch et al. 1999; Herzog and Frisch 2011; Herzog et al. 2014; Hospital 2001; Hospital and Charcosset 1997; Hospital et al. 1992; Peng et al. 2014a, b). From among these publications (Peng et al. 2014a, b) investigated the largest number of backcross breeding strategies, where each strategy represented a combination of backcrossing generations, number of progeny per generation and numbers and genomic distributions of markers assayed per backcross generation. They identified one strategy that was better than all others with respect to recovering the introgressed allele, the average recovery of the recipient genome among four selected progenies, and minimal number of backcross generations, minimal number of progeny and minimal marker costs. Rather than exploring a larger number of possible backcrossing strategies, Cameron et al. (2017) formulated an optimization model in which the objective was to maximize the probability of success, where success was clearly defined for a genome with the desirable allele in a homozygous condition and no more than the average donor genome described for the best strategy identified by Peng et al. (2014a, b). The solution to the optimization model was a backcross design that doubled the probability of success for about the same cost and time as that of the best model found by Peng et al. (2014a, b). Subsequently, the optimization model was modified by removing the constraint of backcrossing selected progeny to the recipient line every generation. The best solution to the modified model increased the probability of success by another 20% and reduced costs and time. In other words, backcrossing is not always the best action to pursue in an introgression project (to be published at a later date). This also illustrates that changing constraints and/or decision variables of the optimization model will likely produce a different (often better) design to meet the objectives.

Introgression of a single desirable allele inevitably produces the challenge to design projects that combine several desirable alleles from multiple donors into a single recipient line, a.k.a. gene stacking. Initially possible gene stacking designs were suggested based on knowledge of inheritance in the context of traditional breeding designs (Ishii and Yonezawa 2007; Peng et al. 2014a, b; Servin et al. 2004; Ye and Smith 2008). Because gene stacking is similar to assembling products in a manufacturing system, it has been amenable to mathematical programming approaches (Canzar and El-Kebir 2011; De Beukelaer et al. 2015; Xu et al. 2011). Since every gene stacking problem is unique, there will be no single best solution; rather, the solution will need to be found for each set of alleles and the distributions of their specific genomic locations as well as the distributions of the marker loci. Nonetheless, an OR approach should be capable of developing a MP model that can be solved.

After a set of desirable alleles has been stacked into a single line, there will be a need for an optimal design to transfer these to other lines in a product development project. This type of objective might also arise when attempting to transfer multiple desirable alleles from one germplasm group to another. For example, consider the evaluation of tropical lines in high latitudes as a new source for useful genetic variability. Recall that most tropical lines are photoperiod-sensitive. If photoperiod-insensitive alleles from lines adapted to high latitudes were transferred to the tropical lines, the genetic and geographic barriers for genetic improvement would be lowered as it has been in Sorghum (Klein et al. 2008, 2016). There are between a dozen and two dozen loci (Buckler et al. 2009; Romero Navarro et al. 2017) that will need to be introgressed from high-latitude maize lines to produce photoperiod-insensitive tropical introgression lines. Genomic selection is a reasonable approach for this type of challenge (Bernardo 2009). By framing multiallelic introgression as an optimization model, (Han et al. 2017) realized the need for a metric other than genomic estimated breeding values (GEBVs) that would assign a specific combining ability, known as the predicted cross value (PCV), to all possible crosses among the sample of progeny created each generation. In contrast, the estimated GEBV is analogous to general combining ability. A comparison of the PCV with GEBVs and optimized haploid values (OHVs) revealed that selecting specific crosses based on the PCV, rather than random crosses among truncation selected individuals with high GEBVs and OHVs, produced the desired introgression lines in less time (Han et al. 2017). Optimal designs for the use of the PCV are still being developed for introgression projects (submitted). Also MP models have been used to increase the efficiencies of genomic selection for individual and multiple traits by enabling selection of individual crosses instead of truncation selection based on GEBVs (Akdemir et al. 2018; Akdemir and Sanchez 2016).

The power of genomic selection to increase genetic gains through greater selection intensities and shorter time intervals, like all forms of intense selection, could exhaust useful genetic variability and prevent plant breeding populations from reaching their full genetic potential (Bulmer 1971; Hill and Robertson 1968; Jannink 2010; Robertson 1960). Animal breeders have been using MP to address the challenge of minimizing the trade-offs between genetic gain and loss of genetic diversity in breeding populations for at least 20 years (Eynard et al. 2018; Fernandez and Toro 1999; Howard et al. 2017; Kinghorn 1998; Pong-Wong and Woolliams 2007; Woolliams et al. 2015). To our knowledge, MP models still need to be developed to address this challenge in designing maize (plant) genetic improvement and hybrid development pipelines.

Perhaps the most impactful application of OR to plant breeding began in 2009 when Syngenta teamed with Kromite, an OR and decision analytics company, to evaluate the efficiency of breeding systems inherited from the merger of three companies (Byrum et al. 2016). They first recognized that genetic improvement, trait introgression, variety development and variety placement were distinct projects within their breeding programs. Next, they disentangled the variety development pipelines from genetic improvement, trait introgression and product placement projects. They also had distinct variety development pipelines for each maturity group. Overall they identified about 250 decision points per year for variety development. If the decisions were independent and binary (they are not), there would be at least 2250 possible outcomes from the existing pipelines. Next, the evaluation of possible outcomes required critical thinking about appropriate metrics to quantify impacts on meeting their breeding objectives. Based on the development of novel metrics (Byrum et al. 2017) and a comprehensive exploration of the modification space for each variety development project, they implemented modified variety development pipelines, resulting in over $287 M US cost savings during the period 2010 to 2015 and awarding of the 2015 Edelman prize (https://www.informs.org/About-INFORMS/News-Room/Press-Releases/Syngenta-Wins-2015-INFORMS-Edelman-Prize). To our knowledge this approach has not been applied to maize hybrid development pipelines, but given the pressure to reduce costs, it is likely that several commercial maize breeding companies are pursuing similar approaches to quickly become more efficient.


Challenges of the OR approach for maize (plant) breeders The most difficult aspect of executing the OR approach is to clearly define the objectives so that they can be translated into MP models. This is nothing new because the most difficult aspect of maize breeding has always been to clearly define objectives in terms of measureable metrics.

With the development of large databases containing historical field trials, historical weather records and the ability to merge data from these databases, it has been possible to develop envirotypes that are associated with stable and plastic responses (Li et al. 2018; Xu 2016). Linking envirotyping with crop growth models, Cooper et al. (2014) demonstrated that measurable breeding objectives for targeted environments can be clearly articulated, and more importantly, by enlisting precision digital phenotyping and genomic selection the objectives were met and a drought-tolerant hybrid was delivered to the marketplace. Interestingly, the same objectives were met using genome editing by a different research group at the same company (Shi et al. 2017). While both genomic selection and genome editing effectively developed hybrids that were drought-tolerant, it is not clear which approach is more efficient. Also, given the relatively small number of genome edits required to create drought resistant hybrids, there may be genetic introgression approaches that are more efficient.

The question of which approach is most efficient will depend, in part, on whether the breeding team has access to enabling technologies for genome editing and sufficient resources to address regulatory requirements, but more importantly what metrics will represent a successful outcome? How much time and capital will optimally achieve the breeding objectives? These questions frame the optimization challenge in terms of cost, time and probability of success (CTP). With competing objectives in the CTP framework, there will not be a single best solution. Instead, there will be a set of optimal solutions, a.k.a., Pareto frontiers that help decision makers to quantify trade-offs in which it is not possible to improve one objective without degrading others.

While it is tempting to frame optimization models using a CTP recipe, defining success needs to be based on predicted benefits. Success framed in terms of forecast benefits will enable quantification of the trade-off between time and cost. For example, more expensive, shorter times to complete a project could bring greater benefits by being first to market to offset increased costs. Unfortunately, plant breeders and agronomists, in general, have little experience with developing models to forecast benefits, especially in terms of net present value. Forecasting benefits, like predicting phenotypes, is based on uncertain outcomes from stochastic processes. Fortunately, OR has been successfully integrating forecasts and risk assessments for optimal outcomes involving uncertainty since its creation in WWII (Birge and Louveaux 2011). We suggest that it is time for maize (plant) breeding teams to return to their original role as designers of systems for genetic improvement, line and hybrid development, introgression and product placement by learning OR approaches and collaborating with experts in stochastic programming.

Synopsis

Maize and generally plant breeding is gradually transitioning from a black box approach, largely agnostic of genes and alleles affecting trait variation, to a discipline, where decisions are based on a combination of deep understanding of which combinations of genes and respective alleles lead to improved breeding populations and cultivars and massive testing of selected candidates based on prior performance and predictions. Currently, genomic selection is an extension of traditional selection methods, where large numbers of genotypes are evaluated first at DNA and a selected fraction at the more expensive agronomic levels, to ultimately identify a limited number of superior experimental variety candidates. With a more complete understanding of which gene and respective allele combinations would result in the optimal genotype for a given environment, more targeted approaches are expected to emerge, which will enable their design. Hypothetically, such designer genotypes could be obtained using traditional recombinant technology (by crossing carefully selected founder genotypes), or by editing multiple genes. However, there remain challenges in obtaining a more complete understanding of trait variation, including the classical challenges of small genetic effects of QTL, difficulty in predictions in the presence of complex G x G and G x E interactions. Nonetheless, the breeder toolbox is becoming populated with modernized tools. The question is no longer, whether a tool or approach is available, but which of the increasing number of options should be chosen, given limited financial resources, to maximize both genetic gain and economic return. Availability of genome editing as a powerful tool for plant breeders will be substantially affected by its regulatory framework. Even if liberal in some countries, its use may be limited by more restrictive regulations in other countries. Changing climates may drive the need to use broader genetic resources, which in the longer run may help not only to close the gap between actual and potential yields, but also to raise the bar for potential yields.

Author contribution statement

All authors (CA, WDB, MH, SS, WS, KW, MW, JY, TL) contributed equally to design, writing, and revision of this review article. TL served as corresponding author to assemble drafts provided by individual co-authors, to submit files to TAG, and to finalize the submission process.