Introduction

Waxy maize, i.e., sticky maize (Zea mays L. sinensis Kulesh), was originally found in China, where it was produced before 1760 (Zeng 1987). Collins (1909) was the first to describe the characteristics of waxy maize. The endosperm of waxy maize is dark, smooth, and waxy, hence the name ‘waxy endosperm’, and contains 95–100 % amylopectin (Andrés and Bascialli 1941; Sprague et al. 1943); this trait derives from an endosperm mutation caused by genetic drift (Sprague 1939; Stonor and Anderson 1949; Mangelsdorf 1924, 1974; Andrés and Bascialli 1941; Breggar 1928). The single recessive mutation wx is responsible for the formation of waxy endosperm in maize grains, while the dominant allele Wx controls the formation of normal endosperm. The Wx locus is located on the short arm of chromosome 9 (Collins 1909; Kempton 1919). The Wx gene encodes an NDP-glucose-starch glucosyltransferase, which is responsible for the biosynthesis of amylose (Nelson 1962; Echt and Schwartz 1981). In the recessive wxwx genotype, the NDP-glucose-starch glucosyltransferase bound to the starch granules has very low activity; therefore, the endosperm has a very low amylose content (Nelson 1962). The Wx gene has been cloned and sequenced (Klösgen et al. 1986): it contains 3,718 base pairs and comprises 14 exons and 13 introns. This has enabled the study and use of the Wx gene, for example in developing markers internal to the gene for marker-assisted selection (MAS). The MaizeGDB website has published three simple sequence repeat (SSR) markers for detecting the Wx gene: phi022, phi027 and phi061.

Because of its high amylopectin content and stickiness, sticky maize has been widely used in food, animal feed and food processing. However, sticky maize has a low lysine content. Yang et al. (2005) developed a novel high-lysine mutant with opaque endosperm, named opaque-16 (o16). The mutant line had a lysine content of 0.36 %. The o16 gene is located on the long arm of chromosome 8, within 5 cM of the molecular marker umc1141. The effect of pyramiding the o16 gene and the opaque-2 (o2) gene has also been studied, and the results showed that the pyramid lines generally had higher grain lysine content than their parents. Zhang et al. (2010) introduced the o16 gene into an o2 line by backcrossing and obtained lines with higher lysine content.

Therefore, it is possible to develop maize lines of superior quality in terms of food, animal feed, processing and nutritional value through pyramiding the two genes o16 and wx. However, this objective would be difficult to achieve through conventional breeding because: (1) the grain phenotypes of the two mutants and their recombinant lines are similar and hard to distinguish with the naked eye; (2) both mutations are recessive, and require multiple generations of backcrossing and selfing; and (3) the phenotype of the endosperm can be affected by heterofertilization (Yang et al. 2008). MAS can be used to pyramid the two genes accurately and quickly because it allows detection of the target genes at the DNA level, reduces the number of generations required for backcrossing and selfing, and eliminates the effect of heterofertilization.

In this study, o16 and wx lines were used as parents to construct single-cross and backcross populations. MAS and biochemical quality analysis were combined to pyramid the o16 and wx genes and obtain novel breeding materials with high lysine and amylopectin contents. These breeding materials will be useful for maize quality improvement.

Materials and methods

Population construction

QCL3024 was crossed with QCL5019 to develop a single-cross population and with QCL5008 to develop a backcross population. The aim was to obtain materials combining the o16 and wx genes in different genetic backgrounds, and to compare MAS under the single-cross and backcross schemes.

Parents

QCL3024 is an inbred line containing the o16 gene. Its grain lysine content is 0.353 % and its amylopectin content is 60.71 %. QCL5019 and QCL5008 are two inbred lines containing the wx gene, with grain lysine contents of 0.294 and 0.287 %, respectively, and amylopectin contents of 97.89 and 97.61 %. The methods used to determine their lysine and amylopectin contents are described below in the section “Quality analysis of seeds”.

Cross population

QCL3024 and QCL5019 were crossed to produce F1 hybrids, which were selfed to obtain F2 seeds. F2 plants homozygous recessive at both the o16 and wx loci were selected using MAS and selfed to obtain F3 family seeds, some of which were planted and the rest used for quality analysis. After quality analysis and molecular marker screening, F3 lines of superior quality were selected and selfed to obtain F4 seeds, on which quality analysis was performed.

Backcross population

QCL5008 and QCL3024 were crossed to produce F1 plants. BC1F1 and BC2F1 were obtained using QCL5008 as the recurrent parent because it has better drought resistance than QCL3024. BC2F2 was obtained by selfing BC2F1. In each of these three generations, individual plants carrying the target genes were first selected from the progeny and then screened for their genetic background. After quality analysis of the seeds from the selected plants, plants that were homozygous recessive at the target gene loci, carried more of the recurrent parent’s genetic background, and had higher lysine and amylopectin contents were selfed to obtain BC2F3 family seeds. After quality analysis and molecular marker screening, BC2F3 lines of superior quality were selfed to obtain BC2F4 seeds, on which quality analysis was again performed.

DNA isolation, PCR amplification and electrophoresis

Genomic DNA was extracted from young seedlings using the CTAB method (Yang 2005). PCR amplification and electrophoresis of the PCR products were performed according to Yang et al. (2004, 2005). PCR was carried out using a 2720 Thermal Cycler (Applied Biosystems, Foster City, CA, USA) and a DNA Engine Peltier Thermal Cycler (Bio-Rad, Hercules, CA, USA). The amplification products were separated on a Sequi-Gen® GT (Bio-Rad) electrophoresis system.

Marker-assisted selection

Foreground selection

Foreground selection refers to selection for the target genes o16 and wx. The SSR markers umc1141 and umc1121, which are linked to the target gene o16, were used to select the o16 gene (Yang et al. 2005). The SSR markers phi022, phi027 and phi061 were used to screen for the wx gene. The marker phi022 is located in the GC stretch upstream of the coding sequence of wx; phi027 is located between direct repeat 1 and the GC stretch upstream of the coding sequence; and phi061 is located in exon 12 and intron 12 of the coding sequence (Klösgen et al. 1986).

Background selection

Ninety-six SSR markers covering the whole genome were used in polymorphism screening to select polymorphic markers between the two parents of the backcross. The selected markers were used for background selection of individuals in the BC1F1, BC2F1 and subsequent generations.

The primer sequences of the SSR markers used in the target gene and background selections were obtained from www.maizegdb.org and synthesized by the Shanghai Generay Biotech Company.

Quality analysis of seeds

The lysine content of seeds was determined using the Acid Orange-12 dye-binding lysine (DBL) method (Yang et al. 2004). The amylopectin content of seeds was analyzed qualitatively in the BC1F1, BC2F1, BC2F2 and BC2F3 generations using the I₂–KI staining procedure and measured quantitatively in the F4 and BC2F4 generations using double-wavelength spectrophotometry (DWLS) (He 1981). Each biological sample was measured twice, and the mean was used for analysis.

Statistical analysis

The χ² test was used to analyze the segregation of DNA markers in segregating generations, and the t test was used to test the significance of differences in lysine and amylopectin contents. These analyses were performed using SPSS 11.5 software.
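The text does not state which t-test variant was run in SPSS. As a minimal sketch, assuming a Student's independent-samples t test with pooled variance (one common choice), the statistic can be computed with the Python standard library. The function name and the replicate values below are hypothetical and are not data from this study.

```python
import math
import statistics
from typing import Sequence, Tuple


def pooled_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). |t| is then compared with the
    two-sided critical value for that df (e.g. 2.776 for df = 4, alpha = 0.05).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance of the two groups
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, df


# Hypothetical lysine contents (%) of a pyramid line and a waxy parent
t, df = pooled_t_test([0.35, 0.36, 0.34], [0.29, 0.30, 0.28])
print(f"t = {t:.3f}, df = {df}")
```

A Welch test would instead use each group's own variance and adjusted degrees of freedom; which form corresponds to the reported t values cannot be determined from the text.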

In the backcross population, the SSR marker alleles were classified into four classes, A, B, H and U, where A represents alleles identical to the recurrent parent, B alleles identical to the donor parent, H heterozygous alleles, and U unidentified alleles. Based on these marker classifications, the restoration rate of the recurrent parent genetic background in progenies selected for the desirable target genes was calculated as G(g) = [L + X(g)]/(2L), where G(g) is the restoration rate of the recurrent parent genetic background in backcross generation g, X(g) is the number of alleles identical to the recurrent parent, and L is the total number of molecular markers in the analysis. The theoretical restoration rate of the recurrent parent genetic background was calculated as E[G(g)] = 1 − (1/2)^{g+1}, where g is the number of backcross generations (Xia and Zheng 2002; Hospital et al. 1992).
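The classification and the two formulas above can be sketched as follows. This is an illustration under stated assumptions, not the authors' software: each SSR call is represented by a hypothetical tuple of two allele labels (None for an unreadable lane), the function names are hypothetical, and, following the text, X(g) is taken as the number of markers in class A.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# One SSR call = the two allele labels scored at a marker; None = no readable band.
Call = Optional[Tuple[str, str]]


def classify(call: Call, recurrent: str, donor: str) -> str:
    """Assign a call to class A (recurrent), B (donor), H (heterozygous) or U (unidentified).

    Assumes the marker is polymorphic between the parents (recurrent != donor),
    as only polymorphic markers were used for background selection.
    """
    if call is None:
        return "U"
    alleles = set(call)
    if alleles == {recurrent}:
        return "A"
    if alleles == {donor}:
        return "B"
    if alleles == {recurrent, donor}:
        return "H"
    return "U"  # bands that match neither parent


def restoration_rate(classes: Sequence[str]) -> float:
    """G(g) = [L + X(g)] / (2L), with X(g) the number of class-A markers."""
    L = len(classes)
    X = Counter(classes)["A"]
    return (L + X) / (2 * L)


def expected_restoration(g: int) -> float:
    """Theoretical E[G(g)] = 1 - (1/2)^(g+1) after g backcross generations."""
    return 1 - 0.5 ** (g + 1)


# Hypothetical allele labels at four markers (recurrent "a", donor "b")
calls: List[Call] = [("a", "a"), ("a", "b"), None, ("b", "b")]
classes = [classify(c, recurrent="a", donor="b") for c in calls]
print(classes, restoration_rate(classes))                # ['A', 'H', 'U', 'B'] 0.625
print(expected_restoration(1), expected_restoration(2))  # 0.75 0.875
```

For g = 1 and g = 2 the theoretical formula gives 75 % and 87.5 %, the reference values against which the BC1F1 and BC2F1 averages are compared in the Results.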

Results

Analysis of SSR marker polymorphism

Our preliminary experiments indicated that the linkage marker umc1141 of the o16 gene and the internal markers phi027 and phi061 of the wx gene showed polymorphism among the parents QCL3024, QCL5019 and QCL5008, and that they were co-dominant. Therefore, umc1141 and phi027 were chosen for foreground selection of the o16 and wx loci, respectively.

Ninety-six SSR markers covering the whole genome of maize were used in this study. Sixty of them showed polymorphism between the two parents of the backcross population and were used for background selection (Supplementary Table S1).

Foreground selection for target genes in various cross and backcross generations

The two markers umc1141 and phi027 were used for foreground selection in different generations of the single-cross and backcross populations. To present the foreground selection results clearly and show whether either locus exhibits distorted segregation, Table 1 lists the results locus by locus. χ² tests revealed no significant deviation between the observed and expected numbers of individual plants at either marker in any generation. Therefore, segregation at the two marker loci was not distorted.
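A segregation check of this kind can be sketched as a χ² goodness-of-fit test against the Mendelian ratio expected for a co-dominant marker (1:2:1 in an F2 or a selfed backcross generation, 1:1 in a BC1F1 or BC2F1). Table 1 is not reproduced here, so the counts below are hypothetical; the critical values are standard upper 5 % points of the χ² distribution, and the function names are illustrative.

```python
from typing import Dict, Sequence

# Upper 5 % points of the chi-square distribution (standard table values).
CHI2_CRITICAL_05: Dict[int, float] = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square_statistic(observed: Sequence[int], ratio: Sequence[float]) -> float:
    """Goodness-of-fit statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(observed: Sequence[int], ratio: Sequence[float]) -> bool:
    """True if segregation deviates significantly (alpha = 0.05) from the ratio."""
    df = len(ratio) - 1
    return chi_square_statistic(observed, ratio) > CHI2_CRITICAL_05[df]


# Hypothetical counts for a co-dominant marker: F2 expects 1:2:1, a backcross 1:1
print(chi_square_statistic([22, 52, 26], [1, 2, 1]), is_distorted([22, 52, 26], [1, 2, 1]))
print(chi_square_statistic([45, 55], [1, 1]), is_distorted([45, 55], [1, 1]))
```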

Table 1 Results of foreground selection for O16 and Wx loci in the single-cross F2 population and each backcross generation population

Theoretically, one double recessive plant is expected among every 17 F2 plants (1/[(1/16) × 97 %] ≈ 16.5), where the factor of 97 % allows for the 3 % recombination expected between umc1141 and o16, which lie 3 cM apart (Yang et al. 2005). However, the population had to be enlarged to allow for germination failure, failed PCR amplification or electrophoretic detection, disease and insect damage, pollination failure, etc. Hence, 355 plants were raised in the F2 generation. The o16 and wx loci were successfully genotyped in 345 and 339 of these, respectively; the remaining plants could not be scored because PCR or electrophoresis failed. Seventeen double recessive plants were eventually identified, which agreed with the theoretical value (71 × ¼ × 97 % = 17.2) (Table 1). These seventeen target plants were grown and selfed to obtain F3 seeds for quality analysis.
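The two F2 calculations above can be sketched as follows, treating the 3 cM distance as a 3 % chance that marker-based selection is wrong, as the text does. The function names are hypothetical, and n = 71 is simply the plant number used in the text's calculation.

```python
import math


def plants_per_target(genotype_freq: float, p_marker_correct: float) -> int:
    """Smallest whole number of plants expected to contain one correct target plant."""
    return math.ceil(1 / (genotype_freq * p_marker_correct))


def expected_targets(n_plants: int, genotype_freq: float, p_marker_correct: float) -> float:
    """Expected number of correctly selected target plants among n_plants."""
    return n_plants * genotype_freq * p_marker_correct


# 3 cM between umc1141 and o16 -> 97 % of marker-selected plants assumed correct
P_CORRECT = 0.97
print(plants_per_target(1 / 16, P_CORRECT))              # 17
print(round(expected_targets(71, 1 / 4, P_CORRECT), 1))  # 17.2
```

Rounding up in plants_per_target reflects that a population size must be a whole number of plants; the enlargement to 355 plants described above is a separate, practical allowance for losses.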

One hundred and seventy-seven, 165 and 355 plants were grown in the BC1F1, BC2F1 and BC2F2 populations, respectively. Of these, 18 O16o16wxwx plants, 35 O16o16wxwx plants and 11 o16o16wxwx plants were selected from the BC1F1, BC2F1 and BC2F2 generations, respectively. These numbers were lower than the theoretical values of 32 (66 × ½ × 97 %), 78 (81 × 97 %) and 16 (66 × ¼ × 97 %) because plants with indistinct band patterns at the two loci were discarded during electrophoretic scoring (Table 1). Eight, 15 and nine plants were eventually harvested in the BC1F1, BC2F1 and BC2F2 generations, respectively; the remaining individuals were lost to pollination failure and lodging. The nine double recessive BC2F2 plants were selfed to obtain BC2F3 seeds for quality assay.
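The theoretical values quoted for the backcross generations follow the same rule of plant number × genotype frequency × 97 %. The text does not say what the plant numbers 66 and 81 refer to, so they are used as given; rounding down to whole plants is an inference from 81 × 97 % = 78.57 being reported as 78. The names below are illustrative.

```python
import math

P_CORRECT = 0.97  # same 97 % marker-accuracy factor as in the F2 calculation

# (plant number used in the text, expected frequency of the selected genotype)
CASES = {
    "BC1F1 (O16o16wxwx)": (66, 1 / 2),
    "BC2F1 (O16o16wxwx)": (81, 1.0),
    "BC2F2 (o16o16wxwx)": (66, 1 / 4),
}


def theoretical_counts(cases=CASES, p_correct=P_CORRECT):
    """Expected selected plants per generation, rounded down to whole plants."""
    return {gen: math.floor(n * freq * p_correct) for gen, (n, freq) in cases.items()}


for generation, count in theoretical_counts().items():
    print(generation, count)
```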

In the F3 and BC2F3 generations, the four F3 and three BC2F3 families selected after quality assay were planted and checked with the markers to confirm that they had the o16o16wxwx genotype (Supplementary Fig. S1 A, B).

Background selection in backcross progenies

In this study, only foreground selection was performed on the single-cross F2 population. The genetic background of the 17 selected o16o16wxwx plants was therefore not directed toward either parental genome by molecular markers, and recombination between the two parental genomes was left unrestricted.

In BC1F1, the background restoration rate of the eight harvested plants ranged from 76.83 to 84.14 %, averaging 79.88 %, which was 4.88 percentage points higher than the theoretical value of 75 %. One plant with a background restoration rate of 84.14 % was used for subsequent backcrossing.

In BC2F1, the background restoration rate of the 15 harvested plants ranged from 78.78 to 95.45 %, averaging 87.27 %, which was 0.23 percentage points lower than the theoretical value of 87.5 %. Two plants with a background restoration rate of 95.45 % were used for subsequent selfing.

In the nine harvested individuals of the BC2F2 backcross population, the restoration rate of the genetic background was 90.74–96.30 %, the proportion of the recurrent parent genotype was 81.48–92.59 %, the proportion of the donor parent genotype was 0–3.7 %, the proportion of heterozygous genotype was 0 %, and the proportion of unidentified genotype was 7.41–18.52 % (Table 2). Of these nine plants, plant nos. 3, 4, 5, 7 and 8 had the lowest proportion of donor genotype, and plant no. 5 had the highest restoration rate of the recurrent parent background.

Table 2 Restoration ratio of genetic background and proportions of the recurrent parent, donor parent, heterozygous and unidentified genotypes of the nine selected individuals in the backcross BC2F2 population

For the nine harvested plants, the genetic background restoration rate on chromosome 9 ranged from 85.1 to 100 %, and no heterozygous genotypes were found. Plant no. 8 was the only individual with an unidentified allele, at umc1698. This indicates that chromosome 9 in these nine plants was largely restored to the genetic background of the recurrent wx parent.

Based on the background analyses of the whole genome and chromosome 9, plant no. 5 had the highest genetic background restoration rate.

Analysis of lysine content

The lysine content of seeds from the 17 F3 families of the single-cross population was 0.227–0.372 %; this variation was probably due to recombination between the o16 gene and the umc1141 marker locus and to the effect of amino acid modifiers. Four families with higher lysine content were selected and selfed to obtain F4 seeds. The lysine content of the four F4 families was 0.342–0.373 %, averaging 0.355 %. This was not significantly different from that of QCL3024 (t = 0.332, P > 0.05) but was highly significantly different from that of QCL5019 (t = 9.041, P < 0.01) (Table 3).

Table 3 Lysine content of four o16o16wxwx family seeds in the single-cross F4 generation and three o16o16wxwx family seeds in the backcross BC2F4 generation

The lysine content of seeds from the nine BC2F3 families of the backcross population was 0.290–0.367 %, probably for the same reasons. Three lines with higher restoration rates and lysine contents were selected and selfed to obtain BC2F4 seeds. The lysine content of the three BC2F4 families was 0.339–0.367 %, averaging 0.352 %. This was not significantly different from that of QCL3024 (t = –0.123, P > 0.05) but was significantly different from that of QCL5008 (t = 7.981, 0.01 < P < 0.05) (Table 3).

Owing to the pyramiding of the o16 and wx genes, the lysine content of the pyramid lines was 16–28 % higher than that of the waxy parents QCL5019 and QCL5008 (Table 3). The increase was especially large in lines F4-4 and BC2F4-1.

Analysis of amylopectin content

After I₂–KI staining, the endosperms of F3, BC2F1, BC2F2 and BC2F3 seeds turned purplish-red, whereas the endosperm of the non-waxy parent QCL3024 turned dark blue. This indicated that these materials had a high amylopectin content, although the content was not measured quantitatively in these generations.

The amylopectin content of seeds from the four F4 families of the single-cross population, measured by DWLS, averaged 97.89 %. This was not significantly different from that of QCL5019 (t = –0.040, P > 0.05) but was highly significantly different from that of QCL3024 (t = 297.242, P < 0.01) (Table 4).

Table 4 Amylopectin content of four o16o16wxwx family seeds in the single-cross F4 generation and three o16o16wxwx family seeds in the backcross BC2F4 generation

The average amylopectin content of seeds from the three BC2F4 families of the backcross population was 98.15 %, which was not significantly different from that of QCL5008 (t = 1.279, P > 0.05) but was highly significantly different from that of QCL3024 (t = 88.138, P < 0.01) (Table 4).

Owing to the pyramiding of the o16 and wx genes, the amylopectin content of the pyramid lines was 61–63 % higher than that of the high-lysine parent QCL3024 (Table 4). The increase was especially large in lines F4-1 and BC2F4-1.
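The relative increases reported for lysine and amylopectin can be checked from the values given in the text. A minimal sketch (the helper name is illustrative): the lysine family ranges reproduce the 16–28 % span, while the amylopectin family means give about 61–62 %; the 61–63 % range presumably refers to the individual lines in Table 4, which are not reproduced here.

```python
def percent_increase(value: float, baseline: float) -> float:
    """Relative increase of value over baseline, in percent."""
    return (value - baseline) / baseline * 100


# Lysine (%): family ranges of the pyramid lines vs the waxy parents (from the text)
lysine = {
    "F4 vs QCL5019": ((0.342, 0.373), 0.294),
    "BC2F4 vs QCL5008": ((0.339, 0.367), 0.287),
}
for label, ((low, high), parent) in lysine.items():
    print(label, f"{percent_increase(low, parent):.1f} to {percent_increase(high, parent):.1f} %")

# Amylopectin (%): family means vs the high-lysine parent QCL3024 (60.71 %)
for label, mean in {"F4": 97.89, "BC2F4": 98.15}.items():
    print(label, f"{percent_increase(mean, 60.71):.1f} %")
```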

Genomic background differences between the four F6 and three BC2F5 lines

The four selected F4 lines from the single cross were selfed for two further seasons to obtain four F6 lines, and the three BC2F4 lines from the backcross were selfed for one further season to obtain three BC2F5 lines. The genomic background differences of these nearly homozygous F6 and BC2F5 lines were then determined.

One hundred and eighty-nine SSR markers covering the whole maize genome were screened at non-target loci. Of these, 177 were polymorphic among the three parents, QCL5019, QCL3024 and QCL5008, and were used for genomic background detection (Supplementary Table S2).

In the four F6 lines from the single cross, the proportion of the maternal parent (QCL3024) genotype was 35.59–42.94 %, the proportion of the paternal parent (QCL5019) genotype was 30.51–38.42 %, the proportion of heterozygous genotype was 0.56–1.69 %, the proportion of unidentified genotype was 14.69–18.08 %, and the proportion of non-polymorphic genotype was 10.17 % (Table 5). In the three BC2F5 lines from the backcross, the proportion of the recurrent parent (QCL5008) genotype was 71.19–74.01 %, the proportion of the donor parent (QCL3024) genotype was 3.95–6.78 %, the proportion of heterozygous genotype was 1.69–3.39 %, the proportion of unidentified genotype was 4.52–5.08 %, and the proportion of non-polymorphic genotype was 15.25 % (Table 5). Thus, the genomes of the single-cross lines were derived in nearly equal parts from the two parents, whereas the genome composition of the backcross lines was skewed towards the recurrent parent; moreover, the single-cross lines had a higher proportion of unidentified genotypes and a lower proportion of non-polymorphic genotypes than the backcross lines. This indicates that the single-cross lines had more complex genomic backgrounds than the backcross lines.

Table 5 Proportions of QCL5019, QCL3024, QCL5008, heterozygous, unidentified and non-polymorphic genotypes in the selected o16o16wxwx lines derived from the single-cross and backcross

In the three BC2F5 lines from the backcross, after the 27 marker loci that were non-polymorphic between the two parents were excluded from the 177 total, the restoration rate of the genetic background was 92.00–93.67 % and the proportion of the recurrent parent genotype was 84.00–87.33 %, consistent with the values for the nine plants in the BC2F2 backcross population.
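The reported BC2F5 restoration range is consistent with the formula G(g) = [L + X(g)]/(2L) from the Materials and methods applied to L = 177 − 27 = 150 informative markers. A minimal sketch, assuming the 84.00–87.33 % recurrent-genotype proportions equal X/L (so X can be back-calculated); the constant and function names are illustrative.

```python
TOTAL_MARKERS = 177
NON_POLYMORPHIC = 27  # markers monomorphic between QCL5008 and QCL3024
L = TOTAL_MARKERS - NON_POLYMORPHIC  # 150 informative markers


def check(recurrent_shares=(0.8400, 0.8733)):
    """Back-calculate X from the recurrent-genotype share and apply G = (L + X) / (2L)."""
    results = []
    for share in recurrent_shares:
        x = round(share * L)  # markers homozygous for recurrent-parent alleles
        g = (L + x) / (2 * L)
        results.append((x, f"{g:.2%}"))
    return results


print(check())  # [(126, '92.00%'), (131, '93.67%')]
```

The same X values (126 and 131) also appear to reproduce the 71.19–74.01 % recurrent-parent shares reported over all 177 markers (126/177 and 131/177).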

Discussion

Through MAS, we successfully pyramided the o16 and wx genes in maize. The lysine content of the pyramid lines was about 20 % higher than that of the waxy parents, and their amylopectin content was about 60 % higher than that of the high-lysine parent. Therefore, the o16 and wx genes can function together in the same genome, raising both the lysine and the amylopectin contents of maize grain. This is significant for improving maize quality for food, animal feed, and processing uses.

Both lysine and amylopectin contents in maize grain are qualitative–quantitative traits, and quantitative variation exists in the selected materials (Tables 3, 4). Because the foreground marker for o16 is linked to, rather than located within, the gene, and the o16 line QCL3024 carries modifiers, the lysine content of the selected lines could be influenced by recombination and modifiers. Although a marker within the wx gene was used for foreground selection, the endosperm ploidy level of the selected materials could affect their amylopectin contents (Sun et al. 1998). We therefore examined the lysine and amylopectin contents of selected individuals and lines in various generations that already carried homozygous recessive wx or o16 genes, to ensure that the selected materials were of superior quality.

As is well known to breeders, the F2 population derived from a single cross is the main source of second-cycle lines in maize. Therefore, the purpose of crossing the o16 line QCL3024 with the wx line QCL5019 to make a single-cross F2 population was to breed second-cycle lines combining the two target genes, o16 and wx. This is a forward-breeding MAS scheme (Sorrells 2007; Gupta et al. 2010), in which forward selection is performed for marker loci linked to major, consistent quantitative trait loci and background selection is rarely performed (Holland 2004). In our tests, background selection was not conducted, to reduce the cost of MAS. This strategy pyramids target genes quickly: in our experiment, double recessive F3 seeds were obtained in the third season, whereas double recessive BC2F3 seeds were obtained only in the fifth season. On the other hand, the lines homozygous at both the o16 and wx loci selected under the single-cross scheme have complex genetic backgrounds (Table 5), so their combining ability must be evaluated before they are used in breeding programs. In contrast, the lines selected under the backcross scheme, known as marker-assisted backcrossing (MABC) (Gupta et al. 2010; Xu et al. 2012), have a genetic background similar to that of the recurrent parent. In our results, the nine individual plants in the BC2F2 backcross population had 91–96 % recurrent parent genome content (Table 2), in agreement with the results reported by Babu et al. (2005). In addition, the three BC2F5 lines selected by MABC had 92–94 % recurrent parent genome content. Accordingly, because QCL5008 has good combining ability, these three lines could be used directly in breeding programs.

Moreover, when pyramid lines, whether from a single-cross or a backcross scheme, are used to improve other maize materials, the number of generations needed can be greatly reduced compared with improving the materials gene by gene. In addition, with high-quality reference genome sequences now available, whole-genome genotyping can be done by resequencing or with high-density chips carrying informative single nucleotide polymorphisms across the genome, making it possible to carry out MAS for many traits, especially complex traits (Xu et al. 2012). Pyramid lines with many desirable traits can therefore be expected to be incorporated into breeding programs in the future.