Introduction

Micronutrient malnutrition (MNM) is common in industrialized nations and is a serious problem in developing regions of the world. It is estimated that nearly one third of the world’s population is affected by deficiencies of one or more micronutrients (Mayer et al. 2008). Iron (Fe) is a micronutrient essential to cells in the human body, especially red blood cells, which transport oxygen from the lungs to tissues throughout the body. Iron deficiency, one of the most common forms of MNM worldwide, may cause anemia (Lucca et al. 2002; Tako et al. 2013). Zinc (Zn), another mineral micronutrient, is essential for over 300 enzymes and nearly 2000 transcription factors in the human body. Zinc deficiency impairs multiple body functions and results in a wide variety of symptoms (Prasad 2012). Biofortification, the process of breeding food crops rich in bioavailable micronutrients, is considered the most effective and economical way to increase the micronutrient quality of food (Bouis et al. 2011; Lung’aho et al. 2011). Thus, the development of micronutrient-enriched staple foods is an important goal of crop breeding (Chakraborti et al. 2011).

Maize (Zea mays L.), one of the oldest human-domesticated plants, is among the most important cereal crops in the world. In addition to its use in feed and industry, maize provides about one third of food calories to more than 4.5 billion people (Xue et al. 2014). Maize is also a major supplementary source of iron, magnesium, zinc, and other minerals for humans in many nations. Benefiting from genetic improvement of plant architecture and better management, the average maize yield has increased steadily over the years. However, to meet the demand of human diet, significant improvement is still needed in corn quality, especially micronutrient concentrations in kernels. Unlike the improvement of plant architecture, the genetic improvement of kernel nutrition traits is dependent much more on molecular marker-assisted selection (Prasanna et al. 2010; Yang et al. 2014a; Choudhary and Watson 2013). Therefore, identifying functional molecular markers for kernel nutrition traits has become increasingly important for maize breeding programs (Flint-Garcia et al. 2005; Yan et al. 2011).

Recently, much progress has been made in understanding primary Fe uptake from the soil (Waters et al. 2006; Curie and Briat 2003). The process of iron acquisition by gramineous plants includes four steps, namely biosynthesis of phytosiderophores (mugineic acids) in roots, secretion of phytosiderophores to the rhizosphere, solubilization of insoluble iron in soils by chelation of phytosiderophores, and uptake of the ferric-phytosiderophore complex by roots (Ueno et al. 2009; Ma 2005). The maize Yellow Stripe 1 (ZmYS1) gene encodes a specific transporter that takes up ferric-phytosiderophore complexes into root (Yen et al. 2001; Le Jean et al. 2005; Curie et al. 2001). Expression of ZmYS1 in heterologous systems has shown that ZmYS1 functions as a proton-coupled symporter for phytosiderophore-chelated metals (Roberts et al. 2004). In addition to root, ZmYS1 is also expressed in blades and sheaths of leaves as well as crowns of plants, suggesting that ZmYS1 is involved in both primary iron acquisition and intracellular transport of Fe and other metals (Ueno et al. 2009). Moreover, members of the yellow stripe 1-like (YSL) family that transport Fe(III)-phytosiderophores were further identified in barley and rice (Araki et al. 2011; Murata et al. 2006; Lee et al. 2009; Inoue et al. 2009; Kakei et al. 2012).

Although the role of ZmYS1 in Fe(III)-phytosiderophore uptake has been demonstrated (Curie et al. 2001; Yen et al. 2001), the sequence polymorphism of this gene in natural populations has not been investigated. It is also unclear whether and how ZmYS1 sequence variants are associated with changes in mineral concentrations in maize kernels. In the present study, we aimed to analyze the sequence variability of ZmYS1 in natural populations and to test the association of the identified sequence variants with the concentrations of seven minerals: copper (Cu), Fe, Zn, calcium (Ca), potassium (K), magnesium (Mg), and phosphorus (P). The results of this work will lay the foundation for molecular marker-assisted selection for the biofortification of Fe and Zn in maize kernels.

Materials and Methods

Plant Materials, Field Experiments, and Detection of Mineral Concentrations in Maize Kernels

A total of 88 elite maize inbred lines were used in this study. These inbred lines covered temperate germplasm from five heterotic groups, as well as tropical and waxy germplasm. They represented much of the genetic diversity available to breeding and research programs in China. In addition, some germplasm introduced from other countries, including the USA, Canada, France, and Germany, was also included in this study (Supplementary Table 1).

The inbred lines were grown in two-row plots with a randomized block design of two repetitions in a natural environment during 2013 in Sanya, Hainan province. All lines were self-pollinated, and ears were air-dried before manual shelling. Kernels from the middle of at least three ears in each replicate were harvested and further used for phenotypic analyses.

Maize kernels of each inbred line were oven dried at 70 °C until constant weight. The kernels were then ground with a stainless steel grain crusher, and the resulting powder was oven dried at 60–70 °C for 2 h. Each test tube contained 0.5 g of the powder sample, 2 ml of HNO3, and two drops of H2O2. The digestion was performed using a microwave digestion system (MARS5, CEM, USA). The concentrations (mg/g) of Cu, Fe, Zn, Ca, K, Mg, and P were measured using an atomic absorption spectrometer (Solar S4+Graphite Furnace System 97, Thermo Elemental, USA).

DNA Isolation, ZmYS1 Re-sequencing, and Analysis

Genomic DNA was extracted from maize leaves at the four-leaf stage using the CTAB method (Murray and Thompson 1980). ZmYS1 genes from the selected 88 inbred lines were re-sequenced by BGI Life Tech Co., Ltd. using the target sequence capture technology on the NimbleGen platform. The genomic sequences of ZmYS1 (GRMZM2G156599) from the B73 inbred line were used as the reference sequences for target sequence capture.

Sequence alignment of ZmYS1 in the 88 inbred lines was performed using the software Clustal X (Larkin et al. 2007), with the resulting alignment further edited manually. The software DNASP 5.0 (Librado and Rozas 2009) was used to detect the sequence nucleotide polymorphism, haplotype diversity, and recombination and to test the neutral mutation hypothesis. The linkage disequilibrium (LD) between any two polymorphic sites was estimated using TASSEL v4.0 (Bradbury et al. 2007). In addition, the decay of LD with physical distance in ZmYS1 was evaluated by the nonlinear regression (PROC NLIN in SAS software) following Remington’s model (Remington et al. 2001).
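The LD decay fit described above can be sketched in a few lines. The sketch below assumes that the curve fitted in Remington's approach is the commonly used Hill and Weir expectation of r² under drift and recombination, and it replaces the SAS PROC NLIN fit with a simple grid search; the function names, the grid, and the simulated data points are illustrative assumptions and are not taken from the study.

```python
def expected_r2(distance_bp, rho_per_bp, n_sequences):
    """Expected r^2 at a physical distance, assuming the Hill-Weir expectation."""
    c = rho_per_bp * distance_bp  # population recombination parameter scaled by distance
    base = (10 + c) / ((2 + c) * (11 + c))
    correction = 1 + ((3 + c) * (12 + 12 * c + c * c)) / (n_sequences * (2 + c) * (11 + c))
    return base * correction


def fit_rho(distances, r2_values, n_sequences, grid):
    """Return the grid value of rho with the smallest residual sum of squares."""
    def rss(rho):
        return sum((obs - expected_r2(d, rho, n_sequences)) ** 2
                   for d, obs in zip(distances, r2_values))
    return min(grid, key=rss)


def distance_at_threshold(rho_per_bp, n_sequences, threshold=0.1, max_bp=100_000):
    """Smallest distance (bp) at which the fitted curve is at or below the threshold."""
    for d in range(1, max_bp + 1):
        if expected_r2(d, rho_per_bp, n_sequences) <= threshold:
            return d
    return None


# Simulated, noise-free points generated from the model itself (not study data).
n_lines = 88
true_rho = 5 / 1000
distances = [50, 200, 500, 1000, 2000]
simulated = [expected_r2(d, true_rho, n_lines) for d in distances]
grid = [i / 1000 for i in range(1, 21)]
fitted = fit_rho(distances, simulated, n_lines, grid)
decay_bp = distance_at_threshold(fitted, n_lines)
```

A grid search is used here only to keep the sketch dependency-free; a real analysis would use a proper nonlinear least-squares routine and the observed pairwise r² values.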

Population Structure and Association Analyses

To exclude the effect of population structure on association mapping, all inbred lines were genotyped with a single nucleotide polymorphism (SNP) chip that contained 3072 random SNP markers evenly covering the maize genome (Yang et al. 2014a). SNP genotyping was performed via the GoldenGate assay at the National Maize Improvement Centre of China, China Agricultural University. The population structure was evaluated based on these SNP markers, and the resulting Q values were calculated using the STRUCTURE program (Falush et al. 2003). Five independent runs were performed with the number of populations (k) set from 1 to 10, the burn-in length and the number of Markov chain Monte Carlo (MCMC) replications both set to 100,000, and a model of admixture and correlated allele frequencies. The k value was determined by LnP(D) in the STRUCTURE output and by an ad hoc statistic, Δk, based on the rate of change in LnP(D) between successive k values. The Δk statistic was estimated using the STRUCTURE HARVESTER software (Earl 2012). To estimate the genetic relatedness among inbred lines, pairwise relatedness coefficients (kinship matrix) were calculated using the software SPAGeDi (Hardy and Vekemans 2002). Association mapping between the seven mineral concentrations in maize kernels and the nucleotide polymorphisms of ZmYS1 was performed using TASSEL 4.0 (Bradbury et al. 2007), with a mixed linear model (MLM) controlling for both population structure and relative kinship. Only variants with a minor allele frequency (MAF) higher than 0.05 were used in the association analyses.
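As an illustration of how Δk is derived from LnP(D), the sketch below uses one common formulation: the absolute second difference of the mean LnP(D) across successive k, divided by the standard deviation of LnP(D) among runs at that k. The function name and the LnP(D) values are invented for illustration; they are not output from this study, and STRUCTURE HARVESTER should be consulted for the exact procedure it applies.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Compute delta k for every k that has both neighbours k-1 and k+1.

    lnp_runs maps k to a list of LnP(D) values, one per independent run.
    """
    means = {k: mean(values) for k, values in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 not in lnp_runs or k + 1 not in lnp_runs:
            continue  # the second difference is undefined at the ends of the range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(lnp_runs[k])
    return result


# Hypothetical LnP(D) values for five runs at each k (illustrative only).
example_runs = {
    1: [-9000, -9001, -8999, -9000, -9002],
    2: [-8600, -8602, -8599, -8601, -8598],
    3: [-8300, -8301, -8299, -8300, -8302],
    4: [-8290, -8310, -8280, -8305, -8295],
    5: [-8285, -8320, -8270, -8300, -8290],
}
dk = delta_k(example_runs)
best_k = max(dk, key=dk.get)
```

In this toy input the likelihood gain flattens after k = 3 while the run-to-run variance grows, which is the pattern that produces a Δk peak.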

Results

Nucleotide Diversity and Selection on the ZmYS1 Gene

Sequence polymorphism of ZmYS1 was analyzed among the 88 maize inbred lines across 2684 base pairs (bp) of sequence, comprising seven exons covering 2049 bp and six introns covering 635 bp. Nucleotide substitutions and insertion and deletion (indel) variations at the ZmYS1 locus are summarized in Table 1 and Supplementary Tables 2 and 3. A total of 61 SNP sites were identified, of which four are singleton variable sites and 57 are parsimony-informative sites. In addition, ten indels covering 76 bp were identified, all of which are located in introns. For all 88 inbred lines, the overall nucleotide diversity (π) of the ZmYS1 locus is 0.0068, but the coding regions are less diverse than the intron regions (Table 1 and Supplementary Table 3). On average, single nucleotide changes occur every 52.5 bp in the coding regions, compared with every 28.9 bp in the intron regions. When a sliding window of 100 bp with a step size of 25 bp was used, we found different frequencies of polymorphic sites in the 13 regions (7 exons and 6 introns) of the ZmYS1 gene (Fig. 1). The highest nucleotide diversity was found in the 1–100-bp region of the first exon, with π = 0.0314. In the coding region, π often dropped to zero, indicating windows without any polymorphic site. Among the 13 exon and intron regions, only exon 04 does not possess any variants. The observed distribution of SNPs and indels was significantly different (for SNPs, χ² = 5.615, P < 0.05) from an expected even distribution between exons and introns. This uneven distribution of polymorphisms might be largely due to the low frequency of variants in the coding region.
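The sliding-window estimate of π can be sketched as follows. This is a minimal illustration, assuming gap-free aligned sequences of equal length (DnaSP treats alignment gaps separately, which this sketch does not attempt); the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def pairwise_pi(sequences):
    """Average number of pairwise nucleotide differences per site."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    differences = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return differences / (len(pairs) * length)


def sliding_pi(sequences, window=100, step=25):
    """Return (start, end, pi) for each window, using 1-based coordinates."""
    length = len(sequences[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [seq[start:start + window] for seq in sequences]
        results.append((start + 1, start + window, pairwise_pi(chunk)))
    return results


# Toy alignment of three 8-bp sequences, scanned with a 4-bp window and 2-bp step.
toy_alignment = ["AAAAAAAA", "AAAAAAAT", "CAAAAAAT"]
toy_windows = sliding_pi(toy_alignment, window=4, step=2)
```

Windows in which all sequences are identical yield π = 0, which corresponds to the stretches of the coding region where the curve in Fig. 1 drops to zero.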

Table 1 Summary of parameters for the analysis of nucleotide polymorphisms of the maize ZmYS1 gene
Fig. 1 The nucleotide diversity (π) estimated along the sequences of maize ZmYS1 gene. π is calculated using the method of sliding windows of 100 bp with a step of 25 bp. The exons and introns were indexed on the top of the coordinate

Tajima’s D statistic is commonly used to identify sequences that do not fit the neutral theory model at equilibrium between mutation and genetic drift (Yang et al. 2014b; Tajima 1989). In our analyses of ZmYS1, none of the estimates of Tajima’s D for exons, introns, and the entire sequence region was statistically significant, indicating no significant selection on ZmYS1 in the tested population. However, both Fu and Li’s D* and F* (Fu and Li 1993) are significant for the exons of this gene, whereas their estimates were not significant for the intron regions (Table 1). In addition, the estimate of Fu and Li’s F* is also significant when the entire sequence of this gene is used. Although these results could not reject the hypothesis of mutation-drift equilibrium, they suggested a lack of a footprint of positive selection in ZmYS1. We also noticed that the signs of all of these statistics are positive, suggesting a scarcity of both low- and high-frequency polymorphisms, possibly resulting from balancing selection on the ZmYS1 gene.
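For readers unfamiliar with the statistic, Tajima's D compares the mean number of pairwise differences with the number of segregating sites scaled by a sample-size constant. The sketch below follows the standard formulation of Tajima (1989); the function name is hypothetical. The example call is only a rough plug-in of the whole-gene values quoted in the text (88 lines, 61 SNPs, π = 0.0068 over 2684 bp). It ignores how DnaSP handles gaps and site counts, so it is not expected to reproduce the values in Table 1.

```python
from math import sqrt


def tajimas_d(n, seg_sites, mean_pairwise_diff):
    """Tajima's D from sample size, segregating sites, and mean pairwise differences."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_pairwise_diff - seg_sites / a1) / sqrt(variance)


# Rough plug-in of values quoted in the text; pi per site times length
# gives the mean number of differences per pair of sequences.
d_whole_gene = tajimas_d(n=88, seg_sites=61, mean_pairwise_diff=0.0068 * 2684)
```

A positive value arises when the mean pairwise difference exceeds the expectation from the number of segregating sites, that is, when intermediate-frequency variants are in excess relative to rare ones.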

Haplotype Diversity of the ZmYS1 Gene

Analyses of full sequences of ZmYS1 identified 17 haplotypes, with a haplotype diversity (Hd) of 0.8443 (Supplementary Table 4). The tested inbred lines are unevenly distributed in these haplotypes. Among the identified haplotypes, seven contained only one inbred line. The most frequent haplotype is Hap_1, which consists of 16 inbred lines. The other haplotypes with more than 10 lines include Hap_4 and Hap_17, consisting of 12 and 15 inbred lines, respectively.
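Haplotype diversity can be illustrated with a short sketch, assuming the usual estimator Hd = n/(n − 1) × (1 − Σp²), where p is the frequency of each haplotype among n sequences. The function name and the toy labels are hypothetical; the full haplotype counts of this study are given only in Supplementary Table 4 and are not reproduced here.

```python
from collections import Counter


def haplotype_diversity(haplotype_labels):
    """Haplotype diversity from one haplotype label per sampled line."""
    n = len(haplotype_labels)
    counts = Counter(haplotype_labels)
    sum_squared_freq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - sum_squared_freq)


# Toy example: four lines split evenly between two haplotypes.
toy_hd = haplotype_diversity(["Hap_A", "Hap_A", "Hap_B", "Hap_B"])
```

The estimator approaches 1 when every line carries a different haplotype and equals 0 when all lines share one haplotype.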

The coding region of ZmYS1 contained 2049 bp (682 aa). Although no indel was found in the coding regions of ZmYS1, 39 SNPs were detected. When only coding sequences were used to evaluate the haplotype diversity, 14 haplotypes were identified, with an Hd equal to 0.8289 (Supplementary Table 4). Among these 14 haplotypes, five contain only one inbred line. The most frequent CDS haplotype was CDS_Hap_1, which contains 29 inbred lines. The other haplotypes with more than 10 inbred lines are CDS_Hap_3 and CDS_Hap_15, containing 14 and 15 inbred lines, respectively.

Although no indel was found in the coding region of ZmYS1, 13 non-synonymous sites were detected, which may cause amino acid variation in the ZmYS1 protein. Among these non-synonymous sites, sites 1896 and 1898 belong to the same codon; therefore, these 13 non-synonymous sites result in changes at 12 amino acid positions. Site 44 possesses three variants encoding three different amino acids. When we translated the coding sequences (CDS) into amino acid sequences, 13 types of ZmYS1 protein were found to be encoded (Supplementary Fig. 1). Haplotypes CDS_Hap_1 and CDS_Hap_4 encoded the most frequent type of ZmYS1 protein and together contained 40 inbred lines.
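The reason that two non-synonymous sites in one codon count as a single amino acid change can be shown with a small translation sketch. It uses the standard genetic code; the two short coding sequences are invented for illustration and are not ZmYS1 sequences, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, and third base in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence, ignoring any trailing incomplete codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def amino_acid_changes(ref_cds, alt_cds):
    """Map 1-based codon number to (reference, alternative) amino acids that differ."""
    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    return {
        index + 1: (ref_aa, alt_aa)
        for index, (ref_aa, alt_aa) in enumerate(zip(ref_protein, alt_protein))
        if ref_aa != alt_aa
    }


# Hypothetical sequences differing at three nucleotide sites: two in codon 3
# (GAA to CTA) and one synonymous change in codon 4 (AAA to AAG).
reference = "ATGGCTGAAAAA"
variant = "ATGGCTCTAAAG"
snp_count = sum(a != b for a, b in zip(reference, variant))
changes = amino_acid_changes(reference, variant)
```

Here three nucleotide differences produce only one amino acid replacement, because two of them fall in the same codon and the third is synonymous.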

Linkage Disequilibrium and Recombination Events

Linkage disequilibrium (LD) was investigated between pairwise segregating sites in order to predict the expected resolution and marker density needed for candidate gene association mapping. In this analysis, all SNPs identified in ZmYS1 were used to estimate the LD between two polymorphic sites, and r² values were used as the index of LD (Supplementary Fig. 2). In addition, the decay of LD with increasing physical distance was also estimated according to Remington’s model (F = 1123.40, P < 0.0001) (Remington et al. 2001). Our analyses found significant LD in most pairs (739 out of 326 for the tested LD possessed a value of r² < 0.1; Supplementary Fig. 2). The LD decays rapidly with increasing physical distance. The predicted value of r² declines to 0.1 within 1640 bp at the ZmYS1 locus.
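The r² index between two polymorphic sites can be computed directly from allele and haplotype frequencies. The sketch below assumes biallelic sites scored in homozygous inbred lines, so that each line contributes one haplotype; the function name and the toy genotypes are hypothetical.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites given alleles listed in the same line order."""
    n = len(site_a)
    allele_a, allele_b = site_a[0], site_b[0]  # arbitrary reference alleles
    p_a = sum(x == allele_a for x in site_a) / n
    p_b = sum(y == allele_b for y in site_b) / n
    p_ab = sum(x == allele_a and y == allele_b for x, y in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denominator


# Toy genotypes for four lines: the first pair is in complete LD,
# the second pair is in linkage equilibrium.
linked = r_squared("AAGG", "CCTT")
unlinked = r_squared("AAGG", "CTCT")
```

The choice of reference allele does not affect r², because swapping alleles changes only the sign of D.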

The patterns of polymorphism identified in the tested inbred lines indicate a history of recombination at the ZmYS1 locus. Under the algorithm of Hudson and Kaplan (Hudson and Kaplan 1985), at least 14 recombination events were responsible for the polymorphism at the ZmYS1 locus. The consequences of recombination events are evident in the pattern of polymorphisms when the sequence of one haplotype is compared with those of others. For example, the sequence of the first exon of Hap_4 is the same as that of Hap_5. However, from the first intron to the sixth intron, there are two variants between them. The seventh exon of Hap_4 is also virtually identical to that of Hap_5. This result suggests that the ZmYS1 sequence in Hap_5 has resulted from at least two past recombination events relative to Hap_4.
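The minimum number of recombination events can be sketched with the four-gamete test. The code below assumes biallelic sites and an infinite-sites model: any pair of sites showing all four allele combinations implies at least one recombination event between them, and the minimum count is the largest set of such intervals that do not overlap. Function names and the toy haplotypes are hypothetical; this is a simplified illustration rather than the DnaSP implementation.

```python
from itertools import combinations


def four_gamete_intervals(haplotypes):
    """Return site-index pairs (i, j) at which all four gametes are observed."""
    n_sites = len(haplotypes[0])
    intervals = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(hap[i], hap[j]) for hap in haplotypes}
        if len(gametes) >= 4:
            intervals.append((i, j))
    return intervals


def min_recombination_events(haplotypes):
    """Greedy count of non-overlapping four-gamete intervals."""
    intervals = sorted(four_gamete_intervals(haplotypes), key=lambda iv: iv[1])
    count, last_end = 0, -1
    for start, end in intervals:
        if start >= last_end:  # intervals that only share an endpoint do not overlap
            count += 1
            last_end = end
    return count


# Toy haplotypes over four biallelic sites (illustrative only).
toy_haplotypes = ["AAAA", "ATAT", "TATA", "TTTT"]
toy_rm = min_recombination_events(toy_haplotypes)
```

Sorting by right endpoint and taking each interval that starts at or after the previous endpoint is the standard greedy solution for selecting the largest set of non-overlapping intervals.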

Phenotypic Variations and Association Analysis

The concentrations of Cu, Fe, Zn, Ca, K, Mg, and P in maize kernels were measured, and their descriptive statistics are presented in Table 2. One-way ANOVA revealed significant differences in all of these mineral concentrations among the maize inbred lines (Table 2). These results suggest that the 88 inbred lines are representative in terms of maize mineral concentrations and are suitable for association analysis. To explore the relationships among these mineral concentrations, pairwise correlation analyses were performed, and the Pearson correlation coefficients (r) between any two concentrations were calculated (Table 2). Interestingly, most correlations were statistically significant, and only five pairwise correlations did not reach the significance level of 0.05. The concentration of Cu in maize kernels was significantly correlated with those of Zn and Ca but not with those of Fe, K, Mg, and P. In addition, the concentration of Fe in maize kernels was significantly correlated with all other mineral concentrations except those of Cu and Ca. It is worth mentioning that all the correlation coefficients were positive, suggesting that potentially similar genetic mechanisms are responsible for these mineral concentrations.
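The pairwise correlation analysis can be sketched as below. Seven minerals give 21 distinct pairs, so five non-significant pairs leave 16 significant ones. The Pearson function is the textbook definition; its name and the toy concentration values are hypothetical and do not come from Table 2.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long value lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


minerals = ["Cu", "Fe", "Zn", "Ca", "K", "Mg", "P"]
mineral_pairs = list(combinations(minerals, 2))
n_pairs = len(mineral_pairs)
n_significant = n_pairs - 5  # five pairs were not significant according to the text

# Toy values for two minerals across four lines (illustrative only).
toy_r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```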

Table 2 The result of descriptive statistics, one-way ANOVA, and pairwise correlation coefficients for kernel mineral concentrations among 88 maize inbred lines

In order to determine the association of ZmYS1 nucleotide polymorphism with maize kernel mineral concentrations, we also estimated the population structure of the 88 inbred lines using 3072 SNPs. The k value was determined by LnP(D) in the STRUCTURE output and an ad hoc statistic Δk that is based on the rate of change in LnP(D) between successive k values. The Δk shows a clear peak at k = 3 (Supplementary Fig. 3), suggesting that the total panel could be divided into three major subpopulations. The population structure information was further used in the candidate gene association analysis.

An MLM association analysis controlling for the effects of population structure and relative kinship was used to identify associations between the mineral concentrations in maize kernels and the nucleotide polymorphisms in ZmYS1. Five variant sites, namely SNPs 17, 29, 401, and 443 and indel_02, were found to be statistically associated with Fe concentration (Table 3) at the level of 0.01. In addition, three other SNPs, namely SNPs 31, 43, and 98, were found to be statistically associated with Zn concentration. No associations were identified for the other mineral concentrations. Among the SNP sites showing association with Fe and Zn concentrations, three (SNPs 17, 31, and 43) are non-synonymous. These sites might cause phenotypic variation by changing amino acids. The other associated sites might result from linkage with these non-synonymous SNPs.

Table 3 Association analysis between the sequence polymorphisms of the maize ZmYS1 gene and the kernel mineral concentrations

We also estimated the allelic effects of the three non-synonymous SNPs associated with Fe and Zn concentrations. SNP_17 explains 8.01% of the variation in Fe concentration in maize kernels. The Fe concentration of the inbred lines carrying allele ZmYS1_17A is significantly lower than that of the inbred lines carrying ZmYS1_17C at the level of 0.05, according to an independent-samples t test (Fig. 2). SNP_31 explains 8.65% of the variation in Zn concentration in maize kernels. The inbred lines carrying allele ZmYS1_31G are significantly lower in Zn concentration than those carrying ZmYS1_31C. SNP_43, which explains 12.63% of the variation in Zn concentration, classifies the inbred lines into three allelic groups, ZmYS1_43A, ZmYS1_43G, and ZmYS1_43T. According to one-way ANOVA, Zn concentration differs significantly among these three allelic groups. In addition, the Zn concentration of inbred lines carrying ZmYS1_43A was significantly lower than that of inbred lines carrying ZmYS1_43G or ZmYS1_43T. These results support the observed association between the three non-synonymous SNPs in ZmYS1 and Fe and Zn concentrations in maize kernels.
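A simplified view of these allelic comparisons is sketched below. It computes the share of phenotypic variance explained by allele groups as the between-group sum of squares over the total sum of squares, plus a pooled-variance two-sample t statistic. This ignores the population structure and kinship terms of the MLM used in the study, and the text does not state which t test variant was applied, so both functions are illustrative assumptions; the names and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def variance_explained(groups):
    """Between-group sum of squares divided by total sum of squares."""
    all_values = [value for values in groups.values() for value in values]
    grand_mean = mean(all_values)
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    return ss_between / ss_total


def pooled_t(sample_a, sample_b):
    """Two-sample t statistic assuming equal variances in both groups."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


# Toy phenotype values for two hypothetical allele groups (not study data).
toy_groups = {"allele_1": [1.0, 2.0, 3.0], "allele_2": [4.0, 5.0, 6.0]}
toy_r2 = variance_explained(toy_groups)
toy_t = pooled_t(toy_groups["allele_1"], toy_groups["allele_2"])
```

A negative t statistic here simply means that the first group has the lower mean, as reported for ZmYS1_17A relative to ZmYS1_17C.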

Fig. 2 Comparisons of maize kernel Fe and Zn concentrations among groups carrying different ZmYS1 alleles. P values for t test and one-way ANOVA comparing the groups carrying different alleles were indexed on the top

Discussion

Significant phenotypic differences and nucleotide polymorphism are essential for genetic mapping via linkage or association analyses (Yang et al. 2014a; Zhang et al. 2013). Maize is a typical outcrossing crop with broad morphological variation, genetic diversity, and a high effective frequency of recombination (Li et al. 2012; Yan et al. 2011). It has been suggested that the divergence between two maize inbred lines is even greater than that between humans and chimpanzees, which diverged about 3.5 million years ago (Buckler et al. 2009). In the present study, analyses of genomic sequences of the maize ZmYS1 gene from 88 inbred lines identified 61 SNPs and 10 indels, with an average frequency of one SNP per 44 bp in the entire region and π = 0.0068. The nucleotide diversity of this gene is slightly higher than the overall nucleotide diversities estimated using 27 diverse inbred lines (Gore et al. 2009) and 278 temperate maize inbred lines (Jiao et al. 2012). Coding regions of functional genes tend to be relatively conserved, due to their specificity for and affinity with other types of molecules (Weng et al. 2013). Consistent with this, the intron regions of ZmYS1 show a higher frequency of nucleotide polymorphisms than the coding regions, especially for indel variation. In addition, all the estimated values of Tajima’s D in the coding, intron, and entire regions were positive, which is consistent with balancing selection, with an excess of alleles of intermediate frequencies and a scarcity of rare alleles.

Association mapping through LD analysis is a powerful tool to dissect complex agronomic traits and identify alleles useful for biofortifying target traits (Yan et al. 2011). The candidate gene method of association analysis is also a hypothesis-driven approach for complex trait dissection that aims to improve the chances of identifying the most important alleles (Hall et al. 2010). For example, the rice GS3 gene was cloned using map-based cloning, and a mutant resulting in a stop codon in the second exon was detected as the contributory polymorphism for larger rice grain size (Fan et al. 2006, 2009). Based on the hypothesis that its orthologs in maize might possess similar roles, analyses of GS3 orthologs in maize also revealed that several polymorphic sites are significantly associated with maize kernel size (Li et al. 2010a). Candidate gene association mapping is widely used to detect functional SNPs or alleles that are associated with gene-related agronomic traits in maize, such as Dwarf8 (Thornsberry et al. 2001; Andersen et al. 2005) and Vgt1 (Ducrocq et al. 2008) for maize flowering time; ae1, bt2, sh1, sh2, sugary1, and waxy1 for kernel composition and starch pasting properties (Wilson et al. 2004); lcyE (Harjes et al. 2008) and crtRB1 (Yan et al. 2010) for carotenoid content; GS3 (Li et al. 2010b) and GW2 (Li et al. 2010a) for kernel shape and weight; and Zmisa2 for starch pasting and gelatinization properties (Yang et al. 2014a).

Micronutrient malnutrition, especially zinc and iron deficiency, may seriously affect human health. Biofortification is considered the most viable method to tackle micronutrient malnutrition in low-income countries (Bouis et al. 2011; Lung’aho et al. 2011). However, a precondition for breeding programs aimed at biofortifying Fe and Zn in maize kernels is to identify the relevant genes (Lung’aho et al. 2011). In addition to association analysis, the other powerful and widely used tool for dissecting the genetic basis of micronutrient concentration in grains is quantitative trait locus (QTL) mapping (Jin et al. 2013). QTL analysis has been applied to Fe and Zn concentrations in maize kernels, and a few QTLs were detected (Qin et al. 2012; Lung’aho et al. 2011; Jin et al. 2013; Šimić et al. 2012). In this study, we showed that the maize ZmYS1 gene possessed abundant nucleotide polymorphisms in the tested population. Our association data also show that variants of this gene are significantly associated with Fe and Zn concentrations in kernels. Although both Fe and Zn concentrations are correlated with other mineral concentrations, no significant association was detected between the polymorphic sites and the concentrations of Cu, Ca, K, Mg, and P. The uptake of Fe and Zn is a complex process that is likely influenced by multiple genes (Ueno et al. 2009; Ma 2005). Thus, our results need further verification, since only ZmYS1 was examined in this study.

Many factors, including genetic diversity, rate of LD decay, sample size, and population structure, can influence the effectiveness and power of the candidate gene approach to association mapping (Yan et al. 2011). A larger population size will provide more power and precision for candidate gene association analyses (Rafalski 2010). The present study used 88 maize inbred lines, including representative lines of five temperate heterotic groups as well as tropical and waxy germplasm. This population represents much of the genetic diversity available to breeding and research programs in China. A relatively balanced sample with extensive genetic diversity reduces the possibility of false discovery in association mapping (Xu et al. 2013). In addition, causal polymorphisms in candidate genes should be validated in distinct association mapping populations (Larsson et al. 2013; Kumar et al. 2014). Further research should focus on other populations with adequate genetic diversity to confirm the causal variants of maize kernel Fe and Zn concentrations within ZmYS1.