Introduction

Maize (Zea mays L.) is one of the most important crops in the world for food, livestock feed, and biofuel because it has excellent adaptability to a wide range of environments in tropical and temperate regions (Steinhoff et al. 2012). Researchers in many countries are interested in the collection and preservation of maize germplasms, and maize genetic resources are stored in gene banks throughout the world (Li et al. 2004). Today, maize consumption in Korea is increasing as the population transitions from a traditional diet based on rice to a western diet based on meat. Thus, the country needs new high-performing maize cultivars. RDA-Genebank maintains 8472 genetic variants of maize (http://www.genebank.go.kr/). Although many RDA-Genebank maize PGRs (Plant Genetic Resources) have already been evaluated in the field for population diversity and phenotypic characteristics, there has been a lack of analysis of genetic diversity (GD) and relationships at the molecular level. In addition, most of the archived accessions have been utilized rarely or not at all in maize breeding programs. In order to improve breeding material selection and development of core germplasms, we selected 105 accessions from 1000 maize accessions from the Korean RDA-Genebank for further analysis based on their morphological characteristics, such as silking stage, plant height, and ear length (Kim et al. 2016).

Understanding the GD and population structure is helpful for germplasm preservation and utilization for crop improvement. GD information provides breeders with tools to develop new and improved cultivars with better traits (Govindaraj et al. 2015). However, utilization of PGR is lower than it could be because of inadequate information for breeders and lack of genetic information about cultivar collections (Vančetović et al. 2010). Molecular markers are used to assess GD and population structure to predict hybrid performance and heterosis because these markers are inherited by Mendelian rules and are not influenced by environmental factors (Kashiani et al. 2012; Legesse et al. 2007; Solomon et al. 2012). A remarkable variety of molecular marker techniques have developed in recent years. Among different molecular marker systems, simple sequence repeats (SSRs) or microsatellites, which are short sequences containing tandemly repeated copies of 1–6 nucleotides, are the most suitable markers for assessing GD and population structure among PGR of maize. SSR analysis has advantages for identifying high levels of allelic variation and is reliable, reproducible, and discriminating (Rafalski et al. 1996; Akagi et al. 1997; Smith et al. 1997; Enoki et al. 2002). Confirming the genetic basis of agronomically valuable traits is very important for crop improvement in plant breeding programs. Association analysis based on linkage disequilibrium was first introduced in human genetics to identify genes that control traits (Khoury et al. 2009). This method has since been successfully applied to analysis of genetic markers and agronomic traits of interest in many crops (Flint-Garcia et al. 2005; Yu and Buckler 2006), such as rice (Borba et al. 2010), maize (Mezmouk et al. 2011), and barley (Lorenz et al. 2010). 
Furthermore, association analysis provides some advantages over quantitative trait locus (QTL) mapping, including reduced time and costs, evaluation of more than two alleles per locus, and increased mapping resolution (Flint-Garcia et al. 2005; Zhu et al. 2008).

Genetic analysis is very important to ensure the long-term success of preservation and utilization of PGR in the RDA-Genebank. Therefore, our objective was to investigate the GD of 105 maize accessions from the RDA-Genebank using SSR markers, and to evaluate their population structure and clustering patterns. We also attempted to elucidate the genetic bases of agronomic traits by analyzing the associations of 100 SSR markers with 11 agronomic traits. These results will help to improve maize breeding programs and will inform the preservation and application of maize genetic resources in Korea.

Materials and methods

Plant materials and phenotypic evaluation

The accession numbers and sources of the 105 maize accessions are listed in Table 1. All maize accessions were obtained from the Genebank of the National Agrobiodiversity Center (NAAS) of the Rural Development Administration (RDA) of Korea. Thirty seeds of each accession were sown in plastic plug trays on 25 April 2015 and kept in a greenhouse until the 3–4 leaf stage. The seedlings were then transplanted to a field at the College of Agriculture and Life Sciences, Kangwon National University, Chuncheon, Gangwon-do. The experimental design was a randomized complete block with three replicates. Each replicate contained seven plants per accession, with 70 cm between rows and 25 cm between plants. A total of 11 agronomic traits were evaluated in the field: percent germination rate (GR), tasseling stage (DT), silking stage (DS), stem diameter (SD), plant height (PH), ear height (EH), leaf width (LW), leaf length (LL), ear length (EL), ear row number (ER), and 100-kernel weight (100 KW). Basic statistics were calculated using Microsoft Office Excel 2010.

Table 1 Derivation of 105 maize accessions of RDA-Genebank used in this study

DNA extraction and SSR analysis

Genomic DNA was extracted from young maize leaves as described by Dellaporta et al. (1983), with minor modifications. One hundred SSR markers, distributed across the ten maize chromosomes (ten loci per chromosome), were used to evaluate genetic variation in 105 maize accessions from RDA Genebank. The SSR markers used in this study were obtained from MaizeGDB (http://www.maizegdb.org/).

SSR amplification was conducted in a total volume of 30 µl containing 20 ng of genomic DNA, 1× PCR buffer, 0.3 µM forward and reverse primers, 0.2 mM dNTPs, and 1 unit of Taq polymerase (Biotools). The PCR profile consisted of an initial denaturation at 94 °C for 5 min, followed by two cycles of denaturation at 94 °C for 1 min, annealing at 65 °C for 1 min, and extension at 72 °C for 2 min. After the second cycle, the annealing temperature was decreased by 1 °C every second cycle until a final temperature of 55 °C was reached. The last cycle was then repeated 20 times, followed by a final 10-min extension at 72 °C.

Five µl of the final reaction product was mixed with 10 µl of electrophoresis loading-buffer (98% formamide, 0.02% BPH, 0.02% Xylene C, and 5 mM NaOH). After denaturation and immediate cooling, 2 μl of the sample was loaded onto a 6% denaturing (7.5 M urea) acrylamide-bisacrylamide gel (19:1) in 1× TBE buffer, and electrophoresed at 1,800 volts and 60 watts for 120 min. The separated fragments were then visualized using a silver-staining kit (Promega, USA).

Data analysis

The number of alleles, allele frequency, major allele frequency (MAF), gene diversity (GD), and polymorphic information content (PIC) for 100 SSR markers were calculated with the PowerMarker 3.25 program (Liu and Muse 2005). GD is defined as the probability that two randomly chosen alleles from the population are different. It can be estimated at the lth locus as:

$${\text{Gene diversity (GD)}}=\left( 1 - \sum\limits_{u=1}^{k} p_{lu}^2 \right)/\left( 1 - \frac{1+f}{n} \right)$$

where f is the inbreeding coefficient, $p_{lu}$ is the frequency of the uth allele at locus l, k is the number of alleles at that locus, and n is the sample size. PIC (Botstein et al. 1980) was calculated as:

$${\text{PIC}}=1 - \sum\limits_{u=1}^k {p_{lu}^2} - \sum\limits_{u=1}^{k - 1} {\sum\limits_{v=u+1}^k {2p_{lu}^2} } p_{lv}^2$$

where $p_{lu}$ and $p_{lv}$ are the frequencies of the uth and vth alleles of marker l, respectively.
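
As a hedged illustration of the two indices defined above, the following Python sketch computes the uncorrected expected heterozygosity (the numerator of the GD expression, leaving out the sample-size term in f and n) and PIC from a list of allele frequencies at one locus. The function names and the example frequencies are hypothetical; they are not taken from PowerMarker or from the data of this study.

```python
def expected_heterozygosity(freqs):
    """Uncorrected gene diversity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content for one locus (Botstein et al. 1980)."""
    k = len(freqs)
    cross_term = 0.0
    # Double sum over all unordered allele pairs (u < v).
    for u in range(k - 1):
        for v in range(u + 1, k):
            cross_term += 2.0 * freqs[u] ** 2 * freqs[v] ** 2
    return expected_heterozygosity(freqs) - cross_term


# Hypothetical locus with three alleles (illustrative frequencies only).
example_freqs = [0.5, 0.3, 0.2]
het_value = expected_heterozygosity(example_freqs)
pic_value = pic(example_freqs)
print(round(het_value, 4), round(pic_value, 4))
```

PIC is never larger than the uncorrected gene diversity, because the cross term subtracted from it is non-negative.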

The genetic similarities (GS) were calculated for each pair of accessions using the Dice similarity index (Dice 1945). The similarity matrix was used to construct an Unweighted Pair Group Method with Arithmetic Mean Algorithm (UPGMA) dendrogram with the help of SAHN-clustering from NTSYSpc version 2.1 (Rohlf 1998).
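
The two steps above (pairwise Dice similarity, then UPGMA clustering) can be sketched in plain Python as follows. This is a minimal sketch and not the NTSYSpc SAHN implementation: it assumes that each accession is scored as a set of observed SSR alleles, and the accession names and allele labels are hypothetical.

```python
from itertools import combinations


def dice_similarity(alleles_a, alleles_b):
    """Dice (1945) index: twice the shared alleles over the total alleles scored."""
    total = len(alleles_a) + len(alleles_b)
    if total == 0:
        return 1.0
    return 2.0 * len(alleles_a & alleles_b) / total


def average_similarity(cluster_a, cluster_b, profiles):
    """UPGMA linkage: mean similarity over all between-cluster accession pairs."""
    values = [dice_similarity(profiles[a], profiles[b])
              for a in cluster_a for b in cluster_b]
    return sum(values) / len(values)


def upgma(profiles):
    """Return the merge history as (cluster_a, cluster_b, similarity) tuples."""
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the highest average similarity.
        best_a, best_b = max(
            combinations(clusters, 2),
            key=lambda pair: average_similarity(pair[0], pair[1], profiles),
        )
        level = average_similarity(best_a, best_b, profiles)
        clusters.remove(best_a)
        clusters.remove(best_b)
        clusters.append(best_a + best_b)
        merges.append((best_a, best_b, level))
    return merges


# Hypothetical allele calls at three SSR loci (m1 to m3) for three accessions.
profiles = {
    "A": {"m1_a1", "m2_a1", "m3_a2"},
    "B": {"m1_a1", "m2_a1", "m3_a1"},
    "C": {"m1_a2", "m2_a3", "m3_a1"},
}
merges = upgma(profiles)
for cluster_a, cluster_b, level in merges:
    print(cluster_a, cluster_b, round(level, 3))
```

Each merge level corresponds to the GS value at which two branches join in the dendrogram.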

The population structure (Q matrix) of the 105 maize accessions was analyzed using the model-based program STRUCTURE 2.2 (Pritchard and Wen 2003). To estimate the membership coefficient of each individual in each subpopulation, five independent runs were performed for each number of clusters (K) from 1 to 10, using the admixture model with a burn-in of 100,000 iterations followed by 100,000 replications. Because the estimated log probability of the data [LnP(D)] overestimated the number of subgroups, we used the ad hoc criterion (ΔK) described by Evanno et al. (2005) to determine the most probable value of K. The run with the maximum likelihood at the estimated number of subgroups was used to assign maize accessions with membership probabilities ≥0.80 to subgroups. Maize accessions with membership probabilities <0.80 were assigned to an admixed group (Wang et al. 2008).
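
The ΔK criterion mentioned above can be illustrated with a short Python sketch. It assumes the commonly used form ΔK = |L(K+1) − 2L(K) + L(K−1)| / s[L(K)], where L(K) is the mean LnP(D) over replicate runs and s is their sample standard deviation. The LnP(D) values below are invented for illustration and are not results of this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to the list of LnP(D) values from replicate runs.
    Assumes the replicate values at each K are not all identical (stdev > 0).
    """
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta_k[k] = second_difference / stdev(lnp_by_k[k])
    return delta_k


# Hypothetical LnP(D) values, three runs per K (illustration only).
lnp_by_k = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4501, -4499],
    3: [-4450, -4460, -4440],
    4: [-4430, -4445, -4415],
}
delta_k = evanno_delta_k(lnp_by_k)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

ΔK is undefined for the smallest and largest K tested, so the criterion cannot select K = 1.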

TASSEL 3.0 (Bradbury et al. 2007) was used to evaluate r² and D′ as measures of linkage disequilibrium and to test marker-trait associations using a Q general linear model (GLM) and a Q + K mixed linear model (MLM). The Q GLM was performed using the Q matrix derived from STRUCTURE, with the number of permutation runs set to 10,000 and marker significance declared at P ≤ 0.01. The Q + K MLM used a kinship (K) matrix together with the population-structure Q matrix, also at P ≤ 0.01. To obtain the K matrix, kinship coefficients were calculated with SPAGeDi software (Hardy and Vekemans 2002) using the method of Loiselle et al. (1995).
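
To make the two LD statistics concrete, the sketch below computes D, |D′|, and r² for the simplest case of two biallelic loci with known haplotype frequencies. This is an assumption made for illustration: SSR loci are multiallelic, and the sketch does not reproduce how TASSEL handles them. The function name and input frequencies are hypothetical.

```python
def ld_statistics(p_a, p_b, p_ab):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    p_ab : frequency of the A-B haplotype
    Both loci are assumed to be polymorphic (0 < p < 1).
    """
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical frequencies (illustration only).
d, d_prime, r_squared = ld_statistics(p_a=0.6, p_b=0.5, p_ab=0.4)
print(round(d, 3), round(d_prime, 3), round(r_squared, 3))
```

In this example the two loci show |D′| = 0.5 but r² of only about 0.17, which illustrates that the two statistics need not agree in magnitude.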

Results

Phenotypic analysis and correlation analysis

Phenotypic variations for 11 agronomic traits in the 105 maize accessions from the Korean RDA-Genebank are shown in Table 2. The average GR value was 88.8 ± 14.0%, ranging from 40.0 to 100%. The DT value ranged from 44.0 to 59.0 days, with an average of 52.8 ± 4.4 days. The average DS value was 56.8 ± 6.4 days, ranging from 42.0 to 70.0 days. The SD value ranged from 1.0 to 3.9 cm, with an average of 1.9 ± 0.6 cm. The average PH value was 143.6 ± 23.1 cm, ranging from 76.0 to 215.5 cm. The EH value ranged from 28.0 to 112.0 cm, with an average of 67.3 ± 15.6 cm. The average LW value was 7.4 ± 1.3 cm, ranging from 3.9 to 10.9 cm. The LL value ranged from 46.9 to 89.2 cm, with an average of 63.0 ± 8.0 cm. The average EL value was 14.7 ± 2.6 cm, ranging from 7.5 to 23.5 cm. The ER value ranged from 9.3 to 20.0 rows, with an average of 13.9 ± 2.0 rows. The average 100 KW value was 21.7 ± 5.0 g, ranging from 11.6 to 41.9 g.

We evaluated correlation coefficients among the 11 agronomic traits in the 105 maize accessions. Twenty-seven trait combinations were significantly correlated (P < 0.05 or P < 0.01). Among them, PH and EH (0.799**), DT and DS (0.748**), EH and LL (0.573**), and LW and 100 KW (0.555**) showed higher correlation coefficients than the other combinations (Table 2).

Table 2 Correlation coefficient, mean and standard deviation for 11 agronomic traits in total 105 maize accessions

Genetic diversity among 105 maize accessions of Korean RDA-Genebank

We used a total of 100 SSR loci to evaluate the GD among 105 maize accessions (Fig. 1; Table 3). These loci comprised a total of 1104 alleles in 105 accessions. The number of alleles per locus ranged from 3 to 27, and the average number of alleles per locus was 11.0 (Table 3, Supplementary Table 1). The average GD was 0.73 with a range of 0.18–0.92. In addition, the average PIC value was 0.70 with a range of 0.17–0.91. The average MAF was 0.41 with a range of 0.18–0.90 (Table 3). Of the 1104 alleles, 269 private alleles (24.4%) were each detected in only 1 of the 105 maize accessions. The frequency of rare alleles (frequency <0.05) was 60.6% (669 of 1104 alleles), whereas intermediate (frequency 0.05–0.5) and abundant alleles (frequency >0.5) comprised 37.0% (408 alleles) and 2.4% (27 alleles) of 1104 alleles, respectively (Fig. 2).
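
The allele frequency classes used above can be written as a small Python sketch, which also rechecks the reported percentages from the reported counts. The function name is hypothetical; the counts are those stated in the text.

```python
def classify_allele(frequency):
    """Classify an allele as rare (<0.05), intermediate (0.05-0.5), or abundant (>0.5)."""
    if frequency < 0.05:
        return "rare"
    if frequency <= 0.5:
        return "intermediate"
    return "abundant"


# Allele counts per class as reported in the text.
reported_counts = {"rare": 669, "intermediate": 408, "abundant": 27}
total_alleles = sum(reported_counts.values())
shares = {
    name: round(100.0 * count / total_alleles, 1)
    for name, count in reported_counts.items()
}
print(total_alleles, shares)
```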

Fig. 1

Frequency of gene diversity and PIC per locus in 105 maize accessions of Korean RDA-Genebank

Table 3 Total number of alleles and genetic diversity index for 100 SSR loci in 105 maize accessions of RDA-Genebank
Fig. 2

Histogram of allele frequencies in 105 maize accessions of Korean RDA-Genebank

Population structure and cluster analysis among 105 maize accessions

For the full set of maize accessions, the highest ΔK value was observed at K = 2 in the population structure analysis (Fig. 3). Based on a membership threshold of 0.8 (Wang et al. 2008), the maize accessions were divided into group I, group II, or the admixed group. We assigned 35 maize accessions to group I and 46 to group II. The admixed group, composed of accessions with membership probabilities lower than 0.8, contained 24 maize accessions (Fig. 3). A dendrogram of the 105 maize accessions constructed by UPGMA analysis is presented in Fig. 3; it shows three clusters at a GS value of 36%. Group I comprised 57 accessions, group II contained 47 accessions, and group III contained only one accession (Fig. 3). Of the 105 accessions, 21 were collected in foreign countries: three (IT026994, IT026995, IT026996) from France (FRA), six (IT105365, IT105366, IT105367, IT105368, IT105369, IT105370) from Austria (AUT), nine (IT124200, IT124217, IT124226, IT124236, IT124242, IT124259, IT124273, IT124279, IT124282) from the USA, and three (IT124245, IT124246, IT124247) from Canada (CAN). The STRUCTURE results placed 12 of the foreign accessions (1 FRA, 4 AUT, 6 USA, 1 CAN) in group II and the remaining 9 in the admixed group; no foreign accessions were assigned to group I. The NTSYS results assigned 19 foreign accessions to group I and the remaining two to group II (Fig. 3).
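
The threshold rule used for the STRUCTURE assignment can be sketched as follows. The accession names and membership coefficients are hypothetical; only the 0.80 threshold and the group labels come from the text.

```python
def assign_group(q_values, labels=("group I", "group II"), threshold=0.80):
    """Return the subgroup whose membership coefficient reaches the threshold.

    Accessions with no coefficient at or above the threshold are 'admixed'.
    """
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    if q_values[best_index] >= threshold:
        return labels[best_index]
    return "admixed"


# Hypothetical rows of a Q matrix at K = 2 (illustration only).
q_matrix = {
    "accession_1": (0.93, 0.07),
    "accession_2": (0.12, 0.88),
    "accession_3": (0.55, 0.45),
}
assignments = {name: assign_group(q) for name, q in q_matrix.items()}
print(assignments)
```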

Fig. 3

UPGMA dendrogram and population structure in 105 maize accessions of Korean RDA-Genebank based on the SSR markers

Level of linkage disequilibrium and association analysis using Q GLM and Q + K MLM

The level of LD was evaluated for all pairwise combinations of the 100 SSR loci in the 105 maize accessions. The mean r² value was 0.016 for intra-chromosomal pairs and 0.014 for inter-chromosomal pairs. The mean D′ value was 0.161 for intra-chromosomal pairs and 0.149 for inter-chromosomal pairs (Table 4). In addition, 2.26% of all SSR pair combinations showed significant LD (P < 0.01) (Table 4).

Table 4 Information on overlapping SMTA markers between Q GLM and Q + K MLM

We performed association analysis between our sets of 100 SSR markers and 11 phenotypic traits in 105 maize accessions by Q GLM and Q + K MLM. We detected 72 marker-trait associations involving 42 SSR markers associated with the 11 agronomic traits using Q GLM (Supplementary Table 2). When we used Q + K MLM, five markers were associated with traits of SD, LW, LL, and ER (Supplementary Table 3). Table 4 presents information on overlapping significant marker trait associations (SMTAs) between Q GLM and Q + K MLM at a significance level of P ≤ 0.01. Among the five SMTAs, umc1062 and nc009 were associated with SD. phi092 was associated with ER, umc1857 was associated with LW, and umc1638 was associated with LL.

Discussion

Erosion of plant GD is a very serious problem caused by modernization and replacement of wild plants or landraces with a few elite varieties (van de Wouw et al. 2010; van Heerwaarden et al. 2009). Therefore, collection and preservation of PGR is increasingly important for crop breeding to support the demands of a growing human population. Effective management and utilization of PGR requires information about strain origins, phenotypic traits, and GD identified by molecular techniques. This study analyzed GD, as well as marker-trait associations, of maize accessions from the RDA-Genebank. To estimate GD and population structure in 105 RDA-Genebank maize accessions, we analyzed 100 SSR loci (10 loci per chromosome) covering the whole maize genome. We detected a total of 1104 alleles, with an average number of 11.0 alleles per locus in 105 maize accessions, and average GD and PIC of 0.73 and 0.70, respectively (Table 3).

Many similar studies have been reported. At the International Maize and Wheat Improvement Center (CIMMYT), an analysis of 137 maize accessions at 79 SSR loci showed an average of 7.2 alleles per locus and an average PIC value of 0.64 (Xia et al. 2005). An analysis of 129 maize accessions from Agriculture and Agri-Food Canada (AAFC) found an average of 3.62 alleles per locus and an average PIC of 0.68 at 105 SSR loci (Reid et al. 2011). In our study of RDA-Genebank accessions, the average allele number and PIC values were higher than in these two studies. Although the number of individuals, the markers, and the material types differed among the three studies, we concluded that the genetic resources of the RDA-Genebank are relatively diverse. A previous report suggested that good populations for association analysis in plants are breeding and genebank collections of cultivars, breeding lines, and germplasm (Malosetti et al. 2007). The materials used in this study were genetic resources from a genebank with relatively high GD, indicating that our materials are suitable for association analyses.

Determining the GD and population structure in a set of cultivars is very important for marker-assisted selection (MAS) and association analysis (Flint-Garcia et al. 2005; Wang et al. 2008). Our study used two methods to evaluate population structure, a model-based clustering method and a distance-based phylogenetic method. The model-based clustering method using the STRUCTURE program revealed a maximum ΔK value at K = 2, based on the statistical method described by Evanno et al. (2005). In addition, we divided the 105 RDA-Genebank lines into three groups by a distance-based phylogenetic method using the NTSYS program (Fig. 3). As a result, the patterns of population structure calculated by the two methods did not clearly distinguish between Korean and foreign collection regions. In general, population structures and clustering patterns are influenced by many different factors, like the natural history of a population, breeding systems, complexity of breeding practices, and selection by humans (Xie et al. 2008).

Identifying genes that control important agronomic traits is essential for effective maize breeding programs. Many other studies have identified SMTAs using linkage disequilibrium analysis (Borba et al. 2010; Mezmouk et al. 2011; Lorenz et al. 2010; Cui et al. 2015). LD is the non-random association of alleles at two different loci in a population (Flint-Garcia et al. 2005). The success of association analysis depends on the level of LD among marker alleles related to the trait (Xie et al. 2008; Yan et al. 2011). We assessed the level of LD in this study by calculating two LD parameters, r² and D′, which allowed us to determine significant marker-trait associations. In general, an absolute D′ value of 1 indicates complete linkage disequilibrium between two markers or alleles, whereas a value of 0 indicates linkage equilibrium. The value of r² ranges from 0 to 1, and a pair of loci with r² = 1 is said to be in perfect linkage disequilibrium. On this basis, we compared the r² and D′ values of intra- and inter-chromosomal marker pairs. Both values were higher for intra-chromosomal than for inter-chromosomal pairs, and the percentage of pairs in significant LD was also higher within chromosomes than between chromosomes (Table 4). These results indicate that linkage is the main factor causing higher LD. The level of LD in a population is affected by many factors, including linkage, experimental materials, marker density, selection, mutation, and genetic drift (Wang et al. 2008). The level of LD in this study was lower than that reported in previous studies (Wang et al. 2008; Zhang et al. 2011). This may result from the relatively low marker density or from differences in experimental materials, i.e., relatively higher heterozygosity because the lines in this study are breeding lines.

However, false positives (Type I errors) caused by spurious associations with population structure (Q matrix) and kinship (K matrix) are critical problems in these analyses (Zhang et al. 2010). To avoid false positives and to compare two statistical models, we performed Q GLM and Q + K MLM (Supplementary Tables 2 and 3). The Q GLM, using only the Q matrix, identified 72 marker-trait associations, while the Q + K MLM, using both the Q matrix and the K matrix, identified five marker-trait associations. The Q + K MLM detected fewer associations than the Q GLM at the same significance level, P ≤ 0.01, and all of the significant associations detected by the Q + K MLM were also detected by the Q GLM. These results indicate that the Q + K MLM method is better for reducing the false positive rate in association analyses. Based on the overlapping SMTAs between Q GLM and Q + K MLM, we detected two markers significantly (P ≤ 0.01) associated with SD, umc1062 and nc009, located on chromosomes 3 and 6, respectively. An SSR marker, phi092, located on chromosome 4, was associated with the ER trait. Another SSR marker, umc1857, located on chromosome 6, was associated with LW. Finally, umc1638, located on chromosome 8, was significantly associated with LL (P ≤ 0.01) (Table 4).

Among these SSR markers, umc1062, associated with SD in this study, is tightly linked to the phot1 (blue-light receptor phototropin 1) gene in the distal tip of the long arm of chromosome 3 (bin 3.09) (http://www.maizeGDB.org). The phot1 gene of maize plays a role in the first positive phototropic curvature of maize coleoptiles through a mechanism involving chloroplast accumulation (Suzuki et al. 2014). In a previous report, umc1062 was a flanking marker for a QTL for potassium (K) content in maize stalks (Tang et al. 2015), and K content is significantly correlated with lodging, through increased stem strength, and with rind thickness in maize (Melis and Farina 1984; Arnold et al. 1974). nc009, also associated with SD in this study, is tightly linked to the pl1 (purple plant 1) gene in the middle part of the long arm of chromosome 6 (bin 6.04) (http://www.maizeGDB.org). Moreover, nc009 is linked to QTL for leaf number (Zhang et al. 2011) and ear width (Peng et al. 2016) (Table 4).

The ER trait was linked with SSR marker phi092, which is also tightly linked to the ssu1 (ribulose bisphosphate carboxylase small subunit 1) or ssu2 (ribulose bisphosphate carboxylase small subunit 2) gene in the middle part of the long arm of chromosome 2 (bin 2.05) or 4 (bin 4.08), respectively (http://www.maizeGDB.org). The phi092 marker maps to regions of both chromosomes 2 and 4 in many separate mapping populations (http://www.maizeGDB.org). Although we did not check its exact chromosomal position in our study, phi092 is associated with QTL for oil/starch ratio (Guo et al. 2013), and QTL for kernel oil concentration on chromosome 2 (Song et al. 2004). This locus is also associated with QTL on chromosome 4 for growing degree units to anthesis, tassel branch number, and tassel length (Mayor 2008) (Table 4). Another SSR, umc1857, related to LW in this study, is linked to QTL for traits related to roots (Cai et al. 2012) and QTL for traits related to anthesis, silking interval (Chen et al. 2012), and maximum root length (Pan et al. 2011) on chromosome 6 (bin 6.04). Finally, umc1638 is linked to QTL for 20-kernel thickness (Liu et al. 2014). Thus, we found several SSR markers associated with different traits compared to other studies, suggesting potential pleiotropy or tight linkage of genes (Table 4).

In conclusion, this study characterized the GD and established the population structure of 105 maize accessions from the Korean RDA-Genebank PGR. We found relatively high diversity and two- or three-cluster population structures using two different methods, STRUCTURE and NTSYS. In addition, we found marker-trait associations that can assist marker-assisted selection (MAS) in breeding programs. We detected a total of five SMTAs in our association analysis. These loci may support the effective preservation and utilization of existing cultivars and help maize breeders to improve crop quality by MAS.