Introduction

Downy mildew (DM), which is caused by several species in the genera Peronosclerospora, Sclerospora, and Sclerophthora, is one of the most destructive maize diseases in subtropical and tropical regions of the world, including America, Asia, Africa, Europe, and Australia. In Asia, the main causal agent is Peronosclerospora sorghi (Weston and Uppal) C.G. Shaw (Sriwatanapongse et al. 1993; Jeffers et al. 2000; Raymundo 2000; George et al. 2003).

In tropical and subtropical lowland Asia, maize-growing areas report economic losses due to DM (Jeffers et al. 2000) that are occasionally as high as 75 % (Exconde and Raymundo 1974). Genetic resistance is a cost-effective and environmentally safe method for controlling DM. However, despite the use of DM-resistant cultivars and of metalaxyl fungicide as a seed treatment, the incidence of the disease is still severe in localized areas (Dalmacio 2000). New sources of locally adapted DM-resistant lines may alleviate this problem.

Studies of the genetic basis of DM resistance have been complicated by the polygenic nature of the trait and by the fact that additive effects contribute to resistance (Kaneko and Aday 1980; Singburandom and Renfro 1982; Borges 1987; De Leon et al. 1993). So far, linkage mapping has been the tool of choice for the identification of quantitative trait loci (QTL) that confer resistance to maize DM (Agrama et al. 1999; George et al. 2003; Nair et al. 2005; Sabry et al. 2006; Jampatong et al. 2008). Consequently, QTL from various genomic regions on chromosomes 1, 2, 6, 7, and 10 have been found to confer resistance to DM (George et al. 2003). Despite these discoveries, the QTL approach has some limitations, including high costs and poor resolution in defining QTL. Furthermore, when bi-parental crosses between inbred lines are used, only two alleles at any given locus can be studied at one time.

Recently, sets of cultivars, lines, or landraces have been used to identify marker-trait associations in plants. This method uses linkage disequilibrium (LD) between DNA polymorphisms and genes underlying agronomic traits of interest (Thornsberry et al. 2001; Flint-Garcia et al. 2005; Yu et al. 2006; Buckler and Gore 2007; Zhu et al. 2008) to identify useful markers. The use of association mapping or LD mapping has been broadened to plant studies, and many QTL have recently been identified and confirmed by this approach (Parisseaux and Bernardo 2004; Breseghello and Sorrells 2006; Stich et al. 2006; Agrama et al. 2007; Tommasini et al. 2007; Christopher et al. 2007; Skot et al. 2007; Crossa et al. 2007; Casa et al. 2008; Maccaferri et al. 2011). Gene mapping through association has several advantages over mapping in traditional biparental populations because it can precisely pinpoint the genomic region responsible for the expression of the target trait, and it has the potential to allow evaluation of a large number of alleles per locus (Buckler and Thornsberry 2002; Flint-Garcia et al. 2003, 2005). Gene mapping through association therefore appears to be a promising approach to overcome some of the limitations of conventional linkage mapping in plant breeding (Stich et al. 2005; Yu and Buckler 2006).

R genes confer resistance to pathogens that express matching avirulence genes in a gene-for-gene manner (Flor 1956, 1971). The largest class of known R genes includes those encoding nucleotide-binding site–leucine-rich repeat (NBS–LRR) proteins and receptor-like kinase enzymes. The major classes of R genes contain a highly conserved NBS domain adjacent to the N terminus and an LRR domain involved in host recognition of pathogen-derived elicitors. Responses to fungal, viral, and bacterial pathogen infections are mediated by genes encoding receptor proteins (Dangl 1995). Association mapping has been used to identify disease resistance genes in several crop species including sugarcane, maize, barley, and potato (Meyers 2003; Flint-Garcia et al. 2005; Yu and Buckler 2006; Wei et al. 2006; Malosetti et al. 2007; Stich et al. 2008; Murray et al. 2009; Inostroza et al. 2009). Genome scanning is also useful for identifying markers for DM resistance, and Phumichai et al. (2012) previously conducted an association mapping study of this trait. In the present study, candidate R genes were used to assess the extent of LD in the target population and to identify single nucleotide polymorphisms (SNPs) that significantly affect DM resistance.

Materials and methods

Plant material and phenotypic evaluation

The panel in this study consisted of 60 maize inbred lines (Table S1) supplied by two public-sector institutions and two private companies in Thailand. The National Corn and Sorghum Research Center (NCSRC-IICRD KU; Suwan Farm) and Nakhon Sawan Field Crop Research Center (NFR) supplied 17 and 15 inbred lines of field corn, respectively. Bangkok Seeds Industry and Sweet Seeds Company provided 15 inbred lines (7 field corn, 4 sweet corn, 2 waxy corn, 2 popcorn) and 13 inbred lines (11 sweet corn, 2 waxy corn), respectively. Although our sample was restricted to 60 inbred lines, its average genetic diversity and average number of alleles (0.7 and 10.1, respectively) were previously reported by Phumichai et al. (2012).

Artificial inoculation of DM in the field was described in detail in Phumichai et al. (2012) and below (S2). Field experiments were conducted at two locations: Nakhon Sawan Field Crop Research Center (NFR) (15°20′45″N, 100°29′4″E) and the National Corn and Sorghum Research Center, Inseechandrastitya Institute for Crop Research and Development (NCSRC-IICRD KU) (14°24′42″N, 101°25′18″E), Thailand. All maize inbred lines were laid out in a randomized complete block design with three replicates during the 2008 rainy season (May–July). Two-row plots, 5 m in length with 0.75 m row spacing, containing 42 plants per plot were planted using a hand jab planter. Methods for evaluation of disease resistance were described in detail in Phumichai et al. (2012) (S2).

DNA isolation and candidate genes sequencing

Genomic DNA was isolated from fresh young maize leaves using the modified cetyltrimethyl ammonium bromide (CTAB) method (Doyle and Doyle 1990). PCR primers were designed for the three candidate genes based on resistance genes previously identified on maize chromosomes (Table 1). PCR was carried out using 0.5 U/μl Pfu Taq polymerase (Fermentas), 10× PCR buffer (200 mM Tris–HCl pH 8.8, 100 mM (NH4)2SO4, 100 mM KCl, 1 % Triton X-100, 1 mg/ml BSA), 25 mM MgSO4, 1 mM dNTP, 10 μM each of the forward and reverse primers, and 20 ng template DNA. PCR cycles were conducted as follows: 1 cycle of 94 °C for 2 min; then 30 cycles of 94 °C for 30 s, 30 s at annealing temperature specific to each primer pair, and 72 °C for 1 min; followed by 1 cycle of 72 °C for 5 min on a PTC-225 Peltier Thermal Cycler (MJ Research, St. Bruno, Canada). The PCR products were separated by electrophoresis on 1 % (w/v) agarose gels in 1× TAE buffer at 50 V for 30 min, stained with ethidium bromide for 30 min, and visualized under UV light. PCR products were directly purified and sequenced on the ABI 3730XL DNA Analyzers at Macro Gen Company (Seoul, South Korea).

Table 1 Three candidate R genes and primers sequences used for association analysis

Candidate genes and LD analysis

Candidate R gene loci were chosen as the loci most closely linked to the three SSR loci reported in a previous DM disease association study (Phumichai et al. 2012). Three partial R genes from the chosen candidate genes, PIC15, PCO145579, and zmcf5, located on chromosomes 1, 2, and 9, respectively, were analyzed in this study (Table 1). Sequence analyses were performed using BLAST at the Maize Genetics and Genomics Database (MaizeGDB; http://blast.maizegdb.org/home.php?a=BLAST_UI) and BLASTN (Altschul et al. 1997) at the National Center for Biotechnology Information (NCBI; http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch). The partial sequences used here were assembled using MEGA version 5 software (Tamura et al. 2011). Assembled sequences were then aligned using ClustalW (Chenna et al. 2003). Sequences were manually checked using Sequence Scanner Software v 1.0 (Applied Biosystems 2005). For these three candidate genes, the number of polymorphic sites (S), haplotypes, haplotype diversity, pairwise nucleotide diversity (π), nucleotide polymorphism (θ) and Tajima’s D were computed using DnaSP version 5.1 (Librado and Rozas 2009). LD between pairs of polymorphic sites in each of these three genes was estimated using TASSEL stand-alone software version 3.1 (Bradbury et al. 2007). LD was estimated using squared allele-frequency correlations (r²). The significance of LD was tested using Fisher’s exact test.
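
As an illustration of the LD statistics described above, the following minimal Python sketch computes the squared allele-frequency correlation between two biallelic sites and a two-sided Fisher's exact p value from the 2 × 2 haplotype table. It is not the TASSEL implementation. It assumes that each inbred line is fully homozygous (one allele per line per site), that both sites are biallelic, and that there are no missing data. The function names are hypothetical.

```python
from math import comb


def haplotype_table(site_a, site_b):
    """Count the four two-site haplotypes; the first line's alleles are the reference."""
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    a_ref, b_ref = site_a[0], site_b[0]
    n11 = n12 = n21 = n22 = 0
    for x, y in zip(site_a, site_b):
        if x == a_ref and y == b_ref:
            n11 += 1
        elif x == a_ref:
            n12 += 1
        elif y == b_ref:
            n21 += 1
        else:
            n22 += 1
    return n11, n12, n21, n22


def r_squared(site_a, site_b):
    """Squared allele-frequency correlation: D^2 / (pA(1-pA) pB(1-pB))."""
    n11, n12, n21, n22 = haplotype_table(site_a, site_b)
    n = n11 + n12 + n21 + n22
    p_a = (n11 + n12) / n
    p_b = (n11 + n21) / n
    d = n11 / n - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


def fisher_exact_p(site_a, site_b):
    """Two-sided Fisher's exact test on the 2 x 2 haplotype table."""
    n11, n12, n21, n22 = haplotype_table(site_a, site_b)
    n = n11 + n12 + n21 + n22
    row1, col1 = n11 + n12, n11 + n21

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell.
        return comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)

    p_obs = prob(n11)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
```

For example, `r_squared("AAAAGGGG", "TTTTCCCC")` describes two sites in complete LD across eight hypothetical lines, whereas `r_squared("AAGG", "TCTC")` describes two independent sites.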

Association analysis and narrow-sense heritability

Analyses of associations between nucleotide polymorphisms and phenotypic values for DM resistance were performed using the general linear model (GLM) and mixed linear model (MLM) functions in TASSEL stand-alone version 3.1 (Bradbury et al. 2007). Principal component analysis (PCA), generated in TASSEL stand-alone version 3.1 (Bradbury et al. 2007), was used in place of the population structure matrix (Q) (Price et al. 2006; Yu et al. 2006; Zhao et al. 2007; Zhang et al. 2009). PCA can be used to infer population structure, and the ideal number of principal components can be identified as the component number close to the elbow of a scree plot of eigenvalues (y axis) against the number of components (x axis) (Linting et al. 2007). The elbow should occur near the last component that contributes significant variation. The kinship matrix for all 60 maize inbred lines was also calculated from the polymorphic sites using the kinship analysis in TASSEL stand-alone version 3.1 (Bradbury et al. 2007). To control the type I error associated with multiple comparisons, a permutation test with 1,000 permutations was applied in TASSEL stand-alone version 3.1 (Bradbury et al. 2007) to generate robust p values for the association analysis of polymorphic sites. The statistical power of the study was calculated using the GWAPower program, which is designed for genome-wide association (GWA) studies of quantitative traits and defines the genetic effect as heritability (Feng et al. 2011).
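
The idea behind a permutation-based p value for a single marker can be sketched as follows. This is a simplified illustration and not the TASSEL procedure: it omits the PCA covariates and the kinship matrix, uses the one-way ANOVA R² as the test statistic, and the function names and the fixed random seed are assumptions made for the example.

```python
import random


def marker_r2(genotypes, phenotypes):
    """Share of phenotypic variance explained by allele class (one-way ANOVA R^2)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    ss_total = sum((y - grand_mean) ** 2 for y in phenotypes)
    if ss_total == 0:
        raise ValueError("phenotypes must vary")
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    ss_between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    return ss_between / ss_total


def permutation_p(genotypes, phenotypes, n_perm=1000, seed=0):
    """Empirical p value: how often shuffled phenotypes match or beat the observed R^2."""
    rng = random.Random(seed)
    observed = marker_r2(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype link
        if marker_r2(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

With 1,000 permutations the smallest attainable p value is 1/1001, so the number of permutations sets the resolution of the test.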

Marker-based narrow-sense heritability (h²) is defined as the proportion of genetic variance in the total variance. It was calculated from the restricted maximum likelihood (REML) estimates of V_a and V_e obtained using TASSEL stand-alone version 3.1 (Bradbury et al. 2007) as h² = V_a/(V_a + V_e), where V_a is the genetic variance and V_e is the residual variance.
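
The ratio can be written as a short helper to make explicit that the denominator is the sum of the genetic and residual variances. This is only a sketch: the function name is hypothetical, and the variance components passed to it would come from the REML fit, which is not reproduced here.

```python
def narrow_sense_h2(v_a, v_e):
    """Marker-based narrow-sense heritability: V_a / (V_a + V_e)."""
    if v_a < 0 or v_e < 0:
        raise ValueError("variance components must be non-negative")
    total = v_a + v_e
    if total == 0:
        raise ValueError("total variance must be positive")
    return v_a / total
```

With illustrative (not measured) components V_a = 0.33 and V_e = 0.67, `narrow_sense_h2(0.33, 0.67)` gives a heritability of about 0.33.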

Results

Nucleotide diversity

The resulting alignments of PIC15, PCO145579, and zmcf5 from the set of 60 maize accessions were 587, 712, and 428 bp in length, respectively (Table 2). Considering all three genes in this study, the average number of polymorphic sites (S) was 43.7, average haplotype diversity was 0.738, and average nucleotide diversity was 0.015. In addition, departure from neutrality was evaluated using Tajima’s D test in DnaSP version 5.1. Tajima’s D values were negative for all three genes but significant only for the partial zmcf5 gene (Table 2).
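
For readers unfamiliar with these summary statistics, the sketch below shows how S, π, Watterson's θ, and Tajima's D can be obtained from an alignment, using the standard formulas of Tajima (1989). It is an illustration only and not the DnaSP implementation. It assumes equal-length, gap-free sequences with no missing data and at least four sequences, and the function name and returned keys are hypothetical.

```python
from math import sqrt


def diversity_stats(seqs):
    """Return S, per-site pi, per-site Watterson's theta, and Tajima's D."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed for Tajima's D")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and equal in length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    pair_diffs = 0.0
    for column in zip(*seqs):
        counts = {}
        for base in column:
            counts[base] = counts.get(base, 0) + 1
        if len(counts) > 1:
            seg_sites += 1
            # Number of sequence pairs that differ at this site.
            pair_diffs += (n * n - sum(c * c for c in counts.values())) / 2
    k = pair_diffs / n_pairs  # mean pairwise differences per sequence

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / a1  # Watterson's estimator per sequence

    tajima_d = None
    if seg_sites > 0:
        b1 = (n + 1) / (3 * (n - 1))
        b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
        c1 = b1 - 1 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
        e1 = c1 / a1
        e2 = c2 / (a1 ** 2 + a2)
        variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
        tajima_d = (k - theta_w) / sqrt(variance)

    return {
        "S": seg_sites,
        "pi": k / length,
        "theta": theta_w / length,
        "tajima_d": tajima_d,
    }
```

A negative D arises when π is smaller than θ, that is, when the sample contains an excess of low-frequency variants relative to the neutral expectation.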

Table 2 Summary of nucleotide polymorphism and nucleotide diversity in candidate genes PIC15, PCO145579, and zmcf5

Linkage disequilibrium

Linkage disequilibrium was estimated between all pairs of polymorphic sites in the partial sequences of the PIC15, PCO145579, and zmcf5 genes using TASSEL stand-alone 3.1. Average r² values for PIC15, PCO145579, and zmcf5 were 0.28, 0.29, and 0.05, respectively. The PIC15 sequence displayed two blocks of polymorphism (from position 238 to 279, and from position 418 to 455) that showed strongly significant LD estimates, with r² greater than 0.8 (p < 0.0001). The PCO145579 sequence displayed three blocks of polymorphism (from position 79 to 148, from position 312 to 369, and from position 383 to 608) with strongly significant LD estimates, with r² greater than 0.7 (p < 0.0001). In contrast, only slightly significant LD was observed between a few polymorphic sites in the zmcf5 partial sequence (r² ≤ 0.3, p < 0.01) (Fig. 1).

Fig. 1 Linkage disequilibrium estimates for partial sequences of the PIC15, PCO145579, and zmcf5 genes in 60 maize inbred lines. Lower left triangle: p values derived from Fisher’s exact test; upper right triangle: r² values

Association analysis with candidate genes and narrow-sense heritability

Analysis of association between these three candidate R genes and phenotypic traits was performed using GLM incorporating PCA, and MLM incorporating both PCA and relative kinship (K) (PCA+K), in TASSEL stand-alone 3.1. PCA of polymorphic sites from these three candidate genes in 60 maize inbred lines revealed that the top three axes explained 55.6 % of the variation for these three R genes in this panel. PCA was used to remove population effects such as geographic origin or diversifying selection (Price et al. 2006) from the analysis of associations between genotypes (PIC15, PCO145579, and zmcf5) and phenotypic traits related to DM resistance. Significant associations between polymorphisms in the partial PIC15 and PCO145579 genes and DM resistance were found, while no such association was identified for zmcf5 (Table 3).

Table 3 Polymorphic sites of the partial PIC15 and PCO145579 genes significantly associated with DM resistance identified by GLM incorporating PCA, and MLM including PCA and K

One SNP in the partial PIC15 gene was significantly associated with DM resistance at both the NCSRC-IICRD KU and NFR experimental locations, with r² values of 16.4 and 7 %, respectively (Table 3). The SNP, located in the exon at position 267 of the PIC15 partial gene sequence, was a nucleotide transition from A to G. The A allele was detected in 36 maize inbred lines, of which 21 (58 %) showed resistance to DM. Another SNP, detected in the PCO145579 partial gene sequence, was associated with DM resistance at the NCSRC-IICRD KU experimental location with an r² value of 7 %, while three more SNPs showed association with DM resistance at the NFR location with r² values of 6 % (Table 3). The haplotype (TTGT) formed by these SNPs in the partial PCO145579 gene contributed 64 % of phenotypic variation for DM resistance. Statistical power was calculated with the GWAPower program (Feng et al. 2011) using the broad-sense heritability reported by Phumichai et al. (2012). The simulation indicated that the current sample size (n = 60) provides 97.7 % power to detect the studied SNPs. Marker-based narrow-sense heritability (h²) for these 60 maize inbred lines was calculated for each experimental location using the formula described above, by dividing the genetic variance by the total phenotypic variance obtained from the MLM incorporating PCA and K. Narrow-sense heritabilities were 33 and 29 % at the NCSRC-IICRD KU and NFR experimental locations, respectively (Table 3).

Discussion

The average nucleotide diversity of these genes was 0.015, close to the nucleotide diversity of 0.018 for the non-synonymous maize hm1 disease resistance gene (Zhang et al. 2002). Tajima’s D values were negative and were significant only for zmcf5 (Table 2). The negative value for Tajima’s D in our study, particularly the significant negative value for zmcf5, may be due to purifying selection, which results in a few alleles predominating but most other alleles occurring at low frequencies. Examples of purifying selection have been detected for several plant NBS disease resistance domains (McHale et al. 2006). In the case of lettuce, the Type I RGC2 genes were identified as having undergone diversifying selection, while Type II RGC2 genes were identified as having undergone purifying selection (Kuang et al. 2004). In addition, population studies have shown that balancing selection maintains polymorphism at R gene loci (Meyers et al. 2005). Therefore, balancing, diversifying, and purifying selection may all play roles in the evolution of a particular R gene cluster.

Using mean r² = 0.2 as a cutoff point for estimation, the extent of LD for all three of these candidate R genes was ~200 bp in these 60 public and private maize inbred lines (S3). The LD estimator r² ranged from 0.05 to 0.29. A low level of LD was observed in the partial sequence of the zmcf5 gene (r² = 0.05). Many factors, such as the origin of populations, the choice of populations for analysis, the particular genomic region analyzed, high rates of recombination or mutation, and subdivision of populations, can affect LD. In maize, genome-wide LD decay values have been shown to be in the range of ~200–1500 bp (Remington et al. 2001; Tenaillon et al. 2001). Su et al. (2010) reported that LD of the naked and rab28 drought tolerance genes in maize varied by more than 700 bp. The maize disease resistance gene, glutathione S transferase (GST), exhibited rapid decay of LD in the range of 1 or 2 kb (Wisser et al. 2011) when analyzed in 253 maize inbred lines. Thus, the level of variability in LD in the three candidate R genes in this study could be due to the germplasm origins of these maize inbred lines or to low selection pressure on these genes during the breeding histories of these lines. In addition, the limited partial gene fragments analyzed here might not sufficiently describe the overall patterns of LD in these candidate R genes. Therefore, full-length candidate R genes should be analyzed to better describe patterns of LD among maize DM resistance genes.
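
One simple way to read an LD extent off pairwise estimates with a mean-r² cutoff is sketched below. It is only an assumption about how such a cutoff might be applied: the exact procedure used for S3 is not described in the text, and the function name, the 100 bp bin width, and the example values are hypothetical.

```python
def ld_extent(pairs, cutoff=0.2, bin_width=100):
    """Return the start (bp) of the first distance bin whose mean r^2 is below cutoff.

    pairs: iterable of (distance_bp, r2) tuples for pairs of polymorphic sites.
    Returns None if the mean r^2 never falls below the cutoff.
    """
    bins = {}
    for distance, r2 in pairs:
        bins.setdefault(distance // bin_width, []).append(r2)
    for index in sorted(bins):
        values = bins[index]
        if sum(values) / len(values) < cutoff:
            return index * bin_width
    return None


# Hypothetical pairwise estimates, for illustration only.
example_pairs = [
    (10, 0.9), (50, 0.7),      # 0-99 bp: mean 0.80
    (120, 0.4), (180, 0.3),    # 100-199 bp: mean 0.35
    (210, 0.1), (290, 0.15),   # 200-299 bp: mean 0.125
    (350, 0.05),
]
extent = ld_extent(example_pairs)
```

The result depends on the bin width and on how many site pairs fall into each bin, so estimates from short gene fragments should be interpreted with caution.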

Most plant disease resistance genes encode proteins containing NBS and LRR domains, which have been identified in several plant species (McHale et al. 2006). The NBS–LRR domains are involved in the detection of diverse plant pathogens, including bacteria, viruses, fungi, nematodes, insects, and oomycetes (McHale et al. 2006).

Association analysis of partial sequences of three candidate R genes detected five SNPs associated with DM resistance in the PIC15 (maize chromosome 1) and PCO145579 (maize chromosome 2) genes. The r² values of these SNPs ranged from 6 to 16.4 %. The highest r² value was found for the PIC15 SNP at the NCSRC-IICRD KU location (16.4 %), while the SNPs in the PCO145579 gene showed smaller r² values. The A allele SNP from the PIC15 gene contributes 58 % of the phenotypic variation in DM resistance, while the TTGT haplotype from the PCO145579 gene contributes 64 % of the phenotypic variation in DM resistance. Phenotypic contribution information is typically reported in association analysis to estimate the efficiency of detecting associations between traits and particular SNPs (Thornsberry et al. 2001). In soybean, a SNP in the F3H gene appeared in nine of 12 accessions (75 %) that were susceptible to mosaic virus strain SC-7 (Cheng et al. 2010). The lettuce dieback resistance gene, Cntg10192, contains three SNPs, two of which associate perfectly with the resistance allele, while the third SNP explains 40.9 % of the variation in the resistance trait, which was due to variation in association among lines of different origin (Simko et al. 2009).

Broad-sense heritability was previously estimated as the genetic component from ANOVA (H² = 97 %) in Phumichai et al. (2012). The lower marker-based narrow-sense heritability (h² = 29–33 %) for DM resistance in this study was expected, because dominance or epistasis may have caused discrepancies among variance components. The low narrow-sense heritability may also be due to the analysis of only a few candidate R gene sequences of limited length. Preliminary studies of the inheritance of DM resistance in corn by Gomes et al. (1963) showed that a few partially dominant genes controlled resistance. However, genetic analysis of DM resistance using composite interval mapping (Jampatong et al. 2008) successfully detected QTL with additive and dominant genetic effects, including partial dominance, dominance, and overdominance.

In this study, the significant SNP in the PIC15 partial gene sequence could be detected using either GLM with PCA or MLM with PCA+K. In addition, the A allele of PIC15 and the TTGT haplotype of PCO145579 also showed strongly significant LD estimates, with r² greater than 0.8 (p < 0.0001) and 0.7 (p < 0.0001), respectively. Yu et al. (2006) reported that the Q+K model controls the number of false associations better than tests using either the K or the Q model alone. These results indicate that these five SNPs from chromosomes 1 (PIC15) and 2 (PCO145579) will be good candidate SNPs to evaluate for use as functional molecular markers for DM resistance in breeding programs.

There were two limitations in the panel used for the GWA. First, the number of inbred lines was small (n = 60), which could weaken the power of the association analysis and could result in some loci being missed, especially those with small effects. Yang et al. (2010) reported that a panel of 155 inbred lines could detect 59.2 and 87.6 % of the quantitative genes explaining 5 and 10 % of phenotypic variation, respectively. Second, compared with larger populations, a small population may introduce a high frequency of spurious associations when population structure and familial relatedness are estimated, and may be inadequate to identify QTL with low minor allele frequencies (Wang et al. 2012). Bradbury et al. (2011) showed that population sizes of 300 were sufficient to detect QTL for traits with moderate to high heritability (0.75–1.0); however, as the number of QTL increased, the larger-effect QTL were detected but not the smaller-effect QTL. Power was calculated with the GWAPower program (Feng et al. 2011). When power was simulated with low (0.1) and high (0.97) heritability at the same sample size (n = 60), the power to detect the studied SNPs was 28.4 and 97.7 %, respectively. Therefore, to reach the power simulated under high heritability (0.97), the sample size would need to increase to 277 under low heritability. Alternatively, combining association analysis with bi-parental linkage analysis could help to identify the true genetic variants for these traits, as the covariance between genotypes and phenotypes can be broken up by generating controlled crosses (Yang et al. 2010). In addition, increasing the size of the germplasm panel and accounting for population structure to reduce the frequency of spurious associations may make this panel more useful for identifying the genetic factors associated with many traits (Yang et al. 2010).

Because of the small population, the association analysis in this study could detect only major alleles for traits with high heritability. In addition, association mapping may suffer from false positives caused by population structure within the germplasm (Stich and Melchinger 2009). Spurious associations found in a population can be corrected by using statistical models that account for population structure (Q matrix) and kinship (K matrix) to reduce false-positive associations (Yu et al. 2006; Price et al. 2006). Zhang et al. (2013) compared 16 different models for the reduction of false-positive associations and found that the MLM (PCA+K) model gave the best reduction in false-positive frequency in a maize panel.

Therefore, the results of this study not only provide basic genetic information useful for the development of functional molecular markers, but also contribute important information for further research to validate the polymorphic sites by bi-parental mapping and to increase maize population sizes to enhance the power of future association analyses.