Introduction

Maize (Zea mays L.) is widely used as a forage crop in European agriculture. During recent decades, breeding efforts have led to a substantial increase in the whole plant yield, facilitated by improved maize stalk standability, and stalk rot and lodging resistance (Barrière et al. 2004). However, during the same period of time there has been a steady decrease in cell wall digestibility, and, consequently, in feeding value of elite maize hybrids (Barrière et al. 2005).

Cell wall digestibility in forage crops is influenced by both lignin content and lignin structure (reviewed by Barrière et al. 2003). The first step in lignin biosynthesis in plants is the deamination of l-phenylalanine by Phenylalanine Ammonia-Lyase (PAL) to cinnamic acid. PAL also catalyzes the first step of several other phenylpropanoid pathways, leading to the formation of a variety of secondary metabolites (reviewed by Winkel 2004). A full-length PAL cDNA has been isolated from maize and the encoded enzyme has been shown to catalyze the deamination of both l-phenylalanine and l-tyrosine (Rosler et al. 1997). Successive enzymatic steps in the monolignol pathway lead to the formation of three monolignols (p-hydroxycinnamyl alcohols): p-coumaryl-, coniferyl-, and sinapyl alcohols from which p-hydroxyphenyl- (H), guaiacyl- (G), and syringyl units (S), respectively, are derived. Subsequently, G, S, and H undergo polymerization by oxidases to form lignin (reviewed by Boerjan et al. 2003; Grabber et al. 2004). Not unexpectedly, given that PAL is the first enzyme in lignin biosynthesis, impaired expression of PAL results in defective lignin formation in tobacco (Nicotiana tabacum L.) and Arabidopsis (Sewalt et al. 1997; Raes et al. 2003; Rohde et al. 2004).

The brown-midrib (bm) mutants of maize are characterized by decreased lignin content, altered cell wall composition, and a brown-reddish color of leaf midribs. Of the four known bm mutants, bm3 exhibits the strongest effect on plant phenotype, and several feeding studies in dairy cattle have shown the positive impact of bm3 mutants on intake and digestibility of forage maize (reviewed by Barrière et al. 2003). This phenotype is caused by a knock-out mutation in the caffeic acid O-methyl transferase (COMT) gene (Collazo et al. 1992; Vignols et al. 1995; Morrow et al. 1997). However, this phenotype also results in inferior agronomic performance such as lodging and lower biomass yield, restricting the use of bm3 mutants in maize hybrid breeding programs (Ballard et al. 2001; Cherney et al. 1991; reviewed by Pedersen et al. 2005). Thus, the characterization of genetic diversity in other genes involved in lignin biosynthesis could facilitate identification of allelic variation more applicable to breeding programs.

Recently, reports have emerged on nucleotide diversity and extent of linkage disequilibrium (LD) at the COMT locus in maize genotypes currently employed in breeding programs (Fontaine and Barrière 2003; Guillet-Claude et al. 2004a; Zein et al. 2006). Knowledge of the extent of LD is relevant when estimating the marker saturation necessary for high-resolution association analysis at a given locus. Following the pioneering study in plants, associating individual Dwarf8 polymorphisms with flowering time in maize (Thornsberry et al. 2001), association analysis has been applied in a number of species (reviewed by Gupta et al. 2005). In maize, association analyses have been employed to associate individual candidate gene polymorphisms to phenotypic variation in flowering time, endosperm color, starch production, and maysin and chlorogenic acid accumulation (Palaisa et al. 2003; Wilson et al. 2004; Andersen et al. 2005; Szalma et al. 2005). In addition, associations between cell wall digestibility and genetic variation in COMT and other “lignin genes” have been reported (Guillet-Claude et al. 2004a, b; Lübberstedt et al. 2005). Thus, by association analysis causative polymorphisms can be identified from which functional markers can be derived (Andersen and Lübberstedt 2003).

Further characterization of genes affecting fodder quality would facilitate targeted identification, maintenance, and utilization of genetic variation for this trait in maize breeding lines. Three putative PAL genes have been mapped to unlinked positions in the maize genome (http://www.maizegdb.org) by RFLP based on a partial PAL cDNA sequence (GenBank no. M95077; Keith et al. 1993). Thus, it is likely that several PAL genes are present in maize, as is the case in Arabidopsis where four PAL genes have been identified (Raes et al. 2003). The genomic sequence analyzed in the present study has been obtained based on a full-length cDNA sequence (GenBank no. L77912) identified by Rosler et al. (1997), and is throughout the text denoted PAL. Given that PAL is the first enzyme in the biosynthesis of lignin and the phenotypic consequences of impairing PAL expression in tobacco and Arabidopsis, the aims of the present study were to (1) examine the sequence diversity at the PAL locus in European inbred lines of maize, (2) study the extent of LD at this locus in these lines, and (3) test associations between individual PAL polymorphisms and four different phenotypic traits related to forage quality.

Materials and methods

Plant materials and phenotypic analyses

A collection of 32 maize inbred lines consisting of 19 Flints and 13 Dents was included in the analysis. Twenty-nine lines were elite inbreds from the current breeding program of KWS Saat AG and three lines were from the public domain (AS01, AS02, and AS03 identical to F7, F2, and EP1, respectively; Table 1). The 32 lines were selected based on digestibility of neutral detergent fiber (DNDF) values to represent a broad range of variability for this trait in central European germplasm employed in forage maize breeding. The included lines were derived from several breeding populations of Flint and Dent, respectively, and are not related apart from lines AS20 and AS21, which are an isogenic line pair included in the analysis based on contrasting DNDF values. The inbred lines were evaluated in Grucking (sandy loam) in 2002, 2003, and 2004, and in Bernburg (sandy loam) in 2003 and 2004. The experiments included 49 entries in a 7 × 7 lattice design with two replications. Plots consisted of single rows, 0.75 m apart and 3 m long with a total of 20 plants. About 50 days after flowering, the ears were manually removed and the stover was chopped. Approximately 1 kg of the material was collected and dried at 40°C after which the stover was ground to pass through a 1 mm sieve. Quality analyses were performed with near infrared reflectance spectroscopy (NIRS) based on previous calibrations on the data of 300 inbred lines (unpublished results). The following data were recorded: % water soluble carbohydrates (WSC) (Luff Schoorl 1929), % neutral detergent fiber (NDF) (VanSoest 1963), in vitro digestibility in % of organic matter (IVDOM) (Tilley and Terry 1963), and DNDF given by the formula DNDF = 100 − (100 − IVDOM)/(NDF × DM/OM/100), where DM is dry matter content and OM is organic matter content of the sample (Tilley and Terry 1963; VanSoest 1963).
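
As a worked illustration of the DNDF formula, the following Python sketch evaluates it for a single sample. The function name and the input values are hypothetical and are not taken from the study; only the formula itself follows the text.

```python
def dndf(ivdom, ndf, dm, om):
    """Digestibility of neutral detergent fiber (%), per the formula in the text.

    ivdom  : in vitro digestibility in % of organic matter
    ndf    : neutral detergent fiber in %
    dm, om : dry matter and organic matter content (same units)
    """
    if ndf <= 0 or dm <= 0 or om <= 0:
        raise ValueError("ndf, dm and om must be positive")
    # NDF rescaled by DM/OM and converted from percent to a fraction
    ndf_fraction = ndf * dm / om / 100
    return 100 - (100 - ivdom) / ndf_fraction


# Hypothetical sample: IVDOM 70 %, NDF 60 %, DM equal to OM
example = dndf(70.0, 60.0, 1.0, 1.0)
print(round(example, 2))
```

With DM equal to OM, the undigested 30% of organic matter is attributed entirely to the 60% NDF fraction, so half of the fiber counts as digested.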

Table 1 Phenotypic means for individual lines across five environments, overall- and within-pool phenotypic means, variance components (Var. cp.) and their significance levels, and least significant difference between lines (LSD) for four quality-related traits recorded for 32 lines across five environments

DNA isolation, PCR amplification, and DNA sequencing

Plants were grown for DNA isolation in the greenhouse and leaves were harvested at 3 weeks after germination. Genomic DNA was extracted from the leaves using the Maxi CTAB method (Saghai-Maroof et al. 1984). DNA templates for sequencing were obtained by polymerase chain reaction (PCR) to produce two overlapping fragments using Taq DNA polymerase and primers based on the sequence of a full-length PAL cDNA of maize (GenBank no. L77912; Rosler et al. 1997). The combination of the forward and reverse primers PAL_F1 (5′- ACT CCT CCG GCT CTT CTT CTC) and PAL_R1 (5′- CTT GTG GGT CAG GTG GTC CGT) produced a 2,200 bp fragment spanning the full first exon, the intron, and the 5′ end of the second exon. The combination of the forward and reverse primers PAL_F2 (5′- CGC CGA GGC GTT CAA GAT C) and PAL_R2 (5′- GTG GCA GGG CAC AGC TAC) produced a 1,640 bp fragment, overlapping with the PAL_F1PAL_R1 fragment, spanning the second exon, and part of the 3′-UTR.

DNA amplification was performed in a 50 μl reaction mixture containing 20 ng genomic DNA, primers (200 nM), dNTPs (200 μM), 1 M betaine, and two units of Taq polymerase (Peqlab, Erlangen, Germany). Touchdown PCR was applied as follows: an initial denaturation step at 95°C for 2 min; seven amplification cycles of 45 s at 95°C, 45 s at 68°C (minus 1°C per cycle), and 2 min at 72°C; 30 amplification cycles of 30 s at 94°C, 45 s at 60°C, and 2 min at 72°C; and a final extension step at 72°C for 10 min. Products were separated by gel electrophoresis on 1.5% agarose gels, visualized by ethidium bromide staining, and photographed using an Eagle Eye apparatus (Herolab, Wiesloch, Germany).

Fragments were purified using QiaQuick spin columns (Qiagen, Valencia, USA) according to the manufacturer’s instructions, and sequenced directly using internal sequence specific primers and the Big Dye1.1 dye-terminator sequencing kit on an ABI 377 (PE Biosystems, Foster City, USA). Electropherograms of overlapping sequencing fragments were manually edited using the software package Sequence Navigator version 1.1 from PE Biosystems. Final full alignment was built up using default settings of the Clustal program version 1.8 (Thompson et al. 1994) followed by manual refinement to minimize the number of gaps.

Analysis of sequence data

DNA sequences were analyzed for the complete sample and within individual subpopulations (Flint and Dent). DnaSP Version 4.10 (Rozas et al. 2003) was applied for the analysis. Two estimates of diversity, π and θ, were calculated. π is the average number of nucleotide differences per site between two sequences (Nei 1987), and θ is derived from the total number of segregating sites and corrected for sampling size (Watterson 1975). These estimates were based on single nucleotide polymorphisms only.
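
The two diversity estimates can be illustrated with a small Python sketch. The function names and the toy alignment are hypothetical, and skipping alignment columns that contain gaps is an assumption made here for simplicity; DnaSP may treat gaps differently.

```python
from itertools import combinations


def usable_columns(seqs):
    """Indices of alignment columns without gaps (assumption: gap columns are skipped)."""
    return [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]


def segregating_sites(seqs):
    """Number of usable columns that carry more than one nucleotide."""
    return sum(1 for i in usable_columns(seqs) if len({s[i] for s in seqs}) > 1)


def nucleotide_diversity(seqs):
    """pi: average number of pairwise nucleotide differences per site."""
    cols = usable_columns(seqs)
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a[i] != b[i] for i in cols) for a, b in pairs)
    return total / len(pairs) / len(cols)


def watterson_theta(seqs):
    """theta: segregating sites per site, corrected for sample size."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))  # harmonic correction for n sequences
    return segregating_sites(seqs) / a1 / len(usable_columns(seqs))


# Toy alignment of four sequences, not data from the study
toy = ["AAAA", "AAAT", "AATT", "AATT"]
print(nucleotide_diversity(toy), watterson_theta(toy))
```

In the toy alignment, two of four sites segregate, giving π = 7/24 and θ = 3/11 per site.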

To test for neutrality of mutations, Tajima’s D statistic (Tajima 1989), and Fu and Li’s D* and F* statistics (Fu and Li 1993) were applied. These statistics are based on different comparisons of Θ = 4N_eμ, where N_e equals the effective population size and μ the mutation rate (Watterson 1975). Tajima’s D results from the comparison of Θ based on the number of pair-wise differences and the number of segregating sites between sequences in the sample. Fu and Li’s D* and F* result from comparisons of Θ based on the number of singletons and the number of either segregating sites (D*) or pair-wise differences (F*). The minimum number of recombination events between pairs of non-overlapping SNPs was determined using the four-gamete test (Hudson and Kaplan 1985).
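
The comparison underlying Tajima’s D can be sketched as follows, using the standard normalizing constants from Tajima (1989). The function name and the example inputs are hypothetical; significance testing, as performed by DnaSP, is not covered here.

```python
import math


def tajimas_d(n, seg_sites, mean_pairwise_diff):
    """Tajima's D from sample size n, number of segregating sites S,
    and the mean number of pairwise differences k (all per locus)."""
    if n < 4 or seg_sites < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimates of theta, scaled by its standard deviation
    numerator = mean_pairwise_diff - seg_sites / a1
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return numerator / math.sqrt(variance)


# Hypothetical inputs: 10 sequences, 5 segregating sites, 3 pairwise differences on average
print(tajimas_d(10, 5, 3.0))
```

A positive value reflects an excess of intermediate-frequency variants (pairwise differences exceed the expectation from segregating sites), and a negative value an excess of rare variants.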

LD between pairs of polymorphic sites (SNPs and insertion/deletion polymorphisms (indels), excluding singletons) in PAL was estimated by the TASSEL software, version 1.9.0 (Thornsberry et al. 2001; http://www.maizegenetics.net/bioinformatics/tasselindex.htm). Various measurements for LD have been developed (reviewed by Gaut and Long 2003) of which squared allele frequency correlations (r²) (Weir 1996) were chosen for our calculations. The significance of LD between sites was tested by Fisher’s exact test.
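
For two biallelic sites scored in inbred (homozygous) lines, r² and a two-sided Fisher’s exact test can be sketched as below. This is an illustrative standard-library implementation, not the TASSEL code; the function names and toy genotypes are hypothetical.

```python
from math import comb


def r_squared(site_a, site_b):
    """Squared allele-frequency correlation between two biallelic sites.
    site_a, site_b: one allele per line, in the same line order."""
    n = len(site_a)
    ref_a, ref_b = site_a[0], site_b[0]
    p_a = sum(x == ref_a for x in site_a) / n
    p_b = sum(y == ref_b for y in site_b) / n
    p_ab = sum(x == ref_a and y == ref_b for x, y in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 haplotype table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as unlikely as the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Toy data: six lines in complete LD at two sites
print(r_squared("AAATTT", "CCCGGG"), fisher_exact_two_sided(3, 0, 0, 3))
```

The toy example shows why small samples limit the test: even with complete LD (r² = 1) across six lines, the exact P-value is only 0.1.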

For the phylogenetic analysis of allele sequences, the MEGA software version 3 (Kumar et al. 2004) was used with default settings. Bootstrapping, based on 1,000 replications of the dataset, was performed to test phylogenies.

Population structure and association analysis

All lines were genotyped with 101 simple sequence repeat markers (SSRs) providing an even coverage of the maize genome. The employed SSR markers are publicly available (http://www.maizegdb.org/ssr.php). Population structure was inferred from SSR data by using the Structure 2.0 software (Pritchard et al. 2000; Falush et al. 2003). Structure applies a Bayesian clustering approach to identify subpopulations, each modeled by a characteristic set of allele frequencies, in this case based on genotyping data from 101 SSRs. The procedure assigns individuals to these populations, while simultaneously estimating the population allele frequencies. Structure produces a Q matrix that lists the estimated membership coefficients for each individual in each cluster. The Admixture model was applied. A burn-in length of 50,000 followed by 50,000 iterations was used (see the Structure 2.0 documentation at http://www.pritch.bsd.uchicago.edu/).

The estimated Q matrices were used in the subsequent association analysis carried out in TASSEL. This software applies a logistic regression ratio test to calculate whether the likelihood of the candidate gene distribution (in this case PAL polymorphisms) is associated with either (1) population structure and phenotypic variation or (2) population structure only. The test statistic (Λ), the ratio between these two likelihoods, indicates associations between individual polymorphisms and traits, in this case four quality-related traits. Mean phenotypic values across five environments (Table 1) were applied for the association analysis. In addition, the general linear model (GLM) analysis in TASSEL was employed to identify associations, not considering population structure. All PAL polymorphisms (including singletons) were tested and the P-value for individual polymorphisms was estimated based on 1,000 permutations of the dataset, both for GLM and logistic regression.
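
The idea of a permutation-based P-value can be sketched with a deliberately simplified statistic: the absolute difference in mean phenotype between the two allele classes of one polymorphism. This is not the logistic regression or GLM procedure implemented in TASSEL and it ignores population structure; the function name, the seed, and the toy data are hypothetical.

```python
import random


def permutation_p_value(phenotypes, alleles, n_perm=1000, seed=0):
    """Permutation P-value for a difference in mean phenotype between
    lines carrying allele 0 and lines carrying allele 1."""
    if set(alleles) != {0, 1}:
        raise ValueError("alleles must contain both 0 and 1, and nothing else")

    def abs_mean_diff(values):
        g0 = [v for v, a in zip(values, alleles) if a == 0]
        g1 = [v for v, a in zip(values, alleles) if a == 1]
        return abs(sum(g0) / len(g0) - sum(g1) / len(g1))

    observed = abs_mean_diff(phenotypes)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between phenotype and genotype
        if abs_mean_diff(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero
    return (hits + 1) / (n_perm + 1)


# Toy data: ten lines, two allele classes with clearly different phenotypes
toy_pheno = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
toy_allele = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(permutation_p_value(toy_pheno, toy_allele))
```

Shuffling phenotypes across lines builds the null distribution empirically, which avoids distributional assumptions in a sample as small as 32 lines.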

Results

Phenotypic data and correlations

Mean phenotypic values across environments were calculated for the overall sample, for within the Flint and Dent pools, and for individual lines (Table 1). Mean phenotypic values for individual lines ranged between 12.29 and 25.81 for WSC, 50.33 and 63.03 for NDF, 67.23 and 77.98 for IVDOM, and 49.59 and 60.99 for DNDF. Overall means were 19.68, 56.06, 73.26, and 56.33 for WSC, NDF, IVDOM, and DNDF, respectively. The phenotypic variation was significantly affected by lines and environments, as well as the interaction between these two (Table 1). Coefficients of correlation between traits and tests of significance are given in Table 2. DNDF was significantly correlated to NDF (−0.32) and IVDOM (0.86). IVDOM was significantly correlated to all other traits, negatively to NDF and positively to WSC and DNDF. The closest positive correlation was observed between DNDF and IVDOM, while the closest negative correlation (−0.89) was observed between NDF and WSC (Table 2).

Table 2 Coefficients of correlation for 32 European inbred lines between water soluble carbohydrates (WSC), neutral detergent fiber (NDF), in vitro digestibility of organic matter (IVDOM), and digestibility of neutral detergent fiber (DNDF)

Sequence alignment and haplotypes

The full PAL alignment spanned 3,594 bp including 453 sites with alignment gaps (indel polymorphisms). In the first exon, two SNPs were identified. In the intron, 23 SNPs and 17 indels of varying size were identified. The largest indel spans ∼300 bp in the 3′ end of the intron in a complex manner, not affecting the intron–exon splice site. The two alleles of this indel discriminate the lines into two groups, primarily consisting of Flint and Dent lines, respectively. In the second exon, two 1-bp deletions and 11 SNPs were identified. In total, 39 single nucleotide polymorphic sites (SNPs) were identified (Table 3). Of these, 33 were parsimony informative sites, each allele carried by two or more individuals. The remaining six sites were singletons, i.e., sites in which only one copy of the rare variant was present in the complete sample. No SNPs with more than two variants were identified. While three SNPs in the second exon altered the amino-acid sequence, the remaining SNPs were synonymous mutations, not altering the amino-acid sequence. Of the six singleton sites, five and one were identified in the lines AS31 and AS13, respectively. In total, eight PAL haplotypes were identified based on the 39 SNPs (Table 3). Haplotype 1 comprised the majority (18) of lines, including 15 Flint lines. Haplotypes 6 and 8 comprised four lines each, including seven Dent lines. Haplotype 2 comprised two lines (both Flint), while the remaining haplotypes comprised one line each (Table 4). Considering the intron, exon 2, and 3′ UTR regions, all lines except three (AS09, AS13, and AS31) exhibit one of the two haplotypes, identical to haplotypes 1 and 6 (Table 3), constituted predominantly of Flint and Dent lines, respectively (Table 4).

Table 3 The PAL genomic sequences grouped in eight haplotypes defined by 39 single nucleotide polymorphisms identified in the alignment
Table 4 Lines and mean phenotypic values included in the haplotypes

Nucleotide diversity and selection

Nucleotide diversity (π) was determined for the Flint and Dent heterotic groups individually and for the combined sample based on the 39 SNPs (Fig. 1 and Table 5). For the combined sample, nucleotide diversity was lowest in the coding regions (π = 0.00248) and highest in the intron (π = 0.00821) and 3′ UTR (π = 0.00751) regions. Overall, nucleotide diversity was π = 0.00424 in the combined sample, and was lower in the Flint lines (π = 0.00166) as compared to the Dent lines (π = 0.00427).

Fig. 1 Nucleotide diversity (π) calculated along the PAL locus for Flint, Dent, and the combined sample. π is shown in sliding windows of 100 bp using a step size of 10 bp
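
A sliding-window profile of π of the kind shown in Fig. 1 could be computed along the following lines. The function name and the toy alignment are hypothetical, and ignoring gap-containing columns within each window is an assumption that may not match the DnaSP implementation.

```python
from itertools import combinations


def sliding_window_pi(seqs, window=100, step=10):
    """Return (window_start, pi) tuples along an alignment of equal-length strings."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    # Mean pairwise difference at each column; gap columns are marked None
    per_site = []
    for i in range(length):
        if any(s[i] == "-" for s in seqs):
            per_site.append(None)
        else:
            per_site.append(sum(a[i] != b[i] for a, b in pairs) / len(pairs))
    profile = []
    for start in range(0, length - window + 1, step):
        values = [v for v in per_site[start:start + window] if v is not None]
        pi = sum(values) / len(values) if values else float("nan")
        profile.append((start, pi))
    return profile


# Toy alignment with a single SNP at position 3, scanned with a 3-bp window
print(sliding_window_pi(["AAAAAA", "AAATAA"], window=3, step=1))
```

Overlapping windows (step smaller than window) smooth the profile, at the cost of neighboring values being strongly correlated.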

Table 5 Summary of DNA polymorphisms, diversity estimates, and selection estimates for the PAL locus of maize

Tajima’s D was not significant when considering either the entire PAL sequence or the non-coding regions of the combined sample (Table 5). However, when considering only the ORF, Tajima’s D was positive and significant. This indicates selection and an excess of alleles with intermediate frequencies at the PAL locus across the 32 lines. Fu and Li’s D* and F* were non-significant in all regions in the combined sample. Within the Flints, Tajima’s D was negative and significant considering the entire region and the ORF. This suggests selection as well as the presence of low-frequency alleles within the Flints. Within the Dents, Tajima’s D was non-significant in all regions while Fu and Li’s D* and F* were both significant considering the ORF. This indicates selection in the ORF and few mutations in more recent generations.

Phylogenetic analysis

Phylogenetic analysis by the neighbor-joining (NJ) method based on the PAL genomic sequence revealed two major clusters, predominantly Flint- and Dent lines, respectively (Fig. 2). In the “Flint” cluster, 17 Flint- and three Dent lines (AS8, AS11, and AS29) grouped together while in the “Dent” cluster 10 Dent and two Flint lines (AS12 and AS13) grouped together.

Fig. 2 Phylogenetic relationships estimated by neighbor-joining based on PAL parsimony informative SNPs. I Flint cluster; II Dent cluster

Linkage disequilibrium (LD) and recombination

LD was estimated between all pairs of polymorphic sites (SNPs and indels) in the PAL genomic sequence (Fig. 3). A plot of r² against physical distance for polymorphism pairs indicated that LD persisted (r² > 0.2) for the entire length of the PAL locus (Fig. 4). However, as is evident from Fig. 3, LD is not evenly distributed along the locus. Both r² values and Fisher’s exact test of LD identified an LD block spanning the 3′ half of the intron, the second exon, and the 3′ UTR. Another LD block was identified spanning the 5′ half of the intron. No LD was detected between these two blocks. Thus, sites in strong LD were predominantly identified in the terminal ∼2.5 kb of the PAL gene. The relatively high level of LD is supported by the detection of only two recombination events: one between sites 199 (exon 1) and 948 (intron), and one between sites 2,048 and 3,013 (both exon 2) of the alignment.
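
The four-gamete logic behind such a recombination count can be sketched as follows: a pair of biallelic sites showing all four two-site haplotypes implies at least one recombination event between them (assuming no recurrent mutation), and a lower bound on the number of events is the largest set of such intervals that do not overlap. The function names and toy haplotypes are hypothetical, and the greedy interval selection is a simplified reading of the Hudson and Kaplan (1985) procedure rather than the DnaSP implementation.

```python
def four_gamete_pairs(haplotypes):
    """Site-index pairs (i, j), i < j, at which all four gametes are observed.
    haplotypes: equal-length strings, one per line, biallelic sites only."""
    n_sites = len(haplotypes[0])
    pairs = []
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            gametes = {(h[i], h[j]) for h in haplotypes}
            if len(gametes) == 4:
                pairs.append((i, j))
    return pairs


def min_recombination_events(haplotypes):
    """Lower bound on recombination events: maximum number of incompatible
    intervals that do not overlap (intervals may share an end point)."""
    intervals = sorted(four_gamete_pairs(haplotypes), key=lambda p: p[1])
    count, last_end = 0, 0
    for start, end in intervals:
        if start >= last_end:  # interval lies entirely to the right of the last one kept
            count += 1
            last_end = end
    return count


# Toy haplotypes: sites 0-1 and sites 1-2 each show all four gametes
toy = ["AAA", "ATA", "TAT", "TTT"]
print(four_gamete_pairs(toy), min_recombination_events(toy))
```

Because the count is a lower bound, finding only a few events is consistent with, but does not prove, low recombination at a locus.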

Fig. 3 LD across the PAL locus. Bp positions of polymorphisms in the alignment are given. The vertical bar on the left represents 1st exon-, intron-, 2nd exon-, and 3′UTR regions, respectively. Polymorphisms at bp positions 566, 652, 720, 726, 776, 806, 857, 1,945, 2,086, 2,470, and 3,452 are singletons. Lower left triangle: P-values derived from Fisher’s exact test. Upper right triangle: r² values

Fig. 4 Plot of r² values (Y-axis) against physical distance (bp) along the PAL locus (X-axis). A logarithmic trend line is fitted to the data
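
A logarithmic trend line of this kind can be obtained by ordinary least squares on log-transformed distance. The sketch below assumes the common form r² = a + b·ln(distance); the source does not state the exact model or software used, and the function name and toy data are hypothetical.

```python
import math


def fit_log_trend(distances, r2_values):
    """Least-squares fit of r2 = a + b * ln(distance); returns (a, b)."""
    xs = [math.log(d) for d in distances]  # distances must be positive
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(r2_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct distances")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, r2_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data generated from r2 = 1.0 - 0.1 * ln(distance), not from the study
toy_dist = [10, 100, 1000]
toy_r2 = [1.0 - 0.1 * math.log(d) for d in toy_dist]
print(fit_log_trend(toy_dist, toy_r2))
```

A negative slope indicates that LD decays with distance; the distance at which the fitted line crosses a chosen threshold (for example r² = 0.2) gives a rough measure of the extent of LD.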

Population structure and association analysis

Estimation of population structure was performed by Structure based on 101 SSR markers providing an even coverage of the maize genome. Two subpopulations, in agreement with the Flint and Dent pedigree information, were estimated as the most likely subdivision of our plant material (Fig. 5). While most lines exhibited a homogenous genetic background (either Flint or Dent), two Dent lines, AS29 and AS34, harbored ∼5% and ∼60% of the “Flint genetic background”, respectively (Fig. 5).

Fig. 5 Population structure estimates based on 101 SSRs evenly distributed across the maize genome. Vertical bars represent individual maize lines. The area of either green or red illustrates the proportion of either “Flint” or “Dent” genetic background, respectively, harbored in individual lines based on these SSR markers

The population structure estimates were used in TASSEL to test for associations between PAL polymorphisms and WSC, NDF, IVDOM, and DNDF (Table 1). All polymorphisms, including singletons, were considered in the association analysis. By GLM analysis (not considering population structure), all polymorphisms in the 3′ LD block, excluding singletons, and all singletons in the 5′ part of the intron (bp positions 566, 652, 720, 726, 776, 806, and 857), were significantly associated (P < 0.05) with NDF (Fig. 6a). No associations were identified for WSC, IVDOM, or DNDF by this approach. By logistic regression analysis (considering population structure), a 1 bp indel at bp position 2,086 was significantly associated (P < 0.05) with IVDOM (Fig. 6b). No associations were identified for DNDF, WSC, or NDF, when considering population structure.

Fig. 6 Test of associations between PAL polymorphisms and four quality-related traits. Tests were performed excluding (a) and including (b) population structure estimates in the analysis. Individual dots represent the significance (Y-axis) of individual polymorphisms (X-axis). The broken line represents P = 0.05. WSC water soluble carbohydrates, NDF neutral detergent fiber, IVDOM in vitro digestibility of organic matter, DNDF digestibility of neutral detergent fiber

Discussion

Nucleotide diversity and linkage disequilibrium at the PAL locus

In the present study, the genomic sequence of the PAL gene has been obtained for 19 Flint and 13 Dent maize inbred lines. With the exception of lines AS29 and AS34, all lines exhibited a homogenous “Flint” or “Dent” genetic background, defined by SSR markers (Fig. 5). This is in general agreement with pedigree information and with previous studies showing the ability of SSR markers to reliably define heterotic groups in maize (Smith et al. 1997; Senior et al. 1998). While the phylogeny derived from PAL polymorphisms also predicts two main clusters, predominantly Flint and Dent, more interspersions of lines are indicated (Fig. 2). This inconsistency of subpopulations based on multi-locus (SSR)- and single-locus (PAL) data, respectively, is not unexpected. As shown for the tb1 genomic region, single loci polymorphisms produced different phylogenies, depending on the locus in question (Wang et al. 1999; Clark et al. 2004). Strong selection upon a single locus can result in fixation of alleles within (sub)populations, while alleles of other closely linked loci, not under selection, might be randomly distributed across (sub)populations. For the PAL locus, similar to the COMT locus (Zein et al. 2006), most alleles are fixed within the Flint and Dent heterotic groups, indicating selection and/or genetic drift at these loci after the separation of breeding pools.

Population bottlenecks and selection are expected to decrease nucleotide diversity and increase LD at a given locus (Ching et al. 2002; Flint-Garcia et al. 2003). At the PAL locus, selection is indicated in the coding region (Table 5) and nucleotide diversity (π = 0.00424) is within the range of what has been reported for other maize loci (e.g. Thornsberry et al. 2001; Whitt et al. 2002; Palaisa et al. 2003; Clark et al. 2004). However, compared to loci putatively involved in the biosynthesis of lignin (Guillet-Claude et al. 2004a; b; Zein et al. 2006), the nucleotide diversity at the PAL locus was relatively low. Previously, we have reported an overall nucleotide diversity of π = 0.00834 at the COMT locus among the lines included in this study plus ten additional lines (Zein et al. 2006). Thus, within a similar sample of lines, the overall nucleotide diversity at the COMT locus was about twice that at the PAL locus.

Overall, LD persisted (r² > 0.2) over the length of the PAL locus (3.7 kb) when considering all polymorphisms, excluding singletons (Fig. 4). Due to population bottlenecks and selection, LD can be expected to be higher among elite breeding lines as compared to among more distantly related genetic resources. In agreement with this, a rapid breakdown of LD (r² < 0.1 within a few hundred bp) has been reported for several loci in diverse sets of maize germplasm (Remington et al. 2001; Tenaillon et al. 2001), while extended LD, up to hundreds of kb, has been reported in sets of inbred lines (Ching et al. 2002; Jung et al. 2004). However, extended LD was also identified at the sugary1 locus (r² > 0.4 over 7 kb) in a set of diverse germplasm (Remington et al. 2001), indicating considerable between-loci variation in LD regardless of the sampled plant material. Different from the PAL locus, LD levels have previously been reported to decline rapidly for several loci involved in lignin biosynthesis (Guillet-Claude et al. 2004a; b). Specifically, we have found r² < 0.1 within 2 kb at the COMT locus within an overlapping sample of lines (Zein et al. 2006). This difference in LD decay could reflect the levels of constraints put on the respective loci by selection.

Two distinct LD blocks were evident at the PAL locus (Fig. 3). While no LD was detected between the 5′ part of the intron and other regions in PAL, extensive LD was detected spanning the 3′ half of the intron, the second exon, and the 3′ UTR (Fig. 3). Though not uncommon (Guillet-Claude et al. 2004a; b), such LD pattern might be due to cloning artifacts or the amplification of segments from different members of a gene family. However, the first of the two amplicons spans the first exon, the intron, and ∼200 bp of the second exon. As the border between the two LD blocks is located in the centre of the intron, this renders cloning artifacts causing the observed LD pattern unlikely. The organization of the two LD blocks in haplotypes (Table 3) and the separation of Flint and Dent lines in haplotypes (Table 4) further argues against artifacts, e.g. a “frame-shift” in the assignment of DNA sequences to individual lines. Extended LD can arise from several processes of population genetics, including population size, population bottlenecks and selection (reviewed by Flint-Garcia et al. 2003). Extended LD might also indicate relatedness between individuals. However, the lines included in the present study originate from several breeding populations of Flint and Dent and are, apart from the isogenic line pair AS20 and AS21, not related. Also, local LD can be a signature of a selective sweep, i.e. a local reduction of genetic variation, caused by the rapid fixation of a beneficial mutation (Kim and Nielsen 2004). Thus, the distinct LD pattern at the PAL locus could reflect genetic drift or selective sweeps within the Flint and Dent pools, fixating different alleles in the respective breeding pools (Tables 3, 4). If caused by a selective sweep, the high LD spanning the 3′ half of the PAL gene (Fig. 3) might indicate causative sites, with regard to phenotype, within this region. 
In Arabidopsis, PAL mutants were affected in several metabolic pathways, including the monolignol pathway (Rohde et al. 2004). Given a similar function of PAL in maize, functional constraints of the enzyme could restrict mutation- and recombination rates at the gene, resulting in the relatively low nucleotide diversity and high LD observed here.

PAL polymorphisms are associated with forage quality

Previous studies have identified genes involved in the biosynthesis of lignin as promising targets for identification of polymorphic sites associated with forage quality (Guillet-Claude et al. 2004a; b; Lübberstedt et al. 2005). However, the extended LD at the PAL locus has consequences for association analysis. No recombination was detected between bp positions 947 and 1,655 in the intron and between bp positions 2,131 and 2,470 in the second exon (Table 3). Thus, it is not possible to discriminate the effects of individual polymorphisms on phenotypes within these LD blocks in the present maize lines. However, a 1-bp deletion ∼400 bps downstream of the start of the second exon (position 2,086), was associated with IVDOM when considering population structure (Fig. 6b). The deletion introduces a stop codon ∼450 bp into the second exon and could thus affect the functionality of the PAL protein. Consequently, this deletion is a candidate site for deriving a functional marker. However, it is present only in a single line (AS07), and the association needs to be interpreted with caution until validated in more lines. Ultimately, all polymorphisms identified in this study could be evaluated in larger and/or broader collections of germplasm to attempt to enhance both the genetic resolution (i.e. decrease LD) and the power of the association analysis.

Population structure can result in false positive associations, which is controlled by considering population structure in the association analysis (Thornsberry et al. 2001). However, true functional polymorphisms might be confounded with population structure, e.g. between Flint and Dent lines. Consequently, such polymorphisms will not be identified by association analysis when considering population structure. By GLM analysis, not considering population structure, several polymorphisms were associated with NDF, including three non-synonymous SNPs in the second exon (Fig. 6a and Table 3). However, both NDF values and haplotypes were confounded with population structure in the maize lines included in this study (Table 4). Thus, the PAL-NDF associations were not detected when considering population structure. This illustrates a potential problem of considering population structure in the association analysis, specifically within crop plants maintained as separate breeding pools/lines: while the number of false positives can be reduced, “true” causative polymorphisms might be masked, i.e., the number of false negatives might increase. Consequently, the choice of plant material can significantly impact the outcome of the analysis, as illustrated by studies on the Dwarf8 locus in two different sets of maize plant materials (Thornsberry et al. 2001; Andersen et al. 2005).

Deriving functional markers for forage quality traits

Studies in tobacco and Arabidopsis have shown that impaired expression of PAL results in defective lignin formation (Raes et al. 2003; Rohde et al. 2004; Sewalt et al. 1997) and that lignification is crucial for structural integrity of the cell wall and strength of the stem (Chabannes et al. 2001; Jones et al. 2001). Thus, it is conceivable that polymorphisms in PAL could affect forage quality in maize. The three non-synonymous SNPs in the second exon (Table 3) could be considered candidate causative polymorphisms from which functional markers could be derived. Studies of gene expression and enzyme activity could further elucidate the allelic effects of these polymorphisms. However, compared to other genes involved in the monolignol pathway, PAL exhibits low nucleotide diversity and extended LD (Guillet-Claude et al. 2004a; b; Zein et al. 2006), restricting the identification of discrete, causative polymorphisms by association analysis. Consequently, investigations in larger and/or broader sets of maize germplasm are necessary to enhance the genetic resolution at the PAL locus. Alternatively, a series of PAL mutants could be produced by TILLING, allowing for comparisons of single polymorphisms, currently in complete LD, in isogenic backgrounds (http://www.genome.purdue.edu/maizetilling/).

PAL is the first enzyme in several phenylpropanoid pathways, catalyzing the production of a number of phenylpropanoids, including monolignols, from phenylalanine (reviewed by Winkel 2004). These phenylpropanoids are functioning as, e.g., structural components (lignins), UV sunscreens, and signaling molecules. In agreement with these diverse functions of phenylpropanoids, it has been shown that PAL mutants in Arabidopsis were affected not only in the monolignol pathway, but that also carbohydrate- and amino acid metabolisms were altered (Rohde et al. 2004). Thus, an allelic shift at a PAL locus could affect (positively or negatively) several traits, restricting selection at the locus. Consequently, genes more downstream and specific to monolignol synthesis, e.g., O-methyltransferase genes (Guillet-Claude et al. 2004a; Lübberstedt et al. 2005), could prove to be more suitable candidates for deriving functional markers for forage quality. However, three unlinked mapping positions in the maize genome (http://www.maizegdb.org) indicate that PAL is organized as a small gene family in maize, as is the case in Arabidopsis (Raes et al. 2003). Thus, alleles at other PAL loci might differentially affect forage quality.