Introduction

White clover (Trifolium repens L.) is an allotetraploid (2n = 4x = 32) forage legume species that is widely cultivated in temperate pastures (Atwood and Hill 1940), to which it confers the benefits of atmospheric nitrogen fixation (Crush 1987) and high quality forage yield. It is used primarily for sheep and cattle grazing, but is also often conserved for winter feed, particularly on dairy farms (Frame et al. 1998). Classical methods of phenotypic trait selection (Woodfield and Caradus 1994) suitable for obligate outbreeding species (Allard 1960) have been applied to commercial cultivar development using polycross procedures.

Development of molecular genetic approaches to white clover germplasm improvement has been constrained by the presence of high levels of genetic heterogeneity within natural and synthetic populations due to a gametophytic self-incompatibility system (Atwood 1940). The methods of molecular genetic map construction, genetic trait dissection and marker-assisted selection appropriate to diploid, inbreeding crop species require modification for use with allogamous polyploids. However, molecular genetic markers in close linkage to genes for agronomic characters are still pre-requisites for molecular marker-based breeding of white clover. An essential precursor to such studies is the construction of genetic linkage maps. The first map of white clover (Jones et al. 2003) contained genomic DNA-derived simple sequence repeat (SSR) and amplified fragment length polymorphism (AFLP) loci and was based on the F2(I.4R×I.5J) family (Michaelson-Yeates et al. 1997). The second map (Barrett et al. 2004) contained SSR markers derived from expressed sequence tag (EST) data (EST-SSRs) and was based on a two-way pseudo-testcross population. Although no common markers are currently available to align the two maps, they collectively provide the basis for trait dissection and quantitative trait locus (QTL) detection.

A number of important objectives have been defined for the breeding of white clover (Williams 1987). Breeding for enhanced yield and persistence involves the manipulation of a series of morphological parameters such as plant spread, height and density, the latter being influenced by stolon branching frequency, stolon thickness and leaf size. The harvestable yield component is predominantly contributed by leaf material. Under optimum growth conditions, large-leaved morphotypes are more productive than smaller-leaved forms (Davies 1958). However, under non-optimal conditions, such as intensive grazing, stolon density is positively correlated with herbage production (Caradus and Mackay 1991), and in general the most dense types are observed among small-leaved genotypes (Caradus and Williams 1981). The genetic association between leaf size (Woodfield and Caradus 1990) and stolon density is influenced by both internode length and branching propensity (Caradus and Chapman 1996). Marker-based analysis may provide an understanding of the underlying genetic bases for these relationships, including possible co-location of QTLs for vegetative morphogenesis traits.

Improvement of seed yield is also an important breeding objective. Differences between cultivars for seed yield components have been noted (Connolly 1990; Williams et al. 1998), suggesting that genetic variation is available for selection. Although selection gains may be made separately for vegetative and reproductive traits, profuse flowering and seed production are in general inversely correlated with plant persistence (Gibson 1957; Piano and Annicchiarico 1995). The potential to break these associations may be assessed through analysis of genomic architecture by trait dissection.

Environmental modification of quantitative traits also contributes to imprecision in QTL detection and in the implementation of marker-assisted selection (MAS). Phenotypic plasticity and genotype-environment interactions are significant contributors to morphophysiological diversity in plants, and are particularly common for pasture species (Clements et al. 1983). Phenotypic plasticity for traits such as leaf production (Collins et al. 2002) and flowering frequency (Hay and Newton 1996) has been reported for white clover. Such effects may be at least partly attributable to QTL × environment (QTL × E) interactions (Li et al. 2003; Ungerer et al. 2003). Apart from comparisons of QTL location and magnitude of effect from different environments, the presence or absence of QTL × E may be determined through a combined analysis (Cullis et al. 1998), which entails the aggregation of data from different sources in a single study. The clonal biology of pasture species such as white clover provides an opportunity for combined analysis of data from single sib-ships under spatial and temporal replication.

We present here for the first time trait dissection and QTL identification results from multiple environments for white clover. Data for agronomically important vegetative and reproductive traits from multiple trials of the F2(I.4R×I.5J) population were analysed independently and in a combined analysis. Robust genomic regions suitable for evaluation of MAS implementation were identified, and the implications of the analysis are discussed in the context of the known biology of white clover.

Materials and methods

Genetic mapping family

A total of 125 seeds were germinated in May 1997 under glasshouse conditions to generate individual genotypes of the F2(I.4R×I.5J) mapping family, and initial cuttings were taken in June 1999.

Each cutting consisted of a 7 cm terminal portion of a primary stolon, and was planted into John Innes No. 2 loam-based compost in a 7.8 cm pot located in a glasshouse. The explants were then transplanted to the field, as described below. Subsequent cuttings were taken from the same individual genotypes maintained in the glasshouse: for instance, those for field analysis initiated in 2001 were obtained in June of the same year.

Field evaluation of phenotypic traits

For all trials, four replicate plants from 100 selected F2 individual plant genotypes were transplanted in a randomised complete block design. Rows consisted of ten plants, each corresponding to a different individual F2 genotype; plants and rows were at a spacing of 0.9 m.
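The randomised complete block layout described above can be illustrated with a short sketch. It assumes that each of the four replicates forms one block containing all 100 genotypes once, and that each block is cut into rows of ten plants; the genotype labels, the function name and the fixed random seed are hypothetical and are not taken from the original trial records.

```python
import random

def rcbd_layout(genotypes, n_blocks=4, row_length=10, seed=0):
    """Return {block number: list of rows}, each row a list of genotype labels.

    Every genotype appears exactly once per block, in an order that is
    shuffled independently for each block.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(genotypes)
        rng.shuffle(order)
        # Cut the shuffled order into field rows of `row_length` plants.
        layout[block] = [order[i:i + row_length]
                         for i in range(0, len(order), row_length)]
    return layout

# Hypothetical labels for the 100 selected F2 genotypes.
genotypes = [f"G{i:03d}" for i in range(1, 101)]
layout = rcbd_layout(genotypes)
```

Under these assumptions each block holds ten rows of ten plants, so that every row contains ten different genotypes.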

Trial established at IGER in 1999

At the Institute of Grassland and Environmental Research (IGER), Aberystwyth, Wales, United Kingdom (latitude 52.4 N, longitude 4.0 W), on August 10th 1999, the population was transplanted into a fine-rolled seed bed incorporating 4 tonnes ha−1 lime, 100 kg ha−1 Triple SuperPhosphate (46 kg ha−1 P2O5) and 200 kg ha−1 Muriate of Potash (60 kg ha−1 K2O). The soil type was of the Nercwys series: deep, medium loams with slowly permeable sub-soils and slight seasonal water logging on a moderate to gentle slope.

The following characters were recorded: flowering date (determined as days from April 1st 2000 to produce the first open floret on one flower that is capable of pollination); plant height; plant spread; and height of tallest flower (all measured in cm on 13th June 2000). When all plants had flowered, the primary stolon was selected. Leaf length, leaf width and petiole length (in mm), along with estimated leaf area (in mm2), were measured on the leaf at the third node distant from the growing point, and internode length (in mm) was measured between the third and fourth nodes (all determinations performed on 23rd June 2000). The whole plant was harvested on 14th July 2000 and the green weight (in g) was recorded.

The total number of inflorescences per plant was recorded on the 19th and 27th June and 7th July 2000. On 12th July 2000, three mature inflorescences were removed from each plant, ensuring that the base of the peduncle (flower stalk) was carefully detached from the stolon. Peduncle length and girth for this freshly harvested material were measured (in mm). Inflorescences with attached peduncles were air-dried, and peduncle length and girth subsequently measured (in mm). Florets from each inflorescence were removed, counted and threshed to remove seeds, which were then weighed and counted. These data were then used to calculate seed yield components per plant, including fertility (seeds per floret), thousand seed weight (g) and seed yield per plant (g). The phenotypic measurements were designated as the IGER 1999 experimental dataset.
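The derivation of the seed yield components from the floret and seed counts can be sketched as follows. Fertility (seeds per floret) and thousand seed weight follow directly from their definitions; the scaling used for seed yield per plant (mean seed weight per sampled inflorescence multiplied by the number of inflorescences on the plant) is an assumption made for illustration, as the exact calculation is not given in the text. The function name and input values are hypothetical.

```python
def seed_yield_components(florets, seeds, seed_weight_g, inflorescences_per_plant):
    """Seed yield components for one plant.

    florets, seeds, seed_weight_g: per-inflorescence values for the sampled
    inflorescences (three per plant in the trial described above).
    """
    total_florets = sum(florets)
    total_seeds = sum(seeds)
    total_weight = sum(seed_weight_g)

    fertility = total_seeds / total_florets                   # seeds per floret
    thousand_seed_weight = 1000.0 * total_weight / total_seeds  # g per 1000 seeds
    # ASSUMPTION: per-plant yield = mean seed weight per sampled inflorescence
    # multiplied by the inflorescence count recorded for the plant.
    mean_weight_per_head = total_weight / len(seed_weight_g)
    seed_yield_per_plant = mean_weight_per_head * inflorescences_per_plant

    return {
        "fertility": fertility,
        "thousand_seed_weight_g": thousand_seed_weight,
        "seed_yield_per_plant_g": seed_yield_per_plant,
    }

# Hypothetical values for three sampled inflorescences on a plant with 40 heads.
components = seed_yield_components(
    florets=[60, 70, 50],
    seeds=[120, 150, 90],
    seed_weight_g=[0.072, 0.090, 0.054],
    inflorescences_per_plant=40,
)
```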

Trial established at IGER in 2001

The previously described seed-bed establishment and fertilisation procedures were repeated at IGER, Aberystwyth. Transplantation from clonal cuttings took place on August 15th 2001, into a soil type of the Rheidol series: permeable well-drained medium loam.

The following characters were recorded, using the same procedures as for the IGER 1999 plantings: growth score (on an ascending scale from 0 to 5, on 4th May 2002); flowering date (as days from April 1st 2002 to produce the first open floret on one flower that is capable of being pollinated); plant height and plant spread (in cm on 13th June 2002); leaf length, leaf width (in mm), estimated leaf area (in mm2), petiole length and internode length (in mm), all on 13th June 2002; height of tallest flower (in cm on 18th June 2002); number of flowers (a single measurement on 26th June 2002); and green weight (in g on 27th June 2002). The phenotypic measurements were designated as the IGER 2001 experimental dataset.

Trial established at East Craigs in 2001

On 29th August 2001, at East Craigs, Edinburgh, Scotland, United Kingdom (latitude 55.9 N, longitude 3.36 W), the mapping family was planted into a soil of the MacMerry type, using the procedures described above for the trials established at IGER.

The following characters were recorded: flowering date (as days from April 1st 2002); plant height and plant spread (in cm on 19th June 2002); leaf length, leaf width (in mm), estimated leaf area (in mm2), petiole length and internode length (in mm), all on 19th June 2002; length of freshly harvested peduncle (in mm on 19th June 2002); number of flowers (a single measurement on 1st August 2002); and growth score (on an ascending scale from 0–5 on 1st August 2002). The phenotypic measurements were designated as the East Craigs 2001 experimental dataset.

Statistical analysis of phenotypic data

Data for each site or year were analysed by analysis of variance (ANOVA). The mean values were then used to summarise the range and mean for each site or year combination and to derive a matrix of correlations between traits within each combination. Broad sense heritability was calculated according to the formula of Nyquist (1991) for clonal replicates of cross-pollinated species, with blocks and genotypes assumed as having random effects in the ANOVA.
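As an illustration of how a broad-sense heritability estimate can be obtained from such an ANOVA, the sketch below uses a common variance-component estimator on a genotype-mean basis for a randomised complete block design with one clonal plant per genotype per block. It is not claimed to reproduce the exact formula of Nyquist (1991) or the GenStat output; the function name and the simulated data are hypothetical.

```python
import numpy as np

def broad_sense_heritability(y):
    """Estimate H^2 on a genotype-mean basis from a genotypes x blocks table.

    y: array of shape (n_genotypes, n_blocks), one clonal plant per cell.
    Genotypes and blocks are both treated as random effects.
    """
    y = np.asarray(y, dtype=float)
    g, r = y.shape
    grand = y.mean()
    ss_g = r * np.sum((y.mean(axis=1) - grand) ** 2)   # genotype sum of squares
    ss_b = g * np.sum((y.mean(axis=0) - grand) ** 2)   # block sum of squares
    ss_e = np.sum((y - grand) ** 2) - ss_g - ss_b      # residual sum of squares
    ms_g = ss_g / (g - 1)
    ms_e = ss_e / ((g - 1) * (r - 1))
    # Expected mean squares: E[MS_g] = var_e + r * var_g, E[MS_e] = var_e.
    var_g = max((ms_g - ms_e) / r, 0.0)
    return var_g / (var_g + ms_e / r)

# Simulated example (hypothetical values): 100 genotypes, 4 blocks.
rng = np.random.default_rng(1)
geno = rng.normal(0.0, 2.0, size=(100, 1))
block = rng.normal(0.0, 1.0, size=(1, 4))
resid = rng.normal(0.0, 1.0, size=(100, 4))
h2 = broad_sense_heritability(50.0 + geno + block + resid)
```

Negative estimates of the genotypic variance component are truncated at zero here, which is one of several conventions in use.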

All site or year data were combined and were analysed by the multiple experiment or combined analysis procedure, which specifies the residual term for the individual experiments, using the residual maximum likelihood (REML) method. Therefore, trait means were derived as best linear unbiased estimates (BLUEs). Data transformation (square root conversion) was required for the leaf area trait in this instance. All statistical analyses were performed using GenStat (GenStat Committee 2003).
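The idea of combining trials while allowing each experiment its own residual term can be sketched with a simplified stand-in for REML: an iteratively reweighted least-squares fit of fixed genotype and environment effects, with one residual variance per environment. This is an assumption-laden approximation for illustration only, not the GenStat procedure used in the study; the function name and the toy data are hypothetical.

```python
import numpy as np

def combined_genotype_means(geno, env, y, n_iter=5):
    """Genotype means adjusted for environment effects.

    geno, env: integer codes (0-based) for genotype and environment per record.
    y: trait values. Residual variance is re-estimated separately for each
    environment and used to weight the next fit (feasible GLS, not REML).
    """
    geno = np.asarray(geno)
    env = np.asarray(env)
    y = np.asarray(y, dtype=float)
    n_g, n_e = geno.max() + 1, env.max() + 1

    # Design matrix: one column per genotype, plus environments 1..n_e-1
    # (environment 0 is the reference level).
    X = np.zeros((y.size, n_g + n_e - 1))
    X[np.arange(y.size), geno] = 1.0
    rows = np.flatnonzero(env > 0)
    X[rows, n_g + env[rows] - 1] = 1.0

    w = np.ones(y.size)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        resid = y - X @ beta
        env_var = np.array([resid[env == k].var() for k in range(n_e)])
        w = 1.0 / np.maximum(env_var[env], 1e-12)  # guard against zero variance

    env_effects = np.concatenate([[0.0], beta[n_g:]])
    # Report each genotype at the average environment.
    return beta[:n_g] + env_effects.mean()

# Toy example: 3 genotypes in 3 environments, purely additive, no noise.
g_idx = np.repeat(np.arange(3), 3)
e_idx = np.tile(np.arange(3), 3)
values = np.array([10.0, 12.0, 15.0])[g_idx] + np.array([0.0, 2.0, -1.0])[e_idx]
means = combined_genotype_means(g_idx, e_idx, values)
```

A trait requiring transformation, such as leaf area in this study, would be passed in after applying `np.sqrt` to the raw values.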

QTL analysis

Following genetic map construction using MAPMAKER 3.0, a sub-set of marker loci was selected to provide even coverage of the genome with marker intervals of 5–10 cM (a total of 104 markers across the 18 linkage groups, LGs) and consensus map distances were subsequently preserved. Single-marker regression (SMR) was initially employed to identify significant variation associated with selected genetic markers. Simple interval mapping [(SIM) Lander and Botstein 1989; Haley and Knott 1992] and composite interval mapping [(CIM) Zeng 1994] methods were used to identify and confirm the presence of QTLs. All analyses were performed using the QTL Cartographer, version 2.0, application (Basten et al. 1994). The maximum log-of-odds (LOD) score of association between the genotype and trait data was calculated for SIM and CIM, and QTL location predictions were accepted for SIM for values greater than a threshold value of 2.5. Permutation analysis (1,000 iterations) was used to establish an experiment-wise significance threshold at the 0.05 level, defined as a minimum LOD threshold for each trait in CIM (Churchill and Doerge 1994; Doerge and Churchill 1996). For each form of interval analysis, maximum LOD value, location of maximum LOD value on the genetic map, additive marker allele effects and the proportion of phenotypic variance attributable to the QTL were tabulated. During the course of mapping family propagation, the progenitor inbred and F1 parental genotypes could not be preserved, so estimates of mean, additive and dominance effects could not be obtained from examination of data from the different generations. As a consequence, no a priori knowledge was available for the additive and dominance components of the assayed traits. Estimates of LOD scores, additive and dominance effects and proportions of phenotypic variance explained by the QTL were consequently based on the assumption that both additive and dominance effects could be present in all instances.
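The logic of marker-trait regression with additive and dominance terms, and of an empirical permutation threshold, can be sketched as follows. This is a minimal single-marker illustration in the spirit of SMR and of the permutation test of Churchill and Doerge (1994); it is not interval mapping and does not reproduce QTL Cartographer. The genotype coding (0, 1, 2 for the two homozygotes and the heterozygote in between), the function names and the simulated data are assumptions made for the example.

```python
import numpy as np

def marker_lod(y, codes):
    """LOD score and effect estimates for one F2 marker.

    codes: 0 / 1 / 2 = homozygote A / heterozygote / homozygote B.
    The model y = mu + a * (code - 1) + d * [code == 1] allows both
    additive (a) and dominance (d) effects.
    """
    y = np.asarray(y, dtype=float)
    codes = np.asarray(codes)
    n = y.size
    X = np.column_stack([np.ones(n), codes - 1.0, (codes == 1).astype(float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = np.sum((y - X @ beta) ** 2)
    rss_null = np.sum((y - y.mean()) ** 2)
    lod = 0.5 * n * np.log10(rss_null / rss_full)
    return lod, beta[1], beta[2]

def permutation_threshold(y, marker_matrix, n_perm=1000, alpha=0.05, seed=0):
    """Experiment-wise LOD threshold from permuted trait values."""
    rng = np.random.default_rng(seed)
    max_lods = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)  # break the marker-trait association
        max_lods[i] = max(marker_lod(y_perm, m)[0] for m in marker_matrix.T)
    return np.quantile(max_lods, 1.0 - alpha)

# Simulated example: 100 individuals, 20 unlinked markers, one additive QTL.
rng = np.random.default_rng(42)
markers = rng.choice([0, 1, 2], size=(100, 20), p=[0.25, 0.5, 0.25])
trait = 2.0 * (markers[:, 3] - 1) + rng.normal(0.0, 1.0, size=100)
lod, additive, dominance = marker_lod(trait, markers[:, 3])
threshold = permutation_threshold(trait, markers)
```

Taking the maximum LOD across all markers within each permutation is what makes the resulting threshold experiment-wise rather than marker-wise.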

Results

Statistical analysis of data

Traits measured across the various years are summarised in Table 1. Measurements of broad-sense heritability showed a range of non-zero values from 11.3 to 78.7%. Most of the higher heritability values (above 60%) were associated with reproductive traits. Among those traits that were represented in all three experimental datasets, the most consistent estimates of heritability were obtained for leaf length (range from 29.3 to 38.6%). By contrast, the flowering date trait showed large variation (range from 18.5 to 69.1%) between estimates (Table 1).

Table 1 Broad sense heritability estimates for vegetative morphogenesis and reproductive morphogenesis and development traits obtained from experimental datasets derived from plantings at IGER in 1999, IGER in 2001 and East Craigs in 2001

Analysis of trait data in all instances demonstrated significant variation associated with the plant genotypes (P<0.01; data not shown). For those traits measured in more than one trial, variation was observed between experimental datasets: for example, flowering time varied from a mean value of 64.8 days in the IGER 1999 dataset to 41.39 days in the East Craigs 2001 dataset. Analysis of those traits represented in all three datasets, and hence available to combined analysis, revealed overlapping distributions that did not show significant deviation from normality (Table 1, Supplementary Fig. S1).

Correlation coefficients between different traits within experimental datasets were evaluated (Tables 2, 3, 4, 5). High correlations were observed between plant height, plant spread and green weight in the IGER 1999 and IGER 2001 datasets (0.64–0.84). By contrast, a consistently low correlation was observed between the internode length and petiole length traits (0.16–0.25). Flowering date was negatively correlated with all other traits, with the exception of a weak positive correlation (0.11) with petiole length in the IGER 2001 dataset. The strongest negative correlations with flowering date were observed with plant spread (−0.48 in the IGER 1999 dataset and −0.58 in the IGER 2001 dataset) (Tables 2, 4, 5).

Table 2 Phenotypic correlation coefficients for vegetative morphogenesis and common reproductive morphogenesis and development traits derived from the IGER 1999 experimental dataset
Table 3 Phenotypic correlation coefficients for reproductive morphogenesis and development traits exclusively derived from the IGER 1999 experimental dataset
Table 4 Phenotypic correlation coefficients for all traits derived from the IGER 2001 experimental dataset
Table 5 Phenotypic correlation coefficients for traits derived from the East Craigs 2001 experimental dataset

Among the reproductive morphogenesis and development traits exclusively measured in the IGER 1999 dataset, very high correlation coefficients (0.89–0.93) were observed between the successive measures (1–3) of flower number, indicating a strong degree of similarity between these traits. Fertility score, number of seeds per flower and seed yield were also highly mutually correlated (0.79–0.92), but these relationships were not reflected in high correlations with thousand seed weight. Weak correlations were observed between some of the peduncle dimension traits and a number of other characters (Table 3).

QTL analysis of individual experimental datasets

Vegetative morphogenesis

Analysis of the IGER 1999 dataset identified a total of 15 putative QTLs for all traits apart from leaf width, with multiple QTLs for each trait apart from leaf area. Only 2 (13%) of the QTLs were unequivocally identified by SMR, SIM and CIM, the remainder being supported by one or other of the interval mapping approaches. Analysis of the IGER 2001 dataset identified a total of 23 putative QTLs for all 9 traits. Multiple QTLs were identified for each trait, except leaf width and internode length. Seven (30%) of the QTLs were identified by all three analytical methods. Analysis of the East Craigs 2001 dataset identified a total of 24 QTLs for all 8 traits, with multiple QTLs for each trait. Ten (42%) of the QTLs were identified by all three analytical methods (Fig. 1, Supplementary Table S2).

Fig. 1 Location of QTLs detected by individual analysis of reproductive morphogenesis and development traits (a, b) and vegetative morphogenesis traits (c–e) on the genetic map of white clover derived from the F2(I.4R×I.5J) population. Nomenclature of genomic DNA-derived SSR (TRSSR) loci and AFLP loci is as described by Jones et al. (2003). QTL nomenclature is adapted from McCouch et al. (1997) in the form q-trait-dataset-analytical method

Across the three datasets, a total of 62 QTLs were identified for all 9 independently evaluated traits. QTLs were detected in both instances for the two traits measured in only two experimental datasets (growth score and green weight). Of those traits measured across all experimental datasets, QTLs were detected in each individual analysis for all but leaf width. Graphical depiction of QTL locations on the genetic map revealed large clusters of coincident QTLs on LGs 2, 3, 7, 11 and 12 (Fig. 1). No QTLs were identified on LGs 10, 13 and 17, and the other LGs were lightly populated. Coincident QTL locations across experimental datasets were observed for a number of traits. For example, QTLs for plant height of similar magnitude were observed on LG11 from analysis of the IGER 1999 and IGER 2001 datasets, and on LG12 from all three datasets. The LG12-located plant height QTLs from the 2001 datasets were unequivocally identified with each analytical method. Leaf length QTLs on LG7 were also identified from each individual analysis, and were detected by SMR, SIM and CIM (Fig. 1, Supplementary Table S2). In contrast, of the seven QTLs for petiole length, only two (from the IGER 1999 and East Craigs 2001 datasets) showed a common location (LG11). Even among those traits with a higher proportion of common QTLs between individual analyses, trial-specific locations were observed. For instance, a plant height QTL on LG1 was only detected from the East Craigs 2001 dataset.

Coincidence of QTLs for highly correlated traits provided a potential basis for the observed co-variation. From each dataset, QTLs for leaf dimension traits were identified in similar positions on LG7. The correlated plant height and green weight traits from the IGER 1999 dataset detected coincident QTLs on LG2, and similar effects were seen on LG7 for green weight and plant spread measured in the IGER 2001 dataset. Three traits from the IGER 2001 dataset showing moderate-to-high correlation (0.45–0.82) detected coincident QTLs on LG8, but the remaining coincident QTLs on this LG from analysis of this dataset are for traits showing weak mutual correlation (internode length and petiole length).

For those QTLs located in clusters on the same LG, the additive genetic effects were generally in the same direction. For the QTLs on LG7, only the growth score trait from the IGER 2001 dataset showed a negative value with SIM (−0.12). A similar pattern is shown on LG3, for which the majority of the effects have positive values, while the QTLs on the LG12 cluster showed exclusively negative effect values.

Reproductive morphogenesis and development traits

The three traits that were measured in at least two experimental datasets identified a total of ten QTLs. No flowering date QTLs were identified from the IGER 1999 dataset, but putative QTLs were detected on LGs 2, 16 and 18 from the IGER 2001 dataset and LG12 from the East Craigs dataset. Consequently, no coincident locations were observed for this trait. There was also an absence of QTL coincidence across datasets for the height of tallest flower and single-point flower number traits. The remaining 13 traits from single trials detected a total of 13 QTLs (Fig. 1, Supplementary Table S2). The majority of the peduncle dimension traits failed to detect QTLs, with the exception of the measurement taken as part of the East Craigs 2001 dataset. Only one (8%) of the QTLs was supported by all forms of analysis.

Quantitative trait locus clusters were observed on LGs 2 and 3, as for the vegetative morphogenesis traits. However, LG7, which contained the largest number of vegetative morphogenesis QTLs, was the location of only two reproductive trait QTLs, while LGs 11 and 12 contained only one QTL. Each of the number of flower estimates made on successive dates in 2000 permitted detection of coincident QTLs on LG3, but these did not coincide with QTLs for single-point flower number from the IGER 2001 and East Craigs 2001 datasets. The LG3-located flower number QTLs did, however, coincide with a QTL for seed yield. This trait shows a moderately high level of correlation (0.64–0.66) with the flower number traits. Although the two QTLs for seed yield (on LGs 3 and 18) were not consistently detected by both SIM and CIM, the maximum LOD scores detected by SIM were among the highest within the whole dataset.

Multi-environment combined analysis of phenotypic data

The eight traits common to each dataset showed a general absence of G × E interactions: only flowering time and plant spread showed significant (P<0.05) effects, which were not associated with large changes of rank. Data for these traits were aggregated for combined analysis, identifying 25 genomic regions (Table 6). Coincident QTLs were identified on LGs 2, 7, 8, 11 and 12 (Fig. 2), with the largest numbers on LG7 (6 QTLs), LG12 (4 QTLs) and LG8 (3 QTLs). A total of ten QTLs (40%) were unequivocally identified with each of the interval mapping techniques, with at least one such QTL for each trait. Five of the remaining QTLs failed to reach the minimum LOD value for SIM, while ten showed values lower than the empirical threshold for CIM. The majority (23) of the QTLs detected by the combined analysis co-locate with QTLs for the corresponding traits from the individual analyses (Fig. 3). The exceptions are the combined analysis QTLs for the petiole length trait on LG5, and the flowering date trait QTL on LG13, in each case detected by CIM. Both LGs were largely devoid of QTLs from analysis of the individual datasets, apart from an equivalent region on LG5 affecting growth score in the IGER 2001 trial. Comparison of the additive genetic effects for the combined-analysis QTLs revealed patterns similar to those of the individual analyses. All of the QTLs on LG7 showed positive effects, while the LG12-located cluster contained QTLs of negative effect.

Table 6 Summary of QTL detection information for the eight phenotypic traits subjected to combined analysis
Fig. 2 Location of QTLs detected by combined analysis of seven vegetative morphogenesis traits and one reproductive development trait on the genetic map of white clover derived from the F2(I.4R×I.5J) population. Nomenclature of genomic DNA-derived SSR (TRSSR) loci and AFLP loci is as described by Jones et al. (2003). QTL nomenclature is adapted from McCouch et al. (1997) in the form q-trait-dataset-analytical method

Fig. 3 Comparison of locations between vegetative morphogenesis QTLs detected by combined analysis clustered on LGs 7 and 12, and corresponding QTLs from the analysis of vegetative morphogenesis based on individual experimental datasets. QTL nomenclature is adapted from McCouch et al. (1997) in the form q-trait-dataset-analytical method

Discussion

Genetic control of phenotypic traits

Estimates of broad-sense heritability were generally consistent with previous knowledge. Leaf size and internode length traits have been reported to be highly heritable (Woodfield and Caradus 1990; Caradus and Chapman 1996), and show mean values across the experimental datasets of 35.4% (leaf area) and 47.3%, respectively. Among the reproductive traits, the value for number of florets per inflorescence, which has previously been reported to show high heritability (Van Bogaert 1977), was 43.3%.

In some cases, the extent and direction of phenotypic correlation coefficients between different traits are similar to those obtained from previous studies. For those datasets in which plant height, plant spread and green weight were all measured, high mutual positive correlations were observed. Plant height has been shown to be highly correlated with green weight in white clover plants from diverse geographical populations (Barcikowska 1976), and these relationships also extend to leaf size and petiole length, as seen in the present study. Seed yield shows moderate-to-high levels of correlation with number of flowers (a mean value of 0.65 across the successive measures) and number of florets per flower, consistent with previous studies (Brigham and Wilsie 1955; Van Bogaert 1977). The consistent negative correlations observed between flowering date and vegetative morphogenesis traits may be related to observed decline of vegetative growth following onset of flowering, due to inhibition of stolon growth (Kawanabe et al. 1963). Other relationships are less consistent with previous studies. Internode length and leaf dimension traits show moderate positive correlations in this study (a mean value of 0.45 across the eligible experimental datasets), but an inverse relationship has been observed between leaf size and internode length under conditions of intensive grazing (Woodfield and Caradus 1990). This anomaly may be a feature of the specific germplasm used in construction of the mapping cross, or possibly due to the different environmental conditions obtained through assessment of spaced plants in an experimental field trial as compared to a grazed sward.

Individual and combined-analysis of phenotypic data

The F2 mapping population structure described here is unusual for an obligate outbreeding species. Seed set and germinant establishment were presumably limited by the effects of inbreeding depression in the initial cross. In addition, the loss of parental genotypes due to reduced viability precludes recapitulation of the cross, and high levels of residual heterozygosity within the inbred line (Jones et al. 2003) prohibit the use of related individuals. The loss of parents has also prevented accurate estimation of dominance effects due to the inability to compare generation means. The use of a population of limited size will generate inaccuracies through overestimation of genetic effects, underestimation of the number of regions of the genome controlling the trait and imprecise estimates of location (Beavis 1994). A theoretical model has been proposed to explain these observations (Xu 2003). However, the degree of overestimation of genetic effects will be trait- and population-specific, and bias in the estimation is still observed with large populations (n>500), albeit to a lesser extent (Schön et al. 2004).

Further accuracy of QTL detection could be obtained by assignment of additional marker loci to provide greater genome coverage, which would also be expected to generate a consolidated set of LGs matching the expected haploid karyotypic number (16). However, even in the higher density genetic map of Barrett et al. (2004), two LG fragments were generated which could not be coalesced except by homologous inferences. The selection of markers for interval analysis from the full F2(I.4R×I.5J) genetic map was made on the basis of an average spacing of c. 10 cM, which is typical for this type of study (reviewed by Kearsey and Farquhar 1998).

Apart from these constraints, the current study provides a major advantage through the increased power of F2 structures in QTL detection compared to pseudo-testcrosses (Grattapaglia et al. 1995), based on the ability to make comparisons between homozygous QTL combinations (Q1Q1 vs. Q2Q2), as opposed to comparisons of the form (Q1Q3+Q1Q4) vs. (Q2Q3+Q2Q4). Trait-dissection data obtained using this approach will permit validation of more subtle effects observed in subsequent studies using pseudo-testcross families.
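The difference between the two types of genotypic comparison can be made concrete with a small sketch. It assumes a purely additive model in which each genotypic value is the sum of its two allele effects, ignoring dominance and the number of individuals in each class; the allele effect values and function names are hypothetical.

```python
def f2_contrast(a1, a2):
    """Difference between the homozygous classes Q1Q1 and Q2Q2."""
    return (a1 + a1) - (a2 + a2)

def pseudo_testcross_contrast(a1, a2, a3, a4):
    """Difference between the Q1 carriers and the Q2 carriers, each averaged
    over the alleles (Q3, Q4) contributed by the other parent."""
    q1_carriers = ((a1 + a3) + (a1 + a4)) / 2.0
    q2_carriers = ((a2 + a3) + (a2 + a4)) / 2.0
    return q1_carriers - q2_carriers

# Hypothetical allele effects for Q1..Q4.
f2_difference = f2_contrast(1.5, 0.5)
testcross_difference = pseudo_testcross_contrast(1.5, 0.5, 0.7, 0.2)
```

Under this additive assumption the F2 homozygote contrast equals 2 × (a1 − a2), twice the pseudo-testcross contrast of (a1 − a2), because the effects of Q3 and Q4 cancel in the latter.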

Comparison between the individual analyses, and with the results of the combined analysis, provides evidence for stability of QTL location and also QTL × environment (QTL × E) interactions, which are locus-specific manifestations of classical genotype × environment (G × E) effects. The genetic basis of phenotypic plasticity resulting from G × E interactions is poorly understood, although models based on differential allelic effects and environment-specific regulatory effects have been proposed (Via et al. 1995). In the current dataset, a large proportion of the measured traits showed variability of QTL detection between the individual datasets. Even for leaf dimension traits, which are known to be characteristic of different varietal types, variation was observed. The results of such studies have traditionally been represented on linkage maps as separate QTLs for each trait-environment combination. However, the combined analysis method has been applied here to detect QTLs that are relatively insensitive to environmental variation. For perennial pasture plant species, which may be clonally propagated for multi-environment analysis but frequently exhibit high levels of phenotypic plasticity, combined analysis permits the simplification of complex datasets and targeting of regions for effective marker-assisted breeding. Combined analysis identified several major genomic regions containing coincident QTLs, the majority co-locating with those derived from the individual analysis. In addition, two regions were identified solely by the combined analysis. The failure to detect corresponding regions in the individual datasets may be due to sub-significant effects that have been cumulatively increased through the estimation of combined predicted means. The detection of such QTLs is another important outcome of combined analysis.

Genetic architecture of trait variation

The coincidence of QTL locations supports the observed phenotypic correlation. QTLs for leaf dimension traits were consistently detected on LG7, and the positive correlation between internode length and leaf area is consistent with the identification of coincident combined analysis QTLs on LG8. Individual and combined analyses both reveal LGs 7 and 12 as the most important regions of genetic effect. In addition, the majority of additive genetic effects act in the same direction within these clusters. QTL coincidence is less obvious among the reproductive morphogenesis and development traits, although the coincidence of QTLs for successive measures of flower number and seed yield on LG3 is consistent both with high trait correlation and previous agronomic studies (Brigham and Wilsie 1955). The remaining QTLs for both flowering intensity and seed yield are non-coincident with the major vegetative morphogenesis QTL clusters, implying that independent selection for these trait classes may be achieved by marker-assisted selection.

The coincidence of QTLs for correlated traits with co-directional genetic effects is compatible with two hypotheses. Alleles with similar directions of effect may be in coupling phase at a number of adjacent loci, and such linked complexes may have been generated and maintained by selection prior to or after domestication. Alternatively, allelic variation at a single genetic locus may control pleiotropic variation for a number of characters. A number of previous studies have revealed evidence for co-location of QTLs for vegetative morphogenesis traits in dicotyledonous plants, such as leaf architecture in Arabidopsis thaliana (Pérez-Pérez et al. 2002) and leaf and stem traits in Brassica oleracea (Lan et al. 2001; Sebastian et al. 2002). Similar studies have been performed for reproductive development traits (Ungerer et al. 2002; Bert et al. 2003). The presence of multiple genes controlling related characters may be resolved by fine mapping studies, as shown for the floral development genes CONSTANS and FLC on chromosome 5 of A. thaliana (Ungerer et al. 2002). Possible causal mechanisms for pleiotropic effects of individual gene loci on multiple morphogenetic and developmental characters have been identified through the analysis of A. thaliana mutants impaired in actin polymerisation (Li et al. 2004).

Implications for breeding improvement

A number of reports have confirmed the ubiquity of correlations between white clover agronomic traits and the potential for their use in germplasm improvement (Caradus and Woodfield 1990; Rhodes et al. 1994). Co-location of QTLs in the present study provides confirmation of the genetic basis of such correlations, and suggests that selection for a small number of target genomic regions may achieve gains for multiple characters. Coordinate selection for vegetative morphogenesis traits based on the major combined analysis QTL clusters has the potential to increase harvestable herbage yield. At the same time, the presence of QTLs for internode length outside the major clusters may provide the basis for divergent selection of stolon density and leaf dimensions. Stolon morphology traits are of key importance for persistence and yield in white clover. Previous studies have demonstrated the effectiveness of divergent selection for stolon characteristics (Collins et al. 1997) and the extent to which correlations between morphological traits are affected by such selection (Collins et al. 1998). The development and implementation of marker-assisted selection for stolon traits is a high priority for future work, as it will allow the identification of relevant recombination events at an early stage without the need for costly and repetitive phenotyping, especially when genetic markers are deployed concurrently for a number of traits. Among the reproductive characters, enhanced seed yield is of key importance for the commercial success of white clover cultivars. The two seed yield QTLs of relatively large effect, one of which (on LG18) is solely associated with a flowering time QTL, provide the basis for independent selection of this trait.

Although the combined analysis has revealed stability of QTL locations across environments for a single sib-ship, maximum value for applied breeding will be obtained when QTLs are conserved between pedigrees. As the F1(Sustain 65625/2 × NRS 364/7) mapping family (Barrett et al. 2004) is based on a cross between parental genotypes that differ for leaf dimensions, stolon dimensions and growth type, phenotypic variation for equivalent traits may be anticipated. However, the absence of common genetic markers precludes QTL coincidence studies between the two reference populations. Pair crosses have been performed using parental genotypes from varieties that diverge for vegetative morphogenesis and reproductive development traits (K.F. Smith, unpublished data). Construction of consolidated genetic maps from these crosses based on common markers will identify conservation of QTL locations with the F2(I.4R×I.5J) population and the F1(Sustain 65625/2 × NRS 364/7) population. This research is currently in progress, and will prove especially valuable for seed yield trait QTLs, which have been independently mapped in the latter population (Barrett et al. 2005). Other pair cross-derived families are in development for the analysis of stolon morphology traits (M.T. Abberton, unpublished data).

Comparative genetics and genomics with model legume species

For the legume family (Fabaceae), macrosyntenic relationships have been defined within the sub-family Papilionoideae (Choi et al. 2004b). More detailed micro-scale studies of genome conservation will be enabled through genetic analysis (Choi et al. 2004a; Pedrosa et al. 2002; Sandal et al. 2002) and genome sequencing (Vandenbosch and Stacey 2003) of two model species, Medicago truncatula Gaertn. and Lotus japonicus Gifu. As the genus Trifolium is closely allied to Medicago within the Trifolieae tribe (Kajita et al. 2001), comparative genomics is likely to be highly effective for white clover genetics.

Quantitative genetic variation for vegetative morphogenesis traits such as leaf dimensions and plant height, and for reproductive traits such as flowering date and seed pod weight, has been detected in M. truncatula (Bonnin et al. 1996, 1997). Several studies have detected genetic effects on herbage yield in alfalfa (Juan et al. 1993; Kimbeng and Bingham 1998). The genetic control of reproductive traits such as components of seed yield has also been analysed in alfalfa (Bolanos-Aguilar et al. 2001, 2002). Functionally associated genetic markers (Andersen and Lübberstedt 2003) such as EST-derived SSRs and SNPs will provide the most efficient means for alignment between the genetic maps of white clover and the model legume species and for identification of putative conserved QTLs, leading to computational detection of orthologous candidate genes.