Introduction

The grass family is one of the largest plant families, with 10,000 species, and arose only 60 million years ago (MYA), indicating a rapid pace of speciation (Kellogg 2001). It is also the source of modern agriculture. Except for soybean, a legume, the cereals wheat, rice, maize, and sorghum are the largest crops in the world and provide our major nutritional resources. Initial analysis of comparative maps of some cereal genomes indicated that genetic markers are conserved in their order (synteny) and that speciation resulted in the breakage and reassortment of chromosomes (Gale and Devos 1998). Because of the C-value paradox among these species, it became attractive to conceptualize an anchor genome for the grasses, taken from a species that is not only an important worldwide crop but also has a small genome (Messing and Llaca 1998). This concept resulted in the international effort to sequence the rice genome, which has a size of 378 Mb (International Rice Genome Sequencing Project 2005).

However, comparisons of the rice genome with those of other members of the grass family have shown that they differ not only in the expansion of their chromosomes but also in their micro-colinearity (Tikhonov et al. 1999). Moreover, it appeared that gene movement has led to gene insertions that disrupt syntenic alignments of orthologous regions from different species (Song et al. 2002). Furthermore, genome organizations differed in their response to polyploidization. Although in hexaploid wheat homoeologous chromosomes are largely preserved due to the Ph1 locus (Griffiths et al. 2006), maize underwent significant changes not only in terms of gene loss but also in chromosome expansion (Bruggmann et al. 2006). Therefore, our previous belief, based on comparative genetic maps, that grasses represent simple integrative genetic systems has to be reconsidered, and understanding the speciation processes is a more important step in investigating the divergence, organization, and functions of plant genes than previously anticipated.

We believe that commonly used turfgrasses could provide interesting examples of speciation events that offer additional perspectives on the evolution of the grass family and new approaches to breeding. Colonial and creeping bentgrasses (A. capillaris L. 2n = 4x = 28, A1A1A2A2 and A. stolonifera L. 2n = 4x = 28, A2A2A3A3, respectively) are closely related, commercially important turfgrass species that are used extensively on golf courses in temperate regions (Warnke 2003; Ruemmele 2003). Bentgrasses are commonly used on the greens, tees, and fairways, where they form a dense turf that can be maintained at low cutting heights. Currently, breeding programs for both creeping and colonial bentgrasses exist in research universities and in the private sector. These programs have been successful in developing and releasing improved cultivars to the public (Meyer and Funk 1989). Approximately £2.8 million of creeping bentgrass seed and approximately £1.5 million of colonial bentgrass seed were produced in Oregon in 2004 (Young 2004).

However, not much is known about the gene content of these species. Genetic linkage maps based on molecular markers, although still in their infancy, have been developed for creeping bentgrass (Chakraborty et al. 2005; Bonos et al. 2005), and a linkage map of colonial bentgrass is currently being developed as well (Rotter et al. 2006). In the absence of a physical map and as a first step to assess their gene content, we generated 8,470 creeping bentgrass EST sequences and 7,528 colonial bentgrass sequences. We anticipate that these sequences will serve as an important resource for the development of maps with higher densities of markers and therefore enable marker-assisted breeding and discovery of the genes conferring desirable traits.

For instance, one of the major management problems for creeping bentgrass is the fungal disease dollar spot (Walsh et al. 1999). There are no current cultivars of creeping bentgrass that are completely resistant to dollar spot, but there is a considerable variation in the degree of susceptibility (Bonos et al. 2003; Chakraborty et al. 2006). In contrast, cultivars of colonial bentgrass generally have good resistance to dollar spot (Plumley et al. 2000) and, therefore, may be a source of novel genes or alleles that could be used for genetic improvement of creeping bentgrass (Belanger et al. 2003, 2004). The fungal agent of dollar spot is referred to as Sclerotinia homoeocarpa F.T. Bennett. However, it is widely recognized that the taxonomic classification is incorrect and the fungus should be reclassified in one of the genera Rutstroemia, Lanzia, Moellerodiscus, or Poculum (Carbone and Kohn 1993; Holst-Jensen et al. 1997). Formal reclassification has not been done due to lack of information on the teleomorph (Viji et al. 2004). Differences in disease resistance between colonial and creeping bentgrass could be analyzed based on genetic maps and by differential gene expression. Both approaches will be facilitated by the availability of the ESTs.

Besides building a resource for gene content and expression, we also used sequence-clustering techniques to identify conserved gene copies to determine the relationships of the progenitors of these genomes. Both colonial and creeping bentgrasses are allotetraploids, having the A2 genome in common (Jones 1956b). Analysis of conserved genes (Fulton et al. 2002) present among the ESTs allowed us to estimate the age of divergence between the two subgenomes within each species and between the shared colonial and creeping bentgrass A2 genomes.

Materials and methods

RNA isolation and cDNA library construction

The RNA was isolated from an individual creeping bentgrass plant and an individual colonial bentgrass plant that were part of the 2002 field test (Belanger et al. 2004). Leaf samples were collected 56 days after inoculation with the dollar spot fungus and stored at −80°C. For RNA isolation, leaf samples (1 g) were ground to a fine powder with liquid nitrogen and resuspended in 10 ml Tri-Reagent (Sigma-Aldrich, St. Louis, MO, USA). Debris was removed by centrifugation and the supernatant was extracted twice with chloroform. RNA in the aqueous layer was precipitated with isopropanol, and the RNA pellet was washed once with ethanol and dissolved in water. Poly(A+) RNA was isolated from the total RNA using a commercial kit (Oligotex mRNA Purification Midi Kit, Qiagen USA, Valencia, CA, USA).

The cDNA libraries were constructed from the creeping bentgrass and colonial bentgrass poly(A+) RNA. cDNA synthesis and phage packaging were carried out by using a commercial kit (λ ZAP-Express cDNA Library Construction Kit, Stratagene, La Jolla, CA, USA). The primary creeping and colonial bentgrass cDNA libraries contained 0.4 × 10⁶ and 1 × 10⁶ plaque-forming units, respectively. The primary libraries were amplified and stored at −80°C, as described by the manufacturer.

EST sequencing

pBK-CMV phagemids were mass excised from the λ ZAP-Express vector, transformed into bacterial strain XLOLR, and plated onto Luria-Bertani (LB) plus kanamycin medium as described by the manufacturer. Approximately 2,000 phagemids per plate were grown in 243 × 243 mm² BioAssay Dishes (Nalge Nunc International Corp., Rochester, NY, USA). To allow discrimination between recombinant and nonrecombinant phagemids, a solution containing 1 ml LB, 1 ml of 100 mM isopropylthio-β-D-galactoside (IPTG), and 200 μl of 100 mg ml⁻¹ 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (XGAL) was spread on the agar surface of each plate prior to plating the bacterial cultures. White colonies were picked on a ‘Q’ Pix2 automated colony picker (Genetix USA Inc., Boston, MA, USA) into 96 well plates containing 200 μl per well of LB plus kanamycin freezing medium (Zimmer and Gibbins 1997). The cultures were incubated at 37°C for 16 h, duplicated, and stored at −80°C.

The glycerol stocks were used to inoculate deep well plates with 1.2 ml of LB plus kanamycin using a ‘Q’ Bot robot (Genetix USA Inc.). Plasmid DNA was isolated from these overnight cultures using the alkaline lysis method (Birnboim and Doly 1979) in a 96-well format using Whatman filters (Whatman Inc., Clifton, NJ, USA). The DNA was pelleted using isopropanol and dissolved in 1 mM Tris–Cl, pH 8.0. The cDNA inserts were sequenced from the 3′ end with the T7 primer using an ABI PRISM BigDye Terminator v3.1 Cycle Sequencing Ready Reaction kit (Applied BioSystems, Foster City, CA, USA). The unincorporated dyes were removed by solid phase reversible immobilization (SPRI) using magnetic beads (Agencourt Biosciences Corp., Beverly, MA, USA). The reactions were then suspended in 0.1 mM EDTA and run on an ABI 3730xl automated capillary sequencer with 50 cm capillary arrays. Base-calling was done using the ABI KB basecaller, and lower quality sequences were trimmed using the “Lucy” software (TIGR) for quality control (Q16). Sequences were also screened for vector contamination. All sequences >100 bp were processed for GenBank submission. A total of 15,998 successful sequence reads were obtained and deposited in GenBank.

Computational analysis

ABI trace files were converted to fasta format by using the AutoEditor program (TIGR). Cleaned sequences were assembled into contigs by using the CAP3 program (Huang and Madan 1999). The ESTs were imported into the openSputnik EST database and clustered into unigenes for functional annotation (Rudd 2005). Functional assignments were performed using BLASTX (threshold value of 1 × 10⁻¹⁰) against the MIPS catalog of functionally assigned proteins (FunCat; Ruepp et al. 2004).
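The E-value threshold used for functional assignment can be illustrated with a short sketch. It assumes the BLASTX results are available as 12-column tab-separated text (query ID first, subject ID second, E-value in the eleventh column), and that a hit qualifies when its E-value is at or below the threshold; the function name, the toy records, and the choice to keep only the best hit per query are all hypothetical and are not taken from the openSputnik pipeline.

```python
# Illustrative sketch only: keep the best BLASTX hit per unigene below an
# E-value threshold. The 12-column tabular layout is an assumption.
E_VALUE_THRESHOLD = 1e-10  # threshold value stated in the text


def best_hits(lines, threshold=E_VALUE_THRESHOLD):
    """Return {query_id: (subject_id, evalue)} for the lowest-E-value hit
    of each query whose E-value is at or below the threshold."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > threshold:
            continue  # too weak to support a functional assignment
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Made-up records, for illustration only.
example = [
    "unigene1\tprotA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-40\t150",
    "unigene1\tprotB\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-12\t70",
    "unigene2\tprotC\t35.0\t50\t32\t1\t1\t150\t1\t50\t1e-03\t30",
]
hits = best_hits(example)
print(hits)
```

In this toy input, the second unigene has no hit below the threshold and would therefore remain unclassified.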

Reverse transcriptase-PCR (RT-PCR) amplification of creeping bentgrass sequences

Two of the creeping bentgrass sequences (accessions EH667188 and EH667189) used for the subgenome comparisons were obtained by RT-PCR to extend the coding sequences from the original EST accessions DV860785 and DV865904, respectively. Total RNA was isolated from the creeping bentgrass plant by using Tri-Reagent, as described earlier. First strand cDNA was synthesized using a polyT primer, the CDSIII/3′ primer from the Smart cDNA Library Construction Kit (Clontech Laboratories, Inc., Mountain View, CA, USA). One microgram of total RNA and 1 μl of 10 mM CDSIII/3′ primer were mixed in a total volume of 5 μl, heated at 70°C for 5 min and cooled on ice for 2 min. To this mixture was then added 2 μl of M-MLV RT buffer, 1 μl of 10 mM dNTP mix, 1 μl of M-MLV reverse transcriptase (Fisher Scientific, Pittsburgh, PA, USA), and 0.8 μl of Rnasin RNase inhibitor (Promega Corporation, Madison, WI, USA). The reaction was carried out at 42°C for 60 min and then stopped by cooling on ice.

PCR amplification of the entire cDNA product was carried out with gene-specific primers by adding 5 μl of 10X Taq PCR buffer, 4 μl of 2.5 mM dNTP mix, 5 μl of the forward and reverse primers at 10 pmol μl⁻¹, and 1 μl of AmpliTaq DNA Polymerase (Applied Biosystems, Inc.) in a final volume of 50 μl. The 3′ primer for amplification (5′-GCAAGTTAGTGGAACAAGTTAAGAAC-3′) was designed based on the creeping bentgrass EST sequence DV865904. The 5′ primer (5′-GGTGGCTCCCTCTTGTCATA-3′) was designed based on the coding sequence of the colonial bentgrass EST sequence DV855725. The reaction was carried out in a GeneAmp 9700 thermocycler (Applied Biosystems Inc.). The initial denaturation was conducted at 94°C for 2 min, followed by 40 cycles of 30 s denaturation at 94°C, 30 s annealing at 56°C, and 1 min extension at 72°C, followed by a final extension at 72°C for 7 min. The PCR products were purified with a QIAquick PCR purification Kit (Qiagen). The purified PCR products were ligated into the plasmid pGEM-T-Easy (Promega Corporation) and transformed by electroporation into XL1-Blue MRF’ cells (Stratagene). Plasmids from several colonies were purified with a QIAprep Spin Miniprep kit (Qiagen) and sequenced using the forward PCR primer (GeneWiz, Inc., North Brunswick, NJ, USA).
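As a worked illustration of the cycling program and primers described above, the following sketch adds up the programmed hold times and computes the GC fraction of each primer. The step durations and primer sequences are taken from the text; the function names are hypothetical, and ramp times between temperatures are ignored, so the true run time would be somewhat longer.

```python
# Illustrative sketch: programmed hold time of the PCR program and primer GC content.
INITIAL_DENATURATION_S = 2 * 60          # 94 C for 2 min
CYCLES = 40
CYCLE_STEPS_S = {
    "denaturation (94 C)": 30,
    "annealing (56 C)": 30,
    "extension (72 C)": 60,
}
FINAL_EXTENSION_S = 7 * 60               # 72 C for 7 min

FORWARD_PRIMER = "GGTGGCTCCCTCTTGTCATA"          # 5' primer from the text
REVERSE_PRIMER = "GCAAGTTAGTGGAACAAGTTAAGAAC"    # 3' primer from the text


def total_program_seconds():
    """Sum of all programmed holds, excluding temperature ramp times."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + CYCLES * per_cycle + FINAL_EXTENSION_S


def gc_fraction(primer):
    """Fraction of G and C bases in a primer sequence."""
    primer = primer.upper()
    return sum(base in "GC" for base in primer) / len(primer)


print(total_program_seconds() / 60, "min of programmed holds")
print(round(gc_fraction(FORWARD_PRIMER), 2), round(gc_fraction(REVERSE_PRIMER), 2))
```

The programmed holds add up to 89 min (2 min, plus 40 cycles of 2 min, plus 7 min).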

Phylogenetic analysis

The CLUSTAL-X (Thompson et al. 1997) program was used to align DNA sequences. Phylogenetic analysis was performed with the PAUP program (version 4.0b10 for Macintosh; Swofford 2002). Phylogenetic trees were generated using the maximum parsimony (1,000 bootstrap replications) and the neighbor-joining methods. The Ks and Ka values were determined using the MEGA3.1 program (Kumar et al. 2004).

Results

EST generation

As a first step to gain insight into the gene content and the evolution of colonial and creeping bentgrass, we constructed cDNA libraries for both. To simultaneously facilitate discovery of differential gene expression in response to environmental stress, we took advantage of the fact that colonial bentgrass generally has good resistance to dollar spot while creeping bentgrass does not. Therefore, leaf tissue from creeping and colonial bentgrass plants from the 2002 field test was taken after inoculation with the dollar spot fungus, at a time when the creeping bentgrass plant was showing symptoms of dollar spot disease (Belanger et al. 2004). The colonial bentgrass plant was not exhibiting symptoms of the disease. The creeping and colonial bentgrass cDNA libraries are therefore expected to include clones for genes that were induced in response to environmental stress and in particular to the fungal inoculum.

The creeping and colonial bentgrass cDNAs were cloned unidirectionally into the λ ZAP Express vector. The cDNA clones were sequenced from the 3′ ends to generate sequence data from the untranslated region of the transcripts. The 3′ untranslated regions of genes are generally more polymorphic than the coding sequences, and therefore more useful for molecular marker development (Bhattramakki et al. 2002; Brady et al. 1997). The characteristics of the colonial and creeping bentgrass ESTs are summarized in Table 1. A total of 10,944 randomly chosen cDNA clones from each library were single pass sequenced from the 3′ end. From the creeping bentgrass cDNAs, 8,470 usable sequences were obtained with an average read length of 567 bases after vector trimming. From the colonial bentgrass cDNAs, 7,528 usable sequences were obtained with an average read length of 745 bases after vector trimming. These sequences have been deposited in the NCBI dbEST (colonial bentgrass accession nos: DV852741–DV860268; creeping bentgrass accession nos: DV860269–DV868738) and in the openSputnik EST database (http://www.sputnik.btk.fi/ests; Rudd 2005). The sequences can also be searched at http://www.aesop.rutgers.edu/~belangerlab/research.htm.

Table 1 Characteristics of the colonial and creeping bentgrass EST sequences

Chloroplast and microbial sequences

To determine the level of chloroplast contamination among the ESTs, the colonial and creeping bentgrass sequences were compared with the wheat chloroplast genome sequence (Ogihara et al. 2002) by a BLASTN search. Some sequences nearly identical to the wheat chloroplast genome were present among the colonial and creeping bentgrass ESTs (Table 1). The contamination level was 0.5 and 1.4% for the colonial and creeping bentgrass ESTs, respectively.

Since the cDNA libraries were constructed from field grown plants that were inoculated with the dollar spot fungus, some microbial sequences would be expected among the ESTs. To estimate the level of microbial ESTs in the libraries, the colonial and creeping bentgrass ESTs were compared to the 14,522 protein sequences of the plant pathogen Sclerotinia sclerotiorum (Sclerotinia sclerotiorum Sequencing Project, Broad Institute of Harvard and MIT, http://www.broad.mit.edu) using a BLASTX search. Although the dollar spot fungus should not be classified in the genus Sclerotinia (Holst-Jensen et al. 1997), S. sclerotiorum is the most closely related species for which there is a whole genome sequence available. For the bentgrass ESTs that had a match to a S. sclerotiorum protein sequence, the expected value was compared with the expected value to the best match in a BLASTX search of the NCBI database. Those bentgrass ESTs for which the expected value was lower with the S. sclerotiorum match than with the NCBI match were considered to be likely of microbial origin. This comparison was done because most of the S. sclerotiorum protein sequences were not included in the NCBI database. For the colonial and creeping bentgrass ESTs, 184 and 99 sequences, respectively, met these criteria. Sequences of microbial origin in the libraries are therefore estimated to be 2.4% for colonial bentgrass and 1.2% for creeping bentgrass.
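The decision rule described above can be sketched as follows. The EST counts come from the text; the function names are hypothetical, and how an EST with a S. sclerotiorum match but no NCBI match should be handled is not stated in the text, so treating it as microbial is an assumption made only for this illustration.

```python
# Illustrative sketch of the microbial-origin criterion: an EST is flagged when
# its BLASTX expected value against S. sclerotiorum is lower than the expected
# value of its best match in the NCBI database.
def is_likely_microbial(evalue_sclerotinia, evalue_ncbi_best):
    if evalue_sclerotinia is None:
        return False          # no match to a S. sclerotiorum protein
    if evalue_ncbi_best is None:
        return True           # assumption: no NCBI match at all (not stated in the text)
    return evalue_sclerotinia < evalue_ncbi_best


def percent(count, total):
    """Percentage of a library, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts reported in the text.
colonial_pct = percent(184, 7528)
creeping_pct = percent(99, 8470)
print(colonial_pct, creeping_pct)

# Hypothetical expected values, for illustration only.
print(is_likely_microbial(1e-50, 1e-20))   # better fungal match: flagged
print(is_likely_microbial(1e-15, 1e-60))   # better NCBI match: not flagged
```

With the reported counts, this reproduces the stated estimates of 2.4% and 1.2%.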

Unigene analysis

The sequences from each library were imported into the openSputnik EST database and clustered into unigenes (Rudd 2005). Analysis of the colonial bentgrass sequences resulted in 1,095 multimember unigenes and 3,579 singletons. Analysis of the creeping bentgrass sequences resulted in 980 multimember unigenes and 3,901 singletons. Recent estimates of the gene content of another ancient tetraploid grass species, maize, have placed its total gene number between 42,000 and 56,000 genes (Haberer et al. 2005), whereas the total predicted gene number for the sequenced rice genome was 37,544 (International Rice Genome Sequencing Project 2005). Therefore, we estimated that we had obtained sequences from roughly 10% of the total gene content of the two turfgrass species.
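The rough 10% figure can be checked with simple arithmetic. The sketch below uses the multimember and singleton counts given in this paragraph and, as an assumption, treats the published rice and maize gene-number estimates as stand-ins for the unknown bentgrass gene number; the variable names are hypothetical.

```python
# Illustrative arithmetic: unigene count as a fraction of plausible total gene numbers.
unigenes = {
    "colonial": 1095 + 3579,   # multimember unigenes + singletons
    "creeping": 980 + 3901,
}

# Reference gene numbers cited in the text (used here only as stand-ins).
reference_gene_numbers = {
    "rice": 37544,
    "maize (low)": 42000,
    "maize (high)": 56000,
}

coverage = {
    (species, ref): n_unigenes / n_genes
    for species, n_unigenes in unigenes.items()
    for ref, n_genes in reference_gene_numbers.items()
}

for (species, ref), fraction in sorted(coverage.items()):
    print(f"{species} vs {ref}: {100 * fraction:.1f}%")
```

Across these denominators the fractions fall between about 8% and 13%, which is consistent with an estimate of roughly 10%.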

The unigene sequences were assigned to functional categories based on the MIPS catalog of functionally assigned proteins (Ruepp et al. 2004) (Table 2). Of the 4,425 colonial bentgrass unigenes, 1,843 could be classified. Of the 4,747 creeping bentgrass unigenes, 1,048 could be classified. There were more creeping bentgrass unigenes with no match. This may be a reflection of the shorter average length of the creeping bentgrass sequences and the creeping bentgrass unigenes. Since the sequences were from the 3′ ends, there may be inadequate coding sequence information in some of the creeping bentgrass ESTs on which to base a match for functional classification. Of those unigenes that could be assigned a functional classification, the percentage distribution between colonial and creeping bentgrass was similar, with one exception. There was a 10-fold higher representation of creeping bentgrass unigenes in the category of transposable elements, relative to that of colonial bentgrass. The difference in transposable elements between the creeping and colonial ESTs is discussed later. Annotation of the unigenes within each functional category is available on the openSputnik website (http://www.sputnik.btk.fi/ests).

Table 2 Distribution of the functional classes of colonial and creeping bentgrass unigenes

Transposable element sequence representation in the colonial and creeping bentgrass ESTs

In the FunCat analysis of the unigenes, transposable element sequences represented a 10-fold higher percentage of the creeping bentgrass unigenes, relative to the colonial bentgrass unigenes. The distribution of transposable element ESTs between the two species was, therefore, further investigated. Transposable elements are found in all plant genomes and are classified into two groups, the Class I retroelements (retrotransposons with LTRs and retrotransposons without LTRs) and the Class II DNA transposons (hAT, CACTA, Mutator, MITEs, Helitrons, etc.). Many grass genomes are largely composed of retrotransposon sequences (Bennetzen 2000; Feschotte et al. 2002; Messing et al. 2004; Haberer et al. 2005). An analysis of 7.8 × 10⁵ EST sequences from numerous plant species revealed that retrotransposons were active and represented 0.12% of the ESTs (Vicient et al. 2001). Some of the colonial and creeping bentgrass EST sequences were most similar to retrotransposons described from other species, particularly rice. The colonial and creeping bentgrass ESTs were systematically searched with the coding sequences of numerous rice retrotransposons of different classes and the best matches were tabulated. The results are summarized in Table 3. The expected values of the matches ranged from 8 × 10⁻⁵ to 8 × 10⁻¹²⁸. For all the ESTs included in the data summarized in Table 3, a retrotransposon sequence was the best match. There is a striking difference in the representation of retrotransposons among the colonial bentgrass ESTs, relative to that among the creeping bentgrass ESTs. In the colonial bentgrass library, retrotransposon ESTs were 0.18% of the total, similar to the 0.12% reported from the survey of ESTs from many plant species (Vicient et al. 2001). In contrast, in the creeping bentgrass EST library retrotransposon sequences represented 1.4% of the total, an 8-fold higher representation than in the colonial bentgrass library.

Table 3 Colonial and creeping bentgrass transposable element ESTs

The Class II DNA elements are found in lower copy numbers than the retrotransposons. The bentgrass ESTs were searched with several Class II DNA element coding sequences and some significant matches were found (Table 3). Similar to the case with the retrotransposons, there was a 16-fold higher representation of the Class II DNA elements among the creeping bentgrass ESTs, relative to the colonial bentgrass ESTs.

This difference in the transposable element representation between the creeping bentgrass and colonial bentgrass ESTs suggests an interesting correlation between expressed transposable elements and response to environmental stress. Stress was phenotypically measurable because the creeping bentgrass sample was exhibiting symptoms of dollar spot disease, whereas the colonial bentgrass sample was not. Further studies of this aspect would call for microarray or 454 sequencing analysis of mRNA sampled under different types of stress conditions. Nevertheless, it has been shown that transposable elements can be activated by various abiotic and biotic stresses, including pathogen attack (McClintock 1984; Casacuberta and Santiago 2003; Grandbastien 1998; Kimura et al. 2002; Wessler 1996).

Phylogenetic analysis

Taxonomically, the genus Agrostis is placed in the subfamily Pooideae of the grass family Poaceae. Agrostis is assigned to the tribe Aveneae based on morphological characters and chloroplast DNA restriction site characters (Renvoize and Clayton 1992; Watson 1990; Soreng and Davis 1998). Agrostis species have not previously been included in any phylogenetic comparisons with other grass species based on DNA sequence data. To better establish the relationship of Agrostis to the cereal grasses, we generated phylogenetic trees based on our EST sequences (Fig. 1; Tables 4, 5).

Fig. 1

Rooted maximum parsimony phylogenetic trees comparing COS sequences from colonial bentgrass and creeping bentgrass with orthologous sequences from cereal grasses. The gene identifications and the GenBank accession numbers for the sequences used in the analysis are presented in Tables 4 and 5. The maize sequence was designated as the outgroup for rooting the trees. The numbers at the nodes are the bootstrap percentages based on 1,000 replications

Table 4 Accession numbers of sequences used for the colonial bentgrass phylogenetic analysis presented in Fig. 1
Table 5 Accession numbers of sequences used for the creeping bentgrass phylogenetic analysis presented in Fig. 1

For this type of analysis, it is important to select putative orthologous sequences from all the species involved. Identification of orthologous genes usually requires positional information so that their colinearity and descent from a common ancestral chromosome can be established. Such information cannot be obtained from cDNA sequences. However, clustering methods have shown that orthologous and paralogous sequences can be distinguished by their divergence rates (Xu and Messing 2006). Fulton et al. (2002) first proposed the concept of conserved ortholog set (COS) markers for comparative genomics. COS markers are single copy genes that have been conserved throughout evolution, and can therefore be considered as orthologs (Fulton et al. 2002). The creeping and colonial bentgrass ESTs were therefore systematically searched for COS genes. A set of 1,290 COS genes has been identified in rice by the Michelmore lab (http://www.cgpdb.ucdavis.edu/COS Arabidopsis/). Here, creeping and colonial bentgrass EST sequences were compared to the rice COS genes, and orthologs of 177 and 161 rice COS genes were found among the colonial and creeping bentgrass ESTs, respectively. Because there was not much overlap between the two bentgrass COS datasets, separate trees were generated for the colonial bentgrass and creeping bentgrass COS sequences to maximize the total number of sequences that could be used for the phylogenetic analysis. These colonial and creeping bentgrass COS sequences were used to identify similar sequences from Avena sativa (oats), Festuca arundinacea (tall fescue), Hordeum vulgare (barley), Triticum aestivum (wheat), Brachypodium distachyon, Oryza sativa (rice) and Zea mays (maize). Overall, for 11 and 7 of the colonial bentgrass and creeping bentgrass COS sequences, respectively, similar sequences were available for all the other species. These genes were confirmed to be single copy genes in rice by BLASTX searches in Gramene (http://www.gramene.org/).
For each gene, the sequences were aligned and trimmed to include only the region of coding sequence overlap for all the species. The trimmed sequences for all the genes were combined and a phylogenetic tree generated from a maximum parsimony analysis (Fig. 1). The maximum parsimony analysis of the combined dataset for the colonial bentgrass tree was based upon 3,675 total characters, of which 2,820 were constant, 483 variable characters were parsimony uninformative, and 372 characters were parsimony informative. The creeping bentgrass tree was based on 1,754 total characters, of which 1,361 were constant, 234 variable characters were parsimony uninformative, and 159 characters were parsimony informative. The maize sequence was designated as the outgroup for rooting the trees. Neighbor-joining analysis of the data generated trees of identical topology (not shown). In both the maximum parsimony and neighbor-joining analyses colonial bentgrass and creeping bentgrass were closer to tall fescue (tribe Poeae) than to oats (tribe Aveneae).
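The three character classes reported for the parsimony analysis can be illustrated with a minimal sketch. It classifies each alignment column as constant, variable but parsimony uninformative, or parsimony informative (at least two states, each present in at least two sequences). This is a simplified illustration with hypothetical function names and a made-up alignment; it does not handle gaps or ambiguity codes and is not a reimplementation of PAUP.

```python
# Illustrative sketch: classify alignment columns for a parsimony analysis.
from collections import Counter


def classify_column(column):
    """Classify one alignment column (an iterable of character states)."""
    counts = Counter(column)
    if len(counts) == 1:
        return "constant"
    # Informative: at least two states that each occur in two or more sequences.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


def summarize(alignment):
    """Count columns of each class in a list of equal-length aligned sequences."""
    tally = Counter(classify_column(col) for col in zip(*alignment))
    return {k: tally.get(k, 0) for k in ("constant", "uninformative", "informative")}


# Made-up four-sequence alignment, for illustration only.
toy_alignment = ["ACGT", "ACGA", "ATGA", "ATCA"]
toy_summary = summarize(toy_alignment)
print(toy_summary)

# The character counts reported in the text add up to the stated totals.
print(2820 + 483 + 372, 1361 + 234 + 159)
```

In the toy alignment the first column is constant, the second is informative, and the last two are variable but uninformative, because the minority state in each occurs in only one sequence.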

Estimation of the divergence time of the bentgrass subgenomes

An extensive cytological investigation into some Agrostis spp. and their interspecific hybrids was reported by Jones (1956a, b, c). Based on the cytological results, Jones (1956b) proposed a model for genome organization of creeping and colonial bentgrass in which both species were considered to be allotetraploids, having one ancestral genome in common. Both species were found to have 14 chromosome pairs. The genome organization of colonial bentgrass was designated as A1A1A2A2 and that of creeping bentgrass A2A2A3A3 (Jones 1956b). The A1 subgenome of colonial bentgrass was considered to be related to the diploid species A. canina (velvet bentgrass). The diploid origins of the A2 and A3 genomes are unknown. Analysis of marker segregation in creeping bentgrass demonstrated that inheritance is strictly disomic (Chakraborty et al. 2005) so distinct subgenomes are expected.

As allotetraploids, both creeping and colonial bentgrass would be expected to have homoeologous COS genes. Sequence comparisons of orthologous creeping and colonial bentgrass genes will be helpful in the identification of the subgenomes of both species. For some of the COS genes represented in the bentgrass ESTs, multiple sequences for each species were obtained. There were four such cases where two distinct sequence types were found among each of the colonial and the creeping COS ESTs. Maximum parsimony phylogenetic analyses of these homoeologous sequence sets revealed that some creeping and colonial sequences were more similar to each other than they were to the other similar sequences from within each species (Fig. 2). These results are consistent with expectations based on the previous cytological work. Since the A2 genome is shared between colonial and creeping bentgrass (Jones 1956b), the A2 genome sequences are expected to be more similar to each other than to the A1 or A3 genome sequences. Subgenome assignments could therefore be made from the phylogenetic analyses of the sequences. The orthologous rice sequence was used to root the trees. The tree topologies were the same using either the maximum parsimony or neighbor-joining methods (data not shown). For the phylogenetic analyses presented in Fig. 2, both the available coding sequences and the 3′ untranslated regions of the EST sequences were used. The sequences of the creeping bentgrass accessions EH667188 and EH667189 were generated by RT-PCR because the original EST sequences did not include much coding sequence.

Fig. 2

Rooted maximum parsimony phylogenetic tree comparing homoeologous colonial and creeping bentgrass COS sequences. The orthologous rice sequence was designated as the outgroup for rooting the trees. The numbers at the nodes are the bootstrap percentages based on 1,000 replications. The total characters/constant characters/parsimony uninformative variable characters/parsimony informative characters for the analyses presented in a–d are a (508/400/91/17), b (547/412/126/9), c (703/528/151/24), and d (603/461/130/12)

The divergence times of the subgenome pairs were estimated by using an approach similar to that used to estimate the age of polyploidy in Gossypium (cotton) (Senchina et al. 2003). First, the average divergence rate for each gene was calculated by calibration to rice using the formula T = Ks/2r (Senchina et al. 2003). The divergence time (T) between rice and the Aveneae has been estimated to be 46 MYA (Gaut 2002). The average Ks (substitutions at synonymous sites) and Ka (substitutions at nonsynonymous sites) for the coding sequence of each gene relative to rice were determined by comparing the colonial and creeping bentgrass coding sequences to the orthologous rice sequence. The low Ka/Ks ratios for the four COS genes used for the divergence time estimates indicate these genes are under purifying selection. From the average Ks for each gene, the rate of divergence of synonymous sites (r) for each gene could therefore be calculated (Table 6). The calculated rates varied from 4.3 × 10⁻⁹ to 6.7 × 10⁻⁹ substitutions per synonymous site per year, which are similar to rates calculated for other grass species (Gaut et al. 1996; Swigonova et al. 2004; Devos et al. 2005).

Table 6 Estimation of the divergence rate of colonial and creeping bentgrass sequences relative to rice

The calculated rates of divergence of each gene were then used to calculate the time of divergence between the colonial A1/colonial A2, creeping A2/creeping A3, and colonial A2/creeping A2 subgenome pairs, based on the K s value determined for each pair (Table 7). The mean divergence times of the colonial and creeping bentgrass subgenomes were similar, at 8.9 and 10.6 MYA, respectively. The mean divergence time of the colonial and creeping bentgrass A2 subgenomes was 2.2 MYA, suggesting a recent origin of these allotetraploid species.
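The two-step calculation can be sketched as follows: the synonymous substitution rate r of a gene is first obtained from its Ks relative to rice and the 46 MYA calibration, and that rate is then applied to the Ks of a subgenome pair, both using T = Ks/2r. The function names are hypothetical, and the Ks values in the example are made up for illustration; they are not the values in Tables 6 and 7.

```python
# Illustrative sketch of the divergence time estimate based on T = Ks / (2r).
RICE_CALIBRATION_YEARS = 46e6  # rice/Aveneae divergence time used in the text


def synonymous_rate(ks_vs_rice, calibration_years=RICE_CALIBRATION_YEARS):
    """Rate r (substitutions per synonymous site per year), from r = Ks / (2T)."""
    return ks_vs_rice / (2 * calibration_years)


def divergence_time(ks_pair, rate):
    """Divergence time in years for a sequence pair, from T = Ks / (2r)."""
    return ks_pair / (2 * rate)


# Hypothetical Ks values, for illustration only.
rate = synonymous_rate(0.46)            # 5.0e-9 per site per year
time_years = divergence_time(0.02, rate)
print(rate, time_years / 1e6, "MYA")
```

With these made-up inputs the rate is 5.0 × 10⁻⁹ substitutions per synonymous site per year and the pair diverged about 2 MYA; averaging such per-gene estimates gives a mean divergence time for each subgenome pair.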

Table 7 Estimation of divergence times of the bentgrass subgenome pairs

Discussion

The generation of colonial and creeping bentgrass ESTs presented here is a first step in applying a genomics approach to the analysis of the gene content and the evolution of these commercially important turfgrasses. We have made available 7,528 and 8,470 colonial and creeping bentgrass EST sequences, respectively. Based on the unigene clustering, the redundancy of the EST sequences was 56% for both colonial and creeping bentgrass. This relatively low redundancy level suggests a high gene number for these species and that additional cDNA sequencing would be a cost-effective way to obtain additional gene content information.

Analysis of COS sequences found among the ESTs was used to place Agrostis in a phylogeny of other Poaceae. Although Agrostis is traditionally considered a member of the tribe Aveneae (Renvoize and Clayton 1992; Watson 1990; Soreng and Davis 1998), both the maximum parsimony and neighbor-joining analyses placed it closer to tall fescue, a member of the tribe Poeae, than to oats, which is also in the tribe Aveneae. The tribes Aveneae and Poeae are closely related (Kellogg 1998) and in some phylogenies have been combined, although this point remains unresolved (Grass Phylogeny Working Group 2001). The phylogenetic analysis presented here is consistent with the close relationship of the Aveneae and Poeae tribes.

To examine the relationships of the bentgrass subgenomes to each other we estimated the divergence times of the subgenome pairs based on comparisons of four COS genes for which all sequence types were present among the ESTs. The average divergence times for the colonial A1/colonial A2, creeping A2/creeping A3, and the colonial A2/creeping A2 subgenome pairs were 8.9, 10.6, and 2.2 MYA, respectively. Based on a cytological analysis, Jones (1956b) concluded that creeping and colonial bentgrass had the A2 genome in common. The analysis presented here based on the EST sequence data supports his conclusion. The colonial and creeping bentgrass A2 subgenome sequences are considerably more similar to each other than to the orthologous A1 or A3 sequences, respectively. The estimated divergence of the colonial and creeping bentgrass A2 subgenomes from a common ancestor 2.2 MYA suggests this as the approximate time of the polyploidy events.

Recent analysis of homoeologous regions of the maize genome with orthologous regions of the sorghum and rice genomes showed that sorghum and maize separated from rice 50 MYA, whereas the progenitors of maize and sorghum split nearly instantaneously 11.9 MYA (Swigonova et al. 2004). Maize then arose from the hybridization of two progenitors as recently as 4.8 MYA. Since maize chromosomes expanded in size in bursts after this event (Du et al. 2006), it will be interesting to analyze the repeat elements of the bentgrass subgenomes as well. This would be particularly interesting in light of the possible differential roles of the subgenomes in response to environmental stress. Colonial bentgrass and the A1 diploid species velvet bentgrass both have good resistance to dollar spot, whereas creeping bentgrass does not. Therefore, the genetic basis of the dollar spot resistance in colonial bentgrass possibly originates from the A1 subgenome.

Colonial and creeping bentgrass provide examples of polyploid grass genomes that are intermediate in age between that of maize, which has become diploidized, and that of hexaploid wheat. The tetraploid ancestor of wheat (T. turgidum) is believed to have formed 0.5 MYA, and its hybridization with the diploid species Aegilops tauschii to produce hexaploid T. aestivum is believed to have occurred as recently as 8,000 years ago (Huang et al. 2002).

Although rice has a diploid genome, it underwent many segmental duplications, the most recent one 7.7 MYA (Rice Chromosomes 11 and 12 Sequencing Consortia 2005). Given the recent polyploidization of colonial and creeping bentgrass, it will also be interesting to see whether any of the ancestral rice segmental duplications are also detectable in the four bentgrass homoeologous subgenomes.

The age of polyploidy in tetraploid cotton (Gossypium) was estimated to be about 1.5 MYA by comparing the divergence times of the tetraploid subgenomes with the corresponding diploid species (Senchina et al. 2003). For tetraploid colonial and creeping bentgrasses, the diploid origin of the shared A2 genome is unknown. We therefore directly compared the divergence times of four colonial bentgrass A2 and creeping bentgrass A2 gene sequences to estimate the age of polyploidy. Velvet bentgrass was considered to be the diploid origin of the A1 genome present in colonial bentgrass (Jones 1956b). Our efforts to obtain sequence data from velvet bentgrass for inclusion in the analysis, by RT-PCR with primers designed from colonial bentgrass, were unsuccessful. As more sequence data become available from velvet bentgrass, and if the diploid A2 and A3 Agrostis spp. can be identified, a more extensive analysis of the age of polyploidy of colonial and creeping bentgrass will be possible. Nevertheless, even sampling the gene content of genomes at roughly the 10% level provides important insights into the effects of polyploidization during speciation, and a solid foundation on which to build a resource for studies of the evolution and gene pool of today’s domesticated plant species.