Introduction

There are at least 250,000 species of angiosperms, and the morphologies of their reproductive organs, flowers, vary extensively (Thorne 1992; Crane et al. 1995). Great efforts have been made to understand the extent and origin of the diversity of flowering plants, and progress has been made by applying molecular analysis to the fields of phylogenetics and developmental biology. Studies have shown the monophyletic relationship of extant angiosperms, but the early evolution of angiosperms, as well as their closest relatives among gymnosperms, is still unclear because of a lack of appropriate fossils and ambiguous phylogenetic placement of basal angiosperm lineages (Judd et al. 2002). Five groups of angiosperms have been identified as the earliest flowering plants: Amborellaceae, Nymphaeales, Illiciales, Trimeniaceae, and Austrobaileya (Mathews and Donoghue 1999; Parkinson et al. 1999; Qiu et al. 1999; Soltis et al. 1999; Borsch et al. 2003). These groups of angiosperms are termed the ANITA grade. Although these groups are likely basal, further investigation is still necessary to determine the exact order of relationships among them (Barkman et al. 2000; Graham and Olmstead 2000; Mathews and Donoghue 2000; Zanis et al. 2002; Goremykin et al. 2003).

Investigation of the diversity of angiosperms often makes use of the structures of flowers. To investigate floral structure and development, molecular genetic studies used dicotyledonous plants, such as Arabidopsis thaliana, Antirrhinum majus, and Petunia hybrida. These plants have the general floral structure of eudicotyledonous plants, with four organ types (sepal, petal, stamen, and gynoecium) arranged in four whorls. Molecular genetic analyses with these plants have focused on a set of homeotic floral-organ mutants, leading to a model in which five classes of loci, known as the A, B, C, D, and E classes, interact in a combinatorial manner to specify organ identities (Bowman et al. 1991; Coen and Meyerowitz 1991; Meyerowitz et al. 1991; Theissen 2001; Theissen and Saedler 2001). In mutants of the B-class loci, petals are transformed into sepals and stamens into carpels, indicating that the products of the B-class genes are required for the specification of petal and stamen identities (Sommer et al. 1990; Jack et al. 1992; Tröbner et al. 1992; Goto and Meyerowitz 1994). Like many floral-organ identity genes, the Arabidopsis B-class genes APETALA3 (AP3) and PISTILLATA (PI) belong to the MADS-box gene family, which encodes transcription factors with four domains (Fig. 1; Ma et al. 1991; Richmann and Meyerowitz 1997; Irish 2003). The region in these proteins that contains the highly conserved MADS-box sequence is termed the MADS domain. The second of the four domains, the K domain, is similar to the coiled-coil segment of keratin. The intervening region between the MADS and K domains is termed the I domain, and the C-terminal region is also called the C domain. MADS-box genes that contain these four regions are specific to plants and are designated as the MIKC-type MADS-box genes.

Fig. 1

Diagram of the exon/intron organization of the B-class genes. Intron positions are indicated by triangular cutouts. Numbers of amino acid residues from the methionine initiation codon are shown below the diagram. Exon lengths in nucleotides are shown in parentheses (see also Table 3). The MADS, K, I, and C domain regions are indicated above the diagram. The ranges of MADS and K domains are consistent with those reported by Ma et al. (1990). Lines with numbers shown below the diagram indicate the region used for the phylogenetic analysis

Phylogenetic analysis of the MADS-box gene family has revealed that this family is composed of several defined gene clades (Theissen et al. 1996; Munster et al. 1997; Hasebe et al. 1998; Becker and Theissen 2003). The establishment of these gene clades is thought to have been an important event in the evolution of flowers (Theissen et al. 1996, 2000; Bowman 1997; Kramer et al. 1998; Becker et al. 2002). The clade of B-class genes of angiosperms comprises two lineages: the AP3 and the PI homologues. Numerous genes of the AP3 and PI lineages have been isolated from eudicots and magnoliids (Kramer et al. 1998; Kramer and Irish 2000; Kramer et al. 2003). Recently, putative orthologs of angiosperm B-class genes have been isolated from several gymnosperms (Mouradov et al. 1999; Sundström et al. 1999; Winter et al. 1999; Fukui et al. 2001). A group of MADS-box genes has also been identified as the sister lineage of the B-class genes and termed the Bsister (Bs) clade (Becker et al. 2002). Phylogenetic studies with these genes have provided evidence that the evolution of these lineages was complex, involving many duplication events followed by considerable sequence divergence (Kramer and Irish 2000; Nam et al. 2003). It has been suggested that additional sequence data from basal angiosperms would clarify the relationships among these genes (Kramer and Irish 2000).

We identified putative B-class MADS-box genes from eight species of the basal angiosperm groups Amborellaceae, Nymphaeales, Illiciaceae, and Schisandraceae. The molecular evolutionary analysis that we performed using these genes will advance the understanding not only of the evolution of floral homeotic genes but also of the phylogenetic relationships among the major lineages of basal angiosperms. Furthermore, we use several methods of analysis to estimate the divergence times between the AP3 and PI lineages as well as the time of first divergence of the extant angiosperms.

Materials and methods

Sources of plant materials

The plant materials used in this study are shown in Table 1. Developing floral buds and flowers were collected throughout the growing season from the field or botanical gardens. The taxonomic designations in this study follow the system proposed by APG (Angiosperm Phylogeny Group) II (2003) and two other studies (Soltis et al. 1999; Endress 2001).

Table 1 Plant materials and homologuesa of B-class genes identified in this study

Isolation of AP3-like and PI-like MADS-box genes

Total RNA from flowers, floral buds, and floral organs was prepared using an improved version of the CTAB method (Shindo et al. 1999). Single-stranded complementary DNA (cDNA) was synthesized using the 3′ Rapid Amplification of cDNA Ends (RACE) System (Gibco BRL). Primers for degenerate polymerase chain reactions (PCR) were designed to match the MADS-box sequences, as follows: MADS head primer, 5′-CAUCAUCAUCAUATGGGIMGIGGIAARATHGARAT-3′; AP3MADS3 primer, 5′-CAUCAUCAUCAUACIAAYMGICARGTIACNTA-3′. PCR was performed in 25-µl reactions containing 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 1 µM each of the 5′ and 3′ primers, 200 µM dNTPs, and 2 units of AmpliTaq Gold polymerase (Perkin-Elmer). Amplifications were performed using the GeneAmp PCR System 9700 (PE Applied Biosystems) with the following program: preheating at 96 °C for 12 min; 45 cycles of 95 °C for 20 s (denaturation), 52 °C for 20 s (annealing), and 72 °C for 1.5 min (extension); and post-heating at 72 °C for 7 min. The annealing temperature was modified appropriately for different primers and plant materials. PCR products of about 0.8–1 kbp were purified using gel electrophoresis and inserted into the pAMP vector (Promega). Both strands of the inserted DNA fragments were sequenced with the T7 and SP6 primers. Primers for 5′ RACE were then designed according to the sequences obtained from the 3′ RACE amplification, and isolation and sequencing of the fragments were performed using the 5′ RACE System (Gibco BRL). The number of independent clones sequenced for each gene is shown in Table 1. We obtained several clones from different PCR reactions for each gene except NtAP3-2. The nucleotide sequence of NtAP3-2 was determined from a single clone and may therefore contain PCR errors. The sequences of cDNAs isolated in this study were submitted to the DDBJ/EMBL/GenBank sequence database (Appendix).

Genomic DNA of plant materials, mainly leaves, was isolated using the DNeasy Plant Mini Kit (Qiagen). Partial genomic DNA fragments were isolated using the shuttle-PCR method with suitable primers and the following program: preheating at 96 °C for 12 min; 45 cycles of 95 °C for 20 s (denaturation) and 58 °C for 2 min (annealing and extension); and post-heating at 72 °C for 7 min. The resulting products were inserted into the pGEM-T vector (Promega) and sequenced. The genomic sequences identified in this study have been deposited into the DDBJ/EMBL/GenBank sequence database with accession numbers AB154840, AB164842, and AB154847.

Phylogenetic analysis

At present, more than 200 sequences of B-class and Bs MADS-box genes from various plant species are available. We compiled 125 of these sequences into a data matrix. Some genes (e.g., CjMADS2 of Cryptomeria japonica; Fukui et al. 2001) were omitted from the phylogenetic analysis because of deletions in their sequences. DAL12 (Sundström et al. 1999) and its putative ortholog GGM15 (Winter et al. 2002) were also eliminated because these genes have an exon 5 of atypical length. These genes are thought to have undergone unusual changes (Winter et al. 2002). As the sequences of several gene sets from closely related plants are highly similar, we used representative sequences from each gene set. Recently, many genes have been isolated from plants in the Ranunculales and it has been shown that B-class genes of these plants have undergone many duplication events (Kramer et al. 2003). The phylogenetic tree of the Ranunculales genes has clades at various phylogenetic levels (Kramer et al. 2003), and we included representative members from each. We omitted GGM2 (Winter et al. 1999; Becker et al. 2000, 2002; Tanabe et al. 2003) because the clade of gymnosperm B-class genes did not include this gene.

Sequences that are divergent in several regions, such as the I and C-terminal regions, complicate alignment of this gene family, and the previously reported alignments of MADS-box genes vary greatly. We adopted the alignment method of Johansen et al. (2002). All of the protein sequences, including both previously reported sequences and those identified in this work, were aligned exon by exon according to the known gene structures, as the individual exons are likely homologous (Fig. 2A). Genes for which genomic sequences were available were aligned first, and subsequently genes for which only the cDNA sequence was known were grouped and aligned to the others. Alignment was initially performed using the program CLUSTAL W version 1.7 with a gap-opening penalty of 15.00 and a gap-extension penalty of 6.66, followed by manual adjustment, taking into account gene structure. Prior to phylogenetic analysis, regions in which indels (insertions/deletions) were present in some sequences were excluded from the data set, and the sequence lengths were adjusted. As the alignment was not straightforward at the terminal regions of exons 1 and 2, these sites were also omitted before analysis. The alignment of 125 MADS-box genes provided a matrix of 154 protein residue sites. Figure 1 shows the scheme of exons 1–6 and the sequence regions used for phylogenetic analysis.
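The exclusion of indel-containing regions described above can be illustrated with a minimal sketch. It assumes an alignment held as a dictionary of equal-length strings with "-" as the gap symbol; the function name and the three toy sequences are hypothetical and are not taken from the data matrix used in this study, which was also adjusted manually.

```python
def drop_gapped_columns(alignment, gap="-"):
    """Keep only the alignment columns in which no sequence has a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # A column is kept only if every sequence has a residue at that position.
    keep = [i for i in range(n_cols)
            if all(seq[i] != gap for seq in alignment.values())]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Hypothetical toy alignment (not real MADS-box sequences).
toy = {
    "geneA": "MGRGKIEIKR",
    "geneB": "MGRG-IEIKR",
    "geneC": "MARGKIE--R",
}
trimmed = drop_gapped_columns(toy)
for name, seq in trimmed.items():
    print(name, seq)
```

In this toy case three of the ten columns contain a gap in at least one sequence, so seven sites remain for analysis.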

Fig. 2A, B

Sequence alignment of B-class MADS-box genes. A Predicted amino acid sequences of MADS-box genes from basal angiosperms aligned with representatives of eudicots and monocots. Question marks indicate that the sequence is ambiguous. The MADS and K domains and characteristic motifs of the B/Bs genes are boxed. The identical five amino acids between the AmAP3 and AmPI genes are also boxed and the sites of this motif are marked by asterisks. Brackets indicate the positions of introns located between codons (zero-phase introns). Plus signs indicate the position of introns located within codons. They are all phase 2 introns, i.e., they are located between the second and third codon positions. DEF and GLO are derived from Antirrhinum majus; PI, AP3, ABS/AGL32 from Arabidopsis thaliana; OsMADS2 and OsMADS16 from Oryza sativa; DAL13-1 from Picea abies; and GGM2 from Gnetum gnemon. B Alignment of the C-terminal regions of the deduced protein sequences. The names of the genes isolated in this study are in bold. The motifs identified by Kramer et al. (1998) are boxed. Dark gray indicates identical amino acids and light gray represents conserved amino acids

Maximum-likelihood (ML) analysis was carried out using the package MOLPHY version 2.3 (Adachi and Hasegawa 1996). A preliminary neighbor-joining (NJ) tree was constructed with the programs ProtML and Njdist, using the JTT model. A local rearrangement search was then carried out to construct the ML tree, using the ProtML program. Local bootstrap probabilities for branches were estimated using the resampling of estimated log-likelihood (RELL) method (Kishino et al. 1990; Hasegawa and Kishino 1994). We also compared several competing phylogenetic hypotheses using the user’s tree option in ProtML, as described by Adachi and Hasegawa (1996). NJ analysis was performed using PHYLIP version 3.6a (Felsenstein 1995). Evolutionary distances were calculated with the PROTDIST program using the JTT model, and a gene tree was constructed with the NJ method (Saitou and Nei 1987) using the NEIGHBOR program. Statistical support for internal branches was estimated by bootstrap probabilities with 100 replicates using the programs SEQBOOT, PROTDIST, NEIGHBOR, and CONSENSE. Parsimony analyses were conducted using PAUP* 4.0b10 (Swofford 2001). A most parsimonious (MP) tree was generated with a heuristic search, using Tree Bisection Reconnection (TBR) and the MulTrees option. Bootstrap support with 1,000 replications was calculated by performing a heuristic search using the TBR option, but not the MulTrees option. Templeton tests were also performed using PAUP* 4.0b10 with the nonparametric test option (Templeton 1983).

To select the outgroup of the B-class genes, we first constructed a phylogenetic tree using 38 sequences of MIKC-type MADS-box genes from Arabidopsis thaliana. Although the reconstructed ML tree of Arabidopsis MADS-box genes showed that the ABS/AGL32 gene (the Bs gene of A. thaliana) is sister to the AP3/PI clade, support was very low for the monophyly of the B/Bs clade (data not shown). It has been noted elsewhere that the inter-clade relationships within the B and Bs clade are ambiguous (Nam et al. 2003; Soltis and Soltis 2003). We then used genes of the AGL15 and AGL24 lineages as the outgroup in the following analyses, because these lineages have a close relationship to the B-class and Bs genes in our tree. In several published phylogenetic trees, the relationships between genes of the AGL15 and AGL24 lineages and the B-class genes are rather close (Alvarez-Buylla et al. 2000; Theissen et al. 2000; Johansen et al. 2002; Kofuji et al. 2003; Nam et al. 2003; Parenicova et al. 2003; Tanabe et al. 2003). Phylogenetic analysis of the Arabidopsis MADS-box genes showed that AGL63 belongs to the Bs clade (Martinez-Castilla and Alvarez-Buylla 2003; Parenicova et al. 2003; Tanabe et al. 2003), but we did not use this gene because of its high divergence and its lack of motifs specific to the B/Bs genes.

The divergence times between lineages were estimated with the LINTREE program (Takezaki et al. 1995) using the linearized tree method. Yoder and Yang’s likelihood method (2000) was also employed, using PAML version 3.12 (Yang 2002) with the local clock model, which was implemented using ML and allows different evolutionary rates for some lineages while assuming rate constancy in others. The gamma shape parameter α of the Poisson-correction (PC) gamma distance (α = 1.46204) was estimated by the likelihood method with the codeml program of PAML version 3.12 (Yang 2002). The Dayhoff distance, which takes into account backward and parallel mutations, can be computed from the PC gamma distance with α = 2.25 (Nei and Kumar 2000).
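For reference, the three distance measures mentioned above can be written out in a few lines. The sketch below uses the standard textbook forms, d = −ln(1 − p) for the PC distance and d = α[(1 − p)^(−1/α) − 1] for the PC gamma distance, where p is the proportion of differing sites. It is an illustration only, not the LINTREE or PAML code, and the function names and the example value of p are hypothetical.

```python
import math

ALPHA_ML = 1.46204      # shape parameter reported in the text (codeml estimate)
ALPHA_DAYHOFF = 2.25    # value that approximates the Dayhoff distance


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def pc_distance(p):
    """Poisson-correction distance."""
    return -math.log(1.0 - p)


def pc_gamma_distance(p, alpha):
    """Poisson-correction distance with gamma-distributed rates among sites."""
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


p = 0.3  # hypothetical proportion of differing sites
print(round(pc_distance(p), 4))
print(round(pc_gamma_distance(p, ALPHA_DAYHOFF), 4))
print(round(pc_gamma_distance(p, ALPHA_ML), 4))
```

For a given p, a smaller α implies stronger rate heterogeneity among sites and therefore a larger corrected distance.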

Results

Phylogenetic analysis of AP3- and PI-like MADS-box genes

Total RNA was isolated from floral buds or floral organs of basal angiosperms shown in Table 1. cDNA of these species was synthesized using MADS-box-specific degenerate primers, and sequenced. We then compared the inferred protein sequences against sequences in protein databases and selected those that were closely related to the sequences of B-class MADS-box proteins. Eight PI-like and twelve AP3-like sequences were isolated (Fig. 2A).

Relationships of PI and AP3 homologues isolated in this study to other PI and AP3 genes were inferred using maximum-likelihood, parsimony, and neighbor-joining methods (Fig. 3). Four monophyletic groups (angiosperm AP3 homologues, angiosperm PI homologues, gymnosperm AP3/PI homologues, and Bs genes) with moderately high bootstrap values were recognized with all three methods. Maximum-likelihood analysis placed the genes of Amborella and Nymphaeales at the base of the AP3 and PI lineages (Fig. 3A). This relationship was not rejected by either parsimony or neighbor-joining methods (Fig. 3B and C), but bootstrap support of the interior branches was low.

Fig. 3A–C

Phylogenetic tree of B-class MADS-box genes from angiosperms and gymnosperms. Classification follows Soltis et al. (1999) and Endress (2001). Eudicots* indicates genes from eudicots except for the Ranunculales. Magnoliids** indicates genes from magnoliids except for the monocots. A Maximum-likelihood tree using the JTT model with local bootstrap support estimated by resampling of the estimated log-likelihood method with 1000 replications. B Strict consensus tree of 12 equally parsimonious trees. Numbers above branches indicate bootstrap values (>25%) from 1,000 replications. C Neighbor-joining tree with bootstrap support from 100 replications indicated above branches

Several recent phylogenetic analyses have suggested that Amborella is sister to all other angiosperms (Mathews and Donoghue 1999; Parkinson et al. 1999; Qiu et al. 1999, 2001; Soltis et al. 1999; Savolainen et al. 2000; Zanis et al. 2002). However, the topology of the AP3 and PI lineages in this study suggests that either the Nymphaeales genes alone are sister to those of other angiosperms, or that Amborella and Nymphaeales genes together form a clade that is sister to the remaining angiosperms. Therefore, we compared three hypotheses for the root of the angiosperms, using a method described previously (Fig. 4; Zanis et al. 2002). A tree-length difference test was applied to these hypotheses. As shown in Fig. 3B, the strict consensus tree of twelve MP trees (3215 steps) supports hypothesis C. We built the preliminary constraint trees for hypotheses A and B using TreeView version 1.6.5 and performed MP analysis enforcing the topological constraints imposed by these hypotheses. MP analysis produced 74 trees with 3221 steps for hypothesis A and 1593 trees with 3217 steps for hypothesis B (Table 2). A Templeton test indicated that the differences among the three hypotheses are not significant (Table 2; Templeton 1983).
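The Templeton test compares two trees by applying a Wilcoxon signed-rank test to the per-character differences in the number of steps each tree requires. The following minimal sketch shows that calculation under simplifying assumptions: it uses the normal approximation without tie or continuity corrections, it is not the PAUP* implementation, and the per-character step counts are invented for illustration.

```python
import math


def signed_rank_test(steps_tree1, steps_tree2):
    """Wilcoxon signed-rank test on per-character step differences.

    Returns (w_plus, w_minus, two_sided_p). The p-value comes from a normal
    approximation and is only a rough guide when few characters differ.
    """
    if len(steps_tree1) != len(steps_tree2):
        raise ValueError("both trees must be scored on the same characters")
    # Characters with identical step counts are uninformative and are dropped.
    diffs = [a - b for a, b in zip(steps_tree1, steps_tree2) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while (end + 1 < n
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


# Invented step counts for six characters under two competing topologies.
steps_constrained = [1, 3, 1, 3, 2, 4]
steps_best = [1, 2, 1, 3, 1, 2]
w_plus, w_minus, p_value = signed_rank_test(steps_constrained, steps_best)
print(w_plus, w_minus, round(p_value, 3))
```

A large p-value means that the extra steps required by the constrained tree are not enough to reject it in favor of the shorter tree.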

Fig. 4A–C

The three phylogenetic hypotheses for the placement of the root of the angiosperms (Zanis et al. 2002). A Amborella is sister to all remaining angiosperms. B Amborella and Nymphaeales are sister to all other extant angiosperms. C Nymphaeales is sister to all other angiosperms

Table 2 Results of Templeton test

We isolated three divergent AP3 homologues from Illicium anisatum. These genes, designated IaAP3-1, IaAP3-2, and IaAP3-3, fall into two clades: one clade consisting of IaAP3-1 and IaAP3-3, and another consisting of IaAP3-2 and KjAP3 (Kadsura japonica). The monophyly of these clades is strongly supported under all methods of phylogenetic inference. Although only one AP3 homologue has been isolated from Kadsura japonica, this finding suggests that the IaAP3-1/IaAP3-3 and IaAP3-2/KjAP3 lineages were established before the existence of the common ancestor of Illicium anisatum and K. japonica. It has been suggested that Austrobaileyaceae, Trimeniaceae, Illiciaceae, and Schisandraceae form a clade at the base of the angiosperms rather than Amborella and Nymphaeales (Mathews and Donoghue 1999; Parkinson et al. 1999; Qiu et al. 1999; Soltis et al. 1999). Our data suggest that the genes from Illicium (Illiciaceae) and Kadsura (Schisandraceae) may be sister to the genes of the remaining angiosperms (the AP3 lineage of the ML tree; Fig. 3A). However, because the bootstrap values are low, this result needs further testing.

Features of the isolated AP3-like and PI-like gene sequences of basal angiosperms

It is known that the genes of the AP3 and PI lineages have several specific sequence features. First, in all previously reported genes of the PI lineage there is a deletion of four residues in the K domain. The four-residue deletion also exists in the K domain of KjPI (K. japonica), but is not found in the sequences of any of the PI homologues of Amborella or the five species of Nymphaeales (Fig. 2A). To further investigate this deletion in the PI homologues, the portion of the genomic DNA that contains the ORF region of the AmPI (A. trichopoda) and KjPI genes was sequenced. Analysis of the exon/intron organization showed that the 4-aa deletion in the K domain corresponds to a 12-bp deletion in exon 5. The length of exon 5 of the KjPI gene is 30 bp and is equal to that of the PI genes of Arabidopsis thaliana, Antirrhinum majus, and Oryza sativa (Table 3). By contrast, the exon 5 sequence of AmPI is 42 bp long, as in the genes of the AP3, gymnosperm B-class, and Bs lineages that are sister to the PI lineage.

Table 3 Structure of the exons of MADS-box genes

The C-terminal region of the AP3 and PI lineages also has specific sequence features. As shown in Fig. 1, the C-terminal region is encoded by exons 6 and 7 (and sometimes 8). In most MADS-box genes of known structure, exon 6 is 42 bp long (Johansen et al. 2002). In contrast, exon 6 in the AP3 and PI homologues is 45 bp long. It has been suggested that the ancestral length of exon 6 of all of the MIKC-type MADS-box genes is 42 bp and that an insertion of 3 bp occurred in the ancestral genes of the AP3 and PI lineages after the divergence of angiosperms and gymnosperms (Johansen et al. 2002; Winter et al. 2002). As shown in Table 3, exon 6 is 45 bp long in AmAP3 (A. trichopoda), AmPI, and KjPI, as in other genes of the AP3/PI clade. This result supports the previous reports (Winter et al. 2002; Johansen et al. 2002).
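The exon-length comparisons in this section reduce to simple codon arithmetic, sketched below. The exon lengths are those quoted in the text (exon 5: 42 bp in AmPI versus 30 bp in KjPI; exon 6: 45 bp in the AP3/PI clade versus 42 bp in most other MADS-box genes); the helper function and dictionary names are hypothetical.

```python
EXON5_BP = {"AmPI": 42, "KjPI": 30}
EXON6_BP = {"AP3/PI clade": 45, "most other MADS-box genes": 42}


def codon_difference(longer_bp, shorter_bp):
    """Length difference in whole codons; rejects frame-shifting differences."""
    diff = longer_bp - shorter_bp
    if diff % 3 != 0:
        raise ValueError("length difference is not a multiple of three")
    return diff // 3


exon5_codons = codon_difference(EXON5_BP["AmPI"], EXON5_BP["KjPI"])
exon6_codons = codon_difference(EXON6_BP["AP3/PI clade"],
                                EXON6_BP["most other MADS-box genes"])
print(exon5_codons)  # codons missing from exon 5 of KjPI relative to AmPI
print(exon6_codons)  # extra codons in exon 6 of the AP3/PI clade
```

Both differences are multiples of three, so neither the exon 5 deletion nor the exon 6 insertion shifts the reading frame.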

We found a characteristic sequence in the downstream portion of exon 6 of the AP3 and PI homologues from Amborella. In the 5′ region in exon 7 of the AmAP3 and AmPI genes there is a sequence of five identical residues, DEAER (Fig. 2A). However, we cannot be certain that this 5-aa sequence is synapomorphic for the B-class genes because the strong divergence of the C-terminal region complicates alignment of the sequence. By contrast, in the 3′ region of the C domain of AP3 and PI homologues, several motifs have been identified (Kramer et al. 1998; Kramer and Irish 2000; Becker et al. 2002). All previously reported PI homologues share a PI motif, while the AP3 homologues reported so far share both a PI motif-derived motif and an AP3 motif. The AP3 motif was classified into EuAP3 and PaleoAP3 motifs based on their conserved amino acid residues. We determined the sequence of the C-terminal regions of several genes isolated in this study. As shown in Fig. 2B, the predicted protein sequences of these genes contain the motifs characteristic of AP3 and PI homologues.

Estimates of divergence times

The divergence times among the genes of the B-class lineage were estimated. Although several studies have determined molecular estimates of divergence times among the MADS-box genes, the B-class genes were eliminated from these analyses because evolutionary rates of PI and AP3 homologues are heterogeneous (Purugganan 1997; Nam et al. 2003). To select genes for which a molecular clock is applicable, we performed the two-cluster test (Takezaki et al. 1995) with PC distance (Nei and Kumar 2000; Nam et al. 2003). The genes that had evolved significantly differently from other genes at >3% level were eliminated. This analysis suggested that the AP3- and PI-like genes of eudicots, Ranunculales, Piperales, monocots, and Cabombaceae of Nymphaeales evolved significantly more rapidly than genes in the remaining lineages at 1 or 3% levels. Therefore, these genes were eliminated, and a linearized tree with a PC distance was constructed for the remaining genes (Fig. 5). To calibrate the time scale of the linearized tree, the divergence time between angiosperms and extant gymnosperms was used as a calibration point (node a in Fig. 5). Previous studies have revealed that extant gymnosperms (conifers, cycads, Gnetales, and Ginkgoales) are monophyletic (Hasebe et al. 1992; Winter et al. 1999; Bowe et al. 2000; Chaw et al. 2000; Donoghue and Doyle 2000), indicating that the divergence of extant gymnosperms occurred after the angiosperms/extant gymnosperms split. We referred to the fossil record of seed plants for the divergence time of angiosperms and extant gymnosperms. Paleontological studies (Stewart and Rothwell 1993) showed that floras consisting of cordaites and seed ferns had diversified greatly by the Mississippian and Pennsylvanian (Carboniferous), between 280 and 345 Ma.
Cordaites, an extinct plant group, are thought to have a close relationship to conifers and may represent a basal branch of seed plants (Eames 1952; Stewart and Rothwell 1993; Taylor and Hickey 1996). We therefore assumed that the divergence time between angiosperms and extant gymnosperms may be sometime in the Carboniferous at around 340 Ma (Sanderson and Doyle 2001). Martin et al. (1993) and Sanderson (2003) also used a calibration point of 330 Ma (mid-Carboniferous) for the angiosperm/extant gymnosperm divergence. Based on rbcL and Rrn18 gene sequences and using four calibration points of the divergence time between land plants, Savard et al. (1994) obtained an estimate of the divergence time between extant angiosperms (9 species) and extant cycads and conifers (12 species) of 275–290 Ma. Goremykin et al. (1997) used 58 genes from six completely sequenced chloroplast genomes and estimated the split between Pinus and extant angiosperms (Nicotiana, Oryza, and Zea) as 348 Ma, assuming that Marchantia diverged at 450 Ma. A recent molecular study by Soltis et al. (2002) that used the assumed age of 125 Ma of the earliest angiosperms as a calibration point yielded an estimate of 343.7 Ma as the divergence time between extant angiosperms (Austrobaileya and Chloranthus) and extant gymnosperms (Cycas, Ginkgo, Gnetum, and Pinus) using parsimony analysis and a combined data set of four genes.
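Once a tree has been linearized, node heights are proportional to time, so a single calibration point fixes the time scale. The sketch below shows this proportional scaling with the 340 Ma calibration assumed in the text; the function name and the node heights are hypothetical and are not values from Fig. 5.

```python
CALIBRATION_MA = 340.0  # assumed angiosperm/extant gymnosperm split (node a)


def node_age(node_height, calibration_height, calibration_ma=CALIBRATION_MA):
    """Convert a node height on a linearized tree to an age in Ma."""
    if calibration_height <= 0:
        raise ValueError("calibration node height must be positive")
    # Under a molecular clock, age scales linearly with node height.
    return calibration_ma * node_height / calibration_height


# Hypothetical heights in distance units (substitutions per site).
height_calibration_node = 1.0
height_query_node = 0.25
print(node_age(height_query_node, height_calibration_node))
```

Because the scaling is linear, any error in the assumed calibration age is carried over proportionally to every estimated node age.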

Fig. 5

Linearized tree used for estimation of divergence times. The tree topology and time scale are based on the result with the PC distance (see text). Letters at nodes indicate divergence points of major clades

Table 4 shows the results, including time estimates obtained using PC, PC gamma, and Dayhoff distances. When the PC distance was used, the divergence time between the AP3 and PI lineages (node b in Fig. 5) was estimated to be 301 Ma. The divergence time was estimated at 293 Ma from the Dayhoff distance and at 287 Ma from the PC gamma distance. Yoder and Yang’s method (2000) with a local clock model was also used, including all of the angiosperm sequences. The local clock model allows different evolutionary rates for some lineages. This method estimated the divergence time between the AP3 and PI lineages as 316 Ma.

Table 4 Estimates of divergence timesa (mya) of B-class MADS-box genes

We also estimated the age of the most recent common ancestor of all living angiosperms (Table 4). Because the AP3 and PI lineages are in a clade of the angiosperm B-class genes, two results for the first divergence time of the extant angiosperms (nodes c and d in Fig. 5) were obtained from a linearized tree. Using the linearized tree method with a PC distance gave an age of extant angiosperms of 208 Ma from the AP3 lineage and 170 Ma from the PI lineage. Analyses with the Dayhoff distance and the PC gamma distance gave smaller estimates than those obtained with the PC distance. Estimates generated using the Dayhoff distance were 170 Ma for the AP3 lineage and 151 Ma for the PI lineage, and estimates using the PC gamma distance were 161 Ma for the AP3 lineage and 145 Ma for the PI lineage. The likelihood method gave almost the same estimates as the linearized tree method: 196 Ma for the AP3 lineage and 157 Ma for the PI lineage.

Discussion

Phylogenetic relationship of basal angiosperms

We have reported here the cloning and characterization of seventeen new genes of the AP3 and PI lineages from basal angiosperms. In recent decades, several molecular phylogenetic studies have been performed to deduce the relationship of basal angiosperms. Early phylogenetic analyses using rbcL sequences led to the proposal that Ceratophyllum may be sister to the remaining angiosperms (Chase et al. 1993). Subsequent studies have identified species of the ANITA grade as the earliest diverging lineages of angiosperms. Analyses of atpB (Savolainen et al. 2000); phytochrome genes (Mathews and Donoghue 1999); a combined 18S, rbcL, and atpB data set (Soltis et al. 1999); a data set of five mitochondrial, plastid, and nuclear genes (Parkinson et al. 1999; Qiu et al. 1999); and noncoding plastid trnT-trnF sequences (Borsch et al. 2003) have suggested that Amborella is the sister of all other angiosperms, followed by Nymphaeales, and then a clade consisting of Austrobaileya, Trimeniaceae, Illiciales, and Schisandraceae. Other analyses link Amborella with Nymphaeales or reverse these two taxa, with both lines basal to other angiosperms (Barkman et al. 2000; Graham and Olmstead 2000; Qiu et al. 2000). However, another recent study that used the complete chloroplast genome sequences of thirteen land plants suggested that the monocot lineage diverged earlier than the other groups of angiosperms, including Amborella (Goremykin et al. 2003). In the present study, we used AP3-like and PI-like MADS-box genes, which play an important role in floral development. Molecular phylogenetic analysis of floral homeotic genes with the ML method revealed Amborella and Nymphaeales as basalmost lineages in the angiosperms (Fig. 3A). Our data, however, do not allow us to conclusively determine the order of the branching relationship among the putative B-class genes of Amborella, Nymphaeales, and the remaining angiosperms (Table 2).

The gene structure of the PI homologues of basal angiosperms supports the idea that Amborella and Nymphaeales are basalmost lineages in angiosperms. Among the protein sequences deduced from the cDNAs of the basal angiosperm genes, the PI homologues of Amborella and Nymphaeales do not have the 4-aa deletion that is characteristic of other angiosperm PI homologues. The most parsimonious explanation for these results is that the 4-aa deletion was established in the PI lineage after the separation of Amborella and Nymphaeales from other angiosperms (Fig. 6). The 4-aa deletion of PI homologues is in the upstream part of the third putative amphipathic α-helix in the K domain, called K3 (Yang et al. 2003). It has been reported that the K3 region is more important for the interaction between the Arabidopsis AP3 and PI proteins than either the I region or the C-terminal region (Yang et al. 2003). It is likely that the deletion in the ancient PI gene modified the function and protein–protein interaction characteristics of this protein.

Fig. 6

Phylogeny of B-class genes. A highly simplified phylogenetic tree of B-class and Bs genes is shown, with AP3 and PI symbolizing gene clades. Arrows with letters at nodes define putative apomorphies, as described in the figure

Previous studies have shown that the molecular evolution of the B-class genes of some species has been rapid (Purugganan 1997; Barrier et al. 2001; Nam et al. 2003). The two-cluster test used in the present study revealed that rapid evolution might have occurred in several lineages of the angiosperm AP3 and PI homologues. Barrier et al. (2001) have suggested that the accelerated evolutionary rates of the MADS-box genes may accompany rapid morphological diversification in adaptive radiations of the Hawaiian silversword alliance. Lamb and Irish (2003) have also suggested that changes in the protein sequence and function of B-class MADS-box proteins influence floral morphologies among the angiosperms. To understand which changes in the sequences of the AP3 and PI genes may underlie morphological diversification, it will be important to investigate how rapid molecular evolution and the 4-aa deletion affect the function of these genes. Further analysis of AP3 and PI genes may thus contribute to our understanding not only of the phylogenetic relationship of angiosperms but also of the morphological diversification of flowering plants.

Estimates of divergence times

Using the linearized tree and likelihood methods, the divergence time between the AP3 and PI lineages was estimated as approximately 34–53 million years after the assumed divergence between angiosperms and extant gymnosperms at 340 Ma (Table 4). However, given the error margin and variation of our various estimations (Table 4), this result does not exclude the possibility that the divergence between the AP3 and PI lineages occurred immediately after the angiosperms/extant gymnosperms split (Winter et al. 2002). The divergence between the AP3 and PI lineages thus appears to have occurred somewhere between immediately after and several tens of millions of years after the split between angiosperms and extant gymnosperms.
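The way a single calibration point translates into such dates can be sketched as follows. This is only a minimal illustration, assuming a strict molecular clock on a linearized tree; it is not the procedure actually used for Table 4. The function name `node_age` and the relative node heights are hypothetical, chosen only so that the results fall within the reported range; only the 340 Ma calibration comes from the text.

```python
def node_age(calibration_age_ma, calibration_height, node_height):
    """Scale a node height to an age (Ma) under a strict molecular clock.

    On a linearized tree, node height is proportional to time, so a single
    calibrated node fixes the time scale for every other node.
    """
    if calibration_height <= 0:
        raise ValueError("calibration height must be positive")
    return calibration_age_ma * node_height / calibration_height


# Calibration assumed in the text: angiosperm/extant gymnosperm split at 340 Ma.
CALIBRATION_AGE_MA = 340.0

# Hypothetical heights of the AP3/PI node relative to the calibrated node
# (calibrated node height set to 1.0); these are illustrative values only.
for relative_height in (0.90, 0.85):
    age = node_age(CALIBRATION_AGE_MA, 1.0, relative_height)
    lag = CALIBRATION_AGE_MA - age
    print(f"height {relative_height:.2f}: about {age:.0f} Ma, "
          f"about {lag:.0f} Myr after the calibration split")
```

Under this assumption, an AP3/PI node at 85–90% of the height of the calibrated node would correspond to a divergence a few tens of millions of years after the 340 Ma split, which is the order of magnitude reported above.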

We also estimated the age of the most recent common ancestor of the extant angiosperms. At present, the oldest definite angiosperm fossils are pollen grains of lower Cretaceous age, 132–141 million years old (Hughes 1994; Brenner 1996). Molecular-based estimates of the divergence times between the lineages of extant angiosperms range widely. The first molecular study by Ramshaw et al. (1972) obtained an estimate of 350–420 Ma for the higher angiosperms/buckwheat divergence based on the cytochrome c protein, calibrated with the bird/mammal split at 280 Ma. Several subsequent analyses essentially estimated the divergence time between monocots and eudicots. Using the gapC gene, calibrated with an animal fossil and the presumed divergence of plants, animals, and fungi at 1000 Ma, Martin et al. (1989) dated the split between monocots (two species) and dicots (Magnolia and six eudicots) as 319 Ma. These studies gave far older ages for the angiosperms than the oldest fossil record. Subsequent studies used the divergence time between land plants for calibration and estimated the divergence time between monocots and eudicots as 200±40 Ma (Wolfe et al. 1989), 230–350 Ma (Brandl et al. 1992), 300 Ma (Martin et al. 1993), 200 Ma (Laroche et al. 1995), 160 Ma (Goremykin et al. 1997), and 239 Ma (Nam et al. 2003). A study by Sanderson (1997) with a novel method, nonparametric rate smoothing (NPRS), used rbcL genes of 36 land plants that include many basal angiosperms. He dated the age of the most recent common ancestor of extant angiosperms as 165 Ma, calibrated with the divergence of Marchantia and the remaining land plants at 450 Ma. Using almost the same data set of 31 plants, Thorne et al. (1998) applied a model-based Bayesian approach and estimated that the angiosperm root node is 51% as old as the most recent common ancestor of vascular plants (i.e., ~200 Ma; Sanderson and Doyle 2001). Subsequently, Sanderson and Doyle (2001) presented a series of analyses that illustrated the effect of various factors on age estimation using different tree topologies (Mathews and Donoghue 1999; Parkinson et al. 1999; Qiu et al. 1999; Soltis et al. 1999; Savolainen et al. 2000). They obtained estimates of ~140–190 Ma for the base divergence of extant angiosperms.

The present estimate used the AP3-like and PI-like MADS-box genes of basal angiosperms and the linearized tree and likelihood methods (Takezaki et al. 1995; Yoder and Yang 2000), calibrated with an approximate angiosperm/extant gymnosperm split at 340 Ma. Our analysis suggests that the extant angiosperm divergence began at about 145–208 Ma (upper Triassic to Jurassic). This estimate coincides with the estimates obtained by Sanderson (1997), Thorne et al. (1998), and Sanderson and Doyle (2001). The common feature of our analysis and these previous analyses is that all have used a dense sample of basal angiosperms.