Duplication and divergence of the MADS-box genes in plants is likely to have played a fundamental role in the elaboration of the plant body plan (Theissen et al. 2000). The MADS-box genes are found in both plants and animals, and encode transcription factors that contain a 58-amino-acid DNA-binding domain and regulate a variety of developmental processes (Schwarz-Sommer et al. 1990). The MADS-box genes have undergone a significant amount of gene duplication (Theissen et al. 2000). This increase in the number of MADS-box genes, as well as the recruitment of these genes to new roles, is likely to have contributed to the evolution of new plant morphologies (Theissen et al. 2000).

SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 (SOC1) is a MADS-box gene from Arabidopsis thaliana that integrates signals from the photoperiod, vernalization, and gibberellin pathways (Borner et al. 2000; Lee et al. 2000; Samach et al. 2000). SOC1 is a dosage-dependent promoter of flowering, and its expression gradually increases during development (Borner et al. 2000; Moon et al. 2003). Moreover, SOC1 is up-regulated by vernalization as well as by gibberellin application (Borner et al. 2000; Moon et al. 2003). Both the photoperiodic pathway gene CONSTANS (CO) and the vernalization pathway gene FLOWERING LOCUS C (FLC) regulate the expression of SOC1 and FLOWERING LOCUS T (FT), thereby affecting flowering time (Borner et al. 2000; Moon et al. 2003). CO activates flowering largely by increasing the activity of SOC1 and FT, whereas FLC delays flowering, at least in part, by repressing the same target genes (Lee et al. 2000; Samach et al. 2000; Hepworth et al. 2002). Hepworth et al. (2002) identified a 351-bp region in the SOC1 promoter that is required for activation by CO and repression by FLC. They also showed that CO and FLC act on separate regions within the 351-bp segment and that the binding of FLC to a CArG box within this region may be essential for its repressive function in vitro (Hepworth et al. 2002). Unlike SOC1, the MADS-box gene AGL24 is not affected by FLC, although AGL24 is also up-regulated by vernalization in A. thaliana (Michaels et al. 2003). This result suggests that the similar functions of SOC1 and AGL24 may diverge as these genes specialize or acquire additional functions.

ETL, one of three SOC1/TM3-like genes in Eucalyptus globulus ssp. bicostata, is expressed in both vegetative and reproductive organs, predominantly in root and shoot meristems and organ primordia, as well as in developing male and female organs (Decroocq et al. 1999). Moreover, PTM5, a SOC1/TM3-like gene in the aspen tree (Populus tremuloides), is expressed in vascular cambium and xylem tissue as well as in the vascular bundles of expanding catkins (Cseke et al. 2003). These results indicate that SOC1/TM3-like genes in dicots have various expression patterns, suggesting that the functions of these genes may have diversified significantly via gene duplication events in the course of dicot evolution. In addition, recent comprehensive phylogenetic analyses have identified the monocots as one of the angiosperm ancestral groups (e.g., Soltis et al. 1999); therefore, it is likely that SOC1/TM3-like genes in monocots have ancestral functions in angiosperms. Among monocots, ZmMADS1, a SOC1/TM3-like gene in maize (Zea mays), is co-expressed with ZmMADS3, a SQUAMOSA-like gene, in all ear spikelet organ primordia during floral development (Heuer et al. 2001). OsSOC1, one of two SOC1/TM3-like genes in rice (Oryza sativa), is expressed in vegetative tissues, and its expression is elevated at the time of floral initiation (Tadege et al. 2003). This expression pattern of OsSOC1 is similar to that of SOC1 from A. thaliana (Tadege et al. 2003). Thus, although the functions of SOC1/TM3-like genes have been identified in some monocots, the expression or function of only one of the duplicated genes has been elucidated in each case. Moreover, SOC1/TM3-like genes in monocots have so far been characterized only in the grass family (Poaceae). Therefore, it is difficult to infer the ancestral function of SOC1/TM3-like genes in angiosperms from these results in monocots.

The aim of the current study is to investigate the ancestral function of SOC1/TM3-like genes in angiosperms and to infer the evolutionary histories of SOC1/TM3-like genes via gene duplication. In this article, we describe the isolation and characterization of the SOC1/TM3-like gene from Trillium camtschatcense (Trilliaceae) and elucidate the phylogenetic relationship of the SOC1/TM3-like genes. We also analyze some aspects of the molecular evolution of the SOC1/TM3-like genes to further understand the evolutionary dynamics.

Individual native plants of Trillium camtschatcense Ker Gawl. were collected in the field. Samples were divided into various tissues and stored in a deep freezer. Vouchers for species sampled in this study have been deposited in the Herbarium, Graduate School of Science, Tohoku University (TUS). Total RNA was prepared from whole flower buds of T. camtschatcense using the RNeasy Plant Mini Kit (QIAGEN) as described. Poly(A) RNA was extracted from the total RNA using the Oligotex-dT30 Super Kit (Roche). Partial cDNA samples were isolated using the 3′ rapid amplification of cDNA ends (RACE) method (Frohman et al. 1988). The MADS-box degenerate primer (5′-GACARGTCACKTTYTCKAAGC-3′) and the poly-T primer (5′-GGCTCGAGTCGACATTGATTTTTTTTTTTTTTTTT-3′) were used for the 3′ RACE procedures. Approximately 100 clones were characterized and sequenced using a BigDye Terminator Cycle Sequencing Premix kit (Applied Biosystems, CA, USA) on an automated sequencer (Model 310, Applied Biosystems) according to the manufacturer’s instructions. At least two independent cDNA samples of each gene were cloned, and both strands were sequenced. Expression of Trillium camtschatcense MADS-box gene 1, which we named TrcMADS1, in each organ of T. camtschatcense was characterized by PCR of cDNA pools prepared from dissected tissues. Gene-specific primers (5′-CTGAGCTGCAATTAGCC-3′, 5′-GTGCAAATATTCCAACTCTG-3′) were designed based on known sequences from T. camtschatcense. The following thermocycling conditions were employed: (94°C, 2 min) × 1 cycle, (94°C, 30 s; 50–55°C, 30 s; 72°C, 120 s) × 25 cycles, and (72°C, 15 min) × 1 cycle. Amplified products were run on 1.5% agarose gels and digitally photographed.
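
The degenerate MADS-box primer given above uses IUPAC ambiguity codes (R = A/G, K = G/T, Y = C/T). As an illustration only, and not as part of the original protocol, the following Python sketch expands that primer into the concrete oligonucleotides it represents. The function name `expand_degenerate` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

MADS_PRIMER = "GACARGTCACKTTYTCKAAGC"  # primer sequence quoted in the text
variants = expand_degenerate(MADS_PRIMER)
print(len(variants))  # size of the oligonucleotide pool
for seq in variants:
    print(seq)
```

The primer contains four two-fold degenerate positions, so the pool comprises 2^4 = 16 distinct 21-mers.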

To construct a phylogenetic tree for SOC1/TM3-like genes, amino acid sequences were obtained from the EMBL, DDBJ, and GenBank DNA databases and aligned using Clustal W (Thompson et al. 1994). Sequences of AG, CAG1, GHMADS2, PLE, SrhAG, and ZAG were used as outgroups. The phylogenetic tree was constructed by using the maximum-likelihood (ML) method. For the ML analyses, we used the PROML program of PHYLIP Version 3.6 (Felsenstein 2004). We employed the JTT model of amino acid substitution. We performed ten random sequence addition searches using the J option and global branch swapping using the G option to isolate the ML tree with the best log-likelihood, and we performed bootstrap analysis with 100 replications.
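
Bootstrap analysis resamples the columns of an alignment with replacement and rebuilds a tree from each pseudo-replicate. The Python sketch below illustrates only the resampling step on a toy alignment. The sequences, the gene names, and the `bootstrap_alignment` helper are all hypothetical; PHYLIP provides the SEQBOOT program for this step, and whether it was used in the present analysis is an assumption.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate).

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Draw column indices with replacement; every sequence uses the same draw,
    # so homologous positions stay together.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

# Toy alignment with made-up sequences (not data from this study).
toy = {
    "geneA": "MGRGKIEIKR",
    "geneB": "MGRGKVEIKR",
    "geneC": "MVRGKTQMRR",
}

rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]  # 100 replicates, as in the text
```

Each replicate would then be passed to the tree-building program, and the bootstrap value of a branch is the percentage of replicate trees that contain it.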

We isolated cDNA clones of MADS-box genes from T. camtschatcense using the RACE method (TrcMADS1: AB181491). BLAST searches revealed that one of these clones shares high sequence identity with the PTM5 gene (Cseke et al. 2003) and the SOC1 gene (Borner et al. 2000). Alignment of this clone with other SOC1/TM3-like genes revealed that TrcMADS1 has a well-conserved MADS-domain and K-domain. Moreover, 11 amino acid residues in the C-terminal region, hereafter termed the “SOC1 motif,” were conserved among previously published sequences (Fig. 1), as also described by Vandenbussche et al. (2003). The highly conserved SOC1 motif is present in SOC1/TM3-like genes from gymnosperms and angiosperms. Four of the 11 amino acid residues in the SOC1 motif are identical across angiosperms. This motif is not conserved in the outgroup lineage of AGL6-like genes (Fig. 1).
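
Counting residues that are identical across a set of aligned motif sequences amounts to finding the invariant columns of the alignment. The Python sketch below shows one way to do this. The three 11-residue strings are arbitrary placeholders invented for illustration and are not SOC1 motif sequences; `invariant_positions` is a hypothetical helper name.

```python
def invariant_positions(aligned):
    """Return 0-based indices of columns where every sequence has the same residue."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [
        i for i, column in enumerate(zip(*aligned))
        if len(set(column)) == 1 and column[0] != "-"  # ignore all-gap columns
    ]

# Placeholder sequences (arbitrary letters, NOT real motif data).
toy_motif = [
    "ACDEFGHIKLM",
    "ACDQFGHLKLW",
    "TCDEFGYIKLM",
]

conserved = invariant_positions(toy_motif)
print(f"{len(conserved)} of {len(toy_motif[0])} positions are invariant: {conserved}")
```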

Fig. 1

Alignment of the C-terminal regions of predicted amino acid sequences for selected representatives of SOC1/TM3-like genes and their outgroup lineage, AGL6-like genes (bottom six sequences). A highly conserved region, the SOC1 motif, is indicated with a box. Shaded amino acid residues in this region are functionally conserved relative to the SOC1 consensus sequence. The taxa of origin for these genes are noted in Fig. 2.

To determine the phylogenetic position of the TrcMADS1 gene isolated in this study, we conducted phylogenetic analyses of SOC1/TM3-like genes for a data set including most of the published genes in the MADS-box gene family. We used the 82-amino-acid sequences of the MADS-domain and K-domain for this analysis. Figure 2 shows the result of the ML analysis. All of the SOC1/TM3-like genes in angiosperms appeared within clades 1 and 2. Clade 3 contained SOC1/TM3-like genes of gymnosperms. The SOC1/TM3-like genes in clade 1 could be roughly divided into three groups, named clades 1A–C.

Fig. 2

Phylogenetic tree of SOC1/TM3-like genes generated by the maximum-likelihood (ML) method. The log-likelihood of the best ML tree is −4,281.06. The gene from Trillium camtschatcense is boxed. Numbers below the branches represent bootstrap values from 100 replicates (only values ≥50% are shown). The taxon of origin is shown in parentheses after each gene name.

The expression pattern of the TrcMADS1 gene in T. camtschatcense was analyzed by RT-PCR using total RNA isolated from the root, the stem, the leaf, and the early and late floral buds. The triosephosphate isomerase gene (TPI) was used as a positive control. The RT-PCR analysis revealed that TrcMADS1 is expressed in both reproductive and vegetative organs, although the signal in the early floral bud is weaker than that in the late floral bud (Fig. 3).

Fig. 3

Gene-specific RT-PCR reactions using various organs dissected from Trillium camtschatcense. The RNA transcripts from root (R), stem (S), leaf (L), early floral bud (EF), and late floral bud (LF) are shown. TPI was used as a positive control

When SOC1 and OsSOC1 were first described, no sequence similarity to the previously published SOC1/TM3-like genes was found. However, more recent database searches and accumulated sequencing information allow us to investigate sequence similarity in SOC1/TM3-like genes. A highly conserved C-terminal motif, the SOC1 motif, exists in SOC1/TM3-like genes from both gymnosperms and angiosperms (Fig. 1) (Vandenbussche et al. 2003). This SOC1 motif is conserved in spite of the large phylogenetic distance that separates these genes and the mutations that accumulated during the functional diversification of plants. Therefore, it is possible that the SOC1 motif plays an important role in determining partner specificity in higher-order complex formation or that it contains an activation domain. The SOC1 motif is present only in genes within the monophyletic group of SOC1/TM3-like genes, and no consensus sequence in this region is present in outgroup genes, such as the AGL6-like genes (Figs. 1, 2). This result indicates that the SOC1 motif is a synapomorphic character of the SOC1/TM3-like genes within the MADS-box gene family.

This sequence may also be subject to post-translational modification that influences DNA-binding specificity, subcellular localization, or the ability to recruit interacting partners. Specific protein sequences outside the MADS-domain of MADS-box proteins and the homeodomain of HOX proteins are linked to diversity in plant and insect body plans (Galant and Carroll 2002; Levine 2002; Ronshaugen et al. 2002; Lamb and Irish 2003). With regard to motifs at the C-termini of predicted protein sequences, Egea-Cortines et al. (1999) reported, for example, that in Antirrhinum majus the first half of the C-terminal regions of the MADS-box proteins DEFICIENS (DEF) and GLOBOSA (GLO) appears to be essential for in vitro ternary complex formation with the SQUAMOSA (SQUA) protein. Moreover, Lamb and Irish (2003) indicated that in Arabidopsis thaliana the PI motif at the C-terminus of the MADS-box protein PI is necessary for the specification of organ identity, although this PI motif is positioned outside the functional MADS-domain and K-domain. For the HOX proteins, Galant and Carroll (2002) and Ronshaugen et al. (2002) have also shown that the gain or loss of the QA motif in the C-terminal region of Ultrabithorax (Ubx) contributes to the evolution of hexapod body patterns in animals. In view of these previous results, the well-conserved C-terminal SOC1 motif in our study is especially intriguing with respect to its role in generating floral diversity. The high sequence conservation observed in the SOC1 motif possibly reflects a similar conservation of biochemical interactions among the SOC1/TM3-like proteins. The function of this motif, including its role in complex formation, should be studied using proteins lacking portions of the characteristic C-terminal SOC1 motif.

Since the MADS-box gene family is important for flower development, the phylogenetic analysis of the MADS-box genes might improve our understanding of plant diversification. Studies of the MADS-box gene family have made it possible to correlate differences in various morphologies with molecular and functional changes in the MADS-box genes (e.g., Theissen et al. 2000). Our study shows that several clades of SOC1/TM3-like genes exist in angiosperms (Fig. 2). Phylogenetic analysis of genes homologous to the SOC1/TM3-like genes in angiosperms suggests that SOC1/TM3-like genes in rosid plants, including Arabidopsis thaliana, make up a paraphyletic assemblage (Fig. 2).

Among the clades in our phylogenetic results, clade 1A consists of 14 genes, one of which is SOC1 from Arabidopsis thaliana (Fig. 2). SOC1 integrates signals from the photoperiod, vernalization, and gibberellin pathways during development, and is up-regulated by vernalization as well as by gibberellin application (Borner et al. 2000; Moon et al. 2003). These functions of SOC1 may be conserved in the orthologous genes of rosid and asterid groups, such as BrAGL20, CfAGL20, DEFH28, and SaMADSA (Fig. 2). Moreover, the expression pattern of TOBMADS1 from Nicotiana tabacum (Mandel et al. 1994), a paralogous gene of SOC1 in clade 1A, is similar to that of SOC1; therefore, genes belonging to clade 1A seem to have a function similar to that of SOC1. This hypothesis is supported by the high functional conservation of the SOC1 motif in clade 1A (Fig. 1), even though the phylogenetic analysis used only the MADS-domain and K-domain and excluded the SOC1 motif. Clade 1B consists of four genes in the rosid group (Fig. 2). In this clade, the A. thaliana gene AGL42 is expressed in roots, leaves, and inflorescences (Parenicova et al. 2003). Clade 1C consists of three genes, including ETL from Eucalyptus globulus (Fig. 2). ETL is expressed in both vegetative and reproductive organs, predominantly in root and shoot meristems and organ primordia, as well as in developing male and female organs (Decroocq et al. 1999). In light of these characteristics, the functions of the SOC1/TM3-like genes in clades 1B and 1C are likely to be partially redundant with that of SOC1. In other words, it appears that only the SOC1/TM3-like genes in clade 1A have functions restricted to the photoperiod, vernalization, and gibberellin pathways.

Clade 2 consists of six genes, including the TrcMADS1 gene from T. camtschatcense (Fig. 2), which is expressed in both reproductive and vegetative organs (Fig. 3). This expression pattern of TrcMADS1 resembles the patterns of ETL and AGL42. In contrast to the pleiotropic expression pattern of TrcMADS1, ZmMADS1 of maize in this clade is expressed in all ear spikelet organ primordia during floral development (Heuer et al. 2001), and OsSOC1 of rice in this clade is expressed in vegetative tissues, although its expression is elevated at the time of floral initiation (Tadege et al. 2003). However, OsSOC1 is a duplicated gene in rice. Therefore, the partitioning of gene expression patterns after gene duplication may have narrowed the pleiotropic expression pattern. To verify this, further studies are needed to investigate the expression of the other copy of the duplicated genes in rice and maize. An alternative is suggested by the results of Tadege et al. (2003), who reported that the function of OsSOC1 in rice is similar to that of SOC1 in Arabidopsis thaliana. Our phylogenetic result indicates that these genes belong to different clades (Fig. 2). Therefore, these similar functions may be the result of parallel evolution via gene duplication in the course of angiosperm diversification and could have been acquired independently in each lineage. Further comparative analyses using broadly diverse taxa and transgenic plants will be needed to illuminate the evolutionary histories of SOC1/TM3-like genes.