Introduction

Members of the MADS-box gene family function as key regulators throughout plant development (for a review see Becker and Theissen 2003; Hileman et al. 2006). In recent years, major progress has been made in understanding the evolutionary history of this gene family. From a phylogenetic point of view, the evolutionary history of the MADS-box gene family is characterized by frequent gene duplication and gene loss (Shinozuka et al. 1999; Parenicova et al. 2003; Hileman et al. 2006; Leseberg et al. 2006). Gene loss is regarded as the most likely fate for one of the duplicated copies. However, following a period of redundancy, retention of both copies can occur through neo- or subfunctionalization (Moore and Purugganan 2005). Neofunctionalization, indicating a novel function acquired by one of the two duplicates, can occur through changes in either the protein sequence or in the cis-regulatory sequences, while subfunctionalization is the process in which each duplicate implements different elements of the original function of the ancestral gene, and occurs most likely in cis-regulatory sequences (Ohno 1970; Force et al. 1999; Lynch and Conery 2000; Lynch and Force 2000; Zhang 2003; Moore and Purugganan 2005; Preston and Kellogg 2006). The occurrence of neofunctionalization through changes in coding sequences is considered to be detectable by comparing synonymous and nonsynonymous substitutions. Several studies in the MADS-domain family, however, were not able to find convincing evidence for this type of duplicate retention mechanism (Aagaard et al. 2005; Kellogg 2006; Preston and Kellogg 2006). Since differences in protein structure of type II MADS-box genes are most obviously present in the C-terminal domain (Kramer et al. 1998, 2004; Kramer and Irish 2000; Vandenbussche et al. 2003) and frameshifts resulting from length mutations can explain the origin of the characteristic C-termini of the different MADS-domain subfamilies (Vandenbussche et al. 2003), the length mutations occurring in the C-terminal domain could be positively selected for, a hypothesis which so far has not been investigated.

Within the MADS-box gene family, class B genes are one of the best-studied subfamilies (Kramer et al. 1998; Stellari et al. 2004; Aagaard et al. 2005; Zahn et al. 2005; Janssens et al. 2007). Floral homeotic class B genes, which comprise AP3/DEF and PI/GLO, are required for petal identity in the second whorl and stamen identity in the third whorl of eudicot flowers (Jack et al. 1992, 1994; Goto and Meyerowitz 1994; Theissen et al. 2000; Alvarez-Buylla et al. 2000). AP3/DEF genes belong to the type II class of MADS-box genes, whose proteins consist of four distinct regions: a MADS domain (M), an intervening region (I), a keratin-like domain (K), and a C-terminus (C) (Munster et al. 1997). The MADS domain is essential for DNA binding at CArG-box promoter sequences. The intervening region is considered to be critical to mediate dimerisation with other MIKC-type proteins (Riechmann et al. 1996). In addition, the keratin-like domain, which is located between domain I and the C-terminus, is involved in mediating specific protein/protein interactions (Riechmann et al. 1996). The C-terminal region, which is most variable in length and sequence, contributes to the formation of higher-order protein complexes between dimers of MIKC proteins (Egea-Cortines et al. 1999; Honma and Goto 2001). The evolution of the C-terminal domain in AP3/DEF can be understood by inferring a frameshift mutation in the ancestral paleoAP3-motif. This frameshift gave rise to the present euAP3-motif in core eudicots (Vandenbussche et al. 2003; Kramer et al. 2006). Although these frameshift mutations are usually expected to cause loss of function due to the formation of premature stop codons or a fundamental change of the downstream protein sequence, they sometimes result in a functional fragment that will remain conserved in the genome (e.g., the euAP3-motif). Additionally, other C-terminally truncated class B genes have been found in diverse angiosperm species (e.g., Kramer et al. 2006).

In Impatiens (Balsaminaceae), a recent study identified two AP3/DEF-like genes with strongly divergent C-terminal domains in the species I. hawkeri (Geuten et al. 2006). One of these duplicates is characterized by a truncated C-terminal domain with loss of the euAP3- and PI-derived motifs (IhDEF2), while the other is unusually long but with euAP3- and PI-derived motifs present (IhDEF1). From a phylogenetic point of view, an AP3/DEF-like gene in Marcgravia (MuDEF; Marcgraviaceae) is sister to both Impatiens AP3/DEF-like genes, from which it can be derived that the duplication event that gave rise to the two paralogous ImpDEF copies must have happened subsequent to the Balsaminaceae-Marcgraviaceae split. In addition, this duplication most likely occurred prior to the origin of the genus Impatiens, because we were able to sequence both copies from several representatives of the genus (Janssens et al. 2007). Both genes in I. hawkeri (IhDEF1 and IhDEF2) seem to have retained a rather similar expression pattern, occurring in particular parts of the corolla and the stamens (Geuten et al. 2006). In addition, we know that the two AP3/DEF paralogues in Impatiens show differences in protein interaction (Geuten et al. 2006).

The case of Impatiens is interesting because it concerns a relatively recent duplication, which seemingly coincides with the origin of the genus. This allows us to trace the events subsequent to duplication and frameshift with time being less of an obscuring factor.

In this paper we show that recent duplications and subsequent frameshift mutations that give rise to evolutionarily retained duplicates can be considered an ongoing process in MADS-box gene diversification. Furthermore, we demonstrate that the diversification of such a recent duplicate pair is likely driven not by positive selection on individual codons, but by positive selection on the length of the C-terminal domain. The latter mode of selection has not been investigated before and helps us understand how these duplicates survive after their origin.

Materials and Methods

Molecular Protocols

DNA was extracted from 62 species (Table 1) using a modified version of the hot CTAB protocol (Geuten et al. 2006; Janssens et al. 2006). Specific primers used for the amplification of the C-terminal ends of the two AP3/DEF homologues in Impatiens (ImpDEF1 and ImpDEF2) were designed using the cloned cDNA sequences of Geuten et al. (2006): IhCT-DEF1F (5′-GCAGATACACAGAAACCTACTTCAG-3′), IhCT-DEF1R (5′-TTATAGTTAAAGCAAAGCATAGGTTGTGAG-3′), IhCT-DEF2F (5′-ACAGACACATAGAAACTTACTCCAGC-3′), and IhCT-DEF2R (5′-CCATCACCATCTTCATCTTCAATCC-3′). The temperature profile used to amplify the C-terminal end of ImpDEF1 consisted of 2 min of initial denaturation at 94°C and 30 cycles of 30 s of denaturation at 94°C, 30 s of primer annealing at 52.5°C, and 1 min of extension at 72°C. Amplification of the C-terminal domain of ImpDEF2 was executed under the same conditions as described above except for an annealing temperature of 53°C. Amplification reactions were executed with a GeneAmp PCR system 9700 (Applied Biosystems, Foster City, CA, USA). Sequencing reactions were performed on an ABI310 capillary sequencer (Applied Biosystems). All accessions surveyed were homozygous in length for both paralogues, thus displaying no sequence polymorphism.

Table 1 Accession numbers, voucher data, and origin of plant material for taxa included in the DNA analyses

Phylogenetic Analysis

Initial alignment of the DNA sequences was carried out with CLUSTALX and manually adjusted following the protein alignment using MacClade 4.05 (Maddison and Maddison 2002). We used MrBayes 3.1.1 (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003) to conduct Bayesian analysis. The best-fitting substitution model for Bayesian analysis (GTR+G) was selected using a series of likelihood-ratio tests as implemented in ModelTest 3.06 (Posada and Crandall 1998). According to Janssens et al. (2006), I. omeiana is sister to all other Impatiens species and was, therefore, used as outgroup for all phylogenetic analyses. All sequences were submitted to GenBank (Table 1).

Tests for Selection on Amino Acids

In order to identify positively selected amino acid sites in the C-terminal domain of the ImpDEF duplicates, the ratio of nonsynonymous-to-synonymous substitutions (dN/dS, or ω) was examined at the codon level. dN/dS values for amino acid sites in ImpDEF were estimated using the approach of Nielsen and Yang (1998) as implemented in HyPhy (Kosakovsky Pond et al. 2005). This site-specific method, which uses a likelihood-based approach to identify selection, allows selective pressure to vary among sites by assuming several classes of sites in the gene with different dN/dS (ω) ratios. Both ImpDEF1 and ImpDEF2 data sets were investigated by comparing discrete distribution models (M1, M2, and M3) and comparing continuous distribution models (M7 and M8). Likelihood-ratio tests were carried out for the comparison of the following evolutionary models: single rate model (M0) and discrete model (M3), neutral (M1) and selection model (M2), and β (M7) and β + ω model (M8).
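
The model comparisons above are likelihood-ratio tests between nested models. As a minimal sketch (not the HyPhy or PAML implementation), the statistic 2ΔlnL can be compared with a chi-square distribution. The degrees of freedom listed below (4 for M0 vs. M3, 2 for M1 vs. M2 and for M7 vs. M8) are the values conventionally used for these comparisons and are an assumption here, as are the function names. The example log-likelihoods are hypothetical.

```python
import math

# Degrees of freedom conventionally used for each nested comparison (assumption).
LRT_DF = {("M0", "M3"): 4, ("M1", "M2"): 2, ("M7", "M8"): 2}


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form is only valid for positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2*delta lnL, p-value) for a null model nested in the alternative."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)


# Hypothetical log-likelihoods, for illustration only (not values from Tables 2 or 3).
stat, p_value = likelihood_ratio_test(-1234.56, -1233.10, LRT_DF[("M7", "M8")])
print(f"2*dlnL = {stat:.2f}, p = {p_value:.3f}")
```

A non-significant p-value in such a comparison means that the model allowing a class of sites with ω > 1 does not fit the data better than the simpler model.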

In order to examine differences in selective pressure along the branches to either duplicate clade, the likelihood of branch-site models (model A and model B [Yang and Nielsen 2002]) was tested with codeml, a software program of the PAML package (Yang 1997). By comparing the likelihood of model A (MA) and model B (MB) with the likelihood of model 1a (M1a) and model 3 (M3), respectively, we were able to test for selection on either of the branches leading to ImpDEF1 and ImpDEF2.

Indel Substitution Rates and Selective Pressures

Several complementary approaches were applied to investigate the evolutionary rates and selective pressures in the Impatiens AP3/DEF C-terminal domain sequences. First, we tested whether the strong sequence diversity of the ImpDEF1 C-terminal domain in Impatiens shows an increased rate compared to the AP3/DEF C-terminal domain sequences of Brassicaceae and Solanaceae. To do this, we used the approach of Podlaha and Zhang (2003), in which the rate of indel mutations was calculated per site per year. Indels in the C-terminal end of ImpDEF1 were separately mapped following the parsimony principle onto the ImpDEF1 C-terminal domain-based phylogeny of Impatiens and calculated for each branch of the tree. This tree was chosen because it had resulted in the best-resolved and supported phylogenetic hypothesis for our species sampling.

Subsequently, we tested for possible selection on indels using an approach that is described by Podlaha et al. (2005). This method hypothesizes that the indel mutation rate can be computed per unit of d S distance and is in contrast to the first method not affected by inaccurate time estimates. Using this approach, the possibility of selection on indel substitutions is examined by separately comparing the indel substitution rate of the C-termini of ImpDEF1 and ImpDEF2 with the indel mutation rate of ImpDEF introns 4 and 5.

Results

The C-Terminal Domain of ImpDEF2 Is Conserved in Length and Sequence, Whereas the C-Terminal Domain of ImpDEF1 Shows Strong Length Variation

The cloned AP3/DEF sequences from I. hawkeri (IhDEF1 and IhDEF2) are remarkably different, with IhDEF2 terminated by a premature stop codon, positioned immediately before the normally expected PI-derived motif. To confirm that the sequences did not represent experimental artifacts, the partial cDNA sequences were checked by sequencing the corresponding genomic locus (Geuten et al. 2006). Our data show that the truncation in the ImpDEF2 C-terminus is coded by the genome of almost all Impatiens species examined, suggesting that this derived protein does have a specific function throughout the genus. We present an alignment of the protein sequence of the C-terminus of ImpDEF1 and ImpDEF2 in Figs. 1 and 2, respectively. The length of these C-terminal ends was calculated following the initial mapping of the protein regions in AP3 (Krizek and Meyerowitz 1996). The average length of the ImpDEF2 C-terminus is 20.95 ± 0.25 codons, showing almost no length variation. Furthermore, within the C-terminal domain of ImpDEF2, only two independent indel substitutions were observed. One mutation was found in I. flaccida, while another was located in I. edgeworthii. The two species are not closely related, as they belong to two different clades in the Impatiens phylogeny (Janssens et al. 2006).

Fig. 1 Amino acid sequence alignment of the C-terminal domain of ImpDEF1 from 61 Impatiens species. Dots represent amino acids identical to the first sequence, while dashes represent aligned gaps

Fig. 2 Amino acid sequence alignment of the truncated C-terminal end of ImpDEF2 from 31 Impatiens species. Dots represent amino acids that are identical to the first sequence, while dashes represent aligned gaps

In contrast to ImpDEF2, ImpDEF1 contains both the characteristic PI-derived and euAP3-motifs and is characterized by a considerable length divergence throughout the whole genus (Fig. 1). To make PCR amplification of the C-terminal domain in ImpDEF1 possible using a single primer pair, we designed the reverse primer in the euAP3-motif. As a result, we only used the sequenced segment of the ImpDEF1 C-terminal domain, which is equivalent to the complete C-terminal domain minus the length of the euAP3-motif (27 nucleotides [nt] or 9 amino acids [aa]). Hence, the average length of the C-terminal domain (minus the euAP3-motif) in ImpDEF1 is 93.95 ± 5.3 aa, with a maximum length of 101 codons for I. xanthina and a minimum length of 74 codons for I. omeiana. Only the sequenced segment of the ImpDEF1 C-terminal ends has been used for further analyses in our study. Indel substitutions that occurred within the C-termini of ImpDEF1 were mapped onto the ImpDEF1 C-terminal domain phylogeny. In total, 118 indel substitutions were found following the parsimony principle (Fig. 3).

Fig. 3 Bayesian consensus cladogram based on the C-terminal domain sequences of ImpDEF1. The numbers on the branches indicate the number of indel substitutions in the C-terminal end of ImpDEF1 based on the alignment shown in Fig. 1

The Origin of ImpDEF2 and ImpDEF1 Can be Explained by Inferring a Frameshift Mutation

To infer the mutational event(s) that resulted in the divergent C-terminal sequences, we compared the C-terminus of ImpDEF1 of I. omeiana with every available ImpDEF2 C-terminal domain present in our dataset. Unfortunately, we were unable to sequence the ImpDEF2 C-terminus for I. omeiana, despite the fact that the ImpDEF2 K-domain had already been successfully sequenced for this species (Janssens et al. 2007). Therefore we could not compare both C-terminal ends in I. omeiana, which is considered to be sister to all other species of Impatiens (Janssens et al. 2006).

Comparing the two ImpDEF paralogues resulted in a simple explanation in which a deletion of 11 nt gave rise to the truncated C-terminal end in ImpDEF2. This deletion caused a shift in the original reading frame which resulted in a premature stop codon, located 10 aa before the PI-derived motif (Fig. 4).
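
To illustrate why a deletion of 11 nt truncates the protein, the following sketch applies an 11-nt deletion to a toy coding sequence and counts the codons read before the first stop. The sequence, the deletion position, and the function names are hypothetical and are not the ImpDEF sequences; only the deletion length is taken from the text. Because 11 is not a multiple of three, every codon downstream of the deletion is read in a different frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(cds: str) -> int:
    """Count sense codons read in frame 0 before the first stop codon
    (or before the sequence runs out)."""
    count = 0
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def delete_segment(cds: str, start: int, length: int) -> str:
    """Return the sequence with `length` nucleotides removed from `start`."""
    return cds[:start] + cds[start + length:]


# Toy sequence built from three parts (all hypothetical).
PREFIX = "GCTGCA"                     # two codons upstream of the deletion
DELETED = "GGTCACAACCT"               # the 11 nt that are lost
SUFFIX = "GGCACCTGAAGGCCTGGATTAA"     # downstream region, ends in a stop codon

ancestral = PREFIX + DELETED + SUFFIX
truncated = delete_segment(ancestral, len(PREFIX), len(DELETED))

print(len(DELETED) % 3)                # 2: the deletion is not a multiple of three
print(codons_before_stop(ancestral))   # 12 codons before the original stop
print(codons_before_stop(truncated))   # 4 codons before the new, premature stop
```

In the toy example the new stop codon (TGA) is assembled from nucleotides that belonged to two adjacent codons in the original frame, which is the general way a frameshift creates a premature stop.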

Fig. 4 (A) Phylogeny of the basal asterid AP3/DEF-like proteins including Hydrangea, Marcgravia, and Impatiens (Geuten et al. 2006). Insertions (Ins) and deletions (Del) that occurred in the C-terminal ends of AP3/DEF in Marcgravia and/or in one of the two AP3/DEF paralogues of Impatiens are indicated with a number in parentheses (see B) and are highlighted on the basal asterid AP3/DEF phylogeny. (B) Alignment of the AP3/DEF-like C-terminal domains of Hydrangea macrophylla, Marcgravia umbellata, Impatiens omeiana (ImpDEF1), and I. chungtienensis (ImpDEF2). The deletion causing the frameshift mutation is indicated on the initial ImpDEF2 sequence. The result of the deletion is shown in the frameshifted ImpDEF2 sequence. Differences between the frameshifted fragment in ImpDEF2 and the similar fragment in ImpDEF1 are highlighted with a line between both fragments

Figure 4 shows the alignment of both ImpDEF C-terminal domains with the AP3/DEF C-termini of Hydrangea macrophylla and Marcgravia umbellata. It is interesting to note that the differences in nucleotides between the shifted ImpDEF2 region and the paralogous ImpDEF1 region resulted from substitutions that occurred most likely before the frameshift event (Fig. 4). This alignment illustrates that three of five nucleotide substitutions were present as silent mutations in the original amino acid order. Apparently, two of the three substitutions were nonsynonymous mutations in ImpDEF2 until their reading frame was moved two positions to the left and they became synonymous. Based on the comparison of the Impatiens paralogues with M. umbellata and H. macrophylla, it became clear that one of the three silent substitutions occurred in ImpDEF1 and not in ImpDEF2. Furthermore, one substitution is present in the second position of the original amino acid arrangement, suggesting that the nucleotide at this position must have changed after the deletion event. The shifted fragment also contained one substitution in the first position of the original codon order. However, when we compared this codon with the same amino acids in M. umbellata and H. macrophylla, we noticed that this nucleotide alteration occurred in ImpDEF1 instead of ImpDEF2.
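
The observation that a substitution can be nonsynonymous in one reading frame and synonymous in another can be made concrete with a small sketch. The code below classifies a single nucleotide difference relative to a chosen reading frame using the standard genetic code. The two 9-nt sequences and the function name are hypothetical and only illustrate the principle; they are not taken from the alignment in Fig. 4.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref: str, alt: str, pos: int, frame: int) -> str:
    """Classify the difference at `pos` as synonymous or nonsynonymous
    when codons start at index `frame`, `frame` + 3, and so on."""
    if len(ref) != len(alt) or ref[pos] == alt[pos]:
        raise ValueError("sequences must be equally long and differ at pos")
    start = pos - ((pos - frame) % 3)  # first index of the codon containing pos
    if start < 0 or start + 3 > len(ref):
        raise ValueError("position is not covered by a complete codon")
    same = CODON_TABLE[ref[start:start + 3]] == CODON_TABLE[alt[start:start + 3]]
    return "synonymous" if same else "nonsynonymous"


# Hypothetical pair of sequences differing at index 3 (G versus A).
ref = "GCAGTTGAA"
alt = "GCAATTGAA"

print(classify_substitution(ref, alt, 3, frame=0))  # GTT (Val) vs ATT (Ile)
print(classify_substitution(ref, alt, 3, frame=1))  # CAG (Gln) vs CAA (Gln)
```

The same nucleotide difference is a first-position change in one frame and a third-position change in the other, which is why its effect on the protein depends on whether it is read before or after the frameshift.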

Codon Mutations in the C-Terminal Domain of ImpDEF1 and ImpDEF2 Evolve Under Purifying Selective Pressure

In order to examine the molecular evolution of the C-terminal region, we investigated which selective pressures act on nonsynonymous substitutions in each of the two paralogues. The comparison of different models of evolution that allow for relative fixation rates of synonymous (silent) and nonsynonymous (amino acid-altering) mutations provided similar results for both ImpDEF1 and ImpDEF2 (Tables 2 and 3, respectively). For both paralogues, the estimated ω-ratios were significantly lower than 1.0, suggesting that purifying selection has taken place instead of positively selected evolution.

Table 2 Sites under positive selection, parameter estimates, likelihood values, and ω-values for the ImpDEF1 C-terminal domain as estimated for several discrete distribution and continuous distribution models
Table 3 Sites under positive selection, parameter estimates, likelihood values, and ω-values for the ImpDEF2 C-terminal domain as estimated for several discrete distribution and continuous distribution models

In addition, the branch-site models (models A and B) specifying the branches toward ImpDEF1 and ImpDEF2 did not explain the data significantly better than the models they were compared with (MA vs. M1a, 2ΔL = 0.02, p = 0.41; MB vs. M3, 2ΔL = 0.04, p = 0.63), suggesting that selection on the branches did not change subsequent to duplication.

Accelerated 3n-Indel Substitution Rate in the ImpDEF1 C-Terminal Domain and Low 3n-Indel Substitution Rate in the ImpDEF2 C-Terminal Domain

Despite the evidence for negatively selected codons within the relatively long C-terminal domains of ImpDEF1, we noticed that there is a remarkable length divergence between C-termini of paralogues from different species. Therefore we were interested to know (a) if this is a remarkable observation in comparison to other taxonomic groups and (b) which selective pressures act on these length mutations in ImpDEF1. In addition, we tested whether the truncated C-termini of ImpDEF2 might also have evolved under selective pressure.

For comparison to other species, Brassicaceae and Solanaceae were selected. For these core eudicot plant families, it is reasonable to assume that the euAP3 lineage is represented by one member in the genome. For Solanaceae, we used the data on MADS-box class B genes presented by Rijpkema et al. (2006). This dataset contains eight species: Brunfelsia uniflora, Cestrum elegans, Juanulloa aurantiaca, Mandragora autumnalis, Petunia hybrida, Solandra maxima, Solanum lycopersicum, and Solanum pseudolulo. Indel substitution events that occurred within the C-terminal domain of this group were mapped onto the Solanaceae tree of Olmstead et al. (1999). In this phylogeny, the Cestroideae containing Cestrum elegans are sister to the Solanaceae species that were present in the dataset of Rijpkema et al. (2006). Consequently, we separately compared Cestrum elegans with each species from the Solanaceae dataset and counted the number of indel substitutions that had occurred between the outgroup and the examined species. On average, three indel substitutions were found for a common C-terminus in Solanaceae with a length of 183.9 nt. Moreover, insertions or deletions that occurred within the Solanaceae C-terminal domain are always in multiples of three nucleotides.

According to Wikström et al. (2001) the crown group in Solanaceae, in which the Cestroideae is one of the most basal lineages, has been dated at an age of 41 to 36 million years (38.5 my on average). The method of Podlaha and Zhang (2003) was used to calculate the rate of indel substitution in Solanaceae. Consequently, the substitution rate for 3n-indels in the C-termini of Solanaceae is $3 / 183.86 / (38.5 \times 10^{6} \times 2) = (2.11 \pm 0.15) \times 10^{-10}$ per site per year (Table 4).

Table 4 Comparison of substitution rates of 3n-indels for ImpDEF1 and ImpDEF2

In order to obtain a second estimate of the average substitution rate for insertions and deletions in the C-terminal domain of core eudicots, we used the same approach for a distinct dataset. We chose to compare the C-terminal domain of Arabidopsis thaliana with that of Brassica napus (Jack et al. 1992; Pylatuik et al. 2003). Recent studies in Brassicaceae dated the most recent common ancestor of Arabidopsis and Brassica between 20.4 and 14.5 million years ago (Bowers et al. 2003; Schranz and Mitchell-Olds 2006). By comparing the C-termini of both species, we identified only one 3n-indel over an alignment of 189 nt. As a result, the neutral rate of indel substitution for these data is $1 / 189 / (17.45 \times 10^{6} \times 2) = (1.52 \pm 0.2) \times 10^{-10}$ per site per year (Table 4).
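
The two rate estimates can be reproduced with a few lines of arithmetic. The sketch below divides the number of indels by the alignment length and by twice the divergence time, taking the midpoints of the dating intervals (38.5 and 17.45 million years) as 38.5 × 10^6 and 17.45 × 10^6 years. The function name is hypothetical; the input values are those quoted in the text.

```python
def indel_rate_per_site_per_year(n_indels: float, length_nt: float,
                                 divergence_time_years: float) -> float:
    """Indels per site per year; the factor 2 accounts for the two lineages
    that have evolved independently since their common ancestor."""
    return n_indels / length_nt / (2.0 * divergence_time_years)


# Midpoints of the dating intervals quoted in the text.
solanaceae_age = (41e6 + 36e6) / 2        # 38.5 million years
brassicaceae_age = (20.4e6 + 14.5e6) / 2  # 17.45 million years

solanaceae_rate = indel_rate_per_site_per_year(3, 183.86, solanaceae_age)
brassicaceae_rate = indel_rate_per_site_per_year(1, 189, brassicaceae_age)

print(f"Solanaceae:   {solanaceae_rate:.3e} per site per year")
print(f"Brassicaceae: {brassicaceae_rate:.3e} per site per year")
```

Both values come out at roughly 2.1 × 10^-10 and 1.5 × 10^-10 per site per year, in line with the estimates given above.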

The estimated substitution rates can now be used to calculate the expected number of indels in the C-termini of ImpDEF1 and ImpDEF2, considering that the crown group in Impatiens originated 25.9 million years ago (Janssens et al., submitted for publication). The observed number of 3n-indels in the C-terminal domain of ImpDEF1 is 15.99, which is significantly higher than expected from the indel substitution rate for Solanaceae (p < 0.001; Poisson test). In contrast, the observed number of indels in the C-terminal region of ImpDEF2 is 0.06, which is significantly lower than expected from the indel substitution rate for Solanaceae (p < 0.001). Similar results were obtained using the estimated indel substitution rate based on the comparison between Arabidopsis and Brassica. With this substitution rate, we expected on average 2.15 indels in the C-terminal end of ImpDEF1 and 0.47 indels in the C-terminus of ImpDEF2 (Table 4). The difference between the observed and the expected number of indels is even larger for both Impatiens paralogues when using the Arabidopsis/Brassica-based substitution rate (p < 0.001; Poisson test). These results indicate that indels in the C-terminal domain of ImpDEF1 accumulate at a significantly faster rate. In contrast, indels in ImpDEF2 accumulate at a significantly slower rate.
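
As a rough illustration of the kind of Poisson comparison reported here, the sketch below computes the probability of observing at least a given number of indels when a given number is expected. It uses the ImpDEF1 values from the text (2.15 expected under the Arabidopsis/Brassica-based rate, 15.99 observed). Rounding the observed value to 16 is an assumption made for this sketch, and the text does not specify exactly how the counts entered the authors' test, so this is not a reconstruction of their calculation.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when lam events are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_upper_tail(k_obs: int, lam: float) -> float:
    """P(X >= k_obs): chance of seeing at least k_obs events."""
    return max(0.0, 1.0 - sum(poisson_pmf(k, lam) for k in range(k_obs)))


def poisson_lower_tail(k_obs: int, lam: float) -> float:
    """P(X <= k_obs): chance of seeing at most k_obs events."""
    return sum(poisson_pmf(k, lam) for k in range(k_obs + 1))


expected_impdef1 = 2.15        # expected indels, from the text
observed_impdef1 = round(15.99)  # rounded to an integer count (assumption)

p_impdef1 = poisson_upper_tail(observed_impdef1, expected_impdef1)
print(f"P(X >= {observed_impdef1} | expected {expected_impdef1}) = {p_impdef1:.2e}")
```

The upper tail is the relevant one when testing for an excess of indels; `poisson_lower_tail` is the counterpart for testing a deficit.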

Because of the observed rate differences, we tested whether they could be explained by positive and/or purifying selection on indel substitutions. Following Podlaha et al. (2005), we assumed that indels in introns evolve neutrally and used the indel rate in introns as the neutral reference. Following this approach, we calculated the number of 3n-indel substitutions that occurred within intron 4 and intron 5 of both ImpDEF1 and ImpDEF2 (Janssens et al. 2007). An average of 14.72 3n-indels/kb was found for the two introns in both paralogues. Comparing this value with the average number of 3n-indels/kb in the C-terminal domain of ImpDEF1 (54.47 3n-indels/kb) and ImpDEF2 (0.91 3n-indels/kb), we found a significant difference between the neutral value and the calculated value for each paralogue (Poisson test: ImpDEF1, p < 0.001; ImpDEF2, p < 0.001). This result suggests that indel substitutions in the C-terminal domain of ImpDEF1 genes evolve under positive selection, while these mutations are strongly selected against in the ImpDEF2 gene.

Discussion

Frameshift mutations, which often truncate the protein by introducing a premature stop codon, are generally regarded as detrimental for protein function and have therefore been considered of little evolutionary significance. More recently, however, the combination of gene duplications and frameshift mutations has come to be regarded as an important evolutionary mechanism for the emergence of new biological functions (Kopelman et al. 2005; Raes and Van de Peer 2005; Taylor and Raes 2004). Vandenbussche et al. (2003) identified the importance of frameshifts in flowering plant MADS-box gene-family evolution. Recently, Kramer et al. (2006) raised the question of how often these frameshift mutations subsequent to duplication would have occurred within the AP3/DEF lineage. These authors identified three other frameshifts in the AP3/DEF lineage. Although we describe an additional frameshift mutation in the AP3/DEF lineage, the case of Impatiens is of specific interest, as it involves a recent duplication event that probably coincides with the origin of the genus.

By comparing the C-terminal domains of ImpDEF1 with those of ImpDEF2, not only did we observe a conserved dissimilarity in length between the two duplicates, but also we noticed a significant difference in the molecular evolution of both C-termini. As for ImpDEF1, the C-terminal domain is characterized by numerous indel substitutions. The fact that no out-of-frame indels were detected within the ImpDEF1 gene lineage suggests that this C-terminus is still a functional domain. The indel substitution rate of the C-terminal end is significantly higher than that of the AP3/DEF C-terminus in Brassicaceae or Solanaceae. Furthermore, by comparison to what can be considered a “neutral rate” of 3n substitutions in the K-domain introns, our results reveal an adaptive selective pressure on the length of the ImpDEF1 C-terminus. Consequently we can consider the functional mechanism responsible for this mode of evolution. Knowing that the K-domain sequence of both paralogues is very similar (Janssens et al. 2007), we assume that the protein interaction specificity of each gene copy is probably derived from their different C-terminal domains. Some clues come from the yeast-two-hybrid data presented by Geuten et al. (2006). According to their results IhDEF1 and IhDEF2 neither form homodimers nor heterodimerize with each other in vitro. Both paralogues, however, dimerize in vitro with IhGLO. In addition, IhDEF1 does not interact, while IhDEF2 does interact with IhSEP3. This means that the ImpDEF1 C-terminal domain can be involved in establishing a specific configuration of the higher order transcriptional regulatory complex, possibly by recruiting other specific factors to the complex. So far, no earlier studies have suggested that the C-terminus would be involved in such an interaction, yet it would be interesting to test for such a novel function.

In contrast to ImpDEF1, the C-terminal domain in ImpDEF2 is characterized by an indel substitution rate that is significantly lower than estimated for the C-terminus of a standard Brassicaceae or Solanaceae AP3/DEF-like gene. The observed indel substitution rate of the C-terminal end in ImpDEF2 is significantly lower than was expected based on a neutral indel rate, indicating that the length of this domain is under strong purifying selection. Similarly to ImpDEF1, though, the truncated C-terminal domain could be involved in allowing protein-protein interactions that influence the configuration of the higher-order complex regulating transcription of AP3/DEF-like downstream genes. Likewise, the very conservative evolution of the ImpDEF2 paralogue is probably related to a new function. Together, their retention throughout the genus, their mode of evolution, and the molecular biological evidence suggest that the gene products of both ImpDEF1 and ImpDEF2 act in functional transcriptional regulatory complexes. Recently, Piwarzyk et al. (2007) demonstrated that both euAP3- and PI-derived motifs are not required for the AP3/DEF floral organ identity function, and thus it is possible that both AP3/DEF paralogues in Impatiens are still functional in floral identity specification. The specific role of these paralogues and the complexes they form remains to be determined, however.

The several subfamilies of MADS-box genes have characteristic C-terminal domains. Establishing these diverse C-terminal domains is proposed to have happened by frameshift mutations for several of these subfamilies (Vandenbussche et al. 2003). Here we have demonstrated that positive and negative selection on length mutations can explain the diverse sequences of the Impatiens AP3/DEF-like genes. In doing so, we identified a novel selective pressure for MADS-box gene C-terminal domains. Impatiens might be an exception, but could as well be an example of a more general phenomenon. The retention of frameshift mutations in MADS-box gene evolution, resulting in length difference in addition to a different sequence, seems to emphasize the importance of length for function. Functionally interpreting adaptive evolution by length mutations would require a better understanding of the specific function of the C-terminal domains of MADS-box genes. Conversely, taking length into account as a variable could help to better understand C-terminal domain function.