Introduction

The formation of new genes by various duplication events results in the creation of multigene families and has long been thought to be a major source of evolutionary novelties, including new gene functions and expression patterns. Thus, the evolution of multigene families has been extensively studied at both the empirical and the theoretical levels (Ohta 1987; Clark 1994; Walsh 1995; Liebhaber et al. 1981; Brown and Ish-Horowicz 1981; Hibner et al. 1991; Nei et al. 1997; Rooney et al. 2002). It has long been believed that the members of a multigene family do not evolve independently but instead evolve together as a unit by means of gene conversion and/or unequal crossing-over (Smith 1974; Arnheim 1983). This concerted evolution was observed in ribosomal and small nuclear RNA genes and globin genes (Liebhaber et al. 1981; Brown and Ish-Horowicz 1981; Hibner et al. 1991; Liao 1999). On the other hand, Nei and Hughes (1992) proposed a birth-and-death model for the evolution of multigene families. This model posits frequent duplication and loss of gene copies and assumes independent evolution of the members. The birth-and-death model approximates the evolution of large multigene families, such as the major histocompatibility complex (MHC), immunoglobulin (Ig), antibacterial ribonuclease, and nematode chemoreceptor gene families (Nei and Hughes 1992; Ota and Nei 1994; Nei et al. 1997; Robertson 1998), as well as of smaller multigene families such as the ubiquitins (Nei et al. 2000). It is therefore of interest to characterize the cases to which each of the two models applies. We could expect that tandemly arrayed members in the genome are likely to evolve in a concerted manner, while members in different genomic environments evolve independently. In addition, the members as a whole are likely to be subject to some constraint at the functional or expression level. In this paper, we report evidence from the Amy gene family in Drosophila.

The α-amylase enzyme is one of the most important enzymes for eukaryotic organisms, especially animals, because it is essential for digestive processes in which food starch is hydrolyzed into maltose and glucose. In Drosophila, the number of members of the Amy gene family varies from two to seven among species (Doane et al. 1987; Brown et al. 1990; Shibata and Yamazaki 1995; Da Lage et al. 1996, 2000; Popadic et al. 1996; Inomata, Tachida and Yamazaki 1997; Steinemann and Steinemann 1999; Inomata and Yamazaki 2000). The duplicated Amy genes in the melanogaster species subgroup (Shibata and Yamazaki 1995) and in D. pseudoobscura (Brown et al. 1990; Popadic et al. 1996) are inverted and tandem repeats, respectively. They have been shown to evolve in a concerted manner (Hickey et al. 1991; Shibata and Yamazaki 1995; Popadic et al. 1996). Recently, however, Inomata and Yamazaki (2000) found that D. kikkawai and its sibling species have two types of highly divergent, paralogous Amy genes with different GC contents at the third codon positions (GC3) at different chromosomal locations. They encode active amylase isozymes that show distinct expression patterns. Furthermore, two such types of divergent Amy gene duplicates appear to be common in the montium subgroup, to which D. kikkawai and its sibling species belong (Zhang et al. 2003). Similar observations were reported in D. ananassae (Da Lage et al. 2000). Importantly, the two types of paralogous Amy genes reside in chromosomal regions that are very distant from each other (Inomata and Yamazaki 2000; Da Lage et al. 2000).

Although the evolution of paralogous Amy genes has been studied previously (Inomata and Yamazaki 2000; Da Lage et al. 2000; Zhang et al. 2002), their origins in Drosophila remain to be resolved. Furthermore, inference of evolutionary events such as gene duplication, gene loss, and divergence among the Amy gene members will strengthen our understanding of this gene family. Here, we analyzed all 49 active and complete Amy sequences from the subgenus Sophophora. With this extensive dataset, we were able to show that two gene duplication events occurred independently in different lineages of the subgenus Sophophora and produced distant paralogs. In addition, the melanogaster subgroup was inferred to have lost one of the two types of Amy genes. While no significant change in the rate of amino acid replacement was observed among the lineages after the gene duplications, the GC3 content decreased significantly along one of the two Amy clusters, both in the montium subgroup and in D. ananassae, right after gene duplication. This suggests that the two types of Amy genes within species undergo a birth-and-death process, whereas tandemly repeated Amy members evolve in a concerted manner.

Materials and Methods

Sequence Data

Forty-nine complete Amy gene sequences were retrieved from GenBank. Their accession numbers, GC contents at all positions of the coding region, and codon usage bias indices (the scaled chi-square) are shown in Table 1. Although pseudogenes are part of the evolutionary history of multigene families, they have very different evolutionary rates compared with functional genes and may distort attempts to date duplication events. Therefore, we used only the available complete DNA sequences of functional Amy genes.

Table 1 A list of Drosophila species and Amy sequences used in this study, GC content, codon usage bias, and accession number

Sequence Analyses

Sequences were first aligned at the amino acid level using CLUSTALX (Thompson et al. 1997). The Amy genes encode 494 amino acid residues (1482 nucleotides), including the signal peptide, which encompasses the first 18 amino acid residues. Only the Amy4N and Amyi5 genes in D. ananassae have an additional amino acid (Arg) in the signal peptide (Da Lage et al. 2000). After removing this additional amino acid, the length of the sequences analyzed in this study was 494 amino acids.

To reconstruct phylogenetic trees, we used the neighbor-joining method in MEGA 2.0 (NJ; Saitou and Nei 1987), the maximum likelihood (ML) method in PHYLIP 3.6a3 (Felsenstein 2002), and the maximum parsimony (MP) method (Branch-and-Bound search) in PAUP* 4.0 (Swofford 1998). The JC69 (Jukes and Cantor 1969), K80 (Kimura 1980), and TN (Tamura and Nei 1993) distance measures were used for the NJ tree reconstruction to examine their effects on topological stability. The precision of the tree topology was assessed by bootstrap analysis, with 1000 resampling replicates for the MP and NJ methods and 100 replicates for the ML method.
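As an illustration of two of the distance measures named above, the following minimal Python sketch computes the JC69 and K80 distances for a pair of aligned sequences using the standard closed-form expressions. It is not the MEGA implementation; the function names (count_differences, jc69_distance, k80_distance) are hypothetical and introduced only for this example, and sites with gaps or ambiguous bases are simply skipped, which is an assumption about gap handling.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (sites, transitions, transversions) over unambiguous aligned sites."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguous bases are ignored
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def jc69_distance(seq1, seq2):
    """Jukes-Cantor (1969) distance: d = -3/4 * ln(1 - 4p/3)."""
    sites, ts, tv = count_differences(seq1, seq2)
    p = (ts + tv) / sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k80_distance(seq1, seq2):
    """Kimura (1980) distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    sites, ts, tv = count_differences(seq1, seq2)
    P = ts / sites  # proportion of transitional differences
    Q = tv / sites  # proportion of transversional differences
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))
```

Both functions return 0 for identical sequences, and the K80 distance exceeds the JC69 distance when the observed differences are dominated by transitions.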

To assess the significance of differences in evolutionary rate among gene clusters, we used the ML method as implemented in the DNAML and DNAMLK programs in PHYLIP 3.6a3. Since base composition at the third codon position varies among Amy sequences, we used Galtier and Gouy’s (1998) maximum likelihood method, as implemented in the EVAL_NH program (NHML package), to estimate the change in base composition over the course of Amy gene evolution in Drosophila. This method is based on a nonhomogeneous, nonstationary model of DNA sequence evolution.

Results

Gene Duplication Followed by Gene Loss

Previous studies indicate that there is great heterogeneity in GC3 content among the Amy genes even within species (Inomata and Yamazaki 2000; Da Lage et al. 2000; Zhang et al. 2002). Therefore, to reduce the effects of compositional bias on phylogenetic reconstruction, only the first and second codon positions were used for phylogenetic analyses. Because the topologies of the NJ trees constructed from K80 distances (Kimura 1980; Saitou and Nei 1987), JC distances (Jukes and Cantor 1969), and TN distances (Tamura and Nei 1993), and of the MP (Swofford 1998) and ML (Felsenstein 2002) trees were almost the same, we show only the NJ tree constructed from K80 distances (Fig. 1). Using the Amy sequences of D. virilis and S. lebanonensis as outgroups, the Amy genes of the obscura group first diverged from the other lineages. Furthermore, the Amy genes of the melanogaster group formed a monophyletic group with very high bootstrap probability (96%). The Amy genes of the ananassae subgroup diverged first from the other Amy genes of the melanogaster and montium subgroups, followed by those of the montium subgroup. All Amy genes in the subgroups and groups (except for the montium subgroup) formed monophyletic clusters, although some clusters do not have high bootstrap probabilities. These results are consistent with previous studies (Russo et al. 1995; Inomata et al. 1997). Another important observation in Fig. 1 is that there are two gene clusters with high bootstrap probabilities within the montium subgroup species and D. ananassae. For the montium subgroup, we refer to the Amy paralog cluster including the Amy1 and Amy2 genes as “mon-cluster 1” and the Amy paralog cluster including the Amy3 and Amy4 genes as “mon-cluster 2.” Similarly, we refer to the Amy paralog cluster including the Amy58 and Amy35 genes as “ana-cluster 1” and the Amy paralog cluster including the Amy4N and Amyi5 genes as “ana-cluster 2” in D. ananassae (Fig. 1).
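The codon-position partitioning described above can be sketched as follows. This is a minimal Python illustration, not the procedure used by the programs cited in this study; the helper names (codon_positions, gc_content, gc3) are hypothetical, and the input is assumed to be an in-frame coding sequence whose length is a multiple of three.

```python
def codon_positions(cds, positions=(1, 2)):
    """Return the bases found at the given codon positions (1-based) of an in-frame CDS."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    offsets = sorted(p - 1 for p in positions)
    return "".join(cds[i + k] for i in range(0, len(cds), 3) for k in offsets)


def gc_content(seq):
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous bases")
    return sum(b in "GC" for b in counted) / len(counted)


def gc3(cds):
    """GC content at third codon positions."""
    return gc_content(codon_positions(cds, positions=(3,)))


# Toy example (not an Amy sequence): codons ATG GCA AAG TTT.
toy = "ATGGCAAAGTTT"
first_second = codon_positions(toy)  # sites retained for tree reconstruction
third_gc = gc3(toy)                  # compositional summary of the excluded sites
```

Separating the sites in this way lets the tree be built from positions that are less affected by compositional heterogeneity, while GC3 is summarized independently from the third positions.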

Figure 1

NJ gene tree reconstructed from first and second codon positions using Kimura’s two-parameter distances. The numbers near the nodes are bootstrap probabilities, and the boldface underlined numbers are the GC3 contents of the corresponding ancestral nodes.

Figure 1 clearly suggests that one duplication event, which resulted in two gene clusters in the montium lineage, predated the split of the melanogaster and montium subgroups and that another duplication event occurred independently in the ananassae lineage after the split of the ananassae subgroup and the montium and melanogaster subgroups. Since the melanogaster subgroup species have only one gene cluster, they are likely to have lost one of the two homologous gene clusters in the montium subgroup. This inference on gene duplication/loss events is further supported by comparison of their gene arrangements in genomes. For instance, Fig. 2 shows that the two gene clusters are located on different regions of the same chromosome in the montium subgroup species but on different chromosomes in D. ananassae (Inomata and Yamazaki 2000; Da Lage et al. 2000). The most likely scenario is that the two gene duplication events occurred independently in two different lineages and that the melanogaster subgroup species might have lost the corresponding paralogous Amy cluster 2 (Figs. 1 and 2).

Figure 2

Organization of the Amy gene clusters in D. melanogaster, D. pseudoobscura ST, D. kikkawai, and D. ananassae. Open circles refer to centromeres of chromosomes. Orientations of the Amy genes are indicated by arrows if they are known. Open rectangles indicate a pseudogene or partial sequence available. Genes in gray do not have significant expression information available or are Amyrel genes. The D. melanogaster arrangement is taken from Boer and Hickey (1986), D. pseudoobscura ST from Brown et al. (1990), D. kikkawai from Inomata and Yamazaki (2000), and D. ananassae from Da Lage et al. (2000). The figure shows just the organization of the Amy gene clusters, not the real sizes and distances between genes.

To test the molecular clock hypothesis at the first and second codon positions, we compared the likelihoods of the phylogenies estimated with and without the assumption of a constant rate (DNAMLK vs. DNAML in PHYLIP 3.6 [Felsenstein 2002]). Both models resulted in the same topology. The log likelihood under the assumption of a molecular clock was l0 = −4144.23, whereas the log likelihood without a clock was l1 = −4121.26. Twice the log-likelihood difference, 2δl = 2 × (l1 − l0) = 2 × (−4121.26 − (−4144.23)) = 45.94, was compared with the chi-square distribution (df = 47, p = 0.516). The difference between the two models was not significant, indicating that the molecular clock holds at the first and second codon positions. Therefore, the outputs of the maximum likelihood analysis under a molecular clock and bootstrap resampling with 100 replicates were used to estimate the gene duplication times and their standard deviations, A and B (Fig. 1), for mon-clusters 1 and 2 and for ana-clusters 1 and 2, respectively. Using the 24.9-mya divergence time of the obscura and melanogaster groups as a calibration point (Russo et al. 1995), the duplication time between mon-cluster 1 and mon-cluster 2 was estimated at about 13.96 ± 1.93 mya and the duplication time between ana-cluster 1 and ana-cluster 2 was estimated at about 12.38 ± 1.76 mya. These results indicate that the two gene duplication events occurred independently and relatively recently, after the split of the ananassae subgroup from the montium and melanogaster subgroups. The calibration time used in the present study is quite conservative. The estimate of the divergence time for the split of the obscura and melanogaster groups obtained from immunological distance data (Beverley and Wilson 1984) is about 46 mya, nearly twice the estimate obtained from Adh sequence data (Russo et al. 1995). Thus, the estimates of the duplication times in the present study should be regarded as minimum ones.
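The likelihood ratio test of the molecular clock can be reproduced from the two reported log likelihoods. The sketch below is a minimal Python illustration that uses only the standard library; the function names (chi2_sf, clock_lrt) are hypothetical, and the degrees of freedom are taken as the number of sequences minus two, the usual count for a clock versus no-clock comparison, which agrees with the df = 47 reported for 49 sequences. The chi-square tail probability is obtained from a series expansion of the regularized incomplete gamma function rather than from a statistics library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate, via the series for P(a, z)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    # P(a, z) = z^a e^-z / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
    while abs(term) > 1e-15 * abs(total) and n < 10000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def clock_lrt(lnl_clock, lnl_free, n_sequences):
    """Return (statistic, df, p) for the clock vs. no-clock likelihood ratio test."""
    statistic = 2.0 * (lnl_free - lnl_clock)
    df = n_sequences - 2  # assumption: standard df for the clock test
    return statistic, df, chi2_sf(statistic, df)


# Log likelihoods reported in the text for the 49 Amy sequences.
stat, df, p = clock_lrt(-4144.23, -4121.26, 49)
print(f"2*delta(lnL) = {stat:.2f}, df = {df}, p = {p:.3f}")
```

A p-value well above 0.05 means that the clock model is not rejected, which is the basis for using the clock-constrained tree to date the duplications.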

Birth-and-Death Process Versus Concerted Evolution

Since there is great heterogeneity in GC3 content among the paralogous Amy gene clusters (Table 1), the method of Galtier and Gouy (1998) was used to estimate the ancestral GC3 contents. The underlined numbers in Fig. 1 show the estimated GC3 contents of the corresponding ancestral nodes. Our results indicate that the common ancestor of Sophophora species had an elevated GC3 content, which is consistent with at least one other study (Rodriguez-Trelles et al. 2000). The GC3 content of the common ancestor of mon-clusters 1 and 2 was 91.4%, whereas the GC3 contents of the ancestral nodes of mon-clusters 1 and 2 were 90.4 and 76.5%, respectively. The difference in GC3 content between the ancestral nodes of mon-clusters 1 and 2 was thus 13.9%. The standard error of this difference was estimated by a bootstrap method with 100 resampling replicates, and the difference was statistically significant (Z = 3.69, p < 0.01). For D. ananassae, the GC3 content of the common ancestor of ana-clusters 1 and 2 was 92.8%, whereas the GC3 contents of the ancestral nodes of ana-clusters 1 and 2 were 74.2 and 63.6%, respectively. The ancestral node of ana-cluster 1 has a significantly higher GC3 content than that of ana-cluster 2 (Z = 4.54, p < 0.01). These results consistently suggest divergent evolution between the two gene clusters within species.
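The Z statistics above are the ratio of the estimated GC3 difference to its bootstrap standard error. The following minimal Python sketch shows that calculation under stated assumptions: the function names are hypothetical, the bootstrap replicates in the example are invented placeholder values rather than data from this study, and the p-value is computed as two-sided under a normal approximation, which may differ from the sidedness used for the reported tests.

```python
import math
import statistics


def z_test_from_bootstrap(estimate, bootstrap_differences):
    """Return (se, z, two-sided p) using the bootstrap standard deviation as the SE."""
    se = statistics.stdev(bootstrap_differences)
    z = estimate / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_two_sided


def implied_standard_error(difference, z):
    """Back-calculate the SE implied by a reported difference and Z value."""
    return difference / z


# SE implied by the reported values for the montium subgroup (13.9 points, Z = 3.69).
se_mon = implied_standard_error(13.9, 3.69)

# Placeholder replicates (hypothetical, for illustration only).
se, z, p = z_test_from_bootstrap(14.0, [10.0, 12.0, 14.0, 16.0, 18.0])
```

For the montium subgroup, the reported difference and Z value imply a bootstrap standard error of roughly 3.8 percentage points.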

Figures 1 and 2 imply that the melanogaster subgroup species might have lost one Amy cluster homologous to mon-cluster 2, which has a lower GC3 content. Although the montium subgroup species and D. ananassae retain cluster 2, this gene cluster has lost some specific regulatory elements compared with the corresponding cluster 1 (Inomata and Yamazaki 2000; Da Lage et al. 2000; Zhang et al. 2002). All the above observations suggest that the two Amy clusters within species have experienced duplication, gene and regulatory element losses, and divergent evolution. This appears to be consistent with a birth-and-death process (Rooney et al. 2002).

On the other hand, on the basis of observations of the electrophoretic polymorphism of amylases and Southern hybridization with a molecular probe specific for the α-amylase coding region in the melanogaster subgroup species, Dainou et al. (1987) and Payant et al. (1988) demonstrated that duplication of the tandemly repeated Amy members predated the speciation events within the melanogaster species subgroup. Furthermore, Hickey et al. (1991) found that the 5′-flanking and 3′-flanking region sequences are highly divergent between the two tandemly repeated Amy members in D. melanogaster and D. erecta, whereas the coding regions of the two genes in D. melanogaster are extremely similar to each other relative to the homologous sequence in D. erecta. This suggests that the similarity of the two copies is the consequence not of very recent duplication but of frequent gene conversion in the coding region. Concerted evolution of the tandemly duplicated genes was also reported in a study of the Amy genes in D. kikkawai and its sibling species (Inomata and Yamazaki 2000). The phylogenetic tree in Fig. 1 shows that mon-cluster 1 has a branching pattern very similar to that of the melanogaster subgroup Amy cluster and that the tandemly repeated members within species group by cluster. Furthermore, the head-to-head gene arrangement of the two tandemly repeated members is conserved for mon-cluster 1 of D. kikkawai and Amy (p) and Amy (d) of D. melanogaster (Fig. 2). All of these results suggest that concerted evolution holds for the members within the gene clusters (Figs. 1 and 2).

Discussion

The Amy gene family in Drosophila is a relatively small multigene family. The melanogaster subgroup species and some other species have one gene cluster with two or three tandemly arrayed members. They have been shown to be subject to concerted evolution (Hickey et al. 1991; Shibata and Yamazaki 1995; Popadic et al. 1996). We also observed that the Amy genes within a cluster evolved in a concerted manner (Table 1 and Fig. 1). Since the Amy genes within a cluster are tandemly repeated (Fig. 2), concerted evolution is the expected result. However, the two types of Amy genes, which are not tandem repeats of each other, evolve independently and divergently. In this sense, members with different genomic organizations, even within the same gene family, may exhibit different evolutionary modes. In other words, the genomic organization of a gene family may determine the evolutionary modes of its members.

We have shown that two duplication events occurred independently and relatively recently in different Drosophila lineages, resulting in two types of Amy genes in these species. The two types of Amy genes cluster by type and not by species (Fig. 1). Furthermore, it is most likely that the melanogaster subgroup lost one of the two types of Amy genes (Fig. 1). The shared evolutionary rate at the first and second codon positions of the paralogs suggests strong purifying selection at the amino acid level. On the other hand, one cluster, which is located close to the centromere, experienced a significant decrease in GC3 content, while the other maintained it (see Fig. 1 and Table 1). Comparing Figs. 1 and 2, we infer that the melanogaster subgroup species lost Amy cluster 2, with its lower GC3 content, at some point after the duplication. In previous work, we found that Amy gene cluster 2, with a lower GC3 content, lost some cis-regulatory elements compared with gene cluster 1 in the montium subgroup species (Inomata and Yamazaki 2000; Zhang et al. 2002). Similarly, D. ananassae Amyi5, with a lower GC3 content, also lost a putative midgut regulatory element, whereas other copies maintain it in this species (Da Lage et al. 2000). These observations suggest that a decrease in GC3 content is coupled with gene and regulatory element loss after duplication. This also implies that one of the two types of Amy genes is undergoing a process of functional decay. All the above observations suggest that the two types of Amy genes experienced relatively recent gene duplications, gene loss, and divergent evolution and are consistent with a birth-and-death process with strong purifying selection (Rooney et al. 2002).

Previous studies indicate that in D. kikkawai, the two Amy clusters reside in different genomic locations on the same chromosome at a considerable distance from each other (Inomata and Yamazaki 2000). Similarly, the two Amy clusters in D. ananassae are located on different chromosomes (Da Lage et al. 2000). If the syntenic groups of the Amy genes are conserved in the montium subgroup species, the genomic organization of the two types of Amy genes provides a structural basis for divergent evolution. That is, the nontandemly arrayed members of this gene family most likely evolved independently of each other and have little probability of undergoing gene conversion and unequal crossing-over. However, strong purifying selection maintains sequence homogeneity at the amino acid level. This scenario also explains the recent observation that the nontandemly repeated histone 3 genes evolve independently and retain amino acid sequence homogeneity under strong purifying selection (Rooney et al. 2002). It must be pointed out that our assumption of conserved syntenic groups of Amy genes in the montium subgroup species is plausible, because the montium subgroup species used in this study are closely related and their two types of Amy genes exhibit very similar expression and phylogenetic patterns (Zhang et al. 2002, 2003).

Finally, it should be pointed out that a decisive argument for a common origin of the Amy clusters and the subsequent loss of one Amy cluster in only the melanogaster subgroup will require examination of the cluster structure of the Amy genes in related species at a phylogenetic (taxonomic) level intermediate between the melanogaster and the montium subgroups, that is, species belonging to the so-called oriental subgroups (Ashburner 1989), such as D. elegans and D. takahashi. This remains a subject for future experimental research. However, the occurrence in D. eugracilis, another species belonging to the oriental subgroups, of an electrophoretic pattern of two very distinct groups of variants similar to that of D. kikkawai (Inomata et al. 1995) suggests that they are encoded by two sets of duplicated Amy genes. This seems to be a good indication of a structure similar to that of mon-clusters 1 and 2.