Introduction

Recent comparative gene expression analyses between closely related species of the Drosophila melanogaster subgroup have highlighted the importance of stabilizing selection in shaping the evolution of gene expression within (Rifkin et al. 2005) and across (Rifkin et al. 2003; Lemos et al. 2005; Gilad et al. 2006) species. For instance, using a mutation-drift model to infer patterns of evolution of gene expression levels across D. melanogaster, D. simulans, and D. yakuba, Rifkin et al. (2003) found that 67% of genes demonstrated similar levels of expression among all three species. Despite the observation of widespread conservation of expression levels between species, the sterile offspring of interspecific crosses within the D. melanogaster subgroup demonstrate large-scale misregulation of gene expression relative to parental expression levels (Ranz et al. 2004; Michalak and Noor 2003, 2004; Haerty and Singh 2006; Moehring et al. 2007). This suggests that, while gene expression remains conserved between species, nucleotide sequence divergence of regulatory elements is occurring; only in the case of interspecific hybridizations are the effects of such nucleotide divergence revealed.

The phenomenon of interspecific hybrid sterility is thought to arise from genetic incompatibilities linked to divergence at interacting loci (Dobzhansky 1936; Muller 1942). Classical genetic studies of hybrid sterility in multiple taxa have supported the Dobzhansky-Muller model at the gene-gene interaction level (for review see Coyne and Orr 2004). The phenomenon of gene misregulation in interspecific sterile hybrids has provided evidence of Dobzhansky-Muller incompatibilities at the transcriptional level as well (Ortiz-Barrientos et al. 2006). Michalak and Noor (2004) have postulated a causal link between gene misregulation and hybrid sterility. In a study of the expression patterns of a small number of genes known to be misregulated in the interspecific hybrids between D. simulans and D. mauritiana, the authors found that four of the five genes assayed were misregulated in sterile fifth-generation backcross males, while fertile fifth-generation backcross males demonstrated parental levels of expression for these same genes. Classical and molecular genetic analyses have also found that genes implicated in hybrid sterility or inviability show evidence of rapid and adaptive evolution at the nucleotide level (Ting et al. 1998; Presgraves et al. 2003; Barbash et al. 2004; Presgraves and Stephan 2007). As protein coding sequence evolution and gene expression divergence appear to be coupled (Castillo-Davis et al. 2004; Nuzhdin et al. 2004; Lemos et al. 2005), we sought to test if the genes that are misregulated in hybrids show a greater degree of protein sequence divergence relative to nonmisregulated genes. We find that genes that are misregulated in D. simulans female × D. melanogaster male hybrids (specifically those underexpressed relative to parents) are evolving more rapidly at the amino acid level than nonmisregulated genes. In addition, misregulated genes show a paucity of proteins with known lethal mutant phenotypes, suggesting that similar selective forces are acting to minimize sequence and expression divergence for essential genes.

Materials and Methods

We used cDNA microarray hybridization data from a study of gene expression in hybrid testes between D. simulans females and D. melanogaster, D. mauritiana, and D. sechellia males (Gene Expression Omnibus databank accession no. GSE3673 [Haerty and Singh 2006]). Following the criteria of Haerty and Singh (2006), genes were classified as misregulated or nonmisregulated in hybrids in comparison to both parents, and parental expression levels were classified as significantly different from one another or not. The absolute average difference in the log2 ratio (expression in the testis/expression in D. melanogaster whole body) of the expression values was computed as a measure of gene expression difference between D. melanogaster and D. simulans as well as between parents and hybrids.
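
The expression-difference measure described above can be illustrated with a minimal sketch. It assumes that "absolute average difference" means the absolute difference between the mean log2 ratios of two groups; the function name and the replicate values are hypothetical and are not taken from GSE3673.

```python
from statistics import mean

def expression_difference(log2_ratios_a, log2_ratios_b):
    """Absolute difference between the mean log2 ratios of two groups."""
    return abs(mean(log2_ratios_a) - mean(log2_ratios_b))

# Hypothetical replicate log2 ratios for one gene (illustrative only).
melanogaster = [1.20, 1.10, 1.30]
hybrid = [0.20, 0.40, 0.30]

diff = expression_difference(melanogaster, hybrid)
fold = 2 ** diff  # the same difference expressed as a fold change
```

Because the values are log2 ratios, raising 2 to the difference converts it to a fold change, which is the scale used for the 1.5-fold cutoff in the Discussion.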

Using the Drosophila genome annotation project (http://www.rana.lbl.gov/drosophila/; Supplementary Table 1), we were able to retrieve coding sequences for D. melanogaster and D. simulans for a total of 2637 of the genes expressed within the hybrids between D. simulans females and D. melanogaster males in Haerty and Singh’s (2006) study (Supplementary Table 1). The longest available transcript for each gene was used. Sequences were aligned using ClustalW (Thompson et al. 1994) according to the protein sequence alignment. Nonsynonymous (d N) and synonymous (d S) rates of divergence were computed using CODEML from PAML (Yang and Nielsen 2002). Using available gene predictions for D. melanogaster, D. simulans, D. sechellia, D. yakuba, D. erecta, and D. ananassae, we tested for evidence of positive selection for each gene using models allowing d N/d S to vary across sites (M7 and M8 in PAML; Yang and Nielsen 2002; Drosophila 12 Genomes Consortium 2007). A Bonferroni correction was applied to the results. Using FlyBase annotations (http://www.flybase.org) we collected information on lethal and sterile mutant phenotypes for all of the genes used in our study.
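
As an illustration of how an M7 versus M8 comparison with a Bonferroni correction might be carried out, the sketch below takes the log-likelihoods of the two models and applies a likelihood ratio test. It assumes the conventional chi-square reference distribution with 2 degrees of freedom, which the text does not state; the gene names and log-likelihood values are hypothetical.

```python
import math

def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Likelihood ratio test p-value assuming a chi-square null with 2 df."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    return math.exp(-stat / 2.0)

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

# Hypothetical (M7, M8) log-likelihoods per gene; not values from the study.
fits = {
    "geneA": (-2510.4, -2498.1),
    "geneB": (-1820.0, -1819.6),
    "geneC": (-3305.2, -3301.9),
}
raw = {gene: lrt_pvalue_df2(m7, m8) for gene, (m7, m8) in fits.items()}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
significant = [gene for gene, p in adjusted.items() if p < 0.05]
```

In this toy example geneC is nominally significant before correction but not after it, which shows why a Bonferroni correction over thousands of genes leaves few significant cases.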

Previous studies have shown that potential hybridization bias can arise from the use of a single-species microarray for cross-species hybridization due to sequence divergence between species (Gilad et al. 2005; Oshlack et al. 2007). Therefore we applied a more conservative procedure by limiting our analysis to genes without significant expression difference between D. melanogaster and D. simulans in order to account for the possible confounding effect of sequence divergence on gene hybridization (Michalak and Noor 2003). After removing genes with significant differences in expression levels between D. melanogaster and D. simulans, 1841 of 2637 genes remained, including 729 genes significantly misregulated in comparison to both parental species (245 overexpressed, 484 underexpressed).

The differences between significantly and nonsignificantly misregulated genes were determined using permuted Kruskal-Wallis rank sum tests (10,000 permutations, coin package for R [R Development Core Team 2004]). Kendall rank sum coefficients of correlation between sequence divergence and expression difference were computed for coding sequences using d N, d S, and d N/d S values. As a significant correlation exists between d N and the expression differences between parental species as well as between d N and the expression differences between parental species and the hybrids, we controlled for any potential effect of the former on the latter using multiple regression analysis. Our regression model was: lm(d N ~ mel (or sim) + parents), where mel represents the expression difference between D. melanogaster and the hybrids, sim the difference between D. simulans and the hybrids, and parents the expression difference between parental species. Overrepresentations of genes with lethal or sterile mutant phenotypes relative to expectations within categories were computed using chi-square tests.
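
The permuted Kruskal-Wallis test can be sketched in a few lines. This is not the coin package implementation used in the study; it is a minimal Monte Carlo version under the assumption that group labels are shuffled and the H statistic is recomputed each time. All function names and the d N values are hypothetical.

```python
import random

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_h(ranks, labels):
    """Kruskal-Wallis H (no tie correction) from pooled ranks and group labels."""
    n = len(ranks)
    sums, counts = {}, {}
    for r, g in zip(ranks, labels):
        sums[g] = sums.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    total = sum(sums[g] ** 2 / counts[g] for g in sums)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

def permuted_kruskal(values, labels, n_perm=10000, seed=1):
    """Observed H and the share of label shuffles with H at least as large."""
    rng = random.Random(seed)
    ranks = average_ranks(values)
    observed = kruskal_h(ranks, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kruskal_h(ranks, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical d N values for two gene categories (illustrative only).
d_n = [0.031, 0.045, 0.052, 0.060, 0.048, 0.039,
       0.010, 0.014, 0.022, 0.018, 0.027, 0.012]
category = ["misregulated"] * 6 + ["nonmisregulated"] * 6
h_obs, p_perm = permuted_kruskal(d_n, category)
```

The tie correction is omitted because it rescales H by the same constant in every permutation and so does not change the permutation p-value.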

Results

Faster Sequence Divergence of Underexpressed Misregulated Genes in Hybrids

Using expression data for the hybrids between D. simulans females and D. melanogaster males obtained from Haerty and Singh (2006), we compared the rate of evolution between misregulated and nonmisregulated genes in hybrids and analyzed the relationship between nucleotide sequence divergence and gene expression differences between hybrids and parental species. We found that genes misregulated in hybrids show greater d N and d N/d S relative to nonmisregulated genes (Kruskal-Wallis test, p < 2 × 10−16 in both comparisons; Supplementary Table 2), while no significant differences were observed for d S (p = 0.063). More specifically, comparing over- and underexpressed misregulated genes in hybrids, we observed that underexpressed genes show greater d N and d N/d S than overexpressed genes (p < 2.2 × 10−16 in both comparisons, with a Bonferroni correction applied) (Fig. 1, Supplementary Table 2); in fact, overexpressed genes show significantly lower d N and d N/d S than nonmisregulated genes (p = 0.0012 and p = 0.0024, respectively, with a Bonferroni correction applied). Again, no significant differences were observed for d S between nonmisregulated genes and overexpressed or underexpressed misregulated genes, nor between over- and underexpressed misregulated genes themselves (p = 0.6891, p = 0.2494, and p = 0.5876, respectively). Given that over- and underexpressed misregulated genes show differences in their evolutionary patterns, we divided them into two separate categories for all subsequent analyses.

Fig. 1 Comparison of d N, d S, and d N/d S between nonmisregulated genes and misregulated genes in hybrids. Misregulated genes are divided into those underexpressed and those overexpressed relative to parents

Supporting the results of previous studies (Castillo-Davis et al. 2004; Lemos et al. 2005), we also found that genes with divergent expression between species had significantly greater d N (Kruskal-Wallis test, p = 0.013) and d N/d S (p = 0.039) than genes with similar expression levels between species (Fig. 1, Supplementary Table 2). Differences in d S were nonsignificant (p = 0.964). We attempted to determine whether a greater proportion of misregulated genes shows evidence of positive selection than do nonmisregulated genes. Unfortunately, only 17 genes in our entire dataset display evidence of positive selection after the application of a Bonferroni correction, preventing the application of reliable statistical analysis.

We attempted to control for any potential hybridization bias of D. simulans transcripts on the D. melanogaster array by removing genes showing significantly different levels of expression between parental species (as per Michalak and Noor 2003). We observed similar results to the analysis using the full dataset, as underexpressed misregulated genes present a greater d N and d N/d S than nonmisregulated or overexpressed misregulated genes (Kruskal-Wallis test, p < 2.2 × 10−16 in all comparisons, with a Bonferroni correction applied) (Table 1). Also, as previously observed, overexpressed misregulated genes present a lower d N and d N/d S than nonmisregulated genes (p = 0.0075 and p = 0.0089, respectively, with a Bonferroni correction applied). No significant difference was observed for d S between nonmisregulated and overexpressed or underexpressed misregulated genes (p = 0.4849 and p = 0.2559) or between over- and underexpressed misregulated genes (p = 0.5893).

Table 1 Comparison of average (±SD) evolutionary rates for genes showing similar levels of expression between D. melanogaster and D. simulans

Correlation Between Sequence Divergence and Misregulation in Hybrids

In order to test for a possible association between gene misregulation and sequence divergence at the protein level, we performed a correlation analysis between the estimated rates of nucleotide sequence evolution and the absolute, average gene expression differences between parents and between hybrids and both parental species. We found a significant correlation between d N and d N/d S and gene expression differences between species as well as between parents and their hybrids (Fig. 2). d S was only significantly correlated with gene expression differences between the D. melanogaster parent and the hybrids, which may reflect the greater expression level detection ability of these transcripts on the D. melanogaster cDNA microarray. Removing the effect of parental expression differences, we still observed a significant relationship between expression differences between D. melanogaster and the hybrids and d N or d N/d S (p < 2.2 × 10−16 and p < 2.2 × 10−16, respectively) and between expression differences between D. simulans and the hybrids and d N or d N/d S (p = 9.17 × 10−12 and p = 2.18 × 10−8, respectively). Furthermore, the absence of significant correlation between d S and expression difference between D. melanogaster and D. simulans (Fig. 2) indicates that the previously observed significant correlations between d N and expression differences between parental species and their hybrids are not the result of hybridization biases caused by sequence divergence.
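
For readers unfamiliar with the Kendall coefficient used in this analysis, a minimal sketch of its computation follows. It assumes the tau-b variant, which adjusts for ties; the text does not say which variant was used. The function name and the paired values are hypothetical.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts for ties in either variable."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_in_x = concordant + discordant + only_y_tied
    untied_in_y = concordant + discordant + only_x_tied
    denom = math.sqrt(untied_in_x * untied_in_y)
    return (concordant - discordant) / denom if denom else float("nan")

# Hypothetical per-gene values (illustrative only).
d_n_values = [0.01, 0.02, 0.03, 0.05, 0.08]
expr_diffs = [0.10, 0.30, 0.20, 0.50, 0.40]
tau = kendall_tau_b(d_n_values, expr_diffs)
```

The pairwise loop is quadratic in the number of genes, which is acceptable for a sketch; a production analysis over thousands of genes would normally rely on a statistics library.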

Fig. 2 Relationship between sequence divergence (d N, d S) and absolute gene expression difference between D. melanogaster and D. simulans and between parental species and hybrids. Kendall rank-sum coefficients of correlation (τ) and p-values are shown in each frame. The correlation between expression difference and d N/d S is not shown; the values are τ = 0.1104 and p = 1.44 × 10−14, τ = 0.0561 and p = 1.77 × 10−5, and τ = 0.0269 and p = 0.0389 for the correlations with expression difference between D. melanogaster/hybrid, D. simulans/hybrid, and D. melanogaster/D. simulans, respectively

Once again, in order to remove any potential bias linked to sequence divergence, we reanalyzed the data after removing genes showing significant expression differences between parental species. The conclusions of the analysis remain the same (Table 2).

Table 2 Relationship between expression difference and sequence divergence for genes showing similar levels of expression between D. melanogaster and D. simulans

Functional Difference Between Nonmisregulated and Misregulated Genes in Hybrids

Nuzhdin et al. (2004) found that genes with known mutations of large phenotypic effect were underrepresented in the category of genes with divergent expression between species compared to genes with similar levels of expression. We performed an analysis in which we determined whether genes with known lethal or sterile mutant phenotypes showed the same distribution between misregulated and nonmisregulated genes in hybrids. We found an underrepresentation of genes with known lethal mutant phenotypes among the misregulated genes in hybrids (84/978 vs. 205/1659, χ² = 7.25, p = 0.007). When examining the distribution of lethal phenotypes and taking into account the pattern of misregulation in hybrids, a significant underrepresentation of genes with lethal mutant phenotypes in underexpressed misregulated genes compared to nonmisregulated or overexpressed misregulated genes is observed (50/696 vs. 205/1659, χ² = 11.15, df = 1, p = 8.4 × 10−4, and 50/696 vs. 34/280, χ² = 5.15, df = 1, p = 0.0232, respectively). We found no significant difference between nonmisregulated and overexpressed misregulated genes (205/1659 vs. 34/280, χ² = 0.01, df = 1, p = 0.92). There was also no significant difference in the proportion of genes with known sterile mutant phenotypes between misregulated and nonmisregulated genes in hybrids (χ² = 1.8, p = 0.180). However, only 29 genes in our dataset had annotated male specific sterile mutant phenotypes, reducing the power of our statistical analysis.
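
The comparison of lethal-phenotype proportions can be illustrated with a textbook Pearson chi-square test on a 2 × 2 table. The counts below are taken from the paragraph above, but the exact procedure used in the study (for example, how expectations were derived or whether a continuity correction was applied) is not stated, so this sketch need not reproduce the reported χ² values. The function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square (df = 1) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Optional continuity correction.
        diff = max(0.0, diff - n / 2.0)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Lethal vs. non-lethal counts from the text:
# misregulated 84 of 978, nonmisregulated 205 of 1659.
stat, p = chi_square_2x2(84, 978 - 84, 205, 1659 - 205)
```

Passing `yates=True` applies the continuity correction, which lowers the statistic slightly for tables of this size.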

Discussion

Consideration of Hybridization Bias of Interspecific Transcripts on the D. melanogaster Microarray

No correlation was observed between the rate of synonymous substitution (d S) and the magnitude of expression difference between species in our dataset, indicating that the effect of sequence divergence on hybridization efficiency may be small. Moreover, removing all genes showing a significant difference in expression level between parental species (as per Michalak and Noor 2003) does not affect the conclusions of the previous analysis. As noted in the study by Gilad et al. (2005), performing cross-species hybridizations on a single-species array for the purpose of direct comparison of relative expression levels can lead to biased estimates, due to the effect of sequence divergence on the efficiency of hybridization. However, in the same study it was also found that when a minimum between-species expression difference cutoff of 1.5-fold was employed, almost all of the genes classified as differentially expressed on a single-species array were confirmed by multi-species-array analyses. In the present data set, the smallest gene expression difference between D. melanogaster and D. simulans for genes that were considered significantly differentially expressed is 1.59-fold. In the case of the comparisons between the hybrids and the parental species, the smallest gene expression difference for genes classified as significantly misregulated is 1.02-fold (a total of 7 genes of 978 show a gene expression difference <1.5).

Similar interpretations of cross-species hybridizations have also been corroborated by the recent study by Moehring et al. (2007) on gene expression in interspecific hybrids between D. simulans, D. sechellia, and D. mauritiana. The authors of that study compared the accuracy of their results from cross-species hybridization on a single-species microarray to a small-scale multispecies array and showed that, although cross-species hybridizations led to a decrease in power to detect genes significantly differentially expressed between species, genes called significantly misregulated in the D. melanogaster single-species array were also observed to be significantly misregulated in the multispecies microarray.

Underexpressed Misregulated Genes Diverge More Rapidly

As the comparison between parental species and hybrids was performed on genes expressed in the testes, a possible tissue effect (i.e., faster evolution of testis-expressed genes in comparison to genes expressed in different organs [Civetta and Singh 1995; Jagadeeshan and Singh 2005]) should not account for the increased divergence observed among misregulated genes compared to nonmisregulated genes in hybrids. The greater average d N and d N /d S coupled with the absence of significant differences for the average d S in misregulated genes suggests that the observed sequence divergence may be due to directional selection. Unfortunately, as we implemented a conservative test for positive selection (comparison of models M7 and M8 from PAML, associated with Bonferroni correction), very few genes within our dataset show significant evidence of positive selection (i.e., d N /d S > 1; 17 genes in total; Supplementary Table 3). Therefore, we were unable to test whether misregulated genes showed a significant enrichment of genes demonstrating evidence of Darwinian selection. However, it should be noted that the rapid divergence of misregulated genes is also consistent with the predictions of models of the accumulation of Dobzhansky-Muller incompatibilities under directional selection (Johnson and Porter 2000, 2007). Such simulations have found that the rate of accumulation of incompatibilities between populations, and thus the rate of speciation, increases when the affected loci are under directional selection.

An alternative explanation for rapid evolution of misregulated genes may be that these genes are nonessential and thus are subject to relaxed selective constraint. The paucity of genes with lethal mutant phenotypes among those misregulated would appear to support this notion (see below). Application of population genetic analyses that are more sensitive to detecting weaker signatures of positive selection in closely related species (i.e., those incorporating polymorphism information between and within species [Eyre-Walker 2006]) will be required to determine whether directional selection or relaxed constraint is the more likely explanation for the patterns observed.

Several studies have shown that genes that are considered essential (i.e., have lethal mutant phenotypes) are more conserved over evolutionary time than those that are considered dispensable (Torgerson and Singh 2006; Hahn et al. 2006; He and Zhang 2006). Our study demonstrated not only that genes with a known lethal mutant phenotype are evolving more slowly at the sequence level, but that they are also less likely to be underexpressed relative to parental species in interspecific hybrids. This suggests that the phenomenon of gene misregulation in interspecific hybrids is occurring predominantly among genes whose mutants do not display lethal phenotypes, possibly due to the effect of strong purifying selection acting on genes with severely deleterious mutant phenotypes.

Several mechanisms have been proposed in order to account for the misregulation of genes in interspecific hybrids (for review see Ortiz-Barrientos et al. 2006). Such mechanisms include the divergent coevolution of transcription factors and their binding sites between species (Johnson and Porter 2000), such that they fail to complement in the hybrid background. Other mechanisms involve species-specific loss of regulatory pathway elements (such that the pathways no longer complement each other in the hybrids) and divergent evolution of alternatively spliced transcripts and other forms of posttranslational modification (Ortiz-Barrientos et al. 2006).

All such mechanisms share a common feature in that they predict that the most rapidly evolving (and thus divergent) genes will be those that are most likely to be subject to misregulation in interspecific hybrids. Simulations of these conditions have found that the rate of accumulation of incompatibilities between populations increases when the affected loci are under directional selection (Johnson and Porter 2000, 2007). These predictions are partially validated by our observation that underexpressed misregulated genes evolve more rapidly than nonmisregulated genes (Fig. 1, Supplementary Table 2), as well as the observation that male-biased genes (MBGs) are overrepresented among them.

Unfortunately, these mechanisms do not seem to account for our observation that overexpressed misregulated genes evolve less rapidly than nonmisregulated genes (Fig. 1, Supplementary Tables 1 and 2). A closer inspection of the predicted functions of these genes using FATIGO (Al-Shahrour et al. 2004) reveals that they are enriched in proteins involved in translation (more specifically ribosomal proteins; p = 5.88 × 10−5) relative to underexpressed misregulated genes. Ribosomal production may be upregulated in hybrids in order to maintain viability under the burden of reduced expression of many genes, though such an effect would only be proximally related to hybrid misregulation rather than a direct effect of hybrid incompatibility. Female-biased genes (FBGs) are also overrepresented among overexpressed misregulated genes (Haerty and Singh 2006). Such genes have been shown to evolve less rapidly than MBGs or genes that do not show sex bias (Meiklejohn et al. 2003). Proper male sex determination and differentiation in drosophilids requires the activation of male-determining genes as well as the concomitant repression of female-determining genes (for review see Schütt and Nöthiger 2000). The overwhelming misregulation of MBGs in hybrids could produce a lack of proper repression of FBGs, leading to an overall pattern of overexpression among these conserved transcripts.

While such mechanisms can account for why a portion of the most conserved misregulated genes tends to be overexpressed, it is quite probable that additional mechanisms also play important roles in the phenomenon. For instance, 75 of the overexpressed misregulated genes are classified as MBGs (Supplementary Table 1). Analyzed as a group, these genes are still evolving less rapidly than nonmisregulated genes in both d N and d N /d S (Kruskal-Wallis test, p < 2.2 × 10−16 and p = 2 × 10−4 respectively), while no difference is observed for the d S (p = 0.0668), indicating that additional evolutionary processes are responsible for the coupling of sequence conservation and overexpression in interspecific hybrids. Additional functional studies leading to a better understanding of the mechanisms of gene regulation may be required to identify these processes.

Correlation Between Parental Expression and Sequence Divergence

As previous studies have shown, we also observed a significantly positive correlation between expression difference between parental species and the d N or d N/d S ratio (Castillo-Davis et al. 2004; Nuzhdin et al. 2004; Lemos et al. 2005). However, these results are in contrast with Good and coworkers’ (2006) finding that genes with divergent expression between species do not have a significantly higher d N/d S than genes with similar levels of expression. Such a discrepancy may be due to the larger number of genes used in the present study, as well as to the fact that our analysis is restricted to genes expressed within the testis, which are known to present a higher variation in sequence and expression differences between species (Meiklejohn et al. 2003; Ranz et al. 2003).

In conclusion, we find that underexpressed misregulated genes in interspecific sterile hybrids are evolving more rapidly at the coding level than either nonmisregulated genes or overexpressed misregulated genes and that gene expression differences between hybrids and parental species are significantly correlated with coding sequence divergence. Our observation that misregulated genes, specifically those that are underexpressed, show an underrepresentation of genes with known lethal mutant phenotypes would suggest that similar selective pressures are acting to maintain expression levels as well as to minimize sequence divergence of essential genes between species. The phenomenon of gene misregulation in hybrids appears to involve a more rapid evolution of coding sequences coupled with an enrichment of male-biased genes among those misregulated (e.g., Michalak and Noor 2003; Haerty and Singh 2006). Previous studies have shown that male-biased genes are evolving more rapidly, probably due to sexual selection (Swanson and Vacquier 2002; Singh and Kulathinal 2000; Jagadeeshan and Singh 2005); this suggests that sexual selection may be the driving force behind the rapid divergence of the mostly male-biased, misregulated genes observed in this study.