Introduction

Sex determination is the commitment of an embryo to either the female or the male developmental pathway. The best-known case is that of Drosophila melanogaster, whose sex determination mechanism has been thoroughly analysed. The epistatic relationships among the sex determination genes of this species show that they interact hierarchically (reviewed in Sánchez et al. 2005). Characterisation of these genes has shown that their activity during development is governed by sex-specific splicing of their primary transcripts: the product of each gene controls the sex-specific splicing of the precursor messenger ribonucleic acid (pre-mRNA) of the next gene downstream in the cascade. Sex-lethal (Sxl) is at the top of this cascade; its product controls the splicing of its own pre-mRNA as well as that of the pre-mRNA of the downstream gene transformer (tra). The Tra product, together with the product of the constitutively expressed gene tra-2, controls the sex-specific splicing of the pre-mRNA of the gene doublesex (dsx), which is transcribed in both sexes but gives rise to two different proteins, DsxF and DsxM. These transcription factors impose female and male sexual development, respectively, through sex-specific regulation of the so-called sexual cytodifferentiation genes.

The gene dsx has been characterised in the dipterans Megaselia scalaris (Kuhn et al. 2000), Musca domestica (Hediger et al. 2004) and Anopheles gambiae (Scali et al. 2005), in the tephritid fruit flies Bactrocera tryoni (Shearman and Frommer 1998), Bactrocera oleae (Lagos et al. 2005), Ceratitis capitata (Salvemini et al., personal communication) and Anastrepha obliqua (Ruiz et al. 2005), and in the lepidopteran Bombyx mori (Ohbayashi et al. 2001). In all these species, dsx produces male- and female-specific mRNAs, which encode the male- and female-specific Dsx proteins, respectively.

We previously isolated and characterised the dsx gene of Anastrepha obliqua (Aodsx; Ruiz et al. 2005). As shown in Fig. 1, it is composed of four exons rather than the six usually seen in other species. Exons 1 and 2 are common to both sexes, whereas exon 3 is female-specific and exon 4 is male-specific. The gene is transcribed during development and in adult life in both sexes, but its primary transcript undergoes sex-specific splicing, so that a different mRNA is produced in each sex. These mRNAs encode the female DsxF and male DsxM proteins, which share the amino-terminal region but differ in the carboxyl-terminal region. Comparison of the molecular organisation of Aodsx mRNAs in males and females suggests that, in A. obliqua, the male splicing pathway is the default mode. Conceptual translation of the male and female Aodsx mRNAs yields polypeptides of 397 and 319 amino acids, respectively. Comparison with the Dsx proteins of other insects shows that similarity is higher in the female-specific region than in the non-sex-specific and male-specific regions. The OD1 and OD2 domains have the same number of amino acids in the Dsx proteins of A. obliqua and of the other insect species, and their overall similarity is also very high (Ruiz et al. 2005).

Fig. 1

The molecular organisation of gene dsx in Anastrepha species. Exons (boxes) and introns (broken lines indicate that the length of the intron is unknown) are not drawn to scale. The numbers inside the boxes identify the exons. The beginning and the end of the ORF are indicated by ATG and by TGA or TAA, respectively. AAA denotes polyadenylation. a Arrows indicate the positions of the primers, whose sequences are shown below. b The female-specific dsx exon of Anastrepha species, showing the distribution and sequences of the 13-nucleotide DsxRE repeats and of the purine-rich element (PRE)

We were interested in the evolution of the dsx gene. Because the structural organisation of dsx in A. obliqua differs from that in the other dipteran species, we decided to investigate its organisation and variability in other Anastrepha species. This paper reports the characterisation of the dsx gene of several Anastrepha species belonging to three intrageneric taxonomic groups, namely A. serpentina, A. striata and A. bistrigata (serpentina group), A. grandis (grandis group) and A. amita and A. sororcula (fraterculus group; Norrbom et al. 1999), plus the four recognised species of the so-called Anastrepha fraterculus complex of cryptic species, namely A. sp.1 aff. fraterculus, A. sp.2 aff. fraterculus, A. sp.3 aff. fraterculus and A. sp.4 aff. fraterculus (Selivon et al. 2005), and compares them with the previously described dsx of A. obliqua (Ruiz et al. 2005) and of other insects.

Materials and methods

The species of Anastrepha studied, their host fruits and the sites where they were collected are shown in Table S1 in the Supplementary material.

For extraction of total genomic deoxyribonucleic acid (DNA) and total RNA from adult flies, the procedure described in Ruiz et al. (2005) was followed.

For polymerase chain reaction (PCR) analyses, 500 ng of genomic DNA from each adult insect were used. For reverse transcription PCR (RT-PCR), five micrograms of total RNA from each insect were reverse transcribed with SuperScript (Invitrogen) following the manufacturer’s instructions, and 10% of the synthesised complementary DNA (cDNA) was amplified by PCR. PCR and RT-PCR products were analysed by agarose gel electrophoresis, and the amplified fragments were subcloned using the TOPO TA cloning kit (Invitrogen) following the manufacturer’s instructions. The subclones were then sequenced using universal forward and reverse primers. The sequences and locations of the primers are shown in Fig. 1.

DNA sequencing was performed on an automated 377 DNA sequencer (Applied Biosystems). The accession numbers of the open reading frames (ORFs) and protein sequences of the species studied are as follows: A. sp.1 aff. fraterculus DsxF (DQ494344) and DsxM (DQ494334); A. sp.2 aff. fraterculus DsxF (DQ494325) and DsxM (DQ494335); A. sp.3 aff. fraterculus DsxF (DQ494326) and DsxM (DQ494336); A. sp.4 aff. fraterculus DsxF (DQ494327) and DsxM (DQ494343); A. grandis DsxF (DQ494328) and DsxM (DQ494337); A. serpentina DsxF (DQ494329) and DsxM (DQ494338); A. sororcula DsxF (DQ494330) and DsxM (DQ494339); A. striata DsxF (DQ494331) and DsxM (DQ494340); A. bistrigata DsxF (DQ494332) and DsxM (DQ494341); and A. amita DsxF (DQ494333) and DsxM (DQ494342).

For the comparison of DNA and protein sequences and the phylogenetic analyses of dsx, the methodology previously used for the analysis of Sxl was followed (Serna et al. 2004).

Results and discussion

Molecular organisation of dsx in the Anastrepha species studied

It was assumed that the dsx gene of the Anastrepha species studied had the molecular organisation of the Aodsx gene (Ruiz et al. 2005; see Fig. 1a). With this assumption, the characterisation of dsx from the different Anastrepha species was undertaken as follows.

First, RT-PCR analyses of total RNA from adult males and females were performed separately. Reverse transcription was primed with oligo-dT, and PCR was carried out with a primer (P1) from the 5′ untranslated region (UTR) of Aodsx plus a primer from the 3′-UTR of either the female (P4) or the male (P5) Aodsx mRNA. This design was expected to amplify the complete ORFs of the male and female Dsx proteins of the different Anastrepha species.
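The primer design above rests on a simple logic: a product is expected only when the forward primer and the reverse complement of the reverse primer both occur, in the right order, on the template. A minimal in-silico PCR sketch of that logic is given below. It assumes exact primer matches and uses made-up sequences; the function names are hypothetical, and the real P1, P4 and P5 sequences are those given in Fig. 1a.

```python
from typing import Optional

# Map each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the predicted amplicon, or None if either primer has no exact site.

    Simplifying assumption: primers must match perfectly; real primers
    tolerate some mismatches, especially away from their 3' ends.
    """
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement must appear on the template downstream of the forward primer.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Made-up template standing in for one sex-specific transcript.
template = "GGATCCATGAAACCCGGGTTTTAGCTT"
print(in_silico_pcr(template, "ATGAAA", "CTAAAA"))  # ATGAAACCCGGGTTTTAG
print(in_silico_pcr(template, "ATGAAA", "GGGGGG"))  # None: reverse primer has no site
```

In the same spirit, a female-specific reverse primer would yield a product only from the female transcript and a male-specific one only from the male transcript.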

Second, a set of primers was synthesised corresponding to intron sequences close to the exon boundaries of the three Aodsx introns. Using these primers in combination with exonic primers to amplify genomic DNA, we expected to obtain the intron and exon sequences flanking each exon–intron junction. Comparing these genomic sequences with those obtained by RT-PCR should then reveal the molecular organisation of dsx in the different Anastrepha species. Unfortunately, of all the intronic primers synthesised, only P3, in intron 1, amplified genomic fragments in all species. This was not unexpected, because introns are more variable in sequence than exons. Despite this drawback, the molecular organisation of dsx in the different Anastrepha species could be determined, as described below.

The Aodsx gene contains a small intron of 126 bp (intron 2) between the common exon 2 and the female-specific exon 3 (Ruiz et al. 2005). PCR amplification of genomic DNA with primer P3, located at the end of intron 1, and primer P4, at the end of the female-specific exon, should therefore amplify exons 2 and 3 together with the intervening intron 2, revealing the junctions between intron 1 and exon 2, between exon 2 and intron 2, and between intron 2 and exon 3. To determine whether exon 1 is the same in all species and to locate the junction between exon 1 and intron 1, genomic DNA was also amplified with primers P1 and P2 (P2 lying at the end of exon 1), and the amplicons were compared with the cDNA sequences obtained by RT-PCR.

Note that the female-specific exon 3 behaves as an intron in the male mode of splicing, so that in males the common exon 2 is joined directly to the male-specific exon 4. Comparison of the genomic sequence amplified with P3 and P4 with the female and male cDNA sequences (obtained by RT-PCR with P1 plus P4 or P5, respectively; see above) showed that all Anastrepha species have exon 2 in common. The junctions between intron 3 and its two flanking exons were also located. Finally, these results ruled out the presence of micro-exons in the large introns 1 and 3.
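The junction mapping described above amounts to asking where an intron must be cut out of a genomic amplicon to reproduce the corresponding cDNA. The sketch below does this for a single intron, assuming canonical GT...AG intron boundaries; the sequences are invented for illustration and the function name is hypothetical. Every candidate split is tested because the last bases of an exon can coincide with the first bases of the intron, which makes a naive "first mismatch" rule ambiguous.

```python
def splice_junctions(genomic: str, cdna: str) -> list[int]:
    """Return every cDNA position i where one intron could have been spliced out.

    A split at i is accepted when genomic == cdna[:i] + intron + cdna[i:]
    and the intron obeys the canonical GT...AG rule (an assumption: rare
    non-canonical introns would be missed).
    """
    intron_len = len(genomic) - len(cdna)
    if intron_len < 4:  # too short to hold both GT and AG
        return []
    junctions = []
    for i in range(len(cdna) + 1):
        if genomic[:i] != cdna[:i]:
            break  # once the upstream exon stops matching, longer prefixes fail too
        if not genomic.endswith(cdna[i:]):
            continue  # downstream exon does not match the end of the amplicon
        intron = genomic[i:i + intron_len]
        if intron.startswith("GT") and intron.endswith("AG"):
            junctions.append(i)
    return junctions


# Invented example: 9-nt upstream exon, 12-nt intron, 5-nt downstream exon.
cdna = "ATGGCAAAG" + "CTTGA"
genomic = "ATGGCAAAG" + "GTAAGTTTTCAG" + "CTTGA"
print(splice_junctions(genomic, cdna))  # [9]
```

Here splits at positions 7 and 8 also reproduce the cDNA, but only position 9 leaves an intron that begins with GT and ends with AG.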

Collectively, these results indicate that the molecular organisation of dsx is conserved in all the Anastrepha species analysed here. The gene is composed of four exons: the first two are common to both sexes, the third is female-specific and the fourth is male-specific. Anastrepha exon 1 corresponds to the fusion of exons 1 and 2 of the Drosophila and Bactrocera dsx genes, and exon 4 corresponds to the fusion of the Drosophila and Bactrocera male-specific exons 5 and 6.

Sex-specific splicing of dsx pre-mRNA

The comparison of the molecular organisation of dsx in males and females of the various Anastrepha species suggests that the male splicing pathway is the default mode in all of them, for two reasons. First, the putative female-specific amino acid region is skipped in males. Second, the female-specific exon 3 contains three putative DsxRE targets for the Tra–Tra2 complex as well as the purine-rich element (PRE). The DsxRE and PRE elements are highly conserved and occupy the same positions in the female-specific exon of all Anastrepha species (Fig. 1b), with the PRE lying between DsxRE targets 2 and 3. The DsxRE elements share the same nucleotide sequence (Fig. 1b). The PRE has the same sequence in all species except A. sp.2 aff. fraterculus and A. grandis, each of which carries a single substitution at the same position: the C at the eighth position is replaced by A in A. sp.2 aff. fraterculus and by T in A. grandis (Fig. 1b).
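Locating the DsxRE repeats and checking the PRE for point substitutions, as summarised above and in Fig. 1b, can be sketched with an IUPAC-aware pattern search and a position-by-position comparison. The degenerate 13-nt motif and the sequences below are illustrative placeholders, not the Anastrepha sequences of Fig. 1b, and the helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes needed here, translated to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def point_differences(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference base, variant base) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [(i + 1, r, v) for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


# Placeholder exon fragment carrying two copies of a degenerate 13-nt repeat.
exon = "GG" + "TCTACAATCAACA" + "CC" + "TCAACGATCAACA" + "T"
print(find_motif(exon, "TCWWCRATCAACA"))              # [2, 17]
print(point_differences("GAAGAAGCAG", "GAAGAAGAAG"))  # [(8, 'C', 'A')]
```

The second call mirrors the kind of change reported above, a single C-to-A substitution at the eighth position of an otherwise identical element.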

The inclusion of the female-specific exon 3 instead of the male-specific exon 4 is probably caused by activation of the splice acceptor site of exon 3, as in other dipteran insects (see references in the Introduction). This activation results from binding of the Tra–Tra2 complex to the 13-nucleotide repeats (DsxRE), together with binding of RBP1, a specific member of the SR protein family, to the repeats and binding of dSF2/ASF to the PRE (which is present only in the female-specific exon). These interactions allow the weak female-specific 3′ splice site to be recognised and used by the general splicing machinery (reviewed in Black 2003 and references therein).
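The regulatory logic described above, with male splicing as the default and exon 3 included only when its weak 3′ splice site is activated, can be summarised as a toy decision function. This is a deliberately simplified, hypothetical model: it treats Tra and Tra2 availability and the presence of DsxRE repeats as on/off inputs and ignores RBP1, dSF2/ASF and splice-site strength.

```python
def dsx_exon_path(tra: bool, tra2: bool, n_dsxre: int) -> list[int]:
    """Toy model of Anastrepha dsx splicing; returns the exons kept in the mRNA.

    Exons 1 and 2 are common, exon 3 is female-specific, exon 4 male-specific.
    """
    # The female-specific 3' splice site of exon 3 is weak, so in this model it
    # is used only when the Tra-Tra2 complex can bind at least one DsxRE repeat.
    exon3_activated = tra and tra2 and n_dsxre > 0
    if exon3_activated:
        return [1, 2, 3]  # female mode: exon 2 joined to exon 3 (DsxF)
    return [1, 2, 4]      # default male mode: exon 3 skipped, exon 2 joined to exon 4 (DsxM)


print(dsx_exon_path(tra=True, tra2=True, n_dsxre=3))   # [1, 2, 3]
print(dsx_exon_path(tra=False, tra2=True, n_dsxre=3))  # [1, 2, 4]
```

Because the male path is returned whenever any activating input is missing, the function encodes the idea that male splicing requires no specific activator.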

The Dsx proteins of Anastrepha and other insects

Conceptual translation of the male and female dsx ORFs of the different Anastrepha species yields polypeptides of 396 and 317 amino acids, respectively. To compare the degree of conservation of these proteins more closely, they were divided into three regions: the non-sex-specific, the female-specific and the male-specific regions (see Fig. S1 in the Supplementary material). The Dsx proteins of A. obliqua were used as the reference.

The common region comprised the first 287 amino acids, and the female- and male-specific regions comprised 30 and 109 amino acids, respectively. Conservation was exceedingly high: similarity (the proportion of identical plus conserved amino acids) exceeded 98% (data not shown). Variation was concentrated mainly in the non-sex-specific region and, to a lesser extent, in the male-specific region. The OD1 domain was 100% similar among all species. The same was true of the OD2 domain, except in A. sp.3 aff. fraterculus and A. bistrigata, in which similarity was slightly lower (98.5%). In all species, the OD2 domain was formed by amino acids of the non-sex-specific region and extended into the female-specific exon. The high conservation of the OD1 and OD2 domains is to be expected, because these domains enable the Dsx proteins to interact with other proteins and with DNA (Cho and Wensink 1997).
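The region sizes given above add up to the full-length proteins, and the similarity score counts identical plus conservatively substituted residues. The sketch below checks the arithmetic and computes such a score for two aligned sequences. The paper does not state which residue-grouping scheme defines a conserved substitution; the groups used here (Clustal-style "strong" groups) are an assumption, and the toy sequences are invented.

```python
# Region lengths (amino acids) reported for the Anastrepha Dsx proteins.
COMMON, FEMALE_SPECIFIC, MALE_SPECIFIC = 287, 30, 109
DSXF_LENGTH = COMMON + FEMALE_SPECIFIC  # 317
DSXM_LENGTH = COMMON + MALE_SPECIFIC    # 396

# Assumed grouping of "conserved" substitutions (Clustal-style strong groups).
CONSERVED_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def is_conserved(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same assumed group."""
    return a == b or any(a in g and b in g for g in CONSERVED_GROUPS)


def percent_similarity(seq1: str, seq2: str) -> float:
    """Percentage of gap-free aligned columns that are identical or conserved."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    hits = sum(is_conserved(a, b) for a, b in columns)
    return 100.0 * hits / len(columns)


print(DSXF_LENGTH, DSXM_LENGTH)              # 317 396
print(percent_similarity("MILKS", "MVLRT"))  # 100.0
print(percent_similarity("MKW-", "MKDA"))    # 2 of 3 columns, about 66.7
```

Percent identity, by contrast, would count only the identical columns, so it is always less than or equal to the similarity value.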

The high conservation of Dsx proteins in Anastrepha contrasts with the variability of Tra proteins in Drosophila. The tra homologues of the drosophilids D. simulans, D. erecta, D. hydei and D. virilis (O’Neil and Belote 1992) and of the four sibling species of the melanogaster complex, D. melanogaster, D. simulans, D. mauritiana and D. sechellia (Kulathinal et al. 2003), have been characterised and compared. The Tra proteins showed an unusually high degree of evolutionary divergence, yet their SR motifs (regions rich in arginine–serine dipeptides) were conserved. Tra belongs to the SR protein family, whose members are involved in spliceosome assembly and the regulation of alternative splicing (reviewed in Black 2003 and references therein). The difference in variability between the Dsx and Tra proteins might reflect less stringent structural requirements for Tra function. The SR domains appear to be the main functional part of the Tra protein, and a content of just 10–20% RS dipeptides appears to be sufficient to confer functionality (Kulathinal et al. 2003). Tra proteins have therefore probably undergone high rates of neutral evolution, provided that they maintained an adequate content of SR dipeptides.
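The 10–20% RS-dipeptide threshold cited above can be made concrete with a small calculation. The paper does not define how that percentage is computed; the sketch below assumes, hypothetically, that it is the fraction of adjacent residue pairs that read RS or SR, and the example sequence is invented.

```python
def rs_dipeptide_fraction(protein: str) -> float:
    """Fraction of overlapping adjacent residue pairs that are RS or SR.

    Assumed definition for illustration; other studies count RS dipeptides
    differently (e.g. non-overlapping, or per residue).
    """
    protein = protein.upper()
    pairs = [protein[i:i + 2] for i in range(len(protein) - 1)]
    if not pairs:
        return 0.0
    return sum(p in ("RS", "SR") for p in pairs) / len(pairs)


toy_tra = "MKRSRSAAPLRSDE"  # invented sequence, not a real Tra protein
print(round(rs_dipeptide_fraction(toy_tra), 3))  # 0.308 (4 of 13 pairs)
```

Under this definition a protein could drift freely at most positions while still passing a minimum-content threshold, which is the intuition behind the neutral-evolution argument above.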

In the evaluation of Dsx variation in Anastrepha and other insects, males showed a slightly higher overall divergence (p = 0.267 ± 0.010) than females (p = 0.203 ± 0.009), probably because of the high variability of the male-specific region (p = 0.439 ± 0.011; Table 1). The underlying nucleotide variation was essentially synonymous, at roughly the same level in the common and male-specific regions (pS ≈ 0.450). A slight asymmetry was seen in females, which showed more variation in the common region of the gene (0.447 ± 0.010; Table 1). In all cases, synonymous variation was significantly greater than nonsynonymous divergence (p < 0.001, Z test). This excess of synonymous substitutions agrees with the overall codon bias values determined in the present work for the complete dsx gene in females and males, and the same held when the common and sex-specific regions were analysed separately (Table 1). Together with the results of the codon-based Z test of selection, in which the null hypothesis of no selection can be safely rejected, these findings suggest that strong purifying selection acts on dsx to preserve the mechanism of action of the Dsx proteins. This further underlines the important role of dsx in controlling the sexual development of insects.
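The quantities above can be illustrated with two small functions: the uncorrected p-distance (proportion of differing sites) and a Z statistic comparing synonymous (pS) with nonsynonymous (pN) divergence. This is a simplified sketch, not the exact procedure of the codon-based Z test behind Table 1, whose variance is normally estimated by bootstrapping; here the two standard errors are assumed independent, and the input values are hypothetical rather than the Table 1 estimates.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ (no gaps assumed)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def purifying_selection_z(p_s: float, se_s: float, p_n: float, se_n: float) -> tuple[float, float]:
    """Z statistic and one-sided P value for H1: pS > pN (purifying selection).

    Simplification: Var(pS - pN) is taken as se_s**2 + se_n**2, i.e. the
    covariance between the two estimates is ignored.
    """
    z = (p_s - p_n) / math.sqrt(se_s ** 2 + se_n ** 2)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return z, p_value


print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
z, p = purifying_selection_z(p_s=0.45, se_s=0.01, p_n=0.10, se_n=0.01)  # hypothetical values
print(round(z, 2), p < 0.001)  # 24.75 True
```

A large positive Z means synonymous divergence clearly exceeds nonsynonymous divergence, the signature of purifying selection discussed above.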

Table 1 Evolutionary distances, codon bias and tests for selection in the dsx gene

Phylogenetic analysis of dsx in Anastrepha species and in other insects

Figure 2a and b show the phylogenetic relationships among the dsx genes of the Anastrepha species and between these and the dsx genes of other insects. Trees were constructed separately from the complete DsxF and DsxM nucleotide sequences (Fig. 2a) and from the region of dsx common to both sexes, with the male- and female-specific segments removed (Fig. 2b). Kimura two-parameter nucleotide distances were used to construct the trees, because they provide more detailed information on the relationships among closely related species; the topology was essentially the same as that obtained with the amino acid sequences (data not shown). Maximum-parsimony analyses supported the neighbour-joining topology, indicating that the results do not depend on the tree-building method used.
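For reference, the Kimura two-parameter distance separates the proportion of transitional differences (P) from that of transversional differences (Q) and corrects both for multiple hits: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). A minimal sketch for one pair of aligned sequences follows; it assumes that gap and ambiguity sites are simply skipped (pairwise deletion), which may differ from the exact settings used to build Fig. 2.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences have an unambiguous base.
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    transitions = sum(
        1 for a, b in sites
        if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    )
    transversions = sum(1 for a, b in sites if a != b) - transitions
    p = transitions / len(sites)    # P: proportion of transitional differences
    q = transversions / len(sites)  # Q: proportion of transversional differences
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(round(kimura_2p("AAAAAAAAAA", "GAAAAAAAAA"), 5))  # 0.11157 (one transition in 10 sites)
```

A matrix of such pairwise distances is the input a neighbour-joining program would then use to build the tree.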

Fig. 2

a Phylogenetic relationships among dsx genes in dipteran insects. The references for the dsx nucleotide sequences of all the insects analysed here are given in the Introduction. Female- and male-specific sequences are indicated by F and M, respectively, next to the species names. Numbers at interior nodes indicate BP and CP values, followed by the confidence value from the maximum-parsimony topology (third value, in boldface; not shown if less than 50%). Confidence values were based on 1,000 replicates and are shown only when at least one exceeded 50%. Taxonomic relationships among the dipterans are shown on the right-hand side of the tree for comparison, with circles at the corresponding nodes indicating the evolutionary origin of each taxon. The dsx gene of the lepidopteran Bombyx mori was used as the outgroup. b Phylogenetic relationships among dipterans based only on the coding region of dsx common to both sexes (the segment corresponding to the male- and female-specific isoforms was removed in this analysis). The tree was reconstructed as in a, with the taxonomic relationships and the confidence values at interior nodes indicated

The resulting topologies agreed well with the taxonomy of the order Diptera above the genus level and showed very similar branch lengths, with the different nodes well supported by the phylogenetic tests performed. Anopheles, which belongs to the suborder Nematocera, forms one phylogenetic branch, and the remaining dipteran species, which belong to the suborder Brachycera, form the other. Within the Brachycera, the tephritids are more closely related to the drosophilids than to Musca, and Megaselia is the most distant. Within the Tephritidae, the clustering of dsx sequences is in line with the current taxonomic relationships among the genera Anastrepha, Bactrocera and Ceratitis (Korneyev 1999): Anastrepha is assigned to the subfamily Trypetinae, whereas Ceratitis and Bactrocera belong to the subfamily Dacinae (primary branching) but are assigned to different tribes, Ceratitidini and Dacini, respectively (secondary branching).

As mentioned above, dsx is subject to strong purifying selection that preserves the mechanism of action of the Dsx proteins, reflecting the important role of dsx in controlling the sexual development of insects. Nevertheless, the region common to DsxF and DsxM appears to have been the main target in the long-term evolution of dsx, as revealed by the similarity in topology and branch lengths between the trees reconstructed from the complete DsxF and DsxM sequences (Fig. 2a) and the tree based on the region common to both sexes (Fig. 2b). Recall that the binding capacity of both Dsx proteins resides in the OD domains: OD1 lies in the common region, and OD2 spans part of the common region and part of the female-specific region. In this context, it is worth noting that the common region is the major source of variation, although in males the sex-specific region also contributes significantly to the overall variation of dsx (see Table 1). We argue that the predominant evolutionary role of the common region might be a consequence of both DsxF and DsxM binding to the same sequences and controlling the activity of the same final targets, the sexual cytodifferentiation genes. Sufficient time has almost certainly elapsed since the separation of the different phylogenetic lineages for enough divergence to accumulate between species to produce distinct co-adapted complexes between dsx and its target genes, which would result in greater variability in the common region.

Although to a lesser extent than the common region, the sex-specific segments of DsxF and DsxM are also subject to purifying selection (Table 1). This is as expected, because these segments confer on the two proteins different, opposing transcriptional roles that would be preserved across species.