Introduction

The origin and evolution of spliceosomal introns have been debated since their discovery, e.g., whether introns arose early or late and whether they are gained or lost. The dispute between the opposing introns-early (exon shuffling) and introns-late (intron insertion) theories is very much alive (e.g., Stoltzfus et al. 1994; Logsdon 1998; O’Neill et al. 1998; de Souza et al. 1998; Roy et al. 2001, 2002; Wolf et al. 2001; Fedorova and Fedorov 2003; Roy 2003). The introns-early theory suggests that introns are ancient and were present in the common ancestor of all life, where exon shuffling accelerated the formation of new genes (e.g., Hurst and McVean 1996; De Souza et al. 1998). The introns-late theory, however, proposes that ancient genes existed as uninterrupted exons and that introns were introduced during the course of gene evolution (e.g., Logsdon et al. 1995; Cho and Doolittle 1997; O’Neill et al. 1998). In recent years, the introns-early theory has become dominant (Fedorova and Fedorov 2003; Roy 2003). However, the debate is likely to continue as evidence accumulates from ongoing studies, although a synthetic theory, in which both views contribute to the interpretation of intron origin/evolution, is being considered (de Souza et al. 1998, 2003).

Serine protease subtilisins have been identified from bacteria, fungi, plants, invertebrates, and vertebrates and represent an ancient family of evolutionarily conserved proteins (Siezen and Leunissen 1997). Extracellular alkaline subtilisins of pathogenic fungi play an important role during fungal pathogenesis. The subtilisin PR1A of Metarhizium anisopliae has been demonstrated to be one of the key virulence factors: a transformant with multiple copies of the Pr1A gene exhibited an enhanced level of pathogenicity (St Leger et al. 1996), while a gene-deficient mutant lost its aggressiveness against an insect host (Wang et al. 2002). Similarly, transformants of the nematode-trapping fungus Arthrobotrys oligospora developed a higher pathogenic ability when they carried the additional subtilisin gene PII (Åhman et al. 2002). In total, 11 subtilisin genes, Pr1A to Pr1K, have been identified from M. anisopliae (Bagga et al. 2004). In this study, phylogenetic and exon–intron structure analyses of retrievable fungal subtilisins were conducted at both intra- and interspecific levels, and the findings support a mixed model of intron origins.

Materials and Methods

Gene Amplification

The primers for nested-PCR amplification of the subtilisin gene Pr1A of M. anisopliae were described previously (Wang et al. 2002). The restriction map of a structural gene sequence (AJ416695) was analyzed with the program Webcutter 2.0 (http://www.firstmarket.com/cutter/cut2.html), and enzymes with more than four cutting sites were selected for restriction analysis. Five enzymes, DdeI, HinfI, HaeIII, MspI, and RsaI, were used to digest the PCR products from 32 strains of different origins; the products from 17 strains whose restriction profiles differed from one another were subsequently cloned and sequenced (Table 1).
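The enzyme-selection step can be illustrated with a minimal Python sketch. It is not the Webcutter implementation; it only counts occurrences of the standard recognition sequences of the five enzymes (N standing for any base) on one strand and keeps the enzymes that cut more than four times. The function names and the toy sequence are hypothetical, and the toy sequence is not AJ416695.

```python
import re

# Recognition sequences of the five enzymes named in the text (N = any base).
RECOGNITION_SITES = {
    "DdeI": "CTNAG",
    "HinfI": "GANTC",
    "HaeIII": "GGCC",
    "MspI": "CCGG",
    "RsaI": "GTAC",
}


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of a recognition site, allowing overlapping matches."""
    pattern = site.replace("N", "[ACGT]")
    # A zero-width lookahead lets overlapping occurrences be counted.
    return len(re.findall(f"(?={pattern})", sequence.upper()))


def frequent_cutters(sequence: str, min_cuts: int = 5) -> dict:
    """Return the enzymes with at least `min_cuts` sites (i.e., more than four)."""
    counts = {name: count_sites(sequence, site)
              for name, site in RECOGNITION_SITES.items()}
    return {name: n for name, n in counts.items() if n >= min_cuts}


# Hypothetical toy sequence used only to exercise the functions.
toy = "GGCC" * 5 + "CTAAG" + "GTAC" * 2
print(frequent_cutters(toy))
```

All five recognition sequences are palindromic, so scanning a single strand is sufficient for this count.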

Table 1 Background information on the examined strains with the sequenced Pr1A gene of Metarhizium anisopliae

Cloning and Sequencing

The PCR products from the selected strains were separated on 1.5% agarose gels, and the target bands were excised and purified with a Qiagen QIAEX-II purification kit according to the manufacturer’s instructions. The purified products were quantified with a spectrophotometer (Beckman DU-600) and cloned with a Qiagen PCR cloning kit according to the provider’s protocol. Plasmid DNA was extracted with a QIAprep Spin Miniprep kit (Qiagen) and sequenced with T7 (reverse) and SP6 (forward) promoter primers. Automated sequencing reactions were performed on a Beckman Coulter CEQ™ 2000XL DNA analysis system using a CEQ cycle sequencing kit (Beckman Coulter) according to the manufacturer’s instructions.

Sequence Analysis

The Pr1A sequences generated in this study from 17 strains of different origins and six other Pr1A sequences of M. anisopliae retrieved from GenBank by BLAST search, i.e., AJ416688, AJ416695, AB073326, AB073327, AB073344, and AB073345, were analyzed (Table 1). Nucleotide sequences were aligned with the program DIALIGN 2 (Morgenstern 1999) and edited by visual inspection. Individual intron sequences were isolated after alignment with a cDNA sequence (M73795).

Phylogenetic and Exon–Intron Structure Analyses

Neighbor-joining phylogenetic analysis was performed using the program MEGA 2.1 (Kumar et al. 2001). Parsimony bootstrap analysis was conducted with 1000 replicates and a 50% majority rule tree was produced. For nucleotide sequences, a Kimura two-parameter model was selected, with pairwise deletion of gap/missing data. For amino acid sequences, a Poisson correction model was selected.
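For readers unfamiliar with the two distance models, the following Python sketch shows the standard textbook formulas: the Kimura two-parameter distance, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitions and transversions, and the Poisson correction, d = -ln(1 - p), where p is the proportion of differing amino acids. It is not the MEGA implementation; the function names and the treatment of gap and ambiguity characters are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance with pairwise deletion of gaps."""
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Pairwise deletion: a column is skipped only when this particular
        # pair has a gap or a missing/ambiguous character in it.
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def poisson_distance(prot1: str, prot2: str) -> float:
    """Poisson-corrected amino acid distance with pairwise deletion."""
    compared = differences = 0
    for a, b in zip(prot1.upper(), prot2.upper()):
        if a in "-?X" or b in "-?X":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return -math.log(1 - differences / compared)
```

Both functions expect two aligned sequences of equal length; very divergent pairs make the logarithm undefined, in which case Python raises a `ValueError`.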

Congruence between the topologies of the coding region- and intron-based trees was analyzed using a partition-homogeneity test with the program PAUP (v4.0 b10a; Swofford 2001). The aligned sequences were trimmed to the same size and combined with noted positions, e.g., gene 1, position 1 to position 968; and gene 2, position 969 to position 1044. The tests were carried out using a heuristic search with random addition of sequences and 1000 replicates. The estimated P value indicates whether the data sets are significantly heterogeneous (Farris et al. 1994).

Correlation analysis of gene exon–intron structure (intron position and phase) and protein structure provides useful information on intron origin and evolution (e.g., Stoltzfus et al. 1994; Logsdon et al. 1995; Rzhetsky et al. 1997; Sánchez et al. 2003). Intron phase (the position of the intron within a codon: phase 0, an intron lying between two codons; phase 1, an intron lying after the first nucleotide of a codon; phase 2, an intron lying after the second nucleotide) and position have been considered important characters for evaluating intron evolution (Long et al. 1995, 1998). In this study, the subtilisin Pr1A gene of M. anisopliae was examined together with retrievable fungal subtilisin genes with clearly annotated exon–intron structures from GenBank to interpret intron origin/evolution. In total, 60 subtilisin genes/proteins were analyzed, including the subtilisin genes Pr1A to Pr1K of M. anisopliae; among them, the Pr1J gene from different strains contained different numbers of introns (Bagga et al. 2004) (Fig. 3).
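The phase definition above amounts to taking the number of coding nucleotides upstream of an intron modulo 3. A minimal Python sketch follows, assuming that exon lengths refer to coding sequence only and start at the first nucleotide of the start codon; the function names and the example lengths are hypothetical.

```python
def intron_phases(coding_exon_lengths):
    """Phase of each intron separating consecutive coding exons.

    Phase 0: the intron lies between two codons.
    Phase 1: after the first nucleotide of a codon.
    Phase 2: after the second nucleotide of a codon.
    """
    phases = []
    upstream = 0
    # The last exon is followed by no intron, so it is excluded.
    for length in coding_exon_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases


def symmetric_exons(phases):
    """For each internal exon, True when its two flanking introns share a phase."""
    return [left == right for left, right in zip(phases, phases[1:])]


# Hypothetical exon lengths (in nucleotides), for illustration only.
print(intron_phases([90, 120, 60, 33]))
print(symmetric_exons(intron_phases([90, 121, 62, 30])))
```

An exon flanked by two introns of the same phase has a length divisible by 3, which is why symmetric exons can be shuffled without disturbing the reading frame.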

Results

Gene Structure of Subtilisin Pr1A

The Pr1A gene (1175 of 1371 bp) from 23 M. anisopliae strains of different origins was analyzed in this study (Fig. 1a). Three introns were present at conserved positions in the gene (Fig. 1b). Introns 1 (76 bp) and 2 (59 bp) had the same sizes in all examined strains, while intron 3 had two different sizes, 68 bp (AJ416695, AY389121, AY389133, and AY389134 from four American strains) and 72 bp (the others; Table 1); the shorter form lacked four nucleotides, TACC, relative to the longer form (Fig. 1b). All introns had the typical 5′ GT and 3′ AG boundaries of spliceosomal introns. Decoding analysis indicated that all three were phase 0 introns, i.e., the exons of Pr1A were symmetric (Fig. 1b).
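The boundary check mentioned above can be sketched in a few lines of Python. The function name and the toy sequences are hypothetical; they are not the Pr1A intron sequences.

```python
def has_canonical_boundaries(intron: str) -> bool:
    """True when an intron starts with GT and ends with AG (GU...AG in RNA)."""
    seq = intron.upper().replace("U", "T")
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical toy introns, for illustration only.
print(has_canonical_boundaries("GTAAGTTTTCAG"))
print(has_canonical_boundaries("GCAAGTTTTCAG"))
```

The same arithmetic applies to the size difference reported for intron 3: removing the four nucleotides TACC from the 72-bp form gives the 68-bp form.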

Figure 1

Schematic presentation of the Metarhizium anisopliae Pr1A gene (a) and the partial sequence alignment of representative sequences (b) showing the insertion and variation of introns. The region within the arrows (1175 of 1371 bp) was sequenced and analyzed from different original strains in this study. M73795, Pr1A cDNA sequence. The sizes of intron 1 (76 bp) and intron 2 (59 bp) are the same in all examined strains, while there are two types of intron 3 (72 bp, e.g., AY389119, and 68 bp, e.g., AY389133). All introns have typical consensus 5′ and 3′ boundaries of GT and AG of spliceosomal introns, respectively. All three introns lie between two codons, indicating that they belong to phase 0 introns.

Phylogenetic Relationships

Phylogenetic analysis was performed on the Pr1A structural gene and its three introns (Fig. 2). There were five clades in the phylogenetic tree of the structural gene, which were clearly separated and supported by high bootstrap values (Fig. 2a). When compared with the insect host orders and geographic localities of the host strains, the sequences from geographically associated localities (the same country or region) exhibited close evolutionary relationships (Fig. 2a). In particular, the sequences from four American strains with a shorter intron 3 were monophyletic in all the trees, while the gene (AY389126) from an African strain was isolated from the others (Fig. 2).

Figure 2

Neighbor-joining phylogenetic trees of the Pr1A structural gene (a) and introns (b–d). Bootstrap values >50% are shown as numbers above branches. The clades with thick branches contain sequences originating from the same localities of a country or a geographic region. Hosts: OR, Orthoptera; SO, isolated from soil; CO, Coleoptera; AC, Acari, tick; NK, not known; BL, Blattodea, cockroach; HO, Homoptera; LP, Lepidoptera. Asterisks (*) denote sequences from this study.

The differences between the topologies of the phylogenetic trees were observed and verified by partition-homogeneity tests (Fig. 2, Table 2). The topologies of the pairing trees of cDNA and intron 1 and of cDNA and intron 2 were phylogenetically congruent (P > 0.05), while the topology of the intron 3 tree was significantly heterogeneous to the cDNA- (P = 0.02) and intron 1-based trees but, interestingly, was congruent with the intron 2 tree (P = 0.752) (Table 2).

Table 2 Partition-homogeneity test between the cDNA and introns of Metarhizium anisopliae Pr1A

Relationship Between Gene and Protein Structure

The 60 examined fungal subtilisin genes contained from zero to three introns (Fig. 3). The phase distribution of the total 133 introns was highly biased toward phase 0, with the ratio of phase 0:1:2 being 122:10:1 (Fig. 3). In addition, a high proportion (>80%) of symmetric exons (flanked by introns in the same phase) was observed in fungal subtilisin genes (Fig. 3). Phylogenetic and exon–intron structure analyses indicated that the more similar the subtilisin amino acid sequences, the more similar their exon–intron structures usually were (Fig. 3). This close relationship was independent of host fungal species, e.g., the subtilisins from three different species of Aspergillus shared an identical gene structure (Fig. 3). One significant exception was that the Pr1J gene from three different strains of M. anisopliae had different numbers of introns, with AJ269535 containing one intron (phase 0) and AJ416687 and AJ416694 having an additional phase 1 intron (Fig. 3). Nucleotide sequence alignment showed that the putative intron insertion site was exactly the same in AJ269535 (data not shown).
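The size of the phase bias can be made explicit by converting the reported 122:10:1 ratio into proportions. The short Python sketch below is plain arithmetic on the counts given above and is not a statistical test; the variable names are hypothetical.

```python
# Intron counts per phase as reported in the text (phase 0 : 1 : 2 = 122 : 10 : 1).
phase_counts = {0: 122, 1: 10, 2: 1}

total = sum(phase_counts.values())
fractions = {phase: count / total for phase, count in phase_counts.items()}

for phase, fraction in fractions.items():
    print(f"phase {phase}: {phase_counts[phase]} of {total} introns ({fraction:.1%})")
```

The three counts sum to the 133 introns reported, and phase 0 introns account for roughly 92% of them.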

Figure 3

Phylogenetic relationship between amino acid sequence and exon–intron structure of fungal subtilisins. Neighbor-joining tree with pairwise deletion of gap/missing data and 1000 bootstrap replicates. The frame diagrams on the right show the intron distribution (position and phase) in proteins that were trimmed to the same size after alignment. Arrowheads indicate the intron insertion positions, and the numbers in the frames represent the intron phase (0, 1, or 2) (see also Fig. 4). Fungal species: Ma, Metarhizium anisopliae; Ac, Acremonium chrysogenum; Th, Trichoderma harzianum; Hv, Hypocrea virens; An, Aspergillus nidulans; Af, Aspergillus flavus; Afu, Aspergillus fumigatus; Bg, Blumeria graminis; Ani, Aspergillus niger; Mg, Magnaporthe grisea; Pa, Pandora anserine; Tr, Trichophyton rubrum; Ta, Tritirachium album; Bb, Beauveria bassiana; Vc, Verticillium chlamydosporium; Un, unnamed gene product.

Amino acid sequence alignment of the selected proteins from each subclade indicated that there were 10 intron insertion positions among the examined fungal subtilisins. Positions 1, 2, 6, 8, and 10 were highly conserved, while the others fell into different subgroups (Figs. 3 and 4). Positions 1 and 2 had high percentages of intron occupancy, the frequencies being 90 and 45%, respectively. There was no position 1 in the intronless genes Pr1E and Pr1F of M. anisopliae, and the subtilisin gene Pep (AY039006) of Blumeria graminis contained no intron at position 1 (Figs. 3 and 4).
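Intron occupancy at a given position is simply the percentage of genes that carry an intron there. A minimal Python sketch follows; the function name, gene names, and intron sets are hypothetical toy data, not the 60 genes analyzed in the study.

```python
def intron_occupancy(introns_by_gene, n_positions=10):
    """Percentage of genes carrying an intron at each position (1-based)."""
    n_genes = len(introns_by_gene)
    if n_genes == 0:
        raise ValueError("no genes supplied")
    occupancy = {}
    for position in range(1, n_positions + 1):
        carriers = sum(1 for positions in introns_by_gene.values()
                       if position in positions)
        occupancy[position] = 100.0 * carriers / n_genes
    return occupancy


# Hypothetical toy data: gene name -> set of occupied intron positions.
toy_genes = {
    "geneA": {1, 2},
    "geneB": {1},
    "geneC": set(),   # an intronless gene
    "geneD": {1, 9},
}
print(intron_occupancy(toy_genes))
```

Intronless genes are kept in the denominator, so they lower the occupancy at every position.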

Figure 4

Truncated amino acid sequence alignment of selected subtilisin proteins showing intron insertion positions. The amino acid positions correspond to Tritirachium album proteinase K. Dots indicate omission of amino acids and hyphens indicate gaps. Two shaded amino acids indicate the insertion of a phase 0 intron, while a single shaded amino acid indicates the insertion of a phase 1 (positions 5, 6, 8, and 10) or phase 2 (position 4) intron (see also Fig. 3). The boldface D, H, and S are the conserved catalytic triad of subtilisins.

Partition-homogeneity tests were conducted using the intron sequences isolated from positions 1 and 2 and the corresponding coding region sequences and the P values were estimated as 0.075 and 0.557, respectively, indicating that the evolutionary histories between the introns at these two positions and the coding region were phylogenetically congruent.

Discussion

Phylogenetic and exon–intron structure analyses of intra- and interspecific fungal subtilisins in this study supported a mixed model of intron evolution: the evidence supported both the introns-early and the introns-late speculations. Intraspecifically, the phylogenetic congruence or incongruence between different introns and the coding region of M. anisopliae Pr1A indicates that no single scenario can explain intron origin/evolution, which is similar to the evolution of group I introns (e.g., Wang et al. 2003). The congruence between the trees of introns 1 and 2, intron 1 and cDNA, and intron 2 and cDNA would suggest that these two introns were primordially present in the Pr1A gene and share a common evolutionary history with the coding region, supporting the introns-early speculation. The incongruence between intron 3 and cDNA, as well as the two different sizes of intron 3 between strains, suggests that the presence of intron 3 in Pr1A was not primordial, supporting the introns-late speculation. In addition, Pr1A introns 1 and 2 were located at highly conserved sites across fungal subtilisins, while the position (9) for intron 3 was subgroup specific (Fig. 4). Phylogenetic incongruence between intron 1 and intron 3 was expected, while the high topological homogeneity (P = 0.725) between the intron 2 and intron 3 trees was probably the result of coevolution of adjacent introns, as speculated before (Schwaiger et al. 1993).

In this study, direct support for the introns-late theory came from the observation that the M. anisopliae Pr1J gene could intraspecifically contain one intron (AJ269535) or two introns (AJ416687 and AJ416694). The high percentage of intron absence at position 6 (90%; the location of Pr1J intron 2) (Figs. 3 and 4) would suggest that intron insertions in AJ416687 and AJ416694 are more likely than the loss of an intron from AJ269535. This is consistent with the conclusion that intron gain prevails over intron loss for spliceosomal introns (Qiu et al. 2004; Babenko et al. 2004). However, whether introns are added to or deleted from putative insertion sites during gene evolution remains debatable, with different studies reporting that only random insertion and no loss had occurred (Cho and Doolittle 1997), that only loss and no gain had occurred (Roy et al. 2003), or that both had occurred (Rogozin et al. 2003; Zhaxybayeva and Gogarten 2003). Intron presence–absence polymorphism in Drosophila jingwei genes was reported to result from intron loss, driven by positive Darwinian selection acting on the flanking sequences (Llopart et al. 2002). Whatever the scenario, it can be concluded that a recent event of intron insertion/deletion occurred in the Pr1J gene. A similar case in support of the introns-late view was the discovery of intron presence in the triose-phosphate isomerase (TPI) gene of Culex and Aedes mosquitoes but intron absence from those of Anopheles mosquitoes, the fly Calliphora vicina, and the moth Spodoptera littoralis (Kwiatowski et al. 1995).

Interspecifically, the analysis of fungal subtilisin genes also supported a mixed model of intron origin/evolution. There were at least 10 intron insertion sites across fungal subtilisins, although at most three introns were detected in a single gene. Half of these positions were highly conserved and, most significantly, 90% of fungal subtilisins contained a phase 0 intron at the highly conserved position 1, followed by a phase 0 intron at the highly conserved position 2 (45%) (Figs. 3 and 4), i.e., there was a significant excess of phase 0 introns and an excess of symmetric exons, which is strong evidence for exon shuffling (Long et al. 1995; De Souza et al. 1998; Roy et al. 1999; Long and Deutsch 1999; Fedorov et al. 2001, 2003). In addition, as discussed above for Pr1A, the phylogenetic congruence between the evolutionary histories of introns and coding regions shown by the partition-homogeneity test again supported the primordial presence of introns at these two positions. In contrast, positions 4, 5, and 10 (Figs. 3 and 4) were each detected in only a single fungal species and appear to be of recent origin, which is similar to the observations in TPI supporting the introns-late theory (Logsdon et al. 1995). For the introns-late theory, the origin of introns or the arising of new intron positions is still perplexing. Extensive genomic examinations demonstrated that intron transposition within genomes could not have occurred recently (Fedorov et al. 2003). Intron sliding over short distances (1–15 bases) has been demonstrated (Rogozin et al. 2000), but its contribution to the diversity of intron positions appears negligible (Stoltzfus et al. 1997). Sequence analysis indicated that no significant similarity existed between subtilisin introns at different positions (data not shown), suggesting that intron drifting did not contribute to the divergence of fungal subtilisin exon–intron structure. Regardless of functional magnitude, the multiple subtilisins identified from a single fungal species demonstrate events of gene duplication (Bagga et al. 2004). A recent study suggested that gene duplication could boost intron insertion (Babenko et al. 2004), an alternative scenario that could be employed in this study to explain the diversity of intron positions.

The parameters of protein tertiary structure (e.g., centrality, diameter, or radius [Logsdon et al. 1995]) have usually been used for correlation analysis of protein and gene structure to examine intron evolution. In this study, phylogenetic analysis based on amino acid sequences provided a fairly clear relationship between protein primary structure and gene exon–intron structure (Fig. 3). This could be an alternative method for similar studies in the future, since the three-dimensional structures of a large number of proteins remain unknown. An exception that could call the gene–protein structure relationship into question was that the Pr1J gene from different strains of M. anisopliae had different exon–intron structures. There were some other exceptions in which genes with the same exon–intron structure were not clustered together, for example, the proteins with a single phase 0 intron at position 1 (Fig. 3). However, when protein size data were taken into account, the difference between those proteins became clear; for example, proteinase K (X14689) of Tritirachium album has 384 amino acids, while the proteins in the subgroup including Pr1H of M. anisopliae have more than 530 amino acids (Fig. 3). The close relationship of gene–protein structure independent of fungal species suggests that subtilisin evolution and diversification have not been closely related to fungal speciation. In support of this, the subtilisins Pr1A to Pr1K could be identified from a single strain of M. anisopliae as a paralogous gene family (Bagga et al. 2004). More interestingly, the host strains for Pr1J AJ269535 (one intron) and AJ416687 (two introns) belong to the subspecies M. anisopliae sf. anisopliae, while AJ416694 (two introns) belongs to M. anisopliae sf. acridum (Bagga et al. 2004), indicating that the intron insertions in AJ416687 and AJ416694 occurred before the divergence of M. anisopliae into two subspecies but possibly after the separation of M. anisopliae sf. anisopliae strains. For the subtilisins identified from the genus Aspergillus, PepC (A. niger) has no introns at positions 6 and 7, in contrast to PrtA (A. nidulans), Alk1 (A. flavus), and Alp (A. fumigatus) (Fig. 3), again suggesting that intron insertion could have occurred before fungal speciation.

In general, phylogenetic analyses of intra- and interspecific fungal subtilisins in this study provided evidence in support of the speculations of both the introns-early and the introns-late theories. However, no conclusion can be drawn from this study to resolve the issue of exon–intron origin/evolution. A review of the literature indicates that the conclusions of some studies are likely to be case specific. For example, studies of TPI genes (Kwiatowski et al. 1995; Logsdon et al. 1995), carbonic anhydrases (Hewett-Emmett and Tashian 1996), receptor tyrosine kinase genes (Gamulin et al. 1997), and aldehyde dehydrogenase genes (Rzhetsky et al. 1997), and even a large-scale examination of 10 protein families (Cho and Doolittle 1997), supported the introns-late theory, while a study of xylanase genes provided evidence for the view of ancient introns (Sato et al. 1999). Interestingly, a reexamination of TPI exon–intron structure recommended a reconciliation of the introns-early and introns-late views (Tyshenko and Walker 1997). More comprehensively, a statistical test of a data set of 44 ancient proteins with 988 introns indicated that about one-third of all introns were ancient phase 0 introns, while the others corresponded to intron addition/deletion and appeared equally in all three intron phases (De Souza et al. 1998). The examination of intron positions in 684 orthologous gene sets from 8 complete genomes suggests that many introns are ancient, while lineage-specific intron loss/gain events have apparently been taking place (Rogozin et al. 2003). In this respect, a synthetic theory should be acceptable, in which introns have been not only remnants of the exon shuffling that accelerated the formation of new genes but also mobile elements in gene evolution (De Souza 2003; Fedorova and Fedorov 2003).