Introduction

Sterol carrier protein-2 (SCP-2) is a protein domain that enhances the transfer of lipids between membranes in vitro. SCP-2 was originally described as a sterol-binding protein, but more recently it has been shown to also bind phospholipids, fatty acids, and fatty acyl-coenzyme A. Mammalian and insect SCP-2 have an α/β-fold consisting of a five-stranded β-sheet and five α-helices. A C-terminal segment, together with part of the β-sheet and four α-helices, forms a hydrophobic tunnel suitable for binding of lipids (Choinowski et al. 2000; Haapalainen et al. 2001; Dyer et al. 2003). The biological function of SCP-2 remains obscure, although it has been extensively studied in mammalian cells (reviewed by Gallegos et al. 2001; Seedorf et al. 2000; Stolowich et al. 2002). One hypothesis is that SCP-2 is involved in cytosolic non-vesicular transfer of cholesterol. There are also suggestions that SCP-2 plays a role in peroxisomal oxidation of fatty acids, where it might facilitate the presentation of certain substrates and/or stabilize the enzymes that catalyze the reaction cycles.

Recently, we published the first report on SCP-2 from plants (Edqvist et al. 2004), where we showed that in Arabidopsis thaliana, and probably also in most other plants, SCP-2 is expressed solely from genes encoding unfused SCP-2 domains. This is surprising considering that in animals and many other eukaryotes, SCP-2 domains are often present at the termini of multidomain proteins. For instance, in the human genome there are four genes, HSD17B4, SCPX, HSDL2, and STOML1, that encode multidomain proteins with SCP-2 domains at their C-termini. As in plants, however, there is also a human gene, C20orf79, that encodes a protein consisting of an SCP-2 domain which is not fused to any other recognizable protein domain. HSD17B4 encodes a D-bifunctional protein (DBP) with domains for D-3-hydroxyacyl-CoA dehydrogenase, enoyl-CoA hydratase, and SCP-2 (Adamski et al. 1995; Leenders et al. 1998). The SCP-2 domain contains a peroxisomal targeting signal (PTS1) that directs DBP to the peroxisomes. The enzymatic domains take part in peroxisomal oxidation of fatty acids through D-3-hydroxyacyl-CoA intermediates. The enoyl-CoA hydratase domain catalyzes the hydration of trans-2-enoyl-CoA to D-3-hydroxyacyl-CoA, while the D-3-hydroxyacyl-CoA dehydrogenase converts D-3-hydroxyacyl-CoA esters to 3-ketoacyl-CoA. The SCP-2 domain is always expressed as an integral part of DBP, which has a molecular mass of 80 kDa. However, it has been observed that the 80-kDa protein to some extent undergoes proteolytic cleavage at a site between the domain for D-3-hydroxyacyl-CoA dehydrogenase and that for enoyl-CoA hydratase (Leenders et al. 1996).

The gene SCPX (also known as SCP2) encodes SCP-X, which consists of 3-ketoacyl-CoA thiolase connected to a C-terminal SCP-2 domain (Ohba et al. 1994). 3-Ketoacyl-CoA thiolase catalyzes the reversible cleavage of 3-ketoacyl-CoA into acyl-CoA and acetyl-CoA, which is the last step in each round of the β-oxidation cycle. The 3-ketoacyl-CoA thiolase domain of the human SCP-X shows about 40% similarity to the peroxisomal 3-oxoacyl-Coenzyme A thiolase encoded by ACAA1 on chromosome 3 in the human genome (Bout et al. 1991). Mice lacking a functional Scp2 show a deficiency in the thiolytic cleavage of 3-ketopristanoyl-CoA, indicating that the SCP-X thiolase is important for the β-oxidation of pristanic acid (Seedorf et al. 1998). Due to the existence of dual promoters, as well as proteolytic cleavage sites, the SCPX encoded SCP-2 is also expressed as a separate protein not fused to any other domain (Ohba et al. 1995).

HSDL2 encodes the protein hydroxysteroid dehydrogenase-like 2 (HSDL2). HSDL2 has a molecular weight of 45 kDa and carries domains for short-chain dehydrogenase and SCP-2 (Dai et al. 2003). The SCP-2 domain contains a PTS1 that is likely to direct the protein to the peroxisomes. The amino acid sequence of the human HSDL2 dehydrogenase shares approximately 40% similarity with the D-3-hydroxyacyl-CoA dehydrogenase of human DBP, yet the specificity and function of the short-chain dehydrogenase of HSDL2 are unknown. The gene STOML1 encodes stomatin (EBP72)-like 1 (STOML1), a protein of 398 residues with a molecular weight of 43 kDa. STOML1 consists of a stomatin-like domain and SCP-2 (Seidel and Prohaska 1998). Stomatin is a 31-kDa integral membrane protein, named after the rare human hemolytic anemia hereditary stomatocytosis. It is found in lipid/protein-rich microdomains, and it is believed to regulate the function of ion channels and transporters (Stewart 2004).

Similar multidomain proteins carrying SCP-2 domains have also been identified in other eukaryotic organisms. UNC-24, from the nematode Caenorhabditis elegans, is a stomatin-like protein carrying an SCP-2 domain at the C-terminus (Barnes et al. 1996). DBP with domains for D-3-hydroxyacyl-CoA dehydrogenase, enoyl-CoA hydratase, and SCP-2 has been reported from arbuscular mycorrhizal fungi (Requena et al. 1999), whereas the slime mold Dictyostelium spp. (Matsuoka et al. 2003) encodes a DBP lacking the enoyl-CoA hydratase. The yellow fever mosquito Aedes aegypti expresses the SCP-X fusion as well as proteins carrying unfused SCP-2 domains (Krebs and Lan 2003; Lan and Wessely 2004). In plants (Edqvist et al. 2004), the yeast Candida (Szabo et al. 1989; Hwang et al. 1991), archaea (Kawashima et al. 2000), and bacteria, SCP-2 seems to be encoded solely by stand-alone, unfused genes.

During evolution, genes can produce more complex proteins by gene fusion or less complex ones by gene fission. It has been estimated that fusion events are four times more common than fission events (Kummerfeld and Teichmann 2005). We have performed a systematic investigation of the distribution and evolution of SCP-2 domains in eukaryotic organisms to trace gene fusion and fission during the evolution of SCP-2. In particular, we are interested in the origin and function of plant SCP-2. In this study we aimed to establish whether the plant SCP-2 genes originated from a gene fission event or whether all fusion events involving SCP-2 occurred after the split of animals and plants, estimated at 1609 million years ago (Hedges et al. 2004). Our data suggest that plant SCP-2 was formed by fission of a gene encoding D-3-hydroxyacyl-CoA dehydrogenase and SCP-2. We also show that similar fission events involving SCP-2 have occurred on several other occasions, for instance, during the evolution of fungi and vertebrates.

Materials and Methods

Retrieval of SCP-2 Sequences

SCP-2 amino acid, cDNA, and gene sequences were extracted from the National Center for Biotechnology Information (NCBI) database using BLASTP and TBLASTN (Altschul et al. 1997). Sequences were also retrieved from other public databases: ToxoDB (toxodb.org/ToxoDB.shtml), the TIGR database (http://www.tigr.org), Wormbase (http://www.wormbase.org), Genoscope (http://www.genoscope.cns.fr), Phytophthora Functional Genomics Database (http://www.pfgd.org), Dictyostelium discoideum genome project (http://genome.imb-jena.de/dictyostelium), Paramecium Genome Project database (paramecium.cgm.cnrs-gif.fr), and ENSEMBL (http://www.ensembl.org). BLAST searches were performed with the substitution matrix BLOSUM62 and gap costs of 11 (existence) and 1 (extension). The criterion for identification of an SCP-2 domain with BLAST was an E-value less than 1e-10 in database searches using any of the characterized human SCP-2 domains as bait. Higher E-values were accepted for proteins encoding unfused SCP-2 domains from A. aegypti and Anopheles gambiae, which in such searches showed E-values from 1e-6 to 1e-9. All sequences included here were also identified as containing an SCP-2 domain according to the conserved domain database for protein classification, CDD (Marchler-Bauer et al. 2005).
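The acceptance criterion can be illustrated with a minimal sketch, assuming that search results are available in the standard 12-column BLAST tabular format, where the E-value is the eleventh column. The function name, the species lookup table, and the choice of 1e-6 as the relaxed upper bound for the two mosquito species are assumptions made for illustration; they are not part of the original analysis.

```python
import csv
import io

DEFAULT_MAX_EVALUE = 1e-10  # criterion stated in the text
RELAXED_MAX_EVALUE = 1e-6   # assumption: upper end of the accepted mosquito range
RELAXED_SPECIES = {"Aedes aegypti", "Anopheles gambiae"}


def accepted_hits(tabular_text, species_of):
    """Return subject IDs that pass the E-value criterion.

    tabular_text: BLAST tabular output (12 tab-separated columns per hit,
                  E-value in column 11).
    species_of:   hypothetical mapping from subject ID to species name.
    """
    accepted = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank lines, comments, and malformed rows
        subject_id, evalue = row[1], float(row[10])
        if species_of.get(subject_id) in RELAXED_SPECIES:
            passes = evalue <= RELAXED_MAX_EVALUE
        else:
            passes = evalue < DEFAULT_MAX_EVALUE
        if passes and subject_id not in accepted:
            accepted.append(subject_id)
    return accepted
```

In this sketch a hit with an E-value of 1e-8 is kept only when its subject sequence is mapped to one of the two mosquito species.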

Sequence Analyses

Multiple sequence alignments were created using Clustal X (Thompson et al. 1997). Protein sequence alignments were performed with the following parameters: gap opening penalty = 10.0, gap extension penalty = 0.20, and Gonnet protein weight matrix. Multiple sequence alignments were inspected by eye. To reconstruct phylogenetic trees by maximum likelihood the multiple sequence alignments were analyzed with PHYML (Guindon and Gascuel 2003) by submitting the alignments to the PHYML server (http://www.atgc.lirmm.fr/phyml) (Guindon et al. 2005). The JTT substitution matrix was used for calculation of the amino acid substitutions (Jones et al. 1992). A discrete-gamma distribution with four categories was used to account for variable substitution rates among sites. The gamma distribution parameter was estimated by PHYML. A BIONJ distance-based tree was used as the starting tree to be refined by the maximum likelihood algorithm. The number of generated bootstrapped pseudo data sets was 100.
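The bootstrap step can be sketched as follows. This is only an illustration of how pseudo data sets are commonly generated, by sampling alignment columns with replacement; it is not the PHYML implementation, and the function name and the dictionary representation of the alignment are assumptions.

```python
import random


def bootstrap_alignments(alignment, n_replicates=100, seed=0):
    """Generate bootstrap pseudo data sets from a multiple sequence alignment.

    alignment: dict mapping sequence name to aligned sequence; all sequences
               must have the same length.
    Returns a list of n_replicates alignments, each built by drawing columns
    from the original alignment with replacement.
    """
    names = list(alignment)
    if not names:
        raise ValueError("alignment is empty")
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("aligned sequences must have equal length")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    replicates = []
    for _ in range(n_replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            {name: "".join(alignment[name][c] for c in columns) for name in names}
        )
    return replicates
```

Each pseudo data set would then be analyzed in the same way as the original alignment, and the support for a branch is the percentage of the 100 resulting trees that contain it.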

Results

SCP-2 Domains in Humans and Other Vertebrates

The amino acid sequences of all five human SCP-2 domains were aligned and used to reconstruct a phylogenetic tree (Fig. 1A). We use DBP to describe the protein expressed from HSD17B4. The sequences share from 20 to 50% identity and from 40 to 70% similarity (Table 1). The SCP-2 domains from DBP and C20orf79 share 50% identity and 70% similarity, indicating that these are the most closely related SCP-2 domains in the human genome. The similarity scores are in general lower for comparisons with the SCP-2 domain from STOML1. The reconstructed phylogenetic tree shows that the SCP-2 domains from DBP and C20orf79 share a common ancestor, as do the domains from STOML1 and HSDL2 (Fig. 1A). The alignment reveals that only 13 residues are conserved in all five human SCP-2 domains: V28, G44, D53, G59, D70, P90, A93, G97, K100, G103, K110, L111, and A121 (Fig. 1B). The numbering is according to the mature SCP-2 from SCP-X. When mapped onto the structure of SCP-2, residues G44, G59, D70, G97, and G103 are located in loops connecting β-strands and/or α-helices (Choinowski et al. 2000). V28 resides in helix B, D53 in strand II, P90 and A93 in helix D, K100 in strand V, K110 and L111 in helix E, and A121 in the C-terminal region, as one of the residues in the PTS1 targeting signal.

Figure 1

Comparison of human SCP-2 domains. A Phylogenetic analysis of human SCP-2 amino acid sequences by maximum likelihood. Numbers indicate the percentage of 100 bootstrap resamplings that support the inferred topology. B Multiple sequence alignment of human SCP-2 domains. Black boxes indicate identical amino acids in at least four of the sequences, while shaded boxes indicate the presence of amino acids with similar physicochemical properties in at least four of the sequences.

Table 1 The identity and similarity of the amino acid sequences of human SCP-2 domains: Percentage of amino acid identity/similarity
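Pairwise values of the kind listed in Table 1 can be computed from two rows of a multiple sequence alignment. The following is a minimal sketch under stated assumptions: the grouping of amino acids by physicochemical properties and the use of gap-free columns as the denominator are illustrative choices, since the exact scheme behind Table 1 is not specified here.

```python
# Assumed grouping of residues with similar physicochemical properties.
SIMILAR_GROUPS = ["AVLIMC", "FWY", "KRH", "DE", "STNQ", "G", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_similarity(seq_a, seq_b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored. Identical
    residues also count as similar.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    aligned = identical = similar = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        aligned += 1
        if a == b:
            identical += 1
            similar += 1
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            similar += 1
    if aligned == 0:
        return 0.0, 0.0
    return 100.0 * identical / aligned, 100.0 * similar / aligned
```

Because identical residues are counted as similar, the similarity value is never lower than the identity value, in line with the identity/similarity pairs reported in the table.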

We have identified SCP-2 domains in genes orthologous to SCPX, HSD17B4, HSDL2, and STOML1 in all vertebrate genomes we have surveyed, such as Mus musculus, Gallus gallus (Fig. 2), Danio rerio, and Xenopus laevis. C20orf79 genes were only identified in mammals. None of the proteins encoded by C20orf79 orthologs carries a C-terminus classified as a PTS1 peroxisomal targeting sequence when analyzed with the PTS1 predictor (http://www.mendel.imp.univie.ac.at/mendeljsp/sat/pts1/PTS1predictor.jsp [Neuberger et al. 2003]). According to the annotation of the rat genome, locus 311491 on chromosome 3 encodes a fusion between a papain domain and an SCP-2 with high similarity to C20orf79. Papain domains are known to be present in cysteine proteases. We have not detected similar fusions in other genomes. Moreover, in Danio rerio and Canis familiaris, SCP-2 domains are present at the N-terminus of proteins related to the small leucine-rich repeat protein podocan (Ross et al. 2003).
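As a rough illustration of what a PTS1 check involves, the sketch below tests only whether the last three residues match the canonical consensus (S/A/C)-(K/R/H)-(L/M). This is a deliberate simplification and is not the algorithm of the PTS1 predictor used in this study, which evaluates a longer C-terminal segment; the function name is hypothetical.

```python
# Canonical PTS1 tripeptide consensus: (S/A/C)-(K/R/H)-(L/M).
PTS1_CONSENSUS = ("SAC", "KRH", "LM")


def has_canonical_pts1(protein_sequence):
    """Return True if the C-terminal tripeptide matches the PTS1 consensus."""
    # Normalize case and drop a trailing stop-codon marker if present.
    tail = protein_sequence.strip().upper().rstrip("*")[-3:]
    if len(tail) != 3:
        return False
    return all(residue in allowed for residue, allowed in zip(tail, PTS1_CONSENSUS))
```

The tripeptides SKL, SRL, and AKL reported for SCP-2 domains in this study all satisfy this consensus, whereas a sequence that fails the check may still carry a weaker, non-canonical signal.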

Figure 2

Distribution of SCP-2 domains in different organisms. Symbols denote the following proteins and domains: SCP-2 domain; SCP-X; STOML1/UNC-24; DBP; ancestral DBP; HSDL2; SCP-X thiolase domain; enoyl-CoA hydratase domain; D-3-hydroxyacyl dehydrogenase domain; HSDL2 short-chain dehydrogenase domain; and stomatin-like domain. X indicates the unfused SCP-2 expressed from the human SCPX gene or orthologous genes in other animals.

SCP-2 Domains in Other Metazoans

In the nematode C. elegans, SCP-2 domains are encoded by four loci, dhs-6, nlt-1, unc-24, and dhs-28 (Fig. 2). The dhs-28 locus on chromosome X codes for a DBP consisting of domains for D-3-hydroxyacyl-CoA dehydrogenase and SCP-2. The domain for enoyl-CoA hydratase activity present in vertebrate DBP is missing from the nematode DBP. Hereafter, we use the term ancestral DBP for DBPs lacking the enoyl-CoA hydratase domain. The nlt-1 locus on chromosome II in the worm genome encodes a protein consisting of an unfused SCP-2 domain of 118 amino acids with a molecular weight of 13.1 kDa and a theoretical pI of 9.5. The locus unc-24 on chromosome IV encodes the nematode orthologue of the human STOML1. The dhs-6 locus on chromosome II in the nematode genome corresponds to the human HSDL2. In the C. elegans genome there is no gene corresponding to the human SCPX. All C. elegans SCP-2 domains, with the exception of that in UNC-24, carry C-terminal PTS1 targeting signals. UNC-24 is localized to the plasma membrane (Sedensky et al. 2004).
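The naming of SCP-2-containing proteins used throughout this study follows from their domain composition, which can be summarized in a small sketch. The domain labels (HAD for D-3-hydroxyacyl-CoA dehydrogenase, ECH for enoyl-CoA hydratase, SDR for the HSDL2-type short-chain dehydrogenase, and so on) and the function name are hypothetical and are introduced only for illustration.

```python
# Fusion partners (everything except SCP-2 itself) mapped to the protein name.
ARCHITECTURES = {
    frozenset({"HAD", "ECH"}): "DBP",
    frozenset({"HAD"}): "ancestral DBP",
    frozenset({"THIOLASE"}): "SCP-X",
    frozenset({"SDR"}): "HSDL2",
    frozenset({"STOMATIN"}): "STOML1/UNC-24",
}


def classify_architecture(domains):
    """Classify a protein from its list of domain labels (N- to C-terminus)."""
    domains = list(domains)
    if "SCP2" not in domains:
        return "no SCP-2"
    partners = frozenset(domains) - {"SCP2"}
    if not partners:
        # Only SCP-2 domains present: a single domain or a tandem arrangement.
        return "tandem SCP-2" if domains.count("SCP2") > 1 else "unfused SCP-2"
    # Repeated domains do not change the class in this sketch.
    return ARCHITECTURES.get(partners, "other fusion")
```

With these labels, the nematode dhs-28 product (HAD followed by SCP2) is classified as an ancestral DBP, while the rat papain fusion falls into the "other fusion" class.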

In the genome of Drosophila melanogaster there are three genes, CG5590, CG17320, and CG11151, which encode SCP-2 domains. On the right arm of chromosome 3, CG5590 encodes a short-chain dehydrogenase/SCP-2 fusion of 412 amino acids (aa), which corresponds to the human HSDL2 or dhs-6 in C. elegans. CG17320 on the left arm of chromosome 2 encodes SCP-X and is thus the orthologue of the human SCPX. CG11151, located on chromosome X, encodes an unfused SCP-2 of 115 aa (12.5 kDa) with a theoretical pI of 9.1. All D. melanogaster SCP-2 domains carry C-terminal PTS1 targeting signals. In the D. melanogaster genome there is no gene corresponding to STOML1/unc-24.

In the malaria mosquito Anopheles gambiae there are at least six genes encoding SCP-2 domains (Fig. 2). ENSANGG00000011810 encodes ENSANGP00000014299, a DBP with an SCP-2 domain at its C-terminus. The mosquito DBP is similar to the vertebrate DBP as it also contains domains for D-3-hydroxyacyl-CoA dehydrogenase and enoyl-CoA hydratase. The malaria mosquito genome also contains genes that are orthologous to the human SCPX (gene ENSANGG00000007479/protein ENSANGP00000009968) and HSDL2 (ENSANGG00000013825/ENSANGP00000016314). The mosquito SCP-2 domains from DBP, HSDL2, and SCP-X/SCP-2 all contain peroxisomal targeting sequences. Moreover, there are three genes in the A. gambiae genome (ENSANGG00000018004, ENSANGG00000022459, and ENSANGG00000017981) that express unfused SCP-2 domains (ENSANGP00000020493, ENSANGP00000026507, and ENSANGP00000020470, respectively). ENSANGP00000020493 consists of 111 aa (12.3 kDa) and has a theoretical pI of 5.0, ENSANGP00000026507 consists of 105 aa (11.1 kDa) and has a theoretical pI of 5.3, and ENSANGP00000020470 consists of 105 aa (11.1 kDa) and has a theoretical pI of 5.4. Thus, all three of these mosquito proteins have remarkably low isoelectric points compared to other eukaryotic SCP-2 domains. These proteins also lack peroxisomal targeting signals, suggesting a cytoplasmic location. Such unfused SCP-2 domains are also expressed in other mosquitoes, such as Aedes aegypti. We have not identified STOML1/unc-24 in mosquito genomes.
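A theoretical pI is the pH at which the predicted net charge of a protein is zero. The sketch below shows one way such a value can be estimated, assuming independent ionizable groups described by the Henderson-Hasselbalch equation and an illustrative set of pKa values. Published pKa scales differ, so this sketch is not expected to reproduce the values reported above exactly, and it is not the tool used in this study.

```python
# Illustrative pKa values (an assumption; published scales differ).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence, ph):
    """Predicted net charge of a protein sequence at the given pH."""
    seq = sequence.upper()
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for aa, pka in PKA_POSITIVE.items():
        positive += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa, pka in PKA_NEGATIVE.items():
        negative += seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return positive - negative


def theoretical_pi(sequence, tolerance=1e-4):
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        # Net charge decreases monotonically with pH.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

Under this model a sequence rich in lysine and arginine yields a high pI, as reported for most SCP-2 domains, whereas an excess of aspartate and glutamate yields a low pI like those of the unfused mosquito proteins.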

The translation of expressed sequence tags from the freshwater hydrozoan Hydra magnipapillata, classified in the phylum Cnidaria, revealed that this organism expresses SCP-2 domains in the fusions HSDL2 and SCP-X. We have not identified H. magnipapillata cDNAs that encode DBP, STOML1, or unfused SCP-2 domains.

SCP-2 in Fungi

In the arbuscular mycorrhizal fungus Glomus mosseae (Glomeromycota), an SCP-2 domain is present at the C-terminus of DBP. The G. mosseae DBP, known as GmFOX2, resembles the vertebrate DBP in that it contains domains for D-3-hydroxyacyl-CoA dehydrogenase, enoyl-CoA hydratase, and SCP-2 (Requena et al. 1999). However, unlike the vertebrate DBP, GmFOX2 carries two N-terminal D-3-hydroxyacyl-CoA dehydrogenase domains (Fig. 2). As in other DBPs, the SCP-2 domain carries a PTS1 targeting signal at the C-terminus (SKL). All ascomycetes and basidiomycetes we have investigated express a DBP without the SCP-2 domain. Instead, many of these fungi, such as Neurospora crassa, express an unfused SCP-2 with a peroxisomal targeting sequence (Fig. 2). It is notable that there is no gene with similarity to SCP-2 in the completely sequenced genomes of the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe.

SCP-2 in Alveolates, Slime Molds, and Oomycetes

SCP-2 domains were identified in genome sequences and ESTs from the ciliates Paramecium aurelia, Tetrahymena thermophila, and Sterkiella histriomuscorum. These ciliates express separate, unfused SCP-2, as well as SCP-2 domains connected to D-3-hydroxyacyl-CoA dehydrogenase in ancestral DBPs, like the ones in nematodes. Perkinsus marinus, an alveolate classified as branching at the dinoflagellate clade, also encodes an SCP-2 domain, which appears to be unfused.

In Toxoplasma gondii, classified to the phylum Apicomplexa, there is a gene encoding an ancestral DBP variant consisting of one N-terminal D-3-hydroxyacyl-CoA dehydrogenase domain connected to two C-terminal SCP-2 domains in a tandem arrangement (Fig. 2). The most C-terminal of these domains carries a PTS1 (SRL). We have also detected a gene encoding a separate SCP-2 in the apicomplexan Eimeria tenella, which belongs to the family Eimeriidae. Yet there are no identified SCP-2 domains in other apicomplexans, such as Cryptosporidium and Plasmodium, or in Trypanosoma and Leishmania belonging to kinetoplastida.

The ancestral DBP found in nematodes and ciliates is also present in the slime mold D. discoideum (Fig. 2) and in the oomycete Phytophthora sojae. D. discoideum and P. sojae also express cDNAs that encode a fusion protein consisting of two SCP-2 domains in a tandem arrangement. In these arrangements the most C-terminal domain carries a PTS1 (AKL).

SCP-2 in Plants

In plants SCP-2 is expressed from stand-alone, unfused genes (Fig. 2). In the completely sequenced genome of the dicot A. thaliana there is one SCP-2 gene residing on chromosome 5 (Edqvist et al. 2004). The A. thaliana SCP-2 consists of 123 aa and has a theoretical pI of 9.2 and a molecular weight of 13.6 kDa. It contains a PTS1 targeting signal, SKL, at the C-terminus and has been shown to localize to the peroxisomes. In the monocot Oryza sativa, SCP-2 is encoded by three genes, located on chromosomes 1, 2, and 6. The gene on chromosome 2 encodes a protein consisting of 122 aa, including a PTS1 (SKL) probably targeting the protein to the peroxisomes. The proteins encoded by the genes on chromosomes 1 and 6 have C-terminal extensions of approximately 70 and 50 aa, respectively. The relevance of the extensions is not known, but the C-termini of the larger proteins do not resemble peroxisomal targeting sequences. Unfused, separate SCP-2 domains are also expressed in other plants such as the gymnosperm Cryptomeria japonica, the fern Ceratopteris richardii, the moss Physcomitrella patens, and the green alga Chlamydomonas reinhardtii.

Phylogenetics of SCP-2 Domains

The amino acid sequences of the eukaryotic SCP-2 domains were aligned and a phylogenetic tree was reconstructed using the maximum-likelihood method (Guindon and Gascuel 2003). The SCP-2 domains from HSDL2, STOML1, SCPX, and DBP form separate groups (Fig. 3). The eukaryotic unfused SCP-2 domains encoded by stand-alone genes are found in several separate branches. It is interesting to note that most branches with unfused SCP-2 domains are within the cluster formed by SCP-2 domains from DBP. For instance, the unfused SCP-2 encoded by C20orf79 shares a common ancestor with the SCP-2 domain of DBP. The unfused SCP-2 domains from fungi are in a branch together with the SCP-2 domain present in the DBP of G. mosseae. The land plant SCP-2 domains also form a specific cluster, together with the SCP-2 domains of DBP and of unfused proteins from ciliates and apicomplexans (Fig. 3). The unfused SCP-2 domain encoded by the D. melanogaster gene CG11151-PA branches together with SCP-2 domains from insect DBP, whereas the unfused SCP-2 domains from malaria and yellow fever mosquitoes are in a separate branch.

Figure 3

Phylogenetic tree of SCP-2 amino acid sequences reconstructed by maximum likelihood. Numbers indicate the percentage of 100 bootstrap resamplings that support the inferred topology. Only bootstrap values over 50% are shown. Sequences are labeled SCP-2, SCP-X, DBP, Podocan, or STOML1 to indicate the protein within which the SCP-2 domain is present. SCP-2 indicates proteins with unfused SCP-2 domains. The sequences included in the analysis have the following accession numbers: Homo sapiens, B40407; Gallus gallus, Q07598; Drosophila melanogaster, NP_524715.2; Ciona intestinalis, BW501051.1; Mus musculus, AAA40098.1; Pan troglodytes, XP_513413.1; Aedes aegypti, AAQ24505.1; Tetraodon nigroviridis, CAF96357.1; Anopheles gambiae, EAA12893.2; Danio rerio, NP_957159.1; Caenorhabditis elegans, NP_496161.1; Dictyostelium discoideum, BAA94961; Glomus mosseae, CAB55552.1; H. sapiens, P51659; M. musculus, NP_032318.2; Xenopus laevis, AAH74145.1; A. gambiae, XP_314766.1; D. melanogaster, NP_572917.1; C. elegans, NP_509146.1; Apis mellifera, XP_393475.1; D. rerio, NP_956430.1; G. gallus, NP_990274.1; C. intestinalis, BW496744.1; H. sapiens, AAH04331; M. musculus, NP_077217.1; X. laevis, AAH59996.1; D. rerio, NP_955893.1; C. intestinalis, BW403353; A. gambiae, XP_307644.1; D. melanogaster, AAK77240.1; C. elegans, NP_740993.1; Strongylocentrotus purpuratus, CD311368.1; Sterkiella histriomuscorum, AY618149.1; T. thermophila, CX574182.1; Euphorbia lagascae, AY987485; Arabidopsis thaliana, AAM51290.1; Physcomitrella patens, BJ200729.1; Oryza sativa, BAD35172.1; O. sativa, XP_467958.1; Cryptomeria japonica, AU084446.1; Ustilago maydis, EAK83903.1; Gibberella zeae PH-1, XP_390240.1; Neurospora crassa, CAD21491.1; Candida maltosa, Q00680; Aedes aegypti, 1PZ4_A; A. gambiae, EAA08376.3; Paramecium tetraurelia, FN0AA188AH11LM1.SCF; G. gallus, XP_413691; M. musculus, NP_081218.2; T. nigroviridis, CAF98342.1; H. sapiens, CAG46886; C. elegans, NP_501335.1; C. intestinalis, BW084548.1; Gasterosteus aculeatus, CD501703.1; H. sapiens, Q9UJQ7; M. musculus, Q9DAH1; A. gambiae, XP_307198.1; A. gambiae str. PEST, EAL42128.1; Chlamydomonas reinhardtii, BI874770.1; Phytophthora sojae, CF846956.1; D. discoideum, EAL65110.1; Eimeria tenella, Et_v1_Twnscn_Contig5432.tmp1chr; Toxoplasma gondii, TgTwinScan_2681; Hydra magnipapillata, CN628855.1; H. magnipapillata, DN138329.1; Thermoplasma volcanium, NP_111830.1; Branchiostoma floridae, BW696831.1; Parastrongyloides trichosuri, BI451354.1; Necator americanus, BU088364.1; Ancylostoma ceylanicum, CB276866.1; Canis familiaris, XP_536704.1; D. rerio, AAH83225.1; D. rerio, XP_699702.1; Xenopus tropicalis, AAH80462.1; Angiostrongylus cantonensis, DN190860.1; Mytilus galloprovincialis, AJ625081.1; C. familiaris, XP_532163.1; Schmidtea mediterranea, AY068105.1; Molgula tectiformis, CJ356207; Branchiostoma floridae, BW874569.1; C. familiaris, XP_542873.1; Pan troglodytes, XP_525275.1; Bos taurus, XP_582345.1. Sequences are identified by their NCBI accession numbers or by the identity numbers given in other databases such as TIGR Tetrahymena thermophila Genome Sequencing Project, TIGR C. intestinalis Gene Index, and ToxoDB.

Distribution of SCP-2 Fusion Partners

To learn more about the evolutionary history of the SCP-2 gene fusions, we traced genes encoding separate, unfused proteins related to the SCP-X thiolase, the DBP enoyl-CoA hydratase, and the HSDL2 short-chain dehydrogenase. The amino acid sequences of proteins encoded by such genes were extracted from public databases using BLAST. The obtained sequences were subjected to multiple sequence alignments, which were used for reconstructing phylogenetic trees. The phylogenetic tree of 3-ketoacyl-CoA thiolases shows that the SCP-X thiolases form a specific branch together with unfused homologous proteins from fungi, nematodes, bacteria, and archaea, indicating that these are encoded by orthologous genes (Fig. 4A). The SCP-X branch is well separated from other thiolases, such as the human peroxisomal thiolase encoded by ACAA1. The SCP-X thiolase does not seem to be expressed in plants or in the slime mold D. discoideum. Notably, several of the bacterial and archaeal putative thiolases are annotated in the databases as putative lipid transfer proteins, although they lack the SCP-2 domain.

Figure 4

Phylogenetic trees of amino acid sequences of 3-ketoacyl-CoA thiolases (A), enoyl-CoA hydratases (B), and D-3-hydroxyacyl-CoA dehydrogenases and short-chain dehydrogenases with similarity to the domain in HSDL2 (C). The phylogenetic trees were reconstructed using the maximum likelihood method. Numbers indicate the percentage of 100 bootstrap resamplings that support the inferred topology. Only bootstrap values over 50% are shown. Symbols are as in Fig. 2. Sequences are identified by gene names (H. sapiens), open reading frame names (A. thaliana), NCBI accession numbers, or identity numbers given in other databases such as TIGR Tetrahymena thermophila Genome Sequencing Project, TIGR C. intestinalis Gene Index, and ToxoDB.

Separate, unfused genes encoding enoyl-CoA hydratase with high similarity to the domain in vertebrate, insect, and fungal DBP are present in A. thaliana, C. elegans, D. discoideum, and T. gondii. Bacteria also express proteins encoded by orthologs of the enoyl-CoA hydratase domain in DBP (Fig. 4B). The reconstructed evolutionary tree of short-chain dehydrogenase domains suggests that the domain from HSDL2 shares a common origin with homologous and unfused proteins from D. discoideum, T. thermophila, and bacteria (Fig. 4C). We also included the D-3-hydroxyacyl-CoA dehydrogenase domains from DBP in the reconstructed phylogenetic tree. The tree clearly shows that the D-3-hydroxyacyl-CoA dehydrogenase from DBP and the short-chain dehydrogenase domains from HSDL2 have evolved from different ancestors (Fig. 4C). We did not detect the D-3-hydroxyacyl-CoA dehydrogenase in plants. The plant protein most similar to the D-3-hydroxyacyl-CoA dehydrogenase in DBP is the chloroplast-located 3-oxoacyl-[acyl-carrier-protein] reductase, known to be involved in the synthesis of fatty acids. This protein shows high similarity to proteins from cyanobacteria, indicating that the gene was obtained during the endosymbiotic event leading to the formation of chloroplasts. The reconstructed phylogenetic tree also shows that the plant and cyanobacterial 3-oxoacyl-[acyl-carrier-protein] reductases are expressed from orthologous genes (Fig. 4C).

Discussion

We have investigated the distribution and evolution of SCP-2 domains in eukaryotic organisms. We hypothesize that the ancestral eukaryotic SCP-2 was present in a fusion to D-3-hydroxyacyl-CoA dehydrogenase (Fig. 5). This hypothesis is strengthened by the presence of this fusion in alveolates (T. thermophila), oomycetes (P. sojae), metazoa (C. elegans), and mycetozoa (D. discoideum) (Fig. 5).

Figure 5

A model for the evolution of SCP-2. The SCP-2 domains and their fusion partners were placed on the tree of life proposed by Baldauf et al. (2000). The symbols are described in the legends to Figs. 2 and 4.

Furthermore, we have not identified eukaryotic genes encoding separate, unfused proteins orthologous to the D-3-hydroxyacyl-CoA dehydrogenase in DBP (Fig. 4C). The clustering of the DBP SCP-2 domains in the phylogenetic tree also indicates that the D-3-hydroxyacyl-CoA dehydrogenase/SCP-2 fusions have evolved from a common origin (Fig. 3). Moreover, the cluster of SCP-2 domains from DBP and SCP-X contains all unfused eukaryotic SCP-2 domains, showing that none of those represent the eukaryotic ancestral SCP-2. Thus, according to our data it is likely that an ancestral eukaryotic SCP-2 was present in a fusion to D-3-hydroxyacyl-CoA dehydrogenase. We conclude that the present eukaryotic repertoire of SCP-2 encoding genes has evolved through duplications, fissions, and fusions, with the ancestral D-3-hydroxyacyl-CoA dehydrogenase/SCP-2 fusion as the starting point for the evolutionary events. Our model indicates that there have been about as many fission as fusion events during the evolution of the eukaryotic genes encoding proteins carrying SCP-2 domains (Fig. 5).

An integral SCP-2 domain is not necessary for the function of DBP, since there are a number of organisms, such as D. melanogaster and fungi belonging to the phyla Ascomycota and Basidiomycota, that express a DBP lacking the SCP-2 domain. D. melanogaster and most of the fungal species express unfused SCP-2 domains orthologous to the SCP-2 domains in DBP. Thus, it seems that during the evolution of Ascomycota and Basidiomycota, as well as Drosophila, genes encoding unfused SCP-2 domains have evolved through fission of the gene encoding DBP. The formation of an unfused SCP-2 from DBP has also occurred during the evolution of vertebrates, since the mammalian SCP-2 domains of DBP and C20orf79 have evolved from a common origin (Figs. 1 and 3). The SCP-2 domain is still present in the mammalian DBP, which indicates that the SCP-2 encoded by C20orf79 evolved through duplication and fission of an ancestral HSD17B4. The unfused, separate SCP-2 in plants also evolved from an SCP-2 domain in DBP, as the plant proteins share a common ancestor with ciliate and apicomplexan SCP-2 domains present in DBP. In Nematoda, we have detected the SCP-X fusion only in Parastrongyloides trichosuri. Nematodes lacking SCP-X express an unfused SCP-X thiolase (Fig. 4), as well as an unfused SCP-2 domain which groups together in the phylogenetic tree with the SCP-2 domain from the P. trichosuri SCP-X (Fig. 3). Thus, during the evolution of nematodes, genes encoding unfused SCP-2 domains have been formed by fission of the gene encoding SCP-X.

These numerous gene fission events have probably been important for the evolution of a more versatile SCP-2, which can take part in additional cellular activities compared to an SCP-2 domain that is an integral part of DBP or SCP-X. An example of a gene fission event that has opened the way for the evolution of unfused SCP-2 domains with additional value to the organism comes from the mosquito A. aegypti. Mosquitoes express a number of genes encoding unfused SCP-2 domains (Fig. 3). In insects, trafficking of cholesterol is an important process because insects depend on dietary cholesterol to fulfill their physiological needs. A protein carrying an unfused SCP-2 domain from A. aegypti is located in the cytoplasm, has a high affinity for cholesterol, and, when overexpressed, increased the cholesterol uptake in A. aegypti cell lines (Krebs and Lan 2003; Lan and Massey 2004). Thus, the results indicate that a gene fission event has opened the way for the evolution of an insect SCP-2 domain with a function in cholesterol trafficking in the cytoplasm. The significance of the gene fission behind the formation of the plant genes encoding unfused SCP-2 domains remains unclear. The plant protein most similar to the D-3-hydroxyacyl-CoA dehydrogenase in DBP is the chloroplast-located 3-oxoacyl-[acyl-carrier-protein] reductase, known to be involved in the synthesis of fatty acids. Plants obtained the genes encoding this protein from cyanobacteria during the endosymbiotic event leading to the formation of chloroplasts (Fig. 4C). We can speculate that this endosymbiotic event made the D-3-hydroxyacyl-CoA dehydrogenase in DBP unnecessary for plant growth and development, thereby opening the way for a gene fission that retained in the plant genomes only the gene encoding the SCP-2 domain.

It is likely that the SCP-2 fusions have evolved in two steps. The first step is the duplication of the SCP-2 domain of DBP, yielding an unfused SCP-2, which in subsequent events has fused to genes encoding thiolase, short-chain dehydrogenase, or stomatin-like proteins. Thus, the events leading to the evolution of genes encoding separate unfused SCP-2 domains may also be the starting point for the evolution of novel SCP-2 fusions. The fusion genes SCPX and HSDL2 are present in the genome of the cnidarian H. magnipapillata, suggesting that these gene fusions had already evolved before cnidarians were separated from vertebrates and nematodes. The expression in ciliates, slime molds, and fungi of unfused proteins encoded by genes orthologous to the short-chain dehydrogenase domain in HSDL2 further supports the view that the fusion occurred in a metazoan ancestor after the split between metazoans and fungi (Figs. 4 and 5). DBP, SCP-X, and HSDL2 are localized to the peroxisomes. In these peroxisomal fusion proteins the SCP-2 domains are fused to enzymes, of which at least D-3-hydroxyacyl-CoA dehydrogenase, enoyl-CoA hydratase, and 3-ketoacyl-CoA thiolase are involved in the metabolism of fatty acyl-CoA. The functions of the SCP-2 domains in these proteins have not been unambiguously determined. We think it is likely that the formation of the genes encoding these SCP-2 fusions let the organisms take better advantage of the lipid binding or trafficking properties of the SCP-2 domains in lipid metabolism. Possibly, the SCP-2 domains are involved in the delivery of certain substrates to the enzymatic active sites or in establishing a lipid environment that increases reaction rates. In any case, since these gene fusions are retained in many genomes, it is evident that they have added value to the peroxisomes.

An interesting example of novel exploitation of the SCP-2 domain is the fusion of SCP-2 and the stomatin-like protein in STOML1, unc-24, and orthologous genes. This fusion occurred at the latest before the Pseudocoelomata (including Nematoda) separated from the Coelomata (including Chordata and Arthropoda) (Fig. 3). We have not identified STOML1 in insects, such as D. melanogaster and A. gambiae, suggesting that the gene was lost during the evolution of insects. In C. elegans the genes unc-1 and unc-8 each control normal locomotion and sensitivity to volatile anesthetics. The protein UNC-1 is a close homologue of the mammalian stomatin, is expressed primarily in the nervous system, and localizes to detergent-resistant fractions of the cell membranes resembling lipid rafts. These rafts are formed in the endoplasmic reticulum. UNC-24, the STOML1 orthologue in C. elegans, is necessary for movement of UNC-1 as part of a raft from the endoplasmic reticulum to the cell membrane (Sedensky et al. 2004). The stomatin portion of UNC-24 could be the component necessary for the initial formation or the maintenance of multimeric interactions, and the SCP-2 domain the component necessary for transfer to the cell membrane. Further, the response to gentle body touch in C. elegans requires a degenerin channel complex containing four proteins (MEC-2, MEC-4, MEC-6, and MEC-10). Mutation of unc-24 enhances the touch-insensitive phenotypes produced by mec-4 and mec-6 alleles. UNC-24 and the stomatin-like protein MEC-2 interact through their stomatin-like regions with the degenerin MEC-4 (Zhang et al. 2004). This binding allows MEC-2 to regulate the activity of the degenerin channel. Zhang et al. (2004) suggested that MEC-2 and UNC-24 may be associated with a specialized lipid environment that influences channel activity. It is likely that the lipid binding and transfer capability of the SCP-2 domain is involved in establishing this lipid environment of the degenerin channel.

It was somewhat surprising to find that the apicomplexan T. gondii expresses a DBP with two N-terminal SCP-2 domains and a peroxisomal targeting signal. Peroxisomes have never been unambiguously identified in apicomplexans. Previous results showed that the peroxisomal enzyme catalase in T. gondii is present primarily in punctate compartments anterior to the nucleus, which indicated that T. gondii may contain peroxisomes (Kaasch and Joiner 2000). However, the localization pattern of a fusion of the C-terminal 12 aa of T. gondii catalase, including the putative PTS1, to GFP could not confirm the existence of peroxisomes, since the GFP-catalase fusion localized predominantly to the cytosol (Ding et al. 2000). Our finding that T. gondii expresses SCP-2 reopens the question of whether some apicomplexans contain peroxisomes.

The yeasts S. cerevisiae and S. pombe do not encode proteins with similarity to SCP-2. Many other related yeasts express proteins carrying unfused SCP-2 domains, which makes the lack of SCP-2 in S. cerevisiae and S. pombe quite puzzling. Possibly other proteins can replace the function of SCP-2, and there is at least one report of a lipid transfer protein with a broad substrate specificity that is associated with the peroxisomal membrane in S. cerevisiae (Ceolotto et al. 1996). So far this lipid transfer activity has not been connected to any gene in the completely sequenced yeast genome. S. cerevisiae can utilize a wide range of saturated and unsaturated fatty acids as sole carbon source; however, it cannot β-oxidize 2-methyl branched-chain fatty acids, such as pristanic acid (van Roermund et al. 2003). There seems to be a connection between SCP-2 and the metabolism of branched-chain fatty acids, since gene targeting in mice revealed that complete deficiency of SCP-2 resulted in impaired catabolism of methyl branched-chain fatty acyl-CoAs, as shown by a 10-fold accumulation of phytanic acid in SCP-2(−/−) mice (Seedorf et al. 1998). Moreover, in humans a deficiency of DBP leads to accumulation of the branched-chain fatty acids pristanic and phytanic acid (Wanders et al. 2001). We can speculate that there is a correlation between the inability to metabolize branched-chain fatty acids and the lack of SCP-2 domains in S. cerevisiae. Unfortunately, we are not aware of any investigations of the ability of the fission yeast S. pombe, or other yeasts, to metabolize branched-chain fatty acids. Such information will be required before we can conclude that SCP-2 is required for metabolism of branched-chain fatty acids in fungi.

The evolution of more advanced cell systems with organelles and specialized organs is based on simple components that once evolved in bacteria. The SCP-2 domain evolved in bacteria, where the single-domain SCP-2 presumably acts in lipid trafficking. The lipid-binding and lipid transfer properties of SCP-2 have been preserved during the evolution of eukaryotes, but events such as gene fission and gene fusion have placed the domain in novel contexts. The novel fusion genes have allowed eukaryotes, in particular animals, to exploit the properties of the SCP-2 domain in organelles such as the peroxisomes, as well as in the membrane channels of the nervous system. There is, on the other hand, also a selection for less complex genes, as we have traced many events where the fused genes have been split to form novel eukaryotic genes encoding proteins consisting only of SCP-2 domains.