Introduction

Carbonic anhydrase (CA; EC 4.2.1.1) is a ubiquitous enzyme catalyzing interconversion between CO2 and bicarbonate (HCO3−) (Smith and Ferry 2000). CA is fundamental to many biological functions including photosynthesis, respiration, and CO2 and ion transport. To date, four distinct types (α, β, γ, and δ) of CA are known. Interestingly, no significant sequence similarities are observed among the types; hence, the catalytic function of CA is recognized as an excellent example of convergent evolution (Smith and Ferry 2000). While most mammalian and plant CA isozymes belong to the α and β types, respectively, the distribution of CA in prokaryotes is irregular: some prokaryotes retain enzymes from two or three types or multiple enzymes from the same type [currently, the δ type consists of only one enzyme, identified in a marine diatom (Lane et al. 2005)], and others do not retain any type of CA. Hence, it is likely that the evolution of CA function in prokaryotes has a complex historical background.

Recently, significant insight into the role of microbial CA has been provided by genetic studies in several model microorganisms. CA-knockout mutants of Ralstonia eutropha (Kusian et al. 2002), Escherichia coli (Hashimoto and Kato 2003; Merlin et al. 2003), Corynebacterium glutamicum (Mitsuhashi et al. 2004), and Saccharomyces cerevisiae (Aguilera et al. 2005) were unable to grow under ambient air but grew under an atmosphere with high levels of CO2. This phenomenon is explained by the availability of bicarbonate, which is essential for the reactions catalyzed by several housekeeping enzymes such as phosphoenolpyruvate carboxylase and acetyl-CoA carboxylase. While CA-positive microorganisms can generate bicarbonate from environmental CO2 and supply it to these enzymes, CA-negative ones cannot. Hence, the former can grow even under ambient air containing a low level of CO2 (0.035%), but the latter cannot initiate growth unless they are supplied with a high concentration of bicarbonate. The latter organisms, however, can grow under a high-CO2 atmosphere because, at natural equilibrium, such an atmosphere yields a high concentration of bicarbonate. This in turn indicates that CA is not essential for microbial growth under high-CO2 conditions, including commensal situations. Indeed, our previous study showed that an E. coli CA mutant was able to grow even under ambient air when it was cocultured with Bacillus subtilis (Watsuji et al. 2006).
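The dependence of bicarbonate availability on atmospheric CO2 can be illustrated with a rough equilibrium estimate. The following is only a sketch under stated assumptions: the Henry's law constant and the apparent first pKa are approximate textbook values for dilute water at about 25 °C, and the pH of 7.0 and the 5% CO2 setting are hypothetical choices; none of these numbers come from the studies cited above. Only the 0.035% ambient value is taken from the text.

```python
# Assumed constants (not from the source): dilute aqueous solution, ~25 °C.
HENRY_CO2 = 0.034      # mol L^-1 atm^-1, approximate Henry's law solubility of CO2
PKA1_APPARENT = 6.35   # approximate apparent first pKa of CO2(aq)/HCO3-


def bicarbonate_molar(co2_fraction, ph, total_pressure_atm=1.0):
    """Estimate the equilibrium HCO3- concentration (mol/L) for a CO2 gas fraction."""
    # Henry's law: dissolved CO2 is proportional to the CO2 partial pressure.
    dissolved_co2 = HENRY_CO2 * co2_fraction * total_pressure_atm
    # Henderson-Hasselbalch: [HCO3-] / [CO2(aq)] = 10 ** (pH - pKa1).
    return dissolved_co2 * 10 ** (ph - PKA1_APPARENT)


ambient = bicarbonate_molar(0.00035, ph=7.0)  # 0.035% CO2, as stated in the text
enriched = bicarbonate_molar(0.05, ph=7.0)    # 5% CO2, a hypothetical enriched atmosphere

print(f"ambient air:  {ambient:.2e} M")
print(f"enriched air: {enriched:.2e} M")
print(f"fold change:  {enriched / ambient:.0f}")
```

Under these assumptions the equilibrium bicarbonate concentration scales linearly with the CO2 fraction of the gas phase. The estimate says nothing about how fast equilibrium is reached without CA, which is a kinetic question.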

This knowledge led us to anticipate that detailed characterization of the CAs distributed among microbial genomes will provide significant information regarding their evolutionary history and adaptation to the environment. In this paper, we deal with the absence of any CA gene in the genome of Symbiobacterium thermophilum. S. thermophilum is a syntrophic bacterium that grows effectively in coculture with a cognate Geobacillus sp. (Suzuki et al. 1988; Ueda and Beppu 2007). Although Symbiobacterium is widely distributed in natural environments (Sugihara et al. 2008; Ueda et al. 2001), its taxonomic diversity has not yet been well studied because of the difficulties in cultivation. Our recent study showed that one of the growth factors of this bacterium is CO2 generated during the growth of Geobacillus (Watsuji et al. 2006). Based on the absence of a CA gene in its genome (Ueda et al. 2004), we speculate that the requirement for a high CO2 level in S. thermophilum is due to the lack of this catalytic function. In this study, we first characterized the phylogenetic position of S. thermophilum based on ribosomal protein sequences and revealed its affiliation with the class Clostridia. The subsequent phylogenetic analyses of each type of CA distributed in this group of bacteria supported the view that S. thermophilum and several related bacteria have lost this enzyme in the course of evolution.

Materials and Methods

Phylogenetic Analyses

To determine the phylogenetic relationships among the Firmicutes, the genome sequences of the following 19 species representing the Firmicutes and related bacterial groups (including the Actinobacteria) were studied: Bacillus subtilis, Bifidobacterium longum, Clostridium acetobutylicum, Corynebacterium glutamicum, Frankia sp. Ccl3, Fusobacterium nucleatum, Lactobacillus plantarum, Lactococcus lactis, Listeria monocytogenes, Mycobacterium tuberculosis, Mycoplasma genitalium, Phytoplasma sp. OY, Rubrobacter xylanophilus, Saccharopolyspora erythraea, Staphylococcus aureus, Streptococcus pyogenes, Streptomyces coelicolor A3(2), Symbiobacterium thermophilum, and Thermoanaerobacter tengcongensis. The Escherichia coli genome was used as an outgroup. We selected 31 ribosomal proteins (S2, S3, S5, S7, S8, S9, S10, S11, S12, S13, S15, S16, S17, S19, L1, L2, L3, L4, L5, L6, L7, L13, L14, L15, L16, L17, L18, L20, L22, L23, and L27) that were present in all 20 bacteria and constructed 31 multiple alignments using Clustal W (Thompson et al. 1994). The 31 alignments were then concatenated into a single multiple alignment. The concatenated alignment had 5,742 amino acid sites, including 1,567 gap/insertion sites. Hence, phylogenetic analyses were performed based on the 4,175 amino acid sites that remained after the gap/insertion sites were excluded.

To study the phylogenetic relationships among the Clostridia, the genome sequences of the following 21 species of the Clostridia were analyzed: Alkaliphilus metalliredigenes, A. oremlandii, Caldicellulosiruptor saccharolyticus, Carboxydothermus hydrogenoformans, Clostridium acetobutylicum, C. beijerinckii, C. botulinum A, C. difficile, C. kluyveri, C. novyi, C. perfringens 13, C. phytofermentans, C. tetani E88, C. thermocellum, Desulfitobacterium hafniense, Desulfotomaculum reducens, Moorella thermoacetica, Pelotomaculum thermopropionicum, Symbiobacterium thermophilum, Syntrophomonas wolfei, and Thermoanaerobacter tengcongensis. The Bacillus subtilis genome was used as an outgroup. We selected 27 ribosomal proteins (S2, S3, S7, S8, S9, S11, S12, S13, L1, L2, L3, L4, L5, L6, L10, L11, L13, L14, L15, L16, L17, L18, L20, L21, L22, L24, and L27) that were present in all 22 bacteria and constructed 27 multiple alignments using Clustal W (Thompson et al. 1994). The 27 alignments were then concatenated into a single multiple alignment. The concatenated alignment had 4,574 amino acid sites, including 560 gap/insertion sites. Hence, phylogenetic analyses were performed based on the 4,014 amino acid sites that remained after the gap/insertion sites were excluded.
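The concatenation and gap-removal steps described above can be sketched as follows. This is a minimal illustration, not the procedure or software actually used in this study: the function names, the dictionary representation of an alignment, and the toy sequences are all hypothetical, and real alignments would be read from the Clustal W output.

```python
def concatenate_alignments(alignments):
    """Join per-protein alignments (dicts: species -> aligned sequence) end to end."""
    species = sorted(alignments[0])
    for aln in alignments:
        if sorted(aln) != species:
            raise ValueError("every alignment must cover the same species")
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}


def drop_gap_columns(alignment, gap="-"):
    """Keep only the columns in which no species has a gap character."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if gap not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Toy data: two hypothetical protein alignments for three hypothetical species.
toy = [
    {"sp1": "MKV-A", "sp2": "MKVLA", "sp3": "MRVLA"},
    {"sp1": "GT-S", "sp2": "GTPS", "sp3": "G-PS"},
]
full = concatenate_alignments(toy)
clean = drop_gap_columns(full)
print(len(full["sp1"]), len(clean["sp1"]))  # total sites, gap-free sites
```

In the toy data, 3 of the 9 concatenated columns contain a gap, so 6 sites remain. This mirrors the bookkeeping in the text, where 5,742 − 1,567 = 4,175 and 4,574 − 560 = 4,014 sites were retained.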

Neighbor-joining trees were reconstructed using the MEGA software, version 4 (Tamura et al. 2007). The bootstrap was performed with 1,000 replicates. The other default parameters were not changed. A maximum likelihood tree was reconstructed using the PHYLIP software, version 3.67 (http://evolution.genetics.washington.edu/phylip.html). The rate variation among sites was considered with a gamma-distributed rate (α = 1). The number of categories was three. The other default parameters were not changed.
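The idea behind the bootstrap percentages can be illustrated with a small sketch that resamples alignment columns with replacement. It is deliberately simplified and is not how MEGA or PHYLIP is implemented: instead of rebuilding a neighbor-joining tree for each replicate, it only asks how often a given pair of sequences is the closest pair by p-distance. All names and sequences below are hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Return the pair of names with the smallest p-distance (first pair wins ties)."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))


def pair_support(alignment, pair, replicates=1000, seed=0):
    """Percentage of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_replicate(alignment, rng)) == pair
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


# Hypothetical gap-free toy alignment; A and B differ at one site only.
toy = {"A": "MKVLAGTPSQ", "B": "MKVLAGTPSE", "C": "MRILSGAPTQ"}
support = pair_support(toy, ("A", "B"))
print(f"A+B closest in {support:.1f}% of 1,000 replicates")
```

A real bootstrap analysis scores each clade of the reconstructed tree rather than a single closest pair, but the column resampling step is the same.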

To survey the distribution of CA in the Firmicutes, a Blastp search was performed against the 128 genome sequences of the Firmicutes in the KEGG database (Kanehisa et al. 2006). The search was carried out using the three distinct types of CA (α, β, and γ) from the cyanobacterium Synechococcus elongatus PCC7942 as query sequences (KEGG entry numbers: α, Synpcc7942_1388; β, Synpcc7942_1447; and γ, Synpcc7942_1961). The 19 α-type, 36 β-type, and 44 γ-type sequences (in each case, the Firmicutes sequences plus the Synechococcus sequence used as the outgroup) were multiply aligned using Clustal W (Thompson et al. 1994). A total of 145 (α-type), 151 (β-type), and 147 (γ-type) amino acid sites, excluding gap/insertion sites, were considered for tree construction. A phylogenetic tree was constructed from each multiple alignment using MEGA 4 (Tamura et al. 2007).

Results

Phylogenetic Characterization of S. thermophilum Based on Ribosomal Proteins

Previously, we reported the ambiguous phylogenetic position of S. thermophilum in a gram-positive bacterial cluster based on the 16S rRNA gene sequence; it fell into a deep branch located between the Actinobacteria and the Firmicutes. In the conventional phylogeny based on the 16S rRNA gene, the Firmicutes was not a monophyletic lineage because of the polyphyly of the Clostridia (Supplementary Fig. S1a). This does not appropriately represent the phylogenetic relationships among the Firmicutes. On the other hand, in the phylogenetic tree based on the comparison of 23S rRNA genes, the monophyly of the Firmicutes was supported by only a 60% bootstrap value (Supplementary Fig. S1b), and the topology of this tree differed from that of the 16S rRNA gene tree.

To clarify the phylogenetic position of S. thermophilum, we carried out an analysis based on ribosomal protein comparison (see Materials and Methods). We expected this analysis to provide clearer information regarding the taxonomic affiliation of this bacterium since a recent finding indicated that the phylogenetic relationship of ribosomal proteins represents the overall information obtained from the whole-genome sequence more appropriately than that of ribosomal RNAs (Oshima and Nishida 2007; Zhao et al. 2005).

In contrast to the rRNA gene-based phylogeny, Clostridium, Symbiobacterium, and Thermoanaerobacter were clustered with 100% bootstrap support in the phylogenetic tree based on the 31 ribosomal proteins (Fig. 1a). This indicates that Symbiobacterium belongs to the class Clostridia of the division Firmicutes. The monophyly of the Firmicutes, which consisted of the three classes Bacilli, Clostridia, and Mollicutes, was supported with a 96% bootstrap value (Fig. 1a). On the other hand, the phylogenetic analyses among the Clostridia by the neighbor-joining (Fig. 1b) and maximum likelihood (data not shown) methods located Symbiobacterium within the group but in an ambiguous position.

Fig. 1

Phylogenetic relationships based on ribosomal proteins. (a) Relationships among representative organisms belonging to the Firmicutes and Actinobacteria. The neighbor-joining tree was constructed based on 4,175 amino acid sites of the ribosomal proteins S2, S3, S5, S7, S8, S9, S10, S11, S12, S13, S15, S16, S17, S19, L1, L2, L3, L4, L5, L6, L7, L13, L14, L15, L16, L17, L18, L20, L22, L23, and L27 of the 19 species and E. coli (outgroup). The number at each node represents the percentage in the bootstrap analysis. The bar indicates 10% difference of the evolutionary distance. (b) Phylogenetic relationships among organisms within the Clostridia inferred by the neighbor-joining method. The tree was constructed based on 4,014 amino acid sites of the ribosomal proteins S2, S3, S7, S8, S9, S11, S12, S13, L1, L2, L3, L4, L5, L6, L10, L11, L13, L14, L15, L16, L17, L18, L20, L21, L22, L24, and L27 of the 21 Clostridia and B. subtilis (outgroup). Each percentage in parentheses indicates the genomic G + C content (mol%), and β and γ beside the parentheses indicate the presence of β- and γ-carbonic anhydrase in each genome, respectively. The number at each node represents the percentage in the bootstrap analysis. The bar indicates 5% difference of the evolutionary distance. A similar topology was shown in a tree constructed by the maximum likelihood method.

Phylogenetic Characterization of CA Distributed in Firmicutes

To study the distribution and evolution of CA in the Firmicutes, we selected 18 α-, 35 β-, and 43 γ-type CAs of the Firmicutes based on the top score of the Blastp search result and carried out phylogenetic analyses (see Materials and Methods). The Mollicutes genomes did not contain any type of CA gene. The markedly different profiles exhibited by the three types of CA (Fig. 2) suggested that each type has evolved independently. In some cases, there is a possibility that the distribution involved multiple horizontal gene transfers (multiple alignments are given in Supplementary Fig. S2).

Fig. 2

Phylogenetic relationships among carbonic anhydrases of the Firmicutes. A total of 145 (α-type), 151 (β-type), and 147 (γ-type) amino acid sites were considered for tree construction. The cyanobacterium Synechococcus is the outgroup. The number at each node represents the percentage in the bootstrap analysis

The α type

The α-type CA was distributed only in the class Bacilli (the Bacillales and Lactobacillales). Neither of the orders—Bacillales or Lactobacillales—was monophyletic, which raised the possibility that this type of CA has been distributed via multiple horizontal gene transfers.

The β type

The β-type CA was distributed in the order Bacillales and the class Clostridia but not in the order Lactobacillales. Neither the order Bacillales nor the class Clostridia was monophyletic. These results suggest that the ancestor of the Lactobacillales lacked this enzyme, and that horizontal gene transfer between the Clostridia and the Bacillales occurred after the two orders—Bacillales and Lactobacillales—branched off during the course of evolution.

The γ type

The γ-type CA was distributed in the Bacillales, the Clostridia, and the Enterococcus genus of the Lactobacillales. The Bacillales were monophyletic, but the Clostridia were not. The Bacillales enzymes were more closely related to those of the Clostridia than of Enterococcus. On the other hand, in the phylogeny based on ribosomal proteins (Fig. 1a), the Bacillales were more closely related to the Lactobacillales than to the Clostridia. This raised the possibility that a horizontal gene transfer of γ-type CA had occurred between the Bacillales and the Clostridia.
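The monophyly statements made for the three CA types can be checked mechanically once a rooted tree is available. The sketch below is only an illustration under stated assumptions: the nested-tuple tree representation, the function names, and the toy tree with its labels are hypothetical and do not reproduce Fig. 2.

```python
def leaves(tree):
    """Return the set of leaf labels under a node (leaf = str, clade = tuple)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the taxa in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted toy tree: two Bacillales-like leaves cluster together,
# whereas the two Clostridia-like leaves are split between different clades.
toy_tree = ("outgroup", ((("Bac1", "Bac2"), "Clo1"), ("Clo2", "Ent1")))

print(is_monophyletic(toy_tree, {"Bac1", "Bac2"}))  # Bac1 and Bac2 form a clade
print(is_monophyletic(toy_tree, {"Clo1", "Clo2"}))  # Clo1 and Clo2 do not
```

In such a toy tree, a non-monophyletic group whose members nest among those of another group is the pattern that the text interprets as a possible signature of horizontal gene transfer.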

Discussion

Originally, S. thermophilum was described as a gram-negative bacterium based on several physiological properties (Suzuki et al. 1988). However, analysis of the 16S rRNA gene revealed its true affiliation with the gram-positive group, although it did not clearly show to which class it belongs (Ohno et al. 2000) (Supplementary Fig. S1). Despite the Actinobacteria-like high G + C content (68.7 mol%), a whole-genome sequencing study revealed many Firmicutes-like features in S. thermophilum (Ueda et al. 2004; Gao and Gupta 2005). The ribosomal protein-based phylogenetic analysis in this study clearly indicates that Symbiobacterium belongs to the Clostridia.

The class Clostridia is a large group of the division Firmicutes represented by the genus Clostridium. In traditional taxonomy, microorganisms have to meet only the following four criteria to be classified as a Clostridium species: form endospores, be obligately anaerobic, have a gram-positive cell wall, and be incapable of dissimilatory sulfate reduction. Hence, the genus Clostridium includes diverse psychrophilic, mesophilic, and thermophilic species with a wide range of low G + C contents (21–54 mol%) (Hippe et al. 1992). Despite its high G + C content, S. thermophilum fulfills these criteria: (i) a whole set of genes involved in endospore formation is present in its genome, and the development of a spore-like structure was actually observed (Ueda et al. 2004); (ii) genes involved in anaerobic respiration are present in the genome (Ueda et al. 2004), and S. thermophilum indeed grows effectively under anaerobic conditions (Ohno et al. 1999); (iii) electron microscopic observation of the cell wall structure showed the presence of a unit structure that resembles the characteristic cell surface layer (S-layer) of the Bacillaceae (Ohno et al. 2000), and the genome contains six putative S-layer-associated proteins (Ueda et al. 2004); and (iv) S. thermophilum performs nitrate respiration but does not reduce sulfate (our unpublished observation). These features indicate that Symbiobacterium is closely related to Clostridium in terms of traditional classification. Hence, we conclude that the above prediction based on ribosomal proteins is reliable.

The wide distribution of β- and γ-CAs in the genera of the Clostridia, such as Clostridium, Desulfitobacterium, Desulfotomaculum, and Pelotomaculum (Fig. 2), strongly suggests that these enzymes were retained by an ancestor of the Clostridia. The above prediction that Symbiobacterium belongs to the Clostridia hence supports the view that this organism lost the CA genes during evolution. A similar explanation is applicable to several other Clostridia, including Clostridium novyi, Moorella thermoacetica, and Carboxydothermus hydrogenoformans, whose genomes do not retain any CA homologues (Fig. 1b).

It is known that some symbiotic bacteria lack genes involved in primary metabolism. For example, Buchnera sp. (an endosymbiont of aphids) and Lactobacillus johnsonii (an intestinal lactobacillus) lack amino acid biosynthesis genes, and this deficiency is compensated for by the activity of the host organism (Shigenobu et al. 2000; Pridmore et al. 2004). Such genetic defects in symbiont genomes probably arose after a tight symbiotic relationship with the host organism had been established. Similarly, the loss of the CA gene in Symbiobacterium and some Clostridia may have resulted from dependence on high-CO2 environments.

As predicted in our recently published paper (Ueda et al. 2008), it is possible that a large population of high-CO2-requiring microorganisms, including CA-negative strains, has not yet been isolated because they cannot be cultured under conventional cultivation conditions using a normal atmosphere. Hence, the potential phylogenetic diversity of such microorganisms may not yet have been adequately taken into account in current systematics. We anticipate that precise phylogenetic characterization based on the distribution of CA will provide deep insight into the evolutionary traits of environmental microorganisms.