Introduction

The Synergistetes group of bacteria is a recently recognized phylum for which 40 organisms have been isolated and over three hundred 16S rRNA sequences are available (Hugenholtz et al. 2009; NCBI Taxonomy 2012). The phenotypic characteristics shared by the species from this phylum include their gram-negative cell wall, anaerobic existence, and rod/vibrioid cell shape (Jumas-Bilak et al. 2009). Although the presence of lipopolysaccharides, which is an important characteristic of diderm cell envelopes, in Synergistetes species has not yet been reported, they do contain genes for various proteins that are involved in lipopolysaccharide biosynthesis (Sutcliffe 2010). While a few species have been shown to be asaccharolytic, all Synergistetes have the ability to ferment amino acids (Magot et al. 1997; Baena et al. 1998, 1999a; Surkov et al. 2001; Hongoh et al. 2007; Downes et al. 2009; Jumas-Bilak et al. 2009). The Synergistetes inhabit primarily anaerobic environments including animal gastrointestinal tracts, soil, oil wells and wastewater treatment plants and they are also present in sites of human diseases such as cysts, abscesses and areas of periodontal disease (Godon et al. 2005; Kumar et al. 2005; de Lillo et al. 2006; Horz et al. 2006; Jumas-Bilak et al. 2007; Vartoukian et al. 2007; Zijnge et al. 2010). Due to their presence at illness related sites, the Synergistetes are suggested to be opportunistic pathogens but they can also be found in healthy individuals in the microbiome of the umbilicus and in normal vaginal flora (Vartoukian et al. 2007; Marchandin et al. 2010). Other species from this phylum have been identified as significant contributors in the degradation of sludge for production of biogas in anaerobic digesters and are potential candidates for use in renewable energy production through their production of hydrogen gas (McSweeney et al. 1993; Maune and Tanner 2012; Delbes et al. 2001; Riviere et al. 2009; Ziganshin et al. 2011).

Synergistetes species were first identified with the isolation of Synergistes jonesii, from which the phylum name “Synergistetes” is derived. S. jonesii was isolated from the rumen of a goat in 1992 and described as a gram-negative staining, anaerobic, rod-shaped, commensal bacterium with the ability to degrade the toxic compound pyridinediol 3-hydroxy-4-1(H)-pyridone (Allison et al. 1992; McSweeney et al. 1993). The 16S rRNA of S. jonesii was not closely related to that of any bacterium characterized at the time, but the species was later misclassified as a member of the Deferribacteres group (Garrity et al. 2004). Around the same time, the species Thermanaerovibrio acidaminovorans was isolated from methanogenic sludge and indicated to be a member of the Selenomonas genus within the Firmicutes phylum (Guangsheng et al. 1992; Baena et al. 1999b). These misclassifications of Synergistetes continued, as species from the genera Aminobacterium and Dethiosulfovibrio were described as forming a deep-branching clade beside cluster V within Clostridia, consisting of a group composed of Thermoanaerobacter species (Magot et al. 1997; Baena et al. 1998). Several other organisms now considered Synergistetes were also placed among the Syntrophomonadaceae family of the Firmicutes (Garrity et al. 2004; Dahle and Birkeland 2006; Diaz et al. 2007). Eventually, analyses of 16S rRNA sequences by Jumas-Bilak et al. (2009) identified the monophyletic nature of the Synergistetes within the bacterial domain and proposed that these “Synergistes jonesii-like” species form a distinct phylum, now named the Synergistetes. All characterized Synergistetes species are currently placed under the class Synergistia, the order Synergistales and the family Synergistaceae (Jumas-Bilak et al. 2009). This family until recently comprised 11 genera, namely: Aminiphilus, Aminobacterium, Aminomonas, Anaerobaculum, Cloacibacillus, Dethiosulfovibrio, Jonquetella, Pyramidobacter, Synergistes, Thermanaerovibrio and Thermovirga (Hugenholtz et al. 2009; Jumas-Bilak et al. 2009; NCBI Taxonomy 2012). Recently, a new genus Fretibacterium has also been described, which contains a single species, Fretibacterium fastidiosum, that was previously known as Synergistetes bacterium SGP1. A candidate genus Tammella, composed of a group of related and uncultured species found within termite guts, has also been suggested to belong to the phylum Synergistetes (Hongoh et al. 2007; Hugenholtz et al. 2009).

While the Synergistetes are currently classified as belonging to a separate phylum based on their 16S rRNA sequences, no characteristic of these bacteria is known that can easily differentiate a Synergistetes species from other bacteria. Though all cultured Synergistetes can ferment amino acids, various species from other taxa also share this ability (Hou et al. 2004; Fonknechten et al. 2010). The availability of genome sequences has allowed for the employment of comparative genome approaches for the identification of molecular markers that are specific for different bacterial groups at various taxonomic levels (Gupta 1998; Griffiths et al. 2005; Gupta and Bhandari 2011; Gupta and Shami 2011). Using genomic sequences, our lab has pioneered the discovery of conserved signature insertions/deletions (i.e. indels, CSIs) present in protein sequences that are specific for particular groups of organisms (Gupta 1998, 2009; Gao and Gupta 2005; Griffiths and Gupta 2006; Gupta and Bhandari 2011). The group specific presence of CSIs can be parsimoniously explained through rare genetic changes occurring in a common ancestor to the particular groups of species and then being passed down through vertical descent (Gupta 1998, 2000, 2009). Such CSIs, which are present in a related group of species and absent in other organisms, are useful as molecular markers for the identification of species belonging to a taxonomic group and the demarcation of the group’s boundaries. Additionally, through comparison of sequences and based on the presence or absence of the indicated CSIs in outgroup species, a rooted phylogenetic relationship can be inferred among the species (Rivera and Lake 1992; Baldauf and Palmer 1993; Gupta 1998, 2001).

From the species identified as Synergistetes, complete or annotated draft genomes are now available for nine species (described below). In the present work, we have carried out detailed comparative analyses on protein sequences from these genomes to identify molecular markers (CSIs) that are specific for the phylum Synergistetes and some of its subgroups, as well as those that provide information regarding its relationship to other bacterial phyla. Our work has identified numerous CSIs that provide highly specific markers for all sequenced members of the Synergistetes phylum as well as a number of its sub-groups. Additionally, several CSIs that are commonly shared by Synergistetes and some species from other bacterial phyla suggest potential cases of lateral gene transfers. These CSIs provide novel and powerful means for the identification/circumscription of species from the phylum Synergistetes and for different types of studies on them.

Phylogenetic analysis of the genome sequenced Synergistetes

The complete genomes for Aminobacterium (Amb.) colombiense (Chertkov et al. 2010), T. acidaminovorans (Chovatia et al. 2009) and Thermovirga (Tv.) lienii (Dahle and Birkeland 2006) have been published while annotated draft genomes were accessible for Dethiosulfovibrio peptidovorans (Labutti et al. 2010), Aminomonas (Amm.) paucivorans (Pitluck et al. 2010), Anaerobaculum (An.) hydrogeniformans, Jonquetella anthropi and Pyramidobacter piscolens (NCBI genomic database 2012). Limited sequence data for F. fastidiosum, which is currently referred to as Synergistetes bacterium SGP1 in the NCBI database, was also available (Vartoukian et al. 2012). These species represent nine of twelve characterized genera from the phylum. Some characteristics of these organisms and their genomes are provided in Table 1.

Table 1 Characteristics of the Synergistetes species with sequenced genomes

The relationships of the species in the Synergistetes phylum have thus far been primarily analyzed through 16S rRNA sequence data. However, it is now recognized that trees based on a larger dataset of genes or proteins representing diverse functional categories are more reliable in resolving phylogenetic relationships than a single gene such as the 16S rRNA or a single protein (Rokas et al. 2003; Ciccarelli et al. 2006; Wu and Eisen 2008). Therefore, in order to visualize the relationship among the sequenced Synergistetes species, phylogenetic trees based upon concatenated sequences of ten housekeeping proteins were constructed. The 10 proteins that were used for phylogenetic analysis (viz. ArgRS, GyrB, Hsp70, ribosomal proteins L1 and L5, RpoB, RpoC, TrxB, UvrD and ValRS) are found in most bacteria and they have been extensively used for other phylogenetic studies (Bocchetta et al. 2000; Ciccarelli et al. 2006; Soria-Carrasco et al. 2007; Zhaxybayeva et al. 2009; Naushad and Gupta 2012). In addition to the Synergistetes species, the dataset that was employed for phylogenetic analyses also contained information for species from several other bacterial phyla including those in whose proximity the Synergistetes species were observed to branch in earlier studies (Guangsheng et al. 1992; Magot et al. 1997; Baena et al. 1998; Garrity et al. 2004; Diaz et al. 2007; Herlemann et al. 2009). The results for the multi-protein concatenated phylogenetic analysis are presented in Fig. 1a. In parallel, a 16S rRNA tree was also created to investigate the congruence with the protein tree (Fig. 1b).
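The concatenation step described above can be illustrated with a minimal sketch. It assumes each protein has already been aligned separately and is held as a mapping from species name to aligned sequence; the function name, the toy sequences and the choice to fill a missing protein with gap characters are illustrative assumptions, not details taken from this work.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-protein alignments into one concatenated alignment per species."""
    species = sorted({name for aln in alignments for name in aln})
    combined: Dict[str, List[str]] = {name: [] for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all rows of an alignment must have equal length")
        width = lengths.pop()
        for name in species:
            # Assumption: a species lacking this protein contributes a block of gaps.
            combined[name].append(aln.get(name, "-" * width))
    return {name: "".join(parts) for name, parts in combined.items()}


# Toy data only; these are not real protein sequences.
rpob = {"sp_A": "MKV-L", "sp_B": "MKVAL", "sp_C": "MRVAL"}
gyrb = {"sp_A": "GTS", "sp_B": "GTT"}
supermatrix = concatenate_alignments([rpob, gyrb])
```

The resulting rows all have the same length, so they can be passed to any distance or likelihood method that expects a single alignment.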

Fig. 1

Phylogenetic tree for sequenced Synergistetes species and representative species from some closely related bacterial phyla. a A neighbour joining (NJ) distance tree based upon concatenated sequences for 10 highly conserved and widely distributed proteins (ArgRS, GyrB, Hsp70, ribosomal proteins L1 and L5, RpoB, RpoC, TrxB, UvrD and ValRS). The numbers at the nodes indicate the % bootstrap score (statistical support) for each node in the NJ and maximum-likelihood (ML) analyses, respectively. A dash (–) at a node indicates that the statistical support for that particular branching relationship was <50 % in the NJ or ML analysis. b A NJ tree for the same species as shown in (a) based upon 16S rRNA sequences. The trees were constructed as described in our earlier work (Gupta and Bhandari 2011)

In both the protein tree and the 16S rRNA tree, the Synergistetes species formed a monophyletic clade that was distinct from all other bacterial groups, supporting their assignment to a separate phylum. Species such as Syntrophomonas wolfei and Selenomonas sputigena, in whose proximity some of the Synergistetes species were indicated to branch in earlier studies, branched distinctly from them. Although the relationships among other bacterial species/phyla differed between the two trees and were mostly unresolved, within the Synergistetes clade both the concatenated protein tree and the rRNA tree displayed a similar branching order. In both trees, the Synergistetes species showed a split into two clades at the highest level. One clade is comprised of Thermanaerovibrio and Aminomonas sharing a distant relationship with the Thermovirga and Anaerobaculum species, while the other clade is comprised of the five species from the genera Aminobacterium, Jonquetella, Pyramidobacter, Dethiosulfovibrio and Fretibacterium. The concatenated protein tree shows, with high statistical support, that J. anthropi and P. piscolens branch together and that D. peptidovorans is the closest relative of these two species. The trees also show a well-supported, close relationship between T. acidaminovorans and Amm. paucivorans. The species Amb. colombiense and F. fastidiosum are observed to robustly branch together, though their relationship to the Dethiosulfovibrio-Pyramidobacter-Jonquetella clade was strongly supported only by the NJ concatenated protein tree; the ML analysis and the rRNA tree supported this relationship only weakly. The position of the An. hydrogeniformans and Tv. lienii species within the phylum was poorly resolved in both the concatenated protein and the rRNA trees. In both trees, short branches connect these species to the Thermanaerovibrio-Aminomonas clade, and their grouping in this clade was weakly supported by the ML tree and the rRNA tree.

Identification of CSIs that are specific for the Synergistetes species

For the identification of CSIs, BlastP searches against the non-redundant protein sequence (nr) database were carried out on all proteins from the genomes of the species Amb. colombiense DSM 12261 and T. acidaminovorans DSM 6589 from the Synergistetes phylum (Altschul et al. 1997, 2005). Using the ClustalX program, multiple sequence alignments were created for all proteins for which high scoring homologues were available from most Synergistetes species as well as several other groups of organisms. The aligned proteins were visually inspected to identify insertions or deletions that were flanked by conserved amino acids on both sides. Insertions and deletions that were not flanked by at least 4–5 conserved residues within the neighbouring 30–40 residues were not considered further, as they do not provide useful molecular markers (Gupta 1998, 2001; Gupta and Bhandari 2011). More detailed BlastP searches (searching for 250 of the closest sequence matches) were then carried out on 50–80 aa long segments (longer in some cases) containing the indel and its flanking conserved regions to determine the species distribution of the identified indels. Indels predominantly found in Synergistetes species, or those that were found in Synergistetes along with some other taxonomic group of organisms, were retained and compiled into signature files. The signature files shown here contain sequence alignments of various detected indels along with the flanking conserved regions for all Synergistetes and representative species from other taxonomic groups for which information was detected in the Blast searches. However, due to space considerations, sequence information for only a limited number of species from other groups is shown here. In a few cases, where more than one homologue of a protein was detected for the same species, sequence information for different homologues was included only if they showed differing characteristics (viz. one homologue contained the indel while the other(s) did not). All of the indels reported in this work are independent of each other and they are not part of indels for other larger clades.
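The screening criteria described above were applied by visual inspection. As a rough illustration only, they can be sketched in code under stated assumptions: the alignment is a mapping from species name to aligned sequence with "-" for gaps, a column counts as conserved when every sequence carries the same residue, and the flank threshold is applied to each side separately. The function names and the default values (a 30-column window, 4 conserved columns) are hypothetical choices made for this sketch.

```python
from typing import Dict, List, Set, Tuple


def is_conserved(aln: Dict[str, str], i: int) -> bool:
    """A column is conserved if all sequences share one non-gap residue."""
    col = {seq[i] for seq in aln.values()}
    return len(col) == 1 and "-" not in col


def indel_kind(aln: Dict[str, str], ingroup: Set[str], i: int) -> str:
    """Return 'insert' or 'deletion' if column i separates ingroup from outgroup."""
    inside = {seq[i] == "-" for name, seq in aln.items() if name in ingroup}
    outside = {seq[i] == "-" for name, seq in aln.items() if name not in ingroup}
    if inside == {False} and outside == {True}:
        return "insert"      # residues only in the ingroup
    if inside == {True} and outside == {False}:
        return "deletion"    # gaps only in the ingroup
    return ""


def find_candidate_csis(aln: Dict[str, str], ingroup: Set[str],
                        window: int = 30, min_conserved: int = 4
                        ) -> List[Tuple[str, int, int]]:
    """List (kind, start column, length) for group-specific, well-flanked indels."""
    length = len(next(iter(aln.values())))
    hits = []
    i = 0
    while i < length:
        kind = indel_kind(aln, ingroup, i)
        if not kind:
            i += 1
            continue
        j = i
        while j + 1 < length and indel_kind(aln, ingroup, j + 1) == kind:
            j += 1
        left = sum(is_conserved(aln, k) for k in range(max(0, i - window), i))
        right = sum(is_conserved(aln, k)
                    for k in range(j + 1, min(length, j + 1 + window)))
        if left >= min_conserved and right >= min_conserved:
            hits.append((kind, i, j - i + 1))
        i = j + 1
    return hits


# Toy alignment; these are not real protein sequences.
toy = {
    "syn_1": "MKTAYLGDEQRVLKG",
    "syn_2": "MKTAYLGDEQRVLKG",
    "out_1": "MKTAYL--EQRVLKG",
    "out_2": "MKTAYL--EQRVLKG",
}
candidates = find_candidate_csis(toy, {"syn_1", "syn_2"})
```

A sketch of this kind only flags candidates. Whether an indel is a useful marker still depends on its species distribution in the wider BlastP searches.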

CSIs that are specific for the Synergistetes phylum

CSIs in proteins brought about by rare genomic changes that are restricted to phylogenetically well-defined groups are useful as molecular markers that provide means for evaluating evolutionary relationships (Gupta 1998; Rokas and Holland 2000). Our analyses of genome sequences from Synergistetes species have identified 32 CSIs that help define and demarcate the species of this phylum. Some characteristics of these Synergistetes-specific CSIs are listed in Table 2 and two examples are provided in Fig. 2. Figure 2a depicts an insert within the β′ subunit of the RNA polymerase enzyme (RpoC), an essential enzyme responsible for transcription of genes in all organisms. The region of the protein shown is highly conserved among all organisms and it contains a 2 aa insert that is specifically present in all homologues of the RpoC enzyme from species of the phylum Synergistetes. Another example of a conserved indel that is specific for all Synergistetes species is shown in Fig. 2b. In this case, the α-subunit of DNA polymerase III contains a 1 aa insert that is specifically present in all Synergistetes species but not in any other bacteria. The absence of both of the CSIs shown in Fig. 2 from all organisms other than the Synergistetes indicates that these CSIs constitute inserts rather than deletions. Within the RpoC protein, in the neighbourhood of the conserved insert that is shown in Fig. 2a, another very large insert consisting of between 311 and 316 aa is also uniquely present in all sequenced Synergistetes species. The sequence region corresponding to this large insert is shown in Fig. 3. BlastP searches with this insert show no significant hits for any proteins from organisms outside of the Synergistetes, indicating that this insert is a distinctive characteristic of the species of this phylum. Because of its large size, this insert likely forms a unique domain of the RpoC protein that is found only in the Synergistetes species.

Table 2 Characteristics of the CSIs that are specific for the Synergistetes phylum
Fig. 2

Partial sequence alignments of conserved regions within the a RNA polymerase β′ subunit (RpoC) and b DNA polymerase III α subunit showing two CSIs (boxed) that are uniquely present in species from the Synergistetes phylum, but not in any other bacteria. The dashes (–) in this and all other alignments indicate identity with the corresponding amino acid on the top line. The numbers in the second column are the GenBank identifier numbers of the particular proteins. The numbers below the taxon identifiers indicate the number of species detected with the indel and the total number of species of the respective taxon that were detected. Only representative species are shown in the alignments; however, no other species in the indicated number of blast hits contained the indel (0/250). Information for 12 other CSIs in widely distributed proteins that are specifically present in all sequenced species from the Synergistetes phylum is provided in Supplementary Figs. 1–12 and summarized in Table 2

Fig. 3

Partial amino acid sequence alignment of the RpoC protein showing a large insert that is specifically present in all sequenced Synergistetes species. Partial sequence for the neighbouring regions is also shown in the alignment. The dashes in this particular alignment represent sequence gaps. The identical and conserved residues in this alignment are indicated by asterisks (*) and colons (:), respectively. BlastP searches with the insert sequence (without the flanking region) show no significant hit for any protein except for the RpoC homologues from the Synergistetes species. Sequence information is shown for only a few Synergistetes, but this insert is present in all sequenced species

Other indels present in all genome-sequenced Synergistetes, and absent in species from other taxonomic groups, are depicted in Supplementary Figs. 1–12 and some of their characteristics are summarized in Table 2. These indels are present in proteins involved in important cellular processes such as DNA replication (e.g. DNA polymerase I), protein translation (30S ribosomal protein S1) and cell metabolism (2-oxoglutarate synthase). For some of these Synergistetes-specific CSIs, protein homologues for one or more Synergistetes species were not detected (Supplementary Figs. 7–12). A 3 aa insert in 2-oxoglutarate synthase (Supplementary Fig. 7) is an example of such an indel. The insert is present in all detected Synergistetes, but in this case the homologue for F. fastidiosum was not found in BlastP searches. It is possible that the gene coding for this protein has been lost from this species due to genetic, environmental or physiological factors. However, as fully published genome sequences among the Synergistetes species are available for only T. acidaminovorans, Tv. lienii and Amb. colombiense, the lack of a protein homologue for some of these species could also be due to the fact that their entire genomes have not yet been sequenced and/or annotated. Nevertheless, since these CSIs are found only in the Synergistetes species and not in any other bacteria (0/250; top 250 blast hits), they also provide reliable molecular markers for this group.
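The species-distribution counts reported for each CSI (for example 0/250 for species outside the phylum) amount to a simple tally per taxon. The sketch below assumes each blast hit has been reduced to a record of species name, taxon and whether the indel is present, with one record per species; the record layout, function name and example hits are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tally_by_taxon(hits: Iterable[Tuple[str, str, bool]]) -> Dict[str, Tuple[int, int]]:
    """Return, for each taxon, (species with the indel, species detected)."""
    with_indel: Counter = Counter()
    total: Counter = Counter()
    for _species, taxon, has_indel in hits:
        total[taxon] += 1
        if has_indel:
            with_indel[taxon] += 1
    return {taxon: (with_indel[taxon], total[taxon]) for taxon in total}


# Toy records only; they do not reproduce any result from this work.
toy_hits = [
    ("syn_1", "Synergistetes", True),
    ("syn_2", "Synergistetes", True),
    ("syn_3", "Synergistetes", True),
    ("firm_1", "Firmicutes", False),
    ("firm_2", "Firmicutes", False),
    ("prot_1", "Proteobacteria", False),
]
counts = tally_by_taxon(toy_hits)
```

Species for which no homologue was detected simply do not appear in the tally, which is why the denominator for a taxon can be smaller than the number of sequenced species.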

The indels identified above are completely specific for the Synergistetes species. However, a small number of other CSIs discovered in this work, in addition to being present in all Synergistetes, were also found in a small number (usually 1–2) of species belonging to other taxonomic groups. Two such examples are shown in Fig. 4. The first of these is another 2 aa long insert in the RpoC protein (Fig. 4a). This CSI is found in all Synergistetes in a highly conserved region of the protein; however, it is also present in the species Eubacterium yurii from the Clostridia class of the phylum Firmicutes. The CSI is not present in any other organisms, including other Firmicutes species. Different possibilities exist for the presence of the CSI in a single species outside of the phylum. The shared presence of the CSI in E. yurii, a species not considered to be directly related to the Synergistetes (see Fig. 1), might be the result of a lateral gene transfer (LGT) event wherein the Synergistetes gene containing the indel was introduced into E. yurii. Alternatively, it is possible that two separate genetic events led to the presence of similar CSIs in Synergistetes and E. yurii. A second example of such an indel is shown in Fig. 4b. Here a 2 aa insert is present within the 30S ribosomal protein S8 of Synergistetes species and an uncultured Termite group 1 phylotype Rs-D17 considered to belong to the phylum Elusimicrobia. As shown, Elusimicrobium minutum itself contains a 1 aa insert in a similar position in the protein. It is possible that the Elusimicrobia are a sister taxon of the Synergistetes and the indel has been passed on to both phyla through a common ancestor. However, this postulation is not supported by the phylogenetic trees (Fig. 1) and it is possible that the CSI in these two taxa occurred independently or by means of LGT. The information for other CSIs where indels found in Synergistetes are also present in one or two species from other taxa is summarized in Table 2 and the sequence alignments for these are presented in Supplementary Figs. 13–20.

Fig. 4

Partial sequence alignments of RpoC and RpsH proteins showing two CSIs which, in addition to the Synergistetes species, are also present in isolated species from other bacterial groups. a Excerpt from the RpoC sequence alignment depicting a 2 aa conserved insert which, in addition to the Synergistetes, is also present in Eubacterium yurii. b Sequence alignment of the ribosomal protein S8 (RpsH) showing a 2 aa insert which, in addition to the Synergistetes, is also found in an uncultured Termite group 1 bacterium phylotype Rs-D17. A 1 aa insert in this position is also present in Elusimicrobium minutum. Sequence information for 8 other CSIs in different proteins containing an isolated exception is provided in Supplementary Figs. 13–20 and Table 2

A further seven CSIs, specific for species of the Synergistetes phylum, were discovered where one species from the phylum was detected to lack the indel. A 3 aa insert in the 3,4-dihydroxy-2-butanone 4-phosphate synthase (Supplementary Fig. 21) is an example of such a CSI. The insert is present in all detected species of the Synergistetes except for P. piscolens. No species outside of the phylum contains the insert. The information for other such CSIs is summarized in Table 2 and sequence alignments for them are presented in Supplementary Figs. 21–27. It is possible that these CSIs were also originally introduced in a common ancestor of the Synergistetes phylum but were lost in some species over time due to ecological/physiological pressures or by mechanisms such as LGT followed by gene loss. In some of the CSIs described above, in addition to the CSIs that were specific for the Synergistetes, indels of different lengths were also present in species from other taxonomic groups. Due to their different lengths, these indels have likely originated from independent genetic events.
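The three kinds of phylum-level CSIs distinguished in this section (fully specific, present in one or two species from other taxa, and lacking in one species of the phylum) can be expressed as a small decision rule. The sketch below is an illustration under assumptions: the category labels, the function name and the cut-off of two outside species (taken from the "usually 1–2" wording above) are choices made for this example and are not a formal definition from this work.

```python
from typing import Set


def classify_csi(ingroup_with: Set[str], ingroup_detected: Set[str],
                 outgroup_with: Set[str], max_exceptions: int = 2) -> str:
    """Assign a CSI to one of the distribution categories described in the text."""
    # Ingroup species whose homologue was detected but lacks the indel.
    lacking = ingroup_detected - ingroup_with
    if not lacking and not outgroup_with:
        return "specific"
    if not lacking and len(outgroup_with) <= max_exceptions:
        return "specific with isolated outside exceptions"
    if len(lacking) == 1 and not outgroup_with:
        return "specific but lacking in one ingroup species"
    return "not group-specific"


# Toy species labels only.
detected = {"syn_1", "syn_2", "syn_3"}
label_all = classify_csi(detected, detected, set())
label_exception = classify_csi(detected, detected, {"out_1"})
label_lacking = classify_csi({"syn_1", "syn_2"}, detected, set())
```

Species for which no homologue was detected are left out of `ingroup_detected`, so a missing homologue does not count against a CSI in this sketch.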

CSIs that are specific for subgroups of the Synergistetes phylum

All Synergistetes species are currently classified as part of a single class (Synergistia), order (Synergistales) and family (Synergistaceae) (Jumas-Bilak et al. 2009; www.bacterio.cict.fr). The relationships among the species/genera of this phylum are not well understood. In the phylogenetic trees based upon concatenated protein sequences and the 16S rRNA a number of strongly supported relationships among the species within this phylum are observed (Fig. 1). Importantly, in the present work, our analyses of protein sequences from Synergistetes have led to discovery of several CSIs that are commonly shared only by species from this phylum and that are absent in all others. These CSIs independently support a specific evolutionary relationship among these species and they, in conjunction with the results from phylogenetic analyses, can be used for determination of the relationships among the members of the phylum Synergistetes.

In the phylogenetic trees shown in Fig. 1, a clade consisting of D. peptidovorans, J. anthropi and P. piscolens is supported with high statistical support in both the concatenated protein tree and the rRNA tree. In our analysis, we have identified seven indels (Table 3) that are uniquely present in these three species supporting independently that these three species are closely related and form a distinct clade within the Synergistetes phylum. The first of these is a 4 aa deletion in the penicillin-binding protein 1A family protein which is involved in cell wall construction (Fig. 5). This deletion is found only in homologues of the protein from D. peptidovorans, P. piscolens and J. anthropi and all other Synergistetes, as well as non-Synergistetes species, lack this deletion. An additional 6 CSIs specific to these three organisms were discovered in the proteins tRNA modification enzyme TrmE, ribonucleoside diphosphate reductase, putative DEAD/DEAH box helicase, RpoB and the PlsC proteins. Information for these CSIs is summarized in Table 3 and their sequence alignments are presented in Supplementary Figs. 28–33. Among the three organisms which are part of this clade, J. anthropi and P. piscolens were observed as being more closely related to each other than either is to D. peptidovorans. This close association is underscored by a total of 15 CSIs, including an example that is shown in Fig. 6, a 1 aa deletion in a conserved region of the enzyme RNA polymerase β subunit. Other indels that provide similar molecular evidence for the observed close relationships between these two genera are presented in Supplementary Figs. 34–47 and information for them is summarized in Table 4. 
The fidelity of these molecular markers can be tested on cultured but unsequenced members of the phylum Synergistetes and, as more species belonging to these genera are sequenced, the identified CSIs should provide molecular markers for their inclusion in the clade formed by this sub-group of the phylum.

Table 3 Characteristics of the CSIs that are Specific for a Clade Consisting of J. anthropi, P. piscolens and D. peptidovorans
Fig. 5

Partial sequence alignment of a family 1A penicillin-binding protein containing a 4 aa deletion that is specific for D. peptidovorans, P. piscolens and J. anthropi. Sequence information for six other CSIs that are specific for this clade of species is presented in Table 3 and Supplementary Figs. 28–33

Fig. 6

Excerpts from sequence alignment for RNA polymerase β subunit (RpoB) showing a 1 aa deletion that is specifically present in J. anthropi and P. piscolens. The region contains a 1 aa deletion specific for the three species. Sequence information for 14 other CSIs that are specific for these two species is presented in Table 4 and Supplementary Figs. 34–47

Table 4 Characteristics of CSIs that are specific for J. anthropi and P. piscolens

The phylogenetic trees also support a cladal relationship between two other species, Amm. paucivorans and T. acidaminovorans, which branch as sister organisms with high statistical support (Fig. 1). The clade harbouring these genera has been proposed to form a higher-level taxon within the phylum (Jumas-Bilak et al. 2009). In the present work, we have identified 7 CSIs that differentiate the species representing these two genera from all other species and support a specific grouping of the genera Thermanaerovibrio and Aminomonas (Table 5). Among these CSIs is a 2 aa insert in the enzyme S-adenosyl-methionine isomerase (Fig. 7). The information for 6 other CSIs supporting a specific relationship between these two species is provided in Table 5 and their sequence alignments are depicted in Supplementary Figs. 48–53. Two other CSIs identified in the present work, a 1–2 aa deletion in the ribosomal protein L13 (Supplementary Fig. 54) and a 2 aa deletion in DNA gyrase B (Supplementary Fig. 55), are present in all detected Synergistetes species except Thermanaerovibrio and Aminomonas. The absence of these CSIs in the two species suggests that this clade may have diverged from the common Synergistetes ancestor before the other species of the phylum, and that the two indels may have been introduced after the divergence of this clade from the common Synergistetes ancestor. A loss of these signatures from this clade after its divergence from other Synergistetes can also explain the observation. These indels also support a close relationship between the genera Thermanaerovibrio and Aminomonas, and information for them is also summarized in Table 5. These two species were observed to branch in a weakly supported clade with Tv. lienii and An. hydrogeniformans (Fig. 1). However, only 1 CSI supporting the three-species clade with Tv. lienii was identified, in a protein of unknown function (Supplementary Fig. 56), and no CSI specific for all four organisms was discovered.

Table 5 Characteristics of the CSIs that are specific for a clade consisting of T. acidaminovorans and Amm. paucivorans
Fig. 7

Partial sequence alignment of S-adenosylmethionine/tRNA-ribosyltransferase-isomerase protein showing a 2 aa insert in a conserved region that is specific for T. acidaminovorans and Amm. paucivorans species. Six other CSIs that are specific for these two species have also been identified (Table 5 and Supplementary Figs. 48–53)

In the phylogenetic trees (Fig. 1), the species Amb. colombiense and F. fastidiosum were observed to branch with J. anthropi, P. piscolens and D. peptidovorans. A specific relationship among these species is also supported by two of the identified CSIs. The first of these CSIs consists of a 1 aa deletion in GyrB that is uniquely present in all five of these species (Fig. 8a). Another CSI in orotidine 5′-phosphate decarboxylase, also consisting of a 1 aa deletion, is commonly shared by Amb. colombiense, J. anthropi, P. piscolens and D. peptidovorans (Fig. 8b). A homologue for this protein was not detected for F. fastidiosum, whose genome has not been fully sequenced. Because the other four members of the clade share this CSI, it is likely that it will also be present in F. fastidiosum, in which case it would provide an additional molecular marker for this clade.

Fig. 8

Partial sequence alignments of a DNA Gyrase subunit B and b orotidine 5′-phosphate decarboxylase showing two CSIs in conserved regions that are specific for the species D. peptidovorans, J. anthropi, Amb. colombiense, P. piscolens, F. fastidiosum, which form or define a higher clade within the Synergistetes group of species

CSIs that are commonly shared by species of the Synergistetes phylum with other taxonomic groups

The Synergistetes is a taxonomic group that has only recently been identified as a separate phylum within the bacterial domain. Though it branches distinctly in the 16S rRNA trees with long branches separating it from other bacterial groups (Fig. 1; Jumas-Bilak et al. 2009), species from the phylum had previously been classified as part of Syntrophomonadaceae family in the Firmicutes (Baena et al. 1998, 1999a; Diaz et al. 2007), grouped with Deferribacteres (Garrity et al. 2004) and misclassified as Selenomonas (Guangsheng et al. 1992). The presence or absence of CSIs that associate these groups with the Synergistetes should prove helpful in determining whether any link exists between the Synergistetes and these other groups of bacteria.

In our analysis we have identified some CSIs that, along with being present in some or all of the Synergistetes species, are also present in other groups of organisms. Two examples of such indels are presented in Fig. 9. Figure 9a shows a 1 aa insert in the MiaB-family RNA modification enzyme that is uniquely present in all detected Synergistetes as well as in various species from the phylum Chloroflexi. All other bacteria lack this insert. Similarly, in the DNA polymerase III α-subunit, a 1 aa insert is present in all detected Synergistetes and also in various Fusobacteria, an Opitutaceae species and Thermomicrobium (Fig. 9b). In phylogenetic trees constructed from these protein sequences, the Synergistetes species do not branch with species from these taxa (unpublished results), indicating that the shared presence of these CSIs is due neither to these taxa being sister groups of the Synergistetes nor to LGTs. The CSIs in these groups have thus likely originated independently. Other CSIs that the Synergistetes share with species from other taxonomic groups are listed in Table 6, and sequence information for them is provided in Supplementary Figs. 57–74. These other taxa include the Fusobacteria (Supplementary Figs. 57–61), Elusimicrobia (Supplementary Figs. 61, 62), class Negativicutes (Supplementary Figs. 63–66), Acidobacteria (Supplementary Fig. 67), Proteobacteria (Supplementary Figs. 68–70), Aquificae (Supplementary Fig. 71), Erysipelotrichi (Supplementary Fig. 72), Actinobacteria (Supplementary Fig. 73) and order Lactobacillales (Supplementary Fig. 74). The Synergistetes share the greatest number (five) of these CSIs with the Fusobacteria, and they share only 1–2 indels with most other taxonomic groups. In many cases where the Synergistetes share CSIs with other taxa, only some species from the Synergistetes or from the other taxa contain the indel. For these other CSIs, the indels may have arisen independently through separate genetic events, or their shared presence in some cases may be due to LGTs.

Fig. 9

Examples of CSIs that are commonly shared by Synergistetes species and other groups of bacteria. (a) A CSI consisting of a 1 aa insert in the MiaB-family RNA modification enzyme that is commonly shared by different Synergistetes and Chloroflexi. (b) A 1 aa insert in a conserved region of the DNA polymerase III α subunit that is shared by all detected Synergistetes and various Fusobacteria, as well as by two other bacteria belonging to the Chloroflexi and Verrucomicrobia phyla

Table 6 Conserved indels common to Synergistetes and shared with other groups

Discussion and concluding remarks

The Synergistetes are a relatively unknown group of species living ubiquitously in anaerobic environments. Though some characteristics of the isolated Synergistetes are known, such as their gram-negative morphology and their ability to ferment amino acids, no single molecular, morphological or physiological characteristic is known that distinguishes them as a group from other bacterial organisms. Utilizing the available genomic data for this group of organisms, we report here the identification of over 60 novel CSIs specific for the species of the Synergistetes phylum. Of these CSIs, 32 were found to be specific for all or most Synergistetes species (maximum of three exceptions unrelated to each other). These CSIs are present in widely distributed proteins with important cellular functions and they are rarely present in protein homologues of species outside of the phylum. As they are present in most or all Synergistetes and largely absent from bacteria of all other taxonomic groups, they provide strong evidence that species of the Synergistetes phylum constitute a monophyletic group that is distinct from all other prokaryotic taxa. These CSIs also provide novel molecular means for identification and circumscription of species from this phylum.
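The specificity criterion used above (present in all or most members of the group, with no more than three exceptions) can be illustrated with a minimal sketch. The function name, the input layout and the toy species labels are assumptions made for illustration only and do not represent the procedure used in this study. The sketch counts both group members lacking the indel and outside species carrying it as exceptions, which is one possible reading of the criterion, and it does not test whether the exceptions are unrelated to each other.

```python
def is_group_specific(has_indel, ingroup, max_exceptions=3):
    """Return True if an indel is confined to `ingroup` with few exceptions.

    has_indel: dict mapping species name -> True/False (indel present?)
    ingroup:   set of species names in the group of interest
    """
    # Group members whose homologue was examined and carries the indel.
    present_in_group = [s for s in ingroup if has_indel.get(s, False)]
    if not present_in_group:
        return False  # the indel is not found in the group at all

    # Exceptions of the first kind: group members that lack the indel.
    missing_in_group = [s for s in ingroup
                        if s in has_indel and not has_indel[s]]
    # Exceptions of the second kind: outside species that carry the indel.
    present_outside = [s for s, present in has_indel.items()
                       if present and s not in ingroup]

    return len(missing_in_group) + len(present_outside) <= max_exceptions


# Toy presence/absence data; every species label here is hypothetical.
toy = {
    "Syn_1": True, "Syn_2": True, "Syn_3": True, "Syn_4": False,
    "Out_1": False, "Out_2": False, "Out_3": True,
}
synergistetes = {"Syn_1", "Syn_2", "Syn_3", "Syn_4"}

print(is_group_specific(toy, synergistetes))
```

In this toy case there are two exceptions (one group member without the indel and one outside species with it), so the indel would still be scored as group-specific under the default threshold of three.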

The bacteria belonging to the Synergistetes have been classified into 12 different genera (and a candidate genus) within the phylum. Despite the recognition of numerous species and genera, due to the lack of reliable biological characteristics that can identify the interrelationships among these bacteria, all genera are presently grouped into a single class, order and family. Numerous CSIs were discovered during the course of the study that were present in only certain clades of species within the Synergistetes phylum and absent from others. The group specificities of these CSIs are summarized in Fig. 10. Specifically, 7 CSIs were found only in the J. anthropi, P. piscolens and D. peptidovorans species; 15 CSIs were identified that are specific for the J. anthropi and P. piscolens species (or differentiate them from other Synergistetes); and 9 other CSIs differentiated T. acidaminovorans and Amm. paucivorans from other members of the phylum. In addition, two of the discovered CSIs also supported a grouping together of the J. anthropi, P. piscolens, D. peptidovorans, Amb. colombiense and F. fastidiosum species. These relationships are also consistently observed in phylogenetic trees created for the Synergistetes group, and the identified CSIs provide valuable markers that consolidate these relationships. Furthermore, it should be noted that in contrast to the CSIs supporting these relationships, very few, if any, CSIs that supported alternative relationships among these species were detected. Thus, the identified CSIs provide independent evidence for the existence of these clades and provide molecular means to demarcate and circumscribe them. The evidence based upon the identified CSIs supports the division of the phylum Synergistetes into a number of distinct families (or other higher taxonomic groupings), and a formal proposal in this regard will be made in future work. Though the branching and interrelationships of several species within the phylum are well supported by multiple CSIs, the relationships of Tv. lienii and An. hydrogeniformans, and also to some extent Amb. colombiense and F. fastidiosum, to other Synergistetes species were not resolved by the identified CSIs. This problem may be addressed as genome sequences for additional Synergistetes species become available.

Fig. 10

A summary diagram portraying the species distribution of various identified Synergistetes-specific CSIs and the evolutionary stages where the genetic changes responsible for them have occurred

As previously mentioned, the Synergistetes have often been misclassified as lower-ranked taxa within other phylogenetic divisions. In our analysis, some CSIs were also discovered that were shared by Synergistetes species along with species from other taxonomic groups. The organisms sharing such indels included species from the Fusobacteria, Chloroflexi, Proteobacteria, Acidobacteria, Aquificae and Firmicutes phyla. Most of these groups shared no more than 1–3 CSIs, and in many cases only a few species within the groups contained the indels. Geissinger et al. (2009) presented a study suggesting a shared common ancestor for Elusimicrobium and Synergistetes, and a recent study by Gupta (2011) also suggested that the Negativicutes, Fusobacteria, Elusimicrobia and Synergistetes phyla might be closely related to each other based on their cell membrane structure and shared indels in their DnaK and GroEL proteins. Though the Elusimicrobia and Fusobacteria share some CSIs with the Synergistetes species, no CSI was found that was specifically shared by all detected Synergistetes and species from these taxa. Furthermore, the branching of these phyla in the protein trees (Fig. 1) does not support their close relationship with the Synergistetes. Hence, based upon these results, at present no clear relationship of the Synergistetes species to other bacterial phyla can be inferred. These results provide further evidence supporting the placement of Synergistetes species into a distinct phylum.

Due to their specificity, Synergistetes-specific CSIs provide interesting prospects for future research. Since these CSIs are present in conserved regions of various proteins, degenerate primers based on the flanking conserved regions can be designed as a means of identifying various species of the phylum in different environments (Gao and Gupta 2005). This might prove to be especially useful, as it is surmised that the universal primers utilized for detection of organisms in metagenomic studies may not efficiently identify some Synergistetes species (Hamady and Knight 2009). As molecular markers, the phylum-specific CSIs can be useful as an identification tool for detection of known and unknown species in metagenomics experiments. These CSIs can also assist in the classification of newly discovered bacteria into the phylum Synergistetes and its sub-groups.
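As a rough illustration of how a CSI flanked by conserved regions might be screened for in a newly obtained sequence, the following minimal sketch measures the spacing between two flanking motifs in a protein sequence and compares it with the spacing expected when no indel is present. The function name, the motifs and the toy sequences are all hypothetical and are not taken from the alignments in this study. Exact motif matching is a simplifying assumption, since real homologues vary within conserved regions and would normally be compared through a sequence alignment, and the sketch works at the protein level rather than with primers on DNA.

```python
def indel_length(seq, left_flank, right_flank, reference_gap):
    """Return the size of an indel located between two flanking motifs.

    A positive value indicates an insert, a negative value a deletion and
    zero no indel, all relative to `reference_gap` (the number of residues
    between the motifs in sequences without the indel). Returns None if
    either motif cannot be found.
    """
    start = seq.find(left_flank)
    if start == -1:
        return None
    gap_start = start + len(left_flank)

    # Search for the right-hand motif only downstream of the left-hand one.
    end = seq.find(right_flank, gap_start)
    if end == -1:
        return None

    observed_gap = end - gap_start
    return observed_gap - reference_gap


# Hypothetical motifs and sequences, invented purely for illustration.
LEFT, RIGHT, REFERENCE_GAP = "GDTAK", "WLVEN", 4

without_insert = "MKGDTAKRSTPWLVENQ"   # four residues between the motifs
with_insert = "MKGDTAKRSGTPWLVENQ"     # five residues between the motifs

print(indel_length(without_insert, LEFT, RIGHT, REFERENCE_GAP))
print(indel_length(with_insert, LEFT, RIGHT, REFERENCE_GAP))
```

For these toy inputs the first call reports no indel and the second reports a 1 aa insert. A sequence lacking either motif yields None, which a screening workflow could treat as "homologue not detected" rather than as absence of the CSI.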

Finally, some species of Synergistetes have also been notoriously difficult to culture/isolate (Vartoukian et al. 2010) and, for others, their biological nuances have just begun to be understood. It has been suggested that Synergistetes act in concert with other oral bacteria to degrade proteinaceous compounds in periodontitis lesions (Homer and Beighton 1992; Wei et al. 1999; Vartoukian et al. 2007). Prior functional studies on taxa-specific CSIs have shown that such indels are usually present in peripheral regions of proteins and that they tend to be essential for the function of the proteins in the organisms where the CSIs occur (Itzhaki et al. 2006; Akiva et al. 2008; Hormozdiari et al. 2009; Singh and Gupta 2009). Hence, agents that bind to these CSIs and inhibit the cellular functions of the proteins containing them could provide novel therapeutics that are specifically directed against this group of bacteria. In addition, the molecular markers discovered in this study, due to their specificity for Synergistetes species, provide novel and valuable means for understanding the contribution of this group of bacteria to the environment and to the microbial communities that they inhabit. Thus, analyses devoted to the understanding of the function of these CSIs should provide important insights into the biochemical and physiological properties that define the Synergistetes and their roles in different environments.