Introduction

The members of the order Pasteurellales are Gram-negative, non-motile and aerobic to facultative anaerobic bacteria, which constitute one of the main orders within the Class Gammaproteobacteria (Pohl 1981; Mutters et al. 1989; Paster et al. 1993; Olsen et al. 2005; Christensen et al. 2007; Christensen and Bisgaard 2010). The order Pasteurellales presently contains a single family, Pasteurellaceae, that is made up of at least 15 genera and >70 species (see http://www.the-icsp.org/taxa/Pasteurellaceaelist.htm; Christensen and Bisgaard 2010). These bacteria are commonly present as commensals in the mucosal membranes of the respiratory, alimentary and reproductive tracts of various vertebrates (mainly birds and mammals) including humans (Bisgaard 1993; Olsen et al. 2005; Christensen and Bisgaard 2010). The presence of these bacteria in both healthy as well as diseased vertebrates indicates that they are opportunistic pathogens and several of them are important human and animal pathogens. For example, Haemophilus influenzae, Haemophilus ducreyi and Aggregatibacter (Agg.) actinomycetemcomitans are respectively involved in the causation of bacteremia, pneumonia and acute bacterial meningitis; the sexually transmitted disease chancroid; and juvenile periodontitis in humans (Bisgaard 1993; Fleischmann et al. 1995; Spinola et al. 2002; Olsen et al. 2005; Christensen and Bisgaard 2010). Other species such as Mannheimia (Man.) haemolytica, Pasteurella multocida and Actinobacillus (Act.) pleuropneumoniae are causative agents of the shipping fever in cattle, fowl cholera and pleuropneumonia in pigs, respectively (Bisgaard 1993; Bosse et al. 2002; Gioia et al. 2006).

The Pasteurellales are presently distinguished from other bacteria primarily on the basis of their branching in 16S rRNA gene sequence trees, where they form a distinct cluster (Mutters et al. 1989; De Ley et al. 1990; Dewhirst et al. 1992; Dewhirst et al. 1993; Olsen et al. 2005; Christensen and Bisgaard 2006; Christensen and Bisgaard 2010). The species from this order/family also form a distinct clade in phylogenetic trees based on numerous other genes and protein sequences (Korczak et al. 2004; Christensen et al. 2004; Kuhnert and Korczak 2006; Gao et al. 2009; Williams et al. 2010). Some morphological and nutritional characteristics such as lack of motility, requirement for sodium ions, V-factor and organic nitrogen sources for growth, are often used to distinguish these bacteria from other orders of Gammaproteobacteria (e.g. Vibrionales, Aeromonadales, Enterobacteriales and Alteromonadales) (Olsen 1993; Kainz et al. 2000; Olsen et al. 2005; Christensen and Bisgaard 2006; Hayashimoto et al. 2007). However, none of these characteristics are unique for the Pasteurellales and reliance only on them can lead to incorrect identification/placement of species in this group and its various genera (Christensen et al. 2004; Olsen et al. 2005; Christensen et al. 2007; Christensen and Bisgaard 2010). Presently, no convincing molecular or biochemical characteristic is known that is uniquely shared by various Pasteurellales and which can be used to clearly distinguish this group of bacteria from all others. Our current understanding of the phylogeny/taxonomy for these bacteria is also unsatisfactory (Olsen et al. 2005; Christensen and Bisgaard 2006). For example, several of the genera classified within Pasteurellales (viz. Haemophilus, Actinobacillus and Mannheimia) are not monophyletic and species from them branch in a number of different clusters with other members of this group (Olsen et al. 2005; Gioia et al. 2006; Redfield et al. 2006; Christensen and Bisgaard 2006; Christensen and Bisgaard 2010; Bonaventura et al. 2010). Although suggestions have been made to restrict these genera to a limited number of species (Olsen et al. 2005; Christensen and Bisgaard 2006), the taxonomy of members of the Pasteurellales/Pasteurellaceae is clearly unsatisfactory at present (Christensen et al. 2007; Christensen and Bisgaard 2010; Bonaventura et al. 2010). Thus, it is important to identify other novel sequence based characteristics that could provide reliable means for the identification of species from this order and which could also prove useful in clarifying their taxonomy and evolutionary relationships.

Since the sequencing of the first genome, that of H. influenzae, in 1995 (Fleischmann et al. 1995), sequence data for more than 1500 bacteria covering all major bacterial phyla have become available (http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/micr.html). Of these genomes, 20 are from Pasteurellales species/strains representing five genera from this family (Table 1). These genome sequences provide an unprecedented and valuable resource for discovering novel molecular characteristics that are uniquely shared by either all Pasteurellales or specific groups/clades of these bacteria and could provide more reliable means for their identification (Shah et al. 2009). Using genomic sequences, our recent work has focused on identifying two different types of molecular markers that are specific for different groups of bacteria. One type of molecular marker consists of conserved signature inserts or deletions (i.e. indels; CSIs) in widely distributed proteins that are specifically present in particular groups of bacteria (Gupta 2000; Gupta and Mok 2007; Gupta 2009; Gupta 2010). Whole proteins that are uniquely present in particular groups of bacteria provide another type of molecular marker that is useful for these studies (Gupta 2006; Gupta and Griffiths 2006; Gupta and Mathews 2010). Our recent work has identified large numbers of CSIs for a number of major taxa within bacteria (viz. Alphaproteobacteria, Epsilonproteobacteria, Chlamydiae, Actinobacteria, Cyanobacteria, Bacteroidetes-Chlorobi, Deinococcus-Thermus) and for many of their subgroups (Gupta and Griffiths 2006; Gupta and Mathews 2010). Recently, some molecular signatures for the Class Gammaproteobacteria as a whole were also identified (Gao et al. 2009).

Table 1 Sequence characteristics of the Pasteurellales genomes

In the present work, we have employed these comparative genomic approaches in conjunction with phylogenetic analysis for investigation of the available Pasteurellales genomes. The primary objective of this work is to identify novel molecular markers consisting of conserved signature indels (CSIs) that are unique to either all Pasteurellales or its major subgroups/clades. Our work has identified >40 CSIs that are specific for all (or most) genome sequenced Pasteurellales species/strains. In addition, we also describe many CSIs that are specific for a number of distinct subclades of Pasteurellales, which are also supported by phylogenetic analyses. These molecular signatures provide valuable means for the identification of members of the Pasteurellales and a number of their subclades and for the division of Pasteurellales into two distinct groups.

Methods

Phylogenetic analysis

Phylogenetic analysis was performed on a concatenated sequence alignment for 10 highly conserved proteins (viz. 50S ribosomal protein L5, RNA polymerase subunit beta (RpoB), prolyl-tRNA synthetase, chaperone protein DnaK, threonyl-tRNA synthetase, valyl-tRNA synthetase, cell division protein FtsY, alanyl-tRNA synthetase, translation initiation factor IF-2, DNA gyrase subunit B) that are present in most extant bacteria (Harris et al. 2003) and which have been extensively used for phylogenetic studies (Korczak et al. 2004; Christensen et al. 2004; Gao et al. 2009; Gupta 2009). The sequences for these proteins for various Pasteurellales and several other Gammaproteobacteria, which served as outgroup, were retrieved and multiple sequence alignments for them were created using the CLUSTAL_X 1.83 program (Jeanmougin et al. 1998). After concatenation, the poorly aligned regions from the sequence alignment were removed using the Gblocks 0.91b program (Castresana 2000). The resulting alignment, which consisted of 6783 characters, was employed for phylogenetic analyses. A neighbour-joining (NJ) tree based upon 500 bootstrap replicates of this sequence alignment was constructed employing Kimura’s distance calculation using the TREECON 1.3 program (Van de Peer and De Wachter 1994).
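The concatenation, distance and bootstrap steps described above can be illustrated with a minimal Python sketch. It is not the CLUSTAL_X/Gblocks/TREECON pipeline used in this work: the function names and the toy sequences are hypothetical, and the distance shown is Kimura's empirical protein correction, d = -ln(1 - p - 0.2p^2), which is assumed here to correspond to the Kimura option referred to in the text.

```python
import math
import random

def concatenate(alignments, species):
    """Join per-protein alignments (dict: species -> aligned seq) end to end."""
    return {sp: "".join(aln[sp] for aln in alignments) for sp in species}

def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)

def kimura_distance(a, b):
    """Kimura empirical protein distance: d = -ln(1 - p - 0.2 * p**2)."""
    p = p_distance(a, b)
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -math.log(inner)

def bootstrap_replicate(concat, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(concat.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {sp: "".join(seq[i] for i in cols) for sp, seq in concat.items()}

# Toy, made-up alignments for two proteins and three placeholder taxa
aln1 = {"A": "MKTAYIAK", "B": "MKTAYLAK", "C": "MRSAYLGK"}
aln2 = {"A": "GLSDE-RT", "B": "GLSDEQRT", "C": "GISDEQKT"}
concat = concatenate([aln1, aln2], ["A", "B", "C"])
d_ab = kimura_distance(concat["A"], concat["B"])
d_ac = kimura_distance(concat["A"], concat["C"])
replicate = bootstrap_replicate(concat, random.Random(0))
```

In a full analysis, a distance matrix would be computed for each of the 500 resampled alignments and a neighbour-joining tree built from each; the bootstrap value of a node is the number of replicate trees in which that grouping is recovered.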

Identification of CSIs for members of the order Pasteurellales

To identify conserved indels in protein sequences that might be specific for the Pasteurellales, Blastp searches were performed on all proteins from the genome of Aggregatibacter aphrophilus NJ8700 (Di Bonaventura et al. 2009). For those proteins/ORFs for which high scoring homologues were present in most Pasteurellales species/strains as well as certain outgroup species, sequences for 10–15 high scoring homologues were retrieved from diverse Pasteurellales and other bacteria and their multiple sequence alignments were constructed using the Clustal_X 1.83 program. These sequence alignments were visually inspected to identify any conserved inserts or deletions that were restricted to either all Pasteurellales or its major clades and which were flanked by at least 5–6 identical/conserved residues in the neighboring 30–40 amino acids on each side. The indels that were not flanked by conserved regions were not studied further as they do not provide useful molecular markers (Gupta 1998; Gupta 2000; Gupta 2009). The conserved indels, which in addition to the Pasteurellales were also present in a few other bacteria, were also retained. The indels for individual species or smaller clades were not analyzed in detail in the present work. The species distribution patterns of all such indels were further evaluated by detailed Blastp searches on short sequence segments containing the indels and their flanking conserved regions (Gupta 2009). The sequence information for various conserved indels from all Pasteurellales and some representative high scoring Gammaproteobacteria was compiled into signature files. Due to space considerations, sequence information for different strains of the same species is not shown, but the indicated CSIs were present in all of the sequenced strains. Further, unless otherwise noted, all of these CSIs are specific for the indicated groups.
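Although the alignments in this work were inspected visually, the screening criteria described above (an indel restricted to the group of interest and flanked by conserved residues) can be sketched programmatically. The following Python sketch is only an illustration: the function name, its parameters and the toy sequences are hypothetical, and "conserved" is simplified to mean a column that is identical in every sequence.

```python
def find_group_specific_indels(aln, ingroup, flank=30, min_conserved=5):
    """Return (start, end, kind) for gap runs that split the in-group from the rest.

    aln: dict name -> aligned sequence (equal lengths, '-' for gaps)
    kind: 'insert' if the in-group has residues where all others have gaps,
          'deletion' if the in-group has gaps where all others have residues.
    """
    names = list(aln)
    length = len(aln[names[0]])
    inside = [n for n in names if n in ingroup]
    outside = [n for n in names if n not in ingroup]
    if not inside or not outside:
        raise ValueError("need at least one in-group and one out-group sequence")

    def column_kind(i):
        in_gap = [aln[n][i] == "-" for n in inside]
        out_gap = [aln[n][i] == "-" for n in outside]
        if not any(in_gap) and all(out_gap):
            return "insert"
        if all(in_gap) and not any(out_gap):
            return "deletion"
        return None

    def conserved(i):
        col = {aln[n][i] for n in names}
        return len(col) == 1 and "-" not in col

    hits, i = [], 0
    while i < length:
        kind = column_kind(i)
        if kind is None:
            i += 1
            continue
        j = i
        while j + 1 < length and column_kind(j + 1) == kind:
            j += 1
        # Count fully conserved columns in the flanking windows on each side
        left = sum(conserved(k) for k in range(max(0, i - flank), i))
        right = sum(conserved(k) for k in range(j + 1, min(length, j + 1 + flank)))
        if left >= min_conserved and right >= min_conserved:
            hits.append((i, j, kind))
        i = j + 1
    return hits

# Toy, made-up alignment: two in-group and two out-group sequences
toy = {
    "in1":  "MKVLGDPRTAQGHEWLKNSY",
    "in2":  "MKVLGDPRTSQGHEWLKNSY",
    "out1": "MKVLGDPRT--GHEWLKNSY",
    "out2": "MKILGDPRT--GHEWLKNSY",
}
hits = find_group_specific_indels(toy, {"in1", "in2"})
```

A candidate found this way would still need the follow-up step described above, namely Blastp searches with the indel and its flanking regions to establish its species distribution.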

Results

Phylogenetic analysis of Pasteurellales

The evolutionary relationships among Pasteurellales were in the past examined mainly on the basis of phylogenetic trees for the 16S rRNA gene and a number of individual protein sequences (Dewhirst et al. 1993; Korczak et al. 2004; Christensen et al. 2004; Olsen et al. 2005; Christensen and Bisgaard 2006). However, the availability of genome sequences now enables one to determine the branching order of these species based upon concatenated sequences for large numbers of proteins. The trees based upon large numbers of characters derived from multiple proteins provide a more reliable indication of the phylogenetic relationships within a given group than those based on any single gene or protein (Rokas et al. 2003; Ciccarelli et al. 2006; Gao et al. 2009; Wu et al. 2009; Williams et al. 2010). Previously, Redfield et al. (2006) and Gioia et al. (2006) have reported construction of phylogenetic trees for eight Pasteurellales species (viz. H. influenzae, H. ducreyi, Haemophilus somnus, P. multocida, Act. pleuropneumoniae, Agg. actinomycetemcomitans, Mannheimia succiniciproducens and Man. haemolytica), based upon concatenated sequences for 12 and 50 conserved proteins, respectively. More recently, Bonaventura et al. (2010) have carried out detailed phylogenetic analyses for 12 Pasteurellales genomes representing 10 species (the above eight species plus Agg. aphrophilus and Actinobacillus succinogenes) based upon concatenated sequences for different orthologous proteins found in their genomes. Although these trees provide useful resources for understanding the evolutionary relationships among the indicated Pasteurellaceae species/strains, in the past 2–3 years sequences for a number of new Pasteurellaceae species (viz. Haemophilus parasuis, Actinobacillus minor and Pasteurella dagmatis), as well as additional strains for several species, have become available in the NCBI database (Table 1). A few characteristics of these genomes, some of which are draft genomes, are listed in Table 1. In order to determine the evolutionary significance of various CSIs identified by our analyses, it was necessary to construct a phylogenetic tree that included sequence information for all of these Pasteurellales. In the present work, phylogenetic trees for 20 Pasteurellales species/strains representing 13 species were constructed based upon concatenated sequences for 10 conserved proteins.

A NJ distance tree for the above Pasteurellales species that was rooted using other Gammaproteobacteria (viz. Vibrionales or Aeromonadales) is shown in Fig. 1. As expected, the Pasteurellales species formed a distinct and strongly supported clade in the tree. Further, as observed in earlier studies, species from a number of Pasteurellales genera (viz. Haemophilus, Actinobacillus and Mannheimia) branched in a number of different clusters, indicating that these genera are not monophyletic. In the NJ tree shown, the Pasteurellales species formed two main clades. The first of these clades (Clade I) consisted of various Aggregatibacter and Pasteurella species and it also included Act. succinogenes, Man. succiniciproducens and various strains of H. influenzae and H. somnus. Within this clade, the grouping of Aggregatibacter with Pasteurella species and that of Act. succinogenes with Man. succiniciproducens was strongly supported. The second clade (Clade II) consisted of H. ducreyi, H. parasuis, Man. haemolytica and various strains of Act. pleuropneumoniae. These two clades of Pasteurellales were also supported by earlier phylogenetic studies based upon different datasets of protein sequences (Gioia et al. 2006; Redfield et al. 2006; Bonaventura et al. 2010). These trees provide us with a phylogenetic framework to understand/interpret the evolutionary significance of various identified CSIs.
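The statement that a genus is not monophyletic can be made concrete with a small Python sketch. The nested-tuple topology below is a hand-written simplification of the groupings described in the text, not the tree of Fig. 1: where the text does not specify a branching order, the arrangement is arbitrary, and the function names are hypothetical.

```python
def leaves(node):
    """Return the set of leaf names under a node (a str leaf or a tuple of children)."""
    if isinstance(node, str):
        return {node}
    out = set()
    for child in node:
        out |= leaves(child)
    return out

def is_monophyletic(tree, taxa):
    """True if some subtree of the rooted tree contains exactly the given taxa."""
    taxa = set(taxa)
    if leaves(tree) == taxa:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, taxa) for child in tree)

# Simplified, illustrative topology (assumption; see lead-in)
clade1 = ("H. influenzae",
          (("Agg. aphrophilus", "P. multocida"),
           ("Act. succinogenes", "Man. succiniciproducens"),
           "H. somnus"))
clade2 = ("H. parasuis",
          ("H. ducreyi", "Man. haemolytica", "Act. pleuropneumoniae"))
tree = (clade1, clade2)

haemophilus = {"H. influenzae", "H. somnus", "H. parasuis", "H. ducreyi"}
haemophilus_is_clade = is_monophyletic(tree, haemophilus)
succinate_pair_is_clade = is_monophyletic(
    tree, {"Act. succinogenes", "Man. succiniciproducens"})
```

In this toy topology the Haemophilus species are spread across both main clades, so no single subtree contains them and nothing else, which is what is meant by the genus not being monophyletic.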

Fig. 1

A neighbor-joining distance tree for the sequenced Pasteurellales based upon concatenated sequences for 10 conserved proteins. The tree was rooted using sequences for other Gammaproteobacteria (viz. Vibrionales or Enterobacteriales) and the numbers on the nodes indicate the bootstrap values out of 500. The two main clades of Pasteurellales that are seen in the tree are marked

Identification of conserved indels that are specific for the order Pasteurellales

Our analyses have identified 44 CSIs in broadly distributed proteins that are largely specific for most of the sequenced Pasteurellales species (Table 2). The CSIs in the first 23 proteins listed in this table are commonly shared by all sequenced Pasteurellales species/strains but they are not found in the homologues from any other bacteria (at least the top 500 blast hits). One example of these Pasteurellales-specific CSIs is shown in Fig. 2a. In this case, an 8 aa insert in a highly conserved region of a tetratricopeptide (TPR) domain-containing protein is uniquely present in all sequenced Pasteurellales. Although sequence information is presented here for only a limited number of species, unless indicated otherwise, the CSI shown here as well as the other molecular signatures shown are specific for the Pasteurellales group and not found elsewhere. Other CSIs that are uniquely present in all Pasteurellales are listed in Table 2 and the sequence alignments of these proteins showing the presence of the indicated CSIs are provided as Supplementary Figs. 1–21. Of these, the enzyme peptidyl-prolyl cis-trans isomerase B contains two 6 aa inserts in different positions that are specifically present in all sequenced Pasteurellales. However, there are two homologues of this protein in P. multocida, P. dagmatis and Man. succiniciproducens and these CSIs are present in only one of the homologues (Supplementary Figs. 20, 21). Five other proteins listed in Table 2 (Supplementary Figs. 22–26) also contain CSIs that are specific for the Pasteurellaceae species. However, the homologues for these proteins were not detected in one of the Pasteurellales species (viz. H. ducreyi or Agg. actinomycetemcomitans). Similarly, for four other proteins that contained Pasteurellales-specific CSIs, their homologues were not detected in a few species from this group (Supplementary Figs. 27–30).

Table 2 Conserved signature indels that are specific for all Pasteurellales

Fig. 2

Partial sequence alignments of (a) a tetratricopeptide domain-containing protein showing a CSI (boxed) that is uniquely present in all Pasteurellales species and (b) DNA-dependent helicase II, showing a conserved insert (boxed) that is largely specific for all Pasteurellales. However, in this case the CSI was also present in one non-Pasteurellales species (marked with arrow). The shared presence of the CSI in this species could be due to LGT; however, other possibilities cannot be excluded. The dashes in the sequence alignments indicate identity with the amino acid on the top line. The numbers on the top lines indicate the regions of proteins where these CSIs are present in the species shown on the top. Sequence information for other bacteria is shown here for only a limited number of species. However, no other species within the first 500 blast hits contained the indicated indels. Information for many other CSIs that are specific for all Pasteurellales is provided in Table 2

In a number of additional proteins, while the CSIs of interest are specifically present in most Pasteurellales, they are lacking in 1–2 species. For example the 1 aa insert in 23S rRNA (guanosine-2′-o)-methyltransferase and the 17 aa insert in glutamate ammonia ligase adenylyltransferase are specifically present in all Pasteurellales except H. parasuis (Supplementary Figs. 31–32). Likewise, the 1 aa inserts in murein transglycosylase C, ProS protein and d-methionine-binding lipoprotein are present in all Pasteurellales except Act. minor and the two Pasteurella species, respectively (Supplementary Figs. 33–35). The absence of CSIs in these Pasteurellales species could result from a variety of possibilities including deeper branching of these species in relation to other species or replacement of the gene containing CSI by a gene lacking the CSI by means of LGTs. However, at present these or other possibilities cannot be distinguished.

In addition to the above proteins that contained CSIs that were highly specific for either all or most Pasteurellales species, in a small number of cases the identified CSIs, in addition to being shared by all or most Pasteurellales, were also present in 1–2 isolated species from other Gammaproteobacteria. One example of such CSIs is a 3–4 aa insert in the DNA dependent helicase II (Fig. 2b), which is commonly shared by all sequenced Pasteurellales species as well as by Tolumonas auensis, belonging to the order Aeromonadales. However, this CSI is not present in other Aeromonadales. The other Pasteurellales-specific CSIs with isolated exceptions include a 2 aa insert in the hypothetical protein NTO5HA_0747 that is also shared by Psychrobacter sp. PRwf-1 (Supplementary Fig. 36A); a 2 aa deletion in the lysyl-tRNA synthetase that is also shared by Marinomonas sp. MWYL1 (Supplementary Fig. 36B); a 1 aa insert in the protein Cof, a haloacid dehalogenase-like hydrolase, that is also present in Pantoea sp. At-9b (Supplementary Fig. 37); a 4 aa deletion in 6-phosphogluconolactonase that is also found in Cardiobacterium hominis (Supplementary Fig. 38); a 2 aa deletion in the geranyltranstransferase also present in Allochromatium vinosum, Marinobacter algicola and Marinobacter aquaeolei (Supplementary Fig. 39); and lastly a 3 aa insert in the DNA repair protein RecN that in addition to all Pasteurellales is also present in Cellvibrio japonicus and Psychromonas sp. CNPT3 (Supplementary Fig. 40). The shared presence of these CSIs in isolated species from other groups could result from a variety of possibilities including lateral gene transfer from Pasteurellales to these species; independent occurrence of similar genetic changes in these species; or that some of these species might be more closely related to the Pasteurellales and that they have been incorrectly assigned to these other genera/orders. We are unable to distinguish between these possibilities based upon the available data.

Molecular signatures distinguishing two main clades of Pasteurellales

The order Pasteurellales currently consists of a single family Pasteurellaceae and the interrelationship among different species/genera within this family is poorly understood (Olsen et al. 2005; Christensen and Bisgaard 2006; Christensen and Bisgaard 2010). Thus, molecular markers that can provide reliable insights concerning the evolutionary relationships among these species should be of much interest. In phylogenetic trees, based upon two different large sets of protein sequences, the sequenced Pasteurellales species formed two distinct clades (Gioia et al. 2006; Redfield et al. 2006; Bonaventura et al. 2010), as confirmed in the present study (Fig. 1). Importantly, the existence of these two clades is independently strongly supported by the species distribution patterns of many CSIs that we have identified in the present work. A brief description of these CSIs is provided below.

The protein glutamyl-tRNA reductase, which catalyzes the NADPH-dependent reduction of glutamyl-tRNA to glutamate-1-semialdehyde, contains two different lengths of CSIs in the same position that serve to distinguish various Pasteurellaceae species from all other bacteria and at the same time they also provide clear distinction between the Clades I and II species (Fig. 3a). In this case, a 4 aa insert in a conserved region is uniquely present in all of the Pasteurellales species that form Clade I (viz. Agg. actinomycetemcomitans, P. multocida, P. dagmatis, Act. succinogenes, Man. succiniciproducens, H. somnus and H. influenzae), whereas in the various species that comprise Clade II, a 2 aa insert is present in the same position. Because these CSIs are related in sequence, the most likely explanation to account for them is that a 2 aa or 4 aa insert was initially introduced in a common ancestor of all Pasteurellales and it was followed by either a 2 aa insert in the Clade I species or a 2 aa deletion in the Clade II species. Similar to glutamyl-tRNA reductase, in the protein long chain fatty acid-CoA ligase, which plays an important role in the breakdown of fatty acids, different lengths of CSIs in a conserved region are uniquely present in the two Pasteurellales clades (Fig. 3b). In this case, a 2 aa insert is present in all of the Clade I species, whereas the Clade II species have a 1 aa insert in this position. The presence of different lengths of CSIs in this protein can also be explained as above. Interestingly, the homologues of both of these proteins were not detected in H. ducreyi.
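The way these two length-variable CSIs discriminate between the clades can be expressed as a simple lookup, sketched below in Python. The insert lengths are those stated in the text; the function names, the table layout and the aligned toy segments are hypothetical, and an insert length of zero is taken to indicate a sequence lacking the CSI, as in other bacteria.

```python
# Insert length at the CSI position -> group, as described in the text
SIGNATURES = {
    "glutamyl-tRNA reductase": {4: "Clade I", 2: "Clade II", 0: "other bacteria"},
    "long chain fatty acid-CoA ligase": {2: "Clade I", 1: "Clade II", 0: "other bacteria"},
}

def insert_length(query_region, reference_region):
    """Count query residues aligned against gaps in an insert-free reference."""
    return sum(1 for q, r in zip(query_region, reference_region)
               if r == "-" and q != "-")

def assign_clade(protein, query_region, reference_region):
    """Map the observed insert length to a group; unknown lengths stay unresolved."""
    n = insert_length(query_region, reference_region)
    return SIGNATURES[protein].get(n, "unresolved")

# Made-up aligned segments around the CSI position (not real sequence data)
reference = "LKDA----GTR"   # sequence lacking the insert
query_4aa = "LKDAPQSTGTR"   # 4 aa insert
query_2aa = "LKDAPQ--GTR"   # 2 aa insert

call_4aa = assign_clade("glutamyl-tRNA reductase", query_4aa, reference)
call_2aa = assign_clade("glutamyl-tRNA reductase", query_2aa, reference)
call_ref = assign_clade("glutamyl-tRNA reductase", reference, reference)
```

Because homologues of both proteins were not detected in H. ducreyi, a lookup of this kind cannot place that species, and other Clade II-specific signatures would be needed for it.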

Fig. 3

Partial sequence alignments of (a) glutamyl-tRNA reductase and (b) long-chain-fatty-acid-CoA ligase, each containing two CSIs of different lengths (boxed) at the same positions that are specific for the two Pasteurellales clades. The dashes in the sequence alignments indicate identity with the amino acid on the top line. In the case of glutamyl-tRNA reductase, a 4 aa insert is present in various Clade I species, while all of the Clade II species contain a 2 aa insert in this position. In the long-chain-fatty-acid-CoA ligase, 2 aa and 1 aa inserts are found in the Clade I and Clade II species, respectively. The different lengths of CSIs in these proteins serve to distinguish the Clade I and Clade II species from each other. Sequence information for only a limited number of species from other bacterial groups is presented here

In addition to these CSIs that distinguish both Clades I and II species, we have also identified 11 CSIs in widely distributed proteins that are either uniquely or mainly found in the Clade I species (Table 3A). Two examples of such CSIs are presented in Fig. 4. In the universally distributed ribosomal protein S1, which plays a central role in protein synthesis, an eight amino acid deletion in a conserved region is uniquely present in all Clade I Pasteurellales species (Fig. 4a). The absence of this indel in all other Pasteurellales as well as other bacteria provides evidence that this indel represents a deletion in the Clade I species rather than an insert in other bacteria. Similarly, in the protein cytochrome-D-ubiquinol oxidase subunit 1, which is a component of the aerobic respiratory chain, a 5 aa insert in a conserved region is uniquely present in all Pasteurellales species belonging to Clade I, but not found in any other bacteria (Fig. 4b). Sequence alignments for other proteins which contain CSIs that are specific for Pasteurellales Clade I are presented in Supplementary Figs. 41–45. The CSIs in all of the above proteins are highly specific for Pasteurellales Clade I indicating that they were introduced in a common ancestor of this clade.

Table 3 Conserved signature indels that are specific for two different Pasteurellales clades

Fig. 4

Excerpts from the sequence alignments for (a) ribosomal protein S1 and (b) cytochrome D ubiquinol oxidase subunit 1, showing two different CSIs in conserved regions of these proteins that are uniquely present in various Clade I Pasteurellales species. The other CSIs that are specific for the Clade I species are listed in Table 3A. The dashes in the sequence alignments indicate identity with the amino acid on the top line

Four other proteins also contain CSIs that are largely specific for Clade I. Within Clade I, H. influenzae shows the deepest branching in the phylogenetic tree (Fig. 1). We have identified a 2 aa insert in the protein thiamine-monophosphate kinase that is commonly shared by all Clade I species except H. influenzae (Supplementary Fig. 46). The most likely explanation for this CSI is that the genetic change responsible for it occurred in a common ancestor of the remaining Clade I species after the branching of H. influenzae. For CSIs in three other proteins, the indels of interest are also present in an isolated species from Clade II in addition to the members of Clade I. For example, in the fumarate reductase iron-sulfur subunit, which is involved in the interconversion of fumarate and succinate, an 11 aa insert in a highly conserved region is present in various Clade I species and also in H. parasuis, which shows the deepest branching in Clade II (Supplementary Fig. 47). Likewise, in the cell division protein FtsZ, a 3 aa insert is present in various Clade I species and also in Man. haemolytica (Supplementary Fig. 48). The protein lysyl-tRNA synthetase also contains a 2 aa insert that is specific for Clade I. However, in this case, only one of the H. somnus strains contains this CSI, whereas the other H. somnus strain has a more divergent homologue that lacks this indel (Supplementary Fig. 49). The species distribution patterns of these latter CSIs could result from a number of possibilities including LGT events or introduction of these genetic changes at various stages in the evolution of the Pasteurellales species that are not apparent from this tree.

The Pasteurellales species Act. pleuropneumoniae, Act. minor, H. ducreyi, Man. haemolytica and H. parasuis form Clade II in the phylogenetic tree (Fig. 1). As indicated above, the proteins glutamyl-tRNA reductase and long chain fatty acid-CoA ligase contain distinctive inserts that are specific for the Clade II species (Fig. 3). We have also identified a number of other CSIs that are specific for this clade (Table 3B). In the enzyme DNA adenine methylase, which is responsible for methylation of the newly synthesized strand of DNA, a 3 aa insert that is specific for the Clade II species is present in a highly conserved region (Fig. 5a). Other sequence alignments showing CSIs specific to Pasteurellales Clade II (Table 3B) are shown in Supplementary Figs. 50–52. The genetic changes responsible for these CSIs were likely introduced in a common ancestor of the Clade II species and they strongly support the existence of this clade.

Fig. 5

Partial sequence alignments for the proteins (a) DNA adenine methylase, showing a 3 aa insert that is specific for Clade II Pasteurellales species, and (b) tRNA (uracil-5-)-methyltransferase, showing a 5 aa insert that is uniquely found in all Clade II species except H. parasuis, which is the deepest branching species in Clade II (Fig. 1). Other CSIs showing similar specificity are listed in Table 3B. The dashes in the sequence alignments indicate identity with the amino acid on the top line

Within Clade II, the deepest branching in the phylogenetic tree is observed for H. parasuis (Fig. 1). A clade consisting of the remaining Clade II species (all except H. parasuis) is strongly supported in the phylogenetic tree. We have identified three CSIs that are specific for this subclade of Clade II. Information for one of these CSIs is presented in Fig. 5b, which shows a 5 aa insert in the enzyme tRNA-(uracil-5-)-methyltransferase. Similar to this CSI, a 2 aa insert in a highly conserved region of the ribosomal protein S4 (Supplementary Fig. 53) and a 7 aa deletion in the enzyme adenylate cyclase (Supplementary Fig. 54) are also specific for this subclade of the Clade II species. The genetic changes for these CSIs were likely introduced in a common ancestor of the remaining Clade II species after the branching of H. parasuis. In the enzyme DNA gyrase B, which contains a 2 aa insert specific for the Clade II species, a 5 aa insert is also uniquely present, in the same position, in the two succinic acid producing bacteria Act. succinogenes and Man. succiniciproducens (Supplementary Fig. 51). The latter two bacteria form a strongly supported cluster in the phylogenetic tree and the shared presence of this insert supports the inference that they are specifically related (Fig. 1). The different lengths and species specificity of these inserts indicate that the genetic changes responsible for them occurred independently in the common ancestors of these two groups of Pasteurellales species.

Discussion

The members of the Order Pasteurellales are presently distinguished from other bacteria primarily on the basis of their distinct branching in phylogenetic trees (Olsen et al. 2005; Christensen and Bisgaard 2006; Christensen and Bisgaard 2010). Furthermore, although this order is comprised of at least 15 genera, due to a lack of reliable information about their interrelationships, all of them are placed into a single family (Olsen et al. 2005; Christensen and Bisgaard 2006; Christensen and Bisgaard 2010). We report here for the first time >60 molecular signatures that are distinctive characteristics of either all sequenced Pasteurellales species/strains or a number of well-defined subclades within this order. Of the signatures described here, 23 CSIs in widely distributed proteins are uniquely found in all of the sequenced Pasteurellales species/strains (Table 2) and they are not found in any other bacteria. Due to their specificity to the Pasteurellales, the rare genetic changes responsible for them were likely introduced only once in a common ancestor of these bacteria and then passed on to various descendent species (Gupta 1998; Rokas and Holland 2000; Gupta and Mathews 2010). The presence of these CSIs in all Pasteurellales and their absence in all other bacteria strongly indicates that the genes for these proteins have not been laterally transferred from Pasteurellales to other bacterial groups or vice versa (Gogarten et al. 2002; Christensen and Bisgaard 2010). Thus, these CSIs provide potentially useful molecular markers (synapomorphies) for the identification and circumscription of species from the order Pasteurellales in molecular terms.

In addition to these CSIs that are uniquely found in all sequenced Pasteurellales, 21 other CSIs were identified that are also largely specific for this order of bacteria. However, in some of these cases the homologues for these genes/proteins were not detected in 1 or 2 Pasteurellales species, whereas in others an isolated species from another bacterial group was also found to contain the CSI. Because these CSIs are commonly present in all (or most) Pasteurellales, with only isolated exceptions showing no specific pattern, it is highly likely that the genetic changes responsible for them also occurred in a common ancestor of the Pasteurellales. This was likely followed by loss of the genes from a few species and their acquisition by isolated species from other groups through lateral gene transfers (LGTs) (Gogarten et al. 2002). However, the possibility that sequence information for some of these observed exceptions might be incorrect in the public databases cannot be entirely ruled out.

All of the genera within the order Pasteurellales are currently placed into a single family, Pasteurellaceae (Olsen et al. 2005; Christensen and Bisgaard 2006; Christensen and Bisgaard 2010). However, the present work has also identified many CSIs that are specific for two distinct clades of Pasteurellales, which are also supported by our phylogenetic analyses (Fig. 1) and those of others (Gioia et al. 2006; Redfield et al. 2006; Bonaventura et al. 2010). The first of these clades, supported by 13 CSIs (Table 3A), includes Aggregatibacter and Pasteurella species and also Act. succinogenes, Man. succiniciproducens and various strains of H. influenzae and H. somnus. The remaining Pasteurellales species (viz. Act. pleuropneumoniae, Act. minor, H. ducreyi, Man. haemolytica and H. parasuis) formed the second clade, which was supported by nine uniquely shared CSIs (Table 3B). Within Clade II, several CSIs also supported the deeper branching of H. parasuis in comparison to other species. The mutually exclusive presence of many of these CSIs in species from these two clades makes a persuasive case that these clades are evolutionarily distinct and that the genetic changes responsible for these CSIs were introduced in their common ancestors, as indicated in Fig. 6. It should be noted that, in contrast to the numerous CSIs that supported the existence of these two clades, we have not come across significant numbers of CSIs that support any alternative clades. Therefore, the identified CSIs, independently of phylogenetic analyses, provide strong evidence for the existence of these two Pasteurellales clades. We suggest that these two Pasteurellales clades, whose existence is supported by both phylogenetic analyses and many discrete molecular signatures, should be recognized as distinct higher taxonomic groupings (i.e. families) within this order.

Fig. 6

A summary diagram showing the distribution patterns of various Pasteurellales-specific CSIs indicating the evolutionary relationships among Pasteurellales species. The different clades within this order that are supported by both phylogenetic studies and the identified molecular signatures are shown

Sequence information for all of the identified CSIs is presently limited to those Pasteurellales species/strains whose genomes have been sequenced. Hence, to fully understand the evolutionary and taxonomic significance of these CSIs, it is important to obtain sequence information for them from other Pasteurellales species, notably including the appropriate type strains. For the CSIs that are specific for all Pasteurellales, due to their exclusive presence in all sequenced species/strains from this order and in no other (>1500) prokaryotic or eukaryotic organisms, it is highly likely that they will also be present in other Pasteurellales species/strains for which no sequence information is presently available. Our earlier work on many CSIs for other prokaryotic groups indicates that CSIs of this kind have a high degree of predictive ability (Griffiths and Gupta 2002; Gupta 2005; Gao and Gupta 2005; Griffiths and Gupta 2006; Gupta 2009), and many of them should provide reliable molecular markers for the entire Pasteurellales order as sequence information for other species becomes available. For the CSIs that are specific for the two subclades of Pasteurellales, however, further studies to obtain sequence information from additional species/strains should be very informative. Based upon the presence or absence of the CSIs that are specific for the two subclades, it should be possible to assign other species to these subclades. This should help in determining more clearly the taxonomic boundaries of these two subclades. It is also possible that some species of Pasteurellales lack both the Clade I-specific and the Clade II-specific CSIs. This would suggest that such species might belong to other higher taxonomic clades within the order Pasteurellales that have yet to be identified.
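
The assignment of a newly sequenced species to one of the two subclades from its CSI presence/absence pattern can be sketched as follows. This is only an illustration under stated assumptions: the function, the CSI labels and the 0.8 threshold are hypothetical choices made for the example and are not part of the present study.

```python
def assign_clade(profile, clade_i_csis, clade_ii_csis, min_fraction=0.8):
    """Suggest a clade from a CSI presence/absence profile.

    profile: dict mapping a CSI label to True (present) or False (absent);
             CSIs for which no sequence is available are simply left out.
    min_fraction: assumed cut-off for the fraction of a clade's scored CSIs
                  that must be present (an arbitrary illustrative value).
    """
    def present_fraction(csis):
        scored = [profile[c] for c in csis if c in profile]
        return sum(scored) / len(scored) if scored else None

    frac_i = present_fraction(clade_i_csis)
    frac_ii = present_fraction(clade_ii_csis)

    if frac_i is None or frac_ii is None:
        return "insufficient data"
    if frac_i >= min_fraction and frac_ii == 0:
        return "Clade I"
    if frac_ii >= min_fraction and frac_i == 0:
        return "Clade II"
    if frac_i == 0 and frac_ii == 0:
        # Lacks both sets of CSIs: possibly a member of a clade
        # that has yet to be identified.
        return "neither"
    return "ambiguous"


# Hypothetical CSI labels, for illustration only.
CLADE_I = ["I_csi_1", "I_csi_2", "I_csi_3"]
CLADE_II = ["II_csi_1", "II_csi_2"]

example = assign_clade(
    {"I_csi_1": True, "I_csi_2": True, "I_csi_3": True, "II_csi_1": False},
    CLADE_I,
    CLADE_II,
)
```

A species returning "neither" corresponds to the case discussed above in which both sets of clade-specific CSIs are lacking, whereas "ambiguous" flags a mixed pattern that would call for closer inspection of the underlying sequences.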

The Pasteurellaceae species are important human and animal pathogens and new species related to them are continually being discovered (Christensen and Bisgaard 2010). The identification of these medically important bacteria at present relies primarily upon culture-based nutritional and phenotypic characteristics (Olsen et al. 2005; Christensen and Bisgaard 2006; Christensen and Bisgaard 2010). However, such tests are unable to reliably distinguish members of the Pasteurellales from those of some other orders of Gammaproteobacteria (Olsen et al. 2005; Christensen and Bisgaard 2006; Christensen and Bisgaard 2010). In this context, the Pasteurellales-specific CSIs described here provide a novel means for the identification of these bacteria. Degenerate PCR primers based on the conserved regions flanking these CSIs in the corresponding genes should provide a specific means for the detection of both previously known and novel Pasteurellales species (or isolates) in different environments.
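
As a rough illustration of how a degenerate primer could be derived from a conserved region, the following Python sketch collapses a set of aligned, ungapped nucleotide sequences into a consensus written in standard IUPAC ambiguity codes and reports its degeneracy. The sequences are invented for the example, the function names are hypothetical, and practical primer design would additionally have to consider melting temperature, primer length and 3'-end stability.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
# Number of bases each code stands for, e.g. 'D' -> 3.
CODE_SIZE = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_consensus(seqs):
    """Collapse equal-length, ungapped DNA sequences into an IUPAC consensus."""
    seqs = [s.upper() for s in seqs]
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    consensus = []
    for column in zip(*seqs):
        bases = frozenset(column)
        if bases not in IUPAC:
            raise ValueError(f"unsupported characters in column: {sorted(bases)}")
        consensus.append(IUPAC[bases])
    return "".join(consensus)


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for code in primer:
        total *= CODE_SIZE[code]
    return total


# Invented sequences (NOT real data) for a conserved flanking region.
toy_region = ["ATGGCAAAC", "ATGGCGAAT", "ATGGCTAAC"]
primer = degenerate_consensus(toy_region)
n_variants = degeneracy(primer)
```

Keeping the degeneracy low, for example by anchoring primers in the most conserved stretches on either side of a CSI, would generally be expected to favor specific amplification.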

In the present study, our focus has been mainly on identifying CSIs that are specific for either all Pasteurellales or its larger clades. Although our work has identified many CSIs of these kinds, further detailed studies on other Pasteurellales genomes could lead to the identification of additional signatures. In the present work, we have not analyzed CSIs that are specific for individual species/genera or for the smaller clades of Pasteurellales. We have also not yet looked for conserved signature proteins (CSPs) that are specific for either all Pasteurellales or its different subgroups. Such studies will form the focus of our future work. A number of Pasteurellales genera (viz. Haemophilus, Actinobacillus and Mannheimia) are not monophyletic and it is important to develop reliable means to reorganize them (Olsen et al. 2005; Christensen and Bisgaard 2006; Christensen and Bisgaard 2010). The identification of large numbers of CSIs and CSPs that are specific for individual species or smaller clades, in addition to their diagnostic value, should prove very helpful in the reorganization and circumscription of various Pasteurellales genera.

Most of the CSIs identified in this work are present in conserved regions of various proteins that are involved in a wide variety of essential cellular functions. Our recent work on a number of CSIs in the GroEL and DnaK proteins shows that these CSIs are essential for the group of organisms in which they are found (Singh and Gupta 2009). Any deletions or significant changes in them lead to failure of cell growth, indicating that they play essential roles in these organisms (Singh and Gupta 2009). Based upon these observations and the evolutionary conservation of these CSIs within the Order Pasteurellales, it is expected that these CSIs also play important (and possibly essential) functional roles in these bacteria. Hence, further studies on the cellular functions of these CSIs could provide important insights into novel genetic, biochemical and physiological characteristics of members of Pasteurellales or their different clades.