
1 Introduction

Human interest in microbes has grown steadily as their applications in daily life have gained importance (Kalia 2015b; Moroeanu et al. 2015). The role of bacteria in causing disease has long been studied with the aim of finding ways to kill them. The discovery of antibiotics proved extremely helpful in reducing human suffering. Because antibiotics are designed to kill bacteria (Alipiah et al. 2015), bacteria respond to this stress by undergoing genetic changes, which are expressed as functional changes. The most immediate consequence of these changes has been the evolution of drug resistance in bacteria (Saxena et al. 2014). It was also realized that, under certain conditions, bacteria may show enhanced antibiotic resistance without undergoing any genetic change. This acquired resistance was attributed to biofilm formation regulated by quorum sensing (QS), a cell-density-dependent phenomenon. Under QS regulation, bacteria also express virulent behavior. Under this new paradigm, the approach is to inhibit QS and thereby prevent bacteria from acquiring resistance to antibiotics (Gui et al. 2014; Kalia 2013a, 2014a, b; Prakasham et al. 2014). Hence, the QS system is emerging as a novel drug target (Kalia and Purohit 2011; Agarwala et al. 2014; Shang et al. 2014; Hema et al. 2015; Kaur et al. 2015; Koul et al. 2015b; Arya and Princy 2016). In all these scenarios, bacteria must be identified and diagnosed rapidly so that treatment can proceed.

Initially, bacteria were identified and classified on the basis of their phenotypic and biochemical characteristics. Developments in molecular biology, genomics, and bioinformatics have since changed the pace of research on these organisms. The turning point that opened the new era of genomics came primarily from the insights provided by Prof. Carl R. Woese (Kalia 2013b; Prakash et al. 2013; Mahale et al. 2014). Over the last three to four decades, microbiologists around the world have used a range of tools to identify bacteria: PCR-ribotyping, microarray analysis, restriction endonuclease (RE) digestion, amplified fragment length polymorphism, multi-locus sequence analysis, DNA hybridization, isotope distribution analysis, molecular connectivity, etc. (Sharma et al. 2008; Kapley and Purohit 2009; Nguyen et al. 2013; Prakash et al. 2014; Wang et al. 2014; Yu et al. 2014, 2015; Meza-Lucas et al. 2016; Yagnik et al. 2016).

2 Bacillus

Bacillus is a versatile organism that has been exploited for a large number of biotechnological applications. The genus encompasses so many diverse organisms that, like Pseudomonas, it has been described as a “dumping” ground, in this case for gram-positive organisms (Porwal et al. 2009). It shows so much phenotypic and genotypic heterogeneity that unambiguous identification up to the species level has been a difficult task (Porwal et al. 2009). Members of the Bacillus subtilis group, the B. cereus group, B. licheniformis, and B. sphaericus are among the most notorious trouble spots within Bacillus spp. B. subtilis and B. halodurans show high genomic similarity in G+C content, genome size, and the characteristics of their ABC transporter genes, ATPases, etc. The complete genome of B. halodurans also shows that its transposases and recombinases are similar to those recorded in Anabaena, Clostridium, Enterococcus, Lactococcus, Rhodobacter, Staphylococcus, and Yersinia species. This clearly suggests that Bacillus needs further segregation into new genera: Aneurinibacillus, Ureibacillus, Virgibacillus, Brevibacillus, and Paenibacillus. In fact, Bacillus stearothermophilus, B. thermoleovorans, B. kaustophilus, and B. thermoglucosidasius are now classified as Geobacillus, whereas B. pasteurii, B. globisporus, and B. psychrophilus are now known as Sporosarcina spp. (Fig. 1). Bacillus marinus is presently classified as Marinibacillus marinus (Yoon et al. 2001).

Fig. 1

Reorganization of bacterial systematics: Bacillus, Pseudomonas, and Clostridium (Porwal et al. 2009; Kalia et al. 2011a, b; Bhushan et al. 2013)

3 Clostridium

The biotechnological importance of Clostridium has led researchers to study this genus with curiosity and precision. On one hand, Clostridium is beneficial, as it can produce solvents, enzymes, and biofuels such as butanol, ethanol, and hydrogen. On the other hand, it is extremely dangerous, with the ability to produce deadly toxins (Carere et al. 2008; Bhushan et al. 2015). Heterogeneity within Clostridium has been recorded in phenotypic, biochemical, and genotypic characteristics. For quite some time, accommodating in a single genus organisms whose G+C content ranges from as low as 24 mol% (Clostridium perfringens) to as high as 58 mol% (Clostridium barkeri) has not appeared justified (Fig. 1) (Kalia et al. 2011a).
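The G+C figures above are base-composition ratios. The following is a minimal sketch of how mol% G+C can be computed from a DNA sequence. It assumes a plain DNA string as input, and the function name `gc_mol_percent` is hypothetical. Classical mol% values were often determined experimentally rather than computed from sequence, so sequence-derived values may differ slightly.

```python
def gc_mol_percent(seq: str) -> float:
    """Return the G+C content of a DNA sequence as mol% (0-100).

    Hypothetical helper: only unambiguous A/C/G/T bases are counted,
    so N or other IUPAC codes do not distort the ratio.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt
```

On the figures quoted above, C. perfringens and C. barkeri differ by 34 percentage points in G+C content. That spread is what makes their placement in one genus questionable.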

4 Pseudomonas

Like Clostridium and Bacillus, Pseudomonas is an important genus whose members were once clubbed together and later needed reclassification. Almost all organisms that were difficult to categorize were labeled as Pseudomonas. As a result, the genus comprised phenotypically and biochemically highly diverse organisms and was called a “dumping ground” (Fig. 1). Its members are metabolically versatile, able to infect hosts and degrade almost everything, and are no doubt among the most widely studied pathogens and biodegraders (Bhushan et al. 2013). Pseudomonas has been subjected to repeated taxonomic revisions (Lalucat et al. 2006; Peix et al. 2009). The combined use of housekeeping genes such as gyrB, rpoB, rpoD, recA, atpD, and carA with the classic gene rrs was found to be effective in distinguishing different species of Pseudomonas: P. flavescens, P. mendocina, P. resinovorans, P. fluorescens, P. chlororaphis, P. aeruginosa, P. syringae, P. putida, P. stutzeri, etc. (Spiers et al. 2000; Hilario et al. 2004; Aremu and Babalola 2015).

5 Stenotrophomonas

Another highly versatile genus is Stenotrophomonas. Its phylogenetic diversity is quite interesting, as its members were initially grouped under Pseudomonas and Xanthomonas. Presently, eight recognized Stenotrophomonas species exist: S. maltophilia, S. nitritireducens, S. acidaminiphila, S. rhizophila, S. koreensis, S. terrae, S. humi, and S. chelatiphaga. Stenotrophomonas dokdonensis has been transferred to Pseudoxanthomonas as P. dokdonensis. In terms of function, Stenotrophomonas species can degrade aromatic compounds, either alone or in combination with Bacillus, Pseudomonas, Flavimonas, and Morganella spp. The ecological and metabolic (genetic and functional) diversity of S. maltophilia implies high taxonomic heterogeneity (Anzai et al. 2000; Peix et al. 2007; Tourkya et al. 2009; Verma et al. 2010, 2011; Aremu and Babalola 2015).

6 Streptococcus

The genus Streptococcus includes a large number of species of clinical relevance. Severe and acute diseases are known to be caused by species such as S. agalactiae, S. pneumoniae, and S. pyogenes. Existing analytical methods allow identification only to a limited extent and are laborious to apply. Genetic variability among the different Streptococcus groups is quite low, which makes distinguishing them a difficult task (Lal et al. 2011; Kalia et al. 2016).

7 Helicobacter

Another group of organisms of great economic importance, and a cause of concern for health departments, is the genus Helicobacter (Puri et al. 2016). These organisms cause many diseases in humans. Among the different Helicobacter species, H. cinaedi, H. pylori, H. felis, H. bilis, and Candidatus H. heilmannii show high genetic variability. Helicobacter spp. were previously classified as Campylobacter (Goodwin et al. 1989). Since H. pylori is responsible for 50% of the infections caused by members of Helicobacter (Suerbaum and Michetti 2002), it is the most extensively studied species, with 450 sequenced genomes. Biochemical assays, including the urease test, are cheap but may not be very accurate. Molecular techniques such as PCR and MLSA have also not proved highly precise (Puri et al. 2016).

8 The Novel Approach to Exploit Hidden Talents of rrs

With constant research efforts over the last three decades, rrs gene sequencing has been simplified to such an extent that almost all research laboratories around the globe have adopted it as a routine assay. The RDP (Ribosomal Database Project) database has become a rich reference resource: a newly sequenced rrs gene is compared against it, and the best match is used to identify the organism. It must be realized, however, that the RDP database can identify a new organism only against what is already known and deposited. It cannot classify a gene sequence unlike any it has encountered so far. This calls for a new approach, in which the taxonomic and phylogenetic limits of each known species are first defined and the discontinuities along the evolutionary scale are identified. In a series of studies undertaken along these lines, extensive genomic analyses were performed to look for potential characteristics of rrs genes that had not been elucidated so far. In the following text, a few case studies are described to illustrate the approach, its validity, and its significance (Fig. 2) (Kalia 2015a).
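The limitation described above, that a reference database can only report matches to what it already holds, can be illustrated with a toy best-match search. The following is a minimal sketch and not the RDP algorithm. It assumes similarity is scored as the fraction of the query's overlapping 8-mers found in each reference sequence. The names `best_match`, `kmers`, and the cut-off `min_shared` are hypothetical.

```python
from __future__ import annotations


def kmers(seq: str, k: int = 8) -> set[str]:
    """Return the set of overlapping k-mers in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query: str, references: dict[str, str],
               k: int = 8, min_shared: float = 0.9) -> tuple[str | None, float]:
    """Return (best reference name, score), or (None, score) if nothing is close.

    The score is the fraction of the query's k-mers present in a reference.
    min_shared is a hypothetical cut-off, not an RDP parameter.
    """
    q = kmers(query, k)
    if not q:
        raise ValueError("query is shorter than k")
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        score = len(q & kmers(ref, k)) / len(q)
        if score > best_score:
            best_name, best_score = name, score
    # Below the cut-off the query is reported as unclassified instead of
    # being forced onto the nearest known sequence.
    if best_score < min_shared:
        return None, best_score
    return best_name, best_score
```

A query from a taxon absent from `references` can only return `None` or a poor match. This is the gap that a framework defining each species' limits is meant to address.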

Fig. 2

Novel molecular techniques to explore the microbial taxonomy and phylogeny (Porwal et al. 2009; Kalia et al. 2011a; Lal et al. 2011; Bhushan et al. 2013; Puri et al. 2016)

9 Bacillus

9.1 Phylogenetic Framework

The first step was to define the phylogenetic boundaries of each species within a given genus. Bacillus was the first genus studied to evaluate the potential of this approach. Of the information available in the database at that time, 1121 rrs gene sequences of 10 Bacillus species were taken into consideration: B. thuringiensis, B. anthracis, B. pumilus, B. cereus, B. subtilis, B. megaterium, B. sphaericus, B. clausii, B. halodurans, and B. licheniformis (36–211 strains) (Figs. 3 and 4) (Porwal et al. 2009). Phylogenetic trees built from these 1121 sequences segregated them into 89 clusters. From each cluster, the outermost and innermost rrs gene sequences were taken to represent the limits of the species. Where a cluster contained a large number of sequences, one or two additional sequences were also included. In all cases, the type strain of the species was also used to develop the phylogenetic framework. On the basis of these criteria, a comprehensive framework of 34 rrs sequences representing the 10 Bacillus species was established (Fig. 5). With this genomic tool in hand, 305 Bacillus strains that had been identified only up to the genus level could be reclassified as members of these 10 known species (Fig. 6) (Porwal et al. 2009). On the basis of this genomic study, it was proposed that the strains of B. subtilis be segregated into two species or subspecies. This agrees with earlier reports that, on a biochemical basis, B. subtilis can be divided into two subspecies, subtilis and spizizenii (Nakamura et al. 1999). The study was limited to 10 of the roughly 189 Bacillus species known today, which leaves considerable scope to extend this work.
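The selection of "innermost" and "outermost" sequences per cluster can be approximated without drawing a tree. The following is a minimal sketch under stated assumptions, not the published procedure. It assumes the sequences in a cluster are already aligned to equal length. It then treats the medoid (the sequence with the smallest summed p-distance to the others) as innermost and the sequence farthest from the medoid as outermost. The function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def framework_representatives(cluster: dict[str, str],
                              type_strain: str | None = None) -> list[str]:
    """Pick hypothetical innermost and outermost members of one cluster.

    innermost = medoid (smallest summed distance to all other members)
    outermost = member farthest from the medoid
    The type strain, if given and present, is always kept as well.
    """
    names = list(cluster)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        dist[n][m] = dist[m][n] = p_distance(cluster[n], cluster[m])

    innermost = min(names, key=lambda n: sum(dist[n].values()))
    outermost = max(names, key=lambda n: dist[innermost][n])

    reps = [innermost]
    if outermost != innermost:
        reps.append(outermost)
    if type_strain in cluster and type_strain not in reps:
        reps.append(type_strain)
    return reps
```

Running this per cluster and pooling the results gives a framework in the spirit described above. The medoid is a stand-in for "innermost on the tree", and the two need not coincide.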

Fig. 3

Number of rrs gene sequences of Bacillus, Clostridium, and Pseudomonas used to develop the phylogenetic framework: (1) those identified up to species level and (2) those identified only up to genus level (Porwal et al. 2009; Kalia et al. 2011a; Bhushan et al. 2013)

Fig. 4

Segregation of rrs gene sequences of Bacillus species into multiple sequence alignment groups on the basis of variability in the terminal regions (Adapted from Open Access article: Porwal et al. 2009. doi:10.1371/journal.pone.0004438)

Fig. 5

Phylogenetic framework sequences of rrs genes of ten Bacillus species. a Type strain. b For the B. cereus group as a whole, only one type strain was used (Adapted from Open Access article: Porwal et al. 2009. doi:10.1371/journal.pone.0004438)

Fig. 6

Number of rrs gene sequences of different organisms identified up to species level with the help of the genomic framework (Data on Bacillus have been adapted from Open Access article: Porwal et al. 2009. doi:10.1371/journal.pone.0004438) (Porwal et al. 2009; Kalia et al. 2011a; Bhushan et al. 2013)

10 Unique Signatures

To validate the segregation of rrs sequences of strains classified among the known Bacillus species, unique signatures 20–30 nucleotides long were identified for the 10 known species using the MEME program (http://meme.nbcr.net/meme/meme.html). The uniqueness of these signature sequences was verified by a BLAST search against all sequences available at NCBI (Fig. 7) (Porwal et al. 2009). A motif (signature sequence) was considered unique to a species if it was absent from all other species. Two to five unique signatures, 29–30 nucleotides long, were detected for Bacillus cereus, B. thuringiensis, B. clausii, B. pumilus, B. megaterium, B. sphaericus, and B. halodurans. No unique signatures were detectable for B. anthracis, B. licheniformis, and B. subtilis.
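The uniqueness criterion above (present in a species, absent from all others) can be expressed as a set operation on fixed-length words. The following is a minimal sketch under stated assumptions. It swaps MEME's probabilistic motif discovery for exact k-mer matching. It also requires a signature to occur in every sequence of the target species, which is a stricter, hypothetical choice. The function names are illustrative only.

```python
from __future__ import annotations


def kmer_set(seq: str, k: int) -> set[str]:
    """All overlapping k-mers of a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_signatures(species_seqs: dict[str, list[str]], target: str,
                      k: int = 20) -> set[str]:
    """k-mers present in every target sequence and in no other species' sequence."""
    target_seqs = species_seqs[target]
    if not target_seqs:
        return set()
    # Keep only k-mers shared by all strains of the target species
    candidates = set.intersection(*(kmer_set(s, k) for s in target_seqs))
    # Discard any k-mer that also occurs in another species
    for species, seqs in species_seqs.items():
        if species == target:
            continue
        for s in seqs:
            candidates -= kmer_set(s, k)
            if not candidates:
                return candidates
    return candidates
```

An empty result corresponds to the situation reported for B. anthracis, B. licheniformis, and B. subtilis, where no signature separates the species from its relatives.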

Fig. 7

Representative unique nucleotide signatures for rrs gene sequences of different Bacillus species. No unique signature was detectable for B. anthracis, B. subtilis, and B. licheniformis (Data adapted from Open Access article: Porwal et al. 2009. doi:10.1371/journal.pone.0004438)

A very interesting observation was made among the signatures detected in rrs gene sequences of organisms identified only as Bacillus sp. Their nucleotide signatures indicated that these organisms either belong to Bacillus species not included in this study (Porwal et al. 2009) or represent some other genus or genera. A few of these signatures closely resembled those of organisms belonging to Virgibacillus, Geobacillus, Jeotgalibacillus, Brevibacillus, Marinibacillus, Paenibacillus, and Pontibacillus (Porwal et al. 2009). A survey of published works revealed that some of these organisms, still listed as Bacillus sp., had in fact been reclassified into these genera (Heyndrickx et al. 1999; Nazina et al. 2001). This reflects the strength of the study: in silico analysis alone provided evidence that these Bacillus spp. needed segregation as new genera or species.

11 Restriction Endonuclease Digestion Analysis

Further support for segregating organisms on the basis of the rrs gene came from identifying REs that produce unique digestion patterns. The informative features are the number of fragments, their sizes (in nucleotides), and the order in which they occur within the gene.

Fourteen type II REs (Table 3) were used: (1) four-base-pair cutters (AluI, HaeIII, DpnII, BfaI, Tru9I, and RsaI), (2) six-base-pair cutters (EcoRI, BamHI, NruI, SmaI, HindIII, PstI, and SacI), and (3) an eight-base-pair cutter (NotI) (rebase.neb.com/rebase/rebase.html). These REs were selected because their cleavage sites are highly specific. Of these 14 REs, only the four-base-pair cutters proved useful, as they generated 5–9 fragments whose sizes can be easily distinguished even under experimental conditions (Figs. 8 and 9). RsaI digestion of rrs from different Bacillus spp. yielded 2–6 fragments ranging in size from 11 to 502 nucleotides. Members of the B. cereus group gave similar digestion patterns and were indistinguishable from one another. B. halodurans and B. pumilus were easily distinguished from the others by their unique RE digestion patterns (Fig. 8). B. clausii and B. sphaericus could be identified as distinct on digestion with HaeIII (Fig. 8). In silico digestion of rrs of B. megaterium and B. pumilus gave a unique pattern with Tru9I (Fig. 9). The presence of two sets of unique AluI digestion patterns among rrs sequences belonging to B. subtilis (Fig. 9) provided strong evidence for the potential segregation of this group into two subspecies or separate species. This is consistent with the phylogenetic framework analysis described above. Notably, species that segregate together with one RE can be distinguished by their digestion patterns with another RE.
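In silico digestion of this kind reduces to locating recognition sites and measuring the distances between cut points. The following is a minimal sketch for the six four-base cutters named above, using their standard recognition sequences and top-strand cut positions as listed in REBASE. It assumes a linear sequence and reports top-strand fragment lengths in 5' to 3' order. It ignores methylation sensitivity and partial digestion, and the names `ENZYMES` and `digest` are hypothetical.

```python
import re

# Recognition site and top-strand cut offset (0 = before the first base of
# the site) for the four-base cutters named in the text; "^" marks the cut.
ENZYMES = {
    "AluI": ("AGCT", 2),    # AG^CT
    "HaeIII": ("GGCC", 2),  # GG^CC
    "RsaI": ("GTAC", 2),    # GT^AC
    "Tru9I": ("TTAA", 1),   # T^TAA
    "DpnII": ("GATC", 0),   # ^GATC
    "BfaI": ("CTAG", 1),    # C^TAG
}


def digest(seq: str, enzyme: str) -> list[int]:
    """Fragment sizes (nt), in 5'->3' order, from cutting a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A zero-width lookahead finds every site, including overlapping ones.
    # All six sites are palindromic, so scanning one strand is enough.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

Because the returned list keeps fragment order as well as size, two sequences with the same fragment sizes in a different order still give different patterns. This matches the three features (number, size, and order) emphasized above.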

Fig. 8

In silico restriction endonuclease digestion of rrs gene sequences of different Bacillus species with RsaI and HaeIII. Values represent the fragment size (nucleotides). The filled symbol represents the RE action site (Data adapted from Open Access article: Porwal et al. 2009. doi:10.1371/journal.pone.0004438)

Fig. 9

In silico restriction endonuclease digestion of rrs gene sequences of different Bacillus species with Tru9I and AluI. Values represent the fragment size (nucleotides). The filled symbol represents the RE action site (Data adapted from Open Access article: Porwal et al. 2009. doi:10.1371/journal.pone.0004438)

12 Clostridium

The approach described above for developing genomic tools for phylogenetic analysis was extended to rrs gene sequences of Clostridium (Kalia et al. 2011a). Here, 756 rrs sequences of 110 Clostridium species were taken into consideration. Clostridium botulinum rrs gene sequences were segregated into four groups, from which 10 rrs sequences were selected to represent 128 sequences (Fig. 3). Phylogenetic trees of 15 different Clostridium species were then used to select 56 rrs gene sequences for a phylogenetic framework. This framework defined the phylogenetic limits of C. acetobutylicum, C. butyricum, C. beijerinckii, C. perfringens, C. botulinum, C. chauvoei, C. baratii, C. pasteurianum, C. colicanis, C. sardiniense, C. subterminale, C. novyi, C. sporogenes, C. kluyveri, and C. tetani. With this genomic tool in hand, 356 Clostridium strains identified only up to the genus level could be classified among these 15 known species (Kalia et al. 2011a). This initial reclassification was confirmed through nucleotide signature analysis and unique RE digestion patterns. Here too, the REs HaeIII, AluI, RsaI, DpnII, Tru9I, and BfaI proved effective in providing relevant information. AluI was instrumental in clearly segregating C. chauvoei, C. acetobutylicum, C. kluyveri, C. perfringens, C. colicanis, C. pasteurianum, and C. subterminale (Fig. 10) (Kalia et al. 2011a).
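Deciding which RE best separates a set of species can be framed as counting how many species end up with patterns that no other species shares. The following is a minimal sketch of that idea under a hypothetical criterion. It assumes a species counts as "resolved" by an enzyme only if none of its rrs digestion patterns (ordered fragment-size tuples, one per sequence) is shared with another species. The enzyme with the most resolved species is then preferred. The function names and the criterion are assumptions, not the procedure of Kalia et al. (2011a).

```python
from __future__ import annotations

from collections import defaultdict


def resolved_species(patterns: dict[str, list[tuple[int, ...]]]) -> set[str]:
    """Species none of whose digestion patterns is shared with another species.

    patterns maps species -> ordered fragment-size tuples, one per rrs sequence.
    """
    owners: dict[tuple[int, ...], set[str]] = defaultdict(set)
    for species, pats in patterns.items():
        for p in pats:
            owners[p].add(species)
    return {sp for sp, pats in patterns.items()
            if pats and all(owners[p] == {sp} for p in pats)}


def best_enzyme(by_enzyme: dict[str, dict[str, list[tuple[int, ...]]]]) -> str:
    """Enzyme whose patterns resolve the largest number of species."""
    return max(by_enzyme, key=lambda e: len(resolved_species(by_enzyme[e])))
```

Under this criterion, a species with two distinct patterns (as reported for B. subtilis) still counts as resolved, provided neither pattern is shared with another species.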

Fig. 10

In silico restriction endonuclease digestion of rrs gene sequences of different Clostridium species with AluI. Values represent the fragment size (nucleotides). The filled symbol represents the RE action site (Data adapted from Open Access article: Kalia et al. 2011a. doi:10.1186/1471-2164-12-18)

13 Pseudomonas, Helicobacter, and Streptococcus

Using approaches similar to those described above for Bacillus and Clostridium spp., meaningful phylogenetic and taxonomic information could also be retrieved for Stenotrophomonas, Streptococcus, Pseudomonas, and Helicobacter (Verma et al. 2010, 2011; Lal et al. 2011; Bhushan et al. 2013; Puri et al. 2016).

14 Functional Genomics

Beyond their primary use in bacterial identification, it was realized that these genomic tools can also be extended to derive meaningful information from other genes. In attempts to inhibit the virulent behavior of bacteria causing infectious diseases, a search was conducted for organisms producing bioactive molecules that act as antibacterials. Since most infectious diseases are caused by organisms that form biofilms through the QS system, producers of anti-QS molecules were targeted. Two enzymes, acyl-homoserine lactone (AHL) acylase and AHL lactonase, have been shown to inhibit QS by degrading AHL signal molecules. Phylogenetic and functional genomic analyses of the genes encoding these enzymes were carried out. Unique signatures and RE digestion patterns also established the occurrence of horizontal gene transfer (Kalia et al. 2011b). The unique signatures were proposed as candidates for designing primers to detect such organisms in unknown samples. On the basis of this functional genomic analysis, three organisms were traced that possessed genes for both AHL-inhibitory enzymes (Kalia et al. 2011b; Kalia 2014a). RE digestion of AHL-lactonase genes with Tru9I, RsaI, and DpnII validated the phylogenetic segregation of the organisms based on the rrs gene (Huma et al. 2011). The diversity of Citrobacter species isolated from diverse niches was analyzed to assess their ability to degrade aromatic compounds. Nine strains with genes coding for aromatic ring-hydroxylating dioxygenases were analyzed using several REs (DpnII, RsaI, and HaeIII). Unique signature analysis combined with RE digestion showed that genomic similarity among a few specific strains was matched by closeness in their metabolic functions (Selvakumaran et al. 2011). The functional diversity of Stenotrophomonas in effluent treatment plants was established with precision using an RE digestion strategy, which enabled the development of a consortium for bioremediation (Verma et al. 2010, 2011).

More recently, RE digestion patterns have been used extensively to identify organisms of high economic significance to health departments. The primary objective was to find novel markers for diagnostic purposes (Arasu et al. 2015). Functional genes had to be used because species of Clostridium, Vibrio, Yersinia, Staphylococcus, Streptococcus, and Lactobacillus carry multiple copies of the rrs gene, which complicates rrs-based identification (Kalia et al. 2015, 2016; Kalia and Kumar 2015; Kekre et al. 2015; Koul et al. 2015a; Koul and Kalia 2016; Kumar et al. 2016).

15 Opinion

Phylogenetic analysis based on gene sequences is a very handy and effective tool. The gene most widely used for bacterial identification, and for taxonomic purposes in general, is rrs. Although widely employed, rrs-based analysis runs into trouble quite frequently. The obvious remedy is to employ other highly conserved housekeeping genes (HKGs), but this requires more time and money: invariably, an additional 7–8 HKGs are needed to resolve the issue. To avoid this effort and identify bacteria using a single gene, rrs, a fresh round of studies was conducted to develop genomic tools: phylogenetic frameworks, signatures, and RE digestion patterns. Once again, these tools ran into trouble with organisms possessing multiple copies of rrs. The potential solution seems to lie in genes common to all species within a genus, for which unique gene-RE digestion patterns allowed novel biomarkers to be identified. It can thus be envisaged that specific RE-gene combinations could be used for all kinds of phylogenetic and functional genomic analyses.