1 Introduction

The first description of microorganisms under a microscope, by Robert Hooke and Antonie van Leeuwenhoek in the early 1670s, marked the beginning of microbiology. The field has since branched repeatedly as new applications emerged in areas such as taxonomy, medicine, agriculture and the environment. As in any other science, in-depth and comprehensive knowledge requires a fundamental understanding of the subject under study. By ‘fundamental’ we allude here to the application of taxonomy in bacteriology. Bacterial taxonomy serves at the outset as a platform for establishing the basic characteristics of a species and correlating them with its phylogenetic properties. It deals with the identification of isolates, their classification into existing taxa or, if they are novel, into new ones, and their nomenclature, all carried out in accordance with the rules laid down in the Bacteriological Code (1990 revision; revised again in 2008, Parker et al. 2018). Classification and identification are best served by polyphasic studies, whereas nomenclature is expected to reflect genomic relationships. Taxonomy is as essential as any other discipline of the biological sciences because it provides the scientific framework for understanding bacterial species.

Taxonomy and systematics are often used interchangeably, but there is a fine distinction between the two. Taxonomy is the practical classification of organisms as dictated by theory, whereas systematics is the evolutionary study of a diverse group of organisms and their related taxa. Conventionally, bacterial taxonomy helps us to picture the evolutionary history of an organism and its relationship with neighbouring organisms. It aims at authentic, reliable and reproducible knowledge that is ready for dissemination. Today, the novelty of a bacterial taxon is determined mostly from the phylogenetic position of an isolate based on the 16S ribosomal RNA (rRNA) gene, supplemented by the phenotypic and chemotaxonomic properties of the culture. Although taxa are described and defined with considerable care, gaps remain that need to be filled with the help of upcoming techniques. The present system of bacterial taxonomy has developed through the successive inclusion of new taxonomic methods, which has also led to the unearthing of valuable new taxa and given microbiology a new dimension.

2 Historical Developments

Bacterial taxonomy became one of the most sought-after subjects in microbiology after its inception in the 1600s, as a surge of discoveries and inventions accelerated the accumulation of knowledge. A superficial understanding of bacteria proved inadequate, which underlined the importance of properly defining the taxa under study and ultimately led to the development of robust methods for bacterial taxonomy. Even before the advent of molecular work, deliberate attempts at classification were made, although they were based solely on morphology, and they always tried to correlate phenetic with phylogenetic characteristics. Partitioning the domain Bacteria into taxa at various levels proved beneficial and productive. To date, various methods have been incorporated and improved to enhance bacterial taxonomy. Direct and indirect contributions by scientists throughout the history of the biological sciences made this possible, with one event leading to another and so bringing about the birth of bacterial taxonomy (Fig. 1.1).

Fig. 1.1 Diagrammatic representation of the evolution of the classification systems extensively used in different eras for the appropriate delineation of bacterial taxa

The development of bacterial taxonomy can be traced into different phases:

Phase I (1600–1900 AD)

Taxonomic study during this phase was based on simple biological observations and morphological descriptions. It was in this era, in the early 1670s, that Antonie van Leeuwenhoek and Robert Hooke first observed ‘animalcule’-like structures under the single-lens microscope. The 1800s brought the major contributions that aided morphological studies. Muller and his contemporaries played a major role in promoting taxonomy, as they recognised the importance of assigning taxa correctly given its wide application. Koch developed the agar plate technique for isolating pure cultures, which made the isolation of bacterial species convenient. In 1872, Ferdinand Cohn proposed that bacteria could also be assigned to genera and species. Staining methods such as the acid-fast stain and the Gram stain were developed by Paul Ehrlich and Christian Gram, respectively, for studying morphology, and the petri plate was introduced by R. J. Petri. These landmarks improved morphological studies and paved the way for the later development of molecular taxonomy.

Phase II (1900–1980)

This phase saw biochemical and physiological properties applied to taxon descriptions, in accordance with the report presented in 1923 by the Society of American Bacteriologists (later renamed the American Society for Microbiology). This report also served as the basis for the publication of the first edition of Bergey’s Manual of Determinative Bacteriology by David Hendricks Bergey. A dedicated journal, the International Bulletin of Bacteriological Nomenclature and Taxonomy (IBBNT), was established at the fifth International Congress for Microbiology in 1950 and was renamed the International Journal of Systematic Bacteriology (IJSB) in 1966. Only in 2000 did it become the International Journal of Systematic and Evolutionary Microbiology, devoted to the description of valid taxa (Oren 2015). Numerical taxonomy was widely used to handle the large datasets arising from characterisation. The following years saw the emergence of semantides (or semantophoretic molecules, the biological macromolecules that carry phylogenetic information about evolutionary history) and their applications. DNA-DNA hybridisation was widely applied from the 1960s by various groups and is still considered the gold standard for species description. The term ‘polyphasic’ was coined by Colwell (1970), who applied a combined approach of phenotypic, genotypic and chemotypic characterisation to the genus Vibrio. Finally, in 1977, Carl Woese proposed the use of the ribosomal RNA (16S rRNA) sequence to identify Archaea and Bacteria.

Phase III (From 1980 Till Now)

This phase saw the dawn of the genomic era and its wide application in the demarcation of taxa. The following years brought marked development in genomic techniques for a better understanding of genome content. In 1977, Walter Gilbert and Frederick Sanger introduced DNA sequencing methods that made sequencing far less tedious (Heather and Chain 2016). Various DNA typing methods were applied to determine inter- and intraspecies relatedness (Stackebrandt et al. 2002). The years after the 1980s saw the emergence of techniques targeting fractions of the whole genome. Randomly amplified polymorphic DNA (RAPD), restriction fragment length polymorphism (RFLP), pulsed-field gel electrophoresis (PFGE), ribotyping and amplified ribosomal DNA (rDNA) restriction analysis (ARDRA) were widely employed for delineating bacterial taxa. From 1998, multilocus sequence typing (Maiden et al. 1998) and, later, in silico bar coding (Shivali et al. 2012) were used to assess genomic relatedness.

In 1995, the first bacterial genome, that of Haemophilus influenzae, was sequenced by Fleischmann’s group (Fleischmann et al. 1995). This breakthrough, made possible by significant developments in molecular biology, opened the genomic era and brought higher resolution to genome analysis. Focussed genome studies are now possible owing to marked developments in molecular techniques and advances in bioinformatics tools. Recently, a digital protologue database was designed as a repository for all newly described species and emended taxa (Rosselló-Móra et al. 2017). According to the EzBioCloud database, 63,587 16S rRNA gene sequences have been deposited and 92,802 genomes have been sequenced. Of these 16S rRNA gene sequences, 23.04% (14,650) carry valid names, 0.81% (515) carry invalid names, 0.46% (292) are Candidatus taxa, and the remaining 75.69% (48,129) are phylotypes (www.ezbiocloud.net/).

3 Importance of Bacterial Taxonomy in Applied Sciences

Taxonomy aids scientific communication because it allows scientists to make predictions and frame hypotheses about organisms. Microbiologists often use informal names such as purple bacteria, sulphur bacteria and spirochetes. In scientific classification, however, each species is assigned to a genus and given a two-part (binomial) name written in italics (underlined when handwritten), with a capital first letter for the genus name but not for the species or subspecies epithet, e.g. Escherichia coli. The prokaryotic classification most widely accepted by microbiologists appeared in the early 1990s in Bergey’s Manual of Systematic Bacteriology as the ‘Taxonomic outline of the Prokaryotes’, which aims to aid the identification of species. In bacterial taxonomy, a valid name must be in Latin or Neo-Latin and use basic Latin letters only. Many taxa are named after a person, either the discoverer or a prominent figure in the field; for example, Shivajiella is named after Dr. Shivaji, an eminent Indian microbiologist who has contributed significantly to our knowledge of heterotrophic bacteria from different, predominantly cold, habitats worldwide (Kumar et al. 2012). Many specific epithets refer to the place where the organism was found; for example, the specific epithet of Rhodomicrobium udaipurense refers to Udaipur, the place from which it was isolated (Ramana et al. 2013).

The usefulness of bacterial taxonomy is evident from the metabolic heterogeneity seen among strains. For instance, from an evolutionary point of view, the species of the genus Shigella (S. dysenteriae, S. flexneri, S. boydii, S. sonnei) are strains of Escherichia coli (polyphyletic), but because the pathogenic strains differ in genetic make-up, they cause different medical conditions. Escherichia coli is a poorly classified species, since some strains share only 20% of their genome. Being so diverse, it would merit a higher taxonomic rank, but it has remained unchanged because of the diseases associated with the species and to avoid confusion in a medical context. According to the National Center for Biotechnology Information (NCBI), 3180 strains of Escherichia coli have been reported, of which 2383 have had their genomes sequenced. Simply naming an organism E. coli therefore does not fully identify it. Identification down to the strain level is thus crucial before an organism is taken up for industrial use or research.

There are cases where species were misidentified, resulting in errors in classification. For instance, molecular data show that the genus Agrobacterium is nested within Rhizobium. Agrobacterium species were therefore transferred to the genus Rhizobium, giving Rhizobium radiobacter (formerly Agrobacterium tumefaciens), Rhizobium rhizogenes, Rhizobium rubi, Rhizobium undicola and Rhizobium vitis. However, because Agrobacterium spp. are plant pathogens, retaining the genus Agrobacterium was proposed, and that proposal was later disputed. Similarly, in the order Pseudomonadales (Gammaproteobacteria), the genus Azotobacter and the species Azomonas macrocytogenes (true members of the genus Pseudomonas) were misclassified because of their nitrogen-fixing capabilities and the large size of the genus Pseudomonas, which makes classification problematic. Likewise, the members of the ‘Bacillus cereus group’ in the phylum Firmicutes (Bacillus anthracis, Bacillus thuringiensis, Bacillus weihenstephanensis, Bacillus mycoides, Bacillus pseudomycoides, Bacillus cereus and Bacillus medusa) share 99–100% 16S rRNA gene sequence similarity (97% being the commonly cited species cut-off) and are polyphyletic, but they are kept as separate species for medical reasons. There are also cases where errors in classification were rectified. For instance, Deinococcus radiodurans was originally classified as Micrococcus radiodurans by Anderson et al. (1956) because of its similarity to members of the genus Micrococcus, but it was later renamed Deinococcus radiodurans on the basis of polyphasic data (Brooks and Murray 1981). Precise bacterial identification is therefore crucial before an organism is taken up for any study, in order to avoid taxonomic errors.

4 Prevailing Methods for Classification

Bacterial taxonomy began with the aim of relating the physical characteristics of an organism to its phylogeny and, through that, to its genomic imprint. Phenotype-based taxonomy led to an enormous number of described bacterial species because only a few morphological features were considered for classification. More than 90% of the species described in Bergey’s Manual were subsequently dropped, and only species included in the Approved Lists of Bacterial Names became validly named species (Skerman et al. 1980; Garrity 2016). Major amendments followed from the use of DNA-DNA hybridisation (DDH) and the 16S rRNA gene. Moving from simple to holistic approaches, bacterial taxonomy has come a long way, yet there is still room for improvement as science advances.

To date, the polyphasic approach has been the one applied for taxonomic purposes. It includes chemotaxonomic features (cell wall components, quinones, polar lipids, etc.), morphology, staining behaviour, culture characteristics (medium, temperature, incubation time, etc.) and genetic properties (G + C content, DDH value, 16S rRNA gene sequence identity with closely related species) (Tindall et al. 2010). In some cases, DDH values are strictly advised to strengthen taxon delineation. According to the report of the Ad Hoc Committee of the International Committee for Systematic Bacteriology issued in 1987, the following parameters can be used for valid taxon description: (a) phenotypic, (b) chemotaxonomic and (c) genotypic properties.

(a) Phenotypic

Phenotypic methods form the basis for the formal description of taxa, from species and subspecies up to the genus and family level (Garrity 2016). Traditional phenotypic tests used in classical microbiology laboratories cover the behaviour of the organism on different growth substrates, its growth range under different conditions such as pH, temperature and salinity, and its susceptibility to different antibiotics (Prakash et al. 2007). The analysis of phenotypic data by computer-assisted numerical comparison is known as numerical taxonomy. Data matrices showing the degree of similarity between each pair of strains, followed by cluster analysis to produce a dendrogram, give a general picture of the phenotypic consistency of a particular group of strains. The advantage of phenotypic characters is that they can be observed, scored and measured easily without expensive technology. Because phenotypic characters depend on the conditional nature of gene expression, the same organism may show different characters under different environmental conditions. Phenotypic data must therefore be compared with a similar set of data from the type strains of closely related organisms (Tindall et al. 2010).
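The pairwise similarity matrix that underlies numerical taxonomy can be sketched in a few lines of Python. This is only a minimal illustration: the strain names, the tests and the scores are invented, and the simple matching coefficient is one of several coefficients that could be used; the text does not prescribe a particular one.

```python
from itertools import combinations

# Hypothetical binary phenotype table (1 = positive, 0 = negative).
# Strain names, test names and scores are invented for illustration only.
TESTS = ["catalase", "oxidase", "growth_42C", "growth_pH5", "NaCl_5pct", "citrate"]
PHENOTYPES = {
    "strain_A": [1, 0, 1, 1, 0, 1],
    "strain_B": [1, 0, 1, 0, 0, 1],
    "strain_C": [0, 1, 0, 0, 1, 0],
}


def simple_matching(a, b):
    """Simple matching coefficient: fraction of tests scored identically."""
    if len(a) != len(b) or not a:
        raise ValueError("strains must be scored on the same, non-empty set of tests")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def similarity_matrix(phenotypes):
    """Return {(strain_1, strain_2): similarity} for every unordered pair."""
    return {
        (s1, s2): simple_matching(phenotypes[s1], phenotypes[s2])
        for s1, s2 in combinations(sorted(phenotypes), 2)
    }
```

Calling similarity_matrix(PHENOTYPES) gives the values that a clustering method would then turn into a dendrogram; the clustering step itself is left out of this sketch.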

(b) Chemotaxonomy

The term ‘chemotaxonomy’ refers to the application of analytical methods based on the chemical constituents of the cell to classify bacteria (Komagata and Suzuki 1987; Tindall et al. 2010; Sutcliffe 2015). It has enabled the establishment of specific chemical markers for proper classification and identification. The most commonly used chemical markers include cell wall and membrane components such as peptidoglycan and teichoic acid, polar lipid composition, relative ratios of fatty acids, sugars, lipopolysaccharide, isoprenoid quinones, carotenoids, chlorophyll composition, polyamines and fermentation products. Peptidoglycan can be an excellent taxonomic marker, as it is present in most phyla, including, according to recent studies, Planctomycetes and Chlamydiae (Pilhofer et al. 2013; Liechti et al. 2014), but not in Mycoplasma (Rottem and Naot 1998).

Bacteria vary in their membrane lipid composition; polar lipids are therefore used for the classification and identification of bacteria. Fatty acids with a variety of chemical structures have been identified, and the variability in chain length, double-bond position and substituent groups has proven very useful for characterising bacterial taxa. Fatty acids are also the major constituents of lipids and lipopolysaccharides in microbial cells and have therefore been used extensively for taxonomic purposes, in a procedure termed fatty acid methyl ester (FAME) analysis. Fatty acids with chain lengths between 9 and 20 carbons are most often considered for classification (Sharmili and Ramasamy 2016). Hopanoids are pentacyclic triterpenoid, sterol-like membrane lipids (Belin et al. 2018). Because hopanoids preserve source-specific information and can be linked to a specific taxonomic group, physiological process, metabolic process or environmental condition, they can be used as lipid biomarkers (Cvejic et al. 2000; Blumenberg et al. 2012; Silipo et al. 2014). Hopanoids have been used as chemotaxonomic markers in some recent studies (Tushar et al. 2015).

Isoprenoid quinones are components of the bacterial cytoplasmic membrane. Because they vary between taxa in type and in the hydrogenation, saturation and length of the side chain, they act as signature molecules for characterising bacteria at different taxonomic levels (Nowicka and Kruk 2010). Polyamines are universally distributed in bacteria but differ significantly in quantity and type, which makes them suitable chemotaxonomic markers. Depending on the group of organisms studied, polyamine patterns are used to trace relatedness at and above the genus level as well as at the species level.

Whole-cell protein patterns can be analysed and compared to group closely related strains. Numerous studies have revealed a correlation between high similarity in whole-cell protein content and DNA-DNA hybridisation (Jarman et al. 2000). With MALDI-TOF (matrix-assisted laser desorption/ionisation time-of-flight) mass spectrometry, identification is based on comparing the peptide mass fingerprint of an isolate with a spectral database containing those of the type strains (Singhal et al. 2015). It has been applied to identify commensal species of Enterococcus and Escherichia from their unique spectra (Santos et al. 2015). Fourier transform infrared (FTIR) spectroscopy, on the other hand, uses the inherent property of an organism to produce specific metabolites in order to identify it at the species and strain level (Naumann et al. 1991). When whole microbial cells absorb IR radiation, they produce vibrational spectra specific to the chemical bonds present (Carlos et al. 2011). Metabolomic techniques such as FTIR are often used for the quick identification of bacteria from their particular metabolic fingerprints (Venkata Ramana et al. 2013). Biolog MicroPlates exploit bacterial metabolism through the utilisation of carbon sources, and species may be identified from the specific colour changes on the plate that make up the metabolic fingerprint (Vehkala et al. 2015; Al-Dhabaan and Bakhali 2017). Lipidomes (Srinivas et al. 2016) and fermentomes (Sravanthi et al. 2016) are among the ‘chemomics’ used in recent bacterial taxonomy.
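The idea of matching a peptide mass fingerprint against a library of type-strain spectra can be illustrated with a toy peak-matching score. This is a sketch under stated assumptions and not the scoring used by any commercial MALDI-TOF system: the m/z values, the tolerance of 2.0 and the library entries are all hypothetical.

```python
def shared_peak_fraction(query_peaks, reference_peaks, tolerance=2.0):
    """Fraction of query peaks (m/z) that have a reference peak within tolerance."""
    if not query_peaks:
        raise ValueError("query spectrum has no peaks")
    matched = sum(
        1
        for q in query_peaks
        if any(abs(q - r) <= tolerance for r in reference_peaks)
    )
    return matched / len(query_peaks)


def best_match(query_peaks, library, tolerance=2.0):
    """Return (name, score) of the library entry sharing most peaks with the query."""
    scores = {
        name: shared_peak_fraction(query_peaks, peaks, tolerance)
        for name, peaks in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical peak lists (m/z); not taken from any real spectral database.
LIBRARY = {
    "type_strain_X": [4365.0, 5096.0, 6255.0, 7274.0],
    "type_strain_Y": [3444.0, 5381.0, 6411.0, 9060.0],
}
QUERY = [4364.2, 5097.1, 6255.5, 8000.0]
```

Here best_match(QUERY, LIBRARY) picks the entry whose peaks overlap most with the query; real systems also weigh peak intensities, which this sketch ignores.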

(c) Genotypic

Genotype-based methods have completely changed bacterial systematics and have helped to draw the lines between the various taxonomic levels. They focus mostly on genomic information such as DNA-DNA hybridisation, G + C content, rRNA gene sequence analysis and DNA-based typing methods (DNA fingerprinting). DDH is required when a new taxon shares more than 97% 16S rRNA gene sequence similarity with its closest relatives (Tindall et al. 2010). A DDH value equal to or higher than 70% has been recommended for defining members of a species (Wayne et al. 1987). The G + C content is the percentage of guanine and cytosine bases in the genome and varies from one organism to another; within prokaryotes it ranges between 20% and 80%. If phylogenetic analysis of an isolate reveals approximately 6% 16S rRNA gene sequence difference from its most closely related genus, the isolate can be proposed to represent a novel genus (Yarza et al. 2008).
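As a small worked example of the G + C content described above, the following Python sketch counts G and C among the unambiguous bases of a sequence. The function name and the decision to ignore ambiguous symbols such as N are assumptions made for illustration.

```python
def gc_content(sequence):
    """Return the G + C percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted; symbols such as N
    are ignored, which is an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total
```

For a genome, the same calculation is applied to the concatenated replicons; a value such as gc_content("ATGC") is 50.0.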

It was long considered that bacterial strains could be delineated by 16S rRNA gene sequence analysis, with strains showing more than 3% sequence divergence taken to represent different species (Rosselló-Móra and Amann 2001). With good-quality, near full-length sequences, however, the corresponding identity threshold has been revised from 97% to 98.7–99% (Stackebrandt and Ebers 2006). A bacterial species can be defined as a group of strains sharing 70% or more DNA-DNA hybridisation with a ΔTm of 5 °C or less (Tm being the melting temperature of the hybrid) among members of the group (Grimont 1981; Wayne et al. 1987). DDH is deemed necessary when strains share >98.7% 16S rRNA gene sequence identity. However, DDH has its own disadvantages, because of which it cannot be applied to all genera of prokaryotes. Differences in sequence must be strongly supported by distinctive characteristics. When genetically close organisms diverge in phenetic characteristics, they can be ranked as subspecies (Wayne et al. 1987).
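The thresholds in this paragraph can be read as a simple decision rule, sketched below in Python. The function and its return strings are hypothetical; the cut-offs (98.7% 16S rRNA gene identity, 70% DDH, ΔTm of 5 °C) are those quoted in the text, and treating an unmeasured ΔTm as non-blocking is an assumption of the sketch.

```python
SIXTEEN_S_CUTOFF = 98.7  # % identity above which DDH is deemed necessary
DDH_CUTOFF = 70.0        # % DDH at or above which strains share a species
DELTA_TM_MAX = 5.0       # degrees C; maximum delta-Tm within a species


def assess_pair(identity_16s, ddh=None, delta_tm=None):
    """Classify a strain pair using the cut-offs quoted in the text."""
    if identity_16s <= SIXTEEN_S_CUTOFF:
        # Below the 16S rRNA threshold the strains are taken as distinct species.
        return "different species"
    if ddh is None:
        return "DDH required"
    # Assumption: if delta-Tm was not measured, judge on DDH alone.
    delta_tm_ok = delta_tm is None or delta_tm <= DELTA_TM_MAX
    if ddh >= DDH_CUTOFF and delta_tm_ok:
        return "same species"
    return "different species"
```

For example, assess_pair(99.2) returns "DDH required", while assess_pair(99.2, ddh=82.0, delta_tm=2.0) returns "same species".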

Ribosomal loci such as the internal transcribed spacer region, located between the 16S and 23S rRNA genes, have been scrutinised for their phylogenetic properties. Although this technique can resolve taxa at the species/strain level, it remains inconsistent at the lower levels (Valera and Garcia-Martinez 2000). Multilocus sequence typing has mostly been used for epidemiological and pathological purposes but has also earned its place in bacterial systematics. It has overtaken the traditional procedures for determining genomic relatedness at inter- and intraspecific levels by sequence profiling of housekeeping genes (Maiden et al. 1998). Its advantage is that it can be applied not only to cultivable species but also to those that are difficult to cultivate (Martens et al. 2008). Here, 6–11 stably selected housekeeping genes of the species are profiled, each over a stretch of around 470 bp. Amplified ribosomal DNA (rDNA) restriction analysis (ARDRA) can be used to characterise bacterial isolates and has potential for analysing mixed bacterial communities. It relies on conserved restriction sites in the 16S rRNA gene, which yield restriction patterns specific to certain taxa (Abed 2008). The resulting banding pattern serves as a fingerprint for identifying the respective bacteria. BOX-A1R-based repetitive extragenic palindromic PCR (BOX-PCR) also plays a vital role in studies of microbial isolates from various environments.
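The principle behind ARDRA, in which restriction sites in the 16S rRNA gene produce a characteristic set of fragment sizes, can be imitated in silico. The sketch below digests a linear sequence with HaeIII, which recognises GGCC and cuts between the G and the C; the input sequence is a made-up toy string and not a real 16S rRNA gene, and the function names are hypothetical.

```python
def digest(sequence, site="GGCC", cut_offset=2):
    """Cut a linear DNA sequence at every occurrence of a restriction site.

    cut_offset is the number of bases of the site that stay on the upstream
    fragment (2 for HaeIII, which cuts GG^CC). Only the given strand is
    searched, which is sufficient for a palindromic site.
    """
    seq = sequence.upper()
    cut_positions = []
    start = seq.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = seq.find(site, start + 1)

    fragments = []
    previous = 0
    for position in cut_positions:
        fragments.append(seq[previous:position])
        previous = position
    fragments.append(seq[previous:])
    return fragments


def banding_pattern(sequence, site="GGCC", cut_offset=2):
    """Fragment lengths sorted from largest to smallest, as read off a gel."""
    return sorted((len(f) for f in digest(sequence, site, cut_offset)), reverse=True)


# Toy amplicon, invented for illustration only.
TOY_AMPLICON = "AAAAGGCCTTTTTTGGCCAA"
```

Two isolates whose amplicons give the same banding_pattern with several enzymes would be grouped together; differing patterns point to different taxa.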

The sequencing of the Haemophilus influenzae genome was a landmark in modern biology, as it marked the beginning of the genomic era. Next-generation sequencing (NGS) technologies introduced from 2005 provided a new platform resulting in a rapid increase in the prokaryotic genomes getting sequenced (Deurenberg et al. 2017; Besser et al. 2018). Genomic taxonomy is the newest addition to the bacterial systematics world. Genome microbial taxonomy is paving a new path for the dynamic system-based classification.

5 Advanced Genome-Based Bacterial Taxonomy

Polyphasic taxonomy, complemented by molecular fingerprinting techniques (AFLP, RFLP and others), served for the delineation of taxa for a long time (Rademaker et al. 2000; Gurtler and Mayall 2001; Van Belkum et al. 2001). Gaps nevertheless remain in the definition of species. Genome-based taxonomy is the missing link that can bridge genome-based and phenotype-based classification. It has been well demonstrated that genomic signatures can be used to define bacterial species. A uniform genomic definition of the bacterial species would group strains according to shared genomic criteria.

With all the advantages that each technique has brought to the field, whole-genome sequencing will be the ultimate step in resolving the taxonomic position of prokaryotes. Advances in technology and the falling cost of whole-genome sequencing have led to a monumental increase in the number of sequenced genomes. Thousands of prokaryotic whole-genome sequences are available, but only a few hundred are from type strains, which greatly restricts the comparative use of genomic data in taxonomy (Chun and Rainey 2014). The governing body has therefore made whole-genome sequencing mandatory in some cases where it is needed to resolve uncertainty about taxonomic rank (Konstantinidis and Tiedje 2005a). Although not mandatory for publication, the inclusion of genome sequence data is highly recommended and is expected in new taxon descriptions submitted to the International Journal of Systematic and Evolutionary Microbiology.

In recent years, whole-genome sequencing has helped to resolve the complex taxonomic positions of certain species of Vibrio, Mycoplasma, Xanthomonas and Prochlorococcus (Jones et al. 2004; Thompson et al. 2013; Barak et al. 2016). Genomic parameters such as average amino acid identity (AAI) and average nucleotide identity (ANI) have been applied to bacterial species definition and classification (Qin et al. 2014). The ANI of the genes shared between the strains being compared correlates especially closely with the level of DDH, and an ANI value of 95–96% can serve as a genomic measure for prokaryotic species delineation (Konstantinidis and Tiedje 2005b). At the genus level, the percentage of conserved proteins (POCP) has been used as a robust index of the genus boundary for prokaryotes. If all pairwise POCP values within a group are higher than 50%, the group can be defined as a prokaryotic genus (Qin et al. 2014).
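The POCP criterion can be made concrete with a short Python sketch. The formula used here, POCP = (C1 + C2) / (T1 + T2) × 100, with C the number of conserved proteins and T the total number of proteins in each genome, is the definition usually attributed to Qin et al. (2014); it is not spelled out in the text above and should be checked against that paper. The function names and all counts are hypothetical.

```python
def pocp(conserved_1, conserved_2, total_1, total_2):
    """Percentage of conserved proteins between two genomes.

    Assumed formula: (C1 + C2) / (T1 + T2) * 100, where C is the number of
    conserved proteins and T the total number of proteins in each genome.
    """
    if total_1 <= 0 or total_2 <= 0:
        raise ValueError("total protein counts must be positive")
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


def forms_genus(pairwise_pocp_values, threshold=50.0):
    """True when every pairwise POCP value is higher than the threshold."""
    values = list(pairwise_pocp_values)
    return bool(values) and all(value > threshold for value in values)


def within_species_by_ani(ani, threshold=95.0):
    """True when ANI reaches the lower bound of the 95-96% range quoted above."""
    return ani >= threshold
```

For instance, two genomes with 3000 and 2800 proteins, of which 1500 and 1400 are conserved, give a POCP of exactly 50%, which does not exceed the genus threshold.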

Vibrios, which belong to the Gammaproteobacteria, are found in the environment and also in close association, as pathogens, with plants and animals. The genus contains around 152 species, most of which are in specific host-pathogen relationships with humans or animals and some in mutualistic relationships (http://www.bacterio.cict.fr/index.html). Techniques such as MLSA, DDH and ΔTm are conventionally used for the delineation of Vibrio species (Thompson et al. 2004), which are usually difficult to segregate into taxa owing to their similar genomic and phenotypic characteristics. Thompson et al. (2009) restructured the genus in a study of 43 genomes and observed that vibrios fall into three major groups or genera: Vibrio, Photobacterium and Aliivibrio. Critical genome analysis of vibrios has also led to the description of two novel species closely related to Vibrio cholerae. Similarly, Mycoplasma species are very difficult to delineate on the basis of 16S rRNA similarity and DDH. For example, Mycoplasma pneumoniae and Mycoplasma genitalium have a 16S rRNA similarity of about 98%. When 46 Mycoplasma genomes were critically analysed, Mycoplasma pneumoniae and Mycoplasma genitalium were found to have only 73% MLSA similarity, 67% AAI and a Karlin genomic signature of 88. Together with many more genome-based observations, this showed Mycoplasma to be paraphyletic (Thompson et al. 2013).

More evidence based on the core genome is needed to settle the uncertain positions of a few species. Another interesting case is that of Mycobacterium. To date, 193 species and 13 subspecies have been validly described (http://www.bacterio.net). The Mycobacteriaceae comprise pathogenic as well as non-pathogenic species: Mycobacterium tuberculosis, Mycobacterium leprae and Mycobacterium abscessus are considered pathogenic, whereas Mycobacterium smegmatis and Mycobacterium thermoresistibile are non-pathogenic (Brosch et al. 2000; Prasanna and Mehra 2013). Various bioinformatics software packages designed for genome analysis make the annotation of several mycobacterial species feasible. These advances have yielded collective information on evolutionary traits, sequence homology, conserved regions and gene ontology content (Malhotra et al. 2017). A comparative genomic analysis of 21 mycobacteria by Zakham et al. (2012) revealed that 1250 Mycobacterium gene families were conserved across all species, while the Mycobacterium pan-genome comprised a total of 20,000 gene families (Zakham et al. 2012). Moreover, the pathogenic species were seen to have undergone genome reduction and gained a defined group of genes for repair and protection (Wassenaa et al. 2009; Zakham et al. 2012). Mycobacterium leprae is the pathogen with the smallest genome, with 1600 genes and approximately 1300 pseudogenes (Singh and Cole 2011). Functional orthologs of more than 75% of these pseudogenes, belonging to various protein groups, are present in other mycobacterial species (Malhotra et al. 2017; Muro et al. 2011). Tapping these variations for genomic identity can therefore be an excellent tool for taxonomic purposes.

Coleman and Spain (2003) first described Mycobacterium strain JS623 from an environmental sample; on the basis of a 16S rRNA identity of 96.7% (over 421 bp), it appeared most similar to Mycobacterium smegmatis. It continued to be studied as a strain of Mycobacterium smegmatis until Ramasamy et al. (2014) raised the species delimitation threshold to 98.7%. Subsequent systematic gene and genome analyses showed that strain JS623 is a mycobacterium more closely related to M. moriokaense than to M. smegmatis and is therefore not a member of the latter species, as had previously been believed (Garcia and Gola 2016).

We see that standard methods such as 16S rRNA gene sequence analysis and DDH may not be sufficient on their own for establishing phylogenetic relationships; they have to be corroborated by evidence from whole-genome sequencing. Genome-based taxonomy becomes essential for delineating closely related species and revealing species-specific patterns. In genomic terms, a prokaryotic species can be defined as a group of strains that share <10 in Karlin signature (Karlin et al. 1997; Coenye and Vandamme 2004), >95% AAI and ANI (Goris et al. 2007; Konstantinidis and Tiedje 2005a, b; Rohwer and Edwards 2002), >95% identity based on multiple alignment genes (Thompson et al. 2008) and >70% in silico genome-to-genome distance (Auch et al. 2010).
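The genomic species criteria listed above lend themselves to a simple checklist, sketched here in Python. The metric names and the function are hypothetical; the thresholds are those quoted in the text, and only the metrics actually supplied are evaluated.

```python
import operator

# Thresholds as quoted in the text; the metric names are chosen for this sketch.
CRITERIA = {
    "karlin_signature": (operator.lt, 10.0),
    "aai": (operator.gt, 95.0),
    "ani": (operator.gt, 95.0),
    "mlsa_identity": (operator.gt, 95.0),
    "in_silico_ggd": (operator.gt, 70.0),
}


def failed_criteria(**metrics):
    """Return the sorted names of supplied metrics that miss their threshold."""
    failures = []
    for name, value in metrics.items():
        if name not in CRITERIA:
            raise KeyError(f"unknown metric: {name}")
        compare, threshold = CRITERIA[name]
        if not compare(value, threshold):
            failures.append(name)
    return sorted(failures)


def same_genomic_species(**metrics):
    """True when at least one metric is given and none fails."""
    return bool(metrics) and not failed_criteria(**metrics)
```

Applied to the Mycoplasma pneumoniae and Mycoplasma genitalium values given earlier (73% MLSA similarity, 67% AAI, Karlin signature of 88), all three supplied criteria fail, in line with their treatment as separate species.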

6 Future Prospectives

We have seen the expansion of bacterial taxonomy, mainly due to the contribution of more accessible and unambiguous techniques. The further advent of molecular techniques has improved taxonomic studies. We are now in the transitional stages of omics, in which the large volume of data generated will be subdivided and specific markers will be applied to taxonomic studies. Huge amounts of genome data have already been submitted to various databases. The Genomic Encyclopaedia of Bacteria and Archaea (GEBA) project was started by the DOE Joint Genome Institute in 2007 to pilot the genome sequencing of type strains (Wu et al. 2009). This effort will help bridge the gap that has arisen from the biased sequencing of only the physiologically advantageous organisms. It has been suggested time and again that the genomes of type strains can be used in comparative studies for taxonomic purposes (Chun and Rainey 2014). Whitman also emphasised that DNA and genome sequences could in future complement, and to some extent substitute for, pure cultures as type material deposited in culture collections (Whitman 2015). Moreover, genomes reflect the biology of the organism and its evolutionary lineage, and therefore help avoid the creation of redundant species. The expansion of next-generation sequencing and the subsequent decrease in its price have led to a better understanding and scanning of gene families for the fine delineation of taxa. Dynamic approaches are being embraced to accelerate bacterial taxonomy, and innovative technologies and study systems are under development for effective identification. The following systems are highly ambitious and visionary, and could help fill gaps and correct errors in taxonomic studies. Advanced taxonomy can be summarised as follows:

(i) Integrated Taxonomy: Although polyphasic taxonomy has solved the problem of ambiguous ranking of taxa through its methodical application of phylogeny, chemotaxonomy and phenotype-based studies, there remains considerable room for improvement. Polyphasic studies have added some dimension to taxonomy, but they remain confined to certain aspects because they do not reflect the genomic content of the organism. Incorporating various metabolomic and physiological parameters into polyphasic taxonomy has its own limitations owing to the variability caused by environmental differences. Moreover, the attributes considered for classification remain compartmentalised and are not inherently linked to the genomic content. Therefore, there is a definite and immediate need for a classification system that encompasses genome data in the formulation of new taxa. The unparalleled system for this kind of taxonomy would be ‘integrated taxonomy’.

In the present context, scientists are exploring genomics, transcriptomics and metabolomics to elucidate various processes and functions. Whole-genome sequencing plays a vital role in describing bacterial phylogeny through a systems biology approach based on mechanistic genome annotation. Applying the same genomic data to translation-based studies would allow related species and taxa to be legitimately distinguished on the basis of the data generated. Integrated studies of this type will help us assemble diverse information and put forward a re-analysed phylogenetic history and novel biological proteins (Wu et al. 2009). Genome annotation of the taxa under study can unravel a great deal of information owing to the large datasets generated. Translational and prediction-based inspection of genome sequences can help us ascertain the production of possible novel metabolites and proteins specific to the taxa. Thus, all this work together will complement the 16S rRNA gene as a basis for describing bacterial phylogeny and add value to it.

In next-generation bacterial identification (NGBI), taxonomists would rely on both genomics and metabolomics to determine microbial phylogeny. Postgenomic developments are therefore useful for describing phylogeny more definitively. Taxonomy, especially bacterial taxonomy, has duly accepted new and advancing technologies. Therefore, polyphasic taxonomy can clearly be replaced by integrated taxonomy.

(ii) Systems Taxonomy: Even though integrated taxonomy provides a genomic framework for classification, system-based taxonomy remains the advisable and prudent system for classification. It is an ultimate and ambitious goal to have inclusive taxonomic studies that embrace genomics, proteomics and metabolomics along with various other components. It would in fact consider all the factors that influence the survival of the bacteria under study. Given the rapid advancement of technology, the development of a system-based taxonomy, in which a broad range of interdisciplinary subjects is included to give a comprehensive yet precise functional taxonomy, is not far off.

Generally, it would be a holistic approach to identifying and assigning taxa to the bacteria under study. Microbial metabolomes, proteomes and transcriptomes are variable and dynamic under different environmental conditions, but they still play a vital role in bacterial physiological processes. So far, these factors have not been considered in taxa delineation, and as a result we fail to understand certain evolutionary divergences and convergences that lead to speciation. System-based taxonomy can be the missing link in critical studies of taxa delineation. Moreover, system-based taxonomy comprises all the consolidated components of systems biology that affect the organism’s existence. It would consider all the parameters surrounding a taxon, from the single cell to its complex interactions in the environment. Everything would be considered, starting with the cell architecture (shape and size), its membrane components, biochemical activities, metataxon information (polar lipids, quinones, etc.) and physiological processes (growth mode, respiration, reproduction and energy metabolism), through to its genomic fingerprints. At a higher level, all the proteomic and metabolomic networks would be connected with the ecological habitats and environmental factors (such as pH, temperature and salinity). Essentially, studies of this type would network all possible factors into functional units for a proper understanding of the taxa under study and their subsequent description as novel taxa. Therefore, this type of system-based taxonomy can solve major issues arising from biased studies.

(iii) Virtual Taxa: It is evident from numerous studies that less than 1% of the actual microbial wealth is known, whereas the remainder lies undiscovered. Metagenomics has also strengthened the scientific belief in a microbial wealth that is yet to be cultivated. Therefore, forming ranks with the help of virtual taxa is the need of the hour. In particular, it would further strengthen the taxonomy of uncultured organisms, that is, those with Candidatus species status, with which virtual taxa can be well correlated. Virtual taxa are on a par with the Candidatus species status because such organisms are difficult to cultivate. Virtual taxonomy can be defined as the identification and classification of single bacterial cells based on their genomic DNA and the other cellular parameters under consideration. Single cells, sorted using fluorescence-activated cell sorting (FACS), can be genome sequenced to frame the virtual taxonomy of the organism using bioinformatics tools. This approach can be used to include Candidatus species in community analysis.

As such, genome sequences would serve as the main source of information in postulating and circumscribing species that are not available as pure cultures (Konstantinidis and Rosselló-Móra 2015). Based on the DNA sequences retrieved from metagenomic studies, virtual taxonomy could be functional and practical, allowing predictions and conclusions about cell structure, physiology and biological roles to be reported from the genome information (Fig. 1.2). Combining and analysing the results thus obtained will yield a ‘consensus taxonomy’ of a microorganism. Consequently, there is an immediate need for a rapid, non-culture-based method of identification that uses molecular approaches, mainly DNA-based methods centred on taxa-specific loci. Such an approach would foster a rapid and cost-effective method for bacterial identification in the coming years.

Fig. 1.2 Identification and classification of finely delineated virtual species by filtration of the epitaxonomic information and genomic information such as genomics, phenotyping, chemotaxonomy, metataxon information, physiology and metabolic networking, using bioinformatic tools

7 Concluding Remarks

Since all the information applied in taxonomic studies is an asset rather than a liability, integrated and system-based taxonomy have a good probability of extensive implementation in the future owing to their functionality. The change may be gradual, but it will surely help in obtaining the bigger picture by linking the genome of an organism to its state of being (phenotype) and to various environmental factors. Formulating virtual taxa, on the other hand, is a liability of sorts, used only to aid taxonomists in understanding organisms in their inherently uncultivated state. It is a visualisation aid. Although many methods have been realised and others are still being developed, there remains a pressing need for a rapid method of bacterial identification.