Abstract
The current prokaryotic taxonomy classifies phenotypically and genotypically diverse microorganisms using a polyphasic approach. With advances in the next-generation sequencing technologies and computational tools for analysis of genomes, the traditional polyphasic method is complemented with genomic data to delineate and classify bacterial genera and species as an alternative to cumbersome and error-prone laboratory tests. This review discusses the applications of sequence-based tools and techniques for bacterial classification and provides a scheme for more robust and reproducible bacterial classification based on genomic data. The present review highlights promising tools and techniques such as ortho-Average Nucleotide Identity, Genome to Genome Distance Calculator and Multi Locus Sequence Analysis, which can be validly employed for characterizing novel microorganisms and assessing phylogenetic relationships. In addition, the review discusses the possibility of employing metagenomic data to assess the phylogenetic associations of uncultured microorganisms. Through this article, we present a review of genomic approaches that can be included in the scheme of taxonomy of bacteria and archaea based on computational and in silico advances to boost the credibility of taxonomic classification in this genomic era.
Introduction
Prokaryotic taxonomy is pragmatic and gradually evolving as more and more organisms are being discovered with advancement in technological innovations. In the late nineteenth century, bacterial strains were delineated by using only phenotypic properties (Cohn 1872), which were soon found insufficient to classify diverse microorganisms that were subsequently isolated. Hence, physiological, chemotaxonomic and biochemical properties of bacteria were included in the bacterial classification system (Orla-Jensen 1909; Buchanan 1955). However, this approach was challenged in the 1970s and further strengthened by the inclusion of numerical taxonomy, DNA–DNA hybridization (DDH) (Brenner et al. 1969) and the introduction of the polyphasic taxonomy (Colwell 1970). Current taxonomic schemes are based on the polyphasic approach which specifically provides measures of evolutionary relationships using 16S rRNA gene and other genomic information such as DDH to determine the phylogenetic position of an isolate and is supplemented with phenotypic, chemotaxonomic and physiological properties to assess species novelty (Prakash et al. 2007).
The polyphasic approach, despite being widely used today, is in need of rectification due to (1) ambiguity in precise taxonomic assignments in light of genome data analysis, or inconsistency between taxonomic assignments and genomic data, (2) difficulty in classifying closely related strains and (3) discrepancies in the assignments of some organisms, which do not comply with their original assignments. Another major challenge facing present-day prokaryotic taxonomy is the goal of covering a huge landscape of more than 10⁶ bacterial and archaeal species (Yarza et al. 2014). So far, taxonomists have characterized only ~15,000 validly named species (Sutcliffe 2015) and many others are yet to be validly assigned a taxonomic position. In line with these challenges, another factor that could be crucial is the limited use of genomic information in the taxonomic identification of novel genera or species being introduced. At the genetic level, a species generally includes strains with approximately >70% DNA–DNA relatedness, <5 °C ΔTm, <5 mol% G+C content difference and >97% 16S rRNA gene identity (Wayne et al. 1987). In addition, a recent genome-level study has shown an intra-genus variation of up to 8% in %G+C content (Kumar et al. 2017). Recently, various genomic data analysis tools based on whole genome sequences such as Average Nucleotide Identity (ANI), Average Amino acid Identity (AAI) and in silico Genome to Genome Distance Hybridization (GGDH) have provided scientifically valid taxonomic standards for classifications. Thus, it is anticipated that taxonomy will be more steadily dependent on genomic signatures (Thompson et al. 2009, 2011) rather than relying only upon classical polyphasic characterization. This review focuses on a recent taxonomic workflow linking classical taxonomy with advances in genomics to supplement bacterial systematics.
Although genomics data have been applied to microbial systematics for about a decade, the present review highlights recently described methods to accommodate the rapidly expanding field of genomics to classify microbial diversity.
Limitations of traditional taxonomy practices
While traditional methods have been reviewed from time to time, it is necessary to mention the limitations in brief in order to describe newer strategies for bacterial classification. During the past 30 years, microbiologists have relied mostly on single gene sequence information, namely the 16S rRNA gene sequence, for microbial classification. This universal prokaryotic gene is considered to behave as a molecular chronometer (Woese 1987; Vandamme and Peeters 2014; Yarza et al. 2014) and it allows analysis of phylogenetic relationships among distant taxa. The primary reason for its widespread use in the polyphasic approach is the universal presence of the 16S rRNA gene in all bacteria and archaea. The limitations include multiple copy numbers and intra-genomic sequence differences (2–5%) of this gene in some organisms (Schmidt et al. 2001; Acinas et al. 2004; Větrovský and Baldrian 2013; Kim et al. 2014). In addition, it has also been reported that the 16S rRNA gene has low phylogenetic resolution at the species level and poor resolving power for some genera due to its highly conserved nature (Jaspers and Overmann 2004; Hahn et al. 2016).
Despite this ambiguity, sequence variation in variable regions of the 16S rRNA gene sequence provides sufficient diversification to be considered for phylogenetic delineation of different taxa, while conserved regions are used as targets to design primers for polymerase chain reaction or hybridization probes. Apart from the 16S rRNA gene, other genes/genomic regions have also been used as marker genes for the study of taxonomic identification (Sharma et al. 2016b). The 16S-23S rRNA gene internal transcribed spacer sequence has been used to distinguish between Mycobacterium spp. and was found to be useful for species that are not distinguishable by 16S rRNA gene sequences (Roth et al. 1998). Similarly, 23S rRNA gene sequences have been helpful in distinguishing among Streptococcus spp. (Kotilainen et al. 2006). Other genes such as the citrate synthase gene in the genera Bartonella and Rickettsia and heat shock proteins in mycobacterial species have been used to define taxonomic relationships (Pai et al. 1997; Zeaiter et al. 2002; Lee et al. 2003; Fournier et al. 2003; Lassance et al. 2010; Verbeke et al. 2011).
For species delineation, another commonly used technique is DDH, which measures the genetic identity between pools of DNA of different strains. It provides the genetic distance between two organisms based on DNA hybridization percentage. Compared with the traditional radioactive method of DDH determination, a far better technique exploits the microtiter assay for DNA hybridization, which uses fluorimetry to estimate DNA–DNA relatedness using SYBR Green I (Ezaki et al. 1989) and quantitatively determines fluorescence at increasing temperatures by using a Real-Time PCR thermal cycler (Gonzalez and Saiz-Jimenez 2005; Rosselló-Móra 2006; Tindall et al. 2010). However, even after such refinements, DDH methods are not free from limitations. In particular, the ≥70% cut-off has not been applied consistently across all bacterial genera. In the case of Rickettsia species, for example, a DDH of 70% would not distinguish Rickettsia rickettsii, Rickettsia conorii, Rickettsia sibirica and Rickettsia montanensis (Drancourt and Raoult 1994; Sentausa and Fournier 2013). Besides, this technique is based on comparative estimation; hence, no incremental database can be created. Additionally, this technique requires special facilities, is labor intensive and expensive, and lacks reproducibility (Prakash et al. 2007).
Thus, it is evident from the above discussion, that characterization of bacteria cannot be based solely on approaches such as 16S rRNA gene sequence comparison and DDH values. However, to date, complete genome sequences and improved genome annotations have not yet resulted in reliable predictions of metabolic and chemotaxonomic features, as the ability to deduce the chemotaxonomic properties from genomic data is still in its infancy (Sutcliffe and Trujillo 2012). Thus, in the present scenario, we need an integrative method that employs the best aspects of the traditional polyphasic approach with genomic data to infer systematic relationships.
Coupled with advances in sequencing technologies and the availability of a large number of genome sequences (draft and complete), the application of computational tools has further provided an impetus to establish taxonomic schemes based on the evolutionary information contained in genome sequences. In many review articles (Klenk and Göker 2010; Thompson et al. 2011; McDonald et al. 2012), it has been projected that microbial taxonomy will be steadily more dependent on genome sequences rather than relying on the classical polyphasic approach, as will be described in this review. The use of genomic data is not new, and genomic information can be easily harnessed to study inter- and intra-species relationships using concepts such as Karlin genomic signatures, AAI and in silico GGDH (Thompson et al. 2013). With these genome comparison concepts, we can now overcome the limitations posed by the traditional polyphasic approach. In addition, sequencing technologies have now become relatively affordable, easing their use in routine microbial identification (Loman and Pallen 2015).
Several research studies (Auch et al. 2010a, b; Meier-Kolthoff et al. 2013) and symposia discussions targeted at microbial taxonomy and systematics emphasized the study of genomics to resolve and overcome the limitations of the traditional polyphasic approach (Thompson et al. 2011; Větrovský and Baldrian 2013; Pillonel et al. 2015; Sangal et al. 2016). There was also a hot debate and discussion on this topic even at the Bergey’s International Society for Microbial Systematics Symposium Conference held on September 12th–16th, 2016 at Pune, India (http://bismis.org/?cm=body_bismis2016). The intuitive idea among the scientists engaged in taxonomy of prokaryotes is that it is only a matter of time till the current microbial taxonomy scheme will move towards genomics inferences (Bernard et al. 2010, Chun and Rainey 2014). As mentioned above, this shift is primarily due to limitations in the classical polyphasic taxonomy and the constant development of newer sequencing technologies and computational tools (Sangal et al. 2016).
Emergence of high throughput sequencing techniques
Advancements in next-generation sequencing technologies in the last decade have enabled easy access to economically feasible sequencing platforms. As a result, thousands of bacterial genome sequences (more than 87,400) are now available in the public database (https://www.ncbi.nlm.nih.gov/genome/browse/). However, most of these sequences do not represent type strains. Realizing this gap and exploiting the advantage of next-generation sequencing (NGS) technologies, several efforts are being made to sequence the type strains. For instance, the Genomic Encyclopaedia of Bacteria and Archaea (GEBA) was launched in 2007 (Wu et al. 2009) by the Joint Genome Institute with the aim of sequencing 250 type strains from branches of the tree of life with low sequence representation. Since then, two phases of this project have been successfully completed (Kyrpides et al. 2014). Currently, the third phase of the GEBA sequencing project is under way; it encompasses the genome sequencing of soil- and plant-associated bacterial type strains, including newly characterized type strains (Whitman et al. 2015). In any case, we believe that sufficient supporting genomic data should be included when performing taxonomic characterization.
Minimum computational tools and reliable information required for genome annotations
As mentioned before, the emergence of cost-effective, high-throughput DNA sequencing technologies including Illumina, Solexa, Ion Torrent, Single Molecule Real Time (SMRT, PacBio) and Oxford Nanopore has made it possible to sequence bacterial or archaeal genomes even in general microbiology laboratories. Further advancement of these sequencing technologies coupled with lower sequencing cost has brought genomics to the forefront of modern microbial taxonomy. A new term, taxo-genomics, was coined by Ramasamy et al. (2014) to describe this approach. Taxo-genomics, as highlighted in many reports, will not be a complete analysis of the genome(s) but will depend on a minimum set of bioinformatics tools that can be easily handled and can at the same time provide sufficient information to resolve the taxonomy of a particular strain from draft or complete genomes (Dunlap et al. 2016; Kumar et al. 2015). Based on the currently available sequencing technologies and computational tools, we propose that taxo-genomics should include the use of comparative genomics methodologies, primarily Multi Locus Sequence Typing (MLST) (Miyoshi-Akiyama et al. 2013), Genome to Genome Distance Calculator (GGDC) (Meier-Kolthoff et al. 2013), Average Nucleotide Identity (ANI), Average Amino Acid Identity (AAI) (Konstantinidis and Tiedje 2005), Tetranucleotide frequency (Teeling et al. 2004), Codon Usage Bias (Ran et al. 2014), Pan genomic Analysis (Tettelin et al. 2005) and synteny analysis. However, this concept will need to be continuously evaluated, as we understand that this scenario will change rapidly due to increasing developments in sequencing technologies and computational tools. The use of the above-mentioned minimum tools for taxo-genomics is described below.
Multi locus sequence typing/analysis
Multi-locus sequence typing (MLST) can be used to overcome the limitations of 16S rRNA gene methods and delineate closely related strains (showing >99% 16S rRNA gene sequence identity). This approach was used to characterize bacterial pathogen variants at the sub-species level as early as 1998, when eleven housekeeping gene alleles were employed to reliably identify the major meningococcal lineages associated with invasive disease among Neisseria meningitidis (Maiden et al. 1998). The gene loci were compared and, based on the variation observed, each distinct sequence was given an allelic identifier. The combination of alleles present across the loci was designated as a sequence type, and relationships were thus inferred from a comparison of these allelic profiles (Maiden et al. 1998).
A variation of MLST known as Multi Locus Sequence Analysis (MLSA) was developed to decipher phylogenies based on the concatenated sequences of various protein-coding housekeeping genes. Although the application of the 16S rRNA gene provides taxonomic resolution at the species level or above, integration of MLSA into prokaryotic taxonomy provides the additional advantage of assignment of taxonomic status at the sub-species level. Studies using MLSA (McTaggart et al. 2010; Glaeser and Kämpfer 2015) employ at least eight protein-coding genes for species delineation. Using the MLST approach, we have proposed several novel species having high 16S rRNA gene sequence identity to their neighbours e.g. Acinetobacter indicus A648T (97.6%) (Malhotra et al. 2012); “Thermus parvatiensis” RLT (99.5%) (Dwivedi et al. 2015) and Fictibacillus halophilus AS8T (99.9%) (Sharma et al. 2016b). However, there is a need to develop consensus among taxonomists to decide a universal set of genes to be employed in MLSA to accurately determine phylogenetic relationships among different species. For MLST, the following methods have been used:
1. rMLST: This method predicts species based on the use of 53 ribosomal genes (Jolley et al. 2004).
2. TaxonomyFinder: This method involves species prediction through the use of the proteome specific to taxonomic groups and incorporates data from three databases, namely PfamA, TIGRFAM and Superfamily, to cluster homologous proteins into protein families. This software is freely available at http://cge.cbs.dtu.dk/services/TaxonomyFinder/ (Lukjancenko et al. 2013).
3. AMPHORA2: This tool uses 31 bacterial and 104 archaeal protein-coding marker genes for phylotyping purposes. It is freely available at https://pitgroup.org/amphoranet/ (Wu and Scott 2012).
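The allele and sequence-type bookkeeping behind MLST can be illustrated with a minimal Python sketch. It is not the implementation of any of the tools listed above; the function name `type_isolates`, the locus names and the toy sequences are hypothetical, and numbering alleles in order of first appearance is an assumption made only for illustration (curated MLST databases assign these identifiers centrally).

```python
from typing import Dict, Tuple


def type_isolates(isolates: Dict[str, Dict[str, str]]) -> Dict[str, Tuple[Tuple[int, ...], int]]:
    """Assign allele numbers per locus and a sequence type (ST) per isolate.

    `isolates` maps an isolate name to {locus name: nucleotide sequence}.
    Identifiers are numbered in order of first appearance (illustrative only).
    """
    loci = sorted(next(iter(isolates.values())))
    allele_ids = {locus: {} for locus in loci}   # locus -> {sequence: allele number}
    st_ids = {}                                  # allelic profile -> ST number
    result = {}
    for name, genes in isolates.items():
        profile = []
        for locus in loci:
            seq = genes[locus].upper()
            table = allele_ids[locus]
            if seq not in table:                 # a new distinct sequence is a new allele
                table[seq] = len(table) + 1
            profile.append(table[seq])
        key = tuple(profile)
        if key not in st_ids:                    # a new combination of alleles is a new ST
            st_ids[key] = len(st_ids) + 1
        result[name] = (key, st_ids[key])
    return result


# Toy data: two loci, three isolates; strain_C differs from the others at gyrB only.
example = {
    "strain_A": {"gyrB": "ATGGCA", "recA": "TTGACC"},
    "strain_B": {"gyrB": "ATGGCA", "recA": "TTGACC"},
    "strain_C": {"gyrB": "ATGGCT", "recA": "TTGACC"},
}
typed = type_isolates(example)
```

In this toy example, strain_A and strain_B share an allelic profile and therefore a sequence type, whereas a single-locus difference places strain_C in a different one.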
Average nucleotide identity (ANI) and average amino acid identity (AAI)
With the wide availability of bacterial genome sequences, the gold standard for identifying genome relatedness, DDH, has been superseded by the more reproducible, fast and easy-to-implement overall genome relatedness index (OGRI) methods. ANI is the most widely implemented OGRI algorithm for identification and measurement of overall genomic relatedness between two strains (Beaz-Hidalgo et al. 2015; Li et al. 2015a, b; Rosselló-Móra and Amann 2015; Yi and Chun 2015; Lee et al. 2016a, b). For taxonomic identifications, BLASTN-based rather than MUMmer-based ANI has been successfully employed so far (Ramasamy et al. 2014).
As early as 2005, Konstantinidis and Tiedje proposed the use of ANI and AAI for comparative genomic analyses, especially in relation to taxo-genomics (Konstantinidis and Tiedje 2005). Both of these approaches can be used to delineate inter-genomic distance between closely related prokaryotic species. A closer 16S rRNA similarity score (>98%) has been predicted (Goris et al. 2007), wherein ANI has been applied as more of a traditional tool for bacterial taxonomy. Average Nucleotide Identity by Orthology (OrthoANI) was introduced later, when an analysis using a total of 14,745 genome sequences (representing members of 10 genera) was conducted. A total of 63,690 genome pairs were analyzed and it was found that 55% of these pairs exhibited over 0.1% discrepancy and 1101 pairs showed more than 1% discrepancy between reciprocal ANI values (Lee et al. 2016a, b). This level of discrepancy between reciprocal ANI values is significant enough to affect subsequent taxonomic interpretation, as an ANI value of approximately 95–96% is considered the species boundary (Goris et al. 2007; Richter and Rosselló-Móra 2009; Chun and Rainey 2014).
Use of AAI has also been demonstrated to correlate with MLST. Konstantinidis and Tiedje (2005) studied the relationships among 175 strains using complete genome sequences based on the shared gene content and AAI. In this study, conserved gene content was predicted using a two-way BLAST-based algorithm (tBLASTN). By comparing the result of 16S rRNA- and AAI- based tree constructions, they concluded that AAI offered better resolving power within species and is a good option for phylogenetic comparisons. In fact, this approach has a major role to play in taxo-genomics (Konstantinidis and Tiedje 2005). In any case, it is proposed that rather than using the mean of nucleotide identity values between fragments of the query strain and the genome of the subject strain as a measure of genome relatedness, the mean of two reciprocal ANI values should be employed to measure the genome relatedness as described by Lee et al. (2016a, b).
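The effect of averaging reciprocal ANI values can be shown with a short Python sketch. The function names and the example values are hypothetical; the default threshold of 95% is an assumption taken from the lower end of the 95–96% species boundary quoted above, and computing the one-directional ANI values themselves (for example with a BLASTN-based tool) is outside the scope of the sketch.

```python
def mean_reciprocal_ani(ani_a_to_b: float, ani_b_to_a: float) -> float:
    """Return the mean of the two one-directional ANI values (in percent)."""
    return (ani_a_to_b + ani_b_to_a) / 2.0


def reciprocal_discrepancy(ani_a_to_b: float, ani_b_to_a: float) -> float:
    """Return the absolute difference between the reciprocal ANI values."""
    return abs(ani_a_to_b - ani_b_to_a)


def within_species_boundary(ani_percent: float, threshold: float = 95.0) -> bool:
    """Compare an ANI value with an assumed species boundary (default 95%)."""
    return ani_percent >= threshold


# Hypothetical pair whose reciprocal values fall on either side of the boundary.
forward, reverse = 94.6, 95.8
combined = mean_reciprocal_ani(forward, reverse)
```

With these illustrative numbers, the two one-directional values would lead to opposite conclusions at a 95% cut-off, while the reciprocal mean yields a single value for interpretation.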
Genome-to-genome distance calculator as replacement of DDH
The recent advent of in silico genome-to-genome comparison endeavours to replace the cumbersome DDH procedure. Genome-to-Genome Distance Calculator (GGDC) (Auch et al. 2010a) is a procedure used to calculate inter-genomic distances for outlining species relationships. GGDC is based on the Genome Blast Distance Phylogeny (GBDP) program (Henz et al. 2004), which can be used for genome-based species and subspecies delineation. Comparisons among genomes are made pair-wise using alignment programs of GGDC such as high-scoring segment pairs (HSPs) or maximally unique matches (MUMs) (Auch et al. 2010b). Sequences can be directly uploaded on the GGDC web page available on the DSMZ website (http://ggdc.dsmz.de/). The user can choose the sequence similarity search tool from a list of algorithms including BLAST+, NCBI-BLAST, BLAT, BLASTZ, WU-BLAST and MUMMER (Meier-Kolthoff et al. 2013). Following this, distances are deduced based on score calculations. Finally, these values are converted into DDH values so as to make them comparable with laboratory ranges of DDH values. As with DDH, the species delineation cut-off with dDDH (digital DDH) is also 70% (Meier-Kolthoff et al. 2013). Additionally, even subspecies can be delineated using dDDH values for which a cut-off of 79% has been considered appropriate (Meier-Kolthoff et al. 2014a).
GGDC has been used to deduce DDH values in silico for the reclassification of Desulfurococcus mobilis as Desulfurococcus mucosus and the reclassification of Desulfurococcus fermentans and Desulfurococcus kamchatkensis as Desulfurococcus amylolyticus (Perevalova et al. 2016). D. mobilis, D. fermentans and D. kamchatkensis had been characterized earlier using the polyphasic approach, but once their genome sequences were available, their classifications were re-assessed based on DDH values generated using GGDC. It was shown that reassociation values were much above 70% and hence their definitions were revised accordingly. Similarly, Methanocaldococcus bathoardescens has been successfully designated a novel species using purely in silico tools such as ANI, GGDC and synteny analyses (Stewart et al. 2015). ANI and GGDC were also used as replacement tools for DDH in the classification of Paracoccus sanguinis (McGinnis et al. 2015) and Thermodesulfobium acidiphilum (Frolov et al. 2017). In another study (Lagkouvardos et al. 2016), the mouse gut metagenomic diversity was established and taxonomically assigned using purely in silico tools based on GGDC and %G+C difference analysis. In a recent study, “T. parvatiensis” was distinguished from its closely related neighbour Thermus thermophilus strains (>99% 16S rRNA gene identity) by employing GGDC, ANI, pan-genome and MLST (Tripathi et al. 2017). In various other studies, GGDC has been used for delimitation of both species and subspecies.
Recent additions to the online GGDC 2.1 calculator include the phylogeny pipeline for calculation of taxonomic relationships using individual genes by estimation of pairwise similarities. Another feature available is the %GC difference calculator (Meier-Kolthoff et al. 2014b). The %GC difference calculator is based on the idea that two strains of a single species cannot have a %GC difference more than one percent. GGDC has also been shown to provide accurate DDH values even from draft genomes (representing 97–99% of the genetic information) (Meier-Kolthoff et al. 2013) using Formula 2 (identities/HSP length). Formula 2 (also, the recommended Formula) calculates GGDC irrespective of the total length of the genome, whereas the other two formulas, [Formula: 1 (HSP length/total length) and Formula: 3 (identities/total length)], are dependent on the length of the genome and hence must be used only for complete genomes. In addition, GGDC also calculates confidence intervals (both model-based and resampling-based/bootstrapping) along with each pairwise genome comparison (Meier-Kolthoff et al. 2013) and delivers a value on the same scale as DDH, thus making it easy to compare. Hence, instead of performing inconvenient and not so reliable DDHs, there is an incremental demand to shift to GGD estimations that are rapid, convenient, reliable and accurate.
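The three ratios named above, and the %GC difference, can be written out as a small Python sketch. This is only an illustration of the quantities as described in the text: the function names are hypothetical, the input numbers are invented, and the conversion of these ratios into distances and then into dDDH values with confidence intervals, as performed by GGDC itself, is deliberately not reproduced.

```python
def gbdp_ratios(hsp_length: int, identities: int, total_length: int) -> dict:
    """Return the three ratios described in the text for one genome pair.

    hsp_length   : summed length of high-scoring segment pairs
    identities   : number of identical positions within those HSPs
    total_length : total genome length used for normalisation
    """
    if not 0 < hsp_length <= total_length:
        raise ValueError("hsp_length must be positive and not exceed total_length")
    if not 0 <= identities <= hsp_length:
        raise ValueError("identities must lie between 0 and hsp_length")
    return {
        "formula_1": hsp_length / total_length,  # depends on genome length
        "formula_2": identities / hsp_length,    # independent of genome length
        "formula_3": identities / total_length,  # depends on genome length
    }


def gc_percent(seq: str) -> float:
    """Return the %GC of a sequence, ignoring ambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_difference(seq_a: str, seq_b: str) -> float:
    """Return the absolute %GC difference between two sequences."""
    return abs(gc_percent(seq_a) - gc_percent(seq_b))


# Invented numbers for illustration only.
ratios = gbdp_ratios(hsp_length=800, identities=720, total_length=1000)
```

Because only Formula 2 leaves the total genome length out of the calculation, an incomplete draft assembly changes Formulas 1 and 3 but not Formula 2, which is consistent with the recommendation of Formula 2 for draft genomes.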
Tetranucleotide frequency
In addition to ANI and AAI, tetranucleotide frequency analysis, based on differences between two genomes in the frequency of occurrence of four-nucleotide words (tetranucleotides over A, T, G and C), can also be employed to classify a given pair of organisms or to supplement other data (Teeling et al. 2004; Kim et al. 2014). Here, the frequencies of all 256 possible tetranucleotides are calculated for each genome sequence. This alignment-free method has been shown to correlate well with ANI. These methods are more reliable, as they consider the whole genome, in contrast to the 16S rRNA gene sequence and single-copy gene approaches like AMPHORA2 (which uses 31 bacterial and 104 archaeal marker genes). This approach acquires major significance when taxonomic marker genes are absent in the genome of a particular organism (Alsop and Raymond 2013). Despite all these limitations, these signatures have increased the taxonomic resolving power by differentiating even very closely related (intra-specific) organisms. In our opinion, ANI, AAI and tetranucleotide frequency data can also be merged with codon usage bias and codon preference for bacterial systematics, as described below.
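A simplified Python sketch of the tetranucleotide comparison is given below. It counts overlapping tetranucleotides on one strand and compares the raw frequency vectors of two sequences with a Pearson correlation; this is an assumption-laden simplification, since published implementations typically work with z-scores of observed versus expected frequencies. The function names and the toy sequence are hypothetical.

```python
from itertools import product
from math import sqrt

# All 4^4 = 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_frequencies(seq: str) -> list:
    """Return relative frequencies of the 256 tetranucleotides in `seq`."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:          # skip windows containing ambiguous bases
            counts[word] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence too short or contains no valid tetranucleotides")
    return [counts[word] / total for word in TETRAMERS]


def pearson(x: list, y: list) -> float:
    """Return the Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Toy sequence; real analyses use whole genomes or long contigs.
freqs = tetra_frequencies("ACGTACGTTTGACCGATAGGCTA")
self_correlation = pearson(freqs, freqs)
```

A genome compared with itself gives a correlation of 1; lower values indicate increasingly different tetranucleotide signatures.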
Codon usage bias and codon preference
Codon usage bias refers to the difference in the frequency of occurrence of synonymous codons in coding DNA. Alternatively, it can be said that in a wide variety of organisms belonging to a particular species, synonymous codons are used with different frequencies, a phenomenon which is known as codon bias (Hershberg and Petrov 2008; Lal et al. 2016).
In prokaryotes, there is a wide range of factors responsible for such selective preference for a particular codon, including gene expression level and percent GC composition. Codon bias is substantially correlated with the level of gene expression, with the strongest influence on the highly expressed genes (Gouy and Gautier 1982; Akashi 2003). Many studies have correlated the expression of genes with selected codon synonyms in organisms like Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli for optimized translation (Stenico et al. 1994; Akashi 1996; Ghaemmaghami et al. 2003; Fraser et al. 2004; Goetz and Fuglsang 2005). The influence of codon usage bias on encoded amino acids is associated with the %GC content, with the strongest effects on regions with high GC content (Li et al. 2015a, b). %GC content is probably determined mostly by genome-wide processes and less by specific selective forces on coding regions (Sueoka 1962). However, amino acid conservation, protein stability, mutational bias for the leading and lagging strands and extensive HGT under selective pressure are some other factors responsible for codon bias (Hershberg and Petrov 2008). Sharp et al. (1986) proposed Relative Synonymous Codon Usage (RSCU), which indicates the most frequently used codons for a specific amino acid relative to its other synonymous codons. However, this estimate has an inherent limitation in distinguishing two closely related organisms, as the demarcation between them rests solely on synonymous codons. With the advancement of computational tools, it is now easy to generate and compare codon usage bias even in closely related organisms in the form of codon usage bias tables. Commonly used codon frequency tables for different expression host organisms are available online as a database (http://www.genscript.com/cgi-bin/tools/codon_freq_table) for the identification of new species (Nakamura et al. 2000).
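RSCU can be computed directly from codon counts, as the following Python sketch illustrates. It uses the usual definition, in which the observed count of a codon is divided by the count expected if all synonymous codons for that amino acid were used equally, and it assumes the standard genetic code; the function name `rscu` and the toy coding sequence are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds: str) -> dict:
    """Return RSCU values for the sense codons of amino acids present in `cds`.

    RSCU = observed codon count / (total count for the amino acid / number of
    synonymous codons). A value of 1.0 means no bias for that codon.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3              # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)                  # amino acid -> synonymous codons
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":                     # stop codons are left out
            families[amino_acid].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:                            # amino acid absent from the sequence
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy coding sequence: four leucine codons, three CTG and one CTT.
leu_bias = rscu("CTGCTGCTGCTT")
```

Here leucine has six synonymous codons, so CTG (three of four occurrences) receives an RSCU of 4.5, CTT receives 1.5 and the four unused leucine codons receive 0.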
Codon usage bias creates a pattern by selecting specific codons for an amino acid over others and can be specific for a gene, genes or genomes. However, this naturally occurring phenomenon can be measured and used for classification of a novel species as supplementary information as well as for comparative studies. Beyond the classical 16S rRNA method, Clusters of Orthologous Groups (COG) within species can be verified for codon usage bias to distinguish them in different species. Recently, taxonomists have been able to resolve the longstanding questions about codon bias and this pattern has been linked with the protein synthesis hypothesis and role in prokaryotic systematics (Plotkin and Kudla 2011; Wald et al. 2012; Ran et al. 2014; Babbitt et al. 2015).
Role of pan-genome in taxo-genomics
The pan-genome refers to the sum total of gene sets in a clade and is composed of both the core and the variable or accessory genome. With the upcoming challenges faced by the classical techniques, pan-genomics seems to be an additional resource to address taxonomic questions. DNA–DNA reassociation studies, multi-locus (16S rRNA and single-copy genes) typing and a variety of single and multigene approaches derive conclusions only on the basis of the core genome, but ignore the accessory genome. Pan-genomic analysis was introduced by Tettelin et al. (2005) in a study where 8 genomes of Streptococcus agalactiae strains were used to obtain the patho-genome of strains, which led to enhanced development of vaccines and a better understanding of virulence markers. Additional studies have been conducted where metagenomic recruitments were used to determine the pan-genome. For instance, Salinibacter genomes were taxonomically placed using metagenome binning of saturated brine shotgun data (Pašić et al. 2009). Similarly, over 30 genomes belonging to Actinobacterium, Nitrosomonadales, Polynucleobacter, Chlorobium, Holophagales, Methylotenera and Desulfobulbus were phylogenetically differentiated using global alignments and metagenome data from a freshwater lake (Bendall et al. 2016). Also, this approach was employed to characterize genomes from Actinobacteria from brackish waters of the Caspian Sea (Mehrshad et al. 2015). Additionally, a recent metagenomic study identified the core and accessory pathogenomes of opportunistic Cellulosimicrobium cellulans species (Sharma et al. 2016a). Pan-genomics thus helps to determine the accessory genome content, which can be significant for delineating closely related species (Caputo et al. 2015). Future advances in bacterial systematics may also encompass pan-genomic information as a critical factor to delineate novel taxa.
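The relationship between pan, core and accessory genomes reduces to set operations once genes have been grouped into orthologous families. The Python sketch below assumes that such grouping has already been done by an external tool; the function name `partition_pan_genome`, the strain names and the gene family labels are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def partition_pan_genome(gene_families: Dict[str, Iterable[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene families into pan, core and accessory sets.

    `gene_families` maps a strain name to the gene families it carries.
    pan       = families found in at least one strain
    core      = families found in every strain
    accessory = families found in some but not all strains
    """
    per_strain = [set(families) for families in gene_families.values()]
    if not per_strain:
        raise ValueError("at least one genome is required")
    pan = set().union(*per_strain)
    core = set.intersection(*per_strain)
    accessory = pan - core
    return pan, core, accessory


# Toy presence data for three strains.
strains = {
    "strain_1": ["dnaA", "gyrB", "recA", "cpsA"],
    "strain_2": ["dnaA", "gyrB", "recA", "tetM"],
    "strain_3": ["dnaA", "gyrB", "recA"],
}
pan, core, accessory = partition_pan_genome(strains)
```

In this toy data set, the three shared families form the core genome, while cpsA and tetM make up the accessory genome that single-gene and core-based methods would not capture.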
Importance of synteny analyses for identifying taxonomic relationships
The concept of including all available genetic information for taxonomic purposes was termed as “integrative taxonomy” (Will et al. 2005). Synteny analysis is one such approach that holds potential not only for taxonomic characterization of bacteria but also for studying the phylogenetic and evolutionary relationships among bacteria. Synteny (“same ribbon” in Greek) means comparison of the order of arrangement of genes on a chromosome or a plasmid among different genomes. A generally accepted fact is that closely related genomes will have a similar arrangement of genes (synteny within strains > species > genera). A disordered arrangement of genes (or lack of synteny) may be seen among species of genera having undergone major genetic rearrangements and HGT events (Verma et al. 2017). In order to study synteny, genomes can be visualized linearly using Mauve (Darling et al. 2004) and circular representations can be generated using BRIG (Alikhan et al. 2011). Mauve is a cross-platform GUI genome alignment package that uses an anchored alignment approach to generate multiple alignments of long reads, contigs, scaffolds and whole genomes. It identifies locally collinear blocks among the genomes being aligned. More than two genomes can also be aligned using Mauve. Less synteny is observed as the taxonomic distance increases. Mauve compares the location of these blocks among the genomes, shedding light on the organizational arrangement of genetic blocks and large-scale genomic rearrangements.
M. bathoardescens is one such novel strain for which synteny analysis has paved the way for its classification and differentiation from nearest phylogenetic neighbours and specifically highlighted regions specific to individual genomes (Stewart et al. 2015). In a recent study, Pucker et al. (2016) used a reciprocal best hits (RBH) synteny approach to demonstrate the occurrence of large presence/absence variations (PAVs) in Arabidopsis thaliana genomic strains Nd-1 and Col-0, thus differentiating the two. Tools for analyzing synteny are evolving and recent additions to this tool kit include Multisyn (Baek et al. 2016), Phagonaute (Delattre et al. 2016), Synteny Portal (Lee et al. 2016a, b) and Vector Graph Toolkit of Genome Synteny and Collinearity (VGSC) (Xu et al. 2016), among others. SyntTax (Oberto 2013) is a tool that depicts taxonomy-based synteny arrangement. Synteny studies in the plant family Solanaceae (Wang et al. 2008; Rinaldi et al. 2016) have headed the way of understanding evolution through comparative mapping. In the case of Methanocaldococcus bathoardescens JH146-22, full-genome synteny analysis has been used to validate its taxonomy (Stewart et al. 2015). Among circular visualization methods to view and compare genomes, BRIG (Blast Ring Image Generator) is a powerful and easy to use application. BRIG compares the query genomes with a reference genome using BLAST (Altschul et al. 1990). Multiple genome comparisons can be shown in a single image displaying similarity between reference and query genome sequences concentrically. Visualization can be optimized by using cut-off e-value or minimum percentage identity to filter the results. BRIG comparisons can highlight regions of similarity and demonstrate overall genome similarity visually (Verma et al. 2014), thus helping to deduce relatedness of genomes based on whole genome sequences.
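As a rough numerical counterpart to the visual comparisons above, conservation of gene order can be summarised by the fraction of neighbouring gene pairs that stay adjacent in a second genome. The Python sketch below is a hypothetical, deliberately simple measure and does not reproduce the anchored alignment of Mauve or the BLAST comparisons of BRIG; it assumes linear gene orders and that orthologous genes carry the same label in both genomes.

```python
from typing import Sequence


def conserved_adjacency_fraction(order_a: Sequence[str], order_b: Sequence[str]) -> float:
    """Return the fraction of adjacent gene pairs in genome A also adjacent in B.

    Adjacency is counted regardless of orientation, so an inverted block still
    keeps its internal neighbours.
    """
    pairs_a = list(zip(order_a, order_a[1:]))
    if not pairs_a:
        raise ValueError("genome A needs at least two genes")
    pairs_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    kept = sum(1 for pair in pairs_a if frozenset(pair) in pairs_b)
    return kept / len(pairs_a)


# Toy gene orders: the second genome has the last two genes swapped.
genome_a = ["g1", "g2", "g3", "g4"]
genome_b = ["g1", "g2", "g4", "g3"]
identical = conserved_adjacency_fraction(genome_a, genome_a)
rearranged = conserved_adjacency_fraction(genome_a, genome_b)
```

Identical gene orders give a value of 1.0, and the value falls as rearrangements break up neighbouring genes, in line with the expectation that synteny decreases with taxonomic distance.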
Linking functional genomics to taxonomy and systematics
While functional genomics does not have any direct link with taxonomy and systematics, it certainly plays a significant role in understanding evolutionary relationships among microorganisms. Studying the functional aspects of a genome from its sequence data can serve as a modern approach that goes beyond the conventional taxonomic methods (Khanna et al. 2011; Sharma et al. 2014; Puri et al. 2016). Functional genomics deals with the identification of genes and proteins and their interactions in different metabolic pathways. The functional potential of a genome can be analyzed by studying gene expression profiles, small non-coding RNAs, mutations (e.g. single nucleotide polymorphisms), proteomics, DNA methylation and genome-wide association studies (Chen et al. 2009). Functional profiling can be used to draw comparisons between different bacterial species by studying differentially enriched metabolic pathways in the genomes. FragGeneScan (Rho et al. 2010) and WebMGA (Wu et al. 2011) are used for predicting open reading frames (ORFs) in contigs, whereas functional annotation of the predicted ORFs can be carried out with KAAS (KEGG Automated Annotation Server; Moriya et al. 2007) and by searching the COG database (Tatusov et al. 2001), which assign KEGG Orthology (KO) numbers and Clusters of Orthologous Groups (COG) categories, respectively. MinPath (Ye and Doak 2009) and FMM (Chou et al. 2009) are used for metabolic reconstruction of pathways. The above-mentioned tools mostly use KEGG as a single reference database and do not provide any statistical output, whereas tools like metaSHARK (Hyland et al. 2006), MEGAN6 and MG-RAST (Meyer et al. 2008) use different databases, PRIAM, SEED and eggNOG, respectively.
KEGG is an integrated knowledge-based reference database which consists of 17 main databases in four broad groups, namely genomic information, systems information (metabolic pathways), chemical information (metabolites, ligands, enzymes), and health information (drugs and diseases).
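A comparison of functional profiles between two genomes can be sketched in a few lines. The snippet below assumes that each annotated gene has already been assigned a single category label (for example a COG category letter or a KO-derived pathway name) by one of the annotation tools mentioned above; the function names and the pseudocount are illustrative assumptions. It computes relative abundances and a log2 ratio per category, and it deliberately omits any statistical test.

```python
import math
from collections import Counter


def relative_profile(annotations):
    """Convert a list of category labels (one per annotated gene) into
    relative abundances that sum to 1."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def log2_enrichment(profile_a, profile_b, pseudocount=1e-6):
    """log2 ratio of relative abundance (genome A over genome B) for every
    category seen in either genome. The pseudocount (an arbitrary small
    value) avoids division by zero for categories missing from one genome."""
    categories = set(profile_a) | set(profile_b)
    return {
        cat: math.log2((profile_a.get(cat, 0.0) + pseudocount)
                       / (profile_b.get(cat, 0.0) + pseudocount))
        for cat in sorted(categories)
    }
```

Positive values indicate categories that are relatively enriched in the first genome and negative values those enriched in the second. For real comparisons the ratios would need to be backed by an appropriate statistical test and by genome-size normalization.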
Although taxonomic analyses using the 16S rRNA approach have been successful, they fail to explain gaps in evolution caused by the functional replacement of genes, HGT, or gene duplications (Hong et al. 2004). Phylogeny based on metabolic pathway content can account for overall evolutionary processes and therefore helps in understanding adaptation mechanisms in microorganisms inhabiting diverse environmental niches. Functional genomic analyses used to infer phylogenetic relationships show divergent functional profiles of taxa and clades (Chai et al. 2014). This correlation can be used to identify clade-specific cellular functions with both low and high parsimony scores. These clade-specific cellular functions can also be used in addition to the conventional approach for the taxonomic characterization of novel bacterial species. For instance, Conserved Signature Indels (CSIs) within protein sequences are used as molecular markers to infer phylogeny and to place species into specific clades.
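The idea of a phylogeny based on metabolic pathway content can be illustrated with a simple distance calculation. The sketch below assumes that each genome is represented as a set of pathway or reaction identifiers (the identifiers and genome names in any example are hypothetical) and uses the Jaccard distance, which is one possible choice and is not necessarily the measure used in the cited studies. The resulting pairwise distances could then be passed to any standard tree-building method.

```python
from itertools import combinations


def jaccard_distance(set_a, set_b):
    """1 minus the shared fraction of pathways/reactions between two genomes."""
    union = set_a | set_b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def distance_matrix(content):
    """Pairwise Jaccard distances for a dict mapping genome name to a set of
    pathway/reaction identifiers. Returns the sorted names and a dict keyed
    by (name, name) pairs."""
    names = sorted(content)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = jaccard_distance(content[a], content[b])
        dist[(a, b)] = dist[(b, a)] = d
    return names, dist
```

Genomes that share most of their pathways receive small distances and would cluster together, whereas a genome with a largely distinct metabolic repertoire would be placed on a long branch.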
Future challenges in the evolving field of taxonomy
The enormous amount of genomic information that is becoming available through cultivation-independent approaches is forcing taxonomists to find ways to deal with uncultured microbial diversity and to explore it further. Metagenomics refers to the study and analysis of genetic material recovered directly from environmental samples. It has been estimated that only 0.1% of the microbial diversity is cultured (Davis et al. 2005); therefore, taxonomic classification of the remaining 99.9% of uncultured microbes using the standard polyphasic methods is very challenging. Metagenomics now plays a key role in identifying abundant bacteria present in environmental samples, some of which are difficult to isolate using canonical techniques. With advances in bio-computational tools, metagenomic data can be taxonomically classified after binning the metagenomic reads or contigs based on species-specific parameters such as tetranucleotide frequency, %GC content and clade-specific markers. The draft genomes generated through this approach can be used to infer important metabolic pathways present in the uncultured organisms, and this information can be used to modify cultivation approaches so that these difficult-to-culture organisms can be enriched using special media, filtration processes, different temperatures or specific electron donors or acceptors. The use of metagenomics for taxonomic classification of uncultured microorganisms is still at a developing stage and will advance with time. With advancements in technology, modern tools are being developed that can efficiently perform taxonomic classification of the majority of the microorganisms inhabiting any particular environment.
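The compositional parameters used for binning can be computed directly from sequence. The following sketch calculates %GC content and a 256-dimensional tetranucleotide frequency vector for a contig, together with a Euclidean distance between two such vectors. It is a simplified illustration under stated assumptions: real binning tools typically also merge reverse-complement tetramers, normalize the frequencies and combine composition with coverage information, none of which is done here.

```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed (lexicographic) order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetranucleotide_frequencies(seq):
    """Relative frequency of each tetranucleotide, using overlapping
    windows; windows containing ambiguous bases (e.g. N) are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = [counts.get(t, 0) for t in TETRAMERS]
    total = sum(valid)
    return [c / total if total else 0.0 for c in valid]


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Contigs derived from the same genome are expected to have similar GC content and similar tetranucleotide vectors, so small distances between contigs can be used as one line of evidence for placing them in the same bin.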
Using the information gathered from in silico analyses, attempts are being made to culture or isolate hitherto uncultured microbes (Joseph et al. 2003). Classical taxonomy based on the traditional polyphasic approach is shifting to a more advanced approach in association with bioinformatics. Co-culturing bacteria is also an effective alternative for culturing the uncultured (Stewart 2012), and here metagenomics along with metaproteomics can play a crucial role in deciphering community dynamics. Besides cultivation and taxonomic classification, proper preservation of environmental samples is also important. For this, Biological Resource Centres with well-equipped infrastructure for ex situ preservation of native biodiversity for future research, reference, and applications should be established (Prakash et al. 2007).
Conclusions
Taxonomic assignments are critical in deciphering and classifying the rich microbial diversity that is being unraveled by science at an unparalleled pace. Advancements in genomics have led to more robust and stable characterization and assignment of taxonomic descriptions. Even with its inherent limitations, the 16S rRNA gene-based taxonomic assessment is still the basis for systematics. However, with advancements in NGS methods and the availability of genome data, there has been an incremental development of various genome analysis algorithms, thus moving taxonomic delineation of prokaryotes towards the new era of taxo-genomics.
Despite the debate on genome-based phylogeny (as data on genome-based taxonomy remain scarce even at present), there is a need to increase efforts and to set norms for characterizing prokaryotic species with basic genomic annotations, which should lead to more distinctive bacterial classification. Taken together, efforts that focus on minimalist and/or genomic approaches to identify novel taxa will prevent redundancy in prokaryotic nomenclature. Thus, the fundamental principles of taxonomy must not be abandoned, but supplementing emerging taxonomic studies with genomic data would make the results more robust and conclusive. Based on the review of the classical taxonomic approach, taxo-genomics and the upcoming taxonomic methods discussed in this study, we propose a minimum set of parameters and tools for performing taxonomic characterization of prokaryotes (Fig. 1).
References
Acinas SG, Marcelino LA, Klepac-Ceraj V, Polz MF (2004) Divergence and redundancy of 16S rRNA sequences in genomes with multiple rrn operons. J Bacteriol 186:2629–2635
Akashi H (1996) Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution and larger proteins in D. melanogaster. Genetics 144:1297–1307
Akashi H (2003) Translational selection and yeast proteome evolution. Genetics 164:1291–1303
Alikhan NF, Petty NK, Zakour NLB, Beatson SA (2011) BLAST ring image generator (BRIG): simple prokaryote genome comparisons. BMC Genom 12:402
Alsop EB, Raymond J (2013) Resolving prokaryotic taxonomy without rRNA: longer oligonucleotide word lengths improve genome and metagenome taxonomic classification. PLoS ONE 8:e67337
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Auch AF, Klenk HP, Göker M (2010a) Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs. Stand Genom Sci 2:142–148
Auch AF, von Jan M, Klenk HP, Göker M (2010b) Digital DNA-DNA hybridisation for microbial species delineation by means of genome-to-genome sequence comparison. Stand Genom Sci 2:117–134
Babbitt GA, Alawad M, Schulze KV, Hudson AO (2015) Synonymous codon bias and functional constraint on GC3-related DNA backbone dynamics in the prokaryotic nucleoid. Nucleic Acids Res 42:10915–10926
Baek JK, Kim J, Kim CK, Sohn SH, Choi D, Ratnaparkhe MB, Kim DW, Lee TH (2016) MultiSyn: a webtool for multiple synteny detection and visualization of user’s sequence of interest compared to public plant species. Evol Bioinform Online 12:193
Beaz-Hidalgo R, Hossain MJ, Liles MR, Figueras MJ (2015) Strategies to avoid wrongly labelled genomes using as example the detected wrong taxonomic affiliation for Aeromonas genomes in the GenBank database. PLoS ONE 10:e0115813
Bendall ML, Stevens SL, Chan LK, Malfatti S, Schwientek P et al (2016) Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations. ISME J 10:1589–1601
Bernard HU, Burk RD, Chen Z, van Doorslaer K, zur Hausen H, de Villiers EM (2010) Classification of papillomaviruses (PVs) based on 189 PV types and proposal of taxonomic amendments. Virology 401:70–79
Brenner DJ, Fanning GR, Rake AV, Johnson KE (1969) Batch procedure for thermal elution of DNA from hydroxyapatite. Anal Biochem 28:447–459
Buchanan RE (1955) Taxonomy. Annu Rev Microbiol 9:1–20
Caputo A, Merhej V, Georgiades K, Fournier PE, Croce O, Robert C, Raoult D (2015) Pan-genomic analysis to redefine species and subspecies based on quantum discontinuous variation: the Klebsiella paradigm. Biol Direct 10:55
Chai J, Kora G, Ahn TH, Hyatt D, Pan C (2014) Functional phylogenomics analysis of bacteria and archaea using consistent genome annotation with UniFam. BMC Evol Biol 14:207
Chen X, Jorgenson E, Cheung ST (2009) New tools for functional genomic analysis. Drug Discov Today 14:754–760
Chou CH, Chang WC, Chiu CM, Huang CC, Huang HD (2009) FMM: a web server for metabolic pathway reconstruction and comparative analysis. Nucleic Acids Res 37:W129–W134
Chun J, Rainey FA (2014) Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea. Int J Syst Evol Microbiol 64:316–324
Cohn F (1872) Grundzüge einer neuen natürlichen Anordnung der kryptogamischen Kulturpflanzen. Jahresb Schle Ges f vaterl Kultur 49:83–89
Colwell RR (1970) Polyphasic taxonomy of the genus Vibrio: numerical taxonomy of Vibrio cholerae, Vibrio parahaemolyticus, and related Vibrio species. J Bacteriol 104:410–433
Darling ACE, Mau B, Blattner FR, Perna NT (2004) Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14:1394–1403
Davis KER, Joseph SJ, Janssen PH (2005) Effects of growth medium, inoculum size, and incubation time on culturability and isolation of soil bacteria. Appl Environ Microbiol 71:826–834
Delattre HO, Souiai K, Fagoonee R, Guerois MA (2016) Phagonaute: a web-based interface for phage synteny browsing and protein function prediction. Virology 496:42–50
Drancourt M, Raoult D (1994) Taxonomic position of the Rickettsiae: current knowledge. FEMS Microbiol Rev 13:13–24
Dunlap CA, Bowman MJ, Schisler DA, Rooney AP (2016) Genome analysis shows Bacillus axarquiensis is not a later heterotypic synonym of Bacillus mojavensis; reclassification of Bacillus malacitensis and Brevibacterium halotolerans as heterotypic synonyms of Bacillus axarquiensis. Int J Syst Evol Microbiol 66:2438–2443
Dwivedi V, Kumari K, Gupta SK, Kumari R, Tripathi C, Lata P, Niharika N, Singh AK, Kumar R, Nigam A, Garg N, Lal R (2015) Thermus parvatiensis RLT sp. nov., isolated from a hot water spring located atop the Himalayan ranges at Manikaran, India. Indian J Microbiol 55:357–365
Ezaki T, Hashimoto Y, Yabuuchi E (1989) Fluorometric deoxyribonucleic acid-deoxyribonucleic acid hybridization in microdilution wells as an alternative to membrane filter hybridization in which radioisotopes are used to determine genetic relatedness among bacterial strains. Int J Syst Evol Microbiol 39:224–229
Fournier PE, Dumler JS, Greub G, Zhang J, Wu Y, Raoult D (2003) Gene sequence-based criteria for identification of new Rickettsia isolates and description of Rickettsia heilongjiangensis sp. nov. J Clin Microbiol 41:5456–5465
Fraser HB, Hirsh AE, Wall DP, Eisen MB (2004) Coevolution of gene expression among interacting proteins. Proc Natl Acad Sci USA 101:9033–9038
Frolov EN, Kublanov IV, Toshchakov SV, Samarov NI, Novikov AA, Lebedinsky AV, Bonch-Osmolovskaya EA, Chernyh NA (2017) Thermodesulfobium acidiphilum sp. nov., a thermoacidophilic, sulfate-reducing, chemoautotrophic bacterium from a thermal site. Int J Syst Evol Microbiol 67:1482–1485
Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A et al (2003) Global analysis of protein expression in yeast. Nature 425:737–741
Glaeser SP, Kämpfer P (2015) Multilocus sequence analysis (MLSA) in prokaryotic taxonomy. Syst Appl Microbiol 38:237–245
Goetz RM, Fuglsang A (2005) Correlation of codon bias measures with mRNA levels: analysis of transcriptome data from Escherichia coli. Biochem Biophys Res Commun 327:4–7
Gonzalez JM, Saiz-Jimenez C (2005) A simple fluorimetric method for the estimation of DNA–DNA relatedness between closely related microorganisms by thermal denaturation temperatures. Extremophiles 9:75–79
Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM (2007) DNA to DNA hybridisation values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol 57:81–91
Gouy M, Gautier C (1982) Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res 10:7055–7074
Hahn MW, Jezberová J, Koll U, Beck-Saueressig T, Schmidt J (2016) Complete ecological isolation and cryptic diversity in Polynucleobacter bacteria not resolved by 16S rRNA gene sequences. ISME J 10:1642–1655
Henz SR, Huson DH, Auch AF, Nieselt-Struwe K, Schuster SC (2004) Whole-genome prokaryotic phylogeny. Bioinformatics 21:2329–2335
Hershberg R, Petrov DA (2008) Selection on codon bias. Annu Rev Genet 42:287–299
Hong SH, Kim TY, Lee SY (2004) Phylogenetic analysis based on genome-scale metabolic pathway reaction content. Appl Microbiol Biotechnol 65:203–210
Hyland C, Pinney JW, McConkey GA, Westhead DR (2006) metaSHARK: a WWW platform for interactive exploration of metabolic networks. Nucleic Acids Res 34:W725–W728
Jaspers E, Overmann J (2004) Ecological significance of Microdiversity: identical 16S rRNA gene sequences can be found in bacteria with highly divergent genomes and ecophysiologies. Appl Environ Microbiol 70:4831–4839
Jolley KA, Chan MS, Maiden MCJ (2004) mlstdbNet–distributed multi-locus sequence typing (MLST) databases. BMC Bioinform 5:86
Joseph SJ, Hugenholtz P, Sangwan P, Osborne CA, Janssen PH (2003) Laboratory cultivation of widespread and previously uncultured soil bacteria. Appl Environ Microbiol 69:7210–7215
Khanna M, Solanki R, Lal R (2011) Selective isolation of rare actinomycetes producing novel antimicrobial compounds. Int J Adv Biotechnol Res 2:357–375
Kim M, Oh HS, Park SC, Chun J (2014) Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. Int J Syst Evol Microbiol 64:346–351
Klenk HP, Göker M (2010) En route to a genome-based classification of Archaea and Bacteria? Syst Appl Microbiol 33:175–182
Konstantinidis KT, Tiedje JM (2005) Towards a genome-based taxonomy for prokaryotes. J Bacteriol 187:6258–6264
Kotilainen P, Heiro M, Jalava J, Rantakokko V, Nikoskelainen J, Nikkari S, Rantakokko-Jalava K (2006) Aetiological diagnosis of infective endocarditis by direct amplification of rRNA genes from surgically removed valve tissue. An 11-year experience in a Finnish teaching hospital. Ann Med 38:263–273
Kyrpides NC, Woyke T, Eisen JA, Garrity G, Lilburn TG, Beck B et al (2014) Genomic encyclopedia of type strains, phase I: the one thousand microbial genomes (KMG-I) project. Stand Genom Sci 9:1278
Kumar A, Bajaj A, Kumar RM, Kaur G, Kaur N, Singh NK, Manickam N, Mayilraj S (2015) Taxonomic description and genome sequence of Rheinheimera mesophila sp. nov., isolated from an industrial waste site. Int J Syst Evol Microbiol 65:3666–3673
Kumar R, Verma H, Haider S, Bajaj A, Sood U, Ponnusamy K, Nagar S, Shakarad MN, Negi RK, Singh Y, Khurana JP, Gilbert JA, Lal R (2017) Comparative genomic analysis reveals habitat-specific genes and regulatory hubs within the genus Novosphingobium. mSystems 2:e00020-17
Lagkouvardos I, Pukall R, Abt B, Foesel BU, Meier-Kolthoff JP, Kumar N, Bresciani A et al (2016) The mouse intestinal bacterial collection (miBC) provides host-specific insight into cultured diversity and functional potential of the gut microbiota. Nat Microbiol 1:16131
Lal D, Verma M, Behura S, Lal R (2016) Codon usage bias in phylum Actinobacteria: relevance to environmental adaptation and host pathogenicity. Res Microbiol 167:669–677
Lassance J, Groot T, Liénard MA, Antony B, Borgwardt C, Andersson F, Hedenström E, Heckel DG, Löfstedt C (2010) Allelic variation in a fatty-acyl reductase gene causes divergence in moth sex pheromones. Nature 466:486–489
Lee JH, Park HS, Jang WJ, Koh SE, Kim JM, Shim SK, Park MY, Kim YW et al (2003) Differentiation of Rickettsiae by groEL gene analysis. J Clin Microbiol 41:2952–2960
Lee I, Kim YO, Park SC, Chun J (2016a) OrthoANI: an improved algorithm and software for calculating average nucleotide identity. Int J Syst Evol Microbiol 66:1100–1103
Lee J, Hong WY, Cho M, Sim M, Lee D, Ko Y, Kim J (2016b) Synteny Portal: a web-based application portal for synteny block analysis. Nucleic Acids Res 44:W35–W40
Li J, Zhou J, Wu Y, Yang S, Tian D (2015a) GC-content of synonymous codons profoundly influences amino acid usage. G3 (Bethesda) 5:2027–2036
Li X, Huang Y, Whitman WB (2015b) The relationship of the whole genome sequence identity to DNA hybridisation varies between genera of prokaryotes. Antonie van Leeuwenhoek 107:241–249
Loman NJ, Pallen MJ (2015) Twenty years of bacterial genome sequencing. Nat Rev Microbiol 13:787–794
Lukjancenko O, Thomsen MC, Larsen MV, Ussery DW (2013) Panfunpro: PAN-genome analysis based on functional profiles. F1000Research 2:265
Maiden MC, Bygraves JA, Feil E, Morelli G, Russell JE, Urwin R, Zhang Q, Zhou J, Zurth K, Caugant DA, Feavers IM, Achtman M, Spratt BG (1998) Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci USA 95:3140–3145
Malhotra J, Anand S, Jindal S, Raman R, Lal R (2012) Acinetobacter indicus sp. nov. isolated from a hexachlorocyclohexane dumpsite. Int J Syst Evol Microbiol 62:2883–2890
McDonald D, Price MN, Goodrich J, Nawrocki EP, DeSantis TZ, Probst A, Andersen GL, Knight R, Hugenholtz P (2012) An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J 6:610–618
McGinnis JM, Cole JA, Dickinson MC, Mingle LA, Lapierre P, Musser KA, Wolfgang WJ (2015) Paracoccus sanguinis sp. nov., isolated from clinical specimens of New York State patients. Int J Syst Evol Microbiol 65:1877–1882
McTaggart LR, Richardson SE, Witkowska M, Zhang SX (2010) Phylogeny and identification of Nocardia species on the basis of multilocus sequence analysis. J Clin Microbiol 48:4525–4533
Mehrshad M, Amoozegar MA, Ghai R, Shahzadeh Fazeli SA, Rodriguez-Valera F (2015) Genome reconstruction from metagenomic data sets reveals novel microbes in the brackish waters of the Caspian Sea. Appl Environ Microbiol 82:1599–1612
Meier-Kolthoff JP, Auch AF, Klenk HP, Göker M (2013) Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinform 14:60
Meier-Kolthoff JP, Hahnke RL, Petersen J, Scheuner C, Michael V, Fiebig A, Rohde C, Rohde M, Fartmann B, Goodwin LA, Chertkov O, Reddy TBK, Pati A, Ivanova NN, Markowitz V, Kyrpides NC, Woyke T, Göker M, Klenk HP (2014a) Complete genome sequence of DSM 30083T, the type strain (U5/41T) of Escherichia coli, and a proposal for delineating subspecies in microbial taxonomy. Stand Genom Sci 9:2
Meier-Kolthoff JP, Klenk HP, Göker M (2014b) Taxonomic use of DNA G+C content and DNA-DNA hybridisation in the genomic age. Int J Syst Evol Microbiol 64:352–356
Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA (2008) The metagenomics RAST server—a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinform 9:386
Miyoshi-Akiyama T, Hayakawa K, Ohmagari N, Shimojima M, Kirikae T (2013) Multilocus sequence typing (MLST) for characterization of Enterobacter cloacae. PLoS ONE 8:e66358
Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M (2007) KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res 35:W182–W185
Nakamura Y, Gojobori T, Ikemura T (2000) Codon usage tabulated from the international DNA sequence databases: status for the year 2000. Nucleic Acids Res 28:292
Oberto J (2013) SyntTax: a web server linking synteny to prokaryotic taxonomy. BMC Bioinform 14:4
Orla-Jensen S (1909) Die Hauptlinien des natürlichen Bakteriensystems nebst einer Übersicht der Gärungsphänomene. Zentralbl Bakteriol Parasitenkd Abt II 22:305–346 (in German)
Pai S, Esen N, Pan X, Musser JM (1997) Routine rapid Mycobacterium species assignment based on species-specific allelic variation in the 65-kilodalton heat shock protein gene (hsp65). Arch Pathol Lab Med 121:859–864
Pašić L, Rodriguez-Mueller B, Martin-Cuadrado A-B, Mira A, Rohwer F, Rodriguez-Valera R (2009) Metagenomic islands of hyperhalophiles: the case of Salinibacter ruber. BMC Genom 10:570
Perevalova AA, Kublanov IV, Bidzhieva SK, Mukhopadhyay B, Bonch-Osmolovskaya EA, Lebedinsky AV (2016) Reclassification of Desulfurococcus mobilis as a synonym of Desulfurococcus mucosus, Desulfurococcus fermentans and Desulfurococcus kamchatkensis as synonyms of Desulfurococcus amylolyticus and emendation of the D. mucosus and D. amylolyticus species descriptions. Int J Syst Evol Microbiol 66:514–517
Pillonel T, Bertelli C, Salamin N, Greub G (2015) Taxogenomics of the order Chlamydiales. Int J Syst Evol Microbiol 65:1381–1393
Plotkin JB, Kudla G (2011) Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 12:32–42
Prakash O, Verma M, Sharma P, Kumar M, Kumari K, Singh A, Kumari H, Jit S, Gupta SK, Khanna M, Lal R (2007) Polyphasic approach of bacterial classification—an overview of recent advances. Indian J Microbiol 47:98–108
Pucker B, Holtgräwe D, Sörensen TR, Stracke R, Viehöver P, Weisshaar B (2016) A de novo genome sequence assembly of the Arabidopsis thaliana accession Niederzenz-1 displays presence/absence variation and strong synteny. PLoS ONE 11:e0164321
Puri A, Rai A, Dhanaraj P, Lal R, Patel D, Kaicker A, Verma M (2016) An in silico approach for identification of the pathogenic species, Helicobacter pylori and its relatives. Indian J Microbiol 56:277–286
Ramasamy D, Mishra AK, Lagier JC, Padhmanabhan R, Rossi M, Sentausa E (2014) A polyphasic strategy incorporating genomic data for the taxonomic description of novel bacterial species. Int J Syst Evol Microbiol 64:384–391
Ran W, Kristensen DM, Koonin EV (2014) Coupling between protein level selection and codon usage optimization in the evolution of bacteria and archaea. MBio 5:e00956-14
Rho M, Tang H, Ye Y (2010) FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res 38:e191
Richter M, Rosselló-Móra R (2009) Shifting the genomic gold standard for the prokaryotic species definition. Proc Natl Acad Sci USA 106:19126–19131
Rinaldi R, Deynze AV, Portis E, Rotino GL, Toppino L, Hill T, Ashrafi H, Barchi L, Lanteri S (2016) New insights on eggplant/tomato/pepper synteny and identification of eggplant and pepper orthologous QTL. Front Plant Sci 7:1031
Rosselló-Móra R (2006) DNA-DNA reassociation methods applied to microbial taxonomy and their critical evaluation. In: Stackebrandt E (ed) Molecular identification, systematics and population structure of prokaryotes. Springer, Heidelberg, pp 23–50
Rosselló-Móra R, Amann R (2015) Past and future species definitions for bacteria and archaea. Syst Appl Microbiol 38:209–216
Roth A, Fischer M, Hamid ME, Michalke S, Ludwig W, Mauch H (1998) Differentiation of phylogenetically related slowly growing mycobacteria based on 16S–23S rRNA gene internal transcribed spacer sequences. J Clin Microbiol 36:139–147
Sangal V, Goodfellow M, Jones AL, Schwalbe EC, Blom J, Hoskisson PA, Sutcliffe IC (2016) Next-generation systematic: an innovative approach to resolve the structure of complex prokaryotic taxa. Sci Rep 6:38392
Schmidt R, Acarkan A, Boivin K (2001) Comparative structural genomics in the Brassicaceae family. Plant Physiol Biochem 39:253–262
Sentausa E, Fournier PE (2013) Advantages and limitations of genomics in prokaryotic taxonomy. Clin Microbiol Infect 19:790–795
Sharma P, Jindal S, Bala K, Kumari K, Niharika N, Kaur J, Pandey G, Pandey R, Russell RJ, Oakeshott JG, Lal R (2014) Functional screening of enzymes and bacteria for the dechlorination of hexachlorocyclohexane by a high-throughput colorimetric assay. Biodegradation 25:179–187
Sharma A, Gilbert JA, Lal R (2016a) (Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans. Sci Rep 6:25527
Sharma A, Kohli P, Singh Y, Schumann P, Lal R (2016b) Fictibacillus halophilus sp. nov., from a microbial mat of a hot spring atop the Himalayan range. Int J Syst Evol Microbiol 66:2409–2416
Sharp PM, Tuohy TMF, Mosurski KR (1986) Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 14:5125–5143
Stenico M, Lloyd AT, Sharp PM (1994) Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res 22:2437–2446
Stewart EJ (2012) Growing unculturable bacteria. J Bacteriol 194:4151–4160
Stewart LC, Jung JH, Kim YT, Kwon SW, Park CS, Holden JF (2015) Methanocaldococcus bathoardescens sp. nov., a hyperthermophilic methanogen isolated from a volcanically active deep-sea hydrothermal vent. Int J Syst Evol Microbiol 65:1280–1283
Sueoka N (1962) On the genetic basis of variation and heterogeneity of DNA base composition. Proc Natl Acad Sci USA 48:582–592
Sutcliffe IC (2015) Challenging the anthropocentric emphasis on phenotypic testing in prokaryotic species descriptions: rip it up and start again. Front Genet 6:218
Sutcliffe IC, Trujillo ME (2012) Publication of descriptions of novel bacterial taxa in Antonie van Leeuwenhoek. Antonie van Leeuwenhoek 101:13–20
Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS et al (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29:22–28
Teeling H, Waldmann J, Lombardot T, Bauer M, Glöckner FO (2004) TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinform 5:163
Tettelin H, Masignani V, Cieslewicz MJ, Donati C et al (2005) Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc Natl Acad Sci USA 102:13950–13955
Thompson CC, Vicente ACP, Souza RC, Vasconcelos ATR, Vesth T, Alves N, Ussery DW, Iida T, Thompson FL (2009) Genomic taxonomy of vibrios. BMC Evol Biol 9:258
Thompson CC, Vieira NM, Vicente AC, Thompson FL (2011) Towards a genome based taxonomy of Mycoplasmas. Infect Genet Evol 11:1798–1804
Thompson CC, Chimetto L, Edwards RA, Swings J, Stackebrandt E, Fabiano L (2013) Microbial genomic taxonomy. BMC Genom 14:913–917
Tindall BJ, Rosselló-Mora R, Busse HJ, Ludwig W, Kämpfer P (2010) Notes on the characterization of prokaryote strains for taxonomic purposes. Int J Syst Evol Microbiol 60:249–266
Tripathi C, Mishra H, Khurana H, Dwivedi V, Kamra K, Negi RK, Lal R (2017) Complete genome analysis of Thermus parvatiensis and comparative genomics of Thermus spp. provide insights into genetic variability and evolution of natural competence as strategic survival attributes. Front Microbiol 8:1410
Vandamme P, Peeters C (2014) Time to revisit polyphasic taxonomy. Antonie van Leeuwenhoek 106:57–65
Verbeke TJ, Sparling R, Hill JE, Links MG, Levin D, Dumonceaux TJ (2011) Predicting relatedness of bacterial genomes using the chaperonin-60 universal target (cpn60 UT): application to Thermoanaerobacter species. Syst Appl Microbiol 34:171–179
Verma H, Kumar R, Oldach P, Sangwan N, Khurana JP, Gilbert JA, Lal R (2014) Comparative genomic analysis of nine Sphingobium strains: insights into their evolution and hexachlorocyclohexane (HCH) degradation pathways. BMC Genom 15:1014
Verma H, Bajaj A, Kumar R, Kaur J, Anand S, Nayyar N, Puri A, Singh Y, Khurana JP, Lal R (2017) Genome organization of Sphingobium indicum B90A: an archetypal Hexachlorocyclohexane (HCH) degrading genotype. Genome Biol Evol. doi:10.1093/gbe/evx133
Větrovský T, Baldrian P (2013) The Variability of the 16S rRNA Gene in Bacterial Genomes and Its Consequences for Bacterial Community Analyses. PLoS ONE 8:e57923
Wald N, Alroy M, Botzman M, Margalit H (2012) Codon usage bias in prokaryotic pyrimidine-ending codons is associated with the degeneracy of the encoded amino acids. Nucleic Acids Res 40:7074–7083
Wang Y, Diehl A, Wu F, Vrebalov J, Giovannoni J, Siepel A, Tanksley SD (2008) Sequencing and comparative analysis of a conserved syntenic segment in the Solanaceae. Genetics 180:391–408
Wayne LG, Brenner DJ, Colwell RR, Grimont PA, Kandler O, Krichevsky MI, Moore LH, Moore WE, Murray R, Stackebrandt ES, Starr MP (1987) Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. Int J Syst Evol Microbiol 37:463–464
Whitman WB, Woyke T, Klenk HP, Zhou Y, Lilburn TG, Beck BJ, De Vos P, Vandamme P, Eisen JA, Garrity G, Hugenholtz P, Kyrpides NC (2015) Genomic Encyclopedia of Bacterial and Archaeal Type Strains, Phase III: the genomes of soil and plant-associated and newly described type strains. Stand Genom Sci 10:26
Will KW, Mishler BD, Wheeler QD (2005) The perils of DNA barcoding and the need for integrative taxonomy. Syst Biol 54:844–851
Woese CR (1987) Bacterial evolution. Microbiol Rev 51:221–271
Wu M, Scott AJ (2012) Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 28:1033–1034
Wu D, Hugenholtz P, Mavromatis K et al (2009) A phylogeny-driven genomic encyclopaedia of bacteria and archaea. Nature 462:1056–1060
Wu S, Zhu Z, Fu L, Niu B, Li W (2011) WebMGA: a customizable web server for fast metagenomic sequence analysis. BMC Genom 12:444
Xu Y, Bi C, Wu G, Wei S, Dai X, Yin T, Ye N (2016) VGSC: a web-based vector graph toolkit of genome synteny and collinearity. Biomed Res Int 2016:7823429
Yarza P, Yilmaz P, Pruesse E, Glöckner FO, Ludwig W, Schleifer KH, Whitman WB, Euzéby J, Amann R, Rosselló-Móra R (2014) Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol 12:635–645
Ye Y, Doak TG (2009) A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLoS Comput Biol 5:e1000465
Yi H, Chun J (2015) Neisseria weaveri Andersen,1993 is a later heterotypic synonym of Neisseria weaveri Holmes et al. 1993. Int J Syst Evol Microbiol 65:463–464
Zeaiter Z, Liang Z, Raoult D (2002) Genetic classification and differentiation of Bartonella species based on comparison of partial ftsZ gene sequences. J Clin Microbiol 40:3641–3647
Acknowledgements
GD thanks Kirori Mal College, University of Delhi for providing study leave to participate in this study.
Funding
This study was funded by Government of India under project from Department of Biotechnology (DBT), National Bureau of Agriculturally Important Microorganisms-Indian Council of Agricultural Research (NBAIM-ICAR), University of Delhi Research and Development (DU-R&D) Grant, ICAR and DU-DST Promotion of University Research and Scientific Excellence (PURSE). NKM, VG, PS, RK, HV, CT, PR, AS, NS, US, PH, PK, NN, AP, AB, RK, VN, CT, HK, SN, MS and AKS gratefully acknowledge funds from Council of Scientific and Industrial Research, University Grants Commission (UGC), UGC-DSK PDF, DBT, NBAIM, ICAR, DU and Indian Council of Medical Research for providing research fellowships.
Conflict of interest
All authors declare that there is no conflict of interest.
Cite this article
Mahato, N.K., Gupta, V., Singh, P. et al. Microbial taxonomy in the era of OMICS: application of DNA sequences, computational tools and techniques. Antonie van Leeuwenhoek 110, 1357–1371 (2017). https://doi.org/10.1007/s10482-017-0928-1
DOI: https://doi.org/10.1007/s10482-017-0928-1