
18.1 Introduction

Since living beings and microbes were first identified, researchers have been developing systematic methods to classify them in the context of evolution and phylogeny. Classification is especially problematic for bacteria, the most common kind of microbe. Because bacteria reproduce asexually, the traditional concept of a species as a group of organisms capable of interbreeding and producing fertile offspring does not apply to them. In addition, their tiny size limits the range of morphological features available for classification. Bacteria show broad biochemical variation in cell structure and metabolism; although this provides some basis for their taxonomy, it is far from comprehensive. The advent of molecular biology brought a new revolution, which has made major contributions to bacterial taxonomy and systematics as well as to other fields of biological taxonomy.

In the 1970s, Carl Woese proposed a classification system based on molecular comparisons of evolutionarily conserved ribosomal RNA genes, which separated the prokaryotes into two domains, Bacteria and Archaea, distinct from the Eukaryotes (which include all higher organisms) (Woese 1987). This rRNA-based classification system is now widely accepted by microbiologists across the world. Nevertheless, bacterial systematics, and with it a standardized concept of what constitutes a bacterial species, is still evolving (Berg et al. 2020). Even so, molecular systematics has provided a strong framework for designing classification schemes.

Despite the lack of a clear and precise definition of species, traditional techniques are still used in a wide variety of fields. However, modern molecular methods (genomics and proteomics) characterize organisms better than traditional techniques do, and they rapidly yield multidimensional information on both microbial communities and their taxonomic relationships.

In most cases, bacteria are identified using two kinds of approaches: genotypic techniques, which rely on genetic profiling of the microorganism, and phenotypic techniques, which rely on its metabolic characteristics and chemical composition. The advantage of genotypic methods over phenotypic techniques is that they are not influenced by the physiological condition, medium composition, or growth phase of the organisms. Phenotypic techniques, on the other hand, reveal functional features of organisms, such as the metabolic processes required for their survival, growth, and development.

18.2 Genomic Methods

These methods are based on analysis of the genome, the haploid set of genes or chromosomes of an organism. Genomics may be divided into structural genomics and functional genomics (Wang et al. 2020; Raghu et al. 2021; Soni et al. 2021). Structural genomics deals with gene location, sequence, and physical characterization, whereas functional genomics deals with gene regulation and protein expression (Soni et al. 2016; Suyal et al. 2019a). Moreover, combining various “meta-” and “-omics” technologies has benefited humankind, especially in the environmental, medical, industrial, and agricultural fields (Suyal et al. 2015a, b, 2019c).

For identification, genotypic methods fall into two distinct categories: (1) pattern- or fingerprint-based techniques and (2) sequence-based techniques. Pattern-based methods use a systematic process to generate a set of fragments from the chromosomal DNA of the organism under study. The fragments are then separated by size, producing a profile (or fingerprint) characteristic of that organism and its close relatives. Researchers can compile such fingerprints into a reference database against which test organisms are compared (Emerson et al. 2008). If the profiles of two organisms match, the organisms can be regarded as close relatives, particularly at the strain or species level. Sequence-based techniques, by contrast, analyse the nucleotide sequence of a specific stretch of DNA, which is often, but not always, a particular gene. The general approach resembles that of fingerprinting: a database of reference sequences is created, and the test organism’s sequences are compared against it. The degree of similarity (homology) between the compared sequences estimates how closely related the organisms are. Several computational algorithms have been developed to build phylogenetic trees that compare the sequences of many organisms at once. For example, comparisons of ribosomal RNA (rRNA) gene sequences readily distinguish archaea and bacteria as separate branches and resolve relationships among microorganisms (Raina et al. 2019). Both approaches have merits and limitations.
Conventionally, 16S rRNA gene sequences have been analysed to establish phylogenetic relationships among bacteria at the phylum, order, family, and genus levels, whereas fingerprinting-based methods perform well for establishing relatedness at the species or genus level but are less dependable above those levels (Vandamme et al. 1996). Combining fingerprinting and sequence-based methods with phenotypic characters, the so-called polyphasic approach, is now the standard way to describe a new species or genus (Carro and Nouioui 2017).
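The fingerprint and sequence comparisons described above can be illustrated with a minimal Python sketch. It assumes that fragment sizes have already been read off a gel or electropherogram and that the sequences have already been aligned. The Dice coefficient used here is one common similarity measure for band-based fingerprints and percent identity is a simple measure of sequence similarity; neither is presented as the procedure of the cited studies, and the strain names, band sizes, and matching tolerance are hypothetical.

```python
"""Toy comparison of DNA fingerprints and aligned sequences (illustrative only)."""


def dice_similarity(bands_a, bands_b, tolerance=0.01):
    """Dice coefficient between two band patterns (fragment sizes in bp).

    Two bands count as shared when their sizes differ by at most `tolerance`
    as a fraction of the larger size (an assumed matching rule).
    """
    unmatched_b = list(bands_b)
    shared = 0
    for size_a in bands_a:
        for i, size_b in enumerate(unmatched_b):
            if abs(size_a - size_b) <= tolerance * max(size_a, size_b):
                shared += 1
                del unmatched_b[i]  # each band can be matched only once
                break
    total = len(bands_a) + len(bands_b)
    return 2 * shared / total if total else 0.0


def percent_identity(seq_a, seq_b):
    """Percent identical positions in two pre-aligned sequences of equal length.

    Alignment columns containing a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Hypothetical reference fingerprint database (fragment sizes in bp).
    reference = {
        "strain_X": [250, 480, 900, 1500],
        "strain_Y": [300, 480, 1100],
    }
    query = [252, 478, 905, 1490]
    best = max(reference, key=lambda name: dice_similarity(query, reference[name]))
    print(best, dice_similarity(query, reference[best]))  # strain_X 1.0
    print(percent_identity("ACGT-A", "ACGTTT"))  # 80.0
```

The query shares all four bands with strain_X (Dice = 1.0) but only one with strain_Y (Dice = 2/7), so strain_X is reported as the closest reference.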

18.3 Specific Genotyping Methodologies

Current characterisation techniques may use a variety of fingerprinting or sequence-based methods, either individually or in combination. These methods continue to evolve and improve in accuracy. Some of the most frequently used methods are described below.

18.3.1 Fingerprinting-Based Methodologies

Among genotypic methods, fingerprinting techniques are currently the most widely used. Techniques such as amplified fragment length polymorphism (AFLP), repetitive element PCR (rep-PCR), and random amplification of polymorphic DNA (RAPD) use PCR to amplify short DNA fragments with particular primer sets (Sharma et al. 2020). These methods exploit DNA polymorphisms that have arisen in the organism during evolution. In multiplex PCR, several primer sets targeting different organisms are combined in a single reaction; because each set yields an amplicon of a distinct size, the products can be separated by electrophoresis. This enables rapid identification of many microorganisms from a single mixed sample (Settanni and Corsetti 2007).
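The multiplex PCR readout described above amounts to checking which expected amplicon sizes appear among the bands of a lane. The minimal sketch below assumes band sizes have already been estimated from the gel; the panel, target names, sizes, and the ±15 bp tolerance are all hypothetical.

```python
"""Calling targets from multiplex PCR band sizes (illustrative sketch)."""

# Hypothetical panel: expected amplicon size (bp) for each target organism.
PANEL = {
    "target_A": 180,
    "target_B": 320,
    "target_C": 550,
}


def call_targets(observed_bands, panel=PANEL, tolerance_bp=15):
    """Return the targets whose expected amplicon size matches an observed band.

    A band matches when it lies within `tolerance_bp` of the expected size;
    the tolerance is an assumed value that would depend on the gel system.
    """
    detected = []
    for target, expected in sorted(panel.items(), key=lambda item: item[1]):
        if any(abs(band - expected) <= tolerance_bp for band in observed_bands):
            detected.append(target)
    return detected


if __name__ == "__main__":
    # Bands sized from a gel lane for one mixed sample (hypothetical values).
    print(call_targets([178, 556]))  # ['target_A', 'target_C']
```

The sketch only works if the panel's amplicon sizes are spaced farther apart than the tolerance, which is why multiplex primer sets are chosen to give clearly different product sizes.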

Riboprinting uses sensitive probes rather than PCR to detect differences in gene sequences or patterns between species and strains (Bruce 1996). Like several other molecular methods, it generates comparative data independent of the morphological complexity of the organisms. The DiversiLab system for rep-PCR (http://biomerieux-usa.com/diversilab) (Dou et al. 2015) and DuPont’s RiboPrinter system (www2.dupont.com/Qualicon/en_US/) (Shintani 2013) are commercial products developed specifically for bacterial identification. All the techniques discussed here are widely reported in the literature as identification methods. Their applications include source tracking, authentication of bacterial isolates for archiving, taxonomy and systematics, and identification of microbial population patterns, among others.

18.3.2 Sequence-Based Methodologies

Because housekeeping genes are conserved and present in all cells, primers can be designed to amplify the same genes across multiple genera. Multilocus sequencing (MLS) is a promising method for identifying microbial species. Its principle is similar to that of 16S rRNA gene sequencing, except that fragments of multiple “housekeeping” genes are sequenced. The sequences are then concatenated into one long sequence, which is compared with other sequences.

Because truly universal primer sets cannot be designed, a practical alternative is to design primer sets specific to families or orders. Two multilocus sequencing approaches are in use: multilocus sequence typing (MLST) and multilocus sequence analysis (MLSA). In MLST, primer sets targeting 6–10 genetic loci are used for PCR amplification (amplicon size 400–600 bp). The concatenated sequences are then compared with an existing sequence database for the same organism. The results provide very robust identification of a particular strain and reveal close evolutionary relationships (Huebner et al. 2021). Among other uses, the method can monitor the spread of a disease and has proven useful in epidemiological research (Pérez-Losada et al. 2011). MLSA, on the other hand, also sequences multiple fragments of conserved protein-encoding genes, but selects genes for comparative analysis in a more ad hoc way, often using only a small subset of genes or loci (Glaeser and Kämpfer 2015). It identifies organisms and resolves relationships among species within genera or families in detail. A major limitation of this approach is the lack of standardization and of central databases. Several recent MLSA studies used different gene sets, rather than a single common set, to identify members of various bacterial phyla (Glaeser and Kämpfer 2015; Palmer et al. 2018). Consequently, results from different MLSA studies often cannot be compared directly.
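In practice, MLST schemes usually record each distinct sequence at a locus as an allele number, and the combination of allele numbers (the allele profile) defines a sequence type (ST); the concatenated sequences are used for phylogenetic comparison. The minimal sketch below illustrates both steps under stated assumptions: the three loci, the six-base sequences, and the ST numbers are hypothetical stand-ins for a real, curated scheme with longer fragments.

```python
"""Assigning an MLST sequence type from locus sequences (illustrative sketch)."""

# Hypothetical scheme with three loci; real schemes use more loci and
# 400-600 bp fragments, as described in the text.
LOCI = ("locusA", "locusB", "locusC")

# Hypothetical allele tables: for each locus, sequence -> allele number.
ALLELES = {
    "locusA": {"ACGTAC": 1, "ACGTTC": 2},
    "locusB": {"GGTACA": 1, "GGTACG": 2},
    "locusC": {"TTAGCA": 1},
}

# Hypothetical profile table: allele combination -> sequence type (ST).
PROFILES = {(1, 1, 1): 10, (2, 1, 1): 11, (1, 2, 1): 12}


def allele_profile(isolate):
    """Map each locus sequence to its allele number (None if the allele is new)."""
    return tuple(ALLELES[locus].get(isolate[locus]) for locus in LOCI)


def sequence_type(isolate):
    """Return the ST for a known allele combination, otherwise None."""
    profile = allele_profile(isolate)
    if None in profile:
        return None  # at least one novel allele, which would need curation
    return PROFILES.get(profile)


def concatenated(isolate):
    """Join the locus sequences in a fixed order, e.g. for phylogenetic analysis."""
    return "".join(isolate[locus] for locus in LOCI)


if __name__ == "__main__":
    isolate = {"locusA": "ACGTTC", "locusB": "GGTACA", "locusC": "TTAGCA"}
    print(allele_profile(isolate))  # (2, 1, 1)
    print(sequence_type(isolate))  # 11
    print(concatenated(isolate))  # ACGTTCGGTACATTAGCA
```

Keeping the loci in a fixed order matters: both the allele profile and the concatenated sequence are only comparable between isolates when every isolate uses the same locus order.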

18.4 The Genomic Future

Whole-genome comparisons have proven more accurate and precise than DNA-DNA hybridization and have gained popularity over phenotype-based concepts for bacterial classification and identification. At present, species are delineated by digital whole-genome comparisons using average nucleotide identity (ANI) or genome-to-genome distance calculation (GGDC). Since the advent of whole-genome sequencing, phylogenomics has made significant contributions to contemporary taxonomy (Lalucat et al. 2020). In complete-genome comparisons, two related strains are assigned to the same species when their average nucleotide identity is about 95% or higher (Olm et al. 2020). Next-generation sequencing has made sequence-based identification and classification of bacteria at all levels faster, cheaper, and more accessible. Microarrays are another promising approach for identifying and characterizing microbes at the community level. A microarray carries probes for many genes immobilized on a substrate (e.g., glass, silicon, or nylon), which are then hybridized with DNA or RNA samples (Solieri et al. 2013). Fluorescent reporter molecules are used as labels for rapid detection of samples hybridized to the probes. Microarrays are also of great importance in medicine, for example in disease diagnosis and pathogen identification (Herrera-Rodriguez et al. 2013). Variants include PhyloChips, which identify specific or diverse groups of bacteria directly in environmental samples, and GeoChips, which identify the microbes responsible for biogeochemical processes (Liu et al. 2021).
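The ANI-based species criterion described above can be reduced to an averaging step followed by a threshold check. The minimal sketch below assumes that an external aligner has already aligned fragments of one genome against the other and reported a percent identity for each retained alignment; real ANI implementations handle the fragmenting, alignment, and filtering of weak hits themselves, and those details are omitted here. The identity values are hypothetical; only the 95% cut-off comes from the text.

```python
"""Simplified ANI calculation and 95% species cut-off (illustrative sketch)."""
from statistics import mean

SPECIES_ANI_THRESHOLD = 95.0  # percent; the cut-off cited in the text


def average_nucleotide_identity(fragment_identities):
    """Mean percent identity over aligned genome fragments.

    Assumes an external aligner has already aligned fragments of one genome
    against the other and reported a percent identity for each retained hit.
    """
    if not fragment_identities:
        raise ValueError("no aligned fragments")
    return mean(fragment_identities)


def same_species(fragment_identities, threshold=SPECIES_ANI_THRESHOLD):
    """True when the ANI reaches the species threshold."""
    return average_nucleotide_identity(fragment_identities) >= threshold


if __name__ == "__main__":
    close_pair = [97.2, 96.8, 95.5, 98.1]  # hypothetical fragment identities
    distant_pair = [93.0, 94.5, 92.8]
    print(round(average_nucleotide_identity(close_pair), 2), same_species(close_pair))  # 96.9 True
    print(round(average_nucleotide_identity(distant_pair), 2), same_species(distant_pair))  # 93.43 False
```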

18.5 Proteomics Technologies in Bacterial Identification and Characterization

Genotypic and phenotypic methods alone are not enough to understand the physiological and functional activities of an organism at the protein level. Proteomics is a newer approach that offers a rapid way to explore biomolecules and understand their activity. It relies largely on mass spectrometry and allows genotypic and proteomic data to be integrated with all the relevant information (Suyal et al. 2018, 2019b). The most widely used technologies include electrospray ionisation mass spectrometry (ESI-MS), matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS), one- or two-dimensional sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), and surface-enhanced laser desorption/ionization (SELDI) mass spectrometry.

18.5.1 Mass Spectrometry-Based Bacterial Characterization and Identification

Mass spectrometry traces back to J. J. Thomson, whose measurements of the mass-to-charge ratio of the electron in the late nineteenth century laid its foundations. The method is used to identify, quantify, and deduce the structure of a wide range of molecules (Baghel et al. 2017). During the twentieth century, the technology and its applications expanded to chemical characterization, physical measurement, and biological identification.

Soft ionization methods in mass spectrometry, such as ESI-MS and MALDI-TOF-MS, have made it easier to analyze large molecules, because samples can be interrogated directly in their native form (Fenn et al. 1989). MS has outperformed traditional approaches both in speed and in generating protein profiles. Applications of these techniques for identification and characterization are described below.

18.5.1.1 Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry

This mass spectral method provides a detailed overview of whole bacterial cells as spectral patterns over a broad mass range; identification and characterization are performed by comparison with reference data. Holland et al. (1996) first demonstrated the use of MALDI-TOF-MS for rapid identification of whole bacteria, after which it was applied to various taxa such as Mycobacterium spp. (Pignone et al. 2006), Staphylococcus spp. (Edwards-Jones et al. 2000), and extremophilic bacteria and archaea (Krader and Emerson 2004).

A well-known example concerns methicillin-resistant Staphylococcus aureus (MRSA), which first emerged in European hospitals in the early 1960s. The spread of resistant S. aureus created an urgent need for rapid identification methods. Edwards-Jones et al. (2000) applied MALDI-TOF-MS for this purpose, including the differentiation of methicillin-sensitive S. aureus (MSSA) from MRSA. The procedure involves smearing a single bacterial colony on a slide, applying the matrix, and analysing the sample by MALDI-TOF-MS. The resulting spectra show distinct peaks for MRSA and MSSA. Building on this technique, several other instruments equipped with bioinformatics tools, such as Bruker Daltonics’ MALDI BioTyper, were developed. They serve the same function by targeting ribosomal proteins and other highly abundant proteins (Mellmann et al. 2008).
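The comparison of a measured spectrum with reference data can be sketched as peak matching within a mass tolerance. The minimal example below scores each reference by the fraction of its peaks found in the query within a parts-per-million (ppm) window. This scoring rule, the 500 ppm tolerance, and all peak values are assumptions for illustration; commercial systems such as the MALDI BioTyper use their own scoring algorithms, which this sketch does not reproduce.

```python
"""Matching a MALDI-TOF peak list against reference spectra (illustrative sketch)."""


def matched_fraction(query_peaks, reference_peaks, tolerance_ppm=500.0):
    """Fraction of reference peaks (m/z) found in the query within a ppm window."""
    if not reference_peaks:
        return 0.0
    matched = 0
    for ref in reference_peaks:
        window = ref * tolerance_ppm / 1e6  # absolute tolerance at this m/z
        if any(abs(q - ref) <= window for q in query_peaks):
            matched += 1
    return matched / len(reference_peaks)


def best_match(query_peaks, library, tolerance_ppm=500.0):
    """Return (reference name, score) for the highest-scoring library entry."""
    scores = {
        name: matched_fraction(query_peaks, peaks, tolerance_ppm)
        for name, peaks in library.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    # Hypothetical reference peak lists (m/z values).
    library = {
        "reference_1": [4000.0, 5500.0, 6800.0, 9600.0],
        "reference_2": [4300.0, 5500.0, 7200.0],
    }
    query = [4001.5, 5502.0, 6797.0, 9603.0]
    print(best_match(query, library))  # ('reference_1', 1.0)
```

A ppm tolerance is used rather than a fixed window because the absolute mass error of a time-of-flight measurement tends to grow with m/z.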

18.5.1.2 Electrospray Ionization Mass Spectrometry

ESI-MS is a promising approach for the characterization and analysis of various cellular components of microbes. It is considered more accurate for protein identification than MALDI-TOF-MS profiling, which relies on molecular weight alone. ESI-MS searches databases with peptide fragmentation fingerprints to identify specific proteins. The fingerprint is obtained by tandem mass spectrometry, in which selected ions of the target protein are fragmented and the fragments undergo a second mass analysis. A modified approach integrating PCR with ESI-MS was introduced by Ibis Biosciences (the T5000 Biosensor System) for the identification and characterization of bacteria (Sampath et al. 2007). Major advantages of ESI-MS include a rapid process (Banerjee and Mazumdar 2012), specific identification of target bacteria in mixed cultures, high resolution, and identification of virulence factors (Ho and Reddy 2011).

18.5.1.3 Surface-Enhanced Laser Desorption/Ionization

SELDI is a relatively new technique that separates proteins by their binding affinity to a chip surface. Chemically or biologically modified chips are used for mass spectrometric analysis of complex protein mixtures. SELDI-MS generates a characteristic spectral pattern for the proteins in a mixture based on their mass-to-charge ratios (You et al. 2013). Different proteins can then be identified in these profiles by comparing the respective peak intensities (Lu et al. 2010). Lundquist et al. (2005) demonstrated that SELDI-TOF-MS can produce distinct and reproducible protein profiles for identifying and discriminating different species. For example, it made it possible to identify and distinguish the most infectious of the four subspecies of Francisella tularensis, which occurs mainly in North America and causes tularemia in humans. The technique is used for bacterial identification and characterization, exploration of bacterial proteomes, pathogen detection (Ho and Reddy 2011; Ardito et al. 2016), virulence factor identification, and biomarker discovery and protein profiling in oncology (Langbein et al. 2006; Liu 2011).

Although mass spectrometry plays a major role in identifying and characterizing bacteria through their spectral patterns, various factors make protein profiles difficult to reproduce. These include the physiological state of the cells (García-Flores et al. 2012), the growth medium (Wieme et al. 2014), sample preparation, differences in instrument quality, and the choice of matrix (Wunschel et al. 2005; Vats et al. 2016). Researchers have addressed this issue by introducing standardized protocols for whole-cell MALDI-TOF-MS (Strejcek et al. 2018).

18.5.2 Gel-Based Method

SDS-PAGE is a widely used method for differentiating bacteria based on their protein content. SDS denatures the proteins and coats them with a roughly uniform negative charge per unit length, so the entire protein complement is separated mainly by molecular weight. Differences in electrophoretic mobility produce a characteristic migration pattern of protein bands, which helps to differentiate and characterize a variety of bacterial strains. Gel-based fractionation is considered a promising technique and can resolve proteins by size, isoelectric point, and hydrophobicity (Carruthers et al. 2015). The drawback of this approach is that it is time-consuming and tedious.
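Because SDS-PAGE separates denatured proteins mainly by size, a common laboratory practice is to estimate the molecular weight of an unknown band from a ladder of marker proteins, using the approximately linear relationship between log10(molecular weight) and relative mobility (Rf) over the gel’s working range. The minimal sketch below fits that standard curve by least squares; the ladder values and migration distances are hypothetical, and real gels deviate from linearity near the top and bottom of the gel.

```python
"""Estimating protein molecular weight from SDS-PAGE mobility (illustrative sketch)."""
import math


def relative_mobility(band_distance_mm, dye_front_mm):
    """Rf: distance migrated by a band divided by the distance of the dye front."""
    return band_distance_mm / dye_front_mm


def fit_standard_curve(markers):
    """Least-squares fit of log10(MW) = slope * Rf + intercept.

    `markers` is a list of (Rf, molecular weight in kDa) pairs from a ladder.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def estimate_mw(rf, slope, intercept):
    """Molecular weight (kDa) predicted by the standard curve for a given Rf."""
    return 10 ** (slope * rf + intercept)


if __name__ == "__main__":
    # Hypothetical ladder: (Rf, kDa) pairs read from a stained gel.
    ladder = [(0.12, 116.0), (0.30, 66.0), (0.45, 45.0), (0.62, 29.0), (0.85, 14.0)]
    slope, intercept = fit_standard_curve(ladder)
    rf = relative_mobility(band_distance_mm=31.0, dye_front_mm=62.0)  # Rf = 0.5
    print(f"Estimated molecular weight: {estimate_mw(rf, slope, intercept):.1f} kDa")
```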

18.5.2.1 Two-Dimensional Gel Electrophoresis (2DE)

Combining SDS-PAGE with isoelectric focusing (IEF) led to the development of a high-resolution technique, 2DE, introduced by O’Farrell in 1975. It can separate complex protein mixtures in a single gel. In the first dimension, proteins are separated along a pH gradient according to their isoelectric points; in the second dimension, they are separated by SDS-PAGE. The gel is then stained with standard staining solutions to visualize the protein spots, and the resulting protein patterns, or 2DE maps, are analysed (Soni et al. 2015; Kendrick et al. 2019). These maps can be studied further and stored in reference databases for future use. 2DE is often used to isolate and analyze target proteins from complex protein mixtures and to identify unknown species by comparing their 2DE maps with a reference database. For more efficient and comprehensive proteome analysis, 2DE is combined with mass spectrometry. Numerous reports have shown that this combined approach can be used to study the entire proteome or subproteomes of a variety of species, including the exosporium of Bacillus anthracis spores (Redmond et al. 2004), Bacillus subtilis, Helicobacter pylori, E. coli, Pseudomonas aeruginosa, and Staphylococcus aureus (Hecker et al. 2003; Peng et al. 2005; Pieper et al. 2006). Databases containing complete 2DE maps and mass spectra of known bacteria would enable rapid identification and efficient comparative study of unknown bacteria (Curreem et al. 2012). However, building such databases is generally very laborious.

18.6 Databases

Generating, archiving, processing, and integrating large, high-quality data sets from many samples is a real challenge for both genomic and proteomic approaches. A variety of databases and tools provide integrated data for particular types of analysis. For genomic analysis, databases of 16S rRNA genes include Greengenes (DeSantis et al. 2006) and the Ribosomal Database Project (Cole et al. 2009). For in-depth analysis of proteomic data, tools such as GlycoMod and databases such as PhosphoSite have been used (Gasteiger et al. 2003). Moreover, new algorithms have been developed that adapt to actual experimental phenomena and parameters (Lees et al. 2016; San et al. 2020). Wilke et al. (2003) introduced the ProDB platform, which stores protein profiles together with the experimental set-up and parameters, such as growth and culture conditions, so that their impact on mass spectral profiles can be assessed.

18.7 Conclusion

Molecular technologies are at the core of microbial identification and characterisation. However, certain problems remain to be addressed, such as the operational knowledge required for the instruments, their portability, cost-effectiveness, and accessibility. Addressing these should encourage students, researchers, and the scientific community to use a variety of technologies in pursuit of environmental sustainability.