Introduction

The biosphere was formed by and is completely dependent on the metabolism of microorganisms and on their interactions with each other. Currently, it is estimated that there are about 4–6 × 10³⁰ prokaryotic cells, exceeding, by several orders of magnitude, all plant and animal diversity [124]. Bacterial cells arose about 3.8 billion years ago; they are microscopic, morphologically simple and are widespread throughout all environments, including those with extreme conditions [3, 44, 55, 69, 97, 103, 114]. The long history and the importance of microorganisms help explain the great morphological, physiological, and genetic diversity of these forms of life [114]. This enormous genetic variability is the result of rare mutations and recombination events, which allow them to respond to environmental changes. Bacteria can exchange and acquire genes from distantly related organisms by horizontal gene transfer (HGT), consequently increasing rates of speciation, which can be considerably higher than in eukaryotes [14, 52, 119, 123].

The fundamental unit of biological diversity and the basis of taxonomic hierarchy is the species. However, most bacteria cannot be isolated and cultured with current techniques; consequently, natural bacterial diversity is still relatively unexplored. For more than 100 years, microorganisms have been described and identified by culture methodologies [36, 62, 97, 103]. During this period, there has been immense progress in clinical microbiology, much more than in environmental microbiology, due to the importance of microorganisms for public health. Recently, the astonishing diversity of microorganisms has been discovered, along with their importance in providing environmental services essential for sustained life on Earth. Bacteria play important roles in biogeochemical cycles (carbon, nitrogen, and other minerals), bioremediation processes, energy conversion, biocatalysis, and natural product synthesis; this makes bacteria important potential resources for new industrial and biomedical processes [9, 71, 24]. Consequently, there has been increasing interest in the exploration of natural bacterial communities and their beneficial aspects; such studies are required to fulfill the promise of bacterial biotechnology.

Bacterial species are regarded as a genomically coherent group sharing a high degree of similarity in several different phenotypic and genomic properties that have been characterized through a polyphasic approach. However, knowledge about natural bacterial diversity is still limited, and there are only 6,373 validly described species [32] (for updates see http://www.bacterio.cict.fr). The difficulties and limitations of the methods available for culturing bacteria found in natural environments, such as soil and freshwater, have made the study of microbial diversity a difficult and complex effort [36, 64, 84, 97, 98, 114]. Nevertheless, during the last few years, advances in molecular techniques based on DNA sequencing and analysis of ribosomal gene sequences (mainly of the small subunit rRNA, SSU rRNA, which is the 16S rRNA in prokaryotes) of various prokaryotes have provided considerable information on the taxonomic relationships, ecological roles and evolution of bacterial species found in environmental samples, without the need for isolation and culture. These new techniques have uncovered new bacterial functions and have revolutionized our concept of the value of microbial diversity, which until now has been largely undescribed [20, 27, 35, 36, 59, 63, 64, 72, 83, 97, 98, 114, 125]. Hence, bacterial ecology and industrial microbiology have come together; biodiversity studies have potential for important discoveries in both fields.

Approaches used to identify bacterial species

At present, it is widely accepted that various methods are needed to identify and characterize bacterial species and to determine the diversity of this domain. A combination of many different methodologies has been directed toward analyzing phenotypic, genomic, and phylogenetic characteristics for taxonomic purposes; this combination is defined as a ‘polyphasic approach’ (Fig. 1) [5, 15, 20, 37, 98, 117]. In practice, the meaning of specific tests has been influenced by how they correlate with DNA analysis for the identification of a particular bacterium.

Fig. 1 Useful culture-dependent and culture-independent methods. RFLP restriction fragment length polymorphism, ARDRA amplified rDNA restriction analysis, DGGE denaturing gradient gel electrophoresis, TGGE temperature gradient gel electrophoresis, SSCP single-stranded conformation polymorphism, FISH fluorescence in situ hybridization

Characterizing phenotypes

The classical phenotypic tests have been used for morphological, physiological, and biochemical identification of bacteria and for biotechnological applications [44, 109]. Profiles of substrate utilization and of bacterial product formation provide important data for studies of functional biodiversity. Phenotypic characterization has proven useful for studying and characterizing genes that encode proteins with potential industrial applications (Table 1).

Table 1 Biotechnological importance of bacterial phenotype characterization

A combination of phenotypic and genomic information is required for correct description and classification of bacteria [15, 99, 113]. Recently, significant developments in various areas (chemistry, molecular biology, and bioinformatics) have provided new techniques for improving our knowledge of microbial diversity, as well as facilitating taxonomic and biotechnological studies [82, 99].

DNA–DNA hybridization

Whole genome DNA–DNA hybridization has been used for bacterial genotypic characterization for decades, increasing the quality and quantity of information useful for identification, which was previously limited to phenotypic properties [99]. DNA–DNA hybridization was one of the first molecular techniques [18] and remains a cornerstone of bacterial species delineation [108]. This standard technique allows comparison and measurement of genomic similarity between the total genomes of two organisms under standardized conditions. A group of strains showing 70% or greater DNA–DNA similarity and a difference of 5°C or less in thermal stability between homologous and heterologous duplexes is considered to be the same species [45, 58, 117, 122]. Nevertheless, the cutoff is arbitrary, and the method is limited in its ability to evaluate the actual sequence similarity between two whole genomes and to suitably describe bacterial species [87, 90, 117]. Additionally, DNA–DNA hybridization studies are time-consuming; they allow the study of only a few bacterial clusters and are not applicable to uncultured organisms, which account for the majority of living prokaryotes [37, 58, 80, 87]. Despite the drawbacks and limitations of this method, it has been one of the most important approaches in bacterial species circumscription, as shown by Mechichi et al. [79, 108]. They described two new species of the genus Thauera and one species of the genus Azoarcus, using a polyphasic approach. They used DNA–DNA hybridization, which is an essential step for the differentiation of strains that have been isolated from different environments. These strains degrade aromatic compounds, such as phenol, benzoate, and toluene. The characterization of these three new species demonstrates the importance of genome research, and it will be extremely important for analysis of novel aromatic catabolic functions and for the use of these bacteria as biocatalysts for the biodegradation and biotransformation of aromatic compounds [79].
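The species criterion stated above (at least 70% DNA–DNA similarity and a thermal stability difference of 5°C or less) can be written as a simple decision rule. The following Python sketch is illustrative only: the function name, strain labels and measurement values are hypothetical, and real studies weigh these values together with phenotypic data rather than applying a hard cutoff.

```python
def same_species_by_ddh(ddh_similarity_pct, delta_tm_celsius):
    """Apply the conventional DNA-DNA hybridization species criterion.

    Thresholds are the ones quoted in the text: >= 70% DNA-DNA similarity
    and a thermal stability difference of 5 degrees C or less.
    """
    return ddh_similarity_pct >= 70.0 and delta_tm_celsius <= 5.0


# Hypothetical measurements (similarity in %, thermal stability difference in C).
measurements = {
    ("strain_A", "strain_B"): (82.0, 2.1),
    ("strain_A", "strain_C"): (45.0, 9.5),
}

for (first, second), (similarity, delta_tm) in measurements.items():
    if same_species_by_ddh(similarity, delta_tm):
        verdict = "same species"
    else:
        verdict = "different species"
    print(f"{first} vs {second}: {verdict}")
```

Both conditions must hold; a pair with high similarity but an unstable heteroduplex would still be reported as different species under this rule.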

Considering the significant role of the DNA–DNA hybridization assay and the difficulty of implementing this technique in routine laboratories, novel and alternative methods, such as melting profiles in microplates, random genome fragments, and DNA microarrays, have been proposed to improve or supplement this approach [11, 80, 85]. Microarrays have potential as a rapid technique for environmental and phylogenetic studies, allowing the analysis of a large number of samples. An example of application is the rapid detection of 90 antibiotic resistance genes in 36 strains of Gram-positive bacteria (Bacillus, Clostridium, Enterococcus, Lactococcus, Lactobacillus, Listeria, Staphylococcus, and Streptococcus). Using this technique, three different genes involved in erythromycin resistance were detected in Staphylococcus haemolyticus and three genes carried by transposon Tn5405 were detected in Clostridium perfringens. These results demonstrated that microarrays make rapid screening of antibiotic resistance genes possible and that they can be useful for industrial and biomedical applications [93].

Ribosomal RNA (rRNA) genes

Ribosomal RNAs are ancient molecules and primordial components of the cell's protein synthesis machinery; they are ubiquitous, are conserved between phylogenetically distant organisms and are not affected by environmental changes [99]. Ribosomal RNA genes can occur in variable numbers in different organisms and are almost unaffected by horizontal gene transfer. The combination of these properties makes these genes suitable for studies of microbial evolution and phylogeny [1, 12].

Bacteria have three genes that encode the 5S, 16S, and 23S rRNAs, essential components of ribosomes involved in the translation of messenger RNA (mRNA) for protein synthesis. These genes are typically organized into an operon that contains internal transcribed spacers (ITS), which vary widely in length and sequence [1, 34, 61, 65]. Spacer regions, especially those located between the 16S and 23S rRNA genes, have more genetic variation than the regions that encode the rRNAs, due to variation in length and in the number of tRNA genes contained in the sequences. ITS polymorphism can be useful for differentiation of closely related bacterial genera and species [13, 20, 25, 61]. Bacterial species can have up to 15 copies of the ribosomal operon in their genome [1, 34, 65].

Analysis of bacterial diversity

The 16S rRNA gene has several characteristics that favor its use for molecular studies, including a suitable length (about 1,500 bp), highly conserved regions shared among different and distant species, and ease of manipulation [4, 99]. This gene is generally weakly affected by HGT; in addition, considerable information is currently available in databases due to its extensive and growing use in microbial taxonomic studies. The properties of the 16S rRNA gene make it a suitable tool for phylogenetic inferences [99].

In the 1970s, Woese proposed a phylogenetic classification system for prokaryotic species based on the divergence of small subunit ribosomal RNA sequences (16S rRNA for prokaryotes and 18S rRNA for eukaryotes). Comparative analysis of SSU rRNA can elucidate evolutionary relationships and diversity among organisms. Woese introduced a view of the tree of life divided into three domains: Bacteria and Archaea, both classified as prokaryotes, and Eukarya. Inside the Bacteria domain, he described 11 divisions based on nucleotide sequences of 16S rRNA of cultured microorganisms [125]. This phylogenetic system, based on analysis of 16S rRNA gene sequences, opened new perspectives and became a useful model for studies of evolution and relationships among the various existing species [77]. There has been interest in examining the enormous unknown diversity of prokaryotes present in different ecosystems, based on information about the 16S rRNA gene in microorganisms that cannot be isolated in culture. Finding <97% 16S rRNA gene sequence similarity in comparison with known species has been regarded as a criterion for a new species description.
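The <97% similarity criterion can be illustrated with a short Python sketch that computes percent identity between two pre-aligned 16S rRNA gene sequences. This is a minimal sketch under stated assumptions: the sequences are already aligned to equal length, columns containing a gap in either sequence are skipped (one of several possible conventions), and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def is_candidate_new_species(identity_pct, threshold=97.0):
    """True when identity to the closest known species falls below the threshold."""
    return identity_pct < threshold


# Toy aligned fragments; real 16S rRNA gene sequences are about 1,500 bp long.
query = "ACGTACGTACGTAC-TACGT"
reference = "ACGTACGAACGTACGTACGT"
identity = percent_identity(query, reference)
print(f"identity: {identity:.1f}%")
print("candidate new species:", is_candidate_new_species(identity))
```

As the surrounding text notes, the converse does not hold: high 16S rRNA similarity does not by itself show that two strains belong to the same species.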

Today, it is believed that more than 53 divisions exist in the Bacteria domain, identified mainly through phylogenetic analysis of environmental samples that cannot be cultured, based on 16S rRNA gene sequences. However, most of the bacteria described in these new divisions based on molecular analysis are not cultivable [63]. This opens possibilities for obtaining evolutionary and phylogenetic information about new microorganisms, but frequently gives little information about their functional role, ecological relevance and genetic information about the different species of the community [54, 63, 81, 97, 104, 111].

Polymerase chain reaction (PCR)

PCR technology was developed in 1985 by Kary Mullis and has had such a strong impact in many areas of science that its inventor received a Nobel Prize. Amplification of DNA by PCR has become extremely useful for bacterial detection in heterogeneous samples. PCR is the most commonly used method for exploring bacterial biodiversity in studies of 16S rRNA genes; the information obtained can be compared with that from other techniques through pattern analysis (Table 2), including ARDRA, amplified rDNA restriction analysis [38, 68, 126]; DGGE, denaturing gradient gel electrophoresis [106]; TGGE, temperature gradient gel electrophoresis [8]; RFLP, restriction fragment length polymorphism [73]; SSCP, single-stranded conformation polymorphism [40]; evaluation of sample size and replication; FISH, fluorescence in situ hybridization [2]; cloning and sequencing; probe hybridization and microarrays. These techniques allow rapid and sensitive detection of bacterial diversity independent of culture methods and can also be used to identify metabolically active compounds if the desired gene is conserved enough to allow the design of specific primers.

Table 2 Advantages, limitations, and biotechnological applications of genetic fingerprinting methods

PCR-based DNA fingerprinting provides information about differences in bacterial populations. Over the last decade, there has been increasing interest in studying sulfate-reducing bacteria (SRB), which play an important role in the degradation of organic matter, an essential process in ecosystem and environmental remediation. Nested-PCR-DGGE was used to study the diversity of SRB in samples from mixed microbial communities, with high resolution and reproducibility. The knowledge acquired in this study proved to be useful in bioremediation processes and for studies to improve pollutant removal efficiency [21].

The diversity of aerobic thermophilic bacteria isolated from Irish soils was analyzed by ARDRA; sequencing of 16S rRNA genes showed that genetically different members of the genus Geobacillus predominated in the sample. Geobacillus species were found to be sources of diverse compounds (proteases, lipases, amylases, and pullulanases) with biotechnological applications and of interest for industry. McMullan et al. described the capacity of Geobacillus isolates to metabolize herbicides and showed that these isolates are potential sources of genes that would be useful for agricultural biotechnology [78].
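The principle behind ARDRA and RFLP fingerprints, in which amplified rDNA is cut with restriction enzymes and isolates are compared by their fragment-size patterns, can be sketched in silico. The Python example below is a simplified illustration, not the protocol used in the cited studies: it assumes a linear amplicon, uses three common enzymes with their standard recognition sites, and the toy sequences and function names are hypothetical.

```python
import re

# Recognition site and top-strand cut offset within the site.
ENZYMES = {
    "AluI": ("AGCT", 2),     # AG^CT
    "HaeIII": ("GGCC", 2),   # GG^CC
    "EcoRI": ("GAATTC", 1),  # G^AATTC
}


def digest(sequence, enzyme):
    """Cut a linear sequence at every recognition site and return the fragments."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    # A lookahead finds every site start, including sites that follow one another.
    cuts = [match.start() + offset for match in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


def fingerprint(sequence, enzyme):
    """Fragment lengths sorted from largest to smallest, as read from a gel."""
    return sorted((len(fragment) for fragment in digest(sequence, enzyme)), reverse=True)


# Toy amplicons: isolate_2 has lost the HaeIII site through one substitution.
isolate_1 = "AAAGCTTTTGGCCAA"
isolate_2 = "AAAGCTTTTGACCAA"
for enzyme in ("AluI", "HaeIII"):
    pattern_1 = fingerprint(isolate_1, enzyme)
    pattern_2 = fingerprint(isolate_2, enzyme)
    status = "identical" if pattern_1 == pattern_2 else "different"
    print(enzyme, pattern_1, pattern_2, status)
```

Two isolates with the same pattern for one enzyme may still differ for another, which is why several enzymes are usually combined.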

Sequencing of the 16S rRNA gene and comparative analysis of the sequences is useful for understanding phylogenetic relationships among prokaryotes above the species level. Nevertheless, the more information we have about 16S rRNA gene sequences, the more evident it becomes that they are of limited use for distinguishing closely related but ecologically distinct bacteria [90, 92]. Approaches based only on 16S rRNA gene sequences are insufficient to classify prokaryotic species and do not reveal the true relationships among the genomes of microorganisms.

Limitations of 16S rRNA sequence analysis

The finding of more than one 16S rRNA gene sequence in a single genome and variation in operons among strains of the same species provide critical information on bacterial diversity and evolution. The heterogeneity of 16S rRNA gene sequences among operons of the same genome is not an indication of the actual bacterial diversity [1, 19, 37, 105, 121]. Thermophilic microorganisms have the highest divergence among 16S rRNA gene sequences within a single genome; analyses have suggested that the rrnC operon was acquired by HGT. Generally, the divergence among 16S rRNA gene sequences in the same genome is less than 1% [1]. Conversely, 16S rRNA gene sequences can be so similar among some closely related microorganisms that they lack resolution at the species level, as has been described for Bacillus, Ochrobactrum, Enterobacter and Taylorella [49, 67, 75, 95]. The considerable stability of 16S rRNA gene sequences and the low rate of evolution of this gene sometimes do not permit the identification of ecotypes; consequently, other techniques that examine genomic and physiological diversity and ecological niches may need to be applied [60, 70, 116]. Jaspers and Overmann [60] isolated 11 strains of Brevundimonas alba with identical 16S rRNA gene sequences. Using a combination of methods, they found great genetic diversity among strains from distinct populations with genetically determined adaptations and concluded that these strains probably occupy different ecological niches. Another limitation is the use of a universal pair of primers for amplification of 16S rRNA gene sequences in studies of bacterial diversity. PCR artifacts, such as preferential amplification of certain sequence types, generation of chimeric sequences and false positives due to experimental contaminants, can give inaccurate information about the actual diversity of microbial communities in environmental samples [4, 20, 28, 34, 51].
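To make the point about intragenomic heterogeneity concrete, the following Python sketch compares the 16S rRNA gene copies of a single genome and reports the most divergent pair, which can then be checked against the roughly 1% level mentioned above. It is an illustration under stated assumptions: the copies are pre-aligned to equal length, gaps are treated as ordinary characters, and the operon names, the `mutate` helper and the sequences are all hypothetical toy data.

```python
from itertools import combinations


def divergence_pct(seq_a, seq_b):
    """Percentage of aligned positions that differ between two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * differences / len(seq_a)


def max_intragenomic_divergence(copies):
    """Return (divergence, (name_1, name_2)) for the most divergent pair of copies."""
    best = (0.0, None)
    for (name_1, seq_1), (name_2, seq_2) in combinations(copies.items(), 2):
        divergence = divergence_pct(seq_1, seq_2)
        if divergence > best[0]:
            best = (divergence, (name_1, name_2))
    return best


def mutate(sequence, positions):
    """Hypothetical helper: substitute the base at each listed position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    bases = list(sequence)
    for position in positions:
        bases[position] = swap[bases[position]]
    return "".join(bases)


# Toy genome with three operon copies of a 200-nt fragment.
base = "ACGT" * 50
copies = {
    "rrnA": base,
    "rrnB": mutate(base, [10]),
    "rrnC": mutate(base, [10, 50, 90, 130]),
}
divergence, pair = max_intragenomic_divergence(copies)
print(f"most divergent copies: {pair}, {divergence:.1f}%")
print("exceeds 1%:", divergence > 1.0)
```

A genome flagged in this way would count as several distinct sequences in a clone library even though it represents a single organism.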

New approaches to access bacterial diversity

Approaches based on housekeeping genes that exist in a single copy have been suggested due to the heterogeneity of 16S rDNA and the implications of this heterogeneity for studies of bacterial diversity and phylogeny [20, 37]. Hence, the use of other genes, such as rpoB, gyrB, recA, dnaK, and hsp60 (also known as groEL and cpn60), has proved more suitable to discriminate species for some groups [19, 43, 49, 70, 74, 95, 113]. An example is the use of the rpoB gene sequence for rapid identification of Bacillus and the use of hsp60 for phylogenetic and taxonomic studies of Enterobacter [49, 50, 56].

Multilocus sequence typing (MLST) is a new approach that has been used to obtain genetic information for characterization of distinct strains of bacteria within known species by analyzing sequences of internal fragments of a set of housekeeping loci (∼7). MLST is considered a powerful tool for bacterial typing and has been developed for several pathogenic bacteria, such as Vibrio sp., Neisseria meningitidis, Streptococcus pyogenes, Staphylococcus aureus and Campylobacter jejuni [10, 22, 30, 33, 37, 113]. The combination of allele numbers across the loci defines the sequence type. Feil et al. [33] concluded that variation at a single nucleotide site within a meningococcal housekeeping gene is more likely to result from recombination than from point mutation in N. meningitidis clonal complexes. Thompson et al. [113] indicated that MLST will improve knowledge of the taxonomy and phylogeny of vibrios. MLST has been shown to be a very specific and unambiguous method for the characterization of bacterial species; recently, it was also used for non-pathogenic bacteria, including Lactobacillus plantarum [22].
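How a combination of allele numbers yields a sequence type can be sketched as follows. This Python example is a toy illustration, not the scheme of any curated MLST database: it uses three hypothetical loci instead of about seven, assigns allele and sequence-type numbers in order of first appearance, and all names and sequences are invented.

```python
def allelic_profile(locus_sequences, allele_db):
    """Translate each locus sequence into an allele number.

    Unseen sequences receive the next free number for that locus, so any
    single-nucleotide difference defines a new allele.
    """
    profile = []
    for locus in sorted(locus_sequences):
        known_alleles = allele_db.setdefault(locus, {})
        sequence = locus_sequences[locus].upper()
        if sequence not in known_alleles:
            known_alleles[sequence] = len(known_alleles) + 1
        profile.append(known_alleles[sequence])
    return tuple(profile)


def sequence_type(profile, st_db):
    """Map an allelic profile to a sequence type (ST), creating one if new."""
    if profile not in st_db:
        st_db[profile] = len(st_db) + 1
    return st_db[profile]


# Hypothetical in-memory databases and isolates.
allele_db = {}
st_db = {}
isolates = {
    "isolate_1": {"locusA": "ACGT", "locusB": "GGCC", "locusC": "TTAA"},
    "isolate_2": {"locusA": "ACGT", "locusB": "GGCC", "locusC": "TTAA"},
    "isolate_3": {"locusA": "ACGT", "locusB": "GGCA", "locusC": "TTAA"},
}
results = {}
for name, loci in isolates.items():
    profile = allelic_profile(loci, allele_db)
    results[name] = (profile, sequence_type(profile, st_db))
    print(name, "profile", profile, "ST", results[name][1])
```

Because only allele identity is recorded, two alleles that differ at one site and two that differ at many sites are treated alike, which keeps the typing unambiguous.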

Multilocus sequence analysis (MLSA) is a more general technique than MLST, suitable for genotypic characterization of a more diverse group, using sequences of protein-encoding genes that are ubiquitous, at least in the taxa under study, and are present in a single copy in the genome. Bacterial identification is first made by sequencing the 16S rRNA gene of an unknown strain, identifying it at the genus or family level and then determining a set of genes and primers to be used to identify strains at the species level using MLSA [37]. The MLSA approach has been used to identify clinical and environmental enterococcal species, using rpoA and pheS genes; it is an efficient screening method for the detection of novel species [83].

Specific functional genes are other options for bacterial diversity studies, whenever metabolic function is known for cultured microorganisms. These genes are examined to determine the relation between structure and function, such as the genes pmoA and mxaF for methanotrophic bacteria [6, 53] and nodD for Rhizobium [120]. Cloning and sequencing of functional genes from environmental samples is extremely useful for investigation of ecologically distinct groups and classification at the species level [88, 89].

Importance of culturing techniques

Traditionally, culturing recovers only a minute fraction of bacterial diversity. This significantly reduces our understanding of the actual physiological and metabolic properties of the community of microorganisms found in natural environments [81, 92].

There is no doubt that the culture methodologies are still important tools for the study of prokaryote diversity. Continued research is required and new methods need to be developed for the isolation of newly identified organisms that are discovered through techniques that do not depend on bacterial culture. Culture is still needed to help understand the characteristics and properties of these groups and their contribution to the immense existing prokaryotic diversity [20, 54, 81, 97, 103].

Bacterial populations that are phylogenetically closely related can be distinct physiologically (definition of microdiversity), while phylogenetically distant species can have physiological similarities and can coexist in the same natural environment [60, 99].

Though we can obtain phylogenetic data about unculturable microbes isolated from natural environments, normally there is no means of obtaining information about their genetic and ecophysiological characteristics. However, the data obtained through molecular methodologies using 16S rRNA gene sequences are very important for our efforts to develop and improve culture methods for bacteria isolation that will allow studies of morphology, physiology, abundance, and distribution in natural habitats [60, 99]. Nowadays, some progress has been made in this area, in the development of new methods of culture that allow the isolation of bacteria previously not cultivable, in order to explore the potential of these microorganisms and to understand their ecological role. These studies have associated new culture techniques with analysis of 16S rRNA gene sequences, opening new perspectives and improving our knowledge about bacterial communities in many ecosystems [27, 29, 46, 59, 63, 76, 85, 92, 107]. When these methodologies are associated, most of the 16S rRNA gene sequences analyzed from clones of libraries constructed from total DNA extracted directly from environmental samples are found to be different from sequences obtained from cultivable bacteria isolated from the same sample, identical sequences rarely occurring [29, 92]. However, 16S rRNA gene sequence libraries have more objective data than those obtained with culture techniques. The combination of the two methodologies will likely provide additional information about the diversity and physiology of the natural bacterial community, giving relevance to the results obtained with the polyphasic approach [26, 27, 29, 76, 92]. Research with cultured bacteria that have been isolated is of enormous relevance for our knowledge of this practically unexplored universe, since bacteria perform essential roles in various ecosystems.

Metagenome: a function and sequence-driven approach

Nowadays, one of the most widely adopted strategies for studying microbial diversity is metagenomic research. A metagenome is the entire genetic composition of the microbial communities of an environment; this approach is based on direct isolation of total DNA from environmental samples, construction of libraries and the amplification of 16S rRNA genes and functional genes to study the total diversity, physiology, ecology and phylogeny of bacteria that cannot be cultivated in the laboratory (Fig. 2) [71, 100, 110, 111, 118]. Such investigations aim to reveal and understand the relationship between community composition and functional diversity in natural microbial ecosystems.

Fig. 2 Scheme of construction and screening of environmental metagenomic libraries

Metagenomic research is useful to exploit the unknown bacterial diversity in different environments; it can be used to discover novel genes and to increase our knowledge on bacterial ecology and physiology [17, 110]. The 16S rRNA gene accounts for a minor fraction of the average prokaryotic genome and it does not give information about the physiology of the bacteria [97]. Metagenomic approaches using 16S rRNA gene sequences as a phylogenetic marker have been used to characterize uncultivated prokaryotes and can help to discover metabolic functions, enhancing our knowledge about bacterial ecology and phylogeny [86, 96, 110, 115].

We can use metagenomic sequences to help understand how complex microbial communities function and how bacteria interact within these niches. The diverse bacteria in a natural environment can be a complex chemical source of many undiscovered biodynamic compounds, with potential for bioprospecting. New antibiotics, enzymes and proteins have been identified by functional analysis of metagenomic libraries, including turbomycin A and B, lipases, amylases, nucleases and hemolytic activities (Table 3) [39, 96, 100]. Metagenomics thereby has two main goals: identifying novel genes and increasing our understanding of microbial ecology [71]. This approach is very promising for novel biochemical and ecological discoveries, though development and improvement of new methodologies is essential for us to take advantage of these data.

Table 3 Functional genes of biotechnological importance identified by metagenomic approaches

Future perspectives

All of the approaches that are available today have advantages and limitations, and none of them provides complete access to the extremely important and complex bacterial world. These new techniques, which are in constant development, have provided powerful and important confirmation of previous phenotypic and genotypic studies of bacteria. The combination of different methods is still the most suitable way to gain a better understanding of the diversity, phylogeny, ecology, evolution, and taxonomy of the largest group of living organisms on Earth, the prokaryotes. Several questions remain to be resolved, and the collaboration of taxonomists, microbiologists, and molecular biologists is essential for integrating the different research methods and allowing a proper assessment of microbial diversity and its real potential.