1 Conception and Importance of Biodiversity

Conception

Biological diversity or biodiversity refers to the wide variety of living things on Earth, including the diversity of ecosystems, diversity of species and diversity of genes. Although studies of biodiversity have a long history, the term “biodiversity” was first used in 1986 by Walter G. Rosen at the National Forum on Biodiversity (de Andrade Franco 2013). Ecosystem diversity is the largest scale of biodiversity and describes variation among ecosystems on Earth, such as terrestrial, aquatic, agricultural and forestry ecosystems, in which organisms colonise and interact in trophic chains. The diversity of ecosystems can be measured in terms of variation in the complexity of communities, such as trophic levels, niche types/numbers, productivity and biotransformation efficiencies, etc., that depend on both species and genetic diversity (Ives and Carpenter 2009). Species diversity is related to the number of species represented in the ecosystems or communities and considers both species richness and relative abundance (species evenness) (Hill 1973). Gene or genetic diversity is usually applied to the biodiversity within species, relating to the total number of genetic characteristics in their chromosomes. This diversity allows microbial populations or species to adapt to different environments. A greater gene diversity in a population or species means the existence of more alleles, which offer the population or species a greater chance to adapt to variations in the environment and to maintain the population. It has been estimated that about 5.3 × 10³¹ megabases (Mb) of DNA exist on Earth (Landenmark et al. 2015), which form a huge gene pool for diverse metabolic pathways and for diversification of species.
In conclusion, biodiversity was defined by Wilson (1992) as “… all hereditarily based variation at all levels of organization, from the genes within a single local population, to the species composing all or part of a local community, and finally to the communities themselves that compose the living parts of the multifarious ecosystems of the world”.
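The distinction between species richness and evenness made above can be illustrated numerically. The following is a minimal sketch, assuming abundance counts per species are available; it computes Hill numbers (the effective number of species of order q, in the notation associated with Hill 1973) and Pielou's evenness index. The function names and the two toy communities are hypothetical and serve only as an illustration.

```python
import math

def hill_number(counts, q):
    """Effective number of species of order q for a list of abundances."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

def pielou_evenness(counts):
    """Shannon entropy divided by its maximum, ln(richness)."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        raise ValueError("evenness needs at least two species")
    return math.log(hill_number(counts, 1)) / math.log(richness)

# Two toy communities with the same richness (4) but different evenness
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]

for name, community in (("even", even), ("skewed", skewed)):
    print(name,
          "richness:", hill_number(community, 0),
          "order-2 diversity:", round(hill_number(community, 2), 3),
          "evenness:", round(pielou_evenness(community), 3))
```

Both communities have four species, but the skewed one behaves as if it contained little more than one species once relative abundance is taken into account.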

Importance

The importance of biodiversity can be considered from three aspects: ecological, economic and scientific. From the ecological aspect, biodiversity guarantees the wellness and equilibrium of ecosystems, since diverse organisms occupy different niches and functions and form trophic chains that drive the flux of energy and materials. The relationship between biodiversity and ecosystem function is often positive, but it can be inconsistent across scales and systems (Harrison et al. 2014), depending on the functional traits of the species in the ecosystem (Hooper et al. 2005). Usually, the more biological diversity present in an ecosystem, the more stable the ecosystem is; biodiversity is therefore the basis for long-term sustainability of ecosystems in the face of environmental changes. However, not all ecosystems with higher biodiversity are more stable. Biodiversity may affect the stability of ecosystems through the following mechanisms: (1) species respond to environmental fluctuations asynchronously; (2) species respond at different speeds; and (3) species diversity reduces the strength of competition (Loreau and de Mazancourt 2013). The economic importance of biodiversity arises because the diverse organisms are bioresources for food production, for useful enzyme and metabolite (antibiotics, etc.) production as well as for biotransformation of components (nitrogen fixation, bioremediation, etc.). From the scientific aspect, “biodiversity is inherently multidimensional, encompassing taxonomic, functional, phylogenetic, genetic, landscape and many other elements of variability of life on the Earth” (Naeem et al. 2016). Indeed, as a result of long-term evolution, each species is unique in nature and occupies its special position in phylogeny. In this case, the extinction of a species might mean a loss of irreplaceable genetic information and a breakdown of biological interaction in an ecosystem.
The development of genome analysis offers a powerful tool and abundant genetic information for reconstructing the evolution of organisms. A recent study estimated the number of genes that possibly existed in the last universal common ancestor of cellular organisms (LUCA, or the progenote), based on an investigation of all 6.1 million protein-coding genes from prokaryotic genomes, and identified 355 protein families in LUCA, which “depict LUCA as anaerobic, CO2-fixing, H2-dependent with a Wood–Ljungdahl pathway, N2-fixing and thermophilic” (Weiss et al. 2016). With the phylogenies of these 355 genes, clostridia and methanogens were recognised as basal among bacteria and archaea, respectively. “LUCA inhabited a geochemically active environment rich in H2, CO2 and iron” (Weiss et al. 2016). These data offered genomic evidence for the autotrophic origin of life in a hydrothermal environment.

2 Bacterial Diversity and Taxonomy

Bacterial Diversity

Bacteria, together with archaea, are prokaryotes with single cells, typically in the shape of a sphere, rod, vibrio or spiral and with a size in the order of micrometers. They exist in almost all environments, including water bodies, soils, deep surfaces and endosphere of macroorganisms, and they are the living things with the longest history on Earth. It is estimated that the unicellular bacteria and archaea occurred on Earth about 4 billion years ago, and they occupied the Earth for about 3 billion years as the dominant, even the only forms of life on Earth (DeLong and Pace 2001; Schopf 1994). To estimate the evolution and systematic relationships of macroorganisms, fossil data are essential (Benton 2015), but fossil records do not allow us to retrace the origin and evolution of bacteria (Dodd et al. 2017; Schmidt and Schäfer 2005) because of the tiny and simple forms of bacteria. However, the development of gene sequence analysis (both the sequencing techniques and computer estimation) opened the gate to reconstruct bacterial phylogeny and to estimate their evolutionary history (Brown and Doolittle 1997; Di Giulio 2003). Based on sequence analysis, it is estimated that bacteria and archaea split approximately 4 billion years ago (Battistuzzi et al. 2004).

As a result of their long-term evolution and diverse distribution in nature, it is believed that very diverse bacteria exist on Earth, with more than 10¹² species of microbes altogether (bacteria, archaea and microscopic eukaryotes) (Locey and Lennon 2016; Pike et al. 2018) and about 4–6 × 10³⁰ bacterial and archaeal cells (Whitman et al. 1998). The diverse and abundant bacteria play important roles in nature based upon their huge metabolic diversity. Some metabolic functions are only found in bacteria and archaea, such as chemoautotrophy (nitrification, sulphur oxidation, hydrogen bacteria), photoheterotrophy (purple and green non-sulphur bacteria), anoxygenic photosynthesis (purple and green sulphur bacteria), biological nitrogen fixation (diazotrophs such as Azotobacter, Rhizobium), anaerobic respiration (denitrification, sulphate reduction, arsenate reduction), methanogenesis, etc. These capabilities make prokaryotes a unique bioresource in biodegradation and biotransformation, especially in biodegradation of xenobiotic compounds like the polychlorinated biphenyls (Borja et al. 2005). In addition, some bacteria can also be pathogens of humans, animals and plants (Boyd et al. 2013; Vouga and Greub 2016). Based on their great economic and ecological importance, as well as their scientific value in the study of the origin and evolution of life (Errington 2013), taxonomy of bacteria or bacterial systematics is needed by both the scientific and wider human communities.

Bacterial Taxonomy

Among the three hierarchical levels of biodiversity, species diversity is the one most directly and closely related to taxonomy. Taxonomy is a classic and basic branch of science concerned with placing organisms into a ranking system and giving each taxon (taxonomic unit) a scientific name, which is used in communication for biological investigation, application and education. Therefore, bacterial taxonomy organises the organisms in a hierarchical system. Thanks to taxonomy, Escherichia coli has been used worldwide as the unique name for the same bacterial group for a hundred years, ever since it was reported (Castellani and Chalmers 1919).

Taxonomy of bacteria is the study to accommodate (group) the bacteria into a hierarchical system (classification) based on their similarities or relationships, to give the bacterial groups scientific names (nomenclature) according to the International Code of Nomenclature of Bacteria (Parker et al. 2019) and to clarify the taxonomic affiliation of newly isolated or detected bacteria (identification). The principal purpose of bacterial taxonomy is to give the study of bacteriology a comparable basis, e.g. to make sure that the same scientific name used in the world refers to the same bacterial identity (species). In bacterial classification, the system of Carl Linnaeus is used, in which species is the basic taxonomic unit (with a genus name and a specific epithet – the so-called binary nomenclature) to construct a hierarchy of ranks from lower to higher levels: genus, family, order, class, phylum and kingdom. Based upon the molecular phylogeny of ribosomal RNA (rRNA), all cellular organisms are classified into three kingdoms or domains: eukaryotes, bacteria and archaea (Woese et al. 1990). Besides the phylogenetic division, bacteria can be differentiated in phenotype from the other two domains: the absence of nuclei in cells and sensitivity to antibiotics distinguishes prokaryotes (bacteria and archaea) from eukaryotes, while the presence of peptidoglycan and ester composed of unbranched fatty acids and glycerol in the membrane differentiates bacteria from archaea, the latter having ethers formed by saturated isoprenoids and glycerol.

Currently, the bacterial taxa at genus and higher levels are mainly based on phylogenetic relationships estimated from sequencing of small-subunit ribosomal RNA (SSU rRNA) genes. A sequence similarity of 94.5% or lower is evidence for distinct genera (Yarza et al. 2008). Qin et al. (2014) proposed a “genus boundary for the prokaryotes based on genomic insights” and suggested the use of “percentage of conserved proteins (POCP) between two strains to estimate their evolutionary and phenotypic distance”. They suggested that “a prokaryotic genus can be defined as a group of species with all pairwise POCP values higher than 50%”.
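The POCP criterion quoted above can be sketched in a few lines. This is a minimal illustration, assuming the commonly cited form of the index, POCP = (C1 + C2) / (T1 + T2) × 100%, where C1 and C2 are the numbers of conserved proteins found in each of the two genomes and T1 and T2 are their total protein counts; how conserved proteins are detected (for example by protein alignment cut-offs) is outside this sketch. All function names and the example values are hypothetical.

```python
def pocp(conserved_1, conserved_2, total_1, total_2):
    """Percentage of conserved proteins between two genomes (assumed formula)."""
    if total_1 <= 0 or total_2 <= 0:
        raise ValueError("total protein counts must be positive")
    if not (0 <= conserved_1 <= total_1 and 0 <= conserved_2 <= total_2):
        raise ValueError("conserved counts must lie between 0 and the totals")
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)

def all_pairs_above(pairwise_pocp, threshold=50.0):
    """True if every pairwise POCP value is higher than the threshold."""
    return all(value > threshold for value in pairwise_pocp.values())

# Hypothetical pairwise values for three strains
pairwise = {("a", "b"): 62.0, ("a", "c"): 55.1, ("b", "c"): 49.9}
print(pocp(2000, 2100, 4000, 4200))   # 50.0
print(all_pairs_above(pairwise))      # False: pair (b, c) falls below 50%
```

Because the genus definition requires all pairwise values to exceed the boundary, a single pair below 50% is enough to exclude the group from forming one genus under this criterion.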

With the development of culture-independent methods, such as deep sequencing of environmental samples and single-cell genomics, and their application in research of bacterial diversity, the number of bacterial phyla has expanded from 6 (Woese 1987) to more than 70 based on culture-independent metagenomic analysis (Pace 2009; Yarza et al. 2014) and then 99 based upon the phylogenetic analysis of 94,759 bacterial genomes (Parks et al. 2018). Further novel phyla are still reported as a result of analyses of metagenomes (Brown et al. 2015; Eloe-Fadrosh et al. 2016) (Fig. 2.1).

Fig. 2.1 Maximum likelihood phylogeny of bacterial phyla based on concatenation of 56 conserved marker proteins, showing the novel phylum “Ca. Kryptonia”. Scale bar, 10% of the protein sequences. [Deduced from Eloe-Fadrosh et al. (2016)]

Concept of Bacterial Species

Species is the basic unit in taxonomy. In contrast to the definition of animal and plant species, interspecific reproductive isolation has not usually been a criterion to separate bacterial species from each other. In Bergey’s Manual of Systematic Bacteriology, a bacterial species was defined as “a distinct group of strains that have certain distinguishing features and that generally bear a close resemblance to one another in the more essential features of organization” (Brenner et al. 2005). Other definitions of bacterial species have also been given in several publications (Doolittle and Zhaxybayeva 2009; Konstantinidis et al. 2006; Rosselló-Mora and Amann 2001; Staley 2006). In general, only cultured bacteria can be described as species. Until recently, the bacterial species definition has been based on polyphasic analysis (Gillis et al. 2015; Prakash et al. 2007; Vandamme et al. 1996), including phenotypic characterisation and taxonomy (Sneath 1995; Willcox et al. 1980), chemical taxonomy (Brondz and Olsen 1986), multilocus sequence analysis (MLSA) of housekeeping genes (Glaeser and Kämpfer 2015) and genomic analyses such as average nucleotide sequence identity (ANI) or average amino acid sequence identity (AAI) (Garrity 2016; Thompson et al. 2013). Among these analyses, some thresholds for the species definition have been suggested: 80% phenotypic similarity (Austin et al. 1978), 70% DNA-DNA relatedness (Wayne et al. 1987), 97% sequence similarity of 16S rRNA gene (Vandamme et al. 1996), 96–97% similarity in MLSA (Glaeser and Kämpfer 2015) and 95–96% of ANI (Goris et al. 2007; Richter and Rosselló-Móra 2009; Tindall et al. 2010).
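The thresholds listed above lend themselves to a simple lookup. The sketch below, offered only as an illustration, stores the quoted values and reports, for a pair of strains, which of the supplied measurements reach the corresponding threshold. Where the text gives a range (96–97% for MLSA, 95–96% for ANI) the lower end is used, which is an assumption of this sketch; the dictionary keys, the function name and the example values are hypothetical.

```python
# Species thresholds quoted in the text (percent). For ranges, the lower
# end is used here; that choice is an assumption of this sketch.
SPECIES_THRESHOLDS = {
    "phenotypic_similarity": 80.0,
    "dna_dna_relatedness": 70.0,
    "16s_rrna_similarity": 97.0,
    "mlsa_similarity": 96.0,
    "ani": 95.0,
}

def compare_to_thresholds(measurements):
    """Map each supplied measurement to True if it reaches its threshold."""
    unknown = set(measurements) - set(SPECIES_THRESHOLDS)
    if unknown:
        raise KeyError(f"no threshold known for: {sorted(unknown)}")
    return {name: value >= SPECIES_THRESHOLDS[name]
            for name, value in measurements.items()}

# Hypothetical pair of strains with near-identical 16S rRNA genes
pair = {"16s_rrna_similarity": 99.8, "dna_dna_relatedness": 52.0, "ani": 91.3}
result = compare_to_thresholds(pair)
print(result)
```

The example pair passes the 16S rRNA criterion but fails the genome-level ones, which is why a polyphasic approach weighs several lines of evidence rather than relying on a single value.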

The exact features or methods in the polyphasic approach may vary in the taxonomy of different bacterial groups. In taxonomy of the genus Streptomyces and related bacteria, the polyphasic approach included numerical taxonomy based on a large number of phenotypic traits and chemotaxonomy based on analyses of fatty acids; whole-cell analysis with Curie-point pyrolysis mass spectrometry (PyMS); biochemical (enzymatic) analyses; serotype, phage type and protein profiling; genomic analyses for DNA-DNA hybridisation; low-frequency restriction fragment analysis (LFRFA) of total chromosomal DNA and randomly amplified polymorphic DNA (RAPD) PCR assays; and phylogenetic (nucleic acid sequence) comparisons like rRNAs, elongation factors and ATPase subunits (Anderson and Wellington 2001). For polyphasic taxonomy of genus Shewanella, growth features, haemolysis, tolerance to NaCl in different concentrations, iron reduction, anaerobic respiration with nitrate, etc. were included in phenotypic characterisation; fatty acid analysis and quinone analysis were used for chemical taxonomy; sequence analyses of 16S rRNA and gyrB were performed for phylogenetic study, while DNA-DNA hybridisation was applied for genomic characterisation (Venkateswaran et al. 1999).

With the development of molecular biology, especially the application of metagenomic analysis in the studies on biodiversity, more than three million 16S rRNA gene sequences have accumulated in databases (Quast et al. 2013), and many distinct 16S rRNA gene sequences that share sequence similarities less than 97% with those of the defined species have been detected from environmental samples, either by cloning-sequencing procedures or by the high-throughput sequencing methods. These 16S rRNA sequences undoubtedly represent different bacterial species, but the related bacteria have not been cultured and isolated. Therefore, the concept of genomic-phylogenetic species (GPS) has been suggested for the taxonomy of prokaryotes (Staley 2006), which refers to the bacterial group represented only by the 16S rRNA sequences or genome sharing similarities ≥97%. Indeed, this concept is also applicable for isolates, before distinctive features are found. In taxonomy, species that have not been cultured, but for which there is evidence based on sequences and other observations, can be described as Candidatus (Murray and Schleifer 1994).
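Grouping sequences at a ≥97% similarity cut-off, as in the genomic-phylogenetic species concept described above, can be sketched with a greedy procedure. This is a simplified illustration and not the method of any particular tool: it assumes the 16S rRNA sequences are already aligned to equal length, computes identity over columns where neither sequence has a gap, and assigns each sequence to the first cluster whose seed it matches at or above the threshold. The function names and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Identity (%) over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

def greedy_clusters(sequences, threshold=97.0):
    """Assign each sequence to the first cluster whose seed it matches."""
    clusters = []  # each cluster is a list of names; the first is the seed
    for name, seq in sequences.items():
        for cluster in clusters:
            if percent_identity(seq, sequences[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Toy aligned sequences of length 100
ref = "ACGT" * 25
toy = {
    "a": ref,
    "b": "TT" + ref[2:],        # 2 mismatches against "a"
    "c": "T" * 10 + ref[10:],   # 8 mismatches against "a"
}
print(greedy_clusters(toy))
```

Greedy seeding depends on input order, so real analyses usually sort sequences (for example by abundance or length) before clustering; that step is omitted here.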

To date, about 5 × 10⁴ cultured bacterial species, 10⁵ potential species (without cultured strains) represented only by 16S rRNA sequences and 10⁷ phylogenetic species detected by high-throughput sequencing of metagenomic DNA isolated from different sites have been reported (Locey and Lennon 2016). Given the enormous estimated number of bacterial species (10¹²) in the world (Locey and Lennon 2016), as well as the existence of primer specificity-related “blind spots” (Evguenieva-Hackenberg 2005) in high-throughput sequencing, the diversity of bacterial species is far from sufficiently explored.

3 History of Studies on Rhizobial Diversity and Taxonomy

The diversity and taxonomy of rhizobia have been studied since root nodule bacteria were first isolated 100 years ago. Based upon biogeographic and genetic studies, we can deduce that rhizobial diversity depends on four factors: their long evolutionary history, environmental selection for their survival (chromosome genes), host selection for nodulation (symbiosis genes) and symbiosis gene lateral transfer (creating novel combinations of chromosome and symbiosis genes). As a functional group, rhizobia present great diversity at the species level, with about 100 species within many genera.

As reviewed previously (Parker 2001; Willems 2006), the nodules on roots of legumes were recognised by Malpighi in 1675 on common bean (Phaseolus vulgaris) and on faba bean (Vicia faba); the ability of these nodules to fix atmospheric nitrogen was demonstrated by Hellriegel and Wilfarth in 1888. In the same year, Beijerinck isolated for the first time the rod bacteria from root nodules of pea plants (Pisum sativum) and named them Bacillus radicicola. Furthermore, he related them to the nitrogen fixation process. In 1889, Frank suggested the genus name Rhizobium to accommodate the root nodule bacteria and described the only species Rhizobium leguminosarum. After that, the name Rhizobium has been used up to the present time (http://doi.namesforlife.com/10.1601/nm.1280), although knowledge of the diversity and taxonomy of rhizobia has developed dramatically.

3.1 Cross-Nodulation Groups and the Early Definition of Rhizobium Species

The enormous agricultural and economic value of rhizobia made their symbiotic properties important in the development of rhizobial taxonomy and diversity studies. Therefore, host specificity was given a lot of weight in the early studies on rhizobia, which led to rhizobial species being defined on the basis of cross-nodulation groups for about 80 years.

In the beginning of the twentieth century, nodulation of diverse leguminous species was extensively studied, and the specificity of the symbiosis between the rhizobial isolates and the host plants was recognised. Based upon the specificity, cross-nodulation groups were described for the rhizobia isolated from a spectrum of leguminous species, in which the plants can share their microsymbionts for nodulation. Then six main cross-nodulation groups were defined as six species within the genus Rhizobium, and some other strains in the cowpea miscellaneous group were named as Rhizobium spp. Strains of R. leguminosarum, R. phaseoli, R. trifolii and R. meliloti grew fast and produced acid on YMA medium, while strains of R. lupini, R. japonicum and those in the cowpea miscellaneous group (Rhizobium spp.) grew slowly and produced alkali on YMA (Table 2.1) (Fred et al. 1932).

Table 2.1 Rhizobium species defined based upon the cross-nodulation groups (Fred et al. 1932)

After its establishment, the species definition based on cross-nodulation groups was frequently thrown into doubt by the results of subsequent studies, such as many nodulation cases that crossed the boundary of cross-nodulation groups (Wilson 1944) and the high phenotypic similarities among the strains of different cross-nodulation groups (Graham 1964). Although the concept of cross-nodulation group as a basis for species has been abandoned, the specificity between rhizobial strains and leguminous species or preference of legume species for some rhizobial species is still an important feature for rhizobial application or inoculation practice.

3.2 Rhizobial Classification by Numerical Taxonomy

In the 1960s, numerical taxonomy was introduced into the taxonomy of rhizobia (Graham 1964). This is a technique that uses computers to compare a substantial number of phenotypic characters, covering morphology, growth conditions (pH, temperature and salinity ranges), spectra of C and N sources, metabolic features (acid/alkali production, respiration/fermentation) and resistance to antibiotics and other chemicals (Graham 1964; Moffet and Colwell 1968; ’t Mannetje 1967). During the 1960s–1990s, numerical taxonomy based on phenotypic features played a key role in improving the Rhizobium classification. In the same period, serological analyses (Graham 1963; Vincent and Humphrey 1970) and DNA G+C mol% (De Ley and Rassel 1965) were also used for rhizobial classification.
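The core computation of numerical taxonomy can be sketched as follows. The example assumes binary (present/absent) phenotypic characters, uses the simple matching coefficient as the similarity measure and groups strains into phena by single linkage at a fixed cut-off. The source does not specify which coefficient or clustering method the cited studies used, so these choices, the 80% cut-off, the function names and the toy profiles are all assumptions made for illustration.

```python
from itertools import combinations

def simple_matching(a, b):
    """Fraction of characters with the same state in both strains."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same characters")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def phena(profiles, threshold=0.8):
    """Single-linkage groups of strains with similarity >= threshold."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if simple_matching(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Toy profiles: 1 = positive, 0 = negative for ten phenotypic tests
profiles = {
    "s1": [1, 1, 1, 0, 0, 1, 0, 1, 1, 0],
    "s2": [1, 1, 1, 0, 0, 1, 0, 1, 1, 1],   # differs from s1 in one test
    "s3": [0, 0, 0, 1, 1, 0, 1, 0, 0, 1],   # opposite of s1 in every test
}
print(phena(profiles))
```

Published studies typically score a hundred or more characters and build a dendrogram (for example by average linkage) rather than applying a single cut-off, but the similarity matrix underlying both approaches is the same.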

The earlier numerical taxonomy studies revealed that (1) strains in the three cross-nodulation groups of peas, beans and clovers (R. leguminosarum, R. phaseoli, R. trifolii) might be the same species since they were grouped in a single phenon (phenotypic group); (2) R. japonicum and R. lupini might be the same species; and (3) the fast-growing rhizobia (R. leguminosarum, R. phaseoli, R. trifolii, R. meliloti) were more related to Agrobacterium than to the slow-growing rhizobia (R. japonicum, R. lupini, cowpea miscellaneous group). These findings were considered by Jordan and Allen (1974), who proposed the Family Rhizobiaceae consisting of the genera Agrobacterium, Chromobacterium and Rhizobium, while the six Rhizobium species were maintained and divided into the fast and slow groups. Later, the slow-growing rhizobia were transferred into a novel genus, Bradyrhizobium, by combining the results of numerical taxonomy with those of other analyses including DNA and rRNA (Jordan 1982).

In the 1980s–1990s, numerical taxonomy, alone or together with other methods, was widely applied for investigating phenotypic similarities among rhizobial strains, which greatly enlarged the diversity of rhizobia and led to the description of a third rhizobial genus, Sinorhizobium, for fast-growing soybean rhizobia (Chen et al. 1988). In addition, several numerical taxonomy studies alone defined a unique phenon for the photosynthetic rhizobia nodulating Aeschynomene species (Ladha and So 1994), two unique phenotypic groups for Rhizobium strains from legumes of the temperate zone (Novikova et al. 1994) and Bradyrhizobium members for Sarothamnus scoparius rhizobia (Sajnaga and Malek 2001). Subsequently, some of the phena resulting from numerical taxonomy were confirmed by polyphasic analyses, including 16S rRNA gene sequence, such as the emendation of genus Sinorhizobium (De Lajudie et al. 1994) and the description of Rhizobium hainanense (Gao et al. 1994; Chen et al. 1997). However, the results of numerical taxonomy do not have phylogenetic significance, i.e. the phena defined in numerical taxonomy do not reflect evolutionary relationships among the rhizobia. For example, the photosynthetic rhizobia nodulating Aeschynomene species were identified as members of the defined Bradyrhizobium genus (Molouba et al. 1999), although they formed a unique group (phenon) in numerical taxonomy (Ladha and So 1994).

In addition to the phenotypic features mentioned above, the method of numerical taxonomy was also applied to data of qualitative coding of immunodiffusion reactions (Dudman and Belbin 1988), cross-nodulation patterns of legumes and Rhizobium (Lieberman et al. 1985) and chemical taxonomy such as cellular fatty acids (Dunfield et al. 2001). This method is also a basis for grouping biochemical and genetic fingerprint data, such as the patterns of multilocus enzyme electrophoresis (MLEE) (Wang et al. 1998, 1999), SDS-PAGE of total cellular proteins (Dupuy et al. 1994; Doignon-Bourcier et al. 1999), PCR-based restriction fragment length polymorphism (RFLP) of 16S rRNA genes (Wang et al. 1998, 1999) and IGS genes (Laguerre et al. 1996), BOX- or rep-PCR, ERIC-PCR (De Bruijn 1992; Laguerre et al. 1996), random amplified polymorphic DNA (RAPD) (Dooley et al. 1993; Harrison et al. 1992), amplified fragment length polymorphism (AFLP) (Terefework et al. 2001) and so on.

3.3 DNA and Phylogenetic Analyses

After the 1970s, many DNA sequence analyses were introduced into taxonomic studies of rhizobia. These analyses were used for estimating genetic and phylogenetic relationships at the genetic level, the species level and the genus and higher taxonomic levels. To define the genetic diversity among the strains within a species, DNA fingerprinting methods like BOX- or rep-PCR, ERIC-PCR (De Bruijn 1992; Laguerre et al. 1996) and RAPD (Dooley et al. 1993; Harrison et al. 1992) have been widely used, and BOX-PCR is still in use because of the reproducibility of its PCR patterns and its sensitivity in distinguishing closely related strains (Chidebe et al. 2018; Dall’Agnol et al. 2014).

For species definition, DNA-DNA hybridisation (Rosselló-Mora 2006), PCR-based RFLP of 16S rRNA genes (Wang et al. 1998, 1999), PCR-based RFLP of 16S–23S rRNA IGS (Laguerre et al. 1996) and AFLP (Terefework et al. 2001) were used. Among these methods, AFLP was not so popular since it is a complicated and labour-consuming procedure. With the development of DNA sequencing techniques, sequencing has become more convenient, economic and rapid than the PCR-based RFLP of 16S rRNA gene and IGS. In addition, sequence data are available for repeated use, so the RFLP analyses have been almost replaced by sequencing analyses of the corresponding DNA fragments. Sequence similarity of 97% for the 16S rRNA gene has been used as the threshold for species definition (Wayne et al. 1987), and this threshold was later modified to 98.7% (Stackebrandt and Ebers 2006). However, rhizobial strains sharing very high or even identical 16S rRNA gene sequences have been divided into distinct species (Román-Ponce et al. 2016; Yan et al. 2017). DNA-DNA hybridisation was formerly used as the gold standard for the definition of bacterial species, and 70% relatedness was used as the species threshold (Graham et al. 1991), but it has been made unnecessary, firstly by the sequencing and phylogenetic analysis of housekeeping genes, the so-called multilocus sequence analysis (MLSA) (Gaunt et al. 2001; Martens et al. 2008), and more recently by genome sequence analyses such as average nucleotide identity (ANI) (Ormeño-Orrillo et al. 2015; Zhang et al. 2014) or in silico DNA-DNA hybridisation based on genome sequences (Meier-Kolthoff et al. 2013; Ormeño-Orrillo et al. 2015).

For the definition of genera and higher taxonomic levels, DNA-rRNA hybridisation (De Smedt and De Ley 1977; Wang 1990), rRNA catalogues (Hennecke et al. 1990; Jarvis et al. 1986), 16S ribosomal DNA restriction fragment length polymorphism (Molouba et al. 1999) and rRNA gene sequencing (So et al. 1994) were used. Currently, the phylogeny of 16S rRNA gene is widely used as the primary criterion for definition of rhizobial species, genera (Graham et al. 1991) and higher taxonomic levels (Yanagi and Yamasato 1993). In general, 95% identity of the 16S rRNA gene sequence has been used as the threshold for a bacterial genus (Rossi-Tamisier et al. 2015). Based upon the data of 16S rRNA gene phylogeny, Allorhizobium (de Lajudie et al. 1998), Azorhizobium (Dreyfus et al. 1988), Bradyrhizobium (Jordan 1982), Mesorhizobium (Jarvis et al. 1997) and Sinorhizobium (Chen et al. 1988; de Lajudie et al. 1994) were reported or emended as novel genera for some new strains or for species previously described within the genus Rhizobium.

These genera were further transferred into different families: the Bradyrhizobium branch located in family Bradyrhizobiaceae, the Rhizobium-Sinorhizobium-Allorhizobium branch affiliated in Rhizobiaceae, the Mesorhizobium branch classified as a member of Phyllobacteriaceae and the Azorhizobium branch in Xanthobacteraceae (Sy et al. 2001; van Berkum and Eardly 1998). Sequence similarities of 16S rRNA genes greater than 92% were detected among the genera within the family Rhizobiaceae (Yanagi and Yamasato 1993); therefore, 92% similarity of 16S rRNA gene sequence might be a reference threshold for family definition.

Recently, the species within the genus Rhizobium were further reclassified based upon the phylogeny of 16S rRNA gene and MLSA (Mousavi et al. 2014, 2015), which led to the proposal of Neorhizobium and Pararhizobium, as well as the emendation of Agrobacterium and Allorhizobium. However, the existence of several lineages without affiliation to the defined genera implies the necessity for further improvement of the taxonomy of rhizobia.

3.4 Rhizobial Taxonomy in Genome Era

Until recently, the definition of rhizobial species was mainly based on the polyphasic approach, covering the estimates of evolutionary relationships from the gene sequence data (16S rRNA gene and housekeeping genes), chemotaxonomic, physiological and cultural features. However, the development of whole-genome sequencing has opened a new era of bacterial taxonomy in general (Coenye et al. 2005; Thompson et al. 2013), as well as specifically for rhizobial taxonomy (Tong et al. 2018; Wang et al. 2016). Thompson et al. (2013) suggested a unified species definition based on genomics: “strains from the same microbial species share >95% Average Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI), >95% identity based on multiple alignment genes, <10 in Karlin genomic signature, and >70% in silico Genome-to-Genome Hybridization similarity (GGDH)”. With these values, it is convenient to define genomic groups or species; when distinctive phenotypic features are found, the genomic group could be described as a species. In addition, Qin et al. (2014) found that two species within a genus shared a pairwise percentage of conserved proteins (POCP) higher than 50%, which could be used as the boundary of genus. Therefore, the integration of whole-genome data into the taxonomy of rhizobia might also help the genus definition and delimitation of these symbiotic bacteria. In fact, genome analysis has already been involved in the taxonomy of rhizobia, and it is now recommended as the primary approach (de Lajudie et al. 2019). Wang et al. (2016) reported that both ANI and core-genome phylogenetic trees revealed similar relationships among rhizobial strains. In many publications, the standard ANI value of 95–96% has been applied as the species threshold (Richter and Rosselló-Móra 2009), while 75% and 70% could be the thresholds for genus and family, respectively (Wang et al. 2016).
With analysis of genome data of 45 strains representing the genera Agrobacterium, Allorhizobium, Bradyrhizobium and Sinorhizobium, 24 defined species and a putative novel genus represented by Agrobacterium albertimagni AOL15 were distinguished (Wang et al. 2016).

It is worth noting that ANI and/or POCP values between rhizobial genera or species greater than the suggested thresholds for genus (75% ANI, 50% POCP) and species (95% ANI) have been detected. For example, ANI values >75% were reported between Pararhizobium and Sinorhizobium (75.3–77.9%), or between Neorhizobium and Ensifer (72.7–75.2%); while 74.5–81.5% ANI among the species in Allorhizobium and 48.2–86.9% POCP among the species in Rhizobium were observed (our unpublished study). The existence of values crossing the thresholds demonstrates that both ANI and POCP thresholds have their limits in the taxonomy of rhizobia, as do the other methods used in bacterial taxonomy. So, genome analysis may replace some other genomic analyses, such as DNA-DNA hybridisation and G + C mol% determination, but it is still one method within polyphasic taxonomy.
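The ANI thresholds discussed above (95% for species, 75% for genus and 70% for family) can be written as a simple decision rule. The sketch below is an illustration only: it treats each threshold as an inclusive lower bound, which is an assumption, and the function name and return labels are hypothetical.

```python
# ANI thresholds (percent) as quoted in the text, from the narrowest rank
# to the broadest; treating them as inclusive bounds is an assumption.
ANI_THRESHOLDS = (("species", 95.0), ("genus", 75.0), ("family", 70.0))

def narrowest_shared_rank(ani):
    """Return the narrowest rank whose ANI threshold the value reaches."""
    if not 0.0 <= ani <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    for rank, threshold in ANI_THRESHOLDS:
        if ani >= threshold:
            return rank
    return "none"

for value in (96.2, 77.9, 72.7, 65.0):
    print(value, narrowest_shared_rank(value))
```

Applied to the inter-genus value of 77.9% reported above, this rule would place two strains from different genera in the same genus, which shows concretely why such thresholds serve as guides within a polyphasic framework rather than as decisive criteria.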

In addition, genome data also make it possible to predict phenotypes of the strains; for example, arsenite-oxidising and antimonite-tolerance genes were detected in ten strains of Agrobacterium radiobacter and of two distinctive Sinorhizobium genomic species. In another study, Tong et al. (2018) investigated the species diversity of bean and clover rhizobia by comparative genome sequence analysis. In this study, 28 clusters were defined among 69 Rhizobium strains based on genome ANI, digital DNA-DNA hybridisation and phylogenetic analysis of 1458 single-copy core genes. The grouping results in this study were consistent with the species affiliation based on MLSA of atpD, glnII and recA. Therefore, MLSA could be a cheaper and more rapid method for grouping rhizobial strains at species level, and genome analysis of at least the type strains of each species could be used for genus definition, as revealed by both Tong et al. (2018) and Wang et al. (2016).