Introduction

Many bacteria collectively known as “rhizobia” form symbiotic associations with legumes, establishing the key process of biological nitrogen (N2) fixation, which is responsible for the wide adoption of legumes as food crops, forages, green manures and in forestry (Allen and Allen 1981; Polhill and Raven 1981). An impressive number of studies on biological N2 fixation were performed in the 1970s, but the topic received comparatively little attention in the following decades. Interest in the process has recently increased and should continue to grow in the coming years, owing to higher costs of N fertilizers and concerns about environmental pollution (Binde et al. 2009).

Advances in the development of molecular tools have greatly contributed to improving knowledge about the symbioses between rhizobia and legumes. The profound changes in the taxonomy of rhizobia represent a good example. In 1984, rhizobial strains were classified in only two genera of the family Rhizobiaceae (Jordan 1984); today, based on polyphasic analyses of phenetic and genetic properties, rhizobia are categorized in five genera of the order Rhizobiales: Azorhizobium, Bradyrhizobium, Mesorhizobium, Sinorhizobium and Rhizobium (Garrity and Holt 2001; Lloret and Martinez-Romero 2005; Willems 2006). In addition, as Rhizobium, Agrobacterium and Allorhizobium are closely related, they were joined into the genus Rhizobium (Young et al. 2001); furthermore, Sinorhizobium was reclassified into the genus Ensifer (Young 2003), but this reclassification is still under debate (Lindström and Young 2009). Finally, new diazotrophic symbiotic bacteria have been described and classified in non-traditional rhizobial genera, belonging either to the Alphaproteobacteria (Methylobacterium, Devosia) or to the Betaproteobacteria (Burkholderia, Cupriavidus, formerly Ralstonia and Wautersia) (Garrity and Holt 2001; Lloret and Martinez-Romero 2005; Willems 2006).

Ribosomal sequences, with an emphasis on the 16S rRNA genes, have become the method of choice in molecular taxonomy for tracing bacterial phylogenies (e.g. Woese 1987; Woese et al. 1990; Weisburg et al. 1991; Garrity and Holt 2001). However, although precise for the definition of kingdoms and genera, 16S rRNA provides poor resolution at the species and subspecies levels (Woese 1987; Garrity and Holt 2001). The shortcomings reside mainly in the high level of conservation documented in the 16S rRNA (e.g. Gevers et al. 2005), but concerns were also raised after reports that genetic recombination and horizontal gene transfer may occur among 16S rRNA genes (van Berkum et al. 2003; Gevers et al. 2005). On the basis of these observations, and to minimize their effects, other genes that evolve faster than the 16S rRNA but are conserved enough to retain genetic information have been proposed as alternative phylogenetic markers (Stackebrandt et al. 2002; Stepkowski et al. 2003; Martens et al. 2007; Alexandre et al. 2008).

Other ribosomal sequences, such as the 23S rRNA gene and the ITS, improve species definition (e.g. Tesfaye et al. 1997; Vinuesa et al. 1998; van Berkum and Fuhrmann 2000; Willems et al. 2001; Germano et al. 2006; Menna et al. 2009). However, as they are located in the same operon as the 16S rRNA, the limitations due to horizontal gene transfer remain. Another strategy relies on a complementary analysis of housekeeping genes that are broadly distributed among taxa, present in single copies and dispersed throughout the genome (Stackebrandt et al. 2002; Zeigler 2003; Gevers et al. 2005). In phylogenetic studies of bacteria belonging to the order Rhizobiales, housekeeping genes used in this approach include atpD, dnaK, dnaJ, gap, glnA, glnII, gltA, gyrB, pnp, recA, rpoA, rpoB and thrC (e.g. Turner and Young 2000; Gaunt et al. 2001; Alexandre et al. 2008; Martens et al. 2008; Vinuesa et al. 2008; Ribeiro et al. 2009; Rivas et al. 2009; Menna et al. 2009). However, one problem with those studies has been that different sets of genes and primers are employed for each genus.

Biodiversity in the tropics is a valuable but still poorly studied resource, and even though biological N2 fixation is a key process for soil sustainability, the genetic diversity of diazotrophic symbiotic bacteria has been scarcely investigated. In this study we used the glnII gene, in addition to the 16S rRNA gene, to improve knowledge of the taxonomy and phylogenetic relationships of elite rhizobial strains previously studied by our group (Menna et al. 2006; Binde et al. 2009) and relevant for their economic importance as commercial inoculants in Brazil. The glnII gene was chosen after amplification tests with other housekeeping genes across a variety of rhizobial species, as we were searching for a gene that would be easily amplified in all species.

Materials and methods

Strains

Twenty-three strains from the Brazilian “Rhizobium Culture Collection SEMIA” (Seção de Microbiologia Agrícola) (IBP World Catalogue of Rhizobium Collections—SEMIA; WFCC—World Federation for Culture Collections # 443) were selected from previous studies (Menna et al. 2006; Binde et al. 2009). Table 1 provides information of the strains, as well as of the host plants from which they were isolated and for which they are recommended as inoculants. Strains were provided by FEPAGRO (Fundação Estadual de Pesquisa Agropecuária, Porto Alegre, Rio Grande do Sul, Brazil), and their purity was verified on yeast extract-mannitol agar (YMA) medium (Vincent 1970) containing Congo red (0.00125%). Stocks were prepared on YMA and kept at −70°C (under 30% glycerol) for long-term storage and at 4°C as source cultures. Strains are also currently maintained at the “Diazotrophic and Plant Growth Promoting Bacteria Culture Collection” of Embrapa Soja.

Table 1 Information about the strains used in this study and their host legumes

DNA extraction, amplification and sequencing

Total genomic DNA of each strain was extracted from bacterial batch cultures grown in YM broth until late exponential phase (10^9 cells mL^−1). Extraction, purification and maintenance of the DNA were performed as described before (Menna et al. 2006).

To obtain the complete sequence of the 16S rRNA gene, reactions with five pairs of primers were carried out, as described before (Menna et al. 2006), and resulted in readings of about 1,422 bp. For the glnII gene, two reactions were performed, using the primers TSglnIIf (5′-AAGCTCGAGTACATCTGGCTCGACGG-3′) and TSglnIIr (5′-SGAGCCGTTCCAGTCGGTGTCG-3′), under the amplification conditions described by Stepkowski et al. (2005), resulting in fragments of about 600 bp. All PCR reactions were carried out on an MJ Research Inc. PTC 200 thermocycler. The PCR products were purified with the PureLink™ PCR Purification kit (Invitrogen), according to the manufacturer’s instructions. Sample concentration was verified by electrophoresis of 2 μL of PCR product on a 1% agarose gel stained with ethidium bromide.
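The reverse primer TSglnIIr starts with the IUPAC ambiguity code S (C or G), so it denotes a mixture of two oligonucleotides. The following minimal Python sketch expands a degenerate primer into its concrete sequences; the function name `expand_primer` and the lookup table are illustrative helpers, not part of the original protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a (possibly degenerate) primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


# Primer sequences as given in the text (5' to 3').
TS_GLNII_F = "AAGCTCGAGTACATCTGGCTCGACGG"
TS_GLNII_R = "SGAGCCGTTCCAGTCGGTGTCG"

forward_variants = expand_primer(TS_GLNII_F)  # no ambiguity codes: one sequence
reverse_variants = expand_primer(TS_GLNII_R)  # leading S: two sequences

for variant in reverse_variants:
    print(variant)
```

The forward primer contains no ambiguity codes and expands to itself, while the reverse primer expands to two 22-mers that differ only at the first position.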

The sequencing reactions were carried out in ninety-six-well full-skirt PCR microplates. Purified PCR products of each bacterial culture (80 ng per reaction) received a mixture of 3 μL of dye (DYEnamic ET terminator reagent premix for the MegaBACE, Amersham Biosciences) and 3 pmol of each primer. The same program was used for all primers, as follows: denaturation at 95°C for 2 min; thirty cycles of denaturation at 95°C for 10 s, annealing at 50°C for 4 s, and extension at 60°C for 4 min; final soak at 4°C. The sequencing was performed on a MegaBACE 1000 (Amersham Biosciences) capillary sequencer, according to the manufacturer’s instructions.

High-quality sequences obtained for each strain were assembled into contigs using the programs phred (Ewing et al. 1998), phrap version 0.990722 (www.phrap.org) and Consed (Gordon et al. 1998). Sequences confirmed in the 3′ and 5′ directions were submitted to the GenBank database (http://www.ncbi.nlm.nih.gov/blast) to seek significant alignments. For the 16S rRNA, sequences were in full agreement with those deposited by Menna et al. (2006) and Binde et al. (2009). Accession numbers obtained for the glnII genes are listed in Table 2.

Table 2 Information about the gene sequences of the strains used in this study

Phylogenetic data analysis

Multiple alignments for each gene were performed with ClustalX version 1.83 (Thompson et al. 1997). Sequences of type/reference strains were included in the analyses and the accession numbers of the GenBank/EMBL/DDBJ Data Libraries are listed in parentheses for the 16S rRNA and glnII genes, respectively, as follows: Mesorhizobium amorphae strain ACCC 19665T (DQ02832, EU518372); M. tianshanense USDA 3592T (AF041447, AF169579); M. loti USDA 3451T (X67229, not available, meaning that no sequence is available in the GenBank database); M. ciceri USDA 3383T (U07934, AF169580); Rhizobium tropici CIAT 899T (U89832, EU488791); R. etli CFN 42T (U28916, NC007761); R. leguminosarum USDA 2370T (U29386, EU155089); R. mongolense USDA 1844T (U89817, AY929453); R. rhizogenes ATCC 11325T (AY945955, not available); R. lusitanum P1-7T (AY738130, AY738130); Bradyrhizobium elkanii USDA 76T (U35000, AY599117.1); B. betae PL7HGT (AY372184, AB353733); B. yuanmingense CCBAU 10071T (AF193818, AY386780); B. canariense BCC2T (AY577427, AY386762.1); B. japonicum USDA 6T (U69638, AF169582); Methylobacterium nodulans ORS 2060T (AF220763, not available). Caulobacter crescentus strain CB15 (AE005673) was used as the outgroup.

Phylogenetic trees were generated using MEGA version 4.0 (Kumar et al. 2004) with default parameters, the K2P distance model (Kimura 1980) and the Neighbor-Joining algorithm (Saitou and Nei 1987). Statistical support for tree nodes was evaluated by bootstrap analyses (Felsenstein 1985) with 1,000 replicates.
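For readers unfamiliar with the K2P (Kimura two-parameter) model, the distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The sketch below computes this distance in plain Python. It is an illustration only, not MEGA's implementation; skipping sites with gaps or ambiguous bases pair by pair is an assumption made here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Assumption: sites where either sequence has a gap or a non-ACGT
    character are ignored for this pair.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term)


# Toy example: one transition (A->G) in ten compared sites.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
print(round(d, 4))
```

A matrix of such pairwise distances is what a Neighbor-Joining implementation takes as input; bootstrap support is then obtained by resampling alignment columns with replacement and rebuilding the tree for each replicate.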

To determine differences in the number of nucleotides between pairs of strains, sequences were aligned with the ClustalW program (Thompson et al. 1994), version ClustalW2 (http://www.ebi.ac.uk/Tools/clustalw2/index.html).
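Once two sequences have been aligned, the number and percentage of differing nucleotides can be counted column by column. The following sketch shows one way to do this. The function name and the `count_gaps` option are illustrative assumptions, because the text does not state how gapped positions were treated.

```python
def nucleotide_differences(aligned_a: str, aligned_b: str, count_gaps: bool = False):
    """Return (n_different, n_compared, percent_different) for an aligned pair.

    Assumption: by default, columns where either sequence has a gap ('-')
    are excluded; with count_gaps=True a gap opposite a base is a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    different = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        if (a == "-" or b == "-") and not count_gaps:
            continue
        compared += 1
        if a != b:
            different += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return different, compared, 100.0 * different / compared


# Toy aligned pair (not real data): one mismatch in ten columns.
print(nucleotide_differences("ACGTACGTAC", "ACGTACGTAT"))
```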

Results

Five main phylogenetic branches or groups were observed in the 16S rRNA tree (Fig. 1), and they were split into ten subgroups, in addition to several strains forming distinct phylogenetic lineages. In the tree built after the alignment and analysis of 480 bp of glnII there were also five main groups, with eight subgroups in addition to isolated lineages (Fig. 2). The concatenation of the 16S rRNA and glnII genes provided a much better definition of the clusters of the strains, generating four main groups and twelve subgroups (Fig. 3), most with higher bootstrap support than in the single-gene trees.
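Concatenation here means joining, for each strain, its aligned 16S rRNA sequence and its aligned glnII sequence end to end, so that both genes contribute columns to a single alignment. A minimal sketch follows, assuming each alignment is held as a dictionary mapping strain name to aligned sequence and that strains missing from either alignment are dropped; the strain names and sequences shown are toy data.

```python
def concatenate_alignments(*alignments: dict) -> dict:
    """Join per-gene alignments into one alignment keyed by strain name.

    Assumption: only strains present in every alignment are kept.
    """
    if not alignments:
        raise ValueError("at least one alignment is required")
    for aln in alignments:
        # Every sequence within one alignment must have the same length.
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("each alignment must be non-empty and of uniform length")

    shared = set(alignments[0]).intersection(*alignments[1:])
    return {
        name: "".join(aln[name] for aln in alignments)
        for name in sorted(shared)
    }


# Toy data (hypothetical strain names and sequences).
rrs_alignment = {"strain_A": "ACGT", "strain_B": "ACGA", "strain_C": "ACGG"}
glnii_alignment = {"strain_A": "TTG", "strain_B": "TCG"}

combined = concatenate_alignments(rrs_alignment, glnii_alignment)
print(combined)
```

Here `strain_C` is dropped because it has no glnII sequence, which mirrors the situation of type strains for which no glnII accession was available.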

Fig. 1

Phylogenetic tree of the 16S rRNA genes of twenty-three rhizobial strains from this study and of other rhizobial taxa. Strains and accession numbers are described in the “Materials and methods” section. The tree was generated using MEGA version 4.0 with default parameters, K2P distance model and the Neighbor-Joining algorithm

Fig. 2

Phylogenetic tree of the glnII genes of twenty-three rhizobial strains from this study and of other rhizobial taxa. Strains and accession numbers are described in the “Materials and methods” section. The tree was generated using MEGA version 4.0 with default parameters, K2P distance model and the Neighbor-Joining algorithm

Fig. 3

Phylogenetic tree of concatenated genes (glnII + 16S rRNA) of twenty-three rhizobial strains from this study and other rhizobial taxa. Strains and accession numbers are described in the “Materials and methods” section. The tree was generated using MEGA version 4.0 with default parameters, K2P distance model and the Neighbor-Joining algorithm

We also analysed the differences in the number and percentage of nucleotides of the 16S rRNA, glnII and concatenated genes, comparing the sequence of each strain with that of the closest type strain after alignment with ClustalW (Table 2). To the results of this comparison we applied a value previously established in our laboratory (Menna et al. 2006), according to which 1.03% of different nucleotides in the 16S rRNA sequences might indicate a new species. As studies with glnII or with the concatenated sequences are still scarce, we propose 5 and 3% nucleotide differences, respectively, as indicative of putative new species. The results obtained are shown in Table 2.
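The three cut-offs can be applied mechanically once the percentage differences are known. The sketch below does so for a few percentages quoted later in this section. Treating a value equal to the cut-off as exceeding it is an assumption, and the dictionary layout and function name are illustrative.

```python
# Cut-offs (% different nucleotides vs. the closest type strain) from the text.
THRESHOLDS = {"16S rRNA": 1.03, "glnII": 5.0, "concatenated": 3.0}


def exceeds_threshold(marker: str, percent_difference: float) -> bool:
    """True if the difference reaches the cut-off for that marker (>= is assumed)."""
    return percent_difference >= THRESHOLDS[marker]


# Percentages as reported in the Results for three strains.
reported = {
    "SEMIA 6423": {"16S rRNA": 0.0, "glnII": 6.66, "concatenated": 2.79},
    "SEMIA 2051": {"glnII": 7.08, "concatenated": 2.03},
    "SEMIA 384": {"glnII": 4.79, "concatenated": 1.51},
}

flags = {
    strain: {marker: exceeds_threshold(marker, pct) for marker, pct in values.items()}
    for strain, values in reported.items()
}

for strain, result in flags.items():
    print(strain, result)
```

For scale, 1.03% of a 16S rRNA read of about 1,422 bp corresponds to roughly 15 nucleotides, and 5% of the 480 bp glnII alignment to 24 nucleotides. A strain can exceed the glnII cut-off while staying below the concatenated one, as SEMIA 6423 and SEMIA 2051 do, which is why the flags are best read together rather than as a single verdict.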

Among the strains from this study, SEMIA 6423, isolated from Calliandra houstoniana (subfamily Mimosoideae), clustered with R. lusitanum, a symbiont of Phaseolus vulgaris (Papilionoideae) that is also effective for Macroptilium atropurpureum and Leucaena leucocephala (Valverde et al. 2006), in subgroup I.I of the 16S rRNA tree, with a bootstrap support of 75% (Fig. 1). However, in the trees built with the glnII (Fig. 2) and with the concatenated (Fig. 3) genes, the strain was positioned in an isolated cluster with SEMIA 6435, a symbiont of Gliricidia sepium (Caesalpinoideae). According to the sequences aligned with ClustalW, SEMIA 6423 shows no nucleotide differences in the 16S rRNA gene, but has a 6.66% nucleotide difference in the glnII gene in comparison to the closest type strain, and a 2.79% nucleotide difference in the concatenated sequences (Table 2). Interestingly, the cluster included strains capable of nodulating species from the three subfamilies of legumes.

Strain SEMIA 4088 clustered in subgroup I.II of the 16S rRNA tree with the R. tropici type strain, with a bootstrap support of 99%. The clustering was confirmed in the analysis of both the glnII (Fig. 2) and the concatenated (Fig. 3) genes, with bootstrap supports of 100%, confirming the phylogenetic grouping of these two symbionts of P. vulgaris.

Four SEMIA strains (3026, 2082, 2051 and 3007) were positioned in subgroup I.III of the 16S rRNA tree, together with R. leguminosarum (Fig. 1). Except for SEMIA 2051, the clusters were confirmed with high bootstrap supports in both the glnII (Fig. 2) and the concatenated (Fig. 3) trees. For the three confirmed strains, the maximum nucleotide difference in the glnII gene relative to R. leguminosarum was 1.25%, for SEMIA 3026 (Table 2), strongly indicating their taxonomic position as R. leguminosarum. On the other hand, SEMIA 2051 showed a 7.08% nucleotide difference in the glnII gene and 2.03% in the concatenated genes (Table 2), and deserves further study; for now the strain will continue to be classified as R. leguminosarum.

Strain SEMIA 384, a symbiont of Vicia sativa, was positioned in subgroup I.IV of the 16S rRNA tree and clustered with R. etli CFN 42T with a bootstrap support of 80% (Fig. 1); the clustering was confirmed with the glnII and the concatenated genes, with bootstrap supports of 84 and 99%, respectively. In the glnII and the concatenated genes, 4.79 and 1.51% of the nucleotides, respectively, were different from R. etli. The strain therefore resembles R. etli, but other housekeeping genes should be investigated to confirm its precise taxonomic position (Table 2).

Still in Group I of the 16S rRNA tree, strains SEMIA 6435, a symbiont of Gliricidia sepium, and SEMIA 6437, a symbiont of Adesmia latifolia, occupied isolated positions. In both the glnII (Fig. 2) and the concatenated (Fig. 3) trees, strain SEMIA 6437 also occupied an isolated position, while SEMIA 6435, as discussed before, clustered with SEMIA 6423. Together with the high percentage of different bases when compared to the closest type strain (Table 2), these results indicate that both SEMIA 6435 and 6437 might represent new species.

Bacteria belonging to the genus Sinorhizobium (= Ensifer) were coherent in all three trees, and the cluster included SEMIA 6161, symbiont of the tropical tree Prosopis juliflora (Figs. 1, 2, 3). Although clustered with Methylobacterium nodulans in the 16S rRNA tree, strain SEMIA 6407 differs in 6.45% of the nucleotides (Table 2) and unfortunately, no glnII sequences were available for M. nodulans. However, as pointed out before (Binde et al. 2009), SEMIA 6407 shows higher similarity of bases with the 16S rRNA of the non-diazotrophic M. mesophilicum, formerly classified as Pseudomonas mesophilica; therefore it will be interesting to proceed with the investigation of both housekeeping and symbiotic genes of this strain.

Interesting results were also obtained with SEMIAs 830 and 816, positioned in the great group of Mesorhizobium in the 16S rRNA tree (Fig. 1), but showing a high percentage of different nucleotides in comparison to the closest type strain (Table 2). Both strains formed a separate cluster in the tree built with the glnII gene (Fig. 2), resulting in a different subgroup of the genus Mesorhizobium in the concatenated tree (Fig. 3). The phylogenetic topology of all other strains fitting into the Mesorhizobium great group in the 16S rRNA tree (Fig. 1) was confirmed in both the glnII (Fig. 2) and the concatenated (Fig. 3) trees; furthermore, the clustering of SEMIA strains with specific Mesorhizobium species was greatly improved by the complementary analysis of the second gene.

The last great group in the 16S rRNA tree included strains belonging to the genus Bradyrhizobium (Fig. 1). Species definition was very poor when using exclusively this gene, but improved with the analysis of the glnII gene (Figs. 2, 3 and Table 2). Greater certainty in the classification of strains SEMIA 696, 6053 and 6428 as B. elkanii was obtained, and a clustering of SEMIA 6153 with B. liaoningense was also demonstrated; based exclusively on the 16S rRNA, this strain had been previously classified as B. japonicum (Binde et al. 2009). Strain SEMIA 6396, which did not occupy a clear position in the 16S rRNA tree, clustered with B. canariense in the glnII and the concatenated trees. Finally, the high diversity (5.62–10.85% of different bases in comparison to the closest type strain) of the glnII genes of SEMIA strains 6154, 6392, 6396, 6407 and 6423 is noteworthy (Table 2).

Overall, the additional analysis of the glnII gene for the twenty-three strains of this study resulted in changes in the nomenclature previously proposed (Menna et al. 2006; Binde et al. 2009) for six strains, SEMIAs 6153, 6154, 6392, 6396, 6407 and 6423 (Table 2). It is also important that the clusters formed in the concatenated tree (Fig. 3) were far better defined than in the single analysis of the 16S rRNA gene (Fig. 1). Finally, it should be noted that nine strains are denominated as “sp.” in Table 2, as this study strongly indicates that they may represent new species.

Discussion

We have studied a collection of twenty-three elite rhizobial strains, chosen from previous studies (Menna et al. 2006; Binde et al. 2009) in which their precise taxonomic position, based on the 16S rRNA genes, was not clearly defined. All strains have been selected as the most effective in fixing N2 with twenty-one legume hosts and they are officially authorized for the production of commercial inoculants for these legumes in Brazil. The strains studied nodulate a wide range of legumes, positioned in fourteen different tribes in all three subfamilies of the family Leguminosae (= Fabaceae in USA), including species grown in both tropical and subtropical regions (denominated here as tropics) (Table 1), and represent a valuable resource of symbiotic rhizobia with biotechnological potential for the tropics. A high level of diversity was clearly shown, reinforcing previous statements that there are many more varieties of rhizobia in tropical and subtropical than in temperate regions (Oyaizu et al. 1992).

Complete sequences of the 16S rRNA and partial sequences (480 bp) of the glnII genes were obtained. The same primers and amplification conditions were successful for the sequencing of the glnII genes of bacteria belonging to five different rhizobial genera (Bradyrhizobium, Mesorhizobium, Methylobacterium, Rhizobium and Sinorhizobium) positioned in distantly related branches. For the great majority of the strains there was good agreement between clustering with the 16S rRNA and with the glnII genes. However, as expected from its higher number of informative sites, analysis of the glnII gene detected a higher level of genetic diversity than the 16S rRNA. Most importantly, the analysis of the concatenated genes (16S rRNA + glnII) considerably improved the information about phylogeny and taxonomy of rhizobia in comparison to the single analysis of the 16S rRNA, with most groups showing higher bootstrap support than in the single-gene trees. The improvements were particularly important for bacteria belonging to the genera Mesorhizobium and Bradyrhizobium. Bradyrhizobium is a good example, as the poor species resolution within this genus based on the analysis of the 16S rRNA gene has been pointed out before (e.g. Vinuesa et al. 1998; van Berkum and Fuhrmann 2000; Willems et al. 2001; Germano et al. 2006; Menna et al. 2006). In our study, in a few cases the clustering position was not confirmed, and further analyses with other housekeeping genes will be performed to define the correct taxonomic position and to determine whether horizontal gene transfer events have occurred.

In studies aiming at improving knowledge about the diversity and taxonomy of prokaryotes, including rhizobia, at least five housekeeping genes have been included, in addition to the 16S rRNA. However, different genes and amplification conditions have been used for each genus (e.g. Stackebrandt et al. 2002; Zeigler 2003; Gevers et al. 2005; Vinuesa et al. 2008; Menna et al. 2009; Ribeiro et al. 2009). This strategy is thus expensive and time-consuming for the characterization of large culture collections, or in surveys of many rhizobial isolates.

In this study we have tried to determine whether a single gene capable of improving information about the taxonomy and phylogeny of rhizobia could be used in addition to the 16S rRNA. The gene should be easily amplified in a wide range of rhizobia using the same primers and amplification conditions. A similar approach has been previously used with the gltA gene, coding for citrate synthase; however, sequencing was not applicable to all rhizobial species studied (Hernández-Lucas et al. 2004). We therefore proposed the broadly distributed and well conserved glnII gene, coding for glutamine synthetase II (EC 6.3.1.2). Genes coding for glutamine synthetase are amongst the oldest on earth and thus may be very useful for tracing phylogeny (Tateno 1994) or for improving taxonomy. Two forms of glutamine synthetase (GSI and GSII) can be found in nitrogen-fixing bacteria: GSI, a typical prokaryotic glutamine synthetase, and GSII, similar to the eukaryotic enzyme; a third GS isozyme (GSIII) can be found in S. meliloti and R. etli (Patriarca et al. 1992; Espín et al. 1994). Shatters and Kahn (1989) found 83.6% identity when comparing the GSII proteins of S. meliloti and B. japonicum. In addition, the comparison of several GSII sequences indicated that the gene has not been transferred among large taxonomic groups (symbiotic bacteria, plants and mammals) (Shatters and Kahn 1989); the gene has also been successfully applied in studies with Bradyrhizobium (Vinuesa et al. 2008). In our study, the same primers and amplification conditions resulted in glnII sequences for symbiotic bacteria belonging to five genera (Bradyrhizobium, Mesorhizobium, Methylobacterium, Rhizobium and Sinorhizobium) and positioned in considerably different phylogenetic branches.

The higher number of parsimony-informative sites of glnII in comparison to the 16S rRNA allowed the detection of higher diversity among the strains; furthermore, the analysis of the concatenated genes (glnII + 16S rRNA) greatly improved the phylogeny, clarified the taxonomic position of many strains, and indicated that others might represent new species, deserving further studies.