Introduction

Lactobacilli are an important group of bacteria in food microbiology and human nutrition owing to their role in food and feed production and preservation [1]. Therefore, precise identification and classification of lactobacilli to the species level are required. Established methodologies, routinely used for assessing the evolution of lactic acid bacteria (LAB), rely on morphological descriptors and on phenotypic methods based on biochemical systems such as API 50CH carbohydrate tests (API Products, Bio-Merieux, France) and phenotypic arrays (Biolog, Hayward, CA, USA) [2, 3]. Although phenotypic tests provide evidence of the metabolic capabilities of different strains, these methods suffer from poor reproducibility and a lack of discriminatory power [4].

Identification of LAB is most commonly based on the variability in their ribosomal RNA genes. Comparison of the gene sequences of bacterial species has shown that the 16S rRNA gene is highly conserved within a species and among species of the same genus, and it is therefore used as the “gold standard” for the identification of bacteria [5]. However, genotypic identification based on 16S rRNA gene sequences has limited discriminating power for closely related Lactobacillus species [6, 7]. Recently, several protein-encoding genes have been applied to the classification of closely related species: the tuf gene (encoding elongation factor Tu) [8], the hsp60 gene (encoding the 60-kDa heat shock protein) [9], the htrA gene (encoding a stress-inducible trypsin-like serine protease) [10], the recA gene (encoding recombinase A) [11], and the rpoA gene (encoding the RNA polymerase alpha subunit) [12].

The tuf gene encodes elongation factor Tu, a protein involved in protein biosynthesis that facilitates the elongation of polypeptides from the ribosome and aminoacyl-tRNA during translation. It is universally distributed, and in most gram-positive bacteria only one tuf gene per genome has been found [13]; it is therefore well suited to phylogenetic studies and has been used as a target gene for this purpose [14]. The hsp60 gene, which encodes a 60-kDa subunit (known as GroEL, 60-kDa chaperonin or heat shock protein 60) of a complex that assists with the three-dimensional folding of bacterial proteins, has the potential to serve as a general phylogenetic marker because of its ubiquity and conservation in nature. Recent studies [9, 15, 16] have shown that the hsp60 gene may be an alternative DNA target for species-specific identification of microbial species. The phenylalanyl-tRNA synthase gene (pheS) has also been proven to be a valuable tool for the identification of Lactobacillus species and the delineation of novel taxa [17, 18].

The objective of this study was to design an accurate and rapid method to identify closely related lactobacilli isolated from traditional fermented dairy samples from six different regions, and to assess the usefulness of partial tuf, hsp60 and pheS sequences for the differentiation of closely related species. Sequence alignment and phylogenetic analysis of the tuf, hsp60 and pheS sequences showed that the three genes were able to identify members of the L. casei group, L. plantarum group and L. acidophilus group at the species level.

Materials and methods

Bacterial strains

All lactobacilli were isolated from traditional fermented dairy samples collected from Tibet, Qinghai, Inner Mongolia, Yunnan and Xinjiang in China and from Mongolia. A total of 234 lactobacilli were chosen from these isolates (Table 1); they had been preliminarily classified and identified as members of the L. casei group, L. plantarum group and L. acidophilus group by biochemical tests and 16S rRNA gene sequencing in our laboratory [19–22].

Table 1 Strains and numbers of lactobacilli used in this study

DNA extraction

Total genomic DNA was extracted from overnight cultures by a previously described method [23]. Briefly, 1 ml of liquid culture of each strain, incubated overnight in MRS broth, was pelleted by centrifugation at 8,000 × g for 5 min. The pellets were washed with 500 μl TE buffer (10 mM Tris-Cl, 1 mM EDTA, pH 8.0) in a clean 1.5-ml microcentrifuge tube and repelleted by centrifugation. Washed cell pellets were resuspended in 500 μl TE buffer; tubes were frozen for 5 min in liquid nitrogen and then incubated at 65 °C for 5 min. This freezing–thawing step was repeated 4 times, after which 10 μl proteinase K solution (20 mg proteinase K/ml in TE; Amresco Inc.) and 60 μl SDS solution (10 %) were added. After incubation at 37 °C for 1 h, 100 μl NaCl (5 M) and 80 μl CTAB/NaCl (10 % cetyltrimethylammonium bromide, 0.7 M NaCl) were added, and tubes were incubated at 65 °C for 30 min. The mixture was extracted with an equal volume of phenol/chloroform/isoamyl alcohol. After centrifugation, DNA was precipitated by the addition of an equal volume of isopropanol and then washed in 500 μl ethanol (70 %). DNA was pelleted, dried and dissolved in 100 μl RNase solution (100 μg/ml RNase in TE; Sigma-Aldrich). After incubation for 1 h at 37 °C, the sample volumes were adjusted to 400 μl with TE. Phenol/chloroform extraction was then performed, and DNA was precipitated with 0.5 M NaAc (final concentration) and two volumes of ethanol (99 %). Pellets were washed in 500 μl ethanol (70 %). Finally, DNA was dissolved as above in 50–250 μl TE, and stock solutions were stored at −20 °C.

PCR amplification

For each strain, the genomic DNA was used as a template for PCR amplification of a segment of the tuf, hsp60 and pheS genes on an automatic thermal cycler (PTC-200, MJ Research, Waltham, MA). The sequences of the primers are listed in Table 2. For each target, the PCR mixture (50 μl) contained 200 ng template DNA, 5 μl 10 × buffer with 3.0 mM MgCl2, 6.0 units Taq DNA polymerase, 0.4 mM dNTP and 20 pmol of each primer.
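The amounts given for the reaction mixture can be converted into per-volume concentrations with simple arithmetic. The sketch below is only an illustration of that conversion; the variable and function names are hypothetical, and the MgCl2 and dNTP figures are left out because the text already states them as concentrations.

```python
# Reaction components as stated in the text for one 50 µl reaction.
REACTION_VOLUME_UL = 50.0
TEMPLATE_NG = 200.0      # template DNA
PRIMER_PMOL = 20.0       # amount of each primer
TAQ_UNITS = 6.0          # Taq DNA polymerase


def per_microlitre(amount: float, volume_ul: float = REACTION_VOLUME_UL) -> float:
    """Return the amount of a component per microlitre of reaction."""
    return amount / volume_ul


template_ng_per_ul = per_microlitre(TEMPLATE_NG)   # ng/µl
primer_um = per_microlitre(PRIMER_PMOL)            # pmol/µl, numerically equal to µM
taq_units_per_ul = per_microlitre(TAQ_UNITS)       # U/µl

print(f"template: {template_ng_per_ul:.1f} ng/µl")
print(f"each primer: {primer_um:.1f} µM")
print(f"Taq: {taq_units_per_ul:.2f} U/µl")
```

Because 1 pmol/µl equals 1 µM, 20 pmol of primer in 50 μl corresponds to a final concentration of 0.4 µM.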

Table 2 Oligonucleotide primers used in this study

A DNA fragment corresponding to part of the tuf gene was amplified; the PCR cycling profile consisted of an initial denaturation step of 3 min at 95 °C, followed by 35 cycles of amplification as follows: denaturation (30 s at 95 °C), annealing (30 s at 52 °C) and extension (2 min at 72 °C). Amplification was completed with a final elongation phase (10 min at 72 °C).

Universal hsp60 oligonucleotide primers H729 and H730 (Table 2) were used to amplify the hsp60 gene. The cycling conditions were 5 min at 95 °C for 1 cycle, followed by 30 cycles of 1 min at 95 °C, 30 s at 37 °C and 1 min at 72 °C. A final extension was performed for 10 min at 72 °C.

For the convenience of sequencing, 24-bp nucleotide sequences were inserted in front of the primers PheS-21FA and PheS-21RA as described in the study conducted by Naser et al. [24], and they were used to amplify the pheS gene. The program consisted of 30 cycles of 30 s at 94 °C, 30 s at 35 °C and 2 min at 72 °C. PCR products were electrophoresed in a 1.0 % agarose gel and visualized by UV transillumination after ethidium bromide staining.
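The three cycling programs described above can be written down as data and compared. The following is a minimal sketch, not the thermal cycler's own program format: the `Program` class and its field names are hypothetical, and the pheS entry uses zero for the initial and final holds only because the text gives none. The computed times count programmed holds only and ignore temperature ramping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    initial_s: int     # initial denaturation hold, in seconds
    cycles: int        # number of amplification cycles
    steps_s: tuple     # (denaturation, annealing, extension) holds, in seconds
    final_s: int       # final extension hold, in seconds

    def hold_time_s(self) -> int:
        """Total programmed hold time, excluding ramp times."""
        return self.initial_s + self.cycles * sum(self.steps_s) + self.final_s


PROGRAMS = {
    "tuf": Program(initial_s=180, cycles=35, steps_s=(30, 30, 120), final_s=600),
    "hsp60": Program(initial_s=300, cycles=30, steps_s=(60, 30, 60), final_s=600),
    # No initial or final hold is stated for pheS, so both are set to 0 here.
    "pheS": Program(initial_s=0, cycles=30, steps_s=(30, 30, 120), final_s=0),
}

for gene, program in PROGRAMS.items():
    print(f"{gene}: {program.hold_time_s() / 60:.0f} min of programmed holds")
```

Under these assumptions the tuf program has 118 min of programmed holds, and the hsp60 and pheS programs 90 min each.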

Nucleotide sequencing and phylogeny study

Sequencing of the PCR products was performed by Shanghai Sangni Biosciences Corporation. To confirm the species, the nucleotide sequences of the tuf, hsp60 and pheS genes of all tested strains were compared with database sequences using the BLAST program at NCBI. Consensus sequences were imported into MEGA version 4.0 software (http://www.megasoftware.net) [25], in which the sequences were aligned and representative sequences of each group were selected for phylogenetic tree construction by the neighbor-joining method. Bifidobacterium longum was used as the outgroup.
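The similarity percentages reported in the Results are pairwise comparisons between aligned sequences, and a neighbor-joining tree is built from a matrix of pairwise distances. The sketch below illustrates one simple way to compute both, assuming an uncorrected proportion of differing sites (p-distance) over gap-free columns. The distance model actually used in MEGA is not stated in the text, and the sequences and names here are invented toy data, not sequences from this study.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identical sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def p_distance_matrix(aligned: dict) -> dict:
    """Uncorrected pairwise distances (1 - identity) for every pair of sequences."""
    return {
        (m, n): 1.0 - percent_identity(aligned[m], aligned[n]) / 100.0
        for m, n in combinations(aligned, 2)
    }


# Hypothetical toy alignment for illustration only.
toy_alignment = {
    "isolate_A": "ATGGCTAAAG",
    "type_X": "ATGGCTAAAG",
    "type_Y": "ATGGTTAA-G",
}

for pair, distance in p_distance_matrix(toy_alignment).items():
    print(pair, round(distance, 3))
```

A distance matrix of this kind is the input that a neighbor-joining implementation clusters into a tree; corrected distance models would give somewhat different values for divergent sequences.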

Nucleotide sequence accession numbers

All sequences determined in this study were deposited in GenBank under the following accession numbers: FJ983574 to FJ984182, FJ825030 to FJ825054, FJ825056 to FJ825086, FJ825088 to FJ825118 and FJ825120 to FJ825125.
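As a consistency check, the size of the listed accession ranges can be compared with the number of sequences expected from 234 isolates and three genes. The sketch below assumes that each accession number corresponds to one gene sequence from one isolate, which the text does not state explicitly; the helper name is hypothetical.

```python
# Inclusive accession ranges as listed in the text.
ACCESSION_RANGES = [
    ("FJ983574", "FJ984182"),
    ("FJ825030", "FJ825054"),
    ("FJ825056", "FJ825086"),
    ("FJ825088", "FJ825118"),
    ("FJ825120", "FJ825125"),
]
N_ISOLATES = 234
N_GENES = 3  # tuf, hsp60 and pheS


def range_size(first: str, last: str) -> int:
    """Count accessions in an inclusive range with a shared two-letter prefix."""
    if first[:2] != last[:2]:
        raise ValueError("range endpoints must share the same prefix")
    return int(last[2:]) - int(first[2:]) + 1


sizes = [range_size(first, last) for first, last in ACCESSION_RANGES]
total = sum(sizes)
print(sizes, total, N_ISOLATES * N_GENES)
```

The ranges contain 609, 25, 31, 31 and 6 accessions, 702 in total, which matches 234 isolates × 3 genes.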

Results

The three targeted genomic regions were successfully amplified from all 234 isolates, and the PCR products had the expected fragment lengths. After removal of the primer regions and ambiguous single-strand data, approximately 760 bp of tuf, 450 bp of pheS and 570 bp of hsp60 were subjected to phylogenetic analysis.

hsp60 sequence and phylogenetic analysis

Partial hsp60 gene sequences were determined for closely related species of the L. casei group, L. acidophilus group and L. plantarum group. BLAST searches of the available databases yielded the highest matching scores for the corresponding reference strains. In total, 113 strains shared 100 % identity with the hsp60 gene sequence of L. helveticus ATCC 15009T. Another 63 isolates showed 100 % similarity to L. paracasei ATCC 25302T and less than 90 % similarity to the other species (L. rhamnosus ATCC 7469T, 88 %; L. casei ATCC 393T, 85 %; L. zeae ATCC 15820T, 86 %). The remaining 58 isolates were closest to L. plantarum ATCC 14917T, with which they shared 99.9 % similarity, whereas they shared only 95 and 94 % similarity with L. paraplantarum DSM 10667T and L. pentosus ATCC 8041T, respectively. The phylogenetic tree constructed from the alignment of the representative hsp60 gene sequences is shown in Fig. 1. All strains were divided into two large branches. The first branch contained two groups, the L. acidophilus group and the L. casei group. In the L. acidophilus group, representative isolate IMAU60068 clustered with the type strain L. helveticus ATCC 15009T; in the L. casei group, IMAU10126 clustered with L. paracasei ATCC 25302T in 100 % of bootstrap replicates. The second branch contained the L. plantarum group, in which representative isolate IMAU50045 was placed in the cluster of L. plantarum ATCC 14917T, which was recovered in 100 % of bootstrap replicates. According to the phylogenetic analysis based on the hsp60 gene, all isolates could be clearly identified as L. casei (63 strains), L. plantarum (58 strains) and L. helveticus (113 strains).

Fig. 1 Neighbor-joining tree showing the phylogenetic relationships between representative strains from each group and the type strains based on hsp60 gene sequences. Bifidobacterium longum was used as the outgroup. Bootstrap values based on 100 replications are given at the nodes

tuf sequence and phylogenetic analysis

The partial tuf gene sequences (760 bp) of all strains were determined. The resulting sequences were compared with sequences of related bacteria in GenBank, and sequence similarities were determined using the BLAST program. A phylogenetic tree was constructed from the tuf gene sequences (Fig. 2). In this analysis, the 234 lactobacilli fell into three different tuf clusters. One tuf cluster showed the highest sequence similarity (100 %) with the type strain L. paracasei ATCC 25302T and was clearly distinct from the other species (similarity: L. rhamnosus ATCC 7469T, 93 %; L. casei ATCC 393T, 91 %; L. zeae ATCC 15820T, 92 %). A second tuf cluster showed the highest sequence similarity (100 %) with the type strain L. plantarum ATCC 14917T and a similarity of 98 % with the other two species. Moreover, the isolates represented by IMAU60068 showed 100 % similarity to L. helveticus ATCC 15009T. The phylogenetic analysis gave the same result as that based on the hsp60 gene and provided higher resolution than the 16S rRNA gene.

Fig. 2 Neighbor-joining tree showing the phylogenetic relationships between representative strains from each group and the type strains based on tuf gene sequences. Bifidobacterium longum was used as the outgroup. Bootstrap values based on 100 replications are given at the nodes

pheS sequence and phylogenetic analysis

Phylogenetic analysis of the pheS gene sequences showed distinct positions for the species (Fig. 3). IMAU60068 shared 100 % similarity with L. helveticus ATCC 15009T. Strain IMAU50054 clustered with the type strain L. plantarum ATCC 14917T with a similarity of 100 %, and its pheS gene sequence showed a similarity of 90 % to L. paraplantarum DSM 10667T and 84 % to L. pentosus ATCC 8041T. IMAU60068 was placed in the cluster of the L. casei group, which was recovered in 100 % of bootstrap replicates, and its pheS gene sequence showed a similarity of less than 93 % to the other type strains. The topology of the phylogenetic tree based on pheS sequences showed a distribution of lactobacilli similar to that obtained from hsp60 and tuf gene sequence analysis. However, the pheS gene was more discriminatory than the hsp60 and tuf genes.

Fig. 3 Neighbor-joining tree showing the phylogenetic relationships between representative strains from each group and the type strains based on pheS gene sequences. Bifidobacterium longum was used as the outgroup. Bootstrap values based on 100 replications are given at the nodes

Comparative sequence analyses

Partial DNA sequences of the tuf, pheS and hsp60 genes were obtained for 113 L. helveticus, 63 L. casei and 58 L. plantarum isolates. Our data clearly showed that hsp60, tuf and pheS gene sequences are more discriminatory than 16S rRNA gene sequences, especially in the L. plantarum group. The average nucleotide sequence similarities of the hsp60, tuf and pheS genes among the L. plantarum group type strains were significantly lower than that of the 16S rRNA gene (96, 98.6 and 91.3 % versus 99.4 %, respectively). On the 16S rRNA gene tree (Fig. 4), L. paraplantarum, L. pentosus and L. plantarum could not be distinguished because of the few nucleotide differences in the variable region. However, sequencing of the hsp60, tuf and pheS genes of the investigated lactobacilli confirmed the clustering of the L. plantarum isolates, which were clearly and easily separated from L. pentosus and L. paraplantarum. Within the L. casei group, the average nucleotide sequence similarities of the hsp60, tuf and pheS genes were significantly lower than that of the 16S rRNA gene (89.8, 94 and 88.3 % versus 99.1 %, respectively). Although 16S rRNA gene sequence analysis of the L. acidophilus group showed that the representative strains form a well-defined cluster with their type strains in the phylogenetic tree, the average nucleotide sequence similarities of the hsp60, tuf and pheS genes among the type strains of the L. acidophilus group were lower than that of the 16S rRNA gene (93.8, 97 and 92.3 % versus 98.5 %, respectively). Therefore, sequence analysis of the hsp60 and pheS genes provided useful and accurate taxonomic criteria for the L. acidophilus group.
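The average similarities reported above can be tabulated to compare the markers directly. The sketch below uses only the percentages given in the text; the dictionary layout and function name are hypothetical. It treats a lower average similarity among type strains as a rough proxy for higher discriminatory power, which is a simplification.

```python
# Average nucleotide sequence similarity (%) among type strains, from the text.
AVERAGE_SIMILARITY = {
    "L. plantarum group": {"hsp60": 96.0, "tuf": 98.6, "pheS": 91.3, "16S rRNA": 99.4},
    "L. casei group": {"hsp60": 89.8, "tuf": 94.0, "pheS": 88.3, "16S rRNA": 99.1},
    "L. acidophilus group": {"hsp60": 93.8, "tuf": 97.0, "pheS": 92.3, "16S rRNA": 98.5},
}


def rank_markers(values: dict) -> list:
    """Order markers from lowest to highest average similarity."""
    return sorted(values, key=values.get)


for group, values in AVERAGE_SIMILARITY.items():
    ranking = rank_markers(values)
    most_divergent = ranking[0]
    # Difference in percentage points between 16S rRNA and the most divergent marker.
    gap = round(values["16S rRNA"] - values[most_divergent], 1)
    print(f"{group}: {' < '.join(ranking)}; {most_divergent} is {gap} points below 16S rRNA")
```

In all three groups the order is pheS < hsp60 < tuf < 16S rRNA, consistent with the statement that pheS is the most discriminatory of the three genes.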

Fig. 4 Neighbor-joining tree showing the phylogenetic relationships between representative strains from each group and the type strains based on 16S rRNA gene sequences. Bifidobacterium longum was used as the outgroup. Bootstrap values based on 100 replications are given at the nodes

Discussion

The genus Lactobacillus is the largest group among the Lactobacteriaceae and contains more than 170 species (http://www.bacterio.cict.fr/). Significant changes have occurred in bacterial taxonomy since the introduction of molecular techniques, and several molecular targets have been exploited for the identification of Lactobacillus species. Many species can be identified by means of the 16S rRNA gene, which is considered an important molecular marker in modern bacterial taxonomy. Although comparison of 16S rRNA gene sequences has been useful in phylogenetic studies at the species level, in some cases closely related species cannot be differentiated from each other by this gene. Therefore, the use of highly conserved protein-encoding genes as evolutionary chronometers might have strong applications in the identification and differentiation of species [8]. We studied the partial sequences of the hsp60, pheS and tuf genes of 234 isolates with the aim of developing a rapid and reliable tool for discriminating very closely related species.

The Lactobacillus plantarum group includes L. plantarum, L. pentosus and L. paraplantarum. These three closely related species have very similar fermentation abilities and cannot be distinguished by 16S rRNA gene sequence analysis because they show 99 % sequence similarity [26]. In view of its demonstrated effectiveness, sequence analysis of protein-coding genes as alternative phylogenetic markers has been applied to differentiate the L. plantarum group. Torriani et al. [11] designed species-specific primers based on the recA gene to distinguish species in the L. plantarum group. Huang et al. [27] reported the use of the dnaK gene as a molecular marker for resolving phylogenetic relationships within the L. plantarum group, and their data indicated that the relationships between 22 strains were easily resolved by sequencing of the dnaK gene. In this study, the tuf, hsp60 and pheS genes of 58 L. plantarum isolates were sequenced, and the phylogenetic trees based on those genes clearly resolved the three species of the L. plantarum group (Figs. 1, 2 and 3).

The species L. zeae, L. rhamnosus, L. paracasei and L. casei are phylogenetically and phenotypically closely related and are regarded as the L. casei group. Some strains in this group have been shown to be probiotic and are widely used in the food and feed industries [28]; however, more than 28 % of commercial probiotic products are mislabeled at the genus or species level owing to the use of methods that have limited taxonomic resolution or that are unsuitable for reliable identification to the species level [29, 30]. In this study, 63 isolates tentatively identified as belonging to the L. casei group by prior sugar fermentation profiles and 16S rRNA gene sequence analysis were examined by hsp60, tuf and pheS gene sequence analysis; the three phylogenetic trees based on the partial hsp60, tuf and pheS gene sequences clearly showed that all of these isolates belong to L. paracasei.

Lactobacillus helveticus is an important species used as a starter in the dairy industry for the production of many hard cheeses [31]. However, clear identification of L. helveticus within the genus Lactobacillus is sometimes ambiguous and complicated [32]. Therefore, rapid and accurate identification of L. helveticus is important. L. acidophilus, L. gallinarum, L. crispatus and L. helveticus form a cluster of closely related species based on the 16S rRNA gene, with strains showing more than 98 % sequence similarity to each other. We therefore analyzed the hsp60, tuf and pheS genes of 113 L. helveticus strains to provide a more accurate method and to test the feasibility of using the three genes to distinguish species of the L. acidophilus group. The results of the phylogenetic analysis confirmed that those genes differentiated the L. acidophilus group species with better resolution and a higher discrimination level than the 16S rRNA gene. Similar to our results, Sun et al. [22] used the hsp60, tuf and pheS genes to distinguish 32 L. helveticus strains, and their results showed that analysis of partial tuf, hsp60 and especially pheS gene sequences allows the L. helveticus group to be differentiated at a higher discrimination level. In addition, on the basis of a large amount of data, we demonstrated that hsp60, tuf and pheS gene analysis showed better resolution not only among the L. acidophilus group species, but also among the L. casei group and L. plantarum group species.

In conclusion, we determined the protein-encoding gene sequences of a large number of lactobacilli, adding to the existing sequence databases of LAB species. We demonstrated that the tuf, hsp60 and pheS sequences are more distinct than the 16S rRNA gene sequences and offer valid molecular markers for inferring phylogeny among closely related taxa. Moreover, owing to its specificity, manageability and rapidity, the approach proposed in this study can be considered a valid strategy for species-level typing of lactobacilli isolated from food samples.