Lactobacillus fermentum is a Gram-positive, facultatively anaerobic bacterium [1, 2]. It is one of the most important lactic acid bacteria contributing to the formation of healthy intestinal flora, and it is commonly found in milk, milk products, sewage, fermenting plant materials, and the intestinal tracts of animals [1, 2]. The L. fermentum strain studied here was isolated from kimchi and identified by 16S rRNA-based PCR analysis [3].

The phage LF1 was isolated from a mitomycin C (MMC)-induced lysate of L. fermentum and characterized in our laboratory. The temperate phage was concentrated by polyethylene glycol (PEG) precipitation and purified using a CsCl gradient [4]. CsCl-purified phages were negatively stained with 2% (w/v) aqueous uranyl acetate (pH 4.0) on copper grids covered with a film of carbon-formvar, ionized, and examined by transmission electron microscopy (TEM; Hitachi H-7500, Japan) at an accelerating voltage of 80 kV. Morphological analysis by TEM showed that LF1 has an isometric head and a long tail, indicating that it belongs to the family Siphoviridae in the order Caudovirales (Fig. 1).

Fig. 1

Electron micrograph of the Lactobacillus phage LF1. A CsCl-purified bacteriophage preparation was negatively stained with 2% uranyl acetate (pH 4.0). Scale bar: 500 nm; magnification: ×97,000.

The phage DNA was isolated from purified phage particles by the phenol extraction method [4]. The phage lysate was treated with DNase I (10 μg/ml) and RNase A (20 μg/ml) at room temperature for 15 min, and then incubated with 0.5 M EDTA (pH 8.0) and proteinase K (1 mg/ml) at 65°C for 30 min. After incubation, the phage DNA was extracted with phenol-chloroform-isoamyl alcohol, precipitated with ethanol, and dissolved in sterile distilled water. The genome sequence was determined by ultra-high throughput GS FLX sequencing to 20-fold redundancy on average. Putative open reading frames (ORFs) were analyzed using GeneMark.hmm for Prokaryotes, version 2.4 (http://exon.gatech.edu/) [5]. Amino acid sequences were compared using the protein Basic Local Alignment Search Tool (BLASTp) at the National Center for Biotechnology Information (NCBI) [6]. Genome comparisons at the nucleotide level between LF1 and other phages were made with Mauve software, using a progressive alignment with the default settings (http://gel.ahabs.wisc.edu/mauve/) [7].
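As a rough illustration of what ORF prediction on both strands involves, the following minimal sketch scans all six reading frames for ATG-to-stop stretches. It is a deliberately naive stand-in and not the GeneMark.hmm method used here, which relies on hidden Markov models; the function names and the `min_codons` cutoff are assumptions made for this example.

```python
# Naive six-frame ORF scan: an illustration only, NOT the GeneMark.hmm algorithm.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Yield (strand, frame, start, end) for ATG..stop ORFs on both strands.

    Coordinates are 0-based, end-exclusive, and relative to the strand that
    was scanned. min_codons is a hypothetical cutoff that counts the start
    codon but not the stop codon.
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3
                    start = None  # look for the next ORF in this frame


# Toy sequence (hypothetical, not from LF1): ATG + five GCT codons + TAA.
toy = "ATG" + "GCT" * 5 + "TAA"
toy_orfs = list(find_orfs(toy, min_codons=3))
```

ORFs reported on the "-" strand in this sketch correspond to genes on the complementary strand. A real gene finder additionally weighs codon usage and ribosome binding sites, which this sketch ignores.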

Bioinformatic analysis of the phage genome revealed 57 putative ORFs. Of these, twelve were on the complementary strand (Fig. 2; Supplementary Material 1) and six were in the lysogeny module. A total of 31 ORFs showed homology with genes with annotated functions in the GenBank database, and 17 ORFs matched uncharacterized entries. Of the remaining nine ORFs, five matched uncharacterized entries without significant amino acid identity and four had no match to any sequenced gene in the database (Supplementary Material 1). The LF1 ORFs were assigned to five functional clusters: replication/regulation/modification, DNA packaging, structure/morphogenesis, lysis, and lysogeny (Fig. 2) [8–12]. The complete genome sequence of LF1 showed 28% nucleotide identity with L. fermentum phage ΦPYB5 by BLASTn analysis [13, 14]. The genomic sequences of LF1 and ΦPYB5 showed gene synteny within their packaging and structural modules (Fig. 3) [13].
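The ORF counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the category labels and the `tally` helper are names chosen for this example.

```python
# Counts taken from the text; the dictionary keys are illustrative labels.
TOTAL_ORFS = 57
COMPLEMENTARY_STRAND = 12

match_categories = {
    "annotated_function": 31,
    "uncharacterized_match": 17,
    "uncharacterized_no_significant_identity": 5,
    "no_database_match": 4,
}


def tally(categories, total):
    """Check that the categories account for every ORF and return % shares."""
    counted = sum(categories.values())
    if counted != total:
        raise ValueError(f"categories sum to {counted}, expected {total}")
    return {name: round(100 * n / total, 1) for name, n in categories.items()}


shares = tally(match_categories, TOTAL_ORFS)
forward_strand = TOTAL_ORFS - COMPLEMENTARY_STRAND  # ORFs not on the complementary strand
```

The four categories sum to 57, and 45 ORFs remain on the forward strand once the twelve complementary-strand ORFs are subtracted.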

Fig. 2

Schematic representation of the dsDNA genome of the temperate phage LF1. The 57 putative ORFs are shown as arrows, with predicted functions where available. Proposed modules are based on predicted functions.

Fig. 3

Alignment of the genome of the temperate phage LF1 with those of other Lactobacillus phages using Mauve. Nucleotide sequence similarity is indicated by the height of the colored bars, and dissimilar regions are shown in white.

In the DNA packaging module, phage LF1 showed 93% sequence identity with phage ΦPYB5 of L. fermentum [13]. This high identity within a single module, set against the much lower identity between the complete genomes, points to a mosaic relationship between the two genomes. Such mosaicism is thought to result from non-homologous recombination during the evolution of these viruses [8, 15, 16]. The putative protein of ORF1 showed 67% sequence identity to the small subunit of the terminase of a prophage of Lactobacillus antri DSM16041. ORF2 encodes a protein homologous to the large subunit of the terminase from phage ΦPYB5 of L. fermentum [13]. Terminases are DNA packaging enzymes with ATPase activity [17]. Most terminases also have an endonuclease activity that cuts concatemeric DNA into genome-length units during DNA packaging [17, 18]. The predicted protein of ORF4 showed 97% identity to the portal protein of ΦPYB5. ORF5 encodes a putative protein that is homologous to the phage head maturation protease of ΦPYB5. ORF8, ORF9, and ORF10 are predicted to encode head-tail joining proteins, and their products showed a high degree of similarity to the corresponding proteins of ΦPYB5.
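The percent identity values quoted for each ORF come from BLAST alignments. As a minimal sketch of how such a figure is derived, the function below counts identical columns in an already aligned pair of sequences and divides by the alignment length, with gap columns included in the denominator. The function name and the toy sequences are assumptions for illustration; the sketch does not perform the alignment itself and does not compute similarity ("positives"), which additionally requires a substitution matrix.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    Identity = identical columns / alignment length * 100. Columns with a gap
    in one sequence count toward the length but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned peptide fragments, not taken from LF1 or ΦPYB5.
example = percent_identity("MKT-AYI", "MKTLAFI")
```

Because the denominator is the alignment length rather than the full protein length, a high identity over a short aligned region does not by itself imply that two proteins are similar end to end.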

In the structure/morphogenesis module, the predicted protein product of ORF11 showed 72% identity to the major tail protein of L. antri DSM16041. ORF14 was the longest ORF in LF1, and its predicted protein product showed 56% identity to the tape measure protein (TMP) of a prophage of L. reuteri 100-23. The TMP usually functions as a template that determines tail length during tail assembly [15]. The protein of ORF16 was identified as a phage-associated protein/endopeptidase and showed 44% identity to that of L. reuteri CF48-3A. ORF17 was the second-longest ORF, and its predicted product showed 36% identity to the tail fiber protein from Lactobacillus phage LP65.

The region of the lysis module from bacteriophage LF1 showed significant sequence similarity to that of Lactobacillus phage ΦPYB5 but did not show genetic mosaicism as was seen in the DNA packaging module [13]. The predicted protein product of ORF20 showed 80% identity to the GDSL family lipase from ΦPYB5, and ORF24 encoded a protein with 85% sequence similarity to holin from ΦPYB5. The predicted protein product of ORF25 was matched to lysin from ΦPYB5, with 88% identity.

In the lysogeny module, the predicted protein product of ORF26 showed 55% identity to a GNAT family acetyltransferase from Enterococcus faecalis E1Sol. The predicted protein of ORF28 showed 62% identity to the integrase of Lactobacillus vaginalis ATCC 49540. This integrase belongs to the tyrosine recombinase family. ORF32 encoded a protein that showed 41% similarity to a zinc finger protein of a prophage from Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293.

In the replication module, ORF35 encoded a protein homologous to the lj965 prophage repressor from L. fermentum 28-3-CHN, with 48% identity. ORF36 encoded a protein with a helix-turn-helix (HTH) domain and a sequence-specific DNA binding site in the N-terminal region, and it showed 58% identity to an Xre family toxin-antitoxin system protein of L. antri DSM16041. ORF37 also encoded a protein with an HTH domain similar to that of ORF36, and it showed 37% identity to an HTH domain-containing protein of L. crispatus CTV-05. The predicted protein product of ORF38 showed 66% identity to the phage antirepressor from L. fermentum IFO3956. The predicted proteins of ORF39 to ORF42 were very short and did not have annotated functions. ORF45 encoded a protein homologous to the RecT protein from L. reuteri DSM20016, with 57% identity. ORF47 encoded a protein homologous to the DNA replication protein from Leuconostoc kimchi IMSNU11154, and ORF48 encoded a protein homologous to a protein of Lactobacillus phage Lrm1, with 42% identity [19]. The predicted protein product of ORF52 showed 60% identity to endodeoxyribonuclease RusA from L. reuteri 100-23. The predicted protein of ORF55 matched a phage transcriptional regulator from ΦPYB5, with 35% identity. The putative protein of ORF57 had an HNHc domain that showed 73% identity to the HNH endonuclease domain of L. antri.

Nucleotide sequence accession number:

The complete genome sequence of LF1 was deposited in the GenBank database under accession number HQ141410.