Introduction

The advent of affordable whole-genome sequencing and the renewed interest in using bacteriophages as alternatives to antibiotics or to decontaminate food substances have led to a marked increase in submissions of complete phage genomes to public databases such as GenBank. While this process has led to comparative genome analyses and resulted in the recognition of relationships between newly submitted phage genome sequences and those already deposited, official recognition of these new taxonomic units has lagged. This is particularly true for members of the family Siphoviridae (dsDNA genome, non-contractile tail), for which the International Committee on Taxonomy of Viruses (ICTV) currently recognizes only 10 genera and 31 species (www.ictvonline.org/virusTaxonomy.org). These species account for less than 10 % of the fully sequenced genomes from members of this family, a situation that must be addressed. A significant number of these unclassified Siphoviridae members infect Salmonella hosts.

Serovars of Salmonella are widespread aetiological agents of food and waterborne diseases in humans and livestock. Salmonella enterica is classified into six subspecies by biochemical tests. Based upon serology of the lipopolysaccharide (O) and flagella (H) antigens, over 2,600 serovars of Salmonella have been described [22]. Not surprisingly, Salmonella phages are also numerous and varied. The morphology of 177 Salmonella phages was reviewed in 2007 [2] (updated in [55]), with most representatives belonging to the order Caudovirales (tailed phages with a dsDNA genome) and its three families, Siphoviridae, Myoviridae and Podoviridae. A small number of phages representing the families Inoviridae, Leviviridae, Microviridae and Tectiviridae have also been documented.

A number of Salmonella phages were originally isolated for the purpose of phage typing. Phage typing represents a useful epidemiological tool whereby strains of a particular serovar may be differentiated into phage types based on susceptibility to a panel of bacteriophages. Phage Jersey was originally isolated and used by Felix and Callow in the development of a typing scheme for Salmonella paratyphi B [1].

Jersey has a morphotype that is recognizable by electron microscopy, with a tail-terminal disk-like structure with six club-shaped spikes [1, 2, 14]. To date, almost all phages examined that exhibit the Jersey-type morphotype have been restricted to Salmonella and Escherichia, the sole exception to this rule being Serratia phage η [15]. No phages with this morphotype have been observed to infect other Gram-negative genera, including Aeromonas, Pseudomonas, Vibrio, or Rhizobium (Table 1). The genome of phage Jersey has recently been sequenced, and discontiguous MEGABLAST analysis revealed similarity to a number of siphovirus genomes deposited in GenBank.

Table 1 Jersey-like phages in the literature: (I) phages for which morphology and genome sequence are known, (II) phages with sequence only, and (III) phages with morphology only

On the basis of an in-depth bioinformatics analysis, this work proposes the creation of three new genera within a novel subfamily, “Jerseyvirinae”.

Results

Morphological characteristics

Members of this subfamily that have been examined by transmission electron microscopy are typical siphoviruses with isometric heads of 60 nm in diameter between opposite apices and long, non-contractile tails of 120 × 8 nm (Table 1). Capsids appear as hexagonal or pentagonal and are thus icosahedral in shape. Tails have about 27 transverse striations with a thin, 17-nm-wide baseplate with 5-6 spikes of 10 × 3 nm. Capsid and tail dimensions differ slightly from those of phosphotungstate-stained phage Jersey (head, 68 nm; tail length, 116 nm), which were obtained in 1970 without the benefit of rigorous electron magnification calibration. A morphological diagram of Jersey-like phages is presented in Fig. 1. Phages exhibiting a Jersey-like morphology have been described many times in the literature (Table 1).

Fig. 1

Scale drawing of Jersey-like phage (A). Representative negatively stained TEM images of phage Jersey (B) and K1H (C). Scale bars are 100 nm and 20 nm, respectively

Comparative genomics and proteomics of bacteriophages belonging to the proposed subfamily “Jerseyvirinae”

Phages considered in this manuscript are summarised in Table 2. Each was initially linked by similar morphology and discontiguous MEGABLAST analysis.

Table 2 Bacteriophages with fully sequenced genomes belonging to the proposed subfamily “Jerseyvirinae”

In addition to phages with completely sequenced genomes, several phages with partial sequences in GenBank were also identified using discontiguous MEGABLAST: ST4 [JX233783], L13 [KC832325] and MB78 [AY040866, AF156970, AF349435, AJ277754, AJ249347, AJ245858, AJ245537, X87092, X86562, Y19202, Y19203, Y18133]. These phages are considered tentative members, subject to morphological examination and complete sequencing. Two Salmonella phages, FSL SP-038 and FSL SP-049, are also represented by partial genomic sequences in GenBank. Analyses suggest that they are part of the genus “Sp3unalikevirus”.

Each of the phages listed in Table 2 was colinearized and then examined for overall sequence similarity using progressiveMauve [12] (data not shown), and DNA sequence identity using EMBOSS Stretcher [44] and CLUSTALW [33], which have been widely used for nucleotide sequence alignment of virus genomes [4, 41, 58]. Using the latter methodology, three clades were clearly defined (Fig. 2).
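As a minimal sketch of how a pairwise DNA identity value of this kind can be derived from a global alignment, the following Python function counts identical columns in two sequences that have already been aligned. The function name, the toy sequences, and the choice to divide by the full alignment length (gap columns included) are illustrative assumptions and do not reproduce the EMBOSS Stretcher implementation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Assumption: both inputs come from the same global alignment, so they have
    equal length and use '-' for gaps. Gap columns count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment (hypothetical sequences, not phage data): 4 of 6 columns match.
print(round(percent_identity("ACGT-A", "ACCTTA"), 2))  # 66.67
```

Applied to every pair of colinearized genomes, such values would populate an identity matrix from which groupings like the three clades can be read.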

Fig. 2

Clustal analysis reveals that the Jersey-like phages fall into three distinct groups, for which we propose three genera, “Jerseylikevirus” (red), “K1glikevirus” (dark blue), and “Sp3unalikevirus” (light blue), within a proposed subfamily, the “Jerseyvirinae”. The genomes were colinearized before analysis. E. coli phage HK578 was included as an outlier. The scale bar represents 0.09 substitutions per site (color figure online)

In addition, the viral proteomes were subjected to pairwise comparisons using CoreGenes 3.0, which provided a measure of the total similarity at the protein level based on pairs of proteins scoring above a pre-defined BLASTP bit score threshold [64]. The CoreGenes and Stretcher results are presented in Table 3. Lastly, using the Markov clustering algorithm OrthoMCL [37] with an e-value threshold of 1e−5 and an inflation value of 1.15, all of the proteins encoded by members of the proposed subfamily were examined (data not shown). Functional annotations of proteins encoded by Jersey-like phages were obtained using BLAST tools and HHpred [50, 51]. Transmembrane domains were identified using TMHMM [31], and conserved domains were identified using Pfam [16] and InterProScan [27]. In line with the 95 % DNA sequence identity threshold used to delineate phage species specified by the Bacterial and Archaeal Viruses Subcommittee of the ICTV, the genus “Jerseylikevirus” comprised 10 species, whose sequence identities range from 71.6 % (FSL SP-101) to 67.6 % (SETP13) and have between 79.7 % and 68.1 % of their proteins in common relative to Jersey (Table 3).

Table 3 Proteome and nucleotide sequence similarity of members of the “Jerseyvirinae”

The proposed genus “K1glikevirus” consists of five viruses: K1H, K1G, K1ind1, K1ind2 and K1ind3. They share between 79 % and 97.1 % DNA sequence identity and a minimum of 84 % homologous proteins. Relative to phage Jersey, members of the genus “K1glikevirus” exhibit at minimum 58 % nucleotide sequence identity. Despite exhibiting significant protein homology (66.7 %), the DNA sequence of FSL SP-031 shows only 53.4 % identity to Jersey, a difference sufficient to indicate that this phage is distinct from other members and to warrant the creation of a separate genus, “Sp3unalikevirus”, within the subfamily.

Based on DNA sequence alignment, protein homology, morphology, and genome organisation, the Jersey-like phages form three distinct groups, united by 53 % DNA sequence identity, and these groups are proposed to represent three genera: “Jerseylikevirus”, “Sp3unalikevirus” and “K1glikevirus” (Fig. 2). In the following sections, the common properties of these viruses, which belong to the proposed subfamily “Jerseyvirinae”, are discussed alongside the specific properties of the members of the three genera.

Protein phylogeny

Phylogenetic trees were constructed to investigate common proteomic features of members of the “Jerseyvirinae” for the large terminase subunit (TerL), portal protein, DNA polymerase (DpoI), helicase, major capsid, and major tail (MTP) proteins (Fig. 3). Analysis of the helicase proteins indicated that this group of phages is phylogenetically related and distinct from the HK578-like viruses. The trees constructed using the TerL, portal and DpoI sequences clearly indicate that this group of viruses can be subdivided into three clades. Analysis of the major capsid proteins and MTPs reveals that viruses in the K1G clade are significantly different from those in the “Jerseylikevirus” group. Interestingly, the K1H major capsid protein is distinct from those of other members of the subfamily, a feature corroborated by the OrthoMCL groupings.

Fig. 3

Phylogenetic analysis of the “Jerseyvirinae” common proteins produced using www.phylogeny.fr. Branch length is proportional to the number of substitutions per site

Common features of members of the “Jerseyvirinae”

The genomes of members of the proposed subfamily “Jerseyvirinae” range in size from 40.7 to 43.6 kb, with a G+C content ranging from 49.6 to 51 %. They encode between 48 and 69 proteins but no tRNAs. These phages appear to be strictly lytic/virulent, as none have been shown to be capable of lysogeny or to harbour homologues of known integrases, recombinases or excisionases. As with most phages, their genomes exhibit a modular structure, and the organisation of genes shows a high degree of synteny across all members (Fig. 4). Each genome may be divided into four modules on the basis of the predicted functions of the component genes: (i) virion structure and assembly, (ii) regulation/immunity, (iii) genome replication and (iv) host lysis. Despite the presence of a number of deduced proteins of unknown function, the roles of some gene products can be predicted on the basis of BLAST and HHpred searches or the presence of conserved domains.
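The G+C content values quoted above are simple base-composition statistics, as is the GC skew plotted in the genome map. A minimal Python sketch follows, assuming a plain nucleotide string as input and ignoring ambiguous bases; the function names and the toy sequences are illustrative and are not taken from any of the tools cited here.

```python
def gc_content(seq: str) -> float:
    """G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C), for one sequence window."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0  # assumption: report zero skew for windows without G or C
    return (g - c) / (g + c)


# Toy sequences (hypothetical, not phage data).
print(gc_content("ATGC"))  # 50.0
print(gc_skew("GGGC"))     # 0.5
```

For a whole genome, the skew would typically be evaluated in sliding windows rather than over the full sequence at once.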

Fig. 4

Genetic and physical map of Salmonella phage Jersey prepared using CGView [21]. For sequence similarity comparison, TBLASTX was used versus wksl3 (red), FSL SP-031 (dark green) and K1H (dark blue). GC content is depicted in black, while positive and negative GC skew are denoted by green and purple, respectively (color figure online)

Using OrthoMCL, the 1,057 proteins encoded by the 18 members of the three proposed genera were found to form 90 clusters of orthologous proteins and 51 singletons. Of these, 25 clusters were present in all members of the three proposed genera, representing conserved proteins in the structural, replicative and lysis gene modules. In addition to these ‘core’ genes/proteins, a further 1, 5 and 12 clusters were present in all members of “Jerseylikevirus”, “K1glikevirus”, and “Sp3unalikevirus”, respectively. The structural gene module represents the most highly conserved region with 17 ortholog clusters. Almost half of the genes encoded by the “Jerseyvirinae” phages are devoted to genome packaging, assembly and structure of mature virions. The structural module follows a strongly conserved gene order, encoding genes involved in DNA packaging, followed by the virion capsid, tail and adsorption apparatus in an arrangement reminiscent of that observed in many phages.
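The distinction between core clusters and genus-level conserved clusters can be illustrated with a short Python sketch. It assumes that the clustering output has already been reduced to a mapping from cluster identifier to the set of phages with at least one member protein; the function name, the data layout, and the toy identifiers are hypothetical, and a cluster is counted for a genus here when it occurs in every member of that genus but is not part of the core.

```python
def conserved_clusters(clusters, genera):
    """Split clusters into a subfamily-wide core and genus-wide sets.

    clusters: dict mapping cluster id -> set of phage names in that cluster
    genera:   dict mapping genus name -> set of phage names in that genus
    """
    all_phages = set().union(*genera.values())
    # Core clusters contain at least one protein from every phage.
    core = {cid for cid, members in clusters.items() if all_phages <= members}
    # Genus-level clusters occur in every member of the genus but are not core.
    per_genus = {
        genus: {
            cid
            for cid, members in clusters.items()
            if cid not in core and phages <= members
        }
        for genus, phages in genera.items()
    }
    return core, per_genus


# Toy input (hypothetical identifiers, not the actual OrthoMCL output).
genera = {"genusA": {"p1", "p2"}, "genusB": {"p3"}}
clusters = {
    "c1": {"p1", "p2", "p3"},  # present everywhere: core
    "c2": {"p1", "p2"},        # conserved within genusA only
    "c3": {"p3"},              # conserved within genusB only
    "c4": {"p1"},              # accessory: not conserved in any genus
}
core, per_genus = conserved_clusters(clusters, genera)
print(sorted(core))
print({genus: sorted(ids) for genus, ids in per_genus.items()})
```

Proteins that fall into no cluster at all correspond to the singletons mentioned above and would not appear in the input mapping.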

Several features are of interest within the morphogenesis module. Using phage Jersey as reference, the major capsid protein (gp12) shows structural similarity to Bacillus phage SPP1 and coliphage HK97 (Protein Data Bank [PDB] accession numbers: 4an5 and 3p8q, respectively), suggesting that the capsid protein shares the evolutionarily conserved HK97-like fold [17]. Three gene products showed structural similarity to Ig-set domain proteins when analyzed using HHpred. Two of these proteins, Jersey gp7 (AGP24895) and SETP13 gp12 (AGX84616), returned matches to the Hoc protein of enterobacteria phage RB49 (PDB 3shs) while a third, Jersey gp13 (AGP24902), returned a match to the head fibre of Bacillus phage Φ29 (PDB 3qc7). Ig-like domains are found widely in members of the order Caudovirales and are predominantly associated with structural proteins, suggesting that these proteins play a role in capsid assembly or completion and may also be involved in nonspecific binding to host cells [18]. The presence of a weak match to an Ig-like I-set domain (pfam: PF07679) in gp07 and the proximity of its gene to the gene coding for the major capsid protein suggest that this gene might encode a capsid decoration protein [52]. Head decoration proteins have been described in a number of phages, such as L, λ and ES18, and are thought to aid in the stabilisation of the capsid structure [20]. However, to date, none of the structural proteins present in mature Jersey-like virions have been identified using mass spectrometry.

Using OrthoMCL, it was found that two coding sequences positioned immediately upstream of the gene coding for the putative tape measure form a single cluster that is present in all members of the three proposed genera. The positioning of these genes is similar to that of the λ tail assembly chaperones gpG and gpGT. In λ, gpG and the translational frameshift product gpGT interact with the tape measure and major tail proteins and are required for correct tail formation [60–62]. Analysis of the putative gpG and gpGT genes in phage Jersey (AGP24914 and AGP24915) using MFOLD and HPKNOTTER provided evidence for the presence of a stem-loop structure and a pseudoknot, respectively, suggesting that the Jersey-like phages produce a gpGT-like fusion product.

Using PSI-BLAST, it was found that the putative tail fibre (Jersey gp32, AGP24920) shows distant similarity to the central tail fibre gpJ of phage λ and p33 of phage T1 and is conserved among all members of the “Jerseyvirinae”. However, a central tail tip fibre has not been observed in electron micrographs, so the precise role and location of this protein remain unclear. This protein may in fact form the virion baseplate in conjunction with the product of a gene (Jersey gp31, AGP24919) encoded immediately upstream that exhibits similarity to the endolysin of Pseudomonas phage ϕKZ (PDB: 3bkh), suggesting a role in cell wall degradation.

In all members of the “Jerseyvirinae”, the morphogenesis gene module is interrupted after the gene coding for the major tail protein by a cassette of between 1 and 5 genes encoded in the opposite orientation, in which no single protein is present in all members of the subfamily. With the exception of phages SS3e and SP-031, all members of the “Jerseyvirinae” encode a putative serine/threonine protein phosphatase in this region (pfam:PF12850, metallophos_2), which, according to HHpred, is related to the NinI protein phosphatase of λ (PDB 1g5b).

Three critical units involved in genome replication, a replicative family A DNA polymerase (pfam: PF00476), a DEAD-box helicase-primase (pfam: PF13481) and a helicase (pfam: PF00176), are also conserved in all members of the proposed subfamily. Analysis of the replication gene cluster using HHpred provides additional evidence for the presence of a Holliday junction resolvase (PDB 1hh1), a helicase subunit (PDB 3h4r), and a helix-destabilising ssDNA-binding protein similar to gp2.5 of phage T7 (PDB 1je5).

Finally, the lysis gene cluster codes for proteins facilitating lysis of the infected host cell, allowing egress of newly formed virions into the surrounding environment. The lysis or late gene module represents an area of significant divergence among the three genera and between individual phages. The module is replete with ORFs of unknown function, and only three gene products, endolysin, holin, and a protein of unknown function, are conserved across the subfamily. While the holin formed a single cluster when analyzed using OrthoMCL, the number of predicted transmembrane domains differed between phages. On the basis of predicted transmembrane domains, phages AG11, Jersey, SE2, SETP3, wksl3, Ent2 and SP-031 are presumed to encode class I holins, while Ent1, Ent3, SP-101, SETP7, SETP13 and SS3e [63] encode class II holins. The endolysin belongs to the glycoside hydrolase 24 family (muramidase, pfam:PF00959). Many of the lysis gene products are shared between only some members of each genus, while others are found in a limited number of representatives of one or more genera, suggesting that this region has been the site of frequent genetic exchange.

Description of individual genera

“Jerseylikevirus”

This proposed genus is named after the first characterized phage of this morphotype, Salmonella phage Jersey [1]. Members of the genus are distinguished by a distinct morphology, a similar genome size among its members, a conserved gene organisation, and the use of a P22-like tailspike to facilitate host recognition and adsorption, the last of which distinguishes them from members of the genera “K1glikevirus” and “Sp3unalikevirus”, as do the additional 25 accessory genes that are shared between members of two or more species. To date, 12 members of this genus, isolated from four different countries, have been fully sequenced and annotated (Table 2). Members of the genus “Jerseylikevirus” have an average G+C content of 49.79 %, slightly lower than the average of 52 % reported for serovars of Salmonella enterica [40].

Host specificity is conferred by six tailspikes, observed as short clubs attached to the tail terminus. Like the tailspikes of podophages P22, Sf6 and HK620, they exhibit a modular design consisting of a conserved N-terminal binding domain and a P22-like C-terminal catalytic domain (data not shown). In phage P22, the tailspike facilitates adsorption to Salmonella O-antigen 12 of the cell surface LPS, which is expressed by members of White-Kauffmann-Le Minor serogroups A, B and D1 [22]. The high degree of conservation of the P22-like domain and catalytic residues, combined with host-range data for SETP3, SETP7, SETP13, vB_SenS-Ent1 and wksl3 [14, 29, 56], suggests that “Jerseylikevirus” members isolated to date are limited to these Salmonella serogroups. However, SS3e appears to be an exception: it has been reported to be capable of lysing enterobacteria other than Salmonella, including E. coli, Enterobacter cloacae, Shigella sonnei, and Serratia marcescens [56].

Several members of the proposed genus “Jerseylikevirus” (Jersey, Ent1, Ent2, Ent3, SE2, wksl3 and SP-101) encode a putative DNA-binding protein containing one or both of the Pfam-family domains ANT (PF03374) and pRha (PF09669). In phage P22, ant encodes an anti-repressor, which inhibits binding of the c2 repressor to the PL and PR operators, enabling the expression of genes necessary for lytic development [9]. The pRha domain represents a family of proteins whose expression is detrimental for lytic growth in the absence of integration host factor function [54].

A gene product with similarity to inner membrane immunity (Imm) proteins, which protect against superinfecting phages [32], is also found in all Jersey-like phages except Jersey and AG11. This gene product contains an Imm_superinfect motif (pfam: PF14373), is predicted to localize at the cytoplasmic membrane (PSORTb), and contains two transmembrane domains (TMHMM). Only the putative immunity protein [AAZ41745] is annotated in SS3e, although this appears to be due to an incomplete genome sequence rather than the absence of further protein-encoding genes in this region.

An interesting feature of members of “Jerseylikevirus” is that intein insertion is evident in the DNA helicase of phages vB_SenS-Ent1, Ent2, Ent3, SETP3 and SETP7, and also within the DNA polymerase of phages FSL SP-101, Ent1, Ent2, Ent3, SE2 and SETP3 (data not shown). Inteins are protein sequences embedded within a precursor polypeptide that, upon translation, catalyze their own excision from the host polypeptide and ligation of the flanking sequences to yield two stable products: the mature protein (extein) and the intein [47].

In SE2 and SS3e, the DNA polymerase appears to be encoded by more than one gene. For SE2, two gene products, gp05 (AEX56144) and gp06 (AEX56145) have predicted DNA polymerase activity. The large subunit, gp06, is predicted to possess an intein similar to that found in SETP3 and vB_SenS-Ent1. Like SE2, the DNA polymerase of SS3e also appears to be split into two coding sequences (gp41 and gp43), although in this case, the subunits are interrupted by an additional gene (gp42; AAW51247) with a predicted C-terminal HNH endonuclease domain. Notably, all three gene products show a short match to either the N- or C-terminal end of the SETP3 intein. However, re-examination of these coding sequences suggests that the DNA polymerase in SS3e and SE2 has been split due to sequencing or assembly errors.

A total of 15 accessory genes are encoded within the lysis/late gene cassette, of which only three have homologues with predicted functions: a putative protease (9 members), an HNH homing endonuclease (7 members) and a putative RNA-binding protein (2 members).

“K1glikevirus”

The Escherichia phages K1G, K1H, K1ind1, K1ind2 and K1ind3, comprising the proposed genus “K1glikevirus”, are described as K1-dependent or -independent, denoting the requirement for the K1 capsule for productive infection [8]. The GC content of their genomes is slightly higher than that of members of “Jerseylikevirus”, ranging between 51.1 and 51.5 %, a value closer to that of their host, E. coli (50.8 %). Moreover, they encode fewer proteins than do members of “Jerseylikevirus”.

Members of “K1glikevirus” possess tailspikes, which are of similar size to those of members of “Jerseylikevirus”. These proteins exhibit similarity to the conserved N-terminal domain identified in members of “Jerseylikevirus” but possess divergent C-terminal domains, indicative of their different host specificity. Tailspikes from the K1-independent phages K1ind1, K1ind2 and K1ind3 exhibit high similarity to the HK620 tailspike [PDB 2x6w] and appear to belong to the pectate lyase 3 family (pfam: PF12708). The HK620 tailspike possesses endo-N-acetylglucosaminidase activity, which degrades the O-antigen of E. coli serotype O18A1 [6]. In contrast, the tailspikes encoded by the K1-dependent phages K1G and K1H exhibit endo-N-acyl-neuraminidase (endosialidase) activity [8] and are nearly identical to the tailspike of the T7-like phage K1F (PDB 3ju4). Endosialidases bind to and degrade the K1 capsular polysaccharide, a homopolymer made up of α2,8-linked sialic acid residues [6]. Each of the K1-dependent tailspikes is predicted to contain a C-terminal Peptidase_S74 domain (PF13884), which shows homology to the protease domains of the K1F, K1E and K1-5 tailspikes as well as the long tail fibre of T5. In these phages, this domain has been shown to function as an intramolecular chaperone, whose presence and subsequent auto-cleavage are essential for folding and assembly of mature proteins [49]. These data indicate that the K1-dependent tailspikes undergo a maturation process that is distinct from those of their counterparts in the genus “Jerseylikevirus” and K1-independent members of the genus “K1glikevirus”.

With the exception of the tailspike proteins and capsid-associated genes in K1H, members of “K1glikevirus” share the complete structural module with phages of the genus “Jerseylikevirus”. Phage K1ind3 has an HNH homing endonuclease gene immediately downstream of a gene coding for a gp7 family morphogenesis protein (gp04) related to Bacillus phage SPO1 (PDB 1u3e). K1G encodes another homing endonuclease (gp12) with 31 % identity to MobE of coliphage T6. Notably, the K1H major capsid sequence differs substantially from those of other “Jerseyvirinae” phages, but it is predicted to have a similar structure to SPP1 and HK97, according to HHpred. A further gene product of unknown function, K1H gp09, which was not found in other members of the proposed subfamily, is encoded immediately downstream of the gene coding for the major capsid protein (K1H gp08, ADA82303).

Three gene products are conserved in all K1G-like phages in the immunity gene module: a serine/threonine protein phosphatase (pfam:PF12850), an acid-phosphatase B domain protein (pfam: PF03767), and a protein predicted to have ATPase activity (pfam: PF13207). With the exception of K1H, all members of “K1glikevirus” encode a superinfection immunity protein. Jersey, AG11 and K1H do not appear to harbour an immunity protein; instead, these phages encode a hypothetical protein forming a single OrthoMCL cluster.

In addition to the core replication proteins of members of the “Jerseyvirinae”, each of the K1G-like phages encodes a C-5 cytosine-specific methylase (pfam:PF00145) that bears little sequence similarity to the methylase of phage Jersey or FSL SP-101 and was grouped into a separate cluster using OrthoMCL. All K1G-like phages except K1ind1 encode a putative UvsX-like protein. In coliphage T4, UvsX functions as a RecA-like recombinase, interacting with the UvsY helicase to promote strand-exchange during genome replication [19].

Finally, a total of seven proteins are conserved in all K1G-like phages in the lysis or late gene module, three of which are conserved across the subfamily: holin, lysin and a hypothetical protein. Unlike the members of “Jerseylikevirus”, each of the K1G-like phages is predicted to encode a separate class I and class II holin, with three and two transmembrane domains, respectively. Each K1G-like phage has a NinH-like domain protein (pfam:PF06322) in addition to two gene products of unknown function. Lastly, a gene product found only in the K1-independent phages contains a DUF3850 family motif (pfam:PF12961), which has been suggested to be involved in RNA recognition.
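The holin classes referred to in this work follow from the number of predicted transmembrane domains: three for class I and two for class II. As a small illustration, the Python sketch below maps a domain count, such as one taken from a TMHMM prediction, to a class label. The function name and the "unclassified" fallback are assumptions made for this example.

```python
def holin_class(n_transmembrane_domains: int) -> str:
    """Assign a holin class from the number of predicted transmembrane domains."""
    if n_transmembrane_domains < 0:
        raise ValueError("number of transmembrane domains cannot be negative")
    if n_transmembrane_domains == 3:
        return "class I"
    if n_transmembrane_domains == 2:
        return "class II"
    # Assumption: any other count is left unassigned rather than guessed.
    return "unclassified"


for count in (3, 2, 1):
    print(count, holin_class(count))
```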

“Sp3unalikevirus”

This genus was recently proposed by Moreno Switt et al. (2014). To date, three members (FSL SP-031, FSL SP-038, and FSL SP-049) have been reported, though only the genome of phage FSL SP-031 is fully sequenced, hence the proposed genus “Sp3unalikevirus” [42]. Phages in the genus “Sp3unalikevirus” were isolated from dairy farms in the state of New York with a history of Salmonella isolation [43]. Sp031-like viruses have a GC content of 51.1 %, closer to the Salmonella G+C value of approximately 52 % [46].

The members of “Sp3unalikevirus” are distinguished from members of the other genera by the presence of a number of genes encoding hypothetical products and genes associated with host specificity. The genes coding for a number of unique hypothetical proteins of unknown function are located in the replication module of SP-031-like viruses. Two annotated proteins that are absent in the “Jerseylikevirus” and “K1glikevirus” phages are homing endonucleases, whose genes are inserted in the replication module of FSL SP-031. One endonuclease shows 41 % identity over 93 % of the protein to an endonuclease of Synechococcus phage S-SSM7 (GenBank YP_004324330.1), and the other, which carries the homing endonuclease domain PF13392, shows 42 % identity over 89 % of the protein to an endonuclease encoded in the genome of Salmonella phage FSL SP-126 (GenBank AGF87903.1). Another distinguishing feature of Sp031-like viruses is the presence of a gene encoding a putative phosphoadenosine phosphosulfate (PAPS) reductase (pfam: PF01507) in the virion assembly module. PAPS reductases are involved in the reduction of sulfate to hydrogen sulfide, which is generated by Salmonella in the gut of humans and animals [59]. While PAPS reductases have previously been reported in temperate phages (e.g., BcepB1A [53]), there is no evidence that this enzyme confers a fitness advantage to the host.

Sp031-like phages have a very narrow host range. When 23 different Salmonella serovars were tested, only Salmonella Cerro was lysed [43]. This host specificity is probably determined by the tail spikes in members of “Sp3unalikevirus”. A BLAST search with the tail spike amino acid sequence of FSL SP-031 returned as the best hit (78 % over 80 % of the protein) a tail spike of a P22-like prophage found in the genome of S. Cerro str. 818 (GenBank ESH26034), with divergence at the N-terminal end. This prophage shows no further sequence identity to any other CDSs of Sp031-like phages. To investigate the potential genetic basis of this host range, the tail spike amino acid sequences of representative Jersey-like viruses (i.e., Jersey, L3, SETP7, SETP3, SE2, Ent2, SETP12, and FSL SP-101) and Sp031-like viruses (FSL SP-031) were aligned, revealing a conserved N-terminus (approx. 87 % aa identity) and a divergent C-terminus (approx. 15 % aa identity). This finding corresponds to the host specificity reported: while other Salmonella phages in the subfamily “Jerseyvirinae” infect serogroups A, B, and D, Sp031-like viruses only infect serogroup K.

Discussion

The availability of an increasing number of genome sequences and improvements in gene prediction methods has resulted in a sizeable shift towards the inclusion of genomic and proteomic data for the taxonomic classification of bacteriophages [36]. However, a number of different, and sometimes conflicting, approaches have been reported in the literature for the purpose of delineating evolutionary relationships between phages, including the use of proteomic trees [48], analysis of shared homologous/orthologous proteins [34, 35], and reticulate classification based on gene content [38, 39].

Our analysis revealed that the classification criteria introduced for members of the families Podoviridae and Myoviridae [34, 35], based upon the existence of protein homologs, tend to “lump” taxa rather than represent the true taxonomic relationship within a genus. Using an alternative approach, employing NCBI BLASTN and TBLASTX, a genus can be defined as a group in which members share ≥65 % DNA sequence identity, while members of the same subfamily should show ≥40 % protein homologs (Kropinski, Edwards and Mahadevan, unpublished results). These values are those derived by Niu et al. in their assessment of the T1-like viruses [45].
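These thresholds can be expressed as a simple decision rule. The Python sketch below is illustrative only: the function name, the returned labels, and the order in which the criteria are tested are assumptions, and the 95 % species threshold is the one cited in the Results.

```python
def classify_pair(dna_identity: float, protein_homologs: float) -> str:
    """Suggest the lowest shared taxonomic rank for two phages.

    dna_identity:     pairwise DNA sequence identity, in percent
    protein_homologs: proportion of homologous proteins, in percent
    """
    if dna_identity >= 95.0:
        return "same species"
    if dna_identity >= 65.0:
        return "same genus"
    if protein_homologs >= 40.0:
        return "same subfamily"
    return "no shared rank at subfamily level"


# FSL SP-031 relative to Jersey: 53.4 % DNA identity, 66.7 % protein homology.
print(classify_pair(53.4, 66.7))  # same subfamily
```

With the values reported for FSL SP-031 relative to Jersey, the rule places the two phages in the same subfamily but in different genera, in line with the proposal of a separate genus.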

Phylogenetic analysis also demonstrates that the Salmonella and Escherichia phages described here fall into three clusters, substantiating the establishment of three genera. Analysis using EMBOSS Stretcher demonstrated that while phages within each genus were closely related, a significant relationship based on shared homologous proteins identified using CoreGenes and OrthoMCL existed between genera. Based upon these data, we propose the establishment of the subfamily “Jerseyvirinae” within the family Siphoviridae, comprising three genera, “Jerseylikevirus”, “K1glikevirus” and “Sp3unalikevirus”. The three genera of closely related phages were isolated from geographically disparate locations. This suggests that phages of this subfamily are widely distributed and have persisted and evolved in various environments. Considering the length of time between the isolation of Jersey and that of the other members, it appears that horizontal gene transfer does not always mask the identification of taxonomic relationships between phages. Grose and Casjens have recently clustered 337 phages infecting various members of the family Enterobacteriaceae [23] based upon average nucleotide sequence identity and conserved gene product content in addition to whole-genome nucleotide and amino acid dotplots. They report a “SETP3 supercluster”, formed from five clusters of lytic phages: SETP3-like, SO-1-like, ECO1230-10-like, Gj1-like and PY100-like phages. Our independent findings are in broad agreement with those reported by Grose and Casjens but provide a more detailed analysis of the phages comprising the SETP3-like subcluster.

All Jersey-like phages isolated to date exhibit a strictly lytic lifestyle and encode no proteins with homology to known toxins or allergens. As such, members of this subfamily appear suited to biocontrol applications [7, 24], particularly members of “Jerseylikevirus”, which exhibit a broad host range encompassing important serogroups of Salmonella.

Undoubtedly, more Jersey-like bacteriophages will be isolated in the future. The authors hope the data provided here will act as a starting point for the annotation of these future isolates.