Introduction

Baculoviruses are DNA viruses of the arthropod-specific virus family Baculoviridae [1]. Members of Baculoviridae have been isolated primarily from the insect order Lepidoptera (moths and butterflies), but some baculoviruses have also been described from mosquitoes, sawflies, and shrimp. Currently, baculoviruses are divided into the genera Nucleopolyhedrovirus and Granulovirus [2]. Both nucleopolyhedroviruses (NPVs) and granuloviruses (GVs) produce two structurally and functionally distinct types of virions: an occluded form that initiates primary infection of the host insect, and a budded form that spreads infection to other tissues of the infected host. NPVs produce cuboidal polyhedra containing many occluded virions, while GVs produce smaller spheroid occlusions containing a single virion. The GVs have been isolated exclusively from Lepidoptera. GVs cause three distinct types of pathology in infected hosts [3]. Type 1 GV pathology is characterized by an infection limited to the host’s midgut and fat body resulting in a relatively slow speed of kill. Type 2 GV pathology is characterized by infection of most of the host’s tissues and a rapid speed of kill. A single GV, the Harrisina brillians granulovirus, causes a third type of pathology resulting in an infection that is constrained to the midgut epithelium and that results in the rapid death of the host. Phylogenetic analysis of GV sequences suggests that these different types of GV pathogenesis do not have monophyletic origins [4].

Baculovirus genomes consist of a single large circular double-stranded DNA molecule, ranging in size from 80 to 180 kbp. Several NPV genome sequences have been reported, but to date complete genome sequences have only been reported for six GVs [5–10], with genome sequences for another three GVs (from Phthorimaea operculella, Agrotis segetum, and Spodoptera litura) on file in GenBank.

A GV from the Old World bollworm, Helicoverpa armigera, was first described by Whitlock [11]. This virus, H. armigera granulovirus (HearGV), kills larvae of H. armigera slowly and appears to cause type 1 GV pathology. HearGV exhibits a relatively broad host range in bioassays compared to other GVs [12]. Of particular interest is the interaction between HearGV and NPVs in mixed infections. The LC50 of Lymantria dispar multiple nucleopolyhedrovirus (LdMNPV) against L. dispar larvae is reduced when LdMNPV polyhedra are mixed with HearGV granules [13], probably due to the action of enhancins in HearGV granules [14]. However, mixed infections of H. armigera larvae with H. armigera single nucleopolyhedrovirus (HearSNPV) and HearGV result in lower mortalities than infections with either HearSNPV or HearGV alone [15]. In co-infections of Helicoverpa zea with HearGV and H. zea single nucleopolyhedrovirus (HzSNPV), survival times of larvae increase with an increasing dose of HearGV, even when HearGV is applied 36 hr after infection with HzSNPV [16]. Cadavers from co-infected larvae contain HearGV granules instead of HzSNPV polyhedra, suggesting that HearGV inhibits HzSNPV replication.

To add to our knowledge of granulovirus molecular genetics and to acquire preliminary information for studies into HearGV’s broad host range and ability to interfere with NPV infection, the complete genome of HearGV was sequenced and analyzed.

Materials and methods

Viral DNA extraction and cloning into fosmid vectors

The HearGV isolate used in previous studies [13, 16] was amplified in Heliothis virescens larvae by per os infection. Viral particles were isolated by sucrose gradient centrifugation and DNA was subsequently extracted from the purified particles [17]. Because HearGV could not be propagated in cell culture, viral DNA from granules was sheared and cloned into the vector pCC1FOS using the Copy Control Fosmid Library Production Kit (Epicentre Biotechnologies). Fosmid DNA was purified as per kit instructions. Following the characterization of 60 clones by EcoRI, HindIII, BgII, NheI, and BamHI restriction endonuclease digest, a complete genomic library of HearGV was chosen consisting of seven overlapping fosmid clones.

DNA sequencing and analysis

The DNA of fosmid clones containing HearGV inserts was sheared and cloned into the pCR-Blunt II-TOPO plasmid vector (Invitrogen) as previously described [18]. White colonies from the cloning procedure were picked, thermally lysed, and inserts amplified by PCR with vector-specific primers as described [18]. The PCR products were precipitated with 20% PEG-2.5 M NaCl to remove excess primers and dNTPs as described [18] and sequenced using nested plasmid vector-specific primers T7 (5′-GTAATACGACTCACTATAGGG-3′) and SP6 (5′-GCTATTTAGGTGACACTATAG-3′). Reactions were carried out with the Applied Biosystems BigDye Terminator Cycle Sequencing kit with AmpliTaq DNA polymerase, and fragments were electrophoresed on an Applied Biosystems 3100 DNA sequencer.

Contigs were assembled from DNA sequencing runs with the Seqman program of the Lasergene suite (DNASTAR, Inc.). Gaps and ambiguities in the genome sequence were resolved by amplifying the corresponding regions of the sequence from both fosmid and viral DNA by PCR (40 pg DNA/reaction) with custom-designed primers and sequencing the PCR products.

Open reading frames (ORFs) greater than 50 codons in length that did not overlap larger ORFs by more than 75 nt and were not present in a homologous repeat region (hr) were identified and selected for further characterization. ORFs with homologues in other baculovirus genomes also were characterized. Predicted amino acid sequence identities were obtained from the results of protein database searches using the standard protein-protein BLAST algorithm (http://www.ncbi.nlm.nih.gov/BLAST/).
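As a rough illustration of the ORF selection rule described above, the following Python sketch scans both strands for ATG-initiated ORFs of a minimum length and then discards ORFs that overlap a larger ORF by more than 75 nt. It is not the software used in this study. The function names are hypothetical, the sequence is treated as linear rather than circular, the length threshold is applied inclusively, and the exclusion of hr regions is omitted; all of these are simplifying assumptions.

```python
# Hypothetical sketch of the ORF selection rule; not the authors' pipeline.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=50):
    """Return (start, end, strand) tuples for ATG-initiated ORFs.

    Coordinates are 0-based, end-exclusive, and given on the forward strand.
    Assumptions: the sequence is linear (no wrap-around), and min_codons
    counts coding codons without the stop codon (inclusive threshold).
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        end = i + 3  # include the stop codon
                        if strand == "+":
                            orfs.append((start, end, strand))
                        else:
                            # map reverse-strand coordinates back to forward
                            orfs.append((n - end, n - start, strand))
                    start = None
    return orfs


def overlap(a, b):
    """Number of nucleotides shared by two (start, end, strand) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def filter_overlaps(orfs, max_overlap=75):
    """Keep ORFs, largest first, that overlap kept ORFs by <= max_overlap nt."""
    kept = []
    for orf in sorted(orfs, key=lambda o: o[1] - o[0], reverse=True):
        if all(overlap(orf, k) <= max_overlap for k in kept):
            kept.append(orf)
    return sorted(kept)
```

In practice, candidate ORFs that fail these filters but have homologues in other baculovirus genomes would still be retained, as stated above; that step depends on database searches and is not sketched here.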

For phylogenetic inference, amino acid sequences derived from selected genes were aligned by ClustalW [19] using Gonnet matrices with a gap penalty of 10 and a gap extension penalty of 0.1 for pairwise alignments and 0.2 for multiple alignments. Sequence alignments for different genes were concatenated using BioEdit [20]. The concatenated amino acid alignments were used to construct phylograms with MEGA version 4.0 [21] using minimum evolution (ME) and maximum parsimony (MP) methods. ME and MP trees were sought by using a close-neighbor-interchange heuristic search, starting with either 1 initial neighbor-joining tree (ME) or 10 initial trees generated by random addition of sequences (MP). For ME trees, Poisson correction distances were estimated with a gamma shape parameter of 2.25. In both cases, the reliability of the trees was tested with bootstrap re-sampling using 1,000 replicates. To calculate Kimura-2-parameter nucleotide distances [22], nucleotide sequences were aligned by ClustalW using IUB matrices, gap penalties of 15 and gap extension penalties of 6.66 for pairwise and multiple alignments. Individual and concatenated nucleotide alignments were used to calculate distances using MEGA version 4.0.
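For readers unfamiliar with the Kimura-2-parameter model, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The Python sketch below computes this quantity directly. It is an illustration only and not a substitute for MEGA; the function name is hypothetical, and skipping any column that contains a gap or ambiguous base in either sequence is an assumption about gap handling.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura-2-parameter distance (substitutions per site) for an aligned pair.

    Assumption: columns with a gap or non-ACGT character in either sequence
    are ignored for this pair.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    arg = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if 1 - 2 * q > 0 else 0.0
    if arg <= 0:
        raise ValueError("sequences too divergent for the K-2-P correction")
    return -0.5 * math.log(arg)
```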

Results and discussion

Characteristics of the HearGV genome sequence

A 13.4X sequence of the HearGV genome was compiled from all sequence data generated. The size of the final draft sequence was 169,794 nt, which makes the HearGV genome the second largest baculovirus genome sequenced after the XecnGV genome, which is 178,733 nt [5]. The third largest baculovirus genome is that of Leucania separata nucleopolyhedrovirus, at 168,041 nt [23], followed by Lymantria dispar multiple nucleopolyhedrovirus (LdMNPV) at 161,046 nt [24]. The HearGV genome has an A+T content of 59.2%, which is closest to that of XecnGV (59.3%) and Plutella xylostella granulovirus (59.3%; [6]). Among GV genomes, only the Cydia pomonella granulovirus (CpGV; [7]) has a lower A+T content at 54.8%.
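The two summary statistics quoted above are simple ratios, sketched below in Python. The function names are hypothetical, and the treatment of ambiguous bases (excluded from the A+T denominator) is an assumption. As a point of reference, 13.4X coverage of a 169,794-nt genome corresponds to roughly 2.28 Mb of assembled read sequence.

```python
def at_content(seq):
    """Percentage of A and T among the unambiguous bases of a sequence."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (counts["A"] + counts["T"]) / total


def mean_coverage(read_lengths, genome_length):
    """Average sequencing depth: total read bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(read_lengths) / genome_length
```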

One hundred seventy-nine ORFs were identified that equaled or exceeded 50 codons in length and had minimal overlap with larger ORFs, or that shared significant sequence identity with previously characterized baculovirus ORFs (Table 1, Fig. 1). The ORF encoding granulin was designated as the first ORF (hear1), and the first nucleotide position for the HearGV genome sequence was set to the adenine of the granulin ORF initiation codon. As with other baculovirus genomes, the ORFs were randomly distributed with 90 ORFs in the granulin-sense orientation and 89 in the opposite orientation. Canonical baculovirus early and late gene promoter sequences were associated with 93 of the HearGV ORFs (Table 1).
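Because the genome is circular, choosing the granulin initiation codon as position 1 amounts to rotating the assembled sequence. A minimal Python sketch of that coordinate convention follows; the function name is hypothetical, and it assumes the assembly is already oriented so that granulin lies on the forward strand.

```python
def rotate_circular(seq, new_origin):
    """Rotate a circular sequence so that index new_origin (0-based) comes first.

    Assumption: the gene defining the origin is on the forward strand; a
    reverse-strand gene would first require reverse-complementing the sequence.
    """
    if not seq:
        raise ValueError("empty sequence")
    new_origin %= len(seq)
    return seq[new_origin:] + seq[:new_origin]
```

For example, rotating a toy sequence at the index of its ATG places that codon at the start while preserving the circular order of all other bases.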

Table 1 Features of the HearGV genome
Fig. 1

Map of the ORFs and other features of the HearGV genome. ORFs are represented by arrows, with the position and direction of the arrow indicating ORF position and orientation. The number of each ORF is displayed, with the name of the ORF following a colon. Homologous repeat regions (hrs) are represented by hatched boxes. The shading of the ORFs indicates the degree of predicted amino acid sequence similarity to XecnGV homologues

In regions of the HearGV genome sequence that were covered by more than one fosmid clone, sequence polymorphisms were detected. Amplification and sequencing of selected regions directly from viral DNA where polymorphisms were observed in the assembled sequence confirmed the presence of polymorphic sites in the viral DNA preparation from which the fosmids were constructed. No more than two different nucleotides were observed at polymorphic sites, even in regions covered by more than two fosmid clones. The polymorphisms numbered 333 substitutions and 24 insertions and deletions (indels) ranging from 1 to 71 nt in size. Of the 230 substitutions that occurred in ORFs, 126 were silent, while 104 resulted in amino acid changes. Of the fourteen indels occurring in ORFs, three indels did not alter the reading frame, while eleven caused frameshifts. This result indicated that the HearGV sample used in this study consisted of more than one genotype. Genotypic variation is common among field isolates of baculoviruses [10, 25–27].

Baculovirus genomes generally contain clusters of repeated sequences known as homologous repeat regions (hrs). The hrs function as enhancers of transcription and origins of replication [28, 29]. While NPV hrs typically consist of repeated palindromes, GV hrs are more variable in structure. Nine hrs were identified in HearGV that consisted of the same kind of repeated sequences as found in the hrs of XecnGV [5], with two conserved 10-bp core sequences. These repeats shared the same relative locations on the genome as the XecnGV repeats, and thus were given the same numbering designations. A large proportion of sequence polymorphisms was found in HearGV hr1, with 65 substitutions and 6 indels observed among shotgun clones from the fosmids containing this region. While HearGV and XecnGV hr1, hr5a, and hr6 contained the same number of repeats, hr2, hr3, hr4, hr5, hr7, and hr8 differed between the two viruses by one or two repeats. Differences in the number of repeats among the hrs of closely related viruses have been documented previously for several pairs of closely related NPVs [18, 25, 30–32].

Relationships with other granuloviruses

Roelvink and co-workers [14] previously sequenced a 3,213-bp region from a H. armigera granulovirus. This sequence is 100% identical with three 1-nt gaps to nt 146202–149415 of the HearGV genome reported here. BLAST searches with HearGV nucleotide sequences and ORF predicted amino acid sequences revealed that HearGV also was closely related to XecnGV. To examine the relationship of HearGV to other GVs, phylogenetic trees were inferred from a set of concatenated aligned partial amino acid sequences of granulin and late expression factors 8 and 9 (lef-8 and lef-9; Fig. 2) for HearGV, other completely sequenced GVs, Autographa californica multiple nucleopolyhedrovirus (AcMNPV-C6, [33]) and partial sequences from other GVs [4, 34]. This analysis confirmed the close relationship of HearGV and XecnGV. HearGV and XecnGV were placed in a clade of closely related GVs isolated from Lepidoptera of family Noctuidae, including Autographa gamma GV, Hoplodrina ambigua GV, Euxoa ochrogaster GV, and Scotogramma trifolii GV (Fig. 2). These viruses, along with XecnGV, are considered to be isolates of the same virus species [34].

Fig. 2

Phylogenetic analysis of concatenated amino acid sequence alignments, showing bootstrap values >50% for ME and MP trees at each node (ME/MP). The location of HearGV (bold) is indicated by an arrow. The consensus ME phylogram was inferred from concatenated partial polh, lef-8, and lef-9 sequence alignments for the following completely sequenced granuloviruses and other granuloviruses described in [34] and [4]: Choristoneura murinana GV (ChmuGV A11-1); Choristoneura occidentalis GV [10]; Pandemis limitata GV (PaliGV M36-1); Amelia pallorana GV (AcpaGV M30-1); Pieris rapae GV (PiraGV isolates M36-7 and S55); Pieris brassicae GV (PbGV S54); Andraca bipunctata GV (AnbiGV S48); Erinnyis ello GV (ErelGV M34-4); Clostera anachoreta GV (ClanGV S49); Phthorimaea operculella GV (accession number NC_004062); Adoxophyes orana GV (AdorGV isolates E1 [8], A2-3, A6-5, and S45); Plathypena scabra GV (PlscGV A25); Cryptophlebia leucotreta GV (CrleGV CV3 [9]); Cnephasia longana GV (CnloGV A2-2); Cydia pomonella GV (CpGV isolates M1 [7], A11-2, A6-4, and M39-1); Agrotis exclamationis GV (AgexGV S46); Agrotis segetum GV (AgseGV complete sequence accession number NC_005839 and isolates A17-5 and S47); Spodoptera litura GV (SpltGV K1, accession number NC_009503); Spodoptera androgea GV (SpanGV A25-7); Spodoptera frugiperda GV (SpfrGV A12-4); Peridroma morpontora GV (PemoGV A25-3); Trichoplusia ni GV (TnGV M10-5); Autographa gamma GV (AugaGV M39-3); Hoplodrina ambigua GV (HoamGV M39-2); Xestia c-nigrum GV (XecnGV α4 [5]); Euxoa ochrogaster GV (EuocGV A24-1); Scotogramma trifolii GV (SctrGV A26-3); Plutella xylostella GV (PlxyGV K1 [6]); Estigmene acrea GV (EsacGV M30-3); Hyphantria cunea GV (HycuGV A5-1); Autographa californica MNPV (AcMNPV C6 [33])

Jehle et al. [4] proposed a criterion for distinguishing among baculovirus species using nucleotide distances (in base substitutions per site) calculated with the Kimura-2-parameter (K-2-P) method for the three marker genes used in Fig. 2 for phylogenetic analysis (granulin, lef-8, and lef-9). Under this criterion, viruses with distances of <0.015 for single or concatenated sequences are considered to belong to the same species, while viruses with distances >0.05 are considered to be different species. For distances from 0.015 to 0.05, additional information is required to make a decision about species boundaries. K-2-P distances were calculated for the clade of viruses containing HearGV, TnGV, and the XecnGV isolates. The pairwise distances between HearGV and the XecnGV isolates ranged from 0.015 to 0.021 for lef-8 and from 0.015 to 0.022 for lef-9 (Table 2). For the more strongly conserved granulin gene, pairwise distances ranged from 0.010 to 0.012. The pairwise distances for the concatenated sequences ranged from 0.010 to 0.017. For all four sets of data, the nucleotide distances between HearGV and the XecnGV isolates were larger than the pairwise distances among the XecnGV isolates themselves (Table 2). However, because these distances straddle the 0.015 threshold, the results of this analysis do not lend themselves to a straightforward conclusion about the taxonomic position of HearGV with respect to XecnGV and other XecnGV variants.
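The demarcation criterion described above can be written as a simple three-way decision. The Python sketch below is illustrative only; the function name and the returned labels are hypothetical, and the thresholds are those quoted in the text.

```python
def classify_pair(distance, same_max=0.015, different_min=0.05):
    """Apply the K-2-P species demarcation thresholds quoted in the text.

    distance: pairwise K-2-P distance in substitutions per site.
    Returns one of three hypothetical labels.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance < same_max:
        return "same species"
    if distance > different_min:
        return "different species"
    return "undetermined: additional information required"
```

Applied to the values reported here, a granulin distance of 0.010 falls in the same-species range, whereas the upper lef-8 and lef-9 distances (0.021 and 0.022) fall in the intermediate range, which is why the marker-gene distances alone do not settle the question.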

Table 2 Estimates of pairwise nucleotide distances (base substitutions/site) of the nucleotide sequences of (A) lef-9 and lef-8 fragments and of (B) granulin and concatenated granulin/lef-8/lef-9 fragments of HearGV, TnGV, and XecnGV variants

Gene content and order

Gene-parity plot analysis [35] revealed a strong degree of co-linearity between HearGV and other GVs (data not shown). PlxyGV and CpGV are missing many of the ORFs found in HearGV, which is expected given the significantly smaller sizes of the genomes of these GVs. Comparison of HearGV with AcMNPV-C6 revealed that the order of some ORFs was conserved between the two viruses, but the orientation of a large proportion of these ORFs was inverted relative to the polyhedrin gene. These results are similar to those of previous analyses of GV and NPV gene order [6–10].

All 62 genes common among lepidopteran baculovirus genome sequences as of 2006 [2] were found in the HearGV genome. Seven HearGV ORFs have no discernible homologues in baculovirus genomes sequenced to date (Table 3). Four of these ORFs (hear70, hear104, hear111, and hear115) are small (<75 codons) with either no BLAST hits or hits with relatively high E-values (0.47 or higher). The other three ORFs (hear56, hear75, and hear76) have homologues in other families of insect DNA viruses.

Table 3 ORFs unique to HearGV

The predicted amino acid sequence for hear56 produced BLAST hits with orf76 of Spodoptera frugiperda ascovirus 1a (SfAV-1a; [36]) and orf95 of H. virescens ascovirus 3e (HvAV-3e; [37]). The role of these ORFs in ascovirus biology is currently unknown. ORFs hear75 and hear76 showed sequence similarity to members of the polydnavirus Rep gene family in ichnoviruses of Hyposoter fugitivus, Hyposoter didymator, Tranosema rostrales, and Campoletis sonorensis. Polydnavirus Rep genes contain one or more copies of an imperfectly conserved 540-bp sequence [38, 39]. Though they are present in large numbers in ichnovirus genomes and expressed in both the wasp and parasitized larval host, the function of their gene products is unknown [40, 41].

Another HearGV ORF of interest is hear21, a homologue of the sprT family of putative metallopeptidases thought to be involved in transcriptional elongation [42]. SprT homologues are also found in Trichoplusia ni ascovirus 2c (TnAV-2c; [43]) and in Mamestra configurata multiple nucleopolyhedroviruses A and B (MacoNPV-A and MacoNPV-B; [44, 45]).

There are 20 ORFs that HearGV and XecnGV have in common which are not present in other baculovirus genomes sequenced to date (Table 4). One of these ORFs (hear108) exhibits significant sequence similarity to ascovirus ORFs. The other ORFs either exhibit no significant similarity to other amino acid sequences or have BLAST hits with modest E-values (0.0008–0.0096) to uncharacterized sequences. Canonical baculovirus early and late gene promoter motifs are associated with ten of these ORFs. Several of these ORFs are located next to or near hr3 and hr7.

Table 4 ORFs unique to HearGV and XecnGV

Other HearGV ORFs exhibit interesting patterns of similarity with genes from other viruses. After xc21, the next two top BLAST hits for ORF hear20, a homologue of the NPV gene p94, are ORFs 3004 (accession number AAV98008, 49% identity) and 3003 (accession number AAV98006, 44.7% identity) of Cotesia plutellae bracovirus. ORFs hear27, hear36, and hear89 have homologues only in XecnGV and Spodoptera litura granulovirus (SpltGV, accession number NC_009503), while hear148 homologues are found in SpltGV and a selection of NPVs. Among granulovirus genomes, ORFs hear39, hear52, hear81, hear131, and hear132 (lef-7) only have homologues in XecnGV, but homologues of these ORFs also occur in a subset of NPVs.

Multigene families

Multigene families, or groups of related genes, occur in a number of large DNA viruses. One of the most widespread multigene families is the baculovirus repeated ORF (bro) family, found in many invertebrate DNA viruses [46]. Gene expression, nucleic acid binding activity, nucleosome association, and protein localization and trafficking have been characterized for some NPV bro genes and proteins [46–50], but the functions of bro gene products in the baculovirus life cycle are not known with precision. The number of bro genes in a genome varies from virus to virus. The XecnGV has eight bro-homologous sequences, including one ORF (xc62) not previously identified as a bro family member. HearGV has 10 bro-homologous sequences, including homologues for XecnGV bro-a (xc60), xc62, bro-b (xc76), bro-c (xc109), bro-d (xc114), bro-f (xc131), and bro-g (xc159). In HearGV, there are three adjacent pairs of bro ORFs (hear54 and hear55; hear101 and hear102; and hear158 and hear159). ORF hear55 occurs in an approximately 1.3-kbp insertion of novel sequence (not found in the XecnGV genome) that also includes hear56. Although the hear55 coding sequence aligns with aa 179–223 of XecnGV bro xc60, the top BLAST hit for hear55 is MacoNPV-A bro-c. The ORFs hear101 and hear102 are contained within an approximately 1.5-kbp insertion of novel sequence, and both align with different portions of xc109. An approximately 1.3-kbp sequence that would contain the xc109 homologue is missing from where it should occur in the HearGV genome, suggesting that the position of hear101 and hear102 is a consequence of re-arrangement within the HearGV genome. Indels in the region of HearGV corresponding to xc159 resulted in the division of this ORF into hear158 and hear159. Of the three pairs of bro sequences, only hear54 has an intact Bro-N motif which is involved in DNA binding [50, 51].

HearGV contained homologues to a group of four XecnGV ORFs (xc61, xc73, xc155, and xc161) that are part of a five-gene family with significant sequence similarity to each other [5]. Homologues to various members of this family also occur in another granulovirus (AgseGV), an NPV (MacoNPV-B), an ascovirus (TnAV-2c), and an entomopoxvirus (Amsacta moorei entomopoxvirus, AmEPV; [52]). In ME and MP phylograms, the ascovirus and granulovirus members of this family form distinct clades (Fig. 3a). The MacoNPV-B ORF macoB57 was found to be more closely related to xc61 than the HearGV homologue. Deletions in the HearGV genome removed sequence containing the HearGV homologue for xc22 and 300 C-terminal xc61-homologous codons from hear57. An additional pair of homologous ORFs, hear155 and xc157, also exhibits a low degree of sequence similarity to members of this gene family. No conserved domains that would suggest a function were detected among the genes in this family.

Fig. 3

Phylogenetic analysis of HearGV gene family amino acid sequences. Consensus ME phylograms inferred from the alignment of (a) hear57/hear67/hear154/hear160 and homologous ORFs, and (b) hear53/hear157 and homologous ORFs, with bootstrap values >50% shown at interior branches for ME and MP analysis (ME/MP) where they occur. The sequences used derive from XecnGV (xc22, xc61, xc73, xc155, and xc161 [5]), Amsacta moorei entomopoxvirus (AmEPV ORF109 [52]), Trichoplusia ni ascovirus 2c (TnAV-2c ORF21 and ORF74; [43]), Mamestra configurata NPV-B (macoB57 and macoB18; [45]), Heliothis virescens ascovirus 3e (HvAV-3e hr1 through hr5; [37]), and Spodoptera frugiperda ascovirus 1a (SfAV-1a ORF34 and ORF77; [36])

Open reading frames hear53 and hear157 have 92% sequence identity with each other by BLAST. These ORFs contain a hypothetical DNA binding domain (transposase_35) normally found in the C-terminus of many transposases. Five homologues for these genes are present in H. virescens ascovirus 3e (HvAV-3e; [37]). These ORFs constitute the homologous repeat regions for HvAV-3e. Two homologues are also found in the Spodoptera frugiperda ascovirus 1a (SfAV-1a; [36]) genome, and a single homologue is found in MacoNPV-B. No homologues for these ORFs are present in the XecnGV genome. Phylogenetic analysis divided the ascovirus and HearGV members of this gene family into separate clades (Fig. 3b). The MacoNPV-B homologue, macoB18, was grouped either with the HvAV-3e ORFs (by minimum evolution) or the SfAV-1a ORFs (by maximum parsimony; data not shown). However, macoB18 is only 301 codons in length and aligns with the C-terminus of the ascovirus ORFs. In contrast, the HearGV homologues, at 572 and 576 codons, align with most of the length of the ascovirus ORFs. A high concentration of polymorphisms (83 substitutions and six indels) occurs in hear53.

Comparison with XecnGV

Excluding gaps inserted to optimize the alignment, the overall nucleotide sequence identity between the HearGV and XecnGV genomes is 94.8%. The smaller size of the HearGV genome relative to the XecnGV genome is due to the reduced number of repeats in the hrs shared by the two viruses (Table 1) and a loss of approximately 16.6 kbp of XecnGV-homologous sequence from the non-hr regions of HearGV, counterbalanced by the occurrence of approximately 8.2 kbp of novel sequence not found in XecnGV (Fig. 4). Of the 179 ORFs in HearGV, 167 have homologues in the XecnGV genome. Two of these ORFs include homologues for two previously uncharacterized XecnGV ORFs (xc77a and xc89a) that also occur in other GVs (Table 1). There are differences in the promoter motif composition of 17 of the ORFs that HearGV and XecnGV have in common. Eight HearGV ORFs have a late or early promoter motif within 120 bp of the start of the ORF, while the corresponding XecnGV homologues are lacking these motifs. Nine XecnGV ORFs are preceded by early or late promoter motifs not present in the corresponding HearGV homologues. In addition to conservation of ORF content and hr placement, three relatively large (>500 bp) non-hr intergenic regions are conserved in the HearGV and XecnGV genomes. These intergenic regions lie between ORF pairs hear23/xc24 and hear24/xc25, hear24/xc25 and hear25/xc26, and hr4 and hear70/xc76.

Fig. 4

Map of deletions and insertions in the HearGV genome relative to the XecnGV genome. The relative positions and sizes of insertions and deletions are indicated by boxes on a linear representation of the HearGV genome. Regions of XecnGV sequence that are missing from the homologous location in the HearGV genome are denoted by white boxes, while regions of novel HearGV sequence not found in the homologous location in the XecnGV genome are denoted by black boxes. XecnGV ORFs that are located in the XecnGV sequences missing from HearGV are indicated, as are HearGV ORFs present in the novel HearGV sequences. In the case of xc63, a HearGV homologue is present but truncated by 300 codons due to the deletion of XecnGV-homologous sequence in this region

Two XecnGV ORFs were fused into a single ORF in HearGV. A two-nt insertion at HearGV nt 26447–26448 relative to XecnGV resulted in a frameshift and fusion of sequences homologous to XecnGV ORFs xc30 and xc31 into a single ORF, hear29. A one-nt insertion at HearGV nt 37744 relative to XecnGV caused a frameshift enabling the xc47-homologous ORF to extend past a stop codon into the xc48-homologous ORF, leading to a fusion of these ORFs into hear44.

Fourteen XecnGV ORFs (xc6, xc22, xc49, xc58, xc59, xc63, xc69, xc104, xc130, xc138, xc151, xc153, xc160, and xc163) are not represented by homologues in the HearGV genome (Table 5). In addition, homologues for the XecnGV ORFs xc41 and xc156 are not listed among the ORFs of HearGV in Table 1. These sequences are present in the HearGV genome, but they are nested within larger neighboring ORFs (hear38 and hear155, respectively). Of the missing ORFs, five (xc6, xc63, xc104, xc153, and xc160) are relatively small (<100 codons). The remaining ORFs are >150 codons in size. The absence from HearGV of homologues for xc22, xc58, xc59, xc63, xc69, xc130, xc138, xc151, xc153, xc160, and xc163 was due to the deletion of XecnGV-homologous sequences containing those ORFs from the HearGV genome (Fig. 4). Homologues for xc6, xc49, and xc104 were absent from the HearGV genome due to substitutions leading to the loss of the initiation codons for those ORFs. Two of the missing ORFs (xc59 and xc138) share significant sequence similarity with each other and with ORF 133 from the Chrysodeixis chalcites NPV genome [53]. ORF xc63 has homologues in MacoNPV-A and -B, while ORFs xc151 and xc160 have homologues in several NPVs. ORF xc58 encodes a viral cathepsin L present in most baculoviruses that is involved in the liquefaction of host tissues and disintegration of the host cuticle often seen with baculovirus infections of larvae [54, 55]. Of nine completely sequenced granulovirus genomes, four (XecnGV, CrleGV, AgseGV, and CpGV) contain cathepsin genes, while five (PhopGV, PlxyGV, AdorGV, SpltGV, and ChocGV) do not possess a cathepsin gene.

Table 5 XecnGV ORFs missing from HearGV

The HearGV genome contains 10 ORFs (sprT (hear21), hear53, hear56, hear70, hear75, hear76, hear104, hear111, hear115, and hear157) with no homologues listed for XecnGV. Six of these ORFs (hear21, hear53, hear56, hear75, hear76, and hear157) are contained within insertions of novel sequence in the HearGV genome that are not present in XecnGV (Fig. 4). Homologous sequences for hear70, hear111, and hear115 exist in the XecnGV genome, but are shorter than 50 codons in length due to substitutions and indels resulting in the appearance of stop codons in the XecnGV sequences. A substitution in XecnGV eliminates the start codon for hear104.

Conclusions

Whole-genome analyses of baculovirus phylogeny have revealed that baculovirus genomes in general exhibit a tremendous degree of “genomic plasticity” [56, 57]. This genetic fluidity is also on display in the HearGV genome and its relationship to the XecnGV genome. Although HearGV and XecnGV share a high degree of nucleotide sequence identity, numerous re-arrangements in the form of insertions and deletions have occurred in HearGV and XecnGV since they diverged. A number of ORFs unique to either XecnGV or HearGV individually, or unique to both viruses, are located next to or near homologous repeat regions (Tables 1 and 4), further highlighting the role that these sequences play in gene loss and acquisition events [58]. Many genes in HearGV and XecnGV are represented by homologues in the genomes of nucleopolyhedroviruses, ascoviruses, entomopoxviruses, and polydnaviruses, suggesting the acquisition or exchange of genes by non-homologous recombination with viruses other than GVs.

Pairwise nucleotide distances for three marker genes suggest that HearGV either belongs to the XecnGV species cluster or is very closely related to XecnGV. In addition to the four variants of XecnGV identified by Lange et al. [34], restriction endonuclease analysis by Goto et al. [59] previously identified additional GVs from four other noctuid hosts (Hydraecia amurensis, Celaena leucostigma, Aletia pallens, and Pseudaletia separata) that likely are also variants of XecnGV. These data suggest that the virus species group that includes HearGV and XecnGV possesses a very broad host range, not only among granuloviruses, but among baculoviruses in general. The ORFs unique to HearGV and XecnGV (Table 4), as well as other ORFs shared by these GVs that do not occur in other GVs, may account for the broad host range exhibited by this virus group.