Marek’s disease (MD) is associated with T-cell lymphomas in chickens. The causative agent is gallid herpesvirus type 2 (GaHV-2), a member of the genus Mardivirus within the subfamily Alphaherpesvirinae of the family Herpesviridae [1, 2]. GX0101 is the first natural recombinant GaHV-2 field strain isolated from birds showing tumors in China [3]. It contains a 538-bp reticuloendotheliosis virus (REV) long terminal repeat (LTR) inserted between nucleotide bases “C” and “A” numbered 153,175–153,176 (Md5 strain) or 154,507–154,508 (RB1B strain) [4, 5]. GX0101 is a very virulent GaHV-2, with greater horizontal transmission ability than Md5, whereas other reported recombinant GaHV-2 strains with an REV-LTR [6], such as RM1, which was obtained from cell cultures, are attenuated and do not cause tumors [7–9].

Sequencing of 40 μg of GX0101-BAC DNA [10] (200 ng/μl) was carried out commercially using a pyrosequencing platform, the Genome Sequencer 20 FLX System (454 Life Sciences Corporation). Problematic regions containing mononucleotide reiterations or repetitive sequences were also checked using PCR products derived from parental GX0101 DNA.

GX0101-BAC DNA sequences were assembled from 13,596 reads (average length 378 bp) using the Sequencher program (Gene Codes, Ann Arbor, MI). On average, the final sequence represents 28-fold coverage at each base pair. Ambiguities in the GX0101-BAC sequencing data were resolved by re-sequencing using an ABI-3730XL automated DNA sequencer (Applied Biosystems, Foster City, CA). Open reading frames (ORFs) and DNA regulatory sequences were maintained and analyzed using DNASTAR (Madison, WI) and other Web-based tools. The sequences of the GaHV-2 strains Md5, 814, pC12/130-10, pC12/130-15, CVI988/Rispens, GA, and RB1B used in comparisons were obtained from GenBank [1, 2, 11–14].
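The reported coverage can be sanity-checked from the read statistics above. The following is a minimal sketch, not the authors' calculation: it assumes that every read contributes its full average length and uses the genome length reported later in the text (178,101 bp, without the BAC vector), so it gives an upper bound rather than the exact figure.

```python
# Back-of-the-envelope coverage estimate from values reported in the text.
n_reads = 13_596          # number of assembled reads
mean_read_len_bp = 378    # average read length
genome_len_bp = 178_101   # GX0101 genome length, excluding the BAC vector

# Assumption: all reads align over their full length and the vector is ignored.
total_bases = n_reads * mean_read_len_bp
raw_coverage = total_bases / genome_len_bp

print(f"total sequenced bases: {total_bases:,}")
print(f"approximate coverage:  {raw_coverage:.1f}-fold")
```

The estimate falls just below 29-fold, which is in line with the 28-fold average reported once vector sequence and unassembled or trimmed bases are allowed for.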

The complete genome sequence of GX0101 (GenBank accession no. JX844666) is 178,101 bp in length, not including the BAC vector sequence. The partial sequence in the US2 region that was lost during the construction of GX0101-BAC was restored by sequencing PCR products of the parental GX0101 DNA [10]. The lengths of the terminal repeat long (TRL) region, the unique long (UL) region, the internal repeat long (IRL) region, the internal repeat short (IRS) region, the unique short (US) region, and the terminal repeat short (TRS) region are 12,758, 113,572, 12,741, 12,700, 11,695, and 13,134 bp, respectively. The genome contains about 200 ORFs and two copies of a 132-bp repeat sequence.
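For readers who wish to cross-check the region lengths against the total genome length, a short sketch follows. It only adds up the figures quoted above; the text does not state where the remaining bases lie, so the sketch reports the difference without interpreting it.

```python
# Region lengths (bp) as reported in the text, in genome order.
region_lengths_bp = {
    "TRL": 12_758,
    "UL": 113_572,
    "IRL": 12_741,
    "IRS": 12_700,
    "US": 11_695,
    "TRS": 13_134,
}
genome_len_bp = 178_101  # reported total, excluding the BAC vector

assigned_bp = sum(region_lengths_bp.values())
unassigned_bp = genome_len_bp - assigned_bp  # bases outside the six listed regions

print(f"sum of listed regions: {assigned_bp:,} bp")
print(f"not covered by listed regions: {unassigned_bp:,} bp")
```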

The GX0101 genome contains only one REV-LTR insert of 538 bp (corresponding to nt 152,724-153,261), located within the sorf1 gene and 267 bp upstream of the sorf2 gene. Unlike Md5, GX0101 encodes only one SORF2 protein (MDV 087) [2], which is most similar to those of strains pC12/130-10, pC12/130-15, and GA. The sorf2 gene is nonessential for virus replication and tumor formation [15]. The LTR from REV may act as a strong promoter or enhancer, affecting the transcription of GaHV-2 sorf2 and probably enhancing the expression of the GaHV-2 SORF2 protein [16]. In the RM1 strain, the REV-LTR insertion site is farther upstream of both sorf1 and sorf2 [8]. More interestingly, the REV-LTR insert is more stable in GX0101 than in other, similar recombinant GaHV-2 strains. RM1 acquired another REV-LTR repeat in its TRS during passage [8]. Furthermore, the RM1 REV-LTR insert is not stable during passage in chickens when inserted into Md5 at the same site and is lost after several passages in vivo [17]. In contrast, GX0101 stably maintains its REV-LTR insert even after 20 passages in chickens.
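The insert coordinates and the stated insert length can be reconciled with a one-line calculation. The sketch below assumes inclusive, 1-based nucleotide positions (the usual GenBank convention, although the text does not state it explicitly); the helper name `span_length` is purely illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Length of a feature given inclusive, 1-based start and end positions."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1

# REV-LTR insert coordinates in GX0101, as quoted in the text.
ltr_len_bp = span_length(152_724, 153_261)
print(f"REV-LTR insert length: {ltr_len_bp} bp")
```

Under that assumption, the coordinates give exactly the 538 bp stated for the insert.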

At least 160 ORFs, originally defined by annotating the GaHV-2 genomes GA, Md5, 814, pC12/130-10, pC12/130-15, CVI988/Rispens, and RB1B, were examined for mutations. Table 1 presents the percentage similarity and length difference relative to the homologs found in the genomes of attenuated (CVI988/Rispens, 814, pC12/130-15) and virulent (GA, RB1B, Md5, pC12/130-10) strains. Some ORFs are also noted with putative functions according to their homologs in other alphaherpesviruses or their protein-binding partners. ORFs containing frameshift mutations were also re-examined using PCR products derived from the parental GX0101 DNA. ORFs with high similarity scores (100 %) were omitted, while those with low similarity scores (<98 %) indicative of frameshift mutations, truncations, and non-synonymous amino acid substitutions were re-examined using multiple protein alignments with homologous ORFs from CVI988/Rispens, 814, GA, RB1B, Md5, pC12/130-10, and pC12/130-15 strains.
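The triage of ORFs by similarity score described above can be expressed as a simple rule. This is a sketch only: the thresholds (100 % and <98 %) come from the text, whereas the function name, the category labels, the handling of intermediate scores, and the example values are hypothetical and are not data from this study.

```python
def triage_orf(identity_percent: float) -> str:
    """Classify an ORF by its percentage similarity to a homolog."""
    if identity_percent >= 100.0:
        return "omit"          # identical ORFs were omitted from the table
    if identity_percent < 98.0:
        return "re-examine"    # re-examined by multiple protein alignment
    return "list"              # assumption: intermediate scores are simply listed

# Hypothetical similarity scores, for illustration only (not from the study).
example_scores = {"orf_a": 100.0, "orf_b": 99.4, "orf_c": 91.2}
for name, score in example_scores.items():
    print(f"{name}: {score:.1f} % -> {triage_orf(score)}")
```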

Table 1 ORFs in the genome of GaHV-2 strain GX0101 that differ from those of other strains. Genes that differ (identity <100 %) relative to homologs found within the genomes of the attenuated (CVI988/Rispens, 814, and pC12/130-15) and virulent (GA, RB1B, Md5, and pC12/130-10) strains are listed.

To identify amino acid changes that are important for horizontal transmission ability and pathogenicity, the nucleotide sequence of GX0101 was compared with the sequences of the attenuated (CVI988/Rispens, 814, pC12/130-15) and virulent (RB1B, Md5, GA, pC12/130-10) strains by multiple sequence alignment. As listed in Table 2, in addition to the 14 GX0101-specific amino acid mutations identified in its ORFs, 61 single nucleotide polymorphisms in 23 genes were identified when GX0101 was compared with the virulent strains RB1B and Md5 and the vaccine strain CVI988/Rispens. pC12/130-10 and pC12/130-15 contained only two polymorphisms relative to GX0101, in the rlorf7 and icp4 genes.

Table 2 Mutations associated with various strains of GaHV-2. (A) All open reading frames (ORFs) were examined for non-synonymous amino acid substitutions in comparison to homologous ORFs from the attenuated strains CVI988/Rispens, pC12/130-15, and 814 as well as the virulent strains RB1B, GA, Md5, and pC12/130-10. Mutations in vaccine strains compared to virulent strains of GaHV-2. (B) Single nucleotide polymorphisms shared between GX0101 and vaccine strain CVI988/Rispens, vvMDV RB1B, and vvMDV Md5.

When all of the ORFs of GX0101 were compared with those of other GaHV-2 strains (814, CVI988/Rispens, GA, RB1B, Md5, pC12/130-10, and pC12/130-15), only 11 of them differed significantly due to non-synonymous substitutions, as listed in Table 1 (in bold). These include ORFs 5.0/76, 4.0/77, 49, and 86.6, encoding the MEQ protein, 23-kDa nuclear protein, large tegument protein, and SORF1 protein, respectively, as well as ORFs 3.4/78.3, 3.6/78.2, 76.4, 80, 85/99, 86/98, and 91.5, encoding the putative proteins RLORF4, RLORF5, MLHG, MFAY, MNDR, MSWP, and MAHG, respectively.

Compared with its identity to the Md5, RB1B, GA, 814, and CVI988/Rispens strains, GX0101 had the highest sequence identity to two BAC clones, pC12/130-10 and pC12/130-15, from the UK strain C12/130 [18]. Among the 76 ORFs that vary among the GaHV-2 strains, GX0101 has 49 and 46 ORFs that are 100 % identical to those of pC12/130-10 and pC12/130-15, respectively, but only 7-27 ORFs that are 100 % identical to those of the other strains (Table 2). pC12/130-10 is virulent, but pC12/130-15 is attenuated, although they are descended from the same original strain [11]. Since GX0101 and its BAC-rescued virus bac-GX0101 are very virulent [9], comparing these three strains should improve our understanding of the genomic sequences related to their virulence and pathogenicity.

We also found five consecutive repeats of a 217-bp fragment in ORFs 97.3-97.6 in the TRS region and three repeats of the same 217-bp fragment in ORFs 86.2-86.4 in the IRS region of GX0101, whereas only 1-2 copies were found in both the TRS and IRS regions of other strains. GX0101 also has a 486-bp deletion (corresponding to nt 164,033-164,518 of Md5) in its US region. This sequence is likewise absent from GA, pC12/130-10, and pC12/130-15 [1, 12] but present in Md5, RB1B, and CVI988/Rispens [2, 13, 14]. It is not clear whether these differences influence the biological activity of GX0101.
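The repeat counts and deletion coordinates above can be checked against other figures in the text with simple arithmetic. The sketch below assumes inclusive, 1-based coordinates and uses the TRS and IRS lengths reported earlier; the comparison between repeat copy number and region length is an observation about the reported numbers, not a conclusion drawn by the authors.

```python
REPEAT_LEN_BP = 217

# Copy numbers of the 217-bp repeat in GX0101, as reported in the text.
trs_copies, irs_copies = 5, 3
extra_repeat_bp = (trs_copies - irs_copies) * REPEAT_LEN_BP

# Region lengths (bp) reported earlier in the text.
trs_len_bp, irs_len_bp = 13_134, 12_700
region_diff_bp = trs_len_bp - irs_len_bp

# Deletion in the US region, in Md5 coordinates (assumed inclusive, 1-based).
deletion_len_bp = 164_518 - 164_033 + 1

print(f"extra repeat sequence in TRS: {extra_repeat_bp} bp")
print(f"TRS minus IRS length:         {region_diff_bp} bp")
print(f"US deletion length:           {deletion_len_bp} bp")
```

The two additional repeat copies account for 434 bp, which equals the reported length difference between TRS and IRS, and the Md5 coordinates span the stated 486 bp.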

Considering the above characteristics of pathogenicity, horizontal transmission ability, and genomic structure with a stable REV insert, this analysis of the complete sequence of GX0101 will be useful not only in studies of gene functions related to pathogenicity and transmission but also in understanding genomic mutations and evolutionary relationships of GaHV-2 from different geographical areas of the world.