Introduction

Hepatitis B virus (HBV) remains a global health problem, with approximately 240 million chronically infected people and 780,000 deaths per year due to acute infection, cirrhosis of the liver, and hepatocellular carcinoma [1, 2]. HBV prevalence is highest in sub-Saharan Africa and East Asia, where 5–10% of the adult population is chronically infected [1]. HBV is an enveloped, partially double-stranded DNA virus with a 3.2-kb genome and belongs to the family Hepadnaviridae [3]. The genome is organized into four overlapping open reading frames (ORFs): polymerase (P), surface (S), precore/core (C), and X [4]. Like retroviruses, HBV uses a reverse transcriptase during replication, which contributes to its high genetic diversity [5].

There are ten HBV genotypes (A–J), defined by sequence divergence of at least 8% across the entire genome [6,7,8]. Genotypes A–D and F have been further divided into subgenotypes that differ by 4–8% [9,10,11]. Genotypes A and D are prevalent in the European Union and Central/South Asia; B and C in South/Southeast Asia and the Pacific region; E in West/Central Africa; F in South/Central America and Alaska; G and H in Europe and Japan; I in Vietnam and Laos; and J in Japan [11, 12].

Genotypes are an invaluable tool for tracing the molecular evolution and transmission patterns of HBV [11]. Several studies suggest that genotypes differ in pathogenic potential and that these differences are reflected in clinical outcomes [13,14,15]. HBV genotypes have been shown to differ in disease course, development of mutations, and response to antiviral therapy [16]. Genotype determination can therefore help identify patients at increased risk of disease progression and optimize their treatment [15].

Recombination between genotypes occurs in regions where multiple genotypes co-circulate and facilitates diversification within individuals and in the general population. Novel variants generated by recombination between different HBV genotypes have been documented worldwide [17,18,19,20,21]. In West Africa, one or more A/B, A/C, A/E, C/E, D/E, and D/E/A recombinants have been identified [5, 22, 23]. Recombination is an important element of HBV genetic variability with possible clinical implications [24, 25]. The research reported here was aimed at further investigating an HBV variant from Ghana whose initial partial sequence was identified as a possible recombinant.

Methods

Patient information

The newly identified recombinant was obtained from an HIV–HBV-coinfected, 52-year-old man from Northern Ghana who had previously been enrolled in two cross-sectional studies. At the time of sample collection in 2013, he was antiretroviral treatment naïve with an unknown CD4+ count, hepatitis B e-antigen negative, and e-antibody positive, with an HBV DNA level of 84,060,683 IU/mL. The original study investigated the proportion of, and factors associated with, hepatitis B viremia in treatment-experienced and treatment-naïve HIV-coinfected Ghanaians [26]. A subsequent substudy examined HBV resistance mutations [27]. Phylogenetic analysis in this resistance study grouped this patient's genotype D sample (007N) with three D/E recombinants from Niger (FN594768–71), although the analysis included only a 1004-bp fragment corresponding to the HBV reverse transcriptase region. Because genotype D is relatively uncommon in West Africa, the full-length genome was analyzed to determine whether this variant was in fact a D/E recombinant.

Next-generation sequencing

HBV DNA was extracted from 500 μL of patient serum using the QIAamp UltraSens Virus Kit (QIAGEN, Valencia, CA, USA), and the full-length genome was amplified using rolling circle amplification followed by PCR [18]. DNA-seq was performed by the Genomics, Epigenomics and Sequencing Core (GESC) at the University of Cincinnati. The PCR product was transferred to a microTUBE (Covaris, Woburn, MA) and sheared with a Covaris S2 focused ultrasonicator. The sequencing library was prepared with the PrepX DNA Library kit on the Apollo 324 NGS automatic library prep system (WaferGen, Fremont, CA) using a ChIP-seq script. The ligated library was indexed and enriched by six cycles of PCR using sample-specific index and universal PCR primers, followed by automated AMPure XP bead purification (Beckman Coulter, Brea, CA). Libraries were pooled in equal amounts (ng, as determined by NanoDrop), diluted, and quantified by qPCR with the NEBNext Library Quant Kit (New England Biolabs, Ipswich, MA) on an ABI 9700HT real-time PCR system (Lifetech, Grand Island, NY).

The quantified pooled library was used for cluster generation on the cBot system (Illumina, San Diego, CA). The library was clustered onto a flow cell at a final concentration of 16 pM using the Illumina TruSeq SR Cluster Kit v3 and sequenced for 50 cycles with the TruSeq SBS Kit on an Illumina HiSeq system. A total of 445,497 reads (51 bp each) were collected and imported into CLC Genomics Workbench 9 for quality control and consensus assembly. Of these, 366,512 (75.5%) reads passed the initial quality import filter. A consensus sequence was extracted from 348,145 (94.9%) of the remaining reads using the de novo assembler module v1.3 and complete HBV reference genomes from GenBank. The excluded reads did not meet the similarity cut-off (similarity fraction set to 0.80) and mapped to only a small portion of the HBV genome.
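
The read-filtering percentages can be recomputed directly from the counts given above. The short Python sketch below does this; the variable names are illustrative and are not part of the CLC Genomics Workbench output. Computed this way, 366,512 of 445,497 reads is about 82.3% rather than the stated 75.5%, and 348,145 of 366,512 rounds to 95.0%.

```python
# Read counts as reported in the Methods text; variable names are illustrative.
TOTAL_READS = 445_497       # raw 51-bp reads imported into the workbench
PASSED_QC = 366_512         # reads passing the initial quality import filter
IN_CONSENSUS = 348_145      # QC-passed reads used to build the consensus


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


qc_pass_rate = percent(PASSED_QC, TOTAL_READS)        # share of all reads passing QC
consensus_rate = percent(IN_CONSENSUS, PASSED_QC)     # share of QC-passed reads in the consensus
excluded_reads = PASSED_QC - IN_CONSENSUS             # reads below the 0.80 similarity fraction

print(f"QC pass rate: {qc_pass_rate}%")
print(f"Used for consensus: {consensus_rate}% ({excluded_reads:,} excluded)")
```

The second ratio uses the QC-passed reads, not the raw total, as its denominator, matching the phrase "of the remaining reads".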

Sequence alignment and Bayesian Markov chain Monte Carlo analysis

The consensus sequence of this novel HBV variant was aligned with full-length reference genomes in ClustalX 2.1 [28]. Phylogenetic inference was then performed using a Bayesian Markov chain Monte Carlo (MCMC) approach as implemented in BEAST v1.8.0 [29], under an uncorrelated log-normal relaxed molecular clock and the general time-reversible (GTR) substitution model, with rate heterogeneity among sites modeled by a gamma distribution. The initial MCMC analysis was run for 500,000,000 generations, sampling every 50,000th generation. Results were examined in Tracer v1.5 to confirm chain convergence, and the effective sample size (ESS) was calculated for each parameter; all ESS values were >1000, indicating sufficient sampling. The maximum clade credibility tree was selected from the posterior tree distribution after a 10% burn-in using TreeAnnotator v1.8.0. An additional MCMC analysis, with a chain length of 200,000,000, was conducted with all recombinant sequences, genotype E references, and subgenotype D7–D10 references.
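
Two quantities in this paragraph can be made concrete. Sampling every 50,000th generation of a 500,000,000-generation chain yields 10,000 samples, of which a 10% burn-in discards 1,000. The ESS is commonly estimated as N/τ, where τ is the integrated autocorrelation time of a parameter's trace. The Python sketch below uses a simple truncation rule (summing autocorrelations until the first non-positive one); it is an illustrative estimator, not necessarily the exact algorithm Tracer implements, and the simulated traces are hypothetical.

```python
import random

# Chain bookkeeping for the initial BEAST run described above.
CHAIN_LENGTH = 500_000_000
SAMPLE_EVERY = 50_000
n_samples = CHAIN_LENGTH // SAMPLE_EVERY      # sampled states
n_burnin = n_samples // 10                   # 10% burn-in discarded
n_retained = n_samples - n_burnin            # states/trees retained


def effective_sample_size(trace):
    """Estimate ESS = N / tau with tau = 1 + 2 * sum(rho_k) over lags k >= 1,
    stopping at the first non-positive sample autocorrelation rho_k."""
    n = len(trace)
    mean = sum(trace) / n
    dev = [v - mean for v in trace]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("trace has zero variance; ESS is undefined")
    tau = 1.0
    for lag in range(1, n // 2):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        tau += 2 * rho
    return n / tau


# Hypothetical traces: independent draws vs. a strongly autocorrelated AR(1) chain.
random.seed(0)
iid_trace = [random.gauss(0.0, 1.0) for _ in range(2000)]
ar_trace = [0.0]
for _ in range(1999):
    ar_trace.append(0.9 * ar_trace[-1] + random.gauss(0.0, 1.0))

print(n_samples, n_retained)
print(round(effective_sample_size(iid_trace)), round(effective_sample_size(ar_trace)))
```

With 9,000 retained samples, an ESS above 1000 implies an autocorrelation time below about 9 sampled states, which is why a strongly autocorrelated trace like the AR(1) example would fail that criterion.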

Recombination analysis and intragroup genetic distance

To identify possible recombination, bootscanning analysis of full-length sequences was performed in SimPlot version 3.5.1 using the Kimura 2-parameter method with a 300-base pair (bp) window, a 30-bp step increment, and 1000 bootstrap replicates [30]. The full-length HBV sequence was compared with consensus sequences generated from full-length GenBank references for HBV genotypes A–H. If different regions of the query sequence grouped with more than one genotype in >80% of the permuted trees, the putative “parental” sequences were retained for a second bootscanning analysis along with the consensus Pan troglodytes sequence as an outgroup. To rule out dual infection, the full-length consensus sequence was aligned with a second sequence amplified by targeted PCR and Sanger sequencing [27]. Despite the different amplification and sequencing methods, the two sequences were identical, indicating that the patient was not infected with multiple HBV genotypes. Intragroup genetic distances were calculated by pairwise comparison of nucleotide sequences using the Kimura 2-parameter method in Molecular Evolutionary Genetics Analysis (MEGA) v7.0.18 [31, 32]. The complete genome of isolate 007N was deposited in GenBank under accession number KU711666.
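
Both the SimPlot bootscan and the MEGA distance calculation rely on the Kimura 2-parameter (K2P) model, which counts transitions (A↔G, C↔T) separately from transversions. With P and Q the proportions of transitional and transversional differences, the distance is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The Python sketch below implements this standard formula for two pre-aligned sequences; skipping gapped or ambiguous sites pairwise is an assumption of the sketch, and MEGA's own missing-data options may differ.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (pairwise deletion, an assumption of this sketch)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))  # one transition in 10 sites
```

MEGA reports distances as proportions of sites; multiplying by 100 gives the percentages used in Table 1.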

Results

Bayesian Markov chain Monte Carlo analysis

The phylogenetic analyses included 13 full-length D/E recombinant references from different regions of Africa: Niger (FN594767, FN594768, FN594769, FN594770, and FN594771), Sudan (KU736916 and KU736917), Ethiopia (KU736914 and KU736915), Kenya (KP168420 and KP168421), Ghana (GQ161754), and Gabon (GU177079). As shown in Fig. 1, all recombinants from East Africa grouped together, as did most of the recombinants from West Africa, with the exception of GU177079, FN594767, and 007N. Regardless of country of origin, D/E recombinant sequences were most closely related to subgenotype D7 sequences but did not form a single monophyletic group.

Fig. 1

Bayesian phylogenetic tree of 87 full-genome HBV sequences generated with BEAST v1.8.0. The sample of interest is denoted as “007N,” while reference sequences are labeled “genotype—accession number—country of origin.” All reported recombinants are marked with an asterisk after their genotype designation, and relevant posterior values are shown

A subsequent MCMC analysis focused specifically on the D/E recombinant references, isolate 007N, genotype E references, and subgenotype D7–D10 references (Fig. 2). All but three of the D/E recombinants (GU177079, FN594767, and 007N) clustered together. The D/E recombinant from Gabon (GU177079) clustered with subgenotype D7 strains from Ireland, the Central African Republic, the Netherlands, and Tunisia. The E/D recombinant from Niger (FN594767) clustered with the genotype E references. The isolate of interest, 007N, clustered with a different group of D7 references from Tunisia, as well as D7 strains from Belgium, Cuba, and Venezuela. Subgenotype D7 reference strains formed multiple clusters, some grouping with known D/E recombinants and others with D10 reference strains from Ethiopia. Thus, based on the limited sequence data currently available, subgenotype D7 does not form a well-defined monophyletic cluster.

Fig. 2

Bayesian phylogenetic tree of 68 full-genome HBV sequences generated with BEAST v1.8.0. Genotype E, subgenotype D7, subgenotype D8, subgenotype D9, subgenotype D10, and genotype D/E recombinant reference sequences are included. The sample of interest is denoted as “007N,” while reference sequences are labeled “genotype—accession number—country of origin.” All reported recombinants are marked with an asterisk after their genotype designation, and relevant posterior values are shown

Intragroup genetic distance

Table 1 shows the pairwise genetic distances among all D/E recombinants, including 007N. The mean genetic distance was 3.72%, with a range of 0.26–7.50%. All 14 full-length D/E recombinants were distinct from one another; only one recombinant from Sudan and two from Ethiopia (KU736917, KU736914, and KU736915) were within 1% of one another. Sequence 007N was most closely related to the other D/E recombinant from Ghana (GQ161754), with a genetic distance of 2.23%, and most distant from the Niger sequence FN594767, at 7.50%. Among all the D/E recombinants, the most similar were the two recombinants from Ethiopia, with a genetic distance of 0.26%; apart from the FN594767–007N pair, the most divergent pair was FN594767 and FN594770, both from Niger, also at 7.50%.
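
The summary statistics reported for Table 1 (mean, closest pair, most divergent pair) follow from the matrix of pairwise distances: 14 sequences give 14 × 13 / 2 = 91 pairs, and the mean is taken here as the average over all of them. The Python sketch below shows this bookkeeping; the sequence names and distance values in the example are hypothetical placeholders, not values from Table 1.

```python
from itertools import combinations


def summarize_pairwise(names, distance):
    """Summarize a symmetric pairwise distance table.

    `distance` maps frozenset({name_a, name_b}) -> distance (%).
    Returns (number of pairs, mean distance, closest pair, most divergent pair)."""
    pairs = list(combinations(names, 2))
    values = [distance[frozenset(pair)] for pair in pairs]
    mean = sum(values) / len(values)
    closest = min(pairs, key=lambda pair: distance[frozenset(pair)])
    farthest = max(pairs, key=lambda pair: distance[frozenset(pair)])
    return len(pairs), mean, closest, farthest


# Hypothetical example with three sequences (not data from Table 1).
toy_names = ["seqA", "seqB", "seqC"]
toy_distance = {
    frozenset({"seqA", "seqB"}): 1.0,
    frozenset({"seqA", "seqC"}): 3.0,
    frozenset({"seqB", "seqC"}): 5.0,
}
print(summarize_pairwise(toy_names, toy_distance))
print(len(list(combinations(range(14), 2))))  # pairs among 14 recombinants
```

When two pairs tie, as with the two 7.50% pairs reported above, `min` and `max` return only the first pair encountered, so ties need to be checked explicitly.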

Table 1 Genetic distances for all 14 D/E recombinants calculated by pairwise comparison of nucleotide sequences using the Kimura 2-parameter method

Recombination analysis

All D/E recombinant sequences, including the 13 references and 007N, were analyzed further in SimPlot to determine their recombination patterns. The bootscan of each recombinant was generated using all full-length genotype D7 and E references from the MCMC analysis, with the full-length Pan troglodytes sequence as an outgroup. Fig. 3 shows the bootscans together with a genome map indicating where the recombination breakpoints fall within the circular HBV genome.

Fig. 3

Bootscan analysis of reference sequences a KP168420, b KP168421, c KU736914, d KU736915, e KU736916, f KU736917, g FN594767, h FN594768, i FN594769, j FN594770, k FN594771, l GQ161754, m GU177079, and n sample 007N (KU711666) using the Kimura 2-parameter method with a 300-base pair (bp) window, a 30-bp step increment, and 1000 bootstrap replicates

Seven of the 14 sequences (KU736914–17, FN594768, FN594771, and GQ161754) have a breakpoint between nt 1100 and 1200, located in the P ORF, where >80% of permuted trees showed a shift from genotype D to genotype E. With the exception of FN594767, all recombinants showed a shift from genotype D to genotype E between nt 1700 and 2400, a region corresponding to the P, X, and C ORFs. Eleven recombinants from Kenya, Ethiopia, Sudan, Niger, and Ghana (KP168420–21, KU736914–17, FN594768–71, and GQ161754) have an additional breakpoint between nt 2900 and 3100, located in the overlapping S and P ORFs. One of the Niger samples, FN594767, has a recombination pattern unlike that of the other D/E recombinants, with only one breakpoint, between nt 500 and 900, at which >80% of permuted trees show a shift from genotype D to genotype E. A second breakpoint between nt 900 and 1100 (Fig. 3g) marks a shift back from genotype E to genotype D, although only ~75% of permuted trees support it. The original publication refers to this sequence as an E/D recombinant because most of its genome is similar to genotype E [33].
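
The breakpoint descriptions above amount to reading off where the genotype with >80% support changes between successive bootscan windows. The Python sketch below shows one way to do this, assuming the per-window support values have been exported as a window midpoint plus the percentage of permuted trees per genotype; the function names and toy values are hypothetical and not part of SimPlot, and actual SimPlot window positions may be reported differently. The sketch treats the genome as linear, so a segment spanning the origin of the circular HBV genome would appear as two separate switches.

```python
def window_midpoints(genome_length, window=300, step=30):
    """Midpoints (nt) of the sliding windows used for a linear bootscan."""
    return [start + window // 2 for start in range(0, genome_length - window + 1, step)]


def call_breakpoints(windows, threshold=80.0):
    """Locate genotype switches in bootscan output.

    windows: list of (midpoint_nt, {genotype: percent_of_permuted_trees}) in genome order.
    A window is assigned to a genotype only if its support exceeds `threshold`;
    unassigned windows are skipped. Returns (left_nt, right_nt, from_genotype,
    to_genotype) for each switch, bracketing the breakpoint between two windows."""
    assigned = []
    for midpoint, support in windows:
        best = max(support, key=support.get)
        if support[best] > threshold:
            assigned.append((midpoint, best))
    return [
        (left_pos, right_pos, left_gt, right_gt)
        for (left_pos, left_gt), (right_pos, right_gt) in zip(assigned, assigned[1:])
        if left_gt != right_gt
    ]


# Hypothetical bootscan values around a D-to-E switch (not data from Fig. 3).
toy_windows = [
    (1000, {"D": 95.0, "E": 5.0}),
    (1030, {"D": 70.0, "E": 30.0}),   # below threshold: left unassigned
    (1060, {"D": 10.0, "E": 90.0}),
    (1090, {"D": 4.0, "E": 96.0}),
]
print(call_breakpoints(toy_windows))
print(len(window_midpoints(3182)))  # number of 300-bp windows for e.g. a 3,182-nt genome
```

With `threshold=80.0`, a switch supported by only ~75% of permuted trees, like the E-to-D shift in FN594767, would leave its windows unassigned and would not be reported.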

The reference recombinant from Gabon and 007N (Fig. 3m and n) each have only one breakpoint supported by >80% of permuted trees. Thus, the two sequences from Ghana (Fig. 3l and n) have distinctly different recombination patterns, even though GQ161754 is the reference sequence closest to 007N by genetic distance.

Discussion

Over 400 complete HBV intergenotypic recombinant genomes have been identified worldwide [34]. Because HBV genotypes have distinct geographic distributions, they can help elucidate HBV transmission patterns. Overall, intergenotypic recombinants circulate in patterns similar to those of their parental genotypes [34]. HBV genotype A predominates in South/East Africa, D in North Africa, and E in West/Central Africa [12]. Genotype E hybrids are generally found in African countries, with A/E recombinants in Cameroon, Ghana, and Guinea [35, 36] and D/E recombinants in Gabon, Ghana, Niger, Sudan, Kenya, and Ethiopia [33, 35, 37, 38]. Two genotype E hybrids have been documented outside Africa, in France (A/E) and Ireland (D/E); however, both patients originated from Africa [18, 39].

When all full-length intergenotypic recombinant genomes were compared, the majority (73%) were HBV variants sharing identical recombination sites [34]. Shared breakpoint locations among sequences may reflect recombination hot spots or a single ancestral recombination event circulating within a population. In an analysis of breakpoint positions in 117 sequences with distinct breakpoints and/or genotype compositions, favored recombination sites were identified within nt 1700–1900, nt 1800–2000, and nt 2100–2300 [34]. Isolate 007N from this study has two breakpoints spanning these favored recombination sites between nt 1700 and 2400. All of the other D/E recombinants analyzed in this study except FN594767 share this breakpoint region, and seven of them also have a breakpoint between nt 1100 and 1200.

Based on the MCMC phylogenetic analysis, SimPlot recombination analysis, and intragroup genetic distance calculations, the full-length genome of isolate 007N is distinct from the other D/E recombinants reported in Africa. As shown in Figs. 1 and 2, isolate 007N, like other published full-length D/E recombinant strains, clustered with HBV subgenotype D7. This may indicate that these D/E recombinants are more specifically D7/E recombinants. However, this conclusion cannot be drawn with confidence from the available data, given that subgenotype D7 sequences formed multiple clusters in Fig. 2. Even though many subgenotype D7 strains came from the same country, they did not form a monophyletic group, suggesting that subgenotype D7 is not yet accurately defined and requires further analysis of additional samples from a wider geographic range of patients.

Available data on the effect of HBV genotype on disease profile are currently limited to specific genotypes, mostly A, B, C, and D. However, evidence suggests that HBV genotypes and subgenotypes play a critical role in host–virus interactions. Knowledge of HBV genotype may help clinicians predict patients’ response to treatment and their risk of future complications [13,14,15]. Broader epidemiological and in vitro studies are needed to better understand the pathogenic effects of all HBV genotypes. Such data would help determine whether HBV genotype D differs from genotype E in its recombinogenic potential and whether recombinant HBV genotypes differ from non-recombinant genotypes in pathogenic mechanisms and patient outcomes.