Introduction

According to recent estimates, there were 23.6 million new cases of cancer and 10.0 million cancer deaths worldwide in 2019 [1]. Although global cancer incidence rates remained the same from 2010 to 2019, there has been a two percent increase in Iran [1]. Cancers associated with infection are more common than other cancers [2]. Various groups of oncogenic DNA and RNA viruses, including human papillomavirus (HPV), Epstein-Barr virus (EBV), hepatitis B virus (HBV), hepatitis C virus (HCV), human T-lymphotropic virus 1 (HTLV-1), and Merkel cell polyomavirus (MCV), are contributing factors in 15 to 20 percent of human cancers [3].

Blood malignancies account for about eight percent of total cancers in Iran [4]. Chronic lymphocytic leukemia (CLL) is one of the most prevalent leukemias that affect elderly individuals in Western countries [5]. The median age of individuals diagnosed with CLL ranges from 67 to 72 years, and it is more common in males than in females [6]. CLL begins with an accumulation of B cells in the blood, bone marrow, and lymphatic tissues. Various risk factors, including blood group, genetics, medication during pregnancy, occupation, radiation exposure, smoking, and EBV infection, have been reported to be associated with leukemia [7].

EBV is a gammaherpesvirus that is ubiquitous in adults worldwide [8]. EBV was the first virus identified to be associated with human cancer. Primary infection with EBV usually occurs before adolescence, followed by a life-long persistent infection of B cells [8]. EBV can establish distinct types of latent infection, and the gene expression pattern varies among them. Tumor cells infected with EBV are mostly latently infected [9]. Pattern III EBV gene expression is observed in lymphoblastoid cell lines (LCLs) [10].

Epstein-Barr virus nuclear antigen 1 (EBNA1) is a viral protein that is expressed in all of the distinct latency patterns and in lytic infection. This protein is composed of 641 amino acids and has several functional motifs and domains, including Gly/Arg, Gly/Gly/Ala, a dimerization domain (DD), a DNA-binding domain (DBD), and a ubiquitin-specific protease 7 (USP7) binding domain [11]. EBNA1 binds to viral DNA elements and cellular promoters, which not only leads to the maintenance of viral episomes in the host cell but also regulates the transcription of viral and cellular genes, making it essential for efficient viral genome replication, persistence, and transcription [12]. EBNA1 can also interact with USP7 and destabilize p53, which causes a reduction in its concentration and restricts cell apoptosis and death [13]. Expression of EBNA1 has been observed in gastric carcinoma (GC) cells with enhanced tumorigenicity [14]. Downregulation of this phosphoprotein by RNA interference has been shown to decrease cell proliferation [15].

EBV has been classified into types 1 and 2, based on EBNA2 and EBNA3 sequences [16]. Moreover, Gutiérrez et al. classified EBV into five subtypes based on residue 487 in EBNA1 (P-ala, P-thr, V-val, V-leu, and V-pro) [17]. Different frequencies of EBV strains have been reported in Asia, particularly in China and Japan, compared to the rest of the world [18, 19]. Studies have also found evidence for the association of certain types of EBV with cancer; for example, EBNA-3B mutations have been found to be more common in isolates from patients with diffuse large B-cell lymphoma (DLBCL) [19]. Some studies have also described variations in the EBNA1 sequence in EBV isolates from tumors and LCLs [18, 20].
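The residue-487 scheme can be illustrated with a short lookup. The following is a minimal sketch, assuming that the subtype is read directly from the one-letter amino acid code at EBNA1 position 487. The function name `classify_subtype` and the "unclassified" fallback are illustrative choices, not part of the published scheme.

```python
# Residue at EBNA1 position 487 -> subtype label, following the five
# subtypes named in the text (P = prototype, V = variant).
SUBTYPE_BY_RESIDUE_487 = {
    "A": "P-ala",
    "T": "P-thr",
    "V": "V-val",
    "L": "V-leu",
    "P": "V-pro",
}


def classify_subtype(residue_487: str) -> str:
    """Return the EBNA1 subtype for a one-letter amino acid code at position 487.

    Any residue outside the five named subtypes is reported as
    "unclassified" (an assumption made for this sketch).
    """
    return SUBTYPE_BY_RESIDUE_487.get(residue_487.upper(), "unclassified")


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for residue in ["A", "T", "V", "G"]:
        print(residue, "->", classify_subtype(residue))
```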

Therefore, in this study, we analyzed the sequences of the DBD/DD domain and USP7 binding domain in the C-terminal region of the EBNA1 gene in CLL patients and compared them with those of healthy individuals.

Materials and methods

Study population, sampling, DNA extraction, and DNA integrity

A total of 40 patients with CLL and 21 gender- and age-matched healthy volunteers were included in this study from 2019 to 2020. CLL patients were selected from among patients who were referred to the Daneshbod Pathobiology Laboratory based on clinical diagnosis and laboratory and immunophenotypic results. Ten mL of whole blood was collected from each participant in a tube containing ethylenediaminetetraacetic acid (EDTA) (10% g/L). The study was approved by the Ethics Committee of Shiraz University of Medical Sciences (IR.SUMS.REC.1400.377). All subjects signed an informed consent form prior to sample collection. Samples were shaken for 2 min and spun at 1500 × g for 20 min. After centrifugation, the buffy coat (BC) was harvested from the top of the red cells and then transferred as 250-µL aliquots to microcentrifuge tubes. RBC Lysis Buffer (Cyto Matin Gene, Isfahan, Iran) was added to the BC, the mixture was incubated for 20 min, and the sample was centrifuged at 450 × g for 5 min. The RBC lysate was discarded, and the BC was transferred to a washing solution (phosphate-buffered saline, 2.5 mM EDTA, and 2% fetal bovine serum) in a microcentrifuge tube, mixed, and centrifuged at 450 × g for 5 min. This step was repeated twice. The supernatant was discarded, and the BC pellet was stored at -20°C. DNA was extracted from the buffy coat of each sample using a viral nucleic acid extraction kit (Roche, Mannheim, Germany). All steps were performed according to the manufacturer’s instructions. The extracted DNA was stored at -20°C. To confirm the quality of the extracted DNA, PCR was performed using the consensus primers PCO3/PCO4 (β-globin) as described previously [21].

EBV genome detection (BHRF1) and EBNA1 amplification

To detect the EBV genome, a set of primers was used to amplify the BHRF1 gene region of EBV [21] under the following cycling conditions: initial denaturation at 95°C for 10 min; 45 cycles of denaturation at 95°C for 45 s, annealing at 57.6°C for 45 s, and extension at 72°C for 45 s; and a final extension at 72°C for 10 min. An in-house nested PCR using two sets of primers was then performed to amplify the C-terminal region of the EBNA1 gene (Table 1) [18]. The cycling conditions for the first reaction were as follows: 95°C for 10 min; 20 cycles of 95°C for 30 s, 60°C for 35 s, and 72°C for 45 s; followed by 72°C for 10 min. To improve the results of the nested reaction, a touch-down protocol was used as follows: 10 cycles at 63°C for 30 s, then 10 cycles at 61°C for 35 s, and finally 15 cycles at 59°C for 45 s. DNA extracted from the cell line B-95.8 (Pasteur Institute, Tehran, Iran) was used as a positive control in each run. PCR products were separated on a 1.5% agarose gel and visualized under ultraviolet light.
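The touch-down schedule for the nested reaction can be written down as data, which makes the total cycle number and the stepwise temperature decrease easy to check. This is a minimal sketch: the class and function names are hypothetical, and only the three stages stated in the text are encoded (any denaturation or extension steps within each cycle are not given in the text and are therefore omitted).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TouchdownStage:
    cycles: int    # number of cycles run at this temperature
    temp_c: float  # stage temperature in degrees Celsius
    hold_s: int    # hold time in seconds


# The three stages exactly as listed in the text.
NESTED_TOUCHDOWN: List[TouchdownStage] = [
    TouchdownStage(cycles=10, temp_c=63.0, hold_s=30),
    TouchdownStage(cycles=10, temp_c=61.0, hold_s=35),
    TouchdownStage(cycles=15, temp_c=59.0, hold_s=45),
]


def total_cycles(stages: List[TouchdownStage]) -> int:
    """Total number of cycles across all stages."""
    return sum(stage.cycles for stage in stages)


def is_touchdown(stages: List[TouchdownStage]) -> bool:
    """True if the temperature strictly decreases from stage to stage."""
    temps = [stage.temp_c for stage in stages]
    return all(earlier > later for earlier, later in zip(temps, temps[1:]))


if __name__ == "__main__":
    print("total cycles:", total_cycles(NESTED_TOUCHDOWN))
    print("temperature steps down:", is_touchdown(NESTED_TOUCHDOWN))
```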

Table 1 Primers used for amplification of the C-terminal region of the EBNA1 gene

Sanger sequencing

Purified PCR products were sequenced directly by the Sanger method (Microsynth, Switzerland), and the resulting sequences were aligned, translated, and analyzed using MEGA 11, with the B95.8 EBV strain as a reference sequence.
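Once a translated sequence has been aligned to the reference, amino acid substitutions can be listed in the notation used in this paper (for example, E411Q). The sketch below is illustrative only and is not the MEGA 11 workflow: the function name is hypothetical, the two sequences are assumed to be already aligned and of equal length, and the four-residue strings in the example are toy data rather than real EBNA1 sequence.

```python
from typing import List


def list_substitutions(reference: str, sample: str, first_position: int) -> List[str]:
    """List amino acid substitutions between two aligned protein sequences.

    reference      -- aligned reference protein segment (e.g. from B95.8)
    sample         -- aligned sample segment of the same length
    first_position -- residue number of the first aligned column
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    substitutions = []
    for offset, (ref_aa, sample_aa) in enumerate(zip(reference, sample)):
        # Skip alignment gaps and ambiguous residues; report real changes only.
        if "-" in (ref_aa, sample_aa) or "X" in (ref_aa, sample_aa):
            continue
        if ref_aa != sample_aa:
            substitutions.append(f"{ref_aa}{first_position + offset}{sample_aa}")
    return substitutions


if __name__ == "__main__":
    # Toy four-residue segments (hypothetical, not real EBNA1 sequence).
    print(list_substitutions("EAGH", "QAGH", first_position=411))
```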

Bioinformatic analysis

After nucleotide sequencing, the raw sequence data were trimmed, aligned, and analyzed using the bioinformatics software MEGA 11. Phylogenetic analysis was performed in MEGA 11 software, using the neighbor-joining method. The full-length EBNA1 reference sequences B95.8 (V01555.2), GD1 (AY961628.3) [22], Akata (KC207813.1) [23], AG876 (DQ279927.1) [24], Mutu (KC207814.1) [23], HKNPC1 (JQ009376.2) [25], M81 (KF373730.1) [26], Akata-GC1 (MG021307) [27], C666-1 (KC617875) [28], IM-3 (MK973061) [29], Mutu-GC2 (MG011309) [27], SNU-719 (AP015015) [30], and YCCEL1 (AP015016) [30] were obtained from the GenBank database. Sequence data can be found in the NCBI database with the following accession numbers: OQ468273, OQ468274, OQ468275, OQ468276, OP271457, OQ834937, OQ834938, OQ834939, OQ834940, OQ834941, OQ834942, OQ834943, OQ834944, OQ834945, OQ834946, OQ834947, OQ834948, and OQ834949.

Statistical analysis

SPSS 26 software was used for data analysis. Fisher’s exact test was used to compare frequencies between the patient and control groups, and a P-value below 0.05 was considered statistically significant.

Results

Demographic characteristics of patients and healthy volunteers

The mean ages of patients and healthy volunteer participants were 61.07 ± 10.2 and 59.08 ± 10.3 years, respectively. Out of 13 patients, nine were male, and out of 12 healthy volunteer subjects, eight were male.

EBV detection and sequencing in patients and healthy individuals

The results of PCR for EBV detection showed that 52.5% (21/40) of the patients and 66.7% (14/21) of the healthy individuals were EBV positive (P > 0.05). Moreover, the results of the nested PCR assay showed that 13 samples from patients and 12 samples from healthy volunteers were suitable for Sanger sequencing.
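The comparison of EBV positivity between the two groups is a 2 × 2 Fisher's exact test. The analysis in this study was run in SPSS; the following is only a minimal standard-library sketch of the two-sided test, using the counts given above (21 positive and 19 negative patients, 14 positive and 7 negative healthy individuals). The function name is hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)

    # Sum the probabilities of all tables no more likely than the observed
    # one (small tolerance guards against floating-point ties).
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )


if __name__ == "__main__":
    # Rows: CLL patients, healthy individuals; columns: EBV positive, negative.
    p_value = fisher_exact_two_sided(21, 19, 14, 7)
    print(f"P = {p_value:.3f}")
```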

Genetic variation at the C-terminus of EBNA1

A global comparison and pairwise alignments of the 25 EBNA1 C-terminal sequences with those of 13 different strains of EBV isolated from patients with various diseases (B95.8, GD1, Akata, AG876, Mutu, HKNPC1, M81, Akata-GC1, C666-1, IM-3, Mutu-GC2, SNU-719, and YCCEL1) showed that the sequences were generally very similar except for several polymorphic residues, such as T497, T524, and V574.

A list of all of the variations in the DNA binding domain and in the USP7 binding site is presented in Table 2. Twenty-five nucleotide point mutations (11 transitions and 14 transversions), namely G96892C, A96914T, G96946A, G96954C, G96976A, C97072G, C97088A, A97110C, G97118A, G97120A, C97121T, A97135T, C97158T/A, A97221C, A97231G, C97232T, G97320A, G97350T, T97382G, A97411C, A97414C, G97423C, G97442A, and T97445C, were found more frequently in the CLL patients than in the healthy volunteer subjects. Of these, only the transition at position 97320 was significantly more frequent in CLL patients than in the healthy controls (P = 0.039). Twenty-one point mutations led to 21 amino acid substitutions, E411Q, H418L, V429M, A439T, Q471E, P476Q, E483D, R486K, A487T/V, S492C, D499E, T524I/V, M563I, V574G, M584L, T585P, A588P, R594K, and V595A, which had a higher frequency in the patient group than in the healthy volunteer group, but the differences were not statistically significant (P > 0.05).
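The split into transitions and transversions can be checked directly from the mutation labels. A transition exchanges a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); any other change is a transversion. The following is a minimal sketch with hypothetical helper names; it assumes that each label has the form reference base, position, alternative base(s), and that a label such as C97158T/A stands for two separate changes.

```python
import re
from typing import List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Mutation labels as listed in the text.
MUTATIONS = [
    "G96892C", "A96914T", "G96946A", "G96954C", "G96976A", "C97072G",
    "C97088A", "A97110C", "G97118A", "G97120A", "C97121T", "A97135T",
    "C97158T/A", "A97221C", "A97231G", "C97232T", "G97320A", "G97350T",
    "T97382G", "A97411C", "A97414C", "G97423C", "G97442A", "T97445C",
]


def expand(label: str) -> List[Tuple[str, int, str]]:
    """Split a label such as 'C97158T/A' into (ref, position, alt) tuples."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT](?:/[ACGT])*)", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label}")
    ref, position, alts = match.group(1), int(match.group(2)), match.group(3)
    return [(ref, position, alt) for alt in alts.split("/")]


def classify_change(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single base change."""
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_changes(labels: List[str]) -> dict:
    """Count transitions and transversions over a list of mutation labels."""
    counts = {"transition": 0, "transversion": 0}
    for label in labels:
        for ref, _position, alt in expand(label):
            counts[classify_change(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    print(count_changes(MUTATIONS))  # {'transition': 11, 'transversion': 14}
```

Applied to the labels above, the count reproduces the 11 transitions and 14 transversions reported in the text.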

Table 2 Nucleotide variations of EBNA1 identified in CLL patients and in the control group

Three EBV subtypes, including the prototype P-ala and two known variants, V-val and P-thr, were found in our study groups (Table 3). In CLL samples, 76.9% (10/13), 15.4% (2/13), and 7.7% (1/13) of the viruses were of the P-ala, P-thr, and V-val subtype, respectively. All of the samples from the healthy volunteer group were infected with the P-ala subtype. Statistical analysis showed that the frequency of the different EBV subtypes was not significantly different between patients and healthy volunteers (P = 0.207) (Table 4).

Table 3 Nonsynonymous sequence variations at amino acid positions 411 to 641 of Epstein-Barr virus (EBV) nuclear antigen (EBNA) 1 compared with the wild type (wt)
Table 4 Comparison of mutations in Epstein-Barr virus (EBV) nuclear antigen (EBNA) 1 sequences from CLL patients and healthy individuals

Phylogenetic analysis

We compared the EBNA1 gene sequences of the CLL isolates from this study to those of 13 other isolates (B95.8, GD1, Akata, AG876, Mutu, HKNPC1, M81, Akata-GC1, C666-1, IM-3, Mutu-GC2, SNU-719, and YCCEL1) (Fig. 1). Alignment of these sequences revealed that 21 isolates (10/13 CLL and 11/12 controls) from this study were closely related to the reference EBV sequences. Phylogenetic analysis based on the C-terminal region of EBNA1 showed that four isolates (3/13 CLL and 1/12 controls) were more similar to EBV strains isolated from Burkitt’s lymphoma (BL) in Japan and Kenya and non-Asian GC. CLL6 was found to be more distantly related to the other EBV sequences.

Fig. 1 Phylogenetic tree based on the C-terminal region of EBNA1 from 38 EBV strains (25 new strains and 13 published strains). The tree was constructed by the maximum-likelihood method. Bootstrap values above 60 are shown, and the scale bar represents 0.01 nucleotide substitutions per site. CLL, chronic lymphocytic leukemia; H, healthy subjects

Discussion

EBV has been associated with various malignancies in different geographic regions, and EBNA1, which is the only viral protein that is expressed in all EBV-associated tumors, has shown sequence variations [18]. EBNA1 gene variations in distinct EBV-associated malignancies have been analyzed, but a final conclusion about geographical or disease associations with EBNA1 subtypes has not been reached [31]. Some researchers have suggested that a possible association exists between EBNA1 gene variation and tumors [32]. In the present study, we investigated the nucleotide variations in the DNA binding domain, dimerization domain, and USP7 binding domain of EBNA1 (from aa 411 to 626) in chronic lymphocytic leukemia and healthy individuals.

Our results revealed that sequence variations within the EBNA1 C-terminal domains were present in 38.5% (5/13) of CLL patients and 16.7% (2/12) of healthy volunteer subjects. Moreover, the results showed that non-synonymous mutations, including E411Q, H418L, V429M, A439T, Q471E, P476Q, E483D, R486K, A487T/V, S492C, D499E, T524I/V, M563I, V574G, M584L, T585P, A588P, R594K, and V595A, and synonymous mutations at P431, D499, L520, and L553 were more frequent than mutations at other sites. This suggests that some residues can be considered hotspots for mutation. Frequent changes at residues 429, 492, 499, 520, 524, 553, 574, and 594 have been observed in previous studies [16, 18, 33–38].

In a previous study, a comparison of EBNA1 sequences showed that, out of 116 isolates from nasopharyngeal carcinoma (NPC) tumor biopsies, 40 had matching non-synonymous mutations, including P476Q, E483D, A487T, S492C, D499E, and T524I [16]. Moreover, Sun et al. reported that the substitutions H418L, V429M, A439T, A487T/V, S492C, D499E, T524V/I, M563I, V574G, T585P, R594K, and V595A were more frequent in isolates from NK/T, HL, DLBCL, and T cell lymphoma than in a healthy control group [31]. The same set of amino acid changes has been reported to be more frequent in NPC biopsy samples than in samples from healthy individuals [34]. In addition, the mutations most frequently found in Australian Caucasian patients with infectious mononucleosis (IM) and healthy donors were V429M, P476Q, A487T, S492C, T524I, M563I, V574G, T585P, R594K, and V595A [35].

Furthermore, the amino acid substitutions E411D, H418L, A439T, T524I, I528V, L533I, and R594K have been reported in isolates from GC and NPC, with all other substitutions except for I528V and L533I [27]. Zhou et al. found 33 non-synonymous mutations, among which were H418L, V429M, A439T, Q471E, P476Q, R486K, A487T/V/P, S492C, D499E, T524I, M563I, V574G, M584L, T585P, R594K, and V595A. SNPs in the C-terminal region of the EBNA1 gene have been reported in isolates from patients with different diseases, including BL, NPC, and IM, as well as isolates from healthy individuals [38].

In a study by Tschochner et al. [36], the most frequently identified changes in EBNA1 were E411Q, V429M, P476Q, A487T, S492C, T524I, M563I, V574G, T585P, R594K, and V595A. Thuan et al. reported the amino acid changes A487V, S492C, D499E, T524I, I528V, L533I, A588P, and R594K in isolates from NPC cases in Vietnam [37].

In the case of synonymous mutations, our results demonstrated that a transition of guanine to adenine at nt 97,320 (aa 553) was significantly more frequent in CLL patients than in the control group. A recent analysis of isolates from NPC patients in Vietnam showed the same silent mutation [37]; however, Banko et al., in another study on NPC patients, did not observe a G-to-A transition at nt 97,320 [16]. Two more synonymous mutations, at nt positions 97,158 (C to T) and 97,221 (A to C), were detected exclusively in the CLL group in our study. These point mutations have been detected more frequently in isolates from NPC patients than in isolates from healthy subjects [37]. Also, many isolates from patients with EBV-associated diseases have been reported to contain silent mutations at positions 499, 520, and 553 [38].

Altogether, nucleotide variations at the EBNA1 C-terminal region are more common in EBV isolates from patients with EBV-associated diseases than in those from healthy controls, suggesting that sequence variations at the EBNA1 C-terminus might be associated with EBV pathogenesis and infection outcome.

Bell et al. have demonstrated that genetic diversity within EBNA1 can substantially affect immune recognition of this protein [35]. Mutations at nucleotides 97,231 and 97,232 result in an amino acid substitution at residue 524, which is part of an endogenously processed HLA-B8-binding CTL epitope. This modification has been seen in four isolates from CLL patients [39]. Moreover, the replacement of threonine by isoleucine at position 524 causes the loss of a phosphorylation site in the V-val subtype. Therefore, amino acid variations in V-val strains might be easier to maintain in the latent infection stage [40].

The results of our study also demonstrated the presence of three EBV subtypes: P-ala 76.9% (10/13), P-thr 15.4% (2/13), and V-val 7.7% (1/13) in CLL patients, but only the P-ala (12/12, 100%) subtype in healthy volunteer subjects. In a previous study, the reported distribution of the EBV subtype in 44 NPC cases from Vietnam was as follows: P-ala 4.55% (2/44), P-thr 11.36% (5/44), V-val 79.55% (35/44), and V-leu 4.55% (2/44) [37]. Gutiérrez et al. found V-pro in BL but not in NPC, and V-val was found more frequently in NPC than in BL [17]. Habeshaw et al. also found V-leu to be the most common subtype in East Africa and P-ala and P-thr to be common in BL in Europe, and in control samples [41]. Bhatia et al. reported a higher frequency of the P-thr and V-leu subtypes in BL samples than in healthy subjects [20].

In another investigation, the distribution of EBNA1 subtypes in NPC, GC, lymphoma, and healthy donors was as follows: in NPC, V-val 73.2% (30/41), P-thrV 24.4% (10/41), and V-leuV 2.4% (1/41); in GC, V-val 78% (32/41), P-thrV 12.2% (5/41), and V-leuV 4.9% (2/41); in lymphoma, V-val 68.2% (75/110), P-thrV 15.5% (17/110), V-leuV 3.6% (4/110), and P-ala 10.9% (12/110); and in healthy subjects, V-val 61.8% (34/55), P-thrV 27.3% (15/55), V-leuV 1.8% (1/55), and P-ala 1.8% (1/55). Sun et al. also found a higher number of V-val strains in lymphoma samples [31].

It is likely that these variants have functional differences yet to be identified. Considering the frequency at which mutations are seen in EBV isolates from CLL patients, EBNA1 variation might be associated with a higher risk of CLL development. Moreover, the fact that P-thr and V-val subtypes were detected exclusively in CLL patients suggests that these subtypes might be involved in the pathogenesis of EBV in CLL disease.

Our results also showed that 21 (52.5%) samples from CLL patients and 14 (66.7%) from healthy individuals were EBV positive, but this difference was not statistically significant. Grywalska et al. found EBV DNA in 53.91% of CLL patients, whereas no EBV DNA was found in healthy subjects [42]. Kimura et al. reported higher copy numbers of EBV DNA in patients with lymphoproliferative disease than in those with no EBV-related disease [43].

In conclusion, the results of this study showed that variation in the C-terminal region of EBNA1 in CLL patients, especially mutations at nucleotides A97231, C97232, T97382, and G97442 (amino acids 524, 574, and 594), might contribute to the pathogenesis of EBV in CLL patients. More studies are recommended to verify these results.