Introduction

Certain types of human papillomaviruses (HPV) appear to be associated with the genesis of human malignancies. It has been widely confirmed that high-risk human papillomavirus is the major etiological factor of cervical carcinoma [1]. Numerous clinical studies have also detected HPV in several other cancers, including esophageal carcinoma, lung cancer, ovarian cancer, breast cancer, penile cancer, skin cancer, and head and neck cancers [2–6]. HPV-related cancers are considered to have better prognoses than cancers associated with classical carcinogenic factors [7]. The increasing incidence of human cancer necessitates a better understanding of the state in which HPV exists during the development of HPV-related cancers.

Nearly 20 % of human cancers are closely related to viral infections. A link between certain cancers and carcinogenic viruses has been confirmed [8–10], and the strategy used by these carcinogenic viruses for carcinogenesis is genome integration [11–13]. The integration of viral genomes into host genomes is considered a key step in the process of cancer-related viral infection. The integration of the HPV genome into the human genome may be one of the most important mechanisms of the induction of HPV-related cancers [14]. Thus, it is vitally important to investigate the mechanism of viral integration for the development of treatments for viral infections and gene therapies.

HPV integration sites in genomes have been extensively explored in recent years. The most comprehensively reported HPV integration sites are those in cervical carcinoma cells and include 2p23, 5p11–15, 8q13, 8q24, 9q31–q34, 10q26, 11q13, 11q23–q25, 14q11.2, 19p13.1, 20p11.2, and 22q12–13 [15–20]. Only one HPV integration site, 8p11.21 [21], has been reported for esophageal carcinoma cells. Furthermore, few data are available concerning HPV infection and integration status in other HPV-related cancers, such as tongue carcinoma, lung carcinoma, and breast cancer.

Whether the HPV insertion site is randomly or regularly distributed in the human genome has been widely discussed, and this is still controversial [22, 23]. According to previous research, HPV tends to integrate into fragile sites, or cancer-related sites, in human chromosomes of cervical carcinoma cells, but this bias has not been detected in every carcinoma cell line [24, 25]. For other HPV-related cancers, the integration site tendencies are still unclear.

Based on previous studies, the relevance of HPV was examined in ten cell lines that may be associated with HPV infection, including esophageal carcinoma (EC109, EC9706, KYSE150, KYSE450 and KYSE140), lung carcinoma (A549), hepatocellular carcinoma (Hep G2), tongue cancer (Tca83), adrenal neuroblastoma (NH-6), and mammary cancer (MCF7) cell lines. We detected HPV at both the DNA and RNA levels. Nested PCR was used to obtain fusion mRNAs containing half viral sequences and half human sequences and was followed by bioinformatics analysis. Our research expands the available information on HPV integration conditions in these carcinoma cell lines. The results promote the understanding of the states of HPV existence in HPV-related cancers.

Materials and methods

Cell lines

Esophageal carcinoma (EC109, EC9706, KYSE150, KYSE450 and KYSE140), lung carcinoma (A549), hepatocellular carcinoma (Hep G2), tongue cancer (Tca83), mammary cancer (MCF7), adrenal neuroblastoma (NH-6), and cervical cancer (HeLa) cell lines were maintained in our laboratory. The HeLa cell line served as the HPV 18 positive control. Under standard conditions, these cell lines were cultivated in Dulbecco’s modified Eagle’s medium (DMEM; GIBCO, China) supplemented with 10 % fetal bovine serum, 100 U/mL penicillin, and 100 U/mL streptomycin. MRC5 cells were cultivated in standard DMEM with 1 % NEAA. Cell lines were maintained in a 5 % CO2 atmosphere at 37 °C and were harvested at the exponential proliferation phase.

Detection of HPV DNA

Genomic DNA was isolated from each carcinoma cell line using the QIAamp DNA Mini Kit (QIAGEN, Germany) according to the manufacturer’s protocol. HPV DNA was detected by polymerase chain reaction (PCR) with the general primer sets MY09/11 and GP5+/6+ for conserved genes of HPV. The type-specific primer sets were designed according to the HPV 16 and HPV 18 gene sequences in GenBank. The general primer sets for HPV DNA detection are listed in Table 1. PCR was conducted according to Jacobs [26]. The PCR products were examined by 1.2 % agarose gel electrophoresis.

Table 1 General primer sets for HPV DNA detection and APOT assay

Reverse transcription

Total RNA from the cell lines was extracted using the TRIzol method, eluted in RNase-free water, and stored at −80 °C for further analysis. RNA concentration was quantified by spectrophotometry. A human housekeeping gene was used as a positive control to ensure that the nucleic acid integrity was sufficient for PCR amplification. Total RNA isolated from each cell line was reverse-transcribed into cDNA. Reverse transcription was performed using 0.2 µg RNA, 2 µL dNTPs (2.5 mmol/L), 1 µL oligo-dT primer (10 pmol/L), 1 µL RNase inhibitor, 2 µL buffer, 0.2 µL reverse transcriptase, and RNase-free water to a final volume of 20 µL. Each reaction was incubated at 42 °C for 60 min and then inactivated at 70 °C for 15 min. The reverse-transcribed products were stored at 4 °C until further use.
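The reaction composition above fixes every volume except the RNA and the water, which depend on the measured RNA concentration. The following is a minimal sketch of that arithmetic, not the authors' procedure: the function name `rt_mix` and the example concentration of 100 ng/µL are illustrative assumptions, while the component volumes are those listed in the text.

```python
# Sketch: volumes for one 20 µL reverse-transcription reaction.
# Component volumes come from the text; rt_mix and the example
# concentration are hypothetical and used only for illustration.

FINAL_VOLUME_UL = 20.0
RNA_INPUT_NG = 200.0  # 0.2 µg of total RNA per reaction

FIXED_COMPONENTS_UL = {
    "dNTPs (2.5 mmol/L)": 2.0,
    "oligo-dT primer": 1.0,
    "RNase inhibitor": 1.0,
    "buffer": 2.0,
    "reverse transcriptase": 0.2,
}


def rt_mix(rna_conc_ng_per_ul):
    """Return the volume (µL) of every component, including RNA and water."""
    if rna_conc_ng_per_ul <= 0:
        raise ValueError("RNA concentration must be positive")
    rna_ul = RNA_INPUT_NG / rna_conc_ng_per_ul
    water_ul = FINAL_VOLUME_UL - sum(FIXED_COMPONENTS_UL.values()) - rna_ul
    if water_ul < 0:
        # The RNA alone would exceed the space left in a 20 µL reaction.
        raise ValueError("RNA is too dilute to fit 0.2 µg into 20 µL")
    mix = {"RNA": rna_ul}
    mix.update(FIXED_COMPONENTS_UL)
    mix["RNase-free water"] = water_ul
    return mix


if __name__ == "__main__":
    for component, volume in rt_mix(100.0).items():
        print(f"{component}: {volume:.1f} µL")
```

The fixed components occupy 6.2 µL, so at most 13.8 µL remains for RNA plus water; a sample more dilute than about 14.5 ng/µL would not fit and would raise an error in this sketch.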

Amplification of papillomavirus oncogene transcript (APOT) assay

To confirm whether these cell lines had detectable viral-cell fusion, a PCR-based protocol for the amplification of papillomavirus oncogene transcripts was used to analyze integrated transcripts of HPV in the chromosomes of these cancer cell lines. Nested PCR (TaKaRa, Japan) was performed with the cDNA samples to produce the fusion mRNA sequences described by Klaes et al. [27]. The primer pairs used are shown in Table 1. To control for false-positives, HEK293 cellular DNA was included as a negative control.

Genomic context analysis

The nested PCR products were separated by agarose gel electrophoresis. To analyze the chromosomal location of HPV integration, PCR products were inserted into the pMD19-T Simple vector (TaKaRa, Dalian, China) and transformed into Escherichia coli DH5α-competent cells (TaKaRa, Dalian, China). The sequences containing both viral and human genes were analyzed using NCBI BLAST. Because the human genome has been completely sequenced, the human portions of these fusion mRNAs could be precisely located in the human genome. After vector cleaning with NCBI VecScreen, we performed two BLAST searches for each sequence, one against the human genomic transcript collection and one against the nucleotide collection. The human genomic transcript collection contains only human genetic information; the nucleotide collection also contains genetic information from other species. The NCBI Map Viewer (http://www.ncbi.nlm.nih.gov/mapview/) was used to map cellular breakpoints into specific chromosomal regions and to determine cellular genes located at or near HPV integration sites.
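Once a cloned sequence has one viral and one human alignment, the viral-cellular junction can be described from the query coordinates of the two hits. The sketch below illustrates that idea only; the `Hit` class, the `describe_junction` function, and the example coordinates are hypothetical and are not taken from the study's data or from any BLAST output format.

```python
# Sketch: describe the viral-human junction of a fusion transcript
# from the query coordinates of its best viral and best human hits.
# All names and the example coordinates below are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # "viral" or "human"
    q_start: int  # 1-based, inclusive position on the query
    q_end: int    # 1-based, inclusive position on the query


def describe_junction(hits):
    """Return the 5'/3' partners and the gap between them, or None.

    gap == 0 means the two alignments abut, gap < 0 means they share
    bases at the junction, gap > 0 means unaligned bases lie between.
    """
    viral = [h for h in hits if h.source == "viral"]
    human = [h for h in hits if h.source == "human"]
    if not viral or not human:
        return None  # not a fusion: only one genome is represented
    best_viral = max(viral, key=lambda h: h.q_end - h.q_start)
    best_human = max(human, key=lambda h: h.q_end - h.q_start)
    five, three = sorted((best_viral, best_human), key=lambda h: h.q_start)
    return {
        "five_prime": five.source,
        "three_prime": three.source,
        "gap": three.q_start - five.q_end - 1,
    }


if __name__ == "__main__":
    # Hypothetical clone: viral part at 1-120, human part at 118-260.
    example = [Hit("viral", 1, 120), Hit("human", 118, 260)]
    print(describe_junction(example))
```

A sequence that yields only viral hits returns `None` here, which corresponds to a non-fusion transcript that carries no information about the chromosomal location.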

Results

Detection of HPV DNA in carcinoma cell lines

Ten cell lines were analyzed by polymerase chain reaction for HPV detection and genotyping. The relationship between HPV infection and esophageal carcinoma has been hypothesized for years. To explore whether esophageal carcinoma cell lines are infected by high-risk human papillomavirus, EC109, EC9706, KYSE150, KYSE450, and KYSE140 cell lines were examined for the presence and genotype of HPV. The general primer sets MY09/11 and GP5+/6+, which generate amplicons of approximately 450 and 150 bp, respectively, were used for HPV L1 detection. The amplified fragment sizes of the type-specific primer sets for HPV 18 E6 and HPV 16 E6 were both 335 bp. As shown in Fig. 1, the esophageal carcinoma cell lines EC9706 and EC109 and the cervical cancer cell line HeLa (positive control) were positive for HPV 18 infection at the DNA level, and the other three esophageal carcinoma cell lines examined (KYSE140, KYSE150 and KYSE450) were negative for HPV 18 infection.

Fig. 1

Detection of HPV 16 and HPV 18 type DNA in five esophageal carcinoma cell lines. M 100-bp DNA ladder, 1–4 DNA detection in five cell lines of esophageal carcinoma using the primer pairs GP5+/6+, MY09/11, HPV 16, and HPV 18, respectively. HeLa cell line was used as positive control (template of HPV 18-infected cervical cancer cell)

Another five cell lines, lung carcinoma (A549), hepatocellular carcinoma (Hep G2), tongue cancer (Tca83), mammary cancer (MCF7), and adrenal neuroblastoma (NH-6) cell lines, which may be associated with HPV infection, were analyzed using the same methods as above. The results are shown in Table 2. HPV DNA was not inserted in the genome of MCF7 or NH-6 cell lines. The genomes of Tca83, Hep G2 and A549 cell lines contained HPV18 DNA inserts. Based on these results, it can be concluded that the mammary cancer cell line MCF7, adrenal neuroblastoma cell line NH-6, and esophageal carcinoma cell lines KYSE150, KYSE140 and KYSE450 are not HPV-related carcinoma cell lines, and this has not been reported previously. EC9706, EC109, Tca83, Hep G2, and A549 cell lines, which showed HPV infection at the DNA level, were chosen for further exploration for HPV integration at the mRNA level.

Table 2 Detection of HPV DNA in upper aerodigestive tract cancer cell lines

HPV integration in carcinoma cell lines

The amplification of papillomavirus oncogene transcript assay was used to confirm whether cell lines positive for HPV at the DNA level (Tca83, Hep G2, A549, EC109 and EC9706) had detectable viral-cell fusion. The APOT analysis showed that only EC9706 and EC109 cell lines generated transcripts of HPV genes integrated in their chromosomes. Thus, integration of HPV at the transcriptional level was not detected in tongue cancer (Tca83), hepatocellular carcinoma (Hep G2), or lung carcinoma (A549) cell lines. We conclude that HPV infection occurred in Tca83, Hep G2, and A549 cell lines, but the inserted sequences are not transcribed into mRNA.
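The DNA-level and transcript-level results reported so far can be tallied per cell line. The sketch below only restates the statuses given in the text; the variable and function names are illustrative assumptions.

```python
# Sketch: tally of the per-cell-line results stated in the text.
# True means HPV 18 DNA was detected by PCR in that cell line.
HPV_DNA_POSITIVE = {
    "EC109": True,
    "EC9706": True,
    "KYSE150": False,
    "KYSE450": False,
    "KYSE140": False,
    "A549": True,
    "Hep G2": True,
    "Tca83": True,
    "NH-6": False,
    "MCF7": False,
}

# Cell lines in which the APOT assay detected viral-cell fusion transcripts.
FUSION_TRANSCRIPT_POSITIVE = {"EC109", "EC9706"}


def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


total = len(HPV_DNA_POSITIVE)
dna_positive = sorted(name for name, hit in HPV_DNA_POSITIVE.items() if hit)
dna_only = sorted(set(dna_positive) - FUSION_TRANSCRIPT_POSITIVE)

if __name__ == "__main__":
    print(f"HPV DNA positive: {len(dna_positive)}/{total} "
          f"({percent(len(dna_positive), total):.0f} %)")
    print(f"Fusion transcript positive: {len(FUSION_TRANSCRIPT_POSITIVE)}/{total} "
          f"({percent(len(FUSION_TRANSCRIPT_POSITIVE), total):.0f} %)")
    print("DNA positive without detectable fusion transcript:", ", ".join(dna_only))
```

Keeping the two result sets separate makes the distinction explicit: a cell line can carry HPV DNA without yielding a detectable fusion transcript.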

Figure 2 shows the results of nested PCR of the cDNA of esophageal carcinoma cell lines. Both HPV 18-specific and HPV 16-specific primers were used for the nested PCR of the cDNA of EC109 and EC9706 cell lines. An HPV 18-specific transcript of approximately 200 bp was identified in both EC109 and EC9706 cell lines. The 200-bp nested PCR product of integrated HPV 18 DNA was cloned into competent E. coli cells for sequence alignment.

Fig. 2

Nested PCR of esophageal carcinoma cell lines. a cDNA from EC109 cell line was subjected to nested PCR with the following: 1 a set of HPV 16-specific primers and 2 a set of HPV 18-specific primers. b cDNA from EC9706 cell line was subjected to nested PCR with the following: 3 a set of HPV 18-specific primers and 4 a set of HPV 16-specific primers. M 100-bp DNA marker

The alignment sequencing results of the partial sequence of HPV 18 obtained from EC9706 and EC109 cells are shown in Fig. 3. The sequences containing both viral and human genes were analyzed using NCBI BLAST. The HPV 18 amplicon from EC9706 cells was a 99-bp gene segment with an identical sequence to that of early protein E7 of the standard HPV 18 genomic sequence from the B2542 entry in GenBank (Fig. 3a). The HPV 18 amplicon from EC109 cells was a 98-bp gene segment with an identical sequence to 98 bp of HPV 18 from the BF226 entry in GenBank (Fig. 3b).

Fig. 3

Alignment sequencing results for EC9706 and EC109 cells with HPV 18. a Sequencing results for EC9706 cells after PCR amplification with HPV 18-specific primers; Sbjct 242–340 partial sequence of human papillomavirus 18 E7 gene from B2542 entry in GenBank. b Sequencing results for EC109 cells after PCR amplification with HPV 18-specific primers; Sbjct 832–929 partial sequence of human papillomavirus 18 from BF226 entry in GenBank

Because the human genome has been completely sequenced, the human portions of these fusion mRNAs could be precisely located on human chromosomes. To analyze the chromosomal locations of HPV integrations, we performed two BLAST searches for each sequence after vector cleaning with NCBI VecScreen, one against the human genomic transcript collection and one against the nucleotide collection. The fusion mRNA sequences were obtained for each cell line. The sequencing results for EC109 (Query 98–170) and EC9706 (Query 99–171) cells were the same. The integrated transcripts were derived from the HPV E7 oncogene.

Interestingly, the HPV integration sites of EC109 and EC9706 were exactly the same. As a representative sequence, the fusion mRNA sequence of EC9706 was meticulously mapped (Fig. 4). The long arm of chromosome 8 (8q24) in the EC9706 cell line was the expression area for HPV type 18 (Fig. 4c). The viral transcription sites and the characteristics of the HPV integration sites are shown in Fig. 4. The red points on the sequence map of chromosome 8 indicate three possible insertion sites of viral-cell fusion. They are partial sequences of alternate assembly CHM1 1.1 (Sbjct 128281045–128281117), alternate assembly HuRef (Sbjct 123558541–123558613) and primary assembly GRCh38 (Sbjct 127228559–127228631) in human chromosome 8. The partial chromosome maps of each mRNA and their exact locations are shown in Fig. 4b. The blast hits of mRNAs, the viral portions of the fusion mRNAs, the human portions of the fusion mRNAs, and their corresponding locations are all presented. Fusions of EC9706 and EC109 contain half viral sequences at their 5′-ends and half human sequences at their 3′-ends. As shown in Fig. 4a, there were no protein-coding genes around the red points. Thus, the integration sites were all gene desert regions. The viral copies integrated into gene desert regions in esophageal carcinoma EC9706 and EC109 cells.
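The coordinates reported above can be checked for internal consistency: the human portion of each query and each of the three chromosome 8 hits should cover the same number of bases. The following sketch performs only that check; the coordinates are those given in the text, and the names `span_length`, `HUMAN_QUERY_RANGES`, and `CHR8_SUBJECT_RANGES` are illustrative assumptions.

```python
# Sketch: consistency check on the alignment coordinates in the text.
# Ranges are 1-based and inclusive, as in BLAST Query/Sbjct lines.


def span_length(start, end):
    """Number of bases covered by an inclusive coordinate range."""
    return end - start + 1


# Human portions of the fusion transcripts (query coordinates).
HUMAN_QUERY_RANGES = {
    "EC109": (98, 170),
    "EC9706": (99, 171),
}

# Matching positions on chromosome 8 in three assemblies (subject coordinates).
CHR8_SUBJECT_RANGES = {
    "CHM1_1.1": (128281045, 128281117),
    "HuRef": (123558541, 123558613),
    "GRCh38": (127228559, 127228631),
}

query_lengths = {k: span_length(*v) for k, v in HUMAN_QUERY_RANGES.items()}
subject_lengths = {k: span_length(*v) for k, v in CHR8_SUBJECT_RANGES.items()}

if __name__ == "__main__":
    print("Human query spans (bp):", query_lengths)
    print("Chromosome 8 spans (bp):", subject_lengths)
```

Equal span lengths are consistent with the same human segment being aligned in both cell lines and in all three assemblies; the differing absolute positions reflect different assembly coordinate systems rather than three distinct loci.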

Fig. 4

Transcription sites of EC9706 cell fusion mRNA map. a Human sequence map of chromosome 8. The red points on chromosome 8 show the viral transcription sites and the characteristics of integration sites. b The partial chromosome maps of each mRNA and their exact locations. The black strips show the viral portions of the fusion mRNAs; the blue strips show the human portions of the fusion mRNAs; their corresponding locations are represented on the green axes with red points showing their positions; the blue points on the axes are blast hits of mRNAs. c The red strips on chromosome 8 show the viral transcription sites confirmed by us

Discussion

Human papillomavirus (HPV) is a circular double-stranded DNA virus that has been widely confirmed to be one of the most important cancer-related viruses. High-risk HPV has been observed in at least 90 % of cervical cancers and approximately 20 % of head and neck cancers [4]. Integration of HPV genomes into human genomes has been regarded as an important step in the process of carcinogenesis. HPV integration affects the expression of viral proteins and evokes subsequent cell-transformation.

Human papillomavirus infection could be a suspected or potentially modifiable risk factor for carcinoma [28]. Many HPV-integrated carcinoma cell lines have been reported [15–17]. However, there are still a great number of undiscovered HPV-integrated cell lines. Most previous studies of HPV integration focused only on the integration conditions at the DNA level; very few studies have examined HPV viral mRNA transcription status. In our research, HPV integration was examined at both the DNA and RNA levels in ten cancer cell lines. The APOT assay was used to obtain fusion mRNAs for the detection of the oncogenic HPV transcripts. Viral-cellular fusion transcripts are molecular markers of transcriptionally active integration sites, and PCR amplification can be used for subsequent sequence analysis. In this study, we identified viral-cellular fusion transcripts in two of the five cell lines that were positive for HPV at the DNA level (EC9706 and EC109). Interestingly, the other three DNA-positive cell lines did not have detectable viral-cell fusion. The results of this study demonstrate that HPV DNA was present in 50 % of the cell lines examined, but only 20 % of cell lines transcribed the integrated HPV segment into mRNA. To determine the mechanisms of this phenomenon, the process of successful and complete integration should be discussed.

Theoretically, a complete integration should lead to the persistent expression of viral proteins at a sufficient level for the induction of malignancy. HPV integration cannot be readily detected if integration occurs in a single cell without subsequent clonal selection pressure [29]. If the viral genome integrates into a site without surrounding transcription factor binding sites or a site that lacks the necessary factors for mRNA transcription in eukaryotes, the viral genes might not be transcribed. Such a lack of surrounding transcription factor binding sites or of the necessary transcription factors may explain why Tca83, Hep G2, and A549 cell lines exhibited HPV at the DNA level but not at the RNA level. Thus, these findings suggest that successful integration occurs in transcriptionally active regions.

This study provides further support for HPV integration sites being located in chromosomal hot-spot regions. According to our study, the 8q24 region is a hot spot for the transcription of HPV 18 in the esophageal cancer cell lines EC9706 and EC109. Fusion mRNAs were clearly transcribed from highly focused regions of the human chromosomes. According to previous research, integration of the virus occurs most commonly at the 8q chromosomal locus. A number of HPV integration sites in HPV-related cancer cell lines, such as the cervical cancer cell lines HeLa and CaSki, have been reported to be located in this region [30–32]. The 8q24 region, a gene desert area, has been identified as a viral transcription hot spot and as a fragile site [30, 33]. Chromosome breakage studies have suggested that the observed viral-cellular integration sites may be within common fragile sites. Thus, this observation is of particular interest because it suggests that integration is not only associated with transcriptionally active regions but also with fragile sites.

Conclusions

The HPV genome exists in a variety of cancers. Integration of the HPV genome and persistent viral infection play a crucial role in the progression of cancerous lesions of HPV-related cancers. Additionally, fragile sites, such as 8q24, are involved in the integration process and act as integration hotspots. Examining the transcription of HPV genes integrated in the genome of cancer cells can aid in the understanding of the integration mechanisms of HPV. The results of this study suggest that comprehensive mapping of HPV integration sites will elucidate the mechanisms of HPV integration and aid in the development of individualized biomarkers for the early diagnosis of HPV-related cancers.