Introduction

Hepatocellular carcinoma (HCC) is the most common subtype of liver cancer [1]. In 2018, HCC ranked as the sixth most common cancer and the fourth leading cause of cancer-related death [2]. The main risk factors for HCC are hepatitis B or C virus infection, cirrhosis, alcohol consumption, and metabolic diseases [3,4,5]. However, the molecular mechanisms underlying HCC remain unclear. Therefore, this study aimed to identify novel biomarkers and possible pathogenic mechanisms of HCC.

Accumulating evidence from next-generation sequencing suggests that more than three-quarters of genes encode non-coding RNAs, among which microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) have been the most extensively studied. LncRNAs, which are longer than 200 nucleotides, participate in transcriptional and post-transcriptional regulation and could serve as biomarkers and therapeutic targets in cancer [6,7,8]. The competitive endogenous RNA (ceRNA) hypothesis proposes that RNAs sharing miRNA response elements (MREs), such as lncRNAs and mRNAs, compete for binding to the same miRNAs, forming a large RNA regulatory network that influences physiological and pathological processes [9]. Many studies have confirmed that, within such networks, lncRNAs containing sequences complementary to miRNAs act as miRNA sponges and thereby regulate the expression of protein-coding genes [10,11,12,13].

The Cancer Genome Atlas (TCGA) is a public database containing clinicopathological data and high-throughput sequencing results from a large number of patients with various types of cancer, and it has been used to investigate the molecular basis of cancer. To date, data from 10,000 patients have been collected, providing a valuable resource for cancer research. In the present study, RNA expression profiles were downloaded from TCGA, and differentially expressed RNAs were identified. Next, a ceRNA network was constructed to reveal the crosstalk among differentially expressed lncRNAs, miRNAs, and mRNAs (DElncRNAs, DEmiRNAs, and DEmRNAs). Finally, quantitative RT-PCR (qRT-PCR) was used to validate these results in our own specimens. Characterizing ceRNAs in HCC may provide important clues to its molecular mechanisms.

Materials and methods

Data source

RNA-Seq data and the corresponding clinical information for patients in the TCGA-LIHC project were downloaded from the TCGA data portal (https://portal.gdc.cancer.gov/, accessed February 7, 2019) (Cancer Genome Atlas Research, 2011). The dataset comprised 374 HCC tumor tissues and 50 adjacent normal liver tissues. Based on the GENCODE annotation file (GTF format), 14254 lncRNA IDs and 19660 mRNA IDs were annotated; expression data for 1881 miRNAs were also downloaded from TCGA.
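A minimal sketch of how gene IDs can be split into lncRNA and mRNA sets from a GENCODE GTF file: it reads only `gene` records and groups them by the `gene_type` attribute. The set of biotypes treated as lncRNA (`LNCRNA_TYPES`) is an assumption, because the study does not state its exact grouping and GENCODE biotype names differ between releases. The example records are made up.

```python
"""Count gene categories in a GENCODE-style GTF (illustrative sketch)."""
from collections import Counter

# ASSUMPTION: hypothetical grouping of GENCODE gene_type values into "lncRNA".
# Recent releases use "lncRNA"; older ones split it into "lincRNA",
# "antisense", "sense_intronic", etc. Adjust for the release actually used.
LNCRNA_TYPES = {"lncRNA", "lincRNA", "antisense",
                "sense_intronic", "sense_overlapping"}


def parse_attributes(field):
    """Parse the GTF attribute column, e.g. 'gene_id "X"; gene_type "Y";'."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def count_gene_types(lines):
    """Return a Counter of gene categories, using 'gene' records only."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # skip transcript, exon, CDS, ... records
        gene_type = parse_attributes(cols[8]).get("gene_type", "")
        if gene_type == "protein_coding":
            counts["mRNA"] += 1
        elif gene_type in LNCRNA_TYPES:
            counts["lncRNA"] += 1
        else:
            counts["other"] += 1
    return counts


# Made-up records (not real coordinates or IDs).
example = [
    "##description: toy GTF",
    'chr8\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_type "lncRNA"; gene_name "A";',
    'chr8\tHAVANA\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; gene_type "lncRNA";',
    'chr1\tHAVANA\tgene\t1\t500\t.\t-\t.\tgene_id "G2"; gene_type "protein_coding";',
    'chr2\tENSEMBL\tgene\t1\t90\t.\t+\t.\tgene_id "G3"; gene_type "miRNA";',
]
print(count_gene_types(example))
```

The simple `split(";")` parser assumes no attribute value contains a semicolon, which holds for typical GENCODE records.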

Screening of differentially expressed lncRNAs/miRNAs/mRNAs

The edgeR package in R was used to screen DElncRNAs, DEmiRNAs, and DEmRNAs by comparing expression levels between HCC tumor and adjacent normal liver tissues, with thresholds of |log2FC| > 2.0 and adjusted P < 0.05.
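edgeR fits negative binomial models to produce the per-gene log2FC and P values. The sketch below starts from those results and only applies the two screening thresholds. It assumes that "adjusted P" means a Benjamini-Hochberg FDR, which is the default adjustment in edgeR's `topTags`. The gene names and values are hypothetical.

```python
"""Apply |log2FC| > 2 and BH-adjusted P < 0.05 to per-gene DE results (sketch)."""


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        value = min(pvalues[i] * n / rank, 1.0)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted


def select_de(results, lfc_cutoff=2.0, fdr_cutoff=0.05):
    """results: list of (gene, log2FC, raw P). Returns (upregulated, downregulated)."""
    fdr = benjamini_hochberg([p for _, _, p in results])
    up, down = [], []
    for (gene, lfc, _), q in zip(results, fdr):
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


# Hypothetical edgeR-style output: (gene, log2FC, raw P value).
example = [("A", 3.1, 1e-4), ("B", -2.5, 1e-3), ("C", 1.5, 1e-5), ("D", 4.0, 0.2)]
print(select_de(example))
```

Gene C fails the fold-change cutoff despite its small P value, and gene D fails the FDR cutoff despite its large fold change. Both thresholds must hold.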

CeRNA network construction

LncRNA-miRNA interactions between DElncRNAs and DEmiRNAs were predicted using the miRcode database. Target mRNAs of the DEmiRNAs were predicted using the miRTarBase and TargetScan databases, and the intersection of the targets predicted by both databases was taken as the final set of DEmiRNA target mRNAs. The lncRNA-miRNA and miRNA-mRNA regulatory relationships were visualized as a ceRNA network using Cytoscape.
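The network assembly is mostly set logic. This sketch keeps miRcode pairs whose partners are both differentially expressed. It then intersects the miRTarBase and TargetScan predictions and restricts the targets to DEmRNAs, following the Results, which intersect the targets with the DEmRNAs. Finally, it writes a tab-separated edge list that a network viewer such as Cytoscape can import. The sketch assumes that miRNA identifiers have already been harmonized across the databases and TCGA (for example, precursor versus mature names). All identifiers in the example are hypothetical.

```python
"""Assemble lncRNA-miRNA-mRNA ceRNA edges from predicted pair sets (sketch)."""


def build_cerna_network(mircode_pairs, de_lncrnas, de_mirnas,
                        mirtarbase_pairs, targetscan_pairs, de_mrnas):
    """Return (lncRNA-miRNA edges, miRNA-mRNA edges) as sets of tuples.

    mircode_pairs:    {(lncRNA, miRNA)} predicted by miRcode
    mirtarbase_pairs: {(miRNA, mRNA)} from miRTarBase
    targetscan_pairs: {(miRNA, mRNA)} from TargetScan
    de_*:             sets of differentially expressed identifiers
    """
    # Step 1: keep miRcode pairs in which both partners are differentially expressed.
    lnc_mir = {(lnc, mir) for lnc, mir in mircode_pairs
               if lnc in de_lncrnas and mir in de_mirnas}
    network_mirnas = {mir for _, mir in lnc_mir}

    # Step 2: targets predicted by BOTH databases (set intersection), for
    # miRNAs already in the network, restricted to differentially expressed mRNAs.
    shared_targets = mirtarbase_pairs & targetscan_pairs
    mir_mrna = {(mir, gene) for mir, gene in shared_targets
                if mir in network_mirnas and gene in de_mrnas}
    return lnc_mir, mir_mrna


def edge_table(lnc_mir, mir_mrna):
    """Tab-separated 'source, interaction, target' lines for a network viewer."""
    rows = [f"{l}\tlncRNA-miRNA\t{m}" for l, m in sorted(lnc_mir)]
    rows += [f"{m}\tmiRNA-mRNA\t{g}" for m, g in sorted(mir_mrna)]
    return rows


# Hypothetical toy inputs.
mircode = {("lncA", "miR-1"), ("lncB", "miR-1"), ("lncC", "miR-2"), ("lncA", "miR-9")}
mtb = {("miR-1", "G1"), ("miR-1", "G2"), ("miR-2", "G3"), ("miR-9", "G4")}
ts = {("miR-1", "G1"), ("miR-2", "G3"), ("miR-2", "G5"), ("miR-9", "G4")}
lnc_mir, mir_mrna = build_cerna_network(
    mircode, {"lncA", "lncB", "lncC"}, {"miR-1", "miR-2"}, mtb, ts, {"G1", "G4"})
print("\n".join(edge_table(lnc_mir, mir_mrna)))
```

As in the study, a miRNA can stay in the network through its lncRNA partners even when none of its mRNA targets survives the filters. Here miR-2 keeps its lncRNA edge, just as the reported network contains 18 miRNAs while only 17 have mRNA pairs.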

Functional and pathway enrichment analysis of common mRNAs

GO and KEGG analyses were performed for functional annotation and pathway analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov/). The mRNAs in the ceRNA network were analyzed across three GO categories: cellular component (CC), biological process (BP), and molecular function (MF). The human genome was selected as the background, and P < 0.05 with a gene count ≥ 2 was set as the threshold for significant enrichment.
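Over-representation tests of this kind compare the overlap between the gene list and a term's annotated genes with the overlap expected by chance. DAVID reports a modified Fisher exact P value (the EASE score). The sketch below instead computes the ordinary one-sided hypergeometric (Fisher exact) P value, so it is slightly less conservative than DAVID for the same counts. It applies the count ≥ 2 and P < 0.05 thresholds described above. The term names, gene sets, and background size are hypothetical.

```python
"""One-sided hypergeometric enrichment with count and P thresholds (sketch)."""
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N population, K annotated, n drawn)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_terms(terms, gene_list, background_size, p_cutoff=0.05, min_count=2):
    """Return (term, count, p) for terms with count >= min_count and p < p_cutoff.

    terms maps a term name to the set of background genes annotated with it.
    """
    genes = set(gene_list)
    results = []
    for name, members in terms.items():
        count = len(genes & members)
        if count < min_count:
            continue  # too few list genes annotated with this term
        p = hypergeom_upper_tail(count, len(genes), len(members), background_size)
        if p < p_cutoff:
            results.append((name, count, p))
    return sorted(results, key=lambda row: row[2])


# Hypothetical example: 5 list genes, a 1000-gene background, three terms.
genes = ["g1", "g2", "g3", "g4", "g5"]
terms = {
    "A": {"g1", "g2", "g3"} | {f"a{i}" for i in range(7)},    # 10 genes, 3 hits
    "B": {"g4"} | {f"b{i}" for i in range(199)},              # 1 hit: filtered
    "C": {"g1", "g2"} | {f"c{i}" for i in range(498)},        # large term, 2 hits
}
for name, count, p in enriched_terms(terms, genes, 1000):
    print(name, count, f"{p:.2e}")
```

Term C passes the count filter but not the P threshold, because a term covering half of the background is expected to overlap a small list by chance.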

Survival analysis

The “survival” package in R was used to plot Kaplan–Meier curves for the DElncRNAs, DEmiRNAs, and DEmRNAs in the ceRNA network; P < 0.05 was considered statistically significant.
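The study does not state how patients were grouped for each RNA or which test produced the P value. The sketch below therefore assumes two expression groups, for example a split at median expression done before calling these functions. It also assumes a log-rank comparison, the test computed by `survdiff` in the R survival package. It shows the Kaplan–Meier product-limit estimate and the log-rank statistic. The example times and events are hypothetical.

```python
"""Kaplan-Meier estimate and two-group log-rank test (sketch)."""
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, S(t)) at each death time.

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every patient with this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # all of them leave the risk set after t
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; p-value from a chi-square with 1 df."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    death_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in death_times:
        at_risk = [g for tt, _, g in pooled if tt >= t]
        n = len(at_risk)
        n_a = at_risk.count(0)
        d = sum(e for tt, e, _ in pooled if tt == t)
        d_a = sum(e for tt, e, g in pooled if tt == t and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square(1 df) at chi2 equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical "high" vs "low" expression groups (times in months).
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
print(logrank_p([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5))
```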

Clinical significance analysis

The associations between lncRNA expression and the clinicopathological characteristics of HCC patients were analyzed, including gender (male or female), age (over or under 60 years), risk factors (alcohol consumption, hepatitis B, hepatitis C, non-alcoholic fatty liver disease, or no history of primary risk factors), histologic grade (G1, G2, G3, or G4), and clinical stage (I–II or III–IV); P < 0.05 was considered statistically significant.

Quantitative real-time polymerase chain reaction validation

One normal human liver cell line (L02) and three human HCC cell lines (HepG2, Hep3B, and Huh7), maintained in our laboratory, were cultured in Dulbecco’s modified Eagle’s medium (GIBCO-BRL, USA) supplemented with 10% heat-inactivated fetal bovine serum (FBS) (GIBCO, USA), 100 U/mL penicillin, and 100 µg/mL streptomycin. All cells were maintained at 37 °C in a humidified incubator with a 5% CO2 atmosphere.

Twelve paired HCC tumor and adjacent normal tissues were collected at the Second Affiliated Hospital of Nanjing Medical University; ten patients had T1-stage and two had T2-stage tumors. Ethical approval was obtained from the Ethics Committee, and written informed consent was obtained from all patients. Specimens were collected during surgery and immediately frozen in liquid nitrogen.

Total RNA was extracted from cell lines and tissues using Trizol Reagent (Invitrogen, USA), and 1 μg of RNA was reverse transcribed into cDNA using SuperScript® III Reverse Transcriptase (Vazyme, China). Real-time PCR was performed with SYBR Green PCR Master Mix (Invitrogen, USA) on a Cobas Z480 System (Roche, Germany). Primer sets for the lncRNAs PVT1, SFTA1P, C17orf82, AC016773.1, AC073352.1, and AL512652.1 (one pair of qPCR primers per set) were designed by RiboBio (China). All reactions were carried out in triplicate, β-actin was used as the internal control, and relative gene expression was calculated using the 2^−ΔΔCt method.
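In the 2^−ΔΔCt method, ΔCt = Ct(target) − Ct(β-actin) is computed in each sample. ΔΔCt is then ΔCt(tumor) − ΔCt(normal), and the relative expression is 2^−ΔΔCt. The method assumes roughly 100% amplification efficiency for both assays. In the sketch below, triplicate Ct values are averaged before taking differences. The Ct values are hypothetical.

```python
"""Relative expression by the 2^-ΔΔCt method (sketch with hypothetical Ct values)."""
from statistics import mean


def fold_change_ddct(target_sample, ref_sample, target_control, ref_control):
    """Relative expression of a target gene in a sample versus its control.

    Each argument is a list of technical-replicate Ct values (e.g. triplicates).
    The reference gene (β-actin in this study) normalizes for RNA input.
    Assumes ~100% amplification efficiency for target and reference assays.
    """
    delta_ct_sample = mean(target_sample) - mean(ref_sample)      # ΔCt, sample
    delta_ct_control = mean(target_control) - mean(ref_control)   # ΔCt, control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values for one lncRNA in a tumor/normal tissue pair.
tumor_lnc, tumor_actin = [24.0, 24.2, 23.8], [16.0, 16.1, 15.9]
normal_lnc, normal_actin = [26.0, 26.1, 25.9], [16.0, 16.0, 16.0]
fc = fold_change_ddct(tumor_lnc, tumor_actin, normal_lnc, normal_actin)
print(f"fold change (tumor vs normal): {fc:.2f}")
```

With these values, ΔCt is 8.0 in the tumor and 10.0 in the normal tissue. ΔΔCt is therefore −2, and the tumor shows about a 4-fold higher expression. A lower Ct means more template, which is why the exponent is negated.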

Statistical analysis

All bioinformatics analyses and statistical tests were performed in R. Pearson correlation analysis was used to assess the correlation between lncRNA and miRNA expression. Data were analyzed using GraphPad Prism 7 software (GraphPad, USA).

Results

Identification of DElncRNAs, DEmiRNAs, and DEmRNAs

Using the edgeR package, we identified 1070 DElncRNAs in tumors, of which 1013 (94.67%) were upregulated and 57 (5.33%) were downregulated. We also identified 147 DEmiRNAs, including 119 (80.95%) upregulated and 28 (19.05%) downregulated, and 1993 DEmRNAs, including 1788 (89.71%) upregulated and 205 (10.29%) downregulated. The expression patterns are shown in Fig. 1.

Fig. 1

Differential expression analysis of RNAs in HCC compared with normal samples. (a–c) Hierarchical clustering of DElncRNAs, DEmiRNAs, and DEmRNAs, respectively, performed in R. The left vertical axis shows the clustering of DERNAs, and the right axis lists the DElncRNAs, DEmiRNAs, and DEmRNAs, respectively; the upper horizontal axis shows the clustering of samples, and the bottom axis shows the corresponding results. (d–f) Volcano plots of DElncRNAs, DEmiRNAs, and DEmRNAs; red indicates upregulated RNAs (log2 FC > 2) and green indicates downregulated RNAs (log2 FC < −2). FC, fold change; DE, differentially expressed

Construction of the ceRNA network

A total of 349 lncRNA-miRNA pairs, involving 79 lncRNAs and 18 miRNAs, were identified using the miRcode database. The target genes of these 18 miRNAs were then predicted using miRTarBase and TargetScan. Intersecting the targets predicted by both databases with the DEmRNAs yielded 115 mRNAs, forming 192 miRNA-mRNA pairs that involved 17 miRNAs. To better understand the role of DElncRNAs in HCC, we constructed a ceRNA network by combining the 349 lncRNA-miRNA interactions with the 192 miRNA-mRNA interactions; the network comprised 79 lncRNAs, 18 miRNAs, and 115 mRNAs (Fig. 2).

Fig. 2

Competitive endogenous RNA network of differentially expressed lncRNAs (DElncRNAs), DEmiRNAs, and DEmRNAs in HCC. Triangles, circles, and diamonds denote DEmRNAs, DEmiRNAs, and DElncRNAs, respectively. Red indicates upregulated and green indicates downregulated RNAs

Functional and pathway enrichment analysis of DEmRNAs

To further explore the biological functions of the DEmRNAs in the ceRNA network, GO functional annotation and KEGG pathway analyses were performed. The mRNAs were enriched in GO terms such as “regulation of transcription”, “RNA metabolic process”, and “cell cycle process”, and in KEGG pathways including “p53 signaling pathway”, “Pathways in cancer”, “pancreatic cancer”, “prostate cancer”, and “small cell lung cancer”, as well as pathways not specific to cancer such as “cell cycle” (Fig. 3). These results suggest that these mRNAs may be involved in the corresponding processes.

Fig. 3

KEGG pathway and GO enrichment analyses of the differentially expressed mRNAs in the ceRNA network. a Top 7 enriched KEGG pathways (P < 0.05). b Top 29 enriched GO terms for the target mRNAs. KEGG, Kyoto Encyclopedia of Genes and Genomes; ceRNA, competitive endogenous RNA; GO, Gene Ontology

Survival analysis for RNAs in ceRNA network

Survival analysis was performed for the RNAs in the ceRNA network using Kaplan–Meier univariate analysis of overall survival (OS) in HCC patients. Because follow-up information was missing for some patients, 304 cases were included in the lncRNA and mRNA survival analyses and 306 cases in the miRNA survival analysis. In total, 27 DElncRNAs, five DEmiRNAs, and 30 DEmRNAs were associated with prognosis (Table 1).

Table 1 Survival analysis of DElncRNAs, DEmiRNAs, and DEmRNAs in the ceRNA network

Correlation analysis of lncRNA and miRNA

Correlation analysis was then performed between each prognosis-associated lncRNA and miRNA identified in the survival analysis above. Pearson correlation analysis was used to identify negatively correlated lncRNA-miRNA pairs (P < 0.05), and the results were visualized using corrplot (Fig. 4). With a correlation coefficient threshold of r < −0.3, we finally obtained six lncRNAs and two miRNAs.
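The pair screen can be sketched as follows: compute Pearson's r for every lncRNA-miRNA pair over the same samples, then keep pairs with r < −0.3 and P < 0.05. R's `cor.test` derives P from a t distribution. To stay dependency-free, this sketch instead uses the Fisher z-transform normal approximation, which is an assumption and only approximates the t-based P value (close for cohorts of several hundred samples). The expression values are hypothetical.

```python
"""Screen lncRNA-miRNA pairs for negative Pearson correlation (sketch)."""
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant input; treated as no correlation
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided P via Fisher's z transform (normal approximation, n > 3)."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite at |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def negative_pairs(lnc_expr, mir_expr, r_cutoff=-0.3, p_cutoff=0.05):
    """Return (lncRNA, miRNA, r) for pairs with r < r_cutoff and P < p_cutoff.

    lnc_expr and mir_expr map names to expression values over the SAME
    samples, listed in the same order.
    """
    hits = []
    for lnc, x in lnc_expr.items():
        for mir, y in mir_expr.items():
            r = pearson_r(x, y)
            if r < r_cutoff and approx_p_value(r, len(x)) < p_cutoff:
                hits.append((lnc, mir, r))
    return hits


# Hypothetical expression values over 10 samples.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
lnc_expr = {"lncA": x}
mir_expr = {
    "mirNeg": x[::-1],                          # perfectly anti-correlated
    "mirPos": x,                                # positively correlated
    "mirNoise": [5, 3, 8, 1, 9, 2, 7, 4, 10, 6],
}
print(negative_pairs(lnc_expr, mir_expr))
```

Only negative correlations pass the screen, matching the ceRNA expectation that a lncRNA sponging a miRNA should show expression opposite to that miRNA.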

Fig. 4

Pearson correlation analysis of prognosis-associated lncRNAs and miRNAs (P < 0.05)

Clinical significance of the six prognostic lncRNAs

To further explore the relationship between the six lncRNAs (Table 2) and the clinical data of HCC patients, patients were divided into groups according to clinicopathological characteristics (Table 3). As shown in Table 3, SFTA1P was correlated with gender, risk factors, and clinical stage; AC016773.1 with age, risk factors, and histologic grade; AC073352.1 with histologic grade; and AL512652.1 with histologic grade and clinical stage. In contrast, PVT1 and C17orf82 showed no association with these factors. These results suggest that SFTA1P, AC016773.1, AC073352.1, and AL512652.1 may be useful for risk stratification in HCC.

Table 2 Clinical significance of the six prognostic lncRNAs
Table 3 Detailed information on the six aberrantly expressed lncRNAs

Validation of lncRNAs through qRT-PCR

Based on the TCGA analyses above, the six lncRNAs in the ceRNA network (PVT1, SFTA1P, C17orf82, AC016773.1, AC073352.1, and AL512652.1) were considered potential biomarkers for HCC. As shown in Fig. 5, the qRT-PCR results in tissue specimens were consistent with the TCGA findings. In addition, SFTA1P was not detected in L02 cells but was expressed in all three HCC cell lines (Fig. 6).

Fig. 5

Expression of AC016773.1, SFTA1P, AC073352.1, AL512652.1, C17orf82, and PVT1 in HCC and adjacent normal tissues, evaluated by qRT-PCR. *P < 0.05

Fig. 6

Expression of AC016773.1, AC073352.1, AL512652.1, C17orf82, and PVT1 in one normal human liver cell line (L02) and three human HCC cell lines (HepG2, Hep3B, and Huh7), evaluated by qRT-PCR. *P < 0.05

Discussion

Accumulating evidence has shown that lncRNAs can regulate mRNAs by sponging miRNAs, as well as through epigenetic, transcriptional, and post-transcriptional regulation, and thereby participate in tumorigenesis [14,15,16,17]. Therefore, the functions of lncRNAs can be explored systematically through the ceRNA network.

In this study, we extracted transcriptome data from TCGA and identified 1993 DEmRNAs, 147 DEmiRNAs, and 1070 DElncRNAs between HCC and adjacent normal tissues. To investigate the roles of these ncRNAs and mRNAs in HCC, we predicted lncRNA-miRNA interactions for the DEmiRNAs using the miRcode database, and then predicted the target mRNAs of the DEmiRNAs using the miRTarBase and TargetScan databases. An HCC ceRNA network was then constructed from 192 miRNA-mRNA interactions and 349 lncRNA-miRNA interactions. Enrichment analysis showed that the mRNAs in the network were enriched predominantly in “pathways in cancer”, “regulation of transcription”, “RNA metabolic process”, and “cell cycle process”, suggesting that these mRNAs may be implicated in tumorigenesis.

To date, the prognostic features of ceRNA networks in HCC remain largely unknown. In this study, six DElncRNAs (PVT1, SFTA1P, C17orf82, AC016773.1, AC073352.1, and AL512652.1), two DEmiRNAs (hsa-mir-214 and hsa-mir-424), and seven DEmRNAs (CPNE7, GNAL, ASF1B, GLP2R, PRSS21, HOXA3, and ITGA2) in our ceRNA network were significantly associated with OS in HCC patients. Further analysis showed that SFTA1P was correlated with gender, risk factors, and clinical stage; AC016773.1 with age, risk factors, and histologic grade; AC073352.1 with histologic grade; and AL512652.1 with clinical stage and histologic grade. Therefore, these lncRNAs may be useful for risk stratification in HCC. qRT-PCR showed that PVT1, C17orf82, AC016773.1, AC073352.1, and AL512652.1 were upregulated, indicating their potential as HCC biomarkers. SFTA1P was detected only in HCC cell lines, possibly because its expression in L02 cells is below the detection limit of qRT-PCR. We therefore speculate that SFTA1P may be a powerful diagnostic marker for HCC.

The human PVT1 gene, located on 8q24, contains 9–12 exons and has a transcript length of 1957 bp [18]. As an oncogenic lncRNA, PVT1 is highly expressed in non-small-cell lung cancer [19], colon cancer [20], leukemia [21], and HCC [22]. The biogenesis of PVT1 is still unclear, but PVT1 is involved in cancer cell differentiation, proliferation, invasion, and drug resistance, and is associated with OS and tumor stage [23, 24]. For example, PVT1 has been reported to upregulate miR-214 by promoting the binding of enhancer of zeste homolog 2 (EZH2) to its promoter, thereby increasing ovarian cancer cell proliferation and invasion [25]. In this study, PVT1, miRNA-214, and miRNA-424 showed the same expression trend in HCC, but the underlying mechanism remains to be clarified.

SFTA1P is a 693-nt pseudogene-derived lncRNA located on 10p14 that has been reported to suppress the migration and invasion of lung adenocarcinoma (LUAD) and gastric cancer cells [26,27,28]. However, we found that SFTA1P was upregulated in HCC and was correlated with gender, risk factors, and clinical stage, consistent with a previous report [29]. Because SFTA1P expression differs across tissue types, its mechanism of action remains elusive.

C17orf82, located on 17q23.2, has a single exon of 1529 bp. A previous study revealed that C17orf82 is upregulated in hepatitis B virus-associated HCC [30], but its biological function remains unclear. In our study, hsa-mir-214 and hsa-mir-424 were predicted to regulate the expression of CCNE1, CCNE2, and CDK1 through miRNA response elements. In addition to PVT1, SFTA1P, and C17orf82, three upregulated DElncRNAs (AC016773.1, AC073352.1, and AL512652.1) have not been reported in previous studies, and their biological functions remain unknown.

Nevertheless, this study has some limitations. First, owing to the TCGA data collection process, the number of adjacent normal tissues (n = 50) is much smaller than that of HCC tumor tissues (n = 374), which could bias our results. Second, whether these lncRNAs and their interactions in the ceRNA network are specific to HCC has not been confirmed.

In summary, this study systematically analyzed possible interacting genes in HCC by exploring lncRNA-related ceRNA networks and identified six lncRNAs that may help in understanding the oncogenesis and prognosis of HCC.