Introduction

Hepatocellular carcinoma (HCC) is one of the most common malignant cancers and the third leading cause of cancer-related death worldwide [1]. HCC generally develops in patients with chronic liver diseases such as hepatitis, steatosis, and cirrhosis. Hepatitis C virus (HCV), a positive-sense single-stranded RNA virus of the Flaviviridae family [2], is one of the causes of chronic liver disease. Although it has long been recognized that HCV-induced HCC develops through four stages (hepatitis, fibrosis, cirrhosis, and HCC), the molecular mechanism of this process remains unclear [3].

HCV-related HCC is a highly heterogeneous and complex tumor with alterations in many genes and molecular pathways. For example, the expression of insulin-like growth factor 2, aldo-keto reductase family 1 member B10, and mRNA-binding protein 3 is significantly upregulated in HCV-related HCC [4, 5]. The crosstalk among inflammation, lipid metabolism, and epigenetic alteration is among the most active topics in hepatocellular carcinogenesis. Inflammation is the first response to HCV infection and may persist through all of the pathologic processes; the Wnt signaling pathway [6], the p53 pathway, and the NF-κB pathway are all involved in the inflammatory response [7]. HCV infection and replication are both the original cause and an ongoing driver of the development of HCV-related HCC. As a previous review reported, HCV hijacks host cell lipid metabolism to facilitate its infection and replication by upregulating lipid synthesis and downregulating lipid secretion and catabolism in infected hepatocytes [8]. Fatty acid synthase (FAS) and sterol regulatory element-binding proteins (SREBPs) have been reported to be significantly upregulated during HCV infection [1]. DNA methylation is a newer research topic in HCC development; the promoters of RASSF1 (Ras association domain-containing protein 1), DOK1 (Docking protein 1), and CHRNA3 (Cholinergic receptor nicotinic alpha-3 subunit) have been shown to be significantly methylated in HCV-related HCC [9].

Microarray data have been used for some time to investigate the molecular mechanisms underlying HCV-related HCC. For example, to clarify the association between IL28B (Interleukin 28B) and HCC recurrence, a series of human HCV-related HCC samples was collected and profiled, and the resulting microarray data were deposited as dataset GSE41804 [10]. Using this dataset, Hodo et al. documented that patients with the IL28B rs8099917 TT genotype showed higher recurrence after curative therapy than patients with the TG/GG genotype. Other studies based on this dataset have examined HCC carbon metabolism [11], cancer invasion and metastasis [12], gene expression profiles of different lesions using a Gaussian model [13], and liver cell dedifferentiation [14]. However, differences in gene co-expression networks between HCV-related HCC and normal tissues have not yet been investigated.

To further uncover the potential molecular mechanisms by which the IL28B rs8099917 TT genotype is involved in HCV-related HCC, the microarray data in GSE41804 [10] were used to screen differentially expressed genes (DEGs) between TT-genotype HCV-related tumor tissues and normal controls. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis was used to identify the biochemical pathways of the DEGs, weighted gene co-expression network analysis (WGCNA) was then used to screen hub genes among the genes specifically co-expressed in HCV-related HCC, and finally Gene Ontology (GO) analysis was used to identify the biological functions of the selected genes. The results may provide novel information for the screening of target genes and diagnostic biomarkers for HCV-related HCC.

Materials and Methods

Data of Hepatocellular Carcinoma Gene Expression Chips

The expression profile GSE41804 [10] was downloaded from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). The data for tissue samples with the IL28B rs8099917 TT genotype, generated on the GPL570 Affymetrix Human Genome U133 Plus 2.0 Array platform, were used to analyze the differences between HCV-related HCC and normal liver tissues. In total, 20 TT samples were included: 10 HCV-related HCC samples and 10 non-carcinoma liver tissue samples.

Data Preprocessing and Identification of DEGs

First, the robust multi-array average (RMA) method in the affy package in R [15] was used to preprocess the downloaded chip data, including background correction, normalization, and calculation of expression values. Subsequently, DEGs between the HCC and normal tissue samples were identified using the t-test in the Linear Models for Microarray Data (limma) package in R [16], with thresholds of |log fold change (FC)| ≥ 0.58 and P-value ≤ 0.05.
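
The thresholding step can be illustrated with a minimal Python sketch. It is not the limma analysis itself: it assumes that logFC and P-values have already been computed, that logFC is on a log2 scale (limma's convention, under which 0.58 corresponds to roughly a 1.5-fold change), and it uses hypothetical gene names and values rather than data from this study.

```python
# Thresholds quoted in the text; logFC is ASSUMED to be on a log2 scale.
LOGFC_CUTOFF = 0.58   # about log2(1.5), i.e. roughly a 1.5-fold change
P_CUTOFF = 0.05


def classify_gene(log_fc, p_value):
    """Return 'up', 'down', or None for one gene given its logFC and P-value."""
    if p_value > P_CUTOFF or abs(log_fc) < LOGFC_CUTOFF:
        return None
    return "up" if log_fc > 0 else "down"


# Hypothetical results (gene -> (logFC, P-value)); not values from the study.
results = {
    "GENE_A": (1.20, 0.001),
    "GENE_B": (-0.75, 0.030),
    "GENE_C": (0.40, 0.001),   # fold change too small
    "GENE_D": (-2.10, 0.200),  # not significant
}

# Keep only the genes that pass both thresholds.
degs = {}
for gene, (log_fc, p_value) in results.items():
    direction = classify_gene(log_fc, p_value)
    if direction is not None:
        degs[gene] = direction

print(degs)
```

Only GENE_A and GENE_B pass both filters in this toy input; the other two fail on fold change and on significance, respectively.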

Functional Enrichment Analysis of DEGs

The KEGG Pathway Database [17] is a widely used and comprehensive resource for biochemical pathway studies. In this study, the DAVID [18] online tool was used to perform KEGG pathway enrichment analysis of the DEGs. Only pathway terms with P-value < 0.05 were considered significant.
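
For readers unfamiliar with how an enrichment P-value arises, the following Python sketch computes a plain one-sided hypergeometric test. This is a stand-in chosen for illustration: DAVID reports a modified Fisher's exact P-value (the EASE score), which is more conservative, and the function name and all counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_degs, n_overlap):
    """One-sided hypergeometric P-value, P(X >= n_overlap).

    n_background: number of genes in the background set
    n_pathway:    number of background genes annotated to the pathway
    n_degs:       number of DEGs submitted
    n_overlap:    number of DEGs annotated to the pathway
    """
    upper = min(n_pathway, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p = hypergeom_enrichment_p(n_background=20000, n_pathway=100, n_degs=433, n_overlap=10)
print(p < 0.05)
```

Exact integer arithmetic via `math.comb` avoids overflow in the intermediate binomial coefficients; the final division yields an ordinary float.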

Construction of Clustering Tree for WGCNA

WGCNA [19] is a typical systems biology algorithm for constructing gene co-expression networks; it is based on high-throughput mRNA chip data and is widely used in biomedical research. WGCNA assumes that gene networks follow a scale-free topology. It defines an adjacency function on the gene co-expression correlation matrix to form the gene network, calculates a dissimilarity coefficient between nodes, and constructs a hierarchical clustering tree from the computed results. In the hierarchical clustering tree, each short vertical line represents a single gene and each branch represents a group of co-expressed genes, so the different branches reflect the different gene modules, the co-expression of genes within these modules, and the degree of co-expression between modules. Beyond these advantages for finding co-expression modules, this network analysis also helps to focus on genes whose fold changes are modest (0.58 ≤ |logFC| ≤ 1) but that have the potential to affect the development of HCV-related HCC.
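
The adjacency and dissimilarity steps described above can be sketched in plain Python. This is a minimal illustration and not the WGCNA package: it assumes an unsigned network, in which the adjacency is the absolute Pearson correlation raised to the soft-threshold power β, and it uses the standard topological overlap measure (TOM) to derive the dissimilarity. The text does not state whether a signed or unsigned network was used, and the expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def adjacency(expr, beta):
    """Unsigned soft-threshold adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]  # diagonal kept at 0
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def tom_dissimilarity(adj):
    """1 - TOM, where TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            diss[i][j] = 1 - tom
    return diss


# Hypothetical expression vectors (one list per gene, one value per sample).
expr = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [1, 3, 2, 4],
]
adj = adjacency(expr, beta=7)
diss = tom_dissimilarity(adj)
```

The resulting dissimilarity matrix is what a hierarchical clustering routine would take as input to build the tree; genes that are strongly co-expressed and share neighbours end up with small dissimilarities and therefore on the same branch.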

Here, the WGCNA package in R was used to construct clustering trees for both the normal liver and the HCV-related HCC samples, and the two trees were then compared to screen the modules specific to HCV-related HCC for further study.

Construction of WGCNA Network and Screening for the Hub Genes

The WGCNA package in R was used to construct networks for the selected modules. Because the modules contained too many genes, the top 30 hub genes were first selected according to node connectivity, and about 5 genes were then chosen from these 30 based on their topological overlap. Finally, GO functional enrichment analysis was performed with the DAVID online tool for the selected gene modules to analyze the functions of these genes at the molecular level.
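
The first selection step, ranking genes by connectivity, can be sketched as follows. This is an illustrative Python outline under stated assumptions: connectivity is taken to be the sum of a gene's adjacencies within its module, the function name is hypothetical, and the adjacency values are made up. The second step (narrowing to about 5 genes by topological overlap) is not reproduced because its exact rule is not given in the text.

```python
def top_connected_genes(adj, gene_names, top_n):
    """Rank genes by intramodular connectivity (row sums of the adjacency matrix)."""
    connectivity = {name: sum(row) for name, row in zip(gene_names, adj)}
    ranked = sorted(connectivity, key=lambda gene: connectivity[gene], reverse=True)
    return ranked[:top_n], connectivity


# Hypothetical symmetric adjacency matrix for a four-gene module (diagonal = 0).
gene_names = ["G1", "G2", "G3", "G4"]
module_adj = [
    [0.0, 0.9, 0.8, 0.7],
    [0.9, 0.0, 0.2, 0.1],
    [0.8, 0.2, 0.0, 0.3],
    [0.7, 0.1, 0.3, 0.0],
]

hubs, connectivity = top_connected_genes(module_adj, gene_names, top_n=2)
print(hubs)
```

In the study the same idea would be applied with `top_n=30` to each HCC-specific module.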

Results

Screening of DEGs

Based on the criteria of |logFC| ≥ 0.58 and P-value ≤0.05, a total of 1151 DEGs were identified in the HCV-related HCC samples compared with the normal samples, including 433 up-regulated genes and 718 down-regulated genes.

KEGG Pathway Enrichment Analysis

With a P-value cut-off of 0.05 in the KEGG pathway enrichment analysis, the upregulated genes, such as LAMA4, MDM4, COL4A2, COL4A1 and CDKN2B, were mainly enriched in ECM-receptor interaction, cell cycle, pathways in cancer, the p53 signaling pathway, and small cell lung cancer (Table 1). With the same cut-off, the downregulated genes, such as ACADM, HMGCS2, CXCL14, ACADS, and CXCL2, were mainly enriched in cytokine-cytokine receptor interaction, the peroxisome proliferator-activated receptor (PPAR) signaling pathway, complement and coagulation cascades, tryptophan metabolism, metabolism of xenobiotics by cytochrome P450, and drug metabolism pathways, among others (Table 2).

Table 1 The results of pathway enrichment analysis for the upregulated genes
Table 2 The results of pathway enrichment analysis for the downregulated genes

Analysis of WGCNA Clustering Trees

Based on the WGCNA, hierarchical clustering trees were generated (Fig. 1). The gene modules are indicated by different colors, and the grey module contains the genes that could not be merged into any module. For the HCC tissue clustering tree (Fig. 1a), with a soft threshold of 7, 14 gene modules were identified, and the number of genes per module ranged from 31 (Cyan) to 219 (Turquoise). For the normal tissue clustering tree (Fig. 1b), with a soft threshold of 6, 8 gene modules were identified, and the number of genes per module ranged from 43 (Pink) to 372 (Turquoise).

Fig. 1

The cluster dendrogram and color display of co-expression network modules for differentially expressed genes in hepatitis C virus-related hepatocellular carcinoma samples (a) and normal samples (b). Each short vertical line represents a gene; each branch represents the co-expressed genes in a cluster module; and the area of each color represents the size of the corresponding cluster module.

Furthermore, an overlap (repeatability) analysis was performed between the modules of the two clustering trees (Fig. 2). Among the normal tissue modules, only the Grey module did not overlap with any HCV-related HCC module, whereas the Tan, Yellow, and Cyan modules of HCV-related HCC did not overlap with any module in normal tissues. This result may indicate that only the Tan, Yellow, and Cyan modules were specific to HCV-related HCC, and that the genes in these 3 modules did not overlap with the gene modules of the normal tissue clustering tree.

Fig. 2

Comparison between HCV-related HCC and normal liver modules. The vertical axis represents HCV-related HCC modules, and the horizontal axis represents normal tissue modules; the intensity of the red color increases with the degree of overlap between the crossed modules, and the number in each rectangle represents the number of overlapping genes.
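
The gene counts shown in such a module-comparison table can be reproduced in outline with a few lines of Python. This sketch only counts shared genes; it does not test whether an overlap is statistically significant, and it assumes that a module is "specific" when it shares no genes with any assigned (non-grey) module of the other network. The module contents and function names are hypothetical.

```python
def overlap_counts(modules_a, modules_b):
    """Count shared genes for every pair of modules from two networks."""
    return {
        (name_a, name_b): len(set(genes_a) & set(genes_b))
        for name_a, genes_a in modules_a.items()
        for name_b, genes_b in modules_b.items()
    }


def modules_without_overlap(modules_a, modules_b, unassigned="grey"):
    """Modules of network A sharing no genes with any assigned module of network B."""
    counts = overlap_counts(modules_a, modules_b)
    return [
        name_a
        for name_a in modules_a
        if all(
            counts[(name_a, name_b)] == 0
            for name_b in modules_b
            if name_b != unassigned
        )
    ]


# Hypothetical module memberships, for illustration only.
hcc_modules = {"tan": ["g1", "g2"], "blue": ["g3", "g4", "g5"]}
normal_modules = {"turquoise": ["g3", "g4"], "grey": ["g1", "g2", "g5"]}

counts = overlap_counts(hcc_modules, normal_modules)
specific = modules_without_overlap(hcc_modules, normal_modules)
print(counts[("blue", "turquoise")], specific)
```

Here the hypothetical tan module is flagged as specific because its genes fall only into the unassigned grey set of the other network.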

Analysis of WGCNA Network

To further explore the biological functions of these genes, WGCNA network maps were constructed for the modules specific to HCV-related HCC. In the Tan network, with a cut-off criterion of 0.05 and node connections >10, 4 hub genes were identified: two upregulated genes, EPS8L3 and WNK4, and two downregulated genes, GCKR and SDC4 (Fig. 3a). In the Yellow network, with a cut-off criterion of 0.1 and node connections >10, 3 hub genes were identified: two upregulated genes, SLC25A47 and EFNA4, and one downregulated gene, MME (Fig. 3b). In the Cyan network, with a cut-off criterion of 0.03 and node connections >10, 3 downregulated hub genes were identified: DHX32, SERTAD1, and STBD1 (Fig. 3c).

Fig. 3

Network analysis of the three specific modules, Tan (a), Yellow (b) and Cyan (c), in HCV-related HCC. The hub genes of each module are shown in black.

Furthermore, to assess whether these genes share specific biological functions, GO enrichment analysis was performed for the genes of these 3 specific modules. Only the Yellow module showed significant enrichment (Enrichment score > 2, Table 3), and its functions were mainly concentrated in terms related to the extracellular matrix, polysaccharide binding, and heparin binding, so this module was named the substance combination module.

Table 3 Gene ontology (GO) functional enrichment analysis of yellow module in HCV-related HCC

Discussion

Based on the analysis of HCV-related microarray data, a total of 1151 DEGs were identified, including 433 upregulated and 718 downregulated genes. Among the co-expression modules built from these genes, Tan, Yellow, and Cyan were the three modules specific to HCV-related HCC. GO functional enrichment analysis showed that only the Yellow module had a significant enrichment score, so we named it the substance combination module.

The Yellow module was the only module with a meaningful enrichment score in the GO enrichment analysis. SLC25A47, EFNA4, and MME were the hub genes in this module. SLC25A47 (Solute carrier family 25 member 47) is related to the development of liver disease [20]. A recent study reported that SLC25A47 contributes to hepatitis-associated inflammation [20] and is upregulated in the livers of mice deficient in Rictor, which plays a vital role in the mammalian target of rapamycin complex 2 (mTORC2) signaling pathway in HCC development [21]. Given that the expression of SLC25A47 was also increased in HCV-related HCC, this gene might play a vital role in the development of HCV-related HCC. MME, which encodes membrane metallo-endopeptidase, is an indicator of cancers, especially leukemia. The biological functions of MME are mainly concentrated in cell-cell signaling and proteolysis [22]. It has previously been reported that the CpG islands of MME are hyper-methylated in HCC compared with pre-neoplastic and normal tissues [22], which is consistent with the finding of this study that MME was downregulated in HCC tissues compared with the normal controls. In addition, Tong et al. indicated that MME is also decreased in focal segmental glomerulosclerosis, which is caused by podocyte injury [23]. EFNA4 encodes ephrin A4, which belongs to the ephrin family of ligands for Eph receptor tyrosine kinases; the family has two subclasses, A (A1-A5) and B (B1-B3) [24]. Members of the ephrin A subclass are anchored to the membrane via a glycosylphosphatidylinositol linkage and mediate signaling related to growth, migration, and invasion in cancer [24]. As previously reported, EFNA4 is upregulated in lung cancer compared with normal lung tissues [25], which is consistent with the result of this study. In a clinical study, ephrin A4 was also found to be a potential risk factor for chemotherapy response and prognosis in osteosarcoma patients [26].
Based on the biological functions of the hub genes in the Yellow module, we suggest that when the liver is infected by HCV, inflammation induced by the immune response leads to upregulation of SLC25A47, which promotes the progression of hepatitis to HCC; at the same time, hyper-methylation downregulates MME, which may impair proteolysis by the MME protein and thereby sustain the disordered proliferation of liver cells, which in turn may trigger EFNA4 to promote the migration and invasion of infiltrating tumor cells.

Furthermore, Tan and Cyan were the other two HCC-specific modules in this study. The Tan module contained four hub genes: two upregulated genes, WNK4 and EPS8L3, and two downregulated genes, GCKR and SDC4. The downregulated GCKR encodes glucokinase regulator, which is the key enzyme for de novo lipogenesis from glucose [27]. Another downregulated gene, SDC4, is an important sensor of the extracellular matrix in wound contraction, fibrosis, and regulation of motility [28]. WNK4 was one of the upregulated genes in the Tan module. It has been reported that WNK can affect HCV entry by modulating the phosphorylation of claudins [29]. EPS8L3 encodes a protein that is a substrate of the epidermal growth factor receptor, but the function of this protein is still unclear. Although there is no direct evidence that these four genes participate in HCC occurrence, their biological functions suggest that they may play vital roles in the development and metabolism of HCC. Three downregulated hub genes (DHX32, SERTAD1 and STBD1) were included in the Cyan module. DHX32 was originally identified as an RNA helicase, and its function is largely unknown. A recent study reported that DHX32 is upregulated in colorectal cancer cells and promotes their proliferation, migration, and invasion [30], which is not consistent with its downregulation in HCC in this study. This discrepancy may suggest that DHX32 is differentially expressed in different tumor types. SERTAD1 has mainly been reported to be upregulated with cell proliferation. STBD1 encodes starch-binding domain-containing protein 1, which can anchor glycogen to autophagic membranes to aid lysosomal degradation [31]. In addition, it has been reported that STBD1 plays a vital role in the transport of glycogen to lysosomes in the liver [32].
It is worth pointing out that the expression levels of DHX32 and SERTAD1 in the above studies are contrary to their expression in this study. Whether other mechanisms are involved should be addressed in future work.

In this study, WGCNA was used to analyze mRNA expression in HCV-related HCC and normal tissues in order to obtain HCC-specific genes and modules that had not been identified in known pathways and studies. For some of the selected genes, only their names and expression changes have been reported, and the functions of some genes in HCV-related HCC are still unknown. Research on these selected genes and modules may improve our understanding of HCC development and provide potential drug targets for targeted therapy. To screen more potentially important genes, we also set a wider threshold (|logFC| ≥ 0.58 and P-value ≤ 0.05) than the commonly used criteria (|logFC| ≥ 1 and P-value ≤ 0.05). However, this study still has some limitations. Clinical information was not included in the analyses. Sufficient clinical data could not only eliminate the influence of gender and age but also allow specific analyses for certain symptoms; however, only scarce clinical data are available in GEO (https://www.ncbi.nlm.nih.gov/geo/), so more clinical information should be collected for a better analysis in the future. In addition, the absence of experimental verification is another deficiency of this study.

In conclusion, a total of 10 hub genes were identified in three modules specific to HCV-related HCC tissues compared with normal tissues. Among the three modules, only the Yellow module had a significant enrichment score, and its hub genes SLC25A47, EFNA4, and MME are newly found to be related to the development of HCV-related HCC. Studies on these genes may not only improve our understanding of HCC development but also provide potential treatment targets for HCC therapy.