Introduction

Intrahepatic cholangiocarcinoma (iCCA) is a malignant disease of the biliary tree and the second most common primary liver cancer, accounting for 10–15% of hepatobiliary neoplasms [1], and its incidence and associated mortality have recently increased [2]. iCCA is difficult to diagnose at an early stage, so most patients have advanced disease at the time of diagnosis. Because options for chemotherapy and targeted therapy are limited, managing this cancer remains challenging.

Recent molecular analyses have revealed several markers of poor prognosis, including KRAS mutation and activation of oncogenic pathways (e.g., HER2 and EGFR signaling) [3, 4]. At the molecular level, iCCA comprises subgroups that are associated with different outcomes [3,4,5,6]. However, the relationship between molecular and clinicopathologic features has not yet been fully characterized. Molecular characterization might be helpful for screening patients, predicting prognosis, and developing targeted therapies. However, applying molecular signatures in clinical settings is still not feasible because of the high cost of molecular profiling. If the molecular subtypes of iCCA have specific clinical and pathologic features, the molecular subtype could be predicted from clinical findings, and this approach may influence treatment selection.

In this study, we aimed to answer three key questions: (1) Do different molecular subtypes of iCCA have distinct clinicopathologic features, and can the molecular subtype be predicted on the basis of clinical findings? (2) Do molecular signatures determined in previous studies agree with each other, and can they be validated? (3) Is there an effective subtype-specific treatment? We hypothesized that iCCA can be classified on the basis of its molecular characteristics, that each molecular subtype would have corresponding clinical and pathologic features associated with different outcomes, and that treatment options may be determined according to the iCCA subtype. To test these hypotheses, we used next-generation sequencing (NGS) to obtain and analyze RNA sequencing data from frozen iCCA tissues annotated with clinical and pathologic data. Additionally, we evaluated the gemcitabine sensitivity of cholangiocarcinoma (CCA) cell lines classified by molecular subtype.

Materials and methods

Patients, tissue samples and clinical analysis

From March 2008 to August 2013, 56 patients with iCCA underwent curative-intent surgical resection at Keimyung University Dongsan Medical Center (DSMC), Daegu, Korea. For 42 cases, frozen tumor tissue was available. We excluded seven patients whose tissues did not qualify for RNA analysis and five patients whose tumors were not adenocarcinoma (combined hepatocellular carcinoma [HCC]/iCCA or cholangiosarcoma). Finally, surgical specimens from 30 patients with iCCA were enrolled. Surrounding normal liver tissue was available for 26 of these patients and was used as paired control tissue. The 56 fresh samples (30 primary tumors and 26 matched surrounding normal liver tissues) were stored frozen (−80 °C) and used for analysis. Two pathologists reviewed all hematoxylin and eosin-stained slides and classified each tumor as cholangiolar type or bile duct type (Supplementary Fig. 1) [7]. As a validation data set, 32 fresh tumor samples from curative resections were collected from 2001 through 2014 from patients admitted to Mayo Clinic, MN, USA, using the same inclusion criteria as for the Korean data set. The patients’ clinical, pathologic, and survival data were analyzed retrospectively.

Statistics, bioinformatics, and molecular analysis

Methods for the histologic, statistical, bioinformatic, and molecular analyses are described in the Supplementary Methods online.

RNA sequencing data with clinical information are available in the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo) under accession number GSE107943.

Results

Demographic characteristics and clinical outcomes

The DSMC (Korean) patient population consisted of 24 males and 6 females, with a median (range) age of 65 (49–79) years (Table 1). All of the Korean patients were Asian. Ten patients received palliative chemotherapy after postoperative iCCA recurrence. During a median (range) follow-up of 30.5 (9.3–115.7) months, 21 patients (70.0%) had disease recurrence, and 17 patients (56.7%) died. The 3- and 5-year overall survival rates were 45.8% and 27.1%, respectively, and the 3- and 5-year disease-free survival rates were 34.2% and 20.5%, respectively.
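Survival rates such as these are typically estimated with the Kaplan–Meier (product-limit) method, which the Fig. 1 caption names for the subclass comparisons. The sketch below illustrates how a 3- or 5-year rate is read off that estimator; the follow-up times and event flags are hypothetical, not patients from this study, and the helper names are illustrative only.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] steps at each distinct event time.

    times: follow-up in months; events: 1 = event (death or recurrence), 0 = censored.
    At tied times, events are counted before censorings (the usual convention).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all observations that share this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= deaths + censored
    return steps


def survival_at(steps, t):
    """Read the step function at time t (e.g., 36 months for 3-year survival)."""
    value = 1.0
    for time, s in steps:
        if time > t:
            break
        value = s
    return value


# Hypothetical cohort of eight patients (not data from this study).
months = [6, 12, 12, 20, 30, 40, 50, 60]
died = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(months, died)
three_year = survival_at(curve, 36)
five_year = survival_at(curve, 60)
```

Censored patients still count in the risk set until their last follow-up, which is why the estimate differs from a simple proportion of survivors.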

Table 1 Clinical and pathologic features according to molecular subclass

In the Mayo (USA) validation cohort, there were 15 males and 17 females, with a median (range) age of 64 (39–83) years. Twenty-three patients were Caucasian, and nine were African American, Hispanic, or of other race/ethnicity. During a median (range) follow-up of 40.7 (3.7–128.3) months, disease recurred in 13 patients, and 17 patients died. Recurrence data were unavailable for six patients, including three who died postoperatively. Seven patients received chemotherapy (one neoadjuvant, one adjuvant, and five palliative). The 3- and 5-year overall survival rates were 47.1% and 32.5%, respectively, and the 3- and 5-year disease-free survival rates were 29.5% and 22.7%, respectively.

In both cohorts, all patients had mass-forming (MF)-type iCCA.

Identification of two distinct iCCA subclasses by RNA expression profiling

Unsupervised hierarchical clustering based on Pearson correlation of all 56 tissues (30 tumors and 26 surrounding normal liver tissues) in the DSMC data set separated tumor and normal tissues into two distinct clusters (Supplementary Fig. 2). A total of 2379 differentially expressed genes (DEGs) between tumor and adjacent normal liver tissues were identified and used to identify tumor subtypes. Among the 30 tumor samples, unsupervised hierarchical clustering separated the tumors into two distinct subclasses that were associated with survival (Fig. 1a, b). The 5-year disease-free and overall survival rates in subclass A (n = 12) were 56.2% and 70.7%, respectively, compared with 0% and 7.6% in subclass B (n = 18) (Fig. 1c).
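The text specifies Pearson correlation but not the linkage method or how expression values were centered. As a minimal sketch, assuming average linkage on a distance of 1 − Pearson r, agglomerative clustering of samples into two groups looks like this; the sample names and four-gene profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_samples(profiles, k=2):
    """Average-linkage agglomerative clustering with distance 1 - Pearson r.

    profiles: dict of sample name -> expression values (same gene order).
    Returns k clusters, each a set of sample names.
    """
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d
    clusters = [{name} for name in names]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster sample pairs.
            pair_d = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
            d = sum(pair_d) / len(pair_d)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge the closest pair (i < j)
        del clusters[j]
    return clusters


# Hypothetical profiles over four genes (not study data).
toy = {
    "T1": [5, 4, 1, 0], "T2": [6, 5, 2, 1], "T3": [5, 5, 1, 1],
    "T4": [0, 1, 4, 5], "T5": [1, 1, 5, 6], "T6": [0, 2, 5, 5],
}
groups = cluster_samples(toy, k=2)
```

Because the distance uses correlation rather than Euclidean distance, T2 (which is T1 shifted up by one unit) is treated as identical to T1: the clustering groups samples by the shape of their expression profile, not by overall expression level.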

Fig. 1

Tissue samples from both cohorts. Thirty samples from the Korean data set were separated into two subclasses with distinct clinical features by unsupervised hierarchical clustering (a) and principal component analysis (b). c Kaplan–Meier analyses showed worse disease-free and overall survival in subclass B than in subclass A for the Korean patients. US tumor samples used for validation also revealed two distinct subclasses by unsupervised hierarchical clustering (d) and principal component analysis (e). f Similarly, in the US cohort, overall survival was worse in subclass B than in subclass A. Data on disease recurrence were not available for six patients, three of whom died postoperatively; follow-up data for these patients were also unavailable. CA cancer antigen, CEA carcinoembryonic antigen, PC principal component

A total of 148 DEGs (FDR < 0.05) between the two tumor subclasses were identified (Supplementary Table 1). Of these 148 genes, 95 were up-regulated in the group with poor survival (subclass B) and 53 in the group with better survival (subclass A). To validate the tumor subtypes, the 148 DEGs were applied to the validation group (Mayo cohort). Using the same profiling approach as for the Korean cohort, unsupervised cluster analysis revealed two distinct subgroups and successfully separated patients with poor survival (subclass B, n = 11) from those with better survival (subclass A, n = 21) (Fig. 1d–f). When the DSMC and Mayo cohorts were combined, disease-free survival (p = 0.001) and overall survival (p = 0.001) differed significantly by subclass (Supplementary Fig. 3).
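The FDR threshold controls the expected proportion of false positives among genes called differentially expressed. The text does not name the adjustment method; assuming the common Benjamini–Hochberg procedure, a minimal sketch of converting per-gene p-values to FDR values is shown below, with hypothetical gene names and p-values.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a subclass A vs B comparison.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
pvals = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(pvals)
significant = [g for g, q in zip(genes, fdr) if q < 0.05]
```

Here GENE_C (p = 0.03) does not pass FDR < 0.05 even though its raw p-value is below 0.05; that is the multiple-testing penalty. In a genome-wide comparison, m is the number of genes tested, so the penalty is much larger.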

Pathway analysis

Ingenuity pathway analysis (IPA) predicted activation of cancer-, cellular growth-, and inflammation-related pathways in subclass B and of metabolism-related pathways (lipid, carbohydrate, and amino acid) in subclass A (Supplementary Table 2).

According to gene set enrichment analysis (GSEA), several oncogenic pathways were enriched in subclass B, and more than half of the significant gene sets (FDR < 0.05) were related to inflammatory and immune responses, including the interferon alpha response, inflammatory response, IL-6, and cytokine-mediated signaling pathways. KRAS-related, TGF-β, and TNF-α signaling were also enriched, and genes potentially repressed by p53 were up-regulated in subclass B. In contrast, subclass A was characterized by metabolism-related pathways, including bile acid and steroid metabolic processes. Several liver-specific genes and liver cancer-related pathways were also more enriched in subclass A than in subclass B (Fig. 2, Supplementary Tables 3 and 4).
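GSEA scores a gene set by walking down a list of genes ranked from most up-regulated in one subclass to most up-regulated in the other, stepping up at each gene-set member and down at each non-member. The sketch below computes this weighted running-sum enrichment score (weight exponent 1) for hypothetical genes and scores; it omits the permutation-based normalization and FDR estimation that the GSEA software adds.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked: list of (gene, score), sorted from highest to lowest score.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    # Hits are weighted by |score|**weight; misses all carry equal weight.
    hit_norm = sum(abs(s) ** weight for (_, s), hit in zip(ranked, hits) if hit)
    running = 0.0
    es = 0.0
    for (_, s), hit in zip(ranked, hits):
        if hit:
            running += abs(s) ** weight / hit_norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


# Hypothetical ranking: positive scores = higher in subclass A.
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", 0.5)]
es_top = enrichment_score(ranked, {"g1", "g2"})     # set concentrated at the top
es_bottom = enrichment_score(ranked, {"g3", "g4"})  # set concentrated at the bottom
```

A positive score means the set is concentrated among genes favoring the first subclass and a negative score means it is concentrated at the other end, which is how a pathway is assigned to subclass A or B.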

Fig. 2

Gene set enrichment analysis. a Subclass A was characterized by an enriched bile acid metabolism-related pathway and b a fatty acid metabolism pathway. c Subclass B was characterized by an enriched p53-signaling pathway, d an enriched KRAS-signaling pathway, e an inflammatory-response pathway, and f a ribosome-related pathway

mRNA variant analysis

Supplementary Table 5 lists the variants that were finally selected. Mutations of PMS2 (21%), KRAS (14.5%), TP53, and ERBB2 (9.7%) were frequently found in the DSMC and Mayo cohorts (Supplementary Fig. 4). KRAS mutations were found in nine patients, all in subclass B (p = 0.002).
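The statistical test behind p = 0.002 is not named here. Using the combined subclass sizes reported in this section (subclass B: 18 + 11 = 29 patients; subclass A: 12 + 21 = 33 patients), a chi-square test with Yates continuity correction on the 2 × 2 table of KRAS status by subclass yields a value consistent with the one reported. This is a sketch under that assumption, not a statement of the authors' method.

```python
import math


def yates_chi2_2x2(a, b, c, d):
    """Chi-square test with Yates continuity correction for [[a, b], [c, d]].

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    stat = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Rows: subclass B (9 KRAS-mutant, 20 wild-type), subclass A (0 mutant, 33 wild-type).
stat, p = yates_chi2_2x2(9, 20, 0, 33)
```

With an expected count below 5 in two cells, Fisher's exact test is often preferred; on this table it would give a smaller p-value, so the association remains significant under either test.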

Because KRAS variants were significantly more frequent in subclass B, we validated the KRAS mutations by Sanger sequencing of two specific regions in the available Korean samples. In the Korean data set, RNA sequencing detected KRAS mutations in five samples, all of which were confirmed by Sanger sequencing (Supplementary Fig. 5, Supplementary Table 6).

RNA fusion

The RNA fusions in the DSMC and Mayo samples are listed in Supplementary Table 7. Three FGFR2 fusions (FGFR2-BICC1, FGFR2-CDYL, FGFR2-WAC) were found in subclass A.

Integrated molecular and clinical analysis

We then examined the association of each molecular subclass with clinical and pathologic features. In the DSMC cohort, hepatitis and cholangiolar-type pathology predominated in subclass A, whereas higher carcinoembryonic antigen (CEA) levels, bile duct-type pathology, and a history of cholangitis predominated in subclass B. In the Mayo cohort, high cancer antigen (CA) 19-9 levels, bile duct-type pathology, and advanced American Joint Committee on Cancer (AJCC) stage were more frequent in subclass B than in subclass A (Table 1; Fig. 1).

In the combined DSMC–Mayo data, cholangiolar type was markedly predominant in subclass A, and bile duct type was predominant in subclass B. Higher levels of CEA and CA 19-9 were observed in subclass B, and hepatitis B or C was more frequent in subclass A. Nine patients (31.0%) in subclass B had a KRAS mutation; no patients in subclass A had a KRAS mutation. Although IDH1 mutations were more common in subclass A than in subclass B (15.2% vs 3.4%), the difference was not statistically significant (Table 1).

On multivariate analysis in the combined DSMC–Mayo data set, subclass B was an independent risk factor (p < 0.05) for disease-free survival, and advanced AJCC stage (III and IV) and subclass B were independent risk factors for overall survival (Table 2).

Table 2 Risk factors for disease recurrence and overall survival in the combined Korean and US data sets

Comparison of the present CCA classifier with other data sets

Recently published CCA gene expression data were used to validate the molecular classification: microarray data sets (GEO accession numbers GSE26566 [3], GSE33327 [5], and GSE89749 [4]) and The Cancer Genome Atlas Project (TCGA) [6]. RNA expression data and clinical information were downloaded from the TCGA portal (https://portal.gdc.cancer.gov/) and GEO. For comparison with clinicopathologic findings, we reviewed the pathologic classification studies of Liau et al. [7] and Akita et al. [8]. For GSE26566 (cluster 1, good prognosis; cluster 2, poor prognosis) and GSE33327 (inflammation class, good prognosis; proliferation class, poor prognosis), gene lists for each subclass are available in supplementary tables. For TCGA (cluster 1 + 4 vs 2 + 3) and GSE89749 (cluster 1, poor prognosis vs cluster 4, good prognosis), DEGs between the two subclasses were identified from the sample-cluster annotations.

In the GSEA, many DEGs of the good-prognosis subclasses [cluster 1 (GSE26566), cluster 4 (GSE89749), and cluster 1 + 4 (TCGA)] overlapped with the subclass A DEGs of this study (FDR < 0.001). Subclass B correlated with the poor-prognosis subclasses [cluster 2 (GSE26566), cluster 1 (GSE89749), and cluster 2 + 3 (TCGA)] (FDR < 0.001) (Fig. 3a). The inflammation-class (good prognosis) gene sets of GSE33327 showed a nonsignificant association with subclass A (FDR = 0.226), and the proliferation-class gene sets of GSE33327 did not correlate with either subclass A or B.

Fig. 3

a In the gene set enrichment analysis, gene sets of cluster 1 (GSE26566), cluster 1 + 4 (TCGA), and cluster 4 (GSE89749) were significantly correlated with gene sets of subclass A in this study (FDR < 0.001). Subclass B of this study corresponded to cluster 2 (GSE26566), cluster 2 + 3 (TCGA), and cluster 1 (GSE89749) (FDR < 0.001). b–d Clustering of TCGA (b), GSE26566 (c), and GSE89749 (d) using the gene sets of this study. When the 148 subclass-specific genes of this study were applied to each data set, unsupervised cluster analysis revealed two distinct subgroups in all data sets and successfully separated patients with poor survival (corresponding to subclass B of the present study) from those with better survival (corresponding to subclass A of the present study) (survival information was not available in the GSE26566 data set)

Overall, the molecular features of the prognostic subgroups were similar across data sets. Taken together with the clinicopathologic comparisons, elevated CEA and CA 19-9 levels, bile duct-type (perihilar type) pathology, and a history of cholangitis may be characteristic features of the molecular subgroup with a poor prognosis, whereas underlying hepatitis and cholangiolar-type (peripheral type) pathology are features of the molecular subgroup with a better prognosis.

To validate the tumor subtypes of the present study, the 148 DEGs were applied to the three validation data sets (TCGA, GSE26566, and GSE89749). Unsupervised cluster analysis revealed two distinct subgroups in all three data sets and, where survival data were available, separated patients with poor survival (corresponding to subclass B of this study) from those with better survival (corresponding to subclass A) (Fig. 3b–d).

Association between gemcitabine sensitivity and subclasses

Among gemcitabine-related genes [9], SLC28A1 and SLC29A1, which are associated with gemcitabine response, were up-regulated in subclass A. CDA and RRM2, which are related to gemcitabine resistance or toxicity, were up-regulated in subclass B. All patients with higher expression of SLC28A1 and SLC29A1 and lower expression of RRM2 and CDA were in subclass A, and patients with the opposite expression pattern were in subclass B (Fig. 4a).

Fig. 4

Expression of gemcitabine-response genes (SLC28A1 and SLC29A1) and resistance-related genes (CDA and RRM2) in the Korean cohort. a Gemcitabine-response genes were up-regulated in subclass A and resistance-related genes in subclass B. b For ten genes up-regulated in subclass A (SPP1, ITIH2, ITIH5, CDH6, and AQP1) or subclass B (AGR2, S100P, TSPAN8, MUC1, and SLC44A4), mRNA expression was measured by quantitative polymerase chain reaction in six cholangiocarcinoma cell lines (HuCCT, EGI-1, WITT, LIV27, KMBC, and Huh28). LIV27, KMBC, and Huh28 had gene expression patterns similar to subclass A, and HuCCT, EGI-1, and WITT had features of subclass B. c KMBC and LIV27 were sensitive to gemcitabine at concentrations between 0.03 and 7500 nM

We selected ten genes up-regulated in each subclass on the basis of log fold change and FDR (SPP1, ITIH2, ITIH5, CDH6, and AQP1 for subclass A and AGR2, S100P, TSPAN8, MUC1, and SLC44A4 for subclass B) and measured their mRNA expression by quantitative polymerase chain reaction in six CCA cell lines. LIV27, KMBC, and Huh28 showed gene expression patterns similar to subclass A, and HuCCT, EGI-1, and WITT showed features similar to subclass B (Fig. 4b).
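The text does not state how each cell line was assigned to a subclass from the ten qPCR markers. One simple, hypothetical rule is to compare the mean log2 relative expression of the five subclass A markers with that of the five subclass B markers. The sketch below assumes relative expression values (for example, 2^-dCt against a reference gene) and uses made-up cell lines and values; the function names are illustrative.

```python
import math
from statistics import mean

# Marker genes as listed in the text.
SUBCLASS_A_MARKERS = ["SPP1", "ITIH2", "ITIH5", "CDH6", "AQP1"]
SUBCLASS_B_MARKERS = ["AGR2", "S100P", "TSPAN8", "MUC1", "SLC44A4"]


def marker_score(rel_expr):
    """Mean log2 expression of subclass A markers minus that of subclass B markers.

    rel_expr: dict of gene -> relative expression (hypothetical, e.g. 2**-dCt).
    Positive scores suggest a subclass A-like profile; negative, subclass B-like.
    """
    a = mean(math.log2(rel_expr[g]) for g in SUBCLASS_A_MARKERS)
    b = mean(math.log2(rel_expr[g]) for g in SUBCLASS_B_MARKERS)
    return a - b


def assign(rel_expr):
    """Hypothetical decision rule: the sign of the marker score."""
    return "A-like" if marker_score(rel_expr) > 0 else "B-like"


# Made-up expression values for two hypothetical cell lines.
line_1 = {**{g: 8.0 for g in SUBCLASS_A_MARKERS}, **{g: 0.5 for g in SUBCLASS_B_MARKERS}}
line_2 = {**{g: 0.5 for g in SUBCLASS_A_MARKERS}, **{g: 8.0 for g in SUBCLASS_B_MARKERS}}
calls = {"line_1": assign(line_1), "line_2": assign(line_2)}
```

Working on the log scale keeps a single very highly expressed marker from dominating the score, which matters when fold changes among qPCR targets span orders of magnitude.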

We compared the gemcitabine sensitivity of the six CCA cell lines. KMBC and LIV27 (gene expression patterns of subclass A) were sensitive to gemcitabine at concentrations between 3 and 7500 nM (Fig. 4c).
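The result is reported as a concentration range rather than a summary metric. A common way to summarize such viability assays is a four-parameter logistic (Hill) curve, whose midpoint concentration is the IC50. The sketch below evaluates and inverts that curve with hypothetical parameters; it is not a fit to the data in Fig. 4c, and the function names are illustrative.

```python
def four_pl(conc, bottom, top, ic50, hill):
    """Fraction of viable cells predicted by a four-parameter logistic curve."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def conc_for_response(response, bottom, top, ic50, hill):
    """Invert the curve: concentration giving a target viability (bottom < response < top)."""
    return ic50 * ((top - bottom) / (response - bottom) - 1.0) ** (1.0 / hill)


# Hypothetical parameters (concentrations in nM), not fitted values from this study.
params = dict(bottom=0.1, top=1.0, ic50=30.0, hill=1.0)
viability = {c: four_pl(c, **params) for c in [3, 30, 300, 3000]}
```

Comparing fitted IC50 values, rather than the lowest concentration with an effect, gives a single number per cell line that can be ranked across subclass A-like and B-like lines.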

Discussion

The two molecular subclasses identified in this study had specific clinical characteristics and distinct survival outcomes. Subclass A was enriched for patients with chronic viral hepatitis (B or C), a known risk factor for CCA [10]. Most patients in subclass A (93.9%) had cholangiolar-type (peripheral type) CCA, and CDH6, which is commonly found in HCC [11, 12], was up-regulated in this subclass. Regarding mutations, five patients (15.2%) in subclass A had an IDH1 mutation, which is usually found in patients with iCCA and not in those with extrahepatic CCA or gallbladder cancer [13, 14]. Moreover, all FGFR2 fusions, which are typically found in iCCA rather than extrahepatic or perihilar CCA [15], occurred in subclass A. In subclass B, eight patients (27.6%) had a history of cholangitis due to clonorchiasis, primary sclerosing cholangitis, or hepatolithiasis, and bile duct-type (perihilar type) pathology was significantly more frequent than cholangiolar-type pathology. S100P, AGR2, and MUC1 were up-regulated in subclass B; these genes are related to bile duct-type CCA [7] and are frequently expressed in extrahepatic CCA [16, 17] and pancreatic cancer [18, 19]. Interestingly, serum CEA and CA 19-9 levels were significantly higher in patients in subclass B than in those in subclass A. CEACAM6, a CEA family gene, was up-regulated in subclass B; CEACAM6 expression, which is associated with serum CEA, is common and is a poor prognostic indicator [20]. CA 19-9 is a cell surface antigen carried on mucins encoded by MUC genes [21], and several MUC family genes (MUC1, MUC5B, and MUC13), which have roles in early detection and prognostic prediction in pancreatic cancer [22], were up-regulated in subclass B. Therefore, CEA and CA 19-9 may be useful for identifying patients with subclass B tumors. KRAS mutations were found only in subclass B (31.0%); KRAS mutation is usually more frequent in extrahepatic CCA (~40%) and pancreatic cancer (~90%) than in iCCA (10–20%) [14, 23].
Therefore, this study revealed that MF-type iCCA comprises two molecular subtypes with distinct clinical features: a peripheral type with hepatitis as a risk factor (subclass A) and a type whose pathologic and clinical features resemble those of perihilar or extrahepatic CCA (subclass B).

iCCA is believed to originate from biliary epithelial cells in the intrahepatic bile ducts or from hepatic progenitor cells. Bipotent hepatic progenitor cells along the small intrahepatic bile ducts may differentiate not only into hepatocytes but also into cholangiocytes, which can give rise to iCCA [24, 25]. Therefore, two types of iCCA may develop: one originating from hepatic stem cell-derived lineages, with stem-like molecular characteristics similar to those of HCC or combined HCC–CCA, and the other originating from biliary tree stem/progenitor cell-derived cholangiocytes along the large intrahepatic bile ducts, with characteristics similar to those of perihilar or extrahepatic CCA [17, 26]. Our integrated clinical and molecular analysis divided MF-type iCCA into two groups with different molecular pathways and clinical outcomes. Abundant liver-specific genes and HCC-related signaling, IDH1 mutation, FGFR2 fusion, underlying hepatitis, and cholangiolar-type pathology characterized the group with a good prognosis (subclass A); in this group, iCCA may arise from the small intrahepatic bile ducts or from bipotential hepatic progenitor cells within portal areas. In contrast, the poor-prognosis group (subclass B), characterized by KRAS mutations, underlying cholangitis, high serum CEA and CA 19-9 levels, and bile duct-type pathology, likely results from malignant transformation of cholangiocytes lining the larger bile ducts (Supplementary Fig. 6) [7]. KRAS- and p53-related pathways are strongly associated with poor prognosis in many cancers, including cholangiocarcinoma [3]; they might therefore contribute to the worse prognosis of subclass B compared with subclass A, which is characterized by metabolism-related pathways. Most recent cancer genomic studies have focused on molecular findings without establishing clear clinical relevance, especially with respect to the pathologic findings of each molecular subtype.
The molecular characteristics of the two subclasses were validated in other public data sets, which showed survival differences similar to those in this study (Fig. 3). Each of the two subclasses in the validation sets had molecular characteristics similar to those in this study. Although clinical information in the validation sets is limited, the subclass-specific clinical and pathologic characteristics identified in this study appear to apply to them as well. This study is novel in bridging the gap between the clinicopathologic [7] and molecular features [3, 5, 6] of iCCA and in showing that the molecular subtypes could be predicted from clinicopathologic characteristics. In particular, the two pathologic types were important features for predicting the molecular subtype.

CCA is highly resistant to chemotherapy. This study suggests that patients with certain molecular characteristics may benefit from gemcitabine. On the basis of mRNA expression, some, but not all, patients in subclass A would be expected to respond to gemcitabine because of higher expression of gemcitabine-response genes (SLC28A1 and SLC29A1) and lower expression of gemcitabine resistance- or toxicity-related genes (CDA and RRM2). The LIV27 and KMBC cell lines, which had molecular features similar to those of subclass A, responded well to gemcitabine. Low initial serum levels of CEA or CA 19-9, a clinical feature of subclass A, have been associated with a good response to gemcitabine in patients with advanced CCA [27]. For subclass B, a KRAS mutation and high expression of S100P and AGR2 are known to be important factors in gemcitabine resistance [18, 28]. Therefore, gemcitabine-based chemotherapy can be considered for patients with advanced CCA that has molecular features similar to those of subclass A. Because this study did not evaluate gemcitabine sensitivity in vivo, animal studies or clinical studies in unresectable CCA or in the neoadjuvant setting are needed to evaluate the benefits of gemcitabine. Tyrosine kinase inhibitors (TKIs), especially erlotinib, are another potential therapeutic option for patients with advanced biliary cancer who have activated HER2 and EGFR signaling with EGFR overexpression [29]. In this study, sorafenib was not effective against the CCA cell lines tested (data not shown). However, in the cell line studies of GSE26566, lapatinib and trastuzumab were effective in WITT and HuCCT cells, which have molecular features similar to those of subclass B [3]. On the basis of this study and the GSE26566 study [3], the effectiveness of subtype-directed therapy (gemcitabine for subclass A and TKIs for subclass B) should be evaluated in the future.

This study has several limitations. Of the three types of iCCA [MF type, periductal infiltrating (PI) type, and intraductal growing (IG) type], all patients had the MF type. PI and IG types are usually seen in extrahepatic CCA and have homogeneous, bile duct-type histologic features [7]. Therefore, most PI and IG tumors may correspond to subclass B. In contrast, MF is the most frequent form of iCCA, accounting for more than 70% of cases [30], and has heterogeneous histologic features [7], which suggests two different origins, as seen in this study. Therefore, focusing on the MF type might be an efficient way to reveal molecular–clinical relationships; however, further study of the PI and IG types is necessary. In addition, the number of cases was limited. Although we used other public data for validation, these data sets lack detailed clinical and pathologic information, and further analysis is needed to validate the pathologic–molecular relationship.

In summary, we used integrated clinicopathologic and molecular analysis to identify two molecular subclasses of iCCA with distinct clinical, pathologic, biologic, and prognostic features. With this knowledge, molecular subtypes can be predicted from clinicopathologic findings. In addition, the distinct signaling pathways of each subtype may lead to more rational, targeted approaches to treatment.