Introduction

Intrahepatic cholangiocarcinoma (ICC), which accounts for approximately 10%–15% of all primary liver cancers and is the second most common type after hepatocellular carcinoma (HCC), has shown an increasing incidence in recent years [1, 2]. Although ICC and HCC share similar risk factors, such as chronic hepatitis, chronic liver disease, and diabetes [3], ICC tends to have a poorer prognosis than HCC, with a 5-year survival rate of only 30% even after curative-intent surgery [4, 5]. Moreover, treatment options differ between the two conditions. For example, transarterial chemoembolization and liver transplantation are not recommended for patients with ICC owing to its relative hypovascularity and overall worse prognosis compared with HCC [6]. Furthermore, HCC can be accurately diagnosed using two different contrast-enhanced imaging modalities, whereas ICC can only be confirmed by histopathologic assessment [3, 7]. Therefore, preoperatively differentiating ICC from HCC is vital for optimizing clinical decision-making and evaluating prognosis.

Contrast-enhanced ultrasound (CEUS) has been used to improve the diagnostic performance of B-mode ultrasound (BUS) for focal liver lesions [8]. However, given the overlap of features, such as the presence of arterial phase hyperenhancement (APHE) or wash-out, the clinical value of CEUS in distinguishing ICC from HCC has been controversial [9, 10].

In the CEUS Liver Imaging Reporting and Data System (LI-RADS) established in 2017, the LR-M category was introduced to characterize lesions that are definitely or probably malignant but not specific for HCC, and the appearance of ICC most closely matches the LR-M criteria [11]. Yet approximately 40% of lesions in the LR-M category were HCCs, indicating a relatively low sensitivity for HCC [12]. Therefore, improving the diagnostic performance of CEUS in differentiating between ICC and HCC warrants further study. Li et al. proposed that the specificity of CEUS could be increased by adjusting the early wash-out onset of the LR-M criteria from 60 s to 45 s, without affecting the sensitivity [13]. However, whether the modified LR-M criteria are superior to the current LR-M criteria in distinguishing ICCs from HCCs has yet to be validated comprehensively. Furthermore, some clinical and imaging features, such as serum tumor markers and lesion boundary, may have potential value in aiding ICC diagnosis [14, 15]. Based on these factors, we hypothesized that the diagnostic performance of CEUS in differentiating ICC from HCC could be further improved by combining clinical and BUS features.

Therefore, this study aimed to validate the performance of the modified LR-M criteria in differentiating between ICC and HCC and to establish a multi-parameter ICC scoring system that further improves performance by adjusting and refining the CEUS, BUS, and clinical features.

Materials and methods

Patients

This study was approved by the Ethics Committee of *** (No: B2022-569R). The requirement for informed consent was waived due to the retrospective nature of the study. We enrolled 80 consecutive high-risk patients with pathologically confirmed ICCs between January 2022 and December 2022 from two institutions (A, n=70; B, n=10). The inclusion criteria were: (a) a pathologically confirmed diagnosis; (b) BUS and CEUS examinations performed within 1 month before surgery or biopsy; and (c) chronic hepatitis or cirrhosis. A total of 18 patients were excluded based on the following criteria: (a) history of biopsy, ablation, or systemic therapy (n=7); (b) incomplete clinical information (n=4); and (c) poor quality of US imaging data, such as incomplete arterial phase (AP), portal venous phase (PVP), or late phase (LP) clips in CEUS (n=7). If a patient had multiple lesions, the largest tumor was selected as the target lesion for analysis. In total, 62 patients with ICC were included in the study, consisting of 55 cases confirmed by surgical resection and 7 by percutaneous biopsy.

Likewise, 543 consecutive high-risk patients with pathologically confirmed HCCs were enrolled according to the above inclusion and exclusion criteria and matched to the ICC patients at a ratio of 1:1 based on tumor size. A total of 62 patients with HCCs (59 diagnosed by surgical resection and 3 by percutaneous biopsy) were included in the analyses. The flowchart of patient selection is shown in Fig. 1. All clinical and pathological data were acquired from the medical record systems.

Fig. 1 Flowchart of the study design

US image acquisition

BUS and CEUS examinations were performed by experienced US radiologists using one of the following US scanner systems: the Samsung RS80A (Samsung Ultrasound System, Seoul, Korea) with a C1-6 convex transducer, the Aplio 500 (Toshiba Medical Systems, Tochigi, Japan) with a 375BT convex transducer, or the Mindray Resona 7s (Mindray Medical, Shenzhen, China) with an SC5-1U convex transducer. First, the whole liver was scanned using BUS. After the target lesion was identified, CEUS was performed on the largest section of the tumor in low mechanical index (MI) mode (MI, 0.08–0.12).

The US contrast agent (2.0 mL; SonoVue, Bracco SpA, Milan, Italy) was diluted in 0.9% saline and intravenously injected into the antecubital vein followed by a 5 mL saline flush. The targeted lesions were observed continuously for at least 120 s, and then scanned at 20–30 s intervals and recorded for 5 min or until the microbubbles disappeared. All imaging data were stored on a hard disk for subsequent analysis.

US imaging data analysis

BUS and CEUS data were independently analyzed by two experienced US radiologists with 15 and 18 years of experience in liver CEUS, respectively, who were blinded to the patients’ pathological results and clinical information. Disagreements were resolved by consensus after discussion. The AP, PVP, and LP were defined as 10–40 s, 41–120 s, and 121–300 s after contrast agent injection, respectively, based on the CEUS LI-RADS (v2017). The following BUS and CEUS features were assessed: (a) lesion number: one or multiple; (b) target lesion size: maximum diameter; (c) lesion shape: regular or irregular; (d) lesion echogenicity: hyper-, iso-, hypo-, or mixed-echogenicity (compared with the liver parenchyma echogenicity surrounding the lesion); (e) lesion boundary: clear or obscure (defined as lesions indistinguishable from the surrounding normal liver tissue on BUS); (f) hilar lymph node metastasis: present or absent; (g) liver background: normal, fatty liver, or cirrhosis; (h) intrahepatic bile duct dilatation: present or absent; (i) enhancement onset time; (j) AP enhancement degree: hyper- or iso-/hypo-enhancement (compared with the enhancement degree of the liver parenchyma surrounding the lesion); (k) AP enhancement patterns: APHE (homogeneous or heterogeneous hyperenhancement), peripheral rim-like hyperenhancement (rim APHE), or other; (l) time to peak; (m) intra-tumoral dendritic vessel: present or absent (defined as dendritic vessel branches extending through the lesion); (n) wash-out onset time; (o) wash-out degree in PVP or LP: none, mild, or marked (Fig. 2).
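As an illustration only, the phase windows defined above can be written as a small lookup. This is a minimal Python sketch, not part of the study workflow. The function name `ceus_phase` is hypothetical, and the handling of fractional times between 40 s and 41 s (or 120 s and 121 s) is an assumption, since the text only gives whole-second windows.

```python
from typing import Optional


def ceus_phase(seconds_after_injection: float) -> Optional[str]:
    """Map a time after contrast injection to the CEUS phase used in the text.

    Windows from the text: AP 10-40 s, PVP 41-120 s, LP 121-300 s.
    Assumption: times falling between whole-second windows (e.g. 40.5 s)
    are assigned to the later phase. Times outside 10-300 s return None.
    """
    t = seconds_after_injection
    if t < 10 or t > 300:
        return None
    if t <= 40:
        return "AP"
    if t <= 120:
        return "PVP"
    return "LP"


if __name__ == "__main__":
    for t in (16, 45, 88, 160):
        print(t, ceus_phase(t))
```

With these windows, a wash-out first seen at 88 s falls in the PVP and one seen at 160 s falls in the LP.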

Fig. 2 Representative examples of B-mode ultrasound and contrast-enhanced ultrasound features of ICC and HCC. A Clear lesion boundary. B Obscure lesion boundary. C Intrahepatic bile duct dilatation. D Peripheral rim-like arterial phase hyperenhancement (rim APHE). E Homogeneous APHE. F Heterogeneous APHE. G Intra-tumoral dendritic vessel during the portal venous and late phase. H Marked wash-out in the portal venous phase. I Mild wash-out in the late phase

Evaluation of the LR-M criteria and modified LR-M criteria

The LR-M criteria were defined as rim hyperenhancement in the AP, early wash-out onset within 60 s, and/or marked wash-out (punched-out) within 2 min. Based on previous studies [13, 16], the LR-M criteria were modified and defined as rim hyperenhancement in the AP, early wash-out onset within 45 s, and/or marked wash-out within 3 min. Each lesion was classified as ICC or HCC according to the LR-M and modified LR-M criteria, respectively.
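The two rule sets differ only in their time thresholds, which the following Python sketch makes explicit. It is an illustration under stated assumptions rather than the authors' implementation: the class and function names are hypothetical, "within" is assumed to be inclusive of the threshold, and a lesion is assumed to meet the criteria if any one of the three features is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CeusFindings:
    """Hypothetical container for the three CEUS features used by LR-M."""
    rim_aphe: bool                      # rim hyperenhancement in the AP
    washout_onset_s: Optional[float]    # None if no wash-out was seen
    marked_washout_s: Optional[float]   # time marked wash-out appeared; None if never marked


def meets_lr_m(f: CeusFindings,
               early_cutoff_s: float = 60.0,
               marked_window_s: float = 120.0) -> bool:
    """Current LR-M rule: rim APHE, early wash-out, and/or marked wash-out.

    Assumption: thresholds are inclusive ("within 60 s" means <= 60 s).
    """
    early = f.washout_onset_s is not None and f.washout_onset_s <= early_cutoff_s
    marked = f.marked_washout_s is not None and f.marked_washout_s <= marked_window_s
    return f.rim_aphe or early or marked


def meets_modified_lr_m(f: CeusFindings) -> bool:
    """Modified rule: early wash-out within 45 s, marked wash-out within 3 min."""
    return meets_lr_m(f, early_cutoff_s=45.0, marked_window_s=180.0)


if __name__ == "__main__":
    # Hypothetical lesion: no rim APHE, wash-out onset at 50 s, never marked.
    lesion = CeusFindings(rim_aphe=False, washout_onset_s=50.0, marked_washout_s=None)
    print(meets_lr_m(lesion), meets_modified_lr_m(lesion))
```

The hypothetical lesion in the usage example meets the current criteria through early wash-out but not the modified criteria, which is the kind of reclassification the tighter 45 s threshold is intended to produce.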

Multi-parameter ICC scoring system development

A multi-parameter ICC scoring system was established by combining independent features selected from clinical, BUS, and CEUS data (including the modified LR-M criteria) using multivariate logistic regression analysis and weighting them by their respective coefficients as follows: ICC score = β0 + β1 × X1 + β2 × X2 + … + βn × Xn, where β0 is the constant, Xi is the i-th independent feature, and βi is its weighting coefficient. Subsequently, the optimal cut-off value of the ICC scoring system was calculated using receiver operating characteristic (ROC) curve analysis and Youden’s index.
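Youden’s index is J = sensitivity + specificity − 1, and the optimal cut-off is the score threshold that maximizes J. The sketch below shows one simple way to compute it in plain Python. The function name and the toy data are hypothetical; the study itself used SPSS, and the convention that a score at or above the cut-off counts as positive is an assumption.

```python
from typing import Sequence, Tuple


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 for the positive class (e.g. ICC), 0 for the negative class.
    Assumption: a case is called positive when score >= cutoff.
    Ties in J are resolved in favor of the lowest cutoff.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best_cutoff, best_j = float("nan"), -1.0
    # Every observed score is a candidate threshold.
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    toy_scores = [-2.5, -0.5, 0.7, 1.2, 2.0, 3.1, 5.6]
    toy_labels = [0, 0, 0, 1, 0, 1, 1]
    print(youden_cutoff(toy_scores, toy_labels))
```

Scanning every observed score is quadratic in the number of cases, which is acceptable for a cohort of this size; a sorted single pass would be preferable for large datasets.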

Statistical analysis

Data analyses were performed using SPSS software (Version 22.0, IBM Corporation, Armonk, USA). Normality was assessed using the Kolmogorov–Smirnov test. Continuous variables are presented as the mean ± standard deviation or the median with interquartile range (IQR). Differences were compared using the t-test or rank sum test. The chi-squared test was used to evaluate the differences between categorical variables. The diagnostic performance was evaluated using ROC curve analysis. P-values less than 0.05 were considered statistically significant.

Results

Demographics and clinical characteristics

Overall, 124 lesions (62 ICCs and 62 HCCs) from 124 patients were included in this study. The male-to-female ratio differed between the ICC and HCC groups (1.95, 41/21 vs. 9.3, 56/6, p=0.001). Elevated alpha-fetoprotein (AFP, >20 µg/L) was observed in 7 (11.3%) and 23 (37.1%) patients with ICC and HCC, respectively (p=0.001). Elevated CA19-9 (>35 U/mL) was observed in 29 (46.8%) and 11 (17.7%) patients with ICC and HCC, respectively (p=0.001). Elevated CA125 (>24 µg/L) was observed in 12 (19.4%) and 2 (3.2%) patients with ICC and HCC, respectively (p=0.004). In total, 11 (17.7%) and 28 (45.2%) patients with ICC and HCC, respectively, were diagnosed with cirrhosis by pathology (p=0.012). The detailed demographic and clinical characteristics are shown in Table 1.

Table 1 Univariate analysis of the demographics, clinical, and US characteristics

BUS and CEUS features in discrimination of ICC and HCC

Using univariate analysis, we found the following significant differences in BUS and CEUS features between the ICC and HCC groups: obscure lesion boundary (77.4%, 48/62 vs. 40.3%, 25/62, p<0.001), hepatic cirrhosis (33.9%, 21/62 vs. 56.5%, 35/62, p=0.006), intrahepatic bile duct dilatation (22.6%, 14/62 vs. 1.6%, 1/62, p=0.001), rim APHE (45.2%, 28/62 vs. 3.2%, 2/62, p<0.001), intra-tumoral dendritic vessel (11.3%, 7/62 vs. 0.0%, 0/62, p=0.026), marked wash-out in the PVP or LP (62.9%, 39/62 vs. 8.1%, 5/62, p<0.001), time to peak (25.33±5.58 s vs. 30.90±10.24 s, p=0.017), and wash-out onset time (43.74±16.03 s vs. 65.13±22.66 s, p=0.001). Representative ICC and HCC cases are presented in Figs. 3 and 4, respectively.

Fig. 3 A 51-year-old man with a 32 mm ICC lesion. A B-mode ultrasound showed a hypoechoic lesion (white arrow) with an obscure boundary. B Rim APHE was observed at 16 s after contrast agent injection. C Early wash-out was observed at 45 s (white arrow). D Marked wash-out was present at 88 s (white arrow)

Fig. 4 A 47-year-old man with a 28 mm HCC lesion. A B-mode ultrasound showed a hyperechoic lesion (white arrow) with a clear boundary. B APHE was observed at 17 s after contrast agent injection. C Initial wash-out was observed at 63 s (white arrow). D Mild wash-out was present at 160 s (white arrow)

No significant differences in BUS echogenicity (p=0.374), hilar lymph node metastasis (p=0.127), or enhancement onset time (p=0.147) were found between the ICC and HCC groups (Table 1).

A multi-parameter ICC scoring system based on clinical, BUS, and CEUS features

The independent features selected by multivariate logistic regression analysis were as follows: elevated AFP, elevated CA 19-9, obscure lesion boundary, rim APHE, wash-out onset within 45 s, and marked wash-out within 3 min (Table 2). Based on these features, a multi-parameter ICC scoring system was established for differentiating ICC from HCC as follows: ICC score = –2.474 – 2.554 × elevated AFP + 2.537 × elevated CA 19-9 + 2.451 × obscure lesion boundary + 3.164 × rim APHE + 1.976 × wash-out onset within 45 s + 2.976 × marked wash-out within 3 min. Table 3 shows these independent features and the odds ratio (OR) values of the LR-M criteria, modified LR-M criteria, and multi-parameter ICC scoring system in differentiating ICC from HCC.
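To make the arithmetic of the score concrete, the sketch below evaluates the published formula for a set of binary features. The coefficients and the cut-off of 1.322 (reported in the diagnostic performance results) are taken from the text; everything else is an assumption for illustration: the dictionary keys and function names are hypothetical, each feature is coded 1 if present and 0 if absent, and a score at or above the cut-off is treated as favoring ICC.

```python
# Coefficients as reported in the text; feature keys are hypothetical names.
INTERCEPT = -2.474
COEFFICIENTS = {
    "elevated_afp": -2.554,
    "elevated_ca19_9": 2.537,
    "obscure_boundary": 2.451,
    "rim_aphe": 3.164,
    "washout_within_45s": 1.976,
    "marked_washout_within_3min": 2.976,
}
CUTOFF = 1.322  # optimal cut-off reported in the text


def icc_score(features: dict) -> float:
    """Linear ICC score; features maps feature name -> True/False.

    Assumption: missing features are treated as absent (coded 0).
    """
    unknown = set(features) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown feature(s): {sorted(unknown)}")
    return INTERCEPT + sum(
        coef * int(bool(features.get(name, False)))
        for name, coef in COEFFICIENTS.items()
    )


def favors_icc(features: dict) -> bool:
    """Assumption: score >= cut-off assigns the lesion to the M category."""
    return icc_score(features) >= CUTOFF


if __name__ == "__main__":
    # Hypothetical lesion: obscure boundary and wash-out onset within 45 s.
    lesion = {"obscure_boundary": True, "washout_within_45s": True}
    print(round(icc_score(lesion), 3), favors_icc(lesion))
```

Under these assumptions, the hypothetical lesion scores –2.474 + 2.451 + 1.976 = 1.953, which exceeds the cut-off, whereas the same lesion with elevated AFP scores 1.953 – 2.554 = –0.601 and does not.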

Table 2 Logistic regression of the independent features of the multi-parameter ICC scoring system for discriminating ICC from HCC
Table 3 The independent features and their odds ratio value of the LR-M criteria, modified LR-M criteria, and multi-parameter ICC scoring system in differentiating ICC from HCC by multivariate logistic regression analysis

ICC classification using the LR-M criteria, modified LR-M criteria, and multi-parameter ICC scoring system

According to the current LR-M criteria, 85 of the 124 nodules were assigned to the LR-M category. Of these, 58 (68.2%) were ICCs. However, 27 (43.5%) of the HCC nodules were also classified into the LR-M category. According to the modified LR-M criteria, 69 nodules were classified into the modified LR-M category. Of these, 55 (79.7%) were ICC nodules. Moreover, the number of HCC nodules classified into the LR-M category decreased from 27 (43.5%) to 14 (22.6%) (p=0.001) in comparison with the LR-M criteria. After combining the independent clinical (elevated AFP and CA19-9) and BUS (obscure lesion boundary) features with the modified LR-M criteria, 59 nodules were assigned to the M category of the multi-parameter ICC scoring system. Of these, 55 (93.2%) were ICC nodules. Moreover, the number of HCC nodules classified into the M category decreased from 27 (43.5%) to 4 (6.5%) (p<0.001) compared with the LR-M criteria. These details are shown in Table 4.

Table 4 Classification of ICC and HCC using the LR-M criteria, modified LR-M criteria, and multi-parameter ICC scoring system

Diagnostic performance of the LR-M criteria, modified LR-M criteria, and multi-parameter ICC scoring system for differentiating ICC from HCC

Using ROC analysis, the AUC of the multi-parameter ICC scoring system for differentiating ICC from HCC was 0.911 (95% CI: 0.853–0.969), and the optimal cut-off value of the score was 1.322. The ICC scoring system showed a significantly higher diagnostic performance than the CEUS LR-M criteria (AUC=0.750; 95% CI: 0.662–0.838) and modified LR-M criteria (AUC=0.831; 95% CI: 0.754–0.907) for differentiating between ICC and HCC (both p<0.05; Table 5 and Fig. 5). Moreover, compared with the LR-M criteria, the modified LR-M criteria and the multi-parameter ICC scoring system significantly improved the diagnostic specificity (0.565 vs. 0.774 and 0.935, both p<0.05) and accuracy (0.750 vs. 0.831 and 0.911, both p<0.05) for ICC, whereas the sensitivity did not differ significantly (0.935 vs. 0.887 and 0.887, p=0.250 and 0.453) (Table 5 and Fig. 6). The diagnostic performance of each feature in the LR-M and modified LR-M criteria is detailed in Supplementary Table 1.
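The sensitivity, specificity, and accuracy figures follow directly from the classification counts reported in the preceding section (62 ICCs and 62 HCCs in total). The Python sketch below recomputes them as a check; the function name and dictionary layout are hypothetical. It also reports (sensitivity + specificity)/2, because for a single yes/no criterion the ROC curve has only one operating point and its trapezoidal area equals that quantity.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard 2x2 metrics, with ICC as the positive class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),
        # Trapezoidal AUC of a ROC curve with a single operating point.
        "binary_auc": (sensitivity + specificity) / 2,
    }


# Counts derived from the text: ICCs and HCCs assigned to the M category
# under each rule, out of 62 ICCs and 62 HCCs.
COUNTS = {
    "LR-M": dict(tp=58, fn=4, tn=35, fp=27),
    "modified LR-M": dict(tp=55, fn=7, tn=48, fp=14),
    "ICC scoring system": dict(tp=55, fn=7, tn=58, fp=4),
}

if __name__ == "__main__":
    for name, c in COUNTS.items():
        m = diagnostic_metrics(**c)
        print(name, {k: round(v, 3) for k, v in m.items()})
```

Rounded to three decimals, these counts give the sensitivities, specificities, and accuracies stated above, and the positive predictive values match the proportions of ICCs within each M category (68.2%, 79.7%, and 93.2%).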

Table 5 Diagnostic performance of the CEUS LR-M criteria, modified LR-M criteria, and multi-parameter ICC scoring system in differentiating ICC from HCC
Fig. 5 Receiver operating characteristic curves of the LR-M criteria (blue), modified LR-M criteria (green), and multi-parameter ICC scoring system (red) for ICC diagnosis

Fig. 6 Diagnostic performance of the LR-M criteria, modified LR-M criteria, and multi-parameter ICC scoring system in distinguishing ICC from HCC

Discussion

This study investigated and compared the diagnostic performance of the LR-M criteria of CEUS LI-RADS (v2017), a modified version of the LR-M criteria (adjusted to early wash-out onset within 45 s and marked wash-out within 3 min), and a multi-parameter ICC scoring system for distinguishing ICC from HCC in high-risk patients. The multi-parameter ICC scoring system was established based on positive (elevated CA 19-9, obscure lesion boundary, and modified LR-M criteria) and negative (elevated AFP) independent features. For differentiating ICC from HCC, the ICC scoring system and modified LR-M criteria both exhibited a higher AUC (0.911 and 0.831 vs. 0.750) and specificity (0.935 and 0.774 vs. 0.565) than the LR-M criteria, without a significant reduction in sensitivity (0.887 and 0.887 vs. 0.935, both p>0.05). Meanwhile, the proportion of HCCs classified into the LR-M category decreased significantly (6.5% and 22.6% vs. 43.5%), which may help address the high proportion of HCCs in the LR-M category.

Previous studies have indicated that the LR-M criteria can be used to distinguish ICC from HCC in patients with or without risk factors, with a high sensitivity [13, 17]. However, a considerable proportion of HCCs (approximately 48%) are classified into the LR-M category, which increases the diagnostic challenge [12]. Therefore, we hypothesized that the diagnostic performance, especially the specificity, of the LR-M criteria for differentiating ICC from HCC could be improved by modifying the criteria, without significantly compromising sensitivity.

We compared the dynamic CEUS features between ICC and HCC cases and found that rim APHE, early wash-out, and marked wash-out had significant independent correlations with ICC, highlighting the importance of the LR-M criteria. Reports have shown that rim APHE is a distinct wash-in pattern of ICC, with occurrence rates ranging between 42.6% and 64.5% [13, 17–20]. Similarly, in this study, rim APHE was detected in 45.2% of ICC cases and only 3.2% of HCC cases, and showed the highest OR value. The appearance of rim APHE in ICC may be highly correlated with the pathologically abundant distribution of fibrous stroma within a nest of peripheral tumor cells [21]. We found that the mean size of ICCs with rim APHE was larger than that of ICCs without rim APHE (56.2 mm vs. 44.8 mm); larger lesions are thought to contain more fibrous stroma and even necrosis [22].

In the present study, we validated the performance of the modified LR-M criteria after adjusting the early wash-out onset and marked wash-out times for differentiating between ICC and HCC. We demonstrated that the mean wash-out onset time in ICCs was shorter than that in HCCs (43.74 s vs. 65.13 s), which was consistent with the results of previous studies [23–25]. Li et al. found that the diagnostic specificity could be significantly increased when the wash-out onset was adjusted from 60 to 45 s [13]. Our analysis also indicated that adjusting the early wash-out cutoff from within 60 s to within 45 s significantly improved the diagnostic specificity from 0.613 to 0.887; however, sensitivity decreased from 0.903 to 0.694. The LR-M criteria set the marked wash-out time to within 120 s, but marked wash-out within this window occurs at a relatively low rate in ICC lesions (less than 50%), which might limit its application value [13, 17, 23]. Therefore, we adjusted the time to within 180 s according to previous studies [13, 16] and found marked wash-out in 62.9% of ICCs, compared with 46.8% under the LR-M criteria. The sensitivity accordingly increased from 0.468 to 0.629, whereas specificity decreased slightly (0.968 vs. 0.919). LR-5 criteria have a high specificity for diagnosing HCC [26]. In our study, the new LR-5 criteria exhibited a higher sensitivity (0.710, 44/62 vs. 0.500, 31/62; p<0.05) and similar specificity (0.935, 58/62 vs. 0.968, 60/62; p<0.05) for HCC than the LR-5 criteria based on CEUS LI-RADS. The detailed results are shown in Supplementary Table 2. Thus, the modified LR-M criteria did not reduce the high specificity of the LR-5 criteria for HCC and may further increase the sensitivity for HCC. Nonetheless, further studies to investigate and improve the diagnostic performance of the modified LR-M criteria are warranted.

The current study assessed two diagnostic systems to help distinguish between ICC and HCC. The modified LR-M criteria were based entirely on CEUS features and showed a higher area under the curve (AUC, 0.831 vs. 0.750) and accuracy (0.831 vs. 0.750) than the LR-M criteria. The multi-parameter ICC scoring system was established using the modified LR-M criteria combined with clinical and BUS features to optimize the diagnostic performance for ICC. In the logistic regression analysis of the BUS features of ICCs and size-matched HCCs, obscure lesion boundary was identified as a strong risk factor for ICC and was observed in 77.4% and 40.3% of ICC and HCC cases, respectively. This may be due to the more infiltrative growth pattern of ICC arising from cholangiocytes, which is consistent with previous studies in which obscure boundary was a good predictor of microinvasion [3, 27]. In addition, AFP and CA19-9 have been shown to contribute to the diagnosis of HCC and ICC [28, 29]. Chen et al. developed a CEUS-based nomogram that used CA19-9 levels to differentiate ICC from HCC in high-risk patients and showed a higher performance than the CEUS LI-RADS [19]. AFP is considered a diagnostic and prognostic biomarker of HCC and has significant clinical diagnostic value [30]. Consistent with these reports, our multi-parameter ICC scoring system included AFP as a negatively correlated feature and CA 19-9 as a positively correlated feature, which further enhanced its performance for differentiating between ICC and HCC.

This study had several limitations. First, we mainly focused on differentiating ICC from HCC among high-risk patients in this dual-institutional study, which resulted in a relatively limited sample size. A multi-center study with a larger sample is needed to validate our results in the future. Second, we did not enroll patients with other focal hepatic lesions that could be classified using the LR-M criteria. Examples include combined hepatocellular-cholangiocarcinoma, which is relatively rare and is managed similarly to ICC, and metastatic lesions, which can be differentiated based on a medical history of primary cancer. Further study assessing the value of CEUS in differentiating HCC and ICC from other LR-M tumors is warranted. Third, pathological assessment was a requirement for ICC. However, HCC can be diagnosed either by pathology or by a noninvasive reference standard, such as the contrast-enhanced computed tomography and magnetic resonance imaging LR-5 criteria [31]. Using pathologic confirmation as the only reference standard may therefore have introduced selection bias. Lastly, this retrospective study did not compare the diagnostic performance of CEUS with that of computed tomography or magnetic resonance imaging because imaging data were unavailable in some cases.

In conclusion, we established a multi-parameter ICC scoring system which improved the diagnostic performance, especially the specificity, of the current LR-M criteria for differentiating ICC and HCC, and significantly reduced the number of HCC cases misdiagnosed as ICCs.