Introduction

Intrahepatic cholangiocarcinoma (ICC) is the second most common primary liver cancer (PLC) and is prevalent in Asian countries [1,2,3]. Liver resection remains the only curative treatment for ICC, but most patients are diagnosed at an advanced stage and are not candidates for surgery[4]. Even after surgery, the prognosis is unsatisfactory, with a median overall survival (OS) of approximately 30 months and a 5-year survival rate of approximately 30%[5, 6]. Postoperative recurrence occurs in approximately 53%-79% of cases and is the leading cause of death[7, 8]. As an important prognostic factor, early recurrence (ER) is common in clinical practice. Wang et al. showed that ICC patients with ER experienced a worse OS than patients with late recurrence, a similar finding to that observed by Tonouchi et al.[9, 10]. Li et al. also found that ER was associated with a shorter OS in ICC patients following surgery[11]. Therefore, ER is recognized as a risk factor for worse prognosis, and there is increasing interest in finding novel biomarkers that predict ER in ICC. Identifying patients who are likely to experience ER is important for determining surveillance strategies and optimizing individual management.

Radiomics is a new technique that involves the high-throughput extraction of quantitative image features [12, 13]. It has been widely applied in the diagnostic, prognostic, and predictive assessment of cholangiocarcinoma [14, 15]. In addition, some clinical indicators, such as the controlling nutritional status (CONUT) score, prognostic nutritional index (PNI) and albumin-bilirubin (ALBI) score, are also associated with the prognosis of ICC. The CONUT score is a valuable biomarker reflecting the patient’s immune-nutritional status and is a predictor of poor prognosis for ICC patients undergoing hepatectomy[16]. Zheng et al. confirmed the prognostic value of CONUT in predicting the recurrence of ICC[17]. The PNI is an indicator of individual nutritional and inflammatory status, and the ALBI score is an important indicator of liver function. Previous studies have demonstrated the prognostic value of PNI and ALBI for patients with ICC undergoing hepatectomy[18,19,20]. However, little has been reported about their role in predicting ER in ICC patients undergoing hepatectomy.

In recent years, machine learning (ML) has attracted increasing attention in the field of hepatology[21, 22]. Previous studies have confirmed the practical value of combining radiomics with ML in various liver diseases. Shen et al. developed an ML radiomics model to identify ICC with lithiasis[23]. Qin et al. showed that ML radiomics could predict ER in perihilar cholangiocarcinoma[24]. Jhaveri et al. showed that ML radiomics could aid in the differentiation of liver cancers[25]. However, few studies have reported the role of ML radiomics based on contrast-enhanced computed tomography (CECT) in predicting ER in ICC. Therefore, the aim of our study was to develop ML radiomics models to predict ER in ICC after curative resection.

Material and methods

Patients

Patients with ICC who underwent curative resection from three institutions were retrospectively recruited from June 2011 to June 2021. Patients from the First Affiliated Hospital of Wenzhou Medical University were assigned to the training cohort, while patients from the Eastern Hepatobiliary Surgery Hospital of Naval Medical University and the First Affiliated Hospital of Zhejiang Chinese Medical University were assigned to the external validation cohort. The inclusion criteria were as follows: (1) pathologically diagnosed ICC; (2) receiving curative liver resection; and (3) performance status (PS) score 0–2. The exclusion criteria were as follows: (1) combined with other malignancies; (2) receiving other antitumour treatments before surgery; (3) Child–Pugh score > 7; (4) incomplete clinical information (e.g., laboratory test results, pathological data, or operative data); (5) absence of CECT images performed within 1 month before surgery; (6) OS < 1 month due to postoperative complications; and (7) lost to follow-up. The flowchart of the study is shown in Fig. 1.

Fig. 1 The flowchart of this study

Definition of clinical parameters

Clinical information, including demographic data, laboratory tests, clinicopathologic data and imaging data, was collected. Body mass index (BMI) was calculated as weight (kg)/height² (m²). The CONUT score was calculated based on serum albumin (ALB), total lymphocyte count, and total cholesterol concentration[16]. The PNI was calculated with the following formula: 10 × serum albumin (g/dl) + 0.005 × total lymphocyte count (/mm³)[18]. The ALBI score was calculated as follows: (log₁₀ bilirubin × 0.66) + (albumin × –0.085)[20]. Major resection was defined as resection of more than three liver segments. Macrovascular invasion was defined as invasion of the portal vein, hepatic artery, or hepatic veins, whereas microvascular invasion (MVI) was defined as intraparenchymal vascular involvement identified by pathological examination. TNM stage was defined according to the 8th edition of the American Joint Committee on Cancer staging system[26].
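
As a worked illustration of the index formulas above, the following minimal Python sketch computes BMI, PNI and the ALBI score for a single hypothetical patient. The function names and the example values are ours, not part of the study. The ALBI units (bilirubin in µmol/L, albumin in g/L) are an assumption, since the text does not state them, and the CONUT score is omitted because its cut-off table is not given here.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2


def pni(albumin_g_dl: float, lymphocytes_per_mm3: float) -> float:
    """Prognostic nutritional index, using the formula quoted in the text."""
    return 10 * albumin_g_dl + 0.005 * lymphocytes_per_mm3


def albi(bilirubin_umol_l: float, albumin_g_l: float) -> float:
    """Albumin-bilirubin score, using the formula quoted in the text.

    Assumption: bilirubin in umol/L and albumin in g/L (units are not
    stated in the text).
    """
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l


# Hypothetical patient, for illustration only.
example = {
    "bmi": bmi(weight_kg=64.0, height_m=1.70),
    "pni": pni(albumin_g_dl=4.0, lymphocytes_per_mm3=1500),
    "albi": albi(bilirubin_umol_l=10.0, albumin_g_l=40.0),
}
print(example)
```

Keeping each index in its own small function makes the unit assumptions explicit at the call site, which is where unit mix-ups (g/dl versus g/L for albumin) typically occur.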

Surgical management and follow-up strategy

All patients involved in the study underwent curative resection with sufficient preservation of future liver remnant volume. The surgical plan and perioperative management were determined by a multidisciplinary team (MDT). Patients were followed up regularly once every three months within the first two years after surgery and then once every six months afterwards, following the guidelines of the Chinese Society of Clinical Oncology (CSCO). OS was defined as the time from surgery to death for any reason or censored at the last follow-up. Disease-specific survival (DSS) was defined as the time from surgery to death from ICC or censored at the last follow-up. Recurrence was monitored by two experienced hepatologists CG (15 years of clinical experience) and YZP (35 years of clinical experience). Recurrence was defined as positive findings on surveillance imaging or histologically confirmed disease, and was determined mainly according to imaging examinations (e.g., ultrasonography, CT, magnetic resonance imaging (MRI), or positron emission tomography/computed tomography (PET/CT)) and serum tumour biomarkers (e.g., carbohydrate antigen 19–9 (CA 19–9)) [27]. Early recurrence was defined as recurrence within one year after surgery[11]. The treatment strategy after recurrence was determined by the MDT. The observation deadline was set to June 30, 2022.
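
The ER definition above can be turned into a binary label as in the short sketch below. This is only an illustration: the function name is hypothetical, "one year" is assumed to mean 365 days, and patients without documented recurrence are labelled as non-ER, which presumes at least one year of follow-up for every patient.

```python
from datetime import date
from typing import Optional

ER_WINDOW_DAYS = 365  # assumption: "one year" is taken as 365 days


def is_early_recurrence(surgery: date, recurrence: Optional[date]) -> bool:
    """Return True if recurrence was documented within one year after surgery."""
    if recurrence is None:  # no recurrence observed during follow-up
        return False
    return 0 <= (recurrence - surgery).days <= ER_WINDOW_DAYS


# Hypothetical dates, for illustration only.
labels = [
    is_early_recurrence(date(2019, 3, 1), date(2019, 11, 15)),  # within one year
    is_early_recurrence(date(2019, 3, 1), date(2020, 6, 1)),    # later than one year
    is_early_recurrence(date(2019, 3, 1), None),                # no recurrence
]
print(labels)
```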

CT imaging protocol

Multiple CT scanners were used to perform the CECT scans. The CT scanning parameters were as follows: tube voltage, 110–120 kVp; tube current, 130–375 mAs; rotation time, 0.5–0.8 s; pixel spacing, 0.5–0.8 mm; slice thickness, 5 mm; image matrix, 512 × 512; and reconstruction interval, 5 mm. Detailed information on the CT scanners and imaging protocols is shown in Supplementary Table 1. The nonionic contrast agents used were iohexol (Yangtze River Pharmaceutical Group, Taizhou, China) and ioversol (Liebel-Flarsheim Canada Inc., Quebec, Canada). A dosage of 1.5 ml/kg of nonionic contrast agent was injected intravenously at a speed of 3 ml/s. The arterial and portal venous phase CT scans were performed 25–30 s and 60–75 s after injection, respectively. To reduce the variability derived from the different CT scanners and parameters, image preprocessing was performed (e.g., gray level normalization).

ROI segmentation and radiomics feature extraction

Radiomics analysis was performed according to the standardized procedures from a previous study[28]. The arterial and venous phase CECT images were reviewed by two experienced radiologists, YF and YYJ, who were blinded to the clinical data. MRIcroGL software was used to manually segment the three-dimensional (3D) region of interest (ROI) slice by slice. Python software with the “Pyradiomics” package was used to extract the radiomics features. The ROIs from the arterial and venous phase images were drawn separately (Supplementary Fig. 1). Intra- and interobserver agreement was assessed on 30 randomly chosen images with the “irr” R package to evaluate the reliability of the radiomics features. Intraobserver agreement was assessed by comparing two extractions of the radiomics features performed by reader 1 within a 1-week period. Interobserver agreement was assessed by comparing the features extracted by reader 1 with those extracted by reader 2. Radiomics features with intra- and interclass correlation coefficient values > 0.75 were selected for subsequent analysis. The radiomics features extracted by YYJ were used for subsequent analysis.
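
The reliability filter described above can be sketched in plain Python as follows. The study used the “irr” R package and does not state which form of the correlation coefficient was computed, so the two-way random-effects, absolute-agreement, single-measurement form (often written ICC(2,1)) is an assumption here, as are the function names and the toy readings.

```python
from typing import Dict, List, Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` has one row per lesion and one column per reading.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: lesions (rows), readings (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def reliable_features(
    readings: Dict[str, List[List[float]]], threshold: float = 0.75
) -> List[str]:
    """Keep the features whose coefficient exceeds the threshold."""
    return [name for name, r in readings.items() if icc_2_1(r) > threshold]


# Toy readings (4 lesions x 2 readings) for two hypothetical features.
readings = {
    "stable_feature": [[10.0, 10.2], [12.0, 11.9], [15.0, 15.3], [20.0, 19.8]],
    "unstable_feature": [[10.0, 14.0], [12.0, 9.0], [15.0, 11.0], [20.0, 13.0]],
}
print(reliable_features(readings))
```

In practice the same function would be applied twice per feature, once to the two extractions by reader 1 and once to the extractions by reader 1 and reader 2, and a feature would be kept only if both coefficients exceed 0.75.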

Before feature selection, Z-score standardization was used to normalize the radiomics features. The independent-samples t test was used to remove features that did not differ between patients with and without ER and to retain those that did. The max-relevance and min-redundancy (mRMR) algorithm was then implemented to rank the importance of the retained radiomics features and select the most important features for subsequent analysis.
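
To make the standardization and ranking steps concrete, the sketch below implements Z-score scaling and a greedy mRMR ranking in plain Python. It is a simplified illustration rather than the study's implementation: relevance and redundancy are measured with the absolute Pearson correlation (an assumption; mRMR is often formulated with mutual information), the t test filter is omitted for brevity, and the feature names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize to zero mean and unit (sample) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return abs(cov) / math.sqrt(vx * vy)


def mrmr_rank(
    features: Dict[str, Sequence[float]], labels: Sequence[int], top_k: int
) -> List[str]:
    """Greedy mRMR: pick the feature maximizing relevance minus mean redundancy."""
    relevance = {name: abs_pearson(col, labels) for name, col in features.items()}
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < top_k:

        def score(name: str) -> float:
            if not selected:  # first pick: relevance only
                return relevance[name]
            redundancy = sum(
                abs_pearson(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 6 patients, label 1 = ER. "f_a_copy" nearly duplicates "f_a".
labels = [0, 0, 0, 1, 1, 1]
raw = {
    "f_a": [1, 2, 1, 5, 6, 5],
    "f_a_copy": [1, 2, 2, 5, 6, 5],
    "f_b": [3, 1, 2, 2, 4, 3],
}
scaled = {name: zscore(col) for name, col in raw.items()}
ranking = mrmr_rank(scaled, labels, top_k=2)
print(ranking)
```

In this toy example the near-duplicate feature is passed over in favour of a less relevant but less redundant one, which is the behaviour that distinguishes mRMR from ranking by relevance alone.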

Machine learning approaches for model construction

Patients in the training cohort were used to identify predominant features and develop predictive algorithms, and patients in the validation cohort were used to evaluate the predictive performance. Based on the selected radiomics features, seven supervised ML classifiers were used to construct radiomics models with the “Scikit-learn 0.24.0” Python package, including logistic regression, random forest (RF), neural network (NN), Bayes, support vector machine (SVM), Light Gradient Boosting Machine (LightGBM), and eXtreme Gradient Boosting (XGBoost). In addition, we incorporated clinical data to develop clinical-radiomics models to improve the predictive value. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were generated to evaluate the performance of the models. Calibration curve analysis and decision curve analysis (DCA) were used to test the robustness and clinical applicability of the models.
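
The two evaluation quantities mentioned above, the AUC and the net benefit plotted in DCA, can be computed from predicted probabilities as in the following plain-Python sketch. The function names and the toy predictions are hypothetical, and the study itself presumably relied on library routines rather than hand-written code.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(
    y_true: Sequence[int], probs: Sequence[float], threshold: float
) -> float:
    """Decision-curve net benefit at one threshold probability pt:
    TP/n - FP/n * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy predictions for 8 hypothetical patients (1 = ER).
y = [1, 1, 1, 0, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]
auc = roc_auc(y, p)
nb = net_benefit(y, p, threshold=0.5)
print(auc, nb)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies.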

Statistical analysis

SPSS software (version 20.0), Python software (version 3.9.R) and R software (version 4.1.0) were used to perform statistical analysis. Continuous data were analyzed by t test or Mann–Whitney U test, and are shown as mean ± standard deviation (SD) or median (interquartile range) according to the distribution. Categorical data were analyzed by the chi-square test or Fisher’s exact test, and are shown as numbers (percentages). Univariate and multivariate logistic regression analyses were performed to identify risk factors. Multivariable logistic regression analyses were carried out on variables with a P value < 0.1 in the univariate analysis. Kaplan–Meier curve analysis was used to evaluate OS and DSS. Statistical significance was established at P value < 0.05.

Results

Patient characteristics

A total of 254 patients with ICC undergoing surgery were reviewed from three institutions. After exclusion, 127 patients were included in the analysis, including 90 patients in the training cohort and 37 patients in the external validation cohort. The baseline characteristics of the training and validation cohorts are listed in Supplementary Table 2, and most characteristics did not differ between the cohorts. The mean age was 63.8 ± 10.4 years, and the mean BMI was 22.2 ± 3.3 kg/m². Sixty-seven patients (52.8%) were men, 37 patients (29.1%) had hepatitis B infection, 36 patients (28.3%) had liver cirrhosis, and the vast majority of patients (89.8%) had a Child–Pugh class of A.

A total of 92 patients experienced recurrence, including 72 patients (78.3%) diagnosed by CT, 17 patients (18.5%) by MRI, two patients (2.2%) by ultrasonography and one patient (1.1%) by PET/CT. Seventy-one patients (55.9%) exhibited ER. The baseline characteristics of patients with and without ER are shown in Table 1. Significant differences were observed in sex (P = 0.019), tumour size (P = 0.015), lymphatic metastasis (P = 0.003), MVI (P = 0.003), macrovascular invasion (P = 0.005), TNM stage (P < 0.001), carcinoembryonic antigen (CEA) (P = 0.016) and CA19-9 (P = 0.006). During the follow-up period, a total of 96 patients (75.6%) died, including 86 patients who died of disease recurrence and 10 patients who died of other causes without experiencing recurrence (e.g., cerebral hemorrhage). Six patients (4.7%) were alive at the last follow-up with disease recurrence. The one-year overall survival rate and one-year disease-specific survival rate of patients with ER were lower than those of patients without ER. Patients with ER experienced shorter OS (hazard ratio (HR) = 6.50, 95% CI: 3.94–10.71, P < 0.001) and DSS (HR = 10.70, 95% CI: 6.00–19.06, P < 0.001) than patients without ER (Fig. 2).

Table 1 Baseline characteristics of ICC patients with and without early recurrence after curative liver resection
Fig. 2 Kaplan–Meier curves analysis between ICC patients with and without ER after curative resection. A Kaplan–Meier curves of overall survival between patients with and without ER; B Kaplan–Meier curves of disease specific survival between patients with and without ER. ICC, intrahepatic cholangiocarcinoma; ER, early recurrence; HR, hazard ratio

Extraction of radiomics features

A total of 214 features were extracted from each patient, with 107 each in the arterial and venous phase images as follows: 14 shape-based features, 18 first order statistical features, 24 gray level cooccurrence matrix (Glcm) features, 16 gray level run length matrix (Glrlm) features, 16 gray level size zone matrix (Glszm) features, 14 gray level dependence matrix (Gldm) features, and five neighboring gray tone difference matrix (Ngtdm) features (Supplementary Table 3). The results of the intra- and interobserver agreement analysis are shown in Supplementary Table 4. We first excluded unreliable features with an intra- or interclass correlation coefficient value less than 0.75, leaving 165 features for subsequent analysis. Finally, 57 differential radiomics features were retained, including 42 upregulated features and 15 downregulated features in patients with ER (Fig. 3). The importance of the radiomics features was ranked through the mRMR algorithm, and the top 10 most important features were selected for subsequent analysis.

Fig. 3 Differential radiomics features and partial clinicopathological characteristics of ICC with and without ER after curative resection. A Heatmaps of differential radiomics features and partial clinicopathological characteristics between patients with and without ER; B Importance ranking of differential radiomics features through the mRMR algorithm. ICC, intrahepatic cholangiocarcinoma; ER, early recurrence; mRMR, max-relevance and min-redundancy

Clinical factors associated with early recurrence

The results of the univariate and multivariate logistic regression analyses are shown in Table 2. In the univariate regression analysis, sex (P = 0.020), tumour size (P = 0.026), lymphatic metastasis (P = 0.005), MVI (P = 0.006), macrovascular invasion (P = 0.022), TNM stage (P < 0.001), CA19-9 (P = 0.041) and hospital stay (P = 0.030) were associated with ER. Variables with P < 0.1 were then included in the multivariate analysis with the “Forward LR” method, and the results showed that male sex (P = 0.026), MVI (P = 0.006), TNM III-IV stage (P = 0.002) and elevated CA 19–9 (P = 0.051) were independent risk factors of ER. Correspondingly, a clinical model was built with an AUC of 0.685, and the calibration curve and DCA indicated an unsatisfactory performance (Fig. 4D-F).

Table 2 Univariate and multivariable logistic regression analysis of clinical factors associated with early recurrence in ICC patients undergoing curative resection
Fig. 4 Predictive performance of the ML radiomics models and clinical model in predicting ER in ICC. A ROC curves of the ML radiomics models; B Calibration plots of the ML radiomics models; C Decision curve analysis of the ML radiomics models; D ROC curves of the clinical model; E Calibration plot of the clinical model; F Decision curve analysis of the clinical model. ML, machine learning; ER, early recurrence; ICC, intrahepatic cholangiocarcinoma; ROC, receiver operating characteristic curves; TPR, true positive rate; FPR, false positive rate; AUC, area under the receiver operating characteristic curve; SVM, Support Vector Machine; LightGBM, Light Gradient Boosting Machine; XGBoost, eXtreme Gradient Boosting

Construction of machine learning radiomics-based models

Seven ML radiomics models were constructed with a mean AUC of 0.87 ± 0.02 (Fig. 4A). Among them, RF, NN and SVM showed the best performance (AUC of 0.89). The calibration curves and DCA showed a favourable performance (Fig. 4B and C). Seven ML clinical-radiomics models were also built with a mean AUC of 0.87 ± 0.03 (Fig. 5). The clinical-radiomics models showed predictive power similar to that of the radiomics models, indicating the pivotal role of radiomics in predicting ER in ICC. To further confirm the stability of the ML algorithms, we developed two new sets of models by swapping the parameters of the radiomics models and clinical-radiomics models, and the new radiomics models and new clinical-radiomics models all showed good predictive performance (Supplementary Fig. 2).

Fig. 5 Predictive performance of the ML clinical-radiomics models in predicting ER in ICC. A ROC curves of the ML clinical-radiomics models; B Calibration plots of the ML clinical-radiomics models; C Decision curve analysis of the ML clinical-radiomics models. ML, machine learning; ER, early recurrence; ICC, intrahepatic cholangiocarcinoma; ROC, receiver operating characteristic curves; TPR, true positive rate; FPR, false positive rate; AUC, area under the receiver operating characteristic curve; SVM, Support Vector Machine; LightGBM, Light Gradient Boosting Machine; XGBoost, eXtreme Gradient Boosting

We further constructed models using the arterial and venous phase features separately and compared their individual contributions. We selected the top 10 most important arterial phase features and the top 10 venous phase features to construct their individual radiomics and clinical-radiomics models. The mean AUCs of the radiomics and clinical-radiomics models derived from arterial phase features were 0.72 ± 0.04 and 0.79 ± 0.03, respectively (Supplementary Fig. 3). The mean AUCs of the radiomics and clinical-radiomics models derived from venous phase features were 0.84 ± 0.02 and 0.85 ± 0.04, respectively (Supplementary Fig. 4). The predictive performance of the arterial phase feature-based models was inferior to that of the venous phase feature-based models and the comprehensive models, while the comprehensive models including the whole set of features exhibited the best predictive performance.

Discussion

The high propensity for ER after surgery remains a major barrier to therapeutic success in ICC, and timely identification of patients who are likely to experience ER is clinically important. Our study identified important radiomics features from pretreatment CECT images and constructed well-performing ML radiomics-based models to predict ER in ICC. We propose that CECT-based radiomics is a useful, noninvasive and easy-to-use tool for predicting ER in ICC.

In light of the dismal prognosis of ICC, there has been increasing interest in identifying biomarkers for predicting the prognosis [29, 30]. Several clinical factors have been identified, such as serum tumour biomarkers, tumour size, tumour number, MVI, and lymph node metastases [7, 9, 31]. The study by Lang et al. showed that male sex is a predictor of recurrence-free survival (RFS) [32]. In our study, we found that male sex, MVI, TNM stage III-IV, and elevated CA19-9 were independent risk factors for ER in ICC, which is in accordance with previous studies. However, predicting ER in ICC with clinical factors alone was insufficient (AUC of 0.685), and a more reliable method was needed.

Radiomics is now widely applied in the management of PLC and is performed on existing images without additional cost [3, 33, 34]. It can convert medical images into mineable data, displaying more information than simple tumour phenotypic data. In this study, we extracted radiomics features from 3D volumetric ROIs of pretreatment CECT images, which can provide more comprehensive information about tumour heterogeneity than 2D ROIs. We found that the radiomics models performed better than the conventional clinical model in predicting ER in ICC. The clinical-radiomics models also achieved remarkable predictive performance, superior to that of the clinical model and similar to that of the radiomics models, indicating the pivotal role of radiomics in predicting ER in ICC. Interestingly, the radiomics features from venous phase CECT images may play a more important role in predicting ER in ICC, as venous images yielded most of the important radiomics features in this study.

Radiomics features are usually associated with tumour size, tumour shape, voxel intensity or the spatial relationship between voxels[35]. In our study, the ten most important radiomics features were mainly shape-based features (e.g., shape surface area) and features reflecting tumour heterogeneity (e.g., gray level non-uniformity), which are known to be relevant for prognosis. Shape surface area could indicate the relative size of the image array, where a greater value implies a greater tumour size, which can lead to a higher recurrence risk[7, 31]. Xiang et al. also identified shape features from CECT suggesting that larger tumours tend to have a higher recurrence rate in gallbladder carcinoma[36]. Gray level non-uniformity is an indicator of image array heterogeneity, with greater values implying greater heterogeneity, which is associated with a complex tumour microenvironment (e.g., gene instability, hypoxia, angiogenesis, and immune status). Notably, Zhang et al. presented a radiomics signature based on texture and shape features to predict MVI status in hepatocellular carcinoma (HCC)[37], which may further contribute to recurrence. Another study by Xu et al. also identified radiomics features related to tumour size and heterogeneity as the most important features for predicting MVI and survival in HCC, which is similar to our findings[38].

Although we confirmed a certain role of radiomics in predicting ER in ICC, the biological meaning of the selected radiomics features is still unclear, since there was a lack of genomic or immunohistochemistry profiling. A hypothesized interpretation is that these radiomics features, which correlate significantly with tumour structure and microenvironment, can be used as surrogate markers of tumour heterogeneity and a more aggressive biological behaviour in ICC. Therefore, it is worth investigating the potential relationship between radiomics features and clinicopathological features underlying the biological behaviour of tumours, since their combination has shown promising potential in improving outcome prediction[39].

To make the results more compelling, we applied seven ML algorithms to construct radiomics-based models. As a type of artificial intelligence (AI), ML is increasingly used in various liver diseases, with the advantage of generating more accurate predictive models than conventional approaches[22]. ML radiomics has been a promising tool in clinical practice[40, 41]. In our study, all the ML radiomics models achieved satisfactory performance, indicating the favourable clinical value of ML radiomics in predicting the ER of ICC. In addition, we performed stringent external validation to assess the generalizability and reproducibility of our results, which is necessary to translate radiomics analysis into clinical application[42].

The potential benefit of neoadjuvant and postoperative adjuvant therapies is still controversial, and standard selection criteria for patients suitable for these therapies are absent[43, 44]. Promisingly, the present ML radiomics models can provide a useful reference for estimating recurrence risk and optimizing surveillance programmes for ICC, especially in the selection of patients eligible for neoadjuvant or adjuvant therapies. By inputting the radiomics features extracted from pretreatment CECT images into our ML algorithms, the probability of ER is obtained. For patients who are predicted to have a high risk of ER, appropriate neoadjuvant or adjuvant therapies and intensive screening after surgery may be needed to delay or avoid recurrence, which is also recommended by the guidelines of the National Comprehensive Cancer Network (NCCN) [45]. In contrast, for patients predicted to have a low risk of ER, neoadjuvant or adjuvant therapies should be used with caution to avoid additional adverse effects that may reduce the patients’ quality of life. In addition, combining surgery with locoregional therapies or novel drugs such as immune checkpoint inhibitors, routine lymphadenectomy and accurate nodal staging may also provide some benefit in improving the prognosis of ICC for patients with high recurrence risk[4, 46]. Therefore, ML radiomics is valuable and practical for optimizing treatment strategies and guiding clinical decision-making. Moreover, the extracted radiomics features can also be used as surrogate markers of the underlying biological activities driving the ER of ICC.

Although the results are promising, there are still some limitations in the study. First, this is a retrospective study with a limited sample size. A large number of patients were excluded because of a lack of imaging data, which may result in some potential selection bias. Prospective studies with larger cohorts are needed to validate the results. Second, the manual delineation of ROIs may have resulted in some degree of heterogeneity and affected the accuracy of data extraction. Third, although external validation was performed to enhance the generalizability of radiomics models, differences among the CT scanners and their parameters may have led to some bias because engineered features are critically dependent on image acquisition settings. Finally, the potential mechanism underlying the radiomics features has not been elucidated. Clarifying the biological meaning of radiomics features underlying tumour behaviour is important but challenging, and may shed new light on the biological drivers of patient outcomes[42].

In conclusion, this study characterized the potential role of CECT-based radiomics in predicting the ER of ICC and developed valuable ML radiomics models. This may provide significant benefit for accurate risk stratification and guiding clinical decision making.