Introduction

Hepatocellular carcinoma (HCC), accounting for 75–85% of primary liver cancer cases, is the third leading cause of cancer-related death, with approximately 780,000 deaths annually [1]. A comparative study analyzing more than 8500 patients receiving treatment for HCC showed that less than 10% of the candidates met the criteria for liver resection [2]. For patients with unresectable or advanced HCC who are ineligible for surgery, recommended first-line systemic treatments include the single agents sorafenib, lenvatinib, and donafenib [3,4,5]. Moreover, tumor immunotherapy has been reported to provide promising outcomes. The combination of atezolizumab and bevacizumab has become the standard first-line systemic therapy according to the IMbrave150 trial [6]. Additionally, the evidence from a recent phase III randomized clinical trial indicated that tislelizumab may represent a potential first-line treatment option for patients with unresectable HCC [7]. However, the effectiveness of molecular targeted therapy and immunotherapy is limited by the low response rate. Additionally, the high cost of immunotherapy may prevent it from being widely used. Transarterial chemoembolization (TACE) is the most widely used treatment option for patients with unresectable HCC [8, 9]. However, TACE is not effective in patients with diffuse HCC, arterioportal/arteriovenous shunts, or major vascular tumor thrombosis [10]. As hepatic arterial infusion chemotherapy (HAIC) is associated with increased local agent concentrations and reduced systemic distribution of anticancer drugs, it has shown stronger treatment efficacy and fewer systemic adverse effects than other systemic therapies [11,12,13,14,15]. Clinical experience has indicated that HAIC is more suitable for patients with arteriovenous/arterioportal shunts and major vascular tumor thrombosis. 
HAIC was listed in the Japan Society of Hepatology guidelines in 1995 and accepted as an interventional therapy for advanced HCC by the Chinese Society of Clinical Oncology [15, 16]. In addition, according to the Pan-Asian–adapted European Society for Medical Oncology Guidelines, HAIC was recommended as the first-line option for advanced, nonmetastatic HCC with macrovascular invasion [17].

According to the retrospective study of Zhao et al., HAIC was associated with longer OS than sorafenib (14.5 vs. 7.0 months, P < 0.001) in patients with locally advanced HCC [18]. A clinical trial by Shi et al. indicated that FOLFOX-HAIC plus sorafenib had a greater ORR (40.8% vs. 2.46%, P < 0.001) than sorafenib alone [19]. In addition, Shi et al. showed that FOLFOX-HAIC significantly improved OS compared with TACE in patients with unresectable large HCC [20]. Additionally, compared with sorafenib alone, HAIC plus sorafenib was associated with longer median OS (16.3 vs. 6.5 months, P < 0.001) and PFS (9.0 vs. 2.5 months, P < 0.001) and a greater ORR (41% vs. 3%, P < 0.001) in patients with major portal vein tumor thrombosis [21]. Moreover, HAIC combined with targeted therapy or immunotherapy can be used in clinical practice when patients present with tumor progression after traditional TACE treatment [22]. Many studies have reported high treatment response rates in patients with unresectable HCC receiving HAIC, and the clinical response significantly affects survival. A comparative study showed that HAIC may improve OS in patients with advanced HCC with major portal vein tumor thrombosis compared to TACE/TAE treatment [23]. Moreover, a meta-analysis indicated that HAIC may have better therapeutic efficacy than TACE in patients with unresectable HCC [24]. To our knowledge, three models have been developed to predict the therapeutic response to HAIC in advanced or unresectable HCC patients [25,26,27]. Among them, two models used contrast-enhanced computed tomography, and one model used only axial T2-weighted images. However, none of those studies used convolutional neural networks (CNNs), which are tools designed for image processing.

Deep learning has been widely applied in the area of prediction owing to its advantages of being accurate, fast, and reproducible [28]. CNNs have gained extensive attention related to pattern recognition in images as a strategy for deep learning [29, 30]. CNNs can find image details that physicians cannot find and automatically make a quantitative evaluation. The aim of this study was to evaluate the potential of combining CNNs with preoperative MRI and clinical factors for the prediction of treatment response in unresectable HCC patients treated with HAIC.

Patients and Methods

Study Population

This retrospective study was performed in accordance with the requirements of the institutional ethics committee of our hospital and the Helsinki Declaration. The ethics review number is KY2022078. Between May 2019 and March 2022, a total of 191 patients with unresectable HCC who underwent HAIC at our hospital were included in this study. HCC was pathologically confirmed by liver biopsy or clinically confirmed according to the American Association for the Study of Liver Diseases guidelines. Tumors were considered unresectable either because patients were at an advanced stage or because patients would have had insufficient remnant liver volume after liver surgery (< 40% for patients with cirrhosis; < 30% for patients without cirrhosis). The key inclusion criteria were as follows: (1) at least one HAIC cycle received; (2) Eastern Cooperative Oncology Group Performance Status of 0–1; (3) Child‒Pugh class A or B; (4) adequate hematologic function: white blood cell count ≥ 3.0 × 10⁹/L, neutrophils ≥ 1.5 × 10⁹/L, platelets ≥ 50 × 10⁹/L, hemoglobin ≥ 100 g/L; (5) adequate renal function: serum creatinine ≤ 2 mg/dL. The exclusion criteria were as follows: (1) severe cardiovascular or cerebrovascular disease, (2) malignant tumors other than HCC, (3) no qualified pretreatment MRI available at our institution, (4) unavailable clinical medical records, and (5) missing survival data.

Treatment Procedure

For the HAIC procedure, we introduced a 5F RH catheter (Cook, Bloomington, USA) or a 5F MPA catheter (Cordis, USA) through the femoral artery or the radial artery via the Seldinger technique. Then, we performed angiographic surveys of the celiac trunk and superior mesenteric artery. We advanced the catheter or a 2.7F microcatheter (Terumo, Tokyo, Japan) into the hepatic artery with reference to tumor size, location, and arterial supply. The catheter or microcatheter was connected to a chemotherapy infusion pump. After HAIC treatment, the catheter or microcatheter was removed. HAIC was delivered via segmental, lobar, or whole-liver infusion according to the tumor burden. The FOLFOX regimen was used as previously reported [22]. Specifically, at the discretion of the treating physician, oxaliplatin was administered at 85 mg/m² for 2 h, leucovorin at 200–400 mg/m², and 5-fluorouracil at 2400 mg/m² for 46 h (48 h protocol) or 1200 mg/m² for 22 h (24 h protocol). HAIC was performed every 3–4 weeks and was repeated unless deterioration of hepatic function, deterioration of clinical conditions, tumor progression, or unacceptable toxicity occurred.

Data Collection

The clinical characteristics of each patient included age, sex, pre-HAIC treatment, number of HAIC cycles, HAIC regimen, tumor number, maximum tumor size, tumor thrombus, extrahepatic metastasis, stage, liver function, hepatitis panel, liver cirrhosis markers, circulating blood cell laboratory values, best tumor response, overall survival (OS), and progression-free survival (PFS). OS was defined as the time interval between the first day of treatment and the date of death or last follow-up. PFS was defined as the time interval between the first day of treatment and the first documented disease progression or death.

MRI Acquisition

Each patient underwent gadoxetic acid–based MRI using a 3.0-Tesla MR scanner (uMR 770; United Imaging Healthcare, Shanghai, China). The routine plain-scan liver MRI protocol included diffusion-weighted imaging, T2-weighted imaging (T2WI), T1-weighted in-phase imaging, opposed-phase imaging, and contrast-enhanced T1-weighted imaging in the arterial phase (20–30 s), portal venous phase (60–80 s), and delayed phase (180 s) after injection of gadoxetic acid (extracellular) at a dose of 0.1 mmol/kg and at a rate of 2 mL/s, followed by a 20-mL saline flush. The detailed parameters of each MRI sequence are listed in Supplementary Table 1. In this study, enhanced T1-weighted sequences (including arterial, portal, and delayed phases), T2 fast spin echo (FSE) sequences, and dual-echo sequences were selected for radiomic analysis.

Follow-Up and Response Evaluation

All enrolled patients were treated and monitored regularly. Tumor response was evaluated every 1–3 HAIC cycles on the enhanced T1-weighted sequence in the arterial phase by two investigators according to Response Evaluation Criteria in Solid Tumors (RECIST) version 1.1 [31]. Tumor response was divided into four categories: complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD). CR was defined as the disappearance of all target lesions (TLs) (up to two measurable liver lesions). PR was defined as a ≥ 30% decrease in the sum of the greatest one-dimensional diameters of TLs. PD was defined as a ≥ 20% increase in the sum of the diameters of the TLs. SD was defined as a case that did not qualify for either PR or PD. The objective response rate (ORR) was calculated as the percentage of patients with a best response of CR or PR. The disease control rate (DCR) was the percentage of patients with a best response of CR, PR, or SD. We analyzed adverse events according to the National Cancer Institute Common Terminology Criteria for Adverse Events version 4.03. In our study, we considered patients who achieved CR, PR, or SD as having a positive treatment response, while patients with PD were considered to have a poor treatment response. The timeframe for evaluating tumor response was during our median follow-up time.
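The response categories and the two summary rates defined above can be illustrated with a short Python sketch. It applies only the simplified thresholds stated in this section, measured against the baseline diameter sum, which is an assumption (full RECIST 1.1 has additional rules). The function names are hypothetical; the example counts are the best responses reported for the training cohort in the Results.

```python
from collections import Counter


def classify_response(baseline_sum_mm, current_sum_mm):
    """Map target-lesion diameter sums to CR/PR/SD/PD with the thresholds in the text."""
    if current_sum_mm == 0:
        return "CR"  # disappearance of all target lesions
    change = (current_sum_mm - baseline_sum_mm) / baseline_sum_mm
    if change <= -0.30:
        return "PR"  # at least a 30% decrease
    if change >= 0.20:
        return "PD"  # at least a 20% increase
    return "SD"      # neither PR nor PD


def response_rates(best_responses):
    """Return (ORR, DCR) as fractions of all evaluated patients."""
    counts = Counter(best_responses)
    n = len(best_responses)
    orr = (counts["CR"] + counts["PR"]) / n
    dcr = (counts["CR"] + counts["PR"] + counts["SD"]) / n
    return orr, dcr


# Best responses reported for the training cohort: CR 2, PR 22, SD 86, PD 17.
training = ["CR"] * 2 + ["PR"] * 22 + ["SD"] * 86 + ["PD"] * 17
orr, dcr = response_rates(training)
print(f"ORR = {orr:.1%}, DCR = {dcr:.1%}")
```

With these counts the sketch reproduces the ORR of 18.9% and DCR of 86.6% quoted for the training cohort.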

Convolutional Neural Network Development

For each patient, consecutive images were collected from enhanced T1-weighted sequences (including arterial, portal, and delayed phases), T2 FSE sequences, and dual-echo sequences. In this study, we extracted features by deep learning model automatically instead of segmenting the MRI images manually. A total of 191 patients were randomly divided into a training group (n = 127) and a validation group (n = 64). Because of the limited amount of data in our training group, data augmentation and image processing were performed to increase the amount of data to 10 times the original size according to random geometric image transformations, including shifting, rotation, flipping, and enlarging. In this way, we could make the model focus on the lesions rather than various sources of noise. All the transformed images were resized to 450 × 320 pixels to standardize the distance scale. The data augmentation method was proven to help prevent network overfitting and memorization of the exact details of the images. All the above steps were conducted in Python (version 3.7) by using the resize function in OpenCV (https://opencv.org).
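The augmentation idea can be sketched as follows. This is not the authors' pipeline: the source used the resize function in OpenCV, whereas this dependency-free stand-in works on nested lists with nearest-neighbour resampling. Only flipping and shifting are shown; rotation and enlarging are omitted. The function names, the shift range of ±2 pixels, the 0.5 flip probability, and the reading of 450 × 320 as width × height are all assumptions.

```python
import random


def flip_horizontal(img):
    """Mirror each row of a 2D image stored as a list of rows."""
    return [row[::-1] for row in img]


def shift(img, dx, dy, fill=0):
    """Translate by (dx, dy) pixels, padding uncovered pixels with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def resize_nearest(img, new_w, new_h):
    """Nearest-neighbour resize to new_w columns by new_h rows."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def augment(img, copies=10, out_w=450, out_h=320, seed=0):
    """Produce `copies` randomly transformed, size-standardized variants of one image."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    results = []
    for _ in range(copies):
        out = img
        if rng.random() < 0.5:  # assumed flip probability
            out = flip_horizontal(out)
        out = shift(out, rng.randint(-2, 2), rng.randint(-2, 2))  # assumed range
        results.append(resize_nearest(out, out_w, out_h))
    return results
```

Calling `augment(image)` on one slice returns ten transformed copies, each 320 rows by 450 columns under the stated assumption about the target size.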

In our study, we used CNN models to transform the MRI images of each patient into digital features, and different sequences were input into different channels. The image features were combined with 27 clinical features as the overall characteristics of the patients. To select the CNN model with the best performance and eliminate the interference of other factors, we conducted unified experiments on three classification models, namely, AlexNet, ResNet, and InceptionV4. We compared the cross-entropy loss (CEL) of the three models and selected InceptionV4 because it had the smallest CEL. InceptionV4 was proposed in 2016 and contains three Inception parts, two Reduction parts, and one Stem part. This approach introduces a residual block to substantially accelerate the training speed and achieves dual improvements in accuracy and training speed. Finally, we combined the image features with clinical features as input to the support vector machine (SVM) classifier. All three models were constructed with torch 1.16.0. InceptionV4 was initialized randomly before training. We used the SGD optimizer to train the model with a batch size of 1 and adjusted the parameters by end-to-end supervised learning. The initial learning rate was set to 0.01, and the learning rate was decreased by a factor of 10 each time the accuracy on the validation set did not improve for 10 consecutive epochs. CEL was selected as the loss function for backpropagation: \(\mathrm{CEL}=f\left(x\right)=\frac{1}{\text{batch\_size}}\sum_{j=1}^{\text{batch\_size}}\sum_{i=1}^{n}\left[-{y}_{ji}\log\widehat{{y}_{ji}}-\left(1-{y}_{ji}\right)\log\left(1-\widehat{{y}_{ji}}\right)\right]\) (n: the total number of classification categories; \({}_{ji}\): the jth sample in the batch, classified as category i; \({y}_{ji}\): actual value; \(\widehat{{y}_{ji}}\): predicted value).
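As a worked illustration of the loss defined above, the following sketch evaluates the CEL on plain Python lists. It is not the training code used in the study; the function name and the small clamping constant `eps` are assumptions added to keep the logarithm finite.

```python
import math


def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Batch mean of sum_i [-y*log(p) - (1 - y)*log(1 - p)] over the n categories.

    y_true and y_pred are lists of per-sample lists, one entry per category.
    """
    total = 0.0
    for targets, probs in zip(y_true, y_pred):
        for y, p in zip(targets, probs):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -y * math.log(p) - (1 - y) * math.log(1 - p)
    return total / len(y_true)  # average over the batch


# One sample, two categories, an uninformative prediction of 0.5 for each.
loss = cross_entropy_loss([[1, 0]], [[0.5, 0.5]])
print(round(loss, 4))
```

For this example each of the two categories contributes log 2, so the loss equals 2 log 2.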
During the training process, we used the dropout strategy with a probability of 0.2 on the fully connected layers and the L2 regularization strategy for weight and bias to prevent overfitting. All steps were run in Python version 3.7. We generated the model by using a computer with an NVIDIA GeForce RTX 2080 GPU, a 6 × Xeon E5-2678 v3 CPU, and 64 GB of memory.
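The learning-rate rule described in this section (start at 0.01 and divide by 10 whenever validation accuracy has not improved for 10 consecutive epochs) can be sketched in plain Python. The class name and its interface are hypothetical. PyTorch provides a comparable scheduler, `torch.optim.lr_scheduler.ReduceLROnPlateau`, but the source does not state whether it was used.

```python
class PlateauDecay:
    """Divide the learning rate by 10 after `patience` epochs without improvement."""

    def __init__(self, lr=0.01, factor=0.1, patience=10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("-inf")  # best validation accuracy seen so far
        self.stale = 0             # consecutive epochs without improvement

    def step(self, val_accuracy):
        """Record one epoch's validation accuracy and return the updated rate."""
        if val_accuracy > self.best:
            self.best = val_accuracy
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0  # start counting again after a decay
        return self.lr


scheduler = PlateauDecay()
for accuracy in [0.70, 0.75] + [0.75] * 10:  # accuracy stalls after epoch 2
    current_lr = scheduler.step(accuracy)
print(current_lr)
```

In this toy run the accuracy stalls for ten epochs after the second one, so the rate drops once, from 0.01 to roughly 0.001.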

Model Training and Validation

The patients in the training cohort (n = 127) composed the main cohort, and the corresponding data yielded an internal dataset for model development and cross-validation. The remaining patients in the validation cohort (n = 64) served as an independent cohort, and the corresponding data yielded a dataset for internal validation.

To further reduce the risk of model overfitting and make full use of our dataset, we adopted a fivefold cross-validation method to train and validate InceptionV4 in the training cohort. The method randomly divided the total existing data into five equal parts, among which four parts were used as the training set and the remaining part was used as the testing set. In each iteration, we conducted model training on the training set, tested the trained model on the testing set, and calculated the accuracy. We repeated this series of steps five times and averaged the accuracy of the five folds to obtain the final accuracy. The model trained in the training cohort was then validated in the validation cohort.
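The fivefold procedure can be sketched with the standard library alone. The helper names, the fixed random seed, and the `fit_and_score` callback are hypothetical placeholders for the actual model training, which is not reproduced here.

```python
import random


def five_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k near-equal random folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(n_samples, fit_and_score, k=5, seed=0):
    """Average the accuracy returned by fit_and_score(train_idx, test_idx) over k folds."""
    scores = [fit_and_score(train, test)
              for train, test in five_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# With the 127 training-cohort patients, each fold holds 25 or 26 patients.
for train_idx, test_idx in five_fold_indices(127):
    print(len(train_idx), len(test_idx))
```

Each patient appears in exactly one testing fold, so every case contributes to the averaged accuracy once.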

Evaluation Metrics, Comparative Methods, and Visualization Methods

We calculated the area under the ROC curve (AUC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), and negative predictive value (NPV) as evaluation metrics to analyze the diagnostic capacity of the models. We compared the performance of the InceptionV4 model against that of the following methods: (1) a clinical model that combines multiple significant clinical features (age, number of HAIC cycles, tumor thrombus, extrahepatic metastasis, and AST), and (2) two skilled radiologists. The two radiologists evaluated the response to HAIC according to the following features of the HCC lesions: (1) tumor size, (2) the distance between the tumor location and the hepatic hilum, (3) tumor number, (4) tumor margin, (5) portal vein tumor thrombosis, (6) gross growth type, (7) signal intensity on T2WI, (8) signal homogeneity on T2WI, (9) arterial enhancement intensity, (10) arterial enhancement range, (11) enhancement pattern, (12) tumor capsule, (13) hemorrhage, (14) necrosis, (15) fat, and (16) hepatic capsule bulge.
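The four threshold-based metrics follow directly from a 2 × 2 confusion matrix, as the sketch below shows. The function name is hypothetical. The counts in the example are an assumption: they are one confusion matrix compatible with the 13 PD and 51 non-PD patients of the validation cohort, taking PD as the positive class, and are not taken from the source tables.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "SEN": tp / (tp + fn),  # true positives among all actual positives
        "SPE": tn / (tn + fp),  # true negatives among all actual negatives
        "PPV": tp / (tp + fp),  # true positives among all predicted positives
        "NPV": tn / (tn + fn),  # true negatives among all predicted negatives
    }


# Assumed counts: 13 positives (10 detected), 51 negatives (45 correctly excluded).
metrics = diagnostic_metrics(tp=10, fp=6, tn=45, fn=3)
for name, value in metrics.items():
    print(name, value)
```

The AUC is not covered here because it requires the continuous model scores rather than a single confusion matrix.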

To interpret the CNN predictions, we used class activation mapping (CAM) to analyze the parameters of the model. The CAM is a tool that helps us visualize CNNs with heatmaps. Using CAM, we could clearly observe which area of the image the network focused on. After completing the training of the model, we used CAM to visualize the images and view the areas related to treatment response in heatmaps. It can be understood as the distribution of contributions to the predicted output. The higher the score, the higher the response and contribution of the corresponding region of the original image to the network, indicating the importance of each position to the classification. CAM was implemented by the package grad-cam (https://github.com/jacobgil/pytorch-grad-cam), and heatmaps were produced by using the package Matplotlib (https://pypi.org/project/matplotlib/).
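The idea of a heatmap as a "distribution of contributions" can be made concrete with a small sketch of the basic class activation map: a weighted sum of the last convolutional feature maps, rescaled to [0, 1]. This is an illustration of the general formulation only. The study used the pytorch-grad-cam package, whose weighting may differ, and the function name and toy inputs below are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W), min-max scaled to [0, 1]."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[sum(wk * fmap[y][x] for wk, fmap in zip(class_weights, feature_maps))
            for x in range(w)]
           for y in range(h)]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    if span == 0:  # flat map: no region stands out
        return [[0.0] * w for _ in range(h)]
    return [[(v - lo) / span for v in row] for row in cam]


# Two toy 2 x 2 feature maps; the first carries twice the class weight.
maps = [[[1, 0], [0, 0]],
        [[0, 0], [0, 1]]]
heat = class_activation_map(maps, [2.0, 1.0])
print(heat)
```

In practice the scaled map is upsampled to the image size and overlaid on the MRI slice, so that higher values mark regions contributing more to the predicted class.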

Statistical Analysis

Given that some of our enrolled patients were treated with HAIC plus targeted therapy and/or immunotherapy within 2 weeks, we assigned patients to a HAIC monotherapy group and a HAIC plus targeted therapy and/or immunotherapy group. Then, we analyzed the clinical characteristics, best tumor responses, and survival differences between the two groups.

Continuous and categorical variables are expressed as medians with interquartile ranges (IQRs) and counts with percentages, respectively. Continuous variables were compared by using Student’s t test or the Mann‒Whitney U test, and categorical variables were compared by using the χ² test or Fisher’s exact test. Variables with P < 0.05 in univariable analyses were entered into multivariable analyses. Univariable and multivariable analyses for treatment response were performed with logistic regression models. The AUC, SEN, SPE, PPV, and NPV were reported with 95% confidence intervals (CIs) calculated using the Clopper–Pearson method. Survival curves were generated by Kaplan‒Meier analysis and compared between groups with the log-rank test. All the statistical analyses were performed with R 3.5.1.
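For readers unfamiliar with the Clopper–Pearson method, the following sketch computes the exact binomial interval for a proportion by bisection on the binomial tail probabilities. The analyses in this study were run in R; this Python version, its function names, and the example of 10 successes in 13 trials are illustrative assumptions only.

```python
from math import factorial


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    total = 0.0
    for i in range(k + 1):
        comb = factorial(n) // (factorial(i) * factorial(n - i))
        total += comb * p**i * (1 - p) ** (n - i)
    return total


def _solve(func, target, increasing, iters=100):
    """Bisection on [0, 1] for func(p) == target, with func monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (func(mid) < target) == increasing:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower bound: P(X >= k | p) = alpha / 2, which increases with p.
    lower = 0.0 if k == 0 else _solve(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper bound: P(X <= k | p) = alpha / 2, which decreases with p.
    upper = 1.0 if k == n else _solve(
        lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


low, high = clopper_pearson(10, 13)  # e.g. a proportion of 10/13
print(f"{low:.3f} to {high:.3f}")
```

The interval is conservative by construction, which is why it is often preferred for small cohorts such as a validation set of a few dozen patients.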

The overall flowchart of the outline of our study is shown in Fig. 1.

Fig. 1 Flow chart of the outline of our study

Results

Patient Characteristics

A total of 444 patients with unresectable HCC treated with HAIC were studied, and 148 patients who accepted other treatments after HAIC were excluded. Additionally, 105 patients had incomplete data. Finally, we included 191 patients in the analysis, 127 of whom composed the training cohort and 64 of whom composed the validation cohort. The baseline characteristics of the patients are summarized in Table 1. Among all the enrolled patients, 91 patients (47.6%) had extrahepatic metastasis, including lymph node metastasis (67 patients), pulmonary metastasis (24 patients), bone metastasis (8 patients), brain metastasis (10 patients), and implantation metastasis (6 patients). The median OS was 4.0 (2.3–7.7) months for patients in the training cohort and 5.0 (2.9–9.7) months for those in the validation cohort. For patients in the training cohort, the numbers (percentages) of patients who achieved CR, PR, SD, and PD were 2 (1.6%), 22 (17.3%), 86 (67.7%), and 17 (13.4%), respectively. For patients in the validation cohort, the numbers (percentages) of patients who achieved CR, PR, SD, and PD were 3 (4.7%), 9 (14.1%), 39 (60.9%), and 13 (20.3%), respectively.

Table 1 Clinical characteristics of the study population

Treatment-Related Adverse Events

Treatment-related adverse events, which occurred in 53.9% of patients, are presented in Supplementary Table 2. The main adverse events were elevated ALT levels/elevated AST levels (34.9%), hyperbilirubinemia (27.9%), and pain (10.1%). Serious adverse events (grades 3–4) were found in 29 patients (15.2% of the 191 patients; 28.2% of the patients with adverse events), including 2 thrombocytopenia events, 1 leucopenia event, 2 pain events, 2 upper gastrointestinal bleeding events, 14 elevated ALT/elevated AST events, and 11 hyperbilirubinemia events. Among them, three patients had elevated ALT levels/elevated AST levels as well as hyperbilirubinemia.

Clinical Predictive Factors Related to DCR

As shown in Table 2, in the training cohort, age, number of HAIC cycles, tumor thrombus, extrahepatic metastasis, lymphocyte count, neutrophil-to-lymphocyte ratio (NLR), AST, and AKP were related to the DCR according to univariate logistic regression analysis. Multivariate logistic regression analysis confirmed age, number of HAIC cycles, tumor thrombus, extrahepatic metastasis, and AST as independent prognostic factors.

Table 2 Univariable and multivariable logistic regression analyses of predicting disease control rate (DCR) in patients after HAIC for unresectable hepatocellular carcinoma in the training group

Construction of the InceptionV4 Model

We combined 27 clinical features (age, sex, pre-HAIC treatment, number of HAIC cycles, HAIC regimen, tumor number, maximum tumor size, tumor thrombus, extrahepatic metastasis, HBsAg, AFP, PIVKA-II, prothrombin time, PLT, neutrophil, lymphocyte, monocyte, NLR, PLR, total bilirubin, ALBI, ALT, AST, AKP, γ-GT, albumin level and CRP) and 5 image features obtained from 5 constant sequences (enhanced T1-weighted sequence [including arterial, portal, and delayed phase], T2 FSE sequence and dual-echo sequence) to complete the classification of patient status and constructed three classical CNNs, namely AlexNet, ResNet, and InceptionV4. We calculated the CEL of each model to identify the most suitable one. The CEL curves of the three models for the training and validation groups are provided in Fig. 2. We randomly captured the information of one patient from the entire group in each epoch and performed 200 epochs in total. As the number of epochs increased, the CEL of the validation cohort stabilized, and the best CEL was 0.7654 for AlexNet, 0.8032 for ResNet, and 0.7475 for InceptionV4. Therefore, we used the InceptionV4 model as the treatment response prediction model for subsequent training and validation. Detailed information about the architecture of InceptionV4 is shown in Fig. 3.

Fig. 2 Cross-entropy loss of various models. A Cross-entropy loss index of different epochs in the training group and the validation group. B Cross-entropy loss curves of three CNN models in the training group. C Cross-entropy loss curves of three CNN models in the validation group

Fig. 3 Architecture of InceptionV4

Cross-Validation Results and Internal Validation Results

In the training cohort, the AUCs of the Inception model, the clinical model, and the two radiologists were 0.871, 0.873, 0.807, and 0.783, respectively; in the validation cohort, the corresponding AUCs were 0.826, 0.708, 0.806, and 0.748, respectively (Fig. 4B). Although there were no significant differences between the ROC curves of the Inception model and those of the other methods according to DeLong’s test in the validation cohort (Inception model vs. clinical model, Z = 1.39, P = 0.16; Inception model vs. R1, Z = 1.43, P = 0.15; Inception model vs. R2, Z = 0.72, P = 0.47), the Inception model had the greatest accuracy (AUC = 0.826, SEN = 0.769, SPE = 0.882, PPV = 0.625, NPV = 0.937) in predicting antitumor responses (Table 3).

Fig. 4 ROC curves yielded by different models. A The cross-validation results. B The internal validation results

Table 3 Performance comparison of various methods

Visualization and Interpretability of the InceptionV4 Model

We artificially marked the most likely parts of the lesion in different sequences and planes. For example, as shown in the heatmaps produced by means of the CAM method (Fig. 5), we used red to indicate the most likely location of hepatic hilar tumor and portal vein tumor thrombosis. The purple and green backgrounds in the marked section reflect areas identified by the InceptionV4 model that have greater predictive significance. According to the brightness, we divided the remaining parts into blue and brown, indicating weaker predictive values. This shows that CNNs focused on the most predictive image features regarding treatment response.

Fig. 5 Visualization of the InceptionV4 model. MRI images and heat maps of two HCC patients treated with HAIC. A, B Images in a 65-year-old man with best tumor response of CR, time to PD of 8 months. A Arterial phase image shows that the HCC is not close to the hepatic hilum. B Delayed phase image shows no portal vein tumor thrombosis or hepatic bile duct dilation. C, D Images in an 81-year-old man with best tumor response of PD, time to PD of 1 month. C Arterial phase image shows that the HCC is close to the hepatic hilum. D Delayed phase image shows portal vein tumor thrombosis and hepatic bile duct dilation

Prognosis Analysis Based on Treatment Procedure, Pre-HAIC Treatment, BCLC Stage, AFP, PIVKA-II, AST, and γ-GT

Among the 191 enrolled patients, 67 patients received HAIC alone, and 124 patients received HAIC plus targeted therapy and/or immunotherapy. The related baseline demographics and clinical characteristics in the two groups are described in Supplementary Table 3. There were no significant differences between the two groups. In addition, we assigned patients to several subgroups based on pre-HAIC treatment status, BCLC stage, AFP, PIVKA-II, AST, and γ-GT. The OS and PFS rates in the different subgroups were determined via Kaplan‒Meier analyses with the log-rank test, as shown in Supplementary Fig. 1. Patients who had an AST ≤ 40 had longer OS than individuals who had an AST > 40 (hazard ratio (HR) = 1.649 [95% CI, 1.017–2.674]; P = 0.0194). Patients who had γ-GT ≤ 60 had longer OS than individuals who had γ-GT > 60 (HR = 4.156 [95% CI, 1.865–9.262]; P = 0.0190). In addition, patients with AFP ≤ 400 or PIVKA-II ≤ 1000 had a higher ORR than did the other patients, patients with AST ≤ 40 had a higher DCR than did those with AST > 40, and patients with γ-GT ≤ 60 had a higher ORR and DCR than did those with γ-GT > 60.

Discussion

In this study, we developed and preliminarily validated a CNN model to predict the response to HAIC in patients with unresectable HCC based on radiologic and clinical factors. The CNN model achieved an outstanding predictive performance compared to the clinical model and our radiologists, suggesting that this CNN model may provide a noninvasive and effective method for HAIC response prediction. To the best of our knowledge, this is the first study to apply the deep learning of CNNs in treatment response prediction for unresectable HCC patients who undergo HAIC.

HAIC is a widely used locoregional therapy for unresectable HCC, especially in Asia, in which chemotherapeutic agents are directly delivered into targeted vessels at increased local concentrations [32]. However, due to the tremendous variations in tumor burden, liver function, and chemosensitivity, the treatment of HCC patients after HAIC is highly heterogeneous [33]. Therefore, a practical prognostic model based on objective measures is urgently needed to facilitate clinical decision-making.

Previous studies have shown that radiomics provides additional value to clinical factors in predicting the treatment response of HCC patients. For example, Park et al. reported that radiologic information obtained from pretreatment CT was thought to be an important factor for predicting the therapeutic response of patients with HCC after TACE, as this information could prevent unnecessary treatment [34]. Liu et al. suggested that pretreatment MRI-based radiomics could predict the prognosis of HCC patients treated with TACE combined with MWA [35]. Miyaki et al. reported that the combination of imaging assessment and tumor marker data could be useful for predicting the early response to HAIC and the prognosis of patients with advanced HCC [36]. Therefore, our study analyzed radiomic and clinical features to predict treatment response after the HAIC procedure.

In our study, the recruited patients had a median PFS of 3.0 (2.0–5.0) months, a median OS of 4.0 (2.3–7.7) months, an ORR of 18.9%, and a DCR of 86.6% in the training cohort; additionally, they had a median PFS of 3.6 (2.0–7.1) months, a median OS of 5.5 (2.9–10.2) months, an ORR of 18.8%, and a DCR of 79.7% in the validation cohort, values that were notably lower than those reported in previous studies [18, 23, 37]. This inconsistency may be attributed to the poorer baseline characteristics and more advanced stage of the recruited patients, which included tumor thrombus and extrahepatic metastasis. Moreover, more than half of the patients in our study had already received various therapies prior to HAIC, such as traditional TACE procedures. These therapies may increase the risk of decompensated liver function and damage the tumor-feeding artery, which impairs the delivery and perfusion of arterial infusion chemotherapy. Notably, since SD is also a result of effective treatment in our study, it is more accurate to consider patients who achieved CR, PR, or SD as having a positive treatment response.

Complex models, such as CNNs, can be affected by the problem of overfitting, which results in poor performance for data not included in the training process. Therefore, we divided the whole dataset into a training group and a validation group. Then, we used MRI images based on several constant sequences (enhanced T1-weighted sequences [including arterial, portal, and delayed phases]; T2 FSE sequence; and dual-echo sequence) to construct three representative CNN models, AlexNet, ResNet, and InceptionV4, in the training group. We selected the most appropriate model by CEL. The best-performing model yielded satisfactory predictions in the validation cohort, with a CEL of 0.7475 obtained by InceptionV4 at the 150th epoch. Finally, we combined the image features with clinical features as the total features of each patient in the SVM classifier because SVM has superior performance in classifying gene features and low sample size data [38]. Our model yielded satisfactory performance in the validation cohort, with an AUC of 0.826, an SEN of 0.769, an SPE of 0.882, a PPV of 0.625, and an NPV of 0.937.

Furthermore, we identified variables related to treatment response by logistic regression analysis in the training cohort. Age, number of HAIC cycles, tumor thrombus, extrahepatic metastasis, and AST (continuous variable) were found to be independent predictors. Our model had a greater AUC (AUC = 0.826) than did the clinical model (including the above five significant clinical features) and two experienced radiologists in the validation cohort. Additionally, it had optimal accuracy (SEN, SPE, PPV, NPV). Therefore, we believe that a combined model could be a more powerful predictor.

Recently, great progress has been made in deep learning, allowing machines to discover intricate structures in large datasets and adjust their internal parameters [29]. CNNs have been applied to establish a direct connection between disease prediction and diagnostic images [39]. Conventional machine learning requires features to be extracted from images before learning, while CNNs allow the image itself to be used during the training process [29]. Therefore, CNNs enable all the information contained in images to be used, whereas conventional machine learning is limited by the feature parameters chosen in advance. However, deep learning models are black boxes, and future studies are needed to compare the efficiency of automated AI and handcrafted feature extraction methods.

Notably, in our study, there were no significant differences between the HAIC group and the HAIC combined with molecular targeted therapy/immunotherapy group regarding the best tumor response, OS, or PFS. This is likely due to the increased risk of liver dysfunction caused by aggressive HAIC combination treatment, which might compromise the potential survival benefits of initial treatment. According to previous studies, the survival benefit of HAIC combined with sorafenib in patients with unresectable HCC is controversial [19, 40]. Kudo et al. compared sorafenib with or without low-dose cisplatin and fluorouracil (HAIC) in a randomized phase III trial of advanced HCC patients [40]. The median OS was similar between patients treated with sorafenib plus HAIC and patients treated with sorafenib monotherapy (HR = 1.009 [0.743–1.371], P = 0.955). Thus, the optimization of HAIC combination treatment warrants future clinical studies in unresectable HCC patients. Our subgroup analyses suggested that low AST and γ-GT levels correlated with better OS outcomes. Low AST and γ-GT levels are correlated with better liver functional reserve.

Our study has several limitations. First, this was a retrospective study involving a single center with limited data, and selection bias was unavoidable. Therefore, external validation in larger prospective studies is needed before the findings can be applied to clinical use. Second, most patients enrolled in our study had hepatitis B virus–related HCC, whereas different ethnic backgrounds and the risk factors for HCC that are common in Western countries, such as alcoholic liver disease and HCV infection, were not specifically evaluated. Third, all the images were obtained from the same scanner, and our results require further validation with other MRI vendors to determine the accuracy of the findings. Nevertheless, this is a pilot study in which deep learning was applied for treatment response prediction in patients with unresectable HCC after HAIC, resulting in a more accurate and reproducible prognostic evaluation than qualitative reasoning.

In conclusion, we show that a CNN-based model can predict, with good performance, the treatment response in patients with unresectable HCC who undergo HAIC. The model may be an effective strategy to help physicians identify appropriate candidates for HAIC among unresectable HCC patients. With further validation, our CNN-based model has great potential to serve as a practical decision support tool in clinical applications.