Introduction

With mortality rates as high as 8.3%, liver cancer is the fifth most common malignant tumor and the third leading cause of tumor-related deaths worldwide, posing a serious threat to public health [1]. Surgical resection and liver transplantation are the primary curative treatment options for hepatocellular carcinoma (HCC). However, most patients are diagnosed in the intermediate to advanced stages, and only approximately 5%–15% of patients with HCC have the opportunity to undergo radical surgery [2]. According to the Barcelona Clinic Liver Cancer (BCLC) staging criteria [3], transarterial chemoembolization (TACE) is the first-line treatment option for patients with intermediate to advanced HCC. Drug-eluting beads transarterial chemoembolization (D-TACE) is an embolization therapy that uses microspheres loaded with chemotherapeutic agents; these drug-eluting microspheres offer a long duration of action with low systemic toxicity and few side effects. D-TACE is now widely used in clinical practice.

Studies have reported that D-TACE has advantages over conventional transarterial chemoembolization (c-TACE) in some HCC groups [4,5,6], such as patients with cirrhosis [6]. Moreover, D-TACE causes less postoperative pain than c-TACE [7]. However, D-TACE costs more than c-TACE, so a noninvasive method of predicting its efficacy before treatment could reduce the financial burden on patients to some extent. In addition, some studies have shown that patients who achieve complete response (CR) after initial TACE treatment have a favorable long-term prognosis [8,9,10]. Effective monitoring of the short-term efficacy of TACE (especially after the initial treatment) therefore helps physicians adjust the therapeutic regimen early, which may prolong survival. Among imaging examinations, magnetic resonance imaging (MRI) has high research value owing to its multi-parameter, multi-directional, and multi-functional imaging capabilities. In previous studies, some imaging signs, such as the maximum tumor diameter or the degree of enhancement in the arterial phase, were associated with the biological behavior of HCC [9]. Despite attempts by radiologists to standardize the evaluation of macroscopic tumor biological behaviors, the use of such qualitative imaging features in tumor characterization or efficacy assessment remains subjective and variable, even among senior physicians [11]. Recently, MRI-based radiomics has shown improved predictive performance [12,13,14]. Therefore, we conducted this study to predict the short-term efficacy of initial D-TACE treatment in patients with HCC by constructing a preoperative multiparametric MRI radiomics model, and to further explore the relationship between the short-term efficacy of initial D-TACE treatment and long-term prognosis (progression-free survival, PFS).

Materials and methods

Research population

This retrospective analysis was approved by the Ethical Review Board of our hospital (IRB-2021–414), and the requirement for informed consent was waived. Previous studies on D-TACE prognosis-related radiomics included approximately 50–100 patients [15]. Accordingly, a total of 113 patients with HCC confirmed by pathology or imaging examinations who received their initial D-TACE treatment between October 2018 and May 2022 at our hospital were included after the selection process.

The exclusion criteria were as follows: (1) preoperative history of targeted immunotherapy, (2) postoperative loss to follow-up, (3) lack of preoperative enhanced MRI or poor image quality, and (4) presence of diffuse HCC. The patient recruitment process is illustrated in Fig. 1. Finally, the 113 patients were randomly assigned to the training (n = 79) and testing (n = 34) groups.
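The random assignment described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_patients`, the seed value, and the use of sequential patient IDs are all assumptions, and the text does not say whether the split was stratified by response.

```python
import random


def split_patients(patient_ids, n_train, seed=0):
    """Shuffle patient IDs reproducibly and return (training, testing) lists.

    The seed is an arbitrary illustrative value (hypothetical); the paper
    does not report how the random assignment was generated.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# 113 enrolled patients, 79 assigned to training and the rest to testing.
train_ids, test_ids = split_patients(range(1, 114), n_train=79)
```

Fixing the seed makes the assignment repeatable, which matters because every later step (feature selection, model fitting) depends on which patients fall in the training group.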

Fig. 1 Flowchart of study enrollment. OR objective response; NR non-objective response

Evaluation of efficacy

The short-term efficacy and long-term prognosis (PFS) of patients after the initial D-TACE treatment were evaluated according to the modified Response Evaluation Criteria in Solid Tumors (mRECIST). Patients were classified into two groups according to their response to the initial D-TACE treatment. The objective response (OR) group included patients who achieved CR or partial response (PR), whereas the non-objective response (NR) group included patients with progressive disease (PD) or stable disease (SD). Efficacy evaluations were performed by three radiologists: two radiologists with more than 8 years of experience completed the evaluations independently, and a third with more than 15 years of experience made the final evaluation when their opinions conflicted. Figure 2 shows two examples of efficacy evaluations (OR vs. NR group).
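The grouping and adjudication rules above can be written down compactly. The sketch below is illustrative only; the function names `adjudicate` and `response_group` are hypothetical and do not come from the study.

```python
OBJECTIVE_RESPONSE = {"CR", "PR"}      # complete or partial response
NON_OBJECTIVE_RESPONSE = {"SD", "PD"}  # stable or progressive disease


def adjudicate(reader_a, reader_b, senior=None):
    """Return the agreed mRECIST category, deferring to the senior reader on conflict."""
    if reader_a == reader_b:
        return reader_a
    if senior is None:
        raise ValueError("readers disagree; a senior reading is required")
    return senior


def response_group(category):
    """Map an mRECIST category (CR, PR, SD, PD) to the OR or NR group."""
    code = category.strip().upper()
    if code in OBJECTIVE_RESPONSE:
        return "OR"
    if code in NON_OBJECTIVE_RESPONSE:
        return "NR"
    raise ValueError(f"unknown mRECIST category: {category!r}")
```

For example, two concordant CR readings map to the OR group, whereas a PR versus SD disagreement resolved by the senior reader as SD maps to the NR group.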

Fig. 2 Representative pre- and post-treatment CE-MRI images for the OR and NR groups of patients with HCC, evaluated according to the mRECIST. a A 70-year-old male HCC patient with a maximum tumor diameter of 10.2 cm underwent CE-MRI 2 days before D-TACE and CE-MRI review 35 days after treatment. AP, DWI, and T2WI images were obtained, and the results indicated CR according to the mRECIST, with no clearly enhancing residual lesions. b An 80-year-old female HCC patient with a maximum tumor diameter of 11.7 cm underwent CE-MRI 1 day before D-TACE and CE-MRI review 37 days after treatment. The results indicated SD; the maximum diameter of the residual tumor lesion was approximately 10.6 cm, and the necrosis rate was approximately 9%. CE-MRI contrast-enhanced magnetic resonance imaging; OR objective response; NR non-objective response; AP arterial phase; DWI diffusion-weighted imaging; T2WI T2-weighted imaging

MRI protocol

All patients underwent CE-MRI of the upper abdomen within 1 month before the procedure and MRI of the abdomen 4–6 weeks after the procedure. Before the scan, they received uniform breathing and breath-holding training. CE-MRI was performed in our department using two scanners. DWI was performed using an axial single-shot echo-planar imaging sequence with b-values of 50 and 800 s/mm². Detailed scanning parameters are described in the Supplementary Data, Table S1.

D-TACE procedure

The embolic agents were CalliSpheres drug-loaded microspheres (Jiangsu Suzhou CalliSpheres Biomedical Technology Co., Ltd., China) or DC Beads drug-loaded microspheres (Boston Scientific International Medical Trading Co., Ltd., USA). The microsphere diameter was 70–150, 100–300, or 300–500 μm and was selected according to the tumor size and blood supply. After sedimentation and discharge of the supernatant, an appropriate amount of the chemotherapeutic drug was loaded and diluted with a contrast agent (volume ratio, 1:1). Under fluoroscopy, the embolic agent was slowly injected into the tumor-feeding artery through the microcatheter. The endpoint of embolization was complete stagnation of blood flow in the tumor-feeding vessels or contrast agent retention.

Collection of clinical-imaging features and follow-up

Clinical-imaging characteristics were collected by two radiologists with more than 8 years of experience. The collected clinical information comprised three aspects: patients' baseline data, including demographic information and laboratory indicators; surgical information; and imaging features. The enrolled patients were followed up to assess PFS, mainly by clinical and imaging evaluations at patient review every 1–3 months. PFS was calculated from the time of the initial D-TACE procedure to the time of progression or, if no disease progression was observed, to the time of the last follow-up.
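The PFS definition above implies that patients without observed progression are censored at their last follow-up. A minimal sketch of that calculation follows; the function name `pfs_months` and the conversion factor of 30.44 days per month are assumptions, as the text does not state how days were converted to months.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # average month length; an assumed conversion factor


def pfs_months(treatment_date, progression_date, last_follow_up):
    """Return (months, event) for one patient.

    event is True when progression was observed and False when the
    patient is censored at the last follow-up.
    """
    event = progression_date is not None
    end_date = progression_date if event else last_follow_up
    months = (end_date - treatment_date).days / DAYS_PER_MONTH
    return months, event


# Hypothetical patient with no progression by the last follow-up (censored).
months, event = pfs_months(date(2020, 1, 1), None, date(2020, 7, 1))
```

Keeping the event flag alongside the duration is what allows the censored observations to be handled correctly in the later Kaplan–Meier analysis.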

Radiomics feature extraction and analysis

CE-MRI (AP, DWI, and T2WI) images were exported in Digital Imaging and Communications in Medicine (DICOM) format. We then used open-source software (3D Slicer version 5.1.0, https://www.slicer.org/) for semi-automatic outlining. Two radiologists from the abdominal subspecialty group outlined the regions of interest (ROIs). The three-dimensional ROI included the pseudocapsule at the outer edge of the tumor while avoiding extra-tumoral vessels as much as possible. Intra-observer agreement was assessed using the intraclass correlation coefficient (ICC). To calculate the intra-observer ICC, MRI images from the three sequences of 20 randomly selected patients were outlined twice by the same reader at a 2-month interval, and the ICC was calculated from the features extracted from the two sets of ROIs. A reader was considered to have good consistency when the ICC was > 0.75, and the remaining ROIs were then completed by this reader.
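As an illustration of the agreement analysis, the sketch below computes an ICC for one feature measured in two outlining sessions. The text does not specify which ICC form was used, so the choice of ICC(3,1) (two-way mixed effects, consistency, single measurement) is an assumption, as are the function name `icc_consistency` and the example values.

```python
ICC_THRESHOLD = 0.75  # consistency cut-off stated in the text


def icc_consistency(ratings):
    """ICC(3,1) for a table with one row per patient and one column per session.

    Assumes at least two patients, two sessions, and non-constant data.
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of outlining sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                # between-patient mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical values of one feature from two sessions in four patients.
feature_sessions = [[10.2, 10.0], [11.7, 11.9], [8.4, 8.5], [9.9, 9.7]]
is_consistent = icc_consistency(feature_sessions) > ICC_THRESHOLD
```

In practice this would be repeated for every extracted feature, which is why the ICC is computed after feature extraction rather than on the ROI masks themselves.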

Image preprocessing and feature extraction were performed using pyradiomics (version 3.1.0). Two preprocessing steps were applied [16, 17]: images were resampled to a voxel size of 1 × 1 × 1 mm using linear interpolation before feature extraction, and the extracted radiomics feature values were normalized using the z-score method. In total, 1781 radiomics features were extracted from each sequence (detailed data are available in the Supplementary Data, Table S2), giving 5343 features from the three sequences. The workflow of the radiomics analysis is shown in Fig. 3.
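The z-score step and the feature count can be illustrated as follows. This is a sketch under stated assumptions: the text does not say whether the mean and standard deviation were estimated on the training set only, which is assumed here because it avoids leaking testing-set information; the function names are hypothetical.

```python
from statistics import mean, stdev

SEQUENCES = ("AP", "DWI", "T2WI")
FEATURES_PER_SEQUENCE = 1781
TOTAL_FEATURES = FEATURES_PER_SEQUENCE * len(SEQUENCES)  # 3 x 1781 = 5343


def fit_zscore(train_values):
    """Estimate the mean and sample standard deviation of one feature."""
    return mean(train_values), stdev(train_values)


def apply_zscore(values, mu, sigma):
    """Standardize values with previously fitted parameters."""
    if sigma == 0:
        # A constant feature carries no information; map it to zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values: fit on training patients, apply to both sets.
mu, sigma = fit_zscore([1.0, 2.0, 3.0])
train_scaled = apply_zscore([1.0, 2.0, 3.0], mu, sigma)
test_scaled = apply_zscore([2.5], mu, sigma)
```

After this step every feature has zero mean and unit variance in the training set, so the LASSO penalty treats features on a comparable scale.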

Fig. 3 Analytical workflow of radiomics in our study. a Tumors were outlined semi-automatically along the entire tumor contour in the AP, DWI, and T2WI sequences, and three-dimensional segmentations were created. b Two feature selection methods, mRMR and LASSO, were used to select the features. Correlation coefficients were also calculated. c Radiomics models were built using logistic regression, and a nomogram was established and evaluated by calibration curve and decision curve analysis. AP arterial phase; DWI diffusion-weighted imaging; T2WI T2-weighted imaging; mRMR minimum redundancy maximum relevance; LASSO least absolute shrinkage and selection operator

Feature selection and radiomics model construction

We used two methods for feature selection and dimensionality reduction. First, minimum redundancy maximum relevance (mRMR) was used to eliminate redundant and irrelevant features. Then, the least absolute shrinkage and selection operator (LASSO) was applied to choose the optimal subset of features for the final model. LASSO adds an L1 regularization term to the ordinary least squares objective; minimizing this penalized objective shrinks some coefficients exactly to zero, which yields a sparse model and thereby performs feature selection. The regularization parameter λ controls the strength of the penalty and hence the number of features retained. Subsequently, a radiomics model was constructed from the features selected for each sequence using multivariate logistic regression analysis. Rad-scores were calculated as a linear combination of the selected radiomic features weighted by their respective coefficients.
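For reference, the LASSO objective and the Rad-score described above can be written out as follows. This is a sketch in standard notation rather than the authors' exact formulation: the symbols (n patients, p candidate features, response y_i, standardized feature x_ij), the 1/(2n) scaling, and the inclusion of an intercept in the Rad-score are assumptions; the formula actually used is the one in the Supplementary Data, Table S3.

```latex
% LASSO: least squares loss plus an L1 penalty weighted by \lambda
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
  \left\{ \frac{1}{2n} \sum_{i=1}^{n}
    \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}

% Rad-score: linear combination over the features with non-zero coefficients
\text{Rad-score}_i = \hat{\beta}_0 + \sum_{j \in S} \hat{\beta}_j\, x_{ij},
  \qquad S = \{\, j : \hat{\beta}_j \neq 0 \,\}
```

A larger λ enlarges the penalty and shrinks more coefficients to exactly zero, so the size of S falls as λ grows; cross-validation is the usual way to pick the value that balances sparsity against fit.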

Clinical-radiomics modeling and evaluation

We further introduced clinical factors that were highly correlated with D-TACE efficacy into the prediction model to improve on the predictive performance of the Rad-scores alone. First, clinical variables with p values < 0.1 in the univariate analysis and the selected radiomics features were subjected to multivariate logistic regression analysis to identify the independent risk factors affecting short-term efficacy (p value < 0.05). These independent risk factors were then entered into a multivariate logistic regression to establish a clinical-radiomics model, from which a clinical-radiomics nomogram was created. Receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC) was used to assess the predictive performance of the model. DeLong's test was applied to compare the AUCs between models. In addition, we used calibration and decision curves to assess the calibration and clinical utility of the model. Finally, the formulas constructed in the training set were applied to the testing set, where the AUCs were calculated and calibration curves were plotted to validate the model.
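The AUC used above has a simple rank interpretation: it is the probability that a randomly chosen OR patient receives a higher predicted score than a randomly chosen NR patient, with ties counted as one half. The sketch below computes it directly from that definition; it is an illustration with a hypothetical function name and made-up scores, not the R routine used in the study.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for objective response, 0 for non-objective response.
    scores: predicted probabilities or Rad-scores (higher means more likely OR).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for two OR and two NR patients.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

Because the value depends only on the ordering of the scores, the AUC measures discrimination and says nothing about calibration, which is why calibration curves are assessed separately.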

Statistical analysis

The two primary endpoints of interest in this study were the actual short-term efficacy of the initial D-TACE treatment and PFS. For the former, the clinical-radiomics model was developed and assessed using R software (version 4.1.3, R Foundation for Statistical Computing, Vienna, Austria). Descriptive statistics and assessment of PFS for the different short-term efficacies (actual and predicted) were performed using SPSS software (version 26.0, IBM Corp., Armonk, NY, USA). Normally distributed continuous variables are presented as the mean ± standard deviation (mean ± SD), and those that are not normally distributed are presented as the median (P25, P75). Qualitative data are presented as numbers (n) and percentages (%). The t-test or Mann–Whitney U test was used to compare quantitative data. Categorical variables were analyzed using chi-square or Fisher's exact tests. A rank-sum test was used to analyze ranked data. Risk factors with p values < 0.1 in the univariate logistic regression analysis were included in the multivariate logistic regression model. Survival curves were constructed using the Kaplan–Meier method, and group comparisons were made using log-rank tests. The performance of each model was assessed by its AUC and 95% confidence interval (CI). We compared the AUCs of different models using DeLong's test. p values < 0.05 were considered statistically significant.
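To make the Kaplan–Meier method mentioned above concrete, the sketch below implements the product-limit estimator and reads off the median time. The study itself used SPSS; the function names and the example data here are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed event.

    events[i] is True for progression and False for a censored observation.
    Patients censored at time t are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


def median_time(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up


# Hypothetical PFS data in months; False marks a censored patient.
curve = kaplan_meier([2, 4, 4, 6, 8], [True, True, False, True, False])
median_pfs = median_time(curve)
```

Censored patients lower the number at risk without producing a step in the curve, which is how the estimator uses partial follow-up without treating it as progression.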

Results

Clinical-imaging characteristics and follow-up results

Table 1 summarizes the clinical-imaging characteristics of the 113 patients. A total of 74 patients were categorized into the OR group and 34 patients into the NR group. The results of multivariate logistic regression showed that microsphere type was a valid predictor (p = 0.042, odds ratio = 0.224, 95% CI 0.053–0.948).

Table 1 Baseline clinical characteristics of patients with HCC in the training and testing datasets

More than half of the patients in the training (71%) and testing (53%) sets had OR in the efficacy evaluation of the initial D-TACE treatment. The median time from the initial D-TACE procedure to the final follow-up was approximately 9.5 (95% CI 6.6–12.4) months. The Kaplan–Meier curves of PFS by actual and predicted short-term efficacy for the two groups are shown in Fig. 4a, b. The median PFS times in the actual OR and NR groups were 8.5 (95% CI 4.8–12.2) and 4.5 (95% CI 0.0–9.0) months, respectively, and the difference was statistically significant (p = 0.027). A statistically significant difference in median PFS was also observed between the patients predicted as OR and those predicted as NR by the best model (p = 0.011). Patients were further divided into four groups according to their mRECIST response, and Kaplan–Meier curves were plotted for each group (Fig. 4c). The median PFS times were 13.1 (95% CI 6.6–19.6), 8.0 (95% CI 6.7–9.3), 11 (95% CI 3.8–18.2), and 1.5 (95% CI 1.1–1.9) months in the CR, PR, SD, and PD groups, respectively, with a statistically significant difference (p < 0.001). This finding indicates that achieving OR rather than NR after the initial D-TACE treatment was associated with a better long-term prognosis, with the longest PFS observed in the CR group.

Fig. 4 Kaplan–Meier curve for HCC survival. a Actual response status and b predicted response status by the combined model of two groups. c Actual response status of four groups

Feature selection and radiomics modeling processes

After ICC analysis, no statistically significant differences were observed between the two outlining results for observer A (p = 0.85); therefore, the final outlining assignment was completed by observer A. In total, 5343 features (1781 features per sequence) were extracted from the AP, DWI, and T2WI sequences. Redundant features were eliminated using the mRMR method, with 20 features retained per sequence. Subsequently, 10 AP, 7 DWI, 9 T2WI, and 8 triple-sequence radiomics features were selected by LASSO with five-fold cross-validation, for a total of 34 radiomics features (detailed information is available in the Supplementary Data, Fig. S1). The Rad-score was calculated for each patient, and the formula is described in the Supplementary Data, Table S3.

Clinical-radiomics model construction and evaluation

Univariate analysis showed that three clinical (microsphere size, microsphere type, and drug load type) and two imaging (maximum tumor diameter and tumor-to-liver volume ratio) characteristics differed significantly between groups (Supplementary Data, Table S4). Multivariate analysis revealed that microsphere type (odds ratio = 0.224, 95% CI 0.053–0.948, p = 0.04) was an independent risk factor for predicting efficacy. Therefore, we used microsphere type to construct the clinical-radiomics models.
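The reported odds ratio, confidence interval, and p value for microsphere type can be cross-checked against one another. The sketch below assumes that the interval is a Wald 95% CI on the log-odds scale, which the text does not state; the function name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the coefficient, standard error, z statistic, and p value.

    Assumes a symmetric Wald interval on the log-odds scale (an assumption).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return beta, se, z, p_value


# Values reported for microsphere type in the multivariate analysis.
beta, se, z, p_value = wald_from_ci(0.224, 0.053, 0.948)
```

Under this assumption the recovered p value is approximately 0.042, which agrees with the reported result and with the upper confidence limit lying just below 1.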

We built a total of six predictive models: four radiomics models, a clinical-imaging model, and a clinical-radiomics model. Among the radiomics models, the AP model performed best, with AUCs of 0.89 (95% CI 0.81–0.96) and 0.85 (95% CI 0.69–1.00) in the training and testing sets, respectively (Table 2). The ROC curves for the radiomics models are plotted in Fig. S2 (Supplementary Data). The model combining the clinical factor (microsphere type) with the AP radiomics model performed the best among all models, with AUCs of 0.902 (95% CI 0.832–0.959) and 0.845 (95% CI 0.708–0.944) in the training and testing sets, respectively. Moreover, adding the Rad-score to the clinical-imaging model significantly improved predictive performance in the training set (AUC increased from 0.717 to 0.902; p = 0.004), suggesting that the combined model offers the greatest predictive value for efficacy. DeLong tests (Supplementary Data, Table S5) comparing the clinical-imaging model with the combined model built on each of the four radiomics models showed that the combined model performed significantly better in every case (AP, p = 0.002; DWI, p < 0.001; T2WI, p = 0.005; triple-sequence, p = 0.002). However, DeLong tests comparing each radiomics model with the corresponding combined model did not show a statistically significant difference in predictive performance (AP, p = 0.32; DWI, p = 0.84; T2WI, p = 0.25; triple-sequence, p = 0.58). Consistent with this, the ROC curves of the radiomics models based on a single sequence or on three sequences largely overlapped with those of the combined model, and both showed much better predictive performance than the clinical-imaging model alone (Fig. 5).

Table 2 The performance of different models in the training and test sets for treatment response prediction

Fig. 5 ROC curves of the combined model based on different sequences in the training set. a AP. b DWI. c T2WI. d Triple-sequence. AP arterial phase; DWI diffusion-weighted imaging; T2WI T2-weighted imaging

Establishment and evaluation of nomogram

The nomogram was constructed based on the aforementioned predictive model, which effectively combined Rad-scores and the selected clinical predictors (Fig. 6a). The calibration curves showed good calibration in the training set and slightly worse calibration in the testing set (Fig. 6b, c). The Hosmer–Lemeshow test showed no statistically significant difference in either set (p = 0.235 > 0.05), indicating good agreement between the nomogram predictions and actual observations.

Fig. 6 Nomogram and calibration curves of proposed model. a Nomogram of combined model integrating Rad-score and significant clinical variables. b Calibration of the combined model in the training set. c Calibration of the combined model in the testing set

Clinical-radiomics model and nomogram for clinical application

Decision curve analysis was used to compare the clinical utility of the radiomics model, the clinical-imaging model, and the combined model. Adding the radiomics model clearly increased the net clinical benefit, and the combined model achieved the greatest clinical utility among all models.
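The quantity plotted on a decision curve is the net benefit at each threshold probability. The sketch below uses the standard definition (true positives minus false positives weighted by the threshold odds, per patient); the function name and example values are hypothetical, and the study's own curves were produced in R.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold.

    labels: 1 for objective response, 0 otherwise.
    threshold: threshold probability, strictly between 0 and 1.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for y, p in zip(labels, probabilities)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(labels, probabilities)
                    if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Hypothetical predictions for four patients at a 50% threshold.
example_nb = net_benefit([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2], threshold=0.5)
```

Evaluating this function over a range of thresholds for each model, and for the "treat all" and "treat none" strategies, gives the curves that are compared in a decision curve analysis.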

Discussion

TACE plays an irreplaceable role in conversion therapy and is a nonsurgical alternative for patients with intermediate to advanced HCC. However, owing to the high heterogeneity of HCC, the clinical response to TACE varies widely among patients. In addition, D-TACE is more expensive than c-TACE. Therefore, if the short-term efficacy of D-TACE can be predicted preoperatively, the treatment plan can be adjusted according to the patient's specific condition. This can simultaneously improve patient prognosis, reduce the economic burden, and avoid wasting medical resources.

D-TACE is an interventional approach that differs from c-TACE. To the best of our knowledge, most studies have applied radiomics to predict the efficacy of c-TACE [13, 19, 20] or of a combination of the two TACE approaches [12, 21]; few have applied radiomics to predict the efficacy of D-TACE alone, and even fewer have been based on CE-MRI sequences. Tipaldi et al. [15] predicted the short-term efficacy of D-TACE based on preoperative CT texture features in 50 patients with HCC, with an optimal texture feature AUC of 0.733. In our study, the combined clinical-radiomics model performed the best, which is in agreement with the results of a previous study [19]. The predictive performance reported by Tipaldi et al. was lower than that in our study, possibly for the following reasons. First, the features extracted in our study included first-, second-, and higher-order features and were therefore more varied and comprehensive. Second, MRI provides better soft-tissue contrast than CT and thus shows finer details of the lesions. A previous study demonstrated that MRI outperformed CT for predicting microvascular invasion in patients with HCC with a maximal diameter of 2–5 cm [22]; however, radiomics studies comparing CT and MRI for predicting the efficacy of D-TACE are lacking.

We used two different MRI scanners in our study, and we consider the radiomics results unlikely to have been materially affected, for two reasons. First, the MRI data were subjected to image normalization, gray-level standardization, and image alignment before modeling. Second, both scanners had the same field strength (3.0 T), which controlled the differences between acquisition parameters to some extent. This setting also better approximates clinical practice, as it is impossible to guarantee that all patients will be imaged using the same scanner.

In our study, a model based on three preoperative CE-MRI sequences and clinical-imaging features was developed and validated to predict the short-term efficacy of D-TACE. The model successfully combined the radiomics Rad-scores with clinical-imaging features to predict short-term efficacy, with patients categorized into OR and NR groups. Our study cohort consisted of 113 patients with HCC, and 79 (70%) patients actually achieved OR, suggesting that the majority of patients with HCC had favorable D-TACE efficacy. Most of the study participants were men with hepatitis B and BCLC stage B or C disease, which is similar to previous studies [23] and in line with the epidemiological characteristics of HCC [24], clinical practice, and BCLC stage-based therapeutic strategies [3].

Our study included two endpoints. First, to evaluate the short-term efficacy of the initial D-TACE treatment in patients with HCC (mainly in the intermediate to advanced stages), four radiomics models were established using three CE-MRI sequences and were also combined with the relevant clinical-imaging factors. In total, six models were constructed. Among these models, clinical-imaging features combined with the AP radiomics model demonstrated the best predictive efficacy. This may be because the AP-based radiomics model performed the best among the radiomics models, showing better accuracy, sensitivity, and specificity. The addition of other sequences in the combined model did not significantly improve the predictive performance (p = 0.435). We hypothesized that the AP is more indicative of lesion heterogeneity. Tumor heterogeneity can be generated by variations in cell density, neovascularization, and the extracellular matrix [25], which may explain why the AP model was the best performer among the four radiomics models in our study. Notably, the radiomics model combining the three sequences did not achieve the best AUC, with values of 0.87 and 0.84 for the training and testing sets, respectively. This is similar to the findings of Liu et al. [14], who reported the highest predictive efficacy for the augmented delayed-phase radiomics model (AUC = 0.907), which was superior to that of the multi-parameter radiomics model consisting of four sequences (T2WI, AP, portal phase, and delayed phase; AUC = 0.875). In the present study, this may be related to multiparametric feature redundancy and different image resolutions (the slice thickness in the AP differed from that in T2WI and DWI).

The radiomics features in our predictive model included seven texture-based features and three intensity-based (first-order) features; shape-based features were not incorporated (the features and their definitions are shown in the Supplementary Data, Table S2 and Fig. S1). Kurtosis, 10th percentile, and maximum are intensity-based features that describe the histogram properties of tumor intensity. Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), and Gray Level Dependence Matrix (GLDM) features are all texture-based features that capture the spatial relationships between neighboring pixels [26], information that shape- and intensity-based features do not provide. Moreover, texture-based features can reflect the heterogeneity of tumors [26, 27]. This makes them valuable for identifying differences in the efficacy of D-TACE between individuals with different HCCs.

These results suggest that the homogeneity of tumor intensity and the distribution of enhancement within the tumor may help clinicians predict, to a certain extent, the efficacy of D-TACE in patients with HCC. Radiomics also has the advantage of capturing subtle biological features of tumors that the radiologist's naked eye cannot distinguish.

In addition, shape-based features did not show strong predictive efficacy in our study, which is consistent with the absence of morphological information such as tumor volume in the clinical-imaging portion of the model. This might be explained by the fact that our study population was dominated by patients with massive HCC, with little variability in morphology between individual tumors, so that no significant correlation was found between morphological information and D-TACE efficacy.

Among the clinical-imaging characteristics, only microsphere type was used to construct the combined model. The effects of microsphere size and drug-carrying concentration on D-TACE efficacy have been reported in clinical studies [28,29,30], whereas microsphere and drug-carrying types have been reported less frequently. Regarding drug-carrying types, drugs differ in cytotoxic potency. Idarubicin, for example, was the most cytotoxic of 11 anticancer drugs tested against three HCC cell lines [31]. In addition, release rates vary among drugs, which affects the degree of drug penetration into the tumor tissue and its distribution in the liver tissue [32]. Few studies have explored the effects of microsphere types on D-TACE efficacy. From the perspective of microsphere construction, microspheres with different chemical structures have different drug-carrying capacities and drug release rates [33]. In addition, the mechanical properties of the microspheres (such as compressibility and deformability) may affect their distribution density in the arteries, embolization effect, and suitability for catheterization [34]. In our study, microsphere type proved to be a clinical factor associated with the short-term efficacy of D-TACE, whereas the drug-carrying type and microsphere size were excluded by the multivariate logistic regression. We hypothesized that this may be due to the wide variety of microsphere sizes and drug-carrying types used in our relatively small cohort, which resulted in unbalanced data.

Our clinical-imaging model showed low predictive performance in the training and testing sets, possibly because clinical factors and macroscopic imaging features cannot reflect tumor heterogeneity as well as the combined clinical-radiomics model. When the Rad-score was added, the resulting model outperformed the clinical-imaging model alone. We hypothesized that this phenomenon is related to the sample size [35]. In our study, AP-based radiomics combined with microsphere type formed the clinical-radiomics model with the best predictive performance. On the one hand, as mentioned above, radiomics can provide high-dimensional biological information about tumor heterogeneity, which is intrinsic information that determines the variable individual efficacy of D-TACE. On the other hand, microsphere type is selectable during D-TACE treatment and is therefore extrinsic information that influences efficacy in patients with HCC. Thus, the combination of the two can provide the best predictive effect.

The secondary endpoint was PFS after initial D-TACE treatment in patients with HCC. Kaplan–Meier and log-rank tests showed that patients whose actual short-term efficacy after initial D-TACE was classified as OR had a better long-term prognosis than patients classified as NR (p = 0.03). Patients predicted as OR by the best prediction model also had a better long-term prognosis than patients predicted as NR (p = 0.01). This supports the validity of the model established in this study and provides evidence for the clinical significance of predicting the short-term efficacy of initial D-TACE treatment. Moreover, the CR group had the best long-term prognosis (p < 0.001). The time to progression differed significantly between the CR and PD groups, with the longest PFS after the initial D-TACE observed in the CR group. The Kaplan–Meier curves for the PR and SD groups overlapped, and with longer follow-up the time to progression was longer in the SD group than in the PR group, which may have been caused by the unbalanced distribution of cases across groups and the high proportion of censored values in this study. Therefore, this should be explored in a larger prospective study. Related studies have been conducted on this topic. For instance, Kim et al. investigated the relationship between CR after initial TACE treatment, CR after multiple TACE procedures, and overall survival (OS) among 314 patients with HCC; the results showed that an initial CR was the strongest predictor of a favorable prognosis [36]. However, these studies focused on analyzing OS after c-TACE; therefore, our study is innovative in that we evaluated the relationship between long-term prognosis and the short-term efficacy of D-TACE.

Our study has some limitations. First, this was a single-center, retrospective study with a small sample size. In addition, selection bias was unavoidable during the case-screening process. A larger sample size is required to validate our results. In addition, cases from multiple centers are required for external validation, which should be considered in future studies. Second, not all included patients had pathological results; therefore, a risk of bias existed. We also intend to include HCC cases with different pathological results for a more advanced stratification in future studies. Third, our second study endpoint was PFS, although the gold standard for long-term patient prognosis is OS, which we hope to include in subsequent studies.

Furthermore, the results of our study point to directions that we can continue to explore. We used only a single imaging modality, but other imaging methods such as ultrasound, CT, and even positron emission tomography (PET) can provide more comprehensive information about the biological behavior of the tumor. In addition, if genomics at the molecular level is combined with radiomics at the imaging level, such multimodal data may provide more comprehensive prognostic information. These directions deserve further exploration.

In conclusion, we showed that MRI-based radiomics features can predict the short-term efficacy of initial D-TACE treatment in patients with HCC and play a leading role in the combined clinical-radiomics model.