Introduction

Transcatheter arterial chemoembolization (TACE) is the most commonly used nonsurgical treatment modality for patients with hepatocellular carcinoma (HCC) [1]. TACE has shown a survival benefit over best supportive care for patients with unresectable HCC in two randomized controlled trials [2, 3] and is the first-line treatment recommended for Barcelona Clinic Liver Cancer (BCLC) intermediate-stage HCC (multinodular, preserved liver function, and good performance status) [4, 5]. TACE has also shown efficacy for early-stage and advanced-stage HCC [6,7,8] and is widely practiced across the BCLC stages [9].

Proper assessment of risk and benefit is a fundamental step in planning TACE. To date, there are several tools to guide further treatment strategies. ART score has been proposed to guide the re-treatment strategy for patients who underwent TACE [10]; it is composed of aspartate aminotransferase (AST) increase, radiologic tumor response, and Child–Pugh score increase after the first TACE. ABCR score has been proposed, which is composed of serum alpha-fetoprotein (AFP) level, BCLC stage, Child–Pugh score increase, and radiologic tumor response [11]. SNACOR score has also been proposed, which is composed of tumor size, tumor number, AFP level, Child–Pugh class, and radiologic response [12]. These models can help predict outcome for patients who underwent TACE and can be used to guide further treatment strategies. Notably, Child–Pugh score is included in all of these models [10,11,12].

Recently, the albumin–bilirubin (ALBI) grade was suggested and validated as a better method of assessing liver function than Child–Pugh score in HCC patients [13, 14]. Therefore, we tested whether ALBI is better than Child–Pugh score at predicting overall survival in HCC patients who underwent TACE. Then, we developed a risk prediction model that integrates the ALBI grade as a stratifying biomarker of liver reserves. Finally, we validated the newly developed risk prediction model by investigating whether it can estimate prognosis and guide further treatment strategies.

Methods

Study Population

This is a retrospective cohort study based on a prospective HCC registry from Samsung Medical Center, Seoul, South Korea. The HCC registry for the period of January 2007 to December 2012 was reviewed for potential inclusion in this study. The Samsung Medical Center HCC registry is described in detail in our previous paper [15]. Briefly, all newly diagnosed HCC patients who received care at our institution were prospectively registered. HCC was diagnosed either histologically or clinically, according to regional guidelines [16]. Well-trained abstractors collected data, including age at diagnosis, gender, date of diagnosis, etiology, liver function, tumor characteristics, tumor stage, and initial treatment modality. From this registry, we identified 1254 patients who received conventional iodized oil-based TACE as an initial treatment. Of those, we excluded 303 patients who met any of the following exclusion criteria: (1) major vascular invasion (n = 126), defined as tumor invasion of the main portal vein; (2) bile duct invasion (n = 21); (3) extrahepatic spread (n = 4); (4) combined treatment (radiofrequency ablation, radiation, or sorafenib) (n = 86); (5) liver transplantation during the follow-up period (n = 58); and (6) Child–Pugh class C (n = 8). Finally, a total of 951 treatment-naïve patients with BCLC stage 0 to C HCC who received TACE as the first-line treatment were analyzed. The entire study population was randomly divided into a derivation set (50%, n = 476) and an internal validation set (50%, n = 475). We also validated the risk score in an external validation cohort of 243 patients who received care at Seoul National University Hospital, Seoul, Korea, selected using the same inclusion and exclusion criteria. The study was reviewed and approved by the Institutional Review Board at Samsung Medical Center and by the Institutional Review Board at Seoul National University Hospital. Because the study is based on a retrospective analysis of existing administrative and clinical data, the requirement for obtaining informed patient consent was waived by the Institutional Review Board.
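The random 50/50 division of the 951 analyzed patients into derivation and internal validation sets can be illustrated with a minimal sketch. The function name `split_cohort`, the fixed seed, and the use of integer patient identifiers are assumptions made for illustration; the paper does not state how the randomization was implemented.

```python
import random


def split_cohort(patient_ids, seed=0):
    """Randomly split patient IDs into derivation and validation halves.

    The seed is a hypothetical choice that only makes the sketch reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # With an odd total, the larger half goes to the derivation set.
    n_derivation = (len(ids) + 1) // 2
    return ids[:n_derivation], ids[n_derivation:]


# 951 analyzed patients, represented here by placeholder integer IDs.
derivation, validation = split_cohort(range(951))
print(len(derivation), len(validation))
```

With 951 patients, this yields set sizes of 476 and 475, matching the sizes reported for the derivation and internal validation sets.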

TACE Procedure and Follow-Up

After selective arteriography of the superior mesenteric, celiac, and common hepatic arteries using a 5-French catheter, a microcatheter was positioned into or as close as possible to the tumor feeding branch. A mixture of doxorubicin hydrochloride (Adriamycin, Ildong, Seoul, Korea) and iodized oil (Lipiodol; Guerbet, Aulnay-sous-Bois, France) was slowly infused through the microcatheter. The doses of iodized oil and doxorubicin were determined based on the size and vascularity of the tumor; the maximum doses of iodized oil and doxorubicin administered in a single session were 25 ml and 70 mg, respectively. Thereafter, embolization was performed with 1- to 2-mm-diameter gelatin sponge pledgets (Cutanplast, Mascia Brunelli, Milan, Italy) until blood flow ceased. Patients were followed up at 1 month with contrast-enhanced CT or MRI to evaluate their response to treatment. Then, patients were monitored at 3–6-month intervals. When recurrence was identified, subsequent treatment was performed if clinically feasible.

Variables

The primary end point was overall survival, which was defined as the time from the initial TACE procedure to death or last follow-up, whichever came first. We used the following variables from the HCC registry: age at diagnosis, sex, etiology of liver disease, Child–Pugh score, albumin, bilirubin, alanine aminotransferase (ALT), AST, maximum tumor size, number of tumors, presence of vascular invasion, AFP, protein induced by vitamin K absence or antagonist-II (PIVKA-II) level, and BCLC stage. For this study, we additionally collected tumor response and Child–Pugh score, albumin, bilirubin, AST, AFP, and PIVKA-II level at the time of response evaluation after the initial TACE session. Radiologic response was evaluated with mRECIST criteria, as described in a prior report [17], with four response categories: complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD).
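The ALBI grade used throughout the analyses is derived from the albumin and bilirubin values listed above. The sketch below follows the originally published ALBI definition (linear predictor and grade cut points), which is not restated in this paper; the function names and the unit conventions (bilirubin in µmol/L, albumin in g/L) are assumptions of the sketch and should be checked against the source data before use.

```python
import math


def albi_score(bilirubin_umol_l, albumin_g_l):
    """ALBI linear predictor per the published definition.

    Assumes bilirubin in micromol/L and albumin in g/L.
    """
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l


def albi_grade(score):
    """Map an ALBI score to grade 1, 2, or 3 using the published cut points."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3


# Hypothetical patient: bilirubin 10 micromol/L, albumin 40 g/L.
example_score = albi_score(10, 40)  # 0.66 * 1 - 0.085 * 40 = -2.74
example_grade = albi_grade(example_score)
print(round(example_score, 2), example_grade)
```

Because the grade depends only on two laboratory values, it involves no subjective components such as the grading of ascites or encephalopathy in the Child–Pugh score.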

Statistical Analyses

We tested whether ALBI grade can further stratify overall survival within patients with the same Child–Pugh score, or vice versa, using the Kaplan–Meier method, and differences were assessed using the log-rank test. To generate the risk prediction model, the entire study population was randomly divided into a derivation set (50%, n = 476) and a validation set (50%, n = 475). We tested baseline variables (age, sex, etiology of liver disease, ALBI grade, AST, ALT, alkaline phosphatase, prothrombin time, platelet count, albumin, creatinine, AFP, PIVKA-II, number of tumors, maximum tumor size, presence of segmental vascular invasion, and BCLC stage), post-TACE variables at the first response evaluation (week 4) (radiologic response, platelet count, prothrombin time, albumin, AST, ALT, alkaline phosphatase, creatinine, AFP, PIVKA-II, and ALBI grade), and delta variables (change in AST, change in ALBI grade, change in AFP, and change in PIVKA-II) for possible associations with overall survival using the Cox proportional hazards model in the derivation set. The performance of the prediction model was tested using the C-index in the validation set. Numeric variables (e.g., AFP) were categorized based on cutoffs. To choose the cutoff for each numeric variable, different cutoffs were tested and the one that produced the highest C-index was selected (data not shown). For variables with multiple categories (e.g., ALBI grade, BCLC stage, and radiologic response), the categories were collapsed into two or three groups based on the C-index of the final model. The regression coefficient of each risk predictor from the multivariable Cox proportional hazards model was divided by the lowest regression coefficient, and the resulting number was rounded to an integer value to generate each score. The area under the receiver operating characteristic (AUROC) curve was calculated to measure the performance of the risk prediction model, and a cutoff point was chosen to group patients into low and high risk scores. The developed risk score was validated internally using the randomly assigned validation cohort (n = 475) and externally using an independent cohort from another institution (n = 243). A two-sided p value < 0.05 was considered significant.
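The conversion of Cox regression coefficients into integer points described above can be sketched as follows. The coefficient values and predictor names are hypothetical placeholders, not the values in Table 3, and the sketch reads "the lowest value" as the smallest coefficient in absolute terms, which is an interpretation rather than something the text states explicitly.

```python
def coefficients_to_points(coefficients):
    """Divide each coefficient by the smallest one and round to an integer.

    Note: Python's round() rounds exact halves to the nearest even integer,
    which may differ from the rounding rule used in the original analysis.
    """
    smallest = min(abs(beta) for beta in coefficients.values())
    return {name: round(beta / smallest) for name, beta in coefficients.items()}


# Hypothetical coefficients for illustration only.
example_coefficients = {
    "predictor_a": 0.35,
    "predictor_b": 0.72,
    "predictor_c": 1.10,
}
example_points = coefficients_to_points(example_coefficients)
print(example_points)
```

A patient's total score is then the sum of the points for each risk predictor that is present.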

Results

Baseline Characteristics

The baseline characteristics of the study subjects are summarized in Table 1. Vascular invasion limited to a segmental or subsegmental branch of the portal vein was observed in 113 patients (11.9%). The internal validation cohort had baseline characteristics similar to those of the derivation cohort (Table 1). However, there were significant differences between the derivation and external validation cohorts in terms of age, Child–Pugh class, ALBI grade, incidence of vascular invasion, baseline AFP levels, and initial TACE response (Table 1).

Table 1 Comparison of baseline characteristics

Comparison Between Child–Pugh Score and ALBI Grade

The time-dependent AUROC for overall survival showed the ALBI grade as a better prognostic indicator than Child–Pugh class (Table 2). Among 814 patients with Child–Pugh class A HCC, the ALBI grade was 1, 2, and 3 for 429 (52.7%), 384 (47.2%), and 1 patient (0.1%), and overall survival was significantly different according to the ALBI grade among patients with Child–Pugh class A (Fig. 1a). Of the 137 patients with Child–Pugh class B, the ALBI grade was 1, 2, and 3 for 1 (0.7%), 112 (81.8%), and 24 patients (17.5%), and overall survival was significantly different according to the ALBI grade among patients with Child–Pugh class B HCC (Fig. 1b). There were 599 patients with a Child–Pugh score of 5. The ALBI grade was 1, 2, and 3 for 398 (66.4%), 200 (33.4%), and 1 patient (0.2%), respectively, among the patients with a Child–Pugh score of 5, and overall survival was also significantly different according to the ALBI grade among patients with a Child–Pugh score of 5 (Fig. 1c). When classified according to the ALBI grade, the Child–Pugh class was A for 429 of 430 patients (99.8%) with an ALBI grade 1, and the Child–Pugh class was B for 24 of 25 patients (96.0%) with an ALBI grade 3. Among 496 patients with ALBI grade 2 HCC, the Child–Pugh class was A for 384 patients (77.4%) and B for 112 patients (22.6%). Overall survival was not different according to Child–Pugh class among patients with an ALBI grade of 2 (Fig. 1d).
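The counts reported in this paragraph form a cross-tabulation of Child–Pugh class against ALBI grade. The short check below rebuilds that table from the numbers in the text and confirms that the row and column totals agree with the reported subgroup sizes; the variable and function names are illustrative only.

```python
# Patient counts by (Child-Pugh class, ALBI grade), as reported in the text.
counts = {
    ("A", 1): 429, ("A", 2): 384, ("A", 3): 1,
    ("B", 1): 1, ("B", 2): 112, ("B", 3): 24,
}


def total_by_class(child_pugh_class):
    """Sum counts over ALBI grades for one Child-Pugh class."""
    return sum(n for (cls, _), n in counts.items() if cls == child_pugh_class)


def total_by_grade(albi_grade):
    """Sum counts over Child-Pugh classes for one ALBI grade."""
    return sum(n for (_, grade), n in counts.items() if grade == albi_grade)


print(total_by_class("A"), total_by_class("B"))
print(total_by_grade(1), total_by_grade(2), total_by_grade(3))
print(sum(counts.values()))
# Share of ALBI grade 2 patients who are Child-Pugh class A, in percent.
print(round(100 * counts[("A", 2)] / total_by_grade(2), 1))
```

The totals reproduce the 814 and 137 patients in Child–Pugh classes A and B, the 430, 496, and 25 patients in ALBI grades 1, 2, and 3, and the overall cohort of 951.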

Table 2 Predictive performance of ALBI grade versus Child–Pugh class for overall survival in whole study population (n = 951)
Fig. 1 Overall survival was significantly different according to ALBI grade among patients with the same Child–Pugh score: Child–Pugh class A (a), Child–Pugh class B (b), or Child–Pugh score 5 (c). However, overall survival was not different according to Child–Pugh class in patients with an ALBI grade of 2 (d)

After TACE, an increase in Child–Pugh score (more than one point) was observed in 207 patients (21.8%). Of them, 53 patients had an increase of more than 2 points in Child–Pugh score. Survival was worse in patients who had an increase in Child–Pugh score after TACE (40.5% vs. 57.5% at 3 years, p < 0.001). There were 163 patients (17.1%) who had an increase in ALBI grade after TACE. Likewise, those with an increase in ALBI grade showed worse survival than those without (41.6% vs. 56.3% at 3 years, p = 0.004).

Development of the Risk Model

Independent predictors of survival were selected from Cox regression analysis of data from the derivation cohort. The performance of the model was tested in the validation set using the C-index (Supplementary Table 1). Finally, ALBI grade (1 vs. 2 vs. 3), BCLC stage (0/A vs. B vs. C), radiologic response according to mRECIST criteria (CR + PR vs. SD + PD), baseline AFP level (≥ 100 ng/ml vs. < 100 ng/ml), and sex (male vs. female) were selected as variables to generate the risk prediction model. The risk score (ABRAS score: ALBI grade, BCLC stage, radiologic response, AFP level, and sex) was the sum of the scores assigned to the five risk predictors and ranged from 0 to 8 (Table 3).
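For readers less familiar with the C-index used to test model performance, the following is a minimal pure-Python sketch of a concordance index for right-censored survival data. It assumes the Harrell form of the statistic and ignores pairs with tied survival times; the paper does not specify which variant or software was used, and the toy data are invented for illustration.

```python
def harrell_c_index(times, events, risk_scores):
    """Concordance index for right-censored data (Harrell form, sketch).

    A pair (i, j) is comparable when patient i had an observed event and a
    shorter time than patient j. The pair is concordant when patient i also
    has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: follow-up time, event indicator (1 = death), risk score.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [5, 3, 4, 1]
toy_c = harrell_c_index(toy_times, toy_events, toy_scores)
print(toy_c)
```

In the toy data, four of the five comparable pairs are ordered correctly by the risk score, giving a C-index of 0.8. A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking.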

Table 3 Prognostic factors for overall survival in the derivation cohort (n = 476)

Predictive Performance of the ABRAS Score

The median follow-up duration (range) was 3.0 years (0.1–8.8 years), 3.0 years (0.1–8.7 years), and 2.2 years (0.1–4.1 years) for the derivation, internal validation, and external validation sets, respectively. In the derivation set, the time-dependent AUROCs of the ABRAS model for overall survival were 0.78, 0.73, and 0.75 at 1, 3, and 5 years, respectively (Table 4). The time-dependent AUROCs of the model for overall survival were 0.78, 0.71, and 0.72 at 1, 3, and 5 years, respectively, in the internal validation set and were 0.70 and 0.65 at 1 and 3 years in the external validation set (Table 4). Compared to BCLC stage or ALBI grade alone, the ABRAS score showed higher AUROCs (Supplementary Table 2). The ABRAS model was also better than treatment response after TACE alone (Supplementary Table 2). According to their risk score, patients were stratified into low-risk (score 0–2) and high-risk (score 3–8) groups. Survival was significantly different according to risk group, with 3-year survival rates of 72% and 30%, respectively, in the derivation set (Fig. 2a), and 69% and 35%, respectively, in the internal validation set (Fig. 2b) (p < 0.001). Survival was also significantly different according to risk group in the external validation set (2-year survival rates of 94% and 84% for low and high risk scores, p = 0.008) (Fig. 2c). When stratified by BCLC stage, the best cutoff point for the ABRAS score was a score of 2 for BCLC stage 0 and stage A, a score of 3 for BCLC stage B, and a score of 4 for BCLC stage C (Fig. 3), which corresponded to a cutoff score of more than 2 using ALBI grade, radiologic response, AFP level, and sex.
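The risk grouping described above can be written as a small lookup. The overall rule (low risk for scores 0–2, high risk for scores 3–8) is taken directly from the text. The stage-specific rule is a sketch that assumes a patient is high risk when the score exceeds the reported stage-specific cutoff, which is an interpretation of the wording rather than a stated definition; the function names are hypothetical.

```python
# Reported best ABRAS cutoffs by BCLC stage.
STAGE_CUTOFF = {"0": 2, "A": 2, "B": 3, "C": 4}


def abras_risk_group(score):
    """Overall grouping from the text: 0-2 is low risk, 3-8 is high risk."""
    if not 0 <= score <= 8:
        raise ValueError("ABRAS score must be between 0 and 8")
    return "low" if score <= 2 else "high"


def abras_risk_group_by_stage(score, bclc_stage):
    """Stage-specific grouping.

    Assumption: 'high' means the score is strictly above the stage cutoff.
    """
    if not 0 <= score <= 8:
        raise ValueError("ABRAS score must be between 0 and 8")
    return "high" if score > STAGE_CUTOFF[bclc_stage] else "low"


print(abras_risk_group(2), abras_risk_group(3))
print(abras_risk_group_by_stage(3, "A"), abras_risk_group_by_stage(3, "B"))
```

Under this reading, a total score of 3 places a BCLC stage A patient in the high-risk group but a stage B patient in the low-risk group for that stage.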

Table 4 Predictive performance of prediction model to predict overall survival in the derivation and validation set
Fig. 2 Overall survival was significantly different according to the ABRAS score in each study set. Those with a high score showed lower survival than those with a low score: derivation set (a), internal validation set (b), or external validation set (c)

Fig. 3 Overall survival was significantly different according to the ABRAS score. Those with a high score showed lower survival than those with a low score in each BCLC stage: BCLC stage 0 (a), BCLC stage A (b), BCLC stage B (c), or BCLC stage C (d)

Discussion

Patients receiving TACE present with varying tumor burden and liver function, and there is considerable variation in the clinical benefit that patients derive from TACE [18, 19]. In this respect, a good risk prediction model is needed for HCC patients undergoing TACE to optimize HCC management strategies. In this study, we developed and validated a new risk prediction model (ABRAS score) that can be used to predict prognosis and guide further treatment strategies for patients undergoing TACE. The five variables used in this model are simple clinical variables that can be readily obtained (ALBI grade, BCLC stage, radiologic response, AFP level, and sex). Age, sex, BCLC stage, and radiologic response after TACE have been reported as important factors associated with prognosis in HCC patients [1, 13, 20]. The uniqueness of this model is that the ABRAS score assesses liver function with the ALBI grade, which is a more objective and simpler measure than the Child–Pugh class included in previous models for guiding the re-treatment strategy of TACE.

Notably, the ALBI grade could discriminate between distinct prognostic groups within the same category of liver function as assessed by Child–Pugh class, while Child–Pugh class could not discriminate between distinct prognostic groups within the same category of ALBI grade. Assessment of liver function is particularly important because cirrhosis is a competing cause of death in HCC patients [13]. In this study, even in patients with a Child–Pugh score of 5, there were significant differences in survival according to ALBI grade, indicating that ALBI can further distinguish a low-risk group from a relatively poor-risk group within the same category based on Child–Pugh score. The ALBI grade has another advantage in that it is entirely composed of objective values. Consistent with the results of this study, ALBI grade was shown to be a better tool than Child–Pugh score in the prediction of survival in HCC patients [14, 21, 22].

The prognostic performance of the ABRAS model in predicting overall survival was acceptable, with time-dependent AUROCs of 0.73–0.78 in the derivation set and 0.71–0.78 in the internal validation set. The ABRAS model performed better than BCLC stage, ALBI grade, or TACE response alone (Supplementary Table 2). Compared with the derivation cohort, the internal validation cohort showed no significant differences in baseline characteristics, whereas several characteristics differed in the external validation cohort. Patients in the external validation cohort were older and had poorer liver function, but had less advanced tumors with lower serum AFP levels. Follow-up duration was also shorter in the external validation cohort. Initial treatment response was better in the external validation cohort, with better survival. These differences suggest that the external validation cohort represents a quite different population, although all patients received TACE as the initial treatment. Nevertheless, the ABRAS score was able to differentiate patient prognosis in the external validation cohort, although the AUROCs of the model for overall survival were slightly lower (0.70 and 0.65 at 1 and 3 years) than in the derivation cohort (0.78 and 0.73 at 1 and 3 years). When patients were stratified into two groups according to their ABRAS score, the survival curves of high- and low-risk patients differed without any overlap. When stratified according to BCLC stage, the ABRAS score was able to classify patients into high- and low-risk groups across the full spectrum of BCLC stages (BCLC 0 to BCLC C).

Notably, although we tested several on-treatment variables, only radiologic response after the first TACE was selected in this risk prediction model. The other variables (sex, AFP level, BCLC stage, and ALBI grade) were baseline factors. In the ART score and ABCR score, an increase in Child–Pugh score was found to be an independent factor predicting survival [10, 11]. In our cohort, most patients had preserved liver function, and neither Child–Pugh score increase nor ALBI grade increase was selected as an independent factor for survival, as these increases after TACE were not frequently observed. Having baseline factors in this model means that this simple risk score can be useful in planning the first TACE. Having radiologic response after TACE in the risk model indicates that the model may be useful for estimating prognosis after the first TACE and may guide further treatment. Prospective validation of the ABRAS score is warranted to determine whether it can also help guide treatment.

Our data have some limitations. This was a retrospective study with several inherent limitations, including potential selection, measurement, and misclassification biases. Because of the long study duration, many radiologists were involved in assessing treatment response after TACE. However, as the radiologists were unaware of the aims of the study, measurement errors in evaluating the response to TACE were independent and non-differential. Several important factors that can be associated with the long-term outcome of patients were not assessed. For patients treated with TACE, the level of TACE (lobar, selective, or superselective) is an important factor associated with treatment outcome [23, 24], but this information was not available. In this cohort, about 75% of patients underwent a second TACE after the initial treatment response evaluation, and about 90% of the patients were re-treated with multiple treatment modalities during follow-up (data not shown). Re-treatment can be another important factor that determines the overall survival of HCC patients [25]. This study is unable to answer whether patients with a high ABRAS score may benefit from treatments other than TACE. Major strengths of this study are its large sample size, incorporation of the novel ALBI grade in predicting the prognosis of HCC patients receiving TACE, and validation in both an internal cohort and an independent external cohort. Our cohort included a wide range of BCLC stages (early to advanced), which reflects real-world practice, so the score can be applied across the full spectrum of BCLC stages. Testing not only baseline variables but also many on-treatment variables in the survival model is another major strength.

In summary, we developed and validated the ABRAS score, which is entirely composed of objective variables. This novel prediction model can estimate patients’ prognosis and help guide treatment strategies for HCC patients who are planned to receive, or have received, initial treatment with TACE. Although this score needs to be validated in prospective trials and in different populations, it can be a useful model for predicting prognosis and helping to choose further treatment strategies for HCC patients treated with TACE.