Introduction

Involvement of the liver, which is the most common site of distant metastasis arising from colorectal cancer (CRC), is already evident in about 25 % of cases at initial presentation [1]. Selective internal radiation therapy (SIRT) through hepatic arterial delivery of yttrium-90 (90Y) microspheres is a liver-targeted therapy that has emerged as an important treatment option for colorectal liver metastases (CRLM) [2]. A group of independent experts from all disciplines involved in microsphere therapy, the Radioembolization Brachytherapy Oncology Consortium, first met in Columbus, Ohio, on 6–8 April 2006; the consensus recommendation of the group was that a decision for radioembolization should be made by a multidisciplinary team consisting of radiologists, nuclear medicine specialists, and medical and surgical oncologists [3]. In patients who meet the inclusion criteria, treatment with SIRT results in favourable response rates [4]. However, individual outcomes are highly variable. Stubbs et al. report survival after SIRT ranging from 0.1 to 6.6 months and a mortality rate of 24 % in a group of 100 patients with advanced CRLM [5]. Poor outcome may be a consequence of more extensive disease, but may also be associated with SIRT-induced adverse events, which range from fatigue, abdominal pain, and nausea to severe complications such as radioembolization-induced liver disease (REILD) or gastrointestinal ulceration [3, 6, 7]. Indeed, REILD with fatal outcomes has been reported in 15 of 310 patients (5 %) after SIRT for CRLM [8].

SIRT is almost exclusively performed at an advanced stage of terminal disease, with the goal of prolonging patient survival without limiting quality of life. Thus, SIRT candidates must be selected carefully in order to minimize the risk of treatment-related complications and unnecessary hospitalization. Current exclusion criteria do not always provide adequate risk stratification or sufficiently accurate estimation of patient survival to inform a rational decision to perform radioembolization. The aim of the current study was to combine pre-therapeutic CRLM patient characteristics that are plausibly associated with overall survival after SIRT into a prediction nomogram, and to provide internal and external validation of the prognostic accuracy of our system.

Materials and methods

Patients

One hundred consecutive CRC patients treated with SIRT between October 2003 and August 2010 at the Department of Nuclear Medicine, University Hospital of the Ludwig Maximilian University (LMU) in Munich were taken as the training set, and 25 consecutive CRC patients treated with SIRT between November 2008 and March 2011 at the Department of Nuclear Medicine at the University Hospital Bonn in Germany were taken as the validation set. The inclusion criteria for both cohorts were as follows: (a) age over 18 years, (b) confirmed hepatic metastases from colorectal cancer, (c) unresectable, progressive tumour refractory to chemotherapy, (d) preserved liver function, as defined by a serum bilirubin ≤ 2.0 mg/dl, (e) performance status ≥ 60 as measured with the Karnofsky index [9], (f) pre-SIRT life expectancy of at least 3 months, and (g) fitness to undergo angiography. Patients with limited extrahepatic metastases were not excluded if the hepatic metastases were deemed to be the predominant and presumptively life-limiting aspect of the disease. Exclusion criteria included (a) liver failure according to the bilirubin threshold as defined above (>2.0 mg/dl) or by the presence of ascites, (b) evidence of any uncorrectable hepatic arterial blood flow to the gastrointestinal tract observed at angiography or 99mTc-MAA (technetium-99m-labelled macro-aggregated albumin) scintigraphy, (c) pulmonary shunt exceeding 20 %, as estimated with 99mTc-MAA scintigraphy, or (d) complete portal venous occlusion [10, 11]. Patients gave written consent to undergo SIRT. The observation period for overall patient survival ended on 1 July 2012 for the training set (mean follow-up 60 weeks, range 10–238 weeks) and on 1 July 2013 for the validation set (mean follow-up 54 weeks, range 9–209 weeks). The retrospective study protocol was approved by the local ethics committee, and written informed consent for entry into the study was waived.
A flow diagram for selection of the study cohort is shown in Supplemental Fig. S1. Characteristics of the training and validation cohort are presented in Table 1.

Table 1 Characteristics of the study cohort at baseline. Numbers are given as total (%) or mean ± standard deviation

Data sets

The data sets included patient demographics (age and sex), the time between initial diagnosis and intended radioembolization, delivered dose, and patient pretreatment history, i.e., liver surgery, chemotherapy with capecitabine, anti-VEGF/EGFR antibody treatment (i.e., bevacizumab, cetuximab, or panitumumab), external radiotherapy, and liver-targeted therapy.

On admission, 1 day before planned radioembolization, serum levels of liver transaminases (alanine transaminase, ALT; aspartate transaminase, AST), bilirubin, and tumour markers CEA and CA 19-9 were obtained from all patients. During the 1 month prior to SIRT, a whole-body 2-deoxy-2-fluoro-D-glucose (18F-FDG) positron emission tomography–computed tomography (PET/CT) scan had been acquired for baseline staging, in accordance with the present guidelines [12]. On CT images, the largest diameters (LDs) of the two largest hepatic metastases were recorded, and tumour-to-liver volume ratios were categorized as either <25 % or ≥25 % by two observers, W.P.F. and A.R.H., with a combined total of more than 15 years of experience in PET/CT interpretation. On PET images, SUVmax of the three hepatic metastases with the highest 18F-FDG uptake was measured and added together as described previously [13]. In this previous study, we analyzed the prognostic value of changes on 18F-FDG PET/CT after SIRT in 80 of the 100 patients of the current training data set.

Radioembolization

Prior to SIRT, all patients underwent angiography with visceral catheterization to evaluate vascular anatomy and identify any relevant aberrant vessels. When necessary, prophylactic embolization of the gastroduodenal, right gastric, and other extrahepatic arteries was performed [3]. SIR-Spheres (Sirtex Medical, Sydney, Australia) were applied directly into the right and left hepatic arteries. In accordance with recommendations, the necessary activity of SIR-Spheres was calculated in most cases (n = 78 of 100 in the training set and 21 of 25 in the validation set) using the body surface area (BSA) method, as follows [3]: activity in gigabecquerel (GBq) = (BSA [m²] – 0.2) + (liver involvement [%]/100). In some earlier cases (n = 22 of 100 in the training set and 4 of 25 in the validation set), the activity had been determined using the empirical method (<25 % involvement, 2.0 GBq; >25 % involvement, 2.5 GBq) [3].
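As a worked illustration of the BSA method quoted above, the following minimal Python sketch evaluates the activity formula for one patient. The function names and the example values are hypothetical. The text does not state how BSA itself was obtained, so the Du Bois formula used in the helper is an assumption, not the authors' documented procedure.

```python
def bsa_du_bois(weight_kg: float, height_cm: float) -> float:
    """Body surface area in m^2 (Du Bois formula; an assumption, the
    source does not say which BSA formula was used)."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def sirt_activity_bsa(bsa_m2: float, liver_involvement_pct: float) -> float:
    """Prescribed activity in GBq according to the BSA method:
    activity = (BSA - 0.2) + (liver involvement [%] / 100)."""
    if not 0.0 <= liver_involvement_pct <= 100.0:
        raise ValueError("liver involvement must be between 0 and 100 %")
    return (bsa_m2 - 0.2) + liver_involvement_pct / 100.0


# Hypothetical patient: BSA 1.8 m^2, 20 % of the liver replaced by tumour.
activity_gbq = sirt_activity_bsa(1.8, 20.0)  # (1.8 - 0.2) + 0.20 = 1.8 GBq
```

The sketch covers only the arithmetic of the formula; any clinical dose reduction or shunt correction is outside its scope.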

Statistical analysis

Data were analyzed retrospectively. Overall survival was defined as the interval from the date of radioembolization to disease-related death (the event of interest) or to the last date of contact (censored observation). A log-rank test was used for statistical comparison of survival rates between training and validation sets. After excluding Pearson intercorrelation, we performed univariate Cox regression analysis on the training data set to examine the association between parameters and overall survival. Any pre-therapeutic parameters with a p value < 0.20 in the univariate analysis were entered into a multivariate Cox regression model. In the multivariate analysis, we applied the Wald stepwise selection method with p = 0.05 as entry probability and p = 0.10 as removal probability. A statistically significant difference was defined as p < 0.05. A hazard ratio (HR) with a 95 % confidence interval (95 % CI) was calculated to quantify the strength of the association between relevant parameters and overall survival. All parameters that emerged as statistically significant from the multivariate analysis were used to construct a nomogram for predicting survival after SIRT. The underlying methodological approach has been published previously [14]. For further validation, the area under the receiver-operating-characteristic (AU-ROC) curve and corresponding p values were calculated for the nomogram and for a model that ignores covariates. For internal validation, ROC characteristics were calculated after bootstrap resampling with 20,000 repetitions. SPSS version 15.0 (SPSS Inc., Chicago, Illinois, USA) was used for all statistical analyses.
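The bootstrap validation of the AU-ROC described above can be sketched in a few lines of Python. This is an illustrative stand-in, not the SPSS procedure used in the study: the function names, the percentile interval, and the ten toy data points are assumptions, and the sketch presumes that the 1-year survival status of every patient is known (no censoring before 1 year).

```python
import random


def auc(scores, labels):
    """AU-ROC as the probability that a 1-year survivor (label 1) has a
    higher predicted survival than a non-survivor (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AU-ROC."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both outcome classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical, not study data): predicted 1-year survival
# probability and observed status (1 = alive at 1 year).
pred = [0.80, 0.68, 0.61, 0.35, 0.15, 0.80, 0.35, 0.15, 0.61, 0.01]
alive = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]

point_estimate = auc(pred, alive)
ci_lower, ci_upper = bootstrap_auc_ci(pred, alive)

# A model that ignores covariates assigns every patient the same
# probability, so every pair is a tie and the AU-ROC is exactly 0.5.
null_auc = auc([0.44] * len(alive), alive)
```

The last line shows why the covariate-free comparison model serves as the chance-level reference: a constant prediction cannot rank any survivor above any non-survivor.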

Results

Patients

Patient characteristics for the training and validation sets are given in Table 1. The mean overall survival of the training cohort was 60 weeks (95 % CI: 50–69, range: 10–238 weeks). Forty-four of 100 patients in the training set survived longer than 1 year after SIRT. The mean overall survival of the validation cohort was 54 weeks (95 % CI: 28–79, range: 9–209 weeks). No significant difference was seen in survival rates between the training and validation sets using log-rank tests (p = 0.51).

Univariate analysis

Non-binary variables were dichotomized based on the median. Results of univariate Cox regression analysis are given in Table 2. Several variables (prior liver surgery, yttrium-90 microsphere dose, baseline CEA, transaminase toxicity, bilirubin toxicity, SUVmax, CT size, and PET/CT tumour-to-liver ratio) were associated with overall survival in the univariate analysis and were consequently included in the multivariate analysis. At least a twofold risk of reduced survival was obtained for baseline CEA ≥ 150 ng/ml (HR: 2.08, 95 % CI: 1.37–3.15), transaminase toxicity ≥ 2.5× upper limit of normal (ULN; HR: 2.91, 95 % CI: 1.63–5.22), CT size ≥ 10 cm (HR: 2.86, 95 % CI: 1.83–4.49), and tumour-to-liver ratio ≥ 25 % (HR: 2.77, 95 % CI: 1.75–4.37).

Table 2 Univariate Cox regression analysis for outcome following SIRT of CRLM

Multivariate analysis

Table 3 shows results of the multivariate analysis. Four variables were independently associated with significantly reduced overall survival after radioembolization. In particular, the relevant variables were no prior liver surgery, baseline CEA ≥ 150 ng/ml, transaminase toxicity ≥ 2.5× ULN, and CT size ≥ 10 cm, each of which increased the risk of reduced survival approximately twofold.

Table 3 Multivariate Cox regression analysis of selected variables

Nomogram

Each risk factor in the multivariate analysis was assigned points relative to its hazard ratio [14]. The probability of 1-year survival was calculated for every combination of risk factors, and therefore for every possible sum of points (Supplemental Fig. S2). Thus the nomogram (Fig. 1) assigns the probability of 1-year survival by summing the point-scale scores for each variable. The total score projected on the bottom scale indicates the probability of 1-year survival. The nomogram demonstrates that the occurrence of four risk factors in an individual reduced the predicted 1-year survival after radioembolization to about 1 %, whereas patients without risk factors had an estimated 80 % chance of 1-year survival. Among the most influential factors for poorer patient survival were transaminase toxicity (HR: 2.81, 95 % CI: 1.52–5.22) and CT lesion size (HR: 2.31, 95 % CI: 1.44–3.68) before treatment.
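The relationship between hazard ratios, points, and predicted survival can be illustrated with a short Python sketch. It rests on two assumptions that the text does not spell out: points are taken proportional to the log hazard ratio and scaled so that the strongest factor scores 100, and the 1-year survival follows the proportional-hazards relation S = S0^exp(sum of log HRs). The hazard ratio of 2.0 for "no prior liver surgery" is a placeholder for the "approximately twofold" risk reported; the other three values are the multivariate hazard ratios given in the text. The sketch is not expected to reproduce the point scale or the probabilities of Fig. 1.

```python
import math

# Multivariate hazard ratios; "no_liver_surgery" is a hypothetical
# placeholder (the text only says "approximately twofold").
HAZARD_RATIOS = {
    "no_liver_surgery": 2.0,
    "cea_ge_150": 2.1,
    "transaminases_ge_2_5_uln": 2.81,
    "ct_size_ge_10cm": 2.31,
}


def nomogram_points(hazard_ratios):
    """Points proportional to log(HR), largest factor scaled to 100
    (assumed scaling rule)."""
    betas = {name: math.log(hr) for name, hr in hazard_ratios.items()}
    top = max(betas.values())
    return {name: round(100 * beta / top) for name, beta in betas.items()}


def one_year_survival(baseline_survival, hazard_ratios, present):
    """Cox-type prediction: S = S0 ** exp(sum of log HRs of present factors)."""
    linear_predictor = sum(math.log(hazard_ratios[name]) for name in present)
    return baseline_survival ** math.exp(linear_predictor)


points = nomogram_points(HAZARD_RATIOS)

# Baseline of 0.80 corresponds to the estimate quoted for patients
# without any risk factor.
s_none = one_year_survival(0.80, HAZARD_RATIOS, [])
s_one = one_year_survival(0.80, HAZARD_RATIOS, ["cea_ge_150"])
s_all = one_year_survival(0.80, HAZARD_RATIOS, list(HAZARD_RATIOS))
```

Under these assumptions the predicted survival falls monotonically as risk factors accumulate, which is the behaviour the nomogram encodes.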

Fig. 1 Nomogram for predicting 1-year survival based on pre-therapeutic variables

Table 4 provides AU-ROC characteristics. The AU-ROC of our prediction model was 0.81 (95 % CI: 0.73–0.89, p < 0.001) for the training cohort and 0.83 (95 % CI: 0.62–1.05, p = 0.010) for the validation cohort. The corresponding AU-ROC for a model that ignores covariates, which gives a uniform 0.44 probability of 1-year survival for each patient, was 0.50 (95 % CI: 0.50–0.50, p = 1.00) for the training cohort and 0.50 (95 % CI: 0.24–0.76, p = 1.00) for the validation cohort. Supplemental Fig. S3 provides the corresponding ROC curves for the training and validation data sets. In the validation cohort, 9 of 25 patients had a score of ≤57 points, with a predicted 1-year survival ≥ 68 % according to the nomogram. Median overall survival for this subgroup was 93 weeks. Nine of 25 patients had 57 to 152 points, with a predicted 1-year survival ranging between 61 and 35 % according to the nomogram. Median overall survival for this subgroup was 30 weeks. Seven of 25 patients had a score ≥ 209 points, a predicted 1-year survival < 15 %, and a true median overall survival of 16 weeks.
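The upper confidence limit of 1.05 reported for the validation cohort lies above 1, the largest value an AU-ROC can take. One possible explanation, which the text does not confirm, is a symmetric normal-approximation interval (estimate ± 1.96 standard errors) applied to a small sample. The Python sketch below illustrates the effect with the Hanley and McNeil standard error as an assumed stand-in for whatever the statistics package computed; the split of the 25 patients into 9 and 16 is hypothetical.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Symmetric normal-approximation CI for an AU-ROC, using the
    Hanley-McNeil standard error (assumed method, not confirmed by the text)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    half_width = z * math.sqrt(variance)
    return auc - half_width, auc + half_width


# Hypothetical split of a 25-patient cohort into 9 and 16 cases.
small_lower, small_upper = hanley_mcneil_ci(0.83, n_pos=9, n_neg=16)

# Same AU-ROC with a cohort 100 times larger, for comparison.
large_lower, large_upper = hanley_mcneil_ci(0.83, n_pos=900, n_neg=1600)
```

With the small hypothetical cohort the symmetric upper limit exceeds 1, whereas with the larger cohort it stays well below 1; an interval of this kind is therefore best read as truncated at 1.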

Table 4 AU-ROC of the prediction model versus a model that ignores covariates

Discussion

The aim of this study was to analyze the predictive value of several pre-therapeutic CRLM patient characteristics for overall survival after SIRT. Characteristics that independently predicted patient survival were combined into a nomogram for prediction of 1-year survival. As methods of internal validation such as cross-validation or bootstrap resampling inherently lend themselves to over-interpretation, we conducted an independent validation based on an external patient cohort. Training and validation sets had similar characteristics for age, gender, and surgical pretreatment. However, the extent of extrahepatic disease and hepatic tumour load (data not shown) was higher in the validation cohort, leading to a higher SIRT treatment dose and lower mean survival rates for these patients. Still, overall survival and nomogram accuracy for prediction of 1-year survival, as determined by AU-ROC, did not differ significantly between the two patient groups.

To our knowledge, this is the first prediction model that combines multiple pre-therapeutic parameters for the estimation of patient survival after SIRT of CRLM. However, several studies have reported on prognostic markers in patients treated with radioembolization or similar liver-directed therapies that can be compared to the present findings. In our analysis, history of prior liver surgery independently predicted patient survival after SIRT. Unfortunately, up to 90 % of patients with CRLM present with unresectable disease at initial diagnosis [15]. For these patients, 5-year survival rates remain below 10 %, despite best medical treatment [16, 17]. Patients in our cohort whose disease was deemed unresectable at baseline had an approximately twofold increased risk of reduced survival after radioembolization compared to patients with prior liver surgery. Clearly, there are differences in the extent of disease and particulars of tumour biology in candidates for surgery and patients with liver disease considered to be unresectable. This underlying distinction is reflected in the multivariate analysis of prognostic factors in our patient cohort. In this sense, the present findings confirm those of Bester et al. in their study of the impact of prior hepatectomy on the efficacy of radioembolization in 427 patients with CRLM (231 cases) or other tumour entities [18].

Cross-sectional structural imaging is routinely performed for initial diagnosis and follow-up of patients with CRC in order to determine the extent of disease. Previous studies in large CRC patient cohorts have indicated a distinct prognostic value of CT- or MRI-based quantification of the hepatic metastatic tumour burden for predicting survival after liver surgery, SIRT, or other local interventions [5, 6, 19, 20]. Sato et al. found an association between patient survival and baseline tumour burden, categorized in quartiles of the total liver, in a cohort of 109 patients (including 51 patients with CRC) treated with radioembolization [6]. Several other groups have utilized CT examination to identify the size of the single largest liver lesion in CRC patients before surgery, after surgery, or before radiofrequency ablation (RFA); these studies likewise found reduced survival for patients with a single lesion exceeding 5 cm in diameter [19, 21–25]. Based on the Response Evaluation Criteria in Solid Tumours (RECIST) 1.1, we chose to calculate the sum of the diameters of the two largest liver lesions to obtain a simple and easily reproducible method of estimating hepatic tumour burden in patients with multiple lesions. As shown in Table 3, in the multivariate analysis, a summed tumour diameter exceeding 10 cm was an independent predictor of reduced survival, which confirms the aforementioned findings for the prognostic value of pre-therapeutic CT.

Even in the absence of imaging, tumour burden can be assessed from serum levels of specific tumour markers. Low pre-therapeutic CEA level has been linked in previous studies to better survival after liver surgery or RFA [22, 26–28]. In a meta-analysis of prognostic factors for survival in CRLM patients after liver resection, six of nine published studies found an increased risk, and three studies indicated no association or a decreased risk for cases with baseline CEA levels exceeding 200 ng/ml [15]. In our cohort, the median CEA level at baseline was 161 ng/ml, and we identified a cutoff value of 150 ng/ml to discriminate risk groups. Based on this cutoff, pre-therapeutic CEA proved to be an independent predictor of patient survival, yielding a hazard ratio of 2.1 in multivariate analysis (Table 3). As such, our CEA findings are consistent with previous studies on CRC patients treated with liver surgery or local ablation therapy, indicating an association between tumour burden as measured by this tumour marker serum level and the rate of disease progression.

In our training cohort, the presence or absence of extrahepatic metastases was not predictive of overall survival. This negative finding stands in contrast to other reports on baseline prognostic factors in patients with CRC [29, 30]. The discrepancy is most likely related to the extensive pretreatment and careful pre-selection of patients being considered for SIRT: only those patients with very limited extrahepatic metastases were judged as suitable candidates. Thus, no patients showing advanced extrahepatic disease in the pre-therapeutic examination—which is likely to adversely affect survival—were selected for SIRT.

There is a lack of published literature on randomized prospective trials on SIRT, and as such, the value of radioembolization has not yet been clearly defined in guidelines for the management of CRC [31, 32]. In clinical practice, SIRT is usually applied as a palliative treatment, e.g., as a bridge between two chemotherapy cycles intended to allow recovery from non-hepatic toxicity, or as adjuvant therapy for patients showing insufficient response to chemotherapy. The decision to opt for SIRT must be based on estimated risks and benefits, as radioembolization entails hospitalization and can produce serious adverse effects. The prognosis after radioembolization is a particularly important aspect for clinicians and patients involved in the decision-making process. Our new risk nomogram, which combines several pre-therapeutic parameters, provided a good estimation of outcome in two independent patient cohorts. However, further validation is needed to confirm its utility for CRLM patients being considered for SIRT at other institutions.

Limitations

The limitations of our study arise from its retrospective design and the limited number of patients analyzed in both the training and validation cohorts. Consequently, a definitive validation should be undertaken in a larger cohort of patients, preferably recruited prospectively at multiple centres. Many of our initial patients were excluded during pre-therapeutic evaluation, as they did not meet the inclusion criteria for radioembolization. As such, our study may suffer from selection bias, which limits the validity of our nomogram to cases adhering strictly to the inclusion criteria described in the “Materials and methods” section. Prior to routine follow-up 3 months after SIRT, our patients received no further cancer-directed therapy. However, we did not analyze the effect of subsequent treatments, which may have influenced survival in the late-line setting. On the other hand, the validity of our study is supported by our inclusion of easy-to-obtain variables and by its validation in an external patient cohort.

Conclusions

To the best of our knowledge, we have defined the first simple model for the prediction of 1-year survival of patients with CRLM after radioembolization. The model includes four pre-therapeutic parameters, each of which independently predicted reduced survival: no liver surgery, CEA ≥ 150 ng/ml, transaminase toxicity ≥ 2.5× ULN, and CT size ≥ 10 cm. Our nomogram provided good prediction of survival in two independent patient cohorts. However, this system needs further validation in larger patient cohorts before it can be applied in clinical practice.