Introduction

Approximately 50 % of patients diagnosed with a primary colorectal carcinoma eventually develop liver metastases (mCRC) during the course of the disease [1]. Resection of liver metastases in patients without extrahepatic disease can improve overall survival (OS), with 5-year survival rates ranging from 30 % to 47 %, which is comparable to that of patients with stage III disease [2–5]. The use of neoadjuvant chemotherapy regimens has led to an increase in the proportion of patients eligible for surgery by downsizing the liver metastases and rendering unresectable lesions resectable [6, 7]. The advent of targeted therapies against VEGF and EGFR signalling has further raised response rates, progression-free survival (PFS) and OS in mCRC, when combined with conventional chemotherapy regimens [8, 9]. As response rates have improved, the proportion of patients eligible for curative resection of liver metastases has also increased [6].

The recombinant humanized monoclonal immunoglobulin G antibody bevacizumab (Avastin) inhibits angiogenesis by neutralizing all isoforms of human VEGF and blocking their binding to VEGF receptors [10]. It is the antiangiogenic agent with the largest body of evidence in the treatment of mCRC [11]. The treatment response to bevacizumab is incompletely captured by the conventional size-based RECIST criteria (Response Evaluation Criteria in Solid Tumors) [12, 13]. These criteria were developed for assessing volume reduction of tumours following cytotoxic therapy, whereas these new targeted therapies are rather cytostatic. As such, new imaging modalities for rapid and effective identification of treatment (non)response seem necessary [14–16].

18F-FDG PET imaging has an established role in the staging and restaging of CRC and several studies have also suggested a potential role in the assessment of treatment response [17–19]. Reliable quantification of this 18F-FDG PET-derived treatment response remains a topic of ongoing debate [20]. The maximum standardized uptake value (SUVmax), which is the activity concentration in the hottest voxel normalized to administered activity and patient weight, is widely used and has proven its usefulness in various settings of response assessment [21]. However, there are several issues concerning its use, particularly its lack of robustness in terms of image noise, reconstruction methods, scanner sensitivity and partial volume effects (PVE) [22]. Another attractive and more comprehensive parameter is total lesion glycolysis (TLG), which is the product of the mean SUV and the volume of a lesion [23]. Though less susceptible to noise, significant variability in TLG values can be introduced by tumour delineation methods as well as PVE, which plays an important role in response assessment because of variations in lesion size [20].

Recently our group introduced the concept of standardized added metabolic activity (SAM) [24]. This parameter is a measure of the total excess tumoral SUV above the tumour background whilst excluding PVE. It is calculated by drawing a volume of interest (VOI1) around the tumour and a larger VOI (VOI2) around VOI1. Subtracting the background activity in [VOI2-VOI1] from that in VOI1 yields SAM. These VOIs have to be placed around the tumoral lesions at a sufficient distance from the actual tumour border to avoid PVE. As such, the method also avoids the variability which can be introduced by manual tumour delineation or lesion segmentation algorithms [24]. As correctly stated by Fleming et al. in their letter to the editor [25], the equation used to derive SAM is essentially the same as the specific uptake size index (SUSI) method, which was introduced and tested on planar and SPECT scintigraphy [26].

In this hypothesis-generating study, the potential role of 18F-FDG PET imaging in response assessment was evaluated in patients with mCRC and potentially resectable liver metastases, who underwent a neoadjuvant bevacizumab/chemotherapy treatment. We investigated the value of morphological as well as metabolic parameters (including SAM) in predicting therapy outcome.

Materials and methods

Patient selection, treatment and follow-up

This study was approved by the Ethics Committee of Ghent University Hospital. Included in the study were 19 patients with a diagnosis of mCRC and potentially resectable liver metastases. All patients provided written informed consent prior to inclusion. Before surgical resection of the liver metastases, patients were treated with either FOLFOX6 (oxaliplatin 100 mg/m²/2 h, leucovorin 400 mg/m²/2 h, 5-FU bolus 400 mg/m²/10 min and 5-FU continuous infusion 2,400 mg/m²/46 h) or FOLFIRI (irinotecan 180 mg/m²/90 min, leucovorin 400 mg/m²/2 h, 5-FU bolus 400 mg/m²/10 min and 5-FU continuous infusion 2,400 mg/m²/46 h). Both regimens were combined with bevacizumab (Avastin, 5 mg/kg). Altogether five cycles of chemotherapy with a cycle duration of 14 days were administered. The fifth cycle was administered without bevacizumab in order to avoid adverse surgical events such as impaired wound healing or bleeding complications.

Liver resection was performed at a median of 48 days (interquartile range, IQR, 38–55 days) after the last cycle of chemotherapy in 16 patients. Surgical eligibility was assessed by a multidisciplinary team, whereas the type and the extent of liver resection were decided by the hepatobiliary surgeon. Postoperatively, seven additional cycles of adjuvant chemotherapy (without bevacizumab) were administered. Two patients were not eligible for liver surgery and were directed to other treatments. As they did not receive the same treatment, they were excluded from the survival analysis.

PFS was measured from the start date of treatment until the date of progressive or recurrent disease. OS was measured from the start date of chemotherapy until death. The close-out date of the study was set at 30 September 2012. Patients who had not reached the specific end-point (progression/death) by the close-out date, as well as patients who were lost to follow-up, were censored at the date of their last visit.

18F-FDG PET/CT image acquisition and reconstruction

18F-FDG PET/CT scans were performed before the start of neoadjuvant chemotherapy and 15 days after the last cycle. All examinations were performed according to the standard FDG PET/CT protocol at the Department of Nuclear Medicine of Ghent University Hospital. The patients fasted for at least 6 h prior to the investigation and their blood glucose level was tested to ensure normoglycaemia. The patients then received 3.7–5.5 MBq/kg (0.10–0.15 mCi/kg) of 18F-FDG intravenously followed by 250 ml of sodium chloride and 20 mg of furosemide. Image acquisition was started 60 min after injection using an integrated PET/CT scanner (Philips Gemini PET/CT, Philips Medical Systems) which consists of a gadolinium oxyorthosilicate full-ring PET scanner with 5-mm spatial resolution and a 16-slice helical CT scanner. First a “whole-body” CT scan was performed (from the top of the skull to the proximal third of the femora). PET scanning was done immediately after acquisition of the CT images without moving the patient. The patients were instructed to breathe normally during the acquisition of CT and PET images. The standard software recommended by the manufacturer was used for acquisition and processing. The PET images were reconstructed using a three-dimensional row action maximum likelihood algorithm (3D-RAMLA) as provided by the manufacturer. The images were corrected for decay, scatter, random events and attenuation.

Image analysis

The morphological treatment response of the liver metastases was assessed by an experienced radiologist by applying the revised RECIST, version 1.1 [27]. According to these criteria, treatment responses were categorized into four categories: complete response (CR), partial response (PR), stable disease (SD) and progressive disease (PD). The “metabolic” treatment response was evaluated by quantifying the FDG uptake of the liver metastases on both the baseline and the follow-up PET images. These image analyses were performed by an experienced nuclear medicine physician using PMOD software (version 3.1; PMOD Technologies Ltd). VOIs were drawn around all liver metastases using an isocontour threshold method with the threshold set at 2 standard deviations above the mean SUV in healthy liver parenchyma. The SUVmax of the liver metastases was determined in the voxel with the highest measured activity concentration (becquerels per millilitre) and was calculated by dividing this activity concentration by the injected dose normalized to the body weight (becquerels per gram). If the liver lesions had disappeared on the follow-up images, the VOI of the most metabolically active lesion was copied from the baseline image and placed carefully in the same location. The SUVmax in this VOI was used as the follow-up value.
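The SUV normalization described above can be illustrated with a minimal Python sketch. It is not the PMOD implementation; the function and parameter names are hypothetical, the activity values are assumed to be decay-corrected already, and the usual convention that 1 ml of tissue weighs about 1 g is assumed so that the result is dimensionless.

```python
def suv(activity_bq_per_ml: float, injected_dose_bq: float, body_weight_g: float) -> float:
    """Body-weight-normalized SUV for one voxel.

    Assumption: activity and injected dose are decay-corrected to the same
    time point; this sketch does not perform any decay correction itself.
    """
    if injected_dose_bq <= 0 or body_weight_g <= 0:
        raise ValueError("injected dose and body weight must be positive")
    # Injected dose per gram of body weight (Bq/g).
    dose_per_gram = injected_dose_bq / body_weight_g
    return activity_bq_per_ml / dose_per_gram


def suv_max(voxel_activities_bq_per_ml, injected_dose_bq: float, body_weight_g: float) -> float:
    """SUVmax of a VOI: the SUV of the voxel with the highest activity concentration."""
    hottest = max(voxel_activities_bq_per_ml)
    return suv(hottest, injected_dose_bq, body_weight_g)


if __name__ == "__main__":
    # Illustrative numbers only: 350 MBq injected in a 70 kg patient.
    voxels = [8000.0, 12500.0, 20000.0]
    print(suv_max(voxels, injected_dose_bq=3.5e8, body_weight_g=70000.0))
```

Because only a single voxel enters the result, any noise in that voxel propagates directly into SUVmax.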

The rationale behind calculating SAM values has been described elsewhere [24], and the same approach was used previously by Fleming et al. for planar and SPECT scintigraphy, as mentioned above [26]. A first VOI (VOI1) was drawn around all metastatic liver lesions. A second VOI (VOI2) was delineated around VOI1 so as to include a small zone of homogeneous background. The borders of VOI1 were set at a reasonable distance from the tumour lesions in order to avoid PVE and to ensure that no spill-over from the metastatic lesion to VOI2 occurred. An example is shown in Fig. 1. Subsequently, the mean background activity (BG) was derived for all lesions using the following formula: mean BG = (total SUV VOI2 − total SUV VOI1)/(volume VOI2 − volume VOI1), in which total SUV is the product of the mean SUV and the respective volume. SAM was then calculated using the following formula: SAM = total SUV VOI1 − (mean BG × volume VOI1). The obtained lesional SAMs were summed to produce one SAM value per scan, which represented the total metabolic burden of all liver metastases. These cumulative SAM values were used for further analysis.
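The two formulas above can be written out as a short Python sketch. It is an illustration only, not the software used in the study; the function names and the tuple layout of a lesion are hypothetical, and VOI2 is assumed to fully enclose VOI1 so that the shell [VOI2 − VOI1] contains background only.

```python
def lesion_sam(mean_suv_voi1: float, volume_voi1: float,
               mean_suv_voi2: float, volume_voi2: float) -> float:
    """SAM for one lesion from the mean SUV and volume of VOI1 and VOI2.

    Assumption: VOI2 encloses VOI1, so volume_voi2 > volume_voi1.
    """
    if volume_voi2 <= volume_voi1:
        raise ValueError("VOI2 must be larger than VOI1")
    # Total SUV is the product of the mean SUV and the respective volume.
    total_voi1 = mean_suv_voi1 * volume_voi1
    total_voi2 = mean_suv_voi2 * volume_voi2
    # Mean background activity in the shell between VOI2 and VOI1.
    mean_bg = (total_voi2 - total_voi1) / (volume_voi2 - volume_voi1)
    # Excess activity in VOI1 above the background level.
    return total_voi1 - mean_bg * volume_voi1


def cumulative_sam(lesions) -> float:
    """Sum of lesional SAMs for one scan.

    Each lesion is a tuple (mean_suv_voi1, volume_voi1, mean_suv_voi2, volume_voi2).
    """
    return sum(lesion_sam(*lesion) for lesion in lesions)


if __name__ == "__main__":
    # Illustrative numbers: VOI1 of 10 ml with mean SUV 4.0 inside a
    # VOI2 of 20 ml with mean SUV 3.0, so the shell has mean SUV 2.0.
    print(lesion_sam(4.0, 10.0, 3.0, 20.0))  # 20.0
```

In the illustrative case the background contributes 2.0 × 10 = 20 of the 40 total SUV units in VOI1, leaving a SAM of 20. Enlarging VOI1 with pure background leaves the result unchanged, which is why the VOI borders can be placed away from the tumour edge.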

Fig. 1 18F-FDG PET scan in a patient with mCRC. A volume of interest (VOI1 yellow) is set at a reasonable distance from the liver lesion in order to avoid PVE. A second VOI (VOI2 blue) is drawn around VOI1. SAM is calculated by subtracting the background activity in [VOI2-VOI1] from the total activity in VOI1

Statistical analysis

Statistical analysis was performed using SPSS version 19.0. Because of the small sample size and the associated non-normality of the data, nonparametric tests were used. The Mann-Whitney U test for independent samples (equivalent to Wilcoxon’s rank sum test) was used to evaluate the differences in SUVmax and SAM values between RECIST responders and nonresponders. A paired Wilcoxon signed ranks test was used to evaluate the differences in PET parameters between baseline and follow-up scans. PFS and OS estimates were calculated using Kaplan-Meier survival analysis and the log-rank test was used to evaluate the significance of the differences between the acquired survival curves. A p value <0.05 was deemed significant for all hypothesis tests.
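As an illustration of how censored follow-up enters the Kaplan-Meier estimate, the following minimal Python sketch computes the product-limit survival curve and its median. It is not the SPSS procedure used in the study; the function names and the example data are hypothetical, and the log-rank test is not included.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time per patient (e.g. months)
    events -- True/1 if the end-point was reached, False/0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without lowering the curve.
        at_risk -= n_leaving
    return curve


def median_survival(curve):
    """First time at which survival drops to 0.5 or below; None if not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical data: five patients, two of them censored.
    curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
    print(curve, median_survival(curve))
```

When `median_survival` returns `None`, the curve never falls to 50 %, which corresponds to a median that "had not been reached" at the close-out date.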

Results

Patient characteristics

Patient characteristics are presented in Table 1. The mean age of the patients included was 61 years and 63 % were men. In spite of initial suspicion, one patient appeared to have only extrahepatic metastases and was excluded from all further data analysis. The remaining 18 patients had a median of 3.5 liver metastases (IQR 1–9.25). Of these 18 patients, 12 received the FOLFOX6 and bevacizumab regimen before liver surgery, and 6 were treated with FOLFIRI and bevacizumab. Despite this treatment, two patients were still not suitable candidates for liver surgery and were directed to other treatments. As they did not receive the same treatment, they were excluded from the survival analysis. The median PFS and OS in the remaining population of 16 patients were 10.9 months and 48 months, respectively. One patient was still disease-free at the close-out date, five patients had relapsed but were still alive and one patient was lost to follow-up. These patients were censored at their last visit (for PFS and/or OS, if applicable). The median follow-up time was 53 months.

Table 1 Patient and tumour characteristics of the 19 patients with mCRC

FDG PET and RECIST-defined treatment response

According to the revised RECIST criteria, eight patients showed a PR (44.4 %), nine SD (50 %) and one PD (5.6 %). When dichotomized into responders (PR) and nonresponders (SD and PD), the median number of liver metastases was 3.5 in both groups (p = 0.57). In contrast, both baseline SUVmax and baseline SAM values proved to be significantly different between morphological responders and nonresponders. The median baseline SUVmax value was 3.8 (IQR 2.9–5.6) in responders and 7.2 (IQR 4.7–8.7) in nonresponders (p = 0.021). The median baseline SAM value was 34 (IQR 15–59) in responders and 211 (IQR 68–1139) in nonresponders (p = 0.002). The follow-up SUVmax and SAM values as well as the difference between the baseline and follow-up values (∆SUVmax and ∆SAM) were not significantly correlated with the RECIST response.

RECIST-defined treatment response and survival

The Kaplan-Meier estimates for the median PFS were almost identical for patients with and without a morphological response: 10.9 and 10.8 months, respectively (p = 0.93). The median OS in patients with a morphological response was 51 months and in nonresponders was 39 months, but this difference was not statistically significant (p = 0.24). The survival curves are shown in Fig. 2.

Fig. 2 Kaplan-Meier survival curves for PFS and OS in morphological responders and nonresponders (according to the RECIST criteria)

FDG PET-defined treatment response and survival

The median SUVmax and SAM values at baseline were 5.2 (IQR 3.6–8.4) and 67 (IQR 37–469), respectively. Decreases in SUVmax and SAM were seen after treatment (median decreases 25.3 % and 94.5 %; p = 0.033 and 0.003, respectively). The median and mean values as well as the distribution of the various PET parameters are presented in Table 2.

Table 2 Parameter values and distributions for the entire patient population (n = 18; two patients were eventually excluded from survival analysis as they did not receive the same treatment)

The median PFS and OS were similar in patients with SUVmax above and below the median of 5.2: 11.4 vs. 10.4 months (p = 0.81) and 47 vs. 48 months (p = 0.88), respectively. In contrast, a low follow-up SUVmax had a beneficial effect on both PFS and OS. Patients with a follow-up SUVmax higher than 2.85 had a median PFS of 10.4 months compared with 14.7 months in those with a low follow-up SUVmax. Patients with a high follow-up SUVmax had a median OS of 32 months, whereas the median OS had not been reached in those with a low follow-up SUVmax. This survival benefit was significant for both PFS (p = 0.01) and OS (p = 0.003). Survival curves stratified by follow-up SUVmax are shown in Fig. 3. On the other hand, the difference between baseline and follow-up SUVmax values (∆SUVmax) appeared to have no prognostic value. The median PFS and OS in patients with ∆SUVmax above and below the median of 26.9 % were 10.2 vs. 10.9 months (p = 0.39) and 47 vs. 51 months (p = 0.51), respectively. Recently, Heijmen et al. found a coefficient of repeatability of SUVmax of 33.3 % in untreated liver metastases [28]. Using this value as a cut-off, similar results were found: the median PFS and OS in patients with ∆SUVmax above and below 33.3 % were 11.4 vs. 10.9 months (p = 0.73) and 48 vs. 51 months (p = 0.47), respectively.

Fig. 3 Kaplan-Meier survival curves for PFS and OS in patients with a low follow-up SUVmax versus patients with a high follow-up SUVmax

The baseline SAM was not significantly correlated with PFS (p = 0.38) or OS (p = 0.88), as the median PFS and OS estimates were almost identical between those with a baseline SAM above and below 65 (11.4 vs. 10.4 months for PFS and 47 vs. 48 months for OS, respectively). In contrast, both follow-up SAM and ∆SAM values appeared to be significantly correlated with PFS and OS. When dichotomized according to their median values, the patients were categorized into identical groups. The group with a high follow-up SAM and a low ∆SAM had a median PFS of 9.4 months, and the other group had a median PFS of 14.7 months (p = 0.002). The median OS in the former group was 32 months, whereas the median OS had not been reached in the latter group (p = 0.002). The corresponding survival curves are shown in Fig. 4.

Fig. 4 Kaplan-Meier survival curves for PFS and OS in patients with a low follow-up SAM and a high ∆SAM versus patients with a high follow-up SAM and a low ∆SAM. When the patients were dichotomized according to the median values of follow-up SAM and ∆SAM, identical patient groups were formed

Discussion

Lower baseline SAM and SUVmax were associated with a potential morphological response (defined according to RECIST criteria). A correlation between baseline SUVmax and morphological response was also seen by Byström et al. in a similar setting [29]. On the other hand, baseline metabolic parameters as well as morphological response did not predict PFS or OS. The finding that baseline PET imaging had no prognostic value may appear to contradict previous results of Riedl et al. [30], who found a strong correlation between baseline FDG uptake values in liver metastases before metastasectomy and survival. This is most likely explained by the fact that the patients in the present study received several cycles of chemotherapy and bevacizumab before liver resection, whereas the patients in the study by Riedl et al. were immediately directed to surgery.

The weak prognostic value of RECIST-defined morphological response in this study can possibly be explained by the presence of bevacizumab as an antiangiogenic agent in our treatment regimen. Despite the proven survival benefit of addition of bevacizumab to classic chemotherapy regimens [8], no increase in RECIST-defined response could be demonstrated by adding bevacizumab [13]. Although the RECIST criteria are a well-validated and widely used instrument for assessment of the response of solid tumours to classic cytotoxic chemotherapy, new antiangiogenic and other targeted therapies are putting pressure on these conventional size-dependent criteria. The effects of these new therapeutic modalities are considered predominantly cytostatic rather than cytotoxic and beneficial patient outcomes can be seen without an initial change in lesion size [14, 15, 31].

Follow-up SUVmax and SAM values were not correlated with morphological response, but proved to be significant prognostic factors for PFS as well as OS. This finding adds to the existing evidence about the prognostic power of 18F-FDG PET imaging in assessing treatment response of mCRC [18, 19, 32–34] and suggests a potential role for a PET-guided treatment algorithm in the management of these patients. Further research is warranted in this field.

While high follow-up SUVmax proved to be an adverse prognostic factor, the ∆SUVmax between baseline and follow-up studies did not have any prognostic value. In contrast, the reduction in total metabolic tumour burden as assessed by ∆SAM proved to be significantly related to PFS and OS. The reason for this discrepancy is probably twofold. On the one hand, SUVmax is more prone to variability caused by several factors such as image noise and resolution, reconstruction methods, scanner sensitivity and PVE. It is determined in the hottest voxel, which, by definition, means that the value will be an outlier [22]. In a recent study, Heijmen et al. demonstrated a coefficient of repeatability of SUVmax of 33.3 % in untreated liver metastases [28], suggesting that ∆SUVmax should at least be of this magnitude to represent a true treatment response. However, dichotomizing the present patient population using 33.3 % as a cut-off ∆SUVmax yielded comparable results: no significant difference in PFS or OS was found between the high and low ∆SUVmax group.

On the other hand, SUVmax does not entirely capture the response to therapy as it only reflects the metabolic activity per gram of tissue in one voxel and does not take into account the total tumour metabolic load [35]. SAM is a marker derived from TLG which takes into account metabolic activity as well as lesion volume [23]. In contrast to TLG, SAM uses a background subtraction, assuming that tumoral tissue develops on top of normal tissue. As such, the SAM method also avoids the problems of lesion segmentation. Moreover, when the necessary VOIs are placed at a sufficient distance from the actual tumour border, PVE are avoided, which makes accurate response assessment in small lesions possible [24].

Some limitations of this study should be addressed. First, the study population was rather small. Nevertheless, the significant p values in this hypothesis-generating study justify further evaluation in a larger patient population. Second, the follow-up PET scans were performed after completion of preoperative chemotherapy. Therefore, no conclusions can be drawn about the potential of PET imaging in picking up an early (non)response to treatment. Early identification of a nonresponder could reduce the side effects and costs of futile treatment as well as allow selection of patients who would be better directed immediately to surgery or second-line treatment. Further research is needed to clarify the potential value of “interim” PET in the treatment of mCRC.

In the survival analysis of this small patient population, SAM and SUVmax values were dichotomized according to the median value in order to obtain two patient groups of comparable size. Further research in a larger patient population could identify the optimal cut-off value to separate the patients in terms of prognosis.
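The median split described here can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code of the study: the function names are hypothetical, the change between scans is assumed to be expressed as a percentage decrease relative to baseline, and values exactly equal to the median are assigned to the low group.

```python
from statistics import median


def percent_decrease(baseline: float, follow_up: float) -> float:
    """Relative decrease from baseline to follow-up, in percent.

    Assumption: this is how a ∆ value is expressed; a negative result
    means the parameter increased.
    """
    if baseline == 0:
        raise ValueError("baseline value must be non-zero")
    return (baseline - follow_up) / baseline * 100.0


def median_split(values):
    """Split indices into (low, high) groups around the median of the values.

    Assumption: values equal to the median go to the low group.
    """
    cutoff = median(values)
    low = [i for i, v in enumerate(values) if v <= cutoff]
    high = [i for i, v in enumerate(values) if v > cutoff]
    return cutoff, low, high


if __name__ == "__main__":
    # Hypothetical follow-up values for four patients.
    print(median_split([1.0, 2.0, 3.0, 4.0]))
    print(percent_decrease(8.0, 2.0))
```

With an even number of patients the median lies between the two middle values, so the two groups are of equal size; a cut-off optimized for prognosis would need a larger data set.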

Regarding the calculation of SAM, we have to consider potential drawbacks of the presented methodology. As SAM includes background correction, the values obtained might be less reliable in the presence of an irregular background. However, the drawback of an irregular background did not apply to liver lesions in the setting of this study as diffuse liver involvement (and the associated heterogeneous FDG uptake) would have obviated the need for neoadjuvant chemotherapy and subsequent resection of liver metastases. Nevertheless, further research is needed to identify the repeatability of the SAM method in liver metastases as has been performed recently for metabolic volume and TLG [28].

Finally, SAM values were compared with SUVmax values as the latter is the most widely used parameter in the assessment of treatment response by PET imaging. It cannot be excluded that metabolic volume and SUVmean, provided they are estimated using robust and repeatable lesion segmentation techniques [36], may provide similar or additional prognostic information to that obtained by SAM.

Conclusion

18F-FDG PET imaging could be a useful tool to assess the treatment response and predict the clinical outcome in patients with mCRC who undergo preoperative chemotherapy before liver metastasectomy. Follow-up SUVmax, follow-up SAM and ∆SAM were all found to be significant prognostic factors for PFS and OS in this hypothesis-generating study. Further research is warranted to confirm these findings in a larger population as well as to explore to what extent 18F-FDG PET imaging might guide treatment in patients with mCRC.