Introduction

Treatment modalities in rectal cancer have undergone continuous significant evolution over the past 20 years. Preoperative radiation therapy with or without concomitant chemotherapy followed by radical surgery is currently the recommended procedure for locally advanced rectal cancer (LARC) [1–4]. Neoadjuvant treatment is intended to reduce the risk of local recurrence as well as to downsize/downstage the tumour, making successful conservative surgery more likely and consequently improving the patient's quality of life [5–12]. Unfortunately, not all patients benefit equally from preoperative radiochemotherapy (RCT) because tumour response is very heterogeneous, ranging from complete remission (12 – 27 % of patients) to total resistance [13, 14]. Therefore, accurate assessment of response to preoperative treatment could be of great value for “tailoring” the neoadjuvant regimen as well as the surgical approach to the individual patient [15, 16]. For this reason, two time-points are particularly interesting for the assessment of response to preoperative treatment: the early phase, during the course of preoperative RCT, for discriminating the subgroup of nonresponder patients who may benefit from alternative treatment strategies, and the post-RCT time-point for guiding the choice of an individualized surgical approach. The current standard method for discriminating responders from nonresponders to preoperative RCT is histopathological analysis, but this method is applicable only in a postoperative setting and consequently cannot be used for tailoring therapy. Therefore, there is growing interest in noninvasive methods suitable for the prediction of pathological response and also clinical outcome, such as PET/CT using 18F-FDG [17–22].

The main aim of this study was to evaluate whether metabolic changes in the primary tumour during and after preoperative RCT as well as the qualitative assessment of PET/CT images obtained after RCT could be applied to predict histopathological response in LARC. The secondary aim was to determine the value of 18F-FDG PET/CT for predicting disease-free survival (DFS) and overall survival (OS).

Materials and methods

Patient characteristics

From December 2006 to July 2012, 126 consecutive patients (79 men and 47 women, aged 65 ± 10 years) with histologically confirmed locally advanced rectal adenocarcinoma were prospectively enrolled (Table 1). Inclusion criteria were: (a) cT2–4 N0–2 M0 disease verified by MRI and chest/abdomen CT, (b) only primary tumours, no recurrences, (c) no previous chemotherapy or radiotherapy to the pelvis, and (d) no contraindications to RCT. Exclusion criteria were: (a) histology not adenocarcinoma, (b) previous malignant disease, and (c) lack of compliance. The study was approved by the local ethics committee and written informed consent was obtained from all patients before entering the study.

Table 1 Patient characteristics

18F-FDG PET/CT

All patients underwent 18F-FDG PET/CT imaging at three different time-points: before preoperative RCT (baseline PET/CT), at the end of the second week of preoperative RCT (early PET/CT) and shortly before surgery (late PET/CT). All patients fasted for at least 6 h prior to injection of 18F-FDG (222 – 333 MBq) followed by 500 ml of physiological saline. Blood glucose levels were systematically measured before 18F-FDG injection and were always less than 180 mg/dl. Immediately before the scan was started all patients were asked to void; bladder catheters or diuretics were not used. Whole-body PET/CT images were obtained using a GEMINI GXL scanner (Philips Medical Systems, OH) 60 – 70 min after 18F-FDG injection. The CT acquisition protocol included a low-dose CT scan from the base of the skull to the mid-thigh for attenuation correction and anatomical localization. All PET scans were acquired in three-dimensional mode with an acquisition time of 3 min per bed position. PET data were reconstructed using an iterative algorithm (3D-RAMLA) and corrected for dead time, decay, random coincidences and attenuation.

Tumour lesions were identified as areas of pathologically increased 18F-FDG uptake and maximum standardized uptake values (SUVmax) were measured using a three-dimensional volume of interest placed over the lesion, carefully avoiding the bladder. In the control PET/CT studies particular attention was paid to the position of the primary tumour on the baseline scan. For semiquantitative analysis, SUVmax was calculated using the maximum activity within the primary tumour normalized to the injected dose and the patient’s body weight. Tumour SUVmax values were used to assess the response to preoperative RCT by calculating a response index (RI), as RI = [(SUV2 − SUV1)/SUV1] × 100. Changes in 18F-FDG uptake between baseline and early PET/CT (early RI) and between baseline and late PET/CT (late RI) and the correlations between the changes and histopathological tumour responses were determined.
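The RI formula can be illustrated with a minimal Python sketch. It assumes that SUV1 is the baseline SUVmax and SUV2 the SUVmax on the follow-up scan, which is what the negative mean RIs reported in the Results imply. The function name `response_index` is hypothetical, and the input values are the SUVmax values quoted in the Fig. 1 caption.

```python
def response_index(suv_baseline: float, suv_followup: float) -> float:
    """Percentage change in tumour SUVmax relative to baseline.

    Implements RI = [(SUV2 - SUV1) / SUV1] x 100, assuming SUV1 is the
    baseline SUVmax and SUV2 the follow-up (early or late) SUVmax.
    Negative values indicate a decrease in 18F-FDG uptake.
    """
    if suv_baseline <= 0:
        raise ValueError("baseline SUVmax must be positive")
    return (suv_followup - suv_baseline) / suv_baseline * 100.0


# SUVmax values quoted in the Fig. 1 caption (baseline, early, late)
baseline, early, late = 18.5, 6.8, 2.0
early_ri = response_index(baseline, early)
late_ri = response_index(baseline, late)
print(f"early RI = {early_ri:.1f} %, late RI = {late_ri:.1f} %")
```

With this sign convention, a larger reduction in uptake gives a more negative RI, so an RI "higher than" a negative cut-off corresponds to a smaller metabolic response.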

Only the post-RCT PET/CT images were qualitatively analysed and complete resolution of 18F-FDG uptake in the primary tumour was considered as complete metabolic response. The appearance of a new area of 18F-FDG uptake in the radiation treatment field but outside the tumour area on the baseline scan was considered radiation-induced inflammation. Patients with visually apparent radiation-induced inflammation were included in the analysis only if the distinction between the tumour area and inflammation was clearly evident. The correlations between visually determined responses and histopathological tumour responses were also determined.

Preoperative RCT

Preoperative 5-fluorouracil-based RCT was offered to patients with a resectable clinical stage II/III extraperitoneal rectal cancer or with a clinical stage I low-lying tumour. Concurrent external beam radiotherapy consisted of a total dose of 50.4 – 55 Gy. Total and daily doses depended on the tumour presentation and the potential tumour involvement of the mesorectal fascia (MRF) determined from the MRI scan (patients were considered MRF+ when the minimum distance from the tumour to the fascia was ≤1 mm) [23]. All patients received 45 Gy with conventional fractionation of 1.8 Gy a day to the pelvic lymph nodes. The boost dose varied according to tumour stage: cT3 MRF− patients received a sequential boost to the tumour and the corresponding mesorectum of 5.4 Gy (1.8 Gy a day) over 5 weeks and 3 days to a total dose of 50.4 Gy, whereas cT3 MRF+/cT4 patients received a concomitant boost of 10 Gy to the same volume (1 Gy delivered twice a week immediately after the daily dose to the pelvis) over 5 weeks to a total dose of 55 Gy. Radiotherapy was delivered in five sessions per week. Concomitant chemotherapy also varied according to tumour stage: 1,300 mg/m2 capecitabine was administered daily during the whole radiotherapy treatment period 7 days a week to all patients, and weekly administration of 60 mg/m2 oxaliplatin was added in patients with more advanced tumours (cT3 MRF+ and cT4).
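The dose arithmetic of the two radiotherapy schedules can be checked with a short Python sketch. It works in integer centigray to avoid floating-point rounding and assumes that each sequential boost fraction is delivered on a separate treatment day after the pelvic course. All variable names are hypothetical.

```python
FRACTION_CGY = 180          # 1.8 Gy per daily fraction
PELVIS_CGY = 4500           # 45 Gy to the pelvic lymph nodes (all patients)
SESSIONS_PER_WEEK = 5       # radiotherapy delivered in five sessions per week

pelvic_fractions = PELVIS_CGY // FRACTION_CGY            # daily pelvic fractions

# cT3 MRF- patients: sequential boost of 5.4 Gy in 1.8 Gy fractions
SEQ_BOOST_CGY = 540
seq_boost_fractions = SEQ_BOOST_CGY // FRACTION_CGY
seq_total_gy = (PELVIS_CGY + SEQ_BOOST_CGY) / 100
seq_weeks, seq_days = divmod(pelvic_fractions + seq_boost_fractions,
                             SESSIONS_PER_WEEK)

# cT3 MRF+/cT4 patients: concomitant boost of 1 Gy twice a week over 5 weeks
conc_boost_cgy = 100 * 2 * 5
conc_total_gy = (PELVIS_CGY + conc_boost_cgy) / 100

print(f"sequential boost: {seq_total_gy} Gy over {seq_weeks} weeks and {seq_days} days")
print(f"concomitant boost: {conc_total_gy} Gy over 5 weeks")
```

Under these assumptions the 25 pelvic fractions plus 3 boost fractions account for the stated 5 weeks and 3 days, and the two schedules give the stated totals of 50.4 Gy and 55 Gy.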

Surgery and pathological analysis

The standard surgical procedure was total mesorectal excision at least 8 weeks after the end of RCT; anterior resection or abdominoperineal resection was done according to clinical staging and restaging (Table 1). The surgical specimens were classified according to the American Joint Committee on Cancer TNM classification, 7th edition (ypTNM). Based on the histopathological report, a complete pathological response (cPR) was defined as ypT0 N0 and a non-complete pathological response (non-cPR) as ypT1–4 N0–2. Tumour response to preoperative RCT was also assessed according to tumour regression grade (TRG) score proposed by Mandard et al. [24] as follows: TRG1, complete tumour response; TRG2, residual cancer cells scattered through fibrosis; TRG3, increased number of residual cancer cells with predominant fibrosis; TRG4, residual cancer outgrowing fibrosis; and TRG5, no regressive changes within the tumour. Based on the TRG score, tumours were grouped into cPR (TRG1) and non-cPR (TRG2–5).

Statistical analysis

All quantitative values are expressed as mean ± standard deviation (SD) and ranges (minimum to maximum). Receiver operating characteristic (ROC) curves were constructed with respect to early and late RIs to assess the accuracy of percentage variation in tumour 18F-FDG uptake for predicting the presence of disease (non-cPR) on the histopathological report (other than ypT0 N0 or TRG2–5). The area under the curve (AUC) was used to assess accuracy; if a test was judged at least moderately accurate (AUC >0.7), the maximum product of sensitivity and specificity was chosen as the cut-off value. The same analysis was performed with respect to recurrence and death. Furthermore, Cohen’s kappa was used to evaluate the level of agreement between the qualitative assessment of late RCT PET/CT images (complete vs. incomplete metabolic response) and the histopathological report; sensitivity and specificity for assessing non-cPR were also calculated.
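The cut-off selection rule described above (accept the test if AUC >0.7, then take the threshold with the maximum product of sensitivity and specificity) can be sketched in plain Python as follows. This is an illustration only, not the SPSS procedure used in the study: the function names and the ten RI values are hypothetical, and a patient is treated as test-positive (predicted non-cPR) when the RI is higher than the cut-off.

```python
from typing import Sequence, Tuple


def auc(ri: Sequence[float], non_cpr: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [r for r, y in zip(ri, non_cpr) if y]
    neg = [r for r, y in zip(ri, non_cpr) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(ri: Sequence[float], non_cpr: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing sens * spec.

    Test-positive (predicted non-cPR) means RI > cutoff, i.e. a smaller
    SUVmax reduction than the cut-off reduction.
    """
    positives = sum(non_cpr)
    negatives = len(non_cpr) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best, best_product = (float("nan"), 0.0, 0.0), -1.0
    for cutoff in sorted(set(ri)):
        tp = sum(1 for r, y in zip(ri, non_cpr) if y and r > cutoff)
        tn = sum(1 for r, y in zip(ri, non_cpr) if not y and r <= cutoff)
        sens, spec = tp / positives, tn / negatives
        if sens * spec > best_product:
            best_product, best = sens * spec, (cutoff, sens, spec)
    return best


# Hypothetical RIs (%) and outcomes (True = non-cPR); not study data
ri = [-80, -75, -70, -65, -55, -50, -40, -30, -20, -10]
non_cpr = [False, False, False, True, False, True, True, True, True, True]

area = auc(ri, non_cpr)
if area > 0.7:  # "at least moderately accurate"
    cutoff, sens, spec = best_cutoff(ri, non_cpr)
    print(f"AUC {area:.2f}; cut-off {cutoff} %; sens {sens:.2f}; spec {spec:.2f}")
```

Candidate thresholds are restricted to observed RI values, which is sufficient because sensitivity and specificity only change at those points.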

To evaluate the independence of the RI cut-off in predicting non-cPR, a multivariable logistic model was built. The following variables were entered into the model because of their clinical relevance: age, gender, clinical T stage, clinical N stage (0 vs. 1+), and the distance from the tumour to the anal verge (<3 mm vs. ≥3 mm). Stepwise model selection using the likelihood ratio test was used to identify the best model; the Hosmer and Lemeshow statistic was also used to identify the model with the best calibration. The c statistic was used to describe the discriminative power of the model. The odds ratio (OR) and 95 % confidence interval (95 % CI) were used to report the results. Similarly, a Cox regression model was used to assess the role of RI cut-off in predicting DFS and OS. DFS was defined as the time from initial treatment to documented local or distant recurrence or last follow-up. OS was defined as the time from initial treatment until death from any cause or to the last follow-up. The same variables as above were entered into the model. The results were expressed in terms of hazard ratio (HR) and 95 % CI. SPSS for Windows was used to carry out the analysis and the significance level was set at p = 0.05.

Results

Metabolic response evaluation

The baseline PET/CT scan was performed in all 126 patients, and the early and late PET/CT scans were performed in 105 and 123 patients (83.3 % and 97.6 %, respectively). The median times between completion of preoperative RCT and the late PET/CT scan, completion of RCT and surgery, and between the late PET/CT scan and surgery were, respectively, 49 days (range 28 – 101 days), 71 days (range 34 – 155 days) and 17 days (range 5 – 24 days). The mean tumour SUVmax values were, respectively, 17.2 ± 8.0 (range 6.4 – 46.1), 9.4 ± 6.3 (range 3.3 – 42.3) and 4.5 ± 2.0 (range 0.5 – 10.5) in the baseline, early and late PET/CT scans. The mean early and late RIs were respectively −43.4 ± 24.0 % (range −86.7 % to 57.5 %) and −69.3 ± 17.8 % (range −96.9 % to 6.1 %).

Of 123 patients, 36 (29.3 %) were classified as metabolic complete responders on visual analysis of late PET/CT images. Figures 1 and 2 show representative images of complete and incomplete metabolic responses, respectively. Radiation-induced inflammation on late PET/CT images, defined as the appearance of new 18F-FDG uptake in the radiation therapy field outside the tumour area on the baseline scan, was detected in 8 of 123 (6 %) patients. All of these patients met the criteria for qualitative assessment of treatment response.

Fig. 1

18F-FDG PET/CT images in a patient with a complete pathological response. a Baseline images: intense 18F-FDG uptake is evident in a cT3 lesion (SUVmax 18.5). b Early images: a significant decrease in tumour SUVmax is evident (SUVmax 6.8; RI −63.2 %). c Late images: the tumour has completely disappeared (SUVmax 2.0; RI −89.2 %). Histopathological analysis after low anterior resection showed total tumour regression (ypT0 N0, TRG1). Neither local nor distant recurrence occurred during 36 months of follow-up

Fig. 2

18F-FDG PET/CT images in a patient with non-complete pathological response. a Baseline images: intense 18F-FDG uptake is evident in a cT4 lesion (SUVmax 16.6). b Early images: a slight decrease in tumour SUVmax is evident (SUVmax 8.8; RI −46.9 %). c Late images: a further decrease in tumour SUVmax is evident (SUVmax 5.4; RI −67.5 %). Histopathological analysis after abdominoperineal resection showed ypT3 N1, TRG3. Neither local nor distant recurrence occurred during 63 months of follow-up

Pathological response evaluation

Among 124 patients in whom ypTNM classification was available, 28 (22.6 %) were classified as cPR (ypT0 N0) and 96 (77.4 %) as non-cPR (ypT1–4 N0–2). In the TRG classification, 31 of 126 patients (24.6 %) were classified as cPR (TRG1) and 95 patients (75.4 %) as non-cPR (TRG2–5).

Metabolic changes in relation to histopathological response

Semiquantitative analysis

Mean (± SD) tumour SUVmax from the baseline, early and late PET/CT scans in patients with cPR and non-cPR according to the ypTNM classification and the TRG score are presented in Tables 2 and 3, respectively. Early and late RIs are also shown in Fig. 3.

Table 2 Tumour metabolic changes in relation to histopathological response (ypT0 N0 vs. ypT1–4 N0−2)
Table 3 Tumour metabolic changes in relation to histopathological response (TRG1 vs. TRG2−5)
Fig. 3

Early and late RIs in patients with cPR (ypT0 N0 and TRG1) and non-cPR (ypT1–4 N0–2 and TRG2–5)

The AUCs for early RI in identifying non-cPR patients according to the ypTNM classification and TRG score were, respectively, 0.74 (95 % CI 0.61 – 0.87) and 0.75 (95 % CI 0.62 – 0.88); the optimal cut-off was a reduction of 61.2 % (83.1 % sensitivity and 65.0 % specificity in identifying ypT1–4 N0–2 patients; 85.4 % sensitivity and 65.2 % specificity in identifying TRG2–5 patients; Fig. 4). Among the 27 patients with an early RI lower than −61.2 %, 12 (44.4 %) were classified as TRG2–5. In contrast, among the remaining 78 patients with an early RI higher than −61.2 %, 70 (89.7 %) were TRG2–5 (positive predictive value, PPV, 89.7 % in identifying non-cPR patients and negative predictive value, NPV, 55.6 % in identifying cPR patients). The relationship between the early RI cut-off and histopathological response (TRG1 vs. TRG2–5) is presented in Table 4. The AUCs for late RI in identifying non-cPR patients were, respectively, 0.64 (95 % CI 0.49 – 0.79) for ypT1–4 N0–2 patients and 0.64 (95 % CI 0.50 – 0.78) for TRG2–5 patients; these values were considered not sufficiently accurate to establish an optimal cut-off (Fig. 4).

Fig. 4

ROC curve for early and late RIs. The curve shows the accuracy of using early RI (blue line) for predicting complete pathological response (non-cPR vs. cPR) from the TRG score (AUC 0.75, 95 % CI 0.62 – 0.88); the optimal cut-off value was a SUVmax reduction of 61.2 %. The late RI (green line) was not sufficiently accurate to establish an optimal cut-off (AUC 0.64, 95 % CI 0.50 – 0.78)

Table 4 Correlation between early RI cut-off and histopathological response (TRG1 vs. TRG2−5)
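The sensitivity, specificity, PPV and NPV quoted for the early RI cut-off follow directly from the patient counts given in the text. The Python sketch below treats non-cPR (TRG2–5) as the positive class and an early RI higher than −61.2 % as a positive test. The counts of TRG1 patients in each group (8 and 15) are obtained by subtraction from the reported totals, and the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard 2x2 diagnostic accuracy measures as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Early RI higher than -61.2 % (test positive): 78 patients, 70 TRG2-5, 8 TRG1
# Early RI lower than -61.2 % (test negative): 27 patients, 12 TRG2-5, 15 TRG1
metrics = diagnostic_metrics(tp=70, fp=8, fn=12, tn=15)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f} %")
```

Here the NPV is the proportion of test-negative patients who were TRG1, which matches the 55.6 % reported for identifying cPR.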

Qualitative analysis of late PET/CT

Of 36 patients with a complete metabolic response on the late PET/CT scan, 18 (50 %) obtained a non-cPR according to the ypTNM classification (ypT1–4 N0–2), while 75 of 85 (88.2 %) with a qualitative incomplete metabolic response on the late PET/CT scan were ypT1–4 N0–2, yielding a sensitivity and a specificity in predicting non-cPR of 80.6 % and 64.3 %, respectively. Of these same 36 patients, 17 (47.2 %) were non-cPR according to the TRG score (TRG2–5), while 75 of 87 (86.2 %) with an incomplete metabolic response on the late PET/CT scan were TRG2–5, yielding a sensitivity and specificity for predicting non-cPR of 81.5 % and 61.3 %, respectively. Cohen’s kappa was 0.41 (p < 0.01). Table 5 shows the results of qualitative assessment of late PET/CT images in relation to TRG score.

Table 5 Qualitative assessment of late 18F-FDG PET/CT in relation to TRG score
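Cohen’s kappa for the agreement between the visual reading and the TRG score can be recomputed from the counts given in the text. The Python sketch below is a plain implementation of the standard kappa formula, not the SPSS routine used in the study. The function name `cohens_kappa` is hypothetical, and the 19 patients with a complete metabolic response and TRG1 are obtained by subtraction (36 − 17).

```python
def cohens_kappa(table) -> float:
    """Cohen's kappa for a square agreement table (rows: test, columns: reference)."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Rows: late PET/CT (complete, incomplete metabolic response)
# Columns: histopathology (TRG1, TRG2-5)
table = [
    [19, 17],   # complete metabolic response: 36 patients
    [12, 75],   # incomplete metabolic response: 87 patients
]
kappa = cohens_kappa(table)
print(f"kappa = {kappa:.2f}")
```

Under these counts the sketch gives a kappa of about 0.41, in line with the reported value and indicating moderate agreement.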

Multivariable logistic model for the identification of non-cPR (TRG2–5)

The best multivariable model (Table 6) was the model including the early RI cut-off (OR 12.5, 95 % CI 3.9 – 40.6; p < 0.01 in patients with an early RI higher than −61.2 %), the distance from the tumour to the anal verge (OR 4.1, 95 % CI 1 – 16.1; p = 0.05 in patients with a distance ≥3 mm), and the clinical T stage (OR 7.9, 95 % CI 1 – 61.3; p = 0.05 in patients with T4 vs. T2). The model showed good discrimination with a c statistic of 0.845.

Table 6 Multivariable logistic model for the identification of non-cPR (TRG2−5)

Metabolic changes vs. DFS and OS

The median follow-up time in all patients was 68 months (range 4 – 185 months). Only one patient showed local relapse at the time of analysis, making statistical analysis inappropriate. Of the 126 patients, 13 (10.3 %) showed local or distant recurrence, with median survival not calculable. With respect to OS, six patients (4.8 %) died. Consequently, median survival could not be computed. The AUCs for early and late RIs did not satisfy the criterion defined in the Materials and methods section (AUC >0.7) for either relapse or death.

Discussion

Functional imaging with 18F-FDG PET/CT is emerging as the preferred modality for evaluating tumour response to treatment in many types of cancer. In patients with LARC, PET scans were primarily performed for response assessment after completion of RCT [22, 25–32]. Recently, several studies have evaluated the role of 18F-FDG PET/CT during treatment (approximately 2 – 3 weeks after the beginning of treatment) [33–36]. Early prediction of histopathological response to preoperative RCT in LARC with PET/CT may offer the important clinical advantage of response-guided treatment, mainly allowing dose escalation or the use of more effective chemotherapeutic agents in nonresponder patients. Moreover, accurate restaging before surgery could guide optimization of the surgical approach. The most widely accepted surgical treatment in LARC is total mesorectal excision, but local excision or even observation alone can be considered in highly selected patients with complete regression after preoperative RCT [37, 38].

To the best of our knowledge this is one of the largest studies to date evaluating 18F-FDG PET/CT compared to histopathological response (ypTNM and TRG score) at two time-points, during RCT (early PET/CT) and shortly before surgery (late PET/CT). First, we could not find a significant difference in predicting histopathological response with PET/CT between the use of ypTNM classification and the use of TRG score. Second, we found that the optimal cut-off to distinguish non-cPR patients (TRG2–5) from cPR patients (TRG1) on the early PET/CT scan was a reduction in tumour SUVmax of 61.2 % (85.4 % sensitivity, 65.2 % specificity). In addition, the PPV of identifying non-cPR on the early PET/CT scan, that is the probability that non-cPR is correctly identified, was 90 %. Conversely, the NPV of identifying cPR during the course of RCT, that is the probability that cPR is correctly identified, was 56 %. Unfortunately, an optimal tumour SUVmax cut-off for the late PET/CT scan was not found, while the qualitative analysis of late PET/CT images showed a sensitivity of 81.5 % and a specificity of 61.3 % in predicting non-cPR (TRG2–5).

Our results are consistent with accumulating evidence, including a recent meta-analysis on the predictive role of metabolic response to preoperative RCT in LARC, that the early evaluation of response during treatment is more promising than evaluation after completion of therapy [39]. Previously reported results on the predictive value of 18F-FDG PET/CT in patients with LARC after and during preoperative treatment are not uniform. This heterogeneity is due to several factors including the PET scanner used, the acquisition protocol, the formulation of RCT treatment, the timing of early PET/CT, metabolic response criteria, and the definition of responder/nonresponder groups. Since there is a certain degree of heterogeneity in the applied methodology, it is difficult to directly compare our results with those obtained by different investigators. To put our data in the appropriate context, it is important to underline some key aspects.

First, most of the studies published to date compared the metabolic response to the histopathological response usually defined according to the TRG score as responders (TRG1–2) and nonresponders (TRG3–5) [35, 40, 41]. We considered responders as only TRG1 tumours in order to explore the ability of 18F-FDG PET/CT to distinguish patients with cPR, who are potentially candidates for new more conservative treatment strategies such as a minimal surgical procedure or a “wait-and-see” policy [42, 43]. We considered TRG2 tumours as non-cPR, together with TRG3–5, because 15 – 20 % of these patients could have positive nodes after surgical resection [44]. Consequently, a minimal surgical procedure is not justified in these patients. However, to facilitate the comparison of our data with those of previously published studies, we repeated the analysis comparing TRG1–2 and TRG3–5, and found AUCs of 0.68 (95 % CI 0.57 – 0.78) and 0.66 (95 % CI 0.56 – 0.77) for early and late RIs, respectively. Second, we calculated tumour SUVmax changes as a quantitative parameter of metabolic response because this value is less observer-dependent and more reproducible than SUVmean, which is dependent on the dimensions of the region of interest [45, 46]. Finally, we used visual assessment (complete vs. incomplete metabolic response) only for the late PET/CT scan because when evaluating PET for early response to treatment, metabolic changes may be subtle and not visually evident. Therefore, quantification such as SUV seems to play a more important role at this specific time-point.

From our results, 18F-FDG PET/CT seems to be particularly useful for identifying the absence of response to preoperative treatment early, during the course of RCT (90 % PPV for non-cPR), allowing clinicians to replace/intensify RCT protocols with more aggressive alternatives such as a radiotherapy boost to the macroscopic tumour as well as chemotherapy administration during the waiting period between the end of radiotherapy and surgery. Moreover, the multivariable analysis which included clinical and PET-derived parameters showed that only early RI higher than −61.2 % significantly predicted non-cPR. In addition, late changes in 18F-FDG uptake were less predictive of histopathological response and the qualitative assessment of metabolic response seemed to be more accurate than semiquantitative analysis, but not sufficiently sensitive for guiding the decision as to whether a minimal surgical approach such as local excision, or even observation, is appropriate. Qualitative interpretation may be better than semiquantitative analysis as it includes a great deal of information, such as clinical experience, expectations of disease patterns for specific diseases and knowledge of normal variants and artefacts. In our series, visual analysis correctly identified five of five patients with a late tumour SUVmax <3 (late RI −70 ± 18 %) and no cPR (one TRG2 and four TRG3). However, the main limitation of this approach is the reproducibility of qualitative readings of PET/CT for diagnosis as well as treatment response. Reproducibility is important for clinical practice and clinical trials.

It is well known that both chemotherapy and radiotherapy generate an inflammatory reaction that can be detected by 18F-FDG PET/CT. Our data show that nonspecific 18F-FDG uptake during RCT does not interfere significantly with metabolic response assessment, probably due to the short time between starting RCT and the early PET/CT scan. Conversely, inflammatory changes after RCT treatment can be protracted and a potential source of false-positive findings on the late PET/CT scan due to concomitant peritumoral/intratumoral inflammation. In our series, 12 of 87 patients (13.8 %) with a positive scan after RCT were TRG1 and 10 of 85 (11.8 %) were ypT0 N0. The longer the interval between RCT and surgery, the lower was the probability of nonspecific 18F-FDG uptake. However, because the standard treatment of LARC after neoadjuvant RCT is radical surgery, false-positive results are somewhat less dangerous than false-negative results. Indeed, a negative post-RCT PET/CT scan cannot exclude the presence of a small tumour or microscopic disease. In our series, 17 of 36 patients (47.2 %) with complete metabolic response were TRG2 (11 patients) or TRG3 (6 patients). The limited spatial resolution of the PET scanner (4 – 6 mm) and the effect of preoperative treatment on cellular metabolism could explain the false-negative results. Dual time-point PET/CT imaging or, even better, PET/MRI could help better discriminate non-cPR from cPR [47, 48].

Furthermore, to date only limited data on the predictive value of 18F-FDG PET/CT for assessing DFS and OS in LARC are available. A recent systematic review including only five studies demonstrated that a complete metabolic response on PET/CT after preoperative RCT is predictive of OS, but not of DFS [49]. In our study population, the impossibility of calculating the correlation between metabolic response and survival might be explained by the rather limited follow-up in several patients. Aside from technical innovations such as dual time-point imaging and PET/MRI, the combination of functional imaging information and clinical and therapeutic data could help improve outcome prediction to move to the era of adaptive therapy in cancer patients [12, 19].

Conclusion

This study showed the ability of early metabolic response assessment using 18F-FDG PET/CT to predict non-cPR in patients with LARC. If confirmed in further studies, this approach may enable practical modifications of preoperative treatment of patients with LARC. Conversely, the late RI is not sufficiently accurate for predicting non-cPR and therefore has a limited role in influencing the decision as to whether local excision or a “wait-and-see” policy is appropriate in an individual patient. Qualitative analysis of late PET/CT images is also not sufficiently sensitive alone to rule out the presence of residual disease.