Introduction

Preoperative chemoradiotherapy (CRT) has become a standard treatment for rectal cancer [1]. Preoperative CRT may improve local tumor control and downstage the tumor, which may in turn increase the rate of radical surgical resection, reduce local recurrence and even prolong survival [2, 3]. Pathological stage remains the most reliable predictor of clinical outcome in patients with primary rectal cancer [4–6]. With the use of preoperative CRT, the rate of pathological complete response (pCR) has risen, and some studies have reported that pCR is an important predictor of clinical outcome [7]. Tumor regression is also an important prognostic factor [8], and the standardized five-point tumor regression grade (TRG) classification originally described by Dworak and Keilholz [9] has become an essential component of the protocol for pathological reporting of rectal cancer resection specimens.

Identifying predictors of the histopathological response to preoperative CRT in patients with rectal cancer is important, because accurate assessment of response could help optimize the surgical approach and predict long-term prognosis. Morphological imaging modalities, such as computed tomography (CT), magnetic resonance imaging (MRI) and endorectal ultrasound (EUS), are used to assess response to therapy, but with these modalities it is difficult to distinguish early radiotherapy-induced inflammation or fibrosis from viable tumor cells in residual masses [10–12]. In a review by de Geus-Oei et al. [13], positron emission tomography/computed tomography (PET/CT) was considered an important tool for predicting response to therapy in patients with rectal cancer; however, the results of different studies were inconsistent, and pooled statistical data to confirm this view were lacking. Therefore, in the present meta-analysis, we collected potentially relevant studies in order to evaluate the performance of fluorine-18 fluorodeoxyglucose (18F-FDG) PET in predicting histopathological response to preoperative CRT in patients with primary rectal cancer.

Materials and methods

Search strategy and study selection

PubMed and Embase were searched for articles published from January 1990 to September 2013 that reported on PET or PET/CT as a tool for predicting the response of rectal cancer to CRT. The primary search terms were as follows: (positron emission tomography OR positron emission tomography/computed tomography OR PET OR PET/CT OR PET-CT) AND (colorectal neoplasm OR colorectal carcinoma OR colorectal cancer OR colorectal tumour OR colorectal tumor OR rectal cancer OR rectal carcinoma) AND (therapy OR chemotherapy OR radiotherapy OR chemoradiotherapy OR chemoradiation OR treatment) AND (response OR prediction).
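As a minimal sketch, the Boolean query above can be assembled programmatically from its four synonym groups, which makes it easier to audit or rerun. The term lists are copied from the text; the helper name `build_query` is hypothetical, and the sketch only reproduces the string as printed. It does not add the phrase quoting or database field tags that PubMed or Embase syntax might require, because the source does not specify them.

```python
# Synonym groups copied from the search strategy described in the text.
TERM_GROUPS = [
    ["positron emission tomography",
     "positron emission tomography/computed tomography",
     "PET", "PET/CT", "PET-CT"],
    ["colorectal neoplasm", "colorectal carcinoma", "colorectal cancer",
     "colorectal tumour", "colorectal tumor", "rectal cancer",
     "rectal carcinoma"],
    ["therapy", "chemotherapy", "radiotherapy", "chemoradiotherapy",
     "chemoradiation", "treatment"],
    ["response", "prediction"],
]


def build_query(groups):
    """Join synonyms with OR inside parentheses, then join the groups with AND."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)


if __name__ == "__main__":
    print(build_query(TERM_GROUPS))
```

Keeping the synonyms as data rather than a hand-typed string makes it simple to add or drop a term and regenerate an identical query for both databases.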

Studies were included in the meta-analysis if they met all of the following criteria: (1) the study aimed to evaluate the effectiveness of PET or PET/CT in predicting response to CRT; (2) all patients had pathologically confirmed primary rectal cancer and had received no prior treatment; (3) all patients underwent at least one PET or PET/CT scan after preoperative CRT; and (4) histopathological examination served as the reference standard. Duplicate articles, reviews, case reports, conference abstracts, animal studies and other unrelated articles were excluded. All included studies were published in English.

Quality assessment

The methodological quality of each included study was assessed independently by two reviewers using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool. The tool comprises 14 items, and the quality score of each study was expressed as the percentage of the 14 items that the study satisfied.
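The percentage score is simple arithmetic; the sketch below (function name hypothetical) shows how the bounds of the range reported in the Results correspond to 10 and 13 of the 14 QUADAS items.

```python
def quadas_score_percent(items_met, total_items=14):
    """Quality score: share of QUADAS items satisfied, as a percentage."""
    if not 0 <= items_met <= total_items:
        raise ValueError("items_met must lie between 0 and total_items")
    return 100.0 * items_met / total_items


# Bounds of the range reported in the Results (10/14 and 13/14 items).
print(f"{quadas_score_percent(10):.2f}")  # 71.43
print(f"{quadas_score_percent(13):.2f}")  # 92.86
```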

Data extraction

In this meta-analysis, the following information was collected from each study: author names, year of publication, country, study design, scan time, imaging modality, reference response endpoint, evaluation index and treatment regimens.

Two reviewers independently extracted the numbers of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN) obtained by comparing the PET or PET/CT results with the histopathological results. When the raw counts were not reported, they were calculated from the data provided, such as sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). Studies for which this was not possible were excluded from the meta-analysis. Disagreements between the two reviewers were resolved by discussion with two other investigators.
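When a study reports only sensitivity and specificity together with the numbers of histopathological responders and non-responders, the 2 × 2 counts can be back-calculated. The sketch below assumes that "positive" means a PET-predicted responder and that the reference positives are histopathological responders; the function names and the example numbers are hypothetical and are not taken from any included study. Rounding is needed because published percentages are themselves rounded.

```python
def reconstruct_2x2(sensitivity, specificity, n_responders, n_nonresponders):
    """Back-calculate TP/FP/FN/TN from sensitivity, specificity and reference counts.

    Assumes the reference standard classified n_responders patients as
    responders (reference positives) and n_nonresponders as non-responders.
    """
    tp = round(sensitivity * n_responders)     # responders correctly predicted
    fn = n_responders - tp                     # responders missed by PET
    tn = round(specificity * n_nonresponders)  # non-responders correctly predicted
    fp = n_nonresponders - tn                  # non-responders called responders
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def predictive_values(table):
    """PPV and NPV from a 2 x 2 table, useful to cross-check reported values."""
    ppv = table["TP"] / (table["TP"] + table["FP"])
    npv = table["TN"] / (table["TN"] + table["FN"])
    return ppv, npv


# Hypothetical study: 20 responders, 30 non-responders, sens 75 %, spec 80 %.
table = reconstruct_2x2(0.75, 0.80, 20, 30)
print(table)  # {'TP': 15, 'FP': 6, 'FN': 5, 'TN': 24}
print(predictive_values(table))
```

Recomputing PPV and NPV from the reconstructed table and comparing them with any values the study reports is a cheap consistency check on the back-calculation.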

Statistical analysis

Heterogeneity among the included studies was assessed with the I² statistic and was considered significant when I² exceeded 50 %. When heterogeneity was present, a random effects model (REM) was used; otherwise, a fixed effects model (FEM) was used. The pooled sensitivity, specificity and diagnostic odds ratio (DOR) were calculated with the selected model. The summary receiver operating characteristic (SROC) curve was fitted using Moses' linear model, and the area under the SROC curve (AUC) was calculated as a measure of the diagnostic accuracy of PET or PET/CT. Pooled estimates are presented with 95 % confidence intervals (CIs). Publication bias was assessed with funnel plots and Begg's and Egger's tests and was considered present when P < 0.05. The Z test was used to compare pooled estimates between two groups, with P < 0.05 considered statistically significant. Statistical analyses were performed using Meta-DiSc 1.04 (Hospital Ramón y Cajal and Universidad Complutense de Madrid, Spain) and Stata 12.0 (StataCorp, College Station, TX, USA).
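The model-selection rule above (REM when I² > 50 %, otherwise FEM) can be illustrated for the DOR. The sketch below is an illustration under stated assumptions, not the Meta-DiSc implementation: it pools log DORs by inverse-variance weighting, computes Cochran's Q and I² = (Q − df)/Q, and, when I² exceeds 50 %, re-weights the studies with the DerSimonian–Laird between-study variance. The 0.5 continuity correction for zero cells, the function names and the example tables are hypothetical choices.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its variance for one study.

    Adds 0.5 to every cell when any cell is zero (a common continuity correction).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def pool_dor(tables, i2_threshold=50.0):
    """Pool DORs with a fixed or random effects model chosen by I²."""
    effects, variances = zip(*(log_dor(*t) for t in tables))
    w = [1 / v for v in variances]
    y_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(tables) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    model = "fixed"
    if i2 > i2_threshold:
        # DerSimonian-Laird estimate of the between-study variance tau².
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w = [1 / (v + tau2) for v in variances]
        model = "random"
    y = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return {
        "model": model,
        "I2": i2,
        "DOR": math.exp(y),
        "CI95": (math.exp(y - 1.96 * se), math.exp(y + 1.96 * se)),
    }


# Hypothetical (TP, FP, FN, TN) tables, not data from the included studies.
print(pool_dor([(20, 5, 5, 20), (20, 5, 5, 20)]))    # identical studies
print(pool_dor([(30, 2, 2, 30), (10, 10, 10, 10)]))  # heterogeneous studies
```

Because sensitivity, specificity and DOR are each pooled separately, the pooled DOR is not, in general, equal to the DOR computed from the pooled sensitivity and specificity.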

Results

Study selection

A total of 424 records were retrieved from the PubMed and Embase databases. Of these, 20 duplicate articles, 83 reviews, 22 conference abstracts, 34 animal studies, 44 case reports and 104 other unrelated articles were excluded. Of the remaining 117 potentially relevant articles, a further 80 did not meet the inclusion criteria and were excluded, leaving 37 articles that met our inclusion criteria. However, the patient populations of six studies [14–19] overlapped with those of other, larger studies. Therefore, 31 eligible studies involving 1527 patients were finally included in this meta-analysis [20–50]. The study selection process is shown in Fig. 1, and the detailed characteristics of all included studies are presented in Table 1.
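The exclusion counts reported above can be checked with simple arithmetic, as in the sketch below; the variable names are illustrative and the numbers are those given in the text.

```python
# Record counts reported in the study-selection paragraph.
retrieved = 424
excluded_first_pass = {
    "duplicates": 20,
    "reviews": 83,
    "conference abstracts": 22,
    "animal studies": 34,
    "case reports": 44,
    "other unrelated": 104,
}
failed_inclusion_criteria = 80
overlapping_populations = 6

potentially_relevant = retrieved - sum(excluded_first_pass.values())
eligible = potentially_relevant - failed_inclusion_criteria
included = eligible - overlapping_populations
print(potentially_relevant, eligible, included)  # 117 37 31
```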

Fig. 1

Flow chart illustrating the detailed study selection process involved in the meta-analysis

Table 1 Characteristics of all studies included in the meta-analysis

Quality assessment

The quality of all included articles was assessed using the 14 QUADAS items. The detailed quality assessment results are shown in Fig. 2. Most of the included studies had relatively high quality scores (71.43–92.86 %, i.e., 10–13 of the 14 items).

Fig. 2

Quality assessment of all of the studies included in the meta-analysis. QUADAS, quality assessment of diagnostic accuracy studies

Performance of 18F-FDG PET in predicting response to preoperative CRT

In this meta-analysis, three semi-quantitative PET or PET/CT parameters, namely the response index (RI), the post-treatment maximum standardized uptake value (SUVmax-post) and the percentage change in total lesion glycolysis (TLG) between the pre- and post-CRT scans (deltaTLG%), and one qualitative parameter, visual response (VR), were assessed for predicting pathological response to preoperative CRT. The pooled sensitivity, specificity, DOR and AUC were, respectively: RI, 74 %, 66 %, 7.33 and 0.7922; SUVmax-post, 74 %, 64 %, 6.59 and 0.7938; deltaTLG%, 78 %, 81 %, 14.33 and 0.8573; and VR, 75 %, 67 %, 5.20 and 0.7843. The pooled sensitivities of the four parameters were comparable, with no statistically significant differences among them (P > 0.05). However, the pooled specificity of deltaTLG% was higher than that of RI (P = 0.0163), SUVmax-post (P = 0.0047) and VR (P = 0.0309). Heterogeneity was observed in the pooled sensitivity and/or specificity of RI, SUVmax-post and VR, but not of deltaTLG%. The results are presented in Table 2.
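For readers unfamiliar with Moses' linear model, the sketch below shows one way an SROC curve and its AUC can be obtained from per-study true positive rates (TPR, sensitivity) and false positive rates (FPR, 1 − specificity). It fits D = a + b·S by unweighted least squares, where D = logit(TPR) − logit(FPR) is the log DOR and S = logit(TPR) + logit(FPR) serves as a proxy for the positivity threshold, back-transforms the line into the curve TPR = expit(a/(1 − b) + (1 + b)/(1 − b)·logit(FPR)), and integrates it numerically. Meta-DiSc also offers weighted fits, the function names are hypothetical, and the example points are synthetic, so the sketch does not reproduce the AUC values reported above.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def moses_fit(tpr, fpr):
    """Unweighted least-squares fit of D = a + b*S across studies."""
    d = [logit(t) - logit(f) for t, f in zip(tpr, fpr)]  # log DOR
    s = [logit(t) + logit(f) for t, f in zip(tpr, fpr)]  # threshold proxy
    n = len(d)
    s_bar, d_bar = sum(s) / n, sum(d) / n
    sxx = sum((si - s_bar) ** 2 for si in s)
    sxy = sum((si - s_bar) * (di - d_bar) for si, di in zip(s, d))
    b = sxy / sxx
    return d_bar - b * s_bar, b


def sroc_tpr(fpr, a, b):
    """Expected sensitivity on the SROC curve at a given false positive rate."""
    return expit(a / (1 - b) + (1 + b) / (1 - b) * logit(fpr))


def sroc_auc(a, b, steps=10_000):
    """Area under the SROC curve by the midpoint rule over FPR in (0, 1)."""
    if b >= 1:
        raise ValueError("the back-transformed curve requires b < 1")
    h = 1 / steps
    return h * sum(sroc_tpr((k + 0.5) * h, a, b) for k in range(steps))


# Synthetic studies lying exactly on D = 2 (so a = 2, b = 0).
fpr = [expit(-1.0), expit(-0.5), expit(0.0)]
tpr = [expit(1.0), expit(1.5), expit(2.0)]
a, b = moses_fit(tpr, fpr)
print(round(a, 6), round(sroc_auc(a, b), 4))
```

A slope b close to zero means the log DOR is roughly constant across thresholds, so a single symmetric curve summarises the studies; the midpoint rule avoids evaluating logit(FPR) at exactly 0 or 1.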

Table 2 Performance of 18F-FDG PET in predicting pathological response to preoperative chemoradiotherapy in patients with primary rectal cancer

Publication bias

Begg's funnel plots are shown in Fig. 3. They showed no statistically significant asymmetry among the included studies, indicating no evidence of publication bias. The P values of Begg's test were: RI, P = 0.206; SUVmax-post, P = 0.127; deltaTLG%, P = 1.000; and VR, P = 0.174. The P values of Egger's test were: RI, P = 0.120; SUVmax-post, P = 0.276; deltaTLG%, P = 0.227; and VR, P = 0.054.

Fig. 3

Publication bias tests for the different evaluation indices (a RI, b SUVmax-post, c VR, d deltaTLG%) using Begg's funnel plot with 95 % confidence intervals. RI response index, SUVmax-post post-treatment maximum standardized uptake value, VR visual response, deltaTLG% the percentage change in total lesion glycolysis (TLG) before and after CRT

Subgroup analysis

To explore the sources of heterogeneity, we performed subgroup analyses according to the following potential sources: (1) the reference response endpoint, namely pCR, TRG or post-treatment downstaging of T stage (ypT-down); (2) the post-treatment scan time (during CRT vs. after completion of CRT); (3) the cutoff value; (4) the study design (prospective vs. retrospective); (5) the number of patients enrolled in each study (<50 vs. ≥50); (6) the country of origin (European vs. other countries); and (7) the imaging modality (PET/CT vs. PET).

Subgroup analyses were performed for RI, SUVmax-post and VR, and the results are given in Table 3. When SUVmax-post was used as the evaluation index, heterogeneity was eliminated completely in four subgroups: TRG as the response endpoint, an SUVmax-post cutoff of 2–4.4, fewer than 50 patients, and countries of origin other than European countries. Regarding the response endpoints, the pooled specificity of RI was higher for predicting TRG than for predicting pCR (P = 0.0275), and the pooled specificity of SUVmax-post was lower for predicting pCR than for predicting TRG (P = 0.0178) or ypT-down (P = 0.0312). With RI as the evaluation index, the pooled specificity was higher when the scan was performed during CRT (78 %) than after completion of CRT (63 %) (P = 0.0059), and the pooled sensitivity showed a similar, nonsignificant trend (P = 0.0630). RI showed higher pooled sensitivity in prospective than in retrospective studies (81 vs. 67 %, P = 0.0086). In studies with a small sample size (<50 patients), RI and SUVmax-post had higher pooled specificity (P = 0.0352 and P = 0.0053, respectively). The diagnostic performance of VR was not related to any of the above subgroup factors.
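The subgroup comparisons above rely on the Z test. A minimal sketch of one common form is shown below, assuming that each pooled estimate is summarised by a standard error approximated from its 95 % CI as (upper − lower)/(2 × 1.96). The source does not state whether the comparisons were made on the raw or the logit scale, so the raw-scale version here, the function names and the example numbers are all assumptions; the sketch is not expected to reproduce the P values in Table 3.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error from a symmetric 95 % confidence interval."""
    return (upper - lower) / (2 * z_crit)


def z_test(est1, se1, est2, se2):
    """Two-sided Z test for the difference between two independent estimates."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P value from the normal
    return z, p


# Hypothetical pooled specificities with hypothetical 95 % CIs.
se_a = se_from_ci(0.70, 0.90)
se_b = se_from_ci(0.50, 0.70)
z, p = z_test(0.80, se_a, 0.60, se_b)
print(f"z = {z:.2f}, P = {p:.4f}")
```

Treating the two subgroup estimates as independent is reasonable only when the subgroups contain different studies, which is the situation in a subgroup comparison.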

Table 3 Performance of 18F-FDG PET in predicting pathological response to preoperative chemoradiotherapy for each subgroup

Discussion

In the present meta-analysis, we evaluated the performance of PET in predicting the pathological response of primary rectal cancer to preoperative CRT. All 31 included studies had relatively high quality scores, and there was no evidence of publication bias, which increases the reliability of the meta-analysis. Three parameters related to PET or PET/CT (RI, SUVmax-post and VR) showed similar performance in predicting pathological response, whereas deltaTLG% had higher specificity than the other three parameters. Heterogeneity was not completely eliminated by subgroup analysis, so large studies are needed to further explore its potential sources.

At present, many parameters are used to assess response to preoperative CRT. RI [RI = (SUVmax-pre − SUVmax-post) × 100 %/SUVmax-pre] and SUVmax-post are the most commonly used semi-quantitative evaluation indices. Our results showed no statistically significant difference between them in response assessment. Both RI and SUVmax-post had higher specificity for predicting TRG than for predicting pCR. Moreover, the heterogeneity of SUVmax-post was eliminated completely in the TRG subgroup. Consequently, SUVmax-post may be a suitable indicator for predicting TRG after preoperative CRT in patients with primary rectal cancer.
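The two most common indices can be written as small functions. The sketch assumes, hypothetically, that a lesion is called a responder when RI is at or above a chosen cutoff, or when SUVmax-post is at or below a chosen cutoff; the source does not recommend specific cutoffs, so the function names, cutoff arguments and example values below are placeholders.

```python
def response_index(suvmax_pre, suvmax_post):
    """RI (%) = (SUVmax-pre - SUVmax-post) x 100 / SUVmax-pre."""
    if suvmax_pre <= 0:
        raise ValueError("SUVmax-pre must be positive")
    return (suvmax_pre - suvmax_post) * 100.0 / suvmax_pre


def predicted_responder_by_ri(suvmax_pre, suvmax_post, ri_cutoff):
    """Hypothetical rule: a larger fall in SUVmax indicates response."""
    return response_index(suvmax_pre, suvmax_post) >= ri_cutoff


def predicted_responder_by_suv_post(suvmax_post, suv_cutoff):
    """Hypothetical rule: lower residual uptake indicates response."""
    return suvmax_post <= suv_cutoff


print(response_index(10.0, 4.0))                              # 60.0
print(predicted_responder_by_ri(10.0, 4.0, ri_cutoff=50.0))   # True
print(predicted_responder_by_suv_post(4.0, suv_cutoff=3.0))   # False
```

Note that RI needs both a baseline and a post-treatment scan, whereas SUVmax-post needs only the post-treatment scan, which is one practical difference between the two indices.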

Surprisingly, we found that RI had higher specificity in predicting therapeutic response with PET than with PET/CT, which may be a consequence of the small number of PET studies (n = 4 for PET vs. n = 16 for PET/CT). PET has gradually been replaced by combined PET/CT. Although the anatomical information from the CT component of PET/CT did not appear to improve the detection rate of residual disease in patients with rectal cancer [30], it still provides valuable information, particularly for distinguishing physiological uptake in the intestine from uptake in residual lesions.

Although VR had diagnostic sensitivity and specificity similar to those of RI and SUVmax-post, it is a qualitative indicator that depends on human judgment and is therefore susceptible to differences in interpretation among nuclear medicine physicians. TLG is a semi-quantitative indicator that reflects tumor burden and is based on the metabolic tumor volume (MTV) and the mean SUV (TLG = MTV × SUVmean). DeltaTLG% [(TLGpre − TLGpost) × 100 %/TLGpre] had the highest diagnostic specificity of the four parameters. Although no heterogeneity was found, only three included studies assessed this index, so the result may not be representative. Moreover, this index still faces practical problems, for example, which SUVmax threshold should be used when delineating the MTV, and which segmentation threshold should be applied when the SUVmax of the target lesion is low. Furthermore, MTV segmentation may be affected by the distribution of FDG within the rectal wall [47]. Therefore, deltaTLG% should be applied with caution, and its predictive performance needs to be explored further in future studies.
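The TLG-based index can be sketched as follows. MTV segmentation is exactly the step flagged above as unsettled, so the fixed-percentage-of-SUVmax threshold used here (40 % by default), the voxel-list representation, the voxel volume and the function names are illustrative assumptions, not the method of the included studies.

```python
def segment_lesion(voxel_suvs, voxel_volume_ml, threshold_fraction=0.4):
    """Keep voxels at or above a fixed fraction of SUVmax; return MTV and SUVmean."""
    cutoff = threshold_fraction * max(voxel_suvs)
    inside = [suv for suv in voxel_suvs if suv >= cutoff]
    mtv_ml = len(inside) * voxel_volume_ml
    suv_mean = sum(inside) / len(inside)
    return mtv_ml, suv_mean


def total_lesion_glycolysis(mtv_ml, suv_mean):
    """TLG = MTV x SUVmean."""
    return mtv_ml * suv_mean


def delta_tlg_percent(tlg_pre, tlg_post):
    """deltaTLG% = (TLGpre - TLGpost) x 100 / TLGpre."""
    return (tlg_pre - tlg_post) * 100.0 / tlg_pre


# Hypothetical lesions: MTV in mL, SUVmean dimensionless.
tlg_pre = total_lesion_glycolysis(20.0, 6.0)   # 120.0
tlg_post = total_lesion_glycolysis(5.0, 3.0)   # 15.0
print(delta_tlg_percent(tlg_pre, tlg_post))    # 87.5

# Toy voxel list: with SUVmax 10, a 40 % threshold keeps voxels >= 4.
mtv, suv_mean = segment_lesion([10, 8, 5, 3, 1], voxel_volume_ml=0.5)
print(mtv, round(suv_mean, 3))
```

Lowering `threshold_fraction` to 0.25 on the same toy voxels adds the voxel with SUV 3 and enlarges the MTV from 1.5 to 2.0 mL, which illustrates why the choice of segmentation threshold matters for TLG.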

The optimal time point for the post-treatment PET/CT scan is an important and still controversial issue. A recent meta-analysis suggested that the post-treatment PET/CT scan should be performed during therapy [51]. Although our subgroup results were not fully consistent with this conclusion, they showed a similar trend for RI (Table 3). This inconsistency may be related to the stricter inclusion and exclusion criteria of our meta-analysis; for instance, studies of radiotherapy alone or chemotherapy alone were excluded. In any case, the time interval between therapy and the scan is a potential limitation of PET/CT that may bias the assessment of tumor response: radiotherapy-induced inflammation may account for approximately 25 % of FDG uptake (in inflammatory cells), which may lead to underestimation of tumor response, and so-called stunning of tumor cells may cause a temporary, reversible reduction in tumor FDG uptake. Therefore, large randomized controlled trials are needed to determine the optimal post-treatment scan time for PET/CT and thus provide more useful and reliable information to clinicians. Future studies should also pay closer attention to the interval between therapy and the scan to avoid false-positive or false-negative results.

To date, compared with morphological imaging modalities such as CT, MRI and EUS, metabolic imaging with PET/CT has been shown to be one of the most promising modalities for assessing response to therapy in rectal cancer. Denecke et al. [21] found FDG PET to be superior to CT and MRI, and Amthauer et al. [14] found EUS to be unreliable for evaluating response after radiotherapy with or without chemotherapy. Moreover, some studies have shown a tendency to overestimate local tumor extent after radiotherapy because therapy-induced anatomical alterations (inflammatory reactions, necrosis or fibrosis) interfere with the detection of tumor regression on anatomical imaging and make response assessment more difficult with CT, MRI and EUS [14, 21]. Consequently, functional imaging with PET/CT is currently a preferable choice for predicting response to CRT.

Some studies have shown that tumor pathological response after therapy may be a powerful prognostic factor in patients with rectal cancer [2, 7]. Capirci et al. [5] analyzed the relationship between pCR and prognosis in 566 patients with locally advanced rectal cancer who achieved pCR and found 5-year disease-free survival, overall survival and cancer-specific survival rates of 85, 90 and 94 %, respectively. Rödel et al. [8] reported that patients with TRG 4 had no local recurrence, whereas patients with TRG 2 + 3 and TRG 0 + 1 had local recurrence rates of 4 and 6 %, respectively. However, there is still no consensus on the correlation between pathological response and clinical outcome. In recent years, there has been increasing interest in the relationship between the metabolic response assessed by FDG PET or PET/CT and clinical prognosis, and the prognostic value of FDG PET or PET/CT remains controversial. For instance, Nakagawa et al. [52] reported that the median survival and 5-year overall survival rates in patients with SUVmax-post < 5 vs. SUVmax-post > 5 were 95 vs. 42 months and 70 vs. 44 %, respectively (P = 0.042). In contrast, in a large prospective study of 127 patients, Ruby et al. [53] concluded that serial FDG PET before and after CRT did not provide prognostic information.

The present meta-analysis has some limitations. First, diagnostic performance was heterogeneous when RI, SUVmax-post and VR were used as evaluation indices, and subgroup analysis did not completely eliminate this heterogeneity; we therefore used the REM to estimate diagnostic performance. Second, although all patients in the included studies received CRT, the treatment regimens differed, which may be a potential source of heterogeneity; because of this considerable variation, we did not perform a subgroup analysis by regimen. Finally, we compared two ranges of cutoff values for RI and SUVmax-post and found no significant difference in diagnostic performance, so the current data do not allow us to recommend an appropriate cutoff range to clinicians.

Conclusions

In conclusion, according to the current data, among the four PET or PET/CT evaluation parameters (RI, SUVmax-post, VR and deltaTLG%), SUVmax-post was found to be the optimal index for predicting TRG after preoperative CRT in patients with primary rectal cancer. DeltaTLG% may be a promising predictor of response, but its performance requires confirmation in further studies. Our results also suggest that the diagnostic performance of PET or PET/CT may depend on the timing of the post-treatment scan and that, when RI is used as the evaluation index, the post-treatment PET or PET/CT scan should be performed during CRT.