Introduction

Staging positron emission tomography-computed tomography (PET-CT) is recommended after CT for all potentially curable oesophageal cancers beyond T1aN0 [17]. For such tumours, surgical resection is the mainstay of radical treatment in Europe and the USA [11, 12], preceded by neoadjuvant therapy (chemotherapy with or without radiotherapy) in 85 % of patients [8, 9]. Fifty to sixty percent of tumours are chemoresistant [10, 11]; without markers to identify these tumours and personalize therapy, some patients inevitably progress to incurable disease during treatment. This mandates restaging before surgery, yet incurable disease is still often encountered at operation [12, 13], with consequent psychological and physiological trauma from an ultimately futile attempt at resection.

However, no restaging guidelines exist; whilst PET-CT might intuitively be expected to be superior to CT, this has not been demonstrated. Baseline PET variables (primarily maximum standardized uptake value [SUVmax] and avid length) are associated with metastatic disease at presentation, and dynamic variables with pathological response, prognosis, recurrence, and nodal/distant metastases following treatment [14–17]. However, no studies have assessed whether any factors predict progression during therapy. One candidate is FDG-avid nodal burden; we previously found that the presence of FDG-avid nodes before neoadjuvant chemotherapy predicted worse prognosis (in unselected patients) [18]. However, no formal classification of FDG-avid nodal disease has been described.

We recently moved from routine restaging CT to PET-CT after neoadjuvant chemotherapy (NAC). This study aimed to determine the utility of restaging PET-CT from a number of perspectives. First, to quantify interval progression risk during NAC, compare detection by CT and PET-CT, and use decision theory to guide restaging PET-CT. Second, to determine whether pre-NAC variables predict interval progression. And finally, to define and quantify FDG-avid nodal stage and metabolic response to NAC for the first time, and determine whether these and other variables can stratify risk of progression.

Methods

Patients and staging protocol

All patients with oesophageal/gastro-oesophageal junctional (GOJ) cancer staged initially with 18F-fluorodeoxyglucose (FDG) PET-CT and restaged after NAC with CT or PET-CT were identified from prospectively maintained and approved databases (May 2006-November 2014) [19]. Patients were staged sequentially with CT, PET-CT, endoscopic ultrasound (EUS), and, for tumours extending below the diaphragm, laparoscopy (greater sac, without routine washings). Investigations were reported and reviewed by a specialist oesophagogastric cancer multidisciplinary team [5]. From 2008, endoscopic resection was introduced for T1aN0 tumours.

Neoadjuvant chemotherapy

NAC was considered for disease beyond T1N0 (supplementary methods).

CT and PET-CT

Patients were routinely restaged using CT before 2008 and PET-CT thereafter (although some underwent CT because of clinical trial protocols). 18F-FDG PET-CT was performed without intravenous contrast, using standard iterative reconstruction, on one of two scanners. Before 3rd November 2009, a General Electric Discovery STE 16-slice system was used (60 min after 400 MBq FDG; 3.3 mm slice reconstruction). After 3rd November 2009, a Discovery 690 64-slice system was used (90 min after 4 MBq/kg FDG; 2.5 mm slice reconstruction). Examinations were independently reported by two PET-CT radiologists. Contrast-enhanced multidetector CT was performed using a standard protocol [5] on 16- to 128-slice systems (Siemens, Toshiba, General Electric). Images were acquired at 0.5 to 0.675 mm slice thickness, with volumetric multiplanar reformatting and reconstruction at 2.5 mm or thinner. Scans extended from the neck to the symphysis pubis and were performed with the patient prone, after gas granules, oral contrast, and 100 mL of portal venous phase intravenous contrast medium (300 mg iodine per mL). Examinations were reported by a consultant gastrointestinal radiologist using the contemporary UICC-AJCC TNM 6th [20] or 7th edition [21].

Data and variables

Pre-treatment variables comprised patient age and gender, tumour cell type, grade [22], site, T stage (TNM 7th edition), N stage (TNM 6th edition, as data were insufficient for conversion), whether the tumour was impassable at oesophagogastroduodenoscopy (OGD), and PET-CT variables. NAC variables comprised regimen (grouped as dual or triple therapy) and two intervals in days: between scans, and between the restaging scan and operation. Incurable disease was defined using the TNM 7th edition. Metastatic disease was defined as nodal disease outside a radical lymphadenectomy field, or haematogenous spread, either definitively identified on imaging or requiring confirmatory biopsy/imaging. Unresectable disease was defined as disease invading unresectable structures.

PET-CT variables

Variables comprised tumour FDG-avidity (SUVmax, length [cm]) and the number of FDG-avid local nodes. A node was counted if it was visible separately from the tumour, lay within a standard lymphadenectomy territory, and had SUVmax > 2.5 or above the background mediastinal blood pool. Further variables were the involved body compartments (neck, thorax, abdomen) and the SUVmax of the most avid node. For examinations using the second scanner, additional variables were generated retrospectively: metabolic tumour volume (MTV), SUVmean, SUVpeak, and tumour glycolytic volume (TGV; mean and max). MTV was measured using a fixed threshold technique (SUV ≥ 4). TGVmean was calculated as the product of MTV and SUVmean, and TGVmax as the product of MTV and SUVmax. Metabolic tumour response (mTR) was quantified using the percentage reduction in avidity (SUVmax, length) and PERCIST criteria (SUVmax) [23]. Metabolic nodal response (mNR) was quantified using metabolic nodal (mN) stage and nodal SUVmax (percentage change, adapted PERCIST criteria).
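The fixed-threshold volume metrics defined above can be sketched as follows. This is a minimal illustration only, not the software used in the study. It assumes that the voxel SUVs of an already segmented tumour region are available as a flat list, and that each voxel has a known volume in mL (`voxel_ml`, a hypothetical input). Segmentation and SUV calibration are outside its scope.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VolumeMetrics:
    mtv_ml: float    # metabolic tumour volume (voxels with SUV >= threshold)
    suv_mean: float  # mean SUV of the voxels inside the MTV
    suv_max: float   # maximum SUV in the region
    tgv_mean: float  # MTV x SUVmean
    tgv_max: float   # MTV x SUVmax


def volume_metrics(suv: List[float], voxel_ml: float, threshold: float = 4.0) -> VolumeMetrics:
    """Fixed-threshold MTV and tumour glycolytic volumes for one region.

    `suv` holds the voxel SUVs of a pre-segmented tumour region and
    `voxel_ml` the volume of a single voxel (both hypothetical inputs).
    """
    suv_max = max(suv, default=0.0)
    inside = [v for v in suv if v >= threshold]  # SUV >= 4 in the text
    if not inside:
        return VolumeMetrics(0.0, 0.0, suv_max, 0.0, 0.0)
    mtv = len(inside) * voxel_ml
    suv_mean = sum(inside) / len(inside)
    return VolumeMetrics(mtv, suv_mean, suv_max, mtv * suv_mean, mtv * suv_max)


m = volume_metrics([2.0, 4.0, 6.0, 8.0], voxel_ml=0.5)
print(m.mtv_ml, m.suv_mean, m.tgv_mean, m.tgv_max)  # 1.5 6.0 9.0 12.0
```

The threshold is a keyword argument, so the same function could also be used to explore other fixed thresholds.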

Resections

Resections were performed via left thoracolaparotomy, laparotomy plus right thoracotomy (+/- neck dissection), or occasionally a transhiatal approach [24]. Metastatic disease was confirmed by frozen-section histopathology. Unresectable disease due to invasion/extent was confirmed by two consultant oesophagogastric surgeons.

Statistical analysis

Analysis was performed using R v3.0.2 [25]. Statistical significance was set at two-tailed p < 0.05, corrected for multiple comparisons (Bonferroni method [26]). p values are given to 3 decimal places. For regression, the distributions of continuous variables were assessed with density plots, and variables were transformed where required: age was squared, and SUVmax, SUVmean, SUVpeak, and time to restaging/surgery were log-transformed. Multivariate regression included all variables (including PET-CT scanner) after exclusion of perfect separators. Tumours that were non-FDG-avid at baseline were also excluded (n = 12). For metabolic response, only patients staged and restaged with the same scanner were included. Sensitivities were compared using McNemar’s test (DTComPair v1.0.3 [31]).
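The sketch below illustrates the multiple-comparison correction; it is not the R code used in the study. Bonferroni adjustment multiplies each p value by the number of comparisons and caps the result at 1. Comparing the adjusted values with 0.05 is therefore equivalent to comparing the raw values with 0.05 divided by the number of tests. The input p values are hypothetical.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted p values: p * m, capped at 1, for m comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag comparisons that remain significant after adjustment."""
    return [p < alpha for p in bonferroni(p_values)]


raw = [0.010, 0.020, 0.500]  # hypothetical p values from three comparisons
print(significant(raw))  # [True, False, False]
```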

Model development, tuning, validation, and performance

Three techniques were used: logistic regression (LR; backwards stepwise binary logistic), decision tree analysis (DTA; recursive partitioning using loss matrices) and artificial neural networks (ANN; feed forward back-propagation multilayer perceptron) [27, 28]. Models were tuned, generated, and validated internally (bootstrapping) as described previously (supplementary methods) [19].

Decision analytic measures and cost analysis

The PET-CT probability threshold (Pt) was calculated using sensitivity, false positive rate, treatment risk, net benefit, and test risk. These were taken from this study or, where unavailable, from the highest level of evidence in the literature (supplementary methods) [29]. Pt is the probability of metastases at which the risk of PET-CT equals its benefit [29].

Results

Three hundred and eighty-three patients were restaged with PET-CT (n = 280; 73.1 %) or CT (n = 103; 26.9 %; Fig. 1). There were nominally significant differences in patient age and tumour site, but no significant difference in NAC regimen. As expected, the initial PET-CT scanner used differed significantly between groups (Table 1).

Fig. 1 Structure of the study

Table 1 Patient characteristics

Progression to incurable disease on re-staging examination

Overall, metastases were identified in 20 (5.22 %) patients: nodal (n = 9), liver (n = 5), liver and nodal (n = 1), liver and bone (n = 1), bone (n = 2), lung (n = 1), or disseminated (n = 1). On indirect comparison, CT detected metastases in three cases (2.91 % of examinations) and PET-CT in 17 (6.07 %; p = 0.303, Fisher’s exact test; Figs. 2 and 3). Twelve (3.13 %) tumours were initially non-avid; none demonstrated metastatic progression.
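For readers who wish to check the indirect comparison, a two-sided Fisher's exact test for a 2×2 table can be sketched with the Python standard library alone. The test sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table, which is the convention used by R's `fisher.test`. The study itself used R; this sketch is illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        px = prob(x)
        if px <= p_observed * (1 + 1e-7):  # tolerance for floating-point ties
            p_value += px
    return min(1.0, p_value)


# Rows: CT, PET-CT; columns: metastases detected, not detected (counts from the text).
p = fisher_exact_two_sided(3, 100, 17, 263)
print(round(p, 3))
```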

Fig. 2 Restaging coronal FDG PET image after neoadjuvant chemotherapy, demonstrating a 5.3 cm oesophageal adenocarcinoma (SUVmax = 17.2) with a new 1 cm FDG-avid posterior mediastinal node (SUVmax = 6.2)

Fig. 3 Fused restaging axial PET-CT image in the same patient, demonstrating an FDG-avid bone metastasis in the left L4 pedicle (SUVmax = 6.6). The metastasis was invisible on the CT component but had progressed to become visible on CT six months later, with further bone metastases

All 17 PET-CT examinations demonstrating metastases were reviewed to compare the PET and CT components directly. All metastases were visible on the PET component. In seven patients (41.2 %), metastases were also visible on the CT component. In two (11.8 %), one metastasis was visible on CT but another only on PET. In eight (47.1 %), metastases were visible only on PET (p = 0.006, Fisher’s exact test).

Progression to unsuspected incurable disease at surgery

Three hundred and thirty-six patients underwent surgery (n = 247 PET-CT, n = 89 CT). Unsuspected incurable disease was found in 34 (10.1 %; n = 26 PET-CT, n = 8 CT):

- greater sac, n = 21 (61.8 %; peritoneal metastases [+/- liver] n = 15, liver metastases n = 6);
- lesser sac, n = 5 (14.7 %; peritoneal metastases +/- T4b disease n = 3, T4b disease n = 2);
- thoracic T4b disease, n = 7 (20.6 %);
- pleural metastases, n = 1 (2.94 %).

Twenty-seven patients without metastases on restaging did not undergo surgery (n = 16 PET-CT; n = 11 CT). They were not considered reliable indicators of additional false negatives.

Excluding these 27 patients, PET-CT identified 17 true positives, 26 false negatives, and 221 true negatives. This gave a sensitivity of 39.5 % (95 % CI 25.0-55.6), specificity of 100.0 % (97.8-100.0), and negative predictive value of 89.4 % (84.8-92.9). CT identified three true positives, eight false negatives, and 81 true negatives. This gave a sensitivity of 27.3 % (7.33-60.7), specificity of 100.0 % (94.4-100.0), and negative predictive value of 91.0 % (82.6-95.8).
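The accuracy figures above follow directly from the counts. The text does not name the confidence interval method. The sketch below assumes a continuity-corrected Wilson score interval, which reproduces the three reported CT intervals from the CT counts (3 true positives, 8 false negatives, 81 true negatives, no false positives). The function names are hypothetical.

```python
from math import sqrt
from typing import Dict, Tuple

Z95 = 1.959963984540054  # two-sided 95 % standard normal quantile


def wilson_cc(x: int, n: int, z: float = Z95) -> Tuple[float, float]:
    """Continuity-corrected Wilson score interval for the proportion x / n."""
    p = x / n
    denom = 2 * (n + z * z)
    if x == 0:
        lower = 0.0
    else:
        lower = (2 * n * p + z * z - 1
                 - z * sqrt(z * z - 2 - 1 / n + 4 * p * (n * (1 - p) + 1))) / denom
    if x == n:
        upper = 1.0
    else:
        upper = (2 * n * p + z * z + 1
                 + z * sqrt(z * z + 2 - 1 / n + 4 * p * (n * (1 - p) - 1))) / denom
    return max(0.0, lower), min(1.0, upper)


def accuracy(tp: int, fn: int, tn: int,
             fp: int = 0) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Point estimates with 95 % intervals for sensitivity, specificity and NPV."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_cc(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_cc(tn, tn + fp)),
        "npv": (tn / (tn + fn), wilson_cc(tn, tn + fn)),
    }


ct = accuracy(tp=3, fn=8, tn=81)
for name, (est, (lo, hi)) in ct.items():
    print(f"{name}: {100 * est:.1f} % ({100 * lo:.1f}-{100 * hi:.1f})")
```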

In a subsidiary analysis of patients restaged by PET-CT, the PET and CT components were considered separately. PET identified 17 true positives, 26 false negatives, and 221 true negatives; CT identified 9, 34, and 221, respectively. PET-CT was significantly more sensitive than its CT component alone (39.5 % versus 20.9 %; p = 0.005; McNemar test for paired sensitivities).
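The paired comparison depends only on the discordant patients. From the direct review of the 17 PET-CT examinations reported above, we assume that the CT component detected metastases in 9 patients (7 plus 2), all of whom were also positive on PET. That leaves 8 PET-only detections and 0 CT-only detections among the 43 patients with progression. A minimal McNemar sketch (not the DTComPair implementation) then gives χ² = 8 on 1 degree of freedom and p ≈ 0.005 without continuity correction, consistent with the reported value. Applying the continuity correction gives a larger p value.

```python
from math import erfc, sqrt
from typing import Tuple


def mcnemar(b: int, c: int, continuity: bool = False) -> Tuple[float, float]:
    """McNemar chi-square test (1 df) for paired binary results.

    b: pairs positive on test A only; c: pairs positive on test B only.
    Returns (statistic, two-sided p value).
    """
    if b + c == 0:
        return 0.0, 1.0
    diff = max(abs(b - c) - (1 if continuity else 0), 0)
    stat = diff ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X >= s) = erfc(sqrt(s / 2)).
    return stat, erfc(sqrt(stat / 2))


stat, p = mcnemar(b=8, c=0)  # 8 PET-only, 0 CT-only detections (assumed, see lead-in)
print(stat, round(p, 3))  # 8.0 0.005
```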

All patients with peritoneal disease had undergone staging laparoscopy. Two patients with liver metastases had tiny equivocal lesions on restaging PET-CT. In the first, the lesion was reported as benign on MRI and percutaneous biopsy, and the metastasis found at surgery was not retrospectively apparent. In the second, the lesion was not amenable to biopsy and MRI was reported as benign. Both patients were restaged using the first PET-CT scanner and were therefore excluded from model development. All seven patients with thoracic T4b disease had undergone EUS before NAC, demonstrating T2 (n = 1) or T3-T4a disease (n = 6). For two impassable tumours, mini-probe EUS was not available.

Decision theory

Pt for PET-CT was 0.084 %. As this is considerably less than the probability of progression, routine restaging with PET-CT rather than CT is justified. The number of PET-CT examinations (rather than CT) required to prevent one inappropriate oesophagectomy was 23.3, at a net cost of US$10,669.90/£6,631.50/€9,165.98. This was associated with a net 0.24 % reduction in lifetime cancer risk.

Pre-chemotherapy factors predicting progression to metastases

On multivariate analysis (Table 2; Supplementary Table 1), two pre-NAC PET-CT characteristics predicted progression: FDG-avid length (OR 1.45 [95 % CI 1.09-1.92]; p = 0.010) and metabolic nodal (mN) stage. There was prohibitive multicollinearity between the number of avid nodes and the number of involved compartments. mN stage was therefore classified by node count as mN0 (0 nodes), mN1 (1-2 nodes), or mN2 (>2 nodes). mN stage before NAC predicted progression: mN1, OR 17.94 (2.62-122.97), p = 0.003; mN2, OR 33.85 (4.58-250.43), p < 0.001.
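The mN classification can be expressed as a small function. This sketch assumes that candidate nodes have already been screened against the anatomical criteria in the Methods (separate from the tumour, within a standard lymphadenectomy territory). It reads the avidity criterion as SUVmax above 2.5 or above the mediastinal blood pool; that reading is an interpretation. The function and parameter names are hypothetical.

```python
from typing import Iterable


def is_avid(node_suvmax: float, blood_pool_suv: float) -> bool:
    """One reading of the avidity criterion: SUVmax > 2.5 or above blood pool."""
    return node_suvmax > 2.5 or node_suvmax > blood_pool_suv


def mn_stage(node_suvmaxes: Iterable[float], blood_pool_suv: float) -> str:
    """Metabolic nodal stage: mN0 (0 avid nodes), mN1 (1-2), mN2 (>2)."""
    n_avid = sum(1 for s in node_suvmaxes if is_avid(s, blood_pool_suv))
    if n_avid == 0:
        return "mN0"
    if n_avid <= 2:
        return "mN1"
    return "mN2"


print(mn_stage([1.8, 3.1, 6.2], blood_pool_suv=2.0))  # mN1
```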

Table 2 Pre-neoadjuvant chemotherapy variables and progression to incurable disease on restaging scan

Interestingly, this association was independent of conventional N staging. Overall, 92 patients were staged as N0 mN0, 150 as N1 mN0, 105 as N1 mN1, and 17 as N0 mN1. Progression occurred in 1 (1.10 %), 1 (0.67 %), 14 (11.8 %), and 4 (19.0 %), respectively (p < 0.001, Fisher’s exact test), suggesting the risk to be minimal for patients with non-FDG-avid nodal disease (Table 3).

Table 3 Pre-chemotherapy factors associated with progression to unsuspected incurable disease at surgery

There were no associations between additional PET variables and progression (including MTV, TGV, and nodal SUVmax), although this was potentially biased by the limited number of events (Supplementary Table 1).

Pre-NAC factors predicting incurable disease at surgery

Three pre-NAC variables predicted incurable disease (Table 4; Supplementary Table 2): an impassable tumour (OR 57.00 [14.65-221.78]; p < 0.001), FDG-avid length (OR 1.55 [1.21-1.98]; p < 0.001), and SUVmax (logSUVmax OR 0.04 [2.48 × 10⁻³ to 0.49]; p = 0.012). Of 42 patients with impassable tumours at baseline, 19 (42.4 %) had unresectable disease, compared with 14/293 (4.78 %) without (p < 0.001).

Table 4 Post-chemotherapy variables associated with unsuspected incurable disease at surgery

Post-NAC PET variables predicting incurable disease at surgery

In addition to an impassable tumour, three post-NAC variables predicted incurable disease (Table 4; Supplementary Table 3). These were FDG-avid length (OR 2.07 [1.41-3.05]; p < 0.001), logSUVmax (OR 4.68 × 10⁻⁴ [3.40 × 10⁻⁶ to 0.06]; p = 0.002), and mN stage. mN stage was again independent of N stage (mN1, OR 12.52 [0.91-172.00], p = 0.059; mN2, OR 549.50 [22.43-13463.26], p < 0.001). These associations remained significant for patients staged with either scanner. There were no associations with either nodal SUVmax or additional primary tumour PET variables.

Predicting incurable disease at surgery with metabolic response

No classification of mTR predicted incurable disease at surgery (Supplementary Table 4; Table 5); notably, five patients (9.09 %) with a complete metabolic response (CMR) had unresectable disease. However, incomplete nodal metabolic response, assessed by mN stage, did predict incurable disease (Supplementary Table 5). So did the change in nodal SUVmax (p = 0.025) and nodal PERCIST response (p = 0.022). The optimal ROC threshold of nodal SUVmax change for predicting unresectable disease was a ≥31.7 % reduction. Its predictive utility did not differ from that of the 30.0 % PERCIST threshold.
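The text does not state how the optimal ROC threshold was chosen. A common choice is the cut-off that maximises Youden's J (sensitivity + specificity - 1). A minimal sketch of that search follows. It assumes that higher values indicate the event; the variable should be negated when lower values do, for example for a percentage reduction that indicates response. The example data are hypothetical.

```python
from typing import List, Tuple


def youden_cutoff(values: List[float], events: List[int]) -> Tuple[float, float]:
    """Return (cut-off, J) maximising Youden's J = sensitivity + specificity - 1.

    A case is called positive when its value is >= the cut-off; events are 1/0.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both events and non-events")
    best_cut, best_j = values[0], -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if v >= cut and e == 1)
        tn = sum(1 for v, e in zip(values, events) if v < cut and e == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical data: percentage change in nodal SUVmax (higher = less response).
change = [-60.0, -45.0, -35.0, -20.0, -10.0, 15.0]
unresectable = [0, 0, 0, 1, 0, 1]
print(youden_cutoff(change, unresectable))  # (-20.0, 0.75)
```

Other criteria, such as the point closest to the top-left corner of the ROC curve, can give different cut-offs. The choice should be stated when thresholds are reported.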

Table 5 Metabolic response to chemotherapy and unsuspected incurable disease at surgery

mNR was therefore classified into five categories:

- no avid nodal disease;
- complete metabolic response (CMR);
- partial metabolic response (PMR; reduction in mN stage, or ≥30 % reduction in nodal SUVmax);
- stable metabolic disease (SMD);
- progressive metabolic disease (PMD; increase in mN stage, or ≥30 % increase in nodal SUVmax).

This classification strongly and independently predicted incurable disease at surgery: nodal SMD, OR 17.26 (1.85-160.76; p = 0.012); nodal PMD, OR 126.45 (8.19-1951.99; p < 0.001; Table 5). The association remained significant for patients staged using the more recent PET-CT scanner, and was borderline for the first scanner (SMD, p = 0.089). Overall, 43 patients had FDG-avid nodes (but no apparent metastases) despite NAC. Incurable disease was encountered in 11 of these (25.6 %), compared with 15/197 (7.61 %) without avid nodes (p = 0.002; Fisher’s exact test).
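The mNR categories can be sketched as a decision function. The text leaves the order of the rules open in some cases, for example a fall in mN stage alongside a rise in nodal SUVmax. This sketch assumes that a change in mN stage takes precedence, and that the 30 % SUVmax rule applies only when the stage is unchanged. Stages are coded 0-2 for mN0-mN2, and the names are hypothetical.

```python
from typing import Optional


def nodal_response(pre_stage: int, post_stage: int,
                   pre_suvmax: Optional[float] = None,
                   post_suvmax: Optional[float] = None) -> str:
    """Metabolic nodal response (mNR) category; stages coded 0, 1, 2 for mN0-mN2."""
    if pre_stage == 0 and post_stage == 0:
        return "no avid nodal disease"
    if post_stage == 0:
        return "CMR"  # avid nodes before NAC, none afterwards
    if post_stage > pre_stage:
        return "PMD"  # increase in mN stage (including new avid nodes)
    if post_stage < pre_stage:
        return "PMR"  # reduction in mN stage
    # Same stage: fall back on the change in the most avid node's SUVmax.
    if pre_suvmax is None or post_suvmax is None or pre_suvmax <= 0:
        raise ValueError("nodal SUVmax before and after NAC is required")
    ratio = post_suvmax / pre_suvmax
    if ratio <= 0.7:
        return "PMR"  # >= 30 % reduction
    if ratio >= 1.3:
        return "PMD"  # >= 30 % increase
    return "SMD"


print(nodal_response(1, 1, 8.0, 5.0))  # PMR
```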

Predictive models for progression to metastases

LR comprised FDG-avid length and mN stage (Supplementary Table 8). DTA partitioned using FDG-avid length (<3.15 cm). A useful ANN could not be generated. No model could identify patients whose risk of progression was below Pt, who could therefore forgo restaging PET-CT (Supplementary Table 6). However, using the optimal predicted-probability threshold (0.06, determined by ROC analysis), LR predicted progression with 87.5 % sensitivity and 72.6 % specificity. On independent validation, these were 44.4 % and 87.7 %, respectively.

Predictive models for incurable disease at surgery after re-staging PET-CT

Using restaging PET-CT variables, DTA identified patients with either an impassable tumour or mN stage ≥1 after NAC as high risk. This was 76.9 % sensitive and 76.6 % specific (75.0 % and 92.5 % on independent validation). LR comprised tumour impassability, FDG-avid length, SUVmax, and mN stage, and was highly discriminant (AUC 0.903, r² = 0.471). Using a probability threshold of 0.122, sensitivity was 92.3 % and specificity 86.3 % (61.5 % and 93.1 % on independent validation), with minimal over-fitting (Supplementary Table 7). The optimal ANN had no utility.

Using mTR and mNR, DTA identified patients with an impassable tumour or with avid nodes despite NAC (i.e. nodal PMR/SMD/PMD). This was 76.9 % sensitive and 76.6 % specific (75.0 % and 92.5 % on independent validation), effectively identical to the DTA above. LR comprised tumour impassability and mNR. It performed worse than the LR using absolute restaging values (r² = 0.314; AUC = 0.814), being 76.9 % sensitive and 76.6 % specific (69.2 % and 83.1 % on independent validation), with minimal over-fitting. A useful ANN could not be validated. Composite models of post-NAC and dynamic variables had no additional benefit (data not shown).

Simple risk stratification using FDG-avid nodes after NAC, and an impassable tumour

The presence of avid nodes after NAC and an impassable tumour before NAC appeared to be the most reliable predictors of incurable disease at surgery. Three risk groups were therefore derived: low (neither factor; n = 181), medium (one factor; n = 58), and high (both factors; n = 8). The risk of incurable disease increased markedly across groups: low (7/181; 3.87 %), medium (13/58; 22.4 %), and high (6/8; 75.0 %) (p < 0.001, Fisher’s exact test).
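The three-tier stratification reduces to counting the two risk factors. The sketch below is illustrative only. The observed rates are the counts reported above, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# Observed incurable disease at surgery: (events, patients), from the text.
OBSERVED: Dict[str, Tuple[int, int]] = {
    "low": (7, 181),
    "medium": (13, 58),
    "high": (6, 8),
}


def risk_group(impassable_before_nac: bool, avid_nodes_after_nac: bool) -> str:
    """low = neither factor, medium = one, high = both."""
    return ("low", "medium", "high")[int(impassable_before_nac) + int(avid_nodes_after_nac)]


group = risk_group(impassable_before_nac=True, avid_nodes_after_nac=False)
events, n = OBSERVED[group]
print(group, f"{100 * events / n:.1f} %")  # medium 22.4 %
```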

Discussion

Approximately 5000 oesophagectomies are performed annually in the UK and USA [30, 31], equating to over 4000 restaging scans. We found by both indirect and direct comparisons that PET-CT was more sensitive for progression than CT, associated with a reduced radiation dose and minimal additional cost. A new classification of FDG-avid nodal stage (mN) before NAC predicted disease progression to metastases on restaging scan. mN stage after NAC and a new classification of mNR plus primary tumour impassability and FDG-avidity predicted unsuspected incurable disease at surgery (in contrast to mTR). These variables and derivative LR and DTA models identified patients (before and after NAC) at high risk of progression and abandoned resections, demonstrating that avid nodes after NAC and an impassable tumour beforehand can easily and powerfully stratify patient risk.

Whilst avid nodes predict worse prognosis in oesophageal cancer [14], cholangiocarcinoma [32], and uterine carcinoma [33], we believe this to be the first association with disease progression. Intriguingly, this was independent of traditional N staging: patients with ≥ N1 mN0 disease seem to have the same low risk as those with N0 mN0 disease. This suggests that FDG-avidity is an important surrogate marker of more aggressive metastatic clones, although it might be explained in part by over-estimation of nodal stage by EUS [34]. Risk increased with nodal burden. We classified this analogously to the TNM 7th edition N stage (mN0, mN1, and mN2), although we could not reliably generate an mN3 stage.

Future questions include whether the proportion of EUS-identified nodes that are avid has predictive ability, and whether mN stage and response predict pathological response, recurrence, and survival. More urgently, however, these findings have implications for tailoring neoadjuvant therapy on the basis of interval metabolic response. Such tailoring is performed on the basis of mTR alone [35]. However, as FDG-avid nodal metastases might respond differently, mNR should be considered urgently.

An impassable tumour (at baseline) strongly predicted incurable disease at surgery, independent of T stage and FDG-avid length. This might represent clonal evolution within a larger tumour, or nutritional compromise (although we routinely support these patients via jejunostomy tube feeding). Progression was more likely with a longer FDG-avid tumour (perhaps again representing volume), but unresectable disease was associated with a lower SUVmax (presumably representing less FDG-avid metastases from less FDG-avid primaries). Whilst we were able to assess a number of additional and composite metrics (such as MTV and TGV [17]), it may be that other PET [36] and non-PET metrics have utility [37, 38].

The high rate of incurable disease we encountered at surgery suggests that, whilst restaging PET-CT is preferable to CT, it is insufficient in isolation (perhaps owing to limitations in primary tumour avidity, as discussed). A lack of evidence and individual case variation, however, preclude general recommendations. Nevertheless, it seems logical that patients with thoracic disease that is either impassable at OGD or ≥ T3 may require additional cross-sectional restaging imaging. The morbidity of laparoscopy is sufficiently low [39] that restaging laparoscopy (including the lesser sac) should be considered in all patients with distal oesophageal/GOJ disease, perhaps even before PET-CT. In this group, even the lowest-risk patients (those with passable tumours and mN0 disease after NAC) had a 3.87 % risk of incurable disease at surgery. In the UK National Health Service, this would be cost-neutral: 9.52 procedures (£14,613) would be required to prevent one abandoned oesophagectomy (£12,274).

Our rate of such incurable disease at surgery is relatively high compared with three recent studies: 0/89 (0.00 %) [40], 1/46 (2.17 %) [12], and 2/57 patients (3.57 %) [13]. This is potentially related to differences in patient selection and treatment. First, all three studies used neoadjuvant chemoradiotherapy rather than NAC, potentially resulting in improved local control. Second, stage differed, particularly the proportion with N1 disease before therapy. This could not be ascertained from two studies [12, 40], and the other included considerably fewer node-positive patients [13]. Third, nodal avidity after therapy is important: in one study no patients had residual FDG-avid nodes [13], whilst the other two did not report this.

Our study has a number of limitations. These include its opportunistic single-centre retrospective design, an indirect comparison of two modalities, and use of two PET-CT scanners. We controlled for confounding variables including scanner, replicated findings for individual scanners, and generated models from the later group with validation in the earlier one. The ideal design for comparing the two modalities would be a prospective within-subjects study, with patients undergoing both CT and PET-CT. However, such a trial would be very unlikely to be deemed ethically appropriate. Indeed, we believe our evidence, derived by comparing overlapping cohorts within a single high-volume centre, would make it even less so, despite being retrospective. We therefore performed an unblinded post hoc within-subjects analysis in patients with progression, assessing the PET and CT components independently; this supported our overall conclusions.

Additionally, the relative infrequency of events resulted in wide regression confidence intervals, and we could not precisely ascertain the utility of restaging PET-CT in (initially) non-avid tumours. Whilst these broad intervals limit precision, we believe the associations to be genuine on the basis of consistently strong associations on direct comparison. Consequently, LR models may be less generalizable than DTA, which do not rely on precise effect sizes. PET-CT detected twice as many instances of progression, but this difference was not significant on indirect comparison using Fisher’s exact test, most likely reflecting the smaller CT sample. However, we believe these rates to be representative. Of the metastases evident on PET-CT, a similar proportion was evident on the CT component (plus PET) as on the PET component alone. Inevitably, further potential sources of bias are technological (in particular slice thickness) and the clinical evolution towards triple-agent NAC during this period, although we adjusted for these variables.

We used three modelling techniques to balance their respective advantages and disadvantages, which are discussed more fully in the supplementary methods [27, 28]. The restaging PET-CT LR model outperformed DTA for unresectable disease. However, its lower independent validation performance highlights the difficulty of partitioning data using continuous variables, compounded by wide confidence intervals. By contrast, DTA partitioned on pragmatic and more reliable categories (an impassable tumour and the presence of avid nodes despite NAC), with better validation performance. All models and markers require further external validation before being used to guide decision-making. However, we believe patients with either an impassable tumour or avid nodes after NAC can be easily identified to stratify risk.

In conclusion, restaging with PET-CT rather than CT appears significantly more sensitive for disease progression after NAC, with the caveat that the comparison spans different time periods. New classifications of avid nodal stage (mN) and response (mNR), in addition to tumour impassability and perhaps primary tumour FDG-avidity, can identify patients at risk. There is, therefore, a strong argument for evaluating nodal avidity in addition to traditional N stage within the TNM staging manual. There is also a case for considering the interval metabolic response of both the primary tumour and nodal metastases. However, additional staging remains necessary in high- and medium-risk patients at a minimum. The most likely modalities are restaging laparoscopy (including the lesser sac) and cross-sectional imaging beyond CT [41].