Introduction

Lung cancer remains the leading cause of cancer mortality in the United States [1]. Traditional prognostic factors for non-small cell lung cancer (NSCLC) include age, performance status (PS), weight loss [2], and stage [3]. 18F-Fluorodeoxyglucose positron emission tomography (FDG-PET) scans are now routinely used for staging in patients with newly diagnosed NSCLC, resulting in great potential to identify imaging-based prognostic biomarkers for these patients.

Metabolic tumor volume (MTV) has emerged as a prognostic factor for patients with NSCLC in multiple studies [4–9]. MTV reflects the volume of tumor tissue that demonstrates increased metabolic activity on FDG-PET, and may more accurately represent an individual patient’s overall tumor burden. Likewise, other groups have found that the maximum standardized uptake value (SUVmax) is a prognostic factor [10], but this result has not been universally consistent [11]. However, MTV has been shown to be independently prognostic for survival [4–7, 9].

We tested the hypothesis that higher pre-treatment total metabolic tumor volume (tMTV-pre) is associated with worse overall survival (OS) in patients with NSCLC. We examined this question in patients enrolled in the American College of Radiology Imaging Network (ACRIN, now ECOG-ACRIN) 6668/Radiation Therapy Oncology Group (RTOG, now NRG Oncology) 0235 trial.

Materials and methods

Patient selection

ACRIN 6668/RTOG 0235 was a multicenter National Cancer Institute-funded prospective study. Each participating institution obtained institutional review board approval before accrual, and all patients provided written, study-specific informed consent. The current study is a secondary analysis of this data set.

Details on the study patients, treatment specifics, and FDG-PET scans can be found in the primary manuscript [12]. Patients with medically inoperable stage III (or selected inoperable stage IIB) NSCLC were eligible. All patients were treated with platinum-based chemoradiotherapy with definitive intent. Adjuvant chemotherapy was allowed. Pre-treatment FDG-PET scans were performed on ACRIN-qualified scanners. Post-treatment FDG-PET scans were required to be performed within 12–16 weeks after completion of therapy, using the same scanner as that used for the pre-treatment scans.

Measurement of metabolic tumor volume

Computer-aided MTV measurement was performed using RT_Image, an open-source software application [13]. The primary investigators (JGB and BWL) did not have access to the pre-treatment or post-treatment FDG-PET scan diagnostic reports. A maximum-intensity projection view of the images permitted rapid visual identification of the hypermetabolic lesions (primary tumor and lymph nodes). On the pre-treatment FDG-PET scans, each lesion interactively identified by the user was then volumetrically segmented by the automatic software algorithm, which defines the segmented volume as all connected voxels with intensity greater than a lesion-specific adaptive threshold of 60 % of the SUVpeak within the lesion. In the ACRIN 6668 study, the SUVpeak was defined as the mean SUV within a circular region of interest (0.75–1.5 cm in diameter) that encompasses the SUVmax [12]. We chose a diameter of 1 cm for all lesions, to ensure consistency. Total MTV (tMTV-pre) was the sum of the volumes of all segmented lesions in milliliters.
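The thresholding step described above can be illustrated with a minimal sketch. It is not the RT_Image implementation: the sparse dictionary image, the 6-connected neighbourhood, the voxel volume, and the function names `segment_lesion` and `total_mtv` are all assumptions made for illustration, and the SUVpeak of each lesion is taken as a given input rather than computed from a 1-cm region of interest.

```python
from collections import deque

# Face-adjacent (6-connected) neighbours; the connectivity rule is an assumption.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def segment_lesion(suv, seed, suv_peak, fraction, voxel_ml):
    """Grow a region from `seed` over connected voxels whose SUV exceeds
    fraction * suv_peak. `suv` maps (x, y, z) voxel indices to SUV values.
    Returns (set of voxels, volume in mL)."""
    threshold = fraction * suv_peak
    if suv.get(seed, 0.0) <= threshold:
        return set(), 0.0
    region = {seed}
    queue = deque([seed])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBOURS:
            nxt = (x + dx, y + dy, z + dz)
            if nxt not in region and suv.get(nxt, 0.0) > threshold:
                region.add(nxt)
                queue.append(nxt)
    return region, len(region) * voxel_ml


def total_mtv(suv, lesions, fraction, voxel_ml):
    """Sum the segmented volumes of all user-identified lesions.
    `lesions` is a list of (seed voxel, lesion SUVpeak) pairs."""
    return sum(segment_lesion(suv, seed, peak, fraction, voxel_ml)[1]
               for seed, peak in lesions)


# Toy image with two lesions (hypothetical values, 0.5 mL per voxel).
toy_suv = {(0, 0, 0): 10.0, (1, 0, 0): 7.0, (2, 0, 0): 5.0,
           (5, 5, 5): 8.0, (5, 6, 5): 3.0}
lesions = [((0, 0, 0), 10.0), ((5, 5, 5), 8.0)]
tmtv_pre = total_mtv(toy_suv, lesions, fraction=0.6, voxel_ml=0.5)
print(tmtv_pre)  # 1.5
```

In this toy case the first lesion keeps two voxels (SUV above 6.0) and the second keeps one (SUV above 4.8), so the total is three voxels of 0.5 mL each.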

In the post-treatment setting, an absolute SUV threshold (based on 50 % of the pre-treatment SUVpeak within each lesion) was used to calculate the post-treatment MTV. An absolute threshold based on the pre-treatment SUVpeak was chosen for the post-treatment scans, because the residual SUVpeak of responding lesions often falls to near background levels, and residual FDG uptake on post-treatment FDG-PET scans often represents radiation pneumonitis rather than residual/recurrent tumor, both of which lead to segmenting excessive volumes when thresholding relative to the post-treatment SUVpeak. The highest absolute SUV threshold was used to calculate the MTV of any new lesions on the post-treatment FDG-PET scans. The total post-treatment MTV (tMTV-post) was defined as the sum of the MTV for all prior and new lesions.
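A minimal sketch of the post-treatment thresholding rule follows. It assumes that the "highest absolute SUV threshold" is taken across the same patient's pre-treatment lesions, which the text does not state explicitly; for brevity it also counts all voxels above threshold within a lesion rather than tracing connected regions. The function names, lesion labels, and voxel values are hypothetical.

```python
def post_treatment_thresholds(pre_suv_peaks, fraction=0.5):
    """Absolute SUV threshold for each previously identified lesion,
    derived from that lesion's pre-treatment SUVpeak."""
    return {lesion: fraction * peak for lesion, peak in pre_suv_peaks.items()}


def new_lesion_threshold(thresholds):
    """Threshold applied to new lesions: the highest of the patient's
    absolute thresholds (assumed interpretation)."""
    return max(thresholds.values())


def volume_above(suv_values, threshold, voxel_ml):
    """Volume (mL) of the voxels in a lesion whose SUV exceeds the threshold."""
    return sum(1 for v in suv_values if v > threshold) * voxel_ml


# Hypothetical patient: two pre-treatment lesions and one new lesion.
pre_peaks = {"primary": 12.0, "node": 8.0}
thresholds = post_treatment_thresholds(pre_peaks)   # primary 6.0, node 4.0
post_voxels = {"primary": [7.0, 6.5, 3.0], "node": [3.5, 2.0]}
new_voxels = [6.2, 5.0]

tmtv_post = sum(volume_above(post_voxels[name], thresholds[name], 0.5)
                for name in post_voxels)
tmtv_post += volume_above(new_voxels, new_lesion_threshold(thresholds), 0.5)
print(tmtv_post)  # 1.5
```

Because the thresholds are fixed from the pre-treatment scan, a lesion that has responded completely (the node here) contributes 0 mL instead of being re-segmented relative to its own near-background uptake.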

Follow-up

Patients were followed for a minimum of 2 years or until death. Outcomes collected included OS and local-regional control (LC). In ACRIN 6668/RTOG 0235, LC was determined by the treating institution. A local-regional failure was defined as a failure within the irradiated primary lung tumor and/or the involved regional lymph node fields. The study investigators were blinded to patient outcomes.

Statistical analysis

We hypothesized that patients with higher tMTV-pre would have worse OS than patients with lower tMTV-pre. Secondary hypotheses included the association between tMTV-pre and LC and between tMTV-post and OS. For pre-treatment measures, survival time and local-regional failure time were defined as the interval from study registration to either event occurrence or patient censoring. In examining time to local-regional failure, patients who died prior to recording a local-regional failure were treated as censored. For post-treatment measures, survival time was defined from the date of the post-treatment PET scan.
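The time-to-event definitions above can be written out as a short sketch. The function names and the days-per-month conversion factor are assumptions for illustration; the trial database may have encoded these fields differently.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # average month length; an assumed conversion factor


def months_between(start, end):
    """Elapsed time between two dates in months."""
    return (end - start).days / DAYS_PER_MONTH


def os_record(start, death, last_follow_up):
    """(time, event) for overall survival. `start` is the registration date
    for pre-treatment measures or the post-treatment PET date for
    post-treatment measures. event = 1 for death, 0 for censoring."""
    if death is not None:
        return months_between(start, death), 1
    return months_between(start, last_follow_up), 0


def lc_record(registration, failure, death, last_follow_up):
    """(time, event) for local-regional failure. Patients who die without
    a recorded local-regional failure are censored at the date of death."""
    if failure is not None:
        return months_between(registration, failure), 1
    end = death if death is not None else last_follow_up
    return months_between(registration, end), 0


# Hypothetical patient who died one year after registration without a failure.
registration = date(2006, 1, 1)
death = date(2007, 1, 1)
os_time, os_event = os_record(registration, death, None)
lc_time, lc_event = lc_record(registration, None, death, None)
```

For this patient the OS record carries an event, whereas the local-regional failure record has the same follow-up time but is censored.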

Kaplan–Meier curves were generated to depict the association between tMTV and clinical outcome. Median tMTV was used as the cutpoint to stratify patients into high and low tMTV groups, with separate curves reported by time point (pre-treatment vs. post-treatment). Log-rank tests were performed to compare outcomes between groups.
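As an illustration of the median split and the product-limit estimate, a minimal pure-Python sketch is given below. The patient tuples are invented, the function names are hypothetical, and the log-rank comparison is omitted; a statistical package would normally be used for both steps.

```python
from statistics import median


def split_by_median(patients):
    """Split (tmtv, time, event) tuples at the median tMTV.
    Returns (low group, high group, cutpoint); low means tMTV <= cutpoint."""
    cut = median(p[0] for p in patients)
    low = [(t, e) for v, t, e in patients if v <= cut]
    high = [(t, e) for v, t, e in patients if v > cut]
    return low, high, cut


def kaplan_meier(records):
    """Product-limit estimate from (time, event) pairs.
    Returns [(event time, survival probability just after that time)]."""
    curve, surv = [], 1.0
    at_risk = len(records)
    for t in sorted({time for time, _ in records}):
        deaths = sum(1 for time, event in records if time == t and event == 1)
        leaving = sum(1 for time, _ in records if time == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Invented cohort: (tMTV in mL, follow-up in months, 1 = died / 0 = censored).
patients = [(10, 30, 0), (20, 25, 1), (32, 28, 1),
            (40, 20, 0), (50, 12, 1), (90, 8, 1)]
low, high, cut = split_by_median(patients)
km_low = kaplan_meier(low)
km_high = kaplan_meier(high)
```

Here the cutpoint is 36 mL, and each group's curve drops to two thirds and then one third at its two observed deaths; the high-tMTV group simply reaches those steps much earlier.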

Univariate Cox proportional hazards regression models were used to evaluate tMTV as a predictor of clinical outcome. In addition, multivariable Cox proportional hazards regression models including both PET metrics (tMTV, SUVpeak, and number of hypermetabolic lesions) and clinical variables (age, gender, performance status, clinical stage, radiation dose, and chemotherapy regimen) were used to identify independent predictors of clinical outcome. For tMTV-pre, separate models are reported by clinical outcome (OS or LC); for tMTV-post, a single model for OS is reported. Model diagnostics related to the Cox model were assessed by means of scaled Schoenfeld residuals [14]. In cases where there was evidence of non-proportional hazards, time-dependent coefficients were added to the model through an appropriate interaction with time.

A Bonferroni correction was used to account for multiple comparisons. A p value threshold of 0.05/3 = 0.017 was used to declare statistical significance for the three fitted models. All statistical analyses were performed using SAS 9.4 (SAS Institute, Inc., Cary, NC, USA), Stata 13.1 (StataCorp LP, College Station, TX, USA), and R v3.1.0 (R project, http://www.r-project.org/) software.
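The adjusted threshold is simple arithmetic, sketched here for concreteness. The function names and the two example p values are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, threshold):
    """True when a p value falls below the adjusted threshold."""
    return p_value < threshold


threshold = bonferroni_threshold(0.05, 3)
print(round(threshold, 3))               # 0.017
print(is_significant(0.011, threshold))  # True
print(is_significant(0.026, threshold))  # False
```

A p value such as 0.026 would pass the conventional 0.05 level but not the adjusted one, which is why such results are described as only marginally significant.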

Results

Patient population

In the primary study, a total of 250 patients were accrued among 37 institutions between June 2005 and May 2009 [12]. Sixteen patients were deemed ineligible. Of the remaining 234 patients, 230 had FDG-PET scans available for analysis. Patient characteristics are shown in Table 1. The majority of these patients had clinical stage IIIA/IIIB disease. While the radiotherapy intent was definitive in this study, 19 patients (8.3 %) received <60 Gy (6–59.4 Gy). Of these, ten (53 %) received between 56 and 59.4 Gy. Post-treatment FDG-PET scans were available for 176 patients (Table 1).

Table 1 Baseline patient characteristics and treatment data

MTV measurements

SUVs were reported previously [12]. eFigure 1 summarizes the agreement between SUVmax and SUVpeak recorded in this secondary analysis compared with the ACRIN Core Lab reads. Agreement was better for SUVmax than for SUVpeak, but for both measures, the SUV for the secondary analysis was higher on average than the ACRIN Core Lab reads.

The median number of hypermetabolic lesions identified on the pre-treatment FDG-PET scan was 2 (range, 0–17). Pre-treatment SUVpeak and tMTV-pre were 0 in 2/230 patients (0.9 %), because no identifiable lesions were found. The median (range) and mean (standard deviation) for tMTV-pre were 32 mL (0–649 mL) and 66.7 mL (95.5 mL), respectively.

Based on the post-treatment FDG-PET scan, the corresponding median (range) and mean (standard deviation) for tMTV-post were 0 mL (0–323 mL) and 12.4 mL (45.0 mL). tMTV-post was scored as 0 for 64 % (112/176) of patients; 19 % (34/176) were found to have new lesions on the post-treatment FDG-PET scan.

Pre-treatment tMTV and OS

Figure 1a demonstrates that patients with tMTV-pre >32 mL (the median value) had significantly worse OS than did patients with tMTV-pre ≤32 mL (p < 0.001, median 14.8 vs. 29.7 months). On univariate analysis, every 10-mL increase in tMTV-pre was associated with a 3 % increase in risk of death (HR = 1.03, 95 % confidence interval [CI] 1.02–1.05, p < 0.001). On multivariate analysis, tMTV-pre (p < 0.001), PS (p = 0.003), and radiotherapy dose (p = 0.011) were significantly associated with OS (Table 2). Number of pre-treatment hypermetabolic lesions (p = 0.026) and patient age (p = 0.053) were marginally significant. However, we found that the effect of both PS and number of pre-treatment hypermetabolic lesions varied with time, implying that the prognostic ability of these baseline measures diminishes over time.
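Because the Cox model treats tMTV-pre as a continuous covariate with a log-linear effect, the univariate hazard ratio per 10 mL can be rescaled to other volume differences. The sketch below does this for a 50-mL difference; it starts from the rounded published estimates, so the results are approximate, and the function name is hypothetical.

```python
import math


def hazard_ratio_for_difference(hr_per_unit, unit_ml, difference_ml):
    """Rescale a hazard ratio reported per `unit_ml` to `difference_ml`,
    assuming a log-linear effect of volume on the hazard."""
    beta_per_ml = math.log(hr_per_unit) / unit_ml
    return math.exp(beta_per_ml * difference_ml)


# Univariate estimate and 95 % CI per 10 mL, rescaled to a 50-mL difference.
hr_50 = hazard_ratio_for_difference(1.03, 10, 50)
ci_low_50 = hazard_ratio_for_difference(1.02, 10, 50)
ci_high_50 = hazard_ratio_for_difference(1.05, 10, 50)
print(round(hr_50, 3))  # 1.159
```

Under these assumptions, a patient whose tMTV-pre is 50 mL larger has roughly a 16 % higher hazard of death, with an approximate interval of 1.10 to 1.28.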

Fig. 1

Overall survival (OS) by pre-treatment and post-treatment metabolic tumor volume. a Patients are divided based on the median value of pre-treatment tMTV (tMTV-pre). b Patients are divided based on the median value of post-treatment tMTV (tMTV-post), with OS time defined from the post-treatment PET scan

Table 2 Results of a main-effects multivariate Cox proportional hazards regression model with pre-treatment total MTV (tMTV-pre) as a continuous covariate, along with potential confounders, for predicting overall survival (OS)

Exploratory analyses of pre-treatment MTV and OS

In addition to the main effects model presented in Table 2, we identified a significant statistical interaction between tMTV-pre and radiation dose (p = 0.002). For comparison, the model including this interaction term is presented in eTable 1. As dose delivered increases, the negative prognostic impact of tMTV-pre decreases. Figure 2 shows that the negative impact of tMTV-pre >32 mL is retained at all dose levels (≤60 Gy, 61–69 Gy, and ≥70 Gy).

Further analysis was performed to investigate the interaction between tMTV-pre and radiation dose (Fig. 3). Figure 3a shows that patients with low tMTV-pre (tMTV-pre ≤32 mL) had no difference in OS between radiation dose strata. However, among patients with high tMTV-pre (tMTV-pre >32 mL), patients who received ≤60 Gy experienced worse survival outcomes than did those who received >60 Gy (p = 0.001, Fig. 3b). This effect was marginally significant when five patients who received <56 Gy were excluded (p = 0.081, eFigure 2a). When ten patients with high tMTV-pre who received less than the protocol-specified dose of at least 60 Gy were excluded, there was no statistically significant difference in survival (eFigure 2b).

Fig. 2

Overall survival (OS) for the low (tMTV-pre ≤32 mL) vs. high (tMTV-pre >32 mL) tMTV-pre groups at varying dose levels. a Patients treated with doses of 60 Gy or less are divided into the low and high tMTV-pre groups. b Patients treated with doses of 61–69 Gy are divided into the low and high tMTV-pre groups. c Patients treated with doses of 70 Gy or higher are divided into the low and high tMTV-pre groups

Fig. 3

Overall survival (OS) by radiation dose delivered in the low- and high-pre-treatment metabolic tumor volume groups. a Patients with pre-treatment tMTV (tMTV-pre) below the median (32 mL, “low tMTV-pre group”) are divided based on radiation dose. b Patients with tMTV-pre above the median (32 mL, “high tMTV-pre group”) are divided based on radiation dose

Pre-treatment tMTV and LC

On univariate analysis, higher tMTV-pre was associated with worse LC. However, based on evidence of non-proportional hazards, the effect of tMTV-pre for LC varied with time (prognostic ability diminished over time). In particular, although prognostic at baseline (HR 1.06, 95 % CI 1.02–1.10, p = 0.001) and at 6 months (HR 1.04, 95 % CI 1.02–1.06, p < 0.001), by 1 year the tMTV-pre value was no longer prognostic for LC (HR 1.02, 95 % CI 0.99–1.04, p = 0.205). This suggests that for patients still in local control, a 1-year-old tMTV measured pre-treatment may no longer be useful for predicting future LC. On multivariate analysis, tMTV-pre and radiation dose remained significant prognostic factors for LC (eTable 2). PS was marginally significant at the adjusted significance threshold (p = 0.023). Again, the effect of tMTV-pre, number of pre-treatment hypermetabolic lesions, and PS diminished over time.
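The diminishing effect can be illustrated with a time-dependent coefficient. The sketch below assumes that the log hazard ratio changes linearly with time, which is one common form of interaction with time but is not stated in the text; it anchors the line at the reported baseline and 1-year hazard ratios and checks the value implied at 6 months. The function name is hypothetical.

```python
import math


def time_varying_hr(hr_start, hr_end, t_end, t):
    """Hazard ratio at time t when the log hazard ratio is assumed to change
    linearly from log(hr_start) at time 0 to log(hr_end) at time t_end."""
    beta0 = math.log(hr_start)
    slope = (math.log(hr_end) - beta0) / t_end
    return math.exp(beta0 + slope * t)


# Anchored at the reported baseline (1.06) and 12-month (1.02) estimates.
hr_6_months = time_varying_hr(1.06, 1.02, 12.0, 6.0)
print(round(hr_6_months, 2))  # 1.04
```

The interpolated 6-month value rounds to the reported estimate of 1.04, so the three published hazard ratios are at least consistent with a coefficient that declines linearly in time.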

A significant statistical interaction was found between radiation dose and tMTV-pre (p = 0.017): the negative prognostic impact of high tMTV-pre decreased with increasing radiation dose (Fig. 4). As with the OS results, this effect is no longer statistically significant when patients who received <60 Gy are excluded (eFigure 3).

Fig. 4

Local-regional control (LC) by radiation dose in the low- and high-pre-treatment metabolic tumor volume groups. a Patients with pre-treatment tMTV (tMTV-pre) below the median (32 mL, “low tMTV-pre group”) are divided based on radiation dose. b Patients with tMTV-pre above the median (32 mL, “high tMTV-pre group”) are divided based on radiation dose

Post-treatment tMTV and OS

Figure 1b demonstrates that patients with post-treatment total MTV (tMTV-post) >0 mL (the median value) had significantly worse OS (measured from the post-treatment scan) than patients with no residual tMTV-post (p < 0.001, median 11.6 vs. 25.8 months). On multivariate analysis, tMTV-post was no longer statistically significant after adjusting for post-treatment SUVpeak (eTable 3). These two PET metrics were also highly correlated (Spearman correlation = 0.85). Only PS (p = 0.007) and post-treatment SUVpeak (p = 0.001) remained significant prognostic factors of OS. Patient age was marginally significant using the adjusted significance threshold (p = 0.054).

Discussion

This secondary analysis of ACRIN 6668/RTOG 0235 demonstrates that tMTV-pre was highly prognostic for OS in a group of uniformly treated patients with stage III NSCLC in a multi-institutional setting. The effect of tMTV-pre on OS was independent of PS, stage, age, gender, and SUVpeak. Additionally, our findings suggest that the negative effect of higher tMTV-pre values on OS may be diminished by increasing the radiation dose delivered to these tumors. Lastly, we also found that tMTV-pre was prognostic for LC.

Multiple studies have demonstrated an association between OS and tMTV-pre in patients with NSCLC [4–7, 9]. However, most of these studies included patients with all stages of disease and were performed at a single institution. The significance of the current analysis is that we have shown a prognostic impact of tMTV-pre specifically in a subgroup of NSCLC patients with inoperable stage III disease. Stage III NSCLC represents a very diverse group of patients. The value that tMTV-pre adds in this group is that it may be useful in future studies to select for the highest-risk patients in whom to investigate more aggressive treatment regimens including radiation dose escalation, consolidation chemotherapy, or the addition of novel targeted and/or immunotherapy agents.

Ohri et al. recently reported that pre-treatment MTV was prognostic for OS and LC among patients treated in ACRIN 6668/RTOG 0235 [15]. There are some key differences between the analyses in that study and those of our study, most notably that the definition of MTV was different. Ohri et al. used a semi-automatic gradient-based method to calculate MTV, whereas we used a lesion-adaptive relative threshold model based on the SUVpeak within each lesion (primary tumor and each lymph node). The gradient-based method leads to larger values of MTV, as indicated by the fact that the median pre-treatment MTV in the study by Ohri et al. was nearly twice that in the present analysis (57 mL vs. 32 mL). Despite this substantial difference in techniques, the principal message that has become evident from these and other studies is that MTV is a strong indicator of overall disease burden, and that it is highly prognostic for survival outcomes.

It remains an open question whether a particular definition of MTV is optimal. The difference in techniques between our study and the analysis by Ohri et al. leads to the hypothesis that the relative threshold method for defining MTV may reflect the most metabolically active areas within each lesion, whereas the gradient-based method more closely approximates the size-based gross tumor volume (GTV) typically delineated on computed tomography, which was the first volumetric marker of tumor burden found to be prognostic [16, 17]. This hypothesis is supported by the interesting finding in our study of a significant interaction between radiation dose and tMTV-pre for both OS and LC.

We found that the negative prognostic impact of tMTV-pre diminished as the radiation dose delivered increased (Fig. 2), suggesting that one way to approach patients with a large disease burden initially is to escalate the radiation dose. Patients with low tMTV-pre had similar OS regardless of radiation dose delivered (Fig. 3a). In contrast, it appears that for patients with high tMTV-pre, it is important that a dose of at least 60 Gy is delivered (eFigure 2). Simple escalation to higher doses, however, may not be sufficient for further improving OS in patients with large tMTV-pre. Nevertheless, Fig. 4 suggests that LC in patients with high tMTV-pre continues to improve with doses ≥70 Gy, though there is insufficient power to demonstrate statistical significance.

Based on the results of RTOG 0617, which demonstrated no improvement—and possible worsening—of OS and LC with 74 vs. 60 Gy given with concurrent chemotherapy in patients with inoperable stage III NSCLC [18], radiation dose escalation above 60–66 Gy is not the current standard of practice. One of the proposed hypotheses for the unexpected results of RTOG 0617 is that the cardiac and pulmonary toxicity associated with higher radiation dose may have contributed to the findings. However, with the increasing use of PET/CT for radiotherapy treatment planning purposes (either obtaining PET/CT in the treatment position or using software to fuse the PET/CT images to the CT images acquired at treatment planning), it may be possible to escalate the dose selectively to the high-risk PET-positive areas, which would allow for lower radiation doses to the surrounding normal critical structures. The use of tMTV-pre as defined in this study could be one way to define the high-risk PET-positive region. RTOG 1106/ACRIN 6697 is currently investigating the feasibility of dose escalation guided by mid-RT PET/CT [19].

Another distinction between the current study and that of Ohri et al. is that we analyzed the impact of post-treatment MTV on OS and found it to be an adverse prognostic factor. However, as in the primary analysis of the ACRIN 6668/RTOG 0235 data set, SUV was the strongest prognostic marker for OS in the post-treatment setting. The definition that we used for tMTV-post was largely based on an absolute SUVpeak threshold. Thus, post-treatment SUVpeak and tMTV-post were highly correlated, unlike the corresponding pre-treatment parameters. It is not surprising, then, that on multivariate analysis, SUVpeak but not tMTV-post remained prognostic for OS, indicating that the tMTV-post does not add independent information beyond the SUVpeak. We did not analyze the relationship between tMTV-post and LC, because patients with measurable tMTV-post likely already have local recurrence or radiation pneumonitis. While some post-treatment PET/CT imaging biomarkers may have a role in identifying patients with local-regional recurrences after chemoradiation, we feel that the strongest role for MTV is in the pre-treatment setting, as it can be used to help identify patients at highest risk of both death and local failure earlier in their disease and treatment course.

There are several limitations of our study. First, this was a hypothesis-generating, unplanned, retrospective analysis. We had no pre-specified cutpoint for separating the cohort into high- and low-tMTV-pre groups. As such, a prospective study (similar in design to ACRIN 6668/RTOG 0235) that uses a pre-specified cutpoint for tMTV-pre would be ideal for confirming our findings. This could be incorporated as a secondary endpoint in future stage III NSCLC clinical trials. Also, the local-regional control endpoint was reported by each institution but was not confirmed by central review. Given the intrinsic difficulty in interpreting post-treatment PET/CT images, scored local failures may have been confounded by both false-positive and false-negative findings. Improved methods for assessing local control after chemoradiation are needed, and we suggest the use of other PET tracers, such as the proliferation tracer 3′-deoxy-3′-18F-fluorothymidine (FLT). Lastly, the analyses of outcome by radiation dose delivered were also unplanned post hoc comparisons that arose from the observation of an interaction between tMTV-pre and dose. Ideally, in order to incorporate radiation dose into a survival model using time since registration, one would want to use a time-dependent covariate where the dose values are updated over time (i.e., a counting-process style of input structure); however, the method of data collection for the trial precluded this, as we did not have dates for every radiation dose administration. This raises the potential for bias (akin to immortal time bias [20]), although we did undertake sensitivity analyses excluding patients below either 56 or 60 Gy (eFigures 2 and 3). Ideally, this finding should also be confirmed in future studies, with data collection structured so as to allow for the incorporation of radiation dose as a time-dependent covariate in survival modelling of pre-treatment FDG-PET measures.

In conclusion, higher tMTV-pre is associated with significantly worse OS and LC in inoperable stage III NSCLC treated with definitive chemoradiotherapy. Our findings suggest that for patients with large tMTV-pre, achieving a therapeutic dose of radiation may help maximize OS. Overall, tMTV-pre is a valuable biomarker that may be used to stratify patients for risk-adapted therapies in stage III NSCLC.