Introduction

Glioblastoma (GBM) is the most common malignant brain tumor, with a median overall survival (OS) of 12–18 months [1]. The current standard of care for newly diagnosed GBM involves maximal surgical resection, concurrent radiation therapy (RT) and temozolomide (TMZ), followed by adjuvant/maintenance TMZ [2]. Despite this aggressive treatment, almost all newly diagnosed GBMs relapse. However, evidence suggests that a nontrivial proportion of tumors exhibiting early radiographic progression following chemoradiation may not have growing tumor, but instead may have increased contrast enhancement that mimics tumor progression, termed “pseudoprogression” (PsP) [3,4,5,6]. Although the mechanisms of radiation-induced CNS changes are quite complex [7], PsP is thought to be part of a continuum of treatment-related changes, ranging from early subacute inflammation to radionecrosis, that typically occurs within months after radiotherapy [6]. While PsP was described more than 30 years ago [8,9,10], differentiation of PsP from true tumor progression continues to be a significant diagnostic challenge in neuro-oncology.

Studies have estimated the occurrence of PsP to range from 3 to 35% of patients treated with chemoradiation [11,12,13,14,15,16,17,18,19,20,21], with more than 90% of tumors with MGMT promoter methylation exhibiting PsP [12]. However, the true incidence of PsP during standard chemoradiation remains unknown, largely due to the tendency to treat patients at the first signs of radiographic progression, before true progression has been verified. Some studies included only cases with early progression occurring far less than 6 months (e.g., 2 months) after completion of radiotherapy [16, 17], even though PsP is known to present up to 6 months after radiotherapy [4]. Further, the definition of PsP varies across the literature. While most studies reporting the incidence of PsP included only WHO grade IV tumors, some studies also included WHO grade III tumors [14, 17, 22]. While the classical definition of PsP is largely retrospective in nature, with PsP defined as imaging changes that resolve over time following initial radiographic progression, one might consider an alternative, more clinical definition of PsP that includes (1) early radiographic progression within 6 months after radiotherapy followed by (2) a relatively long post-progression survival. This definition can mitigate the variability arising from differences in the frequency of follow-up MRI and can include patients who progressed near 6 months after radiotherapy and would otherwise have been excluded from reports of PsP frequency under the classical definition. The rationale is that PsP may be a favorable feature, in that the larger the treatment-related changes, the better the potential outcome.
Accordingly, we hypothesize that patients with early disease progression and long overall survival following chemoradiation, or “clinically-defined PsP”, may have distinct clinical, molecular, and imaging features that separate them from “true progressive disease (PD)”, defined as early radiographic progression followed by a short overall survival after completion of chemoradiation. To test this hypothesis, we quantified characteristics of newly diagnosed GBM exhibiting a short progression-free survival (PFS < 6 months) within the control (standard chemoradiation) arm of the phase III AVAglio trial (NCT00943826), then determined which features differentiate patients with short residual overall survival (ROS < 12 months) from those with long residual overall survival (ROS > 12 months).

Materials and methods

Patient population and data acquisition

A total of 463 patients from 120 institutions and 23 countries with pathologically confirmed, newly diagnosed GBM in the supratentorial region, all from the placebo (control) arm of a multicenter phase III trial (AVAglio, ClinicalTrials.gov #NCT00943826), were included in the current retrospective analysis. All patients on the control arm received concurrent radiation therapy and TMZ followed by adjuvant TMZ for up to 6 cycles until first recurrence [23].

In the current study, patients with early PD, defined as PFS within 6 months, were interrogated to identify clinical, molecular, and imaging characteristics associated with short or long ROS after first radiographic progression. The date of progression was determined centrally by an independent radiologic facility according to the clinical trial guidelines [23]. Because the AVAglio trial began in 2009, before the current Response Assessment in Neuro-Oncology (RANO) criteria were described in 2010, the trial used comparable criteria based on a modification of the Macdonald criteria that included qualitative assessment of non-enhancing tumor in addition to enhancing tumor. Patients with early PD were categorized into 2 groups: those with “true PD” (ROS < 1 year) and those with “clinical pseudoprogression” (ROS > 1 year). Clinical and molecular features were also gathered. Patients with known IDH-mutant tumors were excluded from analyses.
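As an illustration only, the grouping rule described above can be sketched as a small Python function. The function and argument names, the handling of an ROS of exactly 12 months, and the return value for non-qualifying patients are assumptions made for this sketch; they are not taken from the trial analysis.

```python
from typing import Optional

EARLY_PD_PFS_MONTHS = 6.0   # early PD: progression within 6 months
ROS_CUTOFF_MONTHS = 12.0    # residual overall survival cut-off (1 year)


def classify_early_pd(pfs_months: float, ros_months: float,
                      ros_censored: bool = False) -> Optional[str]:
    """Return 'true PD', 'clinical PsP', or None if no group can be assigned.

    Assumption: an uncensored ROS of exactly 12 months is counted as
    'true PD'; the text only specifies ROS < 1 year and ROS > 1 year.
    """
    if pfs_months >= EARLY_PD_PFS_MONTHS:
        return None  # not an early progressor
    if ros_months > ROS_CUTOFF_MONTHS:
        return "clinical PsP"  # followed for more than 12 months after progression
    if ros_censored:
        return None  # censored before 12 months: group cannot be determined
    return "true PD"


# Hypothetical patients, for illustration only
print(classify_early_pd(4.0, 8.0))    # early progression, short ROS
print(classify_early_pd(4.0, 15.0))   # early progression, long ROS
```

Returning `None` for patients censored before 12 months mirrors the exclusion of such patients reported in the Results; any other handling of censoring would change the group sizes.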

Magnetic resonance imaging and post-processing

Anatomic MR images were acquired for all patients in the current study on 1.5-T or 3-T clinical MR scanners, using pulse sequences supplied by the respective manufacturers and following local standard-of-care protocols [23]. Pre-contrast axial T1-weighted fast spin-echo or 3D gradient echo sequences were acquired along with T2-weighted fast spin-echo and fluid-attenuated inversion recovery (FLAIR) sequences. In addition, parameter-matched T1-weighted images were acquired after injection of gadolinium-based contrast agent.

Following determination of progression by a central independent radiologic facility for the primary trial endpoints, post-hoc analysis was performed by a separate imaging core lab. For this post-hoc analysis, non-enhanced T1-weighted, T2-weighted, and FLAIR images were first linearly registered to contrast-enhanced T1-weighted images. Next, contrast-enhanced T1-weighted subtraction maps were created by normalizing image intensity for both non-enhanced and contrast-enhanced T1-weighted images, then voxel-by-voxel subtraction between the normalized non-enhanced and contrast-enhanced T1-weighted images was performed.
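The subtraction step can be illustrated with a minimal, standard-library Python sketch operating on flattened, already co-registered voxel lists. The normalization scheme is not specified in the text, so z-score normalization is used here purely as an assumption, as is the direction of subtraction (contrast-enhanced minus non-enhanced, so that enhancement is positive). The function names are hypothetical.

```python
from statistics import mean, pstdev


def normalize(voxels):
    """Z-score normalization (an assumed scheme; the text does not name one)."""
    mu = mean(voxels)
    sigma = pstdev(voxels)
    if sigma == 0:
        raise ValueError("a constant image cannot be normalized")
    return [(v - mu) / sigma for v in voxels]


def subtraction_map(pre_t1, post_t1):
    """Voxel-by-voxel difference of normalized post- and pre-contrast T1 images.

    Both inputs are flat lists of intensities for the same voxels, i.e. the
    images are assumed to be registered to each other beforehand.
    """
    if len(pre_t1) != len(post_t1):
        raise ValueError("images must contain the same number of voxels")
    pre_n = normalize(pre_t1)
    post_n = normalize(post_t1)
    return [post - pre for pre, post in zip(pre_n, post_n)]


# Toy example: only the last voxel enhances after contrast injection
pre = [100, 102, 98, 100, 101, 99]
post = [100, 102, 98, 100, 101, 180]
smap = subtraction_map(pre, post)
print(smap.index(max(smap)))  # index of the most strongly enhancing voxel
```

In practice the same arithmetic would be applied to full 3D volumes with an array library; the list-based form is used here only to keep the sketch self-contained.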

Segmentation of contrast-enhanced T1-weighted digital subtraction maps and FLAIR

Three mutually exclusive volumes-of-interest (VOIs) were defined on acquired images using a semi-automated thresholding method described previously [24,25,26]: (a) contrast-enhancing tumor defined by T1-weighted subtraction maps; (b) central necrosis defined by T1 hypointensity within areas of contrast enhancement and resection cavity; and (c) T2 hyperintense regions on FLAIR images, excluding areas of contrast enhancement and necrosis. A team of trained lab technologists created initial VOIs, and all final VOIs were reviewed by a neuroradiologist (A.H.) with 9 years of experience in neuroimaging analysis.

Influence of tumor location using analysis of differential involvement (ADIFFI)

In order to determine whether there were any differences in baseline tumor location between patients exhibiting true PD and those exhibiting clinical PsP, ADIFFI analysis was performed as outlined previously [27, 28]. A 2-tailed Fisher’s exact test was used to evaluate a 2 × 2 contingency table comparing two differential phenotypes and tumor versus non-tumor tissue for each image voxel at baseline. The technical details are documented in Supplementary Note 1.
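The per-voxel test can be sketched as follows. This is a generic two-tailed Fisher's exact test written with the Python standard library, not the ADIFFI implementation itself (whose details are in Supplementary Note 1); the function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance guards against floating-point ties
    total = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def voxel_p_value(n_tumor_a, n_a, n_tumor_b, n_b):
    """P-value for one voxel: tumor vs. non-tumor counts in two phenotypes."""
    return fisher_exact_two_sided(n_tumor_a, n_a - n_tumor_a,
                                  n_tumor_b, n_b - n_tumor_b)


# Hypothetical voxel: tumor present in 20 of 104 patients in one group
# and in 2 of 65 patients in the other
print(voxel_p_value(20, 104, 2, 65))
```

Because this test is repeated for every voxel, a whole-brain analysis also needs some form of multiple-comparison control, which this sketch does not attempt.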

Radiomic analysis

Next, radiomic analysis was performed on images at the time of suspected radiographic progression to determine whether there are specific patterns or textures that may differentiate patients with long versus short ROS. We limited radiomic analysis to patients with at least a 40% increase in enhancing tumor volume between baseline and progression, so that we were confident these patients had large tumors with well-documented tumor growth relative to the start of treatment. Patients were randomly divided into a training cohort and a validation cohort at a ratio of 2 to 1, while preserving the ratio of true PD to clinical PsP in both groups. Technical details of radiomic analysis are described in Supplementary Note 2. Supplementary Note 3 provides detailed information on the radiomic features, in accordance with the Imaging Biomarker Standardization Initiative [29].
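A stratified 2:1 split of this kind can be sketched with the Python standard library as shown below. The function name, the fixed random seed, and the class counts in the usage example are assumptions for illustration; the actual procedure is described in Supplementary Note 2.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split patient indices into training and validation sets per class.

    Each class is shuffled and divided separately, so the ratio of
    classes is approximately preserved in both cohorts.
    """
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, valid = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_fraction)
        train.extend(indices[:n_train])
        valid.extend(indices[n_train:])
    return sorted(train), sorted(valid)


# Hypothetical class counts, chosen only to illustrate the split
labels = ["true PD"] * 60 + ["clinical PsP"] * 42
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```

Splitting within each class, rather than over the pooled cohort, is what keeps the proportion of clinical PsP similar in the training and validation sets.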

The radiomic model developed on the training dataset was then tested on the validation dataset. We also applied clinical data (age, sex, and KPS score) to see if this information improved the performance of the radiomic analyses. Further, we also performed multivariate logistic analysis based on clinical data (age, sex, and KPS score) and contrast-enhancing and non-enhancing tumor volume at progression to predict clinical PsP.

The area under the curve from a receiver operating characteristic curve analysis, accuracy, sensitivity, and specificity for predicting clinical PsP were calculated using radiomic and multivariate logistic analyses for both training and validation datasets. The classification threshold was set at 0.5 for all models. Kaplan–Meier survival curves with a log-rank test to compare the ROS of groups predicted by radiomic and multivariate logistic models were used for both training and validation cohorts.
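For reference, the performance measures named above can be computed as in the following standard-library sketch, with clinical PsP coded as the positive class (1) and true PD as the negative class (0). Treating a predicted probability of exactly 0.5 as positive is an assumption, as are the function names and the toy data.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity and specificity at a fixed probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0  # assumed: ties count as positive
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, y_prob):
    """AUC as the probability that a positive case outscores a negative case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
print(classification_metrics(y_true, y_prob))
print(roc_auc(y_true, y_prob))
```

Note that the AUC does not depend on the 0.5 threshold, whereas accuracy, sensitivity, and specificity do.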

Statistical analysis

Demographic data, including age, sex, reasons for PD, MGMT status, types of surgery, and Karnofsky Performance Scale (KPS), were compared between patients with true PD and clinical PsP using the t-test or Chi-squared test. Contrast-enhancing tumor volumes at baseline before chemoradiation and at progression were compared between GBM showing true PD and clinical PsP using the Mann–Whitney U test. Simple linear regression analysis was performed to investigate the relationship between PFS and ROS or OS in true PD and clinical PsP. A two-tailed P < 0.05 was considered statistically significant. All statistical tests were performed using GraphPad Prism v9 or MATLAB R2018a.

Results

Of the 463 patients initially included in this study, two were known to have an IDH mutation and were excluded from analyses (178 patients had IDH status available [30]). Of the remaining 461 patients, 12 were censored for PFS within 6 months and were excluded because no date of progression was available. Among the remaining 449 patients, 226 (50.8%) progressed within 6 months of starting the trial. Of these 226 patients, 25 who died on the date of PD, 6 who were censored for ROS within 12 months after PD, and 26 who had incomplete or no post-contrast scans at progression were subsequently removed from the study (Fig. 1A). The remaining 169 patients with PD within 6 months were further analyzed (Fig. 1B) and divided into two groups based on their ROS. Group 1 (true PD) was composed of 104 patients (61.5%) with a PFS of less than 6 months and an ROS of less than 12 months after progression, while Group 2 (clinical PsP) was composed of 65 patients (38.5%) who progressed within 6 months but had an ROS longer than 12 months. Detailed demographics are shown in Table 1.

Fig. 1

A Flow chart (Sankey diagram) showing association between patients enrolled in the control arm and patients exhibiting early progressive enhancement. B Diagram showing association between patients exhibiting early progressive enhancement and MGMT methylation status

Table 1 Patient demographics

Patients presenting with “clinical PsP” were significantly younger than those presenting with “true PD” (median age (range), 54 (24–69) vs. 58.5 (24–74); Mann–Whitney U test, P = 0.003), MGMT methylation was associated more with “clinical PsP” (Chi-squared test, P = 0.02), and baseline KPS was higher in patients presenting with “clinical PsP” than in those presenting with true PD (Chi-squared test, P = 0.01). A similar proportion of true PD and “clinical PsP” patients exhibited any increase in tumor volume at the time of progression (86/104 or 82.7% vs. 53/65 or 81.5%) or more than a 40% increase in volume at the time of progression (65/104 or 62.5% vs. 43/65 or 66.2%). However, contrast-enhancing tumor volume was significantly lower in patients exhibiting clinical PsP compared with true PD both at baseline (Fig. 2A; median (interquartile range), 7.3 (4.3–13.3) mL vs. 14.0 (6.3–25.1) mL; Mann–Whitney U test, P = 0.002) and at the time of radiographic progression (Fig. 2B; median (interquartile range), 14.5 (6.9–24.3) mL vs. 21.4 (9.2–41.1) mL; Mann–Whitney U test, P = 0.02). Importantly, patients who did not experience an increase in tumor volume at recurrence relative to baseline (~20%) had some tumor shrinkage prior to progression, resulting in recurrence being identified with respect to the nadir measurement. Additionally, there was no significant linear relationship between PFS and ROS in patients with true PD (Fig. 2C; P = 0.79) or clinical PsP (P = 0.15). However, a significant linear relationship was observed between PFS and OS, both taken from the time of randomization, for patients determined to have true PD (Fig. 2D; P < 0.001), but not clinical PsP (P = 0.61).

Fig. 2

Contrast-enhancing tumor volume at A baseline and B progression. Significant differences were found between patients with true PD and clinical PsP both at baseline and at progression. Relationship between PFS and C residual (post-progression) overall survival (ROS) or D OS from randomization. A significant linear relationship was found only between PFS and OS in patients who presented with true PD

Next, we investigated the relationship between tumor location at diagnosis and whether patients experienced true PD or clinical PsP. In both true PD and clinical PsP, tumors were highly concentrated around the lateral ventricles (Fig. 3A, B). However, ADIFFI statistical analysis identified a cluster contacting the subventricular zone, with a volume of 7.4 mL within the right internal capsule, thalamus, putamen, globus pallidus, and temporal white matter and gray matter, that showed a significantly higher frequency of occurrence in patients exhibiting true PD compared to clinical PsP (Fig. 3C). No regions were identified to occur significantly more frequently in patients with clinical PsP compared with true PD.

Fig. 3

Voxel-wise frequency of tumor occurrence for A true PD (n = 104) and B clinical PsP (n = 65). C Cluster identifying a statistically higher frequency of tumor recurrence in patients exhibiting true PD compared with clinical PsP encompassing the right internal capsule, thalamus, right putamen, globus pallidus, and temporal lobe, contiguous with the subventricular zone near the right posterior lateral ventricle (MNI coordinates of the peak, X, Y, Z = 33, −30, 5)

A total of 102 patients (60%) showed at least a 40% increase in enhancing tumor volume between baseline and the time of radiographic progression. This cohort was then split into a training and validation dataset for radiomic analyses at the ratio 2:1. No significant difference was found in clinical and molecular characteristics between training and validation datasets, as summarized in Supplementary Table S1. Out of the 9,056 radiomic features, 8 radiomic features remained through the feature selection processes for differentiating between true PD and clinical PsP within the training cohort (Supplementary Table S2). When clinical data (age, sex, and KPS score) were added to the 9,056 radiomic features and selection processes were performed, only age was selected as a significant feature in addition to 8 radiomic features described above.

Supplementary Table S3 summarizes the prediction performance of the radiomic features, radiomic + clinical features, and multivariate logistic analysis based on clinical data in the training and validation datasets. For the training dataset, the accuracy of predicting clinical PsP was highest for the radiomic model with MRI features and age (82.4%), followed by the radiomic model with MRI features only (77.9%) and the multivariate logistic model with clinical features (64.7%). Interestingly, the accuracy of predicting clinical PsP in the validation dataset was higher for the radiomic model that contained only MRI features than for the radiomic model with MRI features and age (70.6% vs. 58.8%). The accuracy of multivariate logistic analysis in the validation dataset was also lower (55.9%). Figure 4 illustrates results of log-rank analysis performed on the prediction results from radiomic and multivariate logistic analyses. The predicted clinical PsP showed significantly better ROS than the predicted true PD based on the results of radiomic analysis performed with only MRI features (Fig. 4A; log-rank, P = 0.004, HR = 0.46 [95% confidence interval, 0.28–0.76]; median ROS, 19.3 vs. 9.3 months) and MRI features + age (Fig. 4B; log-rank, P < 0.001, HR = 0.35 [95% confidence interval, 0.22–0.58]; median ROS, 20.3 vs. 9.0 months) on the training dataset. On the validation dataset, the clinical PsP predicted by radiomic analysis with only MRI features was verified to have a significantly longer ROS than the patients with predicted PD (Fig. 4D; log-rank, P = 0.04, HR = 0.47 [95% confidence interval, 0.24–0.95]; median ROS, 14.2 vs. 7.2 months), while the clinical PsP predicted by radiomic analysis with MRI features and age did not show a significant association with ROS (Fig. 4E).
Clinical PsP predicted by multivariate logistic analysis with clinical features was not significantly associated with ROS on either the training (Fig. 4C) or the validation dataset (Fig. 4F).
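To make the reported median ROS values concrete, the following standard-library sketch shows how a Kaplan–Meier curve and its median can be obtained from follow-up times and event indicators. It is a generic product-limit estimator, not the software used in this study; the function names and the toy data are hypothetical, and the log-rank test and hazard ratios are not covered.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct death time.

    times: follow-up in months; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all patients whose follow-up ends at the same time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def median_survival(times, events):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None  # median not reached


# Toy data for illustration only (months, event indicator)
print(median_survival([2, 4, 6, 8, 10], [1, 0, 1, 0, 1]))
```

Censored patients contribute to the number at risk until their last follow-up but never lower the curve, which is why the median can remain unreached when many patients are censored.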

Fig. 4

Results of log-rank analysis based on the results of prediction by radiomic analysis [MRI features only (A, D) and MRI features + age (B, E)] and multivariate logistic analysis using clinical features (C, F) performed on the training (top row) and validation dataset (bottom row). Hazard ratios are shown for predicted clinical PsP in relation to predicted true PD with their 95% confidence intervals

Discussion

Treatment-related PsP continues to be a significant diagnostic challenge, and the lack of consensus on an objective definition is at least partially the reason. In the current study, we chose to use a more clinical definition of PsP that combines early radiographic progression (PFS < 6 months) with a relatively long ROS after progression (ROS > 12 months). This is consistent with current thought that PsP is a favorable phenomenon, and that treatment-related inflammation may result in a favorable outcome [6, 12]. While this is slightly different from most working definitions of PsP, which require resolution of imaging changes over time, PsP under the more traditional definition is almost impossible to validate retrospectively within clinical trials because most patients are taken off study or change treatments after radiographic progression. PsP is known to occur up to 6 months after radiotherapy [4], and our definition allows patients who progressed near 6 months after radiotherapy to be included when reporting the frequency of PsP.

Despite these slight differences in the definition of PsP, the current study estimated the rate of radiographic PD within 6 months to be around 42.1% and the rate of clinical PsP to be around 38.5% of patients with early PD, which appears consistent with the incidence described in previous, smaller studies by Jefferies et al. [31] and Chaskis et al. [21]. Importantly, this incidence is slightly higher than the incidence described by a meta-analysis [32]. Our slightly elevated incidence may be due to the different definitions of PsP. Additionally, the current study suggested that patients with tumors exhibiting MGMT promoter methylation had a significantly higher incidence of “clinical PsP” (17/30, 56.7%) compared with unmethylated tumors (34/104, 32.7%). This high prevalence in MGMT methylated tumors is consistent with the study by Brandes et al. [12], albeit at a considerably lower proportion than the one they reported (21/23, 91%). This can at least partially be explained by differences in the definition of PsP. The study by Brandes et al. is based on a more classical, retrospective definition, namely an immediate progressive enhancement event followed by stabilization or reduction in tumor size on subsequent observation, whereas the current study defines “clinical PsP” based on the discrepancy between a relatively short PFS and a relatively long post-progression survival. This group of patients is somewhat perplexing, as the meta-analysis from Alnahhas et al. [33] (and our observations) suggests patients with early PD are enriched in MGMT unmethylated patients (i.e. short PFS), while “clinical PsP” is more prevalent in MGMT methylated patients (i.e. long OS after chemoradiation).

The impact of tumor location and baseline tumor burden on time to tumor progression and post-progression survival is also worthy of independent investigation. Results from the current study suggest that “true PD” occurs significantly more frequently in tumors near the right internal capsule, thalamus, lentiform nucleus, and temporal lobe, including areas contiguous with the subventricular zone that are associated with poor outcomes [34,35,36]. Additionally, patients with higher enhancing tumor volume, patients who were older, and patients with lower baseline neurological function also had worse outcomes. While tumor size [25, 37], age [38], and neurological status [37,38,39] are known to affect patient outcomes, disentangling the interplay between anatomic tumor location and other factors, including tumor size and methylation status, remains a significant challenge.

A promising technique for differentiating treatment-related effects from tumor progression is radiomics, the extraction of image features from medical images [40]. Results from the current study suggest that a pure radiomic analysis of anatomic MR images at the time of progressive enhancement, within 6 months of starting chemoradiation, achieves 70.6% accuracy in predicting whether patients will have a post-progression survival longer than 12 months. Importantly, patients who experienced “true PD” showed a strong correlation between PFS and OS (Fig. 2D), supporting the hypothesis that time to progression is related to overall patient survival in GBM. One possible cause of the unsatisfactory accuracy of the radiomic analysis is that the MRI acquisition protocols were not standardized across the participating institutions [23], although this may actually be ideal for developing a universally applicable radiomic model. The current radiomic model may benefit from sophisticated algorithms for post-acquisition harmonization of images and radiomic features [41]. Additionally, incorporation of diffusion and perfusion MRI into the radiomics model may improve differentiation of clinical PsP and true PD [42]; however, this information was not collected in the current trial.

Conclusions

The current study examined the control arm of the AVAglio trial to identify clinical, molecular, and conventional MRI features associated with “clinically-defined” PsP, that is, patients exhibiting a short progression-free survival (PFS < 6 months) and a long residual overall survival after progression (ROS > 12 months). Results suggest that a combination of well-described characteristics, including tumor size, location, MGMT status, age, and KPS score, can identify newly diagnosed GBM patients at risk for true tumor progression.