Introduction

Radiomics is an emerging field in medicine and oncology involving quantitative feature analysis of radiographic images [1, 2]. Radiomics can provide insights into tumor biology as the imaging features can be correlated to the tumor microenvironment and genomic information, as obtained from histopathology and molecular studies, and thereby serve as a surrogate marker for treatment outcomes [3]. Gliomas are a heterogeneous group of tumors arising from glial cells and named after their resemblance to glial cell lineages such as astrocytoma and oligodendrogliomas and classified into four grades that characterize biological behavior [4]. For the work here, we refer to grade 2 infiltrative gliomas as low grade glioma (LGG), which are typically identified on magnetic resonance imaging (MRI) as T2-W hyperintense lesions. On MRI, distinct anatomical compartments can be identified in glioblastoma (GBM), including tumor enhancement appearing as contrast-enhancing regions on T1-weighted sequences, with a central non-enhancing necrotic core and the adjacent peritumoral region (PTR) appearing as a T2-weighted hyperintense area. Although the structural composition is quite distinct, the radiological appearance of LGG is similar to the PTR of GBM. In LGG, the areas of T2-weighted hyperintensity represent tumor, whereas GBM PTR is an admixture of microscopic infiltrative tumor and vasogenic edema [5, 6]. The spatial information regarding the presence of microscopic infiltrative disease within the PTR of GBM is indistinguishable from the edema using semantic MRI features. The use of computer vision and quantitative imaging analysis in the form of radiomics can unfold high-dimensional information in an image and can potentially outperform low-level visual feature interpretation. Since the structural composition of the LGG and the GBM PTR is different, an exploratory analysis was contemplated in the current study to investigate the ability to differentiate the two groups using a radiomics approach.

Our previous work demonstrated a radiomic-based approach using 270 features to categorize individual voxels in the PTR of GBM on a probabilistic scale to indicate infiltrative tumor or edema extrapolating feature characteristics from LGG and PTR of brain metastasis, respectively [7]. The hypothesis of the current study was the ability of radiomics to effectively distinguish between the T2W hyperintensity of LGG (tumor) from GBM PTR (tumor and edema). The presented work was undertaken to differentiate the entire region of GBM PTR from LGG, expanding the number of features (3822) using various filtration techniques. As opposed to 10 patients with GBM with post-operative MRI analyzed in the previous work, in this study, 42 patients with GBM were analyzed using preoperative index MRI. Also, in the present study, different feature selection techniques (all 270 features were used in the previous study for modeling) along with four machine learning classifiers (only support vector machine classifier in previous work) were investigated. The radiomics model can be explored in future studies to identify patches of microscopic disease within the PTR of GBM that can help reduce the clinical target volumes for radiation rather than treating the entire T2W signal changes.

Methods

Patient selection

This retrospective study was approved by the institutional research ethics review board, and the requirement of obtaining consent was waived. Adult patients (≥ 18 years of age) with IDH-wildtype GBM and LGG were identified from an institutional database from January 2014 to December 2018, with MRI data available from two identical scanners. For patients in the GBM cohort, preoperative MRI was used for analysis, usually undertaken within one week of surgery. The LGG cohort included grade 2 astrocytoma and oligodendroglioma. Patients with high-risk or atypical features on histopathological evaluation or higher grade features on radiology, like intervening contrast-enhancing regions, significant T2W heterogeneity, were excluded. Post-operative scans were included in the study for the LGG cohort, but patients with prior radiotherapy or chemotherapy were excluded. During segmentation, the surgical cavity and tract were manually excluded. Tumors with segments smaller than 1 cm on at least one dimension were excluded from the study.

MRI protocol

Imaging was acquired on a GE Signa HDxT 1.5 T MRI scanner. using an 8-channel head coil (General Electric Medical Systems, Waukesha, WI, USA) was used. Three sequences were included in the current study: Gadolinium (Gd) (Gadovist 01 mmol/kg, 10 ml maximum, bolus) enhanced 3D T1-weighted FSPGR (T1-CE); Gd-enhanced T2 fluid-attenuated inversion-recovery PROPELLOR (T2-FLAIR); and diffusion-weighted imaging-derived apparent diffusion coefficient maps (ADC). A single-shot echo-planar imaging sequence with 3 diffusion directions and a b-value of 1000 s/mm2 was used to obtain diffusion-weighted images, and ADC maps were reconstructed online using GE’s Functool. The MRI acquisition parameters are shown in Supplementary Table S1.

Image preprocessing and segmentation

Figure 1 shows the schema used for the current study. The T2-FLAIR and ADC scans were first resampled to the corresponding T1-CE volume to match the field-of-view and resolution using the FMIRB Software Library (FSL) tool FLIRT. A pre-trained artificial neural network-based automated method was used for skull stripping using HD-BET. The extracted brain volumes from the T2-FLAIR scans were rigidly registered to the corresponding T1-CE volumes using FLIRT. Data handling and scripting were performed in Matlab R2018b (The Mathworks, Inc., Natick, MA, USA). The T1-CE, T2-FLAIR, and ADC volumes were combined into single workspaces for each patient using the software platform ITK-SNAP (http://www.itksnap.org) for manual segmentation [8].

Fig. 1
figure 1

Schematic of research methodology illustrating initial image registration, segmentation of the areas of interest in the two cohorts, glioblastoma peritumoral region (GBM PTR) and low-grade glioma (LGG), image processing, and feature extraction

Segmentation was carried out manually initially by a radiation oncologist (AD), and all cases were individually reviewed by a neuroradiologist (PM) and a neuro-radiation oncologist (AS) in order to achieve final consensus. The segments in LGG included the tumor observed as T2-FLAIR hyperintensity, with the surgical cavity excluded for post-operative cases. In patients with GBM, the PTR was segmented to include the T2-FLAIR hyperintense region beyond the contrast-enhancing tumor core (Supplementary Fig. S1).

Normalization & feature extraction

Prior to feature extraction, z-score intensity normalization was performed for all images by centering all pixel intensities within each brain to zero and then dividing by the standard deviation. The normalized images were then multiplied by 100 and shifted by 300 to ensure that the ± 3σ majority of pixel intensities were non-negative. Fixed bin width (FBW) quantization was used to discretize pixel intensities within each segment [9, 10]. The FBWs and corresponding bin counts (BC)s for the T1c images was 13 (BCmedian = 52; BCrange = 16–87); for the T2f images was 20 (BCmedian = 27; BCrange = 16–43); and for the ADC images was 7 (BCmedian = 52; BCrange = 18–124).

Feature extraction was performed using PyRadiomics software V2.2.0 The feature set included the following: 18 first-order statistical features; 22 Gy level co-occurrence matrix (GLCM) features; 16 Gy level size zone (GLSZM) features; 16 Gy level run length matrix (GLRLM) features; 5 neighboring gray tone difference matrix (NGTDM) features; and 14 Gy level dependence matrix (GLDM) features. Additional features were extracted by pre-processing images, either wavelet or Laplacian of Gaussian (LoG). LoG features were extracted with kernel sizes of 1, 2, 3, 4, and 5 mm. All features were extracted from the segments in 3D. Pixel intensities outside the segments were set to zero prior to image filtration to reduce the effect of contamination. 91 features were derived from unfiltered images, 728 from wavelet filtered images and 455 from LoG filtered images, resulting in 1274 features per modality and a total of 3822 features. A detailed description of the features can be found on the Pyradiomics website (https://pyradiomics.readthedocs.io/en/latest/features.html).

Feature selection & classification

All model building steps were performed in python using scikit-learn V0.22.2 [11]. Three feature selection approaches were utilized. Two were filter-based methods: ANOVA F-Test and minimum redundancy maximum relevance (mRMR). Recursive feature elimination (RFE), a wrapper-based approach, was also tested. A linear support vector machine (SVM) classifier (regularization parameter C = 1) was used as the base learner, and at each iteration, the 5 least important features were eliminated from the total set until a pre-determined number of features was returned. For each feature selection method, sets of the top 4 features were included. This threshold was chosen based on the commonly employed rule-of-thumb that models should be trained on datasets that have at least 10 times as many training samples per class as the number of features to mitigate overfitting [12]. In order to demonstrate the impact of feature selection on classification performance, a fourth strategy used no feature reduction i.e. models were trained on all available features.

In order to prevent data leakage that can be present when the same samples used for feature selection are again used for model validation, a leave-one-patient-out (LOPO) cross-validation approach was utilized. This strategy precludes the identification of an “optimal” feature set, as different features can be selected as the training fold is permuted; however, it reduces the optimistic bias that can impact radiomics studies that make use of internal validation schemes. In lieu of identifying an “optimal” feature set, this approach can provide descriptive statistics of the selection frequency of different feature types and allow a degree of stability assessment with respect to feature selection. Features from the training data were scaled to zero mean and unit standard deviation at each LOPO iteration and the learned scaling parameters were applied to the features of the test patient.

Four machine learning classifiers were investigated in this study: support vector machine with a linear kernel (SVM); K-nearest neighbors (K-NN); linear discriminant analysis (LDA); and adaptive boosting using decision stumps as the base learner (AdaBoost). Hyperparameter selection was repeated at each LOPO cross-validation iteration through grid-search with nested fivefold cross-validation to maximize balanced accuracy across the nested cross-validation procedure. For SVM, the range of the regularization parameter C was 10–4 to 105 in multiples of 10; for K-NN, the range for the number of neighbors was from 1 to 11 in steps of 1; and for AdaBoost, the range for the number of trees was from 50 to 450 in steps of 100 (Supplementary Table S2). All other tunable model hyperparameters were left as their default values assigned by scikit-learn (https://scikit-learn.org/0.22/). LOPO model performance was quantified by accuracy, sensitivity, specificity, and the area under the receiver operator characteristic curve (ROC AUC).

Results

A total of 74 patients were included in the analysis, 42 patients with GBM with preoperative scans with 42 tumors, and 32 patients with LGG without high-grade features on imaging with a total of 36 tumors. Patient characteristics are summarized in Table 1. The median patient age was 62 years for the GBM cohort (range 46–79) and 41 years for the LGG cohort (range 22–71), with the majority male patients (GBM 28/42, LGG 24/32). Twenty-five (n = 25) (60%) of the GBM patients were O (6)-methylguanine-DNA methyltransferase (MGMT) methylated. In the LGG cohort, 25 (69%) of patients had tumors with an astrocytic lineage, and 11 (31%) had an oligodendrocytic lineage. In the LGG cohort, 27 (75%) were IDH mutated by immunohistochemistry, and 10 (28%) had 1p/19q co-deletion detected.

Table 1 Patient and disease charactersistics in the study

Classification

A heat map of the performance metrics for each combination of feature selection method and classifier is presented in Fig. 2, and the LOPO ROC curves are shown in Fig. 3. Among the three feature selection methods tested, RFE resulted in the highest ROC AUC for all four classifiers (0.868–0.916). The ANOVA F-test method resulted in the lowest accuracy scores for all classifiers (67.9–70.5). Since feature selection was performed independently at each LOPO iteration to prevent data leakage, the sets of selected features varied. With 4 selected features for each of the 74 LOPO iterations, the maximum number of unique features that could be selected was 296. Supplementary Figure S2 depicts a visual breakdown of the selected features by type across all LOPO iterations for each feature selection method. The majority of features selected were from T1-CE sequences (33–44%), followed by ADC (33–39%). The ANOVA F-test was the most stable method with respect to single patient permutation of the training set as only 9 unique features were selected, two of which were selected every time. In contrast, mRMR and RFE were less stable, with 62 and 46 unique features selected, respectively. Heat maps of the 5 most frequently selected features for each method along with their selection frequencies are shown in Fig. 4. Feature maps for 1 representative patient each from GBM and LGG for three selected features are presented in Supplementary Figure S3, showing significant differences on visual inspection.

Fig. 2
figure 2

Accuracy, sensitivity, specificity, and receiver operating characteristic area under curve (ROC AUC) analyses with four classifiers, Support Vector Machine (SVM), K-Nearest Neighbors (K-NN), Linear Discriminant Analysis (LDA), and adaptive boosting (AdaBoost) for all extracted features. Three feature selection algorithms were applied: ANOVA F-test (F-Test), Minimum Redundancy Maximum Relevance (mRMR), and Recursive Feature Elimination (RFE)

Fig. 3
figure 3

Receiver operating characteristic curves for each feature selection method using Support Vector Machine (SVM), k-Nearest Neighbors (K-NN), Linear Discriminant Analysis (LDA), and adaptive boosting (AdaBoost)

Fig. 4
figure 4

a shows the heat map of the 5 most frequently selected features for each feature selection method across 74 iterations of leave-one-patient-out cross-validation. Feature values were scaled from − 1 to 1 for visualization. b represents the parametric map with quantitative feature values in the regions of interest (GBM peritumoral region and low grade glioma) for the three most consistently selected features (all from T1W-contrast sequences) from the F-Test

In the cases of model development without feature selection, i.e. all 3822 features included, the accuracy of the SVM classifier matched the highest score when feature selection was applied (83.3%) while LDA (87.2%) and AdaBoost (88.5%) achieved their highest accuracies. Additionally, the highest ROC AUC for SVM, LDA, and AdaBoost was obtained when all features were used for training, with AUCs of 0.918, 0.915, and 0.958, respectively. In contrast, the K-NN methodology performed slightly worse in terms of both accuracy and ROC AUC as compared to the highest performance observed with feature selection.

Discussion

Radiomics has the potential to integrate clinical information and outcomes with radiology using computer analytics leading the way towards the development of novel biomarkers in oncology. Magnetic resonance imaging is considered the gold standard imaging modality for the management of central nervous system (CNS) malignancies, and it is particularly useful due to its high soft tissue contrast and the availability of different functional sequences, which can aid in tissue characterization [13]. Radiomic analysis has been widely undertaken in various CNS tumors, including gliomas, aiding in the differentiation between tumor histologies, grading, genetic profiling, and prognostication [14, 15]. For patients with GBM, the PTR has been described as the area around contrast enhancement identified as hyperintense region on T2 weighted and T2-FLAIR MRI sequences[16]. The complex microenvironment in the PTR region contains edema, vascular alterations, neoplastic cells, astrocytes, oligodendrocytes, inflammatory cells, microglia, stromal cells, pericytes [17], as well as GBM stem cells [18]. As approximately 90 percent of recurrences are witnessed in the PTR region [19], understanding tumor cell heterogeneity and microinvasion of cells in this region have the potential to impact treatment. Prasanna and colleagues [20] predicted survival outcomes using the top ten performing radiomic features in the PTR on routine MRI sequences. Despite the radiographic similarities on conventional MRI sequences, in contrast to GBM PTR, the architecture of LGGs is quite different. LGGs infiltrate surrounding brain tissue, but unlike GBM, they lack microvascular endothelial proliferation, necrosis and exhibit a lower Ki-67 proliferative index [21].

In this study, a radiomics approach using conventional MRI sequences distinguished GBM PTR from LGG with reasonable performance. The feature selection was done in this study using three approaches as there is a potential risk of overfitting as a total of 3822 features were extracted with 74 patients in the output group. The model was trained with features extracted from two groups as the PTR of the GBM (excluding the enhancing component of the tumor) and LGG. Although a chance of overfitting of classifiers exists given the relatively smaller sample size (n = 74), feature selection and cross-validation techniques provide confidence in mitigating a major risk of overestimation of performance indices. Achieving good results with AUCs over 0.70 across all the classifiers demonstrates the ability of the radiomics in differentiating the two cohorts. However, with an increase in sample size, the reliability and robustness of the models are expected to increase as well as the stability of the features can be ascertained in the future. Applying well-established classifiers with three feature selection algorithms to each of the sequences, T1-CE, T2-FLAIR, and ADC resulted in relatively high levels of discrimination. In terms of classification accuracy, performance ranged from 67.9% to 88.5% among the various models and feature selection approaches tested. Interestingly, the highest performing model was AdaBoost without using feature selection. The ROC AUC was also highest for SVM and LDA when all features were used for training as compared to models built using only 4 features. These models appear to handle high dimensional data as regularization and/or relative feature importance are implicit in their design. In contrast, the K-NN methodology performed worse when feature selection was not used prior to model training. K-NN strictly computes the distance of samples to its neighbors in the feature space and therefore assigns no relative importance to different features. When the number of features is large, the performance of local approaches such as K-NN is known to deteriorate, which is encompassed in the phenomenon referred to as the “curse of dimensionality.” A large number of features and the limited number of samples likely contributed to spurious associations between samples, which resulted in good specificity (86.1%) but poor sensitivity (69.0%). These results indicate that the risk overfitting attributable to feature dimension relative to sample size is both model and data-dependent and can be observed using robust cross-validation. In the LGG cohort, we have included both oligodendroglioma and astrocytoma. Some studies have shown the ability of radiomic analysis to differentiate between histology or molecular phenotype in LGG [22]. We have not undertaken an analysis to differentiate between the two histologies due to the lesser number of patients with individual tumor types. It will be interesting to perform future studies to investigate the ability of radiomics to differentiate between low-grade oligodendroglioma, astrocytoma, and the GBM PTR (arising from astrocytic lineage) individually.

Distinguishing between LGG tumors and the PTR region of GBM has potential clinical applications. Previous work had demonstrated a radiomic model could demarcate brain metastasis peritumoral region (prototype for vasogenic edema) from LGG (prototype for infiltrative tumor) [7]. The model was extrapolated on the individual voxels within PTR of GBM to identify areas of microscopic tumor, which was shown to positively correlate with future areas of recurrence. In the present study, we had included a higher number of patients with GBM using preoperative MRI, a large number of features were generated, multiple feature selection methods, and four machine learning classifiers were used for model development. The current work can be expanded to independently identify microscopic disease in the GBM PTR using the LGG-like signature. Also, the radiomics model can be integrated with the previous work to improve the classification performances and increase the reliability of a combined model. Some studies have shown the potential improvement in clinical outcomes with surgical excision of the PTR region in GBM [23, 24], which is currently studied in a Canadian phase 2 randomized trial (NCT04737577). The present study will be particularly important in the preoperative identification of infiltrative microscopic disease since the images in the GBM cohort had preoperative MRI as opposed to post-operative imaging in the previous work. Similarly, identifying areas of infiltrative disease within the GBM PTR region can help in individualized radiation planning. Generally, radiation volumes include large margins empirically to include the entire T2W hyperintense areas without spatial information on the microscopic disease, which can lead to increased toxicity. In a recent work, the use of small 5 mm margins beyond the gross tumor and cavity has shown survival outcomes similar to contemporary clinical trials [25]. The MRI-radiomics model will form an important basis in margin reduction in patients with GBM to be treated using an MRI-linear accelerator device (MR-LINAC) with GBM for a prospective phase 2 study of GBM (NCT04726397). Also, by identifying low grade disease within the PTR, this can allow differential dosing as low grade disease can be effectively treated with doses ranging from 45 to 54 Gy in 1.8–2.0 Gy per fraction, and the high grade components can be treated with 60 Gy in.1.8-2.0 Gy/day. This may reduce risks of radiation necrosis and help generating personalized clinical target volume delineation and dosing.

The work here has some limitations. This was a retrospective study with a limited sample size on one model of an MRI scanner. Future work can include the standardization of imaging features from different MRI scanners to validate the general applicability of such a model. In the LGG group, a proportion of images were obtained in the post-operative setting. Although the surgical tract and cavity were carefully excluded, changes including post-operative edema cannot be completely ruled out. The inclusion of probable areas of post-operative edema in the LGG group can potentially impact the features extracted since the rationale of inclusion of this group was to include tumor cells rather than edema. With the expansion of the study and availability of a higher number of patients, it will be important to limit the LGG cohort to pre-operative imaging to avoid such confounding factors. Further studies are needed to include a larger number of patients, incorporating other MRI sequences, and undertaking external validation approaches to establish a reliable and reproducible radiomics model.

Conclusion

A radiomics model using routine MRI sequences can discriminate between low grade gliomas and the glioblastoma peritumoral region with acceptable performance, despite their visual resemblance. The best results were obtained using an AdaBoost classifier with an AUC of 0.96 to differentiate between the two groups.