Introduction

Gliomas are the most common primary brain malignancies in adults. Lower grade gliomas refer to World Health Organization (WHO) grade II and III gliomas [1], and the response to treatment and survival of these patients differ significantly owing to genetic differences between lower grade gliomas [2]. Recently, the use of individualised cancer therapy has attracted considerable interest and has been extensively investigated. Molecular biomarkers reflect the genetic background of the tumour and provide potential targets for the development of individualised treatments for patients with gliomas.

Epidermal growth factor receptor (EGFR) is a key glioma molecular biomarker. EGFR overexpression can promote malignant proliferation of tumour cells, and a number of studies have focused on suppressing malignant proliferation by inhibiting its activity [3]. Nimotuzumab, an EGFR-targeted drug, has already been approved for the treatment of oesophageal cancer [4], advanced head and neck cancer [5,6,7] and glioma [8] in 30 countries worldwide [9]. Studies have shown that the EGFR variant III mutation may significantly influence tumour microenvironment changes and invasion. Hence, detection of EGFR overexpression is especially important for the personalised treatment of glioma patients. Conventional methods for detecting the EGFR expression status include immunohistochemistry and polymerase chain reaction, which are invasive assays that require tissue obtained by craniotomy.

Magnetic resonance imaging (MRI) is capable of detecting brain malignancy characteristics non-invasively. Preliminary studies suggest that MRI features (e.g. relative cerebral blood volume and tumour blood flow) are associated with EGFR expression in gliomas [10, 11]. Radiogenomics is a new field investigating the relationships between radiological features and genomic data [12] and allows radiological data to be transformed into high-dimensional information and features, thereby greatly enriching the quantitative information of the radiological features. Consequently, this helps analyse the relationships between radiological features and gene phenotypes. For instance, the O6-methylguanine-DNA-methyltransferase methylation status in glioblastoma can be predicted by a set of radiological texture features [13, 14].

In the present study, the texture features of gliomas were extracted from MRI data, and radiological features predicting the EGFR expression status were identified.

Materials and methods

Patients

A total of 270 patients with pathologically confirmed grade II or III glioma and available MRI and EGFR expression status data who were treated at our hospital between August 2005 and August 2012 were included in this study. The inclusion criteria were as follows: (1) pathologically confirmed lower grade glioma with no previous diagnosis of any type of brain tumour; (2) available EGFR expression status based on immunohistochemistry; (3) high-resolution pre-surgical T2-weighted MR images; (4) available clinical characteristics. Patients who had undergone preoperative treatment, those with other pathological types, or those with incomplete EGFR, imaging or clinical data were excluded. The patients were randomly assigned into training (n=200) and validation sets (n=70). The study was reviewed and approved by the institutional review board.
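The random assignment described above can be illustrated with a short Python sketch. The paper does not state how the randomisation was performed, so the shuffling approach, the fixed seed and the function name `split_patients` are assumptions made purely for illustration.

```python
import random

def split_patients(patient_ids, n_train, seed=0):
    """Randomly assign patients to a training set and a validation set."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed is a hypothetical choice for reproducibility
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# 270 patients in total, 200 of them assigned to the training set
train_ids, valid_ids = split_patients(range(270), n_train=200)
print(len(train_ids), len(valid_ids))  # 200 70
```

Each patient appears in exactly one of the two sets, so the validation set remains independent of the data used for feature selection.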

Immunohistochemistry of EGFR

Identification of EGFR expression was conducted using standard immunohistochemical methods. First, the glioma specimens were fixed in formalin, after which they were routinely processed and paraffin-embedded. Five-micron-thick sections were created, and antibodies specific for EGFR (Santa Cruz Biotechnology, Santa Cruz, CA, USA) were used to perform the immunohistochemical staining. All samples were analysed by two pathologists with experience in scoring central nervous system tumours. Less than 5% positive staining was identified as negative expression, while 6–100% positive staining was identified as positive expression. High and low EGFR expressions were defined as >30% and ≤30% positive cells, respectively [15].

Tumour segmentation

T2-weighted MR images were used for the extraction of radiomics features, as these images are well accepted in the identification of abnormal hyperintense signals that represent the involved regions of low-grade gliomas [16,17,18]. Most T2-weighted images were obtained with a Trio 3.0T scanner (Siemens, Erlangen, Germany). The T2-weighted image parameters were as follows: TR = 5800 ms; TE = 110 ms; flip angle = 150°; field of view = 240 × 188 mm²; voxel size = 0.6 × 0.6 × 5.0 mm³; matrix = 384 × 300. Lesions were delineated on the MR image by two experienced neurosurgeons who were blinded to the clinical information of the patients using MRIcron software (http://www.mccauslandcenter.sc.edu/mricro/mricron). Abnormal hyperintense signals on the T2-weighted images were identified as the tumour areas, while care was taken to avoid the cerebrospinal fluid signal, which may mask the tumour margins. A senior neuroradiologist re-evaluated the lesions afterwards. If the discrepancy was ≥5%, the senior neuroradiologist decided on the tumour borders [19].

Extraction of texture features

Quantitative radiological feature extraction was conducted as previously described [20]. A total of 431 texture features, which could be assigned to four groups, were extracted from each tumour (Fig. 1). Group 1 (first order statistics) consisted of 14 descriptors that quantitatively describe the distribution of voxel intensities within the MR image through commonly used, basic metrics. Group 2 (shape- and size-based features) contained eight three-dimensional features that reflect the shape and size of the tumour region. Group 3 (texture features) comprised 33 textural features, calculated from grey level run-length and grey level co-occurrence texture matrices, that quantify intra-tumour heterogeneity. Finally, group 4 (wavelet features) included the intensity and texture features (group 1 and group 3 features) derived from wavelet transformation of the original image. These four groups of features are listed in Supplementary Table 1.
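To make the group 1 descriptors more concrete, the following Python sketch computes a few first-order statistics from a list of voxel intensities. The exact definitions used in the study are those of reference [20]; the formulas below follow common radiomics conventions, and the function name, the `bin_width` parameter and the toy intensities are assumptions for illustration only.

```python
import math
from collections import Counter

def first_order_features(intensities, bin_width=1.0):
    """Compute a few first-order statistics from the voxel intensities of a tumour."""
    n = len(intensities)
    mean = sum(intensities) / n
    # Sample variance (divides by n - 1); some toolkits divide by n instead
    variance = sum((x - mean) ** 2 for x in intensities) / (n - 1)
    # Discretise intensities into bins to obtain a histogram of probabilities
    counts = Counter(math.floor(x / bin_width) for x in intensities)
    probs = [c / n for c in counts.values()]
    return {
        "range": max(intensities) - min(intensities),
        "variance": variance,
        "standard_deviation": math.sqrt(variance),
        "uniformity": sum(p * p for p in probs),
        "entropy": -sum(p * math.log2(p) for p in probs),
    }

# Toy example with four voxels (hypothetical values)
features = first_order_features([10, 10, 12, 14])
print(features["range"], features["uniformity"], features["entropy"])  # 4 0.375 1.5
```

A homogeneous region yields a small range and variance and a uniformity close to 1, whereas a heterogeneous region yields larger range, variance and entropy values.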

Fig. 1 The workflow of the radiogenomic analysis in lower grade gliomas. A set of textural features on magnetic resonance imaging (MRI) was identified as an epidermal growth factor receptor (EGFR)-associated radiological signature in the training set and was further tested in the validation set.

Texture-based EGFR prediction

A logistic regression model (Y = X1×β1 + X2×β2 + … + Xn×βn + k) was created using MATLAB (2014a) software. Here, Y is the estimated value of EGFR status (high expression is defined as 1, while low expression is defined as 0); X represents the value of each radiological feature; β is the unknown coefficient; and k is the unknown constant. Receiver operating characteristic (ROC) curve analysis was conducted to evaluate the prediction results. The logistic model was repeated and the variable that had the smallest predictive value for the EGFR expression level was excluded each time, until the smallest number of features that provided an area under the curve (AUC) of at least 0.9 remained. These radiomics features, selected from the training set, were defined as a signature and were subsequently evaluated in the validation set.
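The elimination procedure described above can be sketched as follows. This is a minimal Python illustration, not the MATLAB implementation used in the study. The gradient-descent fit, the use of the absolute coefficient as the measure of a variable's predictive value, all function names and the synthetic data are assumptions made for this sketch.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, cols, lr=0.5, epochs=300):
    """Fit one coefficient per column in cols plus a constant k by batch gradient descent."""
    beta = {c: 0.0 for c in cols}
    k = 0.0
    n = len(y)
    for _ in range(epochs):
        grad = {c: 0.0 for c in cols}
        grad_k = 0.0
        for row, target in zip(X, y):
            err = sigmoid(k + sum(beta[c] * row[c] for c in cols)) - target
            grad_k += err
            for c in cols:
                grad[c] += err * row[c]
        k -= lr * grad_k / n
        for c in cols:
            beta[c] -= lr * grad[c] / n
    return beta, k

def predict(X, beta, k):
    """Estimated value Y for each patient."""
    return [sigmoid(k + sum(b * row[c] for c, b in beta.items())) for row in X]

def auc(scores, labels):
    """Probability that a random positive case outscores a random negative case."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def backward_eliminate(X, y, min_auc=0.9):
    """Drop the weakest feature repeatedly while the training AUC stays >= min_auc."""
    cols = list(range(len(X[0])))
    beta, k = fit_logistic(X, y, cols)
    while len(cols) > 1:
        # Assumption: features share a comparable scale, so |beta| ranks their contribution
        weakest = min(cols, key=lambda c: abs(beta[c]))
        trial = [c for c in cols if c != weakest]
        trial_beta, trial_k = fit_logistic(X, y, trial)
        if auc(predict(X, trial_beta, trial_k), y) < min_auc:
            break  # removing this feature would push the AUC below the threshold
        cols, beta, k = trial, trial_beta, trial_k
    return cols, beta, k

# Synthetic data: feature 0 carries the signal, features 1-3 are noise
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[(2 * label - 1) * 1.5 + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)]
     for label in y]

selected, beta, k = backward_eliminate(X, y)
print(selected, round(auc(predict(X, beta, k), y), 3))
```

In practice the features would first be standardised so that coefficient magnitudes are comparable across features; the synthetic features here already share a similar scale.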

Statistics

The logistic regression model was established using MATLAB (2014a) software. The clinical characteristics of the two groups were compared using Student’s t test and the Chi-square test. The differences were considered significant at p < 0.05. ROC curve analysis was utilised to illustrate the prediction performance of the radiomic signature. The optimal cutoff value was selected as the point at which the sum of sensitivity and specificity was maximal. The AUC and prediction accuracy were calculated in both the training and validation sets.
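The cutoff criterion described above (maximal sensitivity plus specificity) can be sketched in Python as follows. The function name, the convention that a patient is classified as high expression when the estimated value is at or above the cutoff, and the toy scores are assumptions for illustration.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising sensitivity + specificity."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        # Assumed rule: predict high expression (1) when score >= cutoff
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best

# Toy estimated values and true labels (hypothetical)
result = youden_cutoff([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(result)  # (0.35, 1.0, 0.5)
```

When several cutoffs tie, this sketch keeps the lowest one, which favours sensitivity over specificity.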

Results

Patient demographic characteristics

High EGFR expression was detected in 49.0% (98/200) of patients in the training set, and in 48.6% (34/70) of patients in the validation set. No significant difference was found between patients with low and high EGFR expression with respect to age, sex or WHO grade in either the training or the validation set (p > 0.05). The detailed patient demographic characteristics are listed in Table 1.

Table 1 Clinical characteristics of the training and validation sets

Texture analysis

The texture features were extracted through the method described above, resulting in 431 texture features being obtained from each patient, including 14 first-order statistics, eight shape- and size-based features, 33 textural features, and 376 wavelet features derived from the first-order statistics (group 1) and textural features (group 3). These are shown in Supplementary Table 1.
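The feature counts above can be checked with simple arithmetic. The number of wavelet decompositions is not stated in the text; the value of eight used below is an assumption, inferred from the fact that 376 wavelet features divided by the 47 group 1 and group 3 features gives exactly 8.

```python
first_order = 14   # group 1
shape_size = 8     # group 2
textural = 33      # group 3

# Assumption: eight wavelet decompositions, each applied to group 1 and group 3 features
wavelet_decompositions = 8
wavelet = wavelet_decompositions * (first_order + textural)  # group 4

total = first_order + shape_size + textural + wavelet
print(wavelet, total)  # 376 431
```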

Texture-based EGFR prediction

Training set

A logistic regression model was fitted to the 431 texture features of the 200 patients in the training set. Variables that contributed less to the regression equation than others were excluded one by one. Next, the estimated value of the regression equation was used to predict the EGFR expression status, and a ROC curve was delineated. When 41 features remained, the estimated values of patients with low and high EGFR expression could be differentiated significantly (Fig. 2a). In the ROC curve analysis, the AUC was 0.90, and the optimal cutoff value (0.3934) exhibited a sensitivity, specificity and accuracy of 94.9%, 70.6% and 82.5%, respectively (Fig. 2b).

Fig. 2 Efficiency of EGFR prediction using radiomic analysis in the training set. (a) When 41 features remained, patients with different EGFR status could be differentiated effectively by the estimated values. The green line refers to the true values of the patients. The blue and red dots refer to the estimated values of the low and high EGFR expression groups, respectively. (b) In the receiver operating characteristic (ROC) curve analysis, the area under the curve (AUC) was 0.90. Using the optimal cutoff value, determined as the point when the sensitivity plus specificity was maximal (red dot), the sensitivity, specificity and accuracy were 94.9%, 70.6% and 82.5%, respectively.

The identified 41 EGFR-related radiological features could be regarded as a radiomic signature, comprising 25 first-order statistics or related wavelet features (such as range, standard deviation, uniformity, variance), one shape- and size-based feature (spherical disproportion), and 15 textural features or related wavelet features (such as sum variance, sum entropy, run percentage). All of these features are listed in Table 2.

Table 2 MRI radiological features for epidermal growth factor receptor (EGFR) expression prediction

Validation set

Next, these 41 features were applied to the validation set to predict the EGFR expression status. As a result, the distribution of the estimated values could be differentiated with high efficiency (Fig. 3a). In the ROC curve analysis, which was conducted for the evaluation of the prediction efficiency, the AUC was 0.95, and the optimal cutoff value (0.4334) exhibited a sensitivity, specificity and accuracy of 94.1%, 86.1% and 90.0%, respectively (Fig. 3b).
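The reported sensitivity, specificity and accuracy can be related to patient counts as in the sketch below. The paper reports only percentages and group sizes (98 of 200 and 34 of 70 patients with high expression); the true-positive and true-negative counts used here are back-calculated from those figures and should be read as inferred values, not as reported data.

```python
def metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Inferred counts: 98 high / 102 low in training, 34 high / 36 low in validation
training = metrics(tp=93, fn=5, tn=72, fp=30)
validation = metrics(tp=32, fn=2, tn=31, fp=5)

print([f"{100 * v:.1f}%" for v in training])    # ['94.9%', '70.6%', '82.5%']
print([f"{100 * v:.1f}%" for v in validation])  # ['94.1%', '86.1%', '90.0%']
```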

Fig. 3 Efficiency of EGFR prediction using radiomic analysis in the validation set. (a) Patients with different EGFR status could also be differentiated effectively using the 41 features. The green line refers to the true values of the patients. The blue and red dots refer to the estimated values of the low and high EGFR expression groups, respectively. (b) In the ROC curve analysis, the AUC was 0.95. Using the optimal cutoff point identified in the training set (red dot), the sensitivity, specificity and accuracy were 94.1%, 86.1% and 90.0%, respectively.

Discussion

In this study, EGFR-associated radiomic features were studied based on a large-scale imaging and genetics database. Through radiological feature extraction, feature selection and model construction, a radiogenomic signature that allows prediction of the EGFR expression level in patients with lower grade glioma was established. In the future, such non-invasive radiological information could provide crucial clinical information for the development of individualised treatment targeting EGFR.

Radiogenomics is an emerging field that explores the associations between radiological features on MRI and genetic characteristics of tumours. Many oncological studies have successfully linked radiological features with gene expression patterns [21]. For example, an 11-feature radiomic signature that could successfully stratify the survival of glioblastoma patients was selected from 12,190 features extracted from multiparametric and multiregional tumour volumes [22]. Additionally, using a random forest algorithm and a total of 2,970 features extracted from multimodal MR images of glioma patients, the isocitrate dehydrogenase mutation status of the patients was effectively predicted in one previous study [23]. Furthermore, radiogenomics can be applied to many other types of tumours. In an early classical study, neural network machine learning was used in patients with liver cancer to determine the association between texture features extracted from enhanced CT scans and gene expression patterns [24].

Further, arterial spin-labelling cerebral blood flow maps have been found to relate to EGFR variant III expression in gliomas [25]; a higher median relative cerebral blood volume has been shown to be associated with high EGFR expression [10]; maximum tumour blood flow and relative tumour blood flow have been shown to significantly and positively correlate with EGFR expression status [11]; and relative tumour blood volume, an MR-perfusion-weighted imaging-derived parameter, has been found to be significantly related to the EGFR variant III expression status [26]. Taken together, these findings suggest that the EGFR expression status and MRI features are closely related. These previous studies, however, only indicated that these parameters were related to EGFR expression instead of predicting it, owing to the lack of independent validation sets.

In the current study, 41 of the 431 texture features were selected using a logistic regression model. Consequently, the EGFR expression status could be predicted efficiently in the training set. Next, these 41 features were applied to the validation set, and the regression equation derived from the training set was found to remain effective in predicting the expression of EGFR. These results indicated that these 41 texture features correlate with the EGFR expression in lower grade gliomas. This is an improvement over the previous studies analysing individual imaging parameters associated with the EGFR expression status.

The 41 texture features obtained by the logistic regression model are also theoretically related to the expression status of EGFR. For example, one component of this signature is Spherical Disproportion = A/(4πR²). Here, A refers to the surface area of the tumour, while R is the radius of the sphere that has the same volume as the tumour [20]. This index quantifies how far the tumour shape deviates from a sphere. Recently, EGFR expression status was demonstrated to correlate with this shape characteristic in lung cancers, with this shape characteristic showing a strong ability to identify patients with the strongest EGFR immunostaining.
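The spherical disproportion defined above can be computed directly from a tumour's surface area and volume, as in the following Python sketch. The function name and the two test shapes are illustrative assumptions; R is obtained from the volume V as R = (3V/(4π))^(1/3).

```python
import math

def spherical_disproportion(surface_area, volume):
    """A / (4 * pi * R^2), where R is the radius of the sphere with the same volume."""
    r = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return surface_area / (4.0 * math.pi * r ** 2)

# A sphere of radius 2 gives exactly 1 (no deviation from sphericity)
sphere = spherical_disproportion(4 * math.pi * 2 ** 2, 4 / 3 * math.pi * 2 ** 3)

# A unit cube has more surface than the equal-volume sphere, so the value exceeds 1
cube = spherical_disproportion(6.0, 1.0)

print(round(sphere, 4), round(cube, 4))  # 1.0 1.2407
```

The value is 1 for a perfect sphere and grows as the surface becomes more irregular relative to the enclosed volume.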

Previous authors have speculated that the malignant proliferation of tumour cells and angiogenesis caused by high EGFR expression might contribute to the increased shape irregularity [27]. Additionally, as two group 1 features in this signature, the index Range refers to the range of voxel intensity values of the tumour, and the Variance refers to the voxel intensity value variability of the tumour. These two indicators may be associated with the non-uniformity levels within the tumour. Moreover, it has recently been reported that, in a murine glioma model, high EGFR expression related to increased tumour cell proliferation, reduced tumour cell adhesion to the extracellular substrate, and altered progenitor cell phenotype, suggesting that the EGFR expression status and tumour heterogeneity are indeed closely related [28]. Therefore, the Range and Variance may reflect the EGFR expression status. Of note, as glioblastomas (WHO grade IV) have distinct genetic characteristics from lower grade gliomas (WHO grade II and III), the radiomics features that associate with the EGFR expression level in glioblastomas are an interesting topic for further investigations.

Logistic regression combined with stepwise backward elimination, in which the variables that contribute little to the model are removed one at a time, is a simple and efficient method for reducing dimensionality. In this research, a single-mode quantitative imaging sequence was used for the prediction of the EGFR expression level. This radiomic-based prediction was found to be more effective than prediction based on traditional qualitative imaging, as the latter is descriptive rather than quantitative.

Lastly, the EGFR overexpression level in lower grade gliomas remains controversial. A previous study showed that EGFR amplification (a minimum average EGFR gene copy number of 7 per genome) was identified in 40–50% of glioblastomas and in about 10% of anaplastic gliomas, but was rare in WHO grade II gliomas [29]. On the other hand, another study showed that EGFR overexpression (defined as >25% positive cells on immunohistochemistry) was observed in 40% of grade II gliomas and 78.4% of grade III gliomas [30]; these proportions of high EGFR expression were similar to the proportion detected in the current study. The inconsistencies in the EGFR expression level may result from differences in the patient populations and the definition of EGFR expression used among the different previous studies.

We recognise some limitations to this study. First, our model was generated based on retrospectively collected data. The current study, therefore, needs to be prospectively evaluated. Second, it should be noted that multi-modal radiological data were not included in this research, and radiological information reflecting tumour perfusion (such as perfusion-weighted imaging, arterial spin-labelling, etc.) will be added for the optimisation of this prediction model in the future.