Introduction

Meningiomas are the most common primary intracranial tumours in adults, accounting for 36.8% of all primary central nervous system tumours, with an incidence of 8.14/100,000, based on the most recent data (2010–2014) from the Central Brain Tumor Registry of the United States (CBTRUS) [1]. Although most meningiomas are benign, approximately 10-20% of them exhibit aggressive behaviour [2,3,4]. The histopathological grade of these tumours defined by the World Health Organization (WHO) is important because it is one of the critical factors in determining clinical outcome [5]. A higher WHO grade usually means a higher recurrence rate and a lower survival rate [6,7,8]. Therefore, a presurgical evaluation of the WHO grade of a meningioma may enable the recognition of a potentially invasive meningioma, facilitate treatment decisions and improve the patient's prognosis.

Magnetic resonance imaging (MRI) has become the key imaging technique for visualising and managing intracranial meningiomas. Many attempts have been made to preoperatively assess the grades and histopathology of meningiomas based on various MRI sequences [9,10,11,12]. Several studies have recognised the apparent diffusion coefficient (ADC) value generated from diffusion-weighted imaging (DWI) as an effective measure for distinguishing low-grade from high-grade meningiomas, demonstrating its potential value in grading meningiomas to some extent [11, 13,14,15]. However, different grades of meningiomas do not always demonstrate clear differences in their diffusion parameters, and the ADC value is not able to separate WHO II and III meningiomas, thus potentially leading to inaccurate diagnosis [11, 13].

Currently, the emerging field of radiomics has provided a potential method for non-invasively characterising tumours by converting medical images into mineable data via the extraction of a large number of quantitative imaging features [16, 17]. Texture analysis (TA) is a radiomics method that analyses the statistical, structural and spectral characteristics of pixel intensities (grey values) within selected regions and then classifies the texture using supervised or unsupervised methods, in order to extract feature correlations that are invisible to the eye [18]. Compared with visual assessments of texture, computational TA techniques are more objective and more sensitive to changes that are imperceptible to the human visual system [19].

Although TA has been used to pathologically distinguish different subtypes of meningiomas since 2007, its utility in medical images is still limited [18, 20,21,22]. To the best of our knowledge, no previous work has evaluated the diagnostic effectiveness of TA in preoperative meningioma grading based on DWI images. Therefore, in our study, we (1) investigated whether TA based on ADC maps would be more efficient than radiologists in predicting the WHO grade of meningiomas and (2) built the best classifier for further clinical practice.

Materials and methods

This was a retrospective study; approval from the institutional review board (IRB) was obtained, and the requirement for written informed consent was waived.

Patient cohort

Data were collected from our institution, which is a university-affiliated hospital. We searched the pathology database for consecutive patients with pathologically confirmed meningiomas between January 2011 and August 2017. Preoperative enhanced MRI scanning with diffusion-weighted images was required. Patients who had any previous relevant treatment history (including radiotherapy or surgery) were excluded. Cases with motion artefacts were also excluded. Finally, a total of 152 meningioma patients [88 (57.89%) with WHO I meningiomas, 48 (31.58%) with WHO II meningiomas and 16 (10.53%) with WHO III meningiomas, consisting of 16 anaplastic meningiomas, 48 atypical meningiomas, 56 fibrous meningiomas, 23 meningothelial meningiomas, 5 transitional meningiomas, 2 angiomatous meningiomas, 1 secretory meningioma and 1 psammomatous meningioma] with 421 preoperative ADC maps were included. The mean (±SD) age of the patients was 53.97 ± 12.13 years, and females accounted for 61.84% (Table 1).

Table 1 Baseline information of the patients enrolled

A neuropathologist with 12 years of experience reviewed all the pathological tumour slides and assessed them according to the 2016 WHO classification system, to avoid any potential impact that the inclusion of brain invasion as a formal diagnostic criterion for grade II meningiomas might have had on their association with the imaging features [5].

MRI protocol

All patients underwent a contrast-enhanced head MRI scan on a 3.0 T device (Verio, Siemens, Erlangen, Germany, or DISCOVERY MR750W, GE, Milwaukee, WI, USA). The sequences required for this study are listed in Table 2.

Table 2 The parameters of the MRI sequences implemented in the study

All images were available in digital form. The morphological features were collected on a GE post-processing station (AW Volume Share™ 5; GE, Milwaukee, WI, USA) by two radiologists with over 10 years of experience, without any knowledge of the histopathological diagnosis. Disagreements were resolved by consensus. Details of the clinical and morphological features are described in Table 3. Regions of interest (ROIs) for the ADC values were drawn by another radiologist blinded to the pathological results. The whole solid-appearing portion of each meningioma was selected as the ROI. Both the ADC value and the area of the ROI in each slice were recorded, and the average ADC value for each patient was calculated as the area-weighted mean of the per-slice values:

$$ \overline{x}=\frac{w_1 x_1+w_2 x_2+\dots +w_n x_n}{w_1+w_2+\dots +w_n} $$

Table 3 The description of all categories of the features used in our study
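
The area-weighted average above can be illustrated with a minimal sketch, assuming that x_i is the ADC value measured on slice i and w_i is the ROI area on that slice. The function name and the sample numbers are hypothetical and are not taken from the study data.

```python
def weighted_mean_adc(adc_values, areas):
    """Area-weighted mean ADC over the slices of one patient.

    adc_values: per-slice ADC values (x_1 ... x_n)
    areas:      per-slice ROI areas used as weights (w_1 ... w_n)
    """
    if len(adc_values) != len(areas) or not adc_values:
        raise ValueError("need one area per ADC value and at least one slice")
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total ROI area must be positive")
    return sum(w * x for w, x in zip(areas, adc_values)) / total_area


# Hypothetical measurements: ADC in 10^-6 mm^2/s, ROI area in mm^2.
slice_adc = [900.0, 850.0, 800.0]
slice_area = [100.0, 200.0, 100.0]
print(weighted_mean_adc(slice_adc, slice_area))
```

Weighting by area means that a large central slice contributes more to the patient-level value than a small slice at the edge of the tumour.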

Radiologist evaluation

Two neuro-radiologists, with 10 and 11 years of experience, respectively, independently reviewed the images. Both were blinded to the pathological grades but were aware that the tumours were meningiomas. They assessed all the conventional MR images [T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), T2-weighted fluid-attenuated inversion recovery (T2-FLAIR), echo-planar diffusion-weighted imaging (EPI-DWI) and enhanced T1WI] and recorded their diagnosis as grade I, grade II or grade III. The kappa value and accuracy of each radiologist's diagnosis were recorded after comparison with the pathological results.

Texture feature extraction

The ADC maps were processed offline in the DICOM format with the TA software MaZda (version 4.6; Institute of Electronics, Technical University of Łódź, Poland) [23, 24]. A polygonal ROI was placed on each slice within the boundary of the tumour on the ADC maps, with reference to the enhanced T1WI images. To minimise the influence of brightness variations between the different machines, grey-level normalisation and image quantisation were performed for each ROI. Grey-level normalisation limits the dynamic range to μ ± 3SD (μ, grey-level mean; SD, standard deviation). The resulting grey-level range was then quantised to values between 1 and \( 2^k \), where k is the number of bits per pixel [23, 24]. The extracted features fell into five categories, namely, the grey-level histogram, co-occurrence matrix, run-length matrix, autoregressive model and wavelet transform. Altogether, 156 texture features were retrieved from each ADC map (Table 3). An example of feature extraction is shown in Fig. 1.
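
The normalisation and quantisation step can be sketched as follows. This is an illustration under stated assumptions rather than MaZda's actual implementation: the population standard deviation is used, values outside μ ± 3SD are clipped, the default of 6 bits per pixel is a hypothetical choice (the text does not state k), and the function name is invented for this example.

```python
from statistics import mean, pstdev


def normalise_and_quantise(pixels, bits=6):
    """Clip ROI grey levels to mean +/- 3 SD, then map them to 1 .. 2**bits.

    pixels: flat list of grey values inside one ROI
    bits:   assumed number of bits per pixel (k); 6 is a placeholder
    """
    mu = mean(pixels)
    sd = pstdev(pixels)  # assumption: population SD
    levels = 2 ** bits
    if sd == 0:
        # A perfectly flat ROI carries no contrast; map it to one level.
        return [1] * len(pixels)
    lo, hi = mu - 3 * sd, mu + 3 * sd
    quantised = []
    for p in pixels:
        clipped = min(max(p, lo), hi)
        fraction = (clipped - lo) / (hi - lo)  # position within [0, 1]
        quantised.append(1 + int(fraction * (levels - 1) + 0.5))
    return quantised


# Hypothetical ROI with one very bright outlier.
print(normalise_and_quantise([100, 110, 120, 130, 1000], bits=4))
```

Because the range is defined relative to each ROI's own mean and spread, a global brightness offset between scanners does not change the quantised values.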

Fig. 1

An example of ROI selection and feature extraction. A 72-year-old man had complained of hypomnesia for half a year. The pathological diagnosis was anaplastic meningioma. Before surgery, he underwent MRI scanning in our hospital, including T1WI (a), T2WI (b), T2-FLAIR (c), DWI and enhanced T1WI (d). The ROI was selected on the ADC map with reference to the other sequences to avoid the cystic part (e), and the average ADC value was 1042.536 × 10⁻⁶ mm²/s. The whole tumour region was selected in the MaZda software (f) to extract the texture information

Texture feature selection

Texture feature selection was necessary to avoid over-fitted and poorly generalised classification models. Feature selection was completed with MaZda and R software (version 1.1.383). First, all the data were randomly assigned to either a training dataset or a testing dataset at a ratio of 0.7:0.3. All feature selections were based on the training dataset to ensure independence from the testing dataset. In the MaZda software, three algorithms were implemented: mutual information, the Fisher coefficient, and classification error probability combined with average correlation coefficient (POE + ACC). Each algorithm selected ten features. Thus, a total of 30 texture features were exported to R software. Finally, a subset of 23 texture features with the lowest misclassification rate (27.03%) among the three grades of meningiomas in the training dataset was selected by a recursive feature elimination (RFE) model.
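
The first two steps of this pipeline, the random 0.7:0.3 split and a Fisher-type ranking of features, might look like the sketch below. It uses a common form of the Fisher criterion (between-class variance divided by pooled within-class variance), which may differ in detail from the coefficient computed by MaZda; all function names, the random seed and the toy data are assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into training and testing lists."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fisher_score(values, labels):
    """Between-class variance over pooled within-class variance."""
    overall = mean(values)
    n = len(values)
    between = 0.0
    within = 0.0
    for cls in sorted(set(labels)):
        group = [v for v, lab in zip(values, labels) if lab == cls]
        weight = len(group) / n
        between += weight * (mean(group) - overall) ** 2
        within += weight * pvariance(group)
    return between / within if within > 0 else float("inf")


# Toy training data: two hypothetical features, two classes for brevity.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "separating": [1, 2, 1, 2, 9, 10, 9, 10],
    "uninformative": [1, 9, 2, 10, 1, 9, 2, 10],
}
ranking = sorted(features, key=lambda f: fisher_score(features[f], labels),
                 reverse=True)
print(ranking)
```

Ranking must be computed on the training portion only, as the text specifies, so that the testing dataset plays no part in choosing the features.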

Classification

Classification was performed using RStudio. Since the pathological results of the meningiomas were classified into three groups, tree classifiers were applied to build multiclass models. Nested cross-validation was used for model tuning and training with the “caret” package on the training set, leaving the testing dataset independent from the model selection process. Differences in the predictive power among the models were assessed using bootstrapping (1,000 iterations).
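
The text does not describe the bootstrap procedure in detail, so the following is only one plausible reading: cases are resampled with replacement, the accuracy difference between two models is recomputed on each resample, and a percentile interval is reported. The function name, the seed and the toy predictions are hypothetical.

```python
import random


def bootstrap_accuracy_difference(truth, pred_a, pred_b,
                                  iterations=1000, seed=0):
    """95% percentile interval for accuracy(A) - accuracy(B)."""
    rng = random.Random(seed)
    n = len(truth)
    differences = []
    for _ in range(iterations):
        # Resample case indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(pred_a[i] == truth[i] for i in idx) / n
        acc_b = sum(pred_b[i] == truth[i] for i in idx) / n
        differences.append(acc_a - acc_b)
    differences.sort()
    lower = differences[int(0.025 * iterations)]
    upper = differences[int(0.975 * iterations) - 1]
    return lower, upper


# Toy example: model A is always right, model B always predicts grade 1.
truth = [1, 2, 3] * 20
model_a = truth[:]
model_b = [1] * len(truth)
print(bootstrap_accuracy_difference(truth, model_a, model_b))
```

An interval that excludes zero suggests that the difference between the two models is unlikely to be a product of the particular cases sampled.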

Three types of tree classifiers were built to assess the diagnostic performance of all the selected features among the three grades of meningiomas: a classical decision tree based on the C4.5 algorithm, a conditional inference tree based on statistical theory and a decision forest based on the random forest algorithm. The features used for comparison in the classifiers were divided into five groups: (1) the average ADC value; (2) significantly related clinical features + morphological features + average ADC value; (3) 23 selected texture features; (4) all texture features; (5) 23 selected texture features + ADC value. All classifiers were built using the data in the training dataset. Then, the data in the testing dataset were imported into these classifiers to compare their diagnostic performance. The kappa value and accuracy for each classifier were recorded, and the best classifier was selected. The process of the study is shown in Fig. 2.

Fig. 2

Flowchart showing methodological overview of our experimental set-up. a Features based on clinical and morphological features in each patient were used to build tree classifiers in distinguishing three grades of meningiomas. b Texture features and ADC values in each image were acquired to build tree classifiers in distinguishing three grades of meningiomas. Finally, the accuracy and kappa value were compared to determine the best classifier.

Statistical analysis

All statistical analyses were performed using R software. The significance and correlations of the non-continuous features among the different grades of the meningiomas were evaluated using the chi-squared test and Fisher's exact test. The significance and correlations of the continuous features of the different groups were analysed using one-way ANOVA and the Friedman test. The predictive power of the decision tree was evaluated by the kappa value and accuracy using the “vcd” and “randomForest” packages.
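
For readers unfamiliar with the two evaluation metrics, the sketch below computes accuracy and unweighted Cohen's kappa from paired true and predicted grades. Whether the study used weighted or unweighted kappa is not stated, so the unweighted form here is an assumption; the function name and the example labels are hypothetical.

```python
def accuracy_and_kappa(truth, predicted):
    """Return (accuracy, unweighted Cohen's kappa) for paired labels."""
    n = len(truth)
    classes = sorted(set(truth) | set(predicted))
    # Observed agreement is simply the accuracy.
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    # Agreement expected by chance from the marginal frequencies.
    expected = sum((truth.count(c) / n) * (predicted.count(c) / n)
                   for c in classes)
    if expected == 1:
        # Kappa is undefined when both raters use a single class only.
        return observed, float("nan")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical grades for ten tumours.
pathology = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
reader = [1, 1, 1, 2, 2, 2, 1, 3, 3, 2]
print(accuracy_and_kappa(pathology, reader))
```

Kappa corrects the raw accuracy for the agreement expected by chance, which matters here because grade I cases make up more than half of the cohort.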

Results

The diagnostic performance of radiologists

The diagnostic performance of the two radiologists was fair in distinguishing different grades of meningiomas: one achieved a kappa value of 0.3501 (95% CI, 0.2069–0.4933) and an accuracy of 61.11%, and the other a kappa value of 0.3957 (95% CI, 0.2565–0.5350) and an accuracy of 62.04%. Two representative confusing cases can be seen in Fig. 3.

Fig. 3

Two representative cases of meningiomas. a–d A 60-year-old man was admitted with left blepharoptosis for 1 month. He underwent MRI scanning with T1WI (a), T2-FLAIR (b), DWI and enhanced T1WI (d) sequences. The average ADC value of the tumour was 820.23 × 10⁻⁶ mm²/s (c). Both neuro-radiologists diagnosed it as a grade I meningioma. His pathological diagnosis was grade III meningioma. e–h A 71-year-old man had been complaining of headache for 1 month. The MRI scanning sequences were the same as in a, b and d. The average ADC value of the tumour was 940.95 × 10⁻⁶ mm²/s (g). One neuro-radiologist diagnosed it as a grade II meningioma, while the other diagnosed it as grade III. His pathological diagnosis was grade I meningioma

The clinical and morphological features in different grades of meningiomas

The clinical features included gender, age, location and the size of meningiomas. After analysis, the gender distributions (p = 0.0004) and the maximum diameters of the meningiomas (p < 0.0001) showed significant differences among the three grades of meningiomas (Table 1).

Seven morphological features of the meningiomas were assessed, which mainly described the number, shape, components and effects of each meningioma on the surrounding tissues. The results of the one-way ANOVA showed that irregular shape, cysts, necrosis/haemorrhage, invasion of the skull and peritumoural oedema exhibited significant positive but weak correlations with higher pathological grades of the meningiomas (all p < 0.05; the correlation coefficients were 0.45, 0.28, 0.2, 0.17 and 0.25, respectively), as seen in Fig. 4.

Average ADC values in different grades of meningiomas

The average ADC value of the WHO I meningiomas was 891.05 ± 86.05 × 10⁻⁶ mm²/s, which was significantly higher than the values of the WHO II meningiomas (815.28 ± 97.02 × 10⁻⁶ mm²/s, p < 0.001) and the WHO III meningiomas (792.06 ± 117.02 × 10⁻⁶ mm²/s, p = 0.003); however, no significant difference was found between the WHO II and III meningiomas, though a decreasing tendency was observed.

The diagnostic performance of three machine learning models

The diagnostic performance of each tree classifier is shown in Table 4, and the relationship between WHO grade and the 23 selected texture features can be seen in Fig. 5.

Table 4 Diagnostic performance among the different tree classifiers based on different feature groups

We found that the ADC value alone could not help distinguish the three grades of meningiomas; if combined with the clinical and morphological features, the kappa value could be elevated to 0.35, with an accuracy of 62.96%, indicating a fair diagnostic efficiency. The texture features extracted from ADC maps could significantly increase the efficiency of diagnosis, with a kappa value over 0.50 and an accuracy over 70%. Based on the combination of the ADC value and the 23 selected texture features, the diagnostic efficiency was further increased to a kappa value of 0.63 and an accuracy near 80% (Table 4). The decision forest presented a robust diagnostic performance, topping the list of all three classifiers (Figs. 4 and 5).

Fig. 4

Correlation matrix among WHO grade, the clinical features, morphological features and average ADC values. The asterisk in the upper panel indicates a significant p value (p < 0.05), while the value in the lower panel indicates the Pearson correlation coefficient between two features

Fig. 5

Correlation matrix among WHO grade, ADC value and texture features. The asterisk in the upper panel indicates a significant p value (p < 0.05), while the value in the lower panel indicates the Pearson correlation coefficient (multiplied by 100) between two features

Discussion

The histopathological grade of a meningioma is important since it has a close relation with clinical outcome and determines the treatment protocol to some extent. According to NCCN (National Comprehensive Cancer Network) guideline version 1.2017, a WHO grade I meningioma could be treated by surgery or observation, while a WHO grade II meningioma was recommended to be treated with gross total surgery with/without radiotherapy, and a WHO grade III meningioma should be treated as a malignant tumour with a more radical surgery combined with radiotherapy [2]. Furthermore, compared with the favourable prognosis of WHO grade I meningiomas, grade II and III meningiomas can recur more easily, and the 5-year overall survival rates are approximately 80% and <60%, respectively [3, 6, 8, 25,26,27]. Therefore, a presurgical evaluation of the WHO grade may recognise a potentially invasive meningioma, facilitate treatment decisions and improve the patient's prognosis.

DWI, based on the measurement of aberrancies in the expected Brownian motion of free water, has been proven to have potential diagnostic value in grading meningiomas [28]. In our study, WHO grade I meningiomas did have higher ADC values than the other two grades; however, the ADC value alone was not capable of distinguishing the three WHO grades of meningiomas. Several problems may affect the application of ADC. The first is the distortion effect of the EPI-DWI sequence. Meningiomas are frequently located at the cerebral convexity and the skull base, where the heterogeneous magnetic field caused by bone and sinuses can distort the image, ultimately affecting the identification of the tumour. The second lies in the scattered cysts inside the tumour. Based on our data, cysts and necrosis were associated with higher grade meningiomas. Although we tried to avoid these fluid components in the ROI selection, some scattered cysts with high ADC values were inevitably included, which could have raised the average ADC values in the grade II/III meningiomas. The third problem is the proportion of the subtypes of meningiomas. Angiomatous and secretory meningiomas tend to have higher ADC values, and fibrous meningiomas usually have lower ones. Therefore, a high proportion of angiomatous and secretory meningiomas could result in a higher average ADC value, whereas a larger share of fibrous meningiomas could lower it. All of these factors limited the application of the ADC value.

In clinical practice, radiologists usually depend not only on the ADC value but also on clinical and morphological information to estimate the malignancy of a meningioma. In our study, we found that gender, maximum diameter, margin, the existence of fluid components, peritumoural oedema and bone invasion were all weakly related to the WHO grades of meningiomas. The higher the grade, the higher the rates of male predominance, large size, irregular shape, internal fluid components, peritumoural oedema and bone invasion. After building tree classifiers based on these clinical and morphological features and the ADC value, the highest diagnostic accuracy was 62.96%, equivalent to the diagnostic performance of the two experienced neuro-radiologists. Hence, features that can be easily captured by human observers, such as clinical data, morphology and the ADC value, may not achieve much better diagnostic efficiency even when machine learning algorithms are used.

The texture features extracted from the ADC map were expected to identify the intrinsic characteristics that could increase the preoperative diagnostic accuracy of meningiomas. The 23 selected texture features included 10 from the histogram, 4 from the co-occurrence matrix, 5 from the run-length matrix, 2 from the autoregressive model and 2 from the wavelet transform. Our results suggest that texture features might reflect the microstructural changes in meningiomas. In higher grades of meningiomas, higher values were seen in almost all features from the histogram, indicating fluid components without diffusion restriction in higher grade meningiomas; the skewness was low, meaning that the number of pixels with hypointensities was predominant in WHO grade II and III meningiomas. Higher values of run-length non-uniformity and grey-level non-uniformity indicated a heterogeneous structure inside higher grade meningiomas, while lower values of contrast and difference variance depicted the irregular changes in the grey pixels of aggressive meningiomas [19]. These findings aligned with those of some previous works, but the explanation of the texture features should be complemented by further studies concerning pathology [18, 21, 22].
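
Two of the features named above, histogram skewness and co-occurrence matrix contrast, can be made concrete with a short sketch. These are textbook definitions and may differ in normalisation from the values MaZda reports; the function names and the tiny test images are hypothetical.

```python
from statistics import mean, pstdev


def skewness(pixels):
    """Third standardised moment of the grey-level histogram."""
    mu = mean(pixels)
    sd = pstdev(pixels)
    if sd == 0:
        return 0.0
    return sum(((p - mu) / sd) ** 3 for p in pixels) / len(pixels)


def glcm_contrast(image, dx=1, dy=0):
    """Contrast of the co-occurrence matrix for one pixel offset.

    Equivalent to summing p(i, j) * (i - j)**2 over all grey-level pairs,
    computed here directly from the neighbouring pixel pairs.
    """
    rows, cols = len(image), len(image[0])
    squared_differences = []
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                squared_differences.append((image[r][c] - image[r2][c2]) ** 2)
    if not squared_differences:
        raise ValueError("image too small for this offset")
    return sum(squared_differences) / len(squared_differences)


# Hypothetical quantised ROIs: one uniform, one with vertical stripes.
uniform = [[1, 1], [1, 1]]
striped = [[1, 3], [1, 3]]
print(skewness([1, 1, 1, 10]), glcm_contrast(uniform), glcm_contrast(striped))
```

Skewness summarises the asymmetry of the intensity distribution as a whole, whereas contrast depends on how neighbouring pixels differ, so the two features capture complementary aspects of an ADC map.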

Although TA and machine learning are frequently investigated in the healthcare field, most studies have been designed to build binary classifiers. Unlike other studies, our research aimed to solve a multi-class problem, which is more practical for clinical applications. Hence, we chose decision trees and a decision forest as the algorithms. As a machine learning method, the decision tree is statistically non-parametric, and its main advantage is its simple structure, which allows for interpretation and visualisation [29, 30]. In our study, both a classic decision tree and a conditional inference tree were applied. The results showed that the classic decision tree was superior to the conditional inference tree in all feature groups. The decision forest classifier in our study is based on random forest, one of the most successful ensemble learning techniques. The principle of decision forests is to build subtrees using bootstrap samples of the training data; when testing samples are imported, the forest chooses the classification that has the most votes over all its trees [29]. In our study, the decision forest did outperform the two single decision trees in terms of both kappa values and diagnostic accuracy. The best feature group applied to the decision forest was the combination of the 23 selected texture features and the ADC value. Therefore, a decision forest model based on TA might exhibit potential in enhancing diagnostic performance concerning multiple diseases.
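
The bootstrap-and-vote principle can be sketched in a few lines. To keep the example short, each "tree" is a one-split stump on a single feature, and the random feature sampling used by real random forests is omitted; this is a deliberate simplification, not the model built in the study. All names and the toy data are hypothetical.

```python
import random
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among the votes."""
    return Counter(votes).most_common(1)[0][0]


def fit_stump(rows):
    """Fit a depth-1 tree on rows of (feature_value, label)."""
    values = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_label, right_label = majority_vote(left), majority_vote(right)
        correct = left.count(left_label) + right.count(right_label)
        if best is None or correct > best[0]:
            best = (correct, threshold, left_label, right_label)
    if best is None:
        # Only one distinct feature value: always predict the majority class.
        label = majority_vote([y for _, y in rows])
        return values[0], label, label
    return best[1:]


def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample of the training rows."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows])
            for _ in range(n_trees)]


def predict_forest(forest, x):
    """Classify x by majority vote over all stumps."""
    return majority_vote([predict_stump(s, x) for s in forest])


# Toy training set: one feature, two well-separated classes.
rows = [(x, "low") for x in [1, 2, 3, 4, 5]] + \
       [(x, "high") for x in [11, 12, 13, 14, 15]]
forest = fit_forest(rows)
print(predict_forest(forest, 0), predict_forest(forest, 20))
```

Because each stump sees a different bootstrap sample, individual errors tend to be averaged out by the vote, which is the usual explanation for an ensemble outperforming a single tree.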

There were still several limitations in our study. First, the texture features selected from each image were two-dimensional; thus, they could not be matched to the single set of morphological features per patient. Some studies used the central layer of the tumour to solve this problem, but the use of only one layer might result in important texture features being missed. This problem might be solved by extracting three-dimensional texture features, which would require MRI images with high resolution. Second, we enrolled 152 cases with 421 images from a single institution; this was a small sample with a risk of selection bias. Third, since this was a retrospective study, we did not obtain enough prognostic information, such as KPS scores and survival data, which could make TA a more useful technique in clinical practice.

Conclusions

The ADC values did differ between WHO grade I meningiomas and WHO grade II/III meningiomas, but the ADC value alone did not help in distinguishing different grades of meningiomas. Machine learning models based on clinical and morphological features and the ADC value could achieve diagnostic performance equivalent to that of experienced neuroradiologists. The decision forest with ADC map-based texture features is a promising model that could potentially recognise an invasive meningioma and provide an accurate WHO classification of a meningioma before surgery, which could help neurosurgeons predict its biological behaviour and make the optimal treatment decision.