Introduction

Meningiomas are the most common primary extra-axial neoplasms in adults, accounting for 36.7% of all central nervous system tumors in the USA [1, 2]. The World Health Organization (WHO) classification is one of the major prognostic factors and is used to tailor treatment strategies [3,4,5,6]. According to the WHO classification system, meningiomas are classified into three grades: grade I (benign), grade II (atypical), and grade III (anaplastic) [7]. In general, low-grade (grade I) meningiomas follow an indolent clinical course, and complete surgical resection can achieve favorable survival outcomes [8]. By contrast, high-grade meningiomas (grade II and grade III) tend to exhibit more aggressive biological behavior and a poorer clinical prognosis, with relatively high 5-year recurrence rates of 50 to 90% [9]. For individuals with grade I meningiomas, surgical resection or radiosurgery alone is usually recommended in clinical practice, and adjuvant radiotherapy is only necessary for residual tumor tissue. However, for highly aggressive grade II and grade III meningiomas, adjuvant radiotherapy may be necessary and beneficial even after complete resection [10]. Furthermore, patients with high-grade meningiomas frequently require additional treatment such as radiotherapy and/or chemotherapy [6]. Therefore, accurate preoperative grading is critically important for patients with meningiomas in clinical settings [11].

Presently, MRI is the main noninvasive imaging technique for the assessment of meningiomas because of its convenience, its high soft-tissue resolution, and its ability to provide information on the dynamic blood supply of the tumor [6,7,8,9,10]. Although some researchers have demonstrated that MRI can provide valuable information for the evaluation of meningiomas, the imaging appearances of different grades largely overlap, which could lead to misdiagnosis and inappropriate treatment strategies [12,13,14].

Machine learning (ML), at the intersection of statistics and computer science, is a branch of artificial intelligence that enables the extraction of meaningful patterns from examples, which is a component of human intelligence [15]. Over the last decade, it has been successfully applied in the field of radiology, particularly in automatically detecting disease and discriminating tumors [16]. Recently, some studies demonstrated that MRI-based ML is a promising tool for grading meningiomas [17]. However, few radiomics studies have incorporated deep learning features extracted using a pretrained convolutional neural network (CNN). Thus, we evaluated the performance of a deep learning radiomics model (DLRM) based on enhanced T1WI images for the preoperative differentiation of low- and high-grade meningiomas and compared it with that of a radiomics model (RM) and a deep learning model (DLM).

Materials and methods

Study population

The research protocol was reviewed, approved, and overseen by the review board of *BLINDED 1*. Informed consent was waived owing to the retrospective nature of the study. The inclusion criteria were as follows: (1) pathologically confirmed intracranial meningioma with the histopathological grade determined according to the WHO criteria [18]; (2) previously untreated solitary primary tumor before MRI scans; (3) available preoperative brain MRI scans including axial and sagittal enhanced T1WI prior to surgical resection. The exclusion criteria were as follows: (1) unsatisfactory image quality, such as artifacts or incomplete tumor display (only part of the tumor was scanned); (2) a history of relevant treatment (including radiotherapy or surgery) before MRI scans; (3) multiple lesions. All enrolled patients were randomly divided into a training cohort and a test cohort at a ratio of 8:2. The test cohort was used only for testing the performance of the models. The flowchart of patient selection is shown in Supplementary Fig. 1.
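The 8:2 random split can be illustrated with a minimal sketch. It assumes a simple shuffle with the test size rounded up and no stratification by grade; the source does not state either choice, and the function name `split_cohort` and the seed are hypothetical. With the 132 patients reported in the Results, this rounding gives the reported cohort sizes.

```python
import math
import random


def split_cohort(patient_ids, test_fraction=0.2, seed=0):
    """Randomly split patient IDs into training and test lists.

    Assumption: plain shuffling, test size rounded up, no stratification.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_test = math.ceil(test_fraction * len(ids))  # 20% of the cohort, rounded up
    return ids[n_test:], ids[:n_test]


train_ids, test_ids = split_cohort(range(132))
print(len(train_ids), len(test_ids))  # 105 27
```

Fixing the seed keeps the split reproducible, which matters because the test cohort must stay untouched during feature selection and training.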

Image acquisition

The examinations were performed using a 3.0T MRI scanner (Discovery MR750w, GE Medical Systems). All MRI scans were completed within 2 weeks prior to surgical resection. The unenhanced scanning sequences included sagittal T1WI, axial T1WI, and axial T2WI. Enhanced scanning sequences included axial and sagittal T1WI. The T1WI sequence parameters were as follows: repetition time [TR] 500 ms, echo time [TE] 10 ms, matrix size 256 × 256, slice thickness 5 mm, and field of view [FOV] 230 × 230 mm. T2WI images were acquired using the following scan parameters: TR 5200 ms, TE 95 ms, matrix size 256 × 256, slice thickness 5 mm, and FOV 23 cm. All enhanced T1WI images were acquired after administration of gadopentetate dimeglumine (0.1 ml/kg).

Image preprocessing

The process of developing the DLRM with the synthetic minority oversampling technique (SMOTE) is presented in Fig. 1. Before data analysis, image preprocessing was performed, including N4 bias field correction, normalization, and resampling. Unwanted low-frequency intensity nonuniformity was removed by N4 bias correction using SimpleITK software (version 2.0, https://www.simpleitk.org/) [19]. After that, the images were normalized to reduce brightness and contrast variations. For the radiomics features, the normalization option in the pyradiomics software (version 3.0, https://pyradiomics.readthedocs.io/en/latest/) was used, which normalizes image intensity values before they are fed into the extraction pipeline. This normalization is based on all gray values contained within the image, not just those defined by the ROI in the mask. For the deep learning features, we used the torchvision package (version 0.9, https://pytorch.org/vision/stable/index.html) to normalize the images before feeding them into the pretrained ResNet. In addition, all MRI images were resampled to a uniform voxel size of 1 mm × 1 mm × 1 mm for subsequent image processing.

Fig. 1

Overview of the radiomics analysis framework for the differentiation of meningiomas. Image preprocessing was performed prior to data analysis. Features were extracted and then combined with feature reproducibility for feature selection. Prediction models were constructed based on LightGBM algorithm for meningioma differentiation. Finally, models were evaluated and tested

Tumor segmentation

To obtain the ROI, manual segmentation was performed on the meningioma images. The enhanced tumor images were imported into ITK-SNAP (version 3.8.0, https://www.itksnap.org/), and the tumor lesion areas were manually delineated by two radiologists (both of whom have more than 10 years of experience in brain MRI) blinded to the pathological results. The slice with the maximum tumor area was selected and delineated, and adjacent bone invasion was excluded as much as possible. For deep learning feature extraction, the slice with the maximum tumor area was likewise chosen for delineation, and adjacent bone invasion was excluded. Then, the tumor was cropped using a rectangular bounding box and resized to 299 × 299 pixels. The tumor segmentation process is presented in Fig. 2.
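The slice selection, bounding-box crop, and resize described above can be sketched with NumPy. This is only an illustration under stated assumptions: the array layout (slice, row, column), the nearest-neighbor interpolation, and all function names are hypothetical, since the source does not report how the resizing was implemented.

```python
import numpy as np


def largest_slice_index(mask_volume):
    """Index of the slice whose segmented tumor area is largest."""
    return int(np.argmax(mask_volume.sum(axis=(1, 2))))


def crop_to_bounding_box(image, mask):
    """Crop a 2D image to the tightest rectangle containing the mask."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def resize_nearest(image, size=299):
    """Resize to size x size by nearest-neighbor sampling (an assumed choice)."""
    h, w = image.shape
    row_idx = np.arange(size) * h // size
    col_idx = np.arange(size) * w // size
    return image[np.ix_(row_idx, col_idx)]


# Synthetic volume: 5 slices of 256 x 256, with the largest mask on slice 2.
rng = np.random.default_rng(0)
volume = rng.random((5, 256, 256))
masks = np.zeros((5, 256, 256), dtype=bool)
masks[1, 110:130, 100:120] = True
masks[2, 100:140, 80:150] = True

k = largest_slice_index(masks)
patch = resize_nearest(crop_to_bounding_box(volume[k], masks[k]))
print(k, patch.shape)  # 2 (299, 299)
```

In practice a smoother interpolation (for example bilinear) would probably be preferred; nearest-neighbor is used here only to keep the sketch dependency-free.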

Fig. 2

The tumor segmentation process

Feature extraction

Radiomics features were extracted by using pyradiomics (version 3.0, https://pyradiomics.readthedocs.io/) [20]. For each patient, a total of 1688 radiomics features could be extracted from each sequence, including seven categories: first-order, shape, gray-level cooccurrence matrix (GLCM), gray-level dependence matrix (GLDM), gray-level size zone matrix (GLSZM), gray-level run-length matrix (GLRLM), and neighboring gray-tone difference matrix (NGTDM). All filters were enabled. To obtain deep learning features, we used the pretrained Resnext101_32x8d without the Softmax layer of PyTorch (version 1.8.1, https://pytorch.org/) as a feature extractor, which could yield 2048 deep learning features after feeding an image. The naming method of deep learning features was as follows: DLF_number_sequence. Finally, a total of 7472 features (1688 × 2 + 2048 × 2) could be extracted from axial and sagittal enhanced T1WI images of each meningioma.
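The feature tally can be checked with a short sketch. The deep learning names follow the stated DLF_number_sequence pattern; the zero-based numbering, the sequence labels `axial` and `sagittal`, and the `RF_` prefix for radiomics features are assumptions made for illustration only.

```python
N_RADIOMICS = 1688   # radiomics features per sequence (from the text)
N_DEEP = 2048        # deep learning features per image (from the text)
SEQUENCES = ("axial", "sagittal")  # assumed labels for the two enhanced T1WI series


def deep_feature_names(sequence, n=N_DEEP):
    """Names following the DLF_number_sequence pattern (numbering start assumed)."""
    return [f"DLF_{i}_{sequence}" for i in range(n)]


names = []
for seq in SEQUENCES:
    # "RF_" is a hypothetical placeholder; pyradiomics uses its own feature names.
    names += [f"RF_{i}_{seq}" for i in range(N_RADIOMICS)]
    names += deep_feature_names(seq)

print(len(names))          # 7472
print(names[N_RADIOMICS])  # DLF_0_axial
```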

Feature reproducibility

Intra- and interobserver reproducibility of feature extraction was assessed by calculating intraclass correlation coefficients (ICCs). For all cases in the training cohort, radiologists A and B extracted the features independently, and radiologist A re-extracted the features 2 weeks later. Features with an ICC lower than 0.90 were excluded from further analysis. The second segmentation results by radiologist A were used for the final analysis.
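A reproducibility filter of this kind can be sketched as follows. The source does not say which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is assumed here; the function name and the synthetic readings are hypothetical.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for one feature; ratings has shape (n_subjects, k_raters)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Synthetic example: one feature that both readers reproduce, one that they do not.
rng = np.random.default_rng(0)
truth = rng.normal(size=50)
features = {
    "stable": np.column_stack([truth, truth + rng.normal(scale=0.01, size=50)]),
    "unstable": np.column_stack([rng.normal(size=50), rng.normal(size=50)]),
}
kept = [name for name, r in features.items() if icc_2_1(r) >= 0.90]
print(kept)  # ['stable']
```

Because the estimate is built from mean squares, it can be negative when readers disagree more than subjects differ, which is consistent with the negative ICC values reported in the Results.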

Data analysis

Data analysis, including data preprocessing, feature selection, and model development and evaluation, was performed by using the scikit-learn package (version 0.23.2, https://scikit-learn.org/) in Python (version 3.8, https://www.python.org/) [21].

Before feature selection, the mean and standard deviation of each feature in the training cohort were calculated, and the values of each feature were normalized using the Z-score (Z = (x − mean)/standard deviation). Subsequently, SMOTE was adopted because of the imbalance between the numbers of low- and high-grade meningiomas [22].
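The two steps above can be sketched with NumPy. This is a simplified illustration, not the implementation used in the study: the statistics are fitted on the training cohort and reused for the test cohort, the population standard deviation is assumed, oversampling is assumed to be applied to the training cohort only, and `smote_like` is a hypothetical minimal version of SMOTE that interpolates between a minority sample and one of its k nearest minority neighbors. The cohort sizes are synthetic.

```python
import numpy as np


def zscore_fit(train):
    """Per-feature mean and standard deviation from the training cohort."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return mean, std


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating toward minority neighbors."""
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                 # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]    # k nearest minority neighbors
    base = rng.integers(0, len(minority), size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                   # position along the segment
    return minority[base] + gap * (minority[chosen] - minority[base])


# Synthetic data: 30 high-grade (label 1) and 70 low-grade (label 0) training cases.
rng = np.random.default_rng(1)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
y_train = np.array([1] * 30 + [0] * 70)
X_test = rng.normal(loc=5.0, scale=2.0, size=(25, 4))

mean, std = zscore_fit(X_train)
Z_train = (X_train - mean) / std
Z_test = (X_test - mean) / std   # test data reuse the training statistics

synthetic = smote_like(Z_train[y_train == 1], n_new=70 - 30)
X_bal = np.vstack([Z_train, synthetic])
y_bal = np.concatenate([y_train, np.ones(len(synthetic), dtype=int)])
print(X_bal.shape, np.bincount(y_bal))  # (140, 4) [70 70]
```

Reusing the training mean and standard deviation for the test cohort avoids leaking test information into the model; the test cohort itself is never oversampled in this sketch.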

A three-step feature selection procedure was applied to find the optimal feature combination. First, we calculated the ANOVA p-value between the label and each feature and removed the features with p > 0.05. Next, the importance of each feature was calculated using the LightGBM algorithm, and the 20 top-ranked features were retained [23]. Finally, recursive feature elimination with 5-fold cross-validation (RFECV) was performed to select the most effective combination of features according to the mean area under the curve (AUC). To obtain optimal hyperparameters, a grid search with 5-fold cross-validation was implemented.
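The three-step selection can be sketched with scikit-learn on synthetic data. One substitution is made deliberately: scikit-learn's `GradientBoostingClassifier` stands in for LightGBM so that the example depends on a single library; the study itself reports using LightGBM. The data, estimator settings, and variable names are assumptions, and the grid search is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFECV, f_classif

# Synthetic stand-in for the normalized training features.
X, y = make_classification(
    n_samples=105, n_features=200, n_informative=10, random_state=0
)

# Step 1: ANOVA filter, dropping features with p > 0.05.
_, p_values = f_classif(X, y)
step1 = np.flatnonzero(p_values <= 0.05)

# Step 2: rank the survivors by tree-based importance and keep the top 20.
booster = GradientBoostingClassifier(n_estimators=30, random_state=0)
booster.fit(X[:, step1], y)
top = np.argsort(booster.feature_importances_)[::-1][:20]
step2 = step1[top]

# Step 3: recursive feature elimination with 5-fold CV, scored by mean AUC.
selector = RFECV(
    GradientBoostingClassifier(n_estimators=30, random_state=0),
    step=1,
    cv=5,
    scoring="roc_auc",
)
selector.fit(X[:, step2], y)
selected = step2[selector.support_]

print(len(step1), len(step2), len(selected))
```

The indices in `selected` refer to columns of the original feature matrix, so the same columns can be taken from the test cohort without refitting any selection step on test data.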

Models’ development and diagnostic performance assessment

We constructed a DLRM using the LightGBM algorithm by combining the two categories of features [24]. For performance comparison, we applied a similar method to construct the other five models. All predictive models were trained on the training cohort and then tested on the test cohort. The flowchart of model development is shown in Supplementary Fig. 2. The predictive performance of the developed models was assessed using the ROC curve. The accuracy, sensitivity, and specificity were calculated based on the independent test cohort.

Results

Patients’ characteristics

Patients with multiple lesions (n = 9), a history of treatment before MRI scans (n = 8), or substandard image quality, such as artifacts or incomplete tumor display (only part of the tumor was scanned) (n = 4), were excluded. Finally, a total of 132 patients with meningiomas treated from January 2017 to December 2020 were retrospectively reviewed in our study, comprising 39 males and 93 females (age range 34 to 72 years, median 56). Forty-eight patients had high-grade meningiomas, and 84 patients had low-grade meningiomas. The patients were randomly divided into a training cohort (n = 105) and a test cohort (n = 27) according to the 8:2 ratio.

Feature screening

The intra-observer ICC ranged from −0.766 to 1, and the interobserver ICC ranged from −0.936 to 1. A total of 7472 features were extracted for each patient, and 4833 features were excluded due to poor reproducibility (ICC range: −0.936 to 0.899), leaving 2639 features. All the extracted features are listed in Supplementary Material 1.

Then, 341 features were retained after removing features with p > 0.05. The top 20 features ranked by the LightGBM algorithm were entered into RFECV to select the most effective combination of features. Finally, a total of 15 features, including 12 radiomics features and 3 deep learning features, were selected to construct the DLRM using SMOTE. The selected features and their corresponding importance are displayed in Fig. 3. The heat map of the selected features is shown in Fig. 4. The features selected for developing the RM and the DLM are listed in Supplementary Material 2.

Fig. 3

The selected features and corresponding importance

Fig. 4

The heat map of the selected features.

Predictive performance of the radiomics features

The test cohort was used to evaluate the performance in differentiating low- and high-grade meningiomas. Without SMOTE, the AUC, accuracy, sensitivity, and specificity of the DLRM were 0.912, 0.889, 1.000, and 0.824 in the test cohort, respectively. The DLRM outperformed the RM and the DLM in both the training and test cohorts (RM and DLM, respectively: training AUC, 0.950 and 0.839; test AUC, 0.888 and 0.656). The models using SMOTE showed better performance. After the adoption of SMOTE, the AUC, accuracy, sensitivity, and specificity of the DLRM were 0.935, 0.926, 0.900, and 0.924 in the test cohort, respectively, which remained superior to the performance of the RM and the DLM in both the training and test cohorts (RM and DLM, respectively: training AUC, 0.980 and 0.892; test AUC, 0.918 and 0.718). ROC curves are shown in Fig. 5a and b. All diagnostic performance parameters are listed in Table 1.
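How these metrics relate to a confusion matrix can be shown with a short sketch. The counts below are hypothetical: the confusion matrix is not given in the text, and they were chosen only because, for a 27-patient test cohort with high grade as the positive class (an assumption), they reproduce the accuracy, sensitivity, and specificity reported for the DLRM without SMOTE. The rank-based AUC function is likewise illustrative and is evaluated on toy scores.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true-positive rate (high grade detected)
        "specificity": tn / (tn + fp),   # true-negative rate (low grade detected)
    }


def auc_from_scores(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts: 10 high-grade and 17 low-grade test cases.
metrics = diagnostic_metrics(tp=10, fn=0, tn=14, fp=3)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.889, 'sensitivity': 1.0, 'specificity': 0.824}

print(auc_from_scores([0.9, 0.8], [0.1, 0.85]))  # 0.75
```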

Fig. 5

ROC curves of test cohort

Table 1 Performance of different models in the training and test cohort

Discussion

Meningioma grade is a crucial determinant of tumor recurrence and overall survival. The clinical prognosis differs significantly between pathological grades, with higher grades tending to have higher recurrence rates (I: 7–25%; II: 29–52%; III: 50–94%) [25]. Therefore, given the marked differences in treatment strategy and survival outcome, accurate and noninvasive evaluation of the pathological grade prior to surgery is vital for developing a personalized treatment plan. In the present study, we constructed a DLRM based on axial and sagittal enhanced T1WI images, which showed favorable performance for differentiating low- and high-grade meningiomas. The DLRM performed better than the RM and the DLM. In the training cohort, the AUC was 0.988 for the DLRM, 0.980 for the RM, and 0.892 for the DLM. In the test cohort, the AUC was 0.935 for the DLRM, 0.918 for the RM, and 0.718 for the DLM. Performance in the test cohort is generally expected to be slightly lower, because a model that performs best on the training set is unlikely to perform equally well on unseen data. In addition, we speculate that the performance of our model in the test cohort was attenuated by the limited sample size and the imbalance between low-grade and high-grade meningioma patients. Further studies with more data are required to strengthen these findings.

Meningioma grading is crucial because it aids in treatment planning [26]. MRI is the first choice for the diagnosis and preoperative assessment of meningiomas [6, 27]. Nevertheless, the value of conventional sequences is limited for differentiating the grades of meningiomas, and controversial results have been reported for apparent diffusion coefficient values [12, 28]. It is well known that the presence of intratumoral edema, hemorrhage, and necrosis produces heterogeneous tumor parenchyma, which is closely related to high tumor aggressiveness. However, visual evaluation of these semantic features by radiologists is highly subjective, resulting in considerable interobserver variability in clinical practice. In contrast, radiomics is a promising approach that has been applied to noninvasive meningioma grading with favorable performance [17]. Lorenzo et al. summarized eight related radiomics studies, in which the pooled AUC of the studies that employed a test cohort was 0.84 (95% confidence interval [CI] = 0.78–0.90), suggesting that radiomics could serve as an effective tool in grading meningiomas [17]. Thus, the present study investigated the clinical usefulness of machine learning-based radiomics analysis in differentiating low- and high-grade meningiomas.

Currently, various machine learning algorithms, such as random forest and decision tree (DT), have been applied to radiomics analysis. LightGBM, a novel and advanced gradient-boosting decision tree (GBDT) algorithm, was proposed in 2017 by Microsoft Research Asia. It is a distributed, high-performance gradient boosting framework based on DT learning. Its advantages include gradient-based one-side sampling and exclusive feature bundling, which are designed to handle large numbers of data samples and features, such as the massive number of radiomics features. Additionally, LightGBM offers faster training with a lower memory cost than traditional boosting algorithms. Thus, LightGBM is well suited to the analysis of high-throughput data [29].

In our study, we extracted a total of 3376 radiomics features from each patient, which included advanced higher-order features in addition to first-order features. Entropy belongs to the GLCM feature pool and essentially reflects the spatial distribution of intensity. Specifically, a larger entropy value represents greater tumor heterogeneity. The GLSZM, in turn, describes the uniformity of the tumor texture. In general, high-grade meningiomas are characterized by a larger proportion of tissue disruption and thus a more heterogeneous distribution of cells within the lesion than low-grade meningiomas [30]. These radiomics features can therefore reflect the spatial heterogeneity of meningiomas of different pathological grades. On this basis, we developed an RM based on enhanced T1WI images for grading meningiomas, which achieved an AUC of 0.918 in the test cohort.

We analyzed preoperative contrast-enhanced T1WI images for the following reasons: (1) previously reported works that predicted the pathological grade of meningiomas based on presurgical MRI images mainly focused on this sequence in order to obtain more robust results; (2) deep learning features derived from contrast-enhanced T1WI images have recently demonstrated good discrimination performance in both the primary and test cohorts, with an AUC of 0.811 in the test cohort [11]. It is well known that most meningiomas show marked enhancement on enhanced T1WI images because of their abundant blood supply. Thus, the tumor boundaries can be clearly displayed. Furthermore, a previous comparison of the diagnostic performance of T1WI and T2WI sequences concluded that a predictive model built on the enhanced T1WI sequence differentiates meningiomas better, so we outlined the volume of interest only on the enhanced T1WI images [27, 31, 32].

The clinical usefulness of enhanced T1WI images in determining the pathological grade of meningiomas has been investigated by several researchers, with discrepant results. Laukamp et al. reported an AUC of 0.76 for the differentiation between low-grade and high-grade meningiomas. However, that study was carried out on a relatively small sample (71 patients), and only shape and texture features were extracted from conventional enhanced T1WI images; advanced radiomics features were not obtained for technical reasons [31]. In contrast, the AUC of the radiomics features was 0.950 in the training cohort of the current study, which demonstrated a higher diagnostic efficacy. The current study used a relatively larger sample (132 cases), higher-order features (such as gray-level co-occurrence matrix (GLCM) and neighboring gray-tone difference matrix (NGTDM) features), and a more advanced radiomics analysis approach. Another previous study reported that radiomics feature-based machine learning classifiers of enhanced T1WI images are useful for differentiating meningioma grades. However, its study population was small, it analyzed only the largest dimension of the tumor on the enhanced T1 sequence rather than the whole tumor, and it did not address data imbalance. Data imbalance is inevitable in meningioma grading because of the relatively low incidence of high-grade meningiomas, and it has an adverse effect on the fitting of radiomics classification models [33]. In the present study, a state-of-the-art oversampling technique, SMOTE, was adopted to remedy this defect, and it improved the predictive performance.

Conventional machine learning techniques are limited in their ability to process natural data in their raw form. The construction of a pattern-recognition system requires careful engineering and considerable domain knowledge to design a feature extractor that transforms the raw data into a feature vector [34]. Deep learning is a representation learning method that allows a machine to be fed with raw data and to discover the representations needed for classification. Deep convolutional neural networks (DCNNs) automatically apply filters to create feature maps that describe the distribution of such features in medical images. An increasing number of clinical applications of DCNNs in neuroradiology have been reported [35].

Zhang et al. designed a deep learning algorithm to predict meningioma grade. For meningioma grade classification, the prediction accuracy in the test cohort was 81.52%, demonstrating a better diagnostic performance in comparison with our results (81.52%) [36]. Banzato et al. studied 117 meningioma-affected cases and reported that DCNNs could accurately discriminate between benign and atypical/anaplastic meningiomas from ADC maps but not from enhanced T1WI images. The discriminating accuracy of both DCNNs on postcontrast T1WI images was low, with Inception-v3 displaying an AUC of 0.68 (95% CI, 0.59–0.76) and AlexNet displaying an AUC of 0.55 (95% CI, 0.45–0.64) [37]. In the present study, the DLM achieved an AUC of 0.839 in the training cohort and 0.656 in the test cohort. The inconsistency between the above findings and our own results is probably due to differences in sample size and in MRI scanners, which varied in field strength (1.5T and 3.0T) and vendor.

To obtain more robust results, we investigated the additional value of the DLRM in grading meningiomas. Our data suggested that deep learning radiomics-based feature analysis yielded strong classification performance for meningioma grading. Recently, Zhu et al. combined radiomics features with deep learning features to construct a comprehensive model for grading meningiomas, the AUC of which was 0.814 in the test cohort [11]. In the present study, ANOVA was first used to remove features with p > 0.05, LightGBM feature importance was then used to retain the top 20 features, and RFECV was finally performed to select the most discriminating features. The retained features, which contributed most, were then used to develop the DLRM. Subsequently, the model was evaluated on the test cohort. The AUC in the test cohort reached 0.912, suggesting that the DLRM is reliable. Presumably, this was because we added the features extracted from sagittal images and used the pretrained Resnext101_32x8d as the feature extractor, which is one of the state-of-the-art CNN models and achieves an accuracy of 0.854 on the ImageNet dataset [38]. Moreover, the radiomics features were extracted from all lesion slices instead of three consecutive slices, and thus they could reflect tumor heterogeneity comprehensively. Together, these choices allowed us to obtain more valuable information.

This study has several limitations. (1) Potential selection bias might exist because of the retrospective nature of the study. (2) We outlined the VOIs only in the tumor parenchyma; thus, the potential significance of peritumoral edema areas for the grading of meningiomas should be explored in the future. (3) The sample size was limited; a larger number of cases should be included in subsequent studies, and external test data are needed for further validation.

In summary, this study demonstrates that deep learning radiomics analysis based on contrast-enhanced T1WI images is clinically useful for differentiating between high-grade and low-grade meningiomas. Further investigation in a larger, multicenter population should be carried out to improve the generalizability of this method.

Conclusion

In summary, the deep learning radiomics model based on enhanced T1WI images has favorable performance for differentiating low- and high-grade meningiomas, and the addition of deep learning features improves the performance of the radiomics model. Nevertheless, additional studies with high methodological quality are still required to validate our results.