Introduction

Trigeminal neuralgia is characterized by paroxysmal, electric shock-like pain in the sensory distribution area of the trigeminal nerve [1]. Microvascular decompression (MVD) is one of the most commonly performed and effective surgical procedures for treating classical trigeminal neuralgia (CTN), and its mechanism of pain relief is based on relieving neurovascular compression (NVC) [2, 3]. However, approximately 30% of patients may experience pain recurrence within 10 years after MVD [4]. If an accurate prognostic model were available, patients at high risk for poor outcomes following MVD could be fully informed and offered alternative surgical options such as percutaneous surgeries and Gamma Knife procedures. While these procedures may entail more complications and be less effective than MVD, they can still be a viable option for these patients [3]. To identify patients with CTN at high risk for poor outcomes before surgery, previous studies have evaluated certain clinical characteristics [5,6,7]. Unfortunately, these characteristics are insufficient for accurately and consistently predicting patient prognosis across independent studies. For example, Panczykowski et al. developed a scoring system that demonstrated good discrimination for predicting MVD outcomes but achieved only moderate discrimination in an independent study by Ishaque et al. [8, 9]. Consequently, there is an urgent need to establish a prediction model that offers greater accuracy and more stable performance.

Three-dimensional steady-state free precession (3D-SSFP) is a gradient-echo MRI sequence, and three-dimensional time-of-flight magnetic resonance angiography (3D-TOF-MRA) is a sequence based on the phenomenon of flow-related enhancement [10, 11]. Both sequences are integral to the routine management of patients with CTN because they provide the required anatomic information and identify the responsible vessels [10]. In recent years, NVC and trigeminal nerve characteristics have been proposed as prognostic biomarkers for MVD outcomes [5, 12, 13]. However, pathological changes in the microstructure of the cistern segment of the trigeminal nerve, such as axonopathy, axonal loss, and demyelination, cannot be observed or quantified by using conventional MRI-based imaging tools. Fortunately, with advancements in imaging technology, the concept of radiomics has emerged. Radiomics is a novel tool that can analyze the digitally encoded information of medical images, thus enabling non-invasive quantification of pathophysiology throughout the entire lesioned tissue volume by extracting specific features [14,15,16,17,18]. Radiomics features of the cistern segment of the trigeminal nerve have the potential to add predictive capability and stability to traditional models. Although previous studies have identified some texture features associated with NVC, no study has examined the role of radiomics signatures in predicting MVD outcomes [19,20,21]. Thus, our objective was to create a clinical-radiomics nomogram that combines radiomics features and clinical characteristics to predict MVD outcomes, with external validation.

Materials and methods

Patient selection

This retrospective multicentre study was approved by the review boards of all of the involved institutions, and the need to obtain written informed consent was waived. Six hundred thirty-five patients who were refractory to medical treatment and who underwent MVD at three hospitals from March 2016 to March 2023 were screened. The diagnosis of TN and classification of different subtypes followed the latest criteria established in 2018 by the International Headache Society [22]. The inclusion criteria for this study were patients with CTN who underwent MVD and MRI examinations. The exclusion criteria included: (1) patients diagnosed with idiopathic TN; (2) those without suitable MRI examinations; (3) individuals with an unidentified trigeminal root due to vertebrobasilar dolichoectasia; (4) patients with a history of previous TN-related surgeries; and (5) individuals who were lost to follow-up. A flowchart illustrating patient selection and screening is provided in Fig. 1.

Fig. 1

Flowchart of patient recruitment and grouping

Clinical characteristics and definitions

A response to initial medication was defined as experiencing no or tolerable pain after taking the medication, whereas no response was defined as experiencing intolerable pain that needed additional treatments for control. The pain type was categorized as either typical (purely paroxysmal pain) or atypical (paroxysmal pain combined with concomitant continuous pain). Two independent neuroradiologists reviewed all sequences to evaluate the degree, location, and vessel type of NVC. The degree of NVC was classified into three grades: I, II, and III. Grade I indicated slight direct contact between the surrounding vessels and the nerve, whereas grades II and III indicated that the responsible vessels were compressing the nerve, with grade III indicating nerve distortion [23]. To simplify, the degree of trigeminal nerve compression was classified into two levels: slight NVC (grade I) and severe NVC (grade II or III). The vessel type of NVC was classified as artery, vein, or both artery and vein. The location of the NVC was defined as the location of the most severe compression, and the NVC was categorized as the proximal segment (≤ 5 mm from the pons) or the distal segment [24, 25].

Outcome evaluation and follow-up

All of the patients were followed up for at least 6 months after MVD. Pain grade, medication use, and postoperative complications were assessed by contacting patients via phone. In cases where pain recurred, the duration from initial relief to recurrence was recorded. Pain outcomes were evaluated and scored by using the Barrow Neurological Institute (BNI) pain intensity grades [26]. MVD outcomes were classified as favorable if the pain was effectively relieved without the need for medication (BNI I or II) 6 months after MVD and if there was no relapse until the latest follow-up. A poor outcome indicated that patients either did not experience significant pain relief upon discharge or experienced pain recurrence during the follow-up period (BNI ≥ IIIa). Pain-free survival (PFS) was defined as the time period from the date of pain relief to the date of the first recurrence or the last follow-up.
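The outcome and PFS definitions above can be sketched as two small helpers. This is an illustration only: the function names, the string encoding of the BNI grades, and the conversion of days to months (30.44 days per month) are assumptions and do not come from the study.

```python
from datetime import date
from typing import Optional, Tuple

# BNI pain intensity grades in increasing severity (string encoding is an assumption).
BNI_ORDER = ["I", "II", "IIIa", "IIIb", "IV", "V"]


def is_poor_outcome(bni_at_discharge: str, worst_bni_during_follow_up: str) -> bool:
    """Poor outcome: no significant relief at discharge, or recurrence (BNI >= IIIa)."""
    threshold = BNI_ORDER.index("IIIa")
    return (
        BNI_ORDER.index(bni_at_discharge) >= threshold
        or BNI_ORDER.index(worst_bni_during_follow_up) >= threshold
    )


def pain_free_survival_months(
    relief: date, recurrence: Optional[date], last_follow_up: date
) -> Tuple[float, bool]:
    """Return (PFS in months, event flag); the flag is False when censored."""
    end = recurrence if recurrence is not None else last_follow_up
    months = (end - relief).days / 30.44  # assumed average month length
    return months, recurrence is not None
```

A patient without recurrence is treated as censored at the last follow-up, which is the form of input that survival analyses such as Cox regression expect.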

Image processing and radiomics feature extraction

The workflow of this radiomics study is depicted in Fig. 2. All of the patients underwent 3D-SSFP and 3D-TOF-MRA examinations within 1 month before MVD. The examination protocols and the detailed scanner parameters are provided in Supplementary Table 1. All of the DICOM images were imported into 3D-Slicer (version 4.9.0; http://www.slicer.org/) to delineate the cistern segment of the trigeminal nerve on each transverse section. The region of interest (ROI) was independently delineated by two neuroradiologists at hospitals A and B, both of whom had more than 5 years of clinical experience. Additionally, 50 randomly selected patients were segmented a second time by the neuroradiologist at hospital A after 1 month. The reproducibility of the data was assessed by using the intra- and interclass correlation coefficients (ICCs). To account for the inconsistency of MRI scan parameters among institutions, the segmented images of all patients were resampled to a voxel size of 1 × 1 × 1 mm, and image intensities were standardized through z-score normalization. A total of 2030 radiomics features were extracted and categorized into three groups: geometry, intensity, and texture. The detailed explanations of these features and extraction methods are presented in Appendix E1.
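As an illustration of the reproducibility assessment, the following minimal sketch computes an ICC for one feature measured on several patients by several raters (or sessions). The text does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is an assumption here, as is the function name.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater.

    Assumes a complete table (no missing values) with at least two
    subjects and two raters.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters or sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

In a pipeline like the one described, a feature would be kept only if this value exceeded the chosen cut-off for both the repeated segmentations by one reader and the segmentations by different readers.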

Fig. 2

Flowchart of the study. This study included radiomics feature extraction, radiomics and clinical feature selection, model building, validation, and clinical application

Radiomics feature selection

Robust radiomics features were selected in four steps. First, only features with high reproducibility (ICCs > 0.9) were retained. Second, the Mann‒Whitney U-test was applied to identify outcome-related features in each sequence, and only those features that achieved significance at p < 0.05 were included in further screening. Third, Spearman’s rank correlation coefficient was calculated between the remaining features; whenever the correlation coefficient between two features was greater than 0.9, only one of them was retained. To preserve the descriptive ability of the feature set to the greatest extent possible, a greedy recursive deletion strategy was employed for this filtering. Finally, the least absolute shrinkage and selection operator (LASSO) regression model was used on the discovery dataset to construct radiomics signatures. The LASSO method shrinks all of the regression coefficients towards zero and sets the coefficients of many irrelevant features exactly to zero, depending on the regularization weight λ. To identify the optimal λ, 10-fold cross-validation with the minimum criterion was employed, wherein the final value of λ yielded the minimum cross-validation error. The retained features with nonzero coefficients were used for model fitting and combined into a radiomics signature. Subsequently, a radiomics score was obtained for each patient by a linear combination of the retained features weighted by their model coefficients.
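The correlation filter and the radiomics score can be sketched as follows. This is a simplified illustration rather than the study's implementation: it uses a single greedy forward pass (keep a feature only if it is not highly correlated with any feature already kept) in place of the recursive deletion strategy, it omits the Mann‒Whitney and LASSO steps, and all function and feature names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (assumes non-constant inputs)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """features maps a name to its values across patients; returns the kept names."""
    kept = []
    for name, values in features.items():
        if all(abs(spearman(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def radiomics_score(feature_values, coefficients):
    """Linear combination of one patient's feature values and model coefficients."""
    return sum(coef * feature_values[name] for name, coef in coefficients.items())
```

In this sketch the order of the input decides which member of a correlated pair survives, so features would need to be ordered by some preference (for example, by univariate p value) before filtering.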

Model construction and validation

The patients in the development cohort were used to identify the predominant features and to develop predictive models. A clinical model was established based on independent predictors of MVD outcomes through univariate and multivariate Cox analyses with stepwise selection. Three models based on 3D-SSFP features, 3D-TOF-MRA features, and a combination of both were constructed, respectively. Additionally, a clinical-radiomics nomogram was developed that incorporated the radiomics signature with the best performance and clinical data via multivariate Cox regression analysis. The concordance index (C-index), integrated Brier score (IBS), time-dependent receiver operating characteristic (time-dependent ROC) curve, and area under the curve (AUC) were generated to evaluate the performance of the models. Decision curve analysis (DCA) was also conducted to assess the clinical net benefits and applicability of the nomogram at different threshold probabilities. The calibration efficiency of the nomogram was evaluated by using calibration curves and the Hosmer–Lemeshow test. Furthermore, the performances of various models were compared by using integrated discrimination improvement (IDI).
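For readers unfamiliar with the C-index, the following minimal sketch shows Harrell's concordance index for right-censored data. It is a plain O(n²) illustration with a hypothetical function name, not the ‘survcomp’ implementation used in the analysis, and the tie handling (half credit for tied risk scores, tied times not compared) is an assumption.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times       -- follow-up time for each patient
    events      -- 1 (or True) if pain recurred, 0 (or False) if censored
    risk_scores -- model output, where a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by time to recurrence.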

Statistical analysis

Independent t tests were used to compare the continuous variables, whereas χ2 tests were used to compare the categorical variables. Kaplan‒Meier (K‒M) survival analysis was also conducted to evaluate PFS at various time points, and the log-rank test was applied to compare the results. Comparisons of variables, univariate and multivariate Cox regression analyses, and K‒M survival analysis were performed by using IBM SPSS Statistics 27.0 for Windows. The PyRadiomics package (version 2.12) in Python 3.7.1 (https://www.python.org) was used to extract the radiomics features. The LASSO Cox regression model, time-dependent ROC curve analysis, and DCA were performed by using the ‘glmnet’, ‘timeROC’, and ‘stdca.R’ packages in R version 4.2.1 (https://www.r-project.org), respectively. The C-index was calculated by using the ‘survcomp’ package. A statistical significance level of 0.05 was set.
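As a brief illustration of how K‒M estimates of PFS are obtained, the sketch below implements the product-limit estimator in plain Python. It is not the SPSS procedure used in the analysis; the function name and the input encoding (1 for recurrence, 0 for censoring) are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if pain recurred at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            recurrences += data[i][1]
            leaving += 1
            i += 1
        if recurrences > 0:
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve
```

Censored patients contribute to the number at risk up to their last follow-up but never lower the curve themselves, which is why the estimator suits cohorts with unequal follow-up durations.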

Results

Clinical characteristics of patients in various hospitals

After screening, a total of 455 patients were included in the current study, with 279 from hospital A in the development cohort and 176 from hospitals B and C in the test cohort. The baseline demographic and clinical characteristics of the individuals in the development and test cohorts are compared in Table 1. The median follow-up periods for the development and test sets were 34 months (range, 6 to 89 months) and 35 months (range, 6 to 91 months), respectively. During the follow-up period, poor outcomes occurred in 63 patients (22.6%) in the development cohort and 49 patients (27.8%) in the test cohort. The PFS rates were similar between the two study cohorts (p = 0.40, log-rank test) (Supplementary Fig. 1). Supplementary Table 2 presents the baseline characteristics of patients with poor and favorable outcomes in both cohorts.

Table 1 Clinical characteristics of patients in the development and test cohorts

Establishment and performance of the clinical model

Univariate Cox analysis demonstrated that the NVC degree (p < 0.001) and pain type (p < 0.001) were associated with MVD outcomes in the development cohort (Supplementary Fig. 2). Variables with p < 0.1 were further analyzed by using multivariate analysis with the ‘Forward LR’ method. The results showed that slight NVC and atypical pain were independent risk factors for poor outcomes (p < 0.001). Subsequently, a clinical model was established, which exhibited C-indices of 0.758 and 0.691 in the development and test cohorts. The time-dependent ROCs and K‒M curves of the development and test cohorts are presented in Fig. 3.

Fig. 3

Performance of the clinical signature in predicting pain outcomes. ROC curves of the ability of the clinical signature to predict 1-, 3-, and 5-year PFS in the development (a) and test (c) cohorts. K‒M plots for PFS according to clinical factors in the development (b) and test (d) cohorts. PFS, pain-free survival; HR, hazard ratio; ROC, receiver operating characteristic; AUC, area under the receiver operating characteristic curve

Radiomics feature extraction and selection

A total of 2030 radiomics features were extracted for each patient, including 396 first-order features, 28 shape features, and 1606 texture features. Among these features, 2004 (1073 from the 3D-SSFP sequence and 931 from the 3D-TOF-MRA sequence) showed excellent reproducibility, with intraclass correlation coefficient values ranging from 0.915 to 0.999 and interclass values ranging from 0.902 to 0.998. The Mann‒Whitney U-test identified 467, 307, and 774 features that were significantly associated with MVD outcomes from the 3D-SSFP, 3D-TOF-MRA, and combined features, respectively. After filtering by Spearman’s rank correlation coefficient, only 80, 80, and 40 features were retained, respectively. Finally, the LASSO-Cox regression model was utilized, resulting in the retention of 7, 3, and 16 features with nonzero coefficients to construct the 3D-SSFP, 3D-TOF-MRA, and combination signatures (the last referred to as the radiomics signature) (Supplementary Fig. 3). The formulas for the 3D-SSFP, 3D-TOF-MRA, and radiomics signatures and the corresponding contribution coefficients are presented in Appendix E2 and Supplementary Fig. 4.

Radiomics signature construction and validation

The AUC values for 1-, 3-, and 5-year PFS of the prediction models based on 3D-SSFP and 3D-TOF-MRA, as well as radiomics signatures, are presented in Supplementary Fig. 5. The prediction model performed best when the radiomics signature was used, with C-indices of 0.804 and 0.796 in the development and external validation cohorts, respectively. The prediction model based on 3D-TOF-MRA yielded C-indices of 0.705 and 0.665, whereas the 3D-SSFP signature yielded C-indices of 0.682 and 0.708 in the development and external validation cohorts, respectively. The radiomics signature, which showed the best performance, discriminated patients at high risk from those at low risk in the development (HR: 9.25; 95% CI: 4.56–18.75; p < 0.001) and test (HR: 6.78; 95% CI: 3.51–13.08; p < 0.001) cohorts. The distributions of radiomics scores in the development and test cohorts are presented in Fig. 4.

Fig. 4

Performance of the radiomics signature in predicting pain outcomes and the radiomics scores of each patient. K‒M plots for PFS according to radiomics features in the development (a) and test (b) cohorts. Radiomics score for each patient in the development (c) and test (d) cohorts. PFS, pain-free survival; HR, hazard ratio; ROC, receiver operating characteristic; AUC, area under the receiver operating characteristic curve

Construction and clinical application of the clinical-radiomics nomogram

The NVC degree (HR: 4.29; 95% CI: 2.46–7.49; p < 0.001), pain type (HR: 4.37; 95% CI: 2.576–7.43; p < 0.001), and radiomics signature (HR: 20.02; 95% CI: 9.42–42.54; p < 0.001) were identified as independent predictors through univariate and multivariate Cox analyses and were used to establish the clinical-radiomics model. The clinical-radiomics model is presented as a nomogram to provide individualized risk estimates (Fig. 5). The C-index and IBS of the nomogram for predicting MVD outcomes were 0.865 and 0.108, respectively, in the development cohort and 0.834 and 0.141, respectively, in the external test cohort. The clinical-radiomics nomogram successfully stratified patients into high-risk and low-risk categories in both the development (HR: 38.75; 95% CI: 16.40–91.54; p < 0.001) and test (HR: 37.18; 95% CI: 13.35–103.59; p < 0.001) sets (Fig. 6). In the test cohort, low-risk patients had significantly higher 1-, 3-, and 5-year PFS rates (99.2%, 96.7%, and 94.9%, respectively) than high-risk patients did (58.8%, 39.3%, and 17.4%, respectively), as presented in Table 2. Additionally, a subgroup analysis showed that the clinical-radiomics model performed better in patients with severe NVC or typical pain than in those with slight NVC or atypical pain. The performance of the clinical-radiomics model for MVD outcome prediction in CTN patients with various degrees of NVC and pain types is summarized in Table 3. Calibration plots for the clinical-radiomics nomogram demonstrated that the model-predicted 1-, 3-, and 5-year PFS rates were well calibrated in both the development (Hosmer–Lemeshow test, p = 0.718) and test (Hosmer–Lemeshow test, p = 0.214) cohorts, as shown in Supplementary Fig. 6. DCA graphically demonstrated that the nomogram provided the largest net benefit across most of the range of threshold probabilities compared with the other models (Supplementary Fig. 7).

Fig. 5

Clinical-radiomics nomogram developed in the test cohort to predict 1-, 3-, and 5-year PFS

Fig. 6

Performance of the nomogram in predicting pain outcomes. ROC curves of the nomogram for 1-, 3-, and 5-year PFS in the development (a) and test (b) cohorts. K‒M plots for PFS according to the nomogram in the development (c) and test (d) cohorts. PFS, pain-free survival; HR, hazard ratio; ROC, receiver operating characteristic; AUC, area under the receiver operating characteristic curve

Table 2 Median PFS and PFS rates according to risk group defined by the nomogram
Table 3 Subgroup analysis of the performance of the nomogram

Model comparison

The nomogram showed the best C-index, AUC, and IBS scores compared with the other models at various time points in both cohorts (Supplementary Fig. 8 and Table 4). Furthermore, the nomogram indicated an improved prediction performance with significantly positive IDIs when compared to the clinical model in predicting 1-, 3-, and 5-year PFS (Table 5). The radiomics features added value for the prediction of MVD outcomes in both the development and test cohorts at three different time points.

Table 4 Predictive performance of nomogram compared with other models
Table 5 IDI of the nomogram versus clinical model and radiomics model

Discussion

This was the first multi-institutional study to construct a clinical-radiomics nomogram to predict the risk of poor outcomes in patients with CTN following MVD. The clinical-radiomics nomogram based on two clinical factors and 16 radiomics features from 3D-SSFP and 3D-TOF-MRA sequences showed good performance in predicting long-term outcomes following MVD and remained stable across multiple centers. Furthermore, the nomogram outperformed both the clinical and radiomics models in terms of prognostic ability and clinical utility. The clinical-radiomics nomogram could accurately categorize CTN into high-risk and low-risk subgroups for poor MVD outcomes in advance, thus suggesting that our findings may provide value in guiding therapeutic strategies.

Although MVD is widely accepted as being the most effective modality for treating CTN, it is important to note that not all patients are suitable for this procedure and the choice of surgical intervention should be carefully considered after evaluating the corresponding risks and benefits. For example, if pain recurs soon after MVD, this approach may not be cost-effective due to the greater risks associated with surgery and anesthesia compared to other surgical options [27,28,29]. Therefore, the preoperative prediction of MVD outcomes is of great clinical importance in developing personalized surgical plans. Several clinical models have recently been developed by using clinical characteristics and MRI findings [30,31,32,33]. However, these models have shown limited accuracy and have not performed well in external cohorts. Similarly, our clinical model based on pain type and NVC degree showed acceptable discriminatory ability (C-index = 0.758) in the development cohort, whereas it did not perform well in the external validation cohort and only showed low discrimination (C-index = 0.691) [34].

Given that clinical information and NVC characteristics only reflect certain aspects of CTN, we introduced radiomics features in this study, which may provide additional disease information and further improve the performance and accuracy of the prediction model. As previous studies have suggested that the primary location of pathological changes in CTN is the trigeminal root, we chose the cistern segment of the trigeminal nerve as the ROI [24, 35,36,37,38]. Due to the diverse protocols and scanners used by different institutions, z-score normalization was conducted prior to feature extraction in order to minimize variability. Ultimately, 7, 3, and 16 features extracted from 3D-SSFP, 3D-TOF-MRA, and their combination, respectively, were found to be strongly correlated with MVD outcomes and were used to construct radiomics-based models. Notably, among these features, most were textural features that reflected microscopic descriptions of the trigeminal root. Additionally, we focused on the shape of the trigeminal root, as previous studies have shown that trigeminal root atrophy is associated with MVD outcomes [13, 39]. However, in this study, we observed no relationship between trigeminal root volume and pain outcomes. This discrepancy may be attributed to the fact that trigeminal root atrophy is better reflected by the cross-sectional area than by the volume, which can be influenced by the length variable [40, 41]. The radiomics signature demonstrated superior performance compared to either of the other two models based on a single sequence. This finding suggested that combining multiple MRI sequences could provide more comprehensive information on the trigeminal root. By combining 3D-SSFP and 3D-TOF-MRA, it becomes possible to discover morphological information and microscopic changes in both the trigeminal root and microvessels simultaneously [16]. Importantly, the predictive performance of radiomics features remained stable, whereas the predictive performance of clinical factors decreased significantly in the external validation set. This difference in stability between clinical and radiomics features could be attributed to the subjective nature of assessing certain clinical features, whereas radiomics features are more objective.

When clinical factors were integrated with radiomics features to establish a fusion model, we observed improved performance. The clinical-radiomics model outperformed both the clinical model and the radiomics model in terms of predictive performance and clinical utility. Based on the best model, we constructed a corresponding nomogram that effectively distinguished high-risk patients from low-risk patients in both the development cohort (HR: 38.75; 95% CI: 16.40–91.54; p < 0.001) and the test cohort (HR: 37.18; 95% CI: 13.35–103.59; p < 0.001). By grouping patients according to the nomogram risk, we found that the 3-year PFS rate for the high-risk group was 39.3%, which decreased to only 17.4% at 5 years. In contrast, the PFS rate in the low-risk group remained high at various time points. Interestingly, the clinical-radiomics model demonstrated superior predictive power in patients with severe NVC and typical pain compared to those with slight NVC and atypical pain. Our findings suggest that there may be other factors (such as central mechanisms) involved in the development of CTN with slight NVC or atypical pain, which correspondingly affects the prognosis of MVD.

Our study had several limitations. First, the retrospective nature of the study may introduce inherent biases, although we included external validation data from two other independent hospitals. To validate the model, further prospective studies are needed. Second, we had to exclude CTN patients with vertebrobasilar dolichoectasia because the trigeminal root was severely distorted and could not be identified. Third, the MRIs that are currently used include other sequences, such as three-dimensional volumetric interpolated breath-hold examinations and three-dimensional turbo spin echo sequences, which are not commonly performed in these centers. These additional sequences may provide additional features and further improve the predictive accuracy. Fourth, a partial volume effect may exist and correspondingly affect the boundary of the trigeminal root. Fifth, MVD surgeries were performed by more than three senior neurosurgeons in three independent hospitals, leading to inevitable differences in MVD outcomes. Finally, in this large-scale study, two neuroradiologists manually delineated the trigeminal root. Therefore, it is important for future research to develop more effective automatic segmentation tools.

The results of our study demonstrated that radiomics features can add value to the existing predictive model. Our clinical-radiomics nomogram is a robust and externally validated prediction tool and has the potential to be utilized in clinical practice. However, the clinical application of nomograms in real-world practice still has a long way to go. Despite significant improvements in predictive power and stability, there remains a risk of incorrect predictions that could harm patients’ interests. Furthermore, challenges related to data accessibility, computational resources, and the interpretability of radiomics features pose additional obstacles.