Introduction

Nasopharyngeal carcinoma (NPC) is a very common endemic neoplasm in southeast and eastern Asia [1]. Radiotherapy is currently the preferred modality of treatment for non-metastatic NPC, but the temporal lobes (TL) are inevitably included in the radiation field [2, 3]. The reported rate of radiation-induced temporal lobe injury (RTLI) ranges from 4.6 to 8.5% for patients treated with intensity-modulated radiation therapy (IMRT) [4,5,6]. Patients who develop RTLI suffer impairment of memory, language, and mobility [7], but nearly half of patients (45.3%) are asymptomatic at diagnosis [7], and the majority remain asymptomatic even at a late stage [8]. Early identification or individualized prediction of RTLI is therefore an important requirement for improving the quality of life and prognosis of NPC patients [9].

Some studies have focused on identifying the risk factors leading to RTLI. Guan [10] developed a model for the prediction of RTLI in NPC patients that included Dmax (the maximum point dose) of the TL, D1cc (the maximum dose delivered to a 1-cm³ volume), T stage, and neutrophil-to-lymphocyte ratios (NLRs). Zeng [11] and Huang [12] reported that D1cc and V20 (absolute volumes of the TL receiving at least 20 Gy) were predictive of RTLI for NPC. However, the optimal dose/volume predictors for RTLI still vary between studies, and their clinical application is limited. The imaging diagnosis of RTLI currently depends mainly on MRI [13]. However, existing conventional MRI techniques can only detect RTLI at the irreversible stage [14]. Other advanced imaging modalities, such as diffusion and perfusion MRI, have been reported to provide additional information in RTLI diagnosis [15, 16], but functional MRI places higher demands on equipment and scanning technique. Moreover, considering the long latency period and the small number of RTLI cases, it is very challenging to guarantee the accuracy and consistency of predefined region of interest (ROI) placement; inconsistent placement introduces great variation in the measurements and leads to inconsistent conclusions. Recently, artificial intelligence (AI) approaches such as radiomics have been widely used in predicting treatment outcomes such as complications and disease progression [17]. Radiomics describes the process of extracting large amounts of image-based features from routine diagnostic scans. High-dimensional data that quantify tumor shape, image intensity, and texture may reflect the characteristics of the disease and can be applied in clinical decision support [18].

The purpose of this study is to develop and validate a radiomics-based model for the pretreatment prediction of RTLI in patients with NPC.

Materials and methods

Patient selection

The institutional review board approved this retrospective study, and the requirement to obtain informed consent was waived (institutional ethics approval number 21/278-2949). We searched the radiology reports for the term radiation-induced temporal lobe injury on MRI scans obtained between January 2017 and May 2021. A total of 108 NPC patients with RTLI were included according to the inclusion and exclusion criteria (Fig. 1; Supplementary Text). The 108 controls were randomly selected from patients without RTLI after IMRT over the same period, according to the inclusion and exclusion criteria (Fig. 1; Supplementary Text). Thus, 216 patients were included in this study and were randomly allocated to a training set (136 patients) and a validation set (80 patients).

Fig. 1

Diagram for inclusion of patients into the study. IMRT intensity-modulated radiotherapy, NPC nasopharyngeal carcinoma, RTLI radiation-induced temporal lobe injury

Baseline clinical-pathologic data, including gender, age, NLRs, stage (T stage, N stage, and clinical stage), pathologic type, treatment, Dmax for each TL, and the planning gross tumor volume including the primary nasopharyngeal tumor or enlarged retropharyngeal nodes (PGTVNX), were obtained from the medical records. All patients underwent a standard treatment regimen that consisted of IMRT and concurrent or adjuvant chemotherapy with or without induction chemotherapy. IMRT was delivered with a total dose of 70–76 Gy in 30–33 fractions. After completion of radiation therapy, follow-up MRI of the head and neck was performed every 1–3 months during the first 2 years, every 6 months in years 3–5, and annually thereafter [2]. Diagnostic criteria for temporal lobe injury [19] were as follows: (a) white matter lesions, defined as areas of finger-like lesions of increased signal intensity on T2-weighted images; (b) contrast-enhanced lesions, defined as lesions with or without necrosis on post-contrast T1-weighted images with heterogeneous signal abnormalities on T2-weighted images; (c) cysts, defined as round or oval well-defined lesions of very high signal intensity on T2-weighted images with a thin or imperceptible wall.

Image acquisition

MRI examinations were performed using a 3.0-T scanner (GE Discovery MR 750, General Electric Medical Systems) with an 8-channel head and neck phased-array coil. Axial fast spoiled gradient-echo (FSPGR) contrast-enhanced T1-weighted (CE-T1w) imaging was performed 60 s after intravenous bolus injection of gadopentetate dimeglumine (Magnevist, Bayer, Leverkusen) at a dosage of 0.2 ml/kg of body weight and a rate of 1.5 ml/s using a power injector. The imaging protocol and parameters are detailed in Supplementary Table 1.

Temporal lobe segmentation

Axial fat-suppressed T2-weighted (FS-T2w) and CE-T1w MR images were loaded into ITK-SNAP software (version 3.6.0, http://www.itksnap.org) for segmentation. The ROI was manually delineated along the boundaries of the middle and lower portions of the TL, from the top level of the cerebral peduncle to the bottom of the TL. Bilateral temporal lobes were covered in the ROI of all patients. All segmentations were performed by one radiologist (D.B., with 5 years of experience in head and neck MRI diagnosis) and confirmed by a senior radiologist (D.H.L., with 38 years of experience). Disagreements were resolved by consensus. To assess segmentation variability, a subset of 30 randomly selected patients was independently delineated by another radiologist (Y.F.Z., with 19 years of experience). The reliability was calculated by using the Dice similarity coefficient (DSC). According to the guidelines [20], DSC ≥ 0.75 indicates good agreement.
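As an illustration of the agreement measure, the DSC between two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that calculation, not the software used in this study; the function name and the toy masks are hypothetical, and the masks are assumed to be flattened to equal-length sequences of 0/1 voxels.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labeled as temporal lobe by both readers.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / (size_a + size_b)


# Toy example (hypothetical data): two readers labeling six voxels.
reader_1 = [0, 1, 1, 1, 0, 0]
reader_2 = [0, 1, 1, 0, 0, 0]
dsc = dice_coefficient(reader_1, reader_2)
good_agreement = dsc >= 0.75  # threshold quoted in the text
```

In this toy case the readers share 2 voxels out of 3 and 2 labeled voxels, so the DSC is 4/5 = 0.8.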

Radiomics feature extraction

Radiomic features were extracted from the volumes of interest (VOIs) by using the AK software (Analysis Kit, GE Healthcare). MR images were normalized by centering on the mean and scaling by the standard deviation, resampled to a voxel size of 1 × 1 × 1 mm³ using B-spline interpolation, and discretized with a fixed gray-level bin width of 25 in the histogram. In total, 1316 radiomics features including 14 shape features, 252 first-order intensity features, and 1050 texture features were extracted from each sequence.
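The two intensity preprocessing steps can be sketched as follows. This is an illustration only: the internals of the AK software are not described here, so the sketch assumes that the normalization is a z-score and that the discretization follows the common fixed-bin-width convention (bin index = floor(x / width) - floor(min / width) + 1). The function names and intensity values are hypothetical, and any rescaling applied between the two steps is not modeled.

```python
import math
import statistics


def zscore(values):
    """Center on the mean and scale by the (population) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # zero for a constant image: not handled here
    return [(v - mean) / sd for v in values]


def discretize_fixed_bin_width(values, bin_width=25):
    """Assign 1-based gray-level bins of equal width (assumed convention)."""
    offset = math.floor(min(values) / bin_width)
    return [math.floor(v / bin_width) - offset + 1 for v in values]


# Hypothetical voxel intensities, for illustration only.
intensities = [10, 30, 55, 80, 120]
normalized = zscore(intensities)
bins = discretize_fixed_bin_width(intensities, bin_width=25)
```

With a bin width of 25, the five example intensities fall into five consecutive bins, which is what a texture matrix such as the GLSZM or NGTDM would then be computed on.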

Feature selection and signature construction

We devised a three-step procedure for dimensionality reduction and selection of robust features (Fig. 2). First, least absolute shrinkage and selection operator (LASSO) regression was performed to remove irrelevant features in the training set. Second, Pearson correlation analysis was used to further reduce the redundancy of radiomic features: for each pair of significantly correlated features (p < 0.05 and correlation coefficient > 0.5), one feature was removed from further analysis. Finally, multivariable logistic regression analysis was applied to select RTLI-related features with p < 0.05. A radiomics signature (Rad-score) was generated via a linear combination of selected features weighted by their respective coefficients. The association of clinical variables with RTLI was evaluated by using logistic regression analysis.
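The redundancy-reduction step can be illustrated with a small greedy filter. This is a sketch under stated assumptions, not the authors' implementation: it keeps a feature only if its absolute Pearson correlation with every previously kept feature does not exceed 0.5, it omits the accompanying significance test (p < 0.05), and the rule for which member of a correlated pair is dropped (here, the later one) is an assumption. Feature names and values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.5):
    """Greedily keep features that are not strongly correlated with kept ones."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson_r(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature table: one list of per-patient values per feature.
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [2.1, 3.9, 6.2, 8.0, 9.9],   # nearly 2 * feature_a
    "feature_c": [1.0, -1.0, 0.0, -1.0, 1.0],  # uncorrelated with feature_a
}
selected = drop_correlated(features, threshold=0.5)
```

Here feature_b is removed because it is almost a rescaled copy of feature_a, while feature_c is retained.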

Fig. 2

Workflow of the development and testing of a radiomics model. GLCM gray level co-occurrence matrix, GLDM gray level dependence matrix, GLRLM gray level run length matrix, GLSZM gray level size zone matrix, NGTDM neighboring gray tone difference matrix

Model development and validation

Logistic regression analysis was conducted to develop three models for RTLI prediction in the training cohort, based on the radiomics signature, the clinical variables, and the combined clinical-radiomics parameters, respectively. The variance inflation factor was calculated to check for collinearity among the variables included in the regression equations; a variance inflation factor greater than 10 indicates multicollinearity [21]. The predictive performance of the established models was quantified by the receiver operating characteristic (ROC) curve and the area under the curve (AUC). AUC estimates were compared between prediction models by using the DeLong nonparametric approach. Tenfold cross-validation was performed, repeating model development in each iteration. The average AUC and average sensitivity, specificity, and accuracy were provided as performance metrics. To provide a more understandable outcome measure, a nomogram was then constructed. Calibration curves were plotted via bootstrapping with 1000 resamples to assess the calibration of the radiomics model, accompanied by the Hosmer-Lemeshow goodness-of-fit test. Decision curve analysis (DCA) was used to calculate the net benefit from the use of the radiomics model at different threshold probabilities in the validation dataset. Patients were classified into high-risk or low-risk groups according to the clinical-radiomics model, and the threshold was identified by ROC analysis. The predictive ability of the model in subgroups with different clinical-pathologic characteristics was assessed with ROC analysis.
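For readers less familiar with the AUC, it equals the probability that a randomly chosen patient with RTLI receives a higher model score than a randomly chosen patient without RTLI, with ties counted as one half. The sketch below computes it directly from that definition; it is illustrative only (the analyses in this study were run in SPSS and R), and the function name, scores, and labels are hypothetical.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and outcomes (1 = RTLI, 0 = no RTLI).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = auc_from_scores(scores, labels)
```

In this toy example 8 of the 9 positive-negative pairs are ordered correctly, giving an AUC of 8/9.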

Statistical analysis

Categorical variables were compared by χ² test or Fisher exact test. Continuous variables were compared by independent samples t-test or Mann-Whitney U test. Statistical analysis was performed by using SPSS 26.0 (IBM) and R software (version 3.4.4, www.r-project.org). A two-sided p value less than 0.05 was considered to indicate statistical significance. The packages in R used in this study are described in Supplementary Table 2.

Results

Patient demographics

A total of 216 patients, including 145 men (mean age, 47.2 years; age range, 10–73 years) and 71 women (mean age, 44.1 years; age range, 9–65 years), were identified according to the inclusion and exclusion criteria (Fig. 1; Supplementary Text). Clinical characteristics of the training (n = 136) and validation (n = 80) sets are summarized in Table 1. There were no differences in clinical characteristics between the training and validation cohorts. Baseline clinical characteristics in patients with and without RTLI are summarized in Supplementary Table 3. A total of 108 patients included in the present study were diagnosed with RTLI (bilateral, 23; left, 39; right, 46). The proportion of patients with RTLI was 55.88% (76 of 136) and 40.00% (32 of 80) in the training and validation cohorts, respectively. The median duration of follow-up was 33.3 months (interquartile range, 25.8–41.9 months) until RTLI and 61.0 months (interquartile range, 53.2–66.7 months) without RTLI.

Table 1 Characteristics of patients in the training and validation cohorts

Inter-observer reproducibility of segmentation

The inter-reader Dice value was 0.98 ± 0.002 (range 0.978–0.985) for FS-T2w sequences and 0.98 ± 0.002 (range 0.977–0.985) for CE-T1w sequences between the two radiologists. These results indicated favorable inter-observer reproducibility for manual segmentation.

Feature selection and radiomics signature construction

Of the 2632 radiomics features extracted from the FS-T2w and CE-T1w images, 120 features associated with RTLI were identified by the LASSO regression algorithm. Pearson correlation analysis then reduced these to 11 features for subsequent analysis. The 11 radiomics variables in patients with and without RTLI are summarized in Supplementary Table 3. The 2 most relevant and stable features (one ngtdm and one glszm feature) from the training set were then selected. The radiomics signature was constructed, with a Rad-score calculated by using the following formula:

$$ \log\left(\text{radiomics score}\right) = -15.34 + 31876.46 \times \text{T2-W\_ngtdm\_Strength} + 16.38 \times \text{CET1-W\_glszm\_SmallAreaEmphasis} $$

where ngtdm quantifies the difference between a gray value and the average gray value of its neighbors within distance δ, and glszm is the amount of homogeneous connected areas within the volume of a certain size and intensity.
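To make the formula concrete, the sketch below evaluates its right-hand side for one hypothetical patient. The intercept and coefficients are those of the formula above; the two feature values are invented for illustration. Converting the linear predictor to a probability with the logistic function assumes that the left-hand side is a logit, which the text does not state explicitly.

```python
import math

# Coefficients taken from the Rad-score formula in the text.
INTERCEPT = -15.34
COEF_T2_NGTDM_STRENGTH = 31876.46
COEF_CET1_GLSZM_SAE = 16.38


def rad_score_linear(t2_strength, cet1_sae):
    """Right-hand side of the Rad-score formula (linear predictor)."""
    return (INTERCEPT
            + COEF_T2_NGTDM_STRENGTH * t2_strength
            + COEF_CET1_GLSZM_SAE * cet1_sae)


def logistic(z):
    """Map a linear predictor to (0, 1); assumes the formula is a logit."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature values for one patient (not from the study data).
z = rad_score_linear(t2_strength=0.0002, cet1_sae=0.6)
probability = logistic(z)
```

The very large coefficient on the ngtdm Strength term reflects the small numeric scale of that feature rather than a larger effect; in this example both terms contribute comparably to the linear predictor.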

Prediction model development and validation

The radiomics signature indicated a favorable prediction of RTLI with an AUC of 0.89 (95% confidence interval [CI]: 0.83–0.94) in the training cohort and 0.92 (95% CI: 0.85–0.99) in the validation cohort. In the training cohort, 2 clinical variables (age, p = 0.01; T stage, p < 0.001) were predictive of RTLI in multivariable analysis. A clinical prediction model was built based on these two independent predictors without the addition of a radiomics signature. With the use of multivariable logistic regression analysis, independent predictors were identified for the clinical-radiomics model (Table 2). The variance inflation factors of the four potential predictors ranged from 1.039 to 1.081, indicating no multicollinearity. The AUCs of the clinical-radiomics model for predicting RTLI in the training and validation cohorts were 0.93 (95% CI: 0.88–0.97) and 0.95 (95% CI: 0.90–1.00), respectively. In the validation cohort, the AUC of the clinical-radiomics model was higher than that of the radiomics model, but the difference was not statistically significant (p = 0.09), whereas the radiomics model was significantly better than the clinical model in the prediction of RTLI (p = 0.02) (Table 3 and Fig. 3).

Table 2 Risk factors for radiation-induced temporal lobe injury of nasopharyngeal carcinoma in the training cohort
Table 3 Predictive performances of three models in predicting the radiation-induced temporal lobe injury in the training and validation cohort
Fig. 3

Performances of three models in the training and validation cohorts. a, b Radiomics model, including two radiomics features: T2w_lbp-3D-k_ngtdm_Strength and CET1w_wavelet-HHH_glszm_SmallAreaEmphasis. c, d Clinical model, including two clinical variables: T stage and age. e, f Clinical-radiomics model, integrating the two clinical variables and the two radiomics features. ROC receiver operating characteristic curve

A nomogram integrating Rad-Score and two clinical features was constructed (Fig. 4a). The calibration curve of the clinical-radiomics nomogram demonstrated good agreement between predicted and observed RTLI in both the training and validation cohorts (Fig. 4b, c). No significant difference was found in the Hosmer–Lemeshow test (p = 0.08), suggesting no departure from the good fit. The DCA showed that the radiomics signature and the clinical-radiomics model provide a better net benefit to predict RTLI than the clinical model across the majority of the range of reasonable threshold probabilities (Fig. 5).

Fig. 4

Radiomics nomogram developed with receiver operating characteristic curves and calibration curves. a A radiomics nomogram was constructed in the training cohort, with radiomics score, T stage and age incorporated. Calibration curves of the radiomics nomogram in the (b) training and (c) validation cohorts. CET1-SAE CET1w_wavelet-HHH_glszm_SmallAreaEmphasis, T2-Strength T2w_lbp-3D-k_ngtdm_Strength

Fig. 5

Decision curve analysis for each model in the validation dataset. The y-axis measures the net benefit, which is calculated by summing the benefits (true-positive findings) and subtracting the harms (false-positive findings), weighting the latter by a factor related to the relative harm of undetected radiation-induced temporal lobe injury (RTLI) compared with the harm of unnecessary treatment
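The net benefit described in this caption is conventionally computed, at a threshold probability pt, as TP/n − (FP/n) × pt/(1 − pt), where n is the number of patients. The sketch below applies that standard definition to hypothetical data; it is not the R code used for Fig. 5, and the function name and values are illustrative.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    true_pos = sum(1 for p, y in zip(probabilities, labels)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probabilities, labels)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * weight


# Hypothetical predicted risks and outcomes (1 = RTLI, 0 = no RTLI).
probabilities = [0.9, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 0, 0]
threshold = 0.5

nb_model = net_benefit(probabilities, labels, threshold)
# "Treat all" reference strategy: every patient is classified as high risk.
nb_treat_all = net_benefit([1.0] * len(labels), labels, threshold)
```

At a threshold of 0.5 the toy model yields a net benefit of 0.2, compared with −0.2 for treating everyone; a decision curve repeats this comparison across a range of thresholds.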

The optimum cutoff of the clinical-radiomics model, determined by ROC analysis in the training cohort, was 0.732. The average AUC of the clinical-radiomics model from 10-fold cross-validation was 0.93 (sensitivity, 97%; specificity, 70%; and accuracy, 83%) at a threshold probability of 0.732. Accordingly, patients were classified into a high-risk group (Rad-score ≥ 0.732) and a low-risk group (Rad-score < 0.732). When assessing the distribution of risk value and RTLI status, patients with lower risk values generally had a lower probability of RTLI than those with higher risk values (Supplementary Fig. 1). When the patients were stratified by clinical-pathologic factors, the clinical-radiomics model showed excellent predictive performance in all subgroups (AUC = 0.88–0.97) (Table 4 and Fig. 6).
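One common way to derive an ROC-based cutoff is to maximize the Youden index (sensitivity + specificity − 1). The text does not state which criterion was used, so the sketch below is an assumption-labeled illustration of that approach followed by the risk-group assignment; the function names, scores, and outcomes are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, labels):
    """Candidate cutoff (an observed score) with the largest Youden index."""
    def youden_j(cutoff):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        return sens + spec - 1.0
    return max(sorted(set(scores)), key=youden_j)


def risk_group(score, cutoff):
    """Assign the risk group using the same >= rule as in the text."""
    return "high-risk" if score >= cutoff else "low-risk"


# Hypothetical model outputs and outcomes (1 = RTLI, 0 = no RTLI).
scores = [0.95, 0.80, 0.74, 0.60, 0.40, 0.10]
labels = [1, 1, 1, 0, 1, 0]
cutoff = youden_cutoff(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, cutoff)
groups = [risk_group(s, cutoff) for s in scores]
```

For these toy data the selected cutoff is 0.74, which gives a sensitivity of 0.75 and a specificity of 1.0.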

Table 4 Diagnostic performance of clinical-radiomics model within different clinical-pathologic subgroups
Fig. 6

The performances of clinical-radiomics model within different clinical-pathologic subgroups. ROC analysis with the AUC to evaluate the clinical-radiomics model as an independent biomarker in the following clinical-pathologic factors respectively: a, b gender (male or female); c, d age (< 40 or ≥ 40); e, f TNM stage (I–III or IV); g, h pathologic type (differentiated non-keratinizing, or undifferentiated non-keratinizing); i, j synchronous chemotherapy (untreated or treated); k, l targeted therapy (untreated or treated); m, n Dmax of left temporal lobe (< 68 Gy or ≥ 68 Gy); o, p Dmax of right temporal lobe (< 68 Gy or ≥ 68 Gy)

Discussion

In this study, we developed and validated a radiomics model to identify nasopharyngeal carcinoma (NPC) patients at risk of radiation-induced temporal lobe injury (RTLI) using radiomics features extracted from pretreatment MRI of the temporal lobe. The radiomics model demonstrated excellent predictive performance in the validation set (AUC, 0.92; sensitivity, 66%; specificity, 96%). The clinical-radiomics model showed excellent predictive performance for RTLI across different clinical-pathologic subgroups and may thereby facilitate pretreatment discrimination of NPC patients at high risk for RTLI.

The RTLI-related radiomics features with the maximum significance in the present study were “lbp-3D-k_ngtdm_Strength” and “wavelet-HHH_glszm_SmallAreaEmphasis,” which were extracted from FS-T2w and CE-T1w images, respectively. The precise mechanism that leads to RTLI and its association with TL heterogeneity remain unknown and are rarely investigated. SmallAreaEmphasis (SAE) measures the distribution of small size zones, with a greater value indicative of smaller size zones and finer textures [22]. Strength is a measure of the primitives in an image, and its value is high when the primitives are easily defined and visible [23]. In our study, high values of SAE and Strength, which increase the radiomics score, were associated with patients more prone to develop RTLI. A greater SAE value indicates minor differences between gray-level values, and a higher Strength value is associated with an image with a slow change in intensity, indicating less heterogeneity of image textures [22,23,24]. This corresponds to an abundance of cells in the VOI of the TL, with the cells arranged tightly and regularly. Furthermore, an abundant blood supply and high oxygen demand in the corresponding TL imply greater sensitivity to radiotherapy [25, 26] and thus a greater propensity to develop RTLI [27]. Our study demonstrated that this radiomics model could predict RTLI accurately, with an AUC of 0.89 in the training cohort and 0.92 in the validation cohort.

Our study focuses on the pretreatment MR images of NPC patients, which enables early identification of RTLI risk and supports early preventive or protective personalized treatment. Unlike the previous study that only included the T2w sequence [28], CE-T1w images were also included in our study. Some studies reported that the histological heterogeneity and structural changes associated with RTLI may be related to contrast enhancement [19, 29, 30]. Our finding that the radiomics features finally selected were derived from both FS-T2w and CE-T1w images is consistent with these reports. Although the most frequent component of radiation-induced injury identified in some studies was white matter lesions, some patients still had extensive damage [31]. Therefore, the TL VOI we delineated was not limited to white matter; segmentation of the whole TL was also more convenient and practical.

T stage and age were also found to be significant indicators of RTLI risk in the clinical model, which is consistent with previous studies [10, 28]. In clinical practice, physicians are more interested in the clinical applications of AI models and in their comparison with clinical factors [32]. The clinical-radiomics model in our study successfully identified patients at high risk of RTLI, for whom earlier preventive treatment was recommended. The ability of radiomics features to help predict RTLI was evaluated after stratifying the patients by clinical-pathologic factors, and the clinical-radiomics model showed excellent predictive performance in all subgroups.

Our study had several limitations. First, this study was retrospective, with possible selection bias. The included patients without RTLI after IMRT were randomly selected. Although the preferred design would include all patients to ensure that no bias is introduced for all relevant risk factors and outcomes [33], the low incidence of RTLI in clinical practice and the long follow-up time needed for RTLI outcomes in NPC make such research hard to implement. Second, we did not perform external validation with independent data sets to assess generalization. However, the DCA and the subgroup validation across different clinical factors used in this study, which enable the evaluation of clinical relevance in a traditional decision-analytic approach, suggest that the identified radiomics signature and radiomics nomogram hold great potential for clinical application in RTLI outcome estimation. Third, the ROI of the TL was drawn manually, which is time-consuming; automated segmentation techniques will be required in the near future. Finally, the dosimetric parameters included in this study were limited and were not independent predictors of RTLI in the training set. In general, the feasibility of radiomics and clinical and dosimetric parameters to predict RTLI should be explored by future studies, especially prospective studies, with larger sample sizes at multicenter institutions.

In summary, we developed and validated a machine learning approach for predicting radiation-induced temporal lobe injury (RTLI) in patients with nasopharyngeal carcinoma (NPC) from pretreatment temporal lobe MRI. The identified radiomics signature has the potential to be used as a biomarker for risk stratification in RTLI. The radiomics nomogram described here demonstrated the incremental value of the radiomics signature over other clinical-pathologic factors for accurate prediction of RTLI. Further studies are required to explore the generalized utility of our model and to translate our results into clinical application.