Introduction

Nasopharyngeal carcinoma (NPC) is a malignant head and neck cancer prevalent in southern China, especially in Guangdong and Guangxi Province [1, 2]. Radiotherapy (RT) with or without adjuvant chemotherapy remains the primary treatment for NPC due to its radiosensitivity [3, 4]. However, owing to the proximity to the RT target area, radiation-induced temporal lobe necrosis (RN) is a common late neurological complication, with an incidence of 4.6–8.5% according to a previous report [5]. Evidence has shown that RN is permanent, irreversible, and may progress with time [6, 7], which would cause a series of neurological symptoms, such as memory loss, cognitive impairment, dysfunction in attention and visual perception, thereby severely affecting the patients’ quality of life [8, 9]. Thus, accurate prediction of RN at the acute or early-delay stage post-RT (less than 6 months post-RT, which is reversible) would be helpful. Recent reports have indicated that RT-related brain alterations at the early stage are the warning signs of neurocognitive dysfunction at the late-delay stage post-RT [10,11,12,13,14]. However, whether RT-related brain alterations at the early stage can be used to predict RN at the late-delay stage is still an open question. Therefore, the knowledge of the longitudinal evolution of brain structure changes from the acute stage to the late-delay stage in patients with NPC post-RT and its relationship with RN is of great significance for the early identification and prevention of RN.

Cerebral white matter (WM) has been reported to be more susceptible to radiation damage than gray matter (GM), which may be attributed to the relative scarcity of the blood supply to the WM [15, 16]. Increasing evidence has shown that RT-related WM microstructural changes and WM volumetric alterations are potential biomarkers of radiation brain injury [10, 11, 17,18,19]. More recently, studies have indicated that early RT-related WM microstructural destruction plays a potential predictive role for late-delay stage RN [20, 21]. Using diffusion kurtosis imaging (DKI), Lu et al retrospectively evaluated the early diffusion and kurtosis features of temporal WM in patients with NPC who developed RN. Their results showed that the mean kurtosis decreased significantly at 4 weeks during RT and 1 month after RT, which may represent a valuable early indicator of RT-induced WM microstructural damage to the temporal lobe [20]. However, to date, limited studies have assessed the significance of RT-related WM volumetric alterations in the detection of RN, which is easier to obtain in clinical practice than diffusion [19, 22]. Our previous study showed that patients with NPC had time- and dose-dependent WM atrophy in selective subfields of the temporal lobe after RT compared to healthy controls [23]; however, the relationship between this time- and dose-dependent WM atrophy and RN remains unclear. Therefore, we attempted to further investigate whether RT-related temporal WM volumetric alterations at the early stage could serve as an imaging marker for RN at the late-delay stage.

Machine learning has been widely used in medical imaging, and it can extract meaningful patterns from imaging data with different algorithms to classify and predict at the individual level [24, 25]. In this study, we aimed (1) to delineate the divergent patterns of longitudinal WM volumetric alterations in patients with NPC between those who developed RN and those who did not, and (2) to construct a predictive model for RN with WM volumetric alterations at the early stage post-RT with machine learning methods.

Materials and methods

Participants

Between December 2014 and August 2018, 150 treatment-naive patients with NPC were recruited and underwent baseline magnetic resonance imaging (MRI). All diagnoses had been pathologically confirmed, and the clinical stage of each patient was defined according to the American Joint Committee on Cancer System (7th edition); thereafter, all patients received standard RT at the Sun Yat-sen University Cancer Center (Guangzhou, China). Seventy-four of these patients (aged 18–55 years, 46 males and 28 females) who completed at least 36 months of follow-up and had at least one MRI at 1–3 or 6 months post-RT (some also had MRI at 9–12 months post-RT) were finally enrolled in the present study. The details of the inclusion criteria are presented in the Supplementary Materials. During the follow-up period, 15 patients developed RN, confirmed through repeated MRI follow-ups, and were classified as the RN group, while the 59 patients who did not develop RN were classified as the NRN group. The procedures for the enrollment and follow-up of the participants are illustrated in Fig. 1. The clinical data of each patient in the RN group are summarized in Table 1, and the distribution of necrotic lesions is shown in Fig. 2. In addition to the baseline (pre-RT) MRI collected from all patients, 68 patients had a follow-up MRI within 3 months post-RT, 54 patients had a follow-up MRI at 6 months post-RT, and 31 patients had a follow-up MRI at 9–12 months post-RT; thus, 227 MRI datasets were finally collected. This study was approved by the local Institutional Review Board, and written informed consent was obtained from all participants.

Fig. 1

Enrollment and follow-up procedures for patients with NPC. *Forty-eight patients did not complete at least a 36-month follow-up, and 20 patients did not have at least one MRI at 1-3 or 6 months post-RT during the follow-up. Abbreviations: Mos, months; NPC, nasopharyngeal carcinoma; RT, radiotherapy; RN, radiation-induced temporal lobe necrosis; LMM, linear mixed model

Table 1 Clinical data of the patients in the RN group
Fig. 2

Density distribution maps showing the overlap of necrotic lesions in the full sample of 15 patients in the RN group. Abbreviations: RN, radiation-induced temporal lobe necrosis

Treatment

All NPC patients were treated with intensity-modulated radiation therapy (IMRT) (n = 69) or tomography radiation therapy (n = 5) at the Sun Yat-sen University Cancer Center. The prescribed regimen included a total dose of 68–70 Gy in 30–33 fractions at 2.12–2.33 Gy/fraction to the planning target volume (PTV) of the primary gross tumor volume (GTVnx), 60–70 Gy to the PTV of the GTV of involved lymph nodes (GTVnd), 60–64 Gy to the PTV of the high-risk clinical target volume (CTV1), and 54–58 Gy to the PTV of the low-risk clinical target volume (CTV2). Details are presented in the Supplementary Materials.

Dosimetric evaluation for the bilateral temporal lobes

The bilateral temporal lobes were manually delineated on treatment planning CT axial images by referring to T1-weighted anatomical imaging, and the dose-volume statistics for the temporal lobes were calculated. Dose evaluation was performed based on the data from the dose-volume histogram for the targets. The main evaluation parameters were maximum dose (Dmax), mean dose (Dmean), and minimum dose (Dmin) received by temporal tissues. Dose-volume statistics of the temporal lobe in the RN group and the NRN group are listed in Supplementary Table 1.

MRI acquisition

We collected MRI data with a GE Discovery MR 750 3.0-T scanner (GE Medical Systems) using a 16-channel head and neck coil at the Department of Medical Imaging, Sun Yat-sen University Cancer Center. Conventional MRI sequences, including axial T1-weighted and T2-weighted images, T2-FLAIR images, and contrast-enhanced T1-weighted images, were acquired to rule out intraparenchymal invasion and to detect RN during follow-up. Additionally, high-resolution structural MRI scans with T1-weighted three-dimensional brain volume imaging (3D-BRAVO) were obtained for volumetric evaluation. The scanning parameters were as follows: TR = 8.16 ms, TE = 3.18 ms, TI = 800 ms, flip angle = 8°, acquisition matrix = 256 × 256, field of view = 256 × 256 mm², and voxel size = 1 × 1 × 1 mm³.

Image processing

Structural images were processed using the open-source FreeSurfer pipeline (version 6.0, http://surfer.nmr.mgh.harvard.edu). The processing procedure mainly included skull removal, B1 bias field correction, data registration, and gray-white matter segmentation [26]. Finally, 11 subfields of the temporal lobe were extracted using an automated FreeSurfer segmentation script, and the WM volumetric values of these regions were subsequently measured. The 11 subfields included banks of the superior temporal sulcus (BANKSSTS), fusiform (FUS), inferior temporal gyrus (ITG), middle temporal gyrus (MTG), parahippocampal gyrus (PHG), pars opercularis, superior temporal gyrus (STG), supramarginal gyrus (SMG), temporal pole (TP), transverse temporal gyrus (TTG), and insular gyrus.

Statistical analysis

All analyses were conducted with the open-source R software (version 4.0.2).

Demographic analysis

A two-sample independent t-test was performed to evaluate the differences in age and education level between the two groups at baseline. Pearson’s chi-squared test was applied to assess the differences in gender and treatment between the two groups at baseline. Differences with a p value < 0.05 (two-sided) were considered statistically significant.

Longitudinal temporal WM volume trajectories in patients with NPC post-RT

To reduce individual differences, we standardized the WM volume of each patient at each follow-up time point by dividing it by the patient's baseline value. Additionally, for the five patients in the RN group who had necrotic lesions in both temporal lobes, we treated each temporal lobe as a separate sample, thus forming 20 samples from the 15 RN patients. For patients in the NRN group, as no differences were found between the left and right sides (Supplementary Table 2), the WM volumes of the 11 subfields of the temporal lobes were bilaterally averaged, thus forming 59 samples.
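
The baseline standardization described above can be sketched in a few lines. The study's analyses were run in R; the Python below is only an illustrative sketch, and the function names and example volumes are hypothetical rather than taken from the study. Whether bilateral averaging is applied before or after standardization is not stated in the text, so the two steps are kept as separate helpers.

```python
def standardize_to_baseline(volumes):
    """Divide each visit's WM volume by the baseline (first) value.

    `volumes` is ordered by visit, with the pre-RT baseline first.
    """
    baseline = volumes[0]
    if baseline <= 0:
        raise ValueError("baseline volume must be positive")
    return [v / baseline for v in volumes]


def bilateral_average(left, right):
    """Average left and right volumes visit by visit (NRN-style sample)."""
    if len(left) != len(right):
        raise ValueError("left and right series must have the same length")
    return [(l + r) / 2.0 for l, r in zip(left, right)]


# Hypothetical volumes (mm^3) at baseline, 1-3 months, and 6 months post-RT.
example_ratio = standardize_to_baseline([12000.0, 11800.0, 11500.0])
example_mean = bilateral_average([1.0, 2.0], [3.0, 4.0])
```

After standardization the baseline visit always equals 1, so later values read directly as the fraction of baseline volume that remains.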

Given that our previous studies have shown that the RT-associated morphological alterations in patients with NPC are linear [14, 19], linear mixed models (LMM) were used to evaluate the longitudinal changes in WM volume of the subfields of temporal lobes, which modeled fixed and random effects simultaneously and accounted for unequal sampling intervals and missing data [27, 28]. First, to explore whether the WM volume underwent different changes post-RT between the two groups, we modeled the first LMM (model 1) with an interaction effect between groups (RN and NRN group) and time (in months):

$$ {Y}_{ij}={\gamma}_{00}+{\gamma}_{01}\left({\mathrm{treatment}}_j\right)+{\gamma}_{02}\left({\mathrm{group}}_j\right)+{\gamma}_{10}\left({\mathrm{time}}_{ij}\right)+{\gamma}_{11}\left({\mathrm{group}}_j\times {\mathrm{time}}_{ij}\right)+{\mu}_{0j}+{\mu}_{1j}\left({\mathrm{time}}_{ij}\right)+{r}_{ij}\ \left(\mathrm{Model}\ 1\right) $$
(1)

where Yij denotes WM volume for the jth participant on the ith follow-up visit after RT, and the longitudinal variable represents the time interval after completion of RT. The pretreatment session with the quality structural MRI data of each participant was defined as the baseline. Time always started from zero. The longitudinal time-dependent effect was expressed as a simple regression between time and Y with residual r. Group (RN or NRN group) and treatment (with or without neoadjuvant chemotherapy) were binary dummy variables.
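
To make the role of the interaction term in model 1 explicit, the expected trajectory can be written out for each level of the group dummy. This is a worked reading of the model rather than an additional analysis; it assumes the usual LMM convention that the random effects and the residual have zero mean, and it does not assume which group is coded as 1, since the coding is not stated here.

$$ \mathrm{E}\left[{Y}_{ij}\mid {\mathrm{group}}_j=0\right]={\gamma}_{00}+{\gamma}_{01}\left({\mathrm{treatment}}_j\right)+{\gamma}_{10}\left({\mathrm{time}}_{ij}\right) $$

$$ \mathrm{E}\left[{Y}_{ij}\mid {\mathrm{group}}_j=1\right]=\left({\gamma}_{00}+{\gamma}_{02}\right)+{\gamma}_{01}\left({\mathrm{treatment}}_j\right)+\left({\gamma}_{10}+{\gamma}_{11}\right)\left({\mathrm{time}}_{ij}\right) $$

The slope over time is therefore γ10 in the reference group and γ10 + γ11 in the other group, so testing γ11 = 0 tests whether the two groups change at the same rate.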

Then, for the selective subfields that showed significantly different longitudinal changes in WM volume between groups, we examined the longitudinal changes in the two groups separately via the following second LMM:

$$ {Y}_{ij}={\gamma}_{00}+{\gamma}_{01}\left({\mathrm{treatment}}_j\right)+{\gamma}_{10}\left({\mathrm{time}}_{ij}\right)+{\mu}_{0j}+{\mu}_{1j}\left({\mathrm{time}}_{ij}\right)+{r}_{ij}\ \left(\mathrm{Model}\ 2\right) $$
(2)

where the variables are the same as those in model 1. To address the multiple testing for the various hypotheses, the Holm-Bonferroni method was applied for controlling family-wise errors at an alpha level of 0.05.
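
As an illustration of the Holm-Bonferroni step-down procedure named above, the following Python sketch (the study itself used R) marks which hypotheses are rejected at a family-wise alpha of 0.05. The function name and the p-values in the example are hypothetical. With 11 tests, the smallest p-value is compared against 0.05/11 ≈ 0.0045.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    P-values are visited from smallest to largest; the k-th smallest
    (k = 0, 1, ...) is compared against alpha / (m - k). Testing stops
    at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)
        if p_values[idx] <= threshold:
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical example: one small p-value among 11 subfield tests.
example_reject = holm_bonferroni([0.004] + [0.5] * 10)
```

Because the threshold relaxes after each rejection, the procedure is never less powerful than a plain Bonferroni correction while still controlling the family-wise error rate.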

Construction of prediction models for RN with temporal WM volume alterations at the early stage post-RT

First, to reduce the difference in dimensions and value ranges between features, we standardized the data and mapped the values to [0, 1] through min-max normalization. Next, the WM volumes of the 11 subfields of temporal lobes at the early stage (1–3 months and 6 months post-RT) were used as structural features, of which the relatively prominent features were extracted by the Boruta package embedded in the R software, and then, machine learning methods were used to construct the prediction models. The RN and NRN groups were randomly sorted into the training set and testing set at a ratio of 7:3. Four machine learning methods, namely support vector machine (SVM), logistic regression (LR), random forest (RF), and k-nearest neighbors (KNN), were used to construct prediction models for RN with temporal WM alterations at the early stage; age and dose were also included as factors in the models. With 10-fold cross-validation, receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC) values were used to evaluate the diagnostic efficiency of different models. Based on the best diagnostic threshold, the accuracy, sensitivity, and specificity were calculated. Moreover, we constructed two additional prediction models with dosimetric factor and temporal WM volume only to compare their independent predictive values. DeLong tests were used to assess the differences in the ROC curves.
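
Two of the steps above, min-max normalization and the ROC-based evaluation, can be sketched without any machine learning library. The Python below is an illustrative sketch only (the study used R); the function names and the toy labels and scores are hypothetical. The text does not say how the "best diagnostic threshold" was chosen, so maximizing Youden's J (sensitivity + specificity − 1) is an assumption made here.

```python
def min_max_normalize(values):
    """Map values linearly onto [0, 1]; constant features map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def roc_auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative.

    Ties between a positive and a negative count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_threshold(labels, scores):
    """Pick the cut-off maximizing Youden's J (an assumption, see lead-in).

    Returns (threshold, sensitivity, specificity, accuracy), predicting
    the positive class when score >= threshold.
    """
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        acc = (tp + tn) / len(labels)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, t, sens, spec, acc)
    return best[1:]


# Toy example: 1 = RN, 0 = NRN; scores are hypothetical model outputs.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_cut = best_threshold(toy_labels, toy_scores)
```

In practice the normalization range should be learned on the training set and reused on the testing set, so that no information from the held-out patients leaks into the model.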

Construction of prediction models for RN with absolute temporal WM volume at the early stage post-RT and verified in another independent dataset

In addition to temporal WM volume alteration, we constructed prediction models for RN with absolute temporal WM volume (which is more easily obtained in clinical practice) at 1–3 months post-RT, using methods similar to those presented in the section “Construction of prediction models for RN with temporal WM volume alterations at the early stage post-RT”. We verified the models with another independent dataset; detailed information on the independent dataset is presented in the Supplementary Materials.

Results

Demographic and clinical data at baseline

The demographic characteristics of the RN group and the NRN group at baseline are presented in Table 2. There were no significant differences in age (40.47 ± 7.73 years for RN, 38.25 ± 9.46 years for NRN; p = 0.41), gender (8 males and 7 females for RN, 38 males and 21 females for NRN; p = 0.43), education level (10.53 ± 3.64 years for RN, 12.22 ± 3.03 years for NRN; p = 0.07), and treatment (10 patients with neoadjuvant chemotherapy and 5 patients without neoadjuvant chemotherapy for RN, 28 patients with neoadjuvant chemotherapy and 31 patients without neoadjuvant chemotherapy for NRN; p = 0.18) between the two groups at baseline.

Table 2 Demographic characteristics of the RN group and NRN group at baseline

Divergent trajectories of temporal WM volume changes in the RN and NRN groups within 1-year post-RT

Significantly different trajectories were observed in the STG between the RN and NRN groups (βTime*Group = 1.98 × 10⁻³, p = 0.004) (Fig. 3a, Supplementary Table 3). Although both groups exhibited time-dependent WM atrophy in the STG in the first year post-RT, the patients in the RN group (βTime = −2.90 × 10⁻³, p < 0.001) showed more rapid atrophy than those in the NRN group (βTime = −9.52 × 10⁻⁴, p = 0.001) (Table 3 and Fig. 3b, c).

Fig. 3

Longitudinal WM volume alterations in the STG within 1-year post-RT. a Different trajectories of the WM volume in the RN group and NRN group. b Longitudinal WM volume alterations in the RN group. c Longitudinal WM volume alterations in the NRN group. Abbreviations: WM, white matter; STG, superior temporal gyrus; RT, radiotherapy; RN, radiation-induced temporal lobe necrosis; NRN, no radiation-induced temporal lobe necrosis

Table 3 Longitudinal RT-related WM atrophy in the STG in the RN group and the NRN group. Statistically significant effects (p < 0.05) appear in bold (corrected for multiple comparisons, p < 0.0045 for all the 11 parcels)

Temporal WM volume alterations at the early stage post-RT could predict RN occurrence in patients with NPC during 36 months of follow-up

WM volumetric alterations in the TP, FUS, BANKSSTS, MTG, STG, SMG, and ITG at 1–3 months post-RT and WM volumetric alterations in the BANKSSTS, insular gyrus, and TP at 6 months post-RT had important characteristic attributes (Fig. 4), which were used to construct the prediction models. All models constructed with different machine learning methods (SVM, LR, RF, and KNN) showed acceptable predictive power (AUC = 0.879, 0.853, 0.719 and 0.725, accuracy = 0.778, 0.833, 0.818, and 0.875 for SVM, LR, RF and KNN at 1–3 months post-RT, respectively; AUC = 0.806, 0.800, 0.767, and 0.664, accuracy = 0.818, 0.842, 0.833, and 0.789 for SVM, LR, RF, and KNN at 6 months post-RT, respectively), while the prediction model constructed by SVM had the best diagnostic performance on the testing set verification (Table 4 and Fig. 5). The AUC value, accuracy, sensitivity, and specificity were 0.879, 0.778, 0.714, and 0.800, respectively at 1–3 months post-RT, while the AUC value, accuracy, sensitivity, and specificity were 0.806, 0.818, 0.750, and 0.833 at 6 months post-RT, respectively.

Fig. 4

Feature selection using the Boruta package. The green bars represent important attributes, the red bars represent unimportant attributes, and the yellow bars represent attributes of uncertain importance. The WM volume alterations in the TP, FUS, BANKSSTS, MTG, STG, SMG, and ITG at 1–3 months post-RT (a), and the WM volume alterations in the BANKSSTS, insular gyrus, and TP at 6 months post-RT (b) had important characteristic attributes. Abbreviations: WM, white matter; TP, temporal pole; FUS, fusiform; MTG, middle temporal gyrus; STG, superior temporal gyrus; SMG, supramarginal gyrus; ITG, inferior temporal gyrus; RT, radiotherapy; NPC, nasopharyngeal carcinoma

Table 4 Predictive efficiency for RN with temporal WM volume alteration at 1–3 months and 6 months in NPC patients post-RT based on different machine learning methods
Fig. 5

ROC curves of multiple classifiers based on machine learning methods for original data with temporal WM volume alteration. The AUC values of the prediction models constructed based on SVM, LR, RF, and KNN were 0.879, 0.853, 0.719, and 0.725 at 1–3 months post-RT, respectively, and 0.806, 0.800, 0.767, and 0.664 at 6 months post-RT, respectively. Abbreviations: ROC, receiver operating characteristic; WM, white matter; AUC, area under the curve; SVM, support vector machine; LR, logistic regression; RF, random forest; KNN, k-nearest neighbors; RT, radiotherapy.

Moreover, based on the SVM, the result of the prediction model constructed with only the temporal WM volume at 1–3 months post-RT showed an AUC value, accuracy, sensitivity, and specificity of 0.841, 0.733, 0.750, and 0.727, respectively, while the result of the prediction model constructed with only the dosimetric factor showed an AUC value, accuracy, sensitivity, and specificity of 0.789, 0.783, 0.500, and 0.842, respectively (Supplementary Figure 2). Furthermore, we found the model constructed with both the temporal WM volume at 1–3 months post-RT and the dosimetric factor performed better than the model constructed with only the temporal WM volume at 1–3 months post-RT (p = 0.04) and also performed better than the model constructed with only the dosimetric factor (p < 0.001), while the model constructed with only the temporal WM volume at 1–3 months post-RT performed better than the model constructed with only the dosimetric factor (p = 0.01).

Absolute temporal WM volume at the early stage post-RT could also predict RN occurrence in patients with NPC during 36 months of follow-up, which was verified in another independent dataset

Interestingly, prediction models constructed with absolute temporal volume at 1–3 months post-RT also presented good predictive performance, while the model constructed by SVM had the best diagnostic performance, with an AUC value, accuracy, sensitivity, and specificity of 0.842, 0.826, 0.750, and 0.842, respectively (Table 5 and Fig. 6). The model was verified by another independent dataset, and the AUC value, accuracy, sensitivity, and specificity were 0.773, 0.737, 0.625, and 0.818, respectively (Table 5 and Fig. 6).

Table 5 Predictive model constructed with absolute temporal WM volume at 1–3 months post-RT for original data had acceptable predictive efficiency for external independent verification data based on the SVM method
Fig. 6

ROC curves of multiple classifiers based on machine learning methods for original data and other independent verification data with absolute temporal WM volume at 1–3 months post-RT. The prediction models constructed based on SVM, LR, RF, and KNN were verified on the testing set, with AUC values of 0.842, 0.798, 0.771, and 0.725, respectively, while the prediction models constructed based on SVM, LR, RF, and KNN were verified using an independent verification dataset, with AUC values of 0.773, 0.659, 0.534, and 0.318, respectively. Abbreviations: ROC, receiver operating characteristic; AUC, area under the curve; WM, white matter; SVM, support vector machine; LR, logistic regression; RF, random forest; KNN, k-nearest neighbors; RT, radiotherapy

Discussion

To the best of our knowledge, this study is the first prospective study to combine quantitative MRI technology and machine learning methods to explore the predictive value of RT-related WM volumetric alterations at the early stage for RN occurrence at the late-delay stage. Although both RN and NRN patients exhibited progressive atrophy in the STG within 1-year post-RT, the patients in the RN group showed a more rapid volume decrease than those in the NRN group. Furthermore, the temporal WM volume alterations at the early stage (1–3 months and 6 months post-RT) could identify and predict RN occurrence at the late-delay stage, and the model constructed by SVM had the best diagnostic performance. Notably, the predictive model constructed with absolute temporal volume at 1–3 months post-radiotherapy also presented a similar predictive performance, which was verified by another independent dataset. Thus, temporal WM volume alterations (or absolute volume) in patients with NPC at the acute stage and early-delay stage post-RT may serve as potential biomarkers for the prediction of RN at the late-delay stage.

Although both RN and NRN patients exhibited time- and dose-dependent WM atrophy in the STG within 1-year post-RT, the RN patients had increased atrophy compared to the NRN patients. Animal experiments and clinical studies have indicated that RT-induced microvascular injury serves as a crucial pathological mechanism for RN occurrence [15, 16, 29, 30]; thus, increasing atrophy of the STG in the RN group may be attributed to more severe RT-induced vascular damage in the STG area. A histopathological study also demonstrated that the patient with NPC who developed severe temporal lobe radiation injury 5 years after irradiation had a more extensive impairment of the microcirculation, including structural changes of the vessel wall and perivascular inflammatory reaction, which resulted in complete temporal WM destruction [31]. Our findings are consistent with those of a previous study by Zhang et al, who found that the patients who developed RN during follow-up had a more severe cortical thinning and larger cortical surface area in the STG compared to those who did not, although they also found abnormalities in other subfields of the temporal lobe [32]. These findings illustrated that the morphological alterations in the temporal regions, especially the STG, at the early stage post-RT may characterize RN at the late-delay stage. Future research is needed to combine multimodal imaging technology and pathological analysis to validate our findings and hypotheses.

Furthermore, combined with machine learning, we found that temporal WM volume alterations at the early stage (1–3 months and 6 months post-RT) in patients with NPC have a high predictive value for RN occurrence at the late-delay stage. We constructed the prediction models with both standardized temporal WM volume (divided by baseline) and absolute temporal WM volume (which is more easily obtained in clinical practice) and, interestingly, obtained similar predictive power. Moreover, the model constructed with absolute temporal WM volume was verified by an additional independent dataset. Several recent studies that used imaging acquired before RN onset reported success in predicting RN in patients with NPC at the late-delay stage post-RT. Using texture features extracted from the MR images of the medial temporal lobe, GM, and WM before RN onset, Zhang et al developed predictive models for RN in patients with NPC [33]. Although they also obtained a high predictive power, they constructed the predictive models with MR scans at the late-delay stage (median latency to RN: 41 months), at which stage the injury was irreversible and preventive therapy was not applicable [6, 7]. Moreover, by combining radiomics features and clinical factors, Hou et al developed and validated an MRI-based radiomics nomogram model for the early detection of radiation-induced temporal lobe injury (RTLI) in patients with NPC, with texture features extracted from the bilateral temporal lobes at the end of RT. The results demonstrated the potential power of radiomics features for the early prediction of RN in patients with NPC post-RT [34]. Our findings complement these previous studies and support the feasibility of using temporal WM volume (alteration or absolute) at the early stage post-RT for the early identification of RN. Compared to radiomic features, temporal volume is much easier to obtain in clinical practice; thus, our approach could predict RN occurrence more conveniently at the early stage, which may help guide preventive therapy for RN [35,36,37].

Moreover, the prediction model combining temporal WM volume and the dosimetric factor outperformed the models constructed with WM volume only or with the dosimetric factor only, while the model constructed with temporal WM volume only outperformed the model constructed with the dosimetric factor only. The dose has been previously identified as a strong predictor of RN [38, 39], and our findings also revealed that dosimetric factors could improve the prediction performance. Moreover, our findings indicated that temporal WM volume alterations at the early stage could serve as a more important predictor than the dosimetric factor; this is unsurprising, given that temporal WM volume alterations at the early stage are attributable to both the dosimetric factor and the individual's sensitivity to dose [40,41,42] and therefore carry more information than the dosimetric factor alone.

This research has several limitations. First, the sample of the RN group was relatively small. Large-scale research is needed to verify these preliminary findings. Second, the 36-month observation period of patients post-RT lacks literature support. As a next step, we plan to extend the observation period and shorten the follow-up interval for patients with neurological symptoms in order to record the time to RN after RT more accurately. Finally, multi-modal MRI is needed to characterize the significance of brain changes (including brain structure, function, and network) for RN occurrence.

Conclusions

This is the first study to explore the predictive value for RN in patients with NPC with temporal WM volume alterations at the early stage (1–3 months and 6 months post-RT). Divergent STG atrophy trajectories were observed in patients with NPC with or without RN. Although both groups exhibited time-dependent atrophy in the STG, the patients in the RN group showed a sharper increase in atrophy compared to those in the NRN group. Furthermore, temporal WM volume alteration (or absolute volume) at the early stage could predict RN occurrence at the late-delay stage after RT. The present findings shed light on the early detection and prevention of RN in patients with NPC post-RT.