Introduction

Treatment of primary central nervous system lymphoma (PCNSL) typically consists of induction methotrexate-based chemotherapy (MTX-C) with or without consolidation whole brain radiotherapy (WBRT). Although these treatments prolong survival [1, 2], there is a risk of neurotoxicity that increases with advanced age at treatment, and in patients with prolonged disease-free survival [3]. MTX-C and WBRT may each cause CNS damage, but there is synergistic toxicity when these modalities are combined [4]. The adverse effects of chemotherapy and radiotherapy have been described to involve demyelination, inflammation, microvascular injury, depletion of oligodendrocytes [5, 6], microglial inflammation and disruption of hippocampal neurogenesis [7,8,9]. Modified treatments including MTX-C alone [2, 10,11,12,13], or followed by reduced-dose WBRT (rdWBRT) or by high-dose chemotherapy with autologous stem cell transplantation (HDC-ASCT) [1, 14,15,16] have been used in part as an attempt to reduce neurotoxicity in PCNSL patients.

PCNSL patients often experience cognitive difficulties that interfere with their ability to function at pre-diagnosis levels despite adequate disease control. However, cognitive outcomes have been included in a limited number of clinical trials. Literature reviews [17, 18] indicated that WBRT was associated with cognitive impairment in most PCNSL patients, but the retrospective designs limited the ability to determine the specific contributions of tumor and treatment. Prospective studies involving patients treated with MTX-C alone reported cognitive impairment at diagnosis and no significant decline over time; however, several studies had methodological problems such as variable follow-up intervals, inclusion of patients with tumor progression, and the use of mental status screening tools with limited sensitivity in this population [17, 18]. Recent randomized prospective trials of regimens including MTX-C followed by consolidation with WBRT or HDC-ASCT [15, 16] reported improvement in cognitive functions over a period of either 2 or 3 years in patients treated with HDC-ASCT, but a decline in patients treated with WBRT.

Our group assessed cognitive functions prospectively in 12 PCNSL patients treated with MTX-C followed by rdWBRT and high-dose cytarabine (ARA-C) [1], and reported a significant improvement in attention/executive functions and memory after induction MTX-C. Cognitive performance was relatively stable up to 4 years post-treatment; however, there was a significant decline in motor speed and an increase in white matter (WM) abnormalities. We also assessed cognitive functions prospectively over 2 years in 15 patients treated with MTX-C followed by HDC-ASCT [14]. There was a significant improvement in attention/executive functions, motor speed and memory up to 1 year post-HDC-ASCT; however, the rate of improvement in attention/executive function and memory slowed subsequently, and there was evidence of WM abnormalities. In this study, we included data from additional PCNSL patients followed up to 5 years post-treatment with the intent of analyzing cognitive functions, quality of life (QoL), WM abnormalities, and cortical atrophy (CA) over time in patients treated with MTX-C followed by rdWBRT or HDC-ASCT.

Patients and methods

Participants were immunocompetent patients with newly diagnosed, histologically confirmed B-cell PCNSL who were treated with induction chemotherapy including rituximab, MTX, procarbazine and vincristine (R-MPV) followed by consolidation with rdWBRT + ARA-C or HDC-ASCT as part of two prospective phase II clinical trials at Memorial Sloan Kettering Cancer Center (MSKCC) assessing the efficacy and safety of these treatment regimens [1, 14]. Patients with no evidence of disease progression during the entire follow-up period completed serial cognitive evaluations up to 5 years post-treatment.

R-MPV followed by rdWBRT + ARA-C

Induction chemotherapy consisted of five to seven cycles of R-MPV. Patients with a complete response post-R-MPV received consolidation rdWBRT (23.4 Gy in 13 daily fractions of 1.8 Gy) followed by two cycles of ARA-C. Fourteen patients completed cognitive assessments at diagnosis, post-R-MPV and prior to rdWBRT, and at yearly intervals up to 5 years after rdWBRT and ARA-C. Four patients were not available at the 5-year follow-up for the following reasons: one had disease progression, two declined cognitive testing, and one was no longer followed at MSKCC; these patients were included in the analysis for all prior timepoints.

R-MPV followed by HDC-ASCT

Induction chemotherapy consisted of five to seven cycles of R-MPV. Patients with a complete or partial response post-R-MPV proceeded with HDC including thiotepa, busulfan, and cyclophosphamide followed by an ASCT. Fifteen patients completed cognitive assessments at diagnosis, post-R-MPV and prior to HDC-ASCT, and at yearly intervals up to 5 years post-transplant. Two patients did not complete cognitive follow-ups at years 4 and 5 for the following reasons: one declined cognitive testing and the other was no longer followed at MSKCC; these patients were included in the analysis for all prior timepoints.

Measures

Cognitive assessment

Patients completed standardized neuropsychological tests and self-report mood and quality of life (QoL) scales, as per published guidelines for the assessment of PCNSL patients [17]. Raw cognitive test scores were compared with published normative values according to age and education, and converted into z-scores to characterize the presence and severity of cognitive difficulties. Cognitive impairment was defined as a z-score ≤ −1.5, i.e., at least 1.5 standard deviations below the normative values [17].
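The conversion described above can be sketched as follows. This is a minimal illustration, not the scoring procedure used in the study: the normative means and standard deviations are invented placeholders (real norms are stratified by age and education), the names `Norm`, `z_score`, and `is_impaired` are hypothetical, and the sign flip for timed tests (where a longer completion time is a worse performance) is an assumed convention so that lower z-scores always indicate worse performance.

```python
from dataclasses import dataclass

# Impairment threshold as stated in the text: z <= -1.5
IMPAIRMENT_CUTOFF = -1.5


@dataclass
class Norm:
    """Normative reference for one test and one age/education stratum (placeholder values)."""
    mean: float
    sd: float
    higher_is_better: bool = True  # False for timed tests (assumed convention)


def z_score(raw: float, norm: Norm) -> float:
    """Standardize a raw score so that negative values mean worse than the norm."""
    z = (raw - norm.mean) / norm.sd
    return z if norm.higher_is_better else -z


def is_impaired(z: float) -> bool:
    """Apply the z <= -1.5 definition of cognitive impairment."""
    return z <= IMPAIRMENT_CUTOFF


# Hypothetical norms, for illustration only
timed_test_norm = Norm(mean=30.0, sd=10.0, higher_is_better=False)  # seconds
recall_test_norm = Norm(mean=25.0, sd=4.0)                          # items recalled

z_timed = z_score(50.0, timed_test_norm)    # slower than the norm
z_recall = z_score(21.0, recall_test_norm)  # fewer items than the norm

print(z_timed, is_impaired(z_timed))
print(z_recall, is_impaired(z_recall))
```

Keeping the direction of scoring in the `Norm` record means the same cutoff can be applied to every test in the battery without special cases.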

Cognitive test battery

Attention/Executive Functions: Trail Making Test Parts A and B (TMTA, TMTB) [19]; Brief Test of Attention (BTA) [20]; Controlled Oral Word Association Test (COWAT) [21].

Verbal Memory: Hopkins Verbal Learning Test-Revised (HVLT-R) [22]: HVLT-R-Total Learning, HVLT-R-Delayed Recall, HVLT-R-Discrimination Index.

Graphomotor Speed: Grooved Pegboard Test (GPT, Dominant and Non-Dominant hand) [23].

Mood and QoL questionnaires

Beck Depression Inventory (BDI) [24]

Functional Assessment of Cancer Therapy-Brain (FACT-BR) [25]

Neuroimaging

Brain magnetic resonance imaging (MRI) scans, performed within approximately ± three months of the cognitive evaluation, were rated for the presence of periventricular WM abnormalities and CA. WM ratings were performed on fluid-attenuated inversion recovery sequences for most patients, and if not available, T2-weighted sequences were used. WM ratings were performed using a modified Fazekas scale [26] and included: no WM disease (Grade 0); minimal patchy WM foci (Grade 1); start of confluence of WM disease (Grade 2); large confluent areas (Grade 3). Tumor and surrounding edema were excluded from these measurements. CA ratings were performed on T1-weighted sequences using a global CA scale [27] and included: normal volume/no ventricular enlargement (Grade 0); opening of sulci/mild ventricular enlargement (Grade 1); volume loss of gyri/moderate ventricular enlargement (Grade 2); ‘knife blade’ atrophy/severe ventricular enlargement (Grade 3).

Statistical analyses

Demographics were summarized using descriptive statistics and compared across treatment regimens using the Chi-squared test or Fisher’s Exact test for categorical variables, and the Wilcoxon two-sample test or the two-sample t-test for continuous variables, as appropriate. Cognitive test scores were summarized using descriptive statistics. Linear mixed models (LMMs) adjusting for age were used to assess longitudinal trajectories of test scores for each group separately, and then groups were combined in additional models to test for interactions with treatment regimen. Exact follow-up assessment times from baseline were calculated, and both linear and quadratic time terms were estimated by the LMMs. Wilcoxon two-sample tests were used to compare scores between treatment groups at specific time points. McNemar’s test was used to test for significant changes in WM and CA ratings between assessments for each regimen, and the Homogeneity of Stratum Effects (HSE; https://www.ncbi.nlm.nih.gov/pubmed/24697196) test was used to compare change in MRI ratings from baseline to each time point between treatment groups. Associations between binary WM and CA ratings and cognitive test scores were assessed using the Wilcoxon two-sample test. All analyses and graphics were produced using SAS v9.4 (Cary, NC) and R v3.5.2. All tests were two-sided with an alpha level of 0.05 for statistical significance. We did not adjust for multiple comparisons given the small sample size.
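As an illustration of the paired comparison of MRI ratings, the sketch below dichotomizes grades (0/1 versus 2/3) and applies an exact two-sided McNemar test to the discordant pairs. The text does not state whether an exact or an asymptotic version of the test was used, so the exact binomial form is an assumption; the grades are invented for illustration and the function names are hypothetical.

```python
from math import comb


def dichotomize(grade: int) -> int:
    """Collapse a 0-3 rating into a binary one: 0/1 -> 0, 2/3 -> 1."""
    return 1 if grade >= 2 else 0


def mcnemar_exact(before, after):
    """Exact two-sided McNemar test on paired binary ratings.

    Returns (worsened, improved, p_value), where only discordant pairs
    (patients whose binary rating changed) contribute to the p-value.
    """
    worsened = sum(1 for x, y in zip(before, after) if x == 0 and y == 1)
    improved = sum(1 for x, y in zip(before, after) if x == 1 and y == 0)
    n = worsened + improved
    if n == 0:
        return worsened, improved, 1.0
    k = min(worsened, improved)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test and capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return worsened, improved, min(1.0, 2 * tail)


# Hypothetical grades for ten patients at baseline and at a later assessment
baseline_grades = [0, 1, 1, 0, 1, 0, 1, 2, 0, 1]
followup_grades = [2, 2, 1, 2, 3, 2, 1, 2, 0, 2]

before = [dichotomize(g) for g in baseline_grades]
after = [dichotomize(g) for g in followup_grades]

worsened, improved, p_value = mcnemar_exact(before, after)
print(worsened, improved, p_value)
```

With small samples, the exact form avoids relying on the chi-squared approximation, which is the reason it was chosen for this sketch.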

Results

Table 1 presents demographic, disease and treatment history per treatment group. There were no statistically significant differences between groups in age, education, estimated IQ, or disease characteristics.

Table 1 Demographic, disease and treatment history

Cognitive functions

At baseline and prior to R-MPV, mean z-scores were in the impaired range (z-score ≤ −1.5) on most tests of attention/executive functions, memory, and motor speed in all patients (Fig. 1).

Fig. 1 Cognitive test z-scores: Attention/Executive Functions (TMT-Part A; TMT-Part B; BTA; COWAT), Verbal Memory (HVLT-R-Total Learning; HVLT-R-Delayed Recall; HVLT-R-Discrimination Index), Graphomotor Speed (GPT-Dominant and Non-Dominant Hand)

In patients treated with consolidation rdWBRT + ARA-C the results of LMMs showed significant positive linear time components indicating continuous improvement from baseline up to year 3 on the TMTA (p = 0.004), TMTB (p = 0.03), BTA (p = 0.001), COWAT (p = 0.01), HVLT-R-Discrimination Index (p = 0.01), and GPT dominant (p = 0.003) and non-dominant (p = 0.01) hands. However, a statistically significant quadratic time component was observed suggesting a decline after year 3 on the TMTA (p = 0.004), HVLT-R-Learning (p = 0.03) and HVLT-R-Delayed Recall (p = 0.01) (Fig. 1).

In patients treated with HDC-ASCT the results of LMMs showed significant positive linear time components indicating continuous improvement from baseline up to year 3 on the TMTA (p = 0.002), TMTB (p = 0.02), COWAT (p = 0.003), HVLT-R-Learning (p = 0.004), HVLT-R-Delayed Recall (p < 0.0001), and GPT dominant (p = 0.02) and non-dominant (p = 0.04) hands. However, a statistically significant quadratic time component was observed after year 3 suggesting a slowed rate of improvement on the TMTA (p = 0.008) and TMTB (p = 0.047), and a decline on the COWAT (p = 0.008), BTA (p = 0.008), HVLT-R-Learning (p = 0.001), HVLT-R-Delayed Recall (p = 0.007) and GPT non-dominant hand (p = 0.0007) (Fig. 1).
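To make the interpretation of the linear and quadratic time components concrete, the sketch below evaluates the fixed-effect part of a trajectory of the form z(t) = b0 + b1·t + b2·t². When b1 is positive and b2 is negative, the curve rises, peaks at t = −b1 / (2·b2), and then declines. The coefficients are invented for illustration and are not estimates from this study; the function names are hypothetical, and age adjustment and random effects are omitted.

```python
def trajectory(t: float, b0: float, b1: float, b2: float) -> float:
    """Mean z-score at time t (years) under a linear-plus-quadratic time trend."""
    return b0 + b1 * t + b2 * t ** 2


def turning_point(b1: float, b2: float):
    """Time at which the quadratic trend changes direction, or None if purely linear."""
    if b2 == 0:
        return None
    return -b1 / (2 * b2)


# Hypothetical coefficients: impaired baseline, positive linear and negative quadratic term
b0, b1, b2 = -1.5, 0.75, -0.125

peak_year = turning_point(b1, b2)
for year in range(6):
    print(year, trajectory(year, b0, b1, b2))
print("turning point (years):", peak_year)
```

In this toy example, the mean score improves from baseline until the turning point and then falls, while remaining above its baseline value at year 5.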

Group comparisons: The results of LMM analysis adjusting for age showed no statistically significant longitudinal differences between the two groups on any of the cognitive tests. On tests of attention/executive functions, mean z-scores remained between about 0 and −1 following R-MPV through year 5 for both groups. On tests of memory (HVLT-R-Learning, HVLT-R-Delayed Recall), mean z-scores were between about −1 and −1.7 following consolidation with rdWBRT + ARA-C, and between about −0.5 and −1.5 following HDC-ASCT. On motor speed, mean z-scores were between about −0.5 and −1.5 for both groups.

The 6 patients who did not complete either the year 5 follow-up in the rdWBRT group or the year 4/5 follow-ups in the HDC-ASCT group were not statistically significantly different from the remaining participants on age, education, or baseline cognitive performance, except for worse performance on the HVLT-R Total Learning (p = 0.04).

Self-reported mood and QoL

The results of LMM analysis showed a statistically significant improvement in the FACT-BR scores (i.e., better QoL) from baseline over time in patients treated with rdWBRT + ARA-C (p = 0.0005) and with HDC-ASCT (p < 0.0001), with a sharp improvement from baseline to year 1. BDI scores declined significantly (i.e., fewer symptoms of depression) from baseline over time in patients treated with rdWBRT + ARA-C (p = 0.01) and with HDC-ASCT (p < 0.0001); mean scores were not indicative of depression at any time point. A statistically significant quadratic time component was observed, suggesting that the rate of improvement for the FACT-BR (p < 0.0001) and the rate of decline for the BDI (p = 0.001) slowed after year 2 in patients treated with HDC-ASCT. The results of LMM analysis adjusting for age showed no statistically significant longitudinal differences in the FACT-BR or BDI between the two groups (Fig. 2).

Fig. 2 Mood (BDI) and quality of life (FACT-BR)

Neuroimaging

In patients treated with rdWBRT, analysis of WM ratings showed statistically significant worsening from baseline to years 3 and 4, with 70% of patients with grades 0/1 at baseline changing to grade 2/3 by years 3 and 4 (McNemar’s test p-values 0.03 and 0.03 for years 3 and 4, respectively). Analysis of CA ratings showed statistically significant worsening from post-R-MPV to years 3 and 4, with 56% of patients with grades 0/1 post-R-MPV changing to grade 2 (no ratings > 2) by years 3 and 4 (McNemar’s test p-values 0.03 and 0.03 for years 3 and 4, respectively) (Tables 2, 3).

Table 2 White matter (WM) abnormalities
Table 3 Cortical atrophy

In patients treated with HDC-ASCT, analysis of WM ratings showed statistically significant worsening from baseline to years 1 through 5, with 40% of patients with grades 0/1 at baseline changing to grade 2/3 by year 5 (McNemar’s test p-values 0.03, 0.03, 0.03, 0.03, and 0.046 for years 1, 2, 3, 4 and 5, respectively). Analysis of CA ratings showed no statistically significant longitudinal changes (no ratings > 2) (Tables 2, 3).

Group comparisons: There were no statistically significant longitudinal differences between the two groups in WM or CA ratings.

Associations: Overall, for patients at years 3, 4, and 5, mean cognitive test z-scores were lower for those with WM grades 2/3 and CA grade 2 compared to those with grades 0/1. At years 3 and 4, CA ratings were significantly associated with performance on the GPT dominant (p = 0.03 and p = 0.009, respectively) and non-dominant (p = 0.02 and p = 0.0095, respectively) hands, such that patients with CA grade 2 had worse performance compared to patients with grades 0/1. At year 5, WM ratings were significantly associated with performance on the BTA (p = 0.02) and approached significance for the GPT dominant hand (p = 0.06), with patients with grades 2/3 having worse performance compared to patients with grades 0/1. For the TMTB, associations with CA ratings approached significance at years 3, 4 and 5 (p = 0.06, p = 0.09, p = 0.06, respectively) and associations with WM ratings approached significance at year 4 (p = 0.06), with patients with CA or WM grades 2/3 having worse performance compared to patients with grades 0/1. Overall, for patients at years 3, 4, and 5, mean BDI scores were higher and FACT-BR scores were lower for patients with WM grades 2/3 and CA grade 2 compared to patients with grades 0/1, but these differences were not statistically significant.

Discussion

This study assessed cognitive functions longitudinally in PCNSL patients treated with R-MPV followed by consolidation with rdWBRT + ARA-C or HDC-ASCT who were progression-free up to 5 years post-treatment. Most patients had cognitive impairment at diagnosis, likely related to disease burden and possibly to the adverse effects of anti-epileptic and corticosteroid medications [28, 29]. There were no significant longitudinal group differences in cognitive functions over time. Initially, there was a significant improvement in attention/executive function, memory and motor speed following R-MPV in both groups, likely related to tumor response to treatment. However, given the impaired scores at baseline, regression to the mean may have also influenced the improvement observed after R-MPV. These findings of cognitive improvement post-induction chemotherapy are consistent with our prior results in a subset of patients treated with these regimens [1, 14], and with reports from other groups [15,16,17,18].

Longitudinal follow-up showed improvement in most cognitive domains from baseline up to year 3 for both groups, which is overall consistent with some of our prior results [1, 14]. However, there was a significant decline in attention/executive functions and memory after year 3 in both groups, suggesting that these regimens may be associated with late delayed neurotoxicity. Moreover, mean test z-scores remained below expected normative values during the follow-up period on most cognitive domains, suggesting a relatively diffuse pattern of deficits. Patients treated with rdWBRT + ARA-C had lower scores in learning and delayed recall over time than patients treated with HDC-ASCT, particularly after year 3, although the comparisons did not reach statistical significance, possibly due to reduced power and interpatient variability. This observation is consistent with some studies suggesting increased sensitivity of hippocampal-mediated functions to radiotherapy [30, 31], although several studies have reported primarily frontal-subcortical dysfunction [32, 33]. Similar to the cognitive findings, there was significant improvement in self-reported mood and QoL over time in both groups, with no significant longitudinal group differences.

The second randomization of the International Extranodal Lymphoma Study Group-32 (IELSG32) phase 2 trial [15] showed improvement in cognitive functions over 2 years in patients treated with induction chemoimmunotherapy followed by consolidation HDC-ASCT (N = 27), but a progressive decline in attention/executive functions in patients treated with consolidation WBRT (N = 30; 3600 cGy). Similarly, in the randomized phase 2 PRECIS Study [16] there was evidence of stable or improved cognitive functions up to 3 years post-treatment in more than one half of patients treated with HDC-ASCT, but executive function declined in most patients treated with WBRT (4000 cGy). Our study including a 5-year follow-up suggests that patients treated with HDC-ASCT may also experience cognitive decline, and underscores the importance of performing longitudinal follow-ups in this population. Our findings also suggest that regimens including rdWBRT (2340 cGy) may be associated with less severe delayed neurotoxicity than described after full-dose WBRT [5, 32, 33], consistent with the suggestion that radiation dose may be a major factor influencing the risk of neurotoxicity [15].

In this study, there was a significant increase in WM abnormalities over time in both treatment groups. Group comparisons showed no significant longitudinal differences between treatment groups in WM or CA ratings, although a larger number of patients treated with rdWBRT had an increase in WM and CA by year 3. These findings are consistent with previously reported structural brain abnormalities following chemotherapy + WBRT for PCNSL [2, 34] including WM disease and CA [34, 35], albeit these changes seemed less pronounced than after full-dose WBRT [32, 33]. WM abnormalities have also been reported in PCNSL patients treated with HDC alone [32, 36], and alterations in WM integrity and gray matter volume have been described in patients with non-CNS cancers treated with chemotherapy with or without total body irradiation followed by SCT [37, 38]. Our results suggest that both HDC-ASCT and rdWBRT may be associated with WM abnormalities in PCNSL patients achieving long-term remission, and that patients treated with rdWBRT may be at greater risk. Overall, patients with more WM abnormalities and CA regardless of treatment modality had worse cognitive performance than patients with no/minimal changes. CA was associated with worse graphomotor speed and executive functions, and WM was associated with worse attention/executive functions. Moderate associations between WM and CA changes and cognitive function have been described in some but not all PCNSL studies [32, 33, 35]. Future studies using quantitative imaging techniques such as diffusion tensor imaging may provide greater sensitivity to assess treatment-related changes in brain structure.

The relatively small sample size and attrition are limitations of this and other longitudinal cognitive studies in PCNSL patients [18], and may diminish the power to detect small group differences in cognitive performance. Importantly, comparisons between the two groups in this study should be viewed with caution as patients were not randomly assigned to the treatment regimens/clinical trials, and inherent differences in demographic and disease characteristics may have influenced the results. Moreover, we studied patients without disease progression and our results may not be representative of all patients who participated in the two clinical trials. In the HDC-ASCT clinical trial, a larger proportion of patients were progression-free and younger, whereas more patients in the rdWBRT trial progressed or died [1, 14], which may have introduced patient selection bias. Also, it is possible that the few patients who declined cognitive follow-up experienced either further cognitive worsening or improvement. Nevertheless, the current findings highlight the importance of conducting long-term longitudinal studies to determine the contribution of both disease and treatment modality to cognitive outcome and QoL in this population. Prospective clinical trials have begun to incorporate published guidelines for cognitive assessment in PCNSL patients [15] and the findings will continue to advance our understanding of neurotoxicity in PCNSL patients treated with various treatment regimens. 
Two ongoing multi-center randomized trials from the Alliance group investigating R-MPV with and without rdWBRT (Radiation Therapy Oncology Group 1114), and myeloablative versus non-myeloablative consolidation chemotherapy (Cancer and Leukemia Group B 51101) in PCNSL include serial cognitive assessments and may allow for analyses that account for progression and death as competing risks; the results will provide additional information about cognitive outcome in large patient cohorts.