Introduction

Posterior fossa primitive neuroectodermal tumors (PF-PNET) are the second most common central nervous system (CNS) neoplasms in children followed by ependymoma with incidence rates of 6.5 and 3.4 per million children (0–14 years) per year, respectively [30]. The use of multimodality treatment concepts including surgery, radiotherapy, and chemotherapy resulted in a significant improvement of prognosis during the past decades in these patients [30]. Prognosis primarily depends on the tumor site, age at diagnosis, stage of the disease, and extent of resection [10, 20, 39]. Patients with average-risk PF-PNET may achieve a progression-free survival (PFS) of 80% at 5 years [10, 20, 39]. Although PFS among patients with high-risk PF-PNET was recently reported to be comparably high (70%) [4], only about 30% to 40% of patients with frank dissemination at diagnosis can expect to be progression free at 5 years [20, 39]. In patients with ependymoma, PFS ranges from 50% to 70% [28, 33, 34].

The incidence and spectrum of late effects following surgery, radiotherapy, and/or chemotherapy of childhood CNS tumors have been extensively described since the late 1970s [2, 9, 22, 27, 29]. These side effects include permanent endocrine dysfunctions, neurologic and neurosensory deficits as well as neuropsychologic impairment and can cause significant morbidity in long-term survivors of childhood brain tumors [2, 9, 22, 27, 29]. In addition, a large number of parameters of intellectual, neuropsychologic, and social functioning such as intelligence quotients (IQ), attention and memory skills, learning abilities, behavioral outcomes, and employment and family status have been assessed in former brain tumor patients [8, 11, 13, 15, 17, 25, 26, 38]. Several risk factors for the development of cognitive impairment after treatment of childhood brain tumors have been identified. For patients with PF brain tumors, a detailed review of neurocognitive sequelae has recently been presented by Mulhern et al. [27].

Quality of life (QoL), however, was not specifically evaluated by appropriate test instruments in most of the older studies. Instead, different neuropsychological parameters (e.g., school achievements, employment status, and income) were used as rough surrogate markers of QoL [15, 17, 25]. From the late 1990s, studies reported more specifically on health-related QoL [4, 7, 24, 26, 32, 38]. An extensive number of measures are available to evaluate both IQ- and health-related QoL in these patients. The Wechsler Intelligence Scale for Children—Revised, the Bayley Scales of Infant Development, McCarthy Scales of Children’s Abilities, and the Kaufman Assessment Battery for Children (K-ABC) are the most commonly used instruments to evaluate intellectual outcomes and cognitive development in children of different ages [8, 11, 13, 17, 26, 32, 38]. In contrast, tests to assess health-related QoL include the PedsQL, CHQ 50, SF-36, KINDL, and EORTC-QLQ-C30 [1, 4, 24, 31, 32]. A major issue concerning quantification of late effects in cancer patients is the lack of a reproducible, clinically practicable tool to quantitatively characterize the degree of late effects in former brain tumor patients. The aim of the present study was, therefore, to quantify the degree of late effects by a simple numerical score (late effects severity score [LESS]) in patients who have completed multimodality treatment for PF-PNET or ependymoma. As a surrogate measure to validate this LESS, we correlated QoL and neurocognitive parameters with the severity of late effects.

Patients and methods

Patients

Between January 1990 and December 2005, 242 patients with primary CNS tumors were referred to our institution. Among them, 50 patients (20.7%) were diagnosed with PF-PNET and 22 patients (9.1%) with ependymoma, respectively. As of March 31, 2007, 34 out of 50 patients with PF-PNET (68%; complete remission, n = 31) and 17 out of 22 patients with ependymoma (77.3%; complete remission, n = 16) are alive. With regard to the instruments used in this study, only children older than 6 years of age were eligible. Initial primary diagnostic evaluation included complete physical examination and pre- and postoperative contrast-enhanced magnetic resonance imaging (MRI) scans of the brain and spine (in patients with suspected PF-PNET or ependymoma). Follow-up MRI scans were performed at 3-month intervals for the first year after diagnosis, at 6- to 12-month intervals between the second and sixth year, and at 2- to 4-year intervals thereafter. Additional scans were done when clinical symptoms suggested disease progression. Tumor tissue specimens for histopathologic examination were obtained by tumor resection. Histopathologic diagnosis was performed by local neuropathologists according to the WHO classification system for brain tumors [19]. Central neuropathologic review by the Brain Tumor Reference Center of the Pediatric Oncology and Hematology Society of the German Language Group (GPOH) at the University of Bonn, Germany was done in ten patients with either PNET or ependymoma (43.5%). The extent of surgical resection was documented using the neurosurgeon’s report and the immediate postoperative contrast-enhanced MRI. Treatment of patients with PNET or ependymoma was performed according to the prospective multicenter trials (HIT 89, HIT 91 [20, 37], and HIT 2000) of the GPOH after informed consent was obtained.

Study design

Basic evaluation of long-term tumor- and/or treatment-related late effects was done according to our institutional follow-up program [3, 21]. In all patients, neuropsychological and QoL testing was performed by an experienced psychologist (KS, AW) at variable time points after cessation of therapy in a single session which lasted approximately 3 h. Psychologists were familiar with the patients and their families and took care of them during the entire course of the disease. Due to the postoperative neurological side effects observed in most patients and the emotional situation of the patients and their parents after diagnosis of a potentially life-threatening malignant condition, we did not perform neuropsychologic and QoL testing before initiation of non-surgical treatment.

The German Wechsler Intelligence Scales for Children (WISC) and the German Wechsler Adult Intelligence Scales (WAIS) were used to test cognitive ability in patients <16 and >16 years of age, respectively [35, 36]. Three intelligence quotients (IQ), a full scale IQ (FSIQ), a verbal scale IQ (VIQ), and a performance scale IQ (PIQ) are calculated with an IQ of 85 to 115 being defined as average range. Additionally, the test battery included the d2 concentration test that measures selective attention, processing speed, rule compliance and concentration [5], and the fragmented picture test (FPT), an instrument to evaluate perception and memory [18]. The verbal learning and memory test (VLMT), a modified version of the Auditory Verbal Learning Test measures both short-term and long-term memory [14]. QoL was evaluated using the KINDL (for patients <16 years) and EORTC-QLQ-C30 (for patients >16 years), respectively [1, 31]. The study was approved by the Ethics Committee of the Medical University of Graz, Austria.

Late effects severity score (LESS)

In order to quantify treatment-related sequelae and to correlate neurocognitive parameters and QoL with the severity of late effects, we generated a simple LESS (Table 1). Based on the severity of late effects, 0–2 points were assigned within each of four different categories (neurology [N], endocrine [E], visual/auditory [V/A], others [O]): 0 points: no late effect; 1 point: one isolated neurological late effect (e.g., mild ataxia, mild hemiparesis, pathological EEG) (N); unsubstituted hormone deficit(s) (E); uni- or bilateral hearing impairment not requiring correction with hearing aid and/or visual impairment not requiring special training/devices/equipment (V/A); any treatment-related medical problem not listed under the above categories, but not requiring medical intervention(s) (e.g., alopecia) (O); 2 points: any combination of >1 neurological late effect (e.g., ataxia and hemiparesis) and/or seizures necessitating anticonvulsive medication (N); >1 hormone deficit(s) necessitating hormone replacement (E); unilateral deafness +/− contralateral hearing impairment or any hearing impairment requiring correction with hearing aid and/or visual impairment requiring special training/devices/equipment (V/A); any treatment-related medical problem not listed under the above categories, which requires medical intervention(s) (e.g., ventriculoperitoneal (VP) shunts, second malignancies) (O). The total LESS for one patient comprised the sum of the individual scores in the four categories (i.e., a maximum score of 8 [4 × 2] was achievable). Three authors (AM, HL, and MB) independently assessed the LESS in all patients.
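The summation rule described above can be illustrated with a minimal sketch. The class name `LessScore`, its field names, and the example patient are hypothetical and serve illustration only; the clinical judgment of which point value applies in each category is not encoded here and remains with the assessor.

```python
from dataclasses import dataclass

ALLOWED_POINTS = (0, 1, 2)  # per-category points as defined in the text


@dataclass(frozen=True)
class LessScore:
    """Hypothetical container for one patient's category scores."""
    n: int   # neurology (N)
    e: int   # endocrine (E)
    va: int  # visual/auditory (V/A)
    o: int   # others (O)

    def __post_init__(self):
        # Each category may only carry 0, 1, or 2 points.
        for name in ("n", "e", "va", "o"):
            value = getattr(self, name)
            if value not in ALLOWED_POINTS:
                raise ValueError(f"{name} must be 0, 1, or 2, got {value!r}")

    @property
    def total(self) -> int:
        # Total LESS is the plain sum of the four category scores (0 to 8).
        return self.n + self.e + self.va + self.o


# Illustrative (invented) patient: mild ataxia only (N = 1), hormone
# replacement (E = 2), no visual/auditory deficit (V/A = 0), VP shunt (O = 2).
example = LessScore(n=1, e=2, va=0, o=2)
print(example.total)
```

Keeping the four category scores separate, rather than storing only the total, mirrors the way the categories are analyzed individually alongside the total score.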

Table 1 Late effects severity score (LESS)

Statistical analysis

The interobserver agreement regarding severity of late effects was calculated using Kendall’s coefficient of concordance W. Values of W can range from 0 to 1, with 0 indicating no agreement and 1 indicating perfect agreement. The Mann–Whitney U test was used to compare groups with regard to measures of cognitive performance, QoL, and late effects. Spearman’s rank order correlation was performed to describe relations among parameters of cognitive performance, QoL, and late effects. The χ² test was used to test for differences in IQ in relation to the time after completion of therapy (<5 years versus >5 years).
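For orientation, Kendall’s W for m raters and n patients can be written as W = 12S / (m²(n³ − n) − mΣT), where S is the sum of squared deviations of the per-patient rank sums from their mean and T is the usual tie correction for each rater. The following sketch implements this textbook formula; it is not the software used in the study, the function names are hypothetical, and the ratings in the usage line are invented. The tie correction is included on the assumption that scores limited to 0, 1, and 2 will produce many ties.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """ratings: one list per rater, each holding one score per patient."""
    m = len(ratings)
    n = len(ratings[0])
    if any(len(r) != n for r in ratings):
        raise ValueError("every rater must score the same patients")
    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    # Tie correction: sum of (t^3 - t) over groups of tied scores per rater.
    ties = sum(t ** 3 - t for r in ratings for t in Counter(r).values())
    denominator = m ** 2 * (n ** 3 - n) - m * ties
    if denominator == 0:
        raise ValueError("W is undefined when every rater gives only tied scores")
    return 12 * s / denominator


# Invented ratings from three raters for five patients (illustration only).
print(kendalls_w([[0, 1, 2, 1, 0], [0, 1, 2, 2, 0], [0, 1, 2, 1, 1]]))
```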

Results

Study patients, late effects, and late effects severity score

Basic clinical characteristics of study patients are summarized in Tables 2 and 3. Seventeen patients with PNET and six patients with ependymoma agreed to be enrolled in the present study and fulfilled the eligibility criteria. Eight patients with PF low-grade astrocytoma who underwent tumor resection only served as the control group. The median time from the end of therapy was 56 months (range, 1–174) for patients with PF-PNET or ependymoma and 76 months (range, 7–207) for patients with low-grade astrocytoma, respectively. Median age at the time of testing was 14 years (range, 9–49). Three patients with PF-PNET were adults (36, 38, and 40 years of age) at the time of diagnosis. Due to local policy, these patients received adjuvant treatment at our institution and were included in the present study. The spectrum of late effects is summarized in Table 4. Not unexpectedly, patients with PF-PNET or ependymoma had significantly higher LESS in the categories N, E, and V/A compared to patients with low-grade astrocytoma (Table 5). Other late effects were also more common among patients with PF-PNET or ependymoma, but this difference did not reach statistical significance. Kendall’s coefficients of concordance for the categories N, E, and V/A were 0.91, 0.95, and 0.92, respectively, indicating excellent agreement between the three authors who scored the severity of late effects for each individual patient in a blinded fashion. For the category “other late effects”, interobserver agreement was moderate (coefficient of concordance = 0.76).

Table 2 Clinical characteristics of study patients with PF-PNET or ependymoma (n = 23)
Table 3 Clinical characteristics of study patients with PF low-grade astrocytoma (n = 8)
Table 4 Spectrum of late effects in patients with PF-PNET, ependymoma, and PF low-grade astrocytoma
Table 5 Descriptive statistics for neurocognitive performance, quality of life, and late effects

Neurocognitive testing

Ten out of 23 patients with PF-PNET or ependymoma (43.5%) achieved average FSIQ scores, whereas in 12 patients (52.2%) FSIQ scores were below average. Only one patient (4.3%) scored above average. Thirteen (56.5%) and nine (39.1%) patients achieved average VIQ and PIQ scores. In contrast, VIQ and PIQ scores were below average in nine (39.1%) and 14 (60.9%) patients, respectively. Intellectual skills declined with time from the end of treatment. Nine of twelve patients (75%) who were tested less than 5 years after treatment achieved normal FSIQ scores (VIQ = 10/12 [83.3%]; PIQ = 7/12 [58.3%]). In contrast, only one of 11 patients (9.1%) tested more than 5 years after completion of therapy achieved a normal FSIQ score (VIQ = 3/11 [27.3%]; PIQ = 2/11 [18.2%]). This turned out to be significant for FSIQ, VIQ, and PIQ scores, respectively.
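The FSIQ counts reported above (9 of 12 versus 1 of 11 patients with normal scores) can be arranged as a 2 × 2 table and checked with the standard χ² formula n(ad − bc)² / [(a + b)(c + d)(a + c)(b + d)]. The sketch below is a hand-rolled illustration, not the software used in the study; the function name and the `yates` flag are hypothetical. Whether a continuity correction was applied is not stated in the text, so both variants are shown as an assumption.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction (an assumption; not stated in the text).
        diff = max(0.0, diff - n / 2)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * diff ** 2 / denominator


# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_DF1_ALPHA_05 = 3.841

# Rows: tested <5 years vs >5 years after therapy.
# Columns: normal FSIQ vs FSIQ not in the normal range.
fsiq_plain = chi_square_2x2(9, 3, 1, 10)
fsiq_yates = chi_square_2x2(9, 3, 1, 10, yates=True)
print(fsiq_plain > CRITICAL_DF1_ALPHA_05, fsiq_yates > CRITICAL_DF1_ALPHA_05)
```

With cell counts this small, an exact test is a common alternative to the χ² approximation; the sketch only reproduces the χ² statistic named in the statistical analysis section.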

Patients with PF-PNET or ependymoma had significantly lower WAIS/WISC FSIQ, VIQ, and PIQ scores compared to patients with low-grade astrocytoma (Table 5). However, with regard to other neurocognitive measures, differences between the two groups were less pronounced. Patients suffering from PF-PNET or ependymoma performed significantly worse than patients with PF low-grade astrocytoma only on some subtests (consolidation [VLMT] and memory [FPT]) (Table 5). The total LESS and the LESS in the categories N, E, and V/A were negatively correlated with FSIQ, PIQ, VIQ, attention and concentration (measured by the d2 concentration test), and verbal learning (measured by the VLMT) (Table 6). Interestingly, patients suffering from more severe late effects (total score, N, V/A) showed more improvement between session 1 and session 2 on the FPT than patients with less severe or no late effects. This observation suggests that training programs and special educational support might be able to restore cognitive abilities in severely handicapped patients.

Table 6 Spearman’s rank order correlations between measures of cognitive performance and late effects

Correlation of quality of life, late effects, and cognitive performance

We first compared patients with PF-PNET or ependymoma and patients with low-grade astrocytoma in terms of global QoL and found no statistically significant difference between the two groups in patients (<16 years of age) tested by the KINDL. The number of patients with low-grade astrocytoma who were tested by the EORTC was too low to allow for a meaningful comparison between the two groups with regard to global QoL. We then tried to correlate measures of neurocognitive performance and QoL. There was no statistically significant relationship between measures of neurocognitive performance and QoL, indicating that neurocognitive functioning (either impaired or normal) does not automatically translate into alterations or impairment in QoL. When we compared QoL and late effects in patients with PF-PNET or ependymoma, no significant correlation was found except for neurological late effects and the KINDL score, showing that younger patients with more severe late effects reported a worse QoL.

Discussion

Although neurological and neuropsychological sequelae have been extensively described in children with malignant CNS tumors treated with radiotherapy and/or chemotherapy [2, 4, 7–9, 11, 13, 15, 17, 22, 23, 25–27, 29, 32], a direct correlation of late effects, neurocognitive sequelae, and QoL following treatment of malignant childhood brain tumors has rarely been attempted. In the majority of the studies published so far, different neuropsychological outcomes (e.g., educational and/or employment status) were used as surrogate parameters for QoL [34]. Such parameters, however, may not necessarily or not appropriately reflect the subjective perception of QoL in former cancer patients. Thus, the present study aimed first at assessing and quantifying late effects in patients who have successfully completed multimodality treatment for malignant brain tumors. We then wanted to correlate the late effects with neurocognitive and QoL parameters, thereby testing the hypothesis that severe late effects and poor neurocognitive functioning directly translate into poor QoL. In contrast to the evaluation of QoL and neurocognitive functioning, where the applied test instruments generally yield a definitive result, measurement and grading of late effects is difficult and might largely reflect the personal estimation of the physician rather than the actual degree of these late effects. We, therefore, generated a LESS in order to quantify treatment-related sequelae, thus allowing further statistical analysis and correlation of neurocognitive and QoL parameters with the severity of late effects. As such, a LESS has to be simple and reproducible while still allowing complex patterns of late effects to be expressed in a simple numerical score.
This goal was achieved by grouping late effects into a limited number of (four) different categories (neurology, endocrine, visual/auditory, others) and by assigning 0, 1, or 2 points within each of these four domains reflecting the degree of a particular late effect. We are aware that generation of medical scoring systems to quantify deviations from normal is somewhat problematic and debatable. However, different scoring systems were shown to be useful, particularly where objectively measurable parameters are difficult to achieve [6, 12, 16]. The agreement between the three testers who scored the severity of the patients’ late effects was good within the three most important categories of late effects (neurology, endocrine, visual/auditory; Kendall’s coefficient of concordance = 0.91, 0.95, 0.92) indicating that this LESS is able to generate reproducible results. Whether the LESS described here might be helpful in grading tumor- or treatment-related late effects in brain tumor patients has to be shown in a larger prospective clinical study.

In accordance with previously reported results [29], patients suffering from PF-PNET had more numerous and more severe late effects (except for the domain “other late effects”) than patients from the control group. Not surprisingly, patients with PF-PNET also did worse on all neurocognitive tests than patients with PF low-grade astrocytoma. This difference was statistically significant for the Wechsler test (FSIQ, VIQ, and PIQ), the consolidation domain of the VLMT, and the memory skills subtest of the FPT. For other neurocognitive parameters tested (d2 total score, d2 total minus errors, VLMT verbal learning, FPT speed, and FPT improvement), the difference between the two groups did not reach statistical significance. This could be explained by the small number of patients in the present study. The most significant differences were found on the Wechsler domains, indicating a decline of more than one standard deviation in the PF-PNET group, which can be considered clinically significant. Correlation of total late effects and evaluation of cognitive performance showed that more severe late effects were negatively correlated with the Wechsler domains and other neurocognitive parameters (selective attention, processing speed, rule compliance and concentration [measured by the d2 test], verbal learning [measured by the VLMT], and memory [measured by the FPT]). These correlations had medium and high effect sizes, respectively.

Interestingly, we did not find any group difference between patients with PF-PNET/ependymoma and patients with PF low-grade astrocytoma in terms of QoL and its domains. Since we had to divide the study population into two subgroups for QoL testing depending on the patients’ age, this finding was also interpreted as a consequence of the small sample size. However, the absolute differences of the QoL mean scores were practically negligible, indicating a nearly perfect null effect for the group differences in QoL. Given the high significance for group differences in the total LESS which is considered to objectively reflect the severity of late effects, this observation indicates that objective QoL measurement differs from subjective assessment of QoL. There are several reasons to explain this observation. Firstly, one might speculate that the number and degree of late effects as expressed by the LESS does not necessarily mean that QoL is compromised in patients with high LESS scores. Secondly, the patients might have learned to live with their impairments and do not consider them as severe as others do. Thirdly, a social desirability bias may have influenced testing results. In two recently presented large studies, QoL was measured among patients with CNS tumors. In our study, neurological late effects seem to be most predictive for an impaired QoL in younger children tested by the KINDL. Bhat et al. used the PedsQL 4.0 and corresponding parent-proxy reports to assess QoL in a large series of 134 patients with different CNS tumors [4]. They found that the total QoL and other functioning scales were significantly lower among patients compared to controls. With regard to the histopathological diagnosis, there were no differences in patient-assessed total QoL. However, physical health scores were significantly higher among patients with low-grade glioma than in patients with PF-PNET on parent-proxy reports. 
Patients treated with radiotherapy alone had lower total QoL and subscale scores compared to those receiving other treatments. Maunsell et al. studied QoL in 1,334 childhood cancer survivors using the SF-36 [24]. Although patients with CNS tumors were found to have significantly poorer QoL and functioning scores compared to controls, the authors concluded that, for the majority of the effects, differences were clinically not relevant. In accordance, we found statistically significant differences between the two groups with regard to the severity of late effects and neurocognitive impairment. However, keeping in mind the small number of patients, there were no differences between both groups in terms of QoL, indicating that QoL is probably less compromised in patients following radiochemotherapy of malignant brain tumors than neurocognitive abilities. The LESS described here allows the degree of late effects to be expressed numerically and particularly reflects the profile of late effects observed in former brain tumor patients. It seems to be a simple and reproducible tool for the assessment and quantification of late effects in these patients, but has to be evaluated in a larger cohort.