Introduction

Neuropsychological testing has proved highly valuable in several neurological disorders (see, for example, [1]), including brain tumors. Changes in neurologic disease status can be identified through variations in cognitive performance, which may correlate with disease severity. Identifying neuropsychological changes earlier in brain tumor progression, for example, may lead to better therapeutic interventions and improved outcome. Indeed, although outcome is measured by progression-free survival, and CT and MRI provide highly relevant information, these measures do not describe the patient’s clinical condition. In addition, the mere ability to perform activities of daily living does not reflect the patient’s cognitive status. Therefore, indices of cognitive function have become increasingly important in clinical trials, as well as in pharmacological studies, to establish the patient’s baseline global cognitive level and to monitor possible decline (or improvement) during follow-up. Cognitive follow-up can detect the effects of neurosurgical procedures, radiotherapy, chemotherapy, or anticonvulsant drugs, and it can guide a rehabilitation program. Cognitive function also has independent prognostic significance: some tests, such as Digit Span or verbal recall, have been shown to predict survival in recurrent malignant glioma [2]. However, optimizing cognitive batteries for brain tumor clinical trials poses several difficulties [3], one being the need to balance brevity and sensitivity. This is especially true for slow-growing tumors, such as low-grade gliomas (LGG). The main issues are discussed below.

Task sensitivity

In 2009, Duffau et al. [4] published a paper on the role of the uncinate fasciculus in language functions. Thirteen LGG patients were evaluated with the Boston Diagnostic Aphasia Examination (BDAE) before and after surgical treatment, which was performed as awake surgery. The main objective of the BDAE, as of most standardized language examinations, is to classify patients into one of the traditional aphasic syndromes; its items are typically very simple, concrete tasks that most children in the lower school grades can pass. The BDAE, like other standard language examinations, is therefore appropriate for aphasic syndromes caused by cerebrovascular accidents, but language impairments caused by slowly growing lesions differ from those following acute brain damage. Deficits due to slow-growing lesions can be very subtle and can be detected only through a detailed, sufficiently sensitive language evaluation. Moreover, lesions involving the left anterior temporal cortex and its subcortical connections can impair the retrieval of specific categories, namely proper names [5–8], a type of stimulus that Duffau and colleagues did not use. Indeed, using a famous people naming task in a series of 44 patients who underwent awake surgery for removal of a left frontal or temporal glioma, we found that patients performed significantly worse when the removal included the uncinate fasciculus than when this fiber tract was spared [9]. This deficit was still evident 3–6 months after surgery and also involved picture naming of objects. In particular, in patients with uncinate removal, naming of famous faces decreased significantly immediately after surgery and recovered only partially, remaining significantly poorer than before surgery; moreover, the number of patients scoring below the cut-off was significantly higher when the uncinate was removed.
This was not due to pre-operative infiltration, since the two groups of patients did not differ in their pre-operative performance. Cognitive deficits in brain tumors can remain undetected when tasks are not sensitive enough or not appropriate for the specific lesion site. Of course, we do not mean that the neurosurgeon should avoid removing the tumor because more or less subtle naming deficits may appear: the balance between risks and benefits certainly favors tumor removal. However, we believe that, to make “informed consent” truly informed, it is important to know what brain surgery can cause, to inform the patient in detail about these (albeit mild) consequences, and to reassure him/her that in the large majority of cases these deficits recover within a few months, either spontaneously or with the aid of rehabilitation.

Pitfalls in the assessment of disability in individuals with LGG have been reported before [10]: 24 patients were evaluated by both a neurologist and a neuropsychologist and were also asked to self-evaluate their deficits using a specific questionnaire. The neurologist used the Boston Aphasia Severity Rating Scale from the BDAE and a verbal delayed recall test to assess language and memory, respectively, while the neuropsychologist used a more detailed test battery that included symbol digit, Rey auditory verbal learning, block design, picture arrangement, information, verbal fluency on phonological cue, and Judgment of Line Orientation. The neuropsychological assessment revealed moderate to severe cognitive impairment in more than half of the patients. This impairment was not detected by the neurologist’s simplified evaluation, and the patients’ own reports fell in between those of the neurologist and the neuropsychologist. Memory and language ratings differed significantly across the three assessors (patient, neurologist, neuropsychologist), demonstrating that a detailed neuropsychological evaluation is necessary to detect cognitive dysfunction in LGG patients.

Therefore, the first take-home message is to use wide-ranging, sensitive tasks and to avoid standard language examinations or screening batteries, such as the Mini Mental State Examination, which cannot provide any relevant information in LGG in particular (but also in brain tumors in general).

Lesion location

Another important issue, already mentioned above, is what deficits can reasonably be expected from a lesion in a particular location. For instance, prefrontal gliomas can produce general cognitive deficits, such as reduced sustained attention, forgetfulness, decision-making difficulties, and mood changes; temporal tumors cause verbal memory impairment or language deficits when located in the language-dominant hemisphere and, less frequently, visuo-spatial memory deficits when located in the right hemisphere. Language deficits can be very subtle and affect only a particular grammatical or semantic class, and this should be taken into account when selecting stimuli. In left parietal gliomas, one has to look for Gerstmann syndrome (co-occurrence of agraphia, finger agnosia, acalculia, and left–right disorientation), which may not be clinically evident or may show atypical features (see, for example, [11], whose patient with a glioblastoma had toe agnosia). A glioma in the occipital lobe extending into the splenium can produce alexia without agraphia [12]. Therefore, a second take-home message is to include specific tasks depending on lesion site.

Plasticity

However, the rate of growth and, accordingly, plasticity can alter the expected anatomo-clinical correlations. For example, we observed a patient with a grade III anaplastic oligodendroglioma involving the left frontal lobe, with a volume of 118.50 cm³ (see Fig. 1), who showed no cognitive deficit and no change in mood or personality.

Fig. 1

(a, b) MRI of a patient with an anaplastic oligodendroglioma in the left prefrontal lobe and an entirely normal neuropsychological evaluation

In a recent study [13], we used direct electrical stimulation (DES) during surgical removal of a glioma in 38 patients to identify the sites involved in naming different categories of objects. The sites where stimulation selectively inhibited naming of either living or non-living things were displaced relative to those observed in other subject populations, possibly reflecting cortical reorganization due to slowly evolving brain damage.

Additional information

In building an adequate battery for the neuropsychological evaluation of brain tumors, and of LGG specifically, another point to consider is whether specific tests can provide additional information, such as survival rate or probability of relapse. In other words, if some tests can predict disease evolution before it becomes apparent on MRI, they clearly need to be included. In a 2000 study, Meyers et al. [2] found that, after accounting for clinical variables (age, histology, Karnofsky index, time since diagnosis), the tests most strongly related to survival were verbal learning, digit span, and digit symbol. However, this study included only 80 patients, and the results need to be confirmed in a larger sample. Similarly, Armstrong et al. [14] tested 34 patients with supratentorial low-grade tumors, 11 of whom developed recurrent tumors. They compared two models for detecting low-grade brain tumor recurrence earlier than clinically scheduled neuroimaging: a general model based on tests sensitive to malignancy and white matter disease, and a tumor-specific model based on indices related to each patient’s tumor locus. A Cox proportional hazards model was used to identify the predictor variables that changed significantly immediately before recurrence. Only the tumor-specific model achieved significance; in addition, a single memory task, word recognition, approached significance.
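The Cox proportional hazards step mentioned above can be illustrated with a minimal Python sketch. It fits a single time-fixed covariate (for example, a hypothetical change score on one test) by Newton–Raphson on the partial log-likelihood, treating tied event times with the Breslow approximation. The function names and the toy data are ours, not taken from Armstrong et al. [14]; a real analysis would use an established survival-analysis package and would handle several, possibly time-varying, covariates.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate (Breslow handling of ties)."""
    ll = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored subjects enter only through the risk sets
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        ll += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk))
    return ll


def cox_score_info(beta, times, events, x):
    """Score (first derivative) and observed information (minus second derivative)."""
    score, info = 0.0, 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        w = [math.exp(beta * x_j) for x_j in risk]
        s0 = sum(w)
        mean = sum(wj * x_j for wj, x_j in zip(w, risk)) / s0
        second = sum(wj * x_j * x_j for wj, x_j in zip(w, risk)) / s0
        score += x_i - mean
        info += second - mean * mean
    return score, info


def fit_cox(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson maximization of the partial likelihood, starting at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, x)
        if info <= 0:
            raise ValueError("zero information: beta is not identifiable from these data")
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    # Toy data: follow-up months, relapse indicator (1 = relapse), covariate.
    times = [1, 2, 3, 4, 5]
    events = [1, 1, 0, 1, 1]
    x = [2.0, 0.5, 1.0, 1.5, 0.0]
    beta = fit_cox(times, events, x)
    print(f"beta = {beta:.3f}, hazard ratio = {math.exp(beta):.3f}")
```

The hazard ratio exp(beta) is the multiplicative change in the instantaneous risk of recurrence per unit of the covariate; the "predictor variables that changed significantly" in such a study would be those whose beta differs reliably from zero.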

In a preliminary study of 226 patients (see below), our group found that verbal fluency and naming significantly predicted the probability of relapse in LGG even when tumor volume was taken into account. However, these measures need further testing to confirm their predictive value.

To sum up, in selecting batteries, clinicians have to consider that multiple cognitive domains must be examined with tests that are sensitive to generalized dysfunction. This is necessary to detect both focal changes due to tumor effects and more general dysfunction or unexpected changes due to plasticity and ongoing therapies. Ideally, the battery would take less than an hour, but it must be borne in mind that a serious and reliable evaluation requires at least an hour and a half and may need to be split into several sessions.

Given all these constraints, we developed our own battery, which is mainly intended for LGG patients undergoing surgical removal of the lesion, although we have used it with almost all types of brain tumor. It includes tests administered to all patients and a selection of tests that depends on tumor location.

Materials

The Milano-Bicocca Battery (MIBIB)

This battery investigates language, memory, apraxia (including visuo-constructional abilities), and executive functions. Spatial cognition is assessed only in specific cases (see below). The long version takes 1.5–2 h to administer. A shortened version requires approximately an hour (or even less, see below), depending on the patient’s cognitive abilities. Patients undergo the neuropsychological evaluation in the week before surgery, immediately after surgery, and then every 3 months as follow-up. We always use the shortened version in the post-surgery session. For all tests, raw scores are adjusted for age, education, and, when indicated, sex, according to parameters estimated with a multiple regression model in a normal sample (200–321 neurologically unimpaired subjects). Adjusted scores below the one-sided non-parametric tolerance limit for the lowest 5% of the normal population (95% confidence) are considered pathological; inferential cut-off scores are therefore those at or below which the probability that an individual belongs to the normal population is <0.05 (see, for example, [15]).
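The adjustment-and-cut-off procedure just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the exact MIBIB correction: we assume a model linear in age and years of education (sex omitted for brevity; published norms often transform age or education first), adjust a raw score by removing the fitted effect of its deviation from the normative means, and take as cut-off the order statistic that, with 95% confidence, has at most 5% of the normal population below it (the classical distribution-free tolerance limit). All function names are hypothetical.

```python
import math

import numpy as np


def fit_norms(age, edu, raw):
    """OLS fit of raw = b0 + b_age * age + b_edu * edu on a normative sample."""
    age, edu, raw = (np.asarray(v, dtype=float) for v in (age, edu, raw))
    design = np.column_stack([np.ones_like(age), age, edu])
    coef, *_ = np.linalg.lstsq(design, raw, rcond=None)
    return coef, age.mean(), edu.mean()


def adjust_score(raw, age, edu, coef, mean_age, mean_edu):
    """Express a raw score as if the person had the normative mean age and
    education (assumed convention: subtract the fitted effect of the deviation)."""
    _, b_age, b_edu = coef
    return raw - b_age * (age - mean_age) - b_edu * (edu - mean_edu)


def binom_sf(n, p, r):
    """P(K >= r) for K ~ Binomial(n, p), computed as 1 - P(K <= r - 1)."""
    return 1.0 - sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r))


def tolerance_rank(n, tail=0.05, confidence=0.95):
    """Largest rank r such that, with probability >= confidence, at most a
    `tail` fraction of the population lies below the r-th smallest score.
    Returns None if the sample is too small for any rank to qualify."""
    rank = None
    for r in range(1, n + 1):
        if binom_sf(n, tail, r) < confidence:
            break
        rank = r
    return rank


def cutoff(adjusted_norms, tail=0.05, confidence=0.95):
    """Distribution-free lower tolerance limit of the adjusted normative scores."""
    r = tolerance_rank(len(adjusted_norms), tail, confidence)
    if r is None:
        raise ValueError("normative sample too small for this tolerance limit")
    return sorted(adjusted_norms)[r - 1]
```

The rank logic follows from the fact that the proportion of the population below the r-th smallest of n scores exceeds 5% only if fewer than r of the n normative subjects fall in that lowest 5%. With 59 subjects the limit is the single lowest score (the well-known 59-subject minimum for a one-sided 95%/95% limit); with 58 or fewer no distribution-free limit exists, and larger samples allow a higher rank, so the cut-off depends less on one extreme control.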

  (a)

    Language. For the reasons mentioned above, we do not use standardized language examinations. Instead, we have selected a group of tests that have proved sensitive enough to detect very mild deficits: verbal fluency on phonemic and semantic cue [16], picture naming of people [17], picture naming of objects and word–picture matching [18], picture naming of actions (normative data collection is in progress), naming by description [16], an 80-item sentence–picture matching task ([19]; a new version is currently being standardized), and a token test [20]. Repetition is evaluated with the nonword, word, and sentence repetition tasks from the BADA [21]. Nouns and verbs are balanced for word frequency and age of acquisition. Nouns are also balanced for semantic category, picture typicality, image complexity, semantic relevance, name agreement, and familiarity. Famous people are graded for the period of their fame and represent four professional categories (artists/scientists, athletes, actors, politicians). When awake surgery is scheduled, we administer the same tasks on three non-consecutive occasions and use for language mapping all stimuli that were named correctly and without delay on all three occasions. As far as possible, we also keep these selected stimuli balanced for the relevant variables. For picture naming of objects, two versions are available: an 82-item test for patients with no apparent or only very mild deficits, and an abbreviated 48-item version for patients with moderate deficits or when the evaluation must be kept shorter (e.g., patients who cannot tolerate the 2-h assessment).

  (b)

    Memory. Short- and long-term verbal and visuo-spatial memory tests are included. More specifically, we use the digit span and the Corsi span [22], word list learning [23], supraspan learning [24], and Rey figure reproduction [25]. We are currently collecting normative data for the Taylor figure, which we use as alternative material to avoid learning effects on retest, while alternative standardized word lists are already available for verbal recall. Copying of the complex figure precedes its long-term reproduction and also allows visuo-constructional abilities to be tested. The long version of the battery includes all memory tests, while Corsi supraspan learning is omitted from the shortened version.

  (c)

    Executive functions. The following tests are administered: Raven’s Colored Progressive Matrices [26] to assess nonverbal intelligence, the Weigl test [27] and the Wisconsin test [28], attentional matrices [27] and the Stroop test [29] for selective attention, and the trail-making test [30] for divided attention.

  (d)

    Apraxia. Orofacial, ideomotor [31], and constructional apraxia (see [25]) are investigated. However, since we have never found any impairment of orofacial or ideomotor praxis, these tests are now omitted unless there are specific reasons to administer them.

  (e)

    Spatial cognition is evaluated with a battery [32] that is administered only when specific deficits are expected (e.g., right parietal lesions). This battery includes line bisection, star cancellation, letter cancellation, sentence reading, and drawing (copy and mental).
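The stimulus-selection rule for awake language mapping in (a), together with the choice between the two object-naming versions, can be expressed as a simple filter. In this sketch the data layout, the function names, and the 1.5-s latency threshold that stands in for "named without delay" are hypothetical assumptions; the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    item: str
    correct: bool
    latency_s: float  # naming latency in seconds


# Hypothetical threshold standing in for "named without delay".
MAX_LATENCY_S = 1.5


def mapping_stimuli(sessions, max_latency_s=MAX_LATENCY_S):
    """Return items named correctly and promptly in every pre-operative session
    (three non-consecutive sessions in the protocol described), in first-session order."""
    if not sessions:
        return []
    by_session = [{t.item: t for t in session} for session in sessions]
    kept = []
    for item in by_session[0]:
        trials = [s.get(item) for s in by_session]
        if all(t is not None and t.correct and t.latency_s <= max_latency_s for t in trials):
            kept.append(item)
    return kept


def object_naming_version(no_or_mild_deficit, short_session):
    """82-item version for patients with no or very mild deficits; 48-item otherwise."""
    return 82 if no_or_mild_deficit and not short_session else 48
```

An item that is missed once, or named correctly but slowly in any session, is excluded, so every stimulus shown during stimulation has a reliable pre-operative baseline.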

As mentioned, other specific tests can be added depending on tumor location. For example, in extensive frontal lesions, a modified version of the Iowa gambling task [33] is used. In frontal LGG involving the insula, we administer the Ekman test [34] to assess recognition of facial expressions. In temporal LGG, especially when no deficits are detected with the extended version of our battery, we further investigate patients’ semantic abilities with semantic judgments on triplets of abstract and concrete nouns and verbs, also recording response times. With this task we have detected increased latencies at follow-up, even when no clinical deficits were evident.
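The "core battery plus location-dependent add-ons" rule can be written as a small configuration function. The test labels, the location keys, and the grouping below are hypothetical simplifications of the text (for example, "extensive" frontal involvement is reduced to a boolean flag, and only the short-version omission stated in (b) is encoded); the sketch is not a clinical protocol.

```python
from dataclasses import dataclass

# Core tests from (a)-(c) plus figure copy; labels are shorthand, not official names.
CORE_TESTS = [
    "phonemic and semantic fluency",
    "picture naming of people",
    "picture naming of objects",
    "word-picture matching",
    "picture naming of actions",
    "naming by description",
    "sentence-picture matching",
    "token test",
    "BADA repetition",
    "digit span",
    "Corsi span",
    "word list learning",
    "Corsi supraspan learning",
    "complex figure copy and recall",
    "Raven colored progressive matrices",
    "Weigl test",
    "Wisconsin test",
    "attentional matrices",
    "Stroop test",
    "trail-making test",
]
SHORT_VERSION_OMITS = {"Corsi supraspan learning"}  # the only omission stated in (b)


@dataclass
class Tumor:
    lobe: str  # "frontal", "temporal", "parietal", ...
    side: str  # "left" or "right"
    extensive: bool = False
    insula: bool = False


def assemble_battery(tumor, short=False):
    """Core tests (optionally the short version) plus location-specific add-ons."""
    tests = [t for t in CORE_TESTS if not (short and t in SHORT_VERSION_OMITS)]
    if tumor.lobe == "parietal" and tumor.side == "right":
        tests.append("spatial cognition battery")
    if tumor.lobe == "frontal" and tumor.extensive:
        tests.append("modified Iowa gambling task")
    if tumor.lobe == "frontal" and tumor.insula:
        tests.append("Ekman facial expression test")
    if tumor.lobe == "temporal":
        tests.append("semantic triplet judgments with response times")
    return tests
```

Keeping the add-on rules in one place makes it easy to audit which location triggers which extra test, and to add new rules (for example, for occipital lesions) if they become necessary.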

No occipital tumors were found in our series; therefore, no occipital-specific tests were ever used.

Patients

Two hundred twenty-six patients were evaluated with the battery described above between January 2007 and November 2010. All were tested immediately before surgery, in the week after surgery, and every 3 months thereafter. By November 2010, at least one 3-month follow-up had been collected for 117 patients (see Table 1 for the sample data).

Table 1 Clinical data of the patients’ sample

Preliminary results

Some tests appeared particularly sensitive to brain damage. These were (see Table 2 for the percentage of impaired patients before surgery and at the 3-month follow-up):

  1. picture naming of famous people and picture naming of objects, which were impaired in left frontal and temporal patients;

  2. picture naming of actions, which was impaired in left frontal, temporal, and parietal patients;

  3. verbal fluency on phonemic cue, which was impaired in both left and right frontal patients and in left temporal and parietal patients, thus proving particularly sensitive to brain damage;

  4. verbal fluency on semantic cue, which was only minimally impaired in left frontal and temporal patients at the pre-surgery evaluation, but proved sensitive to left temporal removal;

  5. the Weigl task, which proved sensitive to frontal damage but also to more general damage, since 31% of left parietal patients and 11% of all temporal patients were impaired; and

  6. word list learning, which was impaired in almost all types of lesion, in both immediate and delayed recall (see Table 3).

Table 2 Percentage of patients with an impaired performance at the neuropsychological evaluation, before surgery and at a 3-month follow-up
Table 3 Percentage of patients with an impaired performance at the immediate and delayed auditory verbal recall depending on tumor location

We performed a series of ordinal logistic regression analyses to determine whether specific tests were associated with relapse. Preliminary results showed that, for left temporal tumors, these were delayed verbal recall (b = −0.76, p = 0.04), face naming (b = −1.08, p = 0.01), object naming (b = −0.43, p = 0.04), and verbal fluency (b = −0.89, p = 0.008). When tumor volume was included as a covariate, only verbal fluency was predictive of relapse (p = 0.048). For frontal tumors, only performance on attentional matrices seemed to be associated with relapse (b = −0.496, p = 0.03). When left frontal and temporal gliomas were considered together, object naming was the best predictor (b = −0.29, p = 0.05), even when volume, site, and grade were included as covariates (p = 0.017).
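To help interpret the coefficients above, the sketch below converts each b into an odds ratio exp(b) and shows how category probabilities follow from a proportional-odds (ordinal logistic) model. It assumes the common parameterization P(Y > j | x) = logistic(b·x − θ_j), in which a negative b means that better test scores go with lower odds of a worse (relapse) category; the text does not state the software convention, the score units, or the outcome categories, so the thresholds θ_j in the example are hypothetical and the function names are ours.

```python
import math

# Coefficients reported above for left temporal tumors (volume not included).
REPORTED_B = {
    "delayed verbal recall": -0.76,
    "face naming": -1.08,
    "object naming": -0.43,
    "verbal fluency": -0.89,
}


def odds_ratio(b):
    """Multiplicative change in the odds of a higher outcome category per unit of x."""
    return math.exp(b)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(x, b, thresholds):
    """Category probabilities under P(Y > j | x) = logistic(b * x - theta_j).

    `thresholds` must be increasing; category 0 is the best outcome and the
    last category the worst (assumed coding)."""
    exceed = [logistic(b * x - th) for th in thresholds]  # P(Y > j), decreasing in j
    probs, prev = [], 1.0
    for p in exceed:
        probs.append(prev - p)
        prev = p
    probs.append(prev)
    return probs


if __name__ == "__main__":
    for name, b in REPORTED_B.items():
        print(f"{name}: b = {b:+.2f}, odds ratio per unit = {odds_ratio(b):.2f}")
    # Hypothetical thresholds, only to show how a higher score shifts the probabilities.
    for score in (0.0, 2.0):
        print(score, [round(p, 3) for p in proportional_odds_probs(score, -0.89, [0.0, 1.5])])
```

Because the same b applies to every threshold, the odds ratio is identical for each cut between adjacent outcome categories; this "proportional odds" assumption is what a single coefficient per test summarizes.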

We also performed a series of logistic regression analyses to assess the effects of chemotherapy on test performance. When all relevant variables were taken into account (age, education, tumor size, grade, handedness, side and site of the tumor, pre-surgery scores), no effect was found at the available follow-up (6 months). As regards radiotherapy, only 20 patients in our series underwent this treatment; given this limited number, no analysis of its effects was performed.

Conclusions

First, we have presented the criteria we considered in developing our battery to assess LGG patients; second, we have reported the performance of our first sample of patients on this battery. Data concerning the predictive value of some tests must be interpreted with caution, since several other variables (e.g., tumor type, grade) need to be taken into account in a larger number of subjects.

Several issues, however, are worth considering. First, neuropsychological performance decreases after tumor removal, even when it remains within the normal range. More specifically, it drops immediately after surgery and then improves, but at the 3-month follow-up it is typically still lower than before surgery (see Fig. 2).

Fig. 2

Example of performance in verbal fluency on phonemic cue at three time intervals (T0, pre-surgery; T1, in the week after surgery; T2, after 3 months)

Therefore, a low, though still normal, pre-surgical performance leaves no safety margin against cognitive impairment after removal. This information is relevant for both the patient and the surgeon, since it helps predict the outcome of surgery and whether rehabilitation will be necessary. Once again, we do not argue that tumor removal should be partial in these cases, but that the patient has to be informed about the possible consequences.

Second, an informative neuropsychological evaluation requires time. In our experience, no patient has ever refused to complete the entire battery. However, we are aware that some clinicians prefer to reduce the time devoted to cognitive assessment. If an extensive battery cannot be administered, a number of tasks should nonetheless be included, since they are particularly sensitive to general as well as specific brain damage: verbal fluency (6 min), picture naming of people and objects (about 15 min), verbal learning (15 min), and the Weigl test or the trail-making task (5 min), for a total of about 45 min, which is the best compromise we can accept. Finally, we recommend a follow-up every 3 months to detect early changes that may predict relapse.
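The timing of the minimum battery and the 3-month follow-up rhythm can be checked with a short Python sketch. Durations come from the list above ("about 15 min" taken as 15); the listed tasks sum to 41 min, and we assume the remaining minutes of the ~45-min target go to instructions and transitions. Counting follow-ups from the date of surgery, the month-end clamping rule, and all names below are our assumptions.

```python
import calendar
from datetime import date

# Minimum battery and durations (minutes) as listed in the text.
MINIMUM_BATTERY = {
    "verbal fluency": 6,
    "picture naming of people and objects": 15,  # "about 15 min"
    "verbal learning": 15,
    "Weigl test or trail-making task": 5,
}
TARGET_MINUTES = 45


def total_minutes(battery):
    return sum(battery.values())


def add_months(d, months):
    """Same day of the month `months` later, clamped to that month's last day."""
    years, month0 = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month0 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def follow_up_dates(surgery, count):
    """Hypothetical schedule: one assessment every 3 months, counted from surgery."""
    return [add_months(surgery, 3 * k) for k in range(1, count + 1)]


if __name__ == "__main__":
    used = total_minutes(MINIMUM_BATTERY)
    print(f"{used} min of testing, {TARGET_MINUTES - used} min left within the target")
    print(follow_up_dates(date(2010, 11, 30), 2))
```

Clamping to the last day of the month (for example, 30 November plus 3 months gives 28 February) keeps the interval close to 3 months without producing invalid dates.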