Introduction

The point prevalence of dementia in Parkinson’s disease (PD) is approximately 30 %, with a sixfold increased risk compared to controls (Emre et al. 2007), and the cumulative prevalence is at least 75 % for PD patients surviving more than 10 years (Aarsland et al. 2003). However, these estimates are not consistent across all studies, and the risk of dementia may be lower in early-onset than in late-onset PD patients (Aarsland et al. 2001). Similarly, the older the patient and the longer the disease duration, the higher the prevalence of dementia (Halliday et al. 2008), further emphasizing that the best predictors of cognitive decline are patient age and disease duration. In recent years, the concept of mild cognitive impairment (MCI), initially developed to detect early Alzheimer’s disease (AD) cognitive changes in healthy controls (Petersen et al. 1999), has been applied to PD to improve detection of patients at risk of developing dementia (Verbaan et al. 2007). In PD, MCI is common, but frequencies (ranging from 25 to 36 % in newly diagnosed PD patients) and clinical characteristics are heterogeneous across studies, mainly because of methodological differences in MCI definition criteria and, most importantly, in the neuropsychological tests selected for cognitive profiling and the cutoff scores derived from normative data (Aarsland et al. 2010; Foltynie et al. 2004a). Recently, the Movement Disorder Society Task Force has defined MCI diagnostic criteria that address clinical and methodological issues, including examples of neuropsychological tests to include in each cognitive domain (9). Indeed, the diagnostic instruments used in clinical practice may affect the estimate of cognitive deficits and dementia in PD, the early detection of subtle cognitive deficits, and the understanding of how a specific MCI profile contributes to dementia development.
Using standardized cutoff scores, Riedel and colleagues showed that the prevalence of cognitive deficits varied with the method of evaluation: 17.5 % by MMSE (≤24), 41.8 % by Clock Drawing Test (≥3), and 43.6 % by PANDA (≤14), while 28.6 % met the DSM-IV criteria for dementia (Riedel et al. 2008). The heterogeneous percentages of cognitive deficits in PD obtained with different cognitive scales highlight the need to better understand the weight of specific neuropsychological tests in explaining the variance associated with cognitive impairment in PD.

The main aim of our study was to identify the neuropsychological tests that best discriminate PD with MCI (PD-MCI) from PD with normal cognition (PD-CNT) in a cohort cognitively defined according to current diagnostic criteria (Litvan et al. 2011).

Methods

A series of 104 consecutive PD patients were recruited from the Parkinson’s Disease Unit of the ‘San Camillo’ Hospital (Venice Lido, Italy) and from the Neurology Clinic of the University of Padua, Italy. All were Italian-speaking individuals diagnosed with idiopathic PD according to UK Brain Bank criteria. L-dopa equivalent daily dose (LEDE) and dopamine agonist equivalent daily dose (DAED) for each patient were calculated on the basis of theoretical L-dopa equivalents, as follows: L-dopa dose + L-dopa dose × 1/3 if on entacapone + L-dopa dose × 1/2 if on tolcapone, pramipexole (mg) × 100 + ropinirole (mg) × 20 + rotigotine (mg) × 30 + selegiline (mg) × 10 + rasagiline (mg) × 100 + amantadine × 1 + apomorphine (mg) × 10 (Tomlinson et al. 2010). The severity of extrapyramidal symptoms was graded using the Hoehn and Yahr (H&Y) scale and the motor section of the Unified Parkinson Disease Rating Scale (UPDRS-III). Demographic data (age, gender, and education level) and neurological details (age at onset, disease duration) were also collected. We did not include patients with significant cardiovascular problems or a history of major psychiatric disorders (i.e., bipolar disorder or major depression). Further, we excluded patients with atypical Parkinsonism and PD patients who had undergone neurosurgical procedures (including deep brain stimulation). In addition, we excluded patients with functional impairment who had cognitive deficits in only one domain and did not fulfil the PD-CNT, PD-MCI or PD dementia (PDD) inclusion criteria of the Movement Disorder Society Task Force Guidelines. Depression (BDI-II) and the presence of hallucinations were evaluated. The study was approved by the ethics committee of the IRCCS San Camillo, Venice, Italy. Written informed consent was obtained from all participants after the nature of the study was fully explained. A summary of demographic and clinical variables in our cohort is shown in Table 1.
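For illustration only, the L-dopa equivalent conversion described above can be written as a short function. This is a minimal sketch and not the authors' implementation: the function and parameter names are hypothetical, the expression is assumed to be a single sum of all listed terms, and amantadine is assumed to be entered in mg. The conversion factors are those printed in the text; how the total was split into LEDE and DAED is not specified, so only the total is computed.

```python
def levodopa_equivalent_dose(
    levodopa_mg=0.0,
    on_entacapone=False,
    on_tolcapone=False,
    pramipexole_mg=0.0,
    ropinirole_mg=0.0,
    rotigotine_mg=0.0,
    selegiline_mg=0.0,
    rasagiline_mg=0.0,
    amantadine_mg=0.0,
    apomorphine_mg=0.0,
):
    """Total daily L-dopa equivalent (mg), using the factors quoted in the text.

    Assumption: every term belongs to one sum. All names are illustrative.
    """
    total = levodopa_mg
    if on_entacapone:
        total += levodopa_mg / 3   # L-dopa dose x 1/3 if on entacapone
    if on_tolcapone:
        total += levodopa_mg / 2   # L-dopa dose x 1/2 if on tolcapone
    total += pramipexole_mg * 100
    total += ropinirole_mg * 20
    total += rotigotine_mg * 30
    total += selegiline_mg * 10
    total += rasagiline_mg * 100
    total += amantadine_mg * 1     # assumed to be in mg
    total += apomorphine_mg * 10
    return total


if __name__ == "__main__":
    # Hypothetical patient: 300 mg L-dopa with entacapone plus 1.5 mg pramipexole.
    print(levodopa_equivalent_dose(levodopa_mg=300, on_entacapone=True,
                                   pramipexole_mg=1.5))
```

Keeping each drug as a separate keyword argument makes the factor applied to each term explicit and easy to check against the published conversion table.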

Table 1 Demographic and clinical characteristics of PD groups

Neuropsychological examination

All PD subjects underwent a comprehensive neuropsychological evaluation including: the Mini Mental State Examination (MMSE) (Folstein et al. 1975) to test general cognitive functions; the Frontal Assessment Battery (FAB) to investigate frontal functioning (Apollonio et al. 2005); the WAIS-IV Similarities test to measure verbal comprehension (abstract verbal reasoning) (Lezak et al. 2004); the Category and Letter fluency tasks to evaluate, respectively, semantic-based word retrieval and response generation with set-maintenance of task instruction (Baldo et al. 2006; Novelli et al. 1986); verbal and non-verbal tests of short- and long-term memory (RVLT, Digit Span Forward and Backward, Corsi test and Rey-Osterrieth Complex Figure Test (ROCF) immediate recall) (Spinnler and Tognoni 1987; Caffarra et al. 2002a); tests of visual-spatial planning and attention such as the ROCF copy and Digit Cancellation (Spinnler and Tognoni 1987; Caffarra et al. 2002b); the Stroop Color/Word Interference Test to evaluate response monitoring and conflict resolution (Caffarra et al. 2002a); the Trail Making Test, with part A (TMT-A) to assess visual scanning, numeric sequencing and visual-motor speed, part B (TMT-B) to assess general frontal lobe dysfunction, and the time difference between TMT-B and TMT-A (TMTB-A) to evaluate shifting abilities (Giovagnoli 1996); the Clock Drawing Test (CDT) to evaluate planning and visual-spatial abilities (Caffarra et al. 2011); and the Drawing Copying Test to assess visual-constructive abilities and constructional apraxia (Novelli et al. 1986). Standardized, published normative datasets were used as comparative references to determine impairments.

Diagnosis of dementia

Dementia was diagnosed according to the criteria recommended by the Movement Disorder Society Task Force (Emre et al. 2007), based on neuropsychological tests, functional autonomy, and a clinical interview.

MCI definition

Following the Movement Disorder Society Task Force Guidelines (9), each test was allocated to one of the following cognitive domains: (1) attention/working memory, (2) executive function, (3) visual-spatial functions, (4) memory, and (5) language. For each domain, at least two tests were included (see Table 2). MCI definition was based on performance in the four cognitive domains. To this end, raw individual test values were converted into Z-scores using the relevant published Italian normative data corrected for age and education. According to published criteria, patients scoring at least 1.5 SD below the normative mean on two tests within any single domain, or on two tests from different domains, were classified as MCI (Riedel et al. 2008). Interviews with family members or friends provided information about the patient’s functional status, which was evaluated through a clinical history form assessing problems with everyday activities and daily living. The clinical interview was also used to determine the extent to which any decline could be attributed to cognitive rather than motor impairment.
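The Z-score rule described above can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the function names, the sign flip for timed tests (where a higher raw value means worse performance), and all example values are hypothetical; the published age- and education-corrected norms are not reproduced; and the functional-status interview is not modelled.

```python
def z_score(raw, norm_mean, norm_sd, higher_is_better=True):
    """Convert a raw score to a Z-score against (hypothetical) normative values.

    For timed tests such as the TMT, a higher raw value is worse, so the sign
    is flipped to keep "negative = impaired" for every test (an assumption).
    """
    z = (raw - norm_mean) / norm_sd
    return z if higher_is_better else -z


def classify_by_z(z_by_domain, threshold=-1.5):
    """Apply the rule: at least two test scores at or below -1.5 SD,
    either within one domain or spread across different domains."""
    impaired = [
        (domain, test)
        for domain, tests in z_by_domain.items()
        for test, z in tests.items()
        if z <= threshold
    ]
    domains = {domain for domain, _ in impaired}
    if len(impaired) < 2:
        pattern = None
    elif len(domains) == 1:
        pattern = "single domain"
    else:
        pattern = "multiple domains"
    return {"mci": pattern is not None, "pattern": pattern,
            "impaired_tests": impaired}


if __name__ == "__main__":
    # Entirely hypothetical patient and normative values.
    patient = {
        "executive function": {
            "TMT-B": z_score(150, 100, 25, higher_is_better=False),
            "FAB": -1.6,
        },
        "memory": {"RVLT immediate recall": -0.4},
    }
    print(classify_by_z(patient))
```

Returning the list of impaired tests alongside the label makes it possible to see which scores drove the classification.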

Table 2 Neuropsychological tests administered to evaluate PD cognitive performance

Statistical analyses

Clinical and demographic variables were analyzed using the Pearson chi-square test to assess differences in the distribution of dichotomous variables. Independent-sample t tests and analysis of variance (ANOVA) followed by Bonferroni post hoc tests were used to analyze continuous variables. When analyzing neuropsychological performance among PD subgroups, we considered only age- and education-corrected scores. Since one of the main goals of this study was to test the validity of neuropsychological tests in discriminating PDD/PD-MCI from PD-CNT, receiver operating characteristic (ROC) curves with area under the curve (AUC) (95 % CI) were computed. For this ROC analysis we used only the neuropsychological tests that had discriminated PD-MCI from PD-CNT.
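As a rough illustration of what an AUC with a 95 % CI measures, the sketch below computes the AUC as the proportion of impaired/intact patient pairs that a test ranks correctly (ties counted as one half) and attaches an approximate interval using the Hanley and McNeil (1982) standard error. This is an assumption on our part: the analyses in this study were run in SPSS, whose interval method may differ, and the function names and example scores are hypothetical.

```python
import math


def auc_pairwise(impaired_scores, intact_scores, lower_is_worse=True):
    """AUC as the fraction of (impaired, intact) pairs ranked correctly.

    Ties contribute 0.5. Equivalent to the Mann-Whitney U statistic divided
    by the number of pairs.
    """
    wins = 0.0
    for x in impaired_scores:
        for y in intact_scores:
            if x == y:
                wins += 0.5
            elif (x < y) == lower_is_worse:
                wins += 1.0
    return wins / (len(impaired_scores) * len(intact_scores))


def hanley_mcneil_ci(auc, n_impaired, n_intact, z=1.96):
    """Approximate CI for an AUC using the Hanley-McNeil standard error."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_impaired - 1) * (q1 - auc ** 2)
        + (n_intact - 1) * (q2 - auc ** 2)
    ) / (n_impaired * n_intact)
    se = math.sqrt(variance)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


if __name__ == "__main__":
    # Hypothetical corrected scores; lower values indicate worse performance.
    mci = [1, 2, 3]
    cnt = [2, 4, 5]
    auc = auc_pairwise(mci, cnt)
    print(auc, hanley_mcneil_ci(auc, len(mci), len(cnt)))
```

For timed tests, where higher values indicate worse performance, `lower_is_worse=False` reverses the ranking direction.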

AUC, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and percent correctly diagnosed were calculated for each test. We defined the optimal cutoff point as the value at which the sensitivity and specificity curves of each test intersect. The screening cutoff point was defined as the value achieving >80 % sensitivity and NPV. The diagnostic cutoff point was defined as the value achieving >80 % specificity and PPV. A two-cluster K-means analysis based on the sensitivity and specificity of each test was run separately for screening and diagnostic cutoff values to extract the cognitive tests with the best screening and diagnostic power in detecting MCI and PDD. All statistics were performed using SPSS version 20 (SPSS, Chicago, IL).
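The three cutoff definitions above can be sketched in a few lines. The sketch makes several assumptions that are not stated in the text: lower scores indicate worse performance, a score at or below the cutoff counts as test-positive, the "intersection" is approximated by the cutoff where sensitivity and specificity are closest, and when several cutoffs satisfy a >80 % criterion the one with the best complementary metric is chosen. All names and example scores are hypothetical.

```python
def metrics_at_cutoff(impaired, intact, cutoff):
    """Diagnostic metrics when score <= cutoff is called test-positive."""
    tp = sum(s <= cutoff for s in impaired)
    fn = len(impaired) - tp
    fp = sum(s <= cutoff for s in intact)
    tn = len(intact) - fp

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "cutoff": cutoff,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(impaired) + len(intact)),
    }


def select_cutoffs(impaired, intact):
    """Pick optimal, screening and diagnostic cutoffs from observed scores."""
    candidates = sorted(set(impaired) | set(intact))
    table = [metrics_at_cutoff(impaired, intact, c) for c in candidates]
    # Optimal: sensitivity and specificity as close to each other as possible.
    optimal = min(table, key=lambda m: abs(m["sensitivity"] - m["specificity"]))
    # Screening: >80 % sensitivity and NPV (tie-break on specificity: assumption).
    screening = [m for m in table if m["sensitivity"] > 0.8 and m["npv"] > 0.8]
    # Diagnostic: >80 % specificity and PPV (tie-break on sensitivity: assumption).
    diagnostic = [m for m in table if m["specificity"] > 0.8 and m["ppv"] > 0.8]
    return {
        "optimal": optimal,
        "screening": max(screening, key=lambda m: m["specificity"], default=None),
        "diagnostic": max(diagnostic, key=lambda m: m["sensitivity"], default=None),
    }


if __name__ == "__main__":
    # Hypothetical corrected scores for one test.
    pd_mci = [10, 11, 12, 13, 14, 16]
    pd_cnt = [13, 15, 16, 17, 18, 19, 20, 21]
    for name, row in select_cutoffs(pd_mci, pd_cnt).items():
        print(name, row)
```

A cutoff for which no value meets the >80 % criteria is returned as `None` rather than forced, which mirrors the idea that not every test has a usable screening or diagnostic threshold.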

Results

In our series of 104 PD patients, there were 55 PD-CNT (53 %), 34 PD-MCI (33 %) and 15 PDD (14 %) patients. PDD patients were older (p = 0.01), had greater motor severity (H&Y) (p = 0.02) and scored higher for depression (p = 0.002) than PD-MCI and PD-CNT. Bonferroni post hoc comparisons revealed that PD-MCI and PDD patients had lower education (p = 0.001 and p = 0.005, respectively) and longer disease duration (p < 0.001 for both groups) compared to PD-CNT. Frequency of hallucinations was significantly different among groups with the greatest prevalence in PDD [PDD (84 %), PD-MCI (47 %) and PD-CNT (2 %)] (p < 0.001).

Analysis of cognitive performance showed differences in several tasks between the three subgroups. MMSE score discriminated only PDD from PD-CNT (p < 0.001) and PD-MCI (p < 0.001), while no difference was found between PD-CNT and PD-MCI. Further comparisons among groups showed the same pattern for the following tests: Stroop Color/Word Interference Test time (p < 0.001), ROCF immediate recall (p = 0.002), Digit Span Forward (p < 0.01), Corsi test (p < 0.01), Digit Cancellation (p < 0.001), CDT (p < 0.001), and Drawing Copying Test (p < 0.001).

By contrast, another set of neuropsychological tests discriminated PD-MCI from PD-CNT cognitive performance. These were the TMT-A (p < 0.01), TMT-B (p < 0.001), TMTB-A (p < 0.001), FAB (p < 0.001), ROCF copy (p < 0.001), RVLT immediate and delayed recall (p = 0.005 and p < 0.02, respectively), Digit Span Backward (p = 0.002), and the fluency tasks (p = 0.03 and p = 0.02 for phonologic and semantic fluency, respectively). ROC analyses showed that only TMT-A, TMT-B, TMTB-A, ROCF copy, FAB, Digit Span Backward and RVLT immediate recall reached a significant AUC value (>0.7) in detecting PD-MCI (see Fig. 1). The optimal cutoff point, screening and diagnostic values, sensitivity and specificity, PPV and NPV for PD-MCI in each test are listed in Table 3, along with the cutoff values from published norms for each test.

Fig. 1 Estimated ROC curves for all cognitive tests comparing PD-MCI with PD-CNT

Table 3 Result of ROC comparison between PD-MCI and PD-CNT

Values from ROC analyses in the PDD vs. PD-CNT groups are available as supplemental data. Cluster analysis (K-means) in the PD-MCI and PDD groups, using specificity and sensitivity values as variables, showed that screening discrimination power in detecting PD-MCI (p = 0.4) and PDD (p = 0.7) did not differ among these tests. However, TMT-B, ROCF copy, and RVLT immediate recall were identified as the best diagnostic instruments for PD-MCI (p = 0.01). No differences in diagnostic discrimination power were found among the tests for PDD.

Discussion

We found that in our neuropsychological battery the TMT, ROCF copy, FAB, Digit Span Backward, and RVLT immediate recall reached adequate discrimination power to diagnose MCI in PD, and that the TMT-B, ROCF copy, and RVLT immediate recall were the best instruments. Moreover, the diagnostic and screening cutoff values calculated by ROC analyses for each of these tests fell well within the normal range of normative data.

Using the most recent diagnostic criteria, we found that more than half of our PD patients did not have cognitive deficits, while 33 % were diagnosed as MCI and 14 % as PDD. These results are similar to those reported by others (Foltynie et al. 2004b; Caviness et al. 2007; Williams-Gray et al. 2009) and reinforce the use of a standardized Z score-based methodology in defining cognitive impairment in PD.

Consistent with the literature, we found that the demographic and clinical variables associated with dementia were older age (>60), lower educational level, greater motor severity, longer disease duration, and greater frequency of visual hallucinations and depression (Williams-Gray et al. 2009; Aarsland et al. 2007; Gjerstad et al. 2002; Hobson and Meara 2004; Hughes et al. 2000). Similar to other cohort studies, PD-MCI patients had a mean disease duration of 9.5 years, which was significantly longer than that of cognitively intact PD patients (5.8 ± 3.8 years) and shorter than that of PDD patients (11.3 ± 3.6 years) (Caviness et al. 2007).

It is important to note that in our cohort the best diagnostic discrimination power was achieved by the TMT-B, ROCF copy, and RVLT immediate recall task. Each of these tests covers a different domain (attention/set-shifting, visual-spatial planning and verbal memory), supporting the concept that cognitive abnormalities in PD are widespread and include posterior in addition to frontal deficits. Indeed, other authors have reported a similar pattern already at early stages of the disease, highlighting the contribution of temporal lobe dysfunction to the development of dementia in PD. In particular, Muslimovic and colleagues (Muslimovic et al. 2005) observed, in a cohort of de novo PD patients, cognitive impairment in executive function (attention, set-shifting), memory (free verbal recall), and visual-spatial abilities. Moreover, longitudinal prospective studies showed abnormalities of the posterior type, with hippocampal alterations associated with a posterior cortical profile (as in AD), vs. the fronto-striatal type (classically attributed to PD) (Janvin et al. 2006; Weintraub et al. 2004; Whittington et al. 2006; Pagonabarraga et al. 2008).

We found that the cutoff scores reported by normative data for all seven tests identified are not useful in detecting PD-MCI. Indeed, ROC analysis showed that the screening and diagnostic PD-MCI cutoff values for these tests would lie within the healthy subject range according to published normative data (Apollonio et al. 2005; Caffarra et al. 2002b; Mondini et al. 2003; Carlesimo 1996). This has clinical consequences, since current scoring of individual tests likely leads to a high percentage of false negatives (missed MCI diagnoses). We believe this represents one possible source of variability in PD-MCI frequencies across studies in the literature. We acknowledge that no definitive gold standard exists for in-life diagnosis of PD-MCI, although we used the best currently available criteria. However, considering that the ROC analysis was performed only on those tests that provided the best discrimination, we believe we avoided circularity.

Finally, even if several potentially useful screening and diagnostic instruments have been proposed in PD, there is still no consensus about the most appropriate neuropsychological test battery. We believe our study provides for the first time a significant clinical contribution since inclusion of these specific tests should be highly recommended for PD cognitive screening.

Consistent with other studies, analysis of MMSE validity showed poor sensitivity and specificity (AUC < 0.7) as a screening and diagnostic cognitive instrument in PD (Litvan et al. 2011). Moreover, our study, in line with Movement Disorder Society Task Force recommendations, defined values <26.4 as the MMSE cutoff for PDD, confirming that this test shows poor sensitivity in detecting MCI (Emre et al. 2007). Recently, the Montreal Cognitive Assessment (MoCA) was suggested as a suitable cognitive battery (AUC > 0.7) (Carlesimo 1996; Hoops et al. 2009). Although we did not use this test in our study, we found that our best discrimination tests for MCI reached similar sensitivity and specificity for both screening and diagnostic values.

Although our study was conducted with a large sample of PD patients, the number of PDD cases was relatively small in terms of statistical validity. We did not study a matched non-PD control group, but the aim of our study was not to compare cognitive abilities between healthy subjects and PD patients. Finally, language abilities were not adequately tested in our neuropsychological battery, and for this reason we could have underestimated PD-MCI or PDD frequency in our population.

In conclusion, we found that specific neuropsychological tests covering verbal memory impairment, attention/set-shifting, and visual-spatial deficits are the best predictors of MCI in PD if valid cutoff scores are used. This may help preclinical cognitive diagnosis, improve therapy selection, and be used to establish the rate of cognitive decline in prospective PD studies.