Introduction

Cognitive impairment is common in patients with Parkinson’s disease (PD) [1]. Its spectrum ranges from mild cognitive impairment (MCI) to PD dementia (PD-D), and PD-related MCI (PD-MCI) is a risk factor for PD-D [2]. PD-MCI may occur early in the course of PD [3] and frequently progresses rapidly to PD-D [4]. Recognizing PD-MCI is recommended because it predicts a diffuse/malignant PD subtype [5].

A Movement Disorder Society (MDS) task force has recently delineated diagnostic criteria for PD-MCI [6]. According to these criteria, a diagnosis of possible PD-MCI (level I) is based on an abbreviated assessment, whereas a level II diagnosis requires a comprehensive neuropsychological evaluation and permits subtyping of MCI (i.e., single-domain vs. multiple-domain). The MDS level II criteria stipulated at least two neuropsychological tests for each of the five cognitive domains (i.e., attention and working memory, executive function, language, memory, visuospatial function) and offered examples of tests for each domain, but acknowledged the need for additional studies on the optimal number and type of tests to be used [6].

Neuropsychological testing is the gold standard for objectively diagnosing PD-MCI, because patient and caregiver reports are often inaccurate [7]. However, the comprehensive neuropsychological batteries usually applied for PD-MCI can be time-consuming and burdensome for both patients and test administrators [8]. A quick and efficient test battery would therefore be helpful both in the clinical setting and for research on PD-MCI. A very recent study of English-speaking PD patients documented that two tests per domain provide a highly practical and robust diagnostic assessment, and proposed a 10-test battery, comprising two tests for each of the five cognitive domains, that proved highly sensitive and specific for PD-MCI diagnosis [8]. This 10-test battery, however, cannot be directly transferred to other PD populations, because of issues related to the validation and standardization of tests across different cultures and languages. In particular, data on which tests should be used in Italian-speaking patients are scarce [9]. Moreover, the most reliable neuropsychological battery may differ according to age, but this issue has not been consistently explored in PD-MCI.

To provide new information on the optimal neuropsychological battery for Italian PD patients, we retrospectively assessed data from a group of people with PD who had undergone cognitive testing with a comprehensive 23-test battery. The study aimed to derive the best-performing 10-test battery (i.e., two tests for each of the five cognitive domains), to explore its accuracy for diagnosing PD-MCI in comparison with the full battery, and to identify the most efficient tests in older patients. A secondary aim was to explore the role of this battery in subtyping PD-MCI according to single-domain vs. multiple-domain involvement.

Materials and methods

Subjects

We retrospectively evaluated data from 150 consecutive Italian PD patients who underwent cognitive testing. The study was carried out in accordance with the principles of the Declaration of Helsinki as revised in 2001. Inclusion criteria were as follows: (a) diagnosis of PD according to the UK PD Brain Bank Criteria [10]; (b) no PD-D [11]; (c) Italian as native language; (d) no other causes of cognitive impairment (e.g., delirium, stroke or cerebrovascular disease, head trauma, metabolic abnormalities, adverse effects of medication); (e) no other PD-related conditions (e.g., marked motor impairment, severe or unpredictable motor fluctuations and/or dyskinesia, severe anxiety, excessive daytime sleepiness, or psychosis) that could have significantly influenced cognitive testing [6, 12]. Depression was assessed with the Beck Depression Inventory II (BDI-II) [13], using a cut-off of 14 for the presence of any depression and a cut-off of 28 for severe depression [14]. Depression was not an exclusion criterion unless severe (i.e., patients with a BDI-II score >28 were excluded), because it affects around 35% of PD patients [15], and including patients with mild to moderate depression yields a more real-life scenario. The severity of PD motor symptoms was measured with the Modified Hoehn and Yahr staging scale and the Unified Parkinson’s Disease Rating Scale [16]. Total levodopa equivalent daily dose (LEDD) was calculated according to published conversion formulae [17].

After screening for inclusion criteria (Fig. 1), 79 patients (46 males, 33 females; age: average 70.4 ± 9.3 years, median 72, range 44–88; education: average 7.6 ± 3.2 years, median 8, range 4–17) were included in the study.

Fig. 1 Flow diagram of the study and reasons for patients’ exclusion

Neuropsychological assessment

All patients underwent the Mini Mental State Examination (MMSE) and our full 23-test cognitive evaluation, both performed by two expert neuropsychologists (AF, MT) with the patient in a stable ON state [12]. The cognitive evaluation included at least two neuropsychological tests for each of the following five cognitive domains [6]. Attention and working memory were explored with Digit Span Forward, a subtest of the Wechsler Memory Scale [18]; Interference Memory Task 10″ and 30″, based on the Brown-Peterson paradigm [19]; Attentional Matrices part I for selective attention and part II for divided attention [20]; and Trail Making Test (TMT) part A [21]. Executive functions were examined with TMT part B [21], the Frontal Assessment Battery [22], and the Phonemic Fluency Test, Clock Drawing Test, and Cognitive Estimation Test, three subtests of the Esame Neuropsicologico Breve 2 (ENB-2, Short Neuropsychological Examination version 2) [23]. Language was tested with the short form of the Boston Naming Test [24]; the Object Naming Test and Verb Naming Test, two subtests of the Esame Neuropsicologico per l’Afasia (ENPA, Neuropsychological Examination of Aphasia) [25]; and the Token Test, a subtest of the ENB-2 [23]. Memory was examined with Rey’s Auditory Verbal Learning Test, immediate and recall [26], and the Prose Memory Test, immediate and recall, two prose recall subtests of the ENB-2 [23]. Visuospatial function was explored with Benton’s Judgment of Line Orientation [27], the Clock Copying Test [8], the Geometrical Figures Copying Test, a subtest of the Mental Deterioration Battery [26], and the Intersecting Pentagons derived from the MMSE [12].

Diagnosis of PD-MCI

The diagnosis of PD-MCI was made according to the MDS Task Force level II criteria, which include: (a) gradual decline in cognitive ability, in the context of established PD, reported by the patient or an informant or observed by the clinician, and involving at least 1 item of the instrumental activities of daily living scale; (b) cognitive deficits not sufficient to interfere significantly with functional independence, although subtle difficulties on complex functional tasks may be present, as documented by a normal score on the basic activities of daily living scale; and (c) impairment on at least two neuropsychological tests, represented either by two impaired tests in one cognitive domain (single-domain PD-MCI) or by one impaired test in each of two or more different cognitive domains (multiple-domain PD-MCI) [6, 12]. Based on previous studies, impaired performance on a neuropsychological test was defined as a score at least 1.5 standard deviations (SDs) below the age-adjusted mean of normative data [9, 12, 28]. Correction for education was applied when available.
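Criterion (c), combined with the 1.5 SD cut-off, amounts to a simple decision rule. The sketch below assumes each test score has already been converted to a z-score against the adjusted norms; the function names and the `domain_of` mapping are hypothetical, and criteria (a) and (b) are deliberately left out because they depend on clinical history and daily-living scales rather than on test scores.

```python
from typing import Dict, List, Optional

# Impairment threshold: at least 1.5 SD below the adjusted normative mean.
IMPAIRMENT_Z = -1.5


def impaired_tests(z_scores: Dict[str, float]) -> List[str]:
    """Return the tests whose adjusted z-score is at or below the threshold."""
    return [test for test, z in z_scores.items() if z <= IMPAIRMENT_Z]


def classify_level2(z_scores: Dict[str, float],
                    domain_of: Dict[str, str]) -> Optional[str]:
    """Apply criterion (c) only: None, 'single-domain' or 'multiple-domain'."""
    impaired = impaired_tests(z_scores)
    if len(impaired) < 2:
        return None  # fewer than two impaired tests: criterion (c) not met
    domains = {domain_of[test] for test in impaired}
    # Impaired tests confined to one domain -> single-domain;
    # impaired tests spread over two or more domains -> multiple-domain.
    return "single-domain" if len(domains) == 1 else "multiple-domain"


# Usage with a hypothetical patient and a subset of the battery.
domain_of = {
    "TMT part B": "executive",
    "Phonemic Fluency Test": "executive",
    "Prose Memory Test immediate": "memory",
}
patient = {"TMT part B": -2.1, "Phonemic Fluency Test": -1.6,
           "Prose Memory Test immediate": -0.4}
print(classify_level2(patient, domain_of))  # single-domain
```

A score exactly at -1.5 is treated as impaired, matching the "at least 1.5 SDs below" wording.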

Statistical analysis

All statistical tests were carried out with the IBM SPSS statistical package, version 20.0. The normality of variable distributions was assessed with the skewness-kurtosis test. To compare demographic and clinical variables between patients with and without PD-MCI, the unpaired t test or the non-parametric Mann-Whitney U test was used for continuous variables, as appropriate, and Fisher’s exact test for dichotomous variables. To identify the two best-performing tests within each domain, least absolute shrinkage and selection operator (LASSO) logistic regression analysis was applied [8, 29]. We compared the 10-test battery, which included the two best tests within each domain according to the LASSO ranking, with the full 23-test neuropsychological battery (gold standard) using receiver operating characteristic (ROC) curve analysis, and calculated the area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and kappa value for diagnosing PD-MCI and for determining single- and multiple-domain impairment subtypes [8]. P < 0.05 (two-tailed) was taken as the significance threshold for all tests.
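The LASSO ranking step can be sketched as an L1-penalised logistic regression of the full-battery diagnosis on the scores of one domain's tests, followed by ranking tests by coefficient magnitude. This is a minimal NumPy sketch fitted by proximal gradient descent (ISTA), not the software or settings the authors used; it assumes one model per domain (the text does not say whether one model per domain or a single model over all 23 tests was fitted), and the penalty `lam`, step size, and iteration count are hypothetical defaults that would normally be tuned, e.g. by cross-validation.

```python
import numpy as np


def lasso_logistic(X, y, lam=0.05, lr=0.1, n_iter=5000):
    """L1-penalised logistic regression fitted by proximal gradient descent.

    X   : (n_patients, n_tests) scores for the tests of one domain.
    y   : (n_patients,) 1 = PD-MCI on the full battery, 0 = no PD-MCI.
    lam : L1 penalty strength (hypothetical default).
    Returns coefficients on column-standardised predictors; the intercept
    is fitted but neither penalised nor returned.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardise so coefficient magnitudes are comparable across tests
    # (assumes no test has zero variance).
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
        resid = prob - y                      # gradient of the mean log-loss
        intercept -= lr * resid.mean()
        step = beta - lr * (X.T @ resid) / n
        # Soft-thresholding: the proximal step for the L1 penalty.
        beta = np.sign(step) * np.maximum(np.abs(step) - lr * lam, 0.0)
    return beta


def top_two(test_names, beta):
    """Rank tests by absolute coefficient and keep the two largest."""
    order = np.argsort(-np.abs(beta))
    return [test_names[i] for i in order[:2]]
```

Ranking by absolute value is a design choice: with raw scores, lower values mean worse performance, so informative tests may carry negative coefficients; with binary "impaired" indicators as predictors, the coefficients would typically be positive.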

Results

Clinical data and the full neuropsychological battery results from the 79 PD patients were reviewed and, according to the MDS Task Force level II criteria [6], PD-MCI was diagnosed in 41 patients (52%). Six of the 41 PD-MCI patients (15%) were classified as single-domain MCI (executive function: N = 5, memory: N = 1), and the other 35 patients (85%) as multiple-domain MCI (two domains: N = 28, three domains: N = 5, four domains: N = 1, five domains: N = 1; attention and working memory: N = 19, executive function: N = 30, language: N = 2, memory: N = 23, visuospatial function: N = 2). Demographic and clinical variables according to the presence or absence of MCI are reported in Table 1. Education was significantly shorter in patients with MCI (6.9 ± 2.6 years) than in those without MCI (8.4 ± 3.5 years, p = 0.03; Table 1). PD motor signs were more severe in the MCI group (H-Y: 2.2 ± 0.8; UPDRS-III: 25.7 ± 8.6) than in patients without MCI (H-Y: 1.9 ± 0.8, p = 0.04; UPDRS-III: 19.5 ± 7.8, p = 0.01; Table 1). The other variables did not differ significantly between the two groups.

Table 1 Characteristics of PD patients, according to the diagnosis of PD-MCI (MDS Task Force level II criteria [6])

Within each domain, we identified the two best tests based on their LASSO regression coefficients (i.e., a larger coefficient reflects a higher rank; Table 2) and obtained a 10-test battery that included the following: (a) Interference Memory Task 30″ and Attentional Matrices part II (attention and working memory); (b) TMT part B and Phonemic Fluency Test (executive function); (c) Object Naming Test and Verb Naming Test (language); (d) Rey’s Auditory Verbal Learning Test immediate and Prose Memory Test immediate (memory); (e) Geometrical Figures Copying Test and Intersecting Pentagons (visuospatial function; Table 3). The AUC of the 10-test battery for PD-MCI diagnosis (full 23-test battery as gold standard) was 0.95 [95% confidence interval (CI): 0.90–1.00]. With a cut-off of ≥2 abnormal tests, the 10-test battery showed 73% sensitivity, 100% specificity, 100% PPV, and 78% NPV (kappa = 0.72 [95% CI: 0.58–0.87]) for diagnosing PD-MCI. With a cut-off of ≥1 abnormal test, the 10-test battery had 98% sensitivity, 71% specificity, 79% PPV, and 97% NPV (kappa = 0.74 [95% CI: 0.60–0.87]). For distinguishing single-domain from multiple-domain PD-MCI, the 10-test battery showed 69% sensitivity, 100% specificity, 100% PPV, and 35% NPV (kappa = 0.39 [95% CI: 0.15–0.63]; AUC = 0.87 [95% CI: 0.75–0.99]).
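The accuracy figures above all derive from a 2×2 table crossing the 10-test result with the full-battery diagnosis. A minimal sketch of the calculations follows, assuming Cohen’s kappa is the agreement statistic (the text says only "kappa"). The counts are hypothetical: they are not reported in the text, but they form one table consistent with 41 PD-MCI and 38 non-MCI patients and with the rounded ≥2 cut-off figures (73% sensitivity, 100% specificity, 100% PPV, 78% NPV, kappa 0.72).

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Agreement of the short battery (test) with the full battery (gold standard)."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals (Cohen's kappa, assumed).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "kappa": kappa}


# Hypothetical counts (not reported in the text): 41 PD-MCI, 38 without MCI.
metrics = diagnostic_metrics(tp=30, fp=0, fn=11, tn=38)
print({k: round(v, 2) for k, v in metrics.items()})
```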

Table 2 Characteristics of neuropsychological tests for the diagnosis of PD-MCI
Table 3 Best-performing neuropsychological tests for each cognitive domain

Because age may affect cognitive performance, and because normative correction may be difficult in elderly people for some of the tests that performed best in the overall PD population (e.g., TMT part B), the LASSO logistic regression analysis was repeated in older patients (i.e., age > 70 years). In this subgroup (N = 45; 26 PD-MCI patients according to the full 23-test battery, of whom 3 had single-domain and 23 multiple-domain MCI), the resulting 10-test battery was slightly different, with the following best-performing tests: (a) Attentional Matrices part II (LASSO regression coefficient: 0.87) and Interference Memory Task 30″ (0.68) for the attention and working memory domain; (b) Phonemic Fluency Test (0.51) and Frontal Assessment Battery (0.35) for the executive function domain; (c) Verb Naming Test (0.51) and Object Naming Test (0.49) for the language domain; (d) Prose Memory Test immediate (0.48) and Prose Memory Test recall (0.37) for the memory domain; and (e) Geometrical Figures Copying Test (1.09) and Intersecting Pentagons (0.24) for the visuospatial function domain (Table 3). Compared with the full 23-test battery as gold standard, this 10-test battery (cut-off ≥2) showed 84% sensitivity, 100% specificity, 100% PPV, and 79% NPV (kappa = 0.80 [95% CI: 0.63–0.97]; AUC = 0.93 [95% CI: 0.85–1.00]) for diagnosing PD-MCI, and 86% sensitivity, 100% specificity, 100% PPV, and 57% NPV (kappa = 0.66 [95% CI: 0.35–1.00]; AUC = 0.89 [95% CI: 0.77–1.00]) for distinguishing single-domain from multiple-domain PD-MCI.

Discussion

We explored the MDS task force criteria for level II diagnosis and subtyping of PD-MCI [6] in an Italian-speaking population, identified the best-performing tests within each of the five cognitive domains, and derived a 10-test battery that could quickly and efficiently diagnose PD-MCI and may help to subtype patients into single- or multiple-domain PD-MCI with reasonable reliability. We also found that a slightly different 10-test battery may be preferable in older PD patients.

The recent MDS task force diagnostic criteria for PD-MCI represent an important step towards better and earlier recognition of this condition, which may affect patients’ quality of life [30], is a risk factor for PD-D [2, 4], and predicts a more unfavourable PD course [5]. However, additional work is needed, because uncertainties about the application of level II criteria and the choice of neuropsychological tests may cause considerable variability in the proportion of patients diagnosed with PD-MCI (i.e., 33–79%) [28].

In our full neuropsychological battery, tests for attention and working memory (N = 6) and executive function (N = 5) outnumbered those for the other cognitive domains (N = 4 per domain). This slight imbalance reflects our interest in the domains that current literature indicates are more frequently involved in PD-MCI [31], but it is unlikely to have introduced bias because, in a previous study, the probability of documenting an impairment did not increase when more than two tests were applied per domain [8].

Based on the LASSO logistic regression analysis, we identified the two best-performing tests for each domain and derived a 10-test battery. Only two of these tests (i.e., TMT part B, Intersecting Pentagons) overlapped with those obtained in a very similar study that identified the ten best-performing tests out of a 19-test battery for diagnosing PD-MCI in English-speaking patients recruited in Chicago [8]. Reasons for this discrepancy include differences in the full cognitive batteries from which the tests were selected and issues related to test translation and validation across languages and cultures. Some of the tests we identified (i.e., TMT part B, Prose Memory Test immediate, Object Naming Test, Verb Naming Test) overlapped with those that proved to be good predictors of PD-MCI in a study exploring the sensitivity and specificity of different neuropsychological tests in Italian PD patients [9].

Our 10-test battery showed 73% sensitivity and 100% specificity with a cut-off of two or more abnormal tests, whereas sensitivity rose to 98% at the expense of reduced specificity (71%) with a cut-off of one or more. Based on these figures, and in line with the MDS task force diagnostic criteria [6], we suggest using the ≥2 cut-off. However, a single abnormal test might warrant follow-up, with repeat neuropsychological testing after some time. The prognostic significance of a single abnormal test should be explored in future prospective studies.
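The trade-off between the ≥2 and ≥1 cut-offs can be made concrete by treating the number of abnormal tests in the short battery as a score, thresholding it at each cut-off against the full-battery diagnosis, and computing the AUC as the probability that a randomly chosen PD-MCI patient has more abnormal tests than a randomly chosen non-MCI patient (ties counted as one half, i.e., the Mann-Whitney formulation). The ten patients below are hypothetical and serve only as an illustration; this is not the authors’ data or code.

```python
from typing import List, Tuple


def sens_spec_at_cutoff(counts: List[int], gold: List[bool],
                        cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when 'PD-MCI' means counts >= cutoff."""
    pairs = list(zip(counts, gold))
    tp = sum(1 for c, g in pairs if g and c >= cutoff)
    fn = sum(1 for c, g in pairs if g and c < cutoff)
    tn = sum(1 for c, g in pairs if not g and c < cutoff)
    fp = sum(1 for c, g in pairs if not g and c >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc_mann_whitney(counts: List[int], gold: List[bool]) -> float:
    """AUC as P(score of a PD-MCI patient > score of a non-MCI patient)."""
    pos = [c for c, g in zip(counts, gold) if g]
    neg = [c for c, g in zip(counts, gold) if not g]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical abnormal-test counts; gold = PD-MCI on the full battery.
counts = [0, 0, 0, 1, 1, 1, 2, 3, 4, 2]
gold = [False] * 5 + [True] * 5
for cutoff in (1, 2):
    print(cutoff, sens_spec_at_cutoff(counts, gold, cutoff))
print(auc_mann_whitney(counts, gold))
```

Lowering the cut-off can only add positives, so sensitivity cannot fall and specificity cannot rise, which is the pattern reported for the ≥1 vs. ≥2 cut-offs.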

Based on previous reports, impairment on a neuropsychological test was defined as a score at least 1.5 SD below the age- and education-adjusted normative mean [12, 28], which has been considered the best trade-off between type I and type II errors [9, 32]. The alternative level II criteria, namely significant decline on serial cognitive testing or decline from an estimated premorbid level [6], were not used in our study because of the difficulties and uncertainties in their application [12].
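To illustrate why both the 1.5 SD threshold and the number of tests matter for type I error, the sketch below computes how often a cognitively normal person would score at or below -1.5 SD on a single test, and on at least k of n tests. It rests on strong simplifying assumptions that real batteries violate (normally distributed scores and independent tests, whereas real test scores are correlated), so the numbers are only a rough, hypothetical illustration.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_at_least_k_impaired(n_tests: int, k: int, z_cut: float = -1.5) -> float:
    """P(at least k of n independent tests fall at or below z_cut) in a normal population."""
    p = normal_cdf(z_cut)
    return sum(math.comb(n_tests, j) * p ** j * (1.0 - p) ** (n_tests - j)
               for j in range(k, n_tests + 1))


p_single = normal_cdf(-1.5)                    # about 0.067
p_one_of_ten = prob_at_least_k_impaired(10, 1)
p_two_of_ten = prob_at_least_k_impaired(10, 2)
```

Under these assumptions, roughly half of normal individuals would show at least one abnormal test out of ten, but only about 14% would show two or more, which is one way to see why a ≥2 rule is more conservative than a ≥1 rule.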

For subtyping PD-MCI, the 10-test battery showed 100% specificity but only 69% sensitivity and 35% NPV. However, the large imbalance between multiple-domain (85%) and single-domain (15%) PD-MCI patients might have influenced these figures, which is a limitation of the present report. Future studies on larger samples are needed to confirm our results.

The estimated administration time of our 10-test battery is around 30 min, which, in the clinical setting, represents a good compromise between the time required by the examiner, the burden on the patient, and the risk that motor fluctuations might occur and bias testing.

We examined the best-performing tests in older (i.e., age > 70 years) patients and derived a slightly different battery that included the Frontal Assessment Battery and Prose Memory Test recall instead of TMT part B and Rey’s Auditory Verbal Learning Test immediate. The resulting 10-test battery (estimated administration time around 30 min) showed 84% sensitivity and 100% specificity for PD-MCI diagnosis (cut-off ≥2), and 86% sensitivity and 100% specificity for separating single-domain from multiple-domain PD-MCI. These figures are consistent with an influence of age on cognitive test performance, a topic that has seldom been explored in PD patients [33] and that warrants further study.

Limitations of the present study include its retrospective design and the absence of follow-up data. Longitudinal studies have shown that PD-MCI patients may progress to PD-D, remain stable, or revert to normal cognition [34, 35]. Reasons for reversion include comorbidities, measurement errors, learning effects due to repeated neuropsychological testing, improved cognition after initiation of symptomatic treatment, suboptimal treatment of motor symptoms at the time of first testing, motor fluctuations, psychiatric symptoms, and drug side effects [12, 34]. These limitations should be considered in future prospective studies aimed at assessing the test-retest reliability of our 10-test battery and its prognostic significance for conversion to PD-D.