Introduction

It is now well accepted that patients with Alzheimer pathology manifest subtle cognitive complaints and deficits for many years prior to the onset of dementia proper [44]. There is also good evidence that the predominant early deficit is in the domain of episodic memory [8, 42] although controversy exists as to whether other impairments, notably involving attention/executive function and semantic memory, are also consistently present early in the course of the disease [29].

The label of mild cognitive impairment (MCI) has emerged as the preferred diagnostic term to designate subjects in this early pre-dementia state [33]. Initially MCI was applied to patients with an amnestic disorder but otherwise preserved cognitive ability and intact activities of daily living (ADL). Recent studies employing more demanding neuropsychological tests have shown, however, that many patients presenting with purely memory complaints, who do not fulfill criteria for dementia on ADL grounds, have cognitive deficits beyond episodic memory. This has led to an expansion of the concept of MCI to embrace so-called multi-domain MCI (mdMCI) [30, 43, 44]. In addition, there is a subset of Memory Clinic attendees who complain of “poor memory” as a convenient self-report label but, in fact, demonstrate non-amnestic deficits on neuropsychological testing; such patients have been said to have non-amnestic MCI (naMCI).

There has been considerable interest in prediction of progression to dementia in MCI. Estimates have varied considerably from 6 to 45% per annum [1, 5, 6, 9, 11, 14, 17, 18, 21–23, 33, 36, 40, 41]. This literature is confounded by the fact that the definition of MCI has varied considerably across the studies; many earlier investigations combined patients with what would now be termed pure aMCI and mdMCI; furthermore, there was significant variability in the number of subjects per study and follow-up duration. A wide range of predictive variables has been explored and there seems little doubt that structural MRI measures of medial temporal lobe structures, PET measures of cerebral metabolism, CSF biomarkers and ApoE genotyping are all important contributors for use in research centres [7, 10, 27] and will become increasingly important as disease-modifying treatments become available. These measures are, however, not universally available in routine clinical practice. We were interested in the predictive value of simple clinical measures, plus neuropsychological assessment, in patients with subtypes of MCI presenting to a busy memory clinic. An earlier study of patients labelled as questionable dementia (a category that combined patients who would now be labelled aMCI, mdMCI and naMCI) found that errors on the paired associates learning (PAL) test from the CANTAB battery, combined with scores on the Graded Naming Test (GNT), were predictive of patients destined to develop AD [4, 39]. A subsequent study also suggested that the Addenbrooke’s Cognitive Examination (ACE) had a major predictive value and was superior to more focused neuropsychological tasks which evaluate a single cognitive domain [16].

The aim of the current study was to investigate outcome after 2 years in a group of non-demented, non-depressed memory clinic attendees whom we had previously sub-categorized [2]. Based upon the emergent literature, we hypothesized that the mdMCI group would have the highest rate of short-term progression to AD. We thought that a proportion of the naMCI group might declare alternative diagnoses on follow-up and that some of the worried well might progress to MCI. A second aim was to look at the predictive values of various simple cognitive tests, notably the PAL, GNT and ACE, since these had previously been shown to have good predictive value.

Method

The Cambridge MCI study aims to investigate the cognitive profiles and outcome of MCI. A total of 166 participants were included in the study, consisting of consecutive referrals to the Cambridge Memory Clinic between June 2003 and March 2005 from General Practitioners in the Cambridge area in whom the referral letter suggested the possibility of early dementia. Over the same time period 150 referrals from other specialists were received but were not included in the study, since these patients had (in general) established dementia or neuropsychiatric syndromes. All participants were aged 50 or over, had an informant, and were examined by an experienced behavioural neurologist (PJN or JRH). The Mini Mental State Examination (MMSE) [15] and Addenbrooke’s Cognitive Examination (ACE) [3, 12, 24] were used to assess general cognitive status. Impact on everyday activities was evaluated using the Clinical Dementia Rating Scale (CDR) [26]. Depression and anxiety were scored using the Hospital Anxiety and Depression Scale (HADS) and the Geriatric Depression Scale (GDS) [38]. Patients were investigated with a standard battery of screening blood tests and brain imaging (CT scan or MRI). Patients with established dementia (DSM-IV), significant depression (clinical judgement or HADS >14) or other medical conditions such as alcoholism, stroke, epilepsy or head injury were excluded. The study was approved by the Local Research Ethics Committee. Control data were obtained from 30 age- and education-matched normal volunteers drawn from the MRC-Cognition and Brain Sciences Unit subject panel. All 124 non-demented and non-depressed patients were invited for annual follow-up. Of the 124, 107 patients were assessed over a 2-year period: 13 refused, two moved away from the area and two were uncontactable. All 107 had a repeat of the neuropsychological evaluation and were seen for a clinical assessment by a senior neurologist (JRH or PJN).
Baseline demographic and neuropsychological performance of the 107 subjects included in the follow-up study is shown in Table 1. There were no significant intergroup differences for age, sex distribution or years of education.

Table 1 Demographics and neuropsychological test scores of all 107 patients followed up

Neuropsychological assessment

Four cognitive domains were assessed: episodic memory, language and semantic memory, attention and executive functioning, and visuospatial skills. We selected tests that are widely used in routine neuropsychological practice and are sensitive to early deficits in these cognitive domains.

1. Episodic memory

   (a) Rey Auditory Verbal Learning Test (RAVLT) [37]

The RAVLT was administered in the standard manner which consists of five learning trials of a 15-word list (with the subject asked to repeat back as many items as possible after each trial). A distracter list is then presented once, after which the subject is asked to recall as many items as possible from the original list (immediate recall). Delayed recall of the same list is assessed after 30 minutes and then, lastly, recognition is measured through identification of the 15 original target words from a list containing 35 foils.

   (b) Rey complex figure test [35]

Subjects were asked to copy this figure freehand, and without time restriction. After an interval of 30 minutes, subjects were asked (without warning) to reproduce from memory the figure which they had copied.

   (c) Paired Associates Learning (PAL)

Subjects were administered a modified and shortened version of the PAL from the CANTAB battery [4, 39]. This test is given in two phases. In the first, introductory phase, six white boxes appear on a touch-sensitive computer screen. Each box “opens” and “closes” in a random sequence, revealing in three of them three different simple coloured patterns. Once all boxes have opened and closed, the patterns are presented in random order in the centre of the screen and the subject touches the box in which he or she remembers each pattern appearing. Up to ten attempts are allowed to achieve all three correct. As soon as success is achieved, the main test phase starts in which all six boxes have different patterns and again the subject has up to ten attempts to remember which pattern appeared in which box. The final scores include number of trials to success in each phase and number of pattern-position errors in each and both phases.

2. Semantic memory

   (a) Category fluency

Subjects were asked to produce as many different category exemplars as possible in one minute, from the category ‘animals’.

   (b) Naming

Subjects were asked to name the 30 line drawings from the Graded Naming Test described by McKenna and Warrington [25].

3. Attentional-executive functioning

   (a) Trail Making Test A and B [34]

Subjects were instructed to sequentially connect 25 circles on a sheet that contained the numbers 1 through 25 in Part A, and the numbers 1 through 13 and the letters A through L in Part B. Part A required that individuals connect the circles in ascending sequence from 1 through 25. Part B required that individuals connect the circles in an ascending sequence that alternated between numbers and letters (1, A, 2, B etc.). The time in seconds required to complete each part was recorded separately.

   (b) Letter fluency

Subjects were asked to produce as many words as possible in 1 min that begin with the letter P.

4. Visuospatial skills

   (a) Copy of the Rey complex figure

See above.

MCI and worried well definitions

1. Pure amnestic MCI (aMCI)

Of 124 non-demented, non-depressed patients, 26 fulfilled the following criteria (modified from [19, 31]): (1) memory complaint corroborated by an informant; (2) abnormal memory function documented by either impaired total learning across the five trials, immediate or delayed recall of the RAVLT and/or impaired recall of the Rey complex figure, using a 10th percentile cut-off based on controls; (3) normal general cognitive function as determined by a battery of neuropsychological tests designed to probe semantic memory, attention-executive functions and visuospatial ability (see above); (4) normal or minimally impaired ADLs, as determined by a clinician interview with the patient and their informant, and a CDR score of 0.5; and (5) not sufficiently impaired, cognitively or functionally, to meet NINCDS-ADRDA criteria for probable AD, and MMSE ≥24.

2. Multi-domain MCI

Of 124 non-demented, non-depressed patients, 64 fulfilled the above criteria but, in addition, performed below the 10th percentile on one or more non-memory tests in the battery. Of note is the fact that all fulfilled criteria (4) from above with preserved ADL and a CDR score of 0.5 and had a MMSE ≥24.

3. Non-amnestic MCI

Of 124 non-demented, non-depressed patients, 12 had intact memory as defined by their performance on the RAVLT or delayed Rey figure recall but performed below the 10th percentile on one or more non-memory tests.

4. Worried Well

Of the 124 non-demented non-depressed patients, 22 performed normally on all tests.
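The four-way grouping above reduces to two yes/no questions per patient. A minimal sketch in Python, assuming the patient has already passed the exclusion criteria and that "impaired" means a score below the 10th percentile of controls on at least one test in the domain; the function name and boolean inputs are illustrative and not part of the study protocol:

```python
def classify_subgroup(memory_impaired: bool, non_memory_impaired: bool) -> str:
    """Map the two impairment flags onto the four study groups."""
    if memory_impaired and non_memory_impaired:
        return "mdMCI"   # memory plus at least one other domain
    if memory_impaired:
        return "aMCI"    # memory only
    if non_memory_impaired:
        return "naMCI"   # non-memory domain(s) only
    return "WW"          # normal on all tests


# Group sizes reported above; together they should account for all 124 patients.
baseline_counts = {"aMCI": 26, "mdMCI": 64, "naMCI": 12, "WW": 22}
total_patients = sum(baseline_counts.values())
```

The sketch leaves out the clinical criteria (informant corroboration, CDR, MMSE), which require clinician judgement rather than test scores alone.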

Statistics

Paired sample tests were used to assess the performance of each patient at baseline and at follow-up. Group means were compared using one-way ANOVA. Sensitivity and specificity calculations, MANOVA and stepwise discriminant analysis were used to predict the group outcome. Individual decline was measured by calculating z scores based upon published norms for the neuropsychological tests. All statistical analyses were carried out using SPSS 13.0 for Windows and Microsoft Excel.
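As an illustration of the z-score step, a minimal sketch follows. The norm values in the usage line are hypothetical placeholders, not the published norms used in the study, and the function name is illustrative.

```python
def z_score(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Express a raw test score in standard deviations from the normative mean."""
    if norm_sd <= 0:
        raise ValueError("normative SD must be positive")
    return (raw_score - norm_mean) / norm_sd


# Hypothetical norms (mean 30, SD 5): a raw score of 20 lies 2 SD below the mean.
example_z = z_score(20, 30, 5)
```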

Results

Group outcome

Patients were deemed to have converted to AD if they showed cognitive decline as defined by a fall on the MMSE to <24 with additional non-amnestic deficits, and a CDR score of >0.5, sufficient to interfere with everyday life and a pattern compatible with a diagnosis of AD. Improvers were the patients who on re-testing scored above the 10th percentile on all objective tests of memory and other cognitive domains.

There were striking differences in outcome according to the patient’s initial classification.

As shown in Fig. 1, of 54 patients with mdMCI, 32 progressed to dementia (~59%), of whom 31 (~57%) met criteria for AD. Given the period of follow-up (24 months) this approximates to a 30% p.a. conversion rate. Only 3 (~6%) improved and the remaining 19 (~35%) were still classified as mdMCI. By contrast, of the 22 pure aMCI patients, 5 (~23%) acquired an organic diagnosis, with 4 (~18%) developing AD, an annual conversion rate of only 9%. One was diagnosed with semantic dementia and another developed a clear affective disorder that would account for the memory impairment. Of note is the fact that 9 (41%) aMCI patients improved such that they no longer fulfilled criteria for aMCI on the basis of the RAVLT and performed normally on all other tasks; 7 (~32%) remained stable. Of 10 non-amnestic MCI patients, 1 developed dementia with Lewy bodies (10%) and 7 (70%) improved. Of the 21 worried well (WW), no patients progressed to AD, 3 declared other diagnoses (1 Parkinson’s disease, 2 seizure disorders) and 1 (~5%) progressed to aMCI.
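The annual figures quoted above are consistent with dividing the 2-year proportion by the length of follow-up. The sketch below assumes that simple linear annualisation; the text does not state how the rates were derived, and the function name is illustrative.

```python
def annual_conversion_rate(converters: int, group_size: int, years: float) -> float:
    """Linear annualisation: cumulative proportion divided by years of follow-up."""
    if group_size <= 0 or years <= 0:
        raise ValueError("group_size and years must be positive")
    return converters / group_size / years


# Counts of AD converters taken from the paragraph above.
md_rate = annual_conversion_rate(31, 54, 2)  # mdMCI
a_rate = annual_conversion_rate(4, 22, 2)    # pure aMCI
print(f"mdMCI: {md_rate:.1%} p.a., aMCI: {a_rate:.1%} p.a.")
```

Under this assumption the mdMCI rate comes out just under 30% per annum and the aMCI rate at about 9%, in line with the rounded values in the text.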

Fig. 1 Patient outcome after 2 years according to initial classification

To explore the heterogeneous outcome of the aMCI group we compared the baseline performance of those who converted to AD (n = 4) with the subgroups who remained MCI (n = 7) and who improved (n = 9). As shown in Table 2, there was a graded difference between the groups, with the converters clearly performing much worse on all memory measures (RAVLT delayed recall, Rey figure recall and errors on PAL), indicating more marked amnesia at baseline in the converters. Another potential marker of amnesia is the consistency of performance across memory tests: it was possible to be included in the aMCI group by virtue of impairment on a single measure on the RAVLT or Rey figure recall, or alternatively based on impairment on all memory indices. Examination of individual profiles revealed that those who converted typically failed on 4 or 5 components, whereas the improvers usually failed on a single task with a score just below the prescribed cut-off.

Table 2 Baseline neuropsychological scores of the aMCI according to classification at 2 years

Predictors of conversion to AD in MCI

To examine which variables predicted progression to dementia (combining across the MCI subgroups) we entered all of the neuropsychology scores into a multivariate analysis (MANOVA). Tests were then selected that discriminated between more than one pair of outcome groups (i.e. converted to AD, remained MCI or improved). The RAVLT and the Rey Figure recall discriminated between all possible outcome pairs, which is not surprising since these were used to classify cases initially. These tests were not used in further analysis. The ACE, PAL errors at the 6-pattern stage, Trails B time, Graded Naming and Animal Category Fluency scores were entered into a discriminant analysis. The relationship between the tests and the group outcome indicated that 72.7% were correctly placed into the converted group and 81.3% into the improved group, with the ACE and the 6-pattern stage of the PAL making the greatest contribution to group outcome.

Sensitivity, specificity, positive and negative predictive values for conversion to AD were calculated for each of the tests using as a cut-off a score at least 2SD below that of a control group. As shown in Table 3, the ACE and PAL had the best overall scores. Other tests such as the GNT, fluency measures and Trails had very high specificity but low sensitivity.

Table 3 Sensitivity and specificity calculations at 2SD below control data for each neuropsychology test

Stepwise discriminant analysis confirmed these findings. The ACE and PAL together accounted for 99.3% of the variance. Overall 87.5% were correctly identified as improvers and 66.7% as converters. Eighty patients completed both the PAL and the ACE at the initial baseline assessment and had a known outcome at 2 years. Based on their baseline scores, 60 fell in the high risk category (a score of 88 or less on the ACE and/or 14 or more errors on the PAL 6-pattern stage [2, 4, 39]) and 20 fell in the low risk category. As shown in Fig. 2, of the 60 in the high risk group, a very small proportion (5%) improved and were no longer thought to have MCI, 55% converted and 40% remained as MCI. Conversely, of the 20 in the low risk group, only 2 (10%) converted, whereas 75% improved and 15% remained classified as MCI. In terms of concordance between diagnostic classification (mdMCI, aMCI and naMCI) and assessed risk, it is interesting to note that 48 of 51 (~94%) mdMCI patients fell in the high risk group, compared to 7 of 14 aMCI (50%) and 5 of 15 naMCI (33%).
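The risk rule described above can be written as a two-line check. This is a sketch of the stated cut-offs only (an ACE total of 88 or less, or 14 or more errors at the PAL 6-pattern stage); the function and argument names are illustrative.

```python
def risk_category(ace_total: int, pal_6_errors: int) -> str:
    """Return 'high' if either cut-off is met, otherwise 'low'."""
    if ace_total <= 88 or pal_6_errors >= 14:
        return "high"
    return "low"
```

Under this rule a patient is placed in the low risk category only when both scores fall on the favourable side of their cut-offs.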

Fig. 2 The proportion of patients in the high and low risk groups that had converted to AD, improved or remained MCI but showed decline in their cognitive assessment

To analyze further the performances of these two tasks, sensitivity and specificity were calculated with the patients dichotomized initially as converters versus non-converters (including those who improved according to predicted risk). As shown in Table 4, the sensitivity to conversion was extremely high (94%) but specificity was lower (40%) since many of the high risk group may yet convert. In practical terms, the negative predictive value (NPV) is probably the most important: of 20 at low risk only 2 (10%) have progressed, a NPV of 90%.

Table 4 Sensitivity and specificity of the ACE and PAL test when patients are grouped according to conversion to AD

When dichotomized as converters/decliners versus improvers a different picture emerges (Table 5). Sensitivity was again very high at 92% but specificity considerably better at 83%. Thus the algorithm is extremely good at predicting those who will improve over the next 2 years.
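The figures in this and the preceding paragraph can be reproduced from a 2 × 2 table. The sketch below assumes patient counts back-calculated from the percentages given for the high risk (n = 60) and low risk (n = 20) groups; the counts and all names are derived for illustration rather than taken from Tables 4 and 5.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Back-calculated counts (assumption): high risk 33 converted, 24 remained MCI,
# 3 improved; low risk 2 converted, 3 remained MCI, 15 improved.
high = {"converted": 33, "remained": 24, "improved": 3}
low = {"converted": 2, "remained": 3, "improved": 15}

# Converters versus non-converters: a "positive" prediction means high risk.
conversion = diagnostic_metrics(
    tp=high["converted"],
    fp=high["remained"] + high["improved"],
    fn=low["converted"],
    tn=low["remained"] + low["improved"],
)

# Converters/decliners versus improvers.
decline = diagnostic_metrics(
    tp=high["converted"] + high["remained"],
    fp=high["improved"],
    fn=low["converted"] + low["remained"],
    tn=low["improved"],
)
```

With these counts the conversion split gives roughly 94% sensitivity, 40% specificity and a 90% NPV, and the improver split roughly 92% sensitivity and 83% specificity, matching the values reported in the text.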

Table 5 Predictive outcome when patients are grouped according to whether they had improved or otherwise

As a comparison we used an algorithm devised by Blackwell et al. [4] based upon the PAL and the Graded Naming test to predict low or high risk of converting to AD. This produced an excellent specificity at 100% but sensitivity was rather low at 67%. Positive and negative predictive values were 100% and 39%, respectively.

Worried well

To look at the possibility that some of the apparent WW subjects may be at an even earlier stage of organic brain disease, we looked at changes in their neuropsychological profile over 2 years. Of the 21 patients, one progressed to meet MCI criteria and three declared other diagnoses. Thus, 17 remained in the “Worried Well” category with no significant decline in any of the neuropsychology measures between the baseline assessment and the 2-year follow-up, as shown in Table 6.

Table 6 Neuropsychology scores of the WW group at baseline and at 2 years follow up

Discussion

Our findings confirm that patients with subtypes of MCI have radically different outcomes. Those with mdMCI have an almost 60% likelihood of progressing to Alzheimer type dementia within 2 years. By contrast, the rate of conversion to AD in those with pure aMCI was only 18%, with a greater proportion reverting to normal (41%). Our naMCI group was small but only 1 of the 10 has developed a clear-cut neurological disorder. Within the WW group 3 of 21 received an alternative diagnosis (1 Parkinson’s disease, 2 epilepsy) and one has progressed to MCI. Considering the MCI group as a whole, we found that a combination of the PAL (errors at the 6-pattern stage) and the total ACE score was highly predictive of status after 2 years. Those at low risk (ACE >88 and fewer than 14 PAL errors) have a low likelihood of declining to meet AD criteria within 2 years, with a NPV of 90%, whereas the high risk group had little likelihood of improvement and a PPV for conversion of 55%. Many of these are likely to progress at a later date.

Current estimates of conversion of MCI to dementia range from 6 to 38% p.a. [1, 5, 6, 9, 11, 14, 17, 18, 21–23, 33, 36, 40, 41]. It should be noted, however, that a number of older studies used more clinically based definitions based on the GDS or CDR and precede the formulation of the now widely used Petersen criteria [19]. Another critical difference from the current study is the lack of distinction between patients with additional non-amnestic deficits (mdMCI) and those with pure aMCI. The findings of our study were very similar to those of Tabert et al. [40], who found a 50% conversion rate over 3 years in 64 patients with what they term “amnestic-plus” MCI compared to only 10% of their pure amnestic patients. Together these studies suggest that multi-domain or amnestic-plus MCI represents a very substantial risk state. Patients with this disorder very rarely (~6%) improve, over half progress within 2–3 years and the remainder are likely to convert later [20]. These facts raise the issue of whether such patients should not simply be designated as early or mild AD, rather than given the artificial and contradictory label of mdMCI. It could be argued that such patients already meet criteria for dementia, having cognitive dysfunction beyond the domain of episodic memory despite their well preserved ADLs and normal MMSE scores. Current trends have dictated the use of the label mdMCI, which we find rather unsatisfactory and disingenuous. In early studies from Cambridge such patients were diagnosed as minimal AD and shown to have a very high rate of progression [28]: a very long-term follow-up study of 10 patients confirmed progression to frank dementia in all 10, with pathological confirmation of Alzheimer’s disease in all those (n = 5) coming to post mortem [20]. From a scientific perspective there can be little doubt that a designation of early AD is more satisfactory.
With the advent of more effective disease-modifying therapy it will also be important to identify patients with AD before the onset of frank dementia. From the position of patients and family members the situation is more contentious. It could be argued that the use of the term mdMCI spares unnecessary distress in those not destined to convert within 2–3 years. Yet all live under a cloud with the threat of dementia hanging over them.

Following earlier studies, which involved rather small groups of subjects, we confirmed that a combination of a memory test and a global measure (errors at the 6-pattern stage of the PAL test and the total ACE score) was highly predictive of outcome [4, 16, 39]. AD is defined, cognitively, by the combination of memory impairment and additional non-memory deficits; while memory impairment is mandatory to diagnosis, the precise type of non-memory deficits (visuospatial, language etc.) may vary between individual patients. We propose that the predictive strength of an algorithm that combines a targeted memory test with a global measure lies in its ability to capture both the universal (episodic memory) and the heterogeneous (non-memory) impairments that make up the disease. Furthermore, we speculate that the specific utility of the 6-pattern error score from the PAL lies in its relative freedom from the floor effects that often confound measures such as delayed free recall. The two measures were particularly valuable in predicting those subjects with a good outcome after 2 years, with a NPV of 90% (18 of 20 correctly predicted). This means that regardless of symptoms and performance on other tasks, patients with a score of >88 on the ACE and/or <14 errors on the PAL can be reassured confidently. As in other studies, MCI was defined on the basis of a combination of symptom profile, informant report and underperformance on a verbal and/or non-verbal memory task (the RAVLT and Rey figure recall). The fact that a substantial percentage of patients in the aMCI group improved suggests that such tests are oversensitive and vulnerable to the effects of anxiety, mood disturbance and concurrent medical problems, all of which are common in the setting of memory clinics.

The PAL test of associative learning has a 3-pattern stage which is passed with ease and essentially acts to accustom subjects to using a touch screen and to boost confidence. The 6-pattern stage is more demanding and was designed to be sensitive to hippocampal dysfunction and hence the earliest stage of AD pathology [4, 39]. The ACE, by contrast, assesses a wide range of cognitive abilities and is sensitive to episodic and semantic memory as well as impairment in executive and visuospatial skills [12, 16]. The sensitivity of the ACE to early AD reflects the heterogeneity of deficits found in more detailed investigations.

Turning to the pure aMCI group we found a low rate of progression over 2 years (18%) with a greater proportion (41%) reverting to normal. All of these patients qualified for a diagnosis of MCI by virtue of a score on a component of the RAVLT or recall of the complex Rey figure (below 2 SDs of normal) but in many patients this was a marginal impairment and their performance on the PAL fell within the normal range. This again emphasizes the oversensitive nature of some neuropsychology tasks. In contrast to subjects with mdMCI, those with pure aMCI should be given an optimistic prognosis. We would also argue that criteria for aMCI should be refined to require impairment on at least two tests of episodic memory or on a more discriminating test such as the PAL. It should also be noted that, compared to mdMCI, pure aMCI is a rare disorder. Our initial cohort of 166 patients represented consecutive GP referrals over a 2-year period from a total of more than 300 assessed in the memory clinic (the remainder being specialist referrals with a range of neuropsychiatric and neurodegenerative disorders). From this cohort we diagnosed pure aMCI in 26 patients, of whom 22 were available for follow-up. By contrast, mdMCI is far more frequently encountered. An important reason for designating patients as MCI is to define enriched cohorts of patients with minimal impairment who are, nevertheless, at high risk of short-term decline, for inclusion in disease-modifying therapy trials in AD [32]. The rationale for this approach is clear: by including patients who are highly likely to decline in the short term, researchers maximize the chance of detecting a therapeutic signal. This is especially relevant for disease-modifying, as opposed to symptomatic, trials where failure to decline rather than symptomatic improvement is the likely measure of pharmacological efficacy.
The current results suggest that pure aMCI (as presently defined) is not a desirable group to target for such trials given their scarcity and that ~73% of such cases either improved or, at least, failed to decline significantly over 2 years.

Our cohort contained only 10 naMCI patients who were available for follow-up. Any conclusions are therefore speculative. Nevertheless, over two-thirds (7 of 10) improved and only one developed a dementia; this patient had clinical features of DLB. Other recent studies have shown a more substantial rate of progression but lower than that found in MCI [13, 22]. The group designated WW were also of potential interest. All complained of episodic memory problems with substantiation of change by family or friends. They did not, by definition, meet criteria for MCI at baseline and lacked a psychiatric diagnosis. Given that they were of approximately the same age (mean 64.4 years, see Table 1) as our MCI groups, we speculated that at least a proportion might be at an even earlier stage of pathology. At 2 years all continued to complain of memory difficulties but only one met criteria for MCI on re-evaluation. By contrast, three received other organic diagnoses: one Parkinson’s disease and two a seizure disorder, suggesting their memory complaints represented an ill-defined awareness of prodromal cognitive dysfunction. Moreover, as a group there was no significant decline in any measures, including even the PAL test. It remains possible that a few will develop clearer-cut MCI but in many we suspect the complaint reflects a personality and cognitive style rather than a progressive disorder. Limitations of the present study are the relatively short period of follow-up and the lack of comparable imaging and biomarker information. It would be of considerable interest to confirm our findings in a larger cohort with parallel imaging and biomarkers and, ideally, pathological confirmation of diagnosis.

In conclusion, over the 2-year follow-up period we observed a progression to frank dementia in almost 60% of those with mdMCI, which argues in favour of reverting to a definition of minimal or mild AD. Patients in this group would appear to be the logical target population for trials of disease-modifying therapy given both their abundance and their high likelihood of short-term decline in a placebo arm. In our experience aMCI is rare and unstable if reliance is placed upon a single memory test to make the diagnosis. Pure aMCI is not a group to be recommended for clinical trials given its scarcity and variable outcome. Overall, a combination of the ACE and the PAL is highly effective at predicting progression and detecting those with a benign outcome. Patients scoring >88 on the ACE and/or <14 errors on the PAL can be confidently reassured of a low risk of dementia. naMCI appears a fragile concept and most WW patients remain well after 2 years.