Introduction

A specific pattern of episodic memory impairment, i.e., “amnesia of the hippocampal type”, is the core clinical criterion for the early diagnosis of Alzheimer’s disease (AD) [1, 2]. This amnestic profile, which is characterized by poor learning ability and rapid memory decay over a relatively short period of time, is typically associated with early involvement of the medial temporal lobe (MTL) [3, 4], which is the primary anatomic substrate for memory trace consolidation and storage [5]. Recent research has revealed that this characteristic pattern of amnesia is already recognizable in patients with amnestic Mild Cognitive Impairment (a-MCI) who are destined to convert to AD dementia years before the first clinical manifestation of dementia [6,7,8]. In fact, similar to what occurs in AD dementia, the amnestic profile of converter a-MCI patients is characterized by poor learning and diminished delayed free recall ability.

The use of testing paradigms that support memory performance by presenting specific cues at both encoding and retrieval does not improve or only marginally improves the free recall deficits of both AD and converter a-MCI [7, 9,10,11]. This kind of memory impairment, which mirrors genuine deficits in consolidation and storage of episodic memory traces, is in line with neuroanatomical findings showing significantly more marked grey matter reduction that predominantly affects the MTL in converter a-MCI patients [4, 12, 13]. On the other hand, a-MCI patients who will not convert to AD dementia typically present a well-preserved capacity to learn new information, but their memory performance is affected because of deficits in encoding and retrieval skills such as reduced attentional resources, lack of planning, organization or response monitoring [2, 14]. Considering the variability of clinical outcomes over time in non-converter a-MCI patients, various factors can account for the pattern of “dysexecutive amnesia” disclosed by this type of patient. Indeed, impairments in elaborative encoding and strategic retrieval processes have been documented in a-MCI patients who remain stable for various non-amnestic reasons (anxiety, depression, adverse effects of medication, other geriatric or non-neurological conditions, etc.) [2, 14]. Accordingly, since their memory deficit is not underlain by a pure deficit in consolidation and storage subtended by MTL involvement, the use of controlled learning strategies or cueing has been shown to induce improvement or even normalization of their memory performance [7, 8, 11, 14].

In light of the foregoing, it would be useful to provide additional qualitative analyses of patients’ memory impairments to determine the potential occurrence of the typical pattern of MTL amnesia and, therefore, to improve the ability to predict progression to AD dementia from the prodromal stage.

In neuropsychological practice, recall and recognition are two commonly used procedures that allow evaluating verbal long-term memory ability differently by manipulating the amount of information provided during memory retrieval [15]. Specifically, in free recall tests subjects must produce studied items without any external retrieval cues. By contrast, recognition tests present stimuli that serve as cues for retrieving the studied items [16]. It has been documented that recognition procedures are useful for discriminating “pure-amnestic” syndromes, like those of AD patients, from other neurological or psychiatric conditions characterized by memory deficits that are subtended by impaired encoding or retrieval strategies [17,18,19,20,21,22,23]. For example, frontotemporal dementia (FTD) patients, whose memory performance is typically characterized by normal memory consolidation but poor organization and lack of efficient learning strategies due to the earlier degeneration of the prefrontal cortex, have been found to show significant deficits on measures of delayed free recall [24, 25]. However, a different picture emerges for performance on recognition trials: FTD patients tend to perform better than AD patients [24] and sometimes show no impairment compared to healthy controls [9, 26, 27]. This difference between FTD and AD patients on recognition compared to free recall is generally attributed to the cue, which enables FTD patients to overcome retrieval problems; on the other hand, the diminished cueing-related improvement in AD patients reflects deficits in storage caused by impaired consolidation of studied items.

It has been demonstrated that poor recognition discriminability is also present in patients with a-MCI [28,29,30], albeit with less impairment compared to AD patients, and that this test could be a useful measure for distinguishing a-MCI patients from healthy older adults [31]. In fact, recognition performance declines less than recall in normally aging individuals [15, 32, 33] and, thus, has been shown to improve diagnostic specificity in identifying age-related neurological disorders such as a-MCI [16]. However, only a few studies [34, 35, 37, 38] have assessed the prognostic power of the recognition test in predicting progression to AD dementia from the prodromal stage and they have reported controversial results. Therefore, whereas measures of recall have been accepted for decades as the best predictors of the development of AD dementia in a-MCI subjects, it is not clear whether measurements of recognition tasks could be useful for identifying subjects who will convert to AD dementia over time in clinical cohorts of a-MCI.

The aim of the present study was to evaluate whether performance on a word list recognition task enhances the ability to predict subsequent progression of a-MCI patients to AD dementia over and above the predictive value of delayed free recall for the same material. For this purpose, we carried out a longitudinal study in a group of subjects diagnosed as a-MCI at baseline and followed up for 3 years to monitor their conversion to AD dementia. Baseline memory performances on word list recall and recognition were analyzed to determine their diagnostic ability in predicting later conversion to AD dementia. We expected that the stable group of a-MCI would exhibit more performance improvement in the recognition with respect to the recall task compared to a-MCI patients who subsequently progress to AD dementia. Indeed, we hypothesized that stable a-MCIs could benefit from the facilitation provided by externally guided recalling in minimizing their retrieval deficit at the time of free recall, whereas the typical consolidation and storage deficits of converter a-MCIs would prevent significant improvement in recognizing studied items as a consequence of memory trace forgetting.

Materials and methods

MCI patients

Baseline evaluation

The experimental sample of the current longitudinal study consisted of a cohort of 80 patients diagnosed with MCI at the first assessment. All patients had been referred to the Alzheimer’s Disease unit of IRCCS Santa Lucia Foundation of Rome from 2000 to 2016. They were submitted to formal clinical, neuropsychological, behavioural and functional evaluation and a Computed Tomography (CT) or a Magnetic Resonance (MR) scan as part of the diagnostic process. The MCI subjects were classified as “amnestic” according to current clinical criteria [39] if they reported: (a) a complaint of memory decline (reported by the subject and confirmed by an informant); (b) objective memory impairment (revealed by scores below age/education-adjusted norms on at least one of the standard episodic memory tests administered); (c) normal general cognition, as indicated by Mini-Mental State examination scores above the normality cut-off (> 23.8); (d) normal activities of daily living (as confirmed by a total Clinical Dementia Rating scale score less than or equal to 0.5); (e) CT or MR brain imaging negative for focal lesions (minimal diffuse changes or minimal lacunar lesions were allowed). Furthermore, patients did not fulfill the clinical criteria for the diagnosis of dementia and had no history of drug/alcohol abuse or any psychiatric or neurological disease.

Follow-up

MCI subjects were invited to follow-up evaluations after 12, 24, and 36 months during which they were again submitted to the clinical examination and the neuropsychological, behavioural and functional assessments administered in the screening phase. At the 12-month follow-up, 11 MCI patients from the entire sample fulfilled the criteria for the diagnosis of AD dementia (14%); at the 24-month follow-up, 13 more patients fulfilled the criteria for AD dementia (16%); at the 36-month follow-up, 15 more patients converted to AD dementia (19%). In the same period, 41 patients (51%) remained in a stable condition of selective cognitive impairment (n = 35) or normalized their performance (n = 6). MCI patients who developed a form of dementia different from AD dementia during the 3-year follow-up were not included in our experimental sample.

AD dementia was diagnosed by a neurologist (R.P.) who has great expertise in the field of dementia, according to the criteria of the National Institute of Neurological and Communicative Disease and Stroke-Alzheimer’s Disease and Related Disorders Association [40, 41].

Control group

A cohort of 62 age- and education-matched healthy individuals was also recruited from a healthcare center to serve as healthy controls (HC). Inclusion criteria were: (a) absence of neurological or psychiatric disorders; (b) no history of alcohol or drug abuse; (c) normal general cognitive functioning as confirmed by performance above the normality cut-off scores on each cognitive test administered.

The study was conducted in conformity with the Santa Lucia Foundation institutional ethics requirements. Informed consent was obtained from all participants prior to the study.

General neuropsychological examination

The tests comprising the neuropsychological battery are described below according to the cognitive domains they examine: Verbal episodic long-term memory: 15-Word List test (Immediate, 15-min Delayed recall and 30-min Recognition trial) [42] and Short story test (Immediate and 20-min Delayed recall) [43]; Visuo-spatial episodic long-term memory: Rey-Osterrieth complex figure recall (Immediate and 20-min Delayed recall) [43]; Short-term memory: Digit span and Corsi Block Tapping task forward [44]; Executive functions: Phonological Word Fluency [42], Digit span and Corsi Block Tapping task backward [44] and Modified Card Sorting Test [45]; Language: Naming objects subtest of the BADA [46]; Reasoning: Raven’s Coloured Progressive Matrices [42]; Constructional praxis: Copy of simple drawing [42], Copy of the Rey-Osterrieth complex figure [43].

For all tests, we used Italian normative data for score adjustment (sex, age, and education) and to define cut-off normality scores, which were established as the lower limit of the 95% tolerance interval for a confidence level of 95%. For each test, normative data are reported in the corresponding references.

The 15-word learning test

Since performance on the recall and recognition trials of the 15-word learning test was the topic of the present study, the procedure used to administer this test will be discussed in detail.

This test material [42] consists of a list of 15 unrelated names of concrete objects. The examiner reads the word list aloud five times. Immediately following each presentation and 15 min after the last one, the participant is required to recall as many words as possible without a time limit and in any order. The immediate recall score consists of the total number of words recalled in the 5 immediate trials (range 0–75) and the delayed score consists of the number of words recalled after the 15-min delay (range 0–15). The recognition trial consists of a list of 45 words; it includes the 15 words from the studied list and 30 non-studied foils. Fifteen minutes after the delayed recall trial of the word list, the list of words is presented auditorily by the examiner in pseudo-random order and the participant is required to discriminate studied words (“old”) from unstudied words (“new”).

Two separate scores for correct recognition of the studied items (hit = range 0–15) and for incorrect recognition of the unstudied items (false alarms = range 0–30) were computed.

Data analysis

Statistical analyses were performed using the Statistical Package for the Social Sciences (IBM SPSS, Version 22.0, Inc., Chicago, IL). To detect differences in age, education and MMSE scores, as well as in baseline neuropsychological performance among converter a-MCI, stable a-MCI and HC, one-way analyses of variance (ANOVA) were run. Differences in gender distribution were assessed using a Chi square test.

Two measures of recognition memory performance were scored according to signal detection theory [47]. First, the sensitivity measure d-prime (d′), which provides an index of subjects’ ability to discriminate targets from non-targets, was calculated as the normalized distance between their hit rate and false alarm rate (d′ = z Hit rate − z False alarm rate). A larger d′ means greater ability to maximize hits and minimize false alarms and, thus, better discrimination between studied words and unstudied foils. Second, the response bias C index was calculated using the following formula: C = − 0.5 (z Hit rate + z False alarm rate) to differentiate between conservative response strategies (reduced numbers of hits, C > 0) and impulsive response strategies (increased number of false alarms, C < 0). A score of C equal to 0 suggests an unbiased judgment. Moreover, to quantify the effectiveness of recognition to improve the recovery of stored words compared to free recall, an Index of Sensitivity of Recognition (ISR) was calculated for each subject using the following formula: (Total correct recognition proportion − Delayed Word list proportion of words recalled)/(1 − Delayed Word list proportion of words recalled). The total correct recognition proportion was determined by the score of (Hit + Correct rejection)/45. This index was adapted from the formula for the Index of Sensitivity of Cueing [7], which is widely used to evaluate the efficacy of semantic cues to facilitate retrieval from stored information in Free and Cued selective reminding test procedures. Multiple-way analyses of variance with Group as a three-level between-subjects factor and Memory indexes as within-subjects factors were then computed. When significant effects were detected, post hoc analyses were performed using Fisher’s LSD test with Bonferroni correction for multiple comparisons (p threshold set at: 0.05/3 ≈ 0.017).
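As an illustration of the three recognition indexes defined above, the following minimal Python sketch computes d′, C and ISR for a single subject from the raw counts of the 15-word test. The function and variable names are hypothetical, and the clamping of hit and false alarm rates away from 0 and 1 (a common 1/(2N) adjustment that keeps the z transform finite) is an assumption: the text does not state how extreme rates were handled.

```python
from statistics import NormalDist

N_TARGETS = 15  # studied words in the recognition list
N_FOILS = 30    # non-studied foils in the recognition list


def _rate(count, n):
    """Proportion count/n, clamped to [0.5/n, 1 - 0.5/n].

    Assumption: this 1/(2N) adjustment is not described in the text;
    it only prevents infinite z scores for rates of exactly 0 or 1.
    """
    return min(max(count / n, 0.5 / n), 1 - 0.5 / n)


def recognition_indexes(hits, false_alarms, delayed_recall):
    """Return (d_prime, c, isr) for one subject.

    hits:           studied words correctly recognized (0-15)
    false_alarms:   foils incorrectly accepted as studied (0-30)
    delayed_recall: words produced at delayed free recall (0-15)
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(_rate(hits, N_TARGETS))
    z_fa = z(_rate(false_alarms, N_FOILS))

    d_prime = z_hit - z_fa        # discriminability
    c = -0.5 * (z_hit + z_fa)     # response bias (> 0 conservative)

    correct_rejections = N_FOILS - false_alarms
    correct = (hits + correct_rejections) / (N_TARGETS + N_FOILS)
    recalled = delayed_recall / N_TARGETS
    # ISR is undefined when delayed recall is already perfect.
    isr = (correct - recalled) / (1 - recalled) if recalled < 1 else None
    return d_prime, c, isr


# Hypothetical subject: 12 hits, 3 false alarms, 5 words at delayed recall.
d_prime, c, isr = recognition_indexes(12, 3, 5)
```

For this hypothetical subject the hit rate is 0.80 and the false alarm rate 0.10, so ISR = (39/45 − 5/15)/(1 − 5/15) = 0.80, i.e., recognition recovers 80% of the items that free recall missed.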

To investigate group differences on baseline cognitive tests between patients who converted in 9–12 months (Converter 1), 13–24 months (Converter 2) or 25–36 months (Converter 3), group comparisons were based on non-parametric Kruskal–Wallis tests, as most variables did not meet parametric assumptions because of the small sample size of the three groups.

To evaluate the discriminatory power of the different neuropsychological memory tests considered for conversion to dementia, receiver-operating characteristics (ROC) curves were generated. Areas under the curves (AUCs) were used as a measure of the overall performance of the ROC curves (with 95% confidence interval, CI) [48]. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), diagnostic odds ratio (dOR) and overall hit ratio (OHR) were calculated as indexes of the diagnostic power of the tests in predicting AD dementia.
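The accuracy indexes listed above can be sketched in a few lines of Python. This is an illustrative sketch, not the SPSS procedure used in the study: the function names are hypothetical, the AUC is obtained through its rank-based (Mann–Whitney) interpretation rather than by tracing the curve, the OHR is taken here to mean the overall proportion of correctly classified patients, and the counts in the example are invented for illustration only.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via its rank interpretation.

    Probability that a randomly chosen 'positive' case scores higher
    than a randomly chosen 'negative' case; ties count as 0.5.
    Note: when LOWER scores indicate the condition of interest
    (e.g., low d' in converters), pass the groups in the order that
    makes higher scores 'positive', or negate the scores.
    """
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


def diagnostic_indexes(tp, fp, fn, tn):
    """Accuracy indexes from a 2x2 table.

    tp: converters classified as converters
    fp: stable patients classified as converters
    fn: converters classified as stable
    tn: stable patients classified as stable
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Diagnostic odds ratio; infinite when no errors of one type occur.
        "dor": (tp * tn) / (fp * fn) if fp and fn else float("inf"),
        # Overall hit ratio, assumed to be overall accuracy.
        "ohr": (tp + tn) / (tp + fp + fn + tn),
    }


# Invented counts, for illustration only (not the study's data).
example = diagnostic_indexes(tp=30, fp=10, fn=10, tn=30)
```

With these invented counts, sensitivity, specificity, PPV, NPV and OHR all equal 0.75 and the diagnostic odds ratio is (30 × 30)/(10 × 10) = 9.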

The optimum cut-off point was calculated only for the neuropsychological tests with significant diagnostic power; this was carried out by selecting the point on the ROC curve that maximized both sensitivity and specificity. Finally, the Kaplan–Meier survival curve was used to illustrate the differences in progression to AD dementia between patients below or above the optimum cut-off point. Survival time was calculated as the interval from the initial baseline evaluation to the diagnosis of dementia. For patients who remained non-demented, survival time was censored at the date of the last clinical assessment.
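The two steps described above can be sketched as follows. Both functions are hypothetical illustrations rather than the procedure actually run in SPSS. In particular, the text does not specify how sensitivity and specificity were jointly maximized; Youden's J (sensitivity + specificity − 1) is used here as one common criterion and is an assumption. The survival function is a plain product-limit (Kaplan–Meier) estimate in which patients who did not convert are treated as censored at their last assessment.

```python
def youden_cutoff(converter_scores, stable_scores):
    """Threshold t maximizing Youden's J = sensitivity + specificity - 1.

    A score below t is classified as 'converter' (lower scores are
    assumed to indicate worse memory). Returns (t, J).
    """
    best_t, best_j = None, -1.0
    for t in sorted(set(converter_scores) | set(stable_scores)):
        sens = sum(s < t for s in converter_scores) / len(converter_scores)
        spec = sum(s >= t for s in stable_scores) / len(stable_scores)
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  months from baseline to conversion or last assessment
    events: True if the patient converted, False if censored
    Returns a list of (time, survival probability) at each event time.
    """
    survival, curve = 1.0, []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(ti >= t for ti in times)  # still followed at t
        converted = sum(1 for ti, e in zip(times, events) if ti == t and e)
        survival *= 1 - converted / at_risk
        curve.append((t, survival))
    return curve


# Invented data, for illustration only (not the study's data).
cutoff, j = youden_cutoff([1.0, 1.5, 1.8], [2.0, 3.0, 3.5])
curve = kaplan_meier([12, 24, 36, 36], [True, True, False, True])
```

In the invented survival example, four patients are followed: one converts at 12 months (survival 3/4), one at 24 months (survival 3/4 × 2/3 = 1/2), and of the two remaining at 36 months one converts and one is censored (survival 1/2 × 1/2 = 1/4).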

Results

Baseline demographic and clinical data

Table 1 reports subjects’ demographics and baseline neuropsychological test results. The three groups were similar with respect to age (F [2, 139] = 2.239, p = 0.11), years of education (F [2, 139] = 1.655, p = 0.19) and gender distribution (χ2 [2] = 3.075, p = 0.21). A significant difference in average MMSE score was found between groups (F [2, 139] = 22.77, p < 0.001) because of the better performance of HC compared to stable a-MCI who, in turn, obtained a higher mean score than the converter a-MCI group. Moreover, HC performed better than both stable and converter a-MCI on all tests included in the baseline neuropsychological battery with the exception of the Digit span forward, Object naming and Copy of the Rey-Osterrieth complex figure tests, in which scores of the three groups were comparable. No significant difference was detected between stable and converter a-MCI regarding performance on tests investigating cognitive domains other than memory; instead, stable a-MCI performed better than converter a-MCI patients on both the immediate and delayed recall of the Rey-Osterrieth complex figure and the Prose recall tests.

Table 1 Demographic data and neuropsychological performance of controls and a-MCI patients

Finally, Converter 1, Converter 2 and Converter 3 subjects were comparable in terms of general cognitive efficiency (MMSE: Converter 1 = mean raw score 26.91 ± 1.5; Converter 2 = mean raw score 26.31 ± 1.2; Converter 3 = mean raw score 26.33 ± 1.9) (Kruskal–Wallis ANOVA: χ2 = 1.89, p = 0.39), as well as in all the cognitive domains investigated during the baseline neuropsychological examination including tests of long-term memory (Kruskal–Wallis ANOVA: p > 0.05 in all comparisons).

Group comparison on word list recall and recognition

Performance on the immediate and delayed recall trials and proportion of forgetting, as well as d′ and C derived from the recognition performance and ISR values, are reported in Table 2. The two-way ANOVA with Group (HC, stable a-MCI, converter a-MCI) as between factor and Trial (5th immediate and delayed recall) as within factor showed a main effect of Group (F [2, 139] = 79.03, p < 0.001), which was due to the overall better performance of HC (mean 8.76 ± 2.5) compared to stable (mean 5.56 ± 2.7) and converter a-MCI (mean 4.06 ± 2.4), and of stable a-MCI compared to converter a-MCI (p < 0.001 in all comparisons). The significant main effect of Trial (F [2, 139] = 330.8, p < 0.001) was observed because of the higher average scores obtained by the whole group on immediate (mean 7.78 ± 2.58) than on delayed recall (mean 5.12 ± 3.28). There was also a significant interaction between Group and Trial (F [2, 139] = 9.33, p < 0.001). Post hoc analysis revealed that HC remembered consistently more words than stable and converter a-MCI on both the 5th immediate and delayed recall (p < 0.001 in all comparisons). Similarly, stable a-MCI were able to recall more items than converter a-MCI on both trials of the word list recall (immediate: p = 0.009, delayed: p < 0.001); nevertheless, the amount of memory loss passing from immediate to delayed recall (i.e., forgetting) was comparable between the two groups of patients (p = 0.07) and significantly more marked compared to the forgetting scores of HC (p < 0.001 in both comparisons).

Table 2 Mean raw scores, standard deviations (in parentheses) and percentage of memory loss passing from immediate to delayed recall on the word list; d′ and ISR scores on the recognition task

For the recognition task, the one-way ANOVA applied to the d′ scores revealed a significant difference between the three groups (F [2, 139] = 36.09, p < 0.001), which was due to HC scoring higher than stable a-MCI (p = 0.01) who, in turn, obtained a significantly higher d′ mean score than converter a-MCI (p < 0.001). Comparable scores were obtained between controls and patients in terms of C (F [2, 139] = 1.903, p = 0.153). The one-way ANOVA applied to the ISR values showed a significant main effect of Group (F [2, 139] = 15.37, p < 0.001). As revealed by post hoc analysis, HC and stable a-MCI exhibited similar memory improvement on the recognition task (p = 0.08), whereas the effect of facilitation provided by recognition with respect to free recall was significantly less pronounced in the converter a-MCI group compared to HC and stable a-MCI (p < 0.001 in both comparisons).

Diagnostic utility of word list recall and recognition

ROC curves were generated to compare the ability of the word list’s delayed and recognition trials to discriminate between stable and converter a-MCI patients. Results of the analysis showed that the best predictor of conversion was d′ (AUC = 0.74, 95% CI = 0.63–0.85, p < 0.001); indeed, it showed good power in discriminating converter a-MCI from stable a-MCI. Additionally, ISR significantly predicted conversion (AUC = 0.71, 95% CI = 0.60–0.83, p = 0.001). For the free recall component, we found that scores resulting from delayed recall were able to significantly discriminate converter a-MCI from stable a-MCI (AUC = 0.723, 95% CI = 0.60–0.81, p < 0.001). Despite similar AUC values, however, all the other accuracy metrics we calculated demonstrated the superiority of both d′ and ISR compared to free recall in terms of discriminating power (Table 3). The optimum cut-off point (COP = 1.999) for d′ scores, which was determined by selecting the score with the best sensitivity and specificity on the ROC curve, was used to illustrate the differences in progression to AD dementia in patients above and below the COP using Kaplan-Meier’s survival curves (Fig. 1). Results showed that 74% of the entire sample with d′ scores below the COP subsequently converted to AD (9 patients in 9–12 months, 11 patients in 13–24 months and 8 patients in 25–36 months) compared to 26% who were above the cut-off (2 patients in 9–12 months, 2 patients in 13–24 months, and 7 patients in 25–36 months). The mean time to conversion was significantly shorter for the converter a-MCIs with a d′ score below the COP (mean 20.86 months ± 17.1, with 71% of patients converting within 2 years from baseline evaluation) compared to those above the COP (mean 28.09 months ± 12.6, with 34% of patients converting within 2 years from baseline evaluation) (Log Rank: χ2 [1] = 20.99 p < 0.001).

Table 3 Diagnostic power of word list delayed recall trial and recognition d′ and ISR scores for predicting a-MCI patients’ conversion to AD dementia
Fig. 1 Kaplan–Meier survival curve for the conversion to AD of patients with a-MCI whose d′ scores on the recognition task were above or below the cut-off point

Discussion

In the present study, baseline memory performances on word list free recall and recognition trials obtained from a sample of 80 patients diagnosed with a-MCI at the first evaluation and followed up for 3 years were analyzed to determine their diagnostic ability to predict later conversion to AD. In fact, whereas measures of recall have been well documented as predictors of the development of AD dementia in a-MCI subjects, it is not clear whether measurements of recognition tasks could be useful for identifying subjects who will convert to AD dementia over time in clinical cohorts of a-MCI.

Overall, we found that a-MCI who converted to AD dementia during the 3-year follow-up generally performed worse than HC and non-converter MCI patients on both word list free recall and recognition, as well as on all other measures of episodic long-term memory. This finding is in line with previous studies which reported a more marked episodic memory impairment in converter-MCIs compared to stable patients in standard memory evaluations [35, 49,50,51,52]. Moreover, we found that a-MCI patients who remained in a stable form of selective memory impairment obtained lower scores than normal controls on both immediate and delayed recall of the word list; they also presented a comparable amount of memory loss passing from the immediate to delayed recall trial of the word list (− 13% of accuracy) compared to converter a-MCIs (− 19% of accuracy). Supporting our initial prediction, however, stable a-MCI patients had a greater advantage on the recognition procedure than converter patients. Indeed, when we compared the effectiveness of recognition in improving the recovery of stored words compared to free recall, we found that the percentage of improvement on the recognition task obtained by stable a-MCI patients (+ 81%) was comparable to that obtained by HC (+ 86%), and significantly more pronounced than that disclosed by converter a-MCI (+ 72%). Further analyses revealed that the d′, which was derived from the recognition performance, had the best predictive validity in discriminating a-MCI who developed AD dementia within the 3-year follow-up. In fact, the d′ measure correctly classified group membership with good overall accuracy, which was higher compared to the classification of converter and stable a-MCI patients provided by free recall scores. In addition, the mean time to conversion was significantly reduced for the converter-MCIs who showed more marked impairment on recognition than converter-MCIs who did not.

Taken together, the results of the present study reinforce the view that a-MCI subjects destined to convert to AD dementia are characterized by a memory deficit that is qualitatively different from that of a-MCI subjects who remain clinically stable over time. In line with the current literature [1, 2], our results demonstrate that a specific pattern of “amnesia of the hippocampal type”, which represents the core clinical criterion for the diagnosis of AD dementia, is already recognizable in a-MCI subjects destined to convert to AD 1–3 years before dementia becomes clinically manifest, as opposed to a profile of “dysexecutive memory impairment” typical of non-converter a-MCI patients. Accordingly, a careful characterization of the qualitative aspects of memory performance on tests which are commonly used for the clinical evaluation of long-term memory ability could be an effective strategy for discriminating between these different profiles of memory impairment and, thus, for enhancing the ability to accurately recognize patients destined to convert to AD dementia from the prodromal stage. In particular, our findings demonstrate the usefulness of the recognition procedures in differentiating between a “pure-amnestic” profile in patients like converter a-MCI, in whom a consolidation deficit prevents any effective memory storage, from a “frontal” memory profile in patients such as non-converter a-MCI, in whom memory deficits are mainly due to inefficient elaborative encoding and/or retrieval strategies. In fact, similar to what occurs in AD patients, our converter a-MCIs showed both reduced delayed free recall ability and diminished sensitivity in benefiting from cues for recognizing studied words. 
These results are consistent with the view that the memory deficit in a-MCI patients destined to convert to AD dementia is due to an AD-related hippocampal pathology that is responsible for the defective storage of memory traces which, as a consequence, may not be recovered even when the experimental paradigm assists and facilitates retrieval processes [7, 8, 11]. On the other hand, the memory impairments of a-MCI patients who remain clinically stable over time are typically due to reduced encoding abilities and a deficit in elaborative encoding and/or in implementing effective retrieval strategies, whereas consolidation takes place normally [7, 8, 11, 14]. Accordingly, our stable a-MCI patients showed poor memory performance on the free recall procedure, which required strategically elaborative encoding of incoming information; however, in the conditions in which the request for efficient retrieval strategies was minimized, as occurs in the recognition trial, they exhibited significantly better memory performance compared to converter a-MCIs and improved ability to recognize studied words with respect to their free recall performance, which was comparable to that of normal controls. Therefore, their poor baseline memory performance seems due to concurrent, non-amnestic factors such as poor organization and the use of inefficient retrieval strategies rather than a pure consolidation deficit of the hippocampal type. Considering the variability of clinical outcomes over time in non-converter MCI patients, various factors can account for the pattern of “dysexecutive amnesia” disclosed by this type of patient. In fact, there are many factors that can affect memory performance in elderly populations apart from neurodegenerative disorders. These include psychiatric status (e.g., anxiety, depression), vascular risk factors, hormonal changes and other geriatric or non-neurological conditions, all of which can lead to impaired attention, poor strategic search in memory and reduced processing capacity, ultimately resulting in poor performance on free recall for non-amnestic reasons [2, 53].

In conclusion, the results of the present study demonstrate the importance of studying the qualitative aspects of memory deficits to better predict the risk of developing AD dementia and in particular the usefulness of recognition procedures for identifying MCI subjects with memory disturbances compatible with an early phase of AD dementia. In fact, a decreased sensitivity in benefiting from the facilitation provided by externally guided cues in minimizing retrieval deficits on free recall is significantly associated with higher risk and faster mean time of conversion to AD dementia. Compared to free recall procedures, scores deriving from the recognition task showed high sensitivity in identifying genuine deficits in consolidation and storage that are typical of converter-MCIs and mirror early involvement of MTL areas. Therefore, added to the previously reported qualitative methods for investigating memory deficits in MCI patients, this test could be a useful diagnostic tool for predicting progression to AD dementia from the prodromal stage.