1 Background

The Mini-Mental State Examination (MMSE) was published in 1975 [1] as a relatively simple practical method of grading cognitive impairment. Since then it has become the most commonly used cognitive screener [2]. Whilst the MMSE may never have been intended as a diagnostic (case-finding) tool, it has been extensively investigated as a diagnostic test of dementia and to a lesser extent as a diagnostic screen for mild cognitive impairment (MCI) and delirium. Many are attracted by the brevity of the instrument (typically taking 6–8 min in healthy individuals) and its initial royalty free distribution (since 2001 copyright was acquired by Psychological Assessment Resources: http://www.minimental.com/). In clinical practice common applications of the MMSE are to help clinicians grade the severity of cognitive change and to help with cognitive screening [3, 4]. The concept of screening as used here is an initial examination largely to rule-out (reassure) those without cognitive disorder with as few false negatives as possible. It is less clear whether the MMSE has a case-finding role (that is, to confirm a clinical diagnosis with minimal false positives).

The MMSE has an internal structure of 20 individual tests covering 11 domains including orientation, registration, attention or calculation (serial sevens or spelling), recall, naming, repetition, comprehension (verbal and written), writing, and construction. Internal consistency appears to be moderate with Cronbach alpha scores reported between 0.6 and 0.9 [5, 6]. Test-retest reliability has been examined in several studies, and in those where re-examination took place within 24 h reliability by Pearson correlation was usually above 0.85. Scoring emphasises orientation (time – 5 points; place – 5 points); attention/concentration/calculation (5 points) with lower emphasis on registration memory (3 points) and recall (3 points). Relatively little weight is placed on naming (2 points), repetition (1 point), following a three-stage command (3 points), reading (1 point), writing (1 point) or copying intersecting pentagons (1 point). Factor-analytic and item-response studies suggest up to five factors [7, 8]. Using Rasch analysis it is possible to grade the completion difficulty of each item on the MMSE. Relatively difficult items are the recall of three words, citing the correct date, copying the pentagon design and spelling WORLD backwards or completing serial sevens. Conversely, relatively simple items are naming the correct country, registering three words, following the command, and naming an object. Acceptability is generally high but it falls in those with definite or suspected impairment who may be reluctant to expose perceived deficits [9]. All questions are designed to be asked in the order listed, with omissions scored as errors giving a maximum score of 30. However there is some ambiguity in several items leading to the structured MMSE (see Chap. 4 at Sect. 4.2.1).

Approximately 200 validation studies have been published using the MMSE as the principal tool or as a comparator tool, but many are underpowered and/or lack an adequate criterion standard and hence can give a misleading impression of accuracy [10]. For example, Folstein, Folstein, and McHugh validated the MMSE in only 38 patients with dementia [1]. Yet this extensive evidence base means scores are fairly well understood by health professionals and can be adjusted on the basis of normative population data. For example, Crum et al. tested an extensive group of 18,056 participants in the U.S. Epidemiologic Catchment Area (ECA) study and presented distributions by age and educational levels [11]. Some groups have provided norms for each item on the MMSE by age group [12]. Yet there remains uncertainty regarding the optimal cut-off threshold for each condition under study [13–16]. A cut-off of <24 was recommended as significant by Folstein and colleagues in persons with at least 8 years of education [1]. Some individuals with MCI or early dementia and a background of extensive education may experience a ceiling effect with the MMSE (see early dementia, Sect. 3.3 below). In other words, the MMSE may lack the subtle tests necessary to detect early cognitive changes, particularly regarding recall.

Here I will review the diagnostic accuracy of the MMSE in the detection of the common cognitive disorders in clinical practice namely: dementia, mild cognitive impairment (MCI), and delirium.

2 Diagnostic Validity in Dementia of Any Severity

The MMSE has been extensively investigated as a diagnostic test for current dementia either on its own or against comparison scales. O’Connor et al. conducted one of the first adequately powered tests of the MMSE using a cut-off <24 in 586 patients who received a CAMDEX/CAMCOG interview as a reference standard [17]. They found that the sensitivity of the MMSE was 86 % and the specificity 92 %. In 2009 Mitchell undertook a meta-analysis of 34 MMSE dementia studies [18] and this was revised to 45 studies in the previous edition of this chapter [19]. This included community studies, primary care studies, and studies in specialist settings where the prevalence of dementia is relatively high. The prevalence of each condition in each setting strongly influences the performance of a test (see Chap. 2 at Sect. 2.3.4). High prevalence settings favour case-finding with few false positives but at the expense of false negatives. Low prevalence settings favour screening with few false negatives but at the expense of frequent false positives. The most recent meta-analysis published in 2015 included 108 MMSE studies involving 36,080 subjects (10,263 with dementia) [20]. The most common cut-off values to define dementia were <23 and <24. Across all studies, the prevalence was 28 %, showing that the authors combined all settings: specialist and non-specialist.

Using a bivariate random-effects model, the sensitivity from this meta-analysis was 81.3 % (95 % CI = 80.6–82.1 %) and specificity was 89.1 % (95 % CI = 88.7–89.5 %). Further analysis is shown in Table 3.1. PPV was calculated as 74.8 % (95 % CI = 74.0–75.6 %) and NPV was 92.3 % (95 % CI = 92.0–92.6 %). The positive clinical utility index (CUI) was 0.608 “fair” (95 % CI = 0.598–0.618) for case-finding and negative CUI was 0.822 “excellent” (95 % CI = 0.819–0.825) for screening. No results were presented by setting but these can be estimated using the Bayesian plot of conditional probabilities (Fig. 3.1), which illustrates the effect of changing prevalence.
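The relationship between these figures can be checked with a short calculation. The sketch below is illustrative only and assumes the usual definitions of the clinical utility index (positive CUI = sensitivity × PPV, negative CUI = specificity × NPV), which reproduce the values quoted above to within rounding. The function names and the qualitative bands in `grade` are assumptions chosen to agree with the labels used in this chapter; they are not taken from the cited meta-analysis.

```python
def accuracy_summary(sensitivity, specificity, prevalence):
    """Derive PPV, NPV and both clinical utility indices (all as fractions)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return {
        "ppv": ppv,
        "npv": npv,
        "cui_pos": sensitivity * ppv,   # case-finding utility
        "cui_neg": specificity * npv,   # screening utility
    }


def grade(cui):
    """Qualitative label for a CUI value (assumed bands, see lead-in)."""
    if cui >= 0.81:
        return "excellent"
    if cui >= 0.64:
        return "good"
    if cui >= 0.49:
        return "fair"
    return "poor"


# Pooled dementia estimates quoted in the text: sensitivity 81.3 %,
# specificity 89.1 %, prevalence 28.4 % (10,263 of 36,080 subjects).
dementia = accuracy_summary(0.813, 0.891, 0.284)
for name, value in dementia.items():
    print(f"{name}: {value:.3f}")
print("case-finding:", grade(dementia["cui_pos"]))
print("screening:", grade(dementia["cui_neg"]))
```

Small differences in the third decimal place from the published values are to be expected, because the inputs here are already rounded.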

Table 3.1 Summary table of diagnostic accuracy of the MMSE for cognitive impairment
Fig. 3.1 Meta-analytic summary accuracy of the MMSE for dementia, delirium and MCI across a range of probabilities. Pre-test/post-test Bayes plot of conditional probabilities; * results from Spering et al. 2012 [32]; MMSE+ score below the chosen MMSE cut-off indicating a positive test; MMSE– score above the chosen MMSE cut-off indicating a negative (normal) test
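The curves in a pre-test/post-test plot of this kind can be approximated from sensitivity and specificity alone. The following is a minimal sketch using likelihood ratios and the pooled dementia estimates from the text; the pre-test probabilities of 5 %, 10 % and 50 % are hypothetical values chosen for illustration and do not correspond to any particular setting reported in the source.

```python
def post_test_probabilities(sensitivity, specificity, pre_test):
    """Return (probability of disease after a positive test,
    probability of disease after a negative test)."""
    lr_pos = sensitivity / (1 - specificity)      # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity      # negative likelihood ratio
    pre_odds = pre_test / (1 - pre_test)
    pos_odds = pre_odds * lr_pos
    neg_odds = pre_odds * lr_neg
    return pos_odds / (1 + pos_odds), neg_odds / (1 + neg_odds)


# 0.284 is the pooled prevalence from the text; the others are hypothetical.
for pre_test in (0.05, 0.10, 0.284, 0.50):
    after_pos, after_neg = post_test_probabilities(0.813, 0.891, pre_test)
    print(f"pre-test {pre_test:.1%}: "
          f"MMSE+ {after_pos:.1%}, MMSE- {after_neg:.1%}")
```

At the pooled prevalence the first value equals the PPV and the second equals 1 minus the NPV, so the plot is simply these two quantities traced across all possible prevalences.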

It should be noted that overall performance deteriorates if patients with MCI are combined with healthy controls (see Sect. 3.4 below). Regarding broadly defined dementia, the MMSE would be most suitable as a screening test in specialist settings, and in primary care provided instrument length was not problematic.

3 Diagnostic Validity in Early Dementia

One critical question is whether the MMSE retains sufficient accuracy when looking for early dementia. People with early dementia are particularly at risk of being overlooked and undertreated [21]. Provisional evidence from three studies suggests a modest reduction in accuracy when attempting to diagnose those with mild dementia. For example, in specialist hospital or memory clinics, Heinik et al. found that the area under the ROC curve was 0.96 for all dementias but 0.89 for very mild dementia [22] and similarly Meulen and colleagues found that the area under the ROC for the MMSE was 0.95 for all dementias but 0.87 for mild dementia [23]. Also a cut-off threshold higher than ≤23 is recommended when looking for mild dementia. Yoshida et al. [24] found 95 % sensitivity and 83 % specificity looking for mild dementia in a Japanese memory clinic at a threshold of ≤28 which would give “good” clinical utility for screening (CUI− = 0.789) and case-finding (CUI + = 0.786). At a lower threshold of ≤25 sensitivity fell to 76 % but specificity increased to 97 % which would also have “good” clinical utility for screening (CUI− = 0.800) and case-finding (CUI + = 0.727). In a sub-analysis of 88 people with mild Alzheimer’s scoring >20 on the MMSE, Kalbe and colleagues [25] found that the MMSE had a sensitivity of 92 % and a specificity of 86 % (PPV = 85.2 %, NPV = 92.2 %) which again would imply “good” clinical utility for case-finding (CUI + = 0.781) and screening (CUI− = 0.796). Regarding diagnosis of mild dementia in primary care, Kilada and colleagues found adjustment of the MMSE cut-off to ≤27 was required [26]. Grober et al. [27] examined the value of MMSE in 317 primary care attendees with mild dementia (CDR of 1.0 and 0.5 but without MCI). In this study, at a cut-off of ≤23 sensitivity was 53 % and specificity 90 % (PPV = 52.7 %, NPV = 90.1 %), but at a cut-off of ≤26 sensitivity was 73 % and specificity 73 % (PPV = 36.0 %, NPV = 92.7 %) suggesting only “fair” clinical utility. Further information on the diagnosis of early dementia comes from studies in which the comparator sample is a combination of healthy controls and those with MCI as this is more likely to be the situation clinically (see Sect. 3.4).

4 Diagnostic Accuracy in the Detection of MCI

There were only five studies published up to 2009 regarding MMSE for diagnosis of MCI [18] but by 2012 this had risen to 11 qualifying studies [19]. In 2015 a meta-analysis found 21 studies with a sensitivity estimate of 0.62 (95 % CI = 0.52–0.71) and specificity of 0.87 (95 % CI = 0.80–0.92) [20]. A new search for this chapter revealed 40 relevant studies (see Table 3.1 for summary findings). Most have used cross-sectional rather than longitudinal definitions of MCI and these criteria themselves remain somewhat controversial [28, 29]. These are essentially the combination of subjective memory complaints with objective impairment but no dementia and “minimal” functional decline. It is important to realise many patients with pre-dementia cognitive decline will not fulfil these rules largely because of measurable problems with activities of daily living or absence of recorded subjective memory complaints. Thus MCI should be considered as only one of several possible pre-dementia categories. Further, it is now recognised that many with MCI do not progress but remain stable or actually improve.

An overview of 40 studies shows that the majority used the Mayo Clinic diagnostic criteria suggested by Petersen and colleagues [28, 30] but some used the revised Winblad criteria [29] and a minority used a Clinical Dementia Rating (CDR) score of 0.5 [31]. The vast majority were recruited from memory clinics or secondary care; only a handful claim to have recruited directly from the community. Samples were not matched demographically but instead recruited from convenience samples, which is nevertheless similar to clinical practice. Thus across these 40 studies, the mean age of those with MCI was 73.2 years whilst in healthy controls it was 71.0 years. The proportion of females in MCI studies was 44 % and in controls 46.9 %. Regarding education, the mean number of years of education in those with MCI was 9.79 vs 9.64 in controls. Perhaps the major question regards the cut-off threshold on the MMSE: 12 studies used <29; 9 studies used <28; 17 studies used <27; and 9 studies used <26.

Summary results are shown in Table 3.1. After weighting, the meta-analytic sensitivity was found to be 59.7 % (95 % CI = 58.6–60.7 %) and specificity was 80.2 % (95 % CI = 79.4–81.0 %). PPV was 72.1 % (95 % CI = 71.1–73.2 %) and NPV 69.9 % (95 % CI = 69.0–70.7 %). The positive clinical utility was 0.431 “poor” (95 % CI = 0.418–0.444) for case-finding and negative CUI was 0.561 (95 % CI = 0.553–0.568), that is qualitatively “fair”, for screening.

A related question is how the detection of dementia is influenced by the inclusion of patients with MCI in the comparator group alongside healthy controls. This is a clinically useful question as attendees in memory clinics usually are mixed in type and severity. One very large study (n = 6843) provides the answer [32]. In comparison to detection of dementia against healthy controls alone, specificity falls as does PPV when using the MMSE to detect dementia vs healthy controls and/or people with MCI. For example, at a cut-off of ≤26 whilst sensitivity remains at 71.6 % (95 % CI = 69.8–73.4 %), specificity falls from 97.9 to 93.5 % (95 % CI = 92.8–94.2 %) and PPV falls from 96.3 to 85.1 % (95 % CI = 83.5–86.7 %). In this mixed comparison, overall the optimal threshold appears to be ≤26 as clinical utility is “fair” for case-finding (CUI + = 0.609) and “good” for screening (CUI− = 0.808) at this cut-point.

5 Diagnostic Validity in Delirium

Delirium is a mental disorder usually characterized by acute or sub-acute onset, impaired attention, an altered level of consciousness and a fluctuating course. Frequently there are widespread cognitive deficits in orientation, memory, attention, thinking, perception and insight. It occurs in approximately 10–30 % of vulnerable patients admitted to hospital. If unresolved, delirium is strongly associated with poor outcomes such as disability and death [33–35]. Randomized trials have shown multi-component preventive strategies to be effective in preventing and treating delirium [36]. However, it remains under-recognized, leaving a possible role for screening instruments [37]. A recent review of the accuracy of 11 instruments used in 25 studies highlighted the potential value of the Global Attentiveness Rating (GAR), Memorial Delirium Assessment Scale (MDAS), Delirium Rating Scale Revised-98 (DRS-R-98), Clinical Assessment of Confusion (CAC), Delirium Observation Screening Scale (DOSS) and Nursing Delirium Screening Scale (Nu-DESC) [37]. The Confusion Assessment Method (CAM) was the most thoroughly studied but the Mini-Mental State Examination (MMSE) was omitted from this review [37].

The MMSE may not seem the ideal choice for delirium but nevertheless has the potential to be useful because of its broad cognitive remit. Indeed the accuracy of the MMSE in detecting delirium has been reported in a recent meta-analysis [38]. No more recent primary studies have been published to date. Thirteen studies were included in this meta-analysis representing 2017 patients in medical settings of whom 29.4 % had delirium. The meta-analysis revealed the MMSE had an overall sensitivity and specificity estimate of 84.1 and 73.0 %, but this was 81.1 and 82.8 % in a subgroup analysis involving robust high quality studies. Sensitivity was unchanged but specificity was 68.4 % (95 % CI = 50.9–83.5 %) in studies using a predefined cut-off of <24 to signify a case. Clinical utility was poor for confirmation (case-finding) of delirium but good for initial screening (minimizing false negatives).

6 Conclusion and Implementation

This chapter brings up to date the latest evidence concerning the application of the MMSE as a diagnostic test for dementia, MCI and delirium. It is worth acknowledging that the MMSE has a number of obvious limitations [4]. It has a floor effect (imprecise measurement in the very severe range) [39, 40] which is notable in advanced dementia, in those with little formal education, and in those with severe language problems. There is also a ceiling effect, meaning it may not perform well in people with very mild dementia or indeed MCI [41]. This is thought to relate to its relatively crude testing of recall based solely on three objects. This problem is likely to be amplified when testing highly educated individuals. That said, this current analysis reveals that the MMSE is only marginally impaired in the detection of mild dementia as compared to the detection of moderate to severe dementia.

Most cognitive tests are influenced by age, education, and ethnicity and the MMSE is no exception [40]. Twelve percent of the variance in MMSE scores can be attributed to age and education alone [42]. Tables of adjustment by age and education have been published but are often overlooked by busy clinicians [43]. However a useful rule of thumb when screening for dementia is to choose a cut-off threshold of <21 for those with a basic school education, <23 for those with a high school education, and <24 for those with graduate/university education. Another important limitation is its length, particularly when its intended use is in primary care [44, 45]. Whilst it can be completed and scored in 5–8 min in unimpaired healthy individuals, it often takes 15 min or more in patients with dementia [23].
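The education-based rule of thumb can be written down compactly. This is a minimal sketch of the three thresholds stated above; the category names (`basic`, `high_school`, `university`) and the function name are hypothetical labels introduced for illustration, and the sketch is not a substitute for the published age- and education-adjusted tables.

```python
# Cut-offs from the rule of thumb in the text: a score BELOW the value
# counts as a positive screen for dementia.
CUT_OFFS = {
    "basic": 21,        # basic school education
    "high_school": 23,  # high school education
    "university": 24,   # graduate/university education
}


def screens_positive(mmse_score, education):
    """Return True if the score falls below the education-adjusted cut-off."""
    if not 0 <= mmse_score <= 30:
        raise ValueError("MMSE total score must lie between 0 and 30")
    return mmse_score < CUT_OFFS[education]


# The same raw score can lead to different conclusions by education level.
for level in CUT_OFFS:
    print(level, screens_positive(22, level))
```

A score of 22, for example, would be treated as negative in someone with basic schooling but positive in someone with a high school or university education.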

The focus of this chapter has been on the accuracy of the MMSE when used to help in the diagnosis of a cognitive disorder. A cognitive test can be used as a screening tool to reassure those without cognitive impairment, or as a case-finding tool to confirm those that do have cognitive impairment. The MMSE performs differently for each purpose and does not perform well as a single tool used for all types of patient in all settings. Overall results from 108 studies suggest it performs best when separating dementia from healthy cognitively unimpaired individuals. Here clinical utility was qualitatively “fair” (CUI + = 0.608) for case-finding and “excellent” (CUI− = 0.822) for screening. Performance was slightly weaker in early dementia vs healthy unimpaired individuals but the MMSE still achieved a “good” clinical utility. For MCI, however, the MMSE had a poor positive clinical utility (0.431) for case-finding and the negative CUI was only “fair” (0.561) for screening, illustrating limited performance for MCI. In most memory clinics people are not simply divided into dementia or healthy, therefore the comparison of dementia vs healthy combined with MCI is of note. In the detection of dementia vs healthy controls or MCI the clinical utility is no longer “poor” but “fair” for case-finding (CUI + = 0.609) but a “good” rating is preserved for screening (CUI− = 0.808). However an adjustment of cut-off threshold to ≤26 is necessary. Thus in specialist settings the MMSE is likely to be useful for initial reassurance in those who score 27 or above. Regarding delirium the latest evidence shows clinical utility of the MMSE was fair for confirmation (case-finding) of delirium but again “good” for initial screening (minimizing false negatives).

The final decision whether to use the MMSE as a diagnostic tool will depend on the consequences of false positives and false negatives. The following examples are illustrative of screening yield. In the case of the MMSE for dementia vs healthy controls (sensitivity = 81.3 %, specificity = 89.1 %, prevalence = 28.4 %) out of 100 people tested the MMSE would correctly identify 23 with dementia, missing 5; and it would correctly reassure 64, with 8 false positives. In the case of the MMSE for MCI vs healthy controls (sensitivity = 59.7 %, specificity = 80.2 %, prevalence = 46.2 %) out of 100 patients tested the MMSE would correctly identify 28 with MCI, missing 18; and it would correctly reassure 43, with 11 false positives. If all those tested (i.e. including those with false negatives and positives) received further evaluation then the adverse consequences of any initial erroneous results would be minimised. However, if those with false negatives received no follow-up and those with false positives received incorrect treatment then the consequences of error could be serious. Further, one must consider uptake of follow-up testing. Past research has shown that the uptake of further diagnostic tests by individuals who screened positive for cognitive impairment is between 28 and 48 % [46, 47].
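The yield figures in these examples follow directly from sensitivity, specificity and prevalence. The sketch below is one way to reproduce them; the rounding convention (round the number of cases and the correct results, then obtain the errors by subtraction so that the four cells sum to the number tested) is an assumption that happens to match the counts quoted above, and the function name is hypothetical.

```python
def screening_yield(sensitivity, specificity, prevalence, n=100):
    """Expected whole-number outcomes when n people are tested."""
    cases = round(prevalence * n)
    non_cases = n - cases
    true_pos = round(sensitivity * prevalence * n)
    true_neg = round(specificity * (1 - prevalence) * n)
    return {
        "true_pos": true_pos,               # correctly identified
        "false_neg": cases - true_pos,      # missed
        "true_neg": true_neg,               # correctly reassured
        "false_pos": non_cases - true_neg,  # wrongly flagged
    }


print(screening_yield(0.813, 0.891, 0.284))
# {'true_pos': 23, 'false_neg': 5, 'true_neg': 64, 'false_pos': 8}
print(screening_yield(0.597, 0.802, 0.462))
# {'true_pos': 28, 'false_neg': 18, 'true_neg': 43, 'false_pos': 11}
```

Changing `prevalence` to a value typical of the intended setting shows how the balance between missed cases and false positives shifts before a tool is adopted.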

Some may argue that data on the accuracy of a tool does not prove that it is effective in clinical practice. Few studies have actually evaluated whether the MMSE (or indeed any cognitive tool) improves outcomes when implemented in a clinical setting. Although one early study incorporating the MMSE showed no beneficial effect of delirium screening [48], a second larger randomized study of delirium screening and treatment was effective [49]. Regarding implementation of MMSE screening for dementia, in a non-randomized study Van Hout and colleagues [50] found general practitioners opted to use the MMSE in only 18 out of 93 cases and use of the MMSE was not associated with better diagnostic accuracy. However in a 24-month cluster-randomized study, Fowler et al. [51] found those who received cognitive test results were more likely to order diagnostic tests and discuss memory problems with patients, and patients were more likely to be taking cognitive-enhancing medication at follow-up. Overall this lack of evidence from implementation studies has led some guidelines to advise against routine (and/or population based) screening for cognitive impairment in asymptomatic individuals [52, 53]. In truth, evidence from implementation studies where clinicians are randomized to using or not using the MMSE is lacking across all cognitive disorders and all stages, whether people are symptomatic or asymptomatic. Further research should focus on this question of implementation effectiveness.

The MMSE has gained tremendous popularity as a relatively quick ‘bedside’ cognitive test but its diagnostic accuracy has been hitherto unclear. The best evidence available to date suggests it is not an ideal tool for case-finding dementia and it is frankly poor at case-finding MCI and only fair for dementia and delirium. However it can have a role as a first step screener for dementia, MCI or delirium. In fact, for dementia vs healthy controls it has “excellent” screening accuracy (although this falls to “good” if the population is mixed healthy controls and MCI). As an initial first step screener for delirium it has good accuracy and for MCI only “fair” accuracy. If the MMSE is used in clinical practice then I recommend for those scoring below threshold (positive) that a second step comprehensive clinical and neuropsychological evaluation is conducted.