Introduction

Background

The Alzheimer’s Disease Assessment Scale-cognition subscale (ADAS-Cog) (Rosen et al. 1984) is the most widely used general cognitive measure in clinical trials of AD (Connor and Sabbagh 2008; Ihl et al. 2012; Rozzini et al. 2007). The ADAS-Cog was developed as an outcome measure for dementia interventions; its primary purpose was to be an index of global cognition in response to antidementia therapies. The ADAS-Cog assesses multiple cognitive domains including memory, language, praxis, and orientation. Overall, the ADAS-Cog has proven successful for its intended purpose. Despite small effects of pharmaceutical interventions (Li et al. 2008; Birks et al. 2009) that some have considered too negligible to be worth the cost (Loveman et al. 2006), the ADAS-Cog has detected differences between treatment and placebo groups. There are two versions of the ADAS-Cog in terms of item content. The original version of the ADAS-Cog (Rosen et al. 1984), which we call the ADAS-Classic, was subsequently modified by Mohs and colleagues (1997) who added additional items. We refer to this modified version as the ADAS-Modified.

Although the ADAS-Cog was designed for people with AD, the ADAS-Cog has also been used as an outcome measure for trials of interventions in people with MCI. The utility of the ADAS-Cog in MCI has been shown to be limited (Mohs et al. 1997; Benge et al. 2009; Sano et al. 2011). In MCI, there may be relatively little cognitive decline to detect. Measurement imprecision in the outcome scale that would be tolerable in the AD stage (in which rates of decline are faster) may not be tolerable during the MCI phase. The extent to which a test is able to detect change, given that change has occurred, is referred to as responsiveness (Kirshner and Guyatt 1985; Husted et al. 2000; Beaton et al. 2001). It may be that the ADAS-Cog is insufficiently responsive for trials in MCI.

In general terms, there are two strategies to improve the responsiveness of an instrument, both of which have been applied to the ADAS-Cog. The first of these is to use a more optimal weighting system for the items. The second is to add additional item content. We discuss these strategies in the sections below.

More optimal weighting

The first strategy to improve the responsiveness of an instrument is to use a more optimal weighting system for the items. This strategy recognizes that there was little scientific rationale for the initial weights (item scores) assigned to the various tasks included in the ADAS-Cog (Rosen et al. 1984). Two different approaches have been applied to optimize weights for the ADAS-Cog to date: an approach based on recursive partitioning trees (Llano et al. 2011) and an approach based on the Rasch model (Wouters et al. 2008).

Llano and colleagues developed weights for the ADAS-Cog using recursive partitioning. In this approach, weights are derived based on maximizing differences between known groups. Llano and colleagues chose weights to maximize discrimination among normal, MCI, and AD in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort, the same data set we analyze here. We refer to the score they developed as the ADAS-Tree, for recursive partitioning trees. Llano and colleagues compared the ADAS-Tree with the Mini-Mental State Examination (MMSE; Folstein et al. 1975) and standard scoring of the ADAS-Cog with respect to several criteria. Not surprisingly, because of the way it was derived, the ADAS-Tree was superior to the MMSE and the ADAS-Cog in terms of differentiating between normal, MCI, and AD groups. The ADAS-Tree was also comparable to AD biomarkers of plaque formation (CSF Aβ42), neuronal degeneration (MRI volumetric measures and CSF ptau/Aβ42 ratio) and dysfunction (FDG-PET) in its ability to predict conversion from MCI to AD over a 12-month period. Whether the recursive partitioning approach will result in improvements in responsiveness is an open question; there is no particular reason to suspect that improved discrimination between groups (a cross-sectional phenomenon) would necessarily be associated with improved responsiveness (a longitudinal phenomenon) (Kirshner and Guyatt 1985).

An alternative method of more optimal weighting uses modern psychometric theory. In this approach, only relationships between items are considered when deriving the weights (as opposed to recursive partitioning, where the known groups are used along with the item level data to generate the weights). One (or more) latent traits are posited to underlie the covariation among observed responses to test items. All modern psychometric theory models include a parameter for item difficulty. Estimates of ability levels then take account of the difficulty levels of the items. The modern psychometric approach assigns weights to optimally assess the underlying ability levels, whatever their effect on differences between groups. If discrepancies between groups in item responses are driven by differences in underlying ability levels, then cross sectional group differences from the recursive partitioning approach and the modern psychometric approach should be similar.

Wouters and colleagues (2008) used a modern psychometric approach to improve the accuracy of the ADAS-Cog. We refer to their score as ADAS-Rasch because it was based on the Rasch model. In both the Rasch model and the models we considered here, the difficulty level of the items is explicitly modeled. In the Rasch model, however, the items are each posited to have equal weights, while in the models considered in this paper, item weights are modeled empirically.
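The contrast between equal and empirically estimated item weights can be written out for the simplest case. The following is an illustrative sketch only: it assumes a dichotomous item and a logistic link, whereas many ADAS-Cog items are polytomous, and the symbols (θ_i for the ability of person i, b_j for the difficulty of item j, a_j for the discrimination of item j) are notation introduced here rather than taken from the cited papers.

```latex
% Rasch model: every item has the same (unit) discrimination
P(X_{ij} = 1 \mid \theta_i) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}

% Two-parameter model: the item-specific discrimination a_j acts as an empirical weight
P(X_{ij} = 1 \mid \theta_i) = \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}}
```

Setting a_j = 1 for every item in the second expression recovers the first, which is the sense in which the Rasch model is the more constrained of the two.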

To summarize: the first approach to improve the responsiveness of the ADAS-Cog for people with MCI is to consider changes to the weights of the items. Two papers have used these approaches. The first used recursive partitioning to maximize differences between groups. The second used modern psychometric theory. We will use a more flexible modern psychometric model in this paper.

Adding additional content

A second approach to improving the responsiveness of the ADAS-Cog for MCI is to add additional content. This approach may be important when trying to improve responsiveness in particular, as additional indicators may be necessary to increase the ability to detect small differences in underlying ability. This strategy has already been applied to the ADAS-Cog. In its initial manifestation, the ADAS-Classic (Rosen et al. 1984) did not include delayed word recall or number cancellation, which were added in the ADAS-Modified (Mohs et al. 1997). The purpose of those additions was to broaden the scope of cognitive domains covered and the range of symptoms consistent with mild to moderate AD (Mohs et al. 1997). Our work here can be considered a further extension of those earlier efforts, in which we aim to add item content to improve coverage of the range of symptoms consistent with MCI rather than mild to moderate AD.

We were interested in extending the ADAS-Cog’s ability to detect differences in levels of executive functioning (EF). Recent studies have shown EF deficits to predict the conversion from MCI to AD in amnestic MCI populations (Nakata et al. 2009; Brandt et al. 2009). Performance in instrumental activities of daily living (IADL) and longitudinal change in IADL have been shown to be related to EF abilities (Cahn-Weiner et al. 2002; Farias et al. 2006).

We also considered adding informant reports to further extend the reach of the ADAS-Cog for people with MCI. Several studies have found informant-based appraisals of functional status to be reliable indices of cognitive and functional deficits in dementia participants (Koss et al. 1993; Butt 2008; Mackinnon et al. 2003). A study of one measure, the Functional Activities Questionnaire (FAQ) (Pfeffer et al. 1982), found functional deficits in 72 % of amnestic MCI individuals, suggesting that these deficits may have utility in the detection of early cognitive decline (Brown et al. 2011). These findings, along with evidence from other studies, suggest that informant-based questionnaires may also have utility in detecting cognitive and functional changes in MCI populations (Sabbagh et al. 2010).

An important consideration for any measure is that it must be valid, that it measures the thing it is supposed to measure. Furthermore, when augmenting an existing measure, the augmented measure should measure essentially “the same thing” as the original. Careful theoretical considerations and data analyses are needed to ensure that this is the case. Content experts consider additional content by referring to conceptual models of what the original scale measures, and deciding whether the additional content can be considered consistent with the latent trait or ability measured by the original scale. Data then can be analyzed using confirmatory factor analytic approaches to ensure that the latent structure of the augmented measure is essentially the same. Similarly, association studies with external factors known a priori to be associated with the thing measured by the scale can be performed to ensure that relationships with these external factors are similar for the augmented measure. All of these techniques together are useful in determining whether the validity of a revised scale is similar to that of the earlier scale.

To summarize: the second strategy to improve the responsiveness of the ADAS-Cog for people with MCI is to add additional content. This is the strategy that was applied to the ADAS-Classic to produce the ADAS-Modified. We propose to further augment the ADAS-Cog by adding additional objective EF items and informant reports of functioning. Our goals were to determine whether we could add to the content of the ADAS-Cog without reducing its validity, and to compare the responsiveness of our augmented ADAS-Cog candidates to other approaches.

In the current study, we used both strategies to improve the responsiveness of the ADAS-Cog for people with MCI, employing weights derived using modern psychometric theory, and considering additional item content. We derived our measures using data from 810 Alzheimer’s Disease Neuroimaging Initiative (ADNI) participants, and validated them among the 394 participants with MCI.

Materials and methods

Overview

The current work examines the utility of using modern psychometric theory and adding measures of EF and informant-based reports of functional ability to the ADAS-Cog to increase responsiveness among people with MCI. We used data from people with MCI at baseline who had been followed for up to 36 months in ADNI. In a training subset of the data, we compared the responsiveness of ten potential expanded versions scored using modern psychometrics. We chose the best composites with EF alone and with EF plus informant items. In the validation subset of the data, we evaluated these two candidates along with ADAS-Cog scores (ADAS-Classic, ADAS-Modified, ADAS-Rasch, and ADAS-Tree). We performed four sets of analyses: responsiveness, ability to predict conversion from MCI to AD, strength of association with baseline magnetic resonance imaging (MRI) indices of AD pathology, and strength of association with baseline cerebrospinal fluid (CSF) biomarkers of AD pathology.

Participants

Data were obtained from ADNI, a longitudinal study designed to assess the rate of progression of MCI and Alzheimer’s disease using biological markers, clinical, and neuropsychological data. Participants were recruited from more than 50 sites across the United States and Canada. The target recruitment goal was 800 adults aged 55 to 90 including approximately 200 cognitively normal older adults, 400 people with MCI, and 200 people with early AD. Diagnosis of amnestic MCI required patient-reported memory complaints, objective memory deficits, intact functional activities, a Clinical Dementia Rating Scale (Morris 1993) global score of 0.5, and a Mini-Mental State Examination (Folstein et al. 1975) score of 24 or greater. Participants with AD met the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer’s Disease and Related Disorders Association criteria for probable AD (McKhann et al. 1984). Further details about ADNI, including participant selection criteria and study protocol, can be accessed online at www.loni.ucla.edu/ADNI. Of the 819 ADNI participants eligible at baseline, 810 had complete data for the ADAS-Cog and were included in this study (Table 1).

Table 1 Demographic, clinical, CSF and MRI data by baseline diagnosis (n = 810 with baseline ADAS-Cog data)

Measures

ADAS-classic

The original ADAS-Cog (Rosen et al. 1984) included 11 items assessing cognitive function. The domains include memory, language, praxis, and orientation. There are 70 possible points: 48 for the first 9 items, and 22 for the last two items, word recall and recognition. Test performance was assessed for errors in following ordered commands, naming of real objects and of fingers, constructional praxis (copying of geometric forms), ideational praxis (preparation of a letter for mailing), orientation, a 10-item word recall task, and a word recognition task with 12 target items and 12 foils. Higher scores reflect greater cognitive impairment.

ADAS-modified

The modified ADAS-Cog 13-item scale (Mohs et al. 1997) includes all original ADAS-Cog items with the addition of a number cancellation task and a delayed free recall task, for a total of 85 points. As in the parent instrument, higher scores indicate greater severity. According to Mohs and colleagues, the purpose of these additional items was to increase the number of cognitive domains and range of symptom severity without a substantial increase in the time required for administration.

Executive function measures

The first category of items we considered for adding to the ADAS-Cog was executive functioning items. These items are characterized as being “objective” in that the participant is directly observed performing the task and receives an easily identified score based on those observations. The ADAS-Cog includes several items of this type. The ADNI neuropsychological battery included the Trail-Making Test (TMT) A & B (Reitan and Wolfson 1985), the WAIS-R Digit Symbol Substitution Test (DSST) (Wechsler 1987), digit span (Wechsler 1987), and category fluency (Strauss et al. 2006) (see http://www.nia.nih.gov/research/dn/alzheimers-disease-neuroimaging-initiative-adni for complete details). The TMT (A & B) assesses attention, speed, and mental flexibility. For this test, participants connect numbers (TMT-A) and numbers and letters (TMT-B) in order. Time to complete was the outcome of interest for this measure. Trails A is truncated at 3 min and Trails B at 5 min. The DSST measures speed and information processing. Participants are instructed to refer to a number-symbol key at the top of the page and to write the corresponding symbols under the numbers as quickly as possible. For digit span, an auditory attention task, participants were asked to recall a series of numbers forward and backward. For category fluency, a measure of speed and flexibility of verbal thought, participants were asked to name as many items as possible in a specified category (vegetables); unique responses during the first minute were counted.

Functional assessment

The second category of items we considered for adding to the ADAS-Cog was informant-reported functioning. Unlike the objective items considered above, study staff do not directly observe these behaviors, and rely instead on the reports of informants. We thus considered two sets of augmented ADAS-Cog items. The first set included additional EF items only, and the second included additional EF items plus additional informant-reported functioning items.

The Pfeffer Functional Activities Questionnaire (FAQ) (Pfeffer et al. 1982) was used to assess informant appraisals of functional abilities. Among ADNI participants with MCI, most informants lived with the participant (80 %) and most were a spouse or child (91 %). Those living with the participant reported a mean of 131 h together per week, and those not living with the participant reported a mean of 13 h together per week. FAQ items are in the form of specific tasks for which the informant rates the level of independence from normal (0) to dependent (3). Despite the requirement of “intact functional activities,” there was considerable variability on the FAQ among MCI in ADNI (Table 1). For this study we selected a subset of 5 FAQ items that we thought would be most relevant to MCI: 1) “Writing checks, paying bills, balancing checkbook”; 2) “Assembling tax records, business affairs, or papers”; 3) “Playing a game of skill, working on a hobby”; 4) “Keeping track of current events”; and 5) “Remembering appointments, family occasions, holidays, medications.”

Magnetic resonance imaging

All ADNI participants completed neuroimaging at baseline, 6, 12, and 24 months. MCI individuals had additional neuroimaging at 18 and 36 months. All participants received 1.5 T structural MRI. Total volumes of the brain, the left and right inferior and lateral ventricles, left and right hippocampus, and left and right entorhinal cortex from the baseline structural MRI scan were used in the current study. We chose all four brain measures for their sensitivity to preclinical AD, but posited that the hippocampus and entorhinal cortex would primarily relate to memory processes, and may not have an improved relationship to cognition with the additions of functional and executive function measures (Devanand et al. 2007; Evans et al. 2010). Additionally, total intracranial volume was obtained as a covariate. Images were processed using Freesurfer software (http://surfer.nmr.mgh.harvard.edu), an atlas-based approach that has been validated for use in participants with a great deal of morphologic variability (Desikan et al. 2006). The current analyses utilized processed imaging data from the ADNI database. These data are publicly available on the UCLA Laboratory of Neuroimaging (LONI) website (www.loni.ucla.edu/ADNI) along with detailed information about ADNI neuroimaging instrumentation, image acquisition, and image processing (http://www.loni.ucla.edu/ADNI/Research/Cores/index.shtml).

CSF

ADNI investigators obtained CSF samples from approximately 50 % of participants with MCI at baseline. Measures from the CSF included beta amyloid 1–42 (Aβ1-42), tau protein (Tau), 181-phosphorylated tau (Tau181p), tau-to-Aβ1-42 ratio (Tau/Aβ1-42), and Tau181p-to-Aβ1-42 ratio (pTau181p/Aβ1-42). Sample collection and analysis procedures are described in detail by Shaw et al. (2009). Values for these biomarkers were available at http://www.adni-info.org.

Test development: more optimal weighting

All latent-trait models were developed in the ADNI baseline data (n = 810), using Mplus (5.2) with the theta parameterization and the WLSMV estimator (Muthén and Muthén 1998–2007). Mplus can model a maximum of 10 categories for categorical items. We collapsed items with more than 10 categories, using a strategy for maintaining variability in the tails at the expense of maintaining variability in the middle of the distributions (Online appendix 1). We first formed a model for the modified ADAS-Cog, using all 13 items from the expanded version. We considered a single-factor model and a bi-factor model that accounted for methods correlations between the three word recall and recognition items, and the four items rated by the interviewer. We freely estimated loadings and fixed the variance of the primary factor at 1, leading to scores with a mean of 0, and standard deviation of 1 in the ADNI baseline sample. We then used item parameters from the baseline models to compute scores at follow-up visits.

A test information curve shows how precisely the latent trait is modeled over the ability spectrum. In a bi-factor model, the assumptions underlying the typical formulas for computing test information (Baker and Kim 2004, chapter 3) are not valid. We simulated multiple response patterns based on a range of underlying true values of the latent trait represented. We computed the maximum likelihood estimates (MLE) of a score for each of the simulated response patterns. The inverse of the variance of the MLEs at each of the underlying values forms a measure of how precisely the collection of items can measure an individual’s cognitive function.
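The simulation procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used with the Mplus models: it assumes a unidimensional two-parameter logistic model with dichotomous items instead of a bi-factor model with ordinal items, and the item parameters, the grid of candidate scores, and all function names are hypothetical.

```python
import math
import random
import statistics

# Candidate trait values for a grid-search MLE: -4 to 4 in steps of 0.05 (an assumption).
GRID = [g / 20.0 for g in range(-80, 81)]


def prob_endorse(theta, a, b):
    """Two-parameter logistic response function (illustrative stand-in model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_pattern(theta, items, rng):
    """Draw one 0/1 response pattern for a person with true trait value theta."""
    return [1 if rng.random() < prob_endorse(theta, a, b) else 0 for a, b in items]


def log_likelihood(theta, items, pattern):
    """Log-likelihood of a response pattern at a candidate trait value."""
    total = 0.0
    for (a, b), x in zip(items, pattern):
        p = prob_endorse(theta, a, b)
        total += math.log(p) if x else math.log(1.0 - p)
    return total


def mle_theta(items, pattern):
    """Maximum likelihood estimate of the trait, restricted to the grid."""
    return max(GRID, key=lambda t: log_likelihood(t, items, pattern))


def monte_carlo_information(theta, items, n_sims=300, seed=0):
    """Approximate test information as 1 / variance of MLEs across simulated patterns."""
    rng = random.Random(seed)
    estimates = [mle_theta(items, simulate_pattern(theta, items, rng))
                 for _ in range(n_sims)]
    return 1.0 / statistics.variance(estimates)


if __name__ == "__main__":
    # Hypothetical item bank: (discrimination, difficulty) pairs.
    items = [(1.5, -2 + 4 * i / 19) for i in range(20)]
    for theta in (-1.0, 0.0, 1.0):
        print(theta, round(monte_carlo_information(theta, items), 2))
```

Because the grid is bounded, estimates pile up at the boundary for extreme true values, which artificially shrinks the variance there; the sketch is therefore most informative away from the ends of the grid.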

Test development: adding additional items

We reviewed the ADNI battery and selected indicators we thought would be responsive in MCI. Decisions regarding test inclusion were determined by consensus of a group of clinical neuropsychologists. In all, we came up with 10 theoretically justifiable sets of additions to the ADAS-Cog (Table 2). Eight of these included additional EF items, and two included both additional EF items and additional informant-based items from the FAQ. Our next step was to narrow the list down to 2 candidate models, one with only directly observed EF data, and a second with a combination of directly observed EF data and informant reported data.

Table 2 The ADAS-Modified and the augmented ADAS-Cog models evaluated in the training sample. All models include the 13 items in the ADAS-Modified

Many of the imaging analysis strategies ADNI has developed use separate training and validation subsets of the data to avoid overly optimistic findings due to deriving and testing an analytic procedure in the same individuals. ADNI has developed specific subgroups to ensure consistency across different analytic approaches. We used participants in the ADNI-specified training sample to evaluate our 10 tests. We picked our candidate tests based on their responsiveness among those with MCI at baseline. We operationally defined responsiveness using mixed models with random intercepts and slopes and an unstructured covariance matrix, controlling for age, education, gender and presence of one or more APOE-ε4 alleles (Table 2). Visit month was converted to years for use as the measure of time. To be able to compare the measures directly, we focused on the standardized regression coefficients (Z-statistics). The larger this Z statistic, the greater the responsiveness of the instrument.
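The idea of comparing instruments by the Z-statistic for time can be illustrated with a deliberately simplified calculation. The sketch below is not the mixed model described above: it swaps in a two-stage summary approach (an ordinary least squares slope per participant, then the mean slope divided by its standard error) and ignores covariates and the random-effects covariance structure. The function names and data layout are hypothetical.

```python
import math
import statistics


def ols_slope(times, scores):
    """Least-squares slope of one participant's scores on time in years."""
    t_mean = statistics.fmean(times)
    y_mean = statistics.fmean(scores)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, scores))
    return sxy / sxx


def slope_z_statistic(trajectories):
    """Mean per-participant slope divided by its standard error.

    trajectories: list of (times_in_years, scores) pairs, one per participant.
    A larger absolute value means change over time is detected more precisely.
    """
    slopes = [ols_slope(times, scores) for times, scores in trajectories]
    mean_slope = statistics.fmean(slopes)
    standard_error = statistics.stdev(slopes) / math.sqrt(len(slopes))
    return mean_slope / standard_error
```

With two scores that track the same underlying decline, the score with less measurement noise yields the larger Z-statistic, which is the property the comparison across candidate tests relies on.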

Test validation

We conducted four sets of analyses comparing our two best candidates against the ADAS-Classic, ADAS-Modified, ADAS-Rasch, and ADAS-Tree, in participants with MCI at baseline.

First, we assessed the responsiveness of each of these tests in the validation sample using two approaches. In the first approach, we used z scores with the same modeling strategy and covariates as described above for the training sample.

In the second approach, we used the coefficients for time in years and the adjusted residual standard deviation from these models to determine sample sizes needed to detect a 25 % reduction in the rate of decline in 12 months, with 80 % power and two-sided alpha = 0.05. A smaller sample size needed to detect a given amount of change indicates a more responsive instrument.
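A sample size calculation of this kind can be sketched with the standard normal-approximation formula for comparing two group means. The exact formula used is not stated here, so the following is an assumption: it treats the treatment effect as 25 % of the expected 12-month decline and uses the residual standard deviation as the standard deviation of the outcome. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(annual_decline, residual_sd, reduction=0.25,
                          years=1.0, alpha=0.05, power=0.80):
    """Per-group n to detect a proportional reduction in decline (normal approximation).

    annual_decline: estimated change per year on the outcome (the coefficient for time).
    residual_sd: standard deviation of the outcome around the fitted trajectory.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for the target power
    delta = reduction * annual_decline * years      # difference to be detected
    n = 2 * (z_alpha + z_beta) ** 2 * residual_sd ** 2 / delta ** 2
    return math.ceil(n)
```

The required n scales with the square of the ratio of residual standard deviation to annual decline, so an instrument that shows either faster measured decline or less noise needs fewer participants per group.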

Second, we assessed the ability of the EF measures to predict conversion from MCI to AD using accelerated failure time models with a Weibull distribution. We controlled these models for age, education, sex and presence of one or more APOE-ε4 alleles and censored time at 36 months. We looked at both baseline scores as a predictor of AD at any subsequent visit, and at scores at the preceding visit as a predictor of AD at the current visit. All measures were standardized to a mean of 0 and a SD of 1 before analysis.
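For readers less familiar with accelerated failure time models, the general form can be written as follows. This is a generic textbook parameterization, not the specific model output: T_i is the time to conversion for person i, z_i is the standardized cognitive score, c_i is the vector of covariates, and ε_i follows a standard minimum extreme value distribution, which makes T_i Weibull distributed. All symbols are notation introduced here.

```latex
\log T_i = \beta_0 + \beta_1 z_i + \boldsymbol{\gamma}^{\top}\mathbf{c}_i + \sigma\,\varepsilon_i,
\qquad
\mathrm{TR} = \exp(\beta_1)
```

Under this form, the time ratio TR is the multiplicative change in time to conversion associated with a 1 SD difference in the score, holding the covariates fixed.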

Third, we assessed the association of baseline volumes of whole brain, the ventricles, hippocampus, and entorhinal cortex with baseline cognition in linear regression models, controlling for age, education, sex, presence of one or more APOE ε4 alleles, and intracranial volume.

Fourth, we assessed the association of baseline CSF-based phenotypes with cognition using logistic regression models, controlling for age, education, sex, and presence of one or more APOE ε4 alleles. The CSF measures were dichotomized at Tau > 99 pg/ml, Aβ1-42 < 192 pg/ml, Tau181p > 23 pg/ml, Tau/Aβ1-42 > 0.39, and pTau181p/Aβ1-42 > 0.10, as in Shaw et al. (2009).

Results

Test development

Initial fit for a single-factor model for the ADAS-Cog was inadequate. The Comparative Fit Index (CFI) was 0.937 (criterion for excellent fit: CFI > 0.95), the Tucker-Lewis Index (TLI) was 0.951 (excellent fit: TLI > 0.95), and the root mean squared error of approximation (RMSEA) was 0.092 (excellent fit: RMSEA < 0.05; acceptable fit: RMSEA < 0.08) (Reeve et al. 2007). Our final score was from a bi-factor model that accounted for residual correlations between word recognition and recall and between the four items based on interviewer report (Fig. 1). Fit was excellent, with a CFI of 0.990, TLI of 0.992, and RMSEA of 0.037. A plot of test information for the ADAS-bifactor is shown as the solid line in Fig. 2. The curve indicates measurement precision across levels of cognition. Information peaks around +2 SD, which means that the ADAS-bifactor has its best precision at high levels of cognitive impairment. An ideal curve for measuring change over time would have information above 12 for scores in the mild to more severe cognitive impairment range, corresponding to a standard error of measurement (SEM) of about 0.3 standard deviations. In the ADAS-bifactor, the information curve is less than 5 throughout, corresponding to an SEM of roughly 0.5 at best, with even less precision for lower levels of impairment.
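The correspondence between information and SEM quoted above follows from the usual reciprocal square-root relationship, worked here for the two information values mentioned. The notation I(θ) for test information at trait level θ is introduced here for illustration.

```latex
\mathrm{SEM}(\theta) = \frac{1}{\sqrt{I(\theta)}}, \qquad
I(\theta) = 12 \;\Rightarrow\; \mathrm{SEM} \approx 0.29, \qquad
I(\theta) = 5 \;\Rightarrow\; \mathrm{SEM} \approx 0.45
```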

Fig. 1 ADAS-Cog bifactor model

Fig. 2 Test information curves, derived using a Monte Carlo approximation to the test information

Listed in Table 2 are the candidate tests we considered for the augmented ADAS-Cog. We used participants in the training sample to select candidate models for further analysis on the basis of responsiveness. In the training sample, the ADAS-Modified was the least responsive of the tests, including all of our candidates. The most responsive test including only additional EF items added the vegetable fluency item (Fig. 3). Here we refer to this score as the ADAS-Plus-EF. The ADAS-Plus-EF bi-factor model fit well, with a CFI of 0.990, a TLI of 0.992 and a RMSEA of 0.037. Adding this item increased the information across the cognitive spectrum, but the overall level was still low (Fig. 2, dashed line).

Fig. 3 ADAS-Plus-EF bifactor model

The most responsive candidate test that included informant-based items included vegetable fluency, TMT A & B, DSST, and five FAQ items (Fig. 4). Here we refer to this score as the ADAS-Plus-EF & FA. The ADAS-Plus-EF & FA bi-factor model included a methods factor for the five FAQ items and fit well, with a CFI of 0.978, a TLI of 0.990, and an RMSEA of 0.053. The information curve (Fig. 2, dotted line) is higher than those of the original ADAS-Cog bifactor model and the ADAS-Plus-EF over all levels of cognitive impairment, including the mild to more severely impaired levels we hoped to target.

Fig. 4 ADAS-Plus-EF & FA bifactor model

Test validation

In the validation sample, ADAS-Tree was slightly more responsive based on z scores (z score = 12.04) than the ADAS-Plus-EF & FA (z score 11.81). The next highest z score was less than 11; standard scores for the ADAS-Modified had a z score of 10.70.

ADAS-Plus-EF & FA was the most responsive in terms of sample sizes needed to detect a 25 % change over 12 months. Only 547 per group would be needed to detect this amount of change for the ADAS-Plus-EF & FA, compared with 733 to 1409 for the other measures. ADAS-Plus-EF was a slight improvement over the ADAS-Cog bifactor model, but not as responsive as ADAS-Tree (Table 3).

Table 3 Z-statistics for time, in the validation sample, from mixed models for cognition, controlling for age, education, gender, and APOE-4 alleles. Sample size needed per group to detect a 25 % decrease over 12 months, with 80 % power and alpha = 0.05, two-sided

The ADAS-Plus-EF & FA was the strongest predictor of conversion to dementia. This was true both using baseline cognition and cognition at the previous visit (Tables 4, 5, 6). ADAS-Plus-EF did not improve prediction of dementia.

Table 4 Time ratios for dementia conversion among people with MCI, with 95 % confidence intervals (CI), controlling for age, education, gender and APOE-4 alleles. Ratios greater than one indicate a longer survival time.
Table 5 Z-statistics for MRI measures in regression models for the clinical outcomes, controlling for change in the clinical outcome, age, education, gender, APOE-4, and intracranial volume
Table 6 Z-statistics for baseline CSF measures, dichotomized as in Shaw et al. (2009), in regression models for the clinical outcomes, controlling for age, education, gender, and APOE-4

Both baseline ventricular volume and total brain volume were most strongly associated with the ADAS-Plus-EF & FA, compared to the other cognitive measures. The bifactor scoring of the ADAS-Cog was the second strongest. As expected, neither the ADAS-Plus-EF & FA nor the ADAS-Plus-EF was especially strongly associated with hippocampal volume or entorhinal thickness.

The ADAS-Plus-EF & FA was most strongly associated with all the baseline CSF measures except for p-tau, which was most strongly associated with ADAS-Tree.

Discussion

Major findings

Results from this study suggest that the addition of EF measures and informant-based functional items can improve the responsiveness of the ADAS-Cog among people with MCI. Likewise, these additions did not adversely impact validity. These findings suggest that simple and brief additions to the ADAS-Cog may enhance its responsiveness among people with MCI.

Impact of empirical weighting comparisons

The ADAS-Rasch, ADAS-bifactor, ADAS-Tree, and ADAS-Modified are all different ways of scoring precisely the same data. From a modeling perspective, the ADAS-Rasch can be considered a very constrained version of the ADAS-bifactor in which the secondary domains and residual correlations are constrained to 0, and all loadings are constrained to be equal. There was little to choose between the ADAS-Modified and the ADAS-bifactor, while both were better than the ADAS-Rasch. The ADAS-Tree appeared superior to the ADAS-bifactor and the ADAS-Modified in most analyses.

Impact of adding additional content

Augmenting the ADAS-Modified by adding additional content—especially when we added both EF and FA measures—improved the performance of the scale. The ADAS-Plus-EF & FA score was as responsive as the ADAS-Tree and more responsive than the other instruments. The ADAS-Plus-EF & FA performed as well as or better than the other measures for nearly all of our validity assessments. These findings support the utility of EF and FA items to increase the responsiveness of the ADAS-Cog.

The fact that adding functional items was beneficial is consistent with previous literature. Functional decline is a core clinical feature of dementia (McKhann et al. 2011), and appears to be emergent in at least some individuals with MCI (Brown et al. 2011). Because the ability to perform instrumental activities of daily living (IADL) rests upon intact executive function along with other cognitive abilities, it is expected that these items together would accurately classify participants diagnosed with dementia in the ADNI sample (Griffith et al. 2003; Tomaszewski-Farias et al. 2009).

It is noteworthy that our functional assessment items were derived from informant reports. Several studies have documented the utility of informant reports of functional status in dementia assessments (Butt 2008; Sabbagh et al. 2010). For example, Mackinnon and colleagues (2003) examined whether dementia screening sensitivity improved with an augmented measure combining the MMSE and the Informant Questionnaire for Cognitive Decline in the Elderly (IQCODE) in comparison to either measure alone. Similar to our findings, Mackinnon and colleagues found the augmented measure to be associated with increased sensitivity and prediction accuracy of dementia cases. Collectively, these findings suggest the possible utility of informant-based reports in the cognitive assessment of people with MCI and AD.

In some settings, inclusion of informant-based items may be more burdensome than it was in the ADNI study, which required an informant for enrollment. This is an important consideration, particularly for clinical trials. ADAS-Cog administration takes approximately 30–45 min. With the proposed additional measures, the ADAS-Plus-EF & FA may take up to 15 min longer to administer than the ADAS-Cog alone, including patient (8–15 min) and informant (5 min) input. While requiring input from an informant may be inconvenient, and in some cases not possible, it was evident in our analysis that including this information, when available, was particularly useful when considering data from people with MCI.

There may be some contexts in which it may not be acceptable to include informant reports alongside objective cognitive data. For example, the Food and Drug Administration (FDA) in the US may wish to consider cognition separately from functional impairments. Our confirmatory factor analyses suggested that the FA items, the EF items, and the ADAS-Cog items could be considered to be measuring the same underlying construct, which was consistent with the views of our participating neuropsychologists. A trial could conceivably collect both sorts of data, and report results separately for the ADAS-Plus-EF & FA in papers while using other scores (such as that from the ADAS-Plus-EF & FA except for the FA items) in applications to the FDA.

There are several strengths of this study. First, ADNI represents a rich longitudinal dataset with careful quality control procedures. Second, our approach of supplementary EF and FA scores leaves the original ADAS-Cog intact. This allows for backward compatibility and validation of the original measure. Third, the ADAS-Plus-EF & FA score increased responsiveness with minimal added participant burden.

There are potential limitations associated with our study. The data presented in this study are from a large multi-center cohort similar to those recruited into clinical trials. It is unclear if our findings would be comparable in an epidemiological study or a cohort with more ethnic diversity. Also, the augmented ADAS-Cog models were necessarily restricted to EF and functional measures administered by ADNI. It is possible that EF or FA measures not included in ADNI may function better to detect cognitive changes. Furthermore, we only considered candidate extensions of the ADAS-Cog that added EF and FA items; similar methods could be used to add additional content we did not consider here. Some of the differences in associations across scores were small and results could be different in a different sample. No tests were done to establish one test as significantly better than another using a statistical threshold. Lastly, to our knowledge this is the first attempt to add supplemental measures of EF and FA abilities to the ADAS-Cog. The applicability and efficiency of our proposed ADAS-Cog in a pharmacological clinical trial has yet to be determined.

Summary and conclusions

This study demonstrated that the addition of a few supplemental EF measures and FA items improved the responsiveness of the ADAS-Cog without impairing its validity. Future research should build on these findings and focus on developing an MCI general cognition instrument akin to the ADAS-Cog. A general cognition instrument tailored specifically for MCI populations has the potential to be useful in clinical trials of therapies targeting individuals at increased risk for dementia.