A recently published research framework for Alzheimer’s disease (AD) proposes that AD can be diagnosed on the sole basis of two biomarkers: β-amyloid and pathologic tau [1]. This guidance departs importantly from previous criteria for AD: living individuals with high amyloid and tau burden would be diagnosed as having AD even if they had no objective or subjective memory or cognitive difficulties. Clinical AD is a multifactorial syndrome, and the new proposal aspires to untangle this Gordian knot by focusing on a specific neuropathologic process putatively defined as AD.

We argue here that diagnosing AD based on two biomarkers alone, ignoring subjective and objective cognitive assessments, is a mistake until we are certain that these biomarkers are the central causal mechanisms for symptomatic AD. Diagnostic criteria are rules for measuring the presence or absence of a disease, so the standards we typically apply when evaluating measurements also apply to diagnostic criteria. Measurement innovations can powerfully accelerate research, but their benefits must be weighed against potential disadvantages. We argue that any change in a research framework for diagnosing AD should be evaluated on three criteria (Box 1) and adopted only if it: (1) improves the validity of the diagnosis; (2) improves reliability or reduces cost, thereby increasing the statistical power achievable in new studies; and (3) fosters innovative, rigorous research by reducing the potential for bias and promoting scientific discoveries. The new research framework as currently proposed is unlikely to fulfill any of these criteria. On the contrary, adopting this framework will be a setback for the field, muddling current AD research and chilling future scientific discovery.

Validity of current and newly proposed diagnostic guidelines

Validity is defined as the extent to which a measurement assesses the construct of interest, as opposed to other, potentially correlated constructs. What is the target of interest in AD research? Conceptually, we can separate the biological changes in the brain from the clinical consequences of those changes such as cognitive and functional deterioration. It is the cognitive and functional outcomes that distress patients and families and pose major health care challenges. From a public health and clinical decision-making perspective, the clinical syndrome of AD is the relevant outcome.

One argument for adopting a biomarker-based AD research framework is that the clinical syndrome of AD is influenced by multiple pathologies, and a biomarker-based criterion may help us focus on a limited set of physiologic processes. A narrower set of physiologic processes indicated by the selected biomarkers might be easier to understand, interrupt, or reverse than the complex, intersecting processes contributing to clinical AD. This reasoning advances science only if we have comprehensive biomarkers for the relevant neuropathological process and have demonstrated that this process is necessary and sufficient for the eventual development of clinical AD. Unfortunately, we cannot say these conditions are met for AD. The amyloid cascade hypothesis remains unverified, as illustrated by several pharmacological treatments that reduce amyloid burden but do not improve clinical manifestations [2]. Amyloid is characteristic of patients with clinical AD and correlates with future cognitive deterioration, but many people (about 30%) have substantial amyloid burden and no detectable cognitive consequences. Most people who are biomarker-positive will never develop the clinical disease [3]. Conversely, many people (about 25%) who meet the clinical diagnostic criteria for AD have no or limited amyloid burden [4]. The limited available evidence suggests that the correspondence between these biomarkers and clinical and cognitive outcomes may differ by age and race, perhaps because of the differential importance of vascular disease [5,6,7,8].

Additionally, we do not know whether amyloid is the initial etiologic insult that leads to AD or merely a biomarker of another pathologic process. It is critical to understand this before designating these biomarkers as sufficient to define the disease. For many diseases, there are strongly predictive biomarkers that are not themselves causal mechanisms. For example, elevated C-reactive protein (CRP) is a strong marker of coronary heart disease risk, but it is not a causal mechanism of the disease [9]. Until we understand the essential biological mechanisms of AD, we cannot be sure that amyloid and tau are valid measures of the most relevant biological processes leading to clinical AD. The important new approaches developed to measure amyloid appear to be valid measures of the presence of amyloid in the brain, but they have not yet been shown to be valid measures of the clinical syndrome of AD. The best evidence to date suggests that clinical AD emerges when multiple pathways converge [10,11,12].

The importance of amyloid in Alois Alzheimer’s original case description is sometimes invoked to justify using amyloid imaging as a gold standard in contemporary diagnosis. This argument is specious: the first patient identified with AD, a 51-year-old woman, was brought to Alzheimer’s attention because she was experiencing severe cognitive impairment, and she was also found to have substantial co-occurring neuropathological changes [13].

Improving efficiency of research

New diagnostic approaches often provide breakthroughs because they improve the efficiency of research and help us make faster scientific progress. For example, a new diagnostic approach that made identification of AD cases less expensive or less burdensome would allow us to enroll larger sample sizes and achieve more precise effect estimates. Alternatively, increasing measurement accuracy would allow us to learn more from smaller samples, for example allowing for more targeted recruitment in invasive clinical trials. Reliability is the extent to which variation in an instrument reflects variation in the construct of interest. More reliable outcome assessments provide more precise effect estimates with the same sample size. Increasing reliability improves statistical efficiency, but that advantage is eroded if the new measure is more expensive or burdensome.

We can calculate the net impact of proposed new measures from a few assumptions about the relative cost and relative reliability (i.e., the percentage of variance in the measurement that is due to a hypothetical pathologic process defining AD) of alternative measures. Consider a study of whether an exposure, such as physical activity, reduces the risk of AD. Given fixed resources to conduct the study, are we better off using neuropsychological assessments or amyloid imaging to assess whether the exposure influences the pathologic process of clinical AD? Which approach will maximize power? In Table 1, we show the net impact on power under alternative assumptions. In most plausible scenarios with current technology, power is much lower with imaging measures, and adopting these measures may increase the risk of missing important causes of clinical AD. The same calculations can be applied to evaluate alternative biomarkers, and the ratios may improve as technological innovations reduce cost.
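The cost-versus-reliability trade-off above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the calculation behind Table 1. It assumes a two-arm comparison with equal arms and a normal approximation to power. It also assumes that measurement error shrinks the observed standardized effect by the square root of reliability. The function names (`two_group_power`, `power_ratio`) and every numeric input (budget, per-participant costs, reliabilities, effect size) are hypothetical placeholders.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_group_power(true_effect: float, reliability: float, n_total: int,
                    alpha: float = 0.05) -> float:
    """Approximate two-sided power for a two-arm comparison (equal arms) of a
    continuous outcome, using a normal approximation (hypothetical helper).

    true_effect: standardized mean difference on the error-free construct.
    reliability: share of observed variance due to the construct (0 < r <= 1);
                 measurement error shrinks the observed standardized
                 difference by a factor of sqrt(reliability).
    """
    observed_effect = true_effect * reliability ** 0.5
    # SE of a difference in standardized means with n_total / 2 per arm is 2 / sqrt(n_total)
    noncentrality = observed_effect * (n_total ** 0.5) / 2
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(noncentrality - z_crit) + STD_NORMAL.cdf(-noncentrality - z_crit)


def power_ratio(budget: float, true_effect: float,
                cost_imaging: float, reliability_imaging: float,
                cost_cognitive: float, reliability_cognitive: float) -> float:
    """Power with imaging divided by power with cognitive testing at a fixed budget."""
    n_imaging = int(budget // cost_imaging)      # affordable sample size with imaging
    n_cognitive = int(budget // cost_cognitive)  # affordable sample size with cognitive tests
    return (two_group_power(true_effect, reliability_imaging, n_imaging)
            / two_group_power(true_effect, reliability_cognitive, n_cognitive))


# Hypothetical inputs, chosen only for illustration (not taken from Table 1):
ratio = power_ratio(budget=1_000_000, true_effect=0.2,
                    cost_imaging=5_000, reliability_imaging=0.9,
                    cost_cognitive=500, reliability_cognitive=0.5)
print(f"power ratio (imaging / cognitive): {ratio:.2f}")
```

With these made-up inputs, imaging buys higher reliability but a tenfold smaller sample, so its power comes out well below that of cognitive testing. The asymmetry follows from the square roots. Raising reliability from 0.5 to 0.9 enlarges the observed effect by only about 34% (√1.8 ≈ 1.34). By contrast, a tenfold cost difference shrinks √n by a factor of about 3.2. Other cost and reliability assumptions can be substituted to explore other scenarios.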

Table 1 Ratio of power in hypothetical studies using amyloid imaging versus cognitive assessments, considering tradeoffs in affordable sample size versus reliability under alternative scenarios
Box 1 Criteria for evaluating the new research framework for Alzheimer’s disease

Fostering rigorous and innovative scientific research

A serious potential drawback of the proposed research framework is that it may narrow our scientific vision instead of expanding it. Partly because of numerous failed trials, pharmaceutical industry interest in investing in research on β-amyloid and pathologic tau as therapeutic targets is diminishing. Measuring the proposed AD biomarkers is burdensome and expensive and cannot be done in research participants’ homes or in most clinical settings. These measurements may also be viewed skeptically by many potential research participants, particularly in populations that would benefit most from early diagnosis. The result of adopting this research framework will be even fewer study participants from communities already underrepresented in research, including racial/ethnic minorities, individuals of low socioeconomic status, and people in rural or medically underserved communities. These categories include the majority of US residents and the vast majority of all living humans. In other words, the proposed framework will work best for a small slice of white, highly educated people in middle- and high-income countries who live close to a major research university. By restricting the diversity of research participants, we also restrict the types of risk factors we can evaluate, which limits the generalizability and relevance of research results. Combined with the high unit cost of biomarker diagnosis, this will make the search for upstream risk factors in the general population increasingly difficult. We will not be able to assess, for example, whether AD is influenced by geographic or environmental exposures or by other risk factors that vary primarily across demographic groups.

AD research samples are already highly selected, and the two-biomarker framework would exacerbate this problem. Strong selection may also weaken the rigor of studies and threaten internal validity. In observational studies, where exposure cannot be randomized, selection bias can produce spurious correlations [14]. To illustrate how this could happen, consider that people with a family history of AD may be exceptionally motivated to participate in AD research. Such individuals may be willing to drive a long distance to the clinic and undergo uncomfortable or invasive procedures to participate. Subtle memory changes may make them even more motivated. In contrast, people with no family history of AD may ignore early feelings of memory decline because the possibility of developing AD is less salient to them, and they may therefore be disproportionately likely to decline study participation. This phenomenon may bias effect estimates or create entirely spurious associations between familial risk factors (e.g., genetics) and AD incidence [15]. As the barriers to participation grow, the selection bias introduced by such phenomena is also likely to grow.
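The family-history scenario can be illustrated with a small simulation. This is a hypothetical sketch, not an analysis of real data. Family history and eventual AD are generated independently, so the true association is null. The enrollment probabilities in `p_enroll` are invented to mirror the scenario above: memory changes raise enrollment only among people with a family history. Eventual AD is used as a stand-in for subtle early memory changes. All names and numbers are illustrative assumptions.

```python
import random
from collections import Counter


def risk_ratio(counts: Counter) -> float:
    """Risk of eventual AD with vs. without a family history, from cell counts
    keyed by (family_history, develops_ad)."""
    risk_fh = counts[(True, True)] / (counts[(True, True)] + counts[(True, False)])
    risk_no_fh = counts[(False, True)] / (counts[(False, True)] + counts[(False, False)])
    return risk_fh / risk_no_fh


def simulate_enrollment(n_people: int = 200_000, seed: int = 1):
    """Return (risk ratio in the whole population, risk ratio among enrollees)."""
    rng = random.Random(seed)
    # Hypothetical enrollment probabilities keyed by (family_history, develops_ad).
    # Without a family history, memory changes are ignored (same probability);
    # with a family history, memory changes increase motivation to enroll.
    p_enroll = {(False, False): 0.05, (False, True): 0.05,
                (True, False): 0.20, (True, True): 0.50}
    population, enrolled = Counter(), Counter()
    for _ in range(n_people):
        family_history = rng.random() < 0.20
        develops_ad = rng.random() < 0.10  # independent of family history: true effect is null
        key = (family_history, develops_ad)
        population[key] += 1
        if rng.random() < p_enroll[key]:
            enrolled[key] += 1
    return risk_ratio(population), risk_ratio(enrolled)


rr_population, rr_enrolled = simulate_enrollment()
print(f"risk ratio, whole population: {rr_population:.2f}")
print(f"risk ratio, enrolled sample:  {rr_enrolled:.2f}")
```

Under these inputs, the expected risk ratio among enrollees is about 2.2 (0.05/0.23 ≈ 0.217 versus 0.10). In the full population, it stays near 1. The association is created entirely by who enrolls.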

We acknowledge that the current diagnostic criteria for clinical AD have important challenges, for example arising from the need to disentangle developmental from neurodegenerative processes. Early identification of AD using the current clinical criteria could be influenced by reserve, such that for two individuals with identical neuropathology meeting a specific biomarker diagnosis for AD, only one may meet clinical diagnostic criteria (or they may meet clinical criteria at very different ages) because of different early life experiences and cognitive development. To design targeted and effective interventions, however, it may be important to distinguish determinants of cognitive development from determinants of neurodegeneration, even if both processes influence clinical cognitive impairment in late life. This is an important motivation for revisiting current criteria and should influence statistical analyses in AD research [16]. Unfortunately, the recently published biomarker-only criterion, based on an incomplete biological understanding of disease, does not solve this challenge.

An additional rationale offered for the new framework is the promise of identifying cases earlier in progression, before cognitive symptoms are manifest [17]. There is hope that such early identification will improve the potential for interventions to prevent disease development and will identify a more appropriate population for trial enrollment. This argument is specious because there is no need to redefine the disease to allow trialists to preferentially enroll people with heavy amyloid or tau burden. Further, future treatment strategies may not need to target amyloid-positive individuals. More fundamentally, there is insufficient evidence that amyloid is the earliest detectable physiologic change foreshadowing incident clinical AD. In vivo amyloid assessments have not been available long enough to demonstrate that they change earlier than more easily assessed markers, such as cognition, or even nonspecific physiologic changes such as declines in body mass index (BMI) [18,19]. The Dominantly Inherited Alzheimer Network (DIAN) approximated longitudinal data by aligning participants on their anticipated age of symptom onset. In DIAN, the logical memory trajectories of mutation carriers began to deteriorate relative to those of non-carriers at nearly the same point as CSF Aβ42 began to diverge [20]. The similar timing of early changes in cognition and in the biomarkers is especially notable given the known limitations in sensitivity of the logical memory test as a single indicator. It is not yet clear that, at the population level, amyloid biomarkers give us any earlier warning. We might better serve the goal of early detection by improving the reliability and range of cognitive assessments.

Redefining AD as equivalent to brain amyloidosis will create a Tower of Babel in current research. Although there are compelling hypotheses about the role of amyloid, as noted above, we still face substantial uncertainty regarding the biological mechanisms linking amyloid and cognitive decline. Is it possible that amyloid is a byproduct of a disease process, indicating neurodegeneration but not causing it? Could amyloid result from cellular efforts to recover from, or reduce the impact of, a cellular injury? Might amyloid increase cellular vulnerability to other biological insults while having little effect in otherwise healthy cells? If any of these possibilities is true, it is not appropriate to use amyloid burden as a primary diagnostic criterion. Such a definition could lead to the adoption of expensive interventions, many with significant side effects, that alter amyloid burden but provide no benefit for cognitive or functional well-being.

Why not adopt this new framework and change it later as we learn more?

Scientific hypotheses are constantly tested, revised, or falsified. The biomarker-based diagnostic framework is sometimes framed as a hypothesis, with the expectation that although imperfect, it is a step forward and we can improve it as we go along. Adopting a new criterion for a potentially fatal disease is not analogous to positing a testable scientific hypothesis in other settings, however. Redefining a disease criterion changes the course of research and makes it more difficult to evaluate alternative hypotheses. Premature adoption of new criteria, based on incomplete biological understanding and only accessible to a few highly advantaged individuals, could have many adverse consequences.

We should learn here from the numerous public health episodes in which serious mistakes occurred because we misunderstood the biology of biomarkers linked to disease. For example, premature ventricular complexes (PVCs) in post-myocardial infarction (post-MI) patients predict an increased risk of death. For years it seemed reasonable to assume that drug suppression of PVCs would decrease post-MI mortality, and standard practice was to attempt to suppress them. The Cardiac Arrhythmia Suppression Trial (CAST) later showed that such suppression increased mortality [21]. Prostate-specific antigen (PSA) was widely adopted as a standard screen for prostate cancer before we clearly understood how to distinguish tumors likely to be indolent from those likely to be aggressive [22]. As a result of this premature, widespread adoption of PSA criteria, over a million men were needlessly treated for cancers that would have remained innocuous, and prostate cancer treatment often has severe consequences for quality of life, such as impotence and incontinence. PSA is an important example because, once a test is enshrined as a diagnostic instrument, major financial and professional incentives create pressure to retain its standards [23]. A mistake that creates financial incentives for the status quo is very difficult to correct, and those incentives make objective scientific discussion and discovery more difficult. Changing the diagnostic criteria for AD will have numerous financial consequences, which should be considered when evaluating proposed changes. Such a change will also have immense personal consequences for the individuals diagnosed, most of whom will never manifest symptoms. AD is among the most feared diseases of late life [24]. In 2010, 66% of respondents in the US-based Health and Retirement Study believed that individuals with Alzheimer’s disease are not capable of making informed decisions about their own care [25]; the effect of AD-related stigma on biomarker-positive people is unknown.

Conclusions

Biomarkers present an extraordinary opportunity to evaluate, test, and revise our understanding of the biological mechanisms that may underlie cognitive change and neurodegeneration. If we make a very limited set of biomarkers definitional of disease, ignoring the cognitive syndrome, we throw away that opportunity. Research should focus on the prevention and treatment of the AD clinical syndrome, which manifests primarily as deterioration in memory and other cognitive abilities. We should therefore identify and verify a stronger mechanistic link between the proposed biomarkers and the ultimate clinical manifestations of the disease before codifying the biomarkers as diagnostic of the disease itself. Without such a link, a biomarker-only research framework will fail to accelerate scientific progress in preventing and treating AD. We fear this is a step toward defining these biomarkers as acceptable surrogate outcomes in clinical trials for the prevention or treatment of AD [26]. Such a move would be a financial boon for many, but a tragic betrayal of the interests of the millions of people whose lives are affected by clinical AD.