Introduction

Neuropathologic assessment at autopsy remains the gold standard for the diagnosis of Alzheimer’s disease (AD). However, important inconsistencies between laboratories have been described, mainly due to the diversity of staining methods and diagnostic approaches [1, 5, 6, 11]. As a consequence, and to give centers a common base of neuropathologic data useful to neurologists and neuroscientists contributing to the study of AD, consensus criteria for the diagnosis of AD were developed, the first being the Khachaturian criteria in 1985 [10]. Khachaturian highlighted the importance of quantifying neuritic plaques per mm² field of view and adjusting for age at death for a neuropathological diagnosis of AD. In the following years, the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) elaborated a standardized protocol in which the semiquantitative assessment of neuritic plaques, coupled with the patient’s age and clinical signs of dementia, resulted in a scoring system that purported to provide a level of certainty for the diagnosis of AD [12]. In 1997, the National Institute on Aging (NIA) and the Ronald and Nancy Reagan Institute of the Alzheimer’s Association suggested new guidelines (NIA-Reagan criteria) in which the CERAD score for neuritic plaques was combined with the topographic staging of tau neurofibrillary deposits proposed by Braak and Braak [4], and the degree of likelihood that AD was the cause of dementia was given [9, 21]. These latter criteria required a history of dementia, as they addressed the question of whether AD was the underlying cause of a patient’s symptoms. However, neither age nor the patient’s clinical history was a modifying factor for the diagnosis.

Since these pioneering criteria, knowledge of AD has advanced greatly, motivating the creation of updated criteria for the neuropathologic assessment of AD, the National Institute on Aging-Alzheimer’s Association guidelines (i.e., NIA-AA criteria) [8, 13]. These new diagnostic criteria use the CERAD Aβ pathology scoring system as well as the Thal Aβ phase (TAP) scoring system (Table 1) [18] for evaluation of the burden of Aβ pathology, in addition to the Braak et al. tau staging scheme.

Table 1 Schematics of the two different scoring systems for Aβ pathology in NIA-AA criteria

There are major differences between these two scoring systems. First, they evaluate different types of plaques. Despite the well-known difficulty in distinguishing neuritic from diffuse plaques and the fact that tau-positive neurites can be found in nearly all diffuse plaques [1, 17], CERAD is based on the specific assessment of neuritic plaques only, a subset of Aβ deposits that contain dystrophic neurites and are thought to reflect more mature plaque pathology. In contrast, TAP evaluates all Aβ plaques regardless of their morphology and the presence of associated neurites. Secondly, the staging systems differ in the method of plaque quantification. The CERAD scores are based on a semiquantification of the severity of neuritic plaque burden, while in TAP the Aβ phases reflect the spatial distribution of a dichotomized presence or absence of plaques across several neocortical, limbic and subcortical brain regions. This results in five distinct Aβ phases that are grouped into a three-tiered system in NIA-AA, which we refer to as TAP A1–A3 (Table 1) [18]. Thirdly, the evaluation of neuritic plaques in the CERAD scoring system is applied to a more limited set of neocortical brain regions than TAP [12]. Finally, the staining techniques recommended for the detection of Aβ plaques also differ: CERAD recommends the use of silver-based staining methods, such as the modified Bielschowsky method, or the use of amyloid-binding dyes such as ThioflavinS (ThS), while TAP uses immunohistochemistry (IHC) with specific antibodies to Aβ, which is a more sensitive method for the detection of all Aβ deposits [18].

However, the relationship between the CERAD and TAP scoring systems for Aβ plaques is not clear, which poses a problem for comparative studies that utilize these two different scoring systems as well as for interpreting data generated using the NIA-AA criteria for AD [8, 13]. Thus, the main goal of this study was to investigate the association between the CERAD and TAP Aβ scoring systems. There was a conspicuous association for advanced stages of disease but not for low or intermediate pathology. Nevertheless, we developed an algorithm by which CERAD scores can feasibly be imputed to TAP scores, which would allow cases scored by NIA-Reagan criteria to be included in future studies using NIA-AA criteria.

Materials and methods

Cases

The training set consisted of 82 cases obtained between 1998 and 2013 from the Center for Neurodegenerative Disease Research (CNDR) brain bank at the University of Pennsylvania: patients with a clinicopathological diagnosis of AD based on the burden of AD neuropathologic change (n = 40) and subjects without cognitive impairment at the time of death with a low burden of AD neuropathologic change (n = 42). In addition, a group of 41 cases, comprising patients with a clinicopathological diagnosis of AD (n = 24) and subjects without cognitive impairment (n = 17), was used for validation. Informed consent was obtained for all patients, and all procedures were performed in accordance with the local institutional review board guidelines. Patient demographics and clinical characteristics (gender, age at death, disease duration) were obtained from our integrated neurodegenerative disease database [19, 20] and are shown in Table 2.

Table 2 Patients included in the study

Histochemistry, immunohistochemistry, neuropathologic assessment and statistical methods

At autopsy, fresh tissue was sampled from representative regions of the brain, fixed in 10 % neutral buffered formalin (NBF), embedded in paraffin and sectioned at a thickness of 6 μm. The diagnostic assessment of cases was conducted as recently reviewed [19], and the focus here was on comparing the CERAD and TAP scoring systems. To that end, for the assessment of neuritic plaques by CERAD methods, ThS staining was performed as recommended by CERAD [12], although it must be acknowledged that sharp distinctions between neuritic and diffuse plaques are not always possible [1, 17]. For TAP Aβ grading [2, 18, 19], detection of Aβ plaques was carried out with IHC using a mouse monoclonal antibody specific for Aβ with formic acid pretreatment (NAB228, anti-Aβ1–11: 1:20,000, generated in CNDR) [7]. Phosphorylated tau was also detected with PHF1 [15] (1:1,000; courtesy of Dr. Peter Davies) to identify neurofibrillary tangle pathology, as previously reviewed [20].

Following the recommendations from CERAD [9, 12] and TAP Aβ scoring [2, 18], sections from different regions of the neocortex (frontal, parietal and temporal) were stained with ThS. The frontal and occipital lobes, hippocampus (including entorhinal cortex), amygdala, midbrain (including substantia nigra) and cerebellum were stained with NAB228. CERAD and TAP scores were tabulated for each case following the three-tiered system in the NIA-AA criteria and based on the evaluation of each of the regions studied in each case [8, 13].

Agreement between the different Aβ measures was calculated using a linear weighted kappa index. To measure the correlation between the different neuropathological scores, a Spearman rank correlation was applied.
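As a minimal sketch of the rank correlation named above, the following pure-Python function computes the Spearman coefficient as the Pearson correlation of average ranks; average ranks matter here because ordinal scores on a 0–3 scale tie heavily. The function names and the two example score lists are hypothetical illustrations and are not study data, and the sketch does not compute a p value.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical ordinal scores for eight cases (NOT data from this study).
tap_scores = [0, 1, 1, 2, 2, 3, 3, 3]
braak_scores = [0, 0, 1, 1, 2, 2, 3, 3]
rho = spearman_rho(tap_scores, braak_scores)
```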

Results

CERAD C3 corresponds to TAP A3

When Aβ aggregates in our cohort were analyzed with the two different grading systems suggested by the NIA-AA criteria, CERAD and TAP, there was an overall moderate agreement between scoring systems (kappa = 0.60). Subanalyses of the specific stages showed that 19/19 (100 %) cases in the CERAD C3 category corresponded to the TAP A3 score. Of the 32 subjects classified as CERAD C2, 15 (47 %) were graded TAP A3 and 14 (44 %) fell into the TAP A2 group, whereas 3 (9 %) had a TAP A1 score. The 12 subjects with a CERAD C1 score showed the greatest variability when compared with the TAP scoring system. Half of the CERAD C1 cases (6; 50 %) corresponded to a TAP A2 score, while the rest were equally divided between the TAP A1 group (3; 25 %) and the TAP A3 group (3; 25 %). Of the 19 cases without neuritic plaques, i.e., with a CERAD C0 score, 10 (53 %) were also free of Aβ pathology (TAP A0) when analyzed with Aβ IHC and accounting for all types of plaques. The rest of the cases fell into the TAP A1 (5; 26 %) and TAP A2 (4; 21 %) categories. None of the patients who had an initial score of CERAD C0 was classified as TAP A3 (Fig. 1).

Fig. 1

Comparison of agreement between CERAD C and TAP A scores. In AD and normal patients, the overall agreement between CERAD C and TAP A scores is moderate (kappa = 0.57). When assessing individual groups, CERAD C3 has very good agreement (100 %) with TAP A3, while CERAD C1 and C2 cases corresponded to a diverse range of TAP A values
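The agreement statistic can be illustrated with a short sketch. The function below is a generic linear weighted kappa for a square contingency table, with weights 1 − |i − j|/(k − 1); it is not the authors' software, and the function name is an assumption. The 4 × 4 table is transcribed from the case counts given in the text of this section (rows CERAD C0–C3, columns TAP A0–A3), with categories that are not mentioned taken as zero.

```python
def linear_weighted_kappa(table):
    """Linear weighted kappa for a square table of counts (rows x columns)."""
    k = len(table)
    if k < 2 or any(len(row) != k for row in table):
        raise ValueError("table must be square with at least 2 categories")
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            # Full credit on the diagonal, linearly less with distance.
            w = 1 - abs(i - j) / (k - 1)
            observed += w * table[i][j] / n
            expected += w * row_tot[i] * col_tot[j] / (n * n)
    return (observed - expected) / (1 - expected)


# Counts transcribed from the Results text (rows C0..C3, columns A0..A3).
crosstab = [
    [10, 5, 4, 0],    # CERAD C0 (n = 19)
    [0, 3, 6, 3],     # CERAD C1 (n = 12)
    [0, 3, 14, 15],   # CERAD C2 (n = 32)
    [0, 0, 0, 19],    # CERAD C3 (n = 19)
]
kappa = linear_weighted_kappa(crosstab)
print(round(kappa, 2))  # about 0.57 with the counts above
```

Linear rather than quadratic weights penalize a two-step disagreement (for example C0 versus A2) exactly twice as much as a one-step disagreement, which suits short ordinal scales such as these.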

Total Aβ is not helpful in associating CERAD C2 cases to TAP score

When cases in the CERAD C2 group were extrapolated to TAP scores, they were almost evenly distributed between the TAP A2 and A3 categories. Since the severity of total Aβ plaques, including neuritic and diffuse plaques detected by IHC, was shown to correlate with dementia in the oldest-old [16], we assessed the total burden of Aβ pathology in the frontal neocortex in a semiquantitative manner (0 = none, 1 = low, 2 = moderate, 3 = high). CERAD C2 cases with a high burden of total Aβ plaques corresponded to TAP A3 in a moderate percentage (11/17; 65 %), while a low or moderate burden of total Aβ corresponded to TAP A2 in a higher percentage (11/15; 73 %).

Association of TAP and CERAD scores with clinical dementia and neurofibrillary tangle pathology

TAP A3 category was associated with dementia and it showed 94 % sensitivity and 87.7 % specificity for the diagnosis of dementia of the Alzheimer type. Similarly, Braak stage V–VI showed a sensitivity of 82 % and a specificity of 94 %. Conversely, CERAD C3 category showed a sensitivity of 57 % and a specificity of 100 % for the presence of dementia of the Alzheimer type (Table 3). TAP and CERAD showed a similar correlation with Braak stage (r = 0.72, p < 0.0001) (Table 4).

Table 3 Association of TAP A, CERAD C and Braak B scores to dementia
Table 4 Association of TAP A and CERAD C scores to neurofibrillary tangle pathology (Braak B stage)
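For reference, sensitivity and specificity as used here follow the usual definitions from a 2 × 2 table of score category against clinical dementia. The sketch below is a minimal illustration; the function name and the counts passed to it are hypothetical and are not the counts underlying Table 3.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2 x 2 table counts.

    tp: demented cases in the category of interest (e.g., TAP A3)
    fn: demented cases outside that category
    tn: non-demented cases outside that category
    fp: non-demented cases in that category
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one demented and one non-demented case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only (NOT study data).
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
```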

Assessment of neuritic plaques with NAB228 and ThS

As noted above, the designation of neuritic versus diffuse plaques is difficult, with no consensus as to their definition or appearance [1, 17]. Despite this caveat, we attempted to assess the amount of neuritic plaque pathology on IHC-stained sections to determine whether antibody-based and ThS-based methods gave similar results. Two raters (SB and DI) compared neuritic plaques assessed with NAB228 and ThS in frontal and occipital lobes in a subset of 35 cases. The comparison showed moderate agreement, with kappa values of 0.73 and 0.74 for frontal cortex and 0.82 and 0.85 for occipital cortex.

Algorithm for the conversion of CERAD to TAP scores

In our cohort, we observed that although there is no direct relationship between CERAD and TAP scores, there is a trend in which lower CERAD scores correspond to lower TAP scores and higher CERAD scores correspond to higher TAP scores. Furthermore, in 99 % of cases (all except for 1 case in which the midbrain section was incomplete), Aβ aggregates followed the topographical distribution described by Thal et al. [18]. This raised the possibility that, by analyzing the anatomical distribution of Aβ in each of the CERAD groups, an algorithm could be developed to derive TAP scores from existing CERAD scores with high accuracy in our cohort (Fig. 2). Thus, we examined the minimal number of regions in which Aβ IHC must be performed to reliably convert CERAD scores to TAP. Cases classified as CERAD C3 can be transformed to TAP A3 with 100 % accuracy. If the CERAD C score is <3, but the patient has a Braak B3 score for tangles or has clinical symptoms of dementia, the case can also be converted to TAP A3 with 72 or 86 % accuracy, respectively. If subjects do not meet these conditions but are classified as CERAD C2, Aβ IHC of the midbrain discriminates those cases that will be TAP A3 (Aβ positive in midbrain) from those that are TAP A2 or A1 (Aβ negative in midbrain) with 100 % accuracy in our cohort. Cases with a score of CERAD C0 can be extrapolated to TAP A2 if they are Aβ immunoreactive in amygdala, with or without frontal cortex involvement, to TAP A1 if only frontal cortex and not amygdala is Aβ immunoreactive, and to TAP A0 if both frontal cortex and amygdala are negative for Aβ. There was one case out of the five in the cohort studied here that was negative for Aβ by IHC in frontal cortex and amygdala but was Aβ positive in occipital cortex and therefore corresponded to TAP A1.
Because of the variability of TAP scores in cases with CERAD C1, staining of the six sections proposed by Alafuzoff et al. [2] is suggested, as evaluation of these regions alone was shown to give agreement rates and assessment results similar to those obtained with the more numerous brain sections suggested by Thal [2, 8, 13, 18].

Fig. 2

Algorithm for the conversion of CERAD to TAP scores of subjects with AD neuropathologic change. CERAD C3 can be extrapolated readily to TAP A3. If CERAD C is <3 and Braak B is 3 or the patient was demented, the score can be extrapolated to TAP A3. For CERAD C2 cases, Aβ IHC of the midbrain will differentiate between TAP A3 and TAP A1 or A2 groups. For CERAD C0, Aβ IHC in amygdala and frontal cortex will differentiate between TAP A0 and TAP A1 or A2. CERAD C1 cases require Aβ IHC to be performed on all six brain regions discussed in the text to relate CERAD C1 to a TAP A score
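The decision steps of this algorithm can be sketched as a small function. This is one possible reading of the rules as described in the text and in Fig. 2, not a validated tool: the function name, parameter names and return labels are assumptions made for illustration, and the order of the checks follows the order in which the text presents them.

```python
from typing import Optional

FULL_WORKUP = "six-region IHC required"  # hypothetical label for CERAD C1


def impute_tap_score(
    cerad_c: int,
    braak_b: int,
    demented: bool,
    midbrain_positive: Optional[bool] = None,
    amygdala_positive: Optional[bool] = None,
    frontal_positive: Optional[bool] = None,
) -> str:
    """Return an imputed TAP A category, or a note on which IHC is needed."""
    if cerad_c not in (0, 1, 2, 3) or braak_b not in (0, 1, 2, 3):
        raise ValueError("CERAD C and Braak B scores must be 0, 1, 2 or 3")

    # Step 1: CERAD C3 maps directly to TAP A3.
    if cerad_c == 3:
        return "A3"
    # Step 2: CERAD C < 3 with Braak B3 or clinical dementia maps to TAP A3
    # (reported accuracy 72 and 86 %, respectively, so this step is less certain).
    if braak_b == 3 or demented:
        return "A3"
    # Step 3: remaining CERAD C2 cases are split by midbrain A-beta IHC.
    if cerad_c == 2:
        if midbrain_positive is None:
            return "midbrain IHC required"
        return "A3" if midbrain_positive else "A1/A2"
    # Step 4: remaining CERAD C0 cases use amygdala and frontal cortex IHC.
    if cerad_c == 0:
        if amygdala_positive is None or frontal_positive is None:
            return "amygdala and frontal IHC required"
        if amygdala_positive:
            return "A2"
        if frontal_positive:
            return "A1"
        # Caveat from the text: one such case was positive in occipital
        # cortex only and therefore corresponded to TAP A1, not A0.
        return "A0"
    # Step 5: remaining CERAD C1 cases are too variable for a shortcut.
    return FULL_WORKUP


print(impute_tap_score(2, 1, False, midbrain_positive=True))  # A3
```

Returning a note instead of a category when a required stain is missing keeps the sketch from silently guessing a score for cases that the text says need additional IHC.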

The algorithm was validated with an additional cohort of 41 cases that were clinically demented (n = 24) and non-demented (n = 17) with CERAD scores ranging from C0 to C3 (Table 2) obtaining a 98 % accuracy when transforming CERAD C scores to TAP A scores (Table 5).

Table 5 Results obtained in the validation group (n = 41) when inferring TAP A scores from CERAD C scores according to the algorithm

Discussion

Over the years, several schemes for the neuropathologic diagnosis of AD have been developed [2, 4, 8, 10, 12, 13] to overcome the inconsistencies observed between laboratories and to meet the need for consensus diagnostic criteria that allow centers to create common databases and collaborate in the study of AD. Recently, an update of the 1997 NIA-Reagan criteria has been proposed, resulting in the current NIA-AA criteria. Similar to the previous diagnostic guidelines for the assessment of AD pathology, the NIA-AA criteria are based on the assessment of Aβ and neurofibrillary tangles in addition to other pathologies. Furthermore, the NIA-AA criteria include two different scoring systems for the evaluation of Aβ pathology, CERAD and TAP. As noted above, there are significant methodological differences between the two scoring systems (i.e., type of plaques evaluated, method of plaque quantification, regions of the brain assessed and staining procedures used) that might influence research results, and for this reason we investigated the relationship between these two evaluation schemes.

When analyzing the correspondence between CERAD and TAP scores globally, the agreement was moderate, with a kappa value of 0.60. However, the correspondence between CERAD and TAP varied greatly depending on CERAD score, with CERAD C3 cases corresponding to TAP A3 with a 100 % agreement (Fig. 1). This high concordance is possibly due to the fact that CERAD C3 and TAP A3 represent advanced stages of disease where there are both maximal levels of cortical burden of neuritic pathology and spatial distribution of Aβ deposits throughout the brain. In contrast, groups with CERAD C score <3 did not show a good correlation with TAP scores, which most likely reflects the heterogeneity associated with more modest levels of Aβ pathology.

The CERAD C2 group was characterized by an almost equal split between TAP A2 and TAP A3 scores; in other words, a moderate amount of neuritic plaques does not seem to correlate directly with the anatomical distribution of β-amyloid deposits. We therefore studied this relationship in more detail. It has been shown that the total burden of Aβ plaques measured by IHC, which includes diffuse and neuritic plaques as well as other types of Aβ deposits, but not the burden of neuritic plaques alone, correlated with dementia in the oldest-old, suggesting a role for multiple types of Aβ pathologies or deposits in dementia [16]. We semiquantitatively measured total Aβ burden in the neocortex using NAB228 and examined its relation to TAP groups, which showed moderate correspondence. We had expected that a more extensive distribution of pathology throughout the brain would be associated with a higher burden of total Aβ in the cortex, particularly since neocortical Aβ pathology is seen in earlier stages of disease. This finding therefore suggests that the intensity of pathology in a brain region involved in early stages and the anatomical distribution over time might not evolve at the same pace. Further quantitative studies are required to confirm these observations.

A significant percentage of CERAD C0 cases, 9 out of 19 (47 %) in our cohort, were classified as either TAP A1 (n = 8) or TAP A2 (n = 1) indicating early stages of Aβ deposition. This may be related to the fact that Aβ IHC is more sensitive than ThS in detecting all forms of Aβ deposits, leading to higher TAP scores relative to CERAD scores. Thus, it is tempting to hypothesize from these data that the spread of diffuse plaques precedes the formation of neuritic plaques, but extensive evaluation across multiple CNS regions in a larger dataset is needed to clarify this observation. Importantly, following the NIA-AA criteria, some TAP A2 cases would be classified as intermediate AD neuropathologic change, which, depending on the Braak stage, may be considered sufficient to account for the dementia in a given case. This finding is of importance as it may have implications on how patients would be classified in future studies compared to previous classification schemes that might result in different clinicopathological correlations.

Because of this observation, we compared the two sets of diagnostic criteria for AD (i.e., NIA-Reagan and NIA-AA criteria) in our initial cohort of 82 cases. To obtain a degree of likelihood that the pathologic changes were the cause of dementia, NIA-Reagan criteria ask for a particular match between Braak stage and CERAD while excluding a wide range of combinations between the two scoring systems, which in part prompted the revised NIA-AA criteria. Therefore, of our 82 cases, only a limited number (n = 36) could be evaluated with the NIA-Reagan scheme. These cases had a good correspondence when determining AD as the cause of dementia, i.e., following NIA-Reagan criteria, five subjects had low-, 12 intermediate- and 19 high-probability of AD that corresponded to five low-, 12 intermediate- and 19 high-AD neuropathologic change by the NIA-AA criteria. However, 40 cases were not classifiable despite showing histological characteristics of AD pathology. Of these subjects, eight had no, 17 low, six intermediate and nine high levels of AD neuropathologic change with the new NIA-AA criteria. Interestingly, six cases were defined as not meeting NIA-Reagan criteria for AD based on the absence of neuritic plaques, tau pathology and clinical history of dementia, and were therefore considered not to have AD, with unremarkable or non-diagnostic CNS findings. However, four of these cases were classified as having low AD neuropathologic change with the NIA-AA classification based on the presence of diffuse plaques. Thus, the NIA-AA criteria could alter the way cases are categorized, representing a distinct change and constituting a possible modifier of results in future studies.

We also analyzed the association of the spatial distribution of Aβ deposits (i.e., TAP score), and semiquantification of neuritic plaques (i.e., CERAD score) with the presence of clinical signs of dementia and the burden of neurofibrillary tangle pathology (i.e., Braak stage). Although CERAD C3 showed a high specificity for the diagnosis of dementia, the sensitivity was low. Conversely, TAP A3 showed an excellent sensitivity and specificity for dementia before death indicating that the extension of plaques rather than the cortical presence of neuritic plaques correlates better with diagnosis of dementia. However, the correlation with Braak stage was similar between CERAD and TAP.

With the aim of simplifying diagnosis of cases in brain banks, we analyzed the possibility of evaluating CERAD score with Aβ IHC, avoiding the need for staining with two different techniques to determine CERAD and TAP scores. Overall, the agreement was moderate (49/70; 70 %) indicative of the difficulties associated with identifying different types of plaques by IHC, as described previously in an inter-laboratory IHC assessment of Aβ peptide [1].

Taking advantage of the tendency of the cases in our cohort to follow the hierarchical distribution of Aβ pathology described by Thal et al. [18], we developed an algorithm that relates CERAD scores to TAP scores with moderate to high reliability (Fig. 2) and that was validated with high accuracy in a second group of subjects. This would facilitate the imputation of Thal scores for cases in brain banks, limiting the time and high costs associated with updating pathology data to correspond to NIA-AA criteria and keeping the material useful for future research.

An important limitation of this study is the significant inter-rater variability that exists when assessing Aβ plaques [1, 11]. Although it is lower when assessing Aβ plaques than when quantifying neuritic plaques, inter-rater variability is still present, and this might result in variations in the correspondence of CERAD with TAP scores. It has been reported that the inter-rater variability for Thal Aβ phase classification is low [2], but caution is needed, as the use of antibodies with different sensitivities to detect Aβ plaques might introduce errors when classifying cases for TAP.

In summary, except for advanced stages of Aβ deposition corresponding to a CERAD score of C3, which can be reliably extrapolated to TAP A3 in AD, there is no clear relation between the semiquantitative evaluation of neuritic plaques (CERAD) and the topographical distribution of Aβ aggregates described by Thal et al. (TAP). Therefore, it is not possible to convert CERAD scores to a TAP category without further work-up. However, given the robust hierarchical distribution of Aβ in AD patients, we describe an algorithm that can be used to simplify the assessment of TAP scores in cases already diagnosed with CERAD methods, based on the analysis of a limited set of brain regions. Interestingly, TAP was more strongly associated with the clinical diagnosis of dementia in patients before death, and it might be a better criterion to grade amyloid plaques than the CERAD score.