Introduction

Alzheimer’s (AD) and Parkinson’s diseases (PD) are the two most common neurodegenerative disorders, together affecting > 6 million individuals worldwide [10, 27]. AD is defined neuropathologically by the presence of amyloid-beta (Aβ) plaques and tau neurofibrillary tangles (NFT), while PD is defined by the presence of Lewy bodies composed of alpha-synuclein (aSyn). The average age of a patient receiving an AD clinical diagnosis is ~ 80 years old [2], while the average age of a patient receiving a PD clinical diagnosis is ~ 60 years old [36].

PD is not the only disease defined by aSyn Lewy bodies. Rather, PD belongs to a group of “synucleinopathies” collectively called the Lewy body diseases (LBD). The LBD comprise PD, with or without dementia, dementia with Lewy bodies (DLB), and multiple system atrophy (MSA) [19], with the first two entities (PD and DLB) demonstrating neuronal aSyn Lewy bodies, while MSA shows aSyn inclusions in glia. Importantly, the distinction between DLB and PD with dementia (PDD) is clinical, based on the timing of development of dementia [31]. On neuropathological examination, DLB and PDD patients are nearly indistinguishable at the individual level. Furthermore, DLB and PDD share preclinical features, and shared genetic variants confer an increased risk in both disorders [4, 14, 28].

Despite traditional separation between AD and the LBD, growing evidence suggests a dynamic interaction between their pathophysiologies. Fifty to 80% of patients with a primary clinicopathological diagnosis of LBD have concomitant Aβ and tau pathology [38]. At autopsy, up to 40% of PD patients exhibit enough Aβ and NFT to qualify for a secondary diagnosis of AD [14]. Mechanistically, in vitro and in vivo studies suggest that aSyn, tau, and Aβ may interact synergistically in events leading to disease development [5, 44]. From a practical viewpoint, these findings suggest that LBD patients may be at risk for developing AD.

Genetic risk factors for developing AD have been identified through family studies and genome-wide association studies (GWAS). In a recent AD GWAS comparing > 50,000 cases with > 100,000 controls, 25 distinct loci were associated with risk for AD [24]. However, the genetic heritability (h²) reported for this study was only 0.071, and various genetic risk scores composed of AD GWAS-nominated variants have poor predictive value for AD in the general population [11].

We reasoned that the high prevalence of AD within the LBD population might enhance the ability of AD genetic risk variants to predict the development of AD pathology. Accordingly, we genotyped all common genetic variants reported in 2 or more AD GWAS to associate with AD risk in a single-center cohort of 208 consecutive cases with a primary clinicopathological diagnosis of either PD or DLB. We tested these AD risk variants for their ability to predict concomitant AD pathology in these cases, validating our best model in an additional 70 LBD cases from the multi-center National Alzheimer’s Coordinating Center (NACC) database.

Materials and methods

Participants

Clinical and neuropathological data from all autopsy cases enrolled between February 1985 and July 2019 at the University of Pennsylvania (Penn) Center for Neurodegenerative Disease Research brain bank were assessed [45]. A clinicopathological diagnosis was assigned to each case primarily determined by neuropathology and secondarily accounting for clinical history. We note that although some cases were banked decades ago, all cases have been reassessed using modern criteria and techniques. Those with (1) a primary clinicopathological diagnosis of PD, PDD, or DLB and (2) DNA available for genetic studies were included in the analysis. All cases included in this study had a clinicopathological diagnosis of DLB or PD with or without dementia; we excluded MSA to focus on primary neuronal synucleinopathies [17]. Of 1922 accessioned cases, 208 cases met the above criteria, with 127 cases having complete genotype data at the outset of our study (training set), and 81 more cases genotyped during the course of our study (test set).

The National Alzheimer’s Coordinating Center (NACC) database is a multi-center collection of clinical and neuropathological data from over 42,000 de-identified individuals across 39 past and present Alzheimer’s Disease Research Centers (ADRCs), as of March 2020 [3]. Through the Alzheimer’s Disease Genetics Consortium (ADGC), genetic data are also available for some of these individuals. Individuals with (1) evidence of Lewy body neuropathology, (2) a presumptive etiological diagnosis of Lewy body disease, (3) from a non-Penn ADRC, and (4) SNP genotypes for AD GWAS loci available through the ADGC, were included in the validation stage of our analysis [22, 23]. Cases with genetic mutations for familial AD were excluded. Seventy cases from 20 ADRCs met criteria and were included in the analysis. Informed consent was obtained from all participants prior to death at each ADRC.

Prior to conducting these studies, approval was obtained from the Penn Institutional Review Board, and informed consent was obtained from all participants prior to death. All procedures in these studies adhere to the tenets of the Declaration of Helsinki.

Immunohistochemistry and neuropathological staging

For the Penn cases, neuropathological characterization of defined brain regions (frontal neocortex, temporal neocortex, parietal neocortex, occipital neocortex, anterior cingulate gyrus, hippocampus including entorhinal cortex, amygdala, basal ganglia, thalamus, midbrain, pons, medulla, cerebellum) was conducted on all cases as previously described [1, 45]. Briefly, each brain region was assessed by hematoxylin and eosin stain in addition to immunohistochemical stains (NAB228 for Aβ generated by Dr. Trojanowski, PHF1 for phosphorylated tau gifted by Peter Davies, SYN303 for aSyn generated by Dr. Trojanowski, and 1D3 for phosphorylated TDP-43 gifted by Manuela Neumann and Elisabeth Kremmer) to assign a semi-quantitative score (none, rare, mild, moderate, or severe) for tau, Aβ, aSyn, and TDP-43 pathologies. For the NACC cases, neuropathological characterization of brain regions was conducted at individual ADRCs in accordance with established guidelines [21]. For all cases, an AD Neuropathologic Change (ADNC) score was also assigned in accordance with the National Institute on Aging’s guidelines for the neuropathologic assessment of AD. Absence of AD co-pathology was defined by an ADNC of None or Low, while presence of AD co-pathology was defined by an ADNC score of Intermediate or High [32].

Genotyping of AD risk variants

Single nucleotide polymorphisms (SNPs) in genome-wide association with AD risk were nominated from the literature. Three AD genome-wide association studies (GWAS) together examining > 70,000 AD subjects and > 380,000 controls were used to identify candidate SNPs [16, 24, 25]. SNPs reaching genome-wide significance (p < 5 × 10⁻⁸) in at least two of these three major GWAS were included in our study. Twenty independent loci reached criteria for inclusion.
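The nomination rule above (genome-wide significance in at least two of three GWAS) can be illustrated with a minimal sketch. The function name, the input layout, and the locus names and p values are hypothetical and chosen only for illustration. The sketch does not address how independence between loci was established, which the text does not describe.

```python
from collections import Counter

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold stated in the text


def nominate_loci(gwas_results, min_studies=2, threshold=GENOME_WIDE_P):
    """Return loci that are significant in at least `min_studies` studies.

    gwas_results: list of dicts, one per study, mapping locus name -> p value.
    (This input layout is an assumption made for the example.)
    """
    counts = Counter()
    for study in gwas_results:
        for locus, p_value in study.items():
            if p_value < threshold:
                counts[locus] += 1
    return sorted(locus for locus, n in counts.items() if n >= min_studies)


# Hypothetical p values for illustration only; not taken from any study.
example_studies = [
    {"LOCUS_A": 1e-20, "LOCUS_B": 3e-9, "LOCUS_C": 2e-6},
    {"LOCUS_A": 4e-15, "LOCUS_B": 1e-7, "LOCUS_C": 1e-9},
    {"LOCUS_A": 2e-30, "LOCUS_B": 6e-10, "LOCUS_C": 9e-5},
]
nominated = nominate_loci(example_studies)
```

In this toy input, LOCUS_C passes the threshold in only one study and is therefore not nominated.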

For the Penn cases, SNP genotype was determined by Illumina Global Screening Arrays (GSA), or TaqMan SNP Genotyping Assays, as previously described [7]. In some cases, proxy SNPs (D′ > 0.8 in the EUR reference population from 1000 Genomes Project Phase 3 [41]) were substituted, as indicated in the text. For the NACC cases, SNP genotyping was completed by the Alzheimer’s Disease Genetics Consortium (ADGC) via Illumina or Affymetrix high-density microarrays, as previously described [33].

Association of individual risk variants with ADNC

Logistic regression models were used to test for association between genotype at each SNP and the presence or absence of AD co-pathology in the Penn cohort (N = 208). Because the APOE locus has three alleles reported to have differential effects in AD [13, 39], we considered the number of APOE E2 and APOE E4 alleles separately. Additional analyses were performed with sex and age at disease onset as covariates in the logistic regression.

Logistic regression model predicting ADNC

Penn autopsy cases were split into training (N = 127, 61%) and test (N = 81, 39%) sets, which is within the range of optimal allocation proportions for large data sets with high data accuracy [8]. The training set comprised the first batch of 127 cases genotyped, for which data were available at the outset of the study, while the test set comprised the next 81 cases genotyped, for which genetic data were obtained during the subsequent replication step of our study. There was no overlap between training and test sets. Backwards stepwise regression was used to develop a binary classifier to predict the presence or absence of AD co-pathology in the training set, with age at disease onset and sex as covariates in the model. Comparison of Akaike information criterion (AIC) at each step was used to determine model fitness, and we estimated predictive performance by 10-fold cross-validation (100 iterations). The Hosmer and Lemeshow goodness-of-fit test [12] was used to evaluate the final logistic regression model developed in the training set.

Model performance at predicting AD co-pathology was assessed in both the Penn training and test sets using receiver operating characteristic (ROC) curves, generating an area under the curve (AUC) for both the training and test sets.
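As a reminder of what the AUC measures, the following minimal sketch computes it as the probability that a randomly chosen ADNC-positive case receives a higher predicted score than a randomly chosen ADNC-negative case, with ties counted as one half. The analyses in this study used the “ROCR” and “pROC” packages in R; this standalone function and its toy inputs are hypothetical and are not the study code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 outcomes (1 = AD co-pathology present).
    scores: iterable of predicted scores, in the same order as labels.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.8, 0.2, 0.4]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to chance-level ranking, while 1.0 corresponds to perfect separation of the two groups.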

The best-fit model was also applied to LBD subjects from the NACC database, with ROC curve analyses.

Development of the ADNC-RS

An ADNC risk score (ADNC-RS) was calculated for each case based on the best model developed in the Penn-based training set by multiplying the age at disease onset and each risk allele dose by its respective regression coefficient and summing these terms with the model intercept. Because our model is a logistic regression model, its output (the risk score) is in log odds. Log odds may be converted to odds by taking the antilog (\(\text{odds} = e^{\text{risk score}}\)), and the odds may then be converted to a probability using the standard formula \(p = \text{odds}/\left(1 + \text{odds}\right)\). Equivalently, the probability of AD co-pathology is \(p = e^{\text{risk score}}/\left(1 + e^{\text{risk score}}\right)\). Scores and probabilities were generated using the “predict” function in the “caret” package in R [20] from the logistic regression model.
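The conversion from log odds to probability can be sketched in a few lines. The function name is hypothetical, and the study itself generated probabilities in R; this Python version only restates the formula given above.

```python
import math


def log_odds_to_probability(risk_score):
    """Convert a logistic-regression risk score (log odds) to a probability."""
    if risk_score >= 0:
        # Algebraically identical to odds / (1 + odds); avoids overflow
        # of exp() for very large positive scores.
        return 1.0 / (1.0 + math.exp(-risk_score))
    odds = math.exp(risk_score)   # antilog of the log odds
    return odds / (1.0 + odds)    # standard odds-to-probability formula


p_at_zero = log_odds_to_probability(0.0)
```

A risk score of 0 corresponds to even odds, that is, a probability of 0.5; positive scores map to probabilities above 0.5 and negative scores to probabilities below 0.5.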

Additional details regarding statistical analysis

Analyses were conducted in R (https://www.r-project.org) and Prism 8 (https://www.graphpad.com/scientific-software/prism); R-scripts as well as Penn-based datasets are available in the Supplementary Methods as an Online Resource. The “caret” package was used for cross-validation and model generation [30]. The “ROCR” and “pROC” packages were used for creating and analyzing receiver operating characteristic (ROC) curves [37, 40]. T test, Wilcoxon rank-sum, or Fisher’s exact tests were used to assess differences between clinical variables, as indicated by the distributions of data. For all statistical tests, power was set at 0.8, alpha was set to 0.05, and all tests were two-sided.

Results

Penn LBD cohort characteristics

Two hundred and eight participants from Penn with a primary clinicopathological diagnosis of PD or DLB were included in this analysis. The mean age at clinical disease onset was 64.51 years (SEM 0.70) and at death was 77.67 years (SEM 0.55). The majority of these subjects [n = 163/208 (78.4%)] received clinicopathological diagnoses of PD; 108 of these PD individuals had dementia at the time of death, and 55 did not. Additional diagnoses for this cohort, as well as clinical and demographic details, are shown in Tables 1 and 2.

Table 1 Demographic and clinical characteristics of cohort
Table 2 Clinicopathological diagnosis of cohort

Only 43/208 (20.67%) of this LBD cohort had no ADNC at autopsy, while more than one-third had intermediate or high levels of ADNC (Fig. 1a, Supplementary Table 1, Online Resource). Among the group with no ADNC, 16.8% were identified as having primary age-related tauopathy (PART). In the Penn LBD cohort, virtually all individuals with amyloid plaques also demonstrated NFT; indeed, only four individuals had amyloid plaques without evidence of NFT. Representative immunohistochemical sections of anterior cingulate and middle frontal cortex demonstrating Lewy body pathology with co-occurring Aβ and tau NFT are shown in Fig. 1b.

Fig. 1

Alzheimer’s disease neuropathological change (ADNC) scores in N = 208 cases from Penn with a primary clinicopathological diagnosis of PD or DLB. a Number of subjects and % of whole cohort at each level of ADNC. b Representative immunohistochemical sections (160×) demonstrating Lewy body aSyn pathology alone in anterior cingulate (left panel, in red), concomitant Aβ (brown) and aSyn (red) pathology in anterior cingulate (middle panel), and Aβ (red) and tau NFT (brown) pathology in middle frontal cortex (right panel). aSyn pathology was detected with the MJFR13 antibody against phosphorylated aSyn. Aβ pathology was detected with the NAB228 antibody, and tau NFTs were detected with the 17,028 rabbit polyclonal anti-tau antibody

Clinical differences in PD/DLB patients with vs. without ADNC

Compared to PD/DLB subjects with absent or low levels of ADNC, subjects with intermediate-to-high levels of ADNC were older at disease onset (67.76 vs. 62.52 years, p < 0.001) and had shorter disease duration (11.41 vs. 15.27 years, p < 0.001). They were also more cognitively impaired, with lower MMSE scores (18.08 vs. 21.29, p = 0.02), and greater rates of clinical dementia (89.9% vs. 63.6%, p < 0.001, Table 1), prior to death. The mean time between the last MMSE and death was 2.66 years (SD 2.37).

Association of individual AD risk SNPs with ADNC in PD/DLB patients

Twenty genetic loci have been robustly associated with risk for developing AD by multiple GWAS [16, 24, 25] (Table 3). As shown in Table 4, the number of APOE E4 alleles was associated with increased risk for ADNC in PD/DLB (nominal p < 0.001). One other locus near SORL1, represented by rs11218343, approached but did not meet the significance threshold for association with ADNC (nominal p = 0.06). Adjusting for age at onset and sex minimally affected these results (Supplementary Table 2, Online Resource).

Table 3 Genetic loci nominated from Alzheimer’s disease GWAS literature
Table 4 Associations between individual genetic loci and presence of concomitant AD pathology

Development of a model predicting concomitant AD pathology in PD/DLB individuals

In a training set consisting of the first 127 PD/DLB individuals genotyped at Penn, we developed a logistic regression model to predict concomitant AD pathology (defined as intermediate-to-high levels of ADNC). We began by including genotypes at all 20 AD risk SNPs [16, 24, 25] (Table 3), age at disease onset, and sex in the model. We then used backward stepwise regression, with model selection based on the Akaike Information Criterion (AIC). For each model, we also estimated predictive performance by 10-fold cross-validation (100 iterations) within the training set (Fig. 2a).

Fig. 2

Backward stepwise logistic regression model selection for predicting concomitant Alzheimer’s disease (AD) pathology in N = 127 cases (training set) with a clinicopathologic diagnosis of PD or DLB from Penn. Concomitant AD pathology is defined as an AD Neuropathological Change (ADNC) score of Intermediate or High. a Akaike information criterion (AIC, left axis) at each step during model selection and the corresponding area under the receiver operating characteristics curve (AUC, right axis), estimated by ten-fold cross-validation, within the training set are shown. Initial model included all AD risk SNPs, sex, and age at disease onset as predictors; sequential elimination of predictors and effect on AIC and AUC are shown from left to right. As the training set cases showed no genetic variability at the TREM2 locus, this locus was not included in the model. b Coefficients (β), standard error (SE), and p values for the four predictors included in the best model (lowest AIC) for predicting concomitant AD in LBD cases

Our best model (by AIC) incorporated only four predictors: age at disease onset, number of APOE E4 alleles, and genotype at the BIN1 and SORL1 loci (Fig. 2b). The Hosmer–Lemeshow goodness-of-fit test for this model produced a χ²(8, N = 127) = 7.578, p = 0.4758, indicating no evidence of poor fit. The area under the receiver operating characteristic curve (AUC) for this model in our training set data (ten-fold cross-validation) was 0.751 (Fig. 3a), whereas the AUC for a shuffled version of our dataset in which ADNC positive vs. negative status was permuted (null model) was 0.479 (Supplementary Fig. 1, Online Resource).

Fig. 3

Performance characteristics of the best model for predicting concomitant Alzheimer’s disease (AD) pathology among Penn cases with a clinicopathological diagnosis of PD or DLB. Receiver operating characteristics (ROC) curves and areas under the curve (AUC) of the final model (with age at onset, number of APOE4 alleles, BIN1 genotype, and SORL1 genotype as predictors) in the training (a) and test (b) cohorts are shown. c The Alzheimer’s disease neuropathological change risk score (ADNC-RS) calculated from the best logistic regression model is shown for both the training set and test set cohorts. Individuals positive for ADNC showed higher average ADNC-RS. d The probability of concomitant AD pathology was calculated from the ADNC risk score for each case. Values above 0.5 have a high probability of concomitant AD pathology, while values below 0.5 have a low probability of concomitant AD pathology. The prevalence of concomitant AD pathology at each quintile of ADNC risk score in the training (e) and test (f) cohorts demonstrates fourfold enrichment for the presence of ADNC for individuals in the top quintile vs. individuals in the first two quintiles of risk. *p < 0.05

Model performance in test set

We applied the best model developed in our Penn-based training set to a Penn-based test set of 81 PD/DLB individuals whose data were never used to develop the predictor. Despite differences in the proportion of cases with concomitant AD pathology in the training set (46%) vs. the test set (26%), our model performed equally well in the test set, with an AUC of 0.781 (Fig. 3b).

We additionally performed a subgroup analysis, applying our predictor only in the subset of our 208 Penn cases with a clinicopathological diagnosis of PD or PDD (N = 163), which minimally affected the results (AUC = 0.728, Supplementary Fig. 2, Online Resource).

Development of an ADNC risk score

To develop a clinically useful tool, we used our logistic regression model to generate a continuous risk score (vs. binary outcome predictor) for concomitant AD pathology (ADNC risk score, or ADNC-RS). An ADNC-RS was calculated for each case using the following formula:

$$\text{ADNC-RS} = -7.97717 + 0.0636\,(\text{age at onset}) + 1.04327\,(\mathit{APOE}\ \text{E4 alleles}) + 0.45498\,(\mathit{BIN1}\ \text{risk alleles}) + 1.48933\,(\mathit{SORL1}\ \text{risk alleles}).$$
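For readers who wish to apply the formula, the following minimal sketch evaluates it for a single individual. The coefficients are those given in the formula above; the function name, the input validation, and the example individual are hypothetical and included only for illustration.

```python
# Coefficients taken from the ADNC-RS formula reported in the text.
INTERCEPT = -7.97717
BETA_AGE_AT_ONSET = 0.0636
BETA_APOE_E4 = 1.04327
BETA_BIN1 = 0.45498
BETA_SORL1 = 1.48933


def adnc_risk_score(age_at_onset, apoe_e4_alleles, bin1_risk_alleles, sorl1_risk_alleles):
    """Return the ADNC-RS (log odds of concomitant AD pathology)."""
    doses = {
        "apoe_e4_alleles": apoe_e4_alleles,
        "bin1_risk_alleles": bin1_risk_alleles,
        "sorl1_risk_alleles": sorl1_risk_alleles,
    }
    for name, dose in doses.items():
        # Allele dose at a biallelic SNP can only be 0, 1, or 2.
        if dose not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2, got {dose!r}")
    return (
        INTERCEPT
        + BETA_AGE_AT_ONSET * age_at_onset
        + BETA_APOE_E4 * apoe_e4_alleles
        + BETA_BIN1 * bin1_risk_alleles
        + BETA_SORL1 * sorl1_risk_alleles
    )


# Hypothetical individual: onset at 65 years, one APOE E4 allele,
# one BIN1 risk allele, two SORL1 risk alleles.
example_score = adnc_risk_score(65, 1, 1, 2)
```

For this hypothetical individual, the terms sum to −7.97717 + 4.134 + 1.04327 + 0.45498 + 2.97866 = 0.63374.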

The distribution of ADNC-RS across both the Penn-based training and test sets is shown in Fig. 3c. The ADNC-RS was significantly higher for PD/DLB individuals with concomitant AD pathology in both the training [M 0.241 (SEM 0.129) vs. M −0.622 (SEM 0.111), p < 0.001] and test sets [M 0.378 (SEM 0.139) vs. M −0.470 (SEM 0.113), p < 0.001], compared to those without concomitant AD pathology. For each case, the ADNC-RS was used to determine the probability of AD co-pathology; the distribution of predicted probability of AD co-pathology is shown in Fig. 3d. We found, importantly, that individuals with ADNC-RS in the highest quintile were four times more likely to have AD pathology than individuals with ADNC-RS in the lowest two quintiles. This enrichment was observed in both the training set (Fig. 3e) and the test set (Fig. 3f) individuals.
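The quintile comparison described above can be sketched as follows. This is an illustrative reimplementation, not the study code: the function name, the rank-based rule for splitting cases into five groups, and the toy data are all assumptions.

```python
def prevalence_by_quintile(scores, has_adnc):
    """Return the ADNC prevalence within each of five rank-based score groups.

    scores: risk scores, one per case.
    has_adnc: 0/1 (or bool) indicators of concomitant AD pathology, same order.
    Groups are ordered from lowest to highest scores.
    """
    if len(scores) != len(has_adnc):
        raise ValueError("scores and has_adnc must have the same length")
    if len(scores) < 5:
        raise ValueError("need at least five cases to form quintiles")
    ranked = sorted(zip(scores, has_adnc), key=lambda pair: pair[0])
    n = len(ranked)
    prevalences = []
    for q in range(5):
        # Integer boundaries so that the five groups partition all cases.
        group = ranked[q * n // 5:(q + 1) * n // 5]
        prevalences.append(sum(1 for _, y in group if y) / len(group))
    return prevalences


# Hypothetical scores and outcomes for illustration only.
toy_scores = [-1.8, -1.5, -1.1, -0.9, -0.4, -0.1, 0.2, 0.5, 0.9, 1.4]
toy_adnc = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
toy_prevalence = prevalence_by_quintile(toy_scores, toy_adnc)
```

Comparing the last element of the returned list with the pooled prevalence in the first two groups gives the kind of top-quintile versus bottom-two-quintiles enrichment reported in Fig. 3e and f.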

Validation of the ADNC risk score in LBD cases from the National Alzheimer’s Coordinating Center (NACC) Database

Having demonstrated that our logistic regression predictor and its associated ADNC-RS performed well in Penn-based individuals from both our training and test sets, we sought to validate this predictor in a national multi-site setting.

The National Alzheimer’s Coordinating Center (NACC) is a national database of clinical and neuropathological data from over 42,000 de-identified individuals across 39 past and present Alzheimer’s Disease Research Centers (ADRCs), as of March 2020. Genetic information for some patients is also available through the Alzheimer’s Disease Genetics Consortium (ADGC). Seventy individuals from 20 non-Penn ADRCs with autopsy-confirmed LBD neuropathology and presumed clinical etiology of LBD were included in this analysis. The mean age at disease onset was 70.49 years (SEM 1.03), and mean age at death was 80.41 years (SEM 0.98). Since many NACC patients are recruited from memory disorder clinics, this group was highly enriched for patients with dementia during life [n = 58/70 (82.9%)] and intermediate/high ADNC at autopsy [n = 62/70 (88.6%)], compared to the Penn-based PD and DLB cohort. Additional clinical and demographic details are shown in Table 5.

Table 5 Demographic and clinical characteristics of NACC validation set

Despite these differences in prevalence of ADNC, applying our best Penn-derived model to the NACC Validation set resulted in a ROC AUC of 0.754 (Fig. 4a), indicating comparable performance to that seen in our Penn-based training (0.751) and test sets (0.781). LBD individuals from the NACC database with AD co-pathology exhibited higher average ADNC-RS than those without AD co-pathology [M 0.552 (SEM 0.109) vs. M −0.179 (SEM 0.244), p = 0.018] (Fig. 4b). Despite the NACC database’s enrichment for patients with ADNC, higher ADNC-RS continued to correlate with a higher prevalence of AD co-pathology (Fig. 4c).

Fig. 4

Performance characteristics of the best model for predicting concomitant Alzheimer’s disease (AD) pathology among non-Penn, NACC cases with neuropathological evidence of Lewy bodies and presumed clinical diagnosis of LBD. a Receiver operating characteristic (ROC) curve and area under the curve (AUC) of the final model (developed in the Penn-based training set, with age at disease onset, number of APOE4 alleles, BIN1 genotype, and SORL1 genotype as predictors) are shown. b The Alzheimer’s disease neuropathological change risk score (ADNC-RS) calculated from the final model is shown for LBD cases from the NACC. Individuals positive for ADNC showed higher average ADNC-RS. c Despite the NACC database’s enrichment for ADNC-positive individuals compared to the Penn-based cases (training and test sets combined), the ADNC-RS correlated with prevalence of ADNC in both groups

Discussion

In this study, we performed an in-depth analysis of 208 PD/DLB cases from Penn to determine whether common genetic variants associated with risk for AD by GWAS might predict which individuals would develop concomitant AD pathology. We first demonstrated that concomitant AD pathology is highly prevalent in PD/DLB patients, with over one-third of the Penn cohort exhibiting intermediate-to-high levels of ADNC. We next evaluated a set of 20 common genetic variants found by multiple AD GWAS to associate with risk for AD, examining their association with ADNC in PD/DLB and developing a best-fit logistic regression model predicting the presence of intermediate-to-high ADNC in these primary neuronal synucleinopathies. A best-fit predictor incorporating only age at disease onset and genotype at 3 SNPs achieved moderately high performance (AUC 0.75–0.78) in both the training set in which it was developed and a held-out test set. From our logistic regression model, we developed a continuous metric, the ADNC-RS, and demonstrated that this simple tool could identify a population of LBD individuals at very high risk for development of concomitant AD pathology. Finally, we applied our logistic regression model and associated ADNC-RS calculator to LBD cases from the national, multi-site NACC database, validating its performance (AUC = 0.754) in a set of 70 cases recruited outside of Penn.

Our findings have clinical implications. Both “proteinopathies” defining ADNC—plaques composed of Aβ and NFT composed of tau—are targetable with drugs in clinical trials now, and, in clinical AD, immunological approaches targeting Aβ have shown enough promise to proceed to Phase III trials [6, 35]. However, within the clinical AD spectrum, the need to identify individuals ever-earlier in the course of pathophysiology [43] to see benefit with these therapies has created considerable problems with feasibility, not to mention potential burden to the healthcare system should any of these therapeutics attain FDA approval. These practical issues have been compounded by the fact that genetic risk scores based on AD GWAS-nominated variants achieve only very modest predictive value in the general population, where the absolute prevalence of AD is relatively low [11]. The performance of such genetics-based risk scores may be vastly improved in a population enriched for the presence of AD pathology, however [9].

Patients with primary clinical diagnoses of PD during life (and LBD at autopsy) represent exactly such an AD pathology-enriched population. Indeed, the prevalence of concomitant AD pathology in this group has been reported to range from 38 to 70%, depending on the definition of AD pathology used, and on whether clinical diagnosis of PD or primary pathological diagnoses of LBD is used [38, 42]. Our study corroborates these findings, with ~ 38% of PD/DLB individuals from Penn demonstrating an intermediate to high degree of ADNC, and only ~ 20% showing no ADNC. As a consequence, in this enriched population, the logistic regression model developed here achieves an AUC of ~ 0.781.

More important from a practical perspective, we use the predictors (and associated weights) identified in our model to develop a risk score for ADNC (the ADNC-RS) that can identify those PD/DLB individuals most likely to exhibit ADNC at autopsy. Indeed, in both our Penn-based training and test sets, those individuals with ADNC-RS in the top 20% are four times more likely to develop ADNC than LBD individuals with ADNC-RS in the bottom 40%, while in the NACC validation set, higher ADNC-RS still correlated with higher likelihoods of individuals having ADNC, despite the NACC database’s bias towards individuals with ADNC. Because the ADNC-RS requires knowledge of only the age at disease onset and genotype at 3 AD risk SNPs, it can be easily calculated in most settings using results from a simple blood sample. Thus, the ADNC-RS developed here might serve as a screening step enriching for those PD/DLB individuals who warrant assessment for development of ADNC using more expensive modalities such as Aβ or tau imaging. Moreover, as plasma biomarkers for AD are emerging now [15, 18, 29, 34], future studies incorporating plasma biomarkers with the clinico-genetic predictor described here may further improve accuracy.

How certain can we be of our model and associated risk score? While the definitive answer to this question will lie in future studies investigating other cohorts, several aspects of our current study increase confidence. First, we nominate candidate genetic variants for inclusion in model development in an unbiased manner, starting with all loci reported to associate with risk for AD across two or more major GWAS. Second, we use strict criteria that are widely accepted in the field for defining ADNC. Third, in the first two stages of our study, we employ a training set/test set design in our analyses, with each group defined by consecutive genotyping of autopsy cases diagnosed with PD or DLB. Such a design guards against over-fitting, and our results confirm that we are not over-fitting the training set data, since performance in the test set is as high as in the training set. Indeed, because completion of our test set cases followed completion of our training set cases, these two subgroups had different levels of concomitant AD pathology (46% of cases with concomitant AD pathology in the training set vs. 26% in the test set), but the ADNC-RS performed equally well in enriching for individuals with AD co-pathology in both subgroups. Finally, we validated our findings in a multi-site group of LBD individuals recruited outside of Penn, finding that the ADNC-RS performed equally well in a dataset highly enriched with concomitant AD pathology (88.6% of cases).

Limitations of the current study should be considered alongside the previously mentioned strengths. First, although our sample sizes of 208 neuropathologically characterized PD/DLB cases from Penn and 70 LBD cases from NACC are not small, a larger sample, across multiple centers, would be a valuable addition to the work presented here. Second, further investigations of the cognitive consequences of ADNC in PD or DLB patients would add clinical depth to our findings. Third, because the focus of this study was neuropathological, we defined our cohort neuropathologically, rather than using a clinicopathological diagnosis of PD. That said, a subset analysis of the 163 individuals in our Penn-based LBD cohort with a clinical diagnosis of PD yielded near-identical results. In the future, however, a clinically defined study in a PD population, verifying the presence or absence of ADNC by imaging, could extend the current work. Finally, we recognize that the LBD cases in our NACC Validation set may differ clinically from the PD/DLB cases characterized at Penn, because most NACC participants are recruited at memory disorders clinics. That said, we selected for only the NACC LBD cases whose clinical diagnosis was presumed to be LBD (n = 70 out of 559 NACC cases with autopsy and genetic data). Moreover, in thinking about the potential clinical uses of our predictor, we are encouraged by its high performance in this Validation set, since heterogeneity is the norm rather than the exception in most clinical contexts.

In addition to the clinical implications discussed above, the biological implications of our study are also worth considering. Specifically, the genetic loci identified in our final model predicting ADNC in LBD individuals were APOE, BIN1, and SORL1. Many functions for APOE have been reported, but a consistent finding over many years is that the APOE E4 allele (included in our predictive model) encodes a form of this protein that binds Aβ less efficiently [46]. BIN1 encodes a protein that functions in beta-secretase 1 trafficking, which in turn can impact the production of Aβ. SORL1 encodes the sortilin-related receptor 1, which also functions in intracellular trafficking, including the sorting of APP to the retromer pathway for degradation or to the endosome-lysosome system, where APP is cleaved to generate Aβ. Collectively, the fact that our best predictive model incorporates these three genetic loci underscores the importance of Aβ production and processing in the development of ADNC among LBD individuals. In addition, direct interaction between BIN1 and tau regulates tau phosphorylation, which may affect the development of AD pathology via a different route [26]. Interestingly, among these three genetic loci, the SORL1 locus exerted the strongest effect in our model, with a coefficient of ~ 1.5 compared to ~ 1 for the APOE locus. As the SNP at the SORL1 locus is relatively rare (minor allele frequency of 0.04), however, its contribution to AUC may be seen in only a small fraction of individuals. In contrast, in the general population, APOE has by far the largest effect size among common genetic risk factors for AD.

In summary, we present our findings from a study of 208 PD/DLB cases at Penn, validated in 70 additional LBD cases from the multi-site NACC database, demonstrating that age at disease onset and genotype at 3 SNPs are sufficient to identify a subset of LBD individuals at very high risk for development of concomitant AD pathology. The development of molecular tools such as the ADNC-RS reported here may in turn be permissive for strategies to target Aβ and tau accumulation in PD and other LBD.