Data-driven causal model discovery and personalized prediction in Alzheimer's disease

Zheng, Haoyang; Petrella, Jeffrey R.; Doraiswamy, P. Murali; Lin, Guang; Hao, Wenrui

doi:10.1038/s41746-022-00632-7

Article
Open access
Published: 08 September 2022

Volume 5, article number 137, (2022)

npj Digital Medicine

Abstract

With the explosive growth of biomarker data in Alzheimer’s disease (AD) clinical trials, numerous mathematical models have been developed to characterize disease-relevant biomarker trajectories over time. While some of these models are purely empirical, others are causal, built upon various hypotheses of AD pathophysiology, a complex and incompletely understood area of research. One of the most challenging problems in computational causal modeling is using a purely data-driven approach to derive the model’s parameters and the mathematical model itself, without any prior hypothesis bias. In this paper, we develop an innovative data-driven modeling approach to build and parameterize a causal model to characterize the trajectories of AD biomarkers. This approach integrates causal model learning, population parameterization, parameter sensitivity analysis, and personalized prediction. By applying this integrated approach to a large multicenter database of AD biomarkers, the Alzheimer’s Disease Neuroimaging Initiative, several causal models for different AD stages are revealed. In addition, personalized models for each subject are calibrated and provide accurate predictions of future cognitive status.

Introduction

Among the top 10 causes of death in the United States, Alzheimer’s disease (AD) is the only condition without a viable treatment to cure or prevent it, or even significantly slow its progression¹. Failure to develop a successful disease-modifying therapy for AD, despite large investments of public and private resources, is rooted in its complexity^2,3,4. For instance, signaling pathway analyses of AD pathophysiology have implicated over 30 metabolic pathways and over 1000 chemical species⁴. Our incomplete understanding of how these mechanisms vary and interact at an individual level to create a clinically and biologically heterogeneous phenotype has resulted in attempts to treat patients with varying underlying pathophysiology in a similar fashion^5,6. Thus, failure to characterize and subtype AD at an individual level has been a major roadblock in the development of effective therapeutic strategies to slow or halt AD progression. The recent biological classification of AD, based on imaging and cerebrospinal fluid (CSF) biomarkers, represents a major step toward personalized prognoses and therapeutic strategies^7,8. The increasing availability of such data in large cohorts of subjects has made it possible to develop and test rigorous quantitative models of AD pathophysiology. For example, the Alzheimer’s Disease Neuroimaging Initiative (ADNI), a multicenter, prospective, naturalistic study that began in 2003, comprises four sequential studies (ADNI-1, ADNI-GO, ADNI-2, and ADNI-3) that have followed subjects for up to 15 years using genetic, blood- and CSF-based, imaging, and cognitive biomarkers. The abundance of data from this and similar multinational biomarker studies in AD calls for a rigorous, quantitative, data-driven modeling approach to analyze, integrate, and interpret data at the level of the individual, where it can have maximum clinical impact.

Several mathematical models of AD progression have been developed recently. For example, one mathematical model includes a cellular biologic system of neurons, glia, macrophages, amyloid-β aggregation, and tau to simulate and validate, at the cellular level, the mechanisms underlying the failure of several drugs in recent clinical trials and to suggest alternative approaches⁹. A mathematical modeling approach has also been used to describe the key AD clinical biomarkers, including pathologic hallmark biomarkers (beta-amyloid and tau), neuronal loss biomarkers, and cognitive impairment¹⁰. This model was parameterized and tested to successfully simulate the natural history scenarios of three subtypes of AD described previously¹¹: (1) early-onset autosomal dominant AD, (2) late-onset amyloid-first AD, and (3) late-onset tau-first AD.

Although these mathematical models bring new insights into AD progression and enable simulation of therapeutics, they are built upon a priori hypotheses of the AD pathophysiological network, which remains an open area of research¹². In fact, systems biologists have implicated dozens of pathophysiological pathways in AD, and our understanding of these networks and their interactions remains incomplete¹³. Moreover, there has been limited work on mechanistic modeling of clinically measurable AD biomarkers. Most research to date on the key AD biomarkers has been observational or correlational, and such approaches do not benefit from the tools of a more integrative systems approach that addresses disease mechanism^14,15.

Computational data-driven modeling approaches have already been successful in analyzing multidimensional clinical data in diseases such as cancer^16,17 and cardiovascular disease^18,19. Such approaches build mathematical models of patient populations from clinical, omics, and biomarker data and provide powerful new means to personalize these models based on individual data, yielding personal risk profiles. These data-driven models can simulate complex systems, helping to elucidate complex physiological interactions and to optimize personalized prevention and treatment strategies. Examples include statistical approaches such as Bayesian generalized linear models²⁰, Bayesian hierarchical models^21,22, and methods based on Markov chain Monte Carlo simulations^23,24,25,26 to analyze genome sequencing and biomarker dynamics. Recently, machine learning techniques, such as deep recurrent neural networks, have been used to predict AD progression²⁷.

In this paper, we develop a computational data-driven modeling framework to predict AD biomarker progression, together with a methodology to construct data-driven causal models at both the group and the individual-patient level. The method does not depend on any specific hypothesis of AD progression and extracts the causal model entirely from empirical data. More specifically, we derive the causal model from clinical biomarkers in the ADNI dataset. The causal model is learned from four biomarkers (amyloid-beta pathology, total-tau pathology, hippocampal volume, and cognitive decline) that describe AD progression. Moreover, we incorporate a disease progression score (DPS)²⁸ into the causal model to unify AD progression across subjects, since the onset age and the rate of progression may vary markedly within and across the different subject classes in ADNI.

Results

Here we demonstrate the effectiveness of the proposed data-driven causal model. First, we construct a population-based causal model that describes the biomarker dynamics for all eligible subjects in ADNI-1, including normal controls. With the population parameters fitted to the ADNI dataset, the population model describes the transition of AD biomarkers across three disease stages: cognitively normal (CN), late mild cognitive impairment (LMCI), and Alzheimer’s disease (AD). Second, we derive a population model for LMCI and AD subjects only. Third, we perform a Sobol sensitivity analysis^29,30 over the parameter space of the population model, which quantifies the contribution of each model parameter to the variance of the model output. Finally, based on the sensitivity analysis results, we construct a personalized model for each subject and provide personalized biomarker predictions for subjects who have more than four longitudinal biomarker data points.

A population model

We construct a causal model by fitting biomarkers of all subjects across the ADNI dataset. Since the causal model is a dynamic system expressed as ordinary differential equations (ODEs), we require at least two longitudinal data points for each subject. More specifically, we remove patients who do not provide at least two measurements for any one of the four biomarkers. The histograms in Fig. 1(a) summarize the available biomarker data in the ADNI dataset.

**Fig. 1: The ADNI dataset histogram and the calibrated causal model plots.**

Algorithm 1

Population model calibration algorithm to compute the population parameters w⁽¹⁾ and DPS parameters (α, β). See details in Methods section.

Input ${{{\boldsymbol{y}}}}={\{{y}_{ijk}\}}_{ijk}$, ${{{\boldsymbol{t}}}}={\{{t}_{ij}\}}_{ij}$.

Initialize α⁰, β⁰, and w⁰.

1: for l=1 to L do

2: for k ∈ {A, T, N, C} do ⊳ Population parameter calibration

3: ${\boldsymbol{w}}_{k}^{l}={\mbox{argmin}}_{{\boldsymbol{w}}_{k}}{\sum }_{(i,j)\in {\mathcal{I}}_{k}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}^{l-1}{t}_{ij}+{\beta }_{i}^{l-1};{\boldsymbol{w}}_{k}\right)\right)}^{2}$, with ${\boldsymbol{w}}_{k}^{l-1}$ as the initial guess.

4: ${\sigma }_{k}^{l}=\frac{1}{| {\mathcal{I}}_{k}| -2I-4}{\sum }_{(i,j)\in {\mathcal{I}}_{k}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}^{l-1}{t}_{ij}+{\beta }_{i}^{l-1};{\boldsymbol{w}}_{k}^{l}\right)\right)}^{2}.$

5: end for

6:

7: for i=1 to I do ⊳ Update DPS parameters

8: $({\alpha }_{i}^{l},\,{\beta }_{i}^{l})={\mbox{argmin}}_{{\alpha }_{i},{\beta }_{i}}{\sum }_{(j,k)\in {\mathcal{I}}_{i}}\frac{1}{{\sigma }_{k}^{l}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}{t}_{ij}+{\beta }_{i};{\boldsymbol{w}}_{k}^{l}\right)\right)}^{2}.$

9: end for

10: end for

Output w^L as the population parameter w⁽¹⁾, α^L, β^L.
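The alternating structure of Algorithm 1 (population parameters with the DPS fixed, then DPS parameters with the population parameters fixed) can be illustrated with a deliberately simplified sketch. It rests on stated assumptions that are not the authors' implementation: every subject has all K biomarkers at every visit, the ODE solution f_k is replaced by a hypothetical linear surrogate f_k(s; w_k) = w_k0 + w_k1·s so that both sub-problems become ordinary linear least squares, and β_i is initialized by centering each subject's DPS at zero rather than by the range rule used in the paper. The function name `calibrate_population` is illustrative.

```python
import numpy as np

def calibrate_population(y, t, n_iter=20, seed=0):
    """Alternating least squares in the spirit of Algorithm 1 (illustrative sketch).

    y: (I, J, K) biomarker values, assumed complete (no missing visits);
    t: (I, J) ages at each visit.
    Hypothetical surrogate f_k(s; w_k) = w_k[0] + w_k[1] * s stands in for the
    ODE solution so that both sub-problems are linear least squares.
    """
    rng = np.random.default_rng(seed)
    I, J, K = y.shape
    alpha = rng.uniform(0.0, 4.0, size=I)        # alpha_i^0 drawn from (0, 4)
    beta = -alpha * t.mean(axis=1)               # hypothetical: center each DPS at 0
    w = np.zeros((K, 2))
    sigma = np.ones(K)
    dof = max(I * J - 2 * I - 4, 1)              # |I_k| - 2I - 4, guarded against <= 0
    for _ in range(n_iter):
        s = alpha[:, None] * t + beta[:, None]   # DPS from the previous iteration
        X = np.column_stack([np.ones(I * J), s.ravel()])
        for k in range(K):                       # steps 2-5: population parameters
            yk = y[:, :, k].ravel()
            w[k] = np.linalg.lstsq(X, yk, rcond=None)[0]
            sigma[k] = max(np.sum((yk - X @ w[k]) ** 2) / dof, 1e-12)
        sqrt_wt = np.repeat(1.0 / np.sqrt(sigma), J)  # sqrt of weights 1/sigma_k, stacked by k
        for i in range(I):                       # steps 7-9: DPS update per subject
            # y_ijk - w_k0 = w_k1 * (alpha_i * t_ij + beta_i) is linear in (alpha_i, beta_i)
            A = np.vstack([np.column_stack([w[k, 1] * t[i], np.full(J, w[k, 1])])
                           for k in range(K)])
            rhs = np.concatenate([y[i, :, k] - w[k, 0] for k in range(K)])
            sol = np.linalg.lstsq(A * sqrt_wt[:, None], rhs * sqrt_wt, rcond=None)[0]
            alpha[i], beta[i] = sol
    return w, alpha, beta
```

In this toy, the scale and shift of (α_i, β_i) trade off against w_k, so fitted values rather than individual parameters should be compared with the data; with the real nonlinear ODE model each `lstsq` call would become a nonlinear least-squares solve.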

In Algorithm 1, the initial value of α_i is drawn at random from (0, 4), and the initial value of β_i is chosen so that −10 ≤ s_i(t) ≤ 20 at every measurement. We then obtain the population model in terms of the fitted DPS, shown in Fig. 1(b). The population model (black solid line) is learned across three disease stages, namely CN (orange circles), LMCI (green circles), and AD (blue hexagons). The gray area is the confidence interval of the population model: we sample the population parameters, w⁽¹⁾, from the posterior distribution (given by the simulation study), run the model 1000 times from the same initial condition, and plot the 95% confidence interval at every time point. The simulation study and diagnostic plots for the population model are shown in the Supplementary Materials. According to the population model, the biomarkers can be separated into three stages. In particular, CN and AD patients correspond to s < 0 and s > 0, respectively, while LMCI patients lie around s = 0. Moreover, the first three biomarkers (A_β, τ, and N) start at a steady state when s < 0 (CN), change gradually around s = 0 (LMCI), and finally approach another steady state (AD). Unlike the other biomarkers, ADAS continues to increase, which means that cognitive symptoms worsen as AD progresses.
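The pointwise 95% band described above can be sketched as follows. How the posterior draws of w⁽¹⁾ are produced is not reproduced here, so `param_samples` is simply any collection of parameter draws, and `model` is a hypothetical callable returning one trajectory on a DPS grid; both names are assumptions for illustration.

```python
import numpy as np

def pointwise_band(model, param_samples, s_grid, level=95.0):
    """Pointwise percentile band of model trajectories over parameter draws.

    model(params, s_grid) -> array of len(s_grid); param_samples: iterable of
    parameter draws (assumed to come from some posterior sampler).
    """
    runs = np.array([model(p, s_grid) for p in param_samples])  # (n_draws, n_grid)
    tail = (100.0 - level) / 2.0
    lo, hi = np.percentile(runs, [tail, 100.0 - tail], axis=0)  # per grid point
    return lo, hi
```

Each grid point gets its own 2.5th and 97.5th percentiles, so the band is a collection of marginal intervals rather than a simultaneous envelope over the whole trajectory.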

We also compare the population model with the sigmoid function fitting (black solid lines in Fig. 2). First, the population model provides relatively smooth transitions from one stage to another, while the sigmoid function fitting gives a more abrupt change for A_β at s ≈ 4. Second, the population model follows the biomarker cascade hypothesis: τ rises after A_β starts decreasing, N increases after τ, and C rises after N. In contrast, in the sigmoid function fitting A_β and τ change only after s = 0, while N and C change at s ≈ −3 and s ≈ −10, respectively, which does not follow this cascade ordering.

**Fig. 2: Subject biomarkers in ADNI data and the fitted sigmoid function.**

Since the CN group might not follow the same disease trajectory, we next derive a population model based on the LMCI and AD groups only. To better identify the biomarker dynamics among LMCI and AD patients, we fix the DPS parameters (α, β) obtained above and update only the causal model parameters, w, using Algorithm 1. Figure 1(c) shows the population model based on the LMCI and AD groups.

The transitions of the different biomarkers in Fig. 1(c) are similar to those in Fig. 1(b). However, the changes begin earlier in Fig. 1(c), since the LMCI and AD groups are prone to earlier cognitive decline. We summarize the parameters of these causal models in Table 1, which correspond to the results in Fig. 1(b) and (c).

Table 1 Population parameters w⁽¹⁾ of the calibrated causal models based on the ADNI dataset.

Sensitivity analysis

The quasi-Monte Carlo method is applied to compute the sensitivity indices; for more details about Sobol sensitivity analysis, refer to refs.^29,30. Taking C(0) as the output, Fig. 3(a) lists the nine most sensitive parameters according to the first-order and total-order sensitivity indices. The parameter with the largest first-order effect (S_m > 0.4) is associated with A_β.

The right panel of Fig. 3(a) shows the second-order interactions between pairs of parameters. The parameters associated with ${A}_{\beta }^{2}$ are always positively related to the other terms. While the A_β term with parameter w_A1 is almost always positively related to the other terms, the A_β term with parameter w_T3 is negatively related to all other parameters except those associated with ${A}_{\beta }^{2}$. However, these second-order contributions are small compared with the first-order ones.

Figure 3(b, c) shows how the sensitivities evolve with DPS. The first-order sensitivity of w_A1 decreases as DPS increases, which implies that the effect of A_β on cognitive decline switches from linear to nonlinear in later-stage disease. At the same time, the first-order sensitivities of some other parameters increase gradually, most notably those of w_C3 and w_C5. The second-order sensitivities eventually converge to zero; thus, the interactions among parameters weaken as the biomarkers reach equilibrium. Based on the results shown in Fig. 3(a), we select w_A1, w_A2, w_T4, w_T5, w_N4, w_N5, w_C3, and w_C5 as the most sensitive parameters for personalization by setting the threshold TOL = 0.01 in Algorithm 2.
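A minimal sketch of first-order and total-order Sobol indices, using the Saltelli (2010) estimator for S_m and the Jansen estimator for the total-order index. Two assumptions differ from the paper: plain pseudo-random sampling stands in for the quasi-Monte Carlo sequence, and the inputs are sampled uniformly on the unit hypercube (in practice one would rescale them to ranges around the population values). `model` is a hypothetical vectorized callable mapping an (N, d) parameter array to N outputs, for example C(0) from an ODE solver.

```python
import numpy as np

def sobol_indices(model, n_params, n_samples=100_000, seed=0):
    """First-order (Saltelli 2010) and total-order (Jansen) Sobol indices.

    model: vectorized callable, (N, n_params) array -> (N,) outputs.
    Inputs are sampled independently and uniformly on [0, 1]^n_params.
    """
    rng = np.random.default_rng(seed)
    A = rng.uniform(size=(n_samples, n_params))
    B = rng.uniform(size=(n_samples, n_params))
    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]))       # total output variance Var(z)
    first = np.empty(n_params)
    total = np.empty(n_params)
    for m in range(n_params):
        ABm = A.copy()
        ABm[:, m] = B[:, m]                      # A with column m taken from B
        fABm = model(ABm)
        first[m] = np.mean(fB * (fABm - fA)) / var         # estimates S_m
        total[m] = 0.5 * np.mean((fA - fABm) ** 2) / var   # estimates S_Tm
    return first, total

# Toy check: z = w1 + 2*w2 has exact indices S = S_T = [0.2, 0.8]
S, ST = sobol_indices(lambda W: W[:, 0] + 2.0 * W[:, 1], n_params=2)
```

For an additive model such as this toy, the first-order and total-order indices coincide; a gap between them signals the kind of parameter interactions shown in the right panel of Fig. 3(a).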

Personalized model and biomarker prediction

Algorithm 2

Personalized model calibration algorithm. The personalized parameters are initialized with the population model values. The personalized models are applied to subjects who meet the inclusion criteria, denoted i ∈ Ω.

Input longitudinal biomarker data {y_ijk} at {t_ij} with i ∈ Ω;

Input the DPS parameter values (α_i, β_i) for each subject i ∈ Ω;

Input the population parameter values w⁽¹⁾ (w for simplicity);

Input sensitivity threshold, TOL.

1: for m=1 to 21 do ⊳ First order sensitivity.

2: ${{{{\rm{S}}}}}_{m}(z)=\frac{{{{{\rm{Var}}}}}_{{w}_{m}}\left[{{{{\rm{E}}}}}_{{{{\rm{{w}}}_{ \sim m}}}}(z| {w}_{m})\right]}{{{{\rm{Var}}}}(z)}.$

3: if S_m(z)≥ TOL then

4: set w_m as a personalized parameter and denote as ${w}_{m}^{(2)}$ else

5: keep w_m as a population parameter.

6: end if

7: end for

8:

9: for i=1 to ∣Ω∣ do ⊳ Personalized model calibration.

10: for k ∈ {A, T, N, C} do

11: Denote the personalized parameters in k-th equation as ${{{{\boldsymbol{{w}}}_{k}}}}^{(2)}$.

12: ⊳ Select parameters to calibrate.

13: ${{{{\boldsymbol{{w}}}_{k}}}}^{(2)}={\arg \min }_{{{{{\boldsymbol{{w}}}_{k}}}}^{(2)}}\mathop{\sum }\limits_{j=1}^{M-1}{\left({\hat{y}}_{ijk}-{f}_{k}\left({\alpha }_{i}{t}_{ij}+{\beta }_{i};{{{{\boldsymbol{{w}}}_{k}}}}^{(2)}\right)\right)}^{2}.$

14: $P{A}_{ik}=\frac{{\hat{y}}_{iMk}-{f}_{k}\left({\alpha }_{i}{t}_{(iM)}+{\beta }_{i};{{{{\boldsymbol{{w}}}_{k}}}}^{(2)}\right)}{{\hat{y}}_{iMk}}\times 100 \% .$

15: ⊳ Compute prediction accuracy.

16: end for

17: end for

Output PA_ik for i ∈ Ω and k ∈ {A, T, N, C}.
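Steps 1-7 (threshold selection) and steps 13-14 (calibration on the first M − 1 points, scoring on the last) can be sketched for one subject and one biomarker. To keep step 13 linear, the ODE solution f_k is replaced by a hypothetical polynomial surrogate f(s; w) = Σ_p w_p s^p; with the real model, step 13 would call a nonlinear least-squares solver started at the population values. Function names are illustrative. Step 14 as written yields a signed relative error; how it maps onto the accuracy percentages reported in the text is not spelled out, so the sketch returns the relative error only.

```python
import numpy as np

def select_personalized(first_order, tol=0.01):
    """Steps 1-7: indices m with S_m >= TOL become personalized parameters."""
    return np.flatnonzero(np.asarray(first_order) >= tol)

def personalize_and_score(s, y, w_pop, selected):
    """Steps 13-14 for one subject and one biomarker (hypothetical surrogate).

    s: DPS values alpha_i * t_ij + beta_i for j = 1..M; y: measured values.
    Surrogate f(s; w) = sum_p w[p] * s**p is linear in w, so fitting a
    subset of w with the rest held fixed is ordinary least squares.
    """
    Phi = np.vander(s, N=len(w_pop), increasing=True)  # columns s**0, s**1, ...
    fixed = np.setdiff1d(np.arange(len(w_pop)), selected)
    w = np.array(w_pop, dtype=float)                   # start from population values
    # Fit only the selected parameters on the first M-1 points.
    rhs = y[:-1] - Phi[:-1, fixed] @ w[fixed]
    w[selected] = np.linalg.lstsq(Phi[:-1, selected], rhs, rcond=None)[0]
    pred = Phi[-1] @ w                                 # prediction at the M-th point
    return w, (y[-1] - pred) / y[-1] * 100.0           # relative error PA_ik in percent
```

Holding the insensitive parameters at their population values is what limits the number of unknowns per subject, which matters when only four or five data points are available.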

Next, we build personalized models and provide biomarker predictions for subjects whose data satisfy the following two criteria: (1) there are at least four measurements for each biomarker; (2) each biomarker changes monotonically with respect to DPS. Based on the first-order sensitivity analysis results shown in Fig. 3(a), we choose the eight most sensitive parameters as personalized parameters by setting TOL = 0.01 in Algorithm 2. For each subject, we denote the biomarker data as $\hat{\boldsymbol{y}}({s}_{j})={[{\hat{\boldsymbol{A}}}_{\beta }({s}_{j}),\hat{\boldsymbol{\tau }}({s}_{j}),\hat{\boldsymbol{N}}({s}_{j}),\hat{\boldsymbol{C}}({s}_{j})]}^{T}$ (j = 1, ⋯ , M), fit the sensitive personalized parameters of the population model w⁽¹⁾ using the first M − 1 data points, and test the prediction accuracy on the last data point by $\frac{\hat{\boldsymbol{y}}({s}_{M})-\boldsymbol{y}({s}_{M})}{\hat{\boldsymbol{y}}({s}_{M})}\times 100 \%$, where y(s_M) is the model prediction. A detailed procedure is outlined in Algorithm 2.

Figure 4 shows the biomarker trajectories of the personalized model with training (blue) and testing (red) data for one subject (pseudo ID = 18). Compared with the sigmoid function fitting, the personalized model provides better prediction accuracy: 97.3% (A_β), 95.9% (τ), 98.4% (N), and 95.1% (C), versus 95.5% (A_β), 90.8% (τ), 95.7% (N), and 63.4% (C) for the sigmoid function fitting. Because the sigmoid function fitting uses only the longitudinal data of the biomarker being predicted, its prediction of the cognitive score is less accurate.

**Fig. 4: The personalized model for one LMCI subject with pseudo ID = 18.**

Furthermore, we build personalized models for the CN and LMCI groups (the AD group has too few data points) with different numbers of longitudinal data points and summarize the prediction results in Tables 2 and 3. The tables indicate that our personalized models provide higher predictive accuracy than the sigmoid function fitting. Moreover, the accuracy of predicting biomarker dynamics increases with the number of longitudinal data points.

Table 2 The prediction accuracy summary for CN subjects using different numbers of longitudinal biomarker datapoints (n).

Table 3 The prediction accuracy summary for LMCI subjects using different numbers of longitudinal data points (shown in the first column).

Discussion

Unlike existing pathophysiological AD network models, which are based on a priori assumptions about biomarker trajectories, this work develops a data-driven causal modeling approach informed by AD clinical biomarker data and demonstrates both population and personalized models. The proposed population model traces the general biomarker dynamics for all patient data without any specific assumptions about the form of the model, and it enables personalized AD risk prediction by incorporating historical clinical data such as CSF protein and imaging biomarkers as well as cognitive scores. By introducing a DPS for each subject, we calibrate and scale AD biomarker progression across the ADNI population and derive population parameters. We also compare the proposed data-driven modeling approach with an empirical sigmoid function fitting and conclude that the proposed causal model better captures disease progression, with smoother transitions over time. Moreover, the causal model allows us to explore the underlying cascade relationships among biomarkers, whereas the empirical sigmoid approach treats each biomarker independently. The population model not only provides a means to classify different stages of AD progression for each biomarker but also lays the foundation for personalized modeling.

Before constructing the personalized model, we performed a sensitivity analysis of the population parameters. From a clinical standpoint, the sensitivity analysis provides insight into which parameters play the greatest role in disease progression and when during the disease course they are most relevant. From a computational standpoint, it guides the subsequent selection of personalized parameters. The sensitivity analysis shows that change in cognition is driven primarily by first-order effects and is time-dependent. Initially, the greatest effects come from amyloid, represented by w_A1, and to a lesser extent from tau and neuronal vulnerability to tau, represented by w_N4 and w_N5, respectively. The amyloid parameter w_A1 is most sensitive when the disease starts (DPS = 0), and its sensitivity diminishes as DPS increases. In contrast, the sensitivities of the parameters related to N and C, namely w_C3 and w_C5, increase significantly as the disease progresses. Thus, the sensitivity analysis suggests that at the early stage of AD, cognitive decline is driven by A_β levels, with a sensitivity that decreases linearly as the disease progresses, whereas at later stages cognitive decline is driven mainly by downstream effects, including the level of neuronal degeneration, represented by w_C3, and the interaction of cognition and neuronal degeneration, represented by w_C5. These results are consistent with prior observational studies based on ADNI and other longitudinal cohorts, which suggest that cognitive decline is driven primarily by high amyloid levels at earlier disease stages and by neurodegeneration at later stages³¹.

Sensitivity analysis also provides key insights for personalized parameter selection. The paucity of longitudinal biomarker data, combined with the relatively large number of model parameters, can easily lead to overfitting of personalized models. Based on the sensitivity analysis results, we chose the eight most sensitive population parameters as personalized parameters and set the remaining parameters to their mean population values. Calibrating only these personalized parameters on each patient's sparse longitudinal biomarker data mitigates overfitting and provides high-precision personalized predictions for each subject, as shown in the Results section.

Limitations of this work include sampling bias: the ADNI dataset is a research cohort recruited from academic clinics, and only one-third of ADNI subjects agreed to provide CSF biomarkers. These findings therefore need to be replicated using data from more general practice settings in the future. Despite these limitations, this model advances our understanding of the complexity of AD biomarker pathophysiology beyond current biomarker models, which have primarily been independent and ad hoc in nature, with inherent assumptions about the shape of individual biomarker trajectories. Our approach is integrative and based on the cascade mechanism, yet makes no assumptions about the exact mathematical form of the individual biomarker models or the resulting shape of the biomarker trajectories. In the future, we intend to extend the approach to the spatiotemporal domain by using longitudinal imaging data to determine the mechanisms driving the spread of pathology in time and space.

Methods

We propose a pathophysiology and data-driven modeling approach to construct a causal model of AD clinical biomarkers. We construct the causal model from serial clinical biomarker measures across 819 subjects from the ADNI-1 dataset with mild AD (N = 192), late mild cognitive impairment (LMCI, N = 398), and normal cognition (N = 229) (more details are shown in Table 4). We use PseudoIDs instead of RIDs to link all clinical biomarker data belonging to a patient. The CSF proteins measured in ADNI are A-Beta 42 and phosphorylated tau 181 (p-tau 181)^32,33. These measures were obtained through serial spinal taps at approximately two-year intervals. Of note, CSF A-Beta goes down, and total and phosphorylated tau go up, as the disease progresses. Hippocampal volume, a measure of neurodegeneration, was measured through volumetric analysis of serial MRI images obtained at approximately one-year intervals; it goes down as the disease progresses. Finally, cognitive decline was measured with a pencil-and-paper neuropsychological test, the thirteen-item Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS13). It measures function in several cognitive domains affected by AD, including memory, language, and praxis, is the de facto primary outcome measure in AD clinical trials, and goes up as the disease progresses.

Table 4 The characteristics of the ADNI-1 dataset in data-driven modeling.

After constructing and calibrating the population model with data across all ADNI subjects, we then personalize the parameters of the model using each patient’s longitudinal data to provide a personalized prediction of biomarker trajectories. The overall modeling approach is outlined in Fig. 5, and each step is elaborated in the following subsections.

**Fig. 5: A flowchart of the pathophysiology and data-driven modeling approach.**

The data-driven causal model learning via ADNI dataset

Four AD biomarkers are key factors in AD diagnosis and in monitoring AD progression: amyloid-beta A_β, tau τ, neuronal degeneration N, and cognitive decline C. Amyloid-beta is the main component of amyloid plaques and is considered an early event in the pathological cascade of AD. Amyloid production leads to downstream tau phosphorylation, causing the formation of neurofibrillary tangles and neuropil threads. Tau is a microtubule-associated protein that is abundant in neurons of the central nervous system. Both amyloid-beta and tau phosphorylation contribute to neuronal degeneration and cognitive decline.

To describe the cascade relationship among these four biomarkers of AD progression, we consider a canonical system of ODEs. The amyloid-dependent cascade is initiated by amyloid-beta pathology A_β and mediated via tau τ. Neuronal degeneration N starts with the rise of τ and, in turn, initiates cognitive decline C. Accordingly, we consider the causal model as the system of ODEs:

$$\left\{\begin{array}{lll}\frac{d{A}_{\beta }}{dt}=\mathop{\sum }\limits_{\ell = 0}^{m}{w}_{1,\ell }{\phi }_{\ell }({A}_{\beta });\frac{d\tau }{dt}=\mathop{\sum }\limits_{| {{{\boldsymbol{\ell }}}}| \le m}{w}_{2,{{{\boldsymbol{\ell }}}}}{\psi }_{{{{\boldsymbol{\ell }}}}}({A}_{\beta },\tau );\\ \frac{dN}{dt}=\mathop{\sum }\limits_{| {{{\boldsymbol{\ell }}}}| \le m}{w}_{3,{{{\boldsymbol{\ell }}}}}{\psi }_{{{{\boldsymbol{\ell }}}}}(\tau ,N);\frac{dC}{dt}=\mathop{\sum }\limits_{| {{{\boldsymbol{\ell }}}}| \le m}{w}_{4,{{{\boldsymbol{\ell }}}}}{\psi }_{{{{\boldsymbol{\ell }}}}}(N,C),\end{array}\right.$$

(1)

where ℓ = (ℓ₁, ℓ₂), ∣ℓ∣ = ∣ℓ₁∣ + ∣ℓ₂∣, and m is the degree of the model. We choose polynomial basis functions in the initial ODE model, namely,

$${\phi }_{\ell }(x)={x}^{\ell },\,\,{\psi }_{{{{\boldsymbol{\ell }}}}}(x,y)={x}^{{\ell }_{1}}{y}^{{\ell }_{2}}.$$

(2)

We then learn the causal model parameters in (1) from ADNI data. More specifically, we use CSF amyloid-beta 1-42 (A_β), CSF total tau (τ), the ratio of hippocampal volume to whole-brain volume on MRI (N), and the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (C) to calibrate A_β, τ, N, and C, respectively, in the causal model. To denoise the longitudinal data of different subjects, we apply a sigmoid interpolation to each biomarker. Moreover, because AD has a different time of onset and rate of progression in different subjects, we employ the DPS²⁸ to unify the time scale across subjects in the causal model.

Disease progression scores

For different subjects in ADNI, the onset of disease and the rate of progression differ within and among the CN, LMCI, and AD subject classes. To fit the causal model for all subjects in the ADNI-1 study, we standardize the longitudinal measurements across patients by employing the DPS²⁸. In particular, we define the DPS s_i(t) of each patient as a linear function of the patient’s age t:

$${s}_{i}(t)={\alpha }_{i}\cdot t+{\beta }_{i},$$

(3)

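where i = 1, 2, ⋯ , I is the patient index, α_i is the rate of AD progression, and β_i is an offset that encodes the age of AD onset: since DPS = 0 marks disease onset, subject i reaches onset at age t = −β_i/α_i.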
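Equation (3) and the initialization rule quoted in the Results section (α_i drawn from (0, 4), β_i chosen so that −10 ≤ s_i(t) ≤ 20 at every measurement) can be made concrete with a small sketch. Drawing β_i uniformly from the feasible interval, and redrawing α_i when that interval is empty, is an assumption about how the rule might be implemented; the function names are hypothetical.

```python
import numpy as np

def dps(t, alpha, beta):
    """Disease progression score s_i(t) = alpha_i * t + beta_i, Eq. (3)."""
    return alpha * np.asarray(t, dtype=float) + beta

def onset_age(alpha, beta):
    """Age at which s_i(t) = 0, i.e. t = -beta_i / alpha_i."""
    return -beta / alpha

def init_dps(ages, rng, s_min=-10.0, s_max=20.0):
    """Hypothetical initialization: alpha_i ~ U(0, 4), then beta_i drawn so that
    s_min <= s_i(t) <= s_max at every measured age."""
    ages = np.asarray(ages, dtype=float)
    while True:
        alpha = rng.uniform(0.0, 4.0)
        lo = s_min - alpha * ages.min()      # keeps the earliest visit >= s_min
        hi = s_max - alpha * ages.max()      # keeps the latest visit <= s_max
        if lo <= hi:                         # feasible iff alpha * age_span <= s_max - s_min
            return alpha, rng.uniform(lo, hi)
```

For example, α_i = 0.5 and β_i = −35 place onset at age 70, and a subject seen at ages 70 and 73 then has DPS values 0 and 1.5.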

The sigmoid function fitting

We fit the data for each biomarker in ADNI to a sigmoid function. Specifically, each biomarker is parameterized by four parameters ${{{{\boldsymbol{\theta }}}}}_{k}={[{a}_{k},{b}_{k},{c}_{k},{d}_{k}]}^{T}$:

$${g}_{k}\left(s;{{{{\boldsymbol{\theta }}}}}_{k}\right)={a}_{k}{(1+{e}^{-{b}_{k}\left(s-{c}_{k}\right)})}^{-1}+{d}_{k},$$

(4)

where a_k is the magnitude scale of the function, b_k is the slope coefficient, c_k is the horizontal location of the midpoint, and d_k is the vertical offset. Here we take g₁(s) = A_β(s), g₂(s) = τ(s), g₃(s) = N(s), g₄(s) = C(s) and denote ${{{\boldsymbol{g}}}}={({g}_{1},{g}_{2},{g}_{3},{g}_{4})}^{T}$.
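Equation (4) and its derivative can be written out directly. The derivative matters because the sparse-learning step uses b_i = dg/ds at grid points. Writing p = 1/(1 + e^{−b(s−c)}), one has dg/ds = a·b·p(1 − p), which rearranges to b(g − d)(a − g + d)/a, a quadratic polynomial in g; this is one way to see why low-degree polynomial right-hand sides can reproduce sigmoid-shaped trajectories. The function names below are illustrative.

```python
import numpy as np

def sigmoid(s, a, b, c, d):
    """g_k(s; theta_k) = a / (1 + exp(-b (s - c))) + d, Eq. (4)."""
    return a / (1.0 + np.exp(-b * (s - c))) + d

def sigmoid_ds(s, a, b, c, d):
    """Analytic derivative dg_k/ds = a * b * p * (1 - p); the offset d drops out."""
    p = 1.0 / (1.0 + np.exp(-b * (s - c)))
    return a * b * p * (1.0 - p)
```

At s = c the curve passes through its midpoint a/2 + d, where its slope a·b/4 is steepest.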

Next, we apply sparse regression to identify the causal model (1), which is rewritten as

$$\frac{d{{{\boldsymbol{x}}}}}{ds}=\mathop{\sum}\limits_{| \ell | \le m}{{{{\boldsymbol{\phi }}}}}_{\ell }({{{\boldsymbol{x}}}}){w}_{\ell },\,{{{\rm{where}}}}\,{{{\boldsymbol{x}}}}={({A}_{\beta },\tau ,N,C)}^{T}\in {R}^{4}.$$

By taking uniform grid points ${\{{s}_{i}\}}_{i = 1}^{M}$ on s ∈ [−10, 20], we denote

$${D}_{i}=[{\phi }_{{\ell }_{1}}({\boldsymbol{g}}({s}_{i})),\cdots ,{\phi }_{{\ell }_{n}}({\boldsymbol{g}}({s}_{i}))]\quad \text{and}\quad {b}_{i}=\frac{d{\boldsymbol{g}}}{ds}({s}_{i}),$$

where ℓ₁, ⋯ , ℓ_n enumerate all multi-indices with ∣ℓ∣ ≤ m. By stacking

$$D=\left(\begin{array}{l}{D}_{1}\\ \vdots \\ {D}_{M}\end{array}\right){{{\rm{and}}}}\,b=\left(\begin{array}{l}{b}_{1}\\ \vdots \\ {b}_{M}\end{array}\right),$$

we learn the causal model via the following Lasso regression, namely,

$$\mathop{\min }\limits_{w}\parallel Dw-b{\parallel }_{2}^{2}+\lambda \parallel w{\parallel }_{1},$$

(5)

where the ℓ₁ penalty λ∥w∥₁ enforces sparsity in w.

Here we use the same maximum polynomial degree for all variables in the causal model and choose m = 4 with λ = 10⁻⁷ in (5). The Lasso result is consistent with the causal model obtained with m = 2 but differs from the one obtained with m = 1, which indicates that m = 2 is the optimal degree for the causal model. The general causal model of ODEs describing the progression of AD biomarkers is summarized below (All rights to the in-silico model belong to the authors and it cannot be used for any commercial purpose without permission):
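The construction of D and the Lasso problem (5) can be sketched with NumPy alone. The monomial library enumerates every multi-index with |ℓ| ≤ m over the sampled states g(s_i), and the solver is plain proximal gradient (ISTA), a swapped-in choice rather than the authors' solver. Each column of the vector-valued b (one per equation) would be solved separately. With the tiny λ = 10⁻⁷ and strongly correlated polynomial columns, ISTA can converge slowly; an off-the-shelf coordinate-descent solver such as scikit-learn's `Lasso` may be more practical, noting that its objective is scaled as (1/(2·n_samples))‖y − Xw‖² + α‖w‖₁, so α = λ/(2M) reproduces (5). Function names are hypothetical.

```python
import itertools
import numpy as np

def monomial_library(X, m):
    """Columns phi_l(x) = prod_j x_j**l_j for every multi-index with |l| <= m.

    X: (M, n_vars) array of sampled states g(s_i).
    Returns an (M, C(n_vars + m, m)) matrix D.
    """
    M, n_vars = X.shape
    cols = [np.ones(M)]                                   # |l| = 0: constant term
    for degree in range(1, m + 1):
        for combo in itertools.combinations_with_replacement(range(n_vars), degree):
            cols.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(cols)

def lasso_ista(D, b, lam, n_iter=5000):
    """Minimize ||D w - b||_2^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2                   # Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = w - 2.0 * D.T @ (D @ w - b) / L               # gradient step on the quadratic
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return w
```

For four biomarkers, m = 2 gives 15 library columns and m = 4 gives 70, which is why sparsity is needed to single out the handful of terms that appear in the final model.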

$$\left\{\begin{array}{l}\frac{d{A}_{\beta }}{ds}={w}_{A0}+{w}_{A1}{A}_{\beta }+{w}_{A2}{A}_{\beta }^{2};\\ \frac{d\tau }{ds}={w}_{T0}+{w}_{T1}\tau +{w}_{T2}{\tau }^{2}+{w}_{T3}{A}_{\beta }+{w}_{T4}{A}_{\beta }^{2}+{w}_{T5}{A}_{\beta }\tau ;\\ \frac{dN}{ds}={w}_{N0}+{w}_{N1}N+{w}_{N2}{N}^{2}+{w}_{N3}\tau +{w}_{N4}{\tau }^{2}+{w}_{N5}\tau N;\\ \frac{dC}{ds}={w}_{C0}+{w}_{C1}C+{w}_{C2}{C}^{2}+{w}_{C3}N+{w}_{C4}{N}^{2}+{w}_{C5}NC,\end{array}\right.$$

(6)

with initial conditions A_β(−10) = y₀ and τ(−10) = N(−10) = C(−10) = 0, where y₀ is an additional parameter set to a small positive value to initiate the cascade.
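A minimal simulator for model (6) helps make the cascade concrete. It assumes the 21 weights are stored in the order w_A0..w_A2, w_T0..w_T5, w_N0..w_N5, w_C0..w_C5 (consistent with the 21 parameters looped over in Algorithm 2) and uses a fixed-step classical Runge-Kutta scheme in place of whatever ODE solver the authors used. The check sets all weights to zero except w_A1 = 0.8 and w_A2 = −0.8, for which A_β follows a logistic curve with a closed-form solution. Function names are hypothetical.

```python
import numpy as np

def rhs(x, w):
    """Right-hand side of model (6); x = (A_beta, tau, N, C), w = 21 weights."""
    A, T, N, C = x
    wA, wT, wN, wC = w[0:3], w[3:9], w[9:15], w[15:21]
    dA = wA[0] + wA[1] * A + wA[2] * A**2
    dT = wT[0] + wT[1] * T + wT[2] * T**2 + wT[3] * A + wT[4] * A**2 + wT[5] * A * T
    dN = wN[0] + wN[1] * N + wN[2] * N**2 + wN[3] * T + wN[4] * T**2 + wN[5] * T * N
    dC = wC[0] + wC[1] * C + wC[2] * C**2 + wC[3] * N + wC[4] * N**2 + wC[5] * N * C
    return np.array([dA, dT, dN, dC])

def simulate(w, y0, s_grid):
    """Classical RK4 on s_grid from A_beta = y0, tau = N = C = 0 at s_grid[0]."""
    x = np.array([y0, 0.0, 0.0, 0.0])
    out = [x]
    for s0, s1 in zip(s_grid[:-1], s_grid[1:]):
        h = s1 - s0
        k1 = rhs(x, w)
        k2 = rhs(x + 0.5 * h * k1, w)
        k3 = rhs(x + 0.5 * h * k2, w)
        k4 = rhs(x + h * k3, w)
        x = x + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        out.append(x)
    return np.array(out)                      # shape (len(s_grid), 4)
```

Because τ, N, and C start at zero, each downstream variable can only leave zero once a nonzero upstream term (for example w_T3·A_β) feeds it, which is how the model encodes the ordering A_β → τ → N → C.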

Population model calibration

First, we calibrate the learned causal model with the ADNI dataset, rewriting (6) as the population model

$$\left\{\begin{array}{ll}\frac{d{A}_{\beta }}{ds}=\mathop{\sum }\limits_{\ell = 0}^{2}{w}_{A,\ell }^{(1)}{A}_{\beta }^{\ell }, & \frac{d\tau }{ds}=\mathop{\sum}\limits_{| \ell | \le 2}{w}_{T,\ell }^{(1)}{A}_{\beta }^{{\ell }_{1}}{\tau }^{{\ell }_{2}},\\ \frac{dN}{ds}=\mathop{\sum}\limits_{| \ell | \le 2}{w}_{N,\ell }^{(1)}{\tau }^{{\ell }_{1}}{N}^{{\ell }_{2}}, & \frac{dC}{ds}=\mathop{\sum}\limits_{| \ell | \le 2}{w}_{C,\ell }^{(1)}{N}^{{\ell }_{1}}{C}^{{\ell }_{2}},\end{array}\right.$$

(7)

where ${{{\boldsymbol{w}}}}=\{{w}_{A,\ell }^{(1)},{w}_{T,\ell }^{(1)},{w}_{N,\ell }^{(1)},{w}_{C,\ell }^{(1)}\}$ denotes the population parameters. We also write f₁(s) = A_β(s), f₂(s) = τ(s), f₃(s) = N(s), and f₄(s) = C(s), with initial conditions f₁(−10) = y₀ and f₂(−10) = f₃(−10) = f₄(−10) = 0. The population parameters are then calibrated on the ADNI dataset by minimizing the sum of squared differences between the data and the solution of the causal model:

$$\mathop{\min }\limits_{{{{\boldsymbol{{w}}}_{k}}}}\mathop{\sum}\limits_{(i,j)\in {{{{\mathcal{I}}}}}_{k}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}{t}_{ij}+{\beta }_{i};{{{{\boldsymbol{w}}}}}_{k}\right)\right)}^{2},\,\,(i,j,k)\in {{{\mathcal{I}}}}$$

(8)

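where y_ijk is the measurement of the k-th biomarker for the i-th patient at the j-th visit, t_ij is the time of that visit, α_i and β_i are the patient-specific parameters that map t_ij to the disease progression score (DPS) s = α_i t_ij + β_i, and ${{{{\mathcal{I}}}}}_{k}$ is the set of pairs (i, j) for which the k-th biomarker is observed.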

Since each patient's biomarkers generally increase or decrease monotonically, we fit the linear DPS parameters (α_i, β_i) of each patient by weighted least squares:

$$\mathop{\min }\limits_{{\alpha }_{i},{\beta }_{i}}\mathop{\sum}\limits_{(j,k)\in {{{{\mathcal{I}}}}}_{i}}\frac{1}{{\sigma }_{k}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}{t}_{ij}+{\beta }_{i};{{{{\boldsymbol{w}}}}}_{k}\right)\right)}^{2},$$

(9)

where ${{{{\mathcal{I}}}}}_{i}$ is the set of pairs (j, k) observed for the i-th patient, and σ_k is the residual variance of biomarker k, that is, its sum of squared errors divided by the residual degrees of freedom:

$${\sigma }_{k}=\frac{1}{| {{{{\mathcal{I}}}}}_{k}| -2I-4}\mathop{\sum}\limits_{(i,j)\in {{{{\mathcal{I}}}}}_{k}}{\left({y}_{ijk}-{f}_{k}\left({\alpha }_{i}{t}_{ij}+{\beta }_{i};{{{{\boldsymbol{w}}}}}_{k}\right)\right)}^{2}.$$

(10)
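Problem (9) can be solved patient by patient with a nonlinear least-squares routine. In the sketch below, the population curves f_k are replaced by hypothetical logistic stand-ins so that the example runs without solving (7), the σ_k values are made up rather than computed from (10), and `fit_dps` is an illustrative name. Scaling each residual by 1/√σ_k makes the sum of squared residuals equal to the weighted objective in (9).

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical logistic stand-ins for the population curves f_k(s) obtained from (7).
MIDS = np.array([0.0, 4.0, 8.0, 12.0])
SLOPES = np.array([0.8, 0.7, 0.6, 0.5])

def f(k, s):
    """Population curve of biomarker k evaluated at DPS value s (vectorized)."""
    return 1.0 / (1.0 + np.exp(-SLOPES[k] * (s - MIDS[k])))

def fit_dps(t, k, y, sigma, init=(1.0, 0.0)):
    """Fit (alpha_i, beta_i) for one patient by minimizing the weighted objective (9).

    t, k, y : visit times, biomarker indices and measurements for the pairs (j, k) in I_i
    sigma   : per-biomarker residual variances sigma_k, as in (10)
    """
    scale = 1.0 / np.sqrt(sigma[k])      # squared residual then carries the weight 1/sigma_k
    def residuals(p):
        alpha, beta = p
        return scale * (y - f(k, alpha * t + beta))
    return least_squares(residuals, init, method="lm").x

# Synthetic patient: 4 visits with all four biomarkers observed at each visit.
t = np.repeat([0.0, 1.0, 2.0, 3.0], 4)
k = np.tile(np.arange(4), 4)
y = f(k, 1.2 * t + 3.0)                      # generated with alpha = 1.2, beta = 3.0
sigma = np.array([0.01, 0.02, 0.02, 0.05])   # hypothetical sigma_k values
alpha_hat, beta_hat = fit_dps(t, k, y, sigma)
```

Because every f_k is monotone in s, noise-free data pin down each visit's DPS, so the fit recovers α = 1.2 and β = 3.0. With real, noisy data the 1/σ_k weights let less noisy biomarkers count more.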

The detailed procedure for fitting the parameters is given in Algorithm 1. The optimization uses the Levenberg-Marquardt method³⁴, which interpolates between the Gauss-Newton method and gradient descent and is therefore more robust to poor initial guesses than Gauss-Newton alone.
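For illustration, the sketch below calibrates the A_β equation of (7) with SciPy's `least_squares(method="lm")`, which wraps the MINPACK Levenberg-Marquardt implementation. It is not the C++ optimizer linked under Code availability. To keep the example short, it fixes w_{A,0} = 0 and y₀ = 10⁻³, generates noise-free synthetic data at hypothetical DPS values, and fits only (w_{A,1}, w_{A,2}). Because Levenberg-Marquardt is a local method, the starting guess is placed near the true values.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

Y0 = 1e-3   # hypothetical seed value A_beta(-10) = y0

def solve_abeta(w, s_obs):
    """Solve dA/ds = w0 + w1*A + w2*A**2 with A(-10) = Y0 and evaluate at s_obs."""
    s_sorted, inverse = np.unique(s_obs, return_inverse=True)   # solve_ivp needs sorted t_eval
    rhs = lambda s, A: w[0] + w[1] * A + w[2] * A**2
    sol = solve_ivp(rhs, (-10.0, s_sorted[-1]), [Y0], t_eval=s_sorted,
                    rtol=1e-10, atol=1e-12)
    if not sol.success or sol.y.shape[1] != s_sorted.size:
        return np.full(s_obs.shape, 1e3)    # large residuals where the ODE solve fails
    return sol.y[0][inverse]

def residuals(p, s_obs, y_obs):
    """Residuals of (8) for k = 1, with w_{A,0} fixed at 0 in this sketch."""
    return y_obs - solve_abeta((0.0, p[0], p[1]), s_obs)

# Synthetic, noise-free data at hypothetical DPS values s = alpha_i * t_ij + beta_i.
s_obs = np.linspace(-8.0, 15.0, 60)
y_obs = solve_abeta((0.0, 1.0, -1.0), s_obs)

fit = least_squares(residuals, x0=[0.8, -0.7], args=(s_obs, y_obs),
                    method="lm", diff_step=1e-6)
w1_hat, w2_hat = fit.x
```

The Jacobian is approximated by finite differences of ODE solutions, so tight solver tolerances and a moderate `diff_step` keep the derivative estimates smooth. For the full model (7), the same residual function would stack all four biomarkers and solve the coupled system.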

Sensitivity analysis

We assume that the parameters of the population model, ${{{{\boldsymbol{w}}}}}^{(1)}=[{w}_{A0}^{(1)},\,{w}_{A1}^{(1)},\,\cdots \,,\,{w}_{m}^{(1)},\,\cdots \,,\,{w}_{C4}^{(1)},\,{w}_{C5}^{(1)}]\in {{\mathbb{R}}}^{21}$, are independent and identically distributed inputs, where m indexes the inputs. For the sensitivity analysis, we drop the superscript (1) for simplicity. Each input ranges over 90–110% of its value in Table 1.

We then perform a Sobol sensitivity analysis, also called variance-based sensitivity analysis, which builds on the analysis-of-variance decomposition. As a global sensitivity analysis method, it quantifies the effect of each input by decomposing the variance of the population model output into fractions attributable to individual inputs and their interactions. In this paper, we compute both first-order and second-order sensitivity indices of the parameters. The first-order index measures the fraction of the output variance attributable to a single input w_m alone and is calculated as

$${{{{\rm{S}}}}}_{m}(y)=\frac{{{{{\rm{Var}}}}}_{{w}_{m}}\left[{{{{\rm{E}}}}}_{{{{{\rm{w}}}}}_{ \sim {{{\rm{m}}}}}}(y| {w}_{m})\right]}{{{{\rm{Var}}}}(y)},$$

(11)

where ${w}_{ \sim m}=\left[{w}_{A0},\,\cdots \,,\,{w}_{m-1},\,{w}_{m+1},\,\cdots \,,\,{w}_{C5}\right]$ contains all inputs except w_m. Next, the second-order sensitivity with respect to inputs m and n is the sum of their first-order effects and the variance contribution of the interaction between w_m and w_n:

$$\mathrm{S}_{(m,n)}(y)=\mathrm{S}_{m}(y)+\mathrm{S}_{n}(y)+\mathrm{S}_{mn}(y),\qquad \mathrm{S}_{mn}(y)=\frac{\mathrm{Var}_{({w}_{m},{w}_{n})}\left[\mathrm{E}_{{w}_{\sim m,n}}(y\mid {w}_{m},{w}_{n})\right]}{\mathrm{Var}(y)}-\mathrm{S}_{m}(y)-\mathrm{S}_{n}(y).$$

(12)

Finally, we compute the total-order sensitivity index, which attributes to w_m its first-order effect together with all second- and higher-order interactions involving w_m:

$${{{{\rm{S}}}}}_{T,m}(y)=1-\frac{{{{{\rm{Var}}}}}_{{{{\rm{{w}}}_{ \sim m}}}}\left[{{{{\rm{E}}}}}_{{w}_{m}}(y| {{{\rm{{w}}}_{ \sim m}}})\right]}{{{{\rm{Var}}}}(y)}.$$

(13)
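The indices (11) and (13) are usually estimated by Monte Carlo sampling. The sketch below uses the Saltelli (2010) estimator for S_m and the Jansen estimator for S_T,m with independent uniform inputs. These are standard estimators of the kind implemented in SALib, but the exact estimator settings used in the paper are not stated in this section. The code is checked on a toy model y = w₁w₂ with an inert third input, all on [0, 1], whose exact indices are S = (3/7, 3/7, 0) and S_T = (4/7, 4/7, 0). For the AD model one would instead use bounds of 90–110% of each value in Table 1 and a `model` function that solves (7); `sobol_indices` is an illustrative name.

```python
import numpy as np

def sobol_indices(model, bounds, n, seed=0):
    """Monte Carlo estimates of first-order (11) and total-order (13) Sobol indices.

    Saltelli (2010) estimator for S_m, Jansen estimator for S_T,m;
    inputs are independent and uniform on the given (low, high) bounds.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = lo.size
    A = lo + (hi - lo) * rng.random((n, d))     # two independent sample matrices
    B = lo + (hi - lo) * rng.random((n, d))
    fA, fB = model(A), model(B)
    var = np.var(np.concatenate([fA, fB]))
    S, ST = np.empty(d), np.empty(d)
    for m in range(d):
        ABm = A.copy()
        ABm[:, m] = B[:, m]                      # A with its m-th column taken from B
        fABm = model(ABm)
        S[m] = np.mean(fB * (fABm - fA)) / var
        ST[m] = 0.5 * np.mean((fA - fABm) ** 2) / var
    return S, ST

# Toy model with an interaction: y = w1 * w2, and w3 has no effect.
model = lambda W: W[:, 0] * W[:, 1]
S, ST = sobol_indices(model, [(0.0, 1.0)] * 3, n=500_000)
```

The gap S_T,1 − S_1 ≈ 1/7 is exactly the interaction share between w₁ and w₂, which the first-order index misses and the total-order index captures.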

Because Sobol indices are ratios of variances, they are non-negative in theory, although Monte Carlo estimates can be slightly negative when the true value is close to 0. A larger index means that the parameter has a greater influence on the model output, and an index close to 0 means that changing the parameter has little influence. Based on the sensitivity indices and the number of biomarker measurements available, we select the parameters to personalize by fitting each patient's longitudinal data points and keep the remaining parameters at their population values. Limiting the number of personalized parameters in this way helps avoid overfitting in the personalized prediction for each subject.
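The section does not spell out the exact selection rule, so the following is only a hypothetical illustration of one way to combine the two criteria. It ranks parameters by their total-order index, discards those below a threshold, and caps the number of personalized parameters so that, together with the two DPS parameters (α_i, β_i), the number of free parameters stays below the patient's number of measurements. The threshold `min_index = 0.05`, the cap, the function name, and the example index values are all assumptions.

```python
import numpy as np

def select_personalized(names, total_index, n_measurements, n_dps=2, min_index=0.05):
    """Hypothetical rule: personalize the most sensitive parameters, keeping the number
    of free parameters (including the n_dps DPS parameters) below n_measurements."""
    total_index = np.asarray(total_index, dtype=float)
    budget = max(n_measurements - n_dps - 1, 0)          # room left after alpha_i, beta_i
    order = np.argsort(total_index)[::-1]                # most sensitive first
    ranked = [names[i] for i in order if total_index[i] >= min_index]
    return ranked[:budget]

# Made-up total-order indices for four parameters.
names = ["w_A1", "w_A2", "w_T3", "w_T5"]
ST = [0.6, 0.3, 0.02, 0.1]
few = select_personalized(names, ST, n_measurements=4)    # budget 1
many = select_personalized(names, ST, n_measurements=10)  # budget 7, threshold binds
```

A patient with only a few visits thus gets a single personalized parameter, while a well-sampled patient gets every parameter whose influence passes the threshold.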

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The ADNI dataset is publicly available via http://adni.loni.usc.edu³⁵.

Code availability

The sensitivity analysis code is available at http://salib.readthedocs.io/en/latest/. The simulation study code is available at https://www.pymc.io/welcome. The non-linear optimizer can be found in https://github.com/jjhartmann/Levenberg-Marquardt. Codes for Algorithms 1 and 2 are included in the Supplementary Information.

References

Cortes-Canteli, M. & Iadecola, C. Alzheimer’s disease and vascular aging: JACC focus seminar. J. Am. Coll. Cardiol. 75, 942–951 (2020).
Batool, A., Kamal, M. A., Rizvi, S. & Rashid, S. Topical discoveries on multi-target approach to manage Alzheimer’s disease. Curr. Drug Metab. 19, 704–713 (2018).
Bertram, L., McQueen, M. B., Mullin, K., Blacker, D. & Tanzi, R. E. Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat. Genet. 39, 17–23 (2007).
Lane, C. A., Hardy, J. & Schott, J. M. Alzheimer’s disease. Eur. J. Neurol. 25, 59–70 (2018).
Aliev, G. et al. Alzheimer’s disease–future therapy based on dendrimers. Curr. Neuropharmacol. 17, 288–294 (2019).
Milne, R. et al. At, with and beyond risk: expectations of living with the possibility of future dementia. Soc. Health Illness 40, 969–987 (2018).
Sperling, R. A. et al. Toward defining the preclinical stages of Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 7, 280–292 (2011).
Jack, C. R. et al. NIA-AA research framework: toward a biological definition of Alzheimer’s disease. Alzheimers Dement. 14, 535–562 (2018).
Hao, W. & Friedman, A. Mathematical model on Alzheimer’s disease. BMC Syst. Biol. 10, 108 (2016).
Petrella, J. R., Hao, W., Rao, A. & Doraiswamy, P. M. Computational causal modeling of the dynamic biomarker cascade in Alzheimer’s disease. Comput. Math. Methods Med. 2019, https://doi.org/10.1155/2019/6216530 (2019).
Jack, C. R. & Holtzman, D. M. Biomarker modeling of Alzheimer’s disease. Neuron 80, 1347–1358 (2013).
Abeysinghe, A. A. D. T., Deshapriya, R. D. U. S. & Udawatte, C. Alzheimer’s disease; a review of the pathophysiological basis and therapeutic interventions. Life Sci. 256, 117996 (2020).
Guo, T., Korman, D., Baker, S. L., Landau, S. M. & Jagust, W. J. Longitudinal cognitive and biomarker measurements support a unidirectional pathway in Alzheimer’s disease pathophysiology. Biol. Psychiatry 89, 786–794 (2021).
Myszczynska, M. A. et al. Applications of machine learning to diagnosis and treatment of neurodegenerative diseases. Nat. Rev. Neurol. 16, 440–456 (2020).
Iturria-Medina, Y., Carbonell, F. M., Sotero, R. C., Chouinard-Decorte, F. & Evans, A. C. Multifactorial causal model of brain (dis)organization and therapeutic intervention: Application to Alzheimer’s disease. Neuroimage 152, 60–77 (2017).
Friedman, A. & Hao, W. The role of exosomes in pancreatic cancer microenvironment. Bull. Math. Biol. 80, 1111–1133 (2018).
Budithi, A., Su, S., Kirshtein, A. & Shahriyari, L. Data driven mathematical model of FOLFIRI treatment for colon cancer. Cancers 13, https://doi.org/10.3390/cancers13112632 (2021).
Hao, W. et al. A mathematical model of aortic aneurysm formation. PLoS One 12, e0170807 (2017).
Friedman, A. & Hao, W. A mathematical model of atherosclerosis with reverse cholesterol transport and associated risk factors. Bull. Math. Biol. 77, 758–781 (2015).
Wang, X. et al. A Bayesian framework for generalized linear mixed modeling identifies new candidate loci for late-onset Alzheimer’s disease. Genetics 209, 51–64 (2018).
Sun, N. et al. Multi-modal latent factor exploration of atrophy, cognitive and tau heterogeneity in Alzheimer’s disease. Neuroimage 201, 116043 (2019).
Schäfer, A. et al. Bayesian physics-based modeling of tau propagation in Alzheimer’s disease. Front. Physiol. 1081, https://doi.org/10.3389/fphys.2021.702975 (2021).
Iddi, S. et al. Estimating the evolution of disease in the Parkinson’s Progression Markers Initiative. Neurodegenerative Dis. 18, 173–190 (2018).
Iddi, S. et al. Predicting the course of Alzheimer’s progression. Brain Informatics 6, 1–18 (2019).
Li, D. et al. The relative efficiency of time-to-progression and continuous measures of cognition in presymptomatic Alzheimer’s disease. Alzheimer’s & Dement. 5, 308–318 (2019).
Li, D., Iddi, S., Thompson, W. K., Donohue, M. C. & Alzheimer’s Disease Neuroimaging Initiative. Bayesian latent time joint mixed effect models for multicohort longitudinal data. Stat. Methods Med. Res. 28, 835–845 (2019).
Marinescu, R. V. et al. Predicting Alzheimer’s disease progression: results from the TADPOLE challenge: Neuroimaging: Neuroimaging predictors of cognitive decline. Alzheimer’s Dement. 16, e039538 (2020).
Jedynak, B. M. et al. A computational neurodegenerative disease progression score: method and results with the Alzheimer’s Disease Neuroimaging Initiative cohort. Neuroimage 63, 1478–1486 (2012).
Sobol, I. M. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 55, 271–280 (2001).
Zhang, S., Ponce, J., Zhang, Z., Lin, G. & Karniadakis, G. An integrated framework for building trustworthy data-driven epidemiological models: Application to the COVID-19 outbreak in New York City. PLOS Comput. Biol. 17, 1–29 (2021).
Jack, C. R. et al. Serial PIB and MRI in normal, mild cognitive impairment and Alzheimer’s disease: implications for sequence of pathological events in Alzheimer’s disease. Brain 132, 1355–1365 (2009).
Shaw, L. M. et al. Qualification of the analytical and clinical performance of CSF biomarker analyses in ADNI. Acta Neuropathol. 121, 597–609 (2011).
Shaw, L. M. PENN biomarker core of the Alzheimer’s disease Neuroimaging Initiative. Neurosignals 16, 19–23 (2008).
Levenberg, K. A method for the solution of certain non-linear problems in least squares. Quart. Appl. Math. 2, 164–168 (1944).
Weiner, M. W. et al. The Alzheimer’s Disease Neuroimaging Initiative: a review of papers published since its inception. Alzheimer’s Dement. 9, e111–e194 (2013).


Acknowledgements

G.L. and H.Z. were supported in part by NSF (DMS-1555072, DMS-1736364, DMS-2053746, and DMS-2134209) and DOE DE-SC0021142. J.R.P. was supported in part by NSF DMS-2052676. W.H. was supported in part by NSF DMS-2052685. P.M.D.’s work on this project is supported by the NIA, the Karen L. Wrenn Trust and the Steve Aoki Fund.

Funding

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904, Michael Weiner, PI) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. ADNI investigators contributed to the design and implementation of the ADNI database and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgment_List.pdf.

Author information

Authors and Affiliations

School of Mechanical Engineering, Purdue University, West Lafayette, 47907, IN, USA
Haoyang Zheng & Guang Lin
Department of Radiology, Duke University Health System, Durham, 27710, NC, USA
Jeffrey R. Petrella
Departments of Psychiatry and Medicine, Duke University School of Medicine and Duke Institute for Brain Sciences, Durham, 27710, NC, USA
P. Murali Doraiswamy
Department of Mathematics, Purdue University, West Lafayette, 47907, IN, USA
Guang Lin
Department of Mathematics, Penn State University, University Park, 16802, PA, USA
Wenrui Hao


Consortia

for the Alzheimer’s Disease Neuroimaging Initiative

Jeffrey R. Petrella & P. Murali Doraiswamy

Contributions

J.R.P. conceived the idea for in-silico modeling of AD biomarkers. P.M.D. initiated the collaboration between J.R.P., P.M.D., and W.H. to further develop this idea. W.H. developed the data-driven modeling idea. J.R.P. completed the idea by including DPS in consultation with P.M.D. and W.H. P.M.D. provided the idea to apply this model separately to the cognitively normal (CN), mild cognitive impairment (MCI), and AD groups. J.R.P. and P.M.D. provided data access and clinical constructs. G.L. and W.H. supervised the work. H.Z. implemented the code. All authors contributed to the discussions leading to the perspective presented. All authors contributed to the editing and shaping of the manuscript at various stages of preparation. All authors read and approved the final version.

Corresponding author

Correspondence to Guang Lin.

Ethics declarations

Competing interests

The authors declare no competing non-financial interests but the following competing financial interests: P.M.D. is a co-inventor on patents for the diagnosis or treatment of Alzheimer disease. P.M.D. owns shares in several biotechnology companies whose products are not discussed here. P.M.D. has received grants from NIH, DARPA, DOD, ONR, Bausch, Avanir, Avid, Cure Alzheimer’s Fund, Karen L. Wrenn Trust, Steve Aoki Foundation, and advisory fees from Apollo, Brain Forum, Clearview, Lumos, Neuroglee, Otsuka, Verily, Vitakey, Sermo, Lilly, Vivly, and Transposon.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Zheng, H., Petrella, J.R., Doraiswamy, P.M. et al. Data-driven causal model discovery and personalized prediction in Alzheimer's disease. npj Digit. Med. 5, 137 (2022). https://doi.org/10.1038/s41746-022-00632-7


Received: 18 December 2021
Accepted: 16 June 2022
Published: 08 September 2022
DOI: https://doi.org/10.1038/s41746-022-00632-7
Springer Nature Limited

