Keywords

1 Introduction

Alzheimer’s disease (AD) is a growing health-economic worldwide issue, accounting for most cases of dementia [13]. Despite the great amount of effort devoted to AD prevention and drug development during the last three decades, the few pharmacological treatments available show a modest benefit. The study of the AD process is further hindered by the fact that dementia can be caused by multiple pathologies, and that AD often co-occurs with them [11], being age and genetic variations the main risk factors [10].

For more than two decades, the most widely accepted model of the pathophysiological process underlying AD was the so-called amyloid cascade hypothesis. This hypothesis states that the process starts with an abnormal accumulation of the \(\beta \text {-amyloid}\) (A\(\beta \)) peptide, triggering a chain of pathological events in a predictable way. The corresponding model of biomarker dynamics states that the main AD biomarkers become abnormal in a temporally ordered manner [6, 7]. However, large cohort studies showed that all possible combinations of biomarker abnormalities are frequently present in the cognitively normal population [8], evidencing that the amyloid cascade hypothesis is insufficient to explain the observed heterogeneity in sporadic AD  [4, 5].

A new conceptual model of AD was recently proposed [4], which posited a non-deterministic disease path. According to this model, A\(\beta \) and tau levels interact between them and with genetic and environmental factors to increase or reduce the risk of disease progression. These interactions would be responsible for the huge heterogeneity observed in biomarker trajectories and the discrepancies between observation and the amyloid cascade hypotheses.

Quantitative tools that estimate the biomarker dynamics are needed to shed light on the AD process and to build better clinical tools for diagnosis, prognosis and therapy efficacy assessment.

1.1 Disease Progression Models of Alzheimer’s Disease

The first AD progression models describing long-term trajectories from short-term biomarker observations were based on Jack’s model [7], i.e., they assumed that all subjects follow the same disease progression pattern but with different onset times and at different speeds. Jedynak defined a disease progression score aimed at quantifying disease progression [9]. Subjects were temporally ordered according to this score and a parametric sigmoid-shaped curve was used to fit the progression of biomarkers. In [3] the authors proposed a semi-parametric model to determine the population mean of biomarker trajectories and the temporal order of subjects. A similar but more flexible model used Gaussian Process (GP) to model also the individual departures from the mean [12]. In general, all these models may suffer from identifiability issues when trained with short-term observations, because of the need to simultaneously estimate the disease onset times and the biomarker trajectories. Sometimes identifiability issues were mitigated using mixed-effect modelling to restrict the variance of the subject-level parameters.

The first dynamical model that relaxes the unique trajectory condition, allowing an arbitrary combination of variables as initial conditions, used a Riemannian framework to transport the mean trajectory to fit the subject’s observations [16]. Contrary to the previous works, it is the initial value of the variables, and not the onset time, which was modelled as a random effect.

Finally, differential equation models parameterize biomarker velocities instead of biomarker trajectories and are therefore implicit models. Two works [2, 15] tackle the problem of how to estimate long-term biomarker trajectories from short-term observations of a single biomarker. A recent work used a system of ordinary differential equations (ODEs) to simulate the effect of amyloid treatments on the disease course [1], being the first multivariate ODE-based AD progression model.

In this work, we propose a probabilistic AD progression model that uses a system of ODEs to describe biomarker dynamics. In our formulation, and similar to [16] and [1], all combinations of trajectory starting values are allowed. Another important common feature is that onset times are not model parameters, reducing the risk of non-identifiability. But contrary to all ODE-based approaches, we model the biomarker velocities non-parametrically, using GPs, which adds flexibility and imposes less inductive bias.

2 Methods

2.1 Definitions and Model Overview

We propose a Bayesian generative model to describe the trajectories of brain biomarkers throughout AD. Let \(\textbf{x}_s(t) = [x_{1,s}(t), x_{2,s}(t), \cdots , x_{L,s}(t)]\) be a set of L brain features and \(\textbf{y}_s(t) = [y_{1,s}(t), y_{2,s}(t), \cdots , y_{Q,s}(t)]\) a set of Q covariates for subject s at time t. The features \(x_{l,s}(t)\) represent magnitudes associated with the disease status that evolve as the disease progress. For example, they could be brain atrophy, amyloid plaques or neurofibrillary tangles.In general, we can distinguish the aforementioned brain features, associated to the disease process, from the brain biomarkers extracted from MRI or PET images. However, we will consider one observable \(\hat{x}_l\) per feature \(x_l\) and will refer to features and biomarkers interchangeably. The covariates \(y_{q,s}(t)\) are assumed to have no observation error. They could be fixed over time (e.g. genetics), change in time according to a predefined or known pattern (e.g. age), or be controlled externally (e.g. treatments).

The link between a set of observed biomarkers \(\mathbf {\hat{x}}\) and the feature vector \(\textbf{x}\) is specified by a likelihood function \(\mathcal {L}(\mathbf {\hat{x}} | \textbf{x}, \Theta )\), where \(\Theta \) are model parameters. In our case, the likelihood functions will be independent Gaussian distributions. Let \(\hat{x}_{l,s; i} \sim \mathcal {N}\left( x_{l,s} ( t_{l,s;i} ), \sigma _l^2 \right) \) be the ith observation of biomarker l for subject s at observation time \(t_{l,s;i}\). Note that the number observations and observation times may be different for each subject and each biomarker.

The main hypothesis in this work is that the state of features and covariates at a given time determines unequivocally the rate of progression,i.e. the expected rate of change of all the brain features. Specifically, trajectories should be a solution of the of the initial value problem defined by the system of ODEs

$$\begin{aligned} \frac{d \textbf{x}_s(t)}{dt} = \textbf{v} ( \textbf{z}_s( t ) ) , \end{aligned}$$
(1)

where \(\textbf{z}_s(t) = [\textbf{x}_s(t), \textbf{y}_s(t)]\), with initial condition

$$\begin{aligned} \textbf{z}_s(0) = [\textbf{x}_{0,s}, \textbf{y}_s( 0 ) ] . \end{aligned}$$
(2)

We propose to model each component l of the velocity field \(\textbf{v}( \cdot )\) using a GP prior

$$\begin{aligned} v_l \sim \mathcal{G}\mathcal{P}(0, k_l( \cdot , \cdot ) ) , \end{aligned}$$
(3)

with the exponential kernel, \(k_l( \textbf{z}_m , \textbf{z}_n) = k_{\alpha _l, \rho _l}( \textbf{z}_m , \textbf{z}_n) = \alpha _l^2 \exp (-\frac{(\textbf{z}_m - \textbf{z}_n)^2}{2\rho _l^2})\).

The problem is completely specified once we define priors for the model hyperparameters, i.e., the observation variance \(\sigma _l^2\), the kernel parameters \(\alpha _l\) and \(\rho _l\), and the subject-level parameters \(\textbf{x}_{0,s}\).

However, the velocity field \(\textbf{v}( \cdot )\) is a function-valued parameter which could be difficult to estimate and very hard or impossible to marginalize out given that it is involved in the ODE system (1).We propose the following approximation to transform Eq. (3) into a likelihood function and to model \(\textbf{v}( \cdot )\) implicitly, as it is usual in GP regression. Let \(\hat{x}_{l,s;i}\) and \(\hat{x}_{l,s;i+1}\) be two observations of the same biomarker l at two consecutive time points, \(t_{l,s;i}\) and \(t_{l,s;i+1}\), respectively. Assuming that the time difference \(\varDelta t_{l,s;i} = t_{l,s;i+1} - t_{l,s;i} \) is small with respect to the biomarker dynamics, we can approximate the velocity field using the observation differences.Dropping the indexes s and l for clarity, we have

$$\begin{aligned} \nonumber x(t_{i+1})\simeq & {} x(t_i) + v( \textbf{z}(t_i) ) \varDelta t_i \\ \nonumber \hat{x}_{i+1} - \epsilon _{i+1}\simeq & {} \hat{x}_i - \epsilon _i + v( \textbf{z}(t_i) ) \varDelta t_i \\ \hat{v}_i = \frac{\hat{x}_{i+1} - \hat{x}_i}{ \varDelta t_i}\simeq & {} v( \textbf{z}(t_i) ) + \frac{\epsilon _{i+1} - \epsilon _i}{ \varDelta t_i}, \end{aligned}$$
(4)

where \(\epsilon _i \sim \mathcal {N}(0, \sigma ^2)\) is the observation error. Then \(\hat{v}_i \sim \mathcal {N}( v( \textbf{z}(t_i) ), 2 \sigma ^2 / \varDelta t_i^2 )\) and we can replace Eq. (3) with

$$\begin{aligned} \hat{v} \sim \mathcal{G}\mathcal{P}(0, \hat{k}( \cdot , \cdot ) ) , \end{aligned}$$
(5)

where the new kernel \(\hat{k}( \cdot , \cdot )\) is the same as \(k( \cdot , \cdot )\) plus a noise term, and Eq. (5) represents a likelihood function because its l.h.s. is an observation. Note that the Gaussianity and independence of biomarker observations was critical to define the approximate velocities in Eq. (4).

2.2 Proposed Model

The complete set of parameter priors is given by

$$\begin{aligned} \sigma _l\sim & {} \mathcal {N}^+(0, \tau _{\sigma ,l}) \\ \alpha _l\sim & {} \mathcal {N}^+(0, \tau _{\alpha ,l}) \\ \rho _l\sim & {} \varGamma (5, 5) \\ \left[ x_{0,s} \right] _l\sim & {} \mathcal {U}(0, 1), \end{aligned}$$

where \(\mathcal {N}^+(\cdot , \cdot )\) is the half-Gaussian distribution and \(\varGamma (5, 5)\) is used as a weakly informative prior that penalizes extremely large and extremely small values of the length scale parameter \(\rho \). Weakly informative priors for \(\sigma _l\) and \(\alpha _l\) are determined by setting \(\tau _{\sigma ,l}\) and \(\tau _{\alpha ,l}\) equal to the mean subject-level standard deviation of observations and the variance of the estimated velocities \(\hat{v}_{l, s; i}\), respectively.

For simplicity, each component of the subject-level parameters \(\textbf{x}_{0,s}\) is restricted to be in the unit segment after normalizing the biomarker values to fit in the unit hypercube. Note that they can be modelled as random effects if desired, which would provide better estimations in case that a biomarker is completely missing for a given subject.

The likelihood functions are

$$\begin{aligned} \hat{x}_{l, s;i}\sim & {} \mathcal {N}\left( [ \textbf{x}_s ( t_{l, s;i} ) ]_l, \sigma _l^2 \right) \\ \nonumber \hat{\textbf{v}}_l\sim & {} \mathcal {N}\left( \textbf{0}, K_{\alpha _l, \rho _l}(\hat{Z}_l, \hat{Z}_l) + \textrm{diag}\left( \textbf{s}_{\sigma _l}^2\right) \right) , \end{aligned}$$
(6)

where \(\hat{\textbf{v}}_l\) includes all consecutive observation differences from all subjects, i.e.,

$$\begin{aligned} \hat{\textbf{v}}_l = \left[ \frac{ \hat{x}_{l, 1;2} - \hat{x}_{l, 1;1} }{\varDelta t_{l, 1;1}}, \frac{ \hat{x}_{l, 1;4} - \hat{x}_{l, 1;3} }{\varDelta t_{l, 1;3}}, \dots , \frac{ \hat{x}_{l, 2;2} - \hat{x}_{l, 2;1} }{\varDelta t_{l, 2;1}}, \dots \right] ^T , \end{aligned}$$
(7)

\(\hat{Z}_l\) is a matrix with the corresponding features and covariates,

$$\begin{aligned} \hat{Z}_l = \begin{pmatrix} \hat{x}_{1, 1;1} &{} \hat{x}_{2,1;1} &{} \cdots &{} y_{1,1;1} &{} y_{2,1;1} &{} \cdots \\ \hat{x}_{1, 1;3} &{} \hat{x}_{2,1;3} &{} \cdots &{} y_{1,1;3} &{} y_{2,1;3} &{} \cdots \\ \vdots &{} \vdots &{} &{} \vdots &{} \vdots &{} \\ \hat{x}_{1, 2;1} &{} \hat{x}_{2,2;1} &{} \cdots &{} y_{1,2;1} &{} y_{2,2;1} &{} \cdots \\ \vdots &{} \vdots &{} &{} \vdots &{} \vdots &{} \\ \end{pmatrix} = \begin{pmatrix} \left( \hat{\textbf{z}}_{1,1} \right) ^T \\ \left( \hat{\textbf{z}}_{1,3} \right) ^T \\ \vdots \\ \left( \hat{\textbf{z}}_{2,1} \right) ^T \\ \vdots \end{pmatrix}, \end{aligned}$$

the noise term is given by \( \textbf{s}_{\sigma _l} = \sqrt{2} \sigma _l [ \varDelta t^{-1}_{l, 1;1} , \varDelta t^{-1}_{l, 1;3}, \dots , \varDelta t^{-1}_{l, 2;1} , \dots ]\), and the matrix \(K_{\alpha _l, \rho _l}(\hat{Z}_l, \hat{Z}_l)\) is obtained by applying the kernel to all combinations of rows in \(\hat{Z}_l\).

To compute \(\textbf{x}_s ( t_{l, s;i} )\) in the likelihood term (Eq. (6)) we used forward Euler integration

$$\begin{aligned} \textbf{x}_s ( t_{l, s; i+1} ) = \textbf{x}_s ( t_{l, s;i} ) + \textbf{v}( \textbf{z}_s( t_{l, s;i} )) \varDelta t_{l, s;i} \end{aligned}$$

with initial condition given by Eq. (2), where

$$\begin{aligned} \left[ \textbf{v}( \textbf{z}) \right] _l = \textbf{k}_{\alpha _l, \rho _l}(\textbf{z} , \hat{Z}_l)^T \left( K_{\alpha _l, \rho _l}(\hat{Z}_l, \hat{Z}_l) + \textrm{diag}(\textbf{s}_{\sigma _l}^2)) \right) ^{-1} \hat{\textbf{v}}_l \end{aligned}$$

Note that the trajectory of all biomarkers should be computed simultaneously, even when only a single component l is needed in Eq. (6). This doesn’t represent any problem as far as the covariates \(\textbf{y}_s( t_{l, s;i} )\) are available at all time points because the observations \(\hat{x}_{l', s;i}\) are not used in this computations for \(l'\ne l\). This implies that the biomarkers don’t need to be acquired at the same time points. This is an important consideration for long longitudinal studies, such as ADNI, for which each imaging modality is scheduled at a different rate and the time gap between the first acquisition of two modalities differ between subjects.

However, the matrix \(\hat{Z}_l\) should be complete for each component l. This implies that, at least at the first of the two time points used to compute the differences in Eq. (4), the complete set of observations is needed, and that the vectors \(\hat{\textbf{v}}_l\) in Eq. (7) may have different lengths for each biomarker l.

3 Experiments

3.1 Data

The model was fitted to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) databaseFootnote 1. All subjects from the ADNI dataset having at least 4 valid AV45 (Florbetapir) PET scans and 4 MRI T1 scans were used to fit the model, resulting in a total of 198 participants (88 Cognitively Normal and 110 with Mild Cognitive Impairment), and 874 PET and 1225 MRI measurements.

Three features were selected: mean AV45-PET SURV (average PET signal in cortical grey matter normalized by whole cerebellum), and the ratios of hippocampal and ventricular volume to intracranial volume (ICV)Footnote 2. Covariates included age and the presence of a copy of the E4 allele of the apolipoprotein-E (APOE) gene. The covariate vector \(\textbf{y}\) had only 1 dimension (age) because velocities fields for APOE E4 carriers and non-carriers were kept apart and estimated separately, sharing only the hyperparameters.

3.2 Results

The posterior distributions of the model parameters were obtained with Markov chain Monte Carlo (MCMC) sampling using Stan software [17]. To explore the model predictive performance we did a leave-one-site-out experiment consisting in removing all subjects from a given hospital, except for a few observations used for predicting the rest of the biomarker trajectory, but not for velocity estimation.All the observations from left-out sites within a 5-year interval centred at the PET-MRI time overlap were used to estimate the rest of the trajectory. This interval was defined in order that all subjects to have at least one PET observation. As AV45 started to be acquired after MRI, there are few observations after this period. Therefore, prediction time in the past is larger than in the future.

Figure 1 shows predicted trajectories along with observations for a subset of subjects. Specifically, we selected the subject with most data not shown to the model from each site. Then, the 8 subjects with the longest unobserved trajectories from the APOE E4 non-carrier group were selected.

Apart from prediction, the model allows to test an endless amount of hypotheses, such as the mean difference in a given biomarker rate of change between two given sub-populations. For the sake of illustration, we have focused on the hippocampal rate of change, shown in Fig. 2. The top row panels show a representation of the velocity field in the MRI plane (hippocampal and ventricular volumes) and the bottom row panels show the rate of change of hippocampal volume for different conditions. The most prominent pattern is that APOE E4 carriers present higher rates of hippocampal atrophy than E4 non-carriers. These dynamics are only mildly modulated by brain amyloid levels, as can be observed in the bottom row panels. The strong influence of genetic factors in AD dynamics and the importance of considering their effect in AD progression modelling was recently highlighted in [4].

Fig. 1.
figure 1

Normalized biomarker trajectories for selected individuals. Blue: AV45-PET mean SUVR/3; Orange: Ventricular volume/ICV \(\times 10\); Green: Hippocampal (HPC) volume/ICV \(\times 100\). Dots correspond to observations, black stars denote observations used for prediction while the rest of the observations were hidden during the model inference. Shaded areas represent the 90% highest density interval of the prediction posterior probability. (Color figure online)

Fig. 2.
figure 2

Top: Representation of velocity fields. Dots correspond to the mean estimated initial values and lines represent velocity (two years of evolution) and their uncertainty (lines are drawn from the posterior distribution). Bottom: Hippocampal volume rate of change among subjects with mild neurodegeneration for different conditions of age, APOE and amyloid PET. Shaded areas represent 90% highest density interval of the posterior probability.

4 Discussion

We have presented a statistical model of brain-derived biomarker progression that overcomes important limitations of previous progression models of AD [14]. Remarkably, the proposed model dispenses with the assumption of a common disease trajectory. Biomarker independence is another limiting assumption frequently made. Conversely, the relationship between one biomarker value and another biomarker dynamics is at the core of the proposed model. Finally, this work presents the first non-parametric ODE-based AD progression model.

We have illustrated the model on PET and MRI-derived biomarkers and shown its potential as a tool for AD dynamics understanding and prediction. We showed the feasibility of full Bayesian posterior inference using MCMC in a moderate-sized dataset. The only multivariate ODE-based model of AD progression we are aware of used a variational approximation to estimate posteriors [1].

4.1 Limitations and Future Directions

A limitation of this work is the small number of selected biomarkers. We foresee no mayor computational issues in adding a large number of biomarkers, because GPs scale well with dimensionality. However, more experiments are required to verify the stability of the estimates.

The proposed model can be easily extended with other relevant AD tests, such as cognitive tests. Additionally, a cross-sectional clinical prediction model whose input features are the biomarkers and covariates used in this work could be added on top of the progression model. The progression and the diagnostic model combine together to forecast diagnosis in the future, i.e., producing prognosis predictions.