1 Introduction

Neurodegenerative disorders such as Alzheimer’s disease (AD) are characterized by morphological and molecular changes of the brain, and ultimately lead to cognitive and behavioral decline [8]. To date, there is no clear understanding of the dynamics regulating the disease progression. Consequently, several attempts have been made to model the disease evolution in a data-driven way, using sets of biomarkers extracted from different imaging acquisition techniques, such as Magnetic Resonance Imaging (MRI) [12]. However, available data mostly consist of cross-sectional measures or time-series acquired over a short time span, while the ultimate goal is to unveil the “long-term” disease evolution spreading over decades. There is therefore a critical need to define the AD evolution in a data-driven manner with respect to an absolute time scale associated with the natural history of the pathology.

To this end, the authors of [9] introduce a disease progression score for each patient in order to identify a data-driven disease scale. This score is based on a set of biomarkers and was shown to correlate with the decline of cognitive abilities. A similar approach was proposed by [12] and [6] with scalar biomarkers. In [3], a disease progression score was estimated using higher-dimensional biomarkers from molecular imaging. However, these methods do not provide information about the brain structures involved in AD, or about how the disease affects them over time. To overcome these limitations, [13] proposes a spatio-temporal model of disease progression explicitly accounting for different temporal dynamics across the brain. This is done by decomposing cortical thickness measurements as a mixture of spatio-temporal processes, associating each vertex with a temporal progression modeled by a sigmoid function. The authors also estimate a disease progression score for each subject as a linear transformation of time. However, since the proposed formulation does not account for spatial correlation between vertices, it may be sensitive to spatial variation and noise, thus leading to poor interpretability.

Spatio-temporal modelling of brain images is a classical problem widely addressed via Independent Component Analysis (ICA [7]), especially on functional MRI (fMRI) data [4]. ICA aims at decomposing the data via matrix factorization, looking for a reduced number of spatio-temporal latent sources. Although successful in fMRI analysis, ICA cannot be straightforwardly applied to the modelling of AD progression. First, ICA retrieves maximally independent latent sources best explaining the data. However, although brain regions can exhibit different atrophy rates, this does not necessarily imply statistical independence between them. Second, unlike fMRI data, the absolute time axis of AD spatio-temporal observations is unknown. Estimating the pathology timing is thus a key step in modelling the disease progression, and cannot be performed with standard dimensionality reduction methods such as ICA. Finally, fMRI time series are defined over hundreds of time points, while we work essentially in a cross-sectional setting with one or a few images per subject.

In this work we present a novel spatio-temporal generative model of disease progression aimed at quantifying the independent dynamics of changes in the brain. We model the observed data through matrix factorization across temporal and spatial sources, with a plausibility constraint introduced by clinically-inspired statistical priors. To promote smoothness in time and model steady evolution from normal to pathological stages, the temporal sources are defined as monotonic independent Gaussian Processes (GPs). We also estimate an individual time-shift parameter for each patient to automatically position them along the time axis of the sources. To encode the spatial continuity of the brain sub-structures, the spatial sources are modeled as Gaussian random fields. The framework is efficiently optimized through stochastic variational inference. In the next sections we detail the method formulation and show its application on synthetic data and on real data consisting of a large dataset of MRIs from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Further information can be found in the Appendix.

2 Method

We assume that the spatio-temporal data \(Y(x,t) = [Y_{1}(x,t_{1}), Y_{2}(x,t_{2}), \ldots , Y_{P}(x,t_{P})]\) is stored in a matrix with dimensions \(P \times F\), where P is the number of patients, F the number of image features, and \(Y_{p}(x, t_{p})\) is the image of an individual p observed at position x and at time \(t_{p}\). We postulate a generative model in order to decompose the data into \(N_s\) spatio-temporal sources such that:

$$\begin{aligned} \displaystyle Y_{p}(x,t_{p}) = S(\theta , t+t_{p})A(\psi , x) + \mathcal {E} \end{aligned}$$
(1)

where S is a \(P \times N_s\) matrix in which each column represents a temporal trajectory, \(t_{p}\) is the individual time-shift parameter, and \(\theta \) is the set of parameters related to the temporal sources. A is an \(N_s \times F\) matrix in which each row represents a spatial map, and \(\psi \) is a set of spatial parameters. \(\mathcal {E}\) is Gaussian noise distributed as \(\mathcal {N}(0, \sigma ^{2}I)\). According to the generative model the likelihood is:

$$\begin{aligned} p(Y|A,S, \sigma ) = \displaystyle \prod _{p=1}^{P} \frac{1}{(2\pi \sigma ^{2})^\frac{F}{2}} \exp (-\frac{1}{2\sigma ^{2}}||Y_{p} -S(\theta , t+t_{p}) A(\psi , x)||^{2}) \end{aligned}$$
(2)
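As a minimal sketch of how the logarithm of the likelihood (2) can be evaluated, the NumPy snippet below assumes that the temporal sources have already been evaluated at each patient's time, so that S is a plain \(P \times N_s\) array. The function name `log_likelihood` and the toy sizes are illustrative assumptions, not part of the original method.

```python
import numpy as np

def log_likelihood(Y, S, A, sigma):
    """Log of Eq. (2): isotropic Gaussian likelihood of Y around S @ A."""
    P, F = Y.shape
    residual = Y - S @ A                      # (P, F) reconstruction error
    sq_norms = np.sum(residual ** 2, axis=1)  # ||Y_p - S_p A||^2 for each patient
    return float(-0.5 * P * F * np.log(2.0 * np.pi * sigma ** 2)
                 - sq_norms.sum() / (2.0 * sigma ** 2))

# Toy sizes (hypothetical): 5 patients, 2 sources, 4 image features.
rng = np.random.default_rng(0)
P, N_s, F, sigma = 5, 2, 4, 0.1
S = rng.normal(size=(P, N_s))   # temporal sources evaluated at each patient's time
A = rng.normal(size=(N_s, F))   # spatial maps, one per row
Y = S @ A + sigma * rng.normal(size=(P, F))

ll_true = log_likelihood(Y, S, A, sigma)                  # sources that generated Y
ll_zero = log_likelihood(Y, np.zeros_like(S), A, sigma)   # uninformative sources
```

In this toy setting the sources that generated the data score a much higher log-likelihood than the all-zero sources, which is the quantity the inference procedure trades off against the priors.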

For each row \(A_{n}\) of A we specify a \(\mathcal {N}(0, I)\) prior, while each column \(S_{n}\) of S is a GP modeled as in [5]. This setting leverages kernel approximation through sampling of basis functions in the spectral domain [14]. For specific choices of the covariance, such as the Radial Basis Function used in our work, the GPs can be approximated as a Bayesian neural network of the form \(S(t) = \phi (\varOmega t)W\), where \(\varOmega \) is the projection in the spectral domain, \(\phi \) the non-linear basis function activation, and W the regression parameter. The GP inference problem thus amounts to estimating approximated distributions for \(\varOmega \) and W.
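The spectral approximation \(S(t) = \phi (\varOmega t)W\) can be illustrated with standard random Fourier features for an RBF kernel. The sketch below is written under stated assumptions: \(\varOmega \) and W are simply sampled from Gaussian priors rather than inferred, \(\phi \) is taken to be the usual cosine/sine feature map, and the length-scale and number of features are hypothetical values.

```python
import numpy as np

def rff_features(t, omega):
    """phi(Omega t): cosine/sine random features, shape (len(t), 2 * n_rf)."""
    proj = np.outer(t, omega)                  # projection Omega t
    n_rf = omega.shape[0]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_rf)

rng = np.random.default_rng(0)
lengthscale, n_rf, N_s = 0.5, 2000, 3          # hypothetical values
t = np.linspace(0.0, 1.0, 20)

# For an RBF kernel the spectral frequencies are Gaussian with std 1 / lengthscale.
omega = rng.normal(scale=1.0 / lengthscale, size=n_rf)
Phi = rff_features(t, omega)

# The feature inner products approximate the exact RBF kernel matrix.
K_approx = Phi @ Phi.T
K_exact = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2.0 * lengthscale ** 2))
max_err = np.max(np.abs(K_approx - K_exact))

# One draw of N_s approximate GP trajectories: S(t) = phi(Omega t) W.
W = rng.normal(size=(2 * n_rf, N_s))
S = Phi @ W
```

With enough random features the approximate kernel matrix stays close to the exact one, which is what makes the neural-network form a usable surrogate for the GP.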

To account for the steady increase of the sources from normal to pathological stages we introduce a monotonicity prior over the GPs. To do so, we constrain the space of the temporal sources to the set \(\mathcal {C} = \{S(t) \mid S'(t) \ge 0 \quad \forall t\}\), following [11]. This leads to a second likelihood term constraining the dynamics of the temporal sources:

$$\begin{aligned} p(\mathcal {C}|S', \lambda ) = (1 + \exp (-\lambda S'(t)))^{-1} \end{aligned}$$
(3)
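Equation (3) is a logistic function of \(\lambda S'(t)\): it is close to one where the derivative is positive and close to zero where it is negative. The following sketch evaluates its logarithm, summed over a grid of time points, for an increasing and a decreasing curve. The derivative is taken by finite differences purely for illustration (an assumption of this example), and the value of \(\lambda \) is hypothetical.

```python
import numpy as np

def log_monotonic_likelihood(S_prime, lam):
    """Sum over time points of log (1 + exp(-lam * S'(t)))^{-1}, computed stably."""
    return float(-np.sum(np.logaddexp(0.0, -lam * S_prime)))

t = np.linspace(-5.0, 5.0, 200)
increasing = 1.0 / (1.0 + np.exp(-t))   # sigmoid, S'(t) > 0 everywhere
decreasing = 1.0 - increasing           # mirrored curve, S'(t) < 0 everywhere
lam = 50.0                              # hypothetical constraint strength

# Finite-difference derivatives (illustration only).
ll_inc = log_monotonic_likelihood(np.gradient(increasing, t), lam)
ll_dec = log_monotonic_likelihood(np.gradient(decreasing, t), lam)
```

The increasing curve receives the larger log-likelihood, and raising `lam` sharpens the penalty on negative slopes.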

We jointly optimize (2) according to priors and constraints, by maximizing the data evidence:

$$\begin{aligned} \displaystyle \log (p(Y,\mathcal {C}|\sigma , \lambda )) = \log [\int _{A}\int _{S} \int _{S'}p(Y|A,S, \sigma )p(\mathcal {C}|S', \lambda )p(A)p(S,S'| \lambda )dAdSdS'] \end{aligned}$$
(4)

Since this integral is intractable, we tackle the optimization of (4) via stochastic variational inference. Following [10] and [5] we introduce approximations \(q_1(A)\) and \(q_2(\varOmega , W)\) to derive the lower bound:

$$\begin{aligned} \begin{aligned} \log (p(Y,\mathcal {C}|\sigma , \lambda ))&\geqslant \mathbb {E}_{A \sim q_{1}, (\varOmega , W) \sim q_{2}}[\log (p(Y|A,\varOmega , W, \sigma ))] + \mathbb {E}_{(\varOmega , W) \sim q_{2}}[\log (p(\mathcal {C}|\varOmega ,W, \lambda ))] \\&- \mathcal {D}[q_{1}(A)||p(A)] - \mathcal {D}[q_{2}(\varOmega , W)||p(\varOmega , W)] \end{aligned} \end{aligned}$$
(5)

where \(\mathcal {D}\) refers to the Kullback-Leibler divergence.
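When both distributions are Gaussian, the Kullback-Leibler terms of (5) have a closed form. As a hedged illustration, the sketch below computes the divergence between a multivariate Gaussian approximation and the \(\mathcal {N}(0, I)\) prior specified for the rows of A. The function name and the toy inputs are assumptions made for this example.

```python
import numpy as np

def kl_gaussian_to_standard(mu, Sigma):
    """KL[N(mu, Sigma) || N(0, I)] for a d-dimensional Gaussian."""
    d = mu.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)   # log-determinant of Sigma
    return 0.5 * (np.trace(Sigma) + mu @ mu - d - logdet)

# Toy checks (hypothetical inputs).
d = 3
kl_same = kl_gaussian_to_standard(np.zeros(d), np.eye(d))      # q equals the prior
kl_shifted = kl_gaussian_to_standard(np.ones(d), np.eye(d))    # mean moved to (1, 1, 1)
```

The divergence vanishes when the approximation equals the prior and grows as the mean or covariance departs from it, which is how these terms regularize the fit in (5).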

We specify the approximated distribution of the spatial activation maps \(q_{1}\) such that \(q_{1}(A) = \textstyle \prod _{n=1}^{N_s} \mathcal {N}(\mu _{n}, \varSigma (\alpha , \beta ))\). To introduce spatial correlations in the maps we choose \( \varSigma _{i,j}(\alpha , \beta ) = \alpha \exp (-||u_{i}-u_{j}||^{2}/2\beta )\) to model a smooth decay across voxels with coordinates \(u_{i}\) and \(u_{j}\). We follow [5] and [11] to also define a variational lower bound on the constrained GPs parameterizing the temporal processes. Thanks to the proposed framework, (4) can be efficiently optimized by stochastic variational inference through backpropagation. We chose to alternate the optimization between the spatio-temporal parameters and the time-shift. We set \(\lambda \) to the minimum value that gives monotonic sources, while \(\sigma \) was arbitrarily determined from the data. A detailed derivation of the model and lower bound can be found in the Appendix.
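To make the covariance \(\varSigma (\alpha , \beta )\) concrete, the sketch below builds it on a small 2D pixel grid and draws one spatially smooth map from it. The grid size, the values of \(\alpha \) and \(\beta \), and the jitter added before the Cholesky factorization are assumptions of this example.

```python
import numpy as np

def spatial_covariance(coords, alpha, beta):
    """Sigma_ij = alpha * exp(-||u_i - u_j||^2 / (2 * beta))."""
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return alpha * np.exp(-sq_dist / (2.0 * beta))

# Pixel coordinates u_i of a small grid (hypothetical 10 x 10 image).
side = 10
ii, jj = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
coords = np.stack([ii.ravel(), jj.ravel()], axis=1).astype(float)   # (100, 2)

alpha, beta = 1.0, 4.0                                              # hypothetical values
Sigma = spatial_covariance(coords, alpha, beta)

# Draw one smooth map; a small jitter keeps the Cholesky factorization stable.
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(Sigma + 1e-6 * np.eye(side * side))
sample_map = (chol @ rng.normal(size=side * side)).reshape(side, side)
```

Larger values of `beta` spread the correlation over more distant pixels and therefore produce smoother maps, while `alpha` sets the marginal variance of each pixel.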

3 Results

3.1 Benchmark on Synthetic Data

We tested the algorithm on synthetic data to assess its ability to separate spatio-temporal sources from mixed data, and to provide a model selection via the variational lower bound. We generated three monotonically increasing functions \(S_{i}(t)\) such that \(S_i(t) = 1/(1 + \exp (-t + \alpha _{i}))\), and three synthetic Gaussian activation maps \(A_1, A_2, A_3\) with a \(30 \times 30\) resolution, to mimic grey matter brain areas (Figs. 1a and b). The data was generated as \(Y_{p,j} = S(t_{p})A + \mathcal {E}_{j}\) over 50 time points \(t_{p}\), where \(t_{p}\) is uniformly distributed in [0,1]. We sampled 50 images at instants \(t_{p}\) and applied our method. To simulate a pure cross-sectional setting, the time associated with each input image was set to zero. Figures 1c and d show the estimated spatio-temporal processes when fitting the model with three latent sources. In Fig. 2, we see that the individual time-shift parameter estimated for each subject correlates with the original time used to generate the data. This indicates that the algorithm correctly positions each subject on the temporal trajectories.
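A minimal sketch of this synthetic setup is given below. The sigmoid form, the \(30 \times 30\) resolution, the 50 images and the uniform sampling of \(t_{p}\) in [0,1] follow the description above. The offsets \(\alpha _{i}\), the blob centres and width, and the noise level are not specified in the text and are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
P, side, N_s = 50, 30, 3
alphas = np.array([0.2, 0.5, 0.8])                 # hypothetical offsets alpha_i
centers = np.array([[8, 8], [15, 22], [22, 10]])   # hypothetical blob centres
width, noise_std = 3.0, 0.05                       # hypothetical blob width and noise

# Temporal sources S_i(t) = 1 / (1 + exp(-t + alpha_i)) at the sampled times t_p.
t_p = rng.uniform(0.0, 1.0, size=P)
S = 1.0 / (1.0 + np.exp(-t_p[:, None] + alphas[None, :]))    # (P, N_s)

# Gaussian activation maps on a 30 x 30 grid, flattened to one row per source.
rows, cols = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
A = np.stack([
    np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * width ** 2)).ravel()
    for r, c in centers
])                                                            # (N_s, 900)

# Mixed, noisy observations Y = S(t_p) A + noise.
Y = S @ A + noise_std * rng.normal(size=(P, side * side))

# Pure cross-sectional input: the time given to the model is zero for every image.
t_input = np.zeros(P)
```

The ground-truth times `t_p` are kept only for evaluation, for instance to compare them with the time-shifts estimated by the model.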

Fig. 1. (a)–(b) Ground truth temporal and spatial sources. (c) Red: raw temporal sources against the original time axis. Blue: recovered temporal sources against the estimated time scale. (d) Estimated spatial maps. (Color figure online)

Fig. 2. The red points represent the values of the estimated subjects’ time-shift against their associated ground truth value. (Color figure online)

To test the model selection, we generated the data as described above using respectively one, two, or three sources over ten folds. For each fold we ran the algorithm looking for one to four sources. Figure 3 shows the mean and standard deviation of the lower bound. We observe that when the number of sources is under-estimated the lower bound is lower. When the number of sources is over-estimated, the lower bound is a more uncertain criterion for model selection, but the extracted spatial maps show that the additional sources are mainly set to zero or have low weights (see Fig. 3d). These experimental results indicate that the optimal number of sources should be selected by inspecting both the lower bound and the extracted spatial sources.

Fig. 3. (a)–(b)–(c): Distribution of the lower bound against the number of fitted sources. (d): \(4^{th}\) extracted spatial map with data generated by 3 latent sources.

The method was also compared to ICA in a simplified setting by assigning the ground truth parameter \(t_{p}\) beforehand. This simplification is necessary since standard ICA cannot be applied when the time associated with each image is unknown. We observed that ICA recovered the spatio-temporal sources, but with noisier estimates than the ones obtained with our method. This result highlights the importance of the priors and constraints introduced in our method (see Appendix).

3.2 Application on Real Data

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. For up-to-date information, see www.adni-info.org.

Fig. 4. (a)–(b) Temporal and spatial sources extracted from the data.

In this section we present an application of the algorithm on real data, using grey matter maps extracted from structural MRI. We selected a cohort of 555 subjects from ADNI composed of 94 healthy controls, 343 MCI, and 118 AD patients. We processed the baseline MRI of each subject to obtain high-dimensional grey matter density maps in a standard space [1]. We extracted the \(90 \times 100\) middle coronal slice for each patient, to obtain a data matrix Y with dimensions \(555 \times 9000\), and applied our algorithm looking for three spatio-temporal sources (see Fig. 4). The middle spatial map shows a strong activation of the hippocampus, while the left and right maps show an activation of the temporal lobes, with two similar temporal behaviours characterized by a less pronounced grey matter loss compared to the hippocampus. More specifically, we observe that the hippocampal trajectory shows a strong acceleration, in contrast to the other brain areas. This pattern, quantified by our model in a purely data-driven manner, is compatible with empirical evidence from clinical studies [2]. Figure 5 shows the estimated time of each patient against standard volumetric and clinical biomarkers. We see a strong correlation between brain volumetric measures and the estimated time, as well as a non-linear relation in the evolution of ADAS11. The latter result indicates an acceleration of clinical symptoms along the estimated time course.

Fig. 5. Evolution of volumetric and clinical biomarkers along the estimated time.

4 Conclusion

We presented a method for analyzing spatio-temporal data, which identifies both the independent spatio-temporal processes at play in AD and a disease progression scale. Applied to grey matter maps, the model highlights different brain regions affected by the disease, such as the hippocampus and the temporal lobes, along with their differential temporal trajectories. We also show a strong correlation between the estimated disease progression scale and different clinical and volumetric biomarkers. We are currently extending the approach to scale to 3D volumetric images by parallelization on multiple GPUs. The properties of the lower bound will also be further investigated to better assess its reliability and improve model comparison. Moreover, the method will be extended beyond the cross-sectional application of Sect. 3.2, to account for time-series of brain images as well as for multimodal imaging biomarkers. Finally, we will investigate the use of the approach for prognosis purposes, to provide a data-driven assessment of disease severity in testing patients.