1 Introduction

Deformable atlas building aims to create a “mean”, or averaged, image and register all subjects to a common space. The resulting atlas and group transformations are powerful tools for statistical shape analysis of images [12, 18], template-based segmentation [13, 21, 22], and object tracking [16, 17], to name a few. The quality of the atlas relies heavily on the registration process, which is typically formulated as a regularized optimization problem [5, 7, 25, 30]. A key issue in registration-based atlas construction is how to regularize the model parameters. An appropriate regularization is critical to the “sharpness” of the atlas, and it also ensures desirable properties of the transformations, i.e., smooth and invertible mappings between images, also known as diffeomorphisms, that preserve the topology of the original images.

Current atlas building models either exhaustively search for an optimal regularization in the parameter space, or treat it as an unknown variable to be estimated in a Bayesian model. While ad hoc parameter tuning may yield satisfactory results, it requires expert domain knowledge to guide the tuning process [14, 18, 26, 27]. Inspired by probabilistic models, several works have proposed Bayesian models of atlas building with automatically estimated regularization [1, 2, 31]. These approaches define a posterior distribution consisting of an image matching term between the deformed atlas and each individual as a likelihood, and a regularization term as a prior that encourages smooth transformation fields. The regularization parameter is then estimated jointly with the atlas after carefully integrating out the image deformations using Monte Carlo sampling. However, sampling in a high-dimensional transformation space (e.g., on a dense 3D image grid of size \(128^3\)) is computationally expensive and often leads to long execution times with high memory consumption. More importantly, the aforementioned methods are limited to single-penalty regularizations shared across the population. This prevents the model from adaptively searching for the best regularization parameter for each individual subject, which is critical for images with varying degrees of geometric transformation. Such a “one-size-fits-all” approach fails when large geometric variations occur, e.g., brain shape changes in an Alzheimer’s disease group. Allowing subject-specific (data-driven) regularization can substantially improve the sharpness and quality of the atlas [29].

In this paper, we propose a hierarchical Bayesian model of atlas building with subject-specific regularization in the context of the Large Deformation Diffeomorphic Metric Mapping (LDDMM) algorithm [7]. In contrast to previous approaches that treat the regularization of individual subjects as a single-penalty function with ad hoc parameters, we develop a data-adaptive algorithm that automatically adjusts the model parameters. To achieve this, we introduce a novel hierarchical prior that features (i) prior distributions with multiple regularization parameters on the group transformations in a low-dimensional bandlimited space; and (ii) a hyperprior that models the regularization parameters as latent variables. We then develop a Monte Carlo Expectation Maximization (MCEM) algorithm, where the expectation step integrates over the regularization parameters using Hamiltonian Monte Carlo (HMC) sampling. The joint estimation of model parameters, including the atlas, registrations, and hyperparameters, in the maximization step eliminates the massive burden of multi-parameter tuning. We demonstrate the effectiveness of our algorithm on both 2D synthetic images and 3D real brain MRIs.

To the best of our knowledge, we are the first to extend atlas building to a data-adaptive and parameter-tuning-free framework via hierarchical Bayesian learning. Experimental results show that our model provides an efficient atlas construction of population images, particularly under large variations of geometric transformations. This paves the way for improved quality in clinical studies where atlas building is required, for example, statistical shape analysis of brain changes for neurodegenerative disease diagnosis [12], or atlas-based segmentation for in-utero placental disease monitoring [16].

2 Background: Atlas Building with Fast LDDMM

We first briefly review an unbiased atlas building algorithm [14] based on Fourier-approximated Lie Algebra for Shooting (FLASH), a fast variant of LDDMM with geodesic shooting [30]. Given a set of images \(I_1,\cdots ,I_N\) with N being the number of images, the problem of atlas building is to find a template image I and transformations \(\phi _1,\cdots , \phi _N\) that minimize the energy function

$$\begin{aligned} E(I, \phi _n) = \sum _{n=1}^{N} \text {Dist} (I \circ \phi _n, I_n) + \text {Reg}(\alpha , \phi _n). \end{aligned}$$
(1)

Here \(\text {Dist}(\cdot , \cdot )\) is a distance function that measures the dissimilarity between images, e.g., sum-of-squared differences [7], normalized cross correlation [6], or mutual information [28]. The \(\mathrm{Reg}(\cdot )\) term is a weighted regularization with parameter \(\alpha \) that guarantees the diffeomorphic properties of the transformation fields.

Regularization In Tangent Space of Diffeomorphisms. Given an open and bounded d-dimensional domain \(\varOmega \subset \mathbb {R}^d\), we use \(\mathrm{Diff}(\varOmega )\) to denote a space of diffeomorphisms and its tangent space \(V=T \mathrm{Diff}(\varOmega )\). The regularization of LDDMM is defined as an integral of the Sobolev norm of the time-dependent velocity field \(v(t) \in V (t \in [0, 1])\) in the tangent space, i.e.,

$$\begin{aligned} \text {Reg}(\alpha , \phi _n) = \int \langle \mathcal {L}(\alpha ) v_n(t), \mathcal {L}(\alpha ) v_n(t) \rangle \, dt, \, \text {with} \, \, \frac{d\phi _n(t)}{dt} = - D\phi _n(t)\cdot v_n(t). \end{aligned}$$
(2)

Here \(\mathcal {L}\) is a symmetric, positive-definite differential operator, with parameter \(\alpha \) controlling the smoothness of the transformation fields. In this paper, we use the Laplacian operator \(\mathcal {L}=(- \alpha \varDelta + \text {Id})^3\), where \(\text {Id}\) is the identity matrix. The operator D denotes the Jacobian matrix and \(\cdot \) denotes element-wise matrix multiplication.

According to the geodesic shooting algorithm [25], the minimum of LDDMM is uniquely determined by solving the Euler-Poincaré differential equation (EPDiff) [3, 19] with initial conditions. This inspired the recent FLASH model, which reparameterizes the regularization of Eq. (2) in a low-dimensional bandlimited space of initial velocity fields, dramatically reducing the computational complexity of transformation models with little to no loss of accuracy [30].

Fourier Computation of Diffeomorphisms. Let \(\widetilde{\mathrm{Diff}}(\varOmega )\) and \(\tilde{V}\) denote the space of Fourier representations of diffeomorphisms and velocity fields respectively. Given time-dependent velocity field \(\tilde{v}(t) \in \tilde{V}\), the diffeomorphism \(\tilde{\phi }(t) \in \widetilde{\mathrm{Diff}}(\varOmega )\) in the finite-dimensional Fourier domain can be computed as

$$\begin{aligned} \tilde{\phi }(t) = \tilde{\text {Id}} + \tilde{u}(t), \quad \frac{d \tilde{u}(t)}{dt} = -\tilde{v}(t) - \tilde{\mathcal {D}} \tilde{u}(t) *\tilde{v}(t), \end{aligned}$$
(3)

where \(\tilde{\text {Id}}\) is the frequency of an identity element, \(\tilde{\mathcal {D}}\tilde{u}(t)\) is a tensor product \(\tilde{\mathcal {D}} \otimes \tilde{u}(t)\), representing the Fourier frequencies of a Jacobian matrix \(\tilde{\mathcal {D}}\) with central difference approximation, and \(*\) is a circular convolution.
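For intuition, the following minimal sketch integrates the spatial-domain analogue of Eq. (3), \(du/dt = -v - (Du)\,v\), with forward Euler on a 2D grid. The function names, the evenly spaced time steps, and the use of spatial finite differences (rather than the truncated circular convolution in the bandlimited Fourier space) are simplifying assumptions for illustration, not the FLASH implementation.

```python
import numpy as np

def jacobian_times_vector(u, v, spacing=1.0):
    """(Du) v at every pixel using central differences.
    u, v: arrays of shape (2, H, W); component 0 = x, component 1 = y (array axes are (y, x))."""
    out = np.zeros_like(u)
    for i in range(2):
        gy, gx = np.gradient(u[i], spacing)   # d u_i / dy, d u_i / dx
        out[i] = gx * v[0] + gy * v[1]
    return out

def integrate_displacement(velocities):
    """Forward-Euler integration of du/dt = -v - (Du) v from t=0 to t=1,
    given velocity fields sampled at evenly spaced time steps; phi = Id + u."""
    dt = 1.0 / len(velocities)
    u = np.zeros_like(velocities[0])
    for v in velocities:
        u = u + dt * (-v - jacobian_times_vector(u, v))
    return u
```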

The Fourier representation of the geodesic shooting equation (EPDiff) is

$$\begin{aligned} \frac{\partial \tilde{v}(t)}{\partial t} =-\tilde{\mathcal {K}}\left[ (\tilde{\mathcal {D}} \tilde{v}(t))^T \star \tilde{\mathcal {L}}\tilde{v}(t) + \tilde{\nabla } \cdot (\tilde{\mathcal {L}}\tilde{v}(t) \otimes \tilde{v}(t)) \right] , \end{aligned}$$
(4)

where \(\star \) is the truncated matrix-vector field auto-correlation. The operator \(\tilde{\nabla } \cdot \) is the discrete divergence of a vector field. Here \(\tilde{\mathcal {K}}\) is an inverse operator of \(\tilde{\mathcal {L}}\), which is the Fourier transform of a Laplacian operator in this paper.

The regularization in Eq. (2) can be equivalently formulated as

$$ \begin{aligned} \text {Reg}(\alpha , \phi _n) = \langle \tilde{{\mathcal {L}}}(\alpha ) \tilde{v}_n(0), \tilde{{\mathcal {L}}}(\alpha ) \tilde{v}_n(0) \rangle , \quad \text {s.t.} \, \text {Eq.}~\mathrm{(3)}\, \& \,\text {Eq.}~\mathrm{(4)}. \end{aligned}$$

We will drop the time index in the remaining sections for notational simplicity, e.g., defining \(\tilde{v}_n \triangleq \tilde{v}_n(0)\).
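In code, this bandlimited regularization reduces to a weighted norm of the truncated Fourier coefficients of the initial velocity. The sketch below assumes the operator coefficients \(\tilde{\mathcal {L}}(\alpha)\) are already available as an array (their explicit form is given in Sect. 3); the array shapes and names are illustrative assumptions.

```python
import numpy as np

def bandlimited_regularization(v0_tilde, L_tilde):
    """<L v0, L v0> evaluated on the truncated Fourier grid.
    v0_tilde: complex array of shape (d, *grid) holding the low-frequency
    coefficients of the initial velocity; L_tilde: real array of shape (*grid)
    holding the Fourier coefficients of the operator L(alpha)."""
    Lv = L_tilde * v0_tilde            # broadcasts over the d vector components
    return float(np.sum(np.abs(Lv) ** 2))
```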

3 Our Model: Bayesian Atlas Building with Hierarchical Priors

This section presents a hierarchical Bayesian model for atlas building that allows subject-specific regularization without manual parameter tuning. We introduce a hierarchical prior distribution on the initial velocity fields with adaptive smoothing parameters, followed by a likelihood distribution on the images.

Likelihood. Assuming an independent and identically distributed (i.i.d.) Gaussian noise on image intensities, we formulate the likelihood of each observed image \(I_n\) as

$$\begin{aligned} p(I_n \, | \, I, \tilde{v}_n, \sigma ^2) = \frac{1}{(\sqrt{2 \pi } \sigma )^{M}} \exp { \left( -\frac{1}{2\sigma ^2} \Vert I \circ \phi _n - I_n \Vert _2^2 \right) }. \end{aligned}$$
(5)

Here \(\sigma ^2\) denotes the noise variance, M is the number of image voxels, and \(\phi _n\) is the inverse Fourier transform of \(\tilde{\phi }_n\) at time point \(t=1\). It is worth mentioning that other noise models, such as spatially varying noise [24], can also be applied.
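As a reference, the log of the likelihood in Eq. (5) can be evaluated as follows; the deformed atlas \(I \circ \phi _n\) is assumed to be precomputed, and the function name is an illustrative choice.

```python
import numpy as np

def log_likelihood(warped_atlas, subject, sigma2):
    """Log of Eq. (5): i.i.d. Gaussian noise on image intensities.
    warped_atlas = I o phi_n resampled on the subject grid."""
    M = subject.size
    residual = warped_atlas - subject
    return float(-M * np.log(np.sqrt(2.0 * np.pi * sigma2))
                 - 0.5 * np.sum(residual ** 2) / sigma2)
```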

Prior. To ensure the smoothness of transformation fields, we define a prior on each initial velocity field \(\tilde{v}_n\) as a complex multivariate Gaussian distribution

$$\begin{aligned} p({\tilde{v}}_n \, | \, \alpha _n ) = \frac{1}{(2 \pi )^{\frac{M}{2}} | {\tilde{\mathcal {L}}_n}^{-1}(\alpha _n) |} \exp \left( {-\frac{1}{2} \langle {\tilde{\mathcal {L}}_n(\alpha _n)} {\tilde{v}}_n, {\tilde{\mathcal {L}}_n(\alpha _n)} {\tilde{v}}_n \rangle } \right) , \end{aligned}$$
(6)

where \(|\cdot |\) denotes the matrix determinant. The Fourier coefficients of the discrete Laplacian operator are \(\tilde{{\mathcal {L}}}_n(\xi _1 , \ldots , \xi _d) = \left( -2 \alpha _n \sum _{j = 1}^d \left( \cos (2\pi \xi _j) - 1 \right) + 1\right) ^3\), with \((\xi _1 , \ldots , \xi _d)\) being a d-dimensional frequency vector.
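A small sketch of the operator construction: the Fourier coefficients of \(\tilde{\mathcal {L}}_n(\alpha _n)\) can be assembled directly from the formula above. We assume the frequencies \(\xi _j\) are normalized (index divided by grid size) and that the grid is the truncated bandlimited one used by FLASH; both are conventions adopted for illustration.

```python
import numpy as np

def laplacian_fourier_coefficients(alpha, grid_shape):
    """Coefficients (-2*alpha*sum_j(cos(2*pi*xi_j) - 1) + 1)^3 of the operator
    L(alpha) = (-alpha*Laplacian + Id)^3 on a grid of the given shape."""
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in grid_shape], indexing="ij")
    lap = sum(np.cos(2.0 * np.pi * xi) - 1.0 for xi in freqs)
    return (-2.0 * alpha * lap + 1.0) ** 3
```

These coefficients can be plugged into the regularization sketch of Sect. 2, and their reciprocal gives the smoothing kernel \(\tilde{\mathcal {K}}\).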

Hyperprior. We treat the subject-specific regularization parameter \(\alpha _n\) of the prior distribution (6) as a random variable generated from a Gamma distribution, which is a commonly used prior for positive real numbers [23]. Other priors, such as the inverse Wishart distribution [11], can also be applied. The hyperprior of our model is formulated as

$$\begin{aligned} p(\alpha _n \, | \, k, \beta ) = \frac{\alpha _n^{k-1} \exp (-\alpha _n / \beta )}{\varGamma (k) \beta ^{k}}, \end{aligned}$$
(7)

with k and \(\beta \) being positive numbers denoting the shape and scale parameters respectively. The Gamma function satisfies \(\varGamma (k)=(k-1)!\) for all positive integers k. We finally arrive at the log posterior of the diffeomorphic transformations and regularization parameters as

$$\begin{aligned} E(\tilde{v}_n, \alpha _n, I, \sigma , k, \beta )&\triangleq \ln \prod _{n=1}^N p(I_n \, | \, I, \tilde{v}_n, \sigma ^2) \cdot p({\tilde{v}}_n \, | \, \alpha _n ) \cdot p(\alpha _n \, | \, k, \beta ) \nonumber \\&= \sum _{n=1}^{N} \frac{1}{2}\ln |\tilde{\mathcal {L}}_n |- M \ln \sigma -\frac{ \Vert I \circ \phi _n -I_n \Vert _2^2}{2\sigma ^2} - \frac{1}{2}\langle \tilde{\mathcal {L}}_n \tilde{v}_n, \tilde{\mathcal {L}}_n \tilde{v}_n \rangle \nonumber \\&\quad + (k-1) \ln \alpha _n -\frac{\alpha _n}{\beta } - k \ln \beta - \ln \varGamma (k) +\text {const.} \end{aligned}$$
(8)
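Collecting the \(\alpha _n\)-dependent terms of Eq. (8) gives the potential energy \(U(\alpha _n)\) used by the HMC sampler in Sect. 3.1. The sketch below recomputes the operator coefficients inline; treating \(\ln |\tilde{\mathcal {L}}_n|\) as a sum of log coefficients over the truncated grid is an assumption about the determinant convention.

```python
import numpy as np

def potential_energy_alpha(alpha, v0_tilde, k, beta, grid_shape):
    """U(alpha_n) = -log p(alpha_n | I_n; Theta) up to an additive constant:
    -(1/2) ln|L| + (1/2) <L v, L v> - (k - 1) ln(alpha) + alpha / beta."""
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in grid_shape], indexing="ij")
    lap = sum(np.cos(2.0 * np.pi * xi) - 1.0 for xi in freqs)
    L = (-2.0 * alpha * lap + 1.0) ** 3
    log_det = np.sum(np.log(L))                 # ln |L(alpha)| (per-coefficient convention)
    reg = np.sum(np.abs(L * v0_tilde) ** 2)     # <L v, L v>
    log_hyper = (k - 1.0) * np.log(alpha) - alpha / beta
    return float(-(0.5 * log_det - 0.5 * reg + log_hyper))
```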

3.1 Model Inference

We develop an MCEM algorithm to infer the model parameters \(\varTheta \), which include the image atlas I, the noise variance of image intensities \(\sigma ^2\), the initial velocities of the diffeomorphic transformations \(\tilde{v}_n\), and the hyperparameters k and \(\beta \). We treat the regularization parameters \(\alpha _n\) as latent random variables and integrate them out of the log posterior in Eq. (8). The computations of the two main steps (expectation and maximization) are detailed below.

Expectation: HMC. Since the E-step does not yield a closed-form solution, we employ the Hamiltonian Monte Carlo (HMC) sampling method [9] to approximate the expectation function Q with respect to the latent variables \(\alpha _n\). For each \(\alpha _n\), we draw S samples from the log posterior (8) using HMC at the current estimated parameters \(\hat{\varTheta }\). The Monte Carlo approximation of the expectation Q is

$$\begin{aligned} Q(\varTheta | \hat{\varTheta }) \approx \frac{1}{S} \sum _{n=1}^{N} \sum _{j=1}^{S} \ln p(\alpha _{nj} \, | \, I_n; \hat{\varTheta }). \end{aligned}$$
(9)

To produce samples of \(\alpha _n\), we first define the potential energy of the Hamiltonian system \(H(\alpha _n,\gamma ) = U(\alpha _n)+W(\gamma )\) as \(U(\alpha _n) = -\ln p(\alpha _n \, | \, I_n; \varTheta )\). The kinetic energy \(W(\gamma )\) corresponds to a standard normal distribution on an auxiliary variable \(\gamma \). This gives us Hamilton’s equations to integrate

$$\begin{aligned} \frac{d\alpha _n}{dt} = \frac{\partial H}{\partial \gamma } = \gamma , \quad \frac{d \gamma }{dt} =-\frac{\partial H}{\partial \alpha _n}=- \nabla _{\alpha _n}U. \end{aligned}$$
(10)

Since \(\alpha _n\) is a Euclidean variable, we use a standard “leap-frog” numerical integration scheme, which approximately conserves the Hamiltonian and results in high acceptance rates. The gradient of U with respect to \(\alpha _{n}\) is

$$\begin{aligned} \nabla _{\alpha _{n}}U = \frac{3}{2S}\sum _{j=1}^S\left[ \sum _{i=1}^d \frac{\tilde{\mathcal {A}}_{i}}{\alpha _{nj} \tilde{\mathcal {A}}_i + 1} - \langle 2(\alpha _{nj} \tilde{\mathcal {A}} +1)^{5}\tilde{\mathcal {A}} \tilde{v}_{nj}, \tilde{v}_{nj} \rangle \right] , \end{aligned}$$
(11)

where \(\tilde{\mathcal {A}}= -2 \sum _{i = 1}^d \left( \cos (2\pi \xi _i) - 1 \right) \). Here \(\tilde{\mathcal {A}}\) denotes a discrete Fourier Laplacian operator with a d-dimensional frequency vector.

Starting from the current point \(\alpha _n\) and an initial random auxiliary variable \(\gamma \), the Hamiltonian system is integrated forward in time by Eq. (10) to produce a candidate point \((\hat{\alpha }_n, \hat{\gamma })\). The candidate point \(\hat{\alpha }_n\) is accepted as a new point in the sample with probability \(p(accept) = \min (1, \exp (-U(\hat{\alpha }_n)-W(\hat{\gamma }) + U(\alpha _n)+W(\gamma )))\).
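A minimal sketch of one HMC update for a single scalar \(\alpha _n\), following the leap-frog scheme and the acceptance rule above. The callables U and grad_U (the potential of Eq. (8) and its gradient of Eq. (11)), the step size, and the number of leap-frog steps are assumed inputs; in practice one must also keep \(\alpha _n\) positive, e.g., by rejecting negative proposals or sampling \(\ln \alpha _n\) instead.

```python
import numpy as np

def hmc_step(alpha, U, grad_U, n_leapfrog=10, step_size=0.01, rng=None):
    """One leap-frog HMC proposal with Metropolis acceptance for alpha_n."""
    rng = rng or np.random.default_rng()
    gamma0 = rng.normal()                      # auxiliary momentum ~ N(0, 1)
    a, g = alpha, gamma0
    g -= 0.5 * step_size * grad_U(a)           # half step for momentum
    for _ in range(n_leapfrog - 1):
        a += step_size * g                     # full step for position
        g -= step_size * grad_U(a)             # full step for momentum
    a += step_size * g
    g -= 0.5 * step_size * grad_U(a)           # final half step
    # accept with probability min(1, exp(H(old) - H(new))), H = U + 0.5 * gamma^2
    log_accept = (U(alpha) + 0.5 * gamma0 ** 2) - (U(a) + 0.5 * g ** 2)
    return a if np.log(rng.uniform()) < log_accept else alpha
```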

Maximization: Gradient Ascent. We derive the maximization step to update the parameters \(\varTheta =\{I, \tilde{v}_n, \sigma ^2, k, \beta \}\) by maximizing the HMC approximation of the expectation Q in Eq. (9).

For updating the atlas image I, we set the derivative of the Q function with respect to I to zero. The solution for I gives a closed-form update

$$\begin{aligned} I = \frac{\sum _{j=1}^S \sum _{n=1}^N (I_n \circ \phi ^{-1}_{nj}) \cdot |D \phi ^{-1}_{nj} |}{\sum _{j=1}^S \sum _{n=1}^N |D \phi ^{-1}_{nj} |}. \end{aligned}$$
(12)

Similarly, we obtain the closed-form solution for the noise variance \(\sigma ^2\) after setting the gradient of Q w.r.t. \(\sigma ^2\) to zero

$$\begin{aligned} \sigma ^2 = \frac{1}{MNS} \sum _{n=1}^N\sum _{j=1}^S \Vert I \circ \phi _{nj}-I_n \Vert _2^2. \end{aligned}$$
(13)
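The two closed-form M-step updates can be written compactly as below; the lists are assumed to be flattened over all subjects n and HMC samples j, and the warped images and Jacobian determinants are assumed to be precomputed.

```python
import numpy as np

def update_atlas(warped_subjects, jac_dets):
    """Eq. (12): warped_subjects[m] = I_n o phi_nj^{-1},
    jac_dets[m] = |D phi_nj^{-1}|, flattened over (n, j)."""
    numerator = sum(w * d for w, d in zip(warped_subjects, jac_dets))
    denominator = sum(jac_dets)
    return numerator / denominator

def update_sigma2(warped_atlases, subjects):
    """Eq. (13): warped_atlases[m] = I o phi_nj paired with its subject image I_n,
    flattened over (n, j); the list length is N * S."""
    M = subjects[0].size
    sq_error = sum(np.sum((a - s) ** 2) for a, s in zip(warped_atlases, subjects))
    return sq_error / (M * len(subjects))
```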

The closed-form solutions for hyperparameters k and \(\beta \) are

$$\begin{aligned} k = \psi ^{-1}\left( \frac{1}{NS}\sum _{n=1}^{N} \sum _{j = 1}^{S} \ln \alpha _{nj}-\ln \beta \right) , \quad \beta = \frac{1}{NS k}\sum _{n=1}^N \sum _{j = 1}^S \alpha _{nj}. \end{aligned}$$
(14)

Here \(\psi \) is the digamma function, i.e., the logarithmic derivative of the gamma function \(\varGamma (\cdot )\). The inverse digamma function \(\psi ^{-1}\) is computed using a fixed-point iteration algorithm [20].
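The coupled updates in Eq. (14) can be solved by alternating between the two closed forms; a common way to invert the digamma function is a Newton-style fixed-point iteration, sketched below. The particular initializer and iteration are our assumptions (the paper only cites a fixed-point algorithm [20]).

```python
import numpy as np
from scipy.special import digamma, polygamma

def inverse_digamma(y, iters=5):
    """Solve psi(x) = y for x > 0 by Newton iteration with a standard initializer."""
    x = np.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y + 0.5772156649015329)
    for _ in range(iters):
        x = x - (digamma(x) - y) / polygamma(1, x)
    return float(x)

def update_hyperparameters(alpha_samples, iters=20):
    """Eq. (14): alpha_samples is a flat array of all alpha_nj drawn by HMC."""
    mean_log, mean = np.mean(np.log(alpha_samples)), np.mean(alpha_samples)
    k = 1.0
    for _ in range(iters):
        beta = mean / k                                # beta-update given k
        k = inverse_digamma(mean_log - np.log(beta))   # k-update given beta
    return k, beta
```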

As there is no closed-form update for the initial velocities, we employ a gradient ascent algorithm to estimate \(\tilde{v}_{nj}\). The gradient \(\nabla _{\tilde{v}_{nj}} Q\) is computed by a forward-backward sweep approach; details are given in the FLASH algorithm [30].

4 Experimental Evaluation

We compare the proposed model with the LDDMM atlas building algorithm that employs single-penalty regularization with manually tuned parameters on 3D brain images [30]. In HMC sampling, we draw 300 samples for each subject, with initial values \(\alpha = 10\), \(k = 9.0\), \(\sigma =0.05\), and \(\beta = 0.1\). The average of all image intensities is used to initialize the atlas.

Data. We include 100 3D brain MRI scans with segmentation maps from the publicly released Open Access Series of Imaging Studies (OASIS) for Alzheimer’s disease [10]. The dataset covers both healthy and diseased subjects, aged 55 to 90. The MRI scans are resampled to \(128^3\) with a voxel size of 1.25 mm\(^3\). All MRIs are carefully preprocessed by skull-stripping, intensity normalization, bias field correction, and co-registration with affine transformations.

Experiments. We estimate the atlas of all deformed images using our method and compare its performance with LDDMM atlas building [30]. Final atlases estimated by both our model and the baseline algorithm are reported. We also compare the time and memory consumption of the proposed model with a baseline that performs HMC sampling in the full spatial domain [31]. To measure the sharpness of the estimated atlas I, we adopt a normalized standard deviation metric computed from 3000 randomly selected image patches [15]. Given N(i), a patch around voxel i of an atlas I, the local sharpness at voxel i is defined as \(\text {sharpness}(I(i)) = \text {sd}_{N(i)}(I)/ \text {avg}_{N(i)}(I)\), where sd and avg denote the standard deviation and the mean over \(N(i)\).
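The sharpness measure can be computed with a short routine like the one below; the patch radius (patch size \(w = 2r+1\)), the handling of zero-mean patches, and the random sampling strategy are our assumptions, since only the per-voxel definition is given above.

```python
import numpy as np

def atlas_sharpness(atlas, num_patches=3000, radius=2, rng=None):
    """Average of sd_{N(i)}(I) / avg_{N(i)}(I) over randomly sampled patches N(i)."""
    rng = rng or np.random.default_rng(0)
    scores = []
    for _ in range(num_patches):
        center = [int(rng.integers(radius, s - radius)) for s in atlas.shape]
        patch = atlas[tuple(slice(c - radius, c + radius + 1) for c in center)]
        if patch.mean() > 0:                  # skip empty (background) patches
            scores.append(patch.std() / patch.mean())
    return float(np.mean(scores))
```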

To further evaluate the quality of the estimated transformations, we perform atlas-based segmentation after obtaining the transformations from our model. For a fair comparison, we fix the atlas for both methods and examine the registration accuracy by computing the dice similarity coefficient (DSC) [8] between the propagated segmentation and the manual segmentation on six anatomical brain structures: cerebellum white matter, thalamus, brain stem, lateral ventricle, putamen, and caudate. We also perform significance tests on both dice and sharpness between our method and the baseline.
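For reference, the dice similarity coefficient for a single labeled structure is computed as below; the integer label encoding of the segmentation maps is an assumption.

```python
import numpy as np

def dice_coefficient(propagated_seg, manual_seg, label):
    """DSC [8] for one anatomical label: 2 |A & B| / (|A| + |B|)."""
    a = propagated_seg == label
    b = manual_seg == label
    return 2.0 * np.logical_and(a, b).sum() / float(a.sum() + b.sum())
```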

Results. Figure 1 visualizes a comparison of 3D atlases on real brain MRI scans. The top panel shows that our model substantially improves the quality of the atlas, with sharper and better details than the baseline under different manually set regularization parameters, e.g., \(\alpha =0.1, 3.0, 6.0, 9.0\). Although the smaller value \(\alpha =0.1\) produces a sharper atlas, it breaks the smoothness constraint on the transformation fields and hence introduces artifacts in anatomical structures (outlined in purple boxes). The mean and standard deviation of our estimated hyperprior parameters k and \(\beta \) in Eq. (7) over 30 pairwise image registrations are 47.40/7.22 and 0.036/0.005, respectively. The bottom panel quantitatively reports the sharpness metric of all methods. It indicates that our algorithm outperforms the baseline by offering a higher sharpness score while preserving the topological structure of the brain anatomy.

Fig. 1. Top: atlases estimated by the baseline with different \(\alpha \) and by our model (artifacts introduced by small regularization are outlined in purple boxes). Bottom: sharpness measurement of the atlas for all methods with different patch sizes w. The mean sharpness of our method vs. the best performance of the baseline without artifacts (\(\alpha =3\)) is 0.290/0.264, 0.362/0.323, and 0.405/0.360.

Figure 2 reports results of fixed-atlas-based segmentation, comparing the baseline with various regularization parameters against our algorithm. It shows the dice comparison on six anatomical brain structures over all image pairs. Our algorithm produces better dice coefficients without the need for parameter tuning.

Fig. 2. A comparison of dice evaluation for fixed-atlas-based segmentation on six brain structures (cerebellum white matter (WM), thalamus (Th), brain stem (BS), lateral ventricle (LV), putamen (Pu), and caudate (Ca)).

The runtime of our atlas building on 100 3D brain MR images is 4.4 hours, with 0.89 GB of memory consumption. The p-values of the significance tests on both dice (\(p=0.002\)) and sharpness (\(p=0.0034\)) reject the null hypothesis that there is no difference between our model estimation and the baseline algorithms.

5 Conclusion

This paper presents a novel hierarchical Bayesian model for unbiased diffeomorphic atlas building with subject-specific regularization. We design a new parameter choice rule that allows adaptive regularization to control the smoothness of image transformations. We introduce a hierarchical prior that provides prior information on the regularization parameters at multiple levels. The developed MCEM inference algorithm eliminates the need for manual parameter tuning, which can be tedious and infeasible in multi-parameter settings. Experimental results show that our proposed algorithm yields a better registration model as well as an improved quality of atlas. While our algorithm is presented in the setting of LDDMM, the theoretical development is generic and applies to other deformation models, e.g., stationary velocity fields [4]. In addition, this model can easily be extended to multi-atlas building, where a much higher degree of variation exists in population studies. Our future work will focus on conducting subsequent statistical shape analysis in the resulting atlas space.