1 Introduction

Diffusion MRI of the brain exploits the anisotropic diffusion of water caused by the presence of nerve fibers. For diffusion MRI models, such as the diffusion tensor model, reliable analysis requires that all acquired images are spatially aligned. During acquisition, however, subject motion and eddy currents may cause misalignments, which are typically characterized with global 3D affine transformation models [11, 17]. Alignment of the diffusion-weighted images (DWIs) is commonly achieved by pairwise affine registration of every single DWI to the non-DWI (i.e., b = 0) image [6, 17]. Recently, several groupwise registration approaches have been proposed, e.g. [1, 13, 15, 23], in which a global dissimilarity metric is minimized by simultaneously optimizing the transform parameters of all images. In this way, a bias towards the reference image is avoided and the intensity information of all images is taken into account simultaneously, leading to more consistent registration results [13, 15]. Especially for data with a low signal-to-noise ratio (SNR), it is preferable to use the intensity information of all images simultaneously.

In this work, we propose a novel groupwise registration method for diffusion MRI data, using a dissimilarity metric based on principal component analysis (PCA). Our method is based on the assumption that aligned data can be modeled by a limited number of principal components with high eigenvalues, while unaligned data will need more principal components with high eigenvalues. Rohde et al. have previously used this principle as a post-hoc evaluation method [17]. Melbourne et al. [14] also use PCA in a registration framework: their progressive principal component registration (PPCR) iteratively subtracts the principal components from the original image data.

In our work we use PCA during the registration itself, deriving a dissimilarity metric that explicitly maximizes the selected eigenvalues. This approach makes no assumptions about the diffusion process, such as the diffusion tensor model, which means that it can potentially be used for any other diffusion approach as well, such as diffusion kurtosis imaging [5] or diffusion spectrum imaging [25].

2 Method

2.1 Groupwise Registration Framework

Let \(\mbox{ M}_{\mathrm{g}}(\mathbf{x})\), with \(\mathrm{g} \in \{ 1\mathop{\ldots }G\}\), be the series of images to be registered, with x a 3D voxel position. During registration, a transformation \(\mathbf{T}_{\mathrm{g}}(\mathbf{x};\boldsymbol{\mu }_{\mathrm{g}})\), parameterized by \(\boldsymbol{\mu }_{\mathrm{g}}\), is applied to each image: \(\mbox{ M}_{\mathrm{g}}\left (\mathbf{T}_{\mathrm{g}}(\mathbf{x};\boldsymbol{\mu }_{\mathrm{g}})\right )\). For the groupwise registration approach the transform parameters of all volumes are concatenated into one parameter vector \(\boldsymbol{\mu }={ \left (\boldsymbol{\mu }_{1}^{\mathrm{T}},\boldsymbol{\mu }_{2}^{\mathrm{T}},\mathop{\ldots },\boldsymbol{\mu }_{G}^{\mathrm{T}}\right )}^{\mathrm{T}}\). The transform parameters \(\hat{\boldsymbol{\mu }}\) are estimated by minimizing a dissimilarity metric \(\mathcal{D}\): \(\hat{\boldsymbol{\mu }}= \mathrm{argmin}_{\boldsymbol{\mu }}\mathcal{D}(\boldsymbol{\mu })\).

2.2 Dissimilarity Metric

Given sample locations \(\mathbf{x}_{\mathrm{i}}\) with \(\mathrm{i} \in \{ 1\mathop{\ldots }N\}\), we can define \(\mathbf{M}\left (\boldsymbol{\mu }\right )\):

$$\displaystyle{ \mathbf{M}\left (\boldsymbol{\mu }\right ) = \left (\begin{array}{ccc} \mbox{ M}_{1}\left (\mathbf{T}_{1}\left (\mathbf{x}_{1};\boldsymbol{\mu }_{1}\right )\right ) &\ldots & \mbox{ M}_{G}\left (\mathbf{T}_{G}\left (\mathbf{x}_{1};\boldsymbol{\mu }_{G}\right )\right )\\ \vdots &\ddots & \vdots \\ \mbox{ M}_{1}\left (\mathbf{T}_{1}\left (\mathbf{x}_{N};\boldsymbol{\mu }_{1}\right )\right )&\ldots &\mbox{ M}_{G}\left (\mathbf{T}_{G}\left (\mathbf{x}_{N};\boldsymbol{\mu }_{G}\right )\right )\\ \end{array} \right ) }$$
(1)

The dissimilarity metric is based on PCA performed on the measurements in \(\mathbf{M}\left (\boldsymbol{\mu }\right )\). Define the G × G correlation matrix K associated with M:

$$\displaystyle{ \mathbf{K} =\boldsymbol{ {\Sigma }}^{-1}\mathbf{C}\boldsymbol{{\Sigma }}^{-1} = \frac{1} {N - 1}\boldsymbol{{\Sigma }}^{-1}{\left (\mathbf{M -\overline{M}}\right )}^{\mathrm{T}}\left (\mathbf{M -\overline{M}}\right )\boldsymbol{{\Sigma }}^{-1}, }$$
(2)

where \(\mathbf{\overline{M}}\) is a matrix containing in each column the column-wise average of M, and \(\boldsymbol{\Sigma }\) is a diagonal matrix that equals the square root of the diagonal of the covariance matrix C: \(\boldsymbol{\Sigma } = \mbox{ diag}\left [\sqrt{C_{11}},\mathop{\ldots },\sqrt{C_{\mathrm{gg}}},\mathop{\ldots },\sqrt{C_{GG}}\right ]\). Element i, j of K describes the correlation between the images Mi and Mj. The correlation of an image with itself is one by definition, so the diagonal of K contains only ones and the trace of K equals the number of images in the series, G. Our metric is then defined as

$$\displaystyle{ \mathcal{D}_{\mbox{ PCA}} =\sum _{ \mathrm{g}=1}^{G}\mbox{ K}_{\mathrm{ gg}} -\sum _{\mathrm{j}=1}^{L}\lambda _{ \mathrm{j}} = G -\sum _{\mathrm{j}=1}^{L}\lambda _{ \mathrm{j}}, }$$
(3)

where λ j is the jth largest eigenvalue of K and L is a user-defined number. For diffusion-weighted images that follow the diffusion tensor model, the optimal value for L is expected to agree with the number of free parameters in the tensor model. The diffusion tensor model has seven free parameters: the six independent diffusion tensor elements and the intensity for b = 0. Usually more images are acquired to obtain a better estimate of the model parameters. This redundancy in the number of measured images becomes visible when PCA is applied: a limited number (L) of eigenvectors describes most of the variance in the aligned data. When the images are misaligned, however, more eigenvalues are needed to describe most of the data variance. During optimization of \(\mathcal{D}_{\mbox{ PCA}}\), the total variance minus the sum of the L highest eigenvalues is minimized, so the image data is registered such that it can best be described by the eigenvectors belonging to the L highest eigenvalues. Real data does not follow the diffusion tensor model everywhere [2]; since real data is more complicated than the fitted model, for real data the optimal value is expected to be L ≥ 7.
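As an illustration, the construction of K and the metric of Eq. (3) can be sketched in a few lines of NumPy. This is a minimal sketch, not the actual implementation; the function name `pca_dissimilarity` and the layout of M as an N × G array are our own choices:

```python
import numpy as np

def pca_dissimilarity(M, L):
    """D_PCA = G - sum of the L largest eigenvalues of the G x G
    correlation matrix K of the image series (Eqs. (2)-(3)).
    M: (N, G) array, one column per (transformed) image, one row per sample."""
    N, G = M.shape
    Mc = M - M.mean(axis=0)                      # M - Mbar (column-wise means removed)
    C = (Mc.T @ Mc) / (N - 1)                    # covariance matrix
    s = np.sqrt(np.diag(C))                      # Sigma: per-image standard deviations
    K = C / np.outer(s, s)                       # K = Sigma^-1 C Sigma^-1
    lam = np.sort(np.linalg.eigvalsh(K))[::-1]   # eigenvalues of K, descending
    return G - lam[:L].sum()
```

Note that for L = G the metric is exactly zero, since the trace of K equals G.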

2.3 Metric Derivative

For minimization with gradient-based optimizers, the derivative of the metric with respect to \(\boldsymbol{\mu }\) must be known. Differentiating Eq. (3) with respect to \(\boldsymbol{\mu }\) and using the approach of van der Aa et al. [21] for calculating eigenvalue derivatives, we get:

$$\displaystyle{ \frac{\partial \mathcal{D}} {\partial \boldsymbol{\mu }} = -\sum _{\mathrm{j}=1}^{L}\frac{\partial \lambda _{\mathrm{j}}} {\partial \boldsymbol{\mu }} = -\sum _{\mathrm{j}=1}^{L}\mathbf{v}_{\mathrm{ j}}^{\mbox{ T}}\frac{\partial \mathbf{K}} {\partial \boldsymbol{\mu }} \mathbf{v}_{\mathrm{j}}, }$$
(4)

where \(\mathbf{v}_{\mathrm{j}}\) is the jth eigenvector of K. The unlikely case of repeated eigenvalues, in which a linear combination of eigenvectors is also an eigenvector and the above expression is invalid, is ignored [21]. Schultz and Seidel [18] use the same approach for calculating eigenvalue derivatives of the diffusion tensor in DW-MRI data. Using Eqs. (2) and (4) we get for the derivative of \(\mathcal{D}\) with respect to an element μ p:

$$\displaystyle\begin{array}{rcl} & & \frac{\partial \mathcal{D}} {\partial \mu _{p}} = - \frac{2} {\mathrm{N} - 1}\sum _{\mathrm{i}=1}^{L}\left [\mathbf{v}_{\mathrm{ i}}^{\mbox{ T}}\boldsymbol{{\Sigma }}^{-1}{\left (\mathbf{M -\mathbf{\overline{M}}}\right )}^{\mbox{ T}}\left (\frac{\partial \mathbf{M}} {\partial \mu _{\mathrm{p}}} -\frac{\partial \mathbf{\overline{M}}} {\partial \mu _{\mathrm{p}}} \right )\boldsymbol{{\Sigma }}^{-1}\mathbf{v}_{\mathrm{ i}}\right. \\ & & \qquad \quad \left.+\mathbf{v}_{\mathrm{i}}^{\mbox{ T}}\boldsymbol{{\Sigma }}^{-1}{\left (\mathbf{M -\overline{M}}\right )}^{\mbox{ T}}\left (\mathbf{M -\overline{M}}\right )\frac{\partial \boldsymbol{{\Sigma }}^{-1}} {\partial \mu _{\mathrm{p}}} \mathbf{v}_{i}\right ]. {}\end{array}$$
(5)

The expression above is obtained after some simplifications and using the fact that

$$\displaystyle{ \mathbf{v}_{\mathrm{i}}^{\mbox{ T}}{\mathbf{B}}^{\mbox{ T}}\mathbf{E}\mathbf{v}_{\mathrm{ i}} =\mathbf{ v}_{\mathrm{i}}^{\mbox{ T}}{\mathbf{E}}^{\mbox{ T}}\mathbf{B}\mathbf{v}_{\mathrm{ i}}, }$$
(6)

which holds for any two matrices B and E and vector \(\mathbf{v}_{\mathrm{i}}\), since a scalar equals its own transpose.
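The eigenvalue-derivative identity behind Eq. (4) is easy to check numerically. The sketch below is our own finite-difference verification (not part of the method): for a symmetric matrix K(t) with simple eigenvalues, it compares \(\mathbf{v}_{j}^{\mathrm{T}}(\partial \mathbf{K}/\partial t)\mathbf{v}_{j}\) with a central difference of the largest eigenvalue.

```python
import numpy as np

# For a symmetric K(t) with simple eigenvalues: dlambda_j/dt = v_j^T (dK/dt) v_j.
rng = np.random.default_rng(1)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
K  = lambda t: (A + A.T) + t * (B + B.T)      # symmetric, linear in t
dK = B + B.T                                  # exact dK/dt

t0, h = 0.3, 1e-6
_, V = np.linalg.eigh(K(t0))
v = V[:, -1]                                  # eigenvector of the largest eigenvalue
analytic = v @ dK @ v
numeric = (np.linalg.eigvalsh(K(t0 + h))[-1] -
           np.linalg.eigvalsh(K(t0 - h))[-1]) / (2 * h)
assert abs(analytic - numeric) < 1e-5
```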

The derivative of \(\boldsymbol{{\Sigma }}^{-1}\) with respect to μ p is equal to

$$\displaystyle{ \frac{\partial \boldsymbol{{\Sigma }}^{-1}} {\partial \mu _{\mathrm{p}}} = - \frac{\boldsymbol{{\Sigma }}^{-3}} {N - 1}\mbox{ diag}\left [{\left (\mathbf{M} -\mathbf{\overline{M}}\right )}^{T}\left (\frac{\partial \mathbf{M}} {\partial \mu _{\mathrm{p}}} -\frac{\partial \mathbf{\overline{M}}} {\partial \mu _{\mathrm{p}}} \right )\right ] }$$
(7)

and \(\partial \mathbf{M}/\partial \mu _{\mathrm{p}}\) and \(\partial \overline{\mathbf{M}}/\partial \mu _{\mathrm{p}}\) are computed using

$$\displaystyle{ \frac{\partial \mbox{ M}_{\mathrm{g}}\left (\mathbf{T}_{\mathrm{g}}\left (\mathbf{x}_{\mathrm{i}};\boldsymbol{\mu }_{\mathrm{g}}\right )\right )} {\partial \mu _{\mathrm{p}}} ={ \left (\frac{\partial \mbox{ M}_{\mathrm{g}}} {\partial \mathbf{x}} \right )}^{\mbox{ T}}\left (\frac{\partial \mathbf{T}_{\mathrm{g}}} {\partial \mu _{\mathrm{p}}} \right )\left (\mathbf{x}_{\mathrm{i}};\boldsymbol{\mu }_{\mathrm{g}}\right ). }$$
(8)

2.4 Transformation Model

The applied affine transformation is defined as \(\mathbf{T}_{g}(\mathbf{x};\boldsymbol{\mu }_{g}) = \mathbf{A}(\mathbf{x} -\mathbf{c}) + \mathbf{t} + \mathbf{c}\), where A is an unconstrained 3 × 3 matrix and c is the center of rotation. For the parameterization we use an exponential mapping of A, similar to [23]:

$$\displaystyle{ \mathbf{A} =\exp \left (\boldsymbol{\Gamma }\right ) =\exp \left (\begin{array}{ccc} \mu _{1} & \mu _{2} & \mu _{3}\\ \mu _{ 4} & \mu _{5} & \mu _{6}\\ \mu _{7 } & \mu _{8 } & \mu _{9}\\ \end{array} \right )\,\,\,\,\,\,\,\,\,\mbox{ and}\,\,\,\,\,\,\,\,\,\boldsymbol{t} ={ \left (\mu _{10},\mu _{11},\mu _{12}\right )}^{\mbox{ T}}, }$$
(9)

where exp(⋅ ) is the matrix exponential; the subscript g is omitted for clarity. For the calculation of the metric derivative, \(\partial \mathbf{T}_{g}/\partial \boldsymbol{\mu }_{g}\) is required. This derivative is trivial for the translation part of the transform. For the linear part, the approach of Fung is applied [4]. Consider the following system of differential equations:

$$\displaystyle{ \frac{\mbox{ d}} {\mbox{ d}t}\mathbf{y} =\boldsymbol{ \Gamma }\mathbf{y}\,\,\, \mbox{with solution at}\ t = 1:\,\,\, \mathbf{y}\left (1\right ) =\exp \left (\boldsymbol{\Gamma }\right )\mathbf{y}\left (0\right ). }$$
(10)

Now differentiate Eq. (10) with respect to μ p :

$$\displaystyle{ \frac{\mbox{ d}} {\mbox{ d}t}\left ( \frac{\partial } {\partial \mu _{p}}\mathbf{y}\right ) = \frac{\partial \boldsymbol{\Gamma }} {\partial \mu _{p}} \mathbf{y} +\boldsymbol{ \Gamma } \frac{\partial } {\partial \mu _{p}}\mathbf{y} }$$
(11a)
$$\displaystyle{ \frac{\partial } {\partial \mu _{p}}\mathbf{y}(1) = \frac{\partial \exp \left (\boldsymbol{\Gamma }\right )} {\partial \mu _{p}} \mathbf{y}(0) +\exp \left (\boldsymbol{\Gamma }\right ) \frac{\partial } {\partial \mu _{p}}\mathbf{y}(0) }$$
(11b)

and define:

$$\displaystyle{ \mathbf{z} = \left (\begin{array}{c} \frac{\partial } {\partial \mu _{p}}\boldsymbol{y} \\ \mathbf{y} \end{array} \right )\,\,\,\,\,\,\,\,\,\mbox{ and}\,\,\,\,\,\,\,\,\,\,\boldsymbol{\tilde{\Gamma }} = \left (\begin{array}{cc} \boldsymbol{\Gamma }&\frac{\partial \boldsymbol{\Gamma }} {\partial \mu _{p}} \\ 0 & \boldsymbol{\Gamma } \end{array} \right ). }$$
(12)

Then Eqs. (11a) and (11b) can be written as:

$$\displaystyle{ \frac{\mbox{ d}} {\mbox{ d}t}\mathbf{z} =\tilde{\boldsymbol{ \Gamma }}\mathbf{z},\,\,\,\mathrm{with\ solution\ at}\ t = 1:\,\,\, \mathbf{z}(1) =\exp \left (\boldsymbol{\tilde{\Gamma }}\right )\mathbf{z}(0) }$$
(13a)
$$\displaystyle{ \mathbf{z}(1) = \left (\begin{array}{cc} \exp \left (\boldsymbol{\Gamma }\right )& \frac{\partial } {\partial \mu _{p}}\exp \left (\boldsymbol{\Gamma }\right ) \\ 0 & \exp \left (\boldsymbol{\Gamma }\right ) \end{array} \right )\mathbf{z}(0). }$$
(13b)

Combining Eqs. (13a) and (13b) it follows that \(\frac{\partial } {\partial \mu _{p}}\exp \left (\boldsymbol{\Gamma }\right )\) can be extracted from \(\exp (\boldsymbol{\tilde{\Gamma }})\).
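The block-matrix construction of Eqs. (12)–(13) translates directly into code. The sketch below assumes SciPy's `expm`; the function name `dexpm` is our own. It extracts \(\frac{\partial }{\partial \mu _{p}}\exp \left (\boldsymbol{\Gamma }\right )\) from the upper-right block of \(\exp (\boldsymbol{\tilde{\Gamma }})\) and verifies the result against a finite difference:

```python
import numpy as np
from scipy.linalg import expm

def dexpm(Gamma, dGamma):
    """d/dmu_p expm(Gamma), given dGamma = dGamma/dmu_p, via the
    augmented block matrix of Eqs. (12)-(13)."""
    n = Gamma.shape[0]
    Gt = np.block([[Gamma, dGamma],
                   [np.zeros((n, n)), Gamma]])
    return expm(Gt)[:n, n:]        # upper-right block of expm(Gamma_tilde)

# Finite-difference check for one parameter: mu_1 = element (0, 0) of Gamma
rng = np.random.default_rng(2)
Gamma = 0.01 * rng.normal(size=(3, 3))
E = np.zeros((3, 3)); E[0, 0] = 1.0          # dGamma/dmu_1
h = 1e-6
fd = (expm(Gamma + h * E) - expm(Gamma - h * E)) / (2 * h)
assert np.allclose(dexpm(Gamma, E), fd, atol=1e-8)
```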

2.5 Optimization

Adaptive stochastic gradient descent (ASGD) [7] is used for optimization. This optimizer randomly samples the image in order to reduce computation time. A conventional multi-resolution strategy is used to avoid convergence to local minima. The number of random samples, the number of resolution levels, and the number of iterations per resolution level are user-defined parameters. The average deformation of the DWIs is constrained to zero; to guarantee this, the approach of Balci et al. [1] is applied:

$$\displaystyle{ \frac{\partial {\mathcal{D}}^{{\ast}}} {\partial \boldsymbol{\mu }_{\mathrm{g}}} = \frac{\partial \mathcal{D}} {\partial \boldsymbol{\mu }_{\mathrm{g}}} - \frac{1} {G}\sum _{\mathrm{g}^{\prime}=1}^{G}\frac{\partial \mathcal{D}} {\partial \boldsymbol{\mu }_{\mathrm{g}^{\prime}}}. }$$
(14)
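In code, the constraint of Eq. (14) amounts to a simple projection of the stacked gradient. A minimal sketch (the array layout and the name `constrain_gradient` are our own):

```python
import numpy as np

def constrain_gradient(grad):
    """Eq. (14): subtract the mean over all images from each per-image
    gradient, so the average update (and deformation) stays zero.
    grad: (G, P) array, one row of transform-parameter derivatives per image."""
    return grad - grad.mean(axis=0, keepdims=True)

g = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
gc = constrain_gradient(g)
assert np.allclose(gc.sum(axis=0), 0.0)      # mean gradient removed
```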

A scaling between the matrix elements of the transform and the translations is necessary, due to the different ranges in voxel displacement that they cause. The scaling is done according to Klein et al. [7].

2.6 Groupwise Approaches Proposed by Others

Wachinger et al. [23] proposed accumulated pair-wise estimates (APE) as a family of metrics. One of the metrics they propose is the sum of squared normalized correlation coefficients, which can be written in terms of the squared elements of the correlation matrix K. We implemented this metric as follows:

$$\displaystyle{ \mathcal{D}_{\mbox{ APE}} = 1 - \frac{1} {G}\sqrt{\sum _{\mathrm{i} } \sum _{\mathrm{j} } K_{\mathrm{ij } }^{2}}. }$$
(15)

Metz et al. [15] proposed the sum of the variances, assuming no intensity changes between images. The metric is defined as:

$$\displaystyle{ \mathcal{D}_{\mbox{ VAR}} = \frac{1} {NG}\sum _{\mathrm{i}=1}^{N}\sum _{ \mathrm{g}=1}^{G}{\left [\mbox{ M}_{\mathrm{ g}}\left (\mathbf{T}_{\mathrm{g}}\left (\mathbf{x}_{\mathrm{i}};\boldsymbol{\mu }_{\mathrm{g}}\right )\right ) - \frac{1} {G}\sum _{\mathrm{g}^{\prime}=1}^{G}\mbox{ M}_{\mathrm{g}^{\prime}}\left (\mathbf{T}_{\mathrm{g}^{\prime}}\left (\mathbf{x}_{\mathrm{i}};\boldsymbol{\mu }_{\mathrm{g}^{\prime}}\right )\right )\right ]}^{2}. }$$
(16)

Both metrics are compared with our method \(\mathcal{D}_{\mbox{ PCA}}\).
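For reference, both comparison metrics reduce to a few lines when M is stored as an N × G array, as in Eq. (1). This is a hedged NumPy sketch of Eqs. (15) and (16), not the original authors' code; `d_ape` and `d_var` are our own names:

```python
import numpy as np

def d_ape(M):
    """Eq. (15): 1 - (1/G) * Frobenius norm of the G x G correlation matrix K."""
    G = M.shape[1]
    K = np.corrcoef(M, rowvar=False)             # columns are the G images
    return 1.0 - np.sqrt((K ** 2).sum()) / G

def d_var(M):
    """Eq. (16): mean over voxels i and images g of the squared deviation
    from the voxel-wise mean image."""
    return ((M - M.mean(axis=1, keepdims=True)) ** 2).mean()
```

Both metrics vanish for a perfectly aligned, intensity-identical series (all columns of M equal).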

2.7 Implementation

The method is implemented in the publicly available registration package elastix [8]. In all experiments we used 10,000 samples, 2 resolutions, and 1,000 iterations per resolution. The sampling was performed off the voxel grid to reduce interpolation artifacts [8]. Masks were used in all experiments. For the synthetic data, a mask was used to exclude the background, so that samples were drawn only in the region of the brain; this ensures that the zero voxel values in the background do not artificially aid the registration. For both the synthetic and the real data, a mask was used to exclude the high voxel values of the b = 0 image (in cerebrospinal fluid); excluding these high voxel values increases the correlation between the b = 0 image and the DWIs. In preliminary experiments we observed that this mask is necessary for good alignment.

3 Experiments and Results

3.1 Synthetic Data

A noiseless, initially perfectly aligned, synthetic DWI set [9], with size 107 × 79 × 60, voxel size 1.8 × 1.8 × 2.4 mm3 and G = 61, was transformed with five random parameterized affine transformations. The matrix elements of \(\boldsymbol{\Gamma }\) were drawn from the normal distribution \(0.01\mathcal{N}(0,1)\) and t x , t y and t z were drawn from \(\mathcal{N}(0,1)\) (mm). The five transformed image sets were registered with the three different metrics \(\mathcal{D}_{\mbox{ PCA}}\), \(\mathcal{D}_{\mbox{ APE}}\) and \(\mathcal{D}_{\mbox{ VAR}}\), where for metric \(\mathcal{D}_{\mbox{ PCA}}\) different values for \(L\) were investigated: \(L \in \{ 1\mathop{\ldots }10\}\). The synthetic data is simulated using the diffusion tensor model, so for \(L = G\), \(\mathcal{D}_{\mbox{ PCA}} = 0\), and the expected optimal value for L is 7, the number of free parameters in the diffusion tensor model. The chosen range for L is therefore expected to be sufficiently broad.

In the next experiment, Gaussian noise was added to the synthetic DWI set, resulting in an SNR of 8.65. Our metric, using L = 6, was used to register the noisy DWI set with the same five initial transforms.

Table 1 Mean and standard deviation of \(\|{\mathbf{d}}^{{\ast}}\left (\mathbf{x}\right )\|\) for the noiseless synthetic DWI set without registration, for registration with metric \(\mathcal{D}_{\mbox{ APE}}\), with metric \(\mathcal{D}_{\mbox{ VAR}}\) and with metric \(\mathcal{D}_{\mbox{ PCA}}\) for \(L = 1\mathop{\ldots }10\)

Evaluation measure Let \(\mathbf{T}_{\mathrm{g}}(\mathbf{x};\boldsymbol{\hat{\mu }}_{\mathrm{g}})\) be the transformation computed by the registration. The deformation field of the initially aligned data, after application of the composition of \(\mathbf{T}_{\mathrm{g}}(\mathbf{x};\boldsymbol{\hat{\mu }}_{\mathrm{g}})\) and \(\mathbf{T}_{\mathrm{g}}(\mathbf{x};\boldsymbol{\mu }_{\mbox{ init,g}})\), should be zero: \(\mathbf{d}_{\mathrm{g}}(\mathbf{x}) = \mathbf{T}_{\mathrm{g}}\left (\mathbf{T}_{\mathrm{g}}(\mathbf{x};\boldsymbol{\mu }_{\mbox{ init,g}});\hat{\boldsymbol{\mu }}_{\mathrm{g}}\right ) -\mathbf{x} = \mathbf{0}\). However, the constraint of Eq. (14) was not applied to the initial transformation, so we subtract the mean of the deformation field: \(\mathbf{d}_{\mathrm{g}}^{{\ast}}(\mathbf{x}) = \mathbf{d}_{\mathrm{g}}(\mathbf{x}) - \frac{1} {G}\sum _{g=1}^{G}\mathbf{d}_{\mathrm{ g}}(\mathbf{x})\). Our measure of the registration error was then defined as the mean and standard deviation of \(\|\mathbf{d}_{\mathrm{g}}^{{\ast}}(\mathbf{x})\|\) over all \(\mathbf{x}\) and \(\mathrm{g}\).
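The evaluation measure can be sketched as follows. This is our own NumPy illustration (the name `registration_error` and the storage of the deformation vectors as a G × N × 3 array are assumptions):

```python
import numpy as np

def registration_error(d):
    """Mean and std of ||d*_g(x)||, after subtracting the mean deformation
    field over the G images.
    d: (G, N, 3) array of deformation vectors d_g(x_i)."""
    d_star = d - d.mean(axis=0, keepdims=True)   # d*_g(x) = d_g(x) - mean over g
    norms = np.linalg.norm(d_star, axis=2)       # ||d*_g(x_i)||, shape (G, N)
    return norms.mean(), norms.std()
```

A deformation that is identical for all g (a pure group-level shift) thus contributes no error, consistent with the zero-average-deformation constraint.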

Results Table 1 shows the mean and standard deviation of \(\|{\mathbf{d}}^{{\ast}}(\mathbf{x})\|\) for the experiments with the noiseless synthetic DWI set, for each metric and before registration. Our method performs best for L = 6, and already outperforms \(\mathcal{D}_{\mbox{ APE}}\) for L > 3. For L = 2, the error is particularly high. Visual inspection revealed that in this case the registration resulted in two completely misaligned groups of images M g , although the images within each group were properly aligned with each other. This apparently leads to a correlation matrix K with two relatively high eigenvalues.

Table 2 Results for synthetic DWI set with noise

Table 2 shows the results of the experiments with the noisy synthetic DWI set, for all metrics. For the noisy dataset our method performs best of the three.

3.2 Real Diffusion Weighted Data

Five diffusion MRI data sets, obtained from different previous studies, were used to evaluate our new approach:

  1. 10 b = 0 s/mm2; 60 b = 700 s/mm2; 2.0 × 2.0 × 2.0 mm3 voxel size; 1.5T; [10]

  2. 1 b = 0 s/mm2; 60 b = 1,200 s/mm2; 1.75 × 1.75 × 2.0 mm3 voxel size; 3.0T; [3]

  3. 1 b = 0 s/mm2; 32 b = 800 s/mm2; 1.75 × 1.75 × 2.0 mm3 voxel size; 3.0T; [24]

  4. 1 b = 0 s/mm2; 32 b = 800 s/mm2; 1.75 × 1.75 × 2.0 mm3 voxel size; 1.5T; [22]

  5. 1 b = 0 s/mm2; 45 b = 1,200 s/mm2; 1.72 × 1.72 × 2.0 mm3 voxel size; 3.0T; [16]

For all data sets we chose L = 6.

To evaluate the registration, ExploreDTI [12] is used, and B-matrix rotation is applied [3]. Directionally encoded color (DEC) maps and the standard deviation (STD) across the DWIs show a bright rim at the edge of the brain when the images are not aligned [20]. DEC maps and STDs of the DWIs of the five datasets before and after alignment are shown in Fig. 1. The bright rim is visible before alignment and absent after alignment; the method is therefore successful in aligning the data. Metrics \(\mathcal{D}_{\mbox{ VAR}}\) and \(\mathcal{D}_{\mbox{ APE}}\) were also tested. In agreement with the results on synthetic data, the proposed metric \(\mathcal{D}_{\mbox{ PCA}}\) outperformed \(\mathcal{D}_{\mbox{ VAR}}\) and gave a slight improvement over \(\mathcal{D}_{\mbox{ APE}}\), based on inspection of the DEC and STD maps.

Fig. 1

STD (equal intensity range for pre and post registration) and DEC maps of datasets 1–5, pre and post alignment

4 Conclusions

On the synthetic DWI set, our method outperforms the two existing groupwise methods with which it was compared.

The use of L = 6 eigenvalues performed best. This is related to the underlying structure of the synthetic DWI data: the degrees of freedom are determined by the directions of the diffusion in the brain. The diffusion tensor model has 7 free parameters, so the optimal value for L was expected to be 7. The optimal value of 6 is due to the use of the mask: the eigenvalue spectrum of the aligned synthetic DWI set shows that 99 % of the data variance is described by 6 eigenvectors when the high voxel values are masked, but by 7 without masking. Masking out these high values decreases the data variance that is described by the b = 0 image, leading to a decrease in the number of eigenvectors describing most of the data variance.

The number of eigenvalues L is a parameter that must be set correctly to obtain the best results; the synthetic data experiments suggest that large values of L are preferable to small values. The optimal value of L for real data may differ from 6. Further research quantifying the registration performance on real data is necessary to see whether the optimal value should be larger than 6.

It would be an improvement if the method could also work without the mask. Furthermore, it would be interesting to investigate whether the proposed method also works in other applications that involve intensity changes over time, such as perfusion imaging or T1 mapping.

Altogether, the proposed method offers potential improvements over the current standard for aligning diffusion-weighted data, owing to the general benefits of groupwise registration, the fact that the method is not parameterized with a diffusion model, and the good results obtained on both synthetic and real data.