Keywords

1 Introduction

Typical applications in medical imaging are to analyze spatio-temporal variations of bio-medical images. A prerequisite for such analysis is that images are aligned and in many cases joint registration of multiple images is required. Examples are, e.g., analysis of images from different time points and/or different complimentary modalities, atlas registration, longitudinal normalization, motion correction or image reconstruction [1, 4, 7, 8, 12, 13, 18, 19, 21].

A number of registration models are already available to register a pair of two images [15, 20, 22], but their simple extension to register a group of images might suffer from various problems. Generally, these pair-wise methods assume one of the images as a reference image, and therefore registrations are implicitly biased towards the reference image. Moreover, the selection of a reference image from the given image sequence is not always a very straight forward process. Most importantly, these registration models are primarily influenced by features shared by the image pair and less affected by the features other images have in the image sequence. Therefore, this approach does not account the global information available in the image sequence. It has also been shown that these methods have slow convergence rate compared to the groupwise methods [2, 3].

To avoid the selection of a reference image and the related bias, Joshi et al. [13] proposed the registration of each image from the image sequence with respect to the group mean of the registered image sequence. This approach does not need to define the reference image explicitly, moreover accounts the global information through the group mean. This approach inherits the assumption that every image in the image sequence is almost similar to the group mean.

Recently, Guyader [8] and Brehmer [2, 3] proposed groupwise registration methods for a sequence of images. The underlying assumption is that images are linearly dependent if they are aligned. The linear dependency idea completely circumvents the need of defining a group mean image. Both of these methods construct an image matrix where each column is corresponding to an image from the sequence. Brehmer [2, 3] estimates transformation fields by minimizing the rank of the matrix and implicitly forcing columns of the matrix to become linear dependent to each other. Guyader [8] utilizes the multivariate version of mutual information, called total correlation, to define a groupwise registration model.

The paper is structured as follows: In Sect. 2, we discuss mathematical formulations of SVD based image registration approaches. More precise, we discuss a general framework for groupwise registration models based on correlation maximization. In Sect. 3 we briefly discuss the used numerical setting. After that, in Sect. 4, we demonstrate the performance of some of the proposed methods on two datasets and compare them with other state-of-the-art methods.

2 Registration Approaches for Multiple Images

In this section, we describe our Schatten q-norm based distance measure \(\mathrm {S}q\mathrm {N}\) for multiple images. We start by briefly outlining a standard variational registration framework for two images [15]. We then present a straightforward extension for multiple images and discuss the drawbacks of the naive approach drawbacks. The main drawbacks are its sequential and thus ordering dependent assessment of the image frames and the weak coupling of image information over the frames.

We then present the setting of the \(\mathrm {S}q\mathrm {N}\) distance measure. The main idea is to make use of the singular values of an image feature array. Finally, we relate the Schatten q-norm based distance measure to work of Friedman et al. [6] and Guyader et al. [8].

2.1 Variational Registration Approach for Two Images

We start the discussion with a standard approach to image registration; see e.g. [15] for details. To simplify discussion, an image \(\mathcal {T}\) is assumed to be a real valued intensity function \(\mathcal {T}:{{\,\mathrm{\mathbb {R}}\,}}^d\rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) with compact support in a domain \(\varOmega \subset {{\,\mathrm{\mathbb {R}}\,}}^d\). Given two images \(\mathcal {T}_0,\mathcal {T}_1\), the goal of image registration is to find a transformation \(y:{{\,\mathrm{\mathbb {R}}\,}}^d\rightarrow {{\,\mathrm{\mathbb {R}}\,}}^d\) such that ideally \(\mathcal {T}_1\circ y\approx \mathcal {T}_0\), where \(\mathcal {T}\circ y(x):=\mathcal {T}(y(x))\). To achieve this goal, we choose a variational framework where a joined functional

$$\begin{aligned} J^{\mathrm {two}}(y;\mathcal {T}_0,\mathcal {T}_1):=D(\mathcal {T}_0, \mathcal {T}_1\circ y)+S(y), \end{aligned}$$
(1)

is to be minimized over an admissible set of transformations. Various choices for distance measures D and regularizers S are discussed in the literature; see e.g. [15] and references therein. A thorough discussion is beyond the scope of this paper. Here, we only briefly recall the \(L_2\)-norm (sum of squared distances, \(\mathrm {SSD}\)), the normalized gradient field (\(\mathrm {NGF}\)) [10], and the elastic potential [5]:

$$\begin{aligned} D^{\mathrm {SSD}}(\mathcal {T}_0, \mathcal {T}_1\circ y):= & {} \textstyle \frac{1}{2} \Vert \mathcal {T}_1\circ y-\mathcal {T}_0\Vert _{L_2(\varOmega )}^2, \end{aligned}$$
(2)
$$\begin{aligned} D^{\mathrm {NGF}}(\mathcal {T}_0, \mathcal {T}_1\circ y):= & {} \textstyle \frac{1}{2}\int _{\varOmega } \Big [1 -\left\langle \frac{\nabla \mathcal {T}_1\circ y}{\Vert \nabla \mathcal {T}_1\circ y\Vert _{\eta }}, \frac{\nabla \mathcal {T}_0 }{\Vert \nabla \mathcal {T}_0\Vert _{\eta }}\right\rangle _{}^2\Big ]\ dx \end{aligned}$$
(3)
$$\begin{aligned} S^{\mathrm {elas}}(y):= & {} \textstyle \frac{1}{2}\Vert \mu \, \mathrm {tr}(E^2) + \lambda \,\mathrm {tr}(E)^2 \Vert _{L_2(\varOmega )}^2 \end{aligned}$$
(4)

with \(\Vert a\Vert _\eta :=\sqrt{\left\langle a,a\right\rangle _{}+\eta }\), \(\eta >0\) and strain \(E := \nabla y + \nabla y^\top - I\) where I is the identity matrix.

Derivations of image intensities are also commonly used to quantify image similarity. For a unified conceptual framework, we introduce a feature map F that maps an image to a Hilbert space of features. Any metrics \(\mu \) on the feature space can then be used for registration: \(D(\mathcal {T}_0,\mathcal {T}_1):=\mu (F(\mathcal {T}_0),F(\mathcal {T}_1))\). Examples of such feature maps are e.g. intensity normalization \(F^\text {IN}(\mathcal {T}) = \mathcal {T}/\Vert \mathcal {T}\Vert _{L_2}\) or the normalized gradient field, \(F^\text {NGF}(\mathcal {T}) = \nabla \mathcal {T}/\Vert \nabla \mathcal {T}\Vert _{\eta }\), to name a few. Note that the NGF distance measure is based on \(\nabla \mathcal {T}(x)/\Vert \nabla \mathcal {T}(x)\Vert _{\eta }\) whereas the feature map is based on \(\nabla \mathcal {T}(x)/\Vert \nabla \mathcal {T}\Vert _{L_2}\).

2.2 Sequential Registration Approach for Multiple Images

Our goal is to extend the standard registration to sequences of images \(T=(\mathcal {T}_1,\ldots ,\mathcal {T}_K)\). Note that the images might be given as a time series such as our DCE-MRI example, a structured process such as the HISTO application, or even an unstructured ensemble of images such as an atlas generation.

The first approach is to simply apply the above framework sequentially. With transformations \(Y=(y_1,\ldots ,y_K)\) the corresponding energy to be minimized with respect to \(Y\) reads

$$\begin{aligned} J^{\mathrm {seq}}(Y;T):=\sum _{k=2}^K\left\{ D(\mathcal {T}_{k-1}\circ y_{k-1},\mathcal {T}_{k}\circ y_{k})+S(y_k)\right\} . \end{aligned}$$
(5)

Note that typically, one of the deformations is fixed, e.g., \(y_1(x) := x\) for well-posedness. However, as the problem is usually too big to be solved straightforwardly, a non-linear Gauss-Seidel type iteration is usually applied. Here, one assumes that \(Y\) is a good starting guess and sequentially improves component by component for \(\ell =1,\ldots ,K\) by determining optimizers

$$\begin{aligned} z^* \in \arg \min _z J^{\mathrm {seq}}(y_1,\ldots ,y_{\ell -1},z,y_{\ell +1},\ldots ,y_K;T), \end{aligned}$$
(6)

setting \(y_\ell :=z^*\) and iterates until convergence. This process is generally rather expensive and therefore slow. A problem is that the coupling of the different components of \(Y\) is weak. An update of \(y_\ell \) has impact only every K-th step in the procedure. Therefore, potentially a high number of iterations is required.

2.3 Global Registration Approach for Multiple Images

Here, we propose a registration approach that provides a full coupling of all image frames. Our objective is to find a minimizer \(Y\) of the energy \(J^{\mathrm {glo}}\),

$$\begin{aligned} J^{\mathrm {glo}}(Y;T) :=D^{\mathrm {glo}}(T\circ Y)+S^{\mathrm {glo}}(Y), \end{aligned}$$
(7)

where we use the suggestive abbreviation \(T\circ Y:=(\mathcal {T}_0\circ y_0,\ldots ,\mathcal {T}_K\circ y_K)\) and for sake of simplicity let be \( S^{\mathrm {glo}}(Y):=\sum _{k=1}^K S(y_k) \) with S any of the regularizers discussed in Sect. 2.1. Clearly, one could debate for a more general or even stronger regularization of \(Y\). However, this is not in the scope of the paper and we leave the discussion for future work. The essential contribution is thus the global distance measure that is based on the feature array \(F(T):=[F(\mathcal {T}_1),\ ,\ldots ,\ F(\mathcal {T}_K)]\) which comprises the features of the image sequence and its symmetric, positive semi-definite correlation matrix \(C=\left\langle F,F\right\rangle _{} \in {{\,\mathrm{\mathbb {R}}\,}}^{K\times K}\) where \(C_{ij}\) assembles the correlations of \(F(\mathcal {T}_i)\) and \(F(\mathcal {T}_j)\). Note that we assumed F maps into a Hilbert space such that the correlation is well defined according to the corresponding inner product. Our key assumption is that the rank of the feature array is minimal if the image frames are aligned. Note that we actually aim to exclude the trivial situation \(\mathop {\mathrm {rank}}F=0\) as this implies that all features are zero. We also note that the assumption may not hold for multi-modal images, if the feature map does not compensate intensity variations. Therefore, a plain image intensity based feature map may not be successful. If we expect that intensity changes will occur at similar positions in space, e.g., the NGF feature map is a valid choice.

2.4 Schatten q-norm Based Image Similarity Measure \(D_{S,q}\)

The above considerations suggest to choose \(\mathop {\mathrm {rank}}F\) as a distance measure. In [2, 3], Brehmer et al. proposed to reformulate the rank minimization problem in terms of a relaxation of the rank function based on a so-called Schatten q-norm. Roughly speaking, the Schatten q-norm of an operator is the q-norm of the vector of its singular values. Thus

$$\begin{aligned} D_{S,q}(T):= \Vert F(T)\Vert _{S,q}:= \Big ( \sum _{k=1}^{K} \sigma _k(F(T))^q \Big )^{1/q} \end{aligned}$$
(8)

where \(\sigma _k\), \(k=1,\ldots ,K\), denote the non-zero singular values of \(F(T)\). Before we discuss numerical details, we relate this measure to other rank based similarity measures for image stacks. Particularly we address volume minimization of the feature parallelotope and correlation maximization of normalized features.

2.5 Volume Minimization of the Feature Parallelotope

The above approach can be linked to work of Guyader et al. [8]. To this end, we consider the minimization of the volume of the parallelotope spanned by the columns of \(F(T)\). Equivalently, we can consider the determinant of C or, exploring the monotonicity of the logarithm, set

$$\begin{aligned} \textstyle D(T) :=\log (\det (C(T))) =\log (\prod _{k=1}^K\sigma _k^2(F(T))) =2\sum _{k=1}^K\log (\sigma _k(F(T))). \end{aligned}$$
(9)

This expression is related to the volume of a normalized covariance matrix which is the total correlation in [8] and used as a similarity measure for group-wise registration.

However, a volume based approach has a severe drawback; see also the discussion in [11]. To illustrate this, we consider two feature vectors \(f_1\ne 0\) and \(f_2\) with angle \(\alpha \). Hence, \(\mathrm {volume}(f_1,f_2)=\Vert f_1\Vert \Vert f_2\Vert \sin \alpha \). This value is minimal if the vectors are linearly dependent. Unfortunately, this also happens if \(f_2=0\). In a registration context, this implies that a translation of one of the images, say, about the diameter of \(\varOmega \) yields a global optimizer. In [11] it is therefore suggested to replace the minimization of volume by a maximization of correlation \(|\cos \alpha |\). This value is maximal iff and only iff \(f_2=\pm f_1\) and is in fact minimal if \(f_2=0\). This subtle difference is very important in a registration context.

2.6 Correlation Maximization of Normalized Features

In this section we focus on correlation maximization and do not discuss the corresponding minimization formulation. We also assume that feature vectors are normalized, i.e. \(\Vert F(\mathcal {T}_k)\Vert =1\). For the correlation matrix \(C(T) \in {{\,\mathrm{\mathbb {R}}\,}}^{K,K}\) holds

$$\begin{aligned} C_{kk}=1,\quad C_{jk}=\left\langle F(\mathcal {T}_j),F(\mathcal {T}_k)\right\rangle _{}=\cos \gamma _{jk}, \end{aligned}$$
(10)

where \(\gamma _{jk}\) denotes the angle between the j-th and k-th feature. In the two image setting it is therefore natural to maximize \(|C_{1,2}|\) if we account both, for positive and negative correlation. This is the underlying idea of normalized cross correlation. Note that the NGF approach is still different as the correlation is computed point wise and finally averaged.

For the multiple image setting, the best scenario is \(C\in \{\pm 1\}^{K,K}\). If only non-negative correlation is considered, the ideal case is \(C(T)=1\cdot 1^{\top }\). On the opposite, the worst case scenario for registration is that \(C(T)=I\) meaning all features are fully uncorrelated. Therefore, a suitable distance measure is to maximize the difference

$$\begin{aligned} \textstyle D(T) :=\Vert C(T)-I\Vert _{M}, \end{aligned}$$
(11)

where \(\Vert \cdot \Vert _{M}\) denotes a suitable matrix norm.

2.7 Correlation Maximization and Schatten q-norms

Specifically, choosing \(\Vert \cdot \Vert _M = \Vert \cdot \Vert _{S,q}\) a Schatten q-norm in (11) we obtain

$$\begin{aligned} \textstyle D(T)=\Vert C(T)-I\Vert _{S,q} =\left( \sum _{k=1}^{K} ( \sigma _k^2(F(T))- 1)^q\right) ^{1/q}. \end{aligned}$$
(12)

We investigate the special cases \(q=2\) and \(q=\infty \). Note that

$$\begin{aligned} \Vert A\Vert _{S,\infty }= & {} \sigma _{\max }(A),\quad \text{ the } \text{ largest } \text{ singular } \text{ value } \text{ of } \text{ A, } \text{ and }\quad \\ \Vert A\Vert _{S,2}^2= & {} \sum _k \sigma _k^2 =\mathrm {trace}(A^{\top }A) =\sum _{j,k}|a_{j,k}|^2 =\Vert A\Vert _{\mathrm {Fro}}^2. \end{aligned}$$

Thus, choosing the Schatten \(\infty \)-norm yields maximizing \(\sigma ^2_{\max }(F(T))-1\). This is equivalent to maximizing the largest singular value of \(F(T)\), see also [6]:

$$ \mathop {\mathrm {arg \ max}}\, \Vert C(T)-I\Vert _{S,\infty } = \mathop {\mathrm {arg \ max}}\, \sigma _{\max }(F(T)). $$

For the Schatten 2-norm we have \(D(T)=\Vert C(T)-I\Vert _{S,2}^2=\sum _{i\ne j}|C_{ij}|^2\) which shows that the distance is quadratic mean of the correlation among the image features. Furthermore, a direct computation shows

$$\begin{aligned} D(T) = \Vert C(T)-I\Vert _{S,2}^2 = \Vert F\Vert _{S,4}^4 - K. \end{aligned}$$

Here, we exploit the special structure of correlation matrix C, i.e., \(\mathrm {trace}(C) = K\).

To this end, we define the two \(\mathrm {S}q\mathrm {N}\) distance measures for NGF features as follows:

$$\begin{aligned} \mathrm {S}q\mathrm {N}_4(T):= & {} K - \Vert F^\mathrm {NGF}(T)\Vert _{S,4}^4\end{aligned}$$
(13)
$$\begin{aligned} \mathrm {S}q\mathrm {N}_\infty (T):= & {} -\sigma _{\max }(F^{\mathrm {NGF}}(T)) \end{aligned}$$
(14)

3 Numerical Methods

For the optimization of the functional \(J^{\mathrm {seq}}\) (cf. (5)) we use the discretize-then-optimize framework introduced in [9]. The basic concept is to use a sequence of discretized finite dimensional optimization problems. A smooth approximation of the problem is represented with few degrees of freedom. It is expected that the optimization is fast as the problem is low dimensional and smooth. Its numerical solution is prolongated and then serves as a starting guess for the finer resolved problem. It is expected that a numerical solution can be computed fast, as the starting point is expected to be close to the solution. The process is generally terminated when reaching the resolution of the given data. Note that the images are only smoothed in the spatial domain.

To solve the discrete problem on a fixed resolution we use a quasi-Newton type approach. More precisely, we use L-BFGS with the Hessian of the regularizer as an initial approximation of the metric and a Wolfe linesearch; see, e.g. [16] for optimization and [15] for details.

For the optimization of \(J^{\mathrm {S}q\mathrm {N}}\),

$$\begin{aligned} J^{\mathrm {S}q\mathrm {N}}(Y;T) :=\mathrm {S}q\mathrm {N}(T\circ Y)+S^{\mathrm {glo}}(Y), \end{aligned}$$
(15)

we use similar concepts as above for the regularization term.

For the \(\mathrm {S}q\mathrm {N}\) distance, we remark that the distance is a rather simple algebraic expression of the singular values of the feature matrix. The challenging part is thus the derivative of the singular values. Here, we follow [17]. A singular value decomposition of the feature matrix \(F \in {{\,\mathrm{\mathbb {R}}\,}}^{n\times K}\) is denoted by \(F=U\varSigma V^{\top }\), where the matrices \(U=(u_{i,k})\in {{\,\mathrm{\mathbb {R}}\,}}^{n,n}\) and \(V=(v_{j,k})\in {{\,\mathrm{\mathbb {R}}\,}}^{K,K}\) are orthogonal and \(\varSigma \in {{\,\mathrm{\mathbb {R}}\,}}^{n,K}\) is a non-negative diagonal matrix with the singular values \(\sigma _k(F)\) as diagonal entries. From [17] we have the surprisingly simple relation \(\frac{\partial \sigma _k(F)}{\partial F_{i,j}} = u_{i,k} v_{j,k}\) that is used in our implementation.

4 Results

We now present results for the registration of histological serial sectioning of a marmoset monkey brain as well as for DCE-MRI sequences of a human kidney. For the given datasets, we will compare the registration results of \(\mathrm {S}q\mathrm {N}_4\), \(\mathrm {S}q\mathrm {N}_\infty \) in comparison to a total correlation based approach like in [8] and sequential \(\mathrm {NGF}\). We start with registrations of a serial sectioning of a marmoset monkey brain; data courtesy of Harald Möller, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany [14]. The dataset consists of every 4th slice of the original serial sectioning of the brain, in total 69 slices of sizes from \(2252 \times 3957\) pixels up to \(7655 \times 9965\) pixels. For proof of concept we reduced the number of pixels per slice to reduce computation time to a reasonable level. The objective of the registration of histological slices is to align them in order to reconstruct the 3D volume of the tissue.

Fig. 1.
figure 1

Three representative axial slices of a marmoset monkey brain dataset; data courtesy of Harald Möller [14]

Figure 1 shows three representative axial slices of the data set. The main difficulties of registering this particular dataset are the different sizes of the slices on the one hand and the translation of whole parts of the imagestack within the domain on the other hand. Furthermore we didn’t use a pre-segmentation of the dataset to show robustness of the registration approaches against artifacts in the background region. The background region of the slices contains several markings of the examiners like white rectangles as well as dust and dirt from the object slide captured during the high resolution scanning process; see Fig. 1.

Figure 2 shows two sagittal slices (top and bottom row) through the image stack from the reduced, unregistered monkey brain dataset besides the registration results to illustrate the alignment of the slices. As expected the results of \(\mathrm {S}q\mathrm {N}_4\) are quite similar to the results of \(\mathrm {S}q\mathrm {N}_\infty \). The computation for the groupwise approaches using \(\mathrm {S}q\mathrm {N}\) as well as the total correlation approach from [8] took about 45 to 50 min for a resolution of \(128 \times 158\) pixels for each of the 69 slices. Compared to this, the sequential NGF approach with just one sweep needed about 2.2 times the computation time (ca. 110 min). However, from visual comparison it is obvious that many more sweeps are needed to achieve results comparable to those of the groupwise approaches; see Fig. 2. Everything was implemented in Python using Numpy and Scipy for optimization.

Moreover, we used a random permutation of the stack of histological serial sections to demonstrate invariance to the order of images of the singular value based groupwise registration approaches. We randomly permuted the order of images, registered the stack in random order using \(\mathrm {S}q\mathrm {N}_4\) and reordered it afterwards; see Fig. 3, center column. As expected, the results are the same as for registration using \(\mathrm {S}q\mathrm {N}_4\) without random permutation; cf. Figs. 2 and 3 for comparison.

Fig. 2.
figure 2

Registration results for 3D reconstruction of the monkey brain datasets. For illustration, we show only 2D slices that are sagittal cuts at two positions, i.e., 53 and 82.

Next we present registration results for a DCE-MRI sequence of a human kidney; data courtesy of Jarle Rørvik, Haukeland University Hospital Bergen, Norway. Here, 3D images are taken at 45 time points. For ease of presentation and to have a reasonable level of computation time we show results for a 2D slice over time. More precisely, we use 178-by-95 coronal slices of a 178-by-95-by-30-by-45 volume for z-slice 18; see Fig. 4 for representative slices. All time points are used for registration. The objective here is to register the slices while maintaining the dynamics. Figure 5 illustrates the stack of slices for the different registration approaches using a sagittal cut through the stack, analog to the results for the histological serial sections shown in Fig. 2. The illustrated results were achieved using three different levels of spatial resolution up to half the original resolution in about 8 min per groupwise approach. The result of the sequential approach was achieved in about twice the time using just one sweep. For the alignment using the approach from [8], we couldn’t find a parameter setting to achieve results comparable to the \(\mathrm {S}q\mathrm {N}\)-approaches.

Fig. 3.
figure 3

Registration results after random permutation of the axial slices. As expected, the results are the same as for the non-permuted image stack; also see Fig. 2 for comparison.

Fig. 4.
figure 4

Three representative 2D coronal slices of the 4D DCE-MRI dataset of a human kidney; data courtesy of Jarle Rørvik, Haukeland University Hospital, Bergen, Norway. The slices are shown at three different time points. The dataset is a 178-by-95-by-30-by-45 volume, the shown slices are 178-by-95.

Fig. 5.
figure 5

Illustrated are sagittal cuts through the stack of 2D slices from a 4D DCE-MRI dataset of a human kidney at positions 29 and 40. The first column shows the unregistered stack. Right next to this the results of the different registration approaches are illustrated.

5 Discussion and Conclusions

The registration of multiple images is an important task in image processing. Conventional approaches often use an extension of a pairwise approach for two images. In this paper, we demonstrate that this approach may come with numerous disadvantages and may be time consuming. We also describe and analyze a recently proposed alternative. The Schatten q-norm based \(\mathrm {S}q\mathrm {N}\) [2, 3] distance measure is a reference for our investigations on different singular value based measures such as the maximization of correlation between different images as well as minimization of spanned volumes. For this purpose we have introduced a general formulation using feature maps that map images into Hilbert spaces. This opens a door for even further investigation on image registration methods for multiple images. With our numerical results we demonstrate that \(\mathrm {S}q\mathrm {N}\) based motion compensation is applicable in dynamic imaging as well as for the alignment of histological serial sections. Moreover, the results clearly show that \(\mathrm {S}q\mathrm {N}\) performs at least as good as standard approaches from the literature. In our experiments both the alignment and the computation time of the groupwise approaches were closer to a desirable solution than the sequential approach using pairwise \(\mathrm {NGF}\).

Furthermore, we outlined that a singular value based approach exploits the global information of a dataset, which cannot be achieved by using two-neighbourhoods in registration. In some specific applications, such as dynamic imaging or reconstruction of histological volumes from serial sections, this can avoid unwanted effects like the so-called banana-effect. Future work will address the optimal choice of the parameter q and investigations of different variants of feature maps. Finally, different regularization strategies will be investigated.