
1 Introduction

Given a pair of images of the same object taken at different times or acquired by different devices, image registration aims either to find differences between them or to fuse their complementary information, which is otherwise not possible with a single modality. In either case, the key is to find a reasonable spatial geometric transformation between the two images. Although the task arises in diverse fields such as astronomy, optics, biology, chemistry and remote sensing, and particularly in medical imaging, and although much work has been done, building a robust model for the task remains a challenge. For an overview of image registration methodologies and approaches, especially for registering images acquired by the same modality (e.g. CT-CT), we refer to [17, 18, 33, 35, 40]. For a more recent survey, see [8]. This Chapter is mainly concerned with registering two images from different modalities (e.g. CT-MRI or digital-infrared) and focuses on one important question: how to impose a constraint so that the underlying transformation is diffeomorphic.

The image registration problem can be described as follows: given a fixed image R (the reference) and a moving image T (the template), both represented by scalar functions mapping \(\varOmega \subset \mathbb {R}^d\) to \(\mathbb {R}\), find a suitable geometric transformation \(\boldsymbol{\varphi }(\boldsymbol{x}) = \boldsymbol{x} + \boldsymbol{u}(\boldsymbol{x}),\ \boldsymbol{u} : \mathbb {R}^d \longrightarrow \mathbb {R}^d\), such that

$$\begin{aligned} G_1(T [\boldsymbol{\varphi }])=G_1(T(\boldsymbol{x}+\boldsymbol{u}(\boldsymbol{x})))\approx G_2(R), \end{aligned}$$
(1)

where \(G_1, G_2\) must be chosen suitably in the multi-modality scenario, because only features or patterns in \(T, R\) visually resemble each other, not their given intensities. In contrast, in mono-modality registration, where intensities as well as features in \(T, R\) resemble each other, we have \(G_i(\cdot )=I_d,\ (i=1,2)\), i.e. \(T\approx R\) pixel-wise. In the special case of parametric models, the solution \(\boldsymbol{u}\) (or \(\boldsymbol{\varphi }\)) is assumed to belong to a space spanned by known Ansatz functions and depending on a few parameters (e.g. affine, with 6 parameters in 2D or 12 parameters in 3D). However, not all problems can be solved by parametric models.

Here, we focus on variational models for deformable non-parametric image registration, where the unknown \(\boldsymbol{u}\), sought in a properly chosen functional space, is not assumed to have any parametric form. The reconstruction problem based on model (1) is an ill-posed inverse problem, and thus regularization techniques are needed to overcome the ill-posedness [7, 11, 13, 14, 21, 30, 31, 47]. Generally speaking, a regularization technique turns the ill-posed problem (1) into the well-posed optimization model

$$\begin{aligned} \min _{ \mathbf{u}\in \mathcal {H}} \Big \lbrace \mathcal {J}(\mathbf{u})=S(\boldsymbol{u})+\frac{\lambda }{2} D(T(\boldsymbol{x}+\boldsymbol{u}), R)\Big \rbrace \end{aligned}$$
(2)

where the displacement \(\boldsymbol{u}\) is a minimizer of the above joint energy functional and \(\lambda \) is a positive weight which controls the trade-off between the two terms.

In (2), the first term \(S(\boldsymbol{u})\) is a regularization term which controls the smoothness of \(\boldsymbol{u}\) and reflects our expectations by penalizing unlikely transformations. Various regularizers have been proposed: first-order derivative based models using total variation [10, 23], diffusion [15] and elastic regularization; higher-order derivative based models using linear curvature [16], mean curvature [12] and Gaussian curvature [24]; and models based on fractional-order derivatives [50]; refer also to [11, 31, 44, 51, 52].

The second term \(D(T (\boldsymbol{x}+\boldsymbol{u}),R)\) is a fidelity measure, which quantifies the distance or similarity between the transformed template image \(T(\boldsymbol{x}+\boldsymbol{u})\) and the reference R. For mono-modal registration, a widely-used data fidelity term is the sum of squared differences \(D=\Vert T (\boldsymbol{x}+\boldsymbol{u})-R\Vert ^2_2\equiv \mathrm {SSD}(T (\boldsymbol{x}+\boldsymbol{u}),R)\). However, for multi-modality registration the choice of \(D(T (\boldsymbol{x}+\boldsymbol{u}),R)\) is more challenging. The main issue is how to design the right (or rather better) similarity measures that can accommodate the differences (in features, colours, gradients, illumination etc.) between images from different modalities (e.g. SSD no longer makes sense). Various measures have been proposed and tested in the literature. Designing a measure based on geometric information such as the gradients of the images is a good choice. See for instance the normalized gradient field (\(\mathbf {NGF}\)) [22, 26, 39], edge-sketching registration [1], normalized gradient fitting (\(\mathbf {GT}\)) [22, 43] and Mutual Information [29, 37, 46]. Recently [9] proposed a cross-correlation similarity measure based on reproducing kernel Hilbert spaces and found advantages over Mutual Information.

Many models of type (2) in the literature do not contain constraints ensuring that \(\boldsymbol{\varphi }(\boldsymbol{x})\) is a diffeomorphic map, even for mono-modal registration, and even fewer theoretical or experimental studies deal with diffeomorphic maps for multi-modal registration. Yet non-diffeomorphic maps cause phenomena such as folding or tearing, which are usually seen as non-natural transformations between the two images, unless \(\lambda \) is small (implying a poor registration fidelity error). Over the last decade, more and more researchers have focused on diffeomorphic image registration, in which folding, measured by the local invertibility quantity \(\det (J_{\boldsymbol{\varphi }})\), the Jacobian determinant of \(\boldsymbol{\varphi }\), is reduced or avoided. Under suitable assumptions, obtaining a one-to-one mapping is a natural choice, see [7, 14, 19, 20].

After surveying a few models of type (2) for multi-modal images, this Chapter shows how to incorporate a suitable constraint into a model so that it can deliver a diffeomorphic map. We illustrate our idea by a specific model: minimizing a new functional based on using reformulated normalized gradients of the images as the fidelity term [43], higher-order derivatives and a new Beltrami coefficient based term [28, 48]. An effective, iterative scheme is also presented and numerical experimental results show that the new registration model has a good performance.

2 Review of Related Models

For a variational image registration model (2), while there exist many choices for a regularizer \(S(\boldsymbol{u})\), such as the diffusion operator or the Laplacian [8], below we briefly review a few choices of \(D(T(\boldsymbol{x}+\boldsymbol{u}), R)\) for registering a pair of multi-modal images \(T, R\).

Normalized Gradient Field (NGF) and its variants. The basic idea of NGF [22, 26, 39] is to use information derived from the image intensity, namely the gradient. Similarity measures depending on the gradients or geometry of the images, which naturally encode information about shape, can be better suited to multi-modal data. The aim is to align the gradients \(\nabla T(\boldsymbol{x}+\boldsymbol{u})\) and \(\nabla R\) by minimizing the cosine distance between them. More precisely, at each point \(\boldsymbol{x} \in \varOmega \) one tries to find a displacement \(\boldsymbol{u}(\boldsymbol{x})\) such that \(\cos \Theta =1\), where \(\Theta \) is the angle between \(\nabla T(\boldsymbol{x}+\boldsymbol{u})\) and \(\nabla R\), which leads to minimizing the similarity term:

$$\begin{aligned} D^{NGF}(T(\boldsymbol{x}+\boldsymbol{u}),R)=\int \limits _\varOmega (1- (\cos \Theta )^2)\,\mathrm {d}\boldsymbol{x}=\int \limits _\varOmega (1- (\nabla _n T(\boldsymbol{x}+\boldsymbol{u})\cdot \nabla _n R)^2)\,\mathrm {d}\boldsymbol{x}, \end{aligned}$$
(3)

where \(\nabla _n T(\boldsymbol{x}+\boldsymbol{u}) =\nabla T(\boldsymbol{x}+\boldsymbol{u}) / |\nabla T(\boldsymbol{x}+\boldsymbol{u})| \) and \(\nabla _n R= \nabla R / |\nabla R|\) are normalized unit vectors. An alternative form of the NGF, which avoids the terms \(\nabla _n T(\boldsymbol{x}+\boldsymbol{u}) \) and \(\nabla _n R \) that degenerate in homogeneous regions (where gradients vanish), is the reformulation

$$\begin{aligned} D^{NGF}(T(\boldsymbol{x}+\boldsymbol{u}),R)=\int \limits _\varOmega (|\nabla T(\boldsymbol{x}+\boldsymbol{u})|^2 |\nabla R|^2- (\nabla T(\boldsymbol{x}+\boldsymbol{u})\cdot \nabla R)^2)\,\mathrm {d}\boldsymbol{x}. \end{aligned}$$
(4)
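To make (4) concrete, the following is a minimal Matlab sketch of the reformulated NGF measure on a discrete grid; the central-difference scheme (via the built-in gradient) and the midpoint quadrature are illustrative choices, not necessarily those used later in this Chapter.

```matlab
% Minimal sketch of the reformulated NGF measure (4).
% T, R: 2D arrays of equal size; h: grid spacing.
function d = ngf_measure(T, R, h)
  [Tx, Ty] = gradient(T, h);            % discrete gradient of T
  [Rx, Ry] = gradient(R, h);            % discrete gradient of R
  nT2 = Tx.^2 + Ty.^2;                  % |grad T|^2
  nR2 = Rx.^2 + Ry.^2;                  % |grad R|^2
  tr  = Tx.*Rx + Ty.*Ry;                % grad T . grad R
  d   = h^2 * sum(nT2(:).*nR2(:) - tr(:).^2);   % midpoint quadrature
end
```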

Mutual Information (MI). MI was first proposed in [46] and has been studied extensively in the literature (see [29, 37]), showcasing both its great capability and its limitations. The basic idea is to compare the histograms of the images by exploiting the following quantity

$$\begin{aligned} D^{MI}(T(\boldsymbol{x}+\boldsymbol{u}),R)=-\int \limits _{\mathbb {R}^2} p_{T,R}(t,r) \log \dfrac{p_{T,R}(t,r) }{p_{T}(t) p_{R}(r) }\,\mathrm {d}t\mathrm {d}r, \end{aligned}$$
(5)

where \(p_R, p_T\) are probability distributions of the gray values in R and T, while \(p_{T,R}\) is the joint probability of the gray values, which can be derived from the joint histogram. The main drawbacks of \(\mathbf {MI}\) are its sensitivity to image quantization and the difficulty of estimating the joint probability density function (PDF). In addition, the measure also fails when two features with different intensities in one image have similar intensities in the other [27].
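As an illustration, the following Matlab sketch evaluates the MI quantity of (5) from a joint histogram; the function name mi_measure and the bin count nb are our own illustrative choices (histcounts2 requires R2015b or later).

```matlab
% Sketch of the MI quantity in (5) via a joint histogram.
% T, R: equal-size arrays; nb: number of gray-level bins (illustrative).
function mi = mi_measure(T, R, nb)
  pTR = histcounts2(T(:), R(:), nb, 'Normalization', 'probability');
  pT  = sum(pTR, 2);                    % marginal distribution of T
  pR  = sum(pTR, 1);                    % marginal distribution of R
  pI  = pT * pR;                        % product of the marginals
  idx = pTR > 0;                        % skip empty bins (log of 0)
  mi  = sum(pTR(idx) .* log(pTR(idx) ./ pI(idx)));  % so D^MI = -mi
end
```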

Maximum Correlation Coefficient (MCC). MCC extends the well-known normalized cross-correlation (\(\mathbf {CC}\)) measure, which is only effective for mono-modal images [6, 33], to a measure that can handle multi-modal images [9]. The similarity measure is defined by

$$D^{MCC}(T(\boldsymbol{x}+\boldsymbol{u}),R)=(1- \mathbf {MCC}(T,R))^p:= (1- \max _{f,g}\mathbf {CC}(M,N))^p,\; 0<p<1,$$

where \(M(\boldsymbol{x}) = f(T(\boldsymbol{x} + \boldsymbol{u}))\), \(N(\boldsymbol{x}) = g(R(\boldsymbol{x}))\), and f and g are two measurable functions. This \(\mathbf {MCC}\) formulation does not require estimation of the continuous joint PDF and offers a powerful alternative to models based on maximizing \(\mathbf {MI}\). However, the computation of the maximum over all functions f and g is a big challenge. The approach recommended in [9] is to approximate it using the theory of reproducing kernel Hilbert spaces (\(\mathbf {RKHS}\)) [2, 5].

3 The New Model

We aim to design a variational model building on an energy of the form (2):

$$\begin{aligned} \min _{ \mathbf{u}\in \mathcal {H}} \Big \lbrace \mathcal {J}(\mathbf{u})=S(\boldsymbol{u})+D(T(\boldsymbol{x}+\boldsymbol{u}), R) + \gamma C(\boldsymbol{u})\Big \rbrace \end{aligned}$$
(6)

which comprises three building blocks: a data fidelity term with similarity measure D, a regularization term S and a control term C. The emphasis of this Chapter is on how to choose C. To make this concrete, we now specify our choice of all three terms.

3.1 Data Fitting

We consider a similarity measure based on gradient information [43]. This measure is motivated by the standard NGF [22, 32] and explores the potential of normalized gradients beyond their standard form. We shall consider normalized gradient fitting combined with a measure based on the triangle inequality. More precisely, we consider the following fitting term

$$\begin{aligned} D(T(\boldsymbol{x}+\boldsymbol{u}), R)= D^{GF}(\boldsymbol{u})+\alpha D^{TM}(\boldsymbol{u}) \end{aligned}$$
(7)

where GF stands for ‘gradient field difference’ and TM for ‘triangular measure’, with

$$\begin{aligned} \begin{aligned} D^{GF}(\boldsymbol{u})&=\int \limits _{\varOmega }|\nabla _{n}T(\boldsymbol{x}+\boldsymbol{u})-\nabla _{n}R|^{2}\mathrm {d}\boldsymbol{x},\\ D^{TM}(\boldsymbol{u})&=\int \limits _{\varOmega }(|\nabla T(\boldsymbol{x}+\boldsymbol{u})|+|\nabla R|-|\nabla T(\boldsymbol{x}+\boldsymbol{u})+\nabla R|)^{2}\mathrm {d}\boldsymbol{x}. \end{aligned} \end{aligned}$$

3.2 Regularization

A regularizer controls the smoothness of the displacement. Our primary choice is the diffusion model [15], which uses first-order derivatives to promote smoothness. Since affine linear transformations are not contained in the kernel of this \(H^1\)-regularizer, they would be penalized unnecessarily; we therefore desire a regularizer that does not penalize such transformations. To this end, we add a regularizer based on second-order derivatives (LLT), whose kernel contains affine maps, which removes the need for any affine pre-registration step. The second-order derivatives also yield smooth transformations [52]. Our adopted regularizer is given by

$$\begin{aligned} S(\boldsymbol{u})=\frac{\beta _{1}}{2}S_{1}(\boldsymbol{u})+\frac{\beta _{2}}{2}S_{2}(\boldsymbol{u}) \end{aligned}$$
(8)

where

$$\begin{aligned} \begin{aligned} S_{1}(\boldsymbol{u})&=\int \limits _{\varOmega }|\nabla \boldsymbol{u}|^{2}\mathrm {d}\boldsymbol{x}, \;\;\;\;\ S_{2}(\boldsymbol{u})=\int \limits _{\varOmega }|\nabla ^{2} \boldsymbol{u}|^{2}\mathrm {d}\boldsymbol{x}.\end{aligned} \end{aligned}$$

3.3 Invertibility

A diffeomorphic map ensures local invertibility, and this is achievable by a control term C that imposes the constraint \(\det (J_{\boldsymbol{\varphi }})>0\) at every \(\boldsymbol{x}\in \varOmega \). This idea is much used in the literature, with somewhat limited success, because either strong assumptions on \(T, R\) or a compromised fidelity error are required; see the tests and remarks in [48]. Here, instead of controlling \(\det (J_{\boldsymbol{\varphi }})\) directly, we control the Beltrami coefficient [48] to obtain a diffeomorphic map, and propose the use of

$$\begin{aligned} C(\boldsymbol{u}) = \!\int \limits _{\varOmega }\!\phi (|\mu (\boldsymbol{u})|^{2})\mathrm {d}\boldsymbol{x}, \end{aligned}$$
(9)

where \(\phi (v)=\frac{v^{2}}{(v-1)^{2}}\) and \(|\mu (\boldsymbol{u})|^{2}=\frac{(\partial _{x_{1}}u_{1}-\partial _{x_{2}}u_{2})^{2} +(\partial _{x_{2}}u_{1}+\partial _{x_{1}}u_{2})^{2}}{(\partial _{x_{1}}u_{1}+\partial _{x_{2}}u_{2}+2)^{2} +(\partial _{x_{2}}u_{1}-\partial _{x_{1}}u_{2})^{2}}\).
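For intuition, the following Matlab sketch evaluates \(|\mu (\boldsymbol{u})|^{2}\) and \(\phi \) pointwise from the four first derivatives of \(\boldsymbol{u}\); the inputs (e.g. finite-difference arrays) and the function name are illustrative assumptions.

```matlab
% Pointwise sketch of the Beltrami quantities in (9).
% u1x = d(u1)/dx1, u1y = d(u1)/dx2, u2x = d(u2)/dx1, u2y = d(u2)/dx2.
function [mu2, phival] = beltrami_control(u1x, u1y, u2x, u2y)
  num    = (u1x - u2y).^2 + (u1y + u2x).^2;      % numerator of |mu|^2
  den    = (u1x + u2y + 2).^2 + (u1y - u2x).^2;  % denominator of |mu|^2
  mu2    = num ./ den;                 % |mu|^2 < 1 corresponds to det(J) > 0
  phival = mu2.^2 ./ (mu2 - 1).^2;     % phi(v) = v^2/(v-1)^2, blows up at v = 1
end
```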

One notes that our choice of the first two terms \(S, D\) for (6) is quite common, while the third term [48] is relatively new to readers. This is the key idea of this Chapter: an old, non-diffeomorphic variational model of form (2) can be converted to a diffeomorphic model by adding a control term such as C from (9). This can be done in 2D and also in 3D following our recent work. It should be remarked that model (6) is non-convex, so its solutions are not unique (as is true for all registration models). However, we can show that the model admits at least one solution in the space \(W^{2,2}(\varOmega )\), following the idea of [49].

4 The Solution Algorithm

Here, we choose the first-discretize-then-optimize approach: we directly discretize the variational model to obtain a discrete optimization problem and then use optimization methods to solve it. In this section we focus on a Gauss-Newton (G-N) method; in the next section we briefly introduce an alternative, alternating iteration method just before numerical results are shown.

4.1 Discretization

In the implementation, we employ the nodal grid and define a spatial partition

$$\varOmega _{h}^{n} = \{\boldsymbol{x}^{i,j}\in \varOmega | \boldsymbol{x}^{i,j} =(x_{1}^{i},x_{2}^{j})=(ih,jh), 0 \le i \le n , 0 \le j \le n\},$$

where \(h = \frac{1}{n}\) and the discrete domain consists of \(n^{2}\) cells of size \(h \times h\). We discretize the displacement field \(\boldsymbol{u}\) on the nodal grid, namely \(\boldsymbol{u}^{i,j} = (u_{1}^{i,j},u_{2}^{i,j}) = (u_{1}(x_{1}^{i},x_{2}^{j}), u_{2}(x_{1}^{i},x_{2}^{j}))\). By lexicographical ordering, we reshape the four coordinate and displacement arrays into two long vectors in \(\mathbb {R}^{2(n+1)^{2}\times 1}\):

$$\begin{aligned} X&= (x_{1}^{0},x_{1}^{1},...,x_{1}^{n},\ldots , x_{1}^{0},x_{1}^{1},...x_{1}^{n}, x_{2}^{0},x_{2}^{0},...,x_{2}^{0},\ldots ,x_{2}^{n},x_{2}^{n},...x_{2}^{n})^{T}, \\ U&= (u_{1}^{0,0}, ..., u_{1}^{n,0},\ldots ,u_{1}^{0,n}, ..., u_{1}^{n,n}, u_{2}^{0,0}, ..., u_{2}^{n,0},\ldots ,u_{2}^{0,n}, ..., u_{2}^{n,n})^{T}. \end{aligned}$$
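As a small illustration of this ordering, the following Matlab sketch builds the nodal grid and the vectors X and U (the tiny n is just for demonstration).

```matlab
% Sketch of the nodal grid and lexicographic ordering of Sect. 4.1.
n = 4;  h = 1/n;                        % n cells per direction
[x2, x1] = meshgrid(0:h:1, 0:h:1);      % (n+1) x (n+1) nodal coordinates
X = [x1(:); x2(:)];                     % x1-part then x2-part, 2(n+1)^2 x 1
U = zeros(2*(n+1)^2, 1);                % displacement in the same ordering
```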

4.1.1 Discretization of Fitting Term

Firstly, set \(\mathbf {R} = \mathbf {R}(PX) \in \mathbb {R}^{n^{2}\times 1}\) as the discretized reference image and \(\mathbf {T}(PX+PU) \in \mathbb {R}^{n^{2}\times 1}\) as the discretized deformed template image, where \(P \in \mathbb {R}^{2n^{2} \times 2(n+1)^{2}}\) is an averaging matrix from the nodal grid to the cell-centered grid. To discretize \(\nabla T\) and \(\nabla R\), we introduce two discrete operators: \(D_{1}= I_{n}\otimes \partial _{h}^{1}\) and \(D_{2}=\partial _{h}^{1}\otimes I_{n}\), where

$$\begin{aligned} \partial _{h}^{1} = \frac{1}{2h}\begin{bmatrix} -1 & 1 \\ -1 & 0 & 1\\ & \ddots & \ddots & \ddots \\ & & -1 & 0 & 1 \\ & & & -1 & 1 \end{bmatrix}\in \mathbb {R}^{n\times n}. \end{aligned}$$

Hence, the discretized \(\nabla T\) and \(\nabla R\) are \([D_{1}\mathbf {T}, D_{2}\mathbf {T}]\) and \([D_{1}\mathbf {R}, D_{2}\mathbf {R}]\) respectively. Set \(\mathrm {LT} = (\sum _{i=1}^{2}D_{i}\mathbf {T}\odot D_{i}\mathbf {T}+\epsilon )^{.1/2}\), \(\mathrm {LR} = (\sum _{i=1}^{2}D_{i}\mathbf {R}\odot D_{i}\mathbf {R}+\epsilon )^{.1/2}\) and \(\mathrm {LTR} = (\sum _{i=1}^{2}D_{i}(\mathbf {T}+\mathbf {R})\odot D_{i}(\mathbf {T}+\mathbf {R})+\epsilon )^{.1/2}\), where \(\odot \) indicates component-wise product and \((\cdot )^{.1/2}\) indicates the component-wise square root.

Then for \(D^{GF}(\boldsymbol{u})\) and \(D^{TM}(\boldsymbol{u})\), we have the following discretizations:

$$\begin{aligned} D^{GF}(\boldsymbol{u})\approx h^{2}p_{1}^{T}p_{1}, \quad D^{TM}(\boldsymbol{u})\approx h^{2}p_{2}^{T}p_{2}, \end{aligned}$$
(10)

where (using ./ to indicate the component-wise division)

$$\begin{aligned} p_{1}&= [D_{1}\mathbf {T}./\mathrm {LT}-D_{1}\mathbf {R}./\mathrm {LR}; D_{2}\mathbf {T}./\mathrm {LT}-D_{2}\mathbf {R}./\mathrm {LR}]\\ p_{2}&= \mathrm {LT}+\mathrm {LR}-\mathrm {LTR}. \end{aligned}$$
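Putting Sect. 4.1.1 together, a minimal Matlab sketch of the discrete operators and of \(p_1, p_2\) could read as follows; the sparse stencil mirrors \(\partial _{h}^{1}\) above, and Tv, Rv, epsval are assumed to be the image vectors and the smoothing parameter \(\epsilon \).

```matlab
% Sketch of D1, D2 and the fitting quantities in (10).
e  = ones(n,1);
d1 = spdiags([-e, zeros(n,1), e], -1:1, n, n) / (2*h);
d1(1,1:2)   = [-1, 1] / (2*h);          % first row of partial_h^1
d1(n,n-1:n) = [-1, 1] / (2*h);          % last row of partial_h^1
D1 = kron(speye(n), d1);                % derivative in x1
D2 = kron(d1, speye(n));                % derivative in x2
LT  = sqrt((D1*Tv).^2 + (D2*Tv).^2 + epsval);            % smoothed |grad T|
LR  = sqrt((D1*Rv).^2 + (D2*Rv).^2 + epsval);            % smoothed |grad R|
LTR = sqrt((D1*(Tv+Rv)).^2 + (D2*(Tv+Rv)).^2 + epsval);  % |grad(T+R)|
p1 = [D1*Tv./LT - D1*Rv./LR; D2*Tv./LT - D2*Rv./LR];     % as defined above
p2 = LT + LR - LTR;
DGF = h^2 * (p1'*p1);  DTM = h^2 * (p2'*p2);             % discretized (10)
```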

4.1.2 Discretization of Regularization Term

The first-order regularization term can be discretized into the following form:

$$\begin{aligned} S_{1}(\boldsymbol{u}) \approx h^{2}\sum _{i=0}^{n-1}\sum _{j=0}^{n-1}\sum _{l=1}^{2} \big (\frac{u_{l}^{i+1,j}-u_{l}^{i,j}}{h} \big )^{2} +\big (\frac{u_{l}^{i,j+1}-u_{l}^{i,j}}{h}\big )^{2} \end{aligned}$$
(11)

by using the forward difference and mid-point rule.

Define \(B_{1} = I_{n+1}\otimes \partial _{h}^{2} \in \mathbb {R}^{(n+1)^{2}\times (n+1)^{2}}\), \(C_{1} = \partial _{h}^{2}\otimes I_{n+1} \in \mathbb {R}^{(n+1)^{2}\times (n+1)^{2}}\),

$$\begin{aligned} \partial _{h}^{2} = \frac{1}{h}\begin{bmatrix} -1 & 1 \\ & \ddots & \ddots \\ & & -1 & 1 \\ & & & 0 \end{bmatrix}\in \mathbb {R}^{(n+1)\times (n+1)}, \quad A_{1} = \begin{bmatrix} B_{1} & 0\\ C_{1} & 0\\ 0 & B_{1}\\ 0 & C_{1} \end{bmatrix}\in \mathbb {R}^{4(n+1)^{2}\times 2(n+1)^{2}}, \end{aligned}$$

where \(\otimes \) denotes the Kronecker product. Then (11) can be rewritten into the following form (noting \(U\in \mathbb {R}^{2(n+1)^2\times 1}\))

$$\begin{aligned} S_{1}(\boldsymbol{u}) \approx h^{2}U^{T}A_{1}^{T}A_{1}U. \end{aligned}$$
(12)
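A possible Matlab construction of \(A_1\) and of the quadratic form (12) is sketched below, following the stencil \(\partial _{h}^{2}\) above.

```matlab
% Sketch of the first-order regularizer matrices B1, C1, A1 and S1 in (12).
np = n + 1;
d2 = spdiags([-ones(np,1), ones(np,1)], 0:1, np, np) / h;
d2(np,:) = 0;                           % zero last row of partial_h^2
B1 = kron(speye(np), d2);               % difference in x1
C1 = kron(d2, speye(np));               % difference in x2
Z  = sparse(np^2, np^2);
A1 = [B1, Z; C1, Z; Z, B1; Z, C1];      % 4(n+1)^2 x 2(n+1)^2
S1 = h^2 * (U' * (A1' * (A1 * U)));     % discretized S1(u)
```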

The second-order regularization term can be discretized into the following:

$$\begin{aligned} S_{2}(\boldsymbol{u})&\approx h^{2}\sum _{i=0}^{n-1}\sum _{j=0}^{n-1}\sum _{l=1}^{2} \Big (\frac{u_{l}^{i+1,j}-2u_{l}^{i,j}+u_{l}^{i-1,j}}{h^{2}}\Big )^{2} +\Big (\frac{u_{l}^{i,j+1}-2u_{l}^{i,j}+u_{l}^{i,j-1}}{h^{2}}\Big )^{2} \nonumber \\&+ 2h^{2}\sum _{i=0}^{n-1}\sum _{j=0}^{n-1}\sum _{l=1}^{2}\Big (\frac{u_{l}^{i,j}-u_{l}^{i+1,j}-u_{l}^{i,j+1}+u_{l}^{i+1,j+1}}{h^{2}}\Big )^{2} \end{aligned}$$
(13)

by using the central difference, mid-point rule and Neumann boundary conditions (\(l=1,2\)): \( u_{l}^{i,0} = u_{l}^{i,-1}, u_{l}^{i,n} = u_{l}^{i,n+1}, u_{l}^{0,j} = u_{l}^{-1,j}, u_{l}^{n,j} = u_{l}^{n+1,j}. \)

Further define \(B_{21} = I_{2}\otimes (I_{n+1}\otimes \partial _{h}^{3})\), \(B_{22} = I_{2}\otimes (\partial _{h}^{3}\otimes I_{n+1})\), \(C_{2} = I_{2}\otimes (E\otimes E)\), \(\tau _1=(n+1)\times (n+1)\), \(\tau _2=n\times (n+1)\), where

$$\begin{aligned} \partial _{h}^{3} = \frac{1}{h^{2}}\begin{bmatrix} -1 & 1 \\ 1 & -2 & 1\\ & \ddots & \ddots & \ddots \\ & & 1 & -2 & 1 \\ & & & 1 & -1 \end{bmatrix}\in \mathbb {R}^{\tau _1},\quad E = \frac{1}{h}\begin{bmatrix} -1 & 1 \\ & -1 & 1\\ & & \ddots & \ddots \\ & & & -1 & 1 \end{bmatrix}\in \mathbb {R}^{\tau _2}. \end{aligned}$$

Then (13) can be rewritten into the following form

$$\begin{aligned} S_{2}(\boldsymbol{u}) \approx h^{2}U^{T}A_{2}U,\quad A_{2} = B_{21}^{T}B_{21}+B_{22}^{T}B_{22}+2C_{2}^{T}C_{2}. \end{aligned}$$
(14)
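Analogously, \(A_2\) of (14) can be assembled as in the following sketch, using the stencils \(\partial _{h}^{3}\) and E above.

```matlab
% Sketch of the second-order regularizer matrix A2 in (14).
np = n + 1;
d3 = spdiags(ones(np,1) * [1, -2, 1], -1:1, np, np) / h^2;
d3(1,1:2)      = [-1, 1] / h^2;         % first row of partial_h^3
d3(np,np-1:np) = [ 1,-1] / h^2;         % last row of partial_h^3
Ed = spdiags([-ones(n,1), ones(n,1)], 0:1, n, np) / h;  % E, n x (n+1)
B21 = kron(speye(2), kron(speye(np), d3));
B22 = kron(speye(2), kron(d3, speye(np)));
C2  = kron(speye(2), kron(Ed, Ed));
A2  = B21'*B21 + B22'*B22 + 2*(C2'*C2);
S2  = h^2 * (U' * (A2 * U));            % discretized S2(u)
```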

4.1.3 Discretization of Control Term

Note that \(\phi (|\mu ( \boldsymbol{u})|^{2})\) involves only first-order derivatives and all \(\boldsymbol{u}^{i,j}\) are available at the nodal (vertex) points. Thus it is convenient first to obtain approximations at all cell centers (e.g. at \(V_5\) in Fig. 1) and then to use local linear elements to evaluate the first-order derivatives. We divide each cell (Fig. 1) into 4 triangles. In each triangle, we construct two linear interpolation functions to approximate \(u_{1}\) and \(u_{2}\). Consequently, all partial derivatives are locally constant, i.e. \(\phi (|\mu ( \boldsymbol{u})|^{2})\) is constant in each triangle.

Fig. 1 Partition of a cell, nodal points \(\square \) and center point \(\circ \). \(\triangle V_{1}V_{2}V_{5}\) is \(\varOmega _{i,j,k}\)

Set \(\mathbf{L} ^{i,j,k}(\boldsymbol{x})= (L_{1}^{i,j,k}(\boldsymbol{x}),L_{2}^{i,j,k}(\boldsymbol{x}))= (a^{i,j,k}_{1}x_{1}+a^{i,j,k}_{2}x_{2}+a^{i,j,k}_{3}, a^{i,j,k}_{4}x_{1}+a^{i,j,k}_{5}x_{2}+a^{i,j,k}_{6})\), the linear interpolant of \(\boldsymbol{u}\) on \(\varOmega _{i,j,k}\). Note that \(\partial _{x_{1}} L^{i,j,k}_{1} = a^{i,j,k}_{1}, \partial _{x_{2}} L^{i,j,k}_{1} = a^{i,j,k}_{2},\partial _{x_{1}} L^{i,j,k}_{2} = a^{i,j,k}_{4}\) and \(\partial _{x_{2}} L^{i,j,k}_{2} = a^{i,j,k}_{5}\). Then, according to the partition in Fig. 1, we have

$$\begin{aligned} \begin{aligned} C(\boldsymbol{u})=&\int \limits _{\varOmega }\phi (|\mu (\boldsymbol{u})|^{2})\mathrm {d}\boldsymbol{x}\\ \approx&\frac{h^{2}}{4}\sum _{i=1}^{n}\sum _{j=1}^{n}\sum _{k=1}^{4} \phi \Big (\frac{(a^{i,j,k}_{1}-a^{i,j,k}_{5})^{2} +(a^{i,j,k}_{2}+a^{i,j,k}_{4})^{2}}{(a^{i,j,k}_{1}+a^{i,j,k}_{5}+2)^{2} +(a^{i,j,k}_{2}-a^{i,j,k}_{4})^{2}}\Big ). \end{aligned} \end{aligned}$$
(15)

To simplify (15), define 3 vectors \(\mathbf { r}(U), \mathbf { r}^{1}(U), \mathbf {r}^{2}(U)\) \(\in \mathbb {R}^{4n^{2}}\) by \(\mathbf {r}(U)_{\ell }=\mathbf { r}^{1}(U)_{\ell } \mathbf { r}^{2}(U)_{\ell }\), \(\mathbf { r}^{1}(U)_{\ell }=(a^{i,j,k}_{1}-a^{i,j,k}_{5})^{2} +(a^{i,j,k}_{2}+a^{i,j,k}_{4})^{2}\), \(\mathbf { r}^{2}(U)_{\ell }=1\big /[(a^{i,j,k}_{1}+a^{i,j,k}_{5}+2)^{2} +(a^{i,j,k}_{2}-a^{i,j,k}_{4})^{2}]\) where \(\ell = (k-1)n^{2}+(j-1)n+i\ \in [1, 4n^2]\).

Hence, (15) becomes

$$\begin{aligned} C(\boldsymbol{u}) \approx \frac{h^{2}}{4}\boldsymbol{\phi }(\mathbf { r}(U))e^{T} \end{aligned}$$
(16)

where \(\boldsymbol{\phi }(\mathbf { r}(U)) = (\phi (\mathbf { r}(U)_{1}),...,\phi (\mathbf { r}(U)_{4n^{2}}))\) denotes the component-wise application of \(\phi \) to \(\mathbf { r}(U)\) over all triangles, and \(e = (1,...,1)\in \mathbb {R}^{4n^{2}}\).

Finally, combining the three parts above, i.e. (10), (12), (14) and (16), we obtain the discretized formulation of model (6):

$$\begin{aligned} \min _{U} J(U):= h^{2}p_{1}^{T}p_{1}+\alpha h^{2} p_{2}^{T}p_{2}+\frac{\beta _{1}h^{2}}{2}U^{T}A_{1}^{T}A_{1}U+\frac{\beta _{2}h^{2}}{2}U^{T}A_{2}U+ \frac{\gamma h^{2}}{4}\boldsymbol{\phi }(\mathbf { r}(U))e^{T}. \end{aligned}$$
(17)

Remark 1

According to the definition of \(\phi \) and \(\mathbf { r}(U)_{\ell } \ge 0\), each component of \(\boldsymbol{\phi }(\mathbf { r}(U))\) is non-negative and differentiable.

4.2 Optimization Method for the Discretized Problem (17)

In the numerical implementation, we choose a line search method to solve the resulting unconstrained optimization problem (17). Here, the basic iterative scheme is

$$\begin{aligned} U^{i+1} = U^{i}+\theta \delta U^{i}, \end{aligned}$$
(18)

where \(\delta U^{i}\) is the search direction and \(\theta \) is the step length. To guarantee a descent search direction, we employ a Gauss-Newton method, since the standard Newton method may not generate a descent direction: our exact Hessian can be indefinite.

4.2.1 Gradient and Approximated Hessian of (17)

Firstly, we compute the gradient and approximated Hessian of the discretized fitting term \(h^{2}p_{1}^{T}p_{1}+\alpha h^{2} p_{2}^{T}p_{2}\); they are, respectively:

$$\begin{aligned} \left\{ \begin{array}{lcl} d_{1} & = & 2h^{2}P^{T}(\mathrm {d}p_{1}^{T}p_{1}+\alpha \mathrm {d}p_{2}^{T}p_{2})\in {\mathbb R}^{2(n+1)^{2}\times 1},\\ \hat{H}_{1} & = & h^{2}P^{T}(\mathrm {d}p_{1}^{T}\mathrm {d}p_{1}+\alpha \mathrm {d}p_{2}^{T}\mathrm {d}p_{2})P \in \mathbb {R}^{2(n+1)^{2}\times 2(n+1)^{2}}, \end{array} \right. \end{aligned}$$
(19)

where \(\mathrm {d}p_{1} = [\Lambda D_{1}-\mathrm {diag}(D_{1}\mathbf {T}./t)\Gamma ; \Lambda D_{2}-\mathrm {diag}(D_{2}\mathbf {T}./t)\Gamma ]\), \(\mathrm {d}p_{2} = \sum _{i=1}^{2}\mathrm {diag}(D_{i}\mathbf {T}./\mathrm {LT}-D_{i}(\mathbf {T}+\mathbf {R})./\mathrm {LTR})D_{i}\), \(\Lambda = \mathrm {diag}(1./\mathrm {LT})\), \(t = \mathrm {LT}^{.3}\), \(\Gamma = \sum _{i=1}^{2}\mathrm {diag}(D_{i}\mathbf {T})D_{i}\) and \(\mathrm {diag}(v)\) is a diagonal matrix with v on its main diagonal.

Remark 2

Evaluating the deformed template image \( \mathbf {T} \) must involve interpolation, because the points \(PX+PU\) are not in general pixel points. In our implementation, we choose B-splines for the interpolation.

For the discretized regularization term \( \frac{\beta _{1} h^{2}}{2} U^{T}A_{1}^{T}A_{1}U+\frac{\beta _{2} h^{2}}{2} U^{T}A_{2}U, \) the gradient and Hessian are, respectively,

$$\begin{aligned} \left\{ \begin{array}{lcl} d_{2} & = & h^{2}(\beta _{1}A_{1}^{T}A_{1}+\beta _{2}A_{2})U \in {\mathbb R}^{2(n+1)^{2}\times 1},\\ H_{2} & = & h^{2}(\beta _{1}A_{1}^{T}A_{1}+\beta _{2}A_{2}) \in {\mathbb R}^{2(n+1)^{2}\times 2(n+1)^{2}}. \end{array} \right. \end{aligned}$$
(20)

Finally, for the discretized Beltrami term \(\frac{\gamma h^{2}}{4}\boldsymbol{\phi }(\mathbf { r}(U))e^{T}\), the gradient and approximated Hessian are as follows:

$$\begin{aligned} \left\{ \begin{array}{lcl} d_{3} & = & \frac{\gamma h^{2}}{4} \mathrm {d}\mathbf { r}^{T}\mathrm {d}\boldsymbol{\phi }(\mathbf { r}) \in {\mathbb R}^{2(n+1)^{2}\times 1},\\ \hat{H}_{3} & = & \frac{\gamma h^{2}}{4} \mathrm {d}\mathbf { r}^{T}\mathrm {d}^{2}\boldsymbol{\phi }(\mathbf { r})\mathrm {d}\mathbf { r}, \end{array} \right. \end{aligned}$$
(21)

where \(\mathrm {d}\boldsymbol{\phi }(\mathbf { r})= (\phi '(\mathbf { r}_{1}),...,\phi '(\mathbf { r}_{4n^{2}}))^{T}\) collects the derivatives of \(\phi \) at all components of \(\mathbf { r}\),

$$\begin{aligned} \left\{ \begin{array}{lcl} \mathrm {d}\mathbf { r} & = & \text {diag}(\mathbf { r}^{1})\mathrm {d}\mathbf { r}^{2}+\text {diag}(\mathbf { r}^{2})\mathrm {d}\mathbf { r}^{1}, \\ \mathrm {d}\mathbf { r}^{1} & = & 2\text {diag}(A_{31}U)A_{31} + 2\text {diag}(A_{32}U)A_{32}, \\ \mathrm {d}\mathbf { r}^{2} & = & -\text {diag}(\mathbf { r}^{2}\odot \mathbf { r}^{2})[2\text {diag}(A_{33}U+2)A_{33} + 2\text {diag}(A_{34}U)A_{34}], \end{array} \right. \end{aligned}$$
(22)

\(\odot \) denotes a Hadamard product, \(\mathrm {d}\mathbf { r}, \mathrm {d}\mathbf { r}^{1}, \mathrm {d}\mathbf { r}^{2}\) are the Jacobian of \(\mathbf { r}, \mathbf { r}^{1}, \mathbf { r}^{2}\) with respect to U respectively, \( [\mathrm {d}\boldsymbol{\phi }(\mathbf { r})]_{\ell }\) is the \(\ell \)th component of \(\mathrm {d}\boldsymbol{\phi }(\mathbf { r})\) and \(\mathrm {d}^{2}\boldsymbol{\phi }(\mathbf { r})\) is the Hessian of \(\boldsymbol{\phi }\) with respect to \(\mathbf { r}\), which is a diagonal matrix whose ith diagonal element is \(\phi ''(\mathbf { r}_{i}),\ 1\le i \le 4n^{2}\). More details about \(\mathbf { r}^{1}\), \(\mathbf { r}^{2}\), \(A_{31}\), \(A_{32}\), \(A_{33}\) and \(A_{34}\) are shown in Appendix 1.

Therefore, combining the above results for the three terms, we obtain the gradient

$$\begin{aligned} d_{J} = d_{1}+d_{2}+d_{3} \end{aligned}$$
(23)

and the approximated Hessian of (17):

$$\begin{aligned} H = \hat{H}_{1}+H_{2}+\hat{H}_{3}. \end{aligned}$$
(24)

4.2.2 Search Direction

With the above approximated Hessian (24), in each outer (nonlinear) iteration, we solve the Gauss-Newton system

$$\begin{aligned} H\delta U=-d_{J} \end{aligned}$$
(25)

to obtain the search direction \(\delta U\) for (17). Because H is symmetric positive semi-definite, we choose MINRES with diagonal preconditioning as the numerical solver in our implementation [4, 36].

4.2.3 Step Length

Here, we choose a popular inexact line search condition, Armijo condition, which determines a step length \(\theta \) that satisfies the following sufficient decrease condition:

$$\begin{aligned} J(U + \theta \delta U) < J(U) + \theta \eta \, d_{J}^{T}\delta U. \end{aligned}$$
(26)

Here, we set \(\eta = 10^{-4}\) and use backtracking to find a suitable \(\theta \). In addition, we need to check that each component of \(\mathbf { r}(U)\), the squared modulus of the discretized Beltrami coefficient, stays smaller than 1. For more details, please refer to [25, 34, 41].

4.2.4 Stopping Criteria

In the implementation, we choose the stopping criteria used in [33]:

(1.a) \(\Vert J( U^{i+1})-J( U^{i})\Vert \le \tau _{J}(1+\Vert J( U^{0})\Vert )\),

(1.b) \(\Vert U^{i+1}-U^{i}\Vert \le \tau _{W}(1+\Vert X+U^{0}\Vert )\),

(1.c) \(\Vert d_{J}\Vert \le \tau _{G}(1+\Vert J( U^{0})\Vert )\),

(2) \(\Vert d_{J}\Vert \le \) eps,

(3) \(i \ge \) MaxIter.

Here, eps is the machine precision and MaxIter is the maximal number of outer iterations. We set \(\tau _{J} = 10^{-3}\), \(\tau _{W} = 10^{-2}\) and \(\tau _{G} = 10^{-2}\). If (1) (i.e. (1.a)-(1.c) all hold), (2) or (3) is satisfied, the iterations are terminated. Hence, a Gauss-Newton numerical scheme with Armijo line search can be developed, as summarized in Algorithm 1.

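As the pseudocode of Algorithm 1 is not reproduced here, the following Matlab sketch shows one plausible realization of the Gauss-Newton loop with MINRES and Armijo backtracking; the handle Jfun, assumed to return the energy (17) with its gradient (23) and approximated Hessian (24), and the simplified stopping test are illustrative assumptions.

```matlab
% Sketch of a Gauss-Newton scheme for (17) (cf. Algorithm 1).
function U = gauss_newton(Jfun, U0, maxIter)
  U = U0;  [J0, dJ, H] = Jfun(U);  Jold = J0;  eta = 1e-4;
  for i = 1:maxIter
    m  = size(H,1);
    M  = spdiags(max(abs(diag(H)), eps), 0, m, m);  % diagonal preconditioner
    dU = minres(H, -dJ, 1e-6, 200, M);              % solve (25)
    theta = 1;                                      % Armijo backtracking (26)
    while Jfun(U + theta*dU) >= Jold + theta*eta*(dJ'*dU)
      theta = theta/2;
      if theta < 1e-10, return; end                 % line search failed
    end
    U = U + theta*dU;  [Jnew, dJ, H] = Jfun(U);
    if abs(Jnew - Jold) <= 1e-3*(1 + abs(J0)) && ...
       norm(dJ) <= 1e-2*(1 + abs(J0))               % simplified criteria (1)
      break;
    end
    Jold = Jnew;
  end
end
```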

4.2.5 Multi-level Strategy

A multi-level strategy is a standard technique in image registration. We first coarsen the template T and the reference R by L levels. We then obtain \(U_{1}\) by solving our model (6) on the coarsest level. To provide a good initial guess for the next finer level, we interpolate \(U_{1}\) to obtain \(U_{2}^{0}\) as the initial guess on that level. Repeating this process yields the final registration on the finest level. The most important advantage of the multi-level strategy is that it saves computation time, because there are fewer variables on the coarser levels than on the fine levels. In addition, it helps to avoid getting trapped in a local minimum. A sketch of this coarse-to-fine loop is given below.
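In the following Matlab sketch, restrict, prolong and register_level are hypothetical helpers (image downsampling, displacement interpolation and a single-level solve of (17), respectively); prolong is assumed to return zeros for an empty input.

```matlab
% Sketch of the multi-level strategy of Sect. 4.2.5 (coarse to fine).
function U = multilevel_register(T, R, L)
  Ts = cell(L,1);  Rs = cell(L,1);  Ts{1} = T;  Rs{1} = R;
  for l = 2:L                          % build the image pyramids
    Ts{l} = restrict(Ts{l-1});  Rs{l} = restrict(Rs{l-1});
  end
  U = [];                              % zero initial guess on coarsest level
  for l = L:-1:1                       % coarsest level first
    U0 = prolong(U, size(Ts{l}));      % interpolate the previous solution
    U  = register_level(Ts{l}, Rs{l}, U0);  % solve (17) on this level
  end
end
```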

4.2.6 Convergence Result

Algorithm 1 described above converges to a stationary point of our new model; details are given in Theorem 1 of Appendix 2 below.

5 Numerical Results

In this section, we show numerical results illustrating the performance of our proposed model (6) solved by the Gauss-Newton method, referred to as GNR. We compare with the standard NGF [32] and with an Augmented Lagrangian approach for solving a similar model [43], referred to as ALMR, which uses the same regularization and fitting terms; there, the local invertibility of the map is instead guaranteed by imposing an inequality constraint on the model. For more details about the augmented Lagrangian method, we refer to [3, 38, 42] and the references therein.

ALMR. Alternating iteration is another popular method which might be applied to (6). However, below we consider it for a related model [43] formulated as a constrained optimization problem (different from (6)):

$$\begin{aligned} {\left\{ \begin{array}{ll} \displaystyle \min _{\boldsymbol{u} \in \mathcal {H}} \lbrace \mathcal J_1(\boldsymbol{u})=S(\boldsymbol{u}) + \frac{\lambda }{2} D^{GF}(\boldsymbol{u}) + \frac{\lambda }{2}D^{TM}(\boldsymbol{u}) \rbrace ,\\ \text {s.t.}\;\;\;\mathcal {C}_\epsilon (\boldsymbol{u})=\det \, (I + \nabla \boldsymbol{u})\ge \epsilon , \end{array}\right. } \end{aligned}$$
(27)

where imposing the constraint is a competing way of ensuring a diffeomorphic transformation.

To reformulate (27), we introduce auxiliary variables K, \(\mathbf{p}\), \(\mathbf{n}\) and \(\mathbf{m}\), and solve the following constrained minimization problem:

$$\begin{aligned} {\left\{ \begin{array}{ll} \displaystyle \min _{\boldsymbol{u},K,\mathbf{p},\mathbf{n}}\lbrace S(\boldsymbol{u}) + \frac{\lambda }{2} \int \limits _\varOmega ( \mathbf{n}-\nabla _n R)^2\mathrm {d}\boldsymbol{x}+ \frac{\lambda }{2} \int \limits _\varOmega (|\mathbf{p}|+ |\nabla R | -|\mathbf{m}|)^2\,\mathrm {d}\boldsymbol{x}\rbrace ,\\ \text {s.t.}\;\;\; K=T(\boldsymbol{x}+\boldsymbol{u}),\;\; \mathbf{p}=\nabla K,\;\; |\mathbf{p}| \mathbf{n}=\mathbf{p},\;\; \mathbf{m}=\mathbf{p}+\nabla R,\; \;\mathcal {C}>0. \end{array}\right. } \end{aligned}$$
(28)

Then, the augmented Lagrangian functional corresponding to the constrained optimization problem (28) is defined as follows:

$$\begin{aligned} \begin{aligned}&\mathcal {L}_1(\boldsymbol{u},K,\mathbf{p},\mathbf{n},\mathbf{m}, \lambda _1,\lambda _2,\lambda _3,\lambda _4,\lambda _5) \\&= S(\boldsymbol{u}) + \frac{\lambda }{2} \int \limits _\varOmega ( \mathbf{n}-\nabla _n R)^2\mathrm {d}\boldsymbol{x}+ \frac{\lambda }{2} \int \limits _\varOmega (|\mathbf{p}|+ |\nabla R | -|\mathbf{m}|)^2\,\mathrm {d}\boldsymbol{x}\\&\ \ + \frac{r_2}{2} \int \limits _\varOmega (\mathbf{p}-\nabla K)^2\mathrm {d}\boldsymbol{x}+\frac{r_3}{2} \int \limits _\varOmega (\mathbf{p}-|\mathbf{p}| \mathbf{n})^2 \mathrm {d}\boldsymbol{x}+\frac{r_4}{2} \int \limits _\varOmega (\mathbf{p}+ \nabla R-\mathbf{m})^2 \mathrm {d}\boldsymbol{x}\\&\ \ +\int \limits _\varOmega (T(\boldsymbol{x}+\boldsymbol{u})-K)\lambda _1 \mathrm {d}\boldsymbol{x}+ \int \limits _\varOmega (\mathbf{p}- \nabla K)\cdot \lambda _2 \mathrm {d}\boldsymbol{x}+ \int \limits _\varOmega (\mathbf{p}-|\mathbf{p}| \mathbf{n})\cdot \lambda _3 \mathrm {d}\boldsymbol{x}\\&\ \ + \int \limits _\varOmega (\mathbf{p}+ \nabla R-\mathbf{m})\cdot \lambda _4\, \mathrm {d}\boldsymbol{x}+\frac{r_1}{2}\int \limits _\varOmega (T(\boldsymbol{x}+\boldsymbol{u})-K)^2\mathrm {d}\boldsymbol{x}+\frac{1}{2\sigma } \int \limits _\varOmega \mathcal {C}_s(\boldsymbol{u},\lambda _5)\,\mathrm {d}\boldsymbol{x}, \end{aligned} \end{aligned}$$
(29)

where

$$\begin{aligned} \mathcal {C}_s(\boldsymbol{u},\lambda _5)=[\min \lbrace 0,\sigma (\mathcal {C}(\boldsymbol{u})-\epsilon ) - \lambda _5 \rbrace ]^2-\lambda _5^2, \end{aligned}$$
(30)

\(\epsilon >0\) is a small parameter, \(\sigma >0\) is a penalty parameter and \(\lambda _i\ (i=1,\ldots ,5)\) are the Lagrange multipliers. The augmented Lagrangian algorithm is shown in Algorithm 2.


In practice, the minimization problem (29) is decomposed into a number of sub-problems, each of which can be solved quickly. However, the convergence of the augmented Lagrangian iterations is not guaranteed in this case, due to the non-convexity of the overall registration problem. Currently this is a major weakness of ALMR, while the convergence of GNR (even if a bit slower) can be proved; hence GNR is recommended.

In order to reduce the number of parameters to tune, we set \(\lambda =15\), \(\beta _1=0.005\), \(\beta _2=0.1\times \beta _1\), \(r_1 = 5\), \(r_2=10\) and \(r_3=r_4=100\) in all numerical experiments unless stated otherwise. We take \(N_{max}=70\) as the maximum number of iterations for ALMR in Algorithm 2 and stop the iterations before reaching \(N_{max}\) if the stopping criterion

$$ \frac{\Vert \mathbf{p}^k + \nabla R -\mathbf{m}^k\Vert _{L^1}}{\sqrt{l\times c}} \le \tau $$

is satisfied for a given tolerance \(\tau =10^{-3}\), where l and c are the numbers of rows and columns in the image.

For all compared methods, we set the zero vector as the initial guess \(U^{0}\). To measure the quality of the registered images, we use the following quantities

$$\begin{aligned} \mathrm {GFer} = \frac{D^{GF}(\boldsymbol{u})}{D^{GF}(\boldsymbol{u}^{0})}, \end{aligned}$$
(31)
$$\begin{aligned} \mathrm {NGFer} = \frac{D^{NGF}(\boldsymbol{u})}{D^{NGF}(\boldsymbol{u}^{0})}, \end{aligned}$$
(32)

and

$$\begin{aligned} \mathrm {MIer} = -D^{MI}(\boldsymbol{u}). \end{aligned}$$
(33)

A good result corresponds to small GFer, small NGFer and large MIer. All the codes are implemented in Matlab R2019b on a PC with a 3.4 GHz Intel(R) Core(TM) i5-3570 processor and 12 GB RAM.
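For reproducibility, the three measures could be evaluated along the lines of the following sketch, reusing ngf_measure and mi_measure from Sect. 2; dgf_measure (a \(D^{GF}\) evaluator) and the bin count are hypothetical.

```matlab
% Sketch of the quality measures (31)-(33); Tdef is T(x+u), T0 is T(x+u0).
GFer  = dgf_measure(Tdef, R) / dgf_measure(T0, R);        % (31)
NGFer = ngf_measure(Tdef, R, h) / ngf_measure(T0, R, h);  % (32)
MIer  = mi_measure(Tdef, R, 64);                          % (33), MIer = -D^MI
```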

Fig. 2 Example 1 without the Beltrami control term: the first row shows the reference, template and overlay of the reference and template. The second and third rows show the deformed templates and transformations obtained by two pairs of parameters \((\beta _{1},\beta _{2})=(50,2)\) and \((\beta _{1},\beta _{2})=(50,5)\), respectively. The results are visually similar but the transformations are not both one-to-one. The first choice leads to a mesh with folding because the minimum of the Jacobian determinant of the transformation is negative

Fig. 3 Example 1: the deformed template and transformation are generated by \((\beta _{1},\beta _{2},\gamma )=(50,2,10)\). The results are visually satisfactory and the transformation is one-to-one. Second row: the deformed template obtained by ALMR and its overlay with the reference R

5.1 Example 1

In this example, we consider the pair of images displayed in Fig. 2a, b, of resolution \(256\times 256\). To simplify parameter selection, we fix \(\alpha =0.01\) in this example.

Firstly, we consider the model without the Beltrami control term, namely \(\gamma =0\). For the regularization parameters, we test two pairs \((\beta _{1},\beta _{2})=(50,2)\) and \((\beta _{1},\beta _{2})=(50,5)\). The corresponding deformed templates and transformations are shown in Fig. 2d, e, g, h. From Fig. 2f, i, we find that the deformed templates generated by these two pairs of parameters are visually satisfactory. In addition, the two choices give similar measurements: GFer \( = 0.82\), NGFer \( = 0.81\), MIer \( = 0.58\) and GFer \( = 0.83\), NGFer \( = 0.84\), MIer \( = 0.57\), respectively. However, the first choice leads to a transformation containing folding, because the minimum of the Jacobian determinant of the transformation is negative, whereas the second choice produces a smooth transformation without folding, because the minimum of the Jacobian determinant is positive.

Since first- and second-order regularizers only control smoothness, to overcome this drawback we keep \((\beta _{1}, \beta _{2})=(50,2)\) unchanged and choose a suitable \(\gamma \); here, we set \(\gamma =10\). Figure 3a, b shows the corresponding deformed template and transformation. From Fig. 3c, the deformed template is visually similar to the previous one obtained without controlling the Beltrami coefficient, and the measurements are also similar (GFer \(= 0.82\), NGFer \(= 0.82\) and MIer \(= 0.57\)). But the minimum of the Jacobian determinant of the transformation is now positive, which shows that the transformation is diffeomorphic. In the same figure, we also give the result of the ALMR model, which shows again, from the overlay of \(T(\boldsymbol{\varphi })\) and the reference R, that the template image T is well registered to R.

Now, we investigate the sensitivity to \(\gamma \). From Table 1, we find that when we fix \(\alpha ,\beta _{1}\) and \(\beta _{2}\) and change \(\gamma \), the values of GFer, NGFer and MIer remain stable, and at the same time the minima of the Jacobian determinants of the transformations are all positive. This indicates that the model is not sensitive to the weight of the Beltrami control term.

Table 1 Example 1: measurements obtained by using \(\alpha =10^{-2},\beta _{1}=50\) and \(\beta _{2}=2\)

In addition, we also investigate the convergence of the algorithm for our model. Here, we force the relative norm of the gradient of the approximated solution to reach \(10^{-3}\), although only a few iterations are run under the practical stopping criteria. According to Fig. 4, the algorithm for our model is convergent.

Hence, this example illustrates that our new control term can effectively control the transformation and lead to an accurate registration. Meanwhile, the new control term makes the model more robust.

Fig. 4 Example 1: relative norm of the gradient and relative norm of the function value for the parameters \((\alpha ,\beta _{1},\beta _{2},\gamma ) = (0.01,50,2,10)\), showing that our algorithm is convergent

5.2 Example 2

In this example, we consider another pair of \(256\times 256\) images (Fig. 5a, b). Again, to reduce the complexity of choosing parameters, we fix \(\alpha =10^{-1}\) in this example.

Fig. 5 Example 2 by the new model GNR without using the control term C: the resulting transformation is not diffeomorphic although the deformed template is visually satisfactory

Firstly, we set \(\beta _{1}=50,\beta _{2}=10\) and \(\gamma =0\). From Fig. 5d-f, although the deformed template is visually satisfactory, the resulting transformation has folding, since the minimum of the Jacobian determinant is negative.

As a comparison, we also test the standard NGF model [32] with the same first- and second-order regularizers. Here, we test three pairs of \((\beta _{1},\beta _{2})\); the corresponding results are shown in Fig. 6. We find that with NGF as the fitting term it is very hard to choose suitable parameters giving a good registration, namely one which simultaneously yields a diffeomorphic transformation and a visually satisfactory deformed template. To overcome this difficulty, we keep \(\beta _{1}\), \(\beta _{2}\) unchanged and set \(\gamma \) to 1, 10 and 100 in turn. Figure 7 shows that all these choices generate visually satisfactory deformed templates and diffeomorphic transformations. Specifically, according to Fig. 7, the measurements obtained by these choices are very similar, which again demonstrates that the model becomes more robust when the Beltrami control term is included. We also give the result of the ALMR model in Fig. 8. We observe from the overlays of the registered and reference images that all models produce acceptable registration results.

In summary, when the ALMR, NGF and GNR models all work, the latter has the largest MIer similarity (indicating better quality). However, NGF (or ALMR and GNR with the extra control term removed) can fail to deliver a valid result (with negative \(\det \nabla \boldsymbol{y}\)) if the parameters are not chosen correctly. Although ALMR is competitive with GNR (and takes less time to converge in practice), only the convergence of GNR can be proved. Hence our model GNR is robust and can be recommended for multi-modal registration.

Fig. 6 Example 2 by the GNR without imposing a control term. Each column shows results for a different choice of \((\beta _{1},\beta _{2})\) balancing first- and second-order regularizers: the deformed template, overlay of \(T(\boldsymbol{\varphi })\) and R, and the transformation. Clearly the last column obtains an incorrect \(\boldsymbol{\varphi }\)

Fig. 7 Example 2 by the new model GNR. Using the control term, for each choice of \(\gamma \) (by column), the resulting transformation is diffeomorphic and the deformed template is also visually pleasing

Fig. 8 Example 2 by the ALMR model. The deformed template is also visually close to the reference R

6 Conclusions

Image registration is an increasingly important and often challenging image processing task, and the quality of the transformation requires suitable control. In this Chapter, to improve multi-modality registration models, we proposed a novel term motivated by the Beltrami coefficient, which leads to a diffeomorphic transformation. The advantage of the term is that it imposes no bias on the Jacobian determinant of the transformation. Employing the first-discretize-then-optimize approach, we designed an effective solver for the proposed model. Experimental tests confirm that the proposed model performs well in multi-modality image registration. In addition, with the help of the Beltrami control term, the proposed model is more robust with respect to the parameters. Future work will investigate extending this work to a deep learning framework [45].