
1 Introduction

Image registration is the process of aligning images by finding the correct spatial transformation between corresponding elements and structures in the images. In medical imaging applications, the registration of images acquired with different sensors or imaging protocols helps clinicians in diagnosis and computer-aided surgery by combining complementary information from different modalities [1]. Intensity variations originating from illumination changes, inhomogeneities, or simply different imaging techniques make the registration task more difficult.

To deal with this problem, a key issue is to define an appropriate similarity measure that is robust to such intensity variations. Traditionally, multi-modal registration is carried out by measuring statistical dependency with similarity measures such as mutual information (MI) [10], assuming a functional or statistical relationship between image intensities [1]. However, these measures can be problematic when the intensity relations are complex and spatially dependent [7]. Conditional mutual information (cMI) [9], contextual conditioned mutual information (CoCoMI) [12], and self-similarity weighted mutual information (\(\alpha \)-MI) [11] are later attempts to overcome this problem by integrating spatial and contextual information into the MI formulation, at the expense of higher computational time and complexity.

Structural information has been used in the multi-modal registration literature to improve the robustness of similarity measures to image intensity variations [3, 6, 8, 18]. Edge and intensity information was utilized in [8] to register visible and infra-red (IR) images. The dual-tree complex wavelet transform (DT-CWT) was employed in [3] to register IR and visible images in a multi-resolution approach. Complex phase order was used as a similarity measure for registering magnetic resonance (MR) to computed tomography (CT) images in [18]. A structural similarity measure relying on un-decimated wavelet transform coefficients was proposed in previous work on cross-modality label fusion [6].

Structural information has also been utilized recently to transform multi-modal registration into a mono-modal problem. Reducing the multi-modal problem to a mono-modal one allows the use of simple L1 or L2 distance metrics, which are computationally less expensive than statistical or structural similarity measures. Gradient intensity, ridges, and cross-correlated gradient directions are examples of structural representations built from the input images for registration [4]. A structural representation based on entropy images, followed by the sum of squared distances (SSD), was proposed in [16]. In our previous work, we proposed a method based on a combination of phase congruency and gradient information to form a structural representation of different MR modes [5].

In this paper, a registration method is proposed that converts the multi-modal problem into a mono-modal one through a new structural representation of multi-modal images. Structural features, which are invariant to image intensity, are obtained from a modified version of entropy images in a patch-based paradigm. A simple measure based on intensity differences is then used, leading to faster evaluation of image similarity and efficient optimization. In our experiments, the proposed structural representation is evaluated for registration, using simulated and real brain images of different modalities to assess the registration accuracy.

2 Methodology

The problem of registering two images \(I_m, I_f:\varOmega \longrightarrow \mathcal {I}\), as the moving and fixed image, defined on the grid \(\varOmega \) and the intensity values \(\mathcal {I} = \{1, \cdots , n\}\) is formulated as:

$$\begin{aligned} \hat{T} = \mathop {\mathrm{arg min}}\limits _T {D\big (I_f,T(I_m)\big )}, \end{aligned}$$
(1)

where T represents the spatial transformation and D stands for the dissimilarity (distance) measure used to evaluate the degree of alignment. For images represented with the same intensity values, the sum of absolute differences (SAD) or SSD can be a good choice of distance measure. Registration of images with complex intensity relationships requires more sophisticated similarity/dissimilarity measures; the correlation coefficient (CC), correlation ratio (CR), and MI are widely used in this case [1]. In this paper, we aim to find a new structural representation, R, of the different modalities and thereby reduce the multi-modal registration problem to a mono-modal one, so that a simple measure can effectively be employed to assess the degree of alignment. With the representation R, the registration problem stated in (1) is reformulated as

$$\begin{aligned} \hat{T} = \mathop {\mathrm{arg min}}\limits _T {D\big (R_f,T(R_m)\big )}, \end{aligned}$$
(2)

where \(R_f\) and \(R_m\) stand for the structural representation of images \(I_f\) and \(I_m\), respectively.

Fig. 1.

Applying a location-dependent weighting to differentiate patches with different structures but the same entropy: P1 and P2, which share the same structure but are encoded in two different intensity mappings, have the same entropy. P3 has a different structure but the same entropy, and is encoded with the same intensity mapping as P2. Applying a Gaussian kernel (Mask) to P2 and P3 results in WP2 and WP3, which have different entropy values.

Consider patches \(P_x\) defined on the local neighborhood \(N_x\) centered at x. To form the new representation, the idea is to extract the structural information of each patch based on the amount of information it contains. A bound on the patch information is given by Shannon's entropy, defined as

$$\begin{aligned} H\big (I(x)\big ) = - \sum _{x \in P_x} p(I=I(x))\log \big (p(I=I(x))\big ), \end{aligned}$$
(3)

where the random variable I takes the pixel intensity values in \(P_x\), with possible values in \(\mathcal {I}\), characterized by the patch histogram p. However, patches with different structures can end up with the same histogram and therefore the same entropy. Figure 1 shows how the entropy value fails to differentiate patches with different structures. In this figure, patches P1 and P2, which are encoded in two different intensity mappings but share the same structure, take the same entropy value. Patch P3, encoded with the same intensity mapping as P2, has a different structure than P1 and P2 but the same entropy value. Weighting the patch histogram based on spatial information can differentiate patches with the same information content. A Gaussian weighting kernel, defined as follows, is employed for this purpose:

$$\begin{aligned} G(x) = G_\sigma (\Vert x-x_0\Vert ), \end{aligned}$$
(4)

where G(x) is a Gaussian centered at \(x_0\) with width \(\sigma \). The entropy of the patch \(P_x\) is then modified to

$$\begin{aligned} \tilde{H}\big (I(P_x)\big ) = - \sum _{x \in P_x} G(x)\, p\big (I=I(x)\big )\log \big (p(I=I(x))\big ). \end{aligned}$$
(5)

Patches WP2 and WP3 in Fig. 1 illustrate how weighting two \(5\times 5\) patches that have the same entropy with a Gaussian mask helps to differentiate them.
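To make this concrete, the following sketch (our illustration, not the authors' code) contrasts the plain patch entropy of (3) with the Gaussian-weighted entropy of (5) on two patches that share a histogram but differ in structure; the kernel width \(\sigma = 1.5\) is an illustrative choice.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    # 2-D Gaussian mask centered on the patch, Eq. (4)
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def patch_entropy(patch, n_bins=64, G=None):
    # Per-pixel entropy sum of Eq. (3); supplying a mask G turns it
    # into the location-weighted entropy of Eq. (5).
    hist, edges = np.histogram(patch, bins=n_bins, range=(0.0, 1.0))
    p = hist / patch.size                          # patch histogram
    px = p[np.clip(np.digitize(patch, edges[1:-1]), 0, n_bins - 1)]
    if G is None:
        G = np.ones(patch.shape)                   # unweighted case
    return float(-np.sum(G * px * np.log(px + 1e-12)))

# Two 5x5 patches with identical histograms (10 ones, 15 zeros) but
# different structures: ones at the border vs. ones near the center.
border = np.zeros((5, 5)); border[:, 3:] = 1.0
center = np.zeros((5, 5)); center[:, 2:4] = 1.0
G = gaussian_kernel(5, sigma=1.5)
print(patch_entropy(border), patch_entropy(center))            # equal
print(patch_entropy(border, G=G), patch_entropy(center, G=G))  # differ
```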

Patch information is mainly concentrated on structures and edges, whereas smooth areas carry less information. Edges and structures mostly correspond to pixels with low probability values in the patch histogram, while smooth areas are represented by high probability values. To extract the structural information of a patch, we propose to focus on structures by highlighting the pixels with higher uncertainty while decreasing the contribution of the pixels located in smooth areas.

Let’s define

$$\begin{aligned} h(y) = -y\log (y) \end{aligned}$$
(6)

as the weighted pixel information, where \(y = p\big (I=I(x)\big )\) is the probability term used in the patch entropy of (5). In Fig. 2.a, h(y) is shown by the blue curve. Since y is drawn from the patch intensity histogram, smooth areas take large values of y, while edges and structures take small ones. To lessen the contribution of smooth areas and highlight edges and structures, one way is to map the probability values of the patch histogram through a function f such that \(f(y)>y\) for large values of y and \(f(y)<y\) for small ones. The weighted pixel information in (6) is therefore modified to

$$\begin{aligned} h(y) = -y\log (f(y)). \end{aligned}$$
(7)

The green curve in Fig. 2.a is the result of applying such a function to the patch histogram. As illustrated in the figure, applying f increases the contribution of pixels with lower probability and strongly attenuates the contribution of pixels in smooth areas, compared to the conventional entropy. Finally, the modified entropy with respect to \(P_x\) is defined as

$$\begin{aligned} \tilde{H}\big (I(P_x)\big ) = - \sum _{x \in P_x} G(x)\, p\big (I=I(x)\big )\log \Big (f\big (p(I=I(x))\big )\Big ), \end{aligned}$$
(8)

which is used as the new representation, R(x), for the pixel located at x.
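A minimal, unoptimized sketch of evaluating R(x) over a whole image is given below (our illustration; it reuses gaussian_kernel from the previous snippet and anticipates the quintic f of (11), while the \(7\times 7\) patch and 64 bins follow Sect. 3.1 and \(\sigma = 1.5\) is again our choice):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def f_poly(y):
    # histogram re-mapping function; the quintic of Eq. (11)
    return 6 * y ** 5 - 15 * y ** 4 + 10 * y ** 3

def structural_representation(img, patch=7, sigma=1.5, n_bins=64):
    # Modified patch entropy R(x) of Eq. (8), evaluated at every pixel.
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    pad = patch // 2
    padded = np.pad(img, pad, mode='reflect')
    windows = sliding_window_view(padded, (patch, patch))
    G = gaussian_kernel(patch, sigma)       # from the previous sketch
    R = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            w = windows[i, j]
            hist, edges = np.histogram(w, bins=n_bins, range=(0.0, 1.0))
            p = hist / w.size
            px = p[np.clip(np.digitize(w, edges[1:-1]), 0, n_bins - 1)]
            R[i, j] = -np.sum(G * px * np.log(f_poly(px) + 1e-12))
    return R
```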

Fig. 2.

Applying the function f to the patch histogram. (a) Weighted pixel information before and after applying f to the patch histogram. Applying f tilts the curve towards the vertical axis and strongly attenuates its value around \(y=1\), where the intensity probabilities are high. (b) The function f applied to the patch histogram, which has an almost linear behavior around the center and a smooth slope near the boundaries.

Recall that each pixel located at x contributes an information term of the form

$$\begin{aligned} H_x = -p(x)\log \big (p(x)\big ). \end{aligned}$$
(9)

Given these characteristics, f(.) should be a monotonically increasing function defined on the range [0, 1], with small derivatives at the two endpoints of this range and a linear behavior in the middle. A function f satisfying these characteristics can simply be chosen as an m-th order polynomial with a symmetry property:

$$\begin{aligned} f(y) = \sum _{i=0}^m a_i y^i. \end{aligned}$$
(10)

As an example of such a function, we chose a polynomial of order \(m=5\). The resulting polynomial, shown in Fig. 2.b, is:

$$\begin{aligned} f(y) = 6y^5-15y^4+10y^3. \end{aligned}$$
(11)
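A quick numerical check (ours) confirms that this quintic behaves as required: it increases on [0, 1], is flat at both endpoints, lies below the identity for \(y<0.5\) and above it for \(y>0.5\), and is symmetric about (0.5, 0.5).

```python
import numpy as np

def f_poly(y):
    # Eq. (11): f(y) = 6y^5 - 15y^4 + 10y^3
    return 6 * y ** 5 - 15 * y ** 4 + 10 * y ** 3

y = np.linspace(0.0, 1.0, 5)        # [0, 0.25, 0.5, 0.75, 1]
print(f_poly(y))                    # [0, ~0.104, 0.5, ~0.896, 1]
# derivative 30*y^2*(y-1)^2 vanishes at y = 0 and y = 1,
# and f(1-y) = 1 - f(y) gives the symmetry about (0.5, 0.5):
print(np.allclose(f_poly(1 - y), 1 - f_poly(y)))   # True
```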
Fig. 3.

Structural representation of different MR modes. The first row shows a slice of brain scans in T1, T2, and PD modes from the BrainWeb database. The second row shows the structural features associated with the first-row images.

The structural features are calculated by applying the proposed function f and the weighting kernel G. Figure 3 shows the structural representation of different MR modes for a slice of a brain scan from the simulated BrainWeb MR data [13]. As the figure indicates, the structural representation converts the multi-modal registration problem into a mono-modal one, so SSD can be used to measure the alignment accuracy:

$$\begin{aligned} D\big (R_f,T(R_m)\big ) = \sum _{x \in \varOmega }{\big |T(R_m)(x) - R_f(x)\big |^2}. \end{aligned}$$
(12)
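A translation-only sketch of the resulting mono-modal registration is shown below (our illustration, assuming the structural_representation helper from the earlier snippet; SciPy's Powell optimizer stands in for the optimizers used in Sect. 3, and a full rigid setup would also optimize a rotation angle).

```python
import numpy as np
from scipy import ndimage, optimize

def ssd(a, b):
    # Eq. (12): sum of squared differences between representations
    return float(np.sum((a - b) ** 2))

def register_translation(R_m, R_f):
    # Minimize the SSD between the structural representations of the
    # moving and fixed images over a 2-D shift, cf. Eq. (2).
    def cost(t):
        moved = ndimage.shift(R_m, t, order=1, mode='nearest')
        return ssd(moved, R_f)
    res = optimize.minimize(cost, x0=np.zeros(2), method='Powell')
    return res.x          # estimated (row, col) displacement

# usage: t_hat = register_translation(structural_representation(I_m),
#                                     structural_representation(I_f))
```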

3 Experimental Results

3.1 Experimental Setup

To evaluate the performance of the proposed method, experiments are conducted on the BrainWeb simulated database [13] and a real dataset from the Retrospective Image Registration Evaluation (RIRE) project [15], both of which provide ground-truth alignments. The BrainWeb database contains simulated MR brain scans in T1, T2, and PD modes with different levels of noise and intensity non-uniformity; in the following experiments, scans with \(3\,\%\) noise and \(20\,\%\) intensity non-uniformity are chosen. The real brain scans used from the RIRE dataset are in T1, T2, PD, and CT modes.

In the experiments, the registration accuracy is quantitatively assessed using the target registration error (TRE), which measures the average Euclidean distance between the pixel positions in the transformed image and their corresponding positions in the ground truth [2]:

$$\begin{aligned} TRE = \frac{1}{|\varOmega |} \sum _{i=1}^{|\varOmega |}{\Vert x_i - x^{\prime }_i\Vert }, \end{aligned}$$
(13)

where \(x_i\) and \(x^{\prime }_i\) are the positions of the i-th pixel in the ground truth and in the aligned image, respectively.
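A direct transcription of (13) for coordinate arrays reads as follows (our helper; the sample coordinates are hypothetical).

```python
import numpy as np

def tre(x_true, x_aligned):
    # Mean Euclidean distance between corresponding positions, Eq. (13);
    # inputs are (N, 2) arrays of pixel coordinates.
    return float(np.mean(np.linalg.norm(x_true - x_aligned, axis=1)))

x_true = np.array([[10.0, 12.0], [40.0, 33.0]])
x_est = np.array([[10.5, 12.0], [39.0, 34.0]])
print(tre(x_true, x_est))   # ~0.957
```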

The proposed method, denoted Reg in the following tables, is compared with MI-based registration (MI) [17] and SSD on entropy images (eSSD) [16]. Rigid registration is carried out with MATLAB tools, using a gradient descent optimizer for the SSD-based mono-modal registrations and a one-plus-one evolutionary optimizer for the MI-based multi-modal registration. Both rigid and deformable registration scenarios are considered in the evaluation. Deformable registration is performed with free-form deformation (FFD) based on cubic B-splines, using the Insight Segmentation and Registration Toolkit (ITK) [14]. In our simulations, the patch size and the number of histogram bins are empirically chosen as \(7\times 7\) pixels and 64 bins, respectively.

3.2 Rigid and Deformable Registration

For rigid registration, the proposed method is compared with MI and eSSD when the translation lies in the range \([-20,20]\) mm with \(0^{\circ }\) rotation, and when the rotation reaches at most \(\pm 20^{\circ }\) with zero translation. Table 1 reports the average results of 100 multi-modal rigid registrations over different rotations and translations, in terms of TRE in mm.

For deformable registration, a set of evaluation data was generated from the dataset using artificial deformations produced by the thin-plate spline (TPS). The deformation field is normalized such that the maximum displacement is limited to 15 mm. The results of deformable registration are given in Table 2 for different combinations of image modalities. As in Table 1, the proposed method is compared with the eSSD and MI-based registration results. The quantities in this table are obtained by averaging the results of aligning ten randomly deformed images to a fixed image.
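For readers who wish to reproduce such a setup, the sketch below generates a bounded synthetic deformation and warps an image with it; note that it uses Gaussian-smoothed random noise rather than the TPS deformations of the paper, with the field rescaled so that the maximum displacement stays within 15 mm.

```python
import numpy as np
from scipy import ndimage

def random_deformation(shape, max_disp=15.0, smooth=20.0, seed=0):
    # Smooth random displacement field, rescaled so the largest
    # displacement magnitude equals max_disp (in pixel units here).
    rng = np.random.default_rng(seed)
    d = rng.standard_normal((2,) + shape)
    d = np.stack([ndimage.gaussian_filter(c, smooth) for c in d])
    mag = np.sqrt((d ** 2).sum(axis=0)).max()
    return d * (max_disp / (mag + 1e-12))

def warp(img, disp):
    # Resample the image through the displacement field.
    grid = np.indices(img.shape).astype(float)
    return ndimage.map_coordinates(img, grid + disp, order=1,
                                   mode='nearest')
```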

As can be seen, the proposed method outperforms the eSSD and MI-based registration in most cases. Since the proposed method extracts structural features, and such features mainly lie on the rigid structures of the image, the improvement in alignment accuracy is more significant for rigid registration. For non-rigid registration, the method does not outperform eSSD in every case; the results are, however, still comparable.

Table 1. Multi-modal rigid registration (translation T and rotation R) for RIRE and BrainWeb datasets. Registration errors are represented in average pixel displacement.
Table 2. Multi-modal deformable registration for RIRE dataset. Registration errors are represented in average pixel displacement.

4 Conclusions

We proposed a method that introduces a structural representation for the purpose of registering multi-modal images. Unlike common multi-modal registration techniques that rely on sophisticated similarity measures, the new structural representation maps different intensity mappings to a common intensity space, so that a simple similarity measure can be employed to assess the alignment accuracy. The structural representation is generated in a patch-based framework by modifying the patch entropy. To validate the merit of the method, experiments were carried out on different brain image modalities. Based on the results presented in this paper, the proposed method improves the registration accuracy compared to the eSSD and conventional MI registration methods.