Introduction

In medical imaging, we often have chronological images of tissues (usually collected with different imaging modalities) that need to be aligned [1, 2]. Fusing the information in such corresponding images has been shown to provide useful insight to clinicians [3,4,5]. Even though registration based on manually selected homologous landmarks can be performed, the corresponding images often remain misaligned due to factors such as tissue deformation and errors in landmark selection. For example, in image-guided surgery, deformation of organs such as the brain can invalidate surgical planning [6,7,8,9].

Image registration is the process of aligning corresponding misaligned images acquired at different times and/or with different sensors [10]. Image registration methods can be categorized in various ways, for example as automatic or interactive [11]. Automatic image registration is generally faster and avoids erroneous user actions [12]. Another classification is based on the method used: intensity-based or feature-based. Intensity-based methods generally work better for smaller deformations, whereas feature-based methods generally work better when the initial misalignment is large [13, 14].

An automatic intensity-based image registration method consists of several components. One image is chosen as the template or fixed image (\(I_\mathrm{f}\)); the other is called the moving image (\(I_\mathrm{m}\)). During the registration process, \(I_\mathrm{m}\) moves to be registered to \(I_\mathrm{f}\). The movement of \(I_\mathrm{m}\) is restricted and modeled by a spatial transformation, whose type is selected based on the application [15,16,17]. When there is no deformation in the scene, we can simply use a rigid transformation, which has only six degrees of freedom [18,19,20]. When one image is deformed with respect to the other, we can use transformations with more parameters, for instance affine or free-form B-spline transformations [21,22,23]. The registration method also needs a similarity metric to evaluate the similarity of the two images after transformation. On one end of the spectrum, the similarity metric can assume a restrictive equality relationship between image intensities and simply subtract the two images, as in the sum of squared differences (SSD). On the other end, it can assume a general information-based relationship between the images, as in mutual information (MI) [24]. The correlation ratio (CR) assumes a functional relationship between the intensities of the two images and provides a compromise between these two extremes. The original CR proposed by Roche et al. [25, 26] was calculated globally over entire volumes, whereas the CR used in [27] is calculated in small local patches. More importantly, unlike the original CR, we bin the intensities of the reference image and calculate histograms using Parzen windows, which allows us to reliably calculate CR from small patches. The third component of a registration method is an optimizer that maximizes the similarity of the images by varying the parameters of the chosen transformation [25,26,27].

We propose an automatic intensity-based image registration method using this refined version of CR. The proposed method is an extended version of MARCEL, proposed in [28], which was itself based on RaPTOR (Robust PaTch-based cOrrelation Ratio) [29]. Our similarity metric measures the similarity of the images locally, over corresponding patches. We model the movement of \(I_\mathrm{m}\) with an affine transformation and utilize the covariance matrix adaptation evolution strategy (CMA-ES) [30] as the optimization approach. We applied our method to the RESECT (REtroSpective Evaluation of Cerebral Tumors) database [31] to validate the results. Recent work has successfully performed US–US registration of the RESECT database [32]. To show the performance of our method on different MRI and ultrasound data, we also applied it to the BITE database [33], which was collected using different ultrasound and MRI machines.

The contributions of this paper are threefold. First, ARENA uses CMA-ES, which does not need the gradient of the cost function; this gradient becomes noisy when the patch size (i.e., the number of samples) is small, so the optimization step in ARENA is less susceptible to patch size and noise. Second, we show for the first time that the US and MRI images of the RESECT database can be automatically registered. Third, although CMA-ES has been successfully used in registration of other imaging modalities [34,35,36,37], this paper is the first to use it for registration of MRI and US, and we show that it works even for patients with a very large initial misalignment between US and MRI.

The main difference between MARCEL [28] and ARENA is that MARCEL uses gradient descent optimization, while ARENA uses CMA-ES. Gradient descent optimization needs the derivative of the cost function and also requires tuning the step size, whereas CMA-ES does not need the gradient and further does not need tuning of the optimization parameters. In addition, MARCEL was only tested on 5 subjects in the RESECT database, whereas ARENA is tested on all 22 subjects. This further justifies the use of CMA-ES and its ability to converge to the correct solution in all tested cases. Consequently, ARENA has the following improvements compared to MARCEL. First, ARENA is less computationally expensive and is also more straightforward to implement due to the simplicity of the CMA-ES optimization method. Second, ARENA is less sensitive to parameter tuning compared to MARCEL and RaPTOR.

This paper is organized as follows. In section two, we elaborate on our method and derive the equations. In section three, qualitative and quantitative validations of the method are presented. In section four, we discuss the advantages and disadvantages of our method and avenues for future work. Finally, we provide a brief conclusion in section five.

Methods

Let \(I_\mathrm{m}\) and \(I_\mathrm{f}\) be, respectively, the moving and fixed images. In our registration problem, we fix \(I_\mathrm{f}\) and move \(I_\mathrm{m}\) so that it matches \(I_\mathrm{f}\). We transform \(I_\mathrm{m}\) with \(\mathbf T \). The optimal \(\mathbf T \), applied to \(I_\mathrm{m}\) at every point \(\mathbf x \) in the image space, gives the best alignment of \(I_\mathrm{f}\) and \(I_\mathrm{m}\). Alignment of \(I_\mathrm{f}\) and \(I_\mathrm{m}\) is measured by a dissimilarity metric D, and the best alignment under \(\mathbf T \) corresponds to the minimum achievable D. In other words, our goal is to minimize the following cost function:

$$\begin{aligned} C = D(I_\mathrm{f}(\mathbf x ),I_\mathrm{m}(\mathbf T (\mathbf x ))) + R(\mathbf T ) \end{aligned}$$
(1)

where \(R(\mathbf T )\) is a regularization term to enforce a smooth transformation and C is the cost function. Minimizing C by varying \(\mathbf T \) provides the transformation that aligns the fixed and moving images.

Dissimilarity metric

As expressed in Eq. 1, D measures the alignment of the input images, i.e., the fixed and moving images. Since CR is an asymmetric similarity metric, the order in which CR is computed matters. To allow either \(I_\mathrm{m}\) or \(I_\mathrm{f}\) to be the first or second image in CR, we label our input images as X and Y. D in Eqs. 1 and 2 is an amended version of RaPTOR [29] and varies from zero to one: if X and Y are identical, \(D=0\); if X and Y have no similarity, \(D=1\). Therefore, D is a dissimilarity metric. In Eq. 2, \(\eta \) is CR, the similarity metric proposed by Roche et al. [25]. The similarity metric needs to identify corresponding features of X and Y locally, so we calculate CR in \(N_p\) corresponding patches of X and Y.

$$\begin{aligned} D(Y,X)=\frac{1}{N_p}\sum _{i=1}^{N_p}(1-\eta (Y|X;\varvec{\varOmega _i})) \end{aligned}$$
(2)

where \(\varvec{\varOmega _i}\) represents the space of patch i. The definition of CR in Eq. 2 is as follows:

$$\begin{aligned}&1-\eta (Y|X) = \frac{1}{N\sigma ^2}\left( \sum _{t=1}^{N}i_t^2-\sum _{j=1}^{N_b}N_j\mu _j^2\right) \end{aligned}$$
(3)
$$\begin{aligned}&\mu _j=\frac{\sum _{t=1}^{N}\lambda _{t,j}i_t}{N_j},N_j=\sum _{t}\lambda _{t,j} \end{aligned}$$
(4)

where N is the total number of voxels in Y, \(\sigma ^2 = \mathrm{Var}[Y]\), \(i_t\) is the intensity of voxel t in Y, \(N_b\) is the total number of bins, and \(\lambda _{t,j}\) is the contribution of sample t to bin j, as proposed in [29].
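
To make Eqs. 2–4 concrete, the following is a minimal NumPy sketch of \(1-\eta (Y|X)\) for one patch pair. A simple linear soft-binning stands in for the paper's exact Parzen window [29], and the function names and bin count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def one_minus_cr(y, x, n_bins=32):
    # 1 - eta(Y|X) for a single patch pair, following Eqs. 3-4.
    # Linear soft-binning approximates the Parzen window of [29].
    y = y.ravel().astype(float)
    x = x.ravel().astype(float)
    n = y.size
    var_y = y.var()
    if var_y == 0.0:
        return 0.0  # featureless patch; such pairs are discarded anyway
    # lambda_{t,j}: each sample contributes to its two neighboring bins
    x01 = (x - x.min()) / (x.max() - x.min() + 1e-12)
    pos = x01 * (n_bins - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_bins - 1)
    lam = np.zeros((n, n_bins))
    lam[np.arange(n), lo] += 1.0 - (pos - lo)
    lam[np.arange(n), hi] += pos - lo
    n_j = lam.sum(axis=0)                                           # N_j (Eq. 4)
    mu_j = (lam * y[:, None]).sum(axis=0) / np.maximum(n_j, 1e-12)  # mu_j (Eq. 4)
    return (np.sum(y**2) - np.sum(n_j * mu_j**2)) / (n * var_y)     # Eq. 3

def dissimilarity(y_patches, x_patches):
    # D(Y, X): average of 1 - eta over corresponding patches (Eq. 2)
    return float(np.mean([one_minus_cr(y, x)
                          for y, x in zip(y_patches, x_patches)]))
```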

Clearly, in the calculation of D in Eq. 2, patches whose voxel intensities are approximately constant (i.e., have small variance) should be discarded because they contain no image features. Therefore, after selecting patches in X and Y, we apply a gamma correction similar to the one explained in [38] to increase the variance of the patch intensities. The gamma correction applies an empirical transformation, \(i_n=\exp (\gamma i_0)\), to the voxels of the patches, where \(i_n\) is the new intensity of the voxel, \(\gamma \) is the correction parameter, which we heuristically set to 50, and \(i_0\) is the old intensity of the voxel. We normalize the intensities of the patches right after the gamma correction. Then every pair of patches in which \(\sigma ^2 < T\) is discarded. Heuristically, we found \(T=1\) to be the best value.
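
As an illustration of this preprocessing, the sketch below applies the exponential correction \(i_n=\exp (\gamma i_0)\) with \(\gamma =50\) and discards low-variance pairs. It assumes intensities pre-scaled to [0, 1]; since the paper does not specify the normalization convention, min–max scaling is assumed, and the variance test is applied to the gamma-corrected (pre-normalization) intensities.

```python
import numpy as np

def prepare_patch_pair(patch_x, patch_y, gamma=50.0, T=1.0):
    # i_n = exp(gamma * i_0) with gamma = 50, as in the text; assumes the
    # inputs i_0 are pre-scaled to [0, 1] so exp() does not overflow.
    px = np.exp(gamma * patch_x.astype(float))
    py = np.exp(gamma * patch_y.astype(float))
    # Discard featureless pairs (sigma^2 < T, heuristic T = 1); here the test
    # is applied before normalization -- an assumption, since the text's
    # ordering depends on the unspecified normalization.
    if px.var() < T or py.var() < T:
        return None
    norm = lambda p: (p - p.min()) / (p.max() - p.min() + 1e-12)  # assumed min-max
    return norm(px), norm(py)
```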

Transformation

We used an affine transformation to model the movement of the moving image. In our formulation, no regularization is needed in Eq. 1, since the affine transformation has only 12 parameters to be optimized and the images provide many cues for reliably optimizing those parameters. The affine transformation matrix is defined as:

$$\begin{aligned} \varvec{T}= \begin{bmatrix} a_{1}&a_{2}&a_{3}&a_{4}\\ a_{5}&a_{6}&a_{7}&a_{8}\\ a_{9}&a_{10}&a_{11}&a_{12}\\ 0&0&0&1\\ \end{bmatrix} \end{aligned}$$
(5)

As one can see in Eq. 5, the affine transformation has twelve parameters, \(a_i,\, 1\le i \le 12\), which in general can be any real numbers.
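
A compact sketch of how such a transformation can be built from the twelve parameters and applied to points in homogeneous coordinates (the function names are illustrative):

```python
import numpy as np

def affine_matrix(a):
    # Build T of Eq. 5 from the parameter vector (a_1, ..., a_12)
    T = np.eye(4)
    T[:3, :] = np.asarray(a, dtype=float).reshape(3, 4)
    return T

def transform_points(T, pts):
    # Apply T to an (n, 3) array of points via homogeneous coordinates
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return (homogeneous @ T.T)[:, :3]
```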

Optimization

The formulation above defines the registration procedure as an optimization problem. Image registration, in general, is an ill-posed problem and consequently entails optimizing a highly non-convex objective function [39]. To tackle this problem, we deployed CMA-ES as our optimizer. In Eq. 1, C is the cost computed from the objective function D. The affine transformation parameters \(a_i,\, 1\le i \le 12\) in Eq. 5 are varied by the optimization algorithm to minimize C in Eq. 1.

Fig. 1 From the top row, sagittal views of Patients 5, 12, 19, and 21, respectively. Columns show before and after the registration, respectively. The arrows show where the images had improvements

CMA-ES is inspired by natural selection in biological evolution [40]. In each iteration (generation), \(\lambda \) new candidate solutions (offspring) \(x_k^{(g+1)},\, 1\le k \le \lambda \), are computed from the best \(\mu \) out of \(\lambda \) solutions of the previous generation (parents), \(x_{i:\lambda }^{(g)},\, 1\le i \le \mu \).

There are \(N=12\) degrees of freedom in the optimization, corresponding to the affine transformation parameters. Hence, the parameter settings for \(\lambda \) and \(\mu \) are \(\lambda =4+\lfloor 3\ln (N)\rfloor \) and \(\mu =\lfloor \lambda /2\rfloor \). The CMA-ES update equation from generation g to \(g+1\) is presented in Eq. 6.

$$\begin{aligned} x_k^{(g+1)}=\frac{1}{\sum _{i=1}^{\mu }w_i}\sum _{i=1}^{\mu }w_{i}x_{i:\lambda }^{(g)}+\sigma ^{(g)}\varvec{B}^{(g)}\varvec{D}^{(g)}z_k^{(g+1)} \end{aligned}$$
(6)

where \(w_i,\, 1\le i \le \mu \), are the recombination weights of the selected parents, calculated as in Eq. 7.

$$\begin{aligned} w_i=\ln \left( \frac{\lambda +1}{2}\right) -\ln (i) \end{aligned}$$
(7)

In Eq. 6, \(\sigma ^{(g)}\in \mathbb {R}^+\) is the step size at generation g. The so-called covariance matrix \(\varvec{C^{(g)}}\) at generation g is a symmetric positive definite \(N\times N\) matrix; its relationship with the parameters defined above is presented in Eq. 8:

$$\begin{aligned} \varvec{B}^{(g)}\varvec{D}^{(g)}z_k^{(g+1)}\sim \mathcal {N}(\varvec{0,C^{(g)}}) \end{aligned}$$
(8)

For detailed explanations and equations of \(\sigma ^{(g)}\), \(\varvec{B}^{(g)}\), \(\varvec{D}^{(g)}\), \(z_k^{(g+1)}\), and \(\varvec{C^{(g)}}\) one can refer to [40, 41].
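
The parameter settings stated above and the recombination step of Eq. 6 can be sketched as follows. The adaptation rules for \(\sigma ^{(g)}\), \(\varvec{B}^{(g)}\), and \(\varvec{D}^{(g)}\) are deliberately omitted, since they follow [40, 41]; this is an illustration of the update, not a full CMA-ES implementation.

```python
import numpy as np

N = 12                                      # degrees of freedom (affine parameters)
lam = 4 + int(np.floor(3 * np.log(N)))      # lambda = 4 + floor(3 ln N) = 11
mu = lam // 2                               # mu = floor(lambda / 2) = 5
w = np.log((lam + 1) / 2) - np.log(np.arange(1, mu + 1))  # weights of Eq. 7

def offspring(parents, sigma, B, D, rng):
    # One candidate solution per Eq. 6: weighted recombination of the mu best
    # parents (rows of `parents`, sorted by cost) plus a mutation step
    # sigma * B * D * z with z ~ N(0, I). Updates of sigma, B, D follow [40, 41].
    mean = (w[:, None] * parents).sum(axis=0) / w.sum()
    z = rng.standard_normal(N)
    return mean + sigma * (B @ (D @ z))
```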

Patient data

We applied the proposed image registration method to the RESECT database [31] and the BITE database [33]. The RESECT database is an open-source clinical database that contains 23 surgical cases of low-grade glioma resection operated at St. Olavs University Hospital. With the primary goal of helping develop image processing techniques for brain shift correction, the dataset provides, for each patient, preoperative T1w and T2-FLAIR MRI scans; intra-operative 3D ultrasound volumes obtained before, during, and after tumor resection; and corresponding anatomical landmarks between MRI–US pairs and US–US pairs. To demonstrate our proposed algorithm, we used the preoperative T2-FLAIR MRI and the US volume acquired before tumor resection, since this stage often sets the tone for the total brain shift after craniotomy. More specifically, 22 patients from the RESECT dataset were used, for each of which 15–16 pairs of MRI–US homologous landmarks were manually tagged. The BITE database consists of 14 patients with preoperative T1w MRI and pre-resection US. As one patient's landmarks were outside the image, we excluded that patient from our experiment (as is also done in other publications [29, 42, 43]).

Table 1 Pre- and post-registration mTRE values corresponding to the RESECT database

Registration procedure

For each patient, we first up-sampled the MRI image (\(\mathrm{resolution}=1\times 1\times 1 \,\mathrm{mm}^3\)) to the resolution of the corresponding US image because of the US image's considerably higher resolution (\(\mathrm{resolution}=0.24\times 0.24\times 0.24 \,\mathrm{mm}^3\)). We set the US as \(I_\mathrm{m}\) and the MRI as \(I_\mathrm{f}\), and then ran the image registration algorithm on each patient. For better performance, we used up to four levels of Gaussian pyramids to tackle the large misalignment present in some of the cases. We use \(\lfloor q/p \rfloor \) patches of size \(7\times 7\times 7\) voxels to calculate CR, where q is the number of nonzero voxels in the US image, p is the number of pixels in each US slice, and \(\lfloor \cdot \rfloor \) denotes the floor operator (rounding down to the nearest integer).
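
A sketch of this preprocessing, assuming SciPy's ndimage.zoom for trilinear resampling and US slices stacked along the first array axis (both assumptions; the paper does not specify its resampling routine):

```python
import numpy as np
from scipy import ndimage

def upsample_mri(mri, mri_spacing=(1.0, 1.0, 1.0), us_spacing=(0.24, 0.24, 0.24)):
    # Resample the MRI volume to the US voxel spacing (trilinear interpolation)
    factors = np.asarray(mri_spacing) / np.asarray(us_spacing)
    return ndimage.zoom(mri, factors, order=1)

def num_patches(us):
    # floor(q / p): q = count of nonzero US voxels, p = pixels per US slice
    q = np.count_nonzero(us)
    p = us.shape[1] * us.shape[2]   # assumes slices stacked along axis 0
    return q // p
```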

Fig. 2 Comparison of mTREs before and after the registration of patients in the RESECT database for SSC using linear transformation, \(\mathrm{LC}^2\) using rigid transformation, and ARENA. The minimum achievable mTRE by affine transformation is included as well

Results

Qualitative validation

We evaluated the quality of the registration by visually inspecting the images before and after registration. We compared the alignment of corresponding brain anatomical features, for instance sulci and tumor boundaries, in the MRI and US images before and after registration. Each patient's data include the brain tumor in the MRI and US images, so we also checked whether the alignment of the tumor boundary improved.

Figure 1 shows overlaid US and MRI sagittal slices for Patients 5, 12, 19, and 21 in the RESECT database [31]. Columns show before and after the registration, respectively, and each row corresponds to an individual patient. The arrows guide the reader to the improvements after registration. The first and second rows show significant improvements in the tumor and sulci regions. The second and third rows show improvements around the tumor region.

Table 2 Pre- and post-registration mTRE values corresponding to the BITE database
Fig. 3 Comparison of mTREs before and after the registration of patients in the BITE database for SSC using nonlinear transformation, \(\mathrm{LC}^2\) using rigid transformation, and ARENA. The minimum achievable mTRE by affine transformation is included as well

Quantitative validation

Corresponding homologous landmarks were selected manually by two experts in the US and MRI volumes of the RESECT database. Let N be the total number of corresponding landmarks in the US and MRI volumetric images. We used the provided landmarks to calculate the mean target registration error (mTRE) [44]. The mTRE for each patient is defined in Eq. 9:

$$\begin{aligned} \mathrm{mTRE} = \frac{1}{N}\sum _{i=1}^{N}\Vert \varvec{T}(\varvec{x}_i)-\varvec{x}^\prime _i\Vert \end{aligned}$$
(9)

where \(\varvec{T}\) is the optimal affine transformation derived by the image registration algorithm, and \(\varvec{x}_i\) and \(\varvec{x}^\prime _i\) are corresponding landmarks in the moving and fixed images, respectively.

The initial mTRE of each patient before registration and the number of landmarks for each patient are reported in Table 1. Each patient has N landmarks, and the affine transformation has twelve parameters. In this table, the minimum achievable mTRE is the minimum mTRE we can achieve using an affine transformation for the registration. We set up a system of linear equations to find this optimal affine transformation: the provided landmarks are known, and the twelve affine parameters are the unknowns. Since each landmark pair contributes three scalar equations, the system is overdetermined (\(3N > 12\)). We solved this overdetermined problem with least squares (LS) and report the LS solution for each patient in Table 1. It is worth mentioning that while the minimum achievable mTREs are calculated in a similar way to the fiducial registration error (FRE) [45], they are not equal to FRE: FRE is a root mean square error (RMSE), whereas we calculate the mean root square error (the mean of the Euclidean distances) so that the value can be compared to the initial and final mTREs calculated before and after registration, respectively.
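
For reference, this lower-bound computation can be sketched as a least-squares affine fit to the landmark pairs, followed by the mTRE of Eq. 9 (a minimal sketch; the function names are illustrative):

```python
import numpy as np

def mtre(T, x_moving, x_fixed):
    # Mean target registration error (Eq. 9); landmark arrays are (N, 3)
    homogeneous = np.hstack([x_moving, np.ones((len(x_moving), 1))])
    mapped = (homogeneous @ T.T)[:, :3]
    return np.linalg.norm(mapped - x_fixed, axis=1).mean()

def best_affine(x_moving, x_fixed):
    # Least-squares fit of the 12 affine parameters: each landmark pair
    # contributes three scalar equations, so the system is overdetermined
    A = np.hstack([x_moving, np.ones((len(x_moving), 1))])   # (N, 4)
    params, *_ = np.linalg.lstsq(A, x_fixed, rcond=None)     # (4, 3)
    T = np.eye(4)
    T[:3, :] = params.T
    return T
```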

Recently, the Correction of Brainshift with Intra-Operative Ultrasound (CuRIOUS) 2018 Challenge (curious2018.grand-challenge.org) was held in conjunction with the Medical Image Computing and Computer Assisted Intervention (MICCAI) 2018 Conference and addressed the same problem of registration of pre-resection US to MRI in the RESECT database. These papers used methods such as Multilayer Perceptron (MLP) [46], Demons [47], Linear Correlation of Linear Combination (\(\mathrm{LC}^2\)) [48], Spatial Transformer Network (STN) using 3D Convolutional Neural Network (CNN) [49], Self-Similarity Context descriptors (SSC) [50], Scale-Invariant Feature Transform (SIFT) [51], symmetric block matching using Normalized Cross-Correlation (NCC) [52], and \(\mathrm{LC}^2\) using Bound Optimization BY Quadratic Approximation (BOBYQA) [53]. ARENA is compared to the first [48] and second [50] place participants in this challenge.

ARENA improved alignment for every patient. As shown in Table 1, the initial mTRE has a rather high standard deviation, which our method reduced significantly. This can be interpreted as the ability of the method to improve a wide range of misaligned images, including those with high mTRE values. Figure 2 shows the data of Table 1 graphically.

Furthermore, the proposed method was applied to the BITE database as well (Table 2) and compared with SSC [43] and \(\mathrm{LC}^2\) [42]. Figure 3 shows the results of Table 2 graphically. Note that the SSC method utilized a nonlinear deformable transformation with \(10^7\) DOFs, which allows more complex deformations than an affine transformation with 12 DOFs.

The SSC method applied to the BITE database [43] uses different transformations and similarity metrics than the one applied to the RESECT database [50]. In [43], the authors utilized a complex transformation with \(10^7\) degrees of freedom (DOFs), whereas in [50], they applied linear and nonlinear methods to correct the brain shift. Likewise, the rigid registrations performed with \(\mathrm{LC}^2\) in [42] and [48] differ from each other. In [42], 2D US slices were registered to 3D MRI volumes using rigid registration as the initialization and cubic splines for the deformable registration, whereas in [48], the authors performed a 3D registration by initializing with a translation, then performing a rigid registration, and finally applying an affine transformation. In Tables 1 and 2, we report the results of the rigid initialization preceding the principal registration in [42, 48].

In addition to the validations above, we performed a statistical analysis of our results using the Wilcoxon rank sum test, a nonparametric method [54]. The null hypothesis \(H_0\) is that the method did not improve the mTRE; using the data in Table 1, this is \(\mu =5.40\). The alternative hypothesis \(H_1\) is \(\mu < 5.40\). Using the initial mTREs before registration and the results of ARENA in Table 1, the Wilcoxon rank sum test yields a p value of 0.0058. At the conventional significance level of \(\alpha =0.05\), \(p=0.0058\) means that we reject \(H_0\) in favor of \(H_1\); that is, with 99.42% confidence, the method improved the results. With the same setting, using the data in Table 2 and the pre- and post-registration mTREs for ARENA, a p value of 0.0483 \((p<0.05)\) was achieved.
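
For reproducibility, the test can be run as sketched below (assuming SciPy \(\ge \) 1.7 for the alternative keyword of ranksums; pre_mtre and post_mtre stand for the per-patient mTRE columns of Table 1, which are not reproduced here):

```python
from scipy.stats import ranksums

def improvement_p_value(pre_mtre, post_mtre):
    # One-sided Wilcoxon rank-sum test: H1 is that pre-registration mTREs are
    # stochastically larger than post-registration ones (i.e., improvement)
    _, p = ranksums(pre_mtre, post_mtre, alternative='greater')
    return p
```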

Discussion and future work

We showed the minimum achievable mTRE values with an affine transformation to provide a lower bound for the mTRE values; we did not use these values to optimize or improve ARENA. We achieved mTRE values very close to this minimum in some patients (e.g., Patient 24). However, the average minimum achievable mTRE is 0.93 mm, which is on the order of the inter-observer variability in landmark selection (below 0.5 mm [31]). Therefore, it is expected that our final mTRE values will be larger than this minimum achievable error.

In this work, we proposed using a simple affine transformation to correct for brain shift. Nevertheless, nonlinear transformations offer more flexibility and would allow us to recover the deformation more accurately. Before settling on the affine transformation, we tried simple translation, rigid transformation, and rigid transformation with scaling parameters, and we noticed that none of them was able to improve the mTRE for all patients. The affine transformation was the least general transformation model that gave us a significant improvement in mTRE; it is also simpler and faster than nonlinear transformations and practical in a wide range of applications.

CR and its derivatives RaPTOR and ARENA are asymmetric similarity metrics, meaning that reversing the order of the images changes the similarity value and likely the results. We set the US and MRI as the moving and fixed images, respectively, since this provided better results for ARENA. Since ARENA uses an affine transformation, the result can simply be inverted if clinicians prefer to deform the MRI to align it with the US.

Symmetric image registration methods are generally considered superior to asymmetric techniques [55, 56]. However, we found that the asymmetric method used in this paper is superior to a symmetric version of ARENA. In addition, swapping the roles of the fixed and moving images substantially degrades the results; one reason is that CR is an asymmetric similarity measure, and the order of the images substantially changes its value. Because the affine transformation is invertible, we can instead move the MRI image by inverting the transformation.

Image registration with an affine transformation performs well for structural images, but for functional data, such as tractography, nonlinear deformation is necessary to preserve the continuity of the tracts [57]. Except for the patients with large initial mTREs in the RESECT and BITE databases, ARENA performs close to SSC and \(\mathrm{LC}^2\). ARENA was utilized with exactly the same settings for both databases, which was not the case for either SSC or \(\mathrm{LC}^2\) [42, 43, 48, 50]. In addition, the \(\mathrm{LC}^2\) method is computationally expensive and was implemented on a GPU. The population size \(\lambda \) in the CMA-ES optimization is an important factor: a larger \(\lambda \) leads to better performance at the cost of more computation time.

Our CMA-ES implementation in MATLAB is not optimized and is relatively slow on conventional CPUs. More specifically, the optimization takes 2–5 min per hierarchical level. Nevertheless, it is fast enough for image-guided neurosurgery (IGNS) settings, where neurosurgeons generally spend about 10–20 min between the collection of US images and the resection of the tumor. As a next step, we plan to implement our method on the GPU to further accelerate the registration process. Finally, we aim to test our method on more datasets in different applications.

Conclusion

Herein, we presented ARENA, an affine registration method to align US and MRI volumetric images. We applied our method to the RESECT and BITE databases, validated it qualitatively and quantitatively, and compared it to two recently published registration methods. The qualitative results show that the registered images have improved alignment of salient image features. ARENA consistently improved the mTRE for all patients in both databases and is therefore a potentially promising registration method for use during IGNS.