Introduction

A common setup for image-guided neurosurgery (IGNS) displays a tracked pointer on a pre-operative 3D magnetic resonance (MR) image in order to expose internal regions of interest (e.g., tumors) and use the pre-operative image for guidance during the procedure. However, brain movements during open-skull operations as well as inaccurate image-to-patient registrations are known to reduce the utility of the images for guidance. Intra-operative imaging modalities have thus been proposed for improved guidance. Nonetheless, some modalities, such as intra-operative MR, involve a prohibitive cost and require major modifications to the operating room and surgical procedure. For such reasons, intra-operative ultrasound (iUS) continues to be a highly appealing option given its ease of use and relatively low cost. The iUS is typically tracked and superimposed on the pre-operative MR with the use of a navigation system, allowing the surgeon to assess a misalignment or deformation relative to the MR. Unfortunately, the comparatively poor image quality of iUS allows for the exposure of only a limited (yet important) set of coarse anatomical structures such as the brain tumor, the lateral ventricle boundaries, and the falx. For example, the accuracy of iUS for identifying the boundaries of brain tumors such as gliomas and metastases has been well documented in previous work [23]. However, many other important anatomical references are left undepicted, thus limiting the direct use of iUS for guidance purposes.

The approach we follow in this work consists of matching a tracked iUS to a pre-operative MR (pMR) image, thereby permitting the updating of the pMR during surgery. The clinician can therefore benefit from the soft-tissue imaging detail found in the MR volume, as well as any other preoperative images, while taking advantage of the practical ease of US acquisition and the physical reality represented by the iUS. The critical challenge thus lies in quickly and accurately registering the MR image to an iUS image, thereby minimizing the time required to obtain an updated MR image and allowing for seamless integration in the operating room.

Registration of MR to iUS faces multiple challenges that stem from the widely different image formation models of each modality. MR imaging can be largely characterized as a tissue-type based modality, where the image intensity of a given voxel is mainly a function of the tissue found within the voxel’s volume. On the other hand, US images illustrate different acoustic impedance transitions encountered by the ultrasonic wave. Figure 1a–d shows four different neurosurgical cases and illustrates the differences between the two modalities. Notice that the MR image allows for an accurate identification of multiple tissue types such as gray matter, white matter, and bone and hence also permits the localization of many anatomical structures in the brain (e.g., lateral ventricles, falx, sulci, gray and white matter surface, etc.). In contrast, the corresponding US image exposes the tumor tissue and its boundary (with some degree of uncertainty) and also depicts part of some key structures like the falx and the lateral ventricles.

Fig. 1

Pre-operative MR volume and the initial unregistered iUS volume in a brain tumor resection surgery, corresponding to cases 4, 6, 8, and 9. The first column shows the MR image in grayscale with the corresponding iUS overlapped and heat map colored. The second column shows the MR image. The third column shows the iUS. The top row shows a coronal view, the middle row shows a sagittal view, and the bottom row shows a transverse view. Anatomical structures found in the iUS are identified by a green arrow and label, while structures found in the MR are identified by a white arrow and label. a Case 4, b Case 6, c Case 8, d Case 9

In the particular context of brain tumors, we are also affected by the fact that pathologies from different cases can have quite unique image features in each modality. Figure 1a–c contrasts three cases with significantly different depictions in each modality. In particular, we can observe that the inner tumor tissue and the tumor boundaries in the iUS are exposed with quite distinctive image characteristics in each case. For example, Fig. 1c depicts tumor tissue with very high US intensity values, but does not allow for an accurate identification of its boundary. Alternatively, in Fig. 1b, the tumor tissue is depicted with a low US intensity and the boundary can be identified with increased certainty. Furthermore, Fig. 1b, c provides a prominent depiction of the lateral ventricles in the iUS, a highly informative anatomical structure for identifying a match across modalities. On the other hand, Fig. 1a provides a very weak depiction of the boundaries of the lateral ventricles. This kind of variability in exposing anatomical structures in iUS is a clear challenge for conventional multi-modal registration approaches that assume a global hard mapping between image features of one modality to image features of a second modality.

The registration of MR to US images has been previously addressed by various groups [1, 10, 12, 16, 25]. Some approaches [3, 20] rely on gradient magnitude as an image feature of interest in conjunction with a conventional multi-modal similarity metric such as mutual information (MI) or normalized cross-correlation (NCC). Other proposed techniques rely on local phase [14, 25] as a feature in conjunction with MI. A major challenge encountered by such approaches is that the image intensity response found in US is significantly non-homogeneous. Consider the coronal view found in Fig. 1d, where we can clearly observe how the US intensity decays with distance from the probe. In particular, notice that the US pixel intensities corresponding to white matter are far from consistent and will likely result in a degradation of registration performance when using a similarity metric that involves the full image domain. There have also been approaches [10, 12] that propose a preprocessing stage in which imaging artifacts (e.g., speckle, noise) are reduced, and subsequently register the preprocessed images with a multi-modal metric like normalized mutual information (NMI) evaluated over a sampling mask that typically covers the tumor volume and part of its surrounding region. Previously, our group has proposed [1, 16] the generation of a pseudo-US from segmented structures in the MR and then registering the pseudo-US with the acquired iUS. Similarly, in [24], the registration of US to CT is addressed through a simulated US obtained from the CT. This technique makes use of a hard mapping of the CT intensity to the tissue’s echogenicity, which relies on a functional relationship between those two values. Unfortunately, the strong presence of speckle and the wide variability across IGNS cases do not allow for a straightforward implementation of such a strategy in this highly variable context.

In this paper, we propose a new MR to US registration framework, with the goal of providing substantially improved robustness and computational performance in the context of IGNS. Registration is based on gradient orientation alignment, motivated by the fact that gradient orientations are considered to characterize the underlying anatomical boundaries found in the scene and are more robust to the non-homogeneous intensity response found in US. However, orientation estimates can be noisy in this context, and using the full set of orientations would lead to undue computational cost and increase the risk of errors. In order to address such limitations, our technique [5, 6] first selects locations whose gradient orientations have high certainty (i.e., minimal noise) and likely correspond to structures of interest. Once such locations have been identified, we maximize their alignment with the corresponding orientations. This is an important distinction from other approaches based on gradient orientations [2, 7, 9, 11, 14, 18, 19]: we restrict the evaluation of the similarity metric to a small set of locations of interest with low uncertainty gradient orientations, which allows for improved robustness and computational performance. Our approach is thus asymmetric in the sense that the fixed image (US) plays a different role than the moving image (MR), allowing the registration task to focus on the alignment of boundaries that appear in the US and likely have a counterpart in the MR image, thereby improving registration robustness against the variability in iUS found throughout the cases evaluated.

Experimental findings show that our approach brings forward gains in computational performance and registration accuracy, as evaluated over fourteen clinical cases obtained from the publicly available MNI BITE dataset. In particular, we can achieve a robust performance, in the sense that all fourteen cases used for validation have a resulting mean distance between corresponding points that is larger than the smallest possible mean distance (under a rigid transformation) by no more than 1 mm. Furthermore, such performance is achieved with a highly reduced subset of voxels (e.g., 2 % of the image) and a GPU-based implementation, which leads to an average processing time of 0.76 s. This achievement should permit the strategy to be easily embedded in the clinical IGNS system, minimizing the delay suffered every time an updated MR is required.

Validation of registration performance is measured with a significant number of clinical cases, which we argue provides a much more informative indication of real registration accuracy and robustness in comparison with controlled brain phantom-based setups. However, the lack of a gold standard does involve some limitations in the validation strategy. In particular, given the limited number of homologous landmarks, the uncertainty in identifying the precise location of such points, and the potential disagreement between experts, there is some ambiguity as to how accurate the performance metric is. In this work, we provide an analysis of the landmarks identified by each expert in order to demonstrate the degree of agreement between experts, the potential need for a non-rigid registration and to illustrate the challenges of an accurate registration validation based on real clinical cases. Given this ambiguity in measuring registration accuracy, we choose to evaluate the performance of a rigid registration only, and we demonstrate that even though a non-rigid registration may be required, a rigid registration can characterize most of the deformation encountered.

Clinical data

We make use of fourteen clinical neurosurgical cases obtained from the Montreal Neurological Institute’s Brain Images of Tumors for Evaluation (MNI BITE) [15], an open access online dataset of clinical MR and US images of brain tumors. In particular, we evaluate the registration of pre-operative MR images to intra-operative US images obtained prior to tumor resection, identified as Group 2 of the MNI BITE dataset. The cases involve low- and high-grade gliomas (LGG and HGG, respectively), at different depths and locations in the brain and with tumor volumes ranging between 0.2 and 79.2 cm\(^3\). The initial location of each case corresponds to a preliminary registration involving the manual identification of corresponding points on the skin and the MR image, as is common in standard clinical procedures.

All pre-operative images used consist of T1-weighted gadolinium-enhanced MR images. All cases, except case 8, were acquired with an axial SPGR sequence. The pre-operative MR image of case 8 was acquired with a 3D MPRAGE acquisition. Two-dimensional US images were acquired on the dura with an HDI 5000 (ATL/Philips, Bothell, WA, USA) machine using a P7-4 phased array transducer at depths of 6.5 and 8 cm. The US probe was tracked with reflective spheres and the 2D images were captured using a Pinnacle PCTV frame-grabbing card. Each acquisition includes between 200 and 600 frames. A 3D US volume with a voxel spacing of 1.0 \(\times \) 1.0 \(\times \) 1.0 mm is then reconstructed with a distance-weighted pixel-based method [22]. Note that the reconstructed volume used is different from the one found in the MNI BITE dataset. In particular, we adopt a coarser voxel spacing, which reduces the number of holes found in the volume and tends to decrease the presence of noise. Reconstructing a 3D US volume currently represents a significant delay, with an average processing time of 90 s, largely due to a lack of software optimization. We note that an implementation of 3D US reconstruction performed in real time was recently demonstrated in [4].

Gradient orientation-based registration

In this section, we describe the proposed registration approach [5, 6], characterized by three major components: (1) a local similarity metric based on gradient orientation alignment, (2) a multi-scale selection strategy that identifies locations of interest with gradient orientations of low uncertainty, and (3) a computationally efficient technique for evaluating gradient orientations of the transformed moving image. The registration pipeline consists of two stages. The first is a pre-processing stage that involves the computation of image derivatives of both volumes and the identification of locations with low uncertainty gradient orientations. The second stage consists of an optimization strategy which maximizes the average value of the local similarity metric evaluated on the locations of interest. Optimization is performed with a covariance matrix adaptation evolution strategy (CMA-ES) [8], a non-gradient-based optimization strategy.

Local similarity measure

The similarity metric employed evaluates the alignment of corresponding gradient orientations as:

$$\begin{aligned} S(\nabla I_F, \nabla I_M) = \cos ^2\left( \varDelta \theta \right) \end{aligned}$$
(1)

where \(\varDelta \theta \) is the inner angle between corresponding image gradients, \(\nabla I_F\) and \(\nabla I_M\). Note that the fixed image, \(I_F\), is set to be the reconstructed 3D US volume, while the moving image, \(I_M\), is set to be the pre-operative MR image.
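As a minimal sketch of Eq. (1), the squared cosine can be computed from the normalized dot product of the two gradient vectors, without evaluating the angle explicitly. The function names `goa_similarity` and `mean_similarity` are hypothetical, as is the choice of returning zero when either gradient vanishes (the orientation is undefined there). Because the cosine is squared, the value is unchanged when either gradient flips sign.

```python
import math

def goa_similarity(grad_fixed, grad_moving, eps=1e-12):
    """Squared cosine of the angle between two gradient vectors (Eq. 1)."""
    dot = sum(f * m for f, m in zip(grad_fixed, grad_moving))
    norm_f = math.sqrt(sum(f * f for f in grad_fixed))
    norm_m = math.sqrt(sum(m * m for m in grad_moving))
    if norm_f < eps or norm_m < eps:
        # Assumption of this sketch: an undefined orientation contributes nothing.
        return 0.0
    c = dot / (norm_f * norm_m)
    return c * c

def mean_similarity(grads_fixed, grads_moving):
    """Average of the local metric over paired gradient vectors."""
    values = [goa_similarity(f, m) for f, m in zip(grads_fixed, grads_moving)]
    return sum(values) / len(values)

# Parallel and anti-parallel gradients both score 1; orthogonal gradients score 0.
print(goa_similarity((1.0, 0.0, 0.0), (-2.0, 0.0, 0.0)))
print(goa_similarity((1.0, 0.0, 0.0), (0.0, 3.0, 0.0)))
```

The averaged value is the quantity that an optimizer would maximize over the transformation parameters.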

In order to improve registration robustness and computational performance, we restrict the evaluation of the similarity metric to a small set of locations with low uncertainty gradient orientations. In the following section, we describe the principles behind the selection strategy employed.

Gradient orientations from a noisy image

The main motivation behind the use of gradient orientations as an image feature of interest lies in the fact that they are directly related to the direction of anatomical boundaries. Thus, it is of critical importance to give greater importance to locations whose orientations correspond to tissue transitions and disregard all locations whose gradient orientations are brought forward by image artifacts or noise.

In this work, we adopt a selection strategy that identifies locations of interest whose corresponding gradient orientations are considered reliable or of high certainty, and we then restrict the evaluation of the local similarity metric to such locations. It is important to note that such locations are identified on the fixed image and there is no selection made on the moving image. This asymmetrical nature of this approach is very well-suited for this multi-modal context. In particular, we note that for the cases found in the MNI BITE dataset, most of the strong tissue-type transitions exposed by the fixed image (US) can also be found in the moving image (MR), while the converse is typically not true. For such purposes, we demonstrate an approach for obtaining a reliable indicator of the certainty of a given gradient orientation and consequently select a reduced set of locations with high certainty gradient orientations.

In this work, we consider the main source of uncertainty in gradient orientations to be additive Gaussian noise on voxel intensities from the reconstructed US volume. It is important to state that commonly used US noise models are typically more complex. However, they are generally defined with relation to the raw “envelope” (i.e., before log-compression) US intensity or on the log-compressed US intensity (i.e., US intensity displayed on screen). For example, a proposed noise model for log-compressed US intensity, commonly known as the Loupas model [13], characterizes the noise as additive Gaussian with a location-dependent variance proportional to the original (i.e., undegraded) intensity value. We highlight the fact that intensities found in the reconstructed US volume are obtained by a linear combination (i.e., weighted mean) of a large number of pixel intensities from log-compressed US slices and thus involve a different noise model from the one in log-compressed US.

Our simplifying assumption is to characterize such noise as an additive Gaussian model,

$$\begin{aligned} I[i] = F[i] + \varepsilon [i] \end{aligned}$$
(2)

where \(F\) is the undegraded (i.e., noiseless) image, \(i\) is a voxel index, and \(\varepsilon [i]\) is an i.i.d. Gaussian random variable with variance \(\sigma ^2\).

The corresponding probability density of a voxel’s intensity is expressed as,

$$\begin{aligned} p(I[i]~|~F[i]) = \frac{1}{\sqrt{2\pi \sigma ^2}} \exp \Big (-\frac{(I[i] -F[i])^2}{2\sigma ^2}\Big ) \end{aligned}$$
(3)

If we also consider that the image gradient at a particular location is obtained by convolution with linear operators that act solely on the axis (dimension) of interest, we can derive the posterior probability of \(\phi \), the gradient orientation of the undegraded image \(F\), given \(m\) and \(\theta \), the observed gradient magnitude and orientation of \(I\). The resulting expression is

$$\begin{aligned} p(\phi ~|~m,~\theta ) = \frac{e^{-\frac{m^2 \sin ^2\varDelta }{2||\mathbf{K}||^2\sigma ^2}} \cdot \varPhi (\frac{m \cos \varDelta }{|\mathbf{K}|\sigma }) }{\pi \sqrt{\sigma |\mathbf{K}|} \cdot I_0(\frac{m^2}{4||\mathbf{K}||^2\sigma ^2}) \cdot e^{-\frac{m^2}{||\mathbf{K}||^2\sigma ^2}}} \end{aligned}$$
(4)

where \(\varDelta =\phi -\theta \), \(\varPhi ()\) is the cumulative distribution function of a Gaussian random variable, and \(\mathbf{K}\) is the discrete kernel characterizing the derivative linear operator.

The posterior density mainly expresses a unimodal directional density whose variance is a monotonically decreasing function of \(\frac{m}{|\mathbf{K}|}\). Thus, \(\frac{m}{|\mathbf{K}|}\) serves as an indicator of the precision (i.e., inverse of variance) of a given gradient orientation, which we can use as a cue for identifying locations of interest.
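The relationship between gradient magnitude and orientation precision can be illustrated with a small Monte Carlo sketch. It does not evaluate the posterior of Eq. (4). Instead it takes the complementary forward view: a known 2D gradient of magnitude `magnitude` is perturbed by independent Gaussian noise on each component, and the spread of the observed orientation is measured. The 2D setting, the function name `orientation_spread`, and the parameter `noise_std` (which plays the role of \(|\mathbf{K}|\sigma \)) are assumptions made for illustration.

```python
import math
import random

def orientation_spread(magnitude, noise_std, n_trials=20000, seed=0):
    """RMS angular deviation (radians) of a noisy 2D gradient from its true orientation.

    The true gradient is (magnitude, 0), so its orientation is 0 rad.
    """
    rng = random.Random(seed)
    sum_sq = 0.0
    for _ in range(n_trials):
        gx = magnitude + rng.gauss(0.0, noise_std)  # noisy component along the true direction
        gy = rng.gauss(0.0, noise_std)              # noisy component across it
        sum_sq += math.atan2(gy, gx) ** 2
    return math.sqrt(sum_sq / n_trials)

# Stronger gradients (relative to the noise level) give tighter orientation estimates.
for m in (1.0, 3.0, 10.0):
    print(m, orientation_spread(m, noise_std=1.0))
```

Under these assumptions the spread shrinks as the ratio of magnitude to noise level grows, which is the behavior the selection strategy relies on.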

In this work, we automatically identify a gradient orientation selection threshold which corresponds to the 80th percentile of gradient magnitude (i.e., the threshold above which the top 20 % of locations with the highest gradient magnitude are found) for each particular case. To allow for reduced processing times, we also consider a random selection of a reduced number (e.g., 8,000) of voxels found within such a mask.
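The selection step described above can be sketched as follows, operating on a flat list of gradient magnitudes from the fixed (US) volume. The function name `select_locations`, the nearest-rank percentile definition, and the fixed random seed are assumptions of this sketch rather than details taken from the original implementation.

```python
import math
import random

def select_locations(grad_magnitudes, percentile=80.0, n_samples=8000, seed=0):
    """Return (threshold, indices) of voxels above the percentile, randomly subsampled."""
    if not grad_magnitudes:
        raise ValueError("grad_magnitudes must not be empty")
    ordered = sorted(grad_magnitudes)
    # Nearest-rank percentile (one of several common definitions).
    rank = math.ceil(percentile * len(ordered) / 100.0) - 1
    rank = max(0, min(len(ordered) - 1, rank))
    threshold = ordered[rank]
    # Mask of locations whose gradient magnitude lies strictly above the threshold.
    mask = [i for i, m in enumerate(grad_magnitudes) if m > threshold]
    # Random subset of the mask, drawn without replacement when the mask is large enough.
    if len(mask) > n_samples:
        mask = random.Random(seed).sample(mask, n_samples)
    return threshold, sorted(mask)

# Toy example: 100 magnitudes, of which the top 20 are kept and 5 are sampled.
magnitudes = [float(v) for v in range(1, 101)]
threshold, indices = select_locations(magnitudes, n_samples=5)
print(threshold, indices)
```

Only the returned indices would then be visited when evaluating the similarity metric, which is where the reduction in processing time comes from.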

Evaluation of transformed moving image gradients

Instead of adopting a straightforward approach for evaluating the gradient of the transformed moving image, in which pixel intensities are evaluated first by interpolation and the image derivatives are consequently computed by convolution, we adopt a more computationally efficient approach [6] which simply maps the gradients computed (once) by convolution on the original moving image.

The linear mapping of gradients can be easily found by expanding the expression for the derivative of a transformed moving image. Consider a D-dimensional moving image, \(I_M\), whose coordinate space is inversely mapped by a continuous transformation function, \(\mathbf{T}\), to the fixed image coordinate space. In other words, a location, \(\mathbf{x} = (x_1, \ldots , x_D)\), in the fixed image coordinate space corresponds to location \(\mathbf{T}(\mathbf{x}) = (T_1, \ldots , T_D)\) in the coordinate space of the original moving image. The derivative of the moving image with respect to a particular dimension, \(x_j\), of the fixed image coordinate space is expressed as,

$$\begin{aligned} \left. \frac{\partial I_m }{ \partial x_j} \right| _{\mathbf{T}(\mathbf{x})} = \sum _{i=1}^D \left. \frac{\partial I_m}{\partial T_i}\right| _{\mathbf{T}(\mathbf{x})} \cdot \left. \frac{\partial T_i}{ \partial x_j}\right| _{\mathbf{x}} \end{aligned}$$
(5)

where the term \(\frac{\partial T_i}{ \partial x_j}\) corresponds to the \((i,j)\)-th component of the spatial Jacobian matrix of the transformation function.

$$\begin{aligned} J_{\mathbf{T}} = \left[ \begin{array}{ccc} \frac{\partial T_1}{\partial x_1} &{} \cdots &{} \frac{\partial T_1}{\partial x_D} \\ \vdots &{}\quad \ddots &{}\quad \vdots \\ \frac{\partial T_D}{\partial x_1} &{}\quad \cdots &{}\quad \frac{\partial T_D}{\partial x_D} \end{array} \right] \end{aligned}$$
(6)

Re-arranging terms we obtain the expression for the transformed image gradient,

$$\begin{aligned} \nabla _{\mathbf{x}}I_m\big ( \mathbf{T}(\mathbf{x} ) \big ) = J^T_{\mathbf{T}}(\mathbf{x}) \cdot \nabla _{\mathbf{T}} I_m\big (\mathbf{T}(\mathbf{x})\big ) \end{aligned}$$
(7)

where \(\nabla _{\mathbf{T}} I_m\big (\mathbf{T}(\mathbf{x})\big ) = \Big ( \frac{\partial I_m(\mathbf{T}(\mathbf{x}))}{\partial T_1}, \ldots ,\frac{\partial I_m(\mathbf{T}(\mathbf{x}))}{\partial T_D} \Big )\) is the gradient of the original moving image.

In other words, the gradient of the transformed moving image can be evaluated as the product of the transpose of the spatial Jacobian matrix and the gradient of the original (undeformed) moving image. Hence, the derivatives of both images are computed only once and the computational complexity of the method is significantly reduced.

It is of interest to note that in the case of a rigid transformation, the spatial Jacobian matrix is the same for all locations in the domain and simply reflects the rotational component of the transformation.
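For the rigid case, Eq. (7) reduces to multiplying a precomputed moving-image gradient by the transpose of the rotation matrix. The sketch below illustrates this for a rotation about the third axis; the helper names `rotation_z` and `transformed_gradient` are hypothetical, and a real pipeline would first interpolate the precomputed gradient at \(\mathbf{T}(\mathbf{x})\), a step omitted here.

```python
import math

def rotation_z(angle_rad):
    """3x3 rotation about the third axis; for T(x) = R x + t this is the spatial Jacobian."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def transformed_gradient(jacobian, grad_original):
    """Eq. (7): transpose of the Jacobian times the gradient of the original moving image."""
    d = len(grad_original)
    # Component j of J^T g is the sum over i of J[i][j] * g[i].
    return [sum(jacobian[i][j] * grad_original[i] for i in range(d)) for j in range(d)]

# Moving image I(y) = y_1 has gradient (1, 0, 0) everywhere. With a 90 degree
# rotation, I(T(x)) = -x_2, so the mapped gradient should be close to (0, -1, 0).
R = rotation_z(math.pi / 2)
print(transformed_gradient(R, [1.0, 0.0, 0.0]))
```

Since the Jacobian of a rigid transformation is constant over the domain, it only needs to be formed once per candidate transformation and can be reused for every selected location.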

Alternative techniques for comparison

We evaluated the registration performance of a variety of intensity-based registration techniques with and without relevant pre-processing stages. The first set of experiments involves conventional multi-modal intensity-based approaches such as the maximization of normalized cross-correlation (NCC), the maximization of mutual information (MI), and the maximization of normalized mutual information (NMI). Since the optimization strategy can potentially play a critical role in the registration performance, we evaluated the registration with the same non-gradient-based optimization strategy employed in our technique, as well as a gradient descent (GD) strategy with an adaptive gain.

The second set of experiments involves the use of gradient magnitude images as input images instead of the original volumes, employing the same intensity-based similarity metrics from the first set of experiments. Finally, we also evaluate the performance of an approach largely based on the registration pipeline proposed in [10]. This technique relies on a filtering pre-processing stage which aims to reduce the presence of speckle, noise, and other image artifacts in both modalities and thereby improve registration robustness. In particular, the US volume is first Gaussian blurred to reduce the effect of speckle. Then, high-intensity regions in the US are automatically identified as those whose intensity values lie above the Otsu threshold [17]. The mask is then dilated by a few voxels so as to allow for an increased registration aperture range. The MR image is similarly processed with a median filter, which tends to increase the intensity homogeneity in regions with a common tissue type. Finally, the processed images are registered by maximization of NMI.

Registration validation

Given the lack of a gold standard, we measure the registration accuracy of each method as the mean distance between homologous landmarks independently identified by two or three experts, commonly referred to as the mean target registration error (mTRE). Each case has between 19 and 40 landmarks in total. It is of critical importance to note that the minimal mTRE under a rigid transformation has a unique non-zero value for each case, both for each expert’s landmarks and for the combined set of all experts’ landmarks. There are two main reasons behind this phenomenon. The first is the inherent uncertainty from the experts in accurately identifying anatomical locations in both modalities (particularly in the US volume). Thus, large errors in the identification of landmarks reduce the accuracy of the performance metric (i.e., mTRE) and also result in a potentially false large value for the minimal mTRE under a rigid transformation. The second is the potential presence of non-rigid deformations. In particular, in the case of perfectly accurate landmarks, a non-zero minimal mTRE effectively quantifies the “residual” part of the deformation that is not fully explained by a rigid transformation. Hence, a large minimal mTRE can reflect the presence of errors in the identified landmarks and/or significant non-rigid components in the true deformation. For the purposes of illustrating the variability of landmark identification between experts and the potential need for a non-rigid registration, we provide a quantitative analysis of the chosen points.

Analysis of expert landmarks

For each expert’s landmarks, we evaluate the initial mTRE of each case. We also report the mTRE evaluated with the optimal rigid transformation. The optimal rigid transformation given a set of landmarks is obtained analytically by solving the corresponding Orthogonal Procrustes problem [21]. Note that the mTRE evaluated with the optimal rigid transformation corresponds to the minimal mTRE that can be obtained with a rigid transformation given that set of landmark points. In order to analyze the agreement between different experts, we also evaluate the mTRE of a given expert’s landmarks with respect to the optimal rigid transformation obtained with another expert’s landmarks. Such a measure is particularly informative in cases that do not seem to require a non-rigid transformation.
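The landmark analysis described above can be sketched with NumPy as follows. The rigid fit uses the standard SVD-based solution of the orthogonal Procrustes problem with a reflection guard (often called the Kabsch solution), which is an assumption about the exact variant used; note that this solution minimizes the sum of squared landmark distances. The function names `optimal_rigid` and `mtre` are hypothetical.

```python
import numpy as np

def optimal_rigid(moving_pts, fixed_pts):
    """Least-squares rotation R and translation t such that R @ p + t approximates q."""
    P = np.asarray(moving_pts, dtype=float)
    Q = np.asarray(fixed_pts, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)            # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])           # guard against an improper rotation (reflection)
    R = Vt.T @ D @ U.T
    t = q0 - R @ p0
    return R, t

def mtre(moving_pts, fixed_pts, R=None, t=None):
    """Mean Euclidean distance between transformed moving landmarks and fixed landmarks."""
    P = np.asarray(moving_pts, dtype=float)
    Q = np.asarray(fixed_pts, dtype=float)
    R = np.eye(3) if R is None else R
    t = np.zeros(3) if t is None else t
    return float(np.mean(np.linalg.norm(P @ R.T + t - Q, axis=1)))

# Synthetic landmarks related by a known rigid motion (not data from the paper).
P = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
Q = P @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = optimal_rigid(P, Q)
print(mtre(P, Q), mtre(P, Q, R, t))      # initial mTRE, then mTRE after the rigid fit
```

Cross-expert agreement can be examined in the same way, by fitting `optimal_rigid` on one expert's landmarks and passing the resulting `R` and `t` to `mtre` with another expert's landmarks.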

Table 1 lists the results from such analysis. For example, Table 1a evaluates the mTRE based on the landmarks identified by Expert 1. The column labeled Initial lists the mTRE values computed at the initial (unregistered) location of each case. The column labeled Expert 1 Solution lists the mTRE values computed with the optimal rigid transformation (analytically) obtained from Expert 1’s landmarks. Hence, the column labeled Expert 1 Solution in Table 1a also lists the minimal possible mTRE. We identify all values of the minimal mTRE larger than 2 mm (highlighted in bold) as cases with potential need for a non-rigid model. Notice that between Expert 1 and 3, only Expert 1 yields one case with a minimal mTRE larger than 2 mm. In direct contrast, Expert 2 yields six cases with a minimal mTRE larger than 2 mm. We therefore conclude that for most of the evaluated cases and their corresponding landmarks, a rigid transformation can characterize most of the deformation encountered.

Table 1 Analysis of homologous landmarks identified by each expert

In Table 1, all mTRE values larger than 3 mm and evaluated with respect to a different expert’s solution are identified as cases with significant disagreement between experts (highlighted in italic). The most striking case of disagreement is found in Case 1, where the three experts have minimal mTRE values of 1.23, 1.23, and 1.35 mm, respectively. Thus, there seems to be no strong need for a non-rigid transformation. However, the rigid solution obtained with the landmarks from Expert 2 results in an mTRE value of 11.55 mm with Expert 1’s landmarks. In contrast, the rigid solution obtained with the landmarks from Expert 3 results in an mTRE value of 2.11 mm with Expert 1’s landmarks. Additionally, the rigid transformation from Expert 2 results in an mTRE of 10.69 mm when evaluated with the landmarks chosen by Expert 3. In contrast, the rigid transformation from Expert 1 results in an mTRE of 2.66 mm when evaluated with Expert 3’s landmarks. Hence, we can state that Experts 1 and 3 seem to somewhat agree on the deformation encountered in Case 1, while having a strong disagreement with Expert 2.

The landmarks identified by Expert 1 and 2 for Case 1 are illustrated in Fig. 2 for further analysis. Notice the significant difference between the spatial distribution of each set and the difference between the apparent transformation for each set. In particular, the points identified by Expert 1 (shown in blue and white) are already quite close in distance, while the ones identified by Expert 2 (shown in green and yellow) are significantly farther apart. For reference purposes, in Fig. 3, we illustrate the landmarks identified by Expert 1 and 2 for Case 13, which exposes a less prominent disagreement between experts. In such case, the distributions of the two landmark sets are relatively similar when compared to the ones found in Case 1.

Fig. 2

Homologous landmarks identified by Expert 1 and 2 for Case 1. Points identified by Expert 1 are colored in blue (MR) and white (US). Points identified by Expert 2 are colored in green (MR) and yellow (US). A coronal and transverse slice of the MR are also shown for reference, as well as a translucent rendering of the skin’s surface

Fig. 3

Homologous landmarks identified by Expert 1 and 2 for Case 13. Points identified by Expert 1 are colored in blue (MR) and white (US). Points identified by Expert 2 are colored in green (MR) and yellow (US). A coronal and transverse slice of the MR are also shown for reference, as well as a translucent rendering of the skin’s surface

The relevance of the analysis of the experts’ landmarks lies in highlighting the challenges involved in the validation of a registration method in real clinical cases with no gold standard. In particular, we feel it is important to underline that, although the use of manually identified points allows for a quantitative evaluation of performance, there is still a significant degree of subjectivity behind such a validation strategy and the numerical results should not be accepted blindly. Based on the exposed variability between experts and their corresponding landmarks, we consider the validation of a non-rigid registration based on this particular dataset to be rather compromised. Hence, we choose to evaluate registration performance restricted to a rigid transformation.

Results

Our proposed approach was evaluated with three configurations. The first configuration, referred to as GOA Full Mask, involves the maximization of gradient orientation alignment of the top 20 % of locations with the highest gradient magnitudes in the reconstructed 3D US volume. The second configuration, referred to as GOA Subset, provides reduced processing times and involves the maximization of gradient orientation alignment of 8,000 locations randomly selected from the previously defined top 20 % mask. It is important to note that 8,000 locations correspond to approximately 2 % of the voxel locations found in the US volume (the exact ratio varies from case to case). The first two configurations were implemented in C++ and run on a computer with an Intel Core 2 Quad Q6700 CPU. The third configuration, GOA Subset on GPU, is implemented to run on an NVIDIA GTX 670 video card and was developed to provide highly reduced processing times. It involves the maximization of gradient orientation alignment of 16,000 locations randomly selected from the top 20 % mask. Figure 4 illustrates the images from Case 3 before and after registration with GOA Subset on GPU. Notice how key structures like the falx and the lateral ventricles are closely aligned after registration.

Fig. 4 Overlapped pMR and iUS slices from Case 3 before and after registration. The first row shows the slices at their initial location (coronal, sagittal, and transverse). The second row shows the slices after registration with the proposed approach

The registration results for the three configurations and each of the fourteen cases are shown in Figs. 5 and 6. In Fig. 5, we illustrate the performance of each configuration with relation to a superset of landmarks that includes all experts’ landmarks. All configurations consistently improve the registration accuracy with relation to their corresponding initial location. Additionally, the resulting mTRE is very close to the minimal mean distance under a rigid transformation (depicted as a red dashed line). The same set of results is listed in Table 2, where we also report the processing times for each configuration. The first configuration (GOA Full Mask) involves a processing time that ranges from 36 to 76 s, which is comparable to the processing time of conventional intensity-based methods. The second configuration (GOA Subset) obtains a similar registration accuracy with significantly reduced processing times that range from 7 to 14 s. Finally, the third configuration (GOA Subset on GPU) also obtains a similar registration accuracy but with highly reduced processing times that range from 0.61 to 0.93 s.

Fig. 5 Rigid registration results with proposed method evaluated with the set of all landmarks independently identified by three different experts. The x axis corresponds to the clinical case, while the y axis corresponds to the mTRE between manually identified corresponding anatomical points. Also shown are the initial mTRE (Initial) in blue, as well as the minimal mTRE (Minimal) possible under a rigid transformation, shown as a red dashed line. Notice that the three configurations yield an mTRE just slightly larger than the minimal mTRE. Table 2 also lists the resulting mTRE values for each configuration as well as the corresponding processing times

Fig. 6 Rigid registration results with proposed method evaluated with each of three sets of landmarks identified by a particular expert. The x axis corresponds to the clinical case, while the y axis corresponds to the mTRE between manually identified corresponding anatomical points. Also shown are the initial mTRE (Initial) in blue, as well as the minimal mTRE (Minimal) possible under a rigid transformation, shown as a red dashed line. Notice that for Experts 1 and 3, all configurations yield an mTRE just slightly larger than the minimal mTRE and significantly decreased with relation to the initial mTRE. However, evaluation with relation to Expert 2 yields various cases with poor performance (e.g., Cases 1, 7, 8, 12, 13 and 14)

Table 2 Rigid registration results with proposed method evaluated as the mean distance (i.e., mTRE in mm) and range (i.e., minimal and maximal value) between all the landmarks independently identified by all experts combined

In Fig. 6, we illustrate the performance of all configurations with relation to the set of landmarks identified by each expert. Our proposed method is in very close agreement with the landmarks selected by Expert 1 and Expert 3, since the resulting mTRE is quite close to their minimal mTRE. On the other hand, when evaluating with relation to the landmarks from Expert 2, we tend to encounter a slightly larger value than the minimal mTRE. This comes as no surprise, since the previous section already showed that Expert 2 had significant disagreements with Experts 1 and 3. Thus, any method that tends to align more closely with the landmarks from Experts 1 and 3 will inevitably show a degradation with relation to the landmarks from Expert 2.

We also present a statistical summary of the registration performance of our proposed approaches as well as of competing techniques in Table 3. The evaluated methods are characterized by a choice of similarity metric: mutual information (MI), normalized cross-correlation (NCC), normalized mutual information (NMI), and gradient orientation alignment (GOA); input images: original images (ORI), gradient magnitude images (GM), and median filtered MR in conjunction with a Gaussian blurred US (PRE); and optimization strategy: covariance matrix adaptation evolution strategy (CAE) and a gradient descent (GD) optimizer with adaptive gain. We report the mean and median value of two registration accuracy measures: the mTRE of the homologous landmarks from all experts combined, and the difference between such mTRE and the minimal mTRE. Additionally, we list the number of cases that had a successful registration, where we define success as an instance in which the mTRE is larger than the minimal mTRE by no more than 1 or 2 mm.
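The two accuracy measures and the success criterion described above can be sketched as follows. This is a minimal illustration that assumes landmarks are given as corresponding 3D points in millimetres after the registration transform has been applied; the function names `mtre` and `count_successes` and the toy values are hypothetical and are not taken from Table 3.

```python
import math


def mtre(points_a, points_b):
    """Mean target registration error: the mean Euclidean distance (mm)
    between corresponding landmarks in the two images."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("landmark lists must be non-empty and of equal length")
    total = sum(math.dist(p, q) for p, q in zip(points_a, points_b))
    return total / len(points_a)


def count_successes(mtre_values, minimal_mtre_values, tolerance_mm):
    """Count the cases whose mTRE exceeds the minimal mTRE by no more than
    `tolerance_mm` (1 or 2 mm in the text)."""
    return sum(
        1
        for m, m_min in zip(mtre_values, minimal_mtre_values)
        if m - m_min <= tolerance_mm
    )


if __name__ == "__main__":
    # Two toy landmark pairs: distances are 5 mm and 0 mm, so the mean is 2.5 mm.
    us_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    mr_points = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]
    print(mtre(us_points, mr_points))

    # Toy per-case values (not results from the paper).
    case_mtre = [2.0, 3.5, 5.0]
    case_minimal = [1.8, 2.0, 2.5]
    print(count_successes(case_mtre, case_minimal, 1.0))
    print(count_successes(case_mtre, case_minimal, 2.0))
```

Reporting the difference to the minimal mTRE, rather than the raw mTRE alone, separates the error attributable to the registration from the residual that no rigid transformation could remove for a given set of landmarks.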

Table 3 Statistical summary of rigid registration results with all evaluated techniques

The results obtained clearly indicate that conventional multi-modal intensity-based metrics, like MI, NCC, and NMI, generally show very poor performance in this particular context. Nonetheless, if we make use of gradient magnitude images and employ the same multi-modal similarity metrics, we obtain slightly improved results. A particularly well-performing configuration involves the maximization of NMI between gradient magnitude images with a gradient descent strategy, which successfully registers 12 of the 14 cases with an mTRE less than 1 mm larger than the minimal mTRE.

Among the techniques evaluated, our proposed approach is the only one where all cases have an mTRE less than 1 mm larger than the minimal mTRE. In particular, the median value of the difference between the mTRE and the minimal mTRE is 0.27 mm for the first configuration (GOA Full Mask), 0.22 mm for the second configuration (GOA Subset), and 0.33 mm for the third configuration (GOA Subset on GPU).

Discussion

We have presented a rigid registration method for MR to iUS and evaluated it with fourteen clinical cases obtained from the MNI BITE dataset. Registration accuracy was evaluated with the use of homologous landmarks identified by two or three experts, which provide a reliable indication of the positive performance of the algorithm with relation to competing methods. While our validation with real clinical cases is preferable to a synthetic setup that relies on brain phantoms, there are still some serious limitations with such a validation strategy. In particular, we have pointed out that there is inherently an uncertainty in the points identified by the experts. We have exposed some of the limits of such a validation strategy by reporting the minimal mTRE obtainable with a rigid transformation (with all available landmarks or with the landmarks associated with a particular expert), as well as the mTRE with relation to another expert’s solution. This analysis shows that there is a significant variability in the performance metric depending on the particular expert used for landmark identification.

While for some cases a rigid body registration might be sufficient for a successful alignment of all regions of interest, there are also cases that might require correction of non-rigid deformations. The evaluation of a non-rigid registration approach magnifies the challenge of adopting a proper validation scheme. Future work in this direction should not only propose a suitable non-rigid registration scheme, but also present a principled approach for evaluating its accuracy.

It is important to note that we evaluated our proposed approach on real clinical cases prior to resection. Future work will directly address the task of registration in the presence of resection cavities and evaluate its performance. Such a context involves the additional challenge of tissue without correspondence across modalities.

We have also noted that our current setup suffers from a significant processing delay related to the reconstruction of a 3D US volume from the set of acquired 2D US slices. It is likely that a reduced processing time can be obtained by software optimization or by adopting a GPU-based solution [4]. Alternatively, we can adopt a slice-to-volume registration strategy, which effectively bypasses the need for a reconstruction algorithm and provides almost immediate visual feedback. Future work will evaluate the performance of such techniques.

Conclusion

We have presented a new approach for the rigid registration of pMR to iUS which provides fast and robust performance evaluated over fourteen clinical cases. Our proposed approach registered all cases with a median difference between the mTRE and the minimal mTRE (i.e., lower bound) of 0.22, 0.27, or 0.33 mm (depending on the configuration). Furthermore, we have shown the computational efficiency of our technique, which allows for registration times as low as 0.61 s by using a reduced set of locations and adopting a GPU-based implementation. Finally, in order to expose some of the limitations of our validation strategy, we have also reported an analysis of the manually identified landmarks used for measuring registration accuracy, where we have shown that there is a significant variability between experts.