
1 Introduction

Image-guided neurosurgery (IGN) has proven to be a valuable tool for assisting neurosurgeons in the planning, interventional, and post-operative clinical phases [1, 2]. Yet, accurate lesion localization and differentiation from the surrounding anatomical structures remain challenging tasks in neurosurgery. This challenge stems from the difficulty of visually distinguishing these pathologic structures from healthy tissue, as well as from brain movements, known as “brain shift”, caused by neurosurgical manipulation, gravity, and anesthesia [3].

Hence, intra-operative magnetic resonance images (iMRI) and intra-operative ultrasound images (iUS) have been used to compensate for brain shift during surgery [4]. The iMRI scanner, however, limits the physician’s access to the operative field and requires special surgical tools, which may be associated with high costs. iUS is portable, inexpensive, requires little preparation, and provides fast data acquisition. Although iUS can visualize interior soft tissue and structures, it has difficulty imaging through bone, and its strong dependency on inter-operator interpretation may result in image inconsistency. Consequently, fusing pre-interventional MRI with iUS data acquired intra-operatively has been proposed to compensate for brain shift and enable guided surgery.

Over the past years, many approaches have been applied to medical image registration; they can be classified into non-learning- and learning-based approaches [5, 6]. Classical, or non-learning, methods are formulated as an iterative pair-wise optimization problem that requires proper feature extraction, choosing a similarity measure, defining the transformation model, and finally an optimization mechanism to explore the search space. Over time, extensive literature has developed using diverse combinations of these elements [7,8,9,10]. Still, the traditional iterative process is computationally expensive, requiring processing times ranging from tens of minutes to hours even with an efficient implementation on a regular central processing unit (CPU) or a modern graphics processing unit (GPU).

To overcome the limitations of classical methods, learning-based approaches have been proposed in recent years. Learning methods reformulate the classical optimization problem as a problem of loss function estimation. Rather than optimizing for every input pair of images individually, deep learning aims to find a function that, trained on many pairs of images, directly computes the transformation field. Several neural networks have been proposed for the registration of pre-operative MRI to iUS volumes for brain shift correction [11, 12]. To better cope with inaccurate ground truth data and to eliminate the time required for dataset annotation, unsupervised learning was introduced [13].

In this work, we propose a real-time automated deformable MRI-iUS registration method that combines deep learning with a traditional registration tool. By combining both methods, our approach intends to provide considerably improved robustness and computational performance for assisting neurosurgeons intra-operatively. The main contributions of this paper are as follows.

  • We introduce our hybrid learning-based and traditional approach (see Fig. 1) for MRI-iUS deformation field estimation.

  • We validate the performance of our model on data from 36 patients from two publicly available multi-site datasets, BITE and RESECT, and compare it to state-of-the-art non-learning- and learning-based registration algorithms.

  • To the best of our knowledge, this is the first real-time non-linear pre-operative MRI to iUS registration method using hybrid learning-based and classical approaches towards brain shift compensation.

2 Material and Methods

2.1 Dataset

In this study, we have used two publicly accessible multi-site datasets, namely BITE [14] and RESECT [15]. These datasets contain pre-operative MRI and 3D iUS images from 14 and 22 patients, respectively. Expert-labeled anatomical landmarks, provided for each MRI-iUS pair, are utilized for ground truth evaluation. For the BITE dataset, we use the landmarks chosen by the first two experts; the third expert’s annotations were excluded for consistency, since they cover only the first six patients.

2.2 Proposed Workflow

Traditional Image Registration.

Medical image registration is the process of aligning two or more sets of imaging data, acquired with mono- or multi-modal imaging, into a common coordinate system. Let \(I_F\) and \(I_M\) denote the fixed and the moving images, respectively, and let \(\phi\) be the deformation field that relates the two images. Our goal is then to minimize the cost function \(C\):

$$C = D\left(I_F, I_M \circ \phi\right) + R\left(\phi\right)$$
(1)

where \(I_M \circ \phi\) is the moving image \(I_M\) warped by the deformation field \(\phi\), \(D\) denotes the dissimilarity metric, and \(R(\phi)\) represents the regularization term. In this work, MRI and iUS scans are used as the moving and fixed images, respectively, since our goal is to reflect the brain shift in the pre-operative MRI data.
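To make the classical formulation concrete, the following toy sketch (our own illustration, not the method evaluated in this paper) minimizes Eq. 1 iteratively for a single image pair, with \(\phi\) restricted to a simple translation and a mean-squared-difference dissimilarity:

```python
import numpy as np
from scipy.ndimage import shift
from scipy.optimize import minimize

def cost(t, fixed, moving, alpha=1e-3):
    # C = D(I_F, I_M o phi) + R(phi), with phi restricted to a translation t
    warped = shift(moving, t, order=1, mode="nearest")   # I_M warped by phi
    dissimilarity = np.mean((fixed - warped) ** 2)       # D: mean squared difference
    regularization = alpha * np.sum(t ** 2)              # R: penalize large displacements
    return dissimilarity + regularization

# Synthetic smooth "fixed" volume and a moving volume shifted by a known offset.
grid = np.indices((32, 32, 32)).astype(float)
fixed = np.exp(-((grid - 16.0) ** 2).sum(axis=0) / 50.0)
moving = shift(fixed, (2.0, -1.5, 0.5), order=1, mode="nearest")

# Iterative pair-wise optimization over the transform parameters.
result = minimize(cost, x0=np.zeros(3), args=(fixed, moving), method="Powell")
print(result.x)  # approximately (-2.0, 1.5, -0.5), the inverse of the applied shift
```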

Learning-Based Registration.

Figure 1 presents an outline of the proposed non-rigid registration method. Our model consists of two steps: first, \(I_F\) and \(I_M\) are fed into our convolutional neural network (CNN), which predicts \(\phi\); second, \(I_M\) is transformed into a warped image \(I_M \circ \phi\) using a spatial re-sampler. The CNN architecture used in the experiments is based on U-Net [16] and our previous enhancement [17]. Using backpropagation, which iteratively updates the network weights from the training loss, the network automatically learns the optimal features and the deformation field.
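As a concrete illustration of the spatial re-sampler step, the following minimal sketch (our own example, not the paper’s implementation) warps a moving volume by a dense displacement field using trilinear interpolation; the field is assumed to be expressed in voxel units with shape (3, D, H, W):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_volume(moving, disp, order=1):
    """Resample `moving` at the displaced positions x + disp(x) (trilinear for order=1)."""
    d, h, w = moving.shape
    grid = np.meshgrid(np.arange(d), np.arange(h), np.arange(w), indexing="ij")
    coords = [g + u for g, u in zip(grid, disp)]  # x + phi(x), one array per axis
    return map_coordinates(moving, coords, order=order, mode="nearest")

# Example with the paper's 128^3 input size and an identity (all-zero) field.
moving = np.random.rand(128, 128, 128).astype(np.float32)
disp = np.zeros((3, 128, 128, 128), dtype=np.float32)
warped = warp_volume(moving, disp)  # identical to `moving` for a zero field
```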

Our CNN contains two main parts: a feature extractor (encoder) and a deformation field estimator (decoder). 3D convolutions are applied in both the encoder and decoder instead of the 2D convolutions used in the original U-Net architecture. Table 1 lists the detailed implementation of each layer of our CNN. The encoder consists of two consecutive 3D convolutional layers, each followed by a rectified linear unit (ReLU) and a 3D spatial max pooling; a stride of 2 is employed to halve the spatial dimension at each level, similar to the traditional pyramid registration scheme. In the decoding path, each step consists of a 3D up-sampling, a concatenation with the corresponding features from the encoder, 3D up-convolutions, and a batch normalization layer followed by a ReLU. Finally, a 1 × 1 × 1 convolution layer maps the resulting feature map into \(\phi\).
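Since Table 1 is not reproduced here, the following Keras sketch shows one plausible realization of the described encoder-decoder; the number of levels and the channel widths (16–128) are our own assumptions, not the exact configuration from Table 1:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two consecutive 3D convolutions, each followed by a ReLU.
    x = layers.Conv3D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv3D(filters, 3, padding="same", activation="relu")(x)

def up_block(x, skip, filters):
    # 3D up-sampling, concatenation with encoder features, convolution, batch norm, ReLU.
    x = layers.UpSampling3D(2)(x)
    x = layers.Concatenate()([x, skip])
    x = layers.Conv3D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def build_registration_cnn(shape=(128, 128, 128)):
    # Fixed (iUS) and moving (MRI) volumes stacked as two input channels.
    inputs = layers.Input(shape=shape + (2,))
    e1 = conv_block(inputs, 16); p1 = layers.MaxPooling3D(2)(e1)
    e2 = conv_block(p1, 32);     p2 = layers.MaxPooling3D(2)(e2)
    e3 = conv_block(p2, 64);     p3 = layers.MaxPooling3D(2)(e3)
    b  = conv_block(p3, 128)                       # bottleneck
    d3 = up_block(b, e3, 64)
    d2 = up_block(d3, e2, 32)
    d1 = up_block(d2, e1, 16)
    # 1 x 1 x 1 convolution maps the feature map to a 3-channel deformation field.
    phi = layers.Conv3D(3, 1, padding="same", name="deformation_field")(d1)
    return tf.keras.Model(inputs, phi)

model = build_registration_cnn()
model.summary()
```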

Fig. 1. An overview of the proposed workflow for 3D MRI to iUS deformable image registration. Dashed red arrows show the processes applied in the training stage only (Color figure online).

Table 1. Our deformable CNN architecture details.

Loss Function.

Owing to the applied two-step approach, the overall loss function \(\mathcal{L}\) has two components, as shown in Eq. 2: \(\mathcal{L}_{sim}\) computes the image similarity between the warped image \(I_M \circ \phi\) and the ground truth warped image \(I_W\), whereas \(\mathcal{L}_{disp}\) corresponds to the deformation field gradient error.

$$\mathcal{L} = \mathcal{L}_{sim} + \mathcal{L}_{disp}$$
(2)

where \(\mathcal{L}_{sim}\) employs the local normalized correlation coefficient (NCC) as similarity metric, calculated as follows:

$$\mathcal{L}_{sim}=\mathrm{NCC}\left(I_W, I_M \circ \phi\right)=\frac{1}{N}\sum_{p\in X}\frac{\sum_{i}\left(I_W(p_i)-\overline{I_W(p)}\right)\left((I_M \circ \phi)(p_i)-\overline{(I_M \circ \phi)(p)}\right)}{\sqrt{\sum_{i}\left(I_W(p_i)-\overline{I_W(p)}\right)^{2}}\sqrt{\sum_{i}\left((I_M \circ \phi)(p_i)-\overline{(I_M \circ \phi)(p)}\right)^{2}}}$$
(3)

where \((I_M \circ \phi)(p)\) and \(I_W(p)\) are the voxel intensities of a corresponding patch \(p\) in the warped image and the ground truth, respectively, whereas \(\overline{(I_M \circ \phi)(p)}\) and \(\overline{I_W(p)}\) are the mean voxel intensities over the patch for both images. \(\mathcal{L}_{disp}\) measures the spatial gradient differences of the predicted displacement field \(d\) as follows:

$$\mathcal{L}_{disp}=\sum_{p\in X}\left\Vert \nabla d\left(p\right)\right\Vert$$
(4)
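The combined loss can be realized, for example, with the following TensorFlow sketch; the window size, the mean-based local statistics, and the first-order finite-difference approximation of the gradient norm are our own implementation choices, not values stated in the paper:

```python
import tensorflow as tf

def local_ncc(a, b, win=9, eps=1e-5):
    """L_sim: local normalized cross-correlation over win^3 windows.
    a, b: tensors of shape (batch, D, H, W, 1)."""
    pool = lambda x: tf.nn.avg_pool3d(x, ksize=win, strides=1, padding="SAME")
    mu_a, mu_b = pool(a), pool(b)
    cross = pool(a * b) - mu_a * mu_b
    var_a = tf.maximum(pool(a * a) - mu_a * mu_a, 0.0)
    var_b = tf.maximum(pool(b * b) - mu_b * mu_b, 0.0)
    return tf.reduce_mean(cross / tf.sqrt(var_a * var_b + eps))

def gradient_loss(disp):
    """L_disp: first-order finite-difference approximation of the gradient norm.
    disp: tensor of shape (batch, D, H, W, 3)."""
    dz = disp[:, 1:, :, :, :] - disp[:, :-1, :, :, :]
    dy = disp[:, :, 1:, :, :] - disp[:, :, :-1, :, :]
    dx = disp[:, :, :, 1:, :] - disp[:, :, :, :-1, :]
    return (tf.reduce_mean(tf.abs(dz)) + tf.reduce_mean(tf.abs(dy))
            + tf.reduce_mean(tf.abs(dx)))

def total_loss(ground_truth_warped, warped, disp):
    # Minimizing the negative NCC maximizes similarity; L_disp keeps the field smooth.
    return -local_ncc(ground_truth_warped, warped) + gradient_loss(disp)
```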

3 Experimental Results

3.1 Experimental Setup

Due to the large differences between the two databases in terms of study characteristics and the MRI and iUS protocols followed, a pre-processing step is crucial. First, for each patient, the iUS images were resampled to the MRI voxel resolution of 1 × 1 × 1 mm³. Then, the MRI images were cropped to the field of view (FOV) of the iUS. After that, all images were resized to 128 × 128 × 128 voxels to be compatible with the proposed deep learning model. An affine alignment of the MRI and iUS volumes was performed using the MINC toolkit (https://github.com/BIC-MNI/minc-tools). Finally, we obtained the ground truth warped MRI by applying a thin-plate spline transformation to the input MRI based on the expert-labeled MRI-iUS landmarks, again using the MINC toolkit.
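A minimal sketch of the resampling and resizing steps is given below, assuming the volumes are already loaded as NumPy arrays with known voxel spacings; the FOV cropping, affine alignment, and thin-plate spline warping performed with the MINC toolkit are not reproduced here:

```python
import numpy as np
from scipy.ndimage import zoom

def resample_to_isotropic(vol, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a volume to 1 x 1 x 1 mm^3 voxels by trilinear interpolation."""
    factors = [s / n for s, n in zip(spacing, new_spacing)]
    return zoom(vol, factors, order=1)

def resize_to_shape(vol, shape=(128, 128, 128)):
    """Rescale a volume to the fixed 128^3 grid expected by the network."""
    factors = [t / s for t, s in zip(shape, vol.shape)]
    return zoom(vol, factors, order=1)

# Example: an iUS volume with hypothetical 0.5 mm voxels, resampled then resized.
ius = np.random.rand(200, 200, 160).astype(np.float32)
ius_iso = resample_to_isotropic(ius, spacing=(0.5, 0.5, 0.5))  # -> 100 x 100 x 80
ius_128 = resize_to_shape(ius_iso)                             # -> 128 x 128 x 128
```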

As the number of cases is rather limited, we use intensive data augmentation to help prevent the model from overfitting and to improve the registration results. This involves random 3D flipping, 3D rotations [0–30 degrees], random gamma intensity transformations [0.8–1.2], and elastic deformations. Our model was implemented in Python using the TensorFlow library. The experiments were run on an Intel Xeon Gold 6248 (27.5M cache, 2.50 GHz) CPU with 8 GB RAM and a single NVIDIA Tesla V100 GPU with 32 GB memory. For training, we divided the cases into two sets, 78% for training and 22% for validation, and used the ADAM optimizer with an initial learning rate of 0.0001 and a batch size of 4. To compare with other studies, we use the mean target registration error (mTRE), which represents the average distance between corresponding landmarks in each MRI-iUS pair after registration. The evaluation of our experiments was performed using the same approach reported in [18].
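For clarity, a minimal sketch of how mTRE can be computed from corresponding landmarks (with hypothetical coordinates in mm) is shown below:

```python
import numpy as np

def mean_tre(landmarks_fixed, landmarks_registered):
    """Mean target registration error: average Euclidean distance (in mm)
    between corresponding landmark pairs after registration."""
    diffs = np.asarray(landmarks_fixed) - np.asarray(landmarks_registered)
    return float(np.mean(np.linalg.norm(diffs, axis=1)))

# Hypothetical landmark coordinates (mm) in iUS space and after warping the MRI.
fixed_lm  = np.array([[10.0, 22.0, 31.0], [40.0, 18.0, 27.0], [25.0, 30.0, 12.0]])
warped_lm = np.array([[10.5, 21.2, 31.4], [39.1, 18.6, 26.5], [25.4, 29.3, 12.2]])
print(f"mTRE = {mean_tre(fixed_lm, warped_lm):.2f} mm")
```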

3.2 Registration Results

Figure 2 shows several examples of aligning MRI to iUS for different patients using our proposed registration method. Each column corresponds to an individual patient. The visual results show that the overlaid MRI-iUS pairs are considerably improved after applying our method. Table 2 and Table 3 report the pre- and post-registration results for all trained cases of the BITE and RESECT datasets, respectively. In both tables, the minimum achievable affine is the minimum mTRE obtainable using an affine transformation for the registration. The last column reports the average of the results over the listed cases for each dataset as well as their standard deviation (std dev). For the BITE database, our model reduced the initial mTRE (provided in the dataset) from 4.18 ± 1.91 mm to 1.68 ± 0.65 mm. Similarly, an mTRE of 0.99 ± 0.22 mm was achieved on the RESECT database, starting from an initial mTRE of 5.35 ± 4.29 mm. These results highlight that, on average, our method delivers better results than the initial alignment and similar results to the minimum achievable affine registration. In a few cases, the proposed approach performs similarly or slightly worse than the ground-truth affine results, which indicates that the optimal transformation had already been achieved by the ground-truth affine registration. Overall, this analysis confirms that the number of available training cases affects the accuracy and robustness of the CNN, as indicated by the superior performance of our method on the RESECT dataset (22 cases) compared to the BITE dataset (14 cases).

Fig. 2. Examples of MRI to iUS registration. From the top row: iUS images (green), pre-operative T2-FLAIR MRI (grey), initial overlay of iUS on MRI, and final deformable registration, respectively. Columns correspond to BITE cases #5, #6, #14 and RESECT cases #5, #9, #15, #17, and #23, respectively (Color figure online).

Table 2. Details of the MRI-iUS registration for each case in the BITE dataset. Underlined bold values represent cases used during the validation stage.
Table 3. Details of the MRI-iUS registration for each case in the RESECT dataset. Underlined bold values represent cases used during the validation stage.

3.3 Comparison with the State-of-the-Art Methods

The initial and final landmark errors of the proposed method and of approaches from the literature for MRI-iUS registration are displayed in Fig. 3(a). For the BITE database, our method is compared with LC2 [7], SSC [8], SeSaMI [9], miLBP [19], Laplacian Commutators [20], cDRAMMS [10], and ARENA [21]. The results indicate that our method outperforms the other evaluated techniques, providing an mTRE of 1.68 ± 0.65 mm, about 0.40 mm smaller than that of cDRAMMS.

Fig. 3. A comparison of the registration error (mTRE) between our proposed method and the state-of-the-art methods on the BITE dataset (a) and the RESECT dataset (b).

Furthermore, our method was also applied to the RESECT database (Fig. 3(b)). Here, we compare our results with the conventional methods LC2 [22], SSC [18], NiftyReg [23], cDRAMMS, and ARENA [21], as well as with the learning-based methods FAX [11] and CNN + STN [12]. As illustrated in Fig. 3(b), our model ranks first with an mTRE of 0.99 ± 0.22 mm, followed by the learning-based method FAX with an mTRE of 1.21 ± 0.55 mm. Although team FAX reported comparable accuracy, our method registers a 3D MRI-iUS pair with a runtime of 170 ms compared to 1.77 s for FAX, and is thus about 10 times faster. These findings clearly support the potential of learning-based registration methods in neurosurgery.

4 Conclusion

This study presented an automated, fast, and robust non-linear approach for pre-operative MRI to iUS registration to assist intra-operative neurosurgical procedures. In our experiments, the performance of the proposed method was evaluated on 36 patients from two multi-site databases. Notably, our model outperforms the state-of-the-art in terms of both registration accuracy and computational efficiency. Furthermore, the qualitative results indicate that the registered MRI-iUS pairs are markedly improved over the initial alignment. The results of our proposed registration method are therefore promising, and we aim to move toward clinical application in future work.