1 Introduction

Multi-modal image registration is a fundamental research area in medical imaging. Spatially aligning complementary information from two or more imaging modalities has a wide range of applications, including diagnostics, planning, simulation and guidance.

Registration of multiple modalities has been extensively studied and many solutions have been proposed [1] using landmarks, image intensity, gradients, mutual information and learning similarity functions. These approaches often assume cross-modality information, e.g., anatomical structures, landmarks or objects that are visible in both imaging modalities. The presence of cross-modality information is a reasonable assumption for many clinical applications. However, the lack of such information can prohibit automatic and accurate registration.

In image guided interventions, such as cardiac resynchronization therapy (CRT), pre-operative MR or SPECT images are used to analyse tissue characteristics or function and intra-operative X-ray fluoroscopy is used to guide the procedure. The pre- and intra-operative modalities are fundamentally different and do not share significant cross-modality information. In such cases alternative registration strategies are required.

Cross-modality registration for CRT procedures can be performed using fiducial markers and optical tracking devices [2]; however, this requires the pre-operative MR imaging to be acquired immediately before the procedure and additional hardware in the operating room. Anatomical registration has been proposed where the position of the vessels is inferred from catheters and aligned to vessels segmented from pre-operative images [3, 4]. However, catheters may induce deformations in the anatomy and the quality of MR images may be too low to identify vascular structures accurately. Registration of pre-operative SPECT to fluoroscopic images has been proposed by manually matching landmarks (interventricular grooves to the coronary artery tree), performing an iterative closest point (ICP) refinement and finally a non-linear warping [5]. This method depends on accurately identifying landmarks in the pre-operative data, which is challenging, and variations of the anatomy may result in inaccuracies. Additionally, since the epicardium is not visible in the SPECT images, the center of the myocardium is detected and a constant myocardial thickness is assumed. Considering this, and that the vessels are warped to the generated epicardial surface, the accuracy of the algorithm is questionable.

In this paper, a novel approach is presented for registering multi-modal images by using adjacent anatomical structures which does not rely on cross-modality information. The registration of adjacent structures is formulated as a partial surface registration problem which is solved using a globally optimal ICP (Go-ICP) algorithm. The practical clinical application is demonstrated on an image guided CRT procedure by registering the left ventricle (LV) (pre-operative MR) to the coronary vessel anatomy (intra-operative fluoroscopy). The method is validated on synthetic and phantom data.

2 Methods

At the core of the proposed registration approach is the use of anatomical structures that are adjacent or share a common surface. For example, in cardiac anatomy the epicardial surface of the LV is adjacent to the coronary sinus (CS) vessel tree. The LV is visible in preoperative MR but the vessel tree is not. The vessel tree is visible during contrast injected X-ray fluoroscopy, however, the LV is not. This concept is illustrated in Fig. 1. The following section outlines how this prior anatomical knowledge can be exploited to register multi-modal images without cross-modality image information.

2.1 Extracting Adjacent Anatomical Structures

The LV is automatically segmented from pre-operative MRI. The epicardial contour is detected in long (two-, three- and four-chamber) and short axis images using a combination of machine learning landmark detection and gray level analysis [6]. A mesh is fit to the contours to generate a surface representation of the LV epicardium at end diastole as shown in Fig. 1(a).

Two contrast injected fluoroscopy images are acquired during the intervention. The images and the corresponding venous tree (the CS and the venous branches that drain into it) are illustrated in Fig. 1(b). The sequences are acquired at different angulations and time points and therefore capture the anatomy at various points in the cardiac cycle. One (end diastolic) frame from each sequence is automatically selected using masked principal component analysis motion gating [7], in which cardiac motion is extracted by band-pass filtering the variation of the first principal component. Corresponding points on the CS are manually selected and reconstructed by epipolar triangulation to create a point cloud that represents the venous tree in 3D.
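To make the reconstruction step concrete, the sketch below triangulates the manually selected CS correspondences with a standard linear (DLT) method. It is a minimal illustration, not the authors' implementation; the projection matrices P1 and P2 are assumed to be available from the C-arm geometry, and all names are placeholders.

```python
import numpy as np

def triangulate_point(P1, P2, u1, u2):
    """Linear (DLT) triangulation of one manually selected correspondence.

    P1, P2 : 3x4 projection matrices of the two gated fluoroscopy frames
             (assumed known from the C-arm geometry).
    u1, u2 : 2D image coordinates of the same CS point in each view.
    Returns the reconstructed 3D point.
    """
    A = np.vstack([
        u1[0] * P1[2] - P1[0],
        u1[1] * P1[2] - P1[1],
        u2[0] * P2[2] - P2[0],
        u2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def reconstruct_vessel_tree(P1, P2, pts_view1, pts_view2):
    """Build the 3D venous-tree point cloud from two lists of matched 2D points."""
    return np.array([triangulate_point(P1, P2, a, b)
                     for a, b in zip(pts_view1, pts_view2)])
```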

Fig. 1.

Registration of adjacent landmarks in a cardiac procedure. (a) The MR data is segmented to extract the LV epicardial mesh (green). (b) The point cloud of the coronary vasculature (red) is reconstructed from two contrast injected venograms. (c) The segmented LV mesh is registered to the reconstructed point cloud (d) to show a valid overlay that can be used for interventional guidance. (Color figure online)

2.2 Registration Algorithm

Registering the reconstructed vessel point cloud \({\varvec{x}}\) (data) and the vertices of the LV epicardial shell \({\varvec{y}}\) (model) can be described as a partial point cloud matching problem with unknown point correspondences and can be formulated as

$$\begin{aligned} E({\varvec{R}},{\varvec{t}}) = \sum _{i = 1}^N e_i^2({\varvec{R}}, {\varvec{t}}) = \sum _{i = 1}^N || {\varvec{R}} {\varvec{x}}_i + {\varvec{t}} - {\varvec{y}}_{j^*} ||^2, \end{aligned}$$
(1)

where \(e_i\) is the error of point i depending on the rotation \({\varvec{R}}\) and the translation \({\varvec{t}}\), N represents the number of data points and \({\varvec{y}}_{j^*}\) the optimal correspondences. However, \(j^*\) is a function of \({\varvec{R}}\), \({\varvec{t}}\) and \({\varvec{x}}_i\). If the optimal \({\varvec{R}}\) and \({\varvec{t}}\) were known, the correspondences could be found easily, and if the correspondences were known, the optimal \({\varvec{R}}\) and \({\varvec{t}}\) would be easy to calculate. The well-established approach for such a problem is the ICP algorithm; however, it only converges to the nearest local minimum. To find the global minimum, the Branch and Bound (BnB) algorithm can be used [8]. Convergence to the optimal solution is guaranteed, however, the whole search space may have to be processed.
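To illustrate the alternation implied by Eq. (1), the sketch below shows a single ICP iteration: nearest-neighbour correspondence search followed by the closed-form (Kabsch/SVD) rigid update. It illustrates the standard algorithm referred to in the text under the stated data/model convention, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(x, y, R, t):
    """One ICP iteration for the objective in Eq. (1).

    x : (N, 3) data points (reconstructed vessel point cloud).
    y : (M, 3) model points (vertices of the LV epicardial shell).
    R, t : current rigid transform estimate.
    Returns the updated (R, t) and the residual E(R, t).
    """
    # Step 1: fix (R, t) and pick the closest model point y_{j*} for every x_i.
    _, j_star = cKDTree(y).query(x @ R.T + t)
    y_corr = y[j_star]
    # Step 2: fix the correspondences and solve for (R, t) in closed form (Kabsch).
    mu_x, mu_y = x.mean(axis=0), y_corr.mean(axis=0)
    H = (x - mu_x).T @ (y_corr - mu_y)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R_new = Vt.T @ D @ U.T
    t_new = mu_y - R_new @ mu_x
    E = np.sum((x @ R_new.T + t_new - y_corr) ** 2)
    return R_new, t_new, E
```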

BnB and ICP were combined to overcome their individual weaknesses and thus provide a fast, globally optimal solution [9]. The search space for the registration is the special Euclidean group SE(3), which incorporates all rigid 3D motions and can be decomposed into the rotation group SO(3) and the translations of \(\mathbb {R}^3\). SO(3) is parametrized with the angle-axis representation as a solid ball of radius \(\pi \), which for simplicity is enclosed by the cube \([-\pi , \pi ]^3\). The translations in \(\mathbb {R}^3\) can be parametrized by a cube \([-\xi , \xi ]^3\), where \(\xi \) is the half side length of the cube.
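A minimal sketch of this parametrization is given below, assuming an angle-axis (rotation-vector) representation and a simple (center, half side length) encoding of each cube; both are illustrative choices rather than details taken from [9].

```python
import numpy as np
from scipy.spatial.transform import Rotation

# A cube of the search domain is stored as (center, half_side).
rotation_cube0 = (np.zeros(3), np.pi)   # angle-axis cube [-pi, pi]^3 covering SO(3)
translation_cube0 = (np.zeros(3), 1.0)  # [-xi, xi]^3 with xi = 1 as an example

def rotation_at(r):
    """Map a point r of the rotation cube (an angle-axis vector) to a rotation matrix."""
    return Rotation.from_rotvec(r).as_matrix()

def subdivide(cube):
    """Split a cube into its eight octants, as done when a branch cannot be pruned."""
    center, half = cube
    offsets = [np.array([sx, sy, sz]) for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
    return [(center + 0.5 * half * o, 0.5 * half) for o in offsets]
```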

The algorithm uses two priority lists, one for the rotation cubes \(C_r\) in the outer BnB, and one for the translation cubes \(C_t\) in the inner BnB. The smaller a cube's lower error bound, the higher its priority in the list. The outer BnB calculates the lower bound

$$\begin{aligned} \underline{E} \doteq \sum _{i = 1}^N \underline{e}_i^2 = \sum _{i = 1}^N \text {max}(e_i({\varvec{R}}_{{\varvec{r}}_0}, {\varvec{t}}_0) - \gamma , 0)^2 \end{aligned}$$
(2)

for the initial cube, where \(({\varvec{r}}_0, {\varvec{t}}_0)\) represents the center of the 3D motion domain \(C_r\times C_t\) and \(\gamma \) is the uncertainty radius [10]. If the current error estimate \(E^*\) is close to \(\underline{E}\), the solution is found. Otherwise, the cube is subdivided and the subcubes are processed. The upper error bound

$$\begin{aligned} \overline{E} \doteq \sum _{i = 1}^N \overline{e}_i^2 = \sum _{i = 1}^N e_i^2({\varvec{R}}_{{\varvec{r}}_0}, {\varvec{t}}_0), \end{aligned}$$
(3)

and the corresponding optimal translation is calculated for each subcube \(C_t\) by the inner BnB algorithm. If the upper error bound is smaller than \(E^*\), ICP is run to update the error and the transformation \(({\varvec{R}},{\varvec{t}})\). The lower error bound \(\underline{E}\) for the current subcube is calculated and if it is above \(E^*\), the cube is discarded, otherwise it is added to the priority list. The above steps, starting with removing the first item from the priority list, are repeated until the gap between the current error estimate and the lower error bound falls below a set threshold (\(E^* - \underline{E} < \textit{thresh}\)).
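The sketch below illustrates how the bounds of Eqs. (2) and (3) and the pruning rule could be computed for a single cube. The per-point uncertainty radius is assembled from the rotation and translation cube half side lengths following the form derived in [10]; treat the exact expression and all names as assumptions of this sketch rather than the authors' code.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

def cube_bounds(x, model_tree, r0, t0, sigma_r, sigma_t):
    """Upper bound, Eq. (3), and lower bound, Eq. (2), for one rotation/translation cube.

    x          : (N, 3) data points; model_tree is a cKDTree over the model vertices.
    r0, t0     : centers of the rotation (angle-axis) and translation cubes.
    sigma_r/_t : half side lengths of the two cubes.
    The per-point uncertainty radius combines the rotation and translation
    uncertainties in the form derived in [10] (taken here as an assumption).
    """
    R0 = Rotation.from_rotvec(r0).as_matrix()
    e, _ = model_tree.query(x @ R0.T + t0)                 # residuals e_i(R_{r0}, t_0)
    gamma_r = 2.0 * np.sin(min(np.sqrt(3) * sigma_r / 2.0, np.pi / 2.0)) \
              * np.linalg.norm(x, axis=1)
    gamma_t = np.sqrt(3) * sigma_t
    upper = np.sum(e ** 2)                                       # Eq. (3)
    lower = np.sum(np.maximum(e - (gamma_r + gamma_t), 0.0) ** 2)  # Eq. (2)
    return upper, lower

def keep_cube(lower, E_star):
    """Pruning rule: a cube whose lower bound already exceeds the best error E*
    cannot contain the optimum and is discarded; kept cubes enter the priority
    list keyed on their lower bound (smallest bound = highest priority)."""
    return lower < E_star
```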

The algorithm guarantees convergence to the globally optimal solution. It is also much more efficient than the standard BnB algorithm: even if the whole solution space has to be explored, the intermediate results are refined with ICP, so the combination benefits from the strengths of both algorithms. It has been shown that the algorithm is well suited to registering partial surfaces, has high noise tolerance and is robust to outliers [10].

3 Results and Evaluation

The evaluation of the approach on in vivo patient data is extremely challenging. The lack of cross-modality information in the available images makes the generation of an accurate ground truth registration very difficult, since no automatic approach is capable of registering the images and there is only minimal information present (heart shadow) for a clinical expert to perform the registration manually. Therefore, the proposed method was quantitatively and qualitatively evaluated on synthetic and phantom data. The practical clinical application is demonstrated on a CRT intervention.

3.1 Synthetic Data

A synthetic dataset of the LV and coronary tree was created from a patient MR dataset. The LV point cloud is segmented from MR as described above; however, the coronary tree is not visible in the MR and is therefore artificially created by sampling points from the vertices of the LV mesh. Four experiments were performed to evaluate the proposed approach.

A baseline comparison between ICP and Go-ICP was performed by registering two LV meshes to each other, where one mesh was artificially transformed by a rotation (-25\(^{\circ }\) to 25\(^{\circ }\) around all three axes with 5\(^{\circ }\) steps). For each initialization the mean absolute error (MAE) of the corresponding vertices (vertex-to-vertex error) was calculated, see Table 1. The Go-ICP method always finds the optimal alignment, compared to the high failure rate (MAE > 5.0 mm) of the conventional ICP method. The registration of the coronary tree to the LV is a partial registration problem. An experiment was performed to determine the minimum number of coronary tree points required for a successful registration: the data points were a random subset of the model vertices and the number of randomly selected vertices was varied. The average error decreases significantly as the number of selected points increases. A minimum of 20 points representing the coronary tree was chosen, since at this point the error drops below 0.025 mm.

Table 1. Results of the evaluation performed on the synthetic data for registering the complete mesh and only a subset of the vertices (point cloud) with the ICP and the Go-ICP methods.
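As an illustration of the synthetic experiments described above, the sketch below generates the grid of test rotations, applies each one to a random subset of the mesh vertices and reports the vertex-to-vertex MAE. The `register` callable stands in for ICP or Go-ICP and is a placeholder, as are all other names.

```python
import itertools
import numpy as np
from scipy.spatial.transform import Rotation

def vertex_to_vertex_mae(a, b):
    """Mean absolute error over corresponding vertices (same ordering in both sets)."""
    return np.mean(np.linalg.norm(a - b, axis=1))

def rotation_sweep(model_vertices, register, n_subset=20, max_deg=25, step_deg=5, seed=0):
    """Apply every rotation of the -25..25 degree grid (5 degree steps, all three axes)
    to a random subset of the model vertices, register it back to the full model with
    `register` (placeholder for ICP or Go-ICP) and collect the vertex-to-vertex MAE."""
    rng = np.random.default_rng(seed)
    subset = model_vertices[rng.choice(len(model_vertices), n_subset, replace=False)]
    angles = np.deg2rad(np.arange(-max_deg, max_deg + 1, step_deg))
    errors = []
    for ax, ay, az in itertools.product(angles, repeat=3):
        R_true = Rotation.from_euler("xyz", [ax, ay, az]).as_matrix()
        data = subset @ R_true.T                       # artificially transformed point cloud
        R_est, t_est = register(data, model_vertices)  # recovered rigid transform
        errors.append(vertex_to_vertex_mae(data @ R_est.T + t_est, subset))
    return np.array(errors)
```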

Partial surface registration performance was evaluated with ICP and Go-ICP, using 20 points to represent the coronary tree and transforming the data as described above. The results shown in Table 1 demonstrate that the proposed approach is robust even for partial surface registration. To assess the noise tolerance of the method, the same realistic 20-vertex point cloud as in the previous experiment was used. In addition to the transformation, white Gaussian noise with varying standard deviation was added to the data. With no added noise the vertex-to-vertex error is close to 0; as the noise increases from 1 mm to 2 mm, the error increases from \(1.63\pm 0.12\) mm to \(3.68\pm 2.02\) mm.
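The noise-tolerance experiment can be reproduced in the same framework by perturbing the data points before registration, e.g. as sketched below; the function and noise levels are illustrative only.

```python
import numpy as np

def add_white_gaussian_noise(points, sigma_mm, seed=0):
    """Perturb each 3D point with isotropic white Gaussian noise of standard
    deviation sigma_mm, as in the noise-tolerance experiment."""
    rng = np.random.default_rng(seed)
    return points + rng.normal(scale=sigma_mm, size=points.shape)

# Example: rerun the 20-point registration for increasing noise levels.
# for sigma in (0.0, 1.0, 2.0):
#     noisy = add_white_gaussian_noise(data, sigma)
#     ... register `noisy` to the model and record the vertex-to-vertex MAE ...
```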

3.2 Phantom Data

Phantom experiments were performed to evaluate the proposed approach in a clinical imaging environment with known ground truth. The LV epicardial surface (segmented from an MR dataset) was 3D printed and wires were attached to model the vascular tree. Intra-operative ground truth data was obtained by acquiring a cone beam CT (CBCT). The LV point cloud, extracted from the CBCT, was registered to ten pairs of X-ray fluoroscopy images.

The mean 3D vertex-to-vertex MAE between the ground truth and the automatically registered mesh was \(3.28\pm 1.18\) mm. The mean Hausdorff (surface-to-surface) distance was 1.12 mm with a mean maximal distance of 3.36 mm. The mean Dice score of the projections was \(0.98\pm 0.01\). A summary of the results can be seen in Table 2. The overlay and the Hausdorff distance mapping are shown for one setup in Fig. 2. Small registration errors are attributed to inaccuracies in the reconstruction of the vascular tree and the LV segmentation.
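For completeness, the evaluation metrics could be computed along the lines of the sketch below, using vertex-sampled surface distances and binary projection masks. This is a plausible reconstruction of the metrics named in the text, not the evaluation code used by the authors.

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_distances(verts_a, verts_b):
    """Symmetric surface-to-surface distances between two meshes, evaluated at
    the vertices: returns the mean distance and the maximum (Hausdorff) distance."""
    d_ab, _ = cKDTree(verts_b).query(verts_a)
    d_ba, _ = cKDTree(verts_a).query(verts_b)
    d = np.concatenate([d_ab, d_ba])
    return d.mean(), d.max()

def dice_score(mask_a, mask_b):
    """Dice overlap of two binary 2D projection masks (ground truth vs. registered)."""
    mask_a, mask_b = mask_a.astype(bool), mask_b.astype(bool)
    return 2.0 * np.logical_and(mask_a, mask_b).sum() / (mask_a.sum() + mask_b.sum())
```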

Table 2. Quantitative results of the evaluation on phantom data. The vertex-to-vertex MAE was calculated in 2D and 3D, the Dice score in 2D and the Hausdorff distance in 3D for the ground truth and the automatically registered meshes.

Fig. 2.

(a) Anterior-posterior (AP 0\(^{\circ }\)), (b) right anterior oblique (RAO 30\(^{\circ }\)) and (c) left anterior oblique (LAO 30\(^{\circ }\)) X-ray projections of the LV phantom with the Hausdorff distance mapped to the surface vertex color. (Color figure online)

Fig. 3.

Registration application for a CRT procedure. (a) Overlay of the epicardial mesh (green) and (b) the segmented scar tissue mesh (red). The electrodes of the multipolar lead (yellow) are placed to avoid scar tissue. (Color figure online)

3.3 Clinical Application

The practical clinical application of the approach is demonstrated on a CRT procedure, see Fig. 3. The clinician must place electrodes in healthy tissue; however, it is not possible to differentiate between healthy and scarred tissue using conventional X-ray fluoroscopy guidance. Scar tissue (red), segmented from late gadolinium enhancement MR images, can be overlaid onto X-ray fluoroscopy images using the proposed approach, after the acquisition of two contrasted venograms. The presented overlay of scar tissue meshes onto X-ray fluoroscopy guides the clinician towards healthy tissue, which can potentially increase responder rates and reduce procedure time. Since the orientation of the heart is approximately known, the rotation space was limited to \(\pm {25}^{\circ }\) in all three degrees of freedom. With these limits, the registration time for this case was 45 s on an Intel i7 with 8 GB of RAM.

4 Conclusion

This paper presents a novel method for automated registration of multi-modal images using adjacent anatomical structures. The approach does not use cross-modality image information. Adjacent structure alignment is formulated as a partial surface registration problem which is solved using a globally optimal ICP method. The presented approach is validated on synthetic and phantom data. The method is capable of fast registration making it well suited to clinical workflows.