Introduction

Minimally invasive beating-heart surgery, such as ablation for persistent atrial fibrillation (AF) [1], is a technically challenging procedure that is currently performed without computer navigation. Hybrid treatment of surgical AF ablation has been shown to be effective and durable [2]. The procedure is performed by thoracoscopic access from the right chest wall, so the surgeon has no direct vision when accessing the left atrium. An opportunity for navigation arises because of the extensive medical imaging involved: A preoperative cardiac computed tomography (CT) scan is routine, as are the intraoperative use of transesophageal echocardiography (TEE) and fluoroscopy. The purpose of this work is to register two of these common data sources—TEE and CT—in support of beating-heart navigation. The complexity of the surgical procedure and examples of routinely available images are illustrated in Fig. 1.

Fig. 1
figure 1

Surgical approach and routinely available image data. a Drawing of a thoracoscopic approach via three incisions in the right chest wall. b Image from a thoracoscope during beating-heart surgery, showing the ablation catheters and minimal visible landmarks. c TEE ultrasound image, acquired intraoperatively. d Surface models of the four cardiac chambers

We propose a method to register 2D TEE to 3D CT. Much existing work has relied on dynamic CT [3,4,5] that used electrocardiography (ECG) data captured with the TEE to temporally align the two image sources. The desire to reduce radiation exposure prompted interest in methods using static CT images. Li et al. [6] proposed synthetic dynamic CT, which fused a static CT with a preoperative dynamic 3D TEE to construct a temporally varying CT model; this produced synthetic images with registration results of similar quality to those using dynamic CT, but required both preoperative and intraoperative 3D TEE. Other methods using static CT typically have relied on CT gating and then used ECG data collected simultaneously with TEE to achieve temporal alignment [7, 8].

Instead, our approach relied on only two data sources: a single static CT and a 2D TEE sequence. A 3D mesh model of the four cardiac chambers was derived by segmenting the CT scan. The chamber walls were manually delineated in a single image from the TEE sequence. Given these inputs, our algorithm computed a similarity transformation that mapped the 2D ultrasound image space to 3D CT space. This problem represents a 2D/3D slice-to-volume registration with an added complication due to the dynamic nature of the 2D imaging. It is worth noting that our reliance on segmentations of the preoperative image means that other modalities, such as magnetic resonance imaging, could be used in place of CT, as long as they can be segmented with sufficient accuracy.

Our algorithm first extracted a point set that represented the visible cardiac chambers in each image of the TEE sequence by deforming a template to model variations from image to image. Using this deformed template, an ultrasound image was registered to CT space by solving a set of rigid 2D/2D registration problems. This registration was done by precomputing, from the 3D chamber models, a large number of 2D slices along different axes and positions. The optimal transformation from ultrasound to CT space was found by an exhaustive search that registered each ultrasound image to each precomputed 2D slice.

Methods in the literature for segmenting or registering heart chambers in echocardiographic images include methods based on level sets, statistical shape or appearance models, machine learning, optical flow, and nonrigid registration [9]. Our ultrasound data sets had properties that limited the applicability of existing methods. Not all cardiac chambers were visible in all images of a sequence, and some chambers were only partly visible. The TEE ultrasound views differed significantly between patients, making it difficult to generalize the appearance of the images across patients. Also, because we were working with data sets from only a few patients, we could not rely on statistical methods that required large amounts of training data.

We used a template-based nonrigid registration technique that was initialized for a single image. The template was a deformable model that consisted of a set of oriented points, with each point labeled as coming from a particular cardiac chamber. The template could be deformed by a hierarchy of affine transformations to capture the motion of the beating heart: There was a transformation for the overall template, written as A, followed by one independent affine transformation for each cardiac chamber, written as \(C_1\ldots {}C_4\). The hierarchy of transformations is illustrated in Fig. 2. Registration through this hierarchical transformation was formulated with a hybrid mixture model similar to that used in other recent work on oriented point registration [10, 11], though these works were limited to rigid/similarity transformations. We optimized the transformation parameters using a novel generalized expectation maximization (GEM) algorithm that imposed temporal coherence on the extracted point sets by regularizing the transformations between adjacent images in the sequence.

Fig. 2
figure 2

A hierarchy of deformable models. The transformation A captures global deformation; A is applied to partitioned transformations \(C_1\ldots {}C_4\) that each model one cardiac chamber

These extracted ultrasound point sets were then exhaustively registered to a set of precomputed slices of the preoperative CT-derived geometry via a variant of the iterative closest point (ICP) [12] algorithm operating in 2D. This approach is in some sense a hybrid method, combining a discrete search over the space of planes with a continuous in-plane registration to obtain the complete 3D similarity transform mapping ultrasound to CT.

Discrete methods of slice-to-volume 2D/3D registration are enticing as they tend to be less prone to local minima than continuous alternatives [13]. Typically, discrete methods are formulated as labeling problems on Markov random fields (MRF). For nonrigid registration, the typical approach is for the nodes of the MRF to represent control points of a grid which are labeled by discrete parameters encoding local details of the transformation [14,15,16,17]. These methods interpret the deformed grid to infer a rigid transformation and a 2D in-plane deformation field. Discrete methods for linear transformations (such as rigid, similarity, and affine transformations) are less easily encoded this way, because all transformation parameters are inherently global. To solve this problem, Zikic et al. [18] proposed an approximation to the linear transform registration energy consisting entirely of second-order terms on a fully connected graph with one node per transformation parameter; however, their framework was intended for monodimensional registration, or 2D/3D registration with projective images, such as fluoroscopy. Porchetto et al. [19] subsequently proposed a framework for rigid slice-to-volume registration using a similar MRF encoding. In spite of their promising results, none of the above-described slice-to-volume registration methods also account for the temporal alignment that we consider in the present work.

The primary contribution of this paper is a novel CT to TEE registration pipeline relying on neither ECG for temporal registration nor on ECG-gated dynamic CT. To achieve this, we present a novel GEM algorithm for nonrigid registration of ultrasound image sequences, and a simple discretized approach to dynamic slice-to-volume registration. We present the results of our registration on data sets from four patients who underwent minimally invasive epicardial ablation.

Methods

This study was conducted in accordance with the principles outlined in the Declaration of Helsinki. With institutional review board (IRB) approval for retrospective image analysis, preoperative CT images and intraoperative TEE sequences were acquired from four patients.

The CT images were acquired, during breath-hold, using electrocardiography (ECG) gating at approximately 75% of the RR cycle with an Aquilion ONE 320-slice scanner (Canon Medical Systems, Tustin, CA, USA). The TEE ultrasound images were acquired using a Vivid 9 scanner (GE Healthcare, Milwaukee, WI, USA). The CT images in DICOM format were imported into the open-source 3D Slicer software and segmented, and the resulting surfaces were exported as STL meshes. These meshes, and the TEE images in AVI format, were imported into MATLAB (MathWorks, Natick, MA, USA) and processed using custom software.

An overview of our entire registration pipeline is shown in Fig. 3. Each preoperative CT scan was manually segmented and used to generate a large number of precomputed slices of 3D geometry. We then extracted point sets from a sequence of ultrasound images representing the visible parts of the endocardium in each chamber across the cardiac cycle. We exhaustively registered the ultrasound point sets to the precomputed slices of the CT to find the globally best matching ultrasound–CT slice pair, ultimately retrieving a similarity transform mapping from ultrasound to CT coordinates.

Fig. 3
figure 3

Our registration pipeline consisted of processing the preoperative CT images to precompute a large number of slices of 3D geometry. We then extracted the motion from an ultrasound sequence and registered the resulting ultrasound frames to the preoperative CT space. In an actual clinical workflow, processes inside the dotted box would occur preoperatively, while all other processes would occur during perioperative setup. Note trapezoidal processes require manual input

3D CT preprocessing

The four cardiac chambers were manually segmented from the preoperative CT scan, producing a mesh model of each chamber. The 2D/3D registration then sought the optimal feasible scan plane S in the CT coordinate frame. To enumerate the feasible scan planes, an initial scan plane was estimated, and further feasible planes were generated by varying the plane’s parameters around this initial plane.

The mesh points of the cardiac chambers were assembled into a single \(3\times {}N_C\) matrix. An initial feasible plane was computed from principal components analysis of this matrix. The pose of each other feasible plane S was calculated by translating along the normal direction of the initial feasible plane in 2 mm increments and by tilting the initial normal direction in \(1^\circ \) increments about two axes. This produced a set of \(N_P\) feasible ultrasound planes. We used \(N_P=30\times {}30\times {}30=27{,}000\) feasible planes in this study.
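This precomputation can be sketched as follows; the function and parameter names are our own, the two tilt axes are assumed to be the first two principal directions, and a full implementation would also record the plane-to-CT transform for each pose:

```python
import numpy as np

def rotate(v, axis, angle):
    """Rodrigues rotation of vector v about a unit axis by the given angle."""
    return (v * np.cos(angle)
            + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))

def feasible_planes(points, n_offsets=30, n_tilt=30, step_mm=2.0, tilt_deg=1.0):
    """Sketch of the feasible-plane enumeration: PCA gives an initial plane
    through the chamber points; further planes are generated by offsetting
    along the normal and tilting the normal about two in-plane axes."""
    centroid = points.mean(axis=0)
    # PCA: the right singular vector with the smallest singular value is the normal
    _, _, vt = np.linalg.svd(points - centroid)
    e1, e2, normal = vt[0], vt[1], vt[2]
    offsets = (np.arange(n_offsets) - n_offsets // 2) * step_mm
    tilts = np.deg2rad((np.arange(n_tilt) - n_tilt // 2) * tilt_deg)
    planes = []
    for d in offsets:
        for a in tilts:            # tilt about first principal direction
            for b in tilts:        # tilt about second principal direction
                n = rotate(rotate(normal, e1, a), e2, b)
                planes.append((centroid + d * normal, n / np.linalg.norm(n)))
    return planes
```

With the default settings this yields \(30\times 30\times 30 = 27{,}000\) candidate planes, matching the count used in the study.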

Fig. 4
figure 4

Manually placed points that delineated the visible chamber walls were used to initialize the deformable model. In the second example, the right atrium was not fully imaged, so \(K=3\)

For subsequent registration, each feasible plane S was intersected with the 3D surface model of each of the four cardiac chambers. The precomputation produced a data set that contained, for each slice S:

  • \(\{x_i^S\}\)—a set of points obtained by intersecting each plane S with the four chamber meshes

  • \(R_S^\mathrm {CT}\)—a rigid transformation from the local coordinates of slice S to the 3D CT space

Ultrasound processing by template deformation

We formulated the ultrasound processing problem as an application-specific nonrigid registration between sets of oriented points. For each image t of the \(N_E\) TEE ultrasound images, we extracted a point set \(\{x_i^t\}_{i=1}^{N_t}\) using an edge detector. The orientation \(\hat{x}_i\) of each extracted point \(x_i\) was computed as the normalized image gradient at that point, giving a set of oriented points \(X^t = \{(x_i, \hat{x}_i)^t\}\) for each ultrasound image t.

We used a deformable template shape to represent the chambers visible in the ultrasound image. The template shape consisted of an oriented point set \(Y = \{(y_j, \hat{y}_j)\}_{j=1}^M\) where M is the total number of points in the model. The points \(\{y_j\}\) were initialized by manually placing a set of points delineating each chamber on the first ultrasound image in the sequence. The orientation \(\hat{y}_j\) at each \(y_j\) was automatically computed as an outward-facing normal vector. Letting K be the number of visible chambers, a label map \(\ell _{\mathrm {US}} : \{1,2,\dots ,M\} \rightarrow \{1,2,\dots ,K\}\) mapped the index j of an oriented point to a chamber index \(\ell _{\mathrm {US}}(j)\). An example of two manually instantiated models is shown in Fig. 4.

Due to the nature of ultrasound imaging, the point sets extracted by edge detection are noisy. To prevent overfitting, and improve robustness, we opted to use a model of deformation that limited the total degrees of freedom, while still providing a reasonably high degree of deformability. In particular, to capture the beating-heart rhythm in the ultrasound image sequence, we used hierarchical affine transformations of the deformable template shape. A global affine transformation A modeled the overall movement of the heart and there were separate affine transformations \(\{C_k\}\) for the oriented points in each cardiac chamber, so the total transformation \(G = C_1 \circ \dots \circ C_K \circ A\) partitioned the oriented points into K sets. Each affine transformation T was represented as \(T = (B_T, d_T)\), where \(B_T\) is a \(2\times 2\) matrix and \(d_T\) is a translation vector.

The transformation G, applied to an oriented point, had the effect \(G(y_j)\) on position and \(G\star \hat{y}_j\) on the normal \(\hat{y}_j\). The transformed deformable template model was thus \(G(Y) = \{(G(y_j), G\star \hat{y}_j)\}\). The task was, for each ultrasound image t, to compute the transformation G that best aligned the model’s oriented point set G(Y) to the target’s oriented point set \(X^t\).
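As an illustration, a 2D affine transformation \(T = (B_T, d_T)\) acts on oriented points as sketched below: positions through the affine map, orientations through the inverse-transpose rule with renormalization. The function names are ours:

```python
import numpy as np

def apply_affine(B, d, pts, normals):
    """Apply an affine T = (B, d) to oriented points: positions map as
    B p + d; normals map by the inverse-transpose rule and are
    renormalized to unit length."""
    new_pts = pts @ B.T + d
    n = normals @ np.linalg.inv(B)      # row-vector form of B^{-T} n
    new_normals = n / np.linalg.norm(n, axis=1, keepdims=True)
    return new_pts, new_normals

def compose(B1, d1, B2, d2):
    """Composition (B1, d1) ∘ (B2, d2): apply (B2, d2) first, then (B1, d1)."""
    return B1 @ B2, B1 @ d2 + d1
```

Chaining `compose` in this way gives the per-chamber transformation \(C_k \circ A\) used by the hierarchy.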

Formulating the registration The registration framework was based on mixture models and the expectation maximization (EM) algorithm [20], which is common in point set registration [21]. As is done in an expectation conditional maximization (ECM) algorithm [22], our M-step separately optimized the transformation parameters and the other statistical parameters. However, to make the transformation update tractable, we only required that the algorithm be a GEM algorithm [20].

We chose the mixture components to take the form of a product of a Gaussian distribution and a von Mises distribution. Let \(X=\{(x_i, \hat{x}_i)\}_{i=1}^N\) be the N oriented points from the current ultrasound image, \(\mathscr {N}(\cdot | \mu , \sigma ^2)\) represent a normal distribution centered at \(\mu \) with isotropic variance \(\sigma ^2\), and \(f(\cdot | \nu , \kappa )\) represent a von Mises distribution with mean \(\nu \) and concentration parameter \(\kappa \). An oriented point \((x,\hat{x})\) had the distribution

$$\begin{aligned}&p(x,\hat{x} | G,\kappa ,\sigma ^2) \nonumber \\&\quad = \frac{\omega }{2\pi A} + \frac{1-\omega }{M}\sum _{j=1}^M \mathscr {N}(x | G(y_j), \sigma ^2)f(\hat{x} | G\star \hat{y}_j, \kappa )\nonumber \\ \end{aligned}$$
(1)

where A is the area of a region containing all the points in the target set and \(I_0\), the zeroth-order modified Bessel function of the first kind, appears in the normalizing constant of the von Mises density f. The first term of Eq. (1) is a uniform distribution over the registration domain to capture outliers in the target point set, and the terms of the sum correspond to the mixture model. The hyper-parameter \(\omega \) represents the expected proportion of points in the target point set that are outliers.

To avoid excessive deformations of the shape model between successive ultrasound images, a regularization function was incorporated by adding the prior \(P(G) \propto \exp (-\varPhi (G))\). The regularization function affected only the chamber transformations and took the form

$$\begin{aligned} \varPhi (G) = \lambda \sum _{k=1}^K ||B_{C_k} - B_{k0}||^2_F \end{aligned}$$

where \(B_{C_k}\) is the \(2\times 2\) transformation matrix from affine transform \(C_k\) and \(B_{k0}\) is the initial value of \(B_{C_k}\) at the beginning of the registration. Registration of the deformable template to the target oriented point set X could then be achieved by maximizing the posterior

$$\begin{aligned} p(G | X,\sigma ^2, \kappa )= P(G)\prod _{i=1}^N p(x_i, \hat{x}_i | G, \sigma ^2, \kappa ) \end{aligned}$$
(2)

with respect to the transformation G and parameters \(\sigma ^2\) and \(\kappa \).

We used an EM approach to optimize Eq. (2) by introducing a “hidden” variable \(Z = \{z_i\}_{i=1}^N\), representing from which mixture component each oriented point in the target set was sampled. For each i, \(z_i=j\) meant the ith data point was sampled from the jth cluster (or the outlier term if \(z_i = 0\)). The expected complete-data negative log-likelihood was

$$\begin{aligned} Q(G, \sigma ^2, \kappa )= & {} \sum _{i,j=1}^{N,M} \alpha _{ji} \left( \frac{||x_i-G(y_j)||^2}{2\sigma ^2} -\kappa \hat{x}_i^T (G\star \hat{y}_j)\right) \nonumber \\&+\, N_\alpha \log (\sigma ^2) +N_\alpha \log (I_0(\kappa )) + \varPhi (G)\nonumber \\ \end{aligned}$$
(3)

where \(N_\alpha = \sum _{i,j} \alpha _{ji}\) and constant terms have been removed. The \(M\times N\) matrix of coefficients \(\alpha _{ji}\) contained the posterior probabilities, computed during the E-step as

$$\begin{aligned} \alpha _{ji}= & {} p(z_i = j | x_i, G, \sigma ^2, \kappa ) \\= & {} \frac{\mathscr {N}(x_i | G(y_j),\sigma ^2)f(\hat{x}_i|G\star \hat{y}_j, \kappa )}{\frac{2M\pi I_0(\kappa )\sigma ^2\omega }{A (1-\omega )} + \sum _{j=1}^M \mathscr {N}(x_i | G(y_j),\sigma ^2)f(\hat{x}_i|G\star \hat{y}_j, \kappa )} \end{aligned}$$
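The E-step can be sketched with fully normalized densities, which is equivalent to the expression above up to shared constants (the formula above uses unnormalized kernels); the function and argument names are ours:

```python
import numpy as np

def e_step(x, xn, y, yn, sigma2, kappa, omega, area):
    """Responsibilities alpha[j, i] of model component j for target oriented
    point i, using normalized Gaussian and von Mises densities plus the
    uniform outlier component of weight omega over a region of given area."""
    M = len(y)
    d2 = ((y[:, None, :] - x[None, :, :]) ** 2).sum(-1)            # (M, N)
    gauss = np.exp(-d2 / (2.0 * sigma2)) / (2.0 * np.pi * sigma2)
    vmises = np.exp(kappa * (yn @ xn.T)) / (2.0 * np.pi * np.i0(kappa))
    comp = (1.0 - omega) / M * gauss * vmises                      # mixture terms
    denom = omega / (2.0 * np.pi * area) + comp.sum(axis=0)        # + outlier term
    return comp / denom
```

With \(\omega = 0\) the responsibilities for each target point sum to one; the outlier term absorbs the remaining mass otherwise.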

We divided the M-step into two separate conditional M-steps. The first optimized the distribution parameters \(\sigma ^2\) and \(\kappa \); the second optimized the transformation G.

The distribution M-step Equation (3) could be optimized independently with respect to \(\sigma ^2\) and \(\kappa \). The variance \(\sigma ^2\) was optimized analytically, giving the update

$$\begin{aligned} \sigma ^2 = \frac{\sum _{i,j} \alpha _{ji} ||x_i-G(y_j)||^2}{2N_\alpha } \end{aligned}$$

We adapted the truncated Newton approximation to maximum likelihood estimation of \(\kappa \) from hypersphere clustering methods [23], giving the iterative update

$$\begin{aligned} \bar{R}= & {} \frac{\sum _{i,j} \alpha _{ji} \hat{x}^T_i (G\star \hat{y}_j)}{N_\alpha }, \;\;\;\; A(\kappa ) = I_1(\kappa ) / I_0 (\kappa ), \\ \kappa '= & {} \kappa - \frac{A(\kappa ) - \bar{R}}{1 - A(\kappa )^2 - A(\kappa )/\kappa } \end{aligned}$$
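The \(\kappa \) update above can be sketched as a few Newton iterations on \(A(\kappa ) - \bar{R} = 0\), using the derivative \(A'(\kappa ) = 1 - A(\kappa )^2 - A(\kappa )/\kappa \) (SciPy is assumed for the Bessel functions):

```python
import numpy as np
from scipy.special import i0, i1

def update_kappa(kappa, r_bar, n_iter=10):
    """Newton iterations for the von Mises concentration: solve
    A(kappa) = r_bar, where A = I1/I0 and r_bar is the weighted mean
    resultant cosine (a value in [0, 1))."""
    for _ in range(n_iter):
        a = i1(kappa) / i0(kappa)
        kappa = kappa - (a - r_bar) / (1.0 - a * a - a / kappa)
    return kappa
```

Because \(A(\kappa )\) is smooth and monotone increasing, a handful of iterations from a reasonable starting value suffices in practice.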

The transformation M-step To simplify the mathematical expressions, the concept of virtual observations can be used to derive an equivalent paired-point expression that can be optimized in place of Eq. (3) when updating the transformation [24]. We extended the concept to the case of oriented points, defining virtual weights \(W = \{(w_j, \hat{w}_j)\}\), and virtual oriented points \(V = \{(v_j,\hat{v}_j)\}\) as

$$\begin{aligned} w_j= & {} \frac{1}{2\sigma ^2}\sum _i \alpha _{ji}; \;\;\;\; \hat{w}_j = \kappa \sum _i \alpha _{ji}; \;\;\;\; v_j = \frac{\sum _i \alpha _{ji} x_i}{\sum _i \alpha _{ji}}; \\ \hat{v}_j= & {} \frac{\sum _i \alpha _{ji} \hat{x}_i}{\sum _i \alpha _{ji}} \end{aligned}$$

With these definitions, optimizing Eq. (3) with respect to G is equivalent to optimizing

$$\begin{aligned} Q'(G) = \sum _{j} [w_j ||v_j - G(y_j)||^2 - \hat{w}_j \hat{v}_j^T(G\star \hat{y}_j)] + \varPhi (G) \end{aligned}$$
(4)

Notice that this is effectively a weighted, regularized, paired-point registration between the model point set Y and target virtual point set V through transformation G.
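Collapsing the responsibilities into virtual observations can be sketched as below; note that normalizing each virtual point by its own responsibility mass \(\sum _i \alpha _{ji}\) is what makes the paired-point objective reproduce the cross terms of Eq. (3). Names are ours:

```python
import numpy as np

def virtual_observations(alpha, x, xn, sigma2, kappa):
    """Per-model-point virtual weights and virtual oriented points from the
    (M, N) responsibility matrix alpha and target oriented points (x, xn)."""
    mass = np.maximum(alpha.sum(axis=1), 1e-12)   # responsibility per model point
    w = mass / (2.0 * sigma2)                     # positional weights w_j
    wn = kappa * mass                             # orientation weights
    v = (alpha @ x) / mass[:, None]               # virtual positions v_j
    vn = (alpha @ xn) / mass[:, None]             # virtual orientations
    return w, wn, v, vn
```

Each virtual point is thus the responsibility-weighted mean of the target points matched to that model point.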

We further decomposed the transformation M-step into two conditional M-steps: The first updated only the global affine transformation A, holding the others constant, and the second independently updated each chamber transformation \(C_k\).

We did this decomposition in such a way that each step was a solution to a weighted, regularized, paired-point registration between oriented point sets through a single affine transformation. Symbolically, let \(P=\{(p_j,\hat{p}_j)\}\) and \(Q=\{(q_j, \hat{q}_j)\}\) be corresponding oriented point sets; let \(U = \{(u_j, \hat{u}_j)\}\) be a set of weights; let \(\beta \) be a regularization weight; and let \(B_0\) be an initial matrix. Define an algorithm to do a regularized paired-point affine registration between these point sets as

$$\begin{aligned} \mathscr {A}(P,Q,U,\beta ,B_0)= & {} \mathop {\text {arg min}}\limits _{T} \sum _{j} [u_j ||p_j - T(q_j)||^2 \nonumber \\&-\, \hat{u}_j \hat{p}_j^T(T\star \hat{q}_j)] + \beta ||B_T - B_0||^2_F.\nonumber \\ \end{aligned}$$
(5)

Assuming a smooth underlying shape from which the oriented points are sampled, an affine transformation T will act on a normal \(\hat{q}\) as

$$\begin{aligned} T\star \hat{q} = \frac{B_T^{-T} \hat{q}}{||B_T^{-T}\hat{q}||} \end{aligned}$$

Equation (5) can be solved for the optimal transformation matrix \(B_T\) using a quasi-Newton solver directly on the matrix components, with an analytically computed gradient. Given the optimal matrix \(B_T\), the optimal translation can be computed as

$$\begin{aligned} d_T = \frac{\sum _j u_j p_j}{\sum _j u_j} - B_T \frac{\sum _j u_j q_j}{\sum _j u_j} \end{aligned}$$
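A minimal sketch of the solver \(\mathscr {A}\): the translation is eliminated analytically through the weighted-centroid expression above, and the \(2\times 2\) matrix is minimized numerically (here BFGS with a numeric gradient, whereas the paper uses an analytic one); all names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def registration_energy(B_flat, p, pn, q, qn, u, un, beta, B0):
    """Eq. (5)-style objective as a function of the 2x2 matrix alone;
    the optimal translation for this B is substituted analytically."""
    B = B_flat.reshape(2, 2)
    d = (u[:, None] * p).sum(0) / u.sum() - B @ ((u[:, None] * q).sum(0) / u.sum())
    r = p - (q @ B.T + d)
    pos = (u * (r ** 2).sum(1)).sum()
    ori = 0.0
    if np.any(un):
        n = qn @ np.linalg.inv(B)          # inverse-transpose rule, row form
        n = n / np.linalg.norm(n, axis=1, keepdims=True)
        ori = -(un * (pn * n).sum(1)).sum()
    return pos + ori + beta * ((B - B0) ** 2).sum()

def solve_affine(p, pn, q, qn, u, un, beta, B0):
    """Minimize over B starting from the identity, then recover d."""
    res = minimize(registration_energy, np.eye(2).ravel(),
                   args=(p, pn, q, qn, u, un, beta, B0), method='BFGS')
    B = res.x.reshape(2, 2)
    d = (u[:, None] * p).sum(0) / u.sum() - B @ ((u[:, None] * q).sum(0) / u.sum())
    return B, d
```

Because the translation drops out of the orientation and regularization terms, substituting its closed form leaves only the four matrix entries as free variables.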

We will now describe our decomposed transformation M-step in terms of applications of this algorithm \(\mathscr {A}\). First, consider the update of the chamber transformations. Supposing the global transformation has already been updated to some \(A'\), \(Q'(C_1\circ \dots \circ C_K\circ A')\) can be decomposed into separate terms for each chamber transformation

$$\begin{aligned}&\begin{aligned} Q_k'(C_k)&= \sum _{j \in \ell _\mathrm {US}^{-1}(k)} [w_j ||v_j - C_k \circ A'(y_j)||^2 \\&\quad -\, \hat{w}_j \hat{v}_j^T ((C_k\circ A')\star \hat{y}_j)]+\lambda ||B_{C_k} - B_{k0}||^2_F \end{aligned}\\&Q'(C_1\circ \dots \circ C_K\circ A') = \sum _k Q_k'(C_k) \end{aligned}$$

With this decomposition, the optimal update for each chamber transformation would be

$$\begin{aligned} C_k'=\mathop {\text {arg min}}\limits _{C_k}Q_k'(C_k)=\mathscr {A}(V_k,A'(Y_k),W_k, \lambda , B_{k0}) \end{aligned}$$

where \(V_k\), \(Y_k\), and \(W_k\) are, respectively, the set of virtual points, model points, and virtual weights belonging to the kth chamber.

In an ECM formulation, we would need to update the global transformation by optimizing Eq. (4) with respect to A, holding \(\{C_k\}\) fixed. However, the composition of the chamber transformations with A makes such an optimization challenging. We instead only required our algorithm to be a GEM algorithm. In the present context, this only required an update from G to \(G'\) guaranteeing \(Q'(G') \le Q'(G)\).

This guarantee was achieved through a simple strategy. We updated the global transformation, completely ignoring \(\{C_k\}\) via \(A' = \mathscr {A}(V,Y,W,0,0)\). We then updated the chamber transformations as described above. If this strategy failed to improve \(Q'\), we simply reverted the global transformation to its initial value A and recomputed the chamber transformations. This was guaranteed to improve \(Q'\), since the updates to \(C_k\) optimize \(Q'_k(C_k)\). The complete transformation M-step is given in pseudocode in Algorithm 1.

figure a

Cardiac deformation over time After initializing the model by manually placing points on the first ultrasound image, for each image in the sequence we registered the model using our described GEM algorithm, initializing the global transformation G with the transformation that had been calculated from the preceding image. For the first image, the regularization matrices \(\{B_{k0}\}\) were set to the identity matrix. Subsequently, for each TEE ultrasound image, the regularization matrices were initialized to the transformation matrices \(B_{C_k}\) from the converged computations of the previous image. This initialization produced a relatively stable set of local affine transformations \(C_k\), which acted primarily as small corrections to the global affine transformation A. The result of this process was a set of \(N_E\) transformations \(\{G^t\}\) and deformed point sets \(\{U^t = G^t(Y)\}\) that represented each TEE ultrasound image.

Managing poorly registered images Some TEE ultrasound images were not well registered by our automated algorithm. To detect these poorly registered images, we computed the root mean square error between the ultrasound virtual points and model points as

$$\begin{aligned} \mathrm {RMSE}_t = \sqrt{\frac{\sum _j ||v_j - G^t(y_j)||^2}{M}}. \end{aligned}$$

Given this sequence, we used a peak detection algorithm to compute the local maxima in RMSE. We found all outlier peaks, determined as peaks lying more than three scaled median absolute deviations from the median peak, and used them to compute a threshold RMSE as

$$\begin{aligned} \mathrm {RMSE}_{\mathrm {thresh}} = \min _{t \text { an outlier}} \mathrm {RMSE}_t. \end{aligned}$$

At the end of this ultrasound image registration process, we had a set of \(N_E' \le N_E\) ultrasound images that were registered with RMSE below the threshold. We kept only the positional data from the registered images, which produced a data set of \(N_E'\) point sets \(U^t = \{u^t_j\}_{j=1}^M\) representing the ultrasound sequence.
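The frame-rejection rule can be sketched as follows; a simple interior-maximum detector stands in for the peak detection algorithm, and the factor 1.4826 is the usual scaling that makes the MAD consistent with a standard deviation:

```python
import numpy as np

def frames_to_keep(rmse):
    """Keep frames whose RMSE lies below the smallest outlying peak.
    Peaks are interior local maxima; a peak is an outlier when it lies
    more than three scaled MADs above the median peak."""
    r = np.asarray(rmse, dtype=float)
    interior = r[1:-1]
    peaks = interior[(interior > r[:-2]) & (interior > r[2:])]
    if peaks.size == 0:
        return np.ones(len(r), dtype=bool)
    med = np.median(peaks)
    mad = 1.4826 * np.median(np.abs(peaks - med))   # scaled MAD
    outliers = peaks[peaks > med + 3.0 * mad]
    if outliers.size == 0:
        return np.ones(len(r), dtype=bool)
    return r < outliers.min()
```

A sequence with one grossly misregistered frame thus keeps every other frame while excluding the spike.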

Matching 3D slices to ultrasound images

We performed an exhaustive search that matched the \(N_P\) precomputed slices from the CT model with the \(N_E'\) TEE ultrasound point sets, finding the pair producing the globally minimal RMSE. The match was computed by registering the ultrasound point set \(\{u^t_j\}\) to the CT slice \(\{x_i^S\}\) with a similarity transformation.

Table 1 Model deformation settings and results for each of the four patients are shown

The registration was initialized by creating a correspondence between the center of each cardiac chamber in the feasible CT slice and the deformed ultrasound model. This provided an approximate translation, orientation, and uniform scale factor between the point sets. Using this initial estimate, the similarity transform was optimized using a modified ICP algorithm [12], in which the nearest-neighbor search was limited to points from the same cardiac chamber. The registration was scored by the \(\mathrm {RMSE}\) between the registered ultrasound point set and the converged nearest neighbors from the CT slice.
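The chamber-constrained matching step might be sketched as below, with a closed-form Procrustes (Umeyama) similarity update standing in for the modified ICP of [12]; the function names are ours:

```python
import numpy as np

def chamber_icp(src, src_lbl, dst, dst_lbl, n_iter=50):
    """2D similarity-transform ICP in which the nearest-neighbour search is
    restricted to points sharing the same cardiac-chamber label."""
    s, R, t = 1.0, np.eye(2), np.zeros(2)
    idx = np.zeros(len(src), dtype=int)
    for _ in range(n_iter):
        moved = s * (src @ R.T) + t
        for k in np.unique(src_lbl):
            sel = src_lbl == k
            cand = np.where(dst_lbl == k)[0]
            d2 = ((moved[sel][:, None] - dst[cand][None]) ** 2).sum(-1)
            idx[sel] = cand[d2.argmin(1)]
        tgt = dst[idx]
        # closed-form Procrustes / Umeyama similarity update
        mu_s, mu_t = src.mean(0), tgt.mean(0)
        A = (tgt - mu_t).T @ (src - mu_s)
        U, S, Vt = np.linalg.svd(A)
        D = np.array([1.0, np.sign(np.linalg.det(U @ Vt))])
        R = (U * D) @ Vt
        s = (S * D).sum() / ((src - mu_s) ** 2).sum()
        t = mu_t - s * R @ mu_s
    moved = s * (src @ R.T) + t
    rmse = np.sqrt(((moved - dst[idx]) ** 2).sum(1).mean())
    return s, R, t, rmse
```

The returned RMSE is the score used to rank candidate slice and image pairs in the exhaustive search.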

The exhaustive search resulted in a slice \(\hat{S}\) and ultrasound image \(\hat{U}\) representing the optimal match, as defined by least RMSE. The registration between this optimal ultrasound image and slice pair gave a similarity transform \(T_\mathrm {US}^{\hat{S}}\) mapping the ultrasound to the coordinates of slice \(\hat{S}\). Combining this similarity transformation with the rigid transform \(R_{\hat{S}}^\mathrm {CT}\), we were able to compute the total transformation from ultrasound to CT space

$$\begin{aligned} T_\mathrm {US}^\mathrm {CT}= R_{\hat{S}}^\mathrm {CT}\circ T_\mathrm {US}^{\hat{S}} \end{aligned}$$

Data analysis

The RMS error of the deformed cardiac model to the edge-detected points in the ultrasound image was tabulated for each patient’s TEE sequence. The RMS error of the transformed deformed model to the CT segmentation was also tabulated. This transformation encoded scale factors for the size of pixels in the ultrasound image, which could be used to express the ultrasound RMS errors in millimeters.

The expected error was computed from the ultrasound RMSE and a value from the literature. In CT coronary angiography with ECG gating, blood vessels with a diameter of less than 1.5 mm are routinely excluded [25]. Treating the ultrasound RMS error and this minimum resolvable diameter as independent error sources, we computed the expected error by adding them in quadrature: \(\sqrt{\mathrm {RMSE}_t^2 + 1.5^2}\). We pooled two RMS errors for all patients: the overall RMS error of 2D/3D registration and the computed expected error. The pools were compared with a two-sided t-test.
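The expected-error computation and pooled comparison can be sketched as below; the per-patient values are hypothetical placeholders for illustration only, not the study's results:

```python
import numpy as np
from scipy.stats import ttest_ind

def expected_error(rmse_us, vessel_mm=1.5):
    """Combine the ultrasound model-fit RMSE with the smallest reliably
    resolved vessel diameter in gated CT by adding in quadrature."""
    return np.sqrt(np.asarray(rmse_us, dtype=float) ** 2 + vessel_mm ** 2)

# hypothetical per-patient values (mm), for illustration only
overall = np.array([3.2, 2.8, 3.5, 3.0])            # 2D/3D registration RMSE
expected = expected_error([2.9, 2.4, 3.1, 2.7])     # from ultrasound RMSE
t_stat, p_value = ttest_ind(overall, expected)      # two-sided by default
```

A paired comparison would use `scipy.stats.ttest_rel` on the same arrays, matching patients across the two pools.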

Results

We processed a total of four data sets, manually segmenting each patient’s preoperative CT scan and using it to generate a total of 27,000 slices. Of these slices, those that did not contain a cross section from all four chambers were discarded. For each patient, we sought to deform our manually initialized 2D model to each ultrasound image. The free parameters of the registration, namely the outlier weight \(\omega \) and the regularization weight \(\lambda \), were tuned for each patient, but all varied within a similar range. The initial values for \(\sigma ^2\) and \(\kappa \) were consistently set to 250 and 1.5, respectively. After deforming the model to each ultrasound image in a sequence, and automatically detecting and removing individual poorly registered frames, we computed the RMS error between the deformed model in each image and the points extracted from the image. Table 1 summarizes these results for each patient and documents the various settings used.

The exhaustive search produced a registration of the TEE ultrasound sequence to the CT scan by seeking an optimally matched 3D slice and ultrasound point set. The total number of slices and images compared, along with the RMS error between the best matching 3D slice and ultrasound image, is reported in Table 2. The registration results are illustrated for each patient in Fig. 5, which show the 2D slices and 3D models plotted in ultrasound space.

Table 2 The total number of searched slices and images are reported
Fig. 5
figure 5

The best matching slices and ultrasound images allowed us to generate a transform for each 3D model from CT space to ultrasound space. The left-hand side shows the best matching 3D cross sections (circular markers), with the ultrasound image and model (plus markers). On the right-hand side are the same matches, with the 3D chamber models shown in translucent 3D. Each row corresponds to one patient

From a t-test of the pooled RMS values in Table 2, there was no statistically significant difference (\(p\approx {}0.58\)) between the overall registration error and the error expected from the combination of the ultrasound model deformation error and the uncertainty in the cardiac CT scan. A paired t-test also showed no statistically significant difference (\(p\approx {}0.60\)).

Discussion

Minimally invasive arrhythmia surgery has gained popularity over the past decade, especially in the most complex subset of patients, such as those with persistent atrial fibrillation [1]. In particular, an epicardial approach has provided high rates of success in persistent atrial fibrillation, even when compared to a transcatheter approach [26], but with an increased incidence of perioperative complications. Notably, intraoperative bleeding complications have been reported in a range from 0 to 10% across several reports [27]. Hence, strategies that improve surgical navigation and minimize damage to the heart or the nearby vascular structures appear critical for safer application of such less invasive techniques.

Our ultrasound processing used a novel hierarchical affine registration algorithm with oriented points, which provided advantages for our data sets compared to many existing methods for segmentation or registration of TEE ultrasound image sequences. The method made no assumptions about the TEE viewing angles, or about the number of visible cardiac chambers. Level set methods may be less well suited to partial views because they are constrained to producing closed contours. Meanwhile, methods based on statistical shape models are typically constrained to a single view or topological configuration as represented in the training data, or else require a large amount of user interaction to construct a patient-specific model [28]. Unlike a statistical or machine learning method, our method also did not require large amounts of training data.

Our ultrasound registration made use of oriented points. The addition of orientation information to point set registration has been found to benefit accuracy in both ICP-like methods [29, 30] and probabilistic methods [10, 11, 31]. In our case, the primary benefit of orientations came from an enhanced ability to distinguish between spatially close but oppositely oriented points. For example, nearby points such as those on either side of either septum were easily distinguished when using orientation information.
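The benefit of orientations can be illustrated with a toy match score. The sketch below is not the paper's exact probabilistic formulation; it simply combines a Gaussian positional term with a von Mises-Fisher-like orientation term, so that a spatially closer point with a flipped normal (as on the opposite side of a septum) scores lower than a slightly farther point with an agreeing normal.

```python
# Toy sketch (not the paper's exact model) of why orientations help:
# a match score that penalizes normal disagreement keeps spatially
# close but oppositely oriented points from being confused.
import numpy as np

def match_score(p, n_p, q, n_q, sigma_pos=1.0, kappa=2.0):
    """Higher is better: Gaussian in position, exponential in the
    dot product of unit normals (von Mises-Fisher-like)."""
    positional = np.exp(-np.sum((p - q) ** 2) / (2 * sigma_pos ** 2))
    angular = np.exp(kappa * (np.dot(n_p, n_q) - 1.0))  # 1 when aligned
    return positional * angular

p, n_p = np.array([0.0, 0.0]), np.array([0.0, 1.0])
q_same, n_same = np.array([0.2, 0.0]), np.array([0.0, 1.0])   # same side
q_flip, n_flip = np.array([0.1, 0.0]), np.array([0.0, -1.0])  # opposite side

s_same = match_score(p, n_p, q_same, n_same)
s_flip = match_score(p, n_p, q_flip, n_flip)
```

Here the oppositely oriented point is the nearer of the two, yet it receives the lower score, which is exactly the disambiguation we relied on.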

We registered the deformable model to each image using a mixture model formulation. Although ICP-like formulations may be more efficient—and can use probabilistic noise models with oriented points [29]—our optimization step for the transformation fundamentally relied on the formulation as a GEM algorithm. It is possible that a sparse or incremental approach to EM [32], or a threshold on the matching likelihood [33], could be used to improve computational efficiency. Such optimizations could be realized using a PD-tree lookup for oriented points [29]. Furthermore, although GEM algorithms offer weaker theoretical guarantees than the ECM algorithms typically used for registration, in all our experiments the algorithm converged in fewer than 70 iterations. This experience is in line with recent work on statistical shape model registration using a GEM algorithm [34].
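The EM structure underlying this kind of mixture-model registration can be sketched in a heavily simplified form. The toy below estimates only a 2D translation with isotropic Gaussians and a constant outlier weight; it is a stand-in for, not a reproduction of, our hierarchical affine GEM formulation, and the parameter values are illustrative.

```python
# Compact sketch of an EM-style mixture registration (translation-only,
# isotropic Gaussians, constant outlier weight) -- a toy stand-in for
# the hierarchical affine GEM formulation described in the text.
import numpy as np

def em_translation(model, target, sigma=1.0, w_outlier=0.1, iters=100):
    t = np.zeros(2)
    for _ in range(iters):
        # E-step: responsibilities of each model point for each target point.
        diff = target[:, None, :] - (model[None, :, :] + t)     # (N, M, 2)
        lik = np.exp(-np.sum(diff ** 2, axis=2) / (2 * sigma ** 2))
        lik *= (1 - w_outlier) / model.shape[0]
        resp = lik / (lik.sum(axis=1, keepdims=True) + w_outlier)
        # M-step: closed-form translation maximizing the expected
        # complete-data log-likelihood for the inlier component.
        t = (resp[:, :, None] * diff).sum(axis=(0, 1)) / resp.sum() + t
    return t

rng = np.random.default_rng(0)
model = rng.normal(size=(30, 2))
target = model + np.array([0.8, -0.5]) + 0.01 * rng.normal(size=(30, 2))
t_est = em_translation(model, target)
```

In a GEM variant, the M-step need only increase (rather than maximize) this expected log-likelihood, which is what permits the alternating per-level transformation updates in our hierarchy.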

To facilitate our oriented point-based registration, we relied on a simple edge detector to extract features from each ultrasound image in a sequence. In a noisy modality such as ultrasound, this approach inevitably produces noisy data and spurious features. We addressed this by using a probabilistic registration method that is intended to be robust to outliers, along with a transformation model that is inherently limited in its total degrees of freedom to prevent overfitting. Our mixture model might become more robust to outliers by using a Student’s t distribution for the positional data, rather than a Gaussian [10, 35, 36]. Alternatively, an improved feature detection method may reduce outliers and improve robustness. Scale-invariant features, such as SIFT [37] or SURF [38] features or their 3D variants [39], have been shown to provide high-quality correspondences for ultrasound images [40, 41] and have been successfully applied in the domain of cardiac imaging [42].
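The essential idea of extracting oriented points from edges can be sketched as below: image gradients supply both the edge locations (via a magnitude threshold) and the per-point orientations (the unit gradient direction). The threshold and the toy image are illustrative choices, not our actual detector settings.

```python
# Sketch of simple oriented-edge extraction: gradient magnitude gives
# edge locations, gradient direction gives each point's orientation.
# The threshold here is illustrative, not the paper's actual setting.
import numpy as np

def oriented_edge_points(image, threshold=0.2):
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ys, xs = np.nonzero(mag > threshold * mag.max())
    normals = np.stack([gx[ys, xs], gy[ys, xs]], axis=1)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    points = np.stack([xs, ys], axis=1).astype(float)
    return points, normals

# Toy "image": a bright disk on a dark background.
yy, xx = np.mgrid[0:64, 0:64]
img = ((xx - 32) ** 2 + (yy - 32) ** 2 < 15 ** 2).astype(float)
pts, nrm = oriented_edge_points(img)
```

On real ultrasound, speckle makes such a detector fire on many spurious locations, which is precisely why the downstream registration must tolerate outliers.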

The main computational bottleneck in our method was the exhaustive search for the 2D/3D match which, in our unoptimized MATLAB implementation on an older PC, took a few hours to run for each trial. This process is, however, amenable to a number of possible optimizations to be considered in future work. The nature of this search makes it straightforward to parallelize to whatever extent the available hardware allows. The regular structure of our generated slices, essentially forming a 3D grid over the axis height and two angular dimensions, leaves the approach open to a coarse-to-fine search to reduce the number of required 2D registrations. Alternatively, it may be possible to adopt more efficient discrete optimization methods by formulating our planar search in an MRF framework using a similar approach to that of Porchetto et al. [19] adapted to feature-based matching criteria. Restrictions on the TEE view could also be used to reduce the degrees of freedom of possible transformations [8], thereby reducing the dimensionality of the search space. Finally, integration with ECG might reduce the number of ultrasound frames needing to be considered by allowing us to select ultrasound images near the phase of the cardiac cycle at which the preoperative CT was gated [8]. In fact, because the search time is proportional to the number of ultrasound images in the sequence—which in our data sets was between 45 and 110 (see Table 1)—ECG integration alone could speed the search by between one and two orders of magnitude, which may be fast enough for execution during perioperative setup.
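The coarse-to-fine idea for the (height, angle, angle) slice grid can be sketched generically. In the sketch below, `slice_cost` is a hypothetical stand-in for the 2D registration error of the slice generated at those parameters, and the grid sizes are illustrative.

```python
# Sketch of a coarse-to-fine search over the (height, theta, phi) slice
# grid. `slice_cost` is a hypothetical stand-in for the 2D registration
# error of the slice at those parameters.
import numpy as np
from itertools import product

def coarse_to_fine(slice_cost, lo, hi, coarse=5, fine=5):
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    # Coarse pass over the full parameter box.
    axes = [np.linspace(l, h, coarse) for l, h in zip(lo, hi)]
    best = min(product(*axes), key=slice_cost)
    # Fine pass restricted to one coarse cell around the best sample.
    step = (hi - lo) / (coarse - 1)
    axes = [np.linspace(b - s, b + s, fine) for b, s in zip(best, step)]
    return min(product(*axes), key=slice_cost)

# Toy cost with a single optimum at (height, theta, phi) = (10, 0.3, -0.2).
target = np.array([10.0, 0.3, -0.2])
cost = lambda p: float(np.sum((np.asarray(p) - target) ** 2))
est = coarse_to_fine(cost, lo=[0, -1, -1], hi=[20, 1, 1])
```

With these illustrative grid sizes, the two passes evaluate 250 candidates instead of the 15625 a single dense 25-per-axis grid would require, though a coarse pass that is too sparse risks missing the basin of the true optimum.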

For a deformable organ such as the heart, the restriction of our method to the computation of a similarity transform may seem limiting; however, allowing in-plane deformation of the slice would make it difficult to differentiate frames of the TEE sequence. Rigid slice-to-volume registration appears to perform as well as nonrigid methods when the deformation of the anatomy is low between the two image acquisitions, but less well otherwise [19]. Our algorithm relies on this disparity to compute both the temporal and spatial registration simultaneously; it assumes that temporally aligned images should result in lower spatial registration error than those that are temporally misaligned. However, additional deformations may be present even in temporally well-registered images due to factors such as irregular heart beats, respiration, and positional differences. Though nonrigid registration may help accommodate these differences, nonrigid slice-to-volume registrations can also produce results which are anatomically unsound, despite having low geometric error [13]. Instead, remaining deformations along with uncertainty in the computed transformation may be partially ameliorated by adaptive visualization techniques [43].

After computing a similarity transform mapping ultrasound to CT coordinates perioperatively, it may be necessary to refine the transform intraoperatively to account for small changes in the relative placement of the probe. The use of a tracked ultrasound probe would enable us to automatically account for these movements without requiring further registration computations. In this case, our perioperative registration would serve to compute the mapping from intraoperative world coordinates to preoperative CT coordinates. Alternatively, the perioperative registration could be used as an initialization for a more efficient intraoperative registration to be executed online. The static nature of our preoperative CT limits the opportunity for dynamic image fusion beyond a simple similarity transform because we lack a model of the 3D heart motion; however, in AF patients, motion captured by preoperative dynamic CT may not be representative of the motion observed intraoperatively due to the irregular heart beats characterizing the condition.

It is important to note that the registration errors reported in this study are based on the nearest neighbor between the model and target point sets, as opposed to being based on specific anatomical landmarks. It is therefore possible to have cases of poor registration with very low error values, especially when dealing with nonrigid transformations. Our goal in this work was to demonstrate the overall feasibility of our approach; in future work, it will be important to validate our method with objective, landmark-based error metrics, possibly via synthetic data and/or a phantom study. In such a study, we could better analyze which sources of error in our pipeline, if any, need to be improved to reach clinical applicability, and verify the strength and correctness of the global minimum in our search.
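The caveat above can be made concrete with a small sketch of a nearest-neighbor RMS error. In the toy below, a point set shifted by exactly one inter-point spacing (anatomically wrong correspondences throughout) still receives a nearest-neighbor error well below the true per-landmark displacement; the helper name `nn_rms` and the data are illustrative.

```python
# Sketch of a nearest-neighbor RMS error, illustrating the caveat in
# the text: it measures distance to the closest target point, not to a
# matched anatomical landmark, so a misregistered result can still
# score well wherever the point sets overlap.
import numpy as np
from scipy.spatial import cKDTree

def nn_rms(model_pts, target_pts):
    dists, _ = cKDTree(target_pts).query(model_pts)
    return float(np.sqrt(np.mean(dists ** 2)))

target = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
# Shift by one inter-point spacing: every landmark is now 1.0 away
# from its true counterpart, yet most points land on *some* target.
shifted = target + np.array([1.0, 0.0])

nn_error = nn_rms(shifted, target)          # well below 1.0
landmark_error = 1.0                        # true per-landmark error
```

A landmark-based metric would report the full 1.0 displacement here, which is why the validation proposed above matters.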

Though most of our process is automated, we do rely on human input for segmentation purposes both preoperatively and during perioperative setup. In particular, we require manual segmentation of the four cardiac chambers from the preoperative CT, as well as perioperative manual delineation of points outlining the chambers in the intraoperative ultrasound. It may be possible to automate some or all of this work using modern automatic segmentation methods, such as those based on deep learning.

Conclusion

We presented a method to register a preoperative 3D cardiac CT scan to intraoperative 2D TEE ultrasound images. This was done by first optimizing a hierarchy of affine transformations using a novel generalized expectation maximization formulation and then solving a discretized slice-to-volume registration. Our approach computed the registration in both time and space, without relying on ECG gating for temporal alignment. Results, computed from data of four surgical patients, were quantitatively and qualitatively promising. Future work may include ground-truth validation of our method, optimization of our computations to reach sufficient speed for clinical application, and integration into a navigation system for minimally invasive beating-heart cardiac surgery.