1 Introduction

Over the past two to three decades, ultrasound (US) imaging has evolved into the preferred, standard-of-care imaging modality for the diagnosis, screening, monitoring, and real-time guidance of a wide range of conditions. Thanks to its real-time capabilities, relatively low cost (compared to other modalities), and lack of ionizing radiation, US imaging has become the “first-line” modality for patient screening, diagnosis, and cardiac function assessment.

Trans-esophageal echocardiography (TEE) enables heart imaging while minimizing signal attenuation and optimizing the field of view. As such, TEE is used not only for screening and diagnosis, but also for intra-operative therapy monitoring and image-guided cardiac interventions. Since the mid-2000s, TEE technology has accommodated 3D image acquisition and visualization of the cardiac anatomy in lieu of simple 2D renderings. However, despite the added benefit of 3D and 4D (3D + time) displays, the inherent trade-off between frame rate and extent of anatomy covered has led clinicians to resort to the acquisition and visualization of multi-planar (orthogonal bi-plane or tri-plane) images to estimate the parameters required to assess cardiac function (e.g., ejection fraction) or to identify critical features for image-guided therapy.

Despite their high frame rate, 2D US images are hampered by several well-known limitations, including challenging interpretation and uncertainty in identifying structures of interest due to the inherent speckle appearance. LV segmentation in echocardiography [1] has commonly been formulated as a contour-finding problem, with the active contour method [2, 3] being used extensively. Given its edge-based energy formulation, the active contour method is prone to local minima and is also sensitive to initialization. Inspired by active contours, the level set method [4, 5] uses both edge- and region-based energy, making it more robust and less sensitive to initialization.

Active shape [6] and active appearance models [7] incorporate knowledge of the LV shape and appearance from manually segmented training sets, but they assume a Gaussian distribution of the shape and appearance derived from those sets and require an initial approximation close to the final solution. Database-guided segmentation [8], on the other hand, overcomes the initialization problem by implicitly encoding prior knowledge from expert-annotated databases, yet at the expense of a highly complex search process. Other supervised learning techniques, such as artificial neural networks [9], have been used to detect endocardial border pixels using expert-annotated training sets, but they require large training sets and are unable to handle cases well outside of the training data.

In this work we present the implementation and clinical validation of an automatic workflow that encompasses well-evaluated filtering, segmentation, registration, and volume reconstruction techniques as a means to provide a rapid, robust, and accurate framework for feature tracking from multi-plane ultrasound image sequences. The proposed computational framework was developed in close collaboration with our echocardiography colleagues, motivated by the need to reduce user-induced bias and the uncertainty associated with manually identifying features from US image sequences. The impact and contribution of the proposed work is the integration of several image processing techniques (i.e., phase-based filtering, segmentation, registration and volume reconstruction) into a streamlined workflow that utilizes traditional standard-of-care images and fits seamlessly within the current workflows associated with both cardiac function assessment and intra-operative cardiac intervention guidance and monitoring.

2 Methodology

Speckle noise and signal dropouts inherent in US images render intensity-based approaches unreliable; rather, local phase-based approaches [10], theoretically invariant to intensity magnitude, have been preferred for detecting the endocardium. Here we exploit the robustness of phase-based feature detection and combine it with the power of graph cut-based techniques [11], which use both region and boundary regularization, to obtain a rapid, automatic, piecewise-smooth segmentation of the LV blood pool and muscle regions. In addition, we conducted a preliminary study using retrospective clinical patient data consisting of tri-plane (60\(^\circ \) to one another) TEE image sequences spanning the cardiac cycle to validate the proposed tools and demonstrate their clinical utility and performance against commercial, clinical-grade, clinician-operated software.

The proposed methodology encompasses three steps: (1) endocardial left ventricle (LV) feature extraction and blood-pool segmentation from the raw 2D multi-plane image sequences, (2) frame-to-frame feature tracking and propagation through the cardiac cycle using non-rigid image registration, and (3) 3D reconstruction of the LV blood pool geometry at the desired cardiac phases using spline-based interpolation and convex hull fitting.

2.1 LV Feature Extraction and Blood-Pool Segmentation

Image Pre-processing via Monogenic Filtering: Unlike intensity-based edge detection algorithms, which are inefficient at identifying features in US images, intensity-invariant local phase-based techniques have shown promising results [10], where a local phase of \(\pm \pi /2\) signifies high symmetry, while a local phase of 0 or \(\pi \) signifies high asymmetry [12]. The local phase computation of a 1D signal uses a complex analytic signal comprising the original signal as the real part and its Hilbert transform as the imaginary part. However, since the Hilbert transform is mathematically restricted to 1D with no straightforward extension to 2D and 3D, we used the method described in [13] to extend the concept of the analytic signal to higher dimensions using a monogenic signal. The higher-dimensional monogenic signal is generated by combining a bandpass Gaussian-derivative filter with a vector-valued odd filter (i.e., a Riesz filter). The low frequency variations in the local phase are extracted using a high spread (\(\sigma \)) Gaussian-derivative filter, while the high frequency components are extracted using a low spread (\(\sigma \)) Gaussian-derivative filter. The described monogenic filtering sequence is used to transform each of the three tri-plane 2D US images into corresponding “cartoon” images in which the blood pool and myocardial wall appear enhanced, facilitating their segmentation in the subsequent step.
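For illustration, a minimal sketch of the local phase computation is given below (in Python, not the MATLAB implementation used in this work); the frequency-domain Gaussian-derivative bandpass form, the spread values, and the function name are assumptions made for the example.

```python
import numpy as np

def local_phase(image, sigma):
    """Illustrative local-phase computation via a monogenic signal: a bandpass
    Gaussian-derivative radial filter combined with a Riesz (vector-valued odd)
    filter, evaluated in the frequency domain."""
    rows, cols = image.shape
    u = np.fft.fftfreq(cols)                 # horizontal frequency coordinates
    v = np.fft.fftfreq(rows)                 # vertical frequency coordinates
    U, V = np.meshgrid(u, v)
    radius = np.sqrt(U ** 2 + V ** 2)
    radius[0, 0] = 1e-6                      # avoid division by zero at DC

    bandpass = radius * np.exp(-(radius ** 2) * sigma ** 2)  # derivative-of-Gaussian style
    riesz_u = -1j * U / radius               # Riesz filter, x component
    riesz_v = -1j * V / radius               # Riesz filter, y component

    F = np.fft.fft2(image)
    even = np.real(np.fft.ifft2(F * bandpass))              # even (symmetric) response
    odd_u = np.real(np.fft.ifft2(F * bandpass * riesz_u))   # odd response, x
    odd_v = np.real(np.fft.ifft2(F * bandpass * riesz_v))   # odd response, y

    # local phase from the even and combined odd responses, in [0, pi]
    return np.arctan2(np.sqrt(odd_u ** 2 + odd_v ** 2), even)

# hypothetical usage: a large sigma narrows the passband to low frequencies,
# a small sigma passes the higher frequencies, as described above
# phase_low = local_phase(us_frame, sigma=40.0)
# phase_high = local_phase(us_frame, sigma=5.0)
```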

Fig. 1. Segmentation workflow: (a) original US image, (b) high spread (\(\sigma \)), low frequency monogenic filter applied to the “2D + time” image dataset, shown with the high confidence blood pool mask, (c) low spread (\(\sigma \)), high frequency monogenic filter output with blood pool removed, (d) “cartoon” image with enhanced regions, and (e) graph cut segmentation output (f) superimposed onto the original image.

Graph Cut-Based Segmentation: The resulting “cartoon” image is used to construct a four-neighborhood graph structure in which each pixel is connected to its east, west, north, and south neighbors. Three special nodes called terminals are added, representing three classes (labels): background, blood pool, and myocardium. The segmentation can be formulated as an energy minimization problem that finds the labeling f minimizing the energy:

$$\begin{aligned} E(f)~=~\sum \limits _{\{p,q\} \in \mathcal {N}} V_{p,q}(f_p,f_q)~+~\sum \limits _{p \in \mathcal {P}} D_p(i_p,f_p), \end{aligned}$$
(1)

where the first term represents the smoothness energy, which forces pixels p and q, defined by the set of interacting pairs \(\mathcal {N}\), towards the same label. The second term represents the data energy, which penalizes disagreement between the labeling f and the observed data \(i_p\). The links between each pixel and the terminals (i.e., t-links) are formulated as the negative logarithm of the normal distribution [14]:

$$\begin{aligned} D_p(i_p,f_p)=-\ln \left( \frac{1}{\sigma \sqrt{2 \pi }}\exp \left( -\frac{(i_p-\mu )^2}{2 \sigma ^2}\right) \right) , \end{aligned}$$
(2)

where \(\mu \) and \(\sigma \) are the mean and standard deviation of each of the three classes, estimated from the image. The links between neighboring pixels, called n-links, are weighted according to their similarity to formulate the smoothness energy:

$$\begin{aligned} V_{p,q}(f_p,f_q) = {\left\{ \begin{array}{ll} 2K \cdot T(f_p \ne f_q) &{} \text {if}~|I_p-I_q| \le C\\ K \cdot T(f_p \ne f_q) &{} \text {if}~|I_p-I_q| > C \end{array}\right. } \end{aligned}$$
(3)

where \(T(\cdot )\) is 1 if its argument is true and 0 otherwise, K is a constant, and C is an intensity threshold that forces neighboring pixels whose intensity difference is within the threshold towards the same label. The minimum cut, equivalent to the maximum flow, is obtained via the expansion algorithm in [11], yielding the segmentation of the background, blood pool, and myocardium (Fig. 1e).
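To make Eqs. (2) and (3) concrete, the sketch below computes the t-link and n-link weights for a toy image; the class statistics, K, and C are illustrative values, and the final multi-label minimum cut (via the expansion algorithm of [11]) would be computed by a dedicated graph-cut solver that is not reproduced here.

```python
import numpy as np

def data_costs(image, means, sigmas):
    """t-link weights, Eq. (2): negative log-likelihood of each pixel under a
    Gaussian model for each class (background, blood pool, myocardium)."""
    costs = [np.log(s * np.sqrt(2 * np.pi)) + (image - m) ** 2 / (2 * s ** 2)
             for m, s in zip(means, sigmas)]
    return np.stack(costs, axis=-1)          # shape: (rows, cols, n_labels)

def nlink_weight(i_p, i_q, K=1.0, C=10.0):
    """n-link weight, Eq. (3): similar neighbors receive the larger penalty 2K
    for being assigned different labels, dissimilar neighbors receive K."""
    return 2 * K if abs(i_p - i_q) <= C else K

# toy "cartoon" image with three intensity classes (illustrative values)
cartoon = np.random.randint(0, 256, (64, 64)).astype(float)
D = data_costs(cartoon, means=[20.0, 120.0, 220.0], sigmas=[15.0, 20.0, 15.0])
w = nlink_weight(cartoon[10, 10], cartoon[10, 11])
```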

2.2 Frame-to-Frame Feature Tracking and Propagation

Image Pre-processing: Once a single-phase image is segmented using the procedure outlined in Sect. 2.1, the extracted features are tracked and propagated throughout the cardiac cycle using non-rigid registration (Fig. 2). Prior to registration, the “2D + time” image sequence corresponding to each of the tri-plane views is first “prepared” by identifying a region-of-interest “bounding box” centered on the features that belong to the LV. To ensure the chosen “bounding box” spans the entire LV, including blood pool, myocardium, and surrounding region, this window is selected based on the high confidence blood pool mask obtained by applying the high spread Gaussian-derivative filter from Sect. 2.1 to the entire image sequence, followed by an isotropic dilation to ensure full coverage beyond the LV myocardial boundary. In addition, the mitral valve region is “trimmed” using a straight line joining the leaflet hinges.
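A minimal sketch of this ROI selection step is given below, assuming the high confidence blood-pool mask from Sect. 2.1 is available as a binary array; the dilation radius and function name are illustrative, not the values used in this work.

```python
import numpy as np
from scipy import ndimage

def lv_bounding_box(blood_pool_mask, dilation_radius=10):
    """Isotropically dilate the high-confidence blood-pool mask and return the
    tight bounding box (row_min, row_max, col_min, col_max) enclosing it."""
    structure = ndimage.generate_binary_structure(2, 1)
    dilated = ndimage.binary_dilation(blood_pool_mask, structure,
                                      iterations=dilation_radius)
    rows, cols = np.nonzero(dilated)
    return rows.min(), rows.max(), cols.min(), cols.max()
```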

Fig. 2. The frame-to-frame motion transforms (\(T_{(k-1)\rightarrow k}\)) are estimated by non-rigidly registering adjacent images in the sequence, then concatenated (\(T_{1\rightarrow k} = T_{(k-1) \rightarrow k} \cdot \ldots \cdot T_{1 \rightarrow 2}\)) and applied to the segmented end-diastolic (ED) frame (\(F_{k} = T_{1 \rightarrow k} \cdot F_{1} = T_{(k-1) \rightarrow k} \cdot \ldots \cdot T_{1 \rightarrow 2} \cdot F_{1}\)).

Non-rigid Registration Algorithm: The employed registration algorithm is a modified version of the biomechanics-based algorithm proposed by Lamash et al. [15]. The LV anatomy is modeled as a two-compartment model consisting of muscle (linear elastic, isotropic, and incompressible) and blood pool, with prescribed smoothness constraints that allow rapid motion of the endocardial contour. We initialize the algorithm by first discretizing the endocardial and epicardial contours, then constructing a mesh of the blood pool and myocardium. Rather than resorting to a rectangular grid, we account for the local curvature of the endocardial border using a finite-element-like mesh defined via linear shape functions. The algorithm deforms the mesh by estimating the deforming forces that minimize the sum of squared differences between the initial and target images (Fig. 3). To avoid large deformations and ensure a smooth displacement field, a linear elastic regularization approach [16] is utilized.
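The propagation of the segmented end-diastolic frame (Fig. 2) can be sketched as follows, with the biomechanics-based registration of [15] abstracted behind a placeholder `register_nonrigid` function that is assumed to return dense row/column displacement fields between adjacent frames; sequentially warping the propagated mask mirrors the concatenation of the frame-to-frame transforms.

```python
import numpy as np
from scipy import ndimage

def warp_mask(mask, disp_row, disp_col):
    """Warp a label mask with a dense displacement field (backward mapping:
    each output pixel samples the input at its position plus displacement)."""
    rows, cols = np.meshgrid(np.arange(mask.shape[0]),
                             np.arange(mask.shape[1]), indexing="ij")
    coords = np.array([rows + disp_row, cols + disp_col])
    return ndimage.map_coordinates(mask.astype(float), coords, order=0)

def propagate(ed_mask, frames, register_nonrigid):
    """Apply the frame-to-frame transforms T_(k-1)->k to the segmented
    end-diastolic mask; `register_nonrigid` is a placeholder for the
    registration algorithm described above."""
    masks = [ed_mask]
    for k in range(1, len(frames)):
        d_row, d_col = register_nonrigid(frames[k - 1], frames[k])
        masks.append(warp_mask(masks[-1], d_row, d_col))
    return masks
```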

Fig. 3. Registration workflow: (a) the original image is “prepared” by automatically identifying an LV-centered ROI (b) onto which the mesh is applied (c), then registered to the target image (d); the resulting displacement field (e) is applied to the pre-registered image (b) to obtain the registered image (f), which can be compared to the target image (d) by visualizing the digitally subtracted image (g).

2.3 3D LV Volume Reconstruction

Following the segmentation of each of the tri-plane views at end-diastole using the technique in Sect. 2.1 and their propagation throughout the cardiac cycle, the resulting images are re-inserted, for each cardiac phase, into a pseudo-3D image volume at the same orientations at which they were originally acquired (i.e., 60\(^\circ \) apart). The boundary points of the segmented contours at the same elevation are then fitted using the parametric variational cubic spline technique in [17]. The spline-interpolated data are used to generate a convex hull using the algorithm proposed in [18] (Fig. 4).
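A minimal sketch of this reconstruction step is shown below, assuming the segmented boundary points from the three views have already been placed at their 3D orientations (60\(^\circ \) apart); the SciPy spline and convex-hull routines are generic stand-ins for the methods of [17, 18], not the authors' code.

```python
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.spatial import ConvexHull

def reconstruct_lv_volume(contour_points):
    """contour_points: (N, 3) array of segmented boundary points in 3D.
    Per elevation (z), points are ordered by angle about their centroid,
    fitted with a periodic cubic spline, and the densified point cloud is
    enclosed in a convex hull whose volume approximates the LV blood pool."""
    dense = []
    for z in np.unique(contour_points[:, 2]):
        pts = contour_points[contour_points[:, 2] == z]
        centre = pts[:, :2].mean(axis=0)
        order = np.argsort(np.arctan2(pts[:, 1] - centre[1],
                                      pts[:, 0] - centre[0]))
        pts = pts[order]
        tck, _ = splprep([pts[:, 0], pts[:, 1]], s=0, per=True)
        x, y = splev(np.linspace(0, 1, 200), tck)
        dense.append(np.column_stack([x, y, np.full_like(x, z)]))
    hull = ConvexHull(np.vstack(dense))
    return hull.volume        # units follow the input coordinates (e.g., cm^3)
```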

Fig. 4. Schematic illustration of the 3D LV reconstruction: the tri-plane views at 60\(^\circ \) (a) are inserted at their appropriate orientation (b), followed by spline interpolation and convex hull generation (c).

3 Evaluation and Results

We conducted a preliminary study using retrospective tri-plane time series data spanning multiple cardiac cycles from patients who underwent TEE imaging for cardiac function assessment. Since the proposed framework encompasses three different components (automatic extraction of endocardial features, registration-based feature tracking and propagation, and volume reconstruction), we assessed the performance of each component against the ground truth, which consists of the blood-pool representation annotated manually by the expert clinician using the GE EchoPac PC clinical software. In addition, we evaluated the computational performance at each stage of our application running in MATLAB on an Intel\(^\circledR \) Xeon\(^\circledR \) 3.60 GHz 32 GB RAM PC.

Automatic Direct Frame Endocardial Feature Extraction Evaluation: We first evaluated the accuracy of our automatic, direct frame endocardial feature extraction component against expert manual annotation of the same features from the same frames performed by a cardiologist using the GE EchoPac PC clinical software. Table 1 summarizes the blood-pool area measurements annotated by the expert (Ground Truth) and the areas obtained via A — automatic feature detection from individual frames; B — single phase automated feature detection + registration-based propagation; and C — single phase expert manual annotation + registration-based propagation. Measurements are evaluated at two cardiac phases — end-diastole (ED) and end-systole (ES) — and averaged across all views and the multiple cardiac cycles spanned by the acquired sequences. Our automatic blood-pool extraction technique required 26.5 s to segment a “2D + time” 15-frame TEE tri-plane sequence.

Table 1. Comparison between the blood-pool area measurements (Mean \(\pm \) Std. Dev. [cm\(^2\)]) annotated by the expert (Ground Truth) and the area obtained via A — automatic feature detection from individual frames; B — single phase automated feature detection + registration-based propagation; and C — single phase expert manual annotation + registration-based propagation. Measurements are evaluated at two cardiac phases — end-diastole (ED) and end-systole (ES) — and averaged across all views and cardiac cycles spanned by the acquired data.

Registration-Based Blood-Pool Tracking and Propagation Evaluation: To evaluate the accuracy with which the non-rigid registration algorithm propagates the extracted features throughout the cardiac cycle, we employed several metrics, including the DICE coefficient, Hausdorff distance, mean absolute distance (MAD) error, and endocardial target registration error (TRE), computed between the ground truth blood pool manually annotated by the expert and the blood pool depicted via the three other methods under consideration (Table 2).

Table 2. Mean \(\pm \) Std. Dev. of several metrics — DICE Coefficient [%], Hausdorff Distance [mm], Mean Absolute Distance (MAD) Error [mm], and Endocardial TRE [mm] — used to compare the expert clinicians’ blood-pool annotations (Ground Truth) with the blood-pool annotation obtained via A — automatic feature detection from individual frames; B — single phase automated feature detection + registration-based propagation; and C — single phase expert manual annotation + registration-based propagation. Measurements are evaluated at two cardiac phases — end-diastole (ED) and end-systole (ES).
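For completeness, the sketch below shows generic implementations of the overlap and distance metrics reported in Table 2 (not the evaluation code used in this study); masks are assumed to be binary arrays and contours (N, 2) point sets in millimeters.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary blood-pool masks, in percent."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    return 200.0 * intersection / (mask_a.sum() + mask_b.sum())

def hausdorff_distance(contour_a, contour_b):
    """Symmetric Hausdorff distance between two contour point sets."""
    return max(directed_hausdorff(contour_a, contour_b)[0],
               directed_hausdorff(contour_b, contour_a)[0])

def mean_absolute_distance(contour_a, contour_b):
    """Mean absolute distance: per-point closest distances, symmetrised."""
    d = np.linalg.norm(contour_a[:, None, :] - contour_b[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```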

Figure 5 visually compares the ground truth blood-pool annotation performed by the expert clinician to that extracted via direct frame feature identification, as well as to the registration-based propagation of the single-frame blood pool annotated either manually by the expert or automatically using the first component of our proposed framework. The segmentation propagation technique required 162 s to run through a 15-frame tri-plane TEE sequence.

Fig. 5. Visual comparison of the blood-pool annotations achieved via A — automatic feature detection from individual frames; B — single phase automated feature detection + registration-based propagation; and C — single phase expert manual annotation + registration-based propagation vs. the ground truth expert manual blood-pool annotation (GT), quantified at end-diastole (ED) and end-systole (ES) for the three tri-plane views (V1, V2 and V3). White regions are common between the GT and each of the three A, B and C blood-pool estimates, red regions belong to the expert annotated blood pool (GT), while the blue regions belong to the blood-pool area depicted by each of the three annotation methods A, B or C under comparison. Panels are named according to the same convention — i.e., the panel labeled GT-B V2 ES compares the ground truth expert-annotated blood pool (GT) to the blood pool annotated using Method B, displayed in View 2 at end-systole (Color figure online).

3D Volume Reconstruction and Ejection Fraction Evaluation: Lastly, we assessed the accuracy of the 3D LV reconstruction procedure by comparing the reconstructed LV volume to that estimated by the GE EchoPac PC clinical software following expert manual segmentation. The end-diastolic and end-systolic volume measurements are summarized in Table 3, along with the corresponding ejection fraction measurements. Performance-wise, the LV volume reconstruction from a tri-plane sequence requires 11.6 s.
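For reference, the ejection fraction (EF) reported in Table 3 follows the standard definition based on the end-diastolic volume (EDV) and end-systolic volume (ESV):

$$\begin{aligned} EF~=~\frac{EDV - ESV}{EDV} \times 100\,\%. \end{aligned}$$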

Table 3. Comparison between the LV blood-pool volume and Ejection Fraction (EF) between expert manual annotations (Ground Truth) and A — automatic feature detection from individual frames; B — single phase automated feature detection + registration-based propagation; and C — single phase expert manual annotation + registration-based propagation. Measurements were evaluated at two cardiac phases — end-systole (ES) and end-diastole (ED).

4 Discussion

We described the implementation and clinical evaluation of an automatic framework that integrates well-evaluated filtering, segmentation, registration, and volume reconstruction techniques to provide rapid, robust, and accurate feature tracking from multi-plane ultrasound image sequences. All components of the proposed technique — segmentation, registration-based feature tracking and propagation, and 3D blood-pool volume reconstruction — were assessed against expert manual segmentation at both the systolic and diastolic cardiac phases and demonstrated accurate and consistent performance, while significantly reducing user-induced variability. Furthermore, unlike other techniques that operate on 3D datasets, this technique enables rapid and consistent analysis of multi-plane 2D US image sequences — the standard format for acquisition, interpretation, and analysis of cardiac US images.

As the proposed workflow integrates multiple algorithms, the influence of the various parameters on the segmentation result is an important consideration. The center frequency of the monogenic filter can be varied over a wide range of values while still yielding a good quality “cartoon” image for further segmentation. Similarly, for the graph cut algorithm, the means and standard deviations of the blood pool, muscle, and background regions are adaptively extracted from the image content, while the threshold C that constrains neighboring pixels towards the same label can span a sufficiently wide range without significantly affecting the segmentation result. Furthermore, Lamash et al. [15] have thoroughly studied the effects of the various regularization parameters in the biomechanics-based registration; for our purpose we selected the parameters suggested in [15]. In summary, the proposed workflow yields a consistent segmentation result over a wide range of parameter values.

Unlike expert manual segmentation, which is highly sensitive to intra- and inter-observer variability, the proposed technique provides a consistent result for each dataset, which can be reviewed and, if needed, improved by expert clinicians. The single-phase feature extraction, followed by registration-based tracking and propagation, further reduces uncertainty by avoiding the need to segment each frame independently, instead using the a priori frame information along with the image sequence to achieve an optimal segmentation. Hence, should the expert clinician choose to adjust the single-phase segmentation, the precise tracking and propagation of those adjustments throughout the cardiac cycle is guaranteed by the registration-based implementation.

5 Summary and Future Work

The impact and contribution of the proposed work is the integration of several image processing techniques (i.e., phase-based filtering, segmentation, registration and volume reconstruction) into a streamlined workflow that utilizes traditional standard of care images and fits seamlessly within the current workflows associated with both cardiac function assessment and intra-operative cardiac intervention guidance and monitoring.

Ongoing and future efforts include further evaluation and demonstration of how the proposed technique can be used to dynamically reconstruct 3D endocardial LV representations that facilitate computer-assisted assessment of stroke volume and ejection fraction, as well as to employ intra-operative multi-plane 2D TEE data to dynamically update and animate CT and/or MRI anatomy depicted pre-operatively to better represent the intra-operative conditions. Lastly, although we believe the most meaningful assessment is still against the expert clinicians' analysis of the same input data, we acknowledge the importance of assessing the output of our proposed framework against the output of other techniques and of extending the analysis to a larger dataset of multi-plane image sequences acquired across multiple cardiac cycles.

Besides its direct application to computer-aided cardiac function assessment, the proposed framework is readily adaptable to the guidance and monitoring of image-guided cardiac interventions, most of which involve the use of real-time ultrasound imaging, the clinical standard of care for cardiac procedures.