Introduction

Ultrasound (US) imaging is a low-cost, real-time, and non-ionizing modality, which makes it an appealing choice for image-guided computer-assisted interventions in radiation therapy. Treatments of liver tumors using high-intensity focused ultrasound, intensity-modulated radiation therapy, or proton therapy enable precise dose delivery to the desired location. However, the target region is affected during treatment by internal body motion, such as breathing, which is a major impediment to the effectiveness of these treatments. Not taking respiratory motion into account would cause the delivered dose distribution to deviate from the intended one, increasing the radiation exposure of healthy tissue while lowering the dose to the target volume, thereby reducing treatment efficacy and aggravating complications [1].

One strategy to reduce breathing-induced organ motion during radiation treatment is the deep-inspiration breath-hold method [2], where a patient performs a supervised breath hold during therapy; this requires active support and the ability of the patient to maintain such a breath hold. Another approach to compensate for breathing motion, which does not require patient compliance, is to track the position of the target region during therapy and dynamically adjust the radiation accordingly.

To use motion tracking algorithms in radiation therapy interventions, real-time, accurate, and robust localization of the target region is required throughout the entire procedure. US imaging, being non-ionizing and real-time, is an ideal choice for this aim [3]. Numerous studies have focused on tracking liver motion in US image sequences using different approaches, such as image registration [4], block matching [5], and optic flow [6]. However, these methods are generally affected by limitations of US imaging, such as low signal-to-noise ratio (SNR) and large appearance changes of the tracked landmarks caused by, e.g., acoustic shadowing due to poor transducer–skin contact or highly reflecting anatomical structures like the ribs.

In this work, we propose to use supporters, a computer vision technique [7], to improve optic flow-based tracking. The technique relies on tracking additional image features that are potentially useful for predicting the target position. To that end, a supporter model is built based on the motion coupling observed over previous frames between these tracked features (supporters) and the target. Using this model, tracking can be made robust to changes in target appearance, where a consensus vote of several supporter estimations is used to infer the target location.

Considering motion tracking in medical images, supporters were used earlier for determining two orthogonal MR acquisition planes through the heart valve [7]. Instead of the valve itself, which may leave the image, four annotated points (supporters) on a plane perpendicular to the valve were tracked to define the acquisition planes. A supporter model based on squared Euclidean distances was used to downweight distant supporters. In [8], supporters were used for tracking abnormalities in video capsule endoscopy. First, the supporters were matched between successive frames under a triangular constraint, where the triangle shape is maintained while allowing weak deformations. Then, affine transformations calculated from the supporter triplets help localize the abnormality, whose precise position is estimated from the features of the target itself. In [9], cells were tracked in spatiotemporal optical images of densely packed multilayer tissues. The tight spatial topology of neighboring cells was exploited as contextual information by applying spatiotemporal graph labeling. In [10], 600 supporters were detected in fluoroscopy images using the Kanade–Lucas–Tomasi feature tracker for automatic motion compensation. An autoregression model and motion clustering were employed to learn the relationship between supporter and target motion. Supporters were also used in many other typical computer vision applications, e.g., in [11,12,13,14,15]. Supporters have not previously been studied for motion tracking in US images. We hereby show that this method is particularly beneficial in cases where the target cannot be observed directly, for instance due to occlusion by shadowing artifacts.

Particular challenges of US tracking are poor image quality and the relatively small number of landmarks suitable for tracking. Nevertheless, the relative locations of liver landmarks remain stable during radiation therapy of liver tumors, which motivates the use of supporters in this work for 2D US tracking of the liver. We hereby devise an approach for effective supporter model creation from few supporters and evaluate it on a standard public dataset.

Methods

Motion tracking is the process of estimating the trajectory of an object over time by predicting its position in every frame of an image sequence. For image-guided computer-assisted applications, targets in moving organs such as the liver, prostate, and heart are commonly tracked. Tracking an object's position can be challenging, e.g., due to appearance changes over time, low SNR, or occlusions. In US images, the tracked target can temporarily disappear by leaving the field of view or by being covered by a shadow due to poor transducer–skin contact or highly reflecting anatomical structures such as the ribs. To improve the robustness of a conventional tracking algorithm in such cases, we propose combining it with a supporter model, which takes advantage of correlated surrounding motion.

Tracking with a supporter model

Grabner et al. [7] proposed a method for "tracking the invisible" using a set of local image features, called supporters, exploiting the visual context and relative spatial relations to improve target tracking. Good supporters were defined as image features whose motion is correlated with that of the target and which might thus be useful for predicting the target's position. For example, a wristwatch on a hand holding a target object is a good supporter for the position of that target (even when the target is not directly visible or trackable), since their motions are strongly correlated. Below, we first summarize the supporter model [7] for the sake of completeness and then describe our methods for its adaptation in this work.

Overview of supporter modeling Tracking with supporters has two main modes: learning the model and applying the model. The model captures the statistical relationship between the target and supporter positions, thereby providing a measure of how strongly the motion of each supporter is coupled with that of the target. This measure is then used to adjust the contribution of each supporter to the overall supporter prediction.

The overall goal is to learn and apply a probability density function (pdf) model, \(P({\mathbf {x}}|{\mathbf {I}})\), for predicting the position of the target object, \({\mathbf {x}}=(x,y)\), in image \({\mathbf {I}}\) with the help of S tracked supporter positions \(\{{\mathbf {x}}_{s}|s=1,2, \ldots ,S\}\). For this aim, the relationship between the supporter positions \(\{{\mathbf {x}}_{s}\}\) and the target position \({\mathbf {x}}\) is learned, providing a conditional pdf \(P({\mathbf {x}}|{\mathbf {x}}_{s})\) for each supporter s. Each supporter s then votes for potential target positions \({\mathbf {x}}\) via the pdf \(P({\mathbf {x}}|{\mathbf {x}}_{s})\). These votes are combined, accounting for the reliability of the supporter position estimates \({\mathbf {x}}_{s}\) in \({\mathbf {I}}\) with probability \(P({\mathbf {x}}_{s}|{\mathbf {I}})\), into the following pdf via the law of total probability:

$$\begin{aligned} P({\mathbf {x}}|{\mathbf {I}}) \propto \sum _{s=1}^{S} P({\mathbf {x}}|{\mathbf {x}}_{s})P({\mathbf {x}}_{s}|{\mathbf {I}}). \end{aligned}$$
(1)

The final target position is then determined by finding the position that has the highest likelihood in the voting space.
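
As a minimal illustration of this voting scheme, the following Python sketch combines per-supporter vote maps rasterized on a common image grid and takes the argmax; note that the concrete instantiation used later in this work replaces the grid argmax with a weighted mean, and all names here are illustrative:

```python
import numpy as np

def combine_votes(vote_pdfs, reliabilities):
    """Eq. (1): accumulate reliability-weighted supporter votes and
    return the grid position with the highest likelihood.

    vote_pdfs:     list of 2D arrays; vote_pdfs[s][y, x] ~ P(x | x_s)
    reliabilities: list of scalars;  reliabilities[s]   ~ P(x_s | I)
    """
    posterior = sum(r * pdf for pdf, r in zip(vote_pdfs, reliabilities))
    y, x = np.unravel_index(np.argmax(posterior), posterior.shape)
    return np.array([x, y])  # most likely target position
```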

Learning a supporter model Let \({\mathbf {I}}^0, {\mathbf {I}}^1, \ldots ,{\mathbf {I}}^{F-1}\) be an image sequence consisting of F image frames, \(\{{\mathbf {x}}_{s}^{0}|s=1,2,\ldots ,S\}\) be the set of S supporter positions in the first frame \({\mathbf {I}}^0\), and \({\mathbf {x}}^{0}\) be the target position in \({\mathbf {I}}^0\). The goal of the model is to estimate, for frame \({\mathbf {I}}^f\), the most likely target position \({\mathbf {x}}^f\) from the observed supporter positions \(\{{\mathbf {x}}_{s}^f\}\). Assuming a translational relationship, this is based on learning, per supporter s, the conditional pdf of the relative target position \(\mathbf {u}_{s}={\mathbf {x}}-{\mathbf {x}}_{s}\) for a given \({\mathbf {x}}_{s}\). For on-line learning during tracking, the exponential forgetting principle is applied between the pdf model learned so far, \(P^{f-1}(\cdot )\), and the current pdf \(p(\cdot )\):

$$\begin{aligned}&P^{f}(\mathbf {u}_{s}|{\mathbf {x}}_{s}) = \alpha \ P^{f-1}(\mathbf {u}_{s}|{\mathbf {x}}_{s}) + (1 - \alpha ) \ p(\mathbf {u}^{f}_{s}|{\mathbf {x}}_{s}^f), \end{aligned}$$
(2)
$$\begin{aligned}&P^{f}({\mathbf {x}}_{s}|{\mathbf {I}}) = \alpha \ P^{f-1}({\mathbf {x}}_{s}|{\mathbf {I}}) + (1 - \alpha ) \ p({\mathbf {x}}_{s}^f|{\mathbf {I}}^{f}) , \end{aligned}$$
(3)

where the forgetting factor \(\alpha \in [0,1]\) weights the contributions of the past and current pdfs. \(P^{f}(\mathbf {u}_{s}|{\mathbf {x}}_{s})\) is the model learned from frames 1 to f and provides the pdf of supporter position \({\mathbf {x}}_{s}\) voting for the relative target position \(\mathbf {u}_{s}\). \(p(\mathbf {u}^{f}_{s}|{\mathbf {x}}_{s}^f)\) is the corresponding pdf derived only from the tracked positions in the current frame f. \(P^{f}({\mathbf {x}}_{s}|{\mathbf {I}})\) is the reliability model of the supporter position estimation learned from frames 1 to f, and \(p({\mathbf {x}}_{s}^f|{\mathbf {I}}^{f})\) defines the reliability of the supporter position \({\mathbf {x}}_{s}^f\). We explain how \(P^{f}(\cdot )\) and \(p(\cdot )\) are defined in practice in "Robust motion tracking by estimating the target position using supporters" section.
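
Both updates share the same exponentially weighted moving-average form; a one-line sketch (illustrative name):

```python
def exp_forget(model_prev, obs_cur, alpha):
    """Exponential forgetting (Eqs. 2-3): blend the model learned up to
    frame f-1 with the observation from frame f. A larger alpha gives
    the past a longer memory; alpha must lie in [0, 1]."""
    return alpha * model_prev + (1.0 - alpha) * obs_cur
```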

Applying the supporter model Given image \({\mathbf {I}}^f\) and the tracked supporter positions \(\{{\mathbf {x}}_{s}^f\}\), the learned supporter models \(P^{f}(\mathbf {u}_{s}|{\mathbf {x}}_s)\) and \(P^{f}({\mathbf {x}}_{s}|{\mathbf {I}})\) are evaluated for \({\mathbf {x}}_{s}={\mathbf {x}}_{s}^{f}\) and \({\mathbf {I}}={\mathbf {I}}^f\). From this, the target position \({\mathbf {x}}^f\) is estimated by using Eqs. (2) and (3) in Eq. (1), where the pdfs of the relative target positions are brought into the target space via \(P^{f}({\mathbf {x}}= \mathbf {u}_{s}+{\mathbf {x}}_{s}^f|{\mathbf {x}}_{s}^f) = P^{f}(\mathbf {u}_{s}|{\mathbf {x}}_{s}^f)\), i.e.,

$$\begin{aligned}&{\mathbf {x}}^f=\arg \max _{{\mathbf {x}}} P({\mathbf {x}}|{\mathbf {I}}^f) \quad \text{ with } \quad \nonumber \\&\quad P({\mathbf {x}}|{\mathbf {I}}^f) = \sum _{s=1}^S P^{f}({\mathbf {x}}|{\mathbf {x}}_{s}^f)P^{f}({\mathbf {x}}_{s}^f|{\mathbf {I}}^{f}). \end{aligned}$$
(4)

Robust motion tracking by estimating the target position using supporters

Tracking with supporters requires another tracking method to compute the supporter locations and their reliability. Supporters can then assist and correct such a baseline method to achieve improved tracking results. We first summarize our method for a generic object tracker (see also Algorithm 1) and then instantiate it with a particular tracking method below.


Input data Our method uses a given initial target position \({\mathbf {x}}^{0}\), a fixed set of initial supporter positions \(\{{\mathbf {x}}^{0}_{s}\}\), and reference patches around the target, \(\mathbf {B}^{0}\), and around each supporter, \(\{\mathbf {B}^{0}_{s}\}\), where the positions and reference patches are manually annotated in the first image frame \({\mathbf {I}}^0\). Note that the reference patches are manually chosen to contain image appearance that is distinct from their surroundings. For the current frame \(f>0\), we obtain target and supporter position estimates from the conventional object tracker, denoted as \({\mathbf {x}}^{f}_{t}\) and \(\{{\mathbf {x}}^{f}_{s}\}\), respectively.

Tracking reliability Assuming that the feature appearance changes only linearly during tracking, we use the correlation coefficient between image patches to estimate tracking reliability. For this, we extract patches \(\mathbf {B}^{f}\) and \(\mathbf {B}^{f}_{s}\), of the same size as \(\mathbf {B}^{0}\) and \(\mathbf {B}^{0}_{s}\), centered around the tracked positions \({\mathbf {x}}^{f}_{t}\) and \({\mathbf {x}}^{f}_{s}\), respectively. Then, we calculate the correlation coefficient between the corresponding patches, i.e., \(\rho ^f = CC(\mathbf {B}^{0},\mathbf {B}^{f})\) and \(\rho ^f_s = CC(\mathbf {B}^{0}_{s},\mathbf {B}^{f}_{s})\). We employ the reliability measure \(\rho ^f\) to decide whether to rely on the current target position for tracking and for updating the model. Specifically, if \(\rho ^{f} \ge \theta _{CC}\), a learned threshold, we assume reliable object tracking and use this position, i.e., \({\mathbf {x}}^{f} = {\mathbf {x}}^{f}_{t}\). Furthermore, for another threshold \( \theta _{\textit{update}} > \theta _{CC}\), if \(\rho ^{f} \ge \theta _{\textit{update}}\), then the supporter model is updated as described next.
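
A minimal sketch of this reliability computation (Pearson correlation coefficient between the reference patch and the patch at the tracked position); helper names and the patch size are illustrative:

```python
import numpy as np

def extract_patch(img, center, half):
    """Patch of size (2*half+1)^2 centered at (x, y); assumes the
    patch lies fully inside the image."""
    x, y = int(round(center[0])), int(round(center[1]))
    return img[y - half:y + half + 1, x - half:x + half + 1]

def cc(patch_a, patch_b):
    """Pearson correlation coefficient between two equal-size patches."""
    a = patch_a.astype(float).ravel() - patch_a.mean()
    b = patch_b.astype(float).ravel() - patch_b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

# Usage: rho_f = cc(B0, extract_patch(I_f, x_t_f, half=15))
#        reliable = rho_f >= theta_cc
```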

Fig. 1 (a) Illustration of a supporter voting for a target position (arrow), with a probability distribution (image intensities) defined by mean \(\varvec{\mu }\) and covariance \(\mathbf {C}_{s}\). (b) Illustration of a 1D Gaussian mixture model (red) formed from two individual distributions (green and blue), with mean values indicated by vertical lines

Fig. 2 Example of tracker and supporter predictions: target position from the main object tracker \({\mathbf {x}}^{f}_{t}\) (green), individual supporter predictions \(\varvec{\mu }^{f}_{s}+{\mathbf {x}}^{f}_{s}\) (blue), and the weighted mean from the Gaussian mixture model \({\mathbf {x}}^{f}_{p}\) (red), overlaid on (a) a US image and (b) the log-transformed probability density

Supporter model learning The supporter model \(P^f(\mathbf {u}_{s}|{\mathbf {x}}_{s}^f)\) from Eq. (2) is approximated with a 2D Gaussian distribution by

$$\begin{aligned} P^f(\mathbf {u}_{s}|{\mathbf {x}}_{s}^f) \propto \frac{1}{2 \pi \sqrt{ |\mathbf {C}_{s}^f|}} \exp \left( -\frac{1}{2} (\mathbf {u}_{s} - \varvec{\mu }_{s}^f)(\mathbf {C}_{s}^f)^{-1}(\mathbf {u}_{s} - \varvec{\mu }_{s}^f)^\intercal \right) , \end{aligned}$$
(5)

where \(\varvec{\mu }_{s}^f\) and \(\mathbf {C}_{s}^f\) denote the on-line learned mean and covariance matrix, respectively, of the relative target positions \(\mathbf {u}^{f}_{s}\) across frames, i.e.,

$$\begin{aligned} \varvec{\mu }^{f}_{s}&= \alpha \ \varvec{\mu }^{f-1}_{s} + (1-\alpha ) \ \mathbf {u}^{f}_{s}, \end{aligned}$$
(6)
$$\begin{aligned} \mathbf {C}^{f}_{s}&= \alpha \ \mathbf {C}^{f-1}_{s} + (1-\alpha ) \ \mathbf {C}_{s}, \end{aligned}$$
(7)

where the covariance matrix \(\mathbf {C}_{s}\) captures the variance contribution of the current relative target position \(\mathbf {u}^{f}_{s} = [u^{f}_{s}, v^{f}_{s}]\) with respect to the current mean \(\varvec{\mu }^{f}_{s} =[\mu ^{f}_{s,u}, \mu ^{f}_{s,v}]\):

$$\begin{aligned} \mathbf {C}_{s} = \begin{bmatrix} \left( u^{f}_{s} - \mu ^{f}_{s,u}\right) ^{2}&0 \\ 0&\left( v^{f}_{s} - \mu ^{f}_{s,v}\right) ^{2} \end{bmatrix}. \end{aligned}$$
(8)

An illustration of such a distribution is shown in Fig. 1a.
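
A minimal sketch of this on-line update (Eqs. 6–8), assuming each supporter's state is kept as a mean vector and covariance matrix:

```python
import numpy as np

def update_gaussian(mu_prev, C_prev, u_cur, alpha):
    """On-line update of one supporter's Gaussian vote model.

    mu_prev: (2,) mean of the relative target position from frame f-1
    C_prev:  (2, 2) covariance from frame f-1
    u_cur:   (2,) current relative target position u_s^f = x - x_s
    """
    mu = alpha * mu_prev + (1.0 - alpha) * u_cur          # Eq. (6)
    d = u_cur - mu                                        # deviation from the updated mean
    C = alpha * C_prev + (1.0 - alpha) * np.diag(d * d)   # Eqs. (7)-(8)
    return mu, C
```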

Supporter model application We use the supporter model to predict the target position \({\mathbf {x}}^{f}\) if the tracked target position \({\mathbf {x}}_t^{f}\) is not reliable (i.e., \(\rho ^{f} < \theta _{CC}\)). The most likely relative target location per supporter s is mean \(\varvec{\mu }^{f}_{s} = \arg \max _{\mathbf {u}} P^f(\mathbf {u}|{\mathbf {x}}_{s}^f)\), with corresponding probability \(P^f(\varvec{\mu }^{f}_{s}|{\mathbf {x}}_{s}^f) = 1/(2 \pi \sqrt{|\mathbf {C}_{s}^f|})\). Instead of predicting the target from the peak of the resulting Gaussian mixture model (GMM) distribution (see Fig. 1b for a 1D illustration), we use a weighted average of the mean values from all mixture components [16] and incorporate the reliability of the supporter position predictions, i.e., \(P^f({\mathbf {x}}_{s}^f|{\mathbf {I}}^{f}) = \rho ^f_{s}\). The prediction from all supporters is then

$$\begin{aligned} {\mathbf {x}}^{f}_{p} &= \frac{\sum _{s} (\varvec{\mu }^{f}_{s}+{\mathbf {x}}^{f}_{s}) \, P^f(\varvec{\mu }^{f}_{s}|{\mathbf {x}}_{s}^f) \, P^f({\mathbf {x}}_{s}^f|{\mathbf {I}}^{f})}{\sum _{s} P^f(\varvec{\mu }^{f}_{s}|{\mathbf {x}}_{s}^f) \, P^f({\mathbf {x}}_{s}^f|{\mathbf {I}}^{f})} \nonumber \\ &= \frac{\sum _{s} (\varvec{\mu }^{f}_{s}+{\mathbf {x}}^{f}_{s}) \, \rho ^f_{s}/\sqrt{|\mathbf {C}_{s}^f|}}{\sum _{s} \rho ^f_{s}/\sqrt{|\mathbf {C}_{s}^f|}}. \end{aligned}$$
(9)

Finally, if the applied supporter model and the main object tracker agree on the target position estimate, i.e., \(P({\mathbf {x}}^{f}_{t}) = \sum _{s} P^f({\mathbf {x}}^{f}_{t}-{\mathbf {x}}^{f}_{s}|{\mathbf {x}}_{s}^{f}) \rho ^f_{s} \ge \theta _{P}\), then the estimate from the main tracker is used: \({\mathbf {x}}^{f} = {\mathbf {x}}^{f}_{t}\). Otherwise, we use the supporter prediction \({\mathbf {x}}^{f} = {\mathbf {x}}^{f}_{p}\).
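
A sketch of this prediction and agreement check (Eq. 9 and the threshold \(\theta _{P}\)), assuming per-supporter Gaussians as in Eq. (5); all names are illustrative:

```python
import numpy as np

def gauss_pdf(u, mu, C):
    """Evaluate the 2D Gaussian of Eq. (5) at relative position u."""
    d = u - mu
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(C)))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(C) @ d)

def supporter_prediction(mus, Cs, xs, rhos):
    """Eq. (9): weighted mean of per-supporter predictions mu_s + x_s,
    weighted by peak height 1/sqrt(|C_s|) and reliability rho_s."""
    w = np.array([r / np.sqrt(np.linalg.det(C)) for C, r in zip(Cs, rhos)])
    preds = np.array([mu + x for mu, x in zip(mus, xs)])
    return (w[:, None] * preds).sum(axis=0) / w.sum()

def agrees_with_tracker(x_t, mus, Cs, xs, rhos, theta_p):
    """Agreement check: the reliability-weighted supporter likelihood
    of the tracker's estimate x_t must reach theta_p."""
    p = sum(r * gauss_pdf(x_t - x, mu, C)
            for mu, C, x, r in zip(mus, Cs, xs, rhos))
    return p >= theta_p
```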

An example for target position estimation using the supporter model is shown in Fig. 2.

Experiments and results

We evaluated our method using the 2D liver US image sequences provided by the Challenge on Liver Ultrasound Tracking (CLUST)-2015 [17]. A main advantage of supporters is robustness to changes in feature appearance during tracking, for instance, when the target is occluded by acoustic shadowing. Since such disappearing target locations are not (and cannot reliably be) annotated in the given dataset, we devised a simulation framework to emulate acoustic shadowing artifacts from the ribs in the images and evaluated this scenario. As the baseline object tracker, we employed [6], so that motion tracking with and without the supporter model can be compared.

CLUST-2015 dataset

The CLUST-2015 dataset includes 2D liver US image sequences and consists of two subsets, a training and a test set. The sequences have durations between 60 and 330 s. The training set contains 24 image sequences with manual annotations in 10% of all frames. The annotations are mostly of vessel cross sections in the liver, which are reliable landmarks for liver motion. The test set contains 24 image sequences with no public annotations apart from the reference positions \({\mathbf {x}}^0\); submitted results are evaluated by the challenge organizers. For the evaluation, the Euclidean distance between each manual annotation and the corresponding tracked point is computed, and summary error statistics including the mean, standard deviation, and 95%ile errors are reported to the participants. In this work, we are particularly interested in reducing the 95%ile error, i.e., minimizing large errors for robust tracking performance throughout all sequences.
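
For reference, a minimal sketch of this evaluation metric, assuming tracked and annotated positions are given as N x 2 arrays in mm:

```python
import numpy as np

def clust_error_stats(tracked, annotated):
    """Per-annotation Euclidean tracking errors and the summary
    statistics reported by the CLUST evaluation (in mm)."""
    err = np.linalg.norm(tracked - annotated, axis=1)
    return {"mean": err.mean(),
            "std": err.std(),
            "95%ile": np.percentile(err, 95),
            "max": err.max()}
```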

For parameter optimization and sensitivity analysis, we used the training set. Our method has four parameters to optimize: the forgetting factor \(\alpha \), the correlation coefficient threshold \(\theta _{CC}\), the supporter model update threshold \(\theta _{\textit{update}}\), and the target probability threshold \(\theta _{P}\). We optimized these parameters to minimize the 95%ile error using grid search with leave-one-out cross-validation; a sketch of this loop is given after this paragraph. The optimal parameters range from \([\alpha ,\theta _{CC},\theta _{\textit{update}},\theta _{P}] = [0.90,0.3,0.3,0.5]\) to [0.95, 0.3, 0.4, 0.7] and are hence relatively insensitive to the left-out case. The mean parameters were \([\alpha ,\theta _{CC},\theta _{\textit{update}},\theta _{P}] = [0.9479, 0.3000, 0.3021, 0.6625]\). Figure 3 shows the mean, 95%ile, and maximum tracking error distributions over the 24 sequences for the baseline method (abbreviated TMG, for Tracking by Makhinya and Goksel) and our proposed tracker (denoted RMTwS, for Robust Motion Tracking with Supporters). Table 1a compares the overall performance in terms of the mean, standard deviation, 95%ile, and maximum error after pooling all training results into one distribution. Note that our proposed method yields a 16% improvement in the 95%ile error.
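
A sketch of this optimization loop, assuming a hypothetical run_tracker(sequence, params) that returns the per-annotation errors for one sequence:

```python
import itertools
import numpy as np

def loo_grid_search(sequences, run_tracker, grid):
    """Leave-one-out grid search minimizing the pooled 95%ile error.
    grid: dict of parameter name -> candidate values, e.g.
          {"alpha": [0.90, 0.95], "theta_cc": [0.2, 0.3, 0.4], ...}"""
    per_fold_best = []
    for i in range(len(sequences)):
        train = sequences[:i] + sequences[i + 1:]   # leave sequence i out
        best = None
        for values in itertools.product(*grid.values()):
            params = dict(zip(grid.keys(), values))
            errs = np.concatenate([run_tracker(s, params) for s in train])
            score = np.percentile(errs, 95)
            if best is None or score < best[0]:
                best = (score, params)
        per_fold_best.append(best[1])
    return per_fold_best  # inspect the spread across folds for sensitivity
```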

Fig. 3 Tracking error distributions (in mm) for the baseline (TMG) and proposed method (RMTwS) over the 24 training sequences. (a) Mean tracking error. (b) 95%ile tracking error. (c) Maximum tracking error

Table 1 Comparison of the mean, standard deviation, 95%ile, and maximum tracking errors (in mm) of the baseline (TMG) and proposed (RMTwS) methods after pooling all results from the (a) training and (b) test sets

Fig. 4 Tracking error distributions (in mm) for the baseline (TMG) and proposed method (RMTwS) over the 24 test sequences. (a) Mean tracking error. (b) 95%ile tracking error. (c) Maximum tracking error

We then applied our method to the test set using the optimal parameters found above. The test set results were evaluated by the challenge organizers. Figure 4 compares the tracking error distributions of the baseline tracker, TMG, and our proposed tracker, RMTwS, over the 24 test sequences, and Table 1b lists the overall performance after pooling all results. RMTwS yields a 1.04 mm mean and 2.26 mm 95%ile error, improving on the baseline method by 4.6% and 6.6%, respectively. The 95%ile error of the individual test landmarks was improved by more than 5% for seven landmarks and by more than 30% for five landmarks. For the remaining landmarks, the errors changed by less than 2%.

Fig. 5 Shadow simulation example with (a) signal intensity map, (b) original image, (c) shadowed image

We also evaluated the runtime of our proposed method. Learning and applying the supporter model take between 20 and 60 ms per frame for the given sequences on an Intel Core i7-4770K CPU @ 3.5 GHz.

Evaluating tracking under shadowing

Since target points that disappear in acoustic shadow are not annotated in the CLUST-2015 dataset, we conducted a simulation in which we emulated acoustic shadowing artifacts from a simulated rib in the images and evaluated this scenario. For this purpose, we manually placed a structure of size 12.4 mm \(\times \) 7.2 mm, representing a rib cross section in accordance with [18], close to the skin.

We augmented each frame of a US image sequence from the training data with new ultrasound bone shadows by multiplying the input US images with a signal intensity map. For each pixel of an ultrasound image, this map stores the accumulated intensity of the ultrasound signal after reflection at the bone surface and energy loss (attenuation) within the bone structures. Its values lie in [0, 1], with 1 denoting the original signal intensity and 0 a complete signal loss. The signal intensity map is generated in a multistage process. In the first step, we create a map of attenuation coefficients \(\mathbf {Z}\) for the bone cross sections, given by the intersection of the bone tissue with the transducer plane. To create a bone segment j, we simply rasterize a circle with radius \(r_j\) at position \(p_j\) in \(\mathbf {Z}\). Inside each circle, we store the attenuation coefficient \(\mathbf {Z}(x,y)=\beta _j\) of bone segment j, and \(\mathbf {Z}(x,y)\) is zero otherwise. Typical values of \(\beta \) for bone are taken from the literature [19].

In the next step, we use ray marching to traverse \(\mathbf {Z}\) and create a (pre-scan-converted) signal intensity map \(\mathbf {A}\), in a simplified, task-specific variation of a more complex ultrasound simulation method [19]. In particular, we traverse the columns (scanlines) of \(\mathbf {Z}\) from top to bottom (y-direction). During this, we record a reflected signal intensity at the bone surface and the energy loss thereafter, accumulating the attenuation coefficients in \(\mathbf {Z}\). At each step of the ray marching, the current pixel \(\mathbf {A}(x,y)\) is computed as \(\mathbf {A}(x,y)=\mathbf {A}(x,y-1)\exp (-\mathbf {Z}(x,y))\).

The resulting signal intensity map is finally filtered with a Gaussian function to emulate the blurring due to convolution with the ultrasound point spread function (PSF). Since the input images are from a convex probe, the map is scan-converted from a radial domain into a Cartesian frame, using the scan conversion parameters estimated geometrically from the original image. This yields the typical ultrasound shadow appearance in convex probe images, where the shadows become softer and wider in the far field of the images. This provides simulated image data with ground truth for evaluating tracking under shadowing. Example images of a signal intensity map, an original image, and the resulting shadowed image are shown in Fig. 5.
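
A condensed sketch of this pipeline (attenuation map rasterization, column-wise ray marching, PSF blurring), omitting the surface reflection term and the convex-probe scan conversion for brevity; all parameter names are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_shadow(img, bones, sigma=2.0):
    """Multiply a (pre-scan-converted) US frame with a simulated signal
    intensity map A.  bones: list of (cx, cy, r, beta) circles giving
    bone cross-section position, radius, and attenuation coefficient."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    Z = np.zeros((h, w))                        # attenuation coefficient map
    for cx, cy, r, beta in bones:
        Z[(xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2] = beta
    A = np.ones((h, w))                         # signal intensity map in [0, 1]
    for y in range(1, h):                       # top-to-bottom ray marching
        A[y] = A[y - 1] * np.exp(-Z[y])         # A(x,y) = A(x,y-1) exp(-Z(x,y))
    A = gaussian_filter(A, sigma)               # emulate blurring by the PSF
    return img * A
```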

After generating a 2D US image sequence containing the shadow, we applied the baseline and our method to the new sequence, using the same optimal parameters as for the CLUST-2015 test set, obtained by leave-one-out cross-validation. The mean errors for TMG and RMTwS were 2.79 and 2.61 mm, with 95%ile errors of 12.11 and 10.29 mm, respectively. This indicates a 6.5% (15%) improvement in mean (95%ile) error. Examples of tracking performance with and without shadowing for the inhale and exhale phases of the breathing cycle are shown in Fig. 6.

Fig. 6 Example of tracking performance (a, c) without and (b, d) with shadowing, for (a, b) the inhale and (c, d) the exhale breathing phase, showing the improved robustness of the proposed method, RMTwS, in (d)

Discussion and conclusions

We have demonstrated an ultrasound tracking method using supporters, RMTwS, where image locations other than the target are also tracked in order to exploit motion consistency with the surrounding tissue and improve tracking robustness. We employed an optic flow- and vessel model-based tracker, TMG, as our baseline, as well as for tracking the target and supporter locations from which the supporter model is then learned and applied. We are particularly interested in reducing 95%ile errors to ensure effective tracking performance throughout all frames, in order to minimize 95%ile therapy margins for more focal therapies and reduced collateral damage to healthy tissue.

Our evaluations on the training and test sets show that the proposed method, RMTwS, tracks targets more accurately than the conventional object tracker, TMG. The resulting performance is a 1.04 mm mean and 2.26 mm 95%ile error. This 95%ile tracking performance is relevant for liver motion tracking in radiation and focused therapy applications, when compared to the 1.23 mm mean inter-observer 95%ile variability reported for a similar dataset in [17].

The accuracy improvements appear small for the mean and 95%ile errors when all trajectories are taken into account. This is because the main object tracker already performs quite well in most cases and fails only in certain situations, such as under shadowing. All the same, to enable satisfactory therapy for every patient, a tracking method should be robust in all scenarios.

The optimal thresholds for updating the supporter model, \(\theta _{\textit{update}}\), and for the reliability of the tracking performance, \(\theta _{CC}\), were found to be very close. A supplementary experiment showed that the difference in tracking performance when using \(\theta _{\textit{update}} = \theta _{CC}\) is insignificant. Thus, the same value can be used for \(\theta _{CC}\) and \(\theta _{\textit{update}}\).

Our proposed method applies the learned supporter model in 12% of the frames, indicating that the reliability of TMG tracking is not always high. The main advantage of using supporters for tracking is robustness to scene or target appearance changes over time, such as those due to acoustic shadowing. Since no annotations exist for such cases in the given dataset and this scenario cannot be evaluated in the current setting, we devised a simulation framework to imitate acoustic shadowing artifacts on the images of a 2D sequence. This simulated experiment showed that, without additional optimization for such a scenario, the proposed method improves the 95%ile tracking performance of the baseline by 15%.

Two to three supporters were used in each sequence, which is not a large number, since only a few easily identifiable landmarks exist in these images. We aim to study automatic landmark detection in the future to identify a (potentially larger) number of supporters automatically, also yielding an interaction-free framework. Additionally, with more supporters available, we plan to conduct a sensitivity analysis regarding their number and locations.

There are several locations in the liver whose motion can be used to infer the motion of others. However, this requires that the target and supporter motions are coherent and that a model exists to infer the former from the latter. Coherence is already checked during the supporter model building process. More complex models, such as finite element (FE) models, could also be considered; however, with a continuum mechanics-based approach and our current position priors, extrapolating to target points lying outside the FE supporter mesh might not be robust.

This study is the first to demonstrate the benefits of employing supporters for US tracking. Given the target and supporter position estimates from the main object tracker, learning and applying the supporter model take less than 20 ms, most of which is spent on the correlation coefficient calculation. The resulting tracking technique achieves near real-time performance at 22.5 frames per second (fps) on average. As such, it is the state of the art in the CLUST-2015 challenge for real-time tracking of liver motion in 2D ultrasound sequences: the winner of the challenge achieved only 4.8 fps on average (about 4.7 times slower), with a mean (95%ile) error only 12.5% (2.6%) better. This real-time capability is also why we based our method on the TMG tracker.

In a practical application of our method in radiation therapy, a 2D convex transducer can be used to image the liver, reaching below the ribs. In an initial (reference) frame, an operator would mark the target location (e.g., the tumor center), as well as a few other easy-to-track locations (supporters). Tracking would then run during the treatment, while the target location estimates can be used to gate the treatment or to compensate for patient motion.