
Introduction

Ultrasound (US) imaging is a widely used modality due to its versatility, low cost, and real-time capabilities. Acquisitions were long limited to 2D images, but the recent development of 3D US has made it possible to address new problems such as the volumetric assessment of organs or image registration. In addition to conventional US, three-dimensional real-time visualization of vascularization can be achieved with contrast-enhanced ultrasound (CEUS) imaging. This rather new modality provides very useful information for lesion diagnosis or the monitoring of large vessels [1]. Gas-filled microbubbles, acting as amplifiers of the blood backscattering signal, are used as a contrast agent. Because the bubbles are naturally eliminated by the metabolism, this modality is considered completely safe, even for patients with renal or liver failure (unlike contrast-enhanced CT, for example).

However, the usually poor quality of CEUS images makes any computer-based analysis challenging: in addition to strong speckle noise, the image is very grainy and almost binary as a result of ultrasound interactions with individual bubbles. Unlike for conventional US [2], very few segmentation methods for 3D CEUS images have been reported. Among them, Gasnier et al. [3] introduced an interactive approach to segment and analyze tumors in this modality. However, their framework was specific to lesion segmentation, as are the automatic methods proposed in [4, 5]. In [6], Ma et al. developed an automatic algorithm to segment the heart's left ventricle. This method, although applicable to other organs, does not provide any natural way to refine or correct the result interactively. Besides, it was designed for images acquired with a particular transducer, which produces sparse rotated slices instead of a whole 3D volume.

In this chapter, we address the problem of kidney segmentation in 3D CEUS images. This challenging issue is of great importance for quantitatively assessing the volume of renal tissues. First, we present a generic and fast approach to automatically segment a kidney in CEUS volumes. Our method consists of detecting the kidney in the image as an ellipsoid, and then deforming this ellipsoid to match its boundary precisely. Second, we extend this framework to take into account other kinds of information:

  • user interactions: Because of the poor image quality or pathologies, image information may sometimes be unreliable and even misleading. In such cases, the clinician should be able to guide or correct the segmentation easily and with real-time feedback.

  • simultaneous use of another image: Because of shadowing effects, pathologies, and the limited field of view, parts of the kidney may be hardly visible in the image. In such cases, even expert users may have difficulty delineating the true boundary of the organ by relying solely on the CEUS image. In clinical routine, every CEUS acquisition is preceded by a conventional US acquisition to locate the kidney. The latter can therefore complement the CEUS image and thus help cope with missing or corrupted information.

Prior work on kidney segmentation in CEUS is limited to two of our conference papers [7, 8], of which this chapter is an extended version.

The remainder of the chapter is organized as follows. First, section “Material” describes the material used throughout the chapter in validation experiments. In section “Kidney Detection via Robust Ellipsoid Estimation”, we introduce a fast and robust method to roughly estimate the center, orientation, and size of the kidney. This is done via an original variational framework for ellipsoid detection. The outcome of this step is then used as the prior model of a segmentation algorithm, based on template deformation, described in section “Kidney Segmentation via Implicit Template Deformation”. Because of the inherent ambiguities in CEUS images, the obtained segmentation may be improved by using additional information. In section “Segmentation with User Interactions”, we show how user interactions can be used within our framework to correct the result in real-time. We then extend our approach to multiple images, namely the CEUS and US volumes (section “Joint Co-segmentation and Registration”), which are not aligned: a generic framework for joint co-segmentation and registration is introduced and applied to both kidney detection and segmentation. We show that taking this additional information into account makes the automatic kidney segmentation more robust. Finally, we conclude the chapter by discussing potential improvements.

Material

This section describes the material used throughout the chapter. Our database is composed of 64 pairs of CEUS and US volumes acquired from 35 different patients with an iU22 ultrasound system (Philips, The Netherlands). In order to have a clinically representative database, both healthy and diseased kidneys were considered. Images were acquired using different probes, namely the V6-2 and X6-1 (Philips, The Netherlands) US probes, with various fields of view. The volume size was 512 × 510 × 256 voxels, with varying spatial resolutions (0.25 × 0.25 × 0.55 mm on average). The acquisition protocol was as follows: first, the clinician scouted for the patient’s kidney using conventional US and acquired a US volume. Then, 2.4 mL of SonoVue (Bracco, Italy) contrast agent was injected into the patient and a CEUS acquisition was performed a few seconds later. Dynamic CEUS images of a kidney show a cortical enhancement shortly followed by a medullary enhancement; the kidney tissue is thus best visualized once the contrast agent has diffused, as it is then completely hyperechoic whereas the surrounding fat produces no signal. Figure 1 shows a comparison of US and CEUS images for two patients of our database. Note that the US and CEUS images are not aligned, as the clinician may have slightly moved the probe between the two acquisitions.

Fig. 1

Slices of conventional and contrast-enhanced ultrasound 3D images of the kidney for two different patients (left and right)

For each image, an expert was asked to segment the kidney with a semiautomatic tool; this segmentation was considered as the ground truth. The different approaches described in the chapter will be evaluated by computing the Dice coefficient between the segmentation result S and the ground truth GT, defined as

$$\displaystyle{ \mathrm{ Dice}(S,GT) = 2\ \frac{\text{Vol}(S \cap GT)} {\text{Vol}(S) + \text{Vol}(GT)}\ , }$$
(1)

where Vol(X) denotes the volume of a region X. Thus, the higher the Dice coefficient, the better the segmentation. In particular, this score is equal to 1 for a perfect segmentation and to 0 for a segmentation that does not overlap the ground truth at all.
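For reference, Eq. (1) translates directly into a few lines of code. The sketch below assumes binary numpy masks on a common voxel grid, so that voxel counts stand in for volumes (valid on a uniform grid):

```python
import numpy as np

def dice(seg, gt):
    """Dice coefficient of Eq. (1) for two binary masks of identical shape."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    inter = np.logical_and(seg, gt).sum()
    denom = seg.sum() + gt.sum()
    # convention: two empty masks overlap perfectly
    return 2.0 * inter / denom if denom > 0 else 1.0
```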

All proposed methods were implemented in a C++ prototype and the computational times will be given for a standard computer (Intel Core i5, 2.67 GHz, 4 GB RAM).

Kidney Detection via Robust Ellipsoid Estimation

Since the kidney shape can be roughly approximated by an ellipsoid, the automatic kidney detection problem in CEUS images can initially be reduced to finding the smallest ellipsoid encompassing most of the hyperechoic voxels. A large number of methods (e.g., Hough transforms [9, 10]) have already been proposed to detect ellipses in images [11]. However, their extension to 3D, though possible, is usually computationally expensive, mainly because of the number of parameters to estimate (9 for a 3D ellipsoid). Furthermore, they do not explicitly use the fact that only one ellipsoid is present in the image. On the other hand, statistical approaches like robust Minimum Volume Ellipsoid (MVE) estimators [12] are better suited but require prior knowledge of the proportion of outliers (here the noise, artifacts, or neighboring structures), which may vary from one image to another and is thus not available. We therefore propose an original variational framework, which is robust and fast, to estimate the best ellipsoid in an image \(I :\varOmega \subset \mathbb{R}^{3} \rightarrow \mathbb{R}^{+}\).

A Variational Framework for Robust Ellipsoid Estimation

In the considered framework, an ellipsoid is implicitly represented by a function \(\phi :\varOmega \rightarrow \mathbb{R}\) that is positive inside the ellipsoid and negative elsewhere. ϕ can be parametrized by the ellipsoid center \({\bf c} \in \mathbb{R}^{3}\) and by its size and orientation, encoded by a 3 × 3 positive-definite matrix M. We therefore define the implicit equation of an ellipsoid as

$$\displaystyle{ \phi _{ {\bf c},{\bf M}}({\bf x}) = 1 - {({\bf x} -{\bf c})}^{T}{\bf M}({\bf x} -{\bf c}) = 0\ . }$$
(2)

The detection method should be robust to outliers, i.e., bright voxels coming from noise, artifacts, or other neighboring structures. Those outliers are excluded by estimating a weighting function w (defined over the image domain Ω, with values in [0,1]) that provides a confidence score for any point x to be an inlier. The ellipsoid estimation is then formulated as an energy minimization problem with respect to c, M, and w:

$$\begin{array}{l} \min\limits_{{\bf c},{\bf M},w}\left\{E_{\mathrm{det}}({\bf c},{\bf M},w) = -\int_{\varOmega}\phi_{{\bf c},{\bf M}}({\bf x})\ I({\bf x})\ w({\bf x})\ d{\bf x}\right.\\ \left.\qquad\qquad\qquad \qquad \qquad \qquad +\ \mu\,\log \left(\frac{\text{Vol}({\bf M})}{\vert \varOmega \vert} \right)\left(\int _{\varOmega }I({\bf x})\ w({\bf x})\ d{\bf x}\right)\right\}\\ \qquad \text{with}\quad \phi_{ {\bf c},{\bf M}}({\bf x}) = 1 -{\left({\bf x} -{\bf c}\right)}^{T}{\bf M}\left({\bf x} -{\bf c}\right) \\ \qquad \text{and}\quad \text{Vol}({\bf M}) = \frac{4\pi}{3} \sqrt{\det {{\bf M} }^{-1}}\quad \text{the ellipsoid volume}.\\ \end{array}$$
(3)

The ellipsoid detection energy \(E_{\mathrm{ det}}\) is composed of two terms:

  • a data-fidelity term: The first term is the integral over the whole image domain Ω of the product of ϕ c,M and w I. Note that w I is highly positive at voxels that have a high intensity but are not outliers. To minimize the energy, such voxels must therefore be included inside the ellipsoid, i.e., where ϕ is positive.

  • a regularization term: The second term penalizes the volume of the ellipsoid Vol(M) with respect to the domain volume |Ω|. The logarithm gives the problem a statistical interpretation and eases the minimization of the energy, as will be seen in the next subsection. The term is normalized by ∫ w I and weighted by a trade-off parameter μ > 0.

Numerical Optimization

This ellipsoid estimation process can be viewed as fitting a Gaussian distribution to the bright pixels of the image by minimizing its negative log-likelihood. \(E_{\mathrm{ det}}\) therefore has a statistical meaning, and when w is fixed, the minimizers (c*, M*) of \(E_{\mathrm{ det}}(\cdot ,\cdot ,w)\) have a closed form. Indeed, c* is the barycenter of all voxels x weighted by I(x)w(x), while M* is proportional to the inverse of the covariance matrix of the same weighted data. Besides, \(E_{\mathrm{ det}}\) is linear with respect to w, which is by definition restricted to [0,1]. Therefore, at every voxel x the minimizer w*(x) is equal to 0 or 1, depending only on the sign of \(\phi _{ {\bf c},{\bf M}}({\bf x}) - \mu \log \left (\frac{\text{Vol}({\bf M})} {\vert \varOmega \vert } \right )\). w* is then the indicator function of the current ellipsoid estimate, dilated proportionately to μ. Its purpose is to remove the contribution of points that are far away from the current ellipsoid and may hinder its refinement.

The weighting function w is initialized to 1 everywhere. \(E_{\mathrm{ det}}\) is then minimized with an alternate iterative scheme that successively updates the variables c, M, and w, as summarized in Algorithm 1. As the energy decreases at each step, the algorithm is guaranteed to converge. In practice, few iterations are required for convergence and the total computational time is less than a second for a 3D image.
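As a concrete illustration, here is a minimal numpy sketch of this alternate scheme, in voxel units and ignoring anisotropic spacing. The scaling M* = (μ/2) Σ⁻¹, with Σ the weighted covariance, is our reading of the closed form and should be checked against the appendix; the choice of μ is discussed just below.

```python
import numpy as np

def robust_ellipsoid(I, mu=0.4, n_iter=20):
    """Sketch of Algorithm 1 on a non-negative 3D array I (voxel coordinates)."""
    grids = np.meshgrid(*[np.arange(s, dtype=float) for s in I.shape], indexing="ij")
    x = np.stack([g.ravel() for g in grids], axis=-1)   # (n_voxels, 3) coordinates
    i = I.ravel().astype(float)
    w = np.ones_like(i)                                 # inlier weights, w = 1 everywhere
    vol_omega = float(i.size)                           # |Omega| in voxel units
    for _ in range(n_iter):
        wi = w * i
        s = wi.sum()
        c = (x * wi[:, None]).sum(axis=0) / s           # weighted barycenter
        d = x - c
        cov = (d * wi[:, None]).T @ d / s               # weighted covariance
        M = 0.5 * mu * np.linalg.inv(cov)               # assumed closed-form scaling
        phi = 1.0 - np.einsum("ij,jk,ik->i", d, M, d)   # implicit ellipsoid, Eq. (2)
        vol = 4.0 / 3.0 * np.pi / np.sqrt(np.linalg.det(M))
        w = (phi >= mu * np.log(vol / vol_omega)).astype(float)  # dilated indicator
    return c, M
```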

The choice of μ is paramount as it controls the number of points taken into account for the estimation of the ellipsoid matrix. It should be set to values close to \(\frac{2} {5}\) in 3D and \(\frac{1} {2}\) in 2D (the proof is deferred to the appendix).

Figure 2 shows this process on a synthetic 2D image. The first ellipse estimate is too large, as all pixels are taken into account, but far-away points are progressively eliminated via the weighting function w until the algorithm converges towards the correct solution. We also present results on real CEUS data in Fig. 3. The estimated ellipsoids are not perfectly accurate, but they are robust and close enough to be used as the initialization of a segmentation algorithm.

Fig. 2

(a) Original 2D synthetic image, corrupted by salt-and-pepper noise. (b) Evolution of the ellipse along the iterations (orange) and final result (green). (c) Ellipse contour and center superimposed on the product w I at convergence

Fig. 3

Results of the ellipsoid detection (red) compared to the ground truth (green), on slices of the volumes shown in Fig. 1

Kidney Segmentation via Implicit Template Deformation

The previously detected ellipsoid is now deformed to segment the kidney more precisely. We follow the template deformation framework described in [13, 14] and extended in [15], as it is a very efficient model-based algorithm that has already been applied successfully to kidney segmentation in CT images [16].

Implicit Template Deformation Framework

Implicit template deformation is a framework in which an implicit shape defined by a function \(\phi _{0} :\varOmega \rightarrow \mathbb{R}\), called the template, is deformed so that its zero level set segments a given image \(I :\varOmega \rightarrow \mathbb{R}^{+}\). The segmenting implicit shape is the zero level set of a function \(\phi :\varOmega \rightarrow \mathbb{R}\), defined with respect to this template and a transformation of the space \(\psi :\varOmega \rightarrow \varOmega\) that becomes the unknown of the problem: ϕ = ϕ0 ∘ ψ. In our application, the template is the implicit function of the previously estimated ellipsoid, ϕ0 = ϕ c*,M*, and ψ is sought such that the flux of the image gradient across the surface of the deformed ellipsoid (ϕ0 ∘ ψ)−1(0) is maximal. The segmentation energy is then

$$\displaystyle{ E_{\mathrm{ seg}}(\psi ) =\int _{\{\phi _{0}\circ \psi =0\}} -\left < \vec{\nabla I}({\bf x})\ ,\ \vec{{\bf n}}({\bf x})\right >\ dS({\bf x}) + \lambda {\mathcal R}(\psi ) , }$$
(4)

where \(\vec{{\bf n}}({\bf x})\) denotes the vector normal to the segmentation surface at point x, and \({\mathcal R}(\psi )\) is a regularization term that prevents large deviations from the original ellipsoid. Its choice is detailed in section “Transformation Model” hereafter. λ is a positive scalar parameter that controls the strength of this shape constraint.

Using the divergence theorem, the first data-fidelity term can be rewritten as

$$\displaystyle{ \int _{\{\phi _{0}\circ \psi =0\}}\ \ \ -\left < \vec{\nabla I}({\bf x}),\ \vec{{\bf n}}({\bf x})\right >\ dS({\bf x}) = -\int _{\{\phi _{0}\circ \psi \geq 0\}}div(\nabla I({\bf x}))\ d{\bf x} = -\int _{\{\phi _{0}\circ \psi \geq 0\}}\varDelta I({\bf x})\ d{\bf x} }$$
(5)

where Δ denotes the Laplacian operator. Introducing the Heaviside function H (H(a) = 1 if a is positive, 0 otherwise) yields a more convenient formulation of the segmentation energy:

$$\displaystyle{ E_{\mathrm{ seg}}(\psi ) = -\int _{\varOmega }H(\phi _{0} \circ \psi ({\bf x}))\ \varDelta I({\bf x})\ d{\bf x} + \lambda {\mathcal R}(\psi )\ . }$$
(6)
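In discrete form, the data term of Eq. (6) is cheap to evaluate. Below is a minimal numpy/scipy sketch, assuming ψ is available as a hypothetical array of warped voxel coordinates and taking the voxel volume as unity; the chapter's actual implementation is the optimized scheme of section “Numerical Implementation”.

```python
import numpy as np
from scipy.ndimage import laplace, map_coordinates

def seg_data_term(phi0, psi_coords, I):
    """-int H(phi0 o psi) * Laplacian(I) dx of Eq. (6), as a Riemann sum.
    psi_coords: (3, X, Y, Z) array holding psi(x) in voxel coordinates."""
    phi = map_coordinates(phi0, psi_coords, order=1, mode="nearest")  # phi0 o psi
    return -laplace(I.astype(float))[phi >= 0].sum()  # sum over {phi0 o psi >= 0}
```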

Transformation Model

The choice of the space of possible solutions ψ of Problem (6) is, in our case, intrinsically linked to the notion of shape. A shape can be considered as a set of objects sharing the same visual aspect; it should be invariant to geometric transforms such as translation, rotation, scaling, or shearing. We refer to such a global transformation as the pose. To set up a clear distinction between the pose and the subsequent shape deformation, similarly to [17], we design our template transformation model ψ as the functional composition of a global transformation \({\mathcal G}\) and a nonrigid local transformation \({\mathcal L}\) (see Fig. 4):

$$\displaystyle{ \psi = {\mathcal L}\circ {\mathcal G} }$$
(7)
Fig. 4

Decomposition of the transformation ψ. The implicit template ϕ0 undergoes a global transformation \({\mathcal G}\) and a local deformation \({\mathcal L}\)

Pose. \({\mathcal G} :\varOmega \rightarrow \varOmega\) is chosen as a parametric transform that coarsely aligns the template with the target surface in the image. It basically corrects or adjusts the global position and scaling of the ellipsoid and can be chosen as a similarity. \({\mathcal G}\) is thus represented by a matrix in homogeneous coordinates defined by 7 parameters \({\bf p} =\{ p_{i}\}_{i=1\cdots 7}\) and denoted \({\mathcal G}_{{\bf p}}\).

Deformation. \({\mathcal L} :\varOmega \rightarrow \varOmega\) is expressed using a displacement field u in the template reference frame: \({\mathcal L} = Id + {\bf u}\). As in image registration and optical flow problems [18], u should vary smoothly in space. While adding penalizations on differential terms of u to \({\mathcal R}(\psi )\) is a valid approach, efficient implementations are difficult to derive. Taking advantage of efficient linear filtering, smoothness of the displacement u is instead enforced as a built-in property, by defining it as a filtered version of an integrable unknown displacement field v:

$$\displaystyle{ {\bf u}({\bf x}) = \left [K_{\sigma } {\ast}{\bf v}\right ]({\bf x}) =\int _{\varOmega }K_{\sigma }({\bf x} -{\bf y})\ {\bf v}({\bf y})\ d{\bf y} }$$
(8)

where K σ is a Gaussian kernel of scale σ. The overall transformation, which is therefore parametrized by p and v, is denoted ψ p,v .
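In code, Eq. (8) amounts to Gaussian-filtering each component of v; a minimal sketch (σ in voxels, physical spacing ignored):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_field(v, sigma):
    """u = K_sigma * v: filter each component of the raw displacement field v.
    v: (X, Y, Z, 3) array; larger sigma yields smoother deformations."""
    return np.stack([gaussian_filter(v[..., k], sigma) for k in range(v.shape[-1])],
                    axis=-1)
```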

The proposed decomposition makes it possible to define the shape prior term independently of the pose: \({\mathcal R}(\psi ) = {\mathcal R}({\mathcal L})\). \({\mathcal R}\) thus quantifies how much the segmenting implicit function ϕ deviates from the prior shape ϕ0. Using the L 2 norm, we choose to constrain \({\mathcal L}\) towards the identity:

$$\displaystyle{ {\mathcal R}({\mathcal L}) = \frac{1} {2}\|{\mathcal L}- Id\|_{2}^{2} = \frac{1} {2}\int _{\varOmega }\|{\bf u}{({\bf x})\|}^{2}\ d{\bf x} }$$
(9)

The optimization problem to solve finally reads:

$$\begin{array}{lcl} \mathop{\min }\limits_{{\bf p},{\bf v}}\ \ \ &\left\{E_{\mathrm{ seg}}(\psi _{ {\bf p},{\bf v}}) = -\int _{\varOmega }H(\phi _{0} \circ \psi _{ {\bf p},{\bf v}}({\bf x}))\ \varDelta I({\bf x})\ d{\bf x} + \frac{\lambda } {2}\int _{\varOmega }\|K_{\sigma } {\ast}{{\bf v}\|}^{2}\right\}\\ \text{with}& \psi _{ {\bf p},{\bf v}} = (Id + {\bf u}) \circ {\mathcal G}_{{\bf p}}\quad \text{and}\quad {\bf u} = K_{\sigma}{\ast}{\bf v}\\ \end{array}$$
(10)

Numerical Implementation

Problem (10) is minimized via a standard gradient descent performed simultaneously on the parameters of the pose \({\mathcal G}_{\bf p}\) and the deformation field v. The descent evolution equations are obtained by applying calculus of variations to \(E_{\mathrm{ seg}}\). We omit the tedious details; the final equations, after a variable substitution, read

$$\begin{cases} \frac{\partial {\bf p}} {\partial t} = -\int _{\varOmega }\delta (\phi _{0} \circ {\mathcal L})\ .\ \left \langle \nabla \phi _{0} \circ {\mathcal L},\ (Id + J_{{\bf u}})\frac{\partial {\mathcal G}} {\partial {\bf p}}{{\mathcal G}}^{-1}\right \rangle \ .\ \varDelta I \circ {{\mathcal G}}^{-1} \\ \frac{\partial {\bf v}} {\partial t} = -\Bigg[\ \ \delta (\phi _{0} \circ {\mathcal L})\ .\ \nabla \phi _{0} \circ {\mathcal L}\ .\ \varDelta I \circ {{\mathcal G}}^{-1}\ \ + \lambda {\bf v}\Bigg] {\ast} K_{ \sigma }\\ \end{cases}$$
(11)

where δ denotes the Dirac distribution and J u is the Jacobian matrix of the displacement field u.

A quick analysis of Eq. (11) reveals several key aspects of an efficient implementation. Interpolating \(\phi _{0} \circ {\mathcal L}\) and \(\nabla \phi _{0} \circ {\mathcal L}\) over the whole domain Ω would be extremely time-consuming. However, since it is multiplied by \(\delta (\phi _{0} \circ {\mathcal L})\), the warped gradient field \(\nabla \phi _{0} \circ {\mathcal L}\) is only needed on the set \(\left \{\phi _{0} \circ {\mathcal L} = 0\right \}\) (Fig. 5a), which greatly reduces the computational burden. Moreover, precise knowledge of the warped template \(\phi _{0} \circ {\mathcal L}\) is only necessary near its zero level set, so we use a coarse-to-fine approach based on octrees: at each level, a cell is refined or not depending on the distance measure (Fig. 5b), drastically reducing complexity. Finally, stemming from the displacement model, extrapolating image and pointwise forces to the whole space boils down to a convolution with K σ (Fig. 5c). In practice, our current 3D implementation supports up to 100 time steps per second for a discretization of the implicit function on a 64 × 64 × 64 lattice.
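To make these remarks concrete, the sketch below implements one descent step on v (second line of Eq. (11)), assuming the warped quantities have been precomputed on the template grid (hypothetical inputs); the smeared-Dirac approximation of δ is one common choice, not necessarily the chapter's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def v_descent_step(phi_w, grad_phi_w, lap_I, v, lam, sigma, dt, eps=1.5):
    """One step of dv/dt in Eq. (11).
    phi_w = phi0 o L, grad_phi_w = grad(phi0) o L (last axis of size 3),
    lap_I = Laplacian(I) o G^{-1}, all sampled on the template grid."""
    # smeared Dirac: nonzero only in a narrow band around the zero level set,
    # which is why the warped fields are only needed there
    delta = np.where(np.abs(phi_w) < eps,
                     (1.0 + np.cos(np.pi * phi_w / eps)) / (2.0 * eps), 0.0)
    force = delta[..., None] * grad_phi_w * lap_I[..., None] + lam * v
    force = np.stack([gaussian_filter(force[..., k], sigma) for k in range(3)],
                     axis=-1)                     # convolution with K_sigma
    return v - dt * force
```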

Fig. 5

Fast template deformation with coarse-to-fine distance warp and convolutions. (a) Surface/pointwise forces. (b) Coarse-to-fine \(\phi _{0} \circ {\mathcal L}\). (c) Convolved deformation

Results for Automatic Segmentation in CEUS Images

This validation has been performed on the CEUS images of the dataset presented in section “Material”. The completely automatic pipeline had a computational time of around 5 s.

Quantitative results are reported in Fig. 6. The overall median Dice coefficient is 0.69 for the detection and 0.76 for the segmentation, and 25 % of the database has a very satisfying segmentation (Dice coefficient higher than 0.85), given the very poor image quality and the presence of pathologies.

Fig. 6

Kidney detection (red) and segmentation (blue) results in terms of Dice coefficients shown as boxplots (left) and histograms (right). Boxplots show, respectively, the first decile, the first quartile, the median, the third quartile, and the ninth decile. Extreme points are shown separately

Figure 7 shows the results obtained for the two cases introduced in Fig. 1. The segmentations are very similar to the ground truth and can be considered satisfying. Some cases are, however, more difficult (e.g., Fig. 10 in the next section) and will require additional information.

Fig. 7

Result of the automatic segmentation (blue) compared to the ground truth (green) on a particular slice (top) and in 3D (bottom)

Segmentation with User Interactions

The previously described approach is fast and automatic but fails in some difficult cases: ultrasound shadows or kidney pathologies can make the image information unreliable and thus hinder the segmentation algorithm. It is therefore important to provide the clinician with a way to guide or correct the segmentation easily and with real-time feedback. As proposed in [15], this can be done naturally within the implicit template deformation framework presented in section “Kidney Segmentation via Implicit Template Deformation”.

User Interactions as Constraints

In this section, we show how the user can guide the segmentation by indicating points that should be inside or outside the segmentation (see Fig. 8).

Fig. 8

User interactions as inside/outside points. (a) Template deformed without constraints. (b) User indicates points that should be inside (blue) and outside (red) the segmentation. (c) New segmentation that satisfies these constraints

Consider that the user provides N points \(\{{\bf x}_{k}\}_{k} \in \varOmega ^{N}\) in the image domain, labeling each one as inside or outside the surface to extract (which can be done via simple interactions, such as a left click on an inside point and a right click on an outside point). The implicit formulation allows this information to be expressed merely as inequality constraints on the deformed implicit function at the points {x k } k :

$$\displaystyle{ \forall k \in [1,N],\ \ \ \gamma _{k}\,\phi _{0} \circ \psi ({\bf x}_{k}) \geq 0 }$$
(12)

where γ k = 1 (resp. − 1) for inside (resp. outside) points. Note that it is also possible to specify a point that should lie exactly on the segmentation surface by labeling it as both inside and outside: the two inequality constraints are then equivalent to an equality constraint. Putting together the initial formulation of Eq. (6) and the constraints of Eq. (12) yields a general formulation of implicit template deformation with user interactions, as the following minimization problem:

$$\begin{array}{l} \mathop{\min }\limits_{\psi }\ \ \ \left \{E_{\mathrm{ seg}}(\psi ) = -\int _{\varOmega }H(\phi _{0} \circ \psi ({\bf x}))\ \varDelta I\left ({\bf x}\right )\ d{\bf x} + \lambda {\mathcal R}(\psi )\right \} \\ \text{subject to}\ \forall k \in [1,N],\ \ \ \gamma _{k}\ .\ \phi _{0} \circ \psi ({\bf x}_{k}) \geq 0\\ \end{array}$$
(13)
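Once the deformed implicit function has been sampled on the image grid, the constraints of Eq. (12) are straightforward to check; a minimal sketch with hypothetical click bookkeeping:

```python
def constraints_satisfied(phi, clicks):
    """Check Eq. (12): gamma_k * phi(x_k) >= 0 for every user click.
    phi: deformed implicit function phi0 o psi sampled on the image grid;
    clicks: iterable of (voxel_index_tuple, gamma), gamma = +1 inside, -1 outside."""
    return all(gamma * phi[idx] >= 0.0 for idx, gamma in clicks)
```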

In the next subsection, we propose a method to solve this problem efficiently. For the sake of genericity, no assumption is made on the representation of the deformation ψ; the model \(\psi = {\mathcal L}\circ {\mathcal G}\) is just a particular instance of the approach described hereafter.

Optimization Scheme

Since \(E_{\mathrm{ seg}}(\psi )\) is a non-convex functional that has to be minimized under a set of nonlinear constraints, no specifically tailored algorithm is available. We therefore follow a general augmented Lagrangian methodology [19] and define an equivalent unconstrained problem that can be locally minimized by gradient descent. The constrained problem (13) can equivalently be written as an unconstrained minimization problem of the form

$$\begin{array}{l} \mathop{\min }\limits_{\psi }\left \{\tilde{E}_{\mathrm{ seg}}(\psi ) =\mathop{\max }\limits_{\boldsymbol{ \alpha } \geq 0}\left \{E_{\mathrm{ seg}}(\psi ) -\sum_{k=1}^{N}\alpha _{ k}c_{k}(\psi )\right \}\right \}\\ \text{ with }\ c_{k}(\psi ) =\ \gamma _{k}\ .\ \phi _{0} \circ \psi ({\bf x}_{k})\\ \end{array}$$
(14)

where α k is the Lagrange multiplier associated with the kth constraint. Equation (14) has the same set of solutions as the original problem in Eq. (13): if ψ satisfies all the constraints c k , then \(\tilde{E}_{\mathrm{ seg}}(\psi ) = E_{\mathrm{ seg}}(\psi )\); otherwise, \(\tilde{E}_{\mathrm{ seg}}(\psi )\) is infinite. Since \(\tilde{E}_{\mathrm{ seg}}\) jumps from finite to infinite values at the boundary of the feasible set, it is difficult to minimize as such. A more practical approach is to introduce a smooth approximation \({\tilde{E}_{\mathrm{ seg}}}^{\nu \ \ }\) that depends on a quadratic penalty parameter ν, used to constrain the maximizers (α k ) k to finite values. These multipliers are estimated iteratively, and we introduce (α k j ) k , the multiplier estimates at the jth iteration, to define the energy approximation

$$\displaystyle{ \tilde{E}_{\mathrm{ seg}}^{\nu }(\psi ,\boldsymbol{ {\alpha }}^{j}) =\mathop{\max }\limits_{\boldsymbol{ \alpha } \geq 0}\left \{E_{\mathrm{ seg}}(\psi ) -\sum_{k=1}^{N}\alpha _{ k}c_{k}(\psi ) - \frac{1} {2\nu }\sum_{k=1}^{N}{\left (\alpha _{ k} - \alpha _{k}^{j}\right )}^{2}\right \} }$$
(15)

The maximizing Lagrange multipliers associated with each constraint c k (ψ) have a closed-form solution:

$$\alpha _{k}^{j+1} = \begin{cases} 0 \ \ \text{if }\alpha _{k}^{j} - \nu c_{ k}(\psi ) \leq 0 \\ \alpha _{k}^{j} - \nu c_{k}(\psi )\ \ \text{otherwise.}\end{cases}$$
(16)

Substituting (16) into (15) yields the following expression of the smooth approximation \({\tilde{E}_{\mathrm{ seg}}}^{\nu \ \ }\):

$$\begin{array}{l} \tilde{E}_{\mathrm{ seg}}^{\nu }(\psi ,{\alpha }^{j}) = E_{\mathrm{ seg}}(\psi ) +\sum_{ k=1}^{N}F_{ \nu }\left (c_{k}(\psi ),\alpha _{k}^{j}\right ) \\ \text{ with }\ \ F_{\nu }(a,b) = \begin{cases} - ab + \frac{\nu } {2}{a}^{2} \text{if }\nu a \leq b \\ - \frac{1}{2\nu}{b}^{2} \text{otherwise.}\end{cases}\\ \end{array}$$
(17)

Finally, the alternate scheme described in Algorithm 2, in which the penalty parameter ν is gradually increased, provides a local minimizer of \(E_{\mathrm{ seg}}\) that eventually satisfies the user constraints. Within this process, Step (1) is straightforward and Step (2) is very similar to the gradient descent proposed in section “Numerical Implementation”:

$$\begin{cases} \frac{\partial {\bf p}} {\partial t} \leftarrow \frac{\partial {\bf p}} {\partial t} -\sum_{k=1}^{N}\gamma _{k}F_{\nu }(\alpha _{k})\ \left \langle \nabla \phi _{0} \circ {\mathcal L}\circ {\mathcal G}({\bf x}_{k}),\ (Id + J_{{\bf u}})\frac{\partial {\mathcal G}} {\partial {\bf p}}({\bf x}_{k})\right \rangle \\ \frac{\partial {\bf v}} {\partial t} \leftarrow \frac{\partial {\bf v}} {\partial t} -\Bigg[\ \sum_{k=1}^{N}\gamma _{k}\delta _{ {\mathcal G}({\bf x}_{k})}\ F_{\nu }(\alpha _{k})\ \nabla \phi _{0} \circ {\mathcal L}\ \Bigg] {\ast} K_{\sigma }\end{cases}$$
(18)

Note that the additional terms in Eq. (18) are just pointwise contributions that do not influence the overall computational time.
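Putting Eqs. (16) and (17) together, the outer loop of this scheme can be sketched as follows; `constraint_values` (standing for the c k (ψ)) and `descend` (the inner gradient descent of Eq. (18)) are hypothetical helpers.

```python
import numpy as np

def F_nu(a, b, nu):
    """Penalty function of Eq. (17)."""
    return np.where(nu * a <= b, -a * b + 0.5 * nu * a * a, -b * b / (2.0 * nu))

def constrained_segmentation(psi, constraint_values, descend,
                             nu0=1.0, rho=2.0, n_outer=10):
    nu = nu0
    alpha = np.zeros_like(constraint_values(psi))
    for _ in range(n_outer):
        # Step (1): closed-form multiplier update, Eq. (16)
        alpha = np.maximum(0.0, alpha - nu * constraint_values(psi))
        # Step (2): gradient descent on the smooth approximation, Eqs. (17)-(18)
        psi = descend(psi, alpha, nu)
        nu *= rho          # gradually increase the penalty parameter
    return psi
```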

Influence of User Interactions on Kidney Segmentation in CEUS

Validation of the user interactions was performed on a subset of 21 CEUS volumes from 21 different patients of our database. For each case, the automatic segmentation was run and its result was refined with user interactions from an expert. Figure 9 reports the Dice coefficients obtained as a function of the number of clicks. The score gradually increases as the user interacts with the algorithm but converges rapidly: most of the time, fewer than 3 clicks are needed for a fairly precise result (Dice ≥ 0.9). The results also show that even when the initialization produces a low score, very few interactions suffice to improve the segmentation substantially. The influence of user interactions is illustrated in Fig. 10, which shows results on a difficult case. The patient has many renal cysts, which are anechogenic and hinder the automatic segmentation. With 3 clicks, the segmentation becomes much closer to the ground truth.

Nevertheless, in some applications user interactions are not possible and the segmentation has to be fully automatic. In the next section, we propose to improve the kidney segmentation by simultaneously and automatically using the conventional US image.

Fig. 9

Boxplots of the Dice coefficient between the ground truth and the segmentation at different steps of the proposed algorithm. Boxplots show, respectively, the first decile, the first quartile, the median, the third quartile, and the ninth decile. Extreme points are shown separately

Fig. 10

Example of a segmentation with user interactions. (a) Slice of the original CEUS volume. (b) Comparison of the ground truth (green) and automatic segmentation (red). (c) Corrected segmentation (blue) with 2 inside points (blue dots) and one outside point (red dot). (d) 3D visualization of the ground truth (green), the automatic (red), and corrected (blue) segmentation, with constraint points

Joint Co-segmentation and Registration

Co-segmentation often denotes the task of finding, in each image, an object that shares the same appearance but not necessarily the same shape [20]. Here we look for exactly the same organ in two images, but with a different appearance. As simultaneous acquisition of US and CEUS is not possible on current 3D imaging systems, the two images are in arbitrary reference frames and need to be aligned. However, standard iconic registration methods are not suitable, since the visible structures, apart from the kidney itself, are completely different in US and CEUS. Co-segmentation shall therefore help registration, just as registration helps co-segmentation. This calls for a method that jointly performs the two tasks (see Fig. 11).

Fig. 11

Joint co-segmentation and registration. Given two different non-aligned images of the same object, the proposed method aims at segmenting this object in both images as well as estimating a rigid transformation between them

Although segmentation and registration are often seen as two separate problems, several approaches have been proposed to perform them simultaneously. Most rely on an iconic registration guiding the segmentation (e.g., [21–23]). Yet they assume that the segmentation is known in one of the images, which is not the case in our co-segmentation application. Moreover, as stated before, CEUS/US intensity-based registration is bound to fail since the visible structures do not correspond to each other. Instead of registering the images themselves, Wyatt et al. [24] developed a MAP formulation to perform registration on label maps resulting from a segmentation step; however, no shape model is enforced and noise can degrade the results. In [25], Yezzi et al. introduced a variational framework that consists of a feature-based registration in which the features are actually the segmenting active contours.

In this section, we extend both the kidney detection and the kidney segmentation previously described for a single 3D CEUS image to a pair of 3D CEUS and US images. To that end, we develop a generic joint co-segmentation and registration framework inspired by [25]. This results in a fully automated pipeline that provides both an improved kidney segmentation in the CEUS and US images and a registration between them. First, however, in order to use conventional US, we need to learn what the kidney looks like in such images.

Learning Appearance in Conventional Ultrasound

In CEUS images, bright areas indicate the presence of contrast agent, which is mainly localized in the kidney. This is why we directly used the image intensity as a voxel-wise probability of being inside the kidney. In conventional US images, however, this does not hold, and we need to transform the image into a more elaborate kidney probability map.

The kidney appearance has a much higher variability in US images, although its structure is consistent: kidneys are always composed of a bright sinus surrounded by a darker parenchyma (see Fig. 12). As intensity itself is not reliable enough, we chose to combine multiple image features using decision forests [26] to obtain a class posterior map p US.

Fig. 12

Kidney appearance in US images (the kidney boundary is denoted in red). (Left) Original images showing the high variability of the database. (Middle) Kidney probability given by the first classifier. (Right) Final kidney probability p US

Recent work [27–31] has demonstrated that adding contextual information improves spatial consistency and thus classification performance. Here we propose to exploit the kidney structure in a simple yet efficient way. Similarly to the auto-context framework introduced by Tu et al. [32], contextual information is included by using two classifiers in cascade. A first classification (kidney vs. background) is performed at each voxel using a decision forest. We then use these class posterior probabilities as additional input to a second random forest, which gives the final kidney probability p US. In the remainder of the chapter, we work on this map instead of the original US image.

The features used for the first decision forest were the intensity of the image and its Laplacian, at the considered voxel as well as at its neighbors within a 7 × 7 × 7 local patch, at three different scales (σ = 2, 4, 6 mm). Intensities were normalized in each patch. For the second forest, we added the estimated class posteriors as additional channels. Each forest was composed of 10 trees with maximum depth 15.
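As an illustration, the cascade could be implemented with scikit-learn forests as below. This is a simplified sketch: the second forest here sees only each voxel's own first-stage posterior, whereas the chapter also samples the posterior channels over local patches like the other features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_auto_context(X, y):
    """X: (n_voxels, n_features) appearance features; y: kidney(1)/background(0)."""
    rf1 = RandomForestClassifier(n_estimators=10, max_depth=15).fit(X, y)
    X2 = np.hstack([X, rf1.predict_proba(X)])   # append posteriors as channels
    rf2 = RandomForestClassifier(n_estimators=10, max_depth=15).fit(X2, y)
    return rf1, rf2

def kidney_probability(rf1, rf2, X):
    """Final map p_US, assuming class 1 is the kidney."""
    X2 = np.hstack([X, rf1.predict_proba(X)])
    return rf2.predict_proba(X2)[:, 1]
```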

To validate this probability estimation, the patient database was split into two groups, and results on the whole dataset were obtained using two-fold cross-validation. Figure 13 shows the ROC and Precision-Recall curves computed (1) with the first decision forest alone and (2) with the auto-context approach using another forest in cascade. The latter provides better kidney probabilities with respect to all reported statistics. Indeed, taking structural information into account helps, for example, in distinguishing the kidney sinus from the background or the parenchyma from shadows, and allows a more spatially coherent classification (see Fig. 12).

Fig. 13

Comparison of classification results for the single decision forest and the auto-context approach. (Left) ROC curve. (Right) Precision-Recall curve

Generic Framework for Co-segmentation and Registration

In sections “Kidney Detection via Robust Ellipsoid Estimation” and “Kidney Segmentation via Implicit Template Deformation”, we presented two variational methods to, respectively, detect and segment the kidney. Both consist of seeking ϕ as the minimizer of a functional of the following generic form:

$$\displaystyle{ E_{I}(\phi ) =\int _{\varOmega }f(\phi ({\bf x}))\ r_{I}({\bf x})\ d{\bf x} + {\mathcal R}(\phi ) }$$
(19)

where f is a real-valued function and r I (x) denotes a pointwise score on whether x looks like an interior or exterior voxel in the image I. This is a standard setting in which the optimal implicit function ϕ must achieve a trade-off between an image-based term and a regularization term \({\mathcal R}\).

We are interested in the case where a pair of images \(I_{1} :\varOmega _{1} \rightarrow \mathbb{R}\) and \(I_{2} :\varOmega _{2} \rightarrow \mathbb{R}\) of the same object is available. If those images were perfectly aligned, the energy of Eq. (19) could be straightforwardly generalized to perform co-segmentation:

$$\displaystyle{ E_{I_{1},I_{2}}(\phi ) =\int _{\varOmega _{1}}f(\phi ({\bf x}))\ (r_{I_{1}}({\bf x}) + r_{I_{2}}({\bf x}))\ d{\bf x} + {\mathcal R}(\phi )\ . }$$
(20)

Unfortunately, such an assumption rarely holds in medical applications unless the two images are acquired simultaneously. A more realistic hypothesis is to assume that the target object, segmented by ϕ, is not deformed between the two acquisitions, but only undergoes an unknown rigid transformation \({\mathcal G}_{r}\). The co-segmentation energy thus reads

$$\displaystyle{ E_{I_{1},I_{2}}(\phi ,{\mathcal G}_{r}) =\int _{\varOmega _{1}}f(\phi ({\bf x}))\ r_{I_{1}}({\bf x})\ d{\bf x} +\int _{\varOmega _{2}}f(\phi \circ {\mathcal G}_{r}({\bf x}))\ r_{I_{2}}({\bf x})\ d{\bf x} + {\mathcal R}(\phi )\ . }$$
(21)

Note that, after a variable substitution, it can be equivalently written

$$\displaystyle{ E_{I_{1},I_{2}}(\phi ,{\mathcal G}_{r}) =\int _{\varOmega _{1}}f(\phi ({\bf x}))\ (r_{I_{1}}({\bf x}) + r_{I_{2}} \circ {\mathcal G}_{r}^{-1}({\bf x}))\ d{\bf x} + {\mathcal R}(\phi )\ . }$$
(22)

Minimizing \(E_{I_{1},I_{2}}\) with respect to ϕ and \({\mathcal G}_{r}\) simultaneously can therefore be interpreted as jointly performing segmentation (via ϕ) and rigid registration \(({\rm via}\ {\mathcal G}_{r})\). This generalizes more common co-segmentation approaches (e.g., [34]) in which the images are first aligned in a preprocessing step.
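After the substitution of Eq. (22), co-segmentation only requires resampling the second score map in the frame of the first image. A sketch with scipy, in voxel coordinates, assuming \({\mathcal G}_{r}({\bf x}) = {\bf R}{\bf x} + {\bf t}\) and ignoring anisotropic spacing:

```python
import numpy as np
from scipy.ndimage import affine_transform

def combined_score(r1, r2, R, t):
    """r_I1(x) + r_I2(G_r^{-1}(x)), the integrand of Eq. (22)."""
    # affine_transform computes output(x) = input(matrix @ x + offset),
    # so matrix = R^T and offset = -R^T t realize r2 composed with G_r^{-1}
    r2_aligned = affine_transform(r2, matrix=R.T, offset=-R.T @ t, order=1)
    return r1 + r2_aligned
```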

In the following, we apply this framework to the robust ellipsoid detection (section “Kidney Detection via Robust Ellipsoid Estimation”) and to implicit template deformation (section “Kidney Segmentation via Implicit Template Deformation”) to build a completely automated workflow for kidney segmentation in CEUS and US images. Note that the kidney, which is surrounded by a tough fibrous renal capsule, is a rigid organ; the non-deformation hypothesis is therefore justified.

Application to Kidney Detection

The robust ellipsoid detection setting of Eq. (3) falls into the framework described in Eq. (19) with:

  • f = Id and \(r_{I} = -wI\);

  • \({\mathcal R}(\phi _{ {\bf c},{\bf M}}) = {\mathcal R}({\bf M}) = \mu \left(\int _{\varOmega }Iw\right)\log \left (\frac{\text{Vol}({\bf M})} {\vert \varOmega \vert } \right )\).

Extending this algorithm to a second image I 2 requires the introduction of another weighting function w 2. Following Eq. (21), we define the co-detection energy as

$$\begin{array}{rl} E_{\mathrm{ co-det}}({\bf c},{\bf M},w_{1},w_{2},{\mathcal G}_{r}) = & -\int _{\varOmega }\phi _{ {\bf c},{\bf M}}({\bf x})\ w_{1}({\bf x})\ I_{1}({\bf x})\ d{\bf x}\\ & -\int _{\varOmega }\phi _{ {\bf c},{\bf M}} \circ {\mathcal G}_{r}({\bf x})\ w_{2}({\bf x})\ I_{2}({\bf x})\ d{\bf x}\\ &+ \mu \ \left (\int _{\varOmega }w_{1}I_{1} + w_{2}I_{2}\right )\ \log \left (\frac{\text{Vol}({\bf M})} {\vert \varOmega \vert } \right ) \\ \text{with}\quad \text{Vol}({\bf M}) &= \:\frac{4\pi } {3} \sqrt{\det {{\bf M} }^{-1}}\quad \text{ the ellipsoid volume}.\\ \end{array}$$
(23)

To facilitate the resolution of this problem, \({\mathcal G}_{r}\), as a rigid transformation, can be decomposed into a rotation and a translation. We can therefore equivalently write the energy as a function of the ellipsoid center c 2 in the second image and of the rotation matrix R:

$$\begin{array}{rl} E_{\mathrm{ co-det}}({\bf c}_{i},w_{i},{\bf R},{\bf M}) = & -\int _{\varOmega }\phi _{ {\bf c}_{1},{\bf M}}({\bf x})\ w_{1}({\bf x})\ I_{1}({\bf x})\ d{\bf x}\\ & -\int _{\varOmega }\phi _{ {\bf c}_{2},{{\bf R}}^{T}{\bf M}{\bf R}}({\bf x})\ w_{2}({\bf x})\ I_{2}({\bf x})\ d{\bf x}\\ & + \mu \ \left (\int _{\varOmega }w_{1}I_{1} + w_{2}I_{2}\right )\ \log \left (\frac{\text{Vol}({\bf M})} {\vert \varOmega \vert } \right )\\ \end{array}$$
(24)

Minimization of this functional is done in an alternate three-step process:

  1. The statistical interpretation still holds for the ellipsoid centers and matrix: the minimizers c 1 and c 2 are weighted centroids, while the minimizer M is related to the weighted covariance matrix of the voxels coming from both images.

  2. The unknown matrix R accounts for the possible rotation between the two images and can be parametrized by a vector of angles \(\varTheta \in \mathbb{R}^{3}\) (a possible parametrization is sketched below). A gradient descent is performed at each iteration to minimize the energy with respect to Θ.

  3. The weights w 1 and w 2 are finally updated as indicator functions (up to a slight dilation) of the current ellipsoid estimates.

The complete minimization strategy is summarized in Algorithm 3. It is computationally efficient: closed-form solutions are available (except for R) and the process, though iterative, usually converges in very few iterations.
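The chapter does not specify how R is parametrized by the angle vector Θ; a standard choice (an assumption here) is a product of elementary rotations:

```python
import numpy as np

def rotation_from_angles(theta):
    """R(Theta) as extrinsic x-y-z Euler rotations (one possible parametrization)."""
    (cx, cy, cz), (sx, sy, sz) = np.cos(theta), np.sin(theta)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx
```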

Figure 14 shows an example of ellipse co-detection on synthetic images, where the probability of belonging to the target object is the image intensity. Despite the noise, the simulated shadow, and the reduced field of view, the co-detection algorithm provides a good estimate of the ellipse position, size, and orientation in both images.

Fig. 14

Ellipse detection on two synthetic images I 1 (a) and I 2 (d). Detected ellipses with their center and main axes are shown in (b) and (e) for independent ellipse detection (red) and proposed method for co-detection (blue) compared to the ground truth (green). (c) Second image registered with the estimated transform \({\mathcal G}_{r}^{-1}\). (f) Combination of image terms \(w_{1}I_{1} + (w_{2}I_{2}) \circ {\mathcal G}_{r}^{-1}\) used for ellipse estimation at convergence

Application to Kidney Segmentation

Implicit template deformation, as described in section “Kidney Segmentation via Implicit Template Deformation”, falls into the framework defined in Eq. (19) with:

  • f = H and \(r_{I} = -\varDelta I\);

  • \({\mathcal R}(\phi _{0} \circ \psi ) = {\mathcal R}({\mathcal L}) = \frac{\lambda } {2}\ \|{\mathcal L}- Id\|_{2}^{2}\).

We can therefore extend it to co-segmentation using Eq. (21) by considering the following functional

$$\begin{array}{rl} E_{\mathrm{ co-seg}}(\phi _{0} \circ {\mathcal L}\circ {\mathcal G},{\mathcal G}_{r}) &= \ E_{\mathrm{ co-seg}}({\mathcal L},{\mathcal G},{\mathcal G}_{r}) \\ &= -\int _{\varOmega }H(\phi _{0} \circ {\mathcal L}\circ {\mathcal G})\ \varDelta I_{1}({\bf x})\ d{\bf x} \\ & \quad-\int _{\varOmega }H(\phi _{0} \circ {\mathcal L}\circ {\mathcal G}\circ {\mathcal G}_{r})\ \varDelta I_{2}({\bf x})\ d{\bf x} \\ & \quad+ \frac{\lambda } {2}\|{\mathcal L}- Id\|_{2}^{2} .\end{array}$$
(25)

The energy \(E_{\mathrm{ co-seg}}\) is then minimized with respect to the parameters of \({\mathcal G}\) and \({\mathcal G}_{r}\) and to each component of the vector field u, through a gradient descent similar to that of section “Numerical Implementation”.

Results for Kidney Co-segmentation in CEUS and US

The average overall computational time for kidney probability estimation in US, ellipsoid co-detection, and kidney co-segmentation was around 20 s with our implementation.

Validation was performed by comparing the co-segmentation approach to segmentation from a single image (in both the CEUS and US cases). Dice coefficients and relative errors on the measured kidney volume are reported in Fig. 15. Simultaneously using the complementary information of the US and CEUS images significantly improves the segmentation accuracy in both modalities. More specifically, the median Dice coefficient increases from 0.74 to 0.81 in CEUS (p-value < 10−4) and from 0.73 to 0.78 in US (p-value < 10−4). Furthermore, the proposed approach provides more reliable clinical information, as the median error on the kidney volume is almost halved in CEUS (from 29 % to 15 %) and in US (from 25 % to 13 %). Figure 16 shows the joint co-segmentation and registration results for one case. Independent segmentation fails in both the US and CEUS images because of the kidney lesion (indicated by the yellow arrow), which looks like the background in CEUS but like the kidney in US. Conversely, the proposed co-segmentation overcomes this difficulty by combining information from the two modalities. Furthermore, for this example, the estimated registration can be assessed by comparing the location of the lesion in the two modalities. Results on another case are displayed in Fig. 11.

Fig. 15

Boxplots of segmentation results for kidney segmentation in US and CEUS images, in terms of Dice coefficients (a)–(b) and relative volume error (c)–(d). The proposed co-segmentation compares favorably to independent segmentation with a p-value < 10−4. Boxplots show, respectively, the first decile, the first quartile, the median, the third quartile, and the ninth decile. Extreme points are shown separately

Fig. 16

Example of joint co-segmentation and registration for a CEUS (top) and a US (bottom) image. (Left) Comparison of the independent segmentations (red) and the proposed co-segmentation (blue) with respect to the ground truths (green). (Middle, Right) Two views of the registered volumes, which can be assessed by considering the position of the lesion (yellow arrow)

Conclusion

This chapter addressed the problem of kidney segmentation in 3D CEUS images. The task is challenging because of the noise, the artifacts, and the partial occlusion of the organ (due to the limited field of view).

A robust ellipsoid detector was first introduced to coarsely locate the kidney. The ellipsoid is then deformed to segment the kidney more precisely, by maximizing the flux of the image gradient through the segmentation boundary within the template deformation framework. This fully automatic pipeline provides a satisfying segmentation in a large number of cases but may fail when the image information is too ambiguous (shadows, pathologies, etc.).

To overcome such difficulties, two extensions of this approach were proposed to take additional information into account. First, we showed how user interactions can be exploited to guide the segmentation in real-time, by letting the user indicate points that should be inside, outside, or on the segmentation. Then, we introduced a generic co-segmentation framework that generalizes any segmentation method to allow the simultaneous use of multiple images (here the CEUS and US images). This results in both a better estimate of the organ shape and a registration of the images. The two extensions are compatible, and including user interactions in multiple images would be straightforward.

The kidney detection could still be improved by including more prior anatomical knowledge. A possible solution would be to constrain the ellipsoid’s axis lengths or volume to stay close to clinically meaningful values. Another option is to use CT images of the same patient to extract a tailored model of the kidney and help both the CEUS detection and the segmentation.