1 Introduction

The task of automated image segmentation has become increasingly important in the last decade, due to a fast-expanding field of applications, e.g., in biomedical imaging. The main goal of image segmentation is to recover an object-of-interest from a given dataset by partitioning it into disjoint compartments. In general, one can distinguish between edge-based (cf. e.g., [15, 44, 55]) and region-based (cf. e.g., [20, 22, 79]) segmentation methods. In this paper we concentrate on the latter, since our work is motivated by segmentation tasks in biomedical imaging, where we have to segment continuous objects-of-interest.

Recently, mathematical tools such as level sets, active contours, and variational methods have led to significant improvements in automated image segmentation. One famous framework, which also allows one to incorporate a-priori knowledge into the process of segmentation, is the popular Mumford-Shah (MS) model [55]. Based on this framework, the frequently used Chan-Vese segmentation method [20] was developed, which simplifies the MS segmentation model to the case of piecewise constant approximations of the image intensity.

1.1 Motivation

Despite its prominence in the segmentation community, the MS formulation has not yet been investigated in the more general context of physical noise modeling. Noise modeling is a crucial part of image denoising, since a denoising method has to account for the image noise in order to produce satisfying results. Some exemplary literature on image denoising based on statistical methods can be found in [6, 19, 47]. Furthermore, only a few publications have considered the effect of a specific noise model on the results of image segmentation [23, 53]. Since the field of applications for automated image segmentation grows steadily, many segmentation problems need a suitable noise model, e.g., in synthetic aperture radar, positron emission tomography, or medical ultrasound imaging. Especially for data with poor statistics, i.e., with a low signal-to-noise ratio, it is important to consider the impact of the underlying noise model in the process of segmentation, as we will show in later sections.

Besides the choice of noise model, the selection of appropriate regularizers has a remarkable influence on the results of a segmentation approach. By incorporating a-priori knowledge into the mathematical model, one is able to reduce the set of possible solutions to a subset of segmentations satisfying certain constraints. This effect can be seen best in the work of Chan-Vese [20], which restricts the set of solutions of the segmentation problem to the subset of piecewise constant functions. However, there are situations in which the original signal is not constant at all, e.g., inhomogeneous intensity distributions in magnetic resonance imaging [81] or medical ultrasound imaging [85]. This motivates making the incorporation of a-priori knowledge adjustable with the help of suitable regularizers.

In the following we describe a general segmentation framework for different physical noise models, which also allows the incorporation of a-priori knowledge by using different regularization terms. In Sect. 2 we introduce a statistical formulation of the region-based segmentation problem and present the general segmentation framework. We study our statistical formulation for three different noise models, i.e., additive Gaussian noise, Poisson noise, and multiplicative speckle noise. The framework is analyzed extensively in Sect. 3. In particular, we investigate the existence of a solution and discuss different regularizers for the presented variational segmentation formulation, which allow us to incorporate a-priori knowledge about solutions to the segmentation problem. The relationship to the classical formulations of Chan-Vese and Mumford-Shah is discussed in detail in Sect. 4. In particular, we show that the region-based variant of the popular MS model is a special case of our framework for additive Gaussian noise. Moreover, we consider a natural extension of the Chan-Vese method to non-Gaussian noise models. Section 5 describes the numerical realization of the proposed segmentation formulation. We discuss the possibilities of global convex minimization and describe how to implement the corresponding optimization schemes efficiently. We present experimental results in Sect. 6 and validate our methods on both synthetic and real data from medical imaging with challenging image attributes, e.g., high noise levels and intensity inhomogeneities. Finally, the paper closes with a discussion in Sect. 8.

2 Region-Based Segmentation Framework

The main idea of the region-based segmentation framework proposed in this paper is based on the fact that a wide range of noise types is present in real-life applications, particularly including noise models that are fundamentally different from additive Gaussian noise. To formulate a segmentation framework for different noise models and thus for a large set of imaging modalities, we use tools from statistics. First, we introduce some preliminary definitions to describe our model accurately.

Let Ω⊂ℝd be the image domain (we consider the typical cases d∈{2,3}) and let f be the given (noisy) image we want to segment. The segmentation problem consists in separating the image domain Ω into an “optimal” partition \(\mathcal {P}_{m}(\varOmega )\) of pairwise disjoint regions Ω i , i=1,…,m, i.e.,

$$ \mathcal {P}_m(\varOmega ) = \{ \varOmega_1, \ldots, \varOmega_m \} , \qquad \varOmega = \bigcup_{i=1}^{m} \varOmega_i , \qquad \varOmega_i \cap \varOmega_j = \emptyset \ \text{ for } i \neq j . $$
(1)

Naturally, the partition \(\mathcal {P}_{m}(\varOmega )\) is to be chosen with respect to the given image information induced by f, e.g., a separation into an object-of-interest and background for m=2. Finally, we remark that in most cases the specific order of the Ω i in (1) does not matter.

In this work we are not only interested in the partition \(\mathcal {P}_{m}(\varOmega )\) of the image domain but also in the simultaneous restoration of the given data f as an approximation of the original noise free image. For this purpose we follow the idea proposed in [12] and [84, Sect. 4.4.4] and compute a smooth function u i for each subregion Ω i of \(\mathcal {P}_{m}(\varOmega )\), where the smoothness of u i is not only enforced in Ω i but on the entire image domain Ω. Using this approach an approximation u of the noise free image is given by

$$ u = \chi_{\varOmega_1} u_1 + \cdots + \chi_{\varOmega_m} u_m , $$
(2)

where \(\chi_{\varOmega_{i}}\) denotes the indicator function of Ω i and u i is a globally smooth function induced by Ω i and the given data f, i.e.,

$$ \chi_{\varOmega_i}(x) = \begin{cases} 1 , & \text{if } x \in \varOmega_i , \\ 0 , & \text{else} , \end{cases} $$
(3)

and

$$ u_i = \begin{cases} \text{restoration of } f \text{ in } \varOmega_i ,\\ \text{appropriate extension in } \varOmega\setminus\varOmega_i . \end{cases} $$
(4)
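To make the composition (2)-(4) concrete, the following minimal numpy sketch (with hypothetical names and values, m=2, and a 1D domain) assembles an approximation u from two indicator functions and two globally defined restorations:

```python
import numpy as np

# Sketch of (2)-(4) for m = 2 on a 1D domain discretized by 200 pixels.
x = np.linspace(0.0, 1.0, 200)
chi_1 = (x < 0.5).astype(float)   # indicator function of Omega_1, cf. (3)
chi_2 = 1.0 - chi_1               # indicator function of Omega_2

# Globally smooth functions u_i: restorations of f in Omega_i,
# extended smoothly to all of Omega, cf. (4).
u_1 = np.full_like(x, 2.0)
u_2 = 1.0 + 0.5 * x

u = chi_1 * u_1 + chi_2 * u_2     # approximation of the noise-free image, cf. (2)
```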

2.1 Statistical Formulation of Region-Based Segmentation

In this section we provide a general region-based segmentation framework from the viewpoint of statistical (Bayesian) modeling. Following [25, 61] the partition \(\mathcal {P}_{m}(\varOmega )\) of the image domain Ω can be computed via a maximum a-posteriori probability (MAP) estimation, i.e., by maximizing the a-posteriori probability density \(p(\mathcal {P}_{m}(\varOmega )|f)\) using Bayes’ theorem. However, since we also want to restore an approximation u of the original noise free image, we have to maximize a modified a-posteriori probability density as discussed below. In order to give precise statements on probability densities, we use a discrete formulation, with N denoting the number of pixels (or voxels), and express the dependency on N by a superscript on the functions (to be interpreted as piecewise constant on pixels and identified with the finite-dimensional vector of coefficients in a suitable basis) and on the partitions (any subdomain restricted to be a union of a finite number of pixels). As a last step we consider the formal limit N→∞ to obtain the variational model. Since this serves as a motivation only, we will not treat the challenging problem of analyzing the continuum limit, which in the case of hierarchical Bayesian priors related to the standard Mumford-Shah model was already carried out in [40].

In the following, we have to maximize an a-posteriori probability density \(p(u^{N},\mathcal {P}^{N}_{m}(\varOmega )\mid f^{N})\), which can be rewritten as

$$ p(u^N,\mathcal {P}^N_m(\varOmega )\mid f^N) = \frac{p(\mathcal {P}^N_m(\varOmega )) \; p(u^N \mid \mathcal {P}^N_m(\varOmega )) \; p(f^N \mid u^N,\mathcal {P}^N_m(\varOmega ))}{p(f^N)} . $$
(5)

The main advantage of the statistical formulation in (5) is the possibility to separate geometric properties of the partition of Ω (first term) from image-based features (second and third term). In addition, the densities on the right-hand side of (5) are often easier to model than the a-posteriori probability density \(p(u^{N},\mathcal {P}^{N}_{m}(\varOmega )\mid f^{N})\) itself. For the sake of completeness, we note that the probability densities \(p(\mathcal {P}^{N}_{m}(\varOmega ))\) and \(p(u^{N} \mid \mathcal {P}^{N}_{m}(\varOmega ))\) in (5) allow us to incorporate a-priori information with respect to the desired partition \(\mathcal {P}^{N}_{m}(\varOmega )\) and the restoration u N in (2) into the segmentation process. Moreover, the likelihood \(p(f^{N} \mid u^{N},\mathcal {P}^{N}_{m}(\varOmega ))\) in (5) solely depends on the noise model in the given data f N, i.e., on the image formation process of the imaging device.

In order to characterize the a-priori probability density \(p(\mathcal {P}^{N}_{m}(\varOmega ))\) in (5), we consider a geometric prior which is most frequently used in segmentation problems. This prior provides a regularization constraint favoring smallness of the edge set

$$\varGamma^N = \bigcup_{i \neq j} ( \partial\varOmega_i^N \cap \partial\varOmega_j^N \setminus\partial\varOmega) $$

in the (d−1)-dimensional Hausdorff measure \(\mathcal {H}^{d-1}\), i.e.,

$$ p(\mathcal {P}^N_m(\varOmega )) \propto e^{- \beta \mathcal {H}^{d-1}_N(\varGamma^N)} , \quad\beta > 0 . $$
(6)

Note that in order to avoid unwanted grid effects one should use an appropriate approximation \(\mathcal {H}^{d-1}_{N}\) of the Hausdorff measure which also guarantees a correct limit as N→∞.

For characterization of the two remaining densities \(p(u^{N} \mid \mathcal {P}^{N}_{m}(\varOmega ))\) and \(p(f^{N} \mid u^{N},\mathcal {P}^{N}_{m}(\varOmega ))\) in (5) we assume in the following that the functions \(u_{i}^{N}\) in (2) are uncorrelated and independent with respect to the partition \(\mathcal {P}^{N}_{m}(\varOmega )\). This is natural since the segmentation should exactly separate the parts with different behavior of u N. Hence, due to the composition of u N by functions \(u_{i}^{N}\) and the pairwise disjoint partition of Ω N by \(\varOmega_{i}^{N}\) in (1), we obtain simplified expressions of the form

$$ p(u^N \mid \mathcal {P}^N_m(\varOmega )) = \prod_{i=1}^m p(u_i^N \mid \varOmega_i^N) $$
(7)

and

$$ p(f^N \mid u^N,\mathcal {P}^N_m(\varOmega )) = \prod_{i=1}^m p(f^N \mid u_i^N,\varOmega_i^N) , $$
(8)

where \(p(u_{i}^{N} \mid \varOmega_{i}^{N})\) and \(p(f^{N} \mid u_{i}^{N},\varOmega _{i}^{N})\) denote for a region-of-interest \(\varOmega_{i}^{N}\) the probability of observing an image \(u_{i}^{N}\) and f N, respectively.

First, we discuss the densities \(p(u_{i}^{N} \mid \varOmega_{i}^{N})\) from (7), which can be reduced to a-priori probability density functions \(p(u_{i}^{N})\). The most frequently used a-priori densities, in analogy to statistical mechanics, are Gibbs functions [31, 32] of the form

$$ p(u_i^N) \propto e^{- \alpha_i R_i^N(u_i^N)} , \quad\alpha_i > 0 , $$
(9)

where \(R_{i}^{N}\) is a discretized version of a non-negative (and usually convex) energy functional R i . Using these a-priori densities, (7) is then given by

$$ p(u^N \mid \mathcal {P}^N_m(\varOmega )) \propto \prod_{i=1}^m e^{- \alpha_i R_i^N(u_i^N)} . $$
(10)

To characterize the densities \(p(f^{N} \mid u_{i}^{N},\varOmega_{i}^{N})\) in (8) we assume that each value \({f^{N}}_{|_{Px}}\) (with PxΩ N being a pixel) describes a realization of a random variable and all random variables are pairwise independent and identically distributed in the same corresponding subregion \(\varOmega_{i}^{N}\). Consequently, it is possible to replace the probability \(p(f^{N} \mid u_{i}^{N},\varOmega_{i}^{N})\) by a joint a-posteriori probability \(p_{i}(f^{N} \mid u_{i}^{N})\) in \(\varOmega _{i}^{N}\), i.e., the expression in (8) (with Px denoting a pixel) reads as

$$ p(f^N \mid u^N,\mathcal {P}^N_m(\varOmega )) = \prod_{i=1}^m \; \prod_{Px \subset \varOmega_i^N} p_i\bigl({f^N}_{|_{Px}} \mid {u_i^N}_{|_{Px}}\bigr) . $$
(11)

As mentioned above, we use the MAP estimator to determine an approximation of the unknown image u and a partition of the image domain \(\mathcal {P}_{m}(\varOmega )\). Following this approach, we maximize the a-posteriori probability (5), respectively minimize its negative logarithm, i.e.,

$$ \max_{u^N,\, \mathcal {P}^N_m(\varOmega )} \; p(u^N,\mathcal {P}^N_m(\varOmega )\mid f^N) \quad \Longleftrightarrow \quad \min_{u^N,\, \mathcal {P}^N_m(\varOmega )} \; - \log p(u^N,\mathcal {P}^N_m(\varOmega )\mid f^N) . $$
By inserting the a-priori constraints (6) and (10) for the geometric and image terms, respectively, as well as the region-based image term (11), we consequently minimize the following energy functional,

$$ E^N(u^N,\mathcal {P}^N_m(\varOmega )) = \beta \, \mathcal {H}^{d-1}_N(\varGamma^N) + \sum_{i=1}^m \biggl( \alpha_i R_i^N(u_i^N) - \sum_{Px \subset \varOmega_i^N} \log p_i\bigl({f^N}_{|_{Px}} \mid {u_i^N}_{|_{Px}}\bigr) \biggr) . $$
(12)

We already stated above that a suitable selection of probability densities \(p_{i}(f^{N} \mid u_{i}^{N})\) depends on the underlying physical noise model in the given data f N and the subregion \(\varOmega_{i}^{N}\). For the cases of additive Gaussian, Poisson, and multiplicative speckle noise we present the corresponding form of \(p_{i}(f^{N} \mid u_{i}^{N})\) in Sect. 2.2.

The variational problem (12) for the MAP estimate has a formal continuum limit (with α i and β rescaled by the pixel volume), which we shall consider as the basis of the proposed variational framework in the following:

$$ E(u_1,\ldots,u_m,\mathcal {P}_m(\varOmega )) = \beta \, \mathcal {H}^{d-1}(\varGamma) + \sum_{i=1}^m \biggl( \alpha_i R_i(u_i) + \int_{\varOmega_i} - \log p_i(f(x) \mid u_i(x)) \, dx \biggr) . $$
(13)

Finally, we add that in the context of inverse problems the functionals R i in (13) and the Gibbs a-priori density in (9) are related to regularization functionals, whereas the resulting functionals \(\int_{\varOmega_{i}} \! - \log p_{i}(f(x) \mid u_{i}(x)) dx\) are related to data fidelity terms for each subregion Ω i .

The main advantage of the proposed region-based segmentation framework (13) is the ability to handle the information, i.e., the occurring type of noise and the desired smoothness conditions, in each subregion Ω i of the image domain Ω separately. For example, it is possible to choose different smoothing functionals R i if subregions of different characteristics are expected. Moreover, the framework in (13) describes a direct generalization of the Chan-Vese method and the region-based version of the Mumford-Shah segmentation model to non-Gaussian noise problems, which is further discussed in Sect. 4.

2.2 (Non-)Gaussian Noise Models

As mentioned above, the choice of probability densities p i (f∣u i ) in (13) solely depends on the noise occurring in the data f and the subregion Ω i . Typical examples of probability densities p i (f∣u i ) arise for exponentially distributed raw data f (see e.g., [23, 53]). In most cases it is (often implicitly) assumed that the data is perturbed by additive Gaussian noise. However, there are many real-life applications in which different types of noise occur, especially signal-dependent ones, which form the main interest of this paper. In this work we focus our attention on Poisson and multiplicative speckle noise. However, we note that there are also other non-Gaussian noise models which can be of particular interest, e.g., salt-and-pepper noise or different types of multiplicative noise like Gamma noise [6], multiplicative Gaussian noise [67], or Rayleigh-distributed noise [68].

In the following we discuss the selection of probability densities p i (f∣u i ) for additive Gaussian noise, Poisson noise, and multiplicative speckle noise. To illustrate the different characteristics of these noise forms, we show in Fig. 1 a synthetic 1D signal corrupted by the mentioned three noise models. We can observe that Poisson and speckle noise in general perturb the signal more strongly than additive Gaussian noise. Hence, an appropriate choice of probability densities is required to handle the perturbation effects of different noise models accurately. For the sake of simplicity, and since we are only interested in the formulation in (13), we will write p i (f(x)∣u i (x)) in the following. However, this term has to be interpreted pixelwise, in the sense of the discrete modeling above.

Fig. 1

Illustration of three physical noise models in one dimension. (a) Noise free 1D signal. (b) Signal biased by additive Gaussian noise with σ=5 (see Sect. 2.2.1). (c) Signal biased by Poisson noise (see Sect. 2.2.2). (d) Signal biased by speckle noise with σ=5 (see Sect. 2.2.3). Note that we cut off a few values in the case of the speckle noise model to maintain a comparable vertical scale for each signal. We observe that the Poisson and multiplicative speckle noise are much stronger than the classical additive Gaussian noise
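The qualitative behavior shown in Fig. 1 can be reproduced with a few lines of numpy; the following sketch (with hypothetical signal values of our choosing) draws the three perturbations of a piecewise constant 1D signal u:

```python
import numpy as np

rng = np.random.default_rng(0)
u = np.where(np.arange(200) < 100, 20.0, 60.0)  # noise-free piecewise constant signal
sigma = 5.0

f_gaussian = u + sigma * rng.normal(size=u.shape)               # f = u + eta, Sect. 2.2.1
f_poisson  = rng.poisson(u).astype(float)                       # photon counts, Sect. 2.2.2
f_speckle  = u + np.sqrt(u) * sigma * rng.normal(size=u.shape)  # f = u + sqrt(u) eta, Sect. 2.2.3
```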

2.2.1 Additive Gaussian Noise

One of the most commonly used noise models in computer vision and mathematical image processing is the additive Gaussian noise model (see Fig. 1(b)) of the form f=u+η, where η is a Gaussian-distributed random variable with expectation 0 and variance σ 2. This kind of noise is signal-independent and identically distributed over the whole image domain. For this case the conditional probability p i (f(x)∣u i (x)) in (11) is given by (cf. e.g., [11])

$$p_i(f(x) \mid u_i(x)) \propto e^{- \frac{1}{2\sigma^2} (u_i(x) - f(x))^2} . $$

Thus, this model leads to the following negative log-likelihood function in the energy functional E in (13),

$$ - \log p_i(f(x) \mid u_i(x)) = \frac{1}{2\sigma^2} (u_i(x) - f(x))^2 . $$
(14)

Consequently, the additive Gaussian noise model induces the commonly used L 2 data fidelity term, which is the canonical choice of fidelity in many segmentation formulations, e.g., in the Mumford-Shah or Chan-Vese model (see Sect. 4). Hence, these segmentation methods are successful on a large class of images, since additive Gaussian noise is the most common form of noise. Finally, we mention that the factor σ 2 in (14) is neglected in the course of this work, because it can be absorbed into the regularization parameters α i and β in the energy functional (13).
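As a small illustration, the pointwise fidelity (14) can be written as follows (a sketch with the factor 1/(2σ²) dropped as described above; the function name is ours):

```python
import numpy as np

def gaussian_fidelity(f, u):
    # Pointwise negative log-likelihood (14); the factor 1/(2*sigma^2) is
    # omitted, since it can be absorbed into alpha_i and beta in (13).
    return 0.5 * (u - f) ** 2
```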

2.2.2 Poisson Noise

In contrast to the additive noise model described above, we are also interested in Poisson noise, the so-called 'photon counting noise'. In Fig. 1(c) the effect of Poisson noise on an unbiased signal can be observed. This type of noise is signal-dependent and appears in a wide class of real-life applications, e.g., in positron emission tomography [78, 83], fluorescence microscopy [27, 41], CCD cameras [74], and astronomical images [46, 51]. For Poisson noise one indeed counts natural numbers as data, so the image in the discrete modeling needs to be quantized as well. Thus, the conditional probability \(p_{i}({f^{N}}_{|_{Px}} \mid {u_{i}^{N}}_{|_{Px}})\) in (11) with \(Px \subset\varOmega _{i}^{N}\) is modeled as (cf. e.g., [11])

$$p_i({f^N}_{|_{Px}} \!\!\!= k \mid {u_i^N}_{|_{Px}} \!\! \!= \lambda) = \frac{\lambda^k}{k !} e^{- \lambda} , $$

and leads in the limit to the following negative log-likelihood function for the energy functional E in (13),

$$ - \log p_i(f(x) \mid u_i(x)) = u_i(x) - f(x) \log u_i(x) . $$
(15)

Note that the fidelity can be rescaled in f and u i in a straightforward way, such that one can pass from quantized intensities to real values in the reasonable limit of sufficiently high count rates. Up to additive terms independent of u i , the corresponding data fidelity term in (15) becomes the so-called Kullback-Leibler (KL) divergence (also known as cross entropy or I-divergence) between the two non-negative measures f and u i . The most significant difference of the KL fidelity compared to the L 2 data term in (14) is the strong non-linearity of the KL functional, leading to complications in the computation of minimizers in (13).
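In discrete form, the resulting data term reads, up to terms independent of u_i, as below (a sketch; the small eps guard against log(0) is our addition):

```python
import numpy as np

def poisson_fidelity(f, u, eps=1e-12):
    # Pointwise negative log-likelihood (15): u - f log(u).  Summing this over
    # the pixels of a subregion yields the KL-type data term discussed above.
    return u - f * np.log(u + eps)
```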

2.2.3 Multiplicative Speckle Noise

The last noise model we want to investigate for the generalized segmentation framework (13) is signal-dependent noise of the form \(f = u + \sqrt{u} \eta\), where η is a Gaussian-distributed random variable with mean 0 and variance σ 2. The appearance of this noise form is illustrated in Fig. 1(d), where the noise variance varies spatially with the signal amplitude. This effect occurs because the noise model is of multiplicative nature, i.e., the noise variance directly depends on the underlying signal intensity. This type of noise can be found, e.g., in diagnostic ultrasound imaging [43, 45, 52] and corresponds to an experimentally derived model of multiplicative speckle noise [77] in ultrasound images. Using this noise model, the conditional probability p i (f(x)∣u i (x)) in (11) is modeled as

$$p_i(f(x) \mid u_i(x)) \propto \left(u_i(x)\right)^{-\frac {1}{2}} e^{- \frac{1}{2\sigma^2} \frac{(u_i(x) - f(x))^2}{u_i(x)}} , $$

and the negative log-likelihood function in the energy functional E in (13) is given by

$$ - \log p_i(f(x) \mid u_i(x)) = \frac{1}{2\sigma^2} \frac{(u_i(x) - f(x))^2}{u_i(x)} + \frac{1}{2} \log u_i(x) . $$
(16)

Although this noise model is somewhat similar to the additive Gaussian noise model introduced in Sect. 2.2.1, its impact on the given data f differs fundamentally from the influence of additive Gaussian noise due to the multiplicative scaling of the noise η by the signal \(\sqrt{u}\). Consequently, this aspect leads to a more complicated form of the data fidelity term and hence to additional difficulties in the computation of minimizers in (13). As for additive Gaussian noise in Sect. 2.2.1, we can multiply the right-hand side of (16) by 2σ 2 and incorporate this scaling factor into the regularization parameters α i and β in (13).
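A discrete sketch of the rescaled fidelity (16), i.e., of (49) in Sect. 3.5.3 below, may look as follows (the clipping constant eps is our addition, reflecting that u must stay bounded away from zero):

```python
import numpy as np

def speckle_fidelity(f, u, sigma, eps=1e-12):
    # Pointwise negative log-likelihood (16), multiplied by 2*sigma^2:
    # (u - f)^2 / u + sigma^2 log(u), cf. (49) in Sect. 3.5.3.
    u = np.maximum(u, eps)
    return (u - f) ** 2 / u + sigma ** 2 * np.log(u)
```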

2.3 Two-Phase Formulation of Region-Based Segmentation Framework

In the following we will restrict the proposed segmentation framework in Sect. 2.1 to a two-phase segmentation problem (i.e., we set m=2 in (1) and (13)) for multiple reasons. First, we are mainly interested in medical imaging applications in which we want to segment objects in a complex background. Second, we extensively employ exact convex relaxation (see e.g., Lemma 1) in our analysis and numerical solution. In principle an extension to multiphase models can be performed with exactly the same difficulties as in the case of the standard Chan-Vese model (cf. [13, 24, 26, 37, 48, 49, 63, 64, 79]).

For a two-phase formulation of the region-based segmentation framework (13) we introduce the following notations. First, we assume that we want to segment the image domain Ω into a background and a target subregion for the wanted partition \(\mathcal {P}_{2}(\varOmega )\) in (1), which we denote by Ω b and Ω t , respectively. Subsequently, we introduce an indicator function χ in order to represent both subregions such that

$$ \chi(x) = \begin{cases} 1 , & \text{if } x \in\varOmega_b , \\ 0 , & \text{else} . \end{cases} $$
(17)

The negative log-likelihood functions −logp i (fu i ) in (13) are defined as data fidelity functions using the notation

$$ D_i(f,u_i) = - \log p_i(f \mid u_i) \quad\text{ for } i \in \{ b,t \} . $$
(18)

Finally, we use the well-known relation between the Hausdorff measure and the total variation of an indicator function (see e.g., [5, Sect. 3.3] or [35, Ex. 1.4]), which implies that \(\mathcal {H}^{d-1}(\varGamma) = \lvert \chi \rvert _{\mathit {BV}(\varOmega )}\). Here ΓΩ is the edge set of the partition \(\mathcal {P}_{2}(\varOmega )= (\varOmega_{b}, \varOmega_{t})\), χ is defined in (17), and |⋅| BV(Ω) denotes the total variation of a function in Ω. Then the energy functional E in (13) can be rewritten for the case of a two-phase segmentation problem as

$$ E(u_b, u_t, \chi) = \int_\varOmega \chi \, D_b(f,u_b) \, dx + \int_\varOmega (1 - \chi) \, D_t(f,u_t) \, dx + \alpha_b R_b(u_b) + \alpha_t R_t(u_t) + \beta \, \lvert \chi \rvert _{\mathit {BV}(\varOmega )} . $$
(19)
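For illustration, a discrete evaluation of the two-phase energy (19) might be assembled as follows (a sketch on a 2D grid with forward differences; D is any of the pointwise fidelities from Sect. 2.2, here combined with the squared H 1 seminorm of Sect. 3.4.1 as regularizer, and all names are ours):

```python
import numpy as np

def tv_seminorm(chi):
    # Discrete isotropic total variation |chi|_BV (forward differences).
    gx = np.diff(chi, axis=0, append=chi[-1:, :])
    gy = np.diff(chi, axis=1, append=chi[:, -1:])
    return np.sum(np.sqrt(gx ** 2 + gy ** 2))

def h1_seminorm_sq(u):
    # Discrete squared H^1 seminorm, cf. (33) in Sect. 3.4.1.
    gx = np.diff(u, axis=0, append=u[-1:, :])
    gy = np.diff(u, axis=1, append=u[:, -1:])
    return 0.5 * np.sum(gx ** 2 + gy ** 2)

def two_phase_energy(f, u_b, u_t, chi, D, alpha_b, alpha_t, beta):
    # Discrete analogue of (19): chi selects the background fidelity,
    # (1 - chi) the target fidelity; the regularizers act on all of Omega.
    fidelity = np.sum(chi * D(f, u_b) + (1.0 - chi) * D(f, u_t))
    regular = alpha_b * h1_seminorm_sq(u_b) + alpha_t * h1_seminorm_sq(u_t)
    return fidelity + regular + beta * tv_seminorm(chi)
```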

3 Analysis of Region-Based Segmentation Framework

After the introduction of the general region-based segmentation framework in Sect. 2.1, we now provide a mathematical analysis of the variational problem induced by the energy functional E defined in (13). For the sake of simplicity, we restrict this analysis to the two-phase formulation proposed in (19) and thus consider a variational problem of the form

$$ \min_{(u_b, u_t, \chi) \in \mathcal {D}(E)} E(u_b, u_t, \chi) , $$
(20)

for which \(\mathcal {D}(E)\) denotes the (effective) domain of the energy functional E.

3.1 Assumptions

In this section we introduce the necessary foundations for the analysis of the variational problem (20). In particular, we discuss the choice of function spaces in \(\mathcal {D}(E)\) and state the required assumptions on the data fidelity terms D i and the regularization functionals R i in (19) with i∈{b,t}.

We start with the characterization of \(\mathcal {D}(E)\) and deal with admissible sets of functions in the following (denoted by an additional (Ω)), which we assume all to be closed subsets of Banach spaces. First, we give a general formulation of these spaces and then provide detailed conditions such that all integrals in (19) are well-defined.

Assumption 1

(Function sets)

Let i∈{b,t}, then we consider the following sets of functions:

  1. (i)

    The domain W i (Ω) of R i in (19) is a subset of a Banach space \(\widetilde {W}_{i}(\varOmega )\), which is closed with respect to a topology \(\tau _{\widetilde {W}_{i}}\) (which will later be the weak or strong norm topology depending on specific examples).

  2. (ii)

    The domain U i (Ω) of D i in (19) is a subset of a Banach space \(\widetilde {U}_{i}(\varOmega )\) such that \(\widetilde {W}_{i}(\varOmega )\subset \widetilde {U}_{i}(\varOmega )\) and the embedding \(( \widetilde {W}_{i}(\varOmega ), \tau _{\widetilde {W}_{i}})\) into \(\widetilde {U}_{i}(\varOmega )\) with the strong norm topology is continuous.

  3. (iii)

    The data function set is V(Ω)=V b (Ω)∩V t (Ω), for which V i (Ω) is a subset of a Banach space \(\widetilde {V}_{i}(\varOmega )\) such that \(\widetilde {V}_{i}(\varOmega )\) is continuously embedded in \(\widetilde {U}_{i}(\varOmega )\), both associated with the strong norm topologies.

Thus, the effective domain \(\mathcal {D}(E)\) of the energy functional E in (20) is given by

$$ \mathcal {D}(E) = W_b(\varOmega )\times W_t(\varOmega )\times \mathit{BV}(\varOmega; \{0,1\}) . $$
Note that Assumption 1(i) and (ii) are required since the purpose of the regularization functionals R i is to restrict the admissible solution set of the minimization problem (20) to smooth functions. Additionally, the assumption on V(Ω) is necessary due to the occurrence of the data function f in both fidelity terms D i in (19).

Remark 1

The properties of the function sets in Assumption 1 are chosen in this very general form in order to provide a unified framework for all cases of data fidelity and regularization terms. These abstract assumptions become clearer for the specific models discussed in the course of this work.

To have a first impression of the function sets W i (Ω), \(\widetilde {W}_{i}(\varOmega )\), U i (Ω), \(\widetilde {U}_{i}(\varOmega )\), V i (Ω), and \(\widetilde {V}_{i}(\varOmega )\), let us consider the choice of these sets for the simple example of additive Gaussian noise in the subregions Ω i and a squared H 1-seminorm regularization functional (cf. Sect. 3.4.1 and 3.5.1). Then we have to choose the function sets as

$$\widetilde {W}_i(\varOmega )= H^1(\varOmega) $$

and

$$W_i(\varOmega )= \left\{ u \in H^1(\varOmega) : \left| \frac {1}{\lvert \varOmega \rvert } \int_\varOmega u dx \right| \leq M_i \right\} $$

for a sufficiently large positive constant M i and

$$V_i(\varOmega )= \widetilde {V}_i(\varOmega )= U_i(\varOmega )= \widetilde {U}_i(\varOmega )= L^2(\varOmega) . $$

We continue with assumptions on the data fidelity terms D i and the regularization functionals R i in (19), which are necessary to ensure the existence of a regularized solution. For the sake of brevity, we introduce data fidelity functionals

$$ \bar{D}_i : V_i(\varOmega )\times U_i(\varOmega )\times \mathit{BV}(\varOmega; [0,1]) \rightarrow \mathbb {R}\cup\{+\infty\} , $$
(21)

defined by

$$ \bar{D}_i(f,u,v) = \begin{cases} \int_\varOmega v D_i(f,u) dx ,& \text{if } \bigl( v D_i(f,u) \bigr) \in L^1(\varOmega) , \\ +\infty , &\text{else} . \end{cases} $$
(22)

In the following we provide the required assumptions on the functionals \(\bar{D}_{i}\). However, note that some of these can easily be transferred to assumptions on D i in (22).

Assumption 2

(Energy functionals)

Let the function sets W i (Ω), U i (Ω), and V i (Ω) satisfy Assumption 1. Then we assume the following:

  1. (i)

    For any fixed φV i (Ω) and ψBV(Ω;[0,1]), the function \(u \mapsto\bar{D}_{i}(\varphi,u,\psi)\) is bounded from below.

  2. (ii)

    For any fixed φV i (Ω) the function is lower semi-continuous with respect to the topology \(\tau _{\widetilde {W}_{i}}\), where for all xΩ.

  3. (iii)

    The functional R i :W i (Ω)→ℝ≥0 is convex and lower semi-continuous with respect to the topology \(\tau _{\widetilde {W}_{i}}\).

  4. (iv)

    For any fixed φV i (Ω) and ψBV(Ω;[0,1]),

    $$ \mathcal {D}(\bar{D}_i(\varphi, \cdot , \psi)) \cap \mathcal {D}(R_i) \neq \emptyset $$
    (23)

    holds, where \(\mathcal {D}\) denotes the effective domain of a functional. In particular, this implies that the functionals R i are proper.

  5. (v)

    For every a>0, the sub-level sets \(\mathcal{S}_{R_{i}}(a)\) of the functional R i , defined by

    $$ \mathcal{S}_{R_i}(a) := \left\{ u \in W_i(\varOmega ): R_i(u) \leq a \right\} , $$
    (24)

    are sequentially precompact with respect to the topology \(\tau _{\widetilde {W}_{i}}\).

3.2 Convex Relaxation

In this section we briefly anticipate the numerical realization of the minimization problem (20) and provide the required theoretical basis. Due to the simultaneous minimization with respect to u b , u t , and χ, the problem (20) is hard to solve in general, and we use an alternating minimization scheme to achieve our aim. This approach is commonly used for segmentation models in the literature (e.g., for the models of Ambrosio-Tortorelli [4], Chan-Vese [20], or Mumford-Shah [55]) and leads to the following iterative minimization process,

$$ \bigl(u_b^{k+1}, u_t^{k+1}\bigr) \in \operatorname*{arg\,min}_{(u_b, u_t) \in W_b(\varOmega )\times W_t(\varOmega )} E\bigl(u_b, u_t, \chi^k\bigr) , $$
(25a)
$$ \chi^{k+1} \in \operatorname*{arg\,min}_{\chi \in \mathit{BV}(\varOmega; \{0,1\})} E\bigl(u_b^{k+1}, u_t^{k+1}, \chi\bigr) . $$
(25b)

However, even the minimization step (25b) is a difficult task since this problem is nonconvex due to the non-convexity of the function set BV(Ω;{0,1}). Considering the form of the energy functional E in (19), exact convex relaxation results for such problems have been proposed by Chan, Esedoglu and Nikolova in [22], which we recall in the following.
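Schematically, the iteration (25a)-(25b) can be organized as below; `solve_u` and `solve_chi` are placeholders of our own naming for the inner solvers (e.g., a gradient method for the smooth subproblem and a convex solver for the relaxed χ-step discussed next), not part of the original formulation:

```python
def alternating_minimization(f, chi0, solve_u, solve_chi, n_iter=20):
    # Sketch of (25a)-(25b): alternate between fitting the restorations
    # u_b, u_t for a fixed partition and updating the partition chi.
    chi = chi0
    for _ in range(n_iter):
        u_b, u_t = solve_u(f, chi)     # step (25a)
        chi = solve_chi(f, u_b, u_t)   # step (25b)
    return u_b, u_t, chi
```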

Lemma 1

(Exact convex relaxation)

Let a∈ℝ and gL 1(Ω). Then there exists a minimizer of the constrained minimization problem

$$ \min_{\chi \in \mathit{BV}(\varOmega; \{0,1\})} a + \int_\varOmega g \chi dx + \lvert \chi \rvert _{\mathit {BV}(\varOmega )} , $$
(26)

and every solution is also a minimizer of the relaxed problem

$$ \min_{v \in \mathit{BV}(\varOmega; [0,1])} a + \int_\varOmega g v dx + \lvert v \rvert _{\mathit {BV}(\varOmega )} , $$
(27)

consequently leading to the fact that the minimal functional values of (26) and (27) are equal. Moreover, if \(\hat{v}\) solves (27), then for almost every μ∈(0,1) the indicator function

$$\hat{\chi}(x) = \begin{cases} 1 , & \text{\textit{if} } \hat{v}(x) > \mu , \\ 0 , & \text{\textit{else}} , \end{cases} $$

solves (26) and thus also (27).
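In practice, Lemma 1 justifies a simple post-processing of a relaxed solution; a minimal sketch (the function name is ours):

```python
import numpy as np

def threshold(v, mu=0.5):
    # Lemma 1: thresholding a minimizer v of the relaxed problem (27) at
    # almost every mu in (0,1) yields a binary minimizer of (26).
    return (v > mu).astype(float)
```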

Since we use the alternating minimization strategy proposed in (25a)–(25b), there is an immediate implication of Lemma 1.

Theorem 1

Let W i (Ω), U i (Ω), V(Ω), \(\bar{D}_{i}\), and R i satisfy Assumptions 1 and 2. Additionally, we assume that fV(Ω) and

$$ D_b(f,u_b) - D_t(f,u_t) \in L^1(\varOmega) , \quad\forall u_i \in W_i(\varOmega ). $$
(28)

Then, if \((\hat{u}_{b}, \hat{u}_{t}, \hat{\chi})\) is a minimizer of (20), it is also a minimizer of the relaxed problem

$$ \min_{(u_b, u_t, v) \in \mathcal {D}_{rel}(E)} E(u_b, u_t, v) , $$
(29)

where \(\mathcal {D}_{rel}(E)\) denotes the relaxed (effective) domain of the energy functional E and is given by

$$ \mathcal {D}_{rel}(E) = W_b(\varOmega )\times W_t(\varOmega )\times \mathit{BV}(\varOmega; [0,1]) . $$
Moreover, if \((\hat{u}_{b}, \hat{u}_{t}, \hat{v})\) solves (29), then for almost every μ∈(0,1) and

$$\hat{\chi}(x) = \begin{cases} 1 , & \text{\textit{if} } \hat{v}(x) > \mu , \\ 0 , & \text{\textit{else}} , \end{cases} $$

the triple \((\hat{u}_{b}, \hat{u}_{t}, \hat{\chi})\) is a minimizer of (20).

Proof

For fixed u b W b (Ω) and u t W t (Ω), set

$$a = \int_\varOmega D_t(f,u_t) dx + \alpha_b R_b(u_b) + \alpha_t R_t(u_t) $$

and

$$g = D_b(f,u_b) - D_t(f,u_t) \stackrel{(28)}{\in} L^1(\varOmega) . $$

Thus, due to \(\mathcal {D}(E) \subset \mathcal {D}_{rel}(E)\) and Lemma 1 we have

$$\min_{v \in \mathit{BV}(\varOmega; [0,1])} E(u_b,u_t,v) = \min_{v \in \mathit{BV}(\varOmega; \{0,1\})} E(u_b,u_t,v) $$

and we can conclude that

$$ \min_{(u_b, u_t, v) \in \mathcal {D}_{rel}(E)} E(u_b,u_t,v) = \min_{(u_b, u_t, \chi) \in \mathcal {D}(E)} E(u_b,u_t,\chi) . $$

 □

3.3 Existence of Minimizers

In this section we verify the existence of a minimizer of (20). For this purpose, we first show the sequential lower semi-continuity of the energy functional E. With respect to the following lemma we note that the weak* topology represents the natural choice of a topology in BV(Ω).

Lemma 2

(Sequential lower semi-continuity)

Let W i (Ω), U i (Ω), V(Ω), \(\bar{D}_{i}\), and R i satisfy Assumptions 1 and 2, and let the data function f∈V(Ω). Moreover, let \(( u_{i}^{n} )\) be any \(\tau _{\widetilde {W}_{i}}\)-convergent sequence to some u i in W i (Ω), and let \(v^n \rightharpoonup^* v\) in BV(Ω) with 0≤v n≤1 almost everywhere. Then, for

$$ \underbrace{D_b(f,u^n_b) - D_t(f,u_t^n)}_{=: g^n} \stackrel {L^1(\varOmega)}{\rightarrow} \underbrace{D_b(f,u_b) - D_t(f,u_t)}_{=: g} , $$
(30)

we have

$$E(u_b,u_t,v) \leq \liminf_{n \rightarrow \infty} E(u_b^n,u_t^n,v^n) . $$

Proof

Due to the sequential lower semi-continuity of the functionals R b , R t , and the total variation seminorm |⋅| BV(Ω) (cf. e.g., [5, Prop. 3.6]), it is sufficient to show the following convergence

$$\int_\varOmega v^n g^n dx \rightarrow \int_\varOmega v g dx , $$

where g n and g are defined as in (30). For this purpose we consider the estimate

$$ \biggl\lvert \int_\varOmega v^n g^n \, dx - \int_\varOmega v g \, dx \biggr\rvert \leq \lVert v^n \rVert _{L^\infty(\varOmega)} \, \lVert g^n - g \rVert _{L^1(\varOmega)} + \bigl\lvert \langle v^n - v, g \rangle \bigr\rvert . $$
(31)

The first term on the right-hand side of (31) vanishes in the limit due to \(\lVert v^{n} \rVert _{L^{\infty}(\varOmega)} \leq1\) and assumption (30). For the second term, the definition of the weak* topology on BV(Ω) implies v n→v in L 1(Ω) (cf. e.g., [5, Def. 3.11]). Moreover, as \(0 \leq v^{n} \leq1 \ \mathrm{a.e.}\), the Banach-Alaoglu theorem [54, Thm. 2.6.18] yields the existence of a subsequence \((v^{n_{j}})\) with \(v^{n_{j}} \rightharpoonup^{*} \tilde{v}\) in L ∞(Ω) and thus \(v^{n_{j}} \rightharpoonup \tilde{v}\) in L 1(Ω). The uniqueness of the limit yields \(\tilde{v} = v\), and we find that each subsequence of (v n) has a subsequence converging to v in the weak* sense, which implies \(\bigl | \langle v^{n} - v, g \rangle\bigr| \rightarrow 0 \). □

Theorem 2

(Existence of a minimizer)

Let W i (Ω), U i (Ω), V(Ω), \(\bar{D}_{i}\), and R i satisfy Assumptions 1 and 2. Assume that α i , β>0 and f∈V(Ω) are fixed. Moreover, let (30) hold for any sequence \((u^{n}_{i})\) converging to u i with respect to the topology \(\tau _{\widetilde {W}_{i}}\). Then there exists a minimizer of (29) and consequently also of (20).

Proof

With respect to Assumption 2 and Lemma 2 we use the direct method of the calculus of variations (see e.g., [7, Sect. 2.1.2]). Due to the boundedness of \(\bar {D}_{i}\) from below by a constant C i ∈ℝ and due to α i , β>0 we have an inequality of the form

$$ C \geq E\bigl(u_b^n, u_t^n, v^n\bigr) \geq C_b + C_t + \alpha_b R_b(u_b^n) + \alpha_t R_t(u_t^n) + \beta \, \lvert v^n \rvert _{\mathit {BV}(\varOmega )} , $$
(32)

where \((u_{b}^{n},u_{t}^{n},v^{n})\) is a minimizing sequence of E in \(\mathcal {D}_{rel}(E)\) with \(E(u_{b}^{n},u_{t}^{n},v^{n}) \leq C\) and C>(C b +C t ). By using standard arguments, we obtain the existence of a minimizer of (29) and thus by Theorem 1 the existence of a minimizer of (20). □

3.4 Discussion of Regularization Functionals

In this section we investigate different regularization functionals that allow us to incorporate a-priori information about possible solutions into the proposed segmentation framework. We focus our attention on the Fisher information and the most frequently used squared H 1-seminorm regularization. The main objective of this section is to verify the properties of these functionals with respect to the requirements in Assumption 2.

3.4.1 The Squared H 1-Seminorm

In this section we consider the case of the squared H 1-seminorm regularization (sometimes also called Dirichlet functional),

$$ R_i(u) = \frac{1}{2} \int_\varOmega \lvert \nabla u \rvert ^2 dx , \quad i \in \{b,t\} . $$
(33)

Considering this smoothing functional, we choose the function space \(\widetilde {W}_{i}(\varOmega )\) as the Sobolev space H 1(Ω) and the subset W i (Ω) as

$$ W_i(\varOmega )= \left\{ u \in H^1(\varOmega) : \left| \frac {1}{\lvert \varOmega \rvert } \int_\varOmega u dx \right| \leq M_i \right\} $$
(34)

with a sufficiently large constant M i >0. Assumptions 2(iii) and (v) are then satisfied if \(\tau _{\widetilde {W}_{i}}\) is the weak topology on H 1(Ω). More precisely, the following lemma gives an overview of the properties of the functional in (33).

Lemma 3

(Properties of squared H 1-seminorm)

Let W i (Ω) be the function set defined in (34) and R i the squared H 1-seminorm (33) defined on W i (Ω), then the following statements hold:

  1. (i)

    R i is convex.

  2. (ii)

    R i is lower semi-continuous with respect to the weak topology on H 1(Ω).

  3. (iii)

    Let a>0 and \(\mathcal{S}_{R_{i}}(a)\) be a sub-level set of R i defined in (24). Then \(\mathcal{S}_{R_{i}}(a)\) is sequentially precompact with respect to the weak topology on H 1(Ω).

Proof

First, note that W i (Ω)⊂H 1(Ω) is convex and closed with respect to the strong norm topology on H 1(Ω). Thus, W i (Ω) is also weakly closed (cf. e.g., [9, p. 29, Prop. 1.2.5]).

(i) It is obvious that R i is convex on H 1(Ω) and thus on W i (Ω), since W i (Ω) is a convex subset.

(ii) It is also obvious that R i defined on H 1(Ω) is lower semi-continuous with respect to the strong norm topology on H 1(Ω). By [28, p. 11, Cor. I.2.2], R i defined on H 1(Ω) is thus also lower semi-continuous with respect to the weak topology and the assertion holds since W i (Ω) is weakly closed.

(iii) First, we mention that the functional \(\lVert \cdot \rVert ^{*}_{p,H^{1}(\varOmega)}\) given by

$$ \lVert u \rVert ^*_{p,H^1(\varOmega)} := \left( \left| \frac{1}{\lvert \varOmega \rvert } \int_\varOmega u^p dx \right|^{\frac{2}{p}} + \int_\varOmega \lvert \nabla u \rvert ^2 dx \right)^{\frac{1}{2}} $$
(35)

defines an equivalent norm on H 1(Ω) for p∈{1,2} (cf. [56, Thm. 7.1]). Hence we obtain that \(\lVert u \rVert ^{*}_{1,H^{1}(\varOmega)}\) is uniformly bounded for every \(u \in\mathcal {S}_{R_{i}}(a)\) and the weak precompactness of sub-level sets follows directly from the Banach-Alaoglu theorem (cf. e.g., [54, Thm. 2.6.18]) since H 1(Ω) is reflexive. □

Finally, we recall some basic results about the Sobolev space H 1(Ω), which are needed in the following sections.

Lemma 4

(Embedding of H 1(Ω))

Let Ω⊂ℝd be an open and bounded subset with a Lipschitz boundary, and d≥1. Then the following compact embedding of H 1(Ω) holds,

$$ H^1(\varOmega) \hookrightarrow \hookrightarrow L^r(\varOmega) \quad \text{for } \begin{cases} 1 \leq r < \infty , & \text{if } d \in \{1, 2\} , \\ 1 \leq r < \frac{2d}{d-2} , & \text{if } d \geq 3 . \end{cases} $$
(36)

Additionally, H 1(Ω) is continuously embedded in L r(Ω) for \(r = \frac{2d}{d-2}\) and d≥3.

Proof

See [2, Thm. 6.2] and [3, Thm. 4.12]. □

3.4.2 The Fisher Information

In the following we discuss the Fisher information as regularization functional, which can be written as

$$ R_i(u) = \frac{1}{2} \int_\varOmega\frac{\lvert \nabla u \rvert ^2}{u} dx , \quad u \geq 0 \ \mathrm{a.e.} , $$
(37)

with i∈{b,t}. We use this form of the Fisher information since the numerical motivation for this regularization energy is more apparent in (37). For a more rigorous definition we refer to [34], in which the formulation is based on a weak definition of the “logarithmic gradient” in suitable measure spaces. Under some regularity assumptions on u the energy in (37) can also be written as (cf. [34, 80])

$$ R_i(u) = \frac{1}{2} \int_\varOmega u \lvert \nabla\log u \rvert ^2 dx = 2 \int_\varOmega \lvert \nabla\sqrt{u} \rvert ^2 dx . $$
(38)

The use of this regularization energy is motivated by the fact that the functional in (37) is one-homogeneous and thus seems to be more appropriate for density functions than the squared H 1-seminorm in (33). This is particularly significant in the context of problems with measured data corrupted by Poisson or speckle noise, since in such applications the desired functions typically represent densities or intensity information. Furthermore, the adaptive regularization property induced by the denominator u in (37) is useful in addition, since the background region of an image (with assumed low intensities) will be regularized more strongly than the target subregion. Finally, we note that the Fisher information energy has already been used as a regularization functional in density estimation problems [14, 58].
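The one-homogeneity of (37), in contrast to the two-homogeneity of (33), can be checked numerically; a small 1D sketch (discretization and names are ours):

```python
import numpy as np

def h1_seminorm_sq(u, h=1.0):
    # Discrete squared H^1 seminorm (33); two-homogeneous: R(c u) = c^2 R(u).
    return 0.5 * np.sum((np.diff(u) / h) ** 2) * h

def fisher_information(u, h=1.0):
    # Discrete Fisher information (37); one-homogeneous: R(c u) = c R(u).
    u_mid = 0.5 * (u[1:] + u[:-1])
    return 0.5 * np.sum((np.diff(u) / h) ** 2 / u_mid) * h

u = np.abs(np.sin(np.linspace(0.0, np.pi, 100))) + 0.1
c = 10.0
print(fisher_information(c * u) / fisher_information(u))  # = c
print(h1_seminorm_sq(c * u) / h1_seminorm_sq(u))          # = c^2
```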

In the case of the Fisher information (37) as regularization energy, we use the result stated in [34, Lemma 2.2],

$$ \mathcal {D}(R_i) = \bigl\{ u \in L^1(\varOmega) : u \geq 0 \ \mathrm{a.e.} , \ \sqrt{u} \in H^1(\varOmega) \bigr\} , $$

for which the conditions become obvious with regard to the second identity in (38). Thus we choose the function space \(\widetilde {W}_{i}(\varOmega )\) as \(L^{\frac{r}{2}}(\varOmega)\) with r given in (36), as well as

$$ W_i(\varOmega )= \left\{ u \in L^{\frac{r}{2}}(\varOmega) : u \geq 0 \ \mathrm{a.e.} , \ \sqrt{u} \in H^1(\varOmega) , \ \frac{1}{\lvert \varOmega \rvert } \int_\varOmega u \, dx \leq M_i \right\} $$
(39)

with a sufficiently large positive constant M i . Then Assumptions 2(iii) and (v) are satisfied if \(\tau _{\widetilde {W}_{i}}\) is the strong norm topology on \(L^{\frac {r}{2}}(\varOmega)\), as we can see in the following lemma.

Lemma 5

(Properties of Fisher information)

Let W i (Ω) be the function set defined in (39) with r chosen as in (36) and R i the Fisher information (37) defined on W i (Ω). Then the following statements hold:

  1. (i)

    R i is convex.

  2. (ii)

    R i is lower semi-continuous with respect to the strong norm topology on \(L^{\frac{r}{2}}(\varOmega)\).

  3. (iii)

    Let a>0 and \(\mathcal{S}_{R_{i}}(a)\) be a sub-level set of R i defined in (24). Then \(\mathcal {S}_{R_{i}}(a)\) is sequentially precompact with respect to the strong norm topology on \(L^{\frac{r}{2}}(\varOmega)\).

Proof

(i) It is easy to show that the mapping \((x,y) \mapsto\frac{x^{2}}{y}\) is convex for y≥0 defining \(\frac {x^{2}}{y} = +\infty\) for y=0. Thus R i is convex since W i (Ω) is a convex subset of \(\{u \in L^{\frac{r}{2}}(\varOmega) : u \geq 0 \mathrm{\ a.e.} \}\).

(ii) Let (u n ) be a sequence in

$$S(\varOmega) := \{ u \in L^{\frac{r}{2}}(\varOmega) : u \geq 0 \ \mathrm{a.e.} , \ \sqrt{u} \in H^1(\varOmega) \} $$

and uS(Ω) such that u n u in \(L^{\frac{r}{2}}(\varOmega)\), i.e., also \(\sqrt{u_{n}} \rightarrow\sqrt {u}\) in L r(Ω). Since r is given as in (36) and ∇:L 2(Ω)→H −1(Ω) is a continuous linear operator, we additionally have that \(\nabla\sqrt{u_{n}} \rightarrow\nabla\sqrt{u}\) in H −1(Ω), where H −1(Ω) is the dual space of \(H_{0}^{1}(\varOmega)\). We can also conclude that the sequence \((\sqrt {u_{n}})\) is bounded in H 1(Ω) and thus there exists a weakly convergent subsequence in H 1(Ω) which converges against \(\sqrt{u}\) due to the strong convergence of \((\sqrt{u_{n}})\) in L 2(Ω) and \((\nabla\sqrt{u_{n}})\) in H −1(Ω). Thus we obtain that every subsequence of \((\sqrt{u_{n}})\) has a weakly convergent subsequence in H 1(Ω) with the same limit and consequently \(\sqrt{u_{n}} \rightharpoonup\sqrt{u}\) in H 1(Ω). Finally, since the squared H 1-seminorm (33) is lower semi-continuous with respect to the weak convergence in H 1(Ω) and due to relation (38), we obtain that the Fisher information is lower semi-continuous with respect to the strong norm topology on \(L^{\frac{r}{2}}(\varOmega)\).

(iii) Using the equivalent norm on H 1(Ω) in (35) with p=2, we see that the set

$$ \left\{ \sqrt{u} : u \in \mathcal{S}_{R_i}(a) \right\} $$
(40)

is bounded in H 1(Ω). Additionally, we note that the pointwise mapping uu 2 is continuous from L r(Ω) to \(L^{\frac{r}{2}}(\varOmega)\). Thus, the assertion is ensured since the set in (40) is compact in L r(Ω) due to the compact embedding of H 1(Ω) in Lemma 4 and the convexity of this set. □

Remark 2

In view of the squared H 1-seminorm in (33), one might also consider an approximation of the Fisher information (37) of the form

$$ R_i(u) = \frac{1}{2} \int_\varOmega\frac{\lvert \nabla u \rvert ^2}{w} dx , \quad w \in L^\infty(\varOmega) \ \text{and} \ w \geq 0 \mbox{ a.e.} , $$
(41)

where w is a given function and i∈{b,t}. Notice that the condition wL (Ω) (or more precisely the boundedness from above) is required to ensure the coercivity of the functional in (41). The main motivation to use this weighted version of the squared H 1-seminorm is the simpler form of (41) compared to the Fisher information formulation in (37), since the denominator in (41) is known. Furthermore, if the function w is chosen as an appropriate approximation of u such that the magnitude of w is close to the data function f, then the functional (41) is approximately one-homogeneous. Simultaneously, the analytical results from Lemma 3 also hold for the weighted version (41), due to the non-negativity of w.

3.5 Discussion of Data Fidelity Terms

We discuss in this section the properties of data fidelity terms that are used for the cases of additive Gaussian noise, Poisson noise, and multiplicative speckle noise. For this purpose we recall that a data fidelity functional \(\bar{D}_{i}(f,u,v)\), i∈{b,t}, is induced by a negative log-likelihood function D i (f,u) (cf. (18) and (22)) by setting

$$ \bar{D}_i(f,u,v) = \int_\varOmega v D_i(f,u) dx $$
(42)

with

$$D_i(f,u) = - \log p_i(f \mid u) . $$

In the case of the noise models mentioned above the negative log-likelihood function D i (f,u) is chosen as described in Sect. 2.2. In the following the objective is to verify the properties of \(\bar{D}_{i}(f,u,v)\) regarding the requirements stated in Assumption 2. We also prove the conditions (28) and (30), which are required for the existence of a minimizer in Sect. 3.3 above. However, it is sufficient to verify only (30), due to the similarity of the respective statements. Finally, note that the following analysis strongly depends on the general choice of regularization functionals in the segmentation framework. Therefore, we focus in this work on the Fisher information and squared H 1-seminorm regularization proposed in Sect. 3.4.

3.5.1 Additive Gaussian Noise

We begin our investigation with the additive Gaussian noise model and consider the fidelity functional \(\bar {D}_{i}(f,u,v)\) in (42) using the following negative log-likelihood function (cf. (14)),

$$ D_i(f,u) = \frac{1}{2} (u - f)^2 . $$
(43)

Due to this form, we choose the function sets in Assumption 1 as

$$V_i(\varOmega )= \widetilde {V}_i(\varOmega )= U_i(\varOmega )= \widetilde {U}_i(\varOmega )= L^2(\varOmega) $$

and summarize the properties of this fidelity term in the following lemma.

Lemma 6

(Properties of additive Gaussian noise model)

Let D i (f,u) be defined as in (43). Moreover, we assume that fL 2(Ω) and vBV(Ω;[0,1]) are fixed. Then the following statements hold:

  1. (i)

    \(\bar{D}_{i}(f, \cdot , v)\) is nonnegative and convex on L 2(Ω).

  2. (ii)

\(\bar{D}_{i}(f, \cdot , v)\) is lower semi-continuous with respect to the weak topology on H 1(Ω) and the strong norm topology on L 2(Ω).

  3. (iii)

    The statement D i (f,u n)→D i (f,u) in L 1(Ω) (cf. (30)) holds

• if \(u^n \rightharpoonup u\) in H 1(Ω), i.e., using H 1-seminorm regularization;

• if \(u^n \rightarrow u\) in \(L^{\frac{r}{2}}(\varOmega )\), r as in (36), and d≤3, i.e., using Fisher information regularization.

Proof

(i) Obviously both properties hold due to the non-negativity of v and D i (f,u), and the convexity of the mapping uu 2.

(ii) It is clear that the functional \(\bar{D}_{i}(f, \cdot , v)\) is lower semi-continuous with respect to the strong norm topologies on H 1(Ω) and L 2(Ω). With [28, p. 11, Cor. I.2.2] it is then also lower semi-continuous with respect to the weak topology on H 1(Ω), since the functional is convex.

(iii) Since fL 2(Ω), we have to show that

$$\lVert (u^n)^2 - u^2 \rVert _{L^1(\varOmega)} \rightarrow 0 \quad \text{ and } \quad \lVert u^n - u \rVert _{L^2(\varOmega)} \rightarrow 0 . $$

However, due to the inequality

$$\lVert (u^n)^2 - u^2 \rVert _{L^1(\varOmega)} \leq \lVert u^n + u \rVert _{L^2(\varOmega)} \lVert u^n - u \rVert _{L^2(\varOmega)} $$

and the fact that (u n) is uniformly bounded in L 2(Ω) if the sequence is convergent, it is sufficient to show that

$$ u^n \rightarrow u \quad\text{in } L^2(\varOmega) . $$
(44)

Hence, if we assume \(u^n \rightharpoonup u\) in H 1(Ω), we directly obtain the required condition (44) due to the compact embedding of H 1(Ω) in L 2(Ω). Furthermore, assuming \(u^n \rightarrow u\) in \(L^{\frac{r}{2}}(\varOmega)\) with r as given in (36), the condition (44) is fulfilled if d≤3. □

3.5.2 Poisson Noise

In this section we consider the case of the Poisson noise model, for which the data fidelity functional \(\bar {D}_{i}(f,u,v)\) in (42) is induced by the following negative log-likelihood function (cf. (15)),

$$ D_i(f,u) = u - f \log u . $$
(45)

Disregarding additive terms independent of u, the corresponding fidelity functional in (42) is the so-called Kullback-Leibler divergence (also known as cross entropy or I-divergence) between two nonnegative measures f and u. Consequently, we set in Assumption 1 the function sets

$$\widetilde {U}_i(\varOmega )= L^1(\varOmega) \quad\text{and} \quad \widetilde {V}_i(\varOmega )= L^\infty (\varOmega) , $$

as well as (see [65, Sects. 3.1 and 3.3])

$$ U_i(\varOmega )= \left\{ u \in L^1(\varOmega) : u \geq c > 0 \ \mathrm{a.e.} \right\} $$
(46)

and

$$ V_i(\varOmega )= \left\{ f \in L^\infty(\varOmega) : f \geq 0 \ \mathrm{a.e.} \right\} , $$
(47)

and summarize the properties of this fidelity term in the following lemma.

Lemma 7

(Properties of Poisson noise model)

Let U i (Ω) and V i (Ω) be defined as in (46) and (47), respectively, and D i (f,u) as in (45). We also assume that vBV(Ω;[0,1]) and fV i (Ω) are fixed. Then the following statements hold:

  1. (i)

    \(\bar{D}_{i}(f, \cdot , v)\) is bounded from below on U i (Ω).

  2. (ii)

\(\bar{D}_{i}(f, \cdot , v)\) is lower semi-continuous with respect to the weak topology on H 1(Ω) and the strong norm topology on L 1(Ω).

  3. (iii)

    The statement D i (f,u n)→D i (f,u) in L 1(Ω) (cf. (30)) holds

• if \(u^n \rightharpoonup u\) in H 1(Ω), i.e., using H 1-seminorm regularization;

• if \(u^n \rightarrow u\) in \(L^{\frac{r}{2}}(\varOmega )\), r as in (36), i.e., using Fisher information regularization.

Proof

(i) As the function uD i (f,u) attains its minimum at u=f, the assertion follows directly since v and \(\lVert f \log f - f \rVert _{L^{1}(\varOmega)}\) are bounded.

(ii) Due to the compact embedding of H 1(Ω) in L 2(Ω)⊂L 1(Ω), it is sufficient to show the lower semi-continuity with respect to the strong norm topology on L 1(Ω). In the following we show that the functional is even continuous with respect to the strong norm topology on L 1(Ω). For this purpose note that

$$ \bigl\lvert \bar{D}_i(f,u^n,v) - \bar{D}_i(f,u,v) \bigr\rvert \leq \lVert D_i(f,u^n) - D_i(f,u) \rVert _{L^1(\varOmega)} , $$

such that we need to prove D i (f,u n)→D i (f,u) for \(u^n \rightarrow u\) in L 1(Ω). Thus, using that f∈L ∞(Ω), we have to show

$$\lVert u^n - u \rVert _{L^1(\varOmega)} \rightarrow 0 \quad\text{and}\quad \lVert \log u^n - \log u \rVert _{L^1(\varOmega)} \rightarrow 0 . $$

However, since the log function is Lipschitz continuous on [c,+∞) with c given in (46), we obtain with the mean value theorem the following inequality,

$$\lVert \log u^n - \log u \rVert _{L^1(\varOmega)} \leq \frac{1}{c} \lVert u^n - u \rVert _{L^1(\varOmega)} , $$

and conclude that D i (f,u n)→D i (f,u) if \(u^n \rightarrow u\) in L 1(Ω).

(iii) To prove the assertion it suffices to show that

$$ u^n \rightarrow u \quad\text{in} \ L^1(\varOmega) , $$
(48)

as we have seen in the proof of Lemma 7(ii). Hence, if we assume \(u^n \rightharpoonup u\) in H 1(Ω), we directly obtain the required condition (48) due to the compact embedding of H 1(Ω) in L 2(Ω)⊂L 1(Ω). Furthermore, supposing \(u^n \rightarrow u\) in \(L^{\frac {r}{2}}(\varOmega)\) with r as given in (36), the condition (48) is fulfilled directly. □

3.5.3 Multiplicative Speckle Noise

Finally, we discuss the multiplicative speckle noise model presented in Sect. 2.2.3. Here the data fidelity functional \(\bar {D}_{i}(f,u,v)\) in (42) is induced by the negative log-likelihood function given in (16),

$$ D_i(f,u) = \frac{(u - f)^2}{u} + \sigma^2 \log u . $$
(49)

Considering the specific form of this function, we set the function sets in Assumption 1 as follows,

$$V_i(\varOmega )= \widetilde {V}_i(\varOmega )= L^\infty(\varOmega) \quad \text{and} \quad \widetilde {U}_i(\varOmega )= L^1(\varOmega) , $$

as well as

$$ U_i(\varOmega )= \left\{ u \in L^1(\varOmega) : u \geq c > 0 \mbox{ a.e.} \right\} , $$
(50)

and describe the properties of the corresponding fidelity functional in the lemma below.

Lemma 8

(Properties of multiplicative speckle noise model)

Let U i (Ω) be as defined in (50) and D i (f,u) as in (49). Moreover, we assume that fL (Ω) and vBV(Ω;[0,1]) are fixed. Then the following statements hold:

  1. (i)

\(\bar{D}_{i}(f, \cdot , v)\) is bounded from below on U i (Ω).

  2. (ii)

\(\bar{D}_{i}(f, \cdot , v)\) is lower semi-continuous with respect to the weak topology on H 1(Ω) and the strong norm topology on L 1(Ω).

  3. (iii)

    The statement D i (f,u n)→D i (f,u) in L 1(Ω) (cf. (30)) holds

    • if u nu in H 1(Ω), i.e., using H 1-seminorm regularization;

    • if u nu in \(L^{\frac {r}{2}}(\varOmega)\), r as in (36), i.e., using Fisher information regularization.

Proof

(i) Let uU i (Ω), then \(\bar{D}_{i}(f,u,v) \geq \lvert \varOmega \rvert \sigma^{2} \log c\) for all c>0 in (50).

(ii) We proceed analogously to Lemma 7(ii) and thus need to prove D i (f,u n)→D i (f,u) for \(u^n \rightarrow u\) in L 1(Ω). In this context we use the identity

$$\frac{(u - f)^2}{u} = u - 2f + \frac{f^2}{u} , $$

and exploit the results from Lemma 7(ii) with the observation that we also have to show

$$\bigg\|\frac{1}{u^n} - \frac{1}{u}\bigg\|_{L^1(\varOmega)} \rightarrow 0 . $$

We can use that the mapping \(u \mapsto\frac{1}{u}\) is Lipschitz continuous on [c,+∞) with c as given in (50) and obtain with the mean value theorem that

$$\bigg\|\frac{1}{u^n} - \frac{1}{u}\bigg\|_{L^1(\varOmega)} \leq \frac{1}{c^2} \lVert u^n - u \rVert _{L^1(\varOmega)} . $$

Finally, we obtain D i (f,u n)→D i (f,u) if \(u^n \rightarrow u\) in L 1(Ω), and hence the continuity of \(\bar{D}_{i}(f, \cdot , v)\) with respect to the strong norm topology on L 1(Ω).

(iii) Using the results proven in Lemma 8(ii) we can argue analogously to the case of the Poisson noise model in Lemma 7(iii). □

We finally mention that the assumption f∈L ∞(Ω) can be weakened in favor of the more natural condition f∈L 2(Ω) by an alternative proof taking into account the special structure of the speckle noise data fidelity term and considering specific regularization terms. The coercivity of the functional is clear, and it thus remains to consider the weak lower semi-continuity, which (by compact embeddings of BV(Ω), respectively H 1(Ω)) can be reduced to the lower semi-continuity of functionals of the form

$$ (v, u) \mapsto \int_\varOmega v \left( u - 2f + \frac{f^2}{u} \right) dx $$

with respect to the strong convergence of v and u in L p(Ω) for an appropriate p>1. Now using the a-priori boundedness of v it is straight-forward to pass to the limit in the first two terms. Thus it remains to verify the weak lower semi-continuity of

$$(v,u) \mapsto \int_{\varOmega} v \frac{f^2}{u} dx . $$

We can now substitute variables and use that \(w = \sqrt{v}\) still converges strongly in some L q(Ω). Since the functional \(\int _{\varOmega} w^{2} \frac{f^{2}}{u} dx\) is jointly convex with respect to w and u it is also lower semi-continuous in suitable reflexive Banach spaces L q(Ω) and L p(Ω), respectively.

We mention that in the above data term the restriction that u is bounded away from zero can also be removed using the joint convexity. However, in the overall functional the nonconvex term including logu cannot be shown to be lower semi-continuous without this restriction. This seems not to be just a technical issue, but a fundamental property of the multiplicative speckle noise model. Indeed, without a lower bound it is extremely favorable to have very small values of u if f is small due to the log term. Hence there is a certain bias for decreasing u, which might be removed if the logarithm term in the functional (49) is ignored. In our numerical tests both cases gave very similar results, so it might be possible to choose a convex speckle noise model without the logarithmic term in practice, which is easier to handle numerically and allows a complete analysis without restrictive assumptions.
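To make the two variants explicit, here is a sketch of the fidelities with and without the logarithmic term (names are ours; cf. (49)):

```python
import numpy as np

def speckle_full(f, u, sigma):
    # Nonconvex fidelity (49), including the log term with its bias
    # towards small values of u where f is small.
    return (u - f) ** 2 / u + sigma ** 2 * np.log(u)

def speckle_convex(f, u):
    # Variant without the log term; u -> (u - f)^2 / u = u - 2 f + f^2 / u
    # is convex in u > 0 and easier to handle numerically.
    return (u - f) ** 2 / u
```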

3.6 Boundedness of Function Mean Values

In previous sections we discussed the analytical properties of different regularization functionals and data fidelity terms, which were required to verify the existence of a minimizer of (20). In the course of this discussion the boundedness of function mean values in (34) and (39) was needed in order to prove the sequential precompactness of the sub-level sets of the squared H 1-seminorm and Fisher information functional. In the following we show that the existence of such bounds on mean values can be guaranteed directly if the energy functional E in (19) is bounded.

Lemma 9

(Boundedness of function mean values)

Let V(Ω) satisfy Assumption 1 and choose the sets U i (Ω) and V i (Ω) as in Sect. 3.5. Moreover, let f∈V(Ω), v∈BV(Ω;[0,1]), and α i , β>0. Let M i =∞ for W i (Ω) as given in (34) and (39). For every \((u_{b},u_{t},v) \in \mathcal {D}_{rel}(E)\) such that E(u b ,u t ,v) is bounded, i.e., there exists a constant C with E(u b ,u t ,v)≤C, there exist constants \(\bar{C}_{b}(C)\), \(\bar{C}_{t}(C) > 0\) independent of u b ,u t , and v such that

$$ \lvert c_b \rvert \, \lVert v \rVert _{L^1(\varOmega )} \leq \bar{C}_b(C) \quad \text{and} \quad \lvert c_t \rvert \, \lVert 1 - v \rVert _{L^1(\varOmega )} \leq \bar{C}_t(C) , $$
(51)

where c i are the mean values of u i , i∈{b,t}.

Proof

Analogously to the proof of Theorem 2, let C > (C_b + C_t), where C_i ∈ ℝ are lower bounds of the data fidelity terms \(\bar{D}_{i}\). We choose functions u_b and u_t such that E(u_b, u_t, v) ≤ C and

where r is given as in (36) and i ∈ {b,t}. The aim is to show that there exist constants \(\bar{C}_{i}(C) > 0\) independent of u_t, u_b, and v such that (51) is fulfilled. For the sake of completeness, we note that the constants \(\bar{C}_{i}(C)\) also depend on the image domain Ω ⊂ ℝ^d, the dimension d, the lower bounds C_i of the data fidelity terms \(\bar{D}_{i}\), the data function f, and the regularization parameters α_i, β. However, for reasons of clarity, we refrain from indicating all these dependencies and use only the bound C, which is most crucial for the statement in Lemma 9.

To prove this assertion we proceed in two steps. First, we utilize the form of the data fidelity terms and show that \(\lVert v u_{b} \rVert _{L^{1}(\varOmega)}\) and \(\lVert ( 1 - v ) u_{t} \rVert _{L^{1}(\varOmega)}\) are bounded. Subsequently, these results and the form of the regularization functionals are used to show the boundedness presented in (51).

We start with the fidelity functional \(\bar{D}_{b}(f,u_{b},v)\) for the case of additive Gaussian noise discussed in Sect. 3.5.1. Due to the positivity of the used regularization functionals and the boundedness of \(\bar{D}_{t}\) from below (cf. Assumption 2), we obtain by using the Cauchy-Schwarz inequality

which results in

$$\lVert v u_b \rVert _{L^1(\varOmega )} \leq \sqrt{2 |\varOmega| (C - C_t)} + \sqrt {|\varOmega|} \lVert f \rVert _{L^2(\varOmega )} . $$

Thus \(\lVert v u_{b} \rVert _{L^{1}(\varOmega )}\) is bounded due to \(f \in L^2(\varOmega)\) in the case of additive Gaussian noise (cf. Sect. 3.5.1).

Next, we consider the Poisson noise fidelity functional \(\bar {D}_{b}(f,u_{b},v)\) investigated in Sect. 3.5.2. Using the properties of the Kullback-Leibler divergence presented, e.g., in [65, Lemma 3.3] (cf. comments to (45)), we obtain the estimate

(52)

Adding \(\frac{4 (C - C_{t})^{2}}{9}\) to both sides of (52) yields

and consequently

Thus \(\lVert v u_{b} \rVert _{L^{1}(\varOmega )}\) is bounded due to \(f \in L^\infty(\varOmega)\) in the case of Poisson noise (cf. Sect. 3.5.2).

We proved above that \(\lVert v u_{b} \rVert _{L^{1}(\varOmega )}\) is bounded in the case of the additive Gaussian and Poisson noise fidelity terms. Analogously, we can show the boundedness of \(\lVert ( 1 - v )u_{t} \rVert _{L^{1}(\varOmega )}\). In the following we prove (51), assuming that there exist constants \(\tilde{C}_{b}(C)\), \(\tilde{C}_{t}(C) > 0\) such that

$$\lVert v u_b \rVert _{L^1(\varOmega)} \leq \tilde{C}_b(C) \quad \text{and} \quad \lVert ( 1 - v ) u_t \rVert _{L^1(\varOmega)} \leq \tilde{C}_t(C) . $$
(53)

We start with the squared H^1-seminorm (33) as regularization functional R_i and split u_i into

$$ u_i = c_i + w_i $$
(54)

with c_i being the mean value of u_i and \(w_i := u_i - c_i\) satisfying \(\int_\varOmega w_i \, dx = 0\). For this purpose, we use the Poincaré-Wirtinger inequality (see e.g., [7, Sect. 2.5.1]), which gives us

$$\lVert w_i \rVert _{L^1(\varOmega)}^2 \leq C_1 \lVert \nabla w_i \rVert _{L^2(\varOmega)}^2 \leq C_1 C_2 , $$
(55)

where C 1>0 is a constant which depends on Ω⊂ℝd and d only, and C 2>0 is specified as in (32). Using the boundedness of \(\lVert v u_{b} \rVert _{L^{1}(\varOmega )}\), we obtain

and consequently

$$ \lvert c_b \rvert \lVert v \rVert _{L^1(\varOmega )} \stackrel{(55)}{\leq} \tilde{C}_b(C) + \sqrt{C_1 C_2} \quad \bigl( =: \bar{C}_b(C) \bigr) . $$

We can conclude analogously that there exists a constant \(\bar{C}_{t}(C)\) such that \(\lvert c_{t} \rvert \lVert 1-v \rVert _{L^{1}(\varOmega )} \leq\bar{C}_{t}(C)\).

Now we investigate the case of the Fisher information (37) as regularization functional R i . Since we assume u i ≥0 a.e., we split \(\sqrt{u_{i}}\) into

$$ \sqrt{u_i} = c_i + w_i $$
(56)

with c i being the mean value of \(\sqrt{u_{i}}\) and \(w_{i} := \sqrt {u_{i}} - c_{i}\) satisfying ∫ Ω w i dx=0. To show the boundedness of the mean value of u i we observe

such that further estimates for \(\lVert c_{i} \rVert _{L^{2}(\varOmega )}\) and \(\lVert w_{i} \rVert _{L^{2}(\varOmega )}\) are required. As in (55) the Poincaré-Wirtinger inequality yields an estimate of \(\lVert w_{i} \rVert _{L^{2}(\varOmega )}\),

$$\lVert w_i \rVert _{L^2(\varOmega)}^2 \leq C_3 \lVert \nabla w_i \rVert _{L^2(\varOmega)}^2 \leq \frac{C_3 C_2}{2} , $$
(57)

where C 3>0 is a constant which depends only on Ω⊂ℝd and d, and C 2>0 is specified as in (32). In addition we have

and (57) yields the following estimate,

$$ \lvert c_b \rvert \lVert v \rVert _{L^2(\varOmega )} \leq \tilde{C}_b(C) + \sqrt {\frac{C_3 C_2}{2}} . $$

Thus, there exists a constant \(\bar{C}_{b}(C)\) with \(\lvert c_{b} \rvert \lVert v \rVert _{L^{1}(\varOmega )} \leq\bar{C}_{b}(C)\). Analogously, we can conclude that there exists a constant \(\bar{C}_{t}(C)\) such that \(\lvert c_{t} \rvert \lVert 1-v \rVert _{L^{1}(\varOmega )} \leq\bar{C}_{t}(C)\). □

Remark 3

With some modifications, the statement of Lemma 9 can also be proved in the case of the multiplicative speckle noise data fidelity term presented in Sect. 3.5.3. For this we note that \(u_b \geq c > 0\) a.e. by (50), and thus we can obtain the following estimate using the Cauchy-Schwarz inequality,

Hence, for all c > 0 in (50) we have \(\sigma^2 \log u_b \geq \sigma^2 \log c\) and

Now, we are able to use the same strategies as for the Poisson noise model in the proof of Lemma 9.

3.7 Existence of Minimizers for Proposed Noise and Regularization Models

In Theorem 2 we proved the existence of minimizers of (20) using the most general assumptions on the data fidelity and regularization functionals, without further knowledge about the specific form of these terms (see Assumption 2). Subsequently, we verified the corresponding assumptions for three different noise models (additive Gaussian, Poisson, and multiplicative speckle noise) and two regularization functionals ((weighted) squared H^1-seminorm and Fisher information) in Sects. 3.4–3.6. For reasons of clarity, we summarize the results for the proposed noise and regularization models in the following.

Lemma 10

(Existence of a minimizer for proposed noise and regularization models)

Let V(Ω) satisfy Assumption 1 and choose the sets U_i(Ω) and V_i(Ω) as in Sect. 3.5. Moreover, let f ∈ V(Ω) and α_i, β > 0 be fixed, and let M_i = ∞ for W_i(Ω) as given in (34) and (39). Then there exists a minimizer of (29), and consequently also of (20), for each investigated noise and regularization model.

Proof

Analogously to the proof of Theorem 2, let C > (C_b + C_t), where C_i ∈ ℝ are the lower bounds of the data fidelity terms \(\bar{D}_{i}\), which exist as proven in Lemmata 6–8(i). Moreover, let \((u_{b}^{n},u_{t}^{n},v^{n})\) be a minimizing sequence of E with \(E(u_{b}^{n},u_{t}^{n},v^{n}) \leq C\) and

where r is given as in (36) and i ∈ {b,t}.

Since the functionals R_i are nonnegative, the result in (32) and the constraint \(\lVert v^{n} \rVert _{L^{\infty}(\varOmega)} \leq1\) imply the uniform boundedness of (v^n) in BV(Ω) and thus the existence of a weakly* convergent subsequence (cf. [1, Thm. 2.5] and [5, Prop. 3.13]), without loss of generality denoted by v^n again. Since the constraint 0 ≤ v^n ≤ 1 is preserved as n → ∞, we see that the limit belongs to BV(Ω;[0,1]).

If \(\lVert v^{n} \rVert _{L^{1}(\varOmega )}\) and \(\lVert 1-v^{n} \rVert _{L^{1}(\varOmega )}\) do not tend to zero, then Lemma 9 implies that the mean values of \((u_{b}^{n})\) and \((u_{t}^{n})\) are uniformly bounded, and we can use the results of Lemmata 3 and 5. Thus, the conditions required in the proof of Theorem 2 are satisfied and we obtain the existence of a minimizer of (29).

It remains to discuss the case \(\lVert v^{n} \rVert _{L^{1}(\varOmega )} \rightarrow 0\), as the case \(\lVert 1-v^{n} \rVert _{L^{1}(\varOmega )} \rightarrow0\) is completely analogous. Due to 0 ≤ v^n ≤ 1 we conclude that \(\lVert 1-v^{n} \rVert _{L^{1}(\varOmega )} \rightarrow \lvert \varOmega \rvert \neq 0\), such that by Lemma 9 the mean values of \((u_{t}^{n})\) are uniformly bounded. With the help of (32) we can use the results of Lemmata 3(iii) and 5(iii), which ensure the existence of a subsequence \((u_{t}^{n_{j}})\) converging to some \(\hat{u}_{t} \in W_{t}(\varOmega )\). In addition, due to the nonnegativity of v^n and the uniqueness of the limit, v^n converges to zero also in the weak* topology of BV(Ω). Thus, with the simple estimate

we can conclude that

showing that the limit is a minimizer of (29).

Finally, by Theorem 1 we conclude the existence of a minimizer of (20) corresponding to each noise and regularization model investigated in Sects. 3.4 and 3.5, namely the additive Gaussian, Poisson, and multiplicative speckle noise model in combination with the (weighted) squared H 1-seminorm and Fisher information regularization functional. □

4 Special Cases of the Segmentation Framework

In the following we investigate some special cases of the proposed segmentation framework in Sect. 2 which correspond to a region-based formulation of the popular Mumford-Shah model [84] and the Chan-Vese model [20].

4.1 Mumford-Shah Formulation

First, we study the Mumford-Shah model [55], which is well known within the segmentation community. This model achieves edge-based segmentations of high quality for a large class of images and is formulated as the minimization of an energy functional of the following form,

$$E_{MS}(u,\varGamma) = \frac{1}{2} \int_\varOmega (u - f)^2 \, dx + \frac{\alpha}{2} \int_{\varOmega \setminus \varGamma} \lvert \nabla u \rvert^2 \, dx + \beta \, \mathcal{H}^{d-1}(\varGamma) . $$
(58)

In this context \(\varGamma \subset \varOmega\) denotes the edge set, which is measured by the (d−1)-dimensional Hausdorff measure \(\mathcal {H}^{d-1}\), and u is a smoothed approximation of the perturbed image f on Ω∖Γ. As already mentioned, the original formulation of the Mumford-Shah model (58) is an edge-based segmentation method, but it can be turned into a region-based model if Γ is restricted to be the contour delineating the different subregions of Ω, e.g., in the case of two subregions Ω_1 and \(\varOmega_{2} = \varOmega\setminus\overline {\varOmega}_{1}\) by the restriction \(\varGamma = \partial\varOmega_{1} \setminus \partial\varOmega\).

In [84] the author proposes an efficient region-based variant of the Mumford-Shah model, based on a modification of the boundary conditions on the edge set, which leads to different extensions of the functions outside the subregions. The efficiency of this approach stems from reformulating the Helmholtz-type optimality equations on the whole image domain Ω, which can be enforced by the following energy functional,

$$E(u_1,u_2,\varGamma) = \sum_{i=1}^{2} \biggl( \frac{1}{2} \int_{\varOmega_i} (u_i - f)^2 \, dx + \frac{\alpha_i}{2} \int_{\varOmega} \lvert \nabla u_i \rvert^2 \, dx \biggr) + \beta \, \mathcal{H}^{d-1}(\varGamma) . $$
(59)

Note that the case of two subregions Ω 1 and \(\varOmega_{2} = \varOmega\setminus\overline{\varOmega}_{1}\) is used in (59) only for the sake of simplicity. Apparently, this functional is equivalent to our generalized segmentation framework in (13) choosing the negative log-likelihood functions −logp i (f|u i ) as squared distances (14) and the regularizers R i as squared H 1-seminorms (33). As already discussed in Sect. 2.2.1 this choice of likelihood functions corresponds to the assumption of the additive Gaussian noise model for the observed data f. Thus, the region-based version of the Mumford-Shah formulation in (59) represents a special case of the proposed generalized segmentation framework for an additive Gaussian noise model and H 1-seminorm regularization.

4.2 Chan-Vese Formulation

The traditional Chan-Vese segmentation model [20] evolved as a simplification of the Mumford-Shah formulation presented in (58). It is based on the assumption that the intensities in the different subregions of Ω can be modeled as piecewise constant functions. The Chan-Vese model can be formulated as the minimization of an energy functional of the following form,

$$E_{CV}(c_1,c_2,\varGamma) = \frac{1}{2} \int_{\varOmega_1} (c_1 - f)^2 \, dx + \frac{1}{2} \int_{\varOmega_2} (c_2 - f)^2 \, dx + \beta \, \mathcal{H}^{d-1}(\varGamma) . $$
(60)

Here f is the perturbed image to be segmented and Γ again denotes the contour delineating the subregions Ω_1 and \(\varOmega _{2} = \varOmega\setminus\overline{\varOmega}_{1}\) of Ω, defined by \(\varGamma = \partial\varOmega_{1} \setminus \partial\varOmega\). The functions c_1 and c_2 are constant approximations of f in Ω_1 and Ω_2, respectively. For the Chan-Vese segmentation method the relationship to our generalized segmentation framework (13) becomes obvious when choosing the fidelity functions −log p_i(f|u_i) as squared distances, as for the Mumford-Shah model in Sect. 4.1, and the regularizers R_i as

$$ R_i(u_i) = \begin{cases} 0 , & \text{if } \lvert \nabla u_i \rvert = 0 , \\ \infty , & \text{else} . \end{cases} $$
(61)

In this way we enforce the solutions u_1 and u_2 to be constant. Analogously to the case of the Mumford-Shah formulation, we discussed in Sect. 2.2.1 that the choice of fidelity functions above corresponds to an additive Gaussian noise model assumed for the given data f. Hence, the Chan-Vese segmentation model is a special case of the proposed segmentation framework for the additive Gaussian noise model and the regularizers R_i in (61), which enforce constant solutions of the segmentation problem.

4.3 Extension of Chan-Vese Formulation to Non-Gaussian Noise Models

In this section we discuss the natural extension of the Chan-Vese formulation presented above to the non-Gaussian noise models described in Sect. 2.2. To perform this extension it suffices to replace the L^2 distance functions in (60) with general negative log-likelihood functions −log p_i(f|u_i), such that the functional in (60) takes the following form,

$$E(c_1,c_2,\varGamma) = \int_{\varOmega_1} - \log p_1(f \mid c_1) \, dx + \int_{\varOmega_2} - \log p_2(f \mid c_2) \, dx + \beta \, \mathcal{H}^{d-1}(\varGamma) . $$
(62)

As we can see, this energy functional corresponds to the region-based segmentation framework (13) for the two-phase formulation (m = 2), using the regularization functionals R_i defined in (61) in order to enforce constant solutions c_1 and c_2. These optimal constants can actually be computed explicitly using the form of the negative log-likelihood functions. Thus, in the case of the additive Gaussian (14) and Poisson (15) noise models we obtain the constants

$$c_i = \frac{\int_{\varOmega_i} f dx}{\lvert \varOmega_i \rvert } , $$

and in the case of multiplicative speckle noise (16),

$$c_i = \frac{1}{2} \left( \sqrt{\sigma^4 + \frac{4 \int_{\varOmega_i} f^2 dx}{\lvert \varOmega_i \rvert }} - \sigma^2 \right) . $$
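To verify the latter constant, note that setting the derivative of \(c \mapsto \int_{\varOmega_i} ( \frac{(c-f)^2}{c} + \sigma^2 \log c ) \, dx\) with respect to c to zero yields \(\lvert \varOmega_i \rvert - c^{-2} \int_{\varOmega_i} f^2 \, dx + \sigma^2 \lvert \varOmega_i \rvert c^{-1} = 0\), i.e., the quadratic equation \(c^2 + \sigma^2 c - \lvert \varOmega_i \rvert^{-1} \int_{\varOmega_i} f^2 \, dx = 0\), whose non-negative root is exactly the expression above.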

Due to the simple form of these constants, the extension of the Chan-Vese segmentation method to non-Gaussian noise models in (62) is easy to implement and can be used in a wide range of applications in which piecewise constant approximations can be expected.
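As an illustration, the constants above reduce to a few lines of NumPy; the following sketch is ours (the function name and the string labels for the noise models are not taken from any reference implementation):

```python
import numpy as np

def optimal_constant(f, mask, noise="gaussian", sigma=0.0):
    """Optimal constant c_i on the region indicated by the boolean mask.

    For the Gaussian and Poisson models c_i is the region mean of f; for
    the speckle model it is the non-negative root of the quadratic
    equation c^2 + sigma^2 c - mean(f^2) = 0 derived above.
    """
    region = f[mask]
    if noise in ("gaussian", "poisson"):
        return region.mean()
    if noise == "speckle":
        m2 = np.mean(region ** 2)  # (1/|Omega_i|) * integral of f^2
        return 0.5 * (np.sqrt(sigma ** 4 + 4.0 * m2) - sigma ** 2)
    raise ValueError(f"unknown noise model: {noise}")
```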

5 Numerical Realization

After the introduction and analysis of our region-based segmentation framework in the previous sections, we now discuss the details of its numerical implementation. For the sake of simplicity we restrict this discussion to the two-phase formulation proposed in (19) and thus deal with the realization of the minimization problem (20). We use the alternating minimization strategy proposed in (25a)–(25b) to minimize the energy functional E in (19). Deviating from (25a)–(25b), we split the minimization step (25a) into separate subproblems for \(u^{k+1}_{b}\) and \(u^{k+1}_{t}\) and have to solve the following minimization problems,

$$u^{k+1}_b \in \operatorname*{arg\,min}_{u_b \in W_b(\varOmega)} \; \int_\varOmega \chi^k \, D_b(f,u_b) \, dx + \alpha_b R_b(u_b) , $$
(63a)
$$u^{k+1}_t \in \operatorname*{arg\,min}_{u_t \in W_t(\varOmega)} \; \int_\varOmega \bigl( 1 - \chi^k \bigr) \, D_t(f,u_t) \, dx + \alpha_t R_t(u_t) , $$
(63b)
$$\chi^{k+1} \in \operatorname*{arg\,min}_{\chi \in BV(\varOmega;\{0,1\})} \; \int_\varOmega \chi \, \bigl( D_b\bigl(f,u^{k+1}_b\bigr) - D_t\bigl(f,u^{k+1}_t\bigr) \bigr) \, dx + \beta \lvert \chi \rvert_{BV(\varOmega)} . $$
(63c)

The first two minimization problems, (63a) and (63b), can be interpreted as denoising problems, since their solutions \(u^{k+1}_{i}\) are meant to approximate the given noisy data f inside the subregions specified by χ^k and (1−χ^k), with appropriate extensions outside of these, characterized by the choice of the regularization functionals R_i. The numerical realization of these two subproblems is discussed in Sects. 5.2 and 5.3 for the cases of additive Gaussian noise, multiplicative speckle noise, and Poisson noise. The third minimization problem (63c) is derived from (25b) by neglecting additive terms independent of χ. This subproblem determines the actual partition of Ω into the subregions given by χ and (1−χ) and depends on the previously computed solutions \(u^{k+1}_{i}\). The numerical solution of the minimization step (63c) is discussed in Sect. 5.4.
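The overall alternating strategy fits in a few lines of Python; this is a minimal sketch of ours, where the three callables stand in for the solvers of the respective subproblems (the default of 25 outer steps reflects the experiments in Sect. 6):

```python
def segment(f, chi0, denoise_b, denoise_t, segment_step, n_outer=25):
    """Alternating minimization (63a)-(63c): denoise inside and outside
    the current partition, then update the partition itself."""
    chi = chi0
    for k in range(n_outer):
        u_b = denoise_b(f, chi)          # (63a): approximate f where chi == 1
        u_t = denoise_t(f, 1.0 - chi)    # (63b): approximate f where chi == 0
        chi = segment_step(f, u_b, u_t)  # (63c): update the partition
    return chi, u_b, u_t
```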

5.1 Numerical Realization of Denoising Steps: Preliminary Remarks

First, we discuss the numerical realization of the denoising problems (63a) and (63b) for each of the noise models described in Sect. 2.2, i.e., additive Gaussian noise, Poisson noise, and multiplicative speckle noise. To simplify this discussion, we note that both problems are of the form

$$\min_{u_i \in W_i(\varOmega)} \; \int_\varOmega \tilde{\chi}^k_i \, D_i(f,u_i) \, dx + \alpha_i R_i(u_i) $$
(64)

with i ∈ {b,t} and an indicator function \(\tilde{\chi}^{k}_{i}\) such that \(\tilde{\chi}^{k}_{b} = \chi^{k}\) in the case of (63a) and \(\tilde{\chi}^{k}_{t} = (1 - \chi^{k})\) in the case of (63b). In the sections below we distinguish between the choice of the regularization functional R_i as the squared H^1-seminorm (33) (respectively its weighted version (41)) and as the Fisher information (37) in order to propose a numerical strategy for solving (64).

Finally, we mention the following aspects of the numerical realization below.

  • In Assumption 2(iii) we assumed the regularization functionals R_i to be convex. Thus, we use the concept of subdifferentials from convex analysis (see e.g., [28, Sect. 5.1]) to compute the first-order optimality condition of the minimization problem (64) and denote the subdifferential by ∂.

  • With respect to the analytical results in Sect. 3, the optimization problem (64) entails additional complications in the computation of minimizers for the Poisson and speckle noise models, and for the Fisher information regularization. As we can see in (39) and (46), the admissible function spaces only allow non-negative solutions in these cases, which needs to be enforced in the minimization problem (64). However, for reasons of clarity, we refrain from noting the non-negativity constraint on u_i in (64) explicitly and instead include this condition in the function set W_i(Ω) whenever it is required.

5.2 Numerical Realization of Denoising Steps: (Weighted) H 1-Seminorm Regularization

We start with a discussion of the problem (64) using the squared H^1-seminorm regularization (33) and its weighted variant (41). Since the weighted version is more general than the basic H^1-seminorm, we focus on the weighted H^1-seminorm regularization functional and denote it by \(R_{H^{1}_{w}}\). In order to propose an appropriate numerical strategy for minimizing the problem (64) with \(R_{i} = R_{H^{1}_{w}}\) for each of the noise models described in Sect. 2.2, we proceed in two steps. First, we show that for each fixed k ∈ ℕ the problem (64) can be equivalently realized by a sequence of convex variational problems of the form

$$\bigl( u^{k+1}_i \bigr)^{n+1} \in \operatorname*{arg\,min}_{u_i} \; \frac{1}{2} \int_\varOmega \tilde{\chi}^k_i \, \frac{ ( u_i - q^n )^2}{h^n} \, dx + \alpha_i \, R_{H^1_w}(u_i) $$
(65)

with an appropriate setting of the noise function q^n and the weighting function h^n with respect to the present noise model. The choice of these functions is summarized in Table 1. The advantage of this strategy is that it allows us to propose a uniform numerical framework for the realization of the denoising steps (64), independent of the actual form of the fidelity functions D_i(f,u_i). In the second step we discuss the numerical realization of the variational problem (65).

Table 1 Overview of the settings of the functions q^n and h^n in (65) for the different physical noise models proposed in Sect. 2.2:

  • Additive Gaussian noise: q^n = f and h^n ≡ 1
  • Poisson noise: q^n = f and \(h^n = ( u^{k+1}_i )^n\)
  • Multiplicative speckle noise: \(q^n = f^2 / ( u^{k+1}_i )^n - \sigma^2\) and \(h^n = ( u^{k+1}_i )^n\)
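In code, Table 1 amounts to a small dispatch; a hedged NumPy sketch of ours, continuing the naming of the previous listings:

```python
import numpy as np

def q_and_h(noise, f, u_n, sigma=0.0):
    """Return the pair (q^n, h^n) of (65) for the current inner iterate
    u_n, following Table 1."""
    if noise == "gaussian":
        return f, np.ones_like(f)             # q^n = f,  h^n = 1
    if noise == "poisson":
        return f, u_n                         # q^n = f,  h^n = u^n
    if noise == "speckle":
        return f ** 2 / u_n - sigma ** 2, u_n
    raise ValueError(f"unknown noise model: {noise}")
```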

5.2.1 Additive Gaussian Noise

The first case that we want to investigate is the minimization problem (64) using the negative log-likelihood function D i (f,u i ) for additive Gaussian noise,

$$D_i(f,u_i) \stackrel{(18)}{=} - \log p_i(f \mid u_i) \stackrel{(14)}{=} \frac {1}{2} \left( u_i - f \right)^2 . $$

Then the problem (64) is apparently a special case of (65) for h^n ≡ 1 and q^n = f, and hence no inner iterations are needed in this case.

5.2.2 Poisson Noise

We concentrate now on the problem of denoising images perturbed by Poisson noise. For this purpose we insert the following fidelity term D i (f,u i ) into (64),

$$D_i(f,u_i) \stackrel{(18)}{=} - \log p_i(f \mid u_i) \stackrel{(15)}{=} u_i - f \log u_i . $$

Note that the Poisson noise model requires an additional non-negativity constraint on the solution u_i in (64) (cf. (46)). Hence, we use the Karush-Kuhn-Tucker (KKT) optimality conditions, which formally provide the existence of a Lagrange multiplier λ ≥ 0 such that the stationary points of the functional in (64) need to fulfill the equations,

$$0 = \tilde{\chi}^k_i \biggl( 1 - \frac{f}{u^{k+1}_i} \biggr) + \alpha_i p^{k+1}_i - \lambda , $$
(66a)
$$0 = \lambda \, u^{k+1}_i , $$
(66b)

where \(p^{k+1}_{i} \in \partial R_{H^{1}_{w}} ( u^{k+1}_{i} )\) and λ ≥ 0, \(u^{k+1}_i \geq 0\) hold a.e. on Ω. Multiplying (66a) by \(u^{k+1}_{i}\), the Lagrange multiplier λ can be eliminated by (66b), leading to a fixed point equation of the form

$$ 0 = \tilde{\chi}^k_i \bigl( u^{k+1}_i - f \bigr) + \alpha_i u^{k+1}_i p^{k+1}_i . $$

Here, we use a semi-implicit iteration approach proposed in [70] (see [69, Sect. 4.5] for details) and obtain the following iteration scheme to compute \(u^{k+1}_{i}\),

$$ \tilde{\chi}^k_i \bigl( u^{k+1}_i \bigr)^{n+1} = \tilde{\chi }^k_i f - \alpha_i \bigl( u^{k+1}_i \bigr)^n \bigl( p^{k+1}_i \bigr)^{n+1} $$
(67)

with

$$\bigl( p^{k+1}_i \bigr)^{n+1} \in \partial R_{H^1_w}\bigl( \bigl( u^{k+1}_i \bigr)^{n+1} \bigr) . $$

Considering the form of (67), we can see that each step of this iteration sequence can be realized by solving the following convex variational problem,

$$\bigl( u^{k+1}_i \bigr)^{n+1} \in \operatorname*{arg\,min}_{u_i \geq 0} \; \frac{1}{2} \int_\varOmega \tilde{\chi}^k_i \, \frac{ ( u_i - f )^2}{ ( u^{k+1}_i )^n} \, dx + \alpha_i \, R_{H^1_w}(u_i) $$
(68)

Inspecting the first-order optimality condition of (68) confirms the equivalence of this minimization problem with the iteration step (67), due to convexity. Hence, the original problem (64) can be realized in the Poisson case by a sequence of minimization problems of the form (68), which is a special case of (65) with q^n = f and \(h^{n} = ( u^{k+1}_{i} )^{n}\). We note that we incorporate an additional non-negativity constraint in (68), and thus automatically ensure the complementarity condition (66b). Finally, motivated by the convergence result in [69], we terminate the sequence (68) once the relative distance between two consecutive iterates falls below a specified threshold, i.e., once

$$ \frac{\| ( u^{k+1}_i )^{n+1} - ( u^{k+1}_i )^n \|_{L^2(\varOmega)}}{\| ( u^{k+1}_i )^{n+1} \|_{L^2(\varOmega)}} < \epsilon . $$
(69)

Note that the function \((u^{k+1}_{i})^{0}\) in (68) can be initialized as \((u^{k+1}_{i})^{0} \equiv1\). However, in our experiments we observed that initializing with the combined result of the last step of the outer minimization strategy (63a)–(63c), i.e., \((u^{k+1}_{i})^{0} = u^{k} = \chi^{k} u_{b}^{k} + (1 - \chi^{k}) u_{t}^{k}\), resulted in better performance.
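The inner Poisson iteration then takes the following shape; this is a sketch under our naming, where solve_quadratic stands in for a solver of the weighted quadratic problem (65), e.g., the augmented Lagrangian scheme of Sect. 5.2.4:

```python
import numpy as np

def poisson_denoise(f, chi_t, solve_quadratic, alpha, eps=1e-6, max_inner=100):
    """Semi-implicit inner iteration (67)/(68) for the Poisson model,
    terminated by the relative-change criterion (69)."""
    u = np.ones_like(f)  # initialization (u)^0 = 1; see the text for alternatives
    for n in range(max_inner):
        u_new = solve_quadratic(q=f, h=u, chi=chi_t, alpha=alpha)  # solve (68)
        rel = np.linalg.norm(u_new - u) / max(np.linalg.norm(u_new), 1e-30)
        u = u_new
        if rel < eps:    # stopping rule (69)
            break
    return u
```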

5.2.3 Multiplicative Speckle Noise

The last case that we want to discuss is dedicated to images biased by multiplicative speckle noise proposed in Sect. 2.2.3 and thus we investigate the following negative log-likelihood function D i (f,u i ) in (64),

$$D_i(f,u_i) \stackrel{(16)}{=} \frac{(u_i - f)^2}{u_i} + \sigma^2 \log u_i . $$

Considering the non-negativity constraint on the solution u_i in (64) (cf. Sect. 3.5.3), we can proceed analogously to the case of the Poisson noise fidelity term in Sect. 5.2.2. Thus, each step of the resulting semi-implicit iteration sequence can be realized by solving a convex variational problem of the form (see [69, Sect. 5.3] for details),

$$\bigl( u^{k+1}_i \bigr)^{n+1} \in \operatorname*{arg\,min}_{u_i \geq 0} \; \frac{1}{2} \int_\varOmega \tilde{\chi}^k_i \, \frac{ \bigl( u_i - \bigl( f^2 / ( u^{k+1}_i )^n - \sigma^2 \bigr) \bigr)^2}{ ( u^{k+1}_i )^n} \, dx + \alpha_i \, R_{H^1_w}(u_i) . $$
(70)

We can see that (70) corresponds to (65) by setting \(q^{n} = f^{2} / (u^{k+1}_{i})^{n} - \sigma^{2}\) and \(h^{n} = (u^{k+1}_{i})^{n}\). Finally, we terminate the sequence (70) as in (69) and initialize \((u^{k+1}_{i})^{0}\) analogously to Sect. 5.2.2 as \((u^{k+1}_{i})^{0} = u^{k} = \chi ^{k} u_{b}^{k} + (1 - \chi^{k}) u_{t}^{k}\).

5.2.4 Numerical Realization of Variational Problem (65)

In Sects. 5.2.1–5.2.3 we have seen that the proposed numerical schemes lead to a sequence of variational problems of the general form (65). In this section we discuss the numerical realization of the variational problems occurring in this sequence. For the sake of simplicity, we drop the indices k, n, and i in the following and hence deal with the following minimization problem, using the definition of the regularization functional \(R_{H^{1}_{w}}\) in (41),

$$ \min_{u \geq 0} \frac{1}{2} \int_\varOmega\tilde {\chi} \frac{(u - q)^2}{h} dx + \frac{\alpha}{2} \int_\varOmega\frac{\lvert \nabla u \rvert ^2}{w} dx . $$
(71)

Due to the results in (68) and (70) for the Poisson and speckle noise models, respectively, we consider in (71) the more interesting case of an additional non-negativity constraint. The case of the additive Gaussian noise model (i.e., where the non-negativity condition is unnecessary) is addressed in Remark 5 below.

In order to solve the problem (71) efficiently, we propose to use an augmented Lagrangian minimization approach and refer, e.g., to [30, 36, 42] for an introduction to this topic. The major motivation for this approach is that we want to obtain a unified method which handles the weights h and w in (71) in a simple way and simultaneously incorporates the non-negativity condition on the solution without great effort. A scheme of this form has been used by the authors for inverse problems with Poisson noise and total variation regularization [69], demonstrating good performance and flexibility. In principle it would be possible to use alternative techniques, e.g., Newton's method or a conjugate gradient method, particularly in the case of additive Gaussian noise and the standard H^1-seminorm regularization (i.e., using h = w ≡ 1 in (71)). However, one has to take special care of the non-negativity constraint. Furthermore, if the weights h in the data fidelity term and w in the regularization functional are not identically equal to one, further problems can arise, such as choosing a proper preconditioner for the conjugate gradient method. Finally, another motivation for using the augmented Lagrangian approach is that the most expensive part of this iteration sequence is the solution of a Helmholtz-type optimality equation (see (74) below), which can be solved efficiently and exactly using the discrete cosine transform. This provides the opportunity to accelerate this sub-step directly using fast methods, e.g., GPU-based approaches.

Due to the weights \(\tilde{\chi}\), h, and w occurring in (71), we introduce auxiliary functions \(\tilde {u}\) and v to simplify the handling of these weights. Hence, we consider the following equivalent constrained optimization problem,

$$\min_{u, \tilde{u}, v} \; \frac{1}{2} \int_\varOmega \tilde{\chi} \, \frac{(\tilde{u} - q)^2}{h} \, dx + \frac{\alpha}{2} \int_\varOmega \frac{\lvert v \rvert^2}{w} \, dx + i_{\geq 0}(\tilde{u}) \quad \text{subject to} \quad \tilde{u} = u , \;\; v = \nabla u , $$
(72)

where \(i_{\geq 0}\) is an indicator functional given by \(i_{\geq 0}(u) = 0\) if u ≥ 0 a.e. and +∞ else. Following the idea of augmented Lagrangian methods and using the standard Uzawa algorithm (without preconditioning) [29], we obtain an alternating minimization scheme of the form (cf. [69, Sect. 6.3.4] for a detailed description in the case of total variation regularization)

$$u^{l+1} = \operatorname*{arg\,min}_{u} \; \mathcal{L}_\mu\bigl(u, \tilde{u}^l, v^l; \lambda^l_{\tilde{u}}, \lambda^l_v\bigr) , $$
(73a)
$$\tilde{u}^{l+1} = \operatorname*{arg\,min}_{\tilde{u}} \; \mathcal{L}_\mu\bigl(u^{l+1}, \tilde{u}, v^l; \lambda^l_{\tilde{u}}, \lambda^l_v\bigr) , $$
(73b)
$$v^{l+1} = \operatorname*{arg\,min}_{v} \; \mathcal{L}_\mu\bigl(u^{l+1}, \tilde{u}^{l+1}, v; \lambda^l_{\tilde{u}}, \lambda^l_v\bigr) , $$
(73c)
$$\lambda^{l+1}_{\tilde{u}} = \lambda^l_{\tilde{u}} + \mu_{\tilde{u}} \bigl( \tilde{u}^{l+1} - u^{l+1} \bigr) , $$
(73d)
$$\lambda^{l+1}_v = \lambda^l_v + \mu_v \bigl( v^{l+1} - \nabla u^{l+1} \bigr) , $$
(73e)

where \(\mathcal{L}_\mu\) denotes the augmented Lagrangian of the constrained problem (72), \(\lambda_{\tilde {u}}, \lambda_{v}\) are Lagrange multipliers, and \(\mu_{\tilde {u}}, \mu_{v}\) are positive relaxation parameters. The efficiency of this strategy strongly depends on how fast one can solve each of the subproblems in (73a)–(73e). First, the problem (73a) is differentiable, with the following Helmholtz-type optimality equation under Neumann boundary conditions,

$$\bigl( \mu_{\tilde{u}} I - \mu_v \Delta \bigr) u^{l+1} = \lambda^l_{\tilde{u}} + \mu_{\tilde{u}} \tilde{u}^l - \nabla \cdot \bigl( \lambda^l_v + \mu_v v^l \bigr) =: z^l , $$
(74)

where I is the identity operator and Δ denotes the Laplace operator. In the discrete setting using finite difference discretization on a rectangular domain Ω, this equation can be solved efficiently by the discrete cosine transform (DCT-II), since −Δ is diagonalizable in the DCT-transformed space,

$$ u^{l+1} = \mathit{DCT}^{-1} \left( \frac{\mathit{DCT} \left( z^l \right)}{\mu _{\tilde {u}} + \mu_v \hat{k}} \right) , $$
(75)

where z l is defined in (74), \(\hat{k}\) represents the negative Laplace operator in the discrete cosine space (see [69, Sect. 6.3.4]), and DCT −1 denotes the inverse discrete cosine transform. Moreover, the solution of the minimization problem (73b) can be computed by an explicit formula of the form

$$ \tilde {u}^{l+1} = \max\left\{ \frac{\tilde{\chi} q + h \left( \mu_{\tilde {u}} u^{l+1} - \lambda_{\tilde {u}}^l \right)}{\tilde {\chi} + \mu_{\tilde {u}} h} , 0 \right\} , $$
(76)

where the maximum operation has to be interpreted pointwise (i.e., for each xΩ). Finally, the sub-problem (73c) can be computed by the following explicit formula,

$$ v^{l+1} = \frac{w \left( \mu_v \nabla u^{l+1} - \lambda_v^l \right)}{\alpha + \mu_v w} . $$
(77)

Remark 4

From a practical point of view, we note that the augmented Lagrangian approach presented above can be simplified for the original squared H^1-seminorm regularization (33) in (71), i.e., for w ≡ 1 in (71). In this special case the auxiliary function v = ∇u in (72) is not required, and hence the iteration steps (73e) and (77) disappear. In addition, the formula in (75) simplifies, and the resulting numerical scheme only has to perform the iteration

$$u^{l+1} = \mathit{DCT}^{-1} \left( \frac{\mathit{DCT} \left( \lambda_{\tilde {u}}^l + \mu_{\tilde {u}} \tilde {u}^l \right)}{\mu_{\tilde {u}} + \alpha \hat{k}} \right) , $$

followed by the steps (76) and (73d). This increases the algorithm performance significantly in the special case w ≡ 1, compared to the more general approach presented in (73a)–(73e) and (77).
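A compact Python sketch of this simplified scheme (using SciPy's DCT routines) is given below; it is our own illustration, not the authors' code, and the Uzawa multiplier update is an assumption matching (73d) as stated above. The DCT symbol of the negative Laplacian assumes unit grid spacing and the standard 5-point stencil with reflecting (Neumann) boundaries:

```python
import numpy as np
from scipy.fft import dctn, idctn

def neumann_laplacian_symbol(shape):
    """Eigenvalues of -Delta (5-point stencil, Neumann BC) in DCT-II space."""
    ii = 2.0 - 2.0 * np.cos(np.pi * np.arange(shape[0]) / shape[0])
    jj = 2.0 - 2.0 * np.cos(np.pi * np.arange(shape[1]) / shape[1])
    return ii[:, None] + jj[None, :]

def denoise_h1(q, h, chi, alpha, mu=1.0, n_iter=200):
    """Simplified augmented Lagrangian scheme of Remark 4 (w = 1) for (71)."""
    khat = neumann_laplacian_symbol(q.shape)
    u_tilde = np.maximum(q, 0.0)
    lam = np.zeros_like(q)
    for _ in range(n_iter):
        # Helmholtz step: (mu I - alpha Delta) u = lam + mu u_tilde, via DCT
        rhs = dctn(lam + mu * u_tilde, norm="ortho")
        u = idctn(rhs / (mu + alpha * khat), norm="ortho")
        # pointwise non-negative update (76)
        u_tilde = np.maximum((chi * q + h * (mu * u - lam)) / (chi + mu * h), 0.0)
        lam = lam + mu * (u_tilde - u)  # assumed Uzawa ascent step (73d)
    return u
```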

Remark 5

Finally, we discuss the case where the non-negativity condition in (71) is unnecessary, i.e., the case of the additive Gaussian noise model. The only difference is that the indicator functional \(i_{\geq 0}\) in (72) and (73b) is no longer required, and hence we only have to modify (76) to

$$\tilde {u}^{l+1} = \frac{\tilde{\chi} q + h( \mu_{\tilde {u}} u^{l+1} - \lambda_{\tilde {u}}^l )}{\tilde{\chi} + \mu _{\tilde {u}} h} . $$

5.3 Numerical Realization of Denoising Steps: Fisher Information Regularization

In Sect. 5.2 we considered the numerical realization of the denoising problem (64) using the (weighted) squared H 1-seminorm regularization R i . In this section we discuss the case of the Fisher information regularization defined in (37), i.e., we consider

$$\min_{u_i \geq 0} \; \int_\varOmega \tilde{\chi}^k_i \, D_i(f,u_i) \, dx + \frac{\alpha_i}{2} \int_\varOmega \frac{\lvert \nabla u_i \rvert^2}{u_i} \, dx . $$
(78)

Note that in the case of the Fisher information regularization a non-negativity constraint on the solution is always required due to the admissible domain of the functional in (39).

Before we discuss the implementation of (78), we explain why a differentiation between the Fisher information and the squared H^1-seminorm regularization is required in order to propose a numerical strategy for minimizing (64). For the Fisher information regularization one could apparently proceed analogously to Sect. 5.2 and propose an iteration scheme solving the following minimization problem (cf. (71)),

$$ \min_{u \geq 0} \frac{1}{2} \int_\varOmega\tilde{\chi} \frac{(u - q)^2}{h} dx + \frac{\alpha}{2} \int_\varOmega \frac{\lvert \nabla u \rvert ^2}{u} dx . $$
(79)

The complexity of this minimization functional, compared to the one in (71), arises from the unknown denominator in the regularization energy in (79). Hence, for (79) the augmented Lagrangian approach used in Sect. 5.2.4 leads to several realization problems, which require further detailed investigation and go beyond the scope of this work. We therefore omit this discussion here.

Instead, we propose another approach to solve (64) with the Fisher information (37) as regularization functional R_i. For each fixed k ∈ ℕ we derive the following iteration scheme,

(80)

where c > 0 is a constant, while I and Δ denote the identity and Laplace operators, respectively. The choice of the functions r^n and s^n in (80) is discussed below for each of the noise models described in Sect. 2.2 and summarized in Table 2. Each iteration step in (80) requires the solution of a Helmholtz-type equation, which can be realized efficiently in the discrete setting using the discrete cosine transform,

$$\bigl( u^{k+1}_i \bigr)^{n+1} = \mathit{DCT}^{-1} \left( \frac{\mathit{DCT} \left( \tilde{z}^n \right)}{c + \alpha_i \hat{k}} \right) , $$

using the notation of (75), with \(\tilde{z}^{n}\) defined as in (80). Finally, we have to ensure the non-negativity of \(( u^{k+1}_{i} )^{n}\) for each n ∈ ℕ, since the solution of this iteration sequence should be non-negative in the case of the Fisher information regularization (see (39)). This can be guaranteed if the right-hand side of (80) is non-negative, due to the maximum principle for elliptic partial differential equations. To ensure this property we have to choose the functions s^n and \((u^{k+1}_{i})^{0}\) non-negative and

$$ c \geq \operatorname*{\mathrm{ess} \sup}_\varOmega\left( r^n + \frac{\alpha_i}{2} \frac {\lvert \nabla( u^{k+1}_i )^n \rvert ^2}{( ( u^{k+1}_i )^n )^2} \right) $$
(81)

for each n ∈ ℕ, where the latter condition is a simple consequence of the negative terms occurring in (80). The termination of (80) is realized as in (69). Regarding the convergence of (80), which depends on c, we refer to [73] and the references therein, where the convergence of similar numerical schemes is discussed. Obviously, we cannot guarantee a sufficiently fast convergence of the iteration sequence (80) due to potentially high values of c in (81). Thus, further development of alternative algorithms will be required in the future.
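On a discrete grid, the essential supremum in (81) reduces to a maximum over the pixels; a small sketch of ours (the epsilon guard against division by zero is an implementation detail, not taken from the text):

```python
import numpy as np

def stability_constant(r, u, alpha):
    """Lower bound (81) on the constant c in the scheme (80)."""
    gy, gx = np.gradient(u)                 # finite-difference gradient of u^n
    grad_sq = gx ** 2 + gy ** 2
    return np.max(r + 0.5 * alpha * grad_sq / np.maximum(u, 1e-12) ** 2)
```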

Table 2 Overview of the settings of the functions r^n and s^n in (80) for the different physical noise models proposed in Sect. 2.2

In the following we discuss the choice of functions r n and s n in (80) with respect to additive Gaussian, Poisson, and multiplicative speckle noise.

5.3.1 Additive Gaussian Noise

We start with the investigation of (78) using the negative log-likelihood function D_i(f,u_i) corresponding to additive Gaussian noise (14), i.e., \(D_{i}(f,u_{i}) = \frac{1}{2} ( u_{i} - f )^{2}\). Due to the non-negativity constraint on the solution u_i in (78), we use the KKT conditions, which formally provide the existence of a Lagrange multiplier λ ≥ 0 such that the stationary points of the functional in (78) need to fulfill the equations,

(82a)
(82b)

for which we assume Neumann boundary conditions. Multiplying (82a) by \(u^{k+1}_{i}\), the Lagrange multiplier λ can be eliminated by (82b); the subsequent addition of the term \(c u^{k+1}_{i} - c u^{k+1}_{i}\) with c > 0 leads to a fixed point equation of the form

(83)

Using a semi-implicit approach, we now obtain the iteration sequence (80) by setting

  • \(r^{n} = \tilde{\chi}^{k}_{i} (u^{k+1}_{i})^{n}\) and \(s^{n} = \tilde{\chi}^{k}_{i} (u^{k+1}_{i})^{n} f\), if f is non-negative everywhere;

  • \(r^{n} = \tilde{\chi}^{k}_{i} ( (u^{k+1}_{i})^{n} - f )\) and s n=0 else.

The case differentiation presented here is motivated by practical reasons. Obviously, we can always choose c in (81) large enough such that the right-hand side of (80) is non-negative. On the other hand, c should be as small as possible to yield a reasonable convergence behavior of the iteration sequence. Thus, in the case of additive Gaussian noise, where the data function f can also be negative, the functions above are the more convenient choice. Finally, note that the resulting fixed point of (83) is a minimizer of (78) only if the corresponding derivative has the correct sign, a condition which has to be checked in the numerical realization.

5.3.2 Poisson Noise

Here we consider the problem (78) in the case of Poisson noise, where the negative log-likelihood function D i (f,u i ) is given in (15), i.e., D i (f,u i )=u i flogu i . In this case we can proceed analogously to the additive Gaussian noise discussed in Sect. 5.3.1 and hence obtain the following fixed point equation using KKT and Neumann boundary conditions,

Then a semi-implicit approach yields the iteration sequence (80) with \(r^{n} = \tilde{\chi}^{k}_{i}\) and \(s^{n} = \tilde {\chi}^{k}_{i} f\). Note that the non-negativity of s^n follows from Sect. 3.5.2, since \(f \in V_i(\varOmega)\) is non-negative.

5.3.3 Multiplicative Speckle Noise

Finally, we discuss (78) for the case of the speckle noise likelihood function introduced in (49), i.e., we consider \(D_{i}(f,u_{i}) = \frac{(u_{i} - f)^{2}}{u_{i}} + \sigma^{2} \log u_{i}\). In this case the KKT conditions give a fixed point equation of the form

with Neumann boundary conditions. Consequently, a semi-implicit approach results in the iteration sequence (80) with

5.4 Numerical Realization of the Segmentation Step

After discussing the numerical realization of the denoising problems (63a) and (63b), we now concentrate on the segmentation problem (63c) of the proposed alternating minimization strategy. In this context we are interested in computing the current segmentation, indicated by χ^{k+1}, using the updated denoised images \(u_b^{k+1}\) and \(u_t^{k+1}\).

The standard approaches to solving such geometric problems are active contour models, e.g., snakes, initially proposed by Kass, Witkin and Terzopoulos in [44], or level set methods, introduced by Osher and Sethian in [60]. Although these models have attracted strong attention in the past, they have several drawbacks leading to complications in the computation of segmentation results. For example, the explicit curve representation of snake models does not allow changes in curve topology, and level set methods require an expensive re-initialization of the level set function during the evolution process (cf. e.g., [50, 59]). However, the main drawback of these methods is the non-convexity of the associated energy functionals and consequently the existence of local minima, leading to unsatisfactory results with wrong scales of details.

Recently, several globally convex segmentation models have been proposed in [12, 22, 39] to overcome the fundamental issue of local minima. The main idea of all these approaches is to unify the image segmentation and image denoising tasks in a global minimization framework. In our work we follow the idea of [16], where a relation between the well-known Rudin-Osher-Fatemi (ROF) model [66] and the minimal surface problem is presented. We recall this relation in the following lemma and note that the ROF model always admits a unique solution, since the associated energy functional is strictly convex.

Lemma 11

Let β>0 be a fixed parameter, gL 2(Ω), and \(\hat {u}\) the unique solution of the ROF minimization problem

$$ \min_{u \in \mathit{BV}(\varOmega)} \frac{1}{2} \int_\varOmega (u - g)^2 dx + \beta \lvert u \rvert _{\mathit {BV}(\varOmega )} . $$
(84)

Then, for almost every t∈ℝ, the indicator function

$$ \hat{\chi}(x) = \begin{cases} 1 , & \text{\textit{if} } \hat{u}(x) > t , \\ 0 , & \text{\textit{else}} , \end{cases} $$
(85)

is a solution of the minimal surface problem

$$\min_{\chi \in BV(\varOmega; \{0,1\})} \; \int_\varOmega \chi \, ( t - g ) \, dx + \beta \lvert \chi \rvert _{BV(\varOmega)} . $$
(86)

In particular, for all t but a countable set, the solution of the problem (86) is even unique.

Using Lemma 11 we are able to translate the segmentation problem (63c) into a standard ROF denoising problem. Observe that the problem (63c) corresponds to the minimal surface problem (86) by setting

$$ t = 0 \quad\text{and} \quad g = D_t(f,u^{k+1}_t) - D_b(f,u^{k+1}_b) . $$
(87)

Therefore, the solution χ k+1 of the segmentation step (63c) can now be computed by simple thresholding of the form (85) with t=0, where \(\hat{u}\) is the solution of the ROF problem (84) with g specified in (87).
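In practice this step is a one-liner on top of any ROF solver; below is a hedged sketch of ours using scikit-image's TV denoiser (its weight parameter plays the role of β up to the library's scaling convention, and the function name is our own):

```python
from skimage.restoration import denoise_tv_chambolle

def segmentation_step(d_b, d_t, beta):
    """Segmentation step (63c) via Lemma 11: solve the ROF problem (84)
    with g = D_t - D_b (cf. (87)) and threshold at t = 0 (85)."""
    g = d_t - d_b                        # pointwise fidelity difference (87)
    u_hat = denoise_tv_chambolle(g, weight=beta)
    return (u_hat > 0).astype(float)     # thresholded indicator (85)
```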

In conclusion, the ROF model (84) is a well-understood and intensively studied variational problem in mathematical image processing. Hence, a variety of numerical schemes for solving this problem have already been proposed in the literature. Exemplarily, we refer to the projected gradient descent algorithm of Chambolle [17], the nonlinear primal-dual method of Chan, Golub and Mulet [21], the split Bregman algorithm of Goldstein and Osher [38], and the first-order algorithms in [8, 18].

6 Experimental Results on Synthetic Data

In this section we validate the proposed variational segmentation framework on synthetic data, focusing on the choice of an appropriate data fidelity term and the selection of regularization functionals. To fully understand the influence of these choices, we propose different experimental settings below, constructed to highlight characteristic behavior of our segmentation framework. In order to segment the images discussed below we use the alternating minimization scheme (63a)–(63c), for which 25 iteration steps are sufficient in most cases. To terminate the inner iteration loops discussed in Sects. 5.2 and 5.3 we choose ϵ = 10^{-6} in (69).

6.1 Impact of Regularization Term

First, we investigate the impact of the selected regularization term (cf. Sect. 3.4) on the segmentation results and especially on the estimated approximations of the original object data. For this purpose we use an image of size 150×150 pixels scaled to [0,1] with a simple object structure, illustrated in Fig. 2(a). In this image we place inhomogeneities in Ω_t and Ω_b covering the full range of intensities, such that both regions have the same mean value. The challenge of these data is the occurrence of the same grayscale values in both regions, combined with strong intensity changes at the border of the object structure. Admittedly, this situation is quite untypical for real-life applications. However, it illustrates the limits of the proposed segmentation framework.

Fig. 2 Comparison of Chan-Vese segmentation result with ground truth on synthetic data

Due to these constructed properties it is obvious that a regularization functional enforcing piecewise constant functions is not feasible in this situation, as can be seen in Fig. 2(c) for the popular Chan-Vese algorithm from Sect. 4.2. This motivates the investigation of the squared H^1-seminorm and the Fisher information regularization discussed in Sect. 3.4. For this purpose we concentrate on data perturbed by Poisson noise (cf. Sect. 2.2.2), as illustrated in Fig. 3(b). In order to guarantee identical preconditions we keep β = 0.4 fixed for both experiments.

Fig. 3 Comparison of segmentation and approximation results for synthetic data biased by Poisson noise using the squared H^1-seminorm and Fisher information regularization. The grayscale values in (a), (e) and (f) are scaled to a uniform interval and the horizontal white lines represent the position of the profile lines plotted in Fig. 4

For the squared H^1-seminorm the regularization parameters are chosen as α_b = 7 and α_t = 540 in order to achieve the satisfying segmentation result illustrated in Fig. 3(c). The strong difference in the absolute values of α_b and α_t is caused by the extraordinary form of the given data. However, as a consequence of this choice we obtain an oversmoothing of the target region and, on the other hand, a low regularization in the background region, as presented in Fig. 3(e). The same observations can also be made for the Fisher information regularization, where the parameters are chosen as α_b = 20 and α_t = 30. The corresponding segmentation and approximation results are illustrated in Figs. 3(d) and 3(f), respectively. In order to demonstrate the difference between the approximations in Figs. 3(e) and 3(f), we plot the intensity values of these results in Fig. 4 along the illustrated profile lines. As we can see, the Fisher information result approximates the target object significantly better than the squared H^1-seminorm result.

Fig. 4 Profile lines of the squared H^1-seminorm and Fisher information approximations from Fig. 3

During the experiments described above we also observed possible convergence of our algorithm to local minima such as the one presented in Fig. 2(c). For the Fisher information regularization we illustrate such a segmentation result in Fig. 5(c), where the regularization parameters are chosen as described above and the initialization given in Fig. 5(a) is used. This unsatisfactory result can be avoided, as illustrated in Fig. 5(d), by using another initialization, presented in Fig. 5(b). The reason for this behavior is the non-convexity of the overall energy functional (19), such that convergence to a global minimum cannot be guaranteed, although each subproblem in (63a)–(63c) has a global solution. Hence, the convergence of the presented method to an optimal solution depends not only on the specified regularization parameters but also on a suitable initialization.

Fig. 5 Comparison of segmentation results for synthetic data biased by Poisson noise using the Fisher information regularization and different initializations

6.2 Impact of the Data Fidelity Term

To evaluate the importance of a correct noise model in automated image segmentation, we investigate images perturbed by the physical noise forms described in Sect. 2.2, i.e., additive Gaussian noise, Poisson noise, and multiplicative speckle noise. For the sake of simplicity we perform this experiment on piecewise constant images, since this case is known from the popular Chan-Vese formulation [20] and we can neglect the regularization parameters α_b and α_t. For the segmentation we use the generalized Chan-Vese two-phase segmentation model proposed in Sect. 4.3, i.e., we only separate regions-of-interest from the background.

We choose the objects to be segmented with respect to typical segmentation tasks from biomedical imaging. Often, only one major anatomical region-of-interest has to be segmented, e.g., the left ventricle of the heart in echocardiographic examinations [57, 76] or in [18F]FDG uptake studies in positron emission tomography [62]. Furthermore, it is desirable to preserve as many image details as possible during the segmentation process. Especially in tumor imaging [75], small lesions with a size of only a few pixels can easily be overlooked due to a loss of details caused by too intense regularization. This has severe consequences if not taken into account, and hence it is important to preserve the details of small image regions.

Figure 6(a) shows our experimental data without noise perturbation, and Fig. 6(b) the ground truth segmentation. In the image center we place an approximate shape of the left ventricle of the human heart as it would be imaged in an apical four-chamber view in echocardiography. Below this major structure we put three small squares with sizes of 1, 2, and 4 pixels to simulate minor structures, such as small lesions, which we want to preserve during the process of automated image segmentation. To the left and right of our heart phantom we place two curved lines with a diameter of 1 and 2 pixels to simulate vessel-like structures, which play an important role in perfusion studies of different organs [33, 82], e.g., liver veins or coronary arteries of the heart.

Fig. 6 Synthetic data with anatomical structures of different size

To validate the impact of the data fidelity term we bias the image in Fig. 6(a) with synthetic noise and try to find the optimal value of the regularization parameter β. This optimization was performed with respect to the following two criteria:

  • Segmentation of the main structure without noise artifacts.

  • Preservation of small anatomical structures without loss of details.

Naturally, it is hard to fulfill both criteria simultaneously, since there is a trade-off between noise-free segmentation results and a detailed partition of the image. For our synthetic images we look, on the one hand, for the highest possible value of β that preserves as many small structures as possible and, on the other hand, for the lowest possible value of β that ensures a complete segmentation of the main structure without noise-induced artifacts. These two limiting cases are the foundation for our observations in the following experiment.

First, we perturb the data with additive Gaussian noise, as illustrated in Fig. 7(a). In Fig. 7(b) the data fidelity term for additive Gaussian noise (cf. (14)) produces a satisfying segmentation result, which fulfills both criteria discussed above. For this case we observe very similar results for the Poisson and speckle data fidelity terms, as illustrated in Figs. 7(c), 7(d) and 7(e), 7(f), respectively.

Fig. 7 Comparison of different data fidelity models on synthetic data of simulated anatomical structures biased by additive Gaussian noise

In the next experiment we perturb the synthetic data with Poisson noise, as presented in Fig. 8(a). For this image the Poisson data fidelity term (15) is an appropriate choice, as can be seen in Fig. 8(b). In Figs. 8(c) and 8(d) we test the additive Gaussian noise fidelity term and choose the regularization parameter β according to the criteria discussed above. In order to preserve all small structures in Fig. 8(c) we have to accept a significant amount of noise artifacts within the main structure. On the other hand, we lose almost all small structures in Fig. 8(d) by choosing β high enough to guarantee a segmentation of the center object without noise perturbation. In this case the trade-off between smooth segmentations and the loss of details becomes obvious. Figures 8(e) and 8(f) show the results for the speckle data fidelity term. Since we did not observe any satisfying results for different values of β, we show two representative segmentations for this model. Compared to the results in the case of additive Gaussian noise, the choice of the appropriate noise model has a significantly higher impact on the segmentation results in the presence of Poisson noise.

Fig. 8 Comparison of different data fidelity models on synthetic data of simulated anatomical structures biased by Poisson noise

Finally, we investigate the case of data biased by multiplicative speckle noise, as shown in Fig. 9(a). The segmentation result for the corresponding speckle data fidelity term (cf. (16)) is presented in Fig. 9(b). Again, we observe that both segmentation criteria can be satisfied when choosing the correct noise model. In Figs. 9(c) and 9(d) the segmentation results for the additive Gaussian data fidelity term are presented. As for the results in Figs. 8(c) and 8(d), the Gaussian model is inappropriate in the presence of multiplicative speckle noise. For the Poisson fidelity term we observe in Figs. 9(e) and 9(f) effects similar to the Gaussian case, but with a better trade-off between smooth segmentation results and the preservation of small image structures.

Fig. 9 Comparison of different data fidelity models on synthetic data of simulated anatomical structures biased by multiplicative speckle noise

In conclusion, we emphasize that incorporating a physical noise model for the given data has a significant impact on the segmentation results and leads to improved accuracy in applications dealing with non-Gaussian noise.

7 Experimental Results on Real Data

In this section we use real data from biomedical imaging and investigate the performance of the proposed variational segmentation framework in realistic segmentation tasks. For this purpose we present images from two imaging modalities affected by the non-Gaussian noise models discussed in Sect. 2.2.

7.1 Positron Emission Tomography

To test the variational segmentation framework on data heavily perturbed by noise, we take images from positron emission tomography (PET). PET belongs to the field of molecular imaging, in which a specific radioactive tracer is injected into the blood circulation and its binding to certain molecules is studied [83]. One possible tracer is, e.g., H2[15O] (radioactively labeled water), which can be used for the quantification of myocardial blood flow [71]. However, this quantification needs a segmentation of the myocardial tissue [10, 71], which is extremely difficult to realize due to the very low signal-to-noise ratio (SNR) of H2[15O] data.

Figure 10(a) shows a slice of a reconstructed H2[15O] study using the expectation maximization (EM) algorithm [72], with signal intensities in the range of up to 2,000,000 counts. The data were captured at the moment when the tracer flooded into the left ventricle of the human heart. This very short time interval leads to a high level of Poisson noise in the data, and thus in the corresponding EM reconstructions, posing a challenging task for most segmentation algorithms. Based on our observations in the last section we choose the Poisson data fidelity term (cf. (15)) and use the Fisher information as regularization functional. For our experiment we focus on the region-of-interest shown in Fig. 10(b), which corresponds to the area around the myocardial tissue. As regularization parameters we determined α_t = 140, α_b = 20, and β = 380,000 (low SNR).

Fig. 10 Segmentation result for reconstruction of cardiac H2[15O] PET data

In Figs. 10(c) and 10(d) we show the approximation and segmentation results in the region-of-interest, respectively. The segmented region corresponds to the myocardium including the left and right ventricle. As can be seen, the proposed segmentation framework can segment the myocardium in data with very low SNR. However, we were not able to separate the left and right ventricle, as would be needed for myocardial perfusion quantification. The reason for this is the two-phase formulation (19) used here, which is not suitable for segmenting different uptake levels and additional background activity.

7.2 Medical Ultrasound Imaging

The last case we want to investigate is medical ultrasound data. As proposed in [45, 52] the physical noise occurring in this imaging modality can be modeled by multiplicative speckle noise as presented in Sect. 2.2.3 and hence the speckle data fidelity term (cf. (16)) is an appropriate choice in this context. In the following we concentrate on the task of blood vessel segmentation, which can be used in perfusion studies of different organs, e.g., the myocardium [33, 82].

In Fig. 11(a) we show a section of an ultrasound B-mode image containing parts of a human liver. The darker regions within these data represent the blood vessels to be segmented. In Figs. 11(b)–11(d) we present three segmentation results for β ∈ {1000, 2000, 5000} using the classical Chan-Vese model (additive Gaussian noise) from Sect. 4.2. As can be seen, we were not able to segment the small structures of the blood vessels accurately without over-segmenting the data. We compare the best segmentation result for the classical CV approach, with β = 2000, against the corresponding extension to the speckle noise model presented in Sect. 4.3. As can be seen in Fig. 11(e), by exchanging only the data fidelity term we observe a significant improvement in segmentation accuracy and suppression of noise artifacts. Although the result in Fig. 11(e) appears to segment the blood vessels within the image correctly, we can identify some small structures that have been left out. In order to include these microvessels we exchanged the regularization term for the Fisher information in the target and background regions and used the regularization parameters α_t = 15, α_b = 25, and β = 25. We could not achieve any meaningful segmentation result using the squared H^1-seminorm regularization due to the oversmoothing of small structures.

Fig. 11 Segmentation results for ultrasound (US) data from an examination of a human liver, comparing the classical and extended Chan-Vese (CV) methods for speckle noise, and the Fisher information regularization for speckle noise

As illustrated in Fig. 11(f), the proposed segmentation framework is able to segment even the smallest structures within the given data. Especially in low contrast regions, e.g., the lower-left corner of the image, we observe satisfying segmentation results. Figure 11(g) shows the corresponding approximation of the data for the Fisher information regularization and illustrates a non-constant approximation of the target region while preserving the edges of the blood vessels within the image.

8 Conclusion

In this work we proposed a variational segmentation framework for different physical noise models and analyzed the corresponding mathematical theory. In particular, we investigated a selected variety of regularization terms and noise models, i.e., additive Gaussian noise, Poisson noise, and multiplicative speckle noise, for automated image segmentation. Experimental results on synthetic and real data show the necessity of adapting an algorithm to the present conditions, e.g., by incorporating a-priori knowledge about the solution.

We plan to translate the theoretical foundation provided in this work to more real-world applications and to validate the presented variational segmentation framework on a variety of realistic data. Furthermore, we plan to explore the incorporation of further physical noise models, e.g., for Rayleigh- or Gamma-perturbed data.