Abstract
A convex non-convex variational model is proposed for multiphase image segmentation. We consider a specially designed non-convex regularization term which adapts spatially to the image structures for a better control of the segmentation boundary and an easy handling of the intensity inhomogeneities. The nonlinear optimization problem is efficiently solved by an alternating direction method of multipliers (ADMM) procedure. We provide a convergence analysis and perform numerical experiments on several images, showing the effectiveness of this procedure.
1 Introduction
The fundamental task of image segmentation is the partitioning of an image into regions that are homogeneous according to a certain feature, such as intensity or texture, with the aim of identifying more meaningful high-level information in the image. This process plays a key role in many application fields of computer vision, e.g. object detection, recognition, measurement and tracking. Many successful methods for image segmentation are based on variational models, where the regions of the desired partition, or their edges, are obtained by minimizing suitable energy functionals. The most popular region-based segmentation model, the Mumford–Shah model, is a non-convex variational model which pursues a piecewise constant/smooth approximation of the given image, where the boundaries are the transitions between adjacent patches of the approximation [25]. Many convex relaxation models have been proposed in the literature to overcome the numerical difficulties of the non-convex problem [4,5,6, 22], at the same time accepting a compromise in terms of segmentation quality.
In this work we propose the following Convex Non-Convex (CNC) variational segmentation model, given by the sum of a smooth convex (quadratic) fidelity term and a non-smooth non-convex regularization term:
$$\begin{aligned} \min _{u \,{\in }\; {\mathbb R}^n} \Big \{ \, \mathcal {J}(u;\lambda ,T,a) \,:{=}\, \frac{\lambda }{2} \, \Vert u - b \Vert _2^2 \;{+} \sum _{i=1}^{n} \phi \left( \Vert (\nabla u)_i \Vert _2 \,;T,a \right) \Big \} , \end{aligned}$$(1.1)
where \(\lambda > 0\) is the regularization parameter, \(b \in {\mathbb R}^n\) is the (vectorized) observed image, \((\nabla u)_i \in {\mathbb R}^2\) represents the discrete gradient of the image \(u \in {\mathbb R}^n\) at pixel i, \(\Vert \, \cdot \, \Vert _2\) denotes the \(\ell _2\) norm and \(\phi (\,\cdot \,;T,a): [\,0,+\infty ) \rightarrow {\mathbb R}\) is a parameterized, piecewise-defined non-convex penalty function with parameters \(T>0, a > 0\), whose properties will be outlined in Sect. 2. In particular, the parameter a allows one to tune the degree of non-convexity of the regularizer, while the parameter T represents a gradient magnitude threshold above which the boundaries surrounding the features in the image are considered salient in a given context. This parameter plays a fundamental role in selecting which pixels are not to be considered as boundaries of segmented regions in the image. The role of the penalty function \(\phi \) in the regularization term of functional \(\mathcal {J}\) in (1.1) is twofold. When the gradient magnitudes fall within the first interval [0, T), \(\phi \) smooths the image values, since the corresponding pixels belong to the inner parts of the regions to be segmented. On the interval \([T,+\infty )\), \(\phi \) is non-convex, first concave and then flat, and hence it penalizes all the possible gradient magnitudes in approximately the same way.
The functional \(\mathcal {J}\) in (1.1) is non-smooth and can be convex or non-convex depending on the parameters \(\lambda \) and a. In fact, the quadratic fidelity term is strongly convex and its positive second-order derivatives hold the potential for compensating the negative second-order derivatives in the regularization term.
The idea of constructing and then optimizing convex functionals containing non-convex (sparsity-promoting) terms, referred to as the CNC strategy, was first introduced by Blake and Zisserman in [2], then proposed by Nikolova in [27] for the denoising of binary images, and has been studied very recently by Selesnick and others for different purposes; see [9, 17, 20, 21, 28, 32, 33] for more details. The attractiveness of such a CNC approach resides in its ability to promote sparsity more strongly than is possible using only convex terms, while at the same time maintaining convexity of the total optimization problem, so that well-known reliable convex minimization approaches can be used to compute the (unique) solution.
In this paper, we propose a two-stage variational segmentation method inspired by the piecewise smoothing proposal in [5] which is a convex variant of the classical Mumford–Shah model. In the first stage of our method an approximate solution \(u^*\) of the optimization problem (1.1) is computed. Once \(u^*\) is obtained, then in the second stage the segmentation is carried out by thresholding \(u^*\) into different phases. The thresholds can be set by the user or can be obtained automatically using any clustering method, such as the K-means algorithm. As discussed in [5], this allows for a K-phase segmentation \((K \ge 2)\) by choosing \((K - 1)\) thresholds after \(u^*\) is computed in the first stage. In contrast, many multiphase segmentation methods such as those in [15, 23, 24, 38] require K to be set in advance which implies that if K changes, the minimization problem has to be solved again.
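To make the second stage concrete, the following minimal sketch clusters the intensities of the smoothed image \(u^*\) into K phases with a plain 1D K-means (Lloyd) iteration. The function name, the quantile-based initialization and the simple iteration count are our own illustrative choices, not taken from [5]:

```python
import numpy as np

def threshold_phases(u_star, K, n_iter=50):
    """Second-stage sketch: K-phase segmentation of the smoothed image u_star
    by 1D K-means on pixel intensities (illustrative implementation)."""
    vals = u_star.ravel()
    # initialize the K centroids at evenly spaced intensity quantiles
    centers = np.quantile(vals, (np.arange(K) + 0.5) / K)
    for _ in range(n_iter):
        # assign each pixel to its nearest centroid
        labels = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
        # recompute centroids as cluster means
        for k in range(K):
            if np.any(labels == k):
                centers[k] = vals[labels == k].mean()
    return labels.reshape(u_star.shape), centers
```

Since the thresholds are chosen only after \(u^*\) is available, changing K requires rerunning only this cheap clustering step, not the minimization of (1.1).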
The main contributions of this paper are summarized as follows.
-
A new variational CNC model for multiphase segmentation of images is proposed, where a unique non-convex regularization term allows for simultaneously penalizing both the non-smoothness of the inner segmented parts and the length of the boundaries;
-
Sufficient conditions for convexity are derived for the proposed model;
-
A specially designed ADMM-based numerical algorithm is introduced together with a specific multivariate proximal map;
-
The proof of convergence of the minimization algorithm is provided which paves the way to analogous proofs for similar CNC algorithms.
1.1 Related work
Image segmentation is a central problem in the extraction of high-level information from images. There exist many different ways to define the task of segmentation, ranging from template matching, component labelling, thresholding, boundary detection and texture segmentation, to mention just a few, and there is no universally accepted best segmentation procedure. The proposed work belongs to the class of region-based (rather than edge-based) variational models for multiphase segmentation without supervision constraints. Many variational models have been studied for image segmentation since the Mumford–Shah functional was introduced in [25]. A prototypical example is the Chan–Vese model [7], which seeks the desired segmentation as the best piecewise constant approximation to a given image via a level set formulation. A variety of methods have been developed to generalize it and to avoid the difficulty of solving non-convex optimization problems. Specifically, in [22], the piecewise constant Mumford–Shah model was convexified by using fuzzy membership functions. In [31], a new regularization term was introduced which allows the number of phases to be chosen automatically. In [37, 38], efficient methods based on the fast continuous max-flow method were proposed. The segmentation method recently proposed by Cai et al. in [5] aims to minimize a convex version of the Mumford–Shah functional [25] by finding an optimal approximation of the image based on a piecewise smooth function. The main difference between our method and the approach in [5] is the regularization term: in [5], it consists of the sum of a smoothing term and a total variation term, where the latter replaces the non-convex term measuring the boundary length in the original Mumford–Shah model. However, it is well known that using non-convex penalties instead of the total variation regularizer holds the potential for more accurate penalizations [16, 18, 19].
In our variational model (1.1) we devise a single regularization function which simultaneously enforces the smoothing of the approximate solution in the inner parts and the preservation of the features (corners, edges, ...) along the boundaries of the segmented parts. This latter property is achieved by means of a non-convex regularizer. At the same time, the CNC strategy applied to the solution of the optimization problem allows us to overcome the well-known numerical difficulties associated with the original non-convex piecewise smooth Mumford–Shah model.
This work is organized as follows. In Sect. 2 we characterize the non-convex penalty functions \(\phi (\,\cdot \,;T,a)\) considered in the proposed model. In Sect. 3 we provide a sufficient condition for strict convexity of the cost functional in (1.1) and in Sect. 4 we illustrate in detail the ADMM-based numerical algorithm used to compute approximate solutions of (1.1). A proof of convergence of the numerical algorithm is given in Sect. 5. Some segmentation results are illustrated in Sect. 6. Conclusions are drawn in Sect. 7.
2 Design of the non-convex penalty functions
In the rest of the paper, we denote the sets of non-negative and positive real numbers by \(\,{\mathbb R}_+ :{=}\, \{ \, t \in {\mathbb R}: \, t \ge 0 \}\) and \(\,{\mathbb R}_+^* :{=}\, \{ \, t \in {\mathbb R}: \, t > 0 \}\), respectively, and we indicate by \(\,I_d\) the \(d \times d\) identity matrix.
In this section, we design a penalty function \(\phi : {\mathbb R}_+ \rightarrow {\mathbb R}\) suitable for our purposes. In particular, the regularization term in the proposed model (1.1) has a twofold aim: in the first interval [0, T) it has to behave like a smoothing regularizer, namely a quadratic penalty, and in the second interval \([T,\infty ) \) it serves to control the length of the region boundaries, and is realized by a concave penalty function prolonged with a horizontal line.
To fulfill the above requirements we used a piecewise polynomial function defined over three subdomains [0, T), \([T,T_2)\) and \([T_2, \infty )\), with the following properties:
-
\(\;\phi \) continuously differentiable for \(t \in {\mathbb R}_+\)
-
\(\;\phi \) twice continuously differentiable for \(t \in {\mathbb R}_+ \!\setminus \! \{T,T_2\}\)
-
\(\;\phi \) convex and monotonically increasing for \(t \in [0,T)\)
-
\(\;\phi \) concave and monotonically non-decreasing for \(t \in [T,T_2) \)
-
\(\;\phi \) constant for \(t \in [T_2,\infty ) \)
-
\({\inf \nolimits _{t\in {\mathbb R}_+ \setminus \{T,T_2\}}} \phi '' = \; -a.\)
Recalling that the independent variable t of \(\phi (t;T,a)\) represents the gradient magnitude, the free parameter T allows us to control the segmentation; in particular, it defines the smallest gradient magnitude considered an acceptable boundary for the segmentation process. The parameter a is used to make the entire functional \(\mathcal {J}\) in (1.1) convex, as will be detailed in Sect. 3. Finally, the parameter \(T_2\) is defined so as to allow for a good balancing between the two terms in the functional. In particular, the graph of the penalty function \(\phi \) must be pushed down when a increases. Towards this aim, we set \(T_2\) as a function \(T_2(a)\) in such a way that the slope at T, given by \(\phi '(T; T,a)=(T_2-T)\,a\), is a monotonically decreasing function of the parameter a. In this work we set \(\phi '(T; T,a)=1/a\), so that \(T_2= T+\frac{1}{a^2}\). Therefore, in the following we restrict the number of free parameters to a and T only.
The minimal degree polynomial function fulfilling the above requirements turns out to be the following piecewise quadratic penalty function:
$$\begin{aligned} \phi (t\,;T,a) \,=\, \left\{ \begin{array}{ll} \phi _1(t\,;T,a) :{=}\, \displaystyle {\frac{a\,(T_2-T)}{2\,T}} \; t^2 &{} \quad t \in [\,0,T) \\ \phi _2(t\,;T,a) :{=}\, -\displaystyle {\frac{a}{2}} \, (t-T_2)^2 + \displaystyle {\frac{a}{2}} \, T_2 \, (T_2-T) &{} \quad t \in [\,T,T_2) \\ \phi _3(t\,;T,a) :{=}\, \displaystyle {\frac{a}{2}} \, T_2 \, (T_2-T) &{} \quad t \in [\,T_2,+\infty ) , \end{array} \right. \end{aligned}$$(2.1)
which has been obtained by imposing the following constraints:
-
\(\phi _1(0; T,a) = \phi _1^{\prime }(0; T,a) = 0 \)
-
\(\phi _1(T; T,a) = \phi _2(T; T,a) \)
-
\(\phi _1^{\prime }(T; T,a) = \phi _2^{\prime }(T; T,a)\)
-
\(\phi _2^{\prime }(T_2; T,a) = 0 \)
-
\(\phi _2^{\prime \prime }(t; T,a) = -a \qquad \forall t \in [T, T_2)\)
-
\(\phi _3 \; \text {constant} \qquad \qquad \quad \; \forall t \in [T_2, \infty )\)
In Figure 1 we show the plots of the penalty functions \(\phi (t;T,a)\) for three different values of \(a \in \{3,5,7\}\) with fixed \(T=0.2\). The solid dots in the graphs represent the points \((T,\phi (T;T,a))\) which separate the convex segment \(\phi _1\) from the non-convex ones \(\phi _2\) and \(\phi _3\).
This choice of a simple second-order piecewise polynomial as penalty function is motivated by computational efficiency issues as detailed in Sect. 4.
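For concreteness, the penalty determined by the constraints above can be evaluated as in the following sketch; the closed-form coefficients below are derived by us from those constraints together with the choice \(T_2 = T + 1/a^2\), so they are a reconstruction rather than a verbatim copy of (2.1):

```python
import numpy as np

def phi(t, T, a):
    """Piecewise quadratic penalty derived from the stated C^1 constraints:
    convex quadratic on [0, T), concave quadratic on [T, T2), constant beyond T2."""
    T2 = T + 1.0 / a**2          # T2 chosen so that phi'(T) = a * (T2 - T) = 1/a
    c = 0.5 * a * T2 * (T2 - T)  # plateau value phi_3, fixed by continuity at T2
    t = np.asarray(t, dtype=float)
    phi1 = 0.5 * a * (T2 - T) / T * t**2   # convex branch on [0, T)
    phi2 = c - 0.5 * a * (t - T2)**2       # concave branch on [T, T2), phi'' = -a
    return np.where(t < T, phi1, np.where(t < T2, phi2, c))
```

A quick numerical check of the listed properties (continuity of \(\phi \) and \(\phi '\) at T and \(T_2\), slope 1/a at T, monotonicity) confirms the construction.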
3 Convexity analysis
In this section, we analyze the convexity of the optimization problem in (1.1). More precisely, we seek sufficient conditions on the parameters \(\lambda ,T,a \in {\mathbb R}_+^*\) such that the objective functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) is strictly convex. In the previous section we designed the penalty function \(\phi (\,\cdot \,;T,a)\) in (2.1) in such a way that it is continuously differentiable but not everywhere twice continuously differentiable. This choice, motivated by the greater flexibility achievable in the shape of the function, prevents us from using arguments based only on second-order differential quantities in the following convexity analysis.
We rewrite \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) in explicit double-indexed form:
where \(\Omega \) represents the image lattice defined by
with \(n_1\) and \(n_2\) denoting the image height and width, respectively. In (3.1) we used a standard first-order forward finite difference scheme for discretization of the first-order horizontal and vertical partial derivatives. We notice that convexity conditions for the functional \(\mathcal {J}\) depend on the particular finite difference scheme used for discretization of the gradient. Nevertheless, the procedure used below for deriving such conditions can be adapted to other discretization choices.
In the following, we give four lemmas which allow us to reduce convexity analysis from the original functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) of n variables to simpler functions \(f(\,\cdot \,;\lambda ,T,a)\), \(g(\,\cdot \,;\lambda ,T,a)\) and \(h(\,\cdot \,;\lambda ,T,a)\) of three, two and one variables, respectively. Then in Theorem 3.5 we finally state conditions for strict convexity of our functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1).
Lemma 3.1
The function \(\mathcal {J}(\,\cdot \,;\lambda ,T,a): {\mathbb R}^n \rightarrow {\mathbb R}\) defined in (3.1) is strictly convex if the function \(f(\,\cdot \,;\lambda ,T,a): {\mathbb R}^3 \rightarrow {\mathbb R}\) defined by
is strictly convex.
Proof
First, we notice that the functional \(\mathcal {J}\) in (3.1) can be rewritten as
where \(\mathcal {A}(u)\) is an affine function of u. Then, we remark that each term of the last sum in (3.4) involves three different pixel locations and that, globally, the last sum involves each pixel location three times. Hence, we can clearly write
Since the affine function \(\mathcal {A}(u)\) does not affect convexity, we can conclude that the functional \(\mathcal {J}\) in (3.5) is strictly convex if the function f is strictly convex. \(\square \)
Lemma 3.2
The function \(f(\,\cdot \,;\lambda ,T,a): {\mathbb R}^3 \rightarrow {\mathbb R}\) defined in (3.3) is strictly convex if the function \(g(\,\cdot \,;\lambda ,T,a): {\mathbb R}^2 \rightarrow {\mathbb R}\) defined by
is strictly convex.
The proof is provided in the “Appendix”.
Lemma 3.3
Let \(\,\,\psi {:}\; {\mathbb R}^2 \rightarrow {\mathbb R}\) be a radially symmetric function defined as
Then, \(\psi \) is strictly convex in x if and only if the function \(\tilde{z}{:}\; {\mathbb R}\rightarrow {\mathbb R}\) defined by
is strictly convex in t.
The proof is provided in the “Appendix”.
Lemma 3.4
The function \(g(\,\cdot \,;\lambda ,T,a): {\mathbb R}^2 \rightarrow {\mathbb R}\) defined in (3.6) is strictly convex if and only if the function \(h(\,\cdot \,;\lambda ,T,a): {\mathbb R}\rightarrow {\mathbb R}\) defined by
is strictly convex.
The proof is immediate by applying Lemma 3.3 to the function g in (3.6).
Theorem 3.5
Let \(\,\phi (\,\cdot \,;T,a): {\mathbb R}_+ \rightarrow {\mathbb R}\) be the penalty function defined in (2.1). Then, a sufficient condition for the functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) to be strictly convex is that the pair of parameters \((\lambda ,a) \in {\mathbb R}_+^* {\times }\; {\mathbb R}_+^*\) satisfies:
$$\begin{aligned} \lambda \,{>}\; 9\,a \, . \end{aligned}$$(3.10)
Proof
It follows from Lemmas 3.1, 3.2 and 3.4 that a sufficient condition for the functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) to be strictly convex is that the function \(h(\,\cdot \,;\lambda ,T,a)\) in (3.9) is strictly convex. Recalling the definition of the penalty function \(\phi (\,\cdot \,;T,a)\) in (2.1), \(h(\,\cdot \,;\lambda ,T,a)\) in (3.9) can be rewritten in the following explicit form:
Clearly, the function h above is even and piecewise quadratic; and, as far as regularity is concerned, it is immediate to verify that \(\,h \in \mathcal {C}^\infty ({\mathbb R}\setminus \{\pm T,\pm T_2\}) \,\cap \, \mathcal {C}^1({\mathbb R})\). In particular, the first-order derivative function \(h': {\mathbb R}\rightarrow {\mathbb R}\) and the second-order derivative function \(h'': {\mathbb R}\setminus \{\pm T,\pm T_2\} \rightarrow {\mathbb R}\) are as follows:
We notice that the functions h in (3.11) and \(h'\) in (3.12) are both continuous at points \(t \in \{\pm T, \pm T_2\}\), whereas for the function \(h''\) in (3.13) we have at points \(t \in \{T,T_2\}\) (analogously at points \(t \in \{-T,-T_2\}\)):
After recalling that \(\lambda ,T,a > 0\) and \(T_2 > T\), we notice that
hence the function h is strictly convex for \(|t| \in [0,T)\) and \(|t| \in (T_2,\infty )\). A sufficient condition (it is also a necessary condition since the function is quadratic) for h to be strictly convex in \(|t| \in (T,T_2)\) is that the second-order derivative \(h_2''(t;\lambda ,T,a)\) defined in (3.13) is positive. This clearly leads to condition (3.10) in the theorem statement. We have thus demonstrated that if (3.10) is satisfied then h is strictly convex for \(t \in {\mathbb R}\setminus \{\pm T,\pm T_2\}\).
It remains to handle the points \(\pm T,\pm T_2\) where the function h does not admit second-order derivatives. Since h is even and continuously differentiable, it is sufficient to demonstrate that if condition (3.10) is satisfied then the first-order derivative \(h'\) is monotonically increasing at points \(t \in \{T,T_2\}\). In particular, we aim to prove:
Recalling the definition of \(h'\) in (3.12), we obtain:
and, after simple algebraic manipulations:
Since \(\lambda ,T,a > 0\) and we are assuming \(\,\lambda \,{>}\; 9 a\) and \(0< t_1< T< t_2< T_2 < t_3\), inequalities in (3.20) and (3.21) are clearly satisfied, hence the proof is completed.
\(\square \)
We conclude this section by highlighting some important properties of the functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1).
Definition 3.6
Let \(Z: {\mathbb R}^n \rightarrow {\mathbb R}\) be a (not necessarily smooth) function. Then, Z is said to be \(\mu \)-strongly convex if and only if there exists a constant \(\mu > 0\), called the modulus of strong convexity of \(\,Z\), such that the function \(Z(x) - \frac{\mu }{2} \, \Vert x \Vert _2^2\) is convex.
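As a simple illustration of Definition 3.6 (a worked example we add for clarity, with the quadratic fidelity term weighted by \(\lambda \) as implied by the linear system (4.12)): for \(Z(u) = \frac{\lambda }{2} \, \Vert u - b \Vert _2^2\) we have
$$\begin{aligned} Z(u) - \frac{\lambda }{2} \, \Vert u \Vert _2^2 \,=\, - \lambda \, \langle u, b \rangle + \frac{\lambda }{2} \, \Vert b \Vert _2^2 , \end{aligned}$$
which is affine in u, hence convex, so the fidelity term alone is \(\lambda \)-strongly convex. Condition (3.10) guarantees that the negative curvature of the non-convex regularizer does not exhaust this margin.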
Proposition 3.7
Let \(\,\phi (\,\cdot \,;T,a): {\mathbb R}_+ \rightarrow {\mathbb R}\) be the penalty function defined in (2.1) and let the pair of parameters \((\lambda ,a) \in {\mathbb R}_+^* {\times }\; {\mathbb R}_+^*\) satisfy condition (3.10). Then, the functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) is proper, continuous (hence, lower semi-continuous), bounded from below by zero, coercive and \(\mu \)-strongly convex with modulus of strong convexity (at least) equal to
The proof is provided in the “Appendix”.
4 Applying ADMM to the proposed CNC model
In this section, we illustrate in detail the ADMM-based [3] iterative algorithm used to numerically solve the proposed model (1.1) in case that the pair of parameters \((\lambda ,a) \in {\mathbb R}_+^* {\times }\; {\mathbb R}_+^*\) satisfies condition (3.10), so that the objective functional \(\mathcal {J}(u;\lambda ,T,a)\) in (1.1) is strongly convex with modulus of strong convexity given in (3.22). Towards this aim, first we resort to the variable splitting technique [1] and introduce the auxiliary variable \(t \in {\mathbb R}^{2n}\), such that model (1.1) is rewritten in the following linearly constrained equivalent form:
where the matrix \(D \in {\mathbb R}^{2n \times n}\) is defined by
with \(D_h,D_v \in {\mathbb R}^{n \times n}\) the unscaled finite difference operators approximating the first-order horizontal and vertical partial derivatives of an \(n_1 \times n_2\) image (with \(n_1 n_2 = n\)) in vectorized column-major form, respectively, according to the unscaled first-order forward scheme considered in the previous section. More precisely, matrices \(D_h\), \(D_v\) are defined by
where \(\otimes \) is the Kronecker product operator and \(L_{z} \in {\mathbb R}^{z \times z}\) denotes the unscaled forward finite difference operator approximating the first-order derivative of a z-samples 1D signal. In particular, we assume periodic boundary conditions for the unknown image u, such that matrix \(L_z\) takes the form
In (4.1) we defined as \(t_i :{=}\, \big ( (D_h u)_i \,,\, (D_v u)_i \big )^T \in {\mathbb R}^2\) the discrete gradient of the image u at pixel i, obtained by selecting from the vector t the \(i-\)th and \((i+n)-\)th entries. We remark that the auxiliary variable t has been introduced in (4.1)–(4.2) to transfer the discrete gradient operators \((\nabla u)_i\) in (1.1) out of the non-convex non-smooth regularization term \({\phi (\Vert \, \cdot \, \Vert _2;T,a)}\).
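As an illustration, the periodic matrix \(L_z\) and the operators \(D_h\), \(D_v\) can be assembled as follows. The Kronecker ordering shown is our convention for column-major vectorization; the paper's (4.4)–(4.5) may assign the two factors the other way round:

```python
import numpy as np
import scipy.sparse as sp

def L_periodic(z):
    """Unscaled forward difference with periodic boundary: (L x)_i = x_{(i+1) mod z} - x_i."""
    return sp.diags([-np.ones(z), np.ones(z - 1), np.ones(1)],
                    [0, 1, -(z - 1)], format="csr")

def gradient_operators(n1, n2):
    """D_h, D_v acting on the column-major vectorization of an n1 x n2 image
    (Kronecker ordering is our assumption)."""
    Dv = sp.kron(sp.eye(n2), L_periodic(n1), format="csr")  # vertical differences
    Dh = sp.kron(L_periodic(n2), sp.eye(n1), format="csr")  # horizontal differences
    return Dh, Dv
```

With this construction one can verify numerically that \(D_h^T D_h + D_v^T D_v\) coincides with the 5-point periodic discretization of the negative Laplacian mentioned in Sect. 4.1.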
To solve problem (4.1)–(4.2), we define the augmented Lagrangian functional
where \(\beta > 0\) is a scalar penalty parameter and \(\rho \in {\mathbb R}^{2n}\) is the vector of Lagrange multipliers associated with the system of linear constraints in (4.2). We then consider the following saddle-point problem:
with the augmented Lagrangian functional \(\mathcal {L}\) defined in (4.6).
Given the previously computed (or initialized for \(k = 1\)) vectors \(t^{(k-1)}\) and \(\rho ^{(k)}\), the k-th iteration of the proposed ADMM-based scheme applied to the solution of the saddle-point problem (4.6)–(4.7) reads as follows:
In Sect. 5 we will show that, under mild conditions on the penalty parameter \(\beta \), the iterative scheme in (4.8)–(4.10) converges to a solution of the saddle-point problem (4.7), that is to a saddle point \((u^*,t^*;\rho ^*)\) of the augmented Lagrangian functional in (4.6), with \(u^*\) representing the unique solution of the strongly convex minimization problem (1.1).
In the following subsections we show in detail how to solve the two minimization sub-problems (4.8) and (4.9) for the primal variables u and t, respectively, then we present the overall iterative ADMM-based minimization algorithm.
4.1 Solving the sub-problem for \(\varvec{u}\)
Given \(t^{(k-1)}\) and \(\rho ^{(k)}\), and recalling the definition of the augmented Lagrangian functional in (4.6), the minimization sub-problem for u in (4.8) can be rewritten as follows:
where constant terms have been omitted. The quadratic minimization problem (4.11) has first-order optimality conditions which lead to the following linear system:
We remark that from the definition (4.3) of matrix \(D \in {\mathbb R}^{2n \times n}\), it follows that \(D^T D \in {\mathbb R}^{n \times n}\) in (4.12) can be written as \(D_h^T D_h + D_v^TD_v\), with \(D_h, D_v \in {\mathbb R}^{n \times n}\) defined in (4.4)–(4.5). Hence, \(D^T D\) represents a 5-point stencil finite difference discretization of the negative 2D Laplace operator. Since, moreover, \(\beta /\lambda > 0\), the coefficient matrix of the linear system (4.12) is symmetric, positive definite and highly sparse, therefore (4.12) can be solved very efficiently by the iterative (preconditioned) conjugate gradient method.
However, under appropriate assumptions about the solution u near the image boundary, the linear system can be solved even more efficiently by a direct method.
In particular, since we are assuming periodic boundary conditions for u, the matrix \(D^T D\) is block circulant with circulant blocks, so that the coefficient matrix \((I + \frac{\beta }{\lambda } \, D^T D)\) in (4.12) can be completely diagonalized by the 2D discrete Fourier transform. It follows that at any ADMM iteration the u-subproblem, i.e. the linear system (4.12), can be solved explicitly by one forward and one inverse 2D fast Fourier transform (FFT), each at a cost of \(O(n \log n)\) operations.
We remark that the same computational cost can be achieved by using reflective or anti-reflective boundary conditions, see [10, 12, 26].
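A minimal sketch of the FFT-based u-update, assuming periodic boundary conditions and using the standard eigenvalues \(4 - 2\cos (2\pi k_1/n_1) - 2\cos (2\pi k_2/n_2)\) of the 5-point periodic Laplacian \(D^T D\) (the function name and the right-hand-side packaging are our own):

```python
import numpy as np

def solve_u_subproblem(rhs, n1, n2, beta_over_lambda):
    """Solve (I + (beta/lambda) D^T D) u = rhs for a column-major vectorized
    n1 x n2 image, by diagonalizing the circulant operator with the 2D DFT."""
    k1 = np.arange(n1)[:, None]
    k2 = np.arange(n2)[None, :]
    lap_eig = (4.0 - 2.0 * np.cos(2 * np.pi * k1 / n1)
                    - 2.0 * np.cos(2 * np.pi * k2 / n2))  # eigenvalues of D^T D
    denom = 1.0 + beta_over_lambda * lap_eig
    U = np.fft.fft2(rhs.reshape(n1, n2, order="F")) / denom
    return np.real(np.fft.ifft2(U)).ravel(order="F")
```

One forward and one inverse 2D FFT per call, i.e. \(O(n \log n)\) operations, as claimed above.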
4.2 Solving the sub-problem for \(\varvec{t}\)
Given \(u^{(k)}\) and \(\rho ^{(k)}\), and recalling the definition of the augmented Lagrangian functional in (4.6), after some simple algebraic manipulations the minimization sub-problem for t in (4.9) can be rewritten in the following component-wise (or pixel-wise) form:
where the n vectors \(r_i^{(k)} \in {\mathbb R}^{2}\), \(i = 1,\ldots ,n\), which are constant with respect to the optimization variable t, are defined by:
with \(\,\big (D u^{(k)}\big )_i\), \(\big (\rho ^{(k)}\big )_{i} \in {\mathbb R}^2\) denoting the discrete gradient and the associated pair of Lagrange multipliers at pixel i, respectively. The minimization problem in (4.13) is thus equivalent to the following n independent 2-dimensional problems:
where, for future reference, we introduced the cost functions \(\theta _i: {\mathbb R}^2 \rightarrow {\mathbb R}\), \(i = 1,\ldots ,n\).
Since we are imposing that condition (3.10) is satisfied, such that the original functional \(\mathcal {J}(u;\lambda ,T,a)\) in (1.1) is strictly convex, we clearly aim at avoiding non-convexity of the ADMM sub-problems (4.15). In the first part of Proposition 4.1 below, whose proof is provided in the “Appendix”, we give a necessary and sufficient condition for strict convexity of the cost functions in (4.15).
Proposition 4.1
Let \(\,T,a,\beta \,{\in }\; {\mathbb R}_+^*\) and \(r \,{\in }\; {\mathbb R}^2\) be given constants, and let \(\phi (\,\cdot \,;T,a): {\mathbb R}_+ \rightarrow {\mathbb R}\) be the penalty function defined in (2.1). Then:
-
(1)
The function
$$\begin{aligned} \theta (x) :{=} \phi \left( \Vert x \Vert _2;T,a \right) + \frac{\beta }{2} \, \Vert x - r \Vert _2^2, \quad x \in {\mathbb R}^2, \end{aligned}$$(4.16)is strictly convex (convex) if and only if the following condition holds:
$$\begin{aligned} \beta > a \, \; \quad (\beta \,\;{\ge }\;\, a \,). \end{aligned}$$(4.17) -
(2)
In case that (4.17) holds, the strictly convex minimization problem
$$\begin{aligned} \arg \min _{x \in {\mathbb R}^2} \, \theta (x) \end{aligned}$$(4.18)admits the unique solution \(\,x^* {\in }\;\, {\mathbb R}^2\) given by the following shrinkage operator:
$$\begin{aligned} x^* = \xi ^* r,\quad \mathrm {with} \;\; \xi ^* \, {\in } \; (0,1] \end{aligned}$$(4.19)equal to
$$\begin{aligned} \begin{array}{rllcl} \mathrm {a)} \;&{} \xi ^* \,{=}\;\,\, \kappa _1 &{} \,\;\mathrm {if} \;\; \Vert r \Vert _2 &{}{\in }&{} \big [0,\kappa _0\big ) \\ \mathrm {b)} \;&{} \xi ^* \,{=}\;\,\, \kappa _2 - \kappa _3 / \Vert r \Vert _2 &{} \,\;\mathrm {if} \;\; \Vert r \Vert _2 &{}{\in }&{} \big [\kappa _0,T_2\big ) \\ \mathrm {c)} \;&{} \xi ^* \,{=}\;\,\, 1 &{} \,\;\mathrm {if} \;\; \Vert r \Vert _2 &{}{\in }&{} \left[ T_2,+\infty \right) \\ \end{array} \end{aligned}$$(4.20)and with
$$\begin{aligned} \kappa _0 = T + \frac{a}{\beta }\,(T_2-T), \quad \;\, \kappa _1 = \frac{T}{\kappa _0}, \quad \;\, \kappa _2 = \frac{\beta }{\beta -a}, \quad \;\, \kappa _3 = \frac{a T_2}{\beta - a}. \end{aligned}$$(4.21)
Based on (4.16)–(4.17) in Proposition 4.1, we can state that all the problems in (4.15) are strictly convex if and only if
In case that (4.22) is satisfied, the unique solutions of the strictly convex problems (4.15) can be obtained based on (4.19) in Proposition 4.1, that is:
where the shrinkage coefficients \(\,\xi _i^{(k)} {\in }\, (0 , 1]\) are computed according to (4.20). We notice that coefficients \(\kappa _0,\kappa _1,\kappa _2,\kappa _3\) in (4.21) are constant during the ADMM iterations, hence they can be precomputed once and for all at the beginning. The solutions of problems (4.15) can thus be determined very efficiently by the closed forms given in (4.20), with computational cost O(n).
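The shrinkage operator (4.19)–(4.21) translates directly into code; the value \(T_2 = T + 1/a^2\) is the choice fixed in Sect. 2, and the function name is ours:

```python
import numpy as np

def shrink(r, T, a, beta):
    """Unique minimizer x* = xi* r of theta in (4.16), via the closed
    forms (4.20)-(4.21); strict convexity requires beta > a (4.17)."""
    assert beta > a
    T2 = T + 1.0 / a**2
    k0 = T + (a / beta) * (T2 - T)
    k1 = T / k0
    k2 = beta / (beta - a)
    k3 = a * T2 / (beta - a)
    nr = np.linalg.norm(r)
    if nr < k0:                 # case a): constant shrinkage factor
        xi = k1
    elif nr < T2:               # case b): affine-in-1/||r|| factor
        xi = k2 - k3 / nr
    else:                       # case c): no shrinkage, edge preserved
        xi = 1.0
    return xi * np.asarray(r, dtype=float)
```

Note that \(\xi ^*\) is continuous across the two breakpoints \(\kappa _0\) and \(T_2\), and gradients with \(\Vert r \Vert _2 \ge T_2\) pass through unchanged, which is how salient boundaries are preserved.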
4.3 ADMM-based minimization algorithm
To summarize previous results, in Algorithm 1 we report the main steps of the proposed ADMM-based iterative scheme used to solve the saddle-point problem (4.6)–(4.7) or, equivalently (as it will be proven in the next section), the minimization problem (1.1). We remark that the constraint on the ADMM penalty parameter \(\beta \) in Algorithm 1 is more stringent than that in (4.22) which guarantees the convexity of the ADMM subproblem for the primal variable t. Indeed, the more stringent requirement is needed for the analysis of convergence that will be carried out in Sect. 5.
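The overall scheme can be sketched compactly as follows. This is our reading of Algorithm 1, not a verbatim transcription: the multiplier sign conventions, the initialization \(t^{(0)} = Du^{(0)}\) with \(u^{(0)} = b\), and the fixed iteration count are our assumptions, and \(\beta \) may need to be larger than the bare convexity bound \(\beta > a\) to meet the stricter convergence condition (5.16):

```python
import numpy as np

def cnc_admm(b, lam, T, a, beta, n_iter=200):
    """Sketch of the ADMM scheme (4.8)-(4.10) for model (1.1), periodic BCs."""
    n1, n2 = b.shape
    # shrinkage constants (4.21), precomputed once for all iterations
    T2 = T + 1.0 / a**2
    k0 = T + (a / beta) * (T2 - T)
    k1, k2, k3 = T / k0, beta / (beta - a), a * T2 / (beta - a)
    # eigenvalues of D^T D (5-point periodic Laplacian) for the FFT u-update
    w = (2 - 2 * np.cos(2 * np.pi * np.arange(n1) / n1))[:, None] \
      + (2 - 2 * np.cos(2 * np.pi * np.arange(n2) / n2))[None, :]
    denom = 1.0 + (beta / lam) * w
    # forward differences D and their adjoint D^T (sign conventions are ours)
    D = lambda u: np.stack([np.roll(u, -1, 0) - u, np.roll(u, -1, 1) - u])
    Dt = lambda t: (np.roll(t[0], 1, 0) - t[0]) + (np.roll(t[1], 1, 1) - t[1])

    t, rho = D(b), np.zeros((2, n1, n2))
    for _ in range(n_iter):
        # u-update: solve (I + (beta/lam) D^T D) u = b + D^T(beta t - rho)/lam
        u = np.real(np.fft.ifft2(np.fft.fft2(b + Dt(beta * t - rho) / lam) / denom))
        # t-update: pixel-wise shrinkage (4.19)-(4.21) of r = Du + rho/beta
        r = D(u) + rho / beta
        nr = np.sqrt(r[0] ** 2 + r[1] ** 2)
        xi = np.where(nr < k0, k1,
             np.where(nr < T2, k2 - k3 / np.maximum(nr, 1e-12), 1.0))
        t = xi * r
        # multiplier update
        rho = rho - beta * (t - D(u))
    return u
```

The per-iteration cost is dominated by the two FFTs, i.e. \(O(n \log n)\); the shrinkage step is O(n).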
5 Convergence analysis
In this section, we analyze convergence of the proposed ADMM-based minimization approach, whose main computational steps are reported in Algorithm 1. In particular, we prove convergence of Algorithm 1 in case that conditions (3.10) and (5.16) are satisfied.
To simplify the notations in the subsequent discussion, we give the following definitions concerning the objective functional in the (u, t)-split problem (4.1)–(4.2):
where the parametric dependencies of the fidelity term F(u) on \(\lambda \) and of the regularization term R(t) on T and a are dropped for brevity. The augmented Lagrangian functional in (4.6) can thus be rewritten more concisely as
and the regularization term in the original proposed model (1.1), referred to as \(\mathcal {R}(u)\), reads
To prove convergence, we follow the same methodology used, e.g., in [35], for convex optimization problems, based on optimality conditions of the augmented Lagrangian functional with respect to the pair of primal variables (u, t) and on the subsequent construction of suitable Fejér-monotone sequences. However, in [35] and other works using the same abstract framework for proving convergence, such as [36] where convex non-quadratic fidelity terms are considered, the regularization term of the convex objective functional is also convex (Total Variation semi-norm in [35, 36]). In contrast, in our CNC model (1.1) the total objective functional \(\mathcal {J}\) is convex but the regularizer is non-convex. This calls for a suitable adaptation of the above mentioned proofs, in particular of the proof in [35], where the same \(\ell _2\) fidelity term as in our model (1.1) is considered.
Our proof will be articulated into the following parts:
-
1.
derivation of optimality conditions for problem (1.1);
-
2.
derivation of convexity conditions for the augmented Lagrangian in (5.2);
-
3.
demonstration of equivalence (in terms of solutions) between the split problem (4.1)–(4.2) and the saddle-point problem (4.6)–(4.7);
-
4.
demonstration of convergence of Algorithm 1 to a solution of the saddle-point problem (4.6)–(4.7), hence to the unique solution of (1.1).
5.1 Optimality conditions for problem (1.1)
Since the regularization term in our model (1.1) is non-smooth non-convex, unlike in [35] we need to resort also to concepts from calculus for non-smooth non-convex functions, namely the generalized (or Clarke) gradients [11]. In the following we will denote by \(\partial _x [\, f \,](x^*)\) and by \(\bar{\partial }_x [\, f \,](x^*)\) the subdifferential (in the sense of convex analysis [29, 30]) and the Clarke generalized gradient [11], respectively, with respect to x of the function f calculated at \(x^*\).
In Lemma 5.1 below we give some results on the local Lipschitz continuity of the functions involved in the subsequent demonstrations, which is necessary for the generalized gradients to be defined. Then, in Proposition 5.2, whose proof is given in [20], we give the first-order optimality conditions for problem (1.1).
Lemma 5.1
For any pair of parameters \((\lambda ,a)\) satisfying condition (3.10), the functional \(\mathcal {J}\) in (1.1) and, separately, the regularization term \(\mathcal {R}\) in (5.3) and the quadratic fidelity term, are locally Lipschitz continuous functions.
Proof
The proof is immediate by considering that the quadratic fidelity term and, under condition (3.10), the total functional \(\mathcal {J}\) in (1.1), are both convex functions, hence locally Lipschitz, and that the regularization term \(\mathcal {R}\) in (5.3) is the composition of locally Lipschitz functions (note that the penalty function \(\phi (\,\cdot \,;T,a)\) defined in (2.1) is globally L-Lipschitz continuous with \(L=a(T_2-T)\)), hence locally Lipschitz. \(\square \)
Proposition 5.2
For any pair of parameters \((\lambda ,a)\) satisfying condition (3.10), the functional \(\mathcal {J}: {\mathbb R}^n \rightarrow {\mathbb R}\) in (1.1) has a unique (global) minimizer \(u^*\) which satisfies
where 0 denotes the null vector in \({\mathbb R}^n\) and \(\partial _u \big [\,\mathcal {J}\,\big ](u^*) \subset {\mathbb R}^n\) represents the subdifferential (with respect to u, calculated at \(u^*\)) of functional \(\mathcal {J}\). Moreover, it follows that
where \(\,\bar{\partial }_{t}\big [\,R\,\big ](Du^*) \subset {\mathbb R}^{2n}\) denotes the Clarke generalized gradient (with respect to t, calculated at \(Du^*\)) of the non-convex non-smooth regularization function R defined in (5.1).
5.2 Convexity conditions for the augmented Lagrangian in (5.2)
Subsequent parts 3. and 4. of our convergence analysis require that the augmented Lagrangian functional in (5.2) is jointly convex with respect to the pair of primal variables (u, t). In [35, 36], where the regularization term is convex, such property is clearly satisfied for any positive value of the penalty parameter \(\beta \). In our case, where the regularization term is non-convex, this is not trivially true and needs some investigation, which is the subject of Lemma 5.3 and Proposition 5.4 below.
Lemma 5.3
Let \(F: {\mathbb R}^n \rightarrow {\mathbb R}\) be the fidelity function defined in (5.1), \(D \in {\mathbb R}^{2n \times n}\) the finite difference matrix defined in (4.3)–(4.5), \(b \in {\mathbb R}^n\), \(v \in {\mathbb R}^{2n}\) given constant vectors and \(\lambda \in {\mathbb R}_+^*\), \(\gamma \in {\mathbb R}\) free parameters. Then, the function \(Z: {\mathbb R}^n \rightarrow {\mathbb R}\) defined by
is convex in u if and only if
Moreover, if the parameter \(\lambda \) satisfies condition (3.10), (5.7) becomes
Proof
The function Z in (5.6) is quadratic in u, hence it is convex in u if and only if its Hessian \(H_Z\,{\in }\; {\mathbb R}^{n \times n}\) is at least positive semidefinite, that is if and only if
where, for a given real symmetric matrix M, the notation \(M \succeq 0\) indicates that M is positive semidefinite. Since the matrix \(\,D^T D \in {\mathbb R}^{n \times n}\) is symmetric and positive semidefinite, it admits the eigenvalue decomposition
with \(\,e_i \,{\in }\; {\mathbb R}_+, \, i=1,\ldots ,n,\,\) indicating the real non-negative eigenvalues of \(D^TD\). Replacing (5.10) into (5.9), we obtain:
Condition (5.11) is equivalent to
As previously highlighted, the matrix \(D^TD \in {\mathbb R}^{n \times n}\) represents a 5-point stencil unscaled finite difference approximation of the negative Laplace operator applied to an \(n_1 \times \, n_2\) vectorized image, with \(n = n_1 n_2\). In order to obtain an upper bound for the maximum eigenvalue \(\max _{i} e_i\) of \(D^T D\), first we recall definitions (4.3)–(4.4) and rewrite \(D^T D\) as
where (5.13) follows from properties of the Kronecker product operator and \(\oplus \) in (5.14) denotes the Kronecker sum operator. It thus follows from (5.14) and from the properties of the eigenvalues of the Kronecker sum of two matrices that the maximum eigenvalue of \(D^T D\) is equal to the sum of the maximum eigenvalues of \(\,L_{n_1}^T L_{n_1} \in {\mathbb R}^{n_1 \times n_1}\) and \(L_{n_2}^T L_{n_2} \in {\mathbb R}^{n_2 \times n_2}\), with the prototypical matrix \(L_z \in {\mathbb R}^{z \times z}\) defined in (4.5). From (4.5), we have that matrix \(L_z^T L_z \in {\mathbb R}^{z \times z}\) takes the form
By using Gershgorin’s theorem [39], we can state that all the real eigenvalues of the matrix \(L_z^T L_z\) in (5.15) belong to the interval [0, 4] independently of the order z. It follows that the eigenvalues of both matrices \(\,L_{n_1}^T L_{n_1}\) and \(\,L_{n_2}^T L_{n_2}\) are bounded from above by 4 and, hence, the maximum eigenvalue \(\max _{i} e_i\) of \(D^T D\) is bounded from above by 8. Substituting 8 for \(\max _{i} e_i\) in (5.12), condition (5.7) and condition (5.8) follow. \(\square \)
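The eigenvalue bounds above can be checked numerically. The sketch below assumes \(L_z\) is the standard forward-difference matrix (one plausible reading of (4.5), labeled as an assumption in the code) and verifies that the eigenvalues of \(L_z^T L_z\) lie in [0, 4] and that the maximum eigenvalue of the Kronecker-sum operator \(D^T D\) is bounded by 8:

```python
import numpy as np

def forward_diff(z):
    """z x z one-dimensional forward-difference matrix: an assumed
    form of the prototypical matrix L_z in (4.5)."""
    return np.eye(z, k=1) - np.eye(z)

# Gershgorin bound: eigenvalues of L_z^T L_z lie in [0, 4] for any z.
z = 50
L = forward_diff(z)
eigs = np.linalg.eigvalsh(L.T @ L)
assert eigs.min() >= -1e-8 and eigs.max() <= 4.0 + 1e-8

# 2D operator: D^T D is the Kronecker sum of the two 1D operators,
# so its maximum eigenvalue is at most 4 + 4 = 8.
n1, n2 = 12, 9
A1 = forward_diff(n1).T @ forward_diff(n1)
A2 = forward_diff(n2).T @ forward_diff(n2)
DtD = np.kron(np.eye(n2), A1) + np.kron(A2, np.eye(n1))
assert np.linalg.eigvalsh(DtD).max() <= 8.0 + 1e-8
```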
Proposition 5.4
For any given vector of Lagrange multipliers \(\rho \in {\mathbb R}^{2n}\), the augmented Lagrangian functional \(\mathcal {L}(u,t;\rho )\) in (5.2) is proper, continuous and coercive jointly in the pair of primal variables (u, t). Moreover, in case that condition (3.10) is satisfied, \(\mathcal {L}(u,t;\rho )\) is jointly convex in (u, t) if the penalty parameter \(\beta \) satisfies
or, equivalently
Proof
Functional \(\mathcal {L}(u,t;\rho )\) in (5.2) is clearly proper and continuous in (u, t). For what concerns coercivity, we rewrite \(\mathcal {L}(u,t;\rho )\) as follows:
We notice that the second term in (5.19) is bounded and the last term does not depend on (u, t), hence they do not affect coercivity of \(\mathcal {L}(u,t;\rho )\) with respect to (u, t). Then, the first and third terms in (5.19) are jointly coercive in u and the third term is coercive in t, hence \(\mathcal {L}(u,t;\rho )\) is coercive in (u, t).
Starting from (5.18), we have:
where \(\gamma _1\) and \(\gamma _2\) are scalars satisfying
such that, according to condition (5.8) in Lemma 5.3 and condition (4.17) in Proposition 4.1, the terms \(\mathcal {L}_2(u)\) and \(\mathcal {L}_3(t)\) in (5.20) are convex in u and t (hence also in (u, t)), respectively. The term \(\mathcal {L}_1(u,t)\) is affine in (u, t), hence convex. For what concerns \(\mathcal {L}_4(u,t)\), it is jointly convex in (u, t) if it can be reduced to the form \(\big \Vert \, c_1 Du - c_2 t \big \Vert _2^2\) with \(c_1,c_2 > 0\). We thus impose that the coefficients of the terms \(\Vert Du\Vert _2^2\) and \(\Vert t\Vert _2^2\) in (5.20) are positive and that twice the product of the square roots of these coefficients equals the coefficient of the term \(-\langle Du , t \rangle \) in (5.20), that is:
Simple calculations prove that conditions in (5.21)–(5.22) are equivalent to the following:
We notice that the set \(\Gamma \) in (5.23) is not empty and contains only strictly positive pairs \((\gamma _1,\gamma _2)\). Hence, there exist triplets \((\beta ,\gamma _1,\gamma _2)\) such that (5.23) is satisfied and the augmented Lagrangian functional in (5.20) is convex. By simple analysis, we can see that the range of the function \(\beta (\gamma _1,\gamma _2) = \gamma _1 \gamma _2 / (\gamma _1 - \gamma _2)\) with \((\gamma _1,\gamma _2) \in \Gamma \) coincides with (5.17) and, after recalling that \(\lambda = 9 \tau _c a\), with (5.16). This completes the proof. \(\square \)
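The completing-the-square argument for \(\mathcal {L}_4\) can be illustrated with scalar stand-ins: a quadratic \(\alpha x^2 - \delta x y + \gamma y^2\) is jointly convex exactly when \(\alpha , \gamma > 0\) and \(\delta \le 2\sqrt{\alpha \gamma }\), the boundary case being a perfect square. A minimal numeric check (illustrative only, not part of the proof):

```python
import numpy as np

def hessian_min_eig(alpha, delta, gamma):
    """Smallest eigenvalue of the Hessian of
    q(x, y) = alpha*x^2 - delta*x*y + gamma*y^2, with x and y
    scalar stand-ins for ||Du||_2, ||t||_2 and the cross term."""
    H = np.array([[2 * alpha, -delta], [-delta, 2 * gamma]])
    return np.linalg.eigvalsh(H).min()

alpha, gamma = 1.5, 0.6
delta = 2 * np.sqrt(alpha * gamma)  # "twice the product of the square roots"
assert hessian_min_eig(alpha, delta, gamma) >= -1e-9   # perfect square: PSD
assert hessian_min_eig(alpha, 2 * delta, gamma) < 0    # larger cross term: indefinite
```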
5.3 Equivalence between problems (4.1)–(4.2) and (4.6)–(4.7)
The optimality conditions derived in Sect. 5.1, together with the convexity conditions in Sect. 5.2 and the results in the following Lemmas 5.5–5.6 allow us to demonstrate in Theorem 5.7 that the saddle-point problem in (4.6)–(4.7) is equivalent (in terms of solutions) to the minimization problem in (4.1)–(4.2). In particular, we prove that, for any pair \((\lambda ,a)\) satisfying (3.10) and any \(\beta \) satisfying (5.17), the saddle-point problem (4.6)–(4.7) has at least one solution and, very importantly, all its solutions will provide pairs of primal variables \((u^*,t^*)\) with \(t^*=Du^*\), which solve the split problem (4.1)–(4.2), and thus \(u^*\) represents the unique global minimizer of the strongly convex functional \(\mathcal {J}\) in (1.1).
Lemma 5.5
Assume that \(Z = Q + S\), with Q and S lower semi-continuous convex functions from \(\,{\mathbb R}^n\) into \(\,{\mathbb R}\), and \(\,S\) Gâteaux-differentiable with differential \(S^{\prime }\). Then, for \(p^* \in {\mathbb R}^n\), the following two conditions are equivalent:
(1) \(\;p^* \;{\in }\; \arg \inf _{p \in {\mathbb R}^n} Z(p)\,\);
(2) \(\;Q(p)-Q(p^*) + \left\langle \, S^{\prime } (p^*) , p-p^* \, \right\rangle \ge 0 \quad \; \forall p \in {\mathbb R}^n\).
Moreover, in case that the function Q has a (separable) structure of the type \(Q(p) = Q(p_1,p_2) = Q_1(p_1) + Q_2(p_2)\) with \(p_1 \in {\mathbb R}^{n_1}\) and \(p_2\in {\mathbb R}^{n_2}\) (and \(n_1 + n_2 = n\)) being two disjoint subsets of independent variables, then conditions (1) and (2) above are also both equivalent to the following condition:
(3) \(\;\left\{ \begin{array}{lcl} p_1^* &{}{\in }&{} \arg \inf _{p_1 \in {\mathbb R}^{n_1}} \big \{ Z_1(p_1) :{=}\, Z(p_1,p_2^*) \, \big \} \\ p_2^* &{}{\in }&{} \arg \inf _{p_2 \in {\mathbb R}^{n_2}} \big \{ Z_2(p_2) :{=} \, Z(p_1^*,p_2) \, \big \} \end{array} \right. \;\;\,\mathrm {with}\;\; (p_1^*,p_2^*) = p^* .\)
Proof
Equivalence of conditions (1) and (2) is demonstrated in [14, Proposition 2.2]. If condition (1) is true, that is \(p^*\) is a global minimizer of the function Z over its domain \({\mathbb R}^n\), then clearly \(p^*\) is also a global minimizer of the restriction of Z to any subset of \({\mathbb R}^n\) containing \(p^*\). Hence, condition (1) implies condition (3). To demonstrate that condition (3) implies condition (2), which completes the proof, first we rewrite in explicit form the two objective functions \(Z_1\) and \(Z_2\) defined in condition (3):
Assume now that condition (3) holds. Then, by applying the equivalence of conditions (1) and (2), condition (3) can be rewritten as follows:
By summing up the two inequalities in (5.26), we obtain:
which coincides with condition (2), thus completing the proof. \(\square \)
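The implication (1) ⇒ (3) proved above can be observed numerically on a small instance with the separable structure of Lemma 5.5: a grid minimizer of a convex \(Z = Q_1 + Q_2 + S\) is also a minimizer of each one-variable restriction. The functions below are illustrative choices (a grid search stands in for exact minimization), not quantities from the paper:

```python
import numpy as np

# Z = Q1(p1) + Q2(p2) + S(p1, p2): separable non-smooth convex part
# plus a smooth convex coupling, as in condition (3) of Lemma 5.5.
Q1 = lambda p1: np.abs(p1)
Q2 = lambda p2: np.abs(p2)
S = lambda p1, p2: 0.5 * (p1 - 1) ** 2 + 0.5 * (p2 - 2) ** 2 + 0.5 * p1 * p2

grid = np.linspace(-3, 3, 601)
P1, P2 = np.meshgrid(grid, grid, indexing="ij")
Z = Q1(P1) + Q2(P2) + S(P1, P2)

# Joint (grid) minimizer, condition (1).
i, j = np.unravel_index(np.argmin(Z), Z.shape)
p1_star, p2_star = grid[i], grid[j]

# Condition (3): the joint minimizer also minimizes each restriction.
assert np.argmin(Q1(grid) + Q2(p2_star) + S(grid, p2_star)) == i
assert np.argmin(Q1(p1_star) + Q2(grid) + S(p1_star, grid)) == j
```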
Lemma 5.6
For any pair of parameters \((\lambda ,a)\) satisfying condition (3.10), any penalty parameter \(\,\beta \) fulfilling condition (5.17) and any vector of Lagrange multipliers \(\rho \), the augmented Lagrangian functional \(\,\mathcal {L}\) in (5.2) satisfies the following:
Proof
According to the results in Proposition 5.4, in case the parameters \((\lambda ,a)\) and \(\,\beta \) satisfy conditions (3.10) and (5.17), respectively, the augmented Lagrangian \(\mathcal {L}(u,t;\rho )\) in (5.2) is proper, continuous, coercive and convex jointly in the pair of primal variables (u, t). In particular, within the proof of Proposition 5.4 we highlighted how, for any \(\beta \) satisfying (5.17), there exists at least one pair of scalar coefficients \((\gamma _1,\gamma _2)\) such that the augmented Lagrangian functional can be written in the form
with functions \(\mathcal {L}_1,\mathcal {L}_2,\mathcal {L}_3,\mathcal {L}_4\) being convex in (u, t). In order to apply Lemma 5.5, we rewrite (5.29) in the following equivalent form:
From the convexity of the functions \(\mathcal {L}_2\) and \(\mathcal {L}_3\) in (5.29), it clearly follows that \(Q_1\) and \(Q_2\), and hence Q, in (5.30) are convex as well. Moreover, the function S in (5.30) is clearly convex and Gâteaux-differentiable. Hence, by applying the equivalence of conditions (1) and (3) in Lemma 5.5, statement (5.28) follows and the proof is completed. \(\square \)
Theorem 5.7
For any pair of parameters \((\lambda ,a)\) satisfying condition (3.10) and any parameter \(\,\beta \) fulfilling condition (5.17), the saddle-point problem (4.6)–(4.7) admits at least one solution and all the solutions have the form \((u^*,Du^*;\rho ^*)\), with \(u^*\) denoting the unique global minimizer of functional \(\,\mathcal {J}\) in (1.1).
The proof is postponed to the “Appendix”.
5.4 Convergence of Algorithm 1 towards a solution of (4.6)–(4.7)
Given the existence and the good properties of the saddle points of the augmented Lagrangian functional in (5.2), highlighted in Theorem 5.7, it remains to demonstrate that the ADMM iterative scheme outlined in Algorithm 1 converges towards one of these saddle points, that is towards a solution of the saddle-point problem (4.6)–(4.7). This is the goal of Theorem 5.8 below.
Theorem 5.8
Assume that \(\,(u^*,t^*;\rho ^*)\) is a solution of the saddle-point problem (4.6)–(4.7). Then, for any pair of parameters \((\lambda ,a)\) satisfying condition (3.10) and any parameter \(\beta \) fulfilling condition
the sequence \(\,\big \{(u^{(k)},t^{(k)};\rho ^{(k)})\big \}_{k=1}^{+\infty }\,\) generated by Algorithm 1 satisfies:
The proof is postponed to the “Appendix”.
We conclude this analysis with the following final theorem, whose proof is immediate given Theorems 5.7 and 5.8 above.
Theorem 5.9
Let the pair of parameters \((\lambda ,a)\) satisfy condition (3.10) and let the parameter \(\beta \) satisfy condition (5.31). Then, for any initial guess, the sequence \(\,\big \{u^{(k)}\}_{k=1}^{+\infty }\,\) generated by Algorithm 1 satisfies:
with \(u^*\) denoting the unique solution of problem (1.1).
6 Experimental results
In this section, we investigate experimentally the performance of the proposed CNC segmentation approach, named CNCS, on 2D synthetic and real images. In particular, in the first two examples we compare CNCS with the most relevant alternative approach, namely the two-step convex segmentation method presented in [5]. Then, in the third example we extend the comparison to some other recent state-of-the-art multiphase segmentation approaches. In the visual results of all the presented experiments, the boundaries of the segmented regions are superimposed on the original images.
We recall that the proposed CNCS method consists of two steps. In the first step an approximate solution of the minimization problem (1.1) is computed by means of the ADMM-based iterative procedure illustrated in Algorithm 1; iterations are stopped as soon as two successive iterates satisfy \(\Vert u^{(k)} - u^{(k-1)} \Vert _2 / \Vert u^{(k-1)} \Vert _2 < 10^{-4}\). In the second step the segmented regions are obtained by means of a thresholding procedure.
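Algorithm 1 itself is not reproduced here; the following is a minimal sketch of a generic ADMM loop with the same overall structure (a linear u-update, a proximal t-update, a multiplier update, and the relative-change stopping rule quoted above), applied for concreteness to a convex 1D TV denoising surrogate. The soft-threshold prox is a convex stand-in for the paper's proximity operator of \(\phi \), which is different:

```python
import numpy as np

def soft(x, tau):
    """Soft-thresholding: prox of tau*||.||_1, a convex stand-in for
    the paper's proximity operator of the penalty phi."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def admm_denoise_1d(b, lam=0.5, beta=2.0, tol=1e-4, max_iter=500):
    """ADMM for min_u 0.5*||u - b||^2 + lam*||Du||_1 via the split
    t = Du (same loop structure as Algorithm 1, different prox)."""
    n = len(b)
    D = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]   # forward differences
    A = np.eye(n) + beta * D.T @ D             # u-update system matrix
    u, t, rho = b.copy(), D @ b, np.zeros(n - 1)
    for _ in range(max_iter):
        u_prev = u
        u = np.linalg.solve(A, b - D.T @ rho + beta * D.T @ t)  # u-update
        t = soft(D @ u + rho / beta, lam / beta)                # t-update
        rho = rho + beta * (D @ u - t)                          # multiplier update
        if np.linalg.norm(u - u_prev) < tol * np.linalg.norm(u_prev):
            break  # stopping rule used in the experiments
    return u

b = np.concatenate([np.zeros(30), np.ones(30)]) + 0.05  # step signal
u = admm_denoise_1d(b)
```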
For what concerns the parameters of the CNCS approach, first we used in all the experiments a fixed value \(\tau _c = 1.01\) of the convexity coefficient in (3.10), so as to always use an almost maximally non-convex regularizer with the constraint that the total functional is convex. Then, the regularization parameter \(\lambda \) has been hand-tuned in each experiment—as usually done in most of the variational methods in image processing—so as to produce the best possible results. We remark that, once \(\tau _c\) and \(\lambda \) have been set, the parameter a and, then, the lower bound \(\,\bar{\beta }\,\) of the ADMM penalty parameter \(\beta \) are automatically obtained based on (3.10) and (5.31), respectively. Moreover, we noticed from the experiments that using \(\beta = 1.1 \,{\cdot }\, \bar{\beta }\) always yields a satisfactory convergence behavior of the ADMM-based algorithm.
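The parameter relations described above can be sketched as a small helper: a is recovered from \(\lambda = 9\,\tau _c\, a\) (the relation recalled in the proof of Proposition 5.4), while the lower bound \(\bar{\beta }\) from (5.31) is assumed to be supplied by the caller, since its expression is not reproduced here:

```python
def cncs_parameters(lam, tau_c=1.01, beta_bar=None):
    """Derive a from lambda = 9 * tau_c * a, and the ADMM penalty
    beta = 1.1 * beta_bar; the lower bound beta_bar from (5.31) is
    assumed given (its formula is not reproduced in this sketch)."""
    a = lam / (9.0 * tau_c)
    beta = 1.1 * beta_bar if beta_bar is not None else None
    return a, beta

# Hypothetical values purely for illustration.
a, beta = cncs_parameters(lam=0.909, tau_c=1.01, beta_bar=10.0)
```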
For what finally concerns the model parameter T, it has been devised precisely to let the user control the salient parts to be segmented. More precisely, as already pointed out, T should be selected as the maximum expected gradient magnitude in the inner segmented parts. However, users who are not experts in the field can easily be assisted by automatic procedures based on image gradient estimation, such as those provided in MATLAB and in public image processing software packages like GIMP.
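An automatic choice of T along these lines could, for instance, take a high percentile of the discrete gradient magnitudes, so that only the strongest transitions are treated as salient boundaries. This is an assumed illustrative heuristic, not the procedure of any specific package:

```python
import numpy as np

def estimate_T(img, percentile=90.0):
    """Illustrative automatic gradient-magnitude threshold (an assumed
    heuristic): a high percentile of the discrete gradient magnitudes,
    so only the strongest transitions count as boundaries."""
    gx = np.diff(img, axis=1, append=img[:, -1:])  # horizontal differences
    gy = np.diff(img, axis=0, append=img[-1:, :])  # vertical differences
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return float(np.percentile(mag, percentile))

img = np.zeros((32, 32))
img[:, 16:] = 1.0                      # one vertical edge of height 1
T = estimate_T(img, percentile=99.0)   # picks up the edge magnitude
```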
Example 1
In this example we demonstrate the performance of the proposed CNCS approach in comparison with the convex two-phase model proposed by Cai et al. in [5], referred to as CCZ in the following. For all the experiments, the two parameters \(\lambda \) and \(\eta \) of the CCZ approach have been hand-tuned so as to produce the best possible results. In order to highlight the benefits of using a non-convex penalty function, we applied both methods to the segmentation of a noisy synthetic image, named geometric, containing four basic shapes, with three different gray intensities, on a white background.
In the first row of Fig. 2, from left to right, we show the given image corrupted by additive Gaussian noise with noise level 0.02, the approximate solution \(u^*\) obtained by Algorithm 1 (CNCS) in 7 iterations and by the convex two-phase algorithm (CCZ) in 31 iterations, and the difference between the \(u^*\)'s obtained by the CNCS and CCZ algorithms. For the second thresholding step in both algorithms the thresholds were fixed to \(\rho _1= 0.85\), \(\rho _2= 0.90\) and \(\rho _3= 0.99\).
The four different regions segmented by the CCZ and CNCS methods are shown in the second and third rows of Fig. 2, respectively. The boundaries of the segmented regions are shown in red, superimposed on the given images. CCZ fails to detect region 3 (light gray shapes) and it smooths out the boundaries. This behavior is explained by the fact that the well-known TV regularizer used in [5] is defined via the \(\ell _1\)-norm, which inevitably curtails originally salient boundaries in order to penalize their magnitudes. In particular, as discussed in [34], the TV of a feature is directly proportional to its boundary size, so that one way of minimizing the TV of that feature is to reduce its boundary size, in particular by smoothing corners. Moreover, the change in intensity due to TV regularization is inversely proportional to the scale of the feature, so that very small-scale features are removed, thus causing failures in the segmentation procedure.
By exploiting the penalty term \(\phi \) introduced in our model, the pixels whose gradient is above the threshold T are identified by the function pieces \(\phi _2\) and \(\phi _3\) defined in (2.1) that represent a very good approximation of the perimeter function introduced in the Mumford–Shah original model. Therefore our model well preserves the sharp boundary shapes as illustrated in Fig. 2 (third row).
In Fig. 3 we highlight the effects of the parameter T on the segmentations obtained by CNCS. In particular, we depict in blue the pixels treated by the functions \(\phi _2\) and \(\phi _3\) for different values of the parameter T: \(T=0.1\) (first column), \(T=0.05\) (second column), and \(T=8 \times 10^{-4}\) (third column). The largest T allows one to clearly detect only the boundary of the darkest rectangle; with the intermediate T value we can distinguish the two darker gray rectangles, Fig. 3 (middle), and with the smallest T we can easily detect all four simple shapes.
Example 2
In this example we demonstrate the benefit of the penalty function introduced in the proposed model, which allows for segmentations that reproduce sharp boundaries and detect inhomogeneous regions.
Figure 4a shows the image rectangles of dimension \(400 \times 400\), which we want to segment into \(K=2\) parts: vertical rectangles and background, where the rectangles are characterized by intensity inhomogeneity. The segmentation results obtained by the proposed method are shown in Fig. 4b using yellow-colored boundaries. The penalty function used in our model allows for preserving the sharp features of the boundaries (e.g., corners) even in the darker regions where the intensity gradient characterizing the boundary is very small.
This is a simple example belonging to the class of piecewise smooth segmentation problems for which the CCZ model in [5] has been devised. The segmentation results obtained by CCZ are shown in Fig. 4c, d using, in the second phase, a manually tuned optimal threshold value \(\rho = 0.19\) and a K-means algorithm, respectively. The former better reproduces the segmented boundaries; however, both results present rounded corners. This effect is again motivated by the approximation of the perimeter with the \(\ell _1\)-norm regularization term in the CCZ model.
A set of state-of-the-art two-phase segmentation methods, including the unsupervised model by Sandberg et al. [31], the max-flow model by Yuan et al. [38], the Chan–Vese model [7] and the frame-based variational framework by Dong et al. [13], are evaluated and compared with our model. The codes we used are provided by the authors, and the parameters in the codes were chosen by trial and error to produce the best results of each method. In Fig. 4e–h the resulting segmentations are superimposed on the original image using yellow-colored boundaries. We can observe that all these methods fail in segmenting the darker upper part of the two rectangles in the synthetic image, and the method CCZ (Fig. 4d) fails only when the automatic K-means post-processing is applied, while our method achieves extremely accurate results.
Example 3
In this example, we test the CNCS approach on real, more complex, images and compare it with some state-of-the-art alternative approaches for multi-region image segmentation. Figure 5 shows segmentation results obtained on the real image brain, an MRI brain scan of dimension \(583 \times 630\). The multiphase segmentation is applied to detect the three regions (\(K=3\)) characterized by white, gray and black pixel values. For multiphase segmentation, the thresholds \(\rho _i\) used in the second stage can be chosen according to the ones obtained by K-means, or can be tuned, in both cases without recomputing the solution of the first, more expensive, minimization problem, in order to achieve specific segmentation results. In particular, for this example we used the thresholds \(\rho _1= 0.18\) and \(\rho _2=0.56\) for both the CNCS and CCZ methods.
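The second-stage thresholding described above is a cheap pixel-wise operation, which is why the thresholds can be re-tuned without recomputing \(u^*\). A minimal sketch, using the \(\rho _1 = 0.18\), \(\rho _2 = 0.56\) values of this example (the tiny array below is hypothetical input, not data from the paper):

```python
import numpy as np

def threshold_phases(u, thresholds):
    """Second-stage segmentation: partition the smoothed image u into
    K = len(thresholds) + 1 phases by thresholds rho_1 < ... < rho_{K-1}."""
    labels = np.zeros(np.shape(u), dtype=int)
    for k, rho in enumerate(sorted(thresholds), start=1):
        labels[np.asarray(u) >= rho] = k
    return labels

# Phases: u < 0.18 -> 0, 0.18 <= u < 0.56 -> 1, u >= 0.56 -> 2.
u = np.array([[0.05, 0.30], [0.60, 0.95]])
labels = threshold_phases(u, thresholds=[0.18, 0.56])
```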
By comparing the images in Fig. 5b, c, we can observe that more detailed and complete small features are detected by the CNCS model, which nevertheless gives results comparable to the CCZ model. Other comparisons are illustrated in the second row of Fig. 5. We notice that the results obtained by the methods proposed by Sandberg et al. [31] (Fig. 5d), Yuan et al. [38] (Fig. 5e), and Chan–Vese [8] (Fig. 5f) are less satisfactory than those achieved by CNCS and CCZ.
Another challenging segmentation problem is presented in Fig. 6: the multiphase segmentation of the anti-mass image with dimension \(480 \times 384\) pixels into \(K=3\) regions (see Fig. 6a) identified as light gray, dark gray and black. Also in this case, we compared the proposed CNCS method with alternative state-of-the-art segmentation methods. In particular, in the first row of Fig. 6, the segmentation results obtained by the Sandberg et al. [31] (Fig. 6b), the Yuan et al. [38] (Fig. 6c), and the Chan–Vese [8] models (Fig. 6d) are reported.
Figure 6e–g show the regions segmented by CNCS method, and in Fig. 6h the three detected regions are shown by visualizing the three gray levels. We can appreciate how our method can reveal different meaningful high-level structures in the image which are not captured by the other over-detailed segmentations.
7 Conclusions
We presented a CNC variational model for multiphase image segmentation. The minimized energy functional consists of a standard strictly convex quadratic fidelity term and a new non-convex regularization term designed to simultaneously penalize the non-smoothness of the segmented inner regions and the length of the boundaries. A sufficient condition for strict convexity of the functional has been derived. This result allows us to benefit from the advantages of using such a non-convex regularizer while, at the same time, maintaining strict (or, better, strong) convexity of the optimization problem to be solved. An efficient iterative minimization procedure based on the ADMM algorithm has been proposed, for which we have derived a new proximity operator associated with the proposed regularization function. An analysis of convergence of the minimization algorithm has been presented which paves the way for analogous demonstrations for other CNC models. Experiments demonstrate the effectiveness of the proposed approach, also in comparison with some alternative state-of-the-art segmentation methods.
Notes
A convex function is proper if it nowhere takes the value \(-\infty \) and is not identically equal to \(+\infty \).
References
Bioucas-Dias, J., Figueiredo, M.: Fast image recovery using variable splitting and constrained optimization. IEEE Trans. Image Process. 19(9), 2345–2356 (2010)
Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge (1987)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–22 (2011)
Brown, E.S., Chan, T.F., Bresson, X.: Completely convex formulation of Chan–Vese image segmentation model. Int. J. Comput. Vis. 98(1), 103–121 (2012)
Cai, X.H., Chan, R.H., Zeng, T.Y.: A two-stage image segmentation method using a convex variant of the Mumford–Shah model and thresholding. SIAM J. Imaging Sci. 6(1), 368–390 (2013)
Chan, T., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
Chan, T., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10, 266–277 (2001)
Chan, T., Vese, L.A.: Active contours without edges for vector-valued images. J. Vis. Commun. Image Represent. 11, 130–141 (2000)
Chen, P.Y., Selesnick, I.W.: Group-sparse signal denoising: non-convex regularization, convex optimization. IEEE Trans. Signal Proc. 62, 3464–3478 (2014)
Christiansen, M., Hanke, M.: Deblurring methods using antireflective boundary conditions. SIAM J. Sci. Comput. 30, 855–872 (2008)
Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)
Donatelli, M., Reichel, L.: Square smoothing regularization matrices with accurate boundary conditions. J. Comput. Appl. Math. 272, 334–349 (2014)
Dong, B., Chien, A., Shen, Z.: Frame based segmentation for medical images. Commun. Math. Sci. 32, 1724–1739 (2010)
Ekeland, I., Temam, R.: Convex Analysis and Variational Problems (Classics in Applied Mathematics). SIAM, Philadelphia (1999)
Esedoglu, S., Tsai, Y.: Threshold dynamics for the piecewise constant Mumford–Shah functional. J. Comput. Phys. 211, 367–384 (2006)
Huang, G., Lanza, A., Morigi, S., Reichel, L., Sgallari, F.: Majorization–minimization generalized Krylov subspace methods for \(\ell _p - \ell _q\) optimization applied to image restoration. BIT Numer. Math. 57(2), 351–378 (2017). doi:10.1007/s10543-016-0643-8
Lanza, A., Morigi, S., Sgallari, F.: Convex image denoising via non-convex regularization. In: Aujol, JF., Nikolova, M., Papadakis, N. (eds.) Scale Space and Variational Methods in Computer Vision. SSVM 2015. Lecture Notes in Computer Science, vol. 9087, pp. 666–677. Springer, Cham (2015)
Lanza, A., Morigi, S., Reichel, L., Sgallari, F.: A generalized Krylov subspace method for \(\ell _p\)–\(\ell _q\) minimization. SIAM J. Sci. Comput. 37(5), S30–S50 (2015)
Lanza, A., Morigi, S., Sgallari, F.: Constrained TVp-l2 model for image restoration. J. Sci. Comput. 68(1), 64–91 (2016)
Lanza, A., Morigi, S., Sgallari, F.: Convex image denoising via non-convex regularization with parameter selection. J. Math. Imaging Vis. 56(2), 195–220 (2016)
Lanza, A., Morigi, S., Selesnick, I., Sgallari, F.: Nonconvex nonsmooth optimization via convex–nonconvex majorization–minimization. Numer. Math. 136(2), 343–381 (2017)
Li, F., Ng, M., Zeng, T.Y., Shen, C.: A multiphase image segmentation method based on fuzzy region competition. SIAM J. Imaging Sci. 3, 277–299 (2010)
Li, F., Shen, C., Li, C.: Multiphase soft segmentation with total variation and H1 regularization. J. Math. Imaging Vis. 37, 98–111 (2010)
Lie, J., Lysaker, M., Tai, X.: A binary level set model and some applications to Mumford–Shah image segmentation. IEEE Trans. Image Process. 15, 1171–1181 (2006)
Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42(5), 577–685 (1989)
Ng, M.K., Chan, R.H., Tang, W.C.: A fast algorithm for deblurring models with Neumann boundary conditions. SIAM J. Sci. Comput. 21, 851–866 (1999)
Nikolova, M.: Estimation of binary images by minimizing convex criteria. Proc. IEEE Int. Conf. Image Process. 2, 108–112 (1998)
Parekh, A., Selesnick, I.W.: Convex Denoising Using Non-Convex Tight Frame Regularization. arXiv:1504.00976 (2015)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer, Berlin (1998)
Sandberg, B., Kang, S., Chan, T.: Unsupervised multiphase segmentation: a phase balancing model. IEEE Trans. Image Process. 19, 119–130 (2010)
Selesnick, I.W., Bayram, I.: Sparse signal estimation by maximally sparse convex optimization. IEEE Trans. Signal Process. 62(5), 1078–1092 (2014)
Selesnick, I.W., Parekh, A., Bayram, I.: Convex 1-D total variation denoising with non-convex regularization. IEEE Signal Process. Lett. 22(2), 141–144 (2015)
Strong, D.M., Chan, T.F.: Edge-preserving and scale-dependent properties of total variation regularization. Inverse Probl. 19(6), 165–187 (2003)
Wu, C., Tai, X.C.: Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models. SIAM J. Imaging Sci. 3(3), 300–339 (2010)
Wu, C., Zhang, J., Tai, X.C.: Augmented Lagrangian method for total variation restoration with non-quadratic fidelity. Inverse Probl. Imaging 5(1), 237–261 (2011)
Yuan, J., Bae, E., Tai, X., Boykov, Y.: A study on continuous max-flow and min-cut approaches. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2217–2224 (2010)
Yuan, J., Bae, E., Tai, X., Boykov, Y.: A continuous max-flow approach to Potts model. In: ECCV 2010: Proceedings of the 11th European Conference on Computer Vision, Springer, Berlin, pp. 332–345 (2010)
Varga, R.S.: Matrix Iterative Analysis, Springer Series in Computational Mathematics. Springer, Berlin, Heidelberg (2000). doi:10.1007/978-3-642-05156-2
Acknowledgements
We would like to thank the referees for comments that led to improvements of the presentation. This work is partially supported by HKRGC GRF Grant No. CUHK300614, CUHK14306316, CRF Grant No. CUHK2/CRF/11G, AoE Grant AoE/M-05/12, CUHK DAG No. 4053007, and FIS Grant No. 1907303. Research by SM, AL and FS was supported by the “National Group for Scientific Computation (GNCS-INDAM)” and by the ex60% project of the University of Bologna “Funds for selected research topics”.
Appendix
Proof of Lemma 3.2
Let \(x :{=} (x_1,x_2,x_3)^T \in {\mathbb R}^3\). Then, the function \(f(\,\cdot \,;\lambda ,T,a)\) in (3.3) can be rewritten in a more compact form as follows:
with the matrix \(Q \in {\mathbb R}^{3 \times 3}\) defined as
We introduce the eigenvalue decomposition of the matrix Q in (7.2):
where orthogonality of the modal matrix V in (7.3) follows from symmetry of matrix Q. Then, we decompose the diagonal eigenvalues matrix \(\Lambda \) in (7.3) as follows:
Substituting (7.4) into (7.3), then (7.3) into (7.1), we obtain the following equivalent expression for the function f:
Recalling that convexity of a function is invariant under non-singular linear transformations of its domain, we introduce the following transformation of the domain \({\mathbb R}^3\) of the function f above:
which is non-singular since V and Z are non-singular matrices. Defining \(f_T :{=} f \circ T\) as the function f in the transformed domain, we have:
Recalling the definitions of Z and \(\widetilde{\Lambda }\) in (7.4), we can write (7.7) in the explicit form:
where the function g in (7.8) is defined in (3.6). Since the first term in (7.8) is (quadratic) convex, a sufficient condition for the function \(f_T\) in (7.8) to be strictly convex is that the function g in (3.6) is strictly convex. This concludes the proof after recalling that the function f is strictly convex if and only if the function \(f_T\) is strictly convex. \(\square \)
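The invariance of convexity under non-singular linear transformations used above amounts, for quadratic forms, to Sylvester's law of inertia: \(T^T Q T\) is positive semidefinite if and only if Q is, for non-singular T. A small numeric illustration with arbitrary example matrices (not the Q of (7.2)):

```python
import numpy as np

rng = np.random.default_rng(0)

def is_psd(M, tol=1e-10):
    """True if the symmetric part of M is positive semidefinite."""
    return bool(np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol)

# A random non-singular change of variables x = T y.
T = rng.standard_normal((3, 3))
while abs(np.linalg.det(T)) < 1e-6:   # guard against near-singularity
    T = rng.standard_normal((3, 3))

Q_psd = np.diag([1.0, 2.0, 0.0])      # PSD: x -> x^T Q x is convex
Q_ind = np.diag([1.0, -1.0, 2.0])     # indefinite: non-convex quadratic

# Congruence T^T Q T preserves definiteness (Sylvester's law of inertia).
assert is_psd(Q_psd) and is_psd(T.T @ Q_psd @ T)
assert (not is_psd(Q_ind)) and (not is_psd(T.T @ Q_ind @ T))
```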
Proof of Lemma 3.3
It follows immediately from the definition of strict convexity that a function from \({\mathbb R}^2\) into \({\mathbb R}\) is strictly convex if and only if the restriction of the function to any possible straight line of \({\mathbb R}^2\) is strictly convex. Due to the radial symmetry property of function \(\psi \) in (3.7), the restriction of \(\psi \) to a generic straight line l is identical to the restriction of \(\psi \) to any other straight line obtained by rotating l around the origin. Hence, \(\psi \) is strictly convex if and only if all its restrictions to horizontal straight lines (any other direction, e.g. vertical, could be chosen as well) with non-negative intercept are strictly convex.
We denote by \(h_0\) and \(h_k\) the functions from \({\mathbb R}\) into \({\mathbb R}\) corresponding to the restriction of \(\psi \) to the horizontal straight line with null intercept, namely the horizontal coordinate axis, and to any horizontal straight line with positive intercept \(k > 0\), respectively. From the definition of the function \(\psi \) in (3.7), we have:
Since the function \(\psi \) in (3.7) is strictly convex if and only if both \(h_0\) in (7.9) and \(h_k\) in (7.10) are strictly convex, it is clear that a necessary condition for \(\psi \) to be strictly convex is that \(h_0\) in (7.9) is strictly convex. It thus remains to demonstrate that \(h_0\) being strictly convex is also a sufficient condition for \(\psi \) to be strictly convex or, equivalently, that strict convexity of \(h_0\) in (7.9) implies strict convexity of \(h_k\) in (7.10) for any positive k.
The functions \(h_0\) and \(h_k\) in (7.9)–(7.10) are clearly even and, since we are assuming \(z \in \mathcal {C}^1({\mathbb R}_+)\), we have that \(h_k \in \mathcal {C}^1({\mathbb R})\) and \(h_0 \in \mathcal {C}^0({\mathbb R}) \cap \mathcal {C}^1({\mathbb R}\setminus \{0\})\). In particular, the first-order derivatives of \(h_0\) and \(h_k\) are as follows:
We note that \(h_0\) is continuously differentiable also at the point \(t=0\) if and only if the right-sided derivative of the function z at 0 is equal to 0.
We now assume that the function \(h_0\) in (7.9) is strictly convex. This implies that the first-order derivative function \(h_0'\) is monotonically increasing on its entire domain \({\mathbb R}\setminus \{0\}\). It thus follows from the definition of \(h_0'\) in (7.11) that the first-order derivative function \(z'\) is nonnegative and monotonically increasing on \({\mathbb R}_+\). We then notice that, for any given \(k > 0\), the first-order derivative function \(h_k'\) in (7.12) is continuous (since \(z'\) is continuous on \({\mathbb R}_+\) by assumption) and odd (hence \(h_k'(0) = 0\)). Finally, by recalling that the composition and the product of positive, monotonically increasing functions are monotonically increasing, it follows that \(h_k'\) in (7.12) is monotonically increasing on the entire real line, hence \(h_k\) in (7.10) is strictly convex. This completes the proof. \(\square \)
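The mechanism of the proof can be checked numerically on a sample penalty. Below, \(z(t) = t^2 + t\) is a hypothetical \(\mathcal C^1\) function (not the paper's z) whose derivative \(z'(t) = 2t + 1\) is nonnegative and increasing on \({\mathbb R}_+\), as the proof requires; the sketch then verifies that every offset restriction \(h_k\) is strictly convex, as the lemma asserts:

```python
import numpy as np

# Hypothetical z in C^1(R_+): z(t) = t^2 + t, so z'(t) = 2t + 1 is nonnegative
# and monotonically increasing on R_+ (the key assumption in the proof).
z_prime = lambda t: 2.0 * t + 1.0

def h_k_prime(t, k):
    """Derivative of the restriction h_k(t) = z(sqrt(t^2 + k^2)), cf. (7.12)."""
    s = np.sqrt(t * t + k * k)
    return (t / s) * z_prime(s)

t = np.linspace(-5.0, 5.0, 2001)
for k in (0.1, 1.0, 10.0):
    d = h_k_prime(t, k)
    assert np.all(np.diff(d) > 0)          # h_k' increasing => h_k strictly convex
    assert abs(h_k_prime(0.0, k)) < 1e-12  # h_k' is odd, hence h_k'(0) = 0
```
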
Proof of Proposition 3.7
The functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) is clearly proper. Moreover, since the functions \(\phi (\,\cdot \,;T,a)\) and \(\Vert \cdot \Vert _2\) are both continuous and bounded from below by zero, \(\mathcal {J}\) is also continuous and bounded from below by zero. In particular, we notice that \(\mathcal {J}\) achieves the zero value only for \(u = b\) with b a constant image. The penalty function \(\phi (\,\cdot \,;T,a)\) is not coercive, hence the regularization term in \(\mathcal {J}\) is not coercive. However, since the fidelity term is quadratic and strictly convex, hence coercive, and the regularization term is bounded from below by zero, \(\mathcal {J}\) is coercive.
As far as strong convexity is concerned, it follows from Definition 3.6 that the functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) is \(\mu \)-strongly convex if and only if the functional \(\widetilde{\mathcal {J}}(u;\lambda ,T,a,\mu )\) defined as
is convex, where \(\mathcal {A}(u)\) is an affine function of u. We notice that the functional \(\widetilde{\mathcal {J}}\) in (7.13) almost coincides with the original functional \(\mathcal {J}\) in (1.1), the only difference being that the coefficient is \(\lambda -\mu \) instead of \(\lambda \). Hence, we can apply the results in Theorem 3.5 and state that \(\widetilde{\mathcal {J}}\) in (7.13) is convex if condition (3.10) is satisfied with \(\lambda - \mu \) in place of \(\lambda \). By substituting \(\lambda - \mu \) for \(\lambda \) in condition (3.10), deriving the solution interval for \(\mu \) and then taking the maximum, one obtains equality (3.22). \(\square \)
Proof of Proposition 4.1
The proof of condition (4.17) for strict convexity of the function \(\theta \) in (4.16) is straightforward. In fact, the function \(\theta \) can be equivalently rewritten as
with \(\mathcal {A}(x)\) an affine function, so that a necessary and sufficient condition for \(\theta \) to be strictly convex is that the function \(\bar{\theta }\) in (7.14) is strictly convex. We then notice that \(\bar{\theta }\) is almost identical to the function g in (3.6), the only difference being that the coefficient \(\beta /2\) in \(\bar{\theta }\) is replaced by \(\lambda /18\) in g. By setting \(\lambda /18 = \beta /2 \Longleftrightarrow \lambda = 9 \beta \), the two functions coincide. The condition (3.10) for strict convexity of g reads \(\lambda > 9\,a\); hence, by substituting \(\lambda = 9 \beta \) into it, we obtain condition (4.17) for strict convexity of \(\theta \).
We remark that condition \(\beta > a\) reduces to \(\beta \ge a\) when only convexity is required.
For the proof of statement (4.19), according to which the unique solution \(x^*\) of the strictly convex problem (4.18) is obtained by a shrinkage of vector r, we refer the reader to [20, Proposition 4.5].
We now prove statement (4.20). First, we notice that if \(\Vert r\Vert _2 = 0\), i.e. r is the null vector, the minimization problem in (4.18) with the objective function \(\theta (x)\) defined in (4.16) reduces to
Since the former and the latter terms of the cost function in (7.15) are monotonically non-decreasing and monotonically increasing functions of \(\Vert x\Vert _2\), respectively, the solution of (7.15) is clearly \(x^* = 0\). Hence, the case \(\Vert r\Vert _2 = 0\) can be easily dealt with by taking any value \(\xi ^*\) in formula (4.19). We included the case \(\Vert r\Vert _2 = 0\) in formula a) of (4.20). In the following, we consider the case \(\Vert r\Vert _2 > 0\).
Based on the previously demonstrated statement (4.19), by setting \(x = \xi \, r\), \(\xi \ge 0\), we turn the original unconstrained 2-dimensional problem in (4.18) into the following equivalent constrained 1-dimensional problem:
where in (7.16) we omitted the constants and introduced the cost function \(f: {\mathbb R}_+ \rightarrow {\mathbb R}\) for future reference. Since the function \(\phi \) in (7.16), which is defined in (2.1), is continuously differentiable on \({\mathbb R}_+\), the cost function f in (7.16) is also continuously differentiable on \({\mathbb R}_+\). Moreover, f is strictly convex since it represents the restriction of the strictly convex function \(\theta \) in (4.16) to the half-line \(\xi \, r, \, \xi \ge 0\). Hence, the first-order derivative \(f'(\xi )\) is a continuous, monotonically increasing function and a necessary and sufficient condition for an inner point \(0< \xi < 1\) to be the global minimizer of f is that \(f'(\xi ) = 0\). From the definition of f in (7.16) we have:
and, in particular:
It follows from (7.18) that the solution of (7.16) cannot be \(\xi ^* = 0\); hence it is either \(\xi ^* = 1\) or an inner stationary point.
Recalling the definition of \(\phi (\,\cdot \,;T,a)\) in (2.1), after some simple manipulations the function \(f'(\xi )\) in (7.17) can be rewritten in the following explicit form:
that is:
Denoting by \(\xi _1^*\), \(\xi _2^*\), \(\xi _3^*\) the points where \(f_1'\), \(f_2'\), \(f_3'\) in (7.20) equal zero, respectively, we have:
However, for \(\xi _1^*\), \(\xi _2^*\) and \(\xi _3^*\) in (7.21) to be acceptable candidate solutions of problem (7.16), they must belong to the domains \(\mathcal {D}_1\), \(\mathcal {D}_2\), \(\mathcal {D}_3\) of \(f_1'\), \(f_2'\), \(f_3'\), respectively, and obviously also to the optimization domain \(\mathcal {O} := [0,1]\) of problem (7.16). We have:
The proof of statement (4.20) is thus completed. \(\square \)
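The reduction used in this proof can be reproduced numerically. Since \(\phi\) in (2.1) is not written out in this extract, the sketch below uses a truncated quadratic \(\phi(t) = \min(t^2, T^2)\) as a stand-in with the same qualitative shape (convex near zero, flat beyond T); it need not satisfy condition (4.17), but the radial-symmetry argument behind (4.19) still applies, so the 2-D minimizer of \(\theta\) in (4.16) lies on the segment \(x = \xi\, r\), \(\xi \in [0,1]\). All parameter values are illustrative:

```python
import numpy as np

# Stand-in penalty phi(t) = min(t^2, T^2); T, beta, r are illustrative values,
# not derived from the paper's convexity condition (4.17).
T, beta = 1.0, 4.0
phi = lambda t: np.minimum(t * t, T * T)
r = np.array([1.2, 0.9])
rn = np.linalg.norm(r)                               # ||r||_2 = 1.5

# 1-D restriction as in (7.16): f(xi) = phi(xi*||r||) + beta/2 * (xi-1)^2 * ||r||^2.
xi = np.linspace(0.0, 1.0, 100001)
f = phi(xi * rn) + 0.5 * beta * (xi - 1.0) ** 2 * rn ** 2
xi_star = xi[np.argmin(f)]

# Brute-force 2-D minimization of theta in (4.16) over a grid, to check that
# the unconstrained minimizer is indeed the shrinkage xi_star * r of (4.19).
g = np.linspace(-2.0, 2.0, 801)
X, Y = np.meshgrid(g, g)
theta = phi(np.hypot(X, Y)) + 0.5 * beta * ((X - r[0]) ** 2 + (Y - r[1]) ** 2)
i, j = np.unravel_index(np.argmin(theta), theta.shape)
x_star = np.array([X[i, j], Y[i, j]])
assert np.linalg.norm(x_star - xi_star * r) < 2e-2   # agree up to grid resolution
```

Here \(\Vert r\Vert_2\) lies beyond the flat region of the stand-in penalty, so the minimization returns \(\xi^* = 1\) (no shrinkage): the gradient is treated as a salient boundary and left untouched, consistent with the boundary-preserving behaviour described in the introduction.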
Proof of Theorem 5.7
Based on the definition of the augmented Lagrangian functional in (5.2), we rewrite in explicit form the first inequality of the saddle-point condition in (4.7):
and, similarly, the second inequality:
In the first part of the proof, we prove that if \(\,(u^*,t^*;\rho ^*)\,\) is a solution of the saddle-point problem (4.6)–(4.7), that is, it satisfies the two inequalities (7.25) and (7.26), then \(u^*\) is a global minimizer of the functional \(\mathcal {J}\) in (1.1).
Since (7.25) must be satisfied for any \(\rho \;{\in }\; {\mathbb R}^{2n}\), we have:
The second inequality (7.26) must be satisfied for any \((u,t) \;{\in }\; {\mathbb R}^n {\times }\, {\mathbb R}^{2n}\). Hence, by taking \(t = Du\) in (7.26) and, at the same time, substituting in (7.26) the previously derived condition (7.27), we obtain:
Inequality (7.28) indicates that \(u^*\) is a global minimizer of the functional \(\mathcal {J}\) in (1.1). Hence, we have proved that all the saddle-point solutions of problem (4.6)–(4.7), if any exist, are of the form \(\,(u^*,Du^*;\rho ^*)\,\), with \(u^*\) denoting a global minimizer of \(\mathcal {J}\).
In the second part of the proof, we prove that at least one solution of the saddle-point problem exists. In particular, we prove that if \(u^*\) is a global minimizer of \(\mathcal {J}\) in (1.1), then there exists at least one pair \(\,(t^*,\,\rho ^*) \,{\in }\; {\mathbb R}^{2n} {\times }\, {\mathbb R}^{2n}\) such that \((u^*,t^*;\rho ^*)\) is a solution of the saddle-point problem (4.6)–(4.7), that is, it satisfies the two inequalities (7.25) and (7.26). The proof relies on a suitable choice of the vectors \(t^*\) and \(\rho ^*\). We take:
where the term \(\bar{\partial }_{t} \left[ \, R \,\right] (Du^*)\) indicates the Clarke generalized gradient (with respect to t, calculated at \(Du^*\)) of the nonconvex regularization function R defined in (5.1). We notice that a vector \(\rho ^*\) satisfying (7.30) is guaranteed to exist thanks to Proposition 5.2. In fact, since here we are assuming that \(u^*\) is a global minimizer of functional \(\mathcal {J}\), the first-order optimality condition in (5.5) holds true.
Due to (7.29), the first saddle-point condition in (7.25) is clearly satisfied. Proving the second condition (7.26) is less straightforward: we need to investigate the optimality conditions of the functional \(\mathcal {L}\,(u,t;\rho ^*)\) with respect to the pair of primal variables (u, t). We follow the same procedure used, e.g., in [35], which requires \(\mathcal {L}\,(u,t;\rho ^*)\) to be jointly convex in (u, t). According to Proposition 5.4, in our case this requirement is fulfilled if the penalty parameter \(\beta \) satisfies condition (5.17), which has thus been taken as a hypothesis of this theorem. Hence, we can apply Lemma 5.6 and state that (7.26) is satisfied if and only if both the following two optimality conditions are met:
where in (7.31)–(7.32) we introduced the two functions \(\mathcal {L}^{(u)}\) and \(\mathcal {L}^{(t)}\) representing the restrictions of functions \(\mathcal {L}\,(u,t^*;\rho ^*)\) and \(\mathcal {L}\,(u^*,t;\rho ^*)\) to only the terms depending on the optimization variables u and t, respectively. In particular, after recalling the definition of the augmented Lagrangian functional in (5.2), we have
where, like in [35], \(\mathcal {L}^{(u)}\) and \(\mathcal {L}^{(t)}\) have been split into the sum of two functions with the aim of then deriving optimality conditions for \(\mathcal {L}^{(u)}\) and \(\mathcal {L}^{(t)}\) by means of Lemma 5.5. Unlike in [35], the ADMM quadratic penalty term \(\frac{\beta }{2} \, \Vert t - D u \Vert _2^2\) has been split into two parts (differently in \(\mathcal {L}^{(u)}\) and \(\mathcal {L}^{(t)}\)) in order to deal with the nonconvex regularization term. In particular, the coefficients \(\beta _1\), \(\beta _2\) introduced in (7.33)–(7.34) satisfy
such that the terms \(S^{(u)}\), \(S^{(t)}\) in (7.33)–(7.34) are clearly convex and the terms \(Q^{(u)}\), \(Q^{(t)}\) are convex due to results in Lemma 5.3 and Proposition 4.1, respectively. We also notice that all the functions \(Q^{(u)}\), \(Q^{(t)}, S^{(u)}\), \(S^{(t)}\) are proper and continuous and that \(S^{(u)}\), \(S^{(t)}\) are Gâteaux-differentiable. Hence, we can apply Lemma 5.5 separately to (7.33) and (7.34), to check if the pair \((u^*,t^*)\) satisfies the optimality conditions in (7.31) and (7.32), so that the second saddle-point condition (7.26) holds true. We obtain:
where the term \(t^*-Du^*\) in (7.36)–(7.37) is zero due to the setting (7.29). We rewrite conditions (7.36)–(7.37) as follows:
where in (7.38) we added and subtracted the term \(\lambda \, (u^*-b)\) and added the null term \(\beta _1 D^T (t^* - D u^*)\), and in (7.39) we added the null term \(\beta _2 \, (t^* - Du^*)\). The term \(-\lambda \, (u^*-b) - D^T \rho ^*\) in (7.38) is null due to the setting (7.30). By introducing the two functions
which are convex under conditions (7.35) for the same reason for which the functions \(Q^{(u)}\), \(Q^{(t)}\) in (7.33)–(7.34) are convex, conditions (7.38)–(7.39) can be rewritten as
where we highlighted that the left-hand factor of the scalar product in (7.41) is the subdifferential (actually, the standard gradient) of the function U calculated at \(u^*\), and that the left-hand factor of the scalar product in (7.42) is a particular vector belonging to the subdifferential of the function T calculated at \(t^*\). This second statement comes from the definition of the function T in (7.40) and from the settings (7.29)–(7.30).
Optimality conditions in (7.41)–(7.42) are easily proved by noticing that the left-hand sides of (7.41)–(7.42) represent the Bregman distances associated with functions U and T, respectively, which are known to be non-negative for convex functions. Hence, the second saddle-point condition in (7.26) is satisfied and, finally, the second and last part of the proof is completed. \(\square \)
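The saddle-point conditions proved above underpin the ADMM iterations of Algorithm 1. As a minimal sketch, the loop below applies the splitting t = Du to a 1-D denoising toy problem, with a convex stand-in penalty \(R(t) = \gamma \Vert t\Vert_1\) in place of the nonconvex R in (5.1) (so joint convexity of the Lagrangian is immediate); \(\lambda\), \(\beta\), \(\gamma\), the test signal, and the Lagrangian sign convention \(\langle \rho, t - Du\rangle\) are assumptions of this sketch, not the paper's exact setting:

```python
import numpy as np

# Toy problem: u in R^100, b a noisy step signal; R(t) = gamma * ||t||_1 is a
# convex stand-in for the paper's nonconvex regularizer in (5.1).
n, lam, beta, gamma = 100, 1.0, 2.0, 0.5
rng = np.random.default_rng(1)
b = np.concatenate([np.zeros(n // 2), np.ones(n // 2)]) \
    + 0.05 * rng.standard_normal(n)
D = np.diff(np.eye(n), axis=0)              # forward-difference matrix, (n-1) x n

u, t, rho = b.copy(), D @ b, np.zeros(n - 1)
A = lam * np.eye(n) + beta * D.T @ D        # normal matrix of the u-subproblem
for _ in range(500):
    # u-step: minimize L(u, t; rho) over u (a linear system).
    u = np.linalg.solve(A, lam * b + D.T @ (rho + beta * t))
    # t-step: minimize gamma*||t||_1 + beta/2 * ||t - (Du - rho/beta)||^2,
    # i.e. soft-thresholding, the convex analogue of the shrinkage in (4.19).
    v = D @ u - rho / beta
    t = np.sign(v) * np.maximum(np.abs(v) - gamma / beta, 0.0)
    # multiplier update, as in (4.10).
    rho = rho + beta * (t - D @ u)

# At a saddle point the splitting constraint is tight, cf. t* = Du* in (7.29).
assert np.linalg.norm(t - D @ u) < 1e-3
```

The vanishing primal residual \(\Vert t^{(k)} - Du^{(k)}\Vert_2\) is exactly the quantity whose convergence to zero is established for the nonconvex case in Theorem 5.8 below.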
Proof of Theorem 5.8
Let us define the following errors:
Since \((u^*,t^*;\rho ^*)\) is a saddle-point of the augmented Lagrangian functional in (4.6), it follows from Theorem 5.7 that \(t^* = Du^*\). This relationship, together with the ADMM updating formula for the vector of Lagrange multipliers in (4.10), yields:
It then follows easily from (7.44) that
Computation of a lower bound for the right-hand side of (7.45)
Since \((u^*,t^*;\rho ^*)\) is a saddle-point of the augmented Lagrangian functional in (4.6), it satisfies the following optimality conditions [see (7.36)–(7.37) in the proof of Theorem 5.7]:
Similarly, by the construction of \(\big (u^{(k)},t^{(k)}\big )\) in Algorithm 1, we have:
Taking \(u = u^{(k)}\) in (7.46), \(u = u^*\) in (7.48) and recalling that \(\langle D^T w , z \rangle = \langle w , D z \rangle \) , by addition we obtain:
Similarly, taking \(t = t^{(k)}\) in (7.47) and \(t = t^*\) in (7.49), after addition we have:
where, we recall, the parameters \(\beta _1\) and \(\beta _2\) in (7.50)–(7.51) satisfy the constraints in (7.35).
By summing up (7.50) and (7.51), we obtain:
that is
where we introduced the positive coefficient \(\beta _3 > 0\) (the reason will become clear later on). We want the last term in (7.52) to take the form \(-\,\big \Vert \, c_1 \bar{t}^{(k)} - c_2 D \bar{u}^{(k)} \big \Vert _2^2 \) with \(c_1,c_2 > 0\). Hence, first we impose that the coefficients of \(\big \Vert \bar{t}^{(k)} \big \Vert _2^2\) and \(\big \Vert D \bar{u}^{(k)} \big \Vert _2^2\) in (7.52) are strictly positive, which yields:
Combining (7.53) with conditions (7.35), we obtain:
From the condition on \(\beta _3\) in (7.54), the following constraint on \(\beta \) is derived:
We notice that condition (7.55) can be more stringent than (5.17), depending on \(\tau _c\); hence it has been taken as a hypothesis of this theorem and will be considered, together with (5.17), in the rest of the proof. From the condition on \(\beta _3\) in (7.54) it also follows that the coefficient \(\beta - \beta _3\) of the scalar product in (7.52) is positive.
Then, we have to impose that the coefficient of the term \(-\big \langle \, \bar{t}^{(k)} , D \bar{u}^{(k)} \, \big \rangle \) in (7.52) is twice the product of the square roots of the (positive) coefficients of \(\big \Vert \bar{t}^{(k)} \big \Vert _2^2\) and \(\big \Vert D \bar{u}^{(k)} \big \Vert _2^2\), that is:
By imposing the condition on \(\beta _3\) in (7.54), namely \(\beta -\beta _3 > 2a\), it is easy to verify that (7.56) admits acceptable solutions only if \(\beta _1 > \beta _2\). By setting \(\beta _1 = \tau _c \frac{9}{8} \, a\) and \(\beta _2 = a\) in (7.56), which are acceptable values according to this last result (since \(\tau _c > 1\), clearly \(\beta _1 > \beta _2\)) and to conditions (7.54), we obtain:
We now check if there exist acceptable values for the two remaining free parameters, namely \(\beta \) and \(\beta _3\), such that (7.57) holds. We impose that \(\beta \) in (7.57) satisfies its constraint in (5.17), which guarantees convexity of the augmented Lagrangian functional, and the derived condition in (7.55):
Since \(\tau _c > 1\) (and \(a > 0\)), both conditions in (7.58) are satisfied for any \(\beta _3 > 0\). Hence, for \(\beta _1 = \tau _c \frac{9}{8} \, a\), \(\,\beta _2 = a\) and any \(0< \beta _3 < \beta - 2a\), with \(\beta > 2a\), the last term in (7.52) can be written in the form
where \(c_1,c_2 > 0\), \(c_1 \ne c_2\). Replacing the expression in (7.59) for the last term in (7.52), we have:
where in (7.60) we multiplied both sides by the positive coefficient \(2\beta \). We notice that the left-hand side of (7.60) coincides with the right-hand side of (7.45), hence it follows that:
Computation of a lower bound for the term \(\varvec{T}\) in (7.61).
We can write:
First, we notice that:
Then, from the construction of \(t^{(k-1)}\) (from \(u^{(k-1)}\)), we have:
Taking \(t = t^{(k-1)}\) in (7.49) and \(t = t^{(k)}\) in (7.64), we obtain:
By addition of (7.65) and (7.66), we have that
Recalling that
replacing (7.68) into (7.67) and then dividing by \(\beta \), we obtain:
From (7.62), (7.63) and (7.69), we have:
Convergence results for sequences \(\varvec{t^{(k)},Du^{(k)},\rho ^{(k)}}\).
From (7.61) and (7.70), we obtain:
that is:
where we have introduced the scalar sequence \(\{s^{(k)}\}\), which is clearly bounded from below by zero. We notice that the coefficient \(\beta -2\beta _2\) in (7.72) is positive due to the constraint \(\beta > 2a\). Since the right-hand side of the first inequality in (7.72) is nonnegative, \(\{s^{(k)}\}\) is monotonically non-increasing, hence convergent. This implies that the right-hand side of (7.72) tends to zero as \(k \rightarrow \infty \). From these considerations and (7.72) it follows that:
Since the two coefficients \(c_1\), \(c_2\) in (7.76) satisfy \(c_1, c_2 \ne 0\), \(c_1 \ne c_2\), then it follows from (7.75)–(7.76) that both the sequences \(\{\bar{t}^{(k)}\}\) and \(\{D \bar{u}^{(k)}\}\) tend to zero as \(k \rightarrow \infty \). Results in (7.73)–(7.76) can thus be rewritten in the following more concise and informative form:
where the last equality in (7.78) comes from the saddle-point properties stated in Theorem 5.7. Since it will be useful later on, we note that it follows from (7.78) that
Convergence results for sequence \(\varvec{u^{(k)}}\).
We now prove that \(\lim _{{k \rightarrow \infty } } u^{(k)} =u^*\). Since \((u^*,t^*;\rho ^*)\) is a saddle point of the augmented Lagrangian functional \(\mathcal {L}(u,t;\rho )\), we have
By taking \(u = u^{(k)}\), \(t = t^{(k)}\) in (7.81) and recalling the definition of \(\mathcal {L}(u,t;\rho )\) in (5.2), we have:
Taking \(u = u^*\) in (7.48) and \(t = t^*\) in (7.49), we obtain:
By summing up (7.83) and (7.84), we have:
Taking the \(\liminf \) of (7.82) and the \(\limsup \) of (7.85), and using the results in (7.77)–(7.80), we have
It follows from (7.86) that
We now manipulate \(F(u^{(k)})\) as follows:
On the other hand, we have that
where in (7.89) we have used the (optimality) condition (7.30). From (7.88) and (7.89) it follows that
that is
Taking the limit for \(k \rightarrow \infty \) of both sides of (7.91) and recalling (7.79) and (7.87), we obtain:
thus completing the proof. \(\square \)
Chan, R., Lanza, A., Morigi, S. et al. Convex non-convex image segmentation. Numer. Math. 138, 635–680 (2018). https://doi.org/10.1007/s00211-017-0916-4