1 Introduction

The fundamental task of image segmentation is to partition an image into regions that are homogeneous with respect to a certain feature, such as intensity or texture, and thereby to extract more meaningful high-level information from the image. This process plays a fundamental role in many important applications of computer vision, e.g. object detection, recognition, measurement and tracking. Many successful methods for image segmentation are based on variational models where the regions of the desired partition, or their edges, are obtained by minimizing suitable energy functionals. The most popular region-based segmentation model, the Mumford–Shah model, is a non-convex variational model which pursues a piecewise constant/smooth approximation of the given image where the boundaries are the transitions between adjacent patches of the approximation [25]. Many convex relaxation models have been proposed in the literature to overcome the numerical difficulties of the non-convex problem [4,5,6, 22], at the price of a compromise in terms of segmentation quality.

In this work we propose the following Convex Non-Convex (CNC) variational segmentation model given by the sum of a smooth convex (quadratic) fidelity term and a non-smooth non-convex regularization term:

$$\begin{aligned} \min _{u \in {\mathbb R}^n} \mathcal {J}(u;\lambda ,T,a), \quad \mathcal {J}(u;\lambda ,T,a) :{=} \frac{\lambda }{2} \Vert u - b \Vert _2^2 + \sum _{i = 1}^{n} \phi \left( \Vert (\nabla u)_i \Vert _2 ; T, a \right) ,\nonumber \\ \end{aligned}$$
(1.1)

where \(\lambda > 0\) is the regularization parameter, \(b \in {\mathbb R}^n\) is the (vectorized) observed image, \((\nabla u)_i \in {\mathbb R}^2\) represents the discrete gradient of the image \(u \in {\mathbb R}^n\) at pixel i, \(\Vert \, \cdot \, \Vert _2\) denotes the \(\ell _2\) norm and \(\phi (\,\cdot \,;T,a): [\,0,+\infty ) \rightarrow {\mathbb R}\) is a parameterized, piecewise-defined non-convex penalty function with parameters \(T>0, a > 0\) and with properties that will be outlined in Sect. 2. In particular, the parameter a allows us to tune the degree of non-convexity of the regularizer, while the parameter T represents a given gradient magnitude threshold above which the boundaries surrounding the features in the image are considered salient in a given context. This parameter plays a fundamental role in selecting which pixels are not to be considered as boundaries of the segmented regions in the image. The role of the penalty function \(\phi \) in the regularization term of functional \(\mathcal {J}\) in (1.1) is twofold. When the gradient magnitude falls within the first interval [0, T), \(\phi \) smooths the image values, since the corresponding pixels belong to the inner parts of the regions to be segmented. On the interval \([T,+\infty )\), \(\phi \) is first concave and then constant, hence it penalizes all gradient magnitudes above the threshold in approximately the same way.

The functional \(\mathcal {J}\) in (1.1) is non-smooth and can be convex or non-convex depending on the parameters \(\lambda \) and a. In fact, the quadratic fidelity term is strongly convex and its positive second-order derivatives can compensate for the negative second-order derivatives of the regularization term.

The idea of constructing and then optimizing convex functionals containing non-convex (sparsity-promoting) terms, referred to as the CNC strategy, was first introduced by Blake and Zisserman in [2], then proposed by Nikolova in [27] for the denoising of binary images, and has been studied very recently by Selesnick and others for different purposes, see [9, 17, 20, 21, 28, 32, 33] for more details. The attractiveness of the CNC approach resides in its ability to promote sparsity more strongly than is possible by using only convex terms, while at the same time maintaining convexity of the total optimization problem, so that well-known, reliable convex minimization approaches can be used to compute the (unique) solution.

In this paper, we propose a two-stage variational segmentation method inspired by the piecewise-smooth approach in [5], which is a convex variant of the classical Mumford–Shah model. In the first stage of our method an approximate solution \(u^*\) of the optimization problem (1.1) is computed. Once \(u^*\) is obtained, in the second stage the segmentation is carried out by thresholding \(u^*\) into different phases. The thresholds can be set by the user or can be obtained automatically using any clustering method, such as the K-means algorithm. As discussed in [5], this allows for a K-phase segmentation \((K \ge 2)\) by choosing \((K - 1)\) thresholds after \(u^*\) is computed in the first stage. In contrast, many multiphase segmentation methods such as those in [15, 23, 24, 38] require K to be set in advance, which implies that if K changes, the minimization problem has to be solved again.
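As a concrete illustration of this second stage, the following Python sketch (our own illustration; the function name, the variable u_star and the use of scikit-learn's KMeans are assumptions, not part of the original method) clusters the pixel values of \(u^*\) into K phases and relabels the phases by increasing intensity.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed available; any clustering method would do

def segment_by_kmeans(u_star: np.ndarray, K: int) -> np.ndarray:
    """Second stage (sketch): partition the smoothed image u* into K phases.

    u_star : 2D array, approximate minimizer of (1.1) from the first stage.
    K      : number of phases, chosen only after u* has been computed.
    """
    values = u_star.reshape(-1, 1)                     # one scalar feature per pixel
    km = KMeans(n_clusters=K, n_init=10).fit(values)
    order = np.argsort(km.cluster_centers_.ravel())    # sort phases by increasing intensity
    relabel = np.empty(K, dtype=int)
    relabel[order] = np.arange(K)
    return relabel[km.labels_].reshape(u_star.shape)   # phase label per pixel
```

Changing K only requires re-running this inexpensive clustering step; the first-stage minimization of (1.1) does not have to be solved again.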

The main contributions of this paper are summarized as follows.

  • A new variational CNC model for multiphase segmentation of images is proposed, where a single non-convex regularization term simultaneously penalizes both the non-smoothness of the inner segmented parts and the length of the boundaries;

  • Sufficient conditions for convexity are derived for the proposed model;

  • A specially designed ADMM-based numerical algorithm is introduced together with a specific multivariate proximal map;

  • A proof of convergence of the minimization algorithm is provided, which paves the way for analogous proofs for similar CNC algorithms.

1.1 Related work

Image segmentation is a key problem in the extraction of high-level information from images in computer vision. There exist many different ways to approach the segmentation task (template matching, component labelling, thresholding, boundary detection and texture segmentation, to mention just a few), and there is no universally accepted best segmentation procedure. The proposed work belongs to the class of region-based (rather than edge-based) variational models for multiphase segmentation without supervision constraints. Many variational models have been studied for image segmentation since the Mumford–Shah functional was introduced in [25]. A prototypical example is the Chan–Vese model [7], which seeks the desired segmentation as the best piecewise constant approximation to a given image via a level set formulation. A variety of methods have been developed to generalize it and to overcome the difficulty of solving non-convex optimization problems. Specifically, in [22], the piecewise constant Mumford–Shah model was convexified by using fuzzy membership functions. In [31], a new regularization term was introduced which allows the number of phases to be chosen automatically. In [37, 38], efficient methods based on the fast continuous max-flow method were proposed. The segmentation method recently proposed by Cai et al. in [5] aims to minimize a convex version of the Mumford–Shah functional [25] by finding an optimal approximation of the image based on a piecewise smooth function. The main difference between our method and the approach in [5] is the regularization term: in [5], it consists of the sum of a smoothing term and a total variation term, where the latter replaces the non-convex term measuring the boundary length in the original Mumford–Shah model. However, it is well known that using non-convex penalties instead of the total variation regularizer holds the potential for more accurate penalizations [16, 18, 19]. In our variational model (1.1) we devise a single regularization function which simultaneously enforces the smoothing of the approximate solution in the inner parts and the preservation of the features (corners, edges, ...) along the boundaries of the segmented parts. This latter property is achieved by means of a non-convex regularizer. However, the CNC strategy applied to the optimization problem allows us to overcome the well-known numerical difficulties of the original non-convex piecewise smooth Mumford–Shah model.

This work is organized as follows. In Sect. 2 we characterize the non-convex penalty functions \(\phi (\,\cdot \,;T,a)\) considered in the proposed model. In Sect. 3 we provide a sufficient condition for strict convexity of the cost functional in (1.1) and in Sect. 4 we illustrate in detail the ADMM-based numerical algorithm used to compute approximate solutions of (1.1). A proof of convergence of the numerical algorithm is given in Sect. 5. Some segmentation results are illustrated in Sect. 6. Conclusions are drawn in Sect. 7.

2 Design of the non-convex penalty functions

In the rest of the paper, we denote the sets of non-negative and positive real numbers by \(\,{\mathbb R}_+ :{=}\, \{ \, t \in {\mathbb R}: \, t \ge 0 \}\) and \(\,{\mathbb R}_+^* :{=}\, \{ \, t \in {\mathbb R}: \, t > 0 \}\), respectively, and we indicate by \(\,I_d\) the \(d \times d\) identity matrix.

In this section, we design a penalty function \(\phi : {\mathbb R}_+ \rightarrow {\mathbb R}\) suitable for our purposes. In particular, the regularization term in the proposed model (1.1) has a twofold aim: in the first interval [0, T) it has to behave like a smoothing regularizer, namely a quadratic penalty, while in the second interval \([T,\infty )\) it has to control the length of the region boundaries; the latter behavior is realized by a concave penalty function prolonged by a horizontal line.

To fulfill the above requirements we used a piecewise polynomial function defined over three subdomains [0, T), \([T,T_2)\) and \([T_2, \infty )\), with the following properties:

  • \(\;\phi \) continuously differentiable for \(t \in {\mathbb R}_+\)

  • \(\;\phi \) twice continuously differentiable for \(t \in {\mathbb R}_+ \!\setminus \! \{T,T_2\}\)

  • \(\;\phi \) convex and monotonically increasing for \(t \in [0,T)\)

  • \(\;\phi \) concave and monotonically non-decreasing for \(t \in [T,T_2) \)

  • \(\;\phi \) constant for \(t \in [T_2,\infty ) \)

  • \({\inf \nolimits _{t\in {\mathbb R}_+ \setminus \{T,T_2\}}} \phi '' = \; -a.\)

Recalling that the independent variable t of \(\phi (t;T,a)\) represents the gradient magnitude, the free parameter T allows us to control the segmentation; in particular, it defines the smallest gradient magnitude considered as an acceptable boundary in the segmentation process. The parameter a is used to make the entire functional \(\mathcal {J}\) in (1.1) convex, as will be detailed in Sect. 3. Finally, the parameter \(T_2\) is defined so as to allow for a good balancing between the two terms in the functional. In particular, the graph of the penalty function \(\phi \) must be pushed down when a increases. Towards this aim, we set \(T_2\) as a function \(T_2(a)\) in such a way that the slope at T, given by \(\phi '(T; T,a)=(T_2-T)a\), is a monotonically decreasing function of the parameter a. In this work we set \(\phi '(T; T,a)=1/a\), so that \(T_2= T+\frac{1}{a^2}\). Therefore, in the following we restrict the free parameters to a and T only.

The minimal-degree polynomial function fulfilling the above requirements turns out to be the following piecewise quadratic penalty function:

$$\begin{aligned} \phi (t;T,a) :{=} \left\{ \begin{array}{lcll} \phi _1(t; T,a) &{} := &{} \frac{a(T_2-T)}{2\,T}t^2 &{}\quad t \in [0,T)\\ \phi _2(t; T,a) &{} := &{} -\frac{a}{2}t^2 + a T_2 t - \frac{a T T_2}{2} &{}\quad t \in [T, T_2)\\ \phi _3(t; T,a) &{} := &{} \frac{a T_2(T_2-T)}{2} &{}\quad t \in [T_2, \infty )\\ \end{array} \right. \end{aligned}$$
(2.1)

which has been obtained by imposing the following constraints:

  • \(\phi _1(0; T,a) = \phi _1^{\prime }(0; T,a) = 0 \)

  • \(\phi _1(T; T,a) = \phi _2(T; T,a) \)

  • \(\phi _1^{\prime }(T; T,a) = \phi _2^{\prime }(T; T,a)\)

  • \(\phi _2^{\prime }(T_2; T,a) = 0 \)

  • \(\phi _2^{\prime \prime }(t; T,a) = -a \qquad \forall t \in [T, T_2)\)

  • \(\phi _3 \; \text {constant} \qquad \qquad \quad \; \forall t \in [T_2, \infty )\)

In Figure 1 we show the plots of the penalty functions \(\phi (t;T,a)\) for three different values of \(a \in \{3,5,7\}\) with fixed \(T=0.2\). The solid dots in the graphs represent the points \((T,\phi (T;T,a))\) which separate the convex segment \(\phi _1\) from the non-convex ones \(\phi _2\) and \(\phi _3\).

Fig. 1: Plots of the penalty function \(\phi \) defined in (2.1) for different values of the concavity parameter a. The solid dots indicate the point which separates the subdomains [0, T) and \([T,\infty )\).

This choice of a simple second-order piecewise polynomial as penalty function is motivated by computational efficiency issues as detailed in Sect. 4.
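For reference, the penalty (2.1), together with the choice \(T_2 = T + 1/a^2\) made above, can be evaluated in a few lines. The following NumPy sketch (the function name and the vectorized implementation are ours) computes \(\phi (t;T,a)\) elementwise on the three subdomains.

```python
import numpy as np

def phi(t, T, a):
    """Piecewise quadratic penalty of Eq. (2.1), evaluated elementwise for t >= 0."""
    T2 = T + 1.0 / a**2                                  # T_2 chosen so that phi'(T) = 1/a
    t = np.asarray(t, dtype=float)
    out = np.empty_like(t)
    m1 = t < T                                           # convex, smoothing branch phi_1
    m3 = t >= T2                                         # constant branch phi_3
    m2 = ~(m1 | m3)                                      # concave branch phi_2 on [T, T2)
    out[m1] = a * (T2 - T) / (2.0 * T) * t[m1] ** 2
    out[m2] = -0.5 * a * t[m2] ** 2 + a * T2 * t[m2] - 0.5 * a * T * T2
    out[m3] = 0.5 * a * T2 * (T2 - T)
    return out
```

Plotting phi(np.linspace(0, 1, 500), 0.2, a) for a in {3, 5, 7} reproduces curves analogous to those shown in Fig. 1.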

3 Convexity analysis

In this section, we analyze convexity of the optimization problem in (1.1). More precisely, we seek sufficient conditions on the parameters \(\lambda ,T,a \in {\mathbb R}_+^*\) such that the objective functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) is strictly convex. In the previous section we designed the penalty function \(\phi (\,\cdot \,;T,a)\) in (2.1) in such a way that it is continuously differentiable but not everywhere twice continuously differentiable. This choice, which was motivated by the higher achievable flexibility in the shape of the function, prevents us from using only arguments based on second-order differential quantities in the following analysis of convexity.

We rewrite \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) in explicit double-indexed form:

$$\begin{aligned} \mathcal {J}(u;\lambda ,T,a)= & {} \sum _{(i,j) \in \Omega } \,\,\, \frac{\lambda }{2} \, \left( u_{i,j} - b_{i,j} \right) ^2 \nonumber \\&+ \sum _{(i,j) \in \Omega } \phi \,\left( \sqrt{ \left( u_{i+1,j}-u_{i,j}\right) ^2 {+}\, \left( u_{i,j+1}-u_{i,j}\right) ^2 } \,;\, T,a \right) ,\quad \qquad \end{aligned}$$
(3.1)

where \(\Omega \) represents the image lattice defined by

$$\begin{aligned} \Omega :{=} \left\{ (i,j) \in {\mathbb Z}^2: \; i = 1, \ldots , n_1, \; j = 1,\ldots , n_2 \,\right\} , \end{aligned}$$
(3.2)

with \(n_1\) and \(n_2\) denoting the image height and width, respectively. In (3.1) we used a standard first-order forward finite difference scheme for discretization of the first-order horizontal and vertical partial derivatives. We notice that convexity conditions for the functional \(\mathcal {J}\) depend on the particular finite difference scheme used for discretization of the gradient. Nevertheless, the procedure used below for deriving such conditions can be adapted to other discretization choices.

In the following, we give four lemmas which allow us to reduce convexity analysis from the original functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) of n variables to simpler functions \(f(\,\cdot \,;\lambda ,T,a)\), \(g(\,\cdot \,;\lambda ,T,a)\) and \(h(\,\cdot \,;\lambda ,T,a)\) of three, two and one variables, respectively. Then in Theorem 3.5 we finally state conditions for strict convexity of our functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1).

Lemma 3.1

The function \(\mathcal {J}(\,\cdot \,;\lambda ,T,a): {\mathbb R}^n \rightarrow {\mathbb R}\) defined in (3.1) is strictly convex if the function \(f(\,\cdot \,;\lambda ,T,a): {\mathbb R}^3 \rightarrow {\mathbb R}\) defined by

$$\begin{aligned} f(x_1,x_2,x_3;\lambda ,T,a)= & {} \frac{\lambda }{6}\, \left( x_1^2 + x_2^2 + x_3^2 \right) \nonumber \\&+\, \phi \,\left( \sqrt{ \left( x_2 - x_1 \right) ^2 {+}\, \left( x_3 - x_1 \right) ^2 } \,;\, T,a \right) \end{aligned}$$
(3.3)

is strictly convex.

Proof

First, we notice that the functional \(\mathcal {J}\) in (3.1) can be rewritten as

$$\begin{aligned} \mathcal {J}(u;\lambda ,T,a)= & {} \mathcal {A}(u) + \sum _{(i,j) \in \Omega } \frac{\lambda }{2} \, u_{i,j}^2 \nonumber \\&+ \sum _{(i,j) \in \Omega } \phi \,\left( \sqrt{ \left( u_{i+1,j}-u_{i,j}\right) ^2 {+}\, \left( u_{i,j+1}-u_{i,j}\right) ^2 } \,;\, T,a \right) ,\qquad \quad \end{aligned}$$
(3.4)

where \(\mathcal {A}(u)\) is an affine function of u. Then, we remark that each term of the last sum in (3.4) involves three different pixel locations and that, globally, the last sum involves each pixel location three times. Hence, we can clearly write

$$\begin{aligned} \mathcal {J}(u;\lambda ,T,a)= & {} \mathcal {A}(u) + \sum _{(i,j) \in \Omega } \,\,\, \frac{\lambda }{6} \, \left( u_{i,j}^2 + u_{i+1,j}^2 + u_{i,j+1}^2 \right) \nonumber \\&+ \sum _{(i,j) \in \Omega } \phi \,\left( \sqrt{ \left( u_{i+1,j}-u_{i,j}\right) ^2 {+}\, \left( u_{i,j+1}-u_{i,j}\right) ^2 } \,;\, T,a \right) \nonumber \\= & {} \mathcal {A}(u) + \sum _{(i,j) \in \Omega } f(u_{i,j},u_{i+1,j},u_{i,j+1};\lambda ,T,a). \end{aligned}$$
(3.5)

Since the affine function \(\mathcal {A}(u)\) does not affect convexity, we can conclude that the functional \(\mathcal {J}\) in (3.5) is strictly convex if the function f is strictly convex. \(\square \)

Lemma 3.2

The function \(f(\,\cdot \,;\lambda ,T,a): {\mathbb R}^3 \rightarrow {\mathbb R}\) defined in (3.3) is strictly convex if the function \(g(\,\cdot \,;\lambda ,T,a): {\mathbb R}^2 \rightarrow {\mathbb R}\) defined by

$$\begin{aligned} g(y_1,y_2;\lambda ,T,a) = \frac{\lambda }{18} \left( y_1^2 + y_2^2 \right) +\, \phi \,\left( \sqrt{ y_1^2 + y_2^2 } \,;\, T, a \right) \end{aligned}$$
(3.6)

is strictly convex.

The proof is provided in the “Appendix”.

Lemma 3.3

Let \(\,\,\psi {:}\; {\mathbb R}^2 \rightarrow {\mathbb R}\) be a radially symmetric function defined as

$$\begin{aligned} \psi (x) :{=}\, z\left( \Vert x\Vert _2\right) , \quad \; z{:}\;\, {\mathbb R}_+ \rightarrow {\mathbb R}, \;\;z \in \mathcal {C}^1\left( {\mathbb R}_+\right) . \end{aligned}$$
(3.7)

Then, \(\psi \) is strictly convex in x if and only if the function \(\tilde{z}{:}\; {\mathbb R}\rightarrow {\mathbb R}\) defined by

$$\begin{aligned} \tilde{z}(t) :{=}\, z(|t|) \end{aligned}$$
(3.8)

is strictly convex in t.

The proof is provided in the “Appendix”.

Lemma 3.4

The function \(g(\,\cdot \,;\lambda ,T,a): {\mathbb R}^2 \rightarrow {\mathbb R}\) defined in (3.6) is strictly convex if and only if the function \(h(\,\cdot \,;\lambda ,T,a): {\mathbb R}\rightarrow {\mathbb R}\) defined by

$$\begin{aligned} h(t;\lambda ,T,a) = \frac{\lambda }{18} \,\, t^2 +\, \phi \left( |t|; T,a \right) \end{aligned}$$
(3.9)

is strictly convex.

The proof is immediate by applying Lemma 3.3 to the function g in (3.6).

Theorem 3.5

Let \(\,\phi (\,\cdot \,;T,a): {\mathbb R}_+ \rightarrow {\mathbb R}\) be the penalty function defined in (2.1). Then, a sufficient condition for the functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) to be strictly convex is that the pair of parameters \((\lambda ,a) \in {\mathbb R}_+^* {\times }\; {\mathbb R}_+^*\) satisfies:

$$\begin{aligned} \lambda > 9 \, a \,\;\;{\Longleftrightarrow }\;\;\, \lambda = \tau _c \, 9 \, a , \quad \tau _c \,{\in }\; (1,+\infty ) \; . \end{aligned}$$
(3.10)

Proof

It follows from Lemmas 3.1, 3.2 and 3.4 that a sufficient condition for the functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) to be strictly convex is that the function \(h(\,\cdot \,;\lambda ,T,a)\) in (3.9) is strictly convex. Recalling the definition of the penalty function \(\phi (\,\cdot \,;T,a)\) in (2.1), \(h(\,\cdot \,;\lambda ,T,a)\) in (3.9) can be rewritten in the following explicit form:

$$\begin{aligned} h(t;\lambda ,T,a)= & {} \left\{ \begin{array}{lcll} h_1(t;\lambda ,T,a) &{} :{=} &{} \big ( \frac{\lambda }{18} -\frac{a}{2} + a \frac{T_2}{2T} \big ) t^2 &{}\quad |t| \in [0,T) \\ h_2(t;\lambda ,T,a) &{} :{=} &{} \big ( \frac{\lambda }{18} - \frac{a}{2} \big ) t^2 + a T_2 |t| - \frac{a T T_2}{2} &{}\quad |t| \in [T, T_2) \\ h_3(t;\lambda ,T,a) &{} :{=} &{} \frac{\lambda }{18} \,\, t^2 + \frac{a T_2(T_2-T)}{2} &{}\quad |t| \in [T_2, +\infty ) \end{array} \right. \nonumber \\ \end{aligned}$$
(3.11)

Clearly, the function h above is even and piecewise quadratic; and, as far as regularity is concerned, it is immediate to verify that \(\,h \in \mathcal {C}^\infty ({\mathbb R}\setminus \{\pm T,\pm T_2\}) \,\cap \, \mathcal {C}^1({\mathbb R})\). In particular, the first-order derivative function \(h': {\mathbb R}\rightarrow {\mathbb R}\) and the second-order derivative function \(h'': {\mathbb R}\setminus \{\pm T,\pm T_2\} \rightarrow {\mathbb R}\) are as follows:

$$\begin{aligned} h'(t;\lambda ,T,a)= & {} \left\{ \begin{array}{lcll} h_1'(t;\lambda ,T,a) &{} {=} &{} \left( \frac{\lambda }{9} - a + a \frac{T_2}{T}\right) t &{}\quad |t| \in [0,T) \\ h_2'(t;\lambda ,T,a) &{} {=} &{} \left( \frac{\lambda }{9} - a\right) t + a T_2 \, \mathrm {sign}(t) &{} \quad |t| \in [T, T_2) \\ h_3'(t;\lambda ,T,a) &{} {=} &{} \frac{\lambda }{9} \, t &{}\quad |t| \in [T_2, +\infty ) \end{array} \right. \qquad \end{aligned}$$
(3.12)
$$\begin{aligned} h''(t;\lambda ,T,a)= & {} \left\{ \begin{array}{lcll} h_1''(t;\lambda ,T,a) &{} {=} &{} \frac{\lambda }{9} - a + a \frac{T_2}{T} &{}\quad |t| \in [0,T) \\ h_2''(t;\lambda ,T,a) &{} {=} &{} \frac{\lambda }{9} - a &{}\quad |t| \in (T, T_2) \\ h_3''(t;\lambda ,T,a) &{} {=} &{} \frac{\lambda }{9} &{}\quad |t| \in (T_2, +\infty ) \end{array} \right. . \end{aligned}$$
(3.13)

We notice that the functions h in (3.11) and \(h'\) in (3.12) are both continuous at points \(t \in \{\pm T, \pm T_2\}\), whereas for the function \(h''\) in (3.13) we have at points \(t \in \{T,T_2\}\) (analogously at points \(t \in \{-T,-T_2\}\)):

$$\begin{aligned} \begin{array}{lllclclcll} T:&{}\displaystyle {\lim _{t \uparrow T}} &{} h_1''(t;\lambda ,T,a) &{} {=} &{} \frac{\lambda }{9} - a + a \frac{T_2}{T} &{} \,{\ne }\, &{} \frac{\lambda }{9} - a &{} {=} &{} \displaystyle {\lim _{t \downarrow T}} &{} h_2''(t;\lambda ,T,a) \\ T_2:&{}\displaystyle {\lim _{t \uparrow T_2}} &{} h_2''(t;\lambda ,T,a) &{} {=} &{} \frac{\lambda }{9} - a &{} \,{\ne }\, &{} \frac{\lambda }{9} &{} {=} &{} \displaystyle {\lim _{t \downarrow T_2}} &{} h_3''(t;\lambda ,T,a) \end{array} . \end{aligned}$$
(3.14)

After recalling that \(\lambda ,T,a > 0\) and \(T_2 > T\), we notice that

$$\begin{aligned} h_1''(t;\lambda ,T,a) = \frac{\lambda }{9} + \frac{a}{T} \left( T_2 - T\right)> 0, \qquad h_3''(t;\lambda ,T,a) = \frac{\lambda }{9} > 0, \end{aligned}$$
(3.15)

hence the function h is strictly convex for \(|t| \in [0,T)\) and \(|t| \in (T_2,\infty )\). A sufficient condition (it is also a necessary condition since the function is quadratic) for h to be strictly convex in \(|t| \in (T,T_2)\) is that the second-order derivative \(h_2''(t;\lambda ,T,a)\) defined in (3.13) is positive. This clearly leads to condition (3.10) in the theorem statement. We have thus demonstrated that if (3.10) is satisfied then h is strictly convex for \(t \in {\mathbb R}\setminus \{\pm T,\pm T_2\}\).

It remains to handle the points \(\pm T,\pm T_2\) where the function h does not admit second-order derivatives. Since h is even and continuously differentiable, it is sufficient to demonstrate that if condition (3.10) is satisfied then the first-order derivative \(h'\) is monotonically increasing at points \(t \in \{T,T_2\}\). In particular, we aim to prove:

$$\begin{aligned} T:&\left\{ \begin{array}{llll} h'(t_1;\lambda ,T,a) &{}{<}&{} h'(T;\lambda ,T,a) &{}\quad \forall t_1 \in \, (0,T) \\ h'(t_2;\lambda ,T,a) &{}{>}&{} h'(T;\lambda ,T,a) &{}\quad \forall t_2 \in \, (T,T_2) \end{array} \right. \end{aligned}$$
(3.16)
$$\begin{aligned} T_2:&\left\{ \begin{array}{llll} h'(t_2;\lambda ,T,a) &{}{<}&{} h'(T_2;\lambda ,T,a) &{}\quad \forall t_2 \in \, (T,T_2) \\ h'(t_3;\lambda ,T,a) &{}{>}&{} h'(T_2;\lambda ,T,a) &{}\quad \forall t_3 \in \, (T_2,+\infty ) \end{array} \right. \end{aligned}$$
(3.17)

Recalling the definition of \(h'\) in (3.12), we obtain:

$$\begin{aligned} T:&\left\{ \begin{array}{llll} \left( \frac{\lambda }{9} - a + a \frac{T_2}{T}\right) t_1 &{}{<}&{} \left( \frac{\lambda }{9} - a\right) T + a T_2 &{}\quad \forall t_1 \in \, (0,T) \\ \left( \frac{\lambda }{9} - a\right) t_2 + a T_2 &{}{>}&{} \left( \frac{\lambda }{9} - a\right) T + a T_2 &{}\quad \forall t_2 \in \, (T,T_2) \end{array} \right. \end{aligned}$$
(3.18)
$$\begin{aligned} T_2:&\left\{ \begin{array}{llll} \left( \frac{\lambda }{9} - a\right) t_2 + a T_2 < \frac{\lambda }{9} T_2 &{}\quad \forall t_2 \in \, (T,T_2) \\ \frac{\lambda }{9} t_3 > \frac{\lambda }{9} T_2 &{}\quad \forall t_3 \in \, (T_2,+\infty ) \end{array} \right. \end{aligned}$$
(3.19)

and, after simple algebraic manipulations:

$$\begin{aligned} T:&\left\{ \begin{array}{llll} \big ( \underbrace{\textstyle {\frac{\lambda }{9} - a}}_{>0} + \underbrace{\textstyle {a \frac{T_2}{T}}}_{>0} \big ) \underbrace{\textstyle {\left( t_1-T\right) }}_{<0} < 0 &{}\quad \forall t_1 \in \, (0,T) \\ \underbrace{\textstyle {\left( \frac{\lambda }{9} - a\right) }}_{>0} \underbrace{\textstyle {\left( t_2 - T\right) }}_{>0} > 0 &{}\quad \forall t_2 \in \, (T,T_2) \end{array} \right. \end{aligned}$$
(3.20)
$$\begin{aligned} T_2:&\left\{ \begin{array}{llll} \underbrace{\textstyle {\left( \frac{\lambda }{9} - a\right) }}_{>0} \underbrace{\textstyle {\left( t_2 - T_2\right) }}_{<0} < 0 &{}\quad \forall t_2 \in \, (T,T_2) \\ \underbrace{\textstyle {\frac{\lambda }{9}}}_{>0} \underbrace{\textstyle {\left( t_3 - T_2\right) }}_{>0} > 0 &{}\quad \forall t_3 \in \, (T_2,+\infty ) \end{array} \right. \end{aligned}$$
(3.21)

Since \(\lambda ,T,a > 0\) and we are assuming \(\,\lambda \,{>}\; 9 a\) and \(0< t_1< T< t_2< T_2 < t_3\), inequalities in (3.20) and (3.21) are clearly satisfied, hence the proof is completed.

\(\square \)
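The condition of Theorem 3.5 can also be checked numerically. The short test below (our own verification, not part of the original analysis) samples the scalar function h in (3.9) on a fine grid and inspects its discrete second differences for one value of \(\lambda \) above and one below the bound \(9a\).

```python
import numpy as np

def phi(t, T, a):
    """Penalty (2.1), vectorized for t >= 0, with T2 = T + 1/a^2."""
    T2 = T + 1.0 / a**2
    return np.where(t < T, a * (T2 - T) / (2 * T) * t**2,
           np.where(t < T2, -0.5 * a * t**2 + a * T2 * t - 0.5 * a * T * T2,
                    0.5 * a * T2 * (T2 - T)))

def h(t, lam, T, a):
    """Scalar function (3.9): h(t) = lam/18 * t^2 + phi(|t|; T, a)."""
    return lam / 18.0 * t**2 + phi(np.abs(t), T, a)

T, a = 0.2, 5.0
ts = np.linspace(-1.0, 1.0, 20001)
for lam in (1.1 * 9 * a, 0.5 * 9 * a):          # tau_c = 1.1 (condition holds) vs 0.5 (violated)
    d2 = np.diff(h(ts, lam, T, a), 2)            # discrete second differences of h
    print(f"lambda = {lam:5.2f}: min second difference = {d2.min():+.3e}")
# For lambda > 9a the minimum is positive (h strictly convex, hence J strictly convex);
# for lambda < 9a the function h fails to be convex on (T, T2).
```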

We conclude this section by highlighting some important properties of the functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1).

Definition 3.6

Let \(Z: {\mathbb R}^n \rightarrow {\mathbb R}\) be a (not necessarily smooth) function. Then, Z is said to be \(\mu \)-strongly convex if and only if there exists a constant \(\mu > 0\), called the modulus of strong convexity of \(\,Z\), such that the function \(Z(x) - \frac{\mu }{2} \, \Vert x \Vert _2^2\) is convex.

Proposition 3.7

Let \(\,\phi (\,\cdot \,;T,a): {\mathbb R}_+ \rightarrow {\mathbb R}\) be the penalty function defined in (2.1) and let the pair of parameters \((\lambda ,a) \in {\mathbb R}_+^* {\times }\; {\mathbb R}_+^*\) satisfy condition (3.10). Then, the functional \(\mathcal {J}(\,\cdot \,;\lambda ,T,a)\) in (1.1) is proper, continuous (hence, lower semi-continuous), bounded from below by zero, coercive and \(\mu \)-strongly convex with modulus of strong convexity (at least) equal to

$$\begin{aligned} \mu = \lambda \,\;{-}\;\, 9 \, a = 9 \, a \, \left( \tau _c - 1 \right) \; . \end{aligned}$$
(3.22)

The proof is provided in the “Appendix”.

4 Applying ADMM to the proposed CNC model

In this section, we illustrate in detail the ADMM-based [3] iterative algorithm used to numerically solve the proposed model (1.1) in case that the pair of parameters \((\lambda ,a) \in {\mathbb R}_+^* {\times }\; {\mathbb R}_+^*\) satisfies condition (3.10), so that the objective functional \(\mathcal {J}(u;\lambda ,T,a)\) in (1.1) is strongly convex with modulus of strong convexity given in (3.22). Towards this aim, first we resort to the variable splitting technique [1] and introduce the auxiliary variable \(t \in {\mathbb R}^{2n}\), such that model (1.1) is rewritten in the following linearly constrained equivalent form:

$$\begin{aligned}&\{ \, u^*,t^* \} \,\;{\leftarrow }\;\, \mathrm {arg} \min _{u,t}\,\, \Bigg \{ \, \frac{\lambda }{2} \, \Vert u - b \Vert _2^2 + \sum _{i = 1}^{n} \phi \big ( \Vert t_i \Vert _2 ; T,a \big ) \, \Bigg \} \end{aligned}$$
(4.1)
$$\begin{aligned}&\quad \mathrm {subject}\,\mathrm {to:} \; t = D \, u, \end{aligned}$$
(4.2)

where the matrix \(D \in {\mathbb R}^{2n \times n}\) is defined by

$$\begin{aligned} D :{=}\, (D_h^T,D_v^T)^T , \end{aligned}$$
(4.3)

with \(D_h,D_v \in {\mathbb R}^{n \times n}\) the unscaled finite difference operators approximating the first-order horizontal and vertical partial derivatives of an \(n_1 \times n_2\) image (with \(n_1 n_2 = n\)) in vectorized column-major form, respectively, according to the unscaled first-order forward scheme considered in the previous section. More precisely, matrices \(D_h\), \(D_v\) are defined by

$$\begin{aligned} D_h = L_{n_2} \,\;{\otimes }\; I_{n_1}, \qquad D_v = I_{n_2} \,\;{\otimes }\; L_{n_1}, \end{aligned}$$
(4.4)

where \(\otimes \) is the Kronecker product operator and \(L_{z} \in {\mathbb R}^{z \times z}\) denotes the unscaled forward finite difference operator approximating the first-order derivative of a z-samples 1D signal. In particular, we assume periodic boundary conditions for the unknown image u, such that matrix \(L_z\) takes the form

$$\begin{aligned} L_{z} \,\,= \left[ \begin{array}{r@{\quad }r@{\quad }r@{\quad }r@{\quad }r} -1 &{} 1 &{} &{} &{} 0 \\ 0 &{} -1 &{} 1 &{} &{} \\ \vdots &{} &{} \ddots &{} \ddots &{} \\ 0 &{} &{} &{} -1 &{} 1 \\ 1 &{} 0 &{} \ldots &{} 0 &{} -1 \end{array} \right] . \end{aligned}$$
(4.5)

In (4.1) we defined \(t_i :{=}\, \big ( (D_h u)_i \,,\, (D_v u)_i \big )^T \in {\mathbb R}^2\) as the discrete gradient of the image u at pixel i, obtained by selecting the \(i\)-th and \((i+n)\)-th entries of the vector t. We remark that the auxiliary variable t has been introduced in (4.1)–(4.2) to transfer the discrete gradient operators \((\nabla u)_i\) in (1.1) out of the non-convex non-smooth regularization term \({\phi (\Vert \, \cdot \, \Vert _2;T,a)}\).
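The operators in (4.3)–(4.5) can be assembled directly from their Kronecker-product definition. The following sparse-matrix sketch (SciPy assumed; function names are ours) builds \(L_z\), \(D_h\), \(D_v\) and D for an \(n_1 \times n_2\) image in column-major ordering.

```python
import scipy.sparse as sp

def forward_diff_periodic(z: int) -> sp.csr_matrix:
    """Unscaled forward difference matrix L_z of (4.5) with periodic boundary conditions."""
    return (sp.eye(z, k=1, format="csr")
            + sp.eye(z, k=-(z - 1), format="csr")   # wrap-around entry in the last row
            - sp.eye(z, format="csr"))

def gradient_operator(n1: int, n2: int) -> sp.csr_matrix:
    """D = [D_h; D_v] of (4.3)-(4.4), acting on column-major vectorized n1 x n2 images."""
    Dh = sp.kron(forward_diff_periodic(n2), sp.eye(n1), format="csr")   # horizontal differences
    Dv = sp.kron(sp.eye(n2), forward_diff_periodic(n1), format="csr")   # vertical differences
    return sp.vstack([Dh, Dv], format="csr")
```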

To solve problem (4.1)–(4.2), we define the augmented Lagrangian functional

$$\begin{aligned} \mathcal {L}(u,t;\rho )= & {} \displaystyle { \frac{\lambda }{2} \, \Vert u - b \Vert _2^2 + \sum _{i = 1}^{n} \phi \big ( \Vert t_i \Vert _2 ; T,a \big ) } \nonumber \\&\displaystyle { {-}\; \langle \, \rho , t - D u \, \rangle + \frac{\beta }{2} \, \Vert t - D u \Vert _2^2, } \end{aligned}$$
(4.6)

where \(\beta > 0\) is a scalar penalty parameter and \(\rho \in {\mathbb R}^{2n}\) is the vector of Lagrange multipliers associated with the system of linear constraints in (4.2). We then consider the following saddle-point problem:

$$\begin{aligned} \mathrm {Find}&\;\;\, (u^*,t^*;\rho ^*) \;\;{\in }\;\; {\mathbb R}^n {\times }\;\, {\mathbb R}^{2n} {\times }\;\, {\mathbb R}^{2n} \nonumber \\ \mathrm {s.t.}&\;\;\, \mathcal {L}\,(u^*,t^*;\rho ) \,\,{\le }\;\, \mathcal {L}\,(u^*,t^*;\rho ^*) \,\le \mathcal {L}\,(u,t;\rho ^*) \nonumber \\&\;\;\, \forall \, (u,t;\rho ) \;\;\,{\in }\;\; {\mathbb R}^n {\times }\;\, {\mathbb R}^{2n} {\times }\;\, {\mathbb R}^{2n}, \end{aligned}$$
(4.7)

with the augmented Lagrangian functional \(\mathcal {L}\) defined in (4.6).

Given the previously computed (or initialized for \(k = 1\)) vectors \(t^{(k-1)}\) and \(\rho ^{(k)}\), the k-th iteration of the proposed ADMM-based scheme applied to the solution of the saddle-point problem (4.6)–(4.7) reads as follows:

$$\begin{aligned} u^{(k)}&\;{\leftarrow }\;\;\,\, \mathrm {arg} \, \min _{u \in {\mathbb R}^n} \; \mathcal {L}(u,t^{(k-1)};\rho ^{(k)}) , \end{aligned}$$
(4.8)
$$\begin{aligned} t^{(k)}&\;{\leftarrow }\;\;\,\, \mathrm {arg} \, \min _{t \in {\mathbb R}^{2n}} \; \mathcal {L}(u^{(k)},t;\rho ^{(k)}) , \end{aligned}$$
(4.9)
$$\begin{aligned} \rho ^{(k+1)}&\;{\leftarrow }\;\;\,\, \rho ^{(k)} \;{-}\; \beta \, \big ( \, t^{(k)} \;{-}\; D u^{(k)} \, \big ). \end{aligned}$$
(4.10)

In Sect. 5 we will show that, under mild conditions on the penalty parameter \(\beta \), the iterative scheme in (4.8)–(4.10) converges to a solution of the saddle-point problem (4.7), that is to a saddle point \((u^*,t^*;\rho ^*)\) of the augmented Lagrangian functional in (4.6), with \(u^*\) representing the unique solution of the strongly convex minimization problem (1.1).

In the following subsections we show in detail how to solve the two minimization sub-problems (4.8) and (4.9) for the primal variables u and t, respectively, then we present the overall iterative ADMM-based minimization algorithm.

4.1 Solving the sub-problem for \(\varvec{u}\)

Given \(t^{(k-1)}\) and \(\rho ^{(k)}\), and recalling the definition of the augmented Lagrangian functional in (4.6), the minimization sub-problem for u in (4.8) can be rewritten as follows:

$$\begin{aligned} u^{(k)} \;{\leftarrow }\;\, \mathrm {arg} \min _{u \in {\mathbb R}^n}\, \bigg \{ \, \frac{\lambda }{2} \, \Vert u - b \Vert _2^2 + \big \langle \, \rho ^{(k)} , D u \, \big \rangle + \frac{\beta }{2} \, \big \Vert t^{(k-1)} - D u \big \Vert _2^2 \,\bigg \}, \end{aligned}$$
(4.11)

where constant terms have been omitted. The quadratic minimization problem (4.11) has first-order optimality conditions which lead to the following linear system:

$$\begin{aligned} \left( I_n + \frac{\beta }{\lambda } \, D^T D \right) u = b \,+\, \frac{\beta }{\lambda } \, D^T \left( t^{(k-1)} - \frac{1}{\beta } \, \rho ^{(k)} \right) . \end{aligned}$$
(4.12)

We remark that from the definition (4.3) of matrix \(D \in {\mathbb R}^{2n \times n}\), it follows that \(D^T D \in {\mathbb R}^{n \times n}\) in (4.12) can be written as \(D_h^T D_h + D_v^TD_v\), with \(D_h, D_v \in {\mathbb R}^{n \times n}\) defined in (4.4)–(4.5). Hence, \(D^T D\) represents a 5-point stencil finite difference discretization of the negative 2D Laplace operator. Since, moreover, \(\beta /\lambda > 0\), the coefficient matrix of the linear system (4.12) is symmetric, positive definite and highly sparse, therefore (4.12) can be solved very efficiently by the iterative (preconditioned) conjugate gradient method.

However, under appropriate assumptions about the solution u near the image boundary, the linear system can be solved even more efficiently by a direct method.

In particular, since we are assuming periodic boundary conditions for u, the matrix \(D^T D\) is block circulant with circulant blocks, so that the coefficient matrix \((I + \frac{\beta }{\lambda } \, D^T D)\) in (4.12) can be completely diagonalized by the 2D discrete Fourier transform (FFT). It follows that at any ADMM iteration the u-subproblem, i.e. the linear system (4.12), can be solved explicitly by one forward and one inverse 2D FFT, each at a cost of \(O(n \log n)\) operations.

We remark that the same computational cost can be achieved by using reflective or anti-reflective boundary conditions, see [10, 12, 26].
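A possible implementation of the FFT-based solve of (4.12) under periodic boundary conditions is sketched below (helper names are ours). It applies D and \(D^T\) by circular shifts and uses the closed-form eigenvalues of \(D^T D\) on the DFT basis; for clarity the symbol is recomputed at each call, whereas in practice it would be precomputed once.

```python
import numpy as np

def grad(u):
    """Forward differences with periodic BC: horizontal and vertical parts of D u."""
    return np.roll(u, -1, axis=1) - u, np.roll(u, -1, axis=0) - u

def div_adj(th, tv):
    """Adjoint operator D^T applied to t = (th, tv)."""
    return (np.roll(th, 1, axis=1) - th) + (np.roll(tv, 1, axis=0) - tv)

def solve_u_subproblem(b, th, tv, rho_h, rho_v, lam, beta):
    """Solve the linear system (4.12) by one forward and one inverse 2D FFT."""
    n1, n2 = b.shape
    wy = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n1) / n1)
    wx = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n2) / n2)
    symbol = 1.0 + (beta / lam) * (wy[:, None] + wx[None, :])   # eigenvalues of I + (beta/lam) D^T D
    rhs = b + (beta / lam) * div_adj(th - rho_h / beta, tv - rho_v / beta)
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / symbol))
```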

4.2 Solving the sub-problem for \(\varvec{t}\)

Given \(u^{(k)}\) and \(\rho ^{(k)}\), and recalling the definition of the augmented Lagrangian functional in (4.6), after some simple algebraic manipulations the minimization sub-problem for t in (4.9) can be rewritten in the following component-wise (or pixel-wise) form:

$$\begin{aligned}&t^{(k)} \;{\leftarrow }\; \mathrm {arg} \, \min _{t \in {\mathbb R}^{2n}} \, \sum _{i=1}^{n} \left( \, \phi \big ( \Vert t_i \Vert _2 ; T,a \big ) \,{+}\; \frac{\beta }{2} \big \Vert \, t_i - r_i^{(k)} \big \Vert _2^2 \, \right) , \end{aligned}$$
(4.13)

where the n vectors \(r_i^{(k)} \in {\mathbb R}^{2}\), \(i = 1,\ldots ,n\), which are constant with respect to the optimization variable t, are defined by:

$$\begin{aligned} r_i^{(k)} =\, \big ( D u^{(k)} \big )_i + \frac{1}{\beta } \, \big ( \rho ^{(k)} \big )_i, \end{aligned}$$
(4.14)

with \(\,\big (D u^{(k)}\big )_i\), \(\big (\rho ^{(k)}\big )_{i} \in {\mathbb R}^2\) denoting the discrete gradient and the associated pair of Lagrange multipliers at pixel i, respectively. The minimization problem in (4.13) is thus equivalent to the following n independent 2-dimensional problems:

$$\begin{aligned} t_i^{(k)} {\leftarrow }\; \mathrm {arg} \min _{t_i {\in } {\mathbb R}^2} \left\{ \theta _i(t_i) :{=} \phi \big ( \Vert t_i \Vert _2;T,a \big ) \;{+}\, \frac{\beta }{2} \big \Vert \, t_i - r_i^{(k)} \big \Vert _2^2 \right\} , \quad i = 1, \ldots , n, \end{aligned}$$
(4.15)

where, for future reference, we introduced the cost functions \(\theta _i: {\mathbb R}^2 \rightarrow {\mathbb R}\), \(i = 1,\ldots ,n\).

Since we are imposing that condition (3.10) is satisfied, such that the original functional \(\mathcal {J}(u;\lambda ,T,a)\) in (1.1) is strictly convex, we clearly aim at avoiding non-convexity of the ADMM sub-problems (4.15). In the first part of Proposition 4.1 below, whose proof is provided in the “Appendix”, we give a necessary and sufficient condition for strict convexity of the cost functions in (4.15).

Proposition 4.1

Let \(\,T,a,\beta \,{\in }\; {\mathbb R}_+^*\) and \(r \,{\in }\; {\mathbb R}^2\) be given constants, and let \(\phi (\,\cdot \,;T,a): {\mathbb R}_+ \rightarrow {\mathbb R}\) be the penalty function defined in (2.1). Then:

  (1) The function

    $$\begin{aligned} \theta (x) :{=} \phi \left( \Vert x \Vert _2;T,a \right) + \frac{\beta }{2} \, \Vert x - r \Vert _2^2, \quad x \in {\mathbb R}^2, \end{aligned}$$
    (4.16)

    is strictly convex (convex) if and only if the following condition holds:

    $$\begin{aligned} \beta > a \, \; \quad (\beta \,\;{\ge }\;\, a \,). \end{aligned}$$
    (4.17)
  (2) In case that (4.17) holds, the strictly convex minimization problem

    $$\begin{aligned} \arg \min _{x \in {\mathbb R}^2} \, \theta (x) \end{aligned}$$
    (4.18)

    admits the unique solution \(\,x^* {\in }\;\, {\mathbb R}^2\) given by the following shrinkage operator:

    $$\begin{aligned} x^* = \xi ^* r,\quad \mathrm {with} \;\; \xi ^* \, {\in } \; (0,1] \end{aligned}$$
    (4.19)

    equal to

    $$\begin{aligned} \begin{array}{rllcl} \mathrm {a)} \;&{} \xi ^* \,{=}\;\,\, \kappa _1 &{} \,\;\mathrm {if} \;\; \Vert r \Vert _2 &{}{\in }&{} \big [0,\kappa _0\big ) \\ \mathrm {b)} \;&{} \xi ^* \,{=}\;\,\, \kappa _2 - \kappa _3 / \Vert r \Vert _2 &{} \,\;\mathrm {if} \;\; \Vert r \Vert _2 &{}{\in }&{} \big [\kappa _0,T_2\big ) \\ \mathrm {c)} \;&{} \xi ^* \,{=}\;\,\, 1 &{} \,\;\mathrm {if} \;\; \Vert r \Vert _2 &{}{\in }&{} \left[ T_2,+\infty \right) \\ \end{array} \end{aligned}$$
    (4.20)

    and with

    $$\begin{aligned} \kappa _0 = T + \frac{a}{\beta }\,(T_2-T), \quad \;\, \kappa _1 = \frac{T}{\kappa _0}, \quad \;\, \kappa _2 = \frac{\beta }{\beta -a}, \quad \;\, \kappa _3 = \frac{a T_2}{\beta - a}. \end{aligned}$$
    (4.21)

Based on (4.16)–(4.17) in Proposition 4.1, we can state that all the problems in (4.15) are strictly convex if and only if

$$\begin{aligned} \beta \;{>}\; a \; . \end{aligned}$$
(4.22)

In case that (4.22) is satisfied, the unique solutions of the strictly convex problems (4.15) can be obtained based on (4.19) in Proposition 4.1, that is:

$$\begin{aligned} t_i^{(k)} =\, \xi _i^{(k)} \, r_i^{(k)},\quad i = 1, \ldots , n, \end{aligned}$$
(4.23)

where the shrinkage coefficients \(\,\xi _i^{(k)} {\in }\, (0 , 1]\) are computed according to (4.20). We notice that coefficients \(\kappa _0,\kappa _1,\kappa _2,\kappa _3\) in (4.21) are constant during the ADMM iterations, hence they can be precomputed once and for all at the beginning. The solutions of problems (4.15) can thus be determined very efficiently by the closed forms given in (4.20), with computational cost O(n).
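A vectorized NumPy sketch of the closed-form shrinkage (4.19)–(4.21), applied to all pixels at once, is given below (the function name and the small guard against division by zero are ours; \(\beta > a\) is assumed, as required by (4.22)).

```python
import numpy as np

def t_subproblem(rh, rv, T, a, beta):
    """Pixelwise shrinkage (4.19)-(4.21) applied to r = (rh, rv); requires beta > a."""
    T2 = T + 1.0 / a**2
    k0 = T + (a / beta) * (T2 - T)                   # kappa_0
    k1 = T / k0                                      # kappa_1
    k2 = beta / (beta - a)                           # kappa_2
    k3 = a * T2 / (beta - a)                         # kappa_3
    norm_r = np.sqrt(rh**2 + rv**2)
    xi = np.ones_like(norm_r)                                                 # case c)
    xi = np.where(norm_r < T2, k2 - k3 / np.maximum(norm_r, 1e-12), xi)       # case b)
    xi = np.where(norm_r < k0, k1, xi)                                        # case a)
    return xi * rh, xi * rv
```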

4.3 ADMM-based minimization algorithm

To summarize previous results, in Algorithm 1 we report the main steps of the proposed ADMM-based iterative scheme used to solve the saddle-point problem (4.6)–(4.7) or, equivalently (as it will be proven in the next section), the minimization problem (1.1). We remark that the constraint on the ADMM penalty parameter \(\beta \) in Algorithm 1 is more stringent than that in (4.22) which guarantees the convexity of the ADMM subproblem for the primal variable t. Indeed, the more stringent requirement is needed for the analysis of convergence that will be carried out in Sect. 5.

Algorithm 1: ADMM-based iterative scheme for solving the saddle-point problem (4.6)–(4.7)
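Putting the pieces together, a minimal sketch of the overall iteration (4.8)–(4.10) could look as follows. It illustrates the main steps rather than reproducing Algorithm 1 verbatim; it relies on the helper functions grad, div_adj, solve_u_subproblem and t_subproblem from the sketches in Sects. 4.1–4.2, and the stopping rule on the relative change of u is our own choice.

```python
import numpy as np

def cnc_admm(b, lam, T, a, beta, max_iter=200, tol=1e-5):
    """ADMM iteration (4.8)-(4.10) for model (1.1); lam > 9a and beta as in Algorithm 1 assumed."""
    u = b.copy()
    th, tv = grad(u)                                   # t^(0) = D u^(0)
    rho_h = np.zeros_like(b)                           # Lagrange multipliers initialized to zero
    rho_v = np.zeros_like(b)
    for _ in range(max_iter):
        u_old = u
        u = solve_u_subproblem(b, th, tv, rho_h, rho_v, lam, beta)    # u-step (4.8)
        gh, gv = grad(u)
        th, tv = t_subproblem(gh + rho_h / beta,                      # t-step (4.9), r as in (4.14)
                              gv + rho_v / beta, T, a, beta)
        rho_h = rho_h - beta * (th - gh)                              # multiplier update (4.10)
        rho_v = rho_v - beta * (tv - gv)
        if np.linalg.norm(u - u_old) <= tol * np.linalg.norm(u_old):
            break
    return u
```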

5 Convergence analysis

In this section, we analyze convergence of the proposed ADMM-based minimization approach, whose main computational steps are reported in Algorithm 1. In particular, we prove convergence of Algorithm 1 in case that conditions (3.10) and (5.16) are satisfied.

To simplify the notations in the subsequent discussion, we give the following definitions concerning the objective functional in the (u, t)-split problem (4.1)–(4.2):

$$\begin{aligned} G(u,t) :{=} \underbrace{\frac{\lambda }{2} \, \Vert u - b \Vert _2^2}_{F(u)} + \underbrace{\sum _{i = 1}^{n} \phi \big ( \Vert t_i \Vert _2 ; T,a \big )}_{R(t)}, \end{aligned}$$
(5.1)

where the parametric dependencies of the fidelity term F(u) on \(\lambda \) and of the regularization term R(t) on T and a are dropped for brevity. The augmented Lagrangian functional in (4.6) can thus be rewritten more concisely as

$$\begin{aligned} \mathcal {L}(u,t;\rho ) = F(u) + R(t) \;{-}\; \langle \, \rho , t - D u \, \rangle + \frac{\beta }{2} \, \Vert t - D u \Vert _2^2, \end{aligned}$$
(5.2)

and the regularization term in the original proposed model (1.1), referred to as \(\mathcal {R}(u)\), reads

$$\begin{aligned} \mathcal {R}(u) = R(Du). \end{aligned}$$
(5.3)

To prove convergence, we follow the same methodology used, e.g., in [35], for convex optimization problems, based on optimality conditions of the augmented Lagrangian functional with respect to the pair of primal variables (ut) and on the subsequent construction of suitable Fejér-monotone sequences. However, in [35] and other works using the same abstract framework for proving convergence, such as [36] where convex non-quadratic fidelity terms are considered, the regularization term of the convex objective functional is also convex (Total Variation semi-norm in [35, 36]). In contrast, in our CNC model (1.1) the total objective functional \(\mathcal {J}\) is convex but the regularizer is non-convex. This calls for a suitable adaptation of the above mentioned proofs, in particular of the proof in [35], where the same \(\ell _2\) fidelity term as in our model (1.1) is considered.

Our proof will be articulated into the following parts:

  1. derivation of optimality conditions for problem (1.1);

  2. derivation of convexity conditions for the augmented Lagrangian in (5.2);

  3. demonstration of equivalence (in terms of solutions) between the split problem (4.1)–(4.2) and the saddle-point problem (4.6)–(4.7);

  4. demonstration of convergence of Algorithm 1 to a solution of the saddle-point problem (4.6)–(4.7), hence to the unique solution of (1.1).

5.1 Optimality conditions for problem (1.1)

Since the regularization term in our model (1.1) is non-smooth non-convex, unlike in [35] we need to resort also to concepts from calculus for non-smooth non-convex functions, namely the generalized (or Clarke) gradients [11]. In the following we will denote by \(\partial _x [\, f \,](x^*)\) and by \(\bar{\partial }_x [\, f \,](x^*)\) the subdifferential (in the sense of convex analysis [29, 30]) and the Clarke generalized gradient [11], respectively, with respect to x of the function f calculated at \(x^*\).

In Lemma 5.1 below we give some results on locally Lipschitz continuity for the functions involved in the subsequent demonstrations, which are necessary for the generalized gradients to be defined. Then, in Proposition 5.2, whose proof is given in [20], we give the first-order optimality conditions for problem (1.1).

Lemma 5.1

For any pair of parameters \((\lambda ,a)\) satisfying condition (3.10), the functional \(\mathcal {J}\) in (1.1) and, separately, the regularization term \(\mathcal {R}\) in (5.3) and the quadratic fidelity term, are locally Lipschitz continuous functions.

Proof

The proof is immediate by considering that the quadratic fidelity term and, under condition (3.10), the total functional \(\mathcal {J}\) in (1.1), are both convex functions, hence locally Lipschitz, and that the regularization term \(\mathcal {R}\) in (5.3) is the composition of locally Lipschitz functions (note that the penalty function \(\phi (\,\cdot \,;T,a)\) defined in (2.1) is globally L-Lipschitz continuous with \(L=a(T_2-T)\)), hence locally Lipschitz. \(\square \)

Proposition 5.2

For any pair of parameters \((\lambda ,a)\) satisfying condition (3.10), the functional \(\mathcal {J}: {\mathbb R}^n \rightarrow {\mathbb R}\) in (1.1) has a unique (global) minimizer \(u^*\) which satisfies

$$\begin{aligned} 0 \;\;{\in }\;\, \partial _u \left[ \, \mathcal {J} \,\right] (u^*), \end{aligned}$$
(5.4)

where 0 denotes the null vector in \({\mathbb R}^n\) and \(\partial _u \big [\,\mathcal {J}\,\big ](u^*) \subset {\mathbb R}^n\) represents the subdifferential (with respect to u, calculated at \(u^*\)) of functional \(\mathcal {J}\). Moreover, it follows that

$$\begin{aligned} 0 \;\;{\in }\;\, D^T \, \bar{\partial }_{t} \left[ \, R \,\right] (Du^*) \,+\, \lambda \, (u^* - b ), \end{aligned}$$
(5.5)

where \(\,\bar{\partial }_{t}\big [\,R\,\big ](Du^*) \subset {\mathbb R}^{2n}\) denotes the Clarke generalized gradient (with respect to t, calculated at \(Du^*\)) of the non-convex non-smooth regularization function R defined in (5.1).

5.2 Convexity conditions for the augmented Lagrangian in (5.2)

Subsequent parts 3. and 4. of our convergence analysis require that the augmented Lagrangian functional in (5.2) is jointly convex with respect to the pair of primal variables (u, t). In [35, 36], where the regularization term is convex, such property is clearly satisfied for any positive value of the penalty parameter \(\beta \). In our case, where the regularization term is non-convex, this is not trivially true and needs some investigation, which is the subject of Lemma 5.3 and Proposition 5.4 below.

Lemma 5.3

Let \(F: {\mathbb R}^n \rightarrow {\mathbb R}\) be the fidelity function defined in (5.1), \(D \in {\mathbb R}^{2n \times n}\) the finite difference matrix defined in (4.3)–(4.5), \(b \in {\mathbb R}^n\), \(v \in {\mathbb R}^{2n}\) given constant vectors and \(\lambda \in {\mathbb R}_+^*\), \(\gamma \in {\mathbb R}\) free parameters. Then, the function \(Z: {\mathbb R}^n \rightarrow {\mathbb R}\) defined by

$$\begin{aligned} Z(u;\lambda ,\gamma ) :{=}\, F(u) \;{-}\; \frac{\gamma }{2} \, \Vert Du - v \Vert _2^2 \,{=}\, \frac{\lambda }{2} \, \Vert u - b \Vert _2^2 \;{-}\; \frac{\gamma }{2} \, \Vert Du - v \Vert _2^2 \end{aligned}$$
(5.6)

is convex in u if and only if

$$\begin{aligned} \gamma \le \frac{\lambda }{8}. \end{aligned}$$
(5.7)

Moreover, if the parameter \(\lambda \) satisfies condition (3.10), (5.7) becomes

$$\begin{aligned} \gamma \le \tau _c \, \frac{9}{8}a, \quad \tau _c \,{\in }\; (1,+\infty ). \end{aligned}$$
(5.8)

Proof

The function Z in (5.6) is quadratic in u, hence it is convex in u if and only if its Hessian \(H_Z\,{\in }\; {\mathbb R}^{n \times n}\) is at least positive semidefinite, that is if and only if

$$\begin{aligned} H_Z = \lambda \, I_n \;{-}\; \gamma \, D^T D \;{\succeq }\; 0, \end{aligned}$$
(5.9)

where, for a given real symmetric matrix M, the notation \(M \succeq 0\) indicates that M is positive semidefinite. Since the matrix \(\,D^T D \in {\mathbb R}^{n \times n}\) is symmetric and positive semidefinite, it admits the eigenvalue decomposition

$$\begin{aligned} D^T D = V E V^T , \quad E = \mathrm {diag}(e_1,e_2,\ldots ,e_n), \quad V^T V = V V^T = I, \end{aligned}$$
(5.10)

with \(\,e_i \,{\in }\; {\mathbb R}_+, \, i=1,\ldots ,n,\,\) indicating the real non-negative eigenvalues of \(D^TD\). Replacing (5.10) into (5.9), we obtain:

$$\begin{aligned} H_Z= & {} \lambda \, I \;{-}\; \gamma \, V^T E V = V^T \left( \lambda I - \gamma \, E \right) V \nonumber \\= & {} V^T \, \mathrm {diag}(\lambda - \gamma \, e_1, \, \lambda - \gamma \, e_2, \, \ldots , \, \lambda - \, \gamma e_n) \, V \;{\succeq }\; 0. \end{aligned}$$
(5.11)

Condition (5.11) is equivalent to

$$\begin{aligned} \lambda - \gamma e_i \;{\ge }\; 0 \;\; \forall i \in \{1,\ldots ,n\} \,\;{\Longleftrightarrow }\;\, \lambda - \gamma \max _{i} e_i \;{\ge }\; 0 \,\;{\Longleftrightarrow }\;\, \gamma \;{\le }\; \frac{\lambda }{\max _{i} e_i}.\quad \end{aligned}$$
(5.12)

As previously highlighted, the matrix \(D^TD \in {\mathbb R}^{n \times n}\) represents a 5-point stencil unscaled finite difference approximation of the negative Laplace operator applied to an \(n_1 \times \, n_2\) vectorized image, with \(n = n_1 n_2\). In order to obtain an upper bound for the maximum eigenvalue \(\max _{i} e_i\) of \(D^T D\), first we recall definitions (4.3)–(4.4) and rewrite \(D^T D\) as

$$\begin{aligned} D^T D= & {} D_h^T D_h + D_v^T D_v \nonumber \\= & {} \big ( L_{n_2} \,{\otimes }\; I_{n_1} \big )^T \big ( L_{n_2} \,{\otimes }\; I_{n_1} \big ) + \big ( I_{n_2} \,{\otimes }\; L_{n_1} \big )^T \big ( I_{n_2} \,{\otimes }\; L_{n_1} \big ) \nonumber \\= & {} \big ( L_{n_2}^T L_{n_2} \big ) \;{\otimes }\; I_{n_1} + I_{n_2} \;{\otimes }\; \big ( L_{n_1}^T L_{n_1} \big ) \end{aligned}$$
(5.13)
$$\begin{aligned}= & {} \big ( L_{n_2}^T L_{n_2} \big ) \;{\oplus }\; \big ( L_{n_1}^T L_{n_1} \big ), \end{aligned}$$
(5.14)

where (5.13) follows from properties of the Kronecker product operator and \(\oplus \) in (5.14) denotes the Kronecker sum operator. It thus follows from (5.14) and from the properties of the eigenvalues of the Kronecker sum of two matrices that the maximum eigenvalue of \(D^T D\) is equal to the sum of the maximum eigenvalues of \(\,L_{n_1}^T L_{n_1} \in {\mathbb R}^{n_1 \times n_1}\) and \(L_{n_2}^T L_{n_2} \in {\mathbb R}^{n_2 \times n_2}\), with the prototypical matrix \(L_z \in {\mathbb R}^{z \times z}\) defined in (4.5). From (4.5), we have that matrix \(L_z^T L_z \in {\mathbb R}^{z \times z}\) takes the form

$$\begin{aligned} L_z^T L_z = \left[ \begin{array}{r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r} 2 &{} -1 &{} 0 &{} \ldots &{} 0 &{} -1 \\ -1 &{} 2 &{} -1 &{} &{} &{} 0 \\ 0 &{} -1 &{} \ddots &{} \ddots &{} &{} \vdots \\ \vdots &{} &{} \ddots &{} \ddots &{} -1 &{} 0 \\ 0 &{} &{} &{} -1 &{} 2 &{} -1 \\ -1 &{} 0 &{} \ldots &{} 0 &{} -1 &{} 2 \end{array} \right] . \end{aligned}$$
(5.15)

By using Gershgorin’s theorem [39], we can state that all the real eigenvalues of the matrix \(L_z^T L_z\) in (5.15) belong to the interval [0, 4] independently of the order z. It follows that the eigenvalues of both matrices \(\,L_{n_1}^T L_{n_1}\) and \(\,L_{n_2}^T L_{n_2}\) are bounded from above by 4 and, hence, the maximum eigenvalue \(\max _{i} e_i\) of \(D^T D\) is bounded from above by 8. Substituting 8 for \(\max _{i} e_i\) in (5.12), condition (5.7) and condition (5.8) follow. \(\square \)
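The bound \(\max _{i} e_i \le 8\) can also be confirmed numerically on small instances; the snippet below (our own check, SciPy assumed) builds \(D^T D\) for a small image and compares its largest eigenvalue with 8.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def L(z):
    """Periodic forward difference matrix (4.5)."""
    return sp.eye(z, k=1) + sp.eye(z, k=-(z - 1)) - sp.eye(z)

n1, n2 = 17, 23                                   # small test image size
Dh = sp.kron(L(n2), sp.eye(n1))
Dv = sp.kron(sp.eye(n2), L(n1))
DtD = (Dh.T @ Dh + Dv.T @ Dv).tocsc()
e_max = spla.eigsh(DtD, k=1, which="LA", return_eigenvectors=False)[0]
print(f"largest eigenvalue of D^T D = {e_max:.4f}  (upper bound from Gershgorin: 8)")
```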

Proposition 5.4

For any given vector of Lagrange multipliers \(\rho \in {\mathbb R}^{2n}\), the augmented Lagrangian functional \(\mathcal {L}(u,t;\rho )\) in (5.2) is proper, continuous and coercive jointly in the pair of primal variables (u, t). Moreover, in case that condition (3.10) is satisfied, \(\mathcal {L}(u,t;\rho )\) is jointly convex in (u, t) if the penalty parameter \(\beta \) satisfies

$$\begin{aligned} \beta \,\;{\ge }\;\,\, a \, \frac{\lambda }{\lambda - 8a}, \end{aligned}$$
(5.16)

or, equivalently

$$\begin{aligned} \beta \,\;{\ge }\;\,\, 9 \, a \, \frac{\tau _c}{9\tau _c \, -8}, \,\;\quad \tau _c \,{\in }\; (1,+\infty ). \end{aligned}$$
(5.17)

Proof

Functional \(\mathcal {L}(u,t;\rho )\) in (5.2) is clearly proper and continuous in (u, t). For what concerns coercivity, we rewrite \(\mathcal {L}(u,t;\rho )\) as follows:

$$\begin{aligned} \mathcal {L}(u,t;\rho )&\,{=}\,&F(u) \,+ R(t) \,+\, \frac{\beta }{2} \, \Vert t - D u \Vert _2^2 \,\;{-}\;\, \langle \rho , t - D u \rangle \end{aligned}$$
(5.18)
$$\begin{aligned}= & {} F(u) \,+ R(t) \,+\, \frac{\beta }{2} \, \Big \Vert t - D u - \frac{1}{\beta } \, \rho \Big \Vert _2^2 \;{-}\; \frac{1}{2 \beta } \, \Vert \rho \Vert _2^2. \end{aligned}$$
(5.19)

We notice that the second term in (5.19) is bounded and the last term does not depend on (u, t), hence they do not affect coercivity of \(\mathcal {L}(u,t;\rho )\) with respect to (u, t). Then, the first and third terms in (5.19) are jointly coercive in u and the third term is coercive in t, hence \(\mathcal {L}(u,t;\rho )\) is coercive in (u, t).

Starting from (5.18), we have:

$$\begin{aligned} \mathcal {L}(u,t;\rho )= & {} F(u) \;{-}\; \frac{\gamma _1}{2} \, \Vert Du \Vert _2^2 + R(t) + \frac{\gamma _2}{2} \, \Vert t \Vert _2^2 \nonumber \\&+ \frac{\gamma _1}{2} \, \Vert Du \Vert _2^2 \;{-}\; \frac{\gamma _2}{2} \, \Vert t \Vert _2^2 + \frac{\beta }{2} \, \Vert t - D u \Vert _2^2 \;{-}\; \langle \rho , t - D u \rangle \nonumber \\= & {} \;{-}\; \underbrace{\langle \rho , t - D u \rangle }_{\mathcal {L}_1(u,t)} + \underbrace{F(u) \;{-}\; \frac{\gamma _1}{2} \, \Vert Du \Vert _2^2}_{\mathcal {L}_2(u)} + \underbrace{R(t) + \frac{\gamma _2}{2} \, \Vert t \Vert _2^2}_{\mathcal {L}_3(t)} \nonumber \\&{+}\, \underbrace{ \frac{\beta +\gamma _1}{2} \, \Vert Du \Vert _2^2 \;{-}\; \beta \, \langle D u , t \rangle + \frac{\beta -\gamma _2}{2} \, \Vert t \Vert _2^2}_{\mathcal {L}_4(u,t)}, \end{aligned}$$
(5.20)

where \(\gamma _1\) and \(\gamma _2\) are scalars satisfying

$$\begin{aligned} \gamma _1 \;{\le }\;\, \tau _c \frac{9}{8}a, \qquad \gamma _2 \;{\ge }a, \end{aligned}$$
(5.21)

such that, according to condition (5.8) in Lemma 5.3 and condition (4.17) in Proposition 4.1, the terms \(\mathcal {L}_2(u)\) and \(\mathcal {L}_3(t)\) in (5.20) are convex in u and t (hence also in (u, t)), respectively. The term \(\mathcal {L}_1(u,t)\) is affine in (u, t), hence convex. For what concerns \(\mathcal {L}_4(u,t)\), it is jointly convex in (u, t) if it can be reduced to the form \(+\big \Vert \, c_1 Du - c_2 t \big \Vert _2^2\) with \(c_1,c_2 > 0\). We thus impose that the coefficients of the terms \(\Vert Du\Vert _2^2\) and \(\Vert t\Vert _2^2\) in (5.20) are positive and that twice the product of the square roots of these coefficients is equal to the coefficient of the term \(-\langle Du , t \rangle \) in (5.20), that is:

$$\begin{aligned} \beta \;{>}\; -\gamma _1, \qquad \quad \beta \;{>}\;\, \gamma _2 \, , \qquad \quad \beta \left( \gamma _1 - \gamma _2 \right) = \gamma _1 \gamma _2. \end{aligned}$$
(5.22)

Simple calculations prove that conditions in (5.21)–(5.22) are equivalent to the following:

$$\begin{aligned} \beta \;{=}\, \frac{\gamma _1 \gamma _2}{\gamma _1 \,{-}\, \gamma _2}, \quad (\gamma _1,\gamma _2) {\in }\; \Gamma \, :{=} \, \left\{ (\gamma _1,\gamma _2) \,{\in }\, {\mathbb R}^2 : \, \gamma _1 \;{>}\; \gamma _2, \, \gamma _1 \;{\le }\;\, \tau _c \frac{9}{8} \, a, \, \gamma _2 \;{\ge }\;\, a \right\} . \end{aligned}$$
(5.23)

We notice that the set \(\Gamma \) in (5.23) is not empty and contains only strictly positive pairs \((\gamma _1,\gamma _2)\). Hence, there exist triplets \((\beta ,\gamma _1,\gamma _2)\) such that (5.23) is satisfied and the augmented Lagrangian functional in (5.20) is convex. By simple analysis, we can see that the range of the function \(\beta (\gamma _1,\gamma _2) = \gamma _1 \gamma _2 / (\gamma _1 - \gamma _2)\) with \((\gamma _1,\gamma _2) \in \Gamma \) coincides with (5.17) and, after recalling that \(\lambda = \tau _c 9 a\), with (5.16). This completes the proof. \(\square \)
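The last step can also be verified numerically: the short check below (ours) evaluates \(\beta (\gamma _1,\gamma _2) = \gamma _1 \gamma _2 / (\gamma _1 - \gamma _2)\) on a grid covering \(\Gamma \) and compares its minimum with the right-hand side of (5.17).

```python
import numpy as np

a, tau_c = 2.0, 1.5
g1 = np.linspace(1.0001 * a, tau_c * 9 * a / 8, 400)      # gamma_1 range within Gamma
g2 = np.linspace(a, tau_c * 9 * a / 8, 400)               # gamma_2 >= a
G1, G2 = np.meshgrid(g1, g2, indexing="ij")
beta = np.where(G1 > G2, G1 * G2 / (G1 - G2), np.inf)     # beta only where gamma_1 > gamma_2
print(f"numerical minimum of beta over Gamma           : {beta.min():.4f}")
print(f"lower bound in (5.17), 9*a*tau_c/(9*tau_c - 8) : {9 * a * tau_c / (9 * tau_c - 8):.4f}")
```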

5.3 Equivalence between problems (4.1)–(4.2) and (4.6)–(4.7)

The optimality conditions derived in Sect. 5.1, together with the convexity conditions in Sect. 5.2 and the results in the following Lemmas 5.5 and 5.6, allow us to demonstrate in Theorem 5.7 that the saddle-point problem in (4.6)–(4.7) is equivalent (in terms of solutions) to the minimization problem in (4.1)–(4.2). In particular, we prove that, for any pair \((\lambda ,a)\) satisfying (3.10) and any \(\beta \) satisfying (5.17), the saddle-point problem (4.6)–(4.7) has at least one solution and, very importantly, all its solutions will provide pairs of primal variables \((u^*,t^*)\) with \(t^*=Du^*\), which solve the split problem (4.1)–(4.2), and thus \(u^*\) represents the unique global minimizer of the strongly convex functional \(\mathcal {J}\) in (1.1).

Lemma 5.5

Assume that \(Z = Q + S\), Q and S being lower semi-continuous convex functions from \(\,\,{\mathbb R}^n\) into \(\,{\mathbb R}\), \(\,S\) being Gâteaux-differentiable with differential \(S^{\prime }\). Then, if \(p^* \in {\mathbb R}^n\), the following two conditions are equivalent to each other:

  (1) \(\;p^* \;{\in }\; \arg \inf _{p \in {\mathbb R}^n} Z(p)\,\);

  (2) \(\;Q(p)-Q(p^*) + \left\langle \, S^{\prime } (p^*) , p-p^* \, \right\rangle \ge 0 \quad \; \forall p \in {\mathbb R}^n\).

Moreover, in case that the function Q has a (separable) structure of the type \(Q(p) = Q(p_1,p_2) = Q_1(p_1) + Q_2(p_2)\) with \(p_1 \in {\mathbb R}^{n_1}\) and \(p_2\in {\mathbb R}^{n_2}\) (and \(n_1 + n_2 = n\)) being two disjoint subsets of independent variables, then conditions (1) and (2) above are also both equivalent to the following condition:

  (3) \(\;\left\{ \begin{array}{lcl} p_1^* &{}{\in }&{} \arg \inf _{p_1 \in {\mathbb R}^{n_1}} \big \{ Z_1(p_1) :{=}\, Z(p_1,p_2^*) \, \big \} \\ p_2^* &{}{\in }&{} \arg \inf _{p_2 \in {\mathbb R}^{n_2}} \big \{ Z_2(p_2) :{=} \, Z(p_1^*,p_2) \, \big \} \end{array} \right. \;\;\,\mathrm {with}\;\; (p_1^*,p_2^*) = p^* .\)

Proof

Equivalence of conditions (1) and (2) is demonstrated in [14, Proposition 2.2]. If condition (1) holds, that is, \(p^*\) is a global minimizer of the function Z over its domain \({\mathbb R}^n\), then clearly \(p^*\) is also a global minimizer of the restriction of Z to any subset of \({\mathbb R}^n\) containing \(p^*\). Hence, condition (1) implies condition (3). To demonstrate that condition (3) implies condition (2), which completes the proof, we first write explicitly the two objective functions \(Z_1\) and \(Z_2\) defined in condition (3):

$$\begin{aligned} Z_1(p_1)= & {} Q_1(p_1) + Q_2(p_2^*) + S_1(p_1), \quad S_1(p_1) :{=}\, S(p_1,p_2^*), \end{aligned}$$
(5.24)
$$\begin{aligned} Z_2(p_2)= & {} Q_1(p_1^*) + Q_2(p_2) + S_2(p_2), \quad S_2(p_2) :{=}\, S(p_1^*,p_2). \end{aligned}$$
(5.25)

Assume now that condition (3) holds. Then, by applying the equivalence of conditions (1) and (2) to the functions \(Z_1\) and \(Z_2\), condition (3) can be rewritten as follows:

$$\begin{aligned} \left\{ \begin{array}{l} Q_1(p_1) - Q_1(p_1^*) + \left\langle \, S_1^{\prime }(p_1^*) , p_1 - p_1^* \, \right\rangle \;{\ge }\; 0 \quad \forall \, p_1 \in {\mathbb R}^{n_1} \\ Q_2(p_2) - Q_2(p_2^*) + \left\langle \, S_2^{\prime }(p_2^*) , p_2 - p_2^* \, \right\rangle \;{\ge }\; 0 \quad \forall \, p_2 \in {\mathbb R}^{n_2} \end{array} \right. \end{aligned}$$
(5.26)

By summing up the two inequalities in (5.26), we obtain:

$$\begin{aligned}&Q_1(p_1) + Q_2(p_2) - Q_1(p_1^*) - Q_2(p_2^*) \nonumber \\&\qquad {+}\; \left\langle \, \big ( S_1^{\prime }(p_1^*),S_2^{\prime }(p_2^*)\big ) , \big (p_1-p_1^*,p_2-p_2^*\big ) \, \right\rangle \;{\ge }\; 0 \quad \forall \, (p_1,p_2) \in {\mathbb R}^{n_1} {\times }\; {\mathbb R}^{n_2} \nonumber \\&\quad {\equiv }\;\, Q(p_1,p_2) - Q(p_1^*,p_2^*) \nonumber \\&\qquad {+}\; \left\langle \, \big ( S_1^{\prime }(p_1^*),S_2^{\prime }(p_2^*)\big ) , \big (p_1,p_2)-\big (p_1^*,p_2^*\big ) \, \right\rangle \;{\ge }\; 0 \quad \forall \, (p_1,p_2) \in {\mathbb R}^{n_1} {\times }\; {\mathbb R}^{n_2} \nonumber \\&\quad {\equiv }\;\, Q(p) - Q(p^*) + \left\langle \, S^{\prime }(p^*),p-p^* \, \right\rangle \;{\ge }\; 0 \quad \forall \, p \in {\mathbb R}^n , \end{aligned}$$
(5.27)

which coincides with condition (2), thus completing the proof. \(\square \)

Lemma 5.6

For any pair of parameters \((\lambda ,a)\) satisfying condition (3.10), any penalty parameter \(\,\beta \) fulfilling condition (5.17) and any vector of Lagrange multipliers \(\rho \), the augmented Lagrangian functional \(\,\mathcal {L}\) in (5.2) satisfies the following:

$$\begin{aligned} (u^*,t^*) \;{\in }\;\, \displaystyle {\arg \min _{u,t} \, \mathcal {L}\,(u,t;\rho )} \;\;{\Longleftrightarrow }\;\, \left\{ \begin{array}{rll} u^* &{}{\in }&{} \displaystyle {\arg \min _{u} \,\mathcal {L}\,(u,t^*;\rho )} \\ t^* &{}{\in }&{} \displaystyle {\arg \min _{t} \,\mathcal {L}\,(u^*,t;\rho )} \end{array} \right. . \end{aligned}$$
(5.28)

Proof

According to the results in Proposition 5.4, if the parameters \((\lambda ,a)\) and \(\,\beta \) satisfy conditions (3.10) and (5.17), respectively, the augmented Lagrangian \(\mathcal {L}(u,t;\rho )\) in (5.2) is proper, continuous, coercive and jointly convex in the pair of primal variables (u, t). In particular, within the proof of Proposition 5.4 we highlighted how, for any \(\beta \) satisfying (5.17), there exists at least one pair of scalar coefficients \((\gamma _1,\gamma _2)\) such that the augmented Lagrangian functional can be written in the form

$$\begin{aligned} \mathcal {L}(u,t;\rho )= & {} \;{-}\; \underbrace{\langle \, \rho , t - D u \, \rangle }_{\mathcal {L}_1(u,t)} + \underbrace{F(u) \;{-}\; \frac{\gamma _1}{2} \, \Vert Du \Vert _2^2}_{\mathcal {L}_2(u)} + \underbrace{R(t) + \frac{\gamma _2}{2} \, \Vert t \Vert _2^2}_{\mathcal {L}_3(t)} \nonumber \\&+ \underbrace{\Vert c_1 Du - c_2 t \Vert _2^2}_{\mathcal {L}_4(u,t)}, \end{aligned}$$
(5.29)

with the functions \(\mathcal {L}_1,\mathcal {L}_2,\mathcal {L}_3,\mathcal {L}_4\) convex in (u, t). In order to apply Lemma 5.5, we rewrite (5.29) in the following equivalent form:

$$\begin{aligned} \mathcal {L}(u,t;\rho )= & {} \underbrace{ \overbrace{F(u) \;{-}\; \frac{\gamma _1}{2} \, \Vert Du \Vert _2^2 + \langle \, \rho , D u \, \rangle }^{Q_1(u)} + \overbrace{R(t) + \frac{\gamma _2}{2} \, \Vert t \Vert _2^2 \;{-}\; \langle \, \rho , t \, \rangle }^{Q_2(t)} }_{Q(u,t)} \nonumber \\&+ \underbrace{\Vert c_1 Du - c_2 t \Vert _2^2}_{S(u,t)}. \end{aligned}$$
(5.30)

From the convexity of the functions \(\mathcal {L}_2\) and \(\mathcal {L}_3\) in (5.29), it clearly follows that \(Q_1\) and \(Q_2\), and hence Q, in (5.30) are convex as well. Moreover, the function S in (5.30) is clearly convex and Gâteaux-differentiable. Hence, by applying the equivalence of conditions (1) and (3) in Lemma 5.5, statement (5.28) follows and the proof is complete. \(\square \)

Theorem 5.7

For any pair of parameters \((\lambda ,a)\) satisfying condition (3.10) and any parameter \(\,\beta \) fulfilling condition (5.17), the saddle-point problem (4.6)–(4.7) admits at least one solution and all the solutions have the form \((u^*,Du^*;\rho ^*)\), with \(u^*\) denoting the unique global minimizer of functional \(\,\mathcal {J}\) in (1.1).

The proof is postponed to the “Appendix”.

5.4 Convergence of Algorithm 1 towards a solution of (4.6)–(4.7)

Given the existence and the good properties of the saddle points of the augmented Lagrangian functional in (5.2), highlighted in Theorem 5.7, it remains to demonstrate that the ADMM iterative scheme outlined in Algorithm 1 converges towards one of these saddle points, that is towards a solution of the saddle-point problem (4.6)–(4.7). This is the goal of Theorem 5.8 below.

Theorem 5.8

Assume that \(\,(u^*,t^*;\rho ^*)\) is a solution of the saddle-point problem (4.6)–(4.7). Then, for any pair of parameters \((\lambda ,a)\) satisfying condition (3.10) and any parameter \(\beta \) fulfilling condition

$$\begin{aligned} \beta \,\;{>}\;\, \bar{\beta } :{=} \max \Big \{ 2a, \, 9 a \, \frac{\tau _c}{9\tau _c \, -8} \Big \}, \end{aligned}$$
(5.31)

the sequence \(\,\big \{(u^{(k)},t^{(k)};\rho ^{(k)})\big \}_{k=1}^{+\infty }\,\) generated by Algorithm 1 satisfies:

$$\begin{aligned} \lim _{k \rightarrow +\infty } u^{(k)}&=\,&u^*, \end{aligned}$$
(5.32)
$$\begin{aligned} \lim _{k \rightarrow +\infty } t^{(k)}&=\,&t^* =\, D u^*. \end{aligned}$$
(5.33)

The proof is postponed to the “Appendix”.

We conclude this analysis with the following final theorem, whose proof is immediate given Theorems 5.7 and 5.8 above.

Theorem 5.9

Let the pair of parameters \((\lambda ,a)\) satisfy condition (3.10) and let the parameter \(\beta \) satisfy condition (5.31). Then, for any initial guess, the sequence \(\,\big \{u^{(k)}\}_{k=1}^{+\infty }\,\) generated by Algorithm 1 satisfies:

$$\begin{aligned} \lim _{k \rightarrow +\infty } u^{(k)} =\, u^*, \end{aligned}$$
(5.34)

with \(u^*\) denoting the unique solution of problem (1.1).

6 Experimental results

In this section, we investigate experimentally the performance of the proposed CNC segmentation approach, named CNCS, on 2D synthetic and real images. In particular, in the first two examples we compare CNCS with the most relevant alternative approach, namely the two-step convex segmentation method presented in [5]. Then, in the third example we extend the comparison to some other recent state-of-the-art multiphase segmentation approaches. In the visual results of all the presented experiments, the boundaries of the segmented regions are superimposed on the original images.

We recall that the proposed CNCS method consists of two steps. In the first step an approximate solution of the minimization problem (1.1) is computed by means of the ADMM-based iterative procedure illustrated in Algorithm 1; iterations are stopped as soon as two successive iterates satisfy \(\Vert u^{(k)} - u^{(k-1)} \Vert _2 / \Vert u^{(k-1)} \Vert _2 < 10^{-4}\). In the second step the segmented regions are obtained by means of a thresholding procedure.
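The following Python sketch illustrates only the outer iteration control of the first step with the relative-change stopping rule quoted above; the callable admm_step is a hypothetical stand-in for the u-, t- and \(\rho \)-updates of Algorithm 1, and the iteration cap is our own illustrative choice.

```python
import numpy as np

def run_cncs_first_step(b, admm_step, state, tol=1e-4, max_iter=500):
    """Iterate Algorithm 1 until ||u^(k) - u^(k-1)||_2 / ||u^(k-1)||_2 < tol.

    b         : observed (vectorized) image
    admm_step : callable (u, state) -> (u_new, state_new) performing one ADMM
                iteration (hypothetical stand-in for the updates of Algorithm 1)
    state     : auxiliary ADMM variables (t, rho, ...), whatever admm_step needs
    """
    u = b.copy()  # a natural initial guess: the observed image itself
    for k in range(1, max_iter + 1):
        u_new, state = admm_step(u, state)
        rel_change = np.linalg.norm(u_new - u) / max(np.linalg.norm(u), 1e-12)
        u = u_new
        if rel_change < tol:  # stopping rule used in the experiments
            break
    return u, k
```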

As for the parameters of the CNCS approach, in all the experiments we used a fixed value \(\tau _c = 1.01\) for the convexity coefficient in (3.10), so as to always employ an almost maximally non-convex regularizer under the constraint that the total functional remains convex. The regularization parameter \(\lambda \) has been hand-tuned in each experiment, as is usually done for variational methods in image processing, so as to produce the best possible results. We remark that, once \(\tau _c\) and \(\lambda \) have been set, the parameter a and, then, the lower bound \(\,\bar{\beta }\,\) of the ADMM penalty parameter \(\beta \) are automatically obtained from (3.10) and (5.31), respectively. Moreover, we noticed from the experiments that using \(\beta = 1.1 \,{\cdot }\, \bar{\beta }\) always yields a satisfactory convergence behavior of the ADMM-based algorithm.
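For concreteness, a small helper (ours, not from the paper) that, given a value of a already derived from (3.10), returns the lower bound \(\bar{\beta }\) in (5.31) and the value \(\beta = 1.1 \, \bar{\beta }\) used in the experiments:

```python
def admm_penalty(a, tau_c=1.01):
    """Lower bound beta_bar from (5.31) and the experimental choice beta = 1.1 * beta_bar.
    The parameter a is assumed to already satisfy condition (3.10)."""
    beta_bar = max(2.0 * a, 9.0 * a * tau_c / (9.0 * tau_c - 8.0))
    return beta_bar, 1.1 * beta_bar
```

For instance, with \(\tau _c = 1.01\) this gives \(\bar{\beta } \approx 8.34 \, a\) and hence \(\beta \approx 9.17 \, a\).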

Finally, as for the model parameter T, it is designed precisely to let the user control which salient parts are to be segmented. More precisely, as already pointed out, T should be selected as the maximum expected gradient magnitude within the inner parts of the segmented regions. Users who are not experts in the field can easily be assisted by automatic procedures based on image gradient estimation, available in MATLAB and in public image processing software packages such as GIMP.
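As a purely illustrative example of such an aid (the paper does not prescribe a specific procedure), one can take a high percentile of the discrete gradient magnitudes of the observed image as a rough initial guess for T:

```python
import numpy as np

def estimate_T(image, percentile=90.0):
    """Crude automatic guess for the gradient-magnitude threshold T:
    a high percentile of the discrete gradient magnitudes of the image."""
    gy, gx = np.gradient(image.astype(float))  # finite-difference gradients
    magnitude = np.hypot(gx, gy)               # pixel-wise gradient magnitude
    return np.percentile(magnitude, percentile)
```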

Example 1

In this example we demonstrate the performance of the proposed CNCS approach in comparison with the convex two-phase model proposed by Cai et al. in [5], referred to as CCZ in the following. For all the experiments, the two parameters \(\lambda \) and \(\eta \) of the CCZ approach have been hand-tuned so as to produce the best possible results. In order to highlight the benefits of using a non-convex penalty function, we applied both methods to the segmentation of a noisy synthetic image, named geometric, containing four basic shapes, with three different gray intensities, on a white background.

In the first row of Fig. 2, from left to right, we show the given image corrupted by additive Gaussian noise with noise level 0.02, the approximate solution \(u^*\) obtained by Algorithm 1 (CNCS) in 7 iterations and by the convex two-phase algorithm (CCZ) in 31 iterations, and the difference between the \(u^*\)'s obtained by the CNCS and CCZ algorithms. For the second thresholding step in both algorithms the thresholds were fixed to \(\rho _1= 0.85\), \(\rho _2= 0.90\) and \(\rho _3= 0.99\).
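The second (thresholding) step can be sketched as follows (an illustration, not the authors' implementation); with the three thresholds above it maps \(u^*\) to four region labels:

```python
import numpy as np

def threshold_segment(u_star, thresholds=(0.85, 0.90, 0.99)):
    """Map the first-step output u* to K = len(thresholds) + 1 region labels
    by comparing each pixel value against the sorted thresholds rho_i."""
    return np.digitize(u_star, bins=np.sort(np.asarray(thresholds)))
```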

Fig. 2 Segmentation of the geometric image

The four different regions segmented by the CCZ and CNCS methods are shown in the second and third rows of Fig. 2, respectively. The boundaries of the segmented regions are shown in red, superimposed on the given images. CCZ fails to detect region 3 (light gray shapes) and it smooths out the boundaries. This behavior is explained by the fact that the well-known TV regularizer used in [5] is defined as the \(\ell _1\)-norm, which inevitably penalizes the magnitudes of originally salient boundaries and thus curtails them. In particular, as discussed in [34], the TV of a feature is directly proportional to its boundary size, so that one way of decreasing the TV of that feature is to reduce its boundary size, in particular by smoothing corners. Moreover, the change in intensity due to TV regularization is inversely proportional to the scale of the feature, so that very small-scale features are removed, thus causing failures in the segmentation procedure.
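A back-of-the-envelope computation (our illustration of the scaling arguments recalled from [34], not a result taken from there) makes both effects explicit for a flat disc \(B_r\) of radius r and contrast h, with the fidelity weighting \(\frac{\lambda }{2}\Vert \cdot \Vert _2^2\) of (1.1):

$$\begin{aligned} \mathrm {TV}\big ( h \, \chi _{B_r} \big ) \,=\, h \, \mathrm {Per}(B_r) \,=\, 2 \pi r \, h , \qquad \quad \frac{\lambda }{2} \, \big \Vert \delta \, \chi _{B_r} \big \Vert _2^2 \,=\, \frac{\lambda }{2} \, \pi r^2 \, \delta ^2 , \end{aligned}$$

so the TV of the disc grows with its boundary length, while lowering its contrast by \(\delta \) costs fidelity proportional to its area; balancing the two terms yields an intensity loss of order \(\delta \approx 2/(\lambda r)\), i.e. inversely proportional to the feature scale.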

Thanks to the penalty function \(\phi \) introduced in our model, the pixels whose gradient magnitude is above the threshold T are handled by the function pieces \(\phi _2\) and \(\phi _3\) defined in (2.1), which provide a very good approximation of the perimeter term of the original Mumford–Shah model. Therefore our model preserves the sharp boundary shapes well, as illustrated in Fig. 2 (third row).

In Fig. 3 we highlight the effect of the parameter T on the segmentations obtained by CNCS. In particular, we depict in blue the pixels treated by the functions \(\phi _2\) and \(\phi _3\) for different values of the parameter T: \(T=0.1\) in the first column, \(T=0.05\) in the second column, and \(T=8 \times 10^{-4}\) in the third column. The largest T allows one to clearly detect only the boundary of the darkest rectangle; with the intermediate value we can distinguish the two darker gray rectangles, Fig. 3 (middle); and with the smallest T we can easily detect the four simple shapes.

Example 2

In this example we demonstrate the benefit of the penalty function introduced in the proposed model in allowing for segmentations that reproduce sharp boundaries and detect inhomogeneous regions.

Fig. 3 Segmentation of the geometric image by CNCS for different T values

Figure 4a shows the image rectangles, of dimension \(400 \times 400\) pixels, which we want to segment into \(K=2\) parts: vertical rectangles and background, where the rectangles are characterized by intensity inhomogeneity. The segmentation results obtained by the proposed method are shown in Fig. 4b using yellow boundaries. The penalty function used in our model allows for preserving the sharp features of the boundaries (e.g., corners) even in the darker regions where the intensity gradient characterizing the boundary is very small.

Fig. 4 Segmentation of the rectangles image. a Original, b CNCS, c CCZ (\(\rho =0.19\)), d CCZ and K-means, e Sandberg et al. [31], f Yuan et al. [38], g Chan–Vese [7], h Dong et al. [13]

This is a simple example belonging to the class of piecewise smooth segmentation problems for which the CCZ model in [5] has been devised. The segmentation results obtained by CCZ are shown in Fig. 4c, d using, in the second phase, a manually tuned optimal threshold value \(\rho = 0.19\) and a K-means algorithm, respectively. The former better reproduces the segmented boundaries; however, both results present rounded corners. This effect is again due to the approximation of the perimeter by the \(\ell _1\)-norm regularization term in the CCZ model.

A set of state-of-the-art two-phase segmentation methods, including the unsupervised model by Sandberg et al. [31], the max-flow model by Yuan et al. [38], the Chan–Vese model [7] and the frame-based variational framework by Dong et al. [13], are evaluated and compared with our model. The codes we used are provided by the authors, and the parameters in the codes were chosen by trial and error to produce the best results for each method. In Fig. 4e–h the resulting segmentations are superimposed on the original image using yellow boundaries. We can observe that all these methods fail in segmenting the darker upper part of the two rectangles in the synthetic image; the CCZ method fails only when the automatic K-means post-processing is applied (Fig. 4d), while our method achieves extremely accurate results.

Example 3

In this example, we test the CNCS approach on real, more complex images and compare it with some state-of-the-art alternative approaches for multi-region image segmentation. Figure 5 shows segmentation results obtained on the real image brain, an MRI brain scan of dimension \(583 \times 630\) pixels. The multiphase segmentation is applied to detect the three regions (\(K=3\)) characterized by white, gray and black pixel values. For multiphase segmentation, the thresholds \(\rho _i\) used in the second stage can be chosen according to those obtained by K-means, or tuned manually to achieve specific segmentation results; in both cases the solution of the first, more expensive, minimization problem does not need to be recomputed. In particular, for this example we used the thresholds \(\rho _1= 0.18\) and \(\rho _2=0.56\) for both the CNCS and CCZ methods.
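One common way to derive such thresholds from K-means (our illustration; the paper does not specify the exact rule, and scikit-learn is an assumed dependency) is to cluster the values of \(u^*\) into K groups and take the midpoints between consecutive sorted cluster centers:

```python
import numpy as np
from sklearn.cluster import KMeans

def thresholds_from_kmeans(u_star, K=3, seed=0):
    """Cluster the intensities of u* into K groups and return the K-1 midpoints
    between consecutive sorted cluster centers as candidate thresholds rho_i."""
    centers = (KMeans(n_clusters=K, n_init=10, random_state=seed)
               .fit(u_star.reshape(-1, 1))
               .cluster_centers_.ravel())
    centers.sort()
    return (centers[:-1] + centers[1:]) / 2.0
```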

Fig. 5 Segmentation of brain image into \(K=3\) regions: dark gray, light gray, white. a Original, b CNCS, c CCZ, d Sandberg et al. [31], e Yuan et al. [38], f Chan–Vese [7]

By comparing the images in Fig. 5b, c, we can observe that the CNCS model detects more detailed and complete small features, while giving results overall comparable to those of the CCZ model. Other comparisons are illustrated in the second row of Fig. 5. We notice that the results obtained by the methods proposed by Sandberg et al. [31] (Fig. 5d), Yuan et al. [38] (Fig. 5e), and Chan–Vese [8] (Fig. 5f) are less satisfactory than those achieved by CNCS and CCZ.

Another challenging segmentation problem is presented in Fig. 6: the multiphase segmentation of the anti-mass image, of dimension \(480 \times 384\) pixels, into \(K=3\) regions (see Fig. 6a) identified as light gray, dark gray and black. Also in this case, we compared the proposed CNCS method with alternative state-of-the-art segmentation methods. In particular, in the first row of Fig. 6, the segmentation results obtained by the models of Sandberg et al. [31] (Fig. 6b), Yuan et al. [38] (Fig. 6c) and Chan–Vese [8] (Fig. 6d) are reported.

Fig. 6 Segmentation of anti-mass image into \(K=3\) regions: light gray (region 1), dark gray (region 2), black (region 3). a Original, b Sandberg et al. [31], c Yuan et al. [38], d Chan–Vese [8], e CNCS region 1, f CNCS region 2, g CNCS region 3, h CNCS

Figure 6e–g shows the regions segmented by the CNCS method, and Fig. 6h displays the three detected regions by visualizing the three gray levels. We can appreciate how our method reveals different meaningful high-level structures in the image which are not captured by the other, over-detailed, segmentations.

7 Conclusions

We presented a CNC variational model for multiphase image segmentation. The minimized energy functional is composed of a standard strictly convex quadratic fidelity term and a new non-convex regularization term designed to simultaneously penalize the non-smoothness of the segmented inner regions and the length of the boundaries. A sufficient condition for strict convexity of the functional has been derived. This result allows us to benefit from the advantages of using such a non-convex regularizer while, at the same time, maintaining strict (or, better, strong) convexity of the optimization problem to be solved. An efficient iterative minimization procedure based on the ADMM algorithm has been proposed, for which we derived a new proximity operator associated with the proposed regularization function. A convergence analysis of the minimization algorithm has been presented, which paves the way for analogous proofs for other CNC models. Experiments demonstrate the effectiveness of the proposed approach, also in comparison with some alternative state-of-the-art segmentation methods.