1 Introduction

Image restoration is one of the central problems of image processing. It aims to recover the ‘clean’ original image from a given corrupted image and covers a wide range of applications, e.g., image denoising, image deblurring and image deconvolution. In practice, image corruptions may come from many different sources, such as malfunctioning pixels in camera sensors, motion blurring, or bit errors in transmission [16, 49], and can often be formulated by the following linear degradation model

$$\begin{aligned} f = H\varvec{u} + e \,, \end{aligned}$$
(1)

where \(\varvec{u}\) denotes the original image to be restored, f is the given corrupted image, and e is additive random noise. The linear operator H in (1) models the image corruption, e.g., a convolution operator for image deconvolution, a projection operator for image inpainting, and the identity operator for image denoising [7, 38]. Recovering the ‘correct’ image \(\varvec{u}\) from the corrupted input f is often challenging: the inverse problem is ill-posed in most practical scenarios due to the unknown noise e and the usually underdetermined operator H [43]. To handle the ill-posedness, a regularization term \(J(\varvec{u})\) incorporating prior information about the ‘correct’ image \(\varvec{u}\) is introduced, leading to the following optimization problem:

$$\begin{aligned} \mathop {\text {min}} \limits _{\varvec{u}} \hspace{5.0pt}l\left( {H{\varvec{u}},{f}} \right) + \lambda J\left( {\varvec{u}} \right) , \end{aligned}$$
(2)

where \(l\left( {H\varvec{u},f} \right) \) measures the data fidelity between \(H\varvec{u}\) and the observation f, and \(\lambda >0\) is a positive parameter trading off the data fidelity term against the regularization term \(J(\varvec{u})\). In the optimization model (2), the regularization function \(J\left( \varvec{u} \right) \) encodes the prior image information. Both the smooth quadratic Tikhonov regularization term [36] and the non-smooth total-variation (TV) regularization term [28] have been proposed and extensively studied to enforce regularity of image functions. In particular, the TV regularization function performs well in practice, preserving rich image details and edges while smoothing.

On the other hand, the data fidelity function \(l\left( { \cdot , \cdot } \right) \) in (2) penalizes the difference between \(H\varvec{u}\) and f, and its formulation is chosen according to the assumed statistical distribution of the additive noise e. It is well known that the \(l_1\)-norm [12, 25, 42], used as the fidelity penalty \(l\left( { \cdot , \cdot } \right) \,\), is more robust to non-Gaussian noise in many image restoration applications than the \(l_2\)-norm [5, 26, 37], which corresponds to a quadratic loss tailored to additive Gaussian noise. The \(l_1\)-norm performs better under Laplace-type noise distributions [9, 24] and, in particular, provides a proper convex approximation to the non-convex and nonlinear sparsity prior, i.e., the \(l_0\)-norm, which is closely related to a universal criterion of signal recoverability. However, the convex \(l_1\)-norm and other convex relaxations of the sparsity measure [4, 15, 22, 30], despite their computational simplicity and efficiency, yield a biased estimator of the highly non-convex \(l_0\)-norm and thus fail to achieve the best estimation performance in practice [35, 47]. To alleviate these issues, many works directly exploit the \(l_0\)-pseudo-norm or apply non-convex approximations of sparsity as the fidelity penalty \(l(\cdot , \cdot )\). Yuan and Ghanem [44] proposed a sparse optimization method with \(l_0\)-pseudo-norm data fidelity for image denoising and deblurring with impulse noise. Gu et al. [11] introduced the TV-SCAD model, which utilizes the smoothly clipped absolute deviation (SCAD) data penalty function. Zhang et al. [47] proposed a non-convex model with an exponential-type data fitting term for image restoration, and recently Zhang et al. [45] proposed the TVLog model, which uses a non-convex log function as the data fitting term. In addition, there are other fractional- and exponential-type data fidelity functions [10, 17], directly designed differences of convex functions [1], etc. These non-convex data fidelity penalty functions have been shown to outperform the convex \(l_1\)-norm data fidelity penalty in practice.

Since the data fidelity penalty function is generally non-convex, computing minimizers of the non-convex model (2) is a challenging problem. For non-convex models that directly utilize the sparsity function, scholars usually reformulate them as a mathematical program with equilibrium constraints (MPEC) [2, 23, 44]. The work [44] proposed a proximal ADMM method to solve the MPEC, which is more effective than the penalty decomposition algorithm. As for non-convex models that contain well-defined non-convex sparsity approximations, such as the SCAD, log, truncated-\(l_1\), \(l_p\), and fractional- and exponential-type functions, these approximations can be characterized as a concave-convex composite function \(g(\left|\cdot \right|)\), where the outer function \(g(\cdot )\) is concave and the inner one \(\left|\cdot \right|\) is convex. In this work, we focus on this class of popular non-convex models. Below we briefly introduce several commonly used numerical optimization strategies for solving such non-convex models.

The graduated non-convexity (GNC) method [3, 40, 48] is widely used to optimize non-convex energy functions. Rather than optimizing the original non-convex model directly, the GNC method optimizes a succession of surrogate functions that begin with a convex approximation of the original model, progressively become non-convex, and finally converge to the original model. Due to the lack of efficient solvers, the broad application of GNC has been limited. The Convex Non-Convex (CNC) strategy [8, 19, 21, 29] aims to construct and then optimize convex functionals containing non-convex regularization terms, and can be regarded as an evolution of the GNC method. Unlike the GNC method, the CNC strategy leads to a sequence of convex problems by carefully refining a scalar parameter representing the ‘degree of concavity’ of the non-convex term. The difference of convex functions (DC) approach [32,33,34] is another common method for such non-convex models. As shown in [11, 45], both the SCAD and log functions can be decomposed as the difference of two convex functions and then handled by a DC algorithm (DCA). The DCA solves non-convex models by linearizing the second convex function and then solving a sequence of convex problems. In all three methods, the energy function is redefined at each iteration to obtain a gradually sparser solution; in this sense, they are all multi-stage methods.

It is obvious that non-convex fidelity functions mimic the sparsity measure better and are, hence, more effective in sparsity-based estimation than convex fidelity functions. Motivated by these considerations, we propose in this work the image restoration model (2) with a concave-convex composite data fidelity function \(l\left( { \cdot , \cdot } \right) \), which properly promotes the sparsity measure and results in a challenging non-smooth non-convex optimization problem. Meanwhile, we introduce new equivalent optimization models for its analysis and numerical implementation from a primal-dual perspective. As shown by Zhang [46], such a concave-convex composite function indeed generalizes many popular non-convex sparsity approximation functions and can be analyzed in depth within a multi-stage optimization framework.

1.1 Contributions and Organization

In this paper, we study the proposed non-convex and non-smooth optimization model (2) of image restoration, which consists of a concave-convex composite function imposing sparse data fidelity and a non-smooth TV regularization term. We first follow the essential ideas of multi-stage optimization introduced in [46] to propose a new two-stage optimization approach for the model (2) with both outer and inner loops. At the first stage, the key optimization sub-problem is efficiently solved in the inner loop by the augmented Lagrangian method (ALM) applied to its dual model; at the second stage, the key parameters are updated in the outer loop. Then, we show that the proposed optimization model can be rewritten equivalently as a linearly constrained convex optimization problem using a primal-dual formulation. This approach has two benefits: in theory, the studied non-convex optimization problem can be optimized globally thanks to its equivalence to the dual model, which is a linearly constrained convex optimization problem; in numerics, all the variables can be optimized simultaneously under the proposed single-stage optimization framework. Moreover, the numerical solvers for the kernel optimization problems of both the two-stage and the single-stage approaches can be implemented with two different simple and fast algorithms, i.e., a projected gradient scheme and an indefinite linearized scheme. Extensive experiments over various image restoration problems illustrate that the proposed single-stage optimization approach, with both projected gradient and indefinite linearized implementations, outperforms the other approaches, especially in terms of numerical efficiency, stability, and image recovery quality. This further confirms our theoretical analysis.

The rest of this paper is organized as follows: Sect. 2 gives the details of the two-stage optimization method. Section 3 presents the mathematically equivalent dual model of the non-convex image restoration optimization model, which turns out to be a linearly constrained convex optimization problem, and a new efficient ALM-based optimization method to solve it. Section 4 gives extensive comparative experiments for the proposed model and algorithms. A conclusion is given in Sect. 5.

2 Two-Stage Optimization Approach

In this work, we aim to solve the following discretized optimization problem:

$$\begin{aligned} \mathop {\text {min}}\limits _{\varvec{u} \in R^N} \sum _i \, \Psi (O_i(h_i \varvec{u} - f_i)) \, + \, \lambda \left\Vert \nabla \varvec{u}\right\Vert _{l_1} \, , \end{aligned}$$
(3)

where \(\varvec{u} \in R^{N}\) with \(N = m \times n\) represents the column-stacking vectorization of a 2D \(m \times n\) image; the sparsity function \(\Psi (\cdot )\) takes the value 1 for any nonzero argument and vanishes otherwise; \(O_i \in \{0,1\}\), related to the noise type of the corrupted image,Footnote 1 is predefined at each pixel i; \(h_i\) is the ith row vector of the linear operator H and \(f_i\) is the image value at pixel i. Clearly, this problem is highly challenging to solve due to its non-convexity and non-smoothness. Many works have tried to substitute the sparsity function \(\Psi (\cdot )\) with a well-defined approximation [18, 20, 31, 39], for example, the well-studied non-convex \(l_p\)-function \(\left|\cdot \right|^p\) with \(p\in (0,1)\), the truncated-\({l_1}\) function \(\text {min}(\left|\cdot \right|, \alpha )\) for any \(\alpha > 0\), and the smoothed-log function \(\alpha \ln (\alpha + \left|\cdot \right|)\,\). It is easy to see that \(\left|\cdot \right|^p\) tends to \(\Psi (\cdot )\) as \(p \rightarrow 0^+\), and likewise for \(\text {min}(\left|\cdot \right|, \alpha )/\alpha \) and \( \ln (1 + \left|\cdot \right|/\alpha ) / \ln (1+1/\alpha )\) as \(\alpha \rightarrow 0^+\). A recent study by Chan et al. [8] proposed the so-called CNC strategy to approximate and analyze the sparsity function by a composite concave-convex smooth function. It is also known that both \(\left|\cdot \right|^p\) and \(\text {min}(\left|\cdot \right|, \alpha )\) can be generalized as a composite function \(g(\left|\cdot \right|)\), where the outer function \(g(\cdot )\) is concave and the inner one \(\left|\cdot \right|\) is convex. In this work, we apply such a concave-convex approximation to the sparsity data term in (3) and study the following optimization problem:

$$\begin{aligned} \mathop {\text {min}}\limits _{\varvec{u} \in R^N} \sum _{i} \, g(\left|O_i(h_i \varvec{u} - f_i)\right|) \, + \, \lambda \left\Vert \nabla \varvec{u}\right\Vert _{l_1} \, . \end{aligned}$$
(4)
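The limiting behavior noted above, i.e., that each normalized surrogate tends to \(\Psi(\cdot)\) as \(p, \alpha \rightarrow 0^+\), can be checked numerically. A minimal illustrative sketch (the function names are ours, not from any library):

```python
import math

# Normalized non-convex surrogates of the sparsity function Psi
def lp(v, p):                 # |v|^p
    return abs(v) ** p

def trunc_l1(v, alpha):       # min(|v|, alpha) / alpha
    return min(abs(v), alpha) / alpha

def smoothed_log(v, alpha):   # ln(1 + |v|/alpha) / ln(1 + 1/alpha)
    return math.log(1.0 + abs(v) / alpha) / math.log(1.0 + 1.0 / alpha)

v = 0.5                       # any fixed nonzero value, so Psi(v) = 1
for t in (0.5, 0.1, 0.01):
    print(lp(v, t), trunc_l1(v, t), smoothed_log(v, t))
# each column tends to Psi(v) = 1 as the parameter decreases,
# while all three surrogates vanish at v = 0, just like Psi
```

Note that the smoothed-log surrogate approaches 1 considerably more slowly than the other two for the same parameter value.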

2.1 Primal and Dual Models

Let S denote the set of indices i for which \(O_i = 1\); then the optimization model (4) can be identically rewritten as

$$\begin{aligned} \mathop {\text {min}}\limits _{\varvec{u} \in R^N} \sum _{i \in S} \, g(\left|h_i \varvec{u} - f_i\right|) \, + \, \lambda \left\Vert \nabla \varvec{u}\right\Vert _{l_1} \, . \end{aligned}$$
(5)

By concave duality [27], the concave function g(v) can be equivalently represented as

$$\begin{aligned} g\left( v \right) = \mathop {\text {min}}\limits _{q \in R} \Big \{ q \cdot v - g^*(q)\Big \} \, , \end{aligned}$$
(6)

where \({g^ * }\left( q \right) \) is the concave dual of \(g\left( v \right) \) and one also has

$$\begin{aligned} {g^ * }\left( q \right) = \mathop {\text {min}}\limits _{v \in R } \Big \{ v\cdot q - g(v)\Big \} \,. \end{aligned}$$

Generally, the concave dual of a concave function is computable. For example, the concave dual of the concave \(l_p\)-function \(\left|\cdot \right|^p\) with \(p\in (0,1)\) is

$$\begin{aligned} {g^*}\left( q \right) = -c \,{q^{{\, p / {(p - 1)}}}} \, , \text { where } c \text { is a positive constant depending on } p\, . \end{aligned}$$
(7)

Similarly, for the truncated-\({l_1}\) function \(\text {min}(\left|\cdot \right|, \alpha )\) with any \(\alpha > 0\), its dual form is

$$\begin{aligned} {g^*}\left( q \right) = \alpha \left( { - 1 + q} \right) I\left( {q \in \left[ {0,1} \right] } \right) \,, \end{aligned}$$
(8)

where \(I(\cdot )\) is the indicator function of the given set. For the smoothed-log function \(g(\cdot ) = \alpha \ln (\alpha + \left|\cdot \right|)\) with any \(\alpha > 0\,\), its dual form is

$$\begin{aligned} {g^*}\left( q \right) = \alpha \ln q - \alpha q + \alpha (1 - \ln \alpha )\,. \end{aligned}$$
(9)

In this work, we focus on the concave truncated-\(l_1\) function, i.e., \(g(\left|h_i \varvec{u} - f_i\right|) = \text {min}\left( \left|h_i \varvec{u} - f_i\right|, \alpha \right) \), to discuss the two proposed dual-based two-stage and single-stage optimization approaches.
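For this choice, the identity (6) with the dual (8) can be verified numerically: minimizing \(q\,v - g^*(q)\) over the effective domain \(q \in [0,1]\) recovers \(\min(v, \alpha)\). A small sketch (grid search, for illustration only; the value of \(\alpha\) is an arbitrary example):

```python
import numpy as np

alpha = 0.4
def g_star(q):                    # concave dual (8); finite only on q in [0, 1]
    return alpha * (q - 1.0)

qs = np.linspace(0.0, 1.0, 1001)  # grid over the effective domain of g*
for v in (0.0, 0.1, 0.4, 2.5):
    recovered = np.min(qs * v - g_star(qs))   # right-hand side of (6)
    assert abs(recovered - min(v, alpha)) < 1e-9
```

Since the objective is linear in q, the minimum is attained at an endpoint: q = 1 when \(v < \alpha\) (giving v) and q = 0 otherwise (giving \(\alpha\)).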

With the above concave dual expression (6), each component function of the first term in (5) can be equivalently written as

$$\begin{aligned} g(\left|h_i \varvec{u} - f_i\right|) \, = \, \text {min}_{q_i \ge 0} \Big \{ q_i \cdot \left|h_i \varvec{u} - f_i\right| - g^*(q_i) \Big \} \, , \end{aligned}$$
(10)

for any \(i \in S\), where \(q_i \ge 0\).

2.2 Two-Stage Convex–Concave Algorithm

Now we apply the dual expression (10) to the first term of the optimization model (5). This leads to the following equivalent minimization problem:

$$\begin{aligned} \text {min}_{q_i \ge 0} \mathop {\text {min}}\limits _{\varvec{u} \in R^N}\, \underbrace{\Big \{ \sum _{i \in S} q_i \left|h_i \varvec{u} - f_i\right| \, + \, \lambda \left\Vert \nabla \varvec{u}\right\Vert _{l_1} \, - \,\sum _{i \in S} g^*(q_i) \Big \}}_{ F(\varvec{u}, q) } \, . \end{aligned}$$
(11)

To solve the above optimization problem, the most popular approach is a two-stage convex-concave algorithmic framework in which, at each kth outer iteration, two optimization sub-problems are solved sequentially:

  • We first fix each \(q_i^k\), where \(i \in S\), and solve the following sub-problem over \(\varvec{u}\):

    $$\begin{aligned} \varvec{u}^{k+1} \,:= \, \text {arg}\text {min}_{\varvec{u} \in R^N} \, \underbrace{\Big \{ \sum _{i \in S} q_i^k \left|h_i \varvec{u} - f_i\right| \, + \, \lambda \left\Vert \nabla \varvec{u}\right\Vert _{l_1}\Big \}}_{F(\varvec{u}, q^{k})} ; \end{aligned}$$
    (12)
  • Then we fix the computed \(\varvec{u}^{k+1}\) to optimize each \(q_i \ge 0\) over

    $$\begin{aligned} q_i^{k+1} \, := \, \text {arg}\text {min}_{q_i \ge 0} \, \underbrace{\Big \{ q_i \left|h_i \varvec{u}^{k+1} - f_i\right| \, - \, g^*(q_i) \Big \}}_{F(\varvec{u}^{k+1}, q)} \, . \end{aligned}$$
    (13)

Obviously, once \(\varvec{u}^{k+1}\) is computed by (12), the second-stage optimization sub-problem (13) can be solved easily for each element \(i \in S\), mostly with a closed-form solver (see “Appendix A”). Hence, the main computational load amounts to solving (12) for \(\varvec{u}^{k+1}\) with fixed \(q^k\), which is convex but non-smooth. We show in the following section that the optimization problem (12) can be solved efficiently by an ALM-based algorithm that avoids dealing with the non-smooth energy function directly.
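For the truncated-\(l_1\) choice with dual (8), the stage-2 update (13) reduces to a hard threshold on the residual magnitude. A minimal sketch (our own naming; ties at \(|r_i| = \alpha\) are broken toward 1 here):

```python
import numpy as np

def update_q(residual, alpha):
    # Stage-2 update (13) with g*(q) = alpha*(q - 1) on [0, 1]:
    # argmin_{q in [0,1]} q*(|r| - alpha) is 1 if |r| < alpha, else 0.
    return (np.abs(residual) <= alpha).astype(float)

r = np.array([0.05, 0.5, -0.3, 1.2])
print(update_q(r, alpha=0.4))   # -> [1. 0. 1. 0.]
```

In words: pixels whose residual is already small (inliers) keep full weight, while pixels with large residuals (outliers) are switched off in the next stage-1 problem.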

2.2.1 Duality-based Optimization Algorithm for (12)

Now we study the non-smooth convex optimization sub-problem (12). Observe that the absolute value admits the equivalent dual representation

$$\begin{aligned} q_i^k \left|h_i \varvec{u} - f_i\right| \, = \, \text {max}_{w_i \in [-q_i^{k}, q_i^{k}]} w_i \cdot (h_i \varvec{u} - f_i) \, \end{aligned}$$

and the dual form of the total-variation function [6] such that

$$\begin{aligned} \lambda {\left\| {\nabla \varvec{u}} \right\| _{l1}} \, = \, \mathop {\text {max}} \limits _{\left| \varvec{p}_i \right| \le \lambda } \varvec{u}^{T} \textrm{Div} \varvec{p} \, , \end{aligned}$$
(14)

hence we can reformulate the optimization subproblem (12) as its primal-dual model

$$\begin{aligned} \text {min}_{\varvec{u}}\,\text {max}_{\left|\varvec{p}_i\right| \le \lambda } \text {max}_{w_i \in [-q_i^{k}, q_i^{k}]} \, - \sum _{i \in S} w_i \cdot f_i \, + \, \varvec{u}^{T} \Big (\sum _{i \in S} w_i {h_i}^T + \textrm{Div} \varvec{p} \Big ) \, . \end{aligned}$$
(15)

By minimizing the above primal-dual formulation (15) over the free variable \(\varvec{u}\), we then obtain its mathematically equivalent dual model, i.e., a linear-equality constrained optimization problem, as follows:

$$\begin{aligned} \text {max}_{\left|\varvec{p}_i\right| \le \lambda } \text {max}_{w_i \in [-q_i^{k}, q_i^{k}]} \, - \sum _{i \in S} w_i \cdot f_i \, , \quad \text {s.t.}\;\; \sum _{i \in S} w_i {h_i}^T + \textrm{Div} \varvec{p} \, = \, 0\, , \end{aligned}$$
(16)

which is the so-called dual model of (15).
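The scalar dual representation of the absolute value used above, \(q\left|x\right| = \max_{w \in [-q, q]} w \cdot x\), can be sanity-checked numerically; a minimal sketch (grid search, for illustration only):

```python
import numpy as np

q, x = 0.7, -1.3
ws = np.linspace(-q, q, 2001)   # the feasible interval [-q, q]
lhs = q * abs(x)                # q |x|
rhs = np.max(ws * x)            # max over w in [-q, q] of w * x
assert abs(lhs - rhs) < 1e-9    # maximum attained at w = -q since x < 0
```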

It is easy to see that, in view of the primal-dual model (15), the optimization target \(\varvec{u}\) just works as the optimal multiplier to the linear-equality constraints in its equivalent dual model (16).

The classical augmented Lagrangian method (ALM) can be applied to efficiently solve such a linear-equality constrained dual optimization problem (16). Its augmented Lagrangian function is defined as

$$\begin{aligned} \mathcal {L_\beta }( {w,\varvec{p};\varvec{u}}) = - \sum _{i \in S} w_i \cdot f_i + \varvec{u}^T \big ( \sum _{i \in S} w_i {h_i}^T + \textrm{Div} \varvec{p} \big ) - \frac{\beta }{2} \Vert \sum _{i \in S} w_i {h_i}^T + \textrm{Div} \varvec{p}\Vert ^2 \, . \end{aligned}$$
(17)

Given the above Lagrangian function \({\mathcal {L}_\beta }( {w,\varvec{p};\varvec{u}})\), the duality-based ALM algorithm uses the following iterative algorithm till convergence, i.e., for each lth inner iteration:

  • Fix \(\varvec{u}^l\) and \(\varvec{p}^l\), solve the following optimization sub-problem over w:

    $$\begin{aligned} w^{l+1} \, := \, \mathop {\text {arg}\text {max}_{w_i \in [-q_i^k, q_i^k]} } \mathcal {L_\beta }(w, \varvec{p}^l ;\varvec{u}^l) \, ; \end{aligned}$$
    (18)
  • Fix the computed \(w^{l+1}\) and \(\varvec{u}^l\), and solve the following optimization problem over \(\varvec{p}\):

    $$\begin{aligned} \varvec{p}^{l+1} \, := \, \text {arg}\text {max}_{\left|\varvec{p}_i\right| \le \lambda } {\mathcal L_\beta }(w^{l+1}, \varvec{p}; \varvec{u}^l) \, ; \end{aligned}$$
    (19)
  • Once \(w^{l+1}\) and \(\varvec{p}^{l+1}\) are computed, simply update \(\varvec{u}\) by

    $$\begin{aligned} \varvec{u}^{l+1} \, := \, \varvec{u}^l - \beta \, \big ( \sum _{i \in S} w_i^{l+1} {h_i}^T + \textrm{Div} \varvec{p}^{l+1} \big ) \, . \end{aligned}$$
    (20)
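The loop above can be sketched on a small synthetic analogue of (16): a box-constrained linear objective with a linear-equality constraint, with the inner maximization done by a few projected-gradient steps. All sizes, operators, and step parameters below are illustrative choices of ours, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
M = rng.standard_normal((m, n))   # stand-in for the constraint operator of (16)
f = rng.standard_normal(n)
beta, step = 0.5, 0.05

w = rng.uniform(-1.0, 1.0, n)     # dual variable, box-constrained to [-1, 1]
u = np.zeros(m)                   # multiplier, playing the role of u in (16)

r0 = np.linalg.norm(M @ w)        # initial constraint violation
for _ in range(200):
    # inner loop: projected-gradient ascent on L_beta over w, cf. (18)
    for _ in range(5):
        grad = -f + M.T @ u - beta * M.T @ (M @ w)
        w = np.clip(w + step * grad, -1.0, 1.0)
    # multiplier update, mirroring (20)
    u = u - beta * (M @ w)
print(r0, np.linalg.norm(M @ w))  # the constraint violation shrinks
```

The multiplier update with a minus sign matches (20) because the dual problem is a maximization; setting \(\lambda = -u\) recovers the standard ALM form for the equivalent minimization.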

Clearly, the ALM-based algorithm proposed above solves the convex optimization problem (12) without dealing with its non-smoothness directly. Refer to “Appendix B” for the details of the numerical implementation of the optimization problems (18) and (19).

3 Single-Stage Optimization Approach

For optimizing a convex-concave composite energy function, multi-stage algorithms are usually applied. For example, as shown in Sect. 2.2, the proposed two-stage optimization approach to the convex-concave optimization problem (11) alternates between the optimization of the two variables \(\varvec{u}\) and q. It essentially incorporates an inner iterative ALM-based algorithm for the first-stage optimization of \(F(\varvec{u}, q^k)\) over \(\varvec{u}\) with fixed \(q^k\), and an outer iteration that updates \(q^{k+1}\) at the follow-up stage by optimizing \(F(\varvec{u}^{k+1},q)\) with fixed \(\varvec{u}^{k+1}\).

In this section, we show that, for the convex-concave optimization problem (11), we can optimize \(F(\varvec{u}, q)\) over \(\varvec{u}\) and q simultaneously, i.e., a single-stage optimization approach, which enjoys great advantages in both theoretical analysis and numerical practices.

3.1 Primal and Dual Models

Following a similar primal-dual analysis as in Sect. 2.2.1, the optimization problem (11) can be equally reformulated as its primal-dual model

$$\begin{aligned} \text {min}_{\varvec{u}}\,\text {max}_{\left|\varvec{p}_i\right| \le \lambda } \text {min}_{q_i \ge 0} \text {max}_{w_i \in [-q_i, q_i]} \, - \sum _{i \in S} w_i \cdot f_i \, - \, \sum _{i \in S} g^*(q_i) \, + \, \varvec{u}^{T} \Big (\sum _{i \in S} w_i {h_i}^T + \textrm{Div} \varvec{p} \Big ) \, . \end{aligned}$$
(21)

Since the variable q in the model (21) is independent of the others, we negate the objective in all variables except q, swapping the corresponding min and max operators, and obtain the more uniform format

$$\begin{aligned} \text {max}_{\varvec{u}}\,\text {min}_{\left|\varvec{p}_i\right| \le \lambda } \text {min}_{q_i \ge 0} \text {min}_{w_i \in [-q_i, q_i]} \, \sum _{i \in S} w_i \cdot f_i \, - \, \sum _{i \in S} g^*(q_i) \, - \, \varvec{u}^{T} \Big (\sum _{i \in S} w_i {h_i}^T + \textrm{Div} \varvec{p} \Big ) \, . \end{aligned}$$
(22)

It is easy to see that the optimization over q in the above primal-dual model (22) is simply the optimization of the bound of w. Thus, the proposed two-stage algorithm in Sect. 2.2 actually alternates between the optimization of \((w, \varvec{p},\varvec{u})\) and of the bound q of w. In fact, we prove in the following that all the variables \((w, \varvec{p},\varvec{u})\) and q in the primal-dual model (22) can be optimized at the same time, using the single-stage approach given later.

To see this, we first maximize the aforementioned primal-dual model (22) over the unconstrained variable \(\varvec{u}\,\) and obtain the theoretically identical dual formulation

$$\begin{aligned} \begin{aligned}&\text {min}_{\left|\varvec{p}_i\right| \le \lambda } \text {min}_{q_i \ge 0} \text {min}_{w_i \in [-q_i, q_i]} \,\ \sum _{i \in S} w_i \cdot f_i \, - \, \sum _{i \in S} g^*(q_i) \, , \\&\qquad \qquad \ \text {s.t.}\ \ \qquad \quad \sum _{i \in S} w_i {h_i}^T + \textrm{Div} \varvec{p} \, = \, 0\, , \end{aligned} \end{aligned}$$
(23)

which is known as the dual model of (11). The optimization of \(\varvec{u}\), according to the primal-dual model (22), simply functions as the optimal multiplier to the linear-equality constraint in its corresponding dual model (23).

To decouple the two variables w and q, the constraint \(w_i \in [-q_i, q_i]\) can be equivalently formulated as the two linear inequality constraints

$$\begin{aligned} w_i + q_i \, \ge 0 \, , \quad - w_i + q_i \, \ge \, 0 \, . \end{aligned}$$
(24)
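This splitting can be sanity-checked on scalars; a minimal sketch:

```python
# |w| <= q is equivalent to the pair of linear inequalities (24) when q >= 0
for w in (-2.0, -0.5, 0.0, 0.5, 2.0):
    for q in (0.0, 0.5, 1.0):
        assert (abs(w) <= q) == (w + q >= 0 and -w + q >= 0)
print("equivalence holds on the grid")
```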

Thus, we finally obtain another equivalent dual model of the studied optimization problem (11) such that

$$\begin{aligned} \begin{aligned} \text {min}_{\left|\varvec{p}_i\right| \le \lambda \,, q_i \ge 0 \,, w_i} \quad&\sum _{i \in S} w_i \cdot f_i - \, \sum _{i \in S} g^*(q_i) \\ \text {s.t.} \quad \qquad&\sum _{i \in S} w_i {h_i}^T + \textrm{Div} \varvec{p} \, = \, 0 \,, \\&\, w_i + q_i \, \ge 0 \,, \\&- w_i + q_i \, \ge \, 0 \, . \end{aligned} \end{aligned}$$
(25)

Given the concave function \(g^*(\cdot )\,\), the dual model (25) is essentially a convex optimization problem with linear equality and inequality constraints. Consequently, it is clear that this new optimization model (25) can be solved by simultaneously optimizing all the variables \(\left( w,\varvec{p},q,\varvec{u}\right) \,\).

3.2 A Single-Stage Algorithm for (25)

Given the linearly constrained optimization model (25), the ALM provides one of the simplest frameworks for developing efficient algorithms for such a convex optimization problem with both linear equality and inequality constraints. As shown by the recent study [14], the number of variable blocks has a considerable influence on the performance of ALM-based algorithms, and the linear inequality constraints add new challenges. Therefore, we simplify the optimization model (25) by merging the two variables w and \(\varvec{p}\) into one new variable \(\varvec{x}\), namely \(\varvec{x} = \left( w \,, \, \varvec{p} \right) ^{T}\,\), and reformulate it as follows:

$$\begin{aligned} \begin{aligned} \text {min}_{\varvec{x} \in \Omega \,,q_i \ge 0} \quad&\left\langle {\hat{\varvec{f}}}, \varvec{x} \right\rangle - \, \sum _{i \in S} g^*(q_i) \\ \text {s.t.} \quad \quad \ {}&A\varvec{x} + Bq = 0 \ (\text {or} \ge 0) \,, \end{aligned} \end{aligned}$$
(26)

where \({\hat{\varvec{f}}}^T = \left( f_{s_1},f_{s_2}, \ldots , f_{s_{n^{'} }}, 0, \ldots , 0 \right) \), \(A = \left( \begin{matrix} {\hat{H}}^T &{}{\textrm{Div}} \\ {\hat{I}} &{} \varvec{0} \\ -{\hat{I}} &{} \varvec{0} \end{matrix} \right) \), \(B = \left( \varvec{0} \,,\, {\hat{I}} \,,\, {\hat{I}}\right) ^{T}\,\), \({\hat{H}}^T = \big ( h_{s_1}^T, h_{s_2}^T, \ldots , h_{s_{n^{'} }}^T \big )\), \(\hat{I} = (\varvec{1}_{s_1},\varvec{1}_{s_2},\ldots ,\varvec{1}_{s_{n^{'}}})^T\).
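The block structure of A and B can be illustrated on a toy instance. All sizes below are our own illustrative choices; we index w by the corrupted set S only, so the sampling matrix \(\hat{I}\) reduces to an identity on those entries, and a 1D forward difference stands in for the image gradient:

```python
import numpy as np

# Toy shapes (illustrative only): N image pixels, n' corrupted pixels in S
N, n_p = 6, 3
H_hat = np.eye(N)[:n_p]             # rows h_{s_j}^T of H (here: simple sampling rows)
D = np.diff(np.eye(N), axis=0)      # 1D forward difference as a stand-in gradient
Div = -D.T                          # divergence = minus the adjoint of the gradient
I_hat = np.eye(n_p)

# Block matrices A and B of model (26)
Z = np.zeros((n_p, D.shape[0]))
A = np.block([[H_hat.T, Div], [I_hat, Z], [-I_hat, Z]])
B = np.vstack([np.zeros((N, n_p)), I_hat, I_hat])

w, p, q = np.ones(n_p), np.zeros(D.shape[0]), np.full(n_p, 2.0)
x = np.concatenate([w, p])
cons = A @ x + B @ q
print(cons[:N])   # equality block:  H_hat^T w + Div p
print(cons[N:])   # inequality blocks: w + q and -w + q
```

Stacking the three constraint blocks into a single pair (A, B) is what turns (25) into the two-block form (26).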

Clearly, it results in a two-block separable convex optimization model (26) with both linear equality and inequality constraints. Let \(\varvec{\lambda } = \left( \varvec{u}, \lambda _{+}, \lambda _{-} \right) ^{T} \,\), where \(\varvec{u} \in \mathbb {R}^n \,\), \(\lambda _{+},\, \lambda _{-} \in \mathbb {R}_{+}^n \,\), be the Lagrangian multiplier to the associated linear constraints, and define the following augmented Lagrangian function for the optimization problem (26)

$$\begin{aligned} \mathcal {L_\beta }\left( {\varvec{x},q;\,\varvec{\lambda }} \right) = \left\langle {\hat{\varvec{f}}}, \varvec{x} \right\rangle \, - \, \sum _{i \in S} g^*(q_i) \, - \, \left\langle \varvec{\lambda },A\varvec{x} + Bq \right\rangle \, + \, \frac{\beta }{2} \left\Vert A\varvec{x} + Bq\right\Vert ^2 \,. \end{aligned}$$
(27)

Therefore, we propose a single-stage ALM algorithm for the two-block separable convex optimization problem, which repeats the steps below in a prediction-correction manner until convergence [14], i.e., at each kth iteration:

  • (Prediction Step) With given \(\left( A\varvec{x}^k, Bq^k, \varvec{\lambda }^k \right) \,\), find \(\left( \tilde{\varvec{x}}^k, \tilde{q}^k, \tilde{{\varvec{\lambda }}}^k \right) \,\) via

    $$\begin{aligned} \tilde{\varvec{x}}^k \, := \, \text {arg}\text {min}_{\varvec{x} \in \Omega }\, \left\langle {\hat{\varvec{f}}}, \varvec{x} \right\rangle \, - \, \varvec{x}^{T} A^T {{\varvec{\lambda }}^k} \, + \, \frac{\beta }{2} \left\Vert A \left( \varvec{x} - \varvec{x^k}\right) \right\Vert ^2 \,, \end{aligned}$$
    (28)
    $$\begin{aligned} \tilde{q}^k \! := \! \text {arg}\text {min}_{q \ge \varvec{0}} -\! \sum _{i \in S} g^*(q_i) \!- \! q^{T} B^T {{\varvec{\lambda }}^k}\! +\! \frac{\beta }{2} \left\Vert B\left( q \!- \! q^k\right) + A\left( \tilde{\varvec{x}}^k \!-\! \varvec{x^k}\right) \right\Vert ^2 \,, \end{aligned}$$
    (29)
    $$\begin{aligned} \tilde{\varvec{\lambda }}^k \, := \, {\varvec{\lambda }}^k \,-\, \beta \, \left( A \tilde{\varvec{x}}^k + B \tilde{q}^k \right) \,. \end{aligned}$$
    (30)
  • (Correction Step) Correct the solved predictors \(\left( \tilde{\varvec{x}}^k, \tilde{q}^k, \tilde{\varvec{\lambda }}^k \right) \,\), and generate the new iterate \(\left( A\varvec{x}^{k+1}, Bq^{k+1}, \varvec{\lambda }^{k+1} \right) \,\) with \(v \in (0,1)\,\) by

    $$\begin{aligned} A\varvec{x}^{k+1} \, := \, A \varvec{x}^{k} \, - \, v \left( A \varvec{x}^{k} - A \tilde{\varvec{x}}^k \right) \, + \, v \left( B q^{k} - B \tilde{q}^k \right) \,, \end{aligned}$$
    (31)
    $$\begin{aligned} Bq^{k+1} \, := \, Bq^{k} \,-\, v \left( B q^{k} - B \tilde{q}^k \right) \,, \end{aligned}$$
    (32)
    $$\begin{aligned} {\varvec{\lambda }}^{k+1} \, := \, \tilde{{\varvec{\lambda }}}^k \,+\, v\beta \left( A \varvec{x}^{k} - A \tilde{{\varvec{x}}}^k \right) \,. \end{aligned}$$
    (33)
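The correction step is purely affine; when the predictors coincide with the current iterates it is a fixed point, which serves as a quick consistency check. A minimal sketch (our own naming, with illustrative parameter values):

```python
import numpy as np

def correct(Ax, Bq, lam_tilde, Ax_tilde, Bq_tilde, v=0.1, beta=0.5):
    """Correction step (31)-(33): blend the current iterates with the predictors."""
    Ax_new = Ax - v * (Ax - Ax_tilde) + v * (Bq - Bq_tilde)
    Bq_new = Bq - v * (Bq - Bq_tilde)
    lam_new = lam_tilde + v * beta * (Ax - Ax_tilde)
    return Ax_new, Bq_new, lam_new

Ax, Bq, lam = np.ones(4), np.zeros(4), np.full(4, 2.0)
out = correct(Ax, Bq, lam, Ax, Bq)   # predictors equal to current iterates
print(out)                           # unchanged: a fixed point of the correction
```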

Here, for simplicity, we set \( v = 0.1 \) in all the formulas (31), (32) and (33). Please refer to “Appendix C” for the detailed implementation of the sub-problems (28) and (29). It is worth noting that, since (26) has both equality and inequality constraints, the update of \(\tilde{{\varvec{\lambda }}}^k\) in (30) can be written componentwise as below:

$$\begin{aligned}&\tilde{\varvec{u}}^k \, := \, {\varvec{u}}^k \,-\, \beta \, \big ( \sum _{i \in S} {\tilde{w}_i}^k {h_i}^T + \textrm{Div} {\tilde{\varvec{p}}}^k \big ) \,, \end{aligned}$$
(34a)
$$\begin{aligned}&\tilde{\lambda _{+}}^k \, := \, \left[ {\lambda _{+}}^k \,-\, \beta \, \left( {\tilde{w}}^k + {\tilde{q}}^k\right) \right] _{+} \,, \end{aligned}$$
(34b)
$$\begin{aligned}&\tilde{\lambda _{-}}^k \, := \, \left[ {\lambda _{-}}^k \,-\, \beta \, \left( - {\tilde{w}}^k + {\tilde{q}}^k\right) \right] _{+} \,, \end{aligned}$$
(34c)

where \(\varvec{u}\) is associated with the equality constraint, and \({\lambda }_{+}\) and \({\lambda }_{-}\) are associated with the two inequality constraints, respectively.
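The non-negative projection \([\,\cdot\,]_+\) in (34b) and (34c) is a componentwise maximum with zero, keeping the inequality multipliers in \(\mathbb {R}_{+}^n\). A minimal sketch with illustrative values of ours:

```python
import numpy as np

def proj_nonneg(z):
    # [z]_+ keeps the inequality multipliers non-negative, as in (34b)-(34c)
    return np.maximum(z, 0.0)

lam_p = np.array([0.3, -0.2, 0.0])        # current multiplier lambda_+
beta = 0.5
w_t = np.array([1.0, -1.0, 0.0])          # predictor w-tilde
q_t = np.array([0.5, 0.5, 0.5])           # predictor q-tilde
print(proj_nonneg(lam_p - beta * (w_t + q_t)))   # update (34b)
```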

It is easy to see that the proposed single-stage optimization algorithm updates all the variables \(\left( {\varvec{x}}, q, {\varvec{\lambda }} \right) \,\), i.e., \(\left( w, \varvec{p}, q, \varvec{u}, \lambda _{+}, \lambda _{-} \right) \,\), at the same time. As a result, unlike the classical two-stage optimization approach, the proposed single-stage algorithm treats all the variables from a global optimization perspective. Numerical experiments demonstrate that the single-stage optimization algorithm is more seamless, requiring less manual intervention in the estimation of the variable \(q\,\), whereas in the two-stage optimization approach the handoff between the two computation stages is determined manually.

4 Experiments

This section presents extensive experiments showing the performance of the two-stage and single-stage approaches for the proposed convex-concave optimization model (4). In particular, two different methods are applied to implement the solvers of the sub-problems in the two-stage and single-stage approaches, i.e., the projected gradient method (PG) and the indefinite linearized method (IDL), which leads to four different algorithms, named Two-stage\(\_\)PG, Two-stage\(\_\)IDL, Single-stage\(\_\)PG and Single-stage\(\_\)IDL, respectively.

We first perform a numerical comparison of the proposed single-stage and two-stage approaches, implemented by PG and IDL respectively, in Sect. 4.2. The test problem in this part is image deblurring. Since these four algorithms address the same non-convex model (4), their comparison directly reflects the differences between the single-stage and two-stage approaches. It is clear that the single-stage optimization algorithms are more numerically efficient than the two-stage ones. Then, in Sect. 4.3, we omit the two-stage algorithms and compare our single-stage algorithms with other state-of-the-art non-convex methods, including the \(l_0\)-pseudo-norm data fidelity algorithm \(l_0\)TV-PADMM [44], the SCAD-based data fidelity algorithm TVSCAD [11], and the log-based data fidelity algorithm TVLog [45]. The tests in this part are three typical image restoration problems: image denoising, deblurring, and scratched image denoising. All the algorithms were implemented in Matlab R2020b and run on a laptop with a 2.90 GHz CPU and 8 GB RAM.

Fig. 1
figure 1

All images used in experiments. The size of images (a)–(e) is \(321\times 481\times 3\,\); the size of images (f)–(h) is \(481\times 321\times 3\,\); and the size of images (i)–(k) is \(512\times 512\)

4.1 Implementation Details

The clean test images are presented in Fig. 1. For the denoising tests, the test images are corrupted by salt-and-pepper (SP) noise or Gaussian random-valued (RV) noise affecting 30–70\(\%\) of the pixels. For the deblurring tests, the test images are artificially blurred as \(P \otimes \varvec{u} + e\,\), where P represents a blurring operation. Five types of blurring operations are tested in the experiments: the defocus blur with radius \(r=7\), the Gaussian blur with parameter \(( [9 \ 9],10 )\), the average blur with parameter 9, the motion blur with parameter [15, 30], and the disk blur with parameter 4. Here e represents additive SP noise or RV noise affecting 30–70\(\%\) of the pixels.
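For illustration, the SP and RV corruption models described above can be reproduced with a few lines of NumPy. The function names and the periodic boundary handling in the average blur are our own illustrative choices, not the exact routines used in the experiments (which rely on standard Matlab blur kernels).

```python
import numpy as np

def add_sp_noise(u, ratio, rng=None):
    """Replace `ratio` of the pixels with salt (1.0) or pepper (0.0) values.
    `u` is assumed to be a float image with intensities in [0, 1]."""
    rng = np.random.default_rng(rng)
    f = u.copy()
    mask = rng.random(u.shape) < ratio
    f[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(f.dtype)  # 0 or 1
    return f

def add_rv_noise(u, ratio, rng=None):
    """Replace `ratio` of the pixels with uniformly random values in [0, 1)."""
    rng = np.random.default_rng(rng)
    f = u.copy()
    mask = rng.random(u.shape) < ratio
    f[mask] = rng.random(int(mask.sum()))
    return f

def average_blur(u, size=9):
    """Periodic average blur of width `size`, standing in for P ⊗ u."""
    f = np.zeros_like(u, dtype=float)
    r = size // 2
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            f += np.roll(np.roll(u, dy, axis=0), dx, axis=1)
    return f / size**2
```

A full degraded observation is then `add_sp_noise(average_blur(u), 0.3)` for, e.g., average blur with 30% SP noise.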

In all test cases, for Two-stage\(\_\)PG we set the algorithm parameters to \((\beta ,\sigma ,\mu ) = (0.1, 0.1, 10)\,\); for Two-stage\(\_\)IDL, \((\beta ,\tau ) = (0.08, 0.75)\,\); for Single-stage\(\_\)PG, \((\beta ,\sigma ,\mu ) = (0.03, 0.1, 10)\,\); and for Single-stage\(\_\)IDL, \((\beta ,\tau ) = (0.06, 0.75)\,\). Across different application scenarios, we only tune the model parameters \((\lambda , \alpha )\,\). The parameters of \(l_0\)TV-PADMM, TVSCAD, and TVLog are set as suggested in [11, 44, 45].

For the two-stage optimization approach, we set the inner stopping criterion as

$$\begin{aligned} \frac{\left\| {\varvec{u}^{k+1} - \varvec{u}^k} \right\| _F}{\left\| \varvec{u}^{k+1}\right\| _F} < 5\times 10^{-5} \end{aligned}$$
(35)

and the outer one is

$$\begin{aligned} \frac{\left\| {\varvec{u}^{k+1} - \varvec{u}^k} \right\| _F}{\left\| \varvec{u}^{k+1}\right\| _F} + \frac{\left\| {q}^{k+1} - {q}^k \right\| _F}{\left\| {q}^{k+1}\right\| _F} < 1\times 10^{-4}\,. \end{aligned}$$
(36)

Meanwhile, to keep the stopping criterion of the single-stage optimization consistent with that of the two-stage optimization, we set the stopping criterion for the single-stage optimization approach to be the same as (36).
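Criteria (35) and (36) can be transcribed directly, assuming the iterates are stored as NumPy arrays; the small `eps` guarding against a zero denominator is our own addition.

```python
import numpy as np

def rel_change(x_new, x_old, eps=1e-12):
    """Relative change in Frobenius norm, the quantity used in (35) and (36)."""
    return np.linalg.norm(x_new - x_old) / (np.linalg.norm(x_new) + eps)

def inner_stop(u_new, u_old, tol=5e-5):
    """Inner criterion (35) for the two-stage approach."""
    return rel_change(u_new, u_old) < tol

def outer_stop(u_new, u_old, q_new, q_old, tol=1e-4):
    """Outer criterion (36); the same test is used for the single-stage approach."""
    return rel_change(u_new, u_old) + rel_change(q_new, q_old) < tol
```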

To illustrate the recovery performance of these methods, the Peak Signal to Noise Ratio (PSNR) and the Structural Similarity (SSIM) are used to measure the quality of the restored images. They are computed as follows:

$$\begin{aligned} \begin{aligned}&\text {PSNR}(\varvec{u},\varvec{u}_0) = 10{\log _{10}}\left( \frac{\max (\varvec{u}_0)^2}{\text {MSE}} \right) \,, \\&\text {SSIM}(\varvec{u},\varvec{u}_0) = \frac{\left( 2 \bar{\varvec{u}} \bar{\varvec{u}}_0 + c_1\right) \left( 2 \sigma _{\varvec{u} \varvec{u}_0} + c_2 \right) }{\left( \bar{\varvec{u}}^2 + \bar{\varvec{u}}_0^2 + c_1\right) \left( \sigma _{\varvec{u}}^2 + \sigma _{\varvec{u}_0}^2 + c_2\right) } \,, \end{aligned} \end{aligned}$$
(37)

where \(\varvec{u}\) is the recovered image, \(\varvec{u}_0\) is the original image, and MSE represents the mean square error between \(\varvec{u}\) and \(\varvec{u}_0\); \(\bar{\varvec{u}}\), \(\bar{\varvec{u}}_0\), \(\sigma _{\varvec{u}}\), \(\sigma _{\varvec{u}_0}\) and \(\sigma _{\varvec{u} \varvec{u}_0}\) are the local means, standard deviations, and cross-covariance of the images \(\varvec{u}\) and \(\varvec{u}_0\,\); and \(c_1\), \(c_2\) are small constants for numerical stability.
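As a reference, PSNR and a single-window (global) variant of SSIM from (37) can be computed as below. Note that standard SSIM averages this statistic over local windows, which we omit here for brevity, and the constants `c1`, `c2` are illustrative defaults rather than values taken from the experiments.

```python
import numpy as np

def psnr(u, u0):
    """PSNR in dB per (37); the peak value is taken over the original image u0."""
    u, u0 = np.asarray(u, float), np.asarray(u0, float)
    mse = np.mean((u - u0) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(np.max(u0) ** 2 / mse)

def ssim_global(u, u0, c1=1e-4, c2=9e-4):
    """Global (single-window) SSIM per (37), using whole-image statistics."""
    u, u0 = np.asarray(u, float), np.asarray(u0, float)
    mu, mu0 = u.mean(), u0.mean()
    var, var0 = u.var(), u0.var()
    cov = ((u - mu) * (u0 - mu0)).mean()
    return ((2 * mu * mu0 + c1) * (2 * cov + c2)) / \
           ((mu**2 + mu0**2 + c1) * (var + var0 + c2))
```

For an image with peak value 1, a uniform error of 0.1 gives an MSE of 0.01 and hence a PSNR of 20 dB, which is a useful sanity check.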

4.2 Numerical Comparisons of Proposed Algorithms

This section compares the numerical performance of the four proposed algorithms, i.e., Two-stage\(\_\)PG, Two-stage\(\_\)IDL, Single-stage\(\_\)PG and Single-stage\(\_\)IDL, on image deblurring experiments. We first use an example to compare the two-stage and single-stage algorithms under different parameter settings. The comparison results between the PG-based and IDL-based implementations are shown in Sects. 4.2.1 and 4.2.2, respectively. Then, we compare the deblurring performance of all four algorithms under further blurring operators in Sect. 4.2.3.

Fig. 2
figure 2

Experiment results of image deblurring by the four proposed algorithms. Parameter settings for sub-figures (c) and (d) are \(\lambda = 0.022, \, \alpha = 0.12\,\), and those for sub-figures (e) and (f) are \(\lambda = 0.015, \alpha = 0.05\,\)

4.2.1 Comparisons Between PG-based Implementations

The test image, shown in Fig. 2b, is the ‘aircraft’ image artificially blurred by the defocus blur operation, with \(30\%\) of the pixels contaminated by RV noise. Experiment results by the four proposed algorithms show that all algorithms converge successfully and achieve similar results in terms of PSNR, SSIM and visual quality. The single-stage algorithms reach convergence faster than the two-stage ones, and, within the same two-stage or single-stage optimization framework, the IDL implementation is faster than the PG-based one. In the following, we provide detailed performance comparisons between the PG-based implementations (“Appendices B.1 and C.1”) and the IDL-based ones (“Appendices B.2 and C.2”), respectively.

Fig. 3
figure 3

Comparisons between PG-based implementations: in each graph, the x-coordinate is a log of the iteration number and the y-coordinate is a log of the MSE

For the PG-based implementations (see “Appendices B.1 and C.1” for numerical details), the key algorithm parameters include the ALM penalty parameter \(\beta \), the PG step-size \(\mu \), and the TV step-size \(\sigma \). Figure 3 illustrates the convergence rates under different parameter settings: we initially set \(\beta = 0.1\,\), \(\sigma = 0.1\,\), \(\mu = 10\) for the PG-based two-stage optimization approach, and \(\beta = 0.03\,\), \(\sigma = 0.1\,\), \(\mu = 10\) for the PG-based single-stage optimization approach; then, in each experiment, we vary one parameter while keeping the others constant.

In general, as shown in Fig. 3, the single-stage algorithm with the PG implementation converges faster than the two-stage one, and both obtain similar visual results (see Fig. 2); in particular, the single-stage PG algorithm converges fairly smoothly, while the two-stage PG algorithm exhibits a zig-zag convergence after an initial smooth decline. This is expected since, for the two-stage algorithm, the update of q at the second stage influences the global convergence and slows it down. Figure 3 also shows that the single-stage algorithm is much less sensitive to the parameter settings than the two-stage one; in other words, the single-stage algorithm performs more stably.

4.2.2 Comparisons Between IDL-based Implementations

For both the two-stage and single-stage approaches, the IDL implementations (see “Appendices B.2 and C.2” for numerical details) generally converge faster than the corresponding PG implementations. The key parameters of these algorithms are the ALM penalty parameter \(\beta \) and the IDL factor \(\tau \). Figure 4 shows the convergence rates under different parameter settings: we initially set \(\beta = 0.08\), \(\tau = 0.75\) for the IDL-based two-stage optimization approach, and \(\beta = 0.06\,\), \(\tau = 0.75\,\) for the IDL-based single-stage optimization approach; then, in each experiment, we vary one parameter while keeping the other constant.

Fig. 4
figure 4

Comparisons between IDL-based implementations: in each graph, the x-coordinate is a log of the iteration number and the y-coordinate is a log of the MSE

Table 1 Experiment results for IDL-based implementations

In general, as shown in Fig. 4, the two-stage algorithm with the IDL implementation still exhibits a zig-zag convergence, whereas the single-stage algorithm with the IDL implementation converges smoothly and stably. In detail, as shown in Table 1, the single-stage IDL algorithm requires a substantially lower total iteration number (It.) than the two-stage one, while both reach comparable visual results (see Fig. 2). It is noteworthy that, for the Single-stage\(\_\)IDL algorithm, the lower the \(\tau \,\), the fewer iterations are needed to attain convergence and the higher the accuracy (i.e., the lower the MSE) of the converged results, which is consistent with the conclusion of [13]. Meanwhile, for the Two-stage\(\_\)IDL algorithm, a lower IDL parameter \(\tau \) reduces the iteration numbers of the ALM algorithm that solves its first-stage sub-problem, and thus reduces the overall iteration numbers. However, as Table 1 shows, the lower the \(\tau \), the poorer the solution accuracy (i.e., the larger the MSE) of the Two-stage\(\_\)IDL algorithm. This is because the update of q at the second stage influences the global convergence and degrades the accuracy.

4.2.3 Comparisons Under Different Blur Operators

In this part, we test the performance of the four proposed algorithms, i.e., Two-stage\(\_\)PG, Two-stage\(\_\)IDL, Single-stage\(\_\)PG, and Single-stage\(\_\)IDL, under further kinds of blur operators. The PSNR, SSIM, and iteration-number results of all the algorithms for recovering from different types of blur operators are listed in Table 2. Generally, for all blur operators, the four duality-based two-stage and single-stage algorithms converge successfully, and the single-stage algorithms require fewer iterations than the two-stage ones. In particular, when the test images are contaminated by additive SP noise, regardless of the blur operator, the Single-stage\(\_\)IDL algorithm achieves the highest PSNR values and the smallest iteration numbers. When the test images are contaminated by additive RV noise, the two-stage algorithms may obtain higher PSNR or SSIM values in some cases for the average, motion, and disk blur operators; however, the Single-stage\(\_\)IDL algorithm achieves the best PSNR or SSIM values in most cases, and it still has an obvious advantage in requiring the minimum iteration numbers in all cases.

Table 2 PSNR/SSIM/It. results for image deblurring under different blur operators and \(30\%\) additive noise

4.3 Experiments of Different Image Restoration Problems

In Sect. 4.2, we demonstrated the advantage of the single-stage approach over the two-stage approach for solving the same non-convex model (4). In this section, we therefore drop the two-stage algorithms and compare our single-stage algorithms (Single-stage\(\_\)PG and Single-stage\(\_\)IDL) with other state-of-the-art non-convex models and algorithms, namely \(l_0\)TV-PADMM [44], TVSCAD [11], and TVLog [45]. The tests cover three typical image restoration scenarios: image denoising, deblurring, and scratched image denoising.

4.3.1 Image Denoising

In this experiment, we demonstrate the performance of all five algorithms on image denoising. Figure 5 shows the visual results of the different algorithms for denoising the corrupted ‘castle’ image with a noise level of \(30\%\,\). Its first row shows the recovered images for denoising SP noise; in this case the parameter settings for the single-stage algorithms are \(\lambda = 0.06, \, \alpha = 0.02\,\). In general, the duality-based Single-stage\(\_\)IDL algorithm achieves the best visual quality, PSNR value, SSIM value, and iteration numbers. Since the \(l_0\)TV-PADMM, Single-stage\(\_\)PG, and Single-stage\(\_\)IDL algorithms all use the prior information of SP noise, i.e., the indicating matrix O, their visual results, PSNR, and SSIM values are better than those of the TVSCAD and TVLog algorithms. Denoising additive RV noise is usually more challenging in practice than denoising additive SP noise. The recovered images for additive RV denoising are shown in the second row of Fig. 5, with parameter settings \(\lambda = 0.53, \, \alpha = 0.02\) for the single-stage algorithms. The results show that our proposed Single-stage\(\_\)IDL algorithm still achieves the best visual quality, PSNR value, and iteration numbers, while the TVSCAD algorithm performs well in the SSIM value.

Fig. 5
figure 5

Visual results for image denoising by different algorithms. Top: denoising with SP noise. Bottom: denoising with RV noise

Table 3 PSNR/SSIM/It. results for images denoising

Table 3 shows the PSNR/SSIM/It. results for image denoising when SP or RV noise is added at different noise levels. Generally, for additive SP noise, the single-stage algorithms have the advantage of fewer iteration numbers. The Single-stage\(\_\)IDL algorithm outperforms the other algorithms in PSNR, SSIM, and iteration numbers in most cases; the \(l_0\)TV-PADMM also has advantages in some cases. The TVSCAD and TVLog algorithms are somewhat at a disadvantage since they do not employ the noise-type indicating matrix O. For additive RV noise, the Single-stage\(\_\)IDL algorithm achieves the minimum iteration numbers in all cases and the highest PSNR and SSIM values in most cases. The TVSCAD and TVLog algorithms also have advantages in PSNR or SSIM values in some cases; however, as two specific multi-stage methods, they are at a disadvantage in terms of iteration numbers. Especially when the noise level increases, the iteration numbers of the TVSCAD and TVLog algorithms grow rapidly, whereas those of the single-stage algorithms remain small. This is because the single-stage approach is a global optimization framework that optimizes all the variables simultaneously, which makes it more stable and less affected by an increase in the noise level.

4.3.2 Image Deblurring

In this part, we show the performance of the \(l_0\)TV-PADMM, TVSCAD, TVLog, and proposed single-stage algorithms on image deblurring with the defocus blur operator. In the deblurring experiments with SP noise, we found that the noise-type indicating matrix O had a detrimental effect on our algorithms, so we did not utilize it. Unlike image denoising, the image deblurring problem involves a multiplicative kernel in addition to additive noise, and is hence more difficult to handle.

Fig. 6
figure 6

Visual results for image deblurring by different algorithms. Top: defocused deblurring with additive SP noise. Bottom: defocused deblurring with additive RV noise

Table 4 PSNR/SSIM/It. results for images deblurring

Figure 6 shows the visual results of the different algorithms for deblurring the corrupted ‘boat’ image with a noise level of \(30\%\). The first row displays the recovered images for deblurring with additive SP noise; in this case the parameter settings for the single-stage algorithms are \(\lambda = 0.019, \, \alpha = 0.01\,\). It is clear that all five algorithms converge successfully. Among them, the TVLog algorithm obtains the best PSNR and SSIM, while the Single-stage\(\_\)IDL algorithm achieves the best iteration number and CPU time. The recovered images for additive RV deblurring are shown in the second row of Fig. 6, with parameter settings \(\lambda = 0.032, \, \alpha = 0.03\) for the single-stage algorithms. Not surprisingly, all five algorithms converge successfully with similar results; moreover, the Single-stage\(\_\)IDL algorithm achieves the best PSNR, SSIM, iteration number, and CPU time.

Table 4 shows further image recovery PSNR/SSIM/It. results for image deblurring when SP or RV noise is added at different noise levels. For image deblurring with additive SP noise, the \(l_0\)TV-PADMM and TVLog algorithms have advantages in PSNR and SSIM in most cases, while the proposed Single-stage\(\_\)IDL still has advantages in iteration numbers. Especially when the noise level becomes high, for some test images with rich details, such as starfish or spaceman, the iteration numbers of the TVSCAD and TVLog algorithms increase very quickly, but the Single-stage\(\_\)IDL algorithm can still reach convergence within a small iteration number. For image deblurring with additive RV noise, the \(l_0\)TV-PADMM has advantages in PSNR and SSIM in some high-noise-level cases, while the Single-stage\(\_\)IDL algorithm has advantages in PSNR and SSIM in most cases. Similarly, in terms of computational efficiency, the Single-stage\(\_\)IDL algorithm retains its superiority over the TVSCAD and TVLog models.

4.3.3 Scratched Image Denoising

In the last experiment, we demonstrate the performance of all five algorithms on restoring images corrupted by nonuniform scratches, i.e., with image information loss. The test images f are the ‘pepper’, ‘barbara’ and ‘pirate’ images with manual scratches, as shown in the first column of Fig. 7. Recovering such scratched images is typically more challenging than image denoising and deblurring.

Fig. 7
figure 7

Experiment results of scratched image denoising by different algorithms. Top: denoising for scratched ‘pepper’. Middle: denoising for scratched ‘barbara’. Bottom: denoising for scratched ‘pirate’

The recovered images for denoising the scratched ‘pepper’ are shown in the first row of Fig. 7, with parameter settings \(\lambda = 0.19, \, \alpha = 0.05\) for the single-stage algorithms. All the proposed algorithms converge successfully with similar visual results. The \(l_0\)TV-PADMM algorithm achieves the best PSNR and SSIM, while the Single-stage\(\_\)IDL algorithm achieves the best iteration numbers and CPU time. The recovered images for denoising the scratched ‘barbara’ are shown in the second row of Fig. 7, with parameter settings \(\lambda = 0.19, \, \alpha = 0.05\) for the single-stage algorithms. Clearly, the \(l_0\)TV-PADMM, Single-stage\(\_\)PG, and Single-stage\(\_\)IDL algorithms obtain good visual results, and the Single-stage\(\_\)IDL algorithm has an advantage in PSNR, SSIM, and iteration numbers. The recovered images for denoising the scratched ‘pirate’ are shown in the third row of Fig. 7, with parameter settings \(\lambda = 0.95, \, \alpha = 0.05\) for the single-stage algorithms. All the proposed algorithms still converge successfully, with similar results in PSNR and visual quality. Moreover, the single-stage optimization algorithm with the PG implementation converges significantly faster than the others.

5 Conclusion

In this paper, we propose a non-convex and non-smooth optimization model for image restoration problems with different types of corruption. The model combines a concave-convex sparsity-promoting function measuring the data fidelity with a convex TV regularization term. Using the primal-dual technique, the proposed difficult non-convex and non-smooth problem can, surprisingly, be reduced to a convex optimization problem with linear equality and inequality constraints. Based on these observations, an efficient augmented Lagrangian method with two different implementations, i.e., projected gradient and indefinite linearization, is proposed to solve the target non-smooth and non-convex optimization problem. The proposed dual optimization framework deals with both the non-convex and convex terms of the studied optimization problem jointly from a global optimization perspective, which differs from the classical two-stage convex-concave way and hence enjoys great benefits in both theory and numerics. Extensive experiments in image restoration, including image denoising, image deblurring, and scratched image denoising, have been performed; the experiment results show that the proposed dual-optimization-based single-stage algorithm outperforms the multi-stage methods in numerical efficiency, stability, and image recovery quality. In addition, the Single-stage\(\_\)IDL algorithm is competitive with \(l_0\)TV-PADMM [44].