1 Introduction

With the rapid development of computer technology, images are widely used in many fields, especially remote sensing and medical imaging. However, images are often corrupted by various kinds of noise during transmission. Because of the degraded quality of a noisy image, it cannot be used directly in practical applications such as image classification, super-resolution, image enhancement and recognition. Therefore, image noise removal is an indispensable step in many practical applications and an active topic in low-level computer vision. Formally speaking, image noise removal is an inverse problem that reconstructs the original image as faithfully as possible from its corrupted observation, and the problem has been studied extensively in recent decades [8, 11, 18, 21,22,23, 25, 34, 39, 41,42,43,44,45, 47, 48, 68, 70]. Image denoising can be modeled as follows,

$$ f = Au + n $$
(1)

Here, u denotes the original image defined on Ω, a bounded open domain with a compact Lipschitz boundary. The observed image f : Ω → R is decomposed into the true image u and additive noise n. A can be either an identity operator or a blurring operator. In practice, two types of noise are often encountered during image transmission and acquisition: white Gaussian noise (GN) and impulse noise (IN). GN is commonly introduced by thermal motion in the camera, while IN is generally caused by damaged pixels in camera sensors and by transmission errors [6].
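For illustration, the following Python sketch simulates the degradation model (1) with A taken as the identity: Gaussian noise stands in for GN and salt-and-pepper corruption for IN. The function names, the noise levels sigma and ratio, and the random test image are illustrative assumptions, not values used elsewhere in this paper.

```python
import numpy as np

def add_gaussian_noise(u, sigma=0.1, rng=None):
    """Return f = u + n with n ~ N(0, sigma^2) (identity operator A)."""
    rng = np.random.default_rng() if rng is None else rng
    return u + sigma * rng.standard_normal(u.shape)

def add_salt_pepper_noise(u, ratio=0.1, rng=None):
    """Corrupt a fraction `ratio` of pixels with impulse (salt-and-pepper) noise."""
    rng = np.random.default_rng() if rng is None else rng
    f = u.copy()
    mask = rng.random(u.shape) < ratio
    salt = rng.random(u.shape) < 0.5       # half salt (max), half pepper (min)
    f[mask & salt] = u.max()
    f[mask & ~salt] = u.min()
    return f

# mixed degradation f = Au + n with A = identity
u = np.random.rand(64, 64)                 # stand-in for a clean image in [0, 1]
f = add_salt_pepper_noise(add_gaussian_noise(u, sigma=0.1), ratio=0.1)
```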

In recent decades, various techniques have been proposed for image denoising, such as the methods in [63, 71, 74] for GN, the methods in [41, 44, 68, 70] for IN removal, and the methods in [28, 73] for Poisson noise. Although those algorithms perform well for single-noise reduction, they do not perform well on mixed noise because different noises follow different distributions.

In most previous works, the maximum a posteriori (MAP) estimation approach is used to derive image denoising algorithms. We can formalize the problem as follows,

$$ \hat u = \underset{u}{\arg \max } p\left( {u|f} \right) $$
(2)

Here, the observed image f(x) and the original image u(x) are two random variables for each x ∈ Ω. Applying the Bayes rule, we obtain,

$$ \begin{array}{@{}rcl@{}} \hat u &=& \underset{u}{\arg \max } \, p\left( {u|f} \right)\\ & =& \underset{u}{\arg \max } \frac{{p\left( {f|u} \right)p\left( u \right)}}{{p\left( f \right)}}\\ &=& \underset{u}{\arg \min } - \log p\left( {f|u} \right) - \log p\left( u \right)\\ &=& \underset{u}{\arg \min } - \int\limits_{\Omega} {\log p\left( {f\left( x \right)|u\left( x \right)} \right)} dx - \log p\left( u \right) \end{array} $$
(3)

Normally, the first term captures the noise distribution and is called the data-fidelity term, and the second term encodes the prior information of the original image and is usually called the regularization term. In practice, the L2-norm data-fidelity term is used to fit the GN distribution, and the L1-norm data-fidelity term is employed to model the IN distribution [41, 44, 63, 68, 70, 71, 74]. Consequently, based on the Bayesian framework, GN can be removed successfully by solving the following optimization problem,

$$ \hat u = \underset{u}{\arg \min } \left\| {f - u} \right\|_{2}^{2} + \lambda R(u) $$
(4)

here, \(f \in {R^{N}}\), \(u \in {R^{N}}\), λ, and R(u) denote the noisy image, the clean image, a weight parameter, and the regularization term, respectively. Similarly, IN can be removed by solving the following minimization problem,

$$ \hat u = \underset{u}{\arg \min } \left\{ {{{\left\| {f - Au} \right\|}_{1}} + \lambda R\left( u \right)} \right\} $$
(5)

With the above derivations for GN and IN, we can now derive a model that handles both types of noise. Setting up a joint likelihood for the mixed noise and applying Bayes' theorem, we obtain,

$$ \hat u = \underset{u}{\arg \min } \left\{ {{\mu_{1}}\left\| {f - u} \right\|_{2}^{2} + {\mu_{2}}{{\left\| {f - Au} \right\|}_{1}} + \lambda R\left( u \right)} \right\} $$
(6)

here, μ1 and μ2 are also weight parameters.

Minimization problem (6) is a template for many variational optimization models, and over the past years variational optimization models for image processing have remained an active topic in computer vision. These models mainly consist of two parts: a data-fidelity term and a regularization term. The regularization term maintains the smoothness of the objective function and helps recover a clean image with sharp edges and fine detail. A model with a regularization term is also called a variational regularization model [62]; such models have been applied to both statistical learning and variational frameworks [4, 26, 30,31,32, 46]. If the regularization is convex, the optimization problem has a unique solution; if it is nonconvex, the solution is not necessarily unique. Recently, nonconvex regularization has become popular because it can restore high-quality images while preserving local detail and edges. Accordingly, some scholars have explored theoretical solutions in [53] and numerical solutions in [31, 32, 50, 55, 59]. Since nonsmooth and nonconvex regularization is a powerful tool, there is still room to improve its performance in computer vision, especially for image noise reduction. In this paper, we therefore introduce an efficient mixed noise removal algorithm with multiple fidelity terms based on nonsmooth and nonconvex optimization.

Although a complete theory for minimization problems with nonconvex and nonsmooth regularization is not yet available, many numerical algorithms have been proposed, such as iteratively reweighted least squares algorithms [19, 31, 54, 66], half-quadratic algorithms [2, 14, 65], and graduated nonconvexity (GNC) algorithms [5, 55, 56]. However, the GNC method, which tracks the minimizer of nonsmooth and nonconvex potentials, requires considerable computation time. The iteratively reweighted (IR) method [15] was proposed to exploit the sparsity properties of regularized compressive sensing problems and has been applied to image restoration [35, 57]. Consequently, in this paper a nonsmooth and nonconvex regularization method is used to solve the image restoration problem, and to handle the nonconvex regularization we adopt the multistage convex relaxation method [72], which yields a better solution than the standard single-stage convex relaxation.

Apart from variational optimization models, the most commonly used mixed noise removal approaches are based on the "detecting then filtering" strategy. The trilateral filter [29] combined a gradient bilateral filter and an intensity bilateral filter with a pyramid-based method to limit the filter window. The nonlocal means (NLM) filter [8, 9] averages similar pixels over the image to remove noise, and it can also be used as a regularization term in variational optimization models. Recently, with the development of compressive sensing, sparse representation has become a powerful tool for various computer vision tasks, and some researchers have used a detection step for mixed noise removal based on sparse representation models. Tao et al. [64] have done extensive work in this direction and used a manifold ranking and optimization-based matrix factorization method for saliency detection.

With the widespread application of deep learning, several discriminative learning methods have been developed to learn image prior models in the context of truncated inference procedures. For example, Chen et al. [16, 17, 27] proposed a trainable nonlinear reaction diffusion model, which learns a modified fields-of-experts [61] image prior by unfolding a fixed number of gradient descent inference steps. He et al. [37] proposed a pulse-coupled neural network (PCNN)-based underwater image enhancement scheme for robotic visual systems; their experimental results showed that the enhanced result improves the color and contrast of the source image and enhances the details and edges of darker regions, which benefits deep-sea exploration tasks.

In this paper, we propose an efficient mixed GN and IN removal approach based on combined L1 and L2 data-fidelity terms. We further introduce a nonsmooth and nonconvex regularization term for the image restoration problem, and we use the multistage convex relaxation method to deal with the nonconvex regularizer, which yields a high-quality solution. Experimental results on simulated noisy images illustrate that our method outperforms current state-of-the-art mixed noise removal methods.

2 Preliminaries

2.1 Alternating direction method of multipliers

In this subsection, the theories underpinning the proposed mixed noise removal algorithm are introduced. First, the alternating direction method of multipliers (ADMM) is reviewed. We consider the following structured constrained convex optimization problem, which is solved by the alternating direction method [7, 26, 36],

$$ \underset{x \in X,y \in Y} {\min }\left\{ {{\theta_{1}}\left( x \right) + {\theta_{2}}\left( y \right)\left| {Ax + By = b} \right.} \right\} $$
(7)

here, \({\theta _{1}}\left (x \right ):{R^{{n_{1}}}} \to R\) and \({\theta _{2}}\left (y \right ):{R^{{n_{2}}}} \to R \) are convex functions (not necessarily smooth), \(A \in {R^{m \times {n_{1}}}}\), \(B \in {R^{m \times {n_{2}}}}\), and \(b \in R^{m}\); \(X \subset {R^{{n_{1}}}}\) and \( Y \subset {R^{{n_{2}}}}\) are given closed convex sets. Let λ be the Lagrangian multiplier for the linear constraint Ax + By = b in (7). Then, the augmented Lagrangian function is as follows,

$$ L\left( {x,y,\lambda } \right) = {\theta_{1}}\left( x \right) + {\theta_{2}}\left( y \right) - {\lambda^{T}}\left( {Ax + By - b} \right) + \frac{\beta }{2}\left\| {Ax + By - b} \right\|_{2}^{2} $$
(8)

which is defined on X × Y × R^m. Now let (x∗, y∗, λ∗) be a saddle point of the Lagrangian function. Then (x∗, y∗, λ∗) ∈ Ω and it satisfies,

$$ \left\{ \begin{array}{l} {\theta_{1}}\left( x \right) - {\theta_{1}}\left( {{x^{*}}} \right) + {{\left( {x - {x^{*}}} \right)}^{T}}\left( { - {A^{T}}{\lambda^{*}}} \right) \ge 0\\ {\theta_{2}}\left( y \right) - {\theta_{2}}\left( {{y^{*}}} \right) + {{\left( {y - {y^{*}}} \right)}^{T}}\left( { - {B^{T}}{\lambda^{*}}} \right) \ge 0\\ {{\left( {\lambda - {\lambda^{*}}} \right)}^{T}}\left( {A{x^{*}} + B{y^{*}} - b} \right) \ge 0 \end{array} \right.\quad \forall \left( {x,y,\lambda } \right) \in {\Omega} $$
(9)

here, Ω = X × Y × R^m. By defining

$$ u = \left( {\begin{array}{*{20}{c}} x\\ y \end{array}} \right),w = \left( {\begin{array}{*{20}{c}} x\\ y\\ \lambda \end{array}} \right),F\left( w \right) = \left( {\begin{array}{*{20}{c}} { - {A^{T}}\lambda }\\ { - {B^{T}}\lambda }\\ {Ax + By - b} \end{array}} \right) $$
(10)

and 𝜃 (u) = 𝜃1 (x) + 𝜃2 (y), the first-order optimality condition (9) can be written in the following compact form,

$$ {w^{*}} \in {\Omega} ,\theta \left( u \right) - \theta \left( {{u^{*}}} \right) + {\left( {w - {w^{*}}} \right)^{T}}F\left( {{w^{*}}} \right) \ge 0,\forall w \in \Omega $$
(11)

Note that the mapping F is monotone. We use Ω∗ to denote the solution set of the variational inequality (11). For convenience, we use the following notation,

$$ v = \left( {\begin{array}{*{20}{c}} y\\ x \end{array}} \right),{V^{*}} = \left\{ {\left( {{x^{*}},{y^{*}}} \right)\left| {\left( {{x^{*}},{y^{*}},{\lambda^{*}}} \right) \in {\Omega}^{*} } \right.} \right\} $$
(12)

For a given \({\lambda^{k}}\), \({u^{k+1}} = \left( {{x^{k+1}},{y^{k+1}}} \right)\) is the solution of the following problem,

$$ \left( {{x^{k + 1}},{y^{k + 1}}} \right)\\ = \arg \min \left\{ {\begin{array}{*{20}{c}} \begin{array}{l} {\theta_{1}}\left( x \right) + {\theta_{2}}\left( y \right)\\ - {\left( {{\lambda^{k}}} \right)^{T}}\left( {Ax + By - b} \right) \end{array}\\ { + \frac{\beta }{2}\left\| {Ax + By - b} \right\|_{2}^{2}} \end{array}\left| {\begin{array}{*{20}{c}} {x \in X}\\ {y \in Y} \end{array}} \right.} \right\} $$
(13)

The multiplier update is defined by,

$$ {\lambda^{k + 1}} = {\lambda^{k}} - \beta \left( {A{x^{k + 1}} + B{y^{k + 1}} - b} \right) $$
(14)

For a given \(\left( {{y^{k}},{\lambda^{k}}} \right)\), \({x^{k+1}}\) is the solution of the following problem,

$$ {x^{k + 1}} = \underset{x \in X}{\arg\min} \left\{ {{\theta_{1}}\left( x \right) - {{\left( {{\lambda^{k}}} \right)}^{T}}\left( {Ax + B{y^{k}} - b} \right) + \frac{\beta }{2}\left\| {Ax + B{y^{k}} - b} \right\|_{2}^{2}} \right\} $$
(15)

Splitting the joint minimization in this way, for a given \(\left( {{x^{k+1}},{\lambda^{k}}} \right)\), \({y^{k+1}}\) is the solution of the following problem,

$$ {y^{k + 1}} = \underset{y \in Y}{\arg\min} \left\{ {{\theta_{2}}\left( y \right) - {{\left( {{\lambda^{k}}} \right)}^{T}}\left( {A{x^{k + 1}} + By - b} \right) + \frac{\beta }{2}\left\| {A{x^{k + 1}} + By - b} \right\|_{2}^{2}} \right\} $$
(16)

and

$$ {\lambda^{k + 1}} = {\lambda^{k}} - \beta \left( {A{x^{k + 1}} + B{y^{k + 1}} - b} \right) $$
(17)

Note that (15) and (16) are equivalent to (18) and (19), respectively.

$$ {x^{k + 1}} = \underset{x \in X}{\arg\min} \left\{ {{\theta_{1}}\left( x \right) + \frac{\beta }{2}\left\| {\left( {Ax + B{y^{k}} - b} \right) - \frac{1}{\beta }{\lambda^{k}}} \right\|_{2}^{2}} \right\} $$
(18)
$$ {y^{k + 1}} = \underset{y \in Y}{\arg\min} \left\{ {{\theta_{2}}\left( y \right) + \frac{\beta }{2}\left\| {\left( {A{x^{k + 1}} + By - b} \right) - \frac{1}{\beta }{\lambda^{k}}} \right\|_{2}^{2}} \right\} $$
(19)
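To make the updates (14), (18) and (19) concrete, the following sketch applies them to a small hypothetical instance with θ1(x) = ½‖x − c‖², θ2(y) = τ‖y‖1, A = I, B = −I, and b = 0, so that the constraint is x = y. The problem data and the values of τ and β are illustrative only.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def admm_toy(c, tau=0.5, beta=1.0, iters=100):
    """ADMM for: min_{x,y} 0.5*||x - c||^2 + tau*||y||_1  s.t.  x - y = 0."""
    x = np.zeros_like(c)
    y = np.zeros_like(c)
    lam = np.zeros_like(c)
    for _ in range(iters):
        # x-step, cf. (18): min_x 0.5||x - c||^2 + beta/2 ||x - y - lam/beta||^2
        x = (c + beta * y + lam) / (1.0 + beta)
        # y-step, cf. (19): min_y tau||y||_1 + beta/2 ||x - y - lam/beta||^2
        y = soft_threshold(x - lam / beta, tau / beta)
        # multiplier update, cf. (14) with A = I, B = -I, b = 0
        lam = lam - beta * (x - y)
    return x, y

x_hat, y_hat = admm_toy(np.array([2.0, -0.3, 0.1, 1.5]))
```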

2.2 The multi-stage convex relaxation method

In this subsection, we introduce the multistage convex relaxation method, since it will be used later. Following the multi-stage convex relaxation method first proposed in [72], we consider the following optimization problem,

$$ \hat w = \underset{w \in {R^{d}}}{\text{argmin}} \left\{ {g\left( w \right) = {g_{0}}\left( w \right) + \sum\limits_{i = 1}^{I} {{g_{i}}\left( w \right)} } \right\} $$
(20)

where g0 (w) is convex in w and each gi (w) is nonconvex. The convex/concave duality first introduced in [72] can be used to solve this optimization problem efficiently.

Let \({h_{i}}\left (w \right ):{R^{d}} \to {{\Omega }_{i}} \subset {R^{{d_{i}}}}\) be a vector function with range Ωi, and assume that there is a function \({\bar g_{i}}\), which is defined on Ωi, such that,

$$ {g_{i}}\left( w \right) = {\bar g_{i}}\left( {{h_{i}}\left( w \right)} \right) $$
(21)

Suppose that such an hi exists and that the function \({\bar g_{i}}\left ({{z_{i}}} \right )\) is concave in zi ∈ Ωi. Then, the function gi (w) can be rewritten as,

$$ {g_{i}}\left( w \right) = \underset{{v_{i}} \in {R^{{d_{i}}}}}{\min } \left\{ {{v_{i}^{T}}{h_{i}}\left( w \right) + g_{i}^{*}\left( {{v_{i}}} \right)} \right\} $$
(22)

Using concave duality [60], the concave dual \( g_{i}^{*}\left ({{v_{i}}} \right ) \) of \({\bar g_{i}}\left ({{z_{i}}} \right )\) is defined by,

$$ g_{i}^{*}\left( {{v_{i}}} \right) = \underset{{z_{i}} \in {{\Omega}_{i}}}{\min } \left[ { - {v_{i}^{T}}{z_{i}} + {{\bar g}_{i}}\left( {{z_{i}}} \right)} \right] $$
(23)

The minimizing vi in (22) is given by,

$$ {\hat v_{i}} = {\left. {{\nabla_{z}}{{\bar g}_{i}}\left( z \right)} \right|_{z = {h_{i}}\left( w \right)}} $$
(24)

Second, with the vi fixed, we obtain a simple convex relaxation of (20),

$$ \hat w = \underset{w \in {R^{d}}}{\arg \min } \left\{ {{g_{0}}\left( w \right) + \sum\limits_{i = 1}^{I} {{h_{i}}{{\left( w \right)}^{T}}{v_{i}}} } \right\} $$
(25)

Therefore, we can summarize the multistage convex relaxation algorithm as Algorithm 1.

Algorithm 1 The multistage convex relaxation algorithm
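As a minimal, hypothetical instance of Algorithm 1, the sketch below applies the multistage relaxation to the separable problem min_w ½(w − s)² + φ1(|w|) with p = 1 (the potential used later in (34)): each stage fixes the weight v via (24) and solves the resulting weighted-L1 convex relaxation (25) in closed form by soft-thresholding. The data s, the value of ρ, and the number of stages are assumptions made for illustration.

```python
import numpy as np

def phi1_grad(t, rho=1.0):
    """Derivative of phi1(t) = t / (1 + rho*t) for t >= 0, used as the weight in (24)."""
    return 1.0 / (1.0 + rho * t) ** 2

def multistage_relaxation(s, rho=1.0, stages=5):
    """Multistage convex relaxation for min_w 0.5*(w - s)^2 + phi1(|w|)."""
    w = s.copy()                        # stage 0: start from the unregularized solution
    for _ in range(stages):
        v = phi1_grad(np.abs(w), rho)   # update the relaxation weights, cf. (24)
        # convex relaxation (25): min_w 0.5*(w - s)^2 + v*|w|  ->  soft-thresholding
        w = np.sign(s) * np.maximum(np.abs(s) - v, 0.0)
    return w

w_hat = multistage_relaxation(np.array([2.0, 0.4, -0.05, -1.2]))
```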

3 The related work

Numerous methods have been developed for image denoising over the past decades, and most of them can be classified into two types: deep-neural-network-based methods and regularization-optimization-based methods.

3.1 Deep neural networks for image denoising

Several deep neural networks have been applied to image noise reduction and image enhancement. In the past few years, researchers have also used sparse representation (SR) for various computer vision and image processing tasks. In [69], the authors used sparse denoising auto-encoders to handle GN removal and obtained satisfactory results compared with K-SVD [25]. The multilayer perceptron (MLP) was successfully applied to image noise reduction in [10]. Convolutional neural networks (CNNs) were used for image noise removal by Kingma and Ba [40] and achieved comparable results. In [16], a trainable nonlinear reaction diffusion (TNRD) model was proposed that can be expressed as a feed-forward deep network by unfolding a fixed number of gradient descent inference steps. In [27], a TNRD model for Poisson noise was proposed and achieves promising performance. In [37], PCNN-based image enhancement and color transfer algorithms are combined to enhance underwater images, which contributes to deep-sea exploration tasks. Zhang et al. [75] used residual learning and batch normalization to speed up the training process as well as boost the denoising performance. To the best of our knowledge, the development of CNNs for general image denoising remains largely unexplored.

3.2 Regularization optimization for image noise reduction

Traditional regularization optimization is still an active approach in image processing and computer vision, and most of these works use different regularization terms encoding prior information to constrain the image u. If an observed image is corrupted by GN only, the noise distribution can be fitted by an L2-norm data-fidelity term, so the GN can be removed effectively by solving the following optimization problem,

$$ \hat u = \underset{u}{\arg \min } \left\| {f - u} \right\|_{2}^{2} + R\left( u \right) $$
(26)

here, R (u) is a regularization function. If we formulate the Gaussian model as a MAP estimation, we obtain the following general form,

$$ \hat u = \underset{u}{\arg \min } \left\| {f - u} \right\|_{2}^{2} + p\left( u \right) $$
(27)

here, p (u) plays the role of R (u). Usually, we assume that u follows a Gibbs prior [3],

$$ p\left( u \right) = \exp \left( { - \gamma \int\limits_{\Omega} {\phi \left( {u\left( x \right)} \right)dx} } \right) $$
(28)

here, ϕ is a nonnegative function. Let ϕ (u (x)) = ∥∇u (x)∥1, where ∇u (x) = (∇xu, ∇yu) is the gradient of u; then ∥∇u (x)∥1 is the total variation (TV) of u in [62].
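For reference, the discrete anisotropic total variation can be computed directly from forward differences; the small sketch below assumes replicate (Neumann) boundary handling, which is one common convention.

```python
import numpy as np

def total_variation(u):
    """Anisotropic discrete TV, sum of |forward differences|,
    with replicate (Neumann) boundary handling."""
    ux = np.diff(u, axis=1, append=u[:, -1:])   # horizontal differences
    uy = np.diff(u, axis=0, append=u[-1:, :])   # vertical differences
    return np.sum(np.abs(ux)) + np.sum(np.abs(uy))

tv_value = total_variation(np.random.rand(32, 32))
```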

The TV-L1 model [13] is obtained as a variant of the ROF model by replacing the squared norm in the data-fidelity term with the robust L1 norm. Although this is only a slight change, the TV-L1 model has some potential advantages over the ROF model: 1) the TV-L1 model is contrast invariant; and 2) the TV-L1 model is much more effective at removing strong impulse noise (e.g., salt-and-pepper noise). Because the Mumford-Shah (MS) model [49] is nonconvex, it is difficult to compute the minimizer of the functional. To approximate the solution of the MS functional, Cai et al. [12] proposed the following equivalent model,

$$ \hat u = \underset{u}{\arg \min } \left\{ {\frac{{{\gamma_{1}}}}{2}\left\| {f - Au} \right\|_{2}^{2} + \frac{\alpha }{2}\left\| {\nabla u\left( x \right)} \right\|_{2}^{2} + {{\left\| {\nabla u\left( x \right)} \right\|}_{1}}} \right\} $$
(29)

here, γ1 and α are positive weight parameters.

Because the TV-L1 model is more effective than the MS model at removing strong impulse noise (e.g., salt-and-pepper noise), the L1 data-fidelity term ∥f − Au∥1 plays an important role here; it has been extensively studied in [1, 51, 52]. Based on the TV-L1 model, some researchers proposed the following model to remove strong impulse noise.

$$ \hat u = \underset{u}{\arg \min } \left\{ {\frac{{{\gamma_{1}}}}{2}{{\left\| {f - Au} \right\|}_{1}} + \frac{\alpha }{2}\left\| {\nabla u\left( x \right)} \right\|_{2}^{2} + {{\left\| {\nabla u\left( x \right)} \right\|}_{1}}} \right\} $$
(30)

To remove both GN and IN [38], a combined L1-L2 data-fidelity term was suggested by Shi [43], who combined (29) and (30) to obtain,

$$ \hat u = \underset{u}{\arg \min } \left\{ \begin{array}{l} \frac{{{\gamma_{1}}}}{2}{\left\| {f - Au} \right\|_{1}} + \frac{{{\gamma_{2}}}}{2}\left\| {f - Au} \right\|_{2}^{2}\\ \begin{array}{*{20}{c}} {}&{} \end{array} + \frac{\alpha }{2}\left\| {\nabla u\left( x \right)} \right\|_{2}^{2} + {\left\| {\nabla u\left( x \right)} \right\|_{1}} \end{array} \right\} $$
(31)

here, α > 0, γ1 ≥ 0, and γ2 ≥ 0 are parameters that balance the data fidelity terms and the regularization terms.

Recently, patch-based nonlocal low-rank regularization has been shown to outperform nuclear-norm regularization in image reconstruction [22, 24, 34, 39, 41, 42]. The rank minimization problem is the key subproblem in patch-based nonlocal low-rank regularization methods, and it can be solved approximately by minimizing the following surrogate [24].

$$ E\left( {u,\varepsilon } \right) = \log \det \left( {u + \varepsilon I} \right) $$
(32)

here, u ∈ R^{n×n} is a symmetric positive semidefinite matrix and I is the identity matrix. In [58], Osher et al. proposed a novel low dimensional manifold model (LDMM), applied it to image noise reduction, and achieved good results. Tao et al. [64] used manifold ranking-based matrix factorization (MRMF) for saliency detection; they proved that MRMF generalizes well and developed an efficient optimization algorithm based on the Nesterov method.

4 Proposed nonconvex image denoising model and algorithms

Based on the Shi model [43], we propose the following image denoising model,

$$ \hat u = \underset{u}{\arg \min } \left\{ \begin{array}{l} \frac{{{\gamma_{1}}}}{2}{\left\| {f - Au} \right\|_{1}} + \frac{{{\gamma_{2}}}}{2}\left\| {f - Au} \right\|_{2}^{2}\\ \begin{array}{*{20}{c}} {}&{}&{} \end{array} + \frac{\alpha }{2}\left\| {\nabla u\left( x \right)} \right\|_{2}^{2} + p\left( u \right) \end{array} \right\} $$
(33)

Using the Gibbs prior \(p\left (u \right ) = \exp \left ({ - \gamma \int \limits _{\Omega } {\phi \left ({u\left (x \right )} \right )dx} } \right )\), Shi chose ϕ (u (x)) = ∥∇u (x)∥1, which results in (31). To improve the denoising effect, we instead define the nonconvex Gibbs-prior regularizer \(\int {\phi \left ({\left | {\nabla u} \right |} \right )} dx\), where ϕ is a nonconvex function, namely,

$$ \left\{ {\begin{array}{*{20}{c}} {{\phi_{1}}\left( {\left| t \right|} \right) = \frac{{{{\left| t \right|}^{p}}}}{{1 + \rho {{\left| t \right|}^{p}}}}}\\ {{\phi_{2}}\left( {\left| t \right|} \right) = \frac{1}{\rho }\log \left( {1 + \rho {{\left| t \right|}^{p}}} \right)} \end{array}} \right. $$
(34)

for p ≥ 1.
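Both potentials in (34) are straightforward to evaluate; a small sketch with illustrative values of ρ and p is given below.

```python
import numpy as np

def phi1(t, rho=1.0, p=1.0):
    """phi_1(|t|) = |t|^p / (1 + rho * |t|^p), cf. (34)."""
    tp = np.abs(t) ** p
    return tp / (1.0 + rho * tp)

def phi2(t, rho=1.0, p=1.0):
    """phi_2(|t|) = (1/rho) * log(1 + rho * |t|^p), cf. (34)."""
    return np.log1p(rho * np.abs(t) ** p) / rho

t = np.linspace(-5.0, 5.0, 11)
print(phi1(t), phi2(t))
```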

Even though the existence and uniqueness of solutions of nonconvex regularized optimization problems remain open questions, nonconvex regularization can preserve more image detail during restoration. We therefore propose the following model (based on the Shi model),

$$ \hat u = \underset{u}{\arg \min } \left\{ \begin{array}{l} \lambda {\left\| {f - Au} \right\|_{1}} + \frac{\mu }{2}\left\| {f - Au} \right\|_{2}^{2}\\ + \frac{\alpha }{2}\left\| {\nabla u\left( x \right)} \right\|_{2}^{2} + \int\limits_{\Omega} {\phi \left( {\left| {\nabla u} \right|} \right)dx} \end{array} \right\} $$
(35)

here, the regularization term \(\int {\phi \left ({\left | {\nabla u} \right |} \right )} dx\) corresponds to the Gibbs prior \(p\left (u \right ) = \exp \left ({ - \gamma \int \limits _{\Omega } {\phi \left ({\left | {\nabla u\left (x \right )} \right |} \right )dx} } \right )\), and ϕ is the nonconvex function defined in (34).

In the following subsections, we present the optimization algorithm to solve the proposed model (35), which is based on a variable splitting technique [67]. To deal with the nonconvex regularizer, a multistage convex relaxation method is used.

We exploit the algorithmic framework of the variable splitting technique [67] to solve the unconstrained problem described by (35). To split the variables, we introduce auxiliary variables h and d with the constraints h = f − Au and d = ∇u, which yields the constrained problem

$$ \begin{array}{cl} \underset{u,h,d}{\min } & \left\{ \lambda \int\limits_{\Omega} {\left| h \right|dx} + \frac{\mu }{2}\int\limits_{\Omega} {{{\left| {f - Au} \right|}^{2}}dx} + \frac{\alpha }{2}\left\| {\nabla u} \right\|_{2}^{2} + \int\limits_{\Omega} {\phi \left( {\left| d \right|} \right)dx} \right\} \\ \mathrm{s.t.} & h = f - Au,\; d = \nabla u \end{array} $$
(36)

Next, we exploit the algorithmic framework of the Split Bregman Iteration Method (SBIM) [33] to solve (36). The additional details of the SBIM algorithm are as follows,

$$ \left( {{u^{k + 1}},{h^{k + 1}},{d^{k + 1}}} \right)\\ = \underset{u,d,h}{\arg \min } \left\{ \begin{array}{l} F\left( {h,d} \right) + \frac{{{\gamma_{1}}}}{2}\left\| {h - \left( {f - Au} \right) - {b_{1}^{k}}} \right\|_{2}^{2}\\ + \frac{{{\gamma_{2}}}}{2}\left\| {d - \nabla u - {b_{2}^{k}}} \right\|_{2}^{2} \end{array} \right\} $$
(37)
$$ b_{1}^{k + 1} = {b_{1}^{k}} + \left( {f - A{u^{k + 1}}} \right) - {h^{k + 1}} $$
(38)
$$ b_{2}^{k + 1} = {b_{2}^{k}} + \nabla {u^{k + 1}} - {d^{k + 1}} $$
(39)

here,

$$ F\left( {h,d} \right) = \left( \begin{array}{l} \lambda \int\limits_{\Omega} {\left| h \right|dx + \frac{\mu }{2}\int\limits_{\Omega} {{{\left| {f - Au} \right|}^{2}}dx} } \\ \begin{array}{*{20}{c}} {}&{} \end{array} + \frac{\alpha }{2}\left\| {\nabla u} \right\|_{2}^{2} + \int\limits_{\Omega} {\phi \left( {\left| d \right|} \right)dx} \end{array} \right) $$
(40)

To solve the minimization problem (37), an alternating iteration algorithm is used. Specifically, the solutions \(\left( {{u^{k+1}},{h^{k+1}},{d^{k+1}}} \right)\) of the minimization problem are obtained as follows,

$$ {u^{k + 1}} \in \underset{u}{\arg \min } \left\{ \begin{array}{l} \frac{\mu }{2}\left\| {f - Au} \right\|_{2}^{2} + \frac{\alpha }{2}\left\| {\nabla u} \right\|_{2}^{2}\\ + \frac{{{\gamma_{1}}}}{2}\left\| {{h^{k}} - \left( {f - Au} \right) - {b_{1}^{k}}} \right\|_{2}^{2} + \frac{{{\gamma_{2}}}}{2}\left\| {{d^{k}} - \nabla u - {b_{2}^{k}}} \right\|_{2}^{2} \end{array} \right\} $$
(41)
$$ {h^{k + 1}} \in \underset{h}{\arg \min } \left\{ {\lambda \int\limits_{\Omega} {\left| h \right|dx} + \frac{{{\gamma_{1}}}}{2}\left\| {h - \left( {f - A{u^{k + 1}}} \right) - {b_{1}^{k}}} \right\|_{2}^{2}} \right\} $$
(42)
$$ {d^{k + 1}} \in \underset{d} {\arg \min }\left\{ {\int\limits_{\Omega} {\phi \left( {\left| d \right|} \right)dx} + \frac{{{\gamma_{2}}}}{2}\left\| {d - \nabla {u^{k + 1}} - {b_{2}^{k}}} \right\|_{2}^{2}} \right\} $$
(43)

It is noteworthy that the alternating SBIM is equivalent to ADMM [7] when ϕ is convex.

4.1 u-subproblem

From (41), the optimal \({u^{k+1}}\) must satisfy the following Euler-Lagrange equation,

$$ \left[ {\left( {\mu + {\gamma_{1}}} \right){A^{T}}A - \left( {\alpha + {\gamma_{2}}} \right){\Delta} } \right]u = \mu {A^{T}}f + {\gamma_{1}}{A^{T}}\left( {f + {b_{1}^{k}} - {h^{k}}} \right) - {\gamma_{2}}\operatorname{div}\left( {{d^{k}} - {b_{2}^{k}}} \right) $$
(44)

We can use the Gauss-Seidel method to solve this equation as in [3]. Equation (44) may also be solved efficiently in the discrete Fourier domain if we assume periodic boundary conditions. Denoting by \(\mathcal {F}\left (u \right )\) the Fourier transform of u, the closed-form solution of (44) can be written as follows,

$$ u = {\mathcal{F}^{- 1}}\left( {\frac{{\mu \mathcal{F}\left( {{A^{T}}f} \right) + {\gamma_{1}}\mathcal{F}\left[ {{A^{T}}\left( {f + {b_{1}^{k}} - {h^{k}}} \right)} \right] - {\gamma_{2}}\mathcal{F}\left( {\operatorname{div}\left( {{d^{k}} - {b_{2}^{k}}} \right)} \right)}}{{\left( {\mu + {\gamma_{1}}} \right)\mathcal{F}\left( {{A^{T}}A} \right) - \left( {\alpha + {\gamma_{2}}} \right)\mathcal{F}\left( {\Delta} \right)}}} \right) $$
(45)
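A sketch of the Fourier-domain u-update (45) is given below for the pure denoising case A = I under periodic boundary conditions, with the gradient, divergence and Laplacian discretized by forward/backward differences. The function names, the discretization convention and the parameter values are illustrative assumptions.

```python
import numpy as np

def grad(u):
    """Forward differences with periodic boundary conditions."""
    return np.roll(u, -1, axis=1) - u, np.roll(u, -1, axis=0) - u

def div(px, py):
    """Backward-difference divergence (the negative adjoint of grad)."""
    return (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))

def solve_u_fft(f, h, b1, dx, dy, b2x, b2y, mu, gamma1, gamma2, alpha):
    """Closed-form u-update (45) for A = I under periodic boundary conditions.
    (dx, dy) and (b2x, b2y) are the two components of d^k and b_2^k."""
    M, N = f.shape
    wy = 2.0 * np.pi * np.arange(M) / M
    wx = 2.0 * np.pi * np.arange(N) / N
    # Fourier symbol of the periodic 5-point Laplacian div(grad(.))
    lap = 2.0 * np.cos(wy)[:, None] + 2.0 * np.cos(wx)[None, :] - 4.0
    rhs = mu * f + gamma1 * (f + b1 - h) - gamma2 * div(dx - b2x, dy - b2y)
    denom = (mu + gamma1) - (alpha + gamma2) * lap
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))

# quick smoke test with zero auxiliary variables
f = np.random.rand(64, 64)
z = np.zeros_like(f)
u_new = solve_u_fft(f, z, z, z, z, z, z, mu=1.0, gamma1=1.0, gamma2=1.0, alpha=0.1)
```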

4.2 h-subproblem

Similarly, the minimization problem (42) can be solved by the following shrinkage operator,

$$ {h^{k + 1}} = shrink\left( {f + {b_{1}^{k}} - A{u^{k + 1}},\frac{\lambda }{{{\gamma_{1}}}}} \right) $$
(46)

here, shrink (s, t) is the shrinkage operator [67], which is defined at each point α ∈ [0,1]^2 by

$$ shrink\left( {{s_{\alpha} },{t_{\alpha} }} \right) = \frac{{{s_{\alpha} }}}{{\left| {{s_{\alpha} }} \right|}}\max \left( {\left| {{s_{\alpha} }} \right| - {t_{\alpha} },0} \right) $$
(47)
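A direct implementation of the shrinkage operator (47), written to avoid division by zero at s = 0, might look as follows.

```python
import numpy as np

def shrink(s, t):
    """Shrinkage (soft-thresholding) operator of (47), applied pointwise;
    the max(|s|, eps) guard avoids 0/0 at s = 0, where the result is 0 anyway."""
    mag = np.abs(s)
    return s * np.maximum(mag - t, 0.0) / np.maximum(mag, 1e-12)
```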

4.3 d-subproblem

Moreover, analogously to the u-subproblem, we can obtain \({d^{k+1}}\) by solving the following nonconvex minimization problem,

$$ \underset{d}{\min } \left\{ {\int {\phi \left( {\left| d \right|} \right)dx + \frac{{{\gamma_{2}}}}{2}\left\| {d - s} \right\|_{2}^{2}} } \right\} $$
(48)

here, ϕ is the nonconvex regularizer in (34), and \(s = \nabla {u^{k + 1}} + {b_{2}^{k}}\). To handle the nonconvex term, we apply the multistage convex relaxation technique described above. Consequently, we directly use the result in [3]. Specifically, the solution of (48) can be obtained as follows,

$$ {d^{k + 1}} \in \underset{d} {\arg \min }\left\{ {\int\limits_{\Omega} {{v^{k + 1,l}}\left| d \right|dx + \frac{{{\gamma_{2}}}}{2}\left\| {d - s} \right\|_{2}^{2}} } \right\} $$
(49)
$$ {d^{k + 1,l + 1}} = shrink\left( {s,\frac{{{v^{k + 1,l}}}}{{{\gamma_{2}}}}} \right) $$
(50)
$$ {v^{k + 1,l + 1}} = \frac{1}{{{{\left( {1 + \rho \left| {{d^{k + 1,l + 1}}} \right|} \right)}^{2}}}}\quad \text{for}\ \phi = {\phi_{1}} $$
(51)
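The inner d-update (49)-(51) then amounts to a few reweighted shrinkage steps. The sketch below assumes φ = φ1 with p = 1 and treats |d| componentwise; the value of ρ and the number of inner iterations are illustrative assumptions.

```python
import numpy as np

def shrink(s, t):
    """Pointwise shrinkage operator of (47)."""
    mag = np.abs(s)
    return s * np.maximum(mag - t, 0.0) / np.maximum(mag, 1e-12)

def solve_d(s, gamma2, rho=1.0, inner_iters=3):
    """Multistage relaxation for min_d phi_1(|d|) + gamma2/2 * ||d - s||^2,
    cf. (49)-(51), with phi = phi_1 and p = 1 applied componentwise."""
    d = s.copy()
    for _ in range(inner_iters):
        v = 1.0 / (1.0 + rho * np.abs(d)) ** 2   # weight update, cf. (51)
        d = shrink(s, v / gamma2)                # weighted shrinkage, cf. (49)-(50)
    return d

d_new = solve_d(np.random.randn(2, 16, 16), gamma2=1.0)
```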

In this paper, we decompose the difficult optimization problem (37) into three subproblems (the u-, h-, and d-subproblems) based on the SBIM, and all of the subproblems admit fast and accurate solution techniques. For example, the u-subproblem can be solved efficiently using the FFT or the Gauss-Seidel iteration, while the h-subproblem and d-subproblem can be solved by applying the shrinkage operator. The complete optimization procedure is summarized in Algorithm 2.

Algorithm 2 The proposed mixed noise removal algorithm

5 Experimental results

In this section, we present the results of applying the proposed nonconvex and nonsmooth mixed noise removal algorithm to the image restoration problem. First, we employ a genetic algorithm to select the optimal parameters of the model. Second, we compare the noise removal performance of different models on images with multiple noise levels.

Because the proposed model has a nonconvex and nonsmooth regularizer, we cannot give a convergence analysis of the algorithm as the number of iteration steps increases (Fig. 1b). Accordingly, we use the number of iterations as the termination condition. Additionally, we evaluated the relationship between the number of iteration steps and the optimal ISNR on 100 images and chose to terminate after 16 iterations (Fig. 1a).

Fig. 1

a The relationship between the number of algorithm iteration steps and frequency of convergence; b The relationship between the number of algorithm iteration steps and the optimal ISNR on 100 images

Moreover, we compare against the TVL1 [13], LRTM [20] and SHI [43] models by applying them to several images, e.g., Parrots, Monarch, Boats, Cameraman, House and Lena (Fig. 2), and we compare the models' ISNR and PSNR values after denoising. The peak signal-to-noise ratio (PSNR) and the improvement in signal-to-noise ratio (ISNR) were chosen as quantitative measures of image quality.

Fig. 2

The six classic ground-truth images (a-f): a the Parrot, b the Monarch, c the Boat, d the Cameraman, e the House, f the Lena

We utilize the improvement in signal-to-noise ratio (ISNR) and the peak signal-to-noise ratio (PSNR) to evaluate the effectiveness of all methods at removing noise; the two metrics are defined as follows,

$$ \begin{array}{@{}rcl@{}} ISNR\left( {{u_{*}},\hat u,\tilde u} \right) &=& 10{\log_{10}}\left( {\frac{{\sum {{{\left( {{u_{*}} - \tilde u} \right)}^{2}}} }}{{\sum {{{\left( {{u_{*}} - \hat u} \right)}^{2}}} }}} \right)\\ PSNR\left( {{u_{*}},\hat u} \right) &=& 10{\log_{10}}\left( {\frac{{{{255}^{2}}mn}}{{\left\| {{u_{*}} - \hat u} \right\|_{2}^{2}}}} \right) \end{array} $$

here, \(u_{*} \in {R^{m \times n}}\) is the clean image, \(\hat u \in {R^{m \times n}}\) is the restored image, and \(\tilde u \in {R^{m \times n}}\) is the contaminated image. Note that higher ISNR and PSNR values indicate a better restoration.
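Both metrics can be computed directly; the sketch below assumes 8-bit images stored as floating-point arrays in [0, 255].

```python
import numpy as np

def psnr(u_clean, u_restored, peak=255.0):
    """Peak signal-to-noise ratio of the restored image against the clean image."""
    mse = np.mean((u_clean - u_restored) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def isnr(u_clean, u_restored, u_noisy):
    """Improvement in SNR: restored vs. noisy distance to the clean image."""
    num = np.sum((u_clean - u_noisy) ** 2)
    den = np.sum((u_clean - u_restored) ** 2)
    return 10.0 * np.log10(num / den)
```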

5.1 Parameter selection

There are a total of 5 parameters in our algorithm, and there are two common ways to select them. In the first, the parameters are chosen by experience. In the second, one fixes the values of one or more parameters and varies the others. Instead, we used a genetic algorithm to find the best parameters. In the proposed model, the parameters (a, b, c, and d) should satisfy the following conditions: a > 0, b < 0, and d > 0. Our goal is to choose one of the best points in the parameter space. Therefore, we chose the parameters that maximize the average ISNR and PSNR values on the training set P, so that the objective function of the genetic algorithm is defined by,

$$ \left\{ {\begin{array}{*{20}{c}} {O\left( {a,b,c,d} \right) = \sum\limits_{p \in P} {ISNR\left( {M\left( {p|a,b,c,d} \right)} \right)} }\\ {O\left( {a,b,c,d} \right) = \sum\limits_{p \in P} {PSNR\left( {M\left( {p|a,b,c,d} \right)} \right)} }\\ {P = \left\{ {{p_{1}},{p_{2}},{p_{3}},{p_{4}},...,{p_{n}}} \right\}} \end{array}} \right. $$
(52)

here, pi is an image in the training set, and a, b, c, and d are the model parameters to be selected. M represents the proposed model, and O is the objective function of the genetic algorithm.

In the genetic algorithm, the number of chromosomes is 20, the heritability is 0.85, and the number of generations is 400. Figure 3 shows the result of the parameter selection with 0.1 Gaussian noise.

Fig. 3

The relationship between generation and ISNR parameter selection with 0.1 Gauss noise

According to the result of this procedure, we choose the optimal parameters as follows: a = 0.01, b = 0.20,... Note that in the following experiments we used the same method to select the optimal parameters for each type of noise.
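A minimal genetic-algorithm sketch of this parameter search is given below. The fitness function is a placeholder (in practice it would run the proposed model on the training set P and average the ISNR or PSNR as in (52)), the parameter bounds are illustrative, and the heritability of 0.85 is interpreted here as the fraction of the population carried over to the next generation, which is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
POP, GENS, KEEP = 20, 400, 0.85            # chromosomes, generations, heritability

def fitness(params):
    """Placeholder objective: in practice, run the model M(p | a, b, c, d) on the
    training set and return the average ISNR/PSNR, cf. (52)."""
    return -np.sum((params - 0.1) ** 2)     # dummy stand-in, not the real objective

def evolve(bounds):
    """Simple elitist GA over a box-constrained parameter space."""
    pop = rng.uniform(bounds[:, 0], bounds[:, 1], size=(POP, len(bounds)))
    for _ in range(GENS):
        scores = np.array([fitness(p) for p in pop])
        elite = pop[np.argsort(scores)[::-1][: int(KEEP * POP)]]   # keep the fittest
        children = []
        while len(elite) + len(children) < POP:
            pa, pb = elite[rng.integers(len(elite), size=2)]
            child = np.where(rng.random(pa.shape) < 0.5, pa, pb)   # uniform crossover
            child += 0.01 * rng.standard_normal(pa.shape)          # small mutation
            children.append(np.clip(child, bounds[:, 0], bounds[:, 1]))
        pop = np.vstack([elite, np.array(children)])
    return pop[np.argmax([fitness(p) for p in pop])]

# illustrative bounds for the four parameters (a, b, c, d)
best = evolve(np.array([[0.0, 1.0], [-1.0, 1.0], [0.0, 1.0], [0.0, 1.0]]))
```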

5.2 Comparison of pure noise experiments

In this subsection, we compare the denoising performance of the different models on images corrupted by a single type of noise. We consider Gaussian noise and salt-and-pepper noise with noise levels set to 0.01, 0.1, and 0.4, respectively. The ISNR and PSNR results of the four models on different images are shown in Tables 1 and 2. The visual results for GN0.01 on Parrots, Monarch, Boats, Cameraman, House and Lena are shown in Fig. 4, and the visual results for SP0.1 are shown in Fig. 5.

Table 1 The denoising ISNR and PSNR results of the four models for different images with GN
Table 2 The denoising ISNR and PSNR results of the four models for different images with SP noise
Fig. 4

Denoising effect of noisy images (GN0.01): The first column is the LRTM, the second column is the TVL1, the third column is the SHI model, and the fourth column is Ours

Fig. 5

Denoising effect of noisy images (SP0.1): The first column is the LRTM, the second column is the TVL1, the third column is the SHI model, and the fourth column is Ours

The results in Tables 1 and 2 illustrate that the proposed model is superior to the other models in terms of running time, quantitative indices on Gaussian and salt-and-pepper noisy images, and local visual quality after restoration.

5.3 Comparison of mixed noise experiments

In this subsection, in order to verify the robustness of the proposed model, we perform an experimental comparison on images with mixed noise, where the mixed noise is a mixture of Gaussian noise and salt-and-pepper noise with levels GN0.01+SP0.01, GN0.01+SP0.1, GN0.1+SP0.01, and GN0.1+SP0.1, respectively. The experimental results are shown in Table 3, and the images before and after denoising for the different models are shown in Fig. 6.

Table 3 The denoising ISNR and PSNR results of the four models for different images with MIXED noise

It can be seen from Table 3 that the proposed model is superior to the other three models in terms of running time and quantitative indices on images with mixed noise, and that the local visual quality after restoration is better than that of the other models (Fig. 6).

Fig. 6

Denoising effect of noisy images (GN0.01+SP0.01): The first column is the LRTM, the second column is the TVL1, the third column is the SHI model, and the fourth column is Ours

6 Conclusion

In this paper, we proposed a nonconvex and nonsmooth regularized mixed noise removal algorithm based on the Shi model, obtained by solving a nonconvex minimization problem. We use the multistage convex relaxation technique to deal with the nonconvex term. Furthermore, we employ a genetic algorithm to select the optimal model parameters and set the number of iteration steps as the termination condition of the algorithm. Experiments on different images with different noise levels illustrate that the model's robustness, running time, ISNR and PSNR are better than those of the other three models. Additionally, our algorithm preserves the local information of the images and yields better visual quality.