
1 Introduction

Noise removal is a long-standing problem in image processing. Many different methods have been proposed for the image denoising problem, among which the well-known Rudin-Osher-Fatemi (ROF) total variation (TV) model is one of the most popular [8]. This model restores the image by solving the minimization problem

$$\begin{aligned} \arg \min _u\{\frac{1}{2}\Vert u-f\Vert _2^2+\lambda \,\mathrm {{TV}}(u)\}, \end{aligned}$$
(1)

where f is the observed noisy image, \(\mathrm {TV}(u)\) denotes the total variation of u, and \(\lambda \) is the regularization parameter balancing the two terms in (1). The distinctive feature of the total variation is that it preserves sharp edges well. However, natural images usually contain smooth parts that can be destroyed by TV; as a result, images denoised by total variation regularization often suffer from undesired staircasing effects in the flat regions of the image [10]. To address this drawback of the total variation, many improved ROF models have been proposed in the recent literature. These approaches can be roughly divided into two categories. In the first category, local gradient information is added to the total variation so that it can differentiate between different features of the image [4, 5, 11]. The second category reduces the staircasing effect in the restored image by incorporating higher order derivatives into the regularization term. One successful approach in this direction was given in [2], which suggested using the infimal convolution of functionals with first and second order derivatives as the regularizer. The resulting image denoising model has the form

$$\begin{aligned} \arg \min _u\{\frac{1}{2}\Vert u-f\Vert _2^2+\mathrm {ICTV}(u)\}, \end{aligned}$$
(2)

where \(\mathrm {ICTV}(u)\) is the infimal convolution regularization that will be defined in Sect. 2. Although the staircasing effects caused by total variation can be alleviated to some extent, the infimal convolution regularization often penalizes the smooth regions of the image too much and may blur the sharp edges and finer details of the image.

In this paper, a novel higher order total variation regularization is proposed to remove the staircasing effects while still preserving sharp edges and finer details in the restored image. We first give an equivalent formulation of the traditional infimal convolution regularizer and motivate the proposed higher order total variation by relaxing the constraints appearing in that formulation. This modification does not introduce any additional difficulty in the numerical treatment of the resulting image denoising model. We characterize the solution of the proposed model via fixed points of equations in which the proximity operators of convex functions are involved. Based on this characterization, a convergent proximity algorithm is developed to solve the model.

The paper is organized as follows. In Sect. 2, we briefly introduce the difference matrices used to define the total variation and the regularization terms in this paper. In Sect. 3, we present the higher order total variation regularization term and the proposed image denoising model. We characterize the solution of the model via fixed point equations and develop numerical algorithms for the proposed model in Sect. 4. Numerical experiments showing the performance of our method are given in Sect. 5. Finally, we conclude the paper in Sect. 6.

2 Difference Matrices and Total Variation

In this section, we present the precise definitions of the regularization terms appearing in models (1) and (2). For the sake of simplicity, we assume the image is an \(n\times n\) square matrix for some integer n. To describe the problem in the language of matrix algebra, we treat the image as a vector in \(\mathbb {R}^N\), with \(N=n^2\), obtained by sequentially concatenating the columns of the image. To define the total variation of the image u, we need the \(n \times n\) backward difference matrix

$$D:=\begin{bmatrix} 0 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix},$$

then \(-D^\top D\) is the second order central difference matrix. Since we reshape the square image into a vector, we can use

$$D_x:=I_n\otimes D,~~D_y:=D\otimes I_n$$

as the first order difference matrices for the image, where \(\otimes \) denotes the Kronecker product of the matrices and \(I_n\) denotes the \(n\times n\) identity matrix. Hence, by setting the gradient matrix

$$B_1:=[D_x;D_y]$$

which is formed by stacking the two matrices into a single one, and \(\varphi :\mathbb {R}^{2N}\rightarrow \mathbb {R}\) defined for \(z\in \mathbb {R}^{2N}\) as

$$\begin{aligned} \varphi (z):=\sum _{i=1}^N\sqrt{z_i^2+z_{i+N}^2}, \end{aligned}$$
(3)

the composition \(\varphi (B_1u)\) computes the total variation \(\mathrm {TV}(u)\) appearing in the model (1).
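To make these constructions concrete, the following short Python sketch (using NumPy and SciPy sparse matrices; the helper names backward_diff and tv are ours for illustration, not a reference implementation) builds D, \(D_x\) and \(D_y\) via Kronecker products and evaluates \(\mathrm {TV}(u)=\varphi (B_1u)\) for a small test image.

```python
import numpy as np
import scipy.sparse as sp

def backward_diff(n):
    """The n x n backward difference matrix D with a zero first row."""
    D = sp.lil_matrix((n, n))
    D.setdiag(1.0)
    D.setdiag(-1.0, -1)
    D[0, 0] = 0.0
    return D.tocsr()

def tv(u_img):
    """Isotropic total variation TV(u) = phi(B_1 u) of an n x n image."""
    n = u_img.shape[0]
    D, I = backward_diff(n), sp.identity(n, format="csr")
    Dx, Dy = sp.kron(I, D, format="csr"), sp.kron(D, I, format="csr")
    u = u_img.flatten(order="F")            # concatenate columns: u in R^N
    z = np.concatenate([Dx @ u, Dy @ u])    # B_1 u = [D_x u; D_y u]
    N = n * n
    return np.sum(np.sqrt(z[:N] ** 2 + z[N:] ** 2))   # phi(z) from (3)

print(tv(np.outer(np.arange(4.0), np.ones(4))))       # a simple ramp image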

The second order difference matrices for the image can be defined as

$$D_{xx}:=I_n\otimes (-D^\top D),~~D_{yy}:=(-D^\top D)\,\otimes \,I_n,~~D_{xy}:=(-D^\top )\,\otimes \,D,~~D_{yx}:=D\,\otimes \,(-D^\top ),$$

from which we obtain the second order gradient matrix

$$B_2:=[D_{xx};D_{xy};D_{yx};D_{yy}].$$

The first and second order gradient matrices \(B_1\) and \(B_2\) are related by

$$B_2=RB_1$$

where

$$R=\begin{bmatrix} I_n\otimes (-D^\top )&0\\ (-D^\top )\otimes I_n&0\\ 0&I_n\otimes (-D^\top )\\ 0&(-D^\top )\otimes I_n \end{bmatrix}.$$
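The factorization \(B_2=RB_1\) can be checked numerically. The following sketch (same illustrative conventions as above) assembles \(B_1\), \(B_2\) and R for a small n and verifies that \(B_2-RB_1\) vanishes.

```python
import numpy as np
import scipy.sparse as sp

n = 5
D = sp.lil_matrix((n, n)); D.setdiag(1.0); D.setdiag(-1.0, -1); D[0, 0] = 0.0
D = D.tocsr()
I = sp.identity(n, format="csr")
Z = sp.csr_matrix((n * n, n * n))

B1 = sp.vstack([sp.kron(I, D), sp.kron(D, I)])
B2 = sp.vstack([sp.kron(I, -D.T @ D), sp.kron(-D.T, D),
                sp.kron(D, -D.T), sp.kron(-D.T @ D, I)])
R = sp.bmat([[sp.kron(I, -D.T), Z],
             [sp.kron(-D.T, I), Z],
             [Z, sp.kron(I, -D.T)],
             [Z, sp.kron(-D.T, I)]])

print(abs(B2 - R @ B1).max())   # prints 0.0: B_2 = R B_1 holds exactly
```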

Based on our notation, the regularizer \(\mathrm {ICTV}(u)\) in (2), which uses the infimal convolution of functionals with first and second order derivatives, has the form

$$\begin{aligned} \mathrm {ICTV}(u):=\min _{u=u_1+u_2}\{\lambda _1 \varphi (B_1u_1)+\lambda _2\psi (B_2u_2)\}, \end{aligned}$$
(4)

where \(\lambda _1\) and \(\lambda _2\) are positive parameters balancing the first and second order derivatives, and \(\psi :\mathbb {R}^{4N}\rightarrow \mathbb {R}\) is defined for \(z\in \mathbb {R}^{4N}\) as

$$\begin{aligned} \psi (z):=\sum _{i=1}^N\sqrt{\sum _{j=0}^3z_{i+jN}^2}. \end{aligned}$$
(5)

3 The Proposed Model

In this section, we give the proposed higher order total variation regularization term and the resulting image denoising model. The proposed regularizer can be seen as an improved version of the infimal convolution regularizer in (4). The following lemma presents an equivalent form of the infimal convolution regularizer in (4), which helps motivate the proposed higher order total variation regularization term.

Lemma 1

For any u, we have that

$$\begin{aligned} \mathrm {ICTV}(u)=\min \limits _{\substack{B_1u=d+x\\ d,\,x\in \mathcal {R}(B_1)}}\{\lambda _1\varphi (d)+\lambda _2\psi (Rx)\}, \end{aligned}$$
(6)

where \(\mathcal {R}(\cdot )\) denotes the range of a matrix.

Proof

Clearly, we can conclude that

$$\min \limits _{\substack{B_1u=d+x\\ d,\,x\in \mathcal {R}(B_1)}}\{\lambda _1\varphi (d)+\lambda _2\psi (Rx)\}=\min \limits _{B_1u=B_1u_1+B_1u_2}\{\lambda _1\varphi (B_1u_1)+\lambda _2\psi (B_2u_2)\}$$

by setting \(d=B_1u_1\) and \(x=B_1u_2\), and noting that \(\psi (Rx)=\psi (RB_1u_2)=\psi (B_2u_2)\).

Since \(u=u_1+u_2\) implies that \(B_1u=B_1u_1+B_1u_2\), we obtain

$$\begin{aligned} \min \limits _{u=u_1+u_2}\{\lambda _1\varphi (B_1u_1)+\lambda _2\psi (B_2u_2)\}\ge \min \limits _{B_1u=B_1u_1+B_1u_2}\{\lambda _1\varphi (B_1u_1)+\lambda _2\psi (B_2u_2)\} \end{aligned}$$
(7)

On the other hand, for any \(u_1,~u_2\) satisfying \(B_1u=B_1u_1+B_1u_2\), there exists \(v\in \mathcal {N}(B_1)\), where \(\mathcal {N}(\cdot )\) denotes the null space of a matrix, such that \(u=u_{1}+u_{2}+v\). Since \(B_1(u_1+v)=B_1u_1\), we have

$$\begin{aligned} \lambda _1\varphi (B_1u_1)+\lambda _2\psi (B_2u_2)&=\lambda _1\varphi (B_1(u_{1}+v))+\lambda _2\psi (B_2u_{2})\\ &\ge \min \limits _{u=u_1+u_2}\{\lambda _1\varphi (B_1u_1)+\lambda _2\psi (B_2u_2)\}, \end{aligned}$$

which implies that

$$\min \limits _{B_1u=B_1u_1+B_1u_2}\{\lambda _1\varphi (B_1u_1)+\lambda _2\psi (B_2u_2)\}\ge \min \limits _{u=u_1+u_2}\{\lambda _1\varphi (B_1u_1)+\lambda _2\psi (B_2u_2)\}.$$

This together with (7) completes the proof.

The proposed higher order total variation regularization term can be seen as an improved version of the infimal convolution regularizer and has the form

$$\begin{aligned} \mathrm {HTV}(u)=\min \limits _{ B_1u=d+x}\{\lambda _1\varphi (d)+\lambda _2\psi (Rx)\}. \end{aligned}$$
(8)

As can be seen from Lemma 1, the difference between the original infimal convolution regularization and its modified version (8) lies in the relaxed constraints of the new regularizer. For \(\mathrm {HTV}(u)\) we no longer require that \(d,x\in \mathcal {R}(B_1)\), and \(B_1u\) can be decomposed into arbitrary components d and x. The authors in [9] showed that this modification generally leads to better numerical results than the original model (4). This modification was given a theoretical foundation in [1] based on tensor algebra, and the corresponding new regularizer was called the total generalized variation.

Using the higher order total variation in (8) as the regularizer, the proposed image denoising model is

$$\begin{aligned} \arg \min _u\{\frac{1}{2}\Vert u-f\Vert _2^2+\mathrm {HTV}(u)\}, \end{aligned}$$
(9)

which can be reformulated as the equivalent unconstrained minimization problem in two variables

$$\begin{aligned} \arg \min _{u,d}\{\frac{1}{2}\Vert u-f\Vert _2^2+\lambda _1\varphi (d)+\lambda _2\psi (B_2u-Rd)\}. \end{aligned}$$
(10)

The two models are equivalent in the sense that if the pair \((\hat{u},\hat{d})\) is a solution of model (10), then \(\hat{u}\) is a solution of model (9). Therefore, we can solve model (10) instead of directly solving model (9).

4 Minimization Algorithm

The main focus of this section is to develop a numerical proximity algorithm that solves the proposed model (10). The proximity algorithm was proposed in [6] for solving the ROF model, where it outperformed the benchmark split Bregman iteration method. Applications of the proximity algorithm to other image models can be found in [3, 7]. The main idea of this algorithm is to first characterize the solution of the model via the fixed point of equations in which the proximity operators of the convex functions are involved, and then to develop efficient numerical algorithms via various fixed point iterations. The proximity operator of a real-valued convex function g is defined for \(x\in \mathbb {R}^d\) and \(\gamma >0\) by

$$\mathrm {prox}_{\gamma g}(x):=\arg \min _{y}\{\frac{1}{2}\Vert y-x\Vert _2^2+\gamma g(y)\}.$$
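As a simple illustration (not taken from the paper), for the scalar function \(g=|\cdot |\) the proximity operator reduces to soft thresholding, which can be checked against a direct numerical minimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prox_numeric(x, gamma):
    # solve min_y { 0.5*(y - x)^2 + gamma*|y| } directly
    return minimize_scalar(lambda y: 0.5 * (y - x) ** 2 + gamma * abs(y)).x

def prox_closed(x, gamma):
    # the known closed form: soft thresholding
    return np.sign(x) * max(abs(x) - gamma, 0.0)

print(prox_numeric(1.7, 0.5), prox_closed(1.7, 0.5))   # both approx 1.2
```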

For the convenience of developing numerical algorithms, we rewrite the proposed model in the general form

$$\begin{aligned} \arg \min _{w}\{F(w)+\lambda _2\psi (Bw)\}, \end{aligned}$$
(11)

by defining the matrix B, the vector w and the function F as

$$B=[B_2,-R], w=[u;d]$$

and

$$F(w)=\frac{1}{2}\Vert u-f\Vert _2^2+\lambda _1\varphi (d)$$

respectively. The following proposition characterizes a solution of model (11) in terms of fixed point equations.

Proposition 1

If \(\hat{w}\) is a solution of model (11), then for any \(\sigma ,\gamma >0\) there exist two vectors b, c such that

$$\begin{aligned} \hat{w}&=\mathrm {prox}_{\frac{1}{\sigma }F}\big((I-\tfrac{\gamma }{\sigma }B^\top B)\hat{w}-\tfrac{\gamma }{\sigma }B^\top (b-c)\big)\\ c&=\mathrm {prox}_{\frac{1}{\gamma }\psi }(b+B\hat{w})\\ b&=b+B\hat{w}-c \end{aligned}$$
(12)

Conversely, if there exist positive parameters \(\sigma ,\gamma \) and vectors b, c, and \(\hat{w}\) satisfying the equations in (12), then \(\hat{w}\) is a solution of model (11).

Proof

See Proposition 5 in [3].

According to Proposition 1, the solution of model (10) is characterized by the fixed point equations in (12). Based on these equations, we can naturally propose an iterative algorithm that generates the sequence \(\{(w^k,c^k,b^k)\}\) from an arbitrary initial vector \((w^0,c^0,b^0)\). This iterative procedure is stated as follows:

$$\begin{aligned} \left\{ \begin{array}{l} w^{k+1}=\mathrm {prox}_{\frac{1}{\sigma }F}((I-\frac{\gamma }{\sigma }B^\top B)w^k-\frac{\gamma }{\sigma }B^\top (b^k-c^k))\\ c^{k+1}=\mathrm {prox}_{\frac{1}{\gamma }\psi }(b^k+Bw^{k+1})\\ b^{k+1}=b^k+Bw^{k+1}-c^{k+1} \end{array} \right. . \end{aligned}$$
(13)

The following theorem establishes a convergence result for the proposed iterative scheme (13); its proof can be found in Theorem 4.1 of [3].

Theorem 1

If the positive numbers \(\gamma \) and \(\sigma \) are chosen to satisfy

$$\begin{aligned} \frac{\gamma }{\sigma }<\frac{1}{\Vert B\Vert _2^2}, \end{aligned}$$
(14)

then the sequence \(\{(w^k,b^k,c^k)\}\) generated by the iterative scheme (13) converges to the solution of model (10).
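In practice, condition (14) requires an estimate of \(\Vert B\Vert _2\). A standard choice is power iteration on \(B^\top B\), as in the following sketch; the helper name spectral_norm and the safety factor 0.9 are our illustrative choices, not prescribed by the paper.

```python
import numpy as np

def spectral_norm(B, iters=200, seed=0):
    """Estimate ||B||_2 by power iteration on B^T B (B may be sparse)."""
    x = np.random.default_rng(seed).standard_normal(B.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = B.T @ (B @ x)
        x = y / np.linalg.norm(y)
    return float(np.sqrt(np.linalg.norm(B.T @ (B @ x))))

# choosing the parameters so that gamma/sigma < 1/||B||_2^2, cf. (14):
# sigma = 1.0
# gamma = 0.9 * sigma / spectral_norm(B) ** 2
```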

Implementing the iterative procedure (13) requires explicit forms of the two proximity operators \(\mathrm {prox}_{\frac{1}{\sigma }F}\) and \(\mathrm {prox}_{\frac{1}{\gamma }\psi }\), which we present next, beginning with the closed form of the proximity operator of F. Recalling that \(w=[u;d]\) and \(F(w)=\frac{1}{2}\Vert u-f\Vert _2^2+\lambda _1\varphi (d)\), the corresponding proximity operator of F in (13) reads

$$\begin{aligned} \begin{bmatrix}u^{k+1}\\ d^{k+1}\end{bmatrix}&=\mathrm {prox}_{\frac{1}{\sigma }F}\big((I-\tfrac{\gamma }{\sigma }B^\top B)w^{k}-\tfrac{\gamma }{\sigma }B^\top (b^{k}-c^{k})\big)\\ &=\arg \min _{u,d}\Big \{\frac{1}{2}\Big \Vert \begin{bmatrix}u\\ d\end{bmatrix}-\begin{bmatrix}u^{k}-\frac{\gamma }{\sigma }B_2^\top (B_2u^{k}-Rd^{k}+b^k-c^k)\\ d^{k}+\frac{\gamma }{\sigma }R^\top (B_2u^{k}-Rd^{k}+b^k-c^k) \end{bmatrix}\Big \Vert ^2+\frac{1}{2\sigma }\Vert u-f\Vert ^2+\frac{\lambda _1}{\sigma }\varphi (d)\Big \} \end{aligned}$$
(15)

The minimization problem in (15) is easy to compute since it can be solved separately with respect to \(u^{k+1}\) and \(d^{k+1}\); that is, we have

$$\begin{aligned} u^{k+1}&=\frac{1}{1+\sigma }f+\frac{\sigma }{1+\sigma }\big(u^{k}-\tfrac{\gamma }{\sigma }B_2^\top (B_2u^{k}-Rd^{k}+b^k-c^k)\big)\\ d^{k+1}&=\mathrm {prox}_{\frac{\lambda _1}{\sigma }\varphi }\big(d^{k}+\tfrac{\gamma }{\sigma }R^\top (B_2u^{k}-Rd^{k}+b^k-c^k)\big). \end{aligned}$$
(16)

Both of the proximity operators \(\mathrm {prox}_{\frac{1}{\gamma }\psi }\) and \(\mathrm {prox}_{\frac{\lambda _1}{\sigma }\varphi }\) appearing in (13) and (16) have closed forms. For \(t>0\), \(x\in \mathbb {R}^{2N}\) and \(\widetilde{x}\in \mathbb {R}^{4N}\), set \(y:=\mathrm {prox}_{t\varphi }(x)\) and \(z:=\mathrm {prox}_{t\psi }(\widetilde{x})\). Writing \(\overline{x}_i:=[x_i;x_{i+N}]\) and \(\overline{\widetilde{x}}_i:=[\widetilde{x}_i;\widetilde{x}_{i+N};\widetilde{x}_{i+2N};\widetilde{x}_{i+3N}]\) for \(i=1,2,\cdots ,N\), we have

$$\begin{aligned} \begin{bmatrix}y_i\\ y_{i+N}\end{bmatrix}=\max \left\{ \Vert \overline{x}_i\Vert _2-t,0 \right\} \frac{\overline{x}_i}{\Vert \overline{x}_i\Vert _2} \end{aligned}$$
(17)

and

$$\begin{aligned} \begin{bmatrix}z_i\\ z_{i+N}\\ z_{i+2N}\\ z_{i+3N}\end{bmatrix}=\max \left\{ \Vert \overline{\widetilde{x}}_i\Vert _2-t,0 \right\} \frac{\overline{\widetilde{x}}_i}{\Vert \overline{\widetilde{x}}_i\Vert _2} \end{aligned}$$
(18)

respectively.
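Both closed forms are instances of groupwise soft thresholding (with the usual convention that the quotient is zero when the group norm vanishes). A minimal sketch, with the illustrative helper prox_group covering (17) for two blocks and (18) for four:

```python
import numpy as np

def prox_group(x, t, blocks):
    """Groupwise soft thresholding: prox_{t*phi} for blocks=2 (eq. (17))
    and prox_{t*psi} for blocks=4 (eq. (18))."""
    N = x.size // blocks
    X = x.reshape(blocks, N)              # row j holds (x_{1+jN}, ..., x_{N+jN})
    norms = np.linalg.norm(X, axis=0)     # per-pixel group norms
    scale = np.maximum(norms - t, 0.0) / np.maximum(norms, 1e-12)
    return (X * scale).reshape(-1)

# example: shrink the gradient field of a small random image
z = np.random.default_rng(1).standard_normal(2 * 16)
print(prox_group(z, 0.5, 2))
```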

In summary, we have the following convergent algorithm for solving problem (10):

(Algorithm 1. The fixed point proximity algorithm for model (10): iterate (13) with the updates (16), (17) and (18) until convergence.)
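For concreteness, a sketch of the whole procedure is given below. It assembles the matrices of Sect. 2, chooses \(\sigma \) and \(\gamma \) according to (14), and runs iteration (13) with the updates (16)-(18). It reuses the helpers backward_diff, spectral_norm and prox_group from the earlier sketches, and absorbing \(\lambda _2\) into the threshold of \(\mathrm {prox}_{\frac{1}{\gamma }\psi }\) is our reading of the scheme, not the authors' reference code.

```python
import numpy as np
import scipy.sparse as sp

def denoise_htv(f_img, lam1, lam2, iters=300, tol=1e-4):
    """Sketch of iteration (13) with updates (16)-(18) for model (10)."""
    n = f_img.shape[0]; N = n * n
    D = backward_diff(n)
    I = sp.identity(n, format="csr")
    B1 = sp.vstack([sp.kron(I, D), sp.kron(D, I)]).tocsr()
    Z = sp.csr_matrix((N, N))
    R = sp.bmat([[sp.kron(I, -D.T), Z], [sp.kron(-D.T, I), Z],
                 [Z, sp.kron(I, -D.T)], [Z, sp.kron(-D.T, I)]]).tocsr()
    B2 = (R @ B1).tocsr()                  # second order gradient matrix

    sigma = 1.0                            # enforce condition (14)
    gamma = 0.9 * sigma / spectral_norm(sp.hstack([B2, -R]).tocsr()) ** 2
    r = gamma / sigma

    f = f_img.flatten(order="F")
    u, d = f.copy(), np.zeros(2 * N)
    b, c = np.zeros(4 * N), np.zeros(4 * N)
    for _ in range(iters):
        u_old = u
        e = (B2 @ u - R @ d) + b - c       # B w^k + b^k - c^k
        # u- and d-updates, cf. (16)
        u = (f + sigma * (u - r * (B2.T @ e))) / (1.0 + sigma)
        d = prox_group(d + r * (R.T @ e), lam1 / sigma, 2)
        # c- and b-updates, cf. (13); lam2 absorbed into the threshold
        Bw = B2 @ u - R @ d
        c = prox_group(b + Bw, lam2 / gamma, 4)
        b = b + Bw - c
        # stopping rule of Sect. 5
        if np.sum((u - u_old) ** 2) <= tol * np.sum(u_old ** 2):
            break
    return u.reshape((n, n), order="F")
```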

5 Experiments

In this section, we evaluate the proposed higher order total variation regularization term and image denoising model by their performance in removing Gaussian noise from images. The peak signal-to-noise ratio (PSNR) is used to measure the quality of the restored images. We select the standard pepper and Lena images and a CT image as our test data. The Lena and pepper images are of size \(256\times 256\), and the CT image is of size \(555\times 555\). All of the images are shown in Fig. 1. The regularization parameters are chosen to produce the highest PSNR values in all of the experiments. The stopping criterion for all of the algorithms is that the relative error between successive iterates of the restored image satisfies

$$\frac{\Vert u^{i+1}-u^i\Vert _2^2}{\Vert u^i\Vert _2^2}\le 10^{-4},$$

where \(u^i\) is the denoised image at the i-th iteration. Each PSNR value reported in this section is the average of three runs.
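For reproducibility, the PSNR we report can be computed as in the following helper (assuming the common convention of 8-bit images with peak value 255):

```python
import numpy as np

def psnr(u, ref, peak=255.0):
    """Peak signal-to-noise ratio in dB between a restored image u
    and the clean reference image."""
    u, ref = np.asarray(u, dtype=float), np.asarray(ref, dtype=float)
    mse = np.mean((u - ref) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```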

Fig. 1.
figure 1

The original images.

Fig. 2.
figure 2

The restored CT images in the case of Gaussian noise with \(\delta =10\).

We add additive white Gaussian noise with standard deviations \(\delta =10\) and \(\delta =20\) to the Lena, pepper and CT images. We use the ROF model (1), the infimal convolution model (2), and the model (9) based on our proposed higher order total variation to restore the images. The PSNR results of all restored images are shown in Table 1. We can see that our proposed model produces restored images with higher PSNR values than those of the ROF model and the infimal convolution model. To assess the visual quality of the restored images, we show the CT images (for the case \(\delta =10\)) in Fig. 2. Figure 3 is a zoomed version of Fig. 2. The staircasing effects caused by the total variation can be clearly seen when the image is zoomed in. The infimal convolution regularization removes the staircasing effects caused by TV, but introduces visible blurring artifacts at the edges. Our proposed higher order total variation regularization produces a more favorable result without visual artifacts while still preserving sharp edges and finer details.

Table 1. Summary of the restoration results of models (1), (2), and (9).
Fig. 3.
figure 3

Zoomed results of the restored CT images in the case of Gaussian noise with \(\delta =10\).

6 Concluding Remarks

We have proposed a higher order total variation regularization for the image denoising problem. This regularization term can be seen as an improvement of the traditional infimal convolution type regularization and alleviates the staircasing effects caused by the total variation in the restored image. We use the proximity algorithm to solve the resulting model and establish a convergence result for the algorithm. Numerical results presented in this paper demonstrate the efficiency of the proposed method.