1 Introduction

Image deblurring is an important inverse problem arising in many image processing applications in which an unknown image \(\mathbf {x}\in \mathbb {R}^{n}\) has to be estimated from noisy observations \(\mathbf {b}\in \mathbb {R}^{m}\) defined by

$$\begin{aligned} \mathbf {b} = \mathbf {A}\mathbf {x}+\varvec{\eta } \end{aligned}$$
(1)

where \(\varvec{\eta }\) is the unknown white Gaussian noise vector and \(\mathbf {A}\in \mathbb {R}^{m\times n}\) is the discretized linear blur operator. As usual, images are assumed to be represented as vectors, by storing the pixel values in some (e.g. lexicographical) order.

Least squares optimization with \(\ell _1\)-regularization is now the state-of-the-art approach to image restoration. In fact, in imaging problems, the presence of edges often causes the prior distribution of the unknown image \(\mathbf {x}\) to be non-Gaussian and leads to outliers in the regularization term. In this context, the \(\ell _1\)-norm, which is less sensitive to outliers, is advantageous compared to the \(\ell _2\)-norm employed in Tikhonov-like regularization, and can be effectively used to promote sparse solutions [11, 43, 49, 52]. For these reasons, the \(\ell _1\)-regularized least squares problem

$$\begin{aligned} \min _\mathbf {x} \; \phi (\mathbf {x}) = \frac{1}{2}\Vert \mathbf {A}\mathbf {x}-\mathbf {b}\Vert _2^2 + \lambda \Vert \mathbf {x}\Vert _1, \quad \lambda >0 \end{aligned}$$
(2)

has recently attracted considerable attention both in the context of imaging inverse problems [1, 26, 42, 58] and of compressed sensing [14–19, 21, 50]. (For simplicity of notation, hereafter \(\Vert \cdot \Vert \) will denote the \(\ell _2\)-norm \(\Vert \cdot \Vert _2\)).

In this work, we focus on \(\ell _1\)-regularized least squares problems (2) arising in image deblurring applications where the observation operator \(\mathbf {A}\) describes spatially invariant blur [6]. Under periodic boundary conditions, \(\mathbf {A}\) is a block circulant with circulant blocks (BCCB) matrix and matrix–vector products can be efficiently performed via the FFT [33]. We suppose that \(m \ge n\) and that \(\mathbf {A}^H\mathbf {A}\) has full rank. This assumption is satisfied in a variety of practical applications and is frequently used in image deblurring and in the literature of imaging inverse problems (see, for example, [4, 23, 27–29, 33, 34, 58]).
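To make the BCCB structure concrete, the following minimal Python/NumPy sketch builds matrix–vector product handles for \(\mathbf {A}\) and \(\mathbf {A}^H\) from a given PSF under periodic boundary conditions; the Gaussian PSF below merely stands in for the one produced by psfGauss in [33], and all function and variable names are ours, for illustration only.

```python
import numpy as np

def blur_operator(psf, shape, center):
    """Return matvec handles for A and A^H under periodic boundary conditions.

    A is block circulant with circulant blocks (BCCB), so A x is a circular
    convolution with the PSF and can be evaluated with two FFTs. 'center'
    is the location of the PSF peak, used to shift the kernel so that the
    blur does not translate the image.
    """
    big_psf = np.zeros(shape)
    big_psf[:psf.shape[0], :psf.shape[1]] = psf
    big_psf = np.roll(big_psf, (-center[0], -center[1]), axis=(0, 1))
    D = np.fft.fft2(big_psf)                         # eigenvalues of A

    A = lambda x: np.real(np.fft.ifft2(D * np.fft.fft2(x)))
    AH = lambda x: np.real(np.fft.ifft2(np.conj(D) * np.fft.fft2(x)))
    return A, AH, D

# Example: 9x9 Gaussian PSF with variance 2 on a 256x256 grid.
r = np.arange(-4, 5)
g = np.exp(-r**2 / (2 * 2.0))
psf = np.outer(g, g); psf /= psf.sum()
A, AH, D = blur_operator(psf, (256, 256), center=(4, 4))

x = np.random.rand(256, 256)
b = A(x) + 5e-3 * np.random.randn(256, 256)          # blurred, noisy observation
```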

A variety of algorithms have been proposed in the literature for the solution of (2), especially in compressed sensing. The state-of-the-art methods for (2) are probably gradient descent-type methods since their computational cost is mainly due to matrix–vector products with \(\mathbf {A}\) and \(\mathbf {A}^H\). This class of methods includes the following popular methods: IST [20], TwIST [10], SparSA [56], FISTA [4], NESTA [5] and AdaptiveSPARSA [30]. SPGL1 [53] and GPSS [42] are gradient projection-type methods for the equivalent Lasso formulation of (2).

Fixed-point continuation methods [31, 32], as well as methods based on Bregman iterations [25, 57] and variable splitting, such as SALSA [1] and C-SALSA [2], have also been recently proposed.

The dual formulation of (2) has been considered in the literature; in [27] and [34], a Newton projection method and an active-set-type method are respectively proposed to solve the dual problem.

A different approach to handle the nondifferentiability of the objective function of (2) consists in reformulating the nondifferentiable unconstrained problem (2) as a constrained differentiable one. The disadvantage of this strategy is that, usually, the number of variables is doubled. The interior point method of [35] solves a quadratic programming formulation of (2) obtained by introducing an auxiliary variable; it has been shown to be efficient for large scale problems since a preconditioned gradient method is used to compute the search direction.

An equivalent formulation of (2) as a quadratic programming problem with nonnegativity constraints can be obtained by splitting \(\mathbf {x}\) into two nonnegative variables representing its positive and negative parts. Several methods have been proposed for the solution of the resulting bound constrained quadratic program, such as gradient projection [22], interior point [23] and Newton projection [45–47] methods.

The Newton projection methods (also referred to as Two-Metric projection methods) can be interpreted as scaled gradient projection methods where the scaling matrix is the inverse of a modified Hessian [8, 9, 24]. In particular, only the variables in the working set are scaled by the inverse of the corresponding submatrix of the Hessian, where the working set is defined as the complement of the set of null variables with positive gradient. Under proper hypotheses on the scaling matrix, the global convergence and local superlinear convergence of Newton projection methods can be proved [8, 24].

In the literature, Newton-like projection methods have been recently developed which are tailor-made for image restoration problems formulated as \(\ell _2\)-regularized optimization problems under nonnegativity constraints, both in the case of Gaussian and Poisson noise [38–40]. In these methods, the Hessian matrix is suitably approximated so that its inversion can be performed in the Fourier space at a very low computational cost. Then, the variables in the working set are scaled by the corresponding submatrix of the inverse approximated Hessian. It is worth mentioning that, with this scaling, the superlinear convergence rate of Newton-based methods is lost. However, these Newton-like projection methods have been shown to perform well compared to accelerated gradient projection algorithms for \(\ell _2\)-regularized least squares problems [12, 37].

This work aims at developing a new efficient and effective \(\ell _1\)-based method for the restoration of images corrupted by blur and Gaussian noise. In particular, we consider the analysis formulation of the \(\ell _1\)-regularized least squares problem for the restoration of images which are quite sparse in the pixel representation, such as medical images and astronomical images. This approach has been introduced in [23, 36] and recently in [58]. Observe that in astronomical imaging, the \(\ell _1\)-norm penalty is used as a flux conservation constraint. The contribution of this paper is twofold. First, we show, by numerical evidence, that the \(\ell _1\)-regularized least squares problem can be effectively used as a mathematical model of the problem of restoring images degraded by blur and Gaussian noise. Second, we develop an efficient Newton-like projection method for its solution. In the proposed approach, problem (2) is first formulated as a nonnegatively constrained quadratic program by splitting the variable \(\mathbf {x}\) into its positive and negative parts. Then, the quadratic program is solved by a special purpose Newton-like projection method where a suitable regularized approximation to the Hessian matrix is proposed so that products of its inverse with vectors can be computed at low computational cost. As a result, the only operations required for the search direction computation are matrix–vector products involving \(\mathbf {A}\) and \(\mathbf {A}^H\). Since the developed method uses a suitable modification of the Hessian matrix, in the sequel it will be referred to as the Modified Newton Projection (MNP) method. Moreover, the regularization strategy, used to improve the conditioning of the Hessian matrix, penalizes negative entries in \(\mathbf {x}\) so that nonnegative solutions are favored. The convergence of the MNP method is proved, even though the proposed scaling sacrifices the superlinear convergence rate of Newton-type methods. Even though the size of the problem is doubled, the low computational cost per iteration and the small number of iterations make MNP quite efficient. It is worth mentioning that a similar approach is described in [46]; however, the projection L1 method of [46] uses, as classical Two-Metric projection methods do, a suitable submatrix of the Hessian to scale the gradient.

The performance of MNP is evaluated on some image restoration problems and is compared with that of the state-of-the-art methods. The results of the comparative study show that MNP is competitive and in some cases is also able to outperform the state-of-the-art methods in terms of computational complexity and achieved accuracy.

The paper is organized as follows. In Sect. 2, the MNP method is presented and its convergence is analyzed. In this section, the computation of the search direction is also discussed. In Sect. 3, the numerical results are presented. Conclusions are given in Sect. 4.

2 The Modified Newton Projection Method

Before introducing our MNP method, we need to reformulate the \(\ell _1\)-regularized least squares problem (2) as a nonnegatively constrained quadratic program (NCQP). This strategy for dealing with the nondifferentiability of the \(\ell _1\)-norm is classical in the literature and has been previously adopted by several authors [13, 22, 23, 45, 46].

2.1 Nonnegatively Constrained Quadratic Program Formulation

If we split \(\mathbf {x}\) into its positive and negative parts, that is

$$\begin{aligned} \mathbf {x}=\mathbf {u}-\mathbf {v} \end{aligned}$$

where

$$\begin{aligned} \mathbf {u}=\max (\mathbf {x},0), \quad \mathbf {v}=\max (-\mathbf {x},0), \end{aligned}$$

then, we obtain the following nonnegatively constrained quadratic program formulation of the original \(\ell _1\)-regularized least squares problem (2):

$$\begin{aligned}&\min _{(\mathbf{u},\mathbf{v})} \; {\mathcal {F}}(\mathbf{u},\mathbf{v})=\frac{1}{2}\Vert \mathbf{A}(\mathbf{u}-\mathbf{v})-\mathbf{b}\Vert ^2 + \lambda \mathbf{1}^H\mathbf{u} + \lambda \mathbf{1}^H\mathbf{v} \nonumber \\&\text { s.t.} \quad \mathbf{u}\ge 0 \nonumber \\&\qquad \quad \!\! \mathbf{v}\ge 0 \end{aligned}$$
(3)

where \(\mathbf {1}\) denotes the \(n\)-dimensional column vector of ones. In [13, 22, 44] it is shown that problems (2) and (3) share the same solutions and that, at a solution of (3), for each index \(i\) either \(u_i\) or \(v_i\) (or both) is equal to zero.

The gradient of \(\mathcal {F}(\mathbf {u},\mathbf {v})\) is defined by

$$\begin{aligned} \nabla _{(\mathbf {u},\mathbf {v})}\mathcal {F}(\mathbf {u},\mathbf {v}) = \begin{bmatrix} \mathbf {A}^H\mathbf {A}(\mathbf {u}-\mathbf {v})-\mathbf {A}^H\mathbf {b}+\lambda \mathbf {1} \\ -\mathbf {A}^H\mathbf {A}(\mathbf {u}-\mathbf {v})+\mathbf {A}^H\mathbf {b}+\lambda \mathbf {1} \end{bmatrix} . \end{aligned}$$
(4)

Henceforth, we will denote by \(\mathbf {y}\) and \(\mathbf {g}\) the \(2n\)-dimensional vectors

$$\begin{aligned} \mathbf {y}=\begin{bmatrix} \mathbf {u} \\ \mathbf {v} \\ \end{bmatrix}, \quad \mathbf {g}=\begin{bmatrix} \mathbf {g}_\mathbf {u} \\ \mathbf {g}_\mathbf {v} \\ \end{bmatrix} \end{aligned}$$

where \(\mathbf {g}_\mathbf {u}\) and \(\mathbf {g}_\mathbf {v}\) are respectively the partial derivatives of \(\mathcal {F}\) with respect to \(\mathbf {u}\) and \(\mathbf {v}\).

Observe that, even though the reformulation of (2) as an NCQP doubles the problem size, the computation of the objective function and its gradient requires only one multiplication by \(\mathbf {A}\) and one by \(\mathbf {A}^H\).
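For illustration, a minimal sketch of this computation (with a small dense \(\mathbf {A}\) standing in for the blur operator; function names are ours) is:

```python
import numpy as np

def objective_and_gradient(A, b, u, v, lam):
    """Evaluate F(u, v) of problem (3) and its gradient (4).

    Only one product with A (for the residual) and one with A^H are needed,
    because both gradient blocks are +/- A^H r plus the constant lam * 1.
    """
    r = A @ (u - v) - b                # one multiplication by A
    AHr = A.conj().T @ r               # one multiplication by A^H
    F = 0.5 * np.dot(r, r) + lam * (u.sum() + v.sum())
    g_u = AHr + lam
    g_v = -AHr + lam
    return F, np.concatenate([g_u, g_v])

# Illustrative splitting of x into its positive and negative parts.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 32))
x = rng.standard_normal(32)
b = A @ x + 1e-2 * rng.standard_normal(64)
u, v = np.maximum(x, 0), np.maximum(-x, 0)
F, g = objective_and_gradient(A, b, u, v, lam=0.1)
```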

2.2 Hessian Approximation

The MNP method is basically a Newton-based method where the search direction computation requires the inversion of the Hessian matrix. Unfortunately, the Hessian \(\mathbf {H}\) of \(\mathcal {F}(\mathbf {u},\mathbf {v})\)

$$\begin{aligned} \mathbf {H} = \begin{bmatrix} \mathbf {A}^H\mathbf {A}&-\mathbf {A}^H\mathbf {A} \\ -\mathbf {A}^H\mathbf {A}&\mathbf {A}^H\mathbf {A} \\ \end{bmatrix} \end{aligned}$$
(5)

is a positive semidefinite matrix. The idea underlying the proposed approach is to substitute the Hessian with a nonsingular approximation. In MNP, we modify \(\mathbf {H}\) by adding a small perturbation to its negative part. More precisely, we use the following Hessian approximation:

$$\begin{aligned} \mathbf {H}_\tau = \mathbf {H}+ \begin{bmatrix} \mathbf {0}&\quad \mathbf {0} \\ \mathbf {0}&\quad \tau \mathbf {I} \end{bmatrix} = \begin{bmatrix} \mathbf {A}^H\mathbf {A}&\quad -\mathbf {A}^H\mathbf {A} \\ -\mathbf {A}^H\mathbf {A}&\quad \mathbf {A}^H\mathbf {A}+\tau \mathbf {I} \\ \end{bmatrix} \end{aligned}$$
(6)

where \(\tau \) is a positive parameter and \(\mathbf {I}\) and \(\mathbf {0}\) are the identity and zero matrices of size \(n\), respectively. There are several reasons for this choice of the Hessian modification. First, an explicit formula for the inverse of \(\mathbf {H}_\tau \) can be derived; second, the search direction is computable at moderate cost; and finally, the negative parts of the images are penalized. Naturally, other nonsingular Hessian modifications are possible. Adding the positive constant \(\tau \) to all diagonal elements of \(\mathbf {H}\) results in a Hessian modification for which the multiplication of the inverse by a vector is more expensive. Moreover, our numerical experiments indicate that this Hessian approximation tends to be more sensitive to the choice of the parameter \(\tau \). On the other hand, adding a small perturbation to the positive part of \(\mathbf {H}\) would result in a penalization of the positive parts of the images and in less accurate reconstructions.

Proposition 2.1

Assume that \(\mathbf {A}^H\mathbf {A}\) is nonsingular. Then, \(\mathbf {H}_\tau \) is nonsingular and its inverse is

$$\begin{aligned} \mathbf {M}_\tau = \frac{1}{\tau } \begin{bmatrix} \tau (\mathbf {A}^H\mathbf {A})^{-1}+\mathbf {I}&\quad \mathbf {I} \\ \mathbf {I}&\quad \mathbf {I} \\ \end{bmatrix} . \end{aligned}$$
(7)

Proof

We prove that \(\mathbf {H}_\tau \mathbf {M}_\tau =\mathbf {M}_\tau \mathbf {H}_\tau =\mathbf {I}_{2n}\) where \(\mathbf {I}_{2n}\) is the identity matrix of size \(2n\).

We have

$$\begin{aligned} \mathbf {H}_\tau \mathbf {M}_\tau&= \frac{1}{\tau } \begin{bmatrix} \mathbf {A}^H\mathbf {A}&\quad -\mathbf {A}^H\mathbf {A} \\ -\mathbf {A}^H\mathbf {A}&\quad \mathbf {A}^H\mathbf {A}+\tau \mathbf {I} \\ \end{bmatrix} \begin{bmatrix} \tau (\mathbf {A}^H\mathbf {A})^{-1}+\mathbf {I}&\quad \mathbf {I} \\ \mathbf {I}&\quad \mathbf {I} \\ \end{bmatrix} \\&= \frac{1}{\tau } \begin{bmatrix} \tau \mathbf {I}+\mathbf {A}^H\mathbf {A}-\mathbf {A}^H\mathbf {A}&\quad \mathbf {A}^H\mathbf {A}\!-\!\mathbf {A}^H\mathbf {A} \\ \!-\!\tau \mathbf {I}\!-\!\mathbf {A}^H\mathbf {A}\!+\!\mathbf {A}^H\mathbf {A}\!+\!\tau \mathbf {I}&\quad \!-\!\mathbf {A}^H\mathbf {A}\!+\!\mathbf {A}^H\mathbf {A}\!+\!\tau \mathbf {I} \\ \end{bmatrix}\\&=\begin{bmatrix} \mathbf {I}&\mathbf {0} \\ \mathbf {0}&\mathbf {I} \\ \end{bmatrix} . \end{aligned}$$

Similarly, we have \(\mathbf {M}_\tau \mathbf {H}_\tau =\mathbf {I}_{2n}\) and this concludes the proof.
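The closed form (7) can also be checked numerically on a small random full-rank instance; a hedged sketch (names and sizes are ours, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, tau = 20, 0.5
A = rng.standard_normal((30, n))          # m >= n, so A^H A is nonsingular (a.s.)
AHA = A.T @ A
I = np.eye(n)

# H_tau as in (6) and its claimed inverse M_tau as in (7).
H_tau = np.block([[AHA, -AHA], [-AHA, AHA + tau * I]])
M_tau = (1.0 / tau) * np.block([[tau * np.linalg.inv(AHA) + I, I], [I, I]])

print(np.allclose(H_tau @ M_tau, np.eye(2 * n)))   # True
print(np.allclose(M_tau @ H_tau, np.eye(2 * n)))   # True
```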

Proposition 2.2

The inverse Hessian approximation \(\mathbf {M}_\tau \) is a symmetric positive definite matrix.

Proof

Let \(\mathbf {z},\mathbf {w}\in \mathbb {R}^{n}\). We have

$$\begin{aligned}&\begin{bmatrix} \mathbf {z}^H&\mathbf {w}^H \end{bmatrix} \frac{1}{\tau } \begin{bmatrix} \tau (\mathbf {A}^H\mathbf {A})^{-1}+\mathbf {I}&\quad \mathbf {I} \\ \mathbf {I}&\quad \mathbf {I} \\ \end{bmatrix} \begin{bmatrix} \mathbf {z} \\ \mathbf {w} \end{bmatrix}\nonumber \\&\quad = \mathbf {z}^H(\mathbf {A}^H\mathbf {A})^{-1}\mathbf {z}+\frac{1}{\tau }\Vert \mathbf {z}+\mathbf {w}\Vert ^2 \ge 0 . \end{aligned}$$
(8)

Equality holds in (8) if and only if \(\mathbf {z}=0\) and \(\mathbf {z}+\mathbf {w}=0\), i.e. if and only if \(\mathbf {z}=0\) and \(\mathbf {w}=0\). This concludes the proof.

Remark 2.1

The developed strategy for approximating the Hessian matrix \(\mathbf {H}\) is closely related to the method of Lavrentiev regularization [6, 41, 48, 51]. In fact, in Lavrentiev regularization, a regularized solution of a linear ill-posed problem with positive semidefinite linear operator \(\mathbf {K}\) is obtained as the solution of a slightly modified equation whose linear operator is \(\mathbf {K}+\tau \mathbf {I}\). Therefore, the Hessian approximation \(\mathbf {H}_\tau \) is indeed a Lavrentiev-type regularized approximation to \(\mathbf {H}\) where only the last \(n\) variables are penalized. In this way, large values of the last \(n\) components of the search direction are penalized.
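For reference, Lavrentiev regularization in its simplest form replaces the ill-posed equation \(\mathbf {K}\mathbf {x}=\mathbf {y}\), with \(\mathbf {K}\) positive semidefinite, by the well-posed perturbed equation

$$\begin{aligned} (\mathbf {K}+\tau \mathbf {I})\,\mathbf {x}_\tau = \mathbf {y}, \quad \tau >0 , \end{aligned}$$

whereas in (6) the shift \(\tau \mathbf {I}\) acts only on the lower-right block of \(\mathbf {H}\).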

2.3 Algorithm

Firstly, let us introduce the basic notation that enables us to formalize the description of the MNP method. In the sequel, \(\mathcal {A}(\mathbf {y})\) will indicate the set of indices [8, 9, 55]:

$$\begin{aligned} \mathcal {A}(\mathbf {y})&= \Big \{i \; |\; 0\le y_i \le \varepsilon \text { and } g_i>0\Big \}, \\ \varepsilon&= \min \{ \overline{\varepsilon },\Vert \mathbf {y}-[\mathbf {y}- \mathbf {g}]^+\Vert \} \end{aligned}$$

where \(\overline{\varepsilon }\) is a small positive parameter and \([\cdot ]^+\) denotes the projection onto the nonnegative orthant.

Finally, let \(\mathbf {E}\) and \(\mathbf {F}\) denote the diagonal matrices [55] such that

$$\begin{aligned} \{ \mathbf {E}(\mathbf {y}) \}_{ii}&= \left\{ \begin{array}{ll} 1, & i \notin \mathcal {A}(\mathbf {y}); \\ 0, & i \in \mathcal {A}(\mathbf {y}); \end{array} \right. \\ \mathbf {F}(\mathbf {y})&= \mathbf {I}_{2n}-\mathbf {E}(\mathbf {y}) . \end{aligned}$$

Given a feasible initial iterate \(\mathbf {y}^{(0)}\in \mathbb {R}^{2n}\), the MNP method is defined by the general iteration

$$\begin{aligned} \mathbf {y}^{(k+1)} = [\mathbf {y}^{(k)} -\alpha ^{(k)}\mathbf {p}^{(k)}]^+, \quad \mathbf {p}^{(k)}=\mathbf {S}^{(k)}\mathbf {g}^{(k)} \end{aligned}$$
(9)

where

$$\begin{aligned} \mathbf {S}^{(k)} = \mathbf {E}(\mathbf {y}^{(k)})\mathbf {M}_\tau \mathbf {E}(\mathbf {y}^{(k)})+\mathbf {F}(\mathbf {y}^{(k)}) \end{aligned}$$
(10)

and the step-length \(\alpha ^{(k)}\) is determined by the Armijo rule along the projection arc [8, 9]. That is, \(\alpha ^{(k)}\) is the first number of the sequence \(\{2^{-m}\}_{m\in \mathbb {N}}\) such that

$$\begin{aligned}&\mathcal {F}\big (\mathbf {y}^{(k)}\big ) - \mathcal {F}\big (\mathbf {y}^{(k)}(2^{-m})\big ) \nonumber \\&\quad \ge \beta \left( 2^{-m}\sum _{i \notin \mathcal {A}^{(k)}} g^{(k)}_ip^{(k)}_i \!+\! \sum _{i \in \mathcal {A}^{(k)}} g^{(k)}_i \Big ( y_i^{(k)} \!-\! y_i^k(2^{-m}) \Big ) \right) \nonumber \\ \end{aligned}$$
(11)

where \(\mathbf {y}^{(k)}(2^{-m}) = [\mathbf {y}^{(k)}-2^{-m}\mathbf {p}^{(k)}]^+\) and \(\beta \in (0,\frac{1}{2})\). For ease of notation, in the following, \(\mathbf {E}^{(k)}\), \(\mathbf {F}^{(k)}\) and \(\mathcal {A}^{(k)}\) will denote respectively the diagonal matrices \(\mathbf {E}(\mathbf {y}^{(k)})\) and \(\mathbf {F}(\mathbf {y}^{(k)})\) and the index set \(\mathcal {A}(\mathbf {y}^{(k)})\).
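For concreteness, the following sketch implements one iteration of (9)–(11): the active set \(\mathcal {A}(\mathbf {y})\), the diagonal matrices \(\mathbf {E}\) and \(\mathbf {F}\), the scaled direction (10), and the Armijo rule along the projection arc (11). A small dense \(\mathbf {A}\) is used so that \(\mathbf {M}_\tau \) can be formed explicitly; in the deblurring setting the product of \(\mathbf {M}_\tau \) with a vector is evaluated with FFTs as described in Sect. 2.5. All names and the parameter defaults are ours and purely illustrative.

```python
import numpy as np

def mnp_step(A, b, lam, tau, y, eps_bar=1e-2, beta=0.25, max_halvings=30):
    """One MNP iteration (9)-(11) on the NCQP (3), sketched with a dense A."""
    n = A.shape[1]
    AHA = A.T @ A
    proj = lambda z: np.maximum(z, 0.0)

    def F_and_g(y):
        u, v = y[:n], y[n:]
        r = A @ (u - v) - b
        AHr = A.T @ r
        F = 0.5 * np.dot(r, r) + lam * y.sum()
        g = np.concatenate([AHr + lam, -AHr + lam])
        return F, g

    F0, g = F_and_g(y)

    # Active set A(y) and the diagonal matrices E, F of Sect. 2.3.
    eps = min(eps_bar, np.linalg.norm(y - proj(y - g)))
    active = (y <= eps) & (g > 0)
    E = np.diag((~active).astype(float))
    Fmat = np.diag(active.astype(float))

    # Inverse Hessian approximation M_tau from (7), formed explicitly here;
    # in the deblurring setting it is applied via FFTs and never formed.
    I = np.eye(n)
    M_tau = (1.0 / tau) * np.block([[tau * np.linalg.inv(AHA) + I, I], [I, I]])

    # Scaled direction p = S g with S = E M_tau E + F, see (10).
    p = (E @ M_tau @ E + Fmat) @ g

    # Armijo rule along the projection arc (11).
    for mstep in range(max_halvings):
        alpha = 2.0 ** (-mstep)
        y_new = proj(y - alpha * p)
        F_new, _ = F_and_g(y_new)
        rhs = beta * (alpha * np.sum(g[~active] * p[~active])
                      + np.sum(g[active] * (y[active] - y_new[active])))
        if F0 - F_new >= rhs:
            break
    return y_new, F_new

# Tiny usage example on a random well-conditioned instance.
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 25))
b = A @ np.maximum(rng.standard_normal(25), 0) + 1e-2 * rng.standard_normal(40)
y = np.zeros(50)                      # y = [u; v], feasible initial iterate
for _ in range(50):
    y, F_val = mnp_step(A, b, lam=0.05, tau=50 * 0.05, y=y)
x_rec = y[:25] - y[25:]               # restored vector x = u - v
```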

Remark 2.2

In the Two-Metric projection method originally proposed by Bertsekas [8, 9, 24], the scaling matrix is

$$\begin{aligned} \mathbf {S}^{(k)}=\left( \mathbf {E}^{(k)}\mathbf {H}\mathbf {E}^{(k)}+\mathbf {F}^{(k)} \right) ^{-1}. \end{aligned}$$
(12)

Using this scaling, the standard Newton-like projection methods attain the superlinear convergence rate of Newton-like methods. However, the inversion of a submatrix of \(\mathbf {H}\), required in (12), is often impracticable in image deblurring applications. Therefore, in practice, a conjugate gradient version of the Newton-like projection method has to be used, where the CG iterations are terminated when the relative residual becomes smaller than a given tolerance. The resulting approximate Newton-CG projection method may exhibit a slower convergence rate.

On the other hand, the scaling matrix (10) involves the inverse of the whole approximated Hessian matrix \(\mathbf {H}_\tau \) for which an explicit formula is given in Proposition 2.1. As a result, the search direction of MNP can be computed very quickly.

However, the scaling (10) may significantly slow the local convergence rate, so MNP may have weaker convergence properties than the exact Two-Metric projection method.

2.4 Convergence Analysis

As proved in [8, 24], the convergence of Newton-like projection methods only requires the scaling matrices \(\mathbf {S}^{(k)}\) to be positive definite matrices with uniformly bounded eigenvalues. In particular, the global convergence property of these methods can be proved under the following general assumptions [8].

A1:

The gradient \(\nabla _{(\mathbf {u},\mathbf {v})} \mathcal {F}\) is Lipschitz continuous on each bounded set of \(\mathbb {R}^{2n}\).

A2:

There exist positive scalars \(c_1\) and \(c_2\) such that

$$\begin{aligned}&c_1\Vert \mathbf {y}\Vert ^2 \le \mathbf {y}^H \mathbf {S}^{(k)} \mathbf {y} \le c_2 \Vert \mathbf {y}\Vert ^2, \quad \forall \mathbf {y}\in \mathbb {R}^{2n}, \quad k=0,1,\ldots \end{aligned}$$

The key convergence result is provided in Proposition 2 of [7], which is restated here for the sake of completeness.

Proposition 2.3

[7, Proposition 2] Let \(\{\mathbf {y}^{(k)}\}\) be a sequence generated by iteration (9), where \(\mathbf {S}^{(k)}\) is a positive definite symmetric matrix which is diagonal with respect to \(\mathcal {A}^{(k)}\) and \(\alpha ^{(k)}\) is computed by the Armijo rule along the projection arc. Under assumptions A1 and A2 above, every limit point of the sequence \(\{\mathbf {y}^{(k)}\}\) is a critical point of problem (3).

Since the objective \(\mathcal {F}\) of (3) is twice continuously differentiable, it satisfies assumption A1.

The inverse Hessian approximation \(\mathbf {M}_\tau \) is a symmetric positive definite matrix (Proposition 2.2) and hence, the scaling matrix \(\mathbf {S}^{(k)}\) defined by (10) is a positive definite symmetric matrix which is diagonal with respect to \(\mathcal {A}^{(k)}\). Therefore, the global convergence of the MNP method is guaranteed if \(\mathbf {S}^{(k)}\) verifies assumption A2.

Proposition 2.4

Given a parameter \(\tau >0\), there exist two positive scalars \(c_1^\tau \) and \(c_2^\tau \) such that

$$\begin{aligned} c_1^\tau \Vert \mathbf {y}\Vert ^2 \le \mathbf {y}^H \mathbf {S}^{(k)} \mathbf {y} \le c_2^\tau \Vert \mathbf {y}\Vert ^2, \quad \forall \mathbf {y}\in \mathbb {R}^{2n}, \quad k=0,1,\ldots \end{aligned}$$

Proof

Since \(\mathbf {M}_\tau \) is positive definite, then

$$\begin{aligned} \sigma _1^\tau \Vert \mathbf {y}\Vert ^2 \le \mathbf {y}^H \mathbf {M}_\tau \mathbf {y} \le \sigma _{2n}^\tau \Vert \mathbf {y}\Vert ^2, \quad \forall \mathbf {y}\in \mathbb {R}^{2n} . \end{aligned}$$
(13)

where \(\sigma _1^\tau \) and \(\sigma _{2n}^\tau \) are the smallest and largest eigenvalues of \(\mathbf {M}_\tau \), respectively. We have

$$\begin{aligned} \mathbf {y}^H \mathbf {S}^{(k)} \mathbf {y}&= \mathbf {y}^H \big ( \mathbf {E}^{(k)}\mathbf {M}_\tau \mathbf {E}^{(k)}+\mathbf {F}^{(k)}\big ) \mathbf {y}\\&= \big ( \mathbf {E}^{(k)}\mathbf {y}\big )^H \mathbf {M}_\tau \big ( \mathbf {E}^{(k)}\mathbf {y}\big ) + \mathbf {y}^H\mathbf {F}^{(k)}\mathbf {y} . \end{aligned}$$

From (13) it follows that

$$\begin{aligned} \sigma _1^\tau \Vert \mathbf {E}^{(k)}\mathbf {y}\Vert ^2+\mathbf {y}^H\mathbf {F}^{(k)}\mathbf {y}&\le \big ( \mathbf {E}^{(k)}\mathbf {y}\big )^H \mathbf {M}_\tau \big ( \mathbf {E}^{(k)}\mathbf {y}\big ) + \mathbf {y}^H\mathbf {F}^{(k)}\mathbf {y}\\&\le \sigma _{2n}^\tau \Vert \mathbf {E}^{(k)}\mathbf {y}\Vert ^2+\mathbf {y}^H\mathbf {F}^{(k)}\mathbf {y} . \end{aligned}$$

Moreover we have

$$\begin{aligned} \mathbf {y}^H\mathbf {F}^{(k)}\mathbf {y} = \sum _{i\in \mathcal {A}^{(k)}} y_i^2, \quad \Vert \mathbf {E}^{(k)}\mathbf {y}\Vert ^2 = \sum _{i\notin \mathcal {A}^{(k)}} y_i^2 . \end{aligned}$$

Hence, we obtain

$$\begin{aligned} \sigma _{1}^\tau \sum _{i\notin \mathcal {A}^{(k)}} y_i^2\!+\!\sum _{i\in \mathcal {A}^{(k)}} y_i^2 \!\le \! \mathbf {y}^H \mathbf {S}^{(k)} \mathbf {y} \!\le \!\sigma _{2n}^\tau \sum _{i\notin \mathcal {A}^{(k)}} y_i^2\!+\!\sum _{i\in \mathcal {A}^{(k)}} y_i^2 \end{aligned}$$

and

$$\begin{aligned} \min \{\sigma _{1}^\tau ,1\}\Vert \mathbf {y} \Vert ^2 \le \mathbf {y}^H \mathbf {S}^{(k)} \mathbf {y} \le \max \{\sigma _{2n}^\tau ,1\}\Vert \mathbf {y}\Vert ^2 . \end{aligned}$$

The claim immediately follows by setting

$$\begin{aligned} c_1^\tau =\min \{\sigma _{1}^\tau ,1\}, \quad c_2^\tau =\max \{\sigma _{2n}^\tau ,1\}. \end{aligned}$$

Propositions 2.3 and 2.4 ensure the global convergence of the MNP method.

2.5 Implementation of the Search Direction Computation

The computation of the search direction \(\mathbf {p}^{(k)}\) requires the multiplication of a vector by \(\mathbf {M}_\tau \). Let \(\begin{bmatrix}\mathbf {z} \\ \mathbf {w}\end{bmatrix}\in \mathbb {R}^{2n}\) be a given vector, then it immediately follows that

$$\begin{aligned} \mathbf {M}_\tau \begin{bmatrix}\mathbf {z} \\ \mathbf {w}\end{bmatrix}&= \frac{1}{\tau }\begin{bmatrix} \tau (\mathbf {A}^H\mathbf {A})^{-1}\mathbf {z}+\mathbf {z}+\mathbf {w}\\ \mathbf {z}+\mathbf {w} \end{bmatrix}\nonumber \\&= \begin{bmatrix} (\mathbf {A}^H\mathbf {A})^{-1}\mathbf {z}+(\mathbf {z}+\mathbf {w})/\tau \\ (\mathbf {z}+\mathbf {w})/\tau \end{bmatrix} . \end{aligned}$$
(14)

In many image restoration applications, the blurring matrix \(\mathbf {A}\) is severely ill-conditioned and computing \((\mathbf {A}^H\mathbf {A})^{-1}\mathbf {z}\) requires a regularization strategy. Therefore, in our implementation, a Tikhonov-like technique is employed in the search direction computation by approximating the matrix–vector product (14) with

$$\begin{aligned} \mathbf {M}_\tau \begin{bmatrix}\mathbf {z} \\ \mathbf {w}\end{bmatrix} \approx \begin{bmatrix} (\mathbf {A}^H\mathbf {A}+\gamma \mathbf {I})^{-1}\mathbf {z}+(\mathbf {z}+\mathbf {w})/\tau \\ (\mathbf {z}+\mathbf {w})/\tau \end{bmatrix} \end{aligned}$$
(15)

where \(\gamma \) is a positive parameter. The inversion of \(\mathbf {A}^H\mathbf {A}+\gamma \mathbf {I}\) can be efficiently performed in the Fourier space at the cost of two Fast Fourier Transforms, as follows. In fact, assuming periodic boundary conditions, \(\mathbf {A}\) can be factorized as

$$\begin{aligned} \mathbf {A} = \mathbf {U}^*\mathbf {D}\mathbf {U} \end{aligned}$$

where \(\mathbf {U}\) is the two-dimensional unitary Discrete Fourier Transform (DFT) matrix and \(\mathbf {D}\) is the diagonal matrix containing the eigenvalues of \(\mathbf {A}\). Thus

$$\begin{aligned} (\mathbf {A}^H\mathbf {A}+\gamma \mathbf {I})^{-1} = \mathbf {U}^*(|\mathbf {D}|^2+\gamma \mathbf {I})^{-1}\mathbf {U} . \end{aligned}$$

The products involving \(\mathbf {U}\) and \(\mathbf {U}^*\) can be performed by using the FFT algorithm at the cost of \(O(n\log _2n)\) operations while the inversion of the diagonal matrix \((|\mathbf {D}|^2+\gamma \mathbf {I})\) has the cost \(O(n)\).
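A short sketch of this computation follows, assuming periodic boundary conditions so that the eigenvalues \(\mathbf {D}\) of \(\mathbf {A}\) are obtained from the FFT of the (circularly shifted) PSF; the function names are ours and the Gaussian PSF is only illustrative.

```python
import numpy as np

def make_M_tau_apply(psf, shape, center, tau, gamma):
    """Return a function applying the approximation (15) of M_tau to [z; w].

    Under periodic boundary conditions A = U* D U, so
    (A^H A + gamma I)^{-1} z = U* (|D|^2 + gamma)^{-1} U z,
    i.e. two FFTs plus a diagonal solve.
    """
    big = np.zeros(shape)
    big[:psf.shape[0], :psf.shape[1]] = psf
    big = np.roll(big, (-center[0], -center[1]), axis=(0, 1))
    D = np.fft.fft2(big)
    denom = np.abs(D) ** 2 + gamma

    def apply(z, w):
        top = np.real(np.fft.ifft2(np.fft.fft2(z) / denom)) + (z + w) / tau
        bottom = (z + w) / tau
        return top, bottom

    return apply

# Usage with a Gaussian PSF (variance 2) on a 256x256 grid and the
# heuristic parameter choices tau = 50*lam, gamma = lam of Sect. 3.1.
lam = 1e-2
r = np.arange(-4, 5)
g = np.exp(-r**2 / 4.0); psf = np.outer(g, g); psf /= psf.sum()
apply_M = make_M_tau_apply(psf, (256, 256), (4, 4), tau=50 * lam, gamma=lam)
z = np.random.rand(256, 256); w = np.random.rand(256, 256)
top, bottom = apply_M(z, w)
```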

3 Numerical Results

In this section, we present the numerical results of several image restoration test problems. The numerical experiments aim at illustrating the performance of MNP compared with several state-of-the-art methods such as SALSA [1], SPARSA [56], FISTA [4], NESTA [5], CGIST [26] and the Split Bregman method [25, 57]. In our comparative study, we also consider first- and second-order methods which, like MNP, solve the quadratic program (3): the nonmonotone version of GPSR using the Barzilai–Borwein technique for the step-length selection [22], the original Two-Metric Projection (TMP) method of Gafni and Bertsekas [8, 24, 45] employing the scaling (12), and a modified Newton projection method obtained by regularizing the full diagonal of the Hessian matrix (5). This last method, which will be indicated as MNP2, requires the inversion of the Hessian approximation:

$$\begin{aligned} \mathbf {H}_\tau = \begin{bmatrix} \mathbf {A}^H\mathbf {A} +\tau \mathbf {I}&-\mathbf {A}^H\mathbf {A} \\ -\mathbf {A}^H\mathbf {A}&\mathbf {A}^H\mathbf {A}+\tau \mathbf {I} \\ \end{bmatrix} . \end{aligned}$$

It can be proved (the proof is similar to that of Proposition 2.1) that the inverse of \(\mathbf {H}_\tau \) is given by

$$\begin{aligned} \mathbf {M}_\tau&= \frac{1}{\tau } \begin{bmatrix} \mathbf {A}^H\mathbf {A} +\tau \mathbf {I}&\quad \mathbf {A}^H\mathbf {A} \\ \mathbf {A}^H\mathbf {A}&\quad \mathbf {A}^H\mathbf {A}+\tau \mathbf {I} \\ \end{bmatrix}\nonumber \\&\quad \times \begin{bmatrix} (2\mathbf {A}^H\mathbf {A} +\tau \mathbf {I})^{-1}&\quad \mathbf {0} \\ \mathbf {0}&\quad (2\mathbf {A}^H\mathbf {A}+\tau \mathbf {I})^{-1} \\ \end{bmatrix} . \end{aligned}$$
(16)

Formula (16) indicates that, in MNP2, the computation of the search direction requires more computational effort than in MNP.
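A quick numerical check of formula (16) on a small random instance (an illustrative sketch, not part of the MNP2 implementation; names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau = 15, 0.5
A = rng.standard_normal((20, n))
AHA = A.T @ A
I = np.eye(n)
Z = np.zeros((n, n))

H_tau2 = np.block([[AHA + tau * I, -AHA], [-AHA, AHA + tau * I]])
inv2 = np.linalg.inv(2 * AHA + tau * I)
M_tau2 = (1.0 / tau) * (
    np.block([[AHA + tau * I, AHA], [AHA, AHA + tau * I]])
    @ np.block([[inv2, Z], [Z, inv2]])
)

print(np.allclose(H_tau2 @ M_tau2, np.eye(2 * n)))   # True
```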

Finally, MNP has also been compared with the OWLQN method of Andrew and Gao [3], which is probably one of the most effective methods proposed for general large-scale \(\ell _1\)-regularized learning.

3.1 Overall Assessment of the Considered Methods

The Matlab source code of the considered methods, when made publicly available by the authors, has been used in the numerical experiments. The OWLQN method is coded by its authors in C; a Matlab version, following the instructions provided in [3], has been implemented.

In TMP, the linear system for the search direction computation has been solved by the Conjugate Gradient method. The relative residual tolerance has been fixed at 0.01 and a maximum number of 100 iterations has been allowed. We remark that lower values for the CG relative tolerance produce restorations of worse quality since the Hessian matrix is ill-conditioned.

The numerical experiments have been executed on a Sun Fire V40z server consisting of four 2.4 GHz AMD Opteron 850 processors with 16 GB RAM using Matlab 7.10 (Release R2010a).

For all the considered methods, the initial iterate \(\mathbf {x}^{(0)}\) has been chosen to be the zero image, i.e. the image whose pixel values are all equal to zero. The regularization parameter \(\lambda \) of (2) has been heuristically chosen.

MNP depends on the parameters \(\tau \) and \(\gamma \). After extensive experimentation, we heuristically found that good values for these parameters are

$$\begin{aligned} \tau = 50\lambda , \quad \gamma = \lambda . \end{aligned}$$
(17)

These values have been used in all the presented numerical experiments. It is worth remarking that the values (17) for \(\tau \) and \(\gamma \) are not optimal for each experiment and that the numerical results could be further improved. However, with this parameter setting, the MNP algorithm only depends on the regularization parameter \(\lambda \). We also observe that the SALSA method depends on a penalty parameter \(\mu \) [1] and that the Split Bregman method depends on a “splitting” regularization coefficient \(\nu \) [25, 57]. In all the presented experiments, the values of these parameters have been hand-tuned for the best mean squared error reduction. For the other methods, the default parameters suggested by their authors have been used.

The termination criterion of all the considered methods is based on the relative change in the objective function at the last step. In particular, the iteration of each method is terminated when

$$\begin{aligned} |\phi (\mathbf {x}^{(k+1)})-\phi (\mathbf {x}^{(k)})| \le tol_\phi \; \phi \big (\mathbf {x}^{(k+1)} \big ) \end{aligned}$$
(18)

where \(tol_\phi \) is a small positive parameter. The selection of a suitable stopping criterion in image deblurring applications is a critical issue. In our numerical experiments, this criterion, also used in [5, 56], appears to provide sufficiently accurate solutions while avoiding excessive computational costs. In general, criterion (18) may not always reliably indicate progress towards optimality, although we did not observe this in our experiments. In such cases, a more appropriate stopping criterion would be based on the norm of the projected gradient or on the duality gap.

A maximum number of 500 iterations has been allowed for each method.

The quality of the restorations provided by the compared methods has been measured by using the Peak Signal-to-Noise Ratio (PSNR) values.

All the original images used in the following experiments have first been scaled so that their pixel values lie in the range \([0,1]\).
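Here the PSNR is computed with peak value 1, consistent with the scaling of the images to \([0,1]\); a one-function sketch (the function name is ours):

```python
import numpy as np

def psnr(x_restored, x_original, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((x_restored - x_original) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```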

3.2 Experiment 1: The Satellite Image

In the first experiment, the famous \(256\times 256\) satellite image has been considered. The satellite image is a good test image for \(\ell _1\)-based image restoration because it has many pixels with value equal to zero. The observed image of Fig. 1 has been generated by convolving the original image, also shown in Fig. 1, with a Gaussian PSF with variance equal to 2, obtained with the code psfGauss from [33], and then by adding Gaussian noise with noise level equal to \(5\cdot 10^{-3}\). (The noise level \(\text {NL}\) is defined as \(\text {NL}:={\Vert \varvec{\eta }\Vert }\big /{\Vert \mathbf {A}\mathbf {x}_\text {original}\Vert }\) where \(\mathbf {x}_\text {original}\) is the original image.) In Fig. 1 and in the following figures, the satellite image intensities are displayed in “negative gray-scale”.
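For reproducibility, a small sketch of how white Gaussian noise can be scaled to a prescribed noise level \(\text {NL}\) (the blurred image below is a random stand-in; all names are ours):

```python
import numpy as np

def add_noise_at_level(Ax, nl, seed=None):
    """Add white Gaussian noise eta scaled so that ||eta|| / ||Ax|| = nl."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(Ax.shape)
    eta *= nl * np.linalg.norm(Ax) / np.linalg.norm(eta)
    return Ax + eta

# Example at the noise level of Experiment 1.
Ax = np.random.rand(256, 256)          # stands in for the blurred satellite image
b = add_noise_at_level(Ax, nl=5e-3, seed=0)
```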

Fig. 1
figure 1

Experiment 1: deblurring of the satellite image. Left original image; right observed image (\(\text {NL}=5 \times 10^{-3}\))

For this test problem, the \(\mu \) and \(\nu \) parameters of the SALSA and Split Bregman methods have been set to \(\mu =5\lambda \) and \(\nu =0.025\lambda \), respectively.

Table 1 reports the PSNR values, the objective values, the CPU times in seconds, the number of performed iterations and the percentage of negative pixel values in the restored images. All the results are averaged over ten runs of each method. The numerical results given in Table 1 have been obtained by using the stopping tolerance values \(tol_\phi =10^{-3}\) and \(tol_\phi =10^{-5}\). Smaller values of \(tol_\phi \) do not improve the visual quality of the restored images even if they produce more accurate solutions to the optimization problem (2). The information in Table 1 is summarized in Fig. 2 where the PSNR values versus time are plotted for all the considered methods.

Fig. 2
figure 2

Experiment 1: PSNR values versus time (in seconds) obtained at \(tol_\phi =10^{-3}\) (left) and \(tol_\phi =10^{-5}\) (right)

Table 1 Experiment 1: numerical results

The results in Table 1 and Fig. 2 show that MNP is able to provide good quality restorations at low computational effort. They also indicate that the inversion of the modified Hessian adopted in MNP2 requires more computational effort.

In Fig. 3, the performance of MNP is compared to that of the first-order and second-order methods (NESTA has been omitted from these comparisons because of its high computational time, see Table 1). Since the PSNR is a decreasing function of the Mean Squared Error (MSE), Fig. 3 illustrates, in a semi-logarithmic scale, the MSE behavior and the decrease of the objective function versus time. For the sake of readability, not all the methods are represented in the same plot. The convergence rates of the considered second-order methods (MNP, MNP2, TMP and OWLQN) have also been compared. For these methods, Fig. 4 shows, in a semi-logarithmic scale, the behavior of the gradient norm values versus time; from the graphs it is evident that MNP, MNP2 and TMP decrease the gradient norm more slowly than OWLQN. As expected, the scaling of the gradient direction employed by MNP and MNP2 reduces the local convergence rate; moreover, in TMP, the use of an iterative solver for the search direction computation leads to a slower convergence rate.

Fig. 3
figure 3

Experiment 1: comparison between the considered methods. Top line relative error histories versus time; bottom line objective function decrease versus time (in seconds)

Fig. 4
figure 4

Experiment 1: comparison between MNP and the second-order methods (MNP2, TMP, OWLQN): gradient norm reduction versus time (in seconds)

Figure 5 compares the image obtained by MNP at \(tol_\phi =10^{-3}\) with those provided by SALSA, Split Bregman, CGIST, MNP2 and OWLQN at \(tol_\phi =10^{-5}\). (For these methods, the images obtained at \(tol_\phi =10^{-3}\) are not shown since their visual quality is clearly poor.) In the Split Bregman image, for better visualization, the pixels with negative values have been set to zero. The images restored by SPARSA and FISTA are not shown because they are practically indistinguishable from those obtained by SALSA and MNP. The visual quality of the GPSR, NESTA and TMP images is inferior and therefore these images are not displayed (see also their PSNR values in Table 1). Observe that the MNP image is visually comparable to the SALSA image while the CGIST image has an inferior visual quality even if its PSNR value is comparable to that of SALSA. Moreover, the MNP2 and OWLQN images are not comparable, in terms of visual quality, to the MNP image; in particular, the MNP2 image seems to be too smooth.

Fig. 5
figure 5

Experiment 1: restorations provided by some of the considered methods with \(tol_\phi =10^{-5}\). For MNP, the restoration at \(tol_\phi =10^{-3}\) is depicted

The restored images obtained by the SALSA, NESTA and Split Bregman methods have pixels with negative entries. They are displayed in Fig. 6, where the black pixels correspond to nonnegative pixel values of the reconstructions.

Fig. 6
figure 6

Experiment 1: pixels with negative entries. The black pixels correspond to pixels with nonnegative values. The white pixels correspond to pixels with negative values

Finally, in order to assess the performance of the image restoration model (2), we have compared the \(\ell _1\)-norm regularizer with the \(\ell _2\)-norm and the Total Variation (TV) regularizers, which are widely used in image restoration. The Lagged Diffusivity Fixed Point method [54] has been considered for the solution of the TV-regularized least squares problem. The \(\ell _2\)-regularized least squares problem has been solved by inverting, in the Fourier space, the linear system of its first order conditions. The MNP method is used for the solution of (2). Table 2 reports the PSNR values and the percentage of negative pixels; Fig. 7 displays the restored images. The superiority of the \(\ell _1\)-norm regularized image is evident both in terms of PSNR values and visual quality improvement. In fact, edges are preserved by the restoration model (2) without any staircasing or blurring effects, which are respectively evident in the TV-regularized and \(\ell _2\)-norm regularized images.

Fig. 7
figure 7

Experiment 1: comparison between \(\ell _1\)-norm, \(\ell _2\)-norm and TV regularizers

Table 2 Experiment 1: comparison between \(\ell _1\)-norm, \(\ell _2\)-norm and TV regularizers

3.3 Experiment 2: The Flintstones Image

In the second experiment, the \(512\times 512\) Flintstones image (Fig. 8) has been used. This image is a good test image since it has a nice mixture of detail and flat regions. The original Flintstones image has been degraded by two blurring operators, shown in Fig. 8, and by adding varying amounts of Gaussian noise.

Fig. 8
figure 8

Experiment 2: original Flintstones image and blur operators

In this experiment, MNP has been compared to SALSA, Split Bregman, FISTA, MNP2 and OWLQN. The reasons why we limit our comparative study to these methods are the following. From experiment 1, it is evident that SALSA and FISTA are able to achieve good quality results in little time. Moreover, Split Bregman and OWLQN have been considered because they are well-known methods for general \(\ell _1\) minimization, not specifically proposed for image restoration applications. Finally, MNP2 has been considered because it is closely related to MNP.

We have fixed the values \(\mu =2.5\lambda \) and \(\nu =\lambda \) for the SALSA and Split Bregman parameters. The numerical results are summarized in Table 3, where the level of the added noise (first column), the PSNR values, the objective values and the CPU times in seconds are reported for each blurring operator. The reported numerical values are the average over 10 runs of each experiment and have been obtained using the tolerance value \(tol_\phi =10^{-3}\), since smaller values of \(tol_\phi \) do not improve the visual quality of the restored images. Figure 9 depicts the MNP restorations from the degraded images with the highest noise level (NL=0.01). They assess the ability of the image restoration model (2) to remove blur and noise from the degraded images.

Fig. 9
figure 9

Experiment 2: Top line degraded image (NL=0.01). Bottom line MNP restorations. Left column blur \(h1\); right column blur \(h2\)

Table 3 Experiment 2: numerical results

3.4 Experiment 3

This last experiment is based on four famous test images with different features: the \(256\times 256\) Text image, the \(358\times 358\) Moon image, the \(364\times 364\) Concord image and the \(490\times 490\) Spine image (see Fig. 10). These are example images from the Matlab distribution. They have been corrupted by the blurring operators depicted in Fig. 8, and then Gaussian noise has been added with the same noise level values used in experiment 2. For each noise level, 10 Gaussian noise vectors with different realizations have been generated. This amounts to 100 experiments for each image and to 400 experiments in total.

Fig. 10
figure 10

Experiment 3: MNP restorations for \(h1\) blurring operator and NL=0.005. From left to right and from top to bottom: Text, Moon, Concord and Spine images

Also in this experiment, MNP has been compared to SALSA, Split Bregman, FISTA, MNP2 and OWLQN. The numerical results have been obtained by using the tolerance value \(tol_\phi =10^{-3}\). The \(\mu \) and \(\nu \) parameters of the SALSA and Split Bregman methods have been set to the same values as in the previous experiment.

Let \(\text {PSNR}_\textsc {method}\) denote the PSNR value given by a method among those considered (MNP, SALSA, Split Bregman, FISTA, MNP2 and OWLQN) and let \({\textsc {Time}}_\textsc {method}\) denote the corresponding CPU time, in seconds. The performance of the methods has been compared in terms of the percentage of experiments producing a reconstructed image such that

$$\begin{aligned} \text {PSNR}_\textsc {method} \ge C_1 \cdot \text {PSNR}_\textsc {max}, \quad C_1=1,0.9,0.8 \end{aligned}$$

and

$$\begin{aligned} {\textsc {Time}}_{\textsc {method}} \le C_2 \cdot {\textsc {Time}}_{\textsc {min}}, \quad C_2=1,3,5 \end{aligned}$$

where, for each experiment, \(\text {PSNR}_\textsc {max}\) is the maximum PSNR value and \({\textsc {Time}}_\textsc {min}\) is the minimum time. For each image and for the total 400 experiments, the corresponding percentage values are reported in Tables 4 and 5. Figure 10 depicts the MNP restorations from the images degraded by the \(h1\) blurring operator and Gaussian noise with NL=0.005.
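A small sketch of this bookkeeping, assuming the PSNR values and CPU times are collected in arrays with one row per experiment and one column per method (array and function names are ours):

```python
import numpy as np

def success_rates(psnr, time, C1, C2):
    """psnr, time: arrays of shape (n_experiments, n_methods).

    Returns, for each method, the percentage of experiments satisfying
    PSNR >= C1 * (best PSNR of the experiment) and
    time <= C2 * (best time of the experiment).
    """
    psnr_ok = psnr >= C1 * psnr.max(axis=1, keepdims=True)
    time_ok = time <= C2 * time.min(axis=1, keepdims=True)
    return 100 * psnr_ok.mean(axis=0), 100 * time_ok.mean(axis=0)
```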

Table 4 Experiment 3: percentage of experiments such that \(\text {PSNR}_\textsc {method} \ge C_1 \cdot \text {PSNR}_\textsc {max}\)
Table 5 Experiment 3: percentage of experiments such that \(\text {Time}_\textsc {method} \le C_2 \cdot \text {Time}_\textsc {min}\)

4 Conclusions

This paper describes a new approach to the solution of \(\ell _1\)-regularized least squares problems whose matrix is assumed to be overdetermined and of full rank. This approach solves the nonnegatively constrained quadratic programming reformulation of the original least squares problem by a modified Newton projection method where the Hessian matrix is approximated so that it can be efficiently inverted in the Fourier space. The developed MNP method favors nonnegative solutions of the \(\ell _1\)-regularized least squares problem without explicitly imposing any constraints in the optimization problem. Thus, MNP can also be applied to image restoration problems whose solution may have negative components, as in compressed sensing when \(\mathbf {x}\) represents the coefficient vector of the image under some basis. A comparative study with some state-of-the-art methods has been performed in order to evaluate the potential of the described approach. The numerical results show that MNP is competitive with the considered methods in terms of PSNR values, while SALSA is often the fastest method.