1 Introduction

Blind image deblurring has been widely investigated and has seen substantial progress in the past few decades. In general, the blurry image B can be formulated as:

$$\begin{aligned} B=I*k+n, \end{aligned}$$
(1)

where I, k, and n represent the latent clean image, the blur kernel, and the additive noise, respectively. We use \(*\) to denote the convolution operator. The goal of blind image deblurring is to estimate I and k from the given B. This is a highly ill-posed problem: theoretically, there exist infinitely many pairs of I and k that produce the same B. To resolve this ambiguity, certain priors on the clean image, the blur kernel, and the noise should be imposed to constrain the space of feasible solutions. In the Bayesian inference framework, the most commonly used estimator is the maximum a posteriori (MAP):

$$\begin{aligned} P(I,k|B)\propto P(B|I,k)P(I)P(k), \end{aligned}$$
(2)

where \(P(B|I,k)\) is the likelihood function, and P(I) and P(k) denote the statistical priors of I and k, respectively. By taking \(-\log P(I,k|B)\), the MAP estimator is equivalent to minimizing the following regularized model:

$$\begin{aligned} \min _{I,k}R_{1}(B-I*k)+\alpha R_{2}(I) + \beta R_{3}(k), \end{aligned}$$
(3)

where \(R_{1}(\cdot )\) is the data fidelity term used to model the noise n, \(R_{2}(\cdot )\) and \(R_{3}(\cdot )\) regularize the clean image and the blur kernel, respectively, and \(\alpha \) and \(\beta \) are regularization parameters.
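
For clarity, taking the negative logarithm of Eq. (2) makes this correspondence explicit (a standard step; the term-by-term identification below is implied rather than stated by Eqs. (2) and (3)):

$$\begin{aligned} -\log P(I,k|B)=-\log P(B|I,k)-\log P(I)-\log P(k)+\text {const}, \end{aligned}$$

so that \(R_{1}(B-I*k)\), \(\alpha R_{2}(I)\), and \(\beta R_{3}(k)\) correspond to \(-\log P(B|I,k)\), \(-\log P(I)\), and \(-\log P(k)\), respectively.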

Existing MAP-based works usually assume the noise to follow a Gaussian or Laplacian distribution [1,2,3,4]. Consequently, the \(l_2\)- or \(l_1\)-norm is adopted to model the noise, i.e., \(||B-I*k||_{2}^{2}\) or \(||B-I*k||_{1}\). However, as demonstrated in previous studies [5,6,7], real noise is empirically more complex, and using only the above formulations as the data fidelity term is less compelling. Recently, by modifying the noise modeling technique of [8], Chen et al. [9] proposed a weighted sum of the dense norm \(l_2\) and the sparse norm \(l_e\) to model the unknown noise:

$$\begin{aligned} R_{1}(B-I*k)=||B-I*k||_{2}^{2}+\omega _{1}||\nabla ^{1} B-\nabla ^{1} I*k||_{e}, \end{aligned}$$
(4)

where \( \omega _{1} \) is a weight parameter, \( \nabla ^{1} = \{\partial _{x},\partial _{y}\} \), and \( ||\cdot ||_{e}=||\cdot ||_{0}+||\cdot ||_{1} \). Compared with more complicated noise modeling techniques, such as learned data fitting terms [5] or mixtures of Gaussians [6, 7], the noise modeling in (4) is surprisingly simple yet very effective, as confirmed in [9]. Considering the spatial randomness of the noise [10, 11] in the blurry image, we further introduce the second-order derivatives of the residue image into (4):

$$\begin{aligned} R_{1}(B-I*k)=||B-I*k||_{2}^{2}+\sum _{i=1,2} \omega _{i}||\nabla ^{i} B-\nabla ^{i} I*k||_{e},\nonumber \\ \end{aligned}$$
(5)

where \( \omega _{2} \) is a weight parameter, and \( \nabla ^{2} = \{\partial _{xx},\partial _{xy},\partial _{yy}\} \). In Fig. 1, it can be seen that the proposed data fidelity term (5) is more effective than the other two terms, as the corresponding results contain fewer artifacts.
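
For concreteness, the following is a minimal NumPy/SciPy sketch of evaluating the data fidelity term (5); the specific finite-difference filters, boundary handling, and function names are our own illustrative assumptions rather than choices prescribed in this paper.

```python
import numpy as np
from scipy.signal import convolve2d

# Assumed finite-difference filters for the first- and second-order derivative sets.
D1 = [np.array([[1., -1.]]), np.array([[1.], [-1.]])]              # d/dx, d/dy
D2 = [np.array([[1., -2., 1.]]),                                   # d2/dx2
      np.array([[1., -1.], [-1., 1.]]),                            # d2/dxdy
      np.array([[1.], [-2.], [1.]])]                               # d2/dy2

def l_e(x, eps=1e-8):
    """Enhanced sparse norm ||x||_e = ||x||_0 + ||x||_1."""
    return np.count_nonzero(np.abs(x) > eps) + np.abs(x).sum()

def data_fidelity(B, I, k, w1=4e-3, w2=4e-4):
    """Evaluate the data fidelity term (5); the derivatives are applied to the
    residue image D = B - I*k, since differentiation commutes with convolution."""
    D = B - convolve2d(I, k, mode='same', boundary='symm')
    value = np.sum(D ** 2)
    for w, filters in ((w1, D1), (w2, D2)):
        for f in filters:
            value += w * l_e(convolve2d(D, f, mode='same', boundary='symm'))
    return value
```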

Fig. 1 Comparison between different data fidelity terms

For the regularization term on the latent clean image, it is generally believed that the gradients of sharp images follow heavy-tailed distributions [12,13,14], which can be well modeled by a hyper-Laplacian prior \( P(\nabla ^{1}I)\propto e^{-\alpha |\nabla ^{1}I|^p} \) with \( 0.5\le p \le 0.8\). As a result, the \(l_p\)-norm is adopted on the image gradients for image deblurring [13, 15,16,17,18]. Other works [1,2,3, 9, 19,20,21] have also illustrated the importance of pursuing sparsity of image gradients for blur kernel estimation, mostly using the \(l_0\)-norm to achieve sparsity. In particular, the work [9] introduced an enhanced sparse norm \(l_e\) to promote sparsity. Motivated by the success of these previous works, we propose a composite \(l_{\mathcal {P}}\)-norm for image gradients that takes advantage of both \(l_0\) and \(l_p\,(0<p\le 1)\). The proposed \(l_{\mathcal {P}}\)-norm can be sparser than either norm alone as well as the \(l_e\)-norm, and when \(p=1\), the \(l_{\mathcal {P}}\)-norm is equivalent to the \(l_e\)-norm. With a properly chosen \(p\) value, we confirm that using the \(l_{\mathcal {P}}\)-norm to regularize the image gradients for blur kernel estimation is more rational. As shown in Fig. 2, the proposed \(l_{\mathcal {P}}\)-norm (with \(p\) set to 0.6) performs best among the compared regularization terms.

Minimizing the overall regularized model is challenging since the proposed \(l_{\mathcal {P}}\)-norm is highly non-convex. To address this optimization problem, we use the half-quadratic splitting method [22] to split the model into several sub-problems and modify the thresholding rule of the generalized iterated shrinkage algorithm (GISA) [23] to solve the related \(l_{\mathcal {P}}\)-minimization problem. Finally, we summarize the contributions of this work as follows:

  • We present a more flexible deblurring model based on the model in [9]. By setting the corresponding parameters (e.g., \(p=1\)), our model reduces to that of [9].

  • We propose an improved sparse regularization \(l_{\mathcal {P}}\)-norm to regularize the image gradients and introduce a new data fidelity term.

  • We provide an effective optimization scheme for the proposed deblurring model.

Fig. 2 Comparison between different regularization terms

2 Related works

Extensive studies on blind image deblurring have produced abundant techniques, such as selecting salient edges for kernel estimation [4, 10, 24,25,26,27,28,29], end-to-end deblurring based on deep learning [30,31,32,33], and imposing sparsity on the image gradients [11, 15,16,17,18].

We focus only on the last category, which is highly related to the proposed method. In [11], Shan et al. concatenated two piece-wise functions to fit the heavy-tailed distribution of image gradients and combined it with a local prior to suppress ringing artifacts. Kotera et al. [17] used the \(l_{0.3}\)-norm on image gradients, and Gan et al. [16] adopted the \(l_{0.4}\)-norm on both image gradients and the blur kernel. Moreover, Almeida et al. [15] used the \(l_{p}\)-norm on gradients and manually set the \(p\) values as a non-increasing sequence (i.e., 0.8, \(\cdots \), 0.6, \(\cdots \), 0.4, \(\cdots \)) over the iterations. In [18], Zuo et al. further developed an iteration-wise learning method to produce a series of increasing \(p\,(-1<p \le 0.2)\) values and assigned them to the corresponding deblurring iterations. However, these methods do not perform well on specific types of images beyond natural ones.

Instead of exploring the heavy-tailed distribution of gradients to achieve sparsity, Krishnan et al. [34] used a normalized sparsity regularization \( \frac{l_1}{l_2} \), and Xu et al. [21] proposed an approximated \(l_{0}\)-norm regularization to fulfill the task. Recently, to further improve the deblurring performance, researchers have suggested complementary priors for the \(l_{0}\)-norm of gradients, such as the dark channel prior [2], the extreme channels prior [35], the discriminative priors [1], and the local minimal intensity prior [36]. In [9], Chen et al. used an enhanced sparse regularization on the gradients and incorporated it with the noise modeling (4) for the blur kernel estimation. In this paper, we extend the noise modeling of [9] and suggest a more rational \(l_{{\mathcal {P}}}\)-norm for image gradients. We show that the proposed algorithm improves over [9] as well as state-of-the-art deblurring methods.

3 Proposed deblurring model

3.1 Improved sparse regularization

In this subsection, we first introduce the proposed improved sparse regularization \(l_{\mathcal {P}}\)-norm, and then provide the solution of the \(l_{\mathcal {P}}\)-minimization problem.

For a signal \(X\), its \(l_{\mathcal {P}}\)-norm is defined as:

$$\begin{aligned} ||X||_{{\mathcal {P}}}=||X||_0+||X||_{p}^{p}. \end{aligned}$$
(6)

Given a degraded signal \(Y\), and assuming its latent signal \(X\) is sparse, the \(l_{\mathcal {P}}\)-minimization problem can be expressed as:

$$\begin{aligned} \min _{X}||Y-X||_{2}^{2}+\lambda (||X||_{0}+||X||_{p}^{p}), \end{aligned}$$
(7)

where \(\lambda \) is the penalty parameter. Note that Eq. (7) is an element-wise minimization problem, so it can be decomposed into a set of independent sub-problems, one of which is:

$$\begin{aligned} \min _{x}(y-x)^{2}+\lambda (|x|^{0}+|x|^{p}), \end{aligned}$$
(8)

where \(x\) and \(y\) denote the elements of \(X\) and \(Y\), respectively, at the same location. Without the term \(|x|^{p}\), Eq. (8) has a closed-form solution [22]. Similarly, without the term \(|x|^{0}\), Eq. (8) becomes an \(l_{p}\)-minimization problem that can be effectively solved by the generalized soft-thresholding (GST) operator [23].

In the following, we show that the solution of Eq.(8) can be obtained by modifying the thresholding rule of the GST operator. Let

$$\begin{aligned} E(x)=(y-x)^{2}+\lambda |x|^{p}, \end{aligned}$$
(9)

and

$$\begin{aligned} F(x)=(y-x)^{2}+\lambda (|x|^{0}+|x|^{p}). \end{aligned}$$
(10)

Fig. 3 Curves of the functions \(E(x)\) (Top row) and \(F(x)\) (Bottom row) with different values of \(y\). We use \(p=0.5\) and \(\lambda =2\). The green square denotes the inflection point, and the red circle is the minimum point

Fixing \(p=0.5\) and \(\lambda =2\), we plot the curves of \(E(x)\) and \(F(x)\) for five typical \(y\) values in Fig. 3; without loss of generality, only the case of \(y>0\) is considered. The GST operator is designed to minimize \(E(x)\) based on the observation that there exists a specific threshold \(\tau _{p}(\lambda )\) on \(y\): when \(y=\tau _{p}(\lambda )\), \(E(0)\) equals \(E(x_{p}^{*})\), as seen in Fig. 3c. If \(y<\tau _{p}(\lambda )\), \(x=0\) is the global minimum (e.g., Fig. 3a and b); otherwise, a non-zero solution is optimal (e.g., Fig. 3d and e). Thus, the main issues of the GST operator are how to compute \(\tau _{p}(\lambda )\) and how to find the non-zero solution \(S_{p}(y,\lambda )\), which satisfies \(E^{\prime }(S_{p}(y,\lambda ))=0\), i.e.,

$$\begin{aligned} S_{p}(y,\lambda )=y-\frac{1}{2}\lambda p(S_{p}(y,\lambda ))^{p-1}. \end{aligned}$$
(11)

More details and proofs can be found in [23] and its supplementary material; the factor \(\frac{1}{2}\) above follows directly from setting \(E^{\prime }(x)=0\) with the fidelity term \((y-x)^{2}\) in (9). Since \(F(x)\) is the sum of \(E(x)\) and the positive constant \(\lambda \) when \(x\) is non-zero, \(S_{p}(y,\lambda )\) is also the non-zero solution of \(F(x)\).

From Fig. 3f–j, one can see that the GST strategy can also be used to minimize \(F(x)\), but with a different and larger threshold. We use \(\tau _{{\mathcal {P}}}(\lambda )\) to denote this threshold and rewrite \(F(x)\) as:

$$\begin{aligned} F(x) = {\left\{ \begin{array}{ll} y^{2} &{} \text {if } x = 0,\\ (y-x)^{2}+\lambda (1+x^{p}) &{} \text {if } x > 0. \end{array}\right. } \end{aligned}$$
(12)

The first-order derivative of \(F(x)\) in the range of \((0,+\infty )\) is:

$$\begin{aligned} F^{\prime }(x)=2(x-y)+\lambda px^{p-1}. \end{aligned}$$
(13)

We solve the following system of equations to obtain the threshold \(\tau _{{\mathcal {P}}}(\lambda )\) and its corresponding \(x_{{\mathcal {P}}}^{*}\):

$$\begin{aligned}{} & {} (x_{{\mathcal {P}}}^{*}-\tau _{{\mathcal {P}}}(\lambda ))^{2}+\lambda (1+(x_{{\mathcal {P}}}^{*})^{p})=(\tau _{{\mathcal {P}}}(\lambda ))^{2}, \end{aligned}$$
(14)
$$\begin{aligned}{} & {} 2(x_{{\mathcal {P}}}^{*}-\tau _{{\mathcal {P}}}(\lambda ))+\lambda p(x_{{\mathcal {P}}}^{*})^{p-1}=0. \end{aligned}$$
(15)

From Eq. (15), we obtain \(\tau _{{\mathcal {P}}}(\lambda )=x_{{\mathcal {P}}}^{*}+\frac{1}{2}\lambda p(x_{{\mathcal {P}}}^{*})^{p-1}\); substituting this into Eq. (14) yields:

$$\begin{aligned} (x_{{\mathcal {P}}}^{*})^{2}=\lambda (1-p)(x_{{\mathcal {P}}}^{*})^{p}+\lambda , \end{aligned}$$
(16)

we use the fixed-point iteration method to estimate \(x_{{\mathcal {P}}}^{*}\). After obtaining \(x_{{\mathcal {P}}}^{*}\), the threshold \(\tau _{{\mathcal {P}}}(\lambda )\) is:

$$\begin{aligned} \tau _{{\mathcal {P}}}(\lambda )=x_{{\mathcal {P}}}^{*}+\frac{1}{2}\lambda p(x_{{\mathcal {P}}}^{*})^{p-1}. \end{aligned}$$
(17)

Considering all the cases of \(y\), the solution of Eq. (8) is given by:

$$\begin{aligned} T_{{\mathcal {P}}}(y,\lambda ) = {\left\{ \begin{array}{ll} 0 &{} \text {if } |y| \le \tau _{{\mathcal {P}}}(\lambda ),\\ \text {sgn}(y)S_{p}(|y|,\lambda ) &{} \text {if } |y| > \tau _{{\mathcal {P}}}(\lambda ). \end{array}\right. } \end{aligned}$$
(18)

where \(\text {sgn}(y)\) denotes the sign of \(y\). Algorithm 1 shows the corresponding pseudo code, and we empirically use \(J=5\) in all experiments. As mentioned before, we further show that the solution of the \(l_{{\mathcal {P}}}\)-minimization problem with \(p=1\) is exactly the solution of the \(l_e\)-minimization problem [9]. When \(p=1\), Algorithm 1 converges at \(t=1\), with \(x_{{\mathcal {P}}}^{*}=\sqrt{\lambda }\) and \(\tau _{{\mathcal {P}}}(\lambda )=\sqrt{\lambda }+\frac{1}{2}\lambda \), and \(T_{{\mathcal {P}}}(y,\lambda )\) becomes:

$$\begin{aligned} T_{e}(y,\lambda ) = {\left\{ \begin{array}{ll} 0 &{} \text {if } |y| \le \sqrt{\lambda }+\frac{1}{2}\lambda ,\\ \text {sgn}(y)(|y|-\frac{1}{2}\lambda ) &{} \text {if } |y| > \sqrt{\lambda }+\frac{1}{2}\lambda . \end{array}\right. } \end{aligned}$$
(19)

This solution was given in the supplementary material of [9].

Algorithm 1 Solving Eq. (8)
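
Since the pseudo code of Algorithm 1 appears here only as a figure, the following is a minimal NumPy sketch of it for reference; it follows Eqs. (16)–(18) directly, and the fixed-point initializations are our own illustrative choices.

```python
import numpy as np

def lP_threshold(y, lam, p, J=5):
    """Solve Eq. (8): min_x (y - x)^2 + lam * (|x|^0 + |x|^p), element-wise over y."""
    # Fixed-point iteration for x* from Eq. (16): (x*)^2 = lam*(1 - p)*(x*)^p + lam.
    x_star = np.sqrt(lam)                              # assumed starting point
    for _ in range(J):
        x_star = np.sqrt(lam * (1 - p) * x_star ** p + lam)
    tau = x_star + 0.5 * lam * p * x_star ** (p - 1)   # threshold, Eq. (17)

    y = np.asarray(y, dtype=float)
    x = np.zeros_like(y)                               # |y| <= tau  ->  x = 0, Eq. (18)
    mask = np.abs(y) > tau
    # Non-zero solutions via the GST-style fixed point S = |y| - (lam*p/2)*S^(p-1), Eq. (11).
    s = np.abs(y[mask])
    for _ in range(J):
        s = np.abs(y[mask]) - 0.5 * lam * p * s ** (p - 1)
    x[mask] = np.sign(y[mask]) * s
    return x
```

Setting \(p=1\) in this sketch reproduces the \(l_e\) solution (19): the threshold becomes \(\sqrt{\lambda }+\frac{1}{2}\lambda \) and the surviving entries are shrunk by \(\frac{1}{2}\lambda \).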

3.2 Deblurring model and optimization

This section presents the proposed deblurring model and the corresponding optimization scheme. We use the \(l_{2}\)-norm to regularize the blur kernel \(k\), as many state-of-the-art methods [1, 2, 9, 19, 36, 37] do. Our deblurring model is given by:

$$\begin{aligned}{} & {} \min _{I,k}||B-I*k||_{2}^{2}+\sum _{i=1,2}\omega _{i}||\nabla ^{i}B-\nabla ^{i}I*k||_{e}\nonumber \\{} & {} \quad +\alpha ||\nabla ^{1}I||_{{\mathcal {P}}}+\beta ||k||_{2}^{2}. \end{aligned}$$
(20)

Since it is intractable to solve (20) directly, we perform alternating minimization with respect to the latent clean image \(I\):

$$\begin{aligned} \min _{I}||B-I*k||_{2}^{2}+\sum _{i=1,2}\omega _{i}||\nabla ^{i}B-\nabla ^{i}I*k||_{e}+\alpha ||\nabla ^{1}I||_{{\mathcal {P}}},\nonumber \\ \end{aligned}$$
(21)

and the blur kernel \(k\):

$$\begin{aligned} \min _{k}||B-I*k||_{2}^{2}+\sum _{i=1,2}\omega _{i}||\nabla ^{i}B-\nabla ^{i}I*k||_{e}+\beta ||k||_{2}^{2},\nonumber \\ \end{aligned}$$
(22)

3.2.1 Estimating the latent clean image \(I\)

To estimate the latent clean image \(I\), the half-quadratic splitting method is adopted by introducing auxiliary variables \(f_{1}\), \(f_{2}\), and \(g\) with respect to \(\nabla ^{1}(B-I*k)\), \(\nabla ^{2}(B-I*k)\) and \(\nabla ^{1}I\), respectively. Thus, (21) can be rewritten as:

$$\begin{aligned}{} & {} \min _{f_1,f_2,g,I}||B-I*k||_{2}^{2}+\sum _{i=1,2}\omega _{i}||f_i||_{e}\nonumber \\{} & {} +\sum _{i=1,2}\delta _{i}||f_{i}-\nabla ^{i}(B-I*k)||_{2}^{2}+\alpha ||g||_{{\mathcal {P}}}+\delta _{3}||g-\nabla ^{1}I||_{2}^{2},\nonumber \\ \end{aligned}$$
(23)

where \(\delta _{1}\), \(\delta _{2}\), and \(\delta _{3}\) are penalty parameters. This optimization problem can be solved by alternately minimizing \(f_1\), \(f_2\), \(g\), and \(I\).

Given \(I\), we obtain \(f_1\) and \(f_2\) by:

$$\begin{aligned} \min _{f_1}\delta _{1}||f_{1}-\nabla ^{1}(B-I*k)||_{2}^{2}+\omega _{1}||f_1||_e, \end{aligned}$$
(24)

and

$$\begin{aligned} \min _{f_2}\delta _{2}||f_{2}-\nabla ^{2}(B-I*k)||_{2}^{2}+\omega _{2}||f_2||_e. \end{aligned}$$
(25)

Note that these two are \(l_e\)-minimization problems [9] and have closed-form solutions based on (19):

$$\begin{aligned} f_1 = {\left\{ \begin{array}{ll} \nabla ^{1}(D)+\frac{\omega _1}{2\delta _1} &{} \text {if } \nabla ^{1}(D)<-\sqrt{\frac{\omega _1}{\delta _1}}-\frac{\omega _1}{2\delta _1},\\ \nabla ^{1}(D)-\frac{\omega _1}{2\delta _1} &{} \text {if } \nabla ^{1}(D)>\sqrt{\frac{\omega _1}{\delta _1}}+\frac{\omega _1}{2\delta _1},\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(26)

and

$$\begin{aligned} f_2 = {\left\{ \begin{array}{ll} \nabla ^{2}(D)+\frac{\omega _2}{2\delta _2} &{} \text {if } \nabla ^{2}(D)<-\sqrt{\frac{\omega _2}{\delta _2}}-\frac{\omega _2}{2\delta _2},\\ \nabla ^{2}(D)-\frac{\omega _2}{2\delta _2} &{} \text {if } \nabla ^{2}(D)>\sqrt{\frac{\omega _2}{\delta _2}}+\frac{\omega _2}{2\delta _2},\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(27)

We use \(D\) to denote the residue image \(B-I*k\) for notational convenience.
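
The closed-form updates (26) and (27) amount to a single shrinkage rule applied to each derivative of the residue image; a compact sketch (function and variable names are ours) is:

```python
import numpy as np

def le_shrink(v, w, delta):
    """Closed-form solution of min_f delta*||f - v||_2^2 + w*||f||_e, cf. Eqs. (26)-(27).
    `v` holds one first- or second-order derivative of the residue image D = B - I*k,
    e.g. f1_x = le_shrink(dx(D), w1, delta1) with an assumed derivative helper dx."""
    thresh = np.sqrt(w / delta) + w / (2.0 * delta)    # zeroing threshold
    return np.where(np.abs(v) > thresh, v - np.sign(v) * w / (2.0 * delta), 0.0)
```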

Fig. 4 The flowchart of the proposed blur kernel estimation in a coarse-to-fine manner. \(B_{1},B_{2},\ldots ,B_{n-1}\) are the down-sampled versions of the blurry image \(B\) with gradually increasing resolution, and \(B_n\) is \(B\), where \(n\) denotes the number of image pyramid layers. After estimating the blur kernel \(k\) in one layer, it is up-sampled and passed to the next layer

The sub-problem with respect to \(g\) is given by:

$$\begin{aligned} \min _{g}\delta _{3}||g-\nabla ^{1}I||_{2}^{2}+\alpha ||g||_{{\mathcal {P}}}. \end{aligned}$$
(28)

This is an \(l_{{\mathcal {P}}}\)-minimization problem, and its solution can be obtained via Algorithm 1.

With \(f_1\), \(f_2\), and \(g\) fixed, estimating \(I\) becomes a quadratic problem:

$$\begin{aligned} \min _{I}||B-I*k||_{2}^{2}+\sum _{i=1,2}\delta _{i}||f_{i}-\nabla ^{i}(B-I*k)||_{2}^{2}+\delta _{3}||g-\nabla ^{1}I||_{2}^{2}. \end{aligned}$$
(29)

By using the fast Fourier transform (FFT), we can obtain the closed-form solution of (29) as follows:

$$\begin{aligned} I={\mathcal {F}}^{-1}\left( \frac{\overline{{\mathcal {F}}(k)}{\mathcal {F}}(B)+\sum _{i=1,2}\delta _{i}\overline{{\mathcal {F}}(\nabla ^{i}){\mathcal {F}}(k)}\left( {\mathcal {F}}(\nabla ^{i}B)-{\mathcal {F}}(f_i) \right) +\delta _{3}\overline{{\mathcal {F}}(\nabla ^{1})}{\mathcal {F}}(g)}{\overline{{\mathcal {F}}(k)}{\mathcal {F}}(k)+\sum _{i=1,2}\delta _{i}\overline{{\mathcal {F}}(\nabla ^{i}){\mathcal {F}}(k)}{\mathcal {F}}(\nabla ^{i}){\mathcal {F}}(k)+\delta _{3}\overline{{\mathcal {F}}(\nabla ^{1})}{\mathcal {F}}(\nabla ^{1})} \right) \nonumber \\ \end{aligned}$$
(30)

where \({\mathcal {F}}(\cdot )\), \({\mathcal {F}}^{-1}(\cdot )\), and \( \overline{{\mathcal {F}}(\cdot )} \) denote the FFT, inverse FFT, and the complex conjugate operator after FFT, respectively.
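
A sketch of the FFT-based update (30) is given below, assuming that the kernel and the derivative filters are padded to the image size with a psf2otf-style helper (`otf` below mirrors MATLAB's `psf2otf`; the helper, the filter lists `D1`/`D2` from the earlier sketch, and the argument conventions are our assumptions).

```python
import numpy as np

def otf(psf, shape):
    """Zero-pad a small filter to `shape`, circularly shift its center to the origin
    (psf2otf-style), and return its 2-D FFT."""
    pad = np.zeros(shape)
    pad[:psf.shape[0], :psf.shape[1]] = psf
    for axis, size in enumerate(psf.shape):
        pad = np.roll(pad, -(size // 2), axis=axis)
    return np.fft.fft2(pad)

def update_I(B, k, f1, f2, g, D1, D2, deltas):
    """Closed-form update of the latent image I, Eq. (30). `f1`, `f2`, and `g` are lists
    of auxiliary variables matching the filter lists `D1` (first order) and `D2`
    (second order); `deltas` = (delta1, delta2, delta3)."""
    shape = B.shape
    FB, Fk = np.fft.fft2(B), otf(k, shape)
    num = np.conj(Fk) * FB
    den = np.abs(Fk) ** 2
    for d, filters, fs in ((deltas[0], D1, f1), (deltas[1], D2, f2)):
        for filt, f in zip(filters, fs):           # first- and second-order fidelity terms
            Fd = otf(filt, shape)
            num += d * np.conj(Fd * Fk) * (Fd * FB - np.fft.fft2(f))
            den += d * np.abs(Fd * Fk) ** 2
    for filt, gi in zip(D1, g):                    # gradient-regularization term
        Fd = otf(filt, shape)
        num += deltas[2] * np.conj(Fd) * np.fft.fft2(gi)
        den += deltas[2] * np.abs(Fd) ** 2
    return np.real(np.fft.ifft2(num / den))
```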

3.2.2 Estimating the blur kernel \(k\)

Following previous works [10, 38], we adopt an improved version of (22) that solves for \(k\) in the gradient domain, i.e.,

$$\begin{aligned}{} & {} \min _{k}||\nabla ^{1}B-\nabla ^{1}I*k||_{2}^{2}\nonumber \\{} & {} +\sum _{i=1,2}\omega _{i}||\nabla ^{i}B-\nabla ^{i}I*k||_{e}+\beta ||k||_{2}^{2}. \end{aligned}$$
(31)

We again use the half-quadratic splitting method and introduce auxiliary variables \(h_{1}\) and \(h_{2}\) with respect to \(\nabla ^{1}(B-I*k)\) and \(\nabla ^{2}(B-I*k)\); the above formulation can then be rewritten as:

$$\begin{aligned}{} & {} \min _{h_1,h_2,k}||\nabla ^{1}B-\nabla ^{1}I*k||_{2}^{2}+\sum _{i=1,2}\omega _{i}||h_i||_{e}\nonumber \\{} & {} +\sum _{i=1,2}\varphi _{i}||h_{i}-\nabla ^{i}(B-I*k)||_{2}^{2}+\beta ||k||_{2}^{2}. \end{aligned}$$
(32)

where \(\varphi _{1}\) and \(\varphi _{2}\) are penalty parameters. We solve (32) by alternately updating \(h_1\), \(h_2\), and \(k\) via the following formulations:

$$\begin{aligned}{} & {} \min _{h_{1}}\varphi _{1}||h_{1}-\nabla ^{1}(B-I*k)||_{2}^{2}+\omega _{1}||h_{1}||_{e}, \end{aligned}$$
(33)
$$\begin{aligned}{} & {} \min _{h_{2}}\varphi _{2}||h_{2}-\nabla ^{2}(B-I*k)||_{2}^{2}+\omega _{2}||h_{2}||_{e}, \end{aligned}$$
(34)

and

$$\begin{aligned}{} & {} \min _{k}||\nabla ^{1}B-\nabla ^{1}I*k||_{2}^{2}\nonumber \\{} & {} +\sum _{i=1,2}\varphi _{i}||h_{i}-\nabla ^{i}(B-I*k)||_{2}^{2}+\beta ||k||_{2}^{2}. \end{aligned}$$
(35)

The solutions of (33) and (34) are analogous to (26) and (27), respectively, and (35) can be efficiently solved via FFT:

$$\begin{aligned} k={\mathcal {F}}^{-1} \left( \frac{\overline{{\mathcal {F}}(\nabla ^{1}I)}{\mathcal {F}}(\nabla ^{1}B)+\sum _{i=1,2}\varphi _{i}\overline{{\mathcal {F}}(\nabla ^{i}I)} \left( {\mathcal {F}}(\nabla ^{i}B)-{\mathcal {F}}(h_i)\right) }{\overline{{\mathcal {F}}(\nabla ^{1}I)}{\mathcal {F}}(\nabla ^{1}I)+\sum _{i=1,2}\varphi _{i}\overline{{\mathcal {F}}(\nabla ^{i}I)}{\mathcal {F}}(\nabla ^{i}I)+\beta } \right) .\nonumber \\ \end{aligned}$$
(36)

After obtaining \(k\), its negative elements are set to 0, and \(k\) is then normalized. As in state-of-the-art deblurring methods, the blur kernel estimation is performed in a coarse-to-fine manner with an image pyramid [10], as shown in Fig. 4; the main steps at one pyramid layer are listed in Algorithm 2.
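
Analogously, a sketch of the kernel update (36) with the subsequent projection and normalization is shown below, reusing the `otf` helper and the assumed filter lists from the earlier sketches; the crop location depends on the padding convention and is therefore only indicative.

```python
import numpy as np

def update_k(B, I, h1, h2, D1, D2, phis, beta, ksize):
    """Closed-form update of the blur kernel k, Eq. (36), followed by setting negative
    elements to zero and normalizing; `phis` = (phi1, phi2)."""
    shape = B.shape
    FB, FI = np.fft.fft2(B), np.fft.fft2(I)
    num = np.zeros(shape, dtype=complex)
    den = np.full(shape, beta, dtype=complex)
    for filt in D1:                                    # l2 term on first-order gradients
        Fd = otf(filt, shape)
        num += np.conj(Fd * FI) * (Fd * FB)
        den += np.abs(Fd * FI) ** 2
    for p_i, filters, hs in ((phis[0], D1, h1), (phis[1], D2, h2)):
        for filt, h in zip(filters, hs):               # auxiliary-variable terms
            Fd = otf(filt, shape)
            num += p_i * np.conj(Fd * FI) * (Fd * FB - np.fft.fft2(h))
            den += p_i * np.abs(Fd * FI) ** 2
    k = np.real(np.fft.ifft2(num / den))
    k = k[:ksize, :ksize]                              # crop to the assumed kernel support
    k[k < 0] = 0                                       # non-negativity projection
    return k / max(k.sum(), 1e-12)                     # normalization
```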

Fig. 5 Quantitative evaluations on the dataset [24] in terms of Success Percent

Algorithm 2 Blur kernel estimation algorithm
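
The coarse-to-fine procedure of Fig. 4 and Algorithm 2 can be summarized by the following schematic loop; the number of scales, the scaling factor, the initialization, and the `inner_solver` callback standing in for the alternating updates of Sect. 3.2 are all illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import zoom

def estimate_kernel_pyramid(B, inner_solver, n_scales=5, factor=0.75):
    """Coarse-to-fine blur kernel estimation (cf. Fig. 4 / Algorithm 2), schematic only.
    `inner_solver(Bs, k)` runs the alternating I- and k-updates of Sect. 3.2 at one
    scale and returns the refined (I, k)."""
    scales = [factor ** s for s in range(n_scales - 1, -1, -1)]   # coarse -> fine
    k = np.ones((3, 3)) / 9.0                                     # assumed initial kernel
    for i, s in enumerate(scales):
        Bs = zoom(B, s, order=1)                                  # down-sampled blurry image B_i
        I, k = inner_solver(Bs, k)
        if i + 1 < len(scales):                                   # pass k to the next finer layer
            k = zoom(k, scales[i + 1] / s, order=1)
            k[k < 0] = 0
            k /= max(k.sum(), 1e-12)
    return k
```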

4 Experimental results

As in previous algorithms [1, 2, 9, 35], the proposed method only estimates the blur kernel from the blurry input and then adopts one of the existing non-blind deblurring methods [13, 14, 20, 39,40,41] to obtain the final clean image. In all experiments, we empirically set \(\omega _{1}=4e-3\), \(\omega _{2}=4e-4\), \(\alpha =4e-3\), and \(\beta =2\). Since \(p\) is an important parameter in the proposed model, the deblurring performance with different \(p\) values (ranging from 0.1 to 1 with a step of 0.1) is evaluated on three widely-used benchmark datasets [20, 24, 42]. We report two variants of the proposed method on these datasets: Ours-FP, which uses a fixed \(p\) value for the entire dataset, and Ours-GFP, which uses a fixed \(p\) value for each group of blurry images sharing the same ground truth. The results of the other methods are either provided by the authors or generated by their code with default settings. Our method is implemented in MATLAB, and all experiments are run on a computer with an Intel Core i7-8700 CPU and 16 GB RAM. For a fixed blur kernel size of 21\(\times \)21, it takes about 10, 35, and 150 s to process a blurry image of size 200\(\times \)200, 400\(\times \)400, and 800\(\times \)800, respectively. The running time increases accordingly with the image and blur kernel size.

Table 1 Quantitative comparisons of state-of-the-art methods in terms of mean PSNR, mean SSIM, mean Error Ratio, and the Success Percent with \(r \le 5\) over the entire dataset [24]

Fig. 6 Qualitative comparison on deblurred image regions (red rectangle) from the dataset [24]. There are three blurry images on the left side, and each blurry image is tested on 8 methods. From top to bottom and from left to right, these methods are Cho and Lee [10] (20.72dB, 21.30dB, 22.67dB), Sun et al. [24] (25.66dB, 28.46dB, 30.15dB), Michaeli and Irani [28] (31.61dB, 30.26dB, 30.42dB), Lai et al. [27] (32.51dB, 28.99dB, 32.79dB), Li et al. [1] (22.84dB, 31.58dB, 31.78dB), Wen et al. [36] (25.96dB, 32.47dB, 31.61dB), Chen et al. [9] (27.82dB, 31.13dB, 31.50dB), and our method (35.01dB, 34.26dB, 35.16dB), respectively

Fig. 7 Visual comparison of the results from different methods over a challenging people image in dataset [42]

Fig. 8 Visual comparison of the results from different methods over a challenging saturated image in dataset [42]

Table 2 Quantitative comparisons of state-of-the-art methods in terms of PSNR over the dataset [42]

4.1 Sun et al.’s dataset

We first evaluate the proposed method on the dataset of [24], which contains 640 blurry images generated from 80 natural images and 8 blur kernels [38]. For fair comparisons, we use the provided codes of methods [1, 9, 36] to estimate blur kernels and the same non-blind deblurring method [41] to obtain the deblurring results; the results of methods [10, 24, 27, 28] are taken from [24]. We measure the quantitative performance of all methods with the Error Ratio defined by Levin et al. [38]:

$$\begin{aligned} r=\frac{||I-I_{{\hat{k}}}||_{2}^{2}}{||I-I_{k}||_{2}^{2}}, \end{aligned}$$

where \(I\) denotes the ground-truth clean image, and \(I_{k}\) and \(I_{{\hat{k}}}\) are the restored images generated with the ground-truth blur kernel \(k\) and the estimated kernel, respectively. A smaller \(r\) indicates a better result. Figure 5 reports the Success Percent of all methods at different Error Ratios, where the Success Percent is the fraction of images that can be restored within a given \(r\). From this figure, one can see that the proposed method consistently outperforms these state-of-the-art methods. We further report the mean PSNR, mean SSIM, mean Error Ratio, and the Success Percent with \(r \le 5\) in Table 1. The proposed method is superior to all competing methods in all measures. Figure 6 shows three qualitative examples of challenging cases. Compared with the other algorithms, the proposed method achieves more robust results with fewer artifacts.
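
For reference, the Error Ratio above can be computed directly from the three images involved (a trivial helper; the variable names are ours):

```python
import numpy as np

def error_ratio(I_gt, I_est_kernel, I_true_kernel):
    """Error Ratio r of Levin et al. [38]: restoration error with the estimated kernel
    divided by the error with the ground-truth kernel (smaller is better)."""
    return np.sum((I_gt - I_est_kernel) ** 2) / np.sum((I_gt - I_true_kernel) ** 2)
```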

4.2 Lai et al.’s dataset

The proposed method is then tested on a challenging dataset [42], which consists of 100 blurry images generated from 25 high-quality clean images (covering 5 categories: Man-made, Natural, People, Saturated, and Text images) and 4 large-scale blur kernels. We compare the proposed method with seven other algorithms [1, 9, 10, 18, 20, 34, 36]. For a fair comparison, the same blur kernel sizes and the same non-blind deblurring method [39] are adopted for all methods. Table 2 shows the comparison in terms of average PSNR. It can be seen that our method outperforms the other methods in all categories. As shown in Figs. 7 and 8, images restored by the proposed method contain sharper edges and are visually more satisfying than those of the other methods.

4.3 Pan et al.’s dataset

To further illustrate the superiority of the proposed method, we evaluate it on the text dataset of Pan et al. [20] and compare it with state-of-the-art methods [1, 9, 10, 18, 20, 34, 36]. This text dataset contains 120 blurry images generated from 15 ground-truth document images and 8 blur kernels from [38]. We adopt the same non-blind deblurring method [20] to generate the final clean images for a fair comparison. Table 3 shows the average PSNR and SSIM of each method. The proposed method performs favorably against all these methods. As shown in Fig. 9, our result has less residual blur and fewer ringing artifacts than the other methods.

4.4 Real-world blurry images

For real blurry images, the blur is usually more complicated, which makes them challenging for most deblurring methods. Finally, besides the synthetic datasets, we evaluate the performance of the proposed method on real blurry images gathered by [42] and compare it with state-of-the-art methods [9, 18, 34, 36]. Figures 10 and 11 show two examples, "postcard" and "boat2," respectively; the deblurring results of the proposed method generally have better visual quality. Note that the same non-blind deblurring method [39] is applied to restore the final results. As shown in Fig. 10, the methods [18, 34, 36] fail to generate useful results; in contrast, the method of [9] produces a generally satisfactory result, and ours is even better. Together with Fig. 11, this shows that the proposed method tends to recover clearer and more robust results with better visual quality.

5 Analysis and discussion

5.1 Effectiveness of the proposed model

In this section, we conduct an ablation study on the datasets [20, 42] to show the effectiveness of the proposed model (20). Considering the different data fidelity terms (4) and (5), and sparse regularizations such as \(l_{0}\), \(l_{1}\), \(l_{p}\), \(l_{e}\), and \(l_{{\mathcal {P}}}\) on image gradients, we construct a total of 10 combinations as listed in Table 4 and show quantitative comparisons in Table 5. We use the same parameters for the different models for fair comparisons, i.e., \(\omega _{1}=4e-3\), \(\omega _{2}=4e-4\), \(\alpha =4e-3\), \(\beta =2\), and \(p=0.6\).

From the viewpoint of the data fidelity term, (5) is not always superior to (4); for example, Model5 performs better than Model6. In most cases, however, (5) improves the effectiveness of the models. For the regularizations on image gradients, we can see that enforcing stronger sparsity constraints on image gradients achieves better performance. Overall, the proposed model achieves the best results among the different combinations. This success mainly stems from two aspects: constraining the second-order derivatives of the noise focuses more on the high frequencies erased in the blurry image, and constraining the \(l_{{\mathcal {P}}}\)-norm of image gradients helps to recover a sharper latent clean image, which favors the estimation of the blur kernel.

5.2 Convergence property analysis

Since the proposed model is highly non-convex due to the involvement of the \(l_{0}\)- and \(l_{p}\)-norms, one may wonder whether our overall optimization scheme converges. Unfortunately, a theoretical proof of convergence is intractable. We instead demonstrate the convergence of the proposed algorithm empirically on the dataset [20], which contains 120 synthetic blurry images. We record the average values of the energy function (21) and the kernel similarity [25] over iterations at the finest image scale, as shown in Fig. 12; the proposed algorithm converges within 30 iterations, which demonstrates the effectiveness of our optimization algorithm.

Table 3 Quantitative comparisons of state-of-the-art methods in terms of average PSNR and SSIM over the dataset [20]

Fig. 9 Visual comparison of the results from different methods over a text image in dataset [20]

Fig. 10 Visual comparison of the results from different methods over the "postcard" from [42]

Fig. 11 Visual comparison of the results from different methods over the "boat2" from [42]

Fig. 12 Convergence property analysis of the proposed algorithm

5.3 Relations with other deblurring methods

As discussed in Sect. 3.1, one can obtain the solution of the \(l_{e}\)-minimization problem [9] through the proposed Algorithm 1 by setting \(p=1\). Therefore, our method with \(p=1\) reduces to the deblurring method of Chen et al. [9]. Previous methods [15,16,17,18] usually assume \(l_{p}\) sparsity on image gradients, and a natural question is how to set the \(p\) value during the iterations. For this problem, the methods [16, 17] used a fixed \(p\) value throughout the process, and [15] empirically set a series of decreasing \(p\) values. Moreover, the method [18] automatically learned \(p\) values via an iteration-wise learning method. Although our method achieves competitive results on the commonly-used datasets [20, 24, 42] and real blurry images, we can only set a fixed \(p\) value by experience rather than theoretical guidance.

5.4 Analysis of some key parameters

The proposed method has four key parameters in addition to \(p\), i.e., \(\omega _1\), \(\omega _2\), \(\alpha \), and \(\beta \). To analyze the effect of these parameters, we carry out experiments on the dataset [20] by varying one of them while keeping the others fixed. Average PSNR is adopted as the evaluation metric. As shown in Fig. 13, the proposed method is not very sensitive to changes in these parameters within reasonable ranges, and we empirically select the values that give better performance as the default parameters.

Fig. 13 Analysis of key parameters in the proposed method

5.5 Limitations of the proposed method

As mentioned earlier, the main practical limitation of our method lies in the selection of the parameter \(p\). In the experiments of Sect. 4, \(p\) ranged from 0.1 to 1 with a step of 0.1, and we chose the best \(p\) value based on the experimental results. For example, Fig. 14 shows the performance of different \(p\) values on the datasets [20, 24, 42], for which we choose \(p\) as 0.1, 0.6, and 0.6, respectively. In fact, there is a relationship between the choice of \(p\) and the semantic content of the blurred image. In most deblurring methods [12,13,14], the gradients of natural clean images are considered to obey a heavy-tailed distribution (commonly modeled by a hyper-Laplacian prior \( P(\nabla ^{1}I)\propto e^{-\alpha |\nabla ^{1}I|^p} \) with \( 0.5\le p \le 0.8\)), which is used as a regularization in the restoration process to help recover clean and accurate images. Therefore, there should exist a \(p\) that best recovers a given blurry image, and our future work will focus on how to derive \(p\) from the blurred image itself. According to the abundant deblurring results in this work, we believe that setting \(p\) to 0.6 achieves satisfactory results in most cases.

Fig. 14 The performance of different \(p\) values on datasets [20, 24, 42]

6 Conclusion

In this paper, we have proposed an improved sparse regularization, namely the \(l_{{\mathcal {P}}}\)-norm, and a flexible model for blind image deblurring. The \(l_{{\mathcal {P}}}\)-norm is a combination of \(l_{0}\) and \(l_{p}\,(0<p \le 1)\), and our model can be seen as a generalization of the enhanced sparse model. To solve the \(l_{{\mathcal {P}}}\)-minimization problem, we present an effective method based on the generalized iterated shrinkage algorithm and further combine it with the half-quadratic splitting method to develop an optimization scheme for the proposed model. Experimental results show that the proposed method performs favorably against state-of-the-art methods on commonly-used synthetic datasets and real blurry cases. The proposed model has a limitation in selecting the \(p\) value, but in practical applications, setting \(p\) to 0.6 works well.

Table 4 Different regularization combinations
Table 5 Quantitative comparisons of different models in terms of average PSNR over the datasets [20, 42]