1 Introduction

The nonparametric blur-kernel estimation problem, using either a single image or multiple observations, has been intensively studied since the influential work of Fergus et al. [1]. In the current literature, it is also commonly believed [2] that multi-observation blur-kernel estimation is better posed than its single-observation counterpart. Possibly for this reason, single- and multi-observation blur-kernel estimation are generally modeled and solved independently, as the brief overview below shows. Note that this paper focuses only on spatially invariant blur-kernel estimation, which forms the basic step for the more challenging scenarios of spatially variant and noisy blur-kernel estimation [37].

1.1 Nonparametric Blur-Kernel Estimation Methods

1.1.1 Single-shot Case

Roughly speaking, single-shot blur-kernel estimation is currently addressed by two types of inference schemes, i.e., variational Bayesian (VB) and maximum-a-posteriori (MAP). VB is demonstrated, both empirically [1] and theoretically [8], to be superior to MAP in the scenario of single-shot blur-kernel estimation when both are regularized by common natural image statistics, e.g., total variation [9–11]. Readers can refer to [12–14] for detailed information on VB-based signal and image processing approaches. Recently, VB-based blur-kernel estimation methods were reformulated equivalently into their counterpart MAP versions in [15] (regularized by a coupled prior on the sharp image and the blur-kernel), using the convex-type variational representation of image priors. Based on this transformation, a theoretical analysis of the choice of image priors was made, showing that the Jeffreys prior is optimal to a certain degree in terms of blur-kernel estimation precision. However, the VB-based blur-kernel estimation methods in [15] are not rooted in a rigorous optimization perspective. On one hand, the kernel is modeled as a random variable but practically inferred as a deterministic parameter. On the other hand, the noise variance update is largely heuristic: a decreasing sequence of noise variances is pre-specified in [1, 8, 16, 17], while [15], although it achieves an adaptive update of the noise variance, uses a purely empirical updating formula. Compared to VB, the MAP principle is practiced more commonly, because (i) it is intuitive; (ii) it provides a simple problem formulation; (iii) it is flexible in the choice of regularization terms; and (iv) it typically leads to an efficient numerical implementation.
More importantly, the state-of-the-art MAP approaches achieve comparable or even higher blur-kernel estimation accuracy than the VB ones by performing various heuristic tricks for salient edge prediction, e.g., using highly nonconvex image priors [18–20], or enhancing edges by smoothing and shock filtering [21–23]. Nevertheless, it is generally difficult to conduct a profound analysis of the success of these empirical prior-based MAP methods. In addition, many parameters influence the results considerably [2] and have to be tuned frequently to generate good deblurring results.

According to the literature survey, unnatural image priors are clearly more popular and preferred, particularly in MAP-based single-shot blur-kernel estimation. We also note that these priors are either off-the-shelf ones, e.g., the Jeffreys prior [15, 16] and the normalized sparsity measure [18], or empirically designed ones, e.g., [20–23]. It is therefore natural to ask whether a parametric prior for the latent image can be learned adaptively along with the blur-kernel estimation, so as to achieve more accurate blur-kernel estimation while avoiding the annoying parameter adjustment.

1.1.2 Multi-shot Case

The ill-posed nature of blur-kernel estimation can be remedied to a great extent by capturing multiple observations of the same scene [2]. Possibly the first multi-shot blur-kernel estimation method, proposed by Rav-Acha and Peleg [24], uses two images blurred by motion in different directions and demonstrates superior deblurring quality over the single-image case. The benefits of multi-observation deblurring, both nonblind and blind, are also empirically analyzed in [25]. Owing to the better posedness of multi-shot blur-kernel estimation, plain \({\varvec{\fancyscript{l}}}_{\alpha }\)-norm-based natural image statistics, with \(\alpha \) around 1, are frequently utilized in existing blur-kernel estimation approaches. For example, [2, 26] exploit the \({\varvec{\fancyscript{l}}}_{{1}}\)-norm of image gradients; the \({\varvec{\fancyscript{l}}}_{{1.2}}\)-norm and \({\varvec{\fancyscript{l}}}_{{0.8}}\)-norm are utilized in [24] and [27], respectively; and [25] harnesses the \({\varvec{\fancyscript{l}}}_{{1}}\)-norm of the tight-frame coefficients of images. The reason MAP-based multi-shot kernel estimation methods can dispense with the unnatural image priors required in the single-shot case is that they place more emphasis on exploring the intrinsic properties of the blur-kernels.

A well-known such property is the commutative property of convolution, i.e., \({\mathbf {k}}_{2} *{\mathbf {y}}_{1} -{\mathbf {k}}_{1} *{\mathbf {y}}_{2} =0\), originally proposed in [28], where \({\mathbf {y}}_{1} ,\,{\mathbf {y}}_{2}\) are two blurred versions of a clear image \({\mathbf {u}}\), and \({\mathbf {k}}_{1} ,\, {\mathbf {k}}_{2}\) are the blur-kernels corresponding to \({\mathbf {y}}_{1} ,\, {\mathbf {y}}_{2}\). In [29], Sroubek and Flusser reformulate the idea of [28] into a multi-shot regularization term on the blur-kernels. However, using the commutative property entails at least two modeling flaws. On one hand, it builds on a noiseless observation model; on the other hand, there is a solution ambiguity in that, for an optimal solution pair \(\{{\mathbf {k}}_{1} ,{\mathbf {k}}_{2} \}\), the whole family of solutions \(\{{\mathbf {k}}_{1} *{\mathbf {h}},\, {\mathbf {k}}_{2} *{\mathbf {h}}\}\) also satisfies the commutative property for any kernel \({\mathbf {h}}\). To overcome these inherent problems, several works extend the idea of [28]. For example, in [2] the multi-shot regularization term is improved for robustness against noise; in [27] a sparse prior and a continuity prior are designed for the kernel to mitigate the ambiguity issue; and in [26], the estimated blur-kernels \({\mathbf {k}}_{1} *{\mathbf {h}}\) and \({\mathbf {k}}_{2} *{\mathbf {h}}\) are viewed as blurred images and deblurred using an \({\varvec{\fancyscript{l}}}_{\alpha }\)-norm-based prior on the kernels \(\{{\mathbf {k}}_{1} ,{\mathbf {k}}_{2} \}\). One more limitation of the commutative property is that it applies only to pairs of blurred images; as the number of observations grows, the number of pairwise constraints expands combinatorially, leading to a higher computational burden. In addition, the commutative property of convolution does not apply to a blurred/noisy image pair [30, 31].
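The commutative-property identity and its convolution ambiguity are easy to verify numerically. Below is a minimal 1-D sketch; all signals and kernels are made-up toy data (real images are 2-D):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(64)                    # latent sharp signal (1-D toy stand-in)
k1 = np.array([0.25, 0.5, 0.25])               # two different blur-kernels
k2 = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

# Noiseless observations y_i = k_i * u
y1 = np.convolve(k1, u)
y2 = np.convolve(k2, u)

# Commutativity/associativity of convolution gives k2 * y1 - k1 * y2 = 0
lhs = np.convolve(k2, y1)
rhs = np.convolve(k1, y2)
print(np.max(np.abs(lhs - rhs)))               # numerically zero without noise

# Solution ambiguity: {k1 * h, k2 * h} satisfies the same identity for any h
h = np.array([0.5, 0.5])
lhs_h = np.convolve(np.convolve(k2, h), y1)
rhs_h = np.convolve(np.convolve(k1, h), y2)
print(np.max(np.abs(lhs_h - rhs_h)))           # also numerically zero
```

Adding noise to y1 and y2 breaks the identity, which is exactly the first modeling flaw noted above.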

Recently, another multi-observation blur-kernel estimation method, free of the commutative property, was proposed by Zhang et al. [29]. It roughly extends their results on the modeling equivalence [15] between the VB- and MAP-based single-shot blur-kernel estimation methods to the multi-shot case, yet it still imposes the Jeffreys prior on the sharp image. Although the Jeffreys prior is shown to be optimal to a certain degree in the single-shot case, no theoretical analysis is made in [29] of its optimality for the multi-shot scenario. Note also that the method of Zhang et al. [29] cannot be reduced to the single-shot blur-kernel estimation of [15] without carefully modifying its updating equation for the noise variance. A further observation on [29] is that it requires the blurred images to be smaller than the latent sharp image, which is obviously irrational. As for the practical implementation, the minimized objective function in [29] is derived from the empirical Bayes analysis scheme, or type-II maximum likelihood (ML), together with the majorization-minimization (MM) technique [32]; MM is also used extensively in [29] to derive a tractable optimization scheme (see Sect. 6 of [29] for the detailed algorithmic derivation). Interestingly, the final algorithm of [29] is just a degenerate version of the proposed Algorithm 1 in Sect. 4 of the present paper, despite the differences between the two approaches in modeling motivation, problem formulation, and algorithmic derivation. What is more important, the proposed method is demonstrated to achieve noticeably better deblurring performance than [29].

1.2 Applications

One natural application of single/multi-shot blur-kernel estimation is camera-shake deblurring [1], i.e., restoring a clear image from the acquired single or multiple motion-blurred photographs. Another application of the proposed method considered in this paper is nonparametric blind super-resolution. To help readers better understand the importance of blur-kernel estimation for super-resolution, we give a brief background on nonparametric blind super-resolution. Since the seminal works of Freeman and Pasztor [33] and Baker and Kanade [34], single-image SR has attracted considerable attention. A careful inspection of the literature in this arena finds that existing approaches, either learning- or reconstruction-based, concentrate on developing advanced image priors while mostly ignoring the need to estimate the blur-kernel. A recent comprehensive survey on SR [35] (covering work up to 2012) testifies that SR methods generally resort to the assumption of a known kernel, in both the single-image and multi-frame SR regimes. More specifically, in the context of multi-frame SR, most methods assume a square Gaussian kernel with a suitable standard deviation \(\delta \), e.g., \(3\times 3\) with \(\delta =0.4\) [36], \(4\times 4\) with \(\delta =1\) [37], \(5\times 5\) with \(\delta =1\) [38], or simple pixel averaging, e.g., \(3\times 3\) [39, 40]. In [41], the authors have considered not only a \(3\times 3\) Gaussian blur but also a delta blur, i.e., no blur at all in the modeling of low-res imaging. In single-image nonblind SR, commonly used options include the bicubic low-pass filter (implemented by the Matlab default function imresize) [42–49], a \(7\times 7\) Gaussian kernel with \(\delta =1.6\) [49], a \(3\times 3\) Gaussian kernel with \(\delta =0.55\) [50], and pixel averaging [51]. Interestingly, a critical study on single-image SR was recently presented in [52].
The authors conduct an analysis of the effects of two components in single-image SR, i.e., the choice of the image prior and the availability of an accurate blur model. Their conclusion, based on both empirical and theoretical analysis, is that the influence of an accurate blur-kernel is significantly larger than that of an advanced image prior. Furthermore, [52] validates that “an accurate reconstruction constraint combined with a simple gradient regularization achieves SR results almost as good as those of state-of-the-art algorithms with sophisticated image priors.” However, very few works have addressed the estimation of an accurate nonparametric blur model in the context of image SR reconstruction, with either a single image or multiple frames. As a matter of fact, existing blind SR methods usually assume a parametric model, and the Gaussian kernel is a common choice, e.g., [53, 54]. The work reported in [55] is an exception, as it does present a nonparametric kernel estimation approach for blind SR and blind deblurring in a unified framework. However, it restricts its treatment to single-mode blur-kernels. In addition, [55] is not rooted in a rigorous optimization principle, but rather builds on the prediction of sharp edges as an important clue to blur-kernel estimation. Another noteworthy and highly related work is the one recently proposed by Michaeli and Irani [56], who exploit an inherent recurrence property of small natural image patches across different scales and propose a \(\text {MAP}_{\mathbf{k}}\)-based estimation method for recovering the blur-kernel nonparametrically [17]. The effectiveness of [56] builds largely on the nearest neighbors found for the query low-res patches in the input blurred, low-res image. We should note that in both [55, 56], the kernel estimation includes an \({\varvec{\fancyscript{l}}}_{{2}}\)-norm-based kernel gradient regularization for promoting smoothness.
In this paper, we harness the state-of-the-art nonblind dictionary-based fast single-image SR methods [44–46], which assume the simplest bicubic interpolation kernel in the observation model. With the super-resolved but blurred high-res image generated by those methods, the proposed approach (single-shot blur-kernel estimation) is found to be able to estimate a highly reasonable blur-kernel for blind super-resolution, showing the effectiveness and robustness of the proposed method.

1.3 Our Contributions

The core contribution of this paper is the proposal of a unified minimization approach to single/multi-observation blur-kernel estimation. Specifically, our contributions are as follows:

Firstly, the unified approach is based on a directed graphical model, in which the latent sharp image and the inverse noise variance associated with each observation are viewed as random variables, while each blur-kernel, unlike in existing VB methods, is modeled as a deterministic parameter. Utilizing a universal, three-level hierarchical prior on the latent sharp image and a Gamma hyper-prior on each inverse noise variance, the single- and multi-shot blur-kernel estimations are uniformly formulated as a negative log-marginal-likelihood minimization problem, regularized by an \({\varvec{\fancyscript{l}}}_{{0.5}}\)-norm of the blur-kernels.

Secondly, by borrowing ideas from expectation-maximization, majorization-minimization, mean-field approximation, and iteratively reweighted least squares, all the unknowns of interest, i.e., the latent sharp image, each blur-kernel and inverse noise variance, and other related parameters, are estimated automatically. Compared with most existing single/multi-shot blur-kernel estimation methods, the proposed approach is not only more flexible in processing multiple observations under distinct imaging scenarios, owing to its independence of the commutative property of convolution, but also more adaptive in image prior modeling, with far fewer implementational heuristics.

Thirdly, the proposed blur-kernel estimation approach is applied to single-image nonparametric blind super-resolution (SR). It utilizes the state-of-the-art nonblind dictionary-based SR methods, which often produce a super-resolved but blurred high-res image. With the blur-kernel estimated from the blurred, super-resolved image, the final super-resolved image can be recovered by a relatively simple nonblind SR method that uses a natural image prior. Compared against Michaeli and Irani’s [56] nonparametric blind SR method, the proposed approach is demonstrated to achieve comparable or even more accurate blur-kernel estimation and, hence, better super-resolved high-res images. Experimental results on Levin et al.’s [8] benchmark real-world motion-blurred images as well as simulated multiple blurred observations also validate the effectiveness and superiority of the proposed approach for camera-shake deblurring.

The rest of the paper is organized as follows. Section 2 sets up the problem of single/multi-shot blur-kernel estimation. In Sect. 3, Bayesian models of the latent sharp image, each blurred image, and the inverse noise variance are discussed and depicted using a directed graphical model, based on which a unified minimizing objective function is constructed for nonparametric single/multi-shot blur-kernel estimation. Section 4 presents a numerical scheme for estimating the posterior distributions of the hidden random variables as well as the involved deterministic model parameters. Experimental results on nonparametric blind SR and camera-shake deblurring are provided in Sect. 5, with comparisons against state-of-the-art approaches. The paper is finally concluded in Sect. 6.

2 Problem Setup

The single/multi-shot observation process is described as a uniform convolution blur model [2, 24–29]

$$\begin{aligned} {\mathbf {y}}_{i} ={\mathbf {k}}_{i} *{\mathbf {u}}+{\mathbf {n}}_{i} ,\quad i\in \varvec{\mho } , \end{aligned}$$
(1)

where I is the number of captured corrupted observations, indexed by \(\varvec{\mho } =\{1,2,\ldots ,I\}\); \({\mathbf {y}}_{i}\) is the \(i\text {th}\) observation; \({\mathbf {k}}_{i}\) represents the \(i\text {th}\) blur-kernel; and \({\mathbf {n}}_{i}\) is assumed to be Gaussian noise. Note that a registration step is required to align the input observations prior to running the proposed blur-kernel estimation algorithm, just as in the state-of-the-art multi-shot methods, e.g., [2, 29]. The core task in this paper is hence to estimate the blur-kernels \(\{{\mathbf {k}}_{i} \}_{i\in \varvec{\mho }}\) from the provided blurry images \(\{{\mathbf {y}}_{i} \}_{i\in \varvec{\mho }}\).
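For concreteness, the observation process (1) can be simulated as follows. This sketch uses circular (FFT-based) convolution and toy data; the boundary handling is a convenience assumption, not a detail specified by the model:

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.random((32, 32))                       # toy latent sharp image

def blur(u, k, noise_std, rng):
    """y_i = k_i * u + n_i with circular boundaries, computed via the FFT."""
    K = np.fft.fft2(k, s=u.shape)              # kernel zero-padded to image size
    y = np.real(np.fft.ifft2(K * np.fft.fft2(u)))
    return y + noise_std * rng.standard_normal(u.shape)

# Two hypothetical kernels: horizontal and vertical 3-tap averaging blurs
k1 = np.ones((1, 3)) / 3.0
k2 = np.ones((3, 1)) / 3.0

ys = [blur(u, k, noise_std=0.01, rng=rng) for k in (k1, k2)]   # I = 2 observations
print(len(ys), ys[0].shape)
```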

Fig. 1

Directed graphical model describing the dependencies among the stochastic variables and the deterministic model parameters. The observed stochastic variables are represented by double-circled nodes, the hidden random variables by circular nodes, and the deterministic parameters by square nodes

Similar to many existing blur-kernel estimation methods [1, 8, 15–17, 20, 21, 29], we address blur-kernel estimation in the derivative domain of images for better performance. Specifically, Eq. (1) is formulated in the derivative domain using the first-order horizontal and vertical derivative filters \(\{\varvec{\nabla }^{d}\}_{d\in \varvec{\Lambda }}\), given as

$$\begin{aligned} {\mathbf {y}}_{i\vert d} ={\mathbf {k}}_{i} *{\mathbf {u}}_{d} +{\mathbf {n}}_{i\vert d} ,\quad i\in \varvec{\mho } ,\ d\in \varvec{\Lambda } , \end{aligned}$$
(2)

where \({\mathbf {y}}_{i\vert d} =\varvec{\nabla }^{d}*{\mathbf {y}}_{i} ,\,{\mathbf {u}}_{d} =\varvec{\nabla }^{d}*{\mathbf {u}},\,{\mathbf {n}}_{i\vert d} =\varvec{\nabla }^{d}*{\mathbf {n}}_{i}\). For the convenience of presentation, Eq. (2) can be also rewritten in a matrix-vector form as

$$\begin{aligned} \mathbf {y}_{i\vert d} =\mathbf {K}_{i} \mathbf {u}_{d} +\mathbf {n}_{i\vert d} ,\quad i\in \varvec{\mho } ,\ d\in \varvec{\Lambda } , \end{aligned}$$
(3)

where \(\{\mathbf {y}_{i\vert d} ,\,\mathbf {u}_{d} ,\,\mathbf {n}_{i\vert d} \}_{i\in \varvec{\mho } ,\,d\in \varvec{\Lambda }} \in \mathbf {R}^{M\times 1}\) are the vectorized representations of \(\{{\mathbf {y}}_{i\vert d} ,\,{\mathbf {u}}_{d} ,\, {\mathbf {n}}_{i\vert d} \}_{i\in \varvec{\mho } ,\,d\in \varvec{\Lambda }}\), \(M\) is the pixel number in the sharp image \({\mathbf {u}}\), and \(\mathbf {K}_{i} \in \mathbf {R}_{\mathbf {0}}^{\varvec{+}M\times M}\) denotes the convolution matrix corresponding to the vectorized blur-kernel \(\mathbf {k}_{i}\). For simplicity, we still assume that \(\mathbf {n}_{i\vert d}\) is zero-mean Gaussian noise, i.e., \(\mathbf {n}_{i\vert d} \sim \varvec{\mathcal {N}}(0,\tilde{{\varsigma }}_{i} \mathbf {I}_{M} )\), where \(\tilde{{\varsigma }}_{i}\) is the noise variance; its inverse \(\varsigma _{i} =\tilde{{\varsigma }}_{i}^{-1}\) will serve as the precision hyper-parameter. The problem now turns to estimating \(\{\mathbf {u}_{d} \}_{d\in {\varvec{\Lambda }}},\, \{\mathbf {k}_{i} \}_{i\in \varvec{\mho }}\) and \(\{\tilde{{\varsigma }}_{i}\}_{i\in \varvec{\mho }}\) from the images \(\{\mathbf {y}_{i\vert d} \}_{i\in \varvec{\mho },\,d\in \varvec{\Lambda }}\).
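The move from Eq. (1) to the derivative-domain model (2) rests on convolution commuting with the derivative filters, i.e., \(\varvec{\nabla }^{d}*({\mathbf {k}}*{\mathbf {u}})={\mathbf {k}}*(\varvec{\nabla }^{d}*{\mathbf {u}})\). A quick numerical check on toy data, using circular convolution via the FFT as a convenience:

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.random((32, 32))                       # toy sharp image
k = np.ones((3, 3)) / 9.0                      # toy blur-kernel

def cconv(a, b, shape):
    """Circular 2-D convolution via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(a, s=shape) * np.fft.fft2(b, s=shape)))

# First-order horizontal and vertical derivative filters
grad_h = np.array([[1.0, -1.0]])
grad_v = grad_h.T

for g in (grad_h, grad_v):
    lhs = cconv(g, cconv(k, u, u.shape), u.shape)   # gradient of the blurred image
    rhs = cconv(k, cconv(g, u, u.shape), u.shape)   # blur of the gradient image
    assert np.allclose(lhs, rhs)
print("derivative-domain identity verified")
```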

3 Single/Multi-shot Blur-Kernel Estimation: A Unified Formulation

In this section, single/multi-shot blur-kernel estimation is formulated in a unified framework that incorporates prior assumptions on both the latent sharp images and the blur-kernels via a composite VB and MAP modeling strategy. Simply put, the inverse problem (3) is addressed using a directed graphical model, where the observations \(\{\mathbf {y}_{i\vert d} \}_{i\in \varvec{\mho },\,d\in \varvec{\Lambda }} \) are treated as observed random variables, the sharp images \(\left\{ {\mathbf {u}_{d}} \right\} _{d\in \varvec{\Lambda }}\) and the inverse noise variances \(\{\varsigma _{i} \}_{i\in \varvec{\mho }}\) are treated as hidden random variables assigned proper probability densities, and the blur-kernels \(\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }}\) are modeled as deterministic parameters subject to certain regularization constraints. To be specific, \(\mathbf {y}_{i\vert d} \) obeys a conditional density \(\hbox {p}(\mathbf {y}_{i\vert d} \vert \mathbf {u}_{d} ;\varsigma _{i} ,\mathbf {k}_{i} )\), where a hyper-prior density \(\hbox {p}(\varsigma _{i} ;a,b)\) is imposed on the hyper-parameter \(\varsigma _{i} \). The sharp image \(\mathbf {u}_{d} \) is modeled by a general sparseness-promoting density represented in a three-layer hierarchical integral form, i.e.,

$$\begin{aligned} \text {p}(\mathbf {u}_{d} ;\alpha _{d} ,\rho _{d} ,\theta _{d})= & {} \int \text {p}(\mathbf {u}_{d} \vert {\varvec{\gamma }}_{d}) \text {p} ({\varvec{\gamma }}_{d} \vert {\varvec{\beta }}_{d} ; \alpha _{d}) \text {p}({\varvec{\beta }}_{d} ;\rho _{d} ,\theta _{d}) \nonumber \\&d{\varvec{\gamma }}_{d} d{\varvec{\beta }}_{d} , \end{aligned}$$
(4)

where \(\varvec{\gamma }_{d}\) and \(\varvec{\beta }_{d} \) are stochastic hyper-parameters assigned, respectively, the hyper-prior densities \(\text {p}(\varvec{\gamma }_{d} \vert \varvec{\beta }_{d} ;\alpha _{d} )\) and \(\text {p}(\varvec{\beta }_{d} ;\rho _{d} ,\theta _{d} )\). As for \(\alpha _{d} ,\rho _{d} ,\theta _{d}\), like the blur-kernels \(\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }}\), they are modeled as deterministic parameters. The directed graphical model in Fig. 1 depicts the dependencies among the stochastic variables and the deterministic model parameters.

3.1 Modeling of Observations and Sharp Images

In the following, Bayesian models of the multiple observations and the latent sharp images are elaborated in more detail. As \(\mathbf {n}_{i\vert d} \) is assumed to be zero-mean Gaussian noise, the likelihood function \(\text {p}(\mathbf {y}_{i\vert d} \vert \mathbf {u}_{d} ,\varsigma _{i} ;\mathbf {k}_{i} )\) can be given as

$$\begin{aligned} \text {p}\left( \mathbf {y}_{i\vert d} \vert \mathbf {u}_{d} ,\varsigma _{i} ;\mathbf {k}_{i} \right) =\varvec{\mathcal {N}}\left( \mathbf {y}_{i\vert d} \vert \mathbf {K}_{i} \mathbf {u}_{d} ,\varsigma _{i}^{-1} \mathbf {I}_{M}\right) , \end{aligned}$$
(5)

where \(\varsigma _{i} =\tilde{{\varsigma }}_{i}^{-1} ,\) and a Gamma hyper-prior is imposed on \(\varsigma _{i} \) to simplify posterior inference because it is conjugate to the Gaussian probability density function (PDF), i.e.,

$$\begin{aligned} \text {p}(\varsigma _{i} ;a,b)={{\varvec{\fancyscript{Ga}}}}(\varsigma _{i} \vert a,b)=\frac{b^{a}}{\Gamma \left( a \right) }\varsigma _{i}^{a-1}\exp \left( {-b\varsigma _{i} } \right) , \end{aligned}$$
(6)

where a is the shape parameter, b is the rate parameter, and \(\Gamma (\cdot )\) denotes the gamma function. In this paper, a and b are not automatically estimated but provided empirically and uniformly as \(a=5\times 10^{-5}M\) and \(b=3\times 10^{-3}M\) for both single- and multi-shot blur-kernel estimation. In this case, \(\text {p}(\varsigma _{i} ;a,b)\) is simply denoted as \(\text {p}(\varsigma _{i} )\triangleq \text {p}(\varsigma _{i} ;a,b)\). In the experimental part, the above settings of a and b are found to work very well for all the blurred images in either camera-shake deblurring or nonparametric blind super-resolution. We should note that a similar strategy for choosing a and b by trial-and-error is also exploited in one of our previous works on single-image blind motion deblurring [57], which is, however, based on a simple nonstationary Gaussian image prior and therefore differs from the general sparseness-promoting prior in (4).
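The conjugacy argument can be made concrete: with the Gaussian likelihood (5) and the Gamma hyper-prior (6), the posterior of \(\varsigma _{i}\) given the residual \(\mathbf {y}_{i\vert d} -\mathbf {K}_{i} \mathbf {u}_{d}\) is again a Gamma PDF. A small sketch with synthetic residuals (the true precision value below is made up for illustration):

```python
import numpy as np

M = 1000                                        # pixel count
rng = np.random.default_rng(3)
true_precision = 4.0                            # i.e., noise variance 0.25 (toy value)
residual = rng.normal(0.0, true_precision ** -0.5, M)   # stand-in for y - K u

# Hyper-prior parameters as set in the text: a = 5e-5 * M, b = 3e-3 * M
a, b = 5e-5 * M, 3e-3 * M

# Gamma-Gaussian conjugacy: Ga(a, b) prior -> Ga(a + M/2, b + ||r||^2 / 2) posterior
a_post = a + M / 2.0
b_post = b + 0.5 * np.sum(residual ** 2)
print(a_post / b_post)                          # posterior mean of the precision, near 4
```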

According to the literature review in the Introduction, image priors are indispensable to blind image deconvolution. The three-layer hierarchical prior (4) is of particular interest to us because, on one hand, a sparse image prior for blur-kernel estimation should be as general in form as possible, so as to adapt the proposed approach to diverse degradation scenarios; on the other hand, adaptive prior learning becomes computationally tractable since the image prior (4) is represented in hierarchical form and determined by only three deterministic parameters. With automatic parameter estimation for every blur-kernel estimation problem, this not only relieves us from the parameter adjustment that is unavoidable particularly in the state-of-the-art MAP methods, but also greatly improves the chances of achieving more accurate blur-kernel estimation.

In the first layer of the Bayesian hierarchical modeling, \(p(\mathbf {u}_{d} \vert {\varvec{\gamma }}_{d})\) is defined as a product of spatially variant Gaussian PDFs, formulated as

$$\begin{aligned} \text {p}(\mathbf {u}_{d} \vert {\varvec{\gamma }}_{d} )=\prod \limits _{l\in \varvec{\Omega }} {\text {p}\left( \mathbf {u}_{d}^{l} \vert {\varvec{\gamma }}_{d}^{l} \right) } =\prod \limits _{l\in \varvec{\Omega }} {\varvec{\mathcal {N}}\left( \mathbf {u}_{d}^{l} \vert 0,{\varvec{\gamma }}_{d}^{l} \right) ,} \end{aligned}$$
(7)

where \(\varvec{\gamma }_{d}^{l} =({\tilde{\varvec{\gamma }}}_{d}^{l} )^{-1}\). In the second layer, \(\hbox {p}(\varvec{\gamma }_{d} \vert \varvec{\beta }_{d} ;\alpha _{d} )\) is selected to be a product of Gamma PDFs with a common shape parameter \(\alpha _{d}\) for all components of \(\varvec{\gamma }_{d} \) and an individual rate hyper-parameter \(\varvec{\beta }_{d}^{l} \), given as

$$\begin{aligned} \text {p}(\varvec{\gamma }_{d} \vert \varvec{\beta }_{d} ;\alpha _{d} )=\prod \limits _{l\in {\varvec{\Omega }} } {\text {p}\left( \varvec{\gamma }_{d}^{l} \vert \varvec{\beta }_{d}^{l} ;\alpha _{d} \right) } =\prod \limits _{l\in {\varvec{\Omega }}} {{\varvec{\fancyscript{Ga}}}\left( \varvec{\gamma }_{d}^{l} \vert \alpha _{d} ,\varvec{\beta }_{d}^{l} \right) .}\nonumber \\ \end{aligned}$$
(8)

As a matter of fact, a two-layer hierarchical model with common shape and rate parameters is more frequently used in the literature. To achieve more universal and adaptive sparse image modeling, a three-layer model is preferred in this paper despite the possible cost of estimating more parameters. In the third layer, a hyper-prior \(\hbox {p}(\varvec{\beta }_{d} ;\rho _{d} ,\theta _{d})\) is imposed on the rate parameter \(\varvec{\beta }_{d} \) and also selected as a product of Gamma PDFs

$$\begin{aligned} \text {p}(\varvec{\beta }_{d} ;\rho _{d} ,\theta _{d} )=\prod \limits _{l\in {\varvec{\Omega }} } {\text {p}\left( \varvec{\beta }_{d}^{l} ;\rho _{d} ,\theta _{d} \right) } =\prod \limits _{l\in {\varvec{\Omega }} } {{\varvec{\fancyscript{Ga}}}\left( \varvec{\beta }_{d}^{l} \vert \rho _{d} ,\theta _{d} \right) },\nonumber \\ \end{aligned}$$
(9)

where \(\rho _{d} ,\theta _{d} \) are deterministic model parameters. With (7), (8), and (9), it is not hard to derive that \(\hbox {p}(\mathbf {u}_{d} ;\alpha _{d} ,\rho _{d} ,\theta _{d})\) reduces to the Jeffreys prior when \(\alpha _{d} ,\rho _{d} ,\theta _{d} \) are all set to 0.
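Sampling through the three layers (9) → (8) → (7) illustrates the sparseness-promoting behavior: the resulting marginal of \(\mathbf {u}_{d}^{l}\) has heavier-than-Gaussian tails. The parameter values below are illustrative choices, not those learned by the algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
alpha, rho, theta = 1.0, 6.0, 5.0              # illustrative deterministic parameters

# Layer 3: beta ~ Ga(rho, theta); numpy parameterizes Gamma by (shape, scale)
beta = rng.gamma(rho, 1.0 / theta, n)
# Layer 2: gamma ~ Ga(alpha, beta), an individual rate per component
gamma = rng.gamma(alpha, 1.0 / beta, n)
# Layer 1: u | gamma ~ N(0, gamma), a spatially variant Gaussian
u = rng.normal(0.0, np.sqrt(gamma), n)

# Excess kurtosis > 0 means heavier tails than a Gaussian (whose value is 0)
kurt = np.mean(u ** 4) / np.mean(u ** 2) ** 2 - 3.0
print(round(kurt, 2))
```

Driving \(\alpha _{d} ,\rho _{d} ,\theta _{d}\) toward 0 pushes the marginal toward the noninformative Jeffreys prior, as noted above.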

We should note that the three-layer hierarchical prior was used previously in the fields of compressed sensing [58] and variable selection [59]. However, the three-layer models in [58, 59] are, respectively, Bayesian adaptive versions of the Laplacian distribution and the LASSO estimator. Their optimization schemes are also different from the proposed method in this paper. Specifically, in [58], the type-II maximum likelihood (ML) method has been used to estimate all the random hyper-parameters and deterministic model parameters through extending the relevance vector machine (RVM) in [60]; and [59] exploits the expectation-maximization (EM) method.

3.2 Minimizing Objective Function

Now, we come to the task of single/multi-shot blur-kernel estimation, which is formulated by taking advantage of both the theoretical robustness of VB-based approaches and the modeling flexibility of MAP-based approaches. The sole prior knowledge about a spatially invariant blur-kernel that we use is that it is typically very sparse when a proper kernel size is assumed. Therefore, a sparse regularization term is incorporated in our framework to promote the sparseness of blur-kernels. Specifically, with Eqs. (4)–(9), the proposed blur-kernel estimation method is formulated as an \({\varvec{\fancyscript{l}}}_{{0.5}}\)-norm-regularized minimizing objective function

$$\begin{aligned}&{{\textsf {\textit{Q}}}}(\{\alpha _{d} ,\rho _{d} ,\theta _{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} ) \nonumber \\&\quad = \frac{1}{2}\sum \limits _{i\in \varvec{\mho }} {\vert \vert \mathbf {k}_{i} \vert \vert _{0.5}^{0.5} } \mathbf {-}\log \left\{ {\prod \limits _{i\in \varvec{\mho }} {\prod \limits _{d\in \varvec{\Lambda }} {\text {p}(\mathbf {y}_{i\vert d} ;\alpha _{d} ,\rho _{d} ,\theta _{d} ,\mathbf {k}_{i} )} } } \right\} .\nonumber \\ \end{aligned}$$
(10)

Note that the \({\varvec{\fancyscript{l}}}_{{0.5}}\)-norm used in (10) is an empirical choice, found to perform better in most image blurring degradation scenarios than the \({\varvec{\fancyscript{l}}}_{1}\)-norm or the naive \({\varvec{\fancyscript{l}}}_{0}\)-norm. In fact, another property of blur-kernels is also very important, i.e., continuity of the kernel support [27]. In spite of that, the proposed method need not impose a continuity prior on the kernels; as a matter of fact, the quality of the latent sharp images determines the precision of the recovered kernels to a great degree [20], and the proposed approach is capable of producing more accurate edge images \(\left\{ {\mathbf {u}_{d} } \right\} _{d\in \varvec{\Lambda }}\) as core clues to kernel estimation, thanks to the adaptive learning of the image priors \(\{\text {p}(\mathbf {u}_{d} ;\alpha _{d} ,\rho _{d} ,\theta _{d} )\}_{d\in {\varvec{\Lambda }}}\). As for \(\text {p}(\mathbf {y}_{i\vert d} ;\alpha _{d} ,\rho _{d} ,\theta _{d} ,\mathbf {k}_{i} )\), we get it by marginalizing out all the hidden random variables, including \(\mathbf {u}_{d} ,{\varvec{\gamma }}_{d} ,{\varvec{\beta }}_{d} ,\varsigma _{i} \), given as

$$\begin{aligned}&\text {p}(\mathbf {y}_{i\vert d} ;\alpha _{d} ,\rho _{d} ,\theta _{d} ,\mathbf {k}_{i} )\nonumber \\&\quad =\int \text {p}(\mathbf {y}_{i\vert d} \vert \mathbf {u}_{d} ,\varsigma _{i} ;\mathbf {k}_{i}) \text {p}(\mathbf {u}_{d} ;\alpha _{d} ,\rho _{d} ,\theta _{d})\text {p}(\varsigma _{i} )\text {d}\mathbf {u}_{d} d\varsigma _{i} ,\nonumber \\ \end{aligned}$$
(11)

similar to the state-of-the-art VB blind deblurring methods [1, 8, 15–17]. However, compared with those approaches, our method provides a rigorous and feasible optimization framework for estimating the posterior distributions of all the hidden random variables \(\{\mathbf {u}_{d} ,{\varvec{\gamma }}_{d} ,{\varvec{\beta }}_{d} \}_{d\in \varvec{\Lambda }},\{\varsigma _{i} \}_{i\in \varvec{\mho }}\), as well as the deterministic model parameters, including the blur-kernels \(\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }}\) and those involved in the hyper-priors, i.e., \(\{\alpha _{d} ,\rho _{d} ,\theta _{d} \}_{d\in {\varvec{\Lambda }}}\).

With (11), single/multi-shot blur-kernel estimation is finally formulated as the following minimization problem:

$$\begin{aligned}&\mathop {\min }\limits _{\{\alpha _{d} ,\rho _{d} ,\theta _{d} \}_{d\in {\varvec{\Lambda }}} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} } \, {{\textsf {\textit{Q}}}}(\{\alpha _{d} ,\rho _{d} ,\theta _{d} \}_{d\in {\varvec{\Lambda }}} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }}) \nonumber \\&\quad \text {subject to }\alpha _{d} \ge 0,\, \rho _{d} \ge 0,\, \theta _{d} \ge 0,\, \mathbf {k}_{i}^{j} \ge 0, \nonumber \\&\quad \sum \limits _{j\in \mathbf{Z}} {\mathbf {k}_{i}^{j}} =1, \end{aligned}$$
(12)

where \(\mathbf {Z}=\{1,\, 2,\, \ldots ,P\}\), and P is the length of the vectorized kernel \(\mathbf {k}_{i}\). Note that \(\alpha _{d} ,\rho _{d} ,\theta _{d} \) are allowed to be 0 in (12) precisely to include the noninformative Jeffreys prior. Nevertheless, it is hard to solve the minimization problem (12) directly. In the following, a computationally tractable upper bound of \({{\textsf {\textit{Q}}}}(\{\alpha _{d} ,\rho _{d} ,\theta _{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} )\) is pursued by borrowing ideas from EM [12] and MM [32].
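The kernel-side ingredients of (10) and (12) are simple to state in code. The sketch below evaluates the \({\varvec{\fancyscript{l}}}_{{0.5}}\) penalty and enforces the nonnegativity and sum-to-one constraints by a naive clip-and-renormalize step; this is only an illustration of the constraint set, not the solver derived in Sect. 4:

```python
import numpy as np

def l_half(k):
    """||k||_{0.5}^{0.5} = sum_j |k_j|^{1/2}, the sparsity penalty in (10)."""
    return np.sum(np.sqrt(np.abs(k)))

def project_kernel(k):
    """Naive feasibility step for (12): k_j >= 0 and sum_j k_j = 1."""
    k = np.maximum(k, 0.0)
    s = k.sum()
    return k / s if s > 0 else np.full_like(k, 1.0 / k.size)

k = project_kernel(np.array([0.4, -0.1, 0.5, 0.2]))
print(k, l_half(k))

# The l_0.5 penalty prefers sparse kernels among those summing to one:
sparse, uniform = np.array([1.0, 0.0, 0.0, 0.0]), np.full(4, 0.25)
print(l_half(sparse), l_half(uniform))          # 1.0 < 2.0
```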

First, let \(\varvec{\Upsilon }_{d} =\{\mathbf{u}_{d} ,\varvec{\gamma }_{d} ,\varvec{\beta }_{d} \}\) and \(\varvec{\Theta }_{d} =\{\alpha _{d} ,\rho _{d} ,\theta _{d} \}\). Then, the objective function of (12) can be rewritten as

$$\begin{aligned}&{{\textsf {\textit{Q}}}}(\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} ) \nonumber \\&\quad =\frac{1}{2}\sum \limits _{i\in \varvec{\mho }} {\vert \vert \mathbf {k}_{i} \vert \vert _{0.5}^{0.5} } \nonumber \\&\qquad +\left\{ \mathbf {-}{{\textsf {\textit{F}}}}(\tilde{\hbox {q}},\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} )-\text {KL}(\tilde{\mathrm{q}}\vert \vert \tilde{\mathrm{p}})\right\} , \end{aligned}$$
(13)

where \(\tilde{\hbox {q}}\) and \(\tilde{\hbox {p}}\) are defined as

$$\begin{aligned} \tilde{\mathrm{q}}= & {} \prod \limits _{i\in \varvec{\mho }} {\prod \limits _{d\in \varvec{\Lambda }} {\text {q}(\varvec{\Upsilon }_{d} ,\varsigma _{i})}} , \end{aligned}$$
(14)
$$\begin{aligned} \tilde{\mathrm{p}}= & {} \prod \limits _{i\in \varvec{\mho }} {\prod \limits _{d\in \varvec{\Lambda }} {\text {p}(\varvec{\Upsilon }_{d} ,\varsigma _{i} \vert \mathbf {y}_{i\vert d} ;\varvec{\Theta }_{d} ,\mathbf {k}_{i} )}}, \end{aligned}$$
(15)

where q is any joint PDF of the hidden random variables \(\varvec{\Upsilon }_{d} ,\varsigma _{i}\), and \(\hbox {KL}(\tilde{\hbox {q}}\vert \vert \tilde{\hbox {p}})\ge 0\) is the Kullback–Leibler (KL) divergence between the PDFs \(\tilde{\hbox {q}}\) and \(\tilde{\hbox {p}}\):

$$\begin{aligned} \text {KL}(\tilde{\mathrm{q}}\vert \vert \tilde{\mathrm{p}})=-\int {\tilde{\mathrm{q}}\log \left( {\frac{\tilde{\mathrm{p}}}{\tilde{\mathrm{q}}}} \right) } \,\text {d}\{\varvec{\Upsilon }_{d} \}_{d\in \varvec{\Lambda }} \text {d}\{\varsigma _{i} \}_{i\in \varvec{\mho }} . \end{aligned}$$
(16)

As for \({{\textsf {\textit{F}}}}(\tilde{\mathrm{q}},\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} )\), it can be formulated as

$$\begin{aligned}&{{\textsf {\textit{F}}}}(\tilde{\mathrm{q}},\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} ) \nonumber \\&\quad =\int {\tilde{\mathrm{q}}\log \left( {\frac{\prod \limits _{i\in \varvec{\mho }} {\prod \limits _{d\in \varvec{\Lambda }} {\text {p}(\mathbf {y}_{i\vert d} ,\varvec{\Upsilon }_{d} ,\varsigma _{i} ;\varvec{\Theta }_{d} ,\mathbf {k}_{i} )}}}{\tilde{\mathrm{q}}}} \right) } \nonumber \\&\qquad \text {d}\{\varvec{\Upsilon }_{d} \}_{d\in \varvec{\Lambda }} \text {d}\{\varsigma _{i} \}_{i\in \varvec{\mho }} . \end{aligned}$$
(17)

Due to the nonnegativity of the KL divergence, an upper bound of \({{\textsf {\textit{Q}}}}(\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} )\) can thus be obtained as

$$\begin{aligned}&{{\textsf {\textit{Q}}}}(\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }})\le {\tilde{{{\textsf {\textit{Q}}}}}}(\tilde{\hbox {q}},\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} ) \nonumber \\&\quad = \frac{1}{2}\sum \limits _{i\in \varvec{\mho }} {\vert \vert \mathbf {k}_{i} \vert \vert _{0.5}^{0.5} } -{{\textsf {\textit{F}}}}(\tilde{\hbox {q}},\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} ), \end{aligned}$$
(18)

where the equality holds only when \(\hbox {KL}(\tilde{\hbox {q}}\vert \vert \tilde{\hbox {p}})=0\), i.e., when \(\hbox {q}(\varvec{\Upsilon }_{d} ,\varsigma _{i} )=\hbox {p}(\varvec{\Upsilon }_{d} ,\varsigma _{i} \vert \mathbf {y}_{i\vert d} ;\varvec{\Theta }_{d} ,\mathbf {k}_{i} )\). With Eqs. (18) and (12), single/multi-shot blur-kernel estimation can now be reformulated in terms of the upper bound \({\tilde{{{\textsf {\textit{Q}}}}}}\), i.e.,

$$\begin{aligned}&\mathop {\min }\limits _{\tilde{\mathrm{q}},\{{\varvec{\Theta }}_{d} \}_{d\in {\varvec{\Lambda }}} ,\{\mathbf{k}_{i} \}_{i\in {\varvec{\mho }}}} \,{\tilde{{{\textsf {\textit{Q}}}}}}(\tilde{\mathrm{q}},\{{\varvec{\Theta }}_{d}\}_{d\in {\varvec{\Lambda }} } ,\{\mathbf{k}_{i} \}_{i\in {\varvec{\mho }}}) \nonumber \\&\quad \text {subject to }\alpha _{d} \ge 0,\,\rho _{d} \ge 0,\, \theta _{d} \ge 0,\, \mathbf{k}_{i}^{j} \ge 0, \nonumber \\&\quad \sum \limits _{j\in \mathbf{Z}} {\mathbf{k}_{i}^{j}} =1. \end{aligned}$$
(19)

In other words, the blur-kernel estimation problem reduces to alternating minimization of \({\tilde{{{\textsf {\textit{Q}}}}}}\) with respect to the model parameters \(\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }}\) and the approximate posterior distribution \(\tilde{\text {q}}\). Note that the proposed approach relies rigorously on Eq. (19) and works without any extra pre-processing or post-processing operations, such as smoothing or edge enhancement [20, 21, 23].
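The logic of (13)–(19) rests on two facts: the KL divergence is nonnegative, and it vanishes exactly when the approximate posterior matches the true one, which is when the bound (18) is tight. A minimal numerical sketch in Python, using hypothetical discrete distributions rather than the model's actual posteriors:

```python
import math

def kl_divergence(q, p):
    """KL(q||p) for discrete PDFs given as lists of probabilities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

p = [0.5, 0.3, 0.2]           # stand-in "true" posterior
q_mismatch = [0.4, 0.4, 0.2]  # an approximate posterior
q_exact = list(p)             # q equal to the posterior

# KL >= 0, so Q <= Q_tilde = Q + KL(q||p); the bound is tight
# (KL = 0) exactly when q equals the posterior.
assert kl_divergence(q_mismatch, p) > 0
assert abs(kl_divergence(q_exact, p)) < 1e-12
```

This is why driving \(\tilde{\text {q}}\) toward the posterior (the MFA step of Sect. 4.1) simultaneously tightens the bound minimized over the deterministic parameters.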

4 Numerical Implementation

4.1 Estimating Posterior Distributions

To minimize \(\tilde{{{\textsf {\textit{Q}}}}}\) with respect to the approximate posterior distribution \(\tilde{\text {q}}\), the mean field approximation (MFA) method [12] is used, which assumes that the posterior distributions of the hidden random variables \(\{\varvec{\Upsilon }_{d} \}_{d\in \varvec{\Lambda }} ,\,\{\varsigma _{i} \}_{i\in \varvec{\mho }} \) are independent of each other, i.e.,

$$\begin{aligned} \tilde{\mathrm{q}}= & {} \prod \limits _{{i \in {\varvec{\mho }}}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}({\varvec{\Upsilon }}_{d} ,\varsigma _{i} )} } = \prod \limits _{{i \in {\varvec{\mho }}}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}(\varsigma _{i} )\text {q}({\varvec{\Upsilon }}_{d} )} } \nonumber \\= & {} \prod \limits _{{i \in {\varvec{\mho }}}}{\prod \limits _{{d \in {\varvec{\Lambda }}}}{\text {q}(\varsigma _{i} )\text {q}({\mathbf {u}}_{d} )\text {q}(\varvec{\gamma }_{d} )\text {q}(\varvec{\beta }_{d})}}. \end{aligned}$$
(20)

First, we derive the posterior distribution of the hyper-parameter \(\varsigma _{i}\). After some straightforward computations, \(-{{\textsf {\textit{F}}}}(\tilde{\text {q}},\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }})\) can be decomposed with respect to \(\hbox {q}(\varsigma _{i} )\) as follows:

$$\begin{aligned}&- {{\textsf {\textit{F}}}}(\tilde{\mathrm{q}},\{ {\varvec{\Theta }}_{d} \} _{{d \in {\varvec{\Lambda }}}} ,\{ {\mathbf {k}}_{i} \} _{{i \in {\varvec{\mho }}}} ) \nonumber \\&\quad = \text {KL}\left( {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}(\varsigma _{i} )} ||\tilde{\mathrm{p}}(\{ {\mathbf {y}}_{{i|d}}\} _{{d \in {\varvec{\Lambda }}}} ,\varsigma _{i} ;\{ {\varvec{\Theta }}_{d} \} _{{d \in {\varvec{\Lambda }}}} ,{\mathbf {k}}_{i} )} \right) \nonumber \\&\qquad + \int {\prod \limits _{{i' \in {\varvec{\mho }}\backslash i}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}(\varsigma _{{i'}} )}} \log \prod \limits _{{i' \in {\varvec{\mho }}\backslash i}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}(\varsigma _{{i'}} )} } } \text { d}\varsigma _{{i'}} \nonumber \\&\qquad + \int {\prod \limits _{{i \in {\varvec{\mho }}}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}({\mathbf {u}}_{d})}} \log \prod \limits _{{i \in {\varvec{\mho }}}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}({\mathbf {u}}_{d})}}} \text { d}\{ {\mathbf {u}}_{d} \} _{{d \in {\varvec{\Lambda }}}} \nonumber \\&\qquad + \int {\prod \limits _{{i \in {\varvec{\mho }}}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}(\varvec{\gamma }_{d} )}} \log \prod \limits _{{i \in {\varvec{\mho }}}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}(\varvec{\gamma }_{d} )} } } \text { d}\{ \varvec{\gamma }_{d}\} _{{d \in {\varvec{\Lambda }}}} \nonumber \\&\qquad + \int {\prod \limits _{{i \in {\varvec{\mho }}}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}(\varvec{\beta }_{d})}} \log \prod \limits _{{i \in {\varvec{\mho }}}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}(\varvec{\beta }_{d})}}} \text { d}\{ \varvec{\beta }_{d} \} _{{d \in {\varvec{\Lambda }}}} , \end{aligned}$$
(21)

where \(\tilde{\hbox {p}}(\{\mathbf {y}_{i\vert d} \}_{d\in \varvec{\Lambda }} ,\varsigma _{i} ;\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\mathbf {k}_{i})\) is defined as

$$\begin{aligned}&\log \tilde{\mathrm{p}}(\{ {\mathbf {y}}_{{i|d}} \}_{{d \in {\varvec{\Lambda }}}} ,\varsigma _{i} ;\{ {\varvec{\Theta }}_{d} \} _{{d \in {\varvec{\Lambda }}}} ,{\mathbf {k}}_{i} ) \nonumber \\&\quad = \int \prod \limits _{{i' \in {\varvec{\mho }}\backslash i}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {q}(\varsigma _{{i'}} )\text {q}({\mathbf {u}}_{d})\text {q}(\varvec{\gamma }_{d} )\text {q}(\varvec{\beta }_{d})}} \nonumber \\&\qquad \log \left( {\prod \limits _{{i \in {\varvec{\mho }}}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {p}({\mathbf {y}}_{{i|d}} ,{\varvec{\Upsilon }}_{d} ,\varsigma _{i} ;{\varvec{\Theta }}_{d} ,{\mathbf {k}}_{i} )} } } \right) \nonumber \\&\qquad \text {d}\{ {\varvec{\Upsilon }}_{d} \} _{{d \in {\varvec{\Lambda }}}} \text {d}\{ \varsigma _{{i'}} \} _{{i' \in {\varvec{\mho }}\backslash i}} . \end{aligned}$$
(22)

It is seen that \(-{{\textsf {\textit{F}}}}(\tilde{\hbox {q}},\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }})\) is minimized when the KL divergence \(\hbox {KL}\left( \prod _{d\in \varvec{\Lambda }} \hbox {q}(\varsigma _{i} )\vert \vert \tilde{\hbox {p}}(\{\mathbf {y}_{i\vert d} \}_{d\in \varvec{\Lambda }} ,\varsigma _{i} ;\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\mathbf {k}_{i} ) \right) \) equals zero, i.e., when \(\prod _{d\in \varvec{\Lambda }} \hbox {q}(\varsigma _{i} )=\tilde{\hbox {p}}(\{\mathbf{y}_{i\vert d} \}_{d\in \varvec{\Lambda }} ,\varsigma _{i} ;\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} ,\mathbf{k}_{i})\), leading to

$$\begin{aligned}&\sum \limits _{{d \in {\varvec{\Lambda }}}} {\log \text {q}(\varsigma _{i} )} = | {\varvec{\Lambda }}| \log \text {q}(\varsigma _{i} ) \nonumber \\&\quad = \left\langle {\log \prod \limits _{{i \in {\varvec{\mho }}}} {\prod \limits _{{d \in {\varvec{\Lambda }}}} {\text {p}({\mathbf {y}}_{{i|d}} ,{\varvec{\Upsilon }}_{d} ,\varsigma _{i} ;{\varvec{\Theta }}_{d} ,{\mathbf {k}}_{i} )} } } \right\rangle _{{\{ \mathrm{q}(\varsigma _{{i'}} )\} _{{i' \in {\varvec{\mho }}\backslash i}} \{ \mathrm{q}({\varvec{\Upsilon }}_{d} )\} _{{d \in {\varvec{\Lambda }}}} }} \nonumber \\&\quad = \left\{ \begin{array}{l} \frac{M}{2}\sum \limits _{{d \in {\varvec{\Lambda }}}} {\log \varsigma _{i} } - \frac{{\varsigma _{i} }}{2}\sum \limits _{{d \in {\varvec{\Lambda }}}} \left\langle \left\| {{\mathbf {y}}_{{i|d}} - {\mathbf {K}}_{i} {\mathbf {u}}_{d} } \right\| _{2}^{2} \right\rangle _{{\mathrm{q}({\mathbf {u}}_{d} )}} \\ + \sum \limits _{{d \in {\varvec{\Lambda }}}} {((a - 1)\log \varsigma _{i} - b\varsigma _{i} )} \end{array} \right\} \nonumber \\&\qquad +\, const. \end{aligned}$$
(23)

Hence, \(\hbox {q}(\varsigma _{i})\) is actually a Gamma PDF \({\varvec{\fancyscript{Ga}}}(\varsigma _{i} \vert a_{\varsigma _{i} } ,b_{\varsigma _{i}})\) where the shape and rate parameters are defined as

$$\begin{aligned} a_{{\varsigma _{i}}}= & {} \frac{M}{2} + a,\nonumber \\ b_{{\varsigma _{i}}}= & {} \frac{1}{{2|{\varvec{\Lambda }}|}}\sum \limits _{{d \in {\varvec{\Lambda }}}} {\left\langle \left\| {{\mathbf {y}}_{{i|d}} - {\mathbf {K}}_{i} {\mathbf {u}}_{d} } \right\| _{2}^{2} \right\rangle _{{\text {q}({\mathbf {u}}_{d} )}} } + b, \end{aligned}$$
(24)

and its mean is given by \(\langle \varsigma _{i} \rangle _{q(\varsigma _{i} )} =a_{\varsigma _{i} } /b_{\varsigma _{i} }\).
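For concreteness, the update (24) and the resulting posterior mean can be sketched in a few lines of Python (the paper's implementation is in Matlab); the inputs below (M, the expected residuals, and the hyper-parameters a, b) are illustrative placeholders:

```python
def update_noise_precision(M, residuals, a=1e-4, b=1e-4):
    """Shape/rate update of q(sigma_i) per Eq. (24).

    residuals: list of expected squared residuals
    <||y_{i|d} - K_i u_d||^2>, one entry per derivative filter d
    (so len(residuals) plays the role of |Lambda|).
    Returns (a_sigma, b_sigma, posterior mean a_sigma / b_sigma).
    """
    a_sigma = M / 2.0 + a
    b_sigma = sum(residuals) / (2.0 * len(residuals)) + b
    return a_sigma, b_sigma, a_sigma / b_sigma

# Toy call: 4 "pixels", two filters, both with expected residual 2.
a_s, b_s, mean = update_noise_precision(M=4, residuals=[2.0, 2.0])
```

The posterior mean \(a_{\varsigma _i}/b_{\varsigma _i}\) is precisely the adaptively estimated inverse noise variance that replaces the hand-tuned decreasing schedules of [1, 8, 16, 17].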

In Appendix 1, the posterior distributions of \(\varsigma _{i} \) and \(\mathbf {u}_{d} \), i.e., \(\hbox {q}(\varsigma _{i})\) and \(\hbox {q}(\mathbf {u}_{d})\), are derived in full detail for clarity. According to these derivations, \(\hbox {q}(\mathbf {u}_{d})\) is a multivariate Gaussian PDF given as

$$\begin{aligned} \hbox {q}(\mathbf {u}_{d} )={\varvec{\mathcal {N}}}(\mathbf {u}_{d} \vert \varvec{\mu }_{d} ,\mathbf {C}_{d} ), \end{aligned}$$
(25)

with its mean \(\varvec{\mu }_{d} \) and covariance matrix \(\mathbf {C}_{d} \) defined as

$$\begin{aligned} \varvec{\mu }_{d}= & {} \mathbf {C}_{d} \sum \limits _{i\in \varvec{\mho } } {\langle \varsigma _{i} \rangle _{{\mathrm{q}}(\varsigma _{i} )} \mathbf {K}_{i}^{\mathrm{T}} \mathbf {y}_{i\vert d} ,} \end{aligned}$$
(26)
$$\begin{aligned} \mathbf {C}_{d}= & {} \left[ {\vert \varvec{\mho } \vert \hbox {diag}\{\langle \varvec{\gamma }_{d} \rangle _{\text {q}(\varvec{\gamma }_{d} )} \}+\sum \limits _{i\in \varvec{\mho } } {\langle \varsigma _{i} \rangle _{\text {q}(\varsigma _{i} )} \mathbf {K}_{i}^{\mathrm{T}} \mathbf {K}_{i} } } \right] ^{-1}. \end{aligned}$$
(27)
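Computing the mean in (26) amounts to solving the symmetric positive-definite system \(\mathbf {C}_{d}^{-1} \varvec{\mu }_{d} =\sum _{i\in \varvec{\mho }} \langle \varsigma _{i} \rangle \mathbf {K}_{i}^{\mathrm {T}} \mathbf {y}_{i\vert d}\) without ever forming the inverse in (27). A generic conjugate-gradient sketch in Python on a toy SPD system (the matrix and right-hand side are illustrative stand-ins, not the model's actual operators):

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A, given only the product v -> A v."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                  # residual b - A x with x = 0
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy 2x2 SPD system standing in for C_d^{-1} mu_d = rhs.
A = [[4.0, 1.0], [1.0, 3.0]]
rhs = [1.0, 2.0]
mu = conjugate_gradient(
    lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A], rhs)
```

In the actual algorithm, the matrix-vector product would apply \(\vert \varvec{\mho }\vert \hbox {diag}\{\langle \varvec{\gamma }_{d} \rangle \}+\sum _{i}\langle \varsigma _{i} \rangle \mathbf {K}_{i}^{\mathrm {T}} \mathbf {K}_{i}\) via convolutions, so the matrix is never stored explicitly.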

Since calculating Eq. (26) directly is computationally demanding (it requires the matrix inverse in (27)), the conjugate gradient (CG) method is used to compute (26) in practice. The posterior distributions \(\text {q}(\varvec{\gamma }_{d} )\) and \(\text {q}(\varvec{\beta }_{d} )\) can be derived in a fashion very similar to \(\hbox {q}(\mathbf {u}_{d})\). Specifically, both \(\hbox {q}(\varvec{\gamma }_{d} )\) and \(\hbox {q}(\varvec{\beta }_{d} )\) are also Gamma PDFs, given as

$$\begin{aligned} \hbox {q}(\varvec{\gamma }_{d})= & {} \prod \limits _{l\in \varvec{\Omega }} {{\varvec{\fancyscript{Ga}}}(\varvec{\gamma }_{d}^{l} \vert a_{\varvec{\gamma }_{d}^{l} } ,b_{\varvec{\gamma }_{d}^{l}})} , \end{aligned}$$
(28)
$$\begin{aligned} \hbox {q}(\varvec{\beta }_{d} )= & {} \prod \limits _{l\in \varvec{\Omega }} {{\varvec{\fancyscript{Ga}}}(\varvec{\beta }_{d}^{l} \vert a_{\varvec{\beta }_{d}^{l} } ,b_{\varvec{\beta }_{d}^{l} } )} , \end{aligned}$$
(29)

with their means, respectively, given by

$$\begin{aligned} \langle \varvec{\gamma }_{d}^{l} \rangle _{\mathrm{q}(\varvec{\gamma }_{d}^{l} )} =a_{\varvec{\gamma }_{d}^{l} } /b_{\varvec{\gamma }_{d}^{l} } ,\,\langle \varvec{\beta }_{d}^{l} \rangle _{\mathrm{q}(\varvec{\beta }_{d}^{l} )} =a_{\varvec{\beta }_{d}^{l} } /b_{\varvec{\beta }_{d}^{l}}, \end{aligned}$$
(30)

where

$$\begin{aligned} a_{\varvec{\gamma }_{d}^{l} } =\frac{1}{2}+\alpha _{d} ,\,b_{\varvec{\gamma }_{d}^{l} } =\frac{1}{2}\left\langle (\mathbf {u}_{d}^{l} )^{2}\right\rangle _{\text {q}(\mathbf {u}_{d}^{l} )} +\langle \varvec{\beta }_{d}^{l} \rangle _{\text {q}(\varvec{\beta }_{d}^{l} )} , \end{aligned}$$
(31)

with \(\langle (\mathbf {u}_{d}^{l} )^{2}\rangle _{\mathrm{q}(\mathbf {u}_{d}^{l} )} =({\varvec{\mu }}_{d}^{l} )^{2}+\mathbf {C}_{d}^{l,l}\), and

$$\begin{aligned} a_{\varvec{\beta }_{d}^{l} } =\alpha _{d} +\rho _{d} ,\, b_{\varvec{\beta }_{d}^{l} } =\langle \varvec{\gamma }_{d}^{l} \rangle _{q(\varvec{\gamma }_{d}^{l})} +\theta _{d} . \end{aligned}$$
(32)

For computational efficiency of \(\langle (\mathbf {u}_{d}^{l} )^{2}\rangle _{\mathrm{q}(\mathbf {u}_{d}^{l})}\), the covariance matrix \(\mathbf {C}_{d} \) in Eq. (27) is approximated here by using only its diagonal entries.
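The scalar updates (31)–(32), with the diagonal approximation of \(\mathbf {C}_{d}\), can be sketched per coefficient as follows (the numeric inputs are illustrative; feeding the freshly updated mean of \(\varvec{\gamma }_{d}^{l}\) into (32) is an implementation choice made here):

```python
def update_gamma_beta(mu_l, C_ll, beta_mean, alpha_d, rho_d, theta_d):
    """One pass of the updates (31)-(32) for a single coefficient l.

    Uses the diagonal approximation <(u_d^l)^2> = (mu_d^l)^2 + C_d^{l,l}.
    Returns the new posterior means <gamma_d^l> and <beta_d^l>.
    """
    a_gamma = 0.5 + alpha_d                          # Eq. (31), shape
    b_gamma = 0.5 * (mu_l ** 2 + C_ll) + beta_mean   # Eq. (31), rate
    gamma_new = a_gamma / b_gamma
    a_beta = alpha_d + rho_d                         # Eq. (32), shape
    b_beta = gamma_new + theta_d                     # Eq. (32), rate
    beta_new = a_beta / b_beta
    return gamma_new, beta_new

# Toy coefficient with mu = 0.5, diagonal variance 0.25.
gamma_new, beta_new = update_gamma_beta(
    mu_l=0.5, C_ll=0.25, beta_mean=0.0,
    alpha_d=1.0, rho_d=1.0, theta_d=1.0)
```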

4.2 Estimating Image Prior Parameters

With (19), (24), (26), (27), (31), and (32), estimating \(\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }} \) can be achieved by maximizing the objective function

$$\begin{aligned}&{{\textsf {\textit{G}}}}(\{\varvec{\Theta }_{d} \}_{d\in \varvec{\Lambda }}) \nonumber \\&\quad = \langle \log \prod \limits _{i\in \varvec{\mho }} {\prod \limits _{d\in \varvec{\Lambda }} {\text {p}(\mathbf{y}_{i\vert d} ,\varvec{\Upsilon }_{d} ,\varsigma _{i} ;\varvec{\Theta }_{d} ,\mathbf{k}_{i} )}} \rangle _{\{q(\varvec{\Upsilon }_{d} )\}_{d\in \varvec{\Lambda }} } \nonumber \\&\quad = \sum \limits _{l\in \varvec{\Omega }} \left\{ \begin{array}{l} \alpha _{d} \langle \log {\varvec{\beta }}_{d}^{l} \rangle _{\text {q}({\varvec{\beta }}_{d}^{l} )} +(\alpha _{d} -1)\langle \log {\varvec{\gamma }}_{d}^{l} \rangle _{\mathrm{q}({\varvec{\gamma }}_{d}^{l} )} \\ -\,\log \Gamma (\alpha _{d} )+\rho _{d} \log \theta _{d} -\log \Gamma (\rho _{d} ) \\ +\,(\rho _{d} -1)\langle \log {\varvec{\beta }}_{d}^{l} \rangle _{\mathrm{q}({\varvec{\beta }}_{d}^{l} )} -\theta _{d} \langle {\varvec{\beta }}_{d}^{l} \rangle _{\mathrm{q}({\varvec{\beta }}_{d}^{l} )} \\ \end{array} \right\} ,\nonumber \\ \end{aligned}$$
(33)

where \(\psi (x)={\Gamma }'(x)/\Gamma (x)\) is the digamma function, and \(\langle \log \varvec{\gamma }_{d}^{l} \rangle =\psi (a_{\varvec{\gamma }_{d}^{l} } )-\log b_{\varvec{\gamma }_{d}^{l} } ,\,\langle \log \varvec{\beta }_{d}^{l} \rangle =\psi (a_{\varvec{\beta }_{d}^{l} } )-\log b_{\varvec{\beta }_{d}^{l}}\). Analytical solutions for \(\alpha _{d} \) and \(\rho _{d} \) cannot be derived, and hence the Matlab function fminbnd is used to estimate the two parameters, with projection onto the nonnegativity constraint set.
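The role played by fminbnd here is bounded one-dimensional minimization; a golden-section sketch in Python conveys the idea (the objective below is a toy convex stand-in, not the actual negative of (33)):

```python
import math

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal scalar function on [lo, hi] (fminbnd-style)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):       # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                 # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

# Toy convex stand-in for -G as a function of alpha_d alone,
# searched over the nonnegative interval [0, 10]; minimum at 2.
alpha_est = golden_section_min(lambda x: (x - 2.0) ** 2, 0.0, 10.0)
```

Restricting the search interval to nonnegative values enforces the constraint \(\alpha _{d} \ge 0\) directly, which is the effect of the projection mentioned above.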

4.3 Estimating Blur-Kernels

Similarly, blur-kernels can be estimated by solving the following minimization problem:

$$\begin{aligned} \mathop {\min }\limits _{\{\mathbf{k}_{i} \}_{i\in \varvec{\mho }}} \,{{\textsf {\textit{H}}}}(\{\mathbf{k}_{i} \}_{i\in \varvec{\mho }}) \text { subject to } \mathbf{k}_{i}^{j} \ge 0,\, \sum \limits _{j\in \mathbf{Z}} {\mathbf{k}_{i}^{j}} =1, \end{aligned}$$
(34)

where

$$\begin{aligned} {{\textsf {\textit{H}}}}(\{\mathbf {k}_{i} \}_{i\in \varvec{\mho }} )= & {} \frac{1}{2}\sum \limits _{i\in \varvec{\mho }} {\vert \vert \mathbf {k}_{i} \vert \vert _{0.5}^{0.5}} \nonumber \\&+\,\frac{1}{2}\sum \limits _{i\in \varvec{\mho }} {\sum \limits _{d\in \varvec{\Lambda }} {\langle \varsigma _{i} \rangle _{\mathrm{q}(\varsigma _{i} )} \langle \left\| {\mathbf {y}_{i\vert d} -\mathbf {K}_{i} \mathbf {u}_{d} } \right\| _{2}^{2} \rangle _{\mathrm{q}(\mathbf {u}_{d} )}.}}\nonumber \\ \end{aligned}$$
(35)

Then, using iteratively reweighted least squares (IRLS), \(\mathbf {k}_{i} \) can be estimated approximately by minimizing

$$\begin{aligned}&{{\textsf {\textit{H}}}}_{i} (\mathbf {k}_{i} ) = \left( {-\langle \varsigma _{i} \rangle _{\mathrm{q}(\varsigma _{i} )} \sum \limits _{d\in {\varvec{\Lambda }}} {\mathbf {b}_{i\vert d}^{\mathrm{T}} } } \right) \mathbf {k}_{i} \nonumber \\&\quad +\frac{1}{2}\mathbf {k}_{i}^{\mathrm{T}} \left( \hbox {diag}\left( {\left\{ \frac{1}{\max ((\mathbf {k}_{i}^{j} )^{1.5},\, \varepsilon )} \right\} } \right) \right. \nonumber \\&\quad \left. +\left( {\langle \varsigma _{i} \rangle _{\mathrm{q}(\varsigma _{i})} \sum \limits _{d\in {\varvec{\Lambda }}} {\mathbf {A}_{d}}} \right) \right) \mathbf {k}_{i}, \end{aligned}$$
(36)

where \(\varepsilon \) is a very small positive constant, chosen as 1e-6 for all the experiments in this paper, and

$$\begin{aligned} \mathbf {A}_{d} \left( {r,s} \right)= & {} \sum \nolimits _{l\in {\varvec{\Omega }}} {(\varvec{\mu }_{d}^{l+r} \varvec{\mu }_{d}^{l+s} +\mathbf {C}_{d}^{l+r,l+s} )} , \nonumber \\ \mathbf {b}_{i\vert d} \left( r \right)= & {} \sum \nolimits _{l\in {\varvec{\Omega }} } {\varvec{\mu }_{d}^{l+r} \mathbf {y}_{i\vert d}^{l}}. \end{aligned}$$
(37)

where r, s are the blur-kernel indices. Note that Eq. (36) is a classic quadratic programming problem, which can be solved using the Matlab function quadprog along with projections onto the constraint set.
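The constraint set in (34) (nonnegative entries summing to one) is the probability simplex. One standard way to map an unconstrained solution of (36) back onto it is the sort-based Euclidean projection sketched below; note this is an illustrative alternative to the quadprog-with-constraints route used in the paper:

```python
def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex
    {x : x_j >= 0, sum_j x_j = 1}, via the standard sort-and-threshold
    algorithm."""
    u = sorted(v, reverse=True)
    css = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:       # j is still a feasible support size
            theta = t
    return [max(x - theta, 0.0) for x in v]

# Hypothetical unconstrained kernel estimate with a negative entry.
k = project_simplex([0.5, 0.8, -0.1])   # approx [0.35, 0.65, 0.0]
```

After projection, the kernel satisfies both constraints of (34) exactly, so the IRLS step can be iterated safely.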

Algorithm 1

Based on the above discussion, single/multi-shot blur-kernel estimation is summarized in Algorithm 1. Interestingly, Algorithm 1 degenerates to the algorithm in [29] when the model parameters \(\{\alpha _{d} ,\rho _{d} ,\theta _{d} \}_{d\in \varvec{\Lambda }}\) are set to 0 and the \({\varvec{\fancyscript{l}}}_{{0.5}}\)-norm-based kernel regularization in (10) is omitted. In this sense, the proposed approach is more adaptive in problem modeling than [29]. To handle large-scale blur-kernels and further reduce the risk of getting stuck in poor local minima, a multi-scale scheme of Algorithm 1 is used, similar to many existing VB and MAP methods in the literature. The number of scales is determined by the input kernel size at the finest scale. The size ratio between consecutive scales in each dimension is \(\sqrt{2}\), such that the kernel size at the coarsest scale is \(3\times 3\). In each scale, the input blurred images are the correspondingly downsampled versions of the finest-scale images \(\{\mathbf {y}_{i} \}_{i\in \varvec{\mho }}\), and the initial blur-kernels are the upsampled versions of the blur-kernels estimated at the coarser scale; at the coarsest scale, the initial blur-kernels are chosen as a \(3\times 3\) Dirac pulse. The initializations of the other parameters are the same for all scales: \(\varsigma _{i} \) is set to 1e2, the entries of \(\varvec{\gamma }_{d} \) are all set to 1e4, those of \(\varvec{\beta }_{d} \) are 0, and the initial \(\alpha _{d} ,\rho _{d} ,\theta _{d} \) are 0 as well. The outer iteration number L and the inner iteration number J are chosen as 10 and 5, respectively.
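The multi-scale schedule described above can be sketched as follows; the exact rounding to odd kernel sizes is an assumption here, since the paper does not specify it:

```python
import math

def kernel_scale_sizes(finest_size, ratio=math.sqrt(2.0), coarsest=3):
    """Kernel side lengths from the coarsest scale (3) up to the finest,
    shrinking by `ratio` per scale and forced to odd sizes."""
    sizes = [finest_size]
    s = float(finest_size)
    while True:
        s /= ratio
        side = int(round(s))
        side += (side + 1) % 2       # keep kernel sizes odd
        if side <= coarsest:
            sizes.append(coarsest)
            break
        sizes.append(side)
    return sizes[::-1]               # coarsest scale first

# E.g., a 27x27 finest kernel yields a pyramid from 3x3 up to 27x27.
sizes = kernel_scale_sizes(27)
```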

5 Validation and Discussion

In this section, the proposed blur-kernel estimation method is applied to both camera-shake deblurring and single-image nonparametric blind super-resolution, so as to validate its effectiveness and robustness in practical applications. All the experiments reported in this paper are performed with MATLAB v7.0 on a computer with an Intel i7-4600M CPU (2.90 GHz) and 8 GB memory.

5.1 Camera-shake Deblurring

In camera-shake deblurring, the deblurred images restored from the given single/multiple blurry observations are of final interest. Hence, with Algorithm 1 producing the estimated blur-kernels, the hyper-Laplacian prior-based nonblind deblurring method [61] is used to perform nonblind deconvolution in all the camera-shake deblurring experiments in this subsection. Note that many other, more advanced nonblind deconvolution algorithms could be used as well, e.g., [62–66]. The main reason for choosing [61] in this paper is that it strikes a good compromise between deblurring quality and computational complexity. For a fair comparison, [61] is also used to perform the final nonblind deblurring for the other single/multi-shot blur-kernel estimation methods.

5.1.1 Single-shot Case

The proposed approach for single-shot blur-kernel estimation is tested on the benchmark blurred dataset provided by Levin et al. in [8], downloaded from www.wisdom.weizmann.ac.il/~levina/papers/LevinEtalCVPR2011Code.zip. In this dataset, 4 sharp images (\({\varvec{\fancyscript{I}}}1,\, {\varvec{\fancyscript{I}}}2,\, {\varvec{\fancyscript{I}}}3,\, {\varvec{\fancyscript{I}}}4\)) of size \(255\times 255\) are blurred by 8 blur-kernels \(({\varvec{\fancyscript{K}}}1,\, {\varvec{\fancyscript{K}}}2,\, {\varvec{\fancyscript{K}}}3,\, {\varvec{\fancyscript{K}}}4,\, {\varvec{\fancyscript{K}}}5,\, {\varvec{\fancyscript{K}}}6,\, {\varvec{\fancyscript{K}}}7,\, {\varvec{\fancyscript{K}}}8)\) with sizes ranging from \(13\times 13\) to \(27\times 27\), producing 32 real-captured blurred images. The SSD (sum of squared differences) error ratio defined in [8] is used as the final evaluation measure for the different methods. Figure 2 provides the cumulative histogram of SSD error ratios for the proposed method and seven existing methods [1, 8, 15–17, 21, 67].Footnote 2 As noted by Levin et al. [8], the deblurred images are generally visually plausible when the SSD error ratio is below 3, in which case the blind deblurring is considered successful. The proposed kernel estimation method clearly achieves the best performance in terms of success percentage (97 %), as well as uniformly better performance across all bins. The average SSD error ratios of the different methods are: 3.74 [1], 7.05 [8], 2.97 [15], 2.94 [16], 2.02 [17], 2.42 [21], 2.77 [67], and 1.44 (ours). To show the benefit of adaptively learning image priors, the Jeffreys prior is also tried within the proposed framework: the model parameters \(\{\alpha _{d} ,\rho _{d} ,\theta _{d} \}_{d\in {\varvec{\Lambda }} } \) in Algorithm 1 are simply set to 0 without altering any other configuration, which leads to an increased average SSD error ratio of 1.82 and a decreased success percentage of 91 %.
Recall that methods [15, 16] build on the Jeffreys prior but perform far worse than our framework, demonstrating the benefit of adaptively estimating the inverse noise variance and of additionally incorporating the \({\varvec{\fancyscript{l}}}_{\mathrm {0.5}}\)-norm-based kernel regularization. Note that the size of the ground truth blur-kernels is assumed known in the above experiments.
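The evaluation protocol above reduces to two statistics over the deblurred images; a small Python sketch with made-up SSD error ratios:

```python
def ssd_ratio_stats(error_ratios, threshold=3.0):
    """Success percentage (ratio below threshold, per Levin et al.)
    and the average SSD error ratio."""
    n = len(error_ratios)
    success = 100.0 * sum(1 for r in error_ratios if r < threshold) / n
    return success, sum(error_ratios) / n

# Hypothetical error ratios for a handful of deblurred images:
# three of the four fall below the threshold of 3.
success_pct, mean_ratio = ssd_ratio_stats([1.1, 1.4, 2.9, 3.5])
```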

Fig. 2
figure 2

The cumulative histogram of the SSD deconvolution error ratios as assuming an accurate kernel size. The success percentages (SSD error ratio below 3) of different methods are 69 % [1], 19 % [8], 69 % [15], 63 % [16], 84 % [17], 75 % [21], 69 % [67], and 97 % (Ours) (Color figure online)

Fig. 3
figure 3

The cumulative histogram of the SSD error ratios as without accurate kernel size information (a size of \(31\times 31\) is set uniformly). The success percentages achieved by the two methods are, respectively, 69 % [20] and 94 % (Ours) (Color figure online)

Fig. 4
figure 4

Ground truth blur-kernels (top block) and the kernels estimated by Xu et al. [20] and the proposed method for the 32 benchmark blurred images, uniformly assuming the kernel size as \(31\times 31\). The figure shown in each kernel is the calculated kernel SSD error (the lower the better)

Fig. 5
figure 5

Three benchmark test images for constructing multiple-blurred observations. Left Cameraman (C), Middle: Lena (L), Right Pepper (P)

Another compared approach is that of Xu et al. [20], which requires no size information on the blur-kernels and is one of the best-performing MAP methods in the current literature. Since the sizes of the blur-kernels in the benchmark dataset do not exceed \(31\times 31\) (a configuration of the software released by Xu et al.), the size \(31\times 31\) is used for both [20] and the proposed method in this group of experiments. In general, the larger the blur-kernel, the harder the blind image deblurring. In spite of this, the proposed kernel estimation method still achieves a higher success percentage (94 %) and a lower average SSD error ratio (1.60) than Xu et al. [20] (69 %, 2.57). The cumulative histogram of SSD error ratios in Fig. 3 also shows that the proposed kernel estimation method performs better than Xu et al. [20] across all bins. In Fig. 4, the blur-kernels estimated by Xu et al. [20] and by the proposed kernel estimation approach are provided for visual inspection; those of the proposed method exhibit fewer false motion trajectories and isolated points. For a direct comparison between the two blur-kernel estimation methods, the kernel SSD errors for each of the 32 benchmark blurred images are also given in Fig. 4 for both methods. Not surprisingly, the proposed kernel estimation method also performs better than Xu et al. [20] in terms of the kernel SSD error.

Fig. 6
figure 6

Ground truth blur-kernels and the kernels estimated by [29] [with the Matlab code (see footnote 3) released by the authors] and the proposed kernel estimation approach for 6 groups of multi-shot images, constructed from three benchmark test images Cameraman (C), Lena (L), and Pepper (P), and two sets of kernels \({\varvec{\fancyscript{g}}}1=\{ {\varvec{\fancyscript{K}}}1 \ldots {\varvec{\fancyscript{K}}}4\}\) and \({\varvec{\fancyscript{g}}}2 =\{ {\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\). The figure shown in each blur-kernel is the computed kernel SSD (the lower the better)

Fig. 7
figure 7

Deblurred images using the blur-kernels estimated by [29] (top row) and the proposed kernel estimation method (bottom row) for the multiple-blurred observations simulated from Cameraman (C) and \({\varvec{\fancyscript{g}}}1=\{ {\varvec{\fancyscript{K}}}1 \ldots {\varvec{\fancyscript{K}}}4\}\). The figure in each image is the calculated PSNR score

5.1.2 Multi-shot Case

For a quantitative comparison, 6 groups of multiple-blurred observations are constructed from 3 commonly used benchmark test images shown in Fig. 5: Cameraman (C), Lena (L), and Pepper (P). In each group, a test image is blurred by either \({\varvec{\fancyscript{g}}}1=\{ {\varvec{\fancyscript{K}}}1 \ldots {\varvec{\fancyscript{K}}}4\}\) or \({\varvec{\fancyscript{g}}}2 =\{ {\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\), producing 4 blurred observations. The proposed kernel estimation method is applied to each group of multi-shot observations and compared with the state-of-the-art method [29], which has been shown to outperform [2] significantly. Figure 6 provides the blur-kernels (with their kernel SSDs) estimated by the proposed method and by [29] for each group of multi-shot observations.Footnote 3 The proposed kernel estimation method clearly performs better than [29] in terms of kernel SSD. For visual comparison of the final deblurred images obtained with the blur-kernels estimated by [29] and by the proposed approach, Figs. 7 and 8 provide the 4 deblurred images for each group of multi-shot observations constructed from Cameraman (C), along with their PSNR scores; Figs. 9 and 10 correspond to Lena (L), and Figs. 11 and 12 to Pepper (P). Not surprisingly, the proposed kernel estimation approach achieves not only higher PSNR but also better visual quality, validating its modeling superiority over [29] and indicating that, on average, lower kernel SSDs translate into better final deconvolution quality.
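The PSNR scores reported in Figs. 7–12 follow the standard definition; a minimal sketch, assuming 8-bit images with peak value 255:

```python
import math

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images
    given as flat lists of pixel intensities."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / mse)

# Two toy 4-pixel "images" differing by 1 everywhere, so MSE = 1
# and PSNR = 10 log10(255^2) ~ 48.13 dB.
score = psnr([10, 20, 30, 40], [11, 21, 31, 41])
```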

Fig. 8
figure 8

Deblurred images using the blur-kernels estimated by [29] (top row) and the proposed kernel estimation method (bottom row) for the multiple-blurred observations simulated from Cameraman (C) and \({\varvec{\fancyscript{g}}}2=\{ {\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\)

Fig. 9
figure 9

Deblurred images using the blur-kernels estimated by [29] (top row) and the proposed kernel estimation method (bottom row) for the multiple-blurred observations simulated from Lena (L) and \({\varvec{\fancyscript{g}}}1=\{{\varvec{\fancyscript{K}}}1 \ldots {\varvec{\fancyscript{K}}}4\}\)

Fig. 10
figure 10

Deblurred images using the blur-kernels estimated by [29] (top row) and the proposed kernel estimation method (bottom row) for the multiple-blurred observations simulated from Lena (L) and \({\varvec{\fancyscript{g}}}2=\{ {\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\)

Fig. 11
figure 11

Deblurred images using the blur-kernels estimated by [29] (top row) and the proposed kernel estimation method (bottom row) for the multiple-blurred observations simulated from Pepper (P) and \({\varvec{\fancyscript{g}}}1=\{ {\varvec{\fancyscript{K}}}1 \ldots {\varvec{\fancyscript{K}}}4\}\)

Fig. 12
figure 12

Deblurred images using the blur-kernels estimated by [29] (top row) and the proposed kernel estimation method (bottom row) for the multiple-blurred observations simulated from Pepper (P) and \({\varvec{\fancyscript{g}}}2=\{{\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\)

The blur-kernel size for the above experiments is uniformly specified set as \(31\times 31\). With the same size, the proposed blur-kernel estimation method is applied to each of the 24 blurred images simulated in the multi-shot case (3 benchmark images are blurred by 8 kernels), just to validate the superiority of the multi-shot blur-kernel estimation to the single-shot scenario. The estimated blur-kernels (along with their kernel SSDs) by the proposed approach are provided in Fig. 13. Compared with the results obtained in the multi-shot case as provided in Fig. 6, it is found that the proposed blur-kernel estimation approach in the single-shot case achieves comparable or sometimes even superior performance than [29] but inferior performance on average than the proposed approach in the multi-shot case. In particular, Fig. 14 provides the three final deblurred images corresponding to the estimated kernel \({\varvec{\fancyscript{K}}}8\), which are obviously with worse visual perception and PSNR scores than the ones in the case of multi-shot kernel estimation. But it should be noted that the multi-shot blur-kernel estimation is computationally more expensive than the single-shot case. We take the experiments with Cameraman (C) and \({\varvec{\fancyscript{g}}}2=\{ {\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\), for example. The running time of the proposed kernel estimation approach in the multi-shot case is 908s, while the running time in the single-shot case is, respectively, 348s, 351s, 368s, and 353s. It is also seen that the proposed blur-kernel estimation approach has a much high computation burden even in the single-shot case, which is actually a common characteristic of VB-like methods [1, 8, 9, 11, 1517, 32]. One strategy for reducing the running time is to perform blur-kernel estimation using only part of the blurred images. 
How to improve the computational efficiency of the proposed method in a fundamental way remains a challenge for future work.
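For concreteness, the kernel SSD used throughout these comparisons can be sketched as follows. This is a minimal illustration in NumPy: the paper's exact evaluation protocol (in particular any sub-pixel alignment of the estimated kernel to the ground truth) is not reproduced here, and the normalization step is our assumption based on the convention that blur-kernels sum to one.

```python
import numpy as np

def kernel_ssd(k_est, k_true):
    """Sum of squared differences between two blur-kernels.

    Both kernels are normalized to unit sum before comparison, since
    blur-kernels are conventionally energy-preserving.  Any alignment
    of the estimated kernel to the ground truth, which a full
    evaluation protocol may require, is omitted in this sketch.
    """
    k_est = k_est / k_est.sum()
    k_true = k_true / k_true.sum()
    return float(np.sum((k_est - k_true) ** 2))

# Toy example with a hypothetical 31x31 kernel pair.
rng = np.random.default_rng(0)
k_true = rng.random((31, 31))
k_est = k_true + 0.01 * rng.random((31, 31))
err = kernel_ssd(k_est, k_true)  # small positive value
```

A lower SSD thus indicates an estimated kernel closer in shape to the ground truth, independently of its overall scale.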

Fig. 13

Blur-kernels estimated by the proposed kernel estimation method applied to each of the 24 blurred images simulated in the multi-shot case (3 benchmark images in Fig. 5 are blurred by 8 kernels provided by Levin et al. [8])

Fig. 14

Final deconvolved images with the motion blur-kernel \({\varvec{\fancyscript{K}}}8\) estimated by the proposed kernel estimation approach in the scenario of single-shot blur-kernel estimation

In the following, we empirically discuss the convergence of the proposed blur-kernel estimation approach using multiple-blurred images. Owing to limited space, we again take Cameraman (C) and \({\varvec{\fancyscript{g}}}2=\{{\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\) as an example. In Fig. 15, the intermediate blur-kernels estimated during the 10 outer iterations of Algorithm 1 (at the finest scale) are provided for each of the 4 blurry observations; in Fig. 16, the intermediate gradient magnitude images (see footnote 4) are shown for visual inspection. The intermediate estimates of both the gradient magnitude image and the blur-kernel clearly demonstrate that the proposed kernel estimation approach converges well. The kernel SSD listed with each blur-kernel shows even more clearly that the proposed method produces a reasonable blur-kernel within 6 outer iterations for each blurry observation.
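For intuition, the gradient magnitude images of Fig. 16 can be computed along the following lines. This is only a sketch: the paper's exact gradient discretization, boundary handling, and the parameters of the linear display transform are our assumptions.

```python
import numpy as np

def gradient_magnitude(img):
    """Gradient magnitude of a 2-D image via first-order forward
    differences, replicating the last row/column at the boundary."""
    gx = np.diff(img, axis=1, append=img[:, -1:])  # horizontal gradient
    gy = np.diff(img, axis=0, append=img[-1:, :])  # vertical gradient
    return np.sqrt(gx ** 2 + gy ** 2)

def linear_stretch(mag):
    """Linearly rescale to [0, 1] for display, as done for Fig. 16."""
    lo, hi = mag.min(), mag.max()
    return (mag - lo) / (hi - lo) if hi > lo else np.zeros_like(mag)
```

As the iterations proceed, the stretched magnitude image sharpens around true edges, which is what makes it a convenient visual proxy for convergence.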

Fig. 15

Intermediately estimated blur-kernels during the 10 outer iterations of Algorithm 1 (the finest scale) for each of the 4 blurry observations simulated utilizing the benchmark test image Cameraman (C) and the ground truth kernels \({\varvec{\fancyscript{g}}}2=\{ {\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\)

Fig. 16

Intermediate gradient magnitude images during the 10 outer iterations of Algorithm 1 (the finest scale). The magnitude images are shown after a linear transform for better visibility

Table 1 SSD of the blur-kernel estimated by the proposed multi-shot kernel estimation method for each of the 4 blurry observations simulated from Cameraman (C) and \({\varvec{\fancyscript{g}}}2=\{ {\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\)
Table 2 PSNR score (dB) of the final deblurred image based on the proposed multi-shot kernel estimation method for each of the 4 blurry observations simulated from Cameraman (C) and \({\varvec{\fancyscript{g}}}2=\{ {\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\)

As suggested by one of the reviewers, we also discuss the sensitivity of the proposed kernel estimation approach to the initializations of the hyper-parameters \(\{\varsigma _{i} \}_{i\in \varvec{\mho }} ,\,\{\varvec{\gamma }_{d} ,\varvec{\beta }_{d} \}_{d\in {\varvec{\Lambda }}} \) and the model parameters \(\{\alpha _{d} ,\rho _{d} ,\theta _{d} \}_{d\in {\varvec{\Lambda }} } \). Recall that the above experimental results are generated with the following initializations: (O) \(\varsigma _{i} =1e2\), the entries of \(\varvec{\gamma }_{d} \) are all set as 1e4, those of \(\varvec{\beta }_{d} \) are set as 0, and \(\alpha _{d} ,\rho _{d} ,\theta _{d} \) are set as 0. In the following, eight groups of experiments are conducted by altering these initializations, again using the 4 blurry observations simulated from Cameraman (C) and \({\varvec{\fancyscript{g}}}2=\{ {\varvec{\fancyscript{K}}}5 \ldots {\varvec{\fancyscript{K}}}8\}\). The initialization settings for the eight groups of experiments are given as follows:

  1. (1-1)

    \(\varsigma _{i} =2e1\) with other initializations in (O) unchanged;

  2. (1-2)

    \(\varsigma _{i} =2e2\) with other initializations in (O) unchanged;

  3. (2-1)

    \(\alpha _{d} =\rho _{d} =\theta _{d} =1\) with other initializations in (O) unchanged;

  4. (2-2)

    \(\alpha _{d} =\rho _{d} =\theta _{d} =1e1\) with other initializations in (O) unchanged;

  5. (3-1)

    entries of \(\varvec{\beta }_{d}\) are set as 1e-6 with other initializations in (O) unchanged;

  6. (3-2)

    entries of \(\varvec{\beta }_{d}\) are set as 1e-5 with other initializations in (O) unchanged;

  7. (4-1)

    entries of \(\varvec{\gamma }_{d} \) are set as 5e3 with other initializations in (O) unchanged;

  8. (4-2)

    entries of \(\varvec{\gamma }_{d}\) are set as 5e4 with other initializations in (O) unchanged.
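The baseline (O) and the eight perturbed settings above can be summarized as a small configuration table, which makes the one-factor-at-a-time structure of the sensitivity study explicit. The dictionary keys below are our own shorthand names for the paper's symbols; the numerical values are taken verbatim from the settings listed above.

```python
# Baseline initialization (O); each variant overrides exactly one entry.
BASELINE = {
    "varsigma": 1e2,          # varsigma_i
    "gamma": 1e4,             # entries of gamma_d
    "beta": 0.0,              # entries of beta_d
    "alpha_rho_theta": 0.0,   # alpha_d, rho_d, theta_d
}

VARIANTS = {
    "1-1": {"varsigma": 2e1},
    "1-2": {"varsigma": 2e2},
    "2-1": {"alpha_rho_theta": 1.0},
    "2-2": {"alpha_rho_theta": 1e1},
    "3-1": {"beta": 1e-6},
    "3-2": {"beta": 1e-5},
    "4-1": {"gamma": 5e3},
    "4-2": {"gamma": 5e4},
}

def setting(name):
    """Full initialization for a named experiment group."""
    return {**BASELINE, **VARIANTS[name]}
```

Each of the eight groups thus perturbs a single quantity while keeping the rest of (O) unchanged, isolating that quantity's influence on the estimated kernel.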

Fig. 17

Nonparametric blind SR (\(\text {SR}\times 2\)) with low-res Hollywood using the proposed kernel estimation method (-NE [44], -JSC [45], -ANR [46]), Michaeli and Irani [56], and nonblind SR methods [44–46]. The first image in the top row is the bicubic interpolated version of the low-res image using the Matlab function imresize

For the proposed multi-shot kernel estimation method, Tables 1 and 2, respectively, provide the kernel SSD and the PSNR score of the final deblurred image for each of the 4 blurry observations. The results show that the proposed kernel estimation method is largely stable across the different initialization settings. Meanwhile, we also see that the selected initialization setting (O) for Algorithm 1 is by no means the best, and hence we believe that the comparison between Algorithm 1 and other kernel estimation methods is convincing.

5.2 Nonparametric Blind Super-Resolution

When super-resolving a clear, high-res image from its corresponding blurred, low-res version in this paper, the size of the blur-kernel in the degradation model is also uniformly set as \(31\times 31\). Intuitively, to keep the estimated blur-kernel as faithful to the ground truth as possible, we should harness nonblind SR methods that assume a blur-kernel as simple as possible, e.g., the bicubic interpolation kernel. As such, three candidate state-of-the-art nonblind dictionary-based fast SR algorithms [44–46] are chosen for practicing nonparametric blind SR: neighborhood embedding (NE) [44], joint sparse coding (JSC) [45], and anchored neighborhood regression (ANR) [46] (a brief review is provided in Appendix 2). With a pre-estimated blur-kernel, we generate the final super-resolved image using a simple reconstruction-based nonblind SR method [52], regularized by the natural hyper-Laplacian image prior [68] (Fig. 17).
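The degradation model underlying this blind SR setting, namely blurring the high-res image with kernel \(k\) and then decimating by the SR factor \(s\), can be simulated as follows. This is a minimal sketch: the Gaussian kernel is merely a stand-in for the unknown \(31\times 31\) blur-kernel, the noise term is omitted, and the naive loop-based convolution is for clarity only.

```python
import numpy as np

def gaussian_kernel(size=31, sigma=2.0):
    """Normalized 2-D Gaussian blur-kernel (stand-in for the true kernel)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, k):
    """'Same'-size 2-D convolution with zero padding.  The kernel is
    assumed symmetric, so correlation and convolution coincide."""
    kh, kw = k.shape
    pad = np.pad(img, ((kh // 2,), (kw // 2,)))
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * pad[i:i + img.shape[0], j:j + img.shape[1]]
    return out

def degrade(img, k, s):
    """Blurred, low-res observation: y = (x * k) decimated by factor s."""
    return blur(img, k)[::s, ::s]

# Example: a 64x64 high-res image with SR factor s = 2
# yields a 32x32 low-res observation.
x = np.random.default_rng(1).random((64, 64))
y = degrade(x, gaussian_kernel(), s=2)  # y.shape == (32, 32)
```

Blind SR then amounts to estimating \(k\) from \(y\) alone before any nonblind SR method is applied, which is exactly the role played by the proposed kernel estimation approach.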

Fig. 18

Nonparametric blind SR (\(\text {SR}\times 2\)) with low-res Soldier using the proposed kernel estimation method (-NE [44], -JSC [45], -ANR [46]), Michaeli and Irani [56], and nonblind SR methods [44–46]. The first image in the top row is the bicubic interpolated version of the low-res image using the Matlab function imresize

Fig. 19

Nonparametric blind SR (\(\text {SR}\times 3\)) with low-res Flower using the proposed kernel estimation method (-NE [44], -JSC [45], -ANR [46]), Michaeli and Irani [56], and nonblind SR methods [44–46]. The first image in the top row is the bicubic interpolated version of the low-res image using the Matlab function imresize

Fig. 20

Nonparametric blind SR (\(\text {SR}\times 2\)) with low-res Chip using the proposed kernel estimation method (-NE [44], -JSC [45], -ANR [46]). The first image in the top row is the bicubic interpolated version of the low-res image using the Matlab function imresize

Fig. 21

Nonparametric blind SR (\(\text {SR}\times 4\)) with low-res University using the proposed kernel estimation method (-NE [44], -JSC [45], -ANR [46]). The first image in the top row is the bicubic interpolated version of the low-res image using the Matlab function imresize

Using low-res blurred images, we compare the proposed kernel estimation method (-NE [44], -JSC [45], -ANR [46]) with the state-of-the-art nonparametric blind SR method proposed by Michaeli and Irani [56] (see footnote 5). The synthetic low-res image in Fig. 17 (\(\text {SR}\times 2\)) suffers from severe motion blur. The SR results (Proposed + [44], Proposed + [45], Proposed + [46]) corresponding to our estimated kernels are of better visual quality than those of [56] and of the state-of-the-art nonblind SR methods [44–46], showing the necessity of blur-kernel estimation for achieving high-performance SR reconstruction. In fact, [56] fails in this example, producing a smooth Gaussian-like blur-kernel. The low-res images corresponding to Figs. 18, 19, 20 and 21 are real. In Fig. 18 (\(\text {SR}\times 2\)), the proposed kernel estimation method (-NE [44], -JSC [45], -ANR [46]) again performs better, producing clearer images without notable ringing artifacts. In the super-resolved image corresponding to [56], however, remarkable ringing artifacts can be observed, because the blur-kernel produced by [56] has a larger support than the true one [52]. We also see that the proposed kernel estimation method is robust, to some degree, to the choice of nonblind SR method, even though it is empirically shown in [46] that ANR performs comparably to or better than JSC on average while NE is worse than JSC in terms of reconstruction quality. In Fig. 19 (\(\text {SR}\times 3\)), the proposed kernel estimation method achieves performance comparable to [56], despite the visually different kernels they produce, and their SR images are naturally better than those of the nonblind SR methods. In Fig. 20 (\(\text {SR}\times 2\)) and Fig. 21 (\(\text {SR}\times 4\)), another two groups of nonparametric blind SR results are provided for the proposed kernel estimation method (-NE [44], -JSC [45], -ANR [46]), validating its effectiveness and robustness.
In the above-mentioned experiments, the bicubic interpolated images are also provided for comparison.

6 Conclusion

This paper provides a unified optimization perspective on single- and multi-observation blur-kernel estimation, thereby bridging the gap between the two highly related problems. Specifically, blur-kernel estimation, whether single-shot or multi-observation, is formulated as an \({\varvec{\fancyscript{l}}}_{{0.5}}\)-norm-regularized negative log-marginal-likelihood minimization problem, which couples the VB and MAP principles and imposes a universal, three-layer hierarchical prior on the latent sharp image and a Gamma hyper-prior on each inverse noise variance. By borrowing ideas from EM, MM, MFA, and IRLS, all the unknowns of interest, including the sharp image, each inverse noise variance, the blur-kernel, and the other relevant model parameters, are estimated automatically. Experimental results on Levin et al.'s [8] benchmark real-world motion-blurred images as well as on simulated multiple-blurred observations validate the effectiveness and superiority of the proposed kernel estimation approach for camera-shake deblurring. The proposed kernel estimation approach is also applied to nonparametric single-image blind SR, achieving comparable or even better performance than the state-of-the-art method recently proposed by Michaeli and Irani [56].