1 Introduction

1.1 Background and Motivation

Sparse representation of signals/images has been widely studied by the signal and image processing communities in the past decades. Historically, sparsity dates back to the idea of variable selection studied by statisticians in the late 1970s (Hocking 1976) and the coring operator invented by RCA researchers in the early 1980s (Carlson et al. 1985). The birth of wavelets (Daubechies 1988) (a.k.a. filter bank theory (Vetterli 1986) or multi-resolution analysis (Mallat 1989)) in the late 1980s rapidly sparked interest in sparse representation, which has found successful applications in image coding (Shapiro 1993; Said and Pearlman 1996; Taubman and Marcellin 2001) and denoising (Donoho and Johnstone 1994; Mihcak and Ramchandran 1999; Chang et al. 2000). Under the framework of sparse coding, much research has centered on two related issues: basis functions (or dictionaries) and statistical modeling of sparse coefficients. Exemplary studies of the former are the construction of directional multiresolution representations (e.g., contourlets (Do and Vetterli 2005)) and over-complete dictionary learning from training data (e.g., K-SVD (Aharon et al. 2012; Elad and Aharon 2012), multiscale dictionary learning (Mairal et al. 2008), online dictionary learning (Mairal et al. 2009a) and non-parametric Bayesian dictionary learning (Zhou et al. 2009)); the latter includes the use of Gaussian mixture models (Zoran and Weiss 2011; Yu et al. 2012), variational Bayesian models (Ji et al. 2008; Wipf et al. 2011), universal models (Ramirez and Sapiro 2012), and the centralized Laplacian model (Dong et al. 2013b) for sparse coefficients.

More recently, a class of nonlocal image restoration techniques (Buades et al. 2005; Dabov et al. 2007; Mairal et al. 2009b; Katkovnik et al. 2010; Dong et al. 2011, 2013a, b) has attracted increasing attention. The key motivation lies in the observation that many important structures in natural images, including edges and textures, can be characterized by an abundance of self-repeating patterns. This observation has led to the formulation of simultaneous sparse coding (SSC) (Mairal et al. 2009b). Our own recent works (Dong et al. 2011, 2013a) have continued to demonstrate the potential of exploiting SSC in image restoration. However, a fundamental question remains open: how to achieve (local) spatial adaptation within the framework of nonlocal image restoration? This question is related to the issue of regularization parameters in a variational setting or shape parameters in a Bayesian one; the issue becomes even thornier when it is entangled with nonlocal regularization/sparsity. To the best of our knowledge, how to tune those parameters in a principled manner remains an open problem (e.g., please refer to (Ramirez and Sapiro 2012) and its references for a survey of recent advances).

In this work, we propose a new image model named SSC–GSM that connects the Gaussian scale mixture (GSM) with SSC. Our idea is to model each sparse coefficient as a Gaussian variable with a positive scaling variable and to impose a sparse prior (i.e., the Jeffrey prior (Box and Tiao 2011) in this work) over the positive scaling variables. We show that the maximum a posteriori (MAP) estimates of both sparse coefficients and scaling variables can be efficiently calculated by alternating minimization. By characterizing the set of sparse coefficients of similar patches with the same prior distribution (i.e., the same non-zero means and positive scaling variables), we can effectively exploit both local and nonlocal dependencies among the sparse coefficients, which have been shown to be important for image restoration applications (Dabov et al. 2007; Mairal et al. 2009b; Dong et al. 2011). Our experimental results show that SSC–GSM based image restoration can deliver images whose subjective and objective qualities are often higher than those of other competing methods. The visual quality improvements are attributed to better preservation of edge sharpness and suppression of undesirable artifacts in the restored images.

1.2 Relationship to Other Competing Approaches

The connection between sparse coding and Bayesian inference has been previously studied in sparse Bayesian learning (Tipping 2001; Wipf and Rao 2004, 2007) and, more recently, in Bayesian compressive sensing (Ji et al. 2008) and latent variable Bayesian models for promoting sparsity (Wipf et al. 2011). Despite offering a generic theoretical foundation as well as promising results, the Bayesian inference techniques along this line of research often involve potentially expensive sampling (e.g., only approximate solutions for some choices of prior are achieved in Wipf et al. (2011)). By contrast, our SSC–GSM approach is conceptually much simpler and admits analytical solutions involving iterative shrinkage/filtering operators only. Other works closely related to the proposed SSC–GSM are group sparse coding with a Laplacian scale mixture (LSM) (Garrigues and Olshausen 2010) and fields of GSMs (Lyu and Simoncelli 2009). In (Garrigues and Olshausen 2010), an LSM model with a Gamma distribution imposed over the scaling variables was used to model the sparse coefficients, and approximate estimates of the scale variables were obtained using the Expectation-Maximization (EM) algorithm. Note that the scale variables derived in the LSM (Garrigues and Olshausen 2010) are very similar to the weights derived in reweighted \(l_1\)-norm minimization (Candes et al. 2008). In contrast to those approaches, a GSM model with nonzero means and a noninformative sparse prior imposed over the scaling variables is used to model the sparse coefficients here. Instead of using the EM algorithm for an approximate solution, our SSC–GSM offers efficient inference of both scaling variables and sparse coefficients via an alternating optimization method. In Lyu and Simoncelli (2009), a field of GSM (FoGSM) model was constructed using the product of two independent homogeneous Gaussian Markov random fields (hGMRFs) to exploit the dependencies among adjacent blocks. Despite similar motivations to exploit the dependencies between the scaling variables, the techniques used in Lyu and Simoncelli (2009) are significantly different and require far more computation than our SSC–GSM formulation.

The proposed work is related to non-parametric Bayesian dictionary learning (Zhou et al. 2012) and to solving inverse problems with piecewise linear estimators (Yu et al. 2012), but ours is motivated by the connection between the nonlocal SSC strategy and the local parametric GSM model. When compared with previous works on image denoising (e.g., K-SVD denoising (Elad and Aharon 2012), spatially adaptive singular-value thresholding (Dong et al. 2013a) and the Expected Patch Log Likelihood (EPLL) (Zoran and Weiss 2011)), SSC–GSM targets a more general framework that combines adaptive sparse inference with dictionary learning. SSC–GSM based image deblurring has also been experimentally shown to be superior to existing patch-based methods (e.g., Iterative Decoupled Deblurring BM3D (IDD-BM3D) (Danielyan et al. 2012) and nonlocally centralized sparse representation (NCSR) (Dong et al. 2013b)), and the gain in terms of ISNR is as much as \(1.5\) dB over IDD-BM3D (Danielyan et al. 2012) for some test images (e.g., the \(butterfly\) image; please refer to Fig. 5).

The remainder of this paper is organized as follows. In Sect. 2, we formulate the sparse coding problem with the GSM model and generalize it into the new SSC–GSM model. In Sect. 3, we elaborate on the details of how to solve SSC–GSM by alternating minimization and emphasize the analytical solutions for both subproblems. In Sect. 4, we study the application of SSC–GSM to image restoration and discuss efficient implementation of SSC–GSM based image restoration algorithms. In Sect. 5, we report our experimental results in image denoising, image deblurring and image super-resolution as supporting evidence for the effectiveness of the SSC–GSM model. In Sect. 6, we draw conclusions about the relationship of sparse coding to image restoration and offer perspectives on future directions.

2 Simultaneous Sparse Coding via Gaussian Scale Mixture Modeling

2.1 Sparse Coding with Gaussian Scale Mixture Model

The basic idea behind sparse coding is to represent a signal \(\varvec{x}\in {\mathbb {R}}^{n}\) (\(n\) is the size of image patches) as a linear combination of basis vectors (dictionary elements), \({\mathbf {D}}\varvec{\alpha }\), where \({\mathbf {D}}\in {\mathbb {R}}^{n\times K}, n\le K\) is the dictionary and the coefficients \(\varvec{\alpha }\in {\mathbb {R}}^{K}\) satisfy some sparsity constraint. In view of the challenge posed by \(l_0\)-optimization, it has been suggested that the original nonconvex optimization be replaced by its \(l_1\)-counterpart:

$$\begin{aligned} \varvec{\alpha }= {\mathop {\hbox {argmin }}\limits _{\varvec{\alpha }}}\Vert \varvec{x}- {\mathbf {D}}\varvec{\alpha }\Vert _{2}^{2} + \lambda \Vert \varvec{\alpha }\Vert _{1}, \end{aligned}$$
(1)

which is convex and easier to solve. Solving the \(l_{1}\)-norm minimization problem corresponds to the MAP inference of \(\varvec{\alpha }\) with an independent and identically distributed (i.i.d.) Laplacian prior \(P(\alpha _i)=\frac{1}{2\theta _i}e^{-\frac{|\alpha _i|}{\theta _i}}\), wherein \(\theta _i\) denotes the standard deviation of \(\alpha _i\). It is easy to verify that the regularization parameter should be set as \(\lambda _i=2\sigma _n^2/\theta _i\) when the i.i.d. Laplacian prior is used, where \(\sigma _n^2\) denotes the variance of the approximation errors. In practice, the parameters \(\theta _i\) of each \(\alpha _i\) are unknown and may be difficult to estimate accurately from the observation \(\varvec{x}\), considering that real signals/images are non-stationary and may be degraded by noise and blur.
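
For completeness, the stated choice of \(\lambda _i\) follows from a one-line calculation (constants independent of \(\varvec{\alpha }\) are dropped):

$$\begin{aligned} \hat{\varvec{\alpha }}&= {\mathop {\hbox {argmax }}\limits _{\varvec{\alpha }}} \log P(\varvec{x}|\varvec{\alpha })+\log P(\varvec{\alpha }) = {\mathop {\hbox {argmin }}\limits _{\varvec{\alpha }}} \frac{1}{2\sigma _n^2}\Vert \varvec{x}-{\mathbf {D}}\varvec{\alpha }\Vert _2^2+\sum _i\frac{|\alpha _i|}{\theta _i} \\&= {\mathop {\hbox {argmin }}\limits _{\varvec{\alpha }}} \Vert \varvec{x}-{\mathbf {D}}\varvec{\alpha }\Vert _2^2+\sum _i\frac{2\sigma _n^2}{\theta _i}|\alpha _i|, \end{aligned}$$

which is Eq. (1) with per-coefficient weights \(\lambda _i=2\sigma _n^2/\theta _i\).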

In this paper we propose to model the sparse coefficients \(\varvec{\alpha }\) with a GSM (Andrews and Mallows 1974) model. The GSM model decomposes the coefficient vector \(\varvec{\alpha }\) into the point-wise product of a Gaussian vector \(\varvec{\beta }\) and a vector of hidden scalar multipliers \(\varvec{\theta }\), i.e., \(\alpha _i=\theta _i \beta _i\), where \(\theta _i\) is a positive scaling variable with probability \(P(\theta _i)\). Conditioned on \(\theta _i\), a coefficient \(\alpha _i\) is Gaussian with standard deviation \(\theta _i\). Assuming that the \(\theta _i\) are i.i.d. and independent of \(\beta _i\), the GSM prior of \(\varvec{\alpha }\) can be expressed as

$$\begin{aligned} P(\varvec{\alpha })=\prod _i P(\alpha _i), ~P(\alpha _i) = \int _{0}^{\infty }P(\alpha _i|\theta _i)P(\theta _i)d\theta _i. \end{aligned}$$
(2)

As a family of probability distributions, the GSM model contains many kurtotic distributions (e.g., the Laplacian, generalized Gaussian, and Student's t-distributions) given an appropriate \(P(\theta _{i})\).
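
As a quick numerical sanity check of this heavy-tailed behavior, the following sketch draws samples from a GSM with an arbitrarily chosen Gamma-distributed scale (for illustration only, not the Jeffrey prior used later) and verifies that the excess kurtosis is positive, i.e., heavier-tailed than a Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# GSM sample: alpha_i = theta_i * beta_i with beta_i ~ N(0, 1) and a positive
# scale theta_i; a Gamma-distributed scale is used here purely for illustration.
theta = rng.gamma(shape=1.0, scale=1.0, size=n)
beta = rng.standard_normal(n)
alpha = theta * beta

# Excess kurtosis is 0 for a Gaussian; any non-degenerate GSM gives a positive value.
excess_kurtosis = np.mean(alpha**4) / np.mean(alpha**2) ** 2 - 3.0
print(f"excess kurtosis of the GSM samples: {excess_kurtosis:.2f}")
```

Indeed, for any non-degenerate scale prior the kurtosis of a GSM satisfies \(E[\alpha _i^4]/E[\alpha _i^2]^2 = 3E[\theta _i^4]/E[\theta _i^2]^2 \ge 3\) by Jensen's inequality, so the marginal is never lighter-tailed than a Gaussian.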

Note that for most choices of \(P(\theta _i)\) there is no analytical expression for \(P(\alpha _i)\), and thus it is difficult to compute the MAP estimates of \(\alpha _i\). However, this difficulty can be overcome by joint estimation of \((\alpha _i, \theta _i)\). For a given observation \(\varvec{x}={\mathbf {D}}\varvec{\alpha }+\varvec{n}\), where \(\varvec{n}\sim N(0,\sigma _{n}^{2})\) denotes additive Gaussian noise, we can formulate the following MAP estimator

$$\begin{aligned} (\varvec{\alpha },\varvec{\theta })&= {{\mathrm{argmax}}}\log P(\varvec{x}|\varvec{\alpha },\varvec{\theta }) P(\varvec{\alpha },\varvec{\theta })\nonumber \\&= {{\mathrm{argmax}}}\log P(\varvec{x}|\varvec{\alpha })+\log P(\varvec{\alpha }|\varvec{\theta }) + \log P(\varvec{\theta }), \end{aligned}$$
(3)

where \(P(\varvec{x}|\varvec{\alpha })\) is the likelihood term characterized by a Gaussian function with variance \(\sigma _n^2\). The prior term \(P(\varvec{\alpha }|\varvec{\theta })\) can be expressed as

$$\begin{aligned} P(\varvec{\alpha }|\varvec{\theta }) \!=\! \prod _iP(\alpha _i|\theta _i)\!=\!\prod _i\frac{1}{\theta _i\sqrt{2\pi }}\exp \Big (-\frac{(\alpha _i-\mu _i)^2}{2\theta _i^2}\Big ).\nonumber \\ \end{aligned}$$
(4)

Instead of assuming a zero mean (\(\mu _i=0\)), we propose to use a biased (nonzero) mean \(\mu _i\) for \(\alpha _i\) here, inspired by our previous work on NCSR (Dong et al. 2013b) (its estimation will be elaborated later).

The adoption of the GSM model allows us to generalize sparsity from the statistical modeling of the sparse coefficients \(\varvec{\alpha }\) to the specification of a sparse prior \(P(\varvec{\theta })\). It has been suggested in the literature that the noninformative prior (Box and Tiao 2011) \(P(\theta _i) \propto \frac{1}{\theta _i}\) (a.k.a. Jeffrey's prior) is often a favorable choice. Therefore, we have also adopted this option in this work, which translates Eq. (3) into

$$\begin{aligned} (\varvec{\alpha },\varvec{\theta })&= {\mathop {\hbox {argmin }}\limits _{\varvec{\alpha },\varvec{\theta }}}\frac{1}{2\sigma _n^2}\Vert \varvec{x}-{\mathbf {D}}\varvec{\alpha }\Vert _2^2+\sum _i \log (\theta _i\sqrt{2\pi }) \nonumber \\&+\sum _i\frac{(\alpha _i-\mu _i)^2}{2\theta _i^2}+\sum _i \log \theta _i, \end{aligned}$$
(5)

where we have used \(P(\varvec{\theta })=\prod _i P(\theta _i)\). Note that Jeffrey's prior is unstable as \(\theta _i\rightarrow 0\); we therefore replace \(\log \theta _i\) by \(\log (\theta _i+\epsilon )\), where \(\epsilon \) is a small positive number introduced for numerical stability, and write \(\sum _i \log (\theta _i+\epsilon )\) as \(\log (\varvec{\theta }+\epsilon )\) for notational simplicity. Multiplying the objective by \(2\sigma _n^2\), dropping constants, and merging the two logarithmic terms, the above equation can be further translated into the following sparse coding problem

$$\begin{aligned} (\varvec{\alpha },\varvec{\theta })&= {\mathop {\hbox {argmin }}\limits _{\varvec{\alpha },\varvec{\theta }}} \Vert \varvec{x}-{\mathbf {D}}\varvec{\alpha }\Vert _2^2+ 4\sigma _n^2 \log (\varvec{\theta }+\epsilon ) \nonumber \\&+\sigma _n^2\sum _i \frac{(\alpha _i-\mu _i)^2}{\theta _i^2}. \end{aligned}$$
(6)

Note that the matrix form of the original GSM model is \(\varvec{\alpha }=\varLambda \varvec{\beta }\) and \(\varvec{\mu }=\varLambda \varvec{\gamma }\), where \(\varLambda =diag(\theta _i)\in {\mathbb {R}}^{K \times K}\) is a diagonal matrix characterizing the variance field of a chosen image patch. Accordingly, the sparse coding problem in Eq. (6) can be translated from the \((\varvec{\alpha },\varvec{\mu })\) domain to the \((\varvec{\beta },\varvec{\gamma })\) domain as follows

$$\begin{aligned} (\varvec{\beta },\varvec{\theta })&= {\mathop {\hbox {argmin }}\limits _{\varvec{\beta },\varvec{\theta }}} \Vert \varvec{x}-{\mathbf {D}}\varLambda \varvec{\beta }\Vert _{2}^2+4\sigma _{n}^{2} \log (\varvec{\theta }+\epsilon ) \nonumber \\&+ \sigma _{n}^{2}\Vert \varvec{\beta }-\varvec{\gamma }\Vert _{2}^{2}. \end{aligned}$$
(7)

In other words, the sparse coding formulation of the GSM model boils down to the joint estimation of \(\varvec{\beta }\) and \(\varvec{\theta }\). But unlike Portilla et al. (2003), which treats the multiplier as a hidden variable and integrates it out (i.e., in the derivation of the Bayes least-squares estimate), we explicitly use the field of Gaussian scalar multipliers to characterize the variability of and dependencies among local variances. This sparse coding formulation of the GSM model is appealing because it allows us to further exploit the power of GSM by connecting it with structured sparsity, as we will detail next.

2.2 Exploiting Structured Sparsity for the Estimation of the Field of Scalar Multipliers

A key observation behind our approach is that, for a collection of similar patches, their corresponding sparse coefficients \(\varvec{\alpha }\) should be characterized by the same prior, i.e., a probability density function with the same \(\varvec{\theta }\) and \(\varvec{\mu }\). Therefore, if one considers the SSC of GSM models for a collection of \(m\) similar patches, the structured/group sparsity based extension of Eq. (7) can be written as

$$\begin{aligned} ({\mathbf {B}},\varvec{\theta })&= {\mathop {\hbox {argmin }}\limits _{{\mathbf {B}},\varvec{\theta }}} \Vert {\mathbf {X}}-{\mathbf {D}}\varLambda {\mathbf {B}}\Vert _F^2+4\sigma _n^2 \log (\varvec{\theta }+\epsilon ) \nonumber \\&+\sigma _{n}^{2} \Vert {\mathbf {B}}-\varGamma \Vert _F^2, \end{aligned}$$
(8)

where \({\mathbf {X}}=[\varvec{x}_1,...,\varvec{x}_m]\) denotes the collection of \(m\) similar patches, \({\mathbf {A}}=\varLambda {\mathbf {B}}\) is the group representation of the GSM model for the sparse coefficients, and the corresponding first-order and second-order statistics are characterized by \(\varGamma =[\varvec{\gamma }_{1},...,\varvec{\gamma }_{m}]\in {\mathbb {R}}^{K\times m}\) and \({\mathbf {B}}=[\varvec{\beta }_{1},...,\varvec{\beta }_{m}]\in {\mathbb {R}}^{K\times m}\) respectively, wherein \(\varvec{\gamma }_j=\varvec{\gamma }, j=1,2,\ldots ,m\). Given a collection of \(m\) similar patches, we adopt the nonlocal means approach (Buades et al. 2005) for estimating \(\varvec{\mu }\):

$$\begin{aligned} \varvec{\mu }=\sum _{j=1}^m w_j \varvec{\alpha }_j, \end{aligned}$$
(9)

where \(w_j \propto \exp (-\Vert \varvec{x}-\varvec{x}_j\Vert _2^2/h)\) is the weighting coefficient based on patch similarity. It follows from \(\varvec{\mu }=\varLambda \varvec{\gamma }\) that

$$\begin{aligned} \varvec{\gamma }=\sum _{j=1}^m w_j \varvec{\varLambda }^{-1}\varvec{\alpha }_j=\sum _{j=1}^m w_j \varvec{\beta }_j. \end{aligned}$$
(10)

A practical issue with Eq. (10) is that the original patches \(\varvec{x}_j\) and \(\varvec{x}\), as well as the sparse codes \(\varvec{\alpha }_j\) and \(\varvec{\alpha }\), are not available, and thus we cannot directly compute \(\varvec{\gamma }\) using Eq. (10). To circumvent this difficulty, we can treat \(\varvec{\gamma }\) as another optimization variable and jointly estimate it with the sparse coefficients as

$$\begin{aligned} ({\mathbf {B}},\varvec{\theta },\varvec{\gamma })&= {\mathop {\hbox {argmin }}\limits _{{\mathbf {B}},\varvec{\theta },\varvec{\gamma }}} \Vert {\mathbf {X}}-{\mathbf {D}}\varLambda {\mathbf {B}}\Vert _F^2+4\sigma _n^2 \log (\varvec{\theta }+\epsilon ) \nonumber \\&+ \sigma _n^2 \Vert {\mathbf {B}}-\varGamma \Vert _F^2, \;\; \text{ s. } \text{ t. } \;\varvec{\gamma }={\mathbf {B}}\varvec{w}, \end{aligned}$$
(11)

where the weights \(\varvec{w}=[w_1,\ldots ,w_m]^T\) are pre-computed using an initial estimate of the image. Using the alternating direction method of multipliers (ADMM) (Boyd et al. 2011), Eq. (11) can be approximately solved. However, the computational complexity of solving the sub-problem for \(\varvec{\gamma }\) is high. Alternatively, we can overcome this difficulty by iteratively estimating \(\varvec{\gamma }\) from the current estimates of the sparse coefficients without sacrificing performance. Let \(\varvec{\beta }_j=\hat{\varvec{\beta }}_j+\varvec{e}_j\), wherein \(\varvec{e}_j\) denotes the estimation error of \(\varvec{\beta }_j\) and is assumed to be zero-mean Gaussian. Then, Eq. (10) can be re-expressed as

$$\begin{aligned} \varvec{\gamma }=\sum _{j=1}^m w_j \hat{\varvec{\beta }}_j+\sum _{j=1}^m w_j\varvec{e}_j = \hat{\varvec{\gamma }} + \varvec{n}_w, \end{aligned}$$
(12)

where \(\varvec{n}_w\) denotes the estimation error of \(\varvec{\gamma }\). Since the \(\varvec{e}_j\) are assumed to be zero-mean Gaussian, their weighted average \(\varvec{n}_w\) is small. Thus, \(\varvec{\gamma }\) can be readily obtained from the estimates of the representation coefficients \(\varvec{\beta }_j\). In practice, we recursively compute \(\varvec{\gamma }\) from the previous estimates of \(\varvec{\beta }_j\) after each iteration, as sketched below.
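
A minimal sketch of this recursive update of \(\varvec{\gamma }\) (function and variable names are illustrative; the bandwidth \(h\) merely stands in for the value used in our implementation):

```python
import numpy as np

def estimate_gamma(x_hat, patches_hat, betas_hat, h=2500.0):
    """Recursive estimate of gamma from current estimates, cf. Eqs. (9), (10) and (12).

    x_hat       : (n,)   current estimate of the exemplar patch
    patches_hat : (m, n) current estimates of the m similar patches
    betas_hat   : (m, K) current estimates of the coefficient vectors beta_j
    """
    # Patch-similarity weights w_j ~ exp(-||x - x_j||_2^2 / h), normalized to sum to one.
    dists = np.sum((patches_hat - x_hat) ** 2, axis=1)
    w = np.exp(-dists / h)
    w /= w.sum()
    # gamma = sum_j w_j * beta_j: a weighted nonlocal average of the coefficients.
    return betas_hat.T @ w
```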

We call the new formulation in Eq. (8) Simultaneous Sparse Coding via Gaussian Scale Mixture (SSC–GSM) and develop a computationally efficient solution to this problem in the next section. Note that the formulation of SSC–GSM in Eq. (8) is for a given dictionary \({\mathbf {D}}\). However, the dictionary \({\mathbf {D}}\) can also be optimized for a fixed pair \(({\mathbf {B}}, \varvec{\theta })\), such that both dictionary learning and statistical modeling of sparse coefficients can be unified within the framework of Eq. (8). Figure 1 shows the denoising results of the two proposed methods that use the models of Eqs. (7) and (8), respectively. Both variants use the same dictionary \({\mathbf {D}}\) and the same \(\varvec{\gamma }\). From Fig. 1 we can see that the proposed method without the nonlocal extension already outperforms the K-SVD method (Elad and Aharon 2012). With the nonlocal extension, the denoising performance of the proposed method is significantly improved.

Fig. 1
figure 1

Denoising performance comparison between the variants of the proposed method. (a) Noisy image (\(\sigma _n=20\)); (b) KSVD (Elad and Aharon 2012) (PSNR = 29.90 dB); (c) the proposed method (without nonlocal extension) uses the model of Eq.  (7) (PSNR = 30.18 dB); (d) the proposed SSC–GSM method (PSNR = 30.84 dB)

3 Solving Simultaneous Sparse Coding via Alternating Minimization

In this section, we show how to solve the optimization problem in Eq. (8) by alternately updating the estimates of \({\mathbf {B}}\) and \(\varvec{\theta }\). The key observation is that the two subproblems - minimizing over \({\mathbf {B}}\) for a fixed \(\varvec{\theta }\) and minimizing over \(\varvec{\theta }\) for a fixed \({\mathbf {B}}\) - can both be solved efficiently. Specifically, both subproblems admit closed-form solutions when the dictionary is orthogonal.

3.1 Solving \(\varvec{\theta }\) for a Fixed \({\mathbf {B}}\)

For a fixed \({\mathbf {B}}\), the first subproblem simply becomes

$$\begin{aligned} \varvec{\theta }={\mathop {\hbox {argmin }}\limits _{\varvec{\theta }}}\Vert {\mathbf {X}}-{\mathbf {D}}\varLambda {\mathbf {B}}\Vert _F^2 +4\sigma _{n}^{2} \log (\varvec{\theta }+\epsilon ), \end{aligned}$$
(13)

which can be rewritten as

$$\begin{aligned} \varvec{\theta }&= {\mathop {\hbox {argmin }}\limits _{\varvec{\theta }}} \Vert {\mathbf {X}}-\sum _{i=1}^{K}\varvec{d}_i\varvec{\beta }^{i}\theta _i\Vert _F^2+4\sigma _n^2 \log (\varvec{\theta }+\epsilon ) \nonumber \\&= {\mathop {\hbox {argmin }}\limits _{\varvec{\theta }}} \Vert \tilde{\varvec{x}} - \tilde{{\mathbf {D}}} \varvec{\theta }\Vert _{2}^{2} +4\sigma _{n}^{2}\log (\varvec{\theta }+\epsilon ), \end{aligned}$$
(14)

where the long vector \(\tilde{\varvec{x}}\in {\mathbb {R}}^{nm}\) denotes the vectorization of the matrix \({\mathbf {X}}\), the matrix \(\tilde{{\mathbf {D}}}=[\tilde{\varvec{d}}_1, \tilde{\varvec{d}}_2, \ldots , \tilde{\varvec{d}}_K]\in {\mathbb {R}}^{mn\times K}\) is such that its \(i\)-th column \(\tilde{\varvec{d}}_i\) is the vectorization of the rank-one matrix \(\varvec{d}_i\varvec{\beta }^i\), and \(\varvec{\beta }^i\in {\mathbb {R}}^{m}\) denotes the \(i\)-th row of the matrix \({\mathbf {B}}\). For optimizing the nonconvex \(\log \) penalty in Eq. (14), the principled difference-of-convex-functions (DC) programming approach can be used to obtain a local minimum (Gasso et al. 2009). It has also been shown in Candes et al. (2008) that the \(\log \) penalty can be linearly approximated, so that a local minimum of the nonconvex objective function can be obtained by iteratively solving a weighted \(l_1\)-penalized optimization problem.

However, the optimization of Eq. (13) can be much simplified when the dictionary \({\mathbf {D}}\) is orthogonal (e.g., DCT or PCA basis). In the case of orthogonal dictionary, Eq. (13) can be rewritten as

$$\begin{aligned} \varvec{\theta }={\mathop {\hbox {argmin }}\limits _{\varvec{\theta }}} \Vert {\mathbf {A}}-\varLambda {\mathbf {B}}\Vert _F^2+4\sigma _n^2 \log (\varvec{\theta }+\epsilon ), \end{aligned}$$
(15)

where we have used \({\mathbf {X}}={\mathbf {D}}{\mathbf {A}}\). For convenience, we can rewrite Eq. (15) as

$$\begin{aligned} \varvec{\theta }= {\mathop {\hbox {argmin }}\limits _{\varvec{\theta }}} \sum _{i}\left( a_i\theta _i^2+b_i\theta _i+c\log (\theta _i+\epsilon )\right) , \end{aligned}$$
(16)

where \(a_i=\Vert \varvec{\beta }^i\Vert _2^2\), \(b_i=-2\varvec{\alpha }^i(\varvec{\beta }^i)^T\) and \(c=4\sigma _n^2\), with \(\varvec{\alpha }^i\) denoting the \(i\)-th row of \({\mathbf {A}}\). Hence, Eq. (16) can be decomposed into a sequence of scalar minimization problems, i.e.,

$$\begin{aligned} \theta _i = {\mathop {\hbox {argmin }}\limits _{\theta _i}} a_i\theta _i^2+b_i\theta _i+c\log (\theta _i+\epsilon ), \end{aligned}$$
(17)

which can be solved by taking \(\frac{df(\theta _i)}{d\theta _i}=0\), where \(f(\theta _i)\) denotes the right hand side of Eq.  (17). We derive

$$\begin{aligned} g(\theta _i) = \frac{df(\theta _i)}{d\theta _i} = 2a_i\theta _i+b_i+\frac{c}{\theta _i+\epsilon }. \end{aligned}$$
(18)

By solving \(g(\theta _i)=0\), we obtain the following two stationary points of \(f(\theta _i)\), i.e.,

$$\begin{aligned} \theta _{i,1}=-\frac{b_i}{4a_i}+\sqrt{\frac{b_i^2}{16a_i^2}-\frac{c}{2a_i}},\quad \theta _{i,2}=-\frac{b_i}{4a_i}-\sqrt{\frac{b_i^2}{16a_i^2}-\frac{c}{2a_i}}, \end{aligned}$$
(19)

when \(b_i^2/(16a_i^2)-c/(2a_i)\ge 0\). Then, the global minimizer of \(f(\theta _i)\) can be obtained by comparing \(f(0)\), \(f(\theta _{i,1})\) and \(f(\theta _{i,2})\).

When \(b_i^2/(16a_i^2)-c/(2a_i)<0\), there are no stationary points of \(f(\theta _i)\) in the range \([0, \infty )\). As \(\epsilon \) is a small positive constant, \(g(0)=b_i+c/\epsilon \) is always positive, so \(f(\theta _i)\) is increasing on \([0,\infty )\). Thus, \(\theta _i=0\) is the global minimizer in this case. In summary, the solution to Eq. (17) can be written as

$$\begin{aligned} \theta _{i}= \left\{ \begin{array}{l@{\quad }l} 0, &{} \text {if}\; b_i^2/(16a_i^2)-c/(2a_i)<0, \\ v_i, &{} \text {otherwise} \end{array} \right. \end{aligned}$$
(20)

where

$$\begin{aligned} v_{i}={\mathop {\hbox {argmin }}\limits _{\theta \in \{0,\,\theta _{i,1},\,\theta _{i,2}\}}} f(\theta ). \end{aligned}$$
(21)
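
For reference, the closed-form update of Eqs. (17)-(21) can be implemented in a few lines; the sketch below additionally discards stationary points falling outside \([0,\infty )\), and assumes \(a_i>0\):

```python
import numpy as np

def solve_theta(a, b, c, eps=1e-8):
    """Minimize f(t) = a*t^2 + b*t + c*log(t + eps) over t >= 0, cf. Eq. (17).
    Assumes a > 0 (a_i = ||beta^i||_2^2) and c > 0 (c = 4*sigma_n^2)."""
    def f(t):
        return a * t * t + b * t + c * np.log(t + eps)

    candidates = [0.0]                                  # theta = 0 is always a candidate
    disc = b * b / (16.0 * a * a) - c / (2.0 * a)       # discriminant of Eq. (19)
    if disc >= 0:
        root = np.sqrt(disc)
        for t in (-b / (4.0 * a) + root, -b / (4.0 * a) - root):
            if t > 0:                                   # keep only feasible stationary points
                candidates.append(t)
    return min(candidates, key=f)                       # Eqs. (20)-(21)
```

Calling solve_theta with \(a_i=\Vert \varvec{\beta }^i\Vert _2^2\), \(b_i=-2\varvec{\alpha }^i(\varvec{\beta }^i)^T\) and \(c=4\sigma _n^2\) for each \(i\) yields the update of Eq. (20).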

3.2 Solving \({\mathbf {B}}\) for a Fixed \(\varvec{\theta }\)

The second subproblem is in fact even easier to solve. It takes the following form

$$\begin{aligned} {\mathbf {B}}={{\mathrm{argmin}}}\Vert {\mathbf {X}}-{\mathbf {D}}\varvec{\varLambda }{\mathbf {B}}\Vert _{F}^{2}+\sigma _{n}^{2} \Vert {\mathbf {B}}-\varGamma \Vert _{F}^{2}. \end{aligned}$$
(22)

Since both terms are quadratic (\(l_2\)), the closed-form solution to Eq. (22) is essentially the classical Wiener filter

$$\begin{aligned} {\mathbf {B}}=\left( \hat{{\mathbf {D}}}^T \hat{{\mathbf {D}}}+\sigma _n^2{\mathbf {I}}\right) ^{-1}\left( \hat{{\mathbf {D}}}^T {\mathbf {X}}+\sigma _n^2\varGamma \right) , \end{aligned}$$
(23)

where \(\hat{{\mathbf {D}}}={\mathbf {D}}\varvec{\varLambda }\). Note that when \({\mathbf {D}}\) is orthogonal, Eq. (23) can be further simplified into

$$\begin{aligned} {\mathbf {B}}=\left( \varvec{\varLambda }^{T} \varvec{\varLambda }+\sigma _{n}^{2}{\mathbf {I}}\right) ^{-1}\left( \varvec{\varLambda }^{T} {\mathbf {A}}+\sigma _{n}^{2}\varGamma \right) , \end{aligned}$$
(24)

where \(\varvec{\varLambda }^{T} \varvec{\varLambda }+\sigma _{n}^{2}{\mathbf {I}}\) is a diagonal matrix and therefore its inverse can be easily computed.

By alternately solving the two subproblems of Eqs. (13) and (22) for the estimates of \(\varvec{\varLambda }\) and \({\mathbf {B}}\), the image data matrix \({\mathbf {X}}\) can then be reconstructed as

$$\begin{aligned} \hat{{\mathbf {X}}}={\mathbf {D}}\hat{\varvec{\varLambda }}\hat{{\mathbf {B}}}, \end{aligned}$$
(25)

where \(\hat{\varvec{\varLambda }}\) and \(\hat{{\mathbf {B}}}\) denote the final estimates of \(\varvec{\varLambda }\) and \({\mathbf {B}}\).
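
With an orthogonal dictionary, the \({\mathbf {B}}\)-update of Eq. (24) and the reconstruction of Eq. (25) reduce to element-wise operations in the transform domain; a minimal sketch (names are illustrative):

```python
import numpy as np

def update_B_and_reconstruct(D, A, theta, Gamma, sigma_n):
    """Eq. (24): diagonal Wiener update of B, followed by Eq. (25): X_hat = D Lambda B.
    D is an orthogonal dictionary, A = D^T X, theta is the (K,) field of scalar multipliers."""
    Lam = theta[:, None]                                              # diag(theta), broadcast over the m columns
    B = (Lam * A + sigma_n ** 2 * Gamma) / (Lam ** 2 + sigma_n ** 2)  # element-wise Wiener filter, Eq. (24)
    X_hat = D @ (Lam * B)                                             # reconstruction, Eq. (25)
    return B, X_hat
```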

4 Application of Bayesian Structured Sparse Coding to Image Restoration

In the previous sections, we have seen how to solve the SSC–GSM problem for a single image data matrix \({\mathbf {X}}\) (a collection of image patches similar to a chosen exemplar). In this section, we generalize this formulation to whole-image reconstruction and study the applications of SSC–GSM to image restoration, including image denoising, image deblurring and image super-resolution. The standard image degradation model is used here: \(\varvec{y}={\mathbf {H}}\varvec{x}+\varvec{w}\), where \(\varvec{x}\in {\mathbb {R}}^N\) and \(\varvec{y}\in {\mathbb {R}}^M\) denote the original and degraded images respectively, \({\mathbf {H}}\in {\mathbb {R}}^{M\times N}\) is the degradation matrix, and \(\varvec{w}\) is additive white Gaussian noise distributed as \(N(0,\sigma _n^2)\). The whole-image reconstruction problem can be formulated as

$$\begin{aligned} (\varvec{x},\{{\mathbf {B}}_l\},\{\varvec{\theta }_l\})&= {\mathop {\hbox {argmin }}\limits _{\varvec{x},\{{\mathbf {B}}_l\},\{\varvec{\theta }_l\}}} \Vert \varvec{y}-{\mathbf {H}}\varvec{x}\Vert _2^2 \nonumber \\&+\,\sum _{l=1}^L \{\eta \Vert \tilde{{\mathbf {R}}}_l\varvec{x}-{\mathbf {D}}\varvec{\varLambda }_l{\mathbf {B}}_l\Vert _F^2 \nonumber \\&+\,\sigma _n^2 \Vert {\mathbf {B}}_l-\varGamma _l\Vert _F^2 +4\sigma _n^2\log (\varvec{\theta }_l+\epsilon ) \}, \end{aligned}$$
(26)

where \(\tilde{{\mathbf {R}}}_l\varvec{x}\doteq [{\mathbf {R}}_{l_1}\varvec{x}, {\mathbf {R}}_{l_2}\varvec{x}, \ldots , {\mathbf {R}}_{l_m}\varvec{x}]\in {\mathbb {R}}^{n\times m}\) denotes the data matrix formed by the group of image patches similar to the \(l\)-th exemplar patch \(\varvec{x}_l\) (including \(\varvec{x}_l\) itself), \({\mathbf {R}}_l\in {\mathbb {R}}^{n\times N}\) denotes the matrix extracting the \(l\)-th patch \(\varvec{x}_l\) from \(\varvec{x}\), and \(L\) is the total number of exemplars extracted from the reconstructed image \(\varvec{x}\). For a given exemplar patch \(\varvec{x}_l\), we search for similar patches by performing a \(k\)-nearest-neighbor search within a large local window (e.g., \(40\times 40\)). As the original image is not available, we use the current estimate of the original image for patch matching, i.e.,

$$\begin{aligned} S = \{l_j |~\Vert \hat{\varvec{x}}_l - \hat{\varvec{x}}_{l_j}\Vert _2^2 < T\}, \end{aligned}$$
(27)

where \(T\) denotes a pre-selected threshold and \(S\) denotes the collection of positions of the similar patches. Alternatively, we can form the sample set \(S\) by selecting the \(m\) patches (\(m=40\) in our implementation) closest to \(\hat{\varvec{x}}_l\), as sketched below. Invoking the principle of alternating optimization again, we propose to solve the whole-image reconstruction problem in Eq. (26) by alternating between the solutions to the following two subproblems.
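
A minimal sketch of this grouping step (an exhaustive search within the local window; the window size, patch size and \(m\) mirror the values quoted above but are otherwise illustrative):

```python
import numpy as np

def group_similar_patches(img, top_left, patch=6, window=40, m=40):
    """Collect the m patches most similar to the exemplar at `top_left`, searched
    exhaustively within a (window x window) neighborhood, cf. Eq. (27)."""
    i0, j0 = top_left
    ref = img[i0:i0 + patch, j0:j0 + patch].ravel()
    H, W = img.shape
    i_lo, i_hi = max(0, i0 - window // 2), min(H - patch, i0 + window // 2)
    j_lo, j_hi = max(0, j0 - window // 2), min(W - patch, j0 + window // 2)
    positions, vecs, dists = [], [], []
    for i in range(i_lo, i_hi + 1):
        for j in range(j_lo, j_hi + 1):
            p = img[i:i + patch, j:j + patch].ravel()
            positions.append((i, j))
            vecs.append(p)
            dists.append(np.sum((p - ref) ** 2))
    order = np.argsort(dists)[:m]                    # the m nearest neighbors (exemplar included)
    X = np.stack([vecs[k] for k in order], axis=1)   # n x m data matrix used in Eq. (8)
    return [positions[k] for k in order], X
```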

4.1 Solving \(\varvec{x}\) for a Fixed \(\{{\mathbf {B}}_l\},\{\varvec{\theta }_l\}\)

Let \(\hat{{\mathbf {X}}}_l={\mathbf {D}}\varvec{\varLambda }_l{\mathbf {B}}_l\). When \(\{{\mathbf {B}}_l\}\) and \(\{\varvec{\theta }_l\}\) are fixed, so are \(\{\hat{{\mathbf {X}}}_l\}\). Therefore, Eq. (26) reduces to the following \(l_2\)-optimization problem

$$\begin{aligned} \varvec{x}={\mathop {\hbox {argmin }}\limits _{\varvec{x}}}\Vert \varvec{y}-{\mathbf {H}}\varvec{x}\Vert _2^2+\sum _{l=1}^L\eta \Vert \tilde{{\mathbf {R}}}_l\varvec{x}-\hat{{\mathbf {X}}}_l\Vert _F^2, \end{aligned}$$
(28)

which admits the following closed-form solution

$$\begin{aligned} \varvec{x}\!=\!\left( {\mathbf {H}}^T{\mathbf {H}}\!+\!\eta \sum _{l=1}^L\tilde{{\mathbf {R}}}_l^T\tilde{{\mathbf {R}}}_l\right) ^{-1} \left( {\mathbf {H}}^T\varvec{y}+\eta \sum _{l=1}^L\tilde{{\mathbf {R}}}_l^T\hat{{\mathbf {X}}}_l \right) , \end{aligned}$$
(29)

where \(\tilde{{\mathbf {R}}}_l^T\tilde{{\mathbf {R}}}_l \doteq \sum _{j=1}^{m}{\mathbf {R}}_{l_j}^T{\mathbf {R}}_{l_j}\), \(\tilde{{\mathbf {R}}}_l^T\hat{{\mathbf {X}}}_l \doteq \sum _{j=1}^{m}{\mathbf {R}}_{l_j}^T\hat{\varvec{x}}_{l_j}\), and \(\hat{\varvec{x}}_{l_j}\) denotes the \(j\)-th column of the matrix \(\hat{{\mathbf {X}}}_l\). Note that for the image denoising application, where \({\mathbf {H}}={\mathbf {I}}\), the matrix to be inverted in Eq. (29) is diagonal, and its inverse can be computed easily. Similar to the K-SVD approach, Eq. (29) can then be computed by weighted averaging of the reconstructed patch sets \(\hat{{\mathbf {X}}}_l\), as sketched below. For image deblurring and super-resolution applications, Eq. (29) can be computed using a conjugate gradient (CG) algorithm.
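
For the denoising case (\({\mathbf {H}}={\mathbf {I}}\)), the matrix in Eq. (29) is diagonal and the update reduces to per-pixel weighted averaging; a minimal sketch (assuming each group carries the top-left positions of its patches, with illustrative names):

```python
import numpy as np

def aggregate(y, groups, patch=6, eta=0.1):
    """Eq. (29) with H = I: per-pixel weighted average of the noisy image y and
    all reconstructed patches covering each pixel.  `groups` is a list of
    (positions, X_hat) pairs, where positions are top-left patch coordinates and
    X_hat is the n x m matrix of reconstructed patches of one exemplar."""
    num = y.astype(float).copy()        # H^T y term
    den = np.ones_like(num)             # H^T H = I term
    for positions, X_hat in groups:
        for col, (i, j) in enumerate(positions):
            num[i:i + patch, j:j + patch] += eta * X_hat[:, col].reshape(patch, patch)
            den[i:i + patch, j:j + patch] += eta
    return num / den
```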

4.2 Solving \(\{{\mathbf {B}}_l\},\{\varvec{\theta }_l\}\) for a Fixed \(\varvec{x}\)

When \(\varvec{x}\) is fixed, the first term in Eq. (26) becomes a constant, and the subproblem boils down to a sequence of patch-level SSC–GSM problems, one for each exemplar - i.e.,

$$\begin{aligned} ({\mathbf {B}}_l,\varvec{\theta }_l)&= {\mathop {\hbox {argmin }}\limits _{{\mathbf {B}}_l,\varvec{\theta }_l}} \Vert {\mathbf {X}}_l-{\mathbf {D}}\varvec{\varLambda }_l{\mathbf {B}}_l\Vert _F^2 + \frac{\sigma _n^2}{\eta } \Vert {\mathbf {B}}_l-\varGamma _l\Vert _F^2 \nonumber \\&+ \frac{4\sigma _n^2}{\eta }\log (\varvec{\theta }_l+\epsilon ), \end{aligned}$$
(30)

where we use \({\mathbf {X}}_l=\tilde{{\mathbf {R}}}_l\varvec{x}\). This is exactly the problem we have studied in the previous section.

One important issue in SSC–GSM-based image restoration is the selection of the dictionary. To adapt to the local image structures, instead of learning an over-complete dictionary for each dataset \({\mathbf {X}}_l\) as in Mairal et al. (2009b), we learn a principal component analysis (PCA) based dictionary for each dataset (similar to NCSR (Dong et al. 2013b)). The use of an orthogonal dictionary greatly simplifies the Bayesian inference of SSC–GSM. Putting things together, a complete image restoration procedure based on SSC–GSM can be summarized as Algorithm 1 below.

In Algorithm 1 we update \({\mathbf {D}}_l\) only every \(k_0\) iterations to reduce the computational cost. We also found that Algorithm 1 empirically converges even when the inner loop executes only one iteration (i.e., \(J=1\)). We note that the above algorithm can lead to a variety of implementations depending on the choice of the degradation matrix \({\mathbf {H}}\). When \({\mathbf {H}}\) is an identity matrix, Algorithm 1 is an image denoising algorithm using an iterative regularization technique (Xu and Osher 2007). When \({\mathbf {H}}\) is a blur matrix or a downsampled blur matrix, Eq. (26) becomes the standard formulation of the non-blind image deblurring or image super-resolution problem. The capability of capturing rapidly-changing statistics in natural images - e.g., through the use of GSM - can make patch-based nonlocal image models even more powerful.

figure a

5 Experimental Results

In this section, we report our experimental results of applying SSC–GSM based image restoration to image denoising, image deblurring and super-resolution. The experimental setup of this work is similar to that in our previous work on NCSR (Dong et al. 2013b). The basic parameter setting of SSC–GSM is as follows: patch size \(6\times 6\); number of similar blocks \(K=44\); \(k_{max}=14, k_0=1\) for image denoising, and \(k_{max}=450, k_0=40\) for image deblurring and super-resolution. The regularization parameter \(\eta \) is set empirically; its values are shown in Table 1. To evaluate the quality of the restored images, both PSNR and SSIM (Wang et al. 2004) metrics are used. Due to limited page space, we can only show part of the experimental results in this paper. More detailed comparisons and complete experimental results are available at the following website: http://see.xidian.edu.cn/faculty/wsdong/SSC_GSM.htm.

Table 1 Parameters setting for each experiment
Table 2 The PSNR (dB) results by different denoising methods

5.1 Image Denoising

We have compared the SSC–GSM based image denoising method against three current state-of-the-art methods: BM3D image denoising with shape-adaptive PCA (BM3D-SAPCA) (Katkovnik et al. 2010), an enhanced version of BM3D denoising (Dabov et al. 2007) in which local spatial adaptation is achieved by shape-adaptive PCA; learned simultaneous sparse coding (LSSC) (Mairal et al. 2009b); and nonlocally centralized sparse representation (NCSR) denoising (Dong et al. 2013b). As can be seen from Table 2, the proposed SSC–GSM achieves denoising performance highly competitive with the other leading algorithms. For the collection of 12 test images, BM3D-SAPCA and SSC–GSM are mostly the best two performing methods - on average, SSC–GSM falls behind BM3D-SAPCA by less than \(0.2\) dB for three out of six noise levels but delivers at least comparable results for the other three. We note that the complexity of BM3D-SAPCA is much higher than that of the original BM3D; by contrast, our pure Matlab implementation of the SSC–GSM algorithm (without any C-coded optimization) still runs reasonably fast. It takes around 20 s to denoise a \(256\times 256\) image on a PC with an Intel i7-2600 processor at 3.4 GHz.

Figures 2 and 3 show visual comparisons of the denoising results for two typical images (\(lena\) and \(house\)) at moderate (\(\sigma _n=20\)) and heavy (\(\sigma _n=100\)) noise levels respectively. It can be observed from Fig. 2 that BM3D-SAPCA and SSC–GSM deliver the best visual quality at the moderate noise level; by contrast, the restored images by LSSC and NCSR both suffer from noticeable artifacts, especially around the smooth areas close to the hat. When the noise contamination is severe, the superiority of SSC–GSM over the other competing approaches is easier to see - as shown in Fig. 3, SSC–GSM achieves the most visually pleasant restoration of the house image, especially when one inspects the zoomed portions of the roof regions closely.

Fig. 2
figure 2

Denoising performance comparison on the Lena image with moderate noise corruption. (a) Original image; (b) Noisy image (\(\sigma _{n}=20\)); denoised images by (c) BM3D-SAPCA (Katkovnik et al. 2010) (PSNR = 33.20 dB, SSIM = 0.8803); (d) LSSC (Mairal et al. 2009b) (PSNR = 32.88 dB, SSIM = 0.8742); (e) NCSR (Dong et al. 2013b) (PSNR = 32.92 dB, SSIM = 0.8760); (f) Proposed SSC–GSM (PSNR = 33.08 dB, SSIM = 0.8787)

Fig. 3
figure 3

Denoising performance comparison on the House image with strong noise corruption. (a) Original image; (b) Noisy image (\(\sigma _n=100\)); denoised images by (c) BM3D-SAPCA (Katkovnik et al. 2010) (PSNR = 35.20 dB, SSIM = 0.6767); (d) LSSC (Mairal et al. 2009b) (PSNR = 25.63 dB, SSIM = 0.7389); (e) NCSR (Dong et al. 2013b) (PSNR = 25.65 dB, SSIM = 0.7434); (f) Proposed SSC–GSM (PSNR = 26.70 dB, SSIM = 0.7430)

5.2 Image Deblurring

We have also compared SSC–GSM based image deblurring with three other competing approaches in the literature: constrained total-variation image deblurring (denoted by FISTA) (Beck and Teboulle 2009), Iterative Decoupled Deblurring BM3D (IDD-BM3D) (Danielyan et al. 2012), and nonlocally centralized sparse representation (NCSR) (Dong et al. 2013b). Note that IDD-BM3D and NCSR are two recently developed state-of-the-art non-blind image deblurring approaches. In our comparative study, two commonly-used blur kernels are considered, i.e., \(9\times 9\) uniform blur and 2D Gaussian blur with standard deviation 1.6; the blurred images are further corrupted by additive white Gaussian noise with \(\sigma _n=\sqrt{2}\). Table 3 includes the PSNR/SSIM comparison results for a collection of 11 images among the four competing methods. It can be observed that SSC–GSM clearly outperforms the other three for 10 out of 11 images (the only exception is the \(house\) image, for which IDD-BM3D slightly outperforms SSC–GSM by \(0.13\) dB). The gains are most impressive for the \(butterfly\) and \(barbara\) images, which contain abundant strong edges or textures. One possible explanation is that SSC–GSM is capable of striking a better tradeoff between exploiting local and nonlocal dependencies within those images.

Table 3 PSNR(dB) and SSIM results of the deblurred images

Figures 4, 5 and 6 show visual comparisons of the deblurring results for three test images: \(starfish\), \(butterfly\) and \(barbara\) respectively. For \(starfish\), it can be observed that IDD-BM3D and NCSR achieve deblurred images of similar quality (both noticeably better than FISTA); the image restored by SSC–GSM is arguably the most faithful to the original, even though its PSNR gain over NCSR is modest (0.30 dB; see Fig. 4). For \(butterfly\) and \(barbara\), the visual quality improvements achieved by SSC–GSM are readily observable - SSC–GSM is capable of both preserving the sharpness of edges and suppressing undesirable artifacts. These experimental findings clearly suggest that the SSC–GSM model is a stronger prior for the class of photographic images containing strong edges/textures.

Fig. 4
figure 4

Deblurring performance comparison on the Starfish image. (a) Original image; (b) Noisy and blurred image (\(9\times 9\) uniform blur, \(\sigma _n=\sqrt{2}\)); deblurred images by (c) FISTA (Beck and Teboulle 2009) (PSNR = 27.75 dB, SSIM=0.8200); (d) IDD-BM3D (Danielyan et al. 2012) (PSNR = 29.48  dB, SSIM=0.8640); (e) NCSR (Dong et al. 2013b) (PSNR = 30.28 dB, SSIM = 0.8807); (f) Proposed SSC–GSM (PSNR = 30.58 dB, SSIM = 0.8862)

Fig. 5
figure 5

Deblurring performance comparison on the Butterfly image. (a) Original image; (b) Noisy and blurred image (\(9\times 9\) uniform blur, \(\sigma _n=\sqrt{2}\)); deblurred images by (c) FISTA (Beck and Teboulle 2009) (PSNR = 28.37 dB, SSIM=0.9058); (d) IDD-BM3D (Danielyan et al. 2012) (PSNR = 29.21  dB, SSIM=0.9216); (e) NCSR (Dong et al. 2013b) (PSNR = 29.68 dB, SSIM = 0.9273); (f) Proposed SSC–GSM (PSNR = 30.45 dB, SSIM = 0.9377)

Fig. 6
figure 6

Deblurring performance comparison on the Barbara image. (a) Original image; (b) Noisy and blurred image (Gaussian blur, \(\sigma _n=\sqrt{2}\)); deblurred images by (c) FISTA (Beck and Teboulle 2009) (PSNR = 25.03 dB, SSIM = 0.7377); (d) IDD-BM3D (Danielyan et al. 2012) (PSNR = 27.19 dB, SSIM = 0.8231); (e) NCSR (Dong et al. 2013b) (PSNR = 27.91 dB, SSIM = 0.8304); (f) Proposed SSC–GSM (PSNR = 28.42 dB, SSIM = 0.8462)

5.3 Image Superresolution

In our study of image super-resolution, simulated LR images are generated by first applying a \(7\times 7\) uniform blur to the HR image, then down-sampling the blurred image by a factor of 3 along each dimension, and finally adding white Gaussian noise with \(\sigma _{n}^{2}=25\) to the LR image. For color images, we work with the luminance channel only; the simple bicubic interpolation method is applied to upsample the chrominance channels. Table 4 includes the PSNR/SSIM comparison for a set of 9 test images among four competing approaches. It can be seen that SSC–GSM outperforms the others in most situations. The visual quality comparisons shown in Figs. 7 and 8 also justify the superiority of SSC–GSM over the other SR techniques.

Fig. 7
figure 7

Image super-resolution performance comparison on the Plant image (scaling factor 3, \(\sigma _n=0\)). (a) Original image; (b) Low-resolution image; reconstructed images by (c) TV (Marquina and Osher 2008) (PSNR = 31.34 dB, SSIM = 0.8797); (d) Sparsity-based (Yang et al. 2010) (PSNR = 31.55 dB, SSIM = 0.8964); (e) NCSR (Dong et al. 2013b) (PSNR = 34.00 dB, SSIM = 0.9369); (f) Proposed SSC–GSM (PSNR = 34.33 dB, SSIM = 0.9236)

Fig. 8
figure 8

Image super-resolution performance comparison on the Hat image (scaling factor 3, \(\sigma _n=5\)). (a) Original image; (b) Low-resolution image; reconstructed images by (c) TV (Marquina and Osher 2008) (PSNR = 28.13 dB, SSIM = 0.7701); (d) Sparsity-based (Yang et al. 2010) (PSNR = 28.31 dB, SSIM = 0.7212); (e) NCSR (Dong et al. 2013b) (PSNR = 29.94 dB, SSIM = 0.8238); (f) Proposed SSC–GSM (PSNR = 30.21 dB, SSIM = 0.8354)

Table 4 PSNR(dB) and SSIM results(luminance components) of the reconstructed HR images

5.4 Running Time

The proposed SSC–GSM algorithm was implemented in Matlab. The running time of the proposed method, with comparison to other competing methods, is reported in Table 5. The LSSC method is implemented on a different platform, and thus we do not report its running time. For image denoising, the proposed SSC–GSM method is about 6–9 times faster than the NCSR and BM3D-SAPCA methods. For image deblurring and super-resolution, the proposed SSC–GSM method is much slower than the other competing methods, as it requires more than four hundred iterations. Since the patch grouping and the SSC for each exemplar patch can be implemented in parallel, the proposed SSC–GSM method could be substantially sped up by using parallel computation techniques (e.g., GPU). Another way to accelerate the proposed method is to improve the convergence speed of Algorithm 1, which we leave as future work.

Table 5 Running time (sec) and the number of iterations (in parenthesis) of the test methods on a \(256\times 256\) test image on Intel Core i7-3770 CPU

6 Conclusions

In this paper, we have proposed a new image model named SSC–GSM that connects SSC with GSM and explored its applications to image restoration. The proposed SSC–GSM model characterizes both the biased means (as in NCSR) and the spatially-varying variances (as in GSM) of sparse coefficients. It is shown that the formulated SSC–GSM problem can be decomposed, via alternating minimization, into two subproblems, both of which admit closed-form solutions when an orthogonal basis is used. When applied to image restoration, SSC–GSM leads to computationally efficient algorithms involving iterative shrinkage/filtering only.

Extensive experimental results have shown that SSC–GSM can both preserve the sharpness of edges and suppress undesirable artifacts more effectively than other competing approaches. This work clearly shows the importance of spatial adaptation regardless of whether the underlying image model is local or nonlocal; in fact, local variation and nonlocal invariance are two sides of the same coin - one has to take both into account in image modeling.

In addition to image restoration, SSC–GSM can also be further studied along the line of dictionary learning. In our current implementation, we use the PCA basis because it facilitates the derivation of analytical solutions. For a non-unitary dictionary, we can solve the SSC–GSM problem by reducing it to an iterative reweighted \(l_1\)-minimization problem (Candes et al. 2008). It is also possible to incorporate the dictionary \({\mathbf {D}}\) into the optimization problem formulated in Eq. (5); from this perspective, we can view SSC–GSM as a generalization of the K-SVD algorithm. Joint optimization of the dictionary and the sparse coefficients is a more difficult problem and deserves further study. Finally, it is interesting to explore the relationship of SSC–GSM to ideas in Bayesian nonparametrics (Polson and Scott 2010; Zhou et al. 2012) as well as to the idea of integrating over hidden variables as in BLS-GSM (Portilla et al. 2003).