1 Introduction

Image super-resolution (SR) is a flourishing branch of image processing concerned with the particular issue of image resolution enhancement. Given one or more low-resolution (LR) observations of the same scene, SR reconstructs a visually pleasing high-resolution (HR) output (i.e., one containing more image details) [30], which can be a crucial preprocessing step in a wide range of digital imaging applications such as medical diagnosis, remote sensing, and vehicle license plate recognition [53], to name but a few. This technique thus provides a far more economical and promising way to transcend the inherent limitations of LR optical imaging systems than upgrading sensor manufacturing technology, making it one of the most appealing research areas for image processing experts.

Numerous SR algorithms have been proposed over the last three decades or so. Depending on the domain employed, SR methods can be divided into two families: frequency domain approaches and spatial domain approaches. The initial research on SR [5, 24, 25, 42] belongs to the former. Although frequency domain approaches are theoretically and computationally simple, they apply only to a pure translational motion model and are, by construction, extremely susceptible to model errors, which severely limits their prevalence [33]. For this reason, more and more researchers reconsidered the SR issue in the spatial domain. The diverse spatial domain approaches can in turn be roughly classified into two categories according to the number of LR inputs required, i.e., multiple-image SR and single-image SR. For the multiple-input case, a variety of approaches have been developed from different points of view, such as iterative back projection (IBP) based methods [18, 36], projection onto convex sets (POCS) based methods [1, 37], maximum a posteriori (MAP) estimation based methods [23, 43], regularization based methods [20, 26, 31, 32, 52, 54, 58], etc. However, it has been pointed out that the performance of these methods degrades dramatically when (a) the number of LR inputs is inadequate; (b) the motion estimate is imprecise; or (c) the scale factor increases [2, 21, 27, 45, 46].

The aforementioned limitations can be overcome by exploring the other type of SR, i.e., single image SR (SISR). Clearly, SISR is the extreme case of SR in which there is only one LR observation. Because the input information is so insufficient, most of the proposed methods build on the basic concept of Freeman et al. [22]: the high frequency details lost in an LR image can be predicted, or hallucinated, by learning the co-occurrence relationship between LR patches and their corresponding HR patches extracted from a training image set; hence they are called example based or learning based methods. One group of them is the neighbor embedding (NE) based methods [10, 29], first explored by Chang et al. These methods rest on the machine learning assumption that small patches in LR and HR images form two manifolds lying in two distinct feature spaces but with similar local geometric structures. Thus, the SR output can be estimated by a weighted summation of the K nearest HR neighbors found in the corresponding HR training database. However, the heavy running time of this approach was largely ignored, despite being very important for real applications. Timofte et al. [39-41] incorporated the idea of learning and precomputing a set of sparse dictionaries [55] (projection matrices) into the basic NE framework to accelerate its computation.

Another group of learning based methods, the one most relevant to this paper, derives from the theory of sparse representation, which holds that most natural images are in fact sparse or compressible when represented in a proper basis [7]. The representative work on sparse representation based SISR was proposed by Yang et al. [49, 50]. In their papers, a universal pair of HR and LR dictionaries is first learned from raw patches extracted randomly from training images; then a sparse coding process is applied, in raster-scan order, to the overlapping patches sampled from the input LR image to obtain the sparse coefficients. Finally, the SR output is recovered by averaging all the overlapping HR patches produced by the product of the HR dictionary and the coefficients. This scheme was shown to achieve state-of-the-art results at the time, but obtaining two large dictionaries by random sampling is very time-consuming. Hence, Zeyde et al. [55] put forward an improved method in which dimensionality reduction is first applied to the raw patches to accelerate the subsequent dictionary learning. To exploit the priors of the training patches, clustering techniques were also introduced into the framework of sparse representation based SISR. For instance, Yang et al. [51] and Dong et al. [12] both use the K-means algorithm to cluster the raw training patches into dozens of groups, from which multiple dictionaries can be learned. The superiority of their multiple dictionaries over a universal dictionary is experimentally validated.

Even though dictionary learning is an important part of sparse representation based SISR, and the progress above was indeed made by studying it, more recent work [13, 14, 28, 34, 35, 47, 48] indicates that the accuracy of the sparse coefficients produced in the sparse coding process matters even more to SISR performance. However, due to the complexity of the image degeneration model, recovering the ideal coefficients as precisely as possible remains a challenging task. Several pioneering studies have already addressed this aspect; for example, Peleg et al. [34] suggested a statistical prediction model in which a more accurate set of HR coefficients is predicted from the corresponding LR ones via a minimum mean-square error estimator. Moreover, Dong et al. [14] proposed a nonlocally centralized sparse representation (NCSR) model in which the calculated coefficients are additionally centralized to a set of good estimates obtained by exploiting the nonlocal similarity within the observed image. Both models improve greatly as a result, and the latter has even provided the leading SISR performance to date.

However, in light of the discovery [57] that similarities exist not only among columns but also among rows when a cluster of similar image patches is arranged in matrix form, we believe the capacity of the NCSR model is limited, as it considers the column similarity only while ignoring that among rows. Thus, in this paper we propose an enhanced SISR model based on bidirectionally aligned sparse representation (BASR). In our model, a pair of regularization terms is first created by exploiting both the column and row similarities (i.e., bidirectional similarities). Then, after the sparse coding process, the roughly calculated sparse coefficients are simultaneously aligned to this pair of terms in order to compensate for the errors caused by image noise and degradation, and consequently to increase the accuracy of the sparse coefficients and the SISR performance. Furthermore, for more rapid convergence, the fast iterative shrinkage-thresholding algorithm (FISTA) [3] is adopted in this paper instead of the conventional iterative thresholding algorithm (ITA) [11]. Extensive experiments demonstrate that the proposed BASR model outperforms its recent counterparts in terms of both visual quality and numerical evaluations.

The rest of this paper is organized as follows: Section 2 formalizes the sparse representation based SISR problem. Section 3 presents the proposed BASR model for the SISR issue and its implementation in detail. Experimental results and analysis are given in Section 4, while Section 5 concludes the paper.

2 Problem formulation

The goal of SISR can be regarded as recovering the underlying HR image as precisely as possible from only one LR input. For a comprehensive analysis, the first step is to set up a suitable single image degradation model that relates the original HR image to the observed, degraded LR image.

Assume that X is an ideal HR image, while Y is the corresponding LR image of the same scene. Both are lexicographically rearranged into vector form, i.e., X ∈ ℝ^N, Y ∈ ℝ^M, where N > M, r² = N/M, and r is the scale factor. The degradation can be typically described as [38]

$$ \boldsymbol{Y}=\boldsymbol{SBX}+\boldsymbol{V}=\boldsymbol{HX}+\boldsymbol{V} $$
(1)

where S : ℝ^N → ℝ^M is the down-sampling operator, B : ℝ^N → ℝ^N is the blurring operator, V ∈ ℝ^M is the additive noise, and H : ℝ^N → ℝ^M is the degradation operator, which can be viewed as the composition of S and B.
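As a minimal illustration of model (1), the degradation can be sketched in NumPy. This is only a sketch: the Gaussian kernel width and noise level below are assumptions borrowed from the experimental settings of Section 4, and the boundary handling of the blur is left at the library default.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(X, scale=3, sigma=1.6, noise_std=0.0, rng=None):
    """Sketch of the degradation Y = SBX + V: blur, down-sample, add noise."""
    blurred = gaussian_filter(X, sigma=sigma)        # blurring operator B
    Y = blurred[::scale, ::scale]                    # down-sampling operator S
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        Y = Y + rng.normal(0.0, noise_std, Y.shape)  # additive noise V
    return Y
```

For instance, degrading a 12 × 12 image with scale factor r = 3 yields a 4 × 4 LR output, consistent with r² = N/M.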

Clearly, the fundamental constraint of SISR is that the recovered image should approximately reproduce the LR observation after the same degradation is imposed on it. Nevertheless, since too much information is discarded during the high-to-low acquisition process, the linear eq. (1) is seriously underdetermined, i.e., infinitely many solutions satisfy (1). In order to obtain an appropriate solution, the researchers in [17, 49, 50] set up the initial framework of sparse representation based SISR, which adds the sparsity prior and the local-to-global reconstruction concept to the basic constraint. To be specific, suppose that the operator R_i : ℝ^N → ℝ^n extracts the i-th patch of size \( \sqrt{n}\times \sqrt{n} \) from an image of length N and vectorizes it; the i-th patch of X can then be expressed as x_i = R_i X. Given the corresponding observed LR patch y_i, each patch x_i can be sparsely represented by the formula \( {\widehat{\boldsymbol{x}}}_i={\boldsymbol{D}\boldsymbol{\alpha}}_{\boldsymbol{y}, i} \), where α_{y,i} is its sparse coefficient (representation), calculated by a sparse coding operation with respect to a known and proper dictionary D

$$ {\widehat{\boldsymbol{\alpha}}}_{\boldsymbol{y}, i}= \arg \underset{{\boldsymbol{\alpha}}_i}{ \min}\left\{\ {\left\Vert {\boldsymbol{y}}_i-{\boldsymbol{HD}\boldsymbol{\alpha}}_i\right\Vert}_2^2\right.+\lambda \left.{\left\Vert {\boldsymbol{\alpha}}_i\right\Vert}_1\right\} $$
(2)

where λ is a trade-off parameter balancing the two terms. Note that the l1-norm term is a regularization term representing the sparsity prior; the original l0-norm has been relaxed to the l1-norm, which yields the same solution as long as the coefficients are sufficiently sparse, owing to the non-convex character of the l0-norm [8, 9, 16].
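The minimization in (2) can be approximated by a plain iterative shrinkage-thresholding loop. The sketch below is illustrative, not the paper's solver: the matrix A stands in for the composite operator HD, and the step size and iteration count are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the l1-norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(y, A, lam=0.1, n_iter=200):
    """Approximately solve min_a ||y - A a||_2^2 + lam ||a||_1 by ISTA."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ a - y)    # gradient of the data-fidelity term
        a = soft_threshold(a - grad / L, lam / L)
    return a
```

With an orthogonal A the iteration reduces to a single soft-thresholding of the analysis coefficients, which foreshadows the simplified coding used with the orthogonal sub-dictionaries of Section 3.3.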

Usually, we work with overlapping patches to suppress boundary artifacts. With maximal overlaps, a total of \( Q={\left(\sqrt{N}-\sqrt{n}+1\right)}^2 \) patches can be extracted. After imposing a global constraint on these patches, the optimal reconstruction of the whole HR image X can be straightforwardly computed by averaging all the obtained local patches according to [17]

$$ \widehat{\boldsymbol{X}}=\boldsymbol{D}\circ {\widehat{\boldsymbol{\alpha}}}_{\boldsymbol{Y}}\overset{\Delta}{=}{\left(\sum_{i=1}^Q{{\boldsymbol{R}}_i}^T{\boldsymbol{R}}_i\right)}^{-1}\left(\sum_{i=1}^Q{{\boldsymbol{R}}_i}^T\boldsymbol{D}{\widehat{\boldsymbol{\alpha}}}_{\boldsymbol{y}, i}\right) $$
(3)

where α_Y represents the concatenation of all sparse coefficients, and the shorthand notation "∘" is defined here for brevity in the following parts.
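The averaging in (3) amounts to pasting each HR patch back at its position, accumulating a per-pixel overlap count (the diagonal of Σ R_iᵀR_i), and dividing. A minimal sketch, assuming patches are supplied in raster-scan order:

```python
import numpy as np

def reconstruct(patches, img_shape, patch_size):
    """Average maximally overlapping patches back into an image, as in Eq. (3).

    patches: array of shape (Q, n*n) holding the HR patches D @ alpha_i
    in raster-scan order (an assumption of this sketch).
    """
    H, W = img_shape
    n = patch_size
    acc = np.zeros(img_shape)
    cnt = np.zeros(img_shape)                           # diag of sum_i R_i^T R_i
    k = 0
    for r in range(H - n + 1):
        for c in range(W - n + 1):
            acc[r:r+n, c:c+n] += patches[k].reshape(n, n)   # R_i^T (D alpha_i)
            cnt[r:r+n, c:c+n] += 1.0
            k += 1
    return acc / cnt
```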

Eqs. (2) and (3) can be combined into a more unified formulation that stands for the sparse coding process and the local-to-global reconstruction simultaneously

$$ {\widehat{\boldsymbol{\alpha}}}_{\boldsymbol{Y}}= \arg \underset{\boldsymbol{\alpha}}{ \min}\left\{{\left\Vert \boldsymbol{Y}-\boldsymbol{HD}\circ \boldsymbol{\alpha} \right\Vert}_2^2+\lambda {\left\Vert \boldsymbol{\alpha} \right\Vert}_1\right\} $$
(4)

where the first term corresponds to the data fidelity constraint, and the second one corresponds to the sparsity prior constraint.

In summary, under the basic framework of sparse representation, the ill-posed SISR problem is further regularized by the sparsity prior of patches in addition to the local and global data fidelity constraints, resulting in a proper and stable solution.

3 Proposed BASR model for SISR

In this section, we present the enhanced model, BASR, designed for handling the SISR issue in the particular circumstance where no external database is available for prior or dictionary learning. The presentation of the BASR model begins with the modeling of bidirectional similarities, which is the theoretical support of the subsequent content. On that basis, the key process, sparse coefficient alignment, can be established in both directions. Afterward, the dictionary learning method and the detailed implementation are specified in turn.

3.1 Modeling of bidirectional similarities

In this paper, the proposed bidirectional similarities consist of the row similarity and the column similarity. To construct them, the first step is to establish the similarity data matrix [57]. For each patch x_i, we search the whole HR image X for its P closest counterparts (including itself) in the sense of the Euclidean distance. By concatenating the patch and its counterparts, we obtain a matrix S_i ∈ ℝ^(n × P), namely the similarity data matrix of the i-th patch. As mentioned before, patch similarity can be found not only among the columns of the similarity data matrix but also among its rows. Therefore, the next step is to exploit the column similarity and the row similarity, respectively, by virtue of the similarity data matrix.
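Construction of S_i can be sketched as a brute-force nearest-neighbor search over the vectorized patches. This assumes the patches are already extracted into a Q × n array; the P = 12 default mirrors the experimental settings, and no search-window restriction is applied here.

```python
import numpy as np

def similarity_matrix(patches, i, P=12):
    """Build S_i: the P patches closest to patch i (including itself),
    stacked column-wise. `patches` is a Q x n array of vectorized patches."""
    d = np.sum((patches - patches[i]) ** 2, axis=1)  # squared Euclidean distances
    idx = np.argsort(d)[:P]                          # patch i itself has distance 0
    return patches[idx].T                            # n x P similarity data matrix
```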

Column similarity was first put forward in the non-local means (NLM) algorithm [6] and applied to image denoising. However, unlike the original NLM algorithm, which amounts to a weighted average of all similar patches, we use every column of S_i as a dictionary atom to approximately represent the corresponding patch. This process can be formulated as

$$ {\boldsymbol{\beta}}_i= \arg \underset{{\boldsymbol{\beta}}_i}{ \min}\left\{{\left\Vert {\boldsymbol{x}}_i-{\boldsymbol{S}}_i{\boldsymbol{\beta}}_i\right\Vert}_2^2+{\eta}_1{\left\Vert {\boldsymbol{\beta}}_i\right\Vert}_2^2\right\} $$
(5)

where β_i is called the column similarity coefficient of the i-th patch, and η_1 is its regularization parameter.

In (5), the l2-norm of the coefficient serves as the regularization term, included to alleviate singularity in the calculation and to avoid the trivial solution. The formula is in fact of the same form as classic Tikhonov regularization, also known as ridge regression in statistics. Its explicit solution is

$$ {\boldsymbol{\beta}}_i={\left({\boldsymbol{S}}_i^T{\boldsymbol{S}}_i+{\eta}_1\boldsymbol{I}\right)}^{-1}{\boldsymbol{S}}_i^T{\boldsymbol{x}}_i $$
(6)

where I represents the identity matrix.
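The closed form (6) is a single regularized linear solve. A minimal sketch, with the default η₁ taken from the experimental settings:

```python
import numpy as np

def column_similarity_coef(S_i, x_i, eta1=0.1):
    """Closed-form ridge solution of Eq. (6):
    beta_i = (S_i^T S_i + eta1 I)^{-1} S_i^T x_i."""
    P = S_i.shape[1]
    # Solve the P x P regularized normal equations rather than inverting.
    return np.linalg.solve(S_i.T @ S_i + eta1 * np.eye(P), S_i.T @ x_i)
```

Using `solve` on the P × P system instead of forming the explicit inverse is the standard numerically preferable choice.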

To exploit row similarity, we are motivated by the concept of the piecewise autoregressive (AR) model. As a classic but powerful statistical method, the AR model has been successfully employed in other image processing applications, such as image interpolation [56] and image denoising [57]. Its key point is that if a natural image is cut into small local parts, each part can be viewed as one stationary process. In other words, natural images are piecewise stationary and can be modeled by a set of AR models.

Therefore, in this work we assume that the central pixel of each patch can be linearly represented by its neighboring pixels with coefficients calculated by an AR model. Moreover, patches belonging to the same similarity data matrix should have identical AR coefficients, as they share the same similarity. Let C be the operator extracting the central pixels from the similarity data matrix (i.e., its central row). Analogously to the column similarity, this process can be modeled as

$$ {\boldsymbol{\gamma}}_i= \arg \underset{{\boldsymbol{\gamma}}_i}{ \min}\left\{{\left\Vert {\left({\boldsymbol{CS}}_i\right)}^T-{\boldsymbol{S}}_i^T{\boldsymbol{\gamma}}_i\right\Vert}_2^2+{\eta}_2{\left\Vert {\boldsymbol{\gamma}}_i\right\Vert}_2^2\right\} $$
(7)

where γ_i is called the i-th row similarity coefficient, and the closed-form solution is given by

$$ {\boldsymbol{\gamma}}_i={\left({\boldsymbol{S}}_i{\boldsymbol{S}}_i^T+{\eta}_2\boldsymbol{I}\right)}^{-1}{\boldsymbol{S}}_i{\left({\boldsymbol{CS}}_i\right)}^T $$
(8)

With the column and row coefficients, the similarities in both directions can be calculated as \( {\left\{{\boldsymbol{S}}_i{\boldsymbol{\beta}}_i\right\}}_{i=1}^Q \) and \( {\left\{{\boldsymbol{S}}_i^T{\boldsymbol{\gamma}}_i\right\}}_{i=1}^Q \), respectively. Note that each calculated row similarity is composed of the central pixels of the corresponding similar patches, so its ordering differs from the one we require. Therefore, we first spread it out over the whole HR image and then rearrange it in the same order as α. The rearranged column and row similarities are denoted as φ_i and ψ_i, respectively; these are the proposed bidirectional similarities of this paper. Taking advantage of the bidirectional similarities allows us to construct a much more accurate sparse representation model, introduced in the following subsection.
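The row-direction solve (8) mirrors (6) with the roles of rows and columns exchanged. A sketch, where the vector c holds the central-pixel row C S_i of the similarity data matrix and η₂ defaults to the experimental value:

```python
import numpy as np

def row_similarity_coef(S_i, c, eta2=0.3):
    """Closed-form solution of Eq. (8):
    gamma_i = (S_i S_i^T + eta2 I)^{-1} S_i (C S_i)^T.
    `c` is the length-P vector of central pixels (C S_i)^T."""
    n = S_i.shape[0]
    # n x n regularized system; S_i plays the role of the AR regressor matrix.
    return np.linalg.solve(S_i @ S_i.T + eta2 * np.eye(n), S_i @ c)
```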

3.2 Bidirectional sparse coefficient alignment

As mentioned before, the accuracy of the sparse coefficients is of great significance to sparse representation based SISR. Nevertheless, model (4), which uses the sparsity prior only, may not lead to a sufficiently precise output due to the complexity of image degradation, and a performance enhancement can be expected from suppressing the sparse coding error caused by degradation and noise. Thus, in this subsection we propose an enhanced model in which the roughly computed sparse coefficients are simultaneously aligned to the bidirectional similarities proposed above.

Before constructing our model, an important step is to transfer the bidirectional similarities from the pixel domain to the sparse coefficient domain through a sparse coding process. As will be introduced in the next subsection, all the sub-dictionaries adopted in this paper are orthogonal, so the coding process simplifies to multiplying the pixel values by the transpose of the corresponding sub-dictionary. Given the sub-dictionary \( {\boldsymbol{D}}_{t_i} \) of the i-th patch, this process is formulated as \( {\boldsymbol{\varphi}}_i={\boldsymbol{D}}_{t_i}^T{\boldsymbol{\varphi}}_i^{\boldsymbol{\prime}} \) and \( {\boldsymbol{\psi}}_i={\boldsymbol{D}}_{t_i}^T{\boldsymbol{\psi}}_i^{\boldsymbol{\prime}} \). By incorporating this pair of similarities into the basic model (4) as additional regularization terms, we obtain the objective function of the proposed BASR model

$$ \widehat{\boldsymbol{\alpha}}= \arg \underset{\boldsymbol{\alpha}}{ \min}\left\{{\left\Vert \boldsymbol{Y}-\boldsymbol{HD}\circ \boldsymbol{\alpha} \right\Vert}_2^2+\lambda {\left\Vert \boldsymbol{\alpha} \right\Vert}_1\right.\left.+{\mu}_1{\left\Vert \boldsymbol{\alpha} -\boldsymbol{\varphi} \right\Vert}_p^p+{\mu}_2{\left\Vert \boldsymbol{\alpha} -\boldsymbol{\psi} \right\Vert}_p^p\right\} $$
(9)

where φ and ψ are the concatenations of all φ_i and ψ_i, respectively, representing the bidirectional similarities of α.

In our BASR model, the output sparse coefficients are thus not only sparse but also bidirectionally aligned, so that the errors caused by degradation and noise can be efficiently suppressed. Furthermore, similar to [13], a more comprehensive analysis of the sparse coding error is conducted here to illustrate its statistical property and, consequently, to determine the type of norm to be used in (9). Specifically, the test image Lena is chosen as an HR sample from which four degraded versions are generated by applying the degradations specified in subsection 4.1; that is, all four SISR scenarios are taken into consideration. Using the given sub-dictionaries, it is straightforward to calculate the difference between the ideal and estimated sparse coefficients, namely the sparse coding error, by solving (4). Note that, to be fully convincing, the parameters used here are identical to those adopted in the experimental section. The probability density functions (PDFs) of the sparse coding error under the four scenarios are plotted in Fig. 1(a)-(d) with respect to the 5th, 10th, 15th, and 20th sub-dictionary, respectively. As shown in Fig. 1, the estimated PDFs do not conform to Gaussian distributions, but they all fit Laplace distributions well. Therefore, motivated by this analysis, the l1-norm is chosen to model the sparse coding error (i.e., p = 1).

Fig. 1

PDFs of sparse coding errors of image Lena in (a) scenario 1, (b) scenario 2, (c) scenario 3 and (d) scenario 4

3.3 Dictionary learning and adaptive selection

Clearly, two key procedures remain: dictionary learning and the adaptive selection of one dictionary for each local patch. The conventional approach to dictionary learning aims at learning a universal, over-complete dictionary to code many varieties of local structures [49, 50, 55]. However, it has recently been shown that sparse coding under such dictionaries is inherently time-consuming and potentially unstable [19]. We therefore turn to another promising strategy, namely adaptive sparse domain selection (ASDS) [12].

Originally, ASDS requires an extra database of raw image patches to train on, whereas in this paper we consider the more practical situation where no external information is available. To overcome this, an alternative training database is constructed for ASDS by sampling patches from the currently estimated HR image and its down-scaled versions instead. With this training database, the specific procedure goes as follows: first, the training patches are gathered and partitioned into K clusters via K-means clustering. After applying PCA to each cluster, we obtain K orthogonal and compact sub-dictionaries that compose the final dictionary of this paper, denoted by \( {\left\{{\boldsymbol{D}}_i\right\}}_{i=1}^K \). Then, for an input patch x_i to be coded, the sub-dictionary \( {\boldsymbol{D}}_{t_i} \) belonging to the nearest cluster is selected from the overall dictionary, and the sparse coding process is greatly simplified to a matrix multiplication of the form \( {\boldsymbol{\alpha}}_i={\boldsymbol{D}}_{t_i}^T{\boldsymbol{x}}_i \). Since each patch is better represented by the adaptively selected sub-dictionary, the whole recovered image is more accurate than with a universal dictionary. Moreover, this learning and coding strategy implicitly forces the coefficients of the given patch with respect to the other sub-dictionaries to zero. That is to say, our model guarantees the local sparsity of the coefficients spontaneously, so the regularization term in (9) enforcing local sparsity can be omitted. The objective function finally becomes
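The clustering-plus-PCA learning and the one-multiplication coding can be sketched as below. This is an illustrative simplification, not the paper's exact ASDS implementation: it uses plain Lloyd iterations with a fixed iteration count, keeps the full orthogonal PCA basis per cluster, and codes patches without mean subtraction, all of which are assumptions of the sketch.

```python
import numpy as np

def learn_subdictionaries(patches, K=50, seed=0, n_iter=20):
    """K-means clustering, then per-cluster PCA giving orthogonal sub-dictionaries."""
    rng = np.random.default_rng(seed)
    cent = patches[rng.choice(len(patches), K, replace=False)]
    for _ in range(n_iter):                                  # plain Lloyd iterations
        lab = np.argmin(((patches[:, None] - cent) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(lab == k):
                cent[k] = patches[lab == k].mean(0)
    n = patches.shape[1]
    dicts = []
    for k in range(K):
        grp = patches[lab == k] - cent[k]
        if len(grp) == 0:                                    # guard an empty cluster
            dicts.append(np.eye(n))
            continue
        _, _, Vt = np.linalg.svd(grp, full_matrices=True)    # PCA via SVD
        dicts.append(Vt.T)                                   # orthogonal n x n basis
    return cent, dicts

def code_patch(x, cent, dicts):
    """Select the nearest sub-dictionary, then code by one multiplication."""
    t = int(np.argmin(((cent - x) ** 2).sum(-1)))
    return t, dicts[t].T @ x                                 # alpha_i = D_t^T x_i
```

Because each sub-dictionary is orthogonal and square, coding and decoding are exact inverses of each other, which is what makes the simplified multiplication valid.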

$$ \widehat{\boldsymbol{\alpha}}= \arg \underset{\boldsymbol{\alpha}}{ \min}\left\{{\left\Vert \boldsymbol{Y}-\boldsymbol{HD}\circ \boldsymbol{\alpha} \right\Vert}_2^2\right.\left.+{\mu}_1{\left\Vert \boldsymbol{\alpha} -\boldsymbol{\varphi} \right\Vert}_1+{\mu}_2{\left\Vert \boldsymbol{\alpha} -\boldsymbol{\psi} \right\Vert}_1\right\} $$
(10)

3.4 Summary and mathematical implementation

The proposed BASR model (10) is a hybrid optimization problem mixing the l2-norm and the l1-norm, which makes it non-smooth and leaves it without a closed-form solution. Therefore, in our implementation, the model is solved iteratively in a patchwise manner. Without loss of generality, it can be rewritten in patchwise form

$$ {\left\{{\boldsymbol{\alpha}}_i\right\}}_{i=1}^Q= \arg \underset{{\boldsymbol{\alpha}}_i}{ \min}\left\{{\left\Vert {\boldsymbol{y}}_i-{\boldsymbol{HD}}_{t_i}{\boldsymbol{\alpha}}_i\right\Vert}_2^2\right.+{\mu}_1{\left\Vert {\boldsymbol{\alpha}}_i-{\boldsymbol{\varphi}}_i\right\Vert}_1\left.+{\mu}_2{\left\Vert {\boldsymbol{\alpha}}_i-{\boldsymbol{\psi}}_i\right\Vert}_1\right\}, for\ i=1,2,\dots, Q $$
(11)

On the basis of the fast iterative shrinkage-thresholding algorithm (FISTA) [3], the local-to-global and coarse-to-fine solving procedure can be concisely expressed as follows (for more details about the whole process, please refer to Appendix A):

$$ {\left\{{\boldsymbol{\alpha}}_i^{\left( l+1\right)}\right\}}_{i=1}^Q=\rho \left({\boldsymbol{\alpha}}_i^{t mp}, L\right)=\rho \left({\boldsymbol{D}}_{t_i}^T{\boldsymbol{R}}_i{\boldsymbol{X}}^{t mp}, L\right)=\rho \left({\boldsymbol{D}}_{t_i}^T{\boldsymbol{R}}_i\cdot h\left({\boldsymbol{X}}^{(l)},{\boldsymbol{X}}^{\left( l-1\right)}\right), L\right), for\ i=1,2,\dots, Q $$
(12)

Here L is a scalar controlling the step-size magnitude, whose value can be determined by a backtracking step-size rule, and the function ρ is the shrinkage operator defined in Appendix A. Moreover, to accelerate the iteration, the temporary variable before the shrinkage operation in (12) is not computed from the result of the previous iteration only (i.e., X^(l)), but rather from a specific linear combination of the previous two results (i.e., X^(l) and X^(l−1)), which is conveyed by the function h. In summary, the detailed implementation is outlined in Algorithm 1, while the block diagram of the proposed BASR method is illustrated in Fig. 2.
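The skeleton of the FISTA update in (12) can be sketched generically: a momentum combination of the two previous iterates (the function h), a gradient step on the smooth data term, and a shrinkage. This sketch fixes the step size instead of backtracking, and its function names are illustrative, not from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(grad_f, shrink_t, x0, L, n_iter=100):
    """Generic FISTA skeleton: momentum, gradient step, then shrinkage.
    grad_f: gradient of the smooth term; shrink_t: l1 weight; L: step scale."""
    x_prev = x0.copy()
    x = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x + ((t - 1.0) / t_next) * (x - x_prev)   # h(X^(l), X^(l-1))
        x_prev = x
        x = soft_threshold(z - grad_f(z) / L, shrink_t / L)
        t = t_next
    return x
```

The extra bookkeeping (the scalar t and one stored iterate) is exactly the negligible overhead discussed below for steps (i) and (iv) of the inner loop.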

Fig. 2

Block diagram of the proposed BASR method. Note that parts B, A, and C in the above diagram correspond to the specific contents of subsections 3.1, 3.3, and 3.2 + 3.4, respectively

As shown in Algorithm 1, a nested iteration consisting of inner and outer loops is employed. The reason is that the main computational burden lies in the dictionary learning and similarity modeling procedures, and the variables involved there do not change drastically as the iteration proceeds. Therefore, these procedures are placed in the outer loop and executed only once every I_2 iterations to save computation. In addition, steps (i) and (iv) in the inner loop are the extra computation requested by FISTA; compared with the other steps, both are computationally negligible, so the algorithm retains almost the same computational burden but converges faster. The theoretical convergence of FISTA has been well established; please refer to [3, 11] for details.

4 Experimental results and analysis

In this section, a series of experiments on natural images is designed and conducted to verify the effectiveness and robustness of the proposed BASR method in comparison with eight recent state-of-the-art counterparts: SCSR [50], SLSR [55], SPSR [34], ASDS [12], NCSR [14], ANR [39], A+ [40] and SRCNN [15].

4.1 Experimental settings

First of all, thirteen generic test images of different types (see Table 1) are adopted as benchmarks; the first ten were presented in [12] while the remaining three appeared in [34]. The degradation model (1) is then applied to these test samples to generate the LR images used as inputs in the following experiments. Note that two prevalent but different degradation configurations were used among the aforementioned eight methods: the first blurs the original image with the bicubic filter and then down-samples it by a prespecified decimation factor in both the horizontal and vertical directions; the second employs a Gaussian filter instead, followed by the same down-sampling. The essential difference between the two configurations is therefore the choice of blur kernel. In this paper, both types of filter are taken into account in order to be fully convincing. To emphasize the performance influence of different blur kernels, we fix the decimation factor to 3 (the benchmark value) throughout our experiments so that the impact introduced by down-sampling is always the same. In addition, zero-mean additive Gaussian noise with standard deviation 5 can further be added to the degraded inputs to test the robustness of all methods to noise. In total, therefore, four different SISR scenarios are considered in this paper, specified as follows:

  • Scenario 1: HR images are blurred by the bicubic filter, followed by down-sampling.

  • Scenario 2: HR images are blurred by the Gaussian filter of size 7 × 7 and standard deviation 1.6, followed by down-sampling.

  • Scenario 3: The LR outputs of scenario 1, additionally contaminated by Gaussian noise.

  • Scenario 4: The LR outputs of scenario 2, additionally contaminated by Gaussian noise.

Table 1 Numerical results (PSNR and SSIM) of the reconstructed HR images in scenario 1

The other main parameter settings of this paper are chosen empirically as follows: P = 12, K = 50, η_1 = 0.1, η_2 = 0.3, I_1 = 9, I_2 = 40, and the local patches are extracted with maximal overlaps, with a patch size of 5 × 5 (i.e., n = 25). Note that, according to some preliminary tests, the performance of our BASR method is almost insensitive to the above parameters within a reasonable range, whereas the weights of the regularization terms (i.e., μ_1 and μ_2) play a much more important role. The determination of the regularization weights therefore deserves deeper consideration. To simplify it, we keep the weight of the column similarity (i.e., μ_1) fixed at the value employed in [14], and search for the optimal value of the other weight (i.e., μ_2). In this way, we can pay more attention to the complementary effect produced by the regularization terms of the BASR model rather than get stuck in parameter optimization. As a result, the weights μ_1 and μ_2 are set to 0.7 and 0.1 respectively for the noiseless experiments, while under noisy conditions they are 1 and 0.2, respectively.

Moreover, considering that the human visual system is more sensitive to changes in luminance than to those in chromaticity, the YCbCr color space is adopted for representing color images in place of RGB. All competing methods are applied to the luminance component only, while the chromaticity components are simply interpolated from the input LR image to the target HR size by bicubic interpolation. To evaluate performance qualitatively and quantitatively, the reconstructed images produced by the various methods are compared in terms of both visual quality and two widely used numerical indicators, i.e., PSNR and the SSIM index [44]. When computing both indicators, the image borders that were neglected in [15, 38, 55] are uniformly taken into consideration in this paper for a fair comparison. All experiments were performed in the MATLAB R2013b environment on a PC with a Core i5 3.2 GHz CPU and 4 GB RAM.
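For reference, the full-image PSNR used here (borders included) follows the standard definition; the sketch below assumes 8-bit images with peak value 255, and SSIM is omitted as it requires the full windowed computation of [44].

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """PSNR in dB over the full image, borders included."""
    mse = np.mean((np.asarray(ref, dtype=np.float64) -
                   np.asarray(rec, dtype=np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```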

4.2 Experimental results

In this subsection, all competing methods are implemented and evaluated under the four SISR scenarios specified above. First, let us concentrate on the first two scenarios, which correspond to noiseless cases with two different blur kernels. For scenario 1, the numerical results are reported in Table 1. Note that for each image, the upper row shows the PSNR values (in dB) while the lower row provides the SSIM indexes (dimensionless); this layout is adopted in every table of this paper. From Table 1 we can see that, in terms of quantitative assessment, the conventional sparse representation based methods, SCSR and SLSR, always give the lowest results, while the other six methods outperform them to varying degrees, mainly because each makes particular progress in either dictionary learning or the sparse coding process. Meanwhile, our proposed BASR method consistently provides the highest numerical values among all competing methods, since it not only inherits the good properties of the sparse representation based SISR framework but also benefits considerably from the proposed bidirectional sparse coefficient alignment procedure. When the testing environment turns to the second scenario, the full numerical results are listed in Table 2, from which a consistent conclusion can readily be drawn: our BASR method remains superior to all the other counterparts, including the best competitor, NCSR. To be precise, it outperforms this second best method by an average gain of 0.63 dB in PSNR and 0.0086 in SSIM. Moreover, by comparing the numerical results of scenarios 1 and 2, we notice that the differences of our method between the two scenarios (−0.01 dB for PSNR and 0.0012 for SSIM) are negligible.
That is to say, our BASR model provides virtually equal performance when the blur operator is changed from a bicubic filter to a Gaussian one, showing strong robustness to variation in blur type. Next, to further evaluate visual quality, two sets of visual comparisons on the images Leaves and Parrot under the first two scenarios are illustrated in Figs. 3 and 4, respectively. As shown in the figures, the proposed BASR method produces the most visually pleasing outputs in terms of preserving fine details and sharp edges. For instance, the stems and leaves in Fig. 3(j) are the most consistent with those of the original image Leaves shown in Fig. 3(a), while the others are blurred, twisted, or distorted to some degree. Meanwhile, the stripes around the eye in image Parrot in Fig. 4(j) are the clearest and sharpest in comparison with the other methods.

Table 2 Numerical results (PSNR and SSIM) of the reconstructed HR images in scenario 2
Fig. 3

Visual comparison on Leaves in scenario 1 with scale factor 3. a Original image, HR images reconstructed by b SCSR [50], c SLSR [55], d SPSR [34], e ASDS [12], f NCSR [14], g ANR [39], h A+ [40], i SRCNN [15] and j proposed BASR

Fig. 4

Visual comparison on Parrot in scenario 2 with scale factor 3. a Original image, HR images reconstructed by b SCSR [50], c SLSR [55], d SPSR [34], e ASDS [12], f NCSR [14], g ANR [39], h A+ [40], i SRCNN [15] and j proposed BASR

Then, considering that in practice the LR inputs of SISR are often contaminated by noise, we move on to the next two SISR scenarios to test the robustness of these methods to noise. Gaussian white noise is added to the same LR images previously used in scenarios 1 and 2 to generate the new inputs for scenarios 3 and 4, respectively, making the task more challenging. Tables 3 and 4 and Figs. 5 and 6 give the full numerical results and another two sets of visual displays of the HR images reconstructed by the various methods. In contrast to its previous performance, the SPSR method is the most sensitive to the perturbation caused by noise (especially in scenario 4), which results not only in serious noise-induced artifacts in the recovered images but also in severe declines in the numerical indicators. The reason is that a particular set of parameters in the SPSR model is trained to work efficiently under one specific condition; if the circumstances change even slightly, all parameters need to be retrained or reassigned. In other words, the SPSR method lacks robustness to noise, which is its fatal drawback. Similarly, although both the SCSR and SLSR methods can indisputably enhance the resolution of LR images, they are still liable to generate noticeable jaggy artifacts along image edges, indicating that the sparse prior alone is not enough for reliable reconstruction in the presence of noise. As for ANR, A+ and SRCNN, they do much better at reducing jaggy artifacts, but they still magnify, rather than eliminate, the unpleasant influence of miscellaneous outliers to some degree. The ASDS and NCSR methods both show an outstanding capability of recovering high-frequency components and suppressing noise; nevertheless, they tend to produce unclear and blurred local details or even occasional incorrect parts. For instance, in Fig. 6(e) the shape of the left eye of Lena is somewhat deformed, while a spurious white spot can be observed in Fig. 6(f), both of which make the HR images reconstructed by ASDS and NCSR look unnatural. On the contrary, by taking advantage of bidirectional similarities, our BASR method produces more accurate sparse coding coefficients, so that it not only suppresses noise efficiently but also preserves more delicate structures than any other counterpart. Thus, its HR outputs are still of the best visual quality, showing strong robustness against noise.
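The degradation pipeline used to build the noisy scenarios (blur, then downsampling, then additive Gaussian white noise) can be sketched as below. The blur width and noise standard deviation here are illustrative placeholders, not the paper's exact settings, and a Gaussian kernel is used as in scenario 2.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(hr, scale=3, blur_sigma=1.6, noise_std=5.0, seed=0):
    """Simulate a noisy LR input from an HR image:
    Gaussian blur -> decimation by `scale` -> Gaussian white noise.
    blur_sigma and noise_std are assumed values for illustration."""
    blurred = gaussian_filter(hr.astype(np.float64), sigma=blur_sigma)
    lr = blurred[::scale, ::scale]               # keep every `scale`-th pixel
    rng = np.random.default_rng(seed)
    noisy = lr + rng.normal(0.0, noise_std, lr.shape)
    return np.clip(noisy, 0.0, 255.0)            # stay in the 8-bit range
```

With scale factor 3, a 255 × 255 HR image yields an 85 × 85 LR input, matching the image sizes reported in Section 4.3.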

Table 3 Numerical results (PSNR and SSIM) of the reconstructed HR images in scenario 3
Table 4 Numerical results (PSNR and SSIM) of the reconstructed HR images in scenario 4
Fig. 5

Visual comparison on Butterfly in scenario 3 with scale factor 3. a Original image, HR images reconstructed by b SCSR [50], c SLSR [55], d SPSR [34], e ASDS [12], f NCSR [14], g ANR [39], h A+ [40], i SRCNN [15] and j proposed BASR

Fig. 6

Visual comparison on Lena in scenario 4 with scale factor 3. a Original image, HR images reconstructed by b SCSR [50], c SLSR [55], d SPSR [34], e ASDS [12], f NCSR [14], g ANR [39], h A+ [40], i SRCNN [15] and j proposed BASR

4.3 Comparison and analysis on the rate of convergence

In this subsection, we investigate the convergence rate of our method. The second scenario is selected as the typical testing environment. Note that not all the competing methods are included in this comparison, since some of them are not solved iteratively; consequently, the ASDS and NCSR methods are picked as the baselines. In addition to these two methods, a variant of the NCSR model, in which the FISTA algorithm replaces the original solver, is also implemented, in order to test the effectiveness of FISTA and give a more comprehensive illustration. This variant is referred to as NCSR + FISTA hereinafter.
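For reference, the FISTA iteration (Beck and Teboulle's accelerated proximal gradient method) on a generic l1-regularized least-squares problem can be sketched as below. This is a minimal NumPy version of the standard algorithm, not the exact solver embedded in BASR or NCSR + FISTA.

```python
import numpy as np

def fista_l1(A, b, lam, n_iter=200, L=None):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    if L is None:
        # Lipschitz constant of the gradient: spectral norm of A, squared
        L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(n_iter):
        # Gradient step on the smooth term ...
        g = y - (A.T @ (A @ y - b)) / L
        # ... followed by soft-thresholding (the proximal step for the l1 term)
        x_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
        # Momentum update that yields the accelerated O(1/k^2) rate
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

The momentum sequence t is what distinguishes FISTA from plain ISTA and is the source of the faster convergence observed for NCSR + FISTA.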

Then, the RMSE values of the first five images (Bike, Butterfly, Hat, Leaves and Parrot) as the iterations increase are plotted in Fig. 7(a)-(e), respectively. The findings here are twofold. First, our method consistently converges fastest among these methods, which demonstrates both the efficiency and effectiveness of the proposed BASR model. Second, the original NCSR method is inferior to its variant, NCSR + FISTA, in terms of convergence rate. This improvement can only be attributed to the use of the FISTA algorithm, since we preserved the whole framework of NCSR (even the original parameter settings) and changed nothing but the solver. And because we did not fine-tune the parameters of NCSR + FISTA, it is unsurprising that its curves are ladder-like rather than smooth.

Fig. 7

Convergence rates of ASDS [12], NCSR [14], NCSR + FISTA and BASR methods. RMSE values on (a) Bike, (b) Butterfly, (c) Hat, (d) Leaves, (e) Parrot and (f) Average of all test samples

Furthermore, to provide more complete information, we repeated the experiments on the remaining samples. The average RMSE curve over all images is plotted in Fig. 7(f). It can be seen that on average ASDS and NCSR have not converged after 500 iterations (in fact, not even after 900, which is beyond the range shown in the figure), whereas our method needs only 340–380 iterations to converge. This is also why we set the number of iterations to 360. As for running time, the ASDS, NCSR and BASR methods take about 298, 532 and 319 s, respectively, to reconstruct a super-resolved image from 85 × 85 to 255 × 255 pixels. Even though ASDS requires the least time, this advantage comes from its offline learning and its never updating the dictionary; a single run of its learning process costs another 10 min. By contrast, our method achieves running time comparable to ASDS while costing much less than NCSR, demonstrating its efficiency once again.
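A simple way to turn an RMSE curve like those in Fig. 7 into a stopping rule is to iterate until successive estimates stop changing. The sketch below is generic (the paper plots RMSE against the ground truth, whereas this uses the change between iterates as a practical criterion); the tolerance value is an assumption for illustration.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def run_until_converged(step, x0, tol=1e-3, max_iter=500):
    """Apply the update `step` (any x -> x_next map) until the RMSE
    between successive iterates drops below `tol`.
    Returns the final estimate and the number of iterations used."""
    x = x0
    for k in range(1, max_iter + 1):
        x_next = step(x)
        if rmse(x, x_next) < tol:
            return x_next, k
        x = x_next
    return x, max_iter
```

With such a rule, a fixed iteration budget like the 360 used for BASR can be replaced by a data-driven stopping point.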

In conclusion, according to the conducted experiments, the FISTA algorithm can be acknowledged as a promising solver for BASR in the sense that it yields a higher rate of convergence, and the superiority of our method over similar ones is fully verified.

5 Conclusion

In this paper, a bidirectionally aligned sparse representation (BASR) model was proposed for single image super-resolution. Motivated by the recent discovery [57] that image patch similarities exist not only among the columns but also among the rows of the similarity data matrix, the modeling of bidirectional similarities was first presented in our model. On this basis, we constructed a pair of l1-norm regularization terms, to which the raw sparse coefficients are simultaneously aligned after the sparse coding process in order to compensate for the errors caused by image noise and degradation. Based on the FISTA algorithm, a local-to-global and coarse-to-fine solving scheme was developed to efficiently solve the proposed BASR model. Extensive experiments were performed to compare the BASR method with other leading methods, and the results indicate that our method is consistently superior to its counterparts in terms of both numerical assessment and visual perception. In future work, we will concentrate on improving computational efficiency and on the adaptive allocation of the similarity and regularization weights in order to achieve further improvement.