
1 Introduction

Super resolution (SR) image reconstruction refers to a signal processing approach that obtains a high resolution (HR) image from one or more observed low resolution (LR) images [1]. For the single-image SR problem, simple interpolation methods such as bilinear or bicubic interpolation do not perform well because only neighboring pixels are used to estimate the missing information; the enlarged image is blurry, especially in regions containing edges and textures. Some researchers [2, 3] incorporated natural image priors into traditional interpolation and produced images with sharper edges, but these methods are unstable and prone to artifacts. Another class of methods employs machine learning techniques. Freeman et al. [4, 5] first proposed this idea and named it “example-based super-resolution”. They estimated the missing details with the help of abundant example pairs extracted from an external training set. The algorithm learns the co-occurrence prior between LR and HR image patches, and this prior is then used in SR reconstruction. Example-based SR breaks through the bottleneck of conventional SR methods and generates sharper images. Inspired by manifold learning, neighbor embedding for SR was proposed [6]; in this method, multiple examples contribute simultaneously to the generation of each HR image patch. In 2008, Yang et al. [7] employed “sparse coding” to represent an image patch sparsely over an example set. Moreover, the link between LR and HR examples is replaced by a common sparse representation coefficient over the LR or HR example set. Then, Wang [8] and Yang [9] successively used a pair of learned compact over-complete dictionaries to replace the two huge example sets, which greatly speeds up sparse coding. However, learning the over-complete dictionary-pair is very time-consuming, so some researchers have tried to accelerate the learning process [10–12]. In recent years, Dong et al. [13] proposed a nonlocal centralized sparse representation technique and Peleg et al. [14] presented a statistical prediction model based on sparse representations, both of which generate better SR images.

Here, we propose a fast dictionary learning technique, K-Eigen Decomposition (K-EIG), and employ it in a sparse representation based SR framework. In addition, we integrate the classification of image patches and the extension of edge patches into the whole algorithm to improve the quality of the SR images. The outline of this paper is as follows. In Sect. 2, we analyze the essence of sparse representation based image SR methods. The details of our proposed algorithm are presented in Sect. 3. In Sect. 4, we show the simulations and results. The conclusion is given in Sect. 5.

2 Reviews and Analysis

Let \( \varvec{I}_{l} \) be the given LR image and \( \varvec{I}_{h} \) be the corresponding HR image which needs to be estimated. Usually, the degradation model can be written as:

$$ \varvec{I}_{l} = (\varvec{I}_{h} * f) \downarrow s, $$
(1)

where \( f \) is a blurring kernel, \( * \) is the convolution operation and \( \downarrow s \) means down-sampling by a factor \( s \). In addition to the known degradation model, an external training set \( \varvec{TS} \) containing various HR images \( \varvec{T}_{h} \) can be used. The objective of the example-based image SR methods is to predict the missing details in \( \varvec{I}_{l} \) by “borrowing” information from similar examples in \( \varvec{TS} \) [15] and then reconstruct \( \varvec{I}_{h} \). Next, we analyze the essence of sparse representation based image SR methods from the following three aspects; some representative prior work is reviewed along the way.
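Before turning to these aspects, a minimal sketch of the degradation model in Eq. (1) is given below; the Gaussian blur and plain decimation are illustrative stand-ins for the generic kernel \( f \) and the operator \( \downarrow s \) (the experiments in Sect. 4.2 use a bicubic filter instead), so this is a sketch under stated assumptions rather than the exact pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(I_h: np.ndarray, s: int, sigma: float = 1.2) -> np.ndarray:
    """Simulate Eq. (1): I_l = (I_h * f) down-sampled by s.

    A Gaussian kernel stands in for the blur f and simple decimation
    stands in for the down-sampling operator; both are illustrative
    choices, not the paper's exact settings.
    """
    blurred = gaussian_filter(I_h.astype(np.float64), sigma=sigma)
    return blurred[::s, ::s]
```

The same routine can be applied to every training image \( \varvec{T}_{h} \) to form its LR counterpart \( \varvec{T}_{l} \).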

The first aspect is how to build the link between LR and HR image patches based on \( \varvec{TS} \) through learning. The link is a learned prior for SR reconstruction. With the degradation model, the LR version \( \varvec{T}_{l} \) of each \( \varvec{T}_{h} \) can be formed by \( \varvec{T}_{l} = (\varvec{T}_{h} * f) \downarrow s \). For sampling convenience, \( \varvec{T}_{l} \) is enlarged to the same size as \( \varvec{T}_{h} \) by interpolation. Then, a number of image patch-pairs \( (\varvec{Tp}_{l} ,\varvec{Tp}_{h} ) \) are sampled from every image-pair \( (\varvec{T}_{l} ,\varvec{T}_{h} ) \), representing the LR and HR versions of the same region respectively. In the sparse representation based image SR methods, an effective link is the common sparse representation coefficient [7–10]. Specifically, a \( \varvec{Tp}_{l} \) can be represented by a sparse vector over an LR over-complete dictionary \( \varvec{D}_{l} \), while the corresponding \( \varvec{Tp}_{h} \) shares the same sparse vector over an HR dictionary \( \varvec{D}_{h} \). Therefore, a key problem is how to construct the over-complete dictionary-pair \( (\varvec{D}_{l} ,\varvec{D}_{h} ) \) to meet this requirement.

The second aspect is how to express the LR and HR image patches. In [5], the authors pointed out that the highest spatial-frequency components of the LR image are the most important for predicting the extra details. There are usually three ways to obtain them: high-pass filtering [4, 5, 10], image primitives [15, 16] and low-order derivatives [6, 9]. For each LR image patch, a signal vector \( \varvec{v}_{l} \) is formed from its highest spatial-frequency components. For an HR image patch, its mean value or LR version is often subtracted and then \( \varvec{v}_{h} \) is formed. These signal vector-pairs \( (\varvec{v}_{l} ,\varvec{v}_{h} ) \) represent the raw image patch-pairs.

The third aspect is how to reconstruct \( \varvec{I}_{h} \) from \( \varvec{I}_{l} \). Usually, the input \( \varvec{I}_{l} \) is scaled up to the size of \( \varvec{I}_{h} \) by interpolation. Since the SR scheme works on patches, \( \varvec{I}_{l} \) is broken into patches in advance. Then, the signal vector \( \varvec{v}_{l} \) of each patch \( \varvec{Ip}_{l} \) is calculated and its sparse representation vector \( \varvec{\alpha} \) is inferred over \( \varvec{D}_{l} \). Based on the link learned in training, an HR version \( \hat{\varvec{I}}\varvec{p}_{h} \) can be estimated. By rationally combining the results, for example averaging the multiple estimates for each pixel, an HR image can be obtained.
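As an illustration of the second aspect, the sketch below forms \( \varvec{v}_{l} \) from first- and second-order derivatives of an (interpolated) LR patch, in the spirit of the low-order-derivative features of [6, 9]; the specific kernels and the per-patch filtering are illustrative assumptions, not necessarily the exact configuration used later.

```python
import numpy as np
from scipy.ndimage import convolve

def lr_feature(patch_l: np.ndarray) -> np.ndarray:
    """Form v_l by stacking four derivative responses of an LR patch."""
    g1 = np.array([[-1, 0, 1]], dtype=np.float64)        # 1st-order kernel
    g2 = np.array([[1, 0, -2, 0, 1]], dtype=np.float64)  # 2nd-order kernel
    responses = [convolve(patch_l.astype(np.float64), k, mode='nearest')
                 for k in (g1, g1.T, g2, g2.T)]          # horizontal and vertical
    return np.concatenate([r.ravel() for r in responses])
```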

3 The Details of Our Proposed SR Method

3.1 A New Dictionary Learning Algorithm K-EIG

Assume that the given set of signal examples is \( \varvec{Y} = \{ \varvec{y}_{i} \}_{i = 1}^{M} \), where \( \varvec{y}_{i} (\varvec{y}_{i} \in {\mathbb{R}}^{n} ) \) is a signal example and \( M \) is the number of signal examples. According to the theory of signal sparse representation [17], each \( \varvec{y}_{i} \) can be represented approximately by a sparse vector \( \varvec{\alpha}_{i} (\varvec{\alpha}_{i} \in {\mathbb{R}}^{L} ) \) over an over-complete dictionary \( \varvec{D} \), \( \varvec{D} \in {\mathbb{R}}^{n \times L} (L > n) \). Let \( \varvec{A} = \{\varvec{\alpha}_{i} \}_{i = 1}^{M} \) be the set of sparse vectors. Note that \( \varvec{Y} \) and \( \varvec{A} \) can also be treated as matrices whose columns are \( \varvec{y}_{i} \) and \( \varvec{\alpha}_{i} \). Then, the dictionary learning problem can be described as:

$$ \mathop {\hbox{min} }\limits_{{\varvec{D},\varvec{A}}} \{ ||\varvec{Y} - \varvec{DA}||_{F}^{2} \} \, s.t. \, \forall i, \, ||\varvec{\alpha}_{i} ||_{0} \le T_{0} , $$
(2)

where \( \parallel\varvec{\alpha}_{i} \parallel_{0} \) counts the nonzero entries in \( \varvec{\alpha}_{i} \) and \( T_{0} \) is the maximum allowed number of nonzero entries. To date, K-SVD [12] is the most prevalent dictionary learning algorithm [18–20]. It alternates between sparse coding of the signal vectors and updating the dictionary atoms. The key idea of K-SVD is to update one column of \( \varvec{D} \) at a time: all columns of \( \varvec{D} \) except one atom \( \varvec{d}_{k} \) are fixed, and the kth row of \( \varvec{A} \), denoted \( \varvec{\alpha}_{T}^{k} \), is also allowed to change. The penalty term can then be rewritten as:

$$ ||\varvec{Y} - \varvec{DA}||_{F}^{2} = ||(\varvec{Y} - \sum\limits_{j \ne k} {\varvec{d}_{j}\varvec{\alpha}_{T}^{j} } ) - \varvec{d}_{k}\varvec{\alpha}_{T}^{k} ||_{F}^{2} = ||\varvec{E}_{k} - \varvec{d}_{k}\varvec{\alpha}_{T}^{k} ||_{F}^{2} . $$
(3)

\( \varvec{E}_{k} \) stands for the representation error matrix based on the current \( \varvec{D} \) and \( \varvec{A} \) when \( \varvec{d}_{k} \) is removed. Note that some signal examples do not use \( \varvec{d}_{k} \), so \( \varvec{E}_{k} \) should be restricted to \( \varvec{E}_{k}^{R} \) by keeping only the related columns. In order to use \( \varvec{E}_{k}^{R} \) to find the unknown \( \tilde{\varvec{d}}_{k} \) and \( \tilde{\varvec{\alpha }}_{T}^{k} \) minimizing \( ||\varvec{E}_{k}^{R} - \varvec{d}_{k}\varvec{\alpha}_{T}^{k} ||_{F}^{2} \), the singular value decomposition (SVD) of \( \varvec{E}_{k}^{R} \) is employed:

$$ \varvec{E}_{k}^{R} = {\mathbf{USV}}^{T} ,\quad \tilde{\varvec{d}}_{k} = {\mathbf{U}}(:,1),\quad \tilde{\varvec{\alpha }}_{T}^{k} = {\mathbf{S}}(1,1) \cdot {\mathbf{V}}(:,1)^{T} . $$
(4)

After updating the dictionary atoms one by one, a new \( \varvec{D} \) is obtained and the algorithm returns to the sparse coding stage. A dictionary that fits \( \varvec{Y} \) well is generated after several such alternations. Unfortunately, the SVD of \( \varvec{E}_{k}^{R} \) is time-consuming since the column number of \( \varvec{E}_{k}^{R} \) is usually large, which causes a high computational cost. Rubinstein et al. [11] offered an approximate K-SVD (AK-SVD) algorithm to accelerate K-SVD, but it sacrifices some quality of the learned dictionary for training speed. Next, we propose a new algorithm, K-EIG, to accelerate the learning process while keeping the performance of the learned dictionary close to that of K-SVD. The derivation is as follows.
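For reference, a minimal sketch of the K-SVD atom update reviewed above is given here, assuming \( \varvec{Y} \), \( \varvec{D} \) and \( \varvec{A} \) are stored as NumPy matrices; it only illustrates Eqs. (3)–(4) and is not the reference implementation of [12].

```python
import numpy as np

def ksvd_atom_update(Y, D, A, k):
    """One K-SVD atom update: refit column d_k and row alpha_T^k from the
    restricted error matrix E_k^R via a rank-1 SVD (Eqs. (3)-(4))."""
    omega = np.nonzero(A[k, :])[0]          # signals that actually use d_k
    if omega.size == 0:
        return D, A
    E_kR = Y[:, omega] - D @ A[:, omega] + np.outer(D[:, k], A[k, omega])
    U, S, Vt = np.linalg.svd(E_kR, full_matrices=False)
    D[:, k] = U[:, 0]                       # new atom
    A[k, omega] = S[0] * Vt[0, :]           # new coefficients
    return D, A
```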

For convenience of derivation, the target function \( ||\varvec{E}_{k}^{R} - \varvec{d}_{k}\varvec{\alpha}_{T}^{k} ||_{F}^{2} \) is denoted as \( ||\varvec{E} - \varvec{d\alpha }||_{F}^{2} \). Expanding the squared Frobenius norm gives:

$$ \begin{aligned} fu & = ||\varvec{E} - \varvec{d\alpha }||_{F}^{2} = \sum\limits_{j} {\sum\limits_{i} {(d(i)\alpha (j) - E(i,j))^{2} } } \\ & = \sum\limits_{j} {\sum\limits_{i} {(d(i)^{2} \alpha (j)^{2} - 2d(i)\alpha (j)E(i,j) + E(i,j)^{2} )} } \\ & = \sum\limits_{j} {[(\sum\limits_{i} {d(i)^{2} } )\alpha (j)^{2} - 2(\sum\limits_{i} {d(i)E(i,j)} )\alpha (j) + \sum\limits_{i} {E(i,j)^{2} } ]}. \\ \end{aligned}$$
(5)

Note that \( \sum\limits_{i} {d(i)^{2} } = 1 \) because every dictionary atom is normalized to unit \( \ell_{2} \)-norm. Besides, \( \sum\limits_{i} {E(i,j)^{2} } \) can be ignored because it is a constant with respect to \( \varvec{d} \) and \( \varvec{\alpha} \). Therefore, \( fu \) reduces to:

$$ fu = \sum\limits_{j} {[\alpha (j)^{2} - 2(\sum\limits_{i} {d(i)E(i,j)} )\alpha (j)]} . $$
(6)

The expression in \( [ \cdot ] \) is a quadratic function of \( \alpha (j) \), so for each \( j \) its minimum is attained at the vertex of the parabola, i.e. when \( \alpha (j) = \sum\limits_{i} {d(i)E(i,j)} \). Substituting this value back gives:

$$ fu = - \sum\limits_{j} {(\sum\limits_{i} {d(i)E(i,j)} )^{2} } . $$
(7)

Thus \( \alpha (j) \) is eliminated from \( fu \), and \( \varvec{d} \) can be solved directly under the unit-norm constraint with the help of the Lagrange multiplier method. The Lagrange function is:

$$ Lf = - \sum\limits_{j} {(\sum\limits_{i} {d(i)E(i,j)} )^{2} } + \lambda (\sum\limits_{i} {d(i)^{2} } - 1). $$
(8)

Let \( \partial Lf/\partial d(m) = 0 \) where \( d(m) \) is the mth element in \( \varvec{d} \). Then we can obtain:

$$ \begin{aligned} \partial Lf/\partial d(m) & = - \sum\limits_{j} {(\sum\limits_{i} {d(i)E(i,j)} )} E(m,j) + \lambda d(m) = 0 \\ & \Rightarrow \sum\limits_{i} {d(i)\sum\limits_{j} {E(i,j)E(m,j)} } - \lambda d(m) = 0,\quad m = 1,2, \ldots ,n. \\ \end{aligned}$$
(9)

Let \( \varvec{R} \) be the autocorrelation matrix of \( \varvec{E} \), i.e. \( \varvec{R} = \varvec{E} \cdot \varvec{E}^{T} \), so that \( \sum\limits_{j} {E(i,j)E(m,j)} = R(i,m) \). Then, Eq. (9) becomes:

$$ \begin{aligned} & \sum\limits_{i} {R(i,m)d(i)} - \lambda d(m) = 0, \, m = 1,2, \ldots ,n \\ & \Rightarrow R(:,m)^{T} \varvec{d} = \lambda d(m), \, m = 1,2, \ldots ,n \Rightarrow \varvec{R}^{T} \varvec{d} = \lambda \varvec{d}\varvec{.} \\ \end{aligned} $$
(10)

Obviously, the solution of the above problem can be obtained by computing the eigenvectors of \( \varvec{R}^{T} \). Since minimizing \( fu = -\varvec{d}^{T}\varvec{Rd} \) under the unit-norm constraint is equivalent to maximizing \( \varvec{d}^{T}\varvec{Rd} \), we choose the eigenvector corresponding to the largest eigenvalue as the optimal solution. Then, \( \varvec{\alpha} \) is recovered by \( \varvec{\alpha}= \varvec{d}^{T} \varvec{E} \). Thus, we replace the SVD of \( \varvec{E} \) with the eigen-decomposition of \( \varvec{R}^{T} \) and name the whole algorithm K-Eigen Decomposition (K-EIG). Compared with K-SVD, the advantage of K-EIG is its computational simplicity, for the following reason. In K-SVD, the object of the SVD is the matrix \( \varvec{E}_{k}^{R} \) of size \( n \times M_{k} \), where \( M_{k} \) is the number of signals using \( \varvec{d}_{k} \). Usually, \( M_{k} \) grows with the total number of input signals \( M \), so the column number of \( \varvec{E}_{k}^{R} \) becomes large and its SVD slow. In contrast, K-EIG performs an eigen-decomposition of \( \varvec{R}^{T} \), a square matrix of size \( n \times n \), which is much smaller than \( \varvec{E}_{k}^{R} \) since \( n \ll M_{k} \). Hence the eigen-decomposition is much faster, which makes K-EIG far more time-efficient.
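For comparison with the K-SVD sketch above, a minimal K-EIG atom update might look as follows; it forms \( \varvec{R}=\varvec{EE}^{T} \) (symmetric, so \( \varvec{R}^{T}=\varvec{R} \)), takes the eigenvector of the largest eigenvalue, and recovers the coefficient row by \( \varvec{\alpha}= \varvec{d}^{T} \varvec{E} \). The data layout mirrors the earlier sketch and is our assumption.

```python
import numpy as np

def keig_atom_update(Y, D, A, k):
    """One K-EIG atom update: replace the SVD of E_k^R with an
    eigen-decomposition of the n x n autocorrelation matrix R = E E^T."""
    omega = np.nonzero(A[k, :])[0]          # signals that actually use d_k
    if omega.size == 0:
        return D, A
    E = Y[:, omega] - D @ A[:, omega] + np.outer(D[:, k], A[k, omega])
    R = E @ E.T                             # n x n, independent of the signal count
    w, V = np.linalg.eigh(R)                # R is symmetric; eigh sorts ascending
    d = V[:, -1]                            # eigenvector of the largest eigenvalue
    D[:, k] = d
    A[k, omega] = d @ E                     # alpha = d^T E
    return D, A
```

Swapping `ksvd_atom_update` for `keig_atom_update` inside the usual sparse-coding/atom-update loop is the only change needed; the sparse coding stage is unaffected.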

As described in Sect. 2, based on the signal vector-pairs \( (\varvec{v}_{l} ,\varvec{v}_{h} ) \) sampled from \( (\varvec{T}_{l} ,\varvec{T}_{h} ) \), we train the dictionary-pair \( (\varvec{D}_{l} ,\varvec{D}_{h} ) \) to satisfy the following formula:

$$ \left\{ {\begin{array}{*{20}c} {\mathop {\hbox{min} }\limits_{{\varvec{D}_{l} ,\varvec{\alpha}}} \parallel \varvec{v}_{l}^{(i)} - \varvec{D}_{l} \cdot\varvec{\alpha}^{(i)} \parallel_{2}^{2} \, s.t. \, \parallel\varvec{\alpha}^{(i)} \parallel_{0} \le T_{0} } \\ {\mathop {\hbox{min} }\limits_{{\varvec{D}_{h} ,\varvec{\alpha}}} \parallel \varvec{v}_{h}^{(i)} - \varvec{D}_{h} \cdot\varvec{\alpha}^{(i)} \parallel_{2}^{2} \, s.t. \, \parallel\varvec{\alpha}^{(i)} \parallel_{0} \le T_{0} } \\ \end{array} } \right.,\quad \forall i \in \{ 1,2, \ldots ,M\} . $$
(11)

In order to get \( (\varvec{D}_{l} ,\varvec{D}_{h} ) \), we first generate \( \varvec{D}_{l} \) based on \( \{ \varvec{v}_{l}^{(1)} ,\varvec{v}_{l}^{(2)} , \ldots ,\varvec{v}_{l}^{(M)} \} \) by solving:

$$ \varvec{D}_{l} ,\{\varvec{\alpha}^{(i)} \} = \mathop {\arg \hbox{min} }\limits_{{\varvec{D}_{l} ,\{\varvec{\alpha}^{(i)} \} }} \sum\limits_{i = 1}^{M} {\parallel \varvec{v}_{l}^{(i)} - \varvec{D}_{l} \cdot\varvec{\alpha}^{(i)} \parallel_{2}^{2} } , \, s.t. \, \parallel\varvec{\alpha}^{(i)} \parallel_{0} \le T_{0} ,\quad \forall i \in \{ 1,2, \ldots ,M\} . $$
(12)

K-EIG can solve the above problem, yielding \( \varvec{D}_{l} \) and \( \{\varvec{\alpha}^{(i)} \} \). Then, \( \{\varvec{\alpha}^{(i)} \} \) and \( \{ \varvec{v}_{h}^{(1)} ,\varvec{v}_{h}^{(2)} , \ldots ,\varvec{v}_{h}^{(M)} \} \) directly determine \( \varvec{D}_{h} \), in the same way as in [10]. Let \( \varvec{V}_{h} = [\varvec{v}_{h}^{(1)} ,\varvec{v}_{h}^{(2)} , \ldots ,\varvec{v}_{h}^{(M)} ] \) and \( \varvec{A} = [\varvec{\alpha}^{(1)} ,\varvec{\alpha}^{(2)} , \ldots ,\varvec{\alpha}^{(M)} ] \); then \( \varvec{D}_{h} \) is obtained by \( \varvec{D}_{h} = \varvec{V}_{h} \cdot \varvec{A}^{T} \cdot (\varvec{AA}^{T} )^{ - 1} \). Now, \( (\varvec{D}_{l} ,\varvec{D}_{h} ) \) meets the requirement of a common sparse representation coefficient. Compared with the joint dictionary training used in [9], this direct dictionary-pair construction employing K-EIG is fast and effective.
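A sketch of this direct \( \varvec{D}_{h} \) construction, assuming \( \varvec{V}_{h} \) and \( \varvec{A} \) are dense NumPy matrices; the pseudo-inverse is an implementation convenience for a possibly ill-conditioned \( \varvec{AA}^{T} \), not something prescribed by the paper.

```python
import numpy as np

def build_hr_dictionary(V_h: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Direct HR dictionary: D_h = V_h A^T (A A^T)^{-1}."""
    return V_h @ A.T @ np.linalg.pinv(A @ A.T)
```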

3.2 Sparse Representation Based SR Framework with Example Classification and Extension

Considering the different types of image patches, we perform an important step before training: classifying the sampled image patches. Each patch-pair \( (\varvec{Tp}_{l} ,\varvec{Tp}_{h} ) \) is put into a classifier and labeled “smooth” or “edge”. We employ the sorted quadrant median vector (SQMV) scheme proposed in [21] to classify every \( \varvec{Tp}_{h} \); SQMV is a simple but efficient tool to recognize whether an image patch belongs to a uniform region or an edge region. The set of example-pairs is thus separated into a “smooth” set and an “edge” set. To increase the number of “edge” patch-pairs before learning, we rotate each HR image \( \varvec{T}_{h} \) by three angles (90°, 180° and 270°), forming three additional images from which more “edge” patch-pairs can be sampled. For the “smooth” class, there is no need to increase the patch number. Based on the two sets of signal vector-pairs, two dictionary-pairs \( (\varvec{D}_{l}^{smooth} ,\varvec{D}_{h}^{smooth} ) \) and \( (\varvec{D}_{l}^{edge} ,\varvec{D}_{h}^{edge} ) \) are constructed respectively.
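The edge-example extension step can be sketched as follows; the SQMV classifier itself follows [21] and is not reproduced here.

```python
import numpy as np

def augment_for_edges(T_h: np.ndarray) -> list:
    """Rotate an HR training image by 90, 180 and 270 degrees so that
    more 'edge' patch-pairs can be sampled from the rotated copies."""
    return [np.rot90(T_h, r) for r in (1, 2, 3)]
```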

In reconstruction, the input LR image \( \varvec{I}_{l} \) is first interpolated to the target size and then broken into many overlapping patches \( \varvec{Ip}_{l} \) of size \( w \times w \). For each \( \varvec{Ip}_{l} \), we use its first- and second-order derivatives to form a signal vector \( \varvec{v}_{l} \). By labeling \( \varvec{Ip}_{l} \) with the SQMV scheme, we choose the appropriate dictionary-pair \( (\varvec{D}_{l}^{smooth} ,\varvec{D}_{h}^{smooth} ) \) or \( (\varvec{D}_{l}^{edge} ,\varvec{D}_{h}^{edge} ) \) for SR. A key problem is to infer the sparse representation vector of \( \varvec{v}_{l} \) over \( \varvec{D}_{l} \) by \( \hat{\varvec{\alpha }} = \mathop {\arg \hbox{min} }\limits_{\varvec{\alpha}} \parallel \varvec{v}_{l} - \varvec{D}_{l} \cdot\varvec{\alpha}\parallel_{2}^{2} \, s.t. \, \parallel\varvec{\alpha}\parallel_{0} \le T_{0} \). Orthogonal matching pursuit (OMP) [22] or convex relaxation techniques [17] can be used to solve this problem. After obtaining \( \hat{\varvec{\alpha }} \) for \( \varvec{Ip}_{l} \), we reconstruct its HR version by \( \hat{\varvec{I}}\varvec{p}_{h} = \varvec{Ip}_{l} + reshape(\varvec{D}_{h} \cdot \hat{\varvec{\alpha }}) \), where \( reshape( \cdot ) \) rearranges a vector into a patch. By simply averaging the overlapping \( \hat{\varvec{I}}\varvec{p}_{h} \), we obtain the estimated HR image \( \hat{\varvec{I}}_{h} \). Finally, to satisfy the degradation model, a “backprojection” step [9] is applied to refine \( \hat{\varvec{I}}_{h} \) and produce the final result.
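Putting the pieces together, a simplified reconstruction loop might look like the sketch below. For brevity it uses a single dictionary-pair (the actual method switches between the “smooth” and “edge” pairs per patch), a patch stride of 1, and scikit-learn's OMP solver in place of [22]; the back-projection refinement is omitted. `feature` is any function mapping a \( w \times w \) patch to \( \varvec{v}_{l} \), such as the earlier `lr_feature` sketch.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def reconstruct(I_l_up, D_l, D_h, feature, w=5, T0=3):
    """Patch-wise SR of an LR image already interpolated to the HR size."""
    H, W = I_l_up.shape
    acc = np.zeros_like(I_l_up, dtype=np.float64)   # sum of patch estimates
    cnt = np.zeros_like(I_l_up, dtype=np.float64)   # overlap counts
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=T0, fit_intercept=False)
    for r in range(H - w + 1):                      # dense overlapping patches
        for c in range(W - w + 1):
            Ip_l = I_l_up[r:r + w, c:c + w]
            omp.fit(D_l, feature(Ip_l))             # sparse code over D_l
            alpha = omp.coef_
            Ip_h = Ip_l + (D_h @ alpha).reshape(w, w)   # add predicted detail
            acc[r:r + w, c:c + w] += Ip_h
            cnt[r:r + w, c:c + w] += 1
    return acc / cnt                                # average overlapped estimates
```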

4 Simulations and Results

4.1 Testing the Performance of K-EIG

We test the performance of K-EIG in a synthetic experiment and compare it with other typical dictionary learning algorithms. A random matrix \( \varvec{D}_{g} \) of size \( 20 \times 50 \) is generated with i.i.d. uniformly distributed entries, and each column is normalized to unit \( \ell_{2} \)-norm. Then a number of data signals \( \{ \varvec{y}_{i} \}_{i = 1}^{M} \) of dimension 20 are produced. Each signal is created as a linear combination of three different atoms of \( \varvec{D}_{g} \), with i.i.d. uniformly distributed coefficients in random and independent locations. K-SVD [12], AK-SVD [11] and K-EIG are then run on \( \{ \varvec{y}_{i} \}_{i = 1}^{M} \) to obtain three recovered dictionaries \( \varvec{D}_{r} \), and we observe which \( \varvec{D}_{r} \) is closest to \( \varvec{D}_{g} \). Readers can refer to [12] for details of the success-ratio computation. To test robustness, white Gaussian noise with varying signal-to-noise ratio (SNR) is added to \( \{ \varvec{y}_{i} \}_{i = 1}^{M} \). We let \( M = 1500 \) and \( T_{0} = 3 \), and set the number of alternating iterations to 30. The atom recovery results are shown in Table 1. K-SVD obtains the best performance at every SNR level, the average success ratio of K-EIG is very close to that of K-SVD, and AK-SVD is inferior to the other two algorithms. Besides, we compare the running speed of the three algorithms. We change the input signal number from \( M = 1000 \) to \( M = 96000 \) with an interval of 5000 and keep the other parameters unchanged. The running platform is MATLAB 2009 and the computer is equipped with a 2.60 GHz Intel(R) Core(TM) i5 CPU and 4.0 GB memory. Figure 1 shows the time-consumption curves. The computation time of K-SVD rises rapidly with the number of input signals, whereas the time consumption of K-EIG and AK-SVD remains low even for large \( M \).
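A sketch of the synthetic data generation described above, under the stated settings (n = 20, L = 50, \( T_{0} = 3 \)); the particular uniform distributions and the noise scaling are our reading of the setup and may differ in detail from the original experiment.

```python
import numpy as np

def make_synthetic_signals(M=1500, n=20, L=50, T0=3, snr_db=None, seed=0):
    """Random ground-truth dictionary and M sparse combinations of T0 atoms."""
    rng = np.random.default_rng(seed)
    D_g = rng.uniform(size=(n, L))
    D_g /= np.linalg.norm(D_g, axis=0, keepdims=True)   # unit l2-norm columns
    Y = np.zeros((n, M))
    for i in range(M):
        idx = rng.choice(L, size=T0, replace=False)     # random atom locations
        Y[:, i] = D_g[:, idx] @ rng.uniform(size=T0)    # i.i.d. uniform coefficients
    if snr_db is not None:                              # optional white Gaussian noise
        sigma = np.sqrt(np.mean(Y ** 2) / 10 ** (snr_db / 10))
        Y += rng.normal(scale=sigma, size=Y.shape)
    return D_g, Y
```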

Table 1. Comparison of several dictionary learning algorithms in average success ratio of atom recovery
Fig. 1. Time-consumption curves of the three dictionary learning algorithms

4.2 Testing Image SR Performance

In our SR experiments, we use a training set \( \varvec{TS} \) containing HR natural images such as flowers, architecture, animals and human faces. In simulation, we assume that the degradation model \( \varvec{I}_{l} = (\varvec{I}_{h} *f) \downarrow s \) is implemented by the “bicubic filter” with a zoom factor \( 1/s \). During training, we extract about \( 8 \times 10^{4} \) image patch-pairs as training examples for each class. The patch size is \( 5 \times 5 \). The number of columns of \( \varvec{D}_{l} \) or \( \varvec{D}_{h} \) is set to 1000, and the number of alternating iterations of K-EIG is set to 40. Five color images are chosen as test images: “lena”, “child”, “flower”, “building”, and “zebra”. Five corresponding LR images are generated by the known degradation model, and our task is to reconstruct their HR versions. As in other literature, we apply our SR algorithm only to the luminance channel (Y); the other channels (Cb, Cr) are simply processed by bicubic interpolation. Several representative and related image SR methods are also implemented for comparison: bicubic interpolation, neighbor embedding [6], Yang's method [9] and Zeyde's method [10]. We choose the peak signal-to-noise ratio (PSNR) and the mean structural similarity index measure (MSSIM) [23] as performance indexes to measure the quality of the reconstructed images.
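For reference, the two indexes can be computed on the luminance channel as sketched below; the scikit-image SSIM defaults used here are an assumption and may differ slightly from the exact MSSIM configuration of [23].

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(ref: np.ndarray, est: np.ndarray, peak: float = 255.0) -> float:
    """PSNR of the reconstructed luminance channel against the original."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def mssim(ref: np.ndarray, est: np.ndarray) -> float:
    """Mean SSIM via scikit-image (default window settings)."""
    return structural_similarity(ref, est, data_range=255)
```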

We show the quantitative SR results in Tables 2 and 3 for \( s = 2 \) and \( s = 3 \). Clearly, our proposed method achieves the highest PSNR and MSSIM values for all test images. To compare the reconstruction results visually, we present the HR images estimated by nearest interpolation, bicubic interpolation, neighbor embedding, Yang's method, Zeyde's method and our method for \( s = 3 \) in Figs. 2 and 3, with some details magnified to clearly display the subtle differences. We see that bicubic interpolation eliminates the sawtooth appearance of nearest interpolation but produces blurred edges. Neighbor embedding is prone to artifacts; for example, a discontinuity can be observed between the hat and the face in Fig. 2(d). The sparse representation based methods generate clearer HR images. Among them, our algorithm obtains the best results, closest to the original HR images, thanks to the integration of patch classification and edge-patch extension into the SR framework. In Fig. 3, the lattice in Fig. 3(g) is clearly reconstructed better than in Fig. 3(b) and (f).

Table 2. PSNR (MSSIM) results of the reconstructed images with s = 2
Table 3. PSNR (MSSIM) results of the reconstructed images with s = 3
Fig. 2. (a) Input LR ‘child’ image, (b) estimated HR ‘child’ by nearest interpolation, (c) by bicubic interpolation, (d) by neighbor embedding, (e) by Yang's method, (f) by Zeyde's method, (g) by our method, (h) original ‘child’ image.

Fig. 3. (a) Input LR ‘building’ image, (b) estimated HR ‘building’ by nearest interpolation, (c) by bicubic interpolation, (d) by neighbor embedding, (e) by Yang's method, (f) by Zeyde's method, (g) by our method, (h) original ‘building’ image.

4.3 Testing the Time Performance of Dictionary-Pair Construction

Here, we compare the time performance of the three dictionary-pair construction methods (Yang's, Zeyde's and ours). The platform and computer configuration are the same as described in Sect. 4.1. The CPU running time is presented in Table 4. Obviously, our method and Zeyde's are much faster than Yang's. Our method is slightly slower than Zeyde's because our training process includes patch classification, edge-patch extension and the learning of two dictionary-pairs; spending a few more minutes to obtain better dictionary-pairs is worthwhile.

Table 4. Time comparison of the three dictionary-pair construction methods

5 Conclusion

In this paper, we propose a sparse representation based single-image super resolution algorithm with a new dictionary learning technique, K-EIG. K-EIG is an improved algorithm based on the classical K-SVD: it replaces the SVD of the representation error matrix with an eigen-decomposition of its autocorrelation matrix. This change overcomes the drawback that the SVD operation is time-consuming when the number of training examples is large. We employ a direct dictionary-pair construction method based on K-EIG to accelerate the whole training process. Besides, considering the types of image patches and the number of edge examples, patch classification and edge-patch extension are integrated into the SR framework. Extensive experimental results show that our SR algorithm generates better SR images than several classical and related methods.