8.1 Introduction

Image super-resolution (SR) produces a high-resolution (HR) enhanced version of one or more observed low-resolution (LR) images without changing the low-cost image acquisition sensors [1]. Image SR has therefore attracted much attention and plays an important role in applications such as remote sensing, satellite imaging, medical imaging diagnosis and digital television. As a result, various SR methods have been proposed, mainly including interpolation-based methods [2], regularized reconstruction-based methods [13] and learning-based methods [4–12].

Learning-based SR methods have been studied extensively since the example-based approach in [4] was proposed. In these methods, the prior relationship between the HR and LR training sets is usually learned in terms of image structure and content, and this relationship helps to recover the high-resolution version of an LR test image. Consequently, various learning-based SR approaches have been put forward [5–12]. Drawing on manifold learning, Chang et al. [5] apply the principle of local linear embedding (LLE) to SR. Their approach helps to reduce the size of the training set, but the fitting problem limits the sharpness of the resulting image [7]. In [6], the contourlet transform is introduced into single-image SR to capture smooth contours through directional decompositions. Dictionary learning via sparse representation is further introduced into image SR by Yang et al. [7, 8]. Following the work of Yang et al., several improved approaches based on sparse representation have been proposed. In [9], a texture-constrained sparse representation is proposed; however, the approach is sensitive to noise. In [10], the dictionary is trained on the difference between the HR and LR images, and the approach performs well for image denoising and SR reconstruction. In [11], a hierarchical clustering algorithm is applied to optimize the parameters of the dictionary learning method in [7].

However, the aforementioned SR approaches focus on the missing high-frequency texture component and should pay more attention to the other component. Inspired by morphological component analysis (MCA) [12], a novel image SR approach based on MCA and dictionary learning is proposed. In our approach, MCA is used to decompose the interpolated LR test image into a cartoon component and a texture component. The texture component is discarded, and the cartoon component is retained as the HR cartoon part of the expected HR image. The texture of the expected HR image is estimated by means of the HR/LR dictionary pair learned from the training image set. The reconstructed image is obtained by combining the HR cartoon and the HR texture, so that both components of the reconstructed HR image receive proper attention. Experimental results show that the desired super-resolution image quality can be achieved by the approach based on MCA and dictionary learning.

The rest of this paper is organized as follows. Sect. 8.2 describes our approach based on MCA and dictionary learning in detail, Sect. 8.3 presents the experiments and analyzes the results, and Sect. 8.4 briefly concludes the paper.

8.2 Our Approach for Image Super-Resolution

The proposed approach consists of two steps: the dictionary learning step and the SR reconstruction step. The former provides a pair of dictionaries that the latter uses to obtain a high-resolution enhanced version of the LR test image. In each step, MCA is used to decompose images.

8.2.1 Image Decomposition Using MCA

In the view of MCA, an original image X is the superposition of M different morphological layers \( \{ X_{i} \}_{i = 1}^{M} \), as shown in Eq. (8.1).

$$ X = \sum\limits_{i = 1}^{M} X_{i} = X_{1} + X_{2} + \cdots + X_{M} $$
(8.1)
$$ X_{t} = T_{t} \alpha_{t} ,\quad X_{s} = T_{s} \alpha_{s} $$
(8.2)

In the SR application, two morphological layers \( \{ X_{t} ,X_{s} \} \) are separated from X: \( X_{t} \) denotes the texture and \( X_{s} \) the cartoon. Correspondingly, \( T_{t} \) is the dictionary for \( X_{t} \) and \( T_{s} \) is the dictionary for \( X_{s} \). The layers are sparsely represented by the coefficients \( \{ \alpha_{t} ,\alpha_{s} \} \) as in Eq. (8.2), where \( X_{t} ,X_{s} \in R^{N} \) and \( T_{t} ,T_{s} \in R^{N \times L} \) \( (L \gg N) \). To obtain the sparse representation, the noise-constrained basis pursuit (BP) formulation in Eq. (8.3) is often used [12]. Here, \( \varepsilon \) stands for the noise level, which serves as prior information about the image to be decomposed.

$$ \{ \alpha_{t}^{opt} ,\alpha_{s}^{opt} \} = \mathop {\arg \min }\limits_{{\{ \alpha_{t} ,\alpha_{s} \} }} \left\| {\alpha_{t} } \right\|_{1} + \left\| {\alpha_{s} } \right\|_{1} \quad s.t.\quad \left\| {X - T_{t} \alpha_{t} - T_{s} \alpha_{s} } \right\|_{2} \le \varepsilon $$
(8.3)
$$ X_{s} = X - T_{t} \alpha_{t}^{opt} $$
(8.4)

In the dictionary learning step for SR, the cartoon component is computed by Eq. (8.4) after the MCA decomposition of Eq. (8.3), so that all the residual noise is left in the cartoon component. The texture part generated from the HR training image therefore contains little noise.

$$ \{ \alpha_{t}^{opt} ,\alpha_{s}^{opt} \} = \mathop {\arg \min }\limits_{{\{ \alpha_{t} ,\alpha_{s} \} }} \left\| {\alpha_{t} } \right\|_{1} + \left\| {\alpha_{s} } \right\|_{1} + \lambda \left\| {X - T_{t} \alpha_{t} - T_{s} \alpha_{s} } \right\|_{2}^{2} + \gamma TV\{ T_{s} \alpha_{s} \} $$
(8.5)

However, Eq. (8.3) is not suitable for extracting a smooth cartoon in the SR reconstruction step, so Eq. (8.5) is adopted there instead [12]. In this formulation, the noise constraint is replaced by the unconstrained penalty term \( \lambda ||X - T_{t} \alpha_{t} - T_{s} \alpha_{s} ||_{2}^{2} \). In addition, a total variation (TV) penalty is added to obtain a cartoon structure with pronounced edges. Here, \( TV\{ T_{s} \alpha_{s} \} \) is the \( l_{1} \)-norm of the gradient of the cartoon layer \( T_{s} \alpha_{s} \). Figure 8.1 shows an image decomposition result obtained by MCA with the TV penalty; the original image can be found at http://pan.baidu.com/s/1c0Ix6Zu. The cartoon component is complete and piecewise smooth, so it helps the SR reconstruction step recover the cartoon part of the expected HR image efficiently.
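As a minimal illustration of the TV term in Eq. (8.5), the sketch below computes the \( l_{1} \)-norm of a discrete image gradient in Python with NumPy. The forward-difference (anisotropic) discretization and the function name `total_variation` are assumptions made for illustration; the text does not state which discretization is used.

```python
import numpy as np

def total_variation(image):
    """Anisotropic TV: l1-norm of the forward-difference gradient (assumed discretization)."""
    image = np.asarray(image, dtype=float)
    dx = np.diff(image, axis=1)  # horizontal differences
    dy = np.diff(image, axis=0)  # vertical differences
    return float(np.abs(dx).sum() + np.abs(dy).sum())

# A flat image and a piecewise-constant image with one vertical edge.
flat = np.full((4, 4), 5.0)
edge = np.zeros((4, 4))
edge[:, 2:] = 1.0
print(total_variation(flat), total_variation(edge))
```

A piecewise-constant image with sharp edges keeps this value small, whereas oscillating texture or noise raises it, which is consistent with using the penalty to favor a piecewise smooth cartoon layer.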

Fig. 8.1 Image decomposition by MCA. Left: original image; middle: cartoon; right: texture

8.2.2 The Training of the HR/LR Dictionary Pairs

In dictionary training, MCA and dictionary learning via sparse representation are combined to describe the features of the HR/LR images. The flowchart of this step is shown in Fig. 8.2.

Fig. 8.2 The flowchart of the dictionary learning step

In the feature extraction for the LR dictionary, each LR image is interpolated and divided into patches of \( N \times N \) pixels, and the four filters in Eq. (8.6) are used to extract the derivative features of the image patches [5, 7]. The feature vector obtained for each LR patch by concatenating the four filter responses therefore has a length of \( 4N^{2} \). To reduce the complexity, principal component analysis (PCA) is employed to reduce the dimension of the feature space. The reduced feature \( y_{i} \) represents the corresponding low-resolution patch. Eventually, the LR feature set can be described as in Eq. (8.7).
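The feature extraction described above can be sketched in Python with NumPy as follows. This is an illustrative sketch rather than the implementation used in the experiments: the zero-padded border handling, the patch step, the number of PCA components and all function names (`filter_1d`, `patch_features`, `pca_reduce`) are assumptions.

```python
import numpy as np

F1 = np.array([-1.0, 0.0, 1.0])            # f1 in Eq. (8.6); f2 is its transpose
F3 = np.array([1.0, 0.0, -2.0, 0.0, 1.0])  # f3 in Eq. (8.6); f4 is its transpose

def filter_1d(image, kernel, axis):
    """Correlate each row (axis=1) or column (axis=0) with a 1-D kernel, zero-padded."""
    flipped = kernel[::-1]  # np.convolve flips its kernel, so pre-flip to get correlation
    return np.apply_along_axis(
        lambda v: np.convolve(v, flipped, mode="same"), axis, image)

def patch_features(lr_up, n, step):
    """Return a (4*n*n, number_of_patches) matrix of concatenated filter responses."""
    maps = [filter_1d(lr_up, F1, 1), filter_1d(lr_up, F1, 0),
            filter_1d(lr_up, F3, 1), filter_1d(lr_up, F3, 0)]
    rows, cols = lr_up.shape
    feats = []
    for r in range(0, rows - n + 1, step):
        for c in range(0, cols - n + 1, step):
            feats.append(np.concatenate([m[r:r + n, c:c + n].ravel() for m in maps]))
    return np.array(feats).T

def pca_reduce(features, n_components):
    """Project the feature columns onto their leading principal directions."""
    mean = features.mean(axis=1, keepdims=True)
    centered = features - mean
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    basis = u[:, :n_components]
    return basis.T @ centered, basis, mean

image = np.tile(np.arange(12.0), (12, 1))  # horizontal ramp as a stand-in for an interpolated LR image
features = patch_features(image, n=6, step=3)
reduced, basis, mean = pca_reduce(features, n_components=4)
print(features.shape, reduced.shape)
```

Each column of `features` has length \( 4N^{2} \) before reduction; the same `basis` and `mean` would have to be reused at test time so that training and test features live in the same reduced space.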

K-SVD [13] is used to train the LR dictionary. The algorithm iteratively improves an initial dictionary and seeks the optimal sparse representation \( A \) of the feature set \( Y_{l} \) in Eq. (8.8) [13]. Here, \( \alpha_{i} \) is the ith column vector of the sparse matrix \( A \) and denotes the sparse code of \( y_{i} \), and \( T_{0} \) is the target sparsity constraint. Eventually, the optimal LR dictionary \( D_{L} \) and the corresponding sparse representations are obtained by the algorithm.
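To make the iterative improvement more concrete, the following Python sketch shows the dictionary-update stage of one K-SVD iteration for a single atom, with the sparse codes assumed to be given. It is a simplified illustration and not the training code of the experiments: the function name `ksvd_atom_update`, the random stand-in data and the choice to leave unused atoms untouched are assumptions.

```python
import numpy as np

def ksvd_atom_update(Y, D, A, k):
    """Update atom k of D and row k of A in place (one K-SVD dictionary-update step)."""
    users = np.flatnonzero(A[k, :])  # signals whose sparse code uses atom k
    if users.size == 0:
        return  # unused atom: left unchanged in this sketch
    # Residual of those signals with the contribution of atom k removed.
    E = Y[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]           # best rank-1 left factor, unit norm
    A[k, users] = s[0] * Vt[0]  # new coefficients on the unchanged support

rng = np.random.default_rng(0)
Y = rng.standard_normal((8, 30))            # stand-in feature set (columns are y_i)
D = rng.standard_normal((8, 12))
D /= np.linalg.norm(D, axis=0)              # unit-norm initial atoms
A = rng.standard_normal((12, 30)) * (rng.random((12, 30)) < 0.25)  # stand-in sparse codes
before = np.linalg.norm(Y - D @ A)
for k in range(D.shape[1]):
    ksvd_atom_update(Y, D, A, k)
after = np.linalg.norm(Y - D @ A)
print(before, after)
```

Because each update replaces one rank-1 term by its best rank-1 approximation on a fixed support, the Frobenius error of Eq. (8.8) cannot increase during this stage. A full K-SVD iteration would alternate it with a sparse coding stage that enforces \( ||\alpha_{i} ||_{0} \le T_{0} \).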

In the HR texture extraction, bicubic interpolation [2] is employed to magnify the LR images to the size of the HR images. MCA with the noise constraint in Eq. (8.3) is applied to separate the cartoon component from the interpolated image. The HR texture is then extracted from the HR image by image subtraction.

$$ \begin{aligned} & f_{1} = [ - 1,0,1],\quad f_{2} = f_{1}^{T} \\ & f_{3} = [1,0, - 2,0,1],\quad f_{4} = f_{3}^{T} \\ \end{aligned} $$
(8.6)
$$ Y_{l} = \{ y_{i} \}_{i = 1}^{n} = \{ y_{1} ,y_{2} , \ldots ,y_{n} \} $$
(8.7)
$$ \mathop {\min }\limits_{{D_{L} ,A}} \left\| {Y_{l} - D_{L} A} \right\|_{F}^{2} \quad s.t.\quad \forall i,\;\left\| {\alpha_{i} } \right\|_{0} \le T_{0} $$
(8.8)
$$ X_{h} = \{ x_{i} \}_{i = 1}^{n} = \{ x_{1} ,x_{2} , \ldots ,x_{n} \} $$
(8.9)

In the HR feature extraction, each texture image is divided into patches, and the features are extracted in the same way as for the LR images. Eventually, the HR texture set can be described as in Eq. (8.9).

The HR dictionary is obtained by assuming that the HR and LR patches share the same sparse matrix \( A \) under the HR/LR dictionary pair. The HR dictionary is defined by the least-squares problem in Eq. (8.10), which is solved with the pseudo-inverse in Eq. (8.11). Figure 8.3 shows some atoms of the HR dictionary obtained by this approach.

Fig. 8.3 Visualization of some HR dictionary atoms from our experiment

$$ D_{h} = \mathop {\arg \min }\limits_{{D_{h} }} \left\| {X_{h} - D_{h} A} \right\|_{F}^{2} $$
(8.10)
$$ D_{h} = X_{h} A^{ + } = X_{h} A^{T} \left( {AA^{T} } \right)^{ - 1} $$
(8.11)
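Eqs. (8.10) and (8.11) can be checked numerically with a short Python sketch. The closed form \( A^{T} (AA^{T} )^{ - 1} \) requires \( A \) to have full row rank; the sketch uses `numpy.linalg.pinv`, which coincides with it in that case. The function name `hr_dictionary` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def hr_dictionary(X_h, A):
    """Least-squares HR dictionary D_h = X_h A^+ for HR patches X_h and shared codes A."""
    return X_h @ np.linalg.pinv(A)

rng = np.random.default_rng(0)
D_true = rng.standard_normal((25, 10))  # hypothetical ground-truth HR atoms
A = rng.standard_normal((10, 200)) * (rng.random((10, 200)) < 0.3)  # synthetic sparse codes
X_h = D_true @ A                        # synthetic HR patches (columns are x_i)

D_h = hr_dictionary(X_h, A)
closed_form = X_h @ A.T @ np.linalg.inv(A @ A.T)  # right-hand side of Eq. (8.11)
print(np.allclose(D_h, D_true), np.allclose(D_h, closed_form))
```

Since only one pseudo-inverse is needed, the HR dictionary is obtained without a second iterative training run.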

8.2.3 The Reconstruction for Super-Resolution Image

In this section, an approach based on MCA and sparse representation is presented to obtain the HR image from the input LR image. The flowchart of the SR image reconstruction step is shown in Fig. 8.4.

Fig. 8.4 The flowchart of the SR reconstruction step

In this step, the magnified LR image is decomposed into cartoon and texture components by MCA with the TV penalty term in Eq. (8.5). The decomposed cartoon component is retained as the HR cartoon part of the expected HR image. The same methods as in the dictionary learning step are applied to extract the feature vector \( z_{i} \) of each LR patch. The input LR image can then be described as in Eq. (8.12).

$$ Z_{test} = \{ z_{i} \}_{i = 1}^{{n_{1} }} = \{ z_{1} ,z_{2} , \ldots ,z_{{n_{1} }} \} $$
(8.12)

In the sparse representation, the orthogonal matching pursuit (OMP) algorithm is applied to sparsely code the LR features by iteratively approximating the solution of the problem in Eq. (8.13) [13]. Thus, the sparse representation matrix of the input LR image is obtained under the LR dictionary.

$$ \gamma_{i} = \mathop {\arg \min }\limits_{{\gamma_{i} }} \left\| {z_{i} - D_{L} \gamma_{i} } \right\|_{2}^{2} \quad s.t.\;\left\| {\gamma_{i} } \right\|_{0} \le T_{0} $$
(8.13)
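The greedy approximation of Eq. (8.13) can be sketched in Python as below. This is a textbook form of OMP written for illustration, assuming unit-norm dictionary atoms; the function name `omp`, the stopping test on a zero residual and the toy dictionary are assumptions and are not taken from the experiments.

```python
import numpy as np

def omp(D, z, t0):
    """Greedy sparse code of z over the columns of D with at most t0 nonzeros."""
    z = np.asarray(z, dtype=float)
    residual = z.copy()
    support, coef = [], np.zeros(0)
    for _ in range(t0):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with the residual
        if k in support:
            break  # no new atom can be added
        support.append(k)
        # Re-fit all selected coefficients by least squares (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], z, rcond=None)
        residual = z - D[:, support] @ coef
        if np.allclose(residual, 0.0):
            break  # exact representation found
    gamma = np.zeros(D.shape[1])
    gamma[support] = coef
    return gamma

# Toy overcomplete dictionary: four unit vectors plus one constant unit-norm atom.
D_L = np.hstack([np.eye(4), np.full((4, 1), 0.5)])
z = 3.0 * D_L[:, 4]
print(omp(D_L, z, t0=2))
```

In the reconstruction step, each resulting code \( \gamma_{i} \) would then be reused with the HR dictionary, as described next.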

To estimate the HR texture, each texture patch \( p_{i} \) is calculated from the corresponding sparse code and the HR dictionary as in Eq. (8.14). The texture of the expected HR image is then obtained from all the texture patches.

$$ p_{i} = D_{h} \,\,\gamma_{i} $$
(8.14)

Finally, the cartoon part from MCA image decomposition and the texture part from the sparse representation under the dictionary pairs are fused into the expected HR image.
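The fusion step can be illustrated with the Python sketch below, which places the texture patches \( p_{i} \) back on the image grid and adds the cartoon layer. Averaging the overlapping pixels is an assumption borrowed from common patch-based practice, since the text does not specify how overlaps are merged; the function names and the random stand-in data are likewise hypothetical.

```python
import numpy as np

def assemble_patches(patches, image_shape, n, step):
    """Average overlapping n-by-n patches (columns of `patches`) back into an image."""
    acc = np.zeros(image_shape)
    weight = np.zeros(image_shape)
    rows, cols = image_shape
    idx = 0
    for r in range(0, rows - n + 1, step):
        for c in range(0, cols - n + 1, step):
            acc[r:r + n, c:c + n] += patches[:, idx].reshape(n, n)
            weight[r:r + n, c:c + n] += 1.0
            idx += 1
    return acc / np.maximum(weight, 1.0)  # pixels covered by no patch stay zero

def reconstruct_hr(cartoon, texture_patches, n, step):
    """Fuse the MCA cartoon layer with the texture estimated from the dictionary pair."""
    return cartoon + assemble_patches(texture_patches, cartoon.shape, n, step)

rng = np.random.default_rng(0)
cartoon = rng.random((12, 12))                  # stand-in for the MCA cartoon layer
texture_patches = rng.standard_normal((36, 9))  # stand-in for p_i = D_h @ gamma_i
hr = reconstruct_hr(cartoon, texture_patches, n=6, step=3)
print(hr.shape)
```

The patch order must match the order used during feature extraction (here row by row), otherwise the texture is placed at the wrong locations.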

8.3 Experiment and Result Analysis

Sixty training images (t1.bmp ~ t60.bmp in the folder ‘CVPR08-SR/Data/Training’) were downloaded from http://www.ifp.illinois.edu/~jyang29/ScSR.htm for the experiment. Each training image is divided into patches in the same way as in [7] to train the HR/LR dictionary pair. Four LR test images (Lena, Peppers, Comic and Butterfly) are used for SR reconstruction; they are down-sampled versions of the images from http://pan.baidu.com/s/1c0Ix6Zu. For color images, only the luminance channel is processed by the various SR methods, and the other channels are processed using bicubic interpolation [2].

8.3.1 The Quality Evaluation for SR Reconstruction

In this section, image super-resolution experiments are carried out with different SR methods, including NN [2], bicubic interpolation [2], Yang’s method in [8] and our approach. For each method, the magnification factor is set to 3, and three indicators are employed to evaluate the results: PSNR, SSIM [14] and RMSE. A smaller RMSE and a larger PSNR or SSIM indicate better image quality. The reconstructed HR images are shown in Fig. 8.5.
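For reference, RMSE and PSNR can be computed as in the Python sketch below. The peak value of 255 assumes 8-bit images, which the text does not state explicitly, and the function names are illustrative; SSIM [14] is omitted because it requires a windowed computation that is beyond a short sketch.

```python
import numpy as np

def rmse(reference, estimate):
    """Root-mean-square error between two images of equal size."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes an 8-bit intensity range."""
    err = rmse(reference, estimate)
    return float("inf") if err == 0 else float(20.0 * np.log10(peak / err))

reference = np.zeros((4, 4))
estimate = np.full((4, 4), 5.0)  # constant error of 5 gray levels
print(rmse(reference, estimate), psnr(reference, estimate))
```

Both indicators are monotonically related through the peak value, so a method that lowers RMSE on an image necessarily raises PSNR on that image.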

Fig. 8.5 The SR results using different methods. From left to right in each row: LR image, original HR image, and the results reconstructed by NN, Bicubic, Yang’s and our approach

As shown in Fig. 8.5, the edges are usually jagged and some speckle artifacts remain in the SR results of NN and bicubic interpolation. Yang’s result is slightly over-smoothed, and the local texture is not prominent enough. In the result of our approach, the quality of the edges and texture is improved, for example in the hat edges in Lena and in the complex fold structures in the marked rectangular region or the elongated pepper in Peppers. In addition, our approach shows the fewest speckles and jagged blocks among these methods, for example in the texture folds, the curved boundaries of the garment and the detailed decoration around the neck in Comic, and in the markings on the butterfly’s wings in Butterfly.

To avoid relying on visual judgment alone, PSNR, RMSE and SSIM between each result and the original HR image are compared in Table 8.1. The PSNR values of our approach are clearly larger and the RMSE values smaller, which indicates that the HR images recovered by our approach are closer to the original HR images. In addition, the larger SSIM means that our results are more similar to the original HR images in terms of image structure [14].

Table 8.1 The evaluation values for the reconstruction results in different approaches

8.3.2 The Reconstruction for Images with Additive Noises

In this section, several experiments are conducted to demonstrate that the proposed approach is more robust to images with additive noise. The Comic image is selected because it contains more complex textures and hierarchical structures.

In these experiments, the LR Comic image with different levels of additive noise is magnified from \( 83 \times 120 \) pixels to an HR image of \( 249 \times 360 \) pixels. In each experiment, Gaussian white noise with zero mean and a different standard deviation \( \sigma \) is added to the original input test image. The PSNR values are shown in Table 8.2. The PSNR of our approach is clearly the largest, which suggests that the proposed method is more robust to noise than the conventional methods.
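The noise model of these experiments can be reproduced with a few lines of Python. The clipping to the range [0, 255], the fixed random seed and the function name `add_gaussian_noise` are assumptions for illustration; the text only specifies zero-mean Gaussian white noise with standard deviation \( \sigma \).

```python
import numpy as np

def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian white noise with standard deviation sigma to an image."""
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=float)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 255.0)  # assumed 8-bit intensity range

lr_test = np.full((83, 120), 128.0)  # flat stand-in with the LR size used in the text
noisy = add_gaussian_noise(lr_test, sigma=10.0)
print(noisy.shape, float(noisy.std()))
```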

Table 8.2 PSNR for reconstructed images with additive Gaussian noise

8.4 Conclusions

In this paper, a novel approach for image SR based on MCA and dictionary learning is proposed. MCA is used to decompose images in both the dictionary learning step and the HR image reconstruction step. The LR dictionary is trained by K-SVD on the features extracted from the LR image patches. The HR dictionary is calculated directly from the sparse representation, which helps to reduce the complexity of the dictionary learning step. In the reconstruction step, the cartoon part is obtained by MCA with the TV penalty term, and the dictionary pair contributes to recovering the texture part of the expected HR image. A series of experiments verifies the validity and robustness of our approach.