1 Introduction

Computer vision and pattern recognition techniques are widely used in modern digital imaging systems. Pattern recognition, however, suffers when image quality and resolution degrade. To address this difficulty, follow-up studies have focused on super-resolution (SR) image reconstruction, a technique commonly applied in aviation, surveillance, medical imaging, and other domains. SR image reconstruction techniques fall into two categories: multi-frame SR [1, 2], which uses multiple low-resolution images of the same scene, and single-frame SR, which uses a single low-resolution image. There are two approaches to single-frame SR. The first uses pixel-interpolation techniques such as nearest neighbor, bilinear, B-spline, cubic convolution, and cubic spline [3, 4]. Discrete Wavelet Transform (DWT)-based SR methods [5-9] have also been proposed; they estimate coefficients in the high-frequency sub-bands (Low-High: LH, High-Low: HL, and High-High: HH) by interpolating from the correlation between sub-bands. Mueller et al. [8] proposed a DWT-based interpolation method that estimates coefficients in the high-frequency sub-bands by iteratively removing noise while preserving important coefficients around edges, using multi-scale geometric information generated by the contourlet transform. Unfortunately, HR images generated by this method suffer from blurred edges and textures in the high-frequency sub-bands and jagging artifacts at diagonal edges. To overcome these shortcomings, edge-direction-adaptive interpolation methods [10-17] were proposed to preserve coefficients in the high-frequency sub-bands. Although these methods preserve the sharpness of edge regions better than iterative interpolation methods, they have difficulty preserving regions with detailed texture.
Recently, example-based SR image reconstruction methods [18-25] have received attention as a way to address this problem. This approach builds a learning database of pairs, each linking a patch from a high-resolution image to the corresponding patch from its low-resolution counterpart. Each patch in the input image is compared against the low-resolution patches in the database, and the matched low-resolution patches are replaced with their linked high-resolution patches. Freeman et al. [18] first proposed this technique: a learning-based SR method that selects high-resolution patches corresponding to low-resolution patches by modeling the spatial relationship between the two with a Markov network. When high-frequency patches are selected, the borders where patches overlap are considered in choosing the best pair, which increases the accuracy and connectivity between patches. In general, magnifying an image loses information and therefore produces blurring. The loss of information in textured high-frequency areas is more damaging than in textureless areas, because human perception of sharpness is most sensitive near edges, where brightness differences occur. The perceived sharpness of an image is thus governed by how well the high-frequency areas are reconstructed. Freeman's method does not consider reconstruction in the frequency domain because it simply replaces low-resolution image patches with high-resolution ones. In this regard, this paper proposes a DWT-based SR image reconstruction method that estimates the lost information in the high-frequency domain. Our method is a novel example-based technique that uses wavelet patch-pairs to estimate the coefficients of the high-frequency sub-bands produced by the DWT.

2 Example-based Super-resolution Method using Discrete Wavelet Transform

Our SR image reconstruction method uses the Discrete Wavelet Transform (DWT) to reconstruct the information in the high-frequency (high-pass) sub-bands, as shown in Fig. 1. In this paper, we reconstruct an HR image magnified by a factor of 2 from a Low-Resolution (LR) image. An interpolated image is first produced by magnifying the input LR image by a factor of 2. Applying a one-level DWT to the interpolated image decomposes it into four sub-bands (Low-Low: LL, Low-High: LH, High-Low: HL, and High-High: HH), each 1/4 the image size. The input LR image is then inserted into the LL sub-band, and the other sub-bands are initialized to zero (zero-padding) so that the lost high-frequency information can be estimated. This approach avoids losing the content of the input image because it is preserved in the LL band.
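The decomposition and zero-padding step above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the function names are our own, and the one-level Haar transform is written directly on 2×2 blocks rather than via a wavelet library.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT: split each 2x2 block into an average
    (LL) and horizontal / vertical / diagonal differences (LH, HL, HH)."""
    img = np.asarray(img, dtype=float)
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0      # low-low: 2x-downsampled average
    lh = (a + b - c - d) / 2.0      # low-high: horizontal detail
    hl = (a - b + c - d) / 2.0      # high-low: vertical detail
    hh = (a - b - c + d) / 2.0      # high-high: diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

# Zero-padding initialization: the LR image becomes the LL band and
# the three high-pass bands start at zero, to be estimated later.
lr = np.arange(64, dtype=float).reshape(8, 8)     # toy 8x8 LR image
zeros = np.zeros_like(lr)
initial_hr = haar_idwt2(lr, zeros, zeros, zeros)  # 16x16 first estimate
```

Because the high-pass bands are zero, `initial_hr` is simply a blocky 2× upsampling of the LR image; the rest of the method replaces those zeros with estimated coefficients.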

Figure 1

Overall flowchart of the proposed method.

The high-frequency (high-pass) sub-bands LH, HL, and HH hold the signed differences in the horizontal, vertical, and diagonal directions when the input image is downsampled. Information in the high-frequency sub-bands therefore depends on the pattern features of the LL sub-band, because the sub-bands are correlated with each other. Accordingly, if the pattern features of a patch in the input image and of an LL patch in the database are similar, we can infer that the pattern information in the corresponding LH, HL, and HH patches is similar as well. Using this relationship, we estimate the coefficients of the LH, HL, and HH sub-bands by comparing each patch of the LL sub-band against a learning database of wavelet patch-pairs (tuples of LL, LH, HL, and HH patches) generated from HR images. To compare patches, we use the Neighbor Intensity Local Binary Pattern (NI-LBP), which finds the class corresponding to a patch, and the Subtraction of Center from Neighbors MSE (SCN-MSE), which finds the wavelet patch-pair with the highest similarity within the classified class. After a wavelet patch-pair has been found for every patch, its high-frequency patches are inserted at the corresponding patch locations of the LH, HL, and HH sub-bands. Finally, we obtain the super-resolved HR image by applying the inverse DWT to the four estimated sub-bands.

2.1 Generation of Training Dataset

To estimate coefficients in the high-frequency sub-bands, we generate a training database from a set of training images using the DWT, as shown in Fig. 2. Each training image is decomposed into LL, LH, HL, and HH sub-bands by the DWT, and patches at the same location and of the same size are extracted from the four sub-bands. Each such tuple (LL, LH, HL, HH) is stored as one wavelet patch-pair. This process is repeated until all training images have been processed. In this paper, we use the Haar wavelet and a patch size of 3×3. In addition, the wavelet patch-pairs are classified by pattern, as shown in Fig. 2-4, to reduce retrieval cost and increase accuracy when comparing a patch of the LL sub-band against the LL patches of the wavelet patch-pairs in the training database. This paper uses the Neighbor Intensity Local Binary Pattern (NI-LBP) [26] to classify wavelet patch-pairs. The Local Binary Pattern (LBP) descriptor [27], proposed by Ojala and based on the statistical features of texture, is widely used for pattern recognition and performs well for texture classification. LBP encodes a patch using the relationship between the center pixel value and the neighbor pixel values, as shown in Eq. (1): if a neighbor pixel value is greater than or equal to the center pixel value, its code bit is 1; otherwise, it is 0.

$$ LBP_{p,r} = \sum\limits_{n=0}^{p-1}s(g_{r,n}-g_{c})\,2^{n} $$
(1)
Figure 2

Overall steps to generate the training database with wavelet patch-pair.

Since the LBP descriptor has difficulty recognizing edges with gradual intensity changes, this paper uses the NI-LBP descriptor shown in Eq. (2). This method uses the relationship between each neighbor pixel value and the average intensity of the neighbors: if a neighbor pixel value is greater than or equal to the average intensity, its code bit is 1; otherwise, it is 0.

$$ NI\text{-}LBP_{p,r} = \sum\limits_{n=0}^{p-1}s(g_{r,n}-\mu)\,2^{n}, \qquad s(x) = \begin{cases} 1, & \text{if} \ \ x \geq 0 \\ 0, & \text{otherwise}. \end{cases} $$
(2)

where $\mu = \frac{1}{p}\sum\limits_{n=0}^{p-1} g_{r,n}$.
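The two descriptors of Eqs. (1) and (2) can be sketched as follows for a 3×3 patch. This is an illustrative implementation under our own assumptions: the function names are ours, and the eight neighbors are taken in row-major order rather than the circular ordering used by the original descriptors, which permutes the bits but still yields 256 distinct classes.

```python
import numpy as np

def lbp_code(patch):
    """Classic LBP (Eq. 1): threshold the 8 neighbours of a 3x3 patch
    against the centre pixel and pack the bits into one byte."""
    patch = np.asarray(patch, dtype=float)
    neigh = np.delete(patch.flatten(), 4)          # drop the centre pixel
    return int(np.sum((neigh >= patch[1, 1]) * (1 << np.arange(8))))

def ni_lbp_code(patch):
    """NI-LBP (Eq. 2): threshold the neighbours against their own mean,
    which follows gradual intensity changes better than the centre pixel."""
    patch = np.asarray(patch, dtype=float)
    neigh = np.delete(patch.flatten(), 4)
    return int(np.sum((neigh >= neigh.mean()) * (1 << np.arange(8))))
```

Both codes fall in [0, 255], giving the 256 classes used to bucket the wavelet patch-pairs, and both are invariant to a constant brightness shift of the whole patch.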

As shown in Fig. 3a, when a gradual intensity change exists in a patch, a human observer divides it into three areas separated by two edges. The LBP descriptor classifies the patch into a pattern that ignores these edges, whereas the NI-LBP descriptor separates them in the same way the human eye does. In Fig. 3b, there is a diagonal edge running from bottom-left to top-right. The LBP descriptor produces a different result even though the intensity change is not significant, because it thresholds values against the intensity of the center pixel. The result of the NI-LBP descriptor, by contrast, is similar to the pattern perceived by a human. Since SR image reconstruction is closely tied to human visual judgment, a pattern classification that matches human perception, such as the NI-LBP descriptor, helps increase accuracy. In this paper, wavelet patch-pairs are classified into 256 classes by the NI-LBP descriptor, as shown in Fig. 2.

Figure 3

The comparison between NI-LBP and LBP in terms of preserving weak edge patterns. (a) and (b) example patches; a1, b1 coded values using NI-LBP; a2, b2 coded values using LBP.

2.2 Similarity Comparison between Patches in the LL Sub-band

In this step, we propose a similarity function between a patch of the LL sub-band and the LL patches of the wavelet patch-pairs in the learning database. The similarity comparison algorithm first uses the NI-LBP descriptor to find the class matching the input patch. To evaluate the similarity between the input patch and each patch in the matched class, we devise SCN-MSE, a Mean Square Error (MSE) computed on the Subtraction of Center from Neighbors (SCN) of the pixels, as in Eq. (3).

$$ SCN\text{-}MSE = \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N}\left((I_{ij}-I_{c})-(P_{ij}-P_{c})\right)^{2} $$
(3)

where I and P are the input patch and the LL patch of a wavelet patch-pair in the matched class, I_c and P_c are the center pixel values of each patch, and N is the side length of the (square) patch in pixels. We repeat this comparison until the patch with the highest similarity is found.
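Eq. (3) can be sketched directly; the helper names below are ours. The usage example reproduces the situation of Fig. 4: two patches with identical structure but a constant brightness offset, which the plain MSE scores as very different while the SCN-MSE scores them as identical.

```python
import numpy as np

def scn(patch):
    """Subtraction of Center from Neighbors: signed offsets of every
    pixel from the centre value (the centre entry becomes 0)."""
    patch = np.asarray(patch, dtype=float)
    return patch - patch[1, 1]

def scn_mse(inp, cand):
    """Eq. (3): mean square error between the SCN representations
    of two equally sized square patches."""
    diff = scn(inp) - scn(cand)
    return float(np.mean(diff ** 2))

# Two patches differing only by a constant offset of 50 gray levels.
p1 = np.array([[10, 20, 30], [10, 20, 30], [10, 20, 30]], dtype=float)
p2 = p1 + 50.0
plain_mse = float(np.mean((p1 - p2) ** 2))
print(plain_mse, scn_mse(p1, p2))   # 2500.0 0.0
```

Subtracting the center cancels the constant offset, so only the directional differences that the DWT actually encodes contribute to the error.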

Notice that this paper estimates coefficients in the LH, HL, and HH sub-bands, which hold the differences between gray pixel values in the horizontal, vertical, and diagonal directions when the input image is decomposed into four sub-bands by the DWT. A similarity comparison on raw gray values may therefore estimate wrong coefficients in the sub-bands. The SCN-MSE, which respects this property of the DWT, resolves the problem. For example, Fig. 4a shows the gray levels of two input patches. Since the two patches have the same NI-LBP pattern, they belong to the same class and their similarity must be evaluated. Figure 4b depicts the NI-LBP result for both. If we evaluate similarity with the plain MSE, the similarity between the two patches is very low because the MSE is 2500, whereas an MSE of 0 would indicate identical patches. Figure 4c, however, shows the SCN result: the signed differences between the neighbor pixel values and the center pixel value, which are exactly the differences in the horizontal, vertical, and diagonal directions. The SCN therefore corresponds to the concept of the DWT. The MSE computed on the SCN results is 0, so the two patches are identical in this sense. Our SCN-MSE is used to find the wavelet patch-pair with the highest similarity in the learning database, and the LH, HL, and HH patches of that pair are used to estimate the coefficients of the LH, HL, and HH sub-bands of the input image.

Figure 4

An example of the similarity comparison using the proposed SCN-MSE. a Two image patches. b The result of applying the NI-LBP descriptor. c The result of applying the SCN descriptor.

2.3 Estimation of Coefficients in High-pass Sub-bands

We estimate the coefficients of the LH, HL, and HH sub-bands as shown in Fig. 5. The input LR image is inserted into the LL sub-band, which is then divided into 3×3 patches; each patch is compared against the LL patches of the wavelet patch-pairs in the learning database. Once the wavelet patch-pair with the highest SCN-MSE similarity is found, its LH, HL, and HH patches are inserted at the corresponding patch locations of the LH, HL, and HH sub-bands, respectively. Finally, the high-resolution image is created by applying the inverse DWT to the estimated sub-bands.
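The whole estimation step can be sketched end to end. This is a toy, self-contained reading of the method, not the paper's code: the Haar, NI-LBP, and SCN-MSE helpers are restated compactly so the sketch runs on its own, patches are taken on a non-overlapping 3×3 grid, and the fallback to all classes when an NI-LBP bucket is empty (likely with a tiny database) is our own assumption.

```python
import numpy as np

def dwt2(x):
    """One-level Haar DWT on 2x2 blocks."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return (a+b+c+d)/2, (a+b-c-d)/2, (a-b+c-d)/2, (a-b-c+d)/2

def idwt2(ll, lh, hl, hh):
    y = np.empty((2*ll.shape[0], 2*ll.shape[1]))
    y[0::2, 0::2] = (ll+lh+hl+hh)/2; y[0::2, 1::2] = (ll+lh-hl-hh)/2
    y[1::2, 0::2] = (ll-lh+hl-hh)/2; y[1::2, 1::2] = (ll-lh-hl+hh)/2
    return y

def ni_lbp(p):
    """NI-LBP class (0..255) of a 3x3 patch."""
    n = np.delete(p.flatten(), 4)
    return int(np.sum((n >= n.mean()) * (1 << np.arange(8))))

def scn_mse(a, b):
    """Eq. (3) on two 3x3 patches."""
    return float(np.mean(((a - a[1, 1]) - (b - b[1, 1])) ** 2))

def build_db(train_imgs, ps=3):
    """Class-indexed dict of (LL, LH, HL, HH) wavelet patch-pairs."""
    db = {}
    for img in train_imgs:
        ll, lh, hl, hh = dwt2(img.astype(float))
        for i in range(0, ll.shape[0] - ps + 1, ps):
            for j in range(0, ll.shape[1] - ps + 1, ps):
                pair = tuple(s[i:i+ps, j:j+ps] for s in (ll, lh, hl, hh))
                db.setdefault(ni_lbp(pair[0]), []).append(pair)
    return db

def super_resolve(lr, db, ps=3):
    """Place the LR image into LL, fill LH/HL/HH from the best-matching
    patch-pairs, then invert the DWT (the estimation step of Fig. 5)."""
    ll = lr.astype(float)
    lh, hl, hh = (np.zeros_like(ll) for _ in range(3))
    for i in range(0, ll.shape[0] - ps + 1, ps):
        for j in range(0, ll.shape[1] - ps + 1, ps):
            p = ll[i:i+ps, j:j+ps]
            cands = db.get(ni_lbp(p)) or sum(db.values(), [])  # fallback
            best = min(cands, key=lambda q: scn_mse(p, q[0]))
            for band, src in zip((lh, hl, hh), best[1:]):
                band[i:i+ps, j:j+ps] = src
    return idwt2(ll, lh, hl, hh)

rng = np.random.default_rng(0)
train = [rng.integers(0, 256, (12, 12)).astype(float) for _ in range(4)]
db = build_db(train)
lr = rng.integers(0, 256, (6, 6)).astype(float)
hr = super_resolve(lr, db)   # 12x12 estimate
```

Because the inverse DWT is exact, the LL band of the result still equals the input LR image; only the high-pass detail is borrowed from the database.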

Figure 5

Overall steps to estimate coefficients in the sub-bands through similarity comparison against the wavelet patch-pair database.

3 Experimental Results

This paper proposed an SR reconstruction method combining the Discrete Wavelet Transform (DWT) with an example-based patch technique in order to preserve image distinctness and reduce blurring when a low-resolution (LR) image is magnified. We first built a learning database from 50 images randomly selected from the Corel database. In the experiments, the patch size is set to 3×3, and an LR image of size 128×128 is magnified to a Super-Resolution (SR) image of size 256×256. Figure 6 marks the regions of interest with red boxes in the cameraman and butterfly images, which are used to compare the proposed method with previous interpolation methods. Image quality was compared on the regions cropped from the reconstructed HR images.

Figure 6

Cameraman and butterfly images used for the experiments. a Cameraman image; b Butterfly image. The red boxes indicate the regions of interest used to evaluate the proposed method and previous methods.

In Figs. 7 and 8, (a)–(d) show the SR results of the new edge-directed interpolation (NEDI) method [11], the directional filtering and data fusion (DFDF) method [13], the single-image detail synthesis with edge prior method [16], and the sparse mixing estimators (SME) method [17], respectively; all are interpolation methods adaptive to edge directions. (e) and (f) show the SR results using the Haar wavelet and the db.9/7 wavelet, respectively, and (g) shows the SR result of estimating the high-frequency sub-band coefficients with the contourlet transform. As patch-based methods, (h)–(j) show the SR results of Freeman's method [18], Yang's method [23], and Kim's method [24], respectively. (k) shows the result of the proposed method, and (l) the ground-truth HR image.

Figure 7

The comparison of previous methods and the proposed method in the face, camera region, and camera tripod of the cameraman image. a1,a2 new edge-directed interpolation (NEDI) [11], b1,b2 directional filtering and data fusion (DFDF) [13], c1,c2 the single image detail synthesis with edge prior method [16], d1,d2 sparse mixing estimators (SME) [17], e1,e2 super-resolved image using DASR with haar wavelet function and bicubic interpolation [9], f1,f2 super-resolved image using DASR with db.9/7 wavelet function and bicubic interpolation [9] g1,g2 contourlet [8], h1,h2 Freeman et al. [18], i1,i2 Yang et al. [23], j1,j2 Kim et al. [24]. k1,k2 proposed method, l1,l2 the truth high-resolution image.

Figure 8

The comparison of previous methods and the proposed method in the head, wing-pattern, and wing-tip regions of the butterfly image. a1,a2 new edge-directed interpolation (NEDI) [11], b1,b2 directional filtering and data fusion (DFDF) [13], c1,c2 the single image detail synthesis with edge prior method [16], d1,d2 sparse mixing estimators (SME) [17], e1,e2 super-resolved image using DASR with Haar wavelet function and bicubic interpolation [9], f1,f2 super-resolved image using DASR with db.9/7 wavelet function and bicubic interpolation [9], g1,g2 contourlet [8], h1,h2 Freeman et al. [18], i1,i2 Yang et al. [23], j1,j2 Kim et al. [24], k1,k2 proposed method, l1,l2 the truth high-resolution image.

Figure 7(a1)-(l1) show the HR reconstruction results for the LR cameraman image. Focusing on the face and camera regions, the results of the previous methods are blurred around the edges. Although Freeman's method in Fig. 7(h1) is sharper than the other previous methods in the face region, blurring is visible around the hair; likewise, the edges of the camera body are reconstructed well, but the lens region is blurred. In contrast, the proposed method in Fig. 7(k1) sharply reconstructs the eyes, nose, and mouth of the face and the fine edges of the camera, closely matching the HR image in Fig. 7(l1).

Figure 7(a2)-(l2) show the HR reconstruction results for the camera-tripod region of the cameraman image. Among the edge-based methods, Fig. 7(a2)-(d2), the SME method in Fig. 7(d2) reconstructs the edge regions more sharply than the others relative to the high-resolution image in Fig. 7(l2); however, an aliasing effect is visible around the tripod. Among the wavelet-based methods, Fig. 7(e2)-(g2), the contourlet method in Fig. 7(g2) outperforms the others but shows slight aliasing around the tripod and blurring along the horizontal edges of the building background. Among the example-based methods, Fig. 7(h2)-(j2), Freeman's method in Fig. 7(h2) handles the horizontal and vertical edges of the building background well, with less aliasing around the tripod, but reconstructs the lawn with its complex texture with severe blurring. As the result of the proposed method in Fig. 7(k2) shows, our method outperforms the previous methods on the building background, the tripod, and the lawn, and the reconstructed image quality is close to the HR image of Fig. 7(l2). Figure 8 shows the cropped regions of the reconstructed HR butterfly image. Most previous methods blur the butterfly's head, as shown in Fig. 8(a1)-(j1). The edge-based methods, Fig. 8(a1)-(d1), show less aliasing in the reconstruction of the wing patterns, whereas the wavelet-based methods, Fig. 8(e1)-(g1), show more. Kim et al.'s method in Fig. 8(j1) gives the most distinct result among the previous methods, but our method in Fig. 8(k1) is better in the head and wing-pattern regions. In addition, our method reconstructs the two lines at the tip of the butterfly wing, as shown in Fig. 8(k2) and the ground-truth HR image of Fig. 8(l2), while the other methods, Fig. 8(a2)-(j2), do not.

To evaluate the image quality quantitatively, we use measures based on the Mean Square Error (MSE) and the Peak Signal-to-Noise Ratio (PSNR). The MSE measures the amount of data loss through pixel-value comparison, and the PSNR, derived from the MSE, measures image quality as the ratio between the maximum possible pixel value of the original image and the noise power:

$$ PSNR = 10\log_{10}(\frac{R^{2}}{MSE}) $$
(4)

where R is the maximum possible pixel value of the input image (R = 255 for 8-bit images) and MSE is the mean square error between the given input image $I_{inp}$ and the original image $I_{org}$, obtained as follows:

$$ MSE = \frac{\sum_{i,j}(I_{inp}(i,j)-I_{org}(i,j))^{2}}{M \times N} $$
(5)

where M and N are the dimensions of the images. However, measuring only the transferred signal-to-noise ratio is not sufficient, because the final receiver in image communication is the human eye. To take this into account, Wang et al. [28] proposed the theory that the Human Visual System (HVS) extracts structural information from the input signal when recognizing images, and devised the Structural SIMilarity (SSIM) index of images based on it. This method represents the original signal as x = {x_i | i = 1, 2, …, m×n} and the distorted signal as y = {y_i | i = 1, 2, …, m×n} within a window of size m×n. The luminance l(x,y), the contrast c(x,y), and the structure s(x,y) are computed as shown in Eq. (6), and the structural similarity is measured by their product.

$$ \begin{aligned} l(x,y) &= \frac{2\mu_{x}\mu_{y} + C_{1}}{{\mu_{x}^{2}} + {\mu_{y}^{2}} + C_{1}},\ \ C_{1} = (K_{1}L)^{2} \\ c(x,y) &= \frac{2\sigma_{x}\sigma_{y} + C_{2}}{{\sigma_{x}^{2}} + {\sigma_{y}^{2}} + C_{2}},\ \ C_{2} = (K_{2}L)^{2} \\ s(x,y) &= \frac{\sigma_{xy} + C_{3}}{\sigma_{x}\sigma_{y} + C_{3}},\ \ C_{3} = C_{2}/2 \end{aligned} $$
(6)

where μ_x and μ_y are the means of the signals x and y (luminance), σ_x and σ_y are the standard deviations of x and y (contrast), σ_xy is the covariance between x and y (the correlation between the two signals), and L is the dynamic range of the pixel values. The SSIM measure uses the parameter settings K_1 = 0.01 and K_2 = 0.03. The mean SSIM over all local windows gives the image-level score:

$$ MSSIM(x,y) = \frac{1}{M}\sum_{j=1}^{M} SSIM(x_{j},y_{j}) $$
(7)

where x and y are the reference and distorted images, respectively, x_j and y_j are the image contents of the j-th local window, and M is the number of local windows in the image.
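Eqs. (4)-(7) can be sketched in numpy as follows. This is a simplified illustration under our own assumptions: SSIM is averaged over non-overlapping 8×8 windows, whereas the reference implementation uses a sliding Gaussian-weighted window, so the numbers differ slightly from published MSSIM scores.

```python
import numpy as np

def psnr(inp, org, R=255.0):
    """Eqs. (4)-(5): peak signal-to-noise ratio in dB."""
    mse = np.mean((np.asarray(inp, float) - np.asarray(org, float)) ** 2)
    return 10.0 * np.log10(R ** 2 / mse)

def ssim_window(x, y, K1=0.01, K2=0.03, L=255.0):
    """Eq. (6) for one window: luminance * contrast * structure."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2.0
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = np.mean((x - mx) * (y - my))
    l = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)
    c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)
    s = (sxy + C3) / (sx * sy + C3)
    return l * c * s

def mssim(x, y, win=8):
    """Eq. (7): mean SSIM over non-overlapping win x win windows."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    vals = [ssim_window(x[i:i+win, j:j+win], y[i:i+win, j:j+win])
            for i in range(0, x.shape[0] - win + 1, win)
            for j in range(0, x.shape[1] - win + 1, win)]
    return float(np.mean(vals))
```

A sanity check: an image compared with itself gives MSSIM 1.0, and the PSNR of a maximally different 8-bit pair (all 255 vs. all 0) is 0 dB.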

Table 1 shows the image-quality results measured by PSNR on six experimental images; the bold numbers indicate the highest PSNR. The proposed method outperforms the other methods in most cases, except for the cat and man images. On average, the PSNR of our method is 8.28 dB higher than the edge-based methods, 7.68 dB higher than the wavelet-based methods, and 6.02 dB higher than the example-based methods.

Table 1 The results of PSNR for the reconstructed HR images using previous methods and the proposed method

Table 2 shows the MSSIM results, which measure structural similarity based on human visual perception. An MSSIM index closer to 1 indicates higher similarity, and the bold numbers mark the highest similarity across the six experimental images. Our method achieves the highest average similarity, 0.95, which is on average 0.1 higher than the edge-based methods, 0.09 higher than the wavelet-based methods, and 0.09 higher than the example-based methods.

Table 2 The results of using MSSIM for the reconstructed HR images using previous methods and the proposed method

4 Conclusions

We proposed a novel example-based Super-Resolution (SR) image reconstruction method using the Discrete Wavelet Transform. Our method estimates the coefficients of the high-frequency sub-bands by searching for the high-frequency patches matching the patches in the sub-bands of the input low-resolution image. The experimental results show that our method reduces blurring and aliasing effects and reconstructs sharp high-resolution images close to the originals. For quantitative analysis, we used the PSNR to measure the amount of data loss and the MSSIM (Mean Structural SIMilarity) to measure structural similarity based on human visual perception. The experimental results confirm that the proposed method outperforms previous methods in most cases.