Abstract
To address the loss of edge detail in the reconstruction of a single low-resolution (LR) image, this paper proposes a novel super-resolution reconstruction algorithm that fuses an internal structural self-similarity dictionary with a parameter-learning model based on an external convolution neural network. Firstly, to counter the problem that external training samples are too scattered, an internal database is constructed in addition to the external one, and a dictionary of the single image's structural self-similarity is learned from it by a multi-scale decomposition approach. Secondly, a nonlocal regularization constraint is computed from the prior knowledge contained in this internal database. Thirdly, similar block pairs of high- and low-resolution samples from the external database are fed into a convolution neural network to learn the parameters of the reconstruction model. Finally, the single LR image is reconstructed by combining the learned parameters with the internal dictionary, and the result is further refined by the iterative back-projection algorithm. Experimental results show that, compared with representative algorithms such as Bicubic interpolation, K-SVD and SRCNN, our method is more effective and efficient.
Keywords
- Super resolution
- Structural self-similarity
- Convolution neural network
- Nonlocal regularization
- Block matching
1 Introduction
Super-resolution image reconstruction refers to the technology of recovering a high-resolution (HR) image from a low-quality, low-resolution (LR) one, and has important applications in military, medical, public-safety and computer-vision fields. The usual approach is to learn from a large number of HR images how to reconstruct the high-frequency details of the LR image [1,2,3,4]. The performance of these algorithms is not always satisfactory, because they are easily affected by training data that are too scattered to represent a given image effectively. On the other hand, the structural information inside the given image is valuable for its own reconstruction, but it has received little attention so far [5,6,7]. Freedman et al. [8] pointed out that many structurally self-similar blocks are distributed within a single image region, and several studies on local structural self-similarity extraction have since been reported [9, 10]. However, these methods cannot effectively handle irregular texture blocks, which appear only sparsely or infrequently in a single image; mismatches between image blocks introduce many fake textures and make a good reconstruction hard to guarantee. To solve these problems, this paper proposes a super-resolution reconstruction method based on the structural self-similarity of the single image. Similar structural blocks at the same scale and across scales are extracted from the single image and used to build an internal dictionary model. The weights of the dictionary are then learned from external sample images trained with a convolution neural network. With these data, a reconstruction model adapted to the given single image is obtained, making the best use of the information in the image itself. Experimental results verify the effectiveness of the proposed algorithm in comparison with other state-of-the-art approaches.
2 Parameters Learning Model Based on Convolution Neural Network
In this paper, the reconstruction parameters of the single LR image are learned within the Super-Resolution Convolution Neural Network (SRCNN) [11] framework, which has proved highly capable of extracting the essential features of data sets. Firstly, blocks of Low-Resolution (LR) and High-Resolution (HR) images from the external database are matched to each other to form pairs of corresponding image blocks. These block pairs are then used as samples and input into SRCNN, which consists of three convolution layers performing feature extraction, non-linear mapping and HR reconstruction-parameter estimation, respectively. The framework of our SRCNN is shown in Fig. 1, and the three convolution layers are expressed by the following equations:

$$ Y_{1} = \max \left( {0,W_{1} \cdot X + B_{1} } \right) $$(1)

$$ Y_{2} = \max \left( {0,W_{2} \cdot Y_{1} + B_{2} } \right) $$(2)

$$ Y_{3} = W_{3} \cdot Y_{2} + B_{3} $$(3)
In Eqs. (1)–(3), the matrix X represents the original single LR image, Yi (i = 1, 2, 3) is the output of each convolution layer, and Wi and Bi (i = 1, 2, 3) are the neuron convolution kernels and neuron bias vectors, respectively. The symbol ‘·’ denotes the convolution operation, whose result is passed through the ReLU activation function max{0, x}. Given the matching pairs of image blocks, the network learns the parameter set Φ = {W1, W2, W3, B1, B2, B3}, estimated by minimizing the loss between the final network output and the HR image. Given HR images Yi and their corresponding LR images Xi, the loss function is the mean square error L(Φ), as shown in Eq. (4):

$$ L\left( \varPhi \right) = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left\| {F\left( {X_{i} ;\varPhi } \right) - Y_{i} } \right\|^{2} } $$(4)

where n is the number of training samples and F(Xi; Φ) is the network output for input Xi.
Equation (4) is solved by stochastic gradient descent with back-propagation.
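The three-layer forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration of the layer structure only: the kernel sizes, channel counts and random weights below are placeholders, not the trained parameters of the paper.

```python
import numpy as np
from scipy.signal import correlate2d

def relu(x):
    return np.maximum(0.0, x)

def srcnn_forward(X, W1, B1, W2, B2, W3, B3):
    """Three-layer SRCNN-style forward pass (Eqs. (1)-(3)) for one grayscale image."""
    # Layer 1: feature extraction -- apply each spatial kernel, add bias, then ReLU.
    Y1 = relu(np.stack([correlate2d(X, w, mode="same") for w in W1])
              + B1[:, None, None])
    # Layer 2: 1x1 non-linear mapping from n1 to n2 feature maps.
    Y2 = relu(np.einsum("ij,jhw->ihw", W2, Y1) + B2[:, None, None])
    # Layer 3: reconstruction -- combine the n2 maps into one output map (no ReLU).
    Y3 = sum(correlate2d(Y2[j], W3[j], mode="same") for j in range(len(W3))) + B3
    return Y3

# Untrained demo weights: 4 kernels of 3x3, mapped to 2 channels, then recombined.
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 16))
Y3 = srcnn_forward(X,
                   rng.standard_normal((4, 3, 3)), np.zeros(4),
                   rng.standard_normal((2, 4)), np.zeros(2),
                   rng.standard_normal((2, 3, 3)), 0.0)
```

Training would then adjust all six parameter arrays by gradient descent on the loss of Eq. (4).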
3 Extraction of Self-similarity Feature
3.1 Self-similarity on the Same and Multi-scale Images
Image self-similarity refers to the similar features found among different regions of an image. Research [12] has shown that, for a 5 × 5 block in a natural image, a large number of similar blocks can be found both at the same scale and at different scales within the image: more than 90% of image blocks can find at least 9 similar blocks at their own scale, and more than 80% can find at least 9 similar blocks at different scales. Based on this similarity mechanism, a great deal of redundant information can be extracted from the image itself at the same and at different scales. Candocia and Principe [13] pointed out two characteristics of natural images: first, a large number of similar structural regions appear throughout the image; second, these structural similarities remain consistent across multiple scales of the image, as shown in Fig. 2a).
Since there are so many structurally similar image blocks at the same and different scales of a single LR image, its reconstruction will benefit if we make the best use of these structural similarities. The basic scheme of our algorithm is shown in Fig. 2b), where HR denotes a high-resolution image and LR its corresponding low-resolution image; the HR image is s times the size of the LR image. Suppose \( \varOmega_{1}^{HR} \) and \( \varOmega_{2}^{HR} \) are two similar blocks at different scales in the HR image, with \( \varOmega_{2}^{HR} \) s times the size of \( \varOmega_{1}^{HR} \). The corresponding blocks of \( \varOmega_{1}^{HR} \) and \( \varOmega_{2}^{HR} \) in the LR image are \( \varOmega_{1}^{LR} \) and \( \varOmega_{2}^{LR} \); hence \( \varOmega_{1}^{LR} \) and \( \varOmega_{2}^{LR} \) form a pair of similar blocks at different scales in the LR image. If the scaling factor between the HR and LR images equals that between \( \varOmega_{1}^{HR} \) and \( \varOmega_{2}^{HR} \), then \( \varOmega_{1}^{HR} \) in the HR image has exactly the same size as \( \varOmega_{2}^{LR} \) in the LR image. Accordingly, when recovering block \( \varOmega_{1}^{LR} \) into block \( \varOmega_{1}^{HR} \) of the HR image, \( \varOmega_{2}^{LR} \) can provide helpful additional information.
In this paper, LR and HR similar-block pairs at the same and different scales are derived from images in the external and internal databases by a non-local block-matching method. These block pairs are then treated as training samples for dictionary learning, used to reconstruct the image to be restored.
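The cross-scale matching idea can be made concrete with a brute-force sketch: a patch from the image is compared, by sum of squared differences, against every patch of a downscaled copy of the same image. The block-averaging downscale and exhaustive search below are simplifications for illustration, not the paper's exact matching procedure.

```python
import numpy as np

def downscale(img, s):
    """Naive s-times downscale by s x s block averaging (stand-in for proper filtering)."""
    h, w = img.shape[0] // s * s, img.shape[1] // s * s
    return img[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def best_match(patch, image):
    """Return the position and SSD error of the block in `image` closest to `patch`."""
    p = patch.shape[0]
    best_pos, best_err = (0, 0), np.inf
    for r in range(image.shape[0] - p + 1):
        for c in range(image.shape[1] - p + 1):
            err = np.sum((image[r:r + p, c:c + p] - patch) ** 2)
            if err < best_err:
                best_pos, best_err = (r, c), err
    return best_pos, best_err

# A toy image whose diagonal block structure recurs across scales.
img = np.kron(np.eye(6), np.ones((4, 4)))
pos, err = best_match(img[:4, :4], downscale(img, 2))
```

In the same-scale case, the search runs over the image itself (excluding the query block's own position); in the cross-scale case, as here, it runs over a rescaled copy.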
3.2 Non-local Self-similar Block Matching
Researchers have found [15, 16] that natural images contain abundant similarities in textured regions, edges and so on, and a low-resolution (LR) image can recover its missing details from this structural high-frequency similarity. Exploiting the similarities between nonlocal patches distributed in different regions of the image can therefore achieve higher-quality image reconstruction [14].
This paper presents a regularization constraint based on non-local block similarity. Let Xi denote the ith block of the single LR image X. Its similar blocks \( X_{i}^{l} \) (l = 1, 2, …, L) are first searched within X itself and then used to estimate Xi by their linear combination. The main idea of the non-local constraint is that the central point Pi of block Xi can be represented by the weighted average of the central points \( P_{i}^{l} \) of the blocks \( X_{i}^{l} \) (l = 1, 2, …, L), as described in Eq. (5):

$$ P_{i} = \sum\limits_{l = 1}^{L} {w_{i}^{l} P_{i}^{l} } $$(5)
Suppose the weight vectors wi, whose entries are the \( w_{i}^{l} \), are stacked into a weight matrix B, and the central points Pi, together with their similar points \( P_{i}^{l} \), are stacked into a dictionary matrix Ψ. The nonlocal regularization constraint can then be expressed as Eq. (6):

$$ \alpha \left\| {\left( {I - B} \right)\varPsi } \right\|_{2}^{2} $$(6)
In Eq. (6), I is an identity matrix and α is the 2-norm constraint parameter of nonlocal regularization.
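A minimal sketch of the weighting idea behind Eq. (5): each similar block receives a weight that decays with its distance to the reference block, and the central pixel is estimated as the weighted average of the candidates' central pixels. The exponential (Gaussian-kernel) weighting with bandwidth h is a common choice assumed here for illustration, not a formula taken from the paper.

```python
import numpy as np

def nonlocal_weights(patch, candidates, h=1.0):
    """Weights w_i^l for Eq. (5): closer blocks (smaller SSD) get larger weights."""
    dists = np.array([np.sum((c - patch) ** 2) for c in candidates])
    w = np.exp(-dists / h ** 2)
    return w / w.sum()                # normalise so the weights sum to one

def estimate_center(candidates, w):
    """Weighted average of the candidate blocks' central pixels, as in Eq. (5)."""
    centers = np.array([c[c.shape[0] // 2, c.shape[1] // 2] for c in candidates])
    return float(w @ centers)

patch = np.ones((3, 3))
cands = [np.ones((3, 3)), np.full((3, 3), 1.1), np.zeros((3, 3))]
w = nonlocal_weights(patch, cands)
p_hat = estimate_center(cands, w)     # dominated by the two similar blocks
```

Stacking such weight vectors row-wise produces the matrix B used in the constraint of Eq. (6).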
4 Structure Self-similarity Extraction
4.1 Low Resolution Degraded Model
An LR image results from blurring, down-sampling and noise corruption of an HR image [17]. The whole degradation process can be approximated as linear, as shown in Eq. (7):

$$ X = HSY + n $$(7)
where Y and X are the reconstructed HR image and the original LR image, respectively, H represents the down-sampling operation, S is the blur operator, and n is the noise matrix. To estimate the HR image Y accurately, some prior knowledge, in the form of regularization terms, needs to be introduced, as shown in Eq. (8):

$$ \hat{Y} = \mathop {\arg \min }\limits_{Y} \left\| {X - HSY} \right\|_{F}^{2} + \lambda \alpha \left( Y \right) $$(8)
In Eq. (8), \( Y \) is the reconstructed HR image, \( ||X - HSY||_{F}^{2} \) is the observation error term, α is the regularization term of Eq. (6), and λ is the balance weight of the regularization term.
Substituting the nonlocal constraint of Eq. (6) into the degraded model of Eq. (8) yields the final objective, Eq. (9):

$$ \hat{Y} = \mathop {\arg \min }\limits_{Y} \left\| {X - HSY} \right\|_{F}^{2} + \mu \left\| {\left( {I - B} \right)Y} \right\|_{2}^{2} $$(9)
Here μ is the regularization parameter. Finally, we use the iterative back-projection algorithm to further enhance the reconstruction performance.
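The degraded model of Eq. (7) is easy to simulate. In this sketch a Gaussian blur stands in for the blur operator S, decimation for the down-sampling operator H, and white noise for n; the blur width and noise level are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(Y, s=3, sigma=1.0, noise_std=0.01, seed=0):
    """Simulate X = HSY + n: blur the HR image, decimate by s, add noise."""
    blurred = gaussian_filter(Y, sigma=sigma)                 # blur operator S
    lr = blurred[::s, ::s]                                    # down-sampling operator H
    noise = noise_std * np.random.default_rng(seed).standard_normal(lr.shape)
    return lr + noise                                         # noise term n

Y = np.outer(np.linspace(0, 1, 27), np.linspace(0, 1, 27))    # toy 27x27 HR image
X = degrade(Y)                                                # 9x9 observed LR image
```

Reconstruction then amounts to inverting this forward model under the regularization of Eq. (9).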
4.2 Algorithm in Detail
Firstly, with the prior knowledge of non-local self-similarity in the original LR image, the best matching blocks of the initial super-resolution image are searched by a multi-scale method and taken as an internal dictionary, from which the non-local regularization constraints are learned. Deep learning is usually trained on large amounts of data, but here a relatively small training set of 91 images [3] is used. The best matching blocks in the LR and HR images of these training samples are paired up to compose an external dictionary. Both kinds of samples are input to the convolution neural network to model the self-structure similarity of the LR image. Finally, the original LR image is reconstructed under the non-local regularization constraints learned from the internal dictionary. There are four steps: initial interpolation, non-local block matching, neural network model learning and non-local regularization constraints.
-
Initial Interpolation. Bicubic interpolation, a commonly used algorithm for LR image upscaling, is selected to build the initial HR image, which is later used for the LR image's self-similarity extraction.
-
Non-local block matching. The initial HR image is partitioned point-by-point into blocks, which are then matched with each other to obtain structurally similar blocks. There are two categories of similar blocks: those at the same scale and those at different scales. In the same-scale case, let \( \hat{x}_{i} \) denote the ith block; we match it against all blocks of the same size to find its closest similar block. The difference between a searched block \( \hat{x}^{l}_{i} \) and the current block \( \hat{x}_{i} \) is calculated as Eq. (10).
$$ e^{l}_{i} = \left\| {\left( {\hat{x}^{l}_{i} - \hat{x}_{i} } \right)} \right\|^{2}_{2} $$(10)
-
Neural network model training. The convolution neural network has a strong feature-learning ability; we therefore use the SRCNN algorithm with a three-layer structure for dictionary training. Finally, the corresponding network model parameters Φ are obtained.
-
Non-local Regularization Constraints. Based on the learned parameters of the convolution neural network model, combined with the nonlocal regularization and the dictionary data, the reconstructed image is built according to Eq. (9).
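The minimisation in the last step can be sketched on a small 1-D analogue with explicit H and S matrices and plain gradient descent. The simple squared-norm regulariser below stands in for the paper's nonlocal term, and all sizes, step lengths and weights are illustrative assumptions.

```python
import numpy as np

def reconstruct(X, H, S, lam=1e-3, step=0.05, iters=500):
    """Gradient descent on ||X - H S Y||^2 + lam * ||Y||^2, an Eq. (8)-style
    objective with a plain ridge term standing in for the nonlocal constraint."""
    A = H @ S
    Y = A.T @ X                       # crude initial guess: back-projected observation
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ Y - X) + 2.0 * lam * Y
        Y = Y - step * grad
    return Y

n = 8
# S: moving-average blur (circulant); H: keep every 2nd sample (2x down-sampling).
S = (0.5 * np.eye(n) + 0.25 * np.roll(np.eye(n), 1, axis=1)
     + 0.25 * np.roll(np.eye(n), -1, axis=1))
H = np.eye(n)[::2]
Y_true = np.sin(np.linspace(0.0, 3.0, n))
X = H @ S @ Y_true                    # noiseless observation of the true signal
Y_hat = reconstruct(X, H, S)
```

Replacing the ridge term with the nonlocal penalty of Eq. (9) only changes the gradient of the regulariser; the descent loop itself stays the same.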
4.3 Algorithm Enhancement
We use the iterative back-projection algorithm, based on a down-sampling image degradation model with sub-pixel displacement [18], to enhance the reconstructed image. Firstly, multiple frames of the LR image are sampled in sequence and registered; then the errors between the simulated and observed LR images are iteratively back-projected onto the HR image. Suppose there are K sequentially observed LR images fk (m1, m2) of resolution M1 × M2, and the estimated HR image f (n1, n2) is enlarged by a factor of s, i.e. its resolution is N1 × N2 = (sM1) × (sM2). The Iterative Back-Projection (IBP) estimate of the HR image can be described by Eq. (11):

$$ \hat{f}^{n + 1} \left( {n_{1} ,n_{2} } \right) = \hat{f}^{n} \left( {n_{1} ,n_{2} } \right) + \sum\limits_{k = 1}^{K} {\sum\limits_{\left( {m_{1} ,m_{2} } \right)} {\left[ {f_{k} \left( {m_{1} ,m_{2} } \right) - \hat{g}_{k}^{n} \left( {m_{1} ,m_{2} } \right)} \right]h_{BP} \left( {m_{1} ,m_{2} ;n_{1} ,n_{2} } \right)} } $$(11)
In Eq. (11), \( \hat{g}_{k}^{n} \) is the kth simulated LR image in the nth iteration, generated from the actual displacement information of the LR images, and hBP(m1, m2; n1, n2) is the back-projection kernel, which determines how the error affects the HR image during each iteration. We use a down-sampling rate of s = 3 with sub-pixel displacements (x, y) of (0, 0), …, (1, 3); this gives 8 observed LR images and their corresponding simulated images, whose errors are computed and back-projected to obtain the HR image according to Eq. (11).
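A minimal single-frame version of the back-projection loop of Eq. (11) can be written as follows, using s × s block averaging as the simulated degradation and a uniform back-projection kernel; the paper's multi-frame, sub-pixel-shift setting is omitted for brevity.

```python
import numpy as np

def iterative_back_projection(lr, s=3, iters=10, step=1.0):
    """Refine an HR estimate by back-projecting the LR simulation error (Eq. (11))."""
    hr = np.kron(lr, np.ones((s, s)))                 # initial nearest-neighbour upscale
    for _ in range(iters):
        # Simulate the LR observation from the current HR estimate (s x s averaging).
        sim = hr.reshape(lr.shape[0], s, lr.shape[1], s).mean(axis=(1, 3))
        err = lr - sim
        # Spread each error pixel back over its s x s HR footprint.
        hr = hr + step * np.kron(err, np.ones((s, s)))
    return hr

lr = np.arange(9.0).reshape(3, 3)
hr = iterative_back_projection(lr)                    # 9x9 estimate consistent with lr
```

With this choice of kernel the simulated LR image matches the observation after the first iteration; with several observed frames the update simply sums the back-projected errors over k.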
4.4 Algorithm Implementation
The proposed self-similarity convolution neural network algorithm consists of two processes: training and reconstruction.
The overall flow of the algorithm is shown in Fig. 3.
4.5 Non-local Regularization Constraints Example
In this process, the non-local regularization constraints of the single LR image are obtained from the representation of its structurally self-similar blocks. According to Eq. (6), the non-local constraints are iteratively calculated from the weighted-average matrix built from each block's similar representations and a dictionary consisting of the similar blocks' central points. A simple example of this process is shown in Fig. 4a); four clustering results of structure blocks, representing the non-local regularization constraints of the example image, are shown in Fig. 4b).
5 Experimental Results and Analysis
5.1 Experimental Setup
To verify the validity of the proposed method, three public SR benchmark databases are used: Set5, Set14 and Urban100. A three-layer convolution neural network is used for model learning: the first layer has 64 convolution kernels of size 9 × 9, the second layer 32 kernels of size 1 × 1, and the third layer a single kernel of size 5 × 5. Bicubic interpolation, K-SVD and the convolution neural network (SRCNN) are selected as baselines for comparison with the proposed method, using the Peak Signal-to-Noise Ratio (PSNR) as the indicator.
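PSNR, the comparison metric used here, is computed from the mean square error between the reference and reconstructed images. The sketch below assumes 8-bit images with peak value 255.

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between a reference and a reconstruction."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((8, 8), 100.0)
rec = np.full((8, 8), 110.0)         # constant error of 10 grey levels -> MSE = 100
value = psnr(ref, rec)               # about 28.13 dB
```

Higher PSNR means the reconstruction is closer to the ground-truth HR image.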
5.2 Experimental Results
To evaluate reconstruction quality, we compare the PSNR of these methods. Taking three images in the Set14 database as examples, the results reconstructed by the four approaches at a magnification of 3 are shown in Fig. 5. The local details restored by our method are clearer and more delicate, and the globally reconstructed images are closer to the originals.
The PSNR of each approach is shown in Tables 1, 2 and 3. Bicubic interpolation takes the lowest place at only 22.101 dB, the best competing algorithm reaches 40.642 dB, while our method reaches the highest value of 42.204 dB.
On average, our method achieves a 9.17% PSNR gain over the other methods.
6 Conclusions
The proposed algorithm addresses super-resolution reconstruction of a single image based on the structural self-similarity within the image. It derives the self-similarity of the training samples through multi-scale decomposition of the image and makes full use of the structural self-similarity of the input image, solving the problem that training samples are too scattered to represent the LR image. The intrinsic structural self-similarity of the image is exploited through the nonlocal regularization constraint, and the iterative back-projection algorithm is finally used to further optimize the reconstruction. Compared with state-of-the-art algorithms such as Bicubic, K-SVD and SRCNN, the proposed algorithm achieves better reconstruction performance.
References
1. Li, X., He, H., Wang, R., et al.: Single image super-resolution via bidirectional group sparsity and directional features. IEEE Trans. Image Process. 24(9), 2874–2888 (2015)
2. Dong, C., Loy, C.C., He, K., et al.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016)
3. Yang, J., Wright, J., Huang, T., et al.: Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)
4. Timofte, R., De Smet, V., Van Gool, L.: Anchored neighborhood regression for fast example-based super-resolution. In: IEEE International Conference on Computer Vision, Sydney, pp. 1920–1927. IEEE (2013)
5. Timofte, R., De Smet, V., Van Gool, L.: A+: adjusted anchored neighborhood regression for fast super-resolution. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) ACCV 2014. LNCS, vol. 9006, pp. 111–126. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16817-3_8
6. Yang, C.Y., Yang, M.H.: Fast direct super-resolution by simple functions. In: IEEE International Conference on Computer Vision, Sydney, pp. 561–568. IEEE (2013)
7. Dai, D., Timofte, R., Van Gool, L.: Jointly optimized regressors for image super-resolution. Comput. Graph. Forum 34(2), 95–104 (2015)
8. Freedman, G., Fattal, R.: Image and video upscaling from local self-examples. ACM Trans. Graph. 30(2), 12 (2011)
9. Protter, M., Elad, M., Takeda, H., Milanfar, P.: Generalizing the nonlocal-means to super-resolution reconstruction. IEEE Trans. Image Process. 18(1), 36–51 (2009)
10. Mairal, J., Bach, F., Ponce, J., Sapiro, G., Zisserman, A.: Non-local sparse models for image restoration. In: IEEE International Conference on Computer Vision, Kyoto, pp. 2272–2279. IEEE (2009)
11. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 184–199. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10593-2_13
12. Glasner, D., Bagon, S., Irani, M.: Super-resolution from a single image. In: IEEE International Conference on Computer Vision, Kyoto, pp. 349–356. IEEE (2009)
13. Candocia, F.M., Principe, J.C.: Super-resolution of images based on local correlations. IEEE Trans. Neural Netw. 10(2), 372–380 (1999)
14. Dong, W.S., Zhang, L., Shi, G.M., et al.: Nonlocally centralized sparse representation for image restoration. IEEE Trans. Image Process. 22(4), 1620–1630 (2013)
15. You, X., Xue, W., et al.: Single image super-resolution with non-local balanced low-rank matrix restoration. In: International Conference on Pattern Recognition, Cancun, pp. 1255–1260. IEEE (2016)
16. Xu, J., Zhang, L., Zuo, W., et al.: Patch group based nonlocal self-similarity prior learning for image denoising. In: IEEE International Conference on Computer Vision, Santiago, pp. 244–252. IEEE (2015)
17. Tekalp, A.M., Ozkan, M.K., Sezan, M.I.: High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration. In: IEEE International Conference on Acoustics, Speech and Signal Processing, San Francisco, pp. 169–172. IEEE (1992)
18. Lu, Y., Inamura, M.: Pyramid-based super-resolution of the undersampled and subpixel shifted image sequence. Int. J. Imaging Syst. Technol. 12, 254–263 (2002)
Acknowledgement
This work was supported by the Scientific Project of Guangdong Provincial Transport Department (No. Sci & Tec-2016-02-30), by the Natural Science Foundation of Guangdong Province under Grant 2018A030313061, and in part by the Guangdong Science and Technology Plan under Grants 2017B010124001, 201902020016 and 2019B010139001.
© 2020 Springer Nature Switzerland AG
Zhang, L., Jiang, W., Xiang, W. (2020). Dictionary Learning Based on Structural Self-similarity and Convolution Neural Network. In: Chen, X., Yan, H., Yan, Q., Zhang, X. (eds) Machine Learning for Cyber Security. ML4CS 2020. Lecture Notes in Computer Science(), vol 12488. Springer, Cham. https://doi.org/10.1007/978-3-030-62463-7_31