A New Single-Image Super-Resolution Using Efficient Feature Fusion and Patch Similarity in Non-Euclidean Space

Nayak, Rajashree; Balabantaray, Bunil Kumar; Patra, Dipti

doi:10.1007/s13369-020-04662-9

Research Article-Computer Engineering and Computer Science
Published: 16 June 2020

Volume 45, pages 10261–10285, (2020)

Arabian Journal for Science and Engineering

Rajashree Nayak¹,
Bunil Kumar Balabantaray² &
Dipti Patra³

Abstract

An efficient trade-off between the reconstruction quality and the processing time of any single-image super-resolution reconstruction (SISRR) approach is critically influenced by two major aspects: (i) appropriate representation of the image patch in feature space and (ii) effective searching of candidate patches from the pool of training patches or the learned dictionary. This paper proposes a neighbor embedding-based SISRR method. The novelties of our work include the integration of (i) an efficient feature mapping scheme which fuses multiple correlated features naturally, (ii) faster searching of candidate patches by measuring patch correlation in non-Euclidean space and (iii) adaptive selection of the neighborhood size using patch characteristics. Correlation among features is modeled via a global covariance matrix, and the fusion process preserves sufficient structural and spatial correlation among patches. Distance functions based on the notion of generalized eigenvalues are used for measuring patch similarity, which supports faster searching of candidate patches. The performance of the suggested method is compared with some competitive state-of-the-art methodologies. The simulation results show that the proposed work outperforms them in terms of sharpened image details with diminished artifacts at a reasonable computational burden.

1 Introduction

Machine vision systems (MVSs) have a wide range of applications in industry and academia. MVSs are usually built around a camera of suitable resolution and use sensors with specialized optics to capture high-quality images. These images are then processed, analyzed and measured by hardware or software techniques for decision making. Such systems can effortlessly inspect image details that are hard for the human eye to see. However, current imaging systems generate images of limited resolution, which must be enhanced to obtain high resolution (HR) images. The resolution of a digital image is defined as the amount of detail contained in that image. HR images have high pixel density, contain essential and critical image details and hence offer enhanced visual perception. HR images are therefore essential in imaging applications such as forensic image analysis, video surveillance [1], medical imaging, image security analysis [2] and many more. However, the resolution of an image is determined by the properties and specifications of the capturing device. Limited resolution may result from physical constraints, an insufficient number of photodetectors, a low spatial sampling rate or an inappropriate image capturing process. Moreover, images captured in poor environmental conditions are contaminated by degradations such as fog, haze and smog. Consequently, image quality degrades, resulting in low resolution (LR) images. Various dehazing and defogging techniques [3, 4] have been suggested in the literature to restore visibility and diminish artifacts. These methods work efficiently as a preprocessing step to the reconstruction process but hardly improve the perceived resolution.

The spatial resolution of captured LR images can be increased either by redesigning the hardware (integrating an adequate number of photodetectors or increasing the pixel density per unit area) or by increasing the chip size. However, resolution enhancement by redesigning the hardware is challenging, costly and time-consuming. As an alternative, image reconstruction is a mathematical process which generates images from acquired projections captured at different angles or at different times. It aims at retrieving information that has been lost or obscured in the imaging process without sacrificing the quality of the image. Basically, the reconstruction process models the degradation phenomenon and applies the reverse process to retrieve the original images from degraded, noisy, blurred and aliased images. Over the past few decades, super-resolution reconstruction (SRR) has been recognized as an evolving and cost-effective off-line technology for enriching the resolution of LR images. This process aims at estimating an HR image by fusing the non-redundant information in single or multiple LR images of the same scene. SRR methods overcome the limitations associated with the optics and sensor technology and efficiently enhance the resolution.

SRR methodologies have been widely used in imaging applications such as remote sensing [5], medical imaging [6] and machine vision. In the age of advanced space technology, commercial remote sensors capture images of better spatial resolution. However, image quality degrades due to various atmospheric effects, which results in nonaligned LR images. In that scenario, SRR techniques act as adaptive and low-risk solutions that effectively enhance the resolution of archives of remotely sensed LR images. In medical imaging, the resolution of multimodal images of a particular patient captured with the same imaging device at different times may not be consistent. This is due to several physical constraints, environmental conditions and the respiratory behavior of the patient. Consequently, the captured images are of limited resolution. SRR methodologies help to increase the resolution of captured images for better diagnostic analysis and proper treatment planning. Another important application of SRR is video surveillance [7]. Coding surveillance video for various applications needs a large volume of memory and storage. As an alternative, the original video sequences can be compressed into lower-resolution versions for efficient storage, and their resolution can be improved again via SRR techniques at the user end.

SRR of an HR image from multiple LR images is referred to as multi-image SRR (MISRR), whereas reconstruction of an HR image from a single LR image is referred to as single-image SRR (SISRR). The performance of MISRR methods relies on the accuracy of motion estimation among the LR observations, which is itself an open research problem. Moreover, errors in the registration process strongly affect the reconstruction. Furthermore, a series of LR observations of a particular scene with sub-pixel shifts among them is not easy to obtain. All these limitations necessitate the use of SISRR for reconstructing an HR image. SISRR aims at estimating a visually pleasing HR solution with enhanced image detail from a single LR image. Learning-based approaches have been drawing widespread attention in SISRR because they provide better reconstruction capability even at higher magnification factors. These approaches exploit image priors extracted from external training image datasets to recover the missing image details in the HR image.

Figure 1 depicts the basic flowchart of learning-based single-image super-resolution reconstruction (LBSISRR) methods. The output HR image is estimated in two basic stages: a training stage followed by a reconstruction stage. The training stage aims at learning the co-occurrence priors, or correlation, among the training image patches via a learning model. It involves two successive steps: a feature extraction step and a learning step. The training stage starts with the input LR image whose resolution is to be enhanced and a set of training images. The training set contains various HR images and their corresponding LR images. Prior to the feature extraction step, the training and testing HR and LR images are divided into overlapping patches. In the feature extraction step, each training and testing HR and LR image patch is represented in feature space, i.e., a predefined feature extraction operator is used to represent each image patch as a feature vector. Next, the learning step applies a learning model to preserve the co-occurrence prior among image patches. This phase aims at providing potential candidate HR patches corresponding to each input LR image patch for the reconstruction process. Candidate patches are selected by measuring patch similarity in feature space. After the training stage, the reconstruction stage integrates the learned knowledge as a prior term with the input LR image to recover the missing details in the estimation process. Finally, the estimated HR patches are stitched to generate the super-resolved output image. The basic steps of the reconstruction process are summarized as follows:

Generation of a training image dataset containing HR images and their corresponding LR images
Division of the training images into overlapping patches
Extraction of features for the training image patches
Division of the input LR image into several patches
Extraction of features for the input LR image patches
Establishment of a learning model to learn the co-occurrence priors among patches
Utilization of an appropriate searching method to select candidate patches for each LR patch
Estimation of the HR image by utilizing the priors obtained from the learning model
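
The patch division and the final stitching steps listed above can be illustrated with a minimal sketch. The patch size, the step between patches and the averaging of overlapping pixels are assumptions made for illustration only; the function names `extract_patches` and `stitch_patches` are hypothetical and not taken from the paper.

```python
import numpy as np


def extract_patches(image, patch_size=3, step=1):
    """Split a 2-D image into overlapping square patches (raster order).

    Returns an array of shape (n_patches, patch_size * patch_size) and the
    list of top-left (row, col) positions of the patches.
    """
    rows, cols = image.shape
    patches, positions = [], []
    for r in range(0, rows - patch_size + 1, step):
        for c in range(0, cols - patch_size + 1, step):
            patches.append(image[r:r + patch_size, c:c + patch_size].ravel())
            positions.append((r, c))
    return np.asarray(patches, dtype=float), positions


def stitch_patches(patches, positions, image_shape, patch_size=3):
    """Rebuild an image by averaging the pixels of overlapping patches."""
    acc = np.zeros(image_shape, dtype=float)
    count = np.zeros(image_shape, dtype=float)
    for patch, (r, c) in zip(patches, positions):
        acc[r:r + patch_size, c:c + patch_size] += patch.reshape(patch_size, patch_size)
        count[r:r + patch_size, c:c + patch_size] += 1.0
    # Pixels never covered by a patch keep the value 0 (assumed convention).
    return acc / np.maximum(count, 1.0)


image = np.arange(36, dtype=float).reshape(6, 6)
patches, positions = extract_patches(image, patch_size=3, step=1)
rebuilt = stitch_patches(patches, positions, image.shape, patch_size=3)
```

With a step of one pixel every pixel is covered by at least one patch, so stitching unmodified patches returns the original image; in an actual reconstruction the stitched patches would be the estimated HR patches.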

Based on the variety of learning models available in the literature, LBSISRR methods are categorized as example-based single-image SRR (EX-SISRR) methods [8,9,10,11,12,13,14,15], neighbor-embedding-based single-image SRR (NE-SISRR) methods [16,17,18,19,20,21,22,23,24] and sparse coding-based single-image super-resolution (SC-SISRR) methods [25,26,27,28,29,30,31,32,33,34,35]. Nowadays, deep learning-based SRR methods have also been successfully implemented to provide fast and high-quality super-resolution (SR) results [36,37,38,39,40].

EX-SISRR methods [8,9,10,11,12,13,14] depend on a large number of example training patches for the reconstruction process. The HR patch is predicted by learning a Markov random field (MRF) model solved by the belief propagation (BP) algorithm. The spatial relationship between the LR patches and their corresponding HR patches is learned from the example images using the Markov network. Despite recent advances, these methods suffer from heavy computational overload and generalize weakly in preserving the boundary between two highly contrasted patches [14, 15]. These bottlenecks often make the methods too slow for practical use [29]. A detailed analysis of some significant EX-SISRR methods is provided in Sect. 2.1. Several NE-SISRR and SC-SISRR methods have been proposed to overcome these issues.

NE-SISRR methodologies are based on the idea of local linear embedding (LLE) from manifold learning and estimate the HR image by linearly combining the neighboring candidate HR patches. Existing state-of-the-art NE-SISRR methodologies use two independent steps: (1) searching for K candidate patches for each input patch in its feature space and (2) obtaining the reconstruction weights of the patches for estimating the HR image. Commonly, the size of K is chosen randomly or by means of a distance function [18, 23], whereas the weight of each patch is obtained by least-squares minimization of the reconstruction error. The reconstruction quality of the estimated HR image depends on the quality of, and the spatial compatibility among, the K candidate patches. The computational overload of the estimation process is mainly due to searching for candidate patches using the Euclidean distance (EuD) measure. Compared to EX-SISRR methods, NE-SISRR methods exhibit much stronger generalization capability for a range of images [21]. These methods can estimate a high-quality HR solution from a relatively limited number of training image patches. Some important aspects which influence the performance of NE-SISRR methods are

To obtain an efficient feature mapping scheme in a low-dimensional feature space, which helps to preserve the missing details in the reconstruction process by selecting relevant candidate patches
To choose the optimal size of K for faithful reconstruction, since reconstructing an HR image with a fixed number of neighboring patches causes over- or under-fitting of the data and thus yields blurred output
To use a faster searching scheme for finding the neighboring patches, which reduces the computational overload of the whole reconstruction process
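
The two steps of a conventional NE-SISRR method described above (K-nearest-neighbor search with the Euclidean distance, followed by least-squares reconstruction weights) can be sketched as follows. This is a generic LLE-style baseline and not the proposed method; the fixed `k`, the small regularization term `reg` and the function name `ne_reconstruct_patch` are assumptions made for illustration.

```python
import numpy as np


def ne_reconstruct_patch(lr_feat, train_lr_feats, train_hr_patches, k=5, reg=1e-6):
    """Estimate one HR patch by neighbor embedding (generic LLE-style baseline).

    lr_feat          : (d,) feature vector of the input LR patch
    train_lr_feats   : (n, d) feature vectors of the training LR patches
    train_hr_patches : (n, m) corresponding training HR patches
    """
    # Step 1: search the K nearest training patches with the Euclidean distance.
    dists = np.linalg.norm(train_lr_feats - lr_feat, axis=1)
    idx = np.argsort(dists)[:k]

    # Step 2: weights minimizing ||lr_feat - sum_i w_i n_i||^2 with sum_i w_i = 1.
    diffs = train_lr_feats[idx] - lr_feat          # (k, d)
    gram = diffs @ diffs.T                         # local Gram matrix, (k, k)
    gram = gram + reg * np.eye(len(idx))           # assumed regularization for stability
    weights = np.linalg.solve(gram, np.ones(len(idx)))
    weights = weights / weights.sum()

    # The same weights combine the HR counterparts of the neighbors.
    return weights @ train_hr_patches[idx], idx, weights


train_lr = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
train_hr = 2.0 * train_lr                          # toy LR-to-HR relation
hr_est, neighbors, w = ne_reconstruct_patch(np.array([0.25, 0.25]), train_lr, train_hr, k=3)
```

In this toy example the query lies inside the triangle formed by its three nearest neighbors, so the weights reproduce it almost exactly and the HR estimate follows the toy relation. The exhaustive distance computation in step 1 is the part that dominates the running time for large training sets.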

So far, many improved NE-SISRR methods have been proposed in the literature. A detailed description of these methods is provided in Sect. 2.2.

SC-SISRR methods exploit the sparsity prior to solve the reconstruction problem. These methods are based on the assumption that an LR image patch and its corresponding HR patch share the same sparse representation. As the initial step, these methods learn dictionaries from external image datasets. The next step searches the dictionary for each image patch to obtain the optimal match. The performance of these methods relies on dictionary learning in feature space, and the main time-consuming step is the searching mechanism for obtaining similar candidate patches from the dictionary. In the literature, a number of advanced SC-SISRR methods have been proposed that learn compact dictionaries and/or use faster searching schemes. A brief description of some of these methods is provided in Sect. 2.3.
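
The shared-sparse-representation assumption can be illustrated with a minimal sketch: the LR patch is coded over an LR dictionary with orthogonal matching pursuit (OMP), and the same coefficients are applied to a coupled HR dictionary. The dictionaries below are toy values, the atoms are assumed to have unit norm, and the function names `omp` and `sparse_coding_hr_patch` are hypothetical; dictionary learning itself is not shown.

```python
import numpy as np


def omp(dictionary, signal, n_nonzero):
    """Greedy orthogonal matching pursuit over the columns (atoms) of `dictionary`."""
    signal = np.asarray(signal, dtype=float)
    residual = signal.copy()
    support = []
    solution = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0            # do not pick an atom twice
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected atoms jointly by least squares.
        solution, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        residual = signal - dictionary[:, support] @ solution
        if np.linalg.norm(residual) < 1e-10:
            break
    coeffs = np.zeros(dictionary.shape[1])
    if support:
        coeffs[support] = solution
    return coeffs


def sparse_coding_hr_patch(lr_feat, dict_lr, dict_hr, n_nonzero=3):
    """Code the LR patch over dict_lr and reuse the code with dict_hr."""
    alpha = omp(dict_lr, lr_feat, n_nonzero)
    return dict_hr @ alpha


dict_lr = np.eye(4)                                # toy LR dictionary with unit-norm atoms
dict_hr = 2.0 * np.eye(4)                          # toy coupled HR dictionary
alpha = omp(dict_lr, np.array([0.0, 3.0, 0.0, -2.0]), n_nonzero=2)
hr_patch = sparse_coding_hr_patch(np.array([0.0, 3.0, 0.0, -2.0]), dict_lr, dict_hr, n_nonzero=2)
```

Each OMP iteration correlates the residual with every atom, which is why the dictionary search is the time-consuming step for large dictionaries.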

1.1 Gaps in the Literature and Solutions to Break Them

A variety of SISRR approaches have been suggested in the literature (refer to Tables 1, 2 and 3) to enable an efficient trade-off between reconstruction quality and processing time. An appropriate balance between the quality of the HR solution and the computational speed depends strongly on the selection of the training image dataset, the feature extraction operation used to represent each patch, the searching of candidate patches and the learning model used to learn the co-occurrence prior among patches. Selection of an efficient training image dataset or an over-complete dictionary from a set of example image patches plays an important role in the successful accomplishment of the reconstruction task. The quality of reconstruction is enhanced when the training image patches are structurally as well as statistically similar to the input image to be super-resolved [15]. However, learning of the training dataset or dictionary is performed only once for any reconstruction problem and hence may not contribute much to the computational overload of the method. Irrespective of the type of learning model, the trade-off is critically influenced by two major aspects: (1) selection of an appropriate feature type to represent the image patch and (2) searching of candidate patches from the pool of training images or the learned dictionary. These two aspects are briefly discussed below.

1.1.1 Selection of Feature Type

From a critical review of Tables 1, 2 and 3, it is observed that most SISRR methods [14, 16, 18, 20, 23, 25, 26] commonly use first- and second-order gradient operators to represent each patch. These operators individually preserve the local edge information of patches, but their combination may not preserve the local neighborhood compatibility. Moreover, second-order gradients are sensitive to noise. In comparison, feature vectors using norm luminance (NL) [18, 24] efficiently preserve the low frequency information but fail to capture the high frequency information of the patch. Zernike moment (ZM)-based feature vectors [24] are insensitive to noise and efficiently preserve the global compatibility, but have limited reconstruction capability. Field-of-experts (FoE)-based feature vectors in [22] and histograms of oriented gradients (HoG)-based feature vectors in [19, 21] help to preserve more image details for images rich in edge information but provide limited performance for images rich in structural and geometrical content. Methods in [15, 28, 32, 34, 35] use pixel intensity to represent the image patch. However, the intensity values of patches are very sensitive to noise and contrast variation. Hence, a small disturbance in intensity value may result in a significant change in further processing. Methods in [30, 31] use the texture of the image as the feature vector to represent the image patch. However, this feature vector cannot preserve the structural and statistical correlation among patches and consequently provides limited performance for images rich in geometrical and structural content. The disadvantages of the aforementioned methods necessitate selecting a state-of-the-art feature extraction operator which fits a variety of image types in preserving high frequency information and spatial correlation in both LR and HR space.
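
As an illustration of the gradient-based representation discussed above, the following minimal sketch builds a feature vector from first- and second-order gradients of a patch. It uses the finite differences of `numpy.gradient` as a stand-in; the specific filters used in the cited methods are not given here, so the operator choice and the function name `gradient_features` are assumptions.

```python
import numpy as np


def gradient_features(patch):
    """Concatenate first- and second-order gradients of a 2-D patch."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)            # first-order: along rows, along columns
    gyy = np.gradient(gy, axis=0)          # second-order along rows
    gxx = np.gradient(gx, axis=1)          # second-order along columns
    return np.concatenate([gx.ravel(), gy.ravel(), gxx.ravel(), gyy.ravel()])


ramp = 2.0 * np.tile(np.arange(5, dtype=float), (5, 1))   # intensity grows by 2 per column
feats = gradient_features(ramp)
feats_shifted = gradient_features(ramp + 10.0)            # same patch, brighter by a constant
```

For this ramp the horizontal first-order gradient is constant and all other components vanish. A constant brightness offset leaves the feature vector unchanged, whereas a raw intensity representation would change in every entry.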

1.1.2 Searching of Candidate Patches

Regarding the searching of candidate patches, almost all EX-SISRR and NE-SISRR methods use a KNN-based searching scheme in which EuD or one of its variants measures the patch similarity [21, 35]. EuD makes the searching process computationally intensive, does not preserve sufficient spatial relationship among patches and is sensitive to small disturbances, and hence may not select potential candidate patches. Consequently, the estimated HR image suffers from various artifacts. Similarly, almost all SC-SISRR methods use orthogonal matching pursuit (OMP) or linear programming (LP) to search the dictionary. This searching process uses the $l_2$ norm for minimizing the error, which is again computationally prohibitive. However, the methods in [21, 26, 29, 32] use faster searching techniques which reduce the computational complexity of the reconstruction process at the cost of a compromise in reconstruction quality. All these aspects motivated us to propose an SISRR method which provides an efficient trade-off between reconstruction quality and processing speed.

This paper proposes an enhanced yet computationally efficient NE-SISRR method which enables faster searching of potential candidate patches and facilitates better preservation of image details with diminished artifacts. The major contributions and novelty of the work include

(1) Proposition of a low-dimensional feature mapping scheme to represent the image patch in its feature space. Each patch is represented as a combination of several raw feature attributes such as intensity profile, texture, edge, statistical and spatial information. Fusion of these feature attributes preserves sufficient structural, statistical and spatial homogeneity among patches in the reconstruction process.
(2) Development of a faster searching scheme to obtain potential candidate patches by measuring the similarity in non-Euclidean space. Patch similarity is measured by a distance function based on eigenvalues rather than EuD, which speeds up the searching process.
(3) Implementation of an automatic scheme to choose the neighborhood size K based on the input patch characteristics. The neighborhood size of each input patch is adaptive and is decided by the patch characteristics.
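
Contributions (1) and (2) can be illustrated with a minimal sketch that fuses several per-pixel attributes of a patch into a covariance matrix and compares two such matrices through their generalized eigenvalues. The attribute set (pixel position, intensity and absolute gradients) and the log-eigenvalue distance used below are assumptions chosen for illustration; they are a common choice for covariance descriptors and are not claimed to be the exact features or distance functions of the proposed method. The function names are hypothetical.

```python
import numpy as np


def covariance_descriptor(patch, eps=1e-6):
    """Fuse per-pixel attributes of a patch into one covariance matrix."""
    patch = np.asarray(patch, dtype=float)
    rows, cols = np.indices(patch.shape)
    gy, gx = np.gradient(patch)
    # One row per attribute, one column per pixel (assumed attribute set).
    attrs = np.stack([rows.ravel(), cols.ravel(), patch.ravel(),
                      np.abs(gx).ravel(), np.abs(gy).ravel()])
    cov = np.cov(attrs)
    return cov + eps * np.eye(cov.shape[0])        # keep the matrix positive definite


def generalized_eigen_distance(cov_a, cov_b):
    """sqrt(sum_i ln^2(lambda_i)), lambda_i solving cov_a x = lambda cov_b x."""
    chol = np.linalg.cholesky(cov_b)               # cov_b = L L^T
    inv_chol = np.linalg.inv(chol)
    whitened = inv_chol @ cov_a @ inv_chol.T       # same eigenvalues as the pencil (cov_a, cov_b)
    whitened = 0.5 * (whitened + whitened.T)       # remove round-off asymmetry
    eigvals = np.linalg.eigvalsh(whitened)
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))


rng = np.random.default_rng(0)
patch_a = rng.random((7, 7))
patch_b = rng.random((7, 7))
cov_a = covariance_descriptor(patch_a)
cov_b = covariance_descriptor(patch_b)
d_ab = generalized_eigen_distance(cov_a, cov_b)
d_ba = generalized_eigen_distance(cov_b, cov_a)
d_aa = generalized_eigen_distance(cov_a, cov_a)
```

The descriptor size depends only on the number of attributes and not on the patch size, which is what makes such a representation low-dimensional. The distance is zero for identical descriptors and symmetric in its arguments.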

Afterward, the reconstruction weights are obtained by minimizing the reconstruction error. Finally, the output HR image is estimated by the linear combination of neighboring candidate HR patches. The performance of the suggested method is compared with some significant state-of-the-art learning-based methodologies, and it is found to outperform them in terms of image quality as well as searching speed.

The rest of the paper is organized as follows: Sect. 2 provides a brief discussion of some significant LBSISRR methods in the literature. A detailed description of the proposed reconstruction method is provided in Sect. 3. Performance evaluation and analysis of the proposed method, compared with some state-of-the-art LBSISRR methods along with some faster SISRR methods, are presented in Sect. 4. Section 5 concludes the paper.

2 Related LBSISRR Methods

2.1 Example-Based Single-Image SRR (EX-SISRR) Methodologies

Freeman et al. [8] pioneered the concept of example-based SRR of a single image. In this method, the authors select a large number of training patches for the reconstruction process. A band-pass filtered feature map is used to represent each patch. The nearest neighbor (NN) search method is used to find candidate patches for each input patch, and EuD is used to measure the patch similarity. An MRF network is used to learn the correlation among related HR-LR image patches, and the BP algorithm is used to train the MRF network. The maximum a posteriori (MAP) approach is used to estimate the HR patch by utilizing the learned prior knowledge. However, this method fails to preserve sufficient spatial correlation among patches due to the integration of Gaussian smoothness functions in the computation of the compatibility functions. Moreover, heavy computational overload and a slow rate of convergence make this method prohibitive in practical applications [14]. A number of methods have been proposed in the literature to address these limitations. The authors in [9, 10] proposed example-based methods with a faster convergence rate, whereas the methods in [12,13,14,15] aim at preserving more image details in the reconstructed output image. In method [9], each image patch is represented by Gaussian derivatives. A position constraint operation is used to compute the compatibility function and the squared difference is used as the patch similarity measure, whereas primal sketch priors are used to represent the patch in its feature space. In [11], candidate patches are searched via the K-means algorithm to improve the rate of convergence of the reconstruction process. In [12], the authors employ Contourlet coefficients to represent raw image patches, and the weighted Euclidean distance is used to measure the patch similarity. The method in [13] uses structural content to represent each patch, and patch similarity is measured via a modified Chi-square distance. In [14, 15], the authors use the image Euclidean distance (IMED) to measure patch similarity. The method in [15] uses a probabilistic way of searching for candidate patches and uses edge preserving compatibility functions to preserve more image details.

Table 1 provides a brief comparative analysis of the popularly used EX-SISRR methods. From the critical review of the literature, it is observed that EX-SISRR methodologies provide sharper output images by selecting appropriate compatibility functions. However, the computational burden of these methods remains high due to the use of a large number of example images for the reconstruction process and of EuD to measure patch similarity. Hence, these methods remain unsuitable for real-time applications.

Table 1 Comparative analysis of EX-SISRR methods
