1 Introduction

Image super-resolution (SR) refers to the process of creating clear and high-resolution (HR) images from a single low-resolution (LR) image or from a sequence of low-resolution observations (Schultz and Stevenson 1994). In this chapter, the most important SR techniques are explained.

The SR methods are addressed, including a definition of each. We cover the major lines of work in super-resolution: pure interpolation with high amplification scales, the use of dictionaries, variational procedures, and the exploitation of gradient sharpening. Each section of this chapter provides a guide to the technical comprehension of the corresponding procedure. The technical procedures of the cited articles are not fully reproduced, but neither do we give a superficial description lacking the ideas needed for a practical realization.

The first main separation between SR methods is determined by the resources employed in the process. In the first case, a group of LR images is used; these procedures correspond to the earliest publications on the topic. In the second case, motivated by practical situations, SR is carried out using only a single low-resolution input image. Figures 5.1 and 5.2 show the taxonomies of the most evident classification of the methods: multiple-image SR and single-image SR.

Fig. 5.1

Taxonomy of multiple-image super-resolution

Fig. 5.2

Taxonomy of single-image super-resolution

Within the second class of methods, we distinguish the domain of application: spatial or frequency. A further differentiation between SR methods is based on the mathematical models used to reach high resolution. Transformations, probabilistic prediction, direct projection, dictionary learning, dimensionality reduction, and reconstruction models based on minimization procedures and residual priors are discussed. A common goal is the incorporation of the lost high-frequency details. Finally, we propose two new methods for single-image SR: the first is based on gradient control, and the second is a hybrid method based on gradient control and total variation.

The rest of the chapter is organized as follows: In Sect. 5.2, the methods are explained. In Sect. 5.3, the results of the proposed methods are presented. In Sect. 5.4, the metrics used to characterize the methods are presented. Finally, the chapter concludes in Sect. 5.5.

2 Methods

The accuracy of the HR image estimate results from the correct selection of mathematical tools and signal-processing procedures, such as transformations, learning models, and minimization techniques, aimed at maximizing the high-spatial-frequency content, or details, in the output image. In this section, current methods as well as the proposed SR procedures are explained. Both single-image and multiple-image SR models are considered.

2.1 Image Model

Down-sampling and warping are two processes considered for a more realistic representation of the image at low resolution. In the first process, the image is averaged over equal areas of size \( q\, \times \,q \), as can be seen in Eq. (5.1). In the warping process, the image is shifted along the x and y directions by distances a and b given in pixels; a rotation θ of the image is also assumed (Irani and Peleg 1990; Schultz and Stevenson 1994), as can be observed in Eq. (5.2).

$$ g(m,n) = \frac{1}{{q^{2} }}\sum\limits_{x = qm}^{q(m + 1) - 1} {\sum\limits_{y = qn}^{q(n + 1) - 1} {f(x,y)} } $$
(5.1)
$$ w\left[ {\begin{array}{*{20}l} x \\ y \\ 1 \\ \end{array} } \right] = \left( {\left[ {\begin{array}{*{20}c} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \\ \end{array} } \right] \times \left[ {\begin{array}{*{20}l} {\cos \,\theta } & {\sin \,\theta } & 0 \\ { - \,\sin \,\theta } & {\cos \,\theta } & 0 \\ 0 & 0 & 1 \\ \end{array} } \right]} \right)^{ - 1} \left[ {\begin{array}{*{20}l} m \\ n \\ 1 \\ \end{array} } \right] $$
(5.2)
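As a quick illustration of the block averaging of Eq. (5.1), the following NumPy sketch down-samples an image by a factor q (the helper name is ours, not from the cited works):

```python
import numpy as np

def block_average(f, q):
    """Down-sample f by averaging non-overlapping q x q blocks (Eq. (5.1))."""
    h, w = f.shape
    assert h % q == 0 and w % q == 0, "dimensions must be multiples of q"
    # Split each axis into (blocks, within-block) and average within blocks
    return f.reshape(h // q, q, w // q, q).mean(axis=(1, 3))
```

Each output pixel g(m, n) is the mean of the q × q block of HR pixels it covers, matching the normalization 1/q² in the equation.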

In an SR algorithm, the model of degradation is fundamental for comparative purposes and for evaluating the effectiveness of the algorithm. Equation (5.1) considers the blurring and down-sampling processes, and Eq. (5.2) represents the warping operation. For a number k of LR images with added noise, the model becomes (Irani and Peleg 1990),

$$ g_{k} \left( {m,n} \right) = d\left( {h_{k} \left( {w_{k} \left( {f\left( {x,y} \right)} \right)} \right)} \right) + \eta_{k} \left( {m,n} \right) $$
(5.3)

Equation (5.3) incorporates the distortions that yield an LR image: d is the down-sampling operator, \( h_{k} \) is the blurring operator, \( w_{k} \) is the warping operator, and f(x, y) is the HR image. Furthermore, the blurring process can be composed of distortions due to the displacement \( h_{d} \), the lens \( h_{l} \), and the sensors \( h_{s} \); the result is the convolution of these kernels. The transformations are shown in Fig. 5.3.

Fig. 5.3

Steps to form three LR images \( g_{1} \), \( g_{2} \), and \( g_{3} \) from an HR image f. Each branch represents a different acquisition process

2.2 Image Registration

The translation and rotation relationship between an LR and an HR image is calculated using Eq. (5.4) (Keren et al. 1998),

$$ \begin{array}{*{20}l} {x = x_{k}^{t} + q_{x} m\,\cos \,\theta_{k} - q_{y} n\,\sin \,\theta_{k} } \\ {y = y_{k}^{t} + q_{x} m\,\sin \,\theta_{k} + q_{y} n\,\cos \,\theta_{k} } \\ \end{array} , $$
(5.4)

where \( x_{k}^{t} \) and \( y_{k}^{t} \) are the displacements, \( q_{x} \) and \( q_{y} \) the sampling rates, and \( \theta_{k} \) the rotation angle. Two acquisitions \( g_{1} \) and \( g_{2} \) with rotation and displacements can be related using Eq. (5.5).

$$ g_{2} (m,n) = g_{1} (m\,\cos \,\theta - n\,\sin \,\theta + a,n\,\cos \,\theta + m\,\sin \,\theta + b) $$
(5.5)

2.3 Approximation Between Acquisitions

The registration parameters have been approximated using Taylor series representations. In the first step, \( \sin \,\theta \) and \( \cos \,\theta \) are expressed by the first two terms of their series expansions.

$$ g_{2} (m,n) = g_{1} \left( {m + a - n\theta - \frac{{m\theta^{2} }}{2},n + b + m\theta - \frac{{n\theta^{2} }}{2}} \right). $$

Then, the function \( g_{1} \) can be expanded with a Taylor series,

$$ g_{2} (m,n) = g_{1} (m,n) + \left( {a - n\theta - \frac{{m\theta^{2} }}{2}} \right)\frac{{\partial g_{1} }}{\partial m} + \left( {b + m\theta - \frac{{n\theta^{2} }}{2}} \right)\frac{{\partial g_{1} }}{\partial n} $$
(5.6)
$$ E(a,b,\theta ) = \sum {\left( {g_{1} (m,n) + \left( {a - n\theta - \frac{{m\theta^{2} }}{2}} \right)\frac{{\partial g_{1} }}{\partial m} + \left( {b + m\theta - \frac{{n\theta^{2} }}{2}} \right)\frac{{\partial g_{1} }}{\partial n} - g_{2} (m,n)} \right)}^{2} $$
(5.7)

Finally, the parameters a, b, and \( \theta \) of Eq. (5.7) are determined by taking partial derivatives of the final expansion and solving the resulting equation system.
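Dropping the second-order θ² terms, Eq. (5.6) is linear in (a, b, θ), so minimizing Eq. (5.7) reduces to a small least-squares system. A minimal NumPy sketch (the helper name `register_keren` and the first-order simplification are our assumptions):

```python
import numpy as np

def register_keren(g1, g2):
    """Estimate shift (a, b) and small rotation theta between two frames.

    First-order version of the expansion of Eq. (5.6):
    g2 ~ g1 + a*dg1/dm + b*dg1/dn + theta*(m*dg1/dn - n*dg1/dm),
    which makes E(a, b, theta) of Eq. (5.7) quadratic in the parameters.
    """
    gm, gn = np.gradient(g1.astype(float))       # d/dm (rows), d/dn (cols)
    m, n = np.indices(g1.shape)
    r = m * gn - n * gm                          # rotation basis image
    A = np.stack([gm.ravel(), gn.ravel(), r.ravel()], axis=1)
    rhs = (g2 - g1).ravel().astype(float)
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    a, b, theta = params
    return a, b, theta
```

In practice the estimate is refined iteratively, warping g2 by the current parameters and re-solving until the update is negligible.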

2.4 Frequency Domain

The models in the frequency domain rely on sampling theory. There, a 2D array of Dirac deltas \( (D_{T}) \) performs the sampling function; the array has the same form in the spatial and frequency domains (a 2D impulse train). The acquisition process multiplies the array of deltas with the image point by point in the spatial domain; in the frequency domain, this operation becomes a convolution. The advantage is that the resolution of the convolution kernel (the sampling array in the frequency domain, in the interval [−π, π]) can be increased for optimal amplification scales by checking the high-frequency content at the output of the process. The Fourier transform of the sampling array is shown in Eq. (5.8),

$$ D_{T} (\omega^{\prime}_{x} ,\omega^{\prime}_{y} ) = \frac{{\sin \left\{ {\omega^{\prime}_{x} \left( {\frac{M - 1}{2}} \right)\Delta x} \right\}\sin \left\{ {\omega^{\prime}_{y} \left( {\frac{L - 1}{2}} \right)\Delta y} \right\}}}{{\sin \left\{ {\omega^{\prime}_{x} \frac{\Delta x}{2}} \right\}\sin \left\{ {\omega^{\prime}_{y} \frac{\Delta y}{2}} \right\}}}, $$
(5.8)

and the convolution with the image can be expressed as in Eq. (5.9),

$$ \begin{aligned} S_{\text{amp}} (j_{1} ,j_{2} ) & = \sum\limits_{{n_{x} = - L/2}}^{L/2} {\sum\limits_{{m_{x} = - M/2}}^{M/2} {S(n_{x} \Delta \omega_{x} ,n_{y} \Delta \omega_{y} )} } \cdot \\ & D_{T} (j_{1} - n_{x} \Delta \omega_{x} + M_{cx} ,j_{2} - n_{y} \Delta \omega_{y} + L_{cx} ;j_{1} ,j_{2} ) \\ \end{aligned} $$
(5.9)

The high-frequency content in \( S_{\text{amp}} \) must be maximized. This strategy has been used in (Morera 2015). Figure 5.4 shows a 1D sampling array in space and frequency domains.
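The 1D building block of Eq. (5.8) is the standard Dirichlet kernel, which can be checked numerically against a direct DTFT summation of a finite impulse train (the closed form sin(Mωδ/2)/sin(ωδ/2) below is the textbook 1D kernel; Eq. (5.8) is its separable 2D version):

```python
import numpy as np

def impulse_train_dtft(M, dx, w):
    """DTFT of a centered train of M unit impulses spaced dx apart,
    evaluated at frequencies w by direct summation."""
    k = np.arange(M) - (M - 1) / 2           # centered impulse indices
    return np.array([np.sum(np.exp(-1j * wi * k * dx)) for wi in w]).real

def dirichlet_kernel(M, dx, w):
    """Closed-form Dirichlet kernel sin(M*w*dx/2) / sin(w*dx/2);
    w = 0 is a removable singularity (value M) and is avoided here."""
    return np.sin(M * w * dx / 2) / np.sin(w * dx / 2)
```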

Fig. 5.4

Sampling array in a space domain and b frequency domain

2.5 Wavelet Transform

The wavelet transform generally decomposes the image into four fields of information. The common decomposition brings directional information about the fluctuations of the image signal, and the transform coefficients appear in four groups: the low-frequency coefficients, which are a coarse representation of the image, and the horizontal, vertical, and diagonal coefficients, which represent details of the directional variations of the image. The most common strategy for SR using wavelets applies a non-sub-sampled (stationary) wavelet transform before a wavelet reconstruction: the first step produces a decomposition into four images with the same dimensions as the input, and the wavelet reconstruction then produces an amplified image with scale factor 2. This strategy is employed in (Morera 2014).
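A one-level Haar decomposition illustrates the four coefficient groups (a bare NumPy sketch; a real implementation would use a wavelet library with proper filters and boundary handling):

```python
import numpy as np

def haar_subbands(img):
    """One-level Haar decomposition into LL, LH, HL, HH subbands.

    Minimal sketch of the four-group decomposition described above.
    Image dimensions are assumed even.
    """
    a = img[0::2, 0::2].astype(float)
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    ll = (a + b + c + d) / 2           # coarse approximation
    lh = (a - b + c - d) / 2           # horizontal detail
    hl = (a + b - c - d) / 2           # vertical detail
    hh = (a - b - c + d) / 2           # diagonal detail
    return ll, lh, hl, hh
```

The transform is invertible, e.g. the top-left sample of each 2 × 2 block is recovered as (LL + LH + HL + HH)/2, which is what the wavelet-reconstruction step of the SR strategy exploits.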

2.6 Multiple-Image SR

The main goal of this group of techniques is the simulation of the image formation process in order to reject the aliasing effects caused by down-sampling. A group of LR acquisitions of the same scene is required to estimate the HR image.

2.6.1 Iterative Back-Projection

Iterative back-projection (IBP) methods were the first developed for spatial-domain SR. The IBP algorithm yields the desired image by driving the reconstruction error close to zero; in other words, IBP is convergent. Having defined an imaging model like the one given in Eq. (5.3), the distance \( \left\| {Af - g} \right\|_{2}^{2} \) is minimized, where the matrix \( A \) includes the blur, down-sampling, and warping operations, \( f \) is the original HR image, and \( g \) is the observed image. An initial HR estimate is generated and afterward refined; such a guess can be obtained by registering the LR images over an HR grid and then averaging them (Irani and Peleg 1990, 1991, 1992, 1993). The iterative model given in Eq. (5.10) refines this estimate from the set of available LR observations: the error between the simulated LR images and the observed ones is computed and back-projected onto the coordinates of the HR image to improve the initial estimation (Irani and Peleg 1993). The Richardson iteration is commonly used in these techniques.

$$ f^{(t + 1)} (x,y) = f^{(t)} (x,y) + \frac{1}{K}\sum\limits_{k = 1}^{K} {w_{k}^{ - 1} \left( {\left( {\left( {g_{k} - g_{k}^{(t)} } \right)\dot{d}} \right) * \dot{h}} \right)} , $$
(5.10)

where \( w_{k}^{ - 1} \) is the inverse of the warping operator, \( \dot{d} \) is the up-sampling operator, \( \dot{h} \) is a deblurring kernel, k = 1…K indexes the LR acquisitions, \( f^{(t + 1)} (x,y) \) is the reconstructed SR image at iteration (t + 1), and \( f^{(t)} (x,y) \) is the reconstructed SR image at the previous iteration t. The shortcoming of this algorithm is that it produces artifacts along salient edges.
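A minimal single-frame sketch of the back-projection loop (our simplified stand-in for Eq. (5.10): no warping or blur, nearest-neighbor up-sampling for \( \dot{d} \), and the block averaging of Eq. (5.1) as the down-sampling operator):

```python
import numpy as np

def downsample(img, q):
    """Average over non-overlapping q x q blocks (Eq. (5.1))."""
    h, w = img.shape
    return img[:h - h % q, :w - w % q].reshape(h // q, q, w // q, q).mean(axis=(1, 3))

def upsample(img, q):
    """Nearest-neighbor up-sampling, a simple choice for the operator."""
    return np.kron(img, np.ones((q, q)))

def ibp(g, q, n_iter=50, step=1.0):
    """Iterative back-projection for a single LR frame: simulate the LR
    image from the current HR estimate, up-project the residual, repeat."""
    f = upsample(g, q)                       # initial HR guess
    for _ in range(n_iter):
        err = g - downsample(f, q)           # LR-domain residual
        f = f + step * upsample(err, q)      # back-project the error
    return f
```

After convergence, re-down-sampling the HR estimate reproduces the LR observation, which is the consistency condition the full multi-frame algorithm enforces across all K acquisitions.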

2.6.2 Maximum Likelihood

The noise term in the imaging model given in Eq. (5.3) is assumed to be additive white Gaussian noise (AWGN) with zero mean and variance \( \sigma^{2} \). Assuming the measurements are independent and the error between images is uncorrelated, the likelihood function of an observed LR image \( g_{k} \) given an estimated HR image \( \hat{f} \) (Cheeseman et al. 1994; Capel and Zisserman 1998; Elad and Hel-Or 2001; Farsiu et al. 2004; Pickup et al. 2006; Pickup 2007; Prendergast and Nguyen 2008; Jung et al. 2011a) is,

$$ p\left( {g_{k} |\hat{f}} \right) = \prod\limits_{\forall m,n} {\frac{1}{{\sqrt {2\pi \sigma^{2} } }}} \exp \left( { - \frac{{\left( {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}_{k} - g_{k} } \right)^{2} }}{{2\sigma^{2} }}} \right). $$
(5.11)

The log-likelihood transforms the product into a summation. Therefore, Eq. (5.11) becomes the summation of a term C that does not depend on \( f \) and the summation of the exponents of the exponential function as shown in Eq. (5.12),

$$ L\left( {g_{k} } \right) = C - \frac{1}{{2\sigma^{2} }}\sum\limits_{\forall m,n} {\left( {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}_{k} - g_{k} } \right)^{2} } . $$
(5.12)

The maximum likelihood (ML) solution (Woods and Galatsanos 2005) seeks a super-resolved image \( \hat{f}_{\text{ML}} \) that maximizes the log-likelihood over all observations. Notice that the constant term does not affect the maximization. Therefore, the super-resolved image can be obtained by maximizing Eq. (5.12) or, equivalently, by minimizing the distance between \( \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}_{k} \) and \( g_{k} \) as,

$$ \hat{f}_{\text{ML}} = \mathop {\arg \hbox{max} }\limits_{f} \left( {\sum\limits_{\forall m,n} {L(g_{k} )} } \right) = \mathop {\arg \hbox{min} }\limits_{f} \left( {\left\| {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}_{k} - g_{k} } \right\|_{2}^{2} } \right). $$
(5.13)

2.6.3 Maximum a Posteriori

Given the LR images g k , the maximum a posteriori (MAP) method (Cheeseman et al. 1994) finds an estimate \( \hat{f}_{\text{MAP}} \) of the HR image by using the Bayes rule in Eq. (5.14),

$$ p\left( {\hat{f}|g_{1} ,g_{2} , \ldots ,g_{k} } \right) = \frac{{p\left( {g_{1} ,g_{2} , \ldots ,g_{k} |f} \right)p\left( f \right)}}{{p\left( {g_{1} ,g_{2} , \ldots ,g_{k} } \right)}} \propto p\left( {g_{1} ,g_{2} , \ldots ,g_{k} |f} \right)p\left( f \right) $$
(5.14)

The estimate can be found by maximizing the log of Eq. (5.14). Notice that the denominator is a constant term that normalizes the conditional probability; it does not depend on \( f \), so it vanishes from the maximization. Then,

$$ \hat{f}_{\text{MAP}} = \mathop {\arg \hbox{max} }\limits_{f} \left( {\log \left( {p\left. {\left( {g_{1} ,g_{2} , \ldots ,g_{k} |f} \right)} \right) + \log p\left( f \right)} \right)} \right). $$
(5.15)

Applying statistical independence between the images \( g_{k} \), Eq. (5.15) can be written as,

$$ \hat{f}_{\text{MAP}} = \mathop {\arg \hbox{max} }\limits_{f} \left( {\sum\limits_{k = 1}^{K} {\log \left( {p\left( {g_{k} |f} \right)} \right)} + \log \left( {p(f)} \right)} \right), $$
(5.16)

where

$$ p\left( {g_{k} |f} \right) \propto \,\exp \left( { - \frac{{\left\| {\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{g}_{k} - g_{k} } \right\|_{2}^{2} }}{{2\sigma_{k}^{2} }}} \right) $$

The prior \( p\left( f \right) \) acts as the regularization term. This term has been modeled in many different forms; some cases are:

  1. Natural image prior (Tappen et al. 2003; Kim and Kwon 2008, 2010).

  2. Stationary simultaneous autoregression (SAR) (Villena et al. 2004), which applies uniform smoothness to all the locations in the image.

  3. Non-stationary SAR (Woods and Galatsanos 2005), in which the variance of the SAR prediction can differ from one location in the image to another.

  4. Soft edge smoothness prior, which estimates the average length of all level lines in an intensity image (Dai et al. 2007, 2009).

  5. Double-exponential Markov random field, which is simply the absolute value of each pixel value (Debes et al. 2007).

  6. Potts–Strauss MRF (Martins et al. 2007).

  7. Non-local graph-based regularization (Peyre et al. 2008).

  8. Corner and edge preservation regularization term (Shao and Wei 2008).

  9. Multi-channel smoothness prior, which considers the smoothness between frames (temporal residual) and within frames (spatial residual) of a video sequence (Belekos et al. 2010).

  10. Non-local self-similarity (Dong et al. 2011).

  11. Total subset variation, a convex generalization of the total variation (TV) regularization strategy (Kumar and Nguyen 2010).

  12. Mumford–Shah regularization term (Jung et al. 2011b).

  13. Morphological-based regularization (Purkait and Chanda 2012).

  14. Wavelet-based regularization (Li et al. 2008; Mallat and Yu 2010).

2.7 Single-Image SR

Single-image SR is a severely ill-posed problem: effective prior knowledge about the HR image is necessary to obtain a well-posed HR estimation. These algorithms are designed to work from a single low-resolution acquisition of the image. Some of the proposed strategies are summarized below:

  1. Pure interpolation, using estimation of the unknown pixels in the HR image, modification of the interpolation kernel, and checking of the high-frequency content in the estimated HR output image.

  2. Learning the HR information from external databases. In this case, many strategies for concentrating the image information and clustering are used. The image is divided into overlapping patches, and this information is mapped over a dictionary of LR–HR pairs of patches from external images.

  3. Managing the gradient information of the image.

  4. Hybrid models that reconstruct the image with a minimization procedure in which some prior knowledge about the estimation error is included.

2.7.1 Geometric Duality

The concept of geometric duality is one of the most useful tools in parametric SR with least-squares estimation for interpolation, and one of the most cited algorithms for comparison among SR methods is the new edge-directed interpolation (NEDI) (Li and Orchard 2001).

The idea behind it is that each LR pixel also exists in the HR image, while its neighboring pixels are unknown. Hence, with two orthogonal pairs of directions around the LR pixel in the HR image (horizontal, vertical, and diagonal directions), a least-squares estimation can be used for each pair. The equation system is constructed in the LR image, and the coefficients are then used to estimate pixels in the initial HR image. The first estimation is made using Eq. (5.17),

$$ \hat{Y}_{2i + 1,2j + 1} = \sum\limits_{k = 0}^{1} {\sum\limits_{l = 0}^{1} {\alpha_{2k + l} Y_{2(i + k),2(j + l)} } } $$
(5.17)

where the coefficients are obtained from the same configuration in the LR image. In this first step, the unknown HR pixels whose four known neighbors lie along the diagonal directions are estimated, as in Eq. (5.17). In the next step, the remaining unknown pixels, whose known neighbors lie along the horizontal and vertical directions, are estimated. The pixels of each category are shown in Fig. 5.5.
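The first NEDI pass can be sketched as follows (a simplified illustration of Eq. (5.17): the window size, the unfilled borders, and the helper name are our choices, not the exact algorithm of Li and Orchard):

```python
import numpy as np

def nedi_first_pass(lr):
    """Sketch of the first NEDI pass: each missing HR pixel Y[2i+1, 2j+1]
    is predicted from its four surrounding LR pixels, with weights alpha
    fitted by least squares in a local LR window (geometric duality).
    Borders are left unfilled for brevity."""
    h, w = lr.shape
    hr = np.zeros((2 * h, 2 * w))
    hr[0::2, 0::2] = lr                     # known LR pixels on the HR grid
    for i in range(2, h - 3):
        for j in range(2, w - 3):
            rows, rhs = [], []
            # Regress each LR pixel in a 4x4 window on its 4 diagonal neighbors
            for di in range(-1, 3):
                for dj in range(-1, 3):
                    ci, cj = i + di, j + dj
                    rows.append([lr[ci - 1, cj - 1], lr[ci - 1, cj + 1],
                                 lr[ci + 1, cj - 1], lr[ci + 1, cj + 1]])
                    rhs.append(lr[ci, cj])
            alpha, *_ = np.linalg.lstsq(np.array(rows, float),
                                        np.array(rhs, float), rcond=None)
            # Duality: reuse alpha on the missing HR pixel's 4 LR neighbors
            nb = np.array([lr[i, j], lr[i, j + 1],
                           lr[i + 1, j], lr[i + 1, j + 1]], float)
            hr[2 * i + 1, 2 * j + 1] = alpha @ nb
    return hr
```

The duality assumption is that the local covariance structure is scale-invariant, so weights fitted at low resolution transfer to the high-resolution grid.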

Fig. 5.5

Array of pixels in the initial HR image for NEDI interpolation. Black pixels are the LR pixels used to calculate the HR gray pixels. The white pixels are calculated using the white and the black pixels

The method in (Zhang and Wu 2008) takes advantage of NEDI. There, a new restriction is applied by including the pixels estimated in the second step, and the least-squares estimation is made using the 8-connected pixels around a central pixel in the diamond configuration shown in Fig. 5.6. The authors define a 2D piecewise autoregressive (PAR) image model with parameters \( a_{k} ,k \in [1, \ldots ,4] \) to characterize the diagonal correlations in a local diamond window \( W \), and extra parameters \( b_{k} ,k \in [1, \ldots ,4] \) to impose horizontal and vertical correlations in the LR image, as shown in Fig. 5.6b. The parameters are obtained with a linear least-squares estimator using the four 4-connected neighbors for \( b_{k} \) (horizontal and vertical) and the four 8-connected diagonal neighbors available in the LR image for \( a_{k} \).

Fig. 5.6

a Spatial configuration for the known and missing pixels and b the parameters used to characterize the diagonal, horizontal, and the vertical correlations (Zhang and Wu 2008)

To interpolate the missing HR pixel in the window, the least-square strategy of Eq. (5.18) is carried out.

$$ {\hat{\mathbf{y}}} = \mathop {\arg \hbox{min} }\limits_{{\mathbf{y}}} \left\{ {\sum\limits_{i \in W} {\left\| {y_{i} - \sum\limits_{1 \le k \le 4} {a_{k} x_{i\diamondsuit k}^{(8)} } } \right\|} + \sum\limits_{i \in W} {\left\| {x_{i} - \sum\limits_{1 \le k \le 4} {a_{k} y_{i\diamondsuit k}^{(8)} } } \right\|} } \right\}, $$
(5.18)

where \( x_{i} \) and \( y_{i} \) are the LR and HR pixels, respectively, \( x_{i\diamondsuit k}^{(8)} \) are the four 8-connected LR neighbors available for a missing \( y_{i} \) pixel, and \( y_{i\diamondsuit k}^{(8)} \) denotes the four missing 8-connected HR neighbors of an \( x_{i} \) pixel.

Another approach to NEDI-like algorithms (Ren et al. 2006; Hung and Siu 2012) uses a weighting matrix \( {\mathbf{W}} \) to assign different influences of the neighboring pixels on the pixel under estimation; the correlation is affected by the distance between pixels. The diagonal correlation model parameter is estimated using a weighted least-squares strategy,

$$ {\mathbf{A}} = \left( {{\mathbf{L}}_{LA}^{T} {\mathbf{WL}}_{LA} } \right)^{ - 1} {\mathbf{L}}_{LA}^{T} {\mathbf{WL}}, $$
(5.19)

where \( {\mathbf{A}} \in R^{4 \times 1} \) is the diagonal correlation model parameter, \( {\mathbf{L}} \in R^{64 \times 1} \) is a vector of the LR pixels, \( {\mathbf{L}}_{LA} \in R^{64 \times 4} \) are the neighbors of \( {\mathbf{L}} \), and \( {\mathbf{W}} \in R^{64 \times 64} \) is the weighting matrix of Eq. (5.20).

$$ {\mathbf{W}}_{i,i} = \exp \left( { - 2\left( {\left\| {{\mathbf{L}}_{c} - {\mathbf{L}}_{LAi} } \right\|_{p} /\sigma_{1} + \left\| {{\mathbf{V}}_{c} - {\mathbf{V}}_{LAi} } \right\|_{p} /\sigma_{2} } \right)} \right), $$
(5.20)

where \( \sigma_{1} \) and \( \sigma_{2} \) are global filter parameters, \( {\mathbf{L}}_{c} \in R^{4 \times 1} \) is the HR geometric structure, \( {\mathbf{L}}_{LAi} \in R^{4 \times 1} \) is the ith LR geometric structure, \( \left\| {\, \cdot \,} \right\|_{p} \) denotes the p-norm (1 or 2), and \( {\mathbf{V}}_{LAi} ,{\mathbf{V}}_{c} \in R^{2 \times 1} \) are the coordinates of \( {\mathbf{L}}_{LAi} \) and \( {\mathbf{L}}_{c} \). The all-rounded correlation model parameter \( {\mathbf{B}} \in R^{8 \times 1} \) is given by,

$$ {\mathbf{B}} = \left( {{\mathbf{L}}_{LB}^{T} {\mathbf{WL}}_{LB} } \right)^{ - 1} {\mathbf{L}}_{LB}^{T} {\mathbf{WL}}, $$
(5.21)

where \( {\mathbf{L}}_{LB} \in R^{64 \times 8} \) are the neighbor’s positions in \( {\mathbf{L}} \).

2.7.2 Learning-Based SR Algorithms

In these algorithms, the relationship between HR and LR examples (from a specific class, such as face images or animals) is learned. The training database, an example of which is shown in Fig. 5.8, needs to have proper characteristics (Kong et al. 2006). The learned knowledge serves as an a priori term for the reconstruction. The measure of the two factors of sufficiency and predictability is explained in (Kong et al. 2006). In general, a larger database yields better results, but a larger number of irrelevant examples only increases the computational search time and can degrade the results. The content-based classification of image patches (as a codebook) during training is suggested as an alternative in (Li et al. 2009).

The dictionary can be self-learned or built from external images. Typically, techniques such as K-means are used to cluster the n observations into k clusters, and principal component analysis (PCA) is employed to reduce the dimensionality of the data, an important step in dictionary learning. Figure 5.7 shows a typical projection model for dictionary learning in SR. Figure 5.8 shows an example of an LR input image and a pair of LR–HR dictionary images.

Fig. 5.7

Projection of an input image using two external LR–HR dictionaries

Fig. 5.8

Low-resolution input image and a pair of LR–HR dictionary images

The PCA projection is based on finding the eigenvectors and eigenvalues of an autocorrelation matrix. This can be expressed as,

$$ {\varvec{\Omega}} = {\varvec{\Psi}}{\varvec{\Lambda}}{\varvec{\Psi}}^{T} , $$
(5.22)

where \( {\varvec{\Omega}} \) is the autocorrelation matrix of the input data \( {\mathbf{U}} \), \( {\varvec{\Psi}} \) is the matrix of eigenvectors, and \( {\varvec{\Lambda}} \) is a diagonal matrix containing the eigenvalues. The eigenspace is the projection of \( {\mathbf{U}} \) into the eigenvectors. The data at high and low resolution \( {\mathbf{U}}_{h} \) and \( {\mathbf{U}}_{l} \) are used to find the minimum distance in a projection over the found eigenspace.

$$ {\varvec{\Delta}}_{h} = {\mathbf{U}}_{h}^{k} {\varvec{\Psi}}_{h} . $$
(5.23)

In the dictionary search, the patches correspond to rows or columns of the data matrix \( {\mathbf{U}}^{h} \) or \( {\mathbf{U}}^{l} \). The strategy is to find the position of the HR patch at minimum distance with respect to the projection of an LR patch in the HR eigenspace.

$$ ph({\text{pos}}) = \mathop {\hbox{min} }\limits_{v,l,j} \left\| {\Delta_{h}^{T} {\hat{\mathbf{U}}}_{k}^{h} - \Delta_{h}^{T} \hat{v}_{l,j} } \right\|_{2} ,\quad\left| {\hat{v}_{l,j} \in } \right.{\hat{\mathbf{U}}}_{k}^{l}. $$
(5.24)
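The eigen-decomposition of Eq. (5.22) and the minimum-distance search of Eq. (5.24) can be sketched as follows (the function names and the one-patch-per-row layout are our assumptions):

```python
import numpy as np

def pca_eigenspace(U, k):
    """Eigen-decomposition of the autocorrelation matrix (Eq. (5.22)) and
    projection of the patch data onto the top-k eigenvectors.
    U holds one patch per row."""
    omega = U.T @ U / U.shape[0]            # autocorrelation matrix
    vals, psi = np.linalg.eigh(omega)       # eigenvalues in ascending order
    psi_k = psi[:, ::-1][:, :k]             # top-k eigenvectors
    return U @ psi_k, psi_k                 # projected data, basis

def nearest_hr_patch(lr_proj, hr_proj):
    """Index of the dictionary patch whose projection is closest to the
    query patch's projection (the minimum-distance search of Eq. (5.24))."""
    d = np.linalg.norm(hr_proj - lr_proj, axis=1)
    return int(np.argmin(d))
```

Searching in the k-dimensional eigenspace rather than in the raw patch space is what makes the dictionary lookup tractable for large databases.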

2.7.3 Diffusive SR

Perona and Malik (1990) developed a method that employs a diffusion equation for the reconstruction of the image. The local context of the image is processed using a function to restore the edges.

$$ {\text{div}}\,\left( {c\nabla {\mathbf{I}}} \right) = \frac{\partial }{\partial x}\left( {c{\mathbf{I}}_{x} } \right) + \frac{\partial }{\partial y}\left( {c{\mathbf{I}}_{y} } \right) $$
(5.25)

where \( c \) is a function that controls the diffusivity. For example, if \( c = 1 \), the process is linear, isotropic, and homogeneous; if \( c \) is a function that depends on \( {\mathbf{I}}_{x} \), i.e., \( c = c({\mathbf{I}}_{x} ) \), the process becomes a nonlinear diffusion; and if \( c \) is a matrix-valued diffusivity, the process is called anisotropic and leads to a diffusion that differs along different directions. The image is differentiated in the cardinal directions, and a group of coefficients is obtained at each point using the gradient information.

$$ \nabla_{N} I_{i,j} \equiv I_{i - 1,j} - I_{i,j} ,\,\,\,\nabla_{S} I_{i,j} \equiv I_{i + 1,j} - I_{i,j} ,\,\,\nabla_{E} I_{i,j} \equiv I_{i,j + 1} - I_{i,j} ,\,\,\nabla_{W} I_{i,j} \equiv I_{i,j - 1} - I_{i,j} $$
(5.26)
$$ \begin{aligned} c_{N\,i,j}^{t} = & g\left( {\left\| {\left( {\nabla {\mathbf{I}}} \right)_{i + (1/2),j}^{t} } \right\|} \right),\,\,c_{S\,i,j}^{t} = g\left( {\left\| {\left( {\nabla {\mathbf{I}}} \right)_{i - (1/2),j}^{t} } \right\|} \right),c_{E\,i,j}^{t} = g\left( {\left\| {\left( {\nabla {\mathbf{I}}} \right)_{i,j + (1/2)}^{t} } \right\|} \right) \\ & \quad \,\,\,c_{W\,i,j}^{t} = g\left( {\left\| {\left( {\nabla {\mathbf{I}}} \right)_{i,j - (1/2)}^{t} } \right\|} \right). \\ \end{aligned} $$
(5.27)

Finally, the image is reconstructed by adding the variations in the iterative process of Eq. (5.28).

$$ {\mathbf{I}}_{i,j}^{(t + 1)} = {\mathbf{I}}_{i,j}^{(t)} + \lambda \left[ {c_{N} \nabla_{N} {\mathbf{I}} + \,\,c_{S} \nabla_{S} {\mathbf{I}} + \,\,c_{E} \nabla_{E} {\mathbf{I}} + \,\,c_{W} \nabla_{W} {\mathbf{I}}} \right]_{i,j}^{(t)} $$
(5.28)

This principle has guided local-in-time processing in image processing algorithms that adapt to the local context of the image.
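The scheme of Eqs. (5.26)-(5.28) condenses into a few NumPy lines (the periodic boundaries via np.roll and the exponential diffusivity g(s) = exp(−(s/κ)²) are our simplifying choices; Perona and Malik also propose a rational diffusivity):

```python
import numpy as np

def perona_malik(img, n_iter=20, lam=0.2, kappa=10.0):
    """Minimal Perona-Malik diffusion: directional differences (Eq. (5.26)),
    edge-stopping coefficients (Eq. (5.27)), and the explicit update of
    Eq. (5.28). lam <= 0.25 keeps the 4-neighbor scheme stable."""
    I = img.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)   # diffusivity, small near edges
    for _ in range(n_iter):
        dN = np.roll(I, 1, axis=0) - I        # I[i-1, j] - I[i, j]
        dS = np.roll(I, -1, axis=0) - I       # I[i+1, j] - I[i, j]
        dE = np.roll(I, -1, axis=1) - I       # I[i, j+1] - I[i, j]
        dW = np.roll(I, 1, axis=1) - I        # I[i, j-1] - I[i, j]
        I = I + lam * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW)
    return I
```

Because the coefficients shrink where gradients are large, flat regions are smoothed while strong edges survive, which is the edge-preserving behavior the section describes.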

2.7.4 TFOCS

The reconstruction methods require powerful mathematical tools for the minimization of the estimation error. A commonly used resource is the split-Bregman model, in which the L1 norm is employed. In (Becker et al. 2011), the library Templates for First-Order Conic Solvers (TFOCS) was designed to facilitate the construction of first-order methods for a variety of convex optimization problems. Its development was motivated by its authors' interest in compressed sensing, sparse recovery, and low-rank matrix completion. In its general form, this tool lets us solve for x in the inverse problem:

$$ \hbox{min} \,\phi (x)\mathop = \limits^{\Delta } f({\rm A}(x) + b) + h(x), $$
(5.29)

where the function f is smooth and convex, h is convex, \( {\rm A} \) is a linear operator, and \( b \) is a bias vector. The function h also must be prox-capable; in other words, it must be inexpensive to compute its proximity operator, given in Eq. (5.30),

$$ \Phi _{h} (x,t) = \mathop {\arg \hbox{min} }\limits_{z\,} \,h(z) + \frac{1}{2}t^{ - 1} \left\langle {z - x,z - x} \right\rangle $$
(5.30)

The following is an example of a solution with TFOCS; consider the problem,

$$ \hbox{min} \,\frac{1}{2}\left\| {Ax - b} \right\|_{2}^{2} \quad {\text{s}}.{\text{t}}.\quad \left\| x \right\|_{1} \le \tau $$
(5.31)

this problem can be written as:

$$ \hbox{min} \frac{1}{2}\left\| {Ax - b} \right\|_{2}^{2} + h(x) $$
(5.32)

where \( h(x) = 0 \) if \( \left\| x \right\|_{1} \le \tau \) and +∞ otherwise. Translated to a single line of code:

x = tfocs( smooth_quad, { A, -b }, proj_l1( tau ) );
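For readers without MATLAB, the constrained problem of Eq. (5.31) can also be sketched with a plain projected-gradient loop in Python; this is a simplified stand-in, not the TFOCS first-order machinery, and the L1-ball projection follows the standard sorting-based construction:

```python
import numpy as np

def project_l1(x, tau):
    """Euclidean projection onto the L1 ball {x : ||x||_1 <= tau}
    via the standard sort-and-threshold procedure."""
    if np.abs(x).sum() <= tau:
        return x
    u = np.sort(np.abs(x))[::-1]
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(x) + 1) > (cssv - tau))[0][-1]
    theta = (cssv[rho] - tau) / (rho + 1.0)
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def lasso_pg(A, b, tau, n_iter=500):
    """Projected gradient for min 0.5*||Ax - b||^2 s.t. ||x||_1 <= tau."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of grad
    for _ in range(n_iter):
        x = project_l1(x - step * A.T @ (A @ x - b), tau)
    return x
```

The indicator function h of Eq. (5.32) appears here implicitly: its proximity operator is exactly the projection onto the L1 ball.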

The library was employed in Ren et al. (2017) for the minimization of an estimation function in which two priors are employed: the first, a differential with respect to a new estimate based on the TV of a central patch with respect to a search window (adaptive high-dimensional non-local total variation, AHNLTV); and the second, a weighted adaptive geometric duality (AGD) term. Figure 5.9 shows the visual comparison between bicubic interpolation and the AHNLTV-AGD method after HR image estimation.

Fig. 5.9

Visual comparison of the HR image using a bicubic interpolation and b AHNLTV-AGD method

2.7.5 Total Variation

Total variation (TV) uses a regularization term as in the MAP formulation. It applies similar penalties to smooth and step edges, preserves edges, and avoids ringing effects; Eq. (5.33) gives the TV term,

$$ \rho (f) = \left\| {\nabla f} \right\|_{1} $$
(5.33)

where ∇ is the gradient operator. The TV term can be weighted with an adaptive spatial algorithm based on differences in the curvature. For example, the bilateral total variation (BTV) (Farsiu et al. 2003) is used to approximate TV, and it is defined in Eq. (5.34),

$$ \rho (f) = \sum\limits_{k = 0}^{P} {\sum\limits_{l = 0}^{P} {\alpha^{|k| + |l|} \left\| {f - S_{x}^{k} S_{y}^{l} f} \right\|_{1} } } $$
(5.34)

where \( S_{x}^{k} \) and \( S_{y}^{l} \) shift \( f \) by k and l pixels in the x and y directions to present several scales of derivatives, \( 0 < \alpha < 1 \) imposes a spatial decay on the results (Farsiu et al. 2003), and P is the largest scale at which the derivatives are calculated, so derivatives are taken at multiple scales of resolution (Farsiu et al. 2006). In (Wang et al. 2008), the authors note that this a priori term generates saturated data when applied to unmanned aerial vehicle imagery. Therefore, it has been suggested to combine it with the Huber function, resulting in the Huber BTV of Eq. (5.35),

$$ \rho \left( {\left| x \right|} \right) = \left\{ {\begin{array}{*{20}l} {\frac{{\left| {\nabla x} \right|^{2} }}{2},\quad {\text{if}}\;A < \alpha } \\ {\frac{\partial A}{\partial x},\quad {\text{otherwise}}} \\ \end{array} } \right., $$
(5.35)

where A is the BTV regularization term and α is obtained as the median absolute deviation α = median(|A − median(A)|). This term keeps the smoothness of the continuous regions and preserves edges in discontinuous regions (Wang et al. 2008). In (Li et al. 2010), a locally adaptive version of BTV, called LABTV, was introduced to provide a balance between noise suppression and the preservation of image details. To do so, an Lp norm is used instead of the L1 norm. The value of p for each pixel is defined based on the difference between the pixel and its surroundings: in smooth regions, where noise reduction is important, p is set to a large value, close to two; in non-smooth regions, where edge preservation is important, p is set to small values, close to one. The same idea of adaptive norms, but with different methods for obtaining the weights, has been employed in (Omer and Tanaka 2010; Song et al. 2010; Huang et al. 2011; Liu and Sun 2011; Mochizuki et al. 2011).
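The BTV regularizer of Eq. (5.34) is straightforward to evaluate (a sketch using periodic shifts via np.roll and the α^(k+l) spatial decay of Farsiu et al.; a real implementation would handle borders explicitly):

```python
import numpy as np

def btv(f, P=2, alpha=0.7):
    """Bilateral total variation regularizer: sum of spatially decayed
    L1 norms of differences between f and its shifted copies (Eq. (5.34))."""
    total = 0.0
    for k in range(P + 1):
        for l in range(P + 1):
            if k == 0 and l == 0:
                continue                      # zero-shift term is trivially 0
            shifted = np.roll(np.roll(f, k, axis=1), l, axis=0)
            total += alpha ** (k + l) * np.abs(f - shifted).sum()
    return total
```

A perfectly flat image scores zero, while any structure or noise increases the penalty, which is what a minimizer trades off against the data-fidelity term.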

2.7.6 Gradient Management

Gradients are a topic of interest in SR. Intensity changes in the image are fundamental evidence of resolution, and high-frequency content produces the largest changes of values between consecutive pixels. Gradient management has been addressed in two forms: first, by using an external dictionary of HR gradients, and second, by working directly on the LR image and reconstructing the HR gradients from the context of the image and regularization terms.

In these methods (Sun et al. 2008; Wang et al. 2013), a relationship is established in order to sharpen the edges. In the first case, the gradients of an external HR database are analyzed, and the gradients of the LR input image are reconstructed with a dictionary technique. In the second case, the technique does not require external dictionaries: the procedure is guided by the second derivative of the same LR image amplified with pure interpolation, and a gradient scale factor extracted from the local characteristics of the image is then incorporated.

In this chapter, we propose a new algorithm of gradient management and its application to a novel SR procedure in which a bidirectional and orthogonal gradient field is employed. Two new procedures are proposed; in the first, the gradient field is calculated as:

$$ \nabla_{Th}^{u} = {\mathbf{I}}_{h}^{u} * \left[ {\begin{array}{*{20}l} { - 1} & { - 1} & 1 \\ { - 1} & 0 & 1 \\ { - 1} & 1 & 1 \\ \end{array} } \right]\frac{1}{2} $$
(5.36)
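A direct implementation of Eq. (5.36) applies the 3×3 operator to the interpolated HR image \( {\mathbf{I}}_{h}^{u} \) and halves the result. The sketch below uses zero padding at the borders and applies the kernel as a correlation; both are our own implementation choices, since the text does not fix them.

```python
import numpy as np

def gradient_field(I_hu):
    """Gradient field of Eq. (5.36): apply the 3x3 operator to the
    interpolated HR image I_h^u and scale by 1/2. Zero-padded borders
    are an implementation assumption."""
    K = 0.5 * np.array([[-1, -1, 1],
                        [-1,  0, 1],
                        [-1,  1, 1]], dtype=float)
    padded = np.pad(I_hu.astype(float), 1)      # zero padding, 1-pixel border
    out = np.zeros(I_hu.shape, dtype=float)
    # correlation: accumulate each kernel tap over the shifted image
    for i in range(3):
        for j in range(3):
            out += K[i, j] * padded[i:i + I_hu.shape[0], j:j + I_hu.shape[1]]
    return out
```

On a horizontal unit ramp the operator responds with a constant value in the interior (the left column of the kernel weights the previous column by −3/2 and the right column weights the next one by +3/2), while a flat image yields zero, consistent with a gradient detector.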

Then, the procedure is integrated as shown in Fig. 5.10; for deeper understanding, refer to (Wang et al. 2013).

Fig. 5.10
figure 10

Overview of the proposed SR algorithm. First, two orthogonal and directional HR gradients as well as a displacement field

The second form of our procedure applies the gradient fields independently. That is, the gradient fields are calculated by convolving the image with the discrete gradient operators of Eq. (5.37) to obtain the differences along the diagonal directions. The resulting model is shown in Fig. 5.11.

Fig. 5.11
figure 11

Bidirectional and orthogonal gradient management with independent branches

$$ \left[ {\begin{array}{*{20}l} { - 1} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \\ \end{array} } \right]\frac{1}{2}\,{\text{and}}\left[ {\begin{array}{*{20}l} 0 & 0 & { - 1} \\ 0 & 0 & 0 \\ 1 & 0 & 0 \\ \end{array} } \right]\frac{1}{2} $$
(5.37)
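The two operators of Eq. (5.37) reduce to central differences along the two image diagonals, each scaled by 1/2. A minimal sketch, using circular shifts at the borders (our own choice for brevity):

```python
import numpy as np

def diagonal_gradients(I):
    """Diagonal difference fields of Eq. (5.37). Circular shifts at the
    borders are an implementation assumption."""
    I = I.astype(float)
    # first kernel:  g1[r, c] = ( I[r+1, c+1] - I[r-1, c-1] ) / 2
    g1 = 0.5 * (np.roll(np.roll(I, -1, 0), -1, 1)
                - np.roll(np.roll(I, 1, 0), 1, 1))
    # second kernel: g2[r, c] = ( I[r+1, c-1] - I[r-1, c+1] ) / 2
    g2 = 0.5 * (np.roll(np.roll(I, -1, 0), 1, 1)
                - np.roll(np.roll(I, 1, 0), -1, 1))
    return g1, g2
```

On the ramp I[r, c] = r + c, the first field is constant (2) and the second is zero in the interior, showing that the two operators respond to orthogonal diagonal orientations.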

2.7.7 Hybrid BTV and Gradient Management

This section proposes the integration of two powerful tools for SR: BTV and gradient control. In the proposed case, the gradient regularization is applied first, using the model of Sect. 5.2.7.6. This technique produces some artifacts when the amplification scale is high and the regularization term takes large values. That problem is addressed by the BTV algorithm, also exposed previously, which averages similar pixels around the image to estimate the high resolution. Here, the two characteristics can collaborate for a better result.

The general procedure of the proposed method is shown in Fig. 5.12, and the visual comparison between the LR image and the HR image is given in Fig. 5.16. The proposed algorithm is named orthogonal and directional gradient management and bilateral total variation (ODGM-BTV). It is only one illustration of the multiple possibilities for the creation of SR algorithms.

Fig. 5.12
figure 12

Hybrid model for collaborative SR. The model combines the gradient control and BTV strategies

3 Results of the Proposed Methods

In this section, the results of the proposed methods are illustrated. Experiments on test and real images are presented with scaling factors of 2, 3, and 4. The objective metrics used were peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM), and results are given in Tables 5.1, 5.2, and 5.3. Subjective performance of our SR schemes is evaluated in Figs. 5.13, 5.14, 5.15, and 5.16.

Table 5.1 PSNR/SSIM comparison for multi-directional, diagonal, horizontal, and vertical gradient management with a scale of 2
Table 5.2 PSNR/SSIM comparison for multi-directional, diagonal, horizontal, and vertical gradient management with a scale of 3
Table 5.3 PSNR/SSIM comparison for multi-directional, diagonal, horizontal, and vertical gradient management with scale of 4
Fig. 5.13
figure 13

4× amplification factor using a test image with a diagonal, b horiz–vert, c coupled, and d decoupled gradients

Fig. 5.14
figure 14

Slopes of the estimated HR image (row 60 of the test image in Fig. 5.13). The image was processed using the two proposed algorithms with two orthogonal directions of the slopes independently

Fig. 5.15
figure 15

Processed images with the decoupled gradient algorithm. The scale factors are: 4× for the top row of images, 3× for the second row of images, and 2× for the row of images at the bottom

Fig. 5.16
figure 16

Application of SR using the hybrid BTV and gradient management strategy with a scale of amplification of q = 4, a low-resolution image and b application of ODGM-BTV

3.1 Gradient Management

In these experiments, the group of images shown in Fig. 5.15, included in the BSDS500 database, was used. The amplification factors were 2, 3, and 4. Tables 5.1, 5.2, and 5.3 show the increment in PSNR and SSIM of the second proposed alternative, in which the two gradient fields are handled independently. Also, the test image was used to observe the sharpening effect around contours, and the results are shown in Fig. 5.13. Figure 5.14 shows the plot of row 60, taken from the test image of Fig. 5.13, to illustrate the edge transitions of the recovered HR image.

3.2 Hybrid BTV and Gradient Management

Figure 5.16 shows the result of the proposed method ODGM-BTV using a scale of amplification of 4.

Algorithm:

  • Input: LR image, iteration number

  • For i = 1: iteration number

    1. Apply the BTV algorithm to the LR input image.

    2. Apply the bidirectional orthogonal gradient management.

    3. Update the LR input image with the HR output image.

  • end

  • Output: HR image
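The loop above can be sketched directly, with `btv_step` and `odgm_step` as placeholder callables standing in for the BTV pass and the bidirectional orthogonal gradient-management pass (their internals are described in the previous sections and are not reproduced here):

```python
def odgm_btv(lr_image, btv_step, odgm_step, n_iters=5):
    """Iterative scheme of the ODGM-BTV algorithm. `btv_step` and
    `odgm_step` are placeholders for the two processing passes; any
    callables with an image-in/image-out signature can be plugged in."""
    x = lr_image
    for _ in range(n_iters):
        x = btv_step(x)    # 1. BTV regularization on the current estimate
        x = odgm_step(x)   # 2. bidirectional orthogonal gradient management
        # 3. the output becomes the input of the next iteration
    return x
```

The structure makes the collaboration explicit: each iteration first denoises and averages (BTV), then restores edge sharpness (ODGM), so neither pass has to compensate for the other within a single step.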

4 Metrics

The PSNR in dB of Eq. (5.38) and the SSIM of Eq. (5.39) are the metrics most commonly used to evaluate SR algorithms.

$$ {\text{PSNR}} = 10\log_{10} \left( {\frac{{v_{\hbox{max} }^{2} }}{{{\text{MSE}}(x,y)}}} \right), $$
(5.38)

where x and y are the two signals to compare, \( {\text{MSE}}(x,y) \) is the mean square error, and \( v_{\hbox{max} } \) is the maximum possible value in the range of the signals. The SSIM factor (Wang et al. 2004) is calculated as,

$$ {\text{SSIM}}(x,y) = \frac{{\left( {2\,\mu_{x} \,\mu_{y} + c_{1} } \right)\left( {2\sigma_{xy} + c_{2} } \right)}}{{\left( {\mu_{x}^{2} + \mu_{y}^{2} + c_{1} } \right)\left( {\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2} } \right)}}, $$
(5.39)

where \( \mu_{x} \) and \( \mu_{y} \) are the mean values of x and y; \( \sigma_{x}^{2} \), \( \sigma_{y}^{2} \), and \( \sigma_{xy} \) are the variances and covariance of x and y; and \( c_{1} \) and \( c_{2} \) are constant terms. Another metric derived from the SSIM is the mean SSIM (MSSIM) of Eq. (5.40),

$$ {\text{MSSIM}} = \frac{1}{M}\sum\limits_{j = 1}^{M} {{\text{SSIM}}(x_{j} ,y_{j} )} , $$
(5.40)

where M is the number of areas (local windows) being compared.

4.1 Discussion of the Results

Tables 5.1, 5.2, and 5.3 show an improvement in the quality parameters SSIM and PSNR of our proposed method over the management of a single gradient, with larger increments of the quality factors at high amplification scales (3 and above). Our procedure follows the gradients naturally and yields a more precise estimation of the slopes, which is a relevant contribution to the state of the art of gradient-management algorithms, although the goal of this chapter is an overview of complements for super-resolution rather than a novel algorithm or an improvement of state-of-the-art results. The overview shows that SR is a very rich field of investigation: at each step, some method can be applied by exploiting its strongest principle of functioning. An example is the combination of BTV and ODGM; the visual effect in Fig. 5.16 is very interesting, and the higher resolution by area can be observed. The contribution in this case avoids the artifacts of gradient management and, at the same time, produces a less blurred image than the BTV method thanks to the sharpening procedure over the edges.

The review of the literature yields some conclusions. Research on this topic is extensive, and the contributions to the state of the art are, in most cases, small changes over well-known procedures. Unfortunately, the goal is based on a quality measurement, and the benchmarks that guide the results use different configurations of Eqs. (5.38), (5.39), and (5.40). The consequence is that comparison between many reported algorithms and models is difficult and not always possible. For this reason, the borders between classifications of the methods become diffuse, and in an overview a comparison between methods is more useful than repeated attempts at classification. Moreover, the great creativity exhibited in the different methods and their totally different mathematical solutions make it difficult to establish mathematical comparisons and objective conclusions beyond empirical results based on measurement metrics.

5 Conclusions

SR is an exciting and diverse subject in the digital processing area, and it takes many forms. Each algorithm has a place in this area of research, and the field is extremely broad. The study of these techniques should be oriented from the beginning because the development of each of them is extensive and difficult to reproduce; sometimes only a small advance can be made in one of them. Also, the initial conditions differ in each case, and some bases of comparison are required. In the literature, some standard measurements are proposed, but the application conditions are diverse. A useful strategy to approach SR research is to understand the rationale of preexisting algorithms. Advantages and disadvantages are important factors to consider in order to combine characteristics that produce more convincing effects and better qualities of the output image. The proposed example performs edge sharpening and averaging for estimation; the first method produces artifacts, but the second fails to produce clear edges. A case was proposed in which these two characteristics complement each other positively. For future work, we will continue the study of multiple possibilities in the field of SR estimation using transformations of the image and learning from different characterizations, such as wavelet fluctuations with dictionary learning. Another interesting field is minimization procedures with multiple residual priors in the estimations, as in works such as (Ren et al. 2017).