1 Introduction

Image interpolation is an essential tool for acquiring an HR image from an LR one. The main motivation for research in this field is the inability of most imaging sensors to capture the required HR image at a moderate cost. The application of image interpolation to IR image processing is a promising trend, since IR images usually have low resolutions. Traditional kernel-based image interpolation techniques have been broadly studied, and most of the research on kernel-based interpolation has been directed towards obtaining the best interpolation basis functions [6, 12, 13, 22,23,24, 26, 33].

Splines, Keys’ and optimal maximal order of minimum support (O-MOMS) interpolation techniques are the most common families utilized for image interpolation. These conventional techniques are space-invariant, and they do not consider the spatial activity of the image to be interpolated. They also do not consider the mathematical model of the imaging process with a specific type of sensor. Spatially-adaptive kernel-based techniques depend on concepts such as the warped-distance concept. Although these adaptive techniques improve the quality of the interpolated image, especially near edges, they still do not take the image capturing model into consideration. All kernel-based techniques and their adaptive variants can be considered signal synthesis techniques [28, 33, 38].

This paper presents an efficient technique for obtaining high-resolution IR images. It is based on image SR concepts for the resolution enhancement of LR images. The importance of the proposed SR technique lies in overcoming the limitation of having only LR images, which are acquired from LR sensors due to hardware constraints.

The organization of this paper is as follows. Section 2 explains the research motivations and the related work. Section 3 describes the regularized image interpolation technique. Section 4 covers polynomial-based image interpolation. Section 5 explains LS interpolation of IR images. Section 6 presents the concepts of SR applied to IR images. Section 7 discusses single-image SR applied to IR images. Section 8 presents the proposed technique. Section 9 gives the simulation results. Finally, Section 10 gives the concluding remarks.

2 Motivations and related work

The rapid development of image processing technologies makes the acquisition of SR images a hot research topic with a very wide range of applications [32]. Moreover, applying SR techniques to low-quality images such as IR images is very challenging. The main limitations to acquiring an HR image directly with an IR camera are the manufacturing difficulty, material properties, and imaging environment. Some studies offered designs to enhance the scanning section of the IR imaging system [2, 10]. These designs rely on four different-angle plate refractors placed in parallel inside the device, but they remain application-limited due to fabrication complexity, size, and cost.

The manufacturing challenges directed research towards the acquisition of HR infrared images from one or more LR images using SR techniques [3,4,5, 9, 11, 15, 21, 25, 27, 29, 35]. The SR acquisition techniques can be classified into three categories: interpolation techniques [3, 15, 27], reconstruction techniques [3, 9, 25], and learning techniques [5, 11, 21, 29, 35]. Interpolation techniques are the earliest and most essential methodologies adopted for SR acquisition, but they produce low-quality images. On the other hand, reconstruction techniques provide better-quality images by assuming that the LR image is the result of a degradation model comprising noise, distortion, blurring, and down-sampling, but prior knowledge is still needed to accurately obtain HR images from the LR ones. Learning techniques represent a new era in image processing, yielding satisfactory results without the need for prior knowledge of the LR degradation model. Hence, the concepts of learning are adopted in this paper.

Applying SR techniques to IR images is a very interesting and growing research topic. Researchers offered different methods for SR acquisition from IR images based on learning techniques [20, 30, 36, 37]. One of these methods [30] performs the learning stage by combining information from visible and IR images, as images from different sensors carry complementary information about the same scene. Another method [36] presents the reconstruction of SR images based on their sparse representations by offering a pair of dictionaries, in which the HR and LR patches share the same sparse representations. Another trend adopts compressed sensing (CS) [20, 37] for SR acquisition of images [7] to solve the sparsity reconstruction problem.

3 Regularized image interpolation

Regularization theory, which was originally introduced by Tikhonov and Miller, provides a formal basis for the development of regularized solutions for ill-posed problems [8]. The stabilizing function approach is one of the basic methodologies for the development of regularized solutions. According to this approach, an ill-posed problem can be formulated as the constrained minimization of a certain function, called the stabilizing function [8]. The specific constraints imposed by the stabilizing function approach on the solution depend on the form and the properties of the function used.

From the nature of the problem, these constraints are necessarily related to the a priori information regarding the expected regularized solution. According to the regularization approach, the regularized interpolation solution is obtained by the minimization of the cost function [8]:

$$ \varPsi \left(\hat{\mathbf{f}}\right)={\left\Vert \mathbf{g}-\mathbf{D}\hat{\mathbf{f}}\right\Vert}^2+\lambda {\left\Vert \mathbf{Q}\hat{\mathbf{f}}\right\Vert}^2 $$
(1)

where Q is the regularization operator and λ is the regularization parameter.

This minimization is accomplished by setting the derivative of the cost function to zero, yielding:

$$ \frac{\partial \varPsi \left(\hat{\mathbf{f}}\right)}{\partial \hat{\mathbf{f}}}=-2{\mathbf{D}}^T\left(\mathbf{g}-\mathbf{D}\hat{\mathbf{f}}\right)+2\lambda {\mathbf{Q}}^T\mathbf{Q}\hat{\mathbf{f}}=\mathbf{0} $$
(2)
$$ 2\left({\mathbf{D}}^T\mathbf{D}+\lambda {\mathbf{Q}}^T\mathbf{Q}\right)\hat{\mathbf{f}}-2{\mathbf{D}}^T\mathbf{g}=\mathbf{0} $$
(3)

where T refers to matrix transpose.

Solving for the \( \hat{\mathbf{f}} \) that minimizes the cost function yields [26]:

$$ {\displaystyle \begin{array}{l}\hat{\mathbf{f}}={\left({\mathbf{D}}^T\mathbf{D}+\lambda {\mathbf{Q}}^T\mathbf{Q}\right)}^{-1}{\mathbf{D}}^T\mathbf{g}\\ {}=\mathbf{A}\left(\lambda \right)\mathbf{g}\end{array}} $$
(4)

The role of the regularization operator Q is to move the small eigenvalues of D away from zero, while leaving the large eigenvalues unchanged. It also incorporates prior knowledge about the required degree of smoothness of the estimated image into the interpolation process.

The generality of the linear operator Q allows the development of a variety of constraints that can be incorporated into the interpolation operation. For instance [8]:

  1. Q = I. In this case, the regularized solution reduces to the regularized inverse filter solution, which is named the pseudo-inverse filter solution, and it is represented as:

$$ \hat{\mathbf{f}}={\left({\mathbf{D}}^T\mathbf{D}+\lambda \mathbf{I}\right)}^{-1}{\mathbf{D}}^T\mathbf{g} $$
(5)
  2. Q = finite difference matrix. In this case, the operator Q is chosen to minimize the second-order (or higher-order) difference energy of the estimated image.

The 2-D Laplacian, shown in Fig. 1, is preferred for minimizing the second-order difference energy. It is the most popular regularization operator and is the one adopted in this paper. The regularization parameter λ controls the trade-off between fidelity to the data and smoothness of the solution [8].

Fig. 1 2-D Laplacian operator

The solution of the regularized image interpolation problem is implemented by segmenting the LR image into overlapping segments and interpolating each segment separately using Eq. (6) as an inversion process. It is clear that if a global regularization parameter is used, only a single inversion of a matrix of moderate dimensions is required, because the term (DTD + λQTQ)−1 is independent of the image to be interpolated. The interpolation formula can be written in the following form:

$$ {\hat{\mathbf{f}}}_{i,j}={\left({\mathbf{D}}^T\mathbf{D}+\lambda {\mathbf{Q}}^T\mathbf{Q}\right)}^{-1}{\mathbf{D}}^T{\mathbf{g}}_{i,j} $$
(6)

where gi,j and \( {\hat{\mathbf{f}}}_{i,j} \) are the lexicographically-ordered LR and the estimated HR blocks at position (i, j), respectively.
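To make the block-wise inversion of Eq. (6) concrete, the following Python sketch builds a simple filtering-and-down-sampling operator D (block averaging), a 2-D Laplacian operator Q, and applies the closed-form solution to one lexicographically-ordered block. The block size, the averaging model assumed for D, the boundary handling of Q, and the value of λ are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def downsample_operator(n_hr, factor=2):
    """Build D: averages `factor x factor` HR pixels into one LR pixel (assumed model)."""
    n_lr = n_hr // factor
    D = np.zeros((n_lr * n_lr, n_hr * n_hr))
    for i in range(n_lr):
        for j in range(n_lr):
            for di in range(factor):
                for dj in range(factor):
                    r = i * n_lr + j
                    c = (i * factor + di) * n_hr + (j * factor + dj)
                    D[r, c] = 1.0 / factor**2
    return D

def laplacian_operator(n_hr):
    """Build Q: 2-D Laplacian regularization operator on the HR grid (Fig. 1 kernel)."""
    Q = np.zeros((n_hr * n_hr, n_hr * n_hr))
    for i in range(n_hr):
        for j in range(n_hr):
            r = i * n_hr + j
            Q[r, r] = 4.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n_hr and 0 <= jj < n_hr:
                    Q[r, ii * n_hr + jj] = -1.0
    return Q

def regularized_interpolation(lr_block, factor=2, lam=0.01):
    """Eq. (6): f_hat = (D^T D + lam Q^T Q)^{-1} D^T g for one lexicographically ordered block."""
    n_lr = lr_block.shape[0]
    n_hr = n_lr * factor
    D = downsample_operator(n_hr, factor)
    Q = laplacian_operator(n_hr)
    A = np.linalg.inv(D.T @ D + lam * Q.T @ Q) @ D.T   # image-independent, can be precomputed
    g = lr_block.reshape(-1)                            # lexicographic ordering of the LR block
    return (A @ g).reshape(n_hr, n_hr)

# Example: interpolate a random 8x8 LR block to 16x16
hr_est = regularized_interpolation(np.random.rand(8, 8))
print(hr_est.shape)   # (16, 16)
```

Since the matrix A(λ) does not depend on the image data, it can be computed once and reused for all blocks, which is the computational advantage noted above.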

4 Polynomial-based image interpolation

Traditional signal interpolation techniques such as basis spline (B-spline) interpolation approximate a continuous function from the available discrete samples and then resample this continuous function. This continuous function is represented as follows [22]:

$$ \hat{f}(x)=\sum \limits_k{c}_k{\beta}^n\left(x-{x}_k\right) $$
(7)

where βn(x) denotes the central B-spline of degree n that is given by [23]:

$$ {\beta}^n(x)=\underset{\left(n+1\right)\kern1em \mathrm{times}}{\underbrace{\beta^0\ast {\beta}^0\ast \cdots \ast {\beta}^0(x)}} $$
(8)

where the * denotes convolution.

From the family of polynomial splines, the cubic spline tends to be the most popular. The closed-form approximation of the cubic spline basis function is given by [33]:

$$ {\beta}^3(x)=\Big\{{\displaystyle \begin{array}{ll}\frac{2}{3}-{\left|x\right|}^2+\frac{{\left|x\right|}^3}{2}& 0\le \mid x\mid <1\\ {}\frac{{\left(2-|x|\right)}^3}{6}& 1\le \mid x\mid <2\\ {}0& 2\le \mid x\mid \end{array}} $$
(9)

For equally-spaced 1-D sampled data f(xk) with unit sample spacing, we define the distance between x and xk and that between xk + 1 and x as [28]:

$$ s=x-{x}_k,\kern1em 1-s={x}_{k+1}-x. $$
(10)

where \( {x}_k\le x\le {x}_{k+1} \)

Thus, the cubic spline interpolation process can be expressed as follows [28]:

$$ \hat{f}(x)={c}_{k-1}\left[{\left(3+s\right)}^3-4{\left(2+s\right)}^3+6{\left(1+s\right)}^3-4{s}^3\right]/6+{c}_k\left[{\left(2+s\right)}^3-4{\left(1+s\right)}^3+6{s}^3\right]/6+{c}_{k+1}\left[{\left(1+s\right)}^3-4{s}^3\right]/6+{c}_{k+2}{s}^3/6 $$
(11)

The coefficients ck for cubic spline interpolation are estimated using a pre-processing filtering step. For image interpolation, this process is applied along rows and then along columns [38].
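As a quick illustration of this prefilter-then-resample pipeline, the short Python sketch below uses SciPy's separable cubic B-spline interpolation, which computes the coefficients ck internally and applies the kernel of Eq. (9) along rows and columns. The image size and zoom factor are arbitrary choices for demonstration, and SciPy is used here only as a convenient stand-in for a hand-written implementation.

```python
import numpy as np
from scipy import ndimage

# Toy LR array standing in for an IR frame (assumption: any 2-D float array works here)
lr_image = np.random.rand(64, 64).astype(np.float64)

# Prefiltering step: the B-spline coefficients c_k can be obtained explicitly for inspection
coeffs = ndimage.spline_filter(lr_image, order=3)

# Resampling step: zoom by 2 in each axis using the cubic B-spline kernel of Eq. (9);
# ndimage.zoom performs the prefiltering internally before resampling.
hr_estimate = ndimage.zoom(lr_image, zoom=2, order=3)

print(coeffs.shape, hr_estimate.shape)   # (64, 64) (128, 128)
```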

5 Least-squares interpolation of IR images

In the application of the adaptive LS algorithm to IR images, the IR image to be interpolated is split into small overlapping blocks, and the objective is to obtain an interpolated version of each block. We assume that the relationship between the available LR block and the estimated HR block is given by [3]:

$$ {\hat{\mathbf{X}}}_{i,j}=\mathbf{W}{\mathbf{Y}}_{i,j} $$
(12)

where Yi,j and \( {\hat{\mathbf{X}}}_{i,j} \) are the lexicographically-ordered LR and estimated HR blocks at the block indices (i, j), respectively. W is the weight matrix required to obtain the HR vector from the LR vector. This matrix is required to be adaptive from block to block to accommodate the local activity level of each block.

A first look at Eq. (12) leads to the LS solution that can be obtained by minimizing the Mean Square Error (MSE) of the estimation as follows:

$$ \varPsi ={\left\Vert {\mathbf{X}}_{i,j}-{\hat{\mathbf{X}}}_{i,j}\right\Vert}^2={\left\Vert {\mathbf{X}}_{i,j}-\mathbf{W}{\mathbf{Y}}_{i,j}\right\Vert}^2 $$
(13)

Differentiating both sides of Eq. (13) with respect to W gives:

$$ \frac{\partial \varPsi }{\partial \mathbf{W}}=-2\left({\mathbf{X}}_{i,j}-{\hat{\mathbf{X}}}_{i,j}\right){\left({\mathbf{Y}}_{i,j}\right)}^T $$
(14)

This minimization leads directly to the following iterative solution for W:

$$ {\mathbf{W}}^{k+1}={\mathbf{W}}^k-\eta {\left[\frac{\partial \varPsi }{\partial \mathbf{W}}\right]}^k={\mathbf{W}}^k+\mu \left({\mathbf{X}}_{i,j}-{\hat{\mathbf{X}}}_{i,j}^k\right){\left({\mathbf{Y}}_{i,j}^k\right)}^T $$
(15)

where η is a step-size constant, μ = 2η is the convergence parameter, and k is the iteration number.

The utilization of the above equation in estimating the weight matrix W requires the samples of the original HR block Xi,j to be known, which is not practical. This issue could be handled by deducing the weights from another HR image and using them to interpolate the available LR image, but this approach is expected to yield poor visual quality of the interpolated image.

An alternative to the above-mentioned algorithm is to consider the model that relates the available LR block to the original HR block, illustrated in Fig. 2. This model is offered by the following relation [16]:

$$ {\mathbf{Y}}_{i,j}=\mathbf{H}{\mathbf{X}}_{i,j} $$
(16)

The matrix H represents the filtering and down-sampling process that transforms the HR block to the LR block.

Thus, we can deduce the following cost function:

$$ \varPhi ={\left\Vert \mathbf{H}\left({\mathbf{X}}_{i,j}-{\hat{\mathbf{X}}}_{i,j}\right)\right\Vert}^2 $$
(17)

Minimizing the above cost function amounts to reducing the MSE between the available LR block and a down-sampled version of the estimated HR block.

This leads to:

$$ \varPhi ={\left\Vert {\mathbf{Y}}_{i,j}-\mathbf{H}{\hat{\mathbf{X}}}_{i,j}\right\Vert}^2={\left\Vert {\mathbf{Y}}_{i,j}-\mathbf{H}\mathbf{W}{\mathbf{Y}}_{i,j}\right\Vert}^2 $$
(18)

Differentiating Eq. (18) with respect to W:

$$ \frac{\partial \varPhi }{\partial \mathbf{W}}=-2{\mathbf{H}}^T\left({\mathbf{Y}}_{i,j}-\mathbf{H}{\hat{\mathbf{X}}}_{i,j}\right){\left({\mathbf{Y}}_{i,j}\right)}^T $$
(19)

Using Eq. (19), the weight matrix can be deduced with the following equation:

$$ {\mathbf{W}}^{k+1}={\mathbf{W}}^k-\eta {\left[\frac{\partial \boldsymbol{\Phi}}{\partial \mathbf{W}}\right]}^k={\mathbf{W}}^k+\mu {\mathbf{H}}^T\left({\mathbf{Y}}_{i,j}-\mathbf{H}{\hat{\mathbf{X}}}_{i,j}^k\right){\left({\mathbf{Y}}_{i,j}^k\right)}^T $$
(20)

The adaptation of Eq. (20) can be easily performed, since it does not require the original HR block to be known a priori.
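A minimal sketch of this per-block weight adaptation, assuming a generic filtering-and-down-sampling matrix H, an illustrative convergence parameter μ, and a fixed iteration count. In practice, H would follow the model of Fig. 2 and a convergence test would replace the fixed number of iterations.

```python
import numpy as np

def ls_weight_update(Y_block, H, mu=0.01, n_iter=200):
    """Iteratively estimate W per Eq. (20): W <- W + mu * H^T (Y - H W Y) Y^T.

    Y_block : lexicographically ordered LR block (vector of length m).
    H       : (m x M) filtering-and-down-sampling matrix relating HR to LR blocks.
    """
    m = Y_block.size
    M = H.shape[1]
    Y = Y_block.reshape(m, 1)
    W = np.zeros((M, m))                      # weight matrix, HR length x LR length
    for _ in range(n_iter):
        X_hat = W @ Y                         # current HR estimate, Eq. (12)
        residual = Y - H @ X_hat              # LR-domain error
        W += mu * (H.T @ residual) @ Y.T      # gradient step of Eq. (20)
    return W

# Toy usage with a random H standing in for the filter-and-decimate model of Fig. 2
rng = np.random.default_rng(0)
H = rng.standard_normal((16, 64)) / 8.0       # maps a 64-sample HR block to a 16-sample LR block
Y = rng.standard_normal(16)
W = ls_weight_update(Y, H)
X_hat = (W @ Y.reshape(-1, 1)).ravel()        # interpolated HR block, Eq. (12)
print(X_hat.shape)                            # (64,)
```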

6 Super-resolution applied to IR images

The representations of the LR and HR images as vectors are \( {\mathbf{z}}_l\in {\mathbf{R}}^{M_l} \) and \( {\mathbf{y}}_h\in {\mathbf{R}}^{M_h}, \) respectively, where Mh = q2Ml and q is an integer scale-up factor greater than 1. We also refer to \( \mathbf{B}\in {\mathbf{R}}^{M_h\times {M}_h} \) as the blur operator, and \( \mathbf{H}\in {\mathbf{R}}^{M_l\times {M}_h} \) as the decimation operator with a factor q in each axis. A well-known anti-aliasing low-pass filter is applied to the image to generate an LR image from an HR image [19],

$$ {\mathbf{z}}_l=\mathbf{H}{\mathbf{By}}_h+\mathbf{A} $$
(21)

where A is an additive noise or error vector. The related problem of reconstructing yh from zl is referred to as zooming and deblurring. A bicubic low-pass filter and a Gaussian low-pass filter are possible blur kernels (blur operators B).

As the work is based on the sparse representation of patch pairs, the correspondence between LR and HR patches of the same dimensions has to be learned. So, we obtain an image \( {\mathbf{y}}_l\in {\mathbf{R}}^{M_h} \) by applying bicubic interpolation to the input LR image, and since we address the zooming and deblurring setup, we aim to recover the difference image \( {\hat{\mathbf{y}}}_{hl}={\mathbf{y}}_h-{\mathbf{y}}_l \) and then apply \( {\hat{\mathbf{y}}}_h={\hat{\mathbf{y}}}_{hl}+{\mathbf{y}}_l \) to obtain the final recovery. In this way, we keep the LR details and only predict the lost HR details. For the concept of patch-based image reconstruction, let PQ = RQy denote an image patch of size \( \sqrt{m}\times \sqrt{m} \) centered at location Q and extracted from the image vector y of size Mh by the linear operator RQ. A local model can then be suggested to predict an HR patch \( {\mathbf{P}}_h^Q={\mathbf{R}}_Q{\mathbf{y}}_h \) from an LR one \( {\mathbf{P}}_l^Q={\mathbf{R}}_Q{\mathbf{y}}_l \). Once all HR patch predictions are obtained, the HR image is recovered by averaging the overlapping recovered patches over their overlap regions. Another factor that should be taken into consideration when choosing the overlap between adjacent patches is the trade-off between the reconstructed image quality and the run time. In order to achieve the best reconstruction quality, the work has to be performed with maximally-overlapping patches (the overlap between adjacent patches is \( \sqrt{m}\times \left(\sqrt{m}-1\right) \) pixels in the horizontal and vertical directions).
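The maximally-overlapping patch extraction and overlap-averaging step described above can be sketched in Python as follows. The patch size and image size are arbitrary, and in the actual scheme the extracted patches would be replaced by their predicted HR counterparts before the averaging.

```python
import numpy as np

def extract_patches(img, patch_size):
    """Extract all maximally-overlapping patches (stride of one pixel)."""
    H, W = img.shape
    p = patch_size
    return [img[i:i + p, j:j + p].copy()
            for i in range(H - p + 1) for j in range(W - p + 1)]

def average_patches(patches, img_shape, patch_size):
    """Recover an image by averaging the (possibly modified) patches on their overlaps."""
    H, W = img_shape
    p = patch_size
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    k = 0
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            acc[i:i + p, j:j + p] += patches[k]
            cnt[i:i + p, j:j + p] += 1.0
            k += 1
    return acc / cnt

# Round-trip check: extracting and re-averaging unchanged patches returns the image
img = np.random.rand(32, 32)
patches = extract_patches(img, patch_size=8)
rec = average_patches(patches, img.shape, patch_size=8)
print(np.allclose(rec, img))   # True
```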

Finally, we briefly mention the sparsity-based synthesis model. The core idea of this model is that a signal S ∈ Rm can be represented as a linear combination of a few atoms (signal prototypes) taken from a dictionary D ∈ Rm × n, namely S = Dα + η, where α ∈ Rn is the sparse representation vector and η is the noise or model error. Similar to previous approaches [34], we assume that each LR patch can be represented over a dictionary \( {\mathbf{D}}_l\in {\mathbf{R}}^{m\times {n}_l} \) by a sparse vector \( {\boldsymbol{\upalpha}}_l\in {\mathbf{R}}^{n_l} \), and similarly an HR patch is represented over \( {\mathbf{D}}_h\in {\mathbf{R}}^{m\times {n}_h} \) by \( {\boldsymbol{\upalpha}}_h\in {\mathbf{R}}^{n_h} \).

7 Single-image super-resolution applied to IR images

The concept of this model is to predict the missing HR details for each LR patch through a different number of atom pairs of the LR and HR dictionaries. The difference in atom numbers between the LR and HR dictionaries is natural, as each dictionary characterizes a signal of a certain quality; the dictionary of the low-quality signal contains fewer atoms than that of the high-quality one. In addition, a small orthogonal dictionary is considered for the LR patches in the complete and under-complete cases, which offers low complexity in the sparse coding computations [14, 21].

First, we start with the low-cost pursuit stage to obtain αl, the vector of LR coefficients. Then, a statistical parametric model is suggested for the prediction of αh, the HR representation vector of each patch, from its corresponding LR coefficients αl. Finally, the single-image SR scheme is presented as a result of the suggested model.

7.1 Low-cost pursuit stage

To sparsely represent the patches, an under-complete dictionary (nl < m) is sufficient. To allow a low-cost scale-up scheme, Dl is assumed to be an under-complete orthonormal dictionary. Therefore, the LR coefficients result from the inner products of the LR patch with the dictionary atoms:

$$ {\boldsymbol{\upalpha}}_l={\left({\mathbf{D}}_l\right)}^T{\mathbf{P}}_l $$
(22)

A convolutional network is then used to compute the LR coefficients for all overlapping patches \( \left\{{\mathbf{P}}_l^Q\right\} \). The sparsity pattern \( {\mathbf{x}}_l\in {\left\{-1,1\right\}}^{n_l} \) is computed as:

$$ {x}_{l,j}=\begin{cases}1, & \left|{\alpha}_{l,j}\right|>\delta \\ -1, & \text{otherwise}\end{cases},\qquad \forall j=1,\dots, {n}_l, $$
(23)

where δ is the maximal threshold that satisfies the following residual error criterion, set adaptively for each LR patch:

$$ \sum \limits_{j:\,\left|{\alpha}_{l,j}\right|\le \delta }{\left|{\alpha}_{l,j}\right|}^2\le m{\rho}^2 $$
(24)

where ρ is a pre-specified parameter that indicates the targeted accuracy of the LR sparse representation.
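A possible Python sketch of this low-cost pursuit stage, combining Eqs. (22)-(24). The dictionary here is a random orthonormal one and the value of ρ is arbitrary; both are placeholders chosen purely to illustrate the thresholding logic.

```python
import numpy as np

def lr_pursuit(patch, D_l, rho):
    """Low-cost pursuit: LR coefficients (Eq. 22) and adaptive sparsity pattern (Eqs. 23-24).

    patch : LR patch as a length-m vector.
    D_l   : m x n_l under-complete dictionary with orthonormal columns.
    rho   : target accuracy of the LR sparse representation.
    """
    m = patch.size
    alpha_l = D_l.T @ patch                    # Eq. (22): inner products with the atoms
    # Eq. (24): find the largest threshold delta whose discarded energy stays below m*rho^2
    mags = np.sort(np.abs(alpha_l))
    energy, delta = 0.0, 0.0
    for a in mags:
        if energy + a**2 > m * rho**2:
            break
        energy += a**2
        delta = a
    # Eq. (23): sparsity pattern in {-1, +1}
    x_l = np.where(np.abs(alpha_l) > delta, 1, -1)
    return alpha_l, x_l

# Toy usage with a random orthonormal under-complete dictionary (n_l < m)
rng = np.random.default_rng(1)
m, n_l = 36, 20
D_l, _ = np.linalg.qr(rng.standard_normal((m, n_l)))   # orthonormal columns
alpha_l, x_l = lr_pursuit(rng.standard_normal(m), D_l, rho=0.1)
print(x_l)
```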

7.2 The model

As explained in the previous sub-section, the LR dictionary Dl is an under-complete orthogonal dictionary. On the other hand, the HR dictionary Dh is assumed to be a complete or over-complete dictionary in order to allow sufficient representation power. Since the number of atoms differs between Dl and Dh, it is not valid to assume that the LR and HR representations share the same sparsity pattern, as was done in previous works that consider dictionary pairs. So, a model is required to capture the relations between the two different sparsity patterns: \( {\mathbf{x}}_l\in {\left\{-1,1\right\}}^{n_l} \) for the LR patch and \( {\mathbf{x}}_h\in {\left\{-1,1\right\}}^{n_h} \) for the HR patch. To this end, a Boltzmann machine prior is considered:

$$ \mathit{\Pr}\left(\mathbf{x}\right)=\frac{1}{Z}\mathit{\exp}\left({\mathbf{b}}^T\mathbf{x}+\frac{1}{2}{\mathbf{x}}^T\mathbf{Vx}\right) $$
(25)

where b ∈ Rn is a bias vector and V ∈ Rn × n is an interaction matrix that captures the statistical dependencies within the sparsity pattern x ∈ {1, −1}n of a single representation vector. Here, the dependencies between the sparsity patterns of the LR-HR pair need to be captured. So, a variant of the Boltzmann machine, namely the restricted Boltzmann machine, is used, which is given by the conditional probability:

$$ {\displaystyle \begin{array}{l}\Pr \left({\mathbf{x}}_h|{\mathbf{x}}_l\right)=\frac{1}{Z_1}\exp \left({\mathbf{b}}_h^T{\mathbf{x}}_h+{\mathbf{x}}_h^{\mathrm{T}}{\mathbf{V}}_{hl}{\mathbf{x}}_l\right)\\ {}\kern4em =\prod \limits_{j=1}^{n_h}\varPhi \left(\left({b}_{h,j}+{\mathbf{V}}_{hl,j}^T{\mathbf{x}}_l\right){x}_{h,j}\right)\end{array}} $$
(26)

where \( {\mathbf{b}}_h\in {\mathbf{R}}^{n_h} \) is a bias vector for the HR sparsity pattern,\( {\mathbf{V}}_{hl}\in {\mathbf{R}}^{n_h\times {n}_l} \) is an interaction matrix connecting between the LR and HR sparsity patterns, and Φ(z) = (1 +  exp (−2z))−1 is the sigmoid function. The last equality in the previous equation holds since the entries of xh are statistically independent given xl.

This choice leads to a closed-form formula for the conditional marginal probability of each entry in xh given xl,

$$ \mathit{\Pr}\left({x}_{h,j}=1\left|{\mathbf{x}}_l\right.\right)=\varPhi \left({b}_{h,j}+{\mathbf{V}}_{hl,j}^T{\mathbf{x}}_l\right),\qquad \forall j=1,\dots, {n}_h, $$
(27)

which aligns with the sigmoid unit in neural networks. Next, the HR coefficients αh are addressed. Given the sparsity pattern xh and the LR coefficients αl, the following model is suggested:

$$ {\alpha}_{h,j}=\left\{\begin{array}{cl}{u}_j, & {x}_{h,j}=1\\ {}0, & {x}_{h,j}=-1\end{array}\right.,\qquad \forall j=1,\dots, {n}_h, $$
(28)

where \( \mathbf{u}\in {\mathbf{R}}^{n_h} \) is assumed to be Gaussian distributed given αl, so that u|αl~N(Shlαl, Σhl) with \( {\mathbf{S}}_{hl}\in {\mathbf{R}}^{n_h\times {n}_l} \) and \( {\boldsymbol{\Sigma}}_{hl}\in {\mathbf{R}}^{n_h\times {n}_h} \). Straightforward considerations lead to the following conditional expectation,

$$ E\left[{\alpha}_{h,j}\left|{x}_{h,j}=1,{\boldsymbol{\upalpha}}_l\right.\right]={\mathbf{S}}_{hl,j}^T{\boldsymbol{\upalpha}}_l,\qquad \forall j=1,\dots, {n}_h. $$
(29)

The last equations show that each sparsity pattern xh yields a different mapping from αl to αh. However, all \( {2}^{n_h} \) possible mappings are described through the same matrix Shl. Notice that the prediction in this model is linear only when the sparsity pattern xh is known; as shown in the next sub-section, the final estimator for αh, j given αl and xl is nonlinear.

7.3 Inference

An MMSE estimator is used for the prediction of each entry in αh from xl and αl [14, 21]

$$ {\displaystyle \begin{array}{l}{\hat{\alpha}}_{h,j}=E\left[{\alpha}_{h,j}|{\mathbf{x}}_l,{\boldsymbol{\upalpha}}_l\right]\\ {}\kern3em =\sum \limits_{{\mathbf{x}}_h\in {\varGamma}_j}E\left[{\alpha}_{h,j}|{\mathbf{x}}_h,{\mathbf{x}}_l,{\boldsymbol{\upalpha}}_l\right]\mathit{\Pr}\left({\mathbf{x}}_h|{\mathbf{x}}_l,{\boldsymbol{\upalpha}}_l\right)\\ {}\kern3em =\sum \limits_{{\mathbf{x}}_h\in {\varGamma}_j}E\left[{\alpha}_{h,j}|{x}_{h,j}=1,{\boldsymbol{\upalpha}}_l\right]\mathit{\Pr}\left({\mathbf{x}}_h|{\mathbf{x}}_l\right)\\ {}\kern3em =E\left[{\alpha}_{h,j}|{x}_{h,j}=1,{\boldsymbol{\upalpha}}_l\right]\mathit{\Pr}\left({x}_{h,j}=1|{\mathbf{x}}_l\right)\\ {}\kern3em =\left({\mathbf{S}}_{hl,j}^T{\boldsymbol{\upalpha}}_l\right)\varPhi \left({b}_{h,j}+{\mathbf{V}}_{hl,j}^T{\mathbf{x}}_l\right),\end{array}} $$
(30)

where \( {\varGamma}_j=\left\{\boldsymbol{\upgamma} \in {\left\{-1,1\right\}}^{n_h}:{\gamma}_j=1\right\} \) is the set of HR sparsity patterns whose j-th entry equals 1.
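The closed-form MMSE estimator of Eq. (30) reduces to a sigmoid-gated linear mapping, which the following sketch illustrates. It assumes the model parameters Shl, bh and Vhl have already been learned; random placeholders stand in for them here.

```python
import numpy as np

def sigmoid(z):
    """Phi(z) = (1 + exp(-2z))^(-1), the sigmoid used in Eqs. (26)-(27)."""
    return 1.0 / (1.0 + np.exp(-2.0 * z))

def predict_hr_coefficients(alpha_l, x_l, S_hl, b_h, V_hl):
    """MMSE prediction of the HR representation, Eq. (30):
    alpha_h_hat_j = (S_hl[j] . alpha_l) * Phi(b_h[j] + V_hl[j] . x_l)."""
    mean_term = S_hl @ alpha_l                 # E[alpha_h,j | x_h,j = 1, alpha_l], Eq. (29)
    prob_term = sigmoid(b_h + V_hl @ x_l)      # Pr(x_h,j = 1 | x_l), Eq. (27)
    return mean_term * prob_term               # element-wise product

# Toy usage with random (hypothetical) pre-learned parameters
rng = np.random.default_rng(2)
n_l, n_h = 20, 49
alpha_l = rng.standard_normal(n_l)
x_l = rng.choice([-1, 1], size=n_l)
S_hl = rng.standard_normal((n_h, n_l))
b_h = rng.standard_normal(n_h)
V_hl = rng.standard_normal((n_h, n_l))
alpha_h_hat = predict_hr_coefficients(alpha_l, x_l, S_hl, b_h, V_hl)
print(alpha_h_hat.shape)   # (49,)
```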

7.4 Neural network model

The proposed model for single-image SR can be interpreted as a feed-forward neural network, providing a fast and simple implementation. The objective is to find the network parameters that give the best prediction of the HR patches from the corresponding LR ones. The suggested network consists of the following parameters:

$$ \varTheta =\left\{{\mathbf{D}}_l,{\mathbf{D}}_h,{\mathbf{S}}_{hl},{\mathbf{b}}_h,{\mathbf{V}}_{hl}\right\} $$
(31)

We assume that the process of learning the model parameters is performed off-line, using a set of LR-HR image pairs. Patches are extracted from each image pair (yl, yhl) at the same locations, resulting in a training set of N LR-HR patch pairs \( \left\{{\mathbf{P}}_l^Q,{\mathbf{P}}_h^Q\right\} \).

The model parameters Θ are trained by solving the following optimization problem:

$$ \underset{\varTheta}{\operatorname{argmin}}\sum \limits_{Q=1}^N{\left\Vert {\mathbf{D}}_h\left(\left[\varPhi \left({\mathbf{b}}_h+{\mathbf{V}}_{hl}{\mathbf{x}}_l^Q\right)\right]\circ \left[{\mathbf{S}}_{hl}{\left({\mathbf{D}}_l\right)}^T{\mathbf{P}}_l^Q\right]\right)-{\mathbf{P}}_h^Q\right\Vert}_2^2 $$
(32)

The operator ∘ above denotes the Hadamard (element-wise) product.

To reduce the complexity of this joint optimization problem and to allow the model parameters to be learned, initial estimates of the Dl and Dh dictionaries are set using the well-known directional PCA [31] and K-SVD [1] approaches. Having the true sparsity patterns for each patch pair and given the estimates of Dl and Dh, an initial estimate of the matrix Shl is obtained directly by solving an LS problem. After obtaining these initial estimates, Dh and Shl are updated together to be well-tuned. Next, at the innermost layer of the network, the restricted Boltzmann machine parameters Vhl and bh are updated, while the remaining parameters are kept fixed at their previous estimates. Finally, a last tuning of the Dh dictionary takes place to enhance the prediction in terms of the HR patch error.
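For clarity, the feed-forward prediction inside Eq. (32) and the corresponding training loss can be sketched as follows. The fixed threshold delta used here is a simplification of the adaptive rule of Eq. (24), and all parameters are placeholders standing in for the learned values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-2.0 * z))

def forward_hr_patch(P_l, D_l, D_h, S_hl, b_h, V_hl, delta=0.1):
    """Feed-forward prediction of an HR patch from an LR patch, following Eq. (32):
    P_h_hat = D_h ( Phi(b_h + V_hl x_l) o (S_hl D_l^T P_l) )."""
    alpha_l = D_l.T @ P_l
    x_l = np.where(np.abs(alpha_l) > delta, 1.0, -1.0)          # sparsity pattern (fixed delta)
    alpha_h_hat = sigmoid(b_h + V_hl @ x_l) * (S_hl @ alpha_l)  # Hadamard product
    return D_h @ alpha_h_hat

def training_loss(pairs, params, delta=0.1):
    """Sum of squared HR-patch prediction errors over the training set, Eq. (32)."""
    D_l, D_h, S_hl, b_h, V_hl = params
    return sum(np.sum((forward_hr_patch(P_l, D_l, D_h, S_hl, b_h, V_hl, delta) - P_h) ** 2)
               for P_l, P_h in pairs)

# Toy usage with random placeholder parameters and a single LR-HR patch pair
rng = np.random.default_rng(3)
m, n_l, n_h = 36, 20, 49
D_l, _ = np.linalg.qr(rng.standard_normal((m, n_l)))
params = (D_l, rng.standard_normal((m, n_h)), rng.standard_normal((n_h, n_l)),
          rng.standard_normal(n_h), rng.standard_normal((n_h, n_l)))
pairs = [(rng.standard_normal(m), rng.standard_normal(m))]
print(training_loss(pairs, params))
```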

8 The proposed approach

The proposed approach is based on single-image super-resolution applied to IR images.

The steps of the proposed approach can be summarized as follows (a high-level sketch combining these steps is given after the list):

  1. Capture the LR IR image.

  2. Apply bicubic interpolation with the three scenarios to the LR image to generate the enhanced IR image.

  3. Get the pre-learned parameters of the neural network.

  4. Apply the estimation process for the HR image.

  5. Extract overlapping patches of the HR image.

  6. Compute the LR representation from the LR patches using Eq. (22).

  7. Compute the LR sparsity pattern from the LR representation using Eq. (23).

  8. Compute the MMSE estimate of the HR representation using Eq. (30).

  9. Apply the recovery process for the HR patches.

  10. Recover the LR-HR difference image from the patches.

  11. Apply the recovery process for the HR image.
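The sketch below strings these steps together into a minimal end-to-end pipeline, assuming pre-learned placeholder parameters, a fixed threshold delta instead of the adaptive rule of Eq. (24), and arbitrary patch size and scale factor. It is an illustrative outline of the flow, not the exact implementation used in the experiments.

```python
import numpy as np
from scipy import ndimage

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-2.0 * z))

def single_image_sr(lr_img, params, scale=2, p=6, delta=0.1):
    """End-to-end sketch of the proposed single-image SR pipeline (steps 1-11).

    params = (D_l, D_h, S_hl, b_h, V_hl) are assumed pre-learned; the patch size p,
    scale factor and threshold delta are illustrative choices.
    """
    D_l, D_h, S_hl, b_h, V_hl = params
    # Steps 1-2: cubic-spline (bicubic-like) interpolation of the captured LR image
    y_l = ndimage.zoom(lr_img, zoom=scale, order=3)
    H, W = y_l.shape
    acc = np.zeros_like(y_l)
    cnt = np.zeros_like(y_l)
    # Steps 5-10: per-patch prediction of the missing HR details y_hl
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            P_l = y_l[i:i + p, j:j + p].reshape(-1)
            alpha_l = D_l.T @ P_l                                   # Eq. (22)
            x_l = np.where(np.abs(alpha_l) > delta, 1.0, -1.0)      # Eq. (23)
            alpha_h = sigmoid(b_h + V_hl @ x_l) * (S_hl @ alpha_l)  # Eq. (30)
            acc[i:i + p, j:j + p] += (D_h @ alpha_h).reshape(p, p)
            cnt[i:i + p, j:j + p] += 1.0
    y_hl = acc / cnt
    # Step 11: add the predicted details back to the interpolated image
    return y_l + y_hl

# Toy usage with random placeholder parameters
rng = np.random.default_rng(4)
p, n_l, n_h = 6, 16, 36
m = p * p
D_l, _ = np.linalg.qr(rng.standard_normal((m, n_l)))
params = (D_l, rng.standard_normal((m, n_h)), rng.standard_normal((n_h, n_l)),
          rng.standard_normal(n_h), rng.standard_normal((n_h, n_l)))
hr = single_image_sr(np.random.rand(32, 32), params)
print(hr.shape)   # (64, 64)
```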

9 Simulation results

In this section, four IR images have been used to test the adaptive LS algorithm and the SR algorithm. Firstly, the model of image down-sampling given in Fig. 2 is applied to the original images to yield the LR images down-sampled by a factor of two in both directions.

Fig. 2 Down-sampling process from the N × N HR block to the N/2 × N/2 LR block

The adaptive LS algorithm and the SR algorithm are then tested on the obtained LR images with SNR = 25 dB. The obtained results are given in Figs. 3, 4, 5 and 6, together with the PSNR values and the average number of iterations per block for the adaptive LS algorithm. The properties of the IR image database used are shown in Table 1. The PSNR results of the four cases for all scenarios are given in Table 2, and the corresponding computation times are summarized in Table 3. These results show that the proposed technique performs well from both the PSNR and computation time perspectives.
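For reproducibility of the test setup, the following sketch generates LR test images under the three degradation scenarios listed in the caption of Fig. 3, adds noise at SNR = 25 dB, and evaluates the PSNR. SciPy's cubic-spline zoom is used as a stand-in for the bicubic anti-aliasing filter, and an 8-bit peak value is assumed in the PSNR formula; both are assumptions for illustration rather than the exact experimental settings.

```python
import numpy as np
from scipy import ndimage

_rng = np.random.default_rng(5)

def add_noise(img, snr_db=25.0):
    """Add white Gaussian noise at the given SNR (in dB)."""
    p_signal = np.mean(img.astype(np.float64) ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return img + _rng.standard_normal(img.shape) * np.sqrt(p_noise)

def make_lr(hr, scenario):
    """Generate an LR test image under the three degradation scenarios of Fig. 3."""
    hr = hr.astype(np.float64)
    if scenario == 1:                      # cubic-spline (bicubic-like) filtering, down-sample by 2
        lr = ndimage.zoom(hr, 1 / 2, order=3)
    elif scenario == 2:                    # cubic-spline filtering, down-sample by 3
        lr = ndimage.zoom(hr, 1 / 3, order=3)
    else:                                  # 7x7 Gaussian, sigma = 1.6, down-sample by 3
        blurred = ndimage.gaussian_filter(hr, sigma=1.6, truncate=1.875)  # radius 3 -> 7x7 kernel
        lr = blurred[::3, ::3]
    return add_noise(lr, snr_db=25.0)

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB (8-bit peak assumed)."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy usage: degrade a synthetic 96x96 "HR" image under scenario 3
hr = np.random.rand(96, 96) * 255.0
lr = make_lr(hr, scenario=3)
print(lr.shape)   # (32, 32)
```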

Fig. 3 Case 1, Tree image, SNR = 25 dB. Scenario 1: bicubic filtering followed by down-sampling by 2. Scenario 2: bicubic filtering followed by down-sampling by 3. Scenario 3: Gaussian filtering of size 7 × 7 with standard deviation 1.6 followed by down-sampling by 3

Fig. 4 Case 2, Plane image, SNR = 25 dB

Fig. 5 Case 3, Truck image, SNR = 25 dB

Fig. 6 Case 4, Man image, SNR = 25 dB

Table 1 Properties of test images
Table 2 Summary of the PSNR results
Table 3 Summary of the computation time results

10 Conclusion and future work

This paper investigated SR reconstruction for IR image enhancement with interpolation-based and learning-based techniques. A learning-based single-image SR technique was applied to IR images. The obtained results show good visual quality and superiority over those of other techniques. Further enhancements could be made in the learning stage to provide higher-resolution IR images for subsequent pattern recognition applications.