
3.1 Introduction

In most imaging applications, a high resolution image is desired and often required. Here, "high resolution" means not only a larger number of pixels but also greater resolving power; resolution is thus related to the ability to distinguish details in an image.

The International Organization for Standardization (ISO) has described a precise method to measure the resolution of a digital camera [1]. The resolution can be measured as the highest frequency pattern of black and white lines where the individual black and white lines can still be visually distinguished in the image. It is expressed in line widths per picture height.

The standard also describes a method to compute the spatial frequency response (SFR) of a digital camera, the digital imaging equivalent of the modulation transfer function (MTF) used in analogue imaging systems. The SFR describes the visible variation between the maximum and minimum values as a function of spatial frequency, i.e., the number of black and white lines per millimeter. It can be measured using an image of a slanted black and white edge, and is expressed in relative spatial frequencies (relative to the sampling frequency), line widths per picture height, or cycles per millimeter on the image sensor. The resolution chart used in the ISO standard is shown in Fig. 3.1.

Fig. 3.1

ISO resolution chart used to compute the SFR of a digital camera

Image resolution is one of the most important factors in digital camera design, since cameras are widely used to capture images for numerous imaging applications. Digital cameras have evolved rapidly toward a steadily increasing number of pixels. From about 0.3 Mega-pixels in 1993, the number of pixels on the charge-coupled device (CCD) or complementary metal oxide semiconductor (CMOS) sensor in a digital camera has increased by several orders of magnitude in some of the latest professional models. This pixel count has become the major selling argument for camera manufacturers. Although the pixel count keeps increasing, the current resolution grade of digital cameras and their price do not satisfy consumer demands and may not satisfy future demands either. Thus, a way to increase the current resolution level is needed.

The most direct way to increase the spatial resolution of an image is to reduce the pixel size of the image sensor. This requires advanced sensor manufacturing technology and is very costly. Its most critical problem is shot noise, which degrades image quality severely, because the amount of light collected per pixel decreases as the pixel size shrinks. The pixel-size reduction approach therefore faces a hard limit, and the current state of the technology is almost saturated.

The other hardware approach, the reverse of the one above, is to increase the chip size, which leads to an increase in capacitance [2]. The larger capacitance makes it difficult to speed up the charge transfer rate, so this approach is not considered effective either.

Therefore, a novel approach that overcomes the above limitations of sensor and optics manufacturing technologies is required. One promising approach is to use digital image signal processing techniques to obtain a high resolution image or video sequence from multiple observed low resolution images. Such resolution enhancement has recently been one of the most active research areas, and it is called super resolution, or simply resolution enhancement, in the literature [2–62]. The major advantages of super resolution algorithms are their low cost and the fact that existing low resolution images can still be utilized. Here, "using the existing low resolution images" means that the high resolution image obtained by a super resolution algorithm consists of real data from several low resolution images, rather than artificial data computed from just one image.

The basic condition for super resolution techniques is that multiple low resolution images are captured from the same scene and are sub-sampled and aliased, as well as shifted with sub-pixel displacements. If the low resolution images have different sub-pixel displacements from each other and aliasing is present, then a high resolution image can be obtained, since the new information in each low resolution image can be exploited, as shown in Fig. 3.2a. If, however, the low resolution images are shifted by integer pixel units, it is difficult to generate a high resolution image because each image contains the same information, as shown in Fig. 3.2b. That is, there is no new information that can be used to reconstruct a high resolution image.
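The role of sub-pixel displacement can be illustrated with a minimal one-dimensional sketch (Python with NumPy; the signal and sampling factors are hypothetical, not from the text): two low resolution observations that differ by half a low resolution pixel interleave into the full high resolution grid, whereas an integer low resolution pixel shift contributes no new samples.

```python
import numpy as np

# Stand-in high resolution signal (toy data).
hr = np.arange(16, dtype=float)

lr_a = hr[0::2]   # low resolution observation, no shift
lr_b = hr[1::2]   # shifted by one HR pixel = half an LR pixel

# Fusing the two sub-pixel-shifted observations recovers every HR sample.
merged = np.empty_like(hr)
merged[0::2] = lr_a
merged[1::2] = lr_b

# An integer LR-pixel shift (two HR pixels) samples the same grid:
# every value it sees is already present in lr_a, so nothing new is gained.
lr_c = hr[2::2]
```

The first pair corresponds to Fig. 3.2a (complementary samples), the second to Fig. 3.2b (redundant samples).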

Fig. 3.2

Basic condition for Super resolution. a Sub-pixel displacement. b Integer-pixel displacement

Generally, the super resolution problem subsumes image restoration techniques [63, 64], which produce high quality images from noisy, blurred images, although its main concern is to reconstruct a high resolution image from under-sampled low resolution images. Therefore, the goal of super resolution techniques is to restore a high resolution image from several degraded and aliased low resolution images, as illustrated in Fig. 3.3.

Fig. 3.3

Process sequence of Super resolution technique

The big difference between restoration and super resolution is that restoration does not change the size of the image. In fact, restoration and super resolution reconstruction are closely related theoretically, and super resolution reconstruction can be considered a second-generation problem of image restoration.

Another problem related to super resolution reconstruction is image interpolation, which is used to increase the size of a single image. Although this field has been extensively studied [65–67], the quality of an image magnified from an aliased low resolution image is inherently limited even when the ideal "sinc" basis function is employed. That is, single-image interpolation cannot recover the high-frequency components lost or degraded during the low resolution sampling process. For this reason, image interpolation methods are not considered super resolution techniques.
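Why no single-image interpolator can undo aliasing can be seen with a small sketch (the frequencies and lengths are illustrative assumptions): two distinct high resolution signals that become identical after undersampling, so any interpolator acting on the low resolution samples alone necessarily produces the same answer for both.

```python
import numpy as np

n = 64
t = np.arange(n)
f = 0.1  # base frequency in cycles per HR sample (assumed)

# A band-limited signal and a high-frequency signal that aliases onto it:
# cos(2*pi*(f+0.5)*2k) = cos(2*pi*f*2k + 2*pi*k) = cos(2*pi*f*2k).
hr_low = np.cos(2 * np.pi * f * t)
hr_high = np.cos(2 * np.pi * (f + 0.5) * t)

lr_low = hr_low[::2]    # identical low resolution observations,
lr_high = hr_high[::2]  # although the HR signals clearly differ
```

Since the two undersampled sequences coincide sample for sample, the high-frequency content is irrecoverable from one image; multiple shifted observations are needed, which is exactly the super resolution setting.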

To achieve further improvement in this field, the next step is to utilize multiple data sets, in which additional data constraints from several observations of the same scene are available. The fusion of information from various observations of the same scene allows super resolution reconstruction of the scene.

3.2 Observation Model

To analyse the super resolution problem comprehensively, the relation between a high resolution image and several low resolution images must be defined. One famous and widely used image formation model is the observation model. Its basic idea is that if we know how several low resolution images are generated from a high resolution image, then we can reconstruct the high resolution image from the low resolution images by reversing the observation model.

In this chapter, we employ the observation model for video sequences, since our goal is to obtain super resolution images in a general video recording system. Let f(x, y, t) denote the dynamic scene being captured, continuous in time and space. If the scene is sampled according to the Nyquist criterion in time and space, it is represented by the high resolution sequence f l (m, n), where \( l = 1, \ldots ,L \), \( m = 0, \ldots ,PM - 1 \), and \( n = 0, \ldots ,PN - 1 \) are the discrete temporal and spatial coordinates, respectively.

For reasons that will become clear shortly, the parameter P is referred to as the magnification factor. Note that although different magnification factors P r and P c could be used for rows and columns, respectively, for simplicity and without loss of generality we use the same factor P for both directions. It is, however, important to note that, depending on the available low resolution images, we may not be able to improve the spatial resolution in both directions to the same degree.

Before we proceed, a matrix-vector representation of images and image sequences is introduced, to be used alongside the point-wise representation. Using matrix-vector notation, each PM × PN image can be transformed into a (PM × PN) × 1 column vector, obtained by lexicographic ordering of the image.

The (PM × PN) × 1 vector that represents the l-th image in the high resolution sequence is denoted by f l , with \( l = 1, \ldots ,L \). Additionally, if all frames f l , \( l = 1, \ldots ,L \), are lexicographically ordered, the vector f of dimensions (L × PM × PN) × 1 is obtained.
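The lexicographic ordering described above can be sketched in a few lines (sizes are illustrative assumptions, not values from the text):

```python
import numpy as np

# Magnification factor and frame sizes (toy values).
P, M, N, L = 2, 3, 4, 5
frames = [np.random.rand(P * M, P * N) for _ in range(L)]

# Each PM x PN frame becomes a (PM*PN) x 1 column vector f_l
# (row-major order, i.e. lexicographic ordering).
f_l = [frame.reshape(-1, 1) for frame in frames]

# Stacking all L ordered frames gives the (L*PM*PN) x 1 vector f.
f = np.vstack(f_l)
```

The ordering is lossless: reshaping any f_l back to PM × PN recovers the original frame.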

The high resolution sequence f is input to the imaging system which generates the low resolution images denoted by g as illustrated in Fig. 3.4. The goal of super resolution is to obtain a high resolution frame, f k , from the available low resolution images. All of the described techniques, however, may be applied to the super resolution of video by using, for example, a sliding window approach, as illustrated in Fig. 3.5. Alternatively, temporally recursive techniques can be developed in estimating a super resolution sequence of images. To obtain f k , the imaging system and the temporal relationship between high resolution and low resolution sequences need to be modeled.

Fig. 3.4

Low resolution video acquisition model

Fig. 3.5

Obtaining sequence of high resolution images from a set of low resolution images (The sliding window approach)

  • f k : the lexicographically ordered image of the k-th high resolution frame, vector f

  • g k : the lexicographically ordered image of the k-th low resolution frame, vector g

In the majority of the published work, the sought-after high resolution images \( {\text{f}}_{1} , \ldots ,{\text{f}}_{L} \) are assumed to satisfy

$$ f_{l} \left( {m,n} \right) = f_{k} \left( {m + d_{l,k}^{x} \left( {m,n} \right), n + d_{l,k}^{y} \left( {m,n} \right)} \right) $$
(3.1)

where \( d_{l,k}^{x} \left( {m,n} \right) \) and \( d_{l,k}^{y} \left( {m,n} \right) \) denote respectively the horizontal and vertical components of the displacement, that is,

$$ d_{l,k} \left( {m,n} \right) = \left( {d_{l,k}^{x} \left( {m,n} \right), d_{l,k}^{y} \left( {m,n} \right)} \right) $$
(3.2)

The model of Eq. (3.1) is a reasonable one under the assumption of constant illumination conditions in the scene. It leads to the estimation of the optical flow in the scene, not necessarily to the estimation of the true motion. Note that the above model applies to both local and global motion. Also note that there may exist pixels in one image for which no motion vector exists (the occlusion problem), and pixels for which the displacement vector is not unique. Finally, note that we do not include noise in the above model, since we will incorporate it later when describing the process of obtaining the low resolution observations.

Equation (3.1) can be rewritten using matrix-vector notation as

$$ {\mathbf{f}}_{l} = {\mathbf{C}}\left( {{\mathbf{d}}_{l,k} } \right){\mathbf{f}}_{k} $$
(3.3)

where C(d l,k ) is the (PM × PN) × (PM × PN) matrix that maps frame f l to frame f k , and d l,k is the (PM × PN) × 2 matrix defined by lexicographically ordering the vertical and horizontal components of the displacements between the two frames. We will use the scalar and matrix-vector notations interchangeably throughout this manuscript.
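For the special case of a global, integer-pixel translation with circular boundary handling, C(d l,k ) reduces to a permutation matrix, which a short sketch makes concrete (the frame size and displacement are assumptions for illustration):

```python
import numpy as np

H, W = 4, 5          # toy frame size
dy, dx = 1, 2        # assumed global integer displacement
n = H * W

# Build the warp matrix C(d): row (y*W + x) picks the source pixel that
# moves to (y, x) under a circular shift by (dy, dx).
C = np.zeros((n, n))
for y in range(H):
    for x in range(W):
        src = ((y - dy) % H) * W + (x - dx) % W
        C[y * W + x, src] = 1.0

f_k = np.arange(n, dtype=float)   # lexicographically ordered frame
f_l = C @ f_k                     # Eq. (3.3): f_l = C(d) f_k

# Should agree with shifting the 2-D frame directly.
reference = np.roll(f_k.reshape(H, W), (dy, dx), axis=(0, 1)).reshape(-1)
```

Each row of C contains a single one, reflecting that for this motion model every target pixel has exactly one (unique) source; sub-pixel motion would instead spread each row over several interpolation weights.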

The motion estimation problem, as encountered in many video processing applications, consists of the estimation of d l,k or C(d l,k ) given f l and f k . What makes the problem even more challenging in super resolution is that although a high resolution motion vector field is required, the high resolution images are not available, and therefore this field must be estimated from the low resolution images. The accuracy of d l,k is of the utmost importance in determining the quality of the sought-after high resolution images.

3.2.1 The Warp-Blur Model

As the name implies, with this model the warping of an image is applied before it is blurred. This case is shown as Fig. 3.6.

Fig. 3.6

Warp–blur model relating low resolution images to high resolution images

The low resolution discrete sequence is denoted by g l (i, j), with \( i = 0, \ldots ,M - 1 \), \( j = 0, \ldots ,N - 1 \). Using matrix-vector notation, each low resolution image is denoted by the (M × N) × 1 vector g l . The low resolution image g l is related to the high resolution image f l by

$$ {\mathbf{g}}_{l} = {\mathbf{A}}_{l} {\mathbf{H}}_{l} {\mathbf{f}}_{l} + \eta_{l} $$
(3.4)

where the matrix H l of size (PM × PN) × (PM × PN) describes the filtering of the high resolution image, A l is the down sampling matrix of size MN × (PM × PN), and η l denotes the observation noise. The matrices A l and H l are generally assumed to be known.

Equation (3.4) expresses the relationship between the low resolution and high resolution frames g l and f l , while Eq. (3.3) expresses the relationship between frames l and k in the high resolution sequence. Combining these two equations we obtain the following equation which describes the acquisition of a low resolution image g l from the unknown high resolution image f k ,

$$ {\mathbf{g}}_{l} = {\mathbf{A}}_{l} {\mathbf{H}}_{l} {\mathbf{C}}\left( {{\text{d}}_{l,k} } \right){\mathbf{f}}_{k} + \eta_{l} +\upmu_{l,k} = {\mathbf{A}}_{l} {\mathbf{H}}_{l} {\mathbf{C}}\left( {{\text{d}}_{l,k} } \right){\mathbf{f}}_{k} + {\mathbf{e}}_{l,k} $$
(3.5)

where μ l,k represents the registration noise and e l,k represents the combined acquisition and registration noise. It is clear from Eq. (3.5) that C(d l,k )—the warp—is applied first on f k , followed by the application of the blur H l . This process is pictorially illustrated in Fig. 3.7.
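The warp-blur acquisition of Eq. (3.5) can be simulated on a toy image. The shift, the box point spread function, the decimation factor, and the noise level below are all assumptions for illustration, not the operators of any particular camera:

```python
import numpy as np

rng = np.random.default_rng(0)
P = 2                         # magnification / decimation factor (assumed)
f_k = rng.random((8, 8))      # stand-in high resolution frame

# C(d): warp first (global circular shift, assumed displacement).
warped = np.roll(f_k, (1, 1), axis=(0, 1))

# H: blur second, here a 3x3 box PSF applied with wrap-around boundaries.
kernel = np.ones((3, 3)) / 9.0
padded = np.pad(warped, 1, mode="wrap")
blurred = np.zeros_like(warped)
for dy in range(3):
    for dx in range(3):
        blurred += kernel[dy, dx] * padded[dy:dy + 8, dx:dx + 8]

# A: decimate by P, then add Gaussian observation noise eta.
g_l = blurred[::P, ::P] + 0.01 * rng.standard_normal((4, 4))
```

Note the order of operations matches the model's name: warp, then blur, then down-sampling; the blur-warp model of Sect. 3.2.2 swaps the first two steps.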

Fig. 3.7

Graphical depiction of the relationship between the observed low resolution images and the high resolution images

Note that the above equation shows the dependency of g l on both unknowns, the high resolution image f k and the motion vectors d l,k . This observation model was first formulated without matrix notation in [34], and later written in matrix form. Wang and Qi [68] attribute this model to [31]. The acquisition model utilized in [11] for deriving frequency domain super resolution methods can also be written using this model. If we assume that the noise e l,k in Eq. (3.5) is Gaussian with zero mean and variance \( \sigma^{2} \), denoted by \( N\left( {0, \sigma^{2} I} \right) \), the above equation produces the following conditional probability density function to be used within the Bayesian framework,

$$ {\mathbf{P}}_{G} \left( {{\mathbf{g}}_{l} |{\mathbf{f}}_{k} ,{\text{d}}_{l,k} } \right)\,{ \propto }\,{ \exp }\left[ { - \frac{1}{{2\sigma^{2} }}\left\| {{\mathbf{g}}_{l} - {\mathbf{A}}_{l} {\mathbf{H}}_{l} {\mathbf{C}}\left( {{\text{d}}_{l,k} } \right){\mathbf{f}}_{k} } \right\|^{2} } \right] $$
(3.6)

Such a noise model has been widely used.
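As a sketch, the (un-normalized) Gaussian log likelihood of Eq. (3.6) is simply a scaled negative residual energy; the toy one-dimensional forward operator below stands in for A l H l C(d l,k ) and is an assumption for illustration:

```python
import numpy as np

def forward(f, P=2):
    # Assumed toy A H C: a 3-tap moving-average blur followed by
    # P-fold decimation (no warp for simplicity).
    blurred = np.convolve(f, np.ones(3) / 3.0, mode="same")
    return blurred[::P]

sigma = 0.1
f = np.linspace(0.0, 1.0, 16)      # candidate high resolution image
g_clean = forward(f)               # observation consistent with f
g_noisy = g_clean + 0.05           # observation with a constant offset

def log_likelihood(g, f, sigma):
    # log P(g | f) up to a constant: -||g - A H C f||^2 / (2 sigma^2)
    r = g - forward(f)
    return -np.sum(r ** 2) / (2.0 * sigma ** 2)

ll_clean = log_likelihood(g_clean, f, sigma)
ll_noisy = log_likelihood(g_noisy, f, sigma)
```

A perfectly consistent observation attains the maximum (zero) log likelihood, and any residual lowers it quadratically, which is what makes the MAP estimators discussed later penalize the data misfit with an L2 term.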

A uniform noise model was proposed in [25–28]. The noise model used by these authors is oriented toward the use of the projection onto convex sets (POCS) method in super resolution problems. The associated conditional probability density function has the form

$$ {\mathbf{P}}_{G} \left( {{\mathbf{g}}_{l} |{\mathbf{f}}_{k} ,{\text{d}}_{l,k} } \right){ \propto }\left\{ {\begin{array}{*{20}c} {const} & {{\text{if}} \left| {\left[ {{\mathbf{g}}_{l} - {\mathbf{A}}_{l} {\mathbf{H}}_{l} {\mathbf{C}}\left( {{\text{d}}_{l,k} } \right){\mathbf{f}}_{k} } \right]\left( i \right)} \right| \le c, \forall i} \\ 0 & {elsewhere} \\ \end{array} } \right. $$
(3.7)

where the interpretation of the index i is that it represents the i-th element of the vector inside the brackets.

The case c = 0 can be thought of as the limit of P G (g l |f k , d l,k ) in Eq. (3.6) when σ → 0. Farsiu et al. [69, 70] have proposed the use of a generalized Gaussian Markov random field (GGMRF) [71] to model the noise in the image formation process for super resolution problems. The conditional probability density function then becomes

$$ {\mathbf{P}}_{GG} \left( {{\mathbf{g}}_{l} |{\mathbf{f}}_{k},{\text{d}}_{l,k} } \right)\,{ \propto }\,{ \exp }\left[ { -\frac{1}{{2\sigma^{p} }}\left\| {{\mathbf{g}}_{l} - {\mathbf{A}}_{l}{\mathbf{H}}_{l} {\mathbf{C}}\left( {{\text{d}}_{l,k} }\right){\mathbf{f}}_{k} } \right\|_{p}^{p} } \right] $$
(3.8)

3.2.2 The Blur-Warp Model

Another acquisition model which has been used in the literature [29, 71, 72] first considers the blurring of the high resolution image, followed by warping and down-sampling, as shown in Fig. 3.8. In this case, the observation model becomes

Fig. 3.8

Blur-warp model relating low resolution images to high resolution images

$$ {\mathbf{g}}_{l} = {\mathbf{A}}_{l} {\mathbf{M}}\left( {{\mathbf{m}}_{l,k} } \right){\mathbf{B}}_{l} {\mathbf{f}}_{k} + \eta_{l} +\upmu_{l,k} = {\mathbf{A}}_{l} {\mathbf{M}}\left( {{\mathbf{m}}_{l,k} } \right){\mathbf{B}}_{l} {\mathbf{f}}_{k} + {\mathbf{w}}_{l,k} $$
(3.9)

where w l,k denotes the acquisition and registration noise, B l the blurring matrix for the l-th high resolution image, M(m l,k ) the motion compensation operator for the blurred high resolution images through the use of motion vector m l,k , and A l again the down-sampling matrix.

Different notation is used in Eqs. (3.5) and (3.9) for the blur and warping operators in order to distinguish between the two models in the rest of the text. The three conditional probability density functions in Eqs. (3.6)–(3.8) can be rewritten for the blur-warp model by substituting A l H l C(d l,k ) with A l M(m l,k )B l (for brevity we do not reproduce them here). The question as to which of the two models (blur-warp or warp-blur) should be used is addressed in [68]. The authors claim that when the motion has to be estimated from the low resolution images, using the warp-blur model may cause systematic errors, and in this case it is more appropriate to use the blur-warp model. They also showed that when the imaging blur is spatiotemporally shift invariant and the motion has only a global translational component, the two models coincide. Note that in this case the blur and motion matrices are convolution matrices and thus commute.

Before concluding this section on image formation for uncompressed observations, we mention that for both the warp-blur and blur-warp models we have defined conditional probability density functions for each low resolution observation g l given f k and d l,k . Our goal, however, is to define the conditional probability density function P(g|f k , d), that is, the distribution when all the observations g and all the motion vectors d that compensate the corresponding high resolution frames to the k-th frame are taken into account. The approximation used in the literature for this joint conditional probability density function is

$$ {\mathbf{P}}\left( {{\mathbf{g}}\left| {{\mathbf{f}}_{k} } \right., {\mathbf{d}}} \right) = \prod\limits_{l = 1}^{L} {{\mathbf{P}}\left( {{\mathbf{g}}_{\varvec{l}} \left| {{\mathbf{f}}_{k} } \right., {\mathbf{d}}_{{\varvec{l},\varvec{k}}} } \right)} $$
(3.10)

which implies that the low resolution observations are independent given the unknown high resolution image f k and motion vectors d.

3.3 Survey of the Super Resolution Algorithms

The idea of super resolution was first introduced in 1984 by Tsai and Huang [11] for multi-frame image restoration of band-limited signals. A good overview of existing algorithms is given by [3] and [73]. Most super resolution methods are composed of two main steps: first all the images are aligned in the same coordinate system in the registration step, and then a high-resolution image is reconstructed from the irregular set of samples. In this second step, the camera point spread function is often taken into account. The scheme of super resolution is illustrated in Fig. 3.9.

Fig. 3.9

Scheme for super resolution

Precise sub-pixel image registration is a basic requirement for a good reconstruction. If the images are inaccurately registered, the high-resolution image is reconstructed from incorrect data and is not a good representation of the original signal. Zitova and Flusser [74] present an overview of image registration methods. Registration can be done either in the spatial or in the frequency domain. By the nature of the Fourier transform, frequency domain methods are limited to global motion models. In general, they also consider only planar shifts and possibly planar rotation and scale, which can be easily expressed in the Fourier domain. However, aliasing is much easier to describe and handle in the frequency domain than in the spatial domain.

3.3.1 Registration

3.3.1.1 Frequency Approach

Tsai and Huang [11] describe an algorithm that registers multiple frames simultaneously using nonlinear minimization in the frequency domain. Their method for registering multiple aliased images is based on the fact that the original, high resolution signal is band-limited. They derived a system equation that relates the low resolution images to a desired high resolution image by using the relative motion between the low resolution images. The frequency domain approach is based on three principles: (i) the shifting property of the Fourier transform, (ii) the aliasing relationship between the continuous Fourier transform (CFT) of the original high resolution image and the discrete Fourier transform (DFT) of the observed low resolution images, and (iii) the assumption that the original high resolution image is band-limited.

These properties make it possible to formulate a system equation relating the aliased discrete Fourier transform (DFT) coefficients of the observed low resolution images to samples of the continuous Fourier transform (CFT) of the unknown image. For example, assume that there are two one-dimensional low resolution signals sampled below the Nyquist rate. From the above three principles, the aliased low resolution signals can be decomposed into the un-aliased high resolution signal, as shown in Fig. 3.9.

Let f l (m, n) denote a continuous high resolution image and F l (w m , w n ) its continuous Fourier transform (CFT). Global translations, the only motion considered in the frequency domain approach, yield the k-th shifted image of Eq. (3.1). By the shifting property of the CFT, the CFT of the shifted image, F k (w m , w n ), can be written as

$$ {\mathbf{F}}_{k} \left( {{\mathbf{W}}_{m} ,{\mathbf{W}}_{n} } \right) = { \exp }\left[ {j2\pi \left( {d_{l,k}^{x} {\mathbf{W}}_{m} + d_{l,k}^{y} {\mathbf{W}}_{n} } \right)} \right]{\mathbf{F}}_{l} \left( {{\mathbf{W}}_{m} ,{\mathbf{W}}_{n} } \right) $$
(3.11)

The shifted image f k (m, n) is sampled with the sampling period T m and T n to generate the observed low resolution image g k (m, n). From the aliasing relationship and the assumption of band-limitedness of F l (w m, w n)

$$ \left| {{\mathbf{F}}_{k} \left( {{\mathbf{W}}_{m} ,{\mathbf{W}}_{n} } \right)} \right| = 0\; {\text{for }}\left| {{\mathbf{W}}_{m} } \right| \ge \left( {L_{m} \pi /T_{m} } \right), \left| {{\mathbf{W}}_{n} } \right| \ge \left( {L_{n} \pi /T_{n} } \right) $$
(3.12)

The relationship between the continuous Fourier transform (CFT) of the high resolution image and the discrete Fourier transform (DFT) of the k-th observed low resolution image can be written as [75]

$$ \gamma_{k} \left[ {\Omega _{m} ,\Omega _{n} } \right] = \frac{1}{{T_{m} T_{n} }}\mathop \sum \limits_{{m = 0}}^{{L_{m} - 1}} \mathop \sum \limits_{{n = 0}}^{{L_{n} - 1}} {\text{F}}_{k} \left( {\frac{2\pi }{{T_{m} }}\left( {\frac{{\Omega _{m} }}{M} + m} \right), \frac{2\pi }{{T_{n} }}\left( {\frac{{\Omega _{n} }}{N} + n} \right)} \right) $$
(3.13)

By using lexicographic ordering for the indices m, n on the right-hand side and k on the left-hand side, a matrix vector form is obtained as:

$$ {\mathbf{Y}} =\Phi {\mathbf{F}} $$
(3.14)

where Y is a p × 1 column vector containing the discrete Fourier transform (DFT) coefficients of the observed low resolution images y k [m, n], F is an L m L n × 1 column vector containing the samples of the unknown continuous Fourier transform of f l (m, n), and Φ is a p × L m L n matrix which relates the DFTs of the observed low resolution images to the samples of the continuous high resolution image.

Therefore, the reconstruction of a desired high resolution image requires determining Φ and solving this inverse problem. It is not clear, however, whether such a solution is unique or whether such an algorithm might converge to a local minimum. Most frequency domain registration methods are based on the fact that two shifted images differ in the frequency domain only by a phase shift, which can be found from their correlation. Using a log-polar transform of the magnitude of the frequency spectra, image rotation and scale can be converted into horizontal and vertical shifts, which can therefore also be estimated using a phase correlation method.

3.3.1.2 Phase Shift and Correlation

Reddy and Chatterji [76, 77] describe such planar motion estimation algorithms. The authors apply a high-pass emphasis filter to strengthen high frequencies in the estimation. Kim and Su [78–80] also apply a phase correlation technique to estimate planar shifts. To minimize errors due to aliasing, their methods rely on a part of the frequency spectrum that is almost free of aliasing, typically the low-frequency part of the images. It was shown in [81] that the signal power in the phase correlation corresponds to a polyphase transform of a filtered unit impulse. A rotation estimation algorithm was developed in [82] based on the property that the magnitude of the Fourier transform of an image and the mirrored magnitude of the Fourier transform of a rotated image have a pair of orthogonal zero-crossing lines. The angle that these lines make with the axes is equal to half the rotation angle between the two images. The horizontal and vertical shifts are estimated afterwards using a standard phase correlation method.
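The standard phase correlation method referred to above can be sketched in a few lines: two images differing by a circular translation have a normalized cross-power spectrum whose inverse FFT is an impulse located at the displacement. The image content and the shift below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
ref = rng.random((32, 32))
shift = (5, 9)                                  # assumed true displacement
moved = np.roll(ref, shift, axis=(0, 1))

F1 = np.fft.fft2(ref)
F2 = np.fft.fft2(moved)

# Normalized cross-power spectrum: magnitude discarded, phase kept.
cross_power = np.conj(F1) * F2
cross_power /= np.abs(cross_power) + 1e-12

# Its inverse FFT peaks at the displacement of `moved` relative to `ref`.
impulse = np.real(np.fft.ifft2(cross_power))
est = np.unravel_index(np.argmax(impulse), impulse.shape)
```

For sub-pixel accuracy, practical methods interpolate around the peak or fit it in the aliasing-free low-frequency band, as the cited works do.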

3.3.1.3 Regularization

An extension of this approach to blurred and noisy images was provided by [12], resulting in a weighted least squares formulation. In their approach, it is assumed that all low resolution images have the same blur and the same noise characteristics. This method was further refined by [13] to consider different blurs for each low resolution image. Here, the Tikhonov regularization method is adopted to overcome the ill-posed problem resulting from the blur operator. Bose et al. [14] proposed a recursive total least squares method for super resolution reconstruction to reduce the effects of registration errors (errors in Φ). A discrete cosine transform (DCT) based method was proposed by [15]. They reduce memory requirements and computational cost by using the DCT instead of the DFT. They also apply multichannel adaptive regularization parameters to overcome ill-posedness, such as underdetermined cases or cases with insufficient motion information.

Theoretical simplicity is a major advantage of the frequency domain approach: the relationship between the low resolution images and the high resolution image is clearly demonstrated in the frequency domain. The frequency domain method is also convenient for parallel implementation, which can reduce hardware complexity. However, the observation model is restricted to global translational motion and linear shift-invariant (LSI) blur. Due to the lack of data correlation in the frequency domain, it is also difficult to apply spatial domain a priori knowledge for regularization.

Generally, super resolution image reconstruction is an ill-posed problem because of an insufficient number of low resolution images and ill-conditioned blur operators. Procedures adopted to stabilize the inversion of an ill-posed problem are called regularization. In this section, we present deterministic and stochastic regularization approaches for super resolution image reconstruction; typically, constrained least squares (CLS) and maximum a posteriori (MAP) methods are used.
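A minimal Tikhonov-regularized (CLS-type) reconstruction for a toy one-dimensional problem might look as follows; the warp, blur, and decimation operators, the noise level, and the regularization weight are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n_hr, P = 16, 2   # HR length and decimation factor (assumed)

def observation_matrix(shift, n_hr, P):
    # Toy W_l = A H S: circular warp, 3-tap moving-average blur, decimation.
    S = np.roll(np.eye(n_hr), shift, axis=1)
    H = sum(np.roll(np.eye(n_hr), s, axis=1) for s in (-1, 0, 1)) / 3.0
    A = np.eye(n_hr)[::P]
    return A @ H @ S

f_true = np.sin(2 * np.pi * np.arange(n_hr) / n_hr)
Ws = [observation_matrix(s, n_hr, P) for s in (0, 1)]  # sub-pixel in LR grid
gs = [W @ f_true + 1e-4 * rng.standard_normal(n_hr // P) for W in Ws]

# Solve  min_f  sum_l ||g_l - W_l f||^2 + alpha ||f||^2
# in closed form via the regularized normal equations.
alpha = 1e-3
lhs = sum(W.T @ W for W in Ws) + alpha * np.eye(n_hr)
rhs = sum(W.T @ g for W, g in zip(Ws, gs))
f_hat = np.linalg.solve(lhs, rhs)
```

The α I term stabilizes the inversion exactly as described above: without it, the stacked observation matrix can be nearly singular and the solution blows up on the weakly observed components.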

3.3.1.4 Spatial Approach

Spatial domain methods generally allow for more general motion models, such as homographies. They can be based on the whole image or on a set of selected corresponding feature vectors, as discussed in [83], for example using the RANSAC algorithm [84]. Keren et al. [85] developed an iterative planar motion estimation algorithm based on Taylor expansions. A pyramidal scheme is used to increase the precision for large motion parameters. A hierarchical framework to estimate motion in a multi-resolution data structure is described in [86]. Different motion models, such as affine flow or rigid body motion, can be used in combination with this approach. Irani et al. [87] presented a method to compute multiple, possibly transparent or occluding, motions in an image sequence. Motion is estimated using an iterative multi-resolution approach based on planar motion. Different objects are tracked using segmentation and temporal integration. Gluckman [88] described a method that first computes planar rotation from the gradient field distribution of the images to be registered. Planar shifts are then estimated after cancellation of the rotation, using a phase correlation method.

3.3.2 Reconstruction

3.3.2.1 Interpolation-Based and Frequency Domain

In the subsequent image reconstruction phase, a high resolution image is reconstructed from the irregular set of samples obtained from the different low-resolution images. This can be achieved using an interpolation-based method such as the one used in [85]. Tsai and Huang [11] describe a frequency domain method, writing the Fourier coefficients of the high-resolution image as a function of the Fourier coefficients of the registered low-resolution images. The solution is then computed from a set of linear equations. This algorithm uses the same principle as the time domain formulation given in [89].

3.3.2.2 POCS

A high-resolution image can also be reconstructed using a projection onto convex sets (POCS) algorithm [27], where the estimated reconstruction is successively projected onto different convex sets. Each set represents constraints on the reconstructed image that are based on the given measurements and assumptions about the signal. Capel and Zisserman [83] and [90] use a maximum a posteriori (MAP) statistical method to build the high-resolution image.

Other methods iteratively create a set of low-resolution images from the estimated image using the imaging model. The estimate is then updated according to the difference between the real and the simulated low-resolution images [32, 85]. This method is known as iterative back-projection. Zomet et al. [91] improved the results obtained with typical iterative back-projection algorithms by taking the median of the errors in the different back-projected images. This proved to be more robust in the presence of outliers. Farsiu et al. [70] proposed a new and robust super resolution algorithm.

Instead of the more common L2 minimization, they use the L1 norm, which produces sharper high-resolution images. They also showed that this approach performs very well in combination with the algorithm by [91]. Elad and Feuer [31] present a super resolution framework that combines a maximum-likelihood/MAP approach with a projection onto convex sets (POCS) approach to define a new convex optimization problem. Next, they show the connections between their method and different classes of other existing methods.
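The iterative back-projection idea described above can be sketched on a one-dimensional toy problem: simulate the low resolution frames from the current high resolution estimate, and back-project the residuals. The point spread function, the back-projection kernel, the step size, and the iteration count below are assumptions, not the settings of any cited work:

```python
import numpy as np

def down(f, P=2):
    # Toy imaging model: 3-tap moving-average PSF, then P-fold decimation.
    blurred = np.convolve(f, np.ones(3) / 3.0, mode="same")
    return blurred[::P]

def up(g, P=2):
    # Back-projection: zero-fill upsampling followed by the same kernel.
    f = np.zeros(len(g) * P)
    f[::P] = g
    return np.convolve(f, np.ones(3) / 3.0, mode="same") * P

f_true = np.sin(2 * np.pi * np.arange(32) / 32)
frames = [down(np.roll(f_true, s)) for s in (0, 1)]  # sub-pixel shifted LR

f_init = np.repeat(frames[0], 2)   # crude initial HR guess
f_hat = f_init.copy()
for _ in range(50):
    for s, g in zip((0, 1), frames):
        residual = g - down(np.roll(f_hat, s))   # observed minus simulated
        f_hat += 0.5 * np.roll(up(residual), -s) # back-project and unwarp
```

Replacing the averaged residual update with a median across the back-projected frames gives the robust variant of [91], and an L1 data term gives the robust formulation of [70].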

3.4 Novel Super Resolution Registration Algorithm Based on Frequency

In this chapter, we present the flowchart of the proposed algorithm and the sources for each implementation step. We describe the methodologies in detail and present their experimental results, including the obtained high resolution images, their image quality, and their computational complexity compared with other super resolution algorithms. First, the main flowchart is shown in Fig. 3.10.

Fig. 3.10
figure 10

Main flowchart of the proposed algorithm

Secondly, we obtained the low resolution video sequence by down-sampling the original video sequence by a factor of two, as shown in Fig. 3.11; the resulting resolution is 320 × 240.

Fig. 3.11
figure 11

Input low resolution video sequence generating scheme

3.4.1 Pre-processing

In the second step, we designed an automatic selection algorithm for the low resolution input images in order to reduce the registration error. The whole video sequence contains images that are unsuitable with respect to the reference image, so selecting suitable input images is very important.

The video sequence has some temporal linearity, since it is captured at 30 or 25 frames per second (fps). However, this linearity is not highly accurate. According to the extensive literature on motion estimation and motion compensation, the probability that a motion vector lies within a 1/4-pixel distance is over 90 % for practical video sequences, and the motion compensation error reaches its maximum at a 1/2-pixel distance [92–100], as shown in Fig. 3.12.

Fig. 3.12
figure 12

Distribution of the registration error depending on the sub-pixel shift

We designate the center image of the specified video sequence window as the reference input image, and analyze the registration error between each reference image and its compared low resolution input images. We limit the number of low resolution input images to a maximum of five frames, because using more input images leads to much higher computational complexity. The registration error is computed using the sum of absolute differences (SAD), since it is a simple and inexpensive measure of motion compensation quality. We set the block size for the SAD computation to 8 × 8 to keep the computational complexity low. The SAD value thus serves as the motion compensation error (MCE). If the SAD of a low resolution input image (ILRI) satisfies 0 ≤ SAD ≤ maximum motion compensation error (MMCE, i.e., the maximum SAD), we select it as an input low resolution image candidate (ILRIC). This is illustrated in Fig. 3.13.

Fig. 3.13
figure 13

The flowchart of input low resolution image candidate selection
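The candidate selection step above can be sketched as follows. This is a simplified illustration: it computes 8 × 8 block SADs without an actual motion search, and the helper names (`block_sad`, `select_candidates`) and the threshold value are our own assumptions.

```python
import numpy as np

def block_sad(ref, frame, block=8):
    """Sum of absolute differences per 8 x 8 block (no motion search,
    a simplification of the chapter's motion-compensated SAD)."""
    h, w = ref.shape
    h, w = h - h % block, w - w % block
    diff = np.abs(ref[:h, :w].astype(np.int64) - frame[:h, :w].astype(np.int64))
    return diff.reshape(h // block, block, w // block, block).sum(axis=(1, 3))

def select_candidates(ref, frames, max_mce):
    """Keep the frames whose mean block SAD stays below the maximum
    motion compensation error (MMCE) threshold."""
    return [i for i, f in enumerate(frames)
            if block_sad(ref, f).mean() <= max_mce]
```

A frame that differs too strongly from the reference exceeds the MMCE threshold and is discarded before registration.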

In the next step, we compare the number of input low resolution image candidates of each reference image. The reference image with the largest number of candidates is chosen as the optimal reference image. We also propose an advanced architecture for choosing the reference image with reduced computational complexity, as shown in Fig. 3.14. This method removes duplicate sum of absolute differences (SAD) calculations at each frame, based on the partial distortion elimination (PDE) method. The basic idea is that if the difference between the current and candidate blocks is small, the candidate has a higher probability of being the optimal reference. Therefore, the method becomes more efficient when an input image with a larger initial accumulated SAD value is selected.

Fig. 3.14
figure 14

Flowchart of the advanced reference image selection based on partial distortion elimination (PDE)
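The partial distortion elimination idea, i.e., abandoning a SAD computation as soon as its running sum exceeds the best value found so far, can be sketched as below (hypothetical function name; row-wise accumulation is our assumption).

```python
import numpy as np

def sad_with_pde(ref_block, cand_block, best_so_far):
    """Accumulate the SAD row by row and stop early once the partial
    sum already exceeds the best SAD found so far (PDE)."""
    partial = 0
    for r in range(ref_block.shape[0]):
        partial += int(np.abs(ref_block[r].astype(np.int64)
                              - cand_block[r].astype(np.int64)).sum())
        if partial >= best_so_far:   # this candidate cannot win
            return None              # eliminated without finishing
    return partial
```

Ordering the candidates so that a small best-so-far value is found early makes the elimination trigger sooner, which is the source of the speed-up.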

3.4.2 Planar Motion Estimation

Fourier based image registration methods only allow global motion in a plane parallel to the image plane. In such a case, the motion between two images can be described as a function of three parameters that are all continuous variables: horizontal and vertical shifts x 1,h and x 1,v and a planar rotation angle θ 1.

A frequency domain approach allows us to estimate the horizontal and vertical shift and the (planar) rotation separately. Assume we have a continuous two-dimensional reference signal f 0(x) and its shifted and rotated version f 1(x):

$$ f_{1} \left( x \right) = f_{0} \left( {R\left( {x + x_{1} } \right)} \right) $$
(3.15)
$$ {\text{with}} \;x = \left( {\begin{array}{*{20}c} {x_{h} } \\ {x_{v} } \\ \end{array} } \right), x_{1} = \left( {\begin{array}{*{20}c} {x_{1,h} } \\ {x_{1,v} } \\ \end{array} } \right), R = \left( {\begin{array}{*{20}c} {\cos \theta_{1} } & { - \sin \theta_{1} } \\ {\sin \theta_{1} } & {\cos \theta_{1} } \\ \end{array} } \right) $$

This can be expressed in Fourier domain as

$$ \begin{aligned} F_{1} \left( u \right) & = \iint\limits_{x} {f_{1} \left( x \right)e^{{ - j2\pi u^{T} x}} dx} \\ & = \iint\limits_{x} {f_{0} \left( {R\left( {x + x_{1} } \right)} \right)e^{{ - j2\pi u^{T} x}} dx} \\ & = e^{{ - j2\pi u^{T} x_{1} }} \iint\limits_{x} {f_{0} \left( {Rx^{\prime } } \right)e^{{ - j2\pi u^{T} x^{\prime } }} dx^{\prime } } \\ \end{aligned} $$
(3.16)

With F 1(u) the two-dimensional Fourier transform of f 1(x) and the coordinate transformation x′ = x + x 1. After another transformation x″ = R x′, the relation between the amplitudes of the Fourier transforms can be computed as

$$ \begin{aligned} \left| {F_{1} \left( u \right)} \right| & = \left| {e^{{ - j2\pi u^{T} x_{1} }} \iint\limits_{{x^{\prime}}} {f_{0} \left( {Rx^{\prime}} \right)e^{{ - j2\pi u^{T} x^{\prime}}} dx^{\prime}}} \right| \\ & = \left| {\iint\limits_{{x^{\prime}}} {f_{0} \left( {Rx^{\prime}} \right)e^{{ - j2\pi u^{T} x^{\prime}}} dx^{\prime}}} \right| \\ & = \left| {\iint\limits_{{x^{\prime\prime}}} {f_{0} \left( {x^{\prime\prime}} \right)e^{{ - j2\pi u^{T} R^{T} x^{\prime\prime}}} dx^{\prime\prime}}} \right| \\ & = \left| {\iint\limits_{{x^{\prime\prime}}} {f_{0} \left( {x^{\prime\prime}} \right)e^{{ - j2\pi \left( {Ru} \right)^{T} x^{\prime\prime}}} dx^{\prime\prime}}} \right| \\ & = \left| {F_{0} \left( {Ru} \right)} \right| \\ \end{aligned} $$
(3.17)

We can see that |F 1(u)| is a rotated version of |F 0(u)| over the same angle θ 1 as the spatial domain rotation, as shown in Fig. 3.15. |F 0(u)| and |F 1(u)| do not depend on the shift values x 1, because the spatial domain shifts only affect the phase of the Fourier transforms. Therefore we can first estimate the rotation angle θ 1 from the amplitudes of the Fourier transforms |F 0(u)| and |F 1(u)|. After compensation for the rotation, the shift x 1 can be computed from the phase difference between F 0(u) and F 1(u).

Fig. 3.15
figure 15

Rotation estimation (θ 1 = 25°) and the corresponding Fourier transforms. a Original image and its Fourier transform amplitude. b Rotated image and its Fourier transform amplitude
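That spatial shifts change only the phase, and not the amplitude, of the Fourier transform can be checked numerically. The check below uses a circular shift, for which the property holds exactly; for a real camera shift it holds only approximately because of border effects.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32))
# Circularly shift the image by (5, 3) pixels.
shifted = np.roll(img, shift=(5, 3), axis=(0, 1))

amp_ref = np.abs(np.fft.fft2(img))
amp_shifted = np.abs(np.fft.fft2(shifted))
print(np.allclose(amp_ref, amp_shifted))  # prints True: amplitudes coincide
```

The phases, by contrast, differ by a linear term 2π uᵀx₁, which is exactly what the shift estimation of Sect. 3.4.4 exploits.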

3.4.3 Rotation Estimation

The rotation angle between |F 0(u)| and |F 1(u)| can be computed as the angle θ 1 for which the Fourier transform of the reference image |F 0(u)| and the rotated Fourier transform of the image to be registered |F 1(Ru)| have maximum correlation. This implies the computation of a rotation of |F 1(u)| for every evaluation of the correlation, which is computationally heavy and thus practically difficult.

If |F 0(u)| and |F 1(u)| are transformed into polar coordinates, the rotation over the angle θ 1 is reduced to a (circular) shift over θ 1. We can compute the Fourier transform of the polar spectra |F 0(u)| and |F 1(u)|, and compute θ 1 as the phase shift between the two [76, 77]. This requires a transformation of the spectrum to polar coordinates: the data from the uniform u h , u v grid need to be interpolated to obtain a uniform u r , u θ grid. Mainly for the low frequencies, which generally contain most of the energy, the interpolations are based on very few function values and thus introduce large approximation errors. An implementation of this method is also computationally intensive.

Our approach is computationally much more efficient than the two methods described above. First of all, we compute the frequency content A as a function of the angle θ by integrating over radial lines:

$$ \varvec{A}\left( \theta \right) = \int\limits_{\theta - \varDelta \theta /2}^{\theta + \varDelta \theta /2} {\int\limits_{0}^{\infty } {\left| {F\left( {u_{r} ,u_{\theta } } \right)} \right|du_{r} du_{\theta } } } $$
(3.18)

In practice, |F 0(u r , u θ )| is a discrete signal. Different methods exist to relate discrete directions to continuous directions, like for example digital lines [101]. Here, we compute the discrete function A(θ) as the average of the values on the rectangular grid that have an angle θ − ∆θ/2 < u θ < θ + ∆θ/2. As we want to compute the rotation angle with a precision of 0.1 degrees, A(θ) is computed every 0.1 degrees. To get a similar number of signal values |F 0(u r , u θ )| at every angle, the average is only evaluated on a circular disc of values for which u r < ρ (where ρ is the image radius, or half the image size). Finally, as the values for low frequencies are very large compared to the other values and are very coarsely sampled as a function of the angle, we discard the values for which u r < ερ, with ε = 0.1. Thus, A(θ) is computed as the average of the frequency values on a discrete grid with θ − ∆θ/2 < u θ < θ + ∆θ/2 and ερ < u r < ρ.

This results in a function A(θ) for both |F 0(u)| and |F 1(u)| as shown in Fig. 3.16. The exact rotation angle can then be computed as the value for which their correlation reaches a maximum. Note that only a one-dimensional correlation has to be computed, as opposed to the two-dimensional correlation approaches in [76] and [77].

Fig. 3.16
figure 16

Rotation estimation. a Average Fourier domain amplitude as a function of the angle A(θ) for the two images from Fig. 3.11. b Correlation between A 0 (θ) and A 1 (θ), with a maximum at the rotation angle θ 1  = 25°
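The angular projection A(θ) and the one-dimensional correlation can be sketched as follows. This is an illustration under our own assumptions (1° bins instead of 0.1°, a simple bin average, and no resolution of the 180° ambiguity of the amplitude spectrum), not the exact implementation of the chapter.

```python
import numpy as np

def angular_profile(img, n_bins=360, eps=0.1):
    """Average Fourier amplitude A(theta): bin the shifted spectrum by
    angle, keeping only the samples with eps*rho < u_r < rho."""
    amp = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    dy, dx = yy - h // 2, xx - w // 2           # offsets from the DC bin
    r = np.hypot(dy, dx)
    theta = np.degrees(np.arctan2(dy, dx)) % 360.0
    rho = min(h, w) / 2.0
    mask = (r > eps * rho) & (r < rho)
    bins = (theta[mask] * n_bins / 360.0).astype(int) % n_bins
    total = np.bincount(bins, weights=amp[mask], minlength=n_bins)
    count = np.maximum(np.bincount(bins, minlength=n_bins), 1)
    return total / count

def estimate_rotation(ref, rotated, n_bins=360):
    """Rotation angle as the circular shift maximizing the 1-D correlation
    of the two angular profiles. A 180-degree ambiguity remains, since
    |F(-u)| = |F(u)| for real images."""
    a0 = angular_profile(ref, n_bins)
    a1 = angular_profile(rotated, n_bins)
    corr = [float(np.dot(a0, np.roll(a1, s))) for s in range(n_bins)]
    return int(np.argmax(corr)) * 360.0 / n_bins
```

Only a one-dimensional correlation over the angle bins is computed, which is the source of the efficiency gain over the two-dimensional approaches of [76, 77].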

Of course, the use of such a radial projection also reduces the available information, and might introduce ambiguities in the estimation. The simulation result of our rotation estimation algorithm is shown in Fig. 3.17.

Fig. 3.17
figure 17

The simulation result of the rotation estimation. a Reference image. b Object image. c Inverse rotation estimated image of (b)

3.4.4 Shift Estimation

A shift of the image parallel to the image plane can be expressed in Fourier domain as a linear phase shift:

$$ \begin{aligned} F_{1} \left( u \right) & = \iint\limits_{x} {f_{1} \left( x \right)e^{{ - j2\pi u^{T} x}} }\;dx = \iint\limits_{x} {f_{0} \left( {x + x_{1} } \right)}\;e^{{ - j2\pi u^{T} x}} dx \\ & = e^{{j2\pi u^{T} x_{1} }} \iint\limits_{{x^{'} }} {f_{0} \left( {x^{'} } \right)e^{{ - j2\pi u^{T} x^{'} }} }\;dx^{'} = e^{{j2\pi u^{T} x_{1} }} F_{0} \left( u \right) \\ \end{aligned} $$
(3.19)

It is well known that the shift parameters x 1 can thus be computed as the slope of the phase difference ∠(F 1(u)/F 0(u)) [7679, 81, 82, 102]. To make the solution less sensitive to noise, a least squares method is widely used.

When we apply the inverse shift estimation to the object image after rotation compensation with respect to the reference image, the resulting image matches the reference image exactly (see Fig. 3.18); therefore, this shift estimation process is used in the initial registration operation.

Fig. 3.18
figure 18

The simulation result of the shift estimation. a Reference image. b Object image. c Inverse shift estimated image of (b), after rotation estimation
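A closely related frequency-domain estimator is phase correlation, which recovers an integer circular shift directly from the phase difference; the least-squares fit to the phase slope mentioned above refines this to sub-pixel accuracy. A minimal sketch:

```python
import numpy as np

def estimate_shift(ref, shifted):
    """Integer shift via phase correlation: the normalized cross-power
    spectrum is a pure linear phase whose inverse FFT peaks at the shift."""
    F0, F1 = np.fft.fft2(ref), np.fft.fft2(shifted)
    cross = F1 * np.conj(F0)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    dy = int(dy) - h if dy > h // 2 else int(dy)  # wrap to signed shifts
    dx = int(dx) - w if dx > w // 2 else int(dx)
    return dy, dx
```

For a circular shift the correlation surface is an exact delta at the shift; for real images the peak is merely dominant, which is why a least-squares fit over the phase plane is more robust to noise.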

We select three candidate images for each reference image. To do this, we use the Hilbert space method; that is, we execute the initial registration process shown in Fig. 3.19.

Fig. 3.19
figure 19

The initial registration process

In Fig. 3.19, LR1 denotes the reference image and LR2 to LR4 are the chosen candidate low resolution images. These candidate images are placed on the high resolution grid by using the inverse shift estimation. For example, four sample images of 320 × 240 resolution used to generate a high resolution image are shown in Fig. 3.20.

Fig. 3.20
figure 20

Four sample images to generate a high resolution image

3.4.5 Reconstruction

We can then obtain a high resolution image, but its resolving power is not good, because the obtained image combines multiple sampling channels with unknown offsets. To reduce the offsets and the number of multichannel sampling frequencies, we apply mean value filtering; that is, every pixel value is regenerated from the five pixels of a cross-shaped neighborhood (the pixel itself and its four neighbors). The graphical diagram and the resulting image for the four sample images are shown in Fig. 3.21.

Fig. 3.21
figure 21

Graphical diagram and results image for four sample images
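The cross-shaped mean filtering can be sketched as below; border handling by edge replication is our assumption, as the chapter does not specify it.

```python
import numpy as np

def cross_mean_filter(img):
    """Replace every pixel by the mean of itself and its four cross-shaped
    neighbors (up, down, left, right), with edge replication at borders."""
    padded = np.pad(img, 1, mode='edge')
    return (padded[1:-1, 1:-1]            # center pixel
            + padded[:-2, 1:-1]           # neighbor above
            + padded[2:, 1:-1]            # neighbor below
            + padded[1:-1, :-2]           # neighbor to the left
            + padded[1:-1, 2:]) / 5.0     # neighbor to the right
```

Each output pixel is thus a five-point average, which smooths the offset differences between the interleaved sampling channels.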

Fig. 3.22
figure 22

Secondly obtained high resolution image by using the mean value filtering

The high resolution image obtained in this second step does not look clear. Therefore, we apply a de-blurring operation to further reduce the multichannel sampling artifacts, and then we apply a sharpening process, as shown in Fig. 3.23.

Fig. 3.23
figure 23

Result image after applying the de-blurring and sharpening to Fig. 3.22

We then apply mean value filtering, bi-cubic interpolation, de-blurring, and sharpening again, in the manner of an iterative back-projection (IBP) method, and obtain the image shown in Fig. 3.24.

Fig. 3.24
figure 24

Result image after applying IBP to Fig. 3.22

These processes can be expressed by the equations below. The initial registered image obtained by the rotation and shift estimation has non-uniform sampling with unknown offsets. It can be expressed as

$$ {\text{Y}}_{m} = \mathop \sum \limits_{{i_{1} ,i_{2} }} e^{{j2\pi \left( {i_{1} N_{1} t_{m,1} + i_{2} N_{2} t_{m,2} } \right)}} D_{{t_{m} }}^{'} \alpha_{{i_{1} ,i_{2} }} $$
(3.20)

3.5 Conclusion

We have described super resolution methods, focusing in particular on super resolution imaging from multichannel sampling with unknown offsets. In such algorithms, accurate registration determines the algorithm performance. We proposed an advanced registration algorithm with efficient rotation and shift estimation. The order of these two processes in our algorithm follows the warp-blur observation model: in practice, the cases in which the blurring parameters depend on camera rotations and vibrations are much more common than the reverse.

First, our algorithm selects the optimal reference image to reduce the registration error, whereas many other super resolution algorithms either ignore this registration error or assume it to be uniform. In this framework, the registration error is calculated using the sum of absolute differences (SAD) based on partial distortion elimination (PDE). This process achieves noticeable improvements compared with conventional algorithms.

Second, the proposed algorithm estimates the rotation and then the shift, in that order, because it is based on the warp-blur observation model. The blurring effects are caused by the point spread function (PSF) of the camera, which changes according to the rotation parameter of the image.

Finally, we have reconstructed a high resolution image by using the planar motion estimation. This results in a set of nonlinear equations in the unknown signal coefficients and the offsets. Using this formulation, we have shown that the solution is generally unique if the total number of sample values is larger than or equal to the total number of unknowns (signal parameters and offsets).

We present one reference image and its three candidate images for 10 sample images used to reconstruct a high resolution image with the proposed registration algorithm. These candidate images are obtained from the first step. We also show, for each sample, the bi-cubic interpolated image and the super resolution image produced by the proposed algorithm. The image quality of the proposed algorithm is much higher than that of the bi-cubic interpolation method: the average PSNR of the proposed algorithm is about 38 dB, while the other methods score lower, as shown in Table 3.1.

Table 3.1 Comparison of the different methods presented in this chapter