1 Introduction

Super-resolution (SR) image reconstruction refers to reconstructing a high-resolution (HR) image from one or more low-resolution (LR) images. SR technology is an effective way to improve the spatial resolution and quality of images. Because of its broad application prospects, SR technology has attracted great attention and has been extensively researched by academic and business communities all over the world.

SR image reconstruction was first proposed by Harris [8] and Goodman [7] in the 1960s. They used only one LR image. However, a single LR image carries little image information and makes it difficult to remove image noise. To overcome the drawbacks of using a single image, Tsai and Huang [23] first proposed an SR reconstruction algorithm for multiple satellite images based on the Fourier transform. Since then, a large number of algorithms that use multiple LR images to rebuild an SR image have been developed. SR reconstruction of image sequences can be divided into three main directions: interpolation-based, learning-based and reconstruction-based algorithms. In this study, we use a reconstruction-based algorithm. Reconstruction-based algorithms do not depend on a spatial-domain prior. They mainly include the iterative back projection method [1, 17], the projection onto convex sets (POCS) method [4, 19, 28], the maximum a posteriori (MAP) estimation method [9, 27] and the regularization method [13, 16, 26, 29]. The iterative back projection method does not use any prior regularity constraint, so the ill-posedness of SR reconstruction leads to non-unique and unstable solutions. The advantages of the POCS method are its intuitive principle, its general degradation model and the convenient integration of prior knowledge. But it has some potential defects: (a) the outline and details of the image are not clearly rendered; (b) the solution space is the intersection of all convex constraint sets, so the solution is not unique unless the intersection is a single point; (c) the algorithm depends heavily on the choice of the initial value; (d) it requires a number of iterations. The advantage of the MAP estimation method is that it has a unique solution. With a reasonable prior assumption, it can recover image edges very well. Its significant drawback is the large amount of computation.
The regularization algorithm does not require a circular point spread function or any statistical assumptions about the image and noise. But the regularization term suppresses details while suppressing noise, so the result is easily over-smoothed. In this paper, we use the regularization method.

The basic framework of SR reconstruction is based on regularization theory. The HR image is obtained by minimizing a regularized energy function constructed from the degradation model of the LR image sequence and the regularization terms corresponding to the image model. The design of the geometric image model in the regularization space directly determines the visual quality of the SR reconstructed image. Hong et al. [10, 11] proposed an SR reconstruction method based on the Tikhonov regularization term. Hardie et al. [12] proposed an SR reconstruction algorithm based on Tikhonov regularization that uses the conjugate gradient (CG) method to improve computational efficiency. However, it tended to blur important geometric structures, such as image edges. Capel and Zisserman [2] proposed an SR reconstruction algorithm for image sequences based on the Total Variation (TV) model, which suppressed noise but generated staircase effects. Farsiu et al. [5] proposed an SR reconstruction method based on L1 norm estimation and the Bilateral Total Variation (BTV) model. The method reduced the impact of model error on the reconstruction results to a certain extent, and eliminated the staircase effect of the TV regularization method [18]. But the method had poor local adaptivity, so it could not effectively preserve weak edges.

In this paper, we propose a local structure adaptive SR reconstruction algorithm based on BTV regularization to overcome the defects of the BTV-regularized SR reconstruction model. The method selects the prior model and the regularization parameters adaptively according to the local structure. The experimental results show that the proposed method better preserves weak image edges while de-noising, and greatly reduces the workload of manually selecting the regularization parameter.

The rest of the paper is organized as follows. In Section 2, the image degradation model is presented. In Section 3, the local structure adaptive robust SR reconstruction algorithm is proposed. The performance of the proposed algorithm is then evaluated and analyzed through the experiments in Section 4. Section 5 concludes the paper.

2 Image degradation model

The first step of SR image reconstruction is to establish an observation model that relates the original HR image to the LR images. In the actual image sampling process, atmospheric disturbance, object motion, optical blur, down-sampling, noise and other factors lead to image degradation. The degradation process is illustrated in Fig. 1.

Fig. 1 The flow of the degradation process of an SR image

To simplify the calculation, we rearrange each image matrix into a one-dimensional vector [3, 5, 6, 20, 21, 24], so the model can be written in matrix form as:

$$ Y_{k}=D_{k}H_{k}F_{k}X+V_{k},k=1,2,\ldots,N. $$
(1)

where \(X\) is the SR image (of size \([r^{2}M^{2}\times 1]\)), \(r\) is the magnification of the SR image relative to the LR images, \(Y_{k}\) is the \(k\)-th LR image (of size \([M^{2}\times 1]\)), \(D_{k}\) is the down-sampling operator (of size \([M^{2}\times r^{2}M^{2}]\)), \(H_{k}=H_{k}^{\rm cam}H_{k}^{\rm atm}\) is the blur operator, \(H_{k}^{\rm cam}\) is the blur caused by the camera point spread function (of size \([r^{2}M^{2}\times r^{2}M^{2}]\)), \(H_{k}^{\rm atm}\) is the atmospheric blur (of size \([r^{2}M^{2}\times r^{2}M^{2}]\)), \(F_{k}\) is the geometric motion operator (of size \([r^{2}M^{2}\times r^{2}M^{2}]\)), and \(V_{k}\) is the system noise (of size \([M^{2}\times 1]\)).
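As an illustration of model (1), the following minimal sketch generates one LR frame from an HR image. It is not the authors' implementation. It assumes integer translations for \(F_{k}\), a single small blur kernel for \(H_{k}\), circular image boundaries, and plain decimation for \(D_{k}\); the function names and parameters are hypothetical.

```python
import numpy as np

def shift(img, dy, dx):
    """Motion operator F_k: integer translation with circular boundaries (assumption)."""
    return np.roll(img, (dy, dx), axis=(0, 1))

def blur(img, kernel):
    """Blur operator H_k: circular convolution with a small PSF, computed via the FFT."""
    pad = np.zeros(img.shape, dtype=float)
    kh, kw = kernel.shape
    pad[:kh, :kw] = kernel
    # Move the kernel centre to the origin so the blur adds no extra shift.
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad)))

def downsample(img, r):
    """Down-sampling operator D_k: keep every r-th pixel in each direction."""
    return img[::r, ::r]

def degrade(X, dy, dx, kernel, r, sigma, rng):
    """One LR frame Y_k = D_k H_k F_k X + V_k with Gaussian noise of std sigma."""
    Y = downsample(blur(shift(X, dy, dx), kernel), r)
    return Y + sigma * rng.standard_normal(Y.shape)
```

For example, `degrade(X, 1, 0, np.ones((3, 3)) / 9.0, 2, 0.01, np.random.default_rng(0))` returns a frame with half the height and width of `X`.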

3 The local structure adaptive robust SR reconstruction algorithm

3.1 The choice of the local structure adaptive prior model

SR image reconstruction is a severely ill-posed problem. With the degradation model given in (1), the general framework of SR reconstruction based on regularization theory can be expressed as:

$$ \hat{X}=\underset{X}{\arg\min}\left\{\sum\limits^N_{k=1}\|Y_{k}-DF_{k}HX\|_{1}+\lambda\Gamma(X)\right\} $$
(2)

where λ is a trade-off factor that balances the regularization term Γ(·) against the data fidelity term \(\sum^N_{k=1}\|Y_{k}-DF_{k}HX\|_{1}\).

Total variation (TV) regularization is widely used in image de-noising and de-blurring. It preserves image edges while de-noising during reconstruction. Farsiu et al. [5] combined the bilateral filter with the TV regularization operator, and proposed the following BTV regularization:

$$ \Gamma(X)=\sum\limits^\omega_{l=-\omega}\sum\limits^\omega_{m=-\omega}\alpha^{|m|+|l|}\left\|X-S^l_{x}S^m_{y}X\right\|_{1} $$
(3)

where the operators \(S^l_{x}\) and \(S^m_{y}\) shift X by l and m pixels in the vertical and horizontal directions, respectively, the shifts are restricted to l + m ≥ 0, and the scalar α (0 < α < 1) is the spatial distance attenuation factor.
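The BTV term in (3) can be evaluated directly from its definition. The sketch below is illustrative only: it sums over the full window of shifts exactly as (3) is printed, uses circular shifts at the image border (an assumption), and the function name `btv` and its default values are hypothetical.

```python
import numpy as np

def btv(X, omega=2, alpha=0.7):
    """Bilateral total variation of a 2-D image X, following (3) as printed."""
    total = 0.0
    for l in range(-omega, omega + 1):        # vertical shift S_x^l
        for m in range(-omega, omega + 1):    # horizontal shift S_y^m
            shifted = np.roll(X, (l, m), axis=(0, 1))
            # L1 norm of the difference, weighted by alpha^(|m| + |l|)
            total += alpha ** (abs(l) + abs(m)) * np.abs(X - shifted).sum()
    return total
```

A constant image gives a value of zero, and the value grows with the number and strength of intensity changes inside the window.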

The BTV regularization SR reconstruction model can, to some extent, reduce the impact of model error on the reconstruction results, and it eliminates the staircase effect of the TV regularization method. However, it has poor local adaptivity and cannot effectively preserve weak edges. Previous studies showed that the \(L_{2}\) norm causes edge blurring. The \(L_{1}\) norm protects edges well, but it produces piecewise-constant regions in the output image. To obtain a visually satisfactory result, we combine these two “smoothness” measures and automatically adjust the measure according to the local features of the image: we use the \(L_{1}\) norm at strong edges and the \(L_{2}\) norm in flat areas. This section presents the following adaptive BTV regularization SR reconstruction model:

$$ \hat{X}=\underset{X}{\arg\min}\left\{\sum\limits^N_{k=1}\|Y_{k}-DF_{k}HX\|^{p_{(X)}}_{p_{(X)}}+ \lambda\sum\limits^\omega_{l=-\omega}\sum\limits^\omega_{m=-\omega}\frac{1}{p_{(X)}}\alpha^{|m|+|l|} \left\|X-S^l_{x}S^m_{y}X\right\|^{p_{(X)}}_{p_{(X)}}\right\} $$
(4)

where

$$ p_{(x)}=\begin{cases} 2 & x\leq x_1 \\ 2-\displaystyle\frac{x-x_1}{x_2-x_1} & x_1<x<x_2 \\ 1 & x\geq x_2 \end{cases} $$
(5)

where \(x_{1}\) and \(x_{2}\) are empirically predefined thresholds, \(x=\left|X_{i,j}-X_{i+m,j+l}\right|\), \(0<x_{1}<\frac{1}{3}\max\{X_{i,j}-X_{i+m,j+l}\}\) and \(\frac{2}{3}\max\{X_{i,j}-X_{i+m,j+l}\}<x_{2}<\max\{X_{i,j}-X_{i+m,j+l}\}\). Then \(p_{(x)}\) has the following properties:

  (1) Monotonically decreasing;

  (2) \( p_{(x)}=\begin{cases} 2 & x \rightarrow 0 \\ 1 & x \rightarrow \infty \end{cases} \).
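The exponent in (5) is a clipped linear ramp between the two thresholds. A minimal sketch follows, with the hypothetical function name `p_of_x`; the thresholds used in the example are arbitrary values, not values from the experiments.

```python
import numpy as np

def p_of_x(x, x1, x2):
    """Adaptive exponent of (5): 2 below x1, 1 above x2, linear in between."""
    x = np.asarray(x, dtype=float)
    return np.clip(2.0 - (x - x1) / (x2 - x1), 1.0, 2.0)
```

With `x1 = 0.2` and `x2 = 0.8`, a small difference such as `x = 0.05` gives the exponent 2 (flat area), and a large difference such as `x = 0.9` gives the exponent 1 (edge).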

Thus, in the flat areas, \(p_{(x)}=2\),

$$ \hat{X}=\underset{X}{\arg\min}\left\{\sum\limits^N_{k=1}\|Y_{k}-DF_{k}HX\|^2_2+\lambda\sum\limits^\omega_{l= -\omega}\sum\limits^\omega_{m=-\omega}\frac{1}{2}\alpha^{|m|+|l|}\left\|X-S^l_{x}S^m_{y}X\right\|^2_2\right\} $$

Near the edge, \(p_{(x)}=1\),

$$ \hat{X}=\underset{X}{\arg\min}\left\{\sum\limits^N_{k=1}\|Y_{k}-DF_{k}HX\|_1+ \lambda\sum\limits^\omega_{l=-\omega}\sum\limits^\omega_{m=-\omega}\alpha^{|m|+|l|}\left\|X-S^l_{x}S^m_{y}X\right\|_1\right\} $$

For (4), we define Z = HX, so Z is the blurred version of the ideal SR image X. We therefore divide the minimization problem into two separate steps:

  (1) Non-iterative data fusion, finding a blurred HR image from the LR measurements (we call this result \(\hat{Z}\));

  (2) Estimating the de-blurred image \(\hat{X}\) from \(\hat{Z}\).

To find \(\hat{Z}\), we substitute HX with Z. The vector \(\hat{Z}\) is the weighted mean of all measurements at a given pixel, after proper zero filling and motion compensation:

$$ \hat{Z}=\underset{Z}{\arg\min}\left\{\sum\limits^N_{k=1}\|Y_{k}-DF_{k}Z\|^{p_{(X)}}_{p_{(X)}}\right\} $$

The following expression formulates our minimization criterion for obtaining \(\hat{X}\) from \(\hat{Z}\):

$$ \hat{X}=\underset{X}{\arg\min} \left\{\| A(HX-\hat{Z})\|_{p_{(X)}}^{p_{(X)}} + {\lambda}\sum^\omega_{l=-\omega}\sum^\omega_{m=-\omega}\frac{1}{p_{(X)}}\alpha^{|m|+|l|} \left\|X-S^l_{x}S^m_{y}X\right\|^{p_{(X)}}_{p_{(X)}}\right\} $$
(6)

where matrix A is a diagonal matrix with diagonal values equal to the square root of the number of measurements that contributes to each element of \(\hat{Z}\) (in the square case A is the identity matrix).
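The non-iterative fusion step and the diagonal of A can be sketched as follows. This is a simplified illustration, not the authors' code. It assumes that each LR frame has a known integer offset (dy, dx) on the HR grid with 0 ≤ dy, dx < r, and it takes the pixel-wise median of frames that share an offset. The function name `fuse` is hypothetical. HR positions that receive no measurement are left as NaN.

```python
import numpy as np

def fuse(lr_images, shifts, r):
    """Median shift-and-add fusion of LR frames onto the HR grid.

    lr_images: list of 2-D arrays of equal shape
    shifts:    list of (dy, dx) HR-grid offsets, one per frame (assumed known)
    r:         magnification factor
    Returns the blurred HR estimate Z and the diagonal of A as an image.
    """
    rows, cols = lr_images[0].shape
    groups = {}
    for Y, offset in zip(lr_images, shifts):
        groups.setdefault(tuple(offset), []).append(Y)
    Z = np.full((r * rows, r * cols), np.nan)   # NaN marks unfilled HR positions
    counts = np.zeros((r * rows, r * cols))
    for (dy, dx), frames in groups.items():
        Z[dy::r, dx::r] = np.median(np.stack(frames), axis=0)
        counts[dy::r, dx::r] = len(frames)
    A = np.sqrt(counts)   # square root of the number of measurements per pixel
    return Z, A
```

Positions that remain NaN after fusion correspond to HR pixels with no measurement and need to be filled before de-blurring.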

3.2 Adaptive parameters selection

The regularization parameters otherwise need to be set manually for each set of LR images, and obtaining good HR images requires a large number of tedious, repeated trial-and-comparison runs. Building on the robust SR reconstruction method, we use the local structure adaptive robust SR reconstruction method to determine the regularization parameter adaptively, which reduces this workload [15]. The regularization parameter is different in each iteration, so it adapts to the current state of the reconstruction, and the final SR image is better as a result.

According to (2), the regularization parameter λ balances the regularization term and the data term. When λ is large, the reconstructed image tends to be smooth. Conversely, when λ is small, the image fitting error is small. Correctly estimating λ so that the regularization term and the data term are in relative equilibrium is a difficult and important issue in the inverse problem of SR image reconstruction. The reconstructed image must satisfy the appropriate regularity conditions without deviating too far from the original image. Several methods are currently used:

  (1) Empirical selection. First, from the observed images, we estimate the noise level ε, where \(\|AHX-A\hat{Z}\|\leq\varepsilon^2\). Then we empirically select an upper bound E of the regularization term, where \(\Gamma(X)\leq E^{2}\). The regularization parameter is then determined as \(\lambda=(\varepsilon/E)^{2}\). The disadvantage is that this is too subjective and needs prior knowledge of the noise and the regularization term.

  (2) U-Curve method [14, 25]. By trying different regularization parameters, we draw a curve of the image fitting energy against the regularization energy, which resembles a ‘U’. We then select the parameter at the point of maximum curvature of the curve as the optimal regularization parameter. The method is intuitive and robust, but it is too computationally intensive.

This paper presents a method in which the regularization coefficient is determined dynamically and adaptively, avoiding both the subjectivity of empirical selection and the heavy computation of the U-Curve method. Its most important feature is that the regularization parameter is not fixed but changes dynamically during reconstruction. We replace λ with a regularization functional λ(X), as follows:

$$ \hat{X}=\underset{X}{\arg\min}\!\left\{\|A(HX-\hat{Z})\|^{p_{(X)}}_{p_{(X)}}+ {\lambda}(X)\!\sum^\omega_{l=-\omega}\sum^\omega_{m=-\omega}\frac{1}{p_{(X)}}\alpha^{|m|+|l|} \!\left\|X-S^l_{x}S^m_{y}X\right\|^{p_{(X)}}_{p_{(X)}}\!\right\} $$
(7)

In (7), the data fitting term gradually fits the observation error, which pushes X toward high frequencies as the iterations proceed. The regularization term can be viewed as a forward evolution of a scale space: it gradually removes noise and blurs fine structure at small scales, which pushes X toward low frequencies. Thus, the regularization parameter must be adjusted dynamically according to the fitting error and the change of the regularization term, so as to adaptively balance the high- and low-frequency characteristics of the reconstructed image. The reconstruction then fits the observed images well while maintaining certain regularity conditions. The regularization coefficient should therefore have the following properties: (1) λ(·) is proportional to the data residual, (2) λ(·) is inversely proportional to the regularization term Γ(·), (3) λ(·) is larger than zero, and (4) λ(·) is small at pixels in non-smooth regions such as edges and texture. Property (1) controls the fidelity to the data. Property (2) controls the smoothness of the solution. Based on these properties, we define the regularization functional λ(·) as

$$ \lambda(X)=\theta\left[T(X)\cdot\frac{\|A(HX-\hat{Z})\|^{p_{(X)}}_{p_{(X)}}} {\sum^\omega_{l=-\omega}\sum^\omega_{m=-\omega}\frac{1}{p_{(X)}}\alpha^{|m|+|l|} \left\|X-S^l_{x}S^m_{y}X\right\|^{p_{(X)}}_{p_{(X)}}}\right] $$
(8)

where θ(·) is a monotonically increasing function. T(X) represents the influence of the non-smooth region of the edge and texture points.

There are three types of monotonically increasing functions with the above properties. Type 1 is a linear function, which has a constant rate of increase. Type 2 is a logarithmically increasing function. Type 3 is an exponentially increasing function. However, with the linear function, the regularization functional increases exponentially and becomes too sensitive to the partially reconstructed HR image. Furthermore, it depends considerably on the initial condition. Similarly, the Type 3 regularization functional increases exponentially and is even more sensitive than the linear function. Therefore, we focus on Type 2 and propose a regularization functional that takes the form of an irrational function:

$$ \lambda(X)=\sqrt{\frac{\|A(HX-\hat{Z})\|^{p_{(X)}}_{p_{(X)}}} {\sum^\omega_{l=-\omega}\sum^\omega_{m=-\omega}\frac{1}{p_{(X)}}\alpha^{|m|+|l|} \left\|X-S^l_{x}S^m_{y}X\right\|^{p_{(X)}}_{p_{(X)}}+\delta}} $$
(9)

where δ is a very small constant that prevents division by zero; we take \(\delta=10^{-7}\). This regularization functional omits the factor T(X) of (8), so it is not influenced by the non-smooth regions of edges and texture points. We solve (7) by iterative de-blurring after image fusion, so that the reconstructed image approaches the original image. Multiple iterations make the reconstructed image more accurate [22]. The concrete realization is as follows:

  Step 1. Data fusion: find the blurred preliminary HR image \(\hat{Z}\) and the diagonal matrix A. The initial HR image \(\hat{X}_n\) is \(\hat{Z}\).

    Step 1.1. The median of the pixels that share the same shift position in the LR image sequence is mapped to the corresponding position of the HR image \(\hat{Z}\) by nearest neighbor interpolation. Each diagonal value of A equals the square root of the number of LR pixels at that position.

    Step 1.2. If the image \(\hat{Z}\) still has vacant positions, we apply a median filter to \(\hat{Z}\) and fill the vacant positions with the filtered pixel values to obtain the final image \(\hat{Z}\). The initial HR image \(\hat{X}_n\) is \(\hat{Z}\).

  Step 2. Select the local structure adaptive prior model according to (5) and calculate the adaptive parameter λ according to (9) to obtain the current SR image \(\hat{X}_{n+1}\).

  Step 3. If the number of iterations has not reached the set value, assign \(\hat{X}_n=\hat{X}_{n+1}\) and return to Step 2. Otherwise, stop; \(\hat{X}_{n+1}\) is the final SR image.
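Steps 2 and 3 can be sketched as the loop below. The update rule is not specified in the text, so this sketch makes several loud assumptions: it uses plain gradient descent with a fixed step `beta`; it uses one scalar exponent `p` for the whole image instead of the per-pixel \(p_{(x)}\) of (5); it represents the blur H by its transfer function `otf` (circular convolution); and it stores the diagonal of A as an image. All function and parameter names are hypothetical.

```python
import numpy as np

def reg_term(X, p, omega, alpha):
    """Regularizer of (7): sum of alpha^(|m|+|l|) * ||X - S_x^l S_y^m X||_p^p / p."""
    total = 0.0
    for l in range(-omega, omega + 1):
        for m in range(-omega, omega + 1):
            d = X - np.roll(X, (l, m), axis=(0, 1))
            total += alpha ** (abs(l) + abs(m)) * (np.abs(d) ** p).sum() / p
    return total

def reg_grad(X, p, omega, alpha):
    """Gradient of reg_term with respect to X, for circular shifts."""
    g = np.zeros_like(X)
    for l in range(-omega, omega + 1):
        for m in range(-omega, omega + 1):
            d = X - np.roll(X, (l, m), axis=(0, 1))
            w = np.sign(d) * np.abs(d) ** (p - 1)
            # (I - S^T) w, where S^T is the shift in the opposite direction
            g += alpha ** (abs(l) + abs(m)) * (w - np.roll(w, (-l, -m), axis=(0, 1)))
    return g

def apply_blur(X, otf):
    """H X as circular convolution, given the transfer function of the PSF."""
    return np.real(np.fft.ifft2(np.fft.fft2(X) * otf))

def adaptive_lambda(X, Z, A, otf, p, omega, alpha, delta=1e-7):
    """Equation (9): sqrt(data term / (regularization term + delta))."""
    data = (np.abs(A * (apply_blur(X, otf) - Z)) ** p).sum()
    return np.sqrt(data / (reg_term(X, p, omega, alpha) + delta))

def reconstruct(Z, A, otf, p=1.0, omega=2, alpha=0.7, beta=0.01, n_iter=20):
    """Fixed-step gradient descent on (7), recomputing lambda at every iteration."""
    X = Z.copy()                                  # Step 1: start from the fused image
    for _ in range(n_iter):                       # Step 3: fixed iteration count
        lam = adaptive_lambda(X, Z, A, otf, p, omega, alpha)   # Step 2
        res = A * (apply_blur(X, otf) - Z)
        w = p * np.sign(res) * np.abs(res) ** (p - 1)
        # H^T applied through the conjugate transfer function
        data_grad = np.real(np.fft.ifft2(np.fft.fft2(A * w) * np.conj(otf)))
        X = X - beta * (data_grad + lam * reg_grad(X, p, omega, alpha))
    return X
```

Treating λ as a constant within each iteration keeps the update simple; a full implementation would also need to handle the per-pixel exponent and choose the step size with care.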

4 Experimental results and analysis

This section evaluates the proposed adaptive BTV method against the BTV method by comparing the peak signal-to-noise ratio (PSNR) and the visual quality of the reconstructed images. PSNR is a measure of signal reconstruction quality that is widely used in image compression and related fields. PSNR is defined as:

$$ PSNR=10\log_{10}\frac{MN}{\|f-f'\|^2} $$
(10)

where M and N are the dimensions of the HR image, f is the matrix of pixel values of the actual HR image, f′ is the matrix of pixel values of the reconstructed HR image, and \(\|f-f'\|^{2}\) is the sum of squared errors, so that \(\|f-f'\|^{2}/(MN)\) is the mean square error. The unit of PSNR is dB; the larger the PSNR value, the smaller the distortion. We compare the proposed local structure adaptive BTV SR reconstruction method with the BTV SR reconstruction method in reference [24] and the adaptive U-Curve SR reconstruction method.
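Definition (10) translates into a few lines of code. The sketch below follows (10) literally, which corresponds to pixel values normalized so that the peak value is 1 (an assumption about the data range); the function name `psnr` is hypothetical.

```python
import numpy as np

def psnr(f, f_rec):
    """PSNR in dB following (10): 10 * log10(M * N / ||f - f_rec||^2)."""
    f = np.asarray(f, dtype=float)
    f_rec = np.asarray(f_rec, dtype=float)
    sse = np.sum((f - f_rec) ** 2)      # sum of squared errors
    if sse == 0.0:
        return float("inf")             # identical images
    return float(10.0 * np.log10(f.size / sse))
```

For instance, a uniform error of 0.1 at every pixel gives a mean square error of 0.01 and therefore a PSNR of 20 dB.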

4.1 Simulations

The original HR image is shown in Fig. 2a. Because the noise in the image degradation is random Gaussian noise, we keep all degradation parameters other than the noise unchanged and generate multiple sets of degraded image sequences. We reconstruct images using the above methods and compare their average PSNR, as shown in Table 1. Figure 2 shows only one set of final results: (b) is an LR image, (c) is the adaptive U-Curve SR reconstruction image, (d) is the BTV SR reconstruction image, and (e) is the adaptive BTV SR reconstruction image. We obtain the LR image from (a) by blurring and down-sampling, adding Gaussian white noise with a noise variance of 0.5 and a noise output of 10, and reducing the size by a factor of 2.

Fig. 2 Comparison of the reconstructed image and the original HR image

Table 1 Comparison of reconstruction results of different reconstruction methods (PSNR values)

Visually, the local structure adaptive BTV SR reconstruction image in Fig. 2e is the smoothest and the closest to the original HR image in Fig. 2a. The BTV reconstruction image in Fig. 2d contains a lot of noise, which degrades the result. The local structure adaptive BTV SR reconstruction method therefore gives the best visual result.

Table 1 shows that, among the three methods, the local structure adaptive BTV SR reconstruction method yields the highest PSNR values of the reconstructed image, and the BTV SR reconstruction method yields the lowest. Hence, by the PSNR criterion, the local structure adaptive BTV SR reconstruction method performs best.

To further illustrate the effectiveness of the method, the following experiment verifies the robustness of the local structure adaptive BTV SR reconstruction method to different noise. We use different noise outputs while the other parameters remain the same, and then use PSNR to assess the reconstructed HR images, as shown in Table 2. Figure 3 shows only the results of the three reconstruction methods when the noise output is 11: (b) is the LR image sequence (obtained from (a) by blurring and down-sampling, adding Gaussian white noise with a noise variance of 0.5, and reducing the size by a factor of 3), (c) is the adaptive U-Curve SR reconstruction image, (d) is the BTV SR reconstruction image, and (e) is the adaptive BTV SR reconstruction image.

Table 2 The impact of different noise on image reconstruction (PSNR values)

Fig. 3 The impact of different noise on image reconstruction

The comparison in Table 2 shows that the PSNR values of the local structure adaptive BTV SR reconstruction are the highest; that is, its noise immunity is the best.

4.2 The actual experiment

To verify the simulation results, we use the above three methods to reconstruct LR images acquired directly from a real scene. The magnification is 2. In Fig. 4, (a) shows the original LR images. Because the surrounding environment has a larger effect on the reconstruction results when a moving target is reconstructed, the target needs to be isolated. (b) is the LR target image after cropping, (c) is the adaptive U-Curve SR reconstruction image, (d) is the BTV SR reconstruction image, and (e) is the adaptive BTV SR reconstruction image. The actual experiment further verifies that the local structure adaptive BTV SR reconstruction method is the most effective.

Fig. 4 SR reconstruction of the movement of vehicles

5 Conclusions

This paper studies the robust SR reconstruction method based on the regularization framework. The selection of regularization parameters plays a very important role in the reconstruction results and speed. In robust SR reconstruction methods, a large number of experiments are needed to find the most effective parameters for each reconstructed HR image, which takes too much time and effort. This paper has proposed the local structure adaptive robust SR reconstruction method as an improvement, and has discussed the adaptive model and the selection of parameters in detail. Because the regularization parameters are updated in each iteration of the adaptive algorithm, it obtains better reconstruction results and greatly reduces repetitive work. The algorithm automatically selects the local structure adaptive prior model and automatically adjusts the “smoothness” measure according to the local characteristics of the image, so the reconstruction better preserves image edges and smooths non-edge regions. The comparison of three methods through simulation and an actual experiment confirms the effectiveness and practicality of the proposed method, which can be applied to gray image SR reconstruction.