1 Introduction

Human beings rely on the visual system to obtain a large amount of information. Research shows that about 70% of the information we receive is obtained through the visual system, so acquiring, processing and using image information is particularly important. Many image processing methods use image restoration technology to achieve the desired image effect. The significance of image restoration technology can be traced back to the space exploration of more than 60 years ago. At that time, the images transmitted back to Earth from space were degraded due to immature imaging technology, unsatisfactory shooting environments, relative motion between objects and the jitter of the camera itself, which resulted in low resolution and blurry images. To solve the problem of image degradation caused by various factors, people began to study image restoration. In production processes and in real life, the two most typical image degradation phenomena are noise and blur. In the process of acquiring images, many factors affect image quality, such as object motion, solar radiation, defocusing, optical deviation and atmospheric effects [17, 23, 33, 34]. During transmission, the image may also be blurred and contaminated by noise due to interference in the transmission channel, interference from electronic components and other causes. These degraded images bring great difficulties to subsequent image processing, including image feature extraction and target object tracking. With the wide application of images in various fields, the requirements for image quality keep rising, so continuous research on image restoration technology is essential to meet both human visual requirements and the needs of various applications. Among image restoration techniques, image deblurring is one of the key problems, and this paper focuses on image deblurring.

The rest of this paper is structured as follows. Section 2 introduces the process of image degradation and the existing research on it. Section 3 elaborates the principle of the proposed image deblurring algorithm. Section 4 presents the structure of the convolutional neural network denoisers. Section 5 carries out experiments to compare the restoration effect of the proposed algorithm with other image deblurring algorithms. Section 6 concludes the paper.

2 Related works

Blurry image formation is essentially a convolution process: the blurry image is formed by the convolution of a point spread function with the clear image. Image restoration mainly aims to recover the real image from an observed degraded image, which is a deconvolution process. Linear restoration, also known as non-blind image restoration, assumes that the point spread function is known. However, in practice the point spread function is often unknown and difficult to obtain in advance, and the problem becomes blind image restoration. At present, classical deblurring algorithms mainly achieve restoration based on a known blur kernel, e.g., wavelet-based regularization methods [5, 14] and framelet methods [4, 11]. When the blur kernel is unknown, image deblurring becomes an ill-posed problem and the ill-posedness is more severe, making it more difficult and more challenging than non-blind restoration [22].

When the blur is spatially invariant, the blurring process is usually modeled by

$$ g=x\otimes h+n $$
(1)

where ⊗ denotes the convolution operator, g is the given blurry image, x is the hidden clear image, h is the blur kernel, and n is noise. The essence of image restoration is therefore to solve the deconvolution of formula (1) to obtain the clear image x. When h is known, this corresponds to non-blind image restoration; when h is unknown, it corresponds to blind image restoration. This paper focuses on blind image restoration and aims to restore the blurred image g when both x and h are unknown. This is a strongly ill-posed problem, since feasible solutions are unstable and non-unique. Therefore, prior knowledge about the blur kernel and the image is of great importance for solving such a problem. Research on blind image restoration often involves complicated imaging systems and random external factors. Researchers have been studying blind image restoration continuously and many results have been achieved; some of them were recognized in certain respects but were limited to varying degrees by the complexity of the problem. After the total variation model was successfully applied to image denoising and non-blind image restoration, Chan et al. [6] further proposed a blind restoration model based on the total variation norm and constructed an energy variation model to solve for the blur kernel and the clear image simultaneously. However, the final effect was not as satisfactory as expected because prior knowledge about the blur had not been considered. Fergus et al. [12] first proposed, within a variational Bayes framework, to estimate the blur kernel using a Gaussian mixture model prior that fits the heavy-tailed distribution of natural image gradients; they then combined it with the Lucy-Richardson algorithm to restore the image, achieving a better restoration effect, although the method has high time complexity. Levin [26] proved that algorithms using the sparsity of image gradients as prior knowledge can produce recovered images closer to the real image. In 2016, Pan [35] was inspired by the observation that the dark channel of a clear image is sparse whereas that of a blurry image is not; he added the sparsity of the dark channel into the optimization function as a regularization term, which has proved to be one of the most successful regularization methods so far.
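Purely as an illustration of the degradation model in Eq. (1), the following NumPy/SciPy sketch synthesizes a blurry observation from a clear image, a Gaussian point spread function and additive noise; the kernel size, blur width and noise level are illustrative assumptions, not values used in this paper.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=15, sigma=2.0):
    """Build a normalized 2-D Gaussian point spread function (illustrative values)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    h = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return h / h.sum()

def degrade(x, h, noise_sigma=0.01):
    """g = x (*) h + n : convolve the clear image with the PSF and add Gaussian noise."""
    g = convolve2d(x, h, mode='same', boundary='wrap')
    return g + noise_sigma * np.random.randn(*g.shape)

# toy usage: a random "clear" image degraded according to Eq. (1)
x = np.random.rand(128, 128)
g = degrade(x, gaussian_kernel())
```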

With the development of deep learning, deep convolutional neural networks (CNNs) have already been used to solve image restoration problems such as image super-resolution [51, 52] and denoising [50]. Building on the traditional iterative optimization framework, Schuler [39] fed the blurry image into a feature extraction model composed of convolutional neural networks and obtained the blur kernel and clear image by applying multiple models from coarse to fine. Kupyn et al. [24] used a GAN (generative adversarial network) to learn to predict the trajectory of blur kernels to achieve deblurring. Shen et al. [41] proposed a human-aware motion deblurring convolutional neural network, which combines a multi-branch deblurring model with a supervised attention mechanism to selectively enhance the foreground and background, thus eliminating motion blur between them. Lu et al. [29] proposed an unsupervised domain-specific single-image deblurring method that disentangles the content and blur features in the blurry image and adds a KL divergence loss to prevent the blur features from encoding content information; to preserve the content structure of the original image, a blurring branch and a cycle-consistency loss are added to the framework, while an added perceptual loss helps the deblurred image avoid unrealistic artifacts. Zhou et al. [53] proposed an end-to-end deep deblurring network for stereo cameras, in which the left and right images collected by a stereo camera are used for deblurring. The blur observed in the two images is different and has strong spatial variability; depth information provides an additional prior, and the complementary information from the two views helps remove blur. The motivation of that work is that objects at close range are more easily blurred than objects at long distances, so the size of the blur can be better estimated from distance information. Li et al. [28] proposed an effective blind image deblurring method based on a data-driven discriminative prior, which formulates the image prior as a binary classifier trained with a deep convolutional neural network; the learned prior can distinguish whether the input image is clear or not. This prior is embedded in a maximum a posteriori framework and helps blind deblurring in various scenarios. Tao et al. [43] proposed a new scale-recurrent network, SRN-DeblurNet; it discusses and solves two important general problems in CNN-based deblurring systems and has a simple network structure, a smaller number of parameters and easier training.

At present, most blind image deblurring methods are traditional model-based methods, while deblurring with a purely discriminative convolutional neural network is difficult because it is restricted by the trained models. To overcome some shortcomings of existing blind image deblurring methods, this paper makes the following contributions:

  • The method of Bai [1] was mainly adopted to decompose the blurry image into a skeleton image and a blur kernel so as to achieve accurate blur kernel estimation. Meanwhile, with the help of variable splitting techniques such as the alternating direction method of multipliers (ADMM [2]) and the half quadratic splitting method (HQS [16]), the fidelity term and regularization term can be treated separately [36], making it possible to combine a network model trained by a discriminative learning approach with a model-based optimization method to restore blind images more efficiently.

  • Two fast and effective denoising CNN models were trained in this paper, adopting recent CNN techniques such as the ReLU activation function [21], batch normalization [19], residual learning [20] and dilated convolution [48] to achieve better image restoration performance. The network models were combined with the half quadratic splitting method to provide powerful image prior information within the model-based optimization method.

  • The two sets of CNN denoising models were inserted into the model-based optimization method to deblur blind images. Experimental results show that the combination of the model-based optimization method and the CNN denoisers provides a flexible, fast and effective framework for blind image deblurring tasks. Traditional model-based optimization methods usually require complex image priors to obtain good restoration results, whereas this CNN-denoising-prior-based optimization method can obtain good restoration results efficiently thanks to the fast CNN denoising plug-ins. Meanwhile, this work emphasizes the benefits of integrating model-based optimization methods with discriminative learning methods and shows that a well-learned CNN denoising prior can replace traditional image priors. The final experimental results show that the proposed algorithm is comparable to, or even more effective than, current algorithms in blind image deblurring.

3 Method

Since image restoration is an ill-posed problem with numerous solutions, a prior (or regularization) is required to constrain the solution space. The maximum a posteriori (MAP) approach fully considers the image prior knowledge; from the Bayesian perspective it converts the original problem into the optimization problem of finding \( \hat{x} \):

$$ \hat{x}=\underset{x}{\arg \max}\log p\left(g|x\right)+\log p(x) $$
(2)

where p(g| x) represents the likelihood and p(x) represents the prior probability of the clear image, which is independent of the degraded image g. Eq. (2) can be further rewritten as:

$$ \hat{x}=\underset{x}{\arg \min}\frac{1}{2}{\left\Vert g- hx\right\Vert}_2^2+\lambda \Phi (x) $$
(3)

where \( \frac{1}{2}{\left\Vert g- hx\right\Vert}_2^2 \) is the fidelity term and h is the degradation matrix that describes the image degradation process. Φ(x) is the regularization term, which represents the image prior information used to constrain the final solution, and λ is the trade-off parameter between the fidelity term and the regularization term. Image restoration is equivalent to solving Eq. (3), and the methods are generally divided into two categories: model-based optimization and discriminative learning. The model-based optimization approach solves Eq. (3) directly, but it requires a large amount of iterative computation, which affects efficiency. The discriminative learning approach optimizes a loss function over a training set of degraded images to obtain the prior parameters Θ. The final target is to minimize the distance between the network output and the target, i.e., \( \underset{\Theta}{\min}\mathrm{\ell}\left(\hat{x},x\right) \), whose constraint condition is:

$$ \hat{x}=\underset{x}{\arg \min}\frac{1}{2}{\left\Vert g- hx\right\Vert}_2^2+\lambda \Phi \left(x;\Theta \right) $$
(4)

Variable splitting techniques can combine the advantages of the above two image restoration approaches: they separate the fidelity and regularization terms so that the separated regularization term corresponds only to an image denoising sub-problem [3, 7, 37, 38, 42, 44]. After an auxiliary variable z is introduced using the HQS method, Eq. (3) can be rewritten as:

$$ \hat{x}=\underset{x}{\arg \min}\frac{1}{2}{\left\Vert g- hx\right\Vert}_2^2+\lambda \Phi (z) $$
(5)

with the constraint condition z = x. Instead of handling the constraint directly, the HQS approach solves the following penalized problem:

$$ {\zeta}_{\mu}\left(x,z\right)=\frac{1}{2}{\left\Vert g- hx\right\Vert}_2^2+\lambda \Phi (z)+\frac{\mu }{2}{\left\Vert z-x\right\Vert}_2^2 $$
(6)

where μ is the penalty parameter, which is increased during the iterations so that z is gradually forced to approach x. To solve Eq. (6), z can first be taken as a constant, which gives:

$$ {x}_{k+1}=\underset{x}{\arg \min }{\left\Vert g- hx\right\Vert}_2^2+\mu {\left\Vert x-{z}_k\right\Vert}_2^2 $$
(7)

Similarly, when x is taken as a constant:

$$ {z}_{k+1}=\underset{z}{\arg \min}\frac{\mu }{2}{\left\Vert z-{x}_{k+1}\right\Vert}_2^2+\lambda \Phi (z) $$
(8)

As shown in Eqns. (7) and (8), the HQS method successfully separates the fidelity term \( \frac{1}{2}{\left\Vert g- hx\right\Vert}_2^2 \) and the regularization term Φ(z), dividing the original problem into two smaller independent sub-problems. Eq. (7) is quadratic, so its solution is obtained by setting the derivative to zero; that is, xk + 1 is the solution of Eq. (9):

$$ \frac{\partial \left({\left\Vert g- hx\right\Vert}_2^2+\mu {\left\Vert x-{z}_k\right\Vert}_2^2\right)}{\partial x}=0 $$
(9)

It is then easy to obtain:

$$ {x}_{k+1}={\left({h}^Th+\mu I\right)}^{-1}\left({h}^Tg+\mu {z}_k\right) $$
(10)
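Because h acts as a convolution (Eq. (1)), Eq. (10) never needs to be evaluated by forming a matrix explicitly: under a periodic boundary assumption it reduces to an element-wise division in the Fourier domain. A minimal NumPy sketch of this x-update, assuming the kernel has already been zero-padded to the image size, is:

```python
import numpy as np

def x_update(g, h_full, z, mu):
    """Closed-form solution of Eq. (10) via the FFT, assuming circular convolution.
    h_full is the blur kernel zero-padded to the image size."""
    H = np.fft.fft2(h_full)
    numerator = np.conj(H) * np.fft.fft2(g) + mu * np.fft.fft2(z)
    denominator = np.abs(H) ** 2 + mu
    return np.real(np.fft.ifft2(numerator / denominator))
```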

As for Eq. (8), it can be rewritten in the form of Eq. (11):

$$ {z}_{k+1}=\underset{z}{\arg \min}\frac{1}{2{\left(\sqrt{\lambda /\mu}\right)}^2}{\left\Vert {x}_{k+1}-z\right\Vert}_2^2+\Phi (z) $$
(11)

From the Bayesian viewpoint, Eq. (11) can be interpreted as follows: zk + 1 is the result of denoising the image xk + 1 with a Gaussian denoiser at noise level \( \sqrt{\lambda /\mu } \). To make this explicit, Eq. (11) can be written as:

$$ {z}_{k+1}= Denoiser\left({x}_{k+1},\sqrt{\lambda /\mu}\right) $$
(12)

Eqns. (10) and (12) show that the fidelity term and regularization term have been separated successfully and that the regularization term corresponds only to an image denoising sub-problem. As a result, the two sets of trained denoisers can be integrated into the model-based optimization method to solve different image restoration problems.
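Putting Eqns. (10) and (12) together, the non-blind restoration loop for a given blur kernel can be sketched as follows. The `denoiser` callable stands for any Gaussian denoiser (e.g. the CNN models of Section 4), and the noise-level schedule, λ and the number of iterations are illustrative assumptions rather than the exact settings of this paper.

```python
import numpy as np

def hqs_deblur(g, h_full, denoiser, lam=0.01, sigmas=None):
    """Alternate the x-update of Eq. (10) and the denoising step of Eq. (12).
    `denoiser(image, sigma)` is assumed to return a denoised image; the noise
    levels in `sigmas` decrease over the iterations, so mu = lam / sigma**2 increases."""
    if sigmas is None:
        sigmas = np.linspace(0.2, 0.01, 30)   # illustrative descending schedule
    H = np.fft.fft2(h_full)
    z = g.copy()
    for sigma in sigmas:
        mu = lam / sigma ** 2
        # x-step: closed-form deconvolution in the Fourier domain, Eq. (10)
        X = (np.conj(H) * np.fft.fft2(g) + mu * np.fft.fft2(z)) / (np.abs(H) ** 2 + mu)
        x = np.real(np.fft.ifft2(X))
        # z-step: Gaussian denoising at noise level sigma = sqrt(lam / mu), Eq. (12)
        z = denoiser(x, sigma)
    return z
```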

This paper uses the RGTV prior [1] to transform the blurry image problem in Eq. (1) into an optimization problem, and Eq. (1) can be further rewritten as follows:

$$ \left\{\hat{x},\hat{h}\right\}=\underset{x,h}{\arg \min}\frac{1}{2}{\left\Vert x\otimes h-g\right\Vert}_2^2+\beta {\left\Vert x\right\Vert}_{RGTV}+\mu {\left\Vert h\right\Vert}_2^2 $$
(13)

where \( \hat{x} \) is the skeleton image and \( \hat{h} \) is the estimated blur kernel. \( \frac{1}{2}{\left\Vert x\otimes h-g\right\Vert}_2^2 \) is the data fidelity term and the remaining two terms are regularizers for the variables x and h, with β and μ the two corresponding parameters. To solve for the blur kernel h, we make a slight modification and solve for h in the gradient domain to avoid artifacts. Eq. (13) then becomes:

$$ \hat{h}=\underset{h}{\arg \min}\frac{1}{2}{\left\Vert \nabla \hat{x}\otimes h-\nabla g\right\Vert}_2^2+\mu {\left\Vert h\right\Vert}_2^2 $$
(14)

where ∇ is the gradient operator. Eq. (14) is a quadratic convex function and has a closed-form solution similar to deconvolution, which we accelerate via the FFT. After obtaining \( \hat{h} \), we threshold the negative elements to zero and normalize \( \hat{h} \) to ensure \( {\sum}_i{\hat{h}}_i=1 \). The basic principle of successful kernel estimation is that (14) is an over-determined problem: since the kernel is much smaller than the image, a skeleton image with restored sharp edges is sufficient for kernel estimation.
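Since Eq. (14) is quadratic in h, it also admits an FFT-based closed-form solution, followed by the thresholding and normalization described above. The following NumPy sketch assumes periodic boundaries, circular forward differences for the gradients, and an illustrative kernel support and regularization weight; it is not the paper's exact solver.

```python
import numpy as np

def estimate_kernel(x_sharp, g, kernel_size=31, mu=1e-2):
    """FFT-based closed-form solver for Eq. (14) in the gradient domain,
    followed by thresholding of negative entries and normalization."""
    num, den = 0.0, 0.0
    for axis in (0, 1):   # horizontal and vertical circular gradients
        dx = np.diff(x_sharp, axis=axis, append=x_sharp.take([0], axis=axis))
        dg = np.diff(g, axis=axis, append=g.take([0], axis=axis))
        DX, DG = np.fft.fft2(dx), np.fft.fft2(dg)
        num += np.conj(DX) * DG
        den += np.abs(DX) ** 2
    h_full = np.real(np.fft.ifft2(num / (den + 2 * mu)))
    # keep only the central kernel_size x kernel_size support
    h = np.fft.fftshift(h_full)
    c = np.array(h.shape) // 2
    h = h[c[0] - kernel_size // 2: c[0] + kernel_size // 2 + 1,
          c[1] - kernel_size // 2: c[1] + kernel_size // 2 + 1]
    h[h < 0] = 0                       # threshold negative elements to zero
    return h / max(h.sum(), 1e-8)      # normalize so that sum(h) = 1
```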

In this paper, the blurry image and the estimated blur kernel are used as input, so that blind image restoration is converted into non-blind image restoration. First, the image is deconvolved; the initial deconvolved image contains a lot of noise. Experiments show that image gradients capture the details and structure of the image, and that denoising in the gradient domain leaves less noise and fewer artifacts [49]. We compute the horizontal and vertical gradients of the processed image and then denoise them with the convolutional neural network. To avoid training extra parameters, the vertical gradient is transposed so that both gradients can share the same CNN denoiser; finally, a result with less noise is obtained. The result z obtained in this step is then used as the input of the model for the next iteration. The overall flow is shown in Fig. 1, and the gradient-handling step is sketched below.
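The gradient-handling step can be sketched as follows; the `cnn_denoiser` callable and the noise level sigma are placeholders for the trained models described in Section 4.

```python
import numpy as np

def denoise_gradients(x, cnn_denoiser, sigma):
    """Denoise horizontal and vertical gradients with one shared denoiser.
    The vertical gradient is transposed before denoising so that no extra
    network parameters are needed, then transposed back."""
    gx = np.diff(x, axis=1, append=x[:, :1])      # horizontal gradient (circular)
    gy = np.diff(x, axis=0, append=x[:1, :])      # vertical gradient (circular)
    gx_d = cnn_denoiser(gx, sigma)
    gy_d = cnn_denoiser(gy.T, sigma).T            # transpose trick: reuse the same model
    return gx_d, gy_d
```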

Fig. 1

Illustration of the flow chart of the algorithm

4 CNN denoising method

4.1 Key technology in CNN

Because of CNN's powerful ability to extract image features, we have reason to believe that using a CNN to remove image noise will achieve good results. However, applying a classical CNN structure directly to image denoising raises several problems. The first is how to select the activation function. The second is that if pooling layers are added to the network, the feature maps in the later layers are compressed to a very small size, which not only loses a lot of information but also makes it harder to restore the image; this raises the question of how to increase the receptive field without changing the image size. The third is that the network has many parameters, so training takes a long time under normal conditions, and the question is how to speed up the whole training process. The rest of this section gives the details of the network design.

4.1.1 Selection of activation function

The activation function is introduced to add nonlinear factors. Although the sigmoid function has been successfully applied in numerous network structures, few people use sigmoid when constructing neural networks now: the activation of a neuron saturates near 0 or 1, where the gradient is almost 0, which leads to vanishing gradients and greatly reduces the training speed. Another activation function, ReLU (rectified linear unit), solves this problem. Its representation is as follows:

$$ f(x)=\max \left(0,x\right) $$
(15)

When the input is less than or equal to 0, the output is 0, which effectively builds a sparse representation. This property removes redundancy in the data while retaining its characteristics as much as possible: as the network runs, it tends to express the data features with activations that are mostly zero. Thanks to this sparsity, the method is fast and effective.

4.1.2 Dilated convolution

The pooling layer makes the image progressively smaller, which leads to the loss of a great deal of information. Therefore, dilated convolution is introduced. The basic idea of dilated convolution is to keep the image size unchanged while neither shrinking the receptive field nor greatly increasing the amount of computation. To achieve this, the tightly arranged convolution kernel is made "fluffy": the number of points to be computed in the kernel remains the same and all the empty positions are filled with zeros. In this way the receptive field can be expanded continuously, while the part of the kernel that actually needs to be computed stays 3 × 3. Dilated convolution is also widely applied in video object segmentation [30, 31]. A small illustration is given below.
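The "fluffy" kernel can be pictured by inserting zeros between the taps of a 3 × 3 kernel: with dilation factor s the effective support of a single layer grows to (2s + 1) × (2s + 1) while only nine weights are ever computed. A small NumPy illustration (the dilation factors follow the s = 1, 2, 3, 4 convention of Fig. 2):

```python
import numpy as np

def dilate_kernel(k, s):
    """Insert s-1 zeros between the taps of a kernel, as in dilated convolution."""
    rows, cols = k.shape
    out = np.zeros(((rows - 1) * s + 1, (cols - 1) * s + 1), dtype=k.dtype)
    out[::s, ::s] = k
    return out

k = np.ones((3, 3))
for s in (1, 2, 3, 4):
    print(s, dilate_kernel(k, s).shape)   # effective support: 3x3, 5x5, 7x7, 9x9
```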

4.1.3 Batch normalization

Although the ReLU activation function mentioned above solves the problem of saturation and vanishing gradients, many factors still slow down training in practice. During network training, backpropagation constantly changes the parameters of each layer, which causes the input distribution of every layer to shift; training must keep adapting to these changes, which reduces training efficiency. In this paper, batch normalization (BN) is adopted to address this problem: after obtaining the mean and variance of a whole mini-batch of data, we normalize the data so that each layer of the network no longer needs to adapt to shifting inputs, which greatly increases training efficiency. A sketch of the operation follows.
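For reference, the per-mini-batch normalization can be sketched as follows; the learnable scale and shift parameters gamma and beta and the constant eps follow the standard batch-normalization formulation rather than any setting specific to this paper.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of feature maps (N, C, H, W) per channel,
    then rescale and shift with the learnable parameters gamma and beta."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```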

4.1.4 Residual learning

There are two ways of learning in such a network. One is to directly learn the mapping from the noisy image g to the latent clear image x; the other is to first learn the noise in the image and then recover the latent clear image, which is called residual learning. If a mapping is close to an identity mapping, it is easier to optimize with a residual network. The image denoising mapping is clearly close to an identity mapping, especially at low noise levels. Therefore, residual learning is added to the network structure in this paper; together with the BN mentioned above, it accelerates and stabilizes the training process and improves the denoising ability of the network model.

4.2 Proposal of network model

Practice has shown that the CNN structure is well suited to extracting image features and is very powerful. Moreover, network training can be run in parallel on a GPU, which greatly improves training efficiency. Based on these advantages, this paper uses a CNN for image restoration and applies the above-mentioned ReLU activation function, dilated convolution, batch normalization and residual learning to the network model to obtain better restoration performance. The network model used in this paper is shown in Fig. 2. As can be seen from Table 1, the layers between the input layer and the output layer are hidden layers. Each convolution kernel is the traditional 3 × 3 size with a stride of 1, and zero padding is adopted to avoid boundary effects (Fig. 2).

Table 1 Structure parameters of the network model
Fig. 2

Denoising network structure, where s-DConv denotes dilated convolution with dilation factor s (s = 1, 2, 3 and 4), BN denotes batch normalization and ReLU denotes the activation function
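The models in this paper were trained with MatConvNet (Section 5); purely as an illustrative sketch of the layer pattern in Fig. 2 (3 × 3 convolutions, dilation factors s, batch normalization, ReLU and a residual output), an equivalent PyTorch-style module could look like the following. The channel width and the exact dilation schedule are assumptions, not the values of Table 1.

```python
import torch
import torch.nn as nn

class DilatedDenoiser(nn.Module):
    """Illustrative sketch of the denoising network in Fig. 2: 3x3 convolutions with
    varying dilation, BN + ReLU in the hidden layers, and residual learning
    (the network predicts the noise, which is subtracted from the input)."""

    def __init__(self, channels=64, dilations=(1, 2, 3, 4, 3, 2, 1)):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=dilations[0], dilation=dilations[0]),
                  nn.ReLU(inplace=True)]
        for d in dilations[1:-1]:
            layers += [nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                       nn.BatchNorm2d(channels),
                       nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 3, padding=dilations[-1], dilation=dilations[-1])]
        self.body = nn.Sequential(*layers)

    def forward(self, g):
        return g - self.body(g)   # residual learning: output = input - predicted noise
```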

5 Experiment

Since residual learning is added to the network, the loss function of the training model is:

$$ \mathrm{\ell}\left(\Theta \right)=\frac{1}{2N}{\sum}_{i=1}^N{\left\Vert f\left({g}_i;\Theta \right)-\left({g}_i-{x}_i\right)\right\Vert}_F^2 $$
(16)

where \( {\left\{\left({x}_i,{g}_i\right)\right\}}_{i=1}^N \) represents N pairs of clean and noisy image patches and f(gi; Θ) is the output of the network model, so the difference between the predicted value and the real value is expressed as f(gi; Θ) − (gi − xi). Θ denotes the parameters to be trained and N is the number of images in a mini-batch.
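A sketch of the loss in Eq. (16) for one mini-batch, assuming a network that predicts the residual (the `body` of the sketch module above); the names are illustrative.

```python
import torch

def residual_loss(residual_net, g, x):
    """Eq. (16): (1 / 2N) * sum_i || f(g_i; Theta) - (g_i - x_i) ||_F^2,
    where g and x are mini-batches of noisy and clean patches, shape (N, 1, H, W)."""
    n = g.shape[0]
    diff = residual_net(g) - (g - x)   # predicted noise minus true noise
    return diff.pow(2).sum() / (2 * n)
```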

With the loss function determined, the network needs to be trained. The training dataset used in this paper includes 400 images from the Berkeley segmentation dataset [8], 400 images from the ImageNet database [10] and 4744 images from the Waterloo Exploration Database [32]. We crop all the images into small patches of 35 × 35 and randomly select 256 × 4000 patches for training. Adam is adopted as the solver with its default hyperparameters, and the mini-batch size is set to 256. The network is trained with momentum stochastic gradient descent, with a learning rate of 0.01, a weight decay of 0.0001 and a momentum of 0.9. To handle a wide range of noise levels, we train a set of denoising network models for different noise levels ranging from 0 to 50 with a step size of 2, giving a total of 25 denoisers. Our experiments are implemented in MATLAB R2017b with the MatConvNet package [45], running on a PC with an Intel Core i7-7700HQ CPU at 2.80 GHz and an NVIDIA GeForce GTX 1060 GPU. The operating system is Windows 10.

5.1 Blind image restoration experiment

We designed the following experimental process of blind image deblurring:

[Algorithm figure a: experimental process of blind image deblurring]

In the above iteration, the noise level of the denoiser decreases gradually. In this paper, the number of iterations is set to 30: a sequence of 30 evenly spaced noise levels (in descending order) in the range 50 to 0 is formed, and the 25 denoisers are mapped to these 30 levels (with repetition) according to the nearest-level principle (see the sketch after this paragraph), so not every iteration loads a new model. Besides, since Eq. (10) involves a matrix inverse, the FFT is used to speed up the solution. To verify the deblurring effect of the proposed algorithm, we apply it to several scenarios, including the blind deconvolution benchmark, uniform deblurring and non-uniform deblurring. In each case, the method is compared with some representative algorithms to illustrate its effect, as shown in Figs. 3, 4, 5 and 6.
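A sketch of this schedule: 30 evenly spaced, descending noise levels between 50 and 0 are generated and each is mapped to the nearest available trained denoiser, here assumed to sit at levels 2, 4, ..., 50, so consecutive iterations may reuse the same model.

```python
import numpy as np

trained_levels = np.arange(2, 51, 2)      # the 25 denoisers, assumed at sigma = 2, 4, ..., 50
iter_sigmas = np.linspace(50, 0, 30)      # 30 evenly spaced, descending noise levels
model_index = [int(np.argmin(np.abs(trained_levels - s))) for s in iter_sigmas]
# consecutive iterations that share an index reuse the already-loaded denoiser
```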

Fig. 3

Restoration results on the blind deconvolution benchmark (1): a blurred image, b Fergus et al. [13] PSNR = 24.55, c Cho and Lee [9] PSNR = 26.67, d Shan et al. [40] PSNR = 28.178, e Bai et al. [1] PSNR = 27.93, f Kupyn et al. [25] PSNR = 27.34, g Gao et al. [15] PSNR = 28.37, h Proposed PSNR = 28.79

Fig. 4

Restoration results on the blind deconvolution benchmark (2): a blurred image, b Fergus et al. [13] PSNR = 21.28, c Cho and Lee [9] PSNR = 24.98, d Shan et al. [40] PSNR = 22.68, e Bai et al. [1] PSNR = 22.35, f Kupyn et al. [25] PSNR = 21.84, g Gao et al. [15] PSNR = 22.15, h Proposed PSNR = 26.32

Fig. 5

Restoration results on the uniform deblurring: a blurred image, b Levin et al. [27] PSNR = 27.36, c Cho and Lee [9] PSNR = 27.82, d Xu and Jia. [47] PSNR = 28.70, e Bai et al. [1] PSNR = 26.98, f Kupyn et al. [25] PSNR = 26.74, g Gao et al. [15] PSNR = 27.49, h Proposed PSNR = 29.13

Fig. 6

Restoration results on the non-uniform deblurring: a blurred image, b Whyte et al. [46] PSNR = 25.74, c Hirsch et al. [18] PSNR = 27.21, d Bai et al. [1] PSNR = 27.52, e Kupyn et al. [25] PSNR = 28.05, f Gao et al. [15] PSNR = 26.74, g Proposed PSNR = 27.36

From the above comparison, we can observe directly that the proposed blind image deblurring algorithm performs well both visually and in terms of PSNR, especially on some blind deconvolution benchmark images. In many scenarios, the algorithm in this paper achieves a better restoration effect than some representative algorithms and a similar effect to others, which indicates that it is competitive in blind image restoration. The specific numerical results are shown in Table 2.

Table 2 Performance and efficiency comparison on the test dataset

6 Conclusion

In this paper, Gaussian denoisers were obtained by CNN learning and integrated as modules into the model-based optimization method through a variable splitting technique (i.e., the fidelity term and regularization term of the original problem are separated). The blur kernel of the blind image was accurately estimated, and these components were integrated to greatly improve the flexibility of blind image restoration. A variety of representative blind image restoration algorithms were compared with the proposed algorithm to illustrate its competitiveness. Through the comparison of visual effect and PSNR, it is evident that the proposed algorithm can rival some representative algorithms or even compare favorably with them. Meanwhile, the denoisers obtained by CNN learning carry excellent prior knowledge of the image and can be used within the model-based optimization method to restore blind images more effectively.

Although the method proposed in this paper integrates the advantages of model-based optimization and discriminative learning, many problems still need further study. For example, the restored images in some scenarios were slightly misaligned with the originals; the number of denoisers and the number of iterations used to train them are also worth studying; and it is of great importance to investigate whether deep learning models other than CNNs can achieve better results.