1 Image Denoising

1.1 Problem Statement

Image denoising aims to recover a high-quality image from its noisy (degraded) observation. It is one of the most classical and fundamental problems in image processing and computer vision. On the one hand, the ubiquitous use of imaging systems makes image restoration crucial to overall system performance. On the other hand, the quality of the output images plays a crucial role in the user experience and in the success of subsequent high-level vision tasks such as object detection and recognition.

A simplified general image degradation model for the denoising task, widely adopted in the literature, is:

$$\displaystyle \begin{aligned} {\boldsymbol{y}}={\boldsymbol{x}}+{\boldsymbol{n}}, \end{aligned} $$
(1)

where x refers to the unknown high-quality image (ground truth), y is the degraded observation, and n represents the additive noise. For decades, most denoising research has been conducted on the additive white Gaussian noise (AWGN) case, where n is assumed to be independent and identically distributed Gaussian noise with zero mean and standard deviation σ. There are also works [4, 15, 27, 33, 36, 95] that address Poisson noise removal or salt-and-pepper noise removal. However, in this review, we focus mainly on the works and solutions proposed for the AWGN removal task (Fig. 1).

Fig. 1

Standard Lena test image corrupted by different types of noise: Gaussian, Poisson, and salt-and-pepper
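To make the degradation model concrete, the snippet below synthesizes a noisy observation following Eq. (1). This is an illustrative sketch in Python/NumPy; the function name and the fixed seed are our choices, not taken from any specific paper (fixing the seed makes the noisy input reproducible, cf. the experimental protocol in Sect. 5.1):

```python
# Synthesize y = x + n with AWGN of standard deviation sigma (Eq. (1)).
# Illustrative sketch; the seed is fixed for reproducibility.
import numpy as np

def add_awgn(x, sigma, seed=0):
    """x: clean grayscale image as a float array in the [0, 255] range."""
    rng = np.random.default_rng(seed)
    n = rng.normal(loc=0.0, scale=sigma, size=x.shape)
    return x + n  # the noisy observation y
```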

The main challenge in image denoising lies in the fact that a significant amount of information is lost during the degradation process, making image denoising a highly ill-posed inverse problem. To obtain a good estimate of the latent image, prior knowledge is required to provide supplementary information. Therefore, how to appropriately model the prior of high-quality images is the key issue in image restoration research.

1.2 Natural Image Prior Modeling for Image Denoising

A wide range of approaches has been suggested to provide supplementary information for estimating the denoised image. According to the image information used, the approaches can be divided into internal (using solely the input noisy image) [7, 26, 41] and external (using external images, with or without noise) [57, 78, 97, 102] denoising methods. Some works have shown that the combination or fusion of internal and external information can lead to better denoising performance [9, 38, 63, 82].

In this review, based on how the prior is exploited to generate the high-quality estimate, we divide previous prior modeling methods into two categories:

  1. the implicit modeling methods and

  2. the explicit modeling methods.

Each category is briefly described and reviewed next.

1.2.1 The Implicit Methods

The category of implicit methods adopts priors on high-quality images implicitly, embedding the priors into specific restoration operations. Such an implicit modeling strategy was used in most early image denoising algorithms [69, 84, 89]. Based on assumptions about high-quality images, heuristic operations have been designed to generate estimates directly from the degraded images. For example, based on the smoothness assumption, filtering-based methods [20, 62, 84, 88] have been widely utilized to remove noise from noisy images. Although image priors are not modeled explicitly, priors on high-quality images are considered when designing the filters that estimate the clean images. Such implicit modeling schemes dominated the area of image denoising for decades. To generate piece-wise smooth image signals, diffusion methods [69, 89] have been proposed to adaptively smooth image contents. By assuming that the wavelet coefficients of natural images are sparse, shrinkage methods have been developed to denoise images in the wavelet domain [25, 30]. Based on the observation that natural images contain many repetitive local patterns, the non-local means filtering approach has been suggested to profit from the image non-local self-similarity (NSS) prior (see Fig. 2). Although these simple heuristic operations have limited capacity for producing high-quality restoration results, these studies greatly deepened researchers' understanding of natural image modeling. Many useful conclusions and principles are still applicable to modern image restoration algorithm design.

Fig. 2

Similar patches in an image from Set5 marked with coloured rectangles. The non-local self-similarity (NSS) prior refers to the fact that natural images usually contain many repetitive local patterns
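As a concrete illustration of the wavelet shrinkage idea mentioned above, the following minimal sketch soft-thresholds the detail coefficients of an orthogonal wavelet decomposition. It assumes the PyWavelets package; the universal threshold σ√(2 log n) used here is one classical choice among many:

```python
# Minimal wavelet soft-thresholding denoiser (illustrative sketch).
import numpy as np
import pywt

def wavelet_shrinkage(y, sigma, wavelet="db8", level=4):
    """Denoise a grayscale image y corrupted by AWGN of std sigma."""
    coeffs = pywt.wavedec2(y, wavelet, level=level)
    thr = sigma * np.sqrt(2.0 * np.log(y.size))  # universal threshold
    # Keep the coarse approximation, soft-threshold the detail bands.
    denoised = [coeffs[0]] + [
        tuple(pywt.threshold(band, thr, mode="soft") for band in detail)
        for detail in coeffs[1:]
    ]
    return pywt.waverec2(denoised, wavelet)
```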

Recently, owing to advances in machine learning, researchers have proposed to learn operations for image denoising. Different methods have been developed to build complex mapping functions between a noisy image and its corresponding clean image [18, 23, 77, 80, 97]. Since the functions (such as neural networks) learned in these methods are often very complex, the priors embedded in them are hard to analyze. As a result, functions trained for a specific task (e.g., denoising with a particular noise type) are often inapplicable to other restoration tasks, and one may need to train different models for different degradation parameters. Despite this limited generalization capacity, the highly competitive restoration results obtained by these discriminative learning methods make this category of approaches an active and attractive research topic.

1.2.2 The Explicit Methods

Besides implicitly embedding priors into restoration operations, another category of methods explicitly characterizes image priors and adopts the Bayesian framework to produce high-quality reconstruction results. Given the degradation model p(y|x) and a specific prior model p(x), different estimators can be used to estimate the latent image x. One popular choice is the maximum a posteriori (MAP) estimator:

$$\displaystyle \begin{aligned} \hat{{\boldsymbol{x}}} = \arg\max_{\boldsymbol{x}} p({\boldsymbol{x}}|{\boldsymbol{y}})=\arg\max_{\boldsymbol{x}}p({\boldsymbol{y}}|{\boldsymbol{x}})p({\boldsymbol{x}}), \end{aligned} $$
(2)

with which we seek the most likely estimate of x given the corrupted observation and the prior. Compared with other estimators, the MAP estimator often leads to an easier inference algorithm, which makes it the most commonly used estimator for image restoration. However, MAP estimation still has limitations in the case of few measurements [85]. An alternative is the Bayesian least squares (BLS) estimator:

$$\displaystyle \begin{aligned} \hat{{\boldsymbol{x}}} = E\{{\boldsymbol{x}}|{\boldsymbol{y}}\}=\int_{\boldsymbol{x}}{\boldsymbol{x}}p({\boldsymbol{x}}|{\boldsymbol{y}})d{\boldsymbol{x}}. \end{aligned} $$
(3)

BLS marginalizes the posterior probability p(x|y) over all possible clean images x. Theoretically, it is the optimal estimate in terms of mean squared error, and it is also known as the minimum mean square error (MMSE) estimator [85].

A wide range of models, such as Independent Component Analysis (ICA) [5], variational models [75], dictionary learning approaches [3] and Markov Random Fields (MRF) [35, 72], have been utilized to characterize priors of natural images. Early studies tended to analyze image signals with analytical mathematical tools and manually designed functional forms to describe natural image priors. Later methods take advantage of training data and learn parameters to better model high-quality image priors. Compared with implicit prior modeling methods, these explicit priors are limited in generating highly competitive denoising results [54, 55], but they often have a stronger generalization capacity and can be applied to different image restoration applications.

Both categories of prior modeling approaches have delivered several classical denoising algorithms. At the very beginning of image denoising research, implicit approaches dominated the field, and different filtering and diffusion approaches were designed for image denoising. Over the last two decades, thanks to hardware development, the availability of large amounts of computational resources and progress in optimization algorithms, sparse and low-rank models have been suggested to provide priors for denoising in an explicit way. Most recently, state-of-the-art denoising results have been achieved by deep neural network (DNN)-based denoisers, which directly learn the mapping function between noisy and clean images; both the degradation model and an implicit image prior are embedded in the networks.

In this paper, we review previous algorithms in chronological order. As some classical filtering, diffusion, and wavelet-based algorithms have been thoroughly reviewed in previous papers [8, 64], we focus more on recently proposed algorithms. Concretely, we start from sparsity-based models and then introduce low-rank methods and DNN-based approaches. In Fig. 3, we provide a timeline of some representative denoising approaches.

Fig. 3

Timeline with a selection of representative denoising approaches

2 Sparse Models for Image Denoising

The idea of using a sparse prior for denoising has been investigated from a very early stage of image denoising research. Research on image statistics has shown that the marginal distributions of bandpass filter responses to natural images exhibit clear non-Gaussianity and heavy tails [34]. Based on this observation, shrinkage and optimization approaches have been suggested to obtain sparse coefficients in a transform domain. Over the last several decades, many attempts have been made to find more appropriate transform domains as well as sparsity measures for image denoising. According to the mechanism used to obtain the representation coefficients, Elad et al. [32] divided sparse representation models into analysis-based and synthesis-based methods. In this section, we review both categories of works. In addition, as some state-of-the-art algorithms exploit both the sparse and the non-local self-similarity priors, we also introduce these methods and show how the sparse and NSS priors can be combined to achieve good denoising performance.

2.1 Analysis Sparse Representation Models for Image Denoising

The analysis representation approaches represent a signal in terms of its product with a linear operator:

$$\displaystyle \begin{aligned} \boldsymbol{\alpha}_a = {\boldsymbol{P}}{\boldsymbol{x}}, \end{aligned} $$
(4)

where x is the signal vector and α_a is its vector of analysis representation coefficients. The linear operator P is often referred to as the analysis dictionary [74].

Some early works directly adopt an orthogonal wavelet basis as the dictionary and conduct shrinkage operations to sparsify the coefficients. Then, the inverse transform is applied to the sparse coefficients to reconstruct the denoised estimate. A wide range of wavelet bases and shrinkage operations has been investigated to obtain better denoising performance. A good review of wavelet-based approaches can be found in [64].

Although sophisticated shrinkage operations have been designed from different points of view, such single-step shrinkage operations cannot achieve very good denoising performance, and iterative algorithms have been suggested to obtain better results. Under the MAP framework, most analysis sparse representation models share a similar form:

$$\displaystyle \begin{aligned} \hat{{\boldsymbol{x}}} = \arg\min_x\Upsilon({\boldsymbol{x}},{\boldsymbol{y}})+\Psi({\boldsymbol{P}}{\boldsymbol{x}}), \end{aligned} $$
(5)

where Υ(x, y) is the data fidelity term, which depends on the degradation model, and Ψ(Px) is the regularization term, which imposes a sparsity prior on the filter responses Px.

The analysis dictionary P and the penalty function Ψ(⋅) play a very important role in the analysis sparse representation model. Early studies utilized signal processing and statistical tools to analytically design dictionaries and penalty functions. One of the most notable analysis-based methods is the Total Variation (TV) approach [75], which uses a Laplacian distribution to model image gradients, resulting in an ℓ1 norm penalty on the gradients of the estimated image. In addition to TV and its extensions [14, 16, 17], researchers have also proposed wavelet filters [10, 28, 58] for analysis sparse representation. In these methods, the gradient operator of the TV methods is replaced by wavelet filters to model image local structures. Besides dictionaries, the penalty functions have also been well investigated. Different statistical models have been introduced to model the heavy-tailed distributions of coefficients in natural images, leading to a variety of robust penalty functions, such as the ℓp norm [103] and the normalized sparsity measure [51].
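As a hedged example of an analysis-sparsity model in the form of Eq. (5), the snippet below applies TV denoising with scikit-image's Chambolle solver, which minimizes a data fidelity term plus the total variation of the estimate; the noise level and regularization weight are illustrative choices:

```python
# TV denoising: fidelity + l1 penalty on image gradients (cf. Eq. (5)).
import numpy as np
from skimage import data, img_as_float
from skimage.restoration import denoise_tv_chambolle

x = img_as_float(data.camera())                              # clean test image
y = x + 0.1 * np.random.default_rng(0).normal(size=x.shape)  # AWGN, sigma = 0.1
x_hat = denoise_tv_chambolle(y, weight=0.1)                  # TV-regularized estimate
```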

Although these analytic methods have greatly deepened our understanding of image modeling, they are considered too simplistic to model complex natural phenomena. With the advance of computing power, machine learning methods have been introduced to learn better priors. From a probabilistic image modeling point of view, Zhu et al. [101] proposed the filters, random fields and maximum entropy (FRAME) framework, which characterizes the distribution of filter responses over the latent image to model image textures. For image denoising, fields of experts (FoE) [72] is one of the representative works, learning filters (an analysis dictionary) for predefined potential (penalty) functions. Inspired by FoE [72], many methods have been proposed to learn better filters for image denoising from a conditional random field (CRF) [81] point of view; the potential functions adopted in these methods are all selected to lead to sparse coefficients. Beyond the probabilistic point of view, other works have proposed to learn analysis dictionaries within different frameworks. Ravishankar et al. [71] proposed the transform learning framework, which aims to learn better analytical sparsifying transforms for image restoration. Rubinstein et al. [74] proposed the analysis K-SVD algorithm, which borrows ideas from the K-SVD algorithm [3] and learns an analysis dictionary from image patches. All the above approaches learn image priors in a generative way; only high-quality images are involved in the training phase. Recently, discriminative learning methods have also been utilized to train priors for specific tasks [22, 44]. By using image pairs as training data, these discriminative learning methods are able to deliver highly competitive restoration results. However, the learning is often achieved by solving a bi-level optimization problem, which is time-consuming.

2.2 Synthesis Sparse Representation Models for Image Denoising

Different from the analysis representation models, the synthesis sparse representation models represent a signal x as a linear combination of dictionary atoms:

$$\displaystyle \begin{aligned} {\boldsymbol{x}}= {\boldsymbol{D}}\boldsymbol{\alpha}_s, \end{aligned} $$
(6)

where α_s is the synthesis coefficient vector for the signal x, and D is the synthesis dictionary. Such a decomposition may admit many choices of α_s, and regularization is required to provide a well-defined solution. A commonly used criterion is to find a sparse coefficient vector α_s that reconstructs the signal using only a few atoms of D. In their seminal work [59], Mallat and Zhang proposed the matching pursuit (MP) algorithm to find an approximate solution of the NP-hard sparse decomposition problem. The orthogonal MP (OMP) [68] method was later proposed to improve the performance of synthesis-based modeling. Besides constraining the number of non-zero values (the ℓ0 norm) of the coefficients, researchers have also proposed to utilize its convex envelope, the ℓ1 norm, to regularize the synthesis coefficients. In contrast to the ℓ0 case, the global solution of the convex ℓ1 problem can be attained. One can solve the ℓ1-norm sparse coding problem with conventional linear programming solvers, or with modern methods such as least angle regression [31] and proximal algorithms [67].
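To make the synthesis model of Eq. (6) concrete, the sketch below sparse-codes a single patch with OMP from scikit-learn over an overcomplete 2D-DCT dictionary; the dictionary construction and the sparsity level are illustrative assumptions, not the setup of any specific paper:

```python
# Synthesis sparse coding of one 8x8 patch with OMP (illustrative sketch).
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def overcomplete_dct(patch_size=8, atoms_1d=16):
    t = np.arange(patch_size)
    d = np.array([np.cos(t * k * np.pi / atoms_1d) for k in range(atoms_1d)]).T
    d[:, 1:] -= d[:, 1:].mean(axis=0)   # zero-mean the non-DC atoms
    d /= np.linalg.norm(d, axis=0)      # unit-norm columns
    return np.kron(d, d)                # (64, 256) dictionary D

D = overcomplete_dct()
y_patch = np.random.default_rng(0).normal(size=D.shape[0])  # stand-in noisy patch
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=False)
omp.fit(D, y_patch)                     # greedy solution of the l0 problem
x_hat = D @ omp.coef_                   # sparse reconstruction D @ alpha_s
```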

The application of synthesis-based sparse representation to image restoration is quite straightforward under the MAP framework:

$$\displaystyle \begin{aligned} \hat{{\boldsymbol{x}}} = \arg\min_{\boldsymbol{x}}\Upsilon({\boldsymbol{x}},{\boldsymbol{y}})+\Psi(\boldsymbol{\alpha}_s), \quad \mathrm{s.t.}~ {\boldsymbol{D}}\boldsymbol{\alpha}_s={\boldsymbol{x}}, \end{aligned} $$
(7)

where Υ(x, y) is the fidelity term, and the regularizer Ψ(⋅) on the synthesis coefficients α_s provides prior information for estimating the clean image x. In early years, the adopted dictionaries were often designed under the umbrella of harmonic analysis [73], such as DCT, wavelet and curvelet dictionaries. These dictionaries, however, are far from sufficient to model the complex structures of natural images, limiting the image restoration performance. To better model the local structures in images, dictionary learning methods have been introduced to improve image restoration performance [3]. One of the representative works is the K-SVD algorithm: Aharon et al. [3] proposed to learn a dictionary from high-quality images and to utilize the learned dictionary for image denoising. Equipped with learned dictionaries, the synthesis sparse representation framework has led to state-of-the-art denoising results. Besides the studies on dictionary learning, researchers have also made many attempts at designing strong regularization functions Ψ(⋅) to provide better priors for restoration [13].
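A hedged sketch of the dictionary learning step is given below; it uses scikit-learn's MiniBatchDictionaryLearning as a stand-in for K-SVD (the two optimize related but not identical objectives), with illustrative patch and dictionary sizes:

```python
# Learning a patch dictionary from a clean image (K-SVD stand-in).
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

def learn_patch_dictionary(clean_image, patch_size=8, n_atoms=256):
    patches = extract_patches_2d(clean_image, (patch_size, patch_size))
    patches = patches.reshape(len(patches), -1).astype(np.float64)
    patches -= patches.mean(axis=1, keepdims=True)   # remove the patch DC
    learner = MiniBatchDictionaryLearning(
        n_components=n_atoms,
        transform_algorithm="omp",
        transform_n_nonzero_coefs=5,
    )
    learner.fit(patches)
    # learner.components_ holds the atoms; learner.transform(noisy_patches)
    # returns the sparse codes used to reconstruct denoised patches.
    return learner
```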

The synthesis sparse representation models can also be interpreted from different viewpoints. Zoran et al. [102] and Yu et al. [94] proposed to utilize mixtures of Gaussians to model the prior of natural image patches. Yu et al. [94] analyzed the relationship between the mixture of Gaussians model and the group sparsity model, and showed that the selection of Gaussian components can be viewed as a special group sparsity constraint.

2.3 Sparse Models with the NSS Prior

Besides working independently to capture image local priors, sparse models have also been combined with other natural image priors to pursue better denoising performance. By collecting non-local similar patches and conducting collaborative filtering on the resulting 3D blocks, the block-matching and 3D filtering (BM3D) algorithm [26] achieved state-of-the-art denoising performance. The great success of BM3D inspired follow-up works to combine the sparse and NSS priors. Mairal et al. [57] collect non-local similar patches and solve a group-sparsity problem to achieve better denoising results. Dong et al. [29] propose a non-local centralized sparse representation model in which the mean values of the representation coefficients are pre-estimated based on the patch groups.
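The patch-grouping step shared by these NSS-based methods can be sketched as follows; the parameters are illustrative, and real implementations restrict and stride the search window for speed:

```python
# Group the k most similar patches to a reference patch (illustrative sketch).
import numpy as np

def group_similar_patches(image, top_left, patch_size=8, window=16, k=16):
    i0, j0 = top_left
    ref = image[i0:i0 + patch_size, j0:j0 + patch_size]
    vecs, dists = [], []
    for i in range(max(0, i0 - window), min(image.shape[0] - patch_size, i0 + window) + 1):
        for j in range(max(0, j0 - window), min(image.shape[1] - patch_size, j0 + window) + 1):
            patch = image[i:i + patch_size, j:j + patch_size]
            vecs.append(patch.ravel())
            dists.append(np.sum((patch - ref) ** 2))  # Euclidean similarity
    keep = np.argsort(dists)[:k]
    # Columns are vectorized similar patches: the input to collaborative
    # filtering (BM3D) or to the low-rank approximation of Sect. 3.
    return np.stack([vecs[idx] for idx in keep], axis=1)
```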

3 Low Rank Models for Image Denoising

Apart from employing a sparsity prior for signal vectors, low-rank models have also been proposed to exploit the sparsity (i.e., low-rankness) of a matrix of correlated vectors. Low-rank matrix approximation (LRMA) aims to recover the underlying low-rank matrix from its degraded observation, and has achieved great success in various applications of computer vision [11, 12, 40, 76, 100].

The low-rank models have also been successfully applied to image restoration problems [40, 49, 66, 86]. Some studies [49, 96] take low-rankness as a global prior and treat image restoration directly as an LRMA problem. However, such an assumption is too strong and does not reflect the characteristics of natural images. As a result, global low-rank prior based methods only perform well on images with special contents, and tend to over-smooth details in natural images. For the denoising task, Ji et al. [49] first applied the low-rank model to video data: image patches from the same spatial location in different frames are collected and vectorized to form a patch-group matrix, and a low-rank approximation of this matrix yields the denoised patch groups. To deal with the single image denoising problem with the low-rank prior, Wang et al. [86], Gu et al. [40], and Xie et al. [92] exploit the NSS prior: these methods collect non-locally similar patches and conduct low-rank approximation to estimate the corresponding clean patches. To generate a good low-rank estimate from the patch groups, Gu et al. [42] proposed the weighted nuclear norm minimization (WNNM) model to adaptively regularize the singular values of the matrix, achieving state-of-the-art denoising performance.

Inspired by WNNM, Xie et al. [92] proposed a weighted Schatten p-norm minimization model for image denoising. Xie et al. [91] extended the two-dimensional low-rank model for matrices to higher-dimensional tensor models, proposed a new regularizer for tensor data, and achieved state-of-the-art performance on multispectral image denoising tasks. Xu et al. [93] proposed a multi-channel WNNM model to address the real color image denoising problem, which enables different color channels to have different noise levels.
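A minimal sketch of the core low-rank step is given below: singular value soft-thresholding of a patch-group matrix. The uniform threshold τ is an illustrative simplification; WNNM instead assigns a different weight to each singular value:

```python
# Low-rank denoising of a patch group by singular value soft-thresholding.
import numpy as np

def low_rank_denoise(patch_group, tau):
    """patch_group: (patch_dim, n_patches) matrix of similar noisy patches."""
    mean = patch_group.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(patch_group - mean, full_matrices=False)
    s = np.maximum(s - tau, 0.0)         # shrink the singular values
    return u @ np.diag(s) @ vt + mean    # denoised patch group
```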

4 Deep Neural Network for Image Denoising

In the last several years, deep neural networks (DNNs) have achieved great success on a wide range of computer vision tasks, and they have also been applied to the image denoising problem.

One of the first attempts at image denoising with deep neural networks is [48], in which Jain and Seung proposed a small convolutional neural network for image denoising. The network has only four hidden layers, each with only 24 feature maps, yet obtains results comparable with the FoE [72] method. In [90], Xie et al. stacked two sparse denoising auto-encoders for image denoising: pre-trained sparse denoising auto-encoders were combined, and the L-BFGS algorithm [56] was adopted to fine-tune the network in an end-to-end manner. In [78], Schuler et al. trained a multi-layer perceptron (MLP) for image denoising; MLP was the first network to achieve denoising performance comparable with the baseline BM3D approach [26]. After MLP, Schmidt et al. [77] (the CSF method) and Chen et al. [21] (the TNRD method) proposed to unfold the inference process of optimization-based denoising models and train denoising networks in an end-to-end manner. CSF and TNRD achieved denoising performance comparable with the state-of-the-art WNNM approach.

Since 2012, the great success of convolutional neural networks on the image classification task [52] has created a surge of interest in deep learning research. Tremendous progress has been made in neural network training algorithms, deep learning toolboxes and hardware devices, which facilitates research on DNN-based denoising algorithms. Zhang et al. [97] stacked convolution, batch normalization [47] and ReLU [65] layers to estimate the residual between the noisy input and the corresponding clean image. Their DnCNN network (Fig. 4) not only achieved a higher PSNR index on standard benchmarks than sparse and low-rank models, but is also much faster than previous optimization-based approaches. Inspired by the success of DnCNN, Mao et al. [60] proposed a very deep residual encoding-decoding (RED) framework to solve image restoration problems, in which skip connections are introduced to train the very deep network. Tai et al. [79] later proposed a very deep persistent memory network (MemNet) for image denoising, adopting recursive and gate units to learn multi-level representations under different receptive fields. The RED approach [60] and MemNet [79] improve the denoising performance over DnCNN [97], but also have a higher demand on computational resources. In [39], Gu et al. proposed the fast denoising network (FDnet) to seek a better trade-off between denoising performance and speed. By incorporating a stronger activation function, the multi-bin trainable linear unit (MTLU), FDnet is capable of generating comparable denoising results with fewer computational resources.

Fig. 4

The DnCNN network structure proposed by Zhang et al. [97]
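For reference, a minimal DnCNN-style network can be sketched in PyTorch as below; the 17-layer/64-channel configuration follows the commonly reported DnCNN setting, but this is an illustrative re-implementation, not the authors' released code:

```python
# DnCNN-style residual denoiser (illustrative sketch, grayscale input).
import torch
import torch.nn as nn

class DnCNNSketch(nn.Module):
    def __init__(self, depth=17, channels=64):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                       nn.BatchNorm2d(channels),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(channels, 1, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, y):
        # Residual learning: the network predicts the noise and subtracts
        # it from the noisy input to obtain the clean estimate.
        return y - self.body(y)
```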

5 Image Denoising: Practical Issues

In this section, we review some practical experimental protocols of image denoising research. We first introduce commonly used experimental settings as well as measures for evaluating denoising algorithms, and then report the performance of some representative algorithms.

5.1 Experimental Setting for Image Denoising

Most previous algorithms were evaluated on synthetic data: given a clean image, AWGN with different σ is added to synthesize the noisy input. Then, after running the denoising algorithms, different measures can be used to evaluate the quality of the denoised estimates.

Until very recently, there was no commonly accepted benchmark dataset for evaluating denoising algorithms. Most early studies report results on a handful of images, such as the Lena image shown in Fig. 1. Some commonly used images are shown in Fig. 5; one can see that most previous algorithms were designed for grayscale image denoising. To the best of our knowledge, the FoE paper [72] was the first to utilize 68 test images from the Berkeley segmentation dataset [61] for evaluating denoising algorithms. This 68-image dataset gradually became a benchmark, and most recently proposed algorithms report their performance on the Set68 dataset. Besides the test images, another important experimental setting is the noise itself. Although the algorithms are designed to handle any image corrupted by AWGN, the specific noise instance still affects denoising performance. For the purpose of fair comparison, previous algorithms set the noise seed to 0 when using Matlab to generate the synthetic input images; using the same seed ensures that different algorithms receive the same noisy input.

Fig. 5

Some commonly used images for evaluating denoising algorithms besides the BSD68 dataset

Recently, benchmarks have been built to evaluate denoising algorithms on real-world denoising tasks [1, 70]. The main challenge in building a benchmark for the real-world denoising problem lies in the fact that paired noisy and clean images are hard to obtain. To address this issue, different averaging methods and elaborately designed post-processing approaches have been adopted to estimate a clean reference image from multiple shots [1, 70]. Other benchmarks consider the restoration of a clean image degraded by a combination of noise, blur and downsampling operators [83].

5.2 Measurements for Denoising Algorithms

The target of a denoising algorithm is to recover the latent clean image from its noisy observation. To evaluate denoising algorithms, different measures have been adopted to compare the denoised estimate against the ground truth high-quality image. The most commonly used measure is the peak signal-to-noise ratio (PSNR) index. PSNR is most easily defined via the mean squared error (MSE). Given the ground truth image G and the denoised estimate E, the MSE is defined as:

$$\displaystyle \begin{aligned} MSE = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N}[E(m,n)-G(m,n)]^2}{M\times N}, \end{aligned} $$
(8)

where E(m, n) and G(m, n) are the pixel values at position (m, n) of images E and G, respectively, and M and N are the image dimensions. Based on the MSE, the PSNR is defined as:

$$\displaystyle \begin{aligned} PSNR=10\log_{10}\left(\frac{R^2}{MSE}\right), {} \end{aligned} $$
(9)

where R is the maximum possible fluctuation in the image data type (e.g., 255 for 8-bit images). Although plenty of works have pointed out that PSNR does not measure the perceptual similarity between two images well, it is still the most commonly used index to compare two images.
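The two quantities of Eqs. (8) and (9) are straightforward to compute; a minimal NumPy sketch, assuming 8-bit images (R = 255), is:

```python
# MSE and PSNR of Eqs. (8) and (9) for 8-bit images (illustrative sketch).
import numpy as np

def psnr(estimate, ground_truth, data_range=255.0):
    diff = estimate.astype(np.float64) - ground_truth.astype(np.float64)
    mse = np.mean(diff ** 2)                       # Eq. (8)
    return 10.0 * np.log10(data_range ** 2 / mse)  # Eq. (9)
```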

Besides the MSE and PSNR, perceptual quality measures have also been proposed to evaluate denoising algorithms. One representative measure is the structural similarity (SSIM) index [87]. The SSIM index is calculated over various windows of an image; the measure between two windows x and y is:

$$\displaystyle \begin{aligned} SSIM(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}. \end{aligned} $$
(10)

μ_x and μ_y are the average values of windows x and y, respectively; σ_x^2 and σ_y^2 are the variances of x and y, and σ_xy is the covariance of x and y. c_1 = (k_1 R)^2 and c_2 = (k_2 R)^2 are two variables that stabilize the division with a weak denominator, where k_1 and k_2 are set to 0.01 and 0.03 by default and R is the dynamic range as defined in Eq. (9). The SSIM index and its extensions [98] have been widely applied in different tasks to compare estimated and ground truth images. For the image denoising task, the feature similarity (FSIM) index [98] has also been adopted in some works. Very recently, Zhang et al. [99] proposed the Learned Perceptual Image Patch Similarity (LPIPS) metric based on deep features. However, all the abovementioned perceptual quality measures are only proxies to the mean opinion score (MOS) obtained from the ratings of human subjects. Recent challenges on perceptual image super-resolution [6], image enhancement [46] and learned image compression (CLIC at CVPR 2018) resort to MOS for rankings.
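In practice, these measures are rarely implemented from scratch; scikit-image, for instance, ships reference implementations (a hedged usage example, with the default k_1 = 0.01 and k_2 = 0.03 of [87] and synthetic stand-in images):

```python
# PSNR and SSIM via scikit-image (illustrative usage on synthetic data).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
ground_truth = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
estimate = np.clip(ground_truth + rng.normal(0, 5, size=(64, 64)),
                   0, 255).astype(np.uint8)

print(peak_signal_noise_ratio(ground_truth, estimate, data_range=255))
print(structural_similarity(ground_truth, estimate, data_range=255))
```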

5.3 Denoising Performance of Representative Algorithms

In this part, we present the denoising results of different methods on the Set68 dataset for reference. Note that in recent years other datasets, such as the Set5, Set14, B100, Urban100, or DIV2K datasets [2], more commonly used for benchmarking image super-resolution algorithms, have also gained popularity. The numbers are taken from the recently published DnCNN paper [97]. As can be seen in Table 1, the discriminative learning approaches TNRD [21] and DnCNN [97] are generally able to achieve better performance than hand-crafted methods such as BM3D [26], EPLL [102] and WNNM [42]. Furthermore, the deep neural network based approach DnCNN achieves much better performance than the other models.

Table 1 The average PSNR(dB) results of different methods on the BSD68 dataset

The running times of different methods for processing a 512 × 512 image are shown in Table 2. The numbers also come from the DnCNN paper, which used a PC with an Intel(R) Core(TM) i7-5820K 3.30 GHz CPU and an Nvidia Titan X GPU to test the different algorithms. Generally, as discriminative learning based approaches do not need to solve an optimization problem in the inference phase, they achieve a large improvement in running speed over optimization-based methods. Furthermore, some recently proposed algorithms are well suited to parallel computation, and their running time can be greatly reduced by using a GPU.

Table 2 The runtimes of different methods for processing a 512 × 512 image

Some visual examples of the denoising results by different methods can be found in Fig. 6.

Fig. 6

Denoising results by different methods on a testing image from the Set68 dataset (σ = 50)

6 Emerging Topics and Open Problems

Despite the good performance DNN-based denoising approaches have achieved on standard benchmarks, there remain open problems that play an important role in deploying DNN-based denoising in real systems. During the past several years, attempts have been made to improve denoising systems so that they better fit real applications. In this section, we discuss some emerging topics and open problems in the field of image denoising.

The target of image denoising is to improve the quality of the noisy input. Pixel-wise fidelity measures such as RMSE and PSNR, computed between the denoised image and the latent ground truth, are widely used to evaluate denoising results. However, these measures do not fit the human visual system well: an estimate with a high PSNR index does not always correspond to a visually pleasant denoising result. In order to improve the perceptual quality of the denoised image, conventional approaches add extra constraints to the optimization so as to keep more textural details in the denoising results. For example, Cho et al. [24] utilize a hyper-Laplacian to model the image gradient, and proposed a content-aware prior which sets different shape parameters of the gradient distribution in different image regions. Although the original idea of [24] was proposed for image deblurring tasks, it inspired Zuo et al. [104] to propose a gradient histogram preservation algorithm for image denoising: they exploit the statistical properties of AWGN to pre-estimate the gradient histogram of the clean image from its noisy observation, and then add a gradient histogram constraint to keep more textural details in the denoising results. For DNN-based methods, the denoised estimate is directly generated from the noisy input, and adding constraints on the output is not easy. A straightforward way to improve the visual quality is to train the denoising network with a better loss function, one better correlated with human perception. In the field of image super-resolution, a wide range of losses has been suggested for training CNNs to achieve better visual quality. Johnson et al. [50] proposed to use the ℓ2 distance between the VGG features of the target and estimated images to train neural networks, and claimed it helps to generate visually more pleasant results compared with the conventional ℓ2 or ℓ1 distance in the image domain. Inspired by the great success achieved by generative adversarial networks (GAN) [37] in image prior modeling, Ledig et al. [53] added a GAN loss to the training of super-resolution networks and achieved better visual quality than with RMSE losses. Although the above losses were proposed for other image restoration tasks, they are also applicable to improving the visual quality of denoising networks [19]. As the loss function greatly affects the behavior of a denoising network, finding a good loss function that fits the human visual system well is a very interesting and promising research topic.
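A minimal sketch of such a perceptual loss, in the spirit of Johnson et al. [50], is given below. It assumes a recent torchvision and 3-channel, ImageNet-normalized inputs; the choice of the relu2_2 feature layer (index 8 in VGG-16) is an illustrative assumption:

```python
# VGG feature ("perceptual") loss sketch, after Johnson et al. [50].
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class PerceptualLoss(torch.nn.Module):
    def __init__(self, layer_index=8):   # relu2_2 of VGG-16 (illustrative)
        super().__init__()
        features = vgg16(weights="IMAGENET1K_V1").features[:layer_index]
        for p in features.parameters():
            p.requires_grad_(False)      # the VGG extractor stays fixed
        self.features = features.eval()

    def forward(self, estimate, target):
        # l2 distance in VGG feature space rather than in pixel space;
        # inputs are assumed 3-channel and ImageNet-normalized.
        return F.mse_loss(self.features(estimate), self.features(target))
```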

Besides seeking better loss functions, another interesting problem is training denoising networks in a weakly supervised or unsupervised manner. As the noise model of an imaging system is very complex, synthesizing realistic noisy images from high-quality images is a challenging problem, and we may not be able to obtain paired training images, i.e., real noisy images with their corresponding high-quality counterparts, to train denoising networks. Most current denoising networks follow the experimental setting of AWGN removal, where the noisy inputs are synthesized by adding noise to clean images. Although these networks achieve highly competitive performance on standard benchmarks, a recent study [70] found that DNN-based approaches are not as good as conventional approaches, such as BM3D [26] and WNNM [42], at processing real noisy images. One possible direction to solve this problem is to improve the generalization capacity of denoising neural networks, making networks trained on synthetic data capable of performing well on real data. Guo et al. [43] proposed a convolutional blind denoising network (CBDNet), comprised of a noise estimation subnetwork and a denoising subnetwork, trained with a more realistic noise model that considers both signal-dependent noise and the in-camera processing pipeline. Another interesting direction is to investigate weakly supervised training strategies which do not rely on paired training data. Previously, Ignatov et al. [45] proposed the WESPE method, which trains an image enhancement network with only two groups of images, i.e. low-quality images and high-quality images. However, currently there are no works that successfully train denoising networks in a weakly supervised manner, and how to benefit from unpaired noisy and clean images for training denoising networks is still an open problem.

7 Conclusion

In this brief review we focused mainly on the image denoising literature addressing the additive white Gaussian noise case, pointed out the tremendous advances in a field whose research spans several decades, and went into more detail on the recent literature and the advent of deep learning approaches. After reviewing the current state of the art, we went further and identified several challenges and open directions that the image denoising community is invited to address.