Abstract
Over the past few years, various convolutional neural network (CNN) based approaches have been applied to image denoising. Accounting for the size of the receptive field can improve a denoising network's performance. The present research alters the receptive field and studies its effect on image denoising. Three networks are designed and compared: (1) a CNN with dilated kernels, (2) a CNN without dilation but with larger kernels yielding the same receptive field, and (3) a CNN without dilation and without enlarged kernels. After analyzing the results of these three cases, a fourth case with an optimal receptive field is added, which improves upon state-of-the-art results. Most existing methods rest on assumptions about natural images; here a purely learning-based approach is presented. The simulation is conducted with randomized noise levels, so a single network is able to handle a wide range of noise levels (σ). For 24-bit images with 256 gray levels per color channel, it reduces noise for standard deviations (σ) in the range [0, 75]. Results comparable with contemporary research are presented in terms of quantitative measurements and visual comparisons. The performance of our approach is validated on the various test sets used in our study, where we achieve comparable PSNR values on existing standard test sets.
1 Introduction
Images acquired by a camera are always degraded by noise. When the scene is not illuminated properly, the camera must increase the sensitivity of the sensor, which further degrades the image. Removal of noise is an essential step in various image restoration [1, 2] tasks. Treating noise as independent of the image gives the simple model \(I=S+\eta ,\) where I is the noisy image, S is the noiseless ground truth image and η is noise with standard deviation σ. A noisy image with additive white Gaussian noise (AWGN) [3] can thus be modeled as the sum of the ground truth image and the noise (η) (Eq. 1)
where \(I\left(x,y\right)\) is the resultant noisy image pixel when the ground truth pixel \(S\left(x,y\right)\) is corrupted with noise \((\eta )\).
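The degradation model above can be sketched directly in NumPy; the helper name `add_awgn` is ours, and values are clipped back to the 8-bit range as an assumption about how the synthetic images are stored:

```python
import numpy as np

def add_awgn(clean, sigma, rng=None):
    """Corrupt a clean image with additive white Gaussian noise: I = S + eta."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=clean.shape)  # eta ~ N(0, sigma^2)
    noisy = clean.astype(np.float64) + noise
    return np.clip(noisy, 0, 255)  # keep pixels in the 8-bit range

# Example: a flat gray image corrupted with sigma = 25
clean = np.full((64, 64), 128.0)
noisy = add_awgn(clean, sigma=25, rng=np.random.default_rng(0))
```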
Image denoising is often ill-posed. This ill-posedness can be addressed with the maximum-a-posteriori (MAP) principle [4]. Solving the problem with MAP requires modeling the image with random variables that follow a prior distribution; the corrupted image is then reconstructed from the priors using the MAP principle. The objective is to maximize the conditional probability of the reconstructed image given the corrupted image.
Image denoising methods can be broadly classified into two categories, model-based methods and discriminative methods. Model-based methods use a generic prior model together with an optimization algorithm; they include Block Matching and 3D Filtering (BM3D) [5], Expected Patch Log Likelihood (EPLL) [6] and Weighted Nuclear Norm Minimization (WNNM) [7]. These methods are computationally expensive, time consuming, and unable to handle spatially variant noise. Discriminative methods learn an appropriate image prior directly; this approach is employed by the Multi-Layer Perceptron (MLP) [8], DnCNN [2], and trainable nonlinear reaction diffusion (TNRD) [9]. The major difference between the two categories is that model-based methods have the flexibility to handle several tasks, whereas discriminative methods depend on the dataset used, are very specific in nature, and can only solve the problems they are designed for. For example, the single model-based method NCSR [10] can solve image denoising, deblurring and super-resolution, whereas three different discriminative methods, MLP [8], SRCNN [11], and DCNN [12], are designed for image deblurring, super-resolution and denoising respectively.
Despite their flexibility in handling multiple tasks, model-based methods are time consuming and need to be optimized with appropriate priors. Discriminative methods, on the other hand, offer fast speed and promising performance; so far, the most promising results have been provided by them. Discriminative methods include convolutional neural network (CNN) [11] and MLP [8] based deep learning techniques [13,14,15]. The aim of this paper is to obtain an estimate of the ground truth image and to thoroughly evaluate different receptive field sizes. The mapping function can be written as \(F\left(I\right)=\widehat{S}\), and a deep learning approach is employed to learn it.
A major part of deep learning is based on CNNs. These networks perform very well compared to traditional algorithms and produce state-of-the-art results. CNNs were originally developed for image recognition [14] and classification [16, 17], tasks in which the network progressively reduces the image resolution. A straightforward way to preserve resolution would be to remove subsampling or strides from the layers; however, this reduces the receptive field [18] in the same proportion as it restores resolution, and such a reduction in the receptive field cannot be tolerated. A brief introduction to the receptive field is given below, with a set of equations that explain how its size varies.
1.1 Receptive field
In a fully connected neural network, each node of a layer is connected to every node of the next layer; transferring information this way requires an extremely large number of parameters. A CNN takes a slightly different approach in which only a few nodes participate in the connection to the next layer. Since this transfer of information resembles the way neurons respond to stimuli only within certain regions of the visual field, the corresponding region is called the receptive field in the visual system. By analogy, a CNN uses a receptive-field-like layout [19], where the subset of nodes of the previous layer connected to a node of the next layer is that node's receptive field. Figure 1 shows the receptive field in neural network layers and in a CNN.
1.2 Size of receptive field
A large receptive field means the network can perceive more information to predict an accurate image, so the receptive field has to be enlarged to broaden the view of the input and capture wider contextual information. The receptive field can be enlarged either by increasing the kernel size or by increasing the depth of the network. Inflating the kernel size increases the number of parameters and thus the computational burden, while adding layers makes the architecture deeper and introduces more operations. A solution to this problem is to replace convolutional layers with dilated layers, which enlarge the receptive field without either cost.
Dilated convolution increases the effective kernel size by inserting blank spaces between the kernel weights. A dilated kernel reduces computation because it uses fewer parameters, and it helps in the detection of minute details at improved resolution. Dilated convolution is applied in various domains such as image super-resolution [11, 21], text-to-speech [22] and language translation [23].
1.3 Equation of receptive field
Convolution- and dilated-convolution-based models [20] can be defined with Eqs. (2) and (3) respectively.
Let \(F:{Z}^{2}\to R\) be a discrete function. Let \({\Omega }_{r}={[-r,r]}^{2}\cap {Z}^{2}\) and let \(k:{\Omega }_{r}\to R\) be a discrete filter of size \({(2r + 1)}^{2}\). The discrete convolution operator ∗ is defined as (Eq. 2)
Let l be a dilation factor and let \({*}_{l}\) be defined as
When l = 1, \({*}_{l}\) is the ordinary discrete convolution; l > 1 gives a dilated convolution. The same can be applied to two-dimensional dilated convolution, given by Eq. 4:
where \(y\left(m,n\right)\) is the output of the dilated convolution at position \(\left(m, n\right)\) for input \(x\left(m, n\right)\) and a filter \(w\left(i, j\right)\).
The dilation layer can be naively described as convolution over the input with a sparsely populated filter, which expands the size of the convolving filter. The expansion rate is controlled by a hyper-parameter d, where (d − 1) blank spaces are inserted between kernel elements; for a dilation rate of 1, no space is added. The effective kernel size under dilation rate d with kernel size k is given by [24] as
where \({k}^{^{\prime}}\) represents the effective kernel size. Effective receptive field from Eq. 5, for a kernel (k = 3) and dilation rate d = 1, 2 and 3 is 3 × 3, 5 × 5 and 7 × 7 (shown in Fig. 2a–c).
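Equation (5) states that \(k' = k + (k-1)(d-1)\); a one-line sketch (the function name is ours) reproduces the values quoted for a 3 × 3 kernel:

```python
def effective_kernel(k, d):
    """Effective kernel size under dilation rate d (Eq. 5): k' = k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)

# For a 3x3 kernel, dilation rates 1, 2 and 3 give effective sizes 3, 5 and 7
print([effective_kernel(3, d) for d in (1, 2, 3)])  # → [3, 5, 7]
```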
In addition, for a network of depth n with kernel size 3 × 3 throughout, the receptive field is given by (2n + 1) × (2n + 1). The relationship between the dilation rate and the output size o, as given in [24], is
for input size i, padding p and stride s. Convolution of a 3 × 3 kernel over an input of size 9 × 9 with zero padding and dilation rate 2 (i.e., i = 9, k = 3, d = 2, s = 1 and p = 0) produces an output of dimension 5 × 5. Figure 3 shows an output of size 3 × 3 for i = 7. This concludes the introduction to the receptive field.
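Both worked examples can be checked with a small sketch of the output-size relationship (our helper name, assuming the usual form \(o = \lfloor (i + 2p - k')/s \rfloor + 1\) with the effective kernel \(k'\) from Eq. 5):

```python
def conv_output_size(i, k, d, s, p):
    """Spatial output size of a dilated convolution."""
    k_eff = k + (k - 1) * (d - 1)   # effective kernel size (Eq. 5)
    return (i + 2 * p - k_eff) // s + 1

# The two worked examples from the text:
print(conv_output_size(i=9, k=3, d=2, s=1, p=0))  # → 5
print(conv_output_size(i=7, k=3, d=2, s=1, p=0))  # → 3
```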
In this paper we study the significance of the receptive field in image denoising through four study cases. Our work is summarized in Fig. 4, where images from the compared cases are shown; cases 1–4 are described in Sect. 3, and case 4 showcases our best result in terms of PSNR (Peak Signal to Noise Ratio). The rest of the paper is organized as follows. Section 2 contains a brief study of related methods. Section 3 explains our study cases. Sections 4 and 5 describe the dataset and network structure. Section 6 presents experimental results and analysis based on compared PSNR values and corresponding images; based on this analysis, another network, referred to as case 4, is introduced. Results on the test sets are discussed in Sect. 7. Lastly, we conclude in Sect. 8 and outline future work in Sect. 9.
2 Related methods
Filter-based techniques were among the first methods proposed for AWGN denoising; they are further divided into spatial-domain and transform-domain filters. Mean filtering [25], denoising with local statistics [26], the Wiener filter [27] and bilateral filtering [28] are some of the most prevalent techniques, but they are not sufficient to produce good quality images.
The image prior is an important property in image denoising. In the past decade many methods were proposed based on image priors, including Markov Random Fields (MRF) [20, 29], BM3D [5], NCSR [10], nonlocal self-similarity (NSS) [31] and WNNM [7]. It is convenient to learn the prior model on small image patches; in EPLL [6] the optimization is performed on an entire image and the image prior is given by the product of all patch priors. Nonlocal self-similarity [6, 7, 31] based methods exploit the repetitive patterns in natural images: similar patches are grouped to collaboratively estimate the final image. Of the prior-based methods mentioned above, BM3D [5] and WNNM [7] are the most popular. They are capable of handling various noise levels, but they cannot be used directly for spatially variant noise [32].
CNNs are widely used in various image processing tasks due to their excellent performance, and CNN-based methods have also been benchmarked against prior-based methods: Jain et al. [33] compared convolutional neural networks with Markov random fields (MRF), and Burger et al. [8] made a similar comparison with BM3D.
Prior-based methods perform well but have some drawbacks: they need to be optimized carefully, which increases the computational cost, and they rely on manual settings and tuning. To address these problems, discriminative approaches were proposed in which there is a direct mapping from the noisy image to the ground truth image. Among discriminative learning approaches, DnCNN [2] is the most popular. There, a single method aims to solve several image restoration (IR) tasks, providing solutions for blind Gaussian denoising, single image super-resolution and JPEG deblocking [34], with promising results for all three problems. Another similar method [1] used HQS (Half Quadratic Splitting), a variable splitting technique, to solve image denoising, image deblurring and single image super-resolution utilizing a deep CNN denoiser prior. Chuah et al. [35] provided a straightforward strategy of estimating the noise level before removing the noise. Wang et al. [36] achieved comparable results with reduced computational cost and a less complicated network structure by using a larger receptive field than DnCNN. Combining dilated layers with residual learning is a popular technique for resolving the AWGN problem [2, 37, 38]. All the methods described above employ a one-to-one mapping, i.e., they require a single image as input to produce a denoised image. Zhang et al. [32] proposed a novel network design, FFDNet (Fast and Flexible Denoising Convolutional Neural Network), which takes subsampled input images together with their noise level maps to produce a denoised image. In addition, recent approaches aim to deal with both spatially variant and invariant noise and work on real-world noisy images. Anwar et al. [38] incorporated feature attention for image denoising. A benchmark dataset of real noisy images, captured with different cameras and under different camera settings, is created by Romano et al. [39]. Guo et al. [40] use the same strategy of estimating the noise before removing it, like [35]; their work is denoted by its network architecture name, CBDNet [40], which has two sub-networks, one for noise estimation and the other for non-blind denoising. Some of the methods used for comparison are categorized in Fig. 5.
3 Our method
The present research studies three cases, as follows:
Case 1: A dilated convolutional network is used here. The network structure is similar to Zhang et al. [1] and Peng et al. [37]: a 7-layer dilated network whose receptive field is of size 33 × 33. Zhang et al. [1] and [37] used a residual learning formulation, \(F\left(I\right)=\eta\), i.e., noise is removed by subtracting the predicted noise from the noisy input image. We instead use \(F\left(I\right)=\widehat{S}\), a direct mapping from the noisy image to the noiseless image; no intermediate step is required.
Case 2: A plain CNN with the same configuration and the same receptive field of size 33 × 33 is used. The same receptive field can be ensured either by using larger filters or by increasing the depth of the network; we have used larger filters.
Case 3: To verify the efficacy of dilated layers in the image denoising process, the receptive field size is reduced: all dilated layers are replaced with plain CNN layers, which reduces the receptive field to 15 × 15. In this way performance can be compared with the two cases above. Figure 6 represents the three cases.
The receptive field of each layer for all three cases is shown in Table 1; all cases have seven layers. We use the term \({rcp}_{i}\) for the receptive field of the network up to layer \(i\), where \(i=1, 2, \dots ,7\). All the cases mentioned above are essential steps in verifying the effect of receptive field size on image denoising. The three cases are compared quantitatively and visually.
4 Datasets
We have used the COCO dataset [41], an open-source online dataset whose val2016 set contains 5000 images. After preprocessing this data, an augmented set of 10,000 images was developed; increasing the dataset beyond this size was observed not to lead to significant improvement. The images are split into training and validation sets in a 7:3 ratio and cropped to 256 × 256 pixels. To produce synthetic noisy images, additive white Gaussian noise (AWGN) is added.
For blind denoising, noise levels (σ) are randomly selected from the range [0, 75] to create the dataset. The test sets BSD68 [30], Set12 [2], RNI15 [42], NC12 [42] and Nam [43] are used. BSD68 [30] and Set12 [2] contain classic images in the field of image processing, i.e., images that have been extensively used for the evaluation of numerous methods. RNI15 [42] is a real-world noisy image set of 15 images, some of which contain spatially variant noise. NC12 [42] is a set of 12 noisy images. There are no ground truth images in RNI15 [42] and NC12 [42], so these sets are compared visually.
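A training pair for blind denoising can be sketched as follows; the helper name `make_training_pair` is ours, and drawing σ uniformly from [0, 75] per patch is an assumption consistent with the randomized noise levels described above:

```python
import numpy as np

def make_training_pair(clean_patch, sigma_max=75, rng=None):
    """Draw a noise level uniformly from [0, sigma_max] and corrupt the patch."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = rng.uniform(0, sigma_max)                      # blind: sigma varies per sample
    noise = rng.normal(0.0, sigma, size=clean_patch.shape)
    noisy = np.clip(clean_patch + noise, 0, 255)
    return noisy, clean_patch, sigma

# Example on a 256x256 patch, matching the crop size used for training
patch = np.full((256, 256), 100.0)
noisy, clean, sigma = make_training_pair(patch, rng=np.random.default_rng(1))
```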
5 Network structure
The architecture of the network used to remove Gaussian noise is shown in Fig. 7. It takes as input an image degraded with a certain level of AWGN, which is convolved with dilated kernels. Each layer has such filters, with effective dimensions defined by their dilation rate, whose weights are trained to appropriate values by the back-propagation algorithm. No pooling is used, because the input and output images must have the same dimensions. Zero padding is used to avoid boundary artifacts. Filters of dimension 3 × 3 × 32 are used, where the third number refers to the number of filters. ReLU (Rectified Linear Unit) [44] is placed between consecutive convolution layers to introduce non-linearity into the network. For adaptive learning, Adam [45] is used as the optimizer with learning rate 0.001. The loss function is the MSE (Mean Squared Error) shown in Eq. (7),
where \(S\) and \(\widehat{S}\) are the ground truth and predicted denoised images. The final layer produces the output, i.e., an image with reduced noise.
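A minimal PyTorch sketch of a case-1-style network is given below. This is not the authors' code: the class name `DilatedDenoiser` and the dilation schedule 1, 2, 3, 4, 3, 2, 1 are our assumptions, inferred from the per-layer effective kernel sizes 3, 5, 7, 9, 7, 5, 3 reported in Table 2; padding equals the dilation rate so every layer preserves the input resolution.

```python
import torch
import torch.nn as nn

class DilatedDenoiser(nn.Module):
    """7-layer dilated CNN mapping the noisy image directly to the denoised
    estimate, F(I) = S_hat (no residual learning)."""

    def __init__(self, channels=1, features=32):
        super().__init__()
        dilations = [1, 2, 3, 4, 3, 2, 1]   # assumed schedule (see lead-in)
        layers, in_ch = [], channels
        for i, d in enumerate(dilations):
            out_ch = channels if i == len(dilations) - 1 else features
            # padding = dilation keeps the spatial size for a 3x3 kernel
            layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=3,
                                    padding=d, dilation=d))
            if i < len(dilations) - 1:      # ReLU between consecutive convs
                layers.append(nn.ReLU(inplace=True))
            in_ch = out_ch
        self.body = nn.Sequential(*layers)

    def forward(self, noisy):
        return self.body(noisy)

model = DilatedDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr 0.001 as in the text
loss_fn = nn.MSELoss()                                     # Eq. (7)
```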
Table 2 shows the parameters and the effective kernel size in each layer. The receptive field (effective kernel size) of each layer, denoted by \({r}_{1},{r}_{2}, \dots ,{r}_{7}\), is 3, 5, 7, 9, 7, 5, 3 for layers 1–7 respectively, assuming \({r}_{0}=1\) when calculating the receptive field of the network.
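The network receptive fields quoted in Sect. 3 follow from these per-layer values: for a stack of stride-1 layers, each layer with effective kernel \(r_i\) grows the receptive field by \(r_i - 1\). A short sketch (the function name is ours) reproduces the 33 × 33 and 15 × 15 figures:

```python
def network_receptive_field(effective_kernels):
    """Receptive field of a stride-1 network, starting from rcp_0 = 1."""
    rcp = 1
    for r in effective_kernels:
        rcp += r - 1    # each layer adds (r_i - 1)
    return rcp

# Per-layer effective kernel sizes from Table 2 (cases 1 and 2)
print(network_receptive_field([3, 5, 7, 9, 7, 5, 3]))  # → 33
# Case 3: plain 3x3 convolutions in all seven layers
print(network_receptive_field([3] * 7))                # → 15
```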
6 Experimental results and analysis
The proposed method is a plain discriminative method, since it is feasible to train a deep network with a minimal number of layers. Although the residual learning framework is not utilized here, equivalent results are achieved. All our models are trained on an Nvidia Tesla K80 GPU.
PSNR and SSIM measurements are used to compare the performance of the cases mentioned in Sect. 3. Including SSIM in the comparison accounts for image quality as perceived by humans; it estimates the correlation between two normalized images.
where \({\mu }_{S}\), \({{\sigma }_{S}}^{2}\) and \({\mu }_{\widehat{S}}\), \({{\sigma }_{\widehat{S}}}^{2}\) are the local means and variances of the ground truth and predicted images, and \({\sigma }_{S\widehat{S}}\) denotes their local covariance.
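The two measures can be sketched as below. `ssim_global` is a simplified single-window variant (the standard SSIM averages this over local windows), and the stabilizing constants K1 = 0.01, K2 = 0.03 are the conventional defaults, assumed rather than stated in the text:

```python
import numpy as np

def psnr(S, S_hat, L=255):
    """Peak signal-to-noise ratio in dB for images with peak value L."""
    mse = np.mean((np.asarray(S, float) - np.asarray(S_hat, float)) ** 2)
    return 10 * np.log10(L ** 2 / mse)

def ssim_global(S, S_hat, L=255, K1=0.01, K2=0.03):
    """Single-window SSIM between ground truth S and estimate S_hat."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    S, S_hat = np.asarray(S, float), np.asarray(S_hat, float)
    mu_s, mu_h = S.mean(), S_hat.mean()
    var_s, var_h = S.var(), S_hat.var()
    cov = ((S - mu_s) * (S_hat - mu_h)).mean()   # covariance term sigma_{S,S_hat}
    return ((2 * mu_s * mu_h + C1) * (2 * cov + C2)) / \
           ((mu_s ** 2 + mu_h ** 2 + C1) * (var_s + var_h + C2))
```

For identical images SSIM evaluates to 1, its maximum, while PSNR diverges, which is why PSNR comparisons are only meaningful between imperfect reconstructions.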
6.1 Comparison of case 1, case 2, case 3 and case 4
Our cases are compared with three prior-based methods, BM3D [5], WNNM [7] and EPLL [6], and two discriminative methods, TNRD [9] and DnCNN [2], as shown in Table 3. The comparison is based on the PSNR values obtained for the Set12 images [2].
All the simulations are done for σ ∈ [0, 75], where a single model is used to denoise an image; pixel values have a range of [0, 255]. These models are referred to as case 1, case 2 and case 3. This type of denoising may also be referred to as blind denoising, since a single model handles the whole range σ ∈ [0, 75]. The comparisons are shown below:
- Case 1 and case 2
It is observed from Table 3 that these two cases produce approximately the same results for both σ = 15 and σ = 25. Since the cases have the same receptive field, they produce similar results, which verifies the significance of the receptive field. Cases 1 and 2 differ in convergence rate: because case 2 uses CNNs with large filters, its convergence is slower than that of case 1.
- Case 1 and case 3
Case 3 shows slightly lower PSNR values than case 1, and the difference increases when the noise level (σ) increases from 15 to 25. This difference demonstrates the need for a larger receptive field in image denoising. Case 1 has the fastest convergence of all the cases.
- Case 2 and case 3
Case 2 outperforms case 3 significantly when noise is high (σ = 25), although the two maintain comparable PSNR values at lower noise levels (σ < 20). Table 3 also shows that as the noise (σ) rises above 30, all three cases fail to beat the state-of-the-art methods.
These comparisons lead this work in the direction of case 1. A slight modification is made to case 1, namely the use of dilated layers with a larger receptive field. Further, since previous methods validate the need for batch normalization, batch normalization layers are also added in case 4.
- Case 4
This case is the same as case 1 with some variations. The network is designed with Batch Normalization (BN) layers [15], using a Conv + BN + ReLU sequence. Batch normalization slows down convergence due to the increased number of parameters, but it is essential to reduce the internal covariate shift in the network.
Recalling the limitations of the 7-layer network of case 1, the number of layers needs to be increased; this network with BN has 9 layers. The receptive field of this network comes out to be 51 × 51 (Table 4), and the network structure is shown in Fig. 8. Mean squared logarithmic error (MSLE) is used in this case, unlike the MSE of the previous cases. As the name suggests, it is the MSE calculated over the logarithms of \(S\) and \(\widehat{S}\) (shown in Eq. (10)),
where \(S\) and \(\widehat{S}\) are the ground truth and predicted denoised images respectively and \(N\) denotes the number of samples. 1 is added to both \(S\) and \(\widehat{S}\) so that the equation remains mathematically valid as \(S, \widehat{S}\to 0\).
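The MSLE described above can be sketched as follows (the function name is ours), with the +1 shift keeping the logarithm defined at zero-valued pixels:

```python
import numpy as np

def msle(S, S_hat):
    """Mean squared logarithmic error (Eq. 10): mean((log(1+S) - log(1+S_hat))^2)."""
    S, S_hat = np.asarray(S, float), np.asarray(S_hat, float)
    return float(np.mean((np.log(1 + S) - np.log(1 + S_hat)) ** 2))
```

Compared with MSE, the logarithm compresses large pixel differences, so the loss penalizes relative rather than absolute errors.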
Comparison of case 4 with the state of the art is shown in Table 3, where case 4 surpasses BM3D [5], WNNM [7], MLP [8] and DnCNN [2] by a margin of at least 0.4 dB for noise levels beyond σ = 25. For noise levels below σ = 25, case 4 produces comparable results; this is because of its larger receptive field, while lower noise levels are removed better by methods with larger modeling capacity, e.g. DnCNN [2]. Case 4 works well for σ = 25, 35 and 50, and even at σ = 75 it outperforms FFDNet [32] on 4 out of 12 images.
7 Discussion
According to Table 3, case 4 is our best model so far, so it is used as the predicting model for the test sets. Figure 9 shows images predicted by case 4 for a series of noise levels (σ = 15, 25, 35, 45, 55, 75), where an image from CBSD68 [30] is corrupted with each noise level to create a set of synthetic images.
Figure 10 shows predicted PSNR results on popular test images from BSD68 [30] and CBSD68 [30], as used by previous methods. Ours (case 4) performs better on BSD68 [30] than on CBSD68 [30]; for color images FFDNet [32] works best among the compared methods. The average of the predicted PSNR values over the 68 images of the BSD68 [30] and CBSD68 [30] datasets is shown in Table 5, with DnCNN [2] and FFDNet [32] used for comparison; the bolded values are the maxima among the compared methods. Ours works best on BSD68 [30], and RIDNet [38] performs best on CBSD68 [30]. Results on the real-world dataset RNI15 [42], whose images contain spatially variant noise, were compared (Fig. 11) with DnCNN [2], FFDNet [32] and RIDNet [38]. Given that our method is not specifically designed for this type of noise, it performs surprisingly well: for the Flower image it clearly outperforms [38]. A result on the Pattern 3 image is also shown; overall noise reduction is no doubt better done by RIDNet, but careful analysis of these images shows that an important piece of detail is missing in RIDNet's [38] output. The inset shown at the bottom illustrates the loss in image quality with RIDNet and FFDNet; image quality is best compared in the zoomed versions of the resulting images. The overall performance of our model on these datasets validates the broad applicability of our method.
8 Conclusion
The presented paper incorporates a detailed study of receptive fields, including the significance of the receptive field in the image denoising [46, 47] problem and the calculation of the receptive field of a network and of its layers. To achieve this goal, several comparison techniques were utilized in network design and training, and performance for different receptive field sizes was compared. Our work not only provides a comparative study but also solves the image denoising problem, producing results competitive with the state of the art. Previous methods articulated a crucial need for residual learning, whereas the present work performs well without it. The proposed work is an end-to-end approach that only requires a noisy image, and a single network works well for a wide range of sigma (σ) values on grayscale and color images. The results on real noisy images further demonstrate that our work delivers perceptually appealing denoised results when compared with BM3D [5], WNNM [7], EPLL [6], TNRD [9] and DnCNN [2]. Although this work does not address spatially variant noise in its methodology, it can compete with RIDNet [38], a method designed specifically for such noise.
9 Future scope
There is a continuous effort on image enhancement [48, 49] and restoration [50] in various fields; despite that, performance on real images is still lacking. This is due to the fact that any simulated noise is much simpler than real noise. In real life, factors such as illumination, camera shake and sensors are responsible for degrading the image. Thus, more powerful noise modeling is required that can handle a variety of noise.
Data availability
All the data used in this work is available online.
References
Zhang K, Zuo W, Gu S, Zhang L (2017) Learning deep CNN denoiser prior for image restoration. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp 2808–2817. https://doi.org/10.1109/CVPR.2017.300
Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans Image Process 26(7):3142–3155. https://doi.org/10.1109/TIP.2017.2662206
Liu W, Lin W (2013) Additive white Gaussian noise level estimation in svd domain for images. IEEE Trans Image Process 22(3):872–883. https://doi.org/10.1109/TIP.2012.2219544
Mihcak KM, Kozintsev I, Ramchandran K, Moulin P (1999) Low-complexity image denoising based on statistical modeling of wavelet coefficients. IEEE Signal Process Lett 6(12):300–303. https://doi.org/10.1109/97.803428
Dabov K, Foi A, Katkovnik V, Egiazarian K (2007) Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Trans Image Process 16(8):2080–2095. https://doi.org/10.1109/TIP.2007.901238
Zoran D, Weiss Y (2011) Learning models of natural image patches to whole image restoration. In: 2011 International Conference on Computer Vision, Barcelona, pp 479–486. https://doi.org/10.1109/ICCV.2011.6126278
Gu S, Zhang L, Zuo W, Feng X (2014) Weighted nuclear norm minimization with application to image denoising. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, pp 2862–2869. https://doi.org/10.1109/CVPR.2014.366
Burger CH, Schuler JC, Harmeling S (2012) Image denoising: can plain neural networks compete with BM3D? In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, pp 2392–2399. https://doi.org/10.1109/CVPR.2012.6247952
Chen Y, Pock T (2017) Trainable nonlinear reaction diffusion: a flexible framework for fast and effective image restoration. IEEE Trans Pattern Anal Mach Intell 39(6):1256–1272. https://doi.org/10.1109/TPAMI.2016.2596743
Dong W, Zhang L, Shi G, Li X (2013) Nonlocally centralized sparse representation for image restoration. IEEE Trans Image Process 22:1620–1630. https://doi.org/10.1109/TIP.2012.2235847
Dong C, Loy CC, He K, Tang X (2016) Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 38(2):295–307. https://doi.org/10.1109/TPAMI.2015.2439281
Xu L, Ren JS, Liu C, Jia J (2016) Deep convolutional neural network for image deconvolution. Adv Neural Info Process Syst 1:1790–1798
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958
Kaiming H, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp 770–778. https://doi.org/10.1109/CVPR.2016.90
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proc. of the 32nd International Conference on International Conference on Machine Learning, Lille, France, 37:448–456. arXiv:1502.03167v3
Krizhevsky A, Sutskever I, Hinton EG (2015) ImageNet classification with deep convolutional neural networks. Assoc Comput Mach 60:89–90. https://doi.org/10.1145/3065386
Szegedy C et al (2015) Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
Luo W, Li Y, Urtasun R, Zemel R (2016) Understanding the effective receptive field, in Deep Convolutional Neural Networks. In: Advances in Neural Information Processing Systems, Barcelona, pp 4898–4906. arXiv:1701.04128
Yu F, Koltun V (2016), Multi-scale context aggregation by dilated convolutions.In: International Conference on Learning Representations (ICLR), San Juan, Puerto Rico. arXiv:1511.07122v3
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference for Learning Representations (ICLR), San Diego, pp 1404–1556. arXiv:1409.1556v6
Yu F, Koltun V, Funkhouser T (2017) Dilated residual networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CPVR), Honolulu, HI, pp 636–644. arXiv:1705.09914v1
Oord et al (2016) WaveNet: a generative model for raw audio, pp 1–15. arxiv.org/abs/1609.03499
Kalchbrenner N, Espeholt L, Simonyan K, Oord DVA, Graves A, Kavukcuoglu K (2016) Neural machine translation in linear time. arXiv:1610.10099v2
Dumoulin V, Visin F (2016) A guide to convolution arithmetic for deep learning, arXiv:1603.07285v2
Chandel R, Gupta G (2013) Image filtering algorithms and techniques: a review. Int J Adv Res Comput Sci Softw Eng 3:198–202
Lee JS (1980) Digital image enhancement and noise filtering by use of local statistics. IEEE Trans Pattern Anal Mach Intell 2:165–168. https://doi.org/10.1109/TPAMI.1980.4766994
Chen J, Benesty J, Huang Y, Doclo S (2006) New insights into the noise reduction Wiener filter. IEEE Trans Audio Speech Lang Process 14(4):1218–1234. https://doi.org/10.1109/TSA.2005.860851
Tomasi C, Manduchi R (1998) Bilateral filtering for gray and color images, In: 6th International Conference on Computer Vision, Bombay, India, pp 839–846. https://doi.org/10.1109/ICCV.1998.710815
Lan X, Roth S, Huttenlocher D, Black JM (2006) Efficient belief propagation with learned higher-order Markov random fields. In: Proc. of the European Conference on Computer Vision (ECCV), Springer, LNCS; 3952:269–282. https://doi.org/10.1007/11744047_21
Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings Eighth IEEE International Conference on Computer Vision, vol 2, pp 416–423. https://doi.org/10.1109/ICCV.2001.937655
Buades A, Coll B, Morel JM (2005) A non-local algorithm for image denoising. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2:60–65. https://doi.org/10.1109/CVPR.2005.38
Zhang K, Zuo W, Zhang L (2018) FFDNet: toward a fast and flexible solution for CNN-based image denoising. IEEE Trans Image Process 27:4608–4622. https://doi.org/10.1109/TIP.2018.2839891
Jain V, Seung HS (2009) Natural image denoising with convolutional networks. Adv Neural Inf Process Syst. https://doi.org/10.5555/2981780.2981876
Xiong Z, Orchard MT, Zhang Y (1997) A deblocking algorithm for JPEG compressed images using overcomplete wavelet representations. IEEE Trans Circuits Syst Video Technol 7:433–437. https://doi.org/10.1109/76.564123
Chuah HJ, Khaw YH, Soon CF, Chow C (2017) Detection of Gaussian noise and its level using deep convolutional neural network. In: TENCON IEEE Region 10 Conference, Penang, pp 2447–2450. https://doi.org/10.1109/TENCON.2017.8228272
Wang T, Sun M, Hu K (2017) Dilated deep residual network for image denoising. In: IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), Boston, MA, pp 1272–1279. arXiv:1708.05473v3
Peng Y et al (2019) Dilated residual networks with symmetric skip connection for image denoising. Neurocomputing 345:67–76. https://doi.org/10.1016/j.neucom.2018.12.075
Anwar S, Barnes N (2019) Real image denoising with feature attention. In: The IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, pp 3155–3164. arXiv:1904.07396v1
Romano Y, Elad M, Milanfar P (2017) The little engine that could: regularization by denoising (RED). SIAM J Imaging Sci. arXiv:1611.02862v3
Guo S, Yan Z, Zhang K, Zuo W, Zhang L (2018) Toward convolutional blind denoising of real photographs. arXiv:1807.04686
Lin T-Y et al (2014) Microsoft COCO: common objects in context. In: Fleet D et al (eds) Computer Vision – European Conference on Computer Vision (ECCV), LNCS, vol 8693. Springer, Cham
Lebrun M, Colom M, Morel J-M (2015) The noise clinic: a blind image denoising algorithm. Image Processing On Line 5:1–54. https://doi.org/10.5201/ipol.2015.125
Nam S, Hwang Y, Matsushita Y, Kim SJ (2016) A holistic approach to cross-channel image noise modeling and its application to image denoising. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1683–1691. https://doi.org/10.1109/CVPR.2016.186
Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proc. International Conference on Machine Learning (ICML) 30(1):3
Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR), pp 1–41. arXiv:1412.6980v9
Hussain J, Vanlalruata (2022) Image denoising to enhance character recognition using deep learning. Int J Inf Tecnol. https://doi.org/10.1007/s41870-022-00931-y
Dhanushree M, Priyadharsini R, Sree Sharmila T (2019) Acoustic image denoising using various spatial filtering techniques. Int J Inf Tecnol 11:659–665. https://doi.org/10.1007/s41870-018-0272-3
Kumar M, Priyanka (2019) Various image enhancement and matching techniques used for fingerprint recognition system. Int J Inf Tecnol 11:767–772. https://doi.org/10.1007/s41870-017-0061-4
Gupta S, Gupta R, Singla C (2017) Analysis of image enhancement techniques for astrocytoma MRI images. Int J Inf Tecnol 9:311–319. https://doi.org/10.1007/s41870-017-0033-8
Nair RS, Domnic S (2022) Deep-learning with context sensitive quantization and interpolation for underwater image compression and quality image restoration. Int J Inf Tecnol. https://doi.org/10.1007/s41870-022-01020-w
Funding
The corresponding author has not received any funding.
Ethics declarations
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Cite this article
Chaurasiya, R., Ganotra, D. Deep dilated CNN based image denoising. Int. j. inf. tecnol. 15, 137–148 (2023). https://doi.org/10.1007/s41870-022-01125-2