1 Introduction

Image denoising is a crucial step in the image preprocessing stage of applications that involve the areas of image processing, computational vision and pattern recognition. This stage ranges from simple mathematical smoothing operations to the localization and recognition of objects. Thereby, the restoration of noisy images is a topic of scientific interest for a great part of the research community in the areas of signal processing and computational vision. In summary, the main goal of denoising is to minimize and/or reduce random disruptions of a signal in an image, in order to preserve the maximum amount of relevant information for subsequent stages.

In the image processing area, noise is defined as a random variation of intensity information, which occurs during the acquisition, recording, processing and transmission of images [1]. In this context, certain types of noise affect the intensity of the degraded pixels in characteristic ways: they randomly lead to black and white spots and image blurring, or they assume random variations in their parameters both in the spatial domain and in the frequency domain, which compromises the visual interpretability of an image.

As a result, image filtering methods exist in order to reduce the effects caused by noise, and they are usually classified into two categories: (1) in the spatial domain; (2) in the frequency domain [8]. The filtering methods that work in the spatial domain operate directly over the matrix of image intensities, through convolution operations with a mask [27]. On the other hand, the filtering methods that operate in the frequency domain are based on the modification of the Fourier transform of the image [31].

In this research, only filtering methods that operate in the spatial domain are used: Wiener [13], Total Variation [25], Anisotropic Diffusion [21], Wavelets [18], Bilateral [29], Non-Local Means (NLM) [3] and BM3D [4]. The main similarity between these spatial-domain filtering methods is that they normally operate through convolution with a mask over the matrix of pixel intensities or through calculations of the distance between patches and pixels.

Among the filtering methods mentioned above, the Wiener filter is considered one of the classical methods in the image filtering literature, and it is applied in the reduction of different types of signal-dependent noise [13]. This filter models the image and its noise as random variables, in order to find an estimate between the reference image and the filtered image such that the mean square error (MSE) between them is minimized. Likewise, this filter is optimal in terms of the minimum linear mean square error in the filtering and smoothing of image noise.

Meanwhile, the Total Variation filter is an optimization-based algorithm applied to noise reduction in images [25]. To sum up, the approach adopted by this method solves the image denoising problem through an optimized mathematical model that penalizes a regularization term defined by the magnitude of the absolute gradient of the image, while improving the signal-to-noise ratio (SNR) with respect to the noisy reference image.

Regarding Anisotropic Diffusion, it is an adaptive, non-linear method based on partial differential equations (PDE) for a heat diffusion equation, in which the diffusion coefficient is a function of the image gradient [21]. In short, the conception of this method is to convolve the original image with a Gaussian kernel at different scales (variances). In this way, the result of this convolution yields blurred images at multiple resolutions, intra-region smoothing and the preservation of image edges.

The Bilateral filter is also an adaptive, non-linear filtering method that uses a mathematical model to substitute the intensity of each pixel by a weighted average of its neighboring pixels [29]. The weights of this average are defined in terms of two local differences: spatial differences, calculated by the Euclidean distance, and radiometric differences between the central pixels and their neighbors. Consequently, this method preserves edges while reducing noise in uniform regions of an image.

In addition, the Wavelet filter is a filtering method applied to decompose and represent a signal in an orthogonal multi-resolution basis [18]. The Wavelet representation lies between the spatial and the frequency domains. In the spatial domain, a Wavelet transform is applied in order to obtain the coefficients in the sparse Wavelet domain, using a Wavelet basis. In this domain, the noise slightly degrades all the coefficients, changing zero coefficients into nonzero ones. Thus, according to a threshold T, lower coefficients are set to zero, while higher coefficients are attenuated or remain unchanged. To sum up, the key point is to determine the value of T.

Furthermore, the BM3D filter comprises pre-filtering and post-filtering stages [4]. In the pre-filtering stage, non-local techniques and the Wavelet transform are applied to decompose the image into fragments, which are then grouped according to their similarity. The post-filtering stage is characterized by the usage of the Wiener filter in order to minimize edge effects and refine the denoising of the image.

A great part of the theoretical background of this article is related to the Non-Local Means (NLM) filtering method. The NLM filter is a technique applied to the reduction of additive Gaussian noise [3]. This method is based on the fact that digital images have characteristics that are repeated not only within regions, but also in a global way. Thus, using the Euclidean distance as a measure of similarity, this method aims to find the estimated intensity value of each pixel in a certain region of the image.

In the last decades, different methodologies and mathematical models have been reported in the filtering literature to deal with issues regarding Gaussian denoising in digital images [6, 19, 20, 24]. In this context, the conception of the classic NLM filtering method is based on the calculation of the Euclidean distance between patches, which is appropriate for images corrupted by additive white Gaussian noise (AWGN). Therefore, an adaptive method is necessary to eliminate the different types of noise in digital images, considering the various types of intensity distribution.

Hence, the method proposed in this article is a new adaptive image filtering method that aims to reduce Gaussian noise by individually combining the NLM filter with four different divergences from information theory, in their variations. In order to measure the similarity between patches in the same search window, and based on Levada's work extending the NLM filter to Gaussian noise [16], the following divergences are used: (1) the Kullback-Leibler divergence; (2) the Bhattacharyya distance; (3) the Hellinger distance; and (4) the Cauchy-Schwarz divergence. Thus, the proposed method can be relevant in the processing of different image types, such as tomographic and hyperspectral images. Besides that, this method can be extended to other types of degradation, such as speckle and non-Gaussian noises.

The main contributions of this research are: 1) a new filtering method is proposed, called Dual Non-Local Means, which acts as an extension of the traditional Non-Local Means filter and is applied to images degraded by Gaussian noise; 2) the L1 norm of the Dual Non-Local Means filter is replaced by the stochastic distances given by the Kullback-Leibler divergence, the Bhattacharyya distance, the Hellinger distance and the Cauchy-Schwarz divergence, used as measures of similarity; and 3) the results obtained over 7 different images degraded by Gaussian noise indicate that the proposed method can produce superior outcomes compared to the following filters: Wiener, Total Variation, Anisotropic Diffusion, Wavelets, Bilateral, Non-Local Means (NLM) and BM3D. That comparison was done quantitatively, by applying the Peak Signal-to-Noise Ratio (PSNR) metric for analysis of the mean, minimum and maximum of the results obtained with the compared filters.

This article is organized in the following way: Section 2 describes works related to this research, presenting the relation between this study and various important and relevant methods in the image filtering area, such as the Wiener, Bilateral, Wavelet, Total Variation and Anisotropic Diffusion filters and, especially, the Non-Local Means and BM3D filters. Section 3 describes the mathematical formulation of the stochastic distances given by the Kullback-Leibler divergence, the Bhattacharyya distance, the Hellinger distance and the Cauchy-Schwarz divergence, when applied to the Non-Local Means filtering model. Section 4 details the proposed method: Non-Local Means using the KL divergence, Non-Local Means using the Bhattacharyya coefficient, Non-Local Means using the Hellinger distance and Non-Local Means using the Cauchy-Schwarz divergence. Section 5 shows the experiments and the results obtained in terms of the quantitative PSNR metric and qualitative analysis. Finally, Section 6 presents the conclusions, final considerations and some directions for future work.

2 Related work

The literature on image filtering methods is extensive, therefore a broad literature review assisted in the scoping of this article. In this respect, this section establishes the relation between this work and other important and recent methods in the area of image filtering, such as the Wiener, Bilateral, Wavelet, Anisotropic Diffusion, Total Variation and BM3D filters and, in particular, the Non-Local Means filter.

In 2020, Petkova and Draganov presented, in their article, an application of the Wiener filtering method to digital images with an unknown level of Gaussian noise [22]. In this context, the variance of the Gaussian noise found in the images was estimated from the distribution of intensities in homogeneous areas of the image. Thus, the simplest form of the Wiener denoising filter was applied to the noisy images. The authors conducted an extensive analysis of the influence of the Wiener filter mask size with respect to the noise variance. Consequently, the results of that analysis led to the conclusion that the adaptive Wiener filter is efficient in terms of general (PSNR) and structural (SSIM) preservation.

In 2020, Jin and Luan presented, in their study, a new approach for digital image denoising, based on the Total Variation filtering method and on a weighting function [11]. The approach proposed by the authors initially analyzes the staircase effect caused by the traditional Total Variation filter. Besides that, a second analysis was done on the effects of the weighting function in edge regions, flat regions and gradient regions. Through these analyses and the information provided by the traditional method, the authors used the Total Variation filter to recast the image denoising process as a minimization problem over an energy function. Thereby, after the filter application, they used the weighting function to calculate the gradient magnitude and the local variance of each pixel. After the filter application and the mathematical calculations, it was possible to analyze in detail the characteristics of different parts of an image. From this new approach, the authors concluded that the proposed filter can effectively eliminate the staircase effect of the traditional Total Variation filtering method.

In 2020, Zhang and Sun presented, in their research, a new adaptation of the BM3D algorithm for digital image denoising that does not affect intra-region and edge details [32]. In the first filtering stage, the algorithm proposed by the authors used the Anisotropic Diffusion (AD) method, along with the BM3D method, to search for similar blocks based on the vertical directions of the edges. Therefore, more concrete information could be obtained about the edges and the details of the processing effects. During the second stage, in order to calculate the diffusion coefficient function, a mathematical model based on the hyperbolic tangent function was introduced. In this way, the gradient values of the eight-direction neighborhood in the image are used for the AD filter application. Through this approach, the researchers concluded that the adaptation of the AD filter, along with the BM3D algorithm, could produce better results than the traditional BM3D method, in terms of PSNR and SSIM evaluation. In addition, it provided superior results regarding denoising, edge preservation and image detailing.

Similar to the methodology proposed by Zhang and Sun (2020), which seeks to adapt the BM3D filtering method through another type of filter, Yahya and collaborators presented, in their work, an adaptation of the BM3D method based on an adaptive filtering technique [30]. In this context, the authors proposed to divide that method into two stages, aiming at reducing Gaussian noise and preserving edges. The first part of this adaptation replaces the traditional hard-thresholding technique of BM3D with the adaptive Total Variation filtering method. Thus, the Total Variation filter is applied to image areas that contain slight noise, in contrast to the traditional hard-thresholding technique used in areas of high noise. This adaptation allows a high performance regarding denoising and edge preservation. The second part of this stage uses the calculation of an adaptive weight function and the k-means clustering technique to compute the spatial distance between a reference patch and its candidates. Consequently, using the adaptive Total Variation filter, the adaptive weight function and the k-means clustering technique, and evaluating through the PSNR and SSIM metrics, the authors verified the superiority of this method in comparison to the traditional BM3D method.

In 2021, Salehi and Vahidi demonstrated, in their study, a new hybrid filtering method, which is composed of three stages and three filters for image denoising [26]. In this scenario, the combination of the three filtering methods used by the authors was based on the Wiener, Bilateral and Wavelet filters. The first stage of the denoising process obtains the coefficient of variation and applies the fuzzy c-means technique to classify the image regions. Then, the second stage consists of the combination and application of denoising filters, namely the Bilateral filter for homogeneous regions, and the Wiener and Wavelet filters for regions that contain details and edges. In the third and last stage, the resulting image is evaluated through a fuzzy logic approach. Through this three-stage approach, the authors concluded that the combination of the three filtering methods was able to outperform other methods that exist in the literature. Besides that, this method could preserve important details and edges in the image.

In 2021, Gupta and Lamba proposed, in their research, two new modifications to be applied to the traditional Anisotropic Diffusion filtering model [23]. The proposals made by the authors extend the traditional Anisotropic Diffusion filter with a new diffusion coefficient and a new image-dependent threshold. The new diffusion coefficient model relied on the tangent sigmoid function, so that the convergence rate of the function was faster. Regarding the new threshold, a weighted absolute mean deviation of the gradient of each processed image was calculated. The researchers concluded that the proposed method demonstrated a higher performance in denoising and edge preservation, besides effectively suppressing the staircase effects and blurred edges of the traditional anisotropic diffusion filtering method.

Recently, Kundu and collaborators presented, in their article, a new way of evaluating the intensity value and edge retention of the NLM filter, using a genetic algorithm [15]. In this scenario, instead of applying the weighted average calculation to the neighboring pixels, the authors used genetic algorithm techniques to choose the most relevant pixels in the local neighborhood, through the introduction of new intensity values. The selection of these significant pixels aided denoising and also improved the filtering process in terms of edge preservation and the evaluation of a noise-free intensity value. Through this new adaptation of the NLM filter and an empirical analysis, the authors demonstrated that the proposed filter exceeds the traditional NLM filter.

Therefore, the methodology proposed in this article aims to improve the quality of Gaussian noise reduction at a considerable level in comparison to the attempts made in previous works. Besides that, the previous efforts provided several results for different variances of Gaussian, Salt and Pepper, and Poisson noises, while the method proposed in this article was applied and analyzed for additive Gaussian noise. An exhaustive experimental approach was used in order to show the efficiency of the information theory metrics when applied to the proposed Dual NLM method.

3 Information theoretic distances

In the literature, the metrics of information theory have been applied in several works and have been successful in the areas of Mathematics and Statistics to quantify the similarity level between random variables. Among the various metrics that exist in information theory, in the context of this article the Kullback-Leibler divergence, the Bhattacharyya distance, the Hellinger distance and the Cauchy-Schwarz divergence are used and analyzed.

3.1 The Kullback-Leibler divergence

The first metric used in this work is the Kullback-Leibler (KL) divergence. The KL divergence, also known as Relative Entropy, was initially proposed in the article On Information and Sufficiency, in 1951, by Kullback and Leibler [14]. In this context, the KL method aims to calculate the divergence between two probability distributions (or relative frequencies). The KL divergence can be expressed by the following equation:

$$ D_{KL}\left (p, q \right )= \int log\frac{p\left (x \right )}{q\left (x \right )} \, d\mu_{1}\left (x \right ) = \int p\left (x \right ) log\frac{p\left (x \right )}{q\left (x \right )} \, d\lambda \left (x \right ) $$
(1)

In which the parameters p and q denote the discrete probability distributions of a random variable X, determined from the sample x, and λ denotes the reference measure.

Given the univariate Gaussian context, it is possible to compute the symmetrized KL divergence as:

$$ D_{KL}^{sym}\left (p,q \right ) = \frac{1}{2}\left [ D_{KL}\left (p,q \right )+D_{KL}\left (q,p \right ) \right ] \\ =\frac{1}{4}\left [\frac{{\sigma_{1}^{2}} + \left (\mu_{1} -\mu_{2} \right )^{2} }{{\sigma_{2}^{2}}}+\frac{{\sigma_{2}^{2}} + \left (\mu_{1} -\mu_{2} \right )^{2} }{{\sigma_{1}^{2}}} -2 \right ] $$
(2)

Likewise, (2) is summarized as:

$$ D_{KL}^{sym}\left (p,q \right ) = \frac{1}{4{\sigma_{1}^{2}}{\sigma_{2}^{2}}} \left [ \left ({\sigma_{1}^{2}}-{\sigma_{2}^{2}} \right )^{2} + \left (\mu_{1}-\mu_{2} \right )^{2}\left ({\sigma_{1}^{2}}+{\sigma_{2}^{2}} \right ) \right ] $$
(3)

In which the parameters σ and μ denote the variances and non-local means of the distributions, respectively.
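As a sanity check, the closed form of (3) can be implemented directly; the function name `sym_kl_gaussian` and the choice of passing variances (rather than standard deviations) as arguments are illustrative assumptions, not part of the original formulation:

```python
import math

def sym_kl_gaussian(mu1, var1, mu2, var2):
    """Symmetric Kullback-Leibler divergence between univariate Gaussians
    N(mu1, var1) and N(mu2, var2), following Eq. (3)."""
    return ((var1 - var2) ** 2 + (mu1 - mu2) ** 2 * (var1 + var2)) / (4.0 * var1 * var2)
```

Note that the expression is symmetric in its arguments and vanishes when the two distributions coincide.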

3.2 The Cauchy-Schwarz divergence

The second metric used in this work is the Cauchy-Schwarz divergence. Based on probability theory, the information-theoretic metrics most frequently applied in the literature are the Kullback-Leibler divergence and the Rényi divergence [7, 9, 10, 17]. However, the Kullback-Leibler and Rényi divergences do not allow fast and efficient computation in applications involving object classification and statistical recognition. In this scenario, the Cauchy-Schwarz divergence arises, which has a closed-form analytical expression for a mixture of Gaussians (MoG) and enables fast and efficient calculations in applications in the computational vision and object classification areas.

The Cauchy-Schwarz divergence for two densities of random vectors \(p\left (x \right )\) and \(q\left (x \right )\) is defined as:

$$ \begin{array}{cccc} D_{CS}\left (p, q \right ) =-log \left (\frac{ \int p\left (x \right ) q\left (x \right ) dx}{ \sqrt{\int p\left (x \right )^{2}dx \int q\left (x \right )^{2}dx}}\right ) \\ = \frac{1}{2} log\left (\int p\left (x \right )^{2}dx \right )+ \frac{1}{2} log\left (\int q\left (x \right )^{2}dx \right ) - log \left (\int p\left (x \right )q\left (x \right )dx \right ) \end{array} $$
(4)

in which the parameters p and q represent two probability distributions and the measure is symmetric, such that \(0\leq D_{CS} < \infty \), where the minimum value is attained if and only if \(p\left (x \right ) = q\left (x \right )\).

It can be seen that, in the univariate Gaussian case, the Cauchy-Schwarz divergence can be computed by [28]:

$$ D_{CS}\left (p,q \right )= \frac{1}{2} log \left (\frac{\left ({\sigma^{2}_{1}}+{\sigma^{2}_{2}} \right )^{2}}{4{\sigma^{2}_{1}}{\sigma^{2}_{2}}} \right ) + \frac{\left (\mu_{1}-\mu_{2} \right )^{2}}{{\sigma^{2}_{1}}+{\sigma^{2}_{2}}} $$
(5)

where the parameters σ and μ represent the variances and non-local means of distributions, respectively.
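The closed form of (5) translates directly into code; the function name `cauchy_schwarz_gaussian` and the variance-based argument convention are illustrative assumptions of this sketch:

```python
import math

def cauchy_schwarz_gaussian(mu1, var1, mu2, var2):
    """Cauchy-Schwarz divergence between univariate Gaussians
    N(mu1, var1) and N(mu2, var2), following Eq. (5)."""
    log_term = 0.5 * math.log((var1 + var2) ** 2 / (4.0 * var1 * var2))
    quad_term = (mu1 - mu2) ** 2 / (var1 + var2)
    return log_term + quad_term
```

As expected, the divergence is zero when the two Gaussians coincide and grows with the separation of the means.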

3.3 The Bhattacharyya distance

The third metric used in this work is the Bhattacharyya distance. Based on the formalism of stochastic distances, the Bhattacharyya distance was originally employed in the article The Divergence and Bhattacharyya Distance Measures in Signal Selection, in 1967, by Thomas Kailath [12]. In this context, this method defines a distance between two normalized probability distributions:

$$ D_{BC}\left (p, q \right ) = -ln \left (\sum\limits_{x = 1}^{N}\sqrt{p\left (x \right ) q\left (x \right )}\right ) $$
(6)

in which the parameters p and q represent the normalized probability distributions and N the number of bins of the distributions. In addition, the Bhattacharyya distance is bounded as \(0\leq D_{BC}\left (p, q \right ) \leq \infty \).

Furthermore, in the univariate Gaussian case, the Bhattacharyya distance can be computed by:

$$ D_{Bhat}\left (p,q \right ) = -ln BC\left (p,q \right ) $$
(7)

in which BC(p,q) is the Bhattacharyya coefficient, in the Gaussian case given by:

$$ BC\left (p,q \right ) = \sqrt{\frac{2\sigma_{1}\sigma_{2}}{{\sigma_{1}^{2}}+{\sigma_{2}^{2}}}}exp\left \{ -\frac{1}{4} \frac{\left (\mu_{1} -\mu_{2} \right )^{2}}{{\sigma_{1}^{2}}+{\sigma_{2}^{2}}} \right \} $$
(8)

in which the parameters σ and μ denote the variances and non-local means of the distributions, respectively.
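Equations (7) and (8) can be sketched as follows; the function names and the variance-based argument convention are illustrative assumptions (note that σ1σ2 in (8) corresponds to the square root of the product of the variances):

```python
import math

def bhattacharyya_coeff(mu1, var1, mu2, var2):
    """Bhattacharyya coefficient BC(p, q) for univariate Gaussians, Eq. (8)."""
    return math.sqrt(2.0 * math.sqrt(var1 * var2) / (var1 + var2)) * \
        math.exp(-0.25 * (mu1 - mu2) ** 2 / (var1 + var2))

def bhattacharyya_distance(mu1, var1, mu2, var2):
    """Bhattacharyya distance, Eq. (7): D_Bhat = -ln BC(p, q)."""
    return -math.log(bhattacharyya_coeff(mu1, var1, mu2, var2))
```

For identical distributions the coefficient is 1 and the distance is 0.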

3.4 The Hellinger distance

The fourth and last metric applied in this work is the Hellinger distance. Its origin dates back to 1907, with the German mathematician Ernst David Hellinger, and it presents a mathematical formulation, different from the Riemann integral, to measure the distance between discrete probability distributions. Besides its application to distance calculation, the Hellinger measure is classified both as a metric and as a divergence [5]. The Hellinger distance is defined as:

$$ D_{H}\left (p, q \right ) = \frac{1}{\sqrt{2}}\left [ {\sum}_{x \in {\Omega}} \left (\sqrt{p\left (x \right )} - \sqrt{q\left (x \right )}\right )^{2} \right ]^{\frac{1}{2}} $$
(9)

in which the parameters p and q represent probability distributions over a countable space Ω. In this case, the Hellinger distance is restricted to \(0 \leq D_{H} \left (p, q \right ) \leq 1\). Consequently, when the Hellinger distance equals 0, there is no divergence; on the other hand, if it equals 1, the probability distributions do not share a common support.

In the univariate Gaussian case, the Hellinger distance can be computed by:

$$ D_{Hel}\left (p,q \right ) = 1 - BC\left (p,q \right ) $$
(10)

in which the function BC(p,q) denotes the Bhattacharyya coefficient, given by (8).
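A minimal sketch of (10), with the Bhattacharyya coefficient of (8) inlined; the function name `hellinger_gaussian` and the variance-based arguments are illustrative assumptions:

```python
import math

def hellinger_gaussian(mu1, var1, mu2, var2):
    """Hellinger distance of Eq. (10): 1 - BC(p, q), with the Gaussian
    Bhattacharyya coefficient of Eq. (8) computed inline."""
    bc = math.sqrt(2.0 * math.sqrt(var1 * var2) / (var1 + var2)) * \
        math.exp(-0.25 * (mu1 - mu2) ** 2 / (var1 + var2))
    return 1.0 - bc
```

Since 0 < BC(p,q) ≤ 1 for Gaussians, the result stays within the bounds stated above.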

4 The proposed method

In this section, the standard filtering method is presented and compared to the method proposed in this study, in order to describe in detail the functioning of each mathematical variant. To sum up, the idea of the Non-Local Means method is to deal with Gaussian noise through a weighted average computed over non-local patches instead of a purely local neighborhood average.

Considering an additive Gaussian noise, uncorrelated and independent of the signal, the mathematical model used to describe the process of filtering is given by the following equation:

$$ y_{i} = x_{i} + n_{i} $$
(11)

in which yi denotes the noisy pixel, xi is associated with the noise-free pixel and ni is the additive Gaussian noise term. In the traditional approach, \(n_{i} \sim N(0,\sigma^{2})\) and \(x_{i} \sim N(\mu _{0},{\sigma _{0}^{2}})\) are assumed. It should be emphasized that the noise is uncorrelated, that is, \(E[n_{i}n_{j}] = \sigma^{2}\delta _{i,j}\), whereby δi,j = 1 if i = j and δi,j = 0 if i ≠ j. In this way, the goal is to recover xi from yi, given pi the ith patch.
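The degradation model of (11) can be simulated directly; the image size, the uniform stand-in for the noise-free image and the noise level σ = 20 below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 255.0, size=(64, 64))     # stand-in for a noise-free image
sigma = 20.0                                   # noise standard deviation
n = rng.normal(0.0, sigma, size=x.shape)       # n_i ~ N(0, sigma^2)
y = x + n                                      # Eq. (11): y_i = x_i + n_i
```

The residual y - x then has approximately zero mean and standard deviation σ, matching the assumed noise statistics.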

4.1 Non-Local Means

The traditional Non-Local Means (NLM) filtering method was proposed in 2005, by Buades, Coll and Morel, as a method applied to the denoising of additive Gaussian noise in images [3]. In this scenario, this filter scans the whole image searching for similar pixels by using the concept of similarity measurement between patches, as shown in Fig. 1 [16].

Fig. 1

Process performed by the Non-Local Means filter employing the concept of measuring similarity between image patches using weighted averaging

Thereby, when the Non-Local Means filtering method is applied to a noisy image y = {yi, i 𝜖 I}, the estimate of the noise-free pixel xi, denoted by NLM[x](i), is computed as a weighted average of all the pixels in the image, expressed by:

$$ NLM\left [ x \right ]\left (i \right ) = \sum\limits_{j \in f} w\left (i,j \right ) x_{j} $$
(12)

in which w(i,j) represents the weights assigned to the similarity between pixels i and j, meeting the conditions 0 ≤ w(i,j) ≤ 1 and \({\sum }_{j} w(i,j)=1\). In this way, the similarity between pixels i and j is given by the similarity of the gray-level intensity vectors x(Ni) and x(Nj), where the parameter Nk represents a patch centered at pixel k. The weight \(w\left (i,j \right )\) of the traditional NLM filter in (12) is then defined as:

$$ w\left (i,j \right ) =\frac{1}{Z\left (i \right )}exp\left \{ -\frac{d_{E}\left (i,j \right )}{h^{2}} \right \} =\frac{1}{Z\left (i \right )}exp\left \{ -\frac{\left \| x(N_{i}) - x(N_{j})\right \|^{2}_{2,\tau}}{h^{2}} \right \} $$
(13)

in which τ > 1, h2 is a parameter that controls the degree of smoothing, and the constant \(Z\left (i \right )\) is a normalization function given by:

$$ Z(i) = \sum\limits_{j} e^{- \frac{ \left \| x(N_{i}) - x(N_{j})\right \|^{2}_{2,\tau}}{h^{2}}} $$
(14)

Hence, the sum expressed by (14) does not involve all the image pixels, but only those that belong to a search window of t × t size, as a way to reduce the computational cost and make the method viable.
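A minimal, unoptimized sketch of (12)-(14) for a single pixel may help fix the notation; the function name `nlm_pixel`, the reflect padding at borders and the flat (unweighted) patch distance in place of the Gaussian-weighted norm with parameter τ are simplifying assumptions of this illustration:

```python
import numpy as np

def nlm_pixel(img, i, j, f=3, t=21, h=10.0):
    """Estimate one pixel with the classical NLM filter: a weighted average
    over a t x t search window, with weights driven by the squared Euclidean
    distance between f x f patches (Eqs. (12)-(14), flat patch kernel)."""
    r, s = f // 2, t // 2
    pad = np.pad(img, r + s, mode="reflect")
    ci, cj = i + r + s, j + r + s                      # centre in padded coords
    ref = pad[ci - r:ci + r + 1, cj - r:cj + r + 1]    # reference patch N_i
    num, Z = 0.0, 0.0
    for u in range(ci - s, ci + s + 1):
        for v in range(cj - s, cj + s + 1):
            cand = pad[u - r:u + r + 1, v - r:v + r + 1]
            d = np.sum((ref - cand) ** 2)              # ||x(N_i) - x(N_j)||^2
            w = np.exp(-d / h ** 2)                    # unnormalized Eq. (13)
            num += w * pad[u, v]
            Z += w                                     # normalization, Eq. (14)
    return num / Z
```

On a constant image every patch distance is zero, all weights are equal, and the filter returns the constant unchanged.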

4.2 Dual Non-Local Means

In an attempt to overcome the main limitations of the traditional Non-Local Means filtering method, the Dual Non-Local Means method is proposed in this article. The Dual Non-Local Means filter consists of a two-stage method for the reduction of Gaussian noise, which incorporates non-Euclidean similarity measures, based on information theory, to measure the distance between patches. Works in the literature that consider other types of noise, such as Poisson noise, were successfully developed [2]. However, improvements of the original NLM filter for the Gaussian noise case are not much explored in the literature.

In this context, the conception of the Dual Non-Local Means method considers, for each i = 1,2,...,n, the mean and the variance at each pixel for the definition of the parametric vector \(\vec {\theta } = (\mu _{i},{\sigma _{i}^{2}})\). While the variance at i is locally estimated by using all the pixels inside the ith patch, the means are estimated in a non-local way, by directly applying the standard NLM filter. So, by using the output of the Euclidean NLM, represented by the non-local estimates of the parameters μi, to compute the parametric version of the weights, w(i,j) is defined as:

$$ w\left (i,j \right ) = exp\left \{ -\frac{d_{p}\left (N_{i},N_{j} \right )}{h^{2}} \right \} $$
(15)

in which dp(Ni,Nj) represents the parametric measure based on information theory.

Therefore, the idea of the Dual Non-Local Means method is to substitute the L1 norm by the Kullback-Leibler divergence, the Bhattacharyya distance, the Hellinger distance or the Cauchy-Schwarz divergence. In this way, the Dual Non-Local Means defines a double filtering process, in which the first stage, based on the Euclidean NLM, is responsible for the estimation of the model parameters, while the second stage is responsible for the computation of the parametric similarity measures. Because of that, this technique is called Dual Non-Local Means.

4.2.1 Dual Non-Local Means KL

In the context of applied mathematics for quantifying the similarity between random variables and for the improvement of multivariate data analysis processes, the usage of information theory measures is proposed, namely the KL divergence or relative entropy, as an alternative that benefits data grouping and filtering. To sum up, the Dual Non-Local Means KL filter works in the following way:

  • From a noisy image, the standard (Euclidean) NLM filter is applied to estimate the means μi in a non-local way, for i = 1,2,...,n;

  • For each pixel xi, the local variance \({\sigma _{i}^{2}}\) is estimated inside the patch Nk of f × f size, defined as:

    $$ {\sigma_{i}^{2}} = \frac{1}{f^{2}}\sum\limits_{j\epsilon p_{i}}(x_{j} - \mu_{i})^{2} $$
    (16)
  • The KL divergence is calculated between the central patch \(\hat {N_{i}}\) and each candidate patch \(\hat {N_{j}}\), by using (3);

  • Calculate the weight \(w\left (i,j \right )\) as:

    $$ w\left (i,j \right ) =\frac{1}{Z\left (i \right )}exp\left \{ -\frac{d_{KL}\left (p,q \right )}{h^{2}} \right \} $$
    (17)
  • Noise-free pixel xi is evaluated as:

    $$ NLM\left [ x \right ]\left (i \right ) = \sum\limits_{j \in f} w\left (i,j \right ) m_{j} $$
    (18)
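The steps above can be sketched for a single pixel as follows; the helper names, the reflect padding and the small regularizer `eps` that avoids division by zero on flat patches are assumptions of this illustration, not part of the original formulation:

```python
import numpy as np

def sym_kl(mu1, var1, mu2, var2, eps=1e-8):
    """Symmetric KL divergence between univariate Gaussians, Eq. (3)."""
    v1, v2 = var1 + eps, var2 + eps
    return ((v1 - v2) ** 2 + (mu1 - mu2) ** 2 * (v1 + v2)) / (4.0 * v1 * v2)

def dual_nlm_kl_pixel(y, mu, i, j, f=3, t=21, h=10.0):
    """Second stage of Dual NLM KL for one pixel (sketch): `mu` is the output
    of the standard Euclidean NLM (stage 1); local variances are estimated
    inside f x f patches, Eq. (16), and the weights follow Eq. (17)."""
    r, s = f // 2, t // 2
    ypad = np.pad(y, r + s, mode="reflect")
    mpad = np.pad(mu, r + s, mode="reflect")
    ci, cj = i + r + s, j + r + s                  # centre in padded coords

    def local_var(u, v):                           # Eq. (16)
        patch = ypad[u - r:u + r + 1, v - r:v + r + 1]
        return np.mean((patch - mpad[u, v]) ** 2)

    mu_i, var_i = mpad[ci, cj], local_var(ci, cj)
    num, Z = 0.0, 0.0
    for u in range(ci - s, ci + s + 1):
        for v in range(cj - s, cj + s + 1):
            d = sym_kl(mu_i, var_i, mpad[u, v], local_var(u, v))
            w = np.exp(-d / h ** 2)                # Eq. (17), before Z(i)
            num += w * mpad[u, v]                  # Eq. (18): average of m_j
            Z += w
    return num / Z
```

The variants of the next subsections follow the same skeleton, swapping the symmetric KL divergence for the corresponding parametric measure.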

4.2.2 Dual Non-Local Means Cauchy-Schwarz

Analogous to the Dual Non-Local Means KL method, the Dual Non-Local Means CS filter is used in the following way:

  • From a noisy image, the standard (Euclidean) NLM filter is applied to estimate the means μi in a non-local way, for i = 1,2,...,n;

  • For each pixel xi, the local variance \({\sigma _{i}^{2}}\) is estimated within the patch Nk of size f × f, defined by:

    $$ {\sigma_{i}^{2}} = \frac{1}{f^{2}}\sum\limits_{j\epsilon p_{i}}(x_{i} - \mu_{i})^{2} $$
    (19)
  • The CS divergence is calculated between the central patch \(\hat {N_{i}}\) and each candidate patch \(\hat {N_{j}}\), by using (5);

  • Calculate the weight \(w\left (i,j \right )\) as:

    $$ w(i,j) = \frac{1}{Z(i)} \exp\left\{ -\frac{d_{CS}(p,q)}{h^{2}} \right\} $$
    (20)
  • The noise-free pixel xi is evaluated as:

    $$ NLM[x](i) = \sum\limits_{j \in f} w(i,j)\, m_{j} $$
    (21)
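Since (5) is not reproduced in this section, the sketch below assumes the standard closed-form Cauchy-Schwarz divergence between two univariate Gaussian patch models N(μi, σi²) and N(μj, σj²); the function name is illustrative. This quantity replaces d_KL in the weight computation of (20).

```python
import numpy as np

def cs_gaussian(mu1, var1, mu2, var2, eps=1e-8):
    # Closed-form Cauchy-Schwarz divergence between univariate Gaussians:
    #   D_CS(p, q) = -log( int p q dx / sqrt(int p^2 dx * int q^2 dx) )
    # It is symmetric and equals zero iff the two Gaussians coincide.
    var1, var2 = var1 + eps, var2 + eps
    s = var1 + var2
    return ((mu1 - mu2) ** 2 / (2.0 * s)        # mean-separation term
            + 0.5 * np.log(2.0 * np.pi * s)     # -log of the cross term
            - np.log(2.0)                        # from the two self terms
            - 0.25 * np.log(np.pi ** 2 * var1 * var2))
```

When the two variances are equal, the expression reduces to (μi − μj)²/(2(σi² + σj²)), so the divergence is then driven purely by the mean separation.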

4.2.3 Dual Non-Local Means Bhattacharyya

Similar to the filtering process of the Dual Non-Local Means KL method, the Dual Non-Local Means Bhattacharyya filter is applied in the following way:

  • From a noisy image, the standard (Euclidean) NLM filter is applied to estimate the means μi in a non-local way, for i = 1,2,...,n;

  • For each pixel xi, the local variance \({\sigma _{i}^{2}}\) is estimated inside the patch Ni of size f × f, defined as:

    $$ {\sigma_{i}^{2}} = \frac{1}{f^{2}}\sum\limits_{j \in N_{i}}(x_{j} - \mu_{i})^{2} $$
    (22)
  • The Bhattacharyya distance is calculated between the central patch \(\hat {N_{i}}\) and each candidate patch \(\hat {N_{j}}\), by using (8);

  • Calculate the weight \(w\left (i,j \right )\) as:

    $$ w(i,j) = \frac{1}{Z(i)} \exp\left\{ -\frac{d_{BC}(p,q)}{h^{2}} \right\} $$
    (23)
  • The noise-free pixel xi is evaluated as:

    $$ NLM[x](i) = \sum\limits_{j \in f} w(i,j)\, m_{j} $$
    (24)
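As (8) is not reproduced in this section, the sketch below assumes the well-known closed-form Bhattacharyya distance between two univariate Gaussian patch models; the function name is illustrative. This quantity replaces d_BC in the weight computation of (23).

```python
import numpy as np

def bhattacharyya_gaussian(mu1, var1, mu2, var2, eps=1e-8):
    # Closed-form Bhattacharyya distance between univariate Gaussians:
    # a mean-separation term plus a variance-mismatch term; it is zero
    # when the two distributions are identical.
    var1, var2 = var1 + eps, var2 + eps
    s = var1 + var2
    return ((mu1 - mu2) ** 2 / (4.0 * s)                      # mean term
            + 0.5 * np.log(s / (2.0 * np.sqrt(var1 * var2))))  # variance term
```

Note that, unlike the KL divergence, this distance is symmetric in its two arguments, so the patch weights do not depend on which patch is taken as the reference.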

4.2.4 Dual Non-Local Means Hellinger

Analogous to the Dual Non-Local Means KL method, the Dual Non-Local Means Hellinger filter works in the following way:

  • From a noisy image, the standard (Euclidean) NLM filter is applied to estimate the means μi in a non-local way, for i = 1,2,...,n;

  • For each pixel xi, the local variance \({\sigma _{i}^{2}}\) is estimated inside the patch Ni of size f × f, defined as:

    $$ {\sigma_{i}^{2}} = \frac{1}{f^{2}}\sum\limits_{j \in N_{i}}(x_{j} - \mu_{i})^{2} $$
    (25)
  • The Hellinger distance is calculated between the central patch \(\hat {N_{i}}\) and each candidate patch \(\hat {N_{j}}\), by using (10);

  • Calculate the weight \(w\left (i,j \right )\) as:

    $$ w(i,j) = \frac{1}{Z(i)} \exp\left\{ -\frac{d_{H}(p,q)}{h^{2}} \right\} $$
    (26)
  • The noise-free pixel xi is evaluated as:

    $$ NLM[x](i) = \sum\limits_{j \in f} w(i,j)\, m_{j} $$
    (27)
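As (10) is not reproduced in this section, the sketch below assumes the standard closed-form squared Hellinger distance between two univariate Gaussian patch models; the function name is illustrative. This quantity replaces d_H in the weight computation of (26).

```python
import numpy as np

def hellinger_gaussian(mu1, var1, mu2, var2, eps=1e-8):
    # Closed-form squared Hellinger distance between univariate Gaussians.
    # Unlike the KL or CS divergences, it is bounded in [0, 1], which keeps
    # the exponential weight exp(-d_H / h^2) well behaved.
    var1, var2 = var1 + eps, var2 + eps
    s = var1 + var2
    bc = (np.sqrt(2.0 * np.sqrt(var1 * var2) / s)
          * np.exp(-(mu1 - mu2) ** 2 / (4.0 * s)))  # Bhattacharyya coefficient
    return 1.0 - bc
```

The boundedness means the weights never collapse to exactly zero, so even very dissimilar patches retain a small influence unless h is made very small.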

5 Experiments and results

In order to test and evaluate the performance of the Dual NLM filtering method when applied to the denoising of digital images, a set of seven 8-bit grayscale images of 512 × 512 pixels was used: Airplane, Barbara, Camera, Car, House, Lena and Peppers. The set of images was taken from the dataset: https://sipi.usc.edu/database/. All the images taken from the USC Image Database are intended for research purposes.

In this scenario, the performance of the Dual NLM filter was compared to other filters from the literature: the usual Wiener filter, NLM, the Bilateral filter, Total Variation, Wavelet, Anisotropic Diffusion and BM3D.

In order to compare the different methods, the PSNR metric was selected, which evaluates the peak signal-to-noise ratio between a reference image and its filtered version. The higher the resulting value of this index, the better the result of the applied filter. All the images used in this article were degraded by additive Gaussian noise with σ = 10.
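For reference, the PSNR used throughout the tables can be computed as follows; this is a minimal sketch of the standard definition for 8-bit images (the function name is illustrative):

```python
import numpy as np

def psnr(reference, filtered, peak=255.0):
    # Peak signal-to-noise ratio in decibels: 10 * log10(peak^2 / MSE).
    # Higher values mean the filtered image is closer to the reference.
    diff = np.asarray(reference, dtype=float) - np.asarray(filtered, dtype=float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

Identical images yield an infinite PSNR, and each halving of the root-mean-square error adds about 6 dB.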

Table 1 presents the results of the evaluation done with the PSNR metric for the seven different images, considering one information-theoretic model per column.

Table 1 PSNRs obtained after filtering with the parametric NLM filter (Kullback-Leibler, Bhattacharyya, Hellinger and Cauchy-Schwarz) for images corrupted by Gaussian noise with σ = 10

When analyzing the results presented in Table 1, the Dual-NLM Cauchy-Schwarz filtering method showed the highest final average when compared to the other filtering methods. However, there were also situations in which the Dual-NLM Kullback-Leibler filtering method produced satisfactory results.

Table 2 presents the results of the evaluation done with the PSNR metric for the seven different images and three different input parameter values applied to the calculation of the KL divergence.

Table 2 PSNR obtained after filtering with the standard NLM filter and parametric NLM Kullback-Leibler filter for images corrupted by Gaussian noise with σ = 10

Through the analysis of the results shown in Table 2, the Dual-NLM KL filtering method presented, for most of the images, the best result in reducing Gaussian noise. However, the standard NLM filtering method also produced satisfactory results.

Furthermore, Table 3 shows the results of the evaluation done with the PSNR metric for the seven different images and three different input parameter values applied to the calculation of the Bhattacharyya distance.

Table 3 PSNR obtained after filtering with the standard NLM filter and parametric NLM Bhattacharyya filter for images corrupted by Gaussian noise with σ = 10

By analyzing the results presented in Table 3, the Dual-NLM Bhattacharyya filtering method showed the best result for the reduction of Gaussian noise.

Table 4 provides the results of the evaluation done with the PSNR metric for the seven different images and three different input parameter values applied to the calculation of the Cauchy-Schwarz divergence.

Table 4 PSNR obtained after filtering with the standard NLM filter and parametric Cauchy-Schwarz NLM for images corrupted by Gaussian noise with σ = 10

Analyzing the results presented in Table 4, the Dual-NLM Cauchy-Schwarz filtering method achieved the best result in reducing Gaussian noise.

Table 5 shows the results of the evaluation done with the PSNR metric for the seven different images and three different input parameter values applied to the calculation of the Hellinger distance.

Table 5 PSNR obtained after filtering with the standard NLM filter and parametric Hellinger NLM filter for images corrupted by Gaussian noise with σ = 10

Analyzing the results presented in Table 5, the Dual-NLM Hellinger filtering method showed the best result for the reduction of Gaussian noise when compared to the traditional NLM method.

Table 6 shows the results of the evaluation done with the PSNR metric for the seven images, considering the BM3D, Bilateral, Wiener, Wavelet, Anisotropic Diffusion and Total Variation filters.

Table 6 BM3D, Bilateral, Wiener, Wavelet, Anisotropic Diffusion and Total Variation filters, for images corrupted by Gaussian noise with σ = 10

Analyzing the results presented in Table 6, the BM3D and Wiener filtering methods showed the best results for the reduction of Gaussian noise in comparison to the other methods.

To illustrate the difference between these methods, Figs. 2, 3, 4, 5, 6, 7 and 8 show the visual results for the Airplane, Barbara, Camera, Car, House, Lena and Peppers images. From them, it is possible to observe a significant difference in the level of residual noise in the images filtered by the Dual NLM Cauchy-Schwarz method. Furthermore, the Cauchy-Schwarz and Kullback-Leibler variants of the non-local divergence offer a better trade-off between denoising and edge preservation.

Fig. 2

Example image result (a), noisy image (b), Dual NLM Cauchy-Schwarz filter (c), BM3D filter (d), Bilateral filter (e), Wiener filter (f), Wavelet filter (g), Anisotropic Diffusion filter (h) and Total Variation filter (i)

Fig. 3

Example image result (a), noisy image (b), Dual NLM Cauchy-Schwarz filter (c), BM3D filter (d), Bilateral filter (e), Wiener filter (f), Wavelet filter (g), Anisotropic diffusion filter (h) and Total Variation filter (i)

Fig. 4

Example image result (a), noisy image (b), Dual NLM Cauchy-Schwarz filter (c), BM3D filter (d), Bilateral filter (e), Wiener filter (f), Wavelet filter (g), Anisotropic Diffusion filter (h) and Total Variation filter (i)

Fig. 5

Example image result (a), noisy image (b), Dual NLM Kullback-Leibler filter (c), BM3D filter (d), Bilateral filter (e), Wiener filter (f), Wavelet filter (g), Anisotropic Diffusion filter (h) and Total Variation filter (i)

Fig. 6

Example image result (a), noisy image (b), Dual NLM Kullback-Leibler filter (c), BM3D filter (d), Bilateral filter (e), Wiener filter (f), Wavelet filter (g), Anisotropic Diffusion filter (h) and Total Variation filter (i)

Fig. 7

Example image result (a), noisy image (b), Dual NLM Kullback-Leibler filter (c), BM3D filter (d), Bilateral filter (e), Wiener filter (f), Wavelet filter (g), Anisotropic Diffusion filter (h) and Total Variation filter (i)

Fig. 8

Example image result (a), noisy image (b), Dual NLM Kullback-Leibler filter (c), BM3D filter (d), Bilateral filter (e), Wiener filter (f), Wavelet filter (g), Anisotropic Diffusion filter (h) and Total Variation filter (i)

From the results obtained in Tables 1 and 6, it is possible to observe that both the Cauchy-Schwarz and KL filters reached high PSNR values. In this context, in most of the results, the use of the Dual NLM Cauchy-Schwarz filter proved satisfactory for application to images degraded by Gaussian noise. However, the PSNR results of the Dual NLM KL filter were also significant, comparing favorably with the traditional NLM and the other filtering methods applied in this work.

6 Conclusions and final remarks

The process of denoising images degraded by Gaussian noise is a challenging task in the computer vision area, since many recent image filtering methods, whether based on spatial-domain or frequency-domain functions, are not efficient. Filtering methods based on the spatial domain are usually the best option for dealing with Gaussian noise. Given this, in this article a Dual Non-Local Means filter was presented, which combines characteristics of order classification, non-local strategies and mathematical models from information theory.

In this scenario, the Dual Non-Local Means filtering method can be considered an extension of the NLM filter philosophy to the problem of images degraded by Gaussian noise. The variants of the mathematical metrics based on information theory unify two distinct but essential behaviors for dealing with Gaussian noise, combining the Non-Local Means and the Dual Non-Local Means filtering approaches. Several computational experiments were carried out during this work, with multiple digital images degraded by Gaussian noise, which showed that the proposed method can generate, on average, significantly better outcomes in terms of PSNR when compared to the standard NLM, Total Variation, BM3D, Anisotropic Diffusion, Wiener, Wavelet and Bilateral filters.

Finally, future work may include the use of different families of entropy, such as Rényi's and Sharma-Mittal's entropies. Dimensionality reduction methods, for example PCA, can be used to obtain a more compact and meaningful representation of the patches inside the search window. Besides that, methods like Parametric PCA, ISOMAP and Laplacian Eigenmaps can be applied before the calculation of the Euclidean distances as a way of obtaining better similarity measures.