Dual Non-Local Means: a two-stage information-theoretic filter for image denoising

de Brito, André R.; Levada, Alexandre L. M.

doi:10.1007/s11042-023-15339-4

Dual Non-Local Means: a two-stage information-theoretic filter for image denoising

Published: 20 May 2023

Volume 83, pages 4065–4092, (2024)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Multimedia Tools and Applications Aims and scope Submit manuscript

Dual Non-Local Means: a two-stage information-theoretic filter for image denoising

Download PDF

252 Accesses
Explore all metrics

Abstract

Image denoise has been explored with the development of various filters used to remove or reduce random disruptions on observed data, but at the same time it preserves most of the edges and the fine details of the scene. The issue caused by the combined deterioration of the Gaussian noise succeeds the scattering through all the signal frequencies. Thus, the most effective filters for this type of noise are implemented in spatial domain. In this article, we proposed a Non-Local Means filter that combines the average of each fragment of a browser window, by using four measures – of distinct similarities – among the Gaussian densities that are estimated from the following fragments: the Kullback-Leibler divergence, the Bhattacharyya distance, the Hellinger distance and the Cauchy-Schwarz divergence. Computational experiments were done in a set of 7 images that were deteriorated by a noise of Gaussian type, considering that the data obtained show that the proposed methods are capable of producing, on average, a Peak Signal-to-Noise Ratio significantly greater than the one the combination of Total Variation, Non-Local Means, BM3D, Anisotropic Diffusion, Wiener, Wavelet e Bilateral filters does when they are applied independently.

A Nonlocal Image Denoising Algorithm Using the Structural Similarity Metric

Article 01 July 2019

Bounded Non-Local Means for Fast and Effective Image Denoising

Filter Parameter Estimation in Non-Local Means Algorithm

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Image denoise on digital images is a crucial process in the stage of image preprocessing on applications that envolve the areas of image processing, computational vision and pattern recognition. This stage is characterized by a simple math operation of softening or even the localization and recognition of objects. Thereby, the process of restoration of noisy images is a topic of scientific interest for great part of the research community in the areas of signal processing and computational vision. In summary, the main goal of denoising is to minimize and/or to reduce random disruptions of a signal in an image, in order to preserve the maximum amount of relevant information subsequent stages.

In the image processing area, the concept of quadratic noise is flagged as a random variation of intensity information, which occurs during the process of acquisition, recording, processing and transmition of images [1]. In this context, certain types of noise present some characteristics about the intensity of the degraded pixels, a fact that randomly leads to black and white spots, and image blurring, or they assume random variations in their parameters both in the spatial domain and in the frequency domain, which causes and compromises the ability of visual interpretation of an image.

As a result, the methods of image filtering exist in order to reduce the effects caused by the noises and they are usually classified into two categories: (1) in the spatial domain; (2) in the frequency domain [8]. The filtering methods that work in the spatial domain operate directly over the matrix of the image intensity, through operations of convolution with a mask [27]. n the other hand, the filtering methods that operate in the frequency domain are based on the modification of the Fourier transform of the image [31].

In this research, only filtering methods that are on the spatial domain are used: Wiener [13], Total Variation [25], Anisotropic Diffusion [21], Wavelets [18], Bilateral [29], Non-Local Means (NLM) [3] and BM3D [4]. The main similarity between these image filtering methods in the spatial domain is that they normally operate through the convolution process with a mask over the matrix of pixel/image intensity or through calculations of the distance between patches and pixels.

Among the many filtering methods that were mentioned above, the Wiener filter is considered one of the classical methods in the literature of image filtering, which is applied in the process of reduction of different types of signal dependent noises [13]. This filter maps the image and its noise on random variables, in order to find an estimate between the reference image and the filtered image, in such a way that the mean square error (MSE) between them is minimized. Likewise, this filter is excellent in terms of estimating the minimum linear mean square error in the process of filtering and in the softening of image noises.

Meanwhile, the Total Variation filter is an optimized algorithm that is applied for the noise reduction in images [25]. To sum up, the approach adopted by this method solves the problem of denoising in images through a mathematical modeling in an optimized way, while aiming to reduce the signal-noise ration (SNR) of a reference image with a term defined by the magnitude of the absolute gradient of the image.

Regarding the Anisotropic Difussion, it is an adaptative and non-linear method based on partial differential equations (PDE) for an equation of heat diffusion, in which the diffusion coefficient is a function of the image gradient [21]. In short, the conception of this method is to apply the convolution of an original image with a Gaussian core in different scales (variance). In this way, the result of this convolution allows to obtain blurred images in multiple resolutions, an intra-region softening and the preservation of the picture edge.

The Bilateral filter is also an adaptative, non-line filtering method that uses a mathematical modeling to substitute the intensity of each pixel by a weighted average of its neighboring pixels [29].The results of the wrights obtained by the weighted average are defined in terms of two local differences: spatial differences calculated by the Euclidian distance, and radiometric differences between the central pixels and their neighboring ones. Consequently, this method allows to preserve the edges and to reduce the noise in uniform regions of an image.

In addition, the Wavelet filter is a filtering method that is applied to decompose and represent an orthogonal multi-resolution sign [18].The Wavelet representation is between the spatial and the frequency domains. In the spatial domain, a transformed Wavelet is applied in order to obtain the coefficients in the Wavelets sparse domain, using a Wavelet base. In this domain, the noise slightly degrades all the coefficients, in a way that it changes zero coefficients into nonzero coefficients. Like that, according to a threshold T, lower coefficients must be defined for zero, while higher coefficients are attenuated or they remain unchanged. To sum up, the key-point is to determine the value of T.

Furthermore, the BM3D filter presents the prefiltration and post-filtration stages [4]. In the prefiltration stage, non-local techniques and the transformed Wavelet are applied to decompose the image into fragments, then they are grouped according to their similiarity. The post-filtration stage is charachterized by the usage of the Wiener filter in order to minimize the edge effects and the denoising on the image.

Great part of the theoretical background of this article is related to the Non-Local Means (NLM) filtering method. The NLM filter is a technique applied to deal with the reduction of additive noise of Gaussian type [3]. This method is based on the fact that digital images have characteristics that are repeated in the image not only in regions, but also in a global way. In this way, using the Euclidian distance as a measure of similarity, this method aims to find the estimated value of the intensity of each pixel in a certain region in the image.

In the last decades, different methodologies and mathematical models have been reported in the history of the filtering methods art to deal with issues regarding Gaussian denoising in digital images [6, 19, 20, 24]. In this context, the conception of the classic filtering method called NLM is based on the calculation of the Euclidian distance between patches, which is appropriated for images that have additive white Gaussian noise (AWGN). Therefore, an adaptative method is necessary to eliminate the different types of noise in digital images, considering the various types of intensity distribution.

Hence, the method that is proposed in this article suggests the development of a new adaptative filtering method of images that aims to reduce the Gaussian noise in a way that it individually combines the NLM filter with four different divergences of the information theory, in their variations. In order to measure the similarity between patches in the same browser window and based on Levada’s work to spread the NLM filter to the Gaussian noise [16], the following divergences are used: (1) the Kullback-Leibler divergence; (2) the Bhatthacharyya distance; (3) the Hellinger distance; and (4) the Cauchy- Schwarz divergence. Like that, the capacity of the proposed method can be relevant in the processing of different image types, such as the tomographic and hyperspectral images. Besides that, this method can be extended to other types of degradation, like the speckle noise and non-Gaussian noises.

The main contributions of this research are: 1) it was proposed a new filtering method, called Dual Non-Local Means, which acts as an extension to the traditional Non-Local Means filter, and it is applied in images degraded by Gaussian noises; 2) the L1 norm of the Dual Non-Local Means filter was used and the stochastic distances of the Kullback-Leibler divergence, Bhatthacharyya distance, Hellinger distance and Cauchy-Schwarz divergence were used as a measurement of similarity; and 3) the results obtained through the 7 different images degraded by a Gaussian noise indicate that the proposed method can produce superior outcomes comparing to the following filters: Wiener, Total Variation, Anisotropic Diffusion, Wavelets, Bilateral, Non-Local Means (NLM) and BM3D. That comparison was done in a quantitative way, by applying the Peak Signal-to-Noise Ratio (PSNR) metrics for analysis of mean, minimum and maximum regarding the results that were obtained through the other filters that were compared.

This article is organized in the following way: Section 2 is meant to describe works related to this research, by presenting the relation between this study and the various important and relevant methods for the image filtering area, such as the Wiener, Bilateral, Wavelet, Total Variation, Aniostropic filters and, specially, the Non-Local Means and the BM3D filters. Section 3 describes the mathematical formulation of the stochastic distances of the Kullback-Leibler divergence, Bhatthacharyya distance, Hellinger distance and Cauchy-Schwarz divergence, when applied to the Non- Local Means filtering model. Section 4 detailes the proposed method: Non-Local Means using the KL divergence, Non-Local Means using the Bhatthacharyya coefficient, Non-Local Means using the Hellinger distance and Non- Local Means using the Cauchy-Schwarz divergence. Section 5 shows the experiments done and the results obtained in terms of os the PSNR quantitative metrics and the qualitative metrics. Finally, Section 6 shows the conclusions, final considerations and some directions for future works.

2 Related work

The literature on the image filtering methods is really long, therefore an extensive literary review assisted in the scope of this article. In this respect, this section determines the relation between this work and other important and recent methods in the area of image filtering, such as the Wiener, Bilateral, Wavelet, Anisotropic Diffusion, Total Variation, BM3D filters and, in special, the Non- Local Means filter.

In 2020, Petkova and Draganov presented, in their article, a proposal of application of the Wiener filtering method on digital images that denote an unknown level of Gaussian noise [22]. In this context, the variance of Gaussian noise found on images was caused by the distribution of intensities in homogenous areas contained in the image. Thus, the simplest method of the Wiener filter for denoising was applied when obtaining the results of the noisy images. In this way, the authors conducted an extensive analysis on the influence of the size of the filter Wiener mask regarding the variance of the noise. Consequently, the results of that influence led to the conclusion that the usage of the Wiener adaptative filter is efficient, in terms of general (PSNR) and structural (SSIM) preservation.

In 2020, Jin and Luan revealed, in their study, a new approach for the denoising in digital images, based on the Total Variation filtering method and on the weighting function [11]. The approach proposed by the authors initially analyzes the ladder effect caused by the traditional Total Variation filter. Besides that, a second analysis was done, based on the effects of the weighting function in edge regions, in flat regions and in gradient regions. Through these analyses and the information provided by the traditional method, the authors used the Total Variation filter to modify the process of image denoising in a problem of minimization of the energy function. Thereby, after the filter application, they used the weighting function to calculate the gradient magnitude and the value of the local variance of each pixel. By the end of the filter application and after the mathematical calculus, it was possible to analyze in detail the characteristics of different parts of an image. From this new approach, the authors concluded that the Total Variation filter can effectively extinguish the ladder effect of the traditional Total Variation filtering method.

In 2020, Zhang and Sun showed, in their research, a new adaptation of the BM3D algorithm for the denoising in digital images, without affecting the the intra-region and edge details [32]. In the first filtering stage, the algorithm proposed by the authors used the Anisotropic Diffusion (DA) method, along the BM3D method, to search for similar blocks based on the vertical directions of the edge. Therefore, more concrete information could be obtained about the edges and the details of the processing effects. During the second stage, in order to calculate the function of the diffusion coefficient, a mathematic model based on the hyperbolic tangent function was introduced. In that way, the values obtained by the gradient of the neighboring of the eight directions in the image are used for the AD filter application. Through this approach, the researchers concluded that the adaptation of the AD filter, along the BM3D algorithm, could promote better results than the traditional BM3D method, in terms of PSNR and SSIM evaluation. In addition, it provided superior results regarding denoising, edge preservation and image detailing.

Similar to the methodology proposed by Zhang and Sun (2020), which seeks to adapt the BM3D method of filtering noises through another type of filter, Yahya and collaboratores presented, in their work, a model of adaptation of the BM3D method, done through an adaptative filtering technique [30]. In this context, the authors proposed to divide that method into two stages, aiming at reducing the Gaussian noise and at preserving the edge. The first part of this adaptation seeks to replace the traditional hard-thresholding technique that is on BM3D by the Total Variation method of adaptative filtering. Like that, the Total Variation filter is applied on image areas that contain slight noises, in contrast to the traditional hard-thresholding technique used in areas of high noises. This adaptation that was proposed by the authors allows a high performance regarding denoising and edge preservation. Thus, the second part of this stage uses the calculus of the adaptive weight function and the k-means clustering technique to calculate the spatial distance between a reference patch and its candidates. Consequently, using the Total Variation adaptative filter, the adaptive weight function and the k-means clustering technique, as well as through the PSNR and SSIM metrics, the authors noticed the superiority of this method in comparison to the traditional BM3D method.

In 2021, Salehi and Vahidi demonstrated, in their study, a new method of hybrid filtering, which is composed by three stages and three filters for image [26]. In this scenario, the combination of the three filtering methods used by the authors was based on the Wiener, Bilateral and Wavelet filters. In this way, the first stage of the process to denoise concerns to obtain the coefficient of variation and to apply the fuzzy c-means technique to classify the image regions. Then, the second stage consists of the combination and application of denoising filters, being them the Bilateral filter for homgenous regions, and the Wiener and Wavelet filters in regions that contain details and edges. In the third and last stage, the resulting image is evaluated through the logic fuzzy approach. Through the approach of the three named stages, the authors concluded that the combination of the three filtering methods was able to overcome other methods that exist in literature. Besides that, this method could preserve important details and edges on the image.

In 2021, Gupta and Lamba made evident, in their research, two new guidelines to be applied in the traditional Anisotropic Diffusion filtering model [23]. The proposals made by the authors include the traditional Anisotropic Diffusion filter, which is based on a new coefficient of diffusion and on a new threshold, depending on the image. The new model of diffusion coefficiente relied on the function of the tangent sigmoid, so that there was a greater speed in the rate of the function convergence. Regarding the new threshold, a mathematical calculation of weighted absolute mean deviation of the gradient of each processed image was done. So, the researchers concluded that the proposed method demonstrated a higher performance in denoising and edge preservation, besides effectively supplying the ladder effects and blurred edges in relation to the traditional anisotropic diffusion filtering method.

Recently, Kundu and collaborators presented, in their article, a new way of evaluating the value of intensity and retention of edges of NLM filter, using a genetic algorithm [15]. In this scenario, instead of applying the calculation of the weighted average to find the neighboring pixels, the authors used some techniques of genetic algorithms to choose pixels that were more relevant in the local neighborhood, through the introduction of new intensity values. By doing that, the selection of these significant pixels aided in denoising and also improved the process of filtering in terms of edge preservation and evaluation of an intensity value that is considered deprived of noise. Through this new adaptation of the NLM filter and doing an empirical analysis, the authors demonstrated that the proposed filter exceeds the traditional NLM filter.

Therefore, the methodology that is proposed in this article aims to improve the quality of the method of Gaussian noises reduction at a considerable level in comparison to the other attempts done on previous works. Besides that, the previous efforts provided several results for different variances of Gaussian, Salt and Pepper, and Poisson noises, while the method proposed in this article was applied and analyzed for additive Gaussian noise. The exhaustive experimental approach was used in order to show the efficiency of the metrics of information theory when applied on the Dual NLM method that was proposed.

3 Information theoretic distances

In the literary context, the metrics of information theory have been applied on several works, and it has obtained success in the areas of Mathematics and Statistics to quantify the similarity level between random variables. Among the various metrics that exist in the information theory, in the context of this article the Kullback-Leibler divergence, the Bhattacharyya distance, the Hellinger distance and the Cauchy-Schwarz divergence are used and analyzed.

3.1 The Kullback-Leibler divergence

The first metrics used in this work is the Kullback- Leibler (KL) divergence. The KL divergence, also known as Relative Entropy, was initially proposed on the On Information and Sufficiency article, in 1951, by Kullback and Leibler [14]. In this context, the KL method aims to calculate the divergence between two probability distributions (or relative frequencies). In this way, it is possible to represent the math equation of the KL divergence by the following expression:

$$ D_{KL}\left (p, q \right )= \int d_{\mu 1}\left (x \right ) log\frac{p\left (x \right )}{q\left (x \right )} = \int p\left (x \right ) log\frac{p\left (x \right )}{q\left (x \right )}d\lambda \left (x \right ) $$

(1)

In which the parameters p and q denote the discrete distribution of probabilities of a random variance X with parameter λ that is determined from the sample x.

Given the univariate Gaussian context, it is possible to compute the KL divergence as:

$$ D_{KL}^{sym}\left (p,q \right ) = \frac{1}{2}\left [ D_{KL}\left (p,q \right )+D_{KL}\left (q,p \right ) \right ] \\ =\frac{1}{4}\left [\frac{{\sigma_{1}^{2}} \left (\mu_{1} -\mu_{2} \right )^{2} }{{\sigma_{2}^{2}}}+\frac{{\sigma_{2}^{2}} \left (\mu_{1} -\mu_{2} \right )^{2} }{{\sigma_{1}^{2}}} -2 \right ] $$

(2)

Likewise, (2) is summarized as:

$$ D_{KL}^{sym}\left (p,q \right ) = \frac{1}{4{\sigma_{1}^{2}}{\sigma_{2}^{2}}} \left [ \left ({\sigma_{1}^{2}}-{\sigma_{2}^{2}} \right )^{2} + \left ({\mu_{1}^{2}}-{\mu_{2}^{2}} \right )^{2}\left ({\sigma_{1}^{2}}-{\sigma_{2}^{2}} \right ) \right ] $$

(3)

In which the parameters σ and μ show the variances and non-local means of the distributions.

3.2 The Cauchy-Schwarz divergence

The second metrics used in this work is the Cauchy- Schwar divergence. Based on the probability theories and on the math theory, the metrics present on the information theory that are applied more frequently on the literature are the Kullback-Leibler divergence and the Rényi divergence [7, 9, 10, 17]. However, the Kullback-Leibler divergence and the Rényi divergence make fast and efficient calculus impossible on applications involving the classification of objects in the static recognition. In this scenario, the Cauchy-Schwarz divergence arises, which is an analytical expression, closed for a Gaussian mixture (MoG), that enables fast and efficient calculations in applications of the computational vision and classifying objects areas.

The Cauchy-Schwarz divergence for two densities of random vectors $p\left (x \right )$ and $q\left (x \right )$ is defined as:

$$ \begin{array}{cccc} D_{CS}\left (p, q \right ) =-log \left (\frac{ \int p\left (x \right ) q\left (x \right ) dx}{ \sqrt{\int p\left (x \right )^{2}dx \int q\left (x \right )^{2}dx}}\right ) \\ = \frac{1}{2} log\left (\int p\left (x \right )^{2}dx \right )+ \left (\int q\left (x \right )^{2}dx \right ) - log \left (\int p\left (x \right )q\left (x \right )dx \right ) \end{array} $$

(4)

in which the parameters p and q represent two symmetric measures regarding the probabilities distributions, such as $0\leq D_{CS} < \infty $,in which the result of the minimum value is obtained if and only if $p\left (x \right ) = q\left (x \right )$.

It can been seen that, in the univariate Gaussian case, the Cauchy-Schwarz divergence can be computed by [28]:

$$ D_{CS}\left (p,q \right )= \frac{1}{2} log \left (\frac{\left ({\sigma^{2}_{1}}+{\sigma^{2}_{2}} \right )^{2}}{4{\sigma^{2}_{1}}{\sigma^{2}_{2}}} \right ) + \frac{\left (\mu_{1}-\mu_{2} \right )^{2}}{{\sigma^{2}_{1}}+{\sigma^{2}_{1}}} $$

(5)

where the parameters σ and μ represent the variances and non-local means of distributions, respectively.

3.3 The Bhatthacharyya distance

The third metrics used in this work is the Bhat- tacharyya distance. Based on the process of stochastic distances, the Bhattacharyya distance was originally proposed on The Divergence and Bhattacharry Distace Measures in Signal Selection article, in 1967, by Thomas Kailath [12]. In this context, this method defines a normalized distance between two coefficients:

$$ D_{BC}\left (p, q \right ) = -ln \left (\sum\limits_{x = 1}^{n}\sqrt{p\left (x \right ),q\left (x \right )}\right ) $$

(6)

in which the parameters p and q represent the distributions of normalized probabilities and N the number of distribution compartments. In addition, the Bhattacharyya distance must be limited between $0\leq D_{BC}\left (p, q \right ) \leq \infty $.

Furhtermore, in the univariate Gaussian case, the Bhattacharyya distance can be computed by:

$$ BC_{Bhat}\left (p,q \right ) = -ln BC\left (p,q \right ) $$

(7)

in wich BC(p,q) is the Bhattacharyya coefficient, in the Gaussian case given by:

$$ BC\left (p,q \right ) = \sqrt{\frac{2\sigma_{1}\sigma_{2}}{{\sigma_{1}^{2}}+{\sigma_{2}^{2}}}}exp\left \{ -\frac{1}{4} \frac{\left (\mu_{1} -\mu_{2} \right )^{2}}{{\sigma_{1}^{2}}+{\sigma_{2}^{2}}} \right \} $$

(8)

in wich parameters σ and μ denote the variances and non-local means of distributions, respectively.

3.4 The Hellinger distance

The fourth and last metrics to be applied in this work concerns the Hellinger distance. Its origin dates back to 1907, by the Germanpelo mathematician Ernst David Hellinger, and it presentes a mathematical modeling different from the Riemann integral to measure the distance between distributions of discrete probabilities. Besides its application for the distance calculation, the Hellinger method is classified for calculus that envolve metrics and divergence [5]. Like that, the Hellinger distance is defined as:

$$ D_{H}\left (p, q \right ) = \frac{1}{\sqrt{2}}\left [ {\sum}_{x \in {\Omega}} \left (\sqrt{p\left (x \right )} - \sqrt{q\left (x \right )}\right )^{2} \right ]^{\frac{1}{2}} $$

(9)

in which the parameters p and q q represent the probabilities distributions of a countable Ω space. In this case, the Hellinger distance is restricted by $0 \leq D_{H} \left (p, q \right ) \leq 1$. Consequently, when the result of the Hellinger distance corresponds to 0, it means that there was no divergence; on the other hand, if it corresponds to 1, the distributions of probability do not share a common support..

In the univariate Gaussian case, the Hellinger distance can be computed by:

$$ D_{Hel}\left (p,q \right ) = 1 - BC\left (p,q \right ) $$

(10)

in which the function BC(p,q) denotes a Bhattacharyya distance coefficient, given by (8).

4 The proposed method

In this section, the standard filtering method is presented and compared to the method proposed in this study, in order to describe in detail the functioning of each mathematical variance. To sum up, the idea of the Non-Local Means method is to deal with Gaussian noises by the calculation of the weighted median instead of the definition of a weighted average.

Considering an additive Gaussian noise, uncorrelated and independent of the signal, the mathematical model used to describe the process of filtering is given by the following equation:

$$ y_{i} = x_{i} + n_{i} $$

(11)

in which y_i denotes the result of the noisy pixel, x_i,is associated with the noise-free pixel and n_i is an operator of the additive Gaussian noise. In the traditional approach n_i N(0,σ²) and $x_{i} ~ N(\mu _{0},{\sigma _{0}^{2}})$ are defined. It should be emphasized that the noise is not correlated, that is, $E[n_{i}n_{j}] = {\sigma _{0}^{2}}\delta _{i,j}$, whereby, δ_i,j = 1 if i = j and δ_i,j = 0 if i≠j. In this way, the goal is to recuperate xi since x_i from y_i, given p_i the ith patch.

4.1 Non-Local Means

The Non-Local Means traditional filtering method (NLM) was proposed in 2005, by Buades, Coll and Morel, as a method applied to deal with the denoising of additive Gaussian noises in images [3]]. In this scenario, this filter aims to scan the whole image searching for similar pixels by using the concept of similiarity measurement between patches, as shown in Fig. 1 [16].

Thereby, the Non-Local Means filtering method when applied to the noise is contextualized in such a way that given the noisy y = y_i∣𝜖I, the great value for the noise-free pixel x_i, denoted by NL[x_j] is computed as a weighted average of all the pixels in image expressed by:

$$ NLM\left [ x \right ]\left (i \right ) = \sum\limits_{j \in f} w\left (i,j \right ) x_{j} $$

(12)

in which w(i,j) represents the weights assigned to the similarity between pixels i and j, meeting the condition 0 ≤ w(i,j) ≤ 1 and ${\sum }_{j} w(i,j)=1$. In this way, the similarity between the i and j pixels has as analogy the similarity of vectors of intensities in the levels of gray x(N_i) and x(N_j). Therefore, the N_k parameter represents a patch with a central k pixel. Then (12) of the traditional NLM filter, that is expresses by the $w\left (i,j \right )$ weights, is defined as:

$$ w\left (i,j \right ) =\frac{1}{Z\left (i \right )}exp\left \{{~}^{\frac{- d_{E}\left (i,j \right )}{h^{2}}} \right \} =\frac{1}{Z\left (i \right )}exp\left \{{~}^{\frac{- \left \| x(N_{i}) - x(N_{j})\right \|^{2}_{2,\tau}}{h^{2}}} \right \} $$

(13)

in which the value τ > 1, h² is a parameter that controls the level of softening, and the constant $Z\left (i \right )$ refers to a function of normalization that is given by:

$$ Z(i) = \sum\limits_{j} e^{- \frac{ \left \| x(N_{i}) - x(N_{j})\right \|^{2}_{2,\tau}}{h^{2}}} $$

(14)

Hence, the calculation of the sum expressed by (14) do not envolve all the image pixels, but only those that belong to a browser window of t × t size, as a way to reduce the computational cost and to make the method viable.

4.2 Dual Non-Local Means

In na attempt to overcome the main limitations of the Non-Local Means traditional filtering method, the Dual Non-Local Means method is proposed in this article. The Dual Non-Local Means filter consists of a method for the reduction of Gaussian noise in two stages, which incorporates measures of non-Euclidian similarity, based on the information theory, to measure the distance between the patches. Works from the literature that consider other types of noise, such as the Poisson one, were successfully developed [2]. However, improvements for the original NLM filter are not much explored in the literature, in the case of the Gaussian noise.

In this context, the conception of the Dual Non-Local Means method considers for each i = 1,2,...,n the e mean and the variance in each pixel for the definition of the parametric vector $\vec {theta} = (\mu _{i},{\sigma _{i}^{2}})$. While the variance on i is locally estimated by using all the pixels insiude the ith patch, the means are estimated in a non-local way, directly applied to the standard NLM filter. So, by using the Euclidian NLM exit, that is represented by the non-local estimates of parameters μ_i to compute the parametric version of great weights, is defined w(i,j) as:

$$ w\left (i,j \right ) = exp\left \{{~}^{\frac{- d_{E}\left (N_{i},N_{j} \right )}{h^{2}}} \right \} $$

(15)

in which d_p(N_i,N_j) representes the parametric measure based on the information theory.

Therefore, the idea of the Dual Non-Local Means method is to substitute the L1 norm by the function of Kullback-Leibler divergence, the Bhattacharyya distance, the Hellinger distance and the Cauchy-Schwarz divergence. In this way, the Dual Non-Local Means actually defines a double filtering process, in which the first stage, based on Euclidian NLM, is responsible for the estimation of the model parameters, while the second stage is responsible for the computation of measures of parametric similarity. Because of that, this techinique is called Dual Non-Local Means.

4.2.1 Dual Non-Local Means KL

In the context of applied mathematics for the control of similarity between random variances and for the improvement of multivariate data analysis processes, the usage of measures of information theory is proposed, which is called KL divergence or relative entropy, as alternatives to benefit data grouping and filtering. To sum up, the Dual Non-Local Means KL filter works in the following way:

From a noisy image, the standard (Euclidian) NLM filter is applied to estimate the μ_i in a non-local way, for i = 1,2,...,n;
For each pixel x_i, two local variances ${\sigma _{i}^{2}}$ are estimated inside patch N_k of f × f size, defined as:
$$ {\sigma_{i}^{2}} = \frac{1}{f^{2}}\sum\limits_{j\epsilon p_{i}}(x_{i} - \mu_{i})^{2} $$
(16)
KL divergence is calculated between the central $\hat {N_{i}}$,by using (3);
Calculate the weight $w\left (i,j \right )$ as:
$$ w\left (i,j \right ) =\frac{1}{Z\left (i \right )}exp\left \{{~}^{\frac{- d_{KL}\left (p,q \right )}{h^{2}}} \right \} $$
(17)
Noise-free pixel x_i is evaluated as:
$$ NLM\left [ x \right ]\left (i \right ) = \sum\limits_{j \in f} w\left (i,j \right ) m_{j} $$
(18)

4.2.2 Dual Non-Local Means Cauchy-Schwarz

Analogous to the Dual Non-Local Means KL method, the Dual Non-Local Means CS filter is used in the following way:

From a noisy image, the standard (Euclidian) NLM filter is applied to estimate theμ_i means in a non-local way, for i = 1,2,...,n
For each x_i pixel, we estimate two local variance ${\sigma _{i}^{2}}$ within the patch N_k of size f × f, defined by:
$$ {\sigma_{i}^{2}} = \frac{1}{f^{2}}\sum\limits_{j\epsilon p_{i}}(x_{i} - \mu_{i})^{2} $$
(19)
CS divergence is calculated between the central $\hat {N_{i}}$ by using (5);
Calculate the weight $w\left (i,j \right )$ as:
$$ w\left (i,j \right ) =\frac{1}{Z\left (i \right )}exp\left \{{~}^{\frac{- d_{CS}\left (p,q \right )}{h^{2}}} \right \} $$
(20)
Noise-free pixel x_i is evaluated as:
$$ NLM\left [ x \right ]\left (i \right ) = \sum\limits_{j \in f} w\left (i,j \right ) m_{j} $$
(21)

4.2.3 Dual Non-Local Means Bhattacharyya

Similar to the filtering process of the Dual Non-Local Means KL method, the Dual Non-Local Means Bhattacharyya filter is applied in the following way:

From a noisy image, the standard (Euclidian) NLM filter is applied to estimate the μ_i means in a non-local way, for i = 1,2,...,n
For each x_i pixel, two local variance ${\sigma _{i}^{2}}$ are estimated inside patch N_k of f × f size, defined as:
$$ {\sigma_{i}^{2}} = \frac{1}{f^{2}}\sum\limits_{j\epsilon p_{i}}(x_{i} - \mu_{i})^{2} $$
(22)
Bhattacharyya distance is calculated between the central patch $\hat {N_{i}}$ and $\hat {N_{j}}$, using the (8);
Calculate the weight $w\left (i,j \right )$ as:
$$ w\left (i,j \right ) =\frac{1}{Z\left (i \right )}exp\left \{{~}^{\frac{- d_{BC}\left (p,q \right )}{h^{2}}} \right \} $$
(23)
Noise-free pixel x_i is evaluated as:
$$ NLM\left [ x \right ]\left (i \right ) = \sum\limits_{j \in f} w\left (i,j \right ) m_{j} $$
(24)

4.2.4 Dual Non-Local Means Hellinger

When compared to the Dual Non-Local Means KL method, the Dual Non-Local Means Hellinger filter is used in the following way:

From a noisy image, the standard (Euclidian) NLM filter is applied to estimate the μ_i means in a non-local way, for i = 1,2,...,n;
For each x_i pixel, two local variance ${\sigma _{i}^{2}}$ are estimate inside patch N_k of f × f size, defined as:
$$ {\sigma_{i}^{2}} = \frac{1}{f^{2}}\sum\limits_{j\epsilon p_{i}}(x_{i} - \mu_{i})^{2} $$
(25)
Hellinger distance is calculated between the central patch $\hat {N_{i}}$ by using the (10);
Calculate the weight $w\left (i,j \right )$ as:
$$ w\left (i,j \right ) =\frac{1}{Z\left (i \right )}exp\left \{{~}^{\frac{- d_{H}\left (p,q \right )}{h^{2}}} \right \} $$
(26)
Noise-free pixel x_i is evaluated as:
$$ NLM\left [ x \right ]\left (i \right ) = \sum\limits_{j \in f} w\left (i,j \right ) m_{j} $$
(27)

5 Experiments and results

In order to test and evaluate the performance of the Dual NLM filtering method when applied to the process of denoising in digital images, a group of 7 types of images, with sizes of 512 x 512 pixels, 8 bits and in shades of grey were used. The images that were used refer to: Airplane, Barbara, Camera, Car, House, Lena and Peppers. The set of images used were taken from the dataset: https://sipi.usc.edu/database/. All the images taken from USC Image Database are intended for research purposes.

In this scenario, the performance of the Dual NLM filter was compared to other filters that exist in the literature, such as: the usual Wiener filter, NLM, the Bilateral one, Total Variation, Wavelet, Anisotropic Diffusion and BM3D.

In order to compare the different methods, quantitative metrics (PSNR) were selected to evaluate the maximum peak in the signal-to-noise ratio between a reference image and its filtered image. The higher the resultant value of this index, the better is the result of the applied filter. All the images used in this article were degraded by additive Gaussian noise, with a variance of ${\sigma ^{2}_{n}} = 10$ of image pixels, which were randomly selected.

Table 1 presents the results of the evalutation done with the PSNR metrics for 7 types of different images, considering a type of mathematical model of the information theory for each column.

Table 1 PSNR’s obtained after filtering with the parametric NLM filter (Kullback-Leibler, Bhatthacharyya, Hellinger and Cauchy-Schwarz) for images corrupted by Gaussian noise with σ = 10

Full size table

When analyzing the results presented on Table 1, the Dual-NLM Cauchy-Schwarz filtering method showed the greatest final average when compared to the other filtering methods. However, there were also situation in which the Dual-NLM Kullback-Leibler filtering method allowed the obtention of satisfactory results.

Table 2 presents the results of evaluation done with the PSNR metrics for 7 types of different images and 3 different values of entrance as parameter, applied to the calculation of KL divergence.

Table 2 PSNR obtained after filtering with the standard NLM filter and parametric NLM Kullback-Leibler filter for images corrupted by Gaussian noise with σ = 10

Full size table

Through the analysis of the results show on Table 2,the Dual-NLM KL filtering method presented, in great part of the images, the best result to reduce Graussian noises. However, the standard NLM filtering method also made it possible to obtain satisfactory results.

Furthermore, Table 3 show the results of evaluation done with the PSNR metrics for 7 types of different images and 3 different entrance values as parameter, applied to the calculation of the Bhattacharrya distance.

Table 3 PSNR obtained after filtering with the standard NLM filter and parametric NLM Bhattacharrya filter for images corrupted by Gaussian noise with σ = 10

Full size table

By analyzing the results presented on Table 3,the Dual-NLM Bhattacharrya filtering method showed the best result for the reduction of Gaussian noise.

Table 4 provides the results of evaluation done with the PSNR metrics for 7 types of different images and 3 different entrance values as parameter, applied to the calculation of Cauchy- Schwarz divergence.

Table 4 PSNR obtained after filtering with the standard NLM filter and parametric Cauchy-Schwarz NLM for images corrupted by Gaussian noise with σ = 10

Full size table

Analyzing the results presented on Table 4, the Dual-NLM de Cauchy-Schwarz filtering method had the greatest result to reduce Gaussian noises.

Table 5 shows the results of evaluation done with PSNR metrics for 7 types of different images and 3 different entrance values as parameter, applied to the calculation of the Hellinger distance.

Table 5 PSNR obtained after filtering with the standard NLM filter and parametric Hellinger NLM filter for images corrupted by Gaussian noise with σ = 10

Full size table

Analyzing the results presented on Table 5, the Dual-NLM Hellinger filtering method showed the best result for the reduction of Gaussian noises when compared to the traditional NLM method.

Table 6 shows the results of evaluation done with the PSNR metrics for the 7 images. Besides that, the results presented were considered for each filter: BM3D, Bilateral, Wiener, Wavelet, Anisotropic Diffusion and Total Variation.

Table 6 BM3D, Bilateral, Wiener, Wavelet, Anisotropic Diffusion and Total Variation filters, for images corrupted by Gaussian noise with σ = 10

Full size table

Analyzing the results presented on Table 6, the BM3D and Wiener filtering methods shows the greatest results for reduction of Gaussian noises in comparison to the other methods.

To illustrate the difference between those methods, Figs. 2, 3, 4, 5, 6, 7 and 8, show the visual results for the Airplane, Barbara, Camera, Car, House, Lena and Peppers images. By doing that, it is possible to observe that there is a significant different regarding the level of residual noise in the images that were filtered by the Dual NLM Cauchy-Schwar method. Furthermore, the variances of the as Cauchy- Schwar and Kullback-Leibler non-local divergences offer a greater relation between denoising and edge preservation.

From the results obtained on Tables 1 and 6,with the evaluation of the filtering methods, it is possible to observe that both filters, Cauchy-Schwarz and KL, respectively, presented a high value of percentage regarding the evaluation of PSNR. In this context, in great part of the results, the usage of the Dual NLM of Cauchy-Schwarz filter proved to be satisfactory for the application on images that were degraded by Gaussian noise. However, the variances found regarding the PSNR evaluation of the Dual NLM KL filter were significant, what allowed the achieving results that also showed to be satisfactory regarding the filtering methods of traditional NLM and the other filtering methods that were applied in this work.

6 Conclusions and final remarks

The process of denoising in images degraded by Graussian noises is a challenge task in the computational vision area, since the recent methods of image filtering are based on functions of spatial domain and frequency domain are not efficient. The filtering methods that are based on spatial domain are usually the best option to solve issues that deal with the impulsive Gaussian noise. Given this, in this article, a Dual Non-Local Means filter was presented, which combines the characteristics of classification order, non-local strategies and mathematical models that are seen on information theory.

In this scenario, the Dual Non-Local Means filtering method can be considered a philosophy of the NLM filter to solve problems of images that are degraded by impulsive noises. The variances of the mathematical metrics based on the concept of the information theory unify two types of distinct behaviors but essential to deal with the Gaussian noise. In this case, the behaviors of the proposed method have the Non-Local Means and the Dual Non-Local Means filters as approaches. In this way, several computational experiments were done during the course of this work, with multiple digital images degraded by Gaussian noise, which showed that the proposed method can generate, on average, significantly better outcomes in terms of PSNR when compared to the continuous application of the standard NLM, Total Variation, BM3D, Anisotropic Diffusion, Wiener, Wavelet and Bilateral filters.

Finally, future works can include the usage of different families of entropy, such as Renyi’s and Sharma-Mittal’s entropies. Methods that are applied to solve problems of dimensionality reduction, for example, PCA, can be used to better understand a more compact and significant representation for patches inside the browser window. Besides that, methods like the Parametric PCA, the ISOMAP and the Laplacian Eigemaps can be applied before the calculation of the Euclidian distances as a way of asymptotically guarantee the greatest similarity measures.

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

References

Amer A, Dubois E (2005) Fast and reliable structure-oriented video noise estimation. IEEE Trans Circuits Syst Video Technol. https://doi.org/10.1109/TCSVT.2004.837017
Bindilatti A A, Mascarenhas N DA (2013) A nonlocal poisson denoising algorithm based on stochastic distances. IEEE Signal Process Lett. https://doi.org/10.1109/LSP.2013.2277111
Buades A, Coll B, Morel J M (2005) A non-local algorithm for image denoising. In: Proceedings - 2005 IEEE computer society conference on computer vision and pattern recognition, CVPR 2005. https://doi.org/10.1109/CVPR.2005.38
Dabov K, Foi A, Katkovnik V, Egiazarian K (2007) Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans Image Process. https://doi.org/10.1109/TIP.2007.901238
Diaconis P, Zabell S L (1982) Updating subjective probability. J Am Stat Assoc. https://doi.org/10.1080/01621459.1982.10477893
Hasan M, El-Sakka M R (2018) Improved BM3D image denoising using SSIM-optimized Wiener filter. Eurasip Journal on Image and Video Processing 2018(1). https://doi.org/10.1186/s13640-018-0264-z
Hoang H G, Vo B N, Vo B T, Mahler R (2014) The Cauchy-Schwarz divergence for poisson point processes. In: IEEE Workshop on statistical signal processing proceedings. https://doi.org/10.1109/SSP.2014.6884620
Jain P, Tyagi V (2016) A survey of edge-preserving image denoising methods. Inf Syst Front. https://doi.org/10.1007/s10796-014-9527-0
Jenssen R, Erdogmus D, Hild K E, Principe J C, Eltoft T (2005) Optimizing the Cauchy-Schwarz PDF distance for information theoretic, non-parametric clustering. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). https://doi.org/10.1007/11585978_3
Jenssen R, Principe J C, Erdogmus D, Eltoft T (2006) The Cauchy-Schwarz divergence and Parzen windowing: connections to graph theory and Mercer kernels. J Franklin Inst. https://doi.org/10.1016/j.jfranklin.2006.03.018
Jin C, Luan N (2020) An image denoising iterative approach based on total variation and weighting function. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-020-08871-0
Kailath T (1967) The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans Commun Technol. https://doi.org/10.1109/TCOM.1967.1089532
Kuan D T, Sawchuk A A, Strand T C, Chavel P (1985) Adaptive noise smoothing filter for images with signal-dependent noise. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/TPAMI.1985.4767641
Kullback S, Leibler R A (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Article MathSciNet Google Scholar
Kundu R, Chakrabarti A, Lenka P (2022) A novel technique for image denoising using non-local means and genetic algorithm. National Academy Science Letters. https://doi.org/10.1007/s40009-021-01052-z
Levada A LM (2021) Non-local medians filter for joint Gaussian and impulsive image denoising. In: Proceedings - 2021 34th SIBGRAPI conference on graphics, patterns and images, SIBGRAPI 2021, pp 152–159. https://doi.org/10.1109/SIBGRAPI54419.2021.00029
Mahler R P S (1998) Global posterior densities for sensor management. In: Masten M K, Stockum L A (eds) Acquisition, tracking, and pointing XII. International Society for Optics and Photonics, vol 3365. SPIE, pp 252–263, DOI https://doi.org/10.1117/12.317518
Mallat S G (1989) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/34.192463
Milanfar P (2013) A tour of modern image filtering. IEEE Signal Process Mag 30(1):106–123. https://doi.org/10.1109/MSP.2011.2179329
Article Google Scholar
Patidar P, Gupta M, Nagawat A K (2010) Image denoising by various filters for different noise image de-noising by various filters for different noise Sumit Srivastava. Int J Comput Applic 9(4):975–8887
Google Scholar
Perona P, Malik J (1990) Scale-space and edge detection using anisotropic diffusion. IEEE Trans Pattern Anal Mach Intell. https://doi.org/10.1109/34.56205
Petkova L, Draganov I (2020) Noise adaptive Wiener filtering of images. In: 2020 55th International scientific conference on information, communication and energy systems and technologies, ICEST 2020 - proceedings. https://doi.org/10.1109/ICEST49890.2020.9232887
Riya, Gupta B, Lamba S S (2021) An efficient anisotropic diffusion model for image denoising with edge preservation. Comput Math Appl. https://doi.org/10.1016/j.camwa.2021.03.029
Rohit V, Ali J (2013) A comparative study of various types of image noise and efficient noise removal techniques. Int J Adv Res Comput Sci Softw Eng 3(10):2277–128
Google Scholar
Rudin L I, Osher S, Fatemi E (1992) Nonlinear total variation based noise removal algorithms. Physica D. https://doi.org/10.1016/0167-2789(92)90242-F
Salehi H, Vahidi J (2021) A novel hybrid filter for image despeckling based on improved adaptive wiener filter, bilateral filter and wavelet filter. International Journal of Image and Graphics. https://doi.org/10.1142/S0219467821500364
Shao L, Yan R, Li X, Liu Y (2014) From heuristic optimization to dictionary learning: a review and comprehensive comparison of image denoising algorithms. IEEE Transactions on Cybernetics. https://doi.org/10.1109/TCYB.2013.2278548
Spurek P, Pałka W (2016) Clustering of Gaussian distributions. In: Proceedings of the international joint conference on neural networks. https://doi.org/10.1109/IJCNN.2016.7727627
Tomasi C, Manduchi R (1998) Bilateral filtering for gray and color images. In: Proceedings of the IEEE international conference on computer vision. https://doi.org/10.1109/iccv.1998.710815
Yahya A A, Tan J, Su B, Hu M, Wang Y, Liu K, Hadi A N (2020) BM3D image denoising algorithm based on an adaptive filtering. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-020-08815-8
Yue H, Sun X, Yang J, Wu F (2014) CID: combined image denoising in spatial and frequency domains using web images. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2014.375
Zhang Y, Sun J (2020) An improved BM3D algorithm based on anisotropic diffusion equation. Math Biosci Eng. https://doi.org/10.3934/mbe.2020269

Download references

Acknowledgments

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Author information

Authors and Affiliations

Computing Department, Federal University of São Carlos, São Carlos, SP, Brazil
André R. de Brito & Alexandre L. M. Levada

Authors

André R. de Brito
View author publications
You can also search for this author in PubMed Google Scholar
Alexandre L. M. Levada
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to André R. de Brito.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

de Brito, A.R., Levada, A.L.M. Dual Non-Local Means: a two-stage information-theoretic filter for image denoising. Multimed Tools Appl 83, 4065–4092 (2024). https://doi.org/10.1007/s11042-023-15339-4

Download citation

Received: 04 August 2022
Revised: 05 January 2023
Accepted: 06 April 2023
Published: 20 May 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15339-4

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Dual Non-Local Means: a two-stage information-theoretic filter for image denoising

Abstract

Similar content being viewed by others

A Nonlocal Image Denoising Algorithm Using the Structural Similarity Metric

Bounded Non-Local Means for Fast and Effective Image Denoising

Filter Parameter Estimation in Non-Local Means Algorithm

1 Introduction

2 Related work