
1 Introduction

Hyperspectral Imaging (HSI) is a remote sensing technique that captures image data beyond the visible part of the electromagnetic spectrum in order to observe the reflectance of real scenes over a wide wavelength range, from 400 nm to 2500 nm [17]. It enables the visualization of features that are not visible to conventional cameras. Its applications include agriculture [1], object detection, mineral exploration, military surveillance [28], pharmaceuticals [9] and medicine [15], to name a few.

Noise introduced during the image acquisition process, however, deteriorates the visual quality of the acquired images; it appears as grainy texture and as horizontal and vertical stripes. Noise also affects subsequent applications such as object tracking, spectral unmixing and classification based on HSI data. In HSI, noise is typically characterised as a combination of Gaussian noise and sparse noise [5]. Image denoising refers to the class of algorithms used to mitigate the effect of noise in acquired images.

One of the earliest works on HSI denoising is [16], where the signal-dependent nature of noise in the spectral domain is handled by exploiting the dissimilarity of the signal along the spatial and spectral dimensions and by working in the derivative domain with a wavelet shrinkage operator. A filtering-based approach called Color Spectrum filtering [19] relies on the assumption that, under normal conditions, the spectrum is smooth; noisy channels are detected through their de-correlation from neighbouring channels. However, assuming Gaussian noise as the only source of noise [14] can limit the performance of such methods. A Bayesian framework is adopted in [8], where noise is modelled as a non-identically and independently distributed Mixture of Gaussians (MoG) within a Low Rank Matrix Factorization (LRMF) strategy to capture the complex nature of HSI noise.

Total Variation (TV) based methods appear in many works. In [3], an optimization framework based on the Split Bregman technique combines 2D TV along the spatial dimensions with 1D TV along the spectral dimension; similar work can be found in [2]. Using a lexicographical ordering of the data to exploit the low-rank behaviour of clean data, the Augmented Lagrange Multiplier (ALM) method is used in [29] to recover clean HSI data from its noisy observation. A combination of nuclear norm, \(\ell _1\)-norm and TV regularization is adopted in a unified framework in [11]: the nuclear norm exploits the spectral low-rank property, while TV regularization serves as a prior that preserves piece-wise smooth regions of the image. A similar approach using spatio-spectral TV augmented with a group low-rank property is applied in [13]. ALM has also been used as the optimization strategy in variational frameworks that combine TV regularization, \(\ell _1\)-norm regularization and the Frobenius norm.

A sparse dictionary of spectral signatures in HSI data is used as a prior for the restoration of coloured (RGB) hyperspectral data in [4]. Sparse dictionary learning is explored in [30] by establishing redundancy and correlation (RAC), with global RAC along the spatial dimension and local RAC along the spectral dimension, to remove noise from both the spatial and spectral dimensions. An iterative non-local strategy is proposed in [22], which decomposes the 3rd-order tensor into 4th-order tensors to obtain non-local similarity along the spatial direction and global similarity along the spectral direction. Similarly, Chang et al. [7] focus on non-local self-similarity together with low-rank approximation. The method proposed in [23] exploits the low-rank property of clean HSI data and reconstructs the noisy image using robust principal component analysis (RPCA). The authors of [25] effectively denoise HSI data using non-local spatial similarity and a low-rank constraint along the spectral dimension, and similar techniques using non-local similarity and low-rank behaviour along the spatio-spectral dimensions are explored in [24]. A novel approach is devised in [10] using hypothesis testing based on the Kullback-Leibler Divergence (KLD) to approximate Poisson-distributed HSI data by a Gaussian distribution and vice versa; the method is tested on data from the Compact Reconnaissance Imaging Spectrometer for Mars (CRISM).

In this paper, we design a variational framework based on maximum a posteriori (MAP) estimation for removing mixed Gaussian and random-valued impulse noise from HSI data. As discussed in Sect. 3, we split the HSI image degradation model into two parts and fit a MAP estimator to the resulting model. As shown in Sect. 4, the ensuing variational model recovers the noisy data better and with fewer artefacts.

The rest of the paper is organised as follows. Sect. 2 describes the image degradation model for HSI data and sets the background for our proposed technique, presented in Sect. 3. Sect. 4 reports extensive experiments on synthetically corrupted (Subsect. 4.1) and real HSI data (Subsect. 4.2). Finally, Sect. 5 concludes the paper.

2 HSI Degradation Model and Objective

Image formation of HSI data is generally modelled as [5]:

$$\begin{aligned} f=u+g+s \end{aligned}$$
(1)

where \(u\in \mathbb {R}^{wh\times c}\) is the clean data, g is additive Gaussian noise with mean 0 and variance \(\sigma _g^2\), i.e., \(g\sim \mathcal {N}(0,\sigma _g^2)\), and s is additive impulse (sparse) noise modelled by a Laplacian distribution with location 0 and scale parameter \(\sigma _s\), i.e., \(s\sim \mathcal {L} (0,\sigma _s)\). Here w, h and c are the width, height and number of spectral bands of the image, respectively. The observation f therefore contains a combination of Gaussian and impulse noise.

To exploit the spatio-spectral correlation among the bands of HSI data, we employ the Casorati matrix representation [29]: each band is vectorised into a column of size \(wh\times 1\), and all bands are concatenated into a 2D matrix of size \(wh\times c\). This allows the similarity among neighbouring pixels in surrounding bands to be exploited.

Impulse noise affects a limited number of pixels but affects them heavily, and the original values of impulse-corrupted pixels cannot easily be recovered. Impulse noise can be either fixed-valued impulse noise (FVIN) or random-valued impulse noise (RVIN) [12]. Salt-and-pepper noise is an FVIN in which randomly selected pixels are replaced by one of the two extreme values \(u_\text {min}\) or \(u_\text {max}\). RVIN, on the other hand, replaces pixels with arbitrary random values in the range \([u_\text {min},~u_\text {max}]\) and is therefore a more practical assumption for HSI data [5, 11, 12, 29]. We adopt the same noise assumption for modelling our degradation scenario.
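To make the degradation model concrete, the following minimal NumPy sketch arranges a cube as a Casorati matrix and corrupts it as in Eq. (1). The function names, the (h, w, c) array layout and the choice to draw impulse positions independently for every entry of the matrix are our assumptions, not details of the method. RVIN values are drawn uniformly between \(u_\text {min}\) and \(u_\text {max}\), so that \(s=f-v\) is nonzero only at the corrupted entries.

```python
import numpy as np

def to_casorati(cube):
    """Reshape an (h, w, c) cube into a (w*h, c) Casorati matrix.

    With NumPy's row-major order, column k is band k vectorised."""
    h, w, c = cube.shape
    return cube.reshape(h * w, c)

def from_casorati(mat, h, w):
    """Inverse of to_casorati: (w*h, c) matrix back to an (h, w, c) cube."""
    return mat.reshape(h, w, mat.shape[1])

def corrupt(u, sigma_g, impulse_ratio, u_min=0.0, u_max=1.0, rng=None):
    """Simulate v = u + g, then f = v + s with sparse random-valued impulses.

    Returns (v, f). A fraction `impulse_ratio` of the entries of f is replaced
    by values drawn uniformly between u_min and u_max; s = f - v is zero elsewhere."""
    rng = np.random.default_rng() if rng is None else rng
    v = u + rng.normal(0.0, sigma_g, size=u.shape)              # Gaussian part g
    f = v.copy()
    mask = rng.random(u.shape) < impulse_ratio                  # impulse locations
    f[mask] = rng.uniform(u_min, u_max, size=int(mask.sum()))   # RVIN values
    return v, f

# Example: a random 32 x 32 cube with 5 bands, sigma_g = 0.05 and 10% RVIN.
rng = np.random.default_rng(0)
cube = rng.random((32, 32, 5))
v, f = corrupt(to_casorati(cube), 0.05, 0.10, rng=rng)
noisy_cube = from_casorati(f, 32, 32)
```

Keeping v as a separate output mirrors the two-part split used later in Sect. 3, where v is the Gaussian-only intermediate image.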

Our objective is to recover, from the observed noisy data f, an image \(\hat{u}\) that is visually as close as possible to u. Following the Bayesian formulation, we have:

$$\begin{aligned} \hat{u}=\underset{u}{\text {arg max }} p(u|f)=\underset{u}{\text {arg max}} p(f|u)\cdot p(u) \end{aligned}$$
(2)

Since the \(\log \) function is monotonically increasing, maximizing Eq. (2) is equivalent to minimizing the negative logarithm of the posterior:

$$\begin{aligned} \hat{u}=\underset{u}{\text {arg min}}-\log p(u|f)=\underset{u}{\text {arg min}} \big (-\log p(f|u)-\log p(u)\big ) \end{aligned}$$
(3)

We derive our variational formulation by modelling the likelihood term p(f|u) in accordance with the appropriate noise model and the prior term p(u) with respect to the property we intend to achieve in our denoised image.

3 Proposed Denoising Framework

We propose to rewrite the image formation model of Eq. (1) by splitting it into two parts with the help of an auxiliary variable v, such that:

$$\begin{aligned} \bigg \{ \begin{array}{lr} v=u+g \\ f=v+s \end{array} \end{aligned}$$
(4)

The variable v accounts only for the Gaussian degradation of u; adding the impulse noise s to the Gaussian-corrupted v then yields the final composite noisy image f.

Since v is a Gaussian-corrupted observation of u, the MAP estimator of u given v can be written as:

$$\begin{aligned} \hat{u}=\underset{u}{\text {arg min}}-\log \Bigg (\dfrac{1}{\sqrt{2\pi \sigma _g^2}}\exp \Bigg (-\dfrac{{(v - u)}^2}{2\sigma _g^2}\Bigg )\Bigg )-\log p(u) \end{aligned}$$
(5)

The prior term can be modelled as a Gibbs prior:

$$\begin{aligned} p(u)\propto e^{-\alpha R(u)},~\alpha >0 \end{aligned}$$
(6)

We choose the Total Variation (TV) prior because of its high-quality denoising ability and its preservation of the high-frequency details (edges) of the image [18]. We therefore set \(R(u)=|\nabla u|\), where \(\nabla \) is the gradient operator. Substituting Eq. (6) into Eq. (5) and dropping the terms that do not depend on u, we obtain:

$$\begin{aligned} \hat{u}=\underset{u}{\text {arg min}}\Bigg (\dfrac{{(v-u)}^2}{2\sigma _g^2}+\alpha |\nabla u|\Bigg ) \end{aligned}$$
(7)

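which, after multiplying by \(\sigma _g^2\) and, for brevity, omitting the resulting regularization weight \(\alpha \sigma _g^2\) (reintroduced as \(\lambda _2\) in Eq. (11)), can be written as: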

$$\begin{aligned} \hat{u}=\underset{u}{\text {arg min}}~\dfrac{1}{2}\Vert v-u\Vert _2^2+\Vert \nabla u\Vert _1 \end{aligned}$$
(8)

Similarly, f is obtained by adding the impulse noise s to v, which is itself a Gaussian-corrupted image. The MAP estimator of v is therefore obtained by modelling the likelihood term with a Laplacian distribution [12] and using the same TV-based Gibbs prior for v, i.e., \(p(v)\propto e^{-|\nabla v|}\):

$$\begin{aligned} \hat{v}=\underset{v}{\text {arg min}}-\log \Bigg (\dfrac{1}{2\sigma _s}\exp \Bigg (-\dfrac{|f-v|}{\sigma _s}\Bigg )\Bigg )-\log p(v) \end{aligned}$$
(9a)
$$\begin{aligned} \hat{v}=\underset{v}{\text {arg min}}\Bigg (\dfrac{|f-v|}{\sigma _s}+ |\nabla v|\Bigg ) \end{aligned}$$
(9b)

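After multiplying by \(\sigma _s\) and, for brevity, omitting the resulting regularization weight \(\sigma _s\) (reintroduced as \(\lambda _1\) in Eq. (11)), this amounts to minimizing the following energy functional: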

$$\begin{aligned} \hat{v}=\underset{v}{\text {arg min}}\Vert f-v\Vert _1+\Vert \nabla v\Vert _1 \end{aligned}$$
(10)
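The step from the likelihoods to the fidelity terms only drops quantities that do not depend on the unknown. The sketch below (the helper names are ours) checks this numerically: the difference of the negative log-likelihoods in Eqs. (5) and (9a) between two candidate solutions equals the difference of the squared-error and absolute-error terms, scaled by \(1/(2\sigma _g^2)\) and \(1/\sigma _s\), respectively.

```python
import numpy as np

def gaussian_nll(v, u, sigma_g):
    """Negative log of the Gaussian likelihood in Eq. (5), summed over entries."""
    return np.sum(0.5 * np.log(2 * np.pi * sigma_g**2) + (v - u) ** 2 / (2 * sigma_g**2))

def laplacian_nll(f, v, sigma_s):
    """Negative log of the Laplacian likelihood in Eq. (9a), summed over entries."""
    return np.sum(np.log(2 * sigma_s) + np.abs(f - v) / sigma_s)

# Compare two candidate solutions a and b: the constant terms cancel.
rng = np.random.default_rng(0)
obs, a, b = rng.random(100), rng.random(100), rng.random(100)
sigma_g, sigma_s = 0.3, 0.2
gauss_gap = gaussian_nll(obs, a, sigma_g) - gaussian_nll(obs, b, sigma_g)
l2_gap = (np.sum((obs - a) ** 2) - np.sum((obs - b) ** 2)) / (2 * sigma_g**2)
lap_gap = laplacian_nll(obs, a, sigma_s) - laplacian_nll(obs, b, sigma_s)
l1_gap = (np.sum(np.abs(obs - a)) - np.sum(np.abs(obs - b))) / sigma_s
print(np.isclose(gauss_gap, l2_gap), np.isclose(lap_gap, l1_gap))  # True True
```

Because only these scaled fidelity terms (plus the prior) change the ranking of candidates, the minimizers of Eqs. (5) and (9a) are those of the corresponding \(\ell _2\) and \(\ell _1\) energies.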

We propose to combine Eq. (8) and Eq. (10) successively, with explicit regularization weights \(\lambda _1\) and \(\lambda _2\), such that:

$$\begin{aligned} \bigg \{ \begin{array}{lr} \hat{v} = \underset{v}{\text{ arg } \text{ min }}~\Vert f - v\Vert _1 + \lambda _1 \Vert \nabla v\Vert _1 \\ \hat{u} = \underset{u}{\text{ arg } \text{ min }}~\dfrac{1}{2}\Vert \hat{v} - u\Vert _2^2 + \lambda _2 \Vert \nabla u\Vert _1 \end{array} \end{aligned}$$
(11)

Note that the \(\ell _2\) data fidelity term \(\Vert \cdot \Vert _2^2\) penalises deviations under a Gaussian noise model, while the TV regularization term \(\Vert \nabla u\Vert _1\) incorporates prior information about clean data. Similarly, the \(\ell _1\) data fidelity term \(\Vert \cdot \Vert _1\) penalises deviations under a Laplacian (impulse) noise model, with TV regularization \(\Vert \nabla v\Vert _1\) on v. From a different point of view, the \(\ell _1\)-norm fidelity term is also characterized by its contrast invariance and its lack of continuous dependence on the data [6]. Owing to the separate TV regularization terms, we are able to remove residual Gaussian and impulse noise from the restored data while successively handling the effects of both noise sources. Minimizing the two functionals in Eq. (11) with respect to v and u, respectively, gives the following iterations:

$$\begin{aligned} v_{k+1}=v_k+\alpha \Bigg [\Bigg ( \dfrac{(f-v_k)}{\sqrt{{(f-v_k)}^2+\delta }}\Bigg ) +\lambda _1~ \text {div}\Bigg (\dfrac{\nabla v_k}{|\nabla v_k|_\gamma } \Bigg )\Bigg ] \end{aligned}$$
(12a)
$$\begin{aligned} u_{k+1}= u_k+\alpha \Bigg [(\hat{v}-u_k)+\lambda _2~\text {div} \Bigg (\dfrac{\nabla u_k}{|\nabla u_k|_\beta }\Bigg )\Bigg ] \end{aligned}$$
(12b)

Equations (12a) and (12b) are obtained with a first-order optimization technique, namely gradient descent, and yield the estimate \(\hat{u}\approx u\) of the true image; \(u_k\) and \(v_k\) denote the iterates at iteration k. In Eq. (12a), the initial value \(v_0\) is set to the noisy observation f. The optimal value \(\hat{v}\) is free from impulse noise but still contains residual Gaussian noise. To remove this residual noise, \(\hat{v}\) is used as the initial value of u in Eq. (12b) to obtain the optimized value \(\hat{u}\). \(\delta \), \(\beta \) and \(\gamma \) are very small positive constants introduced to avoid division by zero, \(\alpha \) is the step size, and \(\lambda _1\) and \(\lambda _2\) are the regularization hyperparameters. As discussed in Sect. 4, the hyperparameter values are tuned to obtain the best metric results.
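The two-stage scheme can be sketched in NumPy for a single 2D band (or a grayscale image). This is a minimal illustration under stated assumptions, not the authors' implementation: \(\nabla \) uses forward differences with a Neumann boundary, div is defined as its negative adjoint, \(|\nabla u|_\beta =\sqrt{u_x^2+u_y^2+\beta }\), and a fixed iteration count replaces a convergence test. Each update moves along the negative gradient of the corresponding energy in Eq. (11). The function names and default parameter values are hypothetical. The smoothed \(\ell _1\) term has curvature up to \(1/\sqrt{\delta }\), so \(\alpha \) has to stay below about \(2\sqrt{\delta }\) for the first stage to be stable.

```python
import numpy as np

def grad(u):
    """Forward differences; zero at the last column/row (Neumann boundary)."""
    ux = np.zeros_like(u)
    uy = np.zeros_like(u)
    ux[:, :-1] = u[:, 1:] - u[:, :-1]
    uy[:-1, :] = u[1:, :] - u[:-1, :]
    return ux, uy

def div(px, py):
    """Discrete divergence, defined as the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[:, :-1] += px[:, :-1]
    d[:, 1:] -= px[:, :-1]
    d[:-1, :] += py[:-1, :]
    d[1:, :] -= py[:-1, :]
    return d

def curvature(u, eps):
    """div(grad u / |grad u|_eps), assuming |grad u|_eps = sqrt(ux^2 + uy^2 + eps)."""
    ux, uy = grad(u)
    mag = np.sqrt(ux**2 + uy**2 + eps)
    return div(ux / mag, uy / mag)

def remove_impulse(f, lam1, alpha=0.01, delta=1e-4, gamma=1e-4, n_iter=200):
    """Stage 1 (Eq. 12a): descend ||f - v||_1 + lam1 * TV(v), starting at v_0 = f.

    The l1 term is smoothed as sum(sqrt((f - v)^2 + delta))."""
    v = f.copy()
    for _ in range(n_iter):
        r = f - v
        v = v + alpha * (r / np.sqrt(r**2 + delta) + lam1 * curvature(v, gamma))
    return v

def remove_gaussian(v_hat, lam2, alpha=0.01, beta=1e-4, n_iter=200):
    """Stage 2 (Eq. 12b): descend 0.5*||v_hat - u||^2 + lam2 * TV(u), from u_0 = v_hat."""
    u = v_hat.copy()
    for _ in range(n_iter):
        u = u + alpha * ((v_hat - u) + lam2 * curvature(u, beta))
    return u

def denoise_band(f, lam1, lam2, n_iter=200):
    """Impulse removal followed by removal of the residual Gaussian noise."""
    v_hat = remove_impulse(f, lam1, n_iter=n_iter)
    return remove_gaussian(v_hat, lam2, n_iter=n_iter)
```

For an HSI cube, one could apply `denoise_band` to each band, e.g. `np.stack([denoise_band(cube[..., k], lam1, lam2) for k in range(cube.shape[-1])], axis=-1)`. This band-wise treatment is our simplification and does not exploit the spectral correlation that the Casorati representation is meant to capture.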

Fig. 1.

Quantitative results for Beers dataset synthetically corrupted by Gaussian-impulse noise at level \((G,I)=(10\text { dB, 15}\%)\).

Table 1. (Mean) Peak Signal to Noise Ratio ((M)PSNR)

4 Experiments and Discussion

In this section, we conduct experiments on synthetic and real HSI datasets to compare our technique against state-of-the-art HSI denoising methods from the literature. For quantitative evaluation, we use the Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM) [20] metrics; SSIM is designed to reflect restoration quality as perceived by the human visual system. We compare against three techniques: Hyperspectral Image Restoration Using Low-Rank Matrix Recovery (LRMR) [29], Reducing Mixed Noise from Hyperspectral Images - Spatio-Spectral Total Variation (SSTV) [3] and Total-Variation Regularized Low-Rank Matrix Factorization for Hyperspectral Image Restoration (LRTV) [11].
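For reference, PSNR for images normalized to [0, 1] and its band-averaged version (MPSNR) can be computed as below; the peak value of 1 matches the normalization described in Subsect. 4.1, and the function names are ours. SSIM is more involved; scikit-image provides it as `skimage.metrics.structural_similarity`, and a mean SSIM can be obtained by averaging it over bands in the same way.

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """PSNR in dB for images whose intensities lie in [0, peak]."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(est, float)) ** 2)
    return float(10.0 * np.log10(peak**2 / mse))

def mpsnr(ref_cube, est_cube, peak=1.0):
    """Mean PSNR over the spectral bands (last axis) of two (h, w, c) cubes."""
    bands = range(ref_cube.shape[-1])
    return float(np.mean([psnr(ref_cube[..., k], est_cube[..., k], peak) for k in bands]))

ref = np.zeros((4, 4, 2))
est = ref.copy()
est[..., 0] = 0.1    # MSE 1e-2 -> 20 dB
est[..., 1] = 0.01   # MSE 1e-4 -> 40 dB
print(round(mpsnr(ref, est), 6))  # 30.0
```

Note that `psnr` is undefined (division by zero) for identical images, which does not occur for noisy restorations in practice.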

4.1 Results on Synthetic Data

In this subsection, we simulate the degradation scenario of real HSI data. As discussed in Sects. 1 and 2, images are synthetically corrupted with Gaussian noise of a specific signal to noise ratio (SNR) in dB, followed by random-valued impulse noise specified as a percentage. The corrupted images are then fed to the competing techniques and to our proposed method.

We have obtained our images from two different sources. The first is the University of Southern California Signal and Image Processing Institute (USC-SIPI) database [21], whose images are organised into volumes according to their nature: textures, aerials, miscellaneous and sequences. The images are available in colour and grayscale formats with 8 bits/pixel in three sizes: 256 \(\times \) 256, 512 \(\times \) 512 and 1024 \(\times \) 1024. To limit the computational burden, especially when comparing the execution times of the techniques, we use grayscale images of size \(256\times 256\). All images are normalized to the range [0, 1] to avoid bias in the metric results.

Table 2. (Mean) Structural Similarity ((M)SSIM)

To further study the effect of denoising along the spectral dimension for the different methods, including ours, we use clean multispectral images from the CAVE database [27], acquired over the wavelength range 400 nm to 700 nm with 31 spectral bands. The images were captured with a multispectral CCD camera at a spatial resolution of 512 \(\times \) 512 pixels, with consecutive bands 10 nm apart, and are stored in 16-bit PNG format. The entire database contains 32 images organised into 5 scene types: stuff, skin and hair, paints, food and drinks, and real and fake.

Fig. 2.

Visual results of the synthetically corrupted bridge image at varying levels of Gaussian-impulse noise. Row 1: \((G,I)=(12\text { dB, }10\%);~ (\lambda _1,\lambda _2)=(10^{-3},10^{-4})\), Row 2: \((G,I)=(10\text { dB, }15\%);~ (\lambda _1,\lambda _2)=(5\times 10^{-4},5\times 10^{-4})\) and Row 3: \((G,I)=(5\text { dB, }20\%);~ (\lambda _1,\lambda _2)=(10^{-2},10^{-3})\).

All images used as synthetic data are corrupted by a combination of Gaussian and impulse noise of varying levels. The Gaussian noise level is specified by its SNR in decibels (dB): the higher the SNR, the lower the noise variance, and vice versa. The impulse noise level is specified as a percentage (%) of corrupted pixels: the higher the percentage, the stronger the noise. We corrupt all data with six different levels of Gaussian-impulse noise: \((G~,I)~=~(20\text { dB}, 0.5\%)\), \((18\text { dB}, 1\%)\), \((15\text { dB}, 5\%)\), \((12\text { dB}, 10\%)\), \((10\text { dB}, 15\%)\) and \((5\text { dB}, 20\%)\). Figure 1 shows the layer-wise PSNR and SSIM comparison for layers 1 to 31 of the beers dataset (from the CAVE database) at one noise level, \((10\text { dB}, 15\%)\); the proposed technique outperforms the competing algorithms on both measures. Table 1 and Table 2 report, respectively, the PSNR and SSIM values for the SIPI images and the mean PSNR and mean SSIM for the multichannel beers image from the CAVE dataset. In terms of quantitative evaluation, the proposed method gives the best metric values, indicating better denoising than the other methods.
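The conversion from an SNR in dB to a noise standard deviation is not given explicitly in the text. A common convention, assumed here, is \(\text {SNR}=10\log _{10}(P_u/\sigma _g^2)\), with \(P_u\) the mean squared intensity of the clean image, which gives \(\sigma _g=\sqrt{P_u/10^{\text {SNR}/10}}\). The sketch below (hypothetical names) lists the six noise levels used above, with the impulse percentages written as fractions.

```python
import numpy as np

def sigma_from_snr(u, snr_db):
    """Gaussian noise std giving the requested SNR (dB) for clean image u.

    Assumes SNR = 10 * log10(mean(u**2) / sigma**2)."""
    signal_power = np.mean(np.asarray(u, float) ** 2)
    return float(np.sqrt(signal_power / 10.0 ** (snr_db / 10.0)))

# The six (Gaussian SNR in dB, impulse fraction) levels used in the experiments.
NOISE_LEVELS = [(20, 0.005), (18, 0.01), (15, 0.05), (12, 0.10), (10, 0.15), (5, 0.20)]

clean = np.full((256, 256), 0.5)  # toy image normalised to [0, 1]
for snr_db, impulse_fraction in NOISE_LEVELS:
    print(f"{snr_db:>2} dB -> sigma_g = {sigma_from_snr(clean, snr_db):.4f}, "
          f"impulse fraction = {impulse_fraction:.3f}")
```

Under this convention the same SNR yields a larger \(\sigma _g\) for brighter images, so the noise level is relative to each image's signal power rather than absolute.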

Fig. 3.

Visual results of the synthetically corrupted beers [26] image at varying levels of Gaussian-impulse noise. Row 1: \((G,I)=(12\text { dB, }10\%);~ (\lambda _1,\lambda _2)=(8\times 10^{-5},10^{-5})\), Row 2: \((G,I)=(10\text { dB, }15\%);~ (\lambda _1,\lambda _2)=(1.5\times 10^{-3},10^{-3})\) and Row 3: \((G,I)=(5\text { dB, }20\%);~ (\lambda _1,\lambda _2)=(10^{-2},10^{-1})\).

We present visual results for three noise cases, (12 dB, 10%), (10 dB, 15%) and (5 dB, 20%), in Fig. 2 and Fig. 3 for the bridge and beers datasets, respectively; for the beers dataset, layer 24 is shown. Our proposed technique removes noise at all levels from both datasets without introducing unnecessary artefacts and without losing detailed structures in the images. Although LRTV comes closest to our technique in terms of PSNR and SSIM, the details in the bridge image are almost entirely lost in the LRTV result. In addition, LRTV produces ringing artefacts on the beers dataset at noise level \((10\text { dB}, 15\%)\).

Fig. 4.

Visual results of the Salinas dataset, \((\lambda _1,\lambda _2)=(2\times 10^{-3},10^{-2})\), and the Indian Pines dataset, \((\lambda _1,\lambda _2)=(10^{-4},10^{-3})\).

4.2 Results on Real Data

To check the performance of the proposed technique in a practical HSI scenario, we conduct experiments on two real HSI datasets, both acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. The Indian Pines test site is located in north-western Indiana; the image has a spatial resolution of \(145\times 145\) pixels and 200 reflectance bands over the range 0.4 to 2.5 \(\upmu \)m. The area covered by the image is two-thirds agricultural and one-third forest. Our second dataset is a 224-channel image of Salinas Valley, California, with \(512\times 217\) pixels in the spatial dimensions.

In Fig. 4, we show results for layer 3 of the Salinas dataset and layer 111 of the Indian Pines dataset. We chose these layers because they exhibit a strongly grainy texture caused by high levels of noise. On the Salinas dataset, which contains fairly smooth regions, our technique performs best among all competing methods. This becomes more evident in the magnified sections of the images, where LRMR performs worst and SSTV leaves a whitish cast over the restored image. LRTV, on the other hand, leaves large regions of residual artefacts on the left side of the image. On the Indian Pines dataset, the proposed technique preserves details while significantly reducing the granular effect of noise. LRMR smooths out details considerably, while SSTV does not remove the noise properly; LRTV provides a better compromise between these two extremes.

5 Conclusion

In this paper, we have proposed a novel image denoising technique for HSI data corrupted by a mixture of Gaussian and impulse noise. We have designed our variational framework by modelling image degradation with a combination of Gaussian and Laplacian distributions. Splitting the degradation model into two parts and denoising in successive steps yields the necessary denoising gains, with particular attention to the removal of residual artefacts. Experimental results on synthetically corrupted data and real HSI data suggest that our technique is useful in real scenarios. As future work, we intend to use the output of a learning-based denoiser as a prior within MAP-based iterative optimization. Further, exploiting the low-rank behaviour of HSI data may reduce the computational cost associated with the large size of HSI data.