1 Introduction

Images taken in foggy weather lose their color fidelity and contrast because light is absorbed and scattered during propagation by the medium, such as the water droplets present in the atmosphere. Many automatic systems depend heavily upon the clarity of input images and may fail when the images are degraded. Therefore, improving image de-fogging techniques may benefit many computer vision and image understanding applications, including image or video enhancement, aerial imagery, image classification and remote sensing. Since fog concentration may vary from place to place and is difficult to infer from a single foggy image, image de-fogging is a challenging task.

For de-fogging an image, early researchers used traditional image processing techniques (e.g., histogram based image enhancement [17, 33]). Since a single foggy image rarely provides much information about the fog conditions, the de-fogging effect of such techniques was limited. Subsequently, a number of de-fogging techniques based upon multiple images were proposed. In [29, 32], polarization based techniques were used for de-fogging; images captured with multiple orientations of polarization were used in the de-fogging process. Narasimhan et al. [22, 24] proposed de-fogging techniques based upon multiple images of the same scene under changing weather conditions. In [18, 23], de-fogging was carried out interactively using given information about the scene depth. Significant progress has also been made in physical model based single image de-fogging. Tan [36] proposed a de-fogging technique based upon maximizing the local image contrast using a Markov Random Field. This technique attains impressive performance, but its results suffer from over-saturation. Fattal [7] proposed a de-fogging technique based upon Independent Component Analysis, but it is computationally intensive and not effective for dense fog. Later, He et al. [10] introduced the dark channel prior (DCP), according to which, for non-sky patches, there exists at least one color channel whose intensity is close to zero. Using DCP, the fog thickness is estimated and the image is restored with the atmospheric scattering model. Although DCP is effective in most cases, it cannot handle sky regions, and its time complexity is very high. Later on, to overcome the shortcomings of the DCP technique, several enhancements were proposed [3,4,5, 8, 9, 11, 13,14,15,16, 19,20,21, 26, 27, 30, 31, 34, 35, 37,38,39,40,41,42,43,44].

Wavelets have also been used in image de-fogging. Taking advantage of the fact that fog typically resides in the low frequency spectrum, such techniques build a de-fogging framework that exploits the relationships of wavelet coefficients for fog removal and texture enhancement simultaneously. Wang et al. [39] proposed a fast wavelet transform technique which removes fog from the image and improves its sharpness. In the fog removal stage, two coarse transmission maps obtained using the dark channel prior are fused. For simultaneous de-fogging and sharpness enhancement, a modified fast wavelet transform based unsharp masking framework is applied. Liu et al. [20] proposed a single image de-fogging technique using a multi-scale correlated wavelet framework. This technique handles the de-fogging problem in the frequency domain; it aims not only to significantly increase the perceptual visibility of a scene but also to reduce the effect of noise. Kansal et al. [14] used the dual tree complex wavelet transform to simultaneously de-fog the image and enhance its sharpness.

Inspired by haze-relevant features, Tang et al. [37] estimated the transmission map by combining four haze relevant characteristics in a Random Forest model. Zhu et al. [44] proposed a prior called the color attenuation prior (CAP) for image de-fogging. This prior is simple and effective; it uses a linear model to recover the scene depth of an image, whose parameters are learned using supervised learning. Cai et al. [3] proposed a trainable end-to-end system called DehazeNet for transmission estimation, with specially designed feature extraction layers. Ren et al. [27] proposed a multi-scale convolutional neural network (MSCNN) for learning the transmission map of the foggy image. It consists of a coarse scale network predicting a holistic transmission map and a fine-scale network for refining the map. However, these methods usually employ CNNs to learn a mapping from input foggy images to transmissions or fog-free images, without considering fog-related priors to constrain the mapping space as traditional methods do. Yang and Sun [43] proposed a deep learning-based method that integrates fog imaging model constraints and image prior learning into a single network architecture. Artificial neural networks (ANNs) have also begun to attract more attention in computer vision systems. ANNs are computing systems inspired by biological neural networks that try to emulate the microstructure of the brain. One of the most popular ANN classes is the multilayer perceptron (MLP) because of its robustness and ease of implementation. It is an ANN with at least three layers of neurons, and it commonly uses a supervised learning strategy called back propagation. Colores et al. [28] proposed a single image de-fogging technique that uses an MLP to estimate the transmission map, taking the minimum channel rather than the RGB image as input data.

Machine learning applications have been gaining a lot of attention in recent years with their increasing demand, growing usage and scope for improvement, and such techniques may lead to better de-fogging results in the future. This work studies the limitations of Zhu et al.'s technique described above. That technique uses the guided filter for transmission refinement, which may suffer from halo artifacts, as the local linear model used in this filter cannot finely characterize the image near some edges. Moreover, since the scattering coefficient in the atmospheric scattering model cannot actually be regarded as a constant, Zhu's technique is unstable in its de-fogging performance; due to this, the recovered images suffer from dullness and strong illumination variations. Both issues are resolved in the proposed work in such a way that the time complexity is not compromised. The time complexity of the proposed technique is further reduced by using a minimum/maximum preserving down-sampling approach without compromising the visual quality of the de-fogged image. In brief, the proposed technique efficiently produces higher contrast [6], low saturated results in a very small amount of time.

The remainder of this paper is organized as follows. Section 2 gives an overview of the atmospheric scattering model and the CAP technique: Section 2.1 describes the fog image degradation model, and Section 2.2 covers the CAP linear model, its parameter estimation, and depth estimation and scene recovery using the atmospheric scattering model. Section 3 explains the proposed work: Section 3.1 discusses the proposed depth map estimation, Section 3.2 details the gradient domain guided image filter used for depth refinement, Section 3.3 describes the atmospheric light estimation, and Section 3.4 covers transmission estimation, image restoration and non-uniform illumination compensation. In Section 4, subjective and objective evaluations of the proposed technique against existing techniques are discussed. Finally, Section 5 concludes the proposed work.

2 Background

As the input foggy image reveals only a small amount of scene detail, it is challenging to detect and extract fog from a given image. In this section, the fog image degradation model used in the current research to detect and extract fog from a foggy image is discussed. Zhu et al. [44] proposed an image de-fogging technique which estimates the parameters of this model; it is also discussed in this section.

2.1 Fog image degradation model

In a clear day scenario, every scene point emits energy from different lighting sources, including light reflected by the ground, sunlight or skylight. When this energy reaches the imaging system, a fraction of it is lost. Without fog, an outdoor image reflects multiple colors, but during fog the situation is different. Two mechanisms, direct attenuation and airlight, take place in the imaging system during foggy weather; they are described by

$$ I(x)=L_{0}(x)e^{-\beta d(x)}+L_{A}(1-e^{-\beta d(x)}) $$
(1)

where

$$ t(x)=e^{-\beta d(x)} $$
(2)

Here x represents the image coordinate, I and L0 represent the foggy and de-fogged images respectively, β is the medium scattering coefficient representing the ability of a unit volume of atmosphere to scatter light in all directions, d is the depth of the scene point at x, LA is the global atmospheric light, and t is the transmission map, which is inversely related to depth. Direct attenuation corresponds to the decrease in the energy reflected by the scene point and weakens the image brightness; the term L0(x)e^{−βd(x)} in the fog degradation model is the direct attenuation component. The airlight is produced by the scattering of environmental illumination; the additive term LA(1 − e^{−βd(x)}) in (1) is the airlight component. Airlight boosts the brightness but reduces the saturation of the foggy image, and contributes significantly to the image degradation; therefore, foggy regions in an image are characterized by lower saturation and higher brightness. In the next section, Zhu's technique is discussed, which uses the brightness and saturation channels of a given image for the image de-fogging process.
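To make the model concrete, the following Python sketch applies (1) to synthesize a foggy observation from a clear image and a depth map; the function and parameter names are illustrative, and the default atmospheric light is an assumed value rather than one taken from the paper.

```python
import numpy as np

def synthesize_fog(clear_img, depth, beta=1.0, atmospheric_light=(0.9, 0.9, 0.9)):
    """Fog degradation model (1): I = L0 * t + LA * (1 - t), with t = exp(-beta * d) as in (2).

    clear_img : H x W x 3 array in [0, 1], the fog-free radiance L0
    depth     : H x W array of scene depths d(x)
    """
    t = np.exp(-beta * depth)[..., None]                # transmission map (2)
    LA = np.asarray(atmospheric_light)[None, None, :]   # global atmospheric light
    return clear_img * t + LA * (1.0 - t)               # direct attenuation + airlight
```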

2.2 Overview of CAP technique

Zhu et al. [44] observed that the saturation and brightness of a foggy image vary strongly with the fog concentration, and that, in general, the concentration of fog increases with increasing scene depth. By assuming that the scene depth is positively correlated with fog concentration, Zhu et al. proposed the following linear model for the scene depth, which is used to estimate the parameters of the fog degradation model and recover the foggy image:

$$ d(x)=\theta_{0}+\theta_{1}\times v(x)+\theta_{2}\times s(x)+\epsilon(x) $$
(3)

Here x represents the image coordinates, d denotes the scene depth at x, v represents the scene brightness and s the saturation, 𝜃0, 𝜃1 and 𝜃2 are the linear coefficients, and 𝜖(x) denotes the random error of the model (3). This model given by Zhu et al. is called the color attenuation prior (CAP). One of its main strengths is its edge preserving nature, which can be seen by taking the gradient of (3):

$$ \delta d=\theta_{1}\times \delta v+\theta_{2}\times \delta s+\delta \epsilon $$
(4)

Here δ denotes the gradient of a given channel. According to Zhu et al., 𝜖 tends to be very small and hence the CAP model preserves the gradient according to (4). To accurately learn the coefficients 𝜃0, 𝜃1 and 𝜃2, training data is required; it consists of foggy images and their associated ground truth depths. Since real depth maps are hard to obtain, synthetic depth maps and foggy images are generated, inspired by Tang et al.'s technique [37], from clear images taken from Google and Flickr. For training the linear model, 500 training samples containing 120 million scene points were collected. After 517 epochs, the best learning results were obtained for 𝜃0 = 0.121779, 𝜃1 = 0.959710, 𝜃2 = − 0.780245 and σ2 = 0.041337. These parameters are then used for estimating the depth map with (3). Depth estimated using (3) may fail in some cases, such as white objects, whose brightness values are usually high while their saturation values are low; consequently they would be treated as distant objects. Therefore, the raw depth is modified by applying a local minimum operation:

$$ d_{r}(x)=min_{y\in {\Omega}_{r}(x)}d(y), $$
(5)

Here Ωr(x) denotes the window of radius r centered at x, y ranges over the pixels in the local window Ωr(x), and dr is the final depth map. However, due to the patch-wise local minimum operation, this depth map contains halo artifacts; to remove them, Zhu et al. applied the guided filter (GF).
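For concreteness, a minimal Python sketch of (3) and (5) is given below; it uses the learned coefficients reported above, ignores the random error term, and treats the 15 × 15 window used in [44] as a radius-7 minimum filter. The function name and library calls are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def cap_raw_depth(img, theta0=0.121779, theta1=0.959710, theta2=-0.780245, radius=7):
    """Estimate the CAP depth of (3) and apply the local minimum operation of (5).

    img : H x W x 3 RGB image scaled to [0, 1]
    """
    v = img.max(axis=2)                                 # brightness (HSV value channel)
    s = (v - img.min(axis=2)) / np.maximum(v, 1e-6)     # saturation channel
    d = theta0 + theta1 * v + theta2 * s                # linear CAP model (3)
    d_r = minimum_filter(d, size=2 * radius + 1)        # (5): minimum over a (2r+1) x (2r+1) window
    return d, d_r
```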

After estimating the scene depth map d and the global atmospheric light A, the transmission map t is obtained according to (2). Finally, the scene radiance J is recovered from (1) as

$$ J(x)=\frac{I(x)-A}{min\{max\{e^{-\beta d(x)},0.1\},0.9\}}+A $$
(6)

The values of the transmission map are restricted between 0.1 and 0.9. A small value of β removes too little fog, so the final de-fogging results still look foggy in the distant regions, whereas a large value causes over-enhancement. Therefore, Zhu et al. took a moderate value of β = 1.0.
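Below is a short Python sketch of the recovery step (6) with the transmission clamped to [0.1, 0.9] and β = 1.0 as above; the function name and array conventions are our own.

```python
import numpy as np

def recover_scene(foggy_img, depth, A, beta=1.0):
    """Scene radiance recovery of (6): J = (I - A) / clip(exp(-beta * d), 0.1, 0.9) + A."""
    t = np.clip(np.exp(-beta * depth), 0.1, 0.9)[..., None]  # clamped transmission map
    A = np.asarray(A, dtype=float)[None, None, :]             # global atmospheric light
    return (foggy_img - A) / t + A
```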

This technique is based on the difference between the brightness and saturation of the pixels within the foggy image. It builds a linear model for the scene depth of the foggy image and learns its parameters using a supervised learning approach. With this model, the depth information can be recovered well, and from it the scene radiance of the foggy image can be recovered easily. The technique produces good quality de-fogged images but still has some limitations. Since it estimates the initial depth map from the brightness and saturation maps of the input image, it tends to treat white scene objects as distant, and therefore applies a 15 × 15 window based minimum operation over the neighborhood of each pixel. This requires a total of 225 × M × N operations for an M × N image, which takes a long processing time. Also, to refine the initial depth map, GF is used, which is a well known smoothing filter that preserves edges. Although this filter is efficient in recovering image edges, it does not work properly for fine edge details [19]. In reality, the foggy image may be influenced by different lighting sources present in the atmosphere, but CAP assumes a homogeneous environment in the fog image degradation model and uses a constant value of atmospheric light to recover all image pixels according to (6). Due to this, the recovered image contains dullness or illumination variations, which is another shortcoming of this technique. In the proposed work, these limitations have been dealt with in such a way that the time complexity of the whole image de-fogging process is not compromised, as explained in the next section.

3 Proposed technique

In this section, the overall proposed technique, which addresses the above described limitations of the CAP technique, is discussed. The window based operation applied in (5) is computationally expensive because it finds the local minimum independently for each pixel of the image. For depth map refinement, Zhu et al. [44] used GF, which is a well known smoothing filter with an edge-preserving property and low time complexity; however, this filter may not represent the image well near fine edges. Also, the de-fogging results of CAP suffer from illumination variations which make the final images look diminished or dull. These limitations are addressed in the proposed work as follows:

  i. In the proposed work, to refine the depth map, GDGF is used instead of GF, which incorporates explicit first-order edge attentive constraints to better recover the edges in the de-fogged images.

  ii. To compensate non uniform illumination variations in the de-fogged image, the initial de-fogging results are modified with the help of bright channel prior and illumination reflection model.

  iii. To reduce the execution time, image sub-sampling mechanism is used in different ways at different steps without compromising the quality and constraints of image de-fogging:

    a. Minimum preserving sub-sampling is applied to estimate the initial depth map using CAP technique.

    b. GDGF is modified by using bilinear sub-sampling inspired from fast GF.

    c. Maximum preserving sub-sampling is applied to estimate the bright channel prior for non-uniform illumination compensation.

Figure 1 shows the block diagram of the proposed technique, which is discussed in the following sub-sections. Since the proposed technique is an improvement of CAP, which is a model (1) based de-fogging technique, it involves the estimation of two parameters, i.e., LA and d(x) (or t(x)). Using the foggy input image, the initial depth map d is obtained with the CAP technique. The patch based depth map is then obtained by applying the minimum preserving sub-sampling mechanism described in Section 3.1. Edge preserving smoothing of the depth map is performed using GDGF, described in Section 3.2. Inspired by [9], GDGF is implemented by down-sampling the input and guidance images to reduce the execution time. The global atmospheric light is found from the top 25% image rows, as explained in Section 3.3. The de-fogged image is then obtained using (6). This image may suffer from dullness and strong illumination variations; such variations are removed in the proposed work using Lambert's law of illumination reflection. This compensates for non-uniform illumination and yields simultaneous dynamic range modification, color consistency, and lightness rendition without producing artifacts in the de-fogged image, as explained in Section 3.4.

Fig. 1 Block diagram of the proposed work

3.1 Depth map estimation

In the proposed work, the initial depth map is estimated using (3) of the CAP technique. This depth estimation may fail in some cases, such as white objects [44]. For such objects, the brightness values are usually high and the saturation values are low; therefore, the direct depth d estimated using (3) may consider white objects to be distant ones. So, the raw depth d is modified by a local neighborhood operation in Zhu's technique [44] as

$$ d_{r}(x)=min_{y\in {\Omega}_{r}(x)}d(y) $$
(7)

Where dr(x) is the local neighborhood depth at location x with window Ωr(x) of radius r centered at x. According to (7), estimating the whole depth map for the M × N pixels of an image requires r2 × M × N operations. Therefore, in the proposed work, inspired by Kansal et al. [15], we estimate the above window based depth map using an image down-sampling mechanism so that the total number of operations is reduced by working on a down-sampled image. Down-sampling is applied in such a way that most of the local minimums of the image are preserved in the resultant depth map.

Since the patch based minimum operation produces locally constant or redundant values, in this work the window based depth map of a given initial depth map is estimated in a way that minimizes the redundant calculations while preserving the local minimum value in each window. The steps to obtain the window based depth dr from the initial depth map d are as follows (a brief code sketch of this procedure is given after the list):

  • An initial depth map d of size M × N is divided into s non-overlapping blocks of fixed size (B1, B2, ..., Bi, ..., Bs) as shown in Fig. 2. Here the block size is taken as 5 × 5.

  • Then, a down sampled depth image dds is created in such a way that each pixel in dds is obtained from each block Bi by taking the minimum in the respective block as

    $$ d^{ds}(x)=min(B_{i}), ~~~~ i=1,2,3,.....s $$
    (8)

    The number of pixels in dds is equal to s. Here min specifies the mathematical operation to find the minimum depth value in a block Bi.

  • After obtaining dds, its window based depth map \(d^{ds}_{r}\) is estimated as

    $$ d^{ds}_{r}(x)=min_{y\in\omega(x)}(d^{ds}(y)) $$
    (9)

    Here ω is the window of size 3 × 3.

  • After this, nearest neighbor up-sampling of \(d^{ds}_{r}\) is performed to obtain the equivalent window based depth map dr. This depth map is further refined as described in the next section.
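The sketch below summarizes these steps in Python; for brevity it assumes the image dimensions are multiples of the block size, and the helper names are ours.

```python
import numpy as np
from scipy.ndimage import minimum_filter, zoom

def fast_window_depth(d, block=5, win=3):
    """Minimum preserving sub-sampled depth of Section 3.1.

    d : M x N initial CAP depth map; M and N are assumed to be multiples of `block`.
    """
    M, N = d.shape
    blocks = d.reshape(M // block, block, N // block, block)
    d_ds = blocks.min(axis=(1, 3))              # (8): block-wise minimum -> down-sampled map
    d_ds_r = minimum_filter(d_ds, size=win)     # (9): 3 x 3 local minimum on the small map
    return zoom(d_ds_r, block, order=0)         # nearest neighbor up-sampling back to M x N
```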

Fig. 2 Division of depth map d into fixed size blocks

3.2 Depth map refinement with fast gradient domain guided filter

The guided filter is an image smoothing filter with the special property of preserving edges, and it is well known for its low computational cost. However, GF is prone to halo artifacts because it uses a local linear model for image representation, which may not preserve edges cleanly. Kou et al. [19] proposed a modification of GF called the gradient domain guided filter (GDGF), in which first order edge awareness constraints are incorporated to better preserve the edges in the smoothed image. GF obtains a smoothed image q by applying a local linear model to a filtering image and a reference image, as shown in (10). The filtering image is denoted as p and the reference (guidance) image as G. Ωr(k) denotes a rectangular window with center k and radius r.

$$ q_{i}=a_{k}\times G_{i}+b_{k}, \forall i \in {\Omega}_{r}(k) $$
(10)

Here ak and bk represent two constants in a window Ωr(k). The values of these constants are obtained by minimizing the following cost function:

$$ E(a_{k},b_{k})=\sum\limits_{i \in {\Omega}_{r}(k) }[(a_{k}\times G_{i}+b_{k}-p_{i})^{2}+\lambda \times {a_{k}^{2}}] $$
(11)

Here λ denotes a regularization parameter penalizing a large ak. The optimum values of ak and bk are obtained using linear regression. In GF, no specific constraints are applied to handle the gradient, so in some cases it is difficult to preserve the edges. Kou et al. added first order edge attentive constraints to form GDGF from GF. The filtered results obtained by GDGF are closer to the input image around the edges; therefore, the edges are better preserved in comparison to GF. The cost function of GDGF is represented as

$$ E(a_{k},b_{k})=\sum\limits_{i \in {\Omega}_{r}(k) }\left[(a_{k}\times G_{i}+b_{k}-p_{i})^{2}+\frac{\lambda}{{\Gamma}_{G,k}} (a_{k}-\gamma_{k})^{2}\right] $$
(12)
$$ \gamma_{k}=1-\frac{1}{1+e^{\psi_{k}}}, ~~~\psi_{k}=\frac{4 \times (S_{k}- \theta_{S} )}{\theta_{S}-min(S)} $$
(13)
$$ {\Gamma}_{G,k}=\frac{1}{N}\sum\limits_{i=1}^{N}\frac{S_{k}+\epsilon}{S_{i}+\epsilon} $$
(14)
$$ S_{k}=\sigma_{G,1}(k) \times \sigma_{G,r}(k) $$
(15)

Where γk is an edge attentive filtering parameter and ΓG,k is an edge attentive weight which measures the importance of pixel k with respect to the whole guidance image. 𝜖 is a small constant whose value is generally taken as (0.001 × L)2, where L is the dynamic range of the input image, N is the total number of pixels in the guidance image, Sk denotes the variance of the area centered at k, S represents all Si (i = 1...N), and 𝜃S denotes the mean value of all Si. σG,1 and σG,r denote the standard deviations of image G over the windows Ω1(k) and Ωr(k); thus Sk measures the variance of the regions of radius 1 and r at the same time. The optimum values for GDGF are then estimated as:

$$ a_{k}=\frac{\theta_{G. p,r}(k)-\theta_{G,r}(k) \times \theta_{p,r}(k)+\frac{\lambda}{{\Gamma}_{G,k}}\times \gamma_{k}}{\sigma_{G,r}^{2}(k)+\frac{\lambda}{{\Gamma}_{G,k}}}, $$
(16)
$$ b_{k}=\theta_{p,r}(k)-a_{k}\times \theta_{G,r}(k), $$
(17)

In (16), the operator (.) denotes element-wise multiplication of two matrices, and 𝜃G.p,r(k), 𝜃G,r(k) and 𝜃p,r(k) denote the average values of G.p, G and p in the window Ωr(k). Finally, the value of qi is estimated as

$$ q_{i}=\theta_{a(i)}\times G_{i}+\theta_{b(i)}, ~~~~~ \theta_{a(i)}=\frac{1}{|{\Omega}_{r}(i)|}\sum\limits_{k\in {\Omega}_{r}(i)}a_{k},~~~~~ \theta_{b(i)}=\frac{1}{|{\Omega}_{r}(i)|}\sum\limits_{k\in {\Omega}_{r}(i)}b_{k} $$
(18)

Here |Ωr(i)| denotes the cardinality of Ωr(i). The value of γk approaches 1 if the pixel k lies on an edge and 0 if it belongs to a smooth region, and ak behaves similarly to γk. In summary, GDGF provides better performance near edges by incorporating the two edge attentive parameters γk and ΓG,k, and it still performs filtering in O(N) time. In (10) and (11), p is the depth map dr obtained in the previous section, G is the guidance image, which is the input image I, and q is the refined depth map dref obtained after applying GDGF. As discussed above, GF does not preserve all edges cleanly; GDGF incorporates first order edge awareness constraints into GF, which helps to better preserve the edges in the smoothed image. This is also shown in Fig. 3. The areas inside red rectangles are zoomed and shown in other rectangles as indicated by arrows. In Fig. 3b, the de-fogged image is produced with GF, whereas in Fig. 3c, GDGF is used. The zoomed regions show that the former filter does not work properly for some fine details, which are handled by GDGF.

Fig. 3 a Input foggy image. De-fogged images obtained using b guided filter [11] c gradient domain guided image filter [19]

Inspired by [9], the performance of GDGF is improved in the proposed work by performing the above operations on down-sampled input images with scaling factor s. He et al. observed that if GF is applied to the down-sampled input and guidance images with s = 4, results similar to those obtained on full resolution images are achieved; this leads to a speedup of more than 10 times with almost no visible degradation. Similarly, in the proposed work, with GDGF, the depth map dr and the guidance input image I are down-sampled (nearest-neighbor or bilinear). All the window based filtering is performed on the down-sampled images. Finally, the coefficient maps \(\bar {a}\) and \(\bar {b}\) are up-sampled to the original size using bilinear interpolation, and the refined depth map dref is computed using (10) with the up-sampled coefficients and the guidance image.
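A rough Python sketch of this fast GDGF is given below (grayscale guidance for simplicity; the window radius, regularization value and helper names are our own illustrative choices, not those of [19] or [9]).

```python
import numpy as np
from scipy.ndimage import uniform_filter, zoom

def _box(x, r):
    """Mean over a (2r+1) x (2r+1) window with reflective borders."""
    return uniform_filter(x, size=2 * r + 1, mode='reflect')

def gdgf_coefficients(p, G, r, lam, eps):
    """Averaged linear coefficients of GDGF, following (13)-(17)."""
    mean_G, mean_p = _box(G, r), _box(p, r)
    var_G = np.maximum(_box(G * G, r) - mean_G ** 2, 0.0)
    cov_Gp = _box(G * p, r) - mean_G * mean_p
    sigma_1 = np.sqrt(np.maximum(_box(G * G, 1) - _box(G, 1) ** 2, 0.0))
    S = sigma_1 * np.sqrt(var_G)                                     # (15)
    psi = np.clip(4.0 * (S - S.mean()) / (S.mean() - S.min() + 1e-12), -50, 50)
    gamma = 1.0 - 1.0 / (1.0 + np.exp(psi))                          # (13): edge attentive parameter
    Gamma = (S + eps) * np.mean(1.0 / (S + eps))                     # (14): edge attentive weight
    a = (cov_Gp + (lam / Gamma) * gamma) / (var_G + lam / Gamma)     # (16)
    b = mean_p - a * mean_G                                          # (17)
    return _box(a, r), _box(b, r)                                    # window averages used in (18)

def fast_gdgf(p, G, r=16, lam=1e-3, s=4):
    """Fast GDGF of Section 3.2: coefficients are computed on s-times down-sampled
    images and only the coefficient maps are up-sampled before applying (18)."""
    eps = (0.001 * (G.max() - G.min())) ** 2
    p_s, G_s = zoom(p, 1.0 / s, order=1), zoom(G, 1.0 / s, order=1)  # bilinear down-sampling
    a_s, b_s = gdgf_coefficients(p_s, G_s, max(r // s, 1), lam, eps)
    up = (G.shape[0] / a_s.shape[0], G.shape[1] / a_s.shape[1])
    a = zoom(a_s, up, order=1)                                       # bilinear up-sampling of coefficients
    b = zoom(b_s, up, order=1)
    return a * G + b                                                 # (18) at full resolution
```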

3.3 Atmospheric light estimation

Global atmospheric light (LA) plays an important role in the model based image de-fogging process. A high value of atmospheric light produces darker de-fogged images and a lower value produces brighter ones [31]. According to [44], the top 0.1% brightest pixels in the depth map obtained in (9) represent the most fog opaque region of a foggy image, which can be considered the best region for atmospheric light estimation. According to the Koschmieder model (1), the scene point value of a foggy image approaches the global atmospheric light in regions of infinite depth. Consider (1):

$$ I(x)=L_{0}(x)e^{-\beta d(x)}+L_{A}(1-e^{-\beta d(x)}) $$

For the pixel x of infinite depth (\(d(x) \rightarrow \infty \)), \(e^{-\beta d(x)} \rightarrow 0\), therefore,

$$ I(x)\approxeq L_{A}~~~~~~~~ or ~~~~~~~~L_{A}\approxeq I $$

The regions of infinite depth generally lie in the top portion of an image. Therefore, in the proposed work, to save computational time, LA is found from the depth map of the top 25% of image rows. These rows are extracted and their corresponding depth map is obtained according to (9). Finally, the values in the R, G, B color channels corresponding to the top 0.1% brightest pixels in the depth image are selected as the value of the global atmospheric light LA. This value of atmospheric light is further used in the image de-fogging process described in the next section.
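A small Python sketch of this step follows; the depth map is assumed to be at the same resolution as the image, and averaging the colors of the selected pixels into a single LA is our assumption, since the paper leaves the aggregation unspecified.

```python
import numpy as np

def estimate_atmospheric_light(img, depth, top_rows=0.25, pct=0.001):
    """Atmospheric light of Section 3.3: restrict to the top 25% of rows and take the
    image colors at the 0.1% largest depth values as LA."""
    rows = max(1, int(np.ceil(top_rows * img.shape[0])))
    d_top = depth[:rows].ravel()
    I_top = img[:rows].reshape(-1, 3)
    k = max(1, int(pct * d_top.size))
    idx = np.argsort(d_top)[-k:]        # most fog-opaque (deepest) pixels in the top region
    return I_top[idx].mean(axis=0)      # averaged color of the selected pixels (our choice)
```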

3.4 Transmission estimation, image restoration and non-uniform illumination compensation

The objective of an image de-fogging algorithm is to obtain a fog-free image (L0) from a foggy image (I). For this, the refined depth map dref is obtained as described in Section 3.2 and LA as described in Section 3.3. The transmission map is estimated using (2). Rearranging (1), \({L_{0}^{c}}(x)\) is found in [44] as

$$ {L_{0}^{c}}(x)={L_{A}^{c}}+\frac{I^{c}(x)-{L_{A}^{c}}}{min\{max\{e^{-\beta d_{ref}(x)},0.1\},0.9\}} $$
(19)

\({L_{0}^{c}}(x)\) is then recovered by substituting the values of I(x), dref(x) and LA in (19). To avoid noise in the de-fogging results, the transmission t(x) is restricted between 0.1 and 0.9 [44], and β = 1.0 is also taken from [44].

Solving for the scene radiance from the physical model is generally an under-constrained problem because a single value of the global atmospheric light and of the medium extinction coefficient is taken for the whole image. To avoid the problems caused by errors in the estimation of the global atmospheric light and by a constant value of β, we propose a method, inspired by [34], to compensate for non-uniform illumination and avoid dullness in the recovered image L0. This method provides simultaneous dynamic range modification, color consistency, and lightness rendition in a time efficient manner. According to Lambert's law of illumination reflectance, the amount of light reflected by a point on an object, i.e., the intensity of a given pixel, is the product of the scene illumination and the object's reflectance, assuming that light is diffused equally in all directions. Mathematically,

$$ {L_{0}^{c}}(x)=L_{L}(x)\times {L_{R}^{c}}(x) $$
(20)

Here LL is the scene illumination and \({L_{R}^{c}}\) is the reflectance for color channel c. To remove non-uniform illumination from the de-fogged image L0, the fundamental problem is how to eliminate the illumination veil LL while keeping the reflectance LR. The bright channel prior has been widely used in image processing [34]; according to it, most local patches in an outdoor color image contain some pixels whose intensity is very high in at least one of the RGB color channels. Based on the model in (20), the initial illumination veil \(L_{L}^{bright}\) can be constructed from the bright channel prior as

$$ L_{L}^{bright}(x)=max_{c\in {RGB}}(max_{y\in {\Omega}(x)}({L_{0}^{c}}(y))) $$
(21)

The scene illumination should be smooth while also maintaining the image details [34]. Smoothness is achieved by taking the local maximum over a square patch in (21). To preserve image details, an edge preserving smoothing operation is applied to \(L_{L}^{bright}\) using GDGF, as described in Section 3.2, to obtain the final scene illumination LL of (20). The reflectance is then found by rearranging (20) as:

$$ {L_{R}^{c}}(x)=\frac{{L_{0}^{c}}(x)}{L_{L}(x)} $$
(22)

The procedure described above does not produce any color distortion, but certain scene points may violate the “gray world” assumption and generate some color bias in LR. To correct this, a gain/offset correction is performed by biasing the average image color towards pure white. A normalization operation is performed for a scene with dynamic range between \(L_{R_{min}}\) and \(L_{R_{max}}\). This transform can be done using

$$ L_{R}^{c^{\prime}}(x,y)= \left\{\begin{array}{lll} 0 & \text{if } {L_{R}^{c}}(x,y)\leq L_{R_{min}} \\ \frac{{L_{R}^{c}}(x,y)-L_{R_{min}}}{L_{R_{max}}-L_{R_{min}}} & \text{if } L_{R_{min}}\leq {L_{R}^{c}}(x,y)\leq L_{R_{max}} \\ 1 & {L_{R}^{c}}(x,y)\geq L_{R_{max}} \end{array}\right. $$

Here \(L_{R}^{c^{\prime }}\) and \({L_{R}^{c}}\) are the cth color channel’s output and input bands respectively. \(L_{R_{min}}\) = μ − 2σ and \(L_{R_{max}}\) = μ + 2σ are found by calculating the mean μ and standard deviation σ of the input image LR. The procedure described above yields a good visible representation of the de-fogged image, as shown in Fig. 4c. To find the bright channel (21), the strategy of Section 3.1 is reused, which reduces the computational cost; in this case, a local maximum is applied instead of the minimum operation.
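A compact Python sketch of this post-processing step is shown below; the patch size, the use of a per-image (rather than per-channel) mean and standard deviation, and the choice of guidance image are our assumptions, and the maximum preserving sub-sampling of the bright channel is omitted for brevity. The `smooth_fn` argument can be, e.g., the fast GDGF sketch of Section 3.2.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def illumination_compensation(L0, smooth_fn, patch=15):
    """Non-uniform illumination compensation of Section 3.4: bright channel (21),
    edge preserving smoothing of the illumination veil, reflectance (22), then
    gain/offset normalisation to [0, 1]."""
    bright = maximum_filter(L0.max(axis=2), size=patch)   # (21): channel-wise and patch-wise maximum
    L_L = smooth_fn(bright, L0.mean(axis=2))               # smooth the veil, guided by the gray image
    L_R = L0 / np.maximum(L_L[..., None], 1e-6)             # (22): reflectance
    mu, sigma = L_R.mean(), L_R.std()
    lo, hi = mu - 2.0 * sigma, mu + 2.0 * sigma              # dynamic range for the normalisation
    return np.clip((L_R - lo) / (hi - lo), 0.0, 1.0)
```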

Fig. 4 a Input foggy image b De-fogged image without illumination compensation c De-fogged image with illumination compensation

4 Experimental results and discussion

The proposed technique is implemented in MATLAB 2016a on an Intel Core i5 1.60 GHz processor with 8 GB RAM. The results of the proposed technique are compared with existing techniques on the basis of subjective and objective parameters. For these comparisons, seven different images, “Image 1” to “Image 7”, taken from the datasets [1] and [2], are considered.

4.1 Subjective evaluation

In this subsection, the results of the proposed technique are visually compared with existing techniques, including Tan [36], Tarel [38], He [10], Xiao [40], Meng [21], Tang [37], Choi [5], Zhu [44], Ren [27], Liu [20] and Colores [28]. Figure 5 contains the de-fogging results on “Image 1” for He, Tarel, Meng, Zhu, Choi, Cai and the proposed technique. Meng’s and Tarel’s results are saturated in the sky area, whereas the non-sky region of Choi’s result is over-saturated. The results of Zhu’s technique are better, but the contrast of the de-fogged image is poor. In comparison, the results of the proposed technique have higher contrast and are not saturated. In Fig. 6, the results of the proposed technique are compared for an evening-time image, “Image 2”. Clearly, the de-fogging results of all the other techniques are overly dark, due to which scene information is even lost. The proposed technique, however, applies non-uniform illumination compensation, which improves the illumination of the final image. A similar effect can also be observed in Fig. 7 for “Image 3”.

Fig. 5 a Input foggy image “Image 1”. De-fogged images obtained using techniques of b He [11] c Tarel [38] d Meng [21] e Zhu [44] f Choi [5] g Cai [3] h Proposed

Fig. 6 a Input foggy image “Image 2”. De-fogged images obtained using techniques of b He [11] c Tarel [38] d Meng [21] e Xiao [40] f Tang [37] g Zhu [44] h Proposed

Fig. 7 a Input foggy image “Image 3”. De-fogged images obtained using techniques of b He [11] c Tarel [38] d Meng [21] e Xiao [40] f Tang [37] g Zhu [44] h Proposed

In Fig. 8, a comparison is made between the de-fogging results of the Tan, He, Tarel, Meng, Choi, Cai and proposed techniques for “Image 4”. Tan’s result is completely over-saturated, whereas Cai’s technique produces over-saturation in the extreme left area. The results of the other techniques are better, but they look dull and diminished. In comparison, the result of the proposed technique is visibly better. In Fig. 9, for “Image 5”, it can be seen that the proposed technique can uncover details and recover realistic color information even in dense foggy regions. In contrast, the results of Meng, Cai and Choi are over-saturated. Cai proposed a learning-based framework that trains a regressor to predict the transmission value t(x) at each pixel from its surrounding patch; it obtains good results but leaves a noticeable amount of fog in the recovered images. The results of Zhu and Ren are better, but the dense foggy regions are not properly enhanced and the overall results have poor contrast. A similar observation can be made in Fig. 10 for “Image 6”. The results have also been compared with Liu et al.’s [20] wavelet based technique, which can simultaneously remove fog, improve image sharpness and reduce noise. In Fig. 10j, enhanced edges can be observed, but at the same time image information in many areas is lost due to excessive darkness. The result of Colores et al.’s technique [28], based upon an MLP, is shown in Fig. 10k. This technique uses a multilayer perceptron to compute the transmission map directly from the minimum channel and a contrast stretching step to improve the dynamic range of the restored image, but shows no significant visibility improvement. In contrast, the road area, trees and cars in the proposed results can be clearly visualized.

Fig. 8 a Input foggy image “Image 4”. De-fogged images obtained using techniques of b Tan [36] c He [11] d Tarel [38] e Meng [21] f Choi [5] g Cai [3] h Proposed

Fig. 9 a Input foggy image “Image 5”. De-fogged images obtained using techniques of b He [11] c Meng [21] d Zhu [44] e Choi [5] f Ren [27] g Cai [3] h Proposed

Fig. 10 a Input foggy image “Image 6”. De-fogged images obtained using techniques of b He [11] c Tarel [38] d Meng [21] e Shiau [31] f Choi [5] g Zhu [44] h Cai [3] i Ren [27] j Liu [20] k Colores [28] l Proposed

Next, the de-fogging results of the proposed technique are shown in Fig. 11 for “Image 7”. For Cai’s technique, the road portion becomes clearly over-saturated. Meng’s technique generates false colors in the sky portion of the image. Tarel’s image is good, but due to the use of a double median filter, halo artifacts can be observed around the depth edges. Zhu’s result is better, but the information in the image is not well recovered. Liu’s results are overly dark, whereas Colores’ image is not clearly enhanced. In contrast, the proposed technique efficiently removes fog and improves the image details. Overall, it can be concluded that the proposed technique is capable of removing fog from digital images in different kinds of environments with varying amounts of fog, as shown for the different types of images.

Fig. 11 a Input foggy image “Image 7”. De-fogged images obtained using techniques of b He [11] c Tarel [38] d Meng [21] e Shiau [31] f Choi [5] g Zhu [44] h Cai [3] i Ren [27] j Liu [20] k Colores [28] l Proposed

4.2 Objective evaluation

The final restoration results of the proposed de-fogging technique have been compared with state-of-the-art techniques, including Tarel [38], He [11], Meng [21], Choi [5], Zhu [44], Cai [3], and Ren [27]. For comparison, seven foggy images taken from the databases [1] and [2] have been considered. The comparison is made in two categories: first, image quality parameters are considered, and then the efficiency of each de-fogging technique is evaluated on the basis of execution time, as described below.

  i. Image Quality Parameters: To ensure comparability of various de-fogging techniques, descriptive variables or metrics for judging the output image quality are necessary. In this work, three different quality parameters including Visual Contrast Measure (VCM), Color Naturalness Index (CNI) and Fog Reduction Factor (FRF) have been considered as described below:

    • VCM: It quantifies the visibility strength of a given image [42] and is calculated by

      $$ VCM=100\times \frac{K_{v}}{K_{t}} $$
      (23)

      Here Kt is the total number of local areas and Kv is the number of local areas whose standard deviation is larger than a given threshold. The Otsu threshold segmentation algorithm [25] has been used to calculate this threshold. VCM thus uses local standard deviations as a measure of image contrast to quantify visibility; in general, a higher VCM value indicates a better quality image (a brief sketch of the VCM computation is given after this list).

    • FRF: It represents the difference between the fog density of the foggy image, I, and that of the de-fogged image, L0. Fog density is measured using the Fog Aware Density Evaluator (FADE) [5], which estimates it from natural scene statistics such as local mean, contrast, sharpness, image entropy, dark channel, color saturation and colorfulness. A larger value of FRF indicates better performance of a de-fogging technique. FRF is calculated as

      $$ FRF=FADE(I)-FADE(L_{0}) $$
      (24)
    • CNI: It is the degree of correspondence between human perception and the real world, and its value lies in [0, 1] [12]. A higher value of CNI indicates that a de-fogged image is more natural. To find the CNI of a given RGB image, first its luminance (L), hue (H), and saturation (S) components are computed, and L and S are limited using a threshold. Next, each pixel is classified into one of three classes, namely skin, grass and sky, based on its hue value; in particular, the hue ranges 25-70, 95-135 and 185-260 are used for skin, grass and sky pixels, respectively. Let nskin, ngrass and nsky be the numbers of pixels in these three classes. For each class, the average and standard deviation of the saturation are estimated; Saverage skin, Saverage grass, and Saverage sky are the averaged saturation values for skin, grass and sky pixels, respectively. By assuming a normal distribution, the local naturalness indices for skin, grass and sky are computed from:

      $$ N_{skin}=exp\left[-\frac{1}{2}\left( \frac{S_{average_{skin}}-0.76}{0.52}\right)^{2}\right] $$
      (25)
      $$ N_{grass}=exp\left[-\frac{1}{2}\left( \frac{S_{average_{grass}}-0.81}{0.53}\right)^{2}\right] $$
      (26)
      $$ N_{sky}=exp\left[-\frac{1}{2}\left( \frac{S_{average_{sky}}-0.30}{0.13}\right)^{2}\right] $$
      (27)

      Finally, the global CNI value is obtained as

      $$ CNI=\frac{n_{skin} \times N_{skin}+n_{grass}\times N_{grass}+n_{sky}\times N_{sky}}{n_{skin}+n_{grass}+n_{sky}} $$
      (28)

    A better de-fogging technique achieves higher VCM, FRF and CNI, as discussed above. For the proposed technique, comparisons with existing techniques are given in Tables 1, 2 and 3 for VCM, CNI and FRF respectively. In all the tables, the last column gives the average values of VCM, FRF and CNI, which are shown in bold to indicate the overall performance achieved by a given technique over the different types of images. It can be seen that the average VCM achieved by the proposed technique is the best among all the considered techniques. The average CNI of the proposed technique is slightly less than that of Tarel’s technique, but at the same time Tarel’s visual contrast and fog reduction factor are significantly lower than those of the proposed technique. Also, the average FRF achieved by the proposed technique is better than that of the He, Tarel, Meng, Zhu, Ren and Colores techniques. The FRF values of the Cai, Choi and Liu techniques are better, but their other two parameters are not good. In the proposed work, we have improved the technique of Zhu et al. [44]. In comparison to Zhu’s technique, the percentage improvement achieved by the proposed technique is 16.9% in VCM, 23.3% in CNI and 63.28% in FRF, which is considerably high. Furthermore, the average execution time taken by the proposed technique is significantly less than that of Zhu’s technique. The overall results achieved by the proposed technique are satisfactory; its computational efficiency is discussed below.

  ii. Execution Time: It is the total time required to execute the technique to obtain a final de-fogged image (L0) from a foggy image (I). An image de-fogging technique needs to be used in real time applications such as video surveillance systems, intelligent vehicles, etc. Therefore, the execution time taken by a de-fogging technique is a considerable issue. A technique is considered better if it provides good visual results in a lesser amount of time. The execution time of the proposed technique is measured in seconds and is given in Table 4.

    It can be clearly seen from Table 4 that the processing speed of the proposed technique is better than that of all the other techniques. This shows that the proposed technique is more computationally efficient than the existing techniques while, at the same time, the other quality parameters are not compromised, as shown in Tables 1-3. Therefore, it can be concluded that the proposed technique achieves high speed while maintaining the overall image quality.
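The following Python sketch illustrates the VCM computation of (23). The paper does not specify the local area size or whether the Otsu threshold is computed on the image or on the block statistics; as one interpretation, 8 × 8 blocks are used here and Otsu's method is applied to the block standard deviations, so the numbers are illustrative only.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu's threshold on a 1-D array: the bin centre maximising the between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = hist.cumsum()
    w1 = w0[-1] - w0
    mu_cum = (hist * centers).cumsum()
    mu0 = mu_cum / np.maximum(w0, 1)                    # class-0 mean per candidate threshold
    mu1 = (mu_cum[-1] - mu_cum) / np.maximum(w1, 1)     # class-1 mean per candidate threshold
    return centers[np.argmax(w0 * w1 * (mu0 - mu1) ** 2)]

def vcm(img_gray, block=8):
    """Visual Contrast Measure of (23): percentage of local areas whose standard deviation
    exceeds an Otsu threshold computed over all local standard deviations."""
    M, N = img_gray.shape
    Mc, Nc = (M // block) * block, (N // block) * block
    blocks = img_gray[:Mc, :Nc].reshape(Mc // block, block, Nc // block, block)
    stds = blocks.std(axis=(1, 3)).ravel()
    return 100.0 * np.mean(stds > otsu_threshold(stds))   # (23): 100 * Kv / Kt
```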

Table 1 VCM comparison
Table 2 CNI comparison
Table 3 FRF comparison
Table 4 Comparison of execution time of the proposed technique with existing techniques

4.3 Effect of GDGF and non-uniform illumination compensation on de-fogging results

In Sections 4.1 and 4.2, the overall de-fogging results of the proposed technique have been compared with existing techniques. These comparisons show that the proposed technique outperforms the existing techniques in terms of VCM, CNI, FRF and execution time. In this section, the effects of GDGF and of non-uniform illumination compensation in the proposed technique are discussed. As stated in Section 3, the main contributions of the proposed work are the replacement of GF with GDGF in the CAP technique, the application of non-uniform illumination compensation, and the use of minimum/maximum preserving sub-sampling to reduce execution time. In Fig. 12, the effect of the first two contributions is shown. In Fig. 12b, the de-fogging result of the proposed approach is shown without the post-processing step; it can be clearly observed that the image clarity and colorfulness are increased by applying the proposed post-processing. In addition, the result of the proposed approach obtained using GF in place of GDGF is shown in Fig. 12c. As described in Section 3.2, GDGF recovers edges better than GF. The regions improved by applying GDGF are marked with red rectangles and enlarged at the top, which clearly indicates that the saturated portions generated by GF are corrected using GDGF.

Fig. 12 a Input foggy image “Image 5” b De-fogging result of the proposed approach without non-uniform illumination compensation c De-fogging result of the proposed approach with GF (in place of GDGF) d De-fogging result of the proposed approach (with GDGF)

In Tables 5 and 6, the effects of applying GDGF and non-uniform illumination compensation are shown through the average values of VCM, CNI and FRF over Images 1-7. From Table 5, it can be observed that replacing GF with GDGF causes an 11.49% increase in VCM and a 2.9% rise in CNI, with only a 0.5% decrease in FRF, which is very small in comparison to the gains in the other two parameters. GDGF recovers edges better, which leads to an increase in visual contrast and naturalness. On the other hand, Table 6 shows rises of 26.16%, 23.73% and 72.73% in VCM, CNI and FRF respectively, which are quite high. Non-uniform illumination compensation increases brightness and colorfulness, and thus causes a simultaneous increase in all three parameters. Therefore, it can be concluded that applying GDGF improves the de-fogging results, and that the non-uniform illumination compensation performed in the proposed technique significantly improves the de-fogging performance.

Table 5 Average VCM, CNI and FRF comparison of the proposed approach with GF versus GDGF (non-uniform illumination compensation applied in both cases)
Table 6 Average VCM, CNI and FRF comparison of the proposed approach with and without non-uniform illumination compensation (GDGF applied in both cases)

5 Conclusion

In this work, a de-fogging technique based upon CAP is proposed. Using CAP, the depth at each pixel of a foggy image is estimated from the difference between the saturation and brightness of that pixel. A local window based minimum operation is applied to handle white objects, and the depth map estimation time is optimized through minimum preserving sub-sampling. The depth map is further refined using the gradient domain guided image filter, which recovers fine edge details. To avoid the problems caused by errors in the estimation of the global atmospheric light and by a constant value of β, we present a novel strategy to post-process the de-fogged image; it provides simultaneous dynamic range modification, color consistency, and lightness rendition without introducing artifacts, in a time efficient manner. In comparison to Zhu’s technique, the percentage improvement achieved by the proposed technique is 16.9% in VCM, 23.3% in CNI and 63.28% in FRF, which is considerably high. The average execution time taken by the proposed technique is also significantly less than that of Zhu’s technique. Experimental results show that the proposed technique achieves high efficiency and a better de-fogging effect. It can also be applied to real time applications due to its low computational cost.