1 Introduction

Haze is an atmospheric phenomenon in which various particles (e.g., dust, smoke, and water droplets) severely degrade the quality of an image. Haze particles originate from many sources, such as traffic, industry, farming, and wildfires [1]. These particles scatter and absorb light; therefore, captured images suffer from problems such as low visibility and poor contrast. Moreover, the scattering of atmospheric particles introduces nonlinear, additional noise into the captured image [2]. Many computer vision applications used in remote sensing, driver assistance systems, and surveillance systems demand high-quality input images to work flawlessly [3]. Therefore, removing the haze effect from captured images is essential for these applications to work reliably.

Several research works focus on removing haze from an image. Most dehazing methods [4,5,6,7,8,9] depend on the physical model of haze imaging. To recover a haze-free image, this model requires two parameters to be estimated: the atmospheric light and the transmission map.

In these dehazing methods, the transmission is estimated under certain assumptions or priors. The state-of-the-art dehazing methods usually enhance the visibility of the image quite satisfactorily. However, if these assumptions/priors are violated, a dehazing method may distort the recovered image, leading to issues such as color distortion or halo artifacts. Moreover, captured images contain additional noise in the sky regions that is unnoticeable in the input image because of heavy haze; a dehazing method also amplifies this noise, which results in visual artifacts in the sky regions.

Remarkably, very few methods have paid attention to removing visual artifacts while restoring an image. These methods manage to remove the visual artifacts in the sky regions of the dehazed image. However, some of them introduce blurring and loss of detail in the dehazed image [10, 11]. The method in [12] handles only blocking artifacts, leaving other artifacts unresolved. The methods in [13, 14] are unable to remove haze completely, while the methods in [15, 16] lead to oversaturation. Moreover, existing machine learning-based methods require a huge number of examples to train the model [17, 18]. Therefore, the estimation of an accurate transmission is still an open area of research.

None of the methods in state-of-the-art image dehazing can handle all three types of distortion (saturation, artifacts, and blur) simultaneously in the dehazed image. Therefore, we introduce a machine learning-based dehazing technique built on the concept of superpixels that overcomes the limitations of previous work as follows:

1. The proposed method estimates the transmission of a superpixel rather than a pixel or patch, which makes the proposed method fast.

2. Instead of directly supplying local patches of hazy and haze-free images for training, we extract haze-relevant features from the hazy images for each superpixel, which reduces the size of the feature vector as well as the number of training examples.

3. Nonlinear regression using an ensemble of neural networks is used to estimate a robust superpixel transmission, which avoids the problem of overfitting and improves the generalization of the network in the presence of noisy data.

4. The proposed method handles all three distortions (artifacts, blur, and saturation) of an image simultaneously.

This paper is structured as follows: Sect. 2 describes the process of image formation in hazy weather conditions and the most popular methods in the field of single image dehazing. The steps of the proposed method are discussed in Sect. 3. Section 4 presents an extensive evaluation of the proposed method on various real-world and synthetic hazy images. Finally, conclusions and future work are presented in Sect. 5.

2 Related work

Currently, state-of-the-art dehazing methods can be divided into three groups: (1) restoration-based methods; (2) image enhancement-based methods; (3) machine learning-based methods.

Image restoration-based methods consider the degradation mechanism to restore the hazy image. These methods use the physical model of haze imaging [19, 20] to obtain the haze-free image, given as follows:

$$ I_{h}^{c} (x) = J_{nh}^{c} (x)\;t_{r} (x) + A_{t}^{c} \left( {1 - t_{r} (x)} \right) $$
(1)

where \( c \in \{ r,g,b\} \) represents the color channels of an RGB image, \( I_{h}^{c} \) is the captured hazy image in color channel c, and \( J_{nh}^{c} \) is the haze-free image in color channel c. x is the pixel coordinate, and \( t_{r} \) is the transmission map, representing the portion of light that reaches the camera directly without being scattered. \( t_{r} \) is an exponential function of distance, ranging in [0, 1], where \( t_{r} (x) = e^{ - \beta d(x)} \), \( \beta \) is the atmospheric scattering coefficient, and d is the distance between the camera and the object. Large values of \( \beta \) and d attenuate the transmission and result in a heavy haze effect.

Due to the ill-posed nature of single image dehazing, restoration-based methods estimate the transmission based on certain assumptions or priors. The success of a method depends on how well its assumptions hold.

Tan et al. [4] proposed a method based on two assumptions: (1) a haze-free image must have better contrast than the hazy image, and (2) airlight is a continuous function of distance. Based on the first prior, a contrast maximization concept is used; for the second, the airlight is obtained through Markov random fields (MRFs). However, the dehazed image suffers from halo effects at sudden depth changes and from color oversaturation. Fattal [5] proposed a method under the assumption that the transmission function and the object surface shading are uncorrelated. This approach models the haze-free image as the product of an object surface reflectance coefficient (R) and a shadowing factor (l). However, it cannot restore images with heavy fog or an insufficient signal-to-noise ratio and may lead to oversaturation. Tarel et al. [6] proposed a fast, filtering-based approach to visibility restoration. Before restoration, the method performs white balancing; instead of estimating depth, the atmospheric veil is estimated using a median filter. However, depth edge information is lost due to the median filter, and the need to tune many parameters makes the method impractical.

Later, He et al. [7] proposed the highly popular dark channel prior (DCP). It is based on the assumption that in every haze-free image, except in sky regions, some pixels have very low intensity (close to 0) in at least one color channel. Based on these dark patches, the transmission is estimated. This method produces excellent results and improves the visibility of a hazy image. However, it is unable to handle halo artifacts at depth changes, owing to its local patches, and is not valid for images with large sky regions. Furthermore, when the image contains white objects, an over-dehazing problem occurs due to underestimation of the transmission. Meng et al. [8] improved the DCP method by proposing a boundary constraint for the DCP, with the transmission refined using contextual regularization. This method greatly enhances the visibility of the image; however, the dehazed image contains a color aliasing effect in the sky regions.

The dehazing methods discussed so far estimate the transmission on local patches. Berman et al. [15] proposed a method using a non-local prior: a haze-free image can be represented by a few hundred distinct color clusters, and in the presence of haze each cluster spreads out to form a haze-line. This method significantly improves the visibility of the image. However, it fails when the atmospheric light is brighter than the intensities in the image; in this situation, most pixels point in the same direction and it becomes very difficult to find haze-lines, and the resulting incorrect transmission causes saturation or color distortion in the dehazed image. Bui and Kim [9] proposed a color ellipsoid prior in which clusters of hazy pixels are statistically fitted in RGB space. This method is fast and enhances the details of the hazy image; however, a few color artifacts can be seen in the sky regions. Recently, Raikwar et al. [21] proposed an accurate transmission estimate that preserves the structure of the hazy image in the dehazed image. To compute the transmission map, this method considers the difference between the minimum color channels of the hazy and haze-free images, and the transmission is refined by contextual regularization. This work maintains the structure of the hazy image; however, the recovered images have low contrast. Later, Raikwar et al. [22] proposed a tight lower bound on the transmission, computed from the minimum color channel of a hazy image.

Few methods in the literature handle artifacts. Chen et al. [11] presented a dehazing method that works in two steps. In the first step, the DCP method is used to estimate the transmission; an artifact-free image is obtained in the second step by minimizing the gradient residual between the input and output images. This method manages to remove the artifacts in the restored image. However, it introduces a blurring effect and sometimes heavy oversaturation in the dehazed image. Moreover, due to the ambiguity between artifacts and object details, it is unable to enhance visibility in long-range regions. Li et al. [12] proposed an image enhancement-based method that decomposes a hazy image into low-frequency and high-frequency components. Dehazing is performed on the low frequencies, and the dehazed image is obtained by recombining the two components. However, this method removes only blocking artifacts and is unable to handle other artifacts such as color artifacts in sky regions.

To address the problem of artifacts, several machine learning-based methods also came into existence. Zhu et al. [10] proposed a linear model based on the color attenuation prior (CAP) to compute the depth map. This method assumes that the haze effect is correlated with the difference between saturation and brightness. Unlike [11], it does not require an additional step to suppress visual artifacts; they are handled by the estimated depth itself. However, the linear model requires a number of parameters to be learned, which relies on training data. Additionally, this method also blurs the details of the dehazed image.

Recently, a superpixel-based dehazing technique was proposed in [16]. It is based on the assumption that the transmission of an image is locally constant; therefore, groups of pixels that are similar in color, texture, or brightness can be clustered into superpixels. The method proposes a two-layer Gaussian process and uses three methods [5, 7, 23] to compute the transmission of a superpixel for training. In the first layer, a rough transmission of a superpixel is computed, which is refined using neighboring pixels in the second layer. The haze filtering introduced to reduce artifacts in the sky regions leads to loss of important details there, and the method also suffers from saturation in the dehazed image.

Salazar-Colores et al. [13] proposed a multilayer perceptron to compute the transmission from the minimum channel of the hazy image. This method prevents artifacts and saturation in the dehazed image. However, it requires a post-processing operation (contrast stretching) to obtain the final haze-free image, and it is unable to remove the haze effect completely.

Deep learning-based methods have also recently been introduced for dehazing and implemented successfully. In [24], Engin et al. proposed a convolutional neural network (CNN)-based method called Cycle-Dehaze. This method directly enhances the quality of a hazy image by considering cycle consistency and perceptual losses when training the model. It improves the visibility of the image; however, many distortions appear in the recovered image due to downsampling and upsampling of the image. Santra et al. [25] proposed a patch quality comparator-based method to dehaze an image. This method estimates the transmission by comparing various candidate output patches with the input hazy image using binary search and selects the best one. Ren et al. [26] utilized a CNN to remove haze from videos. This method assumes that the transmission is highly correlated across adjacent frames of the video and also uses semantic information to restore the haze-free image. Later, Ren et al. [27] proposed a multi-scale CNN with a holistic edge guided network for edge refinement. This method achieves good dehazing capability in terms of speed and quality. Zhang et al. [28] proposed a method that estimates the transmission by jointly learning clear image details and the transmission map.

Machine learning-based methods directly supply a vast number of hazy images and their corresponding haze-free images at different haze concentrations to train the model. The high number of pixels in an image can make such an approach computationally infeasible.

3 The proposed methodology

The proposed method consists of six steps: superpixel segmentation, feature extraction, model training, nonlinear regression, atmospheric light estimation, and recovery of the haze-free image, as shown in Fig. 1. In the first step, the hazy image is segmented into superpixels by the SLIC algorithm. The second step extracts multi-scale haze-related features from the hazy images and arranges them superpixelwise. The training step prepares the data used to train the model; these data comprise the feature vector of every superpixel extracted from the hazy images, together with its target transmission. The nonlinear regression step predicts the transmission of the segmented superpixels of a test hazy image using an ensemble neural network. After refinement of the transmission, the subsequent haze removal tasks, atmospheric light estimation and scene recovery, are performed.

Fig. 1
figure 1

The framework of the proposed method

3.1 The superpixel segmentation

A superpixel groups a set of pixels that are homogeneous in color, texture, and brightness. Because of the high number of pixels in an image, many vision algorithms such as object classification [29], depth recovery [30], and semantic segmentation [31] take advantage of superpixels to speed up their task. A fixed-size patch can be replaced by a superpixel to reduce halo artifacts. Among state-of-the-art superpixel algorithms, such as graph-based, density-based, and clustering-based approaches, the simple linear iterative clustering (SLIC) algorithm [32] is fast and can be implemented in real time.

SLIC offers control over the number of superpixels extracted from an image, and the generated superpixels are compact in shape. Therefore, we select the SLIC algorithm to extract the haze-relevant features of a superpixel. The center of a superpixel is represented in 5D space as {l, a, b, x, y}, where {l, a, b} is the color of a pixel in the CIELAB color space and (x, y) are its coordinates. SLIC uses the following distance measure:

$$ {\text{dist}}_{\text{lab}} = \sqrt {(l(u) - l(v))^{2} + (a(u) - a(v))^{2} + (b(u) - b(v))^{2} } $$
(2)
$$ {\text{dist}}_{xy} = \sqrt {(x(u) - x(v))^{2} + (y(u) - y(v))^{2} } $$
(3)
$$ {\text{dist}}_{S} = {\text{dist}}_{\text{lab}} + \frac{m}{S}{\text{dist}}_{xy} $$
(4)

where \( {\text{dist}}_{\text{lab}} \) is the color distance, \( {\text{dist}}_{xy} \) is the spatial distance, and u and v are two pixels. \( {\text{dist}}_{S} \) is the sum of the color distance and the spatial distance normalized by the grid interval S, where \( S = \sqrt {P /Q} \), P is the number of pixels in the image, and Q is the number of superpixels. m is a parameter that controls the compactness of a superpixel. The SLIC algorithm shows good adherence to image boundaries; therefore, it can be utilized in image dehazing to reduce halo artifacts at depth discontinuities. A comparison of SLIC with two other superpixel approaches, graph-based [33] and path-based [34], is shown in Fig. 2, along with a zoomed view of the object inside the black rectangle. The figure shows that SLIC adheres to image boundaries better than these methods.

Fig. 2
figure 2

Comparison of superpixel approaches for boundary adherence on a hazy image. a Hazy image, b Ren et al. [33], c Tang et al. [34], d SLIC [32]
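For illustration, this segmentation step can be reproduced with an off-the-shelf SLIC implementation. The following minimal Python sketch uses scikit-image; the input file name and the values chosen for Q (n_segments) and m (compactness) are assumptions for illustration, not the settings of our experiments.

```python
from skimage import io
from skimage.segmentation import slic

# Minimal sketch of the segmentation step (Sect. 3.1) with scikit-image's
# SLIC. 'hazy.png', n_segments (Q), and compactness (m) are assumed values.
hazy = io.imread('hazy.png')                                   # H x W x 3, RGB
labels = slic(hazy, n_segments=500, compactness=10, start_label=0)
# 'labels' assigns a superpixel index to every pixel; all later features
# are averaged per index (Sect. 3.2).
```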

3.2 Feature extraction from superpixels

In the second step, we extract haze-relevant multi-scale features from the hazy image. These features are the hue disparity (hd) [35], the dark channel (\( {\text{Dark}}^{s} \)) [7], the local max contrast (\( {\text{Con}}^{s} \)) [4], and the local max saturation (\( {\text{Sat}}^{s} \)) [23], as shown in Fig. 3.

Fig. 3
figure 3

Feature extraction from a hazy image. a Hazy image, b hue disparity (hd), c dark channel at scale 1 (Dark1), d dark channel at scale 10 (Dark10), e local max contrast at scale 1 (Con1), f local max contrast at scale 10 (Con10), g local max saturation at scale 1 (Sat1), h local max saturation at scale 10 (Sat10)

3.2.1 Hue disparity

Hue disparity indicates the presence of haze in a hazy image. It is defined as the difference in hue between the hazy image and its semi-inverse image, where the semi-inverse image is the pixelwise maximum of the hazy image and its inverse:

$$ {\text{hd}}\;(x) = \left| {I^{h} (x) - I_{\text{semi}}^{h} (x)} \right| $$
(5)
$$ I_{\text{semi}}^{c} (x) = \mathop {\hbox{max} }\limits_{{c \in \{ r,g,b\} }} \left( {I^{c} (x),\left( {1 - I^{c} (x)} \right)} \right) $$
(6)

where \( I^{c} \) is the hazy image in color channel c, \( I_{\text{semi}}^{c} \) is the semi-inverse image in color channel c, the superscript h denotes the hue of an image, and x is the position of a pixel. Figure 3b shows the hue disparity hd, which is strongly correlated with the haze in the image.
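A minimal sketch of Eqs. (5) and (6) in Python, assuming an RGB image with values in [0, 1]; the use of OpenCV's HSV conversion to obtain the hue is our choice, not prescribed by [35].

```python
import numpy as np
import cv2

def hue_disparity(img):
    """Eqs. (5)-(6): hue difference between an RGB image in [0, 1]
    and its semi-inverse (pixelwise max of the image and its inverse)."""
    semi = np.maximum(img, 1.0 - img)                          # Eq. (6)
    hue = cv2.cvtColor(img.astype(np.float32), cv2.COLOR_RGB2HSV)[..., 0]
    hue_semi = cv2.cvtColor(semi.astype(np.float32), cv2.COLOR_RGB2HSV)[..., 0]
    return np.abs(hue - hue_semi)                              # Eq. (5)
```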

3.2.2 Dark channel

The dark channel is highly correlated with haze and is used to estimate the transmission. It is defined as the minimum over the color channels within a local patch:

$$ {\text{Dark}}^{s} (x) = \mathop {\hbox{min} }\limits_{{y \in\Omega _{s} (x)}} \left( {\mathop {\hbox{min} }\limits_{{c \in \{ r,g,b\} }} \left( {I^{c} (y)} \right)} \right) $$
(7)

where \( \Omega _{s} \) is the local patch of size s × s centered at x. The scale s affects the performance of the dark channel: a small value of s results in over-dehazing, while a large s results in halo artifacts. Thus, we apply a multi-scale dark channel feature with four scales: \( {\text{Dark}}^{s} = \left[ {{\text{Dark}}^{1} ,{\text{Dark}}^{4} ,{\text{Dark}}^{7} ,{\text{Dark}}^{10} } \right] \). Figure 3c and d shows the dark channel feature at scales 1 and 10. As we can see, the dark channel is highly correlated with the amount of haze in the image, and it becomes darker for higher values of s.
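Equation (7) separates into a channelwise minimum followed by a local minimum filter, which grayscale erosion implements directly. A sketch under the same [0, 1] image assumption, continuing from the SLIC snippet above:

```python
import numpy as np
import cv2

def dark_channel(img, s):
    """Eq. (7): min over color channels, then an s x s local min filter
    (grayscale erosion) over the patch Omega_s."""
    min_rgb = img.min(axis=2).astype(np.float32)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (s, s))
    return cv2.erode(min_rgb, kernel)

hazy_f = hazy.astype(np.float32) / 255.0            # hazy image scaled to [0, 1]
scales = [1, 4, 7, 10]
dark_ms = [dark_channel(hazy_f, s) for s in scales] # Dark^1 ... Dark^10
```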

3.2.3 Local max contrast

Haze reduces the contrast of the image. Therefore, Tan et al. [4] improve contrast by maximizing the local contrast. In the proposed method, this feature is defined as the difference between the maximum and minimum intensities in a local patch \( \Omega _{s} \) centered at x:

$$ {\text{Con}}^{s} (x) = \mathop {\hbox{max} }\limits_{{y \in\Omega _{s} (x)}} \left( {\mathop {\hbox{max} }\limits_{{c \in \{ r,g,b\} }} (I^{c} (y))} \right) - \mathop {\hbox{min} }\limits_{{y \in\Omega _{s} (x)}} \left( {\mathop {\hbox{min} }\limits_{{c \in \{ r,g,b\} }} (I^{c} (y))} \right) $$
(8)

Again, we take four scales, \( {\text{Con}}^{s} = \left[ {{\text{Con}}^{1} ,{\text{Con}}^{4} ,{\text{Con}}^{7} ,{\text{Con}}^{10} } \right] \), in the proposed work. Figure 3e and f shows the local max contrast of a hazy image at scales 1 and 10. It can be observed that haze-free pixels have higher contrast than hazy pixels, although this feature is not as powerful as the dark channel.

3.2.4 Local max saturation

Like image contrast, image saturation is also reduced by the haze effect. Local max saturation is defined on a local patch \( \Omega _{s} \) as the maximum of the pixelwise saturation:

$$ {\text{Sat}}^{s} (x) = \mathop {\hbox{max} }\limits_{{y \in\Omega _{s} (x)}} \left( {1 - \frac{{\mathop {\hbox{min} }\limits_{{c \in \{ r,g,b\} }} (I^{c} (y))}}{{\mathop {\hbox{max} }\limits_{{c \in \{ r,g,b\} }} (I^{c} (y))}}} \right) $$
(9)

As for the contrast feature, we take four scales for local max saturation: \( {\text{Sat}}^{s} = \left[ {{\text{Sat}}^{1} ,{\text{Sat}}^{4} ,{\text{Sat}}^{7} ,{\text{Sat}}^{10} } \right] \). Visual analysis of this feature is shown in Fig. 3g and h; as the figure shows, saturation is reduced in the presence of haze.
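Both Eq. (8) and Eq. (9) reduce to morphological max/min filters and can be sketched in the same way; the small eps guarding against division by zero is our addition.

```python
import numpy as np
import cv2

def local_max_contrast(img, s):
    """Eq. (8): local max of the max channel minus local min of the
    min channel over the patch Omega_s."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (s, s))
    return (cv2.dilate(img.max(axis=2).astype(np.float32), kernel)
            - cv2.erode(img.min(axis=2).astype(np.float32), kernel))

def local_max_saturation(img, s, eps=1e-6):
    """Eq. (9): local max of the pixelwise saturation 1 - min_c / max_c."""
    sat = 1.0 - img.min(axis=2) / (img.max(axis=2) + eps)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (s, s))
    return cv2.dilate(sat.astype(np.float32), kernel)
```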

For each image, we extract thirteen haze-relevant features: the hue disparity, the dark channel at four scales, the local max contrast at four scales, and the local max saturation at four scales. The work in [22] constructs a 325D feature vector at every pixel of a 5 × 5 patch. Using superpixels, the recent work in [16] reduces the feature vector to 37D; it includes a Gabor filter with 3 scales and 8 orientations, which we do not adopt because it does not characterize the haze. The size of the feature vector is thus reduced from 37D to 13D. Because all the pixels within a superpixel share the same characteristics, we use the average feature vector within the superpixel to train the model. The feature vector is represented as follows:

$$ f_{s} = \frac{1}{n}\sum\limits_{{x \in s_{i} }} {\left( {f(x)} \right)} $$
(10)

where \( f(x) = \left[ {{\text{hd}},{\tilde{\text{D}}\text{ark}}^{s} ,{\tilde{\text{C}}\text{on}}^{s} ,{\tilde{\text{S}}\text{at}}^{s} } \right] \)

$$ \begin{aligned} {\tilde{\text{D}}\text{ark}}^{s} & = \left[ {{\text{Dark}}^{1} ,{\text{Dark}}^{4} ,{\text{Dark}}^{7} ,{\text{Dark}}^{10} } \right] \\ {\tilde{\text{C}}\text{on}}^{s} & = \left[ {{\text{Con}}^{1} ,{\text{Con}}^{4} ,{\text{Con}}^{7} ,{\text{Con}}^{10} } \right] \\ {\tilde{\text{S}}\text{at}}^{s} & = \left[ {{\text{Sat}}^{1} ,{\text{Sat}}^{4} ,{\text{Sat}}^{7} ,{\text{Sat}}^{10} } \right] \\ \end{aligned} $$

where \( f_{s} \) is the average feature vector, consisting of the averages of the four features: hue disparity, dark channel, local max contrast, and local max saturation. n is the number of pixels belonging to superpixel \( S_{i} \). \( {\tilde{\text{D}}\text{ark}}^{s} , \) \( {\tilde{\text{C}}\text{on}}^{s} , \) and \( {\tilde{\text{S}}\text{at}}^{s} \) represent the superpixelwise average features of the dark channel, max contrast, and max saturation, respectively.

Hue disparity is calculated once for the entire image, while the other three features are calculated at four scales. The averaging process transforms the features from pixelwise to superpixelwise.
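Equation (10) is simply a per-label mean over the 13 feature maps. A sketch (the function name and the stacking order are ours):

```python
import numpy as np

def superpixel_features(feature_maps, labels):
    """Eq. (10): average each of the 13 feature maps inside every
    superpixel, producing one 13D vector f_s per superpixel."""
    F = np.stack(feature_maps, axis=-1)          # H x W x 13
    n_sp = labels.max() + 1
    fs = np.zeros((n_sp, F.shape[-1]))
    for q in range(n_sp):
        fs[q] = F[labels == q].mean(axis=0)      # mean over the n pixels of S_i
    return fs
```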

3.3 The training phase

The training phase requires the transmission of a hazy image. Fan et al. [16] rely on three existing methods to generate the target transmission. Since their approach depends on other methods for the transmission, failure of those assumptions/priors may lead to failure of the whole method. Therefore, we obtain the true transmission from synthetic hazy images whose depth maps are available. The NYU depth dataset [36] contains ground truth images and their corresponding depth images. To generate hazy images, the atmospheric scattering model is utilized. First, for each image, the transmission is determined from the depth d and the scattering coefficient \( \beta \), with \( \beta \) set to 1, 2, 3, and 4 to generate hazy images of different haze concentrations. Furthermore, the atmospheric light A is assumed to be pure white. Figure 4 shows the hazy images along with their transmissions for different values of \( \beta \). We generate each hazy image according to the following equation:

$$ I_{h}^{c} (x) = J_{nh}^{c} (x)\;e^{ - \beta d(x)} + A_{t}^{c} \left( {1 - e^{ - \beta d(x)} } \right) $$
(11)
Fig. 4
figure 4

Generation of hazy images and their corresponding transmissions for two images. a The first row shows the ground truth image along with its depth image, b the second row shows the hazy images generated using \( \beta \) = 1, \( \beta \) = 2, \( \beta \) = 3, and \( \beta \) = 4, c the third row shows the transmission maps of the corresponding hazy images
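A minimal sketch of this data synthesis step, assuming a clean image and a depth map from NYU are already loaded (J_gt and depth are placeholder names):

```python
import numpy as np

def synthesize_hazy(J, depth, beta, A=1.0):
    """Eq. (11): render a hazy image and its true transmission from a
    clean image J (H x W x 3, in [0, 1]) and a depth map (H x W).
    A = 1.0 encodes the pure-white atmospheric light assumed above."""
    t = np.exp(-beta * depth)                        # t_r(x) = e^{-beta d(x)}
    I = J * t[..., None] + A * (1.0 - t[..., None])
    return I, t

# Four haze concentrations, as used for training.
pairs = [synthesize_hazy(J_gt, depth, beta) for beta in (1, 2, 3, 4)]
```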

3.4 Nonlinear regression by ensemble neural network

Nonlinear regression is a good choice for estimating the transmission of a hazy image, since the haze effect depends nonlinearly on distance and the scattering of particles also adds nonlinear noise to the image. The 13 haze-relevant features extracted from the hazy image act as inputs to a neural network, and the transmission is the target. The network is trained on the 13 features of each superpixel whose transmission is known, as discussed in the training section, and it produces the transmission as output for each superpixel of an unknown hazy image. We use a two-layer (hidden layer and output layer) feed-forward network to learn the transmission of superpixels, with 10 neurons in the hidden layer, a nonlinear activation function (hyperbolic tangent sigmoid) in the hidden layer, and a linear activation function in the output layer. The Levenberg–Marquardt (LM) algorithm [37] is used for training.

Although this algorithm requires more memory, it is computationally efficient and therefore reduces the training time.
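The network itself is small. As a rough stand-in, it can be expressed with scikit-learn; note that scikit-learn offers no Levenberg-Marquardt solver, so the second-order 'lbfgs' solver is substituted here for illustration (an LM trainer such as MATLAB's trainlm would match the text more closely). fs_train and t_train are placeholder names for the superpixel features and target transmissions from Sect. 3.3.

```python
from sklearn.neural_network import MLPRegressor

# Rough stand-in for the described network: one hidden layer of 10 tanh
# units and a linear output. 'lbfgs' replaces LM, which scikit-learn lacks.
net = MLPRegressor(hidden_layer_sizes=(10,), activation='tanh',
                   solver='lbfgs', max_iter=1000)
net.fit(fs_train, t_train)   # fs_train: M x 13 superpixel features
                             # t_train:  M target transmissions
```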

A neuron takes M inputs with associated weights w. The output of a neuron is the transfer function applied to the sum of the weighted inputs and a bias. The net input to neuron k in layer i + 1 is expressed as follows:

$$ n^{i + 1} (k) = \sum\limits_{j = 1}^{M} {w^{i + 1} } (k,j)a^{i} (j) + b^{i + 1} (k) $$
(12)

The output of the neuron k is expressed as follows:

$$ o^{i + 1} (k) = f^{i + 1} \left( {n^{i + 1} (k)} \right) $$
(13)

or output of a neuron in matrix form can be expressed as follows:

$$ o^{i + 1} = f^{i + 1} \left( {w^{i + 1} a^{i} + b^{i + 1} } \right) $$
(14)

where \( a^{i} \) is the output of layer i; the network input \( a^{0} \) is the feature matrix of size m × s, where m is the number of features and s is the number of superpixels generated from all hazy images. i = 0, 1, 2, …, L − 1 indexes the layers, k = 1, 2, 3, …, N indexes the neurons in each layer, and f is the activation function. The most popular choices of activation function for nonlinear regression are the logistic sigmoid (LS) and the hyperbolic tangent (HT). We use the HT activation function, as it is a rescaled version of the LS and more powerful, given as follows:

$$ f^{i + 1} (x) = \frac{{e^{x} - e^{ - x} }}{{e^{x} + e^{ - x} }} = \frac{{e^{2x} - 1}}{{e^{2x} + 1}} $$
(15)

and its derivative is given by

$$ f'^{i + 1} (x) = 1 - \left( {f^{i + 1} (x)} \right)^{2} $$
(16)

To adjust the network weights and biases, a cost function is required. Its objective is to minimize the difference between the actual and predicted transmissions of a superpixel, which is achieved through the mean square error (MSE):

$$ C(x) = \frac{1}{M}\sum\limits_{i = 1}^{M} {(e_{i} } )^{2} = \frac{1}{M}\sum\limits_{i = 1}^{M} {(t_{i} } - p_{i} )^{2} $$
(17)

where M is the number of observations or training examples, i.e., number of superpixels of all hazy images used for training, ti is the actual transmission, and pi is the predicted transmission of a superpixel i.

The LM algorithm is an approximation to Newton's method. It computes gradients and Jacobians through the backpropagation algorithm and achieves second-order training speed without calculating the exact Hessian matrix (H), which is approximated as follows:

$$ H = J^{\text{T}} J $$
(18)

and the gradient is computed as follows:

$$ g = J^{\text{T}} e $$
(19)

where J is the Jacobian matrix and e is the vector of network errors. The Jacobian matrix contains the first-order partial derivatives of the network errors with respect to the weights and biases:

$$ J = \left[ {\begin{array}{*{20}c} {\frac{{\partial e_{1} (x)}}{{\partial x_{1} }}} & {\frac{{\partial e_{1} (x)}}{{\partial x_{2} }}} & \cdots & {\frac{{\partial e_{1} (x)}}{{\partial x_{m} }}} \\ {\frac{{\partial e_{2} (x)}}{{\partial x_{1} }}} & {\frac{{\partial e_{2} (x)}}{{\partial x_{2} }}} & \cdots & {\frac{{\partial e_{2} (x)}}{{\partial x_{m} }}} \\ \vdots & \vdots & \ddots & \vdots \\ {\frac{{\partial e_{M} (x)}}{{\partial x_{1} }}} & {\frac{{\partial e_{M} (x)}}{{\partial x_{2} }}} & \cdots & {\frac{{\partial e_{M} (x)}}{{\partial x_{m} }}} \\ \end{array} } \right] $$
(20)

The LM algorithm uses this approximation to the Hessian matrix in the following Newton-like update:

$$ Y_{k + 1} = Y_{k} - \left[ {J^{\text{T}} J + \mu I} \right]^{ - 1} J^{\text{T}} e $$
(21)

\( \mu \) is decreased after each successful step. Training stops automatically when generalization stops improving, as reflected by an increasing MSE.
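Equations (18), (19), and (21) amount to a damped Gauss-Newton step. A self-contained numpy sketch of one update:

```python
import numpy as np

def lm_step(params, J, e, mu):
    """One LM update, Eq. (21): Y <- Y - (J^T J + mu*I)^{-1} J^T e,
    where J is the M x m Jacobian of the errors e w.r.t. the m parameters."""
    H = J.T @ J                                   # Hessian approximation, Eq. (18)
    g = J.T @ e                                   # gradient, Eq. (19)
    delta = np.linalg.solve(H + mu * np.eye(len(params)), g)
    return params - delta
```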

The performance of a neural network depends on the training data, and the network may settle on a different set of weights each time it is trained; a single NN therefore has high variance in its predictions. To avoid the problem of overfitting and to improve the generalization of the network, we use multiple neural networks [38, 39] and average their outputs, as shown in Fig. 1. Such an ensemble is generally used when the volume of data is small and the data are noisy. We train 10 NNs and compare their MSEs with the MSE of their average. The average transmission is calculated as follows:

$$ t_{r} = \frac{1}{Q}\sum\limits_{i = 1}^{Q} {t_{i} } $$
(22)

where \( t_{i} \) is the transmission estimated by the ith individual neural network and Q is the number of networks in the ensemble (here, 10).

After this step, the transmission of every superpixel has been estimated by nonlinear regression. The obtained transmission is superpixelwise and therefore requires refinement. For this, the proposed method uses guided filtering [40], which produces good results in less time than the soft matting used in [7].
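A sketch of the prediction, averaging (Eq. 22), and refinement chain; here 'ensemble' is assumed to hold the 10 trained networks from above, fs_test holds the test-image superpixel features, and the guided-filter radius and eps are illustrative values (cv2.ximgproc requires the opencv-contrib package).

```python
import numpy as np
import cv2

# Average the per-superpixel predictions of the 10 networks (Eq. 22)
# and map them back to pixels through the SLIC label image.
preds = np.mean([net.predict(fs_test) for net in ensemble], axis=0)
t_coarse = preds[labels].astype(np.float32)      # superpixelwise -> pixelwise

# Edge-preserving refinement with the guided filter [40];
# arguments: guide image, source, radius, eps (assumed values).
guide = cv2.cvtColor(hazy_f, cv2.COLOR_RGB2GRAY)
t_refined = cv2.ximgproc.guidedFilter(guide, t_coarse, 30, 1e-3)
```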

3.5 The estimation of atmospheric light

Many methods have been proposed to estimate the atmospheric light. Tan et al. [4] select the brightest pixel in the image; however, this is not suitable when there are white objects in the image. Kim et al. [41] and Wang et al. [42] use quad-tree subdivision based on a threshold. He et al. [7] select the top 0.1% brightest pixels in the dark channel and their corresponding pixels in the hazy image. We use the method of [7] to obtain the atmospheric light; moreover, the transmission generated by our nonlinear regression for sky regions is more accurate than that of the DCP [7].
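A sketch of the estimator of [7] as we use it; averaging the candidate pixels in the final step is a common variant and is our assumption here.

```python
import numpy as np

def estimate_atmospheric_light(hazy, dark, top=0.001):
    """Pick the 0.1% brightest dark-channel pixels and read the
    atmospheric light from the corresponding hazy-image pixels [7].
    Averaging the candidates is an assumed variant of the final step."""
    n = max(1, int(top * dark.size))
    idx = np.argsort(dark.ravel())[-n:]           # brightest dark-channel pixels
    return hazy.reshape(-1, 3)[idx].mean(axis=0)  # A_t per color channel
```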

3.6 The recovery of scene radiance

Once the transmission and atmospheric light are estimated, they are plugged into the following equation to recover the haze-free image:

$$ J_{nh}^{c} (x) = \frac{{I_{h}^{c} (x) - A_{t}^{c} }}{{\hbox{max} \left( {t_{r} (x),0.01} \right)}} + A_{t}^{c} $$
(23)

where \( c \in \{ r,g,b\} \) represents the color channels, \( J_{nh}^{c} \) is the haze-free image in channel c, \( I_{h}^{c} \) is the hazy image in channel c, and \( A_{t}^{c} \) is the atmospheric light. The max() operation avoids division by zero.
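The recovery step (Eq. 23) in the same sketch style; the final clipping to [0, 1] is our addition for display purposes.

```python
import numpy as np

def recover_scene(hazy, t, A, t0=0.01):
    """Eq. (23): invert the haze imaging model with the transmission
    clamped at t0 to avoid division by zero."""
    t = np.maximum(t, t0)[..., None]
    J = (hazy - A) / t + A
    return np.clip(J, 0.0, 1.0)                  # clipping added for display
```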

4 Experimental results and analysis

The effectiveness of the proposed method is evaluated by comparing it with the latest and most popular state-of-the-art dehazing methods: robust artifact suppression dehazing (RASD) [11], color attenuation prior (CAP) [10], non-local dehazing (NLD) [15], dark channel prior (DCP) [7], multilayer perceptron (MLP) [13], and two-layer Gaussian (TLG) [16]. These prevailing dehazing methods are compared and evaluated on challenging real-world hazy images. In addition, performance on synthetic hazy images is compared with CNN-based methods, including the patch quality comparator (PQC) [25] and Cycle-Dehaze (CD) [24], and with the latest method, nonlinear bounding function (LBF) [21]. We conducted experiments on two datasets: the Waterloo IVC dataset [43] and the RESIDE dataset [44]. To prove the capability of the proposed method, qualitative and quantitative comparisons are performed.

4.1 Dehazing on real-world hazy images

We selected hazy images for comparison from the Waterloo IVC dataset [43] and partitioned them into images with large sky regions and images without sky regions. Qualitative and quantitative comparisons are conducted on these images.

4.1.1 Qualitative analysis

Figure 5 shows the hazy images with large sky regions. The challenge with these images is to remove the effect of the haze without introducing color distortion or artifacts in the sky region. Images without sky regions are shown in Fig. 6; these contain white objects, nighttime haze (headlights similar to atmospheric light), mild haze, long scenery, and abrupt depth changes. The dehazed image must not suffer from color distortion or halo artifacts.

Fig. 5
figure 5

Visual comparison with existing methods on hazy images with sky regions. a Hazy image, b RASD [11], c CAP [10], d NLD [15], e DCP [7], f TLG [16], g MLP [13], h the proposed method

Fig. 6
figure 6

Visual comparison with existing methods on hazy images without sky regions. a Hazy image, b RASD [11], c CAP [10], d NLD [15], e DCP [7], f TLG [16], g MLP [13], h the proposed method

To better understand the dehazing capability of the proposed and competing methods, important regions of each image are marked with rectangles of different colors, corresponding to four types of distortion in the restored image: artifacts, oversaturation/color distortion, blurring, and poor visibility/loss of detail. The red rectangle marks an area with artifacts, the blue rectangle a region with blurring, the green rectangle an area with oversaturation or color distortion, and the yellow rectangle a region with either incomplete haze removal or loss of detail.

Figures 5 and 6 show that the RASD method produces dehazed results without artifacts. However, it is unable to prevent oversaturation and blurring: the blurring is introduced at the cost of artifact suppression by gradient residual minimization, and inaccurate transmission estimation leads to oversaturation. The problems in the restored images are therefore marked with blue and green rectangles only, as shown in Figs. 5b and 6b.

The CAP method restores images without artifacts. However, it also introduces blurring and color distortion in the dehazed image, as indicated by the blue and green rectangles. Sometimes the dehazed image loses detail, as marked by the yellow rectangles (see the fifth image of Fig. 5c and the second image of Fig. 6c, where the structure of the headlight is not preserved).

The images restored by NLD are shown in Figs. 5d and 6d. Details are visible in the dehazed images; however, they are over-brightened and suffer from color distortion or oversaturation. Most results are marked with green rectangles, except one image where color artifacts appear in the sky region, marked with a red rectangle (see the fourth image of Fig. 5d).

The DCP method works very well for non-sky images, but its performance on images with sky regions is unsatisfactory, as the dehazed images in Figs. 5e and 6e show. In all images of Fig. 5e, it produces color artifacts in the sky regions, marked with red rectangles. It also produces halo artifacts at depth discontinuities due to its patch-based transmission estimation (see the fifth image of Fig. 6e) and color distortion for white objects (see the first image of Fig. 6e).

The results of the two machine learning methods, TLG and MLP, are shown in Figs. 5f, g and 6f, g, respectively. Their performance is not consistent: both fail to serve the main purpose of dehazing, being unable to increase the contrast or visibility of the hazy image, as indicated by the yellow rectangles. TLG performs better than MLP in terms of haze removal, whereas MLP performs better than TLG on the distortion measures. Sometimes the dehazed images of these two methods suffer from saturation, as indicated by the green rectangles.

In comparison with all these dehazing methods, the proposed method better recovers the hazy image, increasing visibility without introducing distortion: the dehazed images are free from artifacts, saturation, and blur. The results are shown in Figs. 5h and 6h.

Furthermore, we tested the robustness of the proposed method on a noisy image, shown in Fig. 7, to which Gaussian noise with mean 0 and standard deviation 0.01 was added.

Fig. 7
figure 7

Visual comparison with existing methods on a noisy hazy image. a Hazy image, b DCP [7], c NLD [15], d the proposed method

Figure 7 shows that the dehazed results of the existing methods DCP [7] and NLD [15] suffer from significant noise amplification. The proposed method restores the visibility of the noisy hazy image with reduced noise amplification compared to these methods.

Moreover, the performance of the proposed method is evaluated against two recent methods [45, 46], both of which also provide hardware implementations using very large-scale integration (VLSI) architectures. Kumar [45] utilizes infrared images to refine the transmission map and enhance the visibility of hazy images; this approach can be combined with existing dehazing methods such as [7, 10, 14]. The qualitative comparison with [45] on two RGB-NIR hazy images is shown in Fig. 8.

Fig. 8
figure 8

Comparison with method [45] on two hazy RGB-NIR images. a Hazy image, b the method [45] using [7], c the method [45] using [10], d the method [45] using [14], e the proposed method

The method in [46] presents a VLSI architecture-based dehazing approach that can be utilized in resource-constrained environments. Figure 9 shows a visual comparison with this method on two hazy images.

Fig. 9
figure 9

Comparison with method [46] on two hazy images from the D-Hazy dataset. a Hazy image, b the ground truth image, c the method [46], d the proposed method

The dehazed images in Fig. 8b and c suffer from color distortion in the sky region, where the sky appears much darker; this is due to the limitations of the base methods. However, the result using [14] in Fig. 8d is satisfactory in the sky region. The proposed method in Fig. 8e produces a better dehazed image than the results in Fig. 8b–d.

The comparison with method [46] on two hazy images from the D-Hazy dataset is shown in Fig. 9. The method [46] in Fig. 9c produces over-enhanced results in some regions, marked by red rectangles. The images recovered by the proposed method in Fig. 9d are comparable to the ground truth images in Fig. 9b.

4.1.2 Quantitative analysis

Besides the subjective qualitative assessment, quantitative analysis also plays an important role in testing the capability of dehazing methods. We consider three types of distortion in the dehazed image: halo artifacts, saturated pixels, and blurring.

Therefore, we use three no-reference metrics: the blur metric [47], blocking artifacts and luminance change (BALC) [48], and the saturated pixel ratio [49], as illustrated in Table 1.

Table 1 Comparison of values of blur, BALC, and \( \sigma \) metric on hazy images shown in Figs. 5 and 6

The blur metric is a no-reference metric ranging from 0 to 1, where 0 represents the best and 1 the worst image quality in terms of perceived blur.

Zhan et al. [48] proposed a no-reference metric that scores image quality based on distortions such as blocking artifacts and blurring. It divides the image into non-overlapping 8 × 8 blocks and predicts the quality of each block from its blockiness and luminance change. The two scores are combined into a single quality score as follows:

$$ {\text{BALC}} = B_{\text{img}} *L_{\text{img}}^{ - \alpha } $$
(24)

where \( B_{\text{img}} \) represents the blockiness of the image and \( L_{\text{img}} \) indicates the luminance change or blurring effect. \( \alpha \ge 0 \) adjusts the relative importance of the two terms and is usually set to 0.215.

A higher BALC value indicates a lower-quality image with more blocking artifacts and blurring.

The saturated pixel ratio (\( \sigma \)) indicates saturation, which turns pixels black or white in the restored image. The value of \( \sigma \) should be small for good dehazing performance.

Table 1 illustrates the values of three metrics: blur metric, BALC, and \( \sigma \) for hazy images shown in Figs. 5 and 6 using different methods.

Compared with the other methods, RASD and CAP obtain large mean blur values, indicating that their dehazed images suffer from blurring. The remaining methods perform moderately.

Observing the BALC values, which reflect perceived artifacts in the dehazed image, the DCP method has the highest mean BALC, indicating that it is not suitable for hazy images with large sky regions and produces artifacts. The NLD method performs second worst, also producing color artifacts in the sky regions. The performance of the other methods is average.

Moreover, Table 1 shows the \( \sigma \) values. Most methods obtain high \( \sigma \) values, except the DCP and the proposed method; a high \( \sigma \) indicates that a method suffers from oversaturation and color distortion.

Overall, Table 1 shows that the proposed method achieves the smallest blur, BALC, and \( \sigma \) values among all compared methods, indicating that it produces distortion-free dehazed images. The quantitative analysis thus confirms the qualitative analysis.

Table 2 reports the mean values of these metrics on the Waterloo IVC dataset [43]. The RASD method achieves the second lowest BALC value, indicating that its dehazed images are largely artifact-free; however, it suffers from saturation (high \( \sigma \)) and blurring (high blur value). The machine learning methods [13, 16] produce satisfactory results. The proposed method achieves the lowest mean blur, BALC, and \( \sigma \) values, which proves that it restores the visibility of a hazy image without introducing any distortion.

Table 2 Comparison of mean values of blur, BALC, and \( \sigma \) metrics on Waterloo IVC dataset [43]

4.2 Comparison on hazy images when ground truth image is available

In addition to real-world hazy images, we also evaluated the proposed method on synthetic hazy images from the RESIDE dataset; Fig. 10 shows example images. In this section, we compare the proposed method with recent state-of-the-art methods, including PQC [25], CD [24], and LBF [21].

Fig. 10
figure 10

Synthetic hazy images from RESIDE dataset. a Hazy image, b GT, c PQC [25], d CD [24], e LBF [21], f the proposed method

Again, the assessment is done qualitatively and quantitatively.

4.2.1 Qualitative analysis

The visual comparison with the three methods PQC [25], CD [24], and LBF [21] is shown in Fig. 10, using images from the RESIDE dataset together with their ground truth images. It can be observed that the PQC method suffers from over-enhancement (dark colors; see the sky regions in the third, fourth, and fifth images of Fig. 10c). The CD method generates distorted images (see Fig. 10d) because it enhances image quality without considering the degradation mechanism, i.e., the physical model of haze imaging. The LBF method maintains the structure of the recovered image but is unable to increase the contrast of the hazy image (see the third, fourth, and fifth images of Fig. 10e).

The dehazed images obtained by the proposed method are presented in Fig. 10f. In comparison with the other methods, the proposed method restores images closest to the ground truth.

4.2.2 Quantitative analysis

Furthermore, we perform a quantitative analysis of the hazy images presented in Fig. 10 using two metrics: the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) [3]. PSNR measures the distortion between the haze-free image and the ground truth image, while SSIM measures the structural similarity of the two images. For good dehazing performance, both values should be high.
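Both metrics are available off the shelf; a sketch with scikit-image, assuming the ground truth and dehazed images (placeholder names gt and dehazed) are floats scaled to [0, 1]:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Full-reference evaluation against the ground truth; data_range assumes
# both images are floats in [0, 1].
psnr = peak_signal_noise_ratio(gt, dehazed, data_range=1.0)
ssim = structural_similarity(gt, dehazed, data_range=1.0, channel_axis=-1)
```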

Table 3 shows the PSNR and SSIM values for the images presented in Fig. 10. The CD method has the lowest mean PSNR and SSIM values, indicating poor dehazed image quality. The proposed method achieves a higher mean PSNR than PQC, LBF, and CD, and higher SSIM values than PQC and CD; however, its SSIM is lower than that of LBF.

Table 3 Comparison of PSNR and SSIM values on synthetic hazy images shown in Fig. 10

Table 4 reports the mean PSNR and SSIM values on the D-Hazy dataset [36]. The high values of these metrics indicate that the proposed method outperforms existing methods in terms of visibility, contrast, and structure, and that its results are comparable to the ground truth images.

Table 4 Comparison of mean values of PSNR and SSIM on D-Hazy dataset [36]

5 Conclusions and future work

In this paper, we have proposed an image dehazing method that improves the visibility of the hazy image without introducing distortion in the recovered image. The proposed method uses superpixels and an ensemble neural network. A superpixel groups a set of pixels that are homogeneous in color, texture, and brightness, which yields fewer training examples; it also helps reduce the halo artifacts that are a common problem in patch-based methods. The transmission is estimated by the ensemble network, which works well on small volumes of data.

The ensemble network avoids the problem of overfitting and improves the generalization of the network; the transmissions from the individual networks are averaged. The performance of the proposed method was tested qualitatively and quantitatively on different challenging hazy images. The experimental results demonstrate that the images dehazed by the proposed method are free from distortions such as artifacts, saturation, and blur. The proposed method works effectively in most cases, but sometimes the dehazed image shows an over-enhancement problem, as shown in the experimental section (first image of Fig. 10f). Another limitation is that it cannot increase the contrast of faraway regions in densely hazy images. Future work will focus on resolving these issues; we will also consider a hardware implementation of the proposed method using a very large-scale integration (VLSI) architecture.