1 Introduction

In recent years, the field of computer vision has focused heavily on restoring image quality degraded by haze [7]. The quality of natural images captured with devices such as smartphones and cameras is degraded by weather conditions such as haze, fog, and rain [1]. These conditions obscure the scene and reduce the overall image quality [2, 3]. Low visibility due to haze or fog not only damages the images captured by a camera but can also, in transport applications, lead to serious accidents [4]. Because image processing underpins applications such as remote sensing and driver assistance systems, adequate visibility is often a prerequisite for these systems [6]. Images degraded by fog or haze cannot be used for automation without pre-processing [5].

Image dehazing, a subfield of image restoration, is the process of removing haze from an image [8]. It focuses mainly on modeling the image degradation caused by hazy weather [9]. Aside from physical modeling, it also draws on image enhancement techniques such as contrast enhancement, brightness enhancement, and image recognition [10]. Earlier approaches addressed haze by enhancing contrast as a means of reducing its adverse effects [11]. However, dehazing remains a highly challenging task, as it must contend with factors such as noise, data dependence, and complexity [12].

In recent years, various works have improved the dehazing process by developing deep-learning frameworks that analyze and predict the effects of haze on an image. Such frameworks learn the correlation between the hazy and the clear image by performing haze reduction on the grayscale or the color image [13, 14].

Various image dehazing techniques use visual cues to capture the statistical and deterministic properties of hazy images [15]. Image dehazing techniques currently fall into two categories: single-spectral (visible band) and multispectral. Significant advances have been made in single-spectral image dehazing, and several prior- and constraint-based techniques have been introduced, including uncorrelated surface shading and transmission, the dark-channel prior (DCP), boundary constraint, the low-contrast property, and the color attenuation prior. When dealing with thick or dense haze, the effectiveness of single-spectral (visible band) dehazing diminishes significantly because the transmission becomes extremely low; in such scenarios, the only result is amplified noise. Multispectral techniques use the near-infrared spectrum to refine the transmission map and produce a clear dehazed image, and they have contributed notable progress in addressing the challenges of image dehazing [16]. Ordinary outdoor foggy images can be effectively cleared of fog using single-image defogging methods [17]. Dehazing techniques can also be grouped by how they operate on the image: patch-wise [32,33,34], pixel-wise [35], scene-wise [36, 37], and non-local [38, 39]. In these existing techniques, redundant computation over similar spatial structures increases processing time and slows haze removal. Furthermore, although each of these prior-based methods has its own advantages, none can effectively handle all practical situations, which can result in degraded images and visual inconsistencies [18].

In recent times, deep learning has gained considerable attention and has proven successful in image-dehazing applications: by training on a large number of samples, powerful image dehazing systems can be developed. For instance, a trainable end-to-end system has been developed that uses convolutional neural networks (CNN) to estimate scene transmission by incorporating various existing image priors [29]. In subsequent work, a method was proposed in [40] that uses a multi-scale CNN (MSCNN) to learn features for determining the transmission. The authors of [41] introduced a dehazing model named All-in-One Dehazing Network (AOD-Net), which is constructed using a CNN. AOD-Net offers the key advantage of directly generating haze-free results from a single hazy image using a lightweight CNN; notably, this approach relies on processing local patches within individual images. Although deep, learning-based dehazing methods can integrate or learn haze-related features and overcome the limitations of the patch-wise strategy, they have certain drawbacks: they may struggle to handle dense haze and they require a large number of training samples [18]. In this paper, a novel Cross Entropy-based Deep Learning Neural Network (CE-DLNN) for the restoration of dehazed/defogged images is proposed to address these issues. In the proposed CE-DLNN, a remainder layer is introduced to improve the classification accuracy.

The main contributions of the proposed work are as follows:

  • Using the IDFHE method, the quality of the image is improved without degrading its edges or introducing blur.

  • Using CE-DLNN, foggy and clear images are identified. In a DLNN (Deep Learning Neural Network), the input layer receives the extracted features as input data, which are then processed and analyzed by a number of hidden layers (HLs), and the output layer (OL) produces the predicted result. However, the classifier often suffers accuracy issues due to the selection of weight values. To address this problem, the proposed methodology introduces a remainder layer, which stores information from the previous training images. This improves the accuracy and performance of the classification process.

  • The modified Structure Transference Filtering (MSTF) technique is used to restore fog-free images. In general, the existing variants of structure transference filtering (STF) used for extracting "detail halos" from an image struggle to handle the increasing filtering strengths of the guide and input images. To address these limitations, the mean and variance in MSTF are calculated using a Bayesian model averaging function, which overcomes the drawbacks of the previous STF variants.

The rest of the paper is structured as follows: Section 2 reviews approaches related to the restoration of defogged images, Section 3 explains the proposed restoration scheme, Section 4 analyses the performance of the proposed methodology, and Section 5 concludes the paper with future enhancements.

2 Literature survey

Over the past few decades, numerous techniques have been suggested to effectively eliminate fog and haze. Some recent single-image dehazing techniques are reviewed here.

Aneela Sabir et al. [19] developed a modified DCP using fog density and guided image-processing techniques to improve the transmission map (TM). The guided image filter sped up the refinement of the map, and segmentation techniques were introduced to improve the image's overall quality. The atmospheric light was first calculated for each segment, and the TM of each segment was then computed using the average value of the atmospheric light. Their evaluation revealed that the modified DCP was more accurate than existing techniques; however, the approach was only suitable for images with large sky regions. Haseeb Hassan et al. [20] proposed an effective and automatic dehazing method for estimating the intensity of atmospheric light in an image. The input image was segmented using superpixel segmentation, and the intensities of the superpixels were compared to extract the superpixel with the maximum intensity. An initial TM was then estimated from the measured atmospheric light and refined with a rolling guidance filter. The haze-free image was produced by combining the refined transmission and the atmospheric light in a haze-imaging model. Yin Gao et al. [21] presented a dual-fusion method for single-image dehazing. The sky and non-sky regions of the hazy image were separated by segmentation, a multi-region fusion (MRF) method was introduced to smooth the single image and properly optimize the transmission, and the fusion was built on a brightness transform function to remove the haze effectively. Experiments revealed that the multi-region fusion method outperformed state-of-the-art dehazing techniques; however, it could not achieve a precise segmentation between the sky and non-sky regions. Sabiha Anan et al. [22] introduced a framework that combines segmentation and restoration techniques for the sky and non-sky regions. The framework isolated the regions and restored the respective parts using a binary mask formulated with a flood-fill algorithm: the foggy sky part was restored using Contrast Limited Adaptive Histogram Equalization (CLAHE) and the non-sky part using a modified DCP, after which the restored parts were blended into the resulting image. Tests showed that the framework performed better than state-of-the-art dehazing techniques, but it also could not achieve a precise segmentation between the sky and non-sky regions. Dilbag Singh et al. [23] explored a dehazing model built on three novel concepts. An input image was first decomposed with a mask into low- and high-frequency regions based on the image gradient magnitude; the depth information of the foggy image was then extracted using a gradient-sensitive loss (GSL); and finally an efficient filter named the window-based integrated means filter (WIMF) was designed to refine the TM. Experiments revealed that the technique achieved significant results, but its hyperparameter tuning issue was not solved, which degraded the performance of the system. Apurva Kumari et al. [24] introduced a visibility restoration method based on gamma transformation and thresholding. The atmospheric light value was first determined and used to validate the image's color and contrast. The gamma transformation was then used to determine the depth of the haze and to restore the scene radiance. The study revealed that the algorithm performed remarkably well compared with other methods; its limitation was that it was not suitable for dense fog images.

3 Proposed dehaze/defog image restoration system

Images obtained under adverse weather conditions such as fog or haze can severely limit the visibility of the scene. These conditions can also lead to low contrast and faded colors, and they decrease visibility to a degree that depends on the intensity of the fog or haze. Such images cannot be used in other applications without pre-processing. To restore visibility, a process known as dehazing is performed: the haze layer is removed and the vibrant color and structure of the image are brought back. This paper introduces a novel visibility restoration approach based on a Cross Entropy-based Deep Learning Neural Network (CE-DLNN) classifier. The proposed method classifies clear and haze/fog images and reconstructs the dehazed/defogged images using the MSTF method. The block diagram of the proposed methodology is shown in Fig. 1 below.

Fig. 1 The proposed methodology

3.1 Pre-processing

The input images \(B_{n}\) are subjected to several pre-processing steps to prepare the data for further processing. In the proposed methodology, the input images are pre-processed using the Intensity-based Dynamic Fuzzy Histogram Equalization (IDFHE) technique to obtain a high-quality image whose intensity is almost equivalent to the average intensity of the input image. IDFHE is a histogram equalization technique that uses the fuzzy statistics of digital images. The existing contrast-limited adaptive histogram equalization [42] and Gaussian linear contrast stretching model [43] need more iterations to enhance the images. In IDFHE, the histogram is first created to handle the inaccuracy of grey-level values and is divided into multiple sub-histograms. The sub-histograms are then assigned to a new dynamic range and equalized separately by histogram equalization (HE) approaches. Existing HE techniques tend to introduce visual noise and undesired enhancement, including over-enhancement, under-enhancement, and intensity saturation effects. The proposed method takes into account the varying intensity levels in the image to avoid intensity saturation and computes the histogram using a fuzzy membership function. The subsequent stages of IDFHE are detailed as follows.

Histogram computation

Computing the histogram in this way better handles the imperfectness of the grey values and yields a smooth histogram. Thus, the histogram, which is a sequence of real numbers, is computed and divided into a number of sub-histograms using the intensity preservation measure. It can be expressed as,

$${B}_{Int(n)}=\frac{1}{{H}_{F(k)}}+\sum {H}_{F(k)}{\chi }_{{B}_{n}\mathrm{^{\prime}}k}k\in u,v$$
(1)
$${\chi }_{{B}_{n}\mathrm{^{\prime}}k}=\mathit{max}\left(\mathrm{0,1}-\frac{|{B}_{n}-k|}{4}\right)$$
(2)

where, \({B}_{Int(n)}\) gives the measure of intensity preservation, the histogram \({H}_{F(k)}\) represents the number of occurrences of grey levels around \(k\), \({\chi }_{{B}_{n}{\prime}k}\) is the fuzzy membership function used to properly handle the inaccuracy of the grey values, and \(u,v\) define the support of the membership function. The histogram is then divided into several sub-histograms based on the intensity preservation measure.
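As an illustration of the fuzzy histogram of Eqs. (1)-(2), the following is a minimal sketch in Python/NumPy; the function name fuzzy_histogram and the spread of 4 grey levels (matching the denominator of Eq. (2)) are illustrative assumptions rather than part of the original method.

```python
import numpy as np

def fuzzy_histogram(image, levels=256, spread=4):
    # Each pixel contributes to nearby grey levels k through the
    # triangular fuzzy membership max(0, 1 - |B_n - k| / spread), Eq. (2).
    pixels = image.astype(np.float64).ravel()
    hist = np.zeros(levels)
    for k in range(levels):
        membership = np.maximum(0.0, 1.0 - np.abs(pixels - k) / spread)
        hist[k] = membership.sum()
    return hist
```

The resulting smooth histogram is what is subsequently partitioned into sub-histograms using the intensity preservation measure.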

Dynamic range allocation and equalization

The histogram partitions obtained from the above step are mapped to a dynamic range using the spanning function. The dynamic range of the output sub-histogram is denoted as,

$$D{R}_{k}=\frac{fa{c}_{k}}{{\sum }_{l=1}^{n+1}fa{c}_{l}}\times (N-1),\quad \text{where } fa{c}_{k}=sp{n}_{k}\times {\mathit{log}}_{10}{L}_{k}$$
(3)

where, the spanning function \((spn_{k}={k}_{high}-{k}_{low})\), the factor \((fac_{k})\), and the output dynamic range \((D{R}_{k})\) are the parameters used for the dynamic equalization process, \({L}_{k}\) is the total number of pixels in the \(k^{th}\) sub-histogram, and \(N\) is the number of grey levels of the histogram. The output dynamic range of the \(k^{th}\) sub-histogram \((D{R}_{k})\) spans the interval \([{k}_{start},{k}_{stop}]\), where,

$${k}_{start}={\sum }_{k=1}^{l-1}D{R}_{k}+1$$
(4)
$${k}_{stop}={\sum }_{k=1}^{l}D{R}_{k}$$
(5)

Then, the equalization of each sub histogram is done by remapping the histograms. After the equalization, the new image with new intensity levels can be estimated as,

$${B}_{enh(n)}={k}_{start}+({k}_{stop}-{k}_{start}){\sum }_{k={k}_{start}}^{l}\frac{{H}_{F(k)}}{{L}_{k}}$$
(6)

where, \({B}_{enh(n)}\) is the enhanced image with a new intensity level corresponding to the \({k}^{th}\) intensity level on the original image.
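A compact sketch of the dynamic range allocation and per-partition equalization of Eqs. (3)-(6) is given below, assuming the sub-histogram boundaries have already been found; the helper name allocate_and_equalize and the list of (k_low, k_high) partitions are hypothetical.

```python
import numpy as np

def allocate_and_equalize(hist, partitions, levels=256):
    # partitions: list of (k_low, k_high) grey-level intervals (the sub-histograms).
    spans = np.array([hi - lo for lo, hi in partitions], dtype=float)       # spn_k
    counts = np.array([hist[lo:hi + 1].sum() for lo, hi in partitions])     # L_k
    factors = spans * np.log10(np.maximum(counts, 1.0))                     # fac_k, Eq. (3)
    ranges = factors / factors.sum() * (levels - 1)                         # DR_k

    mapping = np.zeros(levels)
    start = 0.0
    for (lo, hi), dr in zip(partitions, ranges):
        stop = start + dr                                                   # Eqs. (4)-(5)
        cdf = np.cumsum(hist[lo:hi + 1])
        cdf = cdf / max(cdf[-1], 1.0)
        mapping[lo:hi + 1] = start + (stop - start) * cdf                   # Eq. (6)
        start = stop + 1.0
    return np.clip(np.round(mapping), 0, levels - 1)                        # new grey levels
```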

3.2 Dark channel prior

After pre-processing, the enhanced image \({B}_{enh(n)}\) is used to estimate the DCP, which indicates the haze thickness in the image. The DCP is a statistic of haze-free outdoor images: in most local patches of such images, at least one colour channel contains some pixels whose intensity is very low (close to zero). The dark channel of an image can be defined as,

$$D{C}_{{B}_{enh(n)}}=\underset{x,y\in \Psi }{min}\left(\underset{cc\in (R,G,B)}{min}{B}_{enh(cc(n))}(x,y)\right)$$
(7)

where, \({B}_{enh(cc(n))}(x,y)\) denotes a Colour Channel (CC) of the input image \({B}_{enh(n)}\) in the Red, Green, and Blue (RGB) colour space, \(\Psi\) is the local patch centred at \(x,y\), and \(D{C}_{{B}_{enh}(n)}\) is the output image after estimating the dark channel, whose intensity is very low except in the sky region. Therefore, the dark channel \(D{C}_{{B}_{enh}(n)}\) is selected as the lowest value over the three colour channels and all pixels in \(\Psi\).
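The dark channel of Eq. (7) amounts to a per-pixel minimum over the colour channels followed by a local minimum filter over the patch \(\Psi\); a minimal sketch (the 15x15 patch size is an assumption) is:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image_rgb, patch=15):
    # Minimum over the R, G, B channels at every pixel ...
    per_pixel_min = image_rgb.min(axis=2)
    # ... followed by the minimum over the local patch around (x, y), Eq. (7).
    return minimum_filter(per_pixel_min, size=patch)
```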

3.3 Transmission map estimation

The dark channel (DC) computed for the image, \(D{C}_{{B}_{enh(n)}}\), is then used to estimate the transmission map (TM). To find the TM, the atmospheric light must first be computed. Let \(N\) be the brightest pixel value in the DC. \(D{C}_{{B}_{enh(n)}}\) is compared with \(N\), and the atmospheric light \({L}_{Atmp}\) is taken as the input image pixels at the locations where the dark channel equals \(N\). The input image is then divided by the atmospheric light \({L}_{Atmp}\) of the respective colour channel to generate the TM. It can be expressed as,

$$T{M}_{D{C}_{{B}_{enh(n)}}}=1-\alpha \left(\frac{D{C}_{{B}_{enh(n)}}}{{L}_{Atmp}}\right)$$
(8)

where, \(T{M}_{D{C}_{{B}_{enh(n)}}}\) denotes the estimated TM, \(\alpha\) is the parameter whose value depends on the fog density of the input image.
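The atmospheric-light and transmission-map estimation of Eq. (8) can be sketched as follows; taking the brightest 0.1% of dark-channel pixels (rather than a single brightest pixel), setting α = 0.95, and using a per-pixel minimum instead of the full patch minimum are simplifying assumptions, not values taken from the paper.

```python
import numpy as np

def transmission_map(image_rgb, dark, alpha=0.95, top_frac=0.001):
    # Atmospheric light: mean colour of the pixels with the brightest dark-channel values.
    flat = dark.ravel()
    n_top = max(1, int(top_frac * flat.size))
    idx = np.argpartition(flat, -n_top)[-n_top:]
    L_atmp = image_rgb.reshape(-1, 3)[idx].mean(axis=0)

    # Divide each channel by its atmospheric light and apply Eq. (8).
    normalized = image_rgb / np.maximum(L_atmp, 1e-6)
    dark_of_normalized = normalized.min(axis=2)
    return 1.0 - alpha * dark_of_normalized, L_atmp
```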

3.4 Feature extraction

After the estimation of the TM, the important features, namely standard deviation, mean gradient, information entropy, colour, correlation, kurtosis, skewness, SURF, HOG, edge, GLCM, and wavelet features, are extracted from the image \(T{M}_{D{C}_{{B}_{enh(n)}}}\).

Colour features

Colour features are among the most significant characteristics of an image and capture the photometric information of the different colour channels. Colour is described by colour information such as the colour distribution, the dominant colour, or the colour moments. The proposed work uses the colour moments, namely the mean, variance, and standard deviation, as the colour features. The colour features \({\Phi }_{clrfea}\) are expressed as,

$${\Phi }_{clrfea}=\{{\phi }_{mean},{\phi }_{var},{\phi }_{sd}\}$$
(9)

where,

$${\phi }_{mean}=\frac{1}{i*j}{\sum }_{x=1}^{i}{\sum }_{y=1}^{j}B(x,y)$$
(10)
$${\phi }_{var}=\frac{1}{i*j}{\sum }_{x=1}^{i}{\sum }_{y=1}^{j}(B(x,y)-{\phi }_{mean}{)}^{2}$$
(11)
$${\phi }_{sd}=\sqrt{{\phi }_{var}}$$
(12)

where,\({\phi }_{mean}\), \({\phi }_{var}\), \({\phi }_{sd}\) are the mean, variance, and standard deviation of an image of size \(i*j\), \(B(x,y)\) denotes the pixel intensity values.
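The colour moments of Eqs. (10)-(12) reduce to simple per-channel statistics; a minimal sketch:

```python
import numpy as np

def colour_moments(image):
    # One mean / variance / standard deviation per colour channel, Eqs. (10)-(12).
    pixels = image.reshape(-1, image.shape[-1]).astype(np.float64)
    mean = pixels.mean(axis=0)
    var = ((pixels - mean) ** 2).mean(axis=0)
    return mean, var, np.sqrt(var)
```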

Kurtosis and skewness

Skewness gives the degree of symmetry or asymmetry of the intensity distribution, whereas kurtosis measures how much the tails of the distribution deviate from those of the normal distribution. The features are computed as,

$${\Phi }_{skn}=\frac{1}{i*j}{\sum }_{x=1}^{i}{\sum }_{y=1}^{j}{\left(\frac{B(x,y)-{\phi }_{mean}}{{\phi }_{sd}}\right)}^{3}$$
(13)
$${\Phi }_{kur}=\frac{1}{i*j}{\sum }_{x=1}^{i}{\sum }_{y=1}^{j}{\left(\frac{B(x,y)-{\phi }_{mean}}{{\phi }_{sd}}\right)}^{4}$$
(14)

where, \({\Phi }_{skn}\), \({\Phi }_{kur}\) are the extracted skewness and kurtosis features.

Information entropy

Information entropy describes the amount of information provided by the image, i.e., how much uncertainty or randomness there is in it. It can be expressed as,

$${\Phi }_{E}(B)=-{\sum }_{n=1}^{N}p({B}_{n}){\mathit{log}}_{b}p({B}_{n})$$
(15)

where, \({\Phi }_{E}(B)\) is the entropy corresponding to the probability distribution \(p({B}_{n})\) of the grey levels.
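A minimal sketch of Eq. (15), assuming \(p({B}_{n})\) is estimated from the normalized grey-level histogram and the logarithm base \(b\) is 2:

```python
import numpy as np

def information_entropy(image, levels=256):
    hist, _ = np.histogram(image.ravel(), bins=levels, range=(0, levels))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]                       # empty bins contribute 0 * log 0 = 0
    return -np.sum(p * np.log2(p))     # Eq. (15)
```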

SURF

Speeded-Up Robust Features (SURF) is based on summing 2D Haar wavelet responses and makes efficient use of integral images. SURF features are extracted using the following steps: (1) scale-space extrema detection, (2) key-point localization, (3) orientation assignment, and (4) feature point description. The image is filtered with a Gaussian kernel, and the Hessian matrix used to detect the interest points is constructed as,

$${M}_{H}(x,t)=\left[\begin{array}{cc}{d}_{xx}(x,t) & {d}_{xy}(x,t)\\ {d}_{yx}(x,t) & {d}_{yy}(x,t)\end{array}\right]$$
(16)

where, \({d}_{xy}(x,t)\) denotes the second-order derivative of the Gaussian-filtered image in the \(x\) and \(y\) directions, and \({M}_{H}(x,t)\) denotes the Hessian matrix at point \(x\) of the image at scale \(t\).

After the key points are detected, Haar wavelet responses are calculated to assign an orientation to each point. Then, the square region along the obtained orientation is constructed and split into \(4\times 4\) sub-regions. For each sub-region, the features are then weighted in horizontal and vertical directions to extract the feature's descriptor.

HOG

The histogram of oriented gradients (HOG) operator is applied to capture the structure or shape feature vector. For the HOG vectors, the image is divided into smaller regions, and the gradient and orientation are calculated for each region. The magnitude \({G}_{\vartheta }\) of the gradients \({\vartheta }_{p}\) and \({\vartheta }_{q}\) along the \(p\) and \(q\) directions and the orientation \(\partial\) are calculated as,

$${G}_{\vartheta }=\sqrt{{{\vartheta }_{p}}^{2}+{{\vartheta }_{q}}^{2}}$$
(17)
$$\partial =\mathit{arctan}\left(\frac{{\vartheta }_{p}}{{\vartheta }_{q}}\right)$$
(18)

where, \({\vartheta }_{p},{\vartheta }_{q}\) are the horizontal and vertical gradients, \(\partial\) is the gradient orientation.

Then the histogram (or features) for each region is generated by using the gradients and orientation values. Once the histogram is formed, the histogram vector of all regions is normalized and combined to get the features for the entire image. The final representation of the HOG feature is denoted as \({\Phi }_{HOG}\).
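The per-pixel quantities of Eqs. (17)-(18), from which the cell histograms are accumulated, can be sketched as below; using np.arctan2 in place of the plain arctangent (to keep the orientation well defined for all gradient signs) is an implementation choice, not part of the original formulation.

```python
import numpy as np

def gradient_magnitude_orientation(image):
    img = image.astype(np.float64)
    g_p = np.gradient(img, axis=1)                 # horizontal gradient
    g_q = np.gradient(img, axis=0)                 # vertical gradient
    magnitude = np.sqrt(g_p ** 2 + g_q ** 2)       # Eq. (17)
    orientation = np.arctan2(g_q, g_p)             # Eq. (18), in radians
    return magnitude, orientation
```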

Edge

The edge feature is obtained from the image through the steps of noise removal, differentiation, non-maximum suppression, double thresholding, and edge tracking. In Canny edge detection, the image is first smoothed and the gradients, which indicate the presence of edges, are computed. Non-maximum suppression, in which points that are not at a local maximum of the gradient magnitude are suppressed, is then performed to thin the edges. The strong and weak edges are identified in double thresholding using two thresholds, namely an upper threshold \(({\tau }_{up})\) and a lower threshold \(({\tau }_{lw})\). The final edges are categorized with the help of edge tracking as,

$$\Phi_{edge}=\left\{\begin{array}{lll}Strg_{E\;}&if&\vartheta(x,y)>\tau_{up}\\Wk_{E\;}&if&\tau_{lw}<\vartheta(x,y)<\tau_{up}\\Non_E&if&\vartheta(x,y)<\tau_{lw}\end{array}\right.$$
(19)

where, \(Str{g}_{E},W{k}_{E},No{n}_{E}\) denote the strong, weak, and non-edge categories, and \(\vartheta (x,y)\) is the gradient magnitude used to determine the type of edge.
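The double-thresholding step of Eq. (19) simply labels each pixel by its gradient magnitude; a short sketch (the label values 2/1/0 for strong/weak/non-edges are arbitrary):

```python
import numpy as np

def classify_edges(grad_mag, tau_low, tau_high):
    labels = np.zeros_like(grad_mag, dtype=np.uint8)            # 0 = non-edge
    labels[(grad_mag > tau_low) & (grad_mag <= tau_high)] = 1   # weak edge
    labels[grad_mag > tau_high] = 2                             # strong edge
    return labels
```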

GLCM

The grey-level co-occurrence matrix (GLCM) is a way of extracting image properties related to second-order statistical texture features. The GLCM is a matrix whose numbers of rows and columns equal the number of grey levels of the input image, and it is constructed by counting the spatial co-occurrences of grey-level values in the image.

The GLCM features are expressed as,

$${\Phi }_{GLCM}=\{{\phi }_{1},{\phi }_{2},{\phi }_{3},...........,{\phi }_{n}\}$$
(20)

where, \({\Phi }_{GLCM}\) denotes the extracted GLCM features, \({\phi }_{n}\) denotes the \({n}^{th}\) GLCM feature.
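For one spatial offset (dx, dy), the GLCM can be built by counting grey-level pairs; a minimal NumPy sketch (the normalization to co-occurrence probabilities is an assumption):

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=256):
    img = image.astype(np.intp)
    rows, cols = img.shape
    matrix = np.zeros((levels, levels), dtype=np.float64)
    ref = img[:rows - dy, :cols - dx]      # reference pixels
    nbr = img[dy:, dx:]                    # neighbours at offset (dx, dy)
    np.add.at(matrix, (ref.ravel(), nbr.ravel()), 1)
    return matrix / max(matrix.sum(), 1.0)
```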

Wavelet features

Wavelets are used to extract features from the input images. The Discrete Wavelet Transform (DWT) extracts the wavelet coefficients, which are functions of scale and position and localize the frequency information. The different frequency components are expressed as,

$$g(B)=\left\{\begin{array}{c}{\lambda }_{x,y}^{hf}=\sum g(B){\varpi }_{hpf}*x(B-2xy)\\ {\lambda }_{x,y}^{lf}=\sum g(B){\varpi }_{lpf}*x(B-2xy)\end{array}\right.$$
(21)

where, \({\lambda }_{x,y}^{hf}\) and \({\lambda }_{x,y}^{lf}\) are the coefficients representing the high- and low-frequency components of the signal \(g(B)\), and \({\varpi }_{hpf}\), \({\varpi }_{lpf}\) represent the high-frequency (HF) and low-frequency (LF) filter coefficients.

The DWT decomposes the image into two components: the LF and the HF components. The LF component provides a global description (an approximation of the original image), while the HF components capture the finer details. The LF content, i.e., the approximation of the original image, is further decomposed to obtain the second-level approximation and detail images, and the process continues until the required level of resolution is obtained.
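A sketch of this multi-level decomposition using the PyWavelets package; the choice of the Haar wavelet, two levels, and simple band statistics as the final features are assumptions made for illustration:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_features(image, wavelet="haar", level=2):
    coeffs = pywt.wavedec2(image.astype(np.float64), wavelet, level=level)
    features = [coeffs[0].mean(), coeffs[0].std()]    # low-frequency approximation band
    for (ch, cv, cd) in coeffs[1:]:                   # horizontal, vertical, diagonal details
        features.extend([ch.std(), cv.std(), cd.std()])
    return np.array(features)
```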

Thus, the final set of features extracted are expressed as,

$${\Phi }_{fea(n)}=\{{\Phi }_{1},{\Phi }_{2},{\Phi }_{3},.....,{\Phi }_{N}\}$$
(22)

where, \({\Phi }_{fea(n)}\) denotes the set of extracted features and \({\Phi }_{N}\) represents the \({N}^{th}\) feature.

3.5 Classification

The extracted features \({\Phi }_{fea(n)}\) are given as input to the Cross Entropy-based Deep Learning Neural Network (CE-DLNN). A DLNN is a neural network with an input layer (IL), an output layer (OL), and multiple hidden layers (HLs). In a DLNN, the input layer of neurons receives the input data, here the extracted features; the input data is processed and analysed by the HLs, and the OL produces the predicted result. Normally, the classifier suffers from accuracy problems owing to the selection of weight values. To solve this problem, a remainder layer is included in the proposed methodology, which stores the previous training images as well as the weight values of the previous training. The remainder layer is therefore highly helpful for weight selection and increases the accuracy. In addition, the normal loss calculation is not well suited to all types of input, so in this research the loss is calculated based on the cross-entropy. The architecture of the proposed CE-DLNN is shown in Fig. 2.

Fig. 2 The architecture of CE-DLNN

The procedure of each neuron processing the data is as follows,

  • Input data is fed into the neuron.

  • Multiplying each input by its corresponding weight.

  • Summing the weighted inputs and adding the bias.

  • Applying activation function.

  • Producing output and labelling.

The mathematical calculation to attain the output of the neurons is expressed as,

$${D}_{l}={\Omega }_{af}\left(\sum {\Theta }_{n}{\Phi }_{fea(n)}+\wp \right)$$
(23)
$$\Omega_{af}=\left\{\begin{array}{ccc}1&if&(\sum\Theta_n\Phi_{fea(n)}+\wp\geq0)\\0&if&(\sum\Theta_n\Phi_{fea(n)}+\wp<0)\end{array}\right.$$
(24)

where, \({D}_{l}\) is the output produced by the output layer of the network, \({\Omega }_{af}\) denotes the activation function, \({\Theta }_{n}\) are the weights multiplied with the inputs, and \(\wp\) is the bias value added to the summation. During the classification process, once the output is computed, the current weight values between the input and hidden layers and between the hidden and output layers are copied into the corresponding remainder units and used in the next time step. The remainder layer stores the weight values as,

$${D}_{rl}={D}_{il,hl,ol}({\Theta }_{n-1})$$
(25)

where, \({D}_{rl}\) denotes the output of the remainder layer and \({D}_{il,hl,ol}\) denotes the outputs of the IL, HL, and OL. The OL produces the outputs of two classes, namely the clear image \({B}_{clr}(x,y)\) and the haze/fog image \({B}_{hz/fg}(x,y)\).

Then, the loss is estimated by calculating the difference between the actual and predicted outputs for all classes. The proposed work uses the cross-entropy function as the loss, expressed as,

$$C{E}_{LF}=-\frac{1}{L}{\sum }_{l=1}^{L}\left[{D}_{l}\,\mathit{log}\,{\widehat{D}}_{l}+(1-{D}_{l})\,\mathit{log}(1-{\widehat{D}}_{l})\right]$$
(26)

where, \(C{E}_{LF}\) represents the loss of the model, \(L\) represents the number of output data, and \({D}_{l}\), \({\widehat{D}}_{l}\) are the actual and predicted class outputs. Algorithm 1 shows the procedure followed in the proposed CE-DLNN method.

Algorithm 1 Classification using CE-DLNN
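As a minimal sketch of the neuron output of Eqs. (23)-(24) and the cross-entropy loss of Eq. (26); the step activation, the clipping constant, and the function names are illustrative, not part of the original network specification:

```python
import numpy as np

def neuron_output(features, weights, bias):
    # Weighted sum plus bias passed through the step activation of Eq. (24).
    z = np.dot(weights, features) + bias                 # Eq. (23)
    return 1.0 if z >= 0 else 0.0

def cross_entropy_loss(actual, predicted, eps=1e-12):
    # Binary cross-entropy averaged over the L outputs, Eq. (26).
    predicted = np.clip(np.asarray(predicted, dtype=float), eps, 1.0 - eps)
    actual = np.asarray(actual, dtype=float)
    return -np.mean(actual * np.log(predicted)
                    + (1.0 - actual) * np.log(1.0 - predicted))
```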

3.6 Restoring haze/fog-free image

After classification, the haze/fog image \({B}_{hz/fg(n)}\) is restored as a haze/fog-free image by applying a restoration technique called modified Structure Transference Filtering (MSTF). Structure transference filtering (STF) refers to the structure-transferring property of the guided filter, which can transfer structure from the guidance image to the filter output. In STF, a guidance image, which may be identical to the analysed input image, is used to guide the process, and the dehazed image is reconstructed by refining the transmission map. In general, the STF variants used for extracting "detail halos" from an image cannot handle the increasing filtering strengths of the guide and input images. To overcome these drawbacks, the Bayesian model averaging function is used to calculate the mean and variance in MSTF.

The guided filter, which provides the STF, takes as inputs the analysed image \({B}_{hz/fg(n)}\) to be filtered and the guidance image \({B}_{Guide(n)}\) that transfers the predefined structure to the image to be filtered. The structure to be transferred is defined by the guidance vector field. The output image is obtained as,

$${B}_{refined(n)}={c}_{j}({B}_{hz/fg(n)}){B}_{Guide(n)}+{d}_{j}({B}_{hz/fg(n)})$$
(27)

where, \({c}_{j}\), \({d}_{j}\) are the linear coefficients and \({B}_{refined\left(n\right)}\) denotes the output after refining the transmission map (TM). The linear coefficients are computed by minimizing the objective function,

$$\partial =({B}_{refined(n)}-{B}_{hz/fg(n)}{)}^{2}+\beta {{c}_{j}}^{2}$$
(28)
$${c}_{j}=\frac{\frac{1}{\left|{A}_{j}\right|}{\sum }_{(x,y)\in {A}_{j}}{B}_{Guide(n)}(x,y)\,{B}_{hz/fg(n)}(x,y)-{\mathfrak{I}}_{mean(j)}\,{B}_{hz/fg(n)}{\prime}}{{\mathfrak{I}}_{\mathit{var}(j)}+\beta }$$
(29)
$${d}_{j}={B}_{hz/fg(n)}{\prime}-{c}_{j}\,{\mathfrak{I}}_{mean(j)}$$
(30)

where, \(\partial\) denotes the objective function, \(\beta\) is the regularization parameter, \({\mathfrak{I}}_{mean(j)}\), \({\mathfrak{I}}_{\mathit{var}(j)}\) are the mean and variance of \({B}_{Guide(n)}\) in the window \({A}_{j}\) centred at \(j\), and \({B}_{hz/fg(n)}{\prime}\) is the mean of \({B}_{hz/fg(n)}\) in \({A}_{j}\). The \({\mathfrak{I}}_{mean(j)}\), \({\mathfrak{I}}_{\mathit{var}(j)}\) are computed using the Bayesian model averaging function as,

(31)

where, \(\xi\) denotes the weights indicating the plausibility of each scenario and \(\Delta\) represents the delay of the assessments. After refining the transmission map, the dehazed/defogged image is reconstructed as,

$${B}_{deh/def(n)}=\frac{{B}_{hz/fg(n)}-{L}_{Atmp}}{{B}_{refined(n)}+\beta }+{L}_{Atmp}$$
(32)

The resulting image \(B_{deh/def(n)}\) is the restored dehazed/defogged output.
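The final scene-radiance recovery of Eq. (32) inverts the haze imaging model with the refined transmission map; a minimal sketch (β acts here as the stabilising lower bound on the denominator, as in Eq. (32), and the clipping range assumes 8-bit intensities):

```python
import numpy as np

def recover_scene(hazy_rgb, refined_tm, L_atmp, beta=0.1):
    tm = refined_tm[..., np.newaxis] + beta            # refined transmission, kept away from zero
    restored = (hazy_rgb - L_atmp) / tm + L_atmp       # Eq. (32)
    return np.clip(restored, 0.0, 255.0)               # assuming 8-bit pixel values
```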

4 Result & discussion

To assess the effectiveness of the proposed restoration method, various experiments are conducted in this section. The proposed defog/dehaze image restoration system is implemented in MATLAB.

4.1 Database description

The proposed method uses real-world images (Tiananmen, y01, y16, Pumpkin, Stadium, Swan, Cone, House, and Mountain) and synthetic images taken from the Realistic Single Image Dehazing (RESIDE) dataset (https://sites.google.com/view/reside-dehaze-datasets/reside-standard) for the experimental evaluation. The RESIDE dataset is a large-scale benchmark that contains five different subsets.

Figures 3, 4, and 5 show the input images and the resultant dehazed images with their estimated transmission maps for synthetic and real-world images of the RESIDE dataset. Figures 3(a), 4(a), and 5(a) show the input images, and Figs. 3(b), 4(b), and 5(b) show the estimated transmission maps. The obtained dehazed/defogged images are shown in Figs. 3(c), 4(c), and 5(c). Figures 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15 compare the results of the proposed method with the original images and the results of existing methods. The proposed technique outperformed the existing methods described in [24] in terms of enhancing the dehazed/defogged images.

Fig. 3 The result analysis of synthetic Cone, Forest, and Pumpkin images: (a) input images, (b) transmission map, and (c) dehazed/defogged images

Fig. 4 The result analysis of real-world Tiananmen, Y01, and Swan images: (a) input images, (b) transmission map, and (c) dehazed/defogged images

Fig. 5 The result analysis of Y16, Stadium, House, and Mountain images: (a) input images, (b) transmission map, and (c) dehazed/defogged images

Fig. 6 The result analysis of synthetic Cone: (a) original image, (b-h) dehazed/defogged images of He et al. [25], Bilateral filter, Guided filter, Huang et al. [28], Ju et al. [18], Apurva Kumari et al. [24], and the proposed method, respectively

Fig. 7 The result analysis of synthetic Forest: (a) original image, (b-f) dehazed/defogged images of Cai et al. [29], Ren et al. [40], Berman et al. [39], Apurva Kumari et al. [24], and the proposed method, respectively

Fig. 8 The result analysis of synthetic Pumpkin: (a) original image, (b-h) dehazed/defogged images of He et al. [25], Bilateral filter, Guided filter, Huang et al. [28], Ju et al. [18], Apurva Kumari et al. [24], and the proposed method, respectively

Fig. 9 The result analysis of real-world Tiananmen: (a) original image, (b-h) dehazed/defogged images of He et al. [25], Bilateral filter, Guided filter, Huang et al. [28], Ju et al. [18], Apurva Kumari et al. [24], and the proposed method, respectively

Fig. 10 The result analysis of real-world Y01: (a) original image, (b-h) dehazed/defogged images of He et al. [25], Bilateral filter, Guided filter, Huang et al. [28], Ju et al. [18], Apurva Kumari et al. [24], and the proposed method, respectively

Fig. 11 The real-world Swan: (a) original image, (b-h) dehazed/defogged images of He et al. [25], Bilateral filter, Guided filter, Huang et al. [28], Ju et al. [18], Apurva Kumari et al. [24], and the proposed method, respectively

Fig. 12 The Y16 image: (a) original image, (b-h) dehazed/defogged images of He et al. [25], Bilateral filter, Guided filter, Huang et al. [28], Ju et al. [18], Apurva Kumari et al. [24], and the proposed method, respectively

Fig. 13 The Stadium image: (a) original image, (b-h) dehazed/defogged images of He et al. [25], Bilateral filter, Guided filter, Huang et al. [28], Ju et al. [18], Apurva Kumari et al. [24], and the proposed method, respectively

Fig. 14 The House image: (a) original image, (b-h) dehazed/defogged images of He et al. [25], Bilateral filter, Guided filter, Huang et al. [28], Ju et al. [18], Apurva Kumari et al. [24], and the proposed method, respectively

Fig. 15 The House image: (a) original image, (b-h) dehazed/defogged images of He et al. [25], Bilateral filter, Guided filter, Huang et al. [28], Ju et al. [18], Apurva Kumari et al. [24], and the proposed method, respectively

4.2 Quantitative analysis of real-world images

In this section, a quantitative analysis of the proposed and existing methods is conducted. The analysis is done on the real-world images using performance metrics such as contrast gain (CG), sigma, E-value, R-value, and visual contrast measure (VCM), in addition to the structural similarity index measure (SSIM), image visibility measurement (IVM), and Peak Signal-to-Noise Ratio (PSNR).

The performance metrics used for the analysis are defined as follows,

$$VCM=\frac{NL{A}_{sd}}{NL{A}_{t}}$$
(33)
$$CG=M{C}_{{B}_{dhz/dfg(n)}}-M{C}_{{B}_{hz/fg(n)}}$$
(34)
$${E}_{val}=\frac{NV{E}_{{B}_{n}}-NV{E}_{{B}_{dhz/dfg(n)}}}{NV{E}_{{B}_{dhz/dfg(n)}}}$$
(35)
$${R}_{val}=\frac{A{G}_{{B}_{n}}}{A{G}_{{B}_{dhz/dfg(n)}}}$$
(36)

where, \(VCM\) is the visual contrast measure, \(NL{A}_{sd}\) denotes the number of local areas whose standard deviation exceeds a threshold, \(NL{A}_{t}\) is the total number of local areas, contrast gain \((CG)\) is the difference between the mean contrast \(MC\) of the defogged and foggy images, \({E}_{val}\) is the E-value, \(NV{E}_{{B}_{n}}\) and \(NV{E}_{{B}_{dhz/dfg(n)}}\) are the numbers of visible edges in the original and restored images, and the R-value \(({R}_{val})\) is the ratio of the average gradients \(AG\) of the original and restored images.
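As a concrete reading of two of these metrics, the following sketch computes the contrast gain of Eq. (34), with the mean contrast taken as the global standard deviation of intensities (a common simplification, not necessarily the exact definition used in the tables), alongside the standard PSNR:

```python
import numpy as np

def contrast_gain(foggy, defogged):
    # Eq. (34): mean contrast of the restored image minus that of the foggy one.
    return defogged.astype(np.float64).std() - foggy.astype(np.float64).std()

def psnr(reference, restored, peak=255.0):
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```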

Table 1 shows the performance measurement of (a) contrast gain and (b) sigma values for the proposed and existing haze/fog removal methods. Contrast gain gives the mean contrast difference between the dehazed/defogged and hazy/foggy images, whereas sigma gives the percentage of saturated pixels. For improved performance, the contrast gain should be higher and the sigma value lower. Analysing Table 1(a), the contrast gain attained by the proposed method is higher than that of the existing methods for all images, and the proposed method also performs better in terms of sigma value. More saturated pixels indicate poorer quality of the restored images; therefore, the higher sigma value attained by Huang et al. [27], highlighted in Table 1(b), is not acceptable. A visual inspection revealed that the proposed technique is less susceptible to the artifacts produced by the structure transference filtering variants, which suggests that it restores image quality more efficiently.

Table 1 Quantitative result analysis of (a) contrast gain (b) sigma

Table 2 demonstrates the performance of the proposed and existing restoration methods based on the performance metrics (a) VCM, (b) E-value, and (c) R-value. The VCM parameter measures the visual contrast of the images, and higher VCM values indicate clearer images. The E and R values measure the visibility of the images. In Table 2(a), the proposed method achieves a higher VCM value than the existing methods. The results in Table 2(b) and (c) show that the E and R-values of the existing methods are smaller for some images and higher for others, whereas the proposed method achieves consistent results for all images, which suggests better performance. Hence, the analysis justifies the advantage of the proposed approach over the existing methods [18, 25,26,27] and [24].

Table 2 Quantitative result analysis of (a) VCM (b) E-value and (c) R value

Figure 16 shows the SSIM, PSNR, IVM, and VCM comparisons of the proposed and existing methods for three real-world images, namely Forest, Cone, and Pumpkin. Large values of PSNR and SSIM indicate smaller image distortion and better structural similarity between the observed output and the actual image. From the charts, the proposed technique has the highest SSIM and PSNR values, so it is evident that the recovered images are clearer than those of the existing methods. Concerning IVM and VCM, the existing methods [29] and [30] obtained much lower performance than the other methods, and the IVM and VCM of the proposed method are much higher than those of the prevailing methods. Thus, the analysis shows that the proposed method is superior to the existing methods.

Fig. 16 Quantitative result analysis of (a) Forest, (b) Cone, and (c) Pumpkin images

4.3 Result analysis of synthetic images

In this section, the results for the City and Gugong synthetic images are shown in Fig. 17, which presents the foggy input images and the resultant defogged images with their estimated transmission maps.

Fig. 17 The result analysis of City and Gugong images: (a) input images, (b) transmission map, and (c) dehazed/defogged images

The quantitative result analysis of the synthetic images, comparing the proposed method with the existing methods [29,30,31] and [24], is shown in Table 3. The parameters used for the analysis are the same as those used for the real-world images, except that the mean square error (MSE) is used instead of the PSNR.

Table 3 Quantitative analysis of (a) Gugong Image (b) City Image

Analysing Table 3, the contrast gain, E-value, and R-value obtained by the existing methods vary, being higher for some images and lower for others, whereas the values of the proposed method are consistently better, which shows that the results are improved. Table 3 also reports the SSIM, MSE, and VCM of the proposed and existing methods for the synthetic Gugong and City images. The SSIM achieved by the proposed method is close to that of the existing methods [29] and [24] for the Gugong image and much higher than that of the existing methods for the City image, while the lower MSE of the proposed method indicates that the observed images are more acceptable. In terms of VCM, for the Gugong image the performance of the existing methods [29] and [30] is very low, and the existing methods [31] and [24] achieve better results than [29] and [30] but lower performance than the proposed method; for the City image, the proposed method also performs best. Thus, the analysis shows that, among all the methods, the recovered results of the proposed method are the most similar to the actual ground truth.

Figure 18 evaluates the efficacy of the proposed and existing methods in terms of computation time. The implementation of the existing method [25] is computationally expensive. Compared to [25], the computation times of [26] and [27] are considerably lower for all images. For [28], the estimated time is lower than that of [25] but higher than those of the other methods. By improving the quality of the images and progressively filtering out the haze-free images through classification, the proposed method achieves the lowest processing time for all images, which indicates its efficiency.

Fig. 18 Computation time analysis

4.4 Performance analysis of classification

In this section, the performance of the proposed CE-DLNN is analysed against existing methods such as the Deep Neural Network (DNN), Neural Network (NN), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Optimized ANFIS (OANFIS). The analysis is done with respect to quality metrics such as sensitivity, specificity, accuracy, precision, recall, F-measure, Negative Predictive Value (NPV), False Positive Rate (FPR), False Negative Rate (FNR), Matthews Correlation Coefficient (MCC), False Detection Rate (FDR), and False Rejection Rate (FRR).

Figure 19 (a-j) analyses the performance of the proposed CE-DLNN and the existing DNN, NN, ANFIS, and OANFIS classifiers with respect to the performance metrics. Sensitivity, specificity, and accuracy are widely used statistics that quantify the model's ability to predict clear and haze/fog images. The F-measure produces a single score that accounts for both precision and recall. NPV is the probability that a subject predicted as negative is a true negative, so higher values are better. MCC measures the quality of the classification, with greater values implying better performance in predicting clear and haze/fog images. Other statistics such as FPR, FRR, FDR, and FNR should be lower for improved performance. Figure 19 (a-j) visually demonstrates the superior performance of the proposed classifier across all quality metrics; its predictive capabilities significantly surpass those of the existing methods, indicating its advanced performance level.

Fig. 19 The performance of proposed and existing classifiers: (a) Sensitivity (b) Specificity (c) Accuracy (d) F-measure (e) NPV (f) FPR (g) FNR (h) MCC (i) FRR (j) FDR

5 Conclusion

An efficient restoration methodology is presented in this paper. The proposed technique uses the novel CE-DLNN method to classify clear and haze/fog images and MSTF to refine the transmission map and produce the defogged/dehazed images. The proposed method consists of six phases. The proposed CE-DLNN and MSTF are compared with existing methods to verify the system's effectiveness. Experimental results prove that the proposed methods outperform the existing methods: the accuracy of the proposed method is improved by 43.68% over the existing methods, and the proposed system reduces the computation time of image processing by filtering out the haze-free images. The analysis also revealed that the proposed method produces better-quality images with fewer saturated pixels and enhanced contrast, and that it is more effective than the existing methods at reconstructing images. In the future, the work can be extended to design a defogging system for live videos.