1 Introduction

Bad weather image restoration is important for a number of real-world applications including video-based car driver assistance systems, autonomous drone navigation, and video surveillance. Haze, a prevalent atmospheric phenomenon, can substantially impair the effectiveness of high-level vision tasks, thereby underscoring the practical significance of a robust, well-generalized dehazing algorithm. A popular simplified model of images degraded by haze [1,2,3] is described by the equation:

$$\begin{aligned} {\textbf {I}}(x)=t(x){\textbf {J}}(x)+{\textbf {A}}\left( 1-t(x) \right) \end{aligned}$$
(1)

Here \({\textbf {I}}(x)\) and \({\textbf {J}}(x)\) are the hazy and clear images, respectively. The term \({\textbf {A}}\) is the global atmospheric light. The transmission map \(t(x) = e^{-\beta d(x)}\) quantifies the portion of the light that reaches the camera, where \(d(x)\) is the scene depth and \(\beta \) is the atmospheric scattering coefficient.
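
For illustration, Eq. 1 directly yields a way to synthesize a hazy image from a clear image and a depth map. The following minimal Python sketch (function and variable names are illustrative, not taken from the paper) applies the model as written:

```python
import numpy as np

def synthesize_haze(J, d, beta=1.0, A=0.8):
    """Apply the atmospheric scattering model of Eq. (1).

    J    : clear image, float array in [0, 1], shape (H, W, 3)
    d    : scene depth map, shape (H, W)
    beta : atmospheric scattering coefficient
    A    : global atmospheric light (scalar or RGB triple)
    """
    t = np.exp(-beta * d)[..., None]       # transmission map t(x) = exp(-beta * d(x))
    I = t * J + np.asarray(A) * (1.0 - t)  # I(x) = t(x) J(x) + A (1 - t(x))
    return np.clip(I, 0.0, 1.0)
```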

Image dehazing is an ill-posed problem. Traditionally, most prior-based algorithms and early learning-based methods have focused on estimating the transmission map \(t(x)\) and global atmospheric light \({\textbf {A}}\) to reconstruct a clear image, as described by Eq. 1. More recently, advanced learning-based approaches have emerged that either directly predict the latent haze-free image or the residuals between haze-free and hazy images, thereby enhancing performance.

This paper introduces a novel deep learning method for the image dehazing problem. The approach follows the learning strategy of the Zero-DCE low-light image enhancement method [4]: it is based on designing a multi-term no-reference loss function that distinguishes haze-free images from hazy ones. It doesn’t require paired hazy and haze-free images of the same scenes and employs a lightweight neural network. As a result, in addition to haze suppression and image clarification, it achieves a remarkably high image processing speed. The source code is available at https://github.com/Hongyi311/Fast-No-Reference-Deep-Dehaze.

1.1 Related work

As mentioned earlier, most dehazing methods can be divided into two categories. The first is based on physical modeling of light scattering [5,6,7] and hand-crafted image priors. Most of these methods leverage priors to estimate the transmission map \(t(x)\) and atmospheric light \({\textbf {A}}\), subsequently obtaining the restored dehazed image \({\textbf {J}}(x)\). The dark channel prior (DCP) [1] is the pioneering work in this field, and several improved methods based on DCP have since emerged [8,9,10]. Further advances include the Color Attenuation Prior [11], Gradient Channel Prior [12], and Region Line Prior [13], as well as more recent innovations such as the Saturation Line Prior [14], Rank-One Prior [3], and Region Gradient Constraint Prior [15], which have further enhanced the performance of prior-based methods. These priors are typically derived from statistical characteristics of haze-free and hazy images and offer commendable interpretability and generalizability. However, their effectiveness may diminish across images from varied scenarios.

The other family of methods exploits deep learning, aiming to restore hazy images with well-trained neural network models. Cai et al. [2] first proposed a CNN-based model named DehazeNet to estimate the transmission map. Then Li et al. [16] proposed AODNet, which reformulated Eq. 1 to obtain recovered images in an end-to-end manner. Afterward, more sophisticated models emerged, incorporating attention mechanisms [17,18,19,20] and transformer-based architectures [21, 22], which provided significant improvements in dehazing performance. However, such methods cannot overcome the reliance on paired clear images and synthesized hazy images as training data. Real-world hazy scenes can exhibit a wide range of variability in terms of atmospheric conditions, lighting, scene content, etc. This poses an ongoing challenge for the model’s generalization to real-world images. Although generative adversarial network (GAN) based methods [23, 24] have been proposed to eliminate the need for paired images, they still require careful selection of training data and usually incur significant training costs. UCL-dehaze [25] and C2PNet [26] introduced novel unsupervised contrastive learning approaches to dehazing, but their models are extremely large and computationally intensive.

Recent progress in using deep learning models for image dehazing includes combining extended haze models with sophisticated neural network architectures [27] and employing novel learning strategies [28,29,30]. Finally, it is worth mentioning recent works [31, 32] where multi-scale edge-aware image filters are proposed for image dehazing purposes.

Since dehazing algorithms often serve as data pre-processing for high-level tasks, real-time performance is one of the key objectives. Among recent prior-based algorithms, Rank-One Prior [3] introduced GPU acceleration, achieving a processing time of 0.04 s per image for small images (below 720p), while Region Gradient Constraint Prior [15] employs a parallel sliding-window strategy, achieving impressive processing times of 0.004 s for a 512 \(\times \) 512 image. Deep learning-based algorithms primarily gain speed by reducing model size and FLOPs. With GPU acceleration, TOENet [19] can process a 2K hazy image in only 0.006 s, and TSDNet [20] separates the dehazing task into three simple stages, achieving a processing speed of over 30 FPS with better quality. It is difficult to decide which of these methods is the fastest, as different hardware and programming languages are used in the above-mentioned works. However, it is safe to assert that the proposed method, achieving a processing speed of 1K FPS for 2K-resolution images, is at least four times faster than the previously fastest method.

1.2 Contribution

In this paper, a zero-reference deep learning dehazing network is proposed. Inspired by Zero-DCE [4], the network does not rely on paired training data. The model’s training is driven by a set of specially designed loss functions that evaluate the quality of the dehazed images. Additionally, rather than recovering clear images from hazy ones via the physical model in Eq. 1, the approach of Zero-DCE is followed, applying high-order curves for pixel-wise adjustment of the dynamic range of the hazy image. Such a design enhances the model’s generalization on real-world hazy images while simultaneously improving the recovery of details and brightness across various scenarios. Overall, the contributions of this study are as follows:

  • The first image dehazing network that requires neither paired nor unpaired reference data was developed, thereby improving the model’s generalization on real-world hazy images.

  • High-order curves were employed for pixel-wise dynamic range adjustment rather than relying on physical models to recover images. This demonstrates the feasibility of using high-order curve adjustments for image dehazing.

  • A comparison was made between the proposed method and several recently proposed image dehazing and clarification techniques. While it may not currently be classified as state-of-the-art (SOTA) in terms of natural image restoration, it performs exceptionally well in terms of fine image detail restoration and brightness enhancement. More importantly, the proposed method significantly outperforms others in terms of processing speed, making it highly efficient for real-time processing.

2 Proposed no-ref deep image dehazing ANN

In this section, the details of the proposed Fast No-reference Deep Image Dehazing network (FaNDID) are introduced, including the curve adjustment strategy, the structure of the network, and the zero-reference loss functions.

2.1 Network structure

Following the definition of the initial quadratic curve for image enhancement in Zero-DCE [4], our initial curve takes the following form:

$$\begin{aligned} D(x) =I(x)+\alpha (x)I(x)\left( 1-I(x) \right) \end{aligned}$$
(2)

where \(D(x)\) is the dehazed result for the input hazy image \(I(x)\), and \(\alpha (x)\in [0,1]\) is a curve parameter map of the same size as the input image, so that each pixel has its own adjustment curve. Iteratively applying this formula then gives the high-order curve:

$$\begin{aligned} D_n(x)=D_{n-1}(x)+\alpha _n(x)D_{n-1}(x)\left( 1-D_{n-1}(x) \right) \end{aligned}$$
(3)

where \(n\) is the number of iterations; in this work, \(n\) is set to 8. This high-order curve enhances the adjustability of the dynamic range while retaining a simple, differentiable form.
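
As a concrete illustration, the curve iteration of Eqs. 2 and 3 reduces to a short loop. The following PyTorch sketch (tensor shapes and names are assumptions, not the authors’ released code) applies a sequence of parameter maps to an input image:

```python
import torch

def apply_curves(I, alphas):
    """Iteratively apply the pixel-wise adjustment curve of Eqs. (2)-(3).

    I      : hazy image tensor of shape (B, 3, H, W), values in [0, 1]
    alphas : sequence of n parameter maps, each of shape (B, 3, H, W); n = 8 in the paper
    """
    D = I
    for alpha in alphas:
        D = D + alpha * D * (1.0 - D)  # quadratic curve applied to the previous result
    return D
```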

The zero-reference dehazing network takes a hazy image as input and outputs the parameter maps for the curve iterations. The architecture follows the design of Zero-DCE [4] and, as depicted in Fig. 1, consists of seven plain convolutional layers. Each of the first six layers employs 32 kernels of size 3 \(\times \) 3 with stride 1 and the ReLU activation function. The intermediate feature maps are symmetrically connected via skip concatenation. For a regular RGB image, eight iterations of the high-order curve in Eq. 3 are performed for each pixel in all three channels. Therefore, the final layer comprises 24 kernels, with the tanh activation function applied to limit the output range; it generates eight curve parameter maps for each color channel for the iterative processing. The network is lightweight, with only 79K trainable parameters.
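
A minimal PyTorch sketch of such a curve-estimation network is given below; the class and layer names are illustrative, but the layer widths follow the description above (seven 3 \(\times \) 3 convolutions, 32 channels, symmetric skip concatenations, 24 output maps), which indeed amounts to roughly 79K parameters. Its chunked outputs can be fed directly to the curve-application loop sketched earlier:

```python
import torch
import torch.nn as nn

class CurveEstimationNet(nn.Module):
    """Seven plain 3x3 convolutions with symmetric skip concatenations,
    producing 24 curve parameter maps (8 iterations x 3 color channels)."""

    def __init__(self, ch=32):
        super().__init__()
        conv = lambda c_in, c_out: nn.Conv2d(c_in, c_out, 3, stride=1, padding=1)
        self.c1, self.c2, self.c3, self.c4 = conv(3, ch), conv(ch, ch), conv(ch, ch), conv(ch, ch)
        self.c5, self.c6 = conv(2 * ch, ch), conv(2 * ch, ch)  # operate on concatenated features
        self.c7 = conv(2 * ch, 24)                             # 24 curve parameter maps
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        f1 = self.act(self.c1(x))
        f2 = self.act(self.c2(f1))
        f3 = self.act(self.c3(f2))
        f4 = self.act(self.c4(f3))
        f5 = self.act(self.c5(torch.cat([f3, f4], dim=1)))     # symmetric skip concatenation
        f6 = self.act(self.c6(torch.cat([f2, f5], dim=1)))
        out = torch.tanh(self.c7(torch.cat([f1, f6], dim=1)))  # tanh limits the output range
        return torch.chunk(out, 8, dim=1)                      # eight (B, 3, H, W) parameter maps
```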

Fig. 1
figure 1

Illustration of the structure representation and workflow of the network

2.2 No-reference loss function

The loss function plays a pivotal role in zero-reference learning. Since our network is inspired by Zero-DCE, which specializes in enhancing low-light images, a naive approach would be to invert the hazy image \({\textbf {I}}(x)\), treat \({\textbf {1}}-{\textbf {I}}(x)\) as a low-light image, enhance its brightness, and invert the result once more. As demonstrated in Fig. 2, this method yields unsatisfactory outcomes. However, the loss function of Zero-DCE has proven crucial for balancing image brightness and color. Therefore, in addition to the four loss functions originally proposed in Zero-DCE, four additional functions are incorporated into the network to assess the quality of the enhanced images, thereby driving improvements in dehazing performance. To ensure efficient training, all of these loss functions are differentiable and have relatively low computational complexity.

Fig. 2
figure 2

A naive approach to implement Zero-DCE on inverted hazy images. The right image in each pair is the final inverted low-light enhanced result

Dark Channel Loss The dark channel is defined as the minimum intensity over all pixels and color channels within a local patch [1]:

$$\begin{aligned} D(x) = \underset{y\in \Omega _r(x)}{\min }\ \left( \underset{c\in \{r,g,b\}}{\min } I^{c}(y) \right) \end{aligned}$$
(4)

where \(\Omega _r(x)\) denotes a local patch of size \(r\times r\) centered at \(x\). Although the prior fails in the sky and in regions with bright colors, its simplicity and effectiveness make it a viable choice as one of the loss functions for measuring the degree of dehazing in the enhanced images. Computing the dark channel involves a sliding-window process that can significantly reduce computational speed. An effective alternative is to divide the image into non-overlapping blocks of the same size as the patches, compute the minimum pixel value in each block, and assign that value to all pixels of the block. In this paper, \(r=8\) is used. In addition, blocks whose dark channel value exceeds 0.7 are discarded, as such values usually correspond to the sky region. The loss function can then be written as:

$$\begin{aligned} L_{dc}=\frac{W_1}{M} \sum _{i=1}^{M}\underset{x\in \Omega _i}{\min }\left( \underset{c\in \{r,g,b\}}{\min } Y^{c}(x) \right) \end{aligned}$$
(5)

where \(\Omega _i\) denotes the \(i\)-th block of the dehazed image \(Y\), \(M\) is the number of blocks remaining after the thresholding, and \(W_1\) is the weight. \(L_{dc}\) provides an efficient approximation of the dark channel of the dehazed image.
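
A possible PyTorch realization of this block-wise approximation is sketched below; implementing the non-overlapping block minimum via min-pooling and the exact handling of the discarded sky blocks are assumptions, not the authors’ code:

```python
import torch
import torch.nn.functional as F

def dark_channel_loss(dehazed, patch=8, sky_thresh=0.7, w1=0.2):
    """Block-wise dark channel loss of Eq. (5) for a dehazed batch of shape (B, 3, H, W)."""
    dark = dehazed.min(dim=1, keepdim=True).values                # per-pixel minimum over RGB
    dark = -F.max_pool2d(-dark, kernel_size=patch, stride=patch)  # minimum over each non-overlapping block
    keep = dark <= sky_thresh                                     # discard likely-sky blocks
    if keep.sum() == 0:
        return dehazed.new_zeros(())
    return w1 * dark[keep].mean()                                 # average dark channel of remaining blocks
```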

Contrast Loss Image contrast is related to the total variation of the image, i.e., the sum of its gradient magnitudes:

$$\begin{aligned} C(I)=\sum _{x} \left| \nabla I(x) \right| \end{aligned}$$
(6)

where \(\nabla \) denotes the gradient operator. Multiple experiments have shown that using both the first and second derivatives of an image enhances its edges. However, the former may lead to local overexposure and darkening, while the latter refines texture and brightens the image. Therefore, both are incorporated into a contrast-enhancing loss function \(L_{ce}\), with weights assigned to balance their effects:

$$\begin{aligned} L_{ce} = -W_2\ln \left[ \frac{1}{N}\! \sum _x\left| \nabla I(x)\right| \right] -W_3\ln \left[ \frac{1}{N}\!\sum _x\left| \Delta I(x)\right| \right] \end{aligned}$$
(7)

where \(N\) is the number of pixels of the image \({\textbf {I}}\), \(\Delta \) stands for the \(3\times 3\) discrete Laplacian, and \(W_2\) and \(W_3\) are two weights balancing the influence of the \(\nabla \) and \(\Delta \) components in the contrast loss function. Taking the negative natural logarithm means that minimizing the loss increases the average gradient and Laplacian magnitudes, thereby enhancing contrast and texture details.
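
The contrast term of Eq. 7 might be implemented along the following lines; the finite-difference gradient and the 3 \(\times \) 3 Laplacian kernel are standard discretizations assumed here, not necessarily the authors’ exact choices:

```python
import torch
import torch.nn.functional as F

def contrast_loss(img, w2=0.3, w3=0.5, eps=1e-6):
    """Contrast loss of Eq. (7): negative log of the mean |gradient| and mean |Laplacian|."""
    dx = img[..., :, 1:] - img[..., :, :-1]             # horizontal finite differences
    dy = img[..., 1:, :] - img[..., :-1, :]             # vertical finite differences
    grad_mean = 0.5 * (dx.abs().mean() + dy.abs().mean())

    lap_kernel = img.new_tensor([[0., 1., 0.],
                                 [1., -4., 1.],
                                 [0., 1., 0.]]).view(1, 1, 3, 3)
    lap = F.conv2d(img.reshape(-1, 1, *img.shape[-2:]), lap_kernel, padding=1)
    lap_mean = lap.abs().mean()

    return -w2 * torch.log(grad_mean + eps) - w3 * torch.log(lap_mean + eps)
```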

Hue Loss Hue disparity has been employed in [33] for detecting hazy areas. It is defined as the difference in the hue channel between the original image \(I\) and the semi-inverse image \(I_{si}\):

$$\begin{aligned} H(x)=\left| I^{h}(x)- I_{si}^{h}(x)\right| \end{aligned}$$
(8)

where \(h\) denotes the hue channel and \(I_{si}\) is the pixel-wise maximum of the original image and its inverse:

$$\begin{aligned} I_{si}(x)=\underset{c\in \{r,g,b\}}{\max }\left[ I^{c}(x),\,1-I^{c}(x) \right] \end{aligned}$$
(9)

Due to the high pixel intensities in regions heavily affected by haze, the semi-inverse image remains identical to the original one in these regions, resulting in low hue disparity. This concept can be used as a loss function. Theoretically, this function can reduce the pixel values of hazy regions, but since it involves a comparison of hue channels, it also has an impact on the hue value of an image.

However, extensive validation with synthesized hazy and haze-free images, as well as the results obtained with state-of-the-art dehazing methods on real-world hazy images, shows that while the brightness and saturation of hazy images change significantly compared to clear or dehazed images, the variation in hue is usually smaller, especially in areas with milder haze or in the presence of vividly colored objects within dense haze. Therefore, the \(L_{hue}\) loss component is designed to predominantly affect image brightness and saturation while constraining its effect on hue:

$$\begin{aligned} L_{hue}=-W_4\ln \!\left[ \frac{1}{N}\!\sum _{x}\left| I^{h}(x)- I_{si}^{h}(x)\right| \right] +\frac{W_5}{N}\!\sum _{x}\left| I^{h}(x)-Y^{h}(x)\right| \end{aligned}$$
(10)

where \(Y\) is the dehazed image, and \(W_4\) and \(W_5\) are two positive weights.
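
A sketch of Eq. 10 in PyTorch is shown below. It assumes a differentiable RGB-to-HSV conversion (here kornia.color.rgb_to_hsv) and evaluates the hue-disparity term on the dehazed image, which is the reading under which this term can influence training; both choices are assumptions rather than details taken from the paper:

```python
import torch
from kornia.color import rgb_to_hsv  # differentiable RGB -> HSV conversion

def hue_loss(hazy, dehazed, w4=0.1, w5=5.0, eps=1e-6):
    """Hue loss of Eq. (10) for batches of shape (B, 3, H, W) in [0, 1]."""
    semi_inv = torch.maximum(dehazed, 1.0 - dehazed)  # semi-inverse image, Eq. (9)
    hue_d = rgb_to_hsv(dehazed)[:, 0:1]               # hue channel of the dehazed image
    hue_si = rgb_to_hsv(semi_inv)[:, 0:1]             # hue channel of its semi-inverse
    hue_h = rgb_to_hsv(hazy)[:, 0:1]                  # hue channel of the hazy input

    disparity = (hue_d - hue_si).abs().mean()         # large disparity indicates less haze
    preservation = (hue_h - hue_d).abs().mean()       # keep the hue close to the input
    return -w4 * torch.log(disparity + eps) + w5 * preservation
```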

Brightness Loss To ensure that bright regions do not become excessively dark, an additional function is introduced. It compares, for the original image and the enhanced image, the numbers of pixels \(B_{org}\) and \(B_{en}\), respectively, whose intensity exceeds 0.7, normalized by the total number of pixels \(N\):

$$\begin{aligned} L_{bright}=W_6\frac{B_{org}-B_{en}}{N} \end{aligned}$$
(11)
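
A hard pixel count is not differentiable, so the sketch below replaces it with a soft, sigmoid-based count; this relaxation is an assumption about how such a term could be implemented, not a detail from the paper:

```python
import torch

def brightness_loss(hazy, dehazed, thresh=0.7, w6=5.0, sharpness=50.0):
    """Brightness loss of Eq. (11): compares the fractions of bright pixels
    (intensity > 0.7) in the hazy input and the dehazed result."""
    def bright_fraction(img):
        # Soft, differentiable count of pixels above the threshold, averaged over all pixels.
        return torch.sigmoid(sharpness * (img - thresh)).mean()
    return w6 * (bright_fraction(hazy) - bright_fraction(dehazed))
```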

Finally, the total loss is the sum of all components:

$$\begin{aligned} L_{total}=L_{dce}+L_{dc}+L_{ce}+L_{hue}+L_{bright} \end{aligned}$$
(12)

where \(L_{dce}\) is the original loss function from Zero-DCE [4].

3 Experiments

3.1 Implementation details

Dataset. RESIDE-\(\beta \) [34] provides a large number of synthetic images with varying degrees of haze. From Part 1, 2000 images with different levels of haze were selected for training. Then, 500 outdoor images from Part 4 and 500 indoor images from SOTS (another subset) were used for testing. Additionally, to evaluate the dehazing performance on real-world hazy images, the testing set also included 1000 images from RTTS and 10 images from HSTS, both subsets of RESIDE-\(\beta \) [34], as well as 100 images from RUSH [3].

After numerous experiments, the following parameter settings were adopted, yielding the best visual results: \(W_1=0.2, W_2=0.3, W_3=0.5, W_4=0.1, W_5= 5, W_6=5\). The network was optimized with the ADAM optimizer using default parameters and a fixed learning rate of \(10^{-4}\). The model was trained and tested on a laptop equipped with a 13th Gen Intel(R) Core(TM) i9-13900HX CPU \(@\) 2.20 GHz with 32 GB RAM and an Nvidia GeForce RTX 4060 GPU with 8 GB VRAM.

Fig. 3
figure 3

Ablation study of the contribution of each component in the loss function. Top row: hazy images. Second row: dehazed results when full weights applied. Bottom row: from left to right are results when a \(W_1=0\), b \(W_2=0\), c \(W_3=0\), d \(W_4=0\), e \(W_5=0\), f \(W_6=0\), respectively

3.2 Ablation study

3.2.1 Loss functions

Figure 3 illustrates the results of removing each loss function component. When \(L_{dc}\) is removed, some details within bright white backgrounds are lost, as \(L_{dc}\) originally serves to reduce the brightness in these areas, revealing details. Removing the first-order derivative component of \(L_{ce}\) leads to an overall brighter image but reduced contrast. On the other hand, \(W_3=0\) makes the image much darker and loses numerous details. When the first part of \(L_{hue}\) is omitted, the colors in regions close to the camera become lighter, leading to an unclear appearance. When the second part restricting the hue is removed, the image undergoes severe color distortion. Finally, if \(L_{bright}\) is removed, the sky region becomes too dark. Therefore, each component of the loss function plays a crucial role in the visual quality of the image.

3.2.2 Number of layers in the network

Currently, the network utilizes 7 convolutional layers, resulting in rather high-order curves for adjusting the dynamic range of the image. Further testing was conducted with fewer layers, specifically 5 and 3 layers (since the network uses symmetric skip concatenation, it is preferable to reduce two layers at a time), and the dehazed results are shown in Fig. 4. Reducing the number of layers can further improve training speeds and alleviate the oversharpening issues. However, due to the reduction in the order of the adjustment curves, the dehazing capability of the network is weakened, performing poorly in those regions with heavy haze. In contrast, when the number of layers is 7, the details of distant buildings are relatively clear.

Fig. 4
figure 4

Ablation study on the number of network layers. a Hazy image. b 3 layers. c 5 layers. d 7 layers

3.2.3 Dark channel patch size and sky threshold

When calculating the dark channel loss, images are segmented into non-overlapping 8 \(\times \) 8 patches, and the dark channel values of the patches are averaged after threshold filtering. Therefore, the patch size not only affects the dehazing details but also influences the overall color adjustment, since it changes the number of patches. As shown in Fig. 5, with a 16 \(\times \) 16 patch size, more details may be obscured by the dark channel values; moreover, fewer patches lead to a higher average loss value, making the dehazed image appear overly dim or color-distorted, with white objects adjusted toward the ambient color tone. On the other hand, reducing the patch size to 4 \(\times \) 4 increases the number of patches, resulting in a lower initial loss value but poorer dehazing performance, with slightly sharpened images and deeper colors. The dark channels obtained with different patch sizes do not vary significantly in corresponding regions; hence, it is the number of patches that significantly affects the initial loss value and thus the dehazing results.

Fig. 5
figure 5

Ablation study on the patch size in dark channel loss. a is the hazy image. The patch size in b is \(4\times 4\), in c is \(8\times 8\), and in d is \(16\times 16\)

Additionally, the sky threshold is used in both the dark channel loss and brightness loss, serving to identify pixels or patches attributed to the sky region and thereby preserving its color integrity. Extensive testing established 0.7 as the optimal threshold. A lower threshold, such as 0.6, tends to obscure details in heavily hazed areas, while a higher threshold of 0.8 often results in images with darkened colors and reduced contrast, see Fig. 6. This effect is particularly pronounced in images where the sky appears grayish-white. In such cases, the pixel values in the sky region fall below the threshold, causing them to be treated as dense haze areas, which leads to excessive reduction in pixel values. Generally, a threshold of 0.7 is effective in preserving the natural color of the sky across various image characteristics.

Fig. 6
figure 6

Ablation study on the value of sky threshold. The value in a, d is 0.6, in b, e is 0.7, and in c, f is 0.8. The corresponding hazy images can be found in Fig. 3. Zoom in for image details

Fig. 7
figure 7

Visual comparisons on outdoor synthetic hazy images from RESIDE-\(\beta \). Images in a are hazy images, dehazed by b SLP, c ROP, d ROP+, e RGCP, f TOENet, g TSDNet, h DehazeFormer, i C2PNet, and j the proposed FaNDID, respectively. k is GT. Zoom in for image details

Fig. 8
figure 8

Visual comparisons on indoor synthetic hazy images from RESIDE-SOTS. Images in a are hazy images, dehazed by b SLP, c ROP, d ROP+, e RGCP, f TOENet, g TSDNet, h DehazeFormer, i C2PNet, and j the proposed FaNDID, respectively. k is GT. Zoom in for image details

Fig. 9
figure 9

Visual comparisons on real-world hazy images from RESIDE-RTTS. Images in a are hazy images, dehazed by b SLP, c ROP, d ROP+, e RGCP, f TOENet, g TSDNet, h DehazeFormer, i C2PNet and j the proposed FaNDID, respectively. Zoom in for image details

Fig. 10
figure 10

Visual comparisons on real-world hazy images from RESIDE-HSTS. Images in a are hazy images, dehazed by b SLP, c ROP, d ROP+, e RGCP, f TOENet, g TSDNet, h DehazeFormer, i C2PNet and j the proposed FaNDID, respectively. Zoom in for image details

3.3 Results and discussion

3.3.1 Qualitative comparison

The dehazing performance of the proposed FaNDID on both synthetic and real-world hazy images was compared with those of state-of-the-art dehazing algorithms and models introduced in recent years. The prior-based methods include: Saturation Line Prior (SLP) [14], Rank One Prior (ROP and ROP+) [3], and Region Gradient Constrained Prior (RGCP) [15]. In terms of learning-based methods, comparisons were made with TOENet [19], TSDNet [20], DehazeFormer [22] and C2PNet [26].

Figures 7 and 8 illustrate the dehazing effects on outdoor synthetic hazy images from RESIDE-\(\beta \) [34] and indoor images from RESIDE-SOTS [34], respectively. It is evident that most learning-based methods, trained with paired data to restore images to their original haze-free state, tend to achieve results that are closer to the ground truth compared to prior-based methods. Conversely, FaNDID employs an unpaired data training strategy, focusing on enhancing visual aspects such as brightness, contrast, and detail recovery. Despite some issues with oversaturation and sharpening, FaNDID’s dehazing performance generally surpasses that of many prior-based methods.

Fig. 11
figure 11

Visual comparisons on real-world hazy images from RUSH. Images in a are hazy images, dehazed by b SLP, c ROP, d ROP+, e RGCP, f TOENet, g TSDNet, h DehazeFormer, i C2PNet and j the proposed FaNDID, respectively. Zoom in for image details

However, synthetic hazy images usually differ significantly from real-world conditions due to the complex and varying nature of real-world haze, including factors like light scattering, particle size, and distribution. Therefore, the dehazing performance and generalization on real-world hazy images should receive more attention. FaNDID was tested on three real hazy image datasets: RESIDE-RTTS and RESIDE-HSTS (the realistic images subset), both from the work of Li et al. [34], and RUSH [3]. The results are shown in Figs. 9, 10 and 11, respectively.

Observations indicate that among the evaluated methods, SLP [14] and the learning-based methods [19, 20, 22, 26] generally yield more natural-looking images. However, SLP and TOENet reduce image brightness significantly and introduce black shadows. Although TSDNet, DehazeFormer, and C2PNet perform better at dehazing without introducing black shadows, they do not always achieve good dehazing effects, especially in low-light or heavy-fog scenarios; Fig. 10 is an example of this. On the other hand, ROP+ [35] and RGCP [15] excel in both dehazing efficiency and brightness enhancement. However, they also occasionally struggle to achieve complete haze removal and may destroy some details in the image; ROP [35], the precursor to ROP+, exhibits this limitation more distinctly. Notably, in the second row of Fig. 9, RGCP compromises the vibrancy of the colors of the pedestrians’ attire on the bridge, while in the third row, ROP+ tends to overemphasize the light sources, thereby masking the details of the surrounding environment. Similarly, in Fig. 11, these three methods compromise the facial details of the motorcyclist in the first image, and due to excessive brightness enhancement, the background details of the forest in the third image are obscured.

Therefore, despite the proposed FaNDID introducing a mild degree of oversaturation and halo artifacts, it effectively maintains a harmonious equilibrium between image luminosity and detail preservation, offering a commendable solution in the context of dehazing challenges.

3.3.2 Quantitative comparison

Table 1 presents the full-reference metrics, i.e., PSNR, SSIM, and multiscale SSIM, for the dehazed results across 500 outdoor synthetic hazy images from the RESIDE-\(\beta \) dataset. As the qualitative analysis indicates, learning-based algorithms, which are trained with paired data, generally excel in restoring images closer to their original state, thereby achieving higher metric scores. Conversely, prior-based methods, which rely on analyzing the statistical characteristics of images for dehazing, often do not perform as well on synthetic datasets due to the discrepancy between synthetic and real hazy images. FaNDID manages to bridge the gap between learning-based and prior-based methods. It effectively reduces haze and achieves PSNR and SSIM scores higher than those of most prior-based methods, while its multiscale SSIM score is also competitive. This indicates good performance of the proposed method in preserving details.

Tables 2 and 3 report the no-reference image quality metrics NIQE [36], PIQE [37], and BRISQUE [38] used to evaluate the quality of the dehazed real-world images. Here, FaNDID outperforms most algorithms in terms of PIQE, while its performance in NIQE and BRISQUE is comparatively average, mainly because the current results sometimes exhibit over-sharpening and over-saturation. However, these metrics should be considered indicative rather than definitive assessments of dehazing quality and generalizability.

Table 1 The dehazed image quality assessment of 500 outdoor synthetic hazy images in RESIDE-\(\beta \)
Table 2 The dehazed image quality assessment of 1000 hazy images in RESIDE-RTTS
Table 3 The dehazed image quality assessment of 100 hazy images in RUSH
Table 4 Run-time (s) performance
Fig. 12
figure 12

Sometimes our method may require a color amendment. a Hazy images. b Our method is applied. c ROP [35] nicely restores image colors but preserves a significant amount of haze. d Inter-image color transfer [39] is applied with the ROP images used as color donors

On the other hand, it is worth noting that the proposed method is exceptionally fast when utilizing GPU for acceleration, and the image size has almost no impact on processing speed. As listed in Table 4, it can achieve processing speeds of around 1000 FPS, making it highly suitable for real-time video dehazing. Compared to other standard deep learning methods that are trained on paired data, the model in this work is particularly lightweight, with only 79K trainable parameters, and its FLOPs are also highly competitive.

4 Limitations and future work

The proposed method is not free of drawbacks, as it may produce oversharpening effects, including halo artifacts. This is also why it has not achieved state-of-the-art (SOTA) status across all quality measures. The model aims to both dehaze and enhance the image, revealing many details in low-light conditions. However, this also results in higher overall image brightness, making the image appear less natural; this issue necessitates further optimization of the loss function components in future work. In addition, as illustrated in Fig. 12, images whose backgrounds have a pronounced color cast cause the network to accentuate these colors further. This can be slightly alleviated by incorporating the ROP method [35] through an inter-image color transfer scheme [39]: Fig. 12 shows the results of applying the color transfer with the ROP outputs used as the source.

Currently, the loss function used in this model is rather complicated, but the authors have not found a simpler loss function that delivers similar or better restoration quality. Simplifying it will be a primary focus of future efforts. Another possible direction for future work is learning the loss function from examples.

A lightweight network with relatively shallow depth and a simple architecture is used in this study, as seen in Fig. 1. Employing a lightweight network with a more sophisticated architecture might lead to better image restoration performance without sacrificing computational speed.

5 Conclusion

This paper introduces a zero-reference deep dehazing network that does not rely on paired images for training. It leverages several specially designed loss functions to evaluate the quality of dehazed images, thereby driving the training process. Although the method still exhibits some limitations, it outperforms other state-of-the-art methods in brightness enhancement and detail preservation. Furthermore, the network is lightweight and exceptionally fast, making it highly suitable for video dehazing applications. Promising future work includes video dehazing, which requires adding temporal coherence terms to the loss function.