1 Introduction

Existing image dehazing methods based on deep learning (IDDL) mainly address single image dehazing and use human vision as the image evaluation criterion. However, computer vision has become an active research topic in many fields and has been widely used in many industries, including intelligent driving environment perception [14], medical image segmentation [25], human-computer interaction [31], and robot vision [28]. Moreover, computer vision applications need all-weather working capability to ensure their safety and reliability. Since adverse conditions are unavoidable in real outdoor scenarios, stable and reliable methods for handling them are urgently required, and corresponding research solutions should be provided quickly. This research focuses on hazy scenes in computer vision.

Deep learning-based methods have achieved considerable success in single image dehazing in recent years, so it is worth considering applying dehazing methods to the field of computer vision. A successful application of this scheme would also stimulate research on other adverse scenarios, such as rain, snow, and night. Thus, the portability of the IDDL should be analyzed, because it can determine how dehazing research for computer vision will evolve in the future: whether the IDDL based on human vision or a new scheme designed for computer vision will be used. However, there has been a lack of reasonable evaluation and of suggestions for the development of dehazing methods in computer vision scenes.

Considering the variability and complexity of natural hazy conditions, different hazy images can affect the dehazing effect of the IDDL, which in turn directly affects the detection effect of computer vision. To address these problems, we design a comparison that covers computer vision detection accuracy (object detection and semantic segmentation), model scale, and real-time performance.

To measure the degree to which dehazing methods improve object detection and semantic segmentation, we also train the detection networks with a synthetic hazy dataset for comparison. Real-world hazy datasets are greatly limited due to the difficulty of obtaining hazy images and their annotations, and using a synthetic fog dataset to replace the real hazy dataset is a rational and scientific solution [4]. The generalization and robustness of detection networks can be improved by using image dataset-based methods [24]. Additionally, synthesis approaches based on the standard optical model have been developed [19, 20], and we select various public datasets to synthesize hazy datasets to avoid the contingency error of a single dataset.

To provide a reasonable and reliable conclusion, this study performs a series of comparative experiments that provide evidence on portability. The main contributions of the paper can be summarized as follows:

  • We compare the performance of the original detection network, the detection network trained on the synthetic hazy dataset, and the detection network combined with different IDDL in a real hazy environment.

  • We comprehensively evaluate image dehazing methods from three aspects: the performance improvement of the detection network, the model scale, and the running time.

  • We propose an appropriate proportion of synthetic hazy images in the training dataset according to the object detection and semantic segmentation results for different synthetic image ratios.

2 Related Work

Image Dehazing Methods. Since a hazy environment blurs and degrades visual image information, a large number of studies on dehazing algorithms have been conducted. These algorithms can be roughly categorized into three types: prior-based methods [18, 34], fusion-based methods [6, 29], and deep learning-based methods. Studies on dehazing algorithms have also considered different dehazing scenes [23, 26]. Currently, the IDDL achieve a better dehazing effect than the other two types, so they have become a research hotspot in the domain of image dehazing. This paper selects six representative advanced dehazing networks as validation networks. Qin et al. proposed an end-to-end feature fusion attention network (FFA-Net) to effectively restore the image information of a single image [21]. Dong et al. proposed a multiscale boosted dehazing network, which uses boosting and error feedback modules in an encoder-decoder structure to restore the haze-free image progressively [7]. Zhang et al. designed an end-to-end gated context aggregation network that removes gridding artifacts and fuses features from different levels, thus directly restoring the final haze-free image [2]. Yu et al. developed a fully end-to-end generative adversarial network with a fusion-discriminator for image dehazing, which uses frequency information as additional priors [8]. Xiao et al. introduced a trainable convolutional neural network (Grid-Net), which effectively solves the bottleneck problems of traditional multiscale estimation methods and can reduce dehazing work [17]. Chen et al. proposed the patch-map-based hybrid learning dehazed net (PMHLD), which combines a hybrid learning technique involving the patch map with a bi-attentive generative adversarial network to achieve better reconstruction [3].

Detection Networks. Object detection and semantic segmentation algorithms for computer vision (CV) scenes have been extensively studied. Numerous mature network models achieve real-time detection and high accuracy in many application scenarios [10, 16]. Object detection algorithms can be divided into two categories [30]: algorithms based on target candidate regions and algorithms based on regression. Semantic segmentation algorithms can also be divided into two categories [32]: algorithms based on region classification and algorithms based on pixel classification. To ensure the reliability and rationality of the comparison, we select one representative method from each category. Faster RCNN [22] and YOLOv5 [13] are selected as representatives of the two-stage and one-stage object detection algorithms, and SegNet [1] and Mask R-CNN [11] are selected as representatives of the segmentation algorithms based on pixel classification and region classification, respectively, to evaluate the effect of the dehazing methods.

Evaluation Indexes. Image quality evaluation methods for dehazing can be categorized into two groups: subjective methods and objective methods. Subjective image quality evaluation uses human visual perception as the evaluation standard [33]. Objective image quality evaluation obtains its results from evaluation indexes such as PSNR [12] and SSIM [27]. However, judging image quality by human vision is different from judging it for computer vision. Thus, the evaluation indicators of object detection and semantic segmentation algorithms should be used to evaluate the quality of dehazed images [30]. In this study, mAP@.5 and mIoU are used as quantitative indicators of object detection and semantic segmentation, respectively.
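As a minimal illustration of the two indicators, the sketch below computes the box IoU that underlies the 0.5 matching threshold in mAP@.5 and a simple mIoU over label maps. The function names, the (x1, y1, x2, y2) box layout, and the choice to skip classes absent from both maps are assumptions made for this example, not the evaluation code used in the study. The precision-recall averaging that completes mAP is not shown.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in either label map."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skipped (an assumption)
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy label maps with two classes (0 and 1).
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
miou = mean_iou(pred, target, num_classes=2)

# Under mAP@.5, a detection matches a ground-truth box when IoU >= 0.5.
is_match = box_iou((0, 0, 2, 2), (0, 0, 2, 1)) >= 0.5
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mIoU is their mean.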

3 Methods

3.1 Dataset

To ensure the rationality of the study, we select several public datasets: PASCAL VOC 2007 and 2012, Microsoft COCO 2017, and RTTS. PASCAL VOC [9] and Microsoft COCO [5] are used for training and validation, and RTTS [15] is used as the testing dataset, which ensures the diversity and scale differences of the images and the scientific objectivity of the experimental results.

To ensure a realistic evaluation of the semantic segmentation scenario, we provide semantic segmentation annotations for the RTTS dataset using the annotation tool LabelMe, covering the person and car classes. In addition, the PASCAL VOC and Microsoft COCO datasets are randomly divided into a training dataset (80%) and a validation dataset (20%).
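A random 80%/20% division of this kind could be sketched as follows. The helper name `split_dataset`, the fixed seed, and the file names are hypothetical and serve only to make the example reproducible.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into training and validation lists."""
    items = list(items)
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    rng.shuffle(items)
    cut = int(round(len(items) * train_fraction))
    return items[:cut], items[cut:]

# Hypothetical image file names standing in for a public dataset.
image_names = [f"img_{i:04d}.jpg" for i in range(10)]
train, val = split_dataset(image_names)
```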

3.2 Data Pre-processing

Synthetic hazy datasets can improve the generalization and detection precision of network models. Because collecting and annotating hazy images is difficult, we synthesize haze on real clear-weather images from public image sets by employing a standard optical model and leverage the synthetic data for computer vision. The standard optical model has been widely used for both dehazing and synthesizing hazy images, and it can be expressed as [19, 20]:

$$\begin{aligned} I(x)=J(x)t(x)+L(1-t(x)) \end{aligned} \tag{1}$$

where I(x) denotes the hazy image, and J(x) denotes the clear image; x denotes a pixel position in the image, L denotes the global atmospheric light, and t(x) denotes the transmission map. In a homogeneous scenario, the transmission map can be represented as \(t(x)=e^{-\beta d(x)}\), where \(\beta \) and d(x) denote the atmosphere scattering parameter and the scene depth, respectively.
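Under the homogeneous assumption, the optical model can be applied directly to a clear image and a depth map. The following is a minimal sketch rather than the synthesis pipeline used in this study: the function name, the default values of the scattering parameter and atmospheric light, and the availability of a depth map are all assumptions made for illustration.

```python
import numpy as np

def synthesize_haze(clear, depth, beta=1.0, airlight=0.8):
    """Apply I(x) = J(x) t(x) + L (1 - t(x)) with t(x) = exp(-beta * d(x)).

    clear:    float array in [0, 1], shape (H, W, 3), the clear image J
    depth:    float array, shape (H, W), the scene depth d (assumed given)
    beta:     atmosphere scattering parameter (illustrative default)
    airlight: global atmospheric light L (illustrative default)
    """
    # Transmission map, with a trailing axis so it broadcasts over channels.
    t = np.exp(-beta * depth)[..., np.newaxis]
    hazy = clear * t + airlight * (1.0 - t)
    return np.clip(hazy, 0.0, 1.0)

# Toy example: a uniform dark image whose depth grows from left to right.
clear = np.full((2, 3, 3), 0.2)
depth = np.tile(np.array([0.0, 1.0, 2.0]), (2, 1))
hazy = synthesize_haze(clear, depth, beta=1.0, airlight=0.8)
```

At zero depth the transmission is 1 and the pixel is unchanged. As the depth grows, the pixel value moves toward the atmospheric light, which is the expected behavior of haze.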

3.3 The Proposed Architecture

We comprehensively consider the transfer of the IDDL from human vision to computer vision. Our study focuses on three aspects: the detection accuracy of different detection networks, the running times, and the model scales of the different IDDL. The overall structure, which presents the validation process of IDDL portability, is shown in Fig. 1.

Fig. 1. The validation process of IDDL portability. The dehazed images and hazy images are denoted as ① and ②, respectively.

4 Results

4.1 Experimental Configuration

All experiments were performed on a server with an Intel Xeon E5-2620 v4 CPU @ 2.10 GHz and a TITAN Xp/PCIe/SSE2 graphics card. The parameters of the network models were kept at their default initial values.

As a reference for the effect of the dehazing algorithms, we also train on the synthetic dataset and verify the reliability of this data-driven method. The data enhancement-based method aims to improve the detection effect in hazy scenes by training the detection network on the synthetic datasets. The conclusion is crucial for other researchers studying hazy scenes. The training and testing sets of each network, including the original dataset and the synthetic dataset, are listed in Table 1.

Table 1. Network and dataset information.

4.2 Experimental Results and Analysis

Object Detection. To ensure the rationality and reliability of the conclusions, we selected six advanced dehazing models and two efficient object detection networks to verify the accuracy of the detection effect. The object detection results are shown in Table 2 and Table 3.

Table 2. The evaluation results of Faster RCNN on the five classes. The three best results are marked in bold.

The mAP@.5 is chosen as the main criterion for model performance comparison. Our experiments compare the detection results on images dehazed by different dehazing algorithms and demonstrate the improvement that image dehazing brings in real hazy scenes. Three points can be seen in Table 2 and Table 3.

  • The preprocessing method using the dehazing networks can improve the detection effect of Faster RCNN and YOLOv5.

  • Different dehazing networks affect the detection network differently: the largest improvement in the detection index is about three times the smallest.

  • The synthetic dataset can effectively improve the detection accuracy and achieves the best index. Compared with the detection result of the best dehazing algorithm, this method improves the mAP@.5 by 2.81% and 1.9% on Faster RCNN and YOLOv5, respectively.

Table 3. The evaluation results of YOLOv5 on the five classes.

Semantic Segmentation. To ensure the rationality and reliability of the conclusions, we selected six advanced dehazing models and two efficient semantic segmentation networks to verify the accuracy of the segmentation effect. The semantic segmentation results are shown in Table 4 and Table 5.

For this experiment, the mIoU, a commonly used evaluation index that helps judge the segmentation effect, is chosen as the main criterion for model performance comparison. Three points can be seen in Table 4 and Table 5.

  • Semantic segmentation is more affected by hazy images than object detection.

  • Most of the dehazing networks can improve the accuracy of semantic segmentation, but others may reduce it; for example, using GCA-Net is less accurate than not using it for both SegNet and Mask RCNN.

  • The synthetic dataset can significantly improve the semantic segmentation accuracy and achieves the best index. Compared with the segmentation result of the best dehazing algorithm, this method improves the mIoU by 1.87% and 1.33% on SegNet and Mask RCNN, respectively.

Table 4. The evaluation results of SegNet on the two classes.
Table 5. The evaluation results of Mask RCNN on the two classes.

5 Synthetic Data Studies

5.1 Synthetic Data Contrast and Analysis

Based on the analysis of the dehazing methods, we compare the two schemes in three aspects:

  • As shown in Sect. 4.2, the testing accuracy of the synthetic hazy method is significantly better than that of the dehazing methods.

  • Real-time performance plays a vital role in practical computer vision applications. The running times of the six methods are analyzed, and the average single-image running times of the different dehazing networks on the RTTS dataset are given in Table 6. The IDDL cannot meet the real-time detection requirement of computer vision scenarios because of their high running time cost. In contrast, the synthetic hazy method does not significantly increase the model running time.

  • The scales of the different dehazing network models are listed in Table 7. The network models with a better dehazing effect in computer vision are large in scale. A large model occupies storage memory and places higher requirements on the chip during application, which increases the difficulty of model deployment. The main reason is that dehazing research focuses only on the image effect and ignores the scale of the dehazing model. In contrast, the synthetic dataset method does not increase the model scale of the detection network.

Table 6. Comparison of the average runtimes of the six dehazing networks.
Table 7. Model scale results of the image dehazing methods.
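An average single-image running time of the kind reported in Table 6 could be measured along the following lines. This is a sketch under stated assumptions: `process` stands for any callable that handles one image, the warm-up count is arbitrary, and the latency values in the example are hypothetical rather than taken from the tables. On a GPU, asynchronous execution would also need to be synchronized before reading the clock.

```python
import time

def average_runtime(process, inputs, warmup=1):
    """Average wall-clock seconds per input for a callable `process`."""
    inputs = list(inputs)
    for x in inputs[:warmup]:  # untimed warm-up calls
        process(x)
    start = time.perf_counter()
    for x in inputs:
        process(x)
    return (time.perf_counter() - start) / len(inputs)

def pipeline_fps(dehaze_seconds, detect_seconds):
    """Frames per second when dehazing runs before detection on every image."""
    return 1.0 / (dehaze_seconds + detect_seconds)

# Hypothetical per-image latencies, for illustration only.
fps_detector_only = pipeline_fps(0.0, 0.02)
fps_with_dehazing = pipeline_fps(0.08, 0.02)
```

Because the dehazing time is added to every frame, a dehazing front end lowers the achievable frame rate. A detector trained on synthetic hazy data keeps the detector-only frame rate.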

5.2 Improved Data Comparison Results and Analysis

Fig. 2. The average class detection precision for different numbers of hazy images under Faster RCNN. The baseline represents the best detection effect of the dehazed network.

The results presented above demonstrate that the synthetic hazy method can significantly improve the object detection and semantic segmentation results as well as the generalization and robustness of a detection network. Because the synthetic dataset replaces original images with synthetic hazy images, the percentage of hazy images in the synthetic dataset is an important factor. The performance for 10 proportions of synthetic hazy images with Faster RCNN and SegNet is presented in Fig. 2 and Fig. 3. Low proportions of synthetic hazy images lower the detection accuracy, while high proportions limit the detection effect in clear weather. Therefore, about half of the training images should be synthetic hazy images to achieve the best object detection and semantic segmentation results.
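One way to build a training set with a chosen proportion of synthetic hazy images is sketched below. The helper `build_mixed_training_set`, the seed, and the file names are hypothetical. The sketch only selects which clear images are replaced by their hazy versions, and the haze synthesis itself is assumed to happen elsewhere.

```python
import random

def build_mixed_training_set(clear_images, hazy_ratio=0.5, seed=0):
    """Return (image, is_hazy) pairs; about hazy_ratio of them are flagged
    to be replaced by their synthetic hazy version."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    n_hazy = round(len(clear_images) * hazy_ratio)
    hazy_idx = set(rng.sample(range(len(clear_images)), n_hazy))
    return [(img, i in hazy_idx) for i, img in enumerate(clear_images)]

# Hypothetical file names; half of them are marked for haze synthesis.
images = [f"img_{i:04d}.jpg" for i in range(10)]
mixed = build_mixed_training_set(images, hazy_ratio=0.5)
n_flagged = sum(is_hazy for _, is_hazy in mixed)
```

Sweeping `hazy_ratio` over a range of values and retraining the detection network at each setting would reproduce the kind of comparison described above.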

Fig. 3. The average class detection precision for different numbers of hazy images under SegNet. The baseline represents the best detection effect of the dehazed network.

6 Conclusion

This article analyzes the practical utility and portability of the IDDL from human vision to computer vision. The poor adaptability and portability of the IDDL are demonstrated by the objective evaluation indexes mAP and mIoU in object detection and semantic segmentation. Compared with the synthetic hazy scheme, the dehazing network scheme has a poorer detection effect. Furthermore, the IDDL is costly in terms of model running time, occupies processor storage, and increases memory consumption. In contrast, the synthetic dataset method achieves an excellent detection effect with the running time and model scale of the original detection networks. Overall, dehazing still has great development prospects in computer vision scene understanding.

In future work, more methods will be studied to handle hazy scenes in computer vision. From the dataset perspective, researchers could collect real hazy images. Training the network with a large number of real hazy images could improve detection in hazy images and further improve robustness, stability, and effectiveness. From the sensor perspective, multi-device sensors could perform efficiently in adverse weather environments; for example, infrared sensors can effectively reduce the impact of hazy scenes. From the hardware perspective, high-performance hardware could mitigate the concerns about running time and model scale.