
1 Introduction

The existence of haze degrades the quality of images captured by surveillance systems. Haze removal has therefore become a hotspot in image processing, offering promising solutions for tasks such as remote sensing, surveillance, and aerospace.

Originally, image dehazing was performed by enhancing image contrast to reduce the effects of haze. [1, 2] introduced methods that resolve the haze problem by maximizing the local contrast of input images. A fast image defogging method estimates the amount of fog in an image using a locally adaptive Wiener filter [3]. Subsequently, more algorithms were proposed based on the atmospheric scattering model. In [4], the albedo of the scene is estimated to aid haze removal. [5,6,7,8] apply the dark channel prior to the atmospheric scattering model to calculate the transmission map. Further algorithms improve defogging by enforcing a boundary constraint with contextual regularization [9], minimizing a nonconvex potential in a random field [10], enriching contextual information [11], or automating the defogging pipeline [12]. More recently, prior information has been applied to recover haze-free images: [13] proposed a method that dehazes using internal patches at the same, or different, scales, while [14] built a haze removal model of scene depth trained with a color attenuation prior. These algorithms reduce image blur to some extent, but the restored images still suffer from color distortion.

Recently, deep convolutional neural network (CNN) methods have become a hotspot for image dehazing [15, 16]. [17] introduced a strategy that first generates a coarse transmission matrix and then refines it into an accurate one. DehazeNet trains a network on hazy images to estimate the medium transmission matrix and then uses the atmospheric scattering model to recover haze-free images [18]. Most methods rely on transmission matrix estimation, which, unfortunately, cannot generate the defogged image directly. Furthermore, existing CNN-based networks usually treat each feature channel equally, so varying information features and important details are lost [19,20,21,22].

To resolve these problems, we use channel-wise information as a weight on the normal convolution layers for better feature extraction. In this paper, we propose an image recovery network with channel attention (RCA-Net) that adaptively extracts channel-wise features. First, the transmission model M(x) is learned by an end-to-end network that minimizes the reconstruction error. Second, M(x) is gradually optimized through the image recovery network with channel attention. Finally, more realistic color and structural details are produced by the recovery network.

The remainder of this paper is organized as follows: the proposed algorithm is described in Sect. 2. The experimental results and analysis are reported in Sect. 3. Finally, the conclusion is given in Sect. 4.

2 Proposed Algorithm

In this paper, an end-to-end network with channel attention is proposed to achieve high-quality restoration of foggy images. Specifically, RCA-Net simplifies the atmospheric scattering model during the image restoration process, and its channel attention model focuses on the most significant information to improve haze removal.

2.1 Physical Model

The atmospheric scattering model is the foundation of defogging algorithms and can be expressed as

$$ I_{haze} \left( x \right) = J_{clear} \left( x \right)t\left( x \right) + A\left( {1 - t\left( x \right)} \right) $$
(1)
$$ J_{clear} \left( x \right) = \frac{1}{t\left( x \right)}I_{haze} \left( x \right) - \frac{A}{t\left( x \right)} + A $$
(2)
$$ J_{clear} \left( x \right) = M\left( x \right)I_{haze} \left( x \right) - M\left( x \right) + b $$
(3)

where \( I_{haze}(x) \) denotes the observed foggy image and \( J_{clear}(x) \) is the fog-free image to be recovered.

According to the atmospheric scattering model, the final expression of M (x) is

$$ M\left( x \right) = \frac{{\frac{1}{t\left( x \right)}\left( {I_{haze} \left( x \right) - A} \right) + B}}{{I_{haze} \left( x \right) - 1}} $$
(4)

where M(x) is a single parameter that combines A and t(x), B is the value of A − b, and the final value depends only on the model input \( I_{haze} \left( x \right) \). A and t(x) denote the global atmospheric light and the transmission matrix, respectively.
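The consistency of Eqs. (1)–(4) can be checked numerically: synthesize a hazy image from the scattering model, form M(x) with B = A − b, and confirm that Eq. (3) returns the clear image. The concrete values of A, b, and t below are illustrative choices for the sketch, not parameters taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

J = rng.uniform(0.1, 0.9, size=(4, 4))   # clear image J_clear(x)
t = rng.uniform(0.3, 0.9, size=(4, 4))   # transmission map t(x)
A = 0.9                                   # global atmospheric light (assumed)
b = 1.0                                   # constant bias in Eq. (3) (assumed)
B = A - b

# Eq. (1): hazy observation
I = J * t + A * (1.0 - t)

# Eq. (4): M(x) depends only on the hazy input once A and t(x) are absorbed
M = ((I - A) / t + B) / (I - 1.0)

# Eq. (3): recovery of the clear image
J_rec = M * I - M + b

assert np.allclose(J_rec, J)
```

With A < 1 and J < 1 the hazy intensity I stays strictly below 1, so the denominator in Eq. (4) never vanishes in this sketch.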

2.2 Dehaze Network Architecture

Previous models map a hazy image to the transmission matrix; RCA-Net, in contrast, is an end-to-end model trained directly from hazy images to haze-free images. As shown in Fig. 1, the proposed RCA-Net contains two parts: an M(x) estimation module, which estimates M(x) using five convolutional layers with an integrated channel attention model, and a recovery net of 21 layers, including multiplication and addition layers, through which a clean image is restored via Eq. (3). The channel attention function follows “Average pool → Conv → ReLU → Conv → Sigmoid” to obtain channel-wise feature weights.
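The “Average pool → Conv → ReLU → Conv → Sigmoid” branch can be sketched as follows; on a single feature map, the two 1×1 convolutions reduce to dense layers. The channel count, reduction ratio, and random weights are illustrative assumptions, since the paper does not specify them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w1, w2):
    """Average pool -> Conv -> ReLU -> Conv -> Sigmoid, then rescale.

    feat : (C, H, W) feature map; w1 : (C//r, C); w2 : (C, C//r).
    Returns the feature map reweighted channel-wise.
    """
    pooled = feat.mean(axis=(1, 2))          # global average pool -> (C,)
    squeezed = np.maximum(w1 @ pooled, 0.0)  # 1x1 conv + ReLU -> (C//r,)
    weights = sigmoid(w2 @ squeezed)         # 1x1 conv + sigmoid -> (C,)
    return feat * weights[:, None, None]     # channel-wise rescaling

# Illustrative shapes: C = 8 channels, reduction ratio r = 2 (both assumed).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
w1 = rng.standard_normal((4, 8)) * 0.1
w2 = rng.standard_normal((8, 4)) * 0.1
out = channel_attention(feat, w1, w2)
assert out.shape == feat.shape
```

Because the sigmoid keeps every weight in (0, 1), the attention branch can only attenuate channels, letting the network emphasize informative features relative to the rest.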

Fig. 1. Structure of the recovery network with channel attention model.

The matrix M(x) is trained on the NYU2 Depth Database [14]; a test image is then fed in to obtain a clear image. The architecture estimates the model parameters in five steps: the output of RCA-Net is reconstructed in sequence through convolutional feature extraction, a residual group, a channel attention model, and a second residual group, before recovery of the image. The foggy image \( I_{haze}(x) \) and the clear image \( J_{clear}(x) \) are the input and output of the RCA-Net model, respectively, and the network parameters are optimized by minimizing the loss. The image processing consists of three operations, shown in the following equations:

$$ F_{con} \left( x \right)\, \to \,H_{\_conv} \left( {I_{haze} \left( x \right)} \right) $$
(5)
$$ F_{rd} \left( x \right)\, \to \,H_{{\_Residual}} \left( {F_{con} \left( x \right)} \right) $$
(6)
$$ F_{ca} \left( x \right)\, \to \,H_{\_channel} \left( {F_{rd} \left( x \right)} \right) $$
(7)

where \( H_{\_Conv} \) denotes the convolution operation, whose output \( F_{con} \left( x \right) \) is then used for deep feature extraction; \( H_{\_Residual} \) denotes a deep convolution group with a long skip connection, and \( F_{rd} \left( x \right) \) is the feature output of the residual group; \( H_{\_channel} \) is the channel attention operation, which consists of global average pooling, convolution, the ReLU function, a second convolution, and the sigmoid function. \( F_{ca} \left( x \right) \) is the feature map obtained from the channel attention model, in which the features are multiplied by the channel attention weights and then added to \( F_{rd} \left( x \right) \).
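A residual group with a long skip connection (the \( H_{\_Residual} \) operation above) can be sketched as stacked convolution blocks whose output is added back to the input. The single-channel toy convolution and the block count here are illustrative stand-ins, not the paper's configuration.

```python
import numpy as np

def conv3x3(x, k):
    """Toy same-padding 3x3 convolution on a single-channel map (stand-in)."""
    H, W = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k)
    return out

def residual_group(x, kernels):
    """Stacked conv+ReLU blocks with a long skip: F_rd(x) = x + body(x)."""
    y = x
    for k in kernels:
        y = np.maximum(conv3x3(y, k), 0.0)
    return x + y   # long skip connection

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
kernels = [rng.standard_normal((3, 3)) * 0.1 for _ in range(2)]
F_rd = residual_group(x, kernels)
assert F_rd.shape == x.shape
```

The long skip connection lets the group learn only a residual correction: if the body produces zero, the input passes through unchanged, which eases training of deep stacks.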

The next part is the estimation of M(x), which can be given by

$$ M\left( x \right)\, \to \,H_{\_Residual} \left( {F_{ca} \left( x \right)} \right) $$
(8)
$$ J_{clear} \left( x \right) = M\left( x \right)\, * \,I_{haze} \left( x \right) - M\left( x \right) + b = H_{\_Recovery} \left( {I_{haze} \left( x \right)} \right) $$
(9)

The loss function minimizes the error between the target image and the network output in order to optimize the network. The common loss functions in deep networks are the L1 loss and the L2 loss, or a combination of the two. The L2 loss is closely related to the mean squared error (MSE), but optimizing it alone does not yield a good visual result. To resolve this problem, a perceptual loss is introduced into RCA-Net to make the high-level features of the restored image consistent with those of the original image. In this way, we obtain better visual quality and details closer to the target image.
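A perceptual loss compares feature activations rather than raw pixels. The sketch below combines an L1 pixel term with a feature-space MSE term; the fixed random projection stands in for a pretrained feature extractor, and the weight `lam` is an assumed value, as the paper gives neither.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fixed random projection standing in for a pretrained feature extractor
# (e.g. early layers of a classification network); purely illustrative.
W_feat = rng.standard_normal((32, 64)) * 0.1

def features(img):
    """Hypothetical feature map: ReLU of a fixed linear projection."""
    return np.maximum(W_feat @ img.reshape(-1), 0.0)

def dehaze_loss(pred, target, lam=0.1):
    """L1 pixel loss plus a perceptual (feature-space MSE) term."""
    l1 = np.mean(np.abs(pred - target))
    perceptual = np.mean((features(pred) - features(target)) ** 2)
    return l1 + lam * perceptual

pred = rng.uniform(0, 1, size=(8, 8))
target = np.clip(pred + rng.normal(0, 0.05, size=(8, 8)), 0, 1)
assert dehaze_loss(target, target) == 0.0
```

The loss is zero only when prediction and target agree both pixel-wise and in feature space, which is what pushes the restored image toward the target's visual structure rather than just its average intensity.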

3 Preliminary Results

3.1 Experimental Setup

The Training Process:

The weights are initialized with random numbers; the channel attention convolution group is more effective in a deep network; and the decay parameter during training is set to 0.0001.

The Database:

We use the NYU2 Depth Database [14], which includes ground-truth images and depth metadata. We set the atmospheric light A ∈ [0.6, 1.0] and β ∈ {0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6}. From the NYU2 database, 27,256 images form the training set and 3,170 non-overlapping images form the test set, called TestSet A.
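Training pairs can be synthesized from the depth metadata via the atmospheric scattering model, with t(x) = exp(−β·d(x)) and A and β drawn from the ranges above. This is a sketch of that standard procedure, not necessarily the authors' exact pipeline.

```python
import numpy as np

def synthesize_hazy(J, depth, A, beta):
    """Generate a hazy image from a clear image and its depth map.

    Transmission t(x) = exp(-beta * d(x)); hazy image via Eq. (1).
    """
    t = np.exp(-beta * depth)
    return J * t + A * (1.0 - t)

rng = np.random.default_rng(0)
J = rng.uniform(0, 1, size=(8, 8))        # ground-truth clear image
depth = rng.uniform(0, 3, size=(8, 8))    # depth map (arbitrary units)
A = rng.uniform(0.6, 1.0)                 # atmospheric light, A in [0.6, 1.0]
beta = rng.choice([0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6])  # scattering coeff.
I_hazy = synthesize_hazy(J, depth, A, beta)
assert I_hazy.shape == J.shape
```

Since the hazy pixel is a convex combination of J and A, each synthesized intensity lies between the clear pixel value and the atmospheric light, with deeper pixels pulled further toward A.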

3.2 Experiments and Results

To test the performance of the proposed algorithm, our method is compared with the traditional algorithms Boundary Constrained Context Regularization (BCCR) [9], Color Attenuation Prior (CAP) [14], Fast Visibility Restoration (FVR) [23], Gradient Residual Minimization (GRM) [24], Dark-Channel Prior (DCP) [4] and MSCNN [8].

Figure 2 shows the defogging results on eight sets of images. The CNN-based methods compare favorably with BCCR (0.9569), CAP (0.9757), FVR (0.9622), GRM (0.9249), and DCP (0.9449): MSCNN and the proposed method reach SSIM values of 0.9689 and 0.9792, respectively, with the proposed method highest overall. Figure 3 shows the PSNR of the above defogging methods. The PSNR values of ATM, BCCR, FVR, and NLD lie between 15 dB and 20 dB, while those of MSCNN, DehazeNet, and the proposed method exceed 20 dB. Compared with the other methods, RCA-Net delivers better visual quality and a slightly higher PSNR.

Fig. 2. Comparison of BCCR, CAP, FVR, GRM, MSCNN, DCP, and the proposed RCA-Net.

Fig. 3. Comparison of BCCR, ATM, FVR, NLD, MSCNN, DehazeNet, and the proposed RCA-Net on dehazing 800 synthetic images from the Middlebury stereo database.

In previous experiments, few quantitative results on restoration quality were reported. Table 1 displays the average PSNR and SSIM results on TestSet A.

Table 1. Average PSNR and SSIM results on TestSet A.

To further compare the methods, Fig. 3 shows the PSNR of each method on the same images. RCA-Net clearly improves the PSNR, which is about 1 dB higher than BCCR, ATM, FVR, and NLD. This confirms that a dehazing strategy built on an appropriate model is more effective than the other networks.
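The PSNR figures quoted above follow directly from the mean squared error; for images scaled to [0, 1], PSNR = 10·log10(1/MSE) in dB. A minimal sketch:

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 0.1 gives MSE = 0.01, i.e. PSNR = 10*log10(1/0.01) = 20 dB.
ref = np.zeros((8, 8))
test = ref + 0.1
assert abs(psnr(ref, test) - 20.0) < 1e-9
```

On this scale, a 1 dB gain corresponds to roughly a 21% reduction in MSE, so the margin over BCCR, ATM, FVR, and NLD is substantial.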

3.3 Time Consumption

The lightweight structure of RCA-Net leads to faster dehazing. We select 100 images from TestSet A to test all algorithms. All experiments are implemented in Matlab 2016a on a PC with a Titan V GPU and 12 GB RAM running Ubuntu 16.04. Each input image is cropped into 256 × 256 patches for the running-time comparison. The per-image average running times of all models are shown in Table 2.

Table 2. Comparison of average running time (seconds).

4 Conclusion

In this paper, we propose an end-to-end dehazing network from hazy images to clean images using a channel attention model. By weighting the normal convolution features, the channel attention model contributes to further feature extraction. This approach simplifies the dehazing network and effectively preserves realistic colors and structural details. Ultimately, the network retains real details from the test image and achieves state-of-the-art dehazing results, both quantitatively and visually.