1 Introduction

Images captured in rainy conditions suffer from significant degradation, including blurring, color distortion and content obstruction. The resulting poor visibility severely affects the performance of subsequent multimedia applications. Image deraining has thus become a vital component of vision tasks and attracts increasing attention in the multimedia community.

In general, deraining aims to recover the clean background B from an observed rainy image O = B + R, where R is the rain layer. Since both the background and the rain layer are unknown, deraining is a highly ill-posed problem. To make the problem tractable, various algorithms have been designed for single image deraining, and previous work can be broadly classified into two categories: model-based and data-driven methods. Model-based approaches treat rain removal as an optimization problem built on priors; typical methods include Gaussian Mixture Model (GMM) [1], Discriminative Sparse Coding (DSC) [2, 3], low-rank representation [4], image decomposition [5] and filter-based deraining approaches [6,7,8,9]. These methods rely on prior and subjective assumptions about rain streaks, and therefore perform well only under specific conditions.

Another appealing solution is the data-driven approach, which treats deraining as a non-linear mapping and searches for suitable parameters to separate the rain component from the background image [10]. Inspired by deep learning, numerous data-driven methods have emerged for deraining and have shown remarkable restoration performance. Fu et al. [11] first employed a multi-layer convolutional neural network to extract and remove the rain layer, and then introduced the Deep Detail Network [12], which removes rain streaks directly by reducing the mapping range. RESCAN [13] combines convolutional and recurrent neural networks to exploit contextual information for single image deraining. In [14], the Progressive Recurrent Network (PReNet) performs recursive computation to produce derained images progressively. Based on a recurrent network, the Spatial Attentive Network (SPANet) [15] captures spatial contextual information in a local-to-global manner. Jiang et al. [16] explore the multi-scale collaborative representation of rain streaks and hierarchical deep features. In [17], a convolutional dictionary is employed to represent rain streaks, and a proximal gradient descent technique is used to simplify the deraining model. The above deep learning-based strategies, however, have evident deficiencies in exploiting comprehensive rain information and representations. Few efforts have been made to preserve both the desired fine spatial details and strong contextual information.

In this paper, we propose a novel parallel architecture, named Multi-resolution Parallel Aggregation Network (MPA-Net), that maintains multi-resolution feature representations and minimizes the loss of detail for single image deraining. Our main contributions are summarized as follows.

We construct an end-to-end MPA-Net to handle the single image deraining problem. It generates a spatially precise and detailed output by using a novel multi-resolution parallel feature extraction structure, while receiving and consolidating rich contextual information from different scales.

To better represent the rain features at different scales, Cross-Scale Feature Fusion (CSFF) is constructed to effectively exchange and combine cross-scale information, so that the distribution of rain streaks can be characterized in a collaborative manner.

Comprehensive experiments are performed on six challenging datasets (4 synthetic and 2 real-world datasets), and the deraining results demonstrate that the proposed method outperforms existing state-of-the-art approaches.

2 Proposed Method

We briefly introduce the proposed MPA-Net, which removes rain streaks while restoring as much detail as possible in rainy images. The description covers the overall architecture, the feature extraction and fusion modules, and the loss function.

2.1 Overall Architecture of MPA-Net

We design an end-to-end deraining network that restores rainy images using a multi-resolution parallel framework, which is composed of Densely Connected Residual (DCR) blocks and CSFF with Squeeze-and-Excitation (SE) blocks, as illustrated in Fig. 1. In detail, MPA-Net has a parallel multi-resolution structure in which the first stage processes the original scale, and the other two use strided convolutions to down-sample the original input image to 1/2 and 1/4 of the original scale. Next, DCR blocks are used in each parallel stage to extract and propagate image features. Using the features from different scales, the CSFF then performs deep feature extraction and fusion after concatenating the multi-resolution features. Following the CSFF, an SE block is added to adaptively rescale channel-wise features and strengthen the encoding quality of the feature hierarchy.

Fig. 1. Overview of the proposed MPA-Net framework.

The parallel multi-resolution pipeline is the main architecture of MPA-Net. It starts with the original resolution as the first stage, progressively adds high-to-low resolution stages to produce more streams, and connects all stages in parallel. Hence a later stage contains the features carried over from the previous stage plus an extra lower-resolution stream. As a novel feature extraction model, this structure effectively extracts semantically richer fine-to-coarse features as well as spatially precise coarse-to-fine feature representations. Let \(S_{sr}\) denote the stream in the sth stage with resolution index r, whose resolution is \(1/2^{r-1}\) of the resolution of the first stage. An example with 3 parallel stages is given as follows:

(1)
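As a rough illustration of this indexing, the sketch below lists the streams \(S_{sr}\) present in each stage together with their spatial sizes. It assumes that stage s holds the streams r = 1, ..., s and that the input height and width are divisible by 4; the function name `stage_streams` and the 64 × 64 input size are illustrative choices, not part of the method description.

```python
def stage_streams(height, width, num_stages=3):
    """Map each stage to its streams and their spatial sizes.

    Assumption: stage s holds streams r = 1..s, and stream r runs at
    1 / 2**(r - 1) of the input resolution.
    """
    layout = {}
    for s in range(1, num_stages + 1):
        streams = []
        for r in range(1, s + 1):
            scale = 2 ** (r - 1)
            streams.append((f"S{s}{r}", (height // scale, width // scale)))
        layout[s] = streams
    return layout


layout = stage_streams(64, 64)
for stage, streams in layout.items():
    print(stage, streams)
```

Under these assumptions the third stage carries three streams at full, half and quarter resolution, which matches the three scales described in Sect. 2.1.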

2.2 Feature Extraction and Aggregation Module

The proposed MPA-Net contains two basic building blocks: (a) the Densely Connected Residual (DCR) block, which performs rain feature extraction and representation, and (b) the Squeeze-and-Excitation (SE) block, which aggregates characteristics from different scales after CSFF.

The Densely Connected Residual Block [18, 19] adopts the dense connectivity of DenseNet [20] to directly receive and pass on features from all preceding layers, and uses the residual learning of the Residual Network (ResNet) [21] so that features can reach deeper layers at a lower computational cost. Combining the advantages of DenseNet and ResNet, the DCR block can obtain a precise negative rain feature that maps to the corresponding rainy image. Specifically, each DCR block consists of three convolution layers followed by leaky ReLU activations with α = 0.2, as shown in Fig. 1(b).

The Squeeze-and-Excitation Block [22] explores the spatial and channel components, seeking to improve the feature representation capability. As depicted in Fig. 1(c), the SE block is used to effectively aggregate characteristics from different scales. In the squeeze step, a global embedding process is carried out to exploit contextual feature information. Making full use of the aggregated information, the excitation operation is then applied to capture feature dependencies efficiently.
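The squeeze and excitation steps can be sketched in plain Python as follows. This follows the generic SE formulation (global average pooling, a two-layer bottleneck with ReLU and sigmoid, then channel-wise rescaling) and is only a minimal sketch: the weights `w_reduce` and `w_expand`, the omission of biases, and the toy feature map are all assumptions, not the configuration used in MPA-Net.

```python
import math


def squeeze(features):
    """Global average pooling: one scalar per channel of a [C][H][W] map."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]


def dense(weights, vector):
    """Bias-free fully connected layer; weights has shape [out][in]."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def excite(z, w_reduce, w_expand):
    """Bottleneck with ReLU, then a sigmoid gate in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(w_reduce, z)]
    return [1.0 / (1.0 + math.exp(-g)) for g in dense(w_expand, hidden)]


def se_block(features, w_reduce, w_expand):
    """Rescale every channel by its learned gate."""
    gates = excite(squeeze(features), w_reduce, w_expand)
    return [[[v * g for v in row] for row in ch] for ch, g in zip(features, gates)]


# Toy example: 2 channels of size 2 x 2, hypothetical weights.
features = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w_reduce = [[0.5, 0.5]]        # 2 channels -> 1 hidden unit
w_expand = [[1.0], [-1.0]]     # 1 hidden unit -> 2 channel gates
rescaled = se_block(features, w_reduce, w_expand)
print(rescaled)
```

Because the gates depend only on channel-wise statistics of the whole map, the block reweights channels without changing the spatial size of the features.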

2.3 Cross-Scale Feature Fusion

Combining features from different scales is an efficient way to capture the various components of rain streaks. Existing deraining methods usually process each scale separately or exchange information only between adjacent scales. In contrast, the proposed method explores a novel CSFF mechanism to fuse comprehensive feature information. As an example, the third stage is divided into 3 exchange blocks. Each CSFF unit has 3 parallel convolution layers with an exchange unit across the parallel units, which can be expressed as:

(2)

where \(S_{sr}^{b}\) denotes the convolution layer in the bth block of the sth stage for the rth resolution, and \(C_{s}^{b}\) is the corresponding exchange unit.

This operation generates reliable feature representations by fusing the multi-scale features produced in the parallel stages. Mathematically, each output aggregates the features arriving from multiple parallel streams and can be defined as:

$$ \begin{array}{*{20}l} {s_{1}^{3} = f_{1 \times 1}^{1} \left( {C\left( {s_{1,2,0} ,s_{2,2,2} ,s_{3,2,2} } \right);\eta } \right)} \hfill \\ {s_{2}^{3} = f_{1 \times 1}^{2} \left( {C\left( {s_{2,2,0} ,s_{1,2,1} ,s_{3,2,1} } \right);\mu } \right)} \hfill \\ {s_{3}^{3} = f_{1 \times 1}^{3} \left( {C\left( {s_{1,2,2} ,s_{2,2,1} ,s_{3,2,0} } \right);\tau } \right)} \hfill \\ \end{array} $$
(3)

where η, μ, τ are the parameters of the corresponding 1 × 1 convolutions, and C(·) denotes the concatenation of the feature maps.
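To make the aggregation in Eq. (3) concrete, the following sketch fuses three streams into one target resolution. It reads C(·) as channel-wise concatenation, in line with the description in Sect. 2.1, and applies a 1 × 1 convolution as a per-pixel linear mix of channels. The resampling operators are assumptions made for illustration only: nearest-neighbour up-sampling and average pooling stand in for the learned up-sampling and strided convolutions of the network, and all function names are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Repeat every pixel `factor` times along height and width."""
    out = []
    for ch in fmap:
        rows = []
        for row in ch:
            wide = [v for v in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def downsample_avg(fmap, factor):
    """Average non-overlapping factor x factor windows."""
    out = []
    for ch in fmap:
        h, w = len(ch) // factor, len(ch[0]) // factor
        out.append([[sum(ch[i * factor + di][j * factor + dj]
                         for di in range(factor) for dj in range(factor)) / factor ** 2
                     for j in range(w)] for i in range(h)])
    return out


def resample(fmap, src_level, dst_level):
    """Move a stream from scale 1/2**src_level to scale 1/2**dst_level."""
    if src_level == dst_level:
        return fmap
    factor = 2 ** abs(src_level - dst_level)
    if src_level > dst_level:
        return upsample_nearest(fmap, factor)
    return downsample_avg(fmap, factor)


def conv1x1(fmap, weights):
    """1 x 1 convolution without bias; weights has shape [out][in]."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[[sum(wt * fmap[c][i][j] for c, wt in enumerate(w_row))
              for j in range(w)] for i in range(h)] for w_row in weights]


def fuse(streams, dst_level, weights):
    """Resample all streams to dst_level, concatenate channels, mix with 1 x 1 conv."""
    concat = []
    for level, fmap in enumerate(streams):
        concat.extend(resample(fmap, level, dst_level))
    return conv1x1(concat, weights)


# Toy streams with one channel each at scales 1, 1/2 and 1/4.
full = [[[1.0] * 4 for _ in range(4)]]
half = [[[2.0] * 2 for _ in range(2)]]
quarter = [[[4.0]]]
weights = [[1.0, 1.0, 1.0]]  # hypothetical 1 x 1 kernel: 3 channels -> 1
fused = [fuse([full, half, quarter], level, weights) for level in range(3)]
```

Each of the three outputs receives information from every stream, which is the difference from an adjacent-scale exchange, where the full-resolution and quarter-resolution streams would never be combined directly.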

According to the final fused features, we can formulate the derained result as:

$$ F_{out} = f_{3 \times 3} \left( {f_{1 \times 1} \left( {C\left( {s_{1}^{3} ,f\left( {s_{2}^{3} } \right)_{ \uparrow 2} ,f\left( {s_{3}^{3} } \right)_{ \uparrow 4} ,se} \right);\theta_{1} } \right);\theta_{2} } \right) + F_{0} $$
(4)

where se represents the SE block and ↑ denotes up-sampling. \(f_{n \times n} \left( \cdot \right)\) denotes a convolution with kernel size \(n \times n\), and \(\left\{ {\theta_{1} ,\theta_{2} } \right\}\) are the parameters of the network.

2.4 Loss Function

Generally, the derained output of the proposed MPA-Net should be as close as possible to the clean image. Thus, we adopt two classical loss functions. The \(L_{1}\) loss provides an efficient way to measure the difference between the derained image and the corresponding clean ground-truth image at the per-pixel level. Besides the per-pixel loss, \(L_{SSIM}\) is used to evaluate the structural similarity of the derained result. Finally, the total loss function can be formulated as:

$$ L = L_{1} + \lambda L_{SSIM} $$
(5)

where λ is the weight parameter.
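A minimal numerical sketch of Eq. (5) is given below. The text does not spell out the form of \(L_{SSIM}\), so the sketch assumes \(L_{SSIM} = 1 - \text{SSIM}\), and it computes SSIM from global image statistics with the commonly used constants \(C_1 = (0.01 D)^2\) and \(C_2 = (0.03 D)^2\) for data range D instead of the usual sliding window. Images are flattened lists of intensities in [0, 1], and λ = 0.2 is the value reported in the experiment settings.

```python
def l1_loss(pred, target):
    """Mean absolute per-pixel difference."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def global_ssim(x, y, data_range=1.0):
    """SSIM computed from whole-image statistics (simplification: no sliding window)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def total_loss(pred, target, lam=0.2):
    """L = L1 + lam * L_SSIM, assuming L_SSIM = 1 - SSIM."""
    return l1_loss(pred, target) + lam * (1.0 - global_ssim(pred, target))


clean = [0.1, 0.4, 0.6, 0.9]
derained = [0.2, 0.4, 0.5, 0.9]
print(total_loss(derained, clean))
```

With this choice both terms vanish when the derained image equals the ground truth, so the total loss is zero exactly at the desired solution.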

3 Experimental Results

In this section, we describe the experimental datasets and implementation details. Comprehensive deraining experiments are then conducted to demonstrate the effectiveness of the designed MPA-Net against current deraining approaches. In addition, ablation studies are conducted to validate the design of our model.

3.1 Experiment Settings

Datasets.

We carry out rain removal experiments on four synthetic datasets: Rain200L/H [23], Rain800 [24], and Rain1400 [12], which contain numerous rain streaks of diverse sizes, shapes and directions. In addition, two real-world datasets are used to assess deraining performance: the first (called SPA-Data) consists of real rainy images whose ground truth is obtained by human labeling and multi-frame fusion [15], and the second (called Internet-Data) contains 167 rainy images collected from the Internet. Detailed descriptions of the synthetic and real-world datasets are given in Table 1.

Table 1. Datasets description. Values indicate the number of clean/rainy image pairs.

Setting.

The detailed architecture of the proposed MPA-Net is depicted in Fig. 1, and the network is implemented in the PyTorch framework. In MPA-Net, the number of parallel stages is 3, with channel dimensions of 32, 64 and 128 at resolutions of 1, 1/2 and 1/4, respectively. During training, we randomly crop 64 × 64 patch pairs from the training datasets as input, and the loss weight λ is set to 0.2. The Adam optimizer is used with a batch size of 16; the initial learning rate is \(1 \times 10^{-3}\) and is multiplied by 0.1 every 25 epochs. Our model is trained for 200 epochs on the Rain200H, Rain200L, and Rain800 datasets, 100 epochs on Rain1400, and 25 epochs on SPA-Data. All comparison experiments are performed with the same datasets and hardware environment on an NVIDIA Tesla V100 GPU (16 GB).
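The step schedule and the paired patch sampling described above can be sketched as follows. This is an illustrative reading of the settings rather than the training code itself: epochs are assumed to be counted from 0, images are plain nested lists, and the helper names `learning_rate` and `random_patch_pair` are hypothetical.

```python
import random


def learning_rate(epoch, base_lr=1e-3, gamma=0.1, step=25):
    """Step decay: multiply the rate by `gamma` after every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


def random_patch_pair(rainy, clean, size=64, rng=random):
    """Crop the same randomly placed size x size window from both images."""
    height, width = len(rainy), len(rainy[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)

    def crop(image):
        return [row[left:left + size] for row in image[top:top + size]]

    return crop(rainy), crop(clean)


for epoch in (0, 24, 25, 50, 199):
    print(epoch, learning_rate(epoch))
```

Cropping both images with the same offsets keeps each rainy patch aligned with its ground truth, which the per-pixel loss requires. In PyTorch, a schedule of this shape is usually expressed with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.1)`.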

3.2 Results on Synthetic Datasets

We compare our method with five other state-of-the-art image deraining methods, including RESCAN [13], PReNet [14], SPANet [15], MSPFN [16], and RCD-Net [17]. Since ground truth is available for the synthetic datasets, we perform quantitative comparisons using Peak Signal to Noise Ratio (PSNR) and Structural SIMilarity index (SSIM). As shown in Table 2, our proposed method achieves the highest values in both PSNR and SSIM, which reflects the strong performance and robustness of MPA-Net. The notable gains on Rain200H and Rain800 indicate that our model restores rainy images well, especially under heavy rain and varied rain conditions.
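For reference, PSNR follows directly from the mean squared error between the derained image and the ground truth. The sketch below uses the standard definition on flattened 8-bit intensities; the data range of 255 and the flattened-list input are assumptions, since the evaluation protocol (for example the color space used) is not detailed here.

```python
import math


def psnr(pred, target, data_range=255.0):
    """Peak Signal to Noise Ratio in dB: 10 * log10(data_range**2 / MSE)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


ground_truth = [52.0, 120.0, 200.0, 33.0]
derained = [50.0, 121.0, 203.0, 33.0]
print(psnr(derained, ground_truth))
```

Because PSNR is logarithmic in the error, a gain of a few dB corresponds to a substantial reduction in mean squared error.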

Besides the quantitative results, we present several challenging examples for visual comparison in Fig. 2. As displayed, RESCAN leaves too many rain streaks in the derained images, particularly under heavy rain. PReNet, SPANet and MSPFN remove the rain streaks in most cases, but some rain remains in distant or complex scenes. In the zoomed regions, the main drawbacks of RCD-Net are that it tends to blur the contents and fails to reconstruct scene details, and these defects can also be found in the other deraining methods above. In contrast, our proposed MPA-Net removes the majority of rain streaks under diverse rain distributions and complex backgrounds. In addition, it is good at restoring detailed structural information.

Table 2. Quantitative comparison of average PSNR and SSIM on five benchmark datasets.

3.3 Results on Real-World Datasets

For practical use, we conduct additional comparisons with other deraining algorithms on the two real-world rainy datasets mentioned above. The last column of Table 2 and Fig. 3(a) compare the results of all competing methods on SPA-Data quantitatively and visually. As natural images are more complex, all the competing methods leave some rain streaks even under light rain. On SPA-Data, our method still exhibits remarkable performance, with better quantitative values and fewer remaining rain streaks.

Furthermore, we choose two other challenging samples from Internet-Data. For a fair comparison, all methods are evaluated using models pre-trained on the Rain200H dataset. As shown in Fig. 3(b), the rainy picture has complicated spatial content under heavy rain, and all the competing methods fail to remove the rain streaks far from the camera in this complex real rainy scenario. In the zoomed color boxes, the other methods lose details and blur the scene to a certain extent. Under the varied lighting in Fig. 3(c), the above deraining algorithms fail to distinguish the rain streaks from the complex surrounding background. It can be observed that the proposed method clearly outperforms the others in removing the majority of rain streaks while preserving image details, even in dark surroundings and under lighting effects.

Fig. 2. Visual comparison of four synthetic examples, including (a) Rain200L, (b) Rain200H, (c) Rain800, and (d) Rain1400.

Fig. 3. Visual comparison of all the competing methods on two real-world datasets, including SPA-Data (a) and Internet-Data (b, c).

3.4 Ablation Studies

We study the impact of the main components and parameter choices on the final performance. All the following ablation studies are conducted under the same settings on the Rain200H dataset.

Parallel Multi-resolution Stages Number.

To investigate the influence of the number of stages, we run experiments with different numbers of parallel multi-resolution stages. As shown in Table 3, increasing the number of parallel stages leads to higher PSNR and SSIM, with a total gain of 2.15 dB in PSNR and 0.0569 in SSIM over the single-stage model, which indicates better deraining performance. To balance model performance and memory, we choose 3 stages for our MPA-Net.

Table 3. Ablation study on the number of parallel multi-resolution stages.

Feature Extraction and Aggregation Module.

We analyze the effect of the feature extraction and aggregation module, which consists of DCR and SE blocks. The baseline module is constructed from convolution layers in series. As displayed in Table 4, the best performance is achieved by employing both DCR and SE blocks, which verifies their effectiveness in the rain removal task.

Table 4. Ablation study on feature extraction and aggregation module.

Feature Fusion Strategies.

We further perform an ablation analysis on the feature fusion mechanism with three feature fusion methods: (a) W/O scale exchange, with no exchange between multi-resolution stages; (b) adjacent-scale exchange, with exchange only between two adjacent stages; (c) cross-scale exchange, our designed method that fuses the features from all stages. Fig. 4 provides the visual and quantitative deraining results of the three feature fusion strategies. As shown in Fig. 4(c), the zoomed color boxes show better rain removal, better texture restoration and fewer artifacts. The evaluation metrics listed below the images also show that CSFF achieves the best results.

Fig. 4. Qualitative and quantitative comparison of different feature fusion strategies, with the PSNR/SSIM values given at the bottom of the derained images.

4 Conclusion

In this paper, we propose a multi-resolution parallel aggregation network (MPA-Net) for single image deraining. An original multi-resolution parallel architecture is used to extract and aggregate multi-scale features, so that the complementary parallel streams generate spatially precise outputs and provide better contextualized features. In MPA-Net, the DCR block is used for feature extraction and full feature propagation. In addition, an innovative CSFF mechanism is introduced to realize comprehensive information exchange, so that the features across multi-resolution stages are progressively fused for improved representation learning. Experimental results on both synthetic and real-world rainy images demonstrate that our method considerably outperforms other state-of-the-art approaches.