1 Introduction

Many vision and multimedia systems rely on high-quality images or videos, e.g., object detection [30], object tracking [40] and autonomous driving [1]. However, images and videos captured in a rainy environment usually contain significant rainy streaks, which degrade the performance of most vision and multimedia tasks. Thus, it is necessary to develop algorithms that can automatically restore clear images from rainy images.

Image deraining has attracted much attention in the past years, and many methods have been proposed to solve this problem. The success of these algorithms is mainly due to the use of various image priors [2, 21, 25] or deep neural networks [9, 19, 20, 32, 35]. Mathematically, a rainy image can be modeled as a linear combination of a rain streak component and a clean background image:

$$\begin{aligned} O = B + R \end{aligned}$$
(1)

where O, B and R denote the rainy image, the clear image and the rainy streaks, respectively. As only the rainy image is available, this problem is highly ill-posed.

To make this problem well posed, numerous algorithms use prior knowledge about rainy streaks and clear images, e.g., the low-rank prior [2], sparse representation [25] and the Gaussian mixture model [21], to constrain the solution space. Although these algorithms achieve promising performance, the prior knowledge they rely on does not hold in some cases. Hence, more adaptive and efficient methods, which can deal with different kinds of rainy streaks, are needed.

Motivated by the success of convolutional neural networks (CNNs) in many computer vision tasks, e.g., object detection [30], object tracking [40], semantic segmentation [24], super-resolution [5, 6], style transfer [10, 14], deblurring [33] and dehazing [7, 18, 27, 34, 37, 38, 39], CNNs have been developed to solve image deraining [8, 9, 19, 20, 32, 35]. These deraining methods generally model the problem as a pixel-wise image regression process that directly learns to map an input rainy image to its clean counterpart or to a negative residual map with an end-to-end trainable CNN. Among them, [8, 9, 19, 20] proposed different network structures by considering the properties of rain or rain streak features. Instead of designing new network structures, [32, 35] incorporated rain streak detection or rain streak density estimation into the rain streak removal procedure. Although considerable progress has been made in comparison with traditional methods, existing algorithms [19, 20, 32, 35] usually use larger deep models that are nevertheless less effective and less efficient. For example, Fig. 1 shows the results of our method and other state-of-the-art deraining methods. We can see that our algorithm produces better deraining results with a small network, relying on the shared features obtained across the different levels of the pyramid network, whereas the other state-of-the-art methods are less efficient.

Moreover, spatial pyramid features have been applied to many vision tasks and achieve excellent results. The pyramid is usually obtained with max-pooling or mean-pooling operations. The methods in [3, 22, 26, 29] take advantage of the pyramid to improve performance on the corresponding vision problems. However, these algorithms share a common disadvantage: the parameters at different levels of the pyramid are independent, which enlarges the model size. Hence, in this paper, we explore the dependency between the levels of the proposed pyramid-based network, which helps to shrink the model size.

Fig. 1 Image deraining examples. The proposed algorithm is able to remove rain from the rainy images and generates better images with finer details. DDN [9] does not have a large receptive field, which results in residual rainy streaks. RESCAN [20] does not fuse the features at different levels among layers and therefore loses image details

To overcome these problems, we develop an effective deep neural network based on the feature pyramid for image deraining. Our algorithm is motivated by the observation that the features at different image pyramid levels share similar structures. By letting the deep models at different pyramid levels share the same weight parameters, the proposed deep model is able to remove rain streaks while having a smaller model size. To deal with complex rainy streaks and preserve the image details, we develop a multi-stream dilation convolution (MSDC), which provides a larger receptive field to capture more rain streak information, and dense connections, which maintain the important features from different levels, as shown in Fig. 1. Trained in an end-to-end manner, the proposed algorithm performs favorably against state-of-the-art methods in terms of accuracy and model size.

The contributions of this paper are summarized as follows:

  • We propose an effective deep neural network based on the feature pyramid for image deraining, where the deep models at different pyramid levels share the same weight parameters.

  • We develop a multi-stream dilation convolution and dense connections that can maintain the important features from different levels to deal with complex rainy streaks and preserve the image details.

  • We create a synthetic rainy Pascal VOC 2012 dataset to evaluate how much deraining methods improve the performance of high-level vision tasks. To the best of our knowledge, this is the first rainy evaluation dataset synthesized for high-level vision.

  • We show that the proposed algorithm is able to remove rain streaks and preserve image details. Quantitative and qualitative experimental evaluations on both synthetic datasets and real-world datasets demonstrate that the proposed algorithm outperforms the state-of-the-art methods.

2 Related work

In this section, we present a brief review of the recent related works.

2.1 Single image deraining

As aforementioned, the single image deraining methods can be grouped into two categories: prior-based methods and deep learning-based methods.

Prior-Based Methods: Prior-based methods were the earliest approaches to the deraining problem. Kang et al. [15] assumed that rain streaks were a high-frequency structure and separated the rain streaks by applying sparse coding to HOG features in the high-frequency layer. Kim et al. [16] directly regarded deraining as an image filtering problem and solved it by resorting to nonlocal mean smoothing. Luo et al. [25] proposed a discriminative sparse coding framework based on image patches and separated rain streaks from rain-free background images. Chen et al. [2] assumed that the rain streak layer was of low rank and utilized a generalized low-rank model to separate rain streaks. Li et al. [21] developed a Gaussian mixture model to derain using layer priors.

Deep Learning-Based Methods: Recently, several deep learning-based deraining methods have achieved great success. Fu et al. [8, 9] were the pioneers in applying deep learning techniques to single image deraining. They decomposed rainy images into low- and high-frequency parts, mapped the high-frequency parts to rain streaks with a deep residual network, and finally utilized Eq. 1 to obtain a clean image. Yang et al. [32] proposed a joint rain streak detection and removal method. Incorporating haze into the rain model, they applied a dehazing–deraining–dehazing algorithm to handle this complex situation. Li et al. [19] proposed a multi-scale nonlocal enhanced encoder–decoder network that mapped rainy images to clean images by learning the residual with a pixel-wise attention mechanism. Li et al. [20] recurrently utilized convolutional neural networks with dilation factors and squeeze-and-excitation [11] blocks to remove heavy rain streaks. Zhang et al. [35] proposed a multi-stream densely connected convolutional neural network that guides rain streak removal by estimating the rain density.

2.2 Pyramid network

Recently, conventional spatial pyramid approaches have been combined successfully with neural network architectures to deal with various vision tasks, and several networks are based on the spatial pyramid. Ranjan et al. [26] proposed a spatial pyramid network to estimate optical flow, in which large motions are estimated in a coarse-to-fine manner by warping one image. Instead of the standard minimization of an objective function at each pyramid level, they computed the flow update by training one deep convolutional neural network per pyramid level. Lin et al. [22] proposed a feature pyramid network for object detection, in which feature pyramids are constructed at marginal extra cost by using the pyramidal hierarchy of deep networks. Chen et al. [3] proposed a cascaded pyramid network for multi-person pose estimation by designing a global-net and a refine-net. In the deraining task in particular, Wang et al. [4] proposed a deep pyramid model to solve the image deraining problem, but they did not consider a sharing strategy, which leads to a bigger model size.

To sum up, although these methods achieve satisfying performance, they share a common characteristic: the parameters at different levels of the pyramid are independent. The feature similarity across pyramid levels is therefore not fully exploited, which results in heavyweight networks.

3 Proposed method

We develop an end-to-end convolutional neural network for single image deraining. It is a fully convolutional network, a type of model that has been shown to be able to learn complex pixel-wise mappings from a large number of input–output image pairs. The overall architecture of the proposed network is illustrated in Fig. 2. As features at different levels of the pyramid have similar structures (we discuss this in Sect. 5), we introduce a shared parameter strategy across the levels of the pyramid. Each level of the pyramid can learn different rain streak information, so that the overall network is encouraged to learn the most useful rain streak information.

Fig. 2 Proposed network framework. Blocks of the same color at the same positional layer of different pyramid levels share parameters. The inlet layer converts image space to feature space, i.e., \(F_{0}\). MSDC denotes the multi-stream dilation convolution shown in Fig. 3. The rain layer is the estimated rainy streaks, i.e., \(F_{i, \mathrm{rain}}\)

3.1 Overall network framework

The overall network framework is shown in Fig. 2. As rainy streaks have simpler structure than clear images, the network learns the map from rainy images to rainy streaks and obtains final clean images by utilizing Eq. 1. To obtain more spatial contextual information, we develop a multi-stream dilation convolution (MSDC) as our basic component which will be introduced in detail in Sect. 3.2. Moreover, in order to boost the information flow along with features from different levels, we use dense connections to connect these layers at the same level of the pyramid. Several MSDCs and dense connections make up our network.

Mathematically, we describe this overall network as follows:

$$\begin{aligned} F_{0} = \mathrm{Conv}_{3\times 3}(O), \end{aligned}$$
(2)

where O and \(F_{0}\) denote the input rainy image and the shallow features, respectively. \(\mathrm{Conv}_{3\times 3}\) denotes the convolution operation with a kernel size of \(3 \times 3\) pixels. This operation converts image space into feature space.

Then the original features at different levels of the pyramid can be acquired by using max-pooling operation on shallow features:

$$\begin{aligned} F_{i, 0} = \mathcal {P}_{i}(F_{0}), i = 1, 2, \ldots , 2^{K-1}. \end{aligned}$$
(3)

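Here, \(\mathcal {P}_i\) denotes the max-pooling operation whose stride and kernel size are both i. K denotes the number of pyramid levels, and \(F_{i, 0}\) denotes the input of the pyramid level with pooling factor i. For \(K = 3\), the levels correspond to \(i = 1, 2, 4\), i.e., the feature maps \(F_{1,0}\), \(F_{2,0}\) and \(F_{4,0}\) visualized in Fig. 16.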
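As an illustration of Eq. 3, the following minimal sketch builds the pyramid inputs from a toy single-channel feature map using plain Python lists. The function names `max_pool2d` and `build_pyramid` are hypothetical helpers introduced only for this example, and the sketch assumes that the spatial size is divisible by the largest pooling factor; the actual network operates on multi-channel feature tensors.

```python
def max_pool2d(feature, size):
    """Max-pool a 2-D list with kernel size and stride both equal to `size`."""
    height, width = len(feature), len(feature[0])
    pooled = []
    for top in range(0, height - size + 1, size):
        row = []
        for left in range(0, width - size + 1, size):
            window = [feature[top + dy][left + dx]
                      for dy in range(size) for dx in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


def build_pyramid(feature, num_levels):
    """Return {i: P_i(feature)} for i = 1, 2, 4, ..., 2**(num_levels - 1)."""
    return {2 ** k: max_pool2d(feature, 2 ** k) for k in range(num_levels)}


# Toy 4x4 "shallow feature" F_0 (values are arbitrary, for illustration only).
F0 = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]

pyramid = build_pyramid(F0, num_levels=3)  # keys 1, 2 and 4, as for K = 3
```

With pooling factor 1 the feature map is unchanged, so the first level always works at the full resolution, while the coarser levels see progressively smaller maps.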

Dense connections are used to connect MSDCs densely to boost the information flow and preserve the image details:

$$\begin{aligned} F_{i, l} = \mathcal {M}_{l}(\mathrm{Conv}_{1\times 1}(\mathcal {C}[F_{i, l-1}, \ldots , F_{i, 0}])), \end{aligned}$$
(4)

where \(\mathcal {C}\) denotes the concatenation operation and \(\mathrm{Conv}_{1\times 1}\) denotes the convolution operation with a kernel size of \(1 \times 1\) pixels. \(\mathcal {M}_{l}\) denotes the lth MSDC operation (Fig. 3), which is described in detail in Sect. 3.2. \(F_{i, l}\) denotes the output of the lth MSDC operation at the ith level of the pyramid, with \(l = 1, 2, \ldots , L\).

The rainy streak layer \(F_{i, \mathrm{rain}}\) is obtained by concatenating the outputs of all MSDCs, so that features from different layers are combined:

$$\begin{aligned} F_{i, \mathrm{rain}} = \mathrm{Conv}_{1\times 1}(\mathcal {C}[F_{i, L}, \ldots , F_{i, 0}]), \end{aligned}$$
(5)

The final estimated rain streak layer \({\tilde{R}}\) is obtained by upsampling the rain streak layers of all pyramid levels to the original size of the rainy image and fusing them:

$$\begin{aligned} {\tilde{R}} = \mathrm{Conv}_{1\times 1}(\mathcal {C}[S_{1}(F_{1, \mathrm{rain}}), \ldots , S_{2^{K-1}}(F_{2^{K-1}, \mathrm{rain}})]), \end{aligned}$$
(6)

where \(S_{i}\) denotes the upsampling operation with the scale factor i.

Finally, we obtain the estimated rain-free image \({\tilde{B}}\) via Eq. 1:

$$\begin{aligned} {\tilde{B}} = O - {\tilde{R}}, \end{aligned}$$
(7)
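
The fusion in Eqs. 6 and 7 can be sketched on toy data as follows. This is only an illustration under stated assumptions: nearest-neighbour interpolation is assumed for \(S_{i}\) (the upsampling method is not specified here), each level's rain layer is reduced to a single channel, and the learned \(1 \times 1\) convolution is replaced by fixed, hypothetical weights. The helper names are likewise hypothetical.

```python
def upsample_nearest(feature, scale):
    """S_i: repeat every pixel `scale` times along both axes (assumed interpolation)."""
    return [[value for value in row for _ in range(scale)]
            for row in feature for _ in range(scale)]


def fuse_levels(rain_maps, weights, bias=0.0):
    """A 1x1 convolution over concatenated single-channel maps is a per-pixel weighted sum."""
    scales = sorted(rain_maps)
    upsampled = [upsample_nearest(rain_maps[s], s) for s in scales]
    height, width = len(upsampled[0]), len(upsampled[0][0])
    return [[bias + sum(w * up[y][x] for w, up in zip(weights, upsampled))
             for x in range(width)]
            for y in range(height)]


def derain(rainy, rain):
    """Eq. 7: subtract the estimated rain layer from the rainy image."""
    return [[o - r for o, r in zip(o_row, r_row)]
            for o_row, r_row in zip(rainy, rain)]


# Toy rain layers at pooling factors 1, 2 and 4 (values are illustrative only).
rain_maps = {
    1: [[1.0] * 4 for _ in range(4)],
    2: [[1.0, 0.0], [0.0, 1.0]],
    4: [[1.0]],
}
weights = [0.5, 0.25, 0.25]  # hypothetical stand-in for learned 1x1 conv weights

R_tilde = fuse_levels(rain_maps, weights)
O = [[2.0] * 4 for _ in range(4)]  # toy rainy image
B_tilde = derain(O, R_tilde)
```

Every level contributes to each output pixel after upsampling, which is why coarse levels can supply context for large streaks while the finest level keeps the detail.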

Fig. 3 Multi-stream dilation convolution (MSDC)

Table 1 Quantitative experiments evaluated on three synthetic datasets

Fig. 4 Several examples in our created synthetic rainy dataset on Pascal VOC2012. They have dozens of different sizes, shapes and directions

3.2 Multi-stream dilation convolution

As the spatial contextual information is important for single image deraining [12], we use the multi-stream dilation convolution to capture the important features at different image pyramid levels. For large rainy streaks, the large receptive field is needed to capture the information, while small rainy streaks can be estimated well by a smaller receptive field. Based on this fact, we develop a multi-stream dilation convolution (MSDC) to achieve this goal. The detailed architecture is shown in Fig. 3.

Fig. 5 An example from the synthetic datasets compared with prior-based deraining methods. The proposed method generates a much clearer image

Fig. 6 Several examples from the synthetic datasets compared with deep learning-based deraining methods

Fig. 7 A real-world example compared with prior-based deraining methods. The proposed method is able to remove rain and generates a much clearer image

Fig. 8 Deraining results on real-world images. The proposed method is able to remove rain and generate much better results. Note that DID [35] applies a refinement step after deraining; nevertheless, significant rain streaks remain in its restored images

Fig. 9 More of our deraining examples

The MSDC operation can be represented as:

$$\begin{aligned} D_{r} = \mathrm{Conv}_{r} (I), r = 1, 3, 5, \end{aligned}$$
(8)

where I and \(D_{r}\) denote the input feature and the corresponding output, respectively, and r denotes the dilation factor of the \(3 \times 3\) convolution.
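
To make the effect of the dilation factor concrete, recall the standard relation between the kernel size k, the dilation factor r and the effective spatial extent \(k_{\mathrm{eff}}\) covered by a single dilated convolution (the symbol \(k_{\mathrm{eff}}\) is introduced here only for illustration):

$$\begin{aligned} k_{\mathrm{eff}} = k + (k - 1)(r - 1) = 2r + 1 \quad \text {for } k = 3. \end{aligned}$$

For \(r = 1, 3, 5\), the three streams therefore cover windows of \(3 \times 3\), \(7 \times 7\) and \(11 \times 11\) pixels, respectively, while each stream still uses only nine weights per input–output channel pair.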

To effectively learn the rainy streak information, we fuse the outputs of different streams by

$$\begin{aligned} D_{k,j} = \sigma (\mathrm{Conv}_{1 \times 1}(\mathcal {C}[D_{k}, D_{j}])), \end{aligned}$$
(9)

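where \(D_{k,j}\) denotes the fusion output of the streams with dilation factors k and j, with \((k, j) \in \{(1, 3), (3, 5)\}\) as used in Eq. 10, and \(\sigma \) denotes the activation function. Here, we select LeakyReLU with \(\alpha = 0.2\) as \(\sigma \).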

Finally, the output of MSDC is:

$$\begin{aligned} \mathcal {MSDC} = \mathrm{Conv}_{1 \times 1}(\mathcal {C}[D_{1, 3}, D_{3, 5}]), \end{aligned}$$
(10)

3.3 Loss function

To train the proposed network, we use the mean squared error as the loss function, which is defined as:

$$\begin{aligned} \mathcal {L} = \frac{1}{HWC}\sum _{h = 1}^{H}\sum _{w = 1}^W\sum _{c = 1}^C\Vert {\tilde{B}}_{h,w,c}-{B}_{h,w,c}\Vert _{2}^{2}, \end{aligned}$$
(11)

where H, W and C denote the height, width and channel number of a rain-free image, respectively; \({\tilde{B}}\) and B denote the estimated clean image and the ground truth image, respectively.
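
A minimal sketch of Eq. 11 on nested Python lists is given below. The function name `mse_loss` and the toy values are assumptions made for illustration; in practice the loss would be computed on tensors by the training framework.

```python
def mse_loss(estimated, target):
    """Mean squared error over all H x W x C entries of two nested lists."""
    total, count = 0.0, 0
    for est_row, tgt_row in zip(estimated, target):
        for est_pixel, tgt_pixel in zip(est_row, tgt_row):
            for est, tgt in zip(est_pixel, tgt_pixel):
                total += (est - tgt) ** 2
                count += 1
    return total / count


# Toy 1 x 2 x 3 images (H = 1, W = 2, C = 3); values are illustrative only.
B_tilde = [[[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]]
B = [[[0.0, 0.5, 1.0], [1.0, 1.0, 1.0]]]

loss = mse_loss(B_tilde, B)  # (0.25 + 0 + 0.25 + 0 + 0 + 0) / 6
```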

Table 2 Quantitative experiments evaluated on three synthetic datasets compared with NLEDN [19]

Fig. 10 Real-world examples compared with NLEDN [19]. Our method achieves better deraining performance than NLEDN [19] with about 97% fewer parameters

4 Experimental results

In this section, we demonstrate the effectiveness of the proposed method by conducting various experiments on three synthetic datasets and a real-world dataset. All the results are compared with six state-of-the-art methods: DSC [25] (ICCV15), LP [21] (CVPR16), DDN [9] (CVPR17), JORDER [32] (CVPR17), RESCAN [20] (ECCV18) and DID [35] (CVPR18).

4.1 Datasets and evaluation criteria

4.1.1 Synthetic datasets

We conduct deraining experiments on three widely used synthetic datasets: Rain100L [32], Rain100H [32] and Rain1200 [35]. These three datasets include various rain streaks that have different sizes, shapes and directions. Rain100H and Rain100L each have 1800 images for training and 200 images for testing. Rain1200 has 12000 images for training and 1200 images for testing. All the testing sets have background images different from those of the training sets. We select Rain100H as the dataset for our ablation study.

4.1.2 Real-world datasets

Zhang et al. [36] and Yang et al. [32] also provide some real-world images. We use these images to evaluate the robustness of our method on real-world data.

4.1.3 Our created rainy Pascal VOC2012 dataset

We are the first to create a rainy Pascal VOC2012 dataset to evaluate how much deraining methods improve the performance of high-level vision tasks. The synthetic rainy images contain dozens of different rainy streaks that vary in size, shape and direction. Several samples are shown in Fig. 4.

4.1.4 Evaluation criteria

We use the peak signal-to-noise ratio (PSNR) [13] and the structural similarity index (SSIM) [31] to evaluate the quality of the restored images on synthetic datasets. As there are no ground truth images for real-world images, we only show visual comparisons on the real-world datasets.
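
For reference, PSNR follows directly from the mean squared error between the restored image and the ground truth. The sketch below uses the standard definition on flat lists of pixel values; the function name `psnr` and the assumption that intensities lie in [0, 1] (`max_value=1.0`) are choices made for this example, and the evaluation protocol of [13] may differ in details such as the color space used. SSIM is omitted because it requires local window statistics.

```python
import math


def psnr(estimated, target, max_value=1.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(estimated) != len(target) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((e - t) ** 2 for e, t in zip(estimated, target)) / len(estimated)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 0.1, so the MSE is about 0.01.
restored = [0.5, 0.5, 0.5, 0.5]
ground_truth = [0.4, 0.6, 0.4, 0.6]
score = psnr(restored, ground_truth)  # roughly 20 dB
```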

4.2 Experimental settings

Table 3 Ablation study on basic component

Fig. 11 Comparisons of the results by different baseline models. The proposed method generates a much better image as shown in (e)

Table 4 Results on different number of levels and dilation convolutions

We empirically set \(L = 12\) and \(K = 3\), and set the number of channels to 8. We use LeakyReLU with \(\alpha = 0.2\) as the nonlinear activation function. We randomly crop image patches of \(128 \times 128\) pixels from the training images as the inputs and set the mini-batch size to 10 to train the network. The ADAM [17] optimizer is used. The learning rate is initialized to 0.001 and is divided by 10 at 240K and 320K iterations. We train the network for 400K iterations on a PC with an NVIDIA GTX 1080Ti GPU. As our entire model is fully convolutional, testing takes only 0.024 seconds for a test image of \(512 \times 512\) pixels on a PC with a GTX 1080Ti GPU.
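
The step learning-rate schedule described above can be written as a small function. This is a sketch with a hypothetical name, `learning_rate`, and it assumes that each division takes effect at the milestone iteration itself, which the text does not state explicitly.

```python
def learning_rate(iteration, base_lr=1e-3, milestones=(240_000, 320_000), factor=10.0):
    """Step schedule: divide the rate by `factor` at every milestone already reached."""
    lr = base_lr
    for milestone in milestones:
        if iteration >= milestone:  # assumption: the drop applies at the milestone itself
            lr /= factor
    return lr


# Learning rate at a few points of the 400K-iteration training run.
schedule = {it: learning_rate(it) for it in (0, 239_999, 240_000, 320_000, 399_999)}
```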

4.3 Results on synthetic datasets

We compare our proposed network with six state-of-the-art methods, including two prior-based methods, DSC (ICCV15) [25] and LP (CVPR16) [21], and four deep learning-based methods, DDN (CVPR17) [9], JORDER (CVPR17) [32], RESCAN (ECCV18) [20] and DID (CVPR18) [35]. The results are shown in Table 1, where the corresponding numbers of parameters are also specified. We can observe that our method, with the fewest parameters among the compared methods, achieves performance comparable to or better than that of the state-of-the-art methods. Note that the number of parameters is reduced by 91 percent compared with the newest state-of-the-art method, DID [35]. Our network is trained without any auxiliary labels, whereas JORDER [32] and DID [35] use the rainy streak mask and the rainy streak density, respectively, as labels to guide the training. We also show the results of the lightweight version of our model, i.e., \(L_{10}C_{6}\). This model has even fewer parameters, while its performance decreases only slightly and remains comparable with, and in some cases even surpasses, that of other state-of-the-art methods. Further, we also provide the heavyweight version of our model, i.e., \(L_{14}C_{10}\). This model outperforms all state-of-the-art methods, while its number of parameters is almost the same as that of DDN [9] and RESCAN [20] and much smaller than that of JORDER [32] and DID [35].

Table 5 Results on different number of channels and MSDCs

We also provide several examples for visual comparison. Figure 5 shows the results compared with prior-based methods [21, 25]. Our result is clearly the best, while the results of the prior-based methods are of much lower quality.

We further compare with deep learning-based methods in Fig. 6. It can be observed that our results, shown in Fig. 6f, consistently preserve clearer texture information and have fewer artifacts. The other results either retain residual rain streaks, e.g., Fig. 6b and c, or leave more artifacts, e.g., Fig. 6d and e.

4.4 Results on real-world datasets

To verify the robustness, we compare our algorithm with state-of-the-art methods on real-world datasets. Firstly, we present one example compared with prior-based methods [21, 25], illustrated in Fig. 7. The results of the other methods retain a large number of rainy streaks, while ours is the clearest and cleanest. Secondly, we display three examples compared with deep learning-based methods [9, 20, 32, 35], shown in Fig. 8. Our results, shown in Fig. 8f, have the fewest artifacts. For the first example, the results of JORDER [32] (Fig. 8b), DDN [9] (Fig. 8c) and DID [35] (Fig. 8e) contain many residual rainy streaks, while the result of RESCAN [20] (Fig. 8d) leaves many artifacts. For the second example, our result removes almost all the rain streaks, while the results of DDN [9], RESCAN [20] and DID [35], shown in Fig. 8c, d and e, respectively, retain some rain streaks. Please note that DID [35] applies a refinement step after deraining, which can be seen as a dehazing procedure, so its results look fog-free, whereas fog remains in our result because we do not apply any post-processing. For the third example, our method preserves better texture information in the boxed regions, while JORDER [32] and RESCAN [20] lose detail information. Moreover, DDN [9] and DID [35] retain some rain streaks, whereas our method is able to remove all rain streaks and obtain the cleanest image.

We provide more of our deraining examples in Fig. 9. It can be seen that our method is able to handle various rain streaks and generate better rain-free images.

Table 6 Results on object detection (top) and semantic segmentation (bottom)

Fig. 12 Visual examples of object detection and semantic segmentation on the synthetic dataset

Fig. 13 Visual examples of object detection and semantic segmentation on the real-world dataset

4.5 Comparison with NLEDN

In particular, we compare our method with NLEDN [19] on synthetic datasets in Table 2. We note that our results are comparable with NLEDN, while the parameters are drastically reduced. Our model only has 32,075 parameters, while NLEDN has more than 1,000,000 parameters.
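
As a quick check of the size of this reduction, using the parameter counts quoted above and treating 1,000,000 as a lower bound on the size of NLEDN:

$$\begin{aligned} 1 - \frac{32{,}075}{1{,}000{,}000} \approx 0.968, \end{aligned}$$

i.e., our model uses at least about 96.8% fewer parameters than NLEDN, which is consistent with the roughly 97% reduction quoted in the caption of Fig. 10.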

Moreover, we also provide several examples on the real-world dataset compared with NLEDN [19], shown in Fig. 10. We can observe that our method is able to generate better and clearer deraining results using only about 30,000 parameters, while NLEDN leaves a number of rain streaks and has more than 1,000,000 parameters.

Fig. 14 Several examples of deraining as preprocessing for semantic segmentation

Fig. 15 Several examples of deraining as preprocessing for object detection

Fig. 16 Visualizations of the feature maps \(F_{1,0}, F_{2,0}\) and \(F_{4,0}\) (from top to bottom) under the shared parameter setting. Our method is motivated by the observation that the features at different pyramid levels share similar structures. It can be observed that they indeed have similar structures at different pyramid levels

4.6 Ablation study

As our network consists of multi-stream dilation convolution, dense connections and multi-level shared pyramid, it is meaningful to discuss their effectiveness on image deraining. For simplicity, we use the following abbreviations for the baseline methods.

  • Single: one-level pyramid network.

  • No dense: our proposed network without dense connections.

  • No dilation: our proposed network without dilation convolution.

  • Ours: our proposed shared pyramid network that is of 3 levels.

The results are shown in Table 3. Compared with the one-level network, our sharing strategy improves the PSNR and SSIM by 0.31 dB and 1%, respectively, while the number of parameters barely changes. This demonstrates that our shared parameter strategy boosts the learning process across different levels of the pyramid, so we believe that this strategy deserves wider adoption. In addition, the dense connections and the dilation convolution also improve the performance. In particular, the dense connections greatly improve the expressive ability of the model. We also provide one example for visual comparison, shown in Fig. 11. Our proposed method obtains the highest PSNR and SSIM and has fewer artifacts than the other baselines.

4.7 Effect of the pyramid levels and the numbers of dilation convolution

It is worth exploring the effect of the number of pyramid levels (K) and the number of dilation convolutions in the MSDC (D). Table 4 shows the results for different combinations of the two. It can be seen that, for the same number of dilation convolutions, the results barely change with the number of pyramid levels. In contrast, for the same number of pyramid levels, the results change noticeably with the number of dilation convolutions. Although there are fewer parameters when \(D = 2\), the results are unsatisfactory. When \(D = 4\), the performance is better, but there are too many parameters. We select \(K = 3\) and \(D = 3\) as our network setting, because it has few parameters and gives satisfying results.

4.8 Analysis on the model size

In this section, we evaluate the effect of the model size. We show the results in Table 5. It can be observed that the performance of our method improves as the number of channels and the length of the network increase, while the number of parameters increases substantially. For the lightweight network, i.e., \(L = 10\) and \(C = 6\), the result is still comparable with that of other state-of-the-art methods, as can be seen by comparing with Table 1. Further, the results are far better than those of all state-of-the-art methods when the network is heavyweight, i.e., \(L = 14\) and \(C = 10\). We select \(L = 12\) and \(C = 8\) as our network setting; in this case, the network has fewer parameters and the results are satisfactory.
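
To give a feel for how the model size scales with L and C, and for why sharing weights keeps it nearly independent of the number of pyramid levels, the following rough estimate counts convolution parameters under explicit assumptions that are not confirmed by the text: every convolution has a bias, each MSDC consists of three dilated \(3 \times 3\) convolutions (C to C channels), two \(1 \times 1\) fusion convolutions and one \(1 \times 1\) output convolution (each 2C to C channels), each rain layer outputs three channels, and the final fusion maps 3K channels to three. All function names are hypothetical.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of one convolution layer (bias assumed)."""
    return c_in * c_out * kernel * kernel + c_out


def msdc_params(channels):
    """One MSDC under the assumed layout: 3 dilated convs, 2 fusions, 1 output conv."""
    dilated = 3 * conv_params(channels, channels, 3)
    fusion = 2 * conv_params(2 * channels, channels, 1)
    output = conv_params(2 * channels, channels, 1)
    return dilated + fusion + output


def pyramid_params(num_msdc, channels, num_levels=3, image_channels=3, shared=True):
    """Estimated parameter count of the whole network under the stated assumptions."""
    inlet = conv_params(image_channels, channels, 3)
    # Dense connection before the l-th MSDC: 1x1 conv from l*C to C channels.
    dense = sum(conv_params(l * channels, channels, 1)
                for l in range(1, num_msdc + 1))
    blocks = num_msdc * msdc_params(channels)
    rain = conv_params((num_msdc + 1) * channels, image_channels, 1)
    per_level = dense + blocks + rain
    copies = 1 if shared else num_levels  # independent levels each need their own weights
    final = conv_params(num_levels * image_channels, image_channels, 1)
    return inlet + copies * per_level + final


# The three configurations discussed in this section, as (L, C).
estimates = {(L, C): pyramid_params(L, C) for (L, C) in [(10, 6), (12, 8), (14, 10)]}
unshared_default = pyramid_params(12, 8, shared=False)
```

Under these assumptions the default setting (\(L = 12\), \(C = 8\)) comes out at 31,577 parameters, close to but not identical with the 32,075 reported in Sect. 4.5, so the real layer layout presumably differs in some details. The same estimate without sharing is about three times larger, which illustrates the benefit of the shared parameter strategy.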

4.9 Applications on high-level vision tasks

Most CNN-based models for high-level computer vision tasks, such as object detection and semantic segmentation, are trained on images captured under good conditions. Therefore, rainy streaks decrease the performance of these tasks. Figure 12a shows that under a rainy condition, SSD [23] fails to detect one of the planes, and FCN [28] cannot segment the planes. We created a Rainy Pascal VOC 2012 dataset, in which the images are synthesized with various rainy streaks of different sizes, directions and shapes. Using our deraining method as a preprocessing step for SSD and FCN, we conduct experiments on this synthetic dataset, and the results are shown in Table 6. The mAP of SSD on the rainy validation set is 0.66, and after deraining, the mAP is improved to 0.74. For FCN, the mean IU (mIU) is improved from 0.311 to 0.466 on the validation set. We also illustrate the results of detection and segmentation on synthetic and real-world images in Figs. 12 and 13, respectively.

We also provide more examples for the applications of object detection and semantic segmentation in Figs. 14 and 15. These high-level tasks almost fail on the rainy inputs, while our method removes the rain streaks and improves their accuracy by a large margin. It is therefore meaningful to develop better deraining algorithms.

5 Visualization of feature maps

Our method is motivated by the observation that the features at different pyramid levels share similar structures. As shown in Fig. 16, the features at different pyramid levels indeed have similar structures under the shared parameter setting. This supports the design of our algorithm: learning the features at different pyramid levels with shared parameters can boost the deraining performance.

6 Conclusions

In this paper, we have proposed a deep neural network based on a feature pyramid to solve image deraining. The proposed deep models at different feature pyramid levels share the same weight parameters. We further developed a multi-stream dilation convolution to deal with complex rainy streaks and used dense connections to maintain the important features from different levels. Trained in an end-to-end manner, the proposed method performs favorably against state-of-the-art deraining methods in terms of accuracy as well as model size.

Different loss functions may influence the deraining results, and we will explore them in the future. Moreover, we only use a simple rain model to solve the deraining task, while more complex rain models have been proposed. We will investigate more effective rain models and their underlying principles to improve the deraining performance.