1 Introduction

The advent of low cost technology in the field of video capture systems has made it easier for various organizations to adopt surveillance technology. However, a major challenge with surveillance videos is the detrimental effect of bad weather conditions. Bad weather manifests itself in videos in the form of low lighting, blurred scene content, highly saturated regions of illumination, etc. Different kinds of weather conditions affect the captured video in different ways. Narasimhan and Nayar (2002) broadly classify weather conditions into two categories, static and dynamic, based on the size of the particles that cause a particular type of weather. Larger particles are visibly affected by gravity and hence cause a large change in the video over a sequence of frames. Static weather conditions are caused by particles that are very small (less than \(10 \upmu m\)) and manifest themselves as haze, fog or cloud. In the case of dynamic weather conditions, particles are larger than \(100 \upmu m\). Rain and snow are examples of dynamic weather conditions, and the changes in video over subsequent frames are very pronounced.

In this research, the focus is on removing rain streaks from videos. The objective is to study the characteristics of rain streaks in video and thus devise a framework for accurate reconstruction of a scene by removing all the rain streaks. The process to remove rain involves two steps: (1) detect the presence of rain, i.e., locate where rain streaks appear in a frame of video, and (2) estimate the actual intensity of a pixel that is affected by rain. In order to detect a rain streak, the primary challenge is to characterize rain streaks that appear in video in terms of edge information, chromatic properties and spatio-temporal behavior. The challenges associated with scene reconstruction are preserving the edges of the actual scene, maintaining temporal motion smoothness and preventing loss of useful information.

In the first part of this research, a feature based framework for detecting rain streaks is developed. In the second part, an effective method for compensation of rain affected pixels is developed. Figure 1 gives a brief overview of the complete framework for rain streak detection and removal from video.

Fig. 1 Architecture of the proposed framework to remove rain from video

In the method for rain streak detection, frames in the video are aligned using global phase correlation to eliminate the effect of global motion in the scene caused by movement of the camera. Variations in the scene from one frame to the next are captured using phase congruency features, providing a set of candidate rain pixels. The number of false detections is reduced by the application of a chromatic constraint. The final output of the rain streak detection algorithm is the set of rain streaks present in the scene along with certain false detections caused by the local motion components in the scene. The second part of the framework reconstructs the actual scene in a robust manner by eliminating the effect of false detections due to local motion and obtaining the best possible estimate of the actual scene intensity. The algorithm utilizes information embedded in the rain affected pixel, information from its spatial neighbors and information from its temporal neighbors for the estimation process.

The proposed framework addresses the problem of removing rain by giving equal emphasis to rain streak detection and scene reconstruction. The aim is for both parts of the framework to complement each other to produce a better solution. The constraints involved in the detection of rain streaks may therefore be relaxed so that all the rain streaks are detected, even at the cost of an increased number of false detections. The second part of the framework ensures that these false detections do not degrade the quality of the resultant video.

1.1 Previous Work

Most of the existing methods to remove rain in the field of computer vision were reviewed by Tripathi and Mukhopadhyay (2012a). Rain streak detection has been the focus of attention for most methods in the current literature.

Since rain streaks do not occlude a scene at all times, the logical step is to filter each pixel in the temporal direction to obtain an estimate of the actual scene intensity. The initial approach towards removing rain was the use of a temporal median filter, as done by Starik and Werman (2003) and Hase et al. (1999). The method was successful when there was no motion component associated with the video, in the global or local sense. Zhang et al. (2006) exploited the spatio-temporal and chromatic properties of rain streaks to detect rain streaks in video. A temporal histogram was constructed for every pixel, from which a decision was made on whether a pixel is part of a rain streak or the background. The main disadvantage is that the technique requires a histogram to be constructed and therefore needs at least fifty frames. Shen and Xue (2011) proposed a fast optical flow based method to detect rain. A three-dimensional anisotropic diffusion method is used to estimate the background scene intensity from the spatio-temporal neighbors. The detection method is unreliable because the optical flow components are likely to be computed incorrectly for small regions. Another method utilizing spatio-temporal properties of rain was proposed by Xue et al. (2012). Spatial features, wavelet features and motion constraints were combined to increase the accuracy of rain streak detection. The major disadvantage of this technique is the use of a bilateral filter to eliminate false detections; it is highly probable that large streaks are not eliminated by the filter. The image inpainting technique used for reconstructing the scene would also be incapable of compensating for rain affected pixels in complex scenes and under heavy rain. Park and Lee (2008) modeled the variation of intensity at a pixel using a Kalman filter. However, this method fails when a motion component is present in the scene. A method based predominantly on the shape properties of rain streaks was presented by Brewer and Liu (2008). The most recent research on removing rain streaks was presented by Kim et al. (2013). The paper presents a method to detect rain streaks in a single image by characterizing the shape of a streak, which is assumed to be an elongated ellipse. The authors used kernel regression to detect rain streaks in an image. The detected rain affected pixels are reconstructed using a non-local means filter. The method works well for well defined rain streaks with a definite shape. However, streaks can be heavily blurred, which affects the performance of the algorithm. The reconstruction technique utilizes information from the same image, and in the case of heavy rainfall it would not find enough information for reconstruction.

Garg and Nayar (2007) developed models for rain streaks based on the physical and photometric properties of raindrops. They used these models to detect rain streaks and to remove them from videos. The main assumptions were a uniform raindrop size and an equal raindrop velocity, and the variation in depth was not taken into consideration. This became a problem when removing rain from videos that contained heavy rain, and the process of estimating the background intensity was not sufficient in regions with rapid movement. Barnum et al. (2010) performed a frequency space analysis of rain and snow affected videos. They modeled rain and snow in the frequency space based on the statistical properties of rain and snow streaks, with each rain streak assumed to be a blurred Gaussian. The model was unsuccessful in eliminating blurred streaks from video.

Bossu et al. (2011) segmented candidate rain streaks from the foreground using Gaussian Mixture Models and by applying constraints based on the shape and size characteristics of rain streaks. The method utilized the property of uniform direction of rain streaks to create a histogram of oriented streaks (HOS) for reducing false detections. Tripathi and Mukhopadhyay (2011) developed a Bayesian framework to detect rain. The method was extended to remove rain from video by Tripathi and Mukhopadhyay (2012b).

A learning based method to remove rain from a single image was proposed by Kang et al. (2012a). Rain streaks were detected using a method based on Morphological Component Analysis (MCA) and dictionary learning. The method uses a bilateral filter to obtain the initial set of candidate rain pixels, and the bilateral filter could miss large and blurred streaks. The method was extended with a self-learning mechanism by Kang et al. (2012b). Context information in a scene was used to assist rain streak detection as an improvement of this method by Huang et al. (2012).

In almost all of the aforementioned methods, the main focus has been on improving the process of detecting rain streaks. Detection alone limits the ability of these algorithms to remove all the rain streaks present in the video, since blurred rain streaks are mostly missed by detection techniques. This motivated the development of the robust reconstruction technique proposed in this paper. By making use of a detection process that does not miss any rain streaks, more emphasis is laid on the design of the reconstruction process to compensate for rain affected pixels while preserving the details present in false detections. The following contributions are made in this paper: (1) development of a novel technique for rain streak detection based on phase congruency features, and (2) development of a robust reconstruction technique to remove rain from video by compensating for rain affected pixels.

The paper is organized as follows. Section 2 provides an overview of the characteristics of rain streaks in video. The method for detecting rain streaks is presented in Sect. 3. The proposed technique for reconstruction to remove rain streaks is given in Sect. 4. In Sect. 5, the experimental procedures are explained. The paper concludes in Sect. 6.

2 Characteristics of Rain Streaks in Video

A set of generalized characteristics can be inferred from the various raindrop models discussed in the previous section. These characteristics form the foundation for the framework developed for detecting rain streaks. In addition to the elaborate research presented by Garg and Nayar (2003) regarding the properties of rain, some significant characteristics of rain are presented by Zhang et al. (2006). Some of the essential properties of rain streaks in video are explained in this section.

2.1 Temporal Property

The human eye is able to see through rain largely because no part of the scene is occluded by rain at all instances. The removal of rain streaks, which can be considered dynamic components that vary from frame to frame, is the focus of this research. As the depth of view increases, rain streaks are no longer separately visible and the image enhancement problem becomes equivalent to haze removal. Previous research by Garg and Nayar (2007) has shown that the pixel intensity increases sharply when rain occludes a scene. This is because the resultant intensity of any raindrop in an image is the result of the radiances due to refraction, specular reflection and internal reflection. In the case of heavy rain, the intensity tends to remain high in comparison to the actual scene intensity. Therefore, more neighboring frames may be required to compensate for the rain affected pixels. This is the case where considering one frame before and after the current frame becomes insufficient to estimate the intensity of the background.

2.2 Chromatic Property

Garg and Nayar (2007) showed that a raindrop refracts a wide range of light, causing an increase in intensity at a particular pixel. Zhang et al. (2006) showed that the changes in the individual color components of a pixel due to rain are the same. Assume that the changes in the color components red (R), green (G) and blue (B) are \(\Delta R\), \(\Delta G\), and \(\Delta B\) respectively. It is observed that the means of \(\Delta R\), \(\Delta G\), and \(\Delta B\) over any spatial neighborhood are the same. For the same neighborhood, the standard deviations of \(\Delta R\), \(\Delta G\), and \(\Delta B\) are also the same.

2.3 Directional Property

Another observation, utilized by Garg and Nayar (2004), is the directional property of rain in videos. If rain is present in a frame, all the rain streaks will be oriented in a similar direction. They computed the correlation between neighboring pixels to detect rain affected pixels. This property is used in the method for detecting rain streaks, where it is embedded in the calculation of phase congruency.

3 Detecting Rain Using Phase Congruency Features

From a human visual perspective, rain streaks are sensed because of the rapid changes from frame to frame. Therefore, by finding the difference between two successive frames, it is possible to identify candidate rain-affected pixels. The number of non-rain pixels detected as rain increases with the amount of motion, either local or global, associated with the frame. It is necessary to detect those features that are significantly altered from frame to frame in terms of visual perception. By incorporating phase based edge feature detection into the algorithm, it is observed that rain streaks can be detected effectively.

3.1 Significance of Phase Information

The importance of the phase information of an image was illustrated by Oppenheim and Lim (1981). In the case of a scene affected by rain, the dominant structural information between frames remains mostly the same. This makes phase based correlation techniques suitable for registering two frames that are affected by rain. Previous research by Mechler et al. (2002) indicates that the human feature detection mechanism tends to be more aligned towards regions of phase congruency. Phase based information is robust towards changes in illumination as well, with the local intensities acting as a confidence measure for the reliability of sensing. Phase based reconstruction has also been found to be much better in terms of perception. Recent research by Wadhwa et al. (2013) on the magnification of videos using phase processing has achieved very good results.

In the context of rain streak detection, the spatial variations are much more localized. These localized bright streaks need to be extracted from a frame irrespective of the illumination in the local neighborhood. Therefore, phase congruency based features are used to detect rain streaks. Any kind of noise that causes intensity variations in a scene can be captured using phase congruency features irrespective of the local illumination component. With the help of the oriented filter in the feature computation technique, edges in a particular orientation can be isolated to utilize the directional property of rain streaks.

3.2 Phase Congruency

The principal reason that humans are able to visually recognize individual rain streaks in a particular frame is that there is a step change in intensity along the edge of the rain streak. Phase congruency (PC) is a feature detection mechanism that recognizes those edges and is invariant to illumination and contrast. The key observation that led to the development of the phase congruency algorithm is that the Fourier components of an image are maximally in phase where there are edges or lines. Features are identified according to the extent to which the Fourier components are in phase.

3.2.1 Choice of Band-Pass Filter

In order to extract phase information from images, the first step is to convolve the two dimensional signal with a pair of quadrature filters. Gabor filters have been very popular as band pass filters, especially in the case of phase-based information extraction, as they provide high localization in terms of space and frequency. However, the Gabor filter is not very efficient when the information to be extracted is spread over a broad spectrum and spatial localization is required. Rain streaks exhibit exactly this behavior, as shown by Barnum et al. (2010).

The log-Gabor filter by Field (1987) overcomes the aforementioned drawbacks of the Gabor filter and is adopted as the band-pass filter in this research. In frequency space, the filter can be represented as in (1).

$$\begin{aligned} G(\omega )=\exp \left( \frac{-(\log (\omega /\omega _0))^2}{2(\log (k/\omega _0))^2}\right) \end{aligned}$$
(1)

where \(\omega _0\) is the filter’s center frequency and \(k/\omega _0\) is kept constant for various \(\omega _0\). The cross-section of the transfer function of the filter can be represented as in  (2).

$$\begin{aligned} G(\theta )=\exp \left( \frac{-(\theta -\theta _0)^2}{2\sigma _\theta ^2}\right) \end{aligned}$$
(2)

where \(\theta _0\) represents the orientation of the filter and \(\sigma _\theta \) is the standard deviation of the Gaussian spreading function in the angular direction.

The log Gabor filter does not have any DC component associated with it as evident from the equation. The filter response also has an extended tail that covers a wide range of frequencies providing higher localization in space when compared to the Gabor filter.
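As an illustration, the following sketch (Python with NumPy; the parameter values are hypothetical) constructs the frequency-domain transfer function of a single oriented log-Gabor filter by combining the radial term in (1) with the angular term in (2):

```python
import numpy as np

def log_gabor_filter(rows, cols, wavelength=6.0, k_ratio=0.55,
                     theta0=np.pi / 2, sigma_theta=np.pi / 6):
    """Frequency-domain log-Gabor transfer function G(omega) * G(theta).

    wavelength  -> sets the center frequency w0 = 1 / wavelength
    k_ratio     -> k / w0, kept constant across scales (Eq. 1)
    theta0      -> orientation of the filter (Eq. 2)
    sigma_theta -> angular spread of the Gaussian in Eq. 2
    """
    # Normalized frequency grid (DC at index [0, 0]).
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                       # avoid log(0) at the DC term
    theta = np.arctan2(-fy, fx)              # angle of each frequency sample

    w0 = 1.0 / wavelength
    radial = np.exp(-(np.log(radius / w0) ** 2) /
                    (2 * np.log(k_ratio) ** 2))              # Eq. (1)
    radial[0, 0] = 0.0                       # the log-Gabor has no DC component

    # Angular term, with the angle difference wrapped into [-pi, pi].
    dtheta = np.arctan2(np.sin(theta - theta0), np.cos(theta - theta0))
    angular = np.exp(-(dtheta ** 2) / (2 * sigma_theta ** 2))  # Eq. (2)

    return radial * angular
```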

3.2.2 Method to Compute Phase Congruency Features

The PC computation method adopted in this research was proposed by Kovesi (1999). His method was based on the local energy model developed by Morrone and Owens (1987), who observed that points of strong phase congruency correspond to points of maximum energy. Let \(I(x)\) be an input periodic signal defined in \([-\pi ,\pi ]\). \(f(x)\) is the signal \(I(x)\) with no DC component and \(f_H(x)\) is the Hilbert Transform of \(f(x)\), which is a \(90^\circ \) phase shifted version of \(f(x)\). The local energy \(E(x)\) can then be computed from \(f(x)\) and its Hilbert Transform as in (3).

$$\begin{aligned} E(x)=\sqrt{f^2(x)+f_H^2(x)} \end{aligned}$$
(3)

It has been shown in earlier research by Venkatesh and Owens (1989) that the energy is equal to the product of phase congruency \(PC\) and the sum of Fourier amplitudes \(A_n\) as in (4).

$$\begin{aligned} E(x)=PC(x)\sum _n A_n \end{aligned}$$
(4)

Therefore the peaks in phase congruency correspond to the peaks in the energy function. Equation (4) also shows that the phase congruency measure is independent of the overall magnitude of the signal, thus making the feature invariant to changes in illumination and contrast. The components \(f(x)\) and \(f_H(x)\) are computed by the convolution of the signal with a quadrature pair of filters; logarithmic Gabor filters are used in this case. Consider \(I(x)\) as an input signal and let \(M_n^e\) and \(M_n^o\) be the even symmetric and odd symmetric components of the log Gabor function at a particular scale \(n\). \(M_n^e\) and \(M_n^o\) can be represented in the frequency domain as \(\mathcal {M}_n^e\) and \(\mathcal {M}_n^o\), expressed as in (5).

$$\begin{aligned} \mathcal {M}_n^e&=G(\omega ) \\ \mathcal {M}_n^o&=i\text {sign}(\omega )G(\omega ) \nonumber \end{aligned}$$
(5)

where \(i=\sqrt{-1}\). The amplitude and phase of the input signal in the transformed domain are then obtained as in (6) and (7), where \(o_n (x)\) and \(e_n (x)\) are the responses of each quadrature pair of filters as given in (8).

$$\begin{aligned}&A_n = \sqrt{e_n^2(x)+o_n^2(x)} \end{aligned}$$
(6)
$$\begin{aligned}&\phi _n(x) = \tan ^{-1}\left( \frac{o_n(x)}{e_n(x)}\right) \end{aligned}$$
(7)
$$\begin{aligned}&[e_{n} (x),o_{n} (x)] = [I(x)*M_{n}^e,I(x)*M_{n}^o] \end{aligned}$$
(8)

where \('*'\) represents the convolution operation.

The values for \(f(x)\) and \(f_H(x)\) can be computed using \(e_{n} (x)\) and \(o_{n} (x)\) as shown in (9) and (10).

$$\begin{aligned}&f(x) = \sum _n e_n(x) \end{aligned}$$
(9)
$$\begin{aligned}&f_H(x) = \sum _n o_n(x) \end{aligned}$$
(10)

When the Fourier components are very small, the problem of computing phase congruency becomes ill-conditioned. This problem is solved by adding a small constant \(\varepsilon \) to the sum of Fourier components as shown in (11).

$$\begin{aligned} PC(x)=\frac{E(x)}{\varepsilon +\sum _n A_n} \end{aligned}$$
(11)

Equation (11) is the final expression to calculate phase congruency for a one dimensional signal. The computation of phase congruency features for a two-dimensional signal is as follows.

As in (8), the even symmetric and odd symmetric components at a particular scale \(n\) and orientation \(o\) can be computed as shown in (12).

$$\begin{aligned}{}[e_{no} (x,y),o_{no} (x,y)]=[I(x,y)*M_{no}^e,I(x,y)*M_{no}^o]\qquad \end{aligned}$$
(12)

The amplitude \(A_{no}\) of the response at a particular scale and orientation can be computed as in (13). For an image, the calculation of phase congruency \(PC\) is as shown in (14).

$$\begin{aligned}&A_{no} = \sqrt{e_{no}^2 (x,y)+o_{no}^2 (x,y)} \end{aligned}$$
(13)
$$\begin{aligned}&PC(x,y) = \frac{\sum _o \sqrt{(\sum _n e_{no} (x,y))^2+(\sum _n o_{no} (x,y))^2}}{\varepsilon +\sum _o \sum _n A_{no}(x,y)}\nonumber \\ \end{aligned}$$
(14)
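The following sketch outlines how (12)–(14) could be evaluated, reusing the hypothetical `log_gabor_filter` helper sketched in Sect. 3.2.1. Because the angular term makes the filter one-sided in the frequency domain, the real and imaginary parts of the complex filter response provide the even and odd responses of (12). Kovesi's full method additionally applies noise compensation and a frequency-spread weighting, which are omitted here:

```python
import numpy as np

def phase_congruency_2d(image, wavelengths=(3, 6, 12, 24),
                        orientations=(np.pi / 2,), eps=1e-4):
    """Sketch of Eqs. (12)-(14): accumulate even/odd log-Gabor responses
    over scales, form the energy per orientation, normalize by amplitude."""
    rows, cols = image.shape
    F = np.fft.fft2(image)

    energy_sum = np.zeros((rows, cols))
    amplitude_sum = np.zeros((rows, cols))
    for theta0 in orientations:
        sum_e = np.zeros((rows, cols))
        sum_o = np.zeros((rows, cols))
        for wl in wavelengths:                 # one scale n per wavelength
            # log_gabor_filter is the sketch from Sect. 3.2.1.
            G = log_gabor_filter(rows, cols, wavelength=wl, theta0=theta0)
            resp = np.fft.ifft2(F * G)         # complex quadrature response
            e_no, o_no = resp.real, resp.imag  # even / odd responses, Eq. (12)
            sum_e += e_no
            sum_o += o_no
            amplitude_sum += np.hypot(e_no, o_no)   # A_no, Eq. (13)
        energy_sum += np.hypot(sum_e, sum_o)        # local energy per orientation
    return energy_sum / (eps + amplitude_sum)       # Eq. (14)
```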

3.3 Framework for Detection of Rain Streaks in Video

The initial framework developed by Santhaseelan and Asari (2011) for rain streak detection and removal is shown in Fig. 2.

Fig. 2 Initial framework for rain streak detection and compensation

The algorithm starts with the difference image computation for the three color components. Phase congruency features are calculated for the difference image, after which chromatic constraints are applied to get the candidate rain pixels. The detected streaks are compensated for using temporal neighbors that are not rain streaks.

The temporal property of rain described in the previous section indicates that there will be a positive change in the intensity of a rain affected pixel. In the first step (as shown in (15)), the difference image of the current frame with respect to its neighbors is computed separately for the three color components. The neighboring frame is subtracted from the current frame and any negative value at a pixel is clamped to zero; since the presence of rain causes an increase in intensity, only positive differences are preserved.

$$\begin{aligned} \varDelta I(x,y,t) \!=\! \left\{ \begin{array}{ll} I(x,y,t) \!-\! I(x,y,t\!-\!1) &{} \quad \text {if} \,I(x,y,t) \!>\! I(x,y,t\!-\!1)\\ 0 &{} \quad \text {if}\, I(x,y,t) \!\le \! I(x,y,t\!-\!1) \end{array} \right. \end{aligned}$$
(15)

where \(I(x,y,t)\) represents all the color components.
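A minimal sketch of (15), assuming NumPy arrays holding one color component of two consecutive frames:

```python
import numpy as np

def positive_difference(current, previous):
    """Eq. (15): keep only positive intensity changes, where rain is
    expected to brighten a pixel; negative differences are clamped to zero."""
    diff = current.astype(np.float64) - previous.astype(np.float64)
    return np.clip(diff, 0.0, None)
```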

The next step is to find the phase congruency features of the difference image, \(\varDelta I\) for all the color components as in (16).

$$\begin{aligned} PC(x,y,t) = \mathcal {P}(\varDelta I) \end{aligned}$$
(16)

where the function \(\mathcal {P}()\) represents the calculation of phase congruency features. Again, the features are calculated for each color component separately.

An illustration of the aforementioned steps is given in Fig. 3.

Fig. 3 Illustration of rain streak detection. Top row: original frames with rain; middle row: difference images for the different color components; bottom row: phase congruency features of the difference images

The final result has a large amount of directional structure due to the orientation selection in the log-Gabor filter. In this case, an orientation of \(90^\circ \) was selected. For most cases of rain in videos, this orientation is sufficient to capture inter-frame variations, as the variation in the direction of rainfall is assumed to be minimal. However, there are scenarios where wind causes the streaks to be oriented in directions other than downward; a variation from \(-45^\circ \) to \(+45^\circ \) would be a good range to consider. In order to compensate for such variations, the phase congruency features from two more orientations can be combined during the detection of rain streaks.

After applying phase congruency, only the candidate pixels (rain affected pixels) with intensity variations across neighboring frames remain in the processed image. The chromatic property suggests that, when rain occludes a pixel, the change in all three color components (red (R), green (G), blue (B)) will be the same in terms of the strength of the phase congruency features. Therefore, the following expression (17) provides another constraint to reduce false detections.

$$\begin{aligned} PC_{red}(x,y,t) \approx PC_{green}(x,y,t) \approx PC_{blue}(x,y,t) \end{aligned}$$
(17)

where \(PC_{red}, PC_{green}\) and \(PC_{blue}\) represent the phase congruency features of the \(R, G\), and \(B\) components respectively. The differences between the three \(PC\) components are assumed to be less than an empirical constant (0.02 for the experiments in this research). Pixels that do not satisfy the constraint in (17) are eliminated from the set of candidate rain pixels. The result of eliminating such false detections is illustrated in Fig. 4.

Fig. 4 Effect of applying the constraint based on the chromatic property. a Original frame, b frame with candidate rain pixels, c frame with candidate rain pixels after removing false detections based on the chromatic property

When a more permissive rain streak detection model is used in conjunction with the proposed reconstruction technique, this constraint is not very critical. However, when used with a naive reconstruction technique, the constraint provides better visual quality than without it, as illustrated in Santhaseelan and Asari (2011).
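A minimal sketch of the check in (17), assuming per-channel phase congruency maps, a boolean mask of candidate rain pixels, and the empirical tolerance of 0.02 used in this research (the helper name is hypothetical):

```python
import numpy as np

def chromatic_constraint(pc_r, pc_g, pc_b, candidates, tol=0.02):
    """Keep only candidate pixels whose per-channel phase congruency
    values agree to within `tol` (Eq. 17)."""
    spread = (np.maximum(np.maximum(pc_r, pc_g), pc_b) -
              np.minimum(np.minimum(pc_r, pc_g), pc_b))
    return candidates & (spread < tol)
```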

The next challenge in removing rain is to estimate the background intensity levels of the rain affected pixels. To verify the performance of the detection framework, a naive approach to scene reconstruction is adopted. A search is performed on the neighboring frames for the corresponding rain affected pixel, and the background intensity \(I_{bg}\) is estimated as the median of the temporal neighbors of the rain affected pixel.

Alpha-blending was used to calculate the intensity value for the rain affected pixel as shown in (18).

$$\begin{aligned} I_{no-rain}=\alpha I_{bg}+(1-\alpha )I_{rain} \end{aligned}$$
(18)

The new intensity is denoted as \(I_{no-rain}\), the background intensity is denoted as \(I_{bg}\) and the intensity of the rain-affected pixel is denoted as \(I_{rain}\). The global blending parameter is \(\alpha \), which is an empirical value that gave the best possible output in terms of visual quality. A sample frame where rain is removed is shown in Fig. 5.
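A sketch of this naive compensation step, assuming single-channel NumPy frames, a boolean rain mask, a list of temporally neighboring frames, and a hypothetical value for the blending parameter \(\alpha \):

```python
import numpy as np

def naive_compensation(frame, neighbors, rain_mask, alpha=0.85):
    """Replace detected rain pixels with an alpha blend (Eq. 18) of the
    temporal-median background estimate and the rain-affected intensity.

    alpha=0.85 is an illustrative value; the paper treats alpha as an
    empirically chosen constant.
    """
    i_bg = np.median(np.stack(neighbors, axis=0), axis=0)    # temporal median
    blended = alpha * i_bg + (1.0 - alpha) * frame           # Eq. (18)
    return np.where(rain_mask, blended, frame)
```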

Fig. 5 Result of removing rain from a static video

It is observed in the resultant video that the proposed method is able to preserve dynamic components of the scene like variation in the pool of water. This can be attributed to the better selection of candidate rain pixels obtained by using phase congruency features.

3.4 Frame Alignment using Phase Correlation

One of the major challenges in detecting the presence of rain is the movement of the camera. During the computation of candidate rain pixels, global motion creates false detections. Therefore, it is essential that successive frames are aligned before the computation of the difference image. Phase correlation (Reddy and Chatterji 1996) is used to align frames as it is highly resilient to noise. Phase correlation computes the translational shift between images from their phase information. Since most of the processing is done on videos with a frame rate of 30 fps, the global motion is approximated as translational. Other kinds of movement are canceled out during the reconstruction process explained in the next section.

When feature based techniques such as the Scale Invariant Feature Transform (SIFT) (Lowe 2004) are used for stabilization, the feature points could lie on the rain streaks. Such feature points cannot be matched reliably from frame to frame, which could cause stabilization procedures to fail, especially for videos containing heavy rain. Another option is to use region based matching techniques for stabilization. However, it is likely that rain streaks close to the camera would cause errors in matching between frames. This problem warrants the use of a stabilization technique that has high resilience to the presence of noise. In the case of phase correlation, the similarity is accounted for in terms of the global structure.
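A minimal sketch of translational alignment by phase correlation, assuming NumPy, integer-pixel shifts and a circular shift for the warp (windowing and subpixel refinement are omitted):

```python
import numpy as np

def phase_correlation_shift(reference, target):
    """Estimate the integer shift (dy, dx) of `target` relative to
    `reference` from the peak of the phase correlation surface."""
    F_ref = np.fft.fft2(reference)
    F_tgt = np.fft.fft2(target)
    cross = np.conj(F_ref) * F_tgt
    cross /= np.abs(cross) + 1e-12            # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap the shifts into the signed range.
    if dy > reference.shape[0] // 2:
        dy -= reference.shape[0]
    if dx > reference.shape[1] // 2:
        dx -= reference.shape[1]
    return dy, dx

def align_to_reference(target, dy, dx):
    """Undo the estimated displacement (circular shift used for brevity)."""
    return np.roll(target, (-dy, -dx), axis=(0, 1))
```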

3.5 Modified Framework for Detection of Rain Streaks in Video

The initial algorithm (shown in Fig. 2) is modified to account for movement of camera. While the basic structure of the detection framework remains the same, a pre-processing step to stabilize video frames is added. Neighboring frames are always aligned with respect to the current frame being processed to detect rain streaks. The modified framework of the algorithm is shown in Fig. 6 (Santhaseelan and Asari (2012)).

Fig. 6 Modified algorithm for detection of rain streaks with compensation for global motion

The first step in the modified algorithm is to align neighboring frames with respect to the current frame in which rain streaks are to be detected. Then the difference image between various components is calculated. Phase congruency features are computed on the difference images. Chromatic constraints are applied to segment out the candidate rain pixels. These pixels are then compensated using information from the temporal neighbors that are aligned with the current frame as well.

In the modified algorithm, the previous frame \(I(x,y,t-1)\) is aligned with respect to the current frame \(I(x,y,t)\) and is modified into \(I_{a}(x,y,t-1)\) as in (19).

$$\begin{aligned} I_{a}(x,y,t-1) = I(x+d_1,y+d_2,t-1) \end{aligned}$$
(19)

where \((d_1,d_2)\) is the translational shift estimated using phase correlation. The equation for finding the difference between frames then changes as shown in (20).

$$\begin{aligned} \varDelta I (x,y,t)\!=\! \left\{ \begin{array}{l l} I(x,y,t) \!-\! I_{a}(x,y,t\!-\!1) &{} \quad \text {if}\,I(x,y,t) \!>\! I_{a}(x,y,t\!-\!1)\\ 0 &{} \quad \text {if}\, I(x,y,t) \le I_{a}(x,y,t\!-\!1) \end{array} \right. \end{aligned}$$
(20)

The rest of the detection algorithm remains the same as in the previous section. In order to compensate for the rain affected pixels, the neighboring frames are aligned with respect to the current frame before the intensity of the replacement pixel is estimated. The computational expense of phase correlation is low, since the technique operates in the frequency domain and requires only a pair of Fourier transform computations.

An example is shown in Fig. 7 to illustrate the effectiveness of the framework.

Fig. 7 Example frame from a video with camera movement where rain streaks are detected. a Original frame with rain, b frame with candidate rain streaks and c the final frame containing just the rain streaks

It can be observed that the rain streaks are the only differences that remain after alignment and detection. It can also be observed that there are no differences along vertical edges or along the outline of the person.

The algorithm for rain removal based on phase congruency has been found to be effective in situations where there are no moving objects in the scene. The presence of moving objects causes an increase in the number of false detections. Even though attempts have been made to eliminate the false detections based on local phase correlations, the quality of the output video is poor as the result contains block effects. Even though the noise in individual frames appears to be diminished, the temporal smoothness of the video is lost. This motivates the design of a robust reconstruction algorithm that takes into account the effect of smaller streaks while maintaining the quality of the video in terms of temporal smoothness.

4 Scene Reconstruction Based on Optical Flow of Local Phase

The observation that no part of the scene is occluded by rain at all instances forms the basis of all reconstruction algorithms. In the case of spatial reconstruction, the assumption is that rain streaks are high frequency components in the image or frame. In other cases, it is observed that blending the intensity of the rain affected pixel with the estimated actual scene intensity provides a good reconstruction of the original scene in terms of visual quality. In this research it has been observed that all the aforementioned statements hold true subject to certain constraints. This calls for the development of a technique for scene reconstruction that removes rain with minimal loss of information and maximum increase in video quality. The advantage of a strong reconstruction algorithm is that it allows the constraints used during rain streak detection to be relaxed.

There are three main sources of information to accurately estimate the intensity of the background (a) intensity of the rain affected pixel, (b) information from spatial neighbors, and (c) information from temporal neighbors. Within the constraints of the given sources of information, an optimized solution needs to be designed whereby the salient edges in the scene can be preserved while the temporal smoothness of video is not compromised.

4.1 Utilizing Pixel Information

As mentioned in the properties of rain, the presence of rain causes an increase in the intensity at a pixel location. Therefore, it is imperative that the estimated actual intensity of the scene does not exceed the intensity of the pixel in the presence of rain, which leads to the following guiding principle for the reconstruction of a scene to remove rain.

Optimization criterion 1: The intensity of the actual background cannot be greater than the intensity of the pixel with rain occluding it.

In mathematical terms, the expression in (21) should hold.

$$\begin{aligned} I_{rr}(x,y)\le I(x,y) \end{aligned}$$
(21)

where \(I_{rr}(x,y)\) is the intensity in the reconstructed scene (scene with rain removed) and \(I(x,y)\) is the original intensity of the pixel which is affected by rain.

This criterion was introduced to reduce chances of a false estimation of background intensity. Since the background intensity cannot be greater than the intensity of the pixel with rain, this would be a constraint that is easy to apply without too much computational effort. There could be instances where rain has to be removed in scenes containing a lot of textural background. The presence of textures can cause the number of false detections to increase. The aforementioned condition becomes crucial in such circumstances.

4.2 Utilizing Spatial and Temporal Information

While pixel information can be readily transformed into a constraint, the process is not trivial while using spatial and temporal neighbors. The effectiveness of both spatial and temporal information to remove rain is illustrated in Fig. 8.

Fig. 8 Illustration of the effect of temporal and spatial compensation methods to remove rain

From a variety of experiments, it is inferred that spatial and temporal information cannot be used independently to arrive at a feasible solution. The main challenge during reconstruction is to avoid deterioration in video quality due to the presence of local motion in the video. The loss of quality is particularly evident along the edges. However, it is observed that regions with an associated motion component are present in subsequent frames at a different location with the same intensity pattern. Therefore, those regions can be registered from frame to frame with minimal registration error. The same is not the case for rain streaks: pixels containing rain cannot be registered accurately because of their rapid change in the spatio-temporal domain. In light of these observations, the following statement was set as the guiding principle in the design of the optimal solution.

Optimization criterion 2: The intensity of the replacement pixel should minimize the registration error with respect to the preceding frame containing no rain.

In mathematical terms, the criterion can be used to estimate the intensity of the reconstructed scene \(I_{rr}(x,y,t)\) as in (22).

$$\begin{aligned} I_{rr}(x,y,t) = \mathop {\hbox {argmin}}\limits _{I' \in Q} \left| I'-I_{rr}(x-u,y-v,t-1)\right| \end{aligned}$$
(22)

where \(Q=\{I(x,y,t),I_{k}(x,y,t)\}\), \((x,y)\) are the coordinates of a pixel in image space, \(t\) represents the time instant, \(I(x,y,t)\) is the intensity of the pixel which is affected by rain, and \(I_{k}(x,y,t)\) is the estimate from the temporal neighbors. In this research, \(I_{k}(x,y,t)\) is computed as the median of \(k\) temporal neighbors of the pixel at \((x,y,t)\). The optical flow velocity of the pixel is denoted by \((u,v)\). By incorporating registration between frames to compensate for the rain affected pixels, the temporal smoothness can be increased.

4.3 Key Observation

The main difference between rain streaks and the movement of objects is the continuity of the movement. Rain streaks that appear in one frame are not present in the next frame, whereas the edges and other features of a moving object remain in the frame, albeit at a different location. During reconstruction, a registration method that is resilient to the presence of rain streaks is required in order to apply the criterion mentioned in (22). Consider a neighborhood around a rain affected pixel. It can be observed that the neighborhood is not completely occluded by the streak. If the streak is very near to the camera and covers a large area, it appears blurred; in the case of sharp high intensity streaks, the breadth is comparatively smaller. Based on this idea, experiments were performed on tracking local phase components from one frame to the next, which led to the following key observation.

Observation: The computation of optical flow velocity using local phase information is more resilient to the presence of rain streaks in comparison with velocities computed using intensity information.

In intensity based optical flow, the robustness of the computed velocities depends on the size of the window being considered. If the rain streaks are sufficiently large (closer to the camera), the computation of flow velocities will be error-prone since the intensity patch could match an entirely different region containing rain. However, the same does not hold true for local phase based flow computation. During the computation of local phase, the dominant structure in a local neighborhood is considered, where the neighborhood is defined indirectly by the parameters of the band pass filter. Therefore, when optical flow components are computed for a window of phase information, the net effect is that the local structure is matched in its entirety with that of the corresponding frame. This causes the optical flow components to be largely unaffected by the presence of rain. The difference in optical flow velocities using intensity and phase is illustrated in Fig. 9.

Fig. 9 Significance of phase based optical flow. a Original image with region of interest marked in yellow. b Optical flow components using phase based optical flow. c Optical flow vectors based on intensity

It can be observed that the velocity patterns for phase based optical flow remain the same in the region of rain streak, while that of the intensity based optical flow is completely changed.

Local phase information can be computed in multiple ways, the most prominent method of which is the use of steerable wavelet filters. However, most of the methods are only capable of generating isotropic representation of phase. In order to extract phase information from images using an anisotropic model, the monogenic signal representation by Felsberg and Sommer (2001) is used.

4.4 Monogenic Signal Representation

The analytic signal model in signal processing enabled the extraction of phase information from one dimensional signals. The isotropic extension of the analytic signal model to multiple dimensions is called the monogenic signal (Felsberg and Sommer 2001). The implicit assumption is that the 2D signal is composed of intrinsically one dimensional (i1D) signals, where an i1D signal is a signal that requires only one independent variable for its representation. Therefore, if a 2D signal were to be represented using a single i1D signal, the amplitude, phase and orientation of that i1D signal are taken to be the local amplitude, local phase and local orientation of the 2D signal. Thus an image \(I(\mathbf {x})\) can be represented as shown in (23).

$$\begin{aligned} I(\mathbf {x}) = A(\mathbf {x})\cos \varphi (\mathbf {x}) \end{aligned}$$
(23)

where \(\mathbf {x}=(x,y)\) is the spatial co-ordinates of the signal \(I\), \(A(\mathbf {x})\) is the local amplitude and \(\varphi (\mathbf {x})\) is the local phase of the i1D signal. The i1D signal is oriented along the local orientation, \(\theta (\mathbf {x})\).

In order to extract the i1D signal from the 2D signal, a band pass filter is used together with the Riesz Transform. The combination of the band pass filter with the Riesz Transform is called the spherical quadrature filter (SQF). Local amplitude, local phase and local orientation are estimated from the responses of an even SQF and a pair of odd SQFs. The even SQF is the band pass filter, which in this research is a log-Gabor filter. In the frequency domain, the transfer functions of the odd pair of SQFs are computed as the product of the band-pass filter and a pair of Riesz kernels. The spatial and frequency domain representations of the pair of Riesz kernels are given in (24) and (25) respectively.

$$\begin{aligned}&h_1(x,y) = \frac{x}{2\pi |\mathbf {x}|^3},\text { } h_2(x,y)=\frac{y}{2\pi |\mathbf {x}|^3}, \;\mathbf {x}=(x,y) \in \mathbb {R}^2\nonumber \\ \end{aligned}$$
(24)
$$\begin{aligned}&H_1(\omega _1,\omega _2) = -\frac{i\omega _1}{|\mathbf {\omega }|},\text { } H_2(\omega _1,\omega _2)=-\frac{i\omega _2}{|\mathbf {\omega }|}, \;\mathbf {\omega }=(\omega _1,\omega _2)\nonumber \\ \end{aligned}$$
(25)

where \((h_1,h_2)\) represents the pair of Riesz kernels in spatial domain with the corresponding frequency domain representations as \((H_1,H_2)\) and \(\mathbf {\omega }=(\omega _1,\omega _2)\) are the frequency components.

The odd set of SQFs can then be represented as in (26).

$$\begin{aligned} G_{o1}(\mathbf {\omega })&= -\frac{i\omega _1}{|\mathbf {\omega }|}G_e (\mathbf {\omega }) \nonumber \\ G_{o2}(\mathbf {\omega })&= -\frac{i\omega _2}{|\mathbf {\omega }|}G_e (\mathbf {\omega }) \end{aligned}$$
(26)

where \(G_e (\mathbf {\omega })\) is the transfer function of the log Gabor filter given in (1).

In the spatial domain, the original signal \(I(\mathbf {x})\) is convolved with the transfer function of even and odd pair of SQFs as shown in (27) to obtain the components of the monogenic signal representation \((f(\mathbf {x}),f_1(\mathbf {x}),f_2(\mathbf {x}))\).

$$\begin{aligned} f(\mathbf {x})&= I(\mathbf {x})*g_e(\mathbf {x}) \nonumber \\ f_1(\mathbf {x})&= I(\mathbf {x})*g_{o1}(\mathbf {x}) \nonumber \\ f_2(\mathbf {x})&= I(\mathbf {x})*g_{o2}(\mathbf {x}) \end{aligned}$$
(27)

where \('*'\) represents the 2D convolution, and \(g_e(\mathbf {x})\), \(g_{o1}(\mathbf {x})\) and \(g_{o2}(\mathbf {x})\) are the spatial domain representations of \(G_e (\mathbf {\omega })\), \(G_{o1}(\mathbf {\omega })\) and \(G_{o2}(\mathbf {\omega })\) respectively.

The local amplitude \(A(\mathbf {x})\), local phase \(\varphi (\mathbf {x})\) and local orientation \(\theta (\mathbf {x})\) can then be computed as shown in (28),  (29) and  (30) respectively.

$$\begin{aligned} A(\mathbf {x})&= \sqrt{f^2 (\mathbf {x})+ f_1^2(\mathbf {x})+ f_2^2(\mathbf {x}) } \end{aligned}$$
(28)
$$\begin{aligned} \varphi (\mathbf {x})&= \arctan \left( \frac{\sqrt{f_1^2(\mathbf {x})+f_2^2(\mathbf {x})}}{f(\mathbf {x})} \right) , \varphi \in [0,\pi ) \end{aligned}$$
(29)
$$\begin{aligned} \theta (\mathbf {x})&= \arctan \left( \frac{f_2(\mathbf {x})}{f_1(\mathbf {x})}\right) , \theta \in [0,\pi ) \end{aligned}$$
(30)

The local phase along with the local orientation can be considered to be a vector with magnitude given by the local phase \(\varphi \) and an angle given by the local orientation \(\theta \). This vector is called the phase vector \(\varPhi (\mathbf {x})\) and can be represented as in (31).

$$\begin{aligned} \varPhi (\mathbf {x})=\varphi (\mathbf {x})\eta (\mathbf {x}) \end{aligned}$$
(31)

where \(\eta (\mathbf {x})=[\cos (\theta (\mathbf {x})), \sin (\theta (\mathbf {x}))]^T\) is the unit vector along the angle given by local orientation, \(\theta (\mathbf {x})\). The phase vector \(\varPhi (\mathbf {x})\) can be represented in terms of its components as \(\varPhi (\mathbf {x})=[\varPhi _1(\mathbf {x}), \varPhi _2(\mathbf {x})]^T\).

In this research, the phase vector \(\varPhi \) is used to represent regions of the image. In the monogenic signal model, the assumption is that a local image region is composed of intrinsic 1D signals, and the phase vector represents the phase and orientation of the intrinsic 1D signal in a local neighborhood. In terms of physical interpretation, the local phase captures the structural information of the object and the local orientation sheds light on the geometric information of the object. The contrast information in the image is given by the local amplitude.
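A sketch of the monogenic signal computation in (24)–(31), assuming NumPy and an isotropic log-Gabor band-pass filter as the even SQF; the wavelength and bandwidth values are hypothetical, and the local orientation is returned over \((-\pi ,\pi ]\) rather than restricted to \([0,\pi )\):

```python
import numpy as np

def monogenic_signal(image, wavelength=8.0, k_ratio=0.55):
    """Even/odd SQF responses (Eq. 27) and the derived local amplitude,
    phase, orientation and phase vector (Eqs. 28-31)."""
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0
    w0 = 1.0 / wavelength
    G_e = np.exp(-(np.log(radius / w0) ** 2) / (2 * np.log(k_ratio) ** 2))
    G_e[0, 0] = 0.0                              # no DC component

    # Riesz kernels in the frequency domain (Eq. 25), combined with the
    # band-pass filter to form the odd SQF pair (Eq. 26).
    H1 = -1j * fx / radius
    H2 = -1j * fy / radius
    H1[0, 0] = 0.0
    H2[0, 0] = 0.0

    F = np.fft.fft2(image)
    f = np.fft.ifft2(F * G_e).real               # even response
    f1 = np.fft.ifft2(F * H1 * G_e).real         # odd responses
    f2 = np.fft.ifft2(F * H2 * G_e).real

    A = np.sqrt(f ** 2 + f1 ** 2 + f2 ** 2)                  # Eq. (28)
    phi = np.arctan2(np.sqrt(f1 ** 2 + f2 ** 2), f)          # Eq. (29)
    theta = np.arctan2(f2, f1)                               # Eq. (30)
    phase_vector = np.stack([phi * np.cos(theta),
                             phi * np.sin(theta)], axis=0)   # Eq. (31)
    return A, phi, theta, phase_vector
```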

Robust estimation of orientation: The estimation of local orientation can be affected by noise to a large degree. Unser et al. (2009) proposed a least-squares estimate of the orientation based on the local neighborhood. The robust estimate is obtained by maximizing the directional Hilbert transform of the function over a neighborhood, as represented by the optimization function in (32).

$$\begin{aligned} \bar{\theta }(\mathbf {x})=\mathop {\hbox {arg}\,\hbox {max}}\limits _{\theta \in [-\pi ,\pi ]}\int _{\mathbb {R}^2}v_\sigma (\mathbf {x'-x})|\mathcal {H}_\theta \{f(\mathbf {x}')\}|d\mathbf {x}' \end{aligned}$$
(32)

where \(\mathbf {x}'\) is any pixel in the local neighborhood, \(v_\sigma \) is a Gaussian kernel and \(\sigma ^2\) is its variance, \(\mathcal {H}_\theta (\cdot )\) is the directional Hilbert transform represented in the frequency domain as in (33).

$$\begin{aligned} \mathcal {H}_\theta (\omega )=\frac{\omega _1 \cos (\theta )+\omega _2 \sin (\theta )}{|\omega |} \end{aligned}$$
(33)

where \(\omega _1\) and \(\omega _2\) are the components of angular frequency, \(\omega \).

4.5 Optical Flow using Phase Information

The optical flow technique has been adopted from the method by Alessandrini et al. (2013). The authors present a multi-scale computation technique based on the flow of phase vectors from one frame to the next. In classical optical flow computation, brightness of an image region is assumed to be constant. In this case, the phase vector in a particular region is assumed to be constant. One of the main advantages of using phase is to reduce the dependence on variation in intensity due to changes in illumination.

In mathematical terms, the phase constancy assumption can be expressed as in (34).

$$\begin{aligned} \varPhi (\mathbf {x},t+1) = \varPhi (\mathbf {x}-\mathbf {d}(\mathbf {x}),t) \end{aligned}$$
(34)

where \(\mathbf {d}(\mathbf {x})=[u(\mathbf {x}),v(\mathbf {x})]\) represents the displacement made by the pixel at \(\mathbf {x}\), and \((u,v)\) is the optical flow velocity in \(x\) and \(y\) directions. If the displacement is assumed to be small, then (34) can be approximated using the Taylor series expansion as in (35).

$$\begin{aligned} \varPhi (\mathbf {x}-\mathbf {d}(\mathbf {x}),t) \approx \varPhi (\mathbf {x},t)-\mathbf {J}(\mathbf {x},t)\mathbf {d}(\mathbf {x}) \end{aligned}$$
(35)

where \(\mathbf {J}\) is the Jacobian matrix of \(\varPhi \). If a local neighborhood around a point is defined, then the local displacement \(\mathbf {d}\) can be assumed to be similar for all the pixels in the neighborhood window, \(w\). Using this assumption, a group of linear equations can be defined leading to the following expression (36).

$$\begin{aligned}&\langle \mathbf {J}\rangle _w \mathbf {d} = -\langle \varPhi _t\rangle _w, \\&\text {where }\mathbf {J}(\mathbf {x},t) = \begin{bmatrix} \varPhi _{1x}(\mathbf {x},t)&\quad \varPhi _{1y}(\mathbf {x},t) \\ \varPhi _{2x}(\mathbf {x},t)&\quad \varPhi _{2y}(\mathbf {x},t) \end{bmatrix} \nonumber \\&[\varPhi _{1}, \varPhi _{2}] = \varphi (\mathbf {x})[\cos (\theta ), \sin (\theta )] \nonumber \end{aligned}$$
(36)

In (36), \(\varPhi _{1x}=\partial \varPhi _1/\partial x\), \(\varPhi _{1y}=\partial \varPhi _1/\partial y\), \(\varPhi _{2x}=\partial \varPhi _2/\partial x\), and \(\varPhi _{2y}=\partial \varPhi _2/\partial y\). \(\varPhi _t\) is the temporal derivative of \(\varPhi \) and can be expressed as in (37).

$$\begin{aligned} \varPhi _t(\mathbf {x},t)=\varPhi (\mathbf {x},t+1) - \varPhi (\mathbf {x},t) \end{aligned}$$
(37)

Then the temporal derivative of the phase vector can be expressed in terms of SQFs as in (38), which was derived in Felsberg (2007).

$$\begin{aligned} \varPhi _t = \frac{f_t f_{R_{t+1}} - f_{R_{t}}f_{t+1} }{|f_t f_{R_{t+1}} - f_{R_{t}}f_{t+1}|} \arctan \left( \frac{|f_t f_{R_{t+1}} - f_{R_{t}}f_{t+1}| }{f_t f_{t+1} - f_{R_{t}}^T f_{R_{t+1}}}\right) \end{aligned}$$
(38)

where \(f_t\) is \(f(\mathbf {x})\) at frame \(t\), \(f_{R_t}\) is \(f_R(\mathbf {x})\) at frame \(t\), and \(f_R(\mathbf {x})=[f_1(\mathbf {x}),f_2(\mathbf {x})]^T\).
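A sketch of (38), assuming the even response \(f\) and the pair of odd (Riesz) responses \(f_R\) have already been computed for both frames, for example with the `monogenic_signal` sketch above:

```python
import numpy as np

def phase_vector_temporal_derivative(f_t, fR_t, f_t1, fR_t1, eps=1e-12):
    """Temporal derivative of the phase vector, Eq. (38).

    f_t, f_t1   : even SQF responses at frames t and t+1, shape (H, W)
    fR_t, fR_t1 : odd SQF response pairs [f1, f2], shape (2, H, W)
    """
    # Vector part: f_t * f_R(t+1) - f_R(t) * f(t+1)
    num = f_t[None, ...] * fR_t1 - fR_t * f_t1[None, ...]
    norm = np.sqrt(np.sum(num ** 2, axis=0)) + eps
    # Scalar part: f_t * f(t+1) - f_R(t)^T f_R(t+1)
    dot = f_t * f_t1 - np.sum(fR_t * fR_t1, axis=0)
    angle = np.arctan2(norm, dot)                # arctan of the ratio in Eq. (38)
    return (num / norm[None, ...]) * angle[None, ...]
```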

The aforementioned model for optical flow by Felsberg (2007) considered only translation of pixels. In Alessandrini et al. (2013), the case of affine flow was considered instead of a constant motion constraint that was dependent on the window size \(w\). The affine model for a window \(w\) centered at \((x_0,y_0)=(0,0)\) can be expressed as in (39).

$$\begin{aligned}&\mathbf {d}(\mathbf {x})=\mathbf {A}(\mathbf {x})\mathbf {u} \\&\hbox {where }\,\mathbf {A}= \begin{bmatrix} 1&\quad 0&\quad x&\quad y&\quad 0&\quad 0 \\ 0&\quad 1&\quad 0&\quad 0&\quad x&\quad y \end{bmatrix} \nonumber \\&\mathbf {u} =[d_{10},d_{20},d_{1x},d_{1y},d_{2x},d_{2y}]^T \nonumber \end{aligned}$$
(39)

\(d_{10}\) and \(d_{20}\) are the displacements of the center of the window. The other components are partial derivatives given by \(d_{1x}=\partial d_1/\partial x\), \(d_{1y}=\partial d_1/\partial y\), \(d_{2x}=\partial d_2/\partial x\) and \(d_{2y}=\partial d_2/\partial y\).

By combining (39) with (36) and multiplying by \(\mathbf {A}^T\), we get the equation in (40). Equation (40) is the equivalent of the Lucas-Kanade algorithm using monogenic phase vectors.

$$\begin{aligned}&\langle \mathbf {M}\rangle _w\mathbf {u}=\langle \mathbf {b}\rangle _w \\&\hbox {where }\,\mathbf {M}=\mathbf {A}^T\mathbf {JA} \nonumber \\&\mathbf {b}=-\mathbf {A}^T \varPhi _t \nonumber \end{aligned}$$
(40)

From further simplification of (40), the following expressions for \(\mathbf {M}\) and \(\mathbf {b}\) can be derived.

$$\begin{aligned} \mathbf {M} \!&=\! \begin{bmatrix} \varPhi _{1x}&\quad \varPhi _{1y}&\quad x\varPhi _{1x}&\quad y\varPhi _{1x}&\quad x\varPhi _{1y}&\quad y\varPhi _{1y} \\ \varPhi _{2x}&\quad \varPhi _{2y}&\quad x\varPhi _{2x}&\quad y\varPhi _{2x}&\quad x\varPhi _{2y}&\quad y\varPhi _{2y} \\ x\varPhi _{1x}&\quad x\varPhi _{1y}&\quad x^2\varPhi _{1x}&\quad xy\varPhi _{1x}&\quad x^2\varPhi _{1y}&\quad xy\varPhi _{1y} \\ y\varPhi _{1x}&\quad y\varPhi _{1y}&\quad xy\varPhi _{1x}&\quad y^2\varPhi _{1x}&\quad xy\varPhi _{1y}&\quad y^2\varPhi _{1y} \\ x\varPhi _{2x}&\quad x\varPhi _{2y}&\quad x^2\varPhi _{2x}&\quad xy\varPhi _{2x}&\quad x^2\varPhi _{2y}&\quad xy\varPhi _{2y} \\ y\varPhi _{2x}&\quad y\varPhi _{2y}&\quad xy\varPhi _{2x}&\quad y^2\varPhi _{2x}&\quad xy\varPhi _{2y}&\quad y^2\varPhi _{2y} \end{bmatrix} \\ \mathbf {b}&=- \begin{bmatrix} \varPhi _{1t}&\varPhi _{2t}&x\varPhi _{1t}&x\varPhi _{2t}&y\varPhi _{1t}&y\varPhi _{2t} \end{bmatrix} \nonumber \end{aligned}$$
(41)

where \(\varPhi _{1x}=\partial \varPhi _1/\partial x\), \(\varPhi _{1y}=\partial \varPhi _1/\partial y\), \(\varPhi _{2x}=\partial \varPhi _2/\partial x\), \(\varPhi _{2y}=\partial \varPhi _2/\partial y\), \(\varPhi _{1t}=\partial \varPhi _1/\partial t\) and \(\varPhi _{2t}=\partial \varPhi _2/\partial t\).

The next step is to decide on an ideal size for the window \(w\). In order to overcome the difficulties of too small or too large a window size, a multiscale approach to the calculation of flow is followed. The basic idea is that the solution for \(\mathbf {u}\) is computed at multiple scales, and the value of \(\mathbf {u}\) for which a measure of residual error is minimum is considered the best estimate of optical flow. In order to compute a dense flow field, bicubic interpolation is used. The expression for the residual error is given in (42).

$$\begin{aligned} \hbox {Residual} \,\hbox {error }=\Vert \mathbf {Mu}^n - \mathbf {b}\Vert / |w| \end{aligned}$$
(42)

where \(n\) represents the scale.
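The per-window solve at a single scale can be sketched as follows, assuming the spatial derivatives of the phase vector (e.g. from finite differences) and its temporal derivative from (38) have already been gathered for the pixels in the window; the function name and data layout are illustrative. In the multiscale scheme, this solve would be repeated for several window sizes and the estimate with the smallest residual (42) retained.

```python
import numpy as np

def affine_phase_flow_window(Phi_x, Phi_y, Phi_t, xs, ys):
    """Solve <M>_w u = <b>_w (Eq. 40) for a single window.

    Phi_x, Phi_y : (2, N) spatial derivatives of the two phase vector
                   components with respect to x and y
    Phi_t        : (2, N) temporal derivative of the phase vector (Eq. 38)
    xs, ys       : (N,) pixel coordinates relative to the window center
    Returns the affine parameters u = [d10, d20, d1x, d1y, d2x, d2y]
    and the residual error of Eq. (42).
    """
    M = np.zeros((6, 6))
    b = np.zeros(6)
    for i in range(xs.size):
        # Jacobian of the phase vector at this pixel (Eq. 36).
        J = np.array([[Phi_x[0, i], Phi_y[0, i]],
                      [Phi_x[1, i], Phi_y[1, i]]])
        # Affine parameterization of the displacement (Eq. 39).
        A = np.array([[1, 0, xs[i], ys[i], 0, 0],
                      [0, 1, 0, 0, xs[i], ys[i]]], dtype=float)
        M += A.T @ J @ A                          # accumulate Eq. (41)
        b += -A.T @ Phi_t[:, i]
    u, *_ = np.linalg.lstsq(M, b, rcond=None)
    residual = np.linalg.norm(M @ u - b) / xs.size    # Eq. (42)
    return u, residual
```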

4.6 Algorithm for Scene Reconstruction

This section provides a detailed explanation of the algorithm for reconstruction. As mentioned earlier, the background intensity of any pixel in a frame can be estimated using the information from the spatial and temporal neighbors.

The aim of the reconstruction algorithm is to replace the rain affected pixel with the actual intensity of the background pixel. As mentioned earlier, the temporal neighbors provide a very good estimate provided no motion is associated with the pixel. Therefore, the default replacement pixel is the temporal neighbor that does not contain rain. In this research, the median of \(k\) temporal neighbors is considered to be an optimal estimate.

The crucial part of the algorithm is to decide when a candidate pixel has to be replaced by the intensity from the temporal neighbors. Here, the second optimization criterion, based on registering the current frame to the previous rain removed frame, is utilized. The intensity of the reconstructed pixel is selected such that it reduces the difference with the registered pixel in the previous frame containing no rain. However, the pixel intensity remains unaltered if the intensity of the replacement pixel is greater than the intensity of the candidate rain pixel, in accordance with the first optimization criterion. These steps are presented in Algorithm 1.

Algorithm 1

The algorithm starts with an initialization process. During the initialization, it is assumed that there is no local motion component associated with the video apart from rain streaks. Rain streaks are removed from the first \(k\) frames using a temporal median filter on the pixels that are detected to be part of rain streaks. Further processing is done on the remaining frames of video. The total number of frames is denoted as \(N\). The next step is to compute the optical flow vector from the current frame \(I(x,y,t)\) to the previous frame that does not contain rain. A pixel-wise registration error is computed. If the registration error is greater than zero, the pixel is assumed to be part of rain. The rain affected pixel is then replaced by the estimate of the pixel intensity from its temporal neighbors, \(I_k(x,y,t)\). \(I_k(x,y,t)\) is computed as the median of \(k\) temporal neighbors of the rain affected pixel. The size of the neighborhood forms a parameter for the algorithm. In the case of rapid changes in the scene, only one temporal neighbor can be considered for compensation. The rain removed intensity is denoted as \(I_{rr}(x,y,t)\).
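One possible reading of Algorithm 1 and the two optimization criteria is sketched below, assuming single-channel NumPy arrays; `I_prev_rr_warped` denotes the previous rain-removed frame warped to the current frame with the phase-based optical flow, and the function and variable names are hypothetical:

```python
import numpy as np

def reconstruct_frame(I, I_k, I_prev_rr_warped, candidates):
    """Per-pixel reconstruction following Algorithm 1 (sketch).

    I                : current frame intensities
    I_k              : median of k temporal neighbors (background estimate)
    I_prev_rr_warped : previous rain-removed frame registered to the
                       current frame using the phase-based optical flow
    candidates       : boolean mask of detected candidate rain pixels
    """
    I_rr = I.astype(np.float64).copy()

    # Criterion 2: pick the intensity (original or temporal estimate) that
    # minimizes the registration error against the previous rain-removed
    # frame (Eq. 22).
    err_orig = np.abs(I - I_prev_rr_warped)
    err_temp = np.abs(I_k - I_prev_rr_warped)
    use_temporal = candidates & (err_temp < err_orig)

    # Criterion 1: the reconstructed intensity must not exceed the
    # rain-affected intensity (Eq. 21).
    use_temporal &= (I_k <= I)

    I_rr[use_temporal] = I_k[use_temporal]
    return I_rr
```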

5 Experimental Results

The previous sections described the framework for the detection of rain streaks and a method to reconstruct the video to remove them. The end goal of this process is twofold: (1) to increase the quality of the video in terms of visual perception of the actual scene, and (2) to increase the effectiveness of higher level operations such as object/face/pedestrian detection and recognition.

This section presents details of various experiments performed on different kinds of videos that contain rain. Videos from previously published research have been used to properly evaluate the performance of the proposed technique. The selected videos contain rain in varying complexities along with a variety of scene content.

5.1 Evaluation Strategies

The following techniques are adopted to verify the performance of the algorithm to remove rain.

  1. Qualitative analysis: One of the main objectives of the algorithm to remove rain is to increase the quality of the video in terms of visual perception. This demands constraints on the output such as the absence of edge artifacts and better temporal smoothness. The assessment is based on visual comparison of the reconstructed video with the original video containing rain.

  2. Quantitative analysis: The main technique for quantitative evaluation is the computation of a no-reference image quality measure on the videos from which rain has been removed. A quality measure based on natural scene statistics, the Blind Image Quality Index (BIQI) by Moorthy and Bovik (2010), is used in this research. The method uses wavelet analysis to extract features from the image, detects the kind of distortion present using a trained classifier, and then predicts a distortion score using support vector regression. The prediction is based on the images used to train the system. For BIQI, a higher score indicates more distortion and therefore lower quality; the final score is on a scale of 0-100, with 100 indicating the most distorted image. In the context of removing rain from video, the reconstruction process should be capable of reducing spatial distortions. In the work by Barnum et al. (2010), rain streaks are modeled as two-dimensional Gaussians, and that work illustrates why the assumption is a good basis for detecting and removing rain streaks in video. Following the same assumption, the presence of rain streaks can be taken to cause Gaussian blurring in images, and the amount of blur can be estimated using BIQI. However, this is a good measure of the quality of the reconstructed video only in the presence of a large number of rain streaks. In some cases, advanced image analysis algorithms can also be used for evaluation. In one of the videos, a person walks towards the camera in rain; here, the face detection algorithm by Viola and Jones (2004) is employed to evaluate the performance of the algorithm. A minimal sketch of the per-frame scoring loop is given after this list.
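The sketch below shows how every frame of a video could be scored with a no-reference measure. BIQI itself is distributed by its authors and is not re-implemented here; the variance of the Laplacian is used purely as a hypothetical stand-in sharpness proxy, and the video file names in the commented usage are illustrative only.

```python
import cv2
import numpy as np

def per_frame_scores(video_path, score_fn):
    """Apply a per-frame no-reference quality score to every frame of a video."""
    cap = cv2.VideoCapture(video_path)
    scores = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        scores.append(score_fn(gray))
    cap.release()
    return np.array(scores)

def laplacian_variance(gray):
    # Stand-in sharpness proxy: higher variance means a sharper frame
    # (note the opposite polarity to BIQI, where higher means more distorted).
    return cv2.Laplacian(gray, cv2.CV_64F).var()

# Hypothetical usage, comparing the original and reconstructed videos:
# scores_rain     = per_frame_scores("video_with_rain.avi", laplacian_variance)
# scores_derained = per_frame_scores("video_rain_removed.avi", laplacian_variance)
```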

5.2 Results and Discussion

This section provides experimental results along with a discussion. The performance of the proposed framework to remove rain is compared with the results of other state-of-the-art methods in the literature.

5.2.1 Removing Rain from Static Video

Static videos are videos in which the only dynamic component in the scene is rain. Since the intensity variations are entirely due to rain, the difference image computation and the phase congruency calculations yield all the rain streaks in the frame. In this section, the video provided by Zhang et al. (2006) is used for experimental evaluation. A qualitative comparison of the result of the proposed algorithm with the result of Zhang et al. (2006) is shown in Fig. 10.

Fig. 10
figure 10

Qualitative comparison for static video containing rain: a original frame with rain, b frame with rain removed using the method by Zhang et al. (2006), c frame with rain removed using proposed technique

It can be observed from Fig. 10 that the result using the proposed technique is similar to the result obtained by Zhang et al. (2006).

In terms of visual quality assessment by manual inspection of the reconstructed video, it was observed that the result of Zhang et al. (2006) tends to have less dynamic content than the output of the proposed technique, and thus its visual quality is higher. This difference can be attributed to the number of frames used to reconstruct the scene. In the method by Zhang et al. (2006), the presence of rain streaks is detected from a temporal histogram constructed for every pixel over all the frames in the video, and the intensity for scene reconstruction is also estimated from this histogram. The proposed technique uses only five neighboring frames for reconstruction. It is also observed that the proposed algorithm removes all the significant streaks with this smaller number of frames.

The results of the proposed technique are compared with those of Zhang et al. (2006) quantitatively in terms of the distortion score for individual frames. The comparison is shown in Fig. 11.

Fig. 11
figure 11

Quantitative comparison for static videos containing rain. Comparison of quality measures of reconstructed video from proposed method with the output of method by Zhang et al. (2006)

The distortion score of the output of Zhang et al. (2006) is higher than that of the proposed technique. This is because the reconstructed scene from Zhang et al. (2006) suffers from blur induced by the rain streak removal process. The variation in the distortion score of the output of Zhang et al. (2006) is very small, which can be attributed to the large number of frames used for reconstruction. The proposed technique used five temporal neighbors to remove rain from the video.

5.2.2 Removing Rain from Video with Dynamic Texture

The effectiveness of rain streak detection can be evaluated well when the algorithm is tested on video with dynamic textures. In this test, the video provided by Garg and Nayar (2004) is used, in which the scene consists of rain falling on a pool. The rainfall causes variations in the pool that must be separated from the detected rain streaks, thereby preserving the characteristics of the dynamic texture in the video, namely the appearance of the pool. A qualitative comparison of the output of the proposed technique with the output of Garg and Nayar (2004) is shown in Fig. 12.

Fig. 12
figure 12

Qualitative comparison for videos with dynamic texture: a original frame with rain, b frame with rain removed using method by Garg and Nayar (2004), c frame with rain removed using proposed technique

On visual examination of the resultant videos, it was observed that the output of the proposed technique is as good as the output from Garg and Nayar (2004) in preserving the dynamic content in the scene while removing rain streaks from the video.

A quantitative comparison of the algorithms in terms of the distortion score is shown in Fig. 13. In this experiment, quality assessment is performed on the region containing rain streaks and not on the pool region. It can be observed that the performance of the proposed technique is better for most frames in comparison with the technique by Garg and Nayar (2004). The method by Garg and Nayar (2004) reduces the rain content in the scene but does not remove the dynamic content due to rain entirely. This could be due to the strict photometric constraints applied in their detection procedure. The significant dips in the distortion score for the result of the proposed technique are due to the absence of any significant rain component in that particular frame or in its neighborhood.

Fig. 13
figure 13

Quantitative comparison for videos with dynamic texture. Comparison of quality measure of reconstructed video from proposed method with the output of method by Garg and Nayar (2004)

5.2.3 Removing Rain from a Video Containing Global Motion

The presence of a global motion component increases the complexity of the processing to remove rain. The proposed technique includes a step to compute the difference between consecutive frames. In order to prevent an increase in incorrectly detected rain affected pixels, phase correlation is used to align the frames. The phase correlation technique mentioned earlier assumes that the movement of the camera from one frame to the next is purely translational. This assumption might not hold for all videos; however, the reconstruction technique is robust enough to handle such variations. The experiment presented in this sub-section verifies the robustness of the algorithm when the camera is moving. The key challenge for the reconstruction algorithm is to preserve edge information.
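A minimal sketch of FFT-based phase correlation under this pure-translation assumption is given below. It is not the paper's implementation; the two frames are assumed to be single-channel float arrays of equal size, and only an integer shift is recovered.

```python
import numpy as np

def phase_correlation_shift(ref, cur):
    """Estimate the integer translational shift (dy, dx) of `cur`
    relative to `ref` from the phase of the cross-power spectrum."""
    F_ref = np.fft.fft2(ref)
    F_cur = np.fft.fft2(cur)
    cross = np.conj(F_ref) * F_cur
    cross /= np.abs(cross) + 1e-12        # keep only the phase
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peak coordinates to signed shifts.
    return tuple(p - n if p > n // 2 else p
                 for p, n in zip(peak, ref.shape))

# Aligning the current frame back onto the reference before differencing:
# dy, dx = phase_correlation_shift(ref, cur)
# aligned = np.roll(cur, (-dy, -dx), axis=(0, 1))
```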

In this set of experiments, a video provided by Barnum et al. (2010) is used. The video is of a man sitting in rain. Every frame contains vertical edges that need to be preserved along with the outline of the person. A sample frame where rain is removed is shown in Fig. 14.

Fig. 14
figure 14

Sample frame where rain is removed from video when camera is moving. a The original frame with rain, b the frame with rain removed

In the magnified section of the reconstructed frame, it can be observed that the rain streaks are absent. It can also be observed in the complete image that the vertical edges on the building and the outline of the person are completely preserved in the reconstructed frame when compared with the frame from the original video.

The output from the proposed technique is compared with the output from Barnum et al. (2010) in Fig. 15.

Fig. 15
figure 15

Qualitative comparison for video with global motion. a Original frame with rain, b frame with rain removed by Barnum et al. (2010), c frame with rain removed using proposed technique

It can be observed that the results of the method by Barnum et al. (2010) and of the proposed technique are similar in terms of visual quality. However, the amount of rain removed by the method of Barnum et al. (2010) was less than that removed by the proposed technique. This is due to the limitation of the frequency based model of Barnum et al. (2010) in detecting streaks that cause smaller intensity variations.

Quantitative analysis is performed on the reconstructed video based on the distortion score of individual frames. The results are shown in Fig. 16.

Fig. 16
figure 16

Quantitative comparison on video with global motion. Comparison of quality measure of reconstructed video from proposed method with the quality measure of the original video with rain and the result by Barnum et al. (2010)

The complete reconstructed video by Barnum et al. (2010) was not available. Therefore, the distortion score is calculated for the available frame and that value is assumed to be representative of the mean over all frames of the reconstructed video. Since there are no significant changes in the scene content and the duration of video considered for evaluation is very short, this assumption is considered reasonable. It is observed that the output of the proposed technique reduces the distortion, whereas the distortion score of the result by Barnum et al. (2010) remains close to that of the original video. One of the main reasons is that the amount of rain in the frame is very small; the changes in the scene are therefore not significant enough to cause a large variation in the distortion score.

5.2.4 Removing Rain from Video Containing Moving Objects

The most challenging case for removing rain is video that contains objects moving between frames. A local motion component is not canceled during frame alignment using phase correlation, and such motion causes an increase in false detections along the edges of the moving objects. Since the framework for detection of rain streaks does not address the problem of local motion, this experiment is a verification of the effectiveness of the reconstruction algorithm. The video provided by Garg and Nayar (2004) is used for testing and analysis in this experiment. The video consists of a person, holding an umbrella, moving towards the camera in rain. There are multiple frames where the face of the person is occluded by rain streaks. There are two objects with motion components in this scene - the person and the umbrella. Therefore, edges with varying velocities are present in the same scene and need to be preserved during reconstruction.

A qualitative comparison of the results using the proposed technique with the original video and the reconstructed video using the method from Garg and Nayar (2004) is shown in Fig. 17.

Fig. 17
figure 17

Qualitative comparison for video containing moving objects. a original frame with rain, b frame with rain removed in Garg and Nayar (2004), c frame with rain removed using proposed technique

It can be observed that the method in Garg and Nayar (2004) removes rain without causing artifacts along edges. The proposed technique preserves the edges equally well while removing all the rain streaks.

In terms of quantitative analysis, the image quality measures are compared for the video reconstructed using the proposed technique, the original video and the result by Garg and Nayar (2004). The comparison is shown in Fig. 18.

Fig. 18
figure 18

Quantitative comparison on video with moving objects in the scene. Comparison of image quality measure of the reconstructed video using proposed technique with that of the original video and result of Garg and Nayar (2004)

It can be observed that the proposed technique reconstructed a video of much better quality than the original video. The quality in terms of the distortion score of the reconstructed video is not as good as that of the result by Garg and Nayar (2004). The reason is that the technique by Garg and Nayar (2004) is robust when the streaks are narrow and completely in focus and the scene has little depth, which is the case in the video used for testing.

The automatic face detection algorithm by Viola and Jones (2004) is applied on the reconstructed video to further analyze the performance of the proposed algorithm. The aim is to verify whether the lower quality score relative to Garg and Nayar (2004) affects feature extraction for advanced image analysis. A sample frame showing the result of face detection is shown in Fig. 19.

Fig. 19
figure 19

Improvement in face detection when rain is removed: a result of detection on original frame with rain, b result of detection when rain is removed using proposed algorithm

Figure 19a shows the result of the face detection algorithm on a frame from the original video in which a rain streak obstructs the view of the face; the algorithm fails to detect the face. Figure 19b shows the result of applying the face detection algorithm to the frame reconstructed using the proposed technique. The face is detected, mainly because the rain streak has been completely removed.
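This evaluation can be reproduced along the lines of the sketch below, which counts the frames of a video in which at least one face is detected. It uses OpenCV's stock Haar cascade as a stand-in for the Viola and Jones (2004) detector; the video path and detector parameters are illustrative, not the paper's exact configuration.

```python
import cv2

# Stock OpenCV frontal-face cascade as a stand-in for the Viola-Jones detector.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def count_face_frames(video_path):
    """Count the frames of a video in which at least one face is detected."""
    cap = cv2.VideoCapture(video_path)
    hits = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                              minNeighbors=5)
        if len(faces) > 0:
            hits += 1
    cap.release()
    return hits

# Hypothetical usage, comparing the original and rain-removed videos:
# count_face_frames("video_with_rain.avi"), count_face_frames("video_rain_removed.avi")
```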

The face detection algorithm was applied on the resultant video from Garg and Nayar (2004) as well. The combined results are tabulated in Table 1. The number of frames in which the face is detected is higher for the video with rain removed than for the original video with rain, and is the same for the reconstruction by the proposed algorithm and the rain removal algorithm of Garg and Nayar (2004). This shows that even though the proposed algorithm yields a lower image quality than the method by Garg and Nayar (2004) in terms of the distortion score, it retains all the features necessary for advanced image analysis.

Table 1 Quantitative evaluation of rain removal based on face detection

5.2.5 More Qualitative Results

Apart from the scenarios mentioned before, a qualitative analysis was also done on some additional videos. One example is shown in Fig. 20. The video shows a street scene with a moving pedestrian in the rain. The challenge is to preserve the shape of the pedestrian while removing rain from the video. From the removed rain component, it is observed that no part of the pedestrian is altered.

Fig. 20
figure 20

Removing rain on a scene with moving pedestrian: a original frame with rain, b frame with rain removed using proposed algorithm, and c rain component in the scene that was removed

Another example is a video containing rain in which the camera is moving considerably. High-intensity vertical edges are present in the frame, which makes the rain removal particularly challenging. The result of removing rain from the video is shown in Fig. 21.

Fig. 21
figure 21

Removing rain on a scene with vertical edges: a original frame with rain, b frame with rain removed using proposed algorithm, and c rain component in the scene that was removed

In the scene with rain removed, there is no distortion to the vertical edges. It is also observed that the rain streaks have been completely removed from the frame.

6 Conclusion

In this paper, a novel framework to remove rain from videos was designed. The framework consists of two parts: (1) a detection framework based on phase congruency features and (2) scene reconstruction using optical flow estimated from local phase information. The effectiveness of the entire system was tested by applying it to videos containing rain of varying complexity. The underlying ideas for the rain streak detection algorithm are derived from the spatio-temporal and chromatic properties of rain.

The framework to detect the location of rain streaks in a frame starts with the computation of the difference between two consecutive frames. This results in three difference images corresponding to the red, green and blue components. Phase congruency features are extracted from the difference images to obtain a set of candidate rain pixels. The directional property of rain is incorporated in the design of the feature extraction process. A chromatic constraint based on the strength of the phase congruency features is applied to eliminate false detections. In order to check the effectiveness of the proposed framework for rain streak detection, temporal neighbors were used to compensate for the rain affected pixels and the scene was reconstructed. It was observed that the framework was successful in removing rain streaks when there is no motion component in the scene apart from the rain streaks. The framework was then modified to eliminate false detections arising from global motion such as movement of the camera; phase correlation was used to align frames prior to streak detection.

The second part of this research focused on reconstructing the scene after detecting rain streaks. The main challenge was to make the technique robust enough to account for movement of objects present in the scene. A key observation was that optical flow computed from phase information is largely resilient to the presence of rain streaks. Based on this idea, optimization criteria were designed for robust reconstruction of the scene, with the registration error between neighboring frames as the minimization function. Detailed qualitative and quantitative comparisons were made with state-of-the-art techniques, with quantitative evaluation based on the no-reference Blind Image Quality Index (BIQI). The proposed technique was found to be as good as the current state of the art and, in some cases, better.

The algorithm is being improved to include large displacement optical flow for better reconstruction of videos containing large and rapid movements. Techniques for improving temporal consistency are also being designed. Going forward, it would be of great value if the algorithm could be made to perform in real time. In the age of autonomous navigation, downstream processing algorithms would benefit greatly from video input that is free of rain. The current algorithm would have to be ported to a suitable implementation platform for enhanced performance.