1 Introduction

Weather conditions such as haze, rain, or snow cause unpleasant visual effects in visual data (e.g., images and videos) [1]. Such effects may significantly degrade the performance of outdoor vision systems, such as outdoor surveillance-based object detection, tracking, and recognition, scene analysis and classification, vision-assisted transportation systems, and advanced driver assistance systems (ADAS) [2]. To cope with these problems, the removal of weather effects from images and videos (so-called deweathering) has recently received much attention [3,4,5], including dehazing, i.e., removal of haze [6,7,8,9,10], deraining, i.e., removal of rain [11,12,13,14,15,16,17,18,19,20,21,22,23,24], and desnowing, i.e., removal of snow [11, 24, 25]. To promptly apply the proper deweathering operation to an input image captured by an outdoor visual device, the weather condition in the image must first be decided correctly. Hence, weather image classification is essential for vision-based outdoor applications [26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41]. Based on our survey of the state-of-the-art approaches, a key technique in the literature consists of three main steps. The first step extracts the regions of interest (ROIs) from a weather image (e.g., the sky region). The second step extracts features or descriptors to represent each ROI. The third step applies a classifier to decide the weather condition of the image. Such approaches may work well for images with clear and easily extracted ROIs. However, they may not work well for an image that has no specific or easily extracted ROI describing its weather condition.

To achieve better weather classification performance, deep learning techniques [42,43,44] have recently been applied successfully to weather image classification [33, 36, 37, 39,40,41]. For example, a deep learning-based weather image classification framework based on AlexNet [42] was presented in [33] to classify an input weather image into one of two classes, sunny or cloudy. Furthermore, a two-class (sunny or cloudy) weather image classification framework based on collaborative learning was presented in [36], where data-driven convolutional neural network (CNN) features are combined with well-selected weather-specific features. In addition, a CNN-based multi-task framework was developed in [39] to tackle the weather category classification task and the weather-cue segmentation task concurrently. In this paper, considering the currently most popular deweathering operations, namely deraining [11,12,13,14,15,16,17,18,19,20,21,22,23,24] and desnowing [11, 24, 25], we present a preprocessing framework for weather image classification with three classes: rainy, snowy, and other (e.g., sunny), where the last class covers images that belong to neither of the first two. That is, our goal is to automatically decide online the weather condition of an input image captured by any outdoor sensor equipped with deweathering functionalities, and to properly trigger the corresponding deweathering operation.

Inspired by the great success achieved by deep learning in numerous perceptual tasks [42,43,44], we propose to perform weather image classification with inception network-based deep learning relying on GoogLeNet [43]. The main idea of the inception network [43] is to find out how an optimal local sparse structure in a CNN can be approximated and covered by readily available dense components. The key is to deploy multiple convolution operations with multiple filter sizes, together with pooling layers, in parallel within the same layer. As a result, both the depth and the width of the network are increased while the computational budget is kept constant. In addition, before feeding an image into the deep network, we study the possible impact on classification performance of applying a pre-filtering operation [45] to the image, which may facilitate the extraction of weather cues.

2 Proposed Inception Network-Based Weather Image Classification Framework

2.1 Problem Formulation and Preprocessing

The main goal of this paper is to learn a classifier that assigns each input image to one of three classes: rainy, snowy, and other (i.e., none of the above). Inspired by the image filtering used as preprocessing in several image denoising applications, such as image deraining [12,13,14,15,16,17,18,19,20,21,22,23,24] and image deblocking [46,47,48,49], we propose to first apply low-pass filtering to an input image \( I \) to obtain its low-frequency part, denoted by \( I_{LF} \). Then, we calculate the high-frequency part of \( I \) as \( I_{HF} = I - I_{LF} \). It is expected that some weather cues, such as rain streaks or snow streaks, would be included in the high-frequency part of the image, while the other basic image components are included in the low-frequency part. Based on the suggestion of [46,47,48,49], the BM3D (block-matching and 3D filtering) algorithm [45] is selected as the low-pass filter in our method. BM3D is based on an enhanced sparse representation in the transform domain, achieved by grouping similar 2D image fragments (blocks) into 3D data arrays. In our framework, all of the training and testing images are preprocessed via the above-mentioned filtering process to obtain their corresponding high-frequency images, while the low-frequency parts are ignored. By collecting a set of N preprocessed training images \( \left\{ {x^{\left( i \right)} } \right\} \) with corresponding labels \( \left\{ {y^{\left( i \right)} } \right\} \), i = 1, 2, …, N, our goal is to learn a classifier by optimizing the cross-entropy loss function defined as:

$$ \mathcal{L}\left( \omega \right) = \sum_{i = 1}^{N} \sum_{c = 1}^{C} - \mathbf{I}\left\{ y^{\left( i \right)} = c \right\} \log \mathbf{P}\left( y^{\left( i \right)} = c \mid x^{\left( i \right)} ;\omega \right) + \lambda \left\| \omega \right\|_{2} , $$
(1)

where C denotes the number of classes considered (C = 3 is used in this paper), I is an indicator function, \( y^{\left( i \right)} = c \) denotes that the i-th training image belongs to the c-th class, \( \mathbf{P}\left( {y^{\left( i \right)} = c|x^{\left( i \right)} ;\omega } \right) \) is the predicted probability of the class c given the image \( x^{\left( i \right)} \), \( \omega \) is the set of weighting parameters to be learned, and \( \lambda \) is a regularization parameter.
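As a minimal sketch of the pre-filtering step described in this subsection, the following Python code computes the high-frequency part as the difference between an image and its low-pass filtered version. BM3D [45] is not implemented here: a simple box blur (the hypothetical helper `box_blur`) stands in as the low-pass filter purely for illustration, and the function names and the toy image are assumptions rather than part of the original implementation.

```python
def box_blur(image, radius=1):
    """Stand-in low-pass filter (NOT BM3D): mean over a (2r+1) x (2r+1) window.

    `image` is a grayscale image stored as a list of rows of floats.
    Pixels outside the image are handled by clamping indices to the border.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def high_frequency(image, low_pass=box_blur):
    """Return I_HF = I - I_LF, where I_LF = low_pass(I)."""
    low = low_pass(image)
    return [[p - q for p, q in zip(row, low_row)]
            for row, low_row in zip(image, low)]


# Toy 5 x 5 image: flat background (10) with a bright vertical streak (40).
image = [[10.0] * 5 for _ in range(5)]
for row in image:
    row[2] = 40.0

hf = high_frequency(image)
```

In this toy case the streak column keeps a large positive response in `hf`, while flat regions far from the streak are mapped to zero, which is the behaviour the pre-filtering is expected to have for rain or snow streaks. Any other low-pass filter can be passed through the `low_pass` argument.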

2.2 Network Learning

To realize our inception network-based weather image classification framework, we use GoogLeNet [43] as the core of our method. The concept of the inception network mainly comes from the “network in network” approach presented in [50], which increases the representational power of neural networks with the deeper nets needed for image classification. In our method, we directly apply GoogLeNet with the output size changed to 3 (the original size of 1,000 was set for the ILSVRC2014 classification contest with 1,000 image classes). Different from the weather classification task presented in [33], which fine-tunes AlexNet [42] to achieve two-class weather classification, this paper proposes to fine-tune GoogLeNet to achieve three-class weather classification.

To train our inception network, we selected training images from the Rainy Image Dataset provided by [22, 23] and the Snow100K dataset provided by [25]. In addition, for the other class (belonging to neither the rain class nor the snow class), we used the related images from the Multi-class Weather Image (MWI) Dataset provided by [34, 35]. Examples of training images are shown in Fig. 1. To optimize the cross-entropy loss function defined in Eq. (1), the proposed model was trained by the back-propagation algorithm with batch SGD (stochastic gradient descent) [51], such that the softmax loss is minimized.
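To make the optimization of Eq. (1) concrete, the following is a minimal sketch of mini-batch SGD on a toy linear softmax classifier. The linear model is only a stand-in for GoogLeNet, and the toy data, learning rate, batch size, and value of λ are hypothetical choices for illustration. The regularizer follows Eq. (1) as written (the unsquared ℓ2 norm), and the mini-batch gradient is summed rather than averaged to mirror the summation in Eq. (1).

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Class probabilities P(y = c | x; w) of a toy linear model."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def l2_norm(weights):
    return math.sqrt(sum(w * w for row in weights for w in row))


def loss(weights, xs, ys, lam):
    """Eq. (1): summed cross-entropy plus lam * ||w||_2."""
    data_term = -sum(math.log(predict(weights, x)[y]) for x, y in zip(xs, ys))
    return data_term + lam * l2_norm(weights)


def sgd_step(weights, batch_x, batch_y, lr, lam):
    """One in-place mini-batch SGD update of the weights."""
    n_classes, n_features = len(weights), len(weights[0])
    grad = [[0.0] * n_features for _ in range(n_classes)]
    for x, y in zip(batch_x, batch_y):
        probs = predict(weights, x)
        for c in range(n_classes):
            err = probs[c] - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(score_c)
            for d in range(n_features):
                grad[c][d] += err * x[d]
    norm = l2_norm(weights)
    for c in range(n_classes):
        for d in range(n_features):
            # Gradient of lam * ||w||_2; taken as 0 at w = 0 (a valid subgradient).
            reg = lam * weights[c][d] / norm if norm > 0.0 else 0.0
            weights[c][d] -= lr * (grad[c][d] + reg)


# Toy data: 12 feature vectors, 3 classes (0 = rainy, 1 = snowy, 2 = other).
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]] * 4
ys = [0, 1, 2] * 4
weights = [[0.0] * 3 for _ in range(3)]
rng = random.Random(0)

initial = loss(weights, xs, ys, lam=1e-3)
for _ in range(20):                          # epochs
    order = list(range(len(xs)))
    rng.shuffle(order)
    for start in range(0, len(order), 4):    # mini-batches of 4 samples
        idx = order[start:start + 4]
        sgd_step(weights, [xs[i] for i in idx], [ys[i] for i in idx],
                 lr=0.1, lam=1e-3)
final = loss(weights, xs, ys, lam=1e-3)
```

With all-zero initial weights every class receives probability 1/3, so `initial` equals 12 log 3; after training, `final` is lower and each toy feature vector is assigned to its own class. In the actual framework the same loss is minimized over the GoogLeNet parameters by back-propagation.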

Fig. 1. Examples of training images (used in the proposed deep model) of (a) rain [22, 23]; (b) snow [25]; and (c) sunny [34, 35].

3 Experimental Results

To evaluate the performance of the proposed weather image classification framework, we used the built-in pre-trained GoogLeNet deep architecture within the Caffe software (version 0.15.13) [52] on a PC equipped with an Intel® Core™ i5-4590 processor, 12 GB of memory, and an NVIDIA GeForce GTX 1060 GPU. To establish our training and testing datasets, we randomly selected 75% of our collected images (from the Rainy Image Dataset [22, 23], the Snow100K dataset [25], and the MWI Dataset [34, 35]) for training our deep model, and the remaining 25% were used for testing. This process was performed several times to obtain the final classification accuracy. During training, the learning rate was set to 0.01, and the network was trained with a batch size of 128 for 100 epochs. The weather image classification accuracies obtained by the proposed method at different epochs, with and without the pre-filtering process, are shown in Fig. 2. It can be observed from Fig. 2 that, before the 30th epoch, the accuracies with the pre-filtering process are better than those without it. That is, the pre-filtering process based on BM3D [45] might be useful for extracting weather cues for the types of weather conditions that exhibit high-frequency properties, such as rain streaks and snow streaks. Therefore, better classification accuracies would be achieved in earlier epochs with the assistance of the preprocessing. However, as the number of epochs increases, the deep network learns better features that yield higher classification accuracies, and the advantage of the pre-filtering operation becomes less obvious.
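The random 75%/25% split and its repetition can be sketched as follows. This is only an illustration of the protocol under stated assumptions: the paper does not say how the repeated runs are combined or whether the split is stratified by class, so the averaging over runs, the number of repetitions, and the names `split_75_25` and `repeated_holdout` are hypothetical.

```python
import random


def split_75_25(items, seed):
    """Randomly split `items` into 75% training and 25% testing subsets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(0.75 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def repeated_holdout(items, evaluate, n_repeats=5):
    """Repeat the random split and average the accuracy over the runs.

    `evaluate(train, test)` is a caller-supplied function that trains a
    model on `train` and returns its accuracy on `test`.
    Averaging is an assumption; the paper only states that the process
    was performed several times.
    """
    accuracies = []
    for seed in range(n_repeats):
        train, test = split_75_25(items, seed)
        accuracies.append(evaluate(train, test))
    return sum(accuracies) / len(accuracies)


# Usage with placeholder image identifiers and a dummy evaluation function
# that simply reports the fraction of images held out for testing.
image_ids = list(range(100))
train_ids, test_ids = split_75_25(image_ids, seed=0)
mean_score = repeated_holdout(
    image_ids, lambda train, test: len(test) / (len(train) + len(test)))
```

Using a different seed for each repetition gives a different random partition each time, while keeping every individual run reproducible.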

Fig. 2. The weather image classification accuracies of different epochs with and without applying the pre-filtering process obtained by the proposed method.

4 Conclusions

In this paper, we have proposed an inception network-based weather image classification framework with a pre-filtering process for classifying each input image into one of three classes: rainy, snowy, and other. By applying the GoogLeNet deep CNN model with the pre-filtering operation to achieve efficient weather image classification, we found that the pre-filtering process would be useful for extracting weather cues for the types of weather conditions that exhibit high-frequency properties (e.g., rain streaks and snow streaks), resulting in better accuracies in earlier training epochs. Such a preprocessing technique is compatible with several recent state-of-the-art methods for removing weather effects (e.g., [12,13,14,15,16,17,18,19,20,21,22,23,24]). This property might be useful for designing a complete system for weather effect detection and removal, which is worth investigating further.