Abstract
In simple terms, a model is a mathematical equation, and this holds for any machine learning model, including a deep neural network. Every model generates an output for a given input; what matters is obtaining output of the desired accuracy. Machine learning models are trained on training data, and their fit is judged on testing data. Before the model is fitted to the training data and used to predict on unseen (test) data, the data must be pre-processed to bring model accuracy to an acceptable level. This paper presents the steps involved in pre-processing a raw labelled dataset (Seattle weather) with 25,551 records (from 1948 to 2017) to make it suitable as input to a deep neural network model. The data is split into 80% training data and 20% testing data, and scaling is performed before the data is passed to the deep neural network model. A deep neural network, namely a multilayer perceptron built from dense layers with the Sequential model API and compiled with the Adam optimizer, achieves a test accuracy of 97.33% in predicting whether it rains on a particular day.
1 Introduction
Artificial neural network (ANN) models have been used for rainfall prediction [1, 2] and have been found suitable for handling large, complex datasets, particularly nonlinear ones. There are several methods [3] apart from artificial neural networks that have been used for forecasting rainfall; however, ANNs have proved useful in identifying complex nonlinear relationships between input and output variables. As the number of hidden layers is increased for better performance [4], the model becomes a deep learning model, and such models have also proved useful for rainfall prediction. Understanding the mathematics behind deep neural network models guides the choice of architecture, the setting of hyperparameter values and the choice of optimization method when fine-tuning them. However, the success of a model for prediction or classification depends directly on the data used to train it. Real-world data in its raw form may not be suitable for training, which is why data pre-processing is important.
Data pre-processing refers to improving the quality of data and involves data cleaning, data reduction, data transformation and data encoding. Data cleaning deals with missing values, duplicate values and outliers. Data reduction reduces the number of features, typically to mitigate the curse of dimensionality. Data transformation scales the data using either normalization or standardization. Data encoding converts categorical features in text format into numbers.
This paper presents the steps involved in pre-processing a raw labelled dataset (Seattle weather) with 25,551 records to make it suitable as input to a deep neural network model. Insight gained from the data is then used to identify an architecture suitable for the chosen problem. A deep learning model, namely a multilayer perceptron built from dense layers with the Sequential model API, is compiled with the Adam optimizer to achieve the desired accuracy.
2 Methodology
Raw data is pre-processed so that all feature variables and the target variable are in a form the DNN model can accept. The relationships between variables are examined using a quantitative approach. Pre-processing determines the number of input features and hence the number of input neurons. After pre-processing, the data is divided into training and test sets and scaling is applied. A DNN is then built with the Sequential approach by adding dense layers, and an activation function is chosen for the hidden layers. Because the problem is a binary classification, the sigmoid function is applied at the output layer. The number of epochs is chosen by observing the model loss and model accuracy curves. Finally, the test accuracy is computed from the predictions of the trained model.
3 Data and Data Pre-processing
3.1 Data Set
The present study uses the Seattle, US weather dataset [5], which contains daily rainfall records from 1 January 1948 to 14 December 2017. The dataset consists of five columns, which are described in Table 1.
The dataset contains 25,551 records, and its memory usage is approximately 998 KB. DATE, PRCP, TMAX and TMIN are the features (X), and RAIN is the target variable (Y). RAIN is categorical with two possible values, True (rain) or False (no rain), so the problem is a binary classification problem. The features PRCP, TMAX and TMIN are continuous numerical values, and DATE is in YYYY-MM-DD format. Table 2 shows five sample records from the dataset, and Table 3 gives its statistical description.
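The figures above can be cross-checked with a few lines of Python. This sketch rests on two assumptions that the paper does not state: that the records are consecutive daily observations starting on 1 January 1948, and that each of the five columns occupies 8 bytes per value (as pandas stores float64, int64 and object columns). Under the first assumption, 25,551 records end on 14 December 2017; under the second, the column data occupies about 998 KiB, in line with the reported memory usage.

```python
from datetime import date, timedelta

N_RECORDS = 25_551       # record count reported for the raw dataset
N_COLUMNS = 5            # DATE, PRCP, TMAX, TMIN, RAIN
BYTES_PER_VALUE = 8      # assumption: 8 bytes per cell (float64, int64 or object pointer)

start = date(1948, 1, 1)
# With one record per day, the last date is (N_RECORDS - 1) days after the first.
end = start + timedelta(days=N_RECORDS - 1)

# Memory for the column data only; pandas also counts a small index overhead.
memory_kib = N_RECORDS * N_COLUMNS * BYTES_PER_VALUE / 1024

print(end)                   # 2017-12-14
print(round(memory_kib, 1))  # 998.1
```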
3.2 Data Pre-processing
Deep neural network (DNN) models, like any other machine learning model, require the data to be pre-processed before it is passed to the input neurons. One important step is to identify missing values and, if any are found, to treat them appropriately: records with missing values can be dropped, or the missing values can be replaced with the mean, the median or another relevant value such as 0 or 1. It is also necessary to check for duplicate records, since duplicates can bias the model and can inflate the test accuracy when copies fall in both the training and test sets. Raw datasets often contain less important columns, identified during pre-processing, that can be left out of the model's input. Conversely, new columns may need to be derived from existing ones to extract more useful features for the model. Finally, because the input layer of a deep learning model accepts real values, any text must be encoded as numbers beforehand.
Table 4 shows that three records have null values in the PRCP and RAIN columns. These null values must be treated before the data is passed to the model. Since three records are negligible compared with the total number of records, dropping them is the recommended treatment; they are therefore removed, leaving 25,548 records for further processing. The dataset is then checked for duplicate values, and no duplicate records are found.
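In pandas, this cleaning step is usually written as `df.dropna()` followed by `df.duplicated().sum()`, but the paper does not show its code. The sketch below mirrors the same logic without libraries, on a few hypothetical rows whose values are invented for illustration; `drop_missing` and `count_duplicates` are illustrative helper names.

```python
# Hypothetical sample rows (values invented for illustration); None marks a missing value.
rows = [
    {"DATE": "1948-01-01", "PRCP": 0.47, "TMAX": 51, "TMIN": 42, "RAIN": True},
    {"DATE": "1948-01-02", "PRCP": None, "TMAX": 45, "TMIN": 36, "RAIN": None},
    {"DATE": "1948-01-03", "PRCP": 0.42, "TMAX": 45, "TMIN": 35, "RAIN": True},
    {"DATE": "1948-01-04", "PRCP": 0.00, "TMAX": 52, "TMIN": 39, "RAIN": False},
]

def drop_missing(records):
    """Keep only records in which every field has a value (like DataFrame.dropna())."""
    return [r for r in records if all(v is not None for v in r.values())]

def count_duplicates(records):
    """Count records identical to an earlier one (like DataFrame.duplicated().sum())."""
    seen, duplicates = set(), 0
    for r in records:
        key = tuple(sorted(r.items()))  # hashable, order-independent form of the record
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates

clean = drop_missing(rows)
print(len(clean), count_duplicates(clean))  # 3 0
```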
The DATE field is split into three columns, 'YEAR', 'MON' and 'DAY'. In the 'RAIN' field, True is replaced by 1 and False by 0, converting the Boolean data to numbers. The features X are thus 'PRCP', 'TMAX', 'TMIN', 'YEAR', 'MON' and 'DAY', and 'RAIN' is the target column Y. The resulting data is shown in Table 5.
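A minimal sketch of the date split and Boolean encoding, applied to one hypothetical record. In pandas the same steps are commonly written with `pd.to_datetime(df["DATE"])` and its `.dt.year`, `.dt.month` and `.dt.day` accessors, and `df["RAIN"].astype(int)`; the helper `transform` below is a hypothetical name, not the paper's code.

```python
from datetime import date

def transform(record):
    """Split DATE (YYYY-MM-DD) into YEAR/MON/DAY and encode RAIN as 1/0."""
    d = date.fromisoformat(record["DATE"])
    features = {
        "PRCP": record["PRCP"],
        "TMAX": record["TMAX"],
        "TMIN": record["TMIN"],
        "YEAR": d.year,
        "MON": d.month,
        "DAY": d.day,
    }
    target = 1 if record["RAIN"] else 0   # True -> 1, False -> 0
    return features, target

# Hypothetical input record.
x, y = transform({"DATE": "1948-01-03", "PRCP": 0.42, "TMAX": 45, "TMIN": 35, "RAIN": True})
print(x["YEAR"], x["MON"], x["DAY"], y)  # 1948 1 3 1
```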
3.3 Data Insight
The dataset covers the years 1948 to 2017, and the month-wise distribution of records over this period is shown in Table 6.
Table 7 shows the rainfall experienced in each month over the period 1948 to 2017. Over this period, the lowest rainfall occurs in July and the highest in December.
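The paper does not say whether the month-wise rainfall in Table 7 counts rainy days or sums PRCP. Assuming the former, the aggregation amounts to a group-by on the month (in pandas, roughly `df.groupby("MON")["RAIN"].sum()`). The sketch below uses a handful of hypothetical (month, RAIN) pairs, not the real data.

```python
from collections import Counter

# Hypothetical (MON, RAIN) pairs after encoding; RAIN is 1 (rain) or 0 (no rain).
observations = [(1, 1), (1, 1), (1, 0), (7, 0), (7, 0), (7, 1), (12, 1), (12, 1), (12, 1)]

rainy_days = Counter()
for month, rain in observations:
    rainy_days[month] += rain   # adds 0 for dry days, so every month still gets a key

wettest = max(rainy_days, key=rainy_days.get)
driest = min(rainy_days, key=rainy_days.get)
print(dict(rainy_days), wettest, driest)  # {1: 2, 7: 1, 12: 3} 12 7
```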
The histogram of the RAIN column is shown in Fig. 1. There are 14,648 records with no rainfall and 10,900 records with rainfall.
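The two counts sum to 25,548, the number of records left after the null rows are dropped. They also give a reference point for the accuracies reported later: a trivial model that always predicts "no rain" would be correct on about 57.3% of the records. The arithmetic is exact for the full cleaned dataset and only approximate for a random test split.

```python
no_rain, rain = 14_648, 10_900   # class counts reported above
total = no_rain + rain

share_rain = rain / total
# A classifier that always predicts "no rain" is right on every no-rain record.
majority_baseline = no_rain / total

print(total)                        # 25548
print(round(share_rain, 4))         # 0.4266
print(round(majority_baseline, 4))  # 0.5734
```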
To plot precipitation against temperature, an additional column AVGTEMP is created from TMAX and TMIN; the resulting scatter plot is shown in Fig. 2. Table 3 shows that the maximum value of PRCP is 5.02, which appears as an outlier in Fig. 2. PRCP exceeds 2.0 only in a few cases, in which the average temperature lies between 48 and 60°F.
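The paper does not give the formula for AVGTEMP; the sketch below assumes it is the arithmetic mean of TMAX and TMIN, and then lists the records with PRCP above 2.0 together with their average temperature, which is the comparison read off Fig. 2. The records are hypothetical.

```python
# Hypothetical records; temperatures in °F.
records = [
    {"PRCP": 0.00, "TMAX": 72, "TMIN": 55},
    {"PRCP": 2.20, "TMAX": 62, "TMIN": 50},
    {"PRCP": 0.35, "TMAX": 45, "TMIN": 38},
    {"PRCP": 3.10, "TMAX": 55, "TMIN": 45},
]

for r in records:
    # Assumption: AVGTEMP is the mean of the daily maximum and minimum temperatures.
    r["AVGTEMP"] = (r["TMAX"] + r["TMIN"]) / 2

# Records with unusually high precipitation, paired with their average temperature.
heavy = [(r["PRCP"], r["AVGTEMP"]) for r in records if r["PRCP"] > 2.0]
print(heavy)  # [(2.2, 56.0), (3.1, 50.0)]
```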
Precipitation is water released from clouds in the form of rain, freezing rain, sleet, snow or hail [6]. Referring to Table 8 [7] and Fig. 3, Seattle typically experiences moderate rainfall from October to April and light rainfall from May to September.
3.4 Train–Test Split
To estimate the performance of machine learning algorithms, the data is split into training and testing data. Commonly it is split three ways, into training, validation and test data: the training data is used to fit the model, the validation data to tune and validate it, and the test data to evaluate it. In the present experiment, the data is split into training and test data only, in the ratio 80:20, that is, 80% training data and 20% testing data.
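The paper does not say how the 80:20 split was made; scikit-learn's `train_test_split(X, y, test_size=0.2)` is a common choice. The library-free sketch below shuffles the records with a fixed seed (an assumption) and rounds the test size up, which is how `train_test_split` handles a fractional `test_size`. Applied to the 25,548 cleaned records, it yields 20,438 training and 5,110 test records.

```python
import math
import random

def train_test_split_80_20(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and hold out test_fraction of them for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed for a reproducible split
    # Round the test size up, as scikit-learn does for a float test_size.
    n_test = math.ceil(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]

# Record indices stand in for the cleaned records.
train, test = train_test_split_80_20(range(25_548))
print(len(train), len(test))  # 20438 5110
```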
3.5 Feature Scaling
Machine learning algorithms that use gradient descent as the optimization technique, such as linear regression, logistic regression and neural networks, require the data to be scaled [8]. The columns of the Seattle weather dataset have widely varying ranges of values, so scaling is applied to bring all values onto the same scale before they are given to the deep neural network model. Scaling can be done by either normalization or standardization. Normalization rescales values to the range 0 to 1, whereas standardization centres values on a mean of zero with a standard deviation of one. Because the data contains outliers in PRCP and AVGTEMP, and normalization is sensitive to outliers (a single extreme value compresses all other values into a narrow range), standardization is applied to the training and test data.
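A minimal sketch of standardization, z = (x - mu) / sigma, for one column. A common practice, which the paper does not state explicitly, is to learn mu and sigma from the training data only and reuse them for the test data, so that no information from the test set leaks into training; scikit-learn's `StandardScaler` does this with `fit_transform` on the training data and `transform` on the test data. For comparison, min-max normalization would instead compute (x - min) / (max - min). The values below are hypothetical.

```python
from statistics import mean, pstdev

def fit_standardizer(column):
    """Learn the mean and population standard deviation from training values only."""
    return mean(column), pstdev(column)

def standardize(column, mu, sigma):
    """Apply z = (x - mu) / sigma using statistics learned from the training data."""
    return [(x - mu) / sigma for x in column]

train_tmax = [40.0, 50.0, 60.0, 70.0]   # hypothetical training values
test_tmax = [55.0, 80.0]                # hypothetical test values

mu, sigma = fit_standardizer(train_tmax)
print(mu, round(sigma, 4))                                       # 55.0 11.1803
print([round(z, 4) for z in standardize(test_tmax, mu, sigma)])  # [0.0, 2.2361]
```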
4 Deep Neural Network Model
In simple terms, a neural network model can be described as a mathematical function that maps inputs to the desired output. It comprises an input layer, an output layer, an arbitrary number of hidden layers, a set of weights and biases between each pair of adjacent layers, and a choice of activation function and loss function.
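In the usual notation (the symbols here are assumed, not taken from the paper), each layer l computes a weighted sum of the previous layer's outputs plus a bias and passes it through an activation function g:

```latex
\begin{aligned}
\mathbf{a}^{(0)} &= \mathbf{x}, \\
\mathbf{a}^{(l)} &= g^{(l)}\!\left(W^{(l)}\,\mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}\right), \qquad l = 1, \dots, L, \\
\hat{y} &= a^{(L)}, \qquad \operatorname{ReLU}(z) = \max(0, z), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
\end{aligned}
```

For the network of Section 4, L = 3, g^(1) and g^(2) are ReLU and g^(3) is the sigmoid, and the weight matrices W^(1), W^(2) and W^(3) have shapes 6 × 6, 4 × 6 and 1 × 4.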
Once the data has been transformed, a DNN is built on the pre-processed data with the Sequential model, which allows the model to be built layer by layer into a network of dense layers. As there are six input features (all columns of Table 5 except the target 'RAIN'), the first dense layer takes six inputs and has six neurons. The rectified linear unit (ReLU) activation function is used at the hidden layers, and a second dense layer with four neurons and ReLU activation is added. As the problem is a binary classification, the sigmoid function is used at the output layer; it outputs a value between 0 and 1 that is interpreted as the probability of rainfall, and thresholding this value (typically at 0.5) gives 1 for rainfall or 0 for no rainfall. The model thus has six inputs, two hidden layers with six and four neurons, and an output layer with one output. The model is implemented using Keras [9], and the model summary is shown in Table 9.
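In Keras, the architecture described above can be written as `keras.Sequential([keras.Input(shape=(6,)), layers.Dense(6, activation="relu"), layers.Dense(4, activation="relu"), layers.Dense(1, activation="sigmoid")])`; the paper does not list its exact code. To show what that model computes, the dependency-free sketch below performs one forward pass through a 6-6-4-1 network. The uniform random initialisation is an arbitrary choice for illustration (Keras defaults to Glorot uniform), and the input row is hypothetical.

```python
import math
import random

LAYER_SIZES = [6, 6, 4, 1]   # 6 inputs -> Dense(6, ReLU) -> Dense(4, ReLU) -> Dense(1, sigmoid)

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(sizes, seed=0):
    """Random weights (one row per neuron) and zero biases for each dense layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params

def forward(x, params):
    """Apply each dense layer: ReLU on the hidden layers, sigmoid on the output layer."""
    a = x
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, a)) + b for row, b in zip(weights, biases)]
        is_output = i == len(params) - 1
        a = [sigmoid(v) for v in z] if is_output else [relu(v) for v in z]
    return a[0]

params = init_params(LAYER_SIZES)
x = [0.3, -1.2, 0.5, 0.0, 1.1, -0.4]   # one hypothetical standardized input row
p = forward(x, params)                  # probability of rain, between 0 and 1
label = 1 if p > 0.5 else 0             # threshold at 0.5: rain (1) or no rain (0)
print(round(p, 3), label)
```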
Adaptive optimization algorithms such as Adam [10], Adagrad [11] and RMSprop [12] are commonly used to train deep neural network models; "adaptive" means that they compute an individual learning rate for each parameter. The model is compiled with the Adam optimizer, which can be viewed as a combination of RMSprop and stochastic gradient descent [13] with momentum, with a few distinctions. Because the problem is a binary classification with target values in {0, 1}, the loss is computed using binary cross-entropy [14]. The model is fitted to the training data for 10 epochs with a batch size of 64. One epoch is one complete pass through the training data; the number of epochs is a hyperparameter with no fixed rule of thumb. The batch size is the number of samples processed in a single mini-batch, that is, before each update of the weights.
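The sketch below writes out the two ingredients named above: the binary cross-entropy loss, averaged over samples, and a single Adam update following Algorithm 1 of Kingma and Ba [10]. The hyperparameter defaults match those of Keras's Adam optimizer (learning rate 0.001, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-7), although Keras's internal update differs in small details. The toy problem of minimising theta squared is only an illustration.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # moving average of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad * grad   # moving average of squared gradients (RMSprop)
    m_hat = m / (1 - beta1 ** t)                # bias corrections for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

print(round(binary_cross_entropy([1, 0], [0.9, 0.2]), 4))  # 0.1643

# Minimise f(theta) = theta**2, whose gradient is 2*theta, starting from theta = 1.0.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
print(f"theta after 2000 steps: {theta:.6f}")
```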
5 Results
In this section, the results generated at various stages are discussed.
5.1 Weights and Bias
The first layer, dense_1, with output shape (None, 6), has 42 parameters: 36 weights (6 inputs × 6 neurons) and 6 biases. Similarly, dense_2 has 24 weights and 4 biases, and dense_3 has 4 weights and 1 bias, giving 75 parameters in total. The learned value of each weight and bias is shown in Table 10.
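The parameter counts follow directly from the layer sizes: a dense layer with n_in inputs and n_out neurons has n_in × n_out weights and n_out biases.

```python
def dense_param_count(n_inputs, n_neurons):
    """A dense layer has one weight per (input, neuron) pair plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

layers = [(6, 6), (6, 4), (4, 1)]   # (inputs, neurons) for dense_1, dense_2, dense_3
counts = [dense_param_count(i, n) for i, n in layers]
print(counts, sum(counts))  # [42, 28, 5] 75
```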
5.2 Training and Validation: Loss and Accuracy
After the model is compiled, it is fitted to the training data for 10 epochs with a batch size of 64. As the number of epochs increases, the model learns more from the data, and both training and validation accuracy increase. The training and validation loss and accuracy for each epoch are shown in Table 11. Training took 2.48 s.
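With mini-batch training, each epoch performs one weight update per batch. The paper does not state the size of its training set or how its validation curves were produced, so the sketch assumes 20,438 training records (80% of 25,548, with the test size rounded up) all used for gradient updates; if part of them were held out, for example through the `validation_split` argument of Keras's `fit`, the counts would be smaller.

```python
import math

def updates_per_epoch(n_train, batch_size):
    """Number of mini-batches (weight updates) per epoch; the last batch may be smaller."""
    return math.ceil(n_train / batch_size)

def batch_bounds(n_train, batch_size):
    """(start, end) index pairs for each mini-batch in one pass over the data."""
    return [(s, min(s + batch_size, n_train)) for s in range(0, n_train, batch_size)]

n_train = 20_438   # assumption: 80% of 25,548 records, no further validation split
steps = updates_per_epoch(n_train, 64)
print(steps, steps * 10)               # 320 3200
print(batch_bounds(n_train, 64)[-1])   # (20416, 20438)
```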
Figure 4 shows that the model loss decreases and the model accuracy increases with each epoch. From the 8th epoch onwards the curves start to flatten, and by the 10th epoch they are stagnant (with a batch size of 64). Training is therefore stopped after 10 epochs, and the model is used on the test data.
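Stopping once the curves flatten can be automated with a patience rule, similar in spirit to Keras's `EarlyStopping` callback (`monitor="val_loss"`, `patience`, `min_delta`). The paper stopped manually at 10 epochs; the function below and the loss values it is run on are hypothetical and do not reproduce Table 11.

```python
def stop_epoch(val_losses, patience=2, min_delta=0.001):
    """Return the 1-based epoch at which training stops: the first epoch at which the loss
    has failed to beat its best value by at least min_delta for `patience` consecutive
    epochs, or the last epoch if that never happens."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0   # meaningful improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)

# Hypothetical loss curve that flattens after a few epochs (not the paper's values).
losses = [0.62, 0.45, 0.33, 0.25, 0.20, 0.17, 0.15, 0.1495, 0.1492, 0.1490]
print(stop_epoch(losses))  # 9
```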
5.3 Test Loss and Test Accuracy
The model is applied to the test data, and the resulting test loss and test accuracy are shown in Table 12.
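Test accuracy is the fraction of test records whose thresholded prediction matches the label, and the test loss is the binary cross-entropy on the same records; in Keras both are returned by `model.evaluate(X_test, y_test)` when accuracy is a compiled metric. A minimal sketch of the accuracy computation, on hypothetical labels and probabilities:

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the 0/1 label."""
    predictions = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(1 for y, yhat in zip(y_true, predictions) if y == yhat)
    return correct / len(y_true)

y_test = [1, 0, 1, 1, 0]                  # hypothetical labels
p_test = [0.91, 0.12, 0.47, 0.66, 0.30]   # hypothetical sigmoid outputs
print(accuracy(y_test, p_test))  # 0.8
```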
5.4 Comparative Analysis
The target variable in the present problem is binary: 1 for rain and 0 for no rain. It is therefore a classification problem, and several machine learning classification algorithms can be fitted to this dataset after appropriate pre-processing. Logistic regression [15] is one such model and is well suited to binary classification. When the training data is fitted to a logistic regression model, the test accuracy obtained is 0.9330724070450098.
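The paper does not say which logistic regression implementation was used; scikit-learn's `LogisticRegression().fit(X_train, y_train).score(X_test, y_test)` is a common way to obtain such a test accuracy. The library-free sketch below is an unregularised version trained by batch gradient descent (scikit-learn's default adds L2 regularisation and uses a different solver). It shows that logistic regression is a single sigmoid neuron with no hidden layer, so its decision boundary w·x + b = 0 is linear. The data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean binary cross-entropy; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += error * xi[j] / n
            grad_b += error / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Linear decision boundary: class 1 where w.x + b > 0 (probability above 0.5)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]

# Hypothetical, linearly separable toy data with two standardized features.
X = [[-2.0, -1.0], [-1.5, -0.5], [-1.0, -1.5], [1.0, 1.5], [1.5, 0.5], [2.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y)
print(predict(X, w, b))  # [0, 0, 0, 1, 1, 1]
```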
6 Conclusion
This paper presents the steps involved in pre-processing raw data before passing it to a deep neural network model. The architecture of the deep neural network model is influenced by the input feature vectors and by the target variable, which is the expected output. The choice of activation function and optimizer affects the loss reached during training. Here, the Seattle weather data, available for the period 1948 to 2017, is used for rainfall prediction. Predicting rainfall on a particular day is a binary classification problem, for which a deep neural network model is trained. A Sequential model with dense layers, ReLU activation at the hidden layers, sigmoid activation at the output layer, the Adam optimizer and 10 epochs with a batch size of 64 achieves a test accuracy of 97.33% on the dataset. When the same training data was fitted to a logistic regression model, it gave an accuracy of approximately 93.3%. The DNN model is therefore recommended: logistic regression produces a linear decision boundary, whereas a DNN model can capture more complex, nonlinear relationships in the data.
References
Nayak DR, Mahapatra A, Mishra P (2013) A survey on rainfall prediction using artificial neural network. Int J Comput Appl
Lee S, Cho S, Wong PM (1998) Rainfall prediction using artificial neural networks. J Geogr Inf Decis Anal 2(2):233–242
Lee J, Kim C-G, Lee JE, Kim NW, Kim H (2018) Application of artificial neural networks to rainfall forecasting in the Geum River Basin, Korea. Water 10:1448. https://doi.org/10.3390/w10101448
Aswin S, Srikanth G, Vinayakumar R (2018) Deep learning models for the prediction of rainfall. In: Conference: 2018 international conference on communication and signal processing (ICCSP). https://doi.org/10.1109/ICCSP.2018.8523829
Did it rain in Seattle? (1948–2017). Kaggle dataset. https://www.kaggle.com/rtatman/did-it-rain-in-seattle-19482017
Rain: A Water Resource. USGS General Interest Publication. https://www.usgs.gov/special-topic/water-science-school/science/precipitation-and-water-cycle?qt-science_center_objects=0#qt-science_center_objects. Last accessed 12/10/2020
Lull HW (1959) Soil compaction on forest and range lands. U.S. Dept. of Agriculture, Forestry Service, Misc. Publication No. 768
Bhandari A (2020) Feature scaling for machine learning: understanding the difference between normalization versus standardization. https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/
Keras Homepage. https://keras.io/. Last accessed 14/10/2020
Kingma DP, Ba JL (2014) Adam: a method for stochastic optimization. arXiv:1412.6980v9
Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12:2121–2159
Tieleman T, Hinton G (2012) Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA: neural networks for machine learning 4(2):26–31
Wilson AC, Roelofs R, Stern M, Srebro N, Recht B (2017) The marginal value of adaptive gradient methods in machine learning. arXiv:1705.08292v2
Mannor S, Peleg D, Rubinstein R (2005) The cross entropy method for classification. In: ICML '05: Proceedings of the 22nd international conference on machine learning, pp 561–568. https://doi.org/10.1145/1102351.1102422
Jakaitiene A (2019) Nonlinear regression models. In: Encyclopedia of bioinformatics and computational biology
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Desai, C. (2021). Rainfall Prediction Using Deep Neural Network. In: Sharma, H., Gupta, M.K., Tomar, G.S., Lipo, W. (eds) Communication and Intelligent Systems. Lecture Notes in Networks and Systems, vol 204. Springer, Singapore. https://doi.org/10.1007/978-981-16-1089-9_9
DOI: https://doi.org/10.1007/978-981-16-1089-9_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1088-2
Online ISBN: 978-981-16-1089-9
eBook Packages: Engineering, Engineering (R0)