1 Introduction

Rainfall prediction is the process of analyzing and forecasting the probability of precipitation and estimating the amount of rainfall in specific regions. Alongside the likelihood of precipitation in a particular region, it also considers the estimated rainfall volume, the accuracy of the forecast, and the error in prediction. Forecasters prepare it by gathering, analyzing, verifying, modeling, simulating, and examining the available meteorological data and parameters. Precipitation is one of the six inherent components of weather prediction and is also one of the most important components of the hydrologic cycle. The monsoon is critical for India because it shapes the country's agriculture and economy. Because of the inherent complexity of the physical processes involved, the prediction of Indian Summer Monsoon Rainfall (ISMR) is one of the most challenging scientific tasks. However, monsoon research has advanced significantly owing to the ever-increasing amount of satellite data, an improved understanding of the processes, and enhanced computing resources. As an effect of the changing climate, the spatiotemporal distribution of precipitation has been shifting over the last few decades. This has resulted in frequent droughts and floods within a spatial distance of a few hundred kilometers. Quantitative forecasting of daily rainfall is a difficult task and is important for a wide range of operational applications. Over the last three decades, the accuracy of monsoon forecasts has improved and the systematic error with forecast length in the medium range has decreased. Nevertheless, rainfall forecasting for the Indian monsoon region remains a concern because forecast skill for the tropics is still lower than for the mid-latitudes [1].
Rainfall prediction can be divided into two types: short-term prediction, which covers monthly forecasts, and long-term prediction, which covers annual forecasts and is typically difficult because of the high unpredictability of rainfall and its dependence on many parameters. Machine learning models are usually known to predict well when the data set has diverse parameters and a significant degree of consistency, whereas rainfall data is time-variant and inconsistent, which is why machine learning models normally do not perform well on it. The advent of deep learning and neural networks has served as an apt solution for such problems. The multiple levels of learning and the local dependencies in a neural network model help it perform well on such fluctuating data sets with fewer parameters.

2 Literature Review

(Table 1).

Table 1 Review of relevant literature

3 Regression

Regression is a branch of data science that takes a statistical approach to problem solving. It falls under supervised learning, where models are built to predict future values of a variable. When estimating the future (unknown) variable, the known quantities and their relationships are taken into account and their connection with the unknown variable is built. Simple regression is given by A = X + X1·B, where A is the dependent variable whose value is to be predicted, B is the independent variable whose value is known, X is the constant term, and X1 is the regression coefficient.
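
The simple regression relation A = X + X1·B can be illustrated with a minimal least-squares sketch. The function names and the toy values below are assumptions made for illustration only; they are not taken from the study's data or code.

```python
from statistics import mean


def fit_simple_regression(b_values, a_values):
    """Return (X, X1) that minimize the squared error of A = X + X1 * B."""
    b_mean = mean(b_values)
    a_mean = mean(a_values)
    # Slope X1: co-variation of B and A divided by the variation of B.
    sxy = sum((b - b_mean) * (a - a_mean) for b, a in zip(b_values, a_values))
    sxx = sum((b - b_mean) ** 2 for b in b_values)
    if sxx == 0:
        raise ValueError("B must take at least two distinct values")
    x1 = sxy / sxx
    # Constant term X: forces the line through the point of means.
    x = a_mean - x1 * b_mean
    return x, x1


def predict(x, x1, b):
    """Evaluate A = X + X1 * B for a given B."""
    return x + x1 * b


# Hypothetical toy data lying exactly on A = 1 + 2 * B.
b_toy = [1.0, 2.0, 3.0, 4.0]
a_toy = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_regression(b_toy, a_toy)
print(intercept, slope, predict(intercept, slope, 5.0))
```

With several predictors (for example, individual monthly totals), the same idea extends to multiple linear regression with one coefficient per predictor.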

3.1 Support Vector Machines (SVM) for Regression

The use of SVM for regression is known as Support Vector Regression (SVR) [5]. The algorithm automatically converts nominal values to numeric values. The input data set must be normalized before training starts, either automatically (through tool configuration or scripting) or manually by the user. SVR finds a best-fit line that minimizes the error of the cost function. Only those instances in the training data set (the support vectors) that are nearest to the line with minimum cost are chosen. A margin must be created around the line to allow better adjustment of the prediction, and the data may then be projected into a higher-dimensional space for better prediction and flexibility. The cost function is minimized with respect to the model parameters over the data set.
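
The margin around the fitted line can be made concrete with a small sketch. In the standard SVR formulation the margin is usually expressed as an epsilon-insensitive tube: errors smaller than epsilon cost nothing. The parameter name `epsilon`, the helper names, and the sample values are assumptions for illustration; the source does not state which margin width was used.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon=0.1):
    """Zero loss inside the margin (|error| <= epsilon), linear loss outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)


def support_vector_indices(actuals, predictions, epsilon=0.1):
    """Indices of points on or outside the margin for a given fitted line.

    In standard SVR, only such points end up influencing the solution.
    """
    return [
        i
        for i, (a, p) in enumerate(zip(actuals, predictions))
        if abs(a - p) >= epsilon
    ]


# Hypothetical values: the second point falls outside a margin of 0.5.
actual_toy = [1.0, 2.0, 3.0]
predicted_toy = [1.25, 4.0, 3.0]
print([epsilon_insensitive_loss(a, p, 0.5) for a, p in zip(actual_toy, predicted_toy)])
print(support_vector_indices(actual_toy, predicted_toy, 0.5))
```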

Kernel functions are used to handle the high dimensionality of the feature space. Appropriate selection of the kernel function can produce a more effective result or higher accuracy in less time, thereby increasing the efficiency of the model. The Weka tool uses various kernels to accomplish this task [2].
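
As a rough illustration of what a kernel function computes, the sketch below implements three commonly used kernels. The choice of kernels and the default values of `degree`, `coef0`, and `gamma` are assumptions; the source does not say which kernel or settings were used in the experiments.

```python
import math


def linear_kernel(u, v):
    """Plain dot product: no projection into a higher-dimensional space."""
    return sum(ui * vi for ui, vi in zip(u, v))


def polynomial_kernel(u, v, degree=2, coef0=1.0):
    """Dot product in an implicit space of polynomial feature combinations."""
    return (linear_kernel(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.5):
    """Similarity that decays with squared Euclidean distance; 1.0 for identical points."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * sq_dist)


print(linear_kernel([1, 2], [3, 4]))
print(polynomial_kernel([1, 2], [3, 4]))
print(rbf_kernel([0, 0], [1, 1]))
```

Each kernel returns a similarity between two instances without building the higher-dimensional features explicitly, which is what keeps the computation tractable.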

3.2 Artificial Neural Network

An Artificial Neural Network (ANN) consists of many processors operating in parallel and arranged in tiers. The first tier receives the raw input. Instead of raw input, each successive tier receives the output of the tier preceding it. The last tier produces the output of the system. The two major arguments against ANNs are that they are resource intensive and that their results are hard to interpret.

An ANN is considered whenever computational resources are neither a constraint nor cost-prohibitive. The ANN works well when the data set used for training is large, which is usually the case for rainfall data sets. Moreover, the ANN model has hidden layers and a significant number of local dependencies, which help it learn inconsistent patterns and perform better on time series data.
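
The tiered structure described above can be sketched as a forward pass through a small fully connected network. The layer sizes, weights, and activation functions below are hypothetical and chosen only so the arithmetic is easy to follow; they do not reflect the architecture used in the study.

```python
def relu(z):
    """Rectified linear activation for hidden units."""
    return max(0.0, z)


def identity(z):
    """Linear activation, suitable for a regression output unit."""
    return z


def dense_layer(inputs, weights, biases, activation):
    """One tier: every unit applies an activation to a weighted sum of its inputs."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the raw input through each tier; the last tier's output is the prediction."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense_layer(activations, weights, biases, activation)
    return activations


# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0], relu),
    ([[2.0, 1.0]], [0.5], identity),
]
print(forward([3.0, 1.0], toy_layers))
```

Training would adjust the weights and biases to reduce the prediction error; only the forward computation is shown here.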

4 Study Area

The state of Maharashtra lies on the western side of India, at approximately 19° 36′ 4.2984″ N latitude and 75° 33′ 10.7244″ E longitude. The state is divided into 35 districts and four meteorological sub-divisions: (1) Vidarbha, (2) Madhya Maharashtra, (3) Marathwada, and (4) Konkan. The Konkan sub-division lies on the windward side of the Ghats and the other sub-divisions lie on the leeward side. The interior part of the state is semi-arid. Large variations in rainfall across different parts of the state produce a wide range of climatic conditions. Owing to its geography, Maharashtra receives its maximum rainfall in July (33% of SW monsoon rainfall), followed by August (28% of SW monsoon rainfall). About 89% of the annual rainfall is received during the southwest monsoon (June–September). During the SW monsoon season, the districts of the Konkan region receive the highest rainfall (2361–3322 mm), while parts of Madhya Maharashtra and Marathwada receive the lowest (454–600 mm). Over the whole year, there is a significant increase in rainy days in the Nandurbar, Jalgaon, Raigarh, Kolhapur and Bhandara districts. However, there is a significant decrease in rainy days in the Pune, Solapur, Kolhapur, Ahmednagar, Aurangabad, Jalna, Beed, Hingoli, Nanded, Yavatmal and Wardha districts. During the period June to September, there is a significant increase in heavy rainfall days in the Nandurbar, Jalgaon, Raigarh, Kolhapur and Bhandara districts, while a significant decrease in heavy rainfall days is observed in the Pune, Solapur, Kolhapur and Ahmednagar districts. According to the meteorological and climatic variations, rainfall over the year is divided into four seasons:

  1. Pre-monsoon seasonal rainfall (March–May).

  2. South-west monsoon season or monsoon season rainfall (June–September).

  3. Post-monsoon season rainfall (October–November).

  4. Rainfall in winter season (December–February).

In our study we have considered the annual rainfall for the state of Maharashtra from 1950 to 2020. Along with the annual rainfall, we have taken into consideration the normal monsoon season of June–September (4 months), since this is the period when the state receives the majority of its rainfall. The remaining 8 months, comprising the 5 months of January–May and the 3 months of October–December, are considered the non-monsoon rainfall period.
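
The split into a monsoon and a non-monsoon period can be expressed as a small aggregation over monthly totals. This is a minimal sketch: the month keys, the function name, and the monthly values are made up for illustration and are not IMD data.

```python
MONSOON_MONTHS = ("Jun", "Jul", "Aug", "Sep")
NON_MONSOON_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Oct", "Nov", "Dec")


def split_monsoon(monthly_mm):
    """Aggregate twelve monthly totals (mm) into monsoon and non-monsoon rainfall."""
    monsoon = sum(monthly_mm[m] for m in MONSOON_MONTHS)
    non_monsoon = sum(monthly_mm[m] for m in NON_MONSOON_MONTHS)
    annual = monsoon + non_monsoon
    return {
        "monsoon": monsoon,
        "non_monsoon": non_monsoon,
        "annual": annual,
        # Fraction of the annual total that falls in June-September.
        "monsoon_share": monsoon / annual if annual else 0.0,
    }


# Hypothetical monthly totals in mm for a single year.
toy_year = {
    "Jan": 0, "Feb": 0, "Mar": 5, "Apr": 5, "May": 20, "Jun": 200,
    "Jul": 300, "Aug": 250, "Sep": 150, "Oct": 50, "Nov": 15, "Dec": 5,
}
print(split_monsoon(toy_year))
```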

5 Data Collection

The primary data collected for this study consists of the annual rainfall data for India for the period 1950–2020, obtained from the India Meteorological Department (IMD). The data set in use has 36 sub-divisions and 19 attributes (individual months, annual, and combinations of consecutive months according to the monsoon and non-monsoon periods). All the rainfall attributes are expressed in mm.

6 Analysis of Rainfall Pattern

Many key aspects of the earth and of human life depend on rainfall directly or indirectly. Changes in the global climate are strongly related to annual rainfall. It has been observed that the rainfall pattern of late has been highly erratic, which has disrupted agriculture and water management globally. It is important to analyze the rainfall pattern in order to understand and mitigate the dependencies of the parameters that are directly or indirectly affected by annual rainfall. In the presented study, a data set of the annual rainfall of India from 1950–2020 has been analyzed, and the analysis is presented in the form of plots showing the distribution of the amount of rainfall; the distribution of rainfall annually, monthly, and over groups of months; and the distribution of rainfall across sub-divisions for each month and for groups of months. The visualization of rainfall helps to make observations that can be used for creating or choosing the right predictive model for further experimentation.

Observations made from the visualization:

  1. Histograms in Fig. 1 show the distribution of rainfall over the months. It is seen that the amount of rainfall increases during the months of July, August, and September.

    Fig. 1
    Eighteen histograms of rainfall distribution from 1950 to 2020 by years, month wise from January to December, and from January to February, March to May, June to September, and October to December. The highest is in November and December, and the months have the tallest bars in the initial period.

    Total rainfall distribution month wise from 1950–2020

  2. The two charts in Fig. 2 show that the amount of rainfall is reasonably good in the months of March, April, and May in eastern India.

    Fig. 2
    Two spike graphs. 12 curves at different levels for January to December. 4 curves for January to February, March to May, June to September, and October to December at different levels.

    Monthly rainfall in various sub-divisions of India

  3. The scatter graph in Fig. 3 shows the distribution of rainfall on an annual basis; a high amount of rainfall is seen in the 1950s.

    Fig. 3
    Two horizontal stacked bar charts with rainfall over the months. a. Month wise plots from January to December with July having the highest contribution in most of the states. b. Plots of January to February, March to May, June to September, and October to December, with June to September having the highest share.

    Scatter plot of rainfall over the months

  4. The heat map in Fig. 4 shows the correlation (dependency) between the amounts of rainfall over the months.

    Fig. 4
    A matrix represents a heat map of January to December and annually with a value of 1 at the decreasing diagonal from top left, mid-range values on either side, which decrease towards the vertices, on a scale of 0 to 1.

    Heat map for monthly and annual rainfall

  5. From the above visualizations it is clear that if the amount of rainfall is high in the months of July, August, and September, then the annual amount of rainfall will be high.

  6. It is also observed that if the amount of rainfall is good in the months of October, November, and December, then the rainfall will be good for the year overall.
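
The dependency between monthly and annual rainfall that the heat map visualizes can be quantified with a correlation coefficient. The sketch below assumes the Pearson coefficient, which is a common default for such heat maps; the source does not name the coefficient used. The column names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise correlations between named columns, as a nested dict."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical rainfall totals (mm) for four years.
toy_columns = {
    "Jul": [300, 350, 280, 400],
    "Annual": [1000, 1150, 950, 1300],
}
matrix = correlation_matrix(toy_columns)
print(matrix["Jul"]["Annual"])
```

A value close to 1 between a month and the annual total would support the observation that a wet month tends to accompany a wet year.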

7 Methodology

In the presented study, the analysis of rainfall was carried out first, followed by the prediction of annual rainfall. Three prediction models were used and tested on the annual rainfall data set.

  1. The input data set originally had monthly and annual rainfall for each year from 1950–2020. Data integration was performed by adding the monsoon and non-monsoon rainfall parameters in the form of the monthly groupings Jan–Feb, Mar–May, Jun–Sep, and Oct–Dec.

  2. All the parameters were verified and the missing data was augmented wherever necessary. The available data was then analyzed to create a visualization of the rainfall trends over the years. The data set was then split into 80% training data and 20% test data.

  3. The linear regression, SVR, and ANN models were used for making annual predictions on the data set. Two types of training were carried out: one on the complete data set and the other on Maharashtra and Konkan only.

  4. The ANN model was run for 10 epochs to improve its efficiency, as it is known to achieve better accuracy on inconsistent data when trained for more epochs than usual.

  5. The MAE, MSE, RMSE, and R2_Score of all three models were compared to find the best model for predicting annual rainfall in Maharashtra and Konkan. The models were evaluated on the basis of these metrics.
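
The 80%/20% split in step 2 can be sketched as follows. The source does not say whether the split was random or chronological; a seeded random split is assumed here, and the function name, the `seed` parameter, and the use of years as stand-in rows are illustrative only.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Randomly hold out a fraction of rows for testing, preserving row order."""
    indices = list(range(len(rows)))
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Stand-in rows: one entry per year of the 1950-2020 record.
years = list(range(1950, 2021))
train_years, test_years = train_test_split(years)
print(len(train_years), len(test_years))
```

For time series, a chronological split (training on earlier years and testing on later ones) is a common alternative that avoids testing on years that precede the training data.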

8 Results Analysis and Discussion

The study aims at the long-term prediction of rainfall in the state of Maharashtra, which is carried out with the help of regression modelling. The accuracies of the models differed markedly on this long-term prediction task. Long-term prediction is usually uncertain because of the data size, the accumulation of error over time, and the lack of information. Long-term time series prediction has many limitations, ranging from the dearth of data to less accurate results, but deep learning techniques can be leveraged to introduce localization and improve long-term prediction. The deep learning models used were able to learn the complex and arbitrary mapping directly from the inputs and to support the hypothesis on the output side (Table 2).

Table 2 Prediction performance

The comparative results in Table 2 show the performance of the three algorithms on each metric. The predictions were made on the basis of the annual rainfall and the monsoon and non-monsoon sub-divisions. The MAE values for the three models show the deviation of the predicted values from the actual values, which is higher for linear regression and SVR, whereas ANN shows comparatively less deviation. Similarly, the RMSE values indicate that the predictions made by LR and SVR are scattered and lie far from the best-fit line. ANN predicts better, as its R2_Score is also nearer to 1, which is considered ideal for any predictive model. The rainfall data is discrete and nonlinear, which is why the two machine learning models predict less accurately; the ANN performs better, but its performance cannot yet be called satisfactory (Figs. 5, 6 and 7).
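
For reference, the four metrics discussed above can be computed as in the following sketch. The function names and sample values are illustrative assumptions and do not reproduce the numbers in Table 2.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the deviations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error: penalizes large deviations more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data (mm)."""
    return math.sqrt(mse(actual, predicted))


def r2_score(actual, predicted):
    """Coefficient of determination: 1.0 means a perfect fit."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical rainfall values in mm.
actual_mm = [100.0, 200.0, 300.0, 400.0]
predicted_mm = [110.0, 190.0, 310.0, 390.0]
print(mae(actual_mm, predicted_mm), rmse(actual_mm, predicted_mm), r2_score(actual_mm, predicted_mm))
```

Lower MAE, MSE, and RMSE indicate smaller deviations, while an R2_Score close to 1 indicates that the model explains most of the variance in the observed rainfall.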

Fig. 5
Three grouped bar graphs of the amount of rainfall from April to December for Maharashtra and Konkan using Linear Regression. The highest values for ground truth in 2005, 2010, and 2015 are July 1340, July 1405, and June 266, and for prediction are August 1017, August 1069, and July 259.

Prediction of Year 2005, 2010, 2015 for Maharashtra and Konkan using Linear Regression

Fig. 6
Three grouped bar graphs of the amount of rainfall from April to December for Maharashtra and Konkan using SVR. The highest values for ground truth in 2005, 2010, and 2015 are July 1340, July 1405, and June 266, and for prediction are July to December 82, July to December 82, and April to December 82.

Prediction of Year 2005, 2010, 2015 for Maharashtra and Konkan using SVR

Fig. 7
Three grouped bar graphs of the amount of rainfall from April to December for Maharashtra and Konkan using ANN. The highest values for ground truth in 2005, 2010, and 2015 are July 1340, July 1405, and June 266, and for prediction are August 1017, August 1069, and July 259.

Prediction of Year 2005, 2010, 2015 for Maharashtra and Konkan using ANN

9 Conclusion

Forecasting rainfall can help in making decisions in agriculture-based fields, and prior knowledge of rainfall can help foresee and mitigate problems. In this paper, long-term forecasting of rainfall in Maharashtra is carried out with the help of predictive modelling and deep neural networks. The study also provides practical evidence that annual rainfall in Maharashtra and Konkan is variable and unevenly distributed over the monsoon and non-monsoon periods. The visualization of Indian rainfall made it clear that there is a lot of variance and discreteness in the rainfall pattern. To produce the desired forecasts, three forecasting techniques, i.e., LR, SVR, and ANN, were evaluated using multiple performance metrics. Significant weather profiles from eight different cities were selected to develop a synthetic weather station. The results of the two machine learning models (linear regression and SVR) show that these models are poorly suited to rainfall prediction because of the fluctuations in rainfall. On the basis of evaluation metrics such as MAE, MSE, and R2_Score, the neural network model used for experimentation performed better on the nonlinear time series data even though the number of parameters in the data set was limited. Several aspects of this research can be explored further. In future experiments, more powerful neural network models such as RNN and LSTM (Long Short-Term Memory) can be applied to the nonlinear data to obtain more accurate prediction and analysis. The data set can also be made more modular by adding more parameters to obtain better results.