
1 Introduction

Throughout human history, the prediction of meteorological processes has been a complex task, mainly because the Earth's atmosphere is a highly complex and dynamic system.

Weather forecasts are calculated from meteorological data collected by a worldwide network of weather stations, radiosondes, radars and satellites. The data are sent to meteorological centres, where they are fed into models that compute the forecast atmospheric conditions. Such models are based on physical laws and rely on extremely complex algorithms.

Precipitation is determined by the physical processes occurring in the cloud, namely the interaction of water droplets, ice particles and water vapor. Convective clouds are highly variable because of the large vertical velocities inside the cloud and in its environs, and it is difficult to conduct controlled experiments on them. As a result, cloud development is usually analyzed by computer simulation, which allows us to do this without resorting to expensive field experiments.

Computer simulation of a cloud yields a data set that can then be used for forecasting dangerous convective phenomena such as thunderstorms, hail and heavy rain.

In recent decades mathematicians and programmers have been working hard to improve existing numerical weather forecasting models. Nowadays machine learning methods, especially neural networks, are considered among the most promising tools for such improvement [2, 3]. The authors of [2] state that the advantages of neural networks are the intrinsic absence of model bias and possible savings of computational resources, since a trained network can calculate forecasts from new data very efficiently.

The use of machine learning methods allows the forecasting process to be automated, which greatly facilitates data analysis. These methods involve a series of computational experiments aimed at analyzing, interpreting and comparing the simulation results with the observed behavior of the object under study and, if necessary, subsequently refining the input parameters.

The idea of using neural networks to post-process the output of numerical weather prediction models was explored as early as 1998 in [4], with the aim of producing more accurate and localized rainfall predictions.

Prediction of rainfall amounts is a very popular application of neural networks [5,6,7]. In [5], researchers in Thailand tried to predict possible flooding by estimating rainfall amounts with feed-forward neural networks. The authors of [6] also used neural networks, to predict heavy precipitation events (>25 mm h−1) over Germany.

Tao et al. [8, 9] used deep neural networks to forecast precipitation amount from meteorological factors and obtained promising results.

The authors of [10] used neural networks to predict probabilities and quantitative amounts of precipitation with the help of the Eta atmospheric model and upper-air soundings.

The researcher in [11] investigated how effectively neural networks can classify freezing and gusty events and predict minimum temperature and maximum gust values. Paper [11] also contains a detailed review of neural network applications to meteorological problems.

Neural networks have also been used to predict various weather phenomena (wind speed, barometric pressure, fog [12]) including extreme events, such as tornadoes [13] and typhoons [14, 15].

In [14] a multilayer perceptron is used to predict changes in tropical cyclone intensity in the northwestern Pacific Ocean. In [15] a generative adversarial network is used to predict typhoon trajectories: taking satellite images as input, the network generates an image showing the future location of the typhoon center and cloud structure.

In this paper we continue the studies described in [1, 16,17,18] and analyze the possibility of using neural networks for forecasting dangerous convective phenomena by processing the output of a numerical model of a convective cloud [19,20,21,22]. The idea is to assess the possibility of thunderstorm forecasting from the output of a model that simulates only the dynamical and microphysical characteristics of convective clouds, not their electrical characteristics. The ideas and results presented in [1] are extended here with classification error calculations and with the consideration of radial basis function and probabilistic neural networks.

2 Initial Data

Research using machine learning methods is based on data; therefore, to obtain the best results, it is necessary to use reliable sources of information and to structure the data correctly.

In this work, the data was obtained using the following algorithm:

  1. We receive data on the date and place of occurrence of meteorological phenomena;

  2. We select the records corresponding to the presence of a thunderstorm or to the absence of any meteorological phenomena;

  3. We obtain atmospheric radiosounding data for that date and place;

  4. We convert the radiosounding data to the model input format;

  5. Using the hydrodynamic model, we obtain the integral and spectral characteristics of the cloud;

  6. We determine the height and time corresponding to the maximum development and maximum water content of the cloud. The cloud parameters at this height and time are used for the thunderstorm forecasting.

The resulting data set contains 416 records: 220 samples correspond to the presence of a thunderstorm and 196 to its absence. The data were divided into a training set of 333 samples and a test set of 83 samples. Owing to the small amount of data, we decided to use the test set for validation.

We also created labels for each sample in the data set. Since there are only two cases, the presence and the absence of the phenomenon, one label per sample would suffice. However, we decided to use two labels per sample, one for each case, mainly because we will need to split the output variables of the neural network at some point.
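The two-label encoding described above can be sketched as follows; this is a minimal illustration using NumPy, and the label values themselves are hypothetical:

```python
import numpy as np

# Hypothetical labels: 1 = thunderstorm present, 0 = no phenomenon.
y = np.array([1, 0, 1, 1, 0])

# Two labels per sample: column 0 flags a thunderstorm,
# column 1 flags its absence, so each network output can later
# be treated as a separate variable.
labels = np.column_stack([y, 1 - y])

print(labels.shape)  # (5, 2)
```

Exactly one of the two columns is set for every sample, so the representation carries the same information as a single label while matching a two-output network.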

3 Data Preprocessing

Solving a machine learning problem requires finding an unknown relationship between a known set of objects and a set of answers. In our case the occurrence of a dangerous phenomenon is the answer, and the results of numerical modeling are the object. Radiosonde sounding data are used as the model input.

Neural networks, like all machine learning algorithms, depend significantly on the quality of the source data. Therefore, before constructing a neural network, we need to prepare the data.

First, we normalize the data using the StandardScaler class from the Python scikit-learn library, which rescales each feature to zero mean and unit variance.
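As a sketch, the normalization step looks like this in scikit-learn; the feature values are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the model output features (4 samples, 2 features).
X = np.array([[10.0, 0.1],
              [12.0, 0.3],
              [14.0, 0.2],
              [16.0, 0.4]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, each column has zero mean and unit variance.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

The fitted scaler stores the training means and variances, so the same transformation can later be applied to the test set with `scaler.transform`.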

Then we select the most significant features. To do this, we use the Recursive Feature Elimination method from the scikit-learn library with the Random Forest algorithm as an estimator. The method works as follows: the estimator is first trained on the full set of features, the least important feature is pruned, and the procedure is repeated recursively on smaller and smaller sets. Figure 1 shows the resulting graph of the prediction accuracy versus the number of features used. As can be seen from the figure, maximum accuracy is achieved when using 8 features. Their names and their importance are shown in Fig. 2. Thus, we will use the following features: vapor, aerosol, relative humidity, density, temperature excess (inside cloud), pressure, velocity, temperature (in the environment).

Fig. 1. Graph of prediction accuracy versus the number of features involved

Fig. 2. Selected features and their importance.
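The feature-selection step can be sketched as follows; the synthetic data and hyperparameters are illustrative stand-ins, with only the target of 8 retained features taken from the text:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the cloud-model features.
X, y = make_classification(n_samples=200, n_features=15,
                           n_informative=8, random_state=0)

# Recursively drop the least important feature until 8 remain,
# using a random forest to rank feature importance at each step.
selector = RFE(estimator=RandomForestClassifier(random_state=0),
               n_features_to_select=8)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the retained features
print(selector.ranking_)   # rank 1 marks a selected feature
X_selected = selector.transform(X)
print(X_selected.shape)    # (200, 8)
```

Sweeping `n_features_to_select` over a range and scoring each reduced set is what produces an accuracy-versus-feature-count curve like the one in Fig. 1.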

4 Classical Multi-layer Perceptron and Perceptron Complexes

The main ideas and results achieved while using classical multilayer perceptron structure (Fig. 3) and perceptron complexes were described in [1]. Some additional explanations for using perceptron complexes and the values of the classification errors can be added to what is said there.

Fig. 3. Classical multi-layer perceptron

The article [23] notes that the ratio of the size of the training data set to the number of trainable network parameters is one of the factors that affect the modeling ability of a perceptron. If this ratio is close to 1, the perceptron will simply memorize the training set, and if it is too large, the network will average the data without capturing the details. For this reason it is recommended in most cases that this ratio fall in the range from 2 to 5. In our case, this ratio is

$$ \frac{333}{123} \approx 2.7, $$

which falls into this range. However, our training data set is small, and algorithms based on neural networks may be ineffective with small amounts of experimental data [24]. We therefore decided to use one of the methods that can help to increase the efficiency of our neural network.
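The paper does not state the architecture that yields 123 trainable parameters, but assuming the 8 selected features, 2 output labels and a single hidden layer, a fully connected 8-11-2 perceptron is one hypothetical configuration with exactly that count; a short check:

```python
def mlp_param_count(layer_sizes):
    """Trainable parameters (weights + biases) of a fully connected MLP."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical 8-11-2 perceptron: 8 selected features, 2 output labels.
params = mlp_param_count([8, 11, 2])  # 8*11 + 11 + 11*2 + 2 = 123
ratio = 333 / params

print(params)           # 123
print(round(ratio, 1))  # 2.7
```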

The method is described in [23]. It consists in dividing the sets of input and output variables among several perceptrons with simpler structures and then combining them into a single perceptron complex. Figure 4 shows the general structure of such a complex.

The perceptron complex training algorithm is as follows [24]:

  1. For each first-level perceptron:

     a. given the input and output variables of the current perceptron, we construct its training and test data sets from the initial data;

     b. the perceptron is trained;

     c. for all samples of the training and test data sets, the values of the perceptron outputs are calculated and stored.

  2. For the resulting perceptron:

     a. given the input and output variables of the perceptron, we construct its training and test data sets from the initial data and the stored output values of the first-level perceptrons;

     b. the perceptron is trained.
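The two-level training scheme above can be sketched as follows. The feature grouping, layer sizes and synthetic data are illustrative assumptions, not the configuration used in [1]:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 8 selected cloud features, 333 samples.
X, y = make_classification(n_samples=333, n_features=8, random_state=0)

# Step 1: split the input variables between two simpler
# first-level perceptrons and train each on its own subset.
groups = [slice(0, 4), slice(4, 8)]
level1 = []
for g in groups:
    p = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
    p.fit(X[:, g], y)
    level1.append(p)

# Step 1c: calculate and store the first-level outputs for every sample.
Z = np.column_stack([p.predict_proba(X[:, g])[:, 1]
                     for p, g in zip(level1, groups)])

# Step 2: the resulting perceptron is trained on those stored outputs.
final = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
final.fit(Z, y)
acc = final.score(Z, y)
print(acc)
```

Each first-level perceptron sees fewer inputs and therefore has fewer trainable parameters, which is the point of the decomposition for a small training set.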

Two variants of the perceptron complexes were described in our previous work [1]. Here we can only add that the classification errors are 0.081 and 0.078 for the first and second perceptron complexes, respectively.

Fig. 4. General structure of the perceptron complex.

5 Radial Basis Function Network

Two types of networks belonging to the radial basis family are considered, as they show good results in binary classification problems. Another advantage is their simple structure, with only one hidden layer.

In the process of training such a network, three sets of parameters are determined: the centers of the basis functions, their widths, and the output weights. We considered several ways of setting their initial values and established how many hidden neurons are needed to obtain the highest prediction accuracy.
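A minimal sketch of such a training scheme, assuming the common approach of placing the centers by k-means, deriving a shared width from the spread of the centers, and fitting the output weights on the hidden activations (the data and the hidden-layer size are illustrative, not the paper's configuration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Parameter set 1: centers of the radial basis functions (via k-means).
n_hidden = 10
km = KMeans(n_clusters=n_hidden, n_init=10, random_state=0).fit(X)
centers = km.cluster_centers_

# Parameter set 2: a common width taken from the spread of the centers.
d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
sigma = d_max / np.sqrt(2 * n_hidden)

# Hidden layer: Gaussian activation of each sample w.r.t. each center.
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
H = np.exp(-dists**2 / (2 * sigma**2))

# Parameter set 3: the output weights, fitted on the hidden activations.
out = LogisticRegression(max_iter=1000).fit(H, y)
acc = out.score(H, y)
print(acc)
```

Varying `n_hidden` and re-scoring is one way to establish how many hidden neurons give the highest accuracy.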

The resulting neural network is shown in Fig. 5. Its accuracy is 91.6% and its classification error is 0.069.

Fig. 5. Radial basis function network

6 Probabilistic Neural Network

A distinctive feature of the probabilistic neural network is that the number of neurons in the hidden layer equals the number of examples in the training set; that is, the network simply stores the entire training set.

The structure of the network is shown in Fig. 6. The accuracy is 90.4% and the classification error is 0.096.
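A probabilistic neural network of this kind can be sketched as a Parzen-window classifier; the tiny data set and the smoothing parameter `sigma` below are illustrative:

```python
import numpy as np

def pnn_predict(X_train, y_train, X_new, sigma=0.5):
    """Probabilistic neural network: one Gaussian pattern neuron per
    training sample; the summation layer averages the kernel values per
    class and the output layer picks the class with the larger density."""
    classes = np.unique(y_train)
    densities = []
    for c in classes:
        Xc = X_train[y_train == c]
        d2 = ((X_new[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=2)
        densities.append(np.exp(-d2 / (2 * sigma**2)).mean(axis=1))
    return classes[np.argmax(densities, axis=0)]

# Tiny hypothetical example: two well-separated classes.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_new = np.array([[0.05, 0.1], [0.95, 1.0]])

pred = pnn_predict(X_train, y_train, X_new)
print(pred)  # → [0 1]
```

There is no iterative training: every training sample becomes a pattern neuron, which explains why the hidden layer grows with the training set.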

Fig. 6. Probabilistic neural network

7 Classification Accuracies and Classification Errors of the Neural Networks of Different Types

Table 1 presents the classification accuracy and classification error of the five types of neural networks considered by the authors. As can be seen from the table, the best accuracy is achieved by the second perceptron complex and the radial basis function network.

Table 1. Maximum forecast accuracy and classification error of the five types of neural networks

8 Conclusions

This work analyzed the possibility of using neural networks to forecast dangerous convective phenomena, taking the thunderstorm as an example.

The initial data set was obtained using numerical modeling of a convective cloud.

Machine learning methods applied at the data analysis and feature processing stage identified the most significant features.

Five networks were considered. The best accuracy was achieved by the second perceptron complex and the radial basis function network; however, the radial basis function network gave the smallest classification error. Its further advantages over the perceptron complex are a simpler structure and a shorter training time.

In future work we will further explore the use of neural networks for forecasting thunderstorms and other dangerous convective phenomena; in particular, our research should focus on obtaining a sufficient number of radiosonde soundings with the corresponding model simulations to form larger training data sets.