
1 Introduction

In recent years, the world has experienced an increasing number of droughts and floods that threaten many countries economically and in many other ways. Such extreme events might be a consequence of climate change and tend to worsen over time. An early-warning information system is a key part of mitigating this threat, and rainfall monitoring plays a crucial role in it. The precipitation rate must be measured precisely, with high spatial and temporal resolution, in order to detect an imminent extreme event.

For developed countries, weather radar networks are a well-established solution for rainfall monitoring. Another common approach is to use weather satellites to estimate the precipitation rate. Both solutions are capable of covering a broad area, which is particularly beneficial for large countries.

Unfortunately, a dense deployment of weather radars is costly, and developing countries often cannot afford it. Regarding satellite monitoring, some efforts have been made with the Tropical Rainfall Measuring Mission (TRMM), a constellation of satellites designed to monitor and study tropical and subtropical precipitation. This mission was the result of a collaboration between the United States and Japan, and it has been an important data source for meteorological and hydrological activities worldwide [6]. Naturally, some emerging countries in those regions took advantage of its rainfall measurements. Nevertheless, it is well known that satellite rainfall monitoring still lacks accuracy, especially for high-resolution and real-time applications [5], and ground measurements are still required to adjust or downscale satellite estimates.

Some papers have proposed alternatives for rainfall estimation, such as using satellite radio links already in operation and broadly deployed worldwide. These satellite communication links usually operate in the Ka or Ku bands, which are strongly affected by rainfall. Even though these stations primarily provide satellite services, it is possible to estimate the precipitation rate from the rain-induced attenuation of the received signal [1, 4, 9].

In-situ rain gauge measurement is also a cost-effective and highly accurate way to monitor rainfall [4]. However, this technique only provides a point-scale measurement, and a high density of gauges would be necessary to cover an urban area [8]. Another common usage is to compare the rain gauge measurement with the values estimated by a given method; in this case, the gauge is not a monitoring system but a reference used to assess the method's accuracy.

Similar techniques can also be applied to ground-to-ground microwave links [2, 7, 10, 15]. These terrestrial radio networks have the advantage of providing close-to-ground measurements, which is beneficial for near-surface rainfall estimation. In scenarios where weather radars are not operationally available, commercial microwave links (CML) might be an alternative for measuring rainfall [7, 10, 15]. Since the International Telecommunication Union (ITU) provides a straightforward power-law relationship between rain-induced attenuation and precipitation rate, it is possible to remove the path attenuation along the link (the baseline level) and compute the precipitation rate from the remaining attenuation, which is assumed to be caused by rainfall. This physics-based approach uses scattering calculations to derive the power-law coefficients, which depend only on the frequency [11]. In practice, however, other factors might influence the quality of the estimation, such as the radio link length and the distance to the precipitation area under consideration.

This paper proposes an alternative, data-driven estimation of the power-law coefficients, in which the Levenberg-Marquardt algorithm recursively adjusts them so as to minimize the sum of squared errors. The rainfall time series predicted by the proposed model is compared with the closest reference rain gauge, and the Pearson correlation between the two curves is used as a figure of merit. Moreover, the results are compared with the time series predicted using the standard ITU coefficients.

The main contributions of this article are:

  1. Data treatment of the bucket gauge measurements and its mathematical analysis to estimate the rainfall via the Levenberg-Marquardt algorithm.

  2. Comparison of the estimated time series with the prediction from the ITU model.

  3. Numerical analysis of a real dataset collected in Niamey, the capital of Niger.

The present article is organized as follows: Sect. 2 details the geographical context; Sect. 3 introduces the Levenberg-Marquardt algorithm and the data treatment for the present application; Sects. 4 and 5 present the results and the conclusions, respectively.

2 Geographical Context

The present article makes use of a dataset collected in Niamey, the capital of Niger. The environment is semi-arid, with a rainfall between \(500~\hbox {mm/yr}\) and \(750~\hbox {mm/yr}\), most of which occurs between June and September. There is practically no rain in the remaining months (from October to April). Convective rains created by Mesoscale Convective Systems (MCS) account for \(75\%-80\%\) of the total rainfall [14].

The commercial microwave link data was originally obtained through a partnership with the mobile telecommunication operator Orange (the network has since been bought by Zamani Com). The project, entitled “Rain Cell Africa - Niger”, is financed by the World Bank’s Global Facility for Disaster Reduction and Recovery (WB/GFDRR) and aims to test the potential of CML-based rainfall estimation for urban hydrology in Africa. Indeed, previous results for a single radio link in Ouagadougou indicated the feasibility of such an approach [2].

In order to cover this area, an instrumental setup that records the received power level and the rain gauge measurements was built in Niamey. In 2017, this system recorded the testbed continuously for approximately 6 months, yielding a dataset of 135 microwave links and three tipping bucket gauges. The microwave frequencies vary from \(18~\hbox {GHz}\) to \(23~\hbox {GHz}\), with link lengths between \(0.5~\hbox {km}\) and \(5.5~\hbox {km}\). Their measurements were recorded with a \(15~\hbox {min}\) sampling period and a resolution of \(1~\hbox {dB}\). The bucket gauges, in turn, recorded the precipitation in \(\hbox {mm h}^{-1}\) with a resolution of \(0.5~\hbox {mm}\) at the same sampling rate. No timing synchronization impairment between the CML and bucket gauge samples is assumed.

The criterion used to associate the kth radio link with the ith tipping bucket gauge is to select the gauge closest to the midpoint of the communication link. This distance varies from approximately \(1~\hbox {km}\) to \(6~\hbox {km}\). Table 1 shows the number of radio links associated with each tipping gauge. Gauge number 1 has many more radio links associated with it, which naturally leads to greater data availability.
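As an illustration, this association rule amounts to a nearest-neighbor search over link midpoints. The sketch below uses hypothetical planar coordinates, since the actual link and gauge positions are not reproduced here:

```python
import numpy as np

# Hypothetical endpoint and gauge coordinates (km, local planar frame);
# the true site positions are not given in this paper.
link_endpoints = np.array([[[0.0, 0.0], [3.2, 1.1]],   # link 0: (A, B)
                           [[5.4, 2.0], [7.9, 2.5]]])  # link 1: (A, B)
gauges = np.array([[1.5, 0.4],                         # gauge 0
                   [6.8, 2.3]])                        # gauge 1

# Midpoint of each link, shape (n_links, 2)
midpoints = link_endpoints.mean(axis=1)

# Distance from each midpoint to each gauge, shape (n_links, n_gauges)
dists = np.linalg.norm(midpoints[:, None, :] - gauges[None, :, :], axis=-1)

# Associate link k with the gauge at minimum distance from its midpoint
assoc = dists.argmin(axis=1)   # e.g., array([0, 1])
```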

Table 1. Distribution of radio links by frequency for each gauge: each row gives the number of links associated with a given gauge.

3 Mathematical Analysis and Data Treatment

3.1 The Levenberg-Marquardt Algorithm

Let \(y_{i}(n) \in \mathbb {R}\) be the nth measurement sample collected by the ith rain gauge (in mm/h) during one day. The signal \(y_{i}(n)\) has no missing or zero-valued data, i.e., only the time series intervals with rainfall measurements are considered.

Let us further define \(x_{k}(n) \in \mathbb {R}\) as the specific attenuation (in dB/km) attributable to rain along the kth radio link for the same day. Each signal \(x_{k}(n)\) is associated with one and only one gauge \(y_{i}(n)\). Our goal is to predict \(y_{i}(n)\) by using the set \(\left\{ x_{k}(n) \mid k \in \mathcal {S}_{i} \right\} \), where \(\mathcal {S}_{i}\) is the set of radio links associated with gauge i.

A power-law relationship converts the specific attenuation into rain rate by using the following formula [15]:

$$\begin{aligned} \hat{y}_{i}(n) = \left( \frac{x_{k}(n)}{w_{k,0}} \right) ^{1/w_{k,1}}, \end{aligned}$$
(1)

where \(\textbf{w}_k = \begin{bmatrix} w_{k,0} & w_{k,1} \end{bmatrix}^\top \) and \((\cdot )^\top \) is the transpose operator.
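For illustration, a minimal Python sketch of the forward model (1) is given below; the coefficient values are arbitrary placeholders, not the ITU or fitted values:

```python
import numpy as np

def rain_rate(x, w):
    """Power-law model (1): rain rate (mm/h) from specific attenuation x (dB/km).

    w = [w0, w1] holds the power-law coefficients of the kth link.
    """
    w0, w1 = w
    return (x / w0) ** (1.0 / w1)

x_k = np.array([0.5, 1.2, 3.0])   # specific attenuation samples, dB/km
w_k = np.array([0.1, 1.05])       # placeholder coefficients
print(rain_rate(x_k, w_k))
```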

The usual method is to apply Mie’s solution to Maxwell’s equations in order to evaluate the attenuation-rainfall relationship and define \(\textbf{w}_k\). This approach requires specifying the environment temperature and the radio link operating frequency.

Another alternative is to treat (1) as an optimization problem, where each link-gauge pair has its own set of coefficients that minimizes a given objective function. For linear problems, the ordinary least-squares algorithm provides an analytical solution that minimizes the squared Euclidean norm of the error vector between the data and the fitted curve. However, the power-law relationship is clearly nonlinear in its parameters, \(\textbf{w}_k\), so an iterative process must be used to find the optimal solution. By treating the present problem as a nonlinear least-squares regression, the objective function can be defined as

$$\begin{aligned} \mathcal {E}(\textbf{w}_k(n)) & = \textbf{e}_{i}^\top (n)\textbf{e}_{i}(n) \nonumber \\ & = \textbf{y}_{i}^\top (n) \textbf{y}_{i}(n) -2\textbf{y}_{i}^\top (n)\hat{\textbf{y}}_{i}(n) + \hat{\textbf{y}}_{i}^\top (n)\hat{\textbf{y}}_{i}(n), \end{aligned}$$
(2)

where \(\textbf{e}_{i}(n) = \textbf{y}_{i}(n) - \hat{\textbf{y}}_{i}(n)\) is the error vector, with \(\textbf{y}_{i}(n) = \begin{bmatrix} y_{i}(1)&y_{i}(2)&\cdots&y_{i}(n) \end{bmatrix}^\top \) and \(\hat{\textbf{y}}_{i}(n) = \begin{bmatrix}\hat{y}_{i}(1)&\hat{y}_{i}(2)&\cdots&\hat{y}_{i}(n) \end{bmatrix}^\top \). For each sample n, the optimization algorithm acts recursively on the parameter vector \(\textbf{w}_k(n)\) in order to minimize the cost function, \(\mathcal {E}(\cdot )\).
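In code, the cost function (2) reduces to the squared norm of the residual vector; a one-function sketch, reusing the rain_rate helper above:

```python
def cost(w, x, y):
    """Sum of squared errors E(w) = e^T e between gauge data y and the model."""
    e = y - rain_rate(x, w)
    return e @ e
```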

The first-order Taylor series approximation of the function \(\mathcal {E}(\cdot )\) at the instant \(n+1\) is given by

$$\begin{aligned} \Delta \mathcal {E}(\textbf{w}_{k}(n)) \simeq \boldsymbol{\Delta }\textbf{w}_k^\top (n)\textbf{g}(n), \end{aligned}$$
(3)

where \(\Delta \mathcal {E}(\textbf{w}_{k}(n)) = \mathcal {E}(\textbf{w}_{k}(n+1)) - \mathcal {E}(\textbf{w}_{k}(n))\) is the difference of the cost function between the instants \(n+1\) and n, \(\boldsymbol{\Delta }\textbf{w}_k(n) = \textbf{w}_{k}(n+1) - \textbf{w}_{k}(n)\) is the update vector to be calculated by the optimization method, and

$$\begin{aligned} \textbf{g}(n) = \frac{\partial \mathcal {E}(\textbf{w}_k(n))}{\partial \textbf{w}_k(n)} \end{aligned}$$
(4)

is the gradient vector.

By replacing (2) into (4), we have that [3]

$$\begin{aligned} \textbf{g}(n) & = -2 \frac{\partial \hat{\textbf{y}}_{i}(n)}{\partial \textbf{w}_k(n)} \textbf{y}_{i}(n) + 2 \frac{\partial \hat{\textbf{y}}_{i}(n)}{\partial \textbf{w}_k(n)} \hat{\textbf{y}}_{i}(n) \nonumber \\ & = -2 \textbf{J}^\top (n) \textbf{e}_{i}(n), \end{aligned}$$
(5)

where

$$\begin{aligned} \textbf{J}(n) = \frac{\partial \hat{\textbf{y}}^\top _{i}(n)}{\partial \textbf{w}_k(n)} \end{aligned}$$
(6)

is the Jacobian matrix in denominator layout. The steepest descent algorithm updates the parameter vector in the opposite direction of the gradient vector. In other words, the vector

$$\begin{aligned} \boldsymbol{\Delta }\textbf{w}_k(n) = \gamma \textbf{J}^\top (n) \textbf{e}_{i}(n) \end{aligned}$$
(7)

points along the direction in which the downhill slope of \(\mathcal {E}(\cdot )\) is steepest at the operating point \(\textbf{w}_k(n)\). In this equation, \(\gamma \in \mathbb {R}\) is a step-size hyperparameter that regulates the convergence speed [13]. Although the gradient descent method has the advantage of simplicity, such an estimator has only first-order local information about the error surface in its neighborhood. In order to improve the estimator's performance, one can employ another algorithm, called Newton’s method, which considers the quadratic approximation of the Taylor series, i.e.,

$$\begin{aligned} \Delta \mathcal {E}(\textbf{w}_k(n)) \simeq \boldsymbol{\Delta }\textbf{w}_k^\top (n)\textbf{g}(n) + \frac{1}{2}\boldsymbol{\Delta }\textbf{w}_k^\top (n)\textbf{H}(n)\boldsymbol{\Delta }\textbf{w}_k(n), \end{aligned}$$
(8)

where

$$\begin{aligned} \textbf{H}(n) = \frac{\partial ^2 \mathcal {E}(\textbf{w}_k(n))}{\partial \textbf{w}_k(n) \partial \textbf{w}_k^\top (n)} \end{aligned}$$
(9)

is the Hessian matrix at the instant n.

By differentiating (8) with respect to \(\boldsymbol{\Delta }\textbf{w}_k(n)\) and setting its value to zero, we get the update vector that minimizes \(\mathcal {E}(\cdot )\) quadratically, which is given by

$$\begin{aligned} \textbf{g}(n) + \textbf{H}(n)\boldsymbol{\Delta }\textbf{w}_k(n) & = \textbf{0} \nonumber \\ \boldsymbol{\Delta }\textbf{w}_k(n) & = - \textbf{H}^{-1}(n) \textbf{g}(n), \end{aligned}$$
(10)

where \(\textbf{0}\) is the zero vector. Replacing (5) into (10), ignoring the constant factor, and inserting the step size yields

$$\begin{aligned} \boldsymbol{\Delta }\textbf{w}_k(n) = \gamma \textbf{H}^{-1}(n) \textbf{J}^\top (n) \textbf{e}_{i}(n). \end{aligned}$$
(11)

Whereas the steepest descent seeks the tangent line in the steepest downhill direction, Newton’s algorithm finds the tangent parabola that minimizes the cost function, which leads to faster convergence to the optimum when compared to gradient-based methods. However, the biggest drawback is the computation of \(\textbf{H}^{-1}(n)\), which is usually costly.

One way out is to resort to quasi-Newton methods, where the inverse of the Hessian matrix is recursively approximated through low-rank updates, without requiring matrix inversion. Another solution is to obtain a nonrecursive approximation of \(\textbf{H}(n)\). For optimization problems where the objective function is a sum of squared errors, the Gauss-Newton method can accomplish this task. Its biggest advantage is that the Hessian is approximated by a Gramian matrix that involves only first-order derivatives, in addition to being symmetric and positive definite, which consequently makes it invertible.

The basic idea of the Gauss-Newton method is to linearize the dependence of \(\hat{\textbf{y}}_{i}(n)\) on the coefficients around the local operating point, for a perturbation \(\textbf{w}\), i.e.,

$$\begin{aligned} \left. \hat{\textbf{y}}_{i}(n)\right| {_{\textbf{w}_k(n)+\textbf{w}}} & \simeq \hat{\textbf{y}}_{i}(n) + \frac{\partial \hat{\textbf{y}}^\top _{i}(n)}{\partial \textbf{w}_k(n)} \textbf{w} \nonumber \\ & \simeq \hat{\textbf{y}}_{i}(n) + \textbf{J}(n) \textbf{w}, \end{aligned}$$
(12)

where \(\left. \hat{\textbf{y}}_{i}(n)\right| {_{\textbf{w}_k(n)+\textbf{w}}}\) is the value of \(\hat{\textbf{y}}_{i}(n)\) when the coefficient vector is \(\textbf{w}_k(n)+\textbf{w}\). By replacing (12) into (2), it follows that

$$\begin{aligned} \mathcal {E}(\textbf{w}_k(n) + \textbf{w}) = & \textbf{y}_i^\top (n)\textbf{y}_{i}(n) -2\textbf{y}_i^\top (n)(\hat{\textbf{y}}_{i}(n) + \textbf{J}(n)\textbf{w}) \nonumber \\ & + (\hat{\textbf{y}}_{i}(n) + \textbf{J}(n)\textbf{w})^\top (\hat{\textbf{y}}_{i}(n) + \textbf{J}(n)\textbf{w}) \nonumber \\ = & \textbf{y}_i^\top (n)\textbf{y}_{i}(n) + \hat{\textbf{y}}_i^\top (n)\hat{\textbf{y}}_{i}(n) \nonumber \\ & - 2(\textbf{y}_{i}(n) - \hat{\textbf{y}}_{i}(n))^\top \textbf{J}(n)\textbf{w} \nonumber \\ & - 2 \textbf{y}_i^\top (n)\hat{\textbf{y}}_{i}(n) + \textbf{w}^\top \textbf{J}^\top (n) \textbf{J}(n)\textbf{w} \end{aligned}$$
(13)

Thus, differentiating (13) with respect to \(\textbf{w}\), and setting the result to zero, we obtain \(\textbf{w}=\boldsymbol{\Delta } \textbf{w}_k(n)\), i.e.,

$$\begin{aligned} & - \textbf{J}^\top (n) \textbf{e}_{i}(n) + \textbf{J}^\top (n) \textbf{J}(n) \boldsymbol{\Delta }\textbf{w}_k(n) = \textbf{0}. \end{aligned}$$
(14)

Reorganizing the previous equation and inserting the step size, we have that

$$\begin{aligned} \boldsymbol{\Delta }\textbf{w}_k(n) = \gamma (\textbf{J}^\top (n) \textbf{J}(n))^{-1} \textbf{J}^\top (n)\textbf{e}_{i}(n). \end{aligned}$$
(15)

By comparing (15) with (11), we notice that the Gauss-Newton method approximates the Hessian matrix, \(\textbf{H}(n)\), by \(2\textbf{J}^\top (n) \textbf{J}(n)\) (the constant factor is dropped), thus avoiding second-order derivatives.
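For the model (1), the Jacobian required by (15) can be derived analytically: \(\partial \hat{y}_{i}/\partial w_{k,0} = -\hat{y}_{i}/(w_{k,0} w_{k,1})\) and \(\partial \hat{y}_{i}/\partial w_{k,1} = -\hat{y}_{i} \ln (x_{k}/w_{k,0})/w_{k,1}^{2}\). A minimal sketch, assuming strictly positive attenuations and coefficients, and reusing the rain_rate helper:

```python
import numpy as np

def jacobian(w, x):
    """Analytic Jacobian J(n) of the power-law model (1), shape (n, 2).

    Column 0: d(yhat)/d(w0); column 1: d(yhat)/d(w1).
    """
    w0, w1 = w
    yhat = (x / w0) ** (1.0 / w1)
    d_w0 = -yhat / (w0 * w1)
    d_w1 = -yhat * np.log(x / w0) / w1 ** 2
    return np.stack([d_w0, d_w1], axis=1)

def gauss_newton_step(w, x, y, gamma=1.0):
    """One Gauss-Newton update (15): dw = gamma (J^T J)^{-1} J^T e."""
    J = jacobian(w, x)
    e = y - rain_rate(x, w)
    return gamma * np.linalg.solve(J.T @ J, J.T @ e)
```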

The Levenberg-Marquardt (LM) method combines the two algorithms presented in this article: the steepest descent and a Newton-like algorithm. It tries to take advantage of the convergence guarantee of the gradient method and of the fast convergence of Newton’s method. The update vector of the LM algorithm is given by

$$\begin{aligned} \boldsymbol{\Delta }\textbf{w}_k(n) = \gamma \left( \textbf{J}^\top (n) \textbf{J}(n) + \lambda (n) \textbf{I} \right) ^{-1} \textbf{J}^\top (n) \textbf{e}_{i}(n), \end{aligned}$$
(16)

where \(\textbf{I}\) is the identity matrix. The hyperparameter \(\lambda (n)\) has a twofold goal: it performs Tikhonov regularization, preventing \(\textbf{J}^\top (n) \textbf{J}(n)\) from being ill-conditioned, and it controls how the algorithm behaves. The LM method leans toward Gauss-Newton for small values of \(\lambda (n)\), while large values make it behave as gradient descent. The initial coefficient vector, \(\textbf{w}_k(1)\), is likely far from the optimal point, since it is randomly initialized. Hence, it is sensible for the LM algorithm to initially behave like gradient descent, since \(\textbf{J}^\top (1) \textbf{J}(1)\) is probably a bad estimate (the Hessian matrix depends on the operating point of the coefficient vector when the cost function is nonquadratic). As the estimate of \(\textbf{H}(n)\) becomes trustworthy, the LM algorithm shall decrease \(\lambda (n)\) toward zero, causing it to behave like Gauss-Newton.

Finally, the coefficient vector adopted to estimate the rainfall is defined as

$$\begin{aligned} \textbf{w}_k \triangleq \textbf{w}_k(N+1) = \textbf{w}_k(N) + \boldsymbol{\Delta }\textbf{w}_k(N), \end{aligned}$$
(17)

where N is the number of samples in the training set.
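Putting the pieces together, a minimal training-loop sketch follows, reusing the rain_rate, cost, and jacobian helpers above. The multiplicative adaptation of \(\lambda (n)\) and all numerical constants are illustrative assumptions, not values prescribed by this paper:

```python
import numpy as np

def levenberg_marquardt(x, y, w_init, gamma=1.0, lam=1e2, n_iter=100):
    """Fit the power-law coefficients by iterating the LM update (16)-(17)."""
    w = np.asarray(w_init, dtype=float)
    for _ in range(n_iter):
        J = jacobian(w, x)
        e = y - rain_rate(x, w)
        dw = gamma * np.linalg.solve(J.T @ J + lam * np.eye(2), J.T @ e)
        if cost(w + dw, x, y) < cost(w, x, y):
            w, lam = w + dw, lam * 0.5   # step accepted: lean toward Gauss-Newton
        else:
            lam *= 2.0                   # step rejected: lean toward gradient descent
    return w
```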

3.2 Data Treatment and Analysis

The first step of the data treatment is to select days with rainfall events from the collected time series. In other words, the received power level, \(\tilde{x}_k(m)\), and the rain gauge measurement, \(\tilde{y}_i(m)\), are decimated, producing \(\tilde{x}_k(n)\) and \(y_i(n)\), respectively. From the 6-month dataset, only 11 days with rainfall events are considered. A rainfall event is defined as a period in which the bucket gauge continuously measures nonzero values for at least 3 h and 45 min. Since the sampling period is 15 min, each rainfall event contains at least 15 samples.
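A minimal sketch of this event-selection rule (a run of at least 15 consecutive nonzero gauge samples under the 15-min sampling period):

```python
import numpy as np

def rainfall_events(y, min_len=15):
    """Return (start, end) index pairs of runs of consecutive nonzero gauge
    samples lasting at least min_len samples (15 samples = 3 h 45 min)."""
    events, start = [], None
    for n, v in enumerate(y):
        if v > 0 and start is None:
            start = n
        elif v == 0 and start is not None:
            if n - start >= min_len:
                events.append((start, n))
            start = None
    if start is not None and len(y) - start >= min_len:
        events.append((start, len(y)))
    return events
```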

Afterwards, the sampled received power level is converted into the specific attenuation, \(x_k(n)\), by subtracting the baseline level from \(\tilde{x}_k(n)\) and dividing the result by the length of the kth radio link. The baseline is determined with the method recommended by Schleiss and Berne [12], which uses a moving-window estimate of the variance.
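A rough sketch of this baseline separation, in the spirit of [12], is shown below; the window length and variance threshold are illustrative assumptions, not the values recommended by Schleiss and Berne:

```python
import numpy as np

def baseline(power, window=8, thresh=0.5):
    """Hold the last 'dry' power level as the baseline.

    A sample is flagged wet when the moving-window standard deviation of the
    received power exceeds a threshold; the baseline is frozen during wet
    periods. Specific attenuation then follows as (baseline - power) / length.
    """
    power = np.asarray(power, dtype=float)
    half = window // 2
    std = np.array([power[max(0, n - half):n + half + 1].std()
                    for n in range(len(power))])
    base, level = np.empty_like(power), power[0]
    for n in range(len(power)):
        if std[n] <= thresh:      # dry sample: update the baseline level
            level = power[n]
        base[n] = level
    return base
```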

In order to obtain a reliable analysis with the available dataset, a cross-validation is performed in which each fold comprises the samples of a given rainfall event. The parameters are estimated on the training dataset, whereas the test dataset is used to assess the model performance. This paper adopts a \(63\%-37\%\) split between the training and test datasets, corresponding to 7 and 4 days, respectively. Algorithm 1 summarizes the procedure used in this work to process the data, estimate the parameters, and analyze the results. In this algorithm, the symbol \(\rho \) denotes the Pearson correlation coefficient between \(y_i(n)\) and \(\hat{y}_i(n)\), which is used as the figure of merit.

Algorithm 1. Data processing for the kth radio link.
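Since the original listing of Algorithm 1 is not reproduced here, the sketch below gives one plausible reading of the per-event cross-validation loop described above; it reuses the previous helpers and is an illustration, not the authors' exact procedure:

```python
from itertools import combinations

import numpy as np
from scipy.stats import pearsonr

def cross_validate(x, y, events, n_test=4):
    """Hold out n_test rainfall events per fold, fit w on the rest, and score
    the held-out samples with the Pearson correlation rho."""
    scores = []
    for test_idx in combinations(range(len(events)), n_test):
        train = np.concatenate([np.arange(a, b) for i, (a, b) in enumerate(events)
                                if i not in test_idx])
        test = np.concatenate([np.arange(a, b) for i, (a, b) in enumerate(events)
                               if i in test_idx])
        w = levenberg_marquardt(x[train], y[train], w_init=[0.1, 1.0])
        rho, _ = pearsonr(y[test], rain_rate(x[test], w))
        scores.append(rho)
    return scores
```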

4 Results and Discussions

Considering that the system provides the minimum, mean, and maximum attenuation reached by each radio link, Fig. 1 shows the box plot obtained by the model for each case. A kernel smoothing technique is applied to the set of Pearson correlations in order to estimate their distribution, which is also shown in the figure. For performance comparison, the results obtained in this work are contrasted with those obtained using the original ITU coefficients, under the same test-dataset and cross-validation methodology.
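A sketch of how such a summary can be produced with a Gaussian kernel density estimate over the per-fold correlations (illustrative only, not the code behind Fig. 1):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def plot_scores(scores_by_case):
    """Box plot of per-fold Pearson correlations plus a smoothed distribution."""
    fig, (ax_box, ax_kde) = plt.subplots(1, 2, figsize=(8, 3))
    ax_box.boxplot(list(scores_by_case.values()), labels=list(scores_by_case))
    ax_box.set_ylabel("Pearson correlation")
    grid = np.linspace(0.0, 1.0, 200)
    for name, scores in scores_by_case.items():
        ax_kde.plot(grid, gaussian_kde(scores)(grid), label=name)
    ax_kde.set_xlabel("Pearson correlation")
    ax_kde.legend()
    plt.show()
```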

Fig. 1. Box plot of the rainfall estimation.

It can be noticed that the proposed model presents considerable variance, with outliers, when the maximum or minimum attenuation is used. For the mean attenuation, however, the mean correlation is \(82.45\%\), without considerable loss of performance across folds. This result is slightly lower than the mean correlation obtained by the physics-based method. However, its variance, \(1.10\times 10^{-2}\), is lower than that obtained with the ITU model, \(1.16\times 10^{-2}\).

Figure 2 shows the time series estimation for the best and worst folds. Both estimations come from gauge number 1, which has the most radio links associated with it. Nevertheless, as shown in Fig. 2a, both methods failed to estimate the measured rainfall that occurred on June 13, 2017, for a given training-test dataset split.

Fig. 2. Time series estimation for the best and worst folds.

5 Conclusions and Future Work

In this article, we presented a new methodology to estimate the power-law coefficients for rainfall estimation via the Levenberg-Marquardt algorithm. The available data was properly preprocessed before the coefficients were estimated. Cross-validation was applied in order to obtain a reliable estimate of performance, assessed in terms of the Pearson correlation coefficient.

Additionally, the estimated performance was compared with the results obtained when the original ITU coefficients are used, under the same test-dataset conditions. Both methodologies achieved similar results, with the present estimation technique presenting a slightly lower mean and a lower variance.

In this work, the power-law relationship provided an estimation mapping that does not take into account the time correlation between samples. Moreover, the raw data was decimated, and only intervals with substantial rainfall events were considered. Future efforts might consider the correlation time of the radio link attenuation and how it can be exploited to estimate precipitation.