Quantifying the effect of climate variability on seasonal precipitation using Bayesian clustering approach in Kebir Rhumel Basin, Algeria

Belkhiri, Lazhar; Krakauer, Nir

doi:10.1007/s00477-023-02488-z


ORIGINAL PAPER
Published: 14 June 2023

Volume 37, pages 3929–3943, (2023)

Stochastic Environmental Research and Risk Assessment


Lazhar Belkhiri¹ &
Nir Krakauer²


Abstract

This paper presents a Bayesian clustering approach that allows quantification of the effect of climate variability on seasonal precipitation data in Kebir Rhumel Basin (KRB). We applied this approach to simultaneously identify clusters of stations with similar characteristics and the climate variability associated with each cluster and for the individual stations within each cluster. Both full pooling Bayesian clustering (FPBC) and partial pooling Bayesian clustering (PPBC) models with nonstationary generalized extreme value (GEV) distribution are applied to each season. In these models, a climate index variable, namely the El Niño Southern Oscillation (ENSO), is included as a time-varying covariate with an appropriate basis function to potentially explain the temporal variation of one or more of the parameters of the distribution. Results reveal that the partial pooling Bayesian clustering model provided the best fit for the seasonal precipitation data. The significant effect of ENSO differs from one season to another. During spring and autumn, ENSO significantly affects precipitation across large parts of KRB. Furthermore, the southern part and northern part of KRB are positively and negatively influenced by ENSO during winter and summer, respectively. Moreover, almost all stations during spring and autumn are negatively and positively influenced by ENSO, respectively. Finally, we demonstrated that the proposed model helps to reduce the uncertainty in the parameter estimation and provides more robust results.


1 Introduction

Climate change and variability are among the most serious challenges facing the global ecosystem in the 21st century. Global warming and climate fluctuation have comprehensively affected the water cycle components, especially precipitation (Berghuijs et al. 2017; Konapala et al. 2020; Belkhiri and Kim 2021; IPCC 2021). The El Niño Southern Oscillation (ENSO) is the single most influential climate phenomenon affecting the variability of precipitation (Ropelewski and Halpert 1987; Hoerling et al. 1997; Dai et al. 1997; Ward et al. 2014). Although ENSO is defined based on Pacific surface pressure or sea surface temperature, it affects climate in various ways globally, particularly across the tropics and subtropics. Many studies have shown the complex effects of ENSO on precipitation in many countries throughout Africa (Ropelewski and Halpert 1987; Kiladis and Diaz 1989; Nicholson and Kim 1997; Mason and Goddard 2001; Lüdecke et al. 2021). Fewer studies have investigated the effect of ENSO on precipitation in Algeria. For example, Meddi et al. (2010) analyzed the temporal variability of annual precipitation in the Macta and Tafna catchments of northwestern Algeria and showed a negative ENSO correlation with precipitation. Turki et al. (2016) studied the long-term variability of rainfall in the Soummam watershed and noted a significant effect of ENSO on rainfall variability in northeastern Algeria. Zeroual et al. (2016) analyzed the influence of climate indices on temperature and on annual and seasonal rainfall in the coastal region of northern Algeria from 1972 to 2013 and found a positive correlation between rainfall and the ENSO index. However, the effect of climate indices such as ENSO on the full probability distribution of precipitation in Algeria has not yet been comprehensively assessed. Therefore, there is room for new approaches to evaluate the response of precipitation to ENSO.

The comprehensive effects of climate variability/change on precipitation have been studied using several nonstationary models. For instance, Micevski et al. (2006) examined the effect of the Inter-decadal Pacific Oscillation (IPO) on flood data from mainly coastal eastern Australia and found that the IPO modulates the flood risk within parts of eastern Australia. Ouarda and El-Adlouni (2011) discussed nonstationary frequency analysis models in hydrology with a focus on the Bayesian approach and demonstrated that the Bayesian approach can be applied to more general and more complex models where parameters are expressed as nonlinear functions of covariates. Madsen et al. (2014) presented a review of trend analysis of extreme precipitation and floods and noted that non-stationarity in extreme precipitation and flood characteristics due to climatic changes is high on the research agenda in Europe. Chen et al. (2014) developed a hierarchical Bayesian approach for regional rainfall and streamflow forecasting using appropriate climate indicators. They reported that this approach allows appropriate grouping of information in the region and explicit modeling of the covariance of the model errors and the regression coefficients, which better represents the uncertainty in the model parameters and in the final streamflow and rainfall forecasts. Sun et al. (2014) demonstrated that a Bayesian regional framework provides the opportunity to assess the value of regional information in better identifying the effect of climate variability on hydrometeorological extremes. Despite these valuable studies, it remains an open question how to reduce uncertainties and identify the effect of climate variability across a relatively large area by regional pooling of information on model parameters. Some studies addressed this question but considered a relatively small homogeneous area over which information can be pooled (Aryal et al. 2009; Renard et al. 2008; Renard et al. 2013; Chen et al. 2014; Sun et al. 2014; Sun et al. 2015a). Moreover, some authors have applied Bayesian clustering models based on the Dirichlet process or the Expectation–Maximization (EM) algorithm (Xiong and Yeung 2004; Johnson et al. 2013; Nieto-Barajas and Contreras-Cristán 2014), but these mixture models have been limited to Gaussian mixtures. Sun and Lall (2015) and Sun et al. (2015b) developed a new Bayesian clustering approach for exploring homogeneity of response in large area datasets through a multicomponent mixture model (i.e., clusters). This approach allows the reduction of uncertainties through full pooling or partial pooling across automatically chosen subsets of the data. In this study, we applied a similar Bayesian clustering approach with a non-stationary GEV distribution for seasonal precipitation, using the ENSO index as a covariate.

The current paper is structured as follows. Section 2 introduces the study area and data. Section 3 presents the Bayesian clustering approach. Section 4 describes the results and discussion, and Sect. 5 summarizes the main findings.

2 Study area and data

The Kebir Rhumel Basin (KRB) is located in northern Algeria and includes seven sub-basins, covering an area of approximately 8815 km² (Table 1 and Fig. 1). The KRB is drained by two major rivers, Oued Rhumel in the southern part and Oued Endja in the western part. Beni-Haroun and Boussiaba are the major dams in the basin, with capacities of 960 hm³ and 120 hm³, respectively (Marouf et al. 2019). According to Mebarki (2005), the climate of the KRB is humid in the northern part and semi-arid in the southern part.

Table 1 The surface area and number of stations for each sub-basin in KRB


The precipitation datasets for the period 1970–2013 used in this study were obtained from the National Agency for Hydraulic Resources and the National Office of Meteorology. In the current study, the mean value of monthly precipitation is derived for each of the four seasons: winter (DJF), spring (MAM), summer (JJA) and autumn (SON). For each season, a station was selected only if it had at least 90% of the total precipitation data available during the period 1970–2013. Overall, 24 stations are used in the analysis. Figure 1 displays the spatial distribution of the selected stations in KRB.
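The seasonal aggregation and the 90% availability rule described above can be sketched as follows. This is only an illustration under stated assumptions: the input layout (a dictionary keyed by year and month), the function names, and the convention that December is counted with the following year's winter are all hypothetical choices, not details given in the paper.

```python
from collections import defaultdict

# Month-to-season lookup for the four seasons used in the text.
SEASONS = {12: "DJF", 1: "DJF", 2: "DJF",
           3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA",
           9: "SON", 10: "SON", 11: "SON"}


def seasonal_means(monthly):
    """Average monthly precipitation within each season.

    monthly: dict {(year, month): value or None}; None marks a missing month.
    Assumption: December is assigned to the winter of the following year.
    """
    groups = defaultdict(list)
    for (year, month), value in monthly.items():
        if value is None:
            continue  # skip missing months
        season_year = year + 1 if month == 12 else year
        groups[(season_year, SEASONS[month])].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def has_enough_data(monthly, first_year, last_year, min_fraction=0.9):
    """True if at least min_fraction of the expected monthly values exist."""
    expected = 12 * (last_year - first_year + 1)
    available = sum(1 for (year, _), value in monthly.items()
                    if first_year <= year <= last_year and value is not None)
    return available / expected >= min_fraction


example = {(2000, 12): 30.0, (2001, 1): 60.0, (2001, 2): 90.0, (2001, 3): 10.0}
means = seasonal_means(example)
```

In this toy input, the winter of 2001 averages the December 2000, January 2001 and February 2001 values.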

In order to analyze the effect of climate variability on the seasonal precipitation, the El Niño Southern Oscillation (ENSO) was considered. Many indices have been developed to characterize aspects of ENSO evolution (Trenberth 1997; Trenberth and Stepaniak 2001). The seasonal mean of the Southern Oscillation Index (SOI), obtained from the National Oceanic and Atmospheric Administration Climate Prediction Center (https://www.cpc.ncep.noaa.gov/data/indices/soi/), is used as the measure of ENSO.

3 Methodology

3.1 Nonstationary model structure

For the nonstationary modeling of the seasonal precipitation, it is very important to choose an appropriate distribution. In recent decades, many studies have successfully used the generalized extreme value (GEV) distribution to model the nonstationarity in precipitation events by including time-varying parameters (Du et al. 2014; Cheng and Aghakouchak 2014; Gao et al. 2016; Agilan and Umamahesh 2017a, b; Su and Chen 2019). Steirou et al. (2019) identified links between seasonal flood probabilities and large-scale atmospheric indices for the whole of Europe by adopting a Bayesian framework with a climate-informed (non-stationary) GEV distribution and compared it with the classical (stationary) GEV distribution with parameters invariant in time. They demonstrated that the climate-informed models were preferred over the classical GEV distribution for a high percentage of stations in most seasons, and that the seasonally averaged indices provided better fits than monthly values in most cases. Ossandón et al. (2022) developed a Bayesian Hierarchical Model (BHM) to project seasonal streamflow extremes for several lead times based on a Gaussian elliptical copula and GEV margins with nonstationary parameters, and demonstrated that the proposed framework could be useful for the early implementation of flood risk adaptation and preparedness strategies. Thus, a nonstationary GEV distribution is considered to model the seasonal precipitation over the KRB at each station. Here, we assumed that the GEV location parameter was linked to the temporal climate covariate (ENSO index) using a linear regression model. In the preliminary analysis, we considered the effect of the climate index on both the location and scale parameters, but the results did not differ much from those obtained with a covariate on the location parameter only (not shown). The shape parameter is kept constant because its estimation involves large uncertainties, even under the assumption of stationarity (Coles et al. 2001; Papalexiou and Koutsoyiannis 2013; Silva et al. 2017; Steirou et al. 2017).

In this application, the three proposed model structures of no pooling, full pooling and partial pooling can be written as follows:

(a) No pooling model

$$Y(s,t) \sim \mathrm{GEV}\left(\mu_{0}(s) + \mu_{1}(s)\,\mathrm{ENSO}(t),\ \sigma(s),\ \xi(s)\right)$$

(1)

(b) Full pooling model

$$Y(s,t) \sim \mathrm{GEV}\left(\mu_{0} + \mu_{1}(s)\,\mathrm{ENSO}(t),\ \sigma(s),\ \xi\right)$$

(2)

(c) Partial pooling model

$$\begin{aligned} \text{Level 1:}\quad & Y(s,t) \sim \mathrm{GEV}\left(\mu_{0} + \mu_{1}(s)\,\mathrm{ENSO}(t),\ \sigma(s),\ \xi\right) \\ \text{Level 2:}\quad & \mu_{1}(s) \sim N\left(\mu_{\mu},\ \sigma_{\mu}\right) \end{aligned}$$

(3)

Y(s,t) is the observation of the variable at station s and time t. ENSO(t) is the climate covariate at time t. µ₀(s)/µ₀, µ₁(s), σ(s) and ξ(s)/ξ are model parameters, where µ₀(s)/µ₀ is the intercept of the location parameter and µ₁(s) is the slope of the location parameter at station s. σ(s) and ξ(s)/ξ are the scale and shape parameters, respectively. µ₀(s), µ₁(s), σ(s) and ξ(s) are site-specific (local) parameters, while µ₀ and ξ are regional parameters. N(.,.) denotes a normal distribution. μ_μ and σ_μ are the hyper-parameters in the second level model (Eq. 3).

In the current research, we use the same linear regression function on the location parameter in all proposed models to describe the ENSO temporal climate covariate. The models differ in the settings of µ₀, µ₁ and ξ, which are either site-specific (no pooling), regional (full pooling), or have a second level (partial pooling). In the no pooling model (Eq. 1), the three GEV parameters are estimated locally (i.e., they are site-specific). This model was used as a baseline. In the full pooling model (Eq. 2), all model parameters are estimated independently for each station, except that the intercept of the location parameter (µ₀) and the shape parameter (ξ) are regional and estimated using all data. We considered these two parameters for clustering in order to standardize the seasonal precipitation data, and because the shape parameter requires more data to obtain a precise estimate due to its large uncertainty (Coles 2001, p. 106). In the partial pooling model (Eq. 3), we allowed for pooling of information across stations when estimating the regression slope of the location parameter, to reduce the associated uncertainty, while still allowing this parameter to vary between stations. In this model, the slope parameter is presumed to be drawn from a common hyper-distribution, so µ₁(s) is in turn described by a set of hyper-parameters μ_μ and σ_μ (i.e., the second level model in Eq. 3). Here, the hyper-distribution describes the second level of the hierarchical Bayesian model. The intercept µ₀ and the shape parameter ξ are still regional in this model.
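To make the Level 1 model concrete, the following is a minimal sketch of a GEV log-density whose location depends linearly on the ENSO covariate, as in Eqs. (1)–(3). It is not the authors' RStan code. The function names are hypothetical, and the shape parameter is assumed to follow the common convention in which a positive ξ gives a heavy upper tail and ξ = 0 is the Gumbel limit.

```python
import math


def gev_logpdf(y, mu, sigma, xi):
    """Log-density of a GEV(mu, sigma, xi) distribution at y.

    Assumed convention: xi > 0 gives a heavy upper tail; xi == 0 is Gumbel.
    Returns -inf outside the support or for a non-positive scale.
    """
    if sigma <= 0:
        return -math.inf
    w = (y - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel limit
        return -math.log(sigma) - w - math.exp(-w)
    z = 1.0 + xi * w
    if z <= 0:  # y lies outside the support
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(z) - z ** (-1.0 / xi)


def station_loglik(y, enso, mu0, mu1, sigma, xi):
    """Sum of GEV log-densities with location mu0 + mu1 * ENSO(t)."""
    return sum(gev_logpdf(yt, mu0 + mu1 * et, sigma, xi)
               for yt, et in zip(y, enso))


# Toy values (invented): each observation sits exactly at its location.
ll = station_loglik([1.0, 2.0], [0.5, -0.5], mu0=1.5, mu1=-1.0, sigma=1.0, xi=0.0)
```

Under this sketch, the three model structures differ only in which arguments are shared: `mu0` and `xi` are common to all stations in the full and partial pooling models, while `mu1` and `sigma` remain station-specific.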

3.2 Hierarchical Bayesian clustering model

The hierarchical Bayesian clustering model with a nonstationary GEV distribution is applied to model the seasonal precipitation. Sun and Lall (2015) and Sun et al. (2015b) demonstrated that this model can be applied directly to a heterogeneous area. In this mixture model, we assume that the stations can be classified into K clusters, and a hierarchical Bayesian model (H_k) for each cluster (k) is developed. Here, each station has a probability π_k to belong to a cluster k, which needs to be estimated.

The mixture distribution across all clusters can be given as follows:

$$Y\left(s\right) \sim \sum_{k=1}^{K}{\pi }_{k}{f}_{{H}_{k}}\left(s\right)$$

(4)

where ${f}_{{H}_{k}}(s)$ is the likelihood function of the hierarchical Bayesian model (H_k) at station s. π = {π₁, …, π_K} denotes the mixing probabilities (i.e., mixing coefficients or weights). In order to be valid probabilities, the π_k must satisfy:

$$0 \le \pi_{k} \le 1 \quad (k = 1, \dots, K), \quad \text{and} \quad \sum_{k=1}^{K} \pi_{k} = 1$$

(5)

Therefore, the likelihood of the hierarchical Bayesian clustering model can be computed as follows:

$$L= \prod_{s=1}^{S}\sum_{k=1}^{K}{\pi }_{k}{f}_{{H}_{k}}\left(s\right)$$

(6)

where L is the likelihood function and S is the total number of stations. A schematic of the hierarchical Bayesian clustering model is presented in Fig. 2.
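Because Eq. (6) multiplies many small numbers, it is usually evaluated on the log scale. The sketch below shows one way to do this with the log-sum-exp trick. The function names and the input layout (one row of per-cluster log-likelihoods for each station) are assumptions made for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-scale values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_loglik(log_f, pi):
    """Log of Eq. (6): sum over stations of log(sum_k pi_k * f_Hk(s)).

    log_f[s][k] holds log f_Hk(s); pi[k] are the mixing probabilities.
    Clusters with pi_k == 0 are skipped because they contribute nothing.
    """
    total = 0.0
    for row in log_f:
        terms = [math.log(p) + lf for p, lf in zip(pi, row) if p > 0]
        total += log_sum_exp(terms)
    return total


# Toy example (invented numbers): two stations, two clusters.
log_f = [[math.log(0.2), math.log(0.4)],
         [math.log(0.5), math.log(0.1)]]
value = mixture_loglik(log_f, [0.5, 0.5])
```

In this toy case each station contributes a mixture density of 0.3, so the result equals log(0.09).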

In this study, full pooling Bayesian clustering (FPBC) and partial pooling Bayesian clustering (PPBC) models are considered for each cluster k. A summary of the two proposed models used in this research is shown in Table 2.

Table 2 Nonstationary GEV distribution models to fit the seasonal precipitation


Based on the partial pooling model described in Eq. (3), we constructed this model for each cluster by setting the intercept of the location parameter (${\mu }_{0}$) and the shape parameter ($\xi$) to be fully pooled (regional) and the slope of the location parameter to be partially pooled. The hierarchical Bayesian model (H_k) for each cluster k can be written as follows:

$$\begin{aligned} \text{Level 1:}\quad & Y(s,t) \sim \mathrm{GEV}\left(\mu_{0,k} + \mu_{1}(s)\,\mathrm{ENSO}(t),\ \sigma(s),\ \xi_{k}\right) \\ \text{Level 2:}\quad & \mu_{1}(s) \sim N\left(\mu_{\mu_{k}},\ \sigma_{\mu_{k}}\right) \end{aligned}$$

(7)

where ${\mu }_{0,k}$, ${\xi }_{k}$, ${\mu }_{{\mu }_{k}}$ and ${\sigma }_{{\mu }_{k}}$ are parameters that are associated with cluster k.

The likelihood function of H_k at station s can be calculated as follows:

$$f_{H_{k}}(s) = \prod_{t=1}^{T} f_{GEV}\left(Y(s,t) \mid \mu_{0,k},\ \mu_{1}(s),\ \sigma(s),\ \xi_{k},\ \mathrm{ENSO}(t)\right) \times f_{N}\left(\mu_{1}(s) \mid \mu_{\mu_{k}},\ \sigma_{\mu_{k}}\right)$$

(8)

The full likelihood function of the hierarchical Bayesian clustering model is obtained by substituting Eq. (8) into the mixture of Eq. (4).

In this application, we used a Dirichlet distribution with identical parameters (a vector of length K with all entries equal to 2) as a prior for π. For the other parameters, we used flat priors (normal or uniform distributions with large variance). As initial values for the intercept of the location parameter, the scale and the shape at each station, we used the values obtained by fitting a nonstationary GEV distribution by maximum likelihood. For the hyper-parameters, the mean of the slope parameter µ₁(s) is initially set to zero.

3.3 Implementation and model fitting

For each Bayesian clustering model, the posterior probability distribution of the model parameters is estimated using a No-U-Turn Hamiltonian Monte Carlo method (Hoffman and Gelman 2014). One chain of length 30,000 was run, with the first 15,000 iterations discarded as warmup. The convergence is evaluated by the potential scale reduction factor (Gelman and Rubin 1992), which should be smaller than 1.2 for each parameter. All the calculations are conducted using R and RStan (Stan Development Team 2022).
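As an illustration of the convergence check, the sketch below computes a potential scale reduction factor for a single chain by splitting it into two halves and comparing between-half and within-half variance. This split-chain variant is an assumption made here because only one chain is reported. The paper does not state how the factor was obtained from one chain, and the function name and toy draws are hypothetical.

```python
import random


def split_rhat(draws):
    """Potential scale reduction factor from one chain split into two halves."""
    n = len(draws) // 2
    halves = [draws[:n], draws[n:2 * n]]
    means = [sum(h) / n for h in halves]
    # Within-half sample variances.
    variances = [sum((x - m) ** 2 for x in h) / (n - 1)
                 for h, m in zip(halves, means)]
    grand_mean = sum(means) / len(halves)
    between = n * sum((m - grand_mean) ** 2 for m in means) / (len(halves) - 1)
    within = sum(variances) / len(halves)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


random.seed(0)
# A well-mixed toy chain versus a chain that drifts over time.
stationary = [random.gauss(0.0, 1.0) for _ in range(4000)]
drifting = [i / 1000 + random.gauss(0.0, 0.1) for i in range(4000)]
rhat_ok = split_rhat(stationary)
rhat_bad = split_rhat(drifting)
```

With the threshold of 1.2 used in the text, the stationary toy chain would pass the check, while the drifting chain would fail it.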

3.4 Selecting the optimal number of clusters

The selection of the optimal number of clusters is an important issue in mixture modeling. As is common in model selection problems, there is a trade-off: a mixture model with too many clusters may overfit the data, while a mixture with too few clusters may not be flexible enough to approximate the underlying model. Thus, it is important to adopt some statistical criterion to infer an optimal number of clusters (Deng and Han 2018). Several statistical criteria, such as the Akaike Information Criterion (AIC) (Akaike 1974), the Bayesian Information Criterion (BIC) (Schwarz 1978), and the Deviance Information Criterion (DIC) (Spiegelhalter et al. 2002), can be used to choose among models with different numbers of clusters. In the Bayesian framework, AIC and BIC can be applied to the integrated likelihood over the model parameters. However, AIC and BIC are not theoretically justified for mixture models and may not be the best way to determine the optimal number of clusters (Biernacki and Govaert 1997; McLachlan and Peel 2004). Alternatively, the DIC is a model selection criterion that automatically considers parameter uncertainty by utilizing the posterior distribution. However, the definition and application of DIC to mixture models are not straightforward, and different definitions and adaptations have been proposed (Delorio and Roberst, 2002). To overcome the limitation of BIC, Biernacki et al. (2000) proposed the Integrated Completed Likelihood (ICL) criterion and showed that it performs well both for selecting a mixture model and for choosing an optimal number of clusters. ICL differs from the other criteria in that the integrated likelihood of the complete data (the observed data together with the estimated cluster memberships) is used to evaluate mixture models. The ICL criterion is defined by:

$$\mathrm{ICL}\left(K,\widehat{\theta}\right) = -2\log f\left(Y,\widehat{Z} \mid K,\widehat{\theta}\right) + \upsilon_{K}\log\left(S\right)$$

(9)

where Y is the observed data and $\log f\left(Y,\widehat{Z} \mid K,\widehat{\theta}\right)$ is the complete-data log likelihood. $\widehat{Z}$ is an S × K binary matrix that holds the estimated membership of each station, with ${\widehat{Z}}_{s,k}=1$ if and only if station s belongs to cluster k. $\widehat{\theta}$ is the collection of estimated parameters, ${\upsilon}_{K}$ is the number of parameters, and S is the number of stations. Among the different mixture models, the one with the lowest ICL value is preferred. In this study, we computed the ICL to select the optimal number of clusters for each Bayesian clustering model.
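Once the complete-data log likelihood is available, Eq. (9) is a one-line computation. The sketch below evaluates it for several candidate values of K and picks the smallest. The function names are hypothetical and the log-likelihood values and parameter counts are invented purely for illustration; they are not results from the paper.

```python
import math


def icl(complete_loglik, n_params, n_stations):
    """Eq. (9): -2 * complete-data log likelihood + penalty on parameters."""
    return -2.0 * complete_loglik + n_params * math.log(n_stations)


def select_k(candidates, n_stations):
    """Return the K with the lowest ICL and all scores.

    candidates: dict {K: (complete_loglik, n_params)}.
    """
    scores = {k: icl(ll, p, n_stations) for k, (ll, p) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Invented numbers, for illustration only (S = 24 stations as in the study).
toy = {2: (-1500.0, 56), 3: (-1450.0, 60), 4: (-1445.0, 64)}
best_k, scores = select_k(toy, n_stations=24)
```

In this made-up example the small gain in log likelihood from K = 3 to K = 4 does not offset the extra penalty, which shows how the criterion trades fit against complexity.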

4 Results

4.1 Preliminary investigations

For the preliminary analysis, the no pooling model with a nonstationary GEV distribution was constructed using a linear function of the ENSO index as a time covariate to fit the seasonal precipitation across the Kebir Rhumel Basin. The slope parameter (µ₁) characterizes the effect of the climate index (ENSO) on the location parameter. If the effect of the climate index is significant, the posterior distribution of µ₁ does not have zero near its center.

The posterior probability distributions of the slope of the location parameter (µ₁) estimated from the no pooling model are shown as boxplots in Fig. 3 for all stations and all four seasons. The red and blue boxes represent significant and non-significant effects of the climate index, respectively. As illustrated in Fig. 3, most stations have a positive median of the posterior distribution of the slope parameter during winter and autumn. However, most stations have a negative median of the posterior distribution of µ₁ during spring and summer, suggesting that in this basin ENSO may have contrasting effects on precipitation in different seasons. In addition, the results show that 58% (14 out of 24) and 33% (8 out of 24) of the stations have a significant effect of the climate index during spring and autumn, respectively. By contrast, a significant effect of the climate index during the remaining seasons is detected at only two stations. Next, we applied the full pooling Bayesian clustering and partial pooling Bayesian clustering models in order to better understand the influence of the climate index on the seasonal precipitation and its spatial variability within the basin.

4.2 Bayesian clustering models

In the current study, we applied the full pooling Bayesian clustering (FPBC) and partial pooling Bayesian clustering (PPBC) models using a nonstationary GEV distribution to fit the seasonal precipitation data by varying the number of clusters from 2 to 4.

4.2.1 Model selection

Model selection for the full pooling Bayesian clustering and partial pooling Bayesian clustering models includes the selection of the optimal number of clusters. Choosing the number of clusters (K) is a very important issue when using mixture models. In this study, the ICL is considered, and the best model is the one with the lowest ICL value. Figure 4 shows the boxplots of the ICL values of the FPBC and PPBC models. In all four seasons, the ICL values decrease from K = 2 to K = 4 for both mixture models, indicating that increasing the number of clusters improves the fit of the mixture models. Thus, among the values tested, the optimal number of clusters is 4. In other words, the mixture models with K = 4 provide a better fit than the mixture models with K = 3 and K = 2. As shown in Fig. 4, the partial pooling Bayesian clustering model with K = 4 has the lowest ICL values among all K values and thus provides the best fit for the seasonal precipitation data in the Kebir Rhumel Basin.

4.2.2 Assignment of stations to clusters

In order to determine the membership of each station in an appropriate cluster, we calculate the posterior probability (P_post) of each station (s) belonging to cluster k as follows:

$$P_{post}\left(s \in \text{cluster } k\right) = \frac{\pi_{k} f_{H_{k}}(s)}{\sum_{j=1}^{K} \pi_{j} f_{H_{j}}(s)}$$

(10)

Each station is assigned to the cluster k for which its posterior probability is highest.
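The assignment rule of Eq. (10) can be sketched as follows, working on the log scale to avoid numerical underflow. The function names and inputs are hypothetical, and the sketch assumes that all mixing probabilities are strictly positive.

```python
import math


def membership_probs(log_f_row, pi):
    """Eq. (10) for one station, computed stably on the log scale.

    log_f_row[k] holds log f_Hk(s); pi[k] > 0 are the mixing probabilities.
    """
    logs = [math.log(p) + lf for p, lf in zip(pi, log_f_row)]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]  # rescaled to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


def assign_cluster(log_f_row, pi):
    """Return the 1-based index of the cluster with the highest probability."""
    probs = membership_probs(log_f_row, pi)
    return max(range(len(probs)), key=probs.__getitem__) + 1


# Toy example (invented numbers): one station, two equally weighted clusters.
probs = membership_probs([math.log(0.2), math.log(0.6)], [0.5, 0.5])
cluster = assign_cluster([math.log(0.2), math.log(0.6)], [0.5, 0.5])
```

Here the station has membership probabilities of 0.25 and 0.75, so it would be placed in the second cluster.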

Figure 5 displays the cluster membership of each station for the partial pooling Bayesian clustering model with four clusters. The summary statistics of the posterior distribution of the model parameters for each cluster in the PPBC model are presented in Table 3. The results clearly show that the PPBC model identifies four clusters in each of the four seasons. Table 3 shows that about 42%, 33% and 58% of the stations belong to the first cluster in winter, summer and autumn, respectively, whereas in spring about 50% of the stations belong to the second cluster.

Table 3 Summary of the clustering results for PPBC model with K = 4


The seasonal precipitation estimated by the PPBC model with a non-stationary GEV distribution for each cluster during the four seasons is displayed in Fig. 6. For comparison, the empirical cumulative distribution function of the observed seasonal precipitation and the distribution fitted by the PPBC model with a non-stationary GEV distribution are illustrated in Fig. 7 for each season. The figures show how closely the estimated theoretical distributions match the empirical distributions in each season.

As seen in Figs. 6 and 7, the values of the seasonal precipitation change from one season to another and from the first cluster to the last. Also, the cumulative distribution function (CDF) plots show that the PPBC model with a non-stationary GEV distribution closely follows the observed seasonal precipitation at all stations. Therefore, the results confirm that the PPBC model with K = 4 is the best model for seasonal precipitation and clearly captures the significant variation of precipitation between seasons and clusters. The winter season has the highest values of seasonal precipitation, while the summer season has the smallest. Moreover, the seasonal precipitation values increase from the first cluster to the last cluster during all seasons.

In all four seasons, the posterior means of the intercept µ₀ and scale σ parameters increase from the first to the last cluster, indicating that the last cluster has the largest mean compared with the other clusters. In addition, the posterior means of µ₀ and σ clearly increase from the south to the north of the Kebir Rhumel Basin in winter, spring and autumn. Moreover, the vertical sorting of the clusters is related to elevation and distance from the Mediterranean Sea. During winter and spring, the stations belonging to the last cluster lie in the Oued Kebir Maritime sub-basin, near the Mediterranean Sea. This indicates that high precipitation values are observed in the Oued Kebir Maritime sub-basin (1007). The posterior mean of the shape parameter (ξ) at most stations is negative during winter, spring and autumn and positive during summer.

4.2.3 Identifying the effect of climate variability

In the current study, the effect of the El Niño Southern Oscillation (ENSO) index on the seasonal precipitation is characterized by the slope of the location parameter (µ₁). For each season, the posterior distribution of the slope parameter µ₁ estimated by the partial pooling Bayesian clustering model for each station is displayed in Fig. 8. The colors of the boxes represent the clusters identified by the clustering model with K = 4. The median of the posterior distribution of µ₁ for each cluster is presented in Table 3. The results show that most stations have a positive posterior median of the slope µ₁ during winter and autumn, while most stations have a negative value of µ₁ in spring and summer. This again indicates that ENSO (SOI) has a positive influence on the seasonal precipitation during winter and autumn and a negative influence during spring and summer.

In addition, to better understand the significant effect of ENSO on the seasonal precipitation, we examine the posterior distribution of the slope µ₁ of the location parameter for each season. The effect of ENSO is regarded as significant when zero is not included in the 90% posterior interval of the slope parameter µ₁. A significant positive (negative) effect is considered when the posterior distribution of µ₁ lies above (below) zero at the 10% (90%) level. Table 4 presents the percentage of stations with a significant positive or negative effect of ENSO according to the PPBC model. Figure 5 shows the spatial distribution of the significant effect of ENSO for each season. A significant effect of ENSO on precipitation is found at 17% (4 stations), 75% (18 stations), 12% (3 stations) and 75% (18 stations) of the stations during winter, spring, summer and autumn, respectively, indicating that significant effects of ENSO are most frequent in spring and autumn. Also, the significant positive and negative effects of ENSO on the seasonal precipitation differ from one season to another. In winter and summer, a significant effect of ENSO is only detected at the stations of the first cluster. Thus, a significant positive influence of ENSO during winter is observed in the southern part of the Kebir Rhumel Basin, whereas a significant negative influence during summer is detected in the northern part of the basin (sub-basin 1007). In spring, the stations of the first three clusters are negatively influenced by ENSO, and in autumn the stations of the first two clusters are positively influenced. In addition, all stations except those of the Oued Kebir Maritime sub-basin are negatively and positively influenced by ENSO during spring and autumn, respectively. These results are roughly consistent with the findings of previous studies in Africa (Ropelewski and Halpert 1987; Kiladis and Diaz 1989; Nicholson and Kim 1997; Mason and Goddard 2001; Lüdecke et al. 2021) and Algeria (Meddi et al. 2010; Turki et al. 2016; Zeroual et al. 2016) in terms of the seasonal distribution of ENSO impacts on precipitation amount.
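The 90% posterior interval rule described above can be sketched as a simple check on posterior draws of µ₁. The sketch assumes an equal-tailed interval (5% and 95% quantiles) with linear interpolation between order statistics, which is one reading of the criterion rather than a confirmed detail of the study. The function names and the toy draws are hypothetical.

```python
def quantile(sorted_draws, q):
    """Quantile of pre-sorted draws using linear interpolation."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] * (1 - frac) + sorted_draws[hi] * frac


def enso_effect(draws, level=0.90):
    """Classify the slope as 'positive', 'negative' or 'not significant'.

    Assumption: an equal-tailed posterior interval that excludes zero
    marks a significant effect.
    """
    ordered = sorted(draws)
    alpha = (1.0 - level) / 2.0
    lower = quantile(ordered, alpha)
    upper = quantile(ordered, 1.0 - alpha)
    if lower > 0:
        return "positive"
    if upper < 0:
        return "negative"
    return "not significant"


# Toy posterior draws (invented): evenly spaced values over three ranges.
above_zero = [0.5 + 0.01 * i for i in range(101)]
below_zero = [-2.0 + 0.01 * i for i in range(101)]
straddling = [-1.0 + 0.02 * i for i in range(101)]
```

Applied to every station, the share of stations labelled positive or negative would give percentages of the kind reported in Table 4.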

Table 4 Percentage of stations with a significant effect of ENSO on seasonal precipitation in PPBC model


5 Summary and conclusions

This study aimed to analyze the effect of ENSO on the seasonal precipitation across the Kebir Rhumel Basin using a Bayesian clustering approach. For each season, full pooling Bayesian clustering and partial pooling Bayesian clustering models with a nonstationary GEV distribution were applied. In these models, we assumed that the location parameter was linked to the temporal climate covariate through a linear regression function. The intercept and the slope of the location parameter and the shape parameter were used for clustering. An advantage of the approach is that it allows the clustering and the model parameter estimation to proceed at the same time and reduces the uncertainty in the parameter estimation by transferring information across stations with similar characteristics.

The main findings of this study are summarized as follows: (i) Increasing the number of clusters improves the fit of both Bayesian clustering models. (ii) For all four seasons, the partial pooling Bayesian clustering model with K = 4 provided the best fit to the seasonal precipitation data. (iii) ENSO significantly affects precipitation across large parts of the Kebir Rhumel Basin during spring and autumn. (iv) In winter and autumn, 17% and 75% of the stations, respectively, were positively influenced by ENSO. In contrast, 75% and 12% of the stations were negatively affected by ENSO during spring and summer, respectively, indicating that the ENSO effect changes from one season to another. (v) Significant positive and negative influences of ENSO are observed in the southern and northern parts of the Kebir Rhumel Basin during winter and summer, respectively. All stations except those in the Oued Kebir Maritime sub-basin are negatively influenced by ENSO during spring and positively influenced during autumn.

The Bayesian clustering approach could be extended to consider several covariates at the same time. In this study, we assumed a symmetric effect of the positive and negative phases of the climate index, leading to a linear relationship between the ENSO measure (SOI) and the location parameter of the distribution. However, an asymmetric relation may better identify the influence of climate variability on seasonal precipitation. Clustering models could also explicitly include spatial dependence across stations: Sun et al. (2015b) demonstrated that considering spatial dependence in a hierarchical Bayesian clustering model can avoid underestimating uncertainties. In future work, we expect to develop a strategy that models station precipitation trends using several climate indices as covariates, with both symmetric and asymmetric analyses, while accounting for spatial dependence within a cluster.
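To make the distinction between symmetric and asymmetric effects concrete, one possible piecewise-linear form is written out here. This is only an illustrative assumption: the symbols µ₁⁺ and µ₁⁻ are introduced for this sketch and the specific form is not taken from the study.

```latex
% Symmetric (linear) link used in this study:
\mu(t) = \mu_0 + \mu_1\,\mathrm{SOI}(t)

% One possible asymmetric link (assumed form), with separate slopes
% for the positive and negative phases of the index:
\mu(t) = \mu_0 + \mu_1^{+}\,\max\{\mathrm{SOI}(t),\,0\} + \mu_1^{-}\,\min\{\mathrm{SOI}(t),\,0\}
```

Because max{x, 0} + min{x, 0} = x, the asymmetric form reduces to the symmetric one when µ₁⁺ = µ₁⁻ = µ₁, so the linear model is a special case.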

Data availability

The ENSO datasets are available at https://www.cpc.ncep.noaa.gov/data/indices/soi/. The data used in this paper are available from the corresponding author upon request.

References

Agilan V, Umamahesh NV (2017a) What are the best covariates for developing non-stationary rainfall intensity-duration-frequency relationship? Adv Water Resour 101:11–22
Agilan V, Umamahesh NV (2017b) Covariate and parameter uncertainty in non-stationary rainfall IDF curve. Int J Climatol 38:365–383
Akaike H (1974) New look at statistical-model identification. IEEE Trans Autom Control 19(6):716–723
Aryal SK, Bates BC, Campbell EP, Li Y, Palmer MJ, Viney NR (2009) Characterizing and modeling temporal and spatial trends in rainfall extremes. J Hydrometeorol 10(1):241–253
Belkhiri L, Kim TJ (2021) Individual influence of climate variability indices on annual maximum precipitation across the global scale. Water Resour Manage 35(9):2987–3003
Berghuijs WR, Larsen JR, van Emmerik THM, Woods RA (2017) A global assessment of runoff sensitivity to changes in precipitation, potential evaporation, and other factors. Water Resour Res 53(10):8475–8486
Biernacki C, Govaert G (1997) Using the classification likelihood to choose the number of clusters. Comput Sci Stat 29:451–457
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Chen X, Hao Z, Devineni N, Lall U (2014) Climate information based streamflow and rainfall forecasts for Huai River basin using hierarchical Bayesian modeling. Hydrol Earth Syst Sci 18(4):1539–1548
Cheng L, AghaKouchak A (2014) Nonstationary precipitation intensity-duration-frequency curves for infrastructure design in a changing climate. Sci Rep-UK 4:7093
Coles S, Bawa J, Trenner L, Dorazio P (2001) An introduction to statistical modeling of extreme values. Springer, London
Dai A, Fung IY, Del Genio AD (1997) Surface observed global land precipitation variations during 1900–88. J Clim 10(11):2943–2962
DeIorio M, Robert CP (2002) Discussion of Spiegelhalter et al. J R Stat Soc Ser B 64:629–630
Deng H, Han J (2018) Probabilistic models for clustering. In: Aggarwal CC, Reddy CK (eds) Data clustering: algorithms and applications. Chapman and Hall/CRC, Boca Raton, pp 61–86. https://doi.org/10.1201/9781315373515-3
Du H, Xia J, Zeng S, She D, Liu J (2014) Variations and statistical probability characteristic analysis of extreme precipitation events under climate change in Haihe River Basin, China. Hydrol Process 28:913–925
Gao M, Mo D, Wu X (2016) Nonstationary modeling of extreme precipitation in China. Atmos Res 182:1–9
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472
Hoerling MP, Kumar A, Zhong M (1997) El Niño, La Niña, and the nonlinearity of their teleconnections. J Clim 10(8):1769–1786
Hoffman MD, Gelman A (2014) The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J Mach Learn Res 15(1):1593–1623
IPCC (2021) Climate change 2021: the physical science basis. Contribution of working group I to the sixth assessment report of the intergovernmental panel on climate change. Cambridge University Press, Cambridge
Johnson DS, Ream RR, Towell RG, Williams MT, Leon Guerrero JD (2013) Bayesian clustering of animal abundance trends for inference and dimension reduction. J Agric Biol Environ Stat 18(3):299–313
Kiladis GN, Diaz HF (1989) Global climatic anomalies associated with extremes in the Southern oscillation. J Clim 2(9):1069–1090
Konapala G, Mishra AK, Wada Y, Mann ME (2020) Climate change will affect global water availability through compounding changes in seasonal precipitation and evaporation. Nat Commun 11(1):3044
Lüdecke HJ, Müller-Plath G, Wallace MG, Lüning S (2021) Decadal and multidecadal natural variability of African rainfall. J Hydrol Regional Stud 34:100795
Madsen H, Lawrence D, Lang M, Martinkova M, Kjeldsen TR (2014) Review of trend analysis and climate change projections of extreme precipitation and floods in Europe. J Hydrol 519:3634–3650
Marouf N, Remini B (2019) Impact study of Beni-Haroun dam on the environmental and socio-economic elements in Kébir-Rhumel basin, Algeria. J Water Land Dev 43:120–132
Mason SJ, Goddard L (2001) Probabilistic precipitation anomalies associated with ENSO. Bull Am Meteorol Soc 82(4):619–638
McLachlan GJ, Peel D (2004) Finite mixture models. Wiley, New York
Mebarki A (2005) Thèse d’État: Hydrologie des bassins de l’Est algérien: ressources en eau, aménagement et environnement.
Meddi MM, Assani AA, Meddi H (2010) Temporal variability of annual rainfall in the Macta and Tafna catchments, northwestern Algeria. Water Resour Manage 24(14):3817–3833
Micevski T, Franks SW, Kuczera G (2006) Multidecadal variability in coastal eastern Australian flood data. J Hydrol 327(1–2):219–225
Nicholson SE, Kim J (1997) The relationship of the El Niño-Southern Oscillation to African rainfall. Int J Climatol 17(2):117–135
Nieto-Barajas LE, Contreras-Cristán A (2014) A Bayesian nonparametric approach for time series clustering. Bayesian Anal 9(1):147–170
Ossandón Á, Brunner MI, Rajagopalan B, Kleiber W (2022) A space–time Bayesian hierarchical modeling framework for projection of seasonal maximum streamflow. Hydrol Earth Syst Sci 26(1):149–166
Ouarda TBMJ, El-Adlouni S (2011) Bayesian nonstationary frequency analysis of hydrological variables. J Am Water Resour Assoc 47(3):496–505
Papalexiou SM, Koutsoyiannis D (2013) Battle of extreme value distributions: a global survey on extreme daily rainfall. Water Resour Res 49(1):187–201. https://doi.org/10.1029/2012WR012557
Renard B, Sun X, Lang M (2013) Bayesian methods for non-stationary extreme value analysis. In: AghaKouchak A, Easterling D, Hsu K, Schubert S, Sorooshian S (eds) Extremes in a changing climate: detection, analysis and uncertainty. Springer Netherlands, Dordrecht, pp 39–95. https://doi.org/10.1007/978-94-007-4479-0_3
Renard B, Lang M, Bois P, Dupeyrat A, Mestre O, Niel H, Sauquet E, Prudhomme C, Parey S, Paquet E, Neppel L (2008) Regional methods for trend detection: assessing field significance and regional consistency. Water Resour Res 44(8)
Ropelewski CF, Halpert MS (1987) Global and regional scale precipitation patterns associated with the El Niño/Southern Oscillation. Mon Weather Rev 115(8):1606–1626
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Silva AT, Portela MM, Naghettini M, Fernandes W (2017) A Bayesian peaks-over-threshold analysis of floods in the Itajaí-açu River under stationarity and nonstationarity. Stoch Env Res Risk Assess 31(1):185–204
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc Ser B 64(4):583–639
Stan Development Team (2022) Rstan: the R interface to Stan. R package version 2.26.4. https://CRAN.R-project.org/package=rstan
Steirou E, Gerlitz L, Apel H, Merz B (2017) Links between large-scale circulation patterns and streamflow in Central Europe: a review. J Hydrol 549:484–500
Steirou E, Gerlitz L, Apel H, Sun X, Merz B (2019) Climate influences on flood probabilities across Europe. Hydrol Earth Syst Sci 23(3):1305–1322
Su C, Chen X (2019) Covariates for nonstationary modeling of extreme precipitation in the Pearl River Basin, China. Atmos Res 229:224–239
Sun X, Lall U (2015) Spatially coherent trends of annual maximum daily precipitation in the United States. Geophys Res Lett 42(22):9781–9789
Sun X, Thyer M, Renard B, Lang M (2014) A general regional frequency analysis framework for quantifying local-scale climate effects: a case study of ENSO effects on Southeast Queensland rainfall. J Hydrol 512:53–68
Sun X, Renard B, Thyer M, Westra S, Lang M (2015a) A global analysis of the asymmetric effect of ENSO on extreme precipitation. J Hydrol 530:51–65
Sun X, Lall U, Merz B, Dung NV (2015b) Hierarchical Bayesian clustering for nonstationary flood frequency analysis: application to trends of annual maximum flow in Germany. Water Resour Res 51(8):6586–6601
Trenberth KE (1997) The definition of El Niño. Bull Amer Meteor Soc 78:2771–2777
Trenberth KE, Stepaniak DP (2001) Indices of El Niño evolution. J Clim 14(8):1697–1701
Turki I, Laignel B, Massei N, Nouaceur Z, Benhamiche N, Madani K (2016) Hydrological variability of the Soummam watershed (Northeastern Algeria) and the possible links to climate fluctuations. Arab J Geosci. https://doi.org/10.1007/s12517-016-2448-0
Ward PJ, Eisner S, Flörke M, Dettinger MD, Kummu M (2014) Annual flood sensitivities to El Niño-Southern Oscillation at the global scale. Hydrol Earth Syst Sci 18(1):47–66
Xiong Y, Yeung DY (2004) Time series clustering with ARMA mixtures. Pattern Recogn 37(8):1675–1689
Zeroual A, Assani AA, Meddi M (2016) Combined analysis of temperature and rainfall variability as they relate to climate indices in Northern Algeria over the 1972–2013 period. Hydrol Res 48(2):584–595

Acknowledgements

We are grateful to Prof. Xun Sun and Prof. Naresh Devineni for their helpful comments and suggestions, which substantially improved this work. We would also like to thank the editor and the two referees for their helpful suggestions and comments.

Funding

The authors have not disclosed any funding.

Author information

Authors and Affiliations

Laboratory of Applied Research in Hydraulics, University of Mustapha Ben Boulaid, Batna 2, Algeria
Lazhar Belkhiri
Department of Civil Engineering, City College of New York, New York, NY, 10031, USA
Nir Krakauer

Contributions

All authors reviewed the manuscript.

Corresponding author

Correspondence to Lazhar Belkhiri.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval

All the work is in compliance with ethical standards.

Consent to participate

The authors gave their consent to participate in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Belkhiri, L., Krakauer, N. Quantifying the effect of climate variability on seasonal precipitation using Bayesian clustering approach in Kebir Rhumel Basin, Algeria. Stoch Environ Res Risk Assess 37, 3929–3943 (2023). https://doi.org/10.1007/s00477-023-02488-z

Accepted: 03 June 2023
Published: 14 June 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s00477-023-02488-z
