1 Introduction

Climate change and variability are among the most serious challenges facing the global ecosystem in the 21st century. Global warming and climate fluctuation have affected all components of the water cycle, especially precipitation (Berghuijs et al. 2017; Konapala et al. 2020; Belkhiri and Kim 2021; IPCC, 2021). The El Niño Southern Oscillation (ENSO) is the single most influential climate phenomenon affecting the variability of precipitation (Ropelewski and Halpert 1987; Hoerling et al. 1997; Dai et al. 1997; Ward et al. 2014). Although ENSO is defined based on Pacific surface pressure or sea surface temperature, it affects climate globally in various ways, particularly across the tropics and subtropics. Many studies have shown the complex effects of ENSO on precipitation in many countries throughout Africa (Ropelewski and Halpert 1987; Kiladis and Diaz 1989; Nicholson and Kim 1997; Mason and Goddard 2001; Lüdecke et al. 2021). Fewer studies have investigated the effect of ENSO on precipitation in Algeria. For example, Meddi et al. (2010) analyzed the temporal variability of annual precipitation in the Macta and Tafna catchments of northwestern Algeria and showed a negative correlation between ENSO and precipitation. Turki et al. (2016) studied the long-term variability of rainfall in the Soummam watershed and noted a significant effect of ENSO on rainfall variability in northeastern Algeria. Zeroual et al. (2016) analyzed the influence of climate indices on temperature and on annual and seasonal rainfall in the coastal region of northern Algeria from 1972 to 2013 and found a positive correlation between rainfall and the ENSO index. However, the effect of climate indices such as ENSO on the full probability distribution of precipitation in Algeria has not yet been comprehensively assessed. Therefore, there is room for new approaches to evaluate the response of precipitation to ENSO.

The effects of climate variability and change on precipitation have been studied using several nonstationary models. For instance, Micevski et al. (2006) examined the effect of the Inter-decadal Pacific Oscillation (IPO) on flood data from mainly coastal eastern Australia and found that the IPO modulates the flood risk within parts of eastern Australia. Ouarda and El-Adlouni (2011) discussed nonstationary frequency analysis models in hydrology with a focus on the Bayesian approach and demonstrated that the Bayesian approach can be applied to more general and more complex models where parameters are expressed as nonlinear functions of covariates. Madsen et al. (2014) presented a review of trend analysis of extreme precipitation and floods and noted that non-stationarity in extreme precipitation and flood characteristics due to climatic changes is high on the research agenda in Europe. Chen et al. (2014) developed a hierarchical Bayesian approach for regional rainfall and streamflow forecasting using appropriate climate indicators. They noted that this approach allows appropriate grouping of information in the region and explicit modeling of the covariance of the model errors and the regression coefficients, which better represents the uncertainty in the model parameters and in the final streamflow and rainfall forecasts. Sun et al. (2014) demonstrated that a Bayesian regional framework provides the opportunity to assess the value of regional information in better identifying the effect of climate variability on hydrometeorological extremes. Despite these valuable studies, it remains an open question how to reduce uncertainties and identify the effect of climate variability across a relatively large area by regionally pooling information on model parameters. Some studies addressed this question but considered a relatively small homogeneous area over which information can be pooled (Aryal et al. 2009; Renard et al. 2008; Renard et al. 2013; Chen et al. 2014; Sun et al. 2014; Sun et al. 2015a). Moreover, some authors have applied Bayesian clustering models based on a Dirichlet process or the Expectation–Maximization (EM) algorithm (Xiong and Yeung 2004; Johnson et al. 2013; Nieto-Barajas and Contreras-Cristán, 2014), but these mixture models have been limited to Gaussian mixtures. Sun and Lall (2015) and Sun et al. (2015b) developed a new Bayesian clustering approach for exploring homogeneity of response in large-area datasets through a mixture model with multiple components (clusters). This approach allows the reduction of uncertainties through full pooling or partial pooling across automatically chosen subsets of the data. In this study, we applied a similar Bayesian clustering approach with a non-stationary GEV distribution for seasonal precipitation, using an ENSO index as covariate.

The paper is structured as follows. Section 2 introduces the study area and data. Section 3 presents the Bayesian clustering approach. Section 4 presents and discusses the results, and Sect. 5 summarizes the main findings.

2 Study area and data

The Kebir Rhumel Basin (KRB) is located in northern Algeria. It includes seven sub-basins and covers an area of approximately 8815 km² (Table 1 and Fig. 1). The KRB is drained by two major rivers, Oued Rhumel in the southern part and Oued Endja in the western part. Beni-Haroun and Boussiaba are the major dams in the basin, with capacities of 960 hm³ and 120 hm³, respectively (Marouf et al. 2019). According to Mebarki (2005), the climate of the KRB is humid in the northern part and semi-arid in the southern part.

Table 1 Surface area and number of stations for each sub-basin in the KRB
Fig. 1 Locations of the study area and the selected stations

The precipitation datasets for the period 1970–2013 used in this study were obtained from the National Agency for Hydraulic Resources and the National Office of Meteorology. In the current study, the mean value of monthly precipitation is derived for each of the four seasons: winter (DJF), spring (MAM), summer (JJA) and autumn (SON). For each season, a station was selected only if at least 90% of its precipitation data were available during the period 1970–2013. Overall, 24 stations are used in the analysis. Figure 1 displays the spatial distribution of the selected stations in the KRB.
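The seasonal aggregation and the availability filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing code: the function names and the dictionary layout are hypothetical, December is assumed to count toward the winter of the following year, a season is assumed to need all three months, and availability is assumed to be counted as the fraction of years with a seasonal value. Only the 90% threshold comes from the text.

```python
# Map each calendar month to its meteorological season.
SEASONS = {12: "DJF", 1: "DJF", 2: "DJF",
           3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA",
           9: "SON", 10: "SON", 11: "SON"}


def seasonal_means(monthly):
    """Average monthly precipitation by (season_year, season).

    `monthly` maps (year, month) -> precipitation; missing months are absent.
    Assumption: December belongs to the winter of the following year, and a
    season is kept only when all three of its months are present.
    """
    groups = {}
    for (year, month), value in monthly.items():
        season_year = year + 1 if month == 12 else year
        groups.setdefault((season_year, SEASONS[month]), []).append(value)
    return {key: sum(vals) / 3.0 for key, vals in groups.items() if len(vals) == 3}


def has_enough_data(seasonal, season, first_year, last_year, min_fraction=0.9):
    """True if `season` has a value in at least `min_fraction` of the years."""
    years = range(first_year, last_year + 1)
    available = sum((year, season) in seasonal for year in years)
    return available / len(years) >= min_fraction
```

A station would then be retained for a given season only when `has_enough_data` returns `True` for that season over the study period.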

To analyze the effect of climate variability on seasonal precipitation, the El Niño Southern Oscillation (ENSO) was considered. Many indices have been developed to characterize aspects of ENSO evolution (Trenberth 1997; Trenberth and Stepaniak 2001). The seasonal mean of the Southern Oscillation Index (SOI) is used as the measure of ENSO and was obtained from the National Oceanic and Atmospheric Administration Climate Prediction Center (https://www.cpc.ncep.noaa.gov/data/indices/soi/).

3 Methodology

3.1 Nonstationary model structure

For the nonstationary modeling of seasonal precipitation, it is important to choose an appropriate distribution. In recent decades, many studies have successfully used the generalized extreme value (GEV) distribution to model nonstationarity in precipitation events by including time-varying parameters (Du et al. 2014; Cheng and Aghakouchak 2014; Gao et al. 2016; Agilan and Umamahesh, 2017a, b; Su and Chen 2019). Steirou et al. (2019) identified links between seasonal flood probabilities and large-scale atmospheric indices across Europe by adopting a Bayesian framework with a climate-informed (non-stationary) GEV distribution and compared it with the classical (stationary) GEV distribution, whose parameters are invariant in time. They demonstrated that the climate-informed models were preferred over the classical GEV distribution for a high percentage of stations in most seasons, and that seasonally averaged indices provided better fits than monthly values in most cases. Ossandón et al. (2022) developed a Bayesian Hierarchical Model (BHM) to project seasonal streamflow extremes for several lead times based on a Gaussian elliptical copula and GEV margins with nonstationary parameters, and demonstrated that the proposed framework could be useful for the early implementation of flood risk adaptation and preparedness strategies. Thus, a nonstationary GEV distribution is used to model the seasonal precipitation at each station over the KRB. Here, we assumed that the GEV location parameter was linked to the temporal climate covariate (ENSO index) through a linear regression model. In a preliminary analysis, we considered the effect of the climate index on both the location and scale parameters, but the results did not differ substantially from those obtained with a covariate on the location parameter only (not shown). The shape parameter is kept constant because its estimation involves large uncertainties, even under the assumption of stationarity (Coles et al. 2001; Papalexiou and Koutsoyiannis 2013; Silva et al. 2017; Steirou et al. 2017).

In this application, the three proposed model structures of no pooling, full pooling and partial pooling can be written as follows:

(a) No pooling model

$$Y\left(s,t\right) \sim GEV\left({\mu }_{0}\left(s\right) + {\mu }_{1}\left(s\right)* ENSO\left(t\right), \sigma \left(s\right),\xi \left(s\right)\right)$$
(1)

(b) Full pooling model

$$Y\left(s,t\right) \sim GEV\left({\mu }_{0} + {\mu }_{1}\left(s\right)* ENSO\left(t\right), \sigma \left(s\right),\xi \right)$$
(2)

(c) Partial pooling model

$$\begin{aligned} \text{Level 1:}\quad & Y\left(s,t\right) \sim GEV\left({\mu }_{0} + {\mu }_{1}\left(s\right)* ENSO\left(t\right), \sigma \left(s\right),\xi \right) \\ \text{Level 2:}\quad & {\mu }_{1}\left(s\right) \sim N\left({\mu }_{\mu },{\sigma }_{\mu }\right) \end{aligned}$$
(3)

Y(s,t) is the observation of the variable at station s and time t. ENSO(t) is the climate covariate at time t. µ0(s)/µ0, µ1(s), σ(s) and ξ(s)/ξ are model parameters, where µ0(s)/µ0 is the intercept of the location parameter and µ1(s) is the slope of the location parameter at station s. σ(s) and ξ(s)/ξ are the scale and shape parameters, respectively. µ0(s), µ1(s), σ(s) and ξ(s) are site-specific (local) parameters, while µ0 and ξ are regional parameters. N(.,.) denotes a normal distribution. μμ and σμ are the hyper-parameters of the second-level model (Eq. 3).

In the current research, all proposed models use the same linear regression function of the ENSO covariate on the location parameter. They differ in the settings of µ0, µ1 and ξ, which are either site-specific (no pooling), regional (full pooling), or governed by a second level (partial pooling). In the no pooling model (Eq. 1), the three GEV parameters are estimated locally (i.e., are site-specific). This model is used as a baseline. In the full pooling model (Eq. 2), the parameters are estimated independently for each station, except that the intercept of the location parameter (µ0) and the shape parameter (ξ) are regional and estimated using all data. We chose these two parameters for clustering in order to standardize the seasonal precipitation data, and because the shape parameter requires more data for a precise estimate owing to its large uncertainty (Coles, 2001, p.106). In the partial pooling model (Eq. 3), information is pooled across stations when estimating the regression slope of the location parameter, which reduces the associated uncertainty while still allowing the slope to vary between stations. In this model, the slope parameter is presumed to be drawn from a common hyper-distribution, so µ1(s) is in turn described by the hyper-parameters μμ and σμ (i.e., the second level of Eq. 3). This hyper-distribution forms the second level of the hierarchical Bayesian model. The intercept µ0 and the shape parameter ξ remain regional in this model.
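To make the role of the covariate concrete, the log-likelihood of a single station under a GEV distribution whose location is µ0 + µ1·ENSO(t) can be sketched as below. This is an illustrative sketch only (the study itself used R and RStan); the function names are hypothetical, and the sign convention assumed here is that a positive ξ corresponds to a heavy upper tail.

```python
import math


def gev_logpdf(y, mu, sigma, xi):
    """Log-density of the GEV distribution at y (assumed: xi > 0 is heavy-tailed)."""
    if sigma <= 0:
        return -math.inf
    z = (y - mu) / sigma
    if abs(xi) < 1e-12:  # Gumbel limit as xi -> 0
        return -math.log(sigma) - z - math.exp(-z)
    t = 1.0 + xi * z
    if t <= 0:  # observation lies outside the support
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(t) - t ** (-1.0 / xi)


def station_loglik(y, enso, mu0, mu1, sigma, xi):
    """Log-likelihood of one station with location mu0 + mu1 * ENSO(t)."""
    return sum(gev_logpdf(yt, mu0 + mu1 * et, sigma, xi)
               for yt, et in zip(y, enso))
```

In the no pooling setting, each station would call `station_loglik` with its own `mu0` and `xi`; in the full and partial pooling settings, the same `mu0` and `xi` would be passed for every station.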

3.2 Hierarchical Bayesian clustering model

The hierarchical Bayesian clustering model with a nonstationary GEV distribution is applied to model the seasonal precipitation. Sun and Lall (2015) and Sun et al. (2015b) demonstrated that this model can be applied directly to a heterogeneous area. In this mixture model, we assume that the stations can be classified into K clusters, and a hierarchical Bayesian model (Hk) is developed for each cluster k. Each station has a probability πk of belonging to cluster k, which needs to be estimated.

The mixture distribution across all clusters can be given as follows:

$$Y\left(s\right) \sim \sum_{k=1}^{K}{\pi }_{k}{f}_{{H}_{k}}\left(s\right)$$
(4)

where \({f}_{{H}_{k}}(s)\) is the likelihood function of the hierarchical Bayesian model (Hk) at station s, and π = {π1, …, πK} denotes the mixing probabilities (i.e., mixing coefficients or weights). To be valid probabilities, the πk must satisfy:

$$0\le {\pi }_{k}\le 1 \quad \left(k=1,\dots ,K\right), \quad \text{and} \quad \sum_{k=1}^{K}{\pi }_{k}=1$$
(5)

Therefore, the likelihood of the hierarchical Bayesian clustering model can be computed as follows:

$$L= \prod_{s=1}^{S}\sum_{k=1}^{K}{\pi }_{k}{f}_{{H}_{k}}\left(s\right)$$
(6)

where L is the likelihood function and S is the total number of stations. A schematic of the hierarchical Bayesian clustering model is presented in Fig. 2.
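For illustration, the logarithm of the mixture likelihood in Eq. (6) can be evaluated in a numerically stable way with the log-sum-exp trick. The sketch below assumes the per-station, per-cluster log-likelihoods log f_{H_k}(s) have already been computed and that every mixing weight is strictly positive; the function and argument names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_loglik(log_f, pi):
    """Log of Eq. (6).

    log_f[s][k] is log f_{H_k}(s) and pi[k] is the mixing weight of cluster k.
    Assumption: every pi[k] is strictly positive.
    """
    total = 0.0
    for row in log_f:  # one row per station
        total += log_sum_exp([math.log(p) + lf for p, lf in zip(pi, row)])
    return total
```

Working on the log scale avoids underflow when the product over stations in Eq. (6) involves many small likelihood values.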

Fig. 2 Schematic diagram of the hierarchical Bayesian clustering model

In this study, full pooling Bayesian clustering (FPBC) and partial pooling Bayesian clustering (PPBC) models are considered for each cluster k. A summary of the two proposed models used in this research is shown in Table 2.

Table 2 Nonstationary GEV distribution models to fit the seasonal precipitation

Based on the partial pooling model described in Eq. (3), we constructed this model for each cluster by setting the intercept of the location parameter (\({\mu }_{0}\)) and the shape parameter (\(\xi\)) to be fully pooled (regional) and the slope of the location parameter to be partially pooled. The hierarchical Bayesian model (Hk) for each cluster k can be written as follows:

$$\begin{aligned} \text{Level 1:}\quad & Y\left(s,t\right) \sim GEV\left({\mu }_{0,k} + {\mu }_{1}\left(s\right)* ENSO\left(t\right), \sigma \left(s\right),{\xi }_{k}\right) \\ \text{Level 2:}\quad & {\mu }_{1}\left(s\right) \sim N\left({\mu }_{{\mu }_{k}},{\sigma }_{{\mu }_{k}}\right) \end{aligned}$$
(7)

where \({\mu }_{0,k}\), \({\xi }_{k}\), \({\mu }_{{\mu }_{k}}\) and \({\sigma }_{{\mu }_{k}}\) are parameters that are associated with cluster k.

The likelihood function of Hk at station s can be calculated as follows:

$${f}_{{H}_{k}}\left(s\right)= \left[\prod_{t=1}^{T}{f}_{GEV}\left(Y(s,t)\,|\,{\mu }_{0,k},{\mu }_{1}\left(s\right),\sigma \left(s\right),{\xi }_{k},ENSO(t)\right)\right] \times {f}_{N}\left({\mu }_{1}\left(s\right)\,|\,{\mu }_{{\mu }_{k}},{\sigma }_{{\mu }_{k}}\right)$$
(8)

The full likelihood function of the hierarchical Bayesian clustering model is obtained by substituting Eq. (8) into Eq. (4).

In this application, we used a Dirichlet distribution with identical parameters (a vector of length K with all entries equal to 2) as the prior for the mixing probabilities πk. For the other parameters, we used flat priors (normal or uniform distributions with large variance). As initial values for the intercept of the location parameter, the scale and the shape at each station, we used the estimates obtained by fitting a nonstationary GEV distribution by maximum likelihood. For the hyper-parameters, the mean of the slope parameter µ1(s) was initially set to zero.

3.3 Implementation and model fitting

For each Bayesian clustering model, the posterior probability distribution of the model parameters is estimated using the No-U-Turn Hamiltonian Monte Carlo sampler (Hoffman and Gelman 2014). One chain of length 30,000 was run, with the first 15,000 iterations discarded as warmup. Convergence is evaluated with the potential scale reduction factor (Gelman and Rubin 1992), which should be smaller than 1.2 for each parameter. All calculations are conducted using R and RStan (Stan Development Team 2022).
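The text does not state how the potential scale reduction factor was computed from a single chain. One common option, assumed here purely for illustration, is to split the chain into two halves and treat them as separate chains. The sketch below implements that split version for the draws of one parameter; the function name is hypothetical and this is not the RStan implementation.

```python
import statistics


def split_rhat(draws):
    """Potential scale reduction factor from one chain split into two halves.

    Assumption: the chain is long enough that each half has at least two
    draws and a non-zero variance.
    """
    half = len(draws) // 2
    chains = [draws[:half], draws[half:2 * half]]
    n = half
    means = [statistics.fmean(c) for c in chains]
    # Within-chain variance: average of the per-half sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # Between-chain variance: n times the sample variance of the half means.
    between = n * statistics.variance(means)
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5
```

Values close to 1 indicate that the two halves agree; a parameter would be flagged when the value reaches the 1.2 threshold quoted above.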

3.4 Selecting the optimal number of clusters

The selection of the optimal number of clusters is an important issue in mixture modeling. As in other model selection problems, there is a trade-off: a mixture model with too many clusters may overfit the data, while a mixture with too few clusters may not be flexible enough to approximate the underlying model. Thus, it is important to adopt a statistical criterion to infer the optimal number of clusters (Deng and Han 2018). Several statistical criteria, such as the Akaike Information Criterion (AIC) (Akaike 1974), the Bayesian Information Criterion (BIC) (Schwarz 1978), and the Deviance Information Criterion (DIC) (Spiegelhalter et al. 2002), can be used to select among models with different numbers of clusters. In the Bayesian framework, AIC and BIC can be applied to the integrated likelihood over the model parameters. However, AIC and BIC are not theoretically justified for mixture models and may not be the best way to determine the optimal number of clusters (Biernacki and Govaert 1997; McLachlan and Peel 2004). Alternatively, the DIC is a model selection criterion that automatically accounts for parameter uncertainty by using the posterior distribution. However, the definition and application of DIC to mixture models are not straightforward, and different definitions and adaptations have been proposed (Delorio and Roberst, 2002). To overcome the limitation of BIC, Biernacki et al. (2000) proposed the Integrated Completed Likelihood (ICL) criterion and showed that it performs well for selecting both a mixture model and an optimal number of clusters. ICL differs from the other criteria in that the integrated likelihood of the complete data (the observed data together with the cluster memberships) is used to evaluate mixture models. The ICL criterion is defined by:

$$ICL\left(K,\widehat{\theta }\right)= -2\mathrm{log}f\left(Y,\widehat{Z}|K,\widehat{\theta }\right)+ {\upsilon }_{K}log\left(S\right)$$
(9)

where Y is the observed data and \(\mathrm{log}f\left(Y,\widehat{Z}|K,\widehat{\theta }\right)\) is the log-likelihood of the completed data. \(\widehat{Z}\) is an S × K binary matrix of estimated station memberships, with \({\widehat{Z}}_{s,k}=1\) if and only if station s belongs to cluster k. \(\widehat{\theta }\) is the collection of estimated parameters, \({\upsilon }_{K}\) is the number of parameters, and S is the number of stations. Among the candidate mixture models, the one with the lowest ICL value is preferred. In this study, we computed the ICL to select the optimal number of clusters for each Bayesian clustering model.
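As a rough sketch of Eq. (9), the ICL can be computed from per-station, per-cluster log-likelihoods once each station is hard-assigned to its most probable cluster. The completed log-likelihood is assumed here to take the usual form, the sum over stations of log π_k + log f_{H_k}(s) for the assigned cluster k; the function and argument names are hypothetical and the study's actual computation may differ.

```python
import math


def icl(log_f, pi, n_params):
    """ICL of Eq. (9) under a hard assignment of stations to clusters.

    log_f[s][k] is log f_{H_k}(s), pi[k] is the mixing weight of cluster k,
    and n_params is the number of free parameters (upsilon_K).
    Assumption: every pi[k] is strictly positive.
    """
    completed = 0.0
    for row in log_f:  # one row per station
        scores = [math.log(p) + lf for p, lf in zip(pi, row)]
        completed += max(scores)  # Z_hat selects the most probable cluster
    n_stations = len(log_f)
    return -2.0 * completed + n_params * math.log(n_stations)
```

Candidate models with different K would each produce one such value (or one per posterior draw), and the model with the lowest value would be preferred.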

4 Results

4.1 Preliminary investigations

For the preliminary analysis, the no pooling model with a nonstationary GEV distribution was constructed using a linear function of the ENSO index as time covariate to fit the seasonal precipitation across the Kebir Rhumel Basin. The slope parameter (µ1) characterizes the effect of the climate index (ENSO) on the location parameter. If the effect of the climate index is significant, the posterior distribution of µ1 does not have zero near its center.

Figure 3 shows boxplots of the posterior probability distribution of the slope of the location parameter (µ1) estimated from the no pooling model for all stations and all four seasons. Red and blue boxes represent significant and non-significant effects of the climate index, respectively. As illustrated in Fig. 3, most stations have a positive posterior median of the slope parameter during winter and autumn, whereas most stations have a negative posterior median of µ1 during spring and summer, suggesting that ENSO may have contrasting effects on precipitation in different seasons in this basin. In addition, the results show that 58% (14 out of 24) and 33% (8 out of 24) of the stations have a significant effect of the climate index during spring and autumn, respectively. By contrast, a significant effect of the climate index during the remaining seasons is detected at only two stations. Next, we applied the full pooling Bayesian clustering and partial pooling Bayesian clustering models to better understand the influence of the climate index on seasonal precipitation and its spatial variability within the basin.

Fig. 3 Boxplots of the posterior distribution of the slope parameter µ1(s) from the no pooling model for all stations. Red boxes represent stations with a significant effect of the climate index, while blue boxes represent stations with a non-significant effect

4.2 Bayesian clustering models

In the current study, we applied the full pooling Bayesian clustering (FPBC) and partial pooling Bayesian clustering (PPBC) models using a nonstationary GEV distribution to fit the seasonal precipitation data by varying the number of clusters from 2 to 4.

4.2.1 Model selection

Model selection for the full pooling Bayesian clustering and partial pooling Bayesian clustering models includes the selection of the optimal number of clusters (K), which is an important issue when using mixture models. In this study, the ICL is considered, and the best model is the one with the lowest ICL value. Figure 4 shows boxplots of the ICL values of the FPBC and PPBC models. In all four seasons, the ICL values decrease from K = 2 to K = 4 for both mixture models, indicating that increasing the number of clusters improves the fit. Thus, within the tested range, the optimal number of clusters is 4. In other words, the mixture models with K = 4 provide a better fit than those with K = 3 or K = 2. As shown in Fig. 4, the partial pooling Bayesian clustering model with K = 4 has the lowest ICL values among all models considered and thus provides the best fit to the seasonal precipitation data in the Kebir Rhumel Basin.

Fig. 4 Boxplots of the Integrated Completed Likelihood (ICL) of the FPBC and PPBC models for each season

4.2.2 Assignment of stations to clusters

In order to determine the membership of each station in an appropriate cluster, we calculate the posterior probability (Ppost) of each station (s) belonging to cluster k as follows:

$${P}_{post}\left(s \in \text{cluster } k\right)= \frac{{\pi }_{k}{f}_{{H}_{k}}\left(s\right)}{\sum_{j=1}^{K}{\pi }_{j}{f}_{{H}_{j}}\left(s\right)}$$
(10)

Each station is assigned to the cluster for which its posterior probability is highest.
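The assignment rule of Eq. (10) can be sketched as follows, working on the log scale for numerical stability. The names are hypothetical, cluster indices are zero-based in this sketch, and every mixing weight is assumed to be strictly positive.

```python
import math


def membership_probabilities(log_f_row, pi):
    """Posterior probability of each cluster for one station (Eq. 10).

    log_f_row[k] is log f_{H_k}(s) and pi[k] is the mixing weight of cluster k.
    """
    scores = [math.log(p) + lf for p, lf in zip(pi, log_f_row)]
    m = max(scores)  # subtract the maximum before exponentiating
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def assign_cluster(log_f_row, pi):
    """Zero-based index of the cluster with the highest posterior probability."""
    probs = membership_probabilities(log_f_row, pi)
    return max(range(len(probs)), key=probs.__getitem__)
```

For example, with equal weights and component likelihoods of 0.2 and 0.4, the station has probabilities of one third and two thirds and is assigned to the second cluster.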

Figure 5 displays the cluster membership of each station for the partial pooling Bayesian clustering model with K = 4. Summary statistics of the posterior distribution of the model parameters for each cluster in the PPBC model are presented in Table 3. The results show that the PPBC model identifies four clusters in each of the four seasons. Table 3 shows that about 42%, 33% and 58% of the stations belong to the first cluster in winter, summer and autumn, respectively, whereas in spring about 50% of the stations belong to the second cluster.

Fig. 5 Spatial distribution of the clusters identified by the PPBC model with K = 4 for each season. Up-pointing and down-pointing triangles denote significant positive and negative effects of ENSO, respectively. Colored dots and triangles denote the clusters identified by the clustering model

Table 3 Summary of the clustering results for PPBC model with K = 4

The seasonal precipitation estimated by the PPBC model with a non-stationary GEV distribution is displayed in Fig. 6 for each cluster and each of the four seasons. For comparison, Fig. 7 shows, for each season, the empirical cumulative distribution function of the observed seasonal precipitation together with the distribution fitted by the PPBC model. The figures show how well the estimated theoretical distributions match the empirical distributions in each season.

Fig. 6 Boxplots of the clusters obtained by the PPBC model with K = 4 for each season

Fig. 7 Empirical distribution function of the observed seasonal precipitation (dots) and the fitted PPBC models for each cluster (dashed lines). The stations shown for each cluster are randomly selected

As seen in Figs. 6 and 7, the seasonal precipitation values change from one season to another and from the first cluster to the last. The cumulative distribution function (CDF) plots also show that the PPBC model with a non-stationary GEV distribution closely follows the observed seasonal precipitation at all stations. These results support the PPBC model with K = 4 as the best model for seasonal precipitation, and it clearly captures the variation of precipitation between seasons and clusters. Winter has the highest seasonal precipitation values, while summer has the lowest. Moreover, the seasonal precipitation values increase from the first cluster to the last cluster in all seasons.

In all four seasons, the posterior means of the intercept µ0 and the scale parameter σ increase from the first to the last cluster, indicating that the last cluster has a larger mean than the other clusters. In addition, the posterior means of µ0 and σ increase from the south to the north of the Kebir Rhumel Basin in winter, spring and autumn. Moreover, the ordering of the clusters is related to elevation and distance from the Mediterranean Sea. During winter and spring, the stations belonging to the last cluster lie in the Oued Kebir Maritime sub-basin (1007), close to the Mediterranean Sea, which indicates that the highest precipitation values are observed in this sub-basin. The posterior mean of the shape parameter (ξ) is negative at most stations during winter, spring and autumn and positive during summer.

4.2.3 Identifying the effect of climate variability

In the current study, the effect of the El Niño Southern Oscillation (ENSO) index on seasonal precipitation is characterized by the slope of the location parameter (µ1). For each season, the posterior distribution of the slope parameter µ1 estimated by the partial pooling Bayesian clustering model for each station is displayed in Fig. 8. The colors of the boxes represent the clusters identified by the clustering model with K = 4. The posterior median of µ1 for each cluster is presented in Table 3. The results show that most stations have a positive posterior median of the slope µ1 during winter and autumn, while most stations have a negative value of µ1 in spring and summer. This again indicates that ENSO (SOI) has a positive influence on seasonal precipitation during winter and autumn and a negative influence during spring and summer.

Fig. 8 Boxplots of the posterior distribution of the slope parameter µ1(s) for each station from the PPBC model with K = 4. Different colors represent the clusters identified by the clustering model

In addition, to better understand where the effect of ENSO on seasonal precipitation is significant, we examine the posterior distribution of the slope µ1 of the location parameter for each season. The effect of ENSO is considered significant when zero is not included in the 90% posterior interval of the slope parameter µ1. A significant positive (negative) effect is identified when the posterior distribution of µ1 lies above (below) zero at the 10% (90%) level. Table 4 presents the percentage of stations with a significant positive or negative effect of ENSO in the PPBC model. Figure 5 shows the spatial distribution of the significant effect of ENSO for each season. A significant effect of ENSO on precipitation is found at 17% (4 stations), 75% (18 stations), 12% (3 stations) and 75% (18 stations) of the stations during winter, spring, summer and autumn, respectively, so the largest numbers of stations with a significant ENSO effect occur in spring and autumn. The sign of the significant effect also differs from one season to another. In winter and summer, a significant effect of ENSO is detected only at stations of the first cluster. Thus, a significant positive influence of ENSO during winter is observed in the southern part of the Kebir Rhumel Basin, whereas a significant negative influence during summer is detected in the northern part of the basin (sub-basin 1007). In spring, the stations of the first three clusters are negatively influenced by ENSO, and in autumn, the stations of the first two clusters are positively influenced. In addition, all stations except those of the Oued Kebir Maritime sub-basin are negatively influenced by ENSO during spring and positively influenced during autumn. These results are broadly consistent with the findings of previous studies in Africa (Ropelewski and Halpert 1987; Kiladis and Diaz 1989; Nicholson and Kim 1997; Mason and Goddard 2001; Lüdecke et al. 2021) and Algeria (Meddi et al. 2010; Turki et al. 2016; Zeroual et al. 2016) in terms of the seasonal distribution of ENSO impacts on precipitation amount.
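The interval-based significance check can be illustrated with a short sketch that classifies a station from its posterior draws of µ1. It assumes an equal-tailed 90% interval (5% and 95% quantiles) with linear interpolation between sorted draws; the function names are hypothetical and the exact quantile rule used in the study is not stated in the text.

```python
def quantile(sorted_draws, q):
    """Quantile q in [0, 1] by linear interpolation between sorted draws."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] * (1.0 - frac) + sorted_draws[hi] * frac


def enso_effect(mu1_draws, level=0.90):
    """Classify the ENSO effect from posterior draws of the slope mu1.

    Assumption: an equal-tailed posterior interval; the effect is called
    significant when the interval excludes zero.
    """
    draws = sorted(mu1_draws)
    tail = (1.0 - level) / 2.0
    lower = quantile(draws, tail)
    upper = quantile(draws, 1.0 - tail)
    if lower > 0:
        return "positive"
    if upper < 0:
        return "negative"
    return "not significant"
```

Applying `enso_effect` to every station and season and counting the labels would give percentages of the kind reported in Table 4.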

Table 4 Percentage of stations with a significant effect of ENSO on seasonal precipitation in PPBC model

5 Summary and conclusions

This study aimed to analyze the effect of ENSO on seasonal precipitation across the Kebir Rhumel Basin using a Bayesian clustering approach. For each season, full pooling Bayesian clustering and partial pooling Bayesian clustering models with a nonstationary GEV distribution were applied. In these models, we assumed that the location parameter was linked to the temporal climate covariate through a linear regression function. The intercept and the slope of the location parameter and the shape parameter were used for clustering. An advantage of the approach is that clustering and parameter estimation proceed at the same time, and that the uncertainty in the parameter estimates is reduced by transferring information across stations with similar characteristics.

The main findings of this study are summarized as follows: (i) Increasing the number of clusters improved the fit of both Bayesian clustering models. (ii) For all four seasons, the partial pooling Bayesian clustering model with K = 4 provided the best fit to the seasonal precipitation data. (iii) ENSO significantly affects precipitation across large parts of the Kebir Rhumel Basin during spring and autumn. (iv) In winter and autumn, 17% and 75% of the stations, respectively, were found to be positively influenced by ENSO. By contrast, 75% and 12% of the stations were negatively affected by ENSO during spring and summer, respectively, indicating that the ENSO effect changes from one season to another. (v) Significant positive and negative influences of ENSO are observed in the southern and northern parts of the Kebir Rhumel Basin during winter and summer, respectively. All stations except those in the Oued Kebir Maritime sub-basin are negatively influenced by ENSO during spring and positively influenced during autumn.

The Bayesian clustering approach could be extended to consider several appropriate covariates at the same time. In this study, we assumed a symmetric effect of the positive and negative phases of the climate index, leading to a linear relationship between the ENSO measure (SOI) and the location parameter of the distribution. However, an asymmetric relation may better identify the influence of climate variability on seasonal precipitation. Clustering models could also explicitly include spatial dependence across stations. Sun et al. (2015b) demonstrated that considering spatial dependence in a hierarchical Bayesian clustering model can avoid underestimating uncertainties. In future work, we expect to develop a strategy that can effectively model station precipitation trends by adding various climate indices as covariates, using both symmetric and asymmetric analyses and considering the spatial dependence within a cluster.