Introduction

The Earth’s climate is naturally subject to variations. It has regularly experienced warming and cooling periods, which are part of natural climate cycles. However, human activity has exacerbated the problems associated with climate change. Anthropogenic climate change is predicted to cause spatial and temporal shifts in precipitation patterns (Marvel et al. 2017). Time series analysis aimed at highlighting the effect of climate change on the variability of meteorological data has been the subject of several studies (Kundzewicz et al. 2005; Donat et al. 2014; Raziei et al. 2014; Elmeddahi et al. 2016). Ogburn (2013) reported shifts in global precipitation across land and ocean. Different parts of the world, and even different parts of the same country, react differently to global warming. Thus, the impacts of climate change differ between the semi-arid, arid and wet areas of the world (Şen et al. 2012). According to the report and projections of the Intergovernmental Panel on Climate Change (IPCC 2007), annual average rainfall is expected to decline by 10–20% in the Mediterranean and the Arabian Peninsula and by 30–40% in Morocco and northern Mauritania. In contrast, a 10–30% increase in precipitation is expected in the Kingdom of Saudi Arabia, Yemen, the United Arab Emirates and Oman (Jnad and Siba 2006).

The effects of climate change in the Maghreb countries may be more pronounced than elsewhere because of the proximity of the hyper-arid Sahara. Climate change is now a major concern for the Maghreb states (Taabni and El Jihad 2012). Turki et al. (2016) noted significant changes in rainfall variability in northeastern Algeria.

For these reasons, knowing the direction of change, whether increasing or decreasing, is a prerequisite for water resources planning, in order to support regional water and food exchange and the sustainability of society.

The main impact in Algeria is desertification, which usually involves a decrease in meteorological and agricultural variables such as rainfall, vegetation cover, the extent of water surfaces, groundwater levels and crop yields (Şen 2008). It is therefore important to emphasize that sound water resource management in Algeria is unthinkable without considering the climate and its variations.

This study focuses on the Macta watershed and aims to characterize changes in its rainfall patterns. The first objective is to collect, verify and validate monthly rainfall data to form the longest possible series. The second is to study trends in the annual and monthly rainfall series, and the third is to form groups of stations with the same rainfall pattern for different periods in order to detect possible climate change. This work is based on the data analysis methodology, which reasons about any number of variables simultaneously rather than only in pairs (De Lagarde 1983).

Methodology

Study area

The Macta watershed is located in northwestern Algeria and covers an area of 14,389 km². It is bounded to the northwest by the Tessala mountain ranges, to the south by the Maalif highlands, to the west by the Telagh plateau and to the east by the Saida mountains (Meddi et al. 2010).

The Macta watershed is the most important basin in western Algeria and drains the main water resources of the region (Fig. 1).

Fig. 1 Macta watershed location

Due to its elongated shape, the basin is characterized by strong surface flow in the upstream part of the river. It is drained by two major rivers, the Mékérra and the Wadi El-Hammam, which join near the Mediterranean coast to form the Macta. Mean annual rainfall decreases from north to south, from about 500 mm to 300 mm. In the Saida mountains, at Aouf, it can reach 600 to 700 mm per year. On a monthly basis, January is usually the wettest month and July the driest.

Data collection and processing

Selection of stations and study period

Data are obtained mainly from the National Office of Meteorology (ONM) and the National Agency for Water Resources (ANRH).

Effective data processing requires sufficiently long records. These are essential not only to characterize a climate but also to monitor its evolution. In practice, however, observation series often contain gaps.

The present study is based on monthly rainfall records, which have no significant gaps, unlike the temperature, wind and humidity records, in which major discontinuities are observed.

To ensure good coverage of the region, data were collected from 123 rainfall stations, with records as long as possible. Years and stations with more than 10% missing values were removed (Meddi et al. 2014). In the end, 42 stations were selected, covering 1970 to 2011 inclusive. Figure 2 shows the percentage of gaps for each year.
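The 10% screening rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually applied: it assumes the monthly records are stored in a NumPy array of shape (years, 12, stations) with NaN marking a missing month, and that years are screened before stations. The function name `screen_gaps` is hypothetical.

```python
import numpy as np


def screen_gaps(rain, max_gap=0.10):
    """Return boolean masks of the years and stations to keep.

    rain: array of shape (n_years, 12, n_stations); NaN marks a missing month.
    Assumption: years are screened first, then stations over the kept years.
    """
    missing = np.isnan(rain)
    # Share of missing station-months in each year, over all stations.
    year_gap = missing.mean(axis=(1, 2))
    keep_years = year_gap <= max_gap
    # Share of missing months for each station, over the retained years only.
    station_gap = missing[keep_years].mean(axis=(0, 1))
    keep_stations = station_gap <= max_gap
    return keep_years, keep_stations
```

The retained series is then `rain[keep_years][:, :, keep_stations]`.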

Fig. 2 Percentage of gaps per year

Figure 3 shows the spatial distribution of the stations, distinguishing coastal, mountain, hill and plain stations.

Fig. 3 Selected weather stations

Statistical analysis

Outliers

Even an excellent analysis of erroneous data has no value. Any analysis involves two steps: data collection and data processing. The first step is crucial for obtaining reliable results from the second. Whatever their source, rainfall series often contain anomalies, so a critical review is almost always necessary before the available data can be used (Laborde 2013). Data verification can highlight values that appear singular compared with the rest of the series. Such a singularity may reflect a transcription error introduced while handling the data (an outlier) or simply a rare local weather event (exceptional data). An outlier can be defined as an observation that deviates abnormally or haphazardly from the rest of the data (Noughabi 2011).

For each month, groups of neighboring stations are arranged in matrices whose columns hold the monthly values of each station. Each pair of columns is then subjected to the outlier detection procedure of the Hydrolab Excel macro (Hydrolab 2010), which uses the cumulative residual method.

The basic idea is to study the cumulative sum of the residuals \( {\varepsilon}_i \) between the observed values and the values estimated by the regression line (Yue Rong 2011). These cumulative sums are calculated as follows (Yue Rong 2011):

$$ {Z}_i=\sum \limits_{j=1}^i{\varepsilon}_j $$
(1)

where the \( {Z}_i \) are random variables with zero mean and standard deviation \( {\sigma}_{Zi} \). For each \( {Z}_i \), a confidence interval at a given significance level \( \alpha \) can be built as follows:

$$ \left[-{u}_{1-\frac{\alpha }{2}}{\sigma}_{Zi},{u}_{1-\frac{\alpha }{2}}{\sigma}_{Zi}\right] $$
(2)

where \( {u}_{1-\frac{\alpha }{2}} \) is the reduced Gaussian variable (standard normal quantile) for a two-tailed test and \( {\sigma}_{Zi} \) is calculated as:

$$ {\sigma}_{Zi}={\sigma}_{\varepsilon}\sqrt{\frac{i\left(n-i\right)}{n-1}} $$
(3)

where n is the number of observations and \( {\sigma}_{\varepsilon} \) is given by:

$$ {\sigma}_{\varepsilon }={\sigma}_{\mathrm{y}}\sqrt{1-{r}^2} $$
(4)

where \( {\sigma}_{\mathrm{y}} \) is the standard deviation of the tested variable and r is the correlation coefficient.

On a graph (Fig. 4), the cumulative residuals \( {Z}_i \) are plotted on the y-axis together with the upper and lower bounds of the confidence intervals, which form an ellipse. A cumulative residual lying outside the ellipse flags a suspected outlier. The suspect value is then compared with the other columns of the matrix, either to reject it (outlier) or to validate it (exceptional value).
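Equations (1) to (4) and the ellipse test can be sketched in Python. This is a minimal reimplementation under stated assumptions, not the Hydrolab macro: x is the reference (neighboring) station, y the tested station, the regression is an ordinary least-squares line, and \( {\sigma}_{\mathrm{y}} \) is taken as the sample standard deviation. The name `cumulative_residual_test` is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def cumulative_residual_test(x, y, alpha=0.05):
    """Cumulative residual (ellipse) test of station y against reference station x.

    Returns the cumulative residuals Z_i, the half-width of the confidence
    interval at each i, and a boolean mask of the Z_i lying outside the ellipse.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    slope, intercept = np.polyfit(x, y, 1)          # least-squares line y ~ x
    eps = y - (slope * x + intercept)                # residuals epsilon_i
    z = np.cumsum(eps)                               # Eq. (1)
    r = np.corrcoef(x, y)[0, 1]
    # Eq. (4); assumption: sample standard deviation (ddof=1) for sigma_y.
    # max() guards against 1 - r**2 dipping below zero through rounding.
    sigma_eps = np.std(y, ddof=1) * np.sqrt(max(0.0, 1.0 - r**2))
    i = np.arange(1, n + 1)
    sigma_z = sigma_eps * np.sqrt(i * (n - i) / (n - 1))   # Eq. (3)
    u = NormalDist().inv_cdf(1 - alpha / 2)          # two-tailed Gauss quantile
    bound = u * sigma_z                               # Eq. (2) half-width
    # Z_n and its bound are both zero up to rounding, hence a small tolerance.
    tol = 1e-9 * max(1.0, float(np.abs(y).max()))
    outside = np.abs(z) > bound + tol
    return z, bound, outside
```

A flagged index only marks a suspect value; as described above, it still has to be checked against the other stations before being rejected or kept as exceptional.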

Fig. 4 Outlier detection example

Filling gaps

Environmental time series often contain missing data, and statistical processing requires filling these gaps by estimating the missing values (Lo Presti et al. 2010). Once these estimates are inserted, the analysis can proceed almost as if no values were missing.

In an Excel spreadsheet, missing values are marked “lac”. They are then estimated with the Hydrolab software (Hydrolab 2010), which uses principal component analysis (PCA) to estimate missing data. According to Laborde (2013), seven to eight iterations are sufficient for the process to stabilize. At this stage, the data are complete and ready for analysis.
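Iterative PCA gap filling of this kind can be sketched as below. It is a generic version, not Hydrolab's actual algorithm: gaps start at the station means and are then repeatedly replaced by a low-rank reconstruction from the leading principal components, while observed values stay untouched. The number of components (`n_components`) is an assumption; the default of eight iterations follows the seven to eight iterations cited above. The name `pca_impute` is hypothetical.

```python
import numpy as np


def pca_impute(data, n_components=2, n_iter=8):
    """Fill NaN gaps by iterative PCA (low-rank SVD) reconstruction.

    data: (n_months, n_stations) array with NaN for missing values.
    Observed values are never modified; only the gaps are updated.
    """
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    # Initial guess: the mean of the observed values of each station.
    col_mean = np.nanmean(x, axis=0)
    x[missing] = col_mean[np.nonzero(missing)[1]]
    k = n_components
    for _ in range(n_iter):
        mu = x.mean(axis=0)
        u, s, vt = np.linalg.svd(x - mu, full_matrices=False)
        # Rank-k reconstruction from the leading principal components.
        recon = (u[:, :k] * s[:k]) @ vt[:k] + mu
        x[missing] = recon[missing]
    return x
```

The choice of `n_components` matters: too few components smooth out genuine local differences between stations, too many let the gaps simply reproduce the initial means.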

Data analysis (or exploratory multivariate techniques)

Multivariate exploratory data analysis methods, also known as principal component methods, are dimensionality reduction techniques often used to summarize data in which individuals are described by continuous and/or categorical variables (Josse and Husson 2012).

Data analysis mainly covers two sets of techniques. The first is factor analysis, in which variables that are correlated with each other are combined (Gordon et al. 2004). The second is “automatic classification”, which partitions the data into groups such that similar objects tend to fall into the same group and relatively distinct objects into different groups (Yang 2012). The two techniques can be used alone or jointly.

Principal component analysis was developed by Hotelling (1933), although it can be traced back to Pearson (1901) (Ambapour 2003). It is a multidimensional statistical approach that allows a comprehensive study of the data. It helps to summarize, represent, organize and visualize the data and to define the relationships that may exist among the variables. Among classification methods, hierarchical clustering (hierarchical cluster analysis, HCA) is the best known and most widely used technique. It is particularly useful for discovering groups and identifying interesting patterns (Halkidi et al. 2001), and it is an effective statistical tool for grouping stations into homogeneous climatic regions (De Gaetano and Shulman 1990; Ahmed 1997; De Gaetano 2001). After computing the pairwise distances between individuals, one must choose a criterion to discriminate among trees with different topologies (Pardi and Gascuel 2012).

Results and discussion

Graphical study

The annual rainfall graph (Fig. 5) calls for the following comments:

  • Annual rainfall in the Macta watershed varies between 71.5 and 1033.1 mm, indicating high spatial and temporal variability.

  • The years 1996 and 1997 stand out: most stations recorded an exceptional year (1996) followed by a drought year (1997).

  • From these two years onward, most of the curves converge. This suggested testing the hypothesis that rainfall patterns are becoming unified at the monthly scale.

Fig. 5 Annual rainfall at the 42 stations of the Macta watershed (1970–2011)

Figure 6 shows the annual rainfall summed over the 42 stations. It shows a clear downward trend, with a slope of −31.228. The year 2009 has the highest total (21,165.7 mm), while 1982 has the lowest (7751.6 mm). In recent years, droughts have become longer in duration and closer to each other in space (Stockton 1988; Mutin 2011).
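A trend slope of this kind can be obtained by ordinary least squares. The sketch assumes, since the text does not say how the trend line was fitted, that it is an ordinary least-squares fit of the yearly 42-station totals against the calendar year; the slope is then in mm per year. `linear_trend` is a hypothetical helper.

```python
import numpy as np


def linear_trend(years, totals):
    """Ordinary least-squares line totals ~ slope * year + intercept.

    Assumption: plain OLS, as in a spreadsheet linear trendline.
    Returns (slope, intercept); the slope is in mm/year if totals are in mm.
    """
    t = np.asarray(years, dtype=float)
    y = np.asarray(totals, dtype=float)
    t_c = t - t.mean()                     # centred years
    slope = np.sum(t_c * (y - y.mean())) / np.sum(t_c ** 2)
    intercept = y.mean() - slope * t.mean()
    return slope, intercept
```

Centring the years before dividing keeps the computation numerically stable when the x-values are large calendar years.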

Fig. 6 Total annual rainfall of the 42 stations

Seasonal rainfall is shown in Fig. 7. Winter, the wet season, is the most affected by the rainfall decline, with the steepest trend slope (−38.417). Summer, in contrast, shows an increasing trend, driven by exceptional rains that occur randomly; these rains also affect the variability between stations.
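Seasonal totals such as those in Fig. 7 can be derived from the monthly records. The season definitions below (December to February for winter, and so on) are an assumption, since the text does not state them. December is counted in the winter of the following year, so the first year of the record has no complete winter. `seasonal_totals` is a hypothetical helper.

```python
import numpy as np

# Hypothetical season definitions (months numbered 1-12); not given in the text.
SEASONS = {"spring": (3, 4, 5), "summer": (6, 7, 8), "autumn": (9, 10, 11)}


def seasonal_totals(monthly):
    """monthly: (n_years, 12) rainfall, one row per calendar year, columns Jan..Dec.

    Returns a dict of season -> array of yearly totals. Winter of year k is
    December of year k-1 plus January-February of year k, so the first year
    has no complete winter and is set to NaN.
    """
    m = np.asarray(monthly, dtype=float)
    totals = {name: m[:, [mo - 1 for mo in months]].sum(axis=1)
              for name, months in SEASONS.items()}
    winter = np.full(m.shape[0], np.nan)
    winter[1:] = m[:-1, 11] + m[1:, 0] + m[1:, 1]
    totals["winter"] = winter
    return totals
```

Summing the station columns first gives the 42-station series, and each seasonal series can then be given its own trend line.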

Fig. 7 Total seasonal rainfall of the 42 stations

To quantify the rainfall reduction over time, the study period is divided into four sub-periods. The yearly rainfall totals are averaged over each sub-period (Table 1), and the percentage reduction relative to the first sub-period (1970–1979) is computed. The averages range between 15,384 and 11,466.5 mm. The 1980–1989 and 1990–1999 sub-periods have the lowest averages of the study period, with a reduction of about 25%. The 2000–2011 sub-period shows an increase compared with the two previous ones but remains 13.2% below the first. Meddi et al. (2010) also observed a decrease of at least 20% in total annual rainfall at five stations in the Macta and Tafna catchments. This can directly affect agriculture. Moreover, several studies indicate that the water demand of many crops will increase in the future due to global warming (IPCC 2008; Brauch 2007; Maracchi et al. 2005; Agoumi 2003; Fischer et al. 2005).
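The sub-period averages and reductions reduce to simple arithmetic. For example, with the averages reported above, the reduction of the lowest sub-period is (15,384 − 11,466.5)/15,384 ≈ 25.5%, consistent with the "about 25%" stated. The sketch assumes that the yearly 42-station totals are available as an array aligned with the calendar years; `subperiod_reduction` is a hypothetical helper.

```python
import numpy as np

SUB_PERIODS = [(1970, 1979), (1980, 1989), (1990, 1999), (2000, 2011)]


def subperiod_reduction(years, totals, periods=SUB_PERIODS):
    """Mean yearly total per sub-period and its percentage reduction
    relative to the first sub-period (positive values mean a decrease)."""
    years = np.asarray(years)
    totals = np.asarray(totals, dtype=float)
    means = np.array([totals[(years >= a) & (years <= b)].mean() for a, b in periods])
    reductions = 100.0 * (means[0] - means) / means[0]
    return means, reductions
```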

Table 1 Total annual rainfall average

The coefficients of variation fluctuate between 17 and 28%, indicating increased temporal variability of annual totals.
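The coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. The sketch assumes the sample standard deviation (ddof=1), which the text does not specify. Applied to the yearly totals of one sub-period, it gives the value for that sub-period.

```python
import numpy as np


def coefficient_of_variation(values):
    """Coefficient of variation in percent (assumption: sample std, ddof=1)."""
    v = np.asarray(values, dtype=float)
    return 100.0 * v.std(ddof=1) / v.mean()
```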

To assess the spatial variability of monthly rainfall patterns, the standard deviation is calculated for each station and each sub-period. Figure 8 shows the results, which can be interpreted as follows:

  • The standard deviation values vary more widely in the first and second sub-periods and decrease markedly in the last two.

  • As they decrease, the stations' standard deviations also become closer to one another: only seven stations (16.7%) have a standard deviation below 30 in the first sub-period, against 28 (66.7%) in the last.
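Each station's standard deviation per sub-period can be computed as sketched below. The text does not state which values enter the standard deviation. This sketch assumes it is taken over the station's 12 mean monthly values (its monthly regime) within the sub-period. With 42 stations, 7 stations below 30 correspond to 7/42 ≈ 16.7% and 28 stations to 28/42 ≈ 66.7%. `regime_std` is a hypothetical helper.

```python
import numpy as np


def regime_std(monthly, years, period):
    """Standard deviation of each station's mean monthly regime in a sub-period.

    monthly: (n_years, 12, n_stations) rainfall; years: calendar year of each row.
    Assumption: the std is taken over the 12 mean monthly values (ddof=1).
    """
    years = np.asarray(years)
    start, end = period
    block = np.asarray(monthly, dtype=float)[(years >= start) & (years <= end)]
    regime = block.mean(axis=0)          # (12, n_stations): mean rainfall of each month
    return regime.std(axis=0, ddof=1)    # one value per station
```

The share of stations below the threshold is then `100 * np.mean(regime_std(monthly, years, period) < 30)`.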

Fig. 8 Monthly rainfall standard deviation for different stations (for each sub-period)

As for seasonal variability, most stations show a decrease, accompanied by a convergence of the standard deviation values. The exception is summer, where variability increases under the influence of extreme rains (Fig. 9).

Fig. 9 Seasonal rainfall standard deviation for different stations (for each sub-period)

Formation of homogeneous groups

The automatic classification was performed with the STATISTICA software (StatSoft France 2003).

The aim is to form groups of stations that share the same rainfall pattern (monthly rainfall), not the same annual rainfall. Two stations with the same annual rainfall may belong to different groups if their monthly precipitation distributions differ.

To confirm the interpretations drawn from the graphical study and to show clearly the influence of climate change on rainfall patterns, a matrix with 42 columns (one per station) is formed for each sub-period. Hierarchical clustering of the monthly rainfall of the four sub-periods yields four dendrograms (Figs. 10, 11, 12, 13), which facilitate group formation and interpretation.

Fig. 10 Dendrogram of period 1970–1979

Fig. 11 Dendrogram of period 1980–1989

Fig. 12 Dendrogram of period 1990–1999

Fig. 13 Dendrogram of period 2000–2011

In Figs. 10, 11, 12 and 13, the x-axis shows the station codes and the y-axis the aggregation distances between stations.

Among the aggregation criteria used in hierarchical clustering is the minimum jump criterion (single linkage). It takes as the distance between two classes the smallest of the distances between their members, that is, the shortest jump from one class to the other (Ambapour 2003). This distance is considered critical.

The dendrograms in Figs. 10, 11, 12 and 13 are used to form the groups, with the Euclidean distance as the similarity measure. A horizontal line (the critical aggregation distance) can be drawn across the dendrogram at any level of the y-axis (the distance measure); the vertical cluster lines it intersects define the clusters, whose members are joined at an aggregation distance no greater than that level. The critical aggregation distance is set to 150 for all four sub-periods: stations aggregated at a distance greater than 150 cannot belong to the same group.
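The grouping step can be approximated with SciPy's hierarchical clustering as a rough stand-in for the STATISTICA processing. The assumptions are that each column of the sub-period matrix is one station, the linkage is single linkage (the minimum jump criterion described above) with Euclidean distance, and the tree is cut at the critical aggregation distance of 150. Other linkage settings would give different groups. `group_stations` and `largest_group` are hypothetical helpers.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def group_stations(matrix, critical_distance=150.0, method="single"):
    """Group stations by hierarchical clustering and cut the tree at a fixed height.

    matrix: (n_months, n_stations) rainfall for one sub-period, one column per station.
    Assumption: single linkage with Euclidean distance.
    Returns one integer group label per station (labels start at 1).
    """
    stations = np.asarray(matrix, dtype=float).T   # one row per station
    tree = linkage(stations, method=method, metric="euclidean")
    # Stations merged at an aggregation distance <= critical_distance share a label.
    return fcluster(tree, t=critical_distance, criterion="distance")


def largest_group(labels):
    """Number of stations in the largest group."""
    return int(np.bincount(labels).max())
```

Running `largest_group` on the labels of each sub-period gives the kind of maximum group sizes compared in the maps that follow.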

The results are mapped in Figs. 14, 15, 16 and 17, where stations in the same group share the same color.

Fig. 14 Station groups with an aggregation distance ≤ 150 (period 1970–1979)

Fig. 15 Station groups with an aggregation distance ≤ 150 (period 1980–1989)

Fig. 16 Station groups with an aggregation distance ≤ 150 (period 1990–1999)

Fig. 17 Station groups with an aggregation distance ≤ 150 (period 2000–2011)

Figures 14, 15, 16 and 17 show that the number of stations per group tends to grow over time: the largest group (aggregation distance ≤ 150) contains 4, 4, 30 and 15 stations for the sub-periods 1970–1979, 1980–1989, 1990–1999 and 2000–2011, respectively. These results indicate that the monthly rainfall patterns of the stations tend to become unified. This unifying trend is accompanied by runoff declines of up to 20% (IPCC 2008; Mostefa-Kara 2008).

During the first two sub-periods, the similarity of monthly rainfall patterns is related to geographical position (neighboring stations may belong to the same group). In the last two sub-periods, by contrast, the similarity is no longer linked to proximity but rather follows a south-to-north trend.

The same observation emerges from interpolating the stations' standard deviations by kriging. Figure 18 shows the resulting iso-standard-deviation lines, which run west–east in every sub-period except the last, where they run south–north.
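Ordinary kriging, used here to interpolate the standard deviations, can be sketched with NumPy. The exponential variogram and its sill and range values are placeholders, since the variogram model behind Fig. 18 is not reported; in practice they are fitted to the empirical variogram of the station values. `ordinary_kriging` and `exponential_variogram` are hypothetical helpers.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_=10.0):
    """Placeholder exponential variogram (no nugget); fit sill and range to real data."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / range_))


def ordinary_kriging(xy, values, targets, variogram=exponential_variogram):
    """Ordinary kriging estimates at target points from station values."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    targets = np.atleast_2d(np.asarray(targets, dtype=float))
    n = len(values)
    # Kriging matrix: station-to-station semivariances, bordered by a row and
    # column of ones for the Lagrange multiplier (weights must sum to 1).
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(dist)
    a[n, n] = 0.0
    estimates = np.empty(len(targets))
    for k, t in enumerate(targets):
        b = np.ones(n + 1)
        b[:n] = variogram(np.linalg.norm(xy - t, axis=1))
        weights = np.linalg.solve(a, b)[:n]
        estimates[k] = weights @ values
    return estimates
```

Evaluating the estimates on a regular grid and contouring them gives iso-lines of the kind shown in Fig. 18. Without a nugget term the interpolator is exact, so it reproduces each station value at the station itself.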

Fig. 18 Standard deviation interpolation

Şen (2013) reported similar results, showing that climate tends to shift from south to north. Such changes have implications for hydrological extremes such as droughts and floods, and they can also affect growing seasons and crop yields (Sultan et al. 2014).

Conclusion

This study highlighted the impact of climate change on the spatio-temporal variability of rainfall. It contributes to scientific knowledge by providing factual information on the hydrology of the study region. The main results are a decrease in rainfall and in its spatial variability. Data verification made it possible to detect and reject outliers, and gap filling produced a database of 42 years of monthly rainfall records (1970–2011) for the Macta watershed in northwestern Algeria.

The average total annual rainfall declined from 15,384 to 13,352.5 mm, a loss of about 13.2%. Dry periods have become longer, interrupted by a few exceptional years.

Dividing the study period into four sub-periods made it possible to track the evolution of rainfall. Comparison of the coefficients of variation showed increasing temporal variability: the coefficient of variation was 17% in 1970–1979 and reached 27.69% in 2000–2011. In contrast, the spatial variability of monthly rainfall decreased: only 17% of the stations had a standard deviation below 30 in 1970–1979, while more than 66% fell below this value in 2000–2011.

Hierarchical clustering made it possible to group the stations by similarity of their monthly rainfall patterns. The largest groups for the periods 1970–1979, 1980–1989, 1990–1999 and 2000–2011 contain 4, 4, 30 and 15 stations, respectively. This may imply a reduction in spatial variability and therefore a unification of monthly rainfall patterns. The maps of the groups show that this unification follows a south-to-north trend.

This evolution of the climate (temperature, precipitation, etc.) affects the global hydrological cycle, all the more so as it will combine with changes in the watershed, particularly those due to human activities (Jacques et al. 2006).