1 Introduction

Nowadays, local water management depends strongly on the availability of precipitation databases. This importance increases mainly where water resources are facing unprecedented changes related to the precipitation distribution and triggered extremes (Abahous et al. 2021; Satgé et al. 2020; Toté et al. 2015). Generally, accurate temporal and spatial variability of precipitation information depends on the availability of high coverage of survey stations (WMO 2018). However, in underdeveloped and developing countries the number of meteorological stations is limited and sparsely distributed (Ayoub et al. 2020). In Morocco, the number of weather stations able to contribute to the World Meteorological Organization (WMO) network is considered better compared to other parts of Northern Africa (Born et al. 2008; Filahi et al. 2015). Unfortunately, most of the available stations are mainly located in the northern part of the country, where agriculture activity is expected to be the major sector contributing to the gross domestic product (Zereini and Hötzl 2009). As a result, a large part of the southern Atlas Mountains and sub-Saharan basins lack sufficient weather stations to capture precipitation variability adequately.

The Drâa River basin (DRB), located in the south of Morocco, is a typical Moroccan basin poorly equipped with weather stations. The only available station included in the SYNOP (Surface Synoptic Observations) weather network is the Ouarzazate station (30° 56′ N, 6° 53′ W, at an altitude of 1131 m). Conventionally, the in-situ hydrologic stations are used to measure total seasonal precipitation, but their records are not reported to global networks such as that of the WMO (Zereini and Hötzl 2008). Indeed, these manually operated stations, known as rain gauge stations, combine the monitoring of river discharge with measurements of climatic variables, including liquid precipitation (rainfall), in many sparsely inhabited areas across the DRB (ABHDON 2018). Unfortunately, according to WMO standards, the Upper and the Middle Drâa basins need at least 44 and 52 additional rain gauge stations, respectively, to meet the minimum station density required for mountainous regions in temperate climates (Liang and Wang 2020). Furthermore, the precipitation amount measured by an in-situ station is only representative of the several square kilometers surrounding it, which leaves a large part of the DRB unobserved.

Over the past few decades, precipitation datasets (P-datasets) with full spatial coverage have emerged as a crucial resource for various applications (An et al. 2020). These datasets, which incorporate data from different secondary sources, reflect significant advances in earth system modeling and provide data at high temporal and spatial resolution (Yeditha et al. 2020). Being easily accessible, these products have the potential to support a range of hydrological, hydrogeological, and weather forecasting applications, ultimately facilitating effective water sustainability management (Aghakouchak et al. 2011). In the DRB, the number of studies integrating different satellite products into various scientific applications (e.g., water resources, hydrologic modeling, climate change) far outweighs the number of studies examining the accuracy, the uncertainty and hence the validation of these P-datasets. However, these P-datasets should be incorporated only if the estimated rainfall closely matches the spatial and temporal variability observed through rain gauges (Bouizrou et al. 2023; Rivera et al. 2018). Moreover, the reliability of the indirect measurement given by a P-dataset cannot be inferred from its performance in another region (Satgé et al. 2020).

Assessing P-datasets over the DRB has become more necessary than in the past. This large area is considered one of the ten driest basins in the world, with a quasi-absence of surface water (Hssaisoune et al. 2020; Ouysse et al. 2010; Revenga et al. 1998). The majority of the population is rural and highly dependent on dryland farming, which dominates the whole Drâa basin territory (Abbaci and Bouchaou 2021; Messouli 2014). However, the lack of historical precipitation measurements and the impracticality of making extensive rainfall measurements from available stations hamper long-term planning for minor and major dams, considering that floods provide most of the water volume for many reservoirs across the DRB (Schulz et al. 2008). Moreover, this typical arid and semi-arid region is ecologically fragile and highly sensitive to climate change, and several studies expect decreasing annual precipitation and more frequent extreme weather events (Diekkrüger et al. 2012; Johannsen et al. 2016), making many economic activities vulnerable to precipitation variability (Johannsen et al. 2016; Messouli 2014). Under these conditions, the quantitative estimation of precipitation is an important component of water management.

This paper addresses one of the significant obstacles facing surface and groundwater management: the lack of regional precipitation information in the Upper and the Middle Drâa basins. The assessment considers the performance of five P-datasets that incorporate different inputs, taking into account temporal, intensity, and elevation variations. The outcomes of this study also provide an overview of the usability of similar P-datasets used in the surrounding basins (Bouizrou et al. 2023b; Milewski et al. 2015; Ouatiki et al. 2017), and represent the first attempt to include a reanalysis product.

2 Materials and methods

2.1 Study area

The investigated research area covers the Upper and Middle Drâa basins, which are bounded between 29° N and 32° N and between 5° W and 8° W. These units represent about a quarter of the whole Drâa basin (DRB), with surface areas of 15,200 km² and 14,000 km², respectively. The landforms vary from the southern slope of the High Atlas Mountains, where the summit of Jebel M’Goun reaches 4071 m, to the Saharan lowlands in the south, where the average elevation is around 500 m at Iriki Lake (Fig. 1). This elevation gradient governs the distribution of vegetation, the climate, the intermittent or permanent regime of the tributaries and their flow rate (Busche 2008). The climate in the DRB is highly variable, depending on the elevation gradient and aridity, which strongly affect precipitation quantity, duration and intensity (Cappy 2007). The High Atlas Mountains act as a NE–SW (uplifting) moisture barrier and experience a precipitation regime influenced primarily by the Atlantic Oscillation and the interactions between tropical and extra-tropical systems (Knippertz et al. 2003). The humid season extends from late summer to spring, with maximum precipitation events in spring and early fall, but the transition from humid to arid and hyper-arid conditions is evident (Born et al. 2008). Consequently, the seasonal cycle is concentrated, such that 40% of the total annual precipitation at Ouarzazate station was recorded from September to November, in a single rainy period during late summer/early autumn (Knippertz et al. 2003). Precipitation at the High Atlas peaks can reach 900 mm/year, giving rise to an exceptional permanent flow regime of the M'Goun and Dades Rivers, which feed the Mansour Eddahbi reservoir (MED), considered the decisive hydrographic feature of the Upper Drâa basin. Moreover, precipitation decreases rapidly at the foothills of the High Atlas Mountains, and conditions become arid to hyper-arid downstream: precipitation reaches only 107 mm/year in Ouarzazate and decreases to 61 mm/year in Zagora and 30 mm/year in the Saharan forelands.

Fig. 1 Geographic presentation of DRB with the location of 20 rain gauge stations along the altitude gradient

2.2 Used data

2.2.1 Ground-based precipitation observation sites

The monthly rainfall data were obtained from the Drâa-Oued Noun Hydraulic Basin Agency (ABHDON), the IMPETUS project and the National Moroccan Weather Service (Direction de la Météorologie Nationale). Since 1962, the ABHDON agency has maintained more than 37 manually operated stations situated in open areas and close to the discharge gauges. Consequently, a large number of rain gauge stations are installed close to the rivers and tributaries, spread from the highest altitudes of the High Atlas Mountains in the north along a gradient of altitude and aridity (Fig. 1). Due to technical problems, several individual rainfall records are marked by gaps or have stopped altogether. Consequently, only 19 rain gauges, mostly located in the Upper Drâa basin, were selected based on the completeness of their records, with temporal coverage varying from a short period (6 years) to a long period (53 years). To obtain more representative data for the Middle Drâa basin, which has a lower gauge density, rainfall data from the Tazarine station, which belongs to the Maider basin in the eastern part of the Anti-Atlas Mountains and shares the same climatic context, were added (Fig. 1). Nonetheless, the density of rain gauges in the area is still limited, and only the Upper Drâa basin meets the minimum density standard recommended by the World Meteorological Organization for mountainous regions and challenging circumstances, which requires at least one gauge per 250–1000 km² (Liang and Wang 2020). Moreover, only one available meteorological station belonging to the Upper Drâa basin and accredited by the World Meteorological Organization (WMO) was included.

2.2.2 Satellite-based precipitation product (P-datasets) and selection criteria

Choosing appropriate precipitation datasets may have a large impact on model output accuracy (Gebregiorgis and Hossain 2013). For instance, confident drought monitoring depends on the availability of at least 30 years of historical data, whereas high-resolution satellite precipitation products enhance hydrological simulations and flood predictions (Peng et al. 2021; Yong et al. 2014). Due to the irregular distribution of rain gauge stations in the DRB, using a satellite-based product with low spatial resolution increases the likelihood of capturing multiple stations within the same pixel. In that case, different precipitation values recorded by nearby stations are compared with a single pixel value of the P-dataset. In a context where elevation influences the precipitation distribution, variability at smaller scales makes it difficult to define one station as representative of a given region. Additionally, the products must match the timing of the gauge records, so that the mismatch between the times of gauge and satellite records is resolved and the maximum number of rain gauge stations can be incorporated. In this regard, choosing satellite-based products with high spatio-temporal resolution is the best approach to establish the direct usability of a P-dataset.
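The pixel-sharing effect described above can be illustrated with a minimal sketch. It assumes a regular latitude/longitude grid aligned to the origin, and it uses hypothetical station coordinates and resolutions; the function names `cell_index` and `stations_sharing_cells` are illustrative only.

```python
import math
from collections import Counter

def cell_index(lat, lon, res):
    """Index of the grid cell containing a point (assumption: grid aligned to 0 degrees)."""
    return (math.floor(lat / res), math.floor(lon / res))

def stations_sharing_cells(stations, res):
    """Count stations that fall in a cell occupied by at least one other station."""
    counts = Counter(cell_index(lat, lon, res) for lat, lon in stations.values())
    return sum(n for n in counts.values() if n > 1)

# Hypothetical stations (decimal degrees); not the gauges used in the study
stations = {
    "A": (30.93, -6.88),
    "B": (30.96, -6.91),
    "C": (31.42, -6.23),
}

coarse = stations_sharing_cells(stations, 0.25)  # coarse grid: A and B share a cell
fine = stations_sharing_cells(stations, 0.05)    # fine grid: each station has its own cell
print(coarse, fine)
```

With a coarse grid, two nearby gauges at different elevations would be compared against the same pixel value, which is the situation the finer products are meant to avoid.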

Five gridded products with different data sources, spatiotemporal resolutions, latencies, and estimation methods were evaluated (Table 1). Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS) was established by the United States Geological Survey in collaboration with the University of California to support the United States Agency for International Development Famine Early Warning Systems Network (FEWS NET). CHIRPS combines three types of information, namely global climatology, satellite estimates and in situ observations, to represent sparsely gauged locations. The CHIRPS dataset uses the Tropical Rainfall Measuring Mission Multi-Satellite Precipitation Analysis version 7 (TMPA 3B42 v7) to calibrate global Cold Cloud Duration (CCD) estimates and the Global Precipitation Climatology Centre (GPCC) data to perform the validation (Funk et al. 2015). Detailed information regarding CHIRPS and the version 2.0 dataset used in this study is available from the data portal (https://www.Chc.ucsb.edu/data/chirps).

Table 1 Summary of the five P-datasets used with different input data. In the input source column, S, R and G refer to satellite, reanalysis, and gauge information, respectively

Multi-Source Weighted-Ensemble Precipitation (MSWEP) merges gauge observations, satellite data, and estimates from atmospheric models to provide precipitation estimates with high spatio-temporal resolution worldwide. MSWEP combines the relative merits of satellite infrared and microwave precipitation estimates from five satellite products and corrects systematic terrestrial biases using a dense network of rain gauges, river discharge observations and hydrological modeling (Beck et al. 2019a). The present study used the latest version of the dataset, MSWEP V2.8, which is available online (https://www.gloh2o.org).

Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System-Climate Data Record (PERSIANN-CCS-CDR) was generated by the Center for Hydrometeorology and Remote Sensing at the University of California to capture extreme events at high temporal resolution. Compared to other PERSIANN versions (CCS and CDR), PERSIANN-CCS-CDR was designed to address the need for a long-term dataset with fine spatio-temporal resolution (Sadeghi et al. 2021). To yield a primary dataset, the PERSIANN model takes Gridded Satellite (GridSat-B1) infrared brightness temperature data from Geostationary Earth Orbiting (GEO) satellites as input and uses the monthly Global Precipitation Climatology Project (GPCP) version 2.2 product to correct the bias.

The fifth-generation European Centre for Medium-Range Weather Forecasts Reanalysis on global land surface (ERA5-Land) is the latest reanalysis offering from the European Centre for Medium-Range Weather Forecasts (ECMWF). ERA5-Land uses the Scheme for Surface Exchanges over Land incorporating land surface hydrology (H-TESSEL) to generate detailed information on surface and subsurface processes at a highly localized level (Muñoz-Sabater et al. 2021). ERA5-Land contains a series of improvements compared to ERA5 and the older ERA-Interim, making it more accurate for all types of land applications and precipitation assimilation. The monthly averaged data were converted to total precipitation (mm/month) following the processing instructions given in the Copernicus hydrological table for accumulated variables, made available online (https://cds.climate.copernicus.eu/cdsapp#!/home).
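The conversion of monthly averaged data to monthly totals can be sketched as follows. This is a minimal illustration that assumes the monthly averaged total precipitation is expressed as a mean daily accumulation in metres per day, which should be checked against the product documentation; the function name is hypothetical and the input value is invented.

```python
import calendar

def monthly_mean_to_total_mm(mean_m_per_day, year, month):
    """Convert a monthly-mean daily accumulation (m/day) to a monthly total (mm/month).

    Assumption: the input is a mean daily accumulation in metres per day.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return mean_m_per_day * 1000.0 * days_in_month  # m -> mm, then day -> month

# Invented example: 0.0005 m/day averaged over April 2002 (30 days)
total_mm = monthly_mean_to_total_mm(0.0005, 2002, 4)
print(round(total_mm, 2))
```

Using the actual number of days in each month, rather than a fixed 30, keeps February and 31-day months consistent with the gauge totals.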

Tropical Applications of Meteorology using SATellite data and ground-based observations (TAMSAT) was produced by the Meteorology Department of the University of Reading (United Kingdom). TAMSAT V3.1 has officially replaced v3.0 since 1 July 2020, with additional changes enabling users to handle the data with greater ease. TAMSAT rainfall data have been widely used for seasonal early warning of drought and assessments of long-term agricultural risks over the African continent (Le Coz and Van De Giesen 2020). The rainfall estimates are based on Meteosat thermal infrared imagery, calibrated against ground-based rain gauge measurements (Maidment et al. 2017).

2.3 Used methods

2.3.1 Observations and P-datasets comparison

A point-to-pixel comparison was adopted to compare the values obtained from the stations with those of the different P-datasets. Indeed, the insufficient number of rain gauge stations, landform complexity and high spatial variability of precipitation limit the accurate interpolation of rainfall into gridded data (Liu et al. 2020). Due to the unavailability of daily records for most rain gauges and the limited availability of monthly data in past decades, a period of 81 months (≈ 7 years), ideally comprising the most recent period of records, was the only way to include the maximum number of gauge stations. The selected period varies based on the availability of a recent continuous 7-year period without gaps. According to WMO (2018), a period of five years is considered adequate for conducting this kind of study. Small rain amounts not measured by the standard instrument are interpreted as dry months (< 1 mm) in the observation data (WMO 1983).
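A point-to-pixel extraction can be sketched as below. This is an illustrative example only: it assumes a regular grid described by cell-centre coordinates and selects the nearest cell, and the grid, its resolution and its values are synthetic. Only the Ouarzazate coordinates come from the text.

```python
import numpy as np

def nearest_pixel_series(grid, lats, lons, st_lat, st_lon):
    """Return the time series of the grid cell whose centre is nearest to a station.

    grid: array of shape (time, lat, lon); lats, lons: 1-D cell-centre coordinates.
    """
    i = int(np.argmin(np.abs(np.asarray(lats) - st_lat)))
    j = int(np.argmin(np.abs(np.asarray(lons) - st_lon)))
    return np.asarray(grid)[:, i, j]

# Synthetic 0.5-degree grid over the study window, three time steps
lats = np.array([29.25, 29.75, 30.25, 30.75, 31.25, 31.75])
lons = np.array([-7.75, -7.25, -6.75, -6.25, -5.75, -5.25])
grid = np.arange(3 * 6 * 6, dtype=float).reshape(3, 6, 6)

# Ouarzazate station (30° 56' N, 6° 53' W) converted to decimal degrees
series = nearest_pixel_series(grid, lats, lons, 30 + 56 / 60, -(6 + 53 / 60))
print(series)
```

The extracted series can then be paired month by month with the gauge record for the same period.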

2.3.2 Accuracy of the P-datasets

The accuracy of the P-datasets was assessed quantitatively at monthly and seasonal temporal scales and at different elevations and precipitation grades using continuous statistical measures, categorical metrics and volumetric indices (Liu et al. 2020). Finally, an extreme precipitation event was analyzed, considering the spatial distribution of precipitation at the nearest rain gauges with available daily records. The continuous statistics included the Pearson correlation coefficient (r) as a measure of the degree of agreement between the observations and the gridded datasets. The tendency of a gridded product to over- or underestimate the rain gauge observations was calculated using the bias, and the root mean square error (RMSE) was used to quantify the average magnitude of the error between both values. The categorical indices were used to express the capacity of a product to detect rain events above a given threshold (Wilks 2011). These indices are based on the contingency table. The Probability of Detection (POD) is the fraction of observed precipitation events that were correctly detected (1 implies that all rainfall events are detected by the gridded data; 0 means that none of the events was correctly detected). The False Alarm Ratio (FAR) is the fraction of events identified by the gridded data but not confirmed by the rain gauge observations (0 implies perfect agreement of rainfall occurrences in both observation and gridded data; 1 means that none of the detected events is confirmed by the gauges). The Critical Success Index (CSI) is the fraction of correctly detected events among all events that were either observed or detected, so that it accounts for both missed events and false alarms (1 implies zero occurrences in both the false alarm and missed categories).
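The continuous and categorical measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the bias is taken here as the mean error (estimate minus observation), which may differ from the formulation in Table 2, a month counts as wet when it reaches the 1 mm threshold, and the sample values are invented.

```python
import numpy as np

def _ratio(num, den):
    """Return num/den, or NaN when the denominator is zero (score undefined)."""
    return float(num) / float(den) if den else float("nan")

def continuous_scores(obs, est):
    """Pearson r, bias (assumed to be the mean error) and RMSE."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    r = float(np.corrcoef(obs, est)[0, 1])
    bias = float(np.mean(est - obs))
    rmse = float(np.sqrt(np.mean((est - obs) ** 2)))
    return r, bias, rmse

def categorical_scores(obs, est, threshold=1.0):
    """POD, FAR and CSI from the contingency table at a given threshold (mm)."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    hits = np.sum((est >= threshold) & (obs >= threshold))
    misses = np.sum((est < threshold) & (obs >= threshold))
    false_alarms = np.sum((est >= threshold) & (obs < threshold))
    pod = _ratio(hits, hits + misses)
    far = _ratio(false_alarms, hits + false_alarms)
    csi = _ratio(hits, hits + misses + false_alarms)
    return pod, far, csi

# Invented monthly totals (mm) for one gauge and its pixel
obs = [0.0, 5.0, 12.0, 0.5, 30.0, 2.0]
est = [3.0, 8.0, 10.0, 0.2, 45.0, 0.4]

r, bias, rmse = continuous_scores(obs, est)
pod, far, csi = categorical_scores(obs, est)
print(pod, far, csi)
```

In this invented sample there are three hits, one miss and one false alarm, so the three categorical scores differ even though they share the same contingency table.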

The volumetric difference that may exist between two variables detected correctly or incorrectly cannot be identified with categorical measures (Aghakouchak and Mehran 2013). Consequently, the fraction of the precipitation volume detected correctly or incorrectly remains unknown. This margin of bias (overestimation or underestimation) can be important where precipitation is highly variable in terms of frequency, duration and intensity, and the DRB is a clear example, since a heavy storm can represent more than half of the average annual sum (Cappy 2007). Therefore, the categorical metrics have been extended to volumetric measures to quantify the error magnitude between the observed and estimated variables. The volumetric hit index (VHI) is the volume of rainfall correctly detected by the satellite product relative to the sum of that volume and the volume of observed rainfall missed by the product. The volumetric false alarm ratio (VFAR) is the volume of falsely detected rainfall relative to the total volume of rainfall detected by the same product. The volumetric critical success index (VCSI) reflects the overall volumetric performance. All these statistics and indices, with their perfect scores, are summarized in Table 2.
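The volumetric indices can be sketched in the same way. The example below follows the general formulation of volumetric indices, in which hit and false-alarm volumes are taken from the estimates and the missed volume from the observations. The inclusive threshold, the function name and the sample values are assumptions for illustration.

```python
import numpy as np

def volumetric_scores(obs, est, threshold=1.0):
    """VHI, VFAR and VCSI at a given threshold (mm).

    Assumption: a value counts as rain when it is >= threshold.
    """
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    hit = (est >= threshold) & (obs >= threshold)
    miss = (est < threshold) & (obs >= threshold)
    false_alarm = (est >= threshold) & (obs < threshold)

    vol_hit = est[hit].sum()            # estimated volume where both agree on rain
    vol_miss = obs[miss].sum()          # observed volume the product missed
    vol_false = est[false_alarm].sum()  # estimated volume not confirmed by the gauge

    def ratio(num, den):
        return float(num / den) if den > 0 else float("nan")

    vhi = ratio(vol_hit, vol_hit + vol_miss)
    vfar = ratio(vol_false, vol_hit + vol_false)
    vcsi = ratio(vol_hit, vol_hit + vol_miss + vol_false)
    return vhi, vfar, vcsi

# Invented monthly totals (mm) for one gauge and its pixel
obs = [0.0, 5.0, 12.0, 0.5, 30.0, 2.0]
est = [3.0, 8.0, 10.0, 0.2, 45.0, 0.4]

vhi, vfar, vcsi = volumetric_scores(obs, est)
print(round(vhi, 3), round(vfar, 3), round(vcsi, 3))
```

Because the missed month in this sample carries only 2 mm, the VHI stays high even though one event in four was missed, which is the kind of information the categorical POD alone cannot convey.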

Table 2 Continuous, categorical and volumetric indices used to quantify the performance of the five P-datasets

3 Results

3.1 Monthly assessment

The accuracy of water balance analysis depends strongly on the availability of monthly precipitation datasets (Liu et al. 2020). The monthly precipitation during the 81 months was calculated for each gauge station and the corresponding pixel of the five products. The basic statistics in Table 3 indicate that TAMSAT V3.1 yields the highest average monthly precipitation over the DRB (mean of 27.42 mm), followed by CHIRPS V2.0 (25.21 mm), PERSIANN-CCS-CDR (22.57 mm), MSWEP V2.8 (16.86 mm), and ERA5-Land (15.75 mm). Concerning the range and deviation of the monthly precipitation, MSWEP V2.8 and ERA5-Land exhibit the narrowest range (about one-third over the maximum of the gauge stations). ERA5-Land shows values close to the standard deviation of the observations (22.69 mm), and TAMSAT V3.1 attains the largest standard deviation (36.64 mm). A large disagreement between the observed and estimated precipitation values is evident for most P-datasets (low to moderate r), and the points therefore deviate from the 1:1 line (Fig. 2). The results of the monthly assessment presented in Table 4 indicate that ERA5-Land, followed by MSWEP V2.8, achieved considerably good performance in most continuous statistical measures (r, bias and RMSE), with overestimation rates of 46% and 63%, respectively. On the other hand, TAMSAT V3.1, PERSIANN-CCS-CDR and CHIRPS V2.0 show low performance, significantly overestimating rain gauge observations (82%, 83% and 85% of total months were overestimated, respectively). This discrepancy with the observation data is most pronounced for TAMSAT V3.1, which yields a high RMSE (38.81 mm/month). CHIRPS V2.0, PERSIANN-CCS-CDR and TAMSAT V3.1 show high FAR and VFAR scores (Table 4). Indeed, the volume of precipitation missed does not exceed 3% of the volume of precipitation detected above the given threshold for all P-datasets, yielding high VHI and VFAR scores.

Table 3 Statistics of monthly precipitation of the gauge stations and five P-datasets across the DRB for 81 months (unit: mm)
Fig. 2 Scatter plot of rain gauge observations versus the five P-datasets over the study area at different temporal scales (monthly, wet and dry season)

Table 4 Performance assessment of the five P-datasets at various time scales

3.2 Seasonal assessment

The seasonal assessments of the five P-datasets and rain gauge observations are shown in Fig. 2 and Table 4. All P-datasets overestimate precipitation during the wet and dry seasons, but the magnitude of the overestimation distinguishes CHIRPS V2.0 and TAMSAT V3.1, which yield high bias and RMSE scores in all seasons, from the other products. ERA5-Land and MSWEP V2.8 exhibit strong correlation during the wet and dry seasons, respectively, while PERSIANN-CCS-CDR shows a noticeable change from the wet to the dry season compared to the other P-datasets. All P-datasets were able to detect precipitation accurately in all seasons, yielding high POD scores. Only one wet season was missed by PERSIANN-CCS-CDR at the 1 mm threshold. Consistent with the categorical indices, a comparison of the seasonal volumetric indices with those of the monthly assessment in Table 4 indicates a certain independence from seasonal variation, with the same optimal scores recorded.

The seasonal precipitation data were stratified into winter (December to February), spring (March to May), summer (June to August) and autumn (September to November). All five P-datasets exhibit a seasonal pattern, with low RMSE scores during the summer (Fig. 3c). High correlation, low bias and low RMSE characterize ERA5-Land and MSWEP V2.8 in all seasons. CHIRPS V2.0 showed a good ability to detect precipitation throughout the year, but a large share of the precipitation events it detected (44% of rainfall events) was not confirmed by gauge observations, yielding a high FAR (Fig. 3d, e). ERA5-Land, PERSIANN-CCS-CDR and TAMSAT V3.1 show almost the same seasonal cycle in POD and FAR, with relatively optimal scores during winter. Indeed, the FAR increases from winter to autumn for most P-datasets except MSWEP V2.8. The maximum number of missed events was noticed for TAMSAT V3.1, with more than 88 missed events during spring (21% of the events that occurred). ERA5-Land, followed by PERSIANN-CCS-CDR, achieved high POD, CSI, VHI, and VCSI values and low FAR and VFAR values during winter. However, MSWEP V2.8 performs better than PERSIANN-CCS-CDR during the other seasons.
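The seasonal stratification above can be sketched as a simple grouping of monthly totals. The example is illustrative only: it assumes that December is attached to the winter of the following year, and the record values are invented.

```python
from collections import defaultdict

SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def seasonal_totals(records):
    """Sum monthly totals (mm) per (season year, season).

    records: iterable of (year, month, precipitation_mm).
    Assumption: December counts towards the winter of the following year.
    """
    totals = defaultdict(float)
    for year, month, mm in records:
        season_year = year + 1 if month == 12 else year
        totals[(season_year, SEASON_OF_MONTH[month])] += mm
    return dict(totals)

# Invented monthly totals
records = [
    (2001, 12, 10.0), (2002, 1, 5.0), (2002, 2, 0.0),
    (2002, 3, 40.0), (2002, 4, 20.0),
]
totals = seasonal_totals(records)
print(totals)
```

The same grouping can be applied to the gauge series and to each pixel series before computing the seasonal scores.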

Fig. 3 Seasonal assessment indices of the five P-datasets with reference to: winter (December–February), spring (March–May), summer (June–August) and autumn (September–November)

3.3 Assessment according to different intensities

Based on the previous results, it can be inferred that using lower thresholds yields better categorical and volumetric outcomes. This is because the number of correctly identified values surpasses the number of both false and missed events, even when using the given threshold of 1 mm. Higher thresholds can therefore be introduced to evaluate solely the higher quantiles of simulations relative to observations (Aghakouchak and Mehran 2013). Two precipitation grades, Q2 = 2.5 mm/month and Q3 = 12.5 mm/month (i.e., the 50% and 75% quantiles of the rain observations, respectively), were used as the corresponding thresholds. Accordingly, precipitation < 1 mm, 1 mm ≤ precipitation < 2.5 mm, 2.5 mm ≤ precipitation < 12.5 mm and precipitation ≥ 12.5 mm are interpreted as dry months, months with moderate precipitation, months with noticeable precipitation and months with important precipitation, respectively. Assigning such interpretations to different precipitation levels depends on the climatic features of the study area and hence on the precipitation amounts recorded by the rain gauge stations.
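The four precipitation grades can be expressed as a small classification routine. The thresholds are those given above, while the function names and the sample values are illustrative assumptions.

```python
from collections import Counter

def precipitation_grade(mm):
    """Classify a monthly total (mm) using the 1, 2.5 and 12.5 mm thresholds."""
    if mm < 1.0:
        return "dry"
    if mm < 2.5:
        return "moderate"
    if mm < 12.5:
        return "noticeable"
    return "important"

def grade_frequencies(monthly_totals):
    """Percentage of months falling in each grade."""
    counts = Counter(precipitation_grade(mm) for mm in monthly_totals)
    n = len(monthly_totals)
    return {grade: 100.0 * count / n for grade, count in counts.items()}

# Invented monthly totals (mm)
sample = [0.0, 0.4, 1.8, 3.0, 12.5, 40.0, 0.9, 7.2]
freq = grade_frequencies(sample)
print(freq)
```

Applying the same routine to the gauge series and to each product gives occurrence frequencies that can be compared grade by grade.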

The occurrence percentage (frequency) according to precipitation intensity is shown in Fig. 4e. Dry months accounted for 42.96% of the total months recorded at the 20 stations across the Upper and the Middle Drâa river basins. All five P-datasets show clear differences across all rain intensity categories, with a tendency to underestimate the frequency of dry months (< 1 mm). CHIRPS V2.0 estimates the frequency of months with moderate precipitation reasonably well, while TAMSAT V3.1 slightly underestimates it. Meanwhile, MSWEP V2.8, followed by ERA5-Land and PERSIANN-CCS-CDR, overestimates the frequency of the same precipitation range. TAMSAT V3.1 underestimates the frequency of months with noticeable precipitation, whereas all other P-datasets overestimate this frequency by more than 5%. Noticeably, all products tend to overestimate the frequency of months with important precipitation amounts, with CHIRPS V2.0 followed by TAMSAT V3.1 showing the highest rate (more than 25%). In terms of continuous measures, the magnitude of r and RMSE increases with precipitation greater than 1 mm for all P-datasets (Fig. 4a–c). Optimal bias values were recorded for months with moderate and noticeable rainfall (Fig. 4b). PERSIANN-CCS-CDR overestimates precipitation above 12.5 mm, whereas CHIRPS V2.0 overestimates significantly at the low precipitation threshold (< 1 mm). All P-datasets showed a low capability to correctly detect months with moderate precipitation (Fig. 4d). CHIRPS V2.0, PERSIANN-CCS-CDR and TAMSAT V3.1 are better able to detect precipitation above the Q3 threshold (months with important precipitation) than ERA5-Land and MSWEP V2.8.

Fig. 4 Evaluation indices and precipitation distribution of the five P-datasets at the different monthly precipitation grades

3.4 Evaluation at the different elevation levels

Based on the topographical setting of the DRB, the rain gauge stations are distributed across three principal domains, namely: (i) the mountain ranges of the High and Anti Atlas in the northern and central part following the WSW-ENE axis, (ii) the southern flank of Central High Atlas and Ouarzazate basin identifiable as large depression between the High Atlas and the Anti-Atlas ranges, and (iii) the low-lying terrains adjacent to the Saharan desert. The selected rain gauge stations span an elevation range of 656 to 2100 m above sea level (Fig. 5). To maintain the disparity of the above-mentioned domains, the classification was calculated below Oussikis station (2100 m) due to a limited number of stations in high-elevation areas. Table 5 depicts the assessment statistics for the five P-datasets on monthly precipitation scale at different elevations.

Fig. 5 Distribution of rain gauge stations across DRB along a gradient of altitude from the highest stations to low altitude

Table 5 Performance assessment for the five P-datasets at monthly precipitation scale with respect to different elevation ranges

ERA5-Land and MSWEP V2.8 achieved good statistical scores at all elevation ranges and share almost the same categorical and volumetric indices above 1100 m (Table 5). These products show the lowest average bias, RMSE, FAR and VFAR values in the low-lying terrains adjacent to the Saharan desert. ERA5-Land agrees with most gauge stations, and the highest correlation (r = 0.87) was obtained for the Tahrbilt gauge station at an elevation of 1184 m (Table 6). Indeed, this station shows a high resemblance with the estimated precipitation from all P-datasets. On the other hand, CHIRPS V2.0, PERSIANN-CCS-CDR, and TAMSAT V3.1 perform poorly in the low-lying terrains adjacent to the Saharan desert, the Ouarzazate basin, and the High and Anti Atlas ranges, respectively. These products detect more dry months as wet, yielding high FAR and VFAR with fewer missed precipitation events (i.e., high POD and VHI). The magnitude of RMSE increases with elevation for all P-datasets except CHIRPS V2.0, which records the highest average RMSE (48.16 mm/month) at low elevation (Fig. 6c). In contrast, bias shows a negative association with elevation for CHIRPS V2.0, PERSIANN-CCS-CDR, and TAMSAT V3.1 (Fig. 6b). The slopes of the trend lines of POD and VHI for ERA5-Land and MSWEP V2.8 are higher than those of the other products, whereas CHIRPS V2.0, PERSIANN-CCS-CDR and TAMSAT V3.1 show similar trends (Fig. 6d, e). FAR and VFAR had a negative association with elevation for all P-datasets except ERA5-Land (Fig. 6e, h). The performances of the five P-datasets on a monthly scale at each gauge station are shown in Tables S1–S7 of the Supplementary materials. ERA5-Land and MSWEP V2.8 are the only products underestimating precipitation, mainly at lower elevations.

Table 6 Correlation coefficient for the five P-datasets at each rain gauge station over the DRB
Fig. 6 Assessment of continuous, categorical and volumetric indices at the different elevation ranges across DRB

3.5 P-dataset ability to detect extreme rain event

During some very exceptional and important flood events, the Drâa River crosses Lake Iriki and the Lower Drâa basin in a continuous flow regime to reach the Atlantic Ocean in the southwestern part of the country (Karmaoui et al. 2014). This situation has progressively changed since several reservoirs were put into service, such as Oussikis (31°69'N and 5°90'W), Sultan Moulay Ali Cherif (30° 93' N and 7° 24′ W), Taghdout (30° 62′ N, 7° 29′ W), Mansour Eddahbi (30° 90′ N and 6° 75′ W) and recently Agdez (30° 82′ N, 6° 56′ W), replacing the natural flooding system with a controlled flow regime to different oases, palm groves and valleys downstream (ABHDON 2018). The rain event of 31 March and 1 April 2002 is considered one of the most important extreme rain events in more than 25 years (Cappy 2007). The precipitation reached 77 mm in 24 h, causing an important flash flood and a strong anomaly at several gauge stations monitoring the discharge of rivers and tributaries over the basin. This event was recorded by six stations with available daily records distributed across the upstream part of the DRB.

Based on the daily precipitation series, the ability of the P-datasets to detect the extreme event varies between stations. Both CHIRPS V2.0 and PERSIANN-CCS-CDR display good agreement with all rain gauge stations (Table 7). The correlation values of these P-datasets are high, but a noticeable distinction in terms of the other continuous statistics is evident. PERSIANN-CCS-CDR overestimates the extreme event episode at all rain gauge stations, with a high RMSE score at Aman N’Tini station (Fig. 7d). The latter station is an exception in that all five P-datasets overestimate the intensity of the precipitation episode. In contrast, a high underestimation rate was recorded at Tahrbilt station for ERA5-Land, MSWEP V2.8 and TAMSAT V3.1 (Fig. 7c). Indeed, ERA5-Land and MSWEP V2.8 show weakness in detecting the first and the later extreme precipitation episodes, respectively.

Table 7 Performance measures of the five P-datasets in affected rain-gauge stations during the extreme precipitation event of 31 March and 1 April 2002
Fig. 7 Daily rainfall for six rain gauge stations during March/April 2002 flood event

4 Discussion

The present study explores the reliability and quality of five P-datasets with reference to 20 rain gauge stations distributed across the Upper and the Middle Drâa basins. Our findings clearly indicate that favourable outcomes in continuous statistics generally lead to high performance in both categorical and volumetric statistics, while the reverse is not necessarily true. Furthermore, the fine spatial resolution of CHIRPS V2.0, PERSIANN-CCS-CDR and TAMSAT V3.1 does not guarantee high performance in capturing the correct precipitation amount. This vulnerability to errors could be increased by the complexity of DRB, namely its topography and precipitation variability. However, the observed weakness of these products is not confined to a single elevation range or season, so more evidence is required before attributing such limitations to input data, onboard sensors, algorithms, or interpolation techniques. For instance, the integration of PMW and IR sensors in the majority of P-datasets is expected to present more limitations over complex terrain than over open areas (Funk et al. 2015). Moreover, Milewski et al. (2015) found that the overestimation of precipitation by the TRMM product, which merges data from PMW and IR satellites, is significant in many arid environments over Morocco. In this sense, the high bias found in the hyper-arid low-lying terrains may be related to the limitation of PMW during short and light precipitation events (Gebregiorgis and Hossain 2013). Nevertheless, the infrequent sampling of the PMW sensor matters mainly during short storms and is hence not sufficient to explain the uncertainty of TAMSAT V3.1, which is based only on IR data. Indeed, the error revealed for these products does not stem from missed-precipitation bias, since most P-datasets are able to detect nearly all events that occurred.
Therefore, the indirect nature of IR-based precipitation sensing is expected to explain their limitations (Kidd and Huffman 2011). The large uncertainties found with these products were also reported in previous studies conducted over arid and complex basins. Consistent with the low accuracy of CHIRPS V2.0 under the complexity of DRB, Rivera et al. (2018) found that the CHIRPS product performs better outside semi-arid regions and below 1000 m of elevation. In the same way, Rachdane et al. (2022) attributed the high bias of PERSIANN-CCS-CDR found over Sub-Saharan Moroccan basins to topographic effects. Indeed, the use of cold-cloud duration (CCD) values derived from IR in CHIRPS V2.0 and TAMSAT V3.1, as opposed to the artificial neural network method employed by PERSIANN-CCS-CDR, is expected to classify more clouds as cold clouds with precipitation activity, which may explain the high false alarm ratio (FAR) (Domenikiotis and Dalezios 2003). Besides, low-intensity precipitation during winter may lead to the same uncertainty, considering the limited ability of PMW techniques during the cooler months. On the other hand, the favourable RMSE values recorded mainly by these three products during summer may be explained by the ability of IR sensors to detect the thunderstorms that dominate this period of the year (Knippertz et al. 2002).
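The combination described above, in which nearly all events are detected while the false alarm ratio stays high, can be made concrete with a small sketch of the standard contingency-table scores (probability of detection, false alarm ratio and critical success index). The 1 mm rain/no-rain threshold, the function name `categorical_scores` and the sample series are assumptions made for illustration and are not values from the study.

```python
def categorical_scores(gauge, product, threshold=1.0):
    """POD, FAR and CSI from a rain/no-rain contingency table.

    threshold is the assumed rain/no-rain limit in mm.
    """
    hits = misses = false_alarms = 0
    for g, p in zip(gauge, product):
        observed = g >= threshold
        estimated = p >= threshold
        if observed and estimated:
            hits += 1
        elif observed:
            misses += 1          # rain observed but not estimated
        elif estimated:
            false_alarms += 1    # rain estimated but not observed
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return pod, far, csi

# Hypothetical precipitation totals (mm); not values from the study.
gauge_mm = [0.0, 0.0, 5.0, 12.0, 0.0, 0.3, 20.0, 0.0]
product_mm = [2.0, 0.0, 6.0, 9.0, 3.0, 1.5, 25.0, 0.0]
pod, far, csi = categorical_scores(gauge_mm, product_mm)
print(f"POD = {pod:.2f}, FAR = {far:.2f}, CSI = {csi:.2f}")
```

In this made-up case no observed event is missed, so the error comes entirely from dry periods reported as wet, which is the pattern attributed above to IR-based estimates.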

The strong performance achieved by ERA5-Land underscores its ability to accurately model large-scale precipitation systems. These results are consistent with earlier studies by Beck et al. (2019b) and Kidd et al. (2012), which highlight that the ECMWF model performs better in forecasting large-scale precipitation than in regions dominated by convective precipitation. Thus, the fact that short-lived storms affecting small geographic regions are limited to late summer/early autumn explains the overall accuracy of ERA5-Land throughout the year. On the other hand, the integration of additional inputs in MSWEP V2.8, such as daily gauge observations and the accounting for reporting times, may explain its performance compared to CHIRPS V2.0, which also combines gauge, satellite, and (re)analysis data. However, the uncertainty of reanalysis model estimates is expected to grow further back in time compared to other products, which may explain their low ability to detect extreme events that occurred in 2001.

5 Rain gauge uncertainty and approach limitations

Based on the preceding findings, most P-datasets show a positive bias and high RMSE scores, with a clear tendency to underestimate the frequency of dry months and to overestimate high-intensity precipitation (Table 4 and Fig. 4). However, the magnitude of these errors could also result from rain gauge uncertainty. The use of orifices of various sizes and shapes and of gauge heights that are not strictly comparable with WMO standards is expected to cause an underestimation of actual precipitation by 5% to 40%, with a bias of 9% on average (Groisman and Legates 1994; Kidd and Huffman 2011). Furthermore, the majority of rain gauges in DRB are not equipped with windshields, and evaporation losses are expected to influence instrument accuracy. In this sense, estimating the systematic gauge-measurement uncertainty on the basis of weather information from synoptic stations may help confirm the error proneness of a given P-dataset. The National Direction of Meteorology maintains one station accredited by the World Meteorological Organization (WMO) that delivers data with relatively high quality standards and temporal completeness (Filahi et al. 2016; Schulz et al. 2008). Monthly precipitation data from the Ouarzazate meteorological station and gauge station were extracted for a period of 38 years, from January 1983 to December 2020. Measures of similarity were obtained using both a Taylor diagram and a frequency histogram (Fig. 8a, b). The results indicate that the two ground observations share similar patterns, with correlation values greater than 0.90. A few outliers showing substantial disagreement with the meteorological measurements may be attributed to recording errors, but the gauge station underestimates the precipitation recorded by the meteorological station of Ouarzazate by 10% (Fig. S1 and Table S8 in supplementary materials).
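How much of a product's apparent positive bias could stem from gauge undercatch can be explored with a simple sensitivity sketch. It assumes, purely for illustration, that the gauge misses a constant fraction of the true precipitation; in practice undercatch varies with wind, gauge design and precipitation type. The function names and the totals of 100 mm and 120 mm are hypothetical, and only the 5%, 9% and 40% fractions are taken from the range cited above.

```python
def percent_bias(product_total, reference_total):
    """Percent bias of a product total against a reference total."""
    return 100.0 * (product_total - reference_total) / reference_total

def corrected_gauge(gauge_total, undercatch):
    """Gauge total adjusted for an assumed constant undercatch fraction (0 <= u < 1)."""
    if not 0.0 <= undercatch < 1.0:
        raise ValueError("undercatch must be in [0, 1)")
    return gauge_total / (1.0 - undercatch)

# Hypothetical totals (mm); not values from the study.
gauge_total = 100.0
product_total = 120.0

print(f"apparent bias: {percent_bias(product_total, gauge_total):.1f} %")
for u in (0.05, 0.09, 0.40):  # undercatch fractions from the cited range
    reference = corrected_gauge(gauge_total, u)
    print(f"undercatch {u:.0%}: bias = {percent_bias(product_total, reference):.1f} %")
```

Under this assumption, the apparent overestimation shrinks as the assumed undercatch grows and can even change sign at the upper end of the range, which is why the gauge uncertainty needs to be quantified before attributing the full error to a P-dataset.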

Fig. 8 Taylor diagram and precipitation distribution for different intensities of the five P-datasets using the meteorological station of Ouarzazate as reference

As shown in Fig. 8b, ERA5-Land and MSWEP V2.8 have the highest correlation values (0.77 and 0.71, respectively), followed by CHIRPS V2.0 (0.65), while PERSIANN-CCS-CDR and TAMSAT V3.1 share the same correlation value (0.52). The high RMSE shown by PERSIANN-CCS-CDR confirms its low performance found in the Ouarzazate basin. In terms of occurrence percentage, most P-datasets perform better than in the previous assessment based on the corresponding gauge data. However, all five datasets still underestimate the no-rain range (< 1 mm) (Fig. 8a). CHIRPS V2.0 performs poorly for non-dry months over the 38 years, but detects months with moderate precipitation intensity satisfactorily. On the other hand, MSWEP V2.8 is more adequate for precipitation below 12.5 mm and ERA5-Land for heavier precipitation (> 12.5 mm).
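The occurrence percentages discussed here can be sketched as a simple frequency count per intensity class. The sketch uses only the two limits mentioned in the text (1 mm and 12.5 mm); the classes actually plotted in Fig. 8a may be finer, so the three-class split, the function name `occurrence_percent` and the monthly values are assumptions for illustration.

```python
def occurrence_percent(monthly_totals, low=1.0, high=12.5):
    """Share of months (%) in three assumed intensity classes."""
    if not monthly_totals:
        raise ValueError("monthly_totals must not be empty")
    counts = {"< 1 mm": 0, "1-12.5 mm": 0, "> 12.5 mm": 0}
    for value in monthly_totals:
        if value < low:
            counts["< 1 mm"] += 1       # treated as no rain
        elif value <= high:
            counts["1-12.5 mm"] += 1
        else:
            counts["> 12.5 mm"] += 1
    n = len(monthly_totals)
    return {label: 100.0 * count / n for label, count in counts.items()}

# Hypothetical monthly totals (mm); not values from the study.
station_mm = [0.0, 0.2, 0.0, 3.0, 15.0, 0.5, 0.0, 8.0, 30.0, 0.0]
product_mm = [1.5, 2.0, 0.0, 4.0, 18.0, 1.2, 0.4, 9.0, 22.0, 2.5]

station_freq = occurrence_percent(station_mm)
product_freq = occurrence_percent(product_mm)
for label in station_freq:
    print(f"{label}: station {station_freq[label]:.0f} %, product {product_freq[label]:.0f} %")
```

In this made-up example the product reports light rain in most of the months that the station records as dry, so the no-rain class is underrepresented while the wettest class matches.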

The uneven distribution of rain gauge stations across DRB, particularly in the Middle Drâa and lower terrains, limits our understanding of precipitation patterns. Indeed, a large part of DRB with complex topography and precipitation variability is still unobserved, which makes it difficult to accurately assess the limitations of a given P-dataset. In addition, this study evaluated P-datasets at monthly, seasonal, and sub-seasonal scales. However, the lack of daily records, especially in the Middle Drâa basin where no daily precipitation records exist, hampers our ability to assess the suitability of P-datasets for hydrological studies such as rainfall–runoff modeling. Besides, further studies at the hourly scale are still necessary, considering that rain events in DRB, as in many arid regions, occur within a very short time, such as a few hours.

6 Conclusion

This study evaluates the performance of five P-datasets with high spatial and temporal resolution over DRB, namely CHIRPS V2.0, MSWEP V2.8, PERSIANN-CCS-CDR, TAMSAT V3.1 and ERA5-Land. The use of the maximum number of available precipitation records, from 20 gauge stations and one WMO-accredited meteorological station, to evaluate the accuracy of the P-datasets is unprecedented in the study area. The information provided is considered helpful before using such P-datasets as alternatives to rain gauges. The study's main findings are as follows:

  (i) Incorporating atmospheric reanalysis models as input is expected to reduce the sensitivity of a P-dataset to the complexity of topography and precipitation variability over DRB, compared to datasets that combine information primarily from IR and PMW.

  (ii) ERA5-Land, followed by MSWEP V2.8, performs better across time scales, intensities and elevation ranges and shows good agreement with most rain gauge stations, but both have a limited ability to fully detect extreme precipitation episodes.

  (iii) CHIRPS V2.0, PERSIANN-CCS-CDR and TAMSAT V3.1 detect dry months as wet and significantly overestimate precipitation amounts. These products clearly perform better during months with moderate and noticeable precipitation intensity. However, they are still not satisfactory enough to be recommended for drought monitoring, which requires better performance during low-intensity precipitation.

  (iv) In situ rainfall measurements using a single design adopted by the WMO are expected to reduce the magnitude of the bias and RMSE found. However, achieving the minimum density standard for precipitation gauge networks in DRB is still necessary to provide more information about the accuracy of input data, onboard sensors, algorithms, and interpolation techniques.