1 Introduction

Precipitation, a key environmental variable, finds itself in a multitude of applications making it central to scientific research such as hydroclimatic modelling, climate change, agriculture, and water resource modelling and management (Sun et al. 2018; Try et al. 2020). It is also noteworthy to mention that obtaining a high-quality and fine spatiotemporal resolution precipitation data is still a challenge. Direct gauge precipitation is considered the most reliable and accurate data source that reflects the precipitation at the Earth’s surface. However, in many developing nations, availability of such gauge-based data is dire given their political-economic instability. Even though gauge data are available for some, they may not be complete and are often costlier. Also, they house inhomogeneities attributed to instrumentation relocation, observation gaps and errors, data transmission, inability to represent single large event, etc. Weather radar could be an alternative to the gauge-precipitation, but it too demands wider coverage and calibration according to precipitation type. Further, it also could be the costlier and have limitations in capturing the precipitation in mountainous areas (Arshad et al. 2021; Lee 2006). To overcome these gaps, gridded precipitation products (GPPs) developed at semi-global to global scale with varying spatio-temporal resolution using satellite estimates, gauge-based interpolated products, and/or a blend of both have garnered wider attention from the scientific communities.

GPPs are developed drawing on a vast wealth of information from several sources and agencies (both government and non-government) further intensified by leveraging the use of satellite and the evolving state-of-the-art computing techniques. This has opened up a new window to reliably estimate and develop the hydroclimatic series for a longer period especially in the areas with scarce to no data. The GPPs find their way into hydroclimatic applications owing to their easy access and as cost-free resources. However, there is a plethora of GPPs (Darand & Khandu 2020) to choose from and no one-size-fits-all option available for selection. In addition, performance of GPPs is also not uniform across the spatial scale (Camici et al. 2018). They further possess a high level of uncertainty owing to their gridding algorithms, reliability and number of data sources used, kind of models used for estimating precipitation, gauge adjustment techniques, etc. (Aliyar et al. 2021; Ebert et al. 2007) that propagates through the application (hydrological modelling or extremes analysis). Thus, credibility of GPPs is subjective to factors like time, region, climatic zone, and the technology and methods involved in their development. Hence, evaluation of GPPs is crucial prior to their use. Evaluation and ranking techniques span across a multifaceted spectrum including the use of single criterion like (correlation coefficient (r) or root mean squared error (RMSE)) to multi-criteria decision making (MCDM) like compromise programming (Komaragiri & Kumar 2014; Zeleny 1973), Cooperative Game Theory (Gershon & Duckstein 1983), Preference Ranking Organization Method of Enrichment Evaluation (PROMETHEE-2) (Brans et al. 1986), and Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) (Opricovic & Tzeng 2004). These methods are gradually evolving and regularly improved/updated. Zeleny (1973) introduced compromise programming (CP) to automatically devise the solution closest to the reference dataset. This approach tends to obtain the best solution by finding the optimum values of evaluation indices under consideration. The search for the optimum value is guided by the entropy method to weigh each indicator (Komaragiri & Kumar 2014). The main advantage of the entropy method is that it calculates the weight for each indicator without intervention from the evaluator avoiding unintentional bias. Further, the method weighs each indicator based on the information available which is given by its entropy value. Higher the entropy value, higher is the uncertainty.

Saying that, the primary purpose of the evaluation should be to select “fit-to-purpose” dataset rather than finding the “most correct” dataset. Abundance of studies on the evaluation of GPPs against the gauge-based station data is available that spans across a wide assortment of spatial coverage from global scale (Hobeichi et al. 2020; Shen et al. 2020; Yong et al. 2015), continental scale (Awange et al. 2016; Tarek et al. 2021), national scale (Aliyar et al. 2021; Darand & Khandu 2020; Prakash et al. 2016), regional scale (Ahmed et al. 2019; Camici et al. 2018; Satgé et al. 2020), and to basin scale (Duan et al. 2016; Fallah et al. 2020; Upadhyay et al. 2022; Yan Yang et al. 2014). Depending on region, scale, and evaluation method, the performance of GPPs is different. Lately, studies on the evaluation of GPPs across the Mekong River Basin (MRB), a transboundary river basin in Southeast Asia that transcends six countries China, Laos, Myanmar, Thailand, Cambodia, and Vietnam inhabited by more than 60 million people who rely on agriculture and fisheries (Piesse 2016), is growing. These studies can broadly be categorized into (a) ability to characterize the spatio-temporal distribution of precipitation (A. Chen et al. 2018; C.-J. Chen et al. 2017; Dandridge et al. 2019; Irannezhad & Liu 2022) and (b) blend of spatio-temporal characterization and ability to simulate hydrology (Dang Dinh et al. 2020; Tang et al. 2019, 2021; Tian et al. 2021; Try et al. 2020). Studies in former category mostly used relative bias (Bias), r, RMSE, and different variations in bias and errors as continuous statistical indices while probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI) under categorical indices to evaluate the performance of GPPs. Similarly, the studies in the latter category used Nash–Sutcliffe efficiency (NSE), coefficient of determination (R2), and percentage bias to evaluate hydrological simulation skill in addition to indices enlisted for category A.

All evaluation technique used in former studies focused on evaluating the GPPs based on their ability to reproduce the mean precipitation and skill to simulate the discharge in the basin. None of these studies considered the ability of GPPs to reproduce the extreme indices. The studies in category A were subjective to the evaluation skill of the researcher as the conclusion was drawn based on institution of the researcher without considering the tradeoffs among the evaluation indices used. As the precipitation is inherent with the intrinsic errors of missing data, instrumentation errors, instrument relocation, observation errors, etc. It is often difficult to rank the dataset only based on the statistical indicators as different statistical indices may suggest different GPPs. Thus, this indicates the need for an approach that seeks the tradeoffs among the indicators by automatically weighing them and ranking accordingly. For the studies in category B, the uncertainty in the GPPs to simulate hydrology is also compromised by altering the other water balance components of the hydrological models like actual evapotranspiration, infiltration, etc.(Tian et al., 2021). Further, they were mostly carried out considering the entire MRB and recommended Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE) for the entire basin. Also, gauged-based reference data used were mostly concentrated in the upper reaches of the basin (China) and Thailand and relatively less number of observed stations were considered for Laos, Cambodia and Vietnam (Dang Dinh et al. 2020; Tian et al. 2021). For the basin covering a wide assortment of terrain from rugged mountainous to flat terrain, the performance of GPPs is not uniform (Alexander et al. 2020).

Hence, overall objective of this study is to evaluate eight GPPs, namely, APHRODITE, Climate Hazard Group InfraRed Precipitation with Station (CHIRPS), Climate Prediction Center (CPC), CPC MORPHing (CMORPH), Climate Research Unit (CRU), Southeast Asian OBServed (SAOBS), Tropical Rainfall Measuring Mission (TRMM), and Global Precipitation Climatology Center (GPCC) for Nam Ngum River Basin (NRB) and Vietnam Mekong Delta (VMD), two sub-basins of LMRB featuring different characteristics in terms of climate and terrain. The specific objectives are to (i) evaluate GPPs using automated entropy-compromise programming (ECP) that automatically assigns weight to evaluation indicators and rank them, (ii) consider six extreme climate indices apart from mean precipitation, and (iii) check where a single GPP can be generalized for two sub-basins as indicated by former studies.

2 Study area

The study area comprises of two sub-basins of the LMRB: Nam Ngum River Basin in Lao PDR and Mekong delta in Vietnam as presented in Fig. 1

Fig. 1
figure 1

a Location map of Nam Ngum River Basin and Vietnam Mekong Delta. b Elevation profile along with the spatial distribution of reference precipitation gauges across the Vietnam Mekong Delta. c Elevation profile of Nam Ngum River Basin. d Spatial distribution of the reference precipitation gauges in the Nam Ngum River Basin

2.1 Nam Ngum River Basin

The NRB is a one of the important Mekong tributary sub-basins in central Lao PDR housing an area of 16,800 km2 accounting for 7% of the country’s area. The basin lies approximately between 17.9° N to 19.8° N and 101.85° E to 103.5° E. The main tributary of the basin, Nam Ngum River, traverses 420 km southwards before draining into Mekong River. It further discharges 40% of the country’s flow to Mekong River which accounts for 14% of its flow (Meema et al. 2021). The basin is characterized by relatively flat in the southern part and steep topography with undulating terrain in the northeast part. The elevation for the Nam Ngum ranges between 151 and 2698 m.

The basin is marked by the tropical climate with distinct wet (June to September) and dry (October to April) seasons. March and April months usually witness the highest temperature with the average temperature ranging between 30 and 38 °C depending on the altitude and location. The average annual precipitation across basin ranges between 1450 and 3500 mm with basin average of 2000 mm (Meema et al. 2021).

The basin is significant as it has tremendous potential for hydropower and already houses four dams with Nam Ngum 1 being the largest one and many other are under construction or planned. The mean annual river flow of the NRB is approximately 21,000 Mm3 which attains its lowest level during March–April and reaches the peak during August–September.

2.2 Vietnam Mekong Delta

The VMD lies approximately between 8.55°N to 11.05°N and 104.45°E to 106.85°E in the southern Vietnam bounded by the East Sea in the south and southeast, Cambodia in the north, and Gulf of Thailand in the southwest. The basin is spread across an area of 40,000 km2 and is drained by a complex network of the canals, rivers, and dikes.

The VMD is characterized by the relatively flat terrain < 5 m. It observes tropical climate with distinct wet (May to November) and dry season (December to April). About 30–40% of the delta is inundated during rainy (wet) season influenced by the Indian Summer Monsoon (Duy et al. 2021). The average annual precipitation across the basin ranges between 1300 and 2500 mm of which 80–90% is attributed by the precipitation during the wet season accounting for a discharge load of 75–85% (Le et al. 2021).

The VMD is well known for the aquaculture and rice production occupying about 50% (~ 1.9 million ha) of the agricultural land (Sebesvari et al. 2012). However, the annual floods from the Mekong River inundates the large parts of the VMD.

3 Datasets

3.1 Observation datasets

The observed gauge-based precipitation dataset in the NRB were very limited. Only six stations data were made available by the Department of Meteorology and Hydrology, Laos that were mostly concentrated around the central region of the Basin along the Nam Ngum River. The spatial distribution of the observed dataset is presented in Fig. 1. The daily observed gauge-based precipitation dataset was available for the reference period of 1998 to 2014. Only five stations having the complete observed data for the overlapping timeframe were considered for NRB. In case of the VMD, eleven rain gauge stations were made available by the Hydro-Meteorology Data Center, Hanoi, Vietnam. Since, the availability of the gauge density is low in both basins and is unlikely to represent the spatial variability in the rainfall, the GPPs are considered for evaluation. The sparse gauge-based dataset is not well distributed to be considered for the basin scale applications as the uncertainty associated with the sparsity is likely to propagate in the application like hydrological modelling or trends or extremes, etc.

3.2 Gridded precipitation products

In this study, we evaluated eight GPPs that comprise five gauge-based, one satellite-based, and two reanalyzed precipitation products with different spatial and temporal resolutions. Different GPPs have different temporal resolution, and an overlapping reference period of 1998–2014 was considered for the study. Details of products, their description, and source from where they can be retrieved are presented in Table 1.

Table 1 Different GPPs categorized according to their type they fit in, their spatial and temporal resolutions, period of their availability, and sources from where they can be retrieved

3.2.1 Gauge-based GPPs

APHRODITE

Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE) is a observed gauge-based interpolated precipitation product using improved angular-distance-weighting interpolation technique that covers three spatial domains (Monsoon Asia, Middle Ease, and Russia) at the spatial resolution of 0.25° and of daily temporal resolution (Yatagai et al. 2012). The Meteorological Research Institute of Japan Meteorological Agency and the Research Institute for Humanity and Nature are the developers of the dataset. Daily based precipitation records form a wide assortment of sources including the global telemetric stations, individual researcher’s collections, meteorological stations records of the National Climatic Data Center, etc. were obtained to develop the APHRODITE. Two versions of products are available: V1101 is available from 1951–2007 and V1101EX_R1 from 2007–2015).

CPC

The Climate Prediction Center (CPC) precipitation product is a quality-controlled observed gauge-based interpolated daily scale dataset produced by the National Oceanic and Atmospheric Administration (NOAA) at the spatial resolution of 0.5° covering a period of 1979 until present (Xie et al. 2007). The data sources used in developing these products house more than 30,000 gauge station data provided by Cooperative Observer Network (COOP), GTS, and numerous national and international organizations.

CRU

The Climate Research Unit (CRU) is a meteorological stations based interpolated terrestrial only (except Antarctica) monthly dataset at a spatial resolution of 0.5° covering a period of 1901 to 2020 (I Harris et al. 2014; Ian Harris et al. 2020). The anomalies for the gauge-based precipitation data were gridded using the angular distance weighting (ADW) interpolation algorithm and were converted to the actual values of precipitation (not anomalies). The dataset is developed by the University of East Anglia with a support from Natural Environment Research Council (NERC), United Kingdom (UK), the United States (US) Department of Energy, and the UK National Center for Atmospheric Sciences (NCAS). In this study, the CRU gridded Time Series version 4.05 (CRU TS v4.05) was used.

GPCC

The Global Precipitation Climatology Center (GPCC) is a gauge-based interpolated long-term gridded monthly dataset with a spatial resolution of 0.25° covering a period of 1891–2016. The dataset houses about 50,000 stations per month throughout the globe (Zandler et al. 2019). The raw gauge-based dataset to produce GPCC was supported by several institutions including national meteorological institutions (NMAs), the World Meteorological Organization (WMO), the Food and Agricultural Organization (FAO), the Climate Research Unit (CRU), etc.

SAOBS

The Southeast Asian OBServed (SAOBS), a product of Southeast Asia Climate Assessment and Dataset (SACAD), is an observed gauge-based interpolated daily dataset for Southeast Asia at two spatial resolution of 0.25° and 0.5° from 1981 to 2014 (van den Besselaar et al. 2017). About 4000 gauge-based precipitation datasets from Southeast Asian countries were used to develop the SAOBS. This study used the dataset at 0.25° spatial resolution.

3.2.2 Satellite-based GPP

TRMM

The Tropical Rainfall Measuring Mission (TRMM) is a satellite-based precipitation estimates developed by a joint mission between the National Space Development Agency (NASDA), Japan and the US National Aeronautics and Space Administration (NASA) with an aim to estimate rainfall in the tropical and subtropical regions. This daily dataset is available at the spatial resolution of 0.25° between 1998 until 2020 (Huffman & Bolvin 2018).

3.2.3 Reanalyzed GPPs

CHIRPS

The CHIRPS is a reanalyzed precipitation product developed from the blend of multiple sources of dataset including observed rain gauge data, precipitation based on cold cloud duration (CCD) infrared data of National Oceanic and Atmospheric Administration (NOAA)—National Climate Data Center (NCDC), TRMM 3B42 and Version 2 atmospheric model rainfall of NOAA-Climate Forecast System (CFS). The dataset is available at two spatial resolutions (0.05° and 0.25°). The CHIRPS25 is the daily precipitation product at the spatial resolution of 0.25° spanning across a period of 1981 until present (Funk et al. 2015) which was used in this study.

CMORPH

The NOAA CPC MORPHing (CMORPH) dataset is a bias-corrected precipitation data product developed by blending gauge-based observations with passive microwave-based infrared satellite precipitation estimates using CPC Morphing Technique. CMORPH data is available at three spatial resolutions of 8 km (only December 2002 onwards), 0.25° and 0.5° and three temporal resolution of 30 min, 3-hourly and daily from 1998 until present (Xie et al. 2017). This study used the CMORPH with 0.25° spatial resolution and daily temporal resolution.

4 Methodology

The series of steps adopted for evaluation and ranking of the gridded dataset is illustrated in Fig. 2. It starts with the collection of observed and gridded datasets. The methodology adopts the pixel-to-point comparison (considering the limited number of gauge stations within the basin) for which the gridded datasets were extracted to station’s coordinates. Following the data extraction, extreme indices were computed and homogeneity tests were carried out. Six extreme indices proposed by Experts Team on Climate Change Detection Indices (ETCCDI) of World Meteorological Organization (WMO) were computed for each dataset. List of extreme indices used in this study are presented in Table 2. A payoff matrix was then formulated by normalizing the evaluation metrics obtained for each EI in addition to mean precipitation for each station. Metrics normalization was carried out to avoid the dominance of the larger value indicators. Entropy method (Komaragiri & Kumar 2014) was employed to weigh each evaluation criteria. The payoff matrix along with the weights obtained from entropy method were then supplied to the compromise programming to obtain station wise ranked dataset. To obtain one final ranked dataset, sum of ranks for individual indices were computed and the one with the least value was ranked 1st, second least value was ranked 2nd, and so on.

Fig. 2
figure 2

Flowchart displaying the GPP evaluation and ranking techniques adopted in this study

Table 2 ETCCDI extreme precipitation indices

4.1 Extreme indices

The Expert Team on Climate Change Detection and Indices (ETCCDI) of the WMO has developed 27 core extreme climate indices to analyze and standardize the climate extremes (Zhang et al. 2011). Of proposed 27 indices, six indices corresponding to precipitation extreme analysis used in this study are presented in the Table 2.

4.2 Continuous statistical indices

To evaluate the GPPs against the available observed dataset, the following continuous statistical indices were used:

Pearson correlation coefficient (r)

$$r=\frac{\sum_{i=1}^{n} \left({o}_{i}-\overline{o }\right)\left({p}_{i}-\overline{p }\right)}{\sqrt{\sum_{i=1}^{n} {\left({o}_{i}-\overline{o }\right)}^{2}\sqrt{\sum_{i=1}^{n} {\left({p}_{i}-\overline{p }\right)}^{2}}}}$$
(1)

where \({o}_{i}\), \(\overline{o }\), \({p}_{i}\), \(\overline{p }\), and n are observed precipitation for ith position, observed mean for the studied period, gridded precipitation for ith position, average gridded precipitation data for the studied period, and n is the total number of data in the timeseries. Pearson correlation coefficient measures the degree of similarity between the observed and gridded dataset. Its value ranges between − 1 and 1. Value of 1 (− 1) represent a perfect positive (negative) correlation.

Bias

$$Bias={\sum }_{i=1}^{n}{p}_{i}-{o}_{i}$$
(2)

where \({p}_{i}\) and \({o}_{i}\) are gridded and observation precipitation data at ith position and n is the total number of datapoints in time series. The positive (negative) value of Bias indicates overestimation (underestimation) by the gridded dataset while the value of zero shows perfect estimation. The closer the value of Bias to zero better is the estimation.

Root mean squared error

$$RMSE=\sqrt{\frac{\sum_{i=1}^{n} {\left({x}_{i}-{y}_{i}\right)}^{2}}{n}}$$
(3)

where \({x}_{i}, {y}_{i}\) are the observed and gridded data points at ith position and n is the total number of datapoints in the timeseries. RMSE measures the differences between the observed and gridded dataset and estimates the averaged error magnitude. Value of RMSE close to zero indicate the better agreement between gridded and observed dataset.

5 Skill score

The skill score compares the probability density function (pdf) of the GPPs with the pdf of the observed/reference dataset and measure the common area between the pdfs.

$${S}_{\text{score }}=\sum_{i}^{N} \left[min\left({P}_{i}^{\text{GPP }},{P}_{i}^{\text{reference}}\right)\right]$$
(4)

where \({S}_{\text{score}}\) is the skill score, \(N\) is the number of bins to calculate pdf, \({P}_{i}^{\text{GPP}}\) and \({P}_{i}^{\text{reference}}\) are the frequencies of values in a bin of the GPP and observed dataset, respectively. A skill score of 1 indicate a complete overlap between the pdfs of GPP and observed and a value of 0 indicate no overlap at all.

5.1 Performance evaluation based on categorical indices

Apart from the continuous statistical indices, three categorical indices—probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI)—were used to evaluate the skills of GPPs in accurately predicting the rain/no rain events. POD measures the ability of the GPPs to correctly estimate the precipitation of a given threshold. It is also commonly known as the hit ratio. Similarly, FAR measures the number of false precipitation event GPPs detected that were not present in the observed data. Likewise, the CSI measures the ability of GPPs to estimate the precipitation skill. The POD, FAR, and CSI are calculated using Eqs. 5, 6, and 7.

$$POD=\frac{R}{R+M}$$
(5)
$$FAR=\frac{F}{R+F}$$
(6)
$$CSI=\frac{R}{R+F+M}$$
(7)

where R is the number of hits, F is the number of false alarm, and M is the number of misses. The POD, FAR, and CSI ranges between 0 and 1, where 1 indicates perfect skill in detecting the precipitation event for POD and CSI. In contrast, 0 represents the perfect score for FAR, indicating that GPPs did not give any false alarm.

5.2 Entropy method

Entropy method assigns weights to each evaluation criteria under study. Equations 811 are used to compute weights using entropy method.

$${\mathrm{En}}_{j}=-\frac{1}{\mathrm{ln}(T)}\sum_{a=1}^{T} {k}_{aj}\mathrm{ln}\left({k}_{aj}\right)\text{ for }j=\mathrm{1,2},\dots \dots J$$
(8)
$${k}_{aj}=\frac{{k}_{j}(a)}{\sum_{a=1}^{T} {k}_{j}(a)}$$
(9)

where \({\mathrm{En}}_{j}\) is the entropy for each evalutaion criteria j, T is the number of gridded precipitation datasets, and \({k}_{aj}\) is the value of the jth indicator for ath gridded dataset.

The entropy (\({\mathrm{En}}_{j}\)) is then used to compute the degree of diversification and weights for each index as given by following equations.

$${Dd}_{j}=1-{En}_{j}$$
(10)
$${r}_{j}=\frac{{Dd}_{j}}{\sum_{j=1}^{J} {Dd}_{j}}$$
(11)

where \({\mathrm{Dd}}_{j}\) and \({r}_{j}\) represent the degree of diversification and normalized weight of indicator j, respectively.

The higher the value of entropy, the higher is the uncertainty associated with the indicator and less is the weight.

5.3 Compromise programming

Compromise programming is the ranking technique that utilizes minimum distance of the indicator associated with GPP and the observed dataset. The distance formula for compromise programming is given by Eq. 12.

$$L_p(a)=\left[{\textstyle\sum_{j=1}^J}w_j\left|f_j^\ast-f_j(a)\right|^p\right]^\frac1p$$
(12)

where \({L}_{p}(a)\) is the minumum distance metric for the \(a\) gridded dataset, j is the indicator, \({f}_{j}^{*}\) and \({f}_{j}\left(a\right)\) are normalized value of jth indicator for observed and \(a\) th gridded dataset, respectively, \({w}_{j}\) is the weight of jth indicator obtained from entropy method, and \(p\) represents distance parameter (\(p\) = 1 for linear and \(p\) = 2 squared Euclidean distance). The lower the value \({L}_{p}\) places the gridded dataset on the top and so on.

5.4 Homogeneity test

To test the inhomogeneities possibly present in dataset that are likely to influence the evaluation process, three of the four tests proposed by Wijngaard et al. (2003), which are standard homogeneity test, Pettitt test, and Buishand test, were considered for this study. As few datasets (GPCC and CRU) lack daily precipitation, annual total precipitation was used over the annual wet days (precipitation > 1 mm) for the homogeneity test. Similar approach adopted by Zandler et al. (2019) was used. The pyHomogeneity package in python 3.7 environment was used to test homogeneity of datasets at the significance level of 1% for the entire study period of 1998 to 2014. The obtained test results from these tests were then condensed into three categories: (a) “useful” if at most one test indicated inhomogeneous dataset, (b) “doubtful” if two tests indicated inhomogeneity, and (c) “suspect” otherwise.

6 Results

6.1 Homogeneity tests’ result

The homogeneity test indicated that majority of GPPs (> 90%) were homogeneous for both NRB and VMD. In case of NRB, only CPC was doubtful for three stations, namely, Ban Hinheup, Naphok, and Pakkanhoung. The condensed categories of the homogeneity tests for NRB is presented in Table 3.

Table 3 Categories of the homogeneity tests for different GPPs against each station for NRB

U useful, D doubtful.

On the other hand, three GPPs indicated signs of inhomogeneities for some stations in VMD. For instance, CRU was doubtful for BaTri station and CPC was doubtful for two stations and indicated suspect for BacLieu station while SAOBS was doubtful for 10 stations except ChauDoc. Results of homogeneity test for NRB and VMD are presented Table 4.

Table 4 Categories of condensed homogeneity tests for GPPs against each station for VMD

The p-stat obtained for individual test for all stations to generate the condensed categories of homogeneity analysis is presented in Appendix Tables (8) and (9), respectively.

6.2 Performance of GPPs across NRB based on continuous statistical indices and CP

The comparison of the GPPs with the station data were made on a grid-to-point basis whereby nearest grid data were extracted to the station as enough stations were unavailable to cover the entire basin. Station-wise evaluation indicators (r, Bias, RMSE, and SS) are presented in Table 5 and Fig. 3 for NRB. The APHRODITE in daily scale and CRU and GPCC at monthly time resolution showed a good fit against the observed dataset with correlation coefficient ranging between 0.85 and 0.97. The GPCC, TRMM, and APHRODITE have least bias among all the GPPs. The former two GPP represent positive bias ranging between 1.21 and 38 mm/month, indicating overestimation of precipitation while APHRODITE underestimated the precipitation for 80% of stations (− 4.97 to − 25.34 mm/month). Rest all GPPs (except CHIRPS) highly underestimated the precipitation in the NRB. CPC accounts for the highest negative bias in monthly precipitation for all stations followed by the CMORPH. The APHRODITE performed best in terms of RMSE with its values ranging between 45.32 and 111.55 mm/month followed by TRMM (48.83 to 109.87 mm/month) and GPCC (34.73 to 126.37 mm/month). Rest all GPPs were relatively erroneous.

Table 5 Statistical measures of GPPs for different statistical indices across stations of NRB
Fig. 3
figure 3

Taylor diagram displaying performance of different datasets against observed dataset for different evaluation indicators

The skill score (SS) represents the coincidence of two PDFs. The SS of APHRODITE and TRMM ranged between 0.69 to 0.90 outperformed the other datasets. The CPC, CMOPRH, and SAOBS exhibited larger differences between the PDFs against observed dataset. The worst was exhibited by the CPC with SS value ranging between − 0.12 and 0.4 (except for Vientiane). Rest GPPs exhibited moderate SS. The discrepancies in the evaluation indices between GPPs can also be attributed to the different interpolation techniques and the station density available for interpolation (Hu et al. 2018).

The Taylor diagram (Taylor 2001) in Fig. 3 visually illustrates the station-wise performance of different gridded dataset against the observed dataset with respect to three different evaluation indicators (r, RMSE, and SD), and a table in the middle shows the final rank of the dataset. The black secants represent the SD while the green secants are for RMSE and the dotted blue straight lines represent r. The red secant signifies the SD of the reference dataset. In terms of SD, GPCC and TRMM were close to observed SD than other GPPs. Before obtaining the final rank, datasets with daily temporal resolution were further evaluated for the extreme climate indices. The performance of individual dataset against the observed dataset in capturing the variability of EIs is presented in Appendix Fig. 7. As monthly datasets were evaluated for the mean precipitation only, their score summed up to minimum and placed GPCC and CRU in the top order. It can be seen from the Fig. 3 and Table 5 that APHRODITE secured 3rd rank (overall) and 1st (at daily scale) for about 60% stations of NRB.

Figure 4 houses the ranking of each GPP against the observed dataset in the form of heatmaps obtained from entropy method and compromise programming while Appendix Fig. 8 presents the station-wise box and whisker plot of EIs for the different GPPs with detailed description. Varying degree of performances were observed for GPPs for different stations. For instance, SAOBS was able to capture the CDD for 60% of stations followed by CPC. On the other hand, CPC and CHIRPS did well in capturing the CWD for 80% of the stations combined. APHRODITE and CMORPH were able to capture the range of CDD and CWD, but they underestimated (overestimated) for CDD (CWD) for almost all the stations. For about 60% of the stations, SAOBS was able to represent the R10mm precipitation followed by APHRODITE. Similarly, in case of R20mm precipitation, APHRODITE ranked itself in top order for 60% of the stations. APHRODITE and TRMM were able to represent the Rx1day precipitation for 80% of the stations combined. In contrast, SAOBS dominated the order for 60% of the stations for Rx5mm indices.

Fig. 4
figure 4

Heatmaps capturing rank of GPPs evaluated against the observed dataset for different EIS for all the stations in NRB

6.3 Evaluation GPPs based on categorical indices for NRB

The skill of individual GPPs to correctly discriminate the rain/no-rain events was further tested using three categorical indices (POD, FAR, and CSI) using 1 mm as the precipitation threshold. The values of POD, FAR, and CSI for each GPP is presented in Table 6. APHRODITE dataset had the highest POD ranging between 0.96 and 0.98 while SAOBS had the least FAR value ranging between 0.39 and 0.46. The TRMM had the highest CSI value ranging between 0.44 and 0.64.

Table 6 Evaluation of GPPs based on the categorical indices (bold values represent better performance for each index)

6.4 Performance of GPPs across VMD based on continuous statistical indices and CP

The results for evaluation criteria of the different GPPs against the observed stations data for VMD is presented in Appendix Table (10). APHRODITE, CHIRPS, GPCC, and TRMM demonstrate least bias for almost all the stations ranging between ± 23 mm/month. The former two GPPs mostly have the negative bias and later two GPPs with the positive bias. Similarly, CMORPH, CPC, and SAOBS mostly underpredicted the precipitation as high as − 184 mm/month for RachGia station (SAOBS). Correspondingly, in terms of RMSE, APHRODITE, GPCC, and TRMM has the least RMSE with values less than 86 mm/month followed by CHIRPS, CMORPH, CPC, and CRU. The SAOBS had poor RMSE ranging between 150 and 241 mm/month. Likewise, except for the SAOBS, all GPPs demonstrated a good correlation (above 0.7 for 80% of stations) with the observed dataset at the monthly scale with higher correlation corresponding to TRMM and GPCC. The r values for SAOBS ranged between 0.16 and 0.42 reveal very poor correlation at the monthly timescale. Figure 5 graphically illustrates the performance of each GPPs in terms of r, RMSE, and SD for VMD.

Fig. 5
figure 5

Taylor diagram displaying performance evaluation of GPPs against observed dataset at different stations for Mekong Delta

The GPCC had the better coincidence of PDF with the observed indicated by the SS values ranging between 0.52 and 0.98 for different stations. Rest all GPPs has the similar performances in terms of SS except (CRU and SAOBS). The SAOBS has the least overlapping PDF with observed as indicated by the negative values of SS (− 1.28 to − 0.82).

Similarly, Appendix Fig. 8 presents a station-wise box and whisker plot housing the performance of daily-scale GPPs against observed dataset for several climate EIs under study for VMD. During the study period from 1998 to 2014, TRMM outperformed all other GPPs in terms of representing the climate EIs against EIs of observed dataset for all the stations. Strong differences can be observed for other GPPs in representing the EIs. For instance, TRMM and CMORPH closely represent CDD (CWD) while other GPPs underestimate (overestimate) CDD(CWD) for almost all the stations. Similarly, TRMM and APHRODITE adequately represent the observed R10mm EI for almost all the stations while other GPPs underestimated R10mm EI except for CHIRPS which overestimated the R10mm EI for all stations. However, only TRMM seems to represent R20mm while other GPPs underestimate it for all the stations. Likewise, only TRMM and CMORPH sufficiently captured the Rx1day and Rx5day indices while other GPPs underestimated them. Detail description of performance of GPPs in capturing the variability of EIs in observed dataset along with statistics is presented in Appendix Fig. 8.

The rank of each GPP based on its performance against the observed dataset for different climate indices is presented by the heatmap in the Fig. 6.

Fig. 6
figure 6

Heatmaps capturing rank of GPPs evaluated against the observed dataset for different EIS for all the stations in Mekong Delta

6.5 Evaluation of GPPs based on categorical indices

The threshold to discriminate rain/no-rain was considered 1 mm as recommended by WMO. Based on the different categorical indices used to evaluate the GPPs, APHRODITE seems to perform well in detecting the rainfall as indicated by the POD value greater than 0.85 for all stations as presented in Table 7. The APHRODITE also has the higher FAR values ranging between 0.34 and 0.52. This can be attributed to the higher number of CWD and fewer CDD for APHRODITE. Satellite-based estimate TRMM and reanalyzed dataset CMORPH have the least value for POD not above 0.71 for all stations. They also have relatively lower FAR compared to station-based gridded dataset. However, the overall critical index is similar for all the datasets.

Table 7 Evaluation of GPPs based on categorical indices for Mekong Delta

7 Discussion

This study evaluated the eight GPPs: five gauge-based products (APHRODITE, CPC, CRU, GPCC, and SAOBS), one satellite-based product (TRMM), and two reanalyzed products (CMORPH and CHIRPS) for the two sub-basins (NRB and VMD) of LMRB. Contrasting performance of GPPs were observed for these basins. The performance differences between the GPPs across SEA is also reported by Alexander et al. (2020). APHRODITE performed well for the NRB while TRMM outweighed others in VMD. The performance of APHRODITE outperforming other dataset for the NRB is consistent with the findings of Tian et al. (2021) and Tang et al. (2019). The better performance of APHRODITE in the NRB could be the outcome of the better interpolation technique addressing the topographic variation (Yi Yang et al. 2017) and was developed specifically for Monsoon Asia by taking observed gauge-based precipitation products from within the region (Yatagai et al. 2012). Saying that, the APHRODITE failed to capture the EIs in many instances for both the sub-basins. For example, the APHRODITE overestimated (underestimated) CWD (CDD). The higher CWD is the reflection of spatial average of gauge data over the grid resolution (Hussain et al. 2018; Satgé et al. 2016). In contrast, the performance of GPPs is quite similar (Sun et al. 2018) and discrepancies are not large for VMD as indicated by the statistical indices of GPPs except for SAOBS. Though the SAOBS was developed particularly for SEA with data obtained from the SEA countries, it observed the poorest skill in estimating the precipitation across VMD. This possibly could be that the gauge records might not have been made available from VMD during the production of SAOBS product.

In terms of representing the mean precipitation, gauge-based products (except GPCC for NRB and VMD and CRU only for VMD) and CHIRPS (reanalyzed product) mostly underpredicted for 75% of stations. The underestimation by these products could be attributed to the limited number of in-situ stations used for developing gridded dataset and inadequate quality control of the real-time GTS datasets whereby sometimes missing values are replaced with 0 especially for APHRODITE (Yatagai et al. 2012). The underprediction of precipitation by CPC over Myanmar and neighboring regions is also reported by Kim et al. (2019). In contrast, the GPCC overestimated (though close to observed) the mean monthly precipitation. The GPCC at the monthly scale out performs in terms of capturing the mean monthly precipitation attributed to its increased number of stations and improved quality controlled data and the change in interpolation technique from kriging to spheremap (Alexander et al. 2020; Schneider et al. 2014). Similarly, the overestimation of precipitation of TRMM can be linked to ignorance of altitudinal variations in the algorithm, IR sensor estimating precipitation from non-raining cirrus cloud (Scheel et al. 2011) and its bias correction technique that employs GPCC as reference dataset at monthly scale (Trinh-Tuan et al. 2019). Likewise, the higher negative bias of the CMOPRH can be linked to its inherent characteristics of underestimating the precipitation at higher altitude (Hobouchian et al. 2017) and the use of CPC (which is already underestimating precipitation) as the reference dataset for bias correction (Xie et al. 2017).

The CMORPH usually simulates light-rain events at daily scale and fewer heavy rain-events. This can be attributed to its gridding algorithm which is bilinear interpolation (Yu et al. 2009). It ranked last in terms of representing CDD and CWD for almost all the stations. It is also noteworthy to contemplate that the precipitation by the GPPs is the mean areal value of the grid boxes and the spatial resolution of the different GPPs are different which compromises the spatial representativeness of the station to the grid. This may differ significantly in terms of elevation as well. Hence, it is difficult for a station to represent the precipitation of different spatial resolution of different GPPs.

The gauge-based GPPs exhibited higher POD compared to satellite-based and reanalyzed GPPs. In contrast satellite-based and reanalyzed products were far skillful in minimizing FAR compared to the gauge-based GPPs (except for SAOBS). Similar findings were reported by Yi Yang et al. (2017). The lower POD and higher FAR for satellite and reanalyzed product can be attributed to the low ability to discriminate the drizzle and frozen precipitation during dry season (Wu et al. 2019). The higher FAR for the gauge-based product could be caused by the differences in grid location and gauge location and areal averaged value.

8 Conclusions

An evaluation and ranking of eight state-of-art GPPs were carried out for the two contrasting sub-basins of the LMRB: (i) NRB of Lao PDR with undulating terrain and (ii) humid VMD characterized by flat terrain and tropical climate. The noticeable differences in the performance of GPPs were revealed for two sub-basins. Most of former studies attempted to evaluate and recommend a best GPP for the MRB or LMRB as a whole. However, this may not be always applicable as the study domain might only be the sub-basins of smaller areal coverage. Depending upon the area considered for the study and the gauge-station availability, a diverse performance can be expected for different GPP with in sub-basins of the same basin as well. This study also indicates that there is no-size-fits-all GPP for different sub-basins for application within the LMRB as APHRODITE outperformed other GPPs at daily scale for NRB while TRMM ranked first for VMD. Further, Alexander et al. (2020) also pointed out that the differences between the performances of GPPs across the SEA is prominent. The differences in the performances can be attributed to the sparse station density considered for developing the GPPs, station’s locations, gridding algorithms, location of stations and grid point, etc.

This study employed the ECP, an automated multi-criteria decision-making approach for evaluation and ranking of the GPPs. This is also a first attempt to rank the dataset using compromise programming in these two sub-basins. This method calculates the weight based on the uncertainty with in the timeseries and rank them with reference to minimum distance (minimum error/uncertainty) from the observed dataset by limiting the interference of the researcher. Most of the studies (A. Chen et al. 2018; C.-J. Chen et al. 2017; Dandridge et al. 2019; Dang Dinh et al. 2020) that has been carried out to evaluate GPPs in the LMRB rely only on the statistical and categorical indices and the ranking is solely based on the skill of the researcher to weigh these indices. As different statistical indices may favor different GPP resulting in a conflicting choice and it is entirely dependent on skill of researcher to choose the best one. This limitation can be overcome by employing the ECP which weighs the uncertainties in statistical indices and rank GPP with no influence from the researcher. However, the performance might be affected by the number of criteria being considered and the weight obtained from entropy gets redistributed on addition or removal of evaluation indices influencing the GPP selection. Hence, care should be taken in limiting the number of statistical indices to be used. Saying that, the ECP can be a suitable choice in ranking and evaluating a large suite of GPPs and when a researcher has hard time in selecting a suitable choice based on statistical indices only.

Saying that, the ranked GPP from this study can be used for different hydro-meteorological application such as water resource planning and estimating potential for hydropower potential. The GPPs are the gateways to fill in the data gaps and spatial coverage for several hydrometeorological studies (Morales-Moraga et al. 2019; Prajapati et al. 2021). However, it should also be noted that the choice of the GPP should be considered based upon the application or purpose of the study as the error in the daily precipitation frequencies and intensities of GPPs significantly impact the output of hydrologic simulations (Luo et al. 2019). For instance, though APHRODITE was ranked best for NRB, its skill to reflect the EI for some instances (CWD, CDD) was poor. Hence, in case of extreme analysis, GPP that captures the EI well or an ensemble of the GPPs addressing the large uncertainty among the dataset shall be considered.