1 Introduction

Precipitation is considered a fundamental constituent of the global hydrological cycle that affects the socio-economic development of any country (Behrangi et al. 2011). The choice of an accurate and reliable gridded precipitation dataset (GPD) is of great importance not only for studying trends of climate variability but also for efficient water resource management and hydrological forecasting. For researchers, it is difficult to simulate and study climate variability and hydrological cycle without accurate measurement of the precipitation (Tapiador et al. 2012; Turner and Annamalai 2012). The reliability of precipitation datasets is limited mainly by the number and spatial coverage of stations in observed data collection networks over remote and inaccessible regions (Wu et al. 2019; Yong et al. 2015). Thus, GPDs are a valuable source of data for hydro-meteorological and climatological studies. Owing to their intrinsic structure and design, GPDs are known to perform differently in regions with diversified topography and climatic conditions. There is a lack of consensus on the reliability of these datasets for regions like Pakistan with complex topography. It is therefore important to evaluate the comparative performance of the GPDs on multiple spatiotemporal scales.

In climate science, GPDs are generally categorized into observed and reanalysis datasets; each has its strengths and weaknesses (Herold et al. 2016; Tapiador et al. 2017; Sun et al. 2018). Observed GPDs are surface precipitation gauge (SPG) observations interpolated onto a uniform grid, and they are widely used for model calibration and evaluation (Zumwald et al. 2020). In addition to the issues arising from measuring instruments, quality control, and calibration methods, the number and spatial homogeneity of functional SPG observations also impact the quality and continuous availability of data (Hofstra et al. 2010; Dunn et al. 2014; Kidd et al. 2017). Reanalysis data, on the other hand, are produced via data assimilation techniques that combine observations and model-generated forecasts to produce an optimal estimate of the atmospheric state. These products have been extensively used by the climate science community to understand climate variability, atmospheric dynamics, and hydro-meteorological characteristics (Qaiser et al. 2021; Ain et al. 2020; Latif et al. 2017; Kravtsov et al. 2014). The climate science community has long recognized the significance of observed and reanalysis precipitation estimates. In recent decades, there have been rapid advancements in measurement instruments, numerical weather prediction models, advanced data assimilation techniques, and satellite technologies, leading to the evolution of highly accurate and reliable GPDs.

Despite becoming technically more complex and analytically challenging, GPDs are still affected by uncertainties, shortcomings, and gaps, which can be traced by investigating their intrinsic structure and/or the multiple statistical and assimilation techniques used in their generation (Stampoulis and Anagnostou 2012; Tapiador et al. 2012; Beck et al. 2019). The datasets have long been questioned for their limitations over complex terrain and their limited ability to capture precipitation at multiple spatiotemporal scales (Palazzi et al. 2013). Numerous studies have been carried out on the performance evaluation of GPDs across various parts of the globe. For instance, Wang et al. (2019a) have concluded that European reanalysis Interim (Era-Interim, hereafter called ERAINT) and Japanese 55-year reanalysis (JRA55) datasets agree well with SPG observations over Qinling-Daba Mountains (China) in comparison to other datasets used. Kishore et al. (2016) have shown that GPCC correlates well with Indian Meteorological Department (IMD) station data compared to other datasets. Modern-Era Retrospective Analysis for Research and Applications (MERRA) reanalysis data show a large deviation, particularly over peninsular India. Similar studies related to performance evaluation of observed and reanalysis GPDs have been conducted over different parts of Asia (e.g., Kim et al. 2019; Nashwan et al. 2019; Rana et al. 2015; You et al. 2015; Vu et al. 2018).

In the past decades, several studies have been conducted on the performance assessment of GPDs in various zones of Pakistan. For instance, Ahmed et al. (2017) evaluated the performance of four gauge-based GPDs over Balochistan province during 1961–2007. They have concluded that GPCC shows better performance in comparison to other datasets. Ali et al. (2012) have found that APHRODITE underestimates gauge precipitation over humid and sub-humid regions of the country. Cheema and Hanif (2013) have identified a better linear relationship of GPCC with Pakistan Meteorological Department (PMD) station data (1961–2005) over the Punjab province of the country. Latif et al. (2017) have pointed out inconsistencies in various monthly GPDs in estimating the summer precipitation over the core monsoon region of Pakistan. Ain et al. (2020) have recently undertaken a comprehensive study on droughts over the Potwar region of Pakistan and reported the same inconsistencies in the performance of several GPDs. Ullah et al. (2019) have deduced a clear disagreement among a range of gridded satellite-based precipitation products against station observations of Pakistan. In addition to that, even bias-corrected APHRODITE data overestimates (underestimates) the precipitation in northern (southern) parts of the country (Nabeel and Athar 2018). Krakauer et al. (2019) have investigated several GPDs against SPG observations over the Indus basin and found that GPCC and TMPA better approximate SPG observations. Moreover, Muhammad et al. (2020) have assessed the accuracy of satellite-based rainfall estimates over the diverse areas of Pakistan and found SM2RAIN as the best-performing satellite-based data product.

In spite of the existence of previous literature on the performance assessment of various types of GPDs in different regions of Pakistan, there is no such study that encompasses the entire variety of topography of the country by employing numerous datasets and statistical metrics over a reasonably long period of time. The literature shows that the choice of GPD has often been arbitrary and random for investigating the impact of precipitation variability/change on hydrology, agriculture, and other socio-economic sectors, which may lead to a decrease in the robustness of results. Therefore, there is a need to develop unanimity on the choice of GPD which might be used as a reference for the agro-hydro-meteorological studies over the country. The main objective of this work is to evaluate the performance of several widely used observed and reanalysis GPDs comprehensively in space and time using six statistical tests over the entire country during 1981–2015. The structure of this paper is organized as follows: Section 2 describes the study area, datasets, and methodology; results are presented in Section 3; a discussion on the results is provided in Section 4; Section 5 summarises the paper and concludes the main findings of this study.

2 Data and methods

2.1 Study area

Pakistan lies in south-west Asia and extends between latitudes 23.5°N and 37°N and longitudes 60.5°E and 78°E, with diversified topography and climatic conditions (Fig. 1). It covers a total area of 796,100 km², with altitudes ranging from sea level to 8611 m at the world’s second-highest mountain peak, Mount Godwin Austen (K2), in the north. It comprises six administrative regions, namely Punjab, Sindh, Balochistan, Khyber Pakhtunkhwa (KP), Gilgit Baltistan (GB), and Azad Jammu and Kashmir (AJK) (Nawaz et al. 2021). The climate of Pakistan is mainly arid and semi-arid, with great diversity in temperature and precipitation (Adnan et al. 2017). The country receives much of its annual precipitation during the summer [June–September (JJAS)] and winter [December–March (DJFM)] seasons through the south Asian monsoon and western disturbances, respectively. The observational uncertainty of precipitation is higher in summer months (1.5 to 3.0 mm) than in winter months (1.0 to 1.7 mm) (Fig. 2b). The summer monsoon climate is found in Punjab, Sindh, isolated places of KP (e.g., Malakand and Hazara divisions), and AJK, whereas Balochistan, northern parts of KP, GB, northern Punjab, and AJK get precipitation mainly through western disturbances (Ahmed et al. 2019; Asmat and Athar 2018). The summer (winter) precipitation contributes approximately 45% (31%) of the annual precipitation of the country (Adnan et al. 2018).

Fig. 1 Elevation map of study area derived from 90-m Shuttle Radar Topography Mission (SRTM) data available at http://srtm.csi.cgiar.org/. The elevation is represented in meters. SPG station network distribution of the study area is represented by corresponding serial numbers (see Table 1 for details of SPG stations)

Fig. 2 a Spatial distribution of annual total precipitation (solid filled circles) using SPG data (mm/year) for the period 1981–2015. SPG stations with inhomogeneity detected in the data series are marked with a white dot within the point location. b Annual cycle and variability of precipitation (mm/month) and near-surface air temperature (°C) over Pakistan

The spatial distribution of mean annual precipitation during 1981–2015 over Pakistan shows that the maximum precipitation (888 to 1770 mm) is mainly observed in the core precipitation region of Pakistan (CPRP), which includes northern Punjab, Potwar plateau, and isolated places of KP (Fig. 2a). According to Latif and Syed (2016), the core monsoon region of Pakistan not only accounts for a major part of the country’s total annual precipitation but also signals the arrival of the monsoon rainy season in the country. Out of 56 SPG stations of PMD, inhomogeneity has been detected at 10 stations (Fig. 2) situated in the high-elevation north-western fringe of the country. This is a region of complex terrain that mainly receives precipitation through the western disturbances in the winter season. The highest mean temperatures over Pakistan, between 30 and 35 °C, occur in June and July, while the winter months (i.e., December and January) show the lowest temperatures, ranging from 8 to 20 °C (Fig. 2b).

2.2 Observed and reanalysis GPDs

In order to evaluate the performance of different observed and reanalysis GPDs for the period of 35 years (1981–2015) on monthly, seasonal, and annual timescales, the following products have been used: (1) monthly point-based SPG data from 56 meteorological stations (Table 1), provided by Climate Data Processing Centre (CDPC), PMD; (2) Global Precipitation Climatology Centre (GPCC) ver. 6 of National Oceanic and Atmospheric Administration (NOAA) with a horizontal resolution of 0.5° × 0.5° (Schneider et al. 2016); (3) Climatic Research Unit (CRU) ver. TS 4.03 at 0.5° grid resolution, provided by the University of East Anglia, UK (Harris et al. 2014); (4) Climate Prediction Centre (CPC) unified daily gauge-based data of NOAA at 0.5° × 0.5° resolution (Chen et al. 2008); (5) Cressman Interpolated High-resolution Gauge-based Gridded Observations (CIHGGO) data with a horizontal resolution of 0.5° (Ahmad et al. 2019); and (6) EartH2Observe, WFDEI and ERA-Interim data Merged and Bias-corrected for ISIMIP (EWEMBI) of Potsdam Institute for Climate Impact Research (Frieler et al. 2017).

Table 1 Details of SPG locations across different administrative regions of Pakistan. Index shows station numbers that are mentioned in Fig. 1 at their respective locations

In addition to the above, we have used the following reanalysis GPDs: (7) Era-Interim (ERAINT) by the European Centre for Medium-Range Weather Forecasts (ECMWF) (Dee et al. 2011); (8) the fifth generation ECMWF reanalysis ERA5, produced by the Copernicus Climate Change Service (C3S) (Hersbach et al. 2020); (9) the Twentieth Century Reanalysis (20CR) ver. 3, supported by the NOAA-Cooperative Institute for Research in Environmental Sciences (CIRES), Department of Energy (DOE) (Slivinski et al. 2019); (10) Japanese 55-year Reanalysis (JRA55) of Japan Meteorological Agency (JMA) (Kobayashi et al. 2015); and (11) Modern-Era Retrospective analysis for Research and Applications ver. 2 (MERRA2), produced by NASA’s Global Modelling and Assimilation Office (GMAO) (Gelaro et al. 2017). The horizontal resolution of the forecast model is ~ 31 km for ERAINT and ERA5, ~ 75 km for 20CR, and ~ 55 km for JRA55 and MERRA2. The number of model levels ranges between 60 and 137. The reanalysis products use an advanced data assimilation technique (4D-VAR) to produce their data. Table 2 summarizes the datasets used in this study. To account for uncertainty in observed and reanalysis products, both multi-observed and multi-reanalysis approaches have been employed (Syed et al. 2019).

Table 2 List of observed and reanalysis GPDs used in this study

2.3 Homogeneity tests

The quality and reliability of monthly SPG data have been assessed using the Pettitt test (Pettitt 1979), the Standard Normal Homogeneity Test (SNHT) (Alexandersson 1986), and the Buishand test (Buishand 1982). The results are evaluated to determine inhomogeneity in the SPG data. The level of significance is set to α = 0.05. Table 3 shows the change point probability in the annual precipitation data series over Pakistan. Out of 56 SPG stations, inhomogeneity is detected at 10 stations, which are mainly located in northern parts of Pakistan (Fig. 2). Since the majority of these SPG stations lie in the high-elevation region, the geographic location might be one of the major reasons for inhomogeneity. Other probable causes of inhomogeneities may include a shift in climate zone or monsoon circulation, changes in measurement location and observation times, human activities, and changes in land use and land cover over time (Akinsanola and Ogunjobi 2017). The inhomogeneous stations show change points in the years 1991, 1996, 1997, 1998, 1999, 2002, 2005, and 2010 (Table 3). SPG stations with inhomogeneities in annual precipitation are excluded from the analyses. Therefore, the precipitation data of the remaining 46 stations have been used as a reference for the analyses. Moreover, manual quality control tests (e.g., outliers, missing data, and temporal consistency) have also been applied to SPG data prior to performing analyses.
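As an illustration of how a change point can be located in an annual series, the following is a minimal sketch of the Pettitt test in plain Python. It is not the implementation used in this study; the function name `pettitt_test`, the synthetic series, and the use of the common approximation p ≈ 2 exp(−6K²/(n³ + n²)) for the significance level are assumptions made for the example.

```python
import math


def pettitt_test(series):
    """Return (change_index, K, p_value) for the Pettitt change-point test.

    change_index is the 0-based position of the first value after the most
    probable change point. The p-value uses the usual approximation
    p ~ 2 * exp(-6 * K**2 / (n**3 + n**2)), capped at 1.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two values")

    def sign(v):
        return (v > 0) - (v < 0)

    best_t, best_abs_u = 1, -1
    for t in range(1, n):  # split the series into series[:t] and series[t:]
        u = sum(sign(a - b) for a in series[:t] for b in series[t:])
        if abs(u) > best_abs_u:
            best_t, best_abs_u = t, abs(u)

    p_value = min(1.0, 2.0 * math.exp(-6.0 * best_abs_u ** 2 / (n ** 3 + n ** 2)))
    return best_t, best_abs_u, p_value


# Synthetic annual totals (mm) with an abrupt shift after the tenth year.
annual = [500.0] * 10 + [800.0] * 10
change_index, k_stat, p_value = pettitt_test(annual)
is_inhomogeneous = p_value < 0.05  # significance level used in this study
```

A station would be flagged as inhomogeneous when the p-value falls below the chosen significance level; the SNHT and Buishand tests would be applied to the same series in a similar way.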

Table 3 Most probable change year by Pettitt’s test, SNHT test, and Buishand’s test

2.4 Methods

The monthly total precipitation of the two observed (i.e., CPC and EWEMBI1) and four reanalysis (i.e., MERRA2, ERAINT, ERA5, and JRA55) data is calculated by accumulating daily and hourly data values. Similarly, seasonal and total annual precipitation is calculated by summing the monthly values. Prior to extracting precipitation values over 46 selected SPG stations, all the observed and reanalysis GPDs are resampled to a common 0.5° × 0.5° (~ 56 km) horizontal grid. In evaluation studies, it is a common practice to compare the SPG data to the GPD (see, for instance, Nawaz et al. 2021; Wang et al. 2019a; Ahmed et al. 2019; Hu et al. 2018). Following the data quality–control tests and resampling, monthly precipitation values are extracted at the same grid locations where SPG stations are located.
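The accumulation and extraction steps can be sketched as follows. This is only an illustrative sketch: the helper names `monthly_totals` and `nearest_cell`, the grid origin (`lat0`, `lon0`), and the nearest-cell lookup are assumptions for the example and are not taken from the processing chain of this study.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily):
    """Accumulate (date, precipitation in mm) pairs into monthly totals."""
    totals = defaultdict(float)
    for day, mm in daily:
        totals[(day.year, day.month)] += mm
    return dict(totals)


def nearest_cell(lat, lon, lat0=23.5, lon0=60.5, step=0.5):
    """Return (row, col) of the grid cell whose centre is nearest the station.

    Assumes cell centres at lat0 + row * step and lon0 + col * step
    (a hypothetical grid origin chosen for this example).
    """
    row = round((lat - lat0) / step)
    col = round((lon - lon0) / step)
    return row, col


# Synthetic daily values (mm) for illustration only.
daily = [
    (date(1981, 1, 1), 2.0),
    (date(1981, 1, 2), 3.5),
    (date(1981, 2, 1), 1.0),
]
monthly = monthly_totals(daily)
row, col = nearest_cell(33.6, 73.1)  # hypothetical station coordinates
```

Seasonal and annual totals follow by summing the relevant monthly values in the same manner.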

To quantitatively assess the performance of observed and reanalysis GPDs, commonly used statistical analysis techniques such as correlation coefficient (CC), relative bias (RB), root mean square error (RMSE), and mean absolute error (MAE) are employed on monthly, seasonal, and annual timescales. The CC is a unitless quantity used to measure the strength and direction of the linear association between two variables. The RB is used to assess the degree of under- and overestimation with respect to the reference field. A positive (negative) value of RB indicates the degree of overestimation (underestimation) of the true value. RMSE and MAE are used for measuring the average magnitude of error between observed and simulated data. MAE gives the average magnitude of error without considering its direction, whereas the RMSE gives more weight to the largest errors. The values of CC, RB, RMSE, and MAE are calculated by using the following Eqs. 1, 2, 3, and 4, respectively.

$$CC=\frac{\sum_{i=1}^n({stn}_i-\overline{stn})\;({grd}_i-\overline{grd})}{\sqrt{\sum_{i=1}^n({stn}_i-\overline{stn})^2\;\sum_{i=1}^n({grd}_i-\overline{grd})^2}}$$
(1)
$$RB=\frac{\sum_{i=1}^{n}({grd}_{i}-{stn}_{i})}{\sum_{i=1}^{n}{stn}_{i}} \times 100\%$$
(2)
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}({stn}_{i}-{grd}_{i}{)}^{2}}$$
(3)
$$MAE=\frac{1}{n}\sum_{i=1}^{n}\mid {stn}_{i}-{grd}_{i}\mid$$
(4)

where \(n\) is the total number of paired values, \(i\) indexes the months, and \({stn}_i\) and \({grd}_i\) represent, respectively, the SPG station and gridded (observed and reanalysis) precipitation data for the \(i\)th month. \(\overline{stn}\) and \(\overline{grd}\) are the mean values of SPG station and gridded data, respectively. The statistical significance of the results is checked using Student’s t-test. To identify how well the observed and reanalysis GPDs identify wet and dry years, the percentage precipitation difference (PPD) is calculated:

$$PPD=\frac{grd-stn}{\overline{stn}}\times100\%$$
(5)
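The five measures defined above can be sketched in plain Python as follows. This is a minimal illustration, not the code used in this study, and the function names are hypothetical. RB is computed here as gridded minus station so that a positive value indicates overestimation, matching the interpretation given in the text.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def correlation(stn, grd):
    """Pearson correlation coefficient between station and gridded series."""
    ms, mg = _mean(stn), _mean(grd)
    cov = sum((s - ms) * (g - mg) for s, g in zip(stn, grd))
    var_s = sum((s - ms) ** 2 for s in stn)
    var_g = sum((g - mg) ** 2 for g in grd)
    return cov / math.sqrt(var_s * var_g)


def relative_bias(stn, grd):
    """Relative bias in percent; positive means the gridded data overestimate."""
    return 100.0 * sum(g - s for s, g in zip(stn, grd)) / sum(stn)


def rmse(stn, grd):
    """Root mean square error (same unit as the input, e.g. mm)."""
    return math.sqrt(sum((s - g) ** 2 for s, g in zip(stn, grd)) / len(stn))


def mae(stn, grd):
    """Mean absolute error (same unit as the input, e.g. mm)."""
    return sum(abs(s - g) for s, g in zip(stn, grd)) / len(stn)


def ppd(stn_annual, grd_annual):
    """Percentage precipitation difference for each year, relative to the
    long-term station mean."""
    mean_stn = _mean(stn_annual)
    return [100.0 * (g - s) / mean_stn for s, g in zip(stn_annual, grd_annual)]


# Synthetic values (mm) for illustration only.
stn = [10.0, 20.0, 30.0]
grd = [12.0, 18.0, 36.0]
scores = (correlation(stn, grd), relative_bias(stn, grd), rmse(stn, grd), mae(stn, grd))
```

In practice these functions would be applied to the paired monthly, seasonal, or annual series at each SPG station.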

Precipitation centroid method is used to analyze the spatial heterogeneity of the observed and reanalysis GPDs with SPG data. The point where spatial variations in precipitation attain balance is regarded as a centroid (Liu et al. 2013; Li et al. 2015). Migration distance is calculated between 2 adjacent years within a dataset and the sum of migration distances is compared to quantitatively assess the performance of a GPD against SPG data. The following formula is used to find the coordinates (\(X\;and\;Y\)) of the precipitation centroid:

$$X=\frac{\sum_{i=1}^n{grd}_i\,x_i}{\sum_{i=1}^n{grd}_i}$$
(6)
$$Y=\frac{\sum_{i=1}^n{grd}_i\,y_i}{\sum_{i=1}^n{grd}_i}$$
(7)

where \(n\) is the number of SPG stations, \((x_i, y_i)\) is the location of the \(i\)th SPG station, and \({grd}_i\) represents the precipitation from gridded data at that location. To further evaluate the performance of GPDs on seasonal timescale, the Taylor diagram (Taylor 2001) is used.
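The precipitation centroid of Eqs. 6 and 7 and the cumulative migration distance can be sketched as follows. The function names are hypothetical, and the migration distance is taken here as the straight-line distance between consecutive centroids in coordinate units, which is an assumption for this example since the distance measure is not specified above.

```python
import math


def precipitation_centroid(precip, xs, ys):
    """Precipitation-weighted mean location (X, Y) of the station coordinates."""
    total = sum(precip)
    if total == 0:
        raise ValueError("total precipitation is zero; centroid undefined")
    x = sum(p * xi for p, xi in zip(precip, xs)) / total
    y = sum(p * yi for p, yi in zip(precip, ys)) / total
    return x, y


def total_migration(centroids):
    """Sum of distances between centroids of adjacent years."""
    return sum(
        math.hypot(b[0] - a[0], b[1] - a[1])
        for a, b in zip(centroids, centroids[1:])
    )


# Synthetic two-station example: coordinates in degrees, precipitation in mm.
xs, ys = [70.0, 72.0], [30.0, 34.0]
yearly_precip = [[100.0, 300.0], [200.0, 200.0]]
centroids = [precipitation_centroid(p, xs, ys) for p in yearly_precip]
distance = total_migration(centroids)
```

Comparing the summed migration distance of a GPD with that obtained from the SPG data gives a single number describing how closely the dataset follows the observed year-to-year shifts.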

3 Results

3.1 Climatology

The spatial distribution of monthly mean precipitation of five observed GPDs and interpolated SPG data is shown in Fig. 3 (left panels). The spatial pattern of precipitation from SPG stations clearly shows that the largest amount of precipitation (> 100 mm/month) is observed in the CPRP, extending from the north-east to the north-west of the country (Fig. 3 inset). The CPRP receives the largest amount of precipitation in both summer and winter seasons through the south Asian monsoon and western disturbances, respectively (Latif and Syed 2016). In contrast, relatively dry conditions with precipitation less than 10 mm/month are observed over the southwestern parts of the study area. Southern Punjab, Sindh, and adjoining areas of Balochistan receive relatively low precipitation, ranging from 10 to 20 mm/month. Monthly mean precipitation patterns of five observed GPDs and the multi-observed mean (MOM) are shown in Fig. 3a–e and f, respectively. The spatial distribution of mean precipitation is well captured by all the observed GPDs; however, CPC slightly underestimates precipitation in the north-eastern parts of the country. The reanalysis GPDs, on the other hand, do not capture the spatial patterns of monthly precipitation well over the study area (Fig. 3g–k). ERAINT, ERA5, and JRA55 (20CR and MERRA2) overestimate (underestimate) the mean precipitation over the northern half of the country.

Fig. 3 Mean annual precipitation of observed, reanalysis, and SPG data for the period 1981–2015 over Pakistan. Left panel (observed GPDs): a GPCC, b CPC, c CRU, d CIHGGO, e EWEMBI1, and f multi-observed mean (MOM). Right panel (reanalysis GPDs): g ERAINT, h ERA5, i 20CR, j JRA55, k MERRA2, and l Multi Reanalysis Mean (MRM). Surface gauge-based precipitation (inset)

3.2 Monthly precipitation evaluation

Figure 4a–f shows the correlation of monthly SPG data with observed GPDs. GPCC, EWEMBI1, and CIHGGO show a strong positive correlation (> 0.95) at the majority of the stations. The stations with relatively low CC values (< 0.80) are located in Gilgit Baltistan (GB), which can be attributed to the high mountain ranges, harsh, desolate terrain, and sparse network of SPG stations. The reanalysis GPDs, on the other hand, display comparatively low CC values (Fig. 4g–l). MERRA2 shows a significant positive correlation (> 0.81) at 12 stations across the country. Both ERA5 and ERAINT show a positive correlation (> 0.80) at a few stations located in the CPRP, whereas JRA55 and 20CR do not perform well at most of the stations. Overall, GPCC, CIHGGO, and EWEMBI1 perform well, with high CC values at the majority of the stations, as compared to reanalysis products. The high correlations of observed GPDs can be attributed to the fact that the uncertainties and errors in estimating atmospheric conditions are often better understood than those linked to reanalysis data (Parker 2016).

Fig. 4 Correlation of monthly SPG data with a GPCC, b CPC, c CRU, d CIHGGO, e EWEMBI1, f MOM (left panel; observed GPDs), g ERAINT, h ERA5, i 20CR, j JRA55, k MERRA2, and l MRM (right panel; reanalysis GPDs) at each SPG station over Pakistan

The spatial distribution of RB (%) suggests that GPCC and EWEMBI1 have the least bias (± 40%) at most SPG stations (Fig. 5a–l). The higher performance of GPCC may be linked to the large number of SPG stations used for data generation, whereas the good performance of EWEMBI1 may be attributed to the bias correction methods applied in the creation of this product. CPC underestimates the monthly precipitation (− 60%) at all of the SPG stations, except for Gilgit, Bunji, and Chilas in GB, and Nokkundi and Ormara in Balochistan province (Fig. 5b). Overall, all the observed GPDs (except CIHGGO) overestimate precipitation (> 80%) at stations located in the orographically complex region of GB. Among the reanalysis products, ERAINT and ERA5 overestimate precipitation (> 60%) over most of the northern parts of the country, whereas JRA55 shows overestimation (80%) over the eastern half of the country. 20CR underestimates (− 41 to − 60%) at most of the stations. All reanalysis GPDs show a bias above 100% in the high mountainous GB region.

Fig. 5 Relative bias (%) of monthly SPG data with a GPCC, b CPC, c CRU, d CIHGGO, e EWEMBI1, f MOM (left panel; observed GPDs), g ERAINT, h ERA5, i 20CR, j JRA55, k MERRA2, and l MRM (right panel; reanalysis GPDs) at each SPG station over Pakistan

In terms of RMSE and MAE, all the observed and reanalysis datasets show lower error in the southern half compared to the northern mountainous/sub-mountainous region of the country (Supplementary Figs. S1 and S2). Among the observed GPDs, GPCC and EWEMBI1 have the lowest error values (< 60 mm). Of reanalysis products, MERRA2 shows the lowest RMSE and MAE (< 10 to 40 mm) in the GB region. JRA55 shows high RMSE (80 to > 100 mm) in the eastern flank (Punjab and Sindh provinces) of the study area whereas MAE in the same region ranges between 41 and 60 mm.

To assess the overall performance of GPDs on a monthly timescale, average values of four statistical tests are analyzed (Table 4). GPCC performs well with a high CC value of 0.96, low bias error (8.20%), RMSE (5.46 mm), and MAE (3.42 mm). EWEMBI1 displays comparable results to GPCC and can be ranked as the second-best performing observed GPD. CIHGGO reveals a similar ability to produce a spatial distribution of correlations (0.95); however, this data overestimates the monthly precipitation by 15.24% which is the highest among all the observed GPDs. CRU shows a better performance in terms of relative bias (~ 5%). CPC does not perform well in terms of four statistical assessments. Among the reanalysis datasets, ERA5 and ERAINT perform better with an average CC value of 0.92 and 0.80, respectively; however, ERA5 overestimates the precipitation by 64% with average RMSE and MAE of 18 mm and 16 mm, respectively. MERRA2 and 20CR underestimate by − 18% and − 29.31%, respectively. MERRA2 has an average RMSE (MAE) value of 13.52 mm (9.65 mm). JRA55 significantly overestimates, although it ranks third in terms of correlation with a CC value of 0.83. Multi-observed-mean (MOM) and multi-reanalysis-mean (MRM) results show a close approximation to the SPG data, hence, proving the effectiveness of the multi-mean approach to cater for the observational uncertainty.

Table 4 The average of CC, RB (%), RMSE (mm), and MAE (mm) between observed and reanalysis GPDs and SPG data on a monthly timescale during 1981–2015 over Pakistan

3.2.1 Statistical assessments during an annual cycle

Four statistical tests are conducted for each month and plotted on a line chart for comparative analyses (Fig. 6). Among the observed GPDs, GPCC and EWEMBI1 perform well in all months of the annual cycle (Fig. 6a–d). CIHGGO has similar results, except in October, when its CC value drops to 0.84 with a high bias error (~ 30%). It is also noted that all the observed GPDs overestimate in dry months (i.e., October and November). CRU shows better performance in the winter months than in summer. CPC performs the worst in all months (avg. RB − 19%) with no clear agreement with station data. During peak monsoon months (July–August), CPC shows the highest error values (between 23 and 35 mm) compared to other datasets. During November (the driest month), all datasets perform well with low error values. MOM has a relatively better average performance compared to individual data products throughout the annual cycle.

Fig. 6 The line chart of the correlation coefficient, RB (%), RMSE (mm), and MAE (mm) between monthly SPG and observed (a–d) and reanalysis (e–h) GPDs for the period 1981–2015 over Pakistan

Among the reanalysis products (Fig. 6e–h), ERA5 and ERAINT simulate the annual cycle reasonably well, with average CC values of 0.92 and 0.87, respectively. These two datasets generally overestimate the precipitation in all months, with a larger bias error (85% to 150%) in October and November. The RMSE and MAE are higher during the summer and winter rainy months. These results are consistent with previous studies (e.g., Wang et al. 2019b) that confirm the association of larger errors with the months of higher rainfall concentration. MERRA2 does not agree well with SPG data during March–April and July–August but performs better during post-monsoon and winter months. It generally overestimates precipitation, which is comparable to ERA5 and ERAINT but with slight underestimation (overestimation) in November (May–August). 20CR does not show a good correlation in most of the months and underestimates with lower error values throughout the annual cycle. JRA55 has the worst performance in the summer rainy season and overestimates the precipitation in most of the annual cycle. JRA55 shows the highest error values in the summer monsoon months. MRM demonstrates the highest correlation and lowest RMSE and MAE. Overall, the highest average correlation and lowest errors are shown by ERA5 and MERRA2, respectively.

3.3 Seasonal precipitation evaluation

3.3.1 Summer

Figure 7 shows the spatial distribution of CC for summer precipitation at each SPG station. The results show that EWEMBI1 ranks first, with CC values greater than 0.95 at 21 SPG stations. GPCC and CIHGGO rank as the second- and third-best performing observed datasets, with 15 and 13 SPG stations having CC > 0.95, respectively. Most of these stations are located in the southern half of the country, including central and southern Punjab, and the southernmost Sindh and Balochistan. CPC has correlation values of < 0.70 at most stations in the KP province and northern Punjab, whereas it ranges from 0.81 to 0.90 in the southern provinces of Sindh and Balochistan. Similarly, CRU results are comparable to CPC, with CC values < 0.70 at the majority of stations. The high correlation in summer for the southern half of the country may be attributed to less complex terrain and lower rainfall amounts.

Fig. 7 Correlation of summer (JJAS) SPG data with a GPCC, b CPC, c CRU, d CIHGGO, e EWEMBI1, f MOM (left panel; observed GPDs), g ERAINT, h ERA5, i 20CR, j JRA55, k MERRA2, and l MRM (right panel; reanalysis GPDs) at each SPG station over Pakistan

In terms of RB (Fig. 8), GPCC and EWEMBI perform the best, with ± 20 to 40% of bias at the majority of stations. CIHGGO shows a higher RB (80%) at some stations randomly distributed across the country. It is noted that all the observed GPDs show a positive RB in the high-elevation northern areas of the country. For the observed GPDs (GPCC, EWEMBI1, and CIHGGO), SPG stations located in south-western and south-eastern parts of the country show comparatively low error values (100 mm) compared to those in the CPRP (Figs. S3 and S4). The higher error values may be due to the high variability of precipitation associated with the simultaneous occurrence of convective phenomena and the interaction of easterly/westerly weather systems, which cause heavy rainfall in this region.

Fig. 8 Relative bias (%) of summer (JJAS) SPG data with a GPCC, b CPC, c CRU, d CIHGGO, e EWEMBI1, f MOM (left panel; observed GPDs), g ERAINT, h ERA5, i 20CR, j JRA55, k MERRA2, and l MRM (right panel; reanalysis GPDs) at each SPG station over Pakistan

The reanalysis products show a weak correlation with SPG data (Fig. 7g–l). ERAINT, ERA5, and 20CR correlate poorly with the summer precipitation (CC < 0.5) in the northern parts of the study area; however, correlation values of ERA5, ERAINT, and MERRA2 are slightly higher in the southern parts. JRA55 shows the weakest correlation with SPG data compared to other datasets. In terms of RB, all reanalysis products show larger errors at most of the stations (Fig. 8g–l). ERAINT and ERA5 display RB of > 100% at stations in the north and north-western parts of the country. JRA55 significantly overestimates (> 100%), whereas 20CR and MERRA2 underestimate the summer precipitation at the majority of stations except in GB and southern Balochistan. The magnitude of error values for ERAINT and ERA5 is greater in the northern half of the country, including the CPRP and the northern mountainous region of GB (Figs. S3 and S4). Similarly, 20CR and MERRA2 show error values ranging between 151 and > 800 mm in GB, whereas JRA55 shows higher RMSE in southern Punjab and Sindh province.

Based on the above results, it may be concluded that GPCC and EWEMBI1 are identified as the best-performing datasets for the summer season among all the observational GPDs. CIHGGO and CRU can be ranked as the second and third well-performing observed GPDs. Among the reanalysis products, ERA5 and ERAINT show relatively better performance for the summer season. JRA55 significantly overestimates, whereas 20CR and MERRA2 underestimate the summer precipitation.

3.3.2 Winter

The results show that GPCC and EWEMBI1 have good performance for winter precipitation with CC > 0.95 at 21 SPG stations (Fig. 9). CIHGGO is the only observed GPD that performs relatively better in the extreme north high-mountainous areas of the country where CC value reaches > 0.95 over Gilgit and Skardu stations. CPC shows a weak correlation with a CC value of < 0.90 at all stations, whereas CRU performs relatively better in the northern part of the country. MOM shows the highest correlation (CC > 0.95) at 13 SPG stations scattered across the country.

Fig. 9 Correlation of winter (DJFM) SPG data with a GPCC, b CPC, c CRU, d CIHGGO, e EWEMBI1, f MOM (left panel; observed GPDs), g ERAINT, h ERA5, i 20CR, j JRA55, k MERRA2, and l MRM (right panel; reanalysis GPDs) at each SPG station over Pakistan

In terms of RB (Fig. 10), all the observed GPDs except CIHGGO show a small bias (within ± 40%) at most SPG stations. These datasets generally overestimate (> 100%) winter precipitation in high-mountainous areas, particularly at the Gilgit and Bunji stations. Similarly, CIHGGO overestimates at scattered stations across the country except for GB. GPCC shows an RMSE of > 151 mm at a single SPG station, whereas CIHGGO and EWEMBI1 show it at two stations. The MAE for GPCC, CIHGGO, EWEMBI1, and MOM is below 150 mm at all stations (Figs. S5 and S6).

Fig. 10

Relative bias (%) of winter (DJFM) SPG data with a GPCC, b CPC, c CRU, d CIHGGO, e EWEMBI1, f MOM (left panel; observed GPDs), g ERAINT, h ERA5, i 20CR, j JRA55, k MERRA2, and l MRM (right panel; reanalysis GPDs) at each SPG station over Pakistan

The reanalysis GPDs show a weak correlation with SPG data for the winter season (Fig. 9g–l). ERA5 performs better in the southern parts of the country, with CC values ranging between 0.81 and 0.90 at 20 stations located there. Correlation values between 0.81 and 0.90 are observed at 10, 9, 5, and 4 stations for ERAINT, JRA55, MERRA2, and 20CR, respectively. ERAINT and ERA5 overestimate (RB > 60%) winter precipitation at the majority of stations located in the central and northern parts of the country, and the bias is highest (> 100%) in the high-mountainous GB region. 20CR underestimates (− 60 to − 80%) the winter precipitation in most areas of the southern half of the country. JRA55 and MERRA2 show a high value of RB (± 60%) at most of the stations. All the reanalysis GPDs show their highest positive RB (> 100%) in the GB region. For the observed and reanalysis GPDs, the RMSE ranges from 100 to 250 mm at stations lying in the CPRP (Fig. S5). The highest RMSE values (250 to 500 mm) in the reanalysis datasets are observed in the northern high-mountainous GB region (Fig. S5b). The weak correlation and high error values in the GB region may be attributed to its complex terrain, variable climate, and precipitation in the form of snow.

The overall performance of every single GPD on a seasonal timescale is presented in Table 5. A large amount of rainfall over Pakistan occurs during two rainy periods, i.e., summer monsoon (July–September) and western disturbances (December–March) which contribute 45% and 31% to annual rainfall, respectively (Adnan et al. 2018). Table 5 clearly shows that GPCC, EWEMBI1, and CIHGGO show a very strong correlation with CC values of 0.98, 0.98, and 0.97, respectively, during the summer season, whereas CRU and CPC have correlation values of 0.90 and 0.76, respectively. Among all the observed GPDs, the performances of GPCC and EWEMBI1 may be ranked at the top in terms of their high correlation (0.98), low relative bias, and error values. However, these two datasets show relatively weak performance during the winter season.

Table 5 The average of CC, RB (%), RMSE (mm), and MAE (mm) between observed and reanalysis GPDs and SPG data at seasonal (summer, JJAS and winter, DJFM) timescale during 1981–2015 over Pakistan (all products passed the significance test at 5% significance level)

The reanalysis products show relatively lower correlation values (< 0.90) with station data during the summer season. ERA5 performs the best among the reanalysis datasets, with a correlation of 0.89 and an RB of 39.73%. ERAINT produces a lower CC than ERA5, while its error values have almost the same magnitude. JRA55 does not perform well during the summer season, with CC 0.67, RB 56%, RMSE 153 mm, and MAE 129 mm.

The Taylor diagram (Fig. 11) summarizes how closely the spatial patterns of the observed and reanalysis GPDs match the SPG data during the summer and winter seasons. It combines three statistical quantities, i.e., spatial correlation, root-mean-square (RMS) difference, and standard deviation, in one diagram. Our results clearly show that the GPCC and EWEMBI1 observed GPDs perform very well during both summer and winter seasons, with a high CC value, low RMS error, and standard deviation close to that of the SPG data. These datasets show relatively better performance during summer compared to the winter season. The reanalysis products, on the other hand, do not perform well compared to the observed GPDs. Among them, ERA5, ERAINT, and MERRA2 show a relatively better relationship with the SPG data during both seasons.
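
The three quantities that a Taylor diagram combines can be illustrated with a minimal Python sketch. The station values below are hypothetical and are not taken from this study; the function name `taylor_statistics` is likewise only illustrative. The sketch assumes the usual convention in which the RMS difference is computed after removing each field's mean (the centered RMS difference), so that the three quantities are linked by a law-of-cosines relation.

```python
import math

def taylor_statistics(reference, estimate):
    """Return (std_ref, std_est, correlation, centered RMS difference)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_e = sum(estimate) / n
    # Anomalies of each field about its own mean
    dr = [r - mean_r for r in reference]
    de = [e - mean_e for e in estimate]
    std_r = math.sqrt(sum(x * x for x in dr) / n)
    std_e = math.sqrt(sum(x * x for x in de) / n)
    corr = sum(a * b for a, b in zip(dr, de)) / (n * std_r * std_e)
    # Centered RMS difference: RMS of the difference between the anomalies
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dr, de)) / n)
    return std_r, std_e, corr, crmsd

# Hypothetical seasonal totals (mm) at five stations; not data from the paper
spg = [120.0, 340.0, 85.0, 410.0, 230.0]
gpd = [135.0, 310.0, 100.0, 450.0, 215.0]

std_r, std_e, corr, crmsd = taylor_statistics(spg, gpd)

# Geometric relation that lets all three quantities share one diagram:
# crmsd^2 = std_r^2 + std_e^2 - 2 * std_r * std_e * corr
identity_rhs = std_r ** 2 + std_e ** 2 - 2.0 * std_r * std_e * corr
print(round(corr, 3), round(crmsd, 1), round(math.sqrt(identity_rhs), 1))
```

Because of this relation, a dataset plotted close to the reference point has, at the same time, a high correlation, a small centered RMS difference, and a standard deviation similar to that of the reference.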

Fig. 11

Taylor diagram representing a statistical comparison of gridded observed and reanalysis precipitation with SPG data for summer (JJAS; left) and winter (DJFM; right) seasons for the period 1981–2015

3.4 Annual precipitation evaluation

The results presented in this section are obtained on an annual timescale (figure not shown). GPCC, EWEMBI1, and CIHGGO have CC > 0.95 at 21, 17, and 10 stations, respectively. CPC and CRU show CC < 0.80 at most of the stations, especially in the northern parts of the country. GPCC and EWEMBI1 have low error values in the south and southwestern parts, which are characterized by scarce precipitation; however, these datasets show higher values (250 to 500 mm) in the CPRP. CRU and CIHGGO have RMSE ranging between 250 and 500 mm, whereas CPC has errors in the range of 500 to 800 mm at stations located in the core monsoon belt, making it the worst-performing dataset among all the observed GPDs.

Among the reanalysis products, 20CR and JRA55 show a weak correlation at the majority of stations (figure not shown). Although ERA5, ERAINT, and MERRA2 show a high correlation (0.81 to 0.90) in Sindh province, all five individual products and their multi-mean (MRM) show a weak correlation (< 0.5) in the northern parts. ERA5 and ERAINT show relatively higher values of RMSE at GB (~ 800 mm) and at stations in the core monsoon belt (250 to 800 mm). The RMSE values fall below 250 mm in southern Punjab, Sindh, and Balochistan provinces. JRA55 has the highest RMSE (> 800 mm) at most of the stations of KP, Punjab, Sindh, and GB, making it the worst-performing reanalysis GPD for the region. 20CR generally underestimates the annual precipitation at most of the stations. In contrast, JRA55 overestimates the precipitation, especially in the southern and eastern parts of Punjab, Sindh, Balochistan, and GB.

Table 6 presents the overall performance of GPDs on an annual timescale. The table shows that GPCC performs well, with a high CC value (0.94), low bias error (5.83%), RMSE (35.36 mm), and MAE (25.06 mm). EWEMBI1 results are comparable to GPCC, and it may be ranked as the second-best performing observed GPD. CIHGGO shows a similarly high correlation (0.94); however, it overestimates the annual precipitation by 13.09%, which is the highest among all the observed GPDs. CRU has a high CC (0.91) together with low RMSE (32.70 mm) and MAE (25.55 mm). CPC underestimates the precipitation with an average RB of − 23.15% and correlates the least (CC 0.61) with SPG data.

Table 6 The average of CC, RB (%), RMSE (mm), and MAE (mm) between observed and reanalysis GPDs and SPG data at annual timescale during 1981–2015 over Pakistan (all products passed the significance test at 5% significance level)

Among the reanalysis GPDs, ERA5 performs better, with an average CC value of 0.87; however, it overestimates the precipitation by 51.2%, with average RMSE and MAE of 201.5 mm and 197.6 mm, respectively. ERAINT shows a similar ability to ERA5 but with a slightly lower correlation (0.81). MERRA2 and 20CR underestimate the precipitation, with RB of − 22.46% and − 31.67%, respectively. JRA55 shows the largest overestimation, with RB of 58.7%. The MOM and MRM results closely approximate the SPG data, which supports the effectiveness of the multi-mean approach in accounting for observational uncertainty.
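
As an illustration of the scores reported in Tables 5 and 6, the sketch below computes CC, RB, RMSE, and MAE for a gridded series against a station series, and forms a simple multi-mean of two products. It assumes the commonly used definitions (Pearson correlation, and RB as the summed difference divided by the summed station precipitation); the exact equations used in this study are given in the methods section and are not reproduced here. All numbers and the names `evaluate` and `multi_mean` are hypothetical.

```python
import math

def evaluate(gpd, spg):
    """Return CC, RB (%), RMSE, and MAE of a gridded series against station data."""
    n = len(spg)
    mean_g = sum(gpd) / n
    mean_s = sum(spg) / n
    cov = sum((g - mean_g) * (s - mean_s) for g, s in zip(gpd, spg))
    var_g = sum((g - mean_g) ** 2 for g in gpd)
    var_s = sum((s - mean_s) ** 2 for s in spg)
    cc = cov / math.sqrt(var_g * var_s)
    # Assumed RB definition: total difference relative to total station precipitation
    rb = 100.0 * sum(g - s for g, s in zip(gpd, spg)) / sum(spg)
    rmse = math.sqrt(sum((g - s) ** 2 for g, s in zip(gpd, spg)) / n)
    mae = sum(abs(g - s) for g, s in zip(gpd, spg)) / n
    return {"CC": cc, "RB": rb, "RMSE": rmse, "MAE": mae}

def multi_mean(products):
    """Element-wise arithmetic mean of several equally long series."""
    return [sum(values) / len(values) for values in zip(*products)]

# Hypothetical annual totals (mm); not data from the paper
spg = [100.0, 200.0, 300.0, 400.0]
wet_product = [150.0, 260.0, 330.0, 460.0]  # overestimates everywhere
dry_product = [70.0, 160.0, 250.0, 320.0]   # underestimates everywhere
mean_product = multi_mean([wet_product, dry_product])

scores = {
    "wet": evaluate(wet_product, spg),
    "dry": evaluate(dry_product, spg),
    "mean": evaluate(mean_product, spg),
}
for name, result in scores.items():
    print(name, {key: round(value, 2) for key, value in result.items()})
```

In this constructed case the two products have biases of opposite sign, so they partly cancel in the multi-mean. This is only an illustration; a multi-mean does not reduce errors when the individual products share the same bias.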

3.5 Assessment in wet and dry years

We further evaluated the performance of the GPDs based on how well they identify the wet and dry years of the SPG data (Table 7). The 35 years of annual precipitation data were divided into wet and dry years based on the difference between each year's annual mean precipitation and the mean of the whole study period. The percentage of precipitation difference (PPD) is calculated between SPG stations and GPDs (see Eq. 5). According to the SPG data, 19 years are wet years and the remaining 16 years are dry years. The PPDs during wet and dry years are shown in Table 7.
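
The wet/dry split and the resulting detection accuracy can be sketched as follows. This is a minimal illustration under a stated assumption: a year of a gridded product is labelled wet when its annual value exceeds the long-term mean of the station (SPG) series, and dry otherwise. The PPD definition of Eq. 5 is not reproduced here, and the year indices, values, and function names are hypothetical.

```python
def classify_years(annual_values, baseline):
    """Label each year 'wet' if its value exceeds the baseline, otherwise 'dry'."""
    return {
        year: ("wet" if value > baseline else "dry")
        for year, value in annual_values.items()
    }

def detection_accuracy(reference_labels, product_labels, category):
    """Percentage of reference years of one category that the product labels the same way."""
    years = [y for y, label in reference_labels.items() if label == category]
    hits = sum(1 for y in years if product_labels[y] == category)
    return 100.0 * hits / len(years)

# Hypothetical annual precipitation (mm) for six years, indexed 1..6
spg = {1: 250.0, 2: 310.0, 3: 420.0, 4: 280.0, 5: 390.0, 6: 350.0}
gpd = {1: 300.0, 2: 345.0, 3: 450.0, 4: 320.0, 5: 400.0, 6: 330.0}

# Assumption: both series are compared against the SPG long-term mean
baseline = sum(spg.values()) / len(spg)
spg_labels = classify_years(spg, baseline)
gpd_labels = classify_years(gpd, baseline)

wet_accuracy = detection_accuracy(spg_labels, gpd_labels, "wet")
dry_accuracy = detection_accuracy(spg_labels, gpd_labels, "dry")
print(round(wet_accuracy, 1), round(dry_accuracy, 1))
```

Under the same counting, a product that identifies 18 of the 19 wet years and 4 of the 16 dry years would score about 95% for wet years and 25% for dry years.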

Table 7 The Percentage of Precipitation Differences (PPDs) between observed and reanalysis GPDs and SPG data for wet and dry years during 1981–2015 over Pakistan

Among the observed GPDs, GPCC does not identify dry years except for 1998, 2000, 2001, and 2004; however, it identifies all wet years except 1997. CPC classifies all years as dry except for 1981 and 1982. CRU confirms 1991, 1993, 1999, 2000, and 2012 as dry years and categorizes the rest as wet years. Apart from 1997, CIHGGO classifies all the years as wet years. EWEMBI1 identifies all dry years as wet years except 1998, 2000, and 2001, and identifies all wet years as wet except 1997. MOM confirms half of the dry years and, thus, performs the best in identifying the dry years.

Among the reanalysis GPDs, ERAINT, ERA5, and JRA55 identify all 35 years as wet years and are thus unable to differentiate between wet and dry years. 20CR identifies all years as dry years. MERRA2 confirms 1981 and 1982 as wet years while identifying the rest of the years as dry years. MRM tends to reduce the percentage of precipitation difference but does not assist in the identification of wet and dry years because of larger errors. The majority of the reanalysis GPDs, thus, show poor performance in differentiating dry years from wet years.

3.6 Precipitation centroid

Precipitation centroids of the SPG data and the observed and reanalysis GPDs are calculated for each year, and the results are presented in Fig. 12. The precipitation centroids of the SPG stations and observed GPDs are situated close to the CPRP and show a northeast-southwest spatial distribution pattern. Among the reanalysis GPDs, most of the precipitation centroids of ERAINT and ERA5 are located farther to the north-east than those of the SPG data. By definition, the precipitation centroid lies closer to the regions that receive the highest amount of precipitation, which indicates that the ERAINT and ERA5 datasets overestimate precipitation in the north and north-eastern parts of the country. MERRA2 has the best-fit spatial distribution of precipitation centroids with reference to the SPG data. 20CR has a pattern almost similar to that of the stations, with a slight shift towards the north because abundant precipitation is found there. The JRA55 precipitation centroid falls in the same latitude–longitude region but with a significant spread from south-west to north-east.
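
The behaviour described above, in which the centroid is drawn towards the wettest areas, can be illustrated with a precipitation-weighted mean of station coordinates. This is a minimal sketch that assumes the common center-of-gravity form (coordinates weighted by precipitation); the exact centroid equation used in this study is given in the methods section and is not reproduced here. The coordinates, values, and function name are hypothetical.

```python
def precipitation_centroid(points):
    """Return (lon, lat) of the precipitation-weighted centroid.

    points: iterable of (lon, lat, precipitation) tuples.
    """
    points = list(points)
    total = sum(p for _, _, p in points)
    lon_c = sum(lon * p for lon, _, p in points) / total
    lat_c = sum(lat * p for _, lat, p in points) / total
    return lon_c, lat_c

# Hypothetical stations (lon, lat, annual precipitation in mm); not data from the paper
stations = [
    (67.0, 25.0, 100.0),  # southern station
    (71.0, 30.0, 200.0),  # central station
    (74.0, 35.0, 100.0),  # northern station
]
base_lon, base_lat = precipitation_centroid(stations)

# Same stations, but with precipitation overestimated at the northern one
wet_north = stations[:2] + [(74.0, 35.0, 300.0)]
north_lon, north_lat = precipitation_centroid(wet_north)

print(base_lon, base_lat)
print(round(north_lon, 2), round(north_lat, 2))
```

Tripling the precipitation at the northern station moves the centroid northward and eastward, which is the mechanism by which an overestimation in the north shifts a dataset's centroid relative to the station-based one.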

Fig. 12

Precipitation centroids of SPG data (m), observed GPDs: GPCC (a), CPC (b), CRU (c), CIHGGO (d), EWEMBI1 (e), and MOM (f), and reanalysis GPDs: ERAINT (g), ERA5 (h), 20CR (i), JRA55 (j), MERRA2 (k), and MRM (l) over Pakistan during 1981 to 2015. The black box in figure (n) represents the data range in figures a–l

The centroid migration distance is summed to estimate the degree of shift in the center of gravity of GPDs relative to SPG data (Table 8). The total migration distance of the SPG data is 2286.3 km. The total migration distances of GPCC, CIHGGO, and EWEMBI1 do not differ significantly from that of the SPG data. The precipitation centroid of CIHGGO has migrated 2283.1 km and is thus the closest to the SPG data, followed by GPCC (2334.6 km) and EWEMBI1 (2348.8 km). CPC and CRU, with total migration distances of 2026.5 km and 1813.3 km, respectively, deviate the most from the SPG data.
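
A total migration distance of this kind can be sketched as the sum of the distances between consecutive yearly centroids. The sketch below assumes great-circle (haversine) distances on a sphere of radius 6371 km, which may differ from the distance measure used in this study; the short track is hypothetical, whereas the totals in `totals_km` are the values quoted in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def total_migration_km(centroids):
    """Sum of distances between consecutive (lon, lat) centroids."""
    return sum(haversine_km(*a, *b) for a, b in zip(centroids, centroids[1:]))

# Hypothetical three-year track: one degree east along the equator, then back
track = [(70.0, 0.0), (71.0, 0.0), (70.0, 0.0)]
track_total = total_migration_km(track)

# Totals quoted in the text (km), compared with the SPG total
spg_total_km = 2286.3
totals_km = {"GPCC": 2334.6, "CPC": 2026.5, "CRU": 1813.3,
             "CIHGGO": 2283.1, "EWEMBI1": 2348.8}
deviation_km = {name: abs(value - spg_total_km) for name, value in totals_km.items()}
ranking = sorted(deviation_km, key=deviation_km.get)

print(round(track_total, 1))
print(ranking)
```

Ranking the observed GPDs by the absolute difference from the SPG total reproduces the order given in the text: CIHGGO is closest, followed by GPCC and EWEMBI1, with CPC and CRU deviating the most.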

Table 8 The migration distance (km) of precipitation centroid of observed and reanalysis GPDs and SPG data during 1981–2015 over Pakistan

Regarding the reanalysis GPDs, the ERAINT precipitation centroid shows the farthest migration towards the north-east (1540.8 km) compared to ERA5 (2004.4 km) and the other datasets. The JRA55 centroid migrates farthest towards the south, with a migration distance of 2941 km, relative to the SPG data. 20CR and MERRA2 show a shift in centroid towards the north-east. The centroid of JRA55 is spread from south-west to north-east throughout the extent of the country. In 2006 and 2007, the precipitation centroid of JRA55 shifts significantly towards the south, which indicates that JRA55 overestimates precipitation in the southern parts of the country during these two years.

4 Discussion

4.1 Reasons for the difference in performance

GPCC correlates the best with SPG data compared to the other observed and reanalysis GPDs, whereas CRU and CPC generally underestimate the precipitation at the majority of the stations. Previous studies found a significant correlation between GPCC and SPG observations over different arid and semi-arid regions of Pakistan (e.g., Ahmed et al. 2019; Krakauer et al. 2019; Cheema and Hanif 2013). Furthermore, EWEMBI1 and CIHGGO are ranked as the second-best observed GPDs in terms of the statistical assessments conducted.

One possible reason for the better performance of GPCC is the large number of SPG stations used to construct the data. According to Schneider et al. (2014), GPCC comprises monthly precipitation data acquired from approximately 85,000 observing stations all over the world, and the number has increased since 2016 to more than 100,000 stations (Schneider et al. 2016). The GPCC dataset is spatially interpolated with an inverse distance weighting (IDW) scheme, using a modified spherical adaptation of SPHEREMAP (Willmott et al. 1985), which is highly robust and considers orography as a factor affecting the distribution of precipitation estimates over complex terrain (Salman et al. 2019). Weedon et al. (2014) and Frieler et al. (2017) credited the enhanced performance of EWEMBI to the GPCC v5/v6 monthly precipitation totals used for its bias correction. Thus, EWEMBI1 inherits the advantages of the robust interpolation technique and the number of SPG observations included in GPCC. Although CIHGGO performs well in replicating the reference climatic features, it is a dedicated product for Pakistan and its efficiency remains restricted to the influence of only 31 stations (Ahmad et al. 2019). The modest performance of CPC may be attributed to the smaller number of SPG observations from the region incorporated into the dataset.
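
The basic idea of inverse distance weighting can be shown with a short sketch. This is a generic planar IDW estimate with a power of 2 and is not the SPHEREMAP scheme used for GPCC, which works on the sphere and includes further adjustments; the station values and the function name `idw_estimate` are hypothetical.

```python
import math

def idw_estimate(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at target = (x, y).

    stations: list of ((x, y), value) pairs in arbitrary planar units.
    """
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a station: return its value directly
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical monthly totals (mm) at two stations
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 30.0)]

midpoint = idw_estimate(stations, (1.0, 0.0))   # equally distant from both stations
at_station = idw_estimate(stations, (0.0, 0.0))  # exactly on the first station
near_first = idw_estimate(stations, (0.5, 0.0))  # closer to the first station

print(midpoint, at_station, round(near_first, 2))
```

Because each grid value is a weighted average of nearby station values, the quality of an IDW-based product depends strongly on how many stations lie near each grid cell, which is consistent with the role of station density discussed above.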

Unlike the observed GPDs, the reanalysis products perform inadequately overall; however, ERA5 performs comparatively better than the other reanalysis datasets in terms of the statistical tests used. Previous studies show the superiority of ERA5 over MERRA2 and JRA55 reanalysis data (e.g., Krakauer et al. 2019), which is in line with the results of the present study. The assimilation techniques used in producing reanalysis datasets play a crucial role in their ability to simulate the actual precipitation received on the ground. Compared to the 80 km horizontal resolution of ERAINT, ERA5 resolves the atmosphere with a finer spatial grid of 31 km (Hersbach et al. 2019; Wang et al. 2019b). According to Palazzi et al. (2013) and Ghodichore et al. (2018, 2019), ERAINT strongly overestimates precipitation in the high mountainous region (e.g., GB) of Pakistan. In previous studies, the overestimation of precipitation in JRA55 is attributed to the spin-down problem, which adds an artificial source of excessive precipitation after the initiation of the forecasts (Kobayashi et al. 2015; Ghodichore et al. 2018, 2019). Products like MERRA2, which assimilate satellite-based observations, struggle to replicate precipitation over rugged terrains like those of Pakistan (Liu and Margulis 2019). Slivinski et al. (2019) reported systematic biases in the representation of tropical precipitation in the 20CR dataset, which are attributed to its coarse resolution.

4.2 Difference in spatiotemporal scale performance

The results show that most of the observed GPDs capture the spatiotemporal patterns of precipitation well over the entire country. GPCC is ranked at the top, followed by EWEMBI1 and CIHGGO. The majority of datasets show low correlation values over the high mountainous region of GB. Among the reanalysis products, ERA5 and ERAINT generally overestimate precipitation over the CPRP during the summer season. Monthly precipitation analysis shows that MERRA2 and 20CR underestimate precipitation at the majority of SPG stations. In comparison to summer, reanalysis products perform relatively better in winter over the northern parts. The reanalysis GPDs are good at approximating precipitation from large-scale weather systems resulting from western disturbances over Pakistan. The GPDs lack the skill to capture abrupt convective spells of precipitation at a fine spatial scale, which is largely due to their modest grid resolutions.

JRA55 overestimates precipitation in the south-eastern parts of the country and exaggerates precipitation during the summer monsoon season. The reanalysis GPDs display large errors and greater deviations from the best-fit value at six SPG stations in GB, for which complex orography and high altitude could be held responsible. The six SPG stations are located at an average altitude of 1786 m above sea level (a.s.l.), with Skardu at 2317 m a.s.l. (see Fig. 1). Reanalysis datasets fail to correctly reproduce actual precipitation at such high elevations. The sparse density of SPG stations may further reduce confidence in the estimated precipitation results, since the existing SPG station network struggles to represent actual hydrological balances over the high-altitude regions of the upper Indus basin (Lutz et al. 2016). Most GPDs show very low CC values and high RB in the high-altitude regions. Another source of uncertainty in the precipitation estimates arises because reanalysis datasets approximate the total of all liquid, solid, convective, and advective forms of precipitation, whereas SPG observations are limited in detecting solid precipitation, which may produce a deficiency in the total recorded precipitation per unit of time (Kochendorfer et al. 2021).

5 Summary and conclusions

Pakistan, a country with diversified topography and variable climatic conditions, is significantly affected by changes in precipitation patterns during the summer monsoon and winter seasons. Its economic activities are largely based on agriculture and can easily be impacted by variations in precipitation. In this study, the performance of various observed (GPCC, CPC, CRU, CIHGGO, and EWEMBI1) and reanalysis (ERAINT, ERA5, 20CR, JRA55, and MERRA2) GPDs is evaluated against SPG data on monthly, seasonal, and annual timescales during 1981–2015. Several statistical assessments, such as CC, RB, RMSE, MAE, wet-dry years, and precipitation centroid, are conducted. The major findings of this study are summarized below:

Among the observed GPDs, GPCC is ranked at the top, with a high CC value of 0.96 and low RB (8.20%), RMSE (5.46 mm), and MAE (3.42 mm) on a monthly timescale. In terms of identifying the wet-dry years with respect to SPG data, GPCC detects wet (dry) years with an accuracy of 95% (25%). The bias-adjusted EWEMBI1 data perform consistently with GPCC and, hence, prove to be a good alternative to GPCC for hydro-meteorological studies over Pakistan. CIHGGO also shows a good relationship with SPG data (CC 0.95, RB 15.24%, RMSE 7.5 mm/mon, and MAE 5.12 mm/mon). It shows good (poor) performance in detecting wet (dry) years. CRU can be ranked as the third-best performing dataset, with values of CC (0.90), RB (4.98%), RMSE (8.11 mm), and MAE (5.74 mm) on a monthly basis. However, it can be considered the second-best dataset in identifying both wet (68%) and dry (31%) years. CPC may be considered a poor-performing dataset compared to other observed products, with values of CC (0.80), RB (− 19.85%), RMSE (13.66 mm/mon), and MAE (9.77 mm/mon). In terms of detecting wet and dry years, its accuracy is high (low) for dry (wet) years. All the observed GPDs show low correlation values over the high mountainous region of GB, which could be linked to the sparse SPG observations and the inability of SPG stations to account for total precipitation.

Reanalysis GPDs, on the other hand, show relatively weak performance in reproducing the observed spatial patterns of precipitation. ERA5 performs the best in terms of correlation (0.92), which may be linked to the advanced model dynamics and data assimilation technique used in its production. Seasonal analysis shows that both ERA5 and ERAINT perform relatively better during the winter season. However, both datasets show high RB in the extreme north and adjoining mountainous regions of the country in both seasons. JRA55 overestimates the summer precipitation, whereas 20CR and MERRA2 underestimate the summer and winter precipitation at most stations. All the reanalysis datasets show high error values in the CPRP; thus, these products should be used with caution in this region. In terms of detecting wet and dry years, ERA5, ERAINT, and JRA55 (20CR and MERRA2) show relatively good performance for wet (dry) years. Overall, the precipitation centroids of GPCC, EWEMBI1, and CIHGGO closely resemble those of the SPG data compared to other gridded products. Among the reanalysis GPDs, MERRA2 has the best-fit spatial distribution of precipitation centroids with reference to SPG data. Our results might be helpful for researchers who aim to use observed and reanalysis GPDs for hydro-meteorological studies over the region.

Supplementary information.