1 Introduction

Precipitation is one of the most crucial parameters of the hydrological cycle. Climate change is intensifying the hydrological cycle, which is anticipated to have a major impact on regional water resources (Arnell 1999; River et al. 2018). Analyzing precipitation is important for several applications, such as hydrological and climate change impact studies (Azmat et al. 2018a). The management of water resources depends on an accurate understanding of precipitation patterns, which is also required for any climate change impact study (Gofa et al. 2019). Accurate modeling of hydrological, ecological, and climatic processes requires reliable spatial and temporal meteorological data (Di Luzio et al. 2008; Yatagai et al. 2012). The best inputs for these modeling applications are meteorological datasets based on near-surface observations (Pechlivanidis et al. 2011; Livneh et al. 2015). Although models normally require meteorological data on quasi-continuous, regular grids, meteorological observations are often heterogeneously distributed and clustered around population centers, which presents a practical challenge (Azmat et al. 2020).

During the last few decades, numerous gridded precipitation products have been developed. Gridded precipitation datasets have become increasingly common with the development of satellite precipitation measurements (Li et al. 2021). The four major categories of available datasets are gauge-based, reanalysis, satellite-derived, and merged products. The gauge-based datasets are developed from on-site direct observations and contain reasonably accurate information on the frequencies, amounts, and types of rainfall at the measuring stations. These station measurements are frequently employed for calibrating, validating, and bias-correcting reanalysis and satellite products (Dahri et al. 2021). However, the gauge-based products are subject to measurement errors, observational uncertainties, limited spatial and temporal coverage, the uneven distribution of the gauges, and the choice of interpolation method (Nevada et al. 2010; Boers et al. 2016; Prein and Gobiet 2017; Dahri et al. 2018). Precipitation estimates derived from retrospective weather forecast model analysis (reanalysis) or from satellite data are gauge-independent, and these products offer practical alternatives that provide globally consistent, reliable, near-real-time estimates of many meteorological variables (Ghodichore et al. 2018). Reanalysis products can produce high-spatiotemporal-resolution historical datasets with continuous long-term series by combining ground and high-altitude observation data with historical atmospheric results (Zhang and Wang 2022). Merging precipitation datasets from many sources has become a common approach to increasing the precision of precipitation estimates and acquiring realistic spatial distribution patterns (Beck et al. 2017; Xu et al. 2020; Shao et al. 2021). The merged products are more similar to gauge-based products since they incorporate inputs from ground observations.
Several attempts have been made to fully utilize the complementary nature and comparative advantages of gauge-based observations, satellite data, and reanalysis products. Many combined precipitation products have been developed in recent years (Xie and Arkin 1997; Janowiak et al. 1999; Huffman et al. 2007; Weedon et al. 2014; Ashouri et al. 2015; Beck et al. 2019). These datasets primarily use merging methods to reduce the limitations of the source datasets and obtain a higher-quality final product. Three satellite products based on a high-density rainfall gauge network for the Tibetan Plateau were merged to establish a new merged daily rainfall dataset (Li et al. 2021). A gridded dataset for Java Island was developed by assessing different meteorological datasets (Yanto and Rajagopalan 2017). A high-resolution gridded temperature dataset for the central-north region of Egypt (CNE) was developed by assessing different gridded temperature datasets (Nashwan et al. 2019). Twenty-seven gridded precipitation products, comprising gauge-based, reanalysis, and merged datasets, were evaluated for the high-altitude Indus Basin (Dahri et al. 2021). Four gridded precipitation datasets were assessed for the arid regions of Baluchistan, Pakistan (Ahmed et al. 2019).

Even though gridded products offer better information in terms of spatiotemporal consistency, their inability to accurately represent the frequency, amount, and type of precipitation remains a major problem. Several techniques have been developed to correct data from global climate models (GCMs), ranging from simple linear scaling to more complex nonlinear methods (Teutschbein and Seibert 2012), and gridded datasets can be corrected with the same bias correction techniques. Several studies have assessed different bias correction techniques. In a study of an arid area of China, different bias correction techniques for downscaling meteorological variables were compared (Fang et al. 2014). Six bias correction techniques were evaluated for downscaling precipitation over North America (Chen et al. 2013).

High-altitude areas of the Indus basin serve as an important source of freshwater and other important ecosystem services to the entire Indus basin, yet observational data on important hydrometeorological parameters are very scarce in these areas (Azmat et al. 2020). Therefore, climate change and water balance studies in this area generally lack the desirable quality. The climate in the Potohar Plateau ranges from semi-arid to sub-humid, and agriculture is the dominant sector in the Plateau. The Potohar Plateau receives most of its rainfall during the monsoon season (Rashid and Rasul 2011). Wide topographic variation, technological limitations, and economic factors are some of the reasons why countries like Pakistan have a relatively low density of precipitation gauging stations. Additionally, the stations are sparsely scattered around the Plateau and are usually found along the Grand Trunk Road (G.T. Road). In this study, high-resolution (0.08° × 0.08°), long-term (35 years), station-based gridded precipitation and maximum and minimum temperature datasets were developed for the Potohar Plateau. Before that, gridded datasets were evaluated and bias corrected for the area under consideration. The Potohar Plateau is a rainfed region that accounts for 18.6% of the total cultivated land in the Punjab province (Asian Development Bank 2007). Water is the only challenge limiting sustainable agricultural growth in rainfed regions, where cultivation mostly relies on rainfall (Adnan et al. 2009). Furthermore, the Government of Punjab has constructed 55 small dams on this Plateau to provide irrigation services to farmers. The Potohar Plateau is experiencing a rapid transformation from a purely rainfed agricultural region to a high-value horticultural region, with considerable expansion in built-up areas in the high-altitude semi-mountain areas. The region is tipped as the fruit valley/basket for local use as well as export purposes.
The region also contributes significantly towards river inflows of the Indus and Jhelum river basins (Azmat et al. 2018a). Moreover, the gauge stations are sparsely located around the Plateau and the data available from the existing stations have significant gaps. A long-term gridded dataset in a data-scarce area with limited meteorological stations will offer better estimates of precipitation and temperature and would be helpful in water balance studies for the Plateau, in addition to managing and operating the small dams being constructed in the Plateau. Additionally, this study can serve as a guideline for similar studies in the country, especially for the rainfed region of Baluchistan, Pakistan. This study will also provide an improved and higher-quality database for future planning and development. Adopting appropriate water resource development, harvesting, and management strategies might significantly increase crop yields (Ashraf et al. 2007), which would boost the country's gross domestic product.

2 Materials and methods

2.1 Study area

The Potohar Plateau is located in the northern Punjab province, in the northeastern part of Pakistan. It lies between 32.5° and 34.0° North latitude and 72° and 74° East longitude and has an area of 22,254 km². Geographically, the Potohar Plateau is bordered by the Rivers Jhelum and Indus on its eastern and western sides, as well as by salt ranges in the south, the Soan and Haro Rivers in the Kala Chitta Range, and Margalla Hills in the north (Ur Rahman et al. 2020). The major portion of the Attock, Chakwal, Jhelum, and Rawalpindi districts and the Islamabad Capital Territory are included in the Potohar Plateau. The topography of the region is very undulating and is made up of high mountains in the west, undulating plains in the east, and dissected ravine belts. The Plateau’s climate ranges from semi-arid to sub-humid, with hot summers and reasonably chilly winters. Typically, dry and semi-arid conditions dominate in the central and southern parts of the region, respectively, while moist and sub-humid climates are more prevalent in the northern parts of the region (Idrees et al. 2022). About 55% of the Potohar Plateau is cultivated (Ur Rahman et al. 2020); about 96% of the cultivated land is dependent on rain, and only 4% of it is irrigated (Amir et al. 2019). Since there is no irrigation network in this area, except for a few tube wells in places like Pind Daddan Khan in Jhelum, agriculture in this area is primarily rain-fed. The annual precipitation in the Plateau ranges between 450 and 1750 mm, and almost three-fourths of it falls during the monsoon (Cheema and Bastiaanssen 2012; Ullah et al. 2018). The summer and winter temperature ranges are 15 to 40 °C and 4 to 25 °C, respectively.
The maximum and minimum mean annual precipitation of the Plateau for the study duration (1991–2010) were 1264 mm in 1992 and 661 mm in 2009, respectively, whereas the 20-year mean annual precipitation was 973.5 mm. The highest mean annual maximum temperature for the study duration (1991–2005) was 30.39 °C in 2002, whereas the lowest mean annual minimum temperature was 13.34 °C in 1993. The 15-year mean annual maximum and minimum temperatures were 29.14 °C and 14.62 °C, respectively (Fig. 1). The study area's northwest has the most precipitation, which decreases to an arid state in the southwest (Amir et al. 2019). The soil texture varies from sandy to silty clay (Afzal 2021). The study area map is presented in Fig. 2.

Fig. 1 The precipitation and maximum and minimum temperature of the study area for the study duration

Fig. 2 Study area map with meteorological stations and stream network

3 Observed data

The observed climatic data were collected from the Water and Power Development Authority (WAPDA), the Pakistan Meteorological Department (PMD), the Soil and Water Conservation Research Institute (SAWCRI), Chakwal, and the Surface Water Hydrology Project-WAPDA (SWHP-WAPDA). The details of the stations and data range are described in Table S1, provided in the supplementary materials.

3.1 Gridded datasets

Many studies have evaluated different gridded datasets for different regions. The Global Precipitation Climatology Centre (GPCC) dataset performed significantly better than the other datasets studied for the arid regions of Baluchistan, Pakistan (Azmat et al. 2018b; Ahmed et al. 2019). GPCC was found to be the better-performing dataset for the sub-basins of the Lower Mekong River Basin (Dhungana 2022), as well as for the Upper Benue River Basin, Nigeria (Salaudeen et al. 2021). Asian Precipitation - Highly-Resolved Observational Data Integration Towards Evaluation (APHRODITE) has a high correlation with the observed dataset on a mean monthly basis for Pakistan (Ali et al. 2012). APHRODITE had a high correlation with the observed data in the study of Nusrat et al. (2020), and the results of Nusrat et al. (2022) revealed APHRODITE to be more reliable than the other datasets compared in that study. Reanalysis products are biased in precipitation estimation, yet they are still commonly utilized in climate research (Ma et al. 2009; Lorenz and Kunstmann 2012). ERA5 was found to be the superior dataset for Iran (Taghizadeh et al. 2021). European Centre for Medium-Range Weather Forecasts Reanalysis-Land (ERA5-Land) was cross-validated to check its performance, and the dataset provided reasonably good estimates (Syed et al. 2022). GPCC had the best performance in the study of Xiang et al. (2021). Multi-Source Weighted-Ensemble Precipitation (MSWEP) was found to be more suitable for hydrological applications in the Yellow River Basin, China (Yang et al. 2020). GPCC, ERA5, and MSWEP provided better estimates than their counterparts in the gauge-based, reanalysis, and merged groups, respectively (Dahri et al. 2021).

In this study, five precipitation products and three temperature products were selected for evaluation. GPCC does not provide temperature data, while APHRODITE and ERA5-Land do not provide maximum and minimum temperatures. Therefore, the Climate Prediction Center (CPC), Climatic Research Unit (CRU), and ERA5 datasets were selected for temperature. The details of all the datasets are given in Table 1, where G, S, and R represent gauge-based, satellite, and reanalysis, respectively.

Table 1 Summary of basic attributes of selected gridded datasets

3.2 Methodology

3.2.1 Adjustment of precipitation data for measurement errors

The amount of precipitation that reaches the ground is typically higher than that measured in precipitation gauges because of measurement errors that depend on the precipitation type, topography, gauge type, exposure of the gauges to prevalent winds and temperatures, and vegetation in the proximity of the gauge (Dahri et al. 2018). The majority of the commonly used global precipitation products do not account for wind-induced under-catch (Adam and Lettenmaier 2003), even though it is the most significant source of bias in precipitation measured by gauges (Goodison et al. 1998; Adam and Lettenmaier 2003; Michelson 2004; Wolff et al. 2015). Following the WMO's recommendations and developed methodologies, Dahri et al. (2018) derived precipitation adjustment factors for the Upper Indus basin. The factors derived in their study were used to adjust the precipitation at this study's stations. Factors were available for all the stations in this study except a few, for which they were obtained by linear regression against nearby stations. The adjustment factors for each station are presented in Table S2, provided in the supplementary materials.

3.2.2 Gap filling of observed data

A consistent period of 20 years (1991–2010) for precipitation and 15 years (1991–2005) for temperature was selected as the base period. There were, however, a few gaps where data could not be observed or measured for unavoidable reasons. These gaps were filled using regression models developed between the concerned station and either a nearby station or the gridded datasets. If the coefficient of determination (R2) of the fit was below 0.60, the average data of the concerned station were used instead. The nearby stations were selected by considering their elevation and their distance from the station under consideration.
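The gap-filling rule described above can be sketched as follows. This is a minimal illustration, not the procedure actually run in the study: the function name `fill_gaps`, the use of a single regression per series, and the interpretation of "average data" as the station's own mean are all assumptions made for the example.

```python
import numpy as np

def fill_gaps(target, candidates, min_r2=0.60):
    """Fill NaN gaps in `target` with the best-fitting linear regression.

    target     : 1-D station series, NaN where data are missing
    candidates : dict mapping a label to a predictor series of equal length
                 (a nearby station or a gridded dataset; labels are hypothetical)
    min_r2     : fits weaker than this fall back to the station's own mean
    """
    target = np.asarray(target, dtype=float)
    gaps = np.isnan(target)
    filled = target.copy()
    best_name, best_r2, best_fit = None, -np.inf, None

    for name, series in candidates.items():
        series = np.asarray(series, dtype=float)
        ok = ~gaps & ~np.isnan(series)          # overlap used to fit the model
        if ok.sum() < 3:
            continue
        slope, intercept = np.polyfit(series[ok], target[ok], 1)
        r2 = np.corrcoef(series[ok], target[ok])[0, 1] ** 2
        if r2 > best_r2:
            best_name, best_r2, best_fit = name, r2, (slope, intercept)

    if best_name is not None and best_r2 >= min_r2:
        predictor = np.asarray(candidates[best_name], dtype=float)
        filled[gaps] = best_fit[0] * predictor[gaps] + best_fit[1]
        # Gaps the predictor cannot cover are filled with the station mean.
        filled[np.isnan(filled)] = np.nanmean(target)
    else:
        best_name = "station_mean"
        filled[gaps] = np.nanmean(target)
    return filled, best_name, best_r2
```

In practice the regression would be fitted separately for each calendar month, as the monthly R2 values reported in the results suggest.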

3.2.3 Observed gridded dataset

The observed point data were spatially interpolated through the ordinary kriging method on a grid size of 0.08°. Ordinary kriging (OK), a geostatistical method, is among the most widely employed techniques for spatial interpolation. Geostatistical interpolation techniques, in contrast to deterministic methods, also utilize the statistical characteristics of the measured points. Geostatistical techniques take into account the spatial configuration of the sampling points surrounding the estimation point and determine the autocorrelation among the measured points (Ozturk and Kilic 2016). In terms of precision, the kriging method of interpolation performs better than the Inverse Distance Weighting (IDW) method (Ahmed and Abdelkarim 2015). The spatial estimate at the unmeasured location yo is obtained as a weighted linear sum of the known observed values. The formula given by Hohn (1991), Cressie (2015), and Pham et al. (2019) provides a simple representation of OK.

$$X^{*}(y_{o}) = \sum_{i=1}^{n} \alpha_{i} X(y_{i})$$

where X*(yo), X(yi), αi, and n represent the estimated value at the unmeasured point yo, the known value at point yi, the weighting coefficient from the measured position to yo, and the number of points in the neighborhood, respectively (Hengl 2007). The OK was performed in ArcGIS with an exponential semi-variogram model and an output cell size of 0.08°. The raster files were first converted to Network Common Data Form (NetCDF) in ArcGIS and were then merged and time-stamped with the Climate Data Operators (CDO). An overview of the research methodology adopted in this study is presented in Fig. 3.
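To illustrate how the weights αi in the OK estimator can be obtained, the following sketch solves the standard ordinary kriging system for a single target point. It is not the ArcGIS implementation used in the study; the function names, the zero nugget, and the sill and range values are assumptions chosen only for the example.

```python
import numpy as np

def exp_semivariogram(h, sill=1.0, rng=1.0):
    """Exponential semi-variogram with zero nugget (assumed parameters)."""
    return sill * (1.0 - np.exp(-np.asarray(h, dtype=float) / rng))

def ordinary_kriging(coords, values, target, sill=1.0, rng=1.0):
    """Estimate X*(yo) = sum_i alpha_i X(yi) at one unmeasured point."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)

    # Semi-variances between all pairs of measured points.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_semivariogram(dists, sill, rng)
    A[n, n] = 0.0                      # Lagrange multiplier block

    # Semi-variances between measured points and the target point.
    b = np.ones(n + 1)
    b[:n] = exp_semivariogram(np.linalg.norm(coords - target, axis=1), sill, rng)

    solution = np.linalg.solve(A, b)
    alpha = solution[:n]               # kriging weights, constrained to sum to 1
    return float(alpha @ values), alpha
```

The last row and column of the system enforce the unbiasedness constraint that the weights sum to one; with a zero nugget the estimator reproduces the measured value at a station location.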

Fig. 3 Overview of the adopted research methodology in this study

3.2.4 Assessment of gridded datasets

The performance of the selected gridded products was assessed through commonly used statistical parameters. The period selected for assessment was 20 years (1991–2010) for precipitation and 15 years (1991–2005) for temperature. The assessments were performed on a monthly scale. The statistical parameters selected in this study were the Kling-Gupta Efficiency (KGE), Mean Absolute Error (MAE), Coefficient of Determination (R2), and Root Mean Square Error (RMSE). Before performing the statistical evaluation, all the gridded datasets were re-gridded to the same grid size as the observed dataset by first-order conservative remapping through CDO. From a physical perspective, the conservative method is typically preferred because of the mass conservation constraint, which matters most for flux variables (such as precipitation) but was applied here to the non-flux variables as well (Cionni et al. 2020). The conservative regridding method is more desirable for discontinuous variables such as precipitation (Jones 1999; Saeed et al. 2017).
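The idea behind first-order conservative remapping can be shown with a one-dimensional sketch, in which each target cell receives the overlap-weighted average of the source cells it intersects. This is only an illustration and not the CDO implementation (CDO provides the operator `remapcon` for this purpose on real latitude-longitude grids, where the weights are spherical cell areas); the function name and the cell edges below are hypothetical.

```python
import numpy as np

def conservative_remap_1d(src_edges, src_vals, dst_edges):
    """First-order conservative remapping along one axis.

    src_edges : increasing edges of the source cells (length n + 1)
    src_vals  : mean value in each source cell (length n)
    dst_edges : increasing edges of the target cells
    """
    src_edges = np.asarray(src_edges, dtype=float)
    src_vals = np.asarray(src_vals, dtype=float)
    dst_edges = np.asarray(dst_edges, dtype=float)
    out = np.full(len(dst_edges) - 1, np.nan)

    for j in range(len(dst_edges) - 1):
        # Length of the overlap between target cell j and every source cell.
        lower = np.maximum(src_edges[:-1], dst_edges[j])
        upper = np.minimum(src_edges[1:], dst_edges[j + 1])
        overlap = np.clip(upper - lower, 0.0, None)
        if overlap.sum() > 0:
            out[j] = (overlap * src_vals).sum() / overlap.sum()
    return out
```

Because every target value is an overlap-weighted mean, the integral of the field (value times cell width) is the same before and after remapping wherever the two grids cover the same extent.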

The KGE (Gupta et al. 2009; Kling et al. 2012) has mostly been used to assess the accuracy of outputs from hydrological or climate models compared to the observed data. However, it can also assess how well the gridded precipitation products perform compared to the relevant reference data (Beck et al. 2019). KGE has three components: the Pearson correlation coefficient (r), which measures the degree of linear relation between two datasets; the bias (β), which is the ratio of the gridded and reference means (μ); and the variability ratio (γ), which is the ratio of the coefficients of variation (σ/μ, where σ is the standard deviation) of the gridded and reference datasets. The ideal values of KGE, the Pearson correlation coefficient, the bias, and the variability ratio are all 1. MAE measures the mean magnitude of the differences between two datasets without considering the direction of the error. R2 represents the proportion of the dependent variable's variation that can be explained by the independent variables. RMSE, the square root of the mean square error (MSE), has been commonly used (Nevitt and Hancock 2000; Kelley and Lai 2011; Ravikumar et al. 2012; Hancock and Freeman 2016) because it expresses the error in the same units as the variable. The optimum values for KGE, MAE, R2, and RMSE are 1, 0, 1, and 0, respectively. The formulae for the performance evaluation parameters employed are given in Table 2.

Table 2 Summary of the statistical parameters applied
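The four evaluation parameters can be computed as in the sketch below. It assumes the modified KGE of Kling et al. (2012), in which γ is the ratio of coefficients of variation as described in the text, and it takes R2 as the square of the Pearson correlation coefficient; the function name `evaluate` is hypothetical and the exact formulae used in the study are those of Table 2.

```python
import numpy as np

def evaluate(gridded, reference):
    """Return KGE, MAE, R2, and RMSE for two equally long series."""
    sim = np.asarray(gridded, dtype=float)
    obs = np.asarray(reference, dtype=float)

    r = np.corrcoef(sim, obs)[0, 1]                      # linear correlation
    beta = sim.mean() / obs.mean()                       # bias ratio
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())  # CV ratio
    kge = 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)

    error = sim - obs
    return {
        "KGE": float(kge),
        "MAE": float(np.mean(np.abs(error))),
        "R2": float(r ** 2),
        "RMSE": float(np.sqrt(np.mean(error ** 2))),
    }
```

For example, a product that is exactly 10% too wet in every month has r = 1 and γ = 1 but β = 1.1, which gives KGE = 0.9 even though R2 remains 1; this is why KGE and R2 can rank products differently.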

3.2.5 Bias Correction

Compared to the observed climatic data, the accuracy of the gridded datasets is reduced by inherited model biases. As a result, they frequently need bias correction before the variables are analyzed. Numerous studies (Lenderink et al. 2007; Teutschbein and Seibert 2012; Shrestha et al. 2017) have assessed the performance of bias correction methods, and the majority of them suggest that the performance depends on location. Linear Scaling (LS) and Quantile Mapping (QM) were used in this study.

The LS technique aims to achieve a perfect match between the monthly mean of corrected and observed values (Lenderink et al. 2007). It performs correction using monthly correction values based on differences between the observed and raw data (gridded datasets in this study). Precipitation is usually corrected with a multiplier and temperature is corrected with an additive term (Fang et al. 2014).
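A minimal sketch of LS is given below, assuming a multiplicative monthly factor for precipitation and an additive monthly offset for temperature, as described above. The function name, the `kind` argument, and the choice to leave a month unchanged when the raw monthly mean is zero are assumptions made for the illustration.

```python
import numpy as np

def linear_scaling(raw, obs, months, kind="precip"):
    """Correct `raw` so that its mean for each calendar month matches `obs`.

    raw, obs : series covering the same calibration period
    months   : calendar month (1-12) of every element
    kind     : "precip" (multiplicative) or "temp" (additive)
    """
    raw = np.asarray(raw, dtype=float)
    obs = np.asarray(obs, dtype=float)
    months = np.asarray(months)
    corrected = raw.copy()

    for m in np.unique(months):
        sel = months == m
        raw_mean, obs_mean = raw[sel].mean(), obs[sel].mean()
        if kind == "precip":
            # Multiplier keeps precipitation non-negative.
            factor = obs_mean / raw_mean if raw_mean > 0 else 1.0
            corrected[sel] = raw[sel] * factor
        else:
            corrected[sel] = raw[sel] + (obs_mean - raw_mean)
    return corrected
```

By construction the corrected monthly means equal the observed ones over the calibration period, while the month-to-month variability of the raw product is only rescaled or shifted.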

QM, a non-parametric technique, can be applied to any precipitation distribution without making assumptions about it. It can efficiently correct bias in the mean, standard deviation, quantiles, and wet-day frequency. The adjustment through QM can be expressed in terms of the empirical cumulative distribution function (ecdf) of the gridded data and the inverse ecdf of the observed data. The formulae of the bias correction methods are presented in Table 3.
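The empirical form of QM can be sketched as follows: a raw value is first converted to a non-exceedance probability with the ecdf of the raw data and then mapped to the observed value with the same probability. The function name, the number of quantiles, and the use of linear interpolation between quantiles are assumptions for this example rather than details taken from the study.

```python
import numpy as np

def quantile_mapping(raw_cal, obs_cal, raw_new, n_quantiles=101):
    """Empirical quantile mapping: obs_ecdf^-1(raw_ecdf(x)).

    raw_cal, obs_cal : raw and observed series over the calibration period
    raw_new          : raw values to be corrected
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    raw_q = np.quantile(np.asarray(raw_cal, dtype=float), probs)
    obs_q = np.quantile(np.asarray(obs_cal, dtype=float), probs)

    # ecdf of the raw data evaluated at the new values; values outside the
    # calibration range are clamped to probabilities 0 or 1 by np.interp.
    p = np.interp(np.asarray(raw_new, dtype=float), raw_q, probs)
    # Inverse ecdf of the observations at those probabilities.
    return np.interp(p, probs, obs_q)
```

For precipitation, many zero values produce tied quantiles in `raw_q`, so a practical implementation usually treats dry days separately before mapping the wet-day amounts.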

Table 3 Bias correction methods formulae

4 Results

4.1 Gap filling of observed data

The gaps (missing data) in the observed station data were filled by performing linear regression against the gridded datasets, nearby stations, and nearby stations' averages. R2 for each month was computed and the gaps were filled by selecting the better fit (better R2 value) among the gridded dataset, nearby stations, and nearby stations' average. The R2 values varied from 0.602 to 0.994. Overall GPCC was a better-performing dataset for precipitation with better results in most cases. For temperature, CRU results were better in most cases. The gap-filling results are presented in Table S3 (precipitation) and Table S4 (temperature), provided in supplementary materials.

4.2 Observed gridded dataset

The observed data were spatially interpolated using Ordinary Kriging on a grid size of 0.08°. The mean annual precipitation was 864.15 mm. The western areas are the dry areas where the mean annual precipitation is less than 650 mm. The southwestern areas of Attock, eastern areas of Chakwal, and southwestern areas of Jhelum have precipitation ranging from 650 to 850 mm. Northeastern areas of Attock, Rawalpindi, and Islamabad are wet areas with mean annual precipitation greater than 1000 mm. The mean annual maximum temperature for the study area is from 29 °C to 31 °C except in the northeastern areas of Attock and Rawalpindi, and Islamabad where the mean annual maximum temperature range is 25 °C – 29 °C. The minimum temperature throughout the study area ranges from 14 °C to 18 °C except for some northeastern areas of Attock and Rawalpindi, and Islamabad where it ranges from 10 °C to 14 °C. The spatial distribution maps of observed mean annual precipitation and temperature are presented in Fig. 4.

Fig. 4 Observed mean annual datasets. a precipitation, b maximum temperature, c minimum temperature

4.3 Assessment of gridded datasets

4.3.1 Spatial analysis of gridded datasets

In comparison with the mean annual precipitation of 864.15 mm of the observed dataset, the estimates of APHRODITE, ERA5-Land, MSWEP, and PERSIANN-CDR are 730.10 mm (-15.51%), 1006.70 mm (16.49%), 687.43 mm (-20.45%), and 802.73 mm (-7.10%), respectively, whereas GPCC’s estimate of 826.31 mm (-4.37%) is the closest to the observed data. The mean annual absolute errors for GPCC, APHRODITE, ERA5-Land, MSWEP, and PERSIANN-CDR are -37.84 mm, -134.05 mm, 142.55 mm, -176.62 mm, and -61.42 mm, respectively. APHRODITE underestimates over the whole area except for the eastern areas of the Rawalpindi and Jhelum districts, where it overestimates. The underestimation is more pronounced in high-altitude areas. ERA5-Land significantly overestimates over the whole area except for the high-altitude areas and parts of Rawalpindi and Islamabad, where it underestimates. Reanalysis products typically exhibit greater variability and a wider spread of residuals, which is reasonable and associated with their independence from direct precipitation measurements, their use of a variety of assimilation models and schemes, and their use of different types and numbers of assimilated observations (Dahri et al. 2021). GPCC overestimates in the southeastern areas of Attock, northeastern areas of Islamabad, northeastern and northwestern areas of Chakwal, and northeastern and southwestern areas of Rawalpindi, while underestimating in the rest of the study area. MSWEP underestimates over the whole area except the eastern areas, and the underestimation is more pronounced in northwestern and high-altitude areas. PERSIANN-CDR overestimates in the eastern and western areas while underestimating in the northern, central, and southeastern areas. The overestimation is higher for high-altitude areas compared to the other areas.
The accuracy of gauge-based precipitation products is significantly affected by the absence of observations at higher altitudes, the uneven distribution of the current stations, and measurement errors. The merged products are affected by the errors in their source data. The MSWEP and PERSIANN-CDR datasets utilize ground observations, but their poor performance can be attributed to the use of different and/or fewer observations.
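The differences and relative biases quoted above follow from simple arithmetic on the area-averaged mean annual totals, as the short sketch below shows. The variable and function names are hypothetical; the input values are the means reported in the text, and recomputed figures may differ slightly in the last digit from the reported ones, presumably because of rounding or the order of spatial averaging.

```python
# Area-averaged mean annual precipitation (mm) as reported in the text.
observed = 864.15
products = {
    "GPCC": 826.31,
    "APHRODITE": 730.10,
    "ERA5-Land": 1006.70,
    "MSWEP": 687.43,
    "PERSIANN-CDR": 802.73,
}

def bias_summary(estimate, reference):
    """Return (difference in mm, relative bias in percent)."""
    difference = estimate - reference
    return difference, 100.0 * difference / reference

summary = {name: bias_summary(value, observed) for name, value in products.items()}

# Product whose mean annual total is closest to the observed dataset.
closest = min(summary, key=lambda name: abs(summary[name][0]))

for name, (difference, percent) in summary.items():
    print(f"{name}: {difference:+.2f} mm ({percent:+.2f}%)")
```

A negative value indicates that the product underestimates the observed mean annual precipitation.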

As for KGE, APHRODITE performs better for most parts except high altitude areas and the southern areas of the study area. ERA5-Land performs better throughout the study area except for southwestern areas and northern parts of Rawalpindi. GPCC performs best among all the products for the study area except for some areas like the northern parts of Attock and some parts of Chakwal, where it performs moderately. MSWEP has poor KGE values for the northwestern areas and high-altitude areas, whereas, for the rest of the area, the KGE values are in a moderate range. PERSIANN-CDR has better KGE values for the area except in northern and high-altitude areas where the KGE values are in a moderate range. The average KGE values for GPCC, APHRODITE, ERA5-Land, MSWEP, and PERSIANN-CDR are 0.75, 0.58, 0.66, 0.52, and 0.71 respectively. In terms of R2 for the whole study area, APHRODITE performs best among all the datasets, ERA5-Land performs moderately, and GPCC performs moderately to better. R2 values for MSWEP are in the moderate range for the study area except for high-altitude areas where the values are relatively better. The performance of PERSIANN-CDR is better in terms of R2 for the whole area except southwestern areas where it performs moderately. The average R2 values are 0.79, 0.81, 0.68, 0.68, and 0.74 for GPCC, APHRODITE, ERA5-Land, MSWEP, and PERSIANN-CDR respectively.

APHRODITE has high RMSE values for high-altitude areas, Islamabad, and northern areas of Rawalpindi. The RMSE values for MSWEP are higher in northern and high-altitude areas, whereas for the rest of the area, the RMSE values are in a moderate range. The RMSE values for PERSIANN-CDR are in the moderate to good range except for northern and high-altitude areas, where the values are relatively poor. GPCC and APHRODITE have almost the same average RMSE values of 35.11 mm and 35.13 mm, MSWEP and PERSIANN-CDR have RMSE values of 45.11 mm and 40.20 mm, respectively, and ERA5-Land has the highest RMSE value of 52.46 mm. GPCC has the lowest average MAE value of 21.22 mm, closely followed by APHRODITE with 22.02 mm. MSWEP and PERSIANN-CDR have MAE values of 28.71 mm and 25.01 mm, respectively, while ERA5-Land has the highest MAE value (32.09 mm). The spatial maps of gridded precipitation datasets are presented in Fig. 5 and the statistical evaluation results are presented in Table 4.

Fig. 5 Spatial distribution maps for precipitation. a mean annual precipitation, b mean annual absolute error, c KGE, d R2, e RMSE

Table 4 Statistical evaluation of gridded precipitation products

Compared to the observed mean annual maximum temperature of 29.21 °C, the estimates of CPC, CRU, and ERA5 are 28.11 °C (-3.77%), 29.03 °C (-0.62%), and 26.58 °C (-9%), respectively. The mean annual absolute errors for CPC, CRU, and ERA5 are -1.1 °C, -0.18 °C, and -2.63 °C, respectively. CPC underestimates over the study area except for the northern areas of Attock. ERA5 underestimates over the whole study area. In terms of KGE, CPC has high values for the whole area except for the southwestern areas. CRU performs best for northern and eastern areas, while its low values are in the southern and southwestern areas. ERA5 performs better for the whole area except for high-altitude areas and some western and southwestern areas, where it has lower values. The KGE varies from 0.65 to 0.98, whereas the average KGE values for CPC, CRU, and ERA5 are 0.86, 0.85, and 0.87, respectively. In terms of R2, all the datasets perform well over the whole area, with comparatively low values for high-altitude and southwestern areas. R2 values for the area range from 0.92 to 0.98, whereas the average values are 0.96, 0.95, and 0.97 for CPC, CRU, and ERA5, respectively. Regarding RMSE, CRU has the best values throughout the whole area except for some high-altitude areas, where the values are slightly higher. CPC performs better except in Chakwal, which has relatively higher values, whereas ERA5 has the highest values among all. For maximum temperature, CPC has the highest MAE (1.6 °C), with an RMSE of 2.01 °C. CRU has the lowest MAE and RMSE values of 1.47 °C and 1.81 °C, respectively, while ERA5 has an MAE of 1.5 °C and the highest RMSE of 2.07 °C.

In comparison with the observed mean annual minimum temperature of 15.13 °C, CPC, CRU, and ERA5 estimate 16.60 °C (9.72%), 15.37 °C (1.59%), and 15.04 °C (-0.59%), respectively. ERA5 has a bias of -0.09 °C, while CPC and CRU have biases of 1.47 °C and 0.24 °C, respectively. CPC overestimates over the whole area except some high-altitude areas and some areas of Jhelum. CRU overestimates in the northern and western areas while underestimating in the southeastern and northeastern areas. ERA5 overestimates in southwestern and northern areas while underestimating in the rest of the study area. All the datasets have high KGE values throughout the study area except ERA5, which has relatively low values in the high-altitude areas. KGE values range from 0.23 to 0.98, whereas the average KGE values are 0.89, 0.93, and 0.92 for CPC, CRU, and ERA5, respectively. CPC has high R2 values except for Islamabad, northern areas of Rawalpindi, Attock, and high-altitude areas, where the values are relatively lower. CRU has high values throughout the study area, while ERA5 has high values except for some high-altitude areas. The average R2 values for CPC, CRU, and ERA5 are 0.97, 0.98, and 0.98, respectively. For minimum temperature, CPC has the highest MAE and RMSE values of 1.78 °C and 2.12 °C, respectively. ERA5 has the lowest values of 1.05 °C and 1.25 °C, respectively, while the values for CRU are 1.15 °C and 1.27 °C, respectively. The maximum and minimum temperature spatial maps are presented in Fig. 6 and Fig. 7, respectively. The statistical evaluation results for temperature are presented in Table 5.

Fig. 6 Spatial distribution maps for maximum temperature. a mean annual maximum temperature, b mean annual absolute error, c KGE, d R2, e RMSE

Fig. 7 Spatial distribution maps for minimum temperature. a mean annual minimum temperature, b mean annual absolute error, c KGE, d R2, e RMSE

Table 5 Statistical evaluation of gridded temperature products

5 Seasonal analysis of gridded datasets

The mean monthly precipitation of the gridded datasets against the observed dataset is presented in Fig. 8. Although all the datasets follow the same trend as the observed data, their estimates vary significantly. APHRODITE tends to underestimate significantly, especially during the monsoon season (June–September). ERA5-Land overestimates for all the months except January, in which it underestimates marginally. During the monsoon season, GPCC underestimates, except for July, in which it overestimates. MSWEP underestimates for all months except April, May, and November. PERSIANN-CDR underestimates for all months except April–July, for which it overestimates. Overall, GPCC is the best among all the datasets at replicating the observed precipitation. GPCC and APHRODITE are both gauge-based products; however, the better performance of GPCC can be attributed to the utilization of a higher number of observation stations as well as better interpolation techniques. Further, missing values in APHRODITE are represented as zero (Yatagai et al. 2009), which can contribute to the underestimation.

Fig. 8
Mean monthly precipitation of gridded datasets against observed data
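
The monthly comparison above can be sketched as a two-step calculation: average each calendar month across years, then subtract the observed climatology from the gridded one. The function names and the (year, month, value) record layout below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def monthly_climatology(records):
    """Average monthly values across years.

    records: iterable of (year, month, value) tuples, e.g. monthly totals in mm.
    Returns {month: mean value over all years present for that month}.
    """
    by_month = defaultdict(list)
    for _year, month, value in records:
        by_month[month].append(value)
    return {m: sum(v) / len(v) for m, v in sorted(by_month.items())}


def monthly_bias(obs_clim, grid_clim):
    """Gridded minus observed climatology; positive values mean overestimation."""
    return {m: grid_clim[m] - obs_clim[m] for m in obs_clim if m in grid_clim}


# July and August means quoted in the discussion for the observed data and GPCC (mm).
observed = {7: 184.92, 8: 189.10}
gpcc = {7: 190.41, 8: 180.69}
for month, bias in monthly_bias(observed, gpcc).items():
    print(f"month {month}: {bias:+.2f} mm")
```

With the values quoted in the text, the sign of the difference reproduces the reported behaviour of GPCC: a small overestimation in July and an underestimation in August.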

CPC and CRU closely follow the observed maximum temperature in the pre-monsoon months (Jan-May) and slightly underestimate it in the monsoon months (Jun-Sep). In Oct-Dec, CPC slightly underestimates, whereas CRU slightly overestimates. ERA5 underestimates the maximum temperature in all months. CPC overestimates the minimum temperature in all months. CRU slightly overestimates the minimum temperature in the pre- and post-monsoon months while underestimating it in the monsoon months. ERA5 underestimates it in all months except the post-monsoon months. Based on the results of the analysis, GPCC and ERA5 were selected as the precipitation and temperature datasets, respectively. The mean monthly maximum and minimum temperatures of the gridded datasets against observed data are presented in Fig. 9.

Fig. 9
Mean monthly maximum and minimum temperature of gridded datasets against observed data. a Maximum temperature, b Minimum temperature. MX and MN in the names represent maximum and minimum temperature, respectively

6 Performance of bias correction methods

For precipitation, LS improved the KGE over the whole area except for some western and southwestern areas. QM also improved it over most areas except some southwestern and eastern areas. LS and QM have average KGE values of 0.87 and 0.86, respectively. LS significantly improved the R2 values throughout the area, except for some western areas, whereas QM did not produce a significant improvement. The average R2 values are 0.82 and 0.79 for LS and QM, respectively. QM performed better in improving the MAE values in Chakwal, while LS showed improvement in the high-altitude areas and some areas of Attock and Chakwal. The average MAE values are 19.12 mm and 19.77 mm for LS and QM, respectively. The average RMSE values are 31.06 mm and 30.7 mm for LS and QM, respectively. Table 6 presents the statistical evaluation results of LS and QM bias correction for precipitation. Figure 10 presents the spatial maps of LS and QM bias-corrected GPCC against the uncorrected (biased) GPCC.

Table 6 Statistical evaluation of bias correction methods for precipitation
Fig. 10
Spatial distribution maps of LS and QM bias-corrected precipitation. a KGE, b R2, c MAE, d RMSE
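
As an illustration of the LS method evaluated here, the following is a minimal sketch of linear scaling in its commonly used monthly form: a multiplicative factor for precipitation and an additive shift for temperature, each derived from the ratio or difference of observed and gridded monthly means. The exact implementation used in the study is not restated in this section, so the function names, the `kind` argument, and the handling of zero raw means are assumptions.

```python
from collections import defaultdict


def _monthly_means(months, values):
    """Mean of `values` grouped by calendar month."""
    sums, counts = defaultdict(float), defaultdict(int)
    for m, v in zip(months, values):
        sums[m] += v
        counts[m] += 1
    return {m: sums[m] / counts[m] for m in sums}


def linear_scaling(months, raw, obs, kind):
    """Bias-correct a gridded series so its monthly means match the observations.

    months: calendar month (1-12) of each time step
    raw:    gridded values; obs: observed values for the same time steps
    kind:   "precip" (multiplicative factor) or "temp" (additive shift)
    """
    raw_mean = _monthly_means(months, raw)
    obs_mean = _monthly_means(months, obs)
    corrected = []
    for m, value in zip(months, raw):
        if kind == "precip":
            # Assumption: leave values unchanged when the raw monthly mean is zero.
            factor = obs_mean[m] / raw_mean[m] if raw_mean[m] > 0 else 1.0
            corrected.append(value * factor)
        elif kind == "temp":
            corrected.append(value + (obs_mean[m] - raw_mean[m]))
        else:
            raise ValueError("kind must be 'precip' or 'temp'")
    return corrected
```

By construction, the corrected series reproduces the observed mean of every calendar month over the calibration period. It does not adjust the shape of the distribution, which is the main difference from quantile mapping.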

For maximum temperature, LS and QM significantly improved the KGE values throughout the area. While LS improved the R2 values for the whole area except for some high-altitude areas, QM did not significantly improve the R2 values. LS and QM both have an average KGE value of 0.98, while the average R2 values are 0.98 and 0.97, respectively. In terms of MAE and RMSE, LS outperformed QM significantly. The average MAE values are 0.69 °C and 0.78 °C, whereas the average RMSE values are 1.02 °C and 1.17 °C, respectively, for LS and QM. The spatial maps of LS and QM against biased ERA5 maximum temperature are presented in Fig. 11.

Fig. 11
Spatial distribution maps of LS and QM bias-corrected maximum temperature. a KGE, b R2, c MAE, d RMSE

For minimum temperature, LS performed better in improving both KGE and R2 except for some high-altitude areas, while QM performed relatively poorly. LS and QM both have an average KGE value of 0.98, while the average R2 values are 0.99 and 0.98, respectively. LS outperformed QM for both MAE and RMSE. The average MAE values are 0.52 °C and 0.67 °C, while the average RMSE values are 0.68 °C and 0.86 °C, respectively, for LS and QM. Based on the statistical evaluation results, LS was selected for the bias correction of the selected gridded datasets. The spatial distribution maps of LS and QM against biased ERA5 minimum temperature are presented in Fig. 12. The average values of the statistical parameters for LS and QM bias correction for maximum and minimum temperature are presented in Table 7.

Fig. 12
Spatial distribution maps of LS and QM bias-corrected minimum temperature. a KGE, b R2, c MAE, d RMSE

Table 7 Statistical evaluation of bias correction methods for temperature
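
For comparison with LS, the following is a minimal sketch of an empirical quantile mapping. It assumes a simple rank-based variant in which each gridded value is assigned its non-exceedance probability in the gridded calibration sample and replaced by the observed quantile at that probability. The study may use a different (for example distribution-based) variant, and all names here are hypothetical.

```python
from bisect import bisect_right


def _quantile(sorted_vals, p):
    """Linearly interpolated quantile of sorted data for p in [0, 1]."""
    n = len(sorted_vals)
    if n == 1:
        return sorted_vals[0]
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def quantile_mapping(raw_cal, obs_cal, raw_values):
    """Map gridded values onto the observed distribution (empirical variant).

    raw_cal, obs_cal: gridded and observed samples from the calibration period
    raw_values:       gridded values to correct
    Values outside the calibration range are clamped to the observed extremes.
    """
    raw_sorted = sorted(raw_cal)
    obs_sorted = sorted(obs_cal)
    n = len(raw_sorted)
    corrected = []
    for x in raw_values:
        # Rank of x in the raw calibration sample (step function, no
        # interpolation between neighbouring raw values).
        rank = bisect_right(raw_sorted, x)
        p = (rank - 1) / (n - 1) if n > 1 else 0.0
        p = min(max(p, 0.0), 1.0)
        corrected.append(_quantile(obs_sorted, p))
    return corrected
```

Unlike linear scaling, this adjusts the whole distribution rather than only the mean. That can help with extremes, but it makes the result more sensitive to the length and quality of the calibration sample.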

6.1 Discussion

In this study, high-resolution gridded precipitation and temperature datasets were developed by evaluating gridded datasets for the Potohar Plateau of Pakistan. The accuracy of the gridded datasets was assessed against the observed dataset using several performance indicators. The results revealed significant uncertainties and errors in the estimates of these gridded datasets, especially for precipitation. On an annual scale, the uncertainty ranges from -20.45% to 16.49% for precipitation. The maximum temperature uncertainty ranges from -9% to -0.62%, while the minimum temperature uncertainty ranges from -0.59% to 9.72%.
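
The annual uncertainty ranges quoted above are relative differences between gridded and observed means. A minimal sketch of that arithmetic follows, using the mean annual precipitation totals reported in this discussion. The function name is hypothetical, and the formula (gridded minus observed, divided by observed) is an assumption that reproduces the quoted percentages to within rounding.

```python
def relative_bias_percent(gridded, observed):
    """Relative difference of a gridded estimate from the observed value, in %."""
    return (gridded - observed) / observed * 100.0


observed_precip = 864.15  # observed mean annual precipitation (mm)
mean_annual_precip = {     # gridded mean annual precipitation (mm), from the text
    "APHRODITE": 730.10,
    "ERA5-Land": 1006.70,
    "GPCC": 826.31,
    "MSWEP": 687.43,
    "PERSIANN-CDR": 802.73,
}

for name, value in mean_annual_precip.items():
    print(f"{name}: {relative_bias_percent(value, observed_precip):+.2f}%")
```

The lower and upper ends of the precipitation range correspond to MSWEP and ERA5-Land, respectively. The same function applied to the temperature means gives the temperature ranges.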

The comparison of observed precipitation with APHRODITE revealed a significant difference. APHRODITE underestimated precipitation both spatially and temporally. Compared to the mean annual observed precipitation of 864.15 mm, APHRODITE produced 730.10 mm. Monthly analysis revealed that APHRODITE underestimates in all months. The APHRODITE monthly precipitation for July and August was 143.71 mm and 147.28 mm, compared to the observed precipitation of 184.92 mm and 189.10 mm. Such underestimation by APHRODITE agrees with the findings of Nair et al. (2009), Ali et al. (2012), Duethmann et al. (2013), and Ahmed et al. (2019). APHRODITE is expected to estimate the monsoon months better, but its poor performance may be due to the raw data collected from Pakistan (Ahmed et al. 2019). According to Yatagai et al. (2009), the missing values in APHRODITE are represented as zero, which can result in underestimation and biases.

ERA5-Land overestimated precipitation over most of the area except some high-altitude areas, and the overestimation was also observed on a temporal scale. The mean annual precipitation of ERA5-Land was 1006.70 mm compared to the observed precipitation of 864.15 mm. The monthly precipitation of ERA5-Land for July and August was 215.41 mm and 195.81 mm against the observed precipitation of 184.92 mm and 189.10 mm. The reanalysis product shows a tendency to overestimate over most of the study area, as indicated by Dahri et al. (2021). ERA5-Land estimates precipitation better in the high-altitude areas, which is also in line with the findings of Dahri et al. (2021). Inconsistencies in the assimilated observations, the physical characteristics of the reanalysis system, and the model parameterizations used for weather forecasting are the main causes of the shortcomings of reanalysis products (Bosilovich et al. 2008; Dahri et al. 2021). As a result, compared to gauge-based and merged products, reanalysis outputs show greater variability and a wider range of residual errors.

GPCC tended to underestimate in most parts of the study area and to overestimate in some central areas, but its errors were lower than those of APHRODITE and ERA5-Land. GPCC's mean annual precipitation was 826.31 mm compared to the observed 864.15 mm, the closest to the observed value among all datasets evaluated in this study. GPCC overestimated the mean monthly precipitation for July and underestimated it for August. The GPCC July and August values were 190.41 mm and 180.69 mm, compared to the observed precipitation of 184.92 mm and 189.10 mm. The better accuracy of GPCC in this study agrees with the findings of Ahmed et al. (2019) and Salaudeen et al. (2021). GPCC captured the mean monthly precipitation better than the other datasets, which was also reported by Dhungana (2022). The higher efficiency of GPCC may be attributable to the large number of observation stations used in its development. More than 85,000 stations around the world are used in the development of GPCC, as reported by Schneider et al. (2014). Furthermore, the data undergo a series of automated and visual checks before being used by GPCC. Additional tests are conducted to verify anomalous data, since anomalies and extreme values are typical and cannot be overlooked in the analysis (Schneider et al. 2014).

MSWEP underestimated precipitation spatially except in a few eastern areas, and the underestimation was greater in the northern and high-altitude areas. MSWEP underestimated the precipitation in all months except April and May. Compared to the mean annual observed precipitation of 864.15 mm, the MSWEP estimate was 687.43 mm, a significant underestimation. PERSIANN-CDR overestimated precipitation in the western areas and some eastern areas while underestimating it in the northern and high-altitude areas. The mean annual precipitation estimated by PERSIANN-CDR was 802.73 mm, compared with the observed 864.15 mm. PERSIANN-CDR underestimated the pre-monsoon months (Jan-Mar), the post-monsoon months (Oct-Dec), and the monsoon months of Aug-Sep while overestimating the months of Apr-July. The merged products are closer to gauge-based products because they use inputs from ground observations. Nevertheless, because of variations in their other data sources, they show greater variability and error spreads among themselves. The uncertainties in the merging algorithms, as well as the limitations of their source data, affect the merged products, and the poor performance of the merged datasets may be partially explained by the use of different and/or fewer observations. However, a key contributing factor may be the inability of merging methods to retain the comparative advantages of gauge, reanalysis, and satellite data. Merged products should, in principle, have higher quality than their parent input datasets, which may be the case in data-rich regions but is not the case in data-scarce regions (Dahri et al. 2021).

For maximum temperature, CPC underestimated over almost the whole area, CRU underestimated in some parts while overestimating in others, and ERA5 underestimated throughout the study area. CPC and ERA5 underestimated in all months, while CRU underestimated in the monsoon season. Compared to the observed mean annual maximum temperature of 29.21 °C, the estimates of CPC, CRU, and ERA5 are 28.11 °C, 29.03 °C, and 26.58 °C, respectively. For minimum temperature, CPC overestimated throughout the study area, while CRU and ERA5 underestimated in some parts and overestimated in others. CPC overestimates in all months, and CRU overestimates except in Jun-Aug, when it underestimates. In contrast, ERA5 underestimates except in Sep-Jan, when it overestimates. Compared to the observed mean annual minimum temperature of 15.13 °C, CPC, CRU, and ERA5 estimated 16.60 °C, 15.37 °C, and 15.04 °C, respectively. For temperature, CRU and ERA5 perform better than CPC, and the better accuracy of CRU relative to CPC aligns with the findings of Nawaz et al. (2020). The better performance of ERA5 relative to CPC is in line with the results of Tarek et al. (2020). For precipitation, GPCC has the highest KGE (0.75) and the lowest MAE (21.22 mm) and RMSE (35.11 mm) among all datasets. ERA5 has a higher KGE (0.87) for maximum temperature, higher R2 values for both maximum and minimum temperature (0.98 and 0.97), and lower MAE and RMSE values (1.05 °C and 1.25 °C) for minimum temperature. Based on the statistical evaluation results, GPCC and ERA5 were selected as the precipitation and temperature datasets, respectively.

The selected datasets were bias-corrected using linear scaling and quantile mapping, and the performance of these methods was evaluated. Linear scaling performed marginally better than quantile mapping for both precipitation and temperature. For precipitation, linear scaling had higher KGE and R2 values and a lower MAE (0.87, 0.82, and 19.12 mm); for maximum temperature, the KGE, R2, and MAE values were 0.98, 0.98, and 0.69 °C; and for minimum temperature, they were 0.98, 0.99, and 0.52 °C. That linear scaling is as effective as quantile mapping when elevation is not considered is also reported by Shrestha et al. (2017).

Overall, the gridded datasets replicated the seasonal distribution patterns, but large differences were found at the monthly and annual timescales. All the datasets show considerable uncertainties in the precipitation distribution. The results align with the findings of Dahri et al. (2021) and Sun et al. (2018), who evaluated and compared 27 and 30 global precipitation datasets, respectively. Differences in structural characteristics, observational densities, input data, quality control measures, spatiotemporal resolution, interpolation techniques, and gauge under-catch correction are the major causes of such large uncertainties and differences among the global precipitation datasets (Dahri et al. 2021). According to Pour et al. (2014), extreme events often occur at the micro-scale; hence, gridded data may not be able to accurately capture those events at the point level. According to Schneider et al. (2017), a dense network of stations covering a large area is required to accurately capture extreme occurrences in the gridded data at the small or micro scale. One of the main reasons for data scarcity in most parts of the world is the absence of a dense network of stations. Therefore, gridded datasets are the sole source for performing hydro-climatic studies in regions where observational data are not available, despite several shortcomings such as a reduction in peak precipitation and an increase in wet days (Ahmed et al. 2019).

Understanding the spatial and temporal variations of precipitation is crucial for reducing the uncertainties in hydrological modeling and for accurate decision-making in managing water resources. High-quality, reliable precipitation data play an important role in numerous sectors such as flood risk management, water allocation management, and agricultural management. The dataset developed in this study can be utilized by water managers to perform accurate hydrological modeling and, as a result, to adopt more efficient water management strategies. This dataset has been utilized in estimating the water balance for anticipated land use changes in the Potohar Plateau (Idrees et al. 2022). Furthermore, the Government of Punjab has constructed 55 small dams on this Plateau to provide irrigation services to farmers. This dataset can be used in the development of a hydrological model for the management of these small dams, in addition to water management for agricultural purposes. Moreover, this dataset can also be utilized in the evaluation of Global Circulation Models (GCMs) for the study area for projecting future precipitation and temperature changes. This study can help in the evaluation of gridded datasets for other regions with similar climatic conditions.

In this study, only a few gridded datasets were evaluated, and the evaluation of more gridded datasets may yield different results. Hydrological modeling can be performed to validate the performance of this dataset.

7 Conclusion

The precipitation data for the Potohar Plateau are derived mostly from in-situ observational stations, which are sparsely distributed across the Plateau and often have incomplete records. This study evaluated gridded datasets for the Potohar Plateau. The results showed that the gridded datasets contain significant biases and require careful bias correction. In this study, GPCC performed better than the other precipitation datasets. GPCC had the lowest mean annual absolute error (-37.84 mm) and closely followed the observed dataset, with minimal deviations from the mean monthly precipitation. Furthermore, GPCC had the highest KGE (0.75) and the lowest MAE (21.22 mm) and RMSE (35.11 mm) values. Results for temperature were mixed, but overall, ERA5 performed better than the other datasets. CRU had the lowest mean annual absolute error (0.18 °C) for maximum temperature, while for minimum temperature ERA5 had the lowest mean annual absolute error (-0.09 °C). All the datasets underestimated maximum temperature, while ERA5 was the better dataset for minimum temperature. ERA5 had a better KGE value (0.87) for maximum temperature, while CRU had a better KGE value (0.93) for minimum temperature. The RMSE and MAE values of CRU for maximum temperature were slightly better than those of ERA5, while for minimum temperature the ERA5 values were better. For bias correction, the LS method performed better than QM for both precipitation and temperature. For precipitation, LS had higher KGE and R2 values and a lower MAE (0.87, 0.82, and 19.12 mm). For maximum temperature, LS and QM had the same KGE value (0.98), while LS had better R2, MAE, and RMSE values (0.98, 0.69 °C, and 1.02 °C). Similarly, LS performed better than QM for minimum temperature, with better R2, MAE, and RMSE values (0.99, 0.52 °C, and 0.68 °C). GPCC and ERA5 were selected as the precipitation and temperature datasets, respectively, and were bias-corrected to obtain the final precipitation and temperature datasets for the Potohar Plateau.

This dataset can be utilized in numerous applications such as hydrological modeling, climate change impact studies, and future climate projections. It was used to estimate the water balance under anticipated land use for the Potohar Plateau, and the findings would aid decision makers in understanding the plausible effects of land use changes on the water balance of the plateau, as well as in planning and executing adoption strategies (Idrees et al. 2022). This study can be utilized in the evaluation and bias correction of Global Circulation Models (GCMs) for studying future changes in precipitation and temperature, which would help water managers and policymakers in making policies according to the anticipated climate changes. Furthermore, future water balance estimation studies can also be carried out, which would inform water managers about changes in the water balance and ultimately help in decision-making for sustainable water management, especially for agricultural purposes.