Introduction 

The climate of a region is described by rainfall, daily maximum and minimum temperatures, relative humidity, and wind speed. Such variables are of great concern in different areas of earth sciences, e.g., agro-meteorology, agro-hydrology, and agro-ecology (Tapiador et al. 2012; Beck et al. 2019). Two elements of climate in particular, temperature and precipitation, control the phenology, productivity, abundance, interaction, and geographical distribution of biodiversity and the biotic ecosystem of a region (Early and Keith 2019). However, the unavailability of long-term, quality-controlled, and spatially continuous climatic data is a significant obstacle to observing a region’s climate (Prein and Gobiet 2017; Nashwan et al. 2019a, b). For precipitation, even where rain gauge records are accurate and available as long time series, their sparse spatial distribution hinders their use (Bell et al. 2015; Kidd et al. 2017; Nashwan et al. 2018). To solve this problem, an extensive range of global and regional gridded datasets has been developed; these datasets are of high resolution and provide spatiotemporal coverage globally and regionally in ungauged or sparsely gauged regions (Gyalistras 2003; Haylock et al. 2008; Yatagai et al. 2009; Schiemann et al. 2010; Belo‐Pereira et al. 2011; Herrera et al. 2012). The applications of high-resolution gridded datasets have grown rapidly in recent years, mainly because of their spatial–temporal resolution. Despite that, there is significant concern about the accuracy of gridded data, which has led to a large number of studies on the performance evaluation of gridded data at the regional level (Dinku et al. 2008; Ma et al. 2008; Kyselý and Plavcová 2010; Sylla et al. 2013; Eum et al. 2014; Prakash et al. 2015a, b; Prein and Gobiet 2017; Nashwan et al. 2019a). The performance of gridded data is evaluated mainly by comparing it with reliable ground data.
Several climatic datasets have been developed based on different data sources, e.g., ground observations, satellite, radar, re-analysis, and emerging techniques. These datasets are very useful for finding spatial–temporal trends (Trenberth et al. 2003; Ghosh et al. 2012; Fischer and Knutti 2014) and for assessing the reliability of climatic extremes (Trenberth et al. 2003; Ghosh et al. 2012; Fischer and Knutti 2014). For precipitation in particular, the literature shows substantial differences between global datasets at daily (Wong et al. 2017) to yearly (Sun et al. 2018) time scales, which limits the understanding of regional to global precipitation. There is a need to develop standard accuracy thresholds for gridded climatic datasets, particularly rainfall, on a local to regional scale in Punjab, a province of Pakistan.

The accuracy and performance of gridded climatic datasets over Pakistan were evaluated by Karori and Zhang (2008); Khan et al. (2014, 2018); Bui et al. (2019); Ullah et al. (2019); Adnan et al. (2020); Baudouin et al. (2020); Ougahi and Mahmood (2022); and Usman et al. (2022). However, these studies are limited to particular climatic regions, specific river catchments, and discontinuous historical time series. This research evaluates the performance of gridded climatic variables over Punjab at the regional (district) scale. Such performance evaluation and accuracy threshold criteria have not previously been tested on local to regional scales with such a continuous time series dataset.

Study area

The Punjab (meaning land of five rivers) is a province of Pakistan comprising 36 districts of different geographical extent. It has the largest population and the largest share in the economy of the country. Agriculture is the major contributor to the province’s economy and is sensitive to climatic extremes and abnormalities. This province has faced a series of natural disasters, e.g., floods, climatic extremes, and earthquakes, in the past decade and still faces such disasters regularly. The earthquake of 2005 and the flood of 2010 severely disrupted its economy. Policymakers require accurate and timely information on climatic variables to mitigate regional disasters. Most districts of Punjab face extreme weather with foggy winters, sometimes accompanied by precipitation. The temperature begins to rise from mid-February until April in the springtime. By May, the monsoon sets in and brings intense rainfall, which causes flash floods in the province. June and July are the hottest months of the year. Climatic extremes range from the hot, barren, and flat terrain in the south to the cool and mountainous Pothohar plateau in the north (Table 1; Fig. 1).

Table 1 Overall climatological view of the study area 
Fig. 1 Districts in the Punjab, PMD stations, and training and testing precipitation grids in the spatial accuracy ranking matrix

Dataset and methodology

Datasets

Gridded datasets of the National Oceanic and Atmospheric Administration Physical Sciences Laboratory (NOAA PSL) were tested over nine districts of Punjab: the Climate Prediction Center (CPC) Global Unified Temperature product (0.5° latitude × 0.5° longitude grid), the CPC Global Unified Gauge-Based Analysis of Daily Precipitation (0.5° latitude × 0.5° longitude grid), and the Climate Diagnostics Center (CDC) NCEP-NCAR Reanalysis-1 surface-level products (2.5° × 2.5° global grid), e.g., wind speed and relative humidity. The following daily variables were selected to check their accuracy during the period 1991–2020: maximum temperature “Tmax (CPC),” minimum temperature “Tmin (CPC),” wind speed “WS (CDC),” relative humidity “RH (CDC),” and rainfall “RF (CPC),” together with monthly long-term average rainfall “RFM (CPC).” NOAA PSL provides such historical datasets globally, at time steps ranging from four times a day to monthly, up to the present.

The Pakistan Meteorological Department (PMD) is the official department in Pakistan that observes climatic variables on the ground at meteorological observatories (Fig. 1). PMD (established in 1947) records maximum and minimum temperature, “Tmax (PMD)” and “Tmin (PMD),” once a day, while it records relative humidity, “RH (PMD 8 AM)” and “RH (PMD 5 PM),” and wind speed, “WS (PMD 8 AM)” and “WS (PMD 5 PM),” twice a day, i.e., at 8 A.M. and 5 P.M. (Pakistan Standard Time). PMD’s rain gauges collect total rainfall at a specific location (Fig. 1), which is reported as daily district total rainfall “RF (PMD DT)” and monthly long-term average rainfall “RFM (PMD).” PMD observatories are spatially sparse in Punjab, so a gridded dataset may be required to mitigate climatic disasters in the province.

Methodology

Daily datasets from NOAA PSL and PMD were obtained for 1991 to 2020 to check the accuracy of gridded datasets over nine districts of Punjab. Care was taken to select only the pixels of NOAA PSL datasets falling within each district when calculating the district mean of Tmax (CPC), Tmin (CPC), WS (CDC), and RH (CDC). For precipitation, however, four geospatial approaches were used, i.e., district average, point-sum, point-average, and district sum, each of which was compared with “RF (PMD DT).” In the first case, the average of all gridded rainfall pixels falling within the district domain was taken, written as RF (CPC DM). In the second and third cases, the sum and average of the gridded rainfall pixels were taken over the PMD observatory, written as RF (CPC PS) and RF (CPC PM), respectively. In the last case, the sum of all gridded rainfall pixels over the district was taken, written as RF (CPC DS). The climatic datasets were used in standard units throughout this research, i.e., precipitation in millimeters (mm), temperature in degrees Celsius (°C), relative humidity in percent (%), and wind speed in meters per second (m/s). Climatic data were preprocessed to remove null values and gaps in the time series (Fig. 2).
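The four rainfall aggregates described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the boolean masks, and the toy values are hypothetical, and how the pixels "over the PMD observatory" are chosen is not specified in the text, so `station_mask` is a placeholder for that choice.

```python
import numpy as np

def district_aggregates(grid, district_mask, station_mask):
    """Return the four daily rainfall aggregates for one district.

    grid          : 2-D array of gridded rainfall (mm), one value per pixel
    district_mask : boolean array, True for pixels inside the district
    station_mask  : boolean array, True for pixels taken around the PMD
                    station (hypothetical; the selection rule is an assumption)
    """
    grid = np.asarray(grid, dtype=float)
    in_district = grid[district_mask]
    near_station = grid[station_mask]
    return {
        "RF (CPC DM)": float(np.nanmean(in_district)),   # district average
        "RF (CPC DS)": float(np.nansum(in_district)),    # district sum
        "RF (CPC PM)": float(np.nanmean(near_station)),  # point average
        "RF (CPC PS)": float(np.nansum(near_station)),   # point sum
    }

# Toy 2 x 3 rainfall grid (mm); values are made up for illustration.
grid = np.array([[0.0, 2.0, 4.0],
                 [1.0, 3.0, 5.0]])
district_mask = np.array([[True, True, False],
                          [True, True, False]])
station_mask = np.array([[False, True, False],
                         [False, True, False]])
result = district_aggregates(grid, district_mask, station_mask)
```

Each aggregate would then be compared with RF (PMD DT) for the same day.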

Fig. 2 Spatial representation of Tmax (CPC), Tmin (CPC), and WS (CDC), with the averaging approach illustrated for RH (CDC) in A. The different spatial approaches for RF (CPC DM), RF (CPC PS), RF (CPC PM), and RF (CPC DS) are shown in B, C, D, and E. The approach is illustrated for Faisalabad district and the corresponding PMD station using 7 selected pixels

A new spatial accuracy ranking matrix using a pixel-based approach is introduced in this research to check the accuracy of gridded precipitation in real-time applications and to produce accuracy maps over all of Punjab (a total of 36 districts). A pixel-wise spatial accuracy approach was adopted, i.e., the pixels of gridded precipitation within each district boundary were selected (Fig. 3A). RMSE was calculated for each pixel against the PMD rainfall of the selected district (Fig. 3B); hence, a spatial accuracy grid was formed that contains the pixel-by-pixel RMSE of gridded precipitation. A spatial accuracy grid was developed in this way for each district, representing the accuracy of each gridded precipitation pixel (based on RMSE) relative to PMD rainfall (Fig. 3B). As this spatial accuracy grid contains the RMSE values of each pixel falling inside the district, a pixel-wise accuracy modeling approach was used to predict the gridded rainfall accuracy over the whole province of Punjab (Fig. 3).
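A minimal sketch of how such a pixel-wise spatial accuracy grid could be computed is given below. It assumes each selected pixel has a rainfall time series aligned with the station series; the function name and the toy values are hypothetical and are not taken from the study.

```python
import numpy as np

def pixel_rmse_grid(pixel_series, station_series):
    """RMSE of every pixel's rainfall series against one station series.

    pixel_series   : array of shape (n_pixels, n_times)
    station_series : array of shape (n_times,)
    Returns an array of shape (n_pixels,), one RMSE per pixel.
    """
    pixel_series = np.asarray(pixel_series, dtype=float)
    station_series = np.asarray(station_series, dtype=float)
    diff = pixel_series - station_series  # broadcasts the station over pixels
    return np.sqrt(np.mean(diff ** 2, axis=1))

# Toy example: two pixels, three time steps (values are made up).
station = np.array([10.0, 20.0, 30.0])
pixels = np.array([[10.0, 20.0, 30.0],   # identical to the station
                   [13.0, 16.0, 30.0]])  # differs by 3, -4, 0 mm
rmse = pixel_rmse_grid(pixels, station)
```

The resulting vector, placed back at the pixel locations, plays the role of the spatial accuracy grid for one district.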

Fig. 3 An example of the selection of precipitation pixels within the domain of Faisalabad district and pixel-wise RMSE calculation from the PMD station (A), and formation of the pixel-wise accuracy grid consisting of RMSE for each selected pixel (B)

This spatial accuracy grid was divided into training and testing grids in an 80% to 20% ratio to formulate a model. Because the districts differ in geospatial extent (Fig. 1), the testing pixels were selected with spatial randomness at visually well-scattered geographic locations, so that interpolation and extrapolation in the rest of the districts had sufficient training pixels (Fig. 1). Nine interpolation techniques, broadly consisting of purely statistical and geostatistical approaches, were used to develop the pixel-based interpolated spatial accuracy grid. The interpolated spatial accuracy grid was validated against the testing grid, and the model was optimized to obtain the minimum RMSE (Fig. 4). The optimized interpolated spatial accuracy grids were ranked in ascending order of RMSE against the testing grid; hence, the interpolation techniques were ranked for each month of the year, and the result was named the spatial accuracy ranking matrix. This approach yielded a matrix of order 12 by 9, representing the month-wise spatial accuracy of the different interpolation techniques by accuracy rank over all of Punjab (Fig. 4). The statistical and geostatistical interpolation techniques used in the spatial accuracy ranking matrix are detailed below.
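The 80% by 20% division can be illustrated with a simple seeded random split. This is a stand-in only: the study selected testing pixels with spatial randomness and visual judgment of their scatter, which a plain random permutation does not reproduce. The function name, the seed, and the pixel count of 50 are assumptions made for the example.

```python
import numpy as np

def split_pixels(n_pixels, test_fraction=0.2, seed=0):
    """Split pixel indices 0..n_pixels-1 into training and testing sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_pixels)          # random order of pixel indices
    n_test = int(round(n_pixels * test_fraction))
    test_idx = np.sort(order[:n_test])         # held out for validation
    train_idx = np.sort(order[n_test:])        # passed to the interpolator
    return train_idx, test_idx

# Hypothetical accuracy grid with 50 pixels.
train_idx, test_idx = split_pixels(50)
```

The training indices would feed each interpolation technique, and the RMSE between the interpolated values and the held-out testing pixels would be the quantity to minimize.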

Fig. 4 Overview of the spatial accuracy ranking matrix using the pixel-based approach used in this research

Inverse distance weighting (IDW) with a variable search radius is based on Tobler’s first law of geography (Saleem et al. 2018). Kumar et al. (2022) and Philip and Watson (1982) used the following equation for IDW:

$${RF}_{idw}=\frac{{\sum }_{i=1}^{n}{w}_{i}\times {RF (CPC)}_{i}}{{\sum }_{i=1}^{n}{w}_{i}} , {w}_{i}=\frac{1}{{dis}_{ji}^{p}}$$
(1)

The geostatistical technique of Kriging was used in this research in two modes, i.e., ordinary and universal. McBratney and Webster (1986), Oliver and Webster (2007), and Kumar et al. (2022) give the following formula for ordinary Kriging:

$${RF\left(u\right)}_{o}={\sum }_{i=1}^{n(u)}{\gamma }_{i}\times RF({u}_{i})$$
(2)

The mathematical formula for universal Kriging (Gundogdu and Guney 2007) is given below:

$${RF\left(u\right)}_{u}=d\left(u\right)+ \delta (u)$$
(3)

where in (1), \({RF}_{idw}\) is the interpolated spatial accuracy grid of rainfall pixels obtained through IDW, \({RF (CPC)}_{i}\) is the known value of the rainfall pixel at the ith location, \({w}_{i}\) is the weight, \({dis}_{ji}\) is the distance between the pixels at the ith and jth locations, and \(p\) is the power applied to the distance. In (2), \({RF\left(u\right)}_{o}\) is the interpolated spatial accuracy grid obtained through ordinary Kriging at a given location \(u\), \(RF({u}_{i})\) are the known values of the spatial accuracy grid, \({\gamma }_{i}\) is the Kriging weight that minimizes the variance, and \(n(u)\) is the total number of surrounding samples used in predicting the value of \({RF\left(u\right)}_{o}\). In (3), \({RF\left(u\right)}_{u}\) is the predicted rainfall through universal Kriging, \(d\left(u\right)\) is the deterministic function, and \(\delta (u)\) is the macroscale or random variation.

The following semi-variance models of Kriging were incorporated in this research: ordinary Kriging with a circular semi-variance model (KOC), ordinary Kriging with an exponential semi-variance model (KOE), ordinary Kriging with a Gaussian semi-variance model (KOG), ordinary Kriging with a linear semi-variance model (KOL), and ordinary Kriging with a spherical semi-variance model (KOS). The other modes of Kriging were universal Kriging with linear drift, written as “KU (LL),” and universal Kriging with quadratic drift, written as “KU (LQ).”

Spline interpolation predicts values using a mathematical function that minimizes the overall surface curvature, resulting in a smooth surface that passes exactly through the training pixels (Shekhar and Xiong 2008).
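Equation (1) can be sketched directly in Python. The version below is an illustration, not the implementation used in the study: it weights all known pixels rather than applying a variable search radius, and the function name, the default power of 2, and the toy coordinates are assumptions.

```python
import numpy as np

def idw(known_xy, known_values, target_xy, power=2.0):
    """Inverse distance weighting as in Eq. (1): sum(w_i * v_i) / sum(w_i),
    with w_i = 1 / distance**power. Uses every known point (no search radius).
    """
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    # Pairwise distances, shape (n_targets, n_known).
    dist = np.linalg.norm(target_xy[:, None, :] - known_xy[None, :, :], axis=2)
    out = np.empty(len(target_xy))
    for j, d in enumerate(dist):
        exact = d == 0
        if exact.any():
            # Target coincides with a known pixel: return its value directly.
            out[j] = known_values[exact][0]
        else:
            w = 1.0 / d ** power
            out[j] = np.sum(w * known_values) / np.sum(w)
    return out

# Two known pixels on a line (toy values).
known_xy = [(0.0, 0.0), (2.0, 0.0)]
known_values = [1.0, 3.0]
estimates = idw(known_xy, known_values, [(1.0, 0.0), (0.0, 0.0), (0.5, 0.0)])
```

At the midpoint both weights are equal, so the estimate is the plain average; closer to the first pixel, its value dominates.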

The following performance and quality evaluation parameters were adopted to check the quality and accuracy of the NOAA PSL datasets against the PMD dataset:

$$RMSE=\sqrt{\frac{\sum_{i}^{n}{\left({x}_{PMD}-{x}_{Gridded}\right)}^{2}}{n}}$$
(4)
$$SKWN=\frac{\sum_{i}^{n}{\left(x-{x}_{mean}\right)}^{3}}{\left(n-1\right)\times {\sigma }^{3}}$$
(5)

where in (4), RMSE is the root mean square error, and a lower value indicates higher accuracy of the dataset. In (5), SKWN is the skewness of the dataset. \({x}_{PMD}\) is the PMD dataset, \({x}_{Gridded}\) is the NOAA PSL dataset, n is the total number of samples, \({x}_{mean}\) is the mean, and \(\sigma\) is the standard deviation of the dataset. The 2nd standard error of SKWN, i.e., twice its standard error, was suggested by Brown (1997) and Doane and Seward (2011) and is given as follows:

$$2\mathrm{nd}\;\mathrm{standard}\;\mathrm{error}\;\left(SKWN\right)=2\times\sqrt{\frac6n}$$
(6)
$$KRTS=\frac{\sum_{i}^{n}{\left(x-{x}_{mean}\right)}^{4}}{n\times {\sigma }^{4}}$$
(7)

where in (7), KRTS is the kurtosis of the datasets; its 2nd standard error, as suggested by Brown (1997) and Doane and Seward (2011), is as follows:

$$2\mathrm{nd}\;\mathrm{standard}\;\mathrm{error}\;\left(KRTS\right)=2\times\sqrt{\frac{24}n}$$
(8)
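As a worked check of (6) and (8), the short sketch below evaluates both limits. The sample size n = 360 is an assumption (30 years of monthly values); it is not stated in the text, but it reproduces the limits of ±0.258 and ±0.516 reported in the results. The function names are hypothetical.

```python
import math

def two_standard_errors(n):
    """Return (2 x SE of skewness, 2 x SE of kurtosis) as in Eqs. (6) and (8)."""
    return 2 * math.sqrt(6 / n), 2 * math.sqrt(24 / n)

def is_symmetric(skewness, n):
    """Treat a distribution as symmetric if |skewness| is within 2 x SE."""
    limit, _ = two_standard_errors(n)
    return abs(skewness) <= limit

# n = 360 is an assumed sample size, not a value given in the source.
skew_limit, kurt_limit = two_standard_errors(360)
```

A skewness of, say, -0.9 would fall outside the limit and be treated as non-symmetric, while -0.1 would fall inside it.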

The correlation coefficient (CORREL) was calculated by using the following formula:

$$CORREL=\frac{\sum ({x}_{PMD}-{x}_{PMDmean})({x}_{Gridded}-{x}_{Griddedmean})}{\sqrt{\sum {\left({x}_{PMD}-{x}_{PMDmean}\right)}^{2}\sum {({x}_{Gridded}-{x}_{Griddedmean})}^{2}}}$$
(9)

where in (9), \({x}_{PMDmean}\) and \({x}_{Griddedmean}\) are the means of the PMD and NOAA PSL datasets, respectively. The following classification of CORREL was adopted in this research: 0.2 to 0.39, weak; 0.4 to 0.59, moderate; 0.6 to 0.79, strong; and 0.8 to 1.0, very strong correlation between variables.

To check the relative deviation between the gridded and ground-observed datasets, the standard deviation ratio (RSTD) was calculated using (10):

$$RSTD=\frac{{\sigma }_{Gridded}}{{\sigma }_{PMD}}$$
(10)

where \({\sigma }_{Gridded}\) and \({\sigma }_{PMD}\) are the standard deviations of the gridded and PMD datasets, respectively. Three cases of RSTD were considered, i.e., RSTD > 1.0, RSTD = 1.0, and RSTD < 1.0.
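The three comparison metrics can be sketched together as follows, assuming the usual definitions: RMSE as the square root of the mean squared difference, CORREL as the Pearson correlation, and RSTD as the ratio of the gridded to the ground standard deviation. The function names, the use of the sample standard deviation, the treatment of values below 0.2 as "none", and the toy series are assumptions made for illustration.

```python
import numpy as np

def rmse(pmd, gridded):
    pmd, gridded = np.asarray(pmd, float), np.asarray(gridded, float)
    return float(np.sqrt(np.mean((pmd - gridded) ** 2)))

def correl(pmd, gridded):
    pmd, gridded = np.asarray(pmd, float), np.asarray(gridded, float)
    dp, dg = pmd - pmd.mean(), gridded - gridded.mean()
    return float(np.sum(dp * dg) / np.sqrt(np.sum(dp ** 2) * np.sum(dg ** 2)))

def rstd(pmd, gridded):
    # Ratio of sample standard deviations: gridded over ground-observed.
    return float(np.std(gridded, ddof=1) / np.std(pmd, ddof=1))

def correl_class(r):
    """Map |r| to the classes used in the text; below 0.2 is labeled 'none'."""
    r = abs(r)
    if r >= 0.8:
        return "very strong"
    if r >= 0.6:
        return "strong"
    if r >= 0.4:
        return "moderate"
    if r >= 0.2:
        return "weak"
    return "none"

# Toy series: the gridded values are exactly twice the ground values.
pmd = [1.0, 2.0, 3.0, 4.0]
gridded = [2.0, 4.0, 6.0, 8.0]
metrics = (rmse(pmd, gridded), correl(pmd, gridded), rstd(pmd, gridded))
```

In this toy case the correlation is perfect while RSTD is 2, which shows why the two metrics are reported separately: a gridded series can track the ground series closely and still have a different spread.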

Results and discussions

The interpretation of skewness and kurtosis given by Brown (1997) and Doane and Seward (2011) was used in this research. The value of the 2nd standard error (SE) of skewness for the dataset used was \(\pm 0.258\). Tmax (CPC) and Tmax (PMD) were negatively skewed in all districts, and Tmax (CPC) followed the same pattern of skewness as Tmax (PMD) in the 9 districts of Punjab (Table 2). However, these negative values were beyond the SE of skewness, so the distributions were declared non-symmetric. Tmin (CPC) was symmetric with negative skewness and followed the same skewness trend as Tmin (PMD). The results for WS (CDC) reveal non-symmetric data in all districts, and a similar direction of positive skewness was observed for WS (PMD 8 AM) in the 9 districts of Punjab (Table 2). WS (PMD 5 PM) was symmetric in Lahore, Multan, and Sargodha districts (Table 2), whereas the rest of the districts followed non-symmetric distributions like WS (CDC). WS (PMD Mean) was found normal in Lahore and Sialkot districts, while in the rest it was non-symmetric with positive skewness like WS (CDC). RH (CDC) was symmetric in Jhelum and Sargodha districts; in the rest of the districts it was non-symmetric, i.e., negatively skewed in Sialkot and positively skewed elsewhere (Table 2). RH (PMD 8 AM) was symmetric in Jhang and non-symmetric with negative skewness in the rest of the districts (Table 2). RH (PMD 5 PM) was also symmetric with slightly negative skewness in Bahawalnagar, Faisalabad, Lahore, and Multan, while it was non-symmetric in the rest of the districts (Table 2). RH (PMD Mean) was symmetric in Jhelum district; in the rest, it was non-symmetric with negative or positive skewness. None of RF (CPC DM), RF (CPC PM), RF (CPC PS), RF (CPC DS), and RF (PMD DT) was symmetric, because their skewness was positive and far beyond the SE of skewness (Table 2).

Table 2 Results of skewness of climatic datasets used in research

The value of the 2nd standard error (SE) of kurtosis for the dataset used was \(\pm 0.516\). Tmax (CPC), Tmax (PMD), Tmin (CPC), and Tmin (PMD) were fairly platykurtic (that is, relatively flat) with slightly negative kurtosis values in the 9 districts of Punjab. WS (CDC) was mesokurtic in Bahawalnagar, Bahawalpur, Multan, and Sialkot and slightly negative mesokurtic in Jhelum; in the rest of the districts, it was leptokurtic. WS (PMD 8 AM) was approximately mesokurtic in Jhang and Multan and slightly negative mesokurtic in Lahore and Sargodha; in the rest, it had leptokurtic distributions. WS (PMD 5 PM) was slightly negative mesokurtic in Bahawalnagar, Bahawalpur, Jhang, and Jhelum; had a flat platykurtic distribution in Lahore, Multan, and Sargodha; and had a flattened leptokurtic distribution in Sialkot (Table 3). WS (PMD Mean) had mesokurtic distributions in Jhang, Jhelum, Multan, Sialkot, and Bahawalnagar; platykurtic in Lahore and Sargodha; and leptokurtic in Bahawalpur district. RH (CDC) had normal distributions in Jhelum and Sialkot, was platykurtic in Sargodha, and was leptokurtic in the rest of the districts. RH (PMD 8 AM) was mesokurtic in Jhelum and Bahawalpur and slightly negative mesokurtic in Bahawalnagar, Lahore, Jhang, and Multan; it was platykurtic in the rest of the districts. RH (PMD 5 PM) was slightly negative mesokurtic in Bahawalnagar, Bahawalpur, Jhang, Sialkot, and Sargodha and fairly mesokurtic in Faisalabad; in the rest, it had platykurtic distributions. RH (PMD Mean) followed negative mesokurtic distributions in all districts except Sialkot, which was strongly leptokurtic, and Jhelum, which was platykurtic. RF (CPC DM), RF (CPC PM), RF (CPC PS), RF (PMD DT), and RF (CPC DS) had very tall leptokurtic distributions in all selected districts of Punjab (Table 3).

Table 3 Results of kurtosis for the climatic datasets used in this research

The correlation coefficient was calculated against the corresponding climatic variable of the PMD observatory in each district to check the linear relation between the variables. Tmax (CPC) showed a very strong correlation with Tmax (PMD) in each district except Sargodha, which had a moderate correlation. Similarly, a strong correlation was found between Tmin (CPC) and Tmin (PMD), except in Sargodha, where a weak correlation was found. WS (CDC) showed a strong to moderately strong correlation with WS (PMD 8 AM), WS (PMD 5 PM), and WS (PMD Mean) in Bahawalnagar, Bahawalpur, and Multan, respectively. It had a weak correlation in Jhang and Jhelum and no correlation in the rest of the districts (Table 4). RH (CDC) showed a moderate to fairly strong correlation with RH (PMD 8 AM) in Faisalabad, Sargodha, Lahore, Sialkot, Jhelum, and Jhang, respectively, and a weak correlation in Multan. RH (CDC) showed no correlation with RH (PMD 8 AM) in the rest of the districts. RH (CDC) revealed a moderate to strong correlation with RH (PMD 5 PM) in Bahawalpur, Bahawalnagar, Sargodha, Multan, Faisalabad, Jhelum, Sialkot, and Lahore, respectively (Table 4). A very strong to strong positive correlation was found between RH (CDC) and RH (PMD Mean) in Jhang, Sialkot, Lahore, Sargodha, and Faisalabad districts, respectively. A moderately strong to weak correlation was found between these variables in Bahawalnagar, Bahawalpur, and Jhang, respectively. RF (CPC DM) revealed a fairly strong correlation with RF (PMD DT) in Bahawalnagar, Sialkot, Jhang, Lahore, Faisalabad, Bahawalpur, and Multan, respectively. RF (CPC PM) revealed a moderately strong correlation with RF (PMD DT) in Sialkot, Jhelum, Jhang, and Faisalabad districts, respectively; a strong correlation was observed in Bahawalnagar and Sargodha, and a weak correlation in Multan (Table 4). RF (CPC PS) showed a strong correlation in Bahawalnagar, Lahore, and Sargodha districts and a moderate to strong correlation in Sialkot, Bahawalpur, and Jhang, respectively. This variable showed a weak correlation in the rest of the districts (Table 4).

Table 4 Results of the correlation coefficient for the climatic dataset used in this research

The standard deviation ratio (RSTD) was calculated by dividing the standard deviation of the gridded data by that of the ground-observed variable (Table 5).

Table 5 Results of RSTD of the climatic dataset used in this research

There are three possibilities, i.e., RSTD > 1.0, RSTD = 1.0, and RSTD < 1.0. The first case means that the standard deviation (STD) of the gridded dataset is greater than that of the corresponding variable observed on the ground. The second case (RSTD = 1.0) means that the gridded and ground-observed datasets have the same spread, and the last case (RSTD < 1.0) means that the STD of the gridded climatic variable is less than the STD of the ground-observed one.

With this interpretation, Tmax and Tmin have RSTD near 1.0, indicating the same spread in both datasets in the districts of Punjab (Table 5). WS (8 AM), WS (5 PM), and WS (Mean) have RSTD nearly equal to 1.0 in Bahawalnagar and Lahore districts and RSTD < 1.0 in Multan, while the rest of the districts have RSTD > 1.0. RH (8 AM), RH (5 PM), and RH (Mean) have RSTD nearly equal to 1.0 in all districts except Jhang, where RSTD < 1.0, indicating a smaller spread in the gridded dataset than in the PMD observations (8 AM, 5 PM, and mean). RF (CPC DM) has RSTD near 1.0 in Bahawalpur and Faisalabad; less than 1.0 in Bahawalnagar, Sargodha, and Sialkot; and greater than 1.0 in the rest of the districts (Table 5). The RSTD of RF (CPC PM) was near 1.0 in Bahawalnagar, Bahawalpur, Faisalabad, and Lahore, and in the rest of the districts, RSTD < 1.0. The RSTD of RF (CPC PS) was < 1.0 in Jhelum, Sargodha, Lahore, and Sialkot and > 1.0 in the rest of the districts (Table 5).

RMSE was the last and most crucial parameter used to assess the accuracy of NOAA PSL datasets. Tmax (CPC) and Tmin (CPC) were found to be accurate representations of Tmax (PMD) and Tmin (PMD), as their RMSE remained very low (Table 6).

Table 6 Root mean square error of gridded climatic dataset with ground observed data

Similarly, the RMSE of WS (CDC) with WS (PMD 8 AM), WS (PMD 5 PM), and WS (PMD Mean) showed excellent accuracy (RMSE \(\cong 0)\) in all of the districts except Sargodha, where accuracy varied slightly (RMSE \(\cong 3-5)\). The RMSE of RH (CDC) with RH (PMD 8 AM) was marginally higher than that with RH (PMD 5 PM) and RH (PMD Mean) in most of the districts (Table 6). The maximum RMSE of RH (CDC) with RH (PMD 8 AM) was 3.88, i.e., an error of 3.88% relative humidity in RH (CDC) relative to the PMD observatory. RF (CPC DT), RF (CPC DM), and RF (CPC PM) were very accurate (RMSE \(\cong 0-2)\) in Punjab, except in Sargodha, where RF (CPC DT) agreed poorly (RMSE > 6) with the ground observatory. RF (CPC PS) was also very accurate (RMSE \(\cong 0-2).\)

Spatial accuracy ranking matrix of gridded rainfall by using a pixel-based approach

This approach was developed on the monthly long-term mean gridded precipitation, RFM (CPC), and the long-term mean ground rainfall collected by PMD, RFM (PMD). According to the methodology described above, the spatial accuracy grid was divided into training and testing grids in an 80% to 20% ratio (Fig. 1). The training grids were given as input to the model, and the spatial accuracy grid was interpolated over Punjab province using statistical and geostatistical interpolation techniques to form an interpolated spatial accuracy grid. The RMSE of the interpolated spatial accuracy grid against the testing grid was calculated, and the results are given in Table 7.

Table 7 Optimized RMSE for interpolated spatial accuracy grid with the RFM (CPC) testing grid by nine different interpolation techniques

A spatial accuracy ranking matrix (Table 8) was developed from Table 7 by sorting the interpolation techniques in ascending order of RMSE for each month. This matrix gives the month-by-month ranking of interpolation techniques for gridded rainfall with the sparse rain gauges in the Punjab province of Pakistan. The nine ranks (columns) in this matrix represent the order of the interpolation techniques in space by time; hence, the best interpolation is placed at the first rank and the worst at the last. The 1st rank in the matrix was declared the optimum, in which the best interpolation techniques among the nine were placed (Table 8).
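The step from an RMSE table to a ranking matrix amounts to a row-wise sort. A minimal sketch follows, assuming the RMSE table is arranged as months by techniques; the function name is hypothetical, and the three techniques and two rows of values are made up for illustration and are not taken from Table 7.

```python
import numpy as np

def ranking_matrix(rmse_table, techniques):
    """For each month (row), list technique names in ascending order of RMSE.

    rmse_table : array of shape (n_months, n_techniques)
    techniques : technique names in the column order of rmse_table
    """
    rmse_table = np.asarray(rmse_table, dtype=float)
    # Stable sort keeps the original column order when RMSE values tie.
    order = np.argsort(rmse_table, axis=1, kind="stable")
    return [[techniques[k] for k in row] for row in order]

# Illustrative values only (not from Table 7).
techniques = ["IDW", "KOE", "Spline"]
rmse_table = [[2.0, 3.0, 4.0],   # hypothetical month 1
              [5.0, 3.5, 9.0]]   # hypothetical month 2
ranks = ranking_matrix(rmse_table, techniques)
```

With 12 months and nine techniques the same call would return a 12 by 9 matrix whose first column holds the Rank-1 technique for each month.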

Table 8 Spatial accuracy ranking matrix using the pixel-based approach for RFM (CPC)

In the first rank of the spatial accuracy ranking matrix (Table 8), IDW produced RMSE of 2.55, 2.86, 6.71, 0.86, and 0.36 in January, May, September, October, and November, and KOE produced RMSE of 3.18, 3.56, 3.15, 10.72, 19.92, and 1.32 in February, March, April, June, August, and December (Table 9). In the 2nd rank, KOE, KOL, KOG, KOS, and KOC produced the least RMSE with testing pixels. KOE produced RMSE of 2.76, 22.35, 9.83, 1.13, and 0.9 in January, July, September, October, and November, and KOL produced 3.21, 3.58, and 3.16 in February, March, and April. In the 3rd rank, KOC, KOS, and KOL produced the least RMSE. RMSE from KOC was 2.78, 3.21, 3.58, 3.16, 10.73, 20.03, 9.85, and 1.01 in January, February, March, April, June, August, September, and November. The RMSE of KOC, KOL, and KOS was the same in February, March, April, June, and September. In rank 4 (Table 8), RMSE produced through KOC was 3.21, 3.58, 3.16, 10.73, 22.44, 9.85, 1.35, and 1.01 in February, March, April, June, July, September, October, and November. The RMSE of KOC and KOS was the same in February, March, and April, and it was also the same for KOL in September. KOL, KOC, KOS, KU (LQ), and spline interpolations were optimum in rank 5. In this rank, RMSE from KOL was 2.90, 4.42, 22.45, 9.85, 1.45, and 1.13 in January, May, July, September, October, and November; in addition, KOL, KOC, and KOS produced the same RMSE in May and September. In this rank, KU (LQ) and spline interpolations performed well in February, March, April, and December. KU (LQ) and spline dominated most of the months in rank 6. RMSE from KU (LQ) was 4.08, 5.67, 12.01, and 2.09 in January, March, June, and November. It was 5.66, 25.58, 27.58, 9.92, 2.09, and 1.93 from spline interpolation in February, July, August, September, October, and December (Table 9). KOC, KOE, KOL, and KOS produced the same RMSE in May, ranking 6th in the spatial accuracy ranking matrix. Similarly, KU (LL), KU (LQ), KOG, and spline in the 7th rank, KU (LL) in the 8th rank, and KOG, IDW, and KU (LL) in the 9th rank produced the highest RMSE with testing pixels (Table 9).

Table 9 RMSE values of spatial accuracy ranking matrix using the pixel-based approach

These nine ranked spatial accuracies of gridded rainfall can be classified into three categories: good, moderate, and poor. This classification is based on two parameters, i.e., rainy and less rainy months. Rainy months are the monsoon months, i.e., June to September, and the rest are less rainy in Punjab. A rank in the spatial accuracy ranking matrix is classified as good if the RMSE values from the different interpolation techniques remain less than 5 in less rainy months and less than 20 in rainy months. Similarly, a rank is classified as moderate if RMSE remains less than 6.5 in less rainy months and less than 30 in rainy months. A rank is classified as poor if RMSE remains less than 9 in less rainy months and is greater than 30 in rainy months. According to this classification, ranks 1 to 4 fall under class good, ranks 5 to 6 under class moderate, and the rest under class poor. Statistical and geostatistical interpolations performed very well in each rank in the less rainy months, i.e., produced RMSE < 10. However, in the rainy months of the year, the RMSE of the interpolations varied directly with rank in the spatial accuracy ranking matrix (Fig. 5).
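The thresholds above can be written as a small lookup for a single RMSE value. This is a sketch under assumptions: the text classifies whole ranks rather than individual values, and anything beyond the moderate thresholds is treated here as poor. The function and constant names are hypothetical.

```python
# Monsoon (rainy) months as stated in the text: June to September.
RAINY_MONTHS = {6, 7, 8, 9}

def accuracy_class(rmse, month):
    """Classify one RMSE value (mm) for a calendar month (1-12)."""
    if month in RAINY_MONTHS:
        good_limit, moderate_limit = 20.0, 30.0
    else:
        good_limit, moderate_limit = 5.0, 6.5
    if rmse < good_limit:
        return "good"
    if rmse < moderate_limit:
        return "moderate"
    return "poor"  # assumption: everything above the moderate limit
```

For example, an RMSE of 19.92 in August falls under good, whereas the same value in a less rainy month would fall under poor; applying the function to every month of a rank and taking the worst class is one possible way to classify the whole rank.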

Fig. 5 Monthly performance of different interpolation techniques in RFM (CPC)

This accuracy ranking matrix helps users of RFM (CPC) choose an interpolation technique that is feasible in space and time where rain gauges are sparse.

Accuracy mapping of gridded rainfall

Using the first rank of the spatial accuracy ranking matrix of RFM (CPC) from Table 8, accuracy maps (in mm) were produced over Punjab. The accuracy maps were downscaled to a 0.25° × 0.25° grid to improve visualization. The Rank-1 accuracy of RFM (CPC) was mapped for each month of the year (Fig. 6). This mapping revealed good RFM (CPC) accuracy in the South Punjab region (Bahawalpur, Bahawalnagar, Rahim Yar Khan, Rajanpur, Muzaffargarh, DG Khan, Layyah, Multan, Lodhran, Khanewal, and Vehari districts) throughout the year. The Central Punjab region (Lahore, Sheikhupura, Kasur, Nankana, Sahiwal, Pakpattan, Okara, Faisalabad, Chiniot, T.T. Singh, Narowal, Gujrat, Gujranwala, and Sialkot) shows good to moderate accuracy throughout the year: the results were good in the winter, spring, and autumn seasons; however, accuracy went from average to poor in the monsoon season (May to September). In Northern Punjab (Rawalpindi, Jhelum, Chakwal, Attock, Mianwali, Khushab, Bhakkar, Jhang, Sargodha, and Hafizabad districts), the Rank-1 accuracy remains the poorest throughout the year. The Northern Punjab region contains the high-terrain area of the province (Fig. 1), the Central Punjab region has moderate to flat terrain, and the South Punjab region mainly consists of flat terrain. Hence, Rank-1 accuracy in the spatial accuracy ranking matrix varied strongly with the districts’ elevations and the monsoon period (Table 10).

Fig. 6 Monthly accuracy maps of RFM (CPC) over all districts of Punjab province

Table 10 District-wise RMSE (in millimeters) for each month by using the Rank-1 in the spatial accuracy ranking matrix

Conclusions

The following are the outcomes of the research work:

•Tmax (CPC) and Tmin (CPC) followed the same data quality as Tmax (PMD) and Tmin (PMD), and the RMSE remains < 0.5 °C in nine districts of Punjab. WS (CDC) also agreed closely with ground-observed wind speed, with very high accuracy (RMSE < 1 m/s), and RH (CDC) was also very accurate, with RMSE < 3%.

•The performance of daily gridded precipitation, evaluated in different spatial domains, i.e., point-to-point and polygon-to-point, reveals that district average precipitation produced high accuracy (RMSE < 3 mm) over sparse rainfall gauges.

•The spatial accuracy of RFM (CPC) was predicted for the remaining 27 districts of Punjab using statistical and geostatistical interpolation techniques. The model was optimized based on the least RMSE, and a spatial accuracy ranking matrix (of order 12 by 9) was produced. This matrix provides 9 different accuracy ranks of statistical and geostatistical interpolations in space and time. IDW and KOE remain the best in most elements of Rank-1 in the spatial accuracy ranking matrix throughout the year.

•The nine accuracy ranks in the matrix were classified as good, moderate, and poor. This classification was based on two parameters, i.e., rainy and less rainy months, with certain thresholds of RMSE. Ranks 1 to 4 in the spatial accuracy ranking matrix were declared good, ranks 5 to 6 fall under moderate, and the rest fall under poor classification.

•The accuracy mapping of RFM (CPC) revealed that this dataset is highly accurate in South Punjab (flat terrain region). It is also good in Central Punjab (moderate to flat terrain region) during winter, spring, and autumn, while it has moderate accuracy in the monsoon season. Cross-validation shows poor performance of RFM (CPC) in the northern region of Punjab (high terrain region) in each month of the year. RFM (CPC) accuracy is directly related to terrain and the monsoon period.

The spatial accuracy ranking matrix was developed on monthly long-term average gridded precipitation using ground-based long-term average rainfall data; further investigations on daily or hourly datasets are suggested to assess the precision and accuracy of precipitation in more detail.