1 Introduction

Geo-statistical models, as powerful tools for exploring spatial heterogeneity, can help reveal inherent structural characteristics and regularities of spatial variation that classical statistical methods cannot detect [1]. The key concept of these models is that nearer observations have more influence than more distant ones in estimating the local set of parameters [2]. This is expressed by a kernel weighting function based on the distances between model calibration points and observation points. Euclidean distance is traditionally used by default when calibrating a geo-statistical model. However, empirical work has shown that using non-Euclidean distance metrics, such as network distance, in geo-statistical models can improve model fit [3, 4]. Furthermore, the relationship between the dependent variable and each independent variable may respond differently to the weighting computation [5].
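A minimal sketch of how such distance-based kernel weights can be computed, assuming Euclidean distances in projected coordinates. The Gaussian and bisquare kernels shown are two forms commonly used in GWR; the function names, the 50 km bandwidth, and the sample points are illustrative assumptions, not values from this study.

```python
import numpy as np

def euclidean_distances(point, locations):
    """Euclidean distances from one calibration point to all observation points."""
    return np.sqrt(((locations - point) ** 2).sum(axis=1))

def gaussian_kernel(d, bandwidth):
    """Gaussian weights: decay smoothly with distance but never reach zero."""
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def bisquare_kernel(d, bandwidth):
    """Bisquare weights: decay with distance and are zero at or beyond the bandwidth."""
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d < bandwidth, w, 0.0)

# Hypothetical calibration point and observation points (projected units, e.g. km)
calibration_point = np.array([0.0, 0.0])
observations = np.array([[0.0, 0.0], [10.0, 0.0], [30.0, 40.0], [100.0, 0.0]])
d = euclidean_distances(calibration_point, observations)  # distances 0, 10, 50, 100
print(gaussian_kernel(d, bandwidth=50.0))
print(bisquare_kernel(d, bandwidth=50.0))
```

With the bisquare kernel, observations at or beyond the bandwidth receive zero weight, whereas the Gaussian kernel only down-weights them. Replacing `euclidean_distances` with a network or other non-Euclidean metric leaves the kernels themselves unchanged.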

Statistical regression and correlation techniques have been the most common empirical approaches for quantifying such relationships [6]. Two regression models, ordinary least squares (OLS) and geographically weighted regression (GWR), have been used to examine the relationship between gridded rainfall data and influencing variables such as topography. More recently, new local hybrid regression models have been developed based on the OLS and GWR models [7].

Because of the complex spatial variability of precipitation/rainfall and its numerous influencing factors, estimating its spatial distribution is difficult [8]. Conventional methods for estimating rainfall distributions are based on rain gauges and employ spatial interpolation. At least 10 types of rainfall spatial interpolation methods exist [9, 10], and they can be divided into two categories.

The first category relies only on ground-based rainfall measurements: the rainfall at a specific location is estimated from gauges at adjacent sites without considering the effects of geographical position, topography, and other factors. Spatial interpolation algorithms such as splines, ordinary Kriging, and inverse distance weighting are typical representatives [11, 12]. The second category combines ground-based rainfall measurements with major influencing factors such as geographical position and topography [13, 14].
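To make the first category concrete, the sketch below implements inverse distance weighting, one of the interpolators named above, under simple assumptions: planar coordinates, a power parameter of 2, and all gauges used. The function name, gauge layout, and rainfall values are hypothetical.

```python
import numpy as np

def idw(gauge_xy, gauge_values, target_xy, power=2.0):
    """Inverse distance weighting estimate at a single target location.

    Weights are 1 / d**power, so nearer gauges count more; if the target
    coincides with a gauge, that gauge's value is returned directly.
    """
    d = np.sqrt(((gauge_xy - target_xy) ** 2).sum(axis=1))
    if np.any(d == 0):
        return float(gauge_values[np.argmin(d)])
    w = 1.0 / d ** power
    return float((w * gauge_values).sum() / w.sum())

# Hypothetical gauges (x, y in km) and their annual rainfall (mm)
gauges = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 20.0]])
rain = np.array([300.0, 200.0, 500.0])
print(idw(gauges, rain, np.array([5.0, 0.0])))  # dominated by the two nearby gauges
```

Note that the estimate depends only on gauge positions and values, which is exactly why this category ignores topography and other covariates.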

Climatic variables are usually measured at meteorological stations, and the data are valid only at the point of measurement. To overcome this limitation, gridded precipitation/rainfall datasets can be obtained from several grid-based sources [15, 16]. Gridded datasets are useful for regional studies of climate variability and change, and they open many possibilities for further applications, especially distributed modeling of environmental processes [17]. Several studies on the spatial representation of Iranian precipitation have been published based on station data [18–22]. More recently, researchers have applied gridded data, especially the APHRODITE project gridded data, to the spatial analysis of precipitation/rainfall. For example, one study reported an optimum R² of 72% for the agreement between station-based observations of daily precipitation and APHRODITE estimates [23]. Another study showed that the APHRODITE dataset could potentially be used for regionalization of precipitation regimes in Iran [15]. Regarding the application of ECMWF data to modeling precipitation/rainfall, only a few studies have been carried out in Iran. A verification of ECMWF precipitation forecasts over Iran showed that they correctly estimated the position of the precipitation band [24], and the application of ECMWF products has been encouraged in the literature [25, 26].

The main aim of the present study is the spatial evaluation of Iranian annual rainfall based on the European Centre for Medium-Range Weather Forecasts (ECMWF) database. These rainfall data are then analyzed using ordinary least squares (OLS) and geographically weighted regression (GWR) models. For this purpose, rainfall is defined as the dependent variable, and topographic altitude, slope, and aspect, derived from a global digital elevation model (GDEM), together with the spatial coordinates of longitude and latitude, are defined as independent variables.

2 Methodology

2.1 Data preparation

In this study, daily gridded rainfall data, used as the dependent variable, were extracted from the ERA-Interim reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF) via its website (http://www.ecmwf.int/products). Several databases, such as NASA/TRMM, NOAA/NCEP, APHRODITE, and ECMWF, are available for spatio-temporal analysis of rainfall. They are mainly used to estimate rainfall over high-elevation or hard-to-access regions that lack rain gauge stations. Among these databases, the ERA-Interim reanalysis of ECMWF is a freely accessible web-based database that is easy to obtain and straightforward to process in GIS software. According to [26], ERA-Interim is the latest global atmospheric reanalysis produced by ECMWF and improves on ERA-40 [27] through the use of four-dimensional variational data assimilation (4D-Var), higher horizontal resolution, and bias correction of satellite radiance data [28, 29].

The extracted data, with a spatial resolution of 0.125° × 0.125° and covering 1979–2015, were selected for Iran, a country with an area of 1,648,000 km² lying approximately between 25°N and 40°N and between 44°E and 64°E. The daily data were first read into MATLAB from NetCDF (.nc) files and aggregated into long-term monthly time series. These monthly time series were then stored in GIS attribute tables, corresponding to 9954 grid cells. The topographic characteristics of altitude, slope, and aspect, together with the spatial coordinates of longitude and latitude, were considered as independent variables. The attribute tables of these six variables (rainfall and the five independent variables) were linked to all 9954 cells of the ECMWF gridded data in GIS for the geo-statistical analyses (OLS and GWR). The topographic data were derived from the Global Digital Elevation Model (GDEM) of the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) on board the National Aeronautics and Space Administration (NASA) spacecraft Terra (Fig. 1a). The ASTER GDEM is distributed as Georeferenced Tagged Image File Format (GeoTIFF) files with a vertical accuracy of around 17 m at the 95% confidence level and a horizontal resolution of approximately 75 m; the files were obtained from the website (http://www.gdem.aster.ersdac.or.jp/search.jsp).
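The study derived slope and aspect from the ASTER GDEM in GIS. As an illustration only, the sketch below computes both from a north-up elevation grid using central differences (`np.gradient`). It assumes a projected DEM with square cells in metres (a geographic-coordinate DEM would first need reprojecting), and the function name is hypothetical. GIS slope and aspect tools typically use a 3 × 3 neighborhood method instead, so results will differ slightly, especially near edges.

```python
import numpy as np

def slope_aspect(dem, cellsize):
    """Slope (degrees) and aspect (degrees clockwise from north) of a north-up DEM."""
    dz_drow, dz_dcol = np.gradient(dem, cellsize)  # central differences on both axes
    dz_dx = dz_dcol       # eastward: column index increases to the east
    dz_dy = -dz_drow      # northward: row index increases to the south
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the compass direction of steepest descent, i.e. of (-dz_dx, -dz_dy)
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    aspect = np.where((dz_dx == 0) & (dz_dy == 0), np.nan, aspect)  # flat cells: undefined
    return slope, aspect

# Hypothetical 30 m DEM that rises 30 m per cell toward the east
dem = np.tile(np.arange(5) * 30.0, (5, 1))
slope, aspect = slope_aspect(dem, cellsize=30.0)
print(slope[2, 2], aspect[2, 2])  # 45-degree slope facing west (aspect 270)
```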

Fig. 1

Geo-statistical variables of the study area. a Altitude mask of the global digital elevation model (GDEM) for Iran. b Mean annual rainfall over Iran during 1979–2015 based on the ECMWF database

2.2 Geo-statistical models

Regression parameters derived with methods such as ordinary least squares (OLS) are assumed to apply globally over the entire region from which the measured data were taken, based on the assumption of spatial stationarity in the relationships between the variables under study. In most cases, however, this assumption is invalid because it prevents spatial autocorrelation from being taken into account [30]. In contrast, geographically weighted regression (GWR), a recent refinement of standard regression methods, explicitly deals with the spatial non-stationarity of empirical relationships [31]. The technique weights information according to its local relevance and allows the regression parameters to vary in space. This can reveal spatial variations in the empirical relationships between variables that would otherwise be hidden in a global analysis [32]. The simple linear model, usually fitted by ordinary least squares (OLS), is expressed as follows [33]:

$$y = \beta_{0} + \sum\limits_{i = 1}^{p} \beta_{i} x_{i} + \varepsilon$$
(1)

where \(y\) is the dependent variable, \(x_i\) is an independent variable such as altitude, slope, or aspect, \(\beta_0\) is the intercept, \(\beta_i\) is the estimated coefficient of \(x_i\), \(\varepsilon\) is the error term, and \(p\) is the number of independent variables. The conventional OLS regression is stationary in a spatial sense: a single model is fitted to all data and applied equally over the entire geographic space of interest. The model and its coefficients are constant across space, which assumes that the relationship is also spatially constant. This is usually not adequate for spatially differentiated data such as rainfall [6].
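A minimal sketch of fitting Eq. (1) by ordinary least squares with NumPy. The function name and the synthetic rainfall, altitude, and latitude values are hypothetical and noise-free, so the fit should recover the coefficients used to generate them; the study itself used the OLS tool in ArcGIS.

```python
import numpy as np

def fit_ols(X, y):
    """Global OLS fit of Eq. (1); returns [beta_0, beta_1, ..., beta_p]."""
    design = np.column_stack([np.ones(len(y)), X])   # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Hypothetical, noise-free data: rain = 100 + 0.2 * altitude + 5 * latitude
rng = np.random.default_rng(0)
altitude = rng.uniform(0, 3000, 200)
latitude = rng.uniform(25, 40, 200)
rain = 100 + 0.2 * altitude + 5 * latitude
beta = fit_ols(np.column_stack([altitude, latitude]), rain)
print(np.round(beta, 6))  # recovers approximately [100, 0.2, 5]
```

The same coefficient vector applies at every grid cell, which is the spatial stationarity assumption described above.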

GWR is an extension of the common linear regression model. GWR directly relates the model parameters to location using the spatial x, y coordinates, in addition to fitting the local relationship between the dependent and independent variables. GWR can also integrate multiple factors to fit the dependent variable. As a robust tool for describing spatial heterogeneity, GWR does not base its regression coefficients on global information; instead, they vary with location and are obtained by local regression estimation using data from the nearest neighboring observations [14]. GWR thus allows local rather than global parameters to be estimated, as follows [33]:

$$y_{j} = \beta_{0}(u_{j}, v_{j}) + \sum\limits_{i = 1}^{p} \beta_{i}(u_{j}, v_{j})\, x_{ij} + \varepsilon_{j}$$
(2)

where \(j\) indexes the location, \((u_j, v_j)\) denotes the longitude and latitude coordinates of location \(j\), and \(x_{ij}\) is the value of the \(i\)-th independent variable at location \(j\). In GWR, the parameters are estimated with an approach in which the contribution of each sample to the analysis is weighted according to its spatial proximity to the location under consideration. Thus, the weight of an observation is no longer constant in the calibration but varies with location: data from observations close to the location under consideration are weighted more heavily than data from observations far away [32]. Both OLS and GWR models were implemented with the spatial statistics tools of ArcToolbox in ArcGIS 10.2.
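The study used the GWR tool in ArcGIS; the sketch below only illustrates the standard local weighted least-squares estimator behind Eq. (2), \(\hat{\beta}(u_j, v_j) = (X^{T} W_j X)^{-1} X^{T} W_j y\). It assumes a fixed Gaussian kernel with a hypothetical bandwidth of 15 distance units; in practice the bandwidth is usually selected by a criterion such as AICc or cross-validation, and adaptive kernels are also common. The function name and data are illustrative.

```python
import numpy as np

def gwr_local_coefficients(coords, X, y, bandwidth):
    """Local coefficients of Eq. (2) at every observation location.

    For each location j this solves the weighted least-squares problem
    beta(u_j, v_j) = (X^T W_j X)^(-1) X^T W_j y, where W_j holds Gaussian
    kernel weights exp(-0.5 * (d_jk / bandwidth)**2) on its diagonal.
    """
    design = np.column_stack([np.ones(len(y)), X])   # intercept + predictors
    betas = np.empty((len(y), design.shape[1]))
    for j in range(len(y)):
        d = np.sqrt(((coords - coords[j]) ** 2).sum(axis=1))
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        xtw = design.T * w                            # X^T W_j without building W_j
        betas[j] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Hypothetical data in which the altitude effect strengthens from west to east
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(300, 2))          # projected (u, v), e.g. km
altitude = rng.uniform(0, 3000, 300)
true_slope = 0.1 + 0.002 * coords[:, 0]
rain = 50 + true_slope * altitude
betas = gwr_local_coefficients(coords, altitude[:, None], rain, bandwidth=15.0)
print(np.corrcoef(betas[:, 1], true_slope)[0, 1])    # close to 1: local slopes track the trend
```

An OLS fit to the same data would return a single altitude coefficient for all locations, which is exactly the spatial stationarity assumption that GWR relaxes.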

2.3 Validations

The outputs of each regression model, generated by the ArcGIS geo-statistical extensions, include the root mean squared error (RMSE), the coefficient of determination (R²), and local residuals. RMSE and R² are calculated as follows [34]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum\limits_{j = 1}^{N} \left(Y_{j} - O_{j}\right)^{2}}$$
(3)
$$R^{2} = \left(\frac{\sum\limits_{j = 1}^{N} \left(Y_{j} - \bar{Y}\right)\left(O_{j} - \bar{O}\right)}{\sqrt{\sum\limits_{j = 1}^{N} \left(Y_{j} - \bar{Y}\right)^{2}} \times \sqrt{\sum\limits_{j = 1}^{N} \left(O_{j} - \bar{O}\right)^{2}}}\right)^{2}$$
(4)

where \(Y_j\) is the observed value at location \(j\), \(O_j\) is the rainfall estimated by a model at location \(j\), \(\bar{Y}\) is the mean of all observations, \(\bar{O}\) is the mean of all estimated values, and \(N\) is the number of grid cells. Furthermore, the residuals can be tested for spatial autocorrelation to assess the model’s local accuracy [33].
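A minimal sketch of the two validation statistics, with R² computed as the squared Pearson correlation between observations \(Y\) and estimates \(O\) (the correlation-based definition behind Eq. (4)). For an in-sample OLS fit with an intercept this equals 1 − SSE/SST, but for other estimators the two definitions can differ. The function names and four-cell data are hypothetical.

```python
import numpy as np

def rmse(observed, estimated):
    """Eq. (3): root mean squared difference between observations and estimates."""
    return float(np.sqrt(np.mean((observed - estimated) ** 2)))

def r_squared(observed, estimated):
    """Eq. (4): squared Pearson correlation between observations and estimates."""
    y_dev = observed - observed.mean()
    o_dev = estimated - estimated.mean()
    r = (y_dev * o_dev).sum() / np.sqrt((y_dev ** 2).sum() * (o_dev ** 2).sum())
    return float(r ** 2)

# Hypothetical observed and estimated annual rainfall (mm) at four cells
Y = np.array([200.0, 300.0, 400.0, 500.0])
O = np.array([210.0, 290.0, 420.0, 480.0])
residuals = Y - O  # positive where the model underestimates
print(rmse(Y, O), r_squared(Y, O))
```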

2.4 Spatial autocorrelations

Based on the observed and estimated data, spatial autocorrelation analysis can be used to assess the characteristics of the geo-statistical models. Several procedures for autocorrelation analysis have been proposed in the literature; in this paper, Moran’s I statistic is used to measure the spatial autocorrelation of the gridded rainfall cells. Global spatial autocorrelation analysis describes the spatial characteristics of a given property over the entire study area and reflects the mean spatial difference between all cells and their adjacent cells [1]. Moran’s I generally ranges from −1 to 1, and its significance is assessed with a normalized Z-score. At a given significance level, a Moran’s I value significantly above zero implies positive spatial correlation, that is, obvious spatial clusters of cells with small spatial differences. Conversely, a value significantly below zero implies negative spatial correlation, that is, an obvious difference in attribute values between cells and their adjacent cells. The global Moran’s I statistic is calculated as follows [35]:

$$I = \frac{N}{\sum\nolimits_{i = 1}^{N} \sum\nolimits_{j = 1}^{N} w_{ij}} \times \frac{\sum\nolimits_{i = 1}^{N} \sum\nolimits_{j = 1}^{N} w_{ij} (x_{i} - \bar{x})(x_{j} - \bar{x})}{\sum\nolimits_{i = 1}^{N} (x_{i} - \bar{x})^{2}}$$
(5)

where \(N\) is the number of spatial observation cells, \(x_i\) is the observed value of cell \(i\), \(\bar{x}\) is the mean of \(x_i\), and \(w_{ij}\) is the spatial weight between cells \(i\) and \(j\).
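A minimal sketch of Eq. (5) with a dense weight matrix, assuming binary rook contiguity (cells sharing an edge are neighbors), which is only one of several possible weighting schemes; the weighting used in the study's ArcGIS runs is not stated here. The function names and test patterns are hypothetical. A dense matrix is fine for illustration, but for the 9954 ECMWF cells it would hold about 99 million entries, so a sparse representation would be preferable.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity weights for a raster: w_ij = 1 if cells i and j share an edge."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c                      # row-major cell index
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[i, rr * n_cols + cc] = 1.0
    return W

def global_morans_i(x, W):
    """Eq. (5): global Moran's I of the values x under spatial weights W."""
    z = x - x.mean()
    return float(len(x) / W.sum() * (z @ W @ z) / (z @ z))

W = rook_weights(4, 4)
checkerboard = (np.indices((4, 4)).sum(axis=0) % 2).ravel().astype(float)
stripes = np.tile([0.0, 0.0, 1.0, 1.0], 4)  # west half 0, east half 1
print(global_morans_i(checkerboard, W))  # approximately -1 (perfect negative autocorrelation)
print(global_morans_i(stripes, W))       # about 0.667 (positive autocorrelation)
```

Under the null hypothesis of no spatial autocorrelation, the expected value of \(I\) is \(-1/(N-1)\), so values must be judged against that baseline rather than against exactly zero.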

The global Moran’s I statistic indicates only the overall extent of clustering and cannot detect spatial association patterns at different locations. To further reveal the spatial autocorrelation of precipitation grids among neighboring cells and to visualize the spatial pattern of local differences, local spatial autocorrelation statistics such as local Moran’s I are used to evaluate the local spatial association and difference between each cell and its surrounding cells [1]. For a given spatial cell \(i\), local Moran’s I is computed as follows [36]:

$$I_{i} = x_{i} \sum\limits_{j = 1,\, j \ne i}^{N} w_{ij} x_{j}$$
(6)

where \(N\) is the number of spatial observation cells, \(x_i\) and \(x_j\) are the standardized observed values of cells \(i\) and \(j\), and \(w_{ij}\) is the row-standardized spatial weight, with \(\sum_{j} w_{ij} = 1\). As with global Moran’s I, the significance of local Moran’s I can be assessed with a Z-score. In the present study, these statistics were calculated with the spatial statistics tools in ArcGIS.
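A minimal sketch that follows Eq. (6) directly, assuming population standardization of the values (dividing by the standard deviation computed with \(N\)) and row-standardized weights. Implementations such as ArcGIS’s Cluster and Outlier Analysis tool may scale the statistic differently and also report Z-scores, so values need not match exactly. The function name and the five-cell transect are hypothetical.

```python
import numpy as np

def local_morans_i(x, W):
    """Eq. (6): local Moran's I for every cell.

    Values are standardized to z-scores and the weights are row-standardized
    so that each row sums to 1; W must have a zero diagonal and every cell
    needs at least one neighbor.
    """
    z = (x - x.mean()) / x.std()
    W_row = W / W.sum(axis=1, keepdims=True)
    return z * (W_row @ z)

# Hypothetical transect of five cells: three dry cells followed by two wet ones
W = np.eye(5, k=1) + np.eye(5, k=-1)   # each cell's neighbors are the adjacent cells
x = np.array([1.0, 1.0, 1.0, 9.0, 9.0])
print(np.round(local_morans_i(x, W), 3))
```

The result is positive inside the dry and wet clusters and negative for the dry cell that borders the wet block, which is the kind of local pattern the global statistic cannot show.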

3 Results and discussion

The distribution of long-term mean annual rainfall in Iran derived from the ECMWF data is shown in Fig. 1b. The maximum annual rainfall occurs in northern Iran on the southwestern coast of the Caspian Sea, where mean annual rainfall exceeds 700 mm. Over the Zagros Mountains, mean annual rainfall is about 500 mm. The lowest mean annual rainfall, less than 150 mm, occurs in central and southeastern Iran. On this basis, more than 60% of the country’s surface area receives annual rainfall below the average class of 250–350 mm.
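Statements such as the share of the country below a rainfall class depend on how grid cells are weighted, because cells of equal angular size shrink toward the poles. The sketch below is a hedged illustration, not the study’s procedure: it computes an area-weighted share on a regular latitude-longitude grid, assuming cell area proportional to cos(latitude). The function name, cell values, and 250 mm threshold are illustrative.

```python
import numpy as np

def area_fraction_below(rain, lat, threshold):
    """Share of total area with rain < threshold on a regular lat-lon grid.

    Cell area is taken as proportional to cos(latitude), which holds for
    cells of equal angular size (e.g. 0.125 x 0.125 degrees).
    """
    weights = np.cos(np.radians(lat))
    return float(weights[rain < threshold].sum() / weights.sum())

# Hypothetical cells: two dry southern cells and one wetter northern cell
rain = np.array([120.0, 200.0, 600.0])
lat = np.array([26.0, 28.0, 38.0])
print(area_fraction_below(rain, lat, threshold=250.0))
```

Across Iran’s latitude range (about 25°N to 40°N) the cos(latitude) factor varies by roughly 15%, so a simple cell count and an area-weighted share can differ noticeably.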

Next, both OLS and GWR models were implemented in ArcGIS 10.2, and the resulting maps are shown in Fig. 2a, b. The figure shows that the OLS model estimates mean annual rainfall with a linear decreasing trend from northwestern to southeastern Iran, whereas the GWR model’s estimates reflect local spatial characteristics. To validate the models, rainfall residuals for both geo-statistical models were mapped in GIS (Fig. 3a, b). A main advantage of running regression models in ArcGIS is that the output includes the residual for each cell, allowing the researcher to test the residuals for spatial autocorrelation [33]. Figure 3 shows that high residuals (above +100 mm) correspond to underestimation of rainfall relative to the observed data, and low residuals (below −100 mm) correspond to overestimation. The OLS model therefore shows larger discrepancies between estimated and observed rainfall than the GWR model, and its residuals show poorer spatial agreement between estimated and observed rainfall.

Fig. 2

Estimated rainfall according to geo-statistical models. a Ordinary least squares (OLS). b Geographically weighted regression (GWR)

Fig. 3

Rainfall residual values of the geo-statistical model estimation. a Ordinary least squares (OLS). b Geographically weighted regression (GWR)

Scatter plots of ECMWF rainfall observations against the geo-statistical model estimates are presented in Fig. 4. The coefficients of determination (R²) of the OLS and GWR models were 0.64 and 0.90, respectively, indicating that the GWR model explains about 90% of the spatial variation of rainfall in Iran. Furthermore, the RMSE of the GWR model was 55 mm, nearly half the OLS RMSE (103 mm). The Moran’s I values of the model residuals, calculated in GIS, were about 0.89 for OLS and 0.13 for GWR. The OLS value, close to +1, indicates positive spatial autocorrelation and an obvious spatial clustering effect, whereas the GWR value of 0.13 indicates little spatial clustering.
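As a simple arithmetic check on the statistics reported above, using only the values stated in the text, the RMSE ratio and the share of variance left unexplained by each model (one minus the coefficient of determination) are:

$$\frac{\mathrm{RMSE}_{\mathrm{OLS}}}{\mathrm{RMSE}_{\mathrm{GWR}}} = \frac{103}{55} \approx 1.87, \qquad 1 - R^{2}_{\mathrm{OLS}} = 1 - 0.64 = 0.36, \qquad 1 - R^{2}_{\mathrm{GWR}} = 1 - 0.90 = 0.10$$

In other words, moving from OLS to GWR lowers the RMSE by about 47% and cuts the unexplained share of variance from 36% to 10%.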

Fig. 4

Rainfall scatter plots for ECMWF observation and geo-statistical model estimation. a Ordinary least squares (OLS). b Geographically weighted regression (GWR)

Table 1 summarizes the RMSE, R², and Moran’s I statistics of the two models. We conclude that the GWR model, with higher R² and lower RMSE, is the better geo-statistical model for rainfall modeling in Iran based on the ECMWF gridded database. This model can explain the spatio-temporal rainfall distribution in Iran in terms of complex terrain and geographic coordinates. It reveals that the two high mountain ranges of Zagros and Alborz, in the west and north, respectively, strongly affect the temporal and spatial patterns of rainfall. The spatial verification results show that the ECMWF-based GWR model captures the rainfall structure on the windward side of both mountain ranges. As reported in [24], ECMWF data accurately estimate the amount of precipitation over the Zagros Mountains but not the precipitation amounts on the Caspian coast, which can be explained by the complexity of the rainfall process and the high contribution of convective events in this region.

Table 1 Validation statistics (RMSE, R², and Moran’s I) for the OLS and GWR models

Finally, the correlation matrix of the dependent and independent variables in the GWR model is examined. The correlation matrix in Table 2 shows that rainfall depends predominantly on geographical latitude and topographic altitude/slope, with correlation coefficients of 0.56 and 0.32, respectively. These two variables have a direct (positive) effect on rainfall variation in each grid cell; in other words, rainfall is expected to be higher in elevated and high-latitude cells in Iran. Conversely, rainfall decreases markedly (about 74%) from western to eastern longitudes. Aspect has a negligible effect on rainfall variation and can be excluded from further consideration.
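A minimal sketch of computing such a correlation matrix, assuming Pearson coefficients (the text does not state which coefficient was used). The per-cell data below are synthetic and hypothetical. Because Pearson’s r is unchanged by rescaling or shifting either variable, it measures the strength of linear association rather than a percentage change in rainfall.

```python
import numpy as np

# Hypothetical per-cell values (one entry per grid cell)
rng = np.random.default_rng(2)
latitude = rng.uniform(25, 40, 500)
altitude = rng.uniform(0, 3000, 500)
rain = 20 * latitude + 0.05 * altitude + rng.normal(0, 60, 500)

names = ["rain", "latitude", "altitude"]
corr = np.corrcoef(np.vstack([rain, latitude, altitude]))  # each row is one variable
for name, row in zip(names, corr):
    print(name, np.round(row, 2))

# Rescaling rainfall (e.g. mm to cm, or adding an offset) leaves r unchanged
print(np.isclose(np.corrcoef(rain / 10 + 5, latitude)[0, 1], corr[0, 1]))
```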

Table 2 Correlation matrix of rainfall and spatial characteristics in GWR model

Figures 5 and 6 present the spatial correlation maps between these effective variables and rainfall in the GWR model. The strongest positive effect of altitude is observed over the northern (Alborz) and western (Zagros) elevation ranges in Iran, and the combined positive effects of altitude and slope are likewise concentrated over these elevation ranges. The negative effects of altitude and slope on rainfall are located in the northwestern and eastern parts of the country, respectively. A previous study reported a similar problem in the ECMWF dataset for the southern coastal areas of the Caspian Sea in northern Iran [24]. The strongest positive effects of latitude and longitude are observed over northern Iran. Conversely, the negative effects of these variables occur simultaneously over the Zagros elevation range, which may be related to precipitation falling as snow over this elevated area. In addition, the GWR model estimates a uniform correlation between rainfall and spatial characteristics over the southern coastal areas of Iran along the Persian Gulf and the Oman Sea.

Fig. 5

Spatial correlation maps in GWR model between rainfall data and effective variables. a Altitude. b Slope

Fig. 6

Spatial correlation maps in GWR model between rainfall data and effective variables. a Latitude. b Longitude

4 Conclusion

In this study, daily gridded rainfall data extracted from the ERA-Interim reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF) were used as the dependent variable for geo-statistical modeling of rainfall in Iran. We showed that the GWR model, with higher R² and lower RMSE, is the better geo-statistical model for rainfall modeling in Iran based on the ECMWF gridded database. GWR has the potential to reveal local patterns in the spatial distribution of a parameter that the OLS approach would ignore. Furthermore, OLS may suggest a false global relationship between spatially non-stationary variables such as rainfall [32]. Consistent with [37], the OLS model in this study produced spatially autocorrelated residuals, whereas the GWR model correctly accounted for local spatial variation. The model can explain the spatio-temporal rainfall distribution in Iran in terms of complex terrain and geographic coordinates, and it revealed that the two high mountain ranges of Zagros and Alborz, in the west and north, respectively, strongly affect the temporal and spatial patterns of rainfall.

The correlation results showed that rainfall depends predominantly on geographical latitude and topographic altitude/slope, with correlation coefficients of 0.56 and 0.32, respectively; in other words, higher rainfall is expected in elevated and high-latitude cells in Iran. In addition, the GWR spatial correlation maps showed that the strongest positive effect of altitude occurs over the northern (Alborz) and western (Zagros) elevation ranges in Iran, and that the combined positive effects of altitude and slope are concentrated over these elevation ranges.

In brief, geo-statistical analysis of the ECMWF dataset performed better in elevated highlands and mountainous regions, showing that the Zagros and Alborz Mountain ranges play an important role in the distribution of rainfall in the country. Nevertheless, the spatial results revealed a weakness of ECMWF-based geo-statistical models in estimating orographic processes in the southern coastal areas of the Caspian Sea in northern Iran. We propose further studies of new geo-statistical models on gridded precipitation/rainfall datasets (e.g., ECMWF, APHRODITE) to improve data-mining accuracy and forecasting applicability and to overcome such localized problems.