Introduction

Urbanization and human activities affect the climate of cities and their surface temperatures. Ground-based weather stations cannot provide sufficient land surface temperature (LST) data because they are not well distributed within the area (Hereher 2017). Remote sensing is one of the most effective methods for measuring surface temperatures worldwide with high temporal and spatial resolution (Li et al. 2013). The spatial distribution of LST varies according to the type of land cover (Voogt and Oke 2003; Ali and Shalaby 2012). Geographically weighted regression (GWR) is a spatial statistical method for modeling heterogeneous processes that allows the relationship between the response variable and a set of auxiliary variables to differ across geographic locations (Brunsdon et al. 1996, 1998; Fotheringham et al. 1996, 1997, 2003). A major component of GWR is the spatial weight through which the spatial relationships are defined. Spatial weights are usually defined by spatial kernel functions such as the Gaussian or bisquare function (Fotheringham et al. 2003), in which larger weights are assigned to closer observations. The GWR model provides a more precise prediction of the response variable (Hession and Moore 2011; Chu 2012) and can estimate regression coefficients at each location (Ahmadi et al. 2018b).

GWR is a new approach to modeling heterogeneous spatial processes and, owing to its greater analytical capability and level of detail, leads to increased accuracy and efficiency (Ahmadi et al. 2018b). GWR approaches are methods of exploring spatial variations (Işik and Pinarcioğlu 2006; Mennis 2006; Wen et al. 2010). Application of the GWR model is limited for two reasons: first, the results of the model are very sensitive to the kernel type and the bandwidth selection method (Wheeler and Tiefelsdorf 2005; Wu and Qiu 2011); second, nonlinear relations cannot be included in the model, so no inference can be made about them within it (Fotheringham et al. 2003).

The LSTs of Tehran (the capital of Iran) and its neighboring cities have changed in recent years due to population growth, the expansion of built-up areas, and changes in vegetation. As a result of geographic, economic, and other characteristics, these changes are not similar in all regions, and the relationship between biophysical variables and surface temperature requires careful examination. Many investigations, including Bakar et al. (2016), Di Leo et al. (2016), Kikon et al. (2016), Liu and Zhang (2011), Yuan and Bauer (2007), Chen et al. (2006), Ogashawara and Bastos (2012), and Luo and Peng (2016), have discussed the relationship between surface temperature and biophysical variables. Some studies have used the GWR model to examine this relationship, including Ivajnšič et al. (2014), Zhao et al. (2018), and Zhou and Wang (2011), each of which used one or more biophysical variables as the independent variable to examine the relationship with LST. Because, as far as the literature indicates, no comprehensive and comparative study has yet been carried out on Tehran and its satellite cities, the present study investigates the relation between biophysical variables and surface temperature in this region using the GWR model.

Materials and Methods

Studied Area

The scope of this research includes Tehran metropolis and its surrounding cities, located between 50° 57′ and 51° 36′ E and between 35° 23′ and 35° 49′ N. The total area of the study location is 160,789.7 ha. The area includes Tehran and parts of Ray, Shemiranat, Shahriar, Islamshahr, Robat Karim, and Karaj. Because of its particular geographic location, different climates have formed in different parts of Tehran metropolis: the climate is moderate in the mountainous regions and semi-arid in the plains. Based on the Köppen-Geiger climate classification, the study area is classified as BSk and BWk. Figure 1 depicts the position and extent of the surveyed area.

Fig. 1 a Map of Iran showing the location of the study area; b location of Tehran metropolis and its satellite cities within the study area

Data

Image data were extracted and prepared from the ETM+ sensor of Landsat 7 for the years 2001–2015, in June as the hottest month of the study area. The images, covering Path 164 and Row 35, were obtained from the USGS (https://www.usgs.gov). The spatial resolution of the images was 30 m. Cloud cover was checked on the selected days at stations located in the study area. A day was considered to have a clear sky when the cloud cover was 1 percent in the satellite image and 1.8 at the ground stations. In this research, all the days studied had clear-sky conditions.

Methods

(A) Selection of the Days to Examine and Perform Corrections: To investigate the changes in vegetation in Tehran and its satellite cities, June was selected as the hottest month of the region, and satellite images were extracted on this basis. Radiometric and geometric corrections were then performed on them. Since the region is imaged every 16 days and some of the June images were discarded because of cloudiness, July and (if unavailable) August were used instead as the second- and third-hottest months. The atmospheric correction was first carried out for the image of each day. The list of the studied days, together with the atmospheric correction parameters, is presented in Table 1. Judging by the difference between the downwelling and the upwelling values, i.e., the amount of radiation absorbed by the land surface, the greatest amount of radiation was absorbed in 2007 and the lowest in 2001 and 2002.

Table 1 List of days reviewed by ETM+ sensor of Landsat 7 and atmospheric correction parameters

(B) LST Computation: First, the spectral radiance was calculated from the digital numbers by Eq. (1) and then converted to brightness temperature by Eq. (2) (data users’ handbook 2018):

$$L_{\lambda } = \frac{{L_{\max } - L_{\min } }}{{Qcal_{\max } - Qcal_{\min } }} \times \left( {Qcal - Qcal_{\min } } \right) + L_{\min }$$
(1)

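where \(L_{\lambda }\) is the spectral radiance at the sensor’s aperture in W/(m² sr μm), \(Qcal\) is the quantized calibrated pixel value (digital number), \(Qcal_{\min }\) and \(Qcal_{\max }\) are the minimum and maximum quantized pixel values, and \(L_{\min }\) and \(L_{\max }\) are the spectral radiances scaled to \(Qcal_{\min }\) and \(Qcal_{\max }\).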

$$T_{\text{B}} = \frac{{K_{2} }}{{\ln \left( {\frac{{K_{1} }}{{L_{\lambda } }} + 1} \right)}}$$
(2)

In Eq. (2), \(T_{\text{B}}\) is the brightness temperature in kelvin, and \(K_{1}\) and \(K_{2}\) are the calibration constants of the thermal band. The calculated brightness temperature is that of a black body. Therefore, the emissivity needs to be computed in order to convert the brightness temperature into the surface kinetic temperature, and the accuracy of the extracted LST depends on this computation. Accordingly, the LST should be corrected for emissivity (Farina 2012), and 273.15 should be subtracted from the result to obtain the LST in degrees Celsius.
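The chain from digital number to surface temperature described in step (B) can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing chain of the study: the calibration constants are typical Landsat 7 ETM+ band 6 (low gain) values that should be read from each scene's metadata, the function names are hypothetical, and the emissivity correction shown is a commonly used single-channel form that may differ from the one applied here.

```python
import numpy as np

# Assumed Landsat 7 ETM+ band 6 (low gain) constants; read the real values
# from the metadata file of each scene.
LMIN, LMAX = 0.0, 17.04          # spectral radiance range, W/(m^2 sr um)
QCALMIN, QCALMAX = 1.0, 255.0    # digital-number range
K1, K2 = 666.09, 1282.71         # thermal calibration constants
WAVELENGTH = 11.45e-6            # assumed effective wavelength of band 6 (m)
RHO = 1.438e-2                   # h * c / k_B, in m K


def dn_to_radiance(qcal):
    """Eq. (1): digital number -> at-sensor spectral radiance."""
    qcal = np.asarray(qcal, dtype=float)
    gain = (LMAX - LMIN) / (QCALMAX - QCALMIN)
    return gain * (qcal - QCALMIN) + LMIN


def brightness_temperature(radiance):
    """Eq. (2): spectral radiance -> black-body brightness temperature (K)."""
    radiance = np.asarray(radiance, dtype=float)
    return K2 / np.log(K1 / radiance + 1.0)


def lst_celsius(t_b, emissivity):
    """Emissivity-corrected surface temperature in deg C (assumed correction)."""
    t_b = np.asarray(t_b, dtype=float)
    lst_kelvin = t_b / (1.0 + (WAVELENGTH * t_b / RHO) * np.log(emissivity))
    return lst_kelvin - 273.15
```

With an emissivity of 1 the correction term vanishes and the function simply converts the brightness temperature from kelvin to degrees Celsius; emissivities below 1 yield a surface temperature above the brightness temperature.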

(C) Verification of LST Data: The LST values derived from the Landsat images were compared with the temperatures recorded at the meteorological stations in the area using a Taylor diagram. The mean value of 0.8 for the ETM+ sensor in June indicates that the data were highly accurate.

(D) Evaluation of Surface Temperature Spatial Autocorrelation in the Studied Area: Moran’s I is used to describe the spatial dependence, or spatial autocorrelation, of a variable (Moran 1950). Moran spatial autocorrelation studies spatial autocorrelation based on the distribution of two variables and analyzes the geographic occurrence in the place (Griffith 1987). The spatial independence of the residuals is evaluated by Moran’s I correlation coefficient (Lin and Wen 2011), which is expressed by Eq. (3):

$$I = \frac{N}{{S_{0} }}\frac{{\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{n} {w_{ij} (x_{i} - \bar{x})(x_{j} - \bar{x})} } }}{{\sum\nolimits_{i = 1}^{n} {(x_{i} - \bar{x})^{2} } }}$$
(3)

If the Moran index is close to +1, the data have positive spatial autocorrelation and a clustered pattern; if it is close to −1, the data are dispersed. In global Moran’s I, the null hypothesis is that there is no spatial clustering of the values associated with the geographic features; it can be rejected when the p value is very small and the calculated Z-score is very large. If the Moran index is larger than zero, the data exhibit spatial clustering. If the Moran statistic is zero, the data are randomly distributed (Chu 2012).
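To make the interpretation of Eq. (3) concrete, the following sketch computes global Moran’s I on two small synthetic grids. It is only an illustration: the binary rook-contiguity weights, the 6 × 6 grid size, and the function names are assumptions and are not taken from the study, where \(S_{0}\) is the sum of all weights \(w_{ij}\).

```python
import numpy as np


def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity weight matrix for a regular grid (assumed scheme)."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:                 # neighbour below
                j = (r + 1) * n_cols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < n_cols:                 # neighbour to the right
                j = r * n_cols + c + 1
                w[i, j] = w[j, i] = 1.0
    return w


def morans_i(values, w):
    """Global Moran's I as in Eq. (3); S0 is the sum of all weights."""
    x = np.asarray(values, dtype=float).ravel()
    z = x - x.mean()                           # deviations from the mean
    s0 = w.sum()
    return (x.size / s0) * (z @ w @ z) / (z @ z)


# Hypothetical 6 x 6 grids: a smooth gradient (clustered) and a checkerboard (dispersed).
rows, cols = np.indices((6, 6))
w = rook_weights(6, 6)
i_clustered = morans_i(rows + cols, w)
i_dispersed = morans_i((rows + cols) % 2, w)
```

In this example the gradient grid yields a clearly positive index (0.8), while the checkerboard, in which every cell differs from all of its neighbours, yields exactly −1.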

Table 2 presents the output of the global Moran spatial autocorrelation analysis. The Moran index for the period is greater than 0.69. Given the large Z-score (between 704 and 730) and the small p value, the null hypothesis of no spatial autocorrelation in LST is rejected. Based on the global Moran measure, the LST of the hottest months of the year in the studied area follows a clustered spatial pattern. If the surface temperature pattern were random, the expected value of the index would be −0.000009.

Table 2 Moran’s I spatial autocorrelation analysis output

(E) Calculation of Vegetation Indices and Built-up Areas: To analyze vegetation and built-up areas in the region, four indices were selected, namely NDVI, SAVI, NDBI, and NDBaI, and the reflectance was first calculated as follows (data users’ handbook 2018):

$${\text{Ref}} = \frac{{\pi \times L_{\lambda } \times d^{2} }}{{{\text{ESUN}}_{\lambda } \times \cos \theta_{s} }}$$
(4)

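where Ref denotes the top-of-atmosphere reflectance, \(L_{\lambda }\) is the spectral radiance at the sensor’s aperture, \(d\) is the Earth-Sun distance in astronomical units, \({\text{ESUN}}_{\lambda }\) is the mean solar exoatmospheric irradiance, and \(\theta_{s}\) is the solar zenith angle.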

The four indices, NDVI (Rouse et al. 1973), SAVI (Huete 1988), NDBI (Zha and Gao 2003), and NDBaI (Zhao and Chen 2005), were computed by the following formulae:

$${\text{NDVI}} = \frac{{{\text{NIR}} - {\text{RED}}}}{{{\text{NIR}} + {\text{RED}}}}$$
(5)
$${\text{SAVI}} = \frac{{{\text{NIR}} - {\text{RED}}}}{{{\text{NIR}} + {\text{RED}} + L}} \times \left( {L + 1} \right)$$
(6)
$${\text{NDBI}} = \frac{{{\text{SWIR}}1 - {\text{NIR}}}}{{{\text{SWIR}}1 + {\text{NIR}}}}$$
(7)
$${\text{NDBaI}} = \frac{{{\text{SWIR}}1 - {\text{TIRS}}1}}{{{\text{SWIR}}1 + {\text{TIRS}}1}}$$
(8)
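Equations (4) to (8) translate directly into array operations. The sketch below is illustrative only: the function names and the sample pixel values are hypothetical, SAVI is written in the form of Huete (1988) with NIR + RED + L in the denominator, and the soil adjustment factor L = 0.5 is a commonly used default rather than a value reported in this study.

```python
import numpy as np


def toa_reflectance(radiance, esun, sun_elevation_deg, earth_sun_dist=1.0):
    """Eq. (4): top-of-atmosphere reflectance from spectral radiance."""
    theta_s = np.deg2rad(90.0 - sun_elevation_deg)    # solar zenith angle
    return np.pi * radiance * earth_sun_dist**2 / (esun * np.cos(theta_s))


def normalized_difference(a, b):
    """(a - b) / (a + b), the common form of Eqs. (5), (7) and (8)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b)


def biophysical_indices(red, nir, swir1, tir, soil_factor=0.5):
    """Return NDVI, SAVI, NDBI and NDBaI for co-registered band arrays."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    ndvi = normalized_difference(nir, red)
    savi = (nir - red) / (nir + red + soil_factor) * (1.0 + soil_factor)
    ndbi = normalized_difference(swir1, nir)
    ndbai = normalized_difference(swir1, tir)
    return ndvi, savi, ndbi, ndbai


# Hypothetical vegetated pixel (band values are made up for illustration).
ndvi, savi, ndbi, ndbai = biophysical_indices(red=0.05, nir=0.45, swir1=0.20, tir=0.30)
```

For this hypothetical pixel, NDVI is 0.8 and SAVI is 0.6, while NDBI is negative, as expected for a surface dominated by vegetation rather than built-up cover.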

(F) Mapping the Spatial Distribution of LST and the Four Biophysical Variables of NDVI, SAVI, NDBI, and NDBaI

(G) Modeling the Spatial Relationships of NDVI, SAVI, NDBI, and NDBaI Indices with LST Using the GWR Method: The GWR model extends the conventional global regression with one or more geographic parameters (Ahmadi et al. 2018b). The GWR model is written as follows:

$$y_{i} = \beta_{0} \left( {u_{i} ,v_{i} } \right) + \sum_{k} \beta_{k} \left( {u_{i} ,v_{i} } \right)x_{ik} + \varepsilon_{i}$$
(9)

In this equation, \(y_{i}\) is the dependent variable at point i, \(x_{ik}\) is the kth independent variable, \(\beta_{0}\) and \(\beta_{k}\) are the estimated coefficients, \(\varepsilon_{i}\) is the error component, \(u_{i}\) and \(v_{i}\) are the latitude and longitude of point i, and \(\beta_{k} (u_{i} ,v_{i} )\) is the realization of the continuous function \(\beta_{k} (u,v)\) at point i (Chu 2012; Fotheringham et al. 2015; Mondal et al. 2015; Ahmadi et al. 2018a, 2018b). In this research, the corrected Akaike information criterion (AICc) was used to select the appropriate bandwidth. Lower AICc values indicate that the model is closer to the actual situation (Ahmadi et al. 2018b). Also, because the cells were distributed regularly and consistently in the region, the fixed kernel method was used instead of the adaptive kernel.
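The estimation behind Eq. (9) can be sketched with a few lines of NumPy. This is a minimal illustration and not the software used in the study: the Gaussian form of the fixed kernel, the function names `gwr_fit`, `aicc`, and `select_bandwidth`, and the candidate bandwidths are assumptions, and the AICc expression is the form commonly given in the GWR literature.

```python
import numpy as np


def gwr_fit(coords, x, y, bandwidth):
    """Fit Eq. (9) at every observation with a fixed Gaussian kernel.

    Returns the local coefficients (n x p), the fitted values and the trace
    of the hat matrix (the effective number of parameters).
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), x])     # intercept + predictors
    n, p = design.shape
    betas = np.empty((n, p))
    fitted = np.empty(n)
    trace_s = 0.0
    for i in range(n):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)      # weights decay with distance
        xtw = design.T * w                               # X' W_i, shape p x n
        c = np.linalg.solve(xtw @ design, xtw)           # (X' W_i X)^-1 X' W_i
        betas[i] = c @ y
        fitted[i] = design[i] @ betas[i]
        trace_s += design[i] @ c[:, i]                   # i-th diagonal of the hat matrix
    return betas, fitted, trace_s


def aicc(y, fitted, trace_s):
    """Corrected AIC in the form commonly used for GWR bandwidth selection."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    sigma = np.sqrt(np.sum((y - fitted) ** 2) / n)
    return (2 * n * np.log(sigma) + n * np.log(2 * np.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))


def select_bandwidth(coords, x, y, candidates):
    """Return the candidate bandwidth with the lowest AICc and all scores."""
    scores = [aicc(y, *gwr_fit(coords, x, y, h)[1:]) for h in candidates]
    return candidates[int(np.argmin(scores))], scores
```

When the bandwidth is very large, all weights approach 1 and the local fits collapse to a single global regression, so the trace of the hat matrix approaches the number of coefficients. Smaller bandwidths let the coefficients vary more across space at the cost of a larger effective number of parameters, which is the trade-off that AICc penalizes.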

(H) Calculating the Weight Matrix of LST and Biophysical Variables: The correlation coefficients between LST and the biophysical variables are calculated in order to select the variable with the maximum correlation for remodeling and further analysis; the output is a map, diagram, and/or table.

(I) Modeling the Spatial Relationships of the NDBI Index and LST Using the GWR Model

Results and Discussion

After the LST was extracted, its accuracy was verified. Subsequently, the presence of spatial relationships between LST cells in the studied area was examined, the four indices (NDVI, SAVI, NDBI, and NDBaI) were calculated, and the spatial distributions of surface temperature and the biophysical variables were investigated.

Spatial Distribution of LST and Biophysical Variables in the Region

The average LST in the statistical period of 2001–2015 is presented in Fig. 2. The minimum temperature was 27.44 °C, and the maximum was 49.34 °C. The lowest temperatures during the warmest month of the year were in the northern part of Tehran and the county of Shemiranat, the east of Shahr-e Rey, the west of Shahriar, and the northern parts of Karaj. The maximum temperature was registered in the west of Tehran, the center and south of Shahr-e Rey, the west of Islamshahr, and the center and west of Robat Karim. Figure 3 depicts the 15-year average maps of the four biophysical parameters NDVI, SAVI, NDBI, and NDBaI, from which the reasons for the decrease or increase in temperature in the region can be understood. In the maps of NDVI and SAVI, the vegetation distributions correspond to each other. Moreover, when the surface temperature, NDVI, and SAVI maps are juxtaposed, the maximum vegetation density is found in northern Tehran, eastern Shahr-e Rey, western Shahriar, and northern Karaj, where the minimum temperatures are recorded. The highest temperatures are recorded in the west of Tehran, due to built-up areas and industries (NDBI map); in the south of Shahr-e Rey and the west of Robat Karim, due to the extent of the bare areas (NDBaI map); and in the west of Islamshahr, where a combination of both prevails.

Fig. 2 Distribution of ground temperature (°C) during the statistical period 2001–2015

Fig. 3 Spatial distribution of the four biophysical parameters, a NDVI, b SAVI, c NDBI, and d NDBaI, during the statistical period (2001–2015)

Modeling Biophysical Variables with LST by GWR Model

In this study, the spatial variation of surface temperature is studied in relation to the four indices of NDVI, SAVI, NDBI, and NDBaI. Table 3 reports a summary of the results. The AICc values, which are used for bandwidth selection and for assessing model accuracy, were lowest in 2005, indicating that the model is closest to the real situation in that year. Because the cells were distributed regularly and consistently throughout the region, the fixed kernel method was employed. The maximum \(R^{2}\) and \(R^{2} \;{\text{adjusted}}\) values between the dependent variable (LST) and the independent variables (biophysical parameters) were 0.73 and 0.71, respectively, in 2010, and the minimum values were 0.45 and 0.41, respectively, in 2013. The phenomenon code recorded for that day in 2013 indicates the presence of dust in the air that came from a location outside the station. This points to air turbulence and the presence of pressure systems in the area on that day, which reduced the effect of surface cover on the temperature. The \(R^{2} \;{\text{adjusted}}\) value is always slightly lower than the multiple \(R^{2}\) value and is a more accurate measure of model performance. Accordingly, in 2010, the model could explain 71% of the variation in the dependent variable. Sigma was calculated twice, once as biased and once as unbiased. Biased values, which standardize the unit of the independent variables, are slightly smaller than the standardized state. In the biased state, the problem with the different units of the variables is resolved. The smallest value of sigma was in 2005, and the highest was in 2013.
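The relation between \(R^{2}\) and \(R^{2} \;{\text{adjusted}}\) discussed above can be illustrated with the textbook degrees-of-freedom correction. This sketch is an assumption about the general form only: the function names are hypothetical, and GWR software may replace the number of predictors with the effective number of parameters in its own variant of the formula.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Textbook adjustment; n_predictors excludes the intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)
```

Because the correction factor (n − 1)/(n − k − 1) is greater than 1 whenever at least one predictor is present, the adjusted value is always lower than \(R^{2}\) for a model with \(R^{2} < 1\), and the gap widens as more parameters are used.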

Table 3 Results of modeling biophysical variables and LST by GWR model

Weight Matrix of LST with Biophysical Variables

The correlation coefficients between the surface temperature and the biophysical variables are shown as a weight matrix in Table 4. The highest correlation coefficient is between LST and NDBI (r = 0.67). This is due to the increased absorption of sunlight by built-up features. According to Liu and Zhang (2011), vegetation has a great influence on the reduction of surface temperature and the heat island, although the positive relationship between built-up areas and the heat island is stronger. In addition, thermal inertia is higher for impervious surfaces, which increases the LST (Zhang et al. 2017). After NDBI, the highest correlation coefficients with LST were those of NDVI, SAVI, and NDBaI, in that order. Among the biophysical variables, the highest correlation coefficient was between NDVI and SAVI. All coefficients are statistically significant at the 0.01 level (Sig = 0.00).
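A correlation matrix of this kind, and the selection of the variable most strongly correlated with LST, can be sketched as follows. The function names and the dictionary layout are hypothetical, the rasters are assumed to be co-registered arrays of equal shape, and significance testing is omitted.

```python
import numpy as np


def correlation_matrix(layers):
    """Pearson correlation matrix for a dict of equally shaped raster arrays."""
    names = list(layers)
    stacked = np.vstack([np.asarray(layers[k], dtype=float).ravel() for k in names])
    return names, np.corrcoef(stacked)


def strongest_predictor(names, corr, target="LST"):
    """Name and r of the variable with the largest |r| against the target."""
    t = names.index(target)
    r = corr[t].copy()
    r[t] = 0.0                       # ignore the trivial self-correlation
    best = int(np.argmax(np.abs(r)))
    return names[best], r[best]
```

The absolute value is used for the selection so that a strong negative relationship, such as the one typically found between vegetation indices and LST, is not overlooked.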

Table 4 Weight matrix of LST and biophysical variables

Modeling LST with NDBI Index by GWR Model

Because the NDBI index had the highest correlation coefficient with surface temperature in the region over the 15-year statistical period (2001–2015), this relationship was modeled by the GWR model (Table 5). The residual squares value is the sum of the squared residuals, where a residual is the difference between the observed y and the predicted y. Figure 4 depicts a map of the temperature values predicted from the NDBI (2001–2015). According to the map, the maximum predicted temperatures for the region in the 15-year statistical period are found in the west and northwest of Tehran, the south of Shahr-e Rey, the west of Robat Karim, and the west of Islamshahr, which corresponds to what was observed in the LST maps. The minimum predicted temperature is 27.92 °C, and the maximum is 47.4 °C, which is about 2 °C lower than the observed maximum.

Table 5 Results of modeling LST with NDBI index using GWR model

Fig. 4 Map of the predicted temperature values according to NDBI (2001–2015)

Figure 5 displays the residual values calculated for each year in the 15-year statistical period. Minimum and maximum residual values, marked in blue and red, are intermingled because of the high variability in the region, where different features such as vegetation, built-up regions, and bare areas lie beside each other. In other words, areas where the predicted values match the observed values adjoin areas where they do not, because of the large regional variations. In the southwestern and western regions, however, average to minimum residuals are more clearly distinguishable as a result of more homogeneous biophysical variables. In Fig. 6, the observed and predicted data are presented as a graph. The value of \(R^{2}\) = 0.86 indicates that the LST values are estimated from the NDBI index with high accuracy. The data are most densely aggregated around 40 °C, where the agreement between observations and estimates is closest. The effective number values, which are presented in the table, reflect the trade-off between the variance of the fitted values and the bias in the estimation of the coefficients. The values of sigma and AICc were 1.251934 and 21,961.11, respectively. According to \(R^{2} \;{\text{adjusted}}\), this model can explain 82% of the variation of the dependent variable. In other words, 82% of the surface temperature variation is explained by the GWR model.

Fig. 5 Map of the residual values

Fig. 6 Relationship between observed and predicted data

Figure 7 displays the local \(R^{2}\) map of the area. In this figure, the highest coefficients of determination are observed in Shahriar and Islamshahr. In other words, in these two regions, LST as the dependent variable is estimated most accurately from the independent variable, i.e., the NDBI index. Tehran, especially its southern half, has the lowest coefficient of determination. This may be due to the complexity of the features and surface biophysical variables, which reduces the precision of estimation.

Fig. 7 Map of local \(R^{2}\)

Concluding Remarks

The literature indicates that most researchers have investigated the relationship between biophysical variables and LST via regression analysis, while few studies have used the GWR model in a comprehensive manner. As an example, the study conducted by Karimi et al. (2017) focused only on Tehran and a single variable (NDVI), with 1 day used as the statistical sample and 1 day as the land use sample. They concluded that the built-up areas, including industrial, military, and transport and road areas, have the highest surface temperatures, which is in fact what the NDBI index represents and is consistent with the findings of the current research. In studies such as Chen et al. (2006), Liu and Zhang (2011), Ranagalage et al. (2017), Li et al. (2009), and Zeng et al. (2010), the relationship between LST and biophysical variables shows that the surface temperature is very strongly correlated with NDBI and strongly correlated with NDVI.

Most of the research conducted in this regard covers only a few years and a single urban area. In this research, Tehran and its satellite cities were studied in a time series (15 days in the years 2001–2015) with a focus on four biophysical variables. A weak relationship between LST and the biophysical variables indicates the impact of weather variables in a region, such that the LST is less affected by the land cover. In cities with more homogeneous surface features, there will be strong associations between the LST and the biophysical variables. However, different land uses and features within one area weaken this relationship. For a more thorough investigation and a more accurate identification of the ways in which LST and biophysical variables are connected, these relationships can be analyzed in different seasons.