Introduction

Air pollution is now recognized as one of the main problems in urban areas. Outdoor air pollution derives mainly from urban growth, the expansion of industrial activities, and the uncontrolled consumption of fossil fuels (Johansson et al. 2008; Shi et al. 2014; WHO 2008). Its first-order effect is on citizens’ health: it contributes to respiratory illness and increases the severity of heart and lung diseases (Fan et al. 2016; Hwang et al. 2017; Yang et al. 2017). Its second-order effect is to aggravate climate change, climatic fluctuations, and environmental impacts (Ren et al. 2007). Hence, air pollution has long been regarded as a major environmental issue by experts and urban planners (White and Engelen 2000). Tehran is among the cities subject to severe air pollution and faces unhealthy days for more than a third of the year (Rowshan et al. 2009).

In recent years, one of the greatest threats to Tehran has been particulate matter smaller than 2.5 microns (PM2.5), which causes most of the unhealthy days (Brajer et al. 2012). A high concentration of PM2.5 is a serious threat to the environment and human health (Luo et al. 2015; Miri et al. 2017; WHO 2016). Therefore, understanding the parameters that influence PM2.5 concentration is an increasingly important prerequisite for taking effective steps to reduce and prevent aerosol pollution. Parameters such as traffic, industry, and land use have a relatively constant relationship with air pollution throughout the year, whereas climate and vegetation cover have a changing relationship with it (Fisher 2002; Makra et al. 2011; Pcu and Bosiacka 2011). The climate system is complex and dynamic, and its elements and PM2.5 concentration influence each other (Chen et al. 2015; Kaufman et al. 2002). Therefore, careful analysis of the various air pollutants and evaluation of environmental and meteorological parameters such as wind speed, wind direction, relative humidity, and air temperature are essential (Khokhar et al. 2017). Previous studies have shown that meteorological conditions largely govern the diffusion, dilution, and accumulation of pollutants (Pohjola et al. 2002); thus, PM2.5 mass concentration depends strongly on meteorological conditions (Tai et al. 2010). Yang et al. (2011) concluded that meteorological conditions can contribute at least 16% to the reduction of PM2.5 mass concentration.

Since ground measurements of the parameters related to air pollution are subject to many limitations, satellite data can be used in areas where ground measurements are not available (Gupta et al. 2006). Remote sensing information has clear potential for validating emission inventories and for improving the understanding of the atmospheric processes that control air pollution episodes. In addition, remote sensing can complement ground monitoring data when assessing air pollution levels (Veefkind et al. 2007). The use of satellite imagery together with analytical tools from other disciplines, such as geographic information systems (GIS), on topics related to air pollution has spread in recent years. The impacts of different parameters vary over space (Wu et al. 2017). In recent years, a relatively simple but effective technique for exploring spatially varying relationships, called geographically weighted regression (GWR), has been developed (Brunsdon et al. 2001; Fotheringham et al. 2001). For analyzing the spatial relationships of meteorological and environmental parameters with air pollution, GWR is better suited than global methods such as ordinary least squares (OLS) regression.

In the field of particulate matter, especially PM2.5, many studies have predicted and estimated PM2.5 concentration using different models and methods. To date, some studies have predicted surface PM2.5 concentration by establishing a direct relationship between PM2.5 and aerosol optical depth (Hu 2009; Schaap et al. 2009), while others have estimated ground-level PM2.5 concentrations using satellite aerosol optical depth in conjunction with diverse variables and fields (Liu et al. 2009; Parkinson 2003). In general, the aims of these studies were to identify meteorological and land use variables that are effective predictors of PM2.5 and to improve model predictability using those variables (Lin et al. 2015; Liu et al. 2007; Liu et al. 2005; Tian and Chen 2010). Chen et al. (2017a) showed that the higher the PM2.5 concentration, the stronger the influence that meteorological factors exert on it. In another study, Chen et al. (2017b) found that, at the national scale, temperature, humidity, wind, and air pressure exert stronger influences on PM2.5 concentrations than other meteorological factors.

In recent years, the application of satellite remote sensing to air quality research, especially the use of aerosol optical depth (AOD), has been greatly promoted (Hoff and Christopher 2009; Martin 2008). However, little research has been done on other remote sensing products, such as NDVI and LST, and on evaluating them as parameters that influence PM2.5 concentration. Many models have been developed in research on PM2.5 concentration, but only a few studies have used geographically weighted regression to estimate it or to investigate the variables that affect it (Hu et al. 2013; Lin et al. 2013; Lin et al. 2015; Luo et al. 2017). This study aims to investigate the impact of meteorological and environmental parameters on PM2.5 concentration in the winter and summer seasons using the GWR method. For this purpose, satellite-derived products (the NDVI and LST products of the MODIS sensor) were used together with meteorological parameters as parameters related to air pollution. To the best of our knowledge, limited attention has so far been paid to the seasonal impacts of these parameters on PM2.5 concentration, and this research is the first to use the GWR model to compare the seasonal impacts of meteorological–environmental parameters on PM2.5 concentration using satellite-derived products (LST, NDVI).

Material and methods

Study area

The present study was conducted in Tehran, the capital city of Iran, which is located in the north of the central plateau of Iran between longitudes 51° and 51° 40′ E and latitudes 35° 30′ and 35° 51′ N. Tehran is a mountainside city at an altitude of 900 to 1700 m above sea level. Its urban area spreads entirely over the Iranian plateau, on the southern slopes of a very high and dense mountain barrier, with a peak of 3933 m, which is 2200 m higher than the city’s residential areas. According to the latest Iranian Population and Housing Census, conducted in 2011 by the Statistical Center of Iran, Tehran, with a population of 8.1 million people, is still ranked as the most populous city in Iran and differs markedly in demographic terms from other cities (Statistical Center of Iran, 2011, https://www.amar.org.ir/Portals/1/Iran/Atlas_Census_2011.pdf). Tehran is rated as one of the world’s most polluted cities and suffers from severe air pollution. The city is divided into 22 urban districts, and based on the availability of meteorological and PM2.5 concentration stations, this study was conducted in district nos. 2, 3, 5, 6, 7, 9, 10, 11, and 12. The location of the study area is shown in Fig. 1.

Fig. 1

The location of study area and air quality stations

Data and methodology

In the present study, different environmental data from meteorological and air quality stations and satellite imagery were used. The summary of these data is given as follows:

  • The seasonal averages of meteorological parameters (humidity, pressure, precipitation, wind speed, and air temperature), calculated from daily data of the Meteorological Organization for winter (2014–2015) and summer (2015) separately.

  • Ground-based PM2.5 concentrations, as seasonal averages of daily PM2.5 concentration data from the air quality station networks of the environmental protection organization and the air quality control center of Tehran.

  • Satellite images from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor on board the Terra and Aqua satellites (Parkinson 2003), which records 36 spectral bands ranging in wavelength from 0.4 to 14.4 μm: (a) the 16-day normalized difference vegetation index (NDVI, MOD13A1) to represent the vegetation coverage and (b) land surface temperature (LST, MOD11A1). Both are archived at NASA (available at https://ladsweb.modaps.eosdis.nasa.gov) as level-1A MODIS/Terra (EOS AM-1) products with a spatial resolution of 1 km, and both were selected for times coinciding with the station datasets.

As mentioned above, only the ground measurements taken at times close to the overpass of the MODIS/Terra satellite were used in this study. A summary of the variables used in modeling is given in Table 1.

Table 1 Summary of variables used in modeling analysis

Because few stations had sufficient data, the stations did not adequately cover the whole city, and the measurements are point data, it was necessary to construct new data points within the range of the discrete set of known data points. Interpolation can be used to predict unknown values for any geographic point data; it predicts values for the cells of a raster from a limited number of sample data points. The seasonal averages of the ground measurements were therefore extended to the whole city using inverse distance weighted (IDW) interpolation. IDW is a weighted average method that estimates cell values by averaging the values of sample data points in the neighborhood of each processing cell. Unlike, e.g., the kriging method, it makes no assumptions about the relationships among the spatial data. It relies only on the assumption that the closer a point is to the center of the cell being estimated, the more influence or weight it has in the averaging process.
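The IDW principle described above can be illustrated with a minimal Python sketch. It is not the ArcGIS implementation used in the study: the function names, the power parameter of 2, the use of all stations rather than a search neighborhood, and the station coordinates and values are all assumptions made for illustration.

```python
import math


def idw_estimate(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at the cell centre (x, y).

    stations: list of (x, y, value) tuples.
    power: distance exponent; 2.0 is an assumed, commonly used default.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The cell centre coincides with a station: return its value.
            return value
        weight = 1.0 / distance ** power  # closer stations weigh more
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def idw_grid(stations, xs, ys, power=2.0):
    """Estimate a value for every cell centre of a regular grid."""
    return [[idw_estimate(stations, x, y, power) for x in xs] for y in ys]


# Hypothetical station coordinates (km) and seasonal mean values.
stations = [(0.0, 0.0, 40.0), (10.0, 0.0, 60.0), (0.0, 10.0, 50.0)]
grid = idw_grid(stations, xs=[0.0, 5.0, 10.0], ys=[0.0, 5.0, 10.0])
```

Because each estimate is a weighted average, it always lies between the lowest and highest station values, which is one reason IDW surfaces cannot reproduce peaks that fall between stations.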

Satellite images were georeferenced and averaged for both winter (2014–2015) and summer (2015) seasons. Spatial distributions of PM2.5 and the chosen independent variables are shown in Figs. 2 and 3.

Fig. 2

Spatial distributions of variables in winter

Fig. 3

Spatial distributions of variables in summer

To assess the impact of meteorological and environmental parameters on PM2.5 concentration, a multivariate analysis was first carried out with OLS between the potential meteorological and environmental parameters and PM2.5 concentration, in order to identify the decisive parameters before applying GWR. In other words, the variables that had a significant relationship with PM2.5 were identified. For this purpose, all variables were entered into the model using ArcGIS 10.4.1, and non-significant variables were removed step by step. For example, in the first run of the model for the winter season, the pressure variable and then the humidity variable were removed. This procedure was continued until all remaining variables were statistically significant. Then, the relationships between the meteorological and environmental variables and particulate matter smaller than 2.5 μm were evaluated using the geographically weighted regression model. Finally, the two seasons were compared. The methodology is summarized in the flowchart in Fig. 4.

Fig. 4

The flowchart of methodology

OLS model

OLS is a global regression method that can be implemented in ArcGIS and is described by Eq. (1):

$$ Y={\beta}_0+{\beta}_1{X}_1+{\beta}_2{X}_2+\dots +{\beta}_n{X}_n+\upvarepsilon $$
(1)

where Y is the dependent variable, X1, …, Xn are the explanatory variables, β0 is the intercept, β1, …, βn are the coefficients of the explanatory variables, and ε is the random error with expectation 0 and variance σ² (Stone and Brooks 1990). One of the most important diagnostics for this model is the variance inflation factor (VIF) of each regression variable, which should be no more than 7.5 to ensure that the model is free of multicollinearity and redundant independent variables. A VIF value greater than 7.5 means that two or more variables are similar to each other. VIF measures how much the estimated variance of a coefficient is increased by collinearity (Wheeler and Páez 2010; Wheeler and Tiefelsdorf 2005). In addition to VIF, the OLS model reports several other statistics. The multiple R-squared and adjusted R-squared values are both measures of model performance, with possible values ranging from 0.0 to 1.0. The joint F statistic and joint Wald statistic are both measures of overall model statistical significance. The Koenker (BP) statistic (Koenker’s studentized Breusch-Pagan statistic) tests whether the explanatory variables in the model have a consistent relationship to the dependent variable both in geographic space and in data space. The Jarque-Bera statistic indicates whether or not the residuals (the observed dependent variable values minus the predicted values) are normally distributed.
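As an illustration of the VIF diagnostic, the sketch below computes VIF for each explanatory variable as 1 / (1 − R²), where R² comes from regressing that variable on all the others. This is the standard textbook definition rather than the ArcGIS code used in the study; the helper names and the sample values (labeled as hypothetical humidity, precipitation, and wind speed) are assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y, columns):
    """R-squared of an OLS fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(design[i][p] * design[i][q] for i in range(n))
            for q in range(k)] for p in range(k)]
    xty = [sum(design[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[p] * design[i][p] for p in range(k)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[i] - fitted[i]) ** 2 for i in range(n))
    ss_tot = sum((value - mean_y) ** 2 for value in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    result = []
    for j, target in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        result.append(1.0 / (1.0 - r_squared(target, others)))
    return result


# Hypothetical values: the first two columns are nearly collinear.
humidity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
precipitation = [1.1, 1.9, 3.2, 3.9, 5.1, 5.9]
wind_speed = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]
vif_values = vif([humidity, precipitation, wind_speed])
redundant = [value > 7.5 for value in vif_values]  # 7.5 threshold from the text
```

In this made-up example the first two variables exceed the 7.5 threshold, so one of them would be dropped and the VIF values recomputed before proceeding.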

GWR model

To account for spatial variability, the GWR model was adopted to examine the relationship between meteorological and environmental parameters and PM2.5 concentration in different parts of the study area. The GWR model, proposed by Brunsdon et al. (1996), is an extension of traditional standard regression techniques such as OLS in that it allows local rather than global parameter estimates (Fotheringham et al. 2001). Unlike global models such as OLS, the GWR model is based on local statistics, and it considers the effects of spatial variations in meteorological and geographical variables on the estimation of PM2.5 concentration. The method estimates model parameters at each geographical location using a weighting function, called the kernel function, which takes the exponential distance decay form:

$$ {w}_{ij}=\exp \left(-\frac{d_{ij}^2}{b^2}\right) $$
(2)

where wij represents the weight of observation j for location i, dij is the Euclidean distance between points i and j, and b is the kernel bandwidth. Because the observations are weighted by distance, those closer to the studied location have more influence on the parameter estimates. These estimates show how a relationship varies over space. Examining the spatial pattern of the local estimates can help to reveal possible hidden causes of the respective patterns (Fotheringham et al. 2003). The basic GWR equation is:

$$ y\left(u,v\right)={b}_0\left(u,v\right)+{b}_1\left(u,v\right){x}_1+e\left(u,v\right) $$
(3)

where y is the dependent variable with a Gaussian distribution; x is the independent variable; u and v are the coordinates of the data; b0 is the intercept term; b1 is the coefficient being estimated; and e is the random error term (See et al. 2015). The point is that GWR can not only test the spatial non-stationarity of the input variable, but it can also be used to estimate the outcome variable (Wheeler 2014).
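A minimal sketch of how a local estimate can be obtained for the single-predictor form of the GWR equation is given below. It assumes a Gaussian distance-decay kernel of the form exp(−d²/b²), under which weights fall as distance grows, and it uses the closed-form weighted least squares solution for one predictor. The function names, the fixed distance bandwidth, and the sample points are hypothetical and do not reproduce the ArcGIS GWR tool.

```python
import math


def kernel_weight(distance, bandwidth):
    """Distance-decay weight: 1 at zero distance, approaching 0 far away."""
    return math.exp(-(distance ** 2) / (bandwidth ** 2))


def local_fit(points, u, v, bandwidth):
    """Weighted least squares estimate of (b0, b1) at location (u, v).

    points: list of (u_j, v_j, x_j, y_j) observations.
    """
    weights = [kernel_weight(math.hypot(u - pu, v - pv), bandwidth)
               for pu, pv, _, _ in points]
    total = sum(weights)
    x_mean = sum(w * p[2] for w, p in zip(weights, points)) / total
    y_mean = sum(w * p[3] for w, p in zip(weights, points)) / total
    s_xy = sum(w * (p[2] - x_mean) * (p[3] - y_mean)
               for w, p in zip(weights, points))
    s_xx = sum(w * (p[2] - x_mean) ** 2 for w, p in zip(weights, points))
    b1 = s_xy / s_xx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical data: y = 2x in a western cluster, y = -x in an eastern one.
points = [
    (0.0, 0.0, 1.0, 2.0), (0.0, 1.0, 2.0, 4.0), (1.0, 0.0, 3.0, 6.0),
    (100.0, 0.0, 1.0, -1.0), (100.0, 1.0, 2.0, -2.0), (101.0, 0.0, 3.0, -3.0),
]
b_west = local_fit(points, 0.0, 0.0, bandwidth=10.0)
b_east = local_fit(points, 100.0, 0.0, bandwidth=10.0)
```

With these made-up points, the local slope is close to 2 in the west and close to −1 in the east, whereas a single global regression would return one compromise slope for the whole area. This is the kind of spatial non-stationarity that the local coefficients are meant to reveal.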

In the GWR model, the bandwidth is the number of neighbors used for each local estimation. Residual squares is the sum of the squared residuals in the model (the residual being the difference between an observed y value and its estimated value returned by the GWR model). AICc (the corrected Akaike information criterion) is a measure of model performance and is helpful for comparing different regression models. Taking model complexity into account, the model with the lower AICc value provides a better fit to the observed data.
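The two quantities above can be sketched as follows. The AICc shown is the generic small-sample form for least squares models, AICc = n ln(RSS/n) + 2k + 2k(k + 1)/(n − k − 1), with k the number of estimated parameters. This is an assumption made for illustration: GWR software typically replaces k with an effective number of parameters, so the values will not match ArcGIS output. The function names are hypothetical.

```python
import math


def residual_squares(observed, fitted):
    """Sum of squared differences between observed and estimated values."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def aicc(rss, n, k):
    """Small-sample corrected AIC for a least squares model.

    rss: residual sum of squares; n: number of observations;
    k: number of estimated parameters (assumed known here).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def better_model(rss_a, k_a, rss_b, k_b, n):
    """Return 'a' or 'b', whichever model has the lower AICc."""
    return "a" if aicc(rss_a, n, k_a) < aicc(rss_b, n, k_b) else "b"
```

For example, `aicc(10.0, 10, 2)` evaluates to 4 + 12/7, because the logarithmic term vanishes when RSS equals n. A more complex model is preferred only if its reduction in residual squares outweighs the penalty for its extra parameters.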

Results and discussion

Results of OLS model

The first run of the OLS model showed that some variables did not have a significant relationship with the dependent variable at the 0.01 significance level. These non-significant variables were removed step by step. The results of the OLS method after removing the non-significant variables are shown in Table 2.

Table 2 Summary of global OLS linear regression model results

VIF was used to detect whether collinearity problems existed among the variables. Checking the VIF values showed that some variables were redundant and that the redundancy was due to the precipitation and humidity variables, which are very similar to each other. The redundancy was resolved by removing the humidity variable. In summer, first pressure and then LST were also dropped from the model. The negative relationship of the NDVI index and wind speed with the dependent variable (PM2.5 concentration) indicates that these two variables play an important role in reducing air pollution in Tehran. The results of the OLS model also confirm that NDVI and LST are inversely correlated with each other. This is more pronounced in summer because of the NDVI and LST levels. The adjusted R-squared values indicated that 41% (winter) and 39% (summer) of the variation in PM2.5 can be explained by meteorological and environmental parameters. The significant joint F-statistic and joint Wald statistic both indicated a significant linear relationship between the dependent variable and the independent variables. Because the Jarque-Bera statistic was not significant, the residuals can be considered normally distributed, which suggests that the model is unbiased and that all the key variables were included in the model.


Results of GWR model

Based on the OLS results, non-significant variables were removed and data redundancy was eliminated. Finally, the LST, NDVI, precipitation, air temperature, and wind speed variables for winter and the NDVI, air temperature, precipitation, and wind speed variables for summer were entered into the GWR model. A list of the explanatory variables considered for inclusion in GWR, with their descriptive statistics, is given in Table 3.

Table 3 Results of GWR model output

The GWR model is a significant improvement on the OLS model, and the improvement is more conspicuous in winter (adjusted R2 = 0.71) than in summer (adjusted R2 = 0.41). The AICc values also suggest that the local model was a significant improvement. In general, both the OLS and GWR methods show that meteorological and environmental variables have a much stronger relationship with PM2.5 concentration in winter than in summer. The summary of the GWR model output is shown in Table 4. Figs. 5 and 6 show the local coefficients of each meteorological and environmental variable in the GWR model for the cold and hot seasons, respectively.

Table 4 Summary of GWR model output

According to the results of various measurement campaigns (Chen et al. 2017a; Hu et al. 2013; Jiang and Weiwei 2016), local methods such as GWR estimate PM2.5 concentration much better and more accurately than global methods, which is confirmed by our results. In agreement with an investigation carried out in China (Chen et al. 2017b), we found that the influence of meteorological–environmental parameters is much stronger in the cold season than in the warm season. The results of the GWR method also show that the local variations of the coefficients of all parameters and of R2 are very low in summer. The larger spatial variations in the cold season are probably caused by the more stable atmospheric stratification, lower mixing layer heights, and lower wind speeds prevailing in winter.

The local coefficient indicates the degree of correlation between the dependent and independent variables, which can be positive or negative. Figures 5a and 6b show the local coefficients for air temperature in winter and summer, respectively. They show that air temperature has a positive correlation with PM2.5 concentration in both seasons, but the pattern of the local coefficients differs between the two seasons. In summer, the spatial variations are low compared with winter, and the area is almost uniform. The correlation between PM2.5 and air temperature is also much stronger in summer. This is because air temperature can affect the formation of particles: high air temperature can promote the photochemical reactions between precursors (Wang and Ogawa 2015). In big and crowded cities like Tehran, solar radiation affects the spatial variations in air temperature less than intra-urban human activities such as industrial activities, traffic, and population density do. As a result, although an increase in air temperature leads to an increase in albedo, it cannot have a significant impact on the spatial variations in air temperature. The spatial distribution of air temperature in urban areas is quite complex, and most of the factors that modify the energy balance and air temperature conditions in cities arise from the highly different thermal properties of the urban environment relative to its natural surroundings (Dobrovolný and Krahula 2015; Haddad and Aouachria 2015).

Fig. 5

Spatial distribution of local coefficients in GWR model (winter)

Fig. 6

Spatial distribution of local coefficients in GWR model (summer)

According to Fig. 5b, in winter LST has a positive correlation with PM2.5 concentration in the western part of the region and a negative correlation in the east. Moving from west to east, the positive correlation weakens to no correlation and then becomes negative. Since air temperature has a positive effect on PM2.5 concentration and LST is influenced by many parameters, including air temperature (Khandelwal et al. 2017), a positive relationship between LST and PM2.5 requires a positive correlation between LST and air temperature. In the east, unlike in the western part, the relationship between LST and air temperature is not positive. The reason could be the weak vegetation cover: in sparsely vegetated areas, the daytime LST is generally higher than the air temperature (Chan et al. 2017).

The vegetation coverage, described by the normalized difference vegetation index (NDVI), is known to have some impact on the spatial patterns of PM2.5 (Jiang and Weiwei 2016). Although the effect of vegetation on urban air quality is not yet fully understood (Escobedo et al. 2011; Tiwary et al. 2009), Fig. 5c shows that in most of the study area there is a negative relationship between NDVI and PM2.5 concentration in winter. Only a very small part in the southeast of the study area has a positive correlation. In general, moving away from the city center, in line with the geography of the urban heat island, whose intensity generally increases from the urban outskirts towards the city center, the relationship between NDVI and PM2.5 becomes negative. This correlation indicates that NDVI has a controlling effect on Tehran’s air pollution. Considering that the highest air pollution in Tehran is observed in winter, improving urban vegetation can be a viable strategy to help reduce urban air pollution. Previous studies have shown that vegetation can mitigate particulate air pollution through a number of mechanisms, such as intercepting and accumulating atmospheric particles through leaf pubescence and stomata (Chen et al. 2016; Irga et al. 2015). However, as can be seen in Fig. 6a, NDVI and PM2.5 behave differently in summer: they have a weak positive correlation with low spatial variation across the whole city of Tehran. The weak positive relationship between NDVI and PM2.5 in summer can be attributed to the inverse relationship between NDVI and LST. Because LST is influenced by air temperature and both have higher values in summer than in winter, NDVI plays a less important role in describing PM2.5 in summer. In effect, the impact of NDVI is outweighed by the influence of land surface temperature and air temperature. It has been shown that air temperature, precipitation, and other meteorological parameters explain much of the variability in particulate matter concentrations (Tai et al. 2010).

As seen in Fig. 5e, there is a negative correlation between wind speed and PM2.5 concentration in the north-eastern part of the study area, which changes to a positive correlation in the south-western part. According to Fig. 6d, unlike in winter, the variations in the local coefficients for wind speed are very low in summer, with slightly decreasing but always positive local coefficients from west to east, and the correlation is low and positive. Thus, the impact of wind speed on PM2.5 concentration differs between the two seasons. Notably, wind speed has a negative effect on PM2.5 concentration in winter, which means that higher wind speed is conducive to the diffusion of PM2.5 and results in lower PM2.5 concentrations (Luo et al. 2017). The influence of wind speed on PM2.5 concentration is more obvious in winter than in summer, because in summer the correlation is positive with low spatial variation. These results may be attributed to local pollution sources in the Tehran urban area being the dominant source of PM2.5 pollution, with a smaller contribution from external sources, as the actual pollutant concentration is the integrated effect of external and local sources (Pu et al. 2011). The GWR results for precipitation and PM2.5 concentration in winter and summer are shown in Figs. 5d and 6c. Precipitation has a positive correlation with PM2.5 concentration in most parts of the study area. This result contradicts previous research, which shows that precipitation can effectively remove atmospheric particulate matter (Li et al. 2015; Lin et al. 2015; Luo et al. 2017; Tai et al. 2010; Wang and Ogawa 2015). Hence, the precipitation data were investigated further to find a complementary explanation. The relationship between PM2.5 concentration and precipitation of more than 2 mm is shown in Fig. 7. For this analysis, in addition to the days with precipitation of more than 2 mm, the day before each precipitation event was also considered.

Fig. 7

The relationship between PM2.5 concentration and precipitation more than 2 mm (considering the data of 1 day before precipitation)

It was found that precipitation behaves in a particular way in Tehran. The number of rainy days is low, especially in summer, and most precipitation events are moderate. When the precipitation data were grouped into several classes according to the amount of precipitation and the impact of each class on PM2.5 concentration was analyzed, it was found that days with low precipitation (less than 2 mm) not only fail to decrease PM2.5 concentration but also increase air pollution because of increased traffic, whereas with higher precipitation (more than 2 mm), PM2.5 concentration decreases.
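The classification step described above can be sketched as a simple grouping of daily records by precipitation amount. The function name, the class labels, the treatment of exactly 2 mm as the higher class, and the daily values are all assumptions made for illustration; the values are invented and are not the study's data.

```python
def mean_pm25_by_precipitation_class(days, threshold_mm=2.0):
    """Mean PM2.5 per precipitation class.

    days: list of (precipitation_mm, pm25) daily records.
    threshold_mm: boundary between low and higher precipitation
    (2 mm in the text; a value of exactly 2 mm is assigned to the
    higher class here, which is an assumption).
    """
    groups = {"dry": [], "low": [], "higher": []}
    for precipitation, pm25 in days:
        if precipitation <= 0.0:
            groups["dry"].append(pm25)
        elif precipitation < threshold_mm:
            groups["low"].append(pm25)
        else:
            groups["higher"].append(pm25)
    # Skip empty classes to avoid dividing by zero.
    return {name: sum(values) / len(values)
            for name, values in groups.items() if values}


# Invented daily records: (precipitation in mm, PM2.5 concentration).
example_days = [(0.0, 50.0), (0.0, 60.0), (1.0, 70.0),
                (0.5, 80.0), (3.0, 30.0), (5.0, 20.0)]
class_means = mean_pm25_by_precipitation_class(example_days)
```

Comparing the class means in this way separates the effect of light precipitation from that of heavier precipitation, which a single seasonal average of precipitation cannot do.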

The local R2 from the GWR ranges between 0.05 and 0.73 for winter and between 0.37 and 0.43 for summer (Fig. 8a, b). The highest R2 in winter is found in the northeast and the lowest in the central part. Compared with winter, the local R2 varies less across the area in summer. The standard residual values show much the same pattern in both seasons (Fig. 9). The lowest standard residual values are seen in the northwest and some parts of the southwest, and the highest in the northeast.

Fig. 8

Spatial distribution of local R2

Fig. 9

Spatial distribution of standard residuals of the GWR model

However, data gaps exist in both the PM2.5 measurements and the meteorological parameters, which impose some limitations on this research. More accurate parameters and variables should be identified to improve models and methods and to obtain more accurate results. In addition, in the study of urban problems such as air pollution, it is important to use data from multiple years to examine changes over time and to identify the influential parameters.

Conclusion

In the present study, the impact of meteorological and environmental parameters on PM2.5 concentrations in winter and summer was assessed. For this purpose, the geographically weighted regression (GWR) model was adopted to explore the impact of environmental parameters on PM2.5 concentrations in the city of Tehran. In comparison with the OLS method, the GWR model showed a higher ability to analyze the relationships between the independent and dependent variables. The study indicated that meteorological and environmental parameters describe PM2.5 concentration much better in winter than in summer. Compared with winter, the spatial variation of the local coefficients was lower in summer. Considering the negative correlation between NDVI and PM2.5 concentration in winter, it is recommended to increase urban vegetation in order to reduce Tehran’s air pollution, especially PM2.5 concentration. Air temperature showed the highest correlation with PM2.5 concentration in summer. The positive correlation of air temperature in both seasons, as well as of LST in some areas, shows the need for more attention to source-related air pollution control, such as reducing emissions from traffic or from air conditioning in summer. However, since air temperature cannot be modified, there is no way to reduce air pollution other than strict emission control, especially in the hot season. There is also an urgent need for an extensive investigation of the emission sources; the most effective measure will very likely be the replacement of fossil energy sources by electric power from sustainable sources, especially in transportation. We suggest some directions for future research: (1) evaluating more air pollutants, such as PM10 and ozone, and applying the potential of remote sensing technology to them to obtain a more extensive picture of source-receptor relationships in highly polluted areas such as in and around the city of Tehran; (2) evaluating the capability of different satellite imagery for topics related to air pollution and air quality; and (3) assessing the impact of various meteorological and environmental parameters on primary and secondary pollutants. Further research also needs a longer time span and a higher data density.