1 Introduction

Over the last two decades, air pollution has received considerable attention because of its negative impacts on the global climate, air quality, and human health [1,2,3,4]. Aerosols (also known as particulate matter) exert significant direct and indirect radiative forcing through their scattering and absorption properties, which are affected by changes in atmospheric stability [5, 6]. In addition, several studies have found that exposure to PM2.5 (particulate matter with a diameter of less than 2.5 μm) can lead to serious health consequences such as cardiovascular disease, respiratory infections, morbidity, and mortality [7,8,9,10]. Air quality monitoring and regulation in industrial and fast-growing cities have therefore become increasingly important [11].

Ground-level PM2.5 is monitored by air quality monitoring stations, which provide accurate and reliable data. However, although ground-based stations are well suited to assessing air pollution in their immediate vicinity, their spatial coverage is often limited by the high cost of the equipment and its operation [12, 13]. Furthermore, point-level data alone cannot adequately represent the significant spatial and temporal variation of PM2.5 that must be captured to address regional air pollution. Satellite remote sensing offers a viable complement to ground-level observations. Satellite sensors make it possible to extend point-level observations to a larger scale, allowing researchers to estimate and monitor ground-level PM2.5 more efficiently [14,15,16]. For example, the Moderate Resolution Imaging Spectroradiometer (MODIS) onboard the polar-orbiting satellites of the National Aeronautics and Space Administration's (NASA) Earth Observing System (EOS) measures aerosol optical depth (AOD) daily with high precision for aerosol investigations [17, 18]. Previous research has used MODIS AOD to estimate ground-level PM concentrations for air quality assessment [19,20,21,22].

In recent years, many studies have applied regression analysis to estimate ground-level PM2.5 concentrations from satellite-derived AOD [20, 22,23,24,25,26]. In general, these studies report a high correlation between AOD and PM2.5 [24,25,26]. However, the strength of this correlation depends on other factors, such as meteorological conditions and aerosol properties [21]. These studies use observations of PM2.5 and related parameters to establish estimation functions [14, 21, 27]. Several methods have been used, including simple linear regression between AOD and PM2.5, multiple linear regression with additional parameters (e.g., AOD, relative humidity, and temperature), and artificial neural network (ANN) algorithms [19, 21, 25, 26, 28].

Vietnam is a developing country with high population growth, especially in urban areas [29,30,31]. The development of industrial zones increases the emission of dust and pollutants into the surrounding environment, as does urban development through activities such as cooking, biomass burning, and traffic. Infrastructure construction drives the exploitation of construction materials, and the associated mining and transportation increase the amount of fine particulate matter released into the environment [32, 33]. Currently, many provinces of Vietnam, such as Quang Ninh, Binh Duong, and Ha Nam, have open-pit mines. In particular, some mines are located near residential areas because of convenient access to the transportation system. The growth of open-pit mining and urbanization therefore increases human exposure to fine particulate matter. A study of the effects of urbanization and open-pit mining on PM2.5 concentration can provide a reliable basis for developing policies to control emission activities and reduce air pollution in urban areas [34].

This study aims to evaluate the impact of urbanization and open-pit mining on PM2.5 concentrations using earth observation and ground-level data. Fine particulate concentrations are assessed on a large scale through spatial distribution maps. Monthly and yearly average maps of PM2.5 are created using regression models that relate PM2.5 to AOD values and meteorological factors. PM2.5 concentrations over different representative regions (i.e., urban, rural, and open-pit mining) are extracted to evaluate the PM2.5 variation of each area over time. In addition, population density and building density are related to the PM2.5 concentration data to analyze their influence.

2 Materials and Methods

2.1 Study Areas

Binh Duong is a province in the Southeast region of Vietnam, located between 10°51′46″ and 11°30′ North latitude and between 106°20′ and 106°58′ East longitude (Fig. 1). Binh Duong has a tropical monsoon climate [35]. The rainy season runs from May to November, while the dry season runs from December to April of the following year. The province is developing rapidly, promotes the development of industrial parks, and has attracted a large number of migrant workers. In 2019, the total population of Binh Duong province was nearly 2.5 million [36]. The population is concentrated in the southern districts and decreases toward the north. Like other provinces of Vietnam with mining activity, Binh Duong has several large open-pit stone mines, such as Thuong Tan, Tan My, and Tan Dong Hiep. These mines are located near residential areas, leading to air pollution from mining and transportation activities (Fig. 1).

Fig. 1 Location map of the study region in Binh Duong province, showing the ground-based stations and population density

2.2 Data Description

This study uses the Corrected Optical Depth Land product from the MODIS Level-2 AOD products of Terra (MOD04) and Aqua (MYD04) at a spatial resolution of 3 km. The MODIS instruments onboard EOS Terra and Aqua are in sun-synchronous polar orbits at an altitude of about 705 km, crossing the equator at around 10:30 a.m. and 1:30 p.m. local time, respectively. Standard data products were downloaded in Hierarchical Data Format (HDF) from the NASA website (https://ladsweb.nascom.nasa.gov/). Global validation studies report good agreement between Terra and Aqua MODIS AOD and AERONET observations, with a correlation coefficient of about 0.9 and little bias.

The study uses hourly PM2.5, relative humidity, and temperature data from ground-based stations to establish regression models and validate the PM2.5 estimates derived from satellite images (Table 1). The selected stations are evenly distributed inside and around the study area (Fig. 1). Data from May to December 2020 at the Tan Phong B, Thanh Thai, Nguyen Van Nghi, An Binh, Le Hong Phong, and Tan Lien stations were measured by the PAS-OA318 optical sensor developed by PAMair, an air quality network operating over 400 monitoring points across 63 cities and provinces of Vietnam (Fig. 2a) [37]. In addition, a PMS5003 optical sensor mounted on a ground-based instrument was used to measure data at the Tan My and Long Son stations (Fig. 2b). Both instruments use the light-scattering method to measure PM2.5 concentrations.

Table 1 Description of data in the study
Fig. 2 Ground-based stations used in the study: a roof-mounted PAS-OA318 sensor developed by PAMair; b ground-mounted station using the PMS5003 sensor

Total suspended particles (TSP) refers to the sum of particles suspended in the air with an aerodynamic diameter of less than or equal to 100 microns [38]. The Binh Duong Center of Natural Resources and Environmental Technical–Monitoring, under the Department of Natural Resources and Environment of Binh Duong province, provides monthly average TSP data from three ground-based stations representing areas with different emission sources. First, the Phu Giao station is located in a sparsely populated area far from industrial zones and represents rural areas. Second, the Di An Center station is located in an urban center with a high population density and heavy traffic, is influenced by several surrounding industrial activities, and represents urban areas. Finally, the Tan Uyen Stone Mine station is located in an open-pit mining area, is influenced by the mining and transportation of stone, and represents open-pit mining areas.

Population density data for the districts of Binh Duong province in 2019 were provided by the General Statistics Office of Vietnam [39]. The Normalized Difference Built-Up Index (NDBI) indicates the built-up density of the study area [40]. This index is calculated from Landsat 8 OLI satellite images, operated by the US Geological Survey, which have a spatial resolution of 30 m in the reflective channels (0.43–2.3 μm) and 100 m in the thermal infrared channels (10.6–12.5 μm). Satellite images of Binh Duong province were downloaded from https://earthexplorer.usgs.gov/.
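As an illustration of the built-up index described above, the sketch below computes NDBI with the commonly used definition NDBI = (SWIR − NIR)/(SWIR + NIR). The band assignment (Landsat 8 OLI band 5 as NIR and band 6 as SWIR1), the function name, and the sample reflectance values are assumptions made for illustration; the study does not state its exact processing chain.

```python
import numpy as np

def compute_ndbi(nir, swir):
    """Return NDBI = (SWIR - NIR) / (SWIR + NIR) for reflectance arrays.

    Pixels where SWIR + NIR equals zero are set to NaN to avoid division by zero.
    """
    nir = np.asarray(nir, dtype=float)
    swir = np.asarray(swir, dtype=float)
    total = swir + nir
    ndbi = np.full(total.shape, np.nan)
    valid = total != 0
    ndbi[valid] = (swir[valid] - nir[valid]) / total[valid]
    return ndbi

# Hypothetical 2 x 2 reflectance patches (not data from the study).
nir_band = np.array([[0.30, 0.25], [0.10, 0.00]])   # assumed OLI band 5
swir_band = np.array([[0.20, 0.25], [0.30, 0.00]])  # assumed OLI band 6
ndbi = compute_ndbi(nir_band, swir_band)
print(ndbi)
```

Higher values indicate pixels where shortwave-infrared reflectance exceeds near-infrared reflectance, which is typical of built-up surfaces; district-level values could then be obtained by averaging the pixels inside each district boundary.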

2.3 Regression Model

Simple Linear Regression

In this study, a simple linear regression model is established between the independent variable, MODIS AOD at 550 nm over the ground station, and the dependent variable, PM2.5, under the assumption that the relationship between PM2.5 and AOD is stable within a limited spatial range [17, 41]. This simple regression model is used for the months for which no meteorological measurements are available in this study, namely January to April 2020. Ground PM2.5 measurements are matched to the satellite overpass time at the station location within a window of ±30 min, and the AOD value used is the average within 3 km around the PM2.5 station location [41, 42]. The regression function is therefore given as follows [17, 40, 43, 44]:

$${\text{PM}}_{2.5} = \alpha + \beta \times {\text{AOD}}$$
(1)

where PM2.5 refers to fine particulate concentration (μg/m3); AOD is the aerosol optical depth; β is the slope and α is the intercept.
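The matching and fitting steps behind Eq. (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the choice to average all station readings inside the ±30 min window, and every numeric value are assumptions.

```python
from datetime import datetime, timedelta
import numpy as np

def match_station_to_overpass(records, overpass, window_min=30):
    """Average station PM2.5 readings taken within +/- window_min of the overpass.

    records: list of (timestamp, pm25) tuples; returns NaN if nothing matches.
    """
    window = timedelta(minutes=window_min)
    values = [pm for t, pm in records if abs(t - overpass) <= window]
    return float(np.mean(values)) if values else float("nan")

# Hypothetical station readings around an assumed 10:30 overpass.
records = [
    (datetime(2020, 3, 1, 10, 0), 18.0),
    (datetime(2020, 3, 1, 10, 30), 20.0),
    (datetime(2020, 3, 1, 11, 0), 22.0),
    (datetime(2020, 3, 1, 12, 0), 40.0),  # outside the window, ignored
]
matched_pm25 = match_station_to_overpass(records, datetime(2020, 3, 1, 10, 30))

# Hypothetical matched pairs of AOD and PM2.5 (ug/m3).
aod = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
pm25 = np.array([12.1, 15.8, 20.3, 23.9, 28.0])

# Least-squares fit of PM2.5 = alpha + beta * AOD; polyfit returns [slope, intercept].
beta, alpha = np.polyfit(aod, pm25, 1)
print(f"PM2.5 = {alpha:.3f} + {beta:.3f} * AOD")
```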

Multiple Linear Regression

Some studies have shown that meteorological variables influence the relationship between PM2.5 concentration and AOD [14, 16, 21]. Multiple linear regression describes the relationship between one dependent variable and two or more independent variables [14]. Observed meteorological data are important factors affecting fine particulate concentration. The hourly PM2.5, relative humidity, and temperature data are matched with the AOD data using the same method as in the simple linear regression model. PM2.5 data from six ground-based stations were matched with satellite AOD to build the regression model, and two stations, Le Hong Phong and Tan Lien, were then used for validation (Table 1). The parameters used in the multiple regression function are shown in the formula below [14, 39]:

$${\text{PM}}_{2.5} = \left( {\alpha + \varepsilon_{1} } \right) + \left( {\beta_{1} + \varepsilon_{2} } \right) \times {\text{AOD}} + \left( {\beta_{2} + \varepsilon_{3} } \right) \times {\text{Temp}} + \left( {\beta_{3} + \varepsilon_{4} } \right) \times {\text{RH}}$$
(2)

where Temp is the temperature (℃); RH is the relative humidity (%); α and β are the fixed coefficients; and ε is the random error.
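A minimal sketch of fitting the fixed coefficients of Eq. (2) is given below. It uses ordinary least squares, which estimates only α and β1 to β3 and treats the ε terms as unmodeled residual error; whether the study used this exact estimator is not stated. The predictor values are randomly generated and the "true" coefficients are arbitrary, chosen only so that the fit can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 33  # same sample size as the model described in Sect. 3.1; the data are synthetic

# Synthetic predictors in plausible ranges (assumed, not study data).
aod = rng.uniform(0.2, 1.6, n)
temp = rng.uniform(24.0, 36.0, n)  # degrees Celsius
rh = rng.uniform(50.0, 95.0, n)    # percent

# Noise-free synthetic response built from arbitrary coefficients.
pm25 = 90.0 + 20.0 * aod - 1.5 * temp - 0.5 * rh

# Design matrix with an intercept column: [1, AOD, Temp, RH].
X = np.column_stack([np.ones(n), aod, temp, rh])
coef, *_ = np.linalg.lstsq(X, pm25, rcond=None)

alpha, beta_aod, beta_temp, beta_rh = coef
print(alpha, beta_aod, beta_temp, beta_rh)
```

With real observations the response contains noise, so the recovered coefficients would only approximate the underlying relationship, and held-out stations (as in the study) are needed to judge the fit.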

2.4 PM2.5 Concentration Mapping

The study uses AOD, relative humidity, and temperature maps as inputs to the regression model to build daily average PM2.5 maps. First, the daily average AOD map is created after removing pixels affected by clouds. Next, hourly relative humidity and temperature data at the monitoring stations are converted to daily averages. The Kriging interpolation method is used to fill empty pixels and to interpolate the daily mean temperature and relative humidity. Finally, the monthly average PM2.5 maps are created by averaging the daily maps, and the annual average PM2.5 map is created by averaging the monthly maps.
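The mapping steps above can be sketched as below, assuming the relative humidity and temperature grids have already been interpolated (the Kriging step is not reproduced here) and that cloud-affected AOD pixels are stored as NaN. The coefficients are those of the multiple regression function reported in Sect. 3.1; the function names and the tiny 1 × 2 grids are hypothetical.

```python
import numpy as np

# Multiple-regression coefficients as reported in Sect. 3.1 of this paper.
INTERCEPT, B_AOD, B_RH, B_TEMP = 97.484, 21.439, -0.564, -1.646

def daily_pm25_map(aod, rh, temp):
    """Apply the regression pixel by pixel; NaN (cloud-masked) AOD stays NaN."""
    return INTERCEPT + B_AOD * aod + B_RH * rh + B_TEMP * temp

def mean_map(maps):
    """Average a list of maps per pixel, ignoring NaN pixels."""
    stack = np.stack(maps)
    valid = ~np.isnan(stack)
    count = valid.sum(axis=0)
    total = np.where(valid, stack, 0.0).sum(axis=0)
    # Pixels without any valid observation remain NaN.
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

# Two hypothetical days on a 1 x 2 grid; one pixel is cloud-masked on day 1.
day1 = daily_pm25_map(np.array([[0.5, np.nan]]), rh=70.0, temp=28.0)
day2 = daily_pm25_map(np.array([[0.7, 0.3]]), rh=70.0, temp=28.0)

monthly = mean_map([day1, day2])  # daily maps -> monthly map
annual = mean_map([monthly])      # monthly maps -> annual map (one month here)
print(monthly)
```

Averaging only the valid pixels means that a month with many cloudy days is represented by fewer observations, which is worth keeping in mind when comparing rainy-season and dry-season maps.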

2.5 Evaluating Indicator

This paper estimates PM2.5 concentration using regression models, so the key question is how closely the predicted PM2.5 matches the observed PM2.5. The correlation coefficient (R) and the root mean square error (RMSE) are chosen to evaluate the estimation accuracy. R and RMSE are calculated with the following formulas:

$$RMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {y_{i} - y^{\prime}_{i} } \right)^{2} } }$$
(3)

where y′i is the ith estimated PM2.5 value; yi is the ith observed PM2.5 value; n is the sample size.

$$R = \frac{{\sum\limits_{i = 1}^{n} {\left( {x_{i} - \overline{x}} \right)\left( {y_{i} - \overline{y}} \right)} }}{{\sqrt {\sum\limits_{i = 1}^{n} {\left( {x_{i} - \overline{x}} \right)^{2} } \sum\limits_{i = 1}^{n} {\left( {y_{i} - \overline{y}} \right)^{2} } } }}$$
(4)

where xi and yi are the ith sample points; x̄ and ȳ are the sample means; n is the sample size.
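The two indicators can be computed as in the following sketch, which uses the standard definitions of RMSE (square root of the mean squared difference) and the Pearson correlation coefficient. The function names and the four sample values are hypothetical.

```python
import numpy as np

def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((observed - estimated) ** 2)))

def pearson_r(x, y):
    """Pearson correlation coefficient, written out term by term as in Eq. (4)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical observed and estimated PM2.5 values (ug/m3).
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE = {rmse(observed, estimated):.3f} ug/m3")
print(f"R    = {pearson_r(observed, estimated):.3f}")
```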

3 Results and Discussion

3.1 Regression Model of PM2.5 Estimation

Figure 3 shows the scatter plot of the simple linear relationship between MODIS AOD and PM2.5 over Binh Duong province in 2020. The PM2.5 values used in the model range from 1.67 to 46.45 µg/m3, while the AOD values range from 0.18 to 1.62. The simple linear regression model gives a high correlation between MODIS AOD and fine particulate matter concentration, with a correlation coefficient (R) of 0.806. Table 2 shows the regression functions and performance evaluation of both models. The multiple linear regression model, which accounts for the meteorological factors relative humidity and temperature and was fitted with 33 data points, performed slightly better (R = 0.817) than the simple regression model. The simple and multiple linear regression functions were PM2.5 = 22.306 AOD550 + 8.390 and PM2.5 = 97.484 + 21.439 AOD550 − 0.564 RH − 1.646 Temp, respectively.
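As a worked illustration of the two fitted functions, consider hypothetical inputs AOD550 = 0.5, RH = 70%, and Temp = 28 ℃ (illustrative values only, not observations from the study). The simple model gives

$${\text{PM}}_{2.5} = 22.306 \times 0.5 + 8.390 = 19.543\;\mu {\text{g/m}}^{3}$$

and the multiple model gives

$${\text{PM}}_{2.5} = 97.484 + 21.439 \times 0.5 - 0.564 \times 70 - 1.646 \times 28 = 22.6355\;\mu {\text{g/m}}^{3}$$

For the same AOD, the multiple model therefore shifts the estimate according to the humidity and temperature of the day.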

Fig. 3 Linear regression of the MODIS AOD and PM2.5

Table 2 Description of simple linear regression and multiple linear regression

The correlation coefficient between estimated and measured PM2.5 is 0.878, and the root mean square error (RMSE) is 6.62 µg/m3 (Fig. 4). The least-squares method is used to fit the linear trend line that best matches the points. This validation chart demonstrates that the model performs well in estimating PM2.5 concentrations. In the scatter plot of observed and predicted values, the points are dispersed around the 1:1 line. The line fitted to the measured–estimated PM2.5 scatter plot has a slope of 1.05 and an intercept of 0.499. The results could be improved by adding meteorological variables that are currently unavailable, such as wind speed and planetary boundary layer height, or by calibrating satellite AOD against ground-based AOD before it is used in the model.

Fig. 4 Scatter plot of the model fitting result (estimated versus observed PM2.5)

3.2 AOD and PM2.5 Concentration Variation

The monthly average AOD values over Binh Duong province in 2020 are shown in Fig. 5. Daily average AOD values are calculated as the average of all pixels in the area, and the monthly average is then calculated from the daily averages. In 2020, the monthly average AOD ranged from 0.105 (July) to 0.860 (March). The average AOD value in January was 0.118. From February to April, the AOD increased sharply and was higher than in the remaining months, with average values greater than 0.5. From May to December, the mean values were all less than 0.5. In general, AOD in Binh Duong is high in the dry season and lower in the rainy season.

Fig. 5 Average monthly MODIS AOD (550 nm) of Binh Duong province in 2020, showing minimum, maximum, and mean values

The spatial distribution of monthly PM2.5 was generated from the daily average PM2.5 maps of Binh Duong province in 2020 (Fig. 6). PM2.5 values tend to be high in southern districts such as Di An, Thuan An, Tan Uyen, and Thu Dau Mot. In addition, there is a slight increase in the northern and northwestern districts in May. The average PM2.5 concentration of Binh Duong province increases markedly from February to April, with the highest value in March (23.655 µg/m3). In contrast, the lowest mean concentrations occur in November (11.850 µg/m3) and December (12.027 µg/m3). The results show a clear difference in PM2.5 concentrations between the rainy season and the dry season.

Fig. 6 Spatial distribution of monthly average PM2.5 concentrations of Binh Duong province in 2020 (twelve monthly maps, January to December, each with minimum, maximum, mean, and standard deviation statistics)

The annual average PM2.5 values of the districts in Binh Duong province in 2020 are shown in Fig. 7. These values were compared with the 2005 standard of the World Health Organization (WHO) and the National Technical Regulation on Ambient Air Quality (QCVN) [45]. All districts of Binh Duong province are below the PM2.5 threshold in the national standard. However, the values exceed the standard set by the WHO. The average values of the southern districts such as Di An, Thuan An, Thu Dau Mot, and Tan Uyen are markedly higher than those of the other districts. These areas are mainly urban, with large populations, industrial zones, and open-pit mines. Figure 7 shows that the highest annual mean PM2.5 value occurs in Di An and Thuan An districts (18.76 µg/m3), while the lowest occurs in Phu Giao district (14.20 µg/m3).

Fig. 7 Average annual PM2.5 concentration of districts of Binh Duong province in 2020, with reference lines for the 2005 WHO standard (10) and QCVN 05 (2013) (25)

3.3 Evaluation of the Variation of TSP, PM2.5 Concentrations in Representative Areas

The monthly average PM2.5 values at the locations of the three TSP ground-based stations are extracted for comparison with the observed monthly mean TSP. The selected TSP stations are located in three areas representing different sources of dust emissions. Phu Giao station (106.79° E; 11.30° N) is located far from urban centers, traffic intersections, and industrial production zones, in an area with good environmental quality. Di An station (106.76° E; 10.88° N) is in an urban center with a high population density and is influenced by neighboring industrial activities. Tan Uyen station (106.89° E; 11.03° N) is located in the open-pit stone mining area and is affected by quarrying and stone transportation. Figure 8 compares the monthly average PM2.5 concentrations at the three TSP station locations. PM2.5 concentrations tend to be high in the dry season and lower in the rainy season. The urban area, which is affected by anthropogenic and industrial activities, shows the highest PM2.5 values of the three representative areas, while the open-pit mining area shows lower values than the urban area. PM2.5 concentrations are roughly similar in the urban and open-pit mining areas from March to May. The rural area has the lowest PM2.5 concentrations, which indicates that open-pit mining and urban activities increase PM2.5 concentrations.

Fig. 8 Average monthly PM2.5 concentrations of representative areas (rural, urban, and open-pit mining)

Figure 9 illustrates the monthly mean values of TSP, PM2.5, and the PM2.5/TSP ratio in the three representative areas. TSP follows the same trend as PM2.5, with high concentrations in the dry season and lower concentrations in the rainy season. TSP values in the rural area tend to be high from January to April, peaking in April at 112.8 µg/m3 (Fig. 9a), and lower from May to December. This area has the lowest TSP concentration of the three representative areas, which leads to high PM2.5/TSP ratios in some months, such as May (56.00%), July (74.23%), and October (73.05%). The urban area recorded high TSP concentrations from January to May, with the highest concentration in March (161.3 µg/m3). The concentration then decreased to less than 50.0 µg/m3 between June and September. Finally, there was a sudden increase in October (84.8 µg/m3), followed by a gradual decrease in the last two months of the year (Fig. 9b). The PM2.5/TSP ratio in this area tends to be lower than in the rural area. The TSP concentration in the open-pit mining area is much higher than in the urban and rural areas. In the first five months of the year, the TSP level of this area increased sharply, with all values exceeding 200 µg/m3 and the highest in March (614 µg/m3). In this period, the PM2.5/TSP ratio in this area was low, at less than 10% in every month (Fig. 9c). Overall, the PM2.5/TSP ratio in the open-pit mining area is the lowest of the three areas. The ratio increases gradually in the last months of the year and reaches its highest level in December (40.59%). These results indicate that the emissions from open-pit mining are primarily coarse particles.
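The PM2.5/TSP ratio discussed above is a simple percentage, sketched below. The TSP value of 614 µg/m3 is the March value reported for the open-pit mining area; the PM2.5 value of 24.0 µg/m3 and the function name are hypothetical and used only to show the calculation.

```python
def pm25_tsp_ratio_percent(pm25, tsp):
    """Return the PM2.5/TSP ratio in percent; both inputs are in ug/m3."""
    if tsp <= 0:
        raise ValueError("TSP must be positive")
    return 100.0 * pm25 / tsp

# Reported March TSP for the mining area with a hypothetical PM2.5 value.
ratio = pm25_tsp_ratio_percent(24.0, 614.0)
print(f"PM2.5/TSP = {ratio:.2f}%")
```

A low ratio at a high TSP level means that most of the suspended mass is in particles larger than 2.5 μm, which is the basis for interpreting mining emissions as predominantly coarse.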

Fig. 9 Monthly mean values of TSP, PM2.5, and PM2.5/TSP ratios of representative a rural areas; b urban areas; c open-pit mining areas

3.4 Effect of Built-Up Density and Population Density on PM2.5 Concentration

In Vietnam, urbanization tends to increase built-up density and population density. Figure 10 shows scatter plots of the relationship between PM2.5 concentrations and both NDBI and population density in the districts of Binh Duong province in 2020. PM2.5 correlates well with NDBI (R = 0.875) and with population density (R = 0.950). The results indicate that increases in built-up density and population density contribute to increases in PM2.5 concentrations.

Fig. 10 Correlation between PM2.5 concentrations and a NDBI; b population density in the districts of Binh Duong province

4 Conclusions

The study established a model to estimate PM2.5 concentration in Binh Duong province using satellite image data and ground-based data. The simple linear regression model and the multiple linear regression model both performed well, with R = 0.806 and R = 0.817, respectively. Daily average AOD, relative humidity, and temperature maps were used to calculate the daily average PM2.5 spatial distribution with the regression models. The monthly average results showed high PM2.5 concentrations in the dry season and low concentrations in the rainy season in Binh Duong province. PM2.5 concentrations in the southern districts tend to be higher than in the rest of the province because of urban development, industrial development, and mining activities. The study also showed that open-pit mining areas emit a large proportion of coarse particles, while urban areas have a high proportion of fine particles, and that increasing building density and population density contribute to higher PM2.5 concentrations. The approach has the potential to be applied to other areas with open-pit mining operations and high population density. However, its applicability is limited in areas with few ground PM2.5 stations.