Abstract
Air pollution levels in Macao often exceed the levels recommended by the WHO. For the population to take precautionary measures and avoid further health risks under high pollutant exposure, it is important to develop a reliable air quality forecast. Statistical models based on multiple linear regression (MR) and classification and regression trees (CART) analysis were successfully developed for Macao to predict the next-day concentrations of NO2, PM10, PM2.5, and O3. All the developed models were statistically significant at the 95% confidence level, with high coefficients of determination (from 0.78 to 0.93) for all pollutants. The models used meteorological and air quality variables based on 5 years of historical data, from 2013 to 2017. Data from 2013 to 2016 were used to develop the statistical models, and data from 2017 were used for validation. A wide range of meteorological and air quality variables was identified, and only some were selected as significant independent variables. Meteorological variables were selected from an extensive list, including geopotential height, relative humidity, atmospheric stability, and air temperature at different vertical levels. Air quality variables reflect the persistence of the recent past concentrations of each pollutant and are usually the maximum and/or the average of the latest 24-h levels. The models were applied to forecast the next-day daily average concentrations of NO2 and PM and the maximum hourly O3 levels at five air quality monitoring stations. The results are expected to support an operational air quality forecast for Macao.
Introduction
Seven million people die every year from the effects of air pollution. More than 90% of such deaths are in developing countries (WHO 2019). Across southern Asia, levels of fine particulate matter (PM2.5) and surface ozone (O3) exceed the World Health Organization (WHO) limits for much of the year (Kumar et al. 2018). Macao is located in Southern China, in the Pearl River Delta (PRD) region. The levels of nitrogen dioxide (NO2), particulate matter (PM) with an aerodynamic diameter below 10 μm and 2.5 μm (PM10 and PM2.5, respectively), and ozone (O3) in Macao are high and often exceed the limit values recommended by the WHO air quality guidelines (AQG). Since 2010, the worst air quality index classes in Macao have been due to PM10 and PM2.5 (SMG 2019). Macao has been listed as the most densely populated region in the world (Sheng and Tang 2013), with a population density of about 20,000 inhabitants/km2. A significant proportion of Macao’s urban population is exposed to air pollutant concentrations above the limit or target values.
Exposure to air pollutants such as NO2, PM, and O3 increases the risk of hospital admission for cardiovascular and respiratory disease, and of mortality, worldwide (Liu and Peng 2018; WHO 2018). Ground-level O3 is associated with numerous harmful effects on respiratory health at levels commonly found in urban areas throughout the world, contributing to morbidity and hospital admissions related to respiratory disease, even at low ambient levels (Entwistle et al. 2019). Regarding particulate matter, small particles (PM2.5) are particularly dangerous to human health, as they can penetrate deeply into the lungs and be transported directly into the bloodstream (Wiśniewska et al. 2019). Furthermore, NO2, PM2.5, and O3 occur as mixtures in ambient environments, and combinations of these pollutants can be more harmful to human health: a mixture with relatively low levels of some pollutants combined with relatively high levels of others was found to be equally or more harmful than a mixture with high levels of all pollutants (Liu and Peng 2018). In Macao, traffic-related pollution is high, primarily due to high vehicle emissions and urban canyon topology (He et al. 2000).
In this context, it is relevant to develop a reliable methodology to forecast the concentration of air pollutants, which can provide an alert for health hazards in advance, in a way that the population can take precautionary actions to avoid exposure.
Recent studies in the PRD region have assessed the meteorological influence on air quality (Tong et al. 2018a, b; Xie et al. 2019) and addressed air quality forecasting (Lee et al. 2017; Deng et al. 2018). The current paper focuses on the development of air quality forecast models by statistical methods for the most critical air pollutants in Macao.
The methods for predicting air pollutant concentrations can be roughly divided into two types: deterministic and stochastic. The statistical approach learns from historical data and predicts the future behavior of the air pollutants. Meteorological conditions significantly affect the levels of air pollution in the urban atmosphere, due to their important role in the transport and dilution of pollutants. It has also been concluded that there is a close relationship between the concentration of air pollutants and meteorological variables (Zhang and Ding 2017). Thus, multiple linear regression (MR) models are trained on existing measurements and used to predict future concentrations of air pollutants from the corresponding meteorological variables.
The Greater Bay Area (GBA) of China consists of nine cities of Guangdong province and the Special Administrative Regions of Hong Kong and Macao. The synoptic situations of Macao and the other cities of the GBA are closely related due to their geographic proximity. The GBA experiences complex temporal and spatial climatic conditions due to topographic variations, urban morphology, and land-water contrasts. Located along the southeast coast of Mainland China, Macao is surrounded by the sea on three sides, with a subtropical oceanic monsoon climate that is characterized by high temperatures, high rates of evaporation, high levels of atmospheric moisture, and abundant rainfall (SMG 2014). In winter, Macao is influenced by the northeast monsoon, and the climate is cold and dry, with the predominant wind from the north quadrant. In summer, the northeast monsoon is replaced by the strong southwest monsoon with heavy rains. Spring and autumn are transition periods.
Recent studies (Tong et al. 2018a, b) showed a rise in surface temperature and a drop in surface absolute humidity and wind speed in the GBA due to the decline of vegetation and irrigated cropland. The landscape of the GBA is characterized by a large flatland surrounded by the Nanling Mountains, which can prevent air pollution from the central part of China from reaching the GBA. Nevertheless, the northeast monsoon present during the winter may transport pollutants from northern and eastern China along the coastline to the GBA (Tong et al. 2018a, b). PM levels are usually higher during the winter season, from December to February, due to northerly winds that bring air pollutants to the region, a lower mixing height, and less abundant and less frequent rainfall. During the summer season, from June to August, PM levels are usually lower due to southerly winds from the China sea, a higher mixing height, and more abundant and more frequent rainfall, which allow for better air pollution dispersion and deposition conditions (Lopes et al. 2016).
The air pollution of the GBA is normally associated with emission sources at spatial scales ranging from local to regional and transboundary (Tong et al. 2018a), under certain synoptic conditions. Estimates show that, in this region, mobile sources account for the largest share of nitrogen oxides (NOx) emissions (50%). For PM, the industrial sector is the main emitter, followed by mobile sources (Zheng et al. 2009). O3 is not emitted directly to the atmosphere but is formed in reactions between NOx and volatile organic compounds (VOC), which are driven by absorbed solar radiation (Reid et al. 2008).
Materials and methods
The statistical methods selected for this paper were multiple linear regression (MR) analysis and classification and regression tree (CART) analysis. Both can be useful and straightforward tools in air quality studies (Choi et al. 2013; Martinez et al. 2018; Cassmassi 1987; Clapp and Jenkin 2001). One of the advantages of CART analysis is its effectiveness in explaining the variations in pollutant levels solely by a combination of meteorological conditions: regression trees can identify specific meteorological conditions that lead to low or elevated pollutant concentrations (Choi et al. 2013). The basic concept of the CART approach is to build a hierarchy of binary decisions, each of which splits the distribution of a target variable into two mutually exclusive branches (groups), based on the explanatory variable and value that give the largest reduction in the variation of the target variable after the split (Choi et al. 2013).
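The binary-split idea described above can be illustrated with a minimal sketch for a single explanatory variable. It is not the SPSS implementation used in this work: the function names, the use of the sum of squared deviations as the measure of variation, and the toy data are all assumptions made for illustration.

```python
from statistics import fmean


def sum_sq_dev(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, reduction) for the split x <= threshold that
    most reduces the sum of squared deviations of the target y."""
    total = sum_sq_dev(y)
    best_threshold, best_reduction = None, 0.0
    levels = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for low, high in zip(levels, levels[1:]):
        threshold = (low + high) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        reduction = total - sum_sq_dev(left) - sum_sq_dev(right)
        if reduction > best_reduction:
            best_threshold, best_reduction = threshold, reduction
    return best_threshold, best_reduction


# Hypothetical toy data: one explanatory variable and a next-day O3 maximum.
temperature = [18.0, 20.0, 22.0, 28.0, 30.0, 32.0]
o3_max = [40.0, 45.0, 42.0, 110.0, 120.0, 115.0]
threshold, reduction = best_split(temperature, o3_max)
print(threshold, round(reduction, 1))
```

A full regression tree repeats this search over every candidate variable and then recursively within each branch, usually with a stopping rule such as a minimum number of cases per node.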
Following previous work (Cassmassi 1987; US EPA 2003; Durão et al. 2016; Oduro et al. 2016), the statistical models were initially created using MR analysis. To improve the results, mainly for high pollutant levels, CART analysis was then applied to better predict the maximum concentrations.
Statistical models based on MR and CART were applied to forecast, for the next day, the daily average concentrations of NO2, PM10, and PM2.5 and the maximum hourly average concentration of O3 at each station of the air quality monitoring network in Macao. The network comprises six air quality monitoring stations operated by the Macao Meteorological and Geophysical Bureau (SMG): two classified as roadside (Macao Roadside, Ká-Hó Roadside), two as high-density residential (Macao Residential, Taipa Residential), and two as ambient background (Taipa Ambient, Coloane Ambient). Figure 1 shows the locations of the air quality monitoring stations within the 30 km2 of the Macao region.
Data from 4-year daily series observations, from 2013 to 2016, were used to develop the forecast models, and each of the models was evaluated using 2017 data.
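As a minimal sketch of this development/validation partition, the daily records can be separated by calendar year. The record layout and field names below are hypothetical and serve only to illustrate the split.

```python
from datetime import date


def split_by_year(records, train_years, validation_years):
    """Partition daily records into development and validation sets by year."""
    train = [r for r in records if r["date"].year in train_years]
    validation = [r for r in records if r["date"].year in validation_years]
    return train, validation


# Hypothetical daily records; the field names are illustrative only.
records = [
    {"date": date(2013, 1, 5), "pm10": 61.0},
    {"date": date(2015, 7, 9), "pm10": 28.0},
    {"date": date(2016, 12, 31), "pm10": 74.0},
    {"date": date(2017, 1, 1), "pm10": 80.0},
    {"date": date(2017, 8, 15), "pm10": 25.0},
]

# 2013 to 2016 for model development, 2017 for validation.
train, validation = split_by_year(records, range(2013, 2017), range(2017, 2018))
print(len(train), len(validation))
```

Splitting by whole years keeps the validation period strictly later than the development period, which mirrors how an operational forecast is used.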
The first step of the study was to gather a set of meteorological and air quality data, namely (i) meteorological surface observations: hourly observations from automatic weather stations, such as temperature, relative humidity, and dew point temperature, collected at the Taipa Grande Meteorological Station; (ii) upper-air observations, such as geopotential heights, temperature, relative humidity, and dew point temperature at various altitudes, collected at the Hong Kong King’s Park location; and (iii) surface air quality measurements of NO2, PM10, PM2.5, and O3 from SMG’s network. Other variables were added to the analysis, such as a flag for weekday/weekend day and the daily sunlight duration. These variables are presented in Table 1.
The next step was to assess data efficiency levels for each parameter and year, in order to reject years with low annual efficiency. Statistical models for the Ká-Hó Roadside station were not feasible due to the lack of sufficient air quality data. Outliers were identified and excluded from the data series. A complementary analysis was conducted to examine air pollution trends; monthly, weekly, and hourly patterns; and pollution roses.
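A data efficiency check of this kind can be sketched as the fraction of valid daily values per year, compared against a minimum acceptable value. The acceptance threshold is not stated in the text, so the 0.75 used below is purely an assumption, as are the function names and the sample series.

```python
from collections import defaultdict


def annual_efficiency(daily_values):
    """Fraction of valid values per year.

    daily_values is a list of (year, value) pairs with one entry per
    expected day; a value of None marks a missing measurement.
    """
    counts = defaultdict(lambda: [0, 0])  # year -> [valid, expected]
    for year, value in daily_values:
        counts[year][1] += 1
        if value is not None:
            counts[year][0] += 1
    return {year: valid / total for year, (valid, total) in counts.items()}


def accepted_years(daily_values, min_efficiency=0.75):
    """Years whose efficiency reaches the (assumed) minimum threshold."""
    efficiency = annual_efficiency(daily_values)
    return sorted(y for y, e in efficiency.items() if e >= min_efficiency)


# Hypothetical, heavily shortened series: four expected days per year.
series = [
    (2013, 40.0), (2013, None), (2013, None), (2013, 35.0),
    (2014, 50.0), (2014, 48.0), (2014, None), (2014, 52.0),
]
print(annual_efficiency(series))
print(accepted_years(series))
```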
A preliminary exploratory data analysis was performed to identify variables with similar behaviors, looking at basic statistics such as average, mode, histogram, distribution type, and correlation between different variables, as well as principal component analysis. This strategy made it possible to decide on the proper steps to obtain the best model outcome.
The significance level of 0.05 was used in the linear MR analysis. Some variables initially selected were rejected from the forecast models due to collinearity. The final objective was to obtain prediction models with the lowest possible number of variables but with the maximum explained variance as translated by the R2. The higher the number of variables used by the model, the higher the risk of compromising the operational forecast, due to lack of information/missing data in case one or more variables are not accessible. SPSS version 25 was used to perform linear MR (stepwise method) and CART analysis.
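The text does not state how collinearity was detected in SPSS. One common screening heuristic, shown here only as a sketch, is to flag pairs of candidate predictors whose Pearson correlation exceeds a chosen magnitude. The 0.9 threshold, the variable names, and the sample values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def collinear_pairs(predictors, threshold=0.9):
    """List predictor pairs with |r| at or above the (assumed) threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical candidate predictors (illustrative names and values).
predictors = {
    "T_925": [18.0, 20.5, 22.0, 25.0, 27.5],
    "T_850": [14.1, 16.4, 18.2, 21.0, 23.4],
    "RH": [80.0, 62.0, 75.0, 90.0, 68.0],
}
flagged = collinear_pairs(predictors)
print(flagged)
```

When a pair is flagged, keeping only the member that explains more of the target variance is consistent with the stated aim of using as few variables as possible.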
Model performance was assessed using the following parameters: coefficient of determination (R2) (1), root mean square error (RMSE) (2), mean absolute error (MAE) (3), and Bias (4), where f is the forecast, \( \overline{f} \) is the forecast average, o is the observation, and \( \overline{o} \) is the observation average, for each case i of the n cases.
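The four indicators can be computed as in the following sketch, which uses common textbook definitions. Two choices are assumptions rather than statements of the formulas used in this work: R2 is taken as the squared Pearson correlation between forecasts and observations (suggested by the presence of both \( \overline{f} \) and \( \overline{o} \) in the symbol list), and Bias is taken as the mean of f minus o.

```python
from math import sqrt


def forecast_scores(forecast, observed):
    """Return R2, RMSE, MAE, and Bias for paired forecasts and observations."""
    n = len(observed)
    f_mean = sum(forecast) / n
    o_mean = sum(observed) / n
    errors = [f - o for f, o in zip(forecast, observed)]
    cov = sum((f - f_mean) * (o - o_mean) for f, o in zip(forecast, observed))
    var_f = sum((f - f_mean) ** 2 for f in forecast)
    var_o = sum((o - o_mean) ** 2 for o in observed)
    return {
        # Assumed definition: squared Pearson correlation.
        "R2": cov ** 2 / (var_f * var_o),
        "RMSE": sqrt(sum(e ** 2 for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # Assumed sign convention: positive means over-prediction.
        "Bias": sum(errors) / n,
    }


# Hypothetical forecast/observation pairs in μg/m3.
scores = forecast_scores([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(scores)
```

RMSE, MAE, and Bias share the units of the concentrations, whereas R2 is dimensionless, which is why the error indicators for O3 maxima are not directly comparable with those for daily PM averages.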
Results and discussion
The statistical models based on MR and CART analysis were developed to forecast NO2, PM10, PM2.5, and O3 concentrations. The final objective is to perform a daily forecast for the next day in an operational mode, by running the prediction models after 16:00 (because of the daily schedule on which the air quality data are made available).
CART analysis was tested mainly to better predict high concentration levels. For NO2 and PM, CART analysis did not improve the quality of the overall predictions; therefore, the prediction models for these pollutants were based on a single MR model. For the O3 forecast, at three stations (Taipa Ambient, Taipa Residential, and Coloane Ambient), CART analysis identified split nodes, and an O3 prediction equation was then determined by MR for each node. Figure 2 shows an example of the CART trees obtained, in this case for O3 MAX prediction at the Taipa Ambient station.
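The combination of CART nodes with a separate MR equation per node can be sketched as a two-stage prediction: a case is first routed to a node, and the linear equation of that node is then applied. The sketch below simplifies the tree to ordered thresholds on a single variable; the split variable, thresholds, predictor names, and coefficients are all hypothetical and are not the values obtained in this work.

```python
def predict_o3_max(features, split_variable, thresholds, node_equations):
    """Route a case to a node, then apply that node's linear equation.

    thresholds are sorted in ascending order; node k collects the cases
    whose split variable exceeds exactly k of the thresholds.
    Each node equation is (intercept, {predictor name: coefficient}).
    """
    value = features[split_variable]
    node = sum(value > t for t in thresholds)
    intercept, coefficients = node_equations[node]
    prediction = intercept + sum(
        coefficient * features[name] for name, coefficient in coefficients.items()
    )
    return node, prediction


# Hypothetical tree with three nodes and one linear equation per node.
thresholds = [60.0, 100.0]
node_equations = [
    (5.0, {"o3_prev": 0.5, "t_max": 0.5}),
    (10.0, {"o3_prev": 0.5, "t_max": 1.0}),
    (20.0, {"o3_prev": 1.0, "t_max": 1.5}),
]

case = {"o3_prev": 80.0, "t_max": 30.0}
node, prediction = predict_o3_max(case, "o3_prev", thresholds, node_equations)
print(node, prediction)
```

Fitting a separate equation within each node lets the slope of the regression differ between low and high concentration regimes, which a single global MR equation cannot do.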
The output meteorological and air quality variables and equations obtained with MR (or CART and MR, in the O3 MAX case) are listed in Table 2.
The models were validated with data collected in 2017. The results show good agreement between modelled and observed concentrations, statistically significant at the 95% confidence level. The selected models provide a good relationship between meteorological and air quality variables when performing an air quality forecast under different situations. Table 3 lists the model performance indicators obtained: R2, RMSE, MAE, and Bias.
The highest R2 values were obtained for PM (between 0.86 and 0.93 and, in all cases, greater for PM10 than for PM2.5), followed by NO2 (between 0.84 and 0.90); the lowest explained variance was obtained for O3 (between 0.78 and 0.87). The models did not show a defined trend by type of station, with no clear difference in R2 among roadside, residential, and ambient stations. The monitored and forecasted concentrations in 2017 for the models with the highest and lowest R2, respectively PM10 at Coloane Ambient and O3 MAX at Coloane Ambient, are depicted in Figs. 3 and 4. The poorest results, obtained at Coloane Ambient, are related to the smallest number of cases available to build the model (N = 546).
Regarding the RMSE, all models followed the same trend observed for R2: the RMSE was lower for PM (between 4.9 and 9.2 μg/m3), followed by NO2 (between 6.1 and 7.9 μg/m3), and highest for O3 (between 21.1 and 27.4 μg/m3). In the case of O3, the high RMSE values were due to abrupt variations on consecutive days, which influence the predicted values, since statistical models are sensitive to this kind of fluctuation.
Regarding the CART analysis for O3 prediction, three equation nodes were used. The number of cases considered in each node (N), the coefficient of determination (R2), the correlation coefficient (r), and the standard error of the estimate are presented in Table 4. The standard error of the estimate, which is a measure of the prediction’s accuracy, was higher for the higher concentration categories. The highest standard error of the estimate was 17.2 μg/m3 for node 1, at the Coloane Ambient station, and 28.8 μg/m3 for node 2 and 43.6 μg/m3 for node 3, both at the Taipa Residential station. This reflects the difficulty the model has in predicting the highest O3 concentration ranges. Traffic-related pollutants, such as PM and NO2, depend on meteorological conditions as well as emission rates. Because O3 is produced in the atmosphere through photochemical processes, the major meteorological factors affecting ozone concentrations are different from those for traffic-related primary pollutants (Choi et al. 2013).
In all cases, the variable that represents the last 24-h pollutant concentrations (16D1) is the most prevalent, being selected in all the forecast equations (Table 3). The geopotential height at 850 hPa (H_850), an indicator of the synoptic-scale weather pattern, is also frequently present in the forecasts of NO2 and PM. Specifically, in the case of PM10, the relevant variables are H_850 and the mean relative humidity (HRMD), while for PM2.5, at both residential stations, the average dew point temperature (TD_MD) and the air temperature at 925 hPa (TAR_925, a measure of the strength and height of the subsidence inversion) appear in the final equations. Atmospheric stability at 925 hPa and at 850 hPa (STB_925 and STB_850, respectively) appears in the final equations for NO2 and O3 MAX at Taipa Ambient. These temperature differences between layers provide information about atmospheric stability.
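A predictor of the 16D1 type can be sketched as a summary of the hourly concentrations in the 24 h before the forecast is issued. The exact window used in this work is not specified here, so the sketch assumes the 24 hourly values ending at, and including, the issue hour; the function name and the synthetic series are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import fmean


def last_24h_summary(hourly, issue_time):
    """Mean and maximum of the 24 hourly values ending at issue_time.

    hourly maps datetime -> concentration; missing or None hours are
    skipped. Returns None when no valid value falls in the window.
    """
    window = [issue_time - timedelta(hours=h) for h in range(24)]
    values = [hourly[t] for t in window if hourly.get(t) is not None]
    if not values:
        return None
    return fmean(values), max(values)


# Synthetic hourly series: the concentration equals the hour index (0 to 47).
start = datetime(2017, 1, 1, 0)
hourly = {start + timedelta(hours=i): float(i) for i in range(48)}

# Forecast issued at 16:00 on the second day (hour index 40).
summary = last_24h_summary(hourly, datetime(2017, 1, 2, 16))
print(summary)
```

Because this predictor carries the persistence of recent concentrations, it is the one whose absence would most compromise an operational run, so handling of missing hours deserves explicit rules.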
The statistical methods used depend on past data series. If the historical data are insufficient, the forecasts will be less reliable. In particular, if emission sources change considerably, or if meteorological variables change because of new weather patterns possibly driven by climate change, the past data series will no longer represent the current situation, and the models will need to be recalculated with more recent data.
Conclusion
Statistical models to forecast the next-day daily average concentrations of NO2, PM10, and PM2.5 and the maximum hourly average concentration of O3 in the Macao region were successfully developed for five locations using MR analysis. In the case of O3 predictions, CART analysis gave better results, especially improving the prediction of high concentration levels and thus supporting a more accurate prediction of critical pollution episodes.
The pollutants for which best results were obtained were PM10, followed by PM2.5 and NO2. The most challenging pollutant forecast was the maximum hourly concentration of O3, scoring the lowest R2 (0.78), due to its secondary nature as a pollutant, involved in several atmospheric reactions that depend on the concentrations of other compounds, and also key meteorological conditions, such as sunlight and temperature.
The variables that explained most of the variability, for all pollutants, were the concentration levels measured in the 24 h preceding the operational forecast. For PM and NO2, the indicator of the synoptic-scale weather pattern (geopotential height at 850 hPa) was also a relevant variable.
This work shows that in areas such as Macao, where data may not be easily obtained with a high level of confidence (such as spatially resolved emissions and traffic-related data), this kind of statistical approach becomes an opportunity to obtain a reliable forecast with a clearer understanding of the main factors that affect air quality.
References
Cassmassi JC (1987) Development of an objective ozone forecast model for the South Coast Air Basin. In: 80th Annual Meeting of the Air Pollution Control Association, New York, NY, USA, 21-26 June, vol 4. Technical Paper 87-71.3. ISSN 0193-9688
Choi W, Paulson SE, Casmassi J, Winer AM (2013) Evaluating meteorological comparability in air quality studies: classification and regression trees for primary pollutants in California’s South Coast Air Basin. Atmos Environ 64:150–159. https://doi.org/10.1016/j.atmosenv.2012.09.049
Clapp LJ, Jenkin ME (2001) Analysis of the relationship between ambient levels of O3, NO2 and NO as a function of NOx in the UK. Atmos Environ 35:6391–6405. https://doi.org/10.1016/S1352-2310(01)00378-8
Deng T, Chen Y, Wan Q et al (2018) Comparative evaluation of the impact of GRAPES and MM5 meteorology on CMAQ prediction over Pearl River Delta, China. Particuology 40:88–97. https://doi.org/10.1016/j.partic.2017.10.005
Durão RM, Mendes MT, Pereira MJ (2016) Forecasting O3 levels in industrial area surroundings up to 24 h in advance, combining classification trees and MLP models. Atmos Pollut Res 7:961–970
Entwistle MR, Gharibi H, Tavallali P et al (2019) Ozone pollution and asthma emergency department visits in Fresno, CA, USA, during the warm season (June–September) of the years 2005 to 2015: a time-stratified case-crossover analysis. Air Qual Atmos Heal 12:661–672. https://doi.org/10.1007/s11869-019-00685-w
He D, Zhou Z, He K et al (2000) Assessment of traffic related air pollution in urban areas of Macao. J Environ Sci 12:39–46
Kumar R, Barth MC, Pfister GG et al (2018) How will air quality change in South Asia by 2050? J Geophys Res Atmos 123:1840–1864. https://doi.org/10.1002/2017JD027357
Lee M, Brauer M, Wong P et al (2017) Land use regression modelling of air pollution in high density high rise cities: a case study in Hong Kong. Sci Total Environ 592:306–315. https://doi.org/10.1016/j.scitotenv.2017.03.094
Liu JC, Peng RD (2018) Health effect of mixtures of ozone, nitrogen dioxide, and fine particulates in 85 US counties. Air Qual Atmos Heal 11:311–324. https://doi.org/10.1007/s11869-017-0544-2
Lopes D, Hoi KI, Mok KM et al (2016) Air quality in the main cities of the pearl river delta region. Glob Nest J 18:794–802
Martinez NM, Montes LM, Mura I, Franco JF (2018) Machine Learning Techniques for PM 10 Levels Forecast in Bogotá. In: 2018 ICAI Workshops (ICAIW). IEEE, pp 1–7. https://doi.org/10.1109/ICAIW.2018.8554995
Oduro SD, Ha QP, Duc H (2016) Vehicular emissions prediction with CART-BMARS hybrid models. Transp Res Part D Transp Environ 49:188–202. https://doi.org/10.1016/j.trd.2016.09.012
Reid N, Yap D, Bloxam R (2008) The potential role of background ozone on current and emerging air issues: an overview. Air Qual Atmos Heal 1:19–29. https://doi.org/10.1007/s11869-008-0005-z
Sheng N, Tang UW (2013) Risk assessment of traffic-related air pollution in a world heritage city. Int J Environ Sci Technol 10:11–18. https://doi.org/10.1007/s13762-012-0030-1
SMG (2014) Climate in Macao. SMG/ Macao Meteorological and Geophysical Bureau. Available at: http://www.smg.gov.mo/smg/climate/e_climaintro.htm. Accessed 1 June 2019
SMG (2019) Annual summary of air quality in Macao – 2018. SMG/ Macao Meteorological and Geophysical Bureau. Available at: http://www.smg.gov.mo/smg/airQuality/pdf/IQA_2018_PT.pdf. Accessed 1 June 2019
Tong CHM, Yim SHL, Rothenberg D et al (2018a) Assessing the impacts of seasonal and vertical atmospheric conditions on air quality over the Pearl River Delta region. Atmos Environ 180:69–78. https://doi.org/10.1016/j.atmosenv.2018.02.039
Tong CHM, Yim SHL, Rothenberg D et al (2018b) Projecting the impacts of atmospheric conditions under climate change on air quality over the Pearl River Delta region. Atmos Environ 193:79–87. https://doi.org/10.1016/j.atmosenv.2018.08.053
US EPA (2003) Guidelines for Developing an Air Quality (Ozone and PM2.5) Forecasting Program. EPA-456/R-03-002. Available at: https://nepis.epa.gov/Exe/ZyPURL.cgi?Dockey=2000F0ZT.TXT. Accessed 1 June 2019
WHO (2018) Ambient (outdoor) air quality and health. https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health. Accessed 2 Jul 2019
WHO (2019) Air pollution and health: summary. https://www.who.int/airpollution/ambient/about/en/. Accessed 2 Jul 2019
Wiśniewska K, Lewandowska AU, Staniszewska M (2019) Air quality at two stations (Gdynia and Rumia) located in the region of Gulf of Gdansk during periods of intensive smog in Poland. Air Qual Atmos Heal 12:879–890. https://doi.org/10.1007/s11869-019-00708-6
Xie J, Liao Z, Fang X et al (2019) The characteristics of hourly wind field and its impacts on air quality in the Pearl River Delta region during 2013–2017. Atmos Res 227:112–124. https://doi.org/10.1016/j.atmosres.2019.04.023
Zhang J, Ding W (2017) Prediction of air pollutants concentration based on an extreme learning machine: the case of Hong Kong. Int J Environ Res Public Health 14:1–19. https://doi.org/10.3390/ijerph14020114
Zheng J, Zhang L, Che W et al (2009) A highly resolved temporal and spatial air pollutant emission inventory for the Pearl River Delta region, China and its uncertainty assessment. Atmos Environ 43:5112–5122. https://doi.org/10.1016/j.atmosenv.2009.04.060
Funding
The work developed was supported by The Macao Meteorological and Geophysical Bureau (SMG). The research work of CENSE is financed by the Fundação para a Ciência e Tecnologia, I.P., Portugal (UID/AMB/04085/2019).
Cite this article
Lei, M.T., Monjardino, J., Mendes, L. et al. Macao air quality forecast using statistical methods. Air Qual Atmos Health 12, 1049–1057 (2019). https://doi.org/10.1007/s11869-019-00721-9
DOI: https://doi.org/10.1007/s11869-019-00721-9