1 Introduction

The World Health Organization last updated its Air Quality Guidelines (AQG) in 2005, setting the recommended threshold levels for each pollutant. Macao has an air quality problem, in particular high levels of nitrogen dioxide (NO2), particulate matter (PM2.5), and ozone (O3), which often exceed the AQG guideline values. Many studies show that exposure to NO2, PM2.5, and O3 increases hospital admissions and emergency room visits and, in extreme cases, leads to death from heart or lung disease (WHO, 2003). It is therefore important to develop a reliable forecast of NO2, PM2.5, and O3 concentrations in Macao, which can alert the local population to take precautionary measures in case of a pollution episode (Neto et al., 2009). Figures 1, 2 and 3 compare the air quality standards of the WHO, EU, US, China, Macao, and Hong Kong for NO2, PM2.5, and O3 (MEE, 2012; SMG, 2019; WHO Europe, 2006).

Fig. 1 Comparison of air quality standard for NO2

Fig. 2 Comparison of air quality standard for PM2.5

Fig. 3 Comparison of air quality standard for O3

2 Methodology

To forecast the air quality for the next day, statistical methods based on the analysis of past data series were used. This paper applies multiple linear regression (MLR) and classification and regression tree (CART) analysis. Following previous work (Cassmassi, 1997), statistical models based on MLR and CART analysis were developed to forecast the next-day average daily concentrations of NO2 and PM2.5 and the next-day maximum hourly O3 level at the Taipa Ambient and Taipa Residential air quality monitoring stations. Taipa Ambient is an ambient station that also serves as a representative background station, setting the baseline for pollutant concentration levels. It is located at Taipa Grande, at the headquarters of the Macao Meteorological and Geophysical Bureau (SMG). Taipa Residential is a high-density residential area station located in Taipa Central Park, a leisure area for local residents. The six-year period from 2013 to 2018 was used to develop the models, and the year 2019 was used to validate them.
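As an illustration of the MLR step, the following is a minimal sketch of an ordinary least-squares fit through the normal equations. It is not the implementation used in this study: the function names `solve_linear_system` and `fit_mlr` are hypothetical, the toy data are invented, and no intercept column is added (a column of ones could be appended to the predictors if an intercept is wanted).

```python
# Minimal ordinary-least-squares fit for multiple linear regression (MLR).
# Pure-Python sketch; function names and data are hypothetical.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Use the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def fit_mlr(rows, targets):
    """Least-squares coefficients for targets ~ rows (no intercept added)."""
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Invented toy data generated from y = 2*x1 + 3*x2 (not real measurements).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [2 * a + 3 * b for a, b in X]
coefficients = fit_mlr(X, y)
```

On this noise-free toy data the fit recovers the generating coefficients; with real predictors the coefficients would be estimated on the development period and then applied to the validation year.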

Figure 4 shows the flowchart of the model development for the air quality forecast using statistical methods. The development of the statistical models starts with collecting the air quality and meteorological data, followed by aggregating the hourly air quality data into daily concentrations of NO2, PM2.5, and O3. The meteorological data required by the models are extracted from different meteorological observations. The processed data are then analyzed with statistical methods, namely multiple linear regression (MLR) and classification and regression tree (CART) analysis. The final step is a model validation to assess the accuracy of the next-day air quality forecast.

Fig. 4 Flowchart for the development of statistical air quality forecast models
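The aggregation of hourly air quality data into daily concentrations can be sketched as follows. This is an illustrative example only: the function name `daily_mean_and_max`, the handling of missing readings (skipped here), and the sample values are assumptions, not details taken from the study.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean_and_max(hourly):
    """Group (timestamp, value) pairs by calendar day.

    Returns {date: (daily mean, daily maximum)}. Missing readings (None)
    are skipped, which is an assumption made for this sketch.
    """
    by_day = defaultdict(list)
    for stamp, value in hourly:
        if value is not None:
            by_day[stamp.date()].append(value)
    return {
        day: (sum(values) / len(values), max(values))
        for day, values in by_day.items()
    }


# Invented hourly readings in µg/m3 (not real measurements).
readings = [
    (datetime(2019, 1, 1, 0), 40.0),
    (datetime(2019, 1, 1, 1), 60.0),
    (datetime(2019, 1, 1, 2), None),
    (datetime(2019, 1, 2, 0), 30.0),
]
daily = daily_mean_and_max(readings)
```

The daily mean corresponds to the quantity forecast for NO2 and PM2.5, and the daily maximum to the quantity forecast for O3.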

3 Results and Discussion

Table 1 provides the final list of meteorological and air quality parameters used as predictors, for each pollutant, in the multiple regression models obtained for the Taipa Ambient and Taipa Residential air quality monitoring stations. PM2.5 is highly correlated with past concentration levels (PM25_16D1, the average of the hourly values (µg/m3) between 16:00 of yesterday and 15:00 of today) at both stations; with geopotential height (m) at 850 hPa (H_850) and average relative air humidity (%) (HRMD) at Taipa Ambient; and with average dew point temperature (°C) (TD_MD) and air temperature (°C) at 925 hPa (TAR_925) at Taipa Residential. NO2 is highly correlated with past concentration levels (NO2_16D1) and geopotential height (m) at 850 hPa at both stations, with atmospheric stability (°C) at 925 hPa (STB_925) at Taipa Ambient, and with average dew point temperature (°C) (TD_MD) at Taipa Residential. O3 is highly correlated with past concentration levels (O3 MAX_16D1 and O3 MAX_23D1, the latter being the maximum hourly value (µg/m3) between 00:00 and 23:00 of yesterday), geopotential height (m) at 850 hPa, and minimum relative air humidity (%) (HRMN) at both stations.

Table 1 Variables used in statistical model
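The time windows behind the lagged predictors can be made concrete with a short sketch. It assumes hourly data stored as (timestamp, value) pairs. The helper names `window_values`, `mean_16d1` and `max_23d1` are hypothetical, and the series is invented so that each value equals its hour index.

```python
from datetime import date, datetime, time, timedelta


def window_values(hourly, start, end):
    """Non-missing hourly values with start <= timestamp <= end."""
    return [v for t, v in hourly if start <= t <= end and v is not None]


def mean_16d1(hourly, today):
    """Average of hourly values from 16:00 of yesterday to 15:00 of today."""
    start = datetime.combine(today, time(16)) - timedelta(days=1)
    end = datetime.combine(today, time(15))
    values = window_values(hourly, start, end)
    return sum(values) / len(values) if values else None


def max_23d1(hourly, today):
    """Maximum hourly value from 00:00 to 23:00 of yesterday."""
    yesterday = today - timedelta(days=1)
    start = datetime.combine(yesterday, time(0))
    end = datetime.combine(yesterday, time(23))
    values = window_values(hourly, start, end)
    return max(values) if values else None


# Invented 48-hour series starting 2019-01-01 00:00; value = hour index.
origin = datetime(2019, 1, 1, 0)
series = [(origin + timedelta(hours=i), float(i)) for i in range(48)]

today = date(2019, 1, 2)
lag_mean = mean_16d1(series, today)  # hours 16..39 of the series
lag_max = max_23d1(series, today)    # hours 0..23 of the series
```

The 16D1 window ends at 15:00 of the current day, so a predictor of this kind is available in the afternoon, when a forecast for the next day would be issued.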

One of the resulting regression equations, for the next-day 24-h average NO2 at Taipa Ambient, is:

$$\text{NO}_2 = \left(0.914 \times \text{NO}_2\_\text{16D1}\right) + \left(0.004 \times \text{H}\_850\right) - \left(0.734 \times \text{STB}\_925\right)$$
(1)
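Equation (1) can be evaluated directly once the three predictors are known. The sketch below uses the coefficients of Eq. (1); the function name and the input values are hypothetical and chosen only to show the arithmetic.

```python
def predict_no2_taipa_ambient(no2_16d1, h_850, stb_925):
    """Evaluate Eq. (1): next-day 24-h average NO2 at Taipa Ambient.

    no2_16d1: NO2 average from 16:00 of yesterday to 15:00 of today (µg/m3)
    h_850:    geopotential height at 850 hPa (m)
    stb_925:  atmospheric stability at 925 hPa (°C)
    """
    return 0.914 * no2_16d1 + 0.004 * h_850 - 0.734 * stb_925


# Invented predictor values, not observations from the study.
forecast = predict_no2_taipa_ambient(no2_16d1=50.0, h_850=1500.0, stb_925=2.0)
# 0.914*50 + 0.004*1500 - 0.734*2 = 45.7 + 6.0 - 1.468 = 50.232
```

With these inputs, the previous-day concentration term contributes 45.7 of the 50.232 total.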

Figure 5 shows the validation of the MLR model for NO2 concentrations at the Taipa Ambient air quality monitoring station in 2019.

Fig. 5 Observed and predicted NO2 concentrations value using MLR models in Taipa Ambient (2019)

Table 2 shows the model performance indicators of PM2.5, NO2, and O3 MAX for the Taipa Ambient and Taipa Residential air quality monitoring stations. The indicators are the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and BIAS. The MLR and CART models achieve a coefficient of determination between 0.86 and 0.87 for Taipa Ambient and between 0.78 and 0.88 for Taipa Residential. The forecast performs best for NO2 concentrations at Taipa Ambient and for PM2.5 concentrations at Taipa Residential. All of the statistical models were built using MLR, while the models for the maximum hourly ozone were built using both MLR and CART.
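To illustrate the core step of CART for a continuous target, the following minimal sketch searches for the single threshold on one predictor that minimizes the summed squared error of the two resulting groups. A full regression tree repeats this search recursively over all predictors. The function names and data are hypothetical, and the sketch is not the CART configuration used in this study.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(feature, target):
    """Best single threshold on one predictor for a regression tree node.

    Returns (sse, threshold, left_mean, right_mean), or None when all
    feature values are equal and no split is possible.
    """
    pairs = sorted(zip(feature, target))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = sum_squared_error(left) + sum_squared_error(right)
        if best is None or sse < best[0]:
            best = (sse, threshold, sum(left) / len(left), sum(right) / len(right))
    return best


# Invented data: the target jumps when the predictor exceeds about 35.
predictor = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
o3_max = [100.0, 110.0, 105.0, 160.0, 170.0, 165.0]
sse, threshold, left_mean, right_mean = best_split(predictor, o3_max)
```

Each leaf predicts the mean of its group, so a tree can represent threshold-like behavior, such as peak concentrations, that a single linear equation cannot.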

Table 2 Model performance indicators
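The four indicators can be computed from paired observed and predicted values as in the sketch below. The exact definitions used in this study are not stated, so two assumptions are made here: R2 is taken as one minus the ratio of residual to total sum of squares, and BIAS as the mean of predicted minus observed values. The function name and the data are hypothetical.

```python
import math


def performance_indicators(observed, predicted):
    """Return R2, RMSE, MAE and BIAS for paired observations and predictions.

    Assumed definitions: R2 = 1 - SS_res / SS_tot and
    BIAS = mean(predicted - observed).
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "BIAS": sum(errors) / n,
    }


# Invented values in µg/m3 (not results from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
indicators = performance_indicators(observed, predicted)
```

Under this sign convention a positive BIAS means the model overpredicts on average.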

The models for the ambient station were better at predicting NO2, while those for the residential station were better at predicting the maximum hourly O3 concentration. The developed models also provide a better understanding of the air quality and meteorological variables and of the relationships between them. For NO2, PM2.5, and O3, the predictor that explained most of the variability is the corresponding 16D1 concentration.

4 Conclusion

This work presents an air quality forecast using statistical methods, based on a detailed analysis of both air quality and meteorological variables for NO2, PM2.5, and O3. The objective of the study is to develop a daily air quality forecast that predicts the next-day daily average of NO2 and PM2.5 and the next-day maximum hourly O3 level at the Taipa Ambient air quality monitoring station (background location) and the Taipa Residential air quality monitoring station (high-density residential location). The models for NO2, PM2.5 and O3_MAX used the following independent variables: the average of the hourly values between 16:00 of yesterday and 15:00 of today for NO2, PM2.5 and O3 MAX respectively, the average of the hourly values between 00:00 and 23:00 of yesterday for O3 MAX, geopotential height at 850 hPa, atmospheric stability at 850 hPa and 925 hPa, air temperature at 925 hPa, average dew point temperature, minimum relative air humidity, and average relative air humidity. The statistical models successfully forecast the next-day average daily concentrations of NO2 and PM2.5 with MLR and the next-day maximum hourly O3 level with MLR and CART analysis, and were able to forecast high-concentration pollution episodes at both Taipa Ambient and Taipa Residential.