
1 Introduction

All life on Earth depends on air, making it one of our most essential resources. Over the last half-century, human activities such as industry, automobiles, power plants, chemical plants, and agricultural burning, together with natural phenomena such as earthquakes and wildfires, have all contributed to rising air pollution [1]. Air pollution is the fourth leading cause of death worldwide: about 6.67 million people died in 2019 as a result of pollution, the most lethal form of which is PM2.5 (particles measuring less than 2.5 µm in aerodynamic diameter) [2]. Breathing in PM2.5 is harmful because the particles can penetrate deep into the lungs and enter the bloodstream. For 24-h exposure, both outdoors and indoors, most studies consider a PM2.5 level of 12 µg/m3 (micrograms per cubic meter) or less to be healthy. People with asthma or other pre-existing respiratory problems, particularly those living in areas with high ozone levels, should seek medical assistance if their symptoms worsen [3]. PM2.5 has been linked to increases in respiratory and cardiovascular hospitalizations, emergency room visits, and mortality. Long-term exposure to fine particles has been associated with chronic bronchitis, reduced lung function, and death from lung cancer and heart disease [4].

Many areas of Bangkok and its environs are currently shrouded in fine dust with hazardous PM2.5 levels. According to the Pollution Control Department (PCD), the maximum allowable ambient PM2.5 level is 50 µg/m3; this limit was exceeded numerous times in February 2022, with PM2.5 levels ranging from 40 to 146 µg/m3 [5]. Because of the poor air quality, the PCD has expanded its list of Bangkok areas and districts where people should wear protective gear and stay indoors.

Emissions from a variety of sources constantly change the quality of the air we breathe. As a result, conditions can appear normal one day and take an unexpected turn the next. This emphasizes the importance of developing a tool that forecasts future air quality accurately. Machine learning techniques such as artificial neural networks and regression trees can produce faster, more accurate predictions and can readily combine several types of data.

To forecast PM2.5 in this paper, we utilize support vector regression (SVR) with linear and polynomial (degree 2 and 3) kernel functions, bagged regression trees (BRT), random forest (RF), gradient boosted models (GBM) with different loss functions, extreme gradient boosting (XGBoost) trees with L2 regularization, artificial neural networks (ANN) with one and two hidden layers and different activation functions, neural network autoregression (NNAR), an ANN using model averaging, seasonal ARIMAX (SARIMAX), k-nearest neighbor (KNN) regression with Epanechnikov and rectangular kernels, the Prophet model, a boosted generalized additive model (Boosted GAM), and Bayesian additive regression trees (BART). Finally, the “great” models are combined into ensemble models to better forecast PM2.5 concentration, and these ensembles are compared with the standalone algorithms. Following this introduction, the study is organized as follows: literature review; dataset overview and preparation; research methods; and results and conclusions.

2 Literature Review

Because of the volatile nature of PM2.5, accurate prediction has become challenging. Several models for predicting particulate matter emissions have been developed in the last several years in an effort to monitor air quality around the world.

Catalano [6] examined the relationship between the hourly mean NO2 concentration and explanatory factors including the NO2 level one hour earlier as well as traffic and weather conditions such as traffic volume, wind speed, wind direction, and temperature. The ANN, ARIMAX, and SARIMAX models were used to model pollution peaks.

Masood and Ahmad [7] examined the potential of ANN and SVM for producing reliable and accurate PM2.5 predictions for New Delhi. Carbon monoxide (CO), sulfur dioxide (SO2), nitrogen oxide (NO), toluene (C7H8), nitrogen dioxide (NO2), wind speed, relative humidity, and temperature were used as predictors.

Suleiman et al. [8] evaluated and compared three air quality management techniques for predicting and managing roadside PM10 and PM2.5: SVM, ANN, and BRT. They found that the ANN and regression tree-based models perform marginally better than the SVM model for PM10 forecasting but significantly worse for PM2.5 forecasting.

Doreswamy et al. [9] employed RF, XGBoost, BRT, and MLP regression to forecast PM2.5 in Taiwan. On both the training and testing datasets, XGBoost obtained the highest R2 and the lowest MSE. Sharma et al. [10] evaluated several air contaminants and utilized a time-series regression model with exogenous factors to predict SO2, NO2, O3, CO, and PM2.5.

Qiao et al. [11] proposed a new model combining the wavelet transform (WT), stacked autoencoder (SAE), and long short-term memory (LSTM) network. First, WT is used to decompose the PM2.5 time series into several low- and high-frequency components, based on data from six Chinese monitoring sites. The SAE-LSTM is then used to forecast the decomposed components.

Biancofiore et al. [12] analyzed three years of continuous measurements of PM and CO concentrations in central Italy using a multiple linear regression model and ANN models with and without a recursive architecture. An ANN with meteorological and chemical factors as input was used to predict PM10 and PM2.5 concentrations one to three days ahead.

Mahajan et al. [13] utilized the NNAR, an additive version of the Holt-Winters method, and the ARIMA model to forecast hourly PM2.5 in Taiwan. For comparison, root-mean-square error (RMSE) and mean absolute error (MAE) were the criteria. The results show that the NNAR model has the lowest values for both RMSE and MAE.

Ejohwomu et al. [14] used ARIMA, exponential smoothing, Prophet, NNAR, a multivariate ANN, SVM, XGBoost, and RF. PM2.5, relative humidity, and temperature were measured every 15 min and converted to hourly time-series data. The accuracy of the forecast models was evaluated using metrics such as RMSE and MAE.

Gupta et al. [15] used NASA’s Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) aerosol and meteorology reanalysis data to estimate surface PM2.5 concentrations in Thailand. An RF model was trained and validated on these data. The RF can estimate hourly and daily mean PM2.5 with a high degree of precision: the mean bias is near zero, with correlation coefficients above 0.90 in the majority of cases.

3 Dataset Overview and Preparation

The PM2.5 air pollution index (API) data utilized in this study are secondary data obtained from the website of the World Air Quality Index (WAQI) project, https://aqicn.org/city/Bangkok/. Bangkok’s API database in WAQI is retrieved from the Division of Air Quality Data, Bureau of Air Quality and Noise Management, Pollution Control Department. The daily AQI is based on the 24-h average of hourly readings from all stations from January 1, 2019 to December 31, 2021. Also, some variables are obtained from the World Meteorological Organization (WMO) via the webpage https://meteostat.net/en/place/th/bangkok. The database contains pollutants and meteorological variables such as ozone (O3), nitrogen dioxide (NO2), average temperature (Temp.avg), precipitation (PRCP), wind speed (Wspd), and pressure.

In the data cleaning and preparation process, missing data can be assessed with the aggregation plots shown in Fig. 1. The missing proportion of PRCP is 1.09%, while PM2.5, NO2, and O3 each have 0.36% missing; the other covariates have no missing values. Two types of blocks are shown on the right of Fig. 1: one for observed (blue) values and one for missing (red) values. There are 1,078 days with complete covariates, 12 days missing only PRCP, 2 days missing only PM2.5, 2 days missing NO2 and O3, and 2 days missing PM2.5, NO2, and O3. Multiple imputation by chained equations (MICE) was utilized to replace the missing values. The MICE approach is based on the premise that multiple imputation is best accomplished in distinct steps, each of which may require diagnostic examination. Multiple imputation, analysis of the imputed data, and pooling of the analysis outcomes are MICE’s main steps. This task can be achieved with the R package “mice” [16].
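
For illustration, the imputation step can be sketched with the “mice” package as follows; the data frame name air_df and the settings shown (predictive mean matching, five imputations) are illustrative assumptions rather than the exact configuration used in this study.

library(mice)
# air_df: daily PM2.5 and covariates with missing entries (hypothetical name)
imp <- mice(air_df, m = 5, method = "pmm", maxit = 10, seed = 123)   # chained-equation imputation
air_complete <- complete(imp, action = 1)                            # one completed dataset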

Fig. 1. Aggregation plots for missing values

Fig. 2. Time-series plot of PM2.5

Table 1. Descriptive statistics of cleaned data from 1 January 2019 to 31 December 2021

After preprocessing, the PM2.5 concentration time series is illustrated in Fig. 2, and descriptive statistics of the PM2.5 concentration and meteorological data are shown in Table 1. To quantify the relationships, Pearson correlation coefficients are calculated and presented in Fig. 3 along with scatter plots. PM2.5 concentration is most strongly correlated with NO2, followed by pressure, O3, and humidity; these factors correlate moderately to strongly with PM2.5. Precipitation and average temperature have low correlations with PM2.5, and wind speed has the lowest correlation.

Fig. 3. Scatter plots and Pearson correlations among all pairs of variables

4 Research Methods

4.1 Seasonal ARIMA with Exogenous Covariates

The autoregressive moving average (ARMA) model is a combination of the AR and MA models. The AR model of order p can be written as,

$$\left( {1 - \phi_1 L - \cdots - \phi_p L^p } \right)y_t = c + \varepsilon_t ,$$
(1)

where \(L^i\) is a lag operator that converts a variable at time t into its ith-order lagged form, and the MA model of order q is defined as

$$y_t = c + \left( {1 + \theta_1 L + \cdots + \theta_q L^q } \right)\varepsilon_t .$$
(2)

The AR component represents the relationship between the dependent variable and its own past values, while the MA term captures the effect of a finite sequence of past random disturbances on the dependent variable. Incorporating differencing and exogenous variables into the ARMA model, we obtain a non-seasonal ARIMAX model:

$$\phi (L)\nabla^d y_t = {\text{x}}_t^T {\boldsymbol{\beta }} + \theta (L)\varepsilon_t ,$$
(3)

where \(\phi (L)\) is called the autoregressive operator, \(\theta (L)\) is called the moving average operator, \({\boldsymbol{\beta }}\) is a vector of coefficients, \({\text{x}}_t^T\) is a vector of covariates at time t, and \(\varepsilon_t\) is a disturbance following a normal distribution with mean zero and constant variance. Equation (3) is usually denoted ARIMAX(p, d, q).

The ARIMAX model can also accommodate a variety of seasonal data. Additional seasonal terms are added to the ARIMAX model in Eq. (3) to create a seasonal ARIMAX model, written as follows:

$$\phi_p (L)\Phi_P (L^S )\nabla^d \nabla_S^D y_t = {\text{x}}_t^T {\boldsymbol{\beta }} + \theta_q (L)\Theta_Q (L^S )\varepsilon_t ,$$
(4)

where \(\Phi_P (L^S )\) corresponds to a seasonal AR component, \(\Theta_Q (L^S )\) corresponds to a seasonal MA component, and S is the duration of the recurring seasonal pattern. The corresponding notation for Eq. (4) is SARIMAX(p, d, q)(P, D, Q)S [17].

In the R package “forecast”, the function auto.arima() can fit a regression model with ARIMA errors. It employs a variant of the Hyndman-Khandakar method [18] that combines unit root testing, minimization of the corrected Akaike information criterion (AICc), and maximum likelihood estimation to obtain the ARIMA model.
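
A minimal sketch of such a fit is given below; the weekly frequency, the data frame names (train_df, test_df), and the covariate set are illustrative assumptions rather than the exact specification used in the paper.

library(forecast)
y_train    <- ts(train_df$PM2.5, frequency = 7)                          # daily series, weekly seasonality assumed
xreg_train <- as.matrix(train_df[, c("NO2", "O3", "Temp.avg", "PRCP", "Wspd", "Pressure")])
xreg_test  <- as.matrix(test_df[,  c("NO2", "O3", "Temp.avg", "PRCP", "Wspd", "Pressure")])
fit_sarimax <- auto.arima(y_train, xreg = xreg_train, seasonal = TRUE)   # regression with (S)ARIMA errors
fc_sarimax  <- forecast(fit_sarimax, xreg = xreg_test)                   # forecasts over the test horizon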

4.2 Prophet Model

Facebook [19] created the Prophet model to forecast daily data with weekly and yearly seasonality, as well as holiday effects, and it was later extended to handle other types of seasonal data. It works best with time series that have strong seasonality and several seasons of historical data. Prophet is a nonlinear regression model of the following form:

$$y_t = g(t) + s(t) + h(t) + \varepsilon_t ,$$
(5)

where \(g(t)\) represents a piecewise-linear trend, \(s(t)\) denotes the various seasonal patterns, \(h(t)\) determines the holiday effects, and \(\varepsilon_t\) is a random error term.
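
A minimal Prophet sketch is given below; the prophet package requires columns named ds (date) and y (response), while the data frame and date column names are assumed and the 92-day horizon simply mirrors the test split described later.

library(prophet)
df_train <- data.frame(ds = train_df$Date, y = train_df$PM2.5)   # Date column name is assumed
m   <- prophet(df_train, yearly.seasonality = TRUE, weekly.seasonality = TRUE)
fut <- make_future_dataframe(m, periods = 92)                    # extend 92 days beyond training
fc  <- predict(m, fut)                                           # yhat holds the forecasts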

4.3 Regression Tree

Since Breiman [20] proposed decision trees in 1984, statistical learning approaches based on them have grown in popularity. A binary regression tree \({\rm T}\) divides the space \({\rm X}\) into as many regions as there are leaf nodes, the set of which is denoted by \(W\). The total prediction function \(g\) associated with the tree may be represented as

$$g({\text{x}}) = \sum_{w \in W} {g^w ({\text{x}}){\rm I}({\text{x}} \in R_w )} ,$$
(6)

where \({\rm I}\) represents the indicator function and \(R_w\) is the region associated with leaf node \(w\), built in the regression tree using logical splitting criteria. The goal of building a tree from a training set \(\tau = \left\{ {({\text{x}}_i ,y_i )} \right\}_{i = 1}^n\) is to minimize the training squared-error loss,

$$l_\tau (g) = \frac{1}{n}\sum_{w \in W} {\sum_{i = 1}^n {{\rm I}({\text{x}}_i \in R_w )\left[ {y_i - g({\text{x}}_i )} \right]^2 } } .$$
(7)

Cost-Complexity Pruning.

Let \(\tau = \left\{ {({\text{x}}_i ,y_i )} \right\}_{i = 1}^n\) be a data set and \(\gamma \ge 0\) be a real number. For a given tree \({\rm T}\), the cost-complexity measure \(C_\tau (\gamma ,{\rm T})\) is defined as:

$$C_\tau (\gamma ,{\rm T}) = \frac{1}{n}\sum_{w \in W} {\sum_{i = 1}^n {{\rm I}({\text{x}}_i \in R_w )\left[ {y_i - g({\text{x}}_i )} \right]^2 } } + \gamma |W|,$$
(8)

where \(W\) denotes the set of terminal nodes of \({\rm T}\) and \(|W|\) is the total number of leaves on the tree, which measures its complexity.
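
As an illustration, a single regression tree with cost-complexity pruning can be fitted with the rpart package as sketched below; the complexity parameter cp plays the role of γ in Eq. (8), and the data frame names are assumptions.

library(rpart)
tree_full   <- rpart(PM2.5 ~ ., data = train_df, method = "anova", cp = 0)      # grow a large tree
best_cp     <- tree_full$cptable[which.min(tree_full$cptable[, "xerror"]), "CP"]
tree_pruned <- prune(tree_full, cp = best_cp)                                    # cost-complexity pruning
pred_tree   <- predict(tree_pruned, newdata = test_df)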

Bootstrap Aggregation.

One ensemble method is bootstrap aggregation, commonly known as bagging. From a training set \(\mathfrak{I}\) with \(n\) observations, \(B\) bootstrap samples \(\mathfrak{I}_1^* ,\;\mathfrak{I}_2^* ,\;...,\;\mathfrak{I}_B^*\) are drawn, and the corresponding \(B\) independently fitted learners \(g_{\mathfrak{I}_1^* } ,\;g_{\mathfrak{I}_2^* } ,\;...,\;g_{\mathfrak{I}_B^* }\) are obtained. The bootstrap-aggregated, or bagged, estimator is obtained by model averaging as follows:

$$g_{bag} ({\text{x}}) = \frac{1}{B}\sum_{b = 1}^B {g_{\mathfrak{I}_b^* } ({\text{x}})} .$$
(9)

In an idealized situation where \(B \to \infty\) and \(\mathfrak{I}_1 ,\;\mathfrak{I}_2 ,\;...,\;\mathfrak{I}_B\) are independent and identically distributed, the average prediction function converges to its expectation. In practice, however, \(\mathfrak{I}_1 ,\;\mathfrak{I}_2 ,\;...,\;\mathfrak{I}_B\) are not independent, and for large \(n\) a bootstrap sample \(\mathfrak{I}^*\) omits roughly 0.37 of the points in \(\mathfrak{I}\) [21].

Random Forest.

If one feature gives a very good split of the data, it will be chosen at the root level of every \(g_{\mathfrak{I}_b^* } ,\;b = 1,\;...,\;B\), and the predictions will be highly correlated; averaging is then unlikely to help. Random forests address this problem by selecting \(m \le p\) features at random before computing the splitting criteria, so that strong predictors have a lower chance of being retained at the root levels [21].
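
Both bagged trees and a random forest can be sketched with the randomForest package as below: setting mtry to the number of predictors reproduces plain bagging, while the default mtry (about p/3 for regression) gives a random forest. The data frame names and tree counts are illustrative.

library(randomForest)
p_vars  <- ncol(train_df) - 1                                                     # number of predictors (response excluded)
fit_bag <- randomForest(PM2.5 ~ ., data = train_df, ntree = 500, mtry = p_vars)   # bagged regression trees
fit_rf  <- randomForest(PM2.5 ~ ., data = train_df, ntree = 500)                  # random forest (default mtry)
pred_rf <- predict(fit_rf, newdata = test_df)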

Conditional Inference Forest.

Hothorn et al. [22] created conditional inference forests (Cforest) to identify the conditional distribution of statistics that quantify the relationships between the response variable and the predictor variables. Chi-square test statistics are used to examine whether any predictors have statistically significant associations with the response. A global null hypothesis is defined as \(H_0 :\bigcap\nolimits_{j = 1}^p {H_0^j }\), where \(H_0^j\) states that \(Y\) is independent of \(X_j ,\;j \in \{ 1,2,\;...,p\}\).
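
A conditional inference forest can be fitted with the party package (partykit provides a newer implementation), for example as sketched below with illustrative hyperparameters and assumed data frame names.

library(party)
fit_cf  <- cforest(PM2.5 ~ ., data = train_df,
                   controls = cforest_unbiased(ntree = 500, mtry = 3))   # unbiased split selection
pred_cf <- predict(fit_cf, newdata = test_df, OOB = FALSE)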

Gradient Boosted Regression Tree.

Any learning algorithm may benefit from boosting, especially if the learner is a weak one. Boosting and bagging both combine prediction functions, but the two techniques are fundamentally different: bagging fits the learners to bootstrapped data, whereas boosting learns the prediction functions sequentially. At each boosting round \(b,\;b = 1,2,\,...,B\), a negative gradient is calculated on the \(n\) training points \({\text{x}}_1 ,...,\;{\text{x}}_n\). The negative gradient is then approximated by a simple tree obtained by solving

$$h_b = \mathop {\arg \min }\limits_{h \in H} \frac{1}{n}\sum_{i = 1}^n {\left( {r_i^{(b)} - \left[ {g_{b - 1} ({\text{x}}_i ) + h({\text{x}}_i )} \right]} \right)^2 } .$$
(10)

The algorithm makes a γ-sized step in the direction of the negative gradient:

$$g_b ({\text{x}})\quad \leftarrow \quad g_{b - 1} ({\text{x}}) + \gamma h_b ({\text{x}}).$$
(11)

Chen and Guestrin [23] proposed approximate tree learning for sparse data. They explain how to build a scalable tree boosting system using cache-aware access, data compression, and sharding techniques. The combination of these insights allows XGBoost to handle billions of instances while using only a fraction of the resources of existing systems.
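
The boosted models can be sketched with the gbm and xgboost packages as below; the distribution argument switches between the Gaussian and Laplace loss functions, and lambda is XGBoost's L2 regularization weight. All hyperparameter values shown are illustrative, not the tuned settings of this study.

library(gbm)
library(xgboost)
fit_gbm <- gbm(PM2.5 ~ ., data = train_df, distribution = "gaussian",    # or "laplace"
               n.trees = 1000, interaction.depth = 3, shrinkage = 0.05)
X_train <- as.matrix(train_df[, setdiff(names(train_df), "PM2.5")])
fit_xgb <- xgboost(data = X_train, label = train_df$PM2.5,
                   nrounds = 500, eta = 0.05, max_depth = 4,
                   lambda = 1,                                           # L2 penalty on leaf weights
                   objective = "reg:squarederror", verbose = 0)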

Bayesian Additive Regression Tree (BART).

The BART model consists of a sum-of-trees model together with a regularization prior on the model parameters. Let \(M = \{ \mu_1 ,\mu_2 ,\;...,\;\mu_b \}\) denote the parameter values attached to the \(b\) terminal nodes of \({\rm T}\), and let \(g(x;{\rm T},M)\) be the function that assigns a \(\mu_i \in M\) to an input \(x = (x_1 ,\;x_2 ,\;...,\;x_p )\). The model is then

$$Y = \sum_{j = 1}^m {g(x;{\rm T}_j ,M_j )} + \varepsilon ,$$
(12)

where \(\varepsilon \sim N(0,\sigma^2 )\) and \(g(x;{\rm T}_j ,M_j )\) is the function that assigns \(\mu_{ij} \in M_j\) to \(x\). A prior must also be imposed on \(({\rm T}_1 ,M_1 ),\;...,\;({\rm T}_m ,M_m )\) and \(\sigma\), that is, on all parameters of the sum-of-trees model. BART draws posterior samples using MCMC; Chipman et al. [24] describe an iterative Bayesian backfitting MCMC algorithm in detail.
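
One way to fit a BART model in R is with the dbarts package (bartMachine and BART are alternatives); the sketch below uses illustrative MCMC settings and assumed data frame names.

library(dbarts)
X_train <- as.matrix(train_df[, setdiff(names(train_df), "PM2.5")])
X_test  <- as.matrix(test_df[,  setdiff(names(test_df),  "PM2.5")])
fit_bart  <- bart(x.train = X_train, y.train = train_df$PM2.5, x.test = X_test,
                  ntree = 200, nskip = 500, ndpost = 1000)       # 200 trees, 500 burn-in, 1000 posterior draws
pred_bart <- fit_bart$yhat.test.mean                             # posterior mean forecast for the test set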

4.4 Support Vector Regression

Vapnik et al. [25] proposed the SVM for regression. Here, \(F({\text{x}},{\text{w}})\) denotes a family of functions parameterized by \({\text{w}}\), \(G({\text{x}})\) is an unknown function, and \({\hat{\text{w}}}\) is the value of \({\text{w}}\) that minimizes the error between \(G({\text{x}})\) and \(F({\text{x}},{\hat{\text{w}}})\). For a polynomial kernel of degree \(p\), \(F({\text{x}},{\hat{\text{w}}})\) can be written as

$$F({\text{x}},{\hat{\text{w}}}) = \sum_{i = 1}^n {\left( {\alpha_i^* - \alpha_i } \right)\left( {{\text{v}}_i^T {\text{x}} + 1} \right)^p } + b,$$
(13)

where there are \(2n + 1\) parameters: the \(\alpha_i^*\), the \(\alpha_i\), and \(b\). The optimal values of the components of \({\hat{\text{w}}}\) or \(\alpha\) depend on the choice of loss function and objective function.
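
An ε-SVR with a degree-2 polynomial kernel can be sketched with the e1071 package as below; the cost and epsilon values are illustrative rather than tuned settings.

library(e1071)
fit_svr  <- svm(PM2.5 ~ ., data = train_df, type = "eps-regression",
                kernel = "polynomial", degree = 2, coef0 = 1,    # (v'x + 1)^2 kernel
                cost = 1, epsilon = 0.1)
pred_svr <- predict(fit_svr, newdata = test_df)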

4.5 Artificial Neural Network

The artificial neural network (ANN) approach resembles the functioning of the human brain, and the algorithm is based on the function

$$g^* ({\text{x}}) = \sum_{j = 1}^{2p + 1} {h_j \left( {\sum_{i = 1}^p {h_{ij} (x_i )} } \right)} ,$$
(14)

where each of the \(p\) parts of the input x is expressed as a node in an input layer; there are \(2p + 1\) nodes in the hidden layer. The output of a feed-forward neural network with \(L + 1\) layers may be expressed as the function composition:

$$g({\text{x}}) = \left( {S_L \circ M_L \circ \cdots \circ S_2 \circ M_2 \circ S_1 \circ M_1 } \right)({\text{x}}),$$
(15)

where \(M_l ({\text{z}}) = {\text{W}}_l {\text{z}} + {\text{b}}_l ,\;l = 1,2,\;...,L\), \(S_l\) is an activation function, \({\text{W}}_l\) is a weight matrix, and \({\text{b}}_l\) is a bias vector.
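
A single-hidden-layer network and an NNAR model can be sketched with the nnet and forecast packages as below; the hidden-layer size, weight decay, and covariate choice are illustrative assumptions.

library(nnet)
library(forecast)
fit_ann  <- nnet(PM2.5 ~ ., data = train_df, size = 10, linout = TRUE,    # 1 hidden layer, linear output
                 decay = 0.01, maxit = 500)
fit_nnar <- nnetar(ts(train_df$PM2.5, frequency = 7),
                   xreg = as.matrix(train_df[, c("NO2", "O3", "Wspd")]))  # NNAR with external regressors
fc_nnar  <- forecast(fit_nnar, xreg = as.matrix(test_df[, c("NO2", "O3", "Wspd")]))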

4.6 K-Nearest Neighbors (KNN) Regression

Let \(\tau = \{ ({\text{x}}_i ,y_i )\}_{i = 1}^n\) be a training set and \(\{ ({\text{x}}_{(i)} ,y_{(i)} )\}_{i = 1}^n\) be a reordering of the data by increasing distance \(\left\| {{\text{x}}_i - {\text{x}}} \right\|\) of the \({\text{x}}_i\)'s to \({\text{x}}\). The usual \(k\)-NN regression estimate takes the form \(g_n ({\text{x}}) = \frac{1}{k_n }\sum_{i = 1}^{k_n } {y_{(i)} ({\text{x}})}\).
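
Kernel-weighted k-NN regression with the Epanechnikov (or rectangular) kernel can be sketched with the kknn package as below; the value of k is illustrative.

library(kknn)
fit_knn  <- kknn(PM2.5 ~ ., train = train_df, test = test_df,
                 k = 9, kernel = "epanechnikov")          # or kernel = "rectangular"
pred_knn <- fitted(fit_knn)                               # predictions for the test set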

5 Results and Conclusions

A total of 1004 data points were used for training and a further 92 for testing. The RMSE, MAE, and Pearson correlation coefficient (PCC) were used to evaluate the forecasts provided by the machine learning models.

Table 2 summarizes the predictive performance indicators for all 24 models. On the training data, the RF has the lowest RMSE and MAE, followed by SARIMAX and trees without pruning; the RF is clearly superior, as its RMSE is only 9.68 and its PCC is close to one. On the test data, the GBMs with Gaussian and Laplace loss functions have the lowest RMSE and MAE, respectively. Based on the PCC, the best three approaches are, respectively, Prophet, NNAR, and GBM with the Gaussian distribution.

Considering all the criteria on both the training and test datasets, SARIMAX, trees without pruning, and the standard neural networks (except NNAR and the ANN using model averaging) tend to overfit, so they are not suitable for PM2.5 prediction. The other models are considered “good”, and some of them are evaluated over a particular period, as shown in Fig. 4.

To provide superior forecasts, ensemble techniques combine a collection of machine learning methods. Several “great” models satisfy a given criterion on both the training and test datasets. For example, GBM with Gaussian loss, NNAR, SVR (polynomial of degree 2), BRT, and RF all give RMSE values below 20, MAE values below 16, and PCC values above 0.7 on the test data. These models, shown in Fig. 4, are among the top ten and were included in the ensemble models. Three kinds of ensemble models are considered: (1) average, (2) median, and (3) weighted. The weights (W) are allocated based on predictive performance: W_GBM = 5, W_NNAR = 4, W_SVR = 3, W_BRT = 2, and W_RF = 1. The predictive performance is shown in Table 3. Compared with all standalone algorithms, the weighted ensemble model gives the lowest RMSE, the lowest MAE, and the highest PCC. The ensemble model produced in this work is a mix of the “great” models, which might explain why it performs better.
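
A sketch of the three ensemble combinations, using the stated weights and assuming the five models' test-set prediction vectors (pred_gbm, pred_nnar, pred_svr, pred_brt, pred_rf) are already available, is given below.

preds <- cbind(GBM = pred_gbm, NNAR = pred_nnar, SVR = pred_svr,
               BRT = pred_brt, RF = pred_rf)              # test-set predictions from the selected models
w     <- c(GBM = 5, NNAR = 4, SVR = 3, BRT = 2, RF = 1)
ens_avg      <- rowMeans(preds)                           # (1) simple average
ens_median   <- apply(preds, 1, median)                   # (2) median
ens_weighted <- as.vector(preds %*% w / sum(w))           # (3) weighted average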

Table 2. Predictive performance of the univariate and multivariate models
Fig. 4. Forecasts from the selected models compared to the actual values of PM2.5

Table 3. Predictive performance on test data for the ensemble models