The Air Quality Prediction Based on a Convolutional LSTM Network

Guo, Canyang; Guo, Wenzhong; Chen, Chi-Hua; Wang, Xin; Liu, Genggeng

doi:10.1007/978-3-030-30952-7_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11817)

Included in the following conference series:

International Conference on Web Information Systems and Applications

Abstract

In recent years, the rapid development of industrial technology has been accompanied by serious environmental pollution. Among the many pollutants, fine particulate matter (PM2.5) has received special attention because it carries a large amount of toxic and harmful substances. Furthermore, PM2.5 has a long residence time in the atmosphere and a long transport distance, so analyzing PM2.5 distributions is an important issue for air quality prediction. Therefore, this paper proposes a method based on convolutional neural networks (CNN) and long short-term memory (LSTM) networks to analyze the spatial-temporal characteristics of PM2.5 distributions for predicting air quality in multiple cities. In experiments, records of environmental factors in China were collected and analyzed, and three accuracy metrics (i.e., mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE)) were used to compare the proposed method with other machine learning methods. The experimental results show that the MAE, RMSE, and MAPE of the proposed method are lower than those of the other machine learning methods. The main contribution of this paper is a deep multilayer neural network that combines the advantages of CNN and LSTM for accurately predicting air quality in multiple cities.

1 Introduction

In recent years, with industrial development, the problem of environmental pollution has become more and more serious and has attracted significant attention. Many pollutants contribute to it, such as sulfur dioxide, nitrogen oxides, and fine particles (PM2.5). Among them, PM2.5 is the main cause of smog [1]. PM2.5 can stay in the atmosphere for a long time, and it can also enter the body through breathing, accumulating in the trachea or lungs and affecting health [2]. PM2.5 is larger than viruses and smaller than bacteria, and it easily carries toxic substances into the human body [3]. The threat that PM2.5 poses to the environment and to human health is enormous. Therefore, the prediction and control of PM2.5 concentration are important issues.

This paper proposes a deep multilayer neural network model combining convolutional neural network (CNN) and long short-term memory (LSTM) to predict PM2.5 concentration. The model is able to predict the future PM2.5 concentration data based on the past PM2.5 concentration data. This study collects the environmental data from January 2015 to December 2017 in nine cities (i.e., Ningde City, Nanping City, Fuzhou City, Sanming City, Putian City, Quanzhou City, Longyan City, Zhangzhou City, and Xiamen City) in the Fujian Province of the People’s Republic of China as a training set, and the environmental data from January to October 2018 as a testing set. For the evaluation of the proposed method, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) are used as accuracy metrics. Experimental results show that the proposed method is superior to other machine learning methods.

This paper is organized as follows. Section 2 presents a literature review on air quality prediction. Section 3 presents the prediction methods based on deep learning techniques. Section 4 describes the data processing and gives the practical experimental results and discussions. Section 5 summarizes the contributions of this study and discusses the future work.

2 Literature Reviews

In 2013, the World Health Organization’s International Agency for Research on Cancer (IARC) published a report stating that PM2.5 is carcinogenic to humans and is considered a widespread and major environmental carcinogen [4]. Therefore, the prediction and control of PM2.5 are particularly important issues for air quality maintenance and urban development. At present, air quality prediction methods fall mainly into two categories: (1) mechanism models based on atmospheric chemical models, called deterministic models [5, 6]; (2) statistical models based on machine learning algorithms, called machine learning models [6, 7].

Gu et al. [8] designed a picture-based predictor of PM2.5 concentration (PPPC), which uses pictures taken with mobile phones or cameras to estimate PM2.5 concentration in real time. Although this method estimates the current PM2.5 concentration accurately, it cannot predict future concentrations. Mahajan et al. [9] designed a PM2.5 concentration prediction model that combines a neural-network-based hybrid model with clustering techniques such as grid-based clustering and wavelet-based clustering. Its main focus is high prediction accuracy with reduced computation time; the hybrid model is applied in a grid-based prediction system that clusters the monitoring stations by geographical distance [9]. In 2014, Elangasinghe et al. [10] proposed a complex time series analysis of PM10 and PM2.5 at a coastal site using artificial neural network models and the k-means clustering method. The k-means clustering method was used to analyze the relationships among concentration, wind speed and wind direction in order to extract source performance signals from the time series of ambient PM2.5 and PM10 concentrations [10]. Li et al. [11] proposed an LSTM network for air pollutant concentration prediction, and Tsai et al. [12] presented an approach to air pollution prediction based on RNN and LSTM. The latter study collected PM2.5 data from 66 stations in Taiwan from 2012 to 2016 and established LSTM and RNN models for air quality prediction. Yu et al. [13] predicted the concentration of PM2.5 with the Eta-Community Multiscale Air Quality (Eta-CMAQ) model. The method is based on the chemical composition of PM2.5 and has some reference value, but chemical analysis is expensive. Verma et al. [14] proposed a bi-directional LSTM model to predict air pollutant severity levels ahead of time. The models are robust and outperformed an artificial neural network model in predicting PM2.5 severity levels for multiple stations in New Delhi [14].

Because of its small particle size, PM2.5 stays in the atmosphere for a long time and is transported over long distances. Therefore, PM2.5 concentrations are closely related to both time and space. The proposed method is based on a CNN and an LSTM network in order to extract the spatial-temporal characteristics of PM2.5 distributions for air quality prediction.

3 Prediction Methods Based on Deep Learning Techniques

The concepts and processes of CNNs, LSTM networks, and convolutional long short-term memory (ConvLSTM) networks are presented in the following subsections.

3.1 Convolutional Neural Networks

Deep neural networks have achieved remarkable performance at the cost of a large number of parameters and high computational complexity [15]. A convolutional neural network is a feedforward neural network that contains convolutional computation and has a deep structure. The difference between a CNN and a fully connected neural network is weight sharing. A CNN [16, 17] has two advantages: (1) the number of weights is reduced, so the amount of training is greatly reduced; (2) spatial features can be effectively extracted. The network model can process multi-dimensional data. In this paper, the input to the convolutional layer is a 5 × 5 two-dimensional matrix. As shown in Fig. 1, the variables $ x_{1} $ to $ x_{25} $ are inputs, and the variables $ w_{1} $ to $ w_{4} $ form the convolution kernel, which filters the data and extracts features. The variables $ h_{1} $ to $ h_{16} $ are the feature map obtained after convolution.
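The sizes quoted above (25 inputs, 4 kernel weights, 16 outputs) are consistent with a 2 × 2 kernel slid over the 5 × 5 input with stride 1 and no padding. The following plain-Python sketch illustrates that reading; the kernel shape, the stride, the function name `conv2d_valid` and all numeric values are assumptions for illustration, not details taken from the paper.

```python
def conv2d_valid(x, w):
    """Slide kernel w over grid x with stride 1 and no padding.

    As in most CNN layers, the kernel is not flipped (cross-correlation).
    """
    n_rows, n_cols = len(x), len(x[0])
    k_rows, k_cols = len(w), len(w[0])
    out = []
    for i in range(n_rows - k_rows + 1):
        row = []
        for j in range(n_cols - k_cols + 1):
            s = 0.0
            for a in range(k_rows):
                for b in range(k_cols):
                    s += w[a][b] * x[i + a][j + b]
            row.append(s)
        out.append(row)
    return out


# x_1..x_25 laid out row by row as a 5 x 5 grid (illustrative values 1..25).
x = [[5 * r + c + 1 for c in range(5)] for r in range(5)]
# w_1..w_4 as a 2 x 2 kernel (illustrative values).
w = [[1.0, 0.0], [0.0, -1.0]]

h = conv2d_valid(x, w)  # h_1..h_16 as a 4 x 4 feature map
```

The same four weights are reused at all 16 positions, which is the weight sharing that distinguishes the convolutional layer from a fully connected one.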

3.2 Long Short-Term Memory Networks

A recurrent neural network (RNN) [18] is an artificial neural network whose nodes are connected along a sequence: the inputs are fed in step by step, and the hidden state computed at one step is passed on to the next. An LSTM [19, 20] network is a special RNN that differs from a conventional RNN in its ability to learn long-term dependencies. The repeating module in a conventional RNN contains only a single layer (shown in Fig. 2(a)), whereas the repeating module in an LSTM network contains four interacting neural network layers.

The LSTM network can remove information from or add information to the cell state, and it manages this through its gate structure. The LSTM network includes forgetting gates, input gates and output gates. The function σ in the module represents the sigmoid function, given in Eq. (1). The sigmoid layer outputs a number between 0 and 1, which represents how much of each component should be let through. A value of “1” means that the whole component passes, and a value of “0” means that nothing passes.

$$ \sigma (t) = \frac{1}{{1 + e^{ - t} }} $$

(1)

3.2.1 The Forgetting Gate

In Fig. 3, the forgetting gate decides which information to discard. The formula is shown in Eq. (2).

$$ f_{t} = \sigma (W_{xf} x_{t} + W_{hf} h_{t - 1} + W_{cf} \circ c_{t - 1} + b_{f} ) $$

(2)

Here $ x_{t} $ is the input, $ h_{t - 1} $ is the previous hidden state, and $ c_{t - 1} $ is the previous cell state. Each $ W $ is a weight matrix and each $ b $ is a bias vector; both are learned during training. The symbol $ \circ $ denotes the Hadamard product.

3.2.2 The Input Gate

In Fig. 4, the input gate determines which information to remember. The formulas are shown in Eqs. (3), (4) and (5).

$$ i_{t} = \sigma (W_{xi} x_{t} + W_{hi} h_{t - 1} + W_{ci} \circ c_{t - 1} + b_{i} ) $$

(3)

$$ \tilde{c}_{t} = \tanh (W_{xc} x_{t} + W_{hc} h_{t - 1} + b_{c} ) $$

(4)

$$ c_{t} = f_{t} \circ c_{t - 1} + i_{t} \circ \tilde{c}_{t} $$

(5)

3.2.3 The Output Gate

In Fig. 5, the output gate decides which information to output. The formulas are shown in Eqs. (6) and (7).

$$ o_{t} = \sigma (W_{xo} x_{t} + W_{ho} h_{t - 1} + W_{co} \circ c_{t} + b_{o} ) $$

(6)

$$ h_{t} = o_{t} \circ \,\tanh (c_{t} ) $$

(7)
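One time step of Eqs. (2) to (7) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names, the parameter dictionary `p` and its keys, and the all-zero parameters in the usage lines are hypothetical, and the peephole weights $ W_{c\cdot} $ are stored as vectors because they enter through a Hadamard product.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate(act, Wx, x, Wh, h, b, peep=None, c=None):
    """Compute act(Wx x + Wh h + peep o c + b) element by element."""
    zx, zh = matvec(Wx, x), matvec(Wh, h)
    out = []
    for k in range(len(b)):
        z = zx[k] + zh[k] + b[k]
        if peep is not None:
            z += peep[k] * c[k]  # Hadamard (peephole) term
        out.append(act(z))
    return out


def lstm_step(x, h_prev, c_prev, p):
    f = gate(sigmoid, p["Wxf"], x, p["Whf"], h_prev, p["bf"], p["wcf"], c_prev)  # Eq. (2)
    i = gate(sigmoid, p["Wxi"], x, p["Whi"], h_prev, p["bi"], p["wci"], c_prev)  # Eq. (3)
    c_tilde = gate(math.tanh, p["Wxc"], x, p["Whc"], h_prev, p["bc"])            # Eq. (4)
    c = [fk * cp + ik * ct for fk, cp, ik, ct in zip(f, c_prev, i, c_tilde)]     # Eq. (5)
    o = gate(sigmoid, p["Wxo"], x, p["Who"], h_prev, p["bo"], p["wco"], c)       # Eq. (6)
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                             # Eq. (7)
    return h, c


# Usage with 7 inputs and 2 hidden units; all parameters are zero (illustrative only).
n_in, n_hid = 7, 2
p = {}
for g in "fico":
    p["Wx" + g] = [[0.0] * n_in for _ in range(n_hid)]
    p["Wh" + g] = [[0.0] * n_hid for _ in range(n_hid)]
    p["b" + g] = [0.0] * n_hid
for g in "fio":
    p["wc" + g] = [0.0] * n_hid

h, c = lstm_step([0.0] * n_in, [0.0] * n_hid, [2.0, -2.0], p)
```

With all parameters at zero, every sigmoid gate evaluates to 0.5, so the cell state is simply halved; this makes the step easy to check by hand.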

3.3 Convolutional Long Short-Term Memory Networks

The ConvLSTM [21] network not only has the temporal modeling capability of an LSTM network, but also extracts spatial features like a CNN. As shown in Fig. 6, the normal LSTM network computes its internal transitions with fully connected operations, as in a feedforward neural network, and can therefore be called FC-LSTM [21]. A ConvLSTM network uses convolutional calculations instead of fully connected calculations.

The update equations change accordingly, and the new forms are shown in Eqs. (8), (9), (10), (11), (12) and (13), where $ * $ denotes the convolution operator and $ \circ $ denotes the Hadamard product.

$$ f_{t} = \sigma (W_{xf} *X_{t} + W_{hf} *H_{t - 1} + W_{cf} \circ C_{t - 1} + b_{f} ) $$

(8)

$$ i_{t} = \sigma (W_{xi} *X_{t} + W_{hi} *H_{t - 1} + W_{ci} \circ C_{t - 1} + b_{i} ) $$

(9)

$$ \tilde{C}_{\text{t}} = \tanh (W_{xc} *X_{t} + W_{hc} *H_{t - 1} + b_{c} ) $$

(10)

$$ C_{t} = f_{t} \circ C_{t - 1} + i_{t} \circ \tilde{C}_{\text{t}} $$

(11)

$$ o_{t} = \sigma (W_{xo} *X_{t} + W_{ho} *H_{t - 1} + W_{co} \circ C_{t} + b_{o} ) $$

(12)

$$ H_{t} = o_{t} \circ \,\tanh (C_{t} ) $$

(13)
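A single ConvLSTM step following Eqs. (8) to (13) can be sketched in plain Python for one input channel and one hidden channel. The zero padding (so that the grid size is preserved), the 3 × 3 kernels, the scalar biases, the helper names and the parameter dictionary `p` are all assumptions made for this illustration; the paper does not specify them.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def conv_same(x, w):
    """Stride-1 sliding-window product with zero padding (odd kernel size),
    so the output grid has the same shape as the input grid."""
    rows, cols = len(x), len(x[0])
    kr, kc = len(w), len(w[0])
    pr, pc = kr // 2, kc // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            s = 0.0
            for a in range(kr):
                for b in range(kc):
                    r, c = i + a - pr, j + b - pc
                    if 0 <= r < rows and 0 <= c < cols:
                        s += w[a][b] * x[r][c]
            out[i][j] = s
    return out


def emap(fn, *grids):
    """Apply fn element-wise across grids of equal shape."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*grids)]


def convlstm_step(X, H_prev, C_prev, p):
    def pre(g, C_peep=None):
        # W_x * X + W_h * H_{t-1} + b, plus the Hadamard peephole term if given.
        z = emap(lambda a, b: a + b + p["b" + g],
                 conv_same(X, p["Wx" + g]), conv_same(H_prev, p["Wh" + g]))
        if C_peep is not None:
            z = emap(lambda s, w, c: s + w * c, z, p["Wc" + g], C_peep)
        return z

    f = emap(sigmoid, pre("f", C_prev))                                    # Eq. (8)
    i = emap(sigmoid, pre("i", C_prev))                                    # Eq. (9)
    C_tilde = emap(math.tanh, pre("c"))                                    # Eq. (10)
    C = emap(lambda fv, cv, iv, tv: fv * cv + iv * tv, f, C_prev, i, C_tilde)  # Eq. (11)
    o = emap(sigmoid, pre("o", C))                                         # Eq. (12)
    H = emap(lambda ov, cv: ov * math.tanh(cv), o, C)                      # Eq. (13)
    return H, C


# Usage on a 3 x 3 grid with all-zero parameters (illustrative only).
def zeros(n):
    return [[0.0] * n for _ in range(n)]

p = {}
for g in "fico":
    p["Wx" + g], p["Wh" + g], p["b" + g] = zeros(3), zeros(3), 0.0
for g in "fio":
    p["Wc" + g] = zeros(3)

C0 = [[2.0] * 3 for _ in range(3)]
H, C = convlstm_step(zeros(3), zeros(3), C0, p)
```

The only structural change from the fully connected step is that the matrix products with the input and hidden state become convolutions over a grid, so each cell's update depends on its spatial neighbours.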

4 Practical Experimental Environments and Results

This section illustrates the selected features for air quality prediction in Subsect. 4.1 and discusses the practical experimental results in Subsect. 4.2.

4.1 Practical Experimental Environments

For training and testing the air quality prediction methods, this study collected environmental data in nine cities in the Fujian Province of the People’s Republic of China from January 2015 to October 2018. The environmental factors include 7 dimensions: air quality index (AQI), PM2.5, PM10, SO2, NO2, CO and O3; the AQI reflects the degree of air pollution. The seven environmental factors (i.e., AQI, PM2.5, PM10, SO2, NO2, CO and O3) at the t-th timestamp are selected as the inputs of the neural networks, and the value of PM2.5 at the (t + 1)-th timestamp is selected as the output of the neural networks. The mean squared error loss function is adopted for optimizing the neural networks. In experiments, the environmental data from January 2015 to December 2017 is used as the training dataset, and the environmental data from January 2018 to October 2018 is used as the testing dataset.
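The pairing of the seven factors at timestamp t with the PM2.5 value at timestamp t + 1, and the split at the start of 2018, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the record layout, the daily granularity, the choice to split on the date of the target, the helper names `make_pairs` and `split_by_date`, and the sample values are all assumptions.

```python
from datetime import date

FEATURES = ("AQI", "PM2.5", "PM10", "SO2", "NO2", "CO", "O3")
PM25 = FEATURES.index("PM2.5")


def make_pairs(records):
    """Pair the 7 factors at timestamp t with PM2.5 at timestamp t + 1.

    records: list of (day, features) tuples sorted by day (assumed layout).
    Returns a list of (target_day, inputs, target_pm25) tuples.
    Consecutive records are assumed to be consecutive timestamps.
    """
    return [(d1, f0, f1[PM25]) for (_, f0), (d1, f1) in zip(records, records[1:])]


def split_by_date(pairs, first_test_day=date(2018, 1, 1)):
    """Put pairs whose target falls before first_test_day into the training set."""
    train = [(x, y) for d, x, y in pairs if d < first_test_day]
    test = [(x, y) for d, x, y in pairs if d >= first_test_day]
    return train, test


# Made-up sample records for one city.
records = [
    (date(2017, 12, 30), [55, 30.0, 48.0, 6.0, 22.0, 0.7, 60.0]),
    (date(2017, 12, 31), [62, 35.0, 52.0, 7.0, 25.0, 0.8, 58.0]),
    (date(2018, 1, 1), [70, 41.0, 60.0, 8.0, 28.0, 0.9, 50.0]),
]
train, test = split_by_date(make_pairs(records))
```

Splitting by time rather than at random keeps every test target later than all training targets, which matches the forecasting setting described above.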

For data pre-processing, if a record contains abnormal values or missing values, the record is deleted [22]. For data normalization, the data is processed by the min-max normalization method [23] and represented by a number between 0 and 1. The number of records is N, and the value of the i-th record ($ x_{i} $) is normalized by Eq. (14).

$$ z_{i} = \frac{{x_{i} - \mathop {\min }\limits_{1 \le n \le N} (x_{n} )}}{{\mathop {\max }\limits_{1 \le n \le N} (x_{n} ) - \mathop {\min }\limits_{1 \le n \le N} (x_{n} )}} $$

(14)
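The pre-processing described above (deleting incomplete records, then min-max normalization) can be sketched in plain Python. Representing a missing value as `None`, the function names and the sample values are assumptions for illustration; the detection of abnormal values is not shown because the paper does not define it.

```python
def drop_incomplete(records):
    """Delete every record that contains a missing value (None, by assumption)."""
    return [r for r in records if all(v is not None for v in r)]


def min_max_normalize(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for a constant column")
    return [(v - lo) / (hi - lo) for v in values]


# Made-up (PM2.5, PM10) records; the second one has a missing PM10 value.
records = [(12.0, 20.0), (30.0, None), (21.0, 35.0), (48.0, 50.0)]
clean = drop_incomplete(records)
z = min_max_normalize([r[0] for r in clean])
```

In practice the minimum and maximum would normally be taken from the training data only and reused for the test data, so that no information from the test period leaks into the scaling; the paper does not state which convention was used.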

4.2 Practical Experimental Results and Discussions

For the evaluation of the proposed ConvLSTM method, multi-layer perceptron (MLP) neural networks [24, 25, 26], CNNs, and LSTM networks are implemented and used to predict the air quality in the selected cities in Fujian Province. In order to compare the performance of each prediction method comprehensively and objectively, MAE, RMSE and MAPE were used as accuracy metrics. The value of the n-th actual data is defined as o_n, and the value of the n-th predicted data is defined as p_n. These three accuracy metrics are computed by Eqs. (15), (16) and (17), respectively. The practical experimental results based on these three accuracy metrics are shown in Tables 1, 2 and 3.

Table 1. The MAEs of each prediction method for each city

Table 2. The RMSEs of each prediction method for each city

Table 3. The MAPEs of each prediction method for each city

$$ MAE = \frac{1}{N}\sum\limits_{n = 1}^{N} {\left| {o_{n} - p_{n} } \right|} $$

(15)

$$ RMSE = \sqrt {\frac{{\sum\limits_{n = 1}^{N} {(o_{n} - p_{n} )^{2} } }}{N}} $$

(16)

$$ MAPE = \frac{1}{N}\sum\limits_{n = 1}^{N} {\frac{{\left| {o_{n} - p_{n} } \right|}}{{o_{n} }}} $$

(17)
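Eqs. (15) to (17) translate directly into code. The sketch below is plain Python with made-up observations and predictions; the function names are illustrative. Note that MAPE is returned as a fraction, as in Eq. (17), and is undefined when an actual value is zero.

```python
import math


def mae(o, p):
    """Mean absolute error, Eq. (15)."""
    return sum(abs(a - b) for a, b in zip(o, p)) / len(o)


def rmse(o, p):
    """Root mean square error, Eq. (16)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(o, p)) / len(o))


def mape(o, p):
    """Mean absolute percentage error as a fraction, Eq. (17)."""
    return sum(abs(a - b) / a for a, b in zip(o, p)) / len(o)


actual = [10.0, 20.0, 40.0]     # o_n (made-up values)
predicted = [12.0, 18.0, 40.0]  # p_n (made-up values)

scores = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

RMSE squares each error before averaging, so it penalizes occasional large misses more heavily than MAE does, while MAPE weights each error by the size of the actual value.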

The MAEs from low to high are generated by ConvLSTM (6.4579), MLP (7.0221), CNN (7.0906) and LSTM (7.1125). Furthermore, the MAPEs from low to high are generated by ConvLSTM (0.3152), MLP (0.3577), LSTM (0.3595) and CNN (0.3681). Finally, the RMSEs from low to high are generated by ConvLSTM (10.1450), CNN (10.7404), LSTM (10.8044) and MLP (10.8077). In this comparison, ConvLSTM achieves the lowest error on all three metrics, which demonstrates the superiority of the ConvLSTM network in predicting PM2.5 concentration.
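To put the reported figures on a common scale, the relative reduction in error of ConvLSTM against the best competing method can be computed for each metric. The numbers below are the ones quoted in the paragraph above; the helper name `relative_gain` and the choice of "best baseline" as the reference point are assumptions made for this illustration.

```python
reported = {
    "MAE": {"ConvLSTM": 6.4579, "MLP": 7.0221, "CNN": 7.0906, "LSTM": 7.1125},
    "MAPE": {"ConvLSTM": 0.3152, "MLP": 0.3577, "LSTM": 0.3595, "CNN": 0.3681},
    "RMSE": {"ConvLSTM": 10.1450, "CNN": 10.7404, "LSTM": 10.8044, "MLP": 10.8077},
}


def relative_gain(scores, proposed="ConvLSTM"):
    """Fractional error reduction of the proposed method versus the best baseline."""
    best_baseline = min(v for name, v in scores.items() if name != proposed)
    return (best_baseline - scores[proposed]) / best_baseline


gains = {metric: relative_gain(scores) for metric, scores in reported.items()}

for metric, gain in gains.items():
    print(f"{metric}: {100 * gain:.1f}% lower than the best baseline")
```

The best baseline differs by metric (MLP for MAE and MAPE, CNN for RMSE), so the comparison is made per metric rather than against a single fixed competitor.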

A case study of air quality prediction by the proposed ConvLSTM network for each city is shown in Fig. 7. The actual records are illustrated as blue polylines, and the predicted records are shown as orange polylines. In experiments, the values of PM2.5 concentration predicted by the proposed ConvLSTM network are roughly consistent with the actual values. Some of the large errors may be caused by human behaviors. For instance, a large number of fireworks and firecrackers are set off during the Spring Festival and on New Year’s Eve, which raises the PM2.5 concentration.

5 Conclusions and Future Work

A deep multi-layer neural network model based on CNN and LSTM (i.e., the ConvLSTM method) is proposed to analyze the spatio-temporal features for predicting air quality in multiple cities. A case study of the prediction of PM2.5 concentration in the Fujian Province of the People’s Republic of China is given in this study: the proposed model predicts the future concentration of PM2.5 from its past concentration. In experiments, the performance of each prediction method (i.e., MLP, CNN, LSTM, and ConvLSTM) was evaluated by MAE, MAPE and RMSE. The practical experimental results show that the proposed model combines the advantages of CNN and LSTM for analyzing the spatio-temporal features and improves the accuracy of PM2.5 concentration prediction.

In the future, this study can be applied to the prediction and control of air quality for other cities. Furthermore, the human behaviors can be detected and considered for the improvement of air quality prediction.

References

[1] Querol, X., et al.: PM10 and PM2.5 source apportionment in the Barcelona metropolitan area, Catalonia, Spain. Atmos. Environ. 35(36), 6407–6419 (2001)
[2] Schwartz, J., Laden, F., Zanobetti, A.: The concentration-response relation between PM2.5 and daily deaths. Environ. Health Perspect. 110(10), 1025–1029 (2002)
[3] Bell, M.L., Francesca, D., Keita, E.: Spatial and temporal variation in PM2.5 chemical composition in the United States for health effects studies. Environ. Health Perspect. 115(7), 989–995 (2007)
[4] Badyda, A.J., Grellier, J., Dąbrowiecki, P.: Ambient PM2.5 exposure and mortality due to lung cancer and cardiopulmonary diseases in Polish cities. Adv. Exp. Med. Biol. 944, 9–17 (2017)
[5] Chan, C.K., Yao, X.: Air pollution in mega cities in China. Atmos. Environ. 42(1), 1–42 (2008)
[6] Kermanshahi, B.S., et al.: Artificial neural network for forecasting daily loads of a Canadian electric utility. In: Proceedings of the Second International Forum on Applications of Neural Networks to Power Systems, Yokohama, Japan (1993)
[7] Fleming, S.W.: Artificial neural network forecasting of nonlinear Markov processes. Can. J. Phys. 85(3), 279–294 (2007)
[8] Gu, K., Qiao, J., Li, X.: Highly efficient picture-based prediction of PM2.5 concentration. IEEE Trans. Industr. Electron. 66(4), 3176–3184 (2019)
[9] Mahajan, S., Liu, H.M., Tsai, T.C., Chen, L.J.: Improving the accuracy and efficiency of PM2.5 forecast service using cluster-based hybrid neural network model. IEEE Access 6, 19193–19204 (2018)
[10] Elangasinghe, M.A., Singhal, N., Dirks, K.N., Salmond, J.A., Samarasinghe, S.: Complex time series analysis of PM10 and PM2.5 for a coastal site using artificial neural network modelling and k-means clustering. Atmos. Environ. 94, 106–116 (2014)
[11] Li, X., et al.: Long short-term memory neural network for air pollutant concentration predictions: method development and evaluation. Environ. Pollut. 231, 997–1004 (2017)
[12] Tsai, Y.T., Zeng, Y.R., Chang, Y.S.: Air pollution forecasting using RNN with LSTM. In: Proceedings of 2018 IEEE 16th International Conference on Dependable, Autonomic and Secure Computing, 16th International Conference on Pervasive Intelligence and Computing, 4th International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress, Athens, Greece (2018)
[13] Yu, S., et al.: Evaluation of real-time PM2.5 forecasts and process analysis for PM2.5 formation over the eastern United States using the Eta-CMAQ forecast model during the 2004 ICARTT study. J. Geophys. Res. Atmos. 113, D06204 (2008)
[14] Verma, I., Ahuja, R., Meisheri, H., Dey, L.: Air pollutant severity prediction using bi-directional LSTM network. In: Proceedings of 2018 IEEE/WIC/ACM International Conference on Web Intelligence, Santiago, Chile (2018)
[15] Zhao, H., Xia, S., Zhao, J., Zhu, D., Yao, R., Niu, Q.: Pareto-based many-objective convolutional neural networks. In: Meng, X., Li, R., Wang, K., Niu, B., Wang, X., Zhao, G. (eds.) WISA 2018. LNCS, vol. 11242, pp. 3–14. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02934-0_1
[16] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, Nevada, USA (2012)
[17] Lawrence, S., Giles, C.L., Tsoi, A.C., Back, A.D.: Face recognition: a convolutional neural-network approach. IEEE Trans. Neural Netw. 8(1), 98–113 (1997)
[18] Mikolov, T., Karafiát, M., Khudanpur, S.: Recurrent neural network based language model. In: Proceedings of the 11th Annual Conference of the International Speech Communication Association, Chiba, Japan (2010)
[19] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
[20] Baddeley, A.D., Warrington, E.K.: Amnesia and the distinction between long-and short-term memory. J. Verbal Learn. Verbal Behav. 9(2), 176–189 (1970)
[21] Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.C.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, Montréal, Canada (2015)
[22] Kotsiantis, S.B., Kanellopoulos, D., Pintelas, P.E.: Data preprocessing for supervised leaning. Int. J. Comput. Electr. Autom. Control Inf. Eng. 1(12), 4091–4096 (2007)
[23] Jain, Y.K., Bhandare, S.K.: Min max normalization based data perturbation method for privacy protection. Int. J. Comput. Commun. Technol. 3(4), 45–50 (2014)
[24] Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall, Upper Saddle River (1998)
[25] Gardner, M.W., Dorling, S.R.: Artificial neural networks (the multilayer perceptron) - a review of applications in the atmospheric sciences. Atmos. Environ. 32(14–15), 2627–2636 (1998)
[26] Haykin, S.O.: Neural Networks and Learning Machines, 3rd edn. Prentice Hall, Upper Saddle River (2008)

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grants No. 61877010 and No. 11501114, and the Fujian Natural Science Funds under Grant No. 2019J01243. This research was partially supported by Fuzhou University, grant numbers 510730/XRC-18075 and 510809/GXRC-19037.

Author information

Authors and Affiliations

College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
Canyang Guo, Wenzhong Guo, Chi-Hua Chen & Genggeng Liu
College of Intelligence and Computing, Tianjin University, Tianjin, China
Xin Wang

Corresponding author

Correspondence to Genggeng Liu.

Editor information

Editors and Affiliations

Southeast University, Nanjing, China
Weiwei Ni
Tianjin University, Tianjin, China
Xin Wang
Wuhan University, Wuhan, China
Wei Song
Tianjin University of Technology, Tianjin, China
Yukun Li

Cite this paper

Guo, C., Guo, W., Chen, CH., Wang, X., Liu, G. (2019). The Air Quality Prediction Based on a Convolutional LSTM Network. In: Ni, W., Wang, X., Song, W., Li, Y. (eds) Web Information Systems and Applications. WISA 2019. Lecture Notes in Computer Science, vol 11817. Springer, Cham. https://doi.org/10.1007/978-3-030-30952-7_12

DOI: https://doi.org/10.1007/978-3-030-30952-7_12
Published: 16 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30951-0
Online ISBN: 978-3-030-30952-7
eBook Packages: Computer Science, Computer Science (R0)
