1 Introduction

In recent years, rapid industrial development has made environmental pollution increasingly serious, and the problem has attracted significant attention. Many pollutants contribute to it, such as sulfur dioxide, nitrogen oxides, and fine particles (PM2.5). Among them, PM2.5 is the main cause of smog [1]. PM2.5 can stay in the atmosphere for a long time, and it can also enter the body through breathing, accumulate in the trachea or lungs, and affect health [2]. PM2.5 is larger than viruses and smaller than bacteria, and it easily carries toxic substances into the human body [3]. PM2.5 therefore poses an enormous threat to the environment and to human health, and the prediction and control of PM2.5 concentration are important issues.

This paper proposes a deep multilayer neural network model that combines a convolutional neural network (CNN) and a long short-term memory (LSTM) network to predict PM2.5 concentration. The model predicts future PM2.5 concentration from past PM2.5 concentration data. This study uses the environmental data collected from January 2015 to December 2017 in nine cities (i.e., Ningde City, Nanping City, Fuzhou City, Sanming City, Putian City, Quanzhou City, Longyan City, Zhangzhou City, and Xiamen City) in the Fujian Province of the People’s Republic of China as a training set, and the environmental data from January to October 2018 as a testing set. For the evaluation of the proposed method, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) are used as accuracy metrics. Experimental results show that the proposed method outperforms the other machine learning methods in the comparison.

This paper is organized as follows. Section 2 presents a literature review on air quality prediction. Section 3 presents the prediction methods based on deep learning techniques. Section 4 describes the data processing and gives the practical experimental results and discussions. Section 5 summarizes the contributions of this study and discusses the future work.

2 Literature Reviews

In 2013, the World Health Organization’s International Agency for Research on Cancer (IARC) published a report stating that PM2.5 is carcinogenic to humans and is considered a universal and major environmental carcinogen [4]. Therefore, the prediction and control of PM2.5 are particularly important issues for air quality maintenance and urban development. At present, air quality prediction methods are mainly classified into two categories: (1) mechanism models based on atmospheric chemical models, which are called deterministic models [5, 6]; (2) statistical models based on machine learning algorithms, which are called machine learning models [6, 7].

Gu et al. [8] designed a new picture-based predictor of PM2.5 concentration (PPPC), which uses pictures taken with mobile phones or cameras to estimate the PM2.5 concentration in real time. Although this method can estimate the PM2.5 concentration more accurately, it evaluates only the current PM2.5 concentration and cannot predict the future PM2.5 concentration. Mahajan et al. [9] designed a PM2.5 concentration prediction model that combines a neural-network-based hybrid model with clustering techniques such as grid-based clustering and wavelet-based clustering. The main aim is to achieve high prediction accuracy with reduced computation time, and the hybrid model is applied in a grid-based prediction system that clusters the monitoring stations by geographical distance [9]. In 2014, Elangasinghedeng et al. [10] proposed complex event-sequence analyses of PM10 and PM2.5 in coastal areas by using artificial neural network models and the k-means clustering method. In that study, the k-means clustering method was used to analyze the bivariate relationships among concentration, wind speed, and wind direction in order to extract source performance signals from the time series of ambient PM2.5 and PM10 concentrations [10]. Li et al. [11] proposed an LSTM network for air pollutant concentration prediction, and Tsai et al. [12] presented an air pollution prediction approach based on RNN and LSTM. The method collected PM2.5 data from 66 stations in Taiwan from 2012 to 2016 and established LSTM and RNN models for precise prediction of air quality. Yu et al. [13] predicted the concentration of PM2.5 through the Eta-Community Multiscale Air Quality (Eta-CMAQ) model. The method is based on the chemical composition of PM2.5 and has some reference value, but the chemical analysis is expensive. Verma et al. [14] proposed the use of a bi-directional LSTM model to predict air pollutant severity levels ahead of time. The models are robust and have shown superiority over an artificial neural network model in predicting PM2.5 severity levels for multiple stations in New Delhi City [14].

Because of its small particle size, PM2.5 stays in the atmosphere for a long time and is transported over long distances. Therefore, PM2.5 concentrations are closely related to both time and space. The proposed method combines a CNN and an LSTM network to extract the spatial-temporal characteristics of PM2.5 distributions for air quality prediction.

3 Prediction Methods Based on Deep Learning Techniques

The concepts and processes of CNNs, LSTM networks, and convolutional long short-term memory (ConvLSTM) networks are presented in the following subsections.

3.1 Convolutional Neural Networks

Deep neural networks have achieved remarkable performance at the cost of a large number of parameters and high computational complexity [15]. A convolutional neural network is a feedforward neural network that contains convolutional computation and has a deep structure. The difference between a CNN and a fully connected neural network is weight sharing. A CNN [16, 17] has two advantages: (1) the number of weights is reduced, so the amount of training is greatly reduced; (2) spatial features can be effectively extracted. The network model can process multi-dimensional data. In this paper, the input to the convolutional layer is a 5 × 5 two-dimensional matrix. As shown in Fig. 1, the variables \( x_{1} \) to \( x_{25} \) are the inputs, and the variables \( w_{1} \) to \( w_{4} \) are the convolution kernel weights, which filter the data and extract features. The variables \( h_{1} \) to \( h_{16} \) form the feature map obtained after convolution.

Fig. 1. A two-dimensional convolutional operation
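The counts in the text (25 inputs, 4 kernel weights, 16 outputs) are consistent with a 2 × 2 kernel slid over the 5 × 5 input with stride 1 and no padding. The following NumPy sketch illustrates the operation under that assumption; the function name `conv2d_valid` and the kernel values are illustrative and are not taken from the paper.

```python
import numpy as np

def conv2d_valid(x, w):
    """Slide kernel w over x with stride 1 and no padding.

    As in most deep learning libraries, the kernel is not flipped,
    so this is technically a cross-correlation.
    """
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    h = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Each output is the weighted sum of one kh x kw window of x.
            h[r, c] = np.sum(x[r:r + kh, c:c + kw] * w)
    return h

x = np.arange(1, 26, dtype=float).reshape(5, 5)  # x_1 .. x_25
w = np.array([[1.0, 0.0],
              [0.0, -1.0]])                      # w_1 .. w_4 (illustrative values)
h = conv2d_valid(x, w)                           # h_1 .. h_16, shape (4, 4)
```

The same four weights are reused at every window position, which is the weight sharing described above.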

3.2 Long Short-Term Memory Networks

A recurrent neural network (RNN) [18] is an artificial neural network for sequential data in which the hidden state computed at one time step is fed back as an input at the next time step. An LSTM [19, 20] network is a special RNN that is able to learn long-term dependencies. The repeating module in a conventional RNN contains only a single layer (shown in Fig. 2(a)), whereas the repeating module in an LSTM network contains four interacting neural network layers.

Fig. 2. RNN and LSTM networks

The LSTM network removes information from or adds information to the cell state through gate structures: a forgetting gate, an input gate, and an output gate. The function σ in the module is the sigmoid function given in Eq. (1). The sigmoid layer outputs a number between 0 and 1, which represents how much of each component is allowed to pass through the gate. A value of 1 means that the component passes completely, and a value of 0 means that nothing passes.

$$ \sigma (t) = \frac{1}{{1 + e^{ - t} }} $$
(1)

3.2.1 The Forgetting Gate

In Fig. 3, the forgetting gate decides which information to discard. The formula is shown in Eq. (2).

Fig. 3. The forgetting gate in an LSTM network

$$ f_{t} = \sigma (W_{xf} x_{t} + W_{hf} h_{t - 1} + W_{cf} \circ c_{t - 1} + b_{f} ) $$
(2)

Here, \( W \) denotes a weight matrix and \( b \) denotes a bias vector; both are learned during the training process. The operator \( \circ \) is the Hadamard (element-wise) product.

3.2.2 The Input Gate

As shown in Fig. 4, the input gate determines which new information is stored in the cell state. The formulas are shown in Eqs. (3), (4) and (5).

Fig. 4. The input gate in an LSTM network

$$ i_{t} = \sigma (W_{xi} x_{t} + W_{hi} h_{t - 1} + W_{ci} \circ c_{t - 1} + b_{i} ) $$
(3)
$$ \tilde{c}_{t} = \tanh (W_{xc} x_{t} + W_{hc} h_{t - 1} + b_{c} ) $$
(4)
$$ c_{t} = f_{t} \circ c_{t - 1} + i_{t} \circ \tilde{c}_{t} $$
(5)

3.2.3 The Output Gate

As shown in Fig. 5, the output gate decides which part of the cell state is output as the hidden state. The formulas are shown in Eqs. (6) and (7).

Fig. 5. The output gate in an LSTM network

$$ o_{t} = \sigma (W_{xo} x_{t} + W_{ho} h_{t - 1} + W_{co} \circ c_{t} + b_{o} ) $$
(6)
$$ h_{t} = o_{t} \circ \,\tanh (c_{t} ) $$
(7)
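Taken together, Eqs. (1) to (7) define one time step of the LSTM cell. The sketch below is a minimal NumPy illustration of that step, not the implementation used in the experiments. The helper names (`lstm_step`, `init_params`), the hidden size, and the random initialization are assumptions made for the example; the peephole weights \( W_{cf} \), \( W_{ci} \) and \( W_{co} \) are stored as vectors because they enter through the Hadamard product.

```python
import numpy as np

def sigmoid(t):
    # Eq. (1)
    return 1.0 / (1.0 + np.exp(-t))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (2)-(7); p is a dict of parameters."""
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["W_cf"] * c_prev + p["b_f"])  # Eq. (2)
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["W_ci"] * c_prev + p["b_i"])  # Eq. (3)
    c_tilde = np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])                   # Eq. (4)
    c_t = f_t * c_prev + i_t * c_tilde                                                   # Eq. (5)
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["W_co"] * c_t + p["b_o"])     # Eq. (6)
    h_t = o_t * np.tanh(c_t)                                                             # Eq. (7)
    return h_t, c_t

def init_params(n_in, n_hidden, rng):
    """Hypothetical small random initialization, for illustration only."""
    p = {}
    for g in "fico":
        p[f"W_x{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p[f"W_h{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    for g in "fio":
        p[f"W_c{g}"] = rng.normal(0.0, 0.1, n_hidden)  # peephole weights
    return p

rng = np.random.default_rng(0)
params = init_params(n_in=7, n_hidden=4, rng=rng)
h, c = np.zeros(4), np.zeros(4)
for x_t in rng.random((3, 7)):  # three synthetic time steps with 7 inputs each
    h, c = lstm_step(x_t, h, c, params)
```

Because \( h_{t} \) is the product of a sigmoid output and a tanh output, every component of the hidden state stays strictly between -1 and 1.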

3.3 Convolutional Long Short-Term Memory Networks

The ConvLSTM [21] network not only has the temporal modeling capability of an LSTM network, but also extracts spatial features like a CNN. As shown in Fig. 6, the normal LSTM network computes its internal transformations with fully connected operations, as in a feedforward neural network, and can therefore be called FC-LSTM [21]. A ConvLSTM network uses convolutional calculations instead of these fully connected calculations.

Fig. 6. The differences between an FC-LSTM network and a ConvLSTM network

The gate formulas change accordingly, and the new formulas are shown in Eqs. (8), (9), (10), (11), (12) and (13), where \( * \) denotes the convolution operator and \( \circ \) denotes the Hadamard product.

$$ f_{t} = \sigma (W_{xf} *X_{t} + W_{hf} *H_{t - 1} + W_{cf} \circ C_{t - 1} + b_{f} ) $$
(8)
$$ i_{t} = \sigma (W_{xi} *X_{t} + W_{hi} *H_{t - 1} + W_{ci} \circ C_{t - 1} + b_{i} ) $$
(9)
$$ \tilde{C}_{t} = \tanh (W_{xc} *X_{t} + W_{hc} *H_{t - 1} + b_{c} ) $$
(10)
$$ C_{t} = f_{t} \circ C_{t - 1} + i_{t} \circ \tilde{C}_{t} $$
(11)
$$ o_{t} = \sigma (W_{xo} *X_{t} + W_{ho} *H_{t - 1} + W_{co} \circ C_{t} + b_{o} ) $$
(12)
$$ H_{t} = o_{t} \circ \,\tanh (C_{t} ) $$
(13)
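To make the difference from the FC-LSTM concrete, the sketch below implements one ConvLSTM step following Eqs. (8) to (13), with the matrix products replaced by two-dimensional convolutions. It is a simplified single-channel illustration under stated assumptions: the 4 × 4 grid, the 3 × 3 kernels, zero padding, and the helper names `conv2d_same`, `convlstm_step` and `init_params` are all hypothetical, and practical ConvLSTM layers use multiple input and hidden channels.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def conv2d_same(x, k):
    """Zero-padded, stride-1 sliding-window product; the output keeps x's shape.

    The kernel is not flipped, as is usual in deep learning libraries.
    """
    kh, kw = k.shape
    padded = np.pad(x, ((kh // 2, kh - 1 - kh // 2), (kw // 2, kw - 1 - kw // 2)))
    out = np.empty_like(x, dtype=float)
    for r in range(x.shape[0]):
        for c in range(x.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * k)
    return out

def convlstm_step(X_t, H_prev, C_prev, p):
    """One single-channel ConvLSTM step following Eqs. (8)-(13)."""
    conv = conv2d_same
    f_t = sigmoid(conv(X_t, p["W_xf"]) + conv(H_prev, p["W_hf"]) + p["W_cf"] * C_prev + p["b_f"])  # Eq. (8)
    i_t = sigmoid(conv(X_t, p["W_xi"]) + conv(H_prev, p["W_hi"]) + p["W_ci"] * C_prev + p["b_i"])  # Eq. (9)
    C_tilde = np.tanh(conv(X_t, p["W_xc"]) + conv(H_prev, p["W_hc"]) + p["b_c"])                   # Eq. (10)
    C_t = f_t * C_prev + i_t * C_tilde                                                             # Eq. (11)
    o_t = sigmoid(conv(X_t, p["W_xo"]) + conv(H_prev, p["W_ho"]) + p["W_co"] * C_t + p["b_o"])     # Eq. (12)
    H_t = o_t * np.tanh(C_t)                                                                       # Eq. (13)
    return H_t, C_t

def init_params(shape, ksize, rng):
    """Hypothetical small random initialization, for illustration only."""
    p = {}
    for g in "fico":
        p[f"W_x{g}"] = rng.normal(0.0, 0.1, (ksize, ksize))
        p[f"W_h{g}"] = rng.normal(0.0, 0.1, (ksize, ksize))
        p[f"b_{g}"] = 0.0
    for g in "fio":
        p[f"W_c{g}"] = rng.normal(0.0, 0.1, shape)  # grid-shaped, used with the Hadamard product
    return p

rng = np.random.default_rng(0)
grid = (4, 4)                                  # illustrative spatial grid
params = init_params(grid, ksize=3, rng=rng)
H, C = np.zeros(grid), np.zeros(grid)
for X_t in rng.random((3, *grid)):             # three synthetic time steps
    H, C = convlstm_step(X_t, H, C, params)
```

The hidden state and the cell state keep the spatial layout of the input, so each output location depends on its neighbourhood in the previous step rather than on the whole flattened input.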

4 Practical Experimental Environments and Results

This section illustrates the selected features for air quality prediction in Subsect. 4.1 and discusses the practical experimental results in Subsect. 4.2.

4.1 Practical Experimental Environments

For training and testing the air quality prediction methods, this study collected the environmental data in nine cities in the Fujian Province of the People’s Republic of China from January 2015 to October 2018. The environmental factors include seven dimensions: air quality index (AQI), PM2.5, PM10, SO2, NO2, CO and O3; the AQI reflects the degree of air pollution. The seven environmental factors at the t-th timestamp are selected as the inputs of the neural networks, and the PM2.5 value at the (t + 1)-th timestamp is selected as the output of the neural networks. The mean squared error loss function is adopted for optimizing the neural networks. In the experiments, the environmental data from January 2015 to December 2017 is used as a training dataset, and the environmental data from January 2018 to October 2018 is used as a testing dataset.
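The pairing of inputs at timestamp t with the PM2.5 target at timestamp t + 1, together with the mean squared error loss, can be sketched as follows. This is an illustration for a single time-ordered series only; the array layout, the column order in `FEATURES`, and the helper names `make_pairs` and `mse_loss` are assumptions, and the data here is synthetic.

```python
import numpy as np

# Assumed column order of the seven environmental factors.
FEATURES = ["AQI", "PM2.5", "PM10", "SO2", "NO2", "CO", "O3"]
PM25 = FEATURES.index("PM2.5")

def make_pairs(series):
    """series: (T, 7) array ordered by time.

    Returns inputs of shape (T-1, 7), the factors at timestamp t, and
    targets of shape (T-1,), the PM2.5 value at timestamp t + 1.
    """
    inputs = series[:-1]
    targets = series[1:, PM25]
    return inputs, targets

def mse_loss(targets, predictions):
    """Mean squared error between target and predicted PM2.5 values."""
    return float(np.mean((targets - predictions) ** 2))

rng = np.random.default_rng(0)
series = rng.random((6, len(FEATURES)))  # synthetic placeholder, not real measurements
inputs, targets = make_pairs(series)

# A naive baseline that predicts "next PM2.5 equals current PM2.5".
persistence_loss = mse_loss(targets, inputs[:, PM25])
```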

For data pre-processing, a record that contains abnormal values or missing values is deleted [22]. For data normalization, the data is processed by the min-max normalization method [23] and represented by a number between 0 and 1. The number of records is N, and the value of the i-th record \( x_{i} \) is normalized to \( z_{i} \) by Eq. (14).

$$ z_{i} = \frac{{x_{i} - \mathop {\hbox{min} }\limits_{1 \le n \le N} (x_{n} )}}{{\mathop {\hbox{max} }\limits_{1 \le n \le N} (x_{n} ) - \mathop {\hbox{min} }\limits_{1 \le n \le N} (x_{n} )}} $$
(14)
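As a small worked example, min-max normalization subtracts the minimum and divides by the range (maximum minus minimum), so the smallest record maps to 0 and the largest to 1. The function name `min_max_normalize` and the sample values are illustrative; the guard against a constant column is an added assumption, since the ratio is undefined when the maximum equals the minimum.

```python
import numpy as np

def min_max_normalize(x):
    """Scale the values of one factor to the interval [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # The range is zero, so the normalized value is undefined.
        raise ValueError("min-max normalization is undefined for a constant column")
    return (x - lo) / (hi - lo)

z = min_max_normalize([12.0, 30.0, 48.0])  # illustrative values
```

For the three sample values, the minimum is 12 and the range is 36, so the middle value 30 maps to (30 - 12) / 36 = 0.5.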

4.2 Practical Experimental Results and Discussions

For the evaluation of the proposed ConvLSTM method, multi-layer perceptron (MLP) neural networks [24,25,26], CNNs, and LSTM networks are implemented and used to predict the air quality in the selected cities in Fujian Province. In order to compare the performance of each prediction method comprehensively and objectively, MAE, RMSE and MAPE are used as accuracy metrics. The value of the n-th actual data is defined as \( o_{n} \), and the value of the n-th predicted data is defined as \( p_{n} \). These three accuracy metrics are calculated by Eqs. (15), (16) and (17), respectively. The practical experimental results based on these three accuracy metrics are shown in Tables 1, 2 and 3.

Table 1. The MAEs of each prediction method for each city
Table 2. The RMSEs of each prediction method for each city
Table 3. The MAPEs of each prediction method for each city
$$ MAE = \frac{1}{N}\sum\limits_{n = 1}^{N} {\left| {o_{n} - p_{n} } \right|} $$
(15)
$$ RMSE = \sqrt {\frac{{\sum\limits_{n = 1}^{N} {(o_{n} - p_{n} )^{2} } }}{N}} $$
(16)
$$ MAPE = \frac{1}{N}\sum\limits_{n = 1}^{N} {\frac{{\left| {o_{n} - p_{n} } \right|}}{{o_{n} }}} $$
(17)
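The three metrics in Eqs. (15), (16) and (17) can be computed directly from the arrays of actual and predicted values. The sketch below uses made-up numbers purely to show the arithmetic; the function names are illustrative. MAPE is returned as a fraction rather than a percentage, following Eq. (17), and it requires every actual value \( o_{n} \) to be non-zero.

```python
import numpy as np

def mae(o, p):
    """Mean absolute error, Eq. (15)."""
    return float(np.mean(np.abs(o - p)))

def rmse(o, p):
    """Root mean square error, Eq. (16)."""
    return float(np.sqrt(np.mean((o - p) ** 2)))

def mape(o, p):
    """Mean absolute percentage error as a fraction, Eq. (17); o must be non-zero."""
    return float(np.mean(np.abs(o - p) / o))

o = np.array([10.0, 20.0, 40.0])  # actual values (made up)
p = np.array([12.0, 18.0, 40.0])  # predicted values (made up)

scores = {"MAE": mae(o, p), "RMSE": rmse(o, p), "MAPE": mape(o, p)}
```

For these values the absolute errors are 2, 2 and 0, so the MAE is 4/3, the RMSE is the square root of 8/3, and the MAPE is (0.2 + 0.1 + 0)/3 = 0.1.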

Ranked from lowest to highest, the MAEs are obtained by ConvLSTM (6.4579), MLP (7.0221), CNN (7.0906) and LSTM (7.1125). Furthermore, the MAPEs from lowest to highest are obtained by ConvLSTM (0.3152), MLP (0.3577), LSTM (0.3595) and CNN (0.3681). Finally, the RMSEs from lowest to highest are obtained by ConvLSTM (10.1450), CNN (10.7404), LSTM (10.8044) and MLP (10.8077). In this comparison, ConvLSTM achieves the lowest error on all three metrics, which indicates the superiority of the ConvLSTM network in predicting PM2.5 concentration.

A case study of air quality prediction by the proposed ConvLSTM network for each city is shown in Fig. 7. The actual records are illustrated as blue polylines, and the predicted records are shown as orange polylines. In the experiments, the PM2.5 concentrations predicted by the proposed ConvLSTM network are roughly consistent with the actual values. Some of the large errors may be caused by human behaviors. For instance, a large number of fireworks and firecrackers are set off during the Spring Festival and on New Year’s Eve, which raises the PM2.5 concentration.

Fig. 7. The prediction results by the proposed ConvLSTM method

5 Conclusions and Future Work

A deep multi-layer neural network model based on CNN and LSTM (i.e., the ConvLSTM method) is proposed to analyze the spatio-temporal features for predicting air quality in multiple cities. A case study of the prediction of PM2.5 concentration in the Fujian Province of the People’s Republic of China is given in this study, in which the proposed model predicts the future concentration of PM2.5 from the past concentration of PM2.5. In the experiments, the performance of each prediction method (i.e., MLP, CNN, LSTM, and ConvLSTM) was evaluated by MAE, MAPE and RMSE. The practical experimental results show that the proposed model combines the advantages of CNN and LSTM for analyzing the spatio-temporal features and improves the accuracy of PM2.5 concentration prediction.

In the future, the proposed method can be applied to the prediction and control of air quality in other cities. Furthermore, human behaviors can be detected and taken into account to improve air quality prediction.