A Hybrid Deep Learning Model for Multi-step Ahead Prediction of PM2.5 Concentration Across India

Goswami, Pranjol; Prakash, Manoj; Ranjan, Rakesh Kumar; Prakash, Amit

doi:10.1007/s10666-023-09902-4

Research
Published: 03 June 2023

Volume 28, pages 803–816, (2023)

Environmental Modeling & Assessment


Pranjol Goswami¹, Manoj Prakash², Rakesh Kumar Ranjan³ & Amit Prakash¹


Abstract

Fine particulate matter (PM_2.5) concentration in ambient air has become a major concern across the globe. All major cities of India have reported elevated concentrations of PM_2.5, with severe consequences for the health, economy and ecosystems of the region. It is therefore imperative to develop adequate tools for forecasting particulate matter concentration. Most previous studies have focused on a single-step prediction horizon, which limits their use. In the present work, a hybrid model is proposed to forecast multi-step ahead concentrations of PM_2.5 in ambient air across India, covering different agroclimatic zones. The hybrid model is an encoder-decoder-based sequence-to-sequence framework built with convolutional long short-term memory (ConvLSTM), bidirectional LSTM (BiLSTM) and a 3D convolutional neural network (3D-CNN). The model was tested across 26 Indian cities covering 13 major agroclimatic zones of India. Its performance was also analysed for consecutive-hour sequential prediction, taking the last 24 h of data as input. The model errors were further compared with the signal-to-noise ratio of the data to explore the reason for variations in model performance. A distinct trend was found between the signal-to-noise ratio and the model errors: as noise increases, model performance suffers. Overall, the model was found to be stable, as its performance errors vary little across different time horizons. The proposed model has the potential to be used for long-term forecasting by incorporating other predictor variable series.


1 Introduction

Human activities have increased the level of air pollution, and its adverse consequences are suffered by all. Researchers across the world are concerned with the challenging problem of high ambient aerosol concentrations that directly affect health, the economy and the climate [1, 2]. A World Health Organization (WHO) report [3] suggests that 90% of the world's population breathes air that exceeds the WHO air quality recommendations, and that every year around 7 million people lose their lives from exposure to ambient air pollution. A survey conducted by WHO in 2016 reported that fourteen of the world's twenty most polluted cities are in India. Reliable forecasting of PM_2.5 concentration is therefore imperative to forewarn the public as well as policy planners so that corrective measures can be taken. In the last couple of decades, different forecasting methods such as deterministic models [4,5,6,7,8], statistical models [9] and artificial neural network (ANN)-based models have been explored by researchers across the world. In recent years, ANN-based models have gained prominence owing to their ability to handle the linear and nonlinear variability present in environmental data with a limited set of variables. The general regression neural network [10], neuro-fuzzy models [11], feed-forward networks, radial basis function networks, the multilayer perceptron (MLP) model [12], the back-propagation neural network [13, 14] and the recurrent neural network (RNN) [15, 16] with error back-propagation learning are some of the widely used ANN architectures for air pollution forecasting [17,18,19,20]. With advances in computing technology, the emerging field of machine learning and artificial intelligence has attracted many researchers to apply deep learning techniques (a subset of machine learning) to diverse problems of societal relevance.

Among the many deep learning architectures, the long short-term memory (LSTM) network has been the most widely used for air quality time series forecasting in the recent past because of its ability to capture long- and short-term dependencies [21,22,23,24,25,26]. Different hybrid network architectures have also been employed for different aspects of air quality studies [27,28,29]. The LSTM network was further used to ascertain long-term and short-term dependencies [30], and the effectiveness of encoder-decoder networks for building prediction machines with time-series data has also been studied [31, 32]. A hybrid model consisting of convolutional LSTM and CNN was used to predict the concentration of particulate matter [33, 34], where the convolutional LSTM handled sequential spatiotemporal information and the CNN extracted temporal features in parallel. Similarly, a transfer learning BiLSTM model was examined for hourly, daily and weekly prediction of air quality [35].

Most of these studies focused on predicting air quality at a single (or a few) monitoring stations, which limits the wider use of such modelling techniques. India is a very diverse country (the 7th largest by area and the 2nd most populous in the world) extending from 8°4′ to 37°6′ latitude and 68°7′ to 97°25′ longitude, so models trained on selected cities may not be efficient for use in other cities, and the model architecture needs constant alteration for use elsewhere. A uniform, simpler model is therefore required that is less data intensive and applicable across India without structural changes. The present study aims to develop a hybrid deep learning network with a uniform model architecture that is applicable to all monitoring stations across India. Besides developing and testing the model for multi-step ahead forecasting, the study also ascertains the dependence of model performance on the relative variance of the input data, measured in terms of the signal-to-noise ratio (SNR). Section 2 details the site description and data pre-processing; model development and architecture are presented in Sect. 3. Results and discussion are given in Sect. 4 and the conclusion in Sect. 5.

2 Site and Data Description

For the present study, air pollution data were acquired for 26 cities across the country from the Central Pollution Control Board (CPCB), Government of India (http://www.cpcb.nic.in/) (Fig. 1). For ease of analysis, India was subdivided into 15 agroclimatic zones (Table 1) as per the India Meteorological Department (IMD), Government of India (GOI) classification [36]. No data were available for two regions, namely WH (Union Territory of Jammu and Kashmir and Union Territory of Ladakh) and the IR of India (Andaman and Nicobar Islands, Lakshadweep Islands); hence, these regions were not included in the present study. The data were collected for the period from 1 January 2015 to 31 May 2020, depending upon data availability. Details of the data used in this work are presented in Table 1.

Table 1 Agroclimatic zones and data description


2.1 Data Pre-Processing

Data acquired from secondary sources often contain outliers and missing values. Pre-processing of the data to eliminate or minimise such errors is therefore imperative. In the present study, unreasonably high values were considered outliers and were replaced using the linear interpolation method [37]. The same technique was used to fill the missing values present in the data.
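The interpolation step can be illustrated with a minimal sketch. The function name `fill_by_linear_interpolation`, the `upper_limit` threshold used to flag "unreasonably high" values, and the choice to copy the nearest valid value at the series edges are all assumptions made for illustration; the paper does not state its outlier threshold or edge handling.

```python
import math


def fill_by_linear_interpolation(series, upper_limit):
    """Replace missing values (None/NaN) and values above `upper_limit`
    by linear interpolation between the nearest valid neighbours.

    `upper_limit` is a hypothetical outlier threshold, not a value
    taken from the paper.
    """
    def is_valid(v):
        return v is not None and not math.isnan(v) and v <= upper_limit

    valid_idx = [i for i, v in enumerate(series) if is_valid(v)]
    if not valid_idx:
        raise ValueError("series contains no valid observations")

    filled = list(series)
    for i, v in enumerate(series):
        if is_valid(v):
            continue
        left = max((j for j in valid_idx if j < i), default=None)
        right = min((j for j in valid_idx if j > i), default=None)
        if left is None:
            # Assumption: a leading gap is filled with the first valid value.
            filled[i] = series[right]
        elif right is None:
            # Assumption: a trailing gap is filled with the last valid value.
            filled[i] = series[left]
        else:
            # Straight line between the two nearest valid observations.
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


hourly = [10.0, None, 30.0, 9999.0, 50.0]
cleaned = fill_by_linear_interpolation(hourly, upper_limit=1000.0)
```

Here the missing second value and the flagged fourth value are both replaced by the midpoint of their neighbours.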

3 Model Development

3.1 Network Architecture

In the present study, an ANN architecture with a deep learning framework was adopted for multi-step (eight-step) ahead forecasting. The model architecture is an encoder-decoder-based (Fig. 2) [38] sequence-to-sequence hybrid model, which has three main components, namely,

1. 3D-CNN: 3-dimensional convolutional neural network model
2. ConvLSTM: convolutional long short-term memory
3. BiLSTM: bidirectional long short-term memory

Essentially, 3D-CNN and LSTM networks are the backbone of this architecture. The LSTM model is widely applied for time series prediction because of its ability to store information in self-recurrent cells that can be retrieved at different time steps. The LSTM network performs exceedingly well in reducing the Gaussian noise present in the data [39] but is unable to filter out non-Gaussian noise, which is inherently present in the data set. To address this shortcoming, a BiLSTM network was applied to reduce the overfitting of noisy data. Besides, the ConvLSTM model performs better on datasets having long-duration sequential features with multiple temporal information [40]. Furthermore, the 3D-CNN model is an advancement over the 2D-CNN model, with a better processing ability for large contextual data that helps in extracting spatiotemporal features. The ability of the 3D-CNN to extract features from large sequential data into different time-frequency domains was exploited to reduce the noise present in the data as well as to abstract features that can be stored and fed into the next fully connected layer. The schematic diagram of the hybrid model architecture is presented in Fig. 3.

In the model architecture, the ConvLSTM encoder layer generates a feature map that is further refined and filtered by a second ConvLSTM network with a batch normalisation layer. The output is fed into the 3D-CNN to extract spatiotemporal patterns from the state matrix. The output of the 3D-CNN layer then feeds into the decoder, which has four BiLSTM networks. The BiLSTM decoder generates the entire output sequence, containing values for 8 h. The first, second and third fully connected layers act as interpretation layers for each time step of the output sequence, and the last fully connected layer is the final output layer of the model, which generates the predicted values for the 8-step-ahead prediction. A dropout layer was used after the first BiLSTM to minimise overfitting. Each filter in a convolutional layer abstracts one feature. Since the initial network layers receive the noisy raw data, fewer filters were used there to capture only the basic features. In the subsequent layers, the number of filters was increased to capture deeper abstractions of features. A smaller kernel size can capture more features than a larger one. We applied 64 filters of size (1,7) in the first ConvLSTM layer. In the second ConvLSTM layer, the number of filters was increased to 128, and the kernel size was decreased to (1,3). Odd kernel sizes were used to maintain symmetry around the centre or origin of the abstraction layer.

The BiLSTM layer used in the model acts as a decoder and generates an output of multiple values in a sequence. Cross-validation and out-of-sample testing techniques were employed to evaluate the model performance. A similar model framework was earlier applied by [41] for learning smart manufacturing problems using time series data. However, they used a stacked ConvLSTM as the encoder and a stacked BiLSTM as the decoder of an autoencoder framework. In the present study, the stacked ConvLSTM layer outputs were fed into the 3D-CNN layer. Air pollution time series data are the net outcome of a complex interplay between different stochastic and dynamic processes having different characteristic frequencies [16]. Therefore, the 3D-CNN was used to take these characteristic features into account, enhancing the ability of the network to make better predictions.

To forecast PM_2.5 values for the next 8 h, we used 3 sequences of 8-h duration, i.e., 24 h of data as the input sequence with the next 8 h of data as the target. With non-overlapping sequences, however, the number of instances would be rather limited for training a deep learning model. Therefore, an overlapping moving window method was used on the time series data to generate more training instances. This method is a modified rolling window method as proposed in [42] and later adopted for air pollution studies by [35]. Here, a large training dataset was generated by shifting the entire sequence by one step at a time (Fig. 4), as described below.

Let us consider a time series u(t) = {u₁, u₂, u₃, …, u_t}. To forecast the next k values of the sequence, ŝ = (ŝ₁, ŝ₂, …, ŝ_k), equivalent to (u_{t+1}, u_{t+2}, …, u_{t+k}), from the most recent observations within a moving window of fixed size w, the forecast is

$$\widehat{s}= (\widehat{s}_{1}, \widehat{s}_{2},..., \widehat{s}_{k}) = f(u_{t\,-\,w\,+\,1}, u_{t\,-\,w\,+\,2},..., u_{t})$$

When the above operation is applied to a univariate time series of length N, it generates a sequence-to-sequence prediction with an input set U ∈ R^(n×w) and an output set S ∈ R^(n×k). Here, n is the size of the training data, given by

$$n = (N - w - k + 1).$$

As evident from the above description, the entire sequence of 24-h time series data (h1 to h24) was converted into 3 × 8 sequence internally and mapped to the next 8-h values (h25 to h32). In the next step, the data were shifted by one value, and now, the data (h2 to h25) were mapped to the next 8-h sequence (h26 to h33) and so on.
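The overlapping moving window described above can be sketched in plain Python. The function names `make_windows` and `to_subsequences` are hypothetical, and the series is a stand-in where the value at hour h is simply h; only the window size w = 24, horizon k = 8, one-step shift and the 3 × 8 split come from the text.

```python
def make_windows(series, w=24, k=8):
    """Build overlapping (input, target) pairs by shifting one step at a time.

    Each input holds w consecutive values and each target the next k values,
    which yields n = N - w - k + 1 training instances for a series of length N.
    """
    n = len(series) - w - k + 1
    if n <= 0:
        raise ValueError("series is too short for the chosen w and k")
    inputs = [series[i:i + w] for i in range(n)]
    targets = [series[i + w:i + w + k] for i in range(n)]
    return inputs, targets


def to_subsequences(window, steps=3):
    """Split one input window into `steps` equal sub-sequences (24 h -> 3 x 8)."""
    length = len(window) // steps
    return [window[j * length:(j + 1) * length] for j in range(steps)]


# Stand-in data: hour labels h1..h40 instead of real PM2.5 values.
hours = list(range(1, 41))
inputs, targets = make_windows(hours, w=24, k=8)
first_input_3x8 = to_subsequences(inputs[0], steps=3)
```

With N = 40 this gives 9 instances: the first maps h1 to h24 onto h25 to h32, and the second maps h2 to h25 onto h26 to h33.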

Since the model was used at many stations situated at different geographical locations in India, the model parameters were generalised so that they give near-optimum results for most of the stations.

3.2 Hyperparameters

Hyperparameters in machine learning are special kinds of parameters that play a significant role in determining the performance of a deep learning model. The hyperparameters used in this paper are listed in Table 2.

Table 2 Optimal architecture of the parameters used in the study


The most widely used activation function for deep learning is the rectified linear unit (ReLU), f(x) = max(0, x). A newer activation function named 'swish', f(x) = x · sigmoid(x), was proposed by the Google Brain team [43] and performs better in deeper networks. Hence, in the present study, the swish activation function was used, except in the BiLSTM part, where the 'tanh' activation function was applied.
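The two activation functions can be compared with a small stdlib-only sketch. This illustrates the formulas only and is not the Keras implementation used in the study; the numerically stable form of `sigmoid` is an implementation choice made here.

```python
import math


def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x):
    """Swish: f(x) = x * sigmoid(x)."""
    return x * sigmoid(x)


# Unlike ReLU, swish is smooth and lets small negative values through.
samples = [(x, relu(x), swish(x)) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

For large positive inputs swish approaches the identity, as ReLU does, while for negative inputs it returns a small negative value instead of exactly zero.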

There exist different types of optimizing algorithms such as Gradient Descent, Stochastic Gradient Descent, Momentum Based Gradient Descent, Adaptive Moment Estimation (Adam), Nesterov Accelerated Gradient (NAG) and Root Mean Square Propagation (RMSProp) to minimize the loss function during the training of a machine learning model. In the present work, Adaptive Moment Estimation (Adam) optimizer was used due to its adaptive nature and combined momentum component [44].

3.3 Model Evaluation

The effectiveness of the model was tested following a walk-forward validation method. Statistical error metrics, namely root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE), were used for performance evaluation. The equations for the error metrics are as follows:

$$RMSE=\sqrt{\frac{1}{n}{\textstyle\sum }_{i\,=\,1}^{n}{\left({A}_{i}-{P}_{i}\right)}^{2}}$$

$$MAE= \frac{1}{n}\textstyle\sum_{i\,=\,1}^{n}|{A}_{i}-{P}_{i}|$$

$$MAPE=\left(\frac{1}{n}\textstyle\sum_{i\,=\,1}^{n}\frac{\left|{A}_{i}\,-\,{P}_{i}\right|}{\left|{A}_{i}\right|}\right)\times 100$$

where A_i is the observed value, P_i is the model-predicted value and n is the total number of samples.
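The three error metrics can be written directly from their definitions. The sketch below uses made-up observed and predicted values purely for illustration; MAPE is expressed as a percentage, so the mean relative error is multiplied by 100, and it is undefined when any observed value is zero.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed (A) and predicted (P) values."""
    n = len(observed)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(a - p) for a, p in zip(observed, predicted)) / n


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (requires non-zero observations)."""
    n = len(observed)
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(observed, predicted)) / n


# Illustrative values only, not data from the study.
observed = [100.0, 200.0, 50.0]
predicted = [110.0, 180.0, 50.0]
scores = (rmse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
```

RMSE and MAE carry the units of the data (µg/m³ for PM_2.5), whereas MAPE is unitless.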

All models were developed on a single HP-Z6-G4 workstation in a Linux environment. An NVIDIA Quadro P2200 GPU was used with Python 3.8 and the TensorFlow and Keras libraries to run the model.

4 Result and Discussions

4.1 Statistical Distribution Analysis

The time series data were subjected to normality tests using KS statistics, the Shapiro-Wilk test and the Jarque-Bera test to understand the nature of the time series. The results of the normality tests (Supplementary Table T1) indicate the non-normal nature of the PM_2.5 dataset across the different agroclimatic zones of India. The seasonal analysis of the data revealed a maximum average PM_2.5 concentration of 195.5 μg/m³ during the winter season and a minimum of 113.8 μg/m³ during the monsoon season. Similar observations were also reported by many authors in the past [11]. During the monsoon season, the WCPG area witnessed the minimum average concentration and WD the maximum. During the winter season, the EH region has the minimum average concentration and the MGP the maximum. It is worth mentioning that the TGP region has the maximum average concentration during the post-monsoon season, whereas the MGP region has the maximum average concentration in the pre-monsoon season. The seasonal contour plots (Fig. 5) revealed that north-western India has a higher concentration of PM_2.5, especially during the monsoon and post-monsoon seasons. In the winter and pre-monsoon seasons, the highest concentration was confined to the Indo-Gangetic plain region.

A detailed statistical distribution analysis was carried out to ascertain the nature of the distribution prevalent in the PM_2.5 data in the different agroclimatic zones. It is important to understand the statistical distribution of any data, as it determines the effectiveness of the model performance measures used in forecasting problems. To ascertain the best fit, each dataset was tested against 7 distributions commonly applied in air pollution studies, namely, Normal, Log Normal, Logistic, Laplace, Weibull, Gamma and Beta. The best-fitting distribution (Supplementary Fig. F1(A-Z)) was selected based on the minimum sum of squared errors criterion. The results (Table 3) reveal the predominance of the Gamma and Beta distributions at 24 out of 26 sites. Overall, PM_2.5 concentration follows the Gamma distribution at 14 sites, the Beta distribution at 10 sites and the log normal distribution at the remaining two sites. It is pertinent to mention that all the sites in UGP, TGP and EPH have the Gamma distribution as the best fit. Similarly, LGP, ECPH and GPH follow the Beta distribution only. The log normal distribution was observed only at Shillong in the EH region and at Patna in the MGP. In the rest of the regions, mixed distribution fitting results were obtained.

Table 3 Best fit Statistical distribution parameters values


4.2 Model Performance Evaluation Results

The model performance evaluations were carried out using commonly used error functions, namely RMSE, MAE and MAPE. RMSE and MAE are scale dependent and are not optimal for comparing data sets whose means differ widely. MAPE, in contrast, is a unitless, scale-independent function that is more suitable for model comparisons, even with data that have large variance and contain extreme values. The model's performance (Fig. 6) in terms of RMSE for 8-consecutive-hour advance predictions ranged from a minimum of 7.09 at Shillong in the EH region to a maximum of 53.81 at Patna in the MGP. Similarly, in terms of MAE, a minimum of 5.41 and a maximum of 34.09 were obtained at Shillong and Patna, respectively. In terms of MAPE, a minimum of 18.6% and a maximum of 52.7% were observed at Hyderabad and Chennai, respectively. The first-step prediction errors in terms of RMSE were less than 10 µg/m³ at nine cities, whereas those at Shillong in EH, Howrah in LGP, and Mandideep and Nagpur in CPH were between 10 and 15 µg/m³. In terms of MAE, 15 out of 26 cities have values below 10 µg/m³, and overall 24 out of 26 cities have MAE values ≤ 15 µg/m³. Only Patna (MAE = 20.52) in the MGP and Talcher (MAE = 20.29) in EPH have MAE values above 15 µg/m³. Similarly, 11 out of 26 sites have MAPE values less than 30%, and 20 sites show MAPE values < 35%. Apart from Jodhpur in WD and Chennai in the ECPH zone, the sites have MAPE values less than 40% (detailed results are presented in Supplementary Table T2). Overall, for predicting up to 8-h ahead concentrations, the model performance was found to be relatively better in the central, southern and western regions of India than in the northern and eastern regions. The robustness of the prediction ability of the proposed model framework is also evident from the 1-h ahead and 8-h average PM_2.5 concentration estimates.
The performance evaluation for the 1-h ahead and 8-h average prediction horizons is essential, as regulatory air quality is mostly reported in this temporal range. The minimum values of RMSE, MAE and MAPE for 1-h ahead concentration were 5.81 (at Shillong), 3.92 (at Aurangabad) and 10.8 (at Howrah), respectively. The maximum RMSE values of 41.384 and 29.04 were obtained for the 1-h and 8-h average periods at Patna and Talcher, respectively. Overall, RMSE values below 20 were observed at 18 sites for the 1-h ahead horizon and at 13 sites for the 8-h average horizon. In the case of MAE, the minimum and maximum values for both the 1-h and 8-h average horizons were again found at Shillong and Patna, respectively. The results were more uniform across India in terms of MAE, as only 3 sites, namely Delhi, Talcher and Patna, have MAE values above 20 µg/m³.

Across agroclimatic zones (Table 4), the best model performance for 1-h ahead prediction in terms of RMSE was achieved for SPH (7.6), followed by CPH (8.4) and WPH (9.4). Overall, 7 zones have RMSE values less than 20, and 3 zones each have RMSE values in the ranges 20 to 30 and 30 to 40 µg/m³. In terms of MAE, a similar trend in the error was observed, with the minimum value (5.4) obtained at SPH and the maximum (16.3) at MGP. For 8-consecutive-hour advance predictions, the same pattern was obtained, with minimum RMSE and MAE values of 11.1 and 8.0, respectively, observed at SPH, and maximum RMSE (40.4) and MAE (25.8) obtained for MGP. However, the results for MAPE were slightly different, with the minimum error observed at LGP (10.8%) for 1-h ahead prediction and at EH (21.3%) for cumulative 8-consecutive-hour ahead prediction.

Table 4 Model performance agroclimatic zone wise


The forecasting results exhibit spatial variability in model performance. As evident from the heatmap (Fig. 7) for the multi-step hourly forecast, the regions mostly along the southeast, south, centre and southwest of India have better model performance in terms of MAE values. In most parts of North India, relatively poor model performance was observed, except for Jamshedpur, Agra, Amritsar and Shillong. It is worth mentioning that, for the first step, model performance is best across India except at Patna and Talcher. To date, a cross-country analysis of model performance has not been attempted in India, although ANNs with deep learning architectures have been applied to selected pollution hotspots in India by many researchers [45, 46]. The results obtained at different locations and the corresponding observed values were further subjected to statistical distribution analysis. The best-fit distribution was found to be the same as that of the original training data at each location in India. (The statistical distribution analysis plots for the test results are not displayed here, as they are the same as those of the observed training dataset.)

4.3 Effects of Data Length and SNR on Model Performance

The variability in the forecasting results across the different agroecosystems of India prompted us to examine the effect of data length and of the deterministic signal and random components present in the PM_2.5 time series, using correlation analysis and signal-to-noise ratio (SNR) measurements. SNR quantifies the amount of desired or good information relative to unwanted or false information in each data series. In the present study, SNR was calculated [47] (Table 5) for each pre-processed dataset through the following equation:

$$SNR=\frac{\mu }{\sigma }$$

where µ is the mean and σ is the standard deviation of the time series data. This form of SNR is used in situations where all values are non-negative. The scatter plots and trend lines between data length and model performance error (Supplementary Fig. F2) do not show any relationship between them, indicating minimal or no effect of data length on model performance. A scatter plot (Fig. 8) of MAE vs. SNR for 1-h ahead and 8-h cumulative forecasting reveals a sharply decreasing trend: as the noise component reduces, the model error declines significantly for both forecasting horizons. It is pertinent to mention that when SNR is greater than ~ 1.5, the error variance reduces significantly, i.e. model performance improves significantly. The variability in the results across the sites in India may be attributed to the level of noise present in the data series. In northern India, the pollution sources vary significantly because of high population density and traffic loads, thereby increasing the relative variance in the data set. The Indo-Gangetic plain (IGP) region is known for large-scale farming and agricultural waste burning. Westerly winds are dominant in this region throughout the year except during the monsoon season, when easterlies bring monsoon rains. Dust storms driven by the westerlies, together with agricultural burning, bring large uncertainty in the dust load over the area. The poor performance of the model for multi-step ahead forecasts in this region may be attributed to these weather-induced uncertainties.
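The SNR defined above is the ratio of the mean to the standard deviation of a series and can be computed with a short sketch. The function name `signal_to_noise_ratio` and the sample values are illustrative, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which was used.

```python
import statistics


def signal_to_noise_ratio(series):
    """SNR = mean / standard deviation for a non-negative series.

    Assumption: the population standard deviation is used.
    """
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series)
    if sigma == 0:
        raise ValueError("SNR is undefined for a constant series")
    return mu / sigma


# Illustrative values only: same mean, different spread.
steady = [48.0, 50.0, 52.0, 50.0]
noisy = [10.0, 90.0, 20.0, 80.0]
snr_steady = signal_to_noise_ratio(steady)
snr_noisy = signal_to_noise_ratio(noisy)
```

Both series have a mean of 50, but the series with the larger spread has the lower SNR, which is the situation the text associates with larger forecast errors.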

Table 5 Signal-to-noise ratio


4.4 Comparison with Other Studies

The comparative analysis of the results reveals a significant improvement in model errors in terms of RMSE when compared with other models that produce multiple outputs in a sequence. [48] applied a multi-output autoencoder model for forecasting PM_2.5 and PM₁₀ concentrations in Beijing and obtained a best RMSE value of 39, although that model used multiple inputs, such as meteorological variables, in addition to the time series data of the pollutant concentrations. [49] applied an ANN model and achieved an error of 0.0191 in terms of MSE with a correlation coefficient of 0.7301. However, the prediction horizon of that model is single-step only, and its viability for a multi-step ahead prediction horizon was not examined. Similarly, [46] evaluated a simple feed-forward artificial neural network model for the Kolkata region in eastern India using multivariate input parameters to predict single-step PM_2.5 concentrations during the COVID-induced lockdown period and reported an RMSE value of 3.74 and an MAE value of 1.14. [50] tested 8 different models, including Stacked LSTM, LSTM-autoencoder, BiLSTM and Conv2DLSTM models, on different air pollutants in Kolkata and observed RMSE and MAE values of more than 10 µg/m³. [51] achieved an MAE value of ~ 15 for PM_2.5 forecasting in Delhi. Furthermore, [48] reported RMSE values of 31, 56 and 68 for 3-h, 5-h and 9-h ahead prediction using an ANN model for the Talcher station in India. Using LSTM and BiLSTM, they reported RMSE values of 26, 41 and 80 and of 42, 155 and 168, respectively, in comparison with the RMSE values of 29.04 and 40.41 for the 1-h and 8-h ahead prediction horizons in the present study. In the present study, the proposed model achieved RMSE values ranging from 7.09 to 53.81 across different data centres spread over 13 agroclimatic zones in India. Out of the total 26 locations, 18 have RMSE values less than 30.
The results indicate that the model is robust enough to be applied to different locations in India without alterations.

5 Conclusion

Air pollution data mostly contain seasonal trends, multiple periodicities and stochastic components. To address the multiple complexities present in air pollution data, a hybrid deep learning model was formulated by integrating a convolutional LSTM, a 3D convolutional neural network and a bidirectional LSTM network, and its forecasting efficiency was examined across India on univariate PM_2.5 time series data. The PM_2.5 data series across India share a common character, as all of them rejected the null hypothesis of normal distribution. They largely follow either the Gamma or the Beta distribution, apart from Patna in the MGP and Shillong in the EH region, which follow the log normal distribution. The results obtained for 8-h ahead sequential prediction reveal significant variations across the regions, with minimum (7.09) and maximum (53.81) RMSE values obtained at Shillong in EH and Patna in MGP, respectively. Similar results (minimum: 5.41 and maximum: 34.09) were also found in terms of MAE at Shillong and Patna, respectively. In terms of MAPE, the minimum and maximum values were 18.6% and 52.7% at Hyderabad and Chennai, respectively. The robustness of model performance was evident from the small variations observed in the model errors for 1-h ahead and 8-h sequential forecasts. The results (MAE) were further analysed against SNR, and a strong association was found between the level of error and the SNR values: as SNR decreases, model performance declines (MAE values increase). The variations in SNR may be attributed to anthropogenic activities in the region. The results reveal weak performance in and around the IGP in comparison with the rest of India. The model has the potential to be utilised for policy and planning for pollution control. It could be a useful tool for forewarning about impending air pollution events.

Availability of Data and Materials

The data used in this study were procured from the Central Pollution Control Board (CPCB), Government of India (http://www.cpcb.nic.in/). This is a public repository maintained by the Government of India.

References

Li, X., Zhang, X., Zhang, Z., Han, L., Gong, D., Li, J., Wang, T., Wang, Y., Gao, S., Duan, H., & Kong, F. (2019). Air pollution exposure and immunological and systemic inflammatory alterations among schoolchildren in China. Science of the Total Environment, 657, 1304–1310. https://doi.org/10.1016/j.scitotenv.2018.12.153
Chen, Z., Cui, L., Cui, X., Li, X., Yu, K., Yue, K., Dai, Z., Zhou, J., Jia, G., & Zhang, J. (2019). The association between high ambient air pollution exposure and respiratory health of young children: a cross sectional study in Jinan, China. Science of the Total Environment, 656, 740–749. https://doi.org/10.1016/j.scitotenv.2018.11.368
World Health Organization. (n.d.). Ambient air pollution: a global assessment of exposure and burden of disease. World Health Organization. https://apps.who.int/iris/handle/10665/250141
Coats C. J., Jr. (1996). High-performance algorithms in the Sparse Matrix Operator Kernel Emissions (SMOKE) modeling system. In Proceedings of Ninth AMS Joint Conference on Applications of Air Pollution Meteorology with A&WMA. American Meteor Society, GA (pp. 584-588). https://www.osti.gov/biblio/422986
Olatinwo, R. O., Prabha, T., Paz, J. O., Riley, D. G., & Hoogenboom, G. (2010). The weather research and forecasting (WRF) model: Application in prediction of TSWV-vectors populations. Journal of Applied Entomology, 135(1–2), 81–90. https://doi.org/10.1111/j.1439-0418.2010.01539.x
Vautard, R., Builtjes, P. J. H., Thunis, P., Cuvelier, C., Bedogni, M., Bessagnet, B., Honore, C., Moussiopoulos, N., Pirovano, G., Schaap, M., Stern, R., Tarrason, L., & Wind, P. (2007). Evaluation and intercomparison of Ozone and PM10 simulations by several chemistry transport models over four European cities within the CityDelta project. Atmospheric Environment, 41, 173–188. https://doi.org/10.1016/j.atmosenv.2006.07.039
Stern, R., Builtjes, P. J. H., Schaap, M., Timmermans, R., Vautard, R., Hodzic, A., Memmesheimer, M., Feldmann, H., Renner, E., Wolke, R., & Kerschbaumer, A. (2008). A model inter-comparison study focussing on episodes with elevated PM10 concentrations. Atmospheric Environment, 42(19), 4567–4588.
Saide, P. E., Carmichael, G. R., Spak, S. N., Gallardo, L., Osses, A. E., Mena-Carrasco, M. A., & Pagowski, M. (2011). Forecasting urban PM10 and PM2.5 pollution episodes in very stable nocturnal conditions and complex terrain using WRF–Chem CO tracer model. Atmospheric Environment, 45(16), 2769–2780. https://doi.org/10.1016/j.atmosenv.2011.02.001
Goyal, P., Chan, A. T., & Jaiswal, N. (2006). Statistical models for the prediction of respirable suspended particulate matter in urban cities. Atmospheric Environment, 40(11), 2068–2077. https://doi.org/10.1016/j.atmosenv.2005.11.041
Antanasijević, D. Z., Pocajt, V. V., Povrenović, D. S., Ristić, M. Đ., & Perić-Grujić, A. A. (2013). PM10 emission forecasting using artificial neural networks and genetic algorithm input variable optimization. Science of the Total Environment, 443, 511–519. https://doi.org/10.1016/j.scitotenv.2012.10.110
Mishra, D., & Goyal, P. (2016). Neuro-fuzzy approach to forecast NO2 pollutants addressed to air quality dispersion model over Delhi, India. Aerosol and Air Quality Research, 16(1), 166–174. https://doi.org/10.4209/aaqr.2015.04.0249
Paschalidou, A., Karakitsios, S., Kleanthous, S., & Kassomenos, P. (2011). Forecasting hourly PM10 concentration in Cyprus through artificial neural networks and multiple regression models: Implications to local environmental management. Environmental Science and Pollution Research International, 18, 316–327. https://doi.org/10.1007/s11356-010-0375-2
Kolehmainen, M., Martikainen, H., & Ruuskanen, J. (2001). Neural networks and periodic components used in air quality forecasting. Atmospheric Environment, 35, 815–825. https://doi.org/10.1016/S1352-2310(00)00385-X
Kang, Z., Qu, Z., Kim, M. H., Kim, Y. S., Lim, J., Kim, J. T., Sung, S. W., & Yoo, C. (2017). Data-driven prediction model of indoor air quality in an underground space. 2017 2nd IEEE International Conference on Computational Intelligence and Applications (ICCIA), 27(6), 1675–1680. https://doi.org/10.1007/s11814-010-0313-5
Feng, Y., Zhang, W., Sun, D., & Zhang, L. (2011). Ozone concentration forecast method based on genetic algorithm optimized back propagation neural networks and support vector machine data classification. Atmospheric Environment, 45(11), 1979–1985. https://doi.org/10.1016/j.atmosenv.2011.01.022
Prakash, A., Kumar, U., Kumar, K., & Jain, V. (2011). A wavelet-based neural network model to predict ambient air pollutants’ concentration. Environmental Modeling & Assessment, 16, 503–517. https://doi.org/10.1007/s10666-011-9270-6
Díaz-Robles, L. A., Ortega, J. C., Fu, J. S., Reed, G. D., Chow, J. C., Watson, J. G., & Moncada-Herrera, J. A. (2008). A hybrid ARIMA and artificial neural networks model to forecast particulate matter in urban areas: The case of Temuco, Chile. Atmospheric Environment, 42(35), 8331–8340. https://doi.org/10.1016/j.atmosenv.2008.07.020
Chen, Y., Shi, R., Shu, S., & Gao, W. (2013). Ensemble and enhanced PM10 concentration forecast model based on stepwise regression and wavelet analysis. Atmospheric Environment, 74, 346–359. https://doi.org/10.1016/j.atmosenv.2013.04.002
Alimissis, A., Philippopoulos, K., Tzanis, C. G., & Deligiorgi, D. (2018). Spatial estimation of urban air pollution with the use of artificial neural network models. Atmospheric Environment, 191, 205–213. https://doi.org/10.1016/j.atmosenv.2018.07.058
Yang, Z., & Wang, J. (2017). A new air quality monitoring and early warning system: air quality assessment and air pollutant concentration prediction. Environmental Research, 158, 105–117. https://doi.org/10.1016/j.envres.2017.06.002
Li, X., Peng, L., Yao, X., Cui, S., Hu, Y., You, C., & Chi, T. (2017). Long short-term memory neural network for air pollutant concentration predictions: Method development and evaluation. Environmental Pollution, 231(December), 997–1004. https://doi.org/10.1016/j.envpol.2017.08.114
Reddy, V., Yedavalli, P., Mohanty, S., & Nakhat, U. (2017). Deep air: forecasting air pollution in Beijing, China. https://www.ischool.berkeley.edu/sites/default/files/sproject_attachments/deep-airforecasting_final.pdf
Kök, İ, Şimşek, M. U., & Özdemir, S. (2017). A deep learning model for air quality prediction in smart cities. IEEE International Conference on Big Data (Big Data), 2017, 1983–1990. https://doi.org/10.1109/BigData.2017.8258144
Liu, B., Yan, S., Li, J., Qu, G., Li, Y., Lang, J., & Gu, R. (2019). A sequence-to-sequence air quality predictor based on the n-step recurrent prediction. IEEE Access, 7, 43331–43345. https://doi.org/10.1109/ACCESS.2019.2908081
Soh, P. W., Chang, J. W., & Huang, J. W. (2018). Adaptive deep learning-based air quality prediction model using the most relevant spatial-temporal relations. IEEE Access, 6, 38186–38199. https://doi.org/10.1109/ACCESS.2018.2849820
Qi, Y., Li, Q., Karimian, H., & Liu, D. (2019). A hybrid model for spatiotemporal forecasting of PM2.5 based on graph convolutional neural network and long short-term memory. Science of the Total Environment, 664, 1–10. https://doi.org/10.1016/j.scitotenv.2019.01.333
Fan, J., Li, Q., Hou, J., Feng, X., Karimian, H., & Lin, S. (2017). A spatiotemporal prediction framework for air pollution based on deep RNN. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 4(4W2), 15–22. https://doi.org/10.5194/isprs-annals-IV-4-W2-15-2017
Zhang, C., Yan, J., Li, C., Rui, X., Liu, L., & Bie, R. (2016). On estimating air pollution from photos using convolutional neural network. MM 2016 - Proceedings of the 2016 ACM Multimedia Conference, 297–301. https://doi.org/10.1145/2964284.2967230
Li, X., Peng, L., Hu, Y., Shao, J., & Chi, T. (2016). Deep learning architecture for air quality predictions. Environmental Science and Pollution Research, 23(22), 22408–22417. https://doi.org/10.1007/s11356-016-7812-9
Wang, J., & Song, G. (2018). A deep spatial-temporal ensemble model for air quality prediction. Neurocomputing, 314, 198–206. https://doi.org/10.1016/j.neucom.2018.06.049
Bui, T., Le, V.-D., & Cha, S.-K. (2018). A deep learning approach for forecasting air pollution in South Korea using LSTM. http://arxiv.org/abs/1804.07891
Zhao, X., Zhang, R., Wu, J. L., & Chang, P. C. (2018). A deep recurrent neural network for air quality classification. Journal of Information Hiding and Multimedia Signal Processing, 9(2), 346–354.
Lee, S., & Shin, J. (2019). Hybrid model of convolutional LSTM and CNN to predict particulate matter. International Journal of Information and Electronics Engineering, 9(1), 34–38. https://doi.org/10.18178/ijiee.2019.9.1.701
Pak, U., Ma, J., Ryu, U., Ryom, K., Juhyok, U., Pak, K., & Pak, C. (2020). Deep learning-based PM2.5 prediction considering the spatiotemporal correlations: a case study of Beijing, China. Science of the Total Environment, 699, 133561. https://doi.org/10.1016/j.scitotenv.2019.07.367
Ma, J., Cheng, J. C. P., Lin, C., Tan, Y., & Zhang, J. (2019). Improving air quality prediction accuracy at larger temporal resolutions using deep learning and transfer learning techniques. Atmospheric Environment, 214, 116885.
Bhatla, R., Sarkar, D., Verma, S., Sinha, P., Ghosh, S., & Mall, R. K. (2020). Regional climate model performance and application of bias corrections in simulating summer monsoon maximum temperature for agro-climatic zones in India. Theoretical and Applied Climatology, 142(3), 1595–1612. https://doi.org/10.1007/s00704-020-03393-z
Gnauck, A. (2004). Interpolation and approximation of water quality time series and process identification. Analytical and Bioanalytical Chemistry, 380(3), 484–492. https://doi.org/10.1007/s00216-004-2799-3
Cho, K., van Merriënboer, B., Gülçehre, Ç., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25–29, 2014, Doha, Qatar, a meeting of SIGDAT (pp. 1724–1734). https://doi.org/10.3115/v1/d14-1179
Wu, Z., Rincon, D., Luo, J., & Christofides, P. D. (2021). Machine learning modeling and predictive control of nonlinear processes using noisy data. AIChE Journal, 67(4), e17164. https://doi.org/10.1002/aic.17164
Zhang, B., Zhang, H., Zhao, G., & Lian, J. (2020). Constructing a PM2.5 concentration prediction model by combining auto-encoder with Bi-LSTM neural networks. Environmental Modelling & Software, 124, 104600. https://doi.org/10.1016/j.envsoft.2019.104600
Essien, A., & Giannetti, C. (2020). A deep learning model for smart manufacturing using convolutional LSTM neural network autoencoders. IEEE Transactions on Industrial Informatics, 16(9), 6069–6078. https://doi.org/10.1109/TII.2020.2967556
Rolling analysis of time series. In: Zivot, E., Wang, J. (Eds.), Modeling Financial Time Series with S-PLUS®. Springer New York, New York, NY, pp. 313–360. https://doi.org/10.1007/978-0-387-32348-0_9
Ramachandran, P., Zoph, B., & Le, Q. V. (2018). Searching for activation functions. 6th International Conference on Learning Representations, ICLR 2018 - Workshop Track Proceedings. https://arxiv.org/pdf/1710.05941.pdf
Kingma, D. P., & Ba, J. L. (2015). Adam: a method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings.
Chelani, A., & Gautam, S. (2021). Lockdown during COVID-19 pandemic: a case study from Indian cities shows insignificant effects on persistent property of urban air quality. Geoscience Frontiers, 101284. https://doi.org/10.1016/j.gsf.2021.101284
Bera, B., Bhattacharjee, S., Shit, P. K., Sengupta, N., & Saha, S. (2021). Significant impacts of COVID-19 lockdown on urban air pollution in Kolkata (India) and amelioration of environmental health. Environment, Development and Sustainability, 23(5), 6913–6940. https://doi.org/10.1007/s10668-020-00898-5
Schroeder, D. J. (1999). Astronomical optics (2nd ed.). Academic Press. p. 278. ISBN 978-0-12-629810-9.
Samal, K. K. R., Panda, A. K., Babu, K. S., & Das, S. K. (2021). Multi-output TCN autoencoder for long-term pollution forecasting for multiple sites. Urban Climate, 39, 100943. https://doi.org/10.1016/j.uclim.2021.100943
Masood, A., & Ahmad, K. (2020). A model for particulate matter (PM2.5) prediction for Delhi based on machine learning approaches. Procedia Computer Science, 167, 2101–2110. https://doi.org/10.1016/j.procs.2020.03.258
Middya, A. I., & Roy, S. (2022). Pollutant specific optimal deep learning and statistical model building for air quality forecasting. Environmental Pollution, 301, 118972. https://doi.org/10.1016/j.envpol.2022.118972
Kumar, S., Mishra, S., & Singh, S. K. (2020). A machine learning-based model to estimate PM2.5 concentration levels in Delhi’s atmosphere. Heliyon, 6(11), e05618. https://doi.org/10.1016/j.heliyon.2020.e05618

Acknowledgements

The authors would like to acknowledge Tezpur University for providing the necessary infrastructure for carrying out this study. The database maintained and provided by the Central Pollution Control Board (CPCB) is also duly acknowledged. We also acknowledge the following state-level pollution control boards (PCB) of India: Uttar Pradesh PCB, Gujarat PCB, Punjab PCB, Maharashtra PCB, Bihar State PCB, Assam PCB, West Bengal PCB, Telangana State PCB, Jharkhand State PCB, Rajasthan State PCB, Madhya Pradesh PCB, Meghalaya PCB, Odisha State PCB, Kerala PCB and Andhra Pradesh PCB. We are pleased to acknowledge open-source technologies such as Linux, Python, TensorFlow, and Keras, which were used for postprocessing model outputs.

Author information

Authors and Affiliations

Department of Environmental Science, Tezpur University, Napaam, 784028, Assam, India
Pranjol Goswami & Amit Prakash
Leadics, Bengaluru, 560038, Karnataka, India
Manoj Prakash
Department of Geology, Sikkim University, Gangtok, 737102, Sikkim, India
Rakesh Kumar Ranjan

Contributions

Mr. Pranjol Goswami: The present work is part of the research for his Ph.D. thesis. He developed the deep learning network, carried out the major part of the data analysis, and prepared the first draft of the paper. Mr. Manoj Prakash: He contributed to model testing and refinement and further analysed the network design and efficiency. Dr. Rakesh Kumar Ranjan: He contributed to GIS-related work in data presentation and statistical analysis. Dr. Amit Prakash: As the lead investigator of the present work, he conceptualized the research and provided overall supervision. All the authors contributed to the data analysis and manuscript preparation.

Corresponding author

Correspondence to Amit Prakash.

Ethics declarations

Ethical Approval

Not applicable.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 1113 KB)

Supplementary file2 (DOCX 26 KB)

Supplementary file3 (DOCX 21 KB)

Supplementary file4 (DOCX 32 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Goswami, P., Prakash, M., Ranjan, R.K. et al. A Hybrid Deep Learning Model for Multi-step Ahead Prediction of PM_2.5 Concentration Across India. Environ Model Assess 28, 803–816 (2023). https://doi.org/10.1007/s10666-023-09902-4

Received: 25 September 2022
Accepted: 10 May 2023
Published: 03 June 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10666-023-09902-4
