Integrated Dual LSTM Model-Based Air Quality Prediction

Muley, Rajesh Reddy; Sri, Vadlamudi Teja Sai; Kumar, Kuntamukkala Kiran; Kumar, Kakumanu Manoj

doi:10.1007/978-981-99-4071-4_55


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 731)

Included in the following conference series:

International Conference on Innovative Computing and Communications


Abstract

Air quality prediction is a crucial tool for weather forecasting and air quality management, but prediction algorithms based on a single model are prone to overfitting. To address the complexity of air quality prediction, this study develops a prediction approach based on integrated dual long short-term memory (LSTM) models. The model takes into account variables that affect air quality, such as data from nearby stations and weather information. The two models are then integrated using an eXtreme Gradient Boosting (XGBoost) tree, and the final prediction is obtained by summing the predicted values of the optimal subtree nodes. The proposed method was tested and examined using five evaluation techniques, and its prediction accuracy is significantly higher than that of the other models compared.


1 Introduction

As industrialisation advances, the amount of exhaust gas emitted by factories and automobiles keeps rising, substantially increasing air pollution. Air quality has a significant impact on people's daily lives and has become a matter of worldwide concern, so accurate air quality forecasting has emerged as a key strategy for reducing pollution and improving air quality. Air quality is commonly predicted with time series prediction models and conventional machine learning techniques, and some methods model the temporal and spatial dependencies of air quality data simultaneously. However, commonly used machine learning techniques often show considerable performance variability under different conditions. Air quality is affected by numerous variables, including temperature, wind speed, and geographical layout, so the popular single-model approach struggles to produce reliable and precise predictions. This paper builds on a strategy recently discussed in the literature: integrating several models to predict air quality. Compared with existing single models, an integrated model can substantially improve forecasting ability. However, much remains to be learned about how to combine the strengths of several models according to the characteristics of the dataset.

2 Literature Survey

In [1], Hájek and Olej use genetic algorithms to optimise the input variable set for each air pollutant forecast. Using data collected by the monitoring station in Pardubice, Czech Republic, they develop models that predict the specific air quality index of each pollutant. Their results show that, in terms of root mean squared error, compositions of individual prediction models outperform single forecasts of the common air quality index, so these models can produce more accurate one-day-ahead air quality index predictions.

Kang and Qu [2] highlight the importance of city air quality forecasting for preventing air pollution in urban areas and improving the quality of life of city dwellers. They build AQI prediction models based on a back propagation (BP) neural network, a BP network optimised by a genetic algorithm, and a BP network optimised by a genetic simulated annealing algorithm. Comparing the prediction outcomes shows that the BP neural network optimised by genetic simulated annealing has higher accuracy, good generalisation capacity, and global search ability.

Wang and Wang [3] observe that air pollution is becoming more severe and that PM2.5 in aerosols, the most significant air pollutant, harms people's health as well as their productivity, way of life, and employment. Forecasting PM2.5 concentration has therefore taken on significant practical importance. The study uses publicly released real-time air quality data, collects historical monitoring data on air contaminants, normalises the data, and then splits the samples in a suitable ratio into a training set and a test set.

According to Mahajan et al. [5], a system for measuring and forecasting air quality is a key component of a smart city, and building a forecast system with high accuracy and reasonable computation time is one of the biggest challenges. The authors show that a variety of clustering algorithms can be used to forecast fine particulate matter (PM2.5) concentrations reliably and quickly. Monitoring stations are clustered by geographic proximity using a grid-based method, and the experiments use data from 557 stations distributed throughout Taiwan's Airbox device network. Finally, the accuracy and processing time of the different clustering algorithms are compared.
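The grid-based grouping of stations by geographic proximity described in [5] can be illustrated with a minimal sketch. It assumes stations are given as hypothetical (id, latitude, longitude) tuples and uses a hypothetical cell size in degrees; it only shows the idea of grid assignment and is not the implementation from [5].

```python
import math
from collections import defaultdict

def grid_clusters(stations, cell_deg):
    """Group stations into square grid cells of side cell_deg degrees.

    stations: iterable of (station_id, lat, lon) tuples (hypothetical format).
    Returns a dict mapping a (row, col) cell index to the station ids in it.
    """
    clusters = defaultdict(list)
    for station_id, lat, lon in stations:
        # floor() keeps cell indices consistent for negative coordinates too.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        clusters[cell].append(station_id)
    return dict(clusters)

# Illustrative coordinates only, not data from [5].
stations = [("A", 25.03, 121.56), ("B", 25.04, 121.52), ("C", 22.63, 120.30)]
print(grid_clusters(stations, cell_deg=0.5))
```

Stations A and B fall into the same cell and would share one cluster model, while C gets its own. A smaller cell size gives more, tighter clusters at the cost of fewer training samples per cluster.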

3 Existing System

Commonly used machine learning techniques often show considerable performance variability under different conditions. Air quality is affected by numerous variables, including temperature, wind speed, and geographical layout, so it is difficult for the popular single-model prediction approach to produce reliable and precise results.

4 Proposed System

This work develops a prediction approach based on integrated dual long short-term memory (LSTM) models to handle the complexity of air quality prediction. First, sequence-to-sequence (Seq2Seq) technology is used to build a single-factor prediction model that forecasts the value of each air quality component independently. Next, an LSTM model with an attention mechanism serves as the multi-factor prediction model; it takes into account air quality parameters together with data from nearby stations and weather data. Finally, the two models are combined using an eXtreme Gradient Boosting (XGBoost) tree.
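Both models in the dual design are built from LSTM cells. For reference, the standard LSTM cell update is given below, where x_t is the input at time step t, h_{t-1} and c_{t-1} are the previous hidden and cell states, σ is the logistic sigmoid, and ⊙ is the element-wise product. The paper does not state which LSTM variant it uses, so this is the common textbook formulation rather than a description of the authors' exact model.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The additive cell update c_t lets information from early time steps persist, which is why LSTMs suit hourly or daily pollutant series with long-range dependencies.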

5 System Architecture

See Fig. 1.

Flowchart of the system architecture. Raw input data are normalised, and the air quality dataset feeds the single-factor and multi-factor prediction models. Their combined results, together with meteorological data, feed the integrated dual model that produces the final prediction results. — **Fig. 1**

6 Flow Chart

See Fig. 2.

Flowchart of the processing steps applied to the dataset. It starts with loading and verifying the dataset. If verification succeeds, the dataset is processed and split, RMSE, MAE, and MAPE are calculated for the different algorithms, and a comparison graph is produced; otherwise, the process ends. — **Fig. 2**

7 Results

The single-factor model is then extended with an attention layer to create the multi-factor model (a combination of LSTM, sequence-to-sequence, and attention). To combine both models and improve prediction accuracy, features are extracted from the multi-factor model and used to retrain the combination with XGBoost.
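One plausible reading of this integration step is stacking: the forecasts of the two LSTM models become input features for the tree model. The sketch below assumes exactly that, with a hypothetical function name and a hypothetical feature layout (both forecasts followed by meteorological readings); the paper does not specify which features it extracts.

```python
def build_stacking_features(single_preds, multi_preds, weather_rows):
    """Combine per-time-step forecasts of the two LSTM models with weather data.

    single_preds: floats from the single-factor model (hypothetical input).
    multi_preds: floats from the multi-factor attention model.
    weather_rows: lists of meteorological readings for the same time steps.
    Returns one feature row per time step for a gradient-boosted tree regressor.
    """
    if not (len(single_preds) == len(multi_preds) == len(weather_rows)):
        raise ValueError("all inputs must cover the same time steps")
    rows = []
    for s, m, w in zip(single_preds, multi_preds, weather_rows):
        # Row layout: single-factor forecast, multi-factor forecast, weather features.
        rows.append([s, m, *w])
    return rows

# Illustrative values only (e.g. two forecasts plus temperature and wind speed).
features = build_stacking_features([50.0, 60.0], [52.0, 58.0], [[30.1, 2.5], [29.8, 3.0]])
print(features)
```

Paired with the observed values, these rows would form the training set of the second-stage model, for example `xgboost.XGBRegressor().fit(X, y)`. The forecasts fed to that stage should come from data the LSTMs were not trained on; otherwise the tree model learns from overly optimistic inputs.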

Fig. 3 shows the air quality dataset used to build this project.

In Fig. 3, the first row gives the column names and the following rows give the values. The PM values, the target variable, and other columns were used as training inputs.

Screenshot of an EditPlus window showing the Ahmedabad air quality dataset in a CSV file. — **Fig. 3**

Our team implemented all three models (the single-factor LSTM, the multi-factor LSTM with attention, and the multi-factor model integrated with XGBoost) in a Jupyter notebook. The code and output screens for every model are shown below; the blue comments in each screen explain the code (Figs. 4–19).

Screenshot of an untitled notebook tab with Python code that loads the required Python packages. — **Fig. 4**

Screenshot of an untitled notebook tab with Python code containing functions that calculate MAPE and that read and display the dataset values. A table at the bottom shows the Ahmedabad dataset values. — **Fig. 5**

Screenshot of an untitled notebook tab with Python code containing a function that extracts the X features and Y labels from the dataset, normalises the features, and splits the dataset into training and test sets. — **Fig. 6**
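The preparation steps in Fig. 6 (extracting features and labels, normalising, splitting) can be sketched with the standard library alone. The column names `PM2.5`, `PM10`, and `AQI` and the sample rows are hypothetical placeholders, not values from the paper's dataset. As an assumed design choice, the split is chronological and the min-max scaler is fitted on the training rows only, so no information about the test period leaks into training.

```python
import csv
import io

def load_xy(csv_text, feature_cols, target_col):
    """Parse CSV text into feature rows X and targets y, skipping rows with gaps."""
    X, y = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        needed = [row[c] for c in feature_cols + [target_col]]
        if any(v == "" for v in needed):
            continue  # drop rows with a missing reading
        X.append([float(row[c]) for c in feature_cols])
        y.append(float(row[target_col]))
    return X, y

def chrono_split(X, y, train_frac=0.8):
    """Split without shuffling, so every test row comes after every training row."""
    n = int(len(X) * train_frac)
    return X[:n], X[n:], y[:n], y[n:]

def min_max_fit(X):
    """Per-column minimum and maximum, computed from the training rows only."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]

def min_max_apply(X, lo, hi):
    """Scale columns with the fitted range; constant columns map to 0.0.

    Test values outside the training range fall outside [0, 1].
    """
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]
            for row in X]

# Illustrative rows only, not values from the paper's dataset.
sample = """Date,PM2.5,PM10,AQI
2019-01-01,80,120,150
2019-01-02,,110,140
2019-01-03,60,100,130
2019-01-04,40,90,110
2019-01-05,100,140,170
2019-01-06,70,115,145
"""

X, y = load_xy(sample, ["PM2.5", "PM10"], "AQI")
X_train, X_test, y_train, y_test = chrono_split(X, y)
lo, hi = min_max_fit(X_train)
X_train_s = min_max_apply(X_train, lo, hi)
X_test_s = min_max_apply(X_test, lo, hi)
print(X_test_s, y_test)  # [[0.5, 0.5]] [145.0]
```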

Screenshot of an untitled notebook tab with Python code containing a function that calculates RMSE, MAE, and MAPE for each algorithm. — **Fig. 7**
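The three error measures reported for every algorithm can be written as short functions. This is a minimal sketch using the standard definitions; skipping pairs whose actual value is zero in MAPE is an assumption, since the paper does not say how it handles zeros.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error in percent; pairs with a zero actual are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)

actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

For these values the RMSE is about 8.16, the MAE about 6.67, and the MAPE 5 %. Because MAPE is a percentage while RMSE and MAE are in the units of the target, their magnitudes are not directly comparable when plotted side by side.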

Screenshot of an untitled notebook tab with Python code containing a function that trains the single-factor LSTM model without attention. — **Fig. 8**
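A single-factor LSTM learns from past values of one series, so the series is usually converted into supervised (input window, target) pairs before training. A minimal sketch, assuming a plain Python list and hypothetical window lengths `n_in` and `n_out`; setting `n_out` above 1 gives the multi-step targets a Seq2Seq decoder would predict.

```python
def make_windows(series, n_in, n_out=1):
    """Turn a 1-D series into (input window, target window) pairs.

    Each input holds n_in consecutive past values and each target holds
    the next n_out values.
    """
    X, Y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        Y.append(series[start + n_in:start + n_in + n_out])
    return X, Y

X, Y = make_windows([1, 2, 3, 4, 5], n_in=3)
print(X, Y)  # [[1, 2, 3], [2, 3, 4]] [[4], [5]]
```

For a Keras LSTM layer, the input windows would then be reshaped to (samples, n_in, 1), that is samples by time steps by features.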

Screenshot of an untitled notebook tab with a sequential model summary table listing layer type, output shape, and number of parameters. The total number of parameters is 71,051, all of which are trainable. — **Fig. 9**
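Parameter totals in a model summary such as Fig. 9 can be checked by hand. A Keras-style LSTM layer with biases has four gates, each with an input kernel, a recurrent kernel, and a bias vector, giving 4 × (units × (input_dim + units) + units) weights. Purely as an arithmetic illustration, the hypothetical stack below also totals 71,051; the layers in Fig. 9 are not recoverable from the text, and other configurations could give the same count.

```python
def lstm_params(units, input_dim):
    """Trainable weights of an LSTM layer with biases.

    Four gates each have an input kernel (input_dim x units),
    a recurrent kernel (units x units) and a bias vector (units).
    """
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Weights plus biases of a fully connected layer."""
    return units * input_dim + units

# Hypothetical stack: LSTM(100) on 1 input feature -> LSTM(50) -> Dense(1)
total = lstm_params(100, 1) + lstm_params(50, 100) + dense_params(1, 50)
print(total)  # 71051
```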

Screenshot of an untitled notebook tab with a table listing 17 rows of actual test values and predicted air quality. — **Fig. 10**

Screenshot of an untitled notebook tab with the single-factor LSTM air quality prediction graph, plotting predicted air quality against the test data. The two curves overlap and fluctuate. — **Fig. 11**

Screenshot of an untitled notebook tab with Python code that trains the multi-factor LSTM model with attention. — **Fig. 12**
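The attention layer lets the multi-factor model weight some time steps more than others instead of relying only on the last hidden state. The paper does not specify its scoring function, so this minimal sketch assumes dot-product scoring against a query vector (for example the last hidden state); additive scoring is a common alternative.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Dot-product attention over a sequence of LSTM hidden states.

    hidden_states: T vectors of equal length, one per time step.
    query: vector of the same length (for example the last hidden state).
    Returns the context vector (weighted sum of states) and the weights.
    """
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights

states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context, weights = attention_pool(states, query=[1.0, 0.0])
print(weights, context)
```

The context vector, rather than a single hidden state, would then feed the output layer, so time steps most similar to the query contribute most to the forecast.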

Screenshot of an untitled notebook tab with a table listing 16 rows of actual test values and predicted air quality. — **Fig. 14**

Screenshot of an untitled notebook tab with the multi-factor LSTM air quality prediction graph, plotting predicted air quality against the test data. The two curves overlap and fluctuate. — **Fig. 15**

Screenshot of an untitled notebook tab with Python code for the integrated model using XGBoost and the multi-factor attention LSTM. A table below lists 10 rows of actual test values and predicted air quality. — **Fig. 16**

Screenshot of an untitled notebook tab with the integrated model air quality prediction graph, plotting predicted air quality against the test data. The two curves overlap and fluctuate. — **Fig. 17**

Screenshot of an untitled notebook tab with a grouped bar graph of performance (%) for the multi-factor, single-factor, and XGBoost algorithms. MAE, MAPE, and RMSE bars are plotted for each algorithm, and MAPE has the highest value for all three. — **Fig. 18**

Screenshot of an untitled notebook tab with Python code and a table whose columns are algorithm name, RMSE, MAE, and MAPE. — **Fig. 19**

8 Conclusion

We proposed a prediction method based on integrated dual LSTM models to increase the precision of air quality prediction. Its realisation procedure and effect can be summarised as follows. First, single-factor models were built for each factor in the temporal dimension, and temporal features are used to obtain their forecasts. The multi-factor model takes air quality characteristics into account together with meteorological information and data from surrounding stations. An XGBoost tree is then used to integrate the two models: the values (weights) of the leaf nodes reached in the trees are summed to give the optimal predicted value. Evaluation of the experimental data with five evaluation indicators shows that the method produces more accurate predictions.
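The final step, summing leaf-node values, is how gradient-boosted trees such as XGBoost form a prediction: a base score plus the leaf value reached in each tree, where each leaf value is the optimum w* = -G / (H + λ) of the gradients G and Hessians H of the samples in that leaf. The sketch below assumes squared-error loss, trees stored as hypothetical nested dicts, and leaf values that already include the learning-rate shrinkage; it is not the authors' code and ignores details such as default directions for missing values.

```python
def leaf_weight(gradients, hessians, reg_lambda=1.0):
    """Optimal leaf value w* = -G / (H + lambda) of XGBoost's second-order objective."""
    return -sum(gradients) / (sum(hessians) + reg_lambda)

def tree_predict(node, x):
    """Follow split nodes until a leaf; values below the threshold go left."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
    return node["leaf"]

def ensemble_predict(trees, x, base_score=0.5):
    """Final prediction: base score plus the leaf value reached in every tree."""
    return base_score + sum(tree_predict(tree, x) for tree in trees)

# Squared-error loss: g = prediction - target and h = 1 for each sample in the leaf.
targets, current = [3.0, 5.0], 0.5
g = [current - t for t in targets]
h = [1.0 for _ in targets]
print(leaf_weight(g, h))  # 7 / 3, about 2.333

# Two hypothetical trees over a feature vector [f0, f1].
trees = [
    {"feature": 0, "threshold": 10.0, "left": {"leaf": 1.0}, "right": {"leaf": 2.0}},
    {"feature": 1, "threshold": 0.0, "left": {"leaf": -0.5}, "right": {"leaf": 0.25}},
]
print(ensemble_predict(trees, [12.0, 3.0]))  # 0.5 + 2.0 + 0.25 = 2.75
```

The regularisation term λ shrinks each leaf value towards zero, which is one way boosted trees limit the overfitting the abstract attributes to single models.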

In the next phase of the study, the integrated dual LSTM model technique will be extended to further improve prediction accuracy by integrating the advantages of various models. We also found some prediction results with outlier values, although such outputs occur with very low probability. Analysing these outliers is one of the issues to be addressed in future work.

References

Hájek P, Olej V (2013) Prediction of air quality indices by neural networks and fuzzy inference systems. Commun Comput Inf Sci 383:302–312. https://doi.org/10.1007/978-3-642-41013-0_31
Kang Z, Qu Z (2017) Application of BP neural network optimized by genetic simulated annealing algorithm to prediction of air quality index in Lanzhou. In: Proc IEEE Int Conf Comput Intell Appl (ICCIA), Sep 2017, pp 155–160. https://doi.org/10.1109/CIAPP.2017.8167199
Wang X, Wang B (2019) Research on prediction of environmental aerosol and PM2.5 based on artificial neural network. Neural Comput Appl 31(12):8217–8227. https://doi.org/10.1007/s00521-018-3861-y
Rajput TS, Sharma N (2017) Multivariate regression analysis of air quality index for Hyderabad city: forecasting model with hourly frequency. Int J Appl Res 3(8):443–447. https://www.allresearchjournal.com/archives/2017/vol3issue8/PartG/3-8-78-443.pdf. Accessed 20 Mar 2021
Mahajan S, Liu H-M, Tsai T-C, Chen L-J (2018) Improving the accuracy and efficiency of PM2.5 forecast service using cluster-based hybrid neural network model. IEEE Access 6:19193–19204. https://doi.org/10.1109/ACCESS.2018.2820164
Li R, Dong Y, Zhu Z, Li C, Yang H (2019) A dynamic evaluation framework for ambient air pollution monitoring. Appl Math Model 65:52–71. https://doi.org/10.1016/j.apm.2018.07.052
Liu B, Yan S, Li J, Qu G, Li Y, Lang J, Gu R (2019) A sequence-to-sequence air quality predictor based on the n-step recurrent prediction. IEEE Access 7:43331–43345. https://doi.org/10.1109/ACCESS.2019.2908081
Gu K, Qiao J, Lin W (2018) Recurrent air quality predictor based on meteorology- and pollution-related factors. IEEE Trans Ind Inform 14(9):3946–3955. https://doi.org/10.1109/TII.2018.2793950
Benhaddi M, Ouarzazi J (2021) Multivariate time series forecasting with dilated residual convolutional neural networks for urban air quality prediction. Arab J Sci Eng 46(4):3423–3442. https://doi.org/10.1007/s13369-020-05109-x
Song X, Huang J, Song D (2019) Air quality prediction based on LSTM-Kalman model. In: Proc IEEE 8th Joint Int Inf Technol Artif Intell Conf (ITAIC), Chongqing, China, May 2019, pp 695–699. https://doi.org/10.1109/ITAIC.2019.8785751


Author information

Authors and Affiliations

Department of Information Technology, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India
Rajesh Reddy Muley
Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India
Vadlamudi Teja Sai Sri, Kuntamukkala Kiran Kumar & Kakumanu Manoj Kumar


Corresponding author

Correspondence to Rajesh Reddy Muley.

Editor information

Editors and Affiliations

IT Department, Cairo University, Giza, Egypt
Aboul Ella Hassanien
Tijuana Institute of Technology, Tijuana, Mexico
Oscar Castillo
Department of Computer Science, Shaheed Sukhdev College of Business Studies, University of Delhi, Delhi, India
Sameer Anand
Department of Computer Science, Shaheed Sukhdev College of Business Studies, University of Delhi, New Delhi, India
Ajay Jaiswal


About this paper

Cite this paper

Muley, R.R., Sri, V.T.S., Kumar, K.K., Kumar, K.M. (2024). Integrated Dual LSTM Model-Based Air Quality Prediction. In: Hassanien, A.E., Castillo, O., Anand, S., Jaiswal, A. (eds) International Conference on Innovative Computing and Communications. ICICC 2023. Lecture Notes in Networks and Systems, vol 731. Springer, Singapore. https://doi.org/10.1007/978-981-99-4071-4_55


DOI: https://doi.org/10.1007/978-981-99-4071-4_55
Published: 26 October 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4070-7
Online ISBN: 978-981-99-4071-4
eBook Packages: Engineering, Engineering (R0)
