Data Fusion for the Improvement of Low-Cost Air Quality Sensors

Kassandros, Theodosios; Bagkis, Evangelos; Karatzas, Kostas

doi:10.1007/978-3-031-12786-1_24


Part of the book series: Springer Proceedings in Complexity (SPCOM)

Included in the following conference series:

International Technical Meeting on Air Pollution Modelling and its Application


Abstract

The aim of this study is to develop a Machine Learning calibration procedure that upgrades the performance of low-cost air quality sensors, and to investigate how well this calibration function generalizes over a specific area, as a step towards air quality data fusion.


1 Introduction

Poor air quality (AQ) has a negative impact on people’s quality of life. The small number of monitoring stations used for official AQ monitoring and the operationally available air pollution modelling tools still leave room for improving local AQ knowledge. The KASTOM project (www.air4me.eu) is developing a versatile and flexible air quality monitoring and forecasting system by deploying an IoT-oriented network of low-cost AQ sensor nodes (LCAQSN), while in parallel developing a state-of-the-art emission modeling module combined with state-of-the-art three-dimensional AQ models. LCAQSN can cover larger areas thanks to their low cost, but they lack the necessary accuracy.

2 Materials

The Greater Thessaloniki Area (GTA) is the second largest urban agglomeration in Greece, hosting more than 1 million inhabitants. The KASTOM project has installed 33 LCAQSN in the GTA, including: (a) particle sensors (PM10 and PM2.5: PMS5003, Beijing Plantower Co., Ltd.), (b) sensors for gaseous pollutants (NO₂, O₃ and CO: Alphasense Ltd., U.K.) and (c) meteorological sensors (air temperature, relative humidity and air pressure: BME280, Bosch Sensortec, Germany).

In this study, we collocated six nodes with two reference stations (Fig. 1) in the Agias Sofias (AGSOF) and Kordelio (KORD) areas, classified by the European Environment Agency as urban traffic and urban industrial stations, respectively.

Fig. 1 Map of the Greater Thessaloniki Area marking the regions where LCAQSN are installed

The initial dataset (Nset) consists of the measurements of six nodes (Node1–3 located in AGSOF and Node4–6 located in KORD) for the period 21/12/2019–10/03/2020, together with the reference station measurements for PM10, O₃ and NO₂ (NO₂ measurements in KORD were omitted because of missing values). The additional dataset (Fset) included meteorological modeling (WRF) and free traffic flow data (Salanova et al., 2018). All variables are presented in Table 1.

Table 1 Dataset description


3 Methods

The first step of the computational procedure aimed at generating a set of features that captures the maximum amount of information. We therefore applied time lags (from 1 to 12 h) and rolling aggregation statistics (over 6 and 12 h) to all variables, leading to 161 features for the Nset and 401 features for the Fset. To reduce the noise introduced by uninformative features, we followed a feature reduction procedure based on the Random Forest Feature Importance (RFFI) method. We then employed a Machine Learning (ML)-oriented modeling approach, using the reference station measurements as targets (PM10, O₃ and NO₂) to calibrate and upgrade the performance of the KASTOM nodes. Models were trained on both feature sets, for each sensor and location. A Gradient Boosting algorithm was used (Friedman, 2001), which combines the outputs of individual regression trees sequentially, so that each new tree helps to correct the errors made by the previously trained trees.
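The feature generation step can be illustrated with a minimal sketch. It assumes an hourly series held in a plain Python list, and it assumes the mean and the maximum as rolling statistics and a window that covers only the hours before the current one; the paper does not state which aggregations or window alignment were used. The function name `lag_and_rolling_features` and the feature names are hypothetical.

```python
from statistics import mean


def lag_and_rolling_features(series, lags=range(1, 13), windows=(6, 12)):
    """Build one feature row per hour from an hourly series of numbers.

    Lags of 1-12 h and windows of 6 and 12 h follow the text; the choice
    of mean/max as rolling statistics is an assumption for illustration.
    """
    # First index at which every lag and every full window is available.
    start = max(max(lags), max(windows))
    rows = []
    for t in range(start, len(series)):
        row = {"value": series[t]}
        for k in lags:
            # Value measured k hours before the current hour.
            row[f"lag_{k}h"] = series[t - k]
        for w in windows:
            # The w hours strictly before t (assumed window alignment).
            past = series[t - w:t]
            row[f"roll_mean_{w}h"] = mean(past)
            row[f"roll_max_{w}h"] = max(past)
        rows.append(row)
    return rows
```

With these assumptions each input variable yields 12 lag features and 4 rolling features. Calling `lag_and_rolling_features(pm10_hourly)` on a hypothetical list of hourly PM10 readings returns one dictionary per hour, starting at the thirteenth reading.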

To evaluate the initial performance of the LCAQSN, the Pearson Correlation Coefficient (r) and the Coefficient of Divergence (CoD) were calculated. The ML models were evaluated using a fivefold time-forward cross-validation on a rolling basis, using the Coefficient of Determination (R²) and the Relative Expanded Uncertainty (REU), following the methodology described in the Guide to the Demonstration of Equivalence of Ambient Air Monitoring Methods (EUD, 2008). According to the European Air Quality Directive, the uncertainties for a “class 1 sensor” (indicative measurements) are 50, 25 and 30%, and for a “class 2 sensor” (objective measurements) 100, 75 and 75%, for PM10, NO₂ and O₃ respectively.
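A time-forward evaluation of this kind can be sketched as follows. The sketch assumes an expanding training window (each fold trains on all data before its test block), which is one common reading of "time forward on a rolling basis"; the exact split used in the study is not given. The helper names `time_forward_splits` and `r2_score` are hypothetical.

```python
def time_forward_splits(n_samples, n_folds=5):
    """Return (train_indices, test_indices) pairs in chronological order.

    The series is cut into n_folds + 1 consecutive blocks. Fold i trains
    on the first i blocks and tests on the block that follows, so the
    model is never evaluated on data older than its training data.
    """
    block = n_samples // (n_folds + 1)
    splits = []
    for i in range(1, n_folds + 1):
        train_end = i * block
        # The last fold also takes any leftover samples at the end.
        test_end = n_samples if i == n_folds else train_end + block
        splits.append((range(0, train_end), range(train_end, test_end)))
    return splits


def r2_score(y_true, y_pred):
    """Coefficient of Determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In use, a model would be fitted on the training indices of each fold and scored with `r2_score` on the corresponding test indices, and the five scores would then be summarized.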

4 Results

Field calibration of an LCAQSN network requires the individual nodes to perform identically to each other; this is the first condition for applying the same calibration function to all of them. This was checked by plotting the CoD against the Pearson coefficient (Fig. 2). All PM10 sensors scored very high Pearson and very low CoD values, thus behaving identically. The gas sensors, and especially the O₃ sensors in AGSOF, displayed more diverse behavior, suggesting that generalizing the calibration functions could be more challenging in this case.

Fig. 2 Two scatterplots of CoD versus Pearson. Graph 1 shows data for O₃, NO₂ and PM10. Graph 2 shows data for O₃ and PM10
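The two agreement metrics used in this check can be sketched as follows, using the standard definitions of the Pearson coefficient and of the Coefficient of Divergence, CoD = sqrt((1/n) Σ ((x_i − y_i)/(x_i + y_i))²). The function names are hypothetical, and the inputs are assumed to be equal-length lists of simultaneous, positive readings from two nodes.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def coefficient_of_divergence(x, y):
    """CoD: 0 for identical series, approaching 1 for very different ones."""
    n = len(x)
    # Assumes a + b is never zero, i.e. concentrations are positive.
    return sqrt(sum(((a - b) / (a + b)) ** 2 for a, b in zip(x, y)) / n)
```

The two metrics are complementary: a node that reads exactly twice the value of its neighbor still has a Pearson coefficient of 1, but its CoD is 1/3, so a constant bias between nodes shows up only in the CoD.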

The RFFI selected the most relevant features, mostly those derived from the KASTOM nodes’ measurements, but also meteorological factors derived from modeling (Fig. 3). Traffic-related features, on the other hand, are chosen only in the AGSOF location (an urban traffic station). Traffic features also seem to influence NO₂ and PM10 more than O₃.

Fig. 3 Stacked vertical bar graph of the percentage of features used per node and pollutant. The three stacks represent node, WRF and traffic features


While the raw measurements score extremely poorly against the reference measurements, the computational procedure with XGBoost shows promising results (Table 2). In most cases the use of the Fset leads to better output than the use of the Nset, though by a small margin.

Table 2 R² score for XGBoost and raw measurements. Bold: best performance per sensor


In terms of REU, the calibrated PM10 sensors can be considered “class 1 sensor” in both locations, while the calibrated O₃ sensors remain above the desired threshold but have still improved their performance and can be considered “class 2 sensor” (Fig. 4).

Fig. 4 Four line graphs of REU versus PM10 for raw and calibrated measurements. Graphs 1 and 2 have 3 increasing red curves and 3 decreasing blue curves. Graphs 3 and 4 have 6 decreasing curves
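The class assignment above follows directly from the uncertainty limits quoted in the Methods section. A minimal sketch is given below. It assumes the REU is already available as a percentage and that a sensor meets a class when its REU does not exceed the limit; the function name `classify_by_reu`, the `"unclassified"` label and the plain-ASCII pollutant keys are hypothetical.

```python
# Maximum REU (%) per class and pollutant, as quoted in the Methods section.
REU_LIMITS = {
    "class 1 sensor": {"PM10": 50.0, "NO2": 25.0, "O3": 30.0},
    "class 2 sensor": {"PM10": 100.0, "NO2": 75.0, "O3": 75.0},
}


def classify_by_reu(pollutant, reu_percent):
    """Return the strictest class whose REU limit is met, if any."""
    # Classes are checked from strictest to most permissive.
    for label in ("class 1 sensor", "class 2 sensor"):
        if reu_percent <= REU_LIMITS[label][pollutant]:
            return label
    # Label for sensors that meet neither limit (an assumption).
    return "unclassified"
```

For instance, an illustrative (not measured) REU of 40% for PM10 gives `"class 1 sensor"`, while 60% for O₃ gives `"class 2 sensor"`.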

5 Conclusions

The intercomparison of LCAQSN over a short time period shows that PM10 sensors behave similarly at the same location and that the proposed computational calibration procedure can upgrade their performance to that of indicative measurements for regulatory purposes, while it may be possible to apply the same approach to the rest of the network. For NO₂ and O₃, while the calibration functions can improve the sensors’ response, the desired REU levels could not be reached. In every case data fusion improves the results, and therefore more data sources and additional effort towards better fusion should be considered.

References

EUD. (2008). Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and cleaner air for Europe. Official Journal of the European Union, L152.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 1189–1232. https://doi.org/10.1214/aos/1013203451
Salanova Grau, J. M., Mitsakis, E., Tzenos, P., Stamos, I., & Aifadopoulou, G. (2018). Multisource data framework for road traffic state estimation. Journal of Advanced Transportation, 1–9. https://doi.org/10.1155/2018/9078547


Acknowledgements

This research has been co‐financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH—CREATE—INNOVATE. Project code Τ1ΕDΚ-01697; project name Innovative system for air quality monitoring and forecasting (KASTOM, www.air4me.eu).

Author information

Authors and Affiliations

Environmental Informatics Research Group, School of Mechanical Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
Theodosios Kassandros, Evangelos Bagkis & Kostas Karatzas


Corresponding author

Correspondence to Theodosios Kassandros.

Editor information

Editors and Affiliations

VITO NV, Mol, Belgium
Clemens Mensink
Barcelona Supercomputing Center, Barcelona, Spain
Oriol Jorba

Questions and Answers

QUESTIONER:: Zhaoyue Chen
QUESTION:: Thanks. How did you determine the lag length (in hours) when enriching the feature space?
ANSWER:: The lag length was determined through trial-and-error experiments. Previous computational exercises by our group have also shown that lags of more than 24 hours are not important for the calibration of low-cost sensor nodes.
QUESTIONER:: Bas Mijling
QUESTION:: Low-cost sensors are calibrated at two different sites. What would happen if the sensor locations were swapped? Is the calibration obtained at site 1 applicable at site 2?
ANSWER:: This is a very interesting question that can be answered thoroughly only through further research. From our knowledge of the field and ongoing experiments, applying a calibration function from Agias Sofias to Kordelio and vice versa yields good results in terms of uncertainty and R² for PM2.5 and PM10, and acceptable metrics for O₃. However, the question of the spatial generalizability of the calibration function cannot be answered with only two reference stations collocated with the low-cost sensors. Data from a third collocated reference station, not included in this study, show more ambiguous behavior. Therefore, applying functions by proximity or by type of station (urban, suburban, traffic, background, etc.), or applying one generalized calibration function trained on all available locations, would be considered for calibrating the whole network of 33 devices.


Cite this paper

Kassandros, T., Bagkis, E., Karatzas, K. (2022). Data Fusion for the Improvement of Low-Cost Air Quality Sensors. In: Mensink, C., Jorba, O. (eds) Air Pollution Modeling and its Application XXVIII. ITM 2021. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-031-12786-1_24


DOI: https://doi.org/10.1007/978-3-031-12786-1_24
Published: 02 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12785-4
Online ISBN: 978-3-031-12786-1
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)
