Abstract
Low-cost air quality sensors are widely used to improve the temporal and spatial resolution of air quality data. In Lima, Peru, only a limited number of reference air quality monitors have been installed, which has led to a lack of data for establishing environmental and health policies. Low-cost technology is promising for developing countries because the sensors are small and inexpensive to operate and maintain. However, considerable work remains to be done to improve data quality. In this study, as the first stage of the calibration process, a low-cost sensor was co-located with a reference monitoring station, and a multiple regression model was developed with the reference measurements as the outcome variable and the sensor data, temperature, and relative humidity as the predictors. The results show that this particular technology performs promisingly in measuring PM2.5 and PM10 (particulate matter with an aerodynamic diameter of less than 2.5 μm and 10 μm, respectively), with a better correlation for PM2.5. Temperature and relative humidity data from the sensor were only partially analyzed because of their evidently low correlation with the reference meteorological data. The objective of this study is to begin analyzing the performance of low-cost sensors that have already been introduced to the Peruvian market and to identify those that perform better, in order to support informed decision-making.
Introduction
Air pollution causes 7 million deaths worldwide every year (WHO (World Health Organization) n.d.). The 2018 Lancet Commission on pollution and health pointed out that pollution accounts for 16% of all deaths worldwide, with 4.2 million deaths due to ambient PM2.5 (Landrigan et al. 2018). Clear evidence of the potential negative health effects of air pollution, such as chronic obstructive pulmonary disease (COPD), lung cancer, and cardiovascular disease, has been widely reported in the literature (Costa et al. 2017; Ierodiakonou et al. 2016; Kim et al. 2017; Nyhan et al. 2013). According to Romero et al. (2019), the transportation sector is one of the main pollutant sources in the Lima Metropolitan Area (LMA); however, emissions must be disaggregated spatially and temporally to update transport and environmental regulations. Air quality station data are needed to analyze the effects of the regulations implemented on air quality and health. In Peru, only a few air quality stations are operated by the National Meteorological and Hydrological Service (SENAMHI). Therefore, low-cost air quality sensors could be an important resource for assessing air quality at high spatial and temporal resolution, particularly for evaluating vehicle emissions (Velásquez et al. 2020).
Low-cost sensors are characterized by their small size and weight, relatively low power requirements, short response time, and real-time (or near-real-time) network adaptability (Snyder et al. 2013; Sheats et al. 2010). They deliver highly precise real-time data, are easy to use, and sample quickly (Idrees and Zheng 2020). A further advantage is that large numbers of them can be deployed for pollution mapping (Li et al. 2020), and, with an appropriate correlation methodology, their data can be calibrated against a co-located reference station, especially for particulate matter concentrations (Romero et al. 2020). Despite these benefits, their use remains challenging because of sensor data quality (Snyder et al. 2013). Low-cost air quality sensors need calibration to validate the reliability of their data, and such calibrations are typically performed with simple or multiple linear regression. Simple linear regression models can correct slopes and offsets, whereas multiple linear regression models, which can include measurements of other pollutants and meteorological parameters, perform better (Munir et al. 2019). In a multiple linear regression model (MLRM), the meteorological parameters usually include temperature and relative humidity (Badura et al. 2018; Munir et al. 2019; Snyder et al. 2013) because they affect the functioning of the electronics (Badura et al. 2018). Therefore, including these parameters in the MLRM can yield a higher coefficient of determination and a lower root mean square error (RMSE) (Panem et al. 2020). For each sensor, a calibration function is specified, assuming that the sensor response is linear in the reference measurement of each pollutant.
Ordinary linear regression is performed by minimizing the sum of squared residuals of the sensor responses with respect to the reference measurements. The calibration functions are of the form Rs = a · X + b, where Rs is the sensor response, X is the corresponding reference measurement of the air pollutant, a is the slope, and b is the offset.
Multiple regression has been applied successfully in several calibration and modeling processes, for example, reduced-space Gaussian process regression (Arias Velásquez and Mejía Lara 2020) and the interpolation of nitrogen dioxide concentrations for temporal and spatial analysis (Weissert et al. 2020), and it has been compared with multilinear analysis and artificial neural network approaches for low-cost sensor calibration. Such calibration models, particularly neural networks, can represent highly complex functions, which makes them well suited to calibrating a group of sensors. In the study conducted by Spinelle et al. (2015), two types of artificial neural network (ANN) architecture were considered (ANNs are recommended for diagnosis analysis alongside linear regression analysis; Arias Velásquez and Mejia Lara 2018): radial basis functions and the multilayer perceptron (MLP). The MLP showed better performance; it is the most widely used network architecture today and originates from the work of Rumelhart and McClelland (Rumelhart 1986). This architecture consists of artificial units that receive a number of inputs, typically organized with a hidden layer of hidden units. Each unit applies an activation function to its weighted inputs to produce its output, and the network is trained with an iterative technique called backpropagation.
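To make the MLP description concrete, the sketch below trains a one-hidden-layer perceptron with tanh activations by backpropagation and full-batch gradient descent on the mean squared error. It is a generic illustration in plain Python, not the network used by Spinelle et al. (2015); the function name `train_mlp`, the hidden-layer size, the learning rate, and the epoch count are arbitrary assumptions, and inputs are assumed to be scaled to roughly [0, 1].

```python
import math
import random


def train_mlp(xs, ys, hidden=4, lr=0.05, epochs=2000, seed=0):
    """Train a 1-input, 1-output MLP with one tanh hidden layer.

    Uses backpropagation and full-batch gradient descent on mean squared
    error. Returns (predict, losses), where losses[k] is the loss before
    update k.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                                   # hidden biases
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                              # output bias
    n = len(xs)
    losses = []
    for _ in range(epochs):
        g_w1, g_b1, g_w2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden
        g_b2 = 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            # Forward pass: weighted sum -> tanh activation -> linear output unit
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            out = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = out - y
            loss += err * err / n
            # Backward pass: propagate dLoss/dout back through each layer
            d_out = 2.0 * err / n
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                d_z = d_out * w2[j] * (1.0 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)^2
                g_w1[j] += d_z * x
                g_b1[j] += d_z
        losses.append(loss)
        # Gradient-descent update of all weights and biases
        for j in range(hidden):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2

    def predict(x):
        return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden)) + b2

    return predict, losses
```

For example, fitting the hypothetical nonlinear response y = 0.5x + 0.3x² on 21 points in [0, 1] returns a `predict` function and a loss history whose final value is lower than the initial one. A real calibration would normally use a numerical library and a held-out validation set.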
In this research, we developed a multiple linear regression model to analyze the performance of a particular technology, the PurpleAir sensor (PurpleAir 2019). We focused on low-cost sensors and their calibration in response to government requirements addressed to several universities; research is increasingly focused on developing procedures that allow remote sensor calibration (Weissert et al. 2020). We have progressively introduced these sensors in Lima, Peru, near the main avenues and meteorological stations, for traffic assessment (Arias Vel et al. 2020). In this research, we propose more advanced techniques that take into account the existing reference monitors, meteorological parameters, and other pollutants to ensure the accuracy of PurpleAir sensor data relative to reference measurements and to evaluate the sensor's performance in measuring PM2.5 and PM10 (Table 1).
Material and methods
Characterization of the study site
To assess the performance of low-cost sensor technology through comparison with a reference monitor, a PurpleAir sensor was installed next to the Campo de Marte Air Quality Station (AQSMarte), which is currently supervised by SENAMHI. AQSMarte is located in central Lima, in the district of Jesús María, one of the 43 districts of Lima, with a population of 75,359 (INEI 2018); the area hosts industrial facilities (mainly plastics and electric furnaces) and is crossed by three major avenues of Lima. As shown in Fig. 1, this location allows the sensor to capture the main traffic effects associated with Paseo de la República, Salaverry, and Brasil avenues, the roads with the greatest traffic problems in Lima, together with an online calibrated meteorological station (Arias Vel et al. 2020).
The evaluation was carried out over 3 weeks, from October 3 to October 24, 2019. This study was conducted by a single institution; a subsequent phase, in collaboration with the Ministry of Environment of Peru, will include various private companies and academic institutions to assess the different low-cost air quality sensor technologies currently on the Peruvian market.
Technical specifications of the reference monitors
Ambient PM2.5 and PM10 mass concentrations at AQSMarte were measured with a GRIMM EDM 180, a leading automated measurement system (AMS). This system offers features such as simultaneous PM measurements in 31 particle size channels, 0.1 μg/m3 resolution, a dust mass detection range of 0–100 μg/m3, and an isothermal inlet with an integrated Nafion dryer. It measures particle sizes from 0.25 to 32 μm and has selectable storage intervals between 6 s and 60 min. The instrument operates in a temperature range of −20 °C to 50 °C and consumes approximately 160 W of power, with a maximum current of 1.4 A. Its measurement principle is based on light scattering by single particles, with an aerodynamically focused detection volume and no border zone error.
In addition, a meteorological station is in operation next to the reference monitor station, and it provided data for the measurement of temperature, relative humidity, wind speed, and wind direction.
Technical specifications of the low-cost sensor—PurpleAir Technology
In this study, a PurpleAir sensor (Fig. 2) was chosen. These sensors are low-cost, small, and portable. Their portability allows the development of dense air quality sensor networks; the sensor provides internal data storage to record data with or without a WiFi connection and a free platform to visualize real-time data for all parameters (particulate matter, temperature, and humidity), with high consistency as a function of temperature. PurpleAir is a local community organization that operates an air quality monitoring network based on PMS sensors from PlanTower. It currently offers one indoor model, PurpleAir I-PA-Indoor (PMS 1003), and two models, PurpleAir II-PA and II-PA-SD (PMS 5003), for outdoor and/or indoor evaluations. The latter two models combine a PMS 5003 particle counter with a BME 280 temperature and humidity sensor to measure PM1, PM2.5, PM10, temperature, and humidity in near real time. In addition, the PurpleAir II-PA-SD model can store data on an SD card with or without a WiFi connection (PurpleAir PA-II-SD n.d.). The PMS 5003 is a digital particle concentration sensor based on laser scattering: it measures light scattered at 90° with a photodiode detector that converts the scattered light into voltage pulses. The sensor counts the particles of given diameters (0.3 to 10 μm) in the air by counting the pulses of the scattering signal and outputs the results over a digital interface (PurpleAir PA-II-SD n.d.).
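The digital output of the PMS 5003 can be illustrated with a decoder for its serial frame. This sketch assumes the 32-byte frame layout as we read it in the PlanTower data manual (PlanTower 2016): two start bytes (0x42, 0x4D), a 16-bit frame length, thirteen big-endian 16-bit data words (CF = 1 mass concentrations, atmospheric mass concentrations, six cumulative particle counts per 0.1 L, and a reserved word), and a 16-bit checksum equal to the sum of the preceding bytes. The field names and the function `parse_pms5003_frame` are hypothetical, and PurpleAir's firmware may handle the frame differently.

```python
import struct

# Hypothetical labels for data words 1-12 (word 13 is reserved)
FIELD_NAMES = (
    "pm1_0_cf1", "pm2_5_cf1", "pm10_cf1",   # CF = 1 mass concentrations, ug/m3
    "pm1_0_atm", "pm2_5_atm", "pm10_atm",   # atmospheric ("ATM") mass concentrations, ug/m3
    "n_0_3", "n_0_5", "n_1_0",              # counts of particles > d um per 0.1 L of air
    "n_2_5", "n_5_0", "n_10",
)


def parse_pms5003_frame(frame: bytes) -> dict:
    """Decode one 32-byte PMS5003 output frame into named fields."""
    if len(frame) != 32 or frame[0:2] != b"\x42\x4d":
        raise ValueError("not a PMS5003 frame (bad length or start bytes)")
    # Frame length, 13 data words, and checksum: 15 big-endian unsigned shorts
    values = struct.unpack(">15H", frame[2:])
    length, data, checksum = values[0], values[1:14], values[14]
    if length != 28:  # 2 * 13 data bytes + 2 checksum bytes
        raise ValueError("unexpected frame length field")
    if checksum != sum(frame[:30]) & 0xFFFF:
        raise ValueError("checksum mismatch")
    return dict(zip(FIELD_NAMES, data[:12]))
```

Both the CF = 1 and the ATM fields come from the same frame, which is why the choice between them (discussed for the PurpleAir channels) is a data-selection decision rather than a hardware one.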
The PMS 5003 sensor uses a laser as its light source, at a wavelength of 680 to 30 nm, for scattering intensity measurements; the radiation interacts with the different particles suspended in the air. The scattered light is collected at a certain angle in a light scattering measuring cavity, yielding a curve of the scattered light intensity as a function of time. The equivalent particle diameter and the number of particles of different diameters per unit volume are then calculated by a microprocessor based on Mie theory. According to the PMS5003 data manual issued by the manufacturer, PlanTower, which also calibrates the sensor (PlanTower 2016), the maximum error per module is 10% over a temperature range of 10 to 40 °C, as shown in Fig. 8. The manufacturer also states that the PMS5003 response time is less than 10 s, from which we can infer that the sensor has some limitations in rapidly changing environments. According to the manufacturer, the PM2.5 consistency error is ±10% for concentrations of 100 to 500 μg/m3 and ±10 μg/m3 for concentrations of 0 to 100 μg/m3 (PlanTower 2016).
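The step from particle counts to a mass concentration can be sketched under strong simplifying assumptions. PlanTower's actual Mie-theory-based algorithm is proprietary and is not reproduced here; the hypothetical `mass_from_counts` function instead assumes spherical particles of uniform density (1.65 g/cm3, an assumed value), represents each size bin by the geometric mean of its edges, and converts the PMS 5003's cumulative counts per 0.1 L into differential bins.

```python
import math

# Cumulative count thresholds reported by the PMS 5003 (particles > d um per 0.1 L)
THRESHOLDS_UM = (0.3, 0.5, 1.0, 2.5, 5.0, 10.0)


def mass_from_counts(cumulative_counts, upper_um=2.5, density_g_cm3=1.65):
    """Estimate a mass concentration (ug/m3) from cumulative particle counts.

    ASSUMPTIONS: spherical particles, uniform density, and each bin represented
    by the geometric mean of its edges. Bins whose lower edge is >= upper_um
    are excluded, so upper_um=2.5 gives a PM2.5-like estimate.
    """
    total_ug_m3 = 0.0
    for i in range(len(THRESHOLDS_UM) - 1):
        lo, hi = THRESHOLDS_UM[i], THRESHOLDS_UM[i + 1]
        if lo >= upper_um:
            break
        # Differential count in [lo, hi) from two cumulative counts
        n_per_0_1_l = cumulative_counts[i] - cumulative_counts[i + 1]
        d_cm = math.sqrt(lo * hi) * 1e-4                          # um -> cm
        mass_ug = density_g_cm3 * math.pi / 6 * d_cm ** 3 * 1e6   # g -> ug per particle
        total_ug_m3 += n_per_0_1_l * 1e4 * mass_ug                # per 0.1 L -> per m3
    return total_ug_m3
```

Counts above 10 μm are ignored because that bin has no upper edge. Because each particle's mass scales with the assumed density and the cube of the assumed diameter, such estimates are sensitive to these choices, which is one reason field calibration against a reference monitor is needed.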
The BME 280 is a high-performance digital pressure, temperature, and humidity sensor whose small size and low power consumption are key design advantages for mobile applications. It offers fast response times for temperature and humidity, which benefits fast context-awareness applications, and high overall accuracy over a wide temperature and humidity range.
The PurpleAir sensor has two identical laser counters, or "channels," that measure data: channel A and channel B. As shown on the PurpleAir web page (PurpleAir PA-II-SD n.d.), both channels provide primary and secondary data. Marked differences between the two channels may indicate an error in one of them. Among the data "fields," "ATM" ("atmospheric") data are intended for outdoor applications, whereas CF = 1 data are intended for indoor or controlled environments.
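A simple way to act on differences between channels A and B is to flag the time steps at which the two readings diverge. The sketch below is hypothetical: the function name and the thresholds (5 μg/m3 absolute and 20% relative) are illustrative assumptions, not PurpleAir or SENAMHI quality-control rules.

```python
def flag_channel_disagreement(chan_a, chan_b, abs_tol=5.0, rel_tol=0.2):
    """Return the indices at which channels A and B disagree.

    A pair is flagged only when the absolute difference exceeds abs_tol
    (ug/m3) AND the difference relative to the pair mean exceeds rel_tol.
    Both thresholds are illustrative assumptions.
    """
    flagged = []
    for i, (a, b) in enumerate(zip(chan_a, chan_b)):
        diff = abs(a - b)
        pair_mean = (a + b) / 2
        if diff > abs_tol and pair_mean > 0 and diff / pair_mean > rel_tol:
            flagged.append(i)
    return flagged
```

Requiring both conditions avoids flagging small absolute differences at low concentrations, where relative differences are naturally large.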
In Peru, the government entrusts the monitoring of air quality to SENAMHI, which continuously monitors air quality and was the main reference source for testing the accuracy of the mobile sensor units deployed for this study.
Model establishment and validation
A multiple regression model was developed to analyze the relationship between the reference measurements (PM2.5 and PM10) and the sensor data and meteorological parameters (temperature and relative humidity). Two regression models, Model 1 and Model 2, were developed according to Eq. 1, and they are summarized in Table 2.
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2\,\mathrm{tem} + \beta_3\,\mathrm{hum} + \varepsilon \qquad (1)$$
where Yi is the reference PM concentration (μg/m3) for particles with an aerodynamic diameter of less than i μm (i = 2.5 or 10); Xi is the corresponding PurpleAir measurement from Channel "A" (μg/m3); tem is the temperature and hum is the relative humidity recorded at AQSMarte; β0 to β3 are the regression coefficients; and ε is the residual error.
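A model of the form Yi = β0 + β1·Xi + β2·tem + β3·hum can be fitted by ordinary least squares. The sketch below solves the normal equations in plain Python and reports the coefficient of determination and the RMSE (computed here over all n observations). It is a minimal illustration, assuming time-aligned lists of reference PM, channel A PM, temperature, and relative humidity; `fit_mlr` is a hypothetical name, not the authors' code.

```python
import math


def fit_mlr(y, x, tem, hum):
    """OLS fit of y = b0 + b1*x + b2*tem + b3*hum via the normal equations.

    Returns (coefficients [b0, b1, b2, b3], r_squared, rmse).
    """
    rows = [[1.0, xi, ti, hu] for xi, ti, hu in zip(x, tem, hum)]  # design matrix A
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(k):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * p for a, p in zip(m[row], m[col])]
    coef = [m[i][k] / m[i][i] for i in range(k)]
    # Goodness of fit
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_mean = sum(y) / len(y)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))
```

Solving the normal equations is adequate for four coefficients with well-scaled inputs; for larger or ill-conditioned problems, an SVD-based least-squares solver such as `numpy.linalg.lstsq` is more robust.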
According to the results shown in Table 2, the overall p-values of Model 1 and Model 2 are both < 2.2e−16. This is highly significant: it means that at least one of the predictor variables (Xi, tem, and hum) is significantly related to the outcome variable (Yi). In both models, Xi and tem are the most significant predictors of Yi.
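The "at least one predictor" interpretation is that of the overall F-test of a regression. The sketch below computes its statistic from R², the number of observations n, and the number of predictors k (here k = 3: Xi, tem, and hum); `overall_f_statistic` is a hypothetical helper. The p-value is the upper tail of the F distribution with (k, n − k − 1) degrees of freedom, available for example as `scipy.stats.f.sf(F, k, n - k - 1)`; in R's `summary.lm` output, "< 2.2e-16" is the display floor for p-values smaller than machine epsilon rather than an exact value.

```python
def overall_f_statistic(r_squared, n, k):
    """Overall F statistic for a regression with k predictors plus an intercept.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with n observations.
    """
    if n <= k + 1:
        raise ValueError("need more observations than parameters")
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))
```

With hypothetical values R² = 0.64, n = 100, and k = 3, the statistic is (0.64/3)/(0.36/96) ≈ 56.9.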
Results and discussion
Results obtained by reference monitors
Figure 3 shows the hourly and daily average PM2.5 and PM10 concentrations during the study period (October 3 to October 24). The daily average concentration of PM2.5 during this period was 24.5 μg/m3 and ranged from 14.4 to 36.4 μg/m3. According to the environmental legislation, the air quality standard for PM2.5 was not violated during this period because on no day did the daily average exceed 50 μg/m3 (MINAM 2017); this is consistent with previous research in Peru using meteorological stations (Arias Vel et al. 2020) and with a temporal and spatial analysis of traffic over the last 5 years (Romero et al. 2020). Moreover, the daily PM10 concentration never exceeded the standard of 100 μg/m3 during our 3-week study period; the daily average was 34.6 μg/m3, ranging from 19.7 to 48.7 μg/m3.
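The comparison of daily averages with the 24-h standards can be sketched as follows. The limits (50 μg/m3 for PM2.5 and 100 μg/m3 for PM10) are those cited in the text (MINAM 2017); the function names are hypothetical, the input is assumed to be (date, hourly value) pairs, and no minimum data-completeness rule per day is applied, which a regulatory evaluation may require.

```python
from collections import defaultdict
from statistics import mean

# 24-h standards cited in the text (MINAM 2017), in ug/m3
STANDARDS_24H = {"PM2.5": 50.0, "PM10": 100.0}


def daily_means(hourly):
    """hourly: iterable of (date_string, value) pairs -> {date: daily mean}."""
    by_day = defaultdict(list)
    for day, value in hourly:
        by_day[day].append(value)
    return {day: mean(values) for day, values in by_day.items()}


def exceedance_days(hourly, pollutant):
    """Sorted dates whose daily mean exceeds the 24-h standard for `pollutant`."""
    limit = STANDARDS_24H[pollutant]
    return sorted(day for day, m in daily_means(hourly).items() if m > limit)
```

An empty result from `exceedance_days` corresponds to the situation reported above, in which no daily average exceeded the standard.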
During our study period, temperatures were relatively low, between 13.7 and 21.1 °C (average 16 °C); relative humidity was high, between 64 and 97% (average 86%); wind direction ranged from 3° to 357° (average 216.7°); and wind speed ranged from 0 to 5.5 m/s, with an average of 2.4 m/s. The variations in the meteorological parameters (temperature, relative humidity, wind speed, and wind direction) measured at the reference station are shown in Fig. 4.
Results obtained by PurpleAir sensors
All the data provided by the PurpleAir sensor must be considered. As briefly mentioned in the "Technical specifications of the low-cost sensor—PurpleAir Technology" section, channels "A" and "B" correspond to identical laser counters, whose readings should therefore be comparable. In this study, the ATM data from primary channels "A" and "B" were chosen for analysis because of the outdoor conditions. PM2.5a and PM10a correspond to channel "A," and PM2.5b and PM10b correspond to channel "B."
This performance evaluation was carried out using data observed by the sensor platforms and data obtained from the reference monitor, which were statistically analyzed and compared.
PMs—particulate matter with diameters less than 2.5 μm and 10 μm
Figure 5 shows the trends of the PurpleAir sensor data versus the reference measurements. The results show positive correlations between the sensor PM10 and PM2.5 data (for both channels of the PurpleAir sensor) and the reference measurements; channel "A" presents the highest correlations, 0.8 for PM2.5a and 0.4 for PM10a. Therefore, we recommend using channel A for outdoor PM2.5 and PM10 studies.
Based on the multiple regression model developed in the "Model establishment and validation" section and the summary of results in Table 2, the following figures can be analyzed. Figures 6 and 7 show the diagnostics of the multiple regression models. In both cases, the developed models perform very well according to the normal quantile–quantile (Q-Q) analysis, which compares the distribution of the standardized residuals with the expected normal distribution. The standardized residual analysis of the PM2.5 and PM10 data therefore indicates that a linear regression approach is appropriate.
The residuals-versus-fitted plots for Model 1 and Model 2 show that the residuals, that is, the differences between the observed Yi values and the values predicted from Xi, tem, and hum, are small and scattered around 0, which suggests that there are no outliers in the relationship between the linear predictor and the residuals. The normal Q-Q plots show that the standardized residuals follow an approximately straight line for Model 1 and Model 2, indicating that they are approximately normally distributed. In both cases, all the predictor variables have a considerable influence on the developed regression models (Figure 8).
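The points of a normal Q-Q plot can be computed as follows: the residuals are standardized, sorted, and paired with standard normal quantiles at plotting positions (i − 0.5)/n for the i-th smallest residual. This is a simplified sketch using `statistics.NormalDist` from the Python standard library; the hypothetical `qq_points` divides by the sample standard deviation, whereas many statistics packages use leverage-adjusted standardized residuals, so their plots can differ slightly.

```python
from statistics import NormalDist, stdev


def qq_points(observed, fitted):
    """Pairs (theoretical normal quantile, standardized residual) for a Q-Q plot."""
    residuals = [o - f for o, f in zip(observed, fitted)]
    s = stdev(residuals)
    std_res = sorted(r / s for r in residuals)
    n = len(std_res)
    # Plotting positions (i - 0.5) / n for i = 1..n, written 0-based as (i + 0.5) / n
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, std_res))
```

If the pairs lie close to a straight line, the residuals are approximately normal; the residuals-versus-fitted check uses the same residuals plotted against the fitted values instead.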
Conclusions
Low-cost sensors represent a potential alternative that can complement reference air quality monitoring stations worldwide because of their low cost and minimal maintenance requirements during operation. This study shows that different low-cost technologies have already been introduced in Lima, Peru, for informative and research purposes; however, few previous studies have compared their performance with that of reference monitors, such as those of SENAMHI, which manages the reference air quality monitoring stations. It is important to point out that air quality data must be collected at high spatial and temporal resolution to develop better assessments and to inform the private and public sectors, given the effects of air quality on health.
The results show a high positive correlation between the sensor data and the reference measurements for PM2.5 and PM10, especially for PM2.5 (r2 = 0.8) for the "A" and "B" channels. Two multiple regression models (Model 1 and Model 2) were developed taking into account the sensor data (PM2.5 and PM10 concentrations) and meteorological parameters, namely temperature and relative humidity. Both models performed well, and temperature was a more influential meteorological parameter than relative humidity. The PurpleAir sensor performs well in measuring PM2.5 and PM10 but not temperature and relative humidity.
Additional studies must be performed to ensure the accuracy of sensor data relative to reference measurements. A follow-up study will compare the performance of the multiple regression model with that of machine learning techniques.
References
Arias Velásquez, R. M., & Mejia Lara, J. V. (2018). Life estimation of shunt power reactors considering a failure core heating by floating potentials. Engineering Failure Analysis, 86, 142–157. https://doi.org/10.1016/j.engfailanal.2018.01.004.
Arias Velásquez, R. M., & Mejía Lara, J. V. (2020). Forecast and evaluation of COVID-19 spreading in USA with reduced-space Gaussian process regression. Chaos, Solitons & Fractals, 136, 109924. https://doi.org/10.1016/j.chaos.2020.109924.
Badura, M., Batog, P., Drzeniecka-Osiadacz, A., & Modzel, P. (2018). Evaluation of low-cost sensors for ambient PM2.5 monitoring. Journal of Sensors, 2018.
Costa, L. G., Cole, T. B., Coburn, J., Chang, Y.-C., Dao, K., & Roqué, P. J. (2017). Neurotoxicity of traffic-related air pollution. Neurotoxicology, 59, 133–139. https://doi.org/10.1016/j.neuro.2015.11.008.
Idrees, Z., & Zheng, L. (2020). Low cost air pollution monitoring systems: A review of protocols and enabling technologies. Journal of Industrial Information Integration, 17, 100123. https://doi.org/10.1016/j.jii.2019.100123.
Ierodiakonou, D., Zanobetti, A., Coull, B. A., Melly, S., Postma, D. S., Boezen, H. M., Vonk, J. M., Williams, P. V., Shapiro, G. G., McKone, E. F., Hallstrand, T. S., Koenig, J. Q., Schildcrout, J. S., Lumley, T., Fuhlbrigge, A. N., Koutrakis, P., Schwartz, J., Weiss, S. T., & Gold, D. R. (2016). Ambient air pollution, lung function, and airway responsiveness in asthmatic children. The Journal of Allergy and Clinical Immunology, 137, 390–399. https://doi.org/10.1016/j.jaci.2015.05.028.
INEI. (2018). Perú: crecimiento y distribución de la población, 2017 (pp. 1–91). https://www.inei.gob.pe/media/inei_en_los_medios/Conferencia_Prensa_CPV2017.pdf. Accessed 28 Aug 2018.
Kim, K.-H., Kumar, P., Szulejko, J. E., Adelodun, A. A., Junaid, M. F., Uchimiya, M., & Chambers, S. (2017). Toward a better understanding of the impact of mass transit air pollutants on human health. Chemosphere, 174, 268–279. https://doi.org/10.1016/j.chemosphere.2017.01.113.
Landrigan, P. J., Fuller, R., Acosta, N. J. R., Adeyi, O., Arnold, R., Baldé, A. B., Bertollini, R., Bose-O’Reilly, S., Boufford, J. I., Breysse, P. N., et al. (2018). The Lancet Commission on pollution and health. Lancet, 391, 462–512.
Li, J., Zhang, H., Chao, C., Chien, C., Wu, C., Luo, C. H., Chen, L.-J., & Biswas, P. (2020). Integrating low-cost air quality sensor networks with fixed and satellite monitoring systems to study ground-level PM2.5. Atmospheric Environment, 223, 117293. https://doi.org/10.1016/j.atmosenv.2020.117293.
MINAM, & del Carmen, A. M. (2017). Decreto Supremo No 003-2017-MINAM - Estándares de Calidad Ambiental (ECA) para Aire. Normas Legales El Peruano, 397, 187–196.
Munir, S., Mayfield, M., Coca, D., Jubb, S. A., & Osammor, O. (2019). Analysing the performance of low-cost air quality sensors, their drivers, relative benefits and calibration in cities—A case study in Sheffield. Environmental Monitoring and Assessment, 191, 94.
Nyhan, M., McNabola, A., & Misstear, B. (2013). Comparison of particulate matter dose and acute heart rate variability response in cyclists, pedestrians, bus and train passengers. Sci. Total Environ., 468–469, 821–831. https://doi.org/10.1016/j.scitotenv.2013.08.096.
Panem, C., Gad, V., & Gad, R. (2020). Sensor's data transmission with BPSK using LDPC (Min-Sum) error corrections over MIMO channel: Analysis over RMSE and BER. Materials Today: Proceedings, 27(1), 571–575. https://doi.org/10.1016/j.matpr.2019.12.039.
PlanTower. (2016). Product data manual of PlanTower: Digital universal particle concentration sensor PMS5003 series data manual (pp. 1–15). http://www.aqmd.gov/docs/default-source/aq-spec/resources-page/plantower-pms5003-manual_v2-3.pdf. Accessed 21 Nov 2019.
PurpleAir PA-II-SD [WWW Document]. (n.d.). https://www2.purpleair.com/products/purpleair-pa-ii-sd. Accessed 21 Nov 2019.
Romero, Y., Chicchon, N., Duarte, F., Noel, J., Ratti, C., & Nyhan, M. (2019). Quantifying and spatial disaggregation of air pollution emissions from ground transportation in a developing country context: Case study for the Lima metropolitan area in Peru. Sci. Total Environ., 698, 134313. https://doi.org/10.1016/j.scitotenv.2019.134313.
Romero, Y., Diaz, C., Meldrum, I., Arias, V. R., & Noel, J. (2020). Temporal and spatial analysis of traffic-related pollutant under the influence of the seasonality and meteorological variables over an urban city in Peru. Heliyon, 6(6), e04029. https://doi.org/10.1016/j.heliyon.2020.e04029.
Rumelhart, D. E. (1986). Learning internal representations by error propagation. In Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 318–362).
Sheats, J. R., Biesty, D., Noel, J., & Taylor, G. N. (2010). Printing technology for ubiquitous electronics. Circuit World, 36(2), 40–47. https://doi.org/10.1108/03056121011041690.
Snyder, E. G., Watkins, T. H., Solomon, P. A., Thoma, E. D., Williams, R. W., Hagler, G. S. W., Shelow, D., Hindin, D. A., Kilaru, V. J., & Preuss, P. W. (2013). The changing paradigm of air pollution monitoring. Environmental Science & Technology, 47(20), 11369–11377.
Spinelle, L., Gerboles, M., Villani, M. G., Aleixandre, M., & Bonavitacola, F. (2015). Field calibration of a cluster of low-cost available sensors for air quality monitoring. Part A: Ozone and nitrogen dioxide. Sensors and Actuators B: Chemical, 215, 249–257.
Velásquez, R. A., Ramos, Y. R., & Noel, J. (2019). Citizen science approach for spatiotemporal modelling of air pollution quality and traffic in Lima, Peru. In 2019 IEEE Sciences and Humanities International Research Conference (SHIRCON) (pp. 1-4). IEEE.
Weissert, L., Alberti, K., Miles, E., Miskell, G., Feenstra, B., Henshaw, G., Papapostolou, V., Patel, H., Polidori, A., Salmond, J., & Williams, D. (2020). Low-cost sensor networks and land-use regression: Interpolating nitrogen dioxide concentration at high temporal and spatial resolution in Southern California. Atmospheric Environment, 223, 117287. https://doi.org/10.1016/j.atmosenv.2020.117287.
WHO (World Health Organization). (n.d.). How air pollution is destroying our health [WWW Document]. https://www.who.int/air-pollution/news-and-events/how-air-pollution-is-destroying-our-health. Accessed 17 Oct 2019.
Acknowledgments
This work was supported by the UTEC MIT investigacion seedfund #033-OI-2019.
Author information
Contributions
Yovitza Romero: conceptualization, methodology, validation, formal analysis, investigation, data curation, writing—original draft, writing—review editing, visualization, and supervision
Ricardo Manuel Arias Velásquez: conceptualization, methodology, validation, formal analysis, investigation, data curation, writing—original draft, writing—review editing, visualization, supervision
Julien Noel: conceptualization, methodology, validation, formal analysis, investigation, data curation, writing—original draft, writing—review editing, visualization, supervision, project administration
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Capsule
We provide a framework for performance evaluations of low-cost sensors using reference monitoring stations.
About this article
Cite this article
Romero, Y., Velásquez, R.M.A. & Noel, J. Development of a multiple regression model to calibrate a low-cost sensor considering reference measurements and meteorological parameters. Environ Monit Assess 192, 498 (2020). https://doi.org/10.1007/s10661-020-08440-w
DOI: https://doi.org/10.1007/s10661-020-08440-w