Abstract
Air pollution is a critical environmental problem with detrimental effects on human health. It affects all regions of the world, especially low-income cities, where critical levels have been reached. Air pollution plays a direct role in public health, climate change, and the worldwide economy. Effective actions to mitigate air pollution, e.g. research and decision making, require high-resolution observations. This has motivated the emergence of new low-cost sensor technologies, which have the potential to provide high-resolution data thanks to their accessible prices. However, because low-cost sensors are built with relatively low-cost materials, they tend to be unreliable: their measurements are prone to errors, gaps, bias, and noise. All these problems need to be solved before the data can be used to support research or decision making. In this paper, we address the problem of data imputation on a daily air pollution data set with relatively small gaps. Our main contributions are: (1) an air pollution data set composed of several air pollutant concentrations, including criteria gases, and thirteen meteorological covariates; and (2) a custom algorithm for imputing daily ozone concentrations based on a trend surface and a Gaussian Process. Data visualization techniques were used extensively throughout this work, as they are useful tools for understanding the multi-dimensionality of point-referenced sensor data.
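The abstract does not spell out the imputation algorithm, so the following is only a minimal sketch of the general idea of combining a trend surface with a Gaussian Process, not the authors' method. It assumes a first-order (planar) trend fitted by ordinary least squares over station coordinates, a squared-exponential covariance on the residuals, and fixed, hand-picked hyperparameters. The function names (`trend_design`, `rbf_kernel`, `impute_day`), the parameter values, and the toy data are all hypothetical.

```python
import numpy as np


def trend_design(coords):
    """Design matrix [1, x, y] for a first-order trend surface (an assumption)."""
    return np.column_stack([np.ones(len(coords)), coords[:, 0], coords[:, 1]])


def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between two sets of 2-D locations."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def impute_day(coords, values, length_scale=1.0, variance=1.0, noise=0.1):
    """Fill NaN entries in one day's station readings.

    The fill value is the fitted trend surface plus the GP posterior mean
    of the trend residuals. Hyperparameters are fixed here, not estimated.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    if not missing.any():
        return values.copy()
    obs = ~missing

    # Step 1: least-squares trend surface on the observed stations.
    X_obs = trend_design(coords[obs])
    beta, *_ = np.linalg.lstsq(X_obs, values[obs], rcond=None)
    resid = values[obs] - X_obs @ beta

    # Step 2: GP posterior mean of the residuals at the missing stations.
    K = rbf_kernel(coords[obs], coords[obs], length_scale, variance)
    K += noise**2 * np.eye(int(obs.sum()))
    K_star = rbf_kernel(coords[missing], coords[obs], length_scale, variance)
    gp_mean = K_star @ np.linalg.solve(K, resid)

    filled = values.copy()
    filled[missing] = trend_design(coords[missing]) @ beta + gp_mean
    return filled


# Toy data (synthetic, for illustration only): readings lie exactly on a plane.
coords = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [2, 0]], dtype=float)
truth = 10 + 2 * coords[:, 0] + 3 * coords[:, 1]
observed = truth.copy()
observed[3] = np.nan  # simulate a gap at one station
filled = impute_day(coords, observed)
print(filled[3])
```

Because the toy readings are exactly planar, the residuals vanish and the imputed value recovers the plane at the gap. On real data the residual term carries the local spatial structure, and the kernel hyperparameters would need to be estimated rather than fixed.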
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Gualán, R., Saquicela, V., Tran-Thanh, L. (2019). EDA and a Tailored Data Imputation Algorithm for Daily Ozone Concentrations. In: Botto-Tobar, M., Barba-Maggi, L., González-Huerta, J., Villacrés-Cevallos, P., S. Gómez, O., Uvidia-Fassler, M. (eds) Information and Communication Technologies of Ecuador (TIC.EC). TICEC 2018. Advances in Intelligent Systems and Computing, vol 884. Springer, Cham. https://doi.org/10.1007/978-3-030-02828-2_27
DOI: https://doi.org/10.1007/978-3-030-02828-2_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02827-5
Online ISBN: 978-3-030-02828-2
eBook Packages: Intelligent Technologies and Robotics (R0)