EDA and a Tailored Data Imputation Algorithm for Daily Ozone Concentrations

  • Conference paper
Information and Communication Technologies of Ecuador (TIC.EC) (TICEC 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 884)

Abstract

Air pollution is a critical environmental problem with detrimental effects on human health. It affects every region of the world, especially low-income cities, where critical levels have been reached. Air pollution plays a direct role in public health, climate change, and the worldwide economy. Effective actions to mitigate air pollution, such as research and decision making, require high-resolution observations. This has motivated the emergence of new low-cost sensor technologies, which can provide high-resolution data thanks to their accessible prices. However, because low-cost sensors are built with relatively inexpensive materials, they tend to be unreliable: their measurements are prone to errors, gaps, bias, and noise. All of these problems must be solved before the data can be used to support research or decision making. In this paper, we address the problem of data imputation on a daily air pollution data set with relatively small gaps. Our main contributions are: (1) an air pollution data set composed of several air pollutant concentrations, including criteria gases, and thirteen meteorological covariates; and (2) a custom algorithm for imputing daily ozone concentrations based on a trend surface and a Gaussian process. Data visualization techniques were used extensively throughout this work, as they are useful tools for understanding the multi-dimensionality of point-referenced sensor data.
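To make the idea of combining a trend surface with a Gaussian process concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single day of readings at stations with two-dimensional coordinates, a polynomial trend surface fitted by least squares, and a squared-exponential kernel with fixed hyperparameters. The function names (`trend_design`, `rbf_kernel`, `impute_trend_gp`), the kernel choice, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def trend_design(coords, degree=1):
    """Design matrix of polynomial terms x^i * y^j with i + j <= degree."""
    x, y = coords[:, 0], coords[:, 1]
    cols = [x**i * y**j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.column_stack(cols)


def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between two sets of points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def impute_trend_gp(coords, values, degree=1,
                    length_scale=2.0, variance=1.0, noise=1e-2):
    """Fill NaN entries with trend-surface prediction plus GP residual mean.

    Hyperparameters are fixed here (an assumption); a real analysis
    would estimate them from the data.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    observed = ~missing
    filled = values.copy()
    if not missing.any():
        return filled

    # Step 1: fit the trend surface to the observed stations only.
    X = trend_design(coords, degree)
    beta, *_ = np.linalg.lstsq(X[observed], values[observed], rcond=None)
    residuals = values[observed] - X[observed] @ beta

    # Step 2: GP posterior mean of the residuals at the missing stations.
    K = rbf_kernel(coords[observed], coords[observed], length_scale, variance)
    K += noise * np.eye(observed.sum())
    K_star = rbf_kernel(coords[missing], coords[observed],
                        length_scale, variance)
    gp_mean = K_star @ np.linalg.solve(K, residuals)

    # Step 3: imputed value = trend + interpolated residual.
    filled[missing] = X[missing] @ beta + gp_mean
    return filled


# Synthetic demo: a planar "ozone" field with three missing stations.
rng = np.random.default_rng(0)
demo_coords = rng.uniform(0.0, 10.0, size=(30, 2))
demo_truth = 40.0 + 1.5 * demo_coords[:, 0] - 0.8 * demo_coords[:, 1]
demo_values = demo_truth.copy()
demo_missing = [3, 11, 27]
demo_values[demo_missing] = np.nan
demo_filled = impute_trend_gp(demo_coords, demo_values)
```

Splitting the model this way lets the trend surface capture the large-scale spatial gradient while the Gaussian process interpolates whatever local structure remains in the residuals. On the synthetic planar field above the residuals are essentially zero, so the imputed values come almost entirely from the trend.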

Notes

  1. https://github.com/rgualan/soton-data-science-thesis

  2. https://aqs.epa.gov/aqsweb/airdata/download_files.html

  3. https://www.esrl.noaa.gov/psd/data/gridded/data.narr.monolevel.html

Author information

Corresponding author

Correspondence to Ronald Gualán.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Gualán, R., Saquicela, V., Tran-Thanh, L. (2019). EDA and a Tailored Data Imputation Algorithm for Daily Ozone Concentrations. In: Botto-Tobar, M., Barba-Maggi, L., González-Huerta, J., Villacrés-Cevallos, P., S. Gómez, O., Uvidia-Fassler, M. (eds) Information and Communication Technologies of Ecuador (TIC.EC). TICEC 2018. Advances in Intelligent Systems and Computing, vol 884. Springer, Cham. https://doi.org/10.1007/978-3-030-02828-2_27
