
A Novel Approach to Detect Missing Values Patterns in Time Series Data

  • Conference paper
Information and Communication Technologies of Ecuador (TIC.EC) (TICEC 2019)

Abstract

The growing number of environmental sensors deployed to capture the behavior of cities produces large amounts of shared data. However, missing values are unavoidable, and they become a critical problem for studies that require data analysis over extensive periods. The problem is most evident in longitudinal studies, which depend on data collected over long periods. Hence, a convenient approach is to support the data collection rules by determining the behavior of common missing-data slots. This is possible by discovering missing-data patterns in time series through four steps: (1) defining the data matrices, (2) computing and categorizing the missing periods using the proposed algorithm, (3) identifying the time analysis scenarios, and (4) applying the Kernel Density Estimation algorithm. This paper describes the experimentation of this method using a real air quality dataset from Cuenca, Ecuador, collected over one year. The results show that the proposed approach is useful for revealing missing-data patterns. The approach also provides a good starting point for companies and laboratories interested in improving their data collection rules.
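As a rough illustration of steps (2) and (4), the Python sketch below finds runs of consecutive missing readings in an hourly series, labels each run by length, and estimates a kernel density over the hour of day at which the runs begin. It is not the authors' algorithm, which the abstract does not specify. The function names, the two-class short/long split, the fixed bandwidth, and the synthetic series are all assumptions made for the example.

```python
import math
from datetime import datetime, timedelta

def find_gaps(timestamps, values):
    """Return (start_time, length) for each run of consecutive missing values."""
    gaps, run_start, run_len = [], None, 0
    for ts, value in zip(timestamps, values):
        if value is None:
            if run_len == 0:
                run_start = ts
            run_len += 1
        elif run_len:
            gaps.append((run_start, run_len))
            run_len = 0
    if run_len:  # the series ends inside a gap
        gaps.append((run_start, run_len))
    return gaps

def categorize(length, short_max=3):
    """Hypothetical two-class split; the paper's categories may differ."""
    return "short" if length <= short_max else "long"

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x with a Gaussian kernel."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm
    return density

# Synthetic hourly series (three days) with readings removed at chosen hours.
start = datetime(2018, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(72)]
values = [1.0] * 72
for h in (2, 3, 26, 27, 28, 29, 30, 51):
    values[h] = None

gaps = find_gaps(timestamps, values)
labels = [categorize(length) for _, length in gaps]

# Density of gap start times over the hour of day. This simple version
# treats hours as a line and ignores the wrap-around between 23:00 and 00:00.
start_hours = [gap_start.hour for gap_start, _ in gaps]
density = gaussian_kde(start_hours, bandwidth=1.0)
peak_hour = max(range(24), key=density)
```

On real data, the bandwidth and the length thresholds would need to be chosen to match the sampling rate and the time analysis scenario (for example, hour of day versus day of week).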

Acknowledgments

This work is part of the project “Aplicación de minería de datos en el análisis de asociaciones entre contaminantes atmosféricos y variables meteorológicas”, supported by the University of Azuay. The authors also thank Chester Sellers and EMOV-EP for providing access to air quality data from Cuenca, Ecuador.

Author information

Corresponding author

Correspondence to Juan-Fernando Lima.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Lima, JF., Ortega-Chasi, P., Orellana Cordero, M. (2020). A Novel Approach to Detect Missing Values Patterns in Time Series Data. In: Fosenca C, E., Rodríguez Morales, G., Orellana Cordero, M., Botto-Tobar, M., Crespo Martínez, E., Patiño León, A. (eds) Information and Communication Technologies of Ecuador (TIC.EC). TICEC 2019. Advances in Intelligent Systems and Computing, vol 1099. Springer, Cham. https://doi.org/10.1007/978-3-030-35740-5_11
