Abstract
The growing number of environmental sensors deployed to capture the behavior of cities produces large amounts of shared data. Missing values are nevertheless unavoidable, and they become a critical problem for studies that analyze data over extensive periods, longitudinal studies in particular. A convenient way to support data collection rules is therefore to determine how common missing-data slots behave. This can be done by discovering missing data patterns in time series in four steps: (1) defining the data matrices, (2) computing and categorizing the missing periods with the proposed algorithm, (3) identifying the time analysis scenarios, and (4) applying the Kernel Density Estimation algorithm. This paper describes an experiment with this method on a real air quality dataset from Cuenca, Ecuador, collected over one year. The results show that the proposed approach is useful for revealing missing data patterns. The approach also provides a good starting point for companies and laboratories interested in improving their data collection rules.
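The general idea behind steps (2) and (4) can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, which the abstract does not specify. The data are synthetic, the names `find_gaps`, `categorize` and `gaussian_kde` are hypothetical, the "short"/"long" threshold is an arbitrary assumption, and the density estimate is a hand-written Gaussian kernel estimator rather than any particular library implementation.

```python
import math
from datetime import datetime, timedelta


def find_gaps(timestamps, values):
    """Return (start_timestamp, length) for each run of consecutive missing readings.

    A reading counts as missing when its value is None (an assumption of this sketch).
    """
    gaps = []
    run_start = None
    run_len = 0
    for ts, value in zip(timestamps, values):
        if value is None:
            if run_start is None:
                run_start, run_len = ts, 0
            run_len += 1
        elif run_start is not None:
            gaps.append((run_start, run_len))
            run_start = None
    if run_start is not None:  # the series ends inside a gap
        gaps.append((run_start, run_len))
    return gaps


def categorize(length, short_max=3):
    """Label a gap by its length; the threshold is a hypothetical choice."""
    return "short" if length <= short_max else "long"


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Synthetic hourly series covering three days (not the Cuenca dataset).
start = datetime(2018, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(72)]
values = [1.0] * 72
for h in (2, 3, 14, 15, 16, 17, 18, 26, 27, 50, 51):
    values[h] = None  # simulate sensor outages

gaps = find_gaps(timestamps, values)
labels = [categorize(length) for _, length in gaps]

# Estimate how gap start times are distributed over the hour of day.
start_hours = [ts.hour for ts, _ in gaps]
density = gaussian_kde(start_hours, bandwidth=1.0)

for (ts, length), label in zip(gaps, labels):
    print(ts.isoformat(), length, label)
print("density at 02:00:", round(density(2), 4))
print("density at 14:00:", round(density(14), 4))
```

In this synthetic series, the estimated density is higher around 02:00, where three of the four gaps begin, than around 14:00. The hour of day is circular, which a plain Gaussian kernel ignores; a real analysis would need to handle the wrap-around at midnight and choose the bandwidth with care.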
Acknowledgments
This work is part of the project “Aplicación de minería de datos en el análisis de asociaciones entre contaminantes atmosféricos y variables meteorológicas” (Application of data mining to the analysis of associations between atmospheric pollutants and meteorological variables), supported by the University of Azuay. The authors also thank Chester Sellers and EMOV-EP for providing access to the air quality data of Cuenca, Ecuador.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lima, JF., Ortega-Chasi, P., Orellana Cordero, M. (2020). A Novel Approach to Detect Missing Values Patterns in Time Series Data. In: Fosenca C, E., Rodríguez Morales, G., Orellana Cordero, M., Botto-Tobar, M., Crespo Martínez, E., Patiño León, A. (eds) Information and Communication Technologies of Ecuador (TIC.EC). TICEC 2019. Advances in Intelligent Systems and Computing, vol 1099. Springer, Cham. https://doi.org/10.1007/978-3-030-35740-5_11
DOI: https://doi.org/10.1007/978-3-030-35740-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35739-9
Online ISBN: 978-3-030-35740-5
eBook Packages: Intelligent Technologies and Robotics (R0)