Abstract
Missing values in time series datasets pose significant challenges for accurate data analysis and modeling. In this paper, we present a comparative study of missing value imputation algorithms applied to time series datasets collected from various sensors over a period of six months. The goal is to bridge the data gap by replacing missing values effectively and to assess the performance of three common imputation algorithms for time series: the K-Nearest Neighbors (KNN) imputer, Expectation-Maximization (EM), and Multiple Imputation by Chained Equations (MICE). We evaluated the imputation techniques using Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The algorithms differed in how effectively they handled missing values in the time series datasets. Our findings highlight the importance of choosing an imputation algorithm based on the characteristics of the dataset and the specific requirements of the analysis. The results also demonstrate the potential of the MICE imputer in closing the data gap and improving the accuracy of subsequent analyses on time series sensor data. Overall, this study provides insight into the performance and suitability of different missing value imputation algorithms for time series datasets, supporting better decision-making and more reliable data-driven applications in various domains.
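The evaluation idea summarized above (hide values that are actually known, impute them, then score the imputed values with RMSE and MAE) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sensor readings are invented, `knn_impute` is a simplified pure-Python stand-in for a KNN imputer rather than the implementation used in the study, and a column-mean baseline is included only to give the comparison a second method. EM and MICE are not shown.

```python
import math

def rmse(truth, estimate):
    """Root Mean Squared Error over paired values."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))

def mae(truth, estimate):
    """Mean Absolute Error over paired values."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)

def mean_impute(rows):
    """Baseline: replace each missing entry (None) with its column mean."""
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        col_mean = sum(observed) / len(observed)
        for r in filled:
            if r[j] is None:
                r[j] = col_mean
    return filled

def knn_impute(rows, k=2):
    """Simplified KNN imputer (illustrative assumption, not the study's code).

    A missing entry is replaced by the mean of that column over the k rows
    closest to the incomplete row. Distance is the mean squared difference
    over the columns observed in both rows.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    candidates.append((sum(shared) / len(shared), other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical complete readings: (temperature, humidity) per time step.
complete = [
    [20.0, 40.0],
    [21.0, 43.0],
    [22.0, 44.0],
    [23.0, 45.0],
    [24.0, 48.0],
]

# Hide two known values so the imputations can be scored against the truth.
hidden = [(1, 1), (3, 0)]
observed = [list(r) for r in complete]
for i, j in hidden:
    observed[i][j] = None

def score(filled):
    """Compute (RMSE, MAE) on the artificially hidden positions only."""
    truth = [complete[i][j] for i, j in hidden]
    estimate = [filled[i][j] for i, j in hidden]
    return rmse(truth, estimate), mae(truth, estimate)

knn_rmse, knn_mae = score(knn_impute(observed, k=2))
mean_rmse, mean_mae = score(mean_impute(observed))

print(f"KNN  imputer: RMSE={knn_rmse:.3f}  MAE={knn_mae:.3f}")
print(f"Mean imputer: RMSE={mean_rmse:.3f}  MAE={mean_mae:.3f}")
```

Scoring only the artificially hidden positions keeps the metrics tied to values whose ground truth is known. On real sensor data, the masking pattern and the choice of `k` would need to reflect how the gaps actually arise.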
Abbreviations
- KNN: K-Nearest Neighbors
- EM: Expectation-Maximization
- MICE: Multiple Imputation by Chained Equations
- RMSE: Root Mean Squared Error
- MAE: Mean Absolute Error
- MVI: Missing Value Imputation
- EMMVI: Expectation-Maximization Missing Value Imputation
- MMVI: Multiple Imputation by Chained Equations (MICE)
- LLSMVI: Locally Linear Stochastic Missing Value Imputation
- BPCAMVI: Bayesian Principal Component Analysis Missing Value Imputation
- WSNs: Wireless Sensor Networks
- LRMVI: Latent Regression Missing Value Imputation
- NRMSE: Normalized Root Mean Squared Error
- MSE: Mean Squared Error
- RF: Random Forest
- SVM: Support Vector Machines
- BPCA: Bayesian Principal Component Analysis
- DT: Decision Tree
- ML: Machine Learning
- CVBKNNI: Cross-Validation Based k-Nearest Neighbor Imputation
- RNNs: Recurrent Neural Networks
- MuSDRI: Multi-Seasonal Decomposition based Recurrent Imputation
- HPGR: High-Pressure Grinding Rolls
- IQR: Interquartile Range
Acknowledgments
This work was supported by the Slovak Scientific Grant Agency VEGA under contract 2/0135/23 “Intelligent sensor systems and data processing” and by the project “Research on the application of artificial intelligence tools in the analysis and classification of hyperspectral sensing data” (ITMS: NFP313011BWC9), supported by the Operational Programme Integrated Infrastructure (OPII) and funded by the ERDF.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hassankhani Dolatabadi, S., Budinská, I., Behmaneshpour, R., Gatial, E. (2024). Closing the Data Gap: A Comparative Study of Missing Value Imputation Algorithms in Time Series Datasets. In: Silhavy, R., Silhavy, P. (eds) Data Analytics in System Engineering. CoMeSySo 2023. Lecture Notes in Networks and Systems, vol 910. Springer, Cham. https://doi.org/10.1007/978-3-031-53552-9_7
DOI: https://doi.org/10.1007/978-3-031-53552-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53551-2
Online ISBN: 978-3-031-53552-9
eBook Packages: Intelligent Technologies and Robotics (R0)