Comparison of Missing Values Handling Techniques Using MICE Package Tools of R Software and Logistic Regression Model

Nadraga, Vasiliy; Smirnov, Volodymyr; Boiko, Oleksandra; Dereko, Vladyslav

doi:10.1007/978-3-030-54215-3_3

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1246)

Included in the following conference series:

International Scientific Conference “Intellectual Systems of Decision Making and Problem of Computational Intelligence”

Abstract

The paper presents the results of research comparing various techniques for the multiple imputation of missing values by chained equations (MICE), using a logistic regression model at the model-verification stage. Missing values complicate data processing and increase the risk involved in solving problems across the many areas where data science techniques are applied. The simulation was performed using both R and KNIME software tools, with the Mammographic Mass dataset from the UCI Machine Learning Repository as the experimental data. In the step-by-step procedure for handling missing values, the first step was to analyse the data and visualize the missing values. Next, the missing values were handled using the various techniques available in the mice package of R. The quality of data processing at each step was estimated with a logistic regression model through ROC analysis, using three quantitative criteria: AUC (area under the ROC curve), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). Finally, the missing-value handling techniques were compared in order to select the best variants according to these criteria.
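The three criteria named above can be computed from the observed classes and the probabilities predicted by a fitted logistic regression. In R they are usually taken from a `glm` fit via `AIC()` and `BIC()` and from an ROC package such as pROC; the sketch below instead writes the standard formulas out in plain Python so that each criterion is explicit. It is not the authors' code: the function names, the toy labels and probabilities, and `n_params` (the number of estimated coefficients, intercept included) are illustrative assumptions, and the predicted probabilities are assumed to lie strictly between 0 and 1. A higher AUC and lower AIC and BIC indicate a better model.

```python
import math
from typing import Dict, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case (y = 1) scores higher than a randomly chosen
    negative case (y = 0); tied scores count as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bernoulli_log_likelihood(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Log-likelihood of binary outcomes given predicted P(y = 1)."""
    ll = 0.0
    for y, p in zip(y_true, probs):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


def evaluate(y_true: Sequence[int], probs: Sequence[float], n_params: int) -> Dict[str, float]:
    """Hypothetical helper: all three criteria for one imputed dataset."""
    ll = bernoulli_log_likelihood(y_true, probs)
    return {
        "AUC": auc(y_true, probs),
        "AIC": aic(ll, n_params),
        "BIC": bic(ll, n_params, len(y_true)),
    }


if __name__ == "__main__":
    # Toy labels and predicted probabilities, made up for illustration only.
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(evaluate(y, p, n_params=2))
```

With labels `[0, 0, 1, 1]` and probabilities `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, so `auc` returns 0.75. Because BIC depends on the number of observations, comparing it across imputation techniques is only meaningful when every model is fitted to the same number of rows.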

References

Center for Machine Learning and Intelligent Systems: Machine Learning Repository. http://archive.ics.uci.edu/ml/datasets/mammographic+mass
KNIME. https://www.knime.com/
Allotey, P., Harel, O.: Multiple imputation for incomplete data in environmental epidemiology research. Curr. Environ. Health Rep. 6(2), 62–71 (2020). https://doi.org/10.1007/s40572-019-00230-y
Babichev, S., Kornelyuk, A., Lytvynenko, V., Osypenko, V.: Computational analysis of microarray gene expression profiles of lung cancer. Biopolymers Cell 32(1), 70–79 (2016). https://doi.org/10.7124/bc.00090F
Babichev, S., Škvor, J., Fišer, J., Lytvynenko, V.: Technology of gene expression profiles filtering based on wavelet analysis. Int. J. Intell. Sys. Appl. 10(4), 1–7 (2018). https://doi.org/10.5815/ijisa.2018.04.01
Babichev, S., Lytvynenko, V., Škvor, J., Fišer, J.: Model of the objective clustering inductive technology of gene expression profiles based on SOTA and DBSCAN clustering algorithms. Adv. Intell. Sys. Comput. 689, 21–39 (2018). https://doi.org/10.1007/978-3-319-70581-1_2
Babichev, S., Lytvynenko, V., Osypenko, V.: Implementation of the objective clustering inductive technology based on DBSCAN clustering algorithm. In: Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017, vol. 1, pp. 479–484 (2017). https://doi.org/10.1109/STC-CSIT.2017.8098832
Barth, A., Wallerman, J., Stahl, G.: Spatially consistent nearest neighbor imputation of forest stand data. Remote Sens. Environ. 113(3), 546–553 (2009). https://doi.org/10.1016/j.rse.2008.09.011
Chhabra, G., Vashisht, V., Ranjan, J.: A review on missing data value estimation using imputation algorithm. J. Adv. Res. Dyn. Control Sys. 11(7), 312–318 (2019)
Choi, J., Dekkers, O., Cessie, S.: A comparison of different methods to handle missing data in the context of propensity score analysis. Eur. J. Epidemiol. 34(1), 23–36 (2019). https://doi.org/10.1007/s10654-018-0447-z
Choudhury, S., Pal, N.: Imputation of missing data with neural networks for classification. Knowl. Based Syst. 182 (2019). Article no. 104838. https://doi.org/10.1016/j.knosys.2019.07.009
Cihan, P., Ozger, Z.: A new heuristic approach for treating missing value: ABCimp. Elektron. Elektrotech. 25(6), 48–54 (2019). https://doi.org/10.5755/j01.eie.25.6.24826
Elter, M., Schulz-Wendtland, R., Wittenberg, T.: The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Med. Phys. 34(11), 4164–4172 (2007). https://doi.org/10.1118/1.2786864
Ferguson, K., Yu, Y., Cantonwine, D., McElrath, T., Meeker, J., Mukherjee, B.: Foetal ultrasound measurement imputations based on growth curves versus multiple imputation chained equation (MICE). Paediatr. Perinat. Epidemiol. 32(5), 469–473 (2018). https://doi.org/10.1111/ppe.12486
Fitzmaurice, G., Lipsitz, S., Weiss, R.: Sensitivity analysis for non-monotone missing binary data in longitudinal studies: application to the NIDA collaborative cocaine treatment study. Stat. Methods Med. Res. 28(10–11), 3057–3073 (2019). https://doi.org/10.1177/0962280218794725
Ihaka, R., Gentleman, R.: R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5(3), 299–314 (1996). https://doi.org/10.1080/10618600.1996.10474713
Izonin, I., Kryvinska, N., Vitynskyi, P., Tkachenko, R., Zub, K.: GRNN approach towards missing data recovery between IoT systems. Adv. Intell. Sys. Comput. 1035, 445–453 (2020). https://doi.org/10.1007/978-3-030-29035-1_43
Kanishcheva, O., Vysotska, V., Chyrun, L., Gozhyj, A.: Method of integration and content management of the information resources network. Adv. Intell. Sys. Comput. 689, 204–216 (2018). https://doi.org/10.1007/978-3-319-70581-1_14
Landerman, L., Land, K., Pieper, C.: An empirical evaluation of the predictive mean matching method for imputing missing values. Sociol. Methods Res. 26(1), 3–33 (1997). https://doi.org/10.1177/0049124197026001001
Ma, S., Schreiner, P., et al.: Multiple predictively equivalent risk models for handling missing data at time of prediction: with an application in severe hypoglycemia risk prediction for type 2 diabetes. J. Biomed. Inform. 103, 103379 (2020). https://doi.org/10.1016/j.jbi.2020.103379
Meera, S., Rosiline Jeetha, B.: Missing value aware optimal feature selection method for efficient big data mining process. Int. J. Recent Technol. Eng. 8(2), 354–360 (2019). https://doi.org/10.35940/ijrte.B1055.0982S1119
Meyer, P., Olteanu, A.L.: Handling imprecise and missing evaluations in multi-criteria majority-rule sorting. Comput. Oper. Res. 110, 135–147 (2019). https://doi.org/10.1016/j.cor.2019.05.027
Mishchuk, O., Tkachenko, R., Izonin, I.: Missing data imputation through STGM neural-like structure for environmental monitoring tasks. Adv. Intell. Sys. Comput. 938, 142–151 (2020). https://doi.org/10.1007/978-3-030-16621-2_13
Naum, O., Chyrun, L., Vysotska, V., Kanishcheva, O.: Intellectual system design for content formation. In: Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017, vol. 1, pp. 131–138. Institute of Electrical and Electronics Engineers Inc. (2017). https://doi.org/10.1109/STC-CSIT.2017.8098753
Sabri, A., Alfred, R.: Effects of handling missing values of VOCS gases emitted from human for human detection. Int. J. Recent Technol. Eng. 8(2), 1405–1412 (2019). https://doi.org/10.35940/ijrte.B1075.0882S819
Sarkar, S., Pramanik, A., Khatedi, N., Maiti, J.: An investigation of the effects of missing data handling using ‘R’-packages. Adv. Intell. Sys. Comput. 1079, 275–284 (2020). https://doi.org/10.1007/978-981-15-1097-7_24
Shah, A., Bartlett, J., Carpenter, J., Nicholas, O., Hemingway, H.: Comparison of random forest and parametric imputation models for imputing missing data using mice: a caliber study. Am. J. Epidemiol. 179(6), 764–774 (2014). https://doi.org/10.1093/aje/kwt312
Soe, T., Min, M.: Analysis of missing data using matrix-characterized approximations. Stud. Comput. Intell. 845, 117–129 (2020). https://doi.org/10.1007/978-3-030-24344-9_7
van Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–67 (2011). https://www.jstatsoft.org/v45/i03/
Xiao, Y., Tian, Z., Guo, W.: Empirical likelihood for partially non linear models with missing response variables at random. Commun. Stat. Theor. Methods 44(16), 3523–3540 (2015). https://doi.org/10.1080/03610926.2013.815211
Zhu, L.: Empirical likelihood for multidimensional linear model with missing responses. J. Probab. Stat. 473932 (2012). https://doi.org/10.1155/2012/473932

Author information

Authors and Affiliations

Department of Personnel Management and Labor Economics, Ukrainian State Employment Service Training Institute, Kyiv, Ukraine
Vasiliy Nadraga
Military-Diplomatic Academy named after Eugene Bereznyak, Kyiv, Ukraine
Volodymyr Smirnov, Oleksandra Boiko & Vladyslav Dereko

Corresponding author

Correspondence to Vasiliy Nadraga.

Editor information

Editors and Affiliations

Department of Informatics, Jan Evangelista Purkyně University in Ústí nad Labem, Ústí nad Labem, Czech Republic
Sergii Babichev
Department of Informatics and Computer Science, Kherson National Technical University, Kherson, Ukraine
Volodymyr Lytvynenko
Institute of Electronics and Information, Lublin University of Technology, Lublin, Poland
Waldemar Wójcik
Department of Informatics and Computer Science, Kherson National Technical University, Kherson, Ukraine
Svetlana Vyshemyrskaya

About this paper

Cite this paper

Nadraga, V., Smirnov, V., Boiko, O., Dereko, V. (2021). Comparison of Missing Values Handling Techniques Using MICE Package Tools of R Software and Logistic Regression Model. In: Babichev, S., Lytvynenko, V., Wójcik, W., Vyshemyrskaya, S. (eds) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020. Advances in Intelligent Systems and Computing, vol 1246. Springer, Cham. https://doi.org/10.1007/978-3-030-54215-3_3

DOI: https://doi.org/10.1007/978-3-030-54215-3_3
Published: 26 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54214-6
Online ISBN: 978-3-030-54215-3
eBook Packages: Intelligent Technologies and Robotics (R0)
