Introduction to Missing Data Estimation

Leke, Collins Achepsah; Marwala, Tshilidzi

doi:10.1007/978-3-030-01180-2_1

Part of the book series: Studies in Big Data (SBD, volume 48)

Abstract

This chapter describes the problem of missing data in detail, together with the different missing data patterns and mechanisms. It then discusses classical missing data techniques, followed by machine learning approaches to the missing data problem. Finally, it presents machine learning optimization techniques for missing data estimation tasks.
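
Two classical missing data techniques that are commonly discussed in this literature are listwise deletion (discarding every record with a missing value) and mean imputation (replacing each missing value with the mean of the observed values in its column). The sketch below illustrates both on a small table. It assumes missing entries are encoded as None, and the function names listwise_deletion and mean_imputation are hypothetical illustrations, not code from the chapter.

```python
from statistics import mean
from typing import List, Optional

# Hypothetical encoding: a record is a list of floats, with None marking a missing value.
Row = List[Optional[float]]


def listwise_deletion(rows: List[Row]) -> List[Row]:
    """Keep only the records in which every value is observed."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_imputation(rows: List[Row]) -> List[List[float]]:
    """Replace each missing entry with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # statistics.mean raises StatisticsError if a column has no observed values.
        col_means.append(mean(observed))
    # Build new rows so the input table is left unchanged.
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


data = [
    [1.0, 2.0, 3.0],
    [4.0, None, 6.0],
    [7.0, 8.0, None],
    [None, 5.0, 9.0],
]

print(listwise_deletion(data))  # [[1.0, 2.0, 3.0]]
print(mean_imputation(data))    # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 6.0], [4.0, 5.0, 9.0]]
```

The example shows the trade-off between the two: listwise deletion keeps only one of the four records, while mean imputation keeps all of them but fills every gap in a column with the same value, which ignores relationships between variables and reduces that column's variance.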

Author information

Authors and Affiliations

Faculty of Engineering and Built Environment, University of Johannesburg, Auckland Park, South Africa
Collins Achepsah Leke & Tshilidzi Marwala

Corresponding author

Correspondence to Collins Achepsah Leke.

About this chapter

Cite this chapter

Leke, C.A., Marwala, T. (2019). Introduction to Missing Data Estimation. In: Deep Learning and Missing Data in Engineering Systems. Studies in Big Data, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-030-01180-2_1

DOI: https://doi.org/10.1007/978-3-030-01180-2_1
Published: 14 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01179-6
Online ISBN: 978-3-030-01180-2
eBook Packages: Intelligent Technologies and Robotics (R0)