Abstract
This chapter describes the missing data problem in detail, including the different missing data patterns and mechanisms. It then discusses classical missing data techniques, followed by machine learning approaches to the problem. Finally, it presents machine learning optimization techniques for missing data estimation tasks.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Leke, C.A., Marwala, T. (2019). Introduction to Missing Data Estimation. In: Deep Learning and Missing Data in Engineering Systems. Studies in Big Data, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-030-01180-2_1
DOI: https://doi.org/10.1007/978-3-030-01180-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01179-6
Online ISBN: 978-3-030-01180-2
eBook Packages: Intelligent Technologies and Robotics (R0)