Abstract
Partially missing datasets are a prevalent problem in data analysis. Since several reasons for missing attribute values can be distinguished, we suggest a differentiated treatment of this common problem. For datasets in which feature values are missing completely at random, a variety of approaches have been proposed. In other situations, however, the fact that values are missing provides additional information for the classification of the dataset. Since the known approaches cannot exploit this information, we developed an extension of the Gath and Geva algorithm that can utilize it. We introduce a class-specific probability for missing values in order to assign incomplete data points to clusters appropriately. Benchmark datasets are used to demonstrate the capability of the presented approach.
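The idea of treating missingness itself as evidence can be illustrated with a small, hypothetical sketch. It is not the authors' algorithm, whose update equations are not given here. With fuzzifier m = 2, Gath and Geva memberships are proportional to the cluster prior times a Gaussian density. The sketch keeps that form, evaluates the density only on the observed features, and multiplies in a per-cluster, per-feature probability `miss_prob[i][j]` that feature j is missing in cluster i. The function name `membership`, the parameter names, and the use of diagonal covariances (instead of the full fuzzy covariance matrices of Gath and Geva) are simplifying assumptions made for brevity.

```python
import math
from typing import List, Optional, Sequence


def membership(x: Sequence[Optional[float]],
               priors: Sequence[float],
               means: Sequence[Sequence[float]],
               variances: Sequence[Sequence[float]],
               miss_prob: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical membership of one data point in each cluster.

    x            -- feature vector; None marks a missing value
    priors[i]    -- a priori probability of cluster i
    means[i][j]  -- cluster centre, feature j
    variances[i][j] -- diagonal covariance entry (simplifying assumption)
    miss_prob[i][j] -- assumed probability that feature j is missing
                       in cluster i (must lie strictly between 0 and 1
                       for features that can be missing or observed)
    """
    log_scores = []
    for prior, mu, var, q in zip(priors, means, variances, miss_prob):
        s = math.log(prior)
        for xj, m, v, qj in zip(x, mu, var, q):
            if xj is None:
                # The fact that the value is missing is evidence for
                # clusters in which this feature is often missing.
                s += math.log(qj)
            else:
                # Observed value: probability of being observed times
                # the univariate Gaussian density of the observed value.
                s += math.log(1.0 - qj)
                s += -0.5 * math.log(2.0 * math.pi * v) - (xj - m) ** 2 / (2.0 * v)
        log_scores.append(s)

    # Normalize in log space (log-sum-exp) for numerical stability.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Point halfway between two clusters on feature 0; feature 1 missing.
    # Feature 1 is assumed to be missing far more often in cluster 0.
    u = membership(
        x=[1.0, None],
        priors=[0.5, 0.5],
        means=[[0.0, 0.0], [2.0, 0.0]],
        variances=[[1.0, 1.0], [1.0, 1.0]],
        miss_prob=[[0.01, 0.5], [0.01, 0.05]],
    )
    print(u)
```

In the usage example the observed feature is equally far from both centres, so only the missingness probabilities differ between the clusters. The point is therefore assigned to cluster 0 with membership 0.5 / (0.5 + 0.05) = 10/11. An approach that ignores why values are missing would split the point evenly between the two clusters.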
References
Aeberhard, S., Coomans, D., and de Vel, O.: Comparison of Classifiers in High Dimensional Settings. Tech. Rep. 92-02, Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland, 1992.
Bezdek, J.C. and Pal, S.K. (eds.): Fuzzy Models for Pattern Recognition: Methods That Search for Structures in Data. IEEE Press, Piscataway, 1992.
Bezdek, J.C., Keller, J., Krishnapuram, R., and Pal, N.R.: Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Kluwer, Boston, London, 1999.
Dixon, J.K.: Pattern Recognition with Partly Missing Data. IEEE Transactions on Systems, Man, and Cybernetics, 9(6), 617–621, 1979.
Gath, I. and Geva, A.B.: Unsupervised Optimal Fuzzy Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11, 773–781, 1989.
Hathaway, R.J. and Bezdek, J.C.: Fuzzy c-Means Clustering of Incomplete Data. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 31(5), 735–744, 2001.
Höppner, F., Klawonn, F., Kruse, R., and Runkler, T.: Fuzzy Cluster Analysis. Wiley, Chichester, New York, 1999.
Little, R.J.A. and Rubin, D.B.: Statistical Analysis with Missing Data. John Wiley and Sons, New York, 1987.
Schafer, J.L.: Analysis of Incomplete Multivariate Data. Chapman & Hall, London, 1997.
Timm, H. and Klawonn, F.: Classification of Data with Missing Values. Proc. 6th European Congress on Intelligent Techniques and Soft Computing (EUFIT '98), 1304–1308, Aachen, Germany, 1998.
Timm, H. and Kruse, R.: Fuzzy Cluster Analysis with Missing Values. Proc. 17th International Conf. of the North American Fuzzy Information Processing Society (NAFIPS '98), 242–246, Pensacola, FL, USA, 1998.
Timm, H. and Klawonn, F.: Different Approaches for Fuzzy Cluster Analysis with Missing Values. Proc. 7th European Congress on Intelligent Techniques and Soft Computing, Aachen, Germany, 1999.
Timm, H., Döring, C., and Kruse, R.: Fuzzy Cluster Analysis of Partially Missing Datasets. Proc. European Symposium on Intelligent Technologies, Hybrid Systems and Their Implementation on Smart Adaptive Systems (EUNITE 2002), Albufeira, Portugal, 2002.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Timm, H., Döring, C., Kruse, R. (2003). Differentiated Treatment of Missing Values in Fuzzy Clustering. In: Bilgiç, T., De Baets, B., Kaynak, O. (eds) Fuzzy Sets and Systems — IFSA 2003. IFSA 2003. Lecture Notes in Computer Science, vol 2715. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44967-1_42
DOI: https://doi.org/10.1007/3-540-44967-1_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40383-8
Online ISBN: 978-3-540-44967-6
eBook Packages: Springer Book Archive