Abstract
One of the basic endeavors in Pattern Recognition and particularly in Data Mining is the process of determining which unlabeled objects in a set do share interesting properties. This implies a singular process of classification usually denoted as "clustering", where the objects are grouped into k subsets (clusters) in accordance with an appropriate measure of likelihood. Clustering can be considered the most important unsupervised learning problem. The more traditional clustering methods are based on the minimization of a similarity criteria based on a metric or distance. This fact imposes important constraints on the geometry of the clusters found. Since each element in a cluster lies within a radial distance relative to a given center, the shape of the covering or hull of a cluster is hyper-spherical (convex) which sometimes does not encompass adequately the elements that belong to it. For this reason we propose to solve the clustering problem through the optimization of Shannon’s Entropy. The optimization of this criterion represents a hard combinatorial problem which disallows the use of traditional optimization techniques, and thus, the use of a very efficient optimization technique is necessary. We consider that Genetic Algorithms are a good alternative. We show that our method allows to obtain successfull results for problems where the clusters have complex spatial arrangements. Such method obtains clusters with non-convex hulls that adequately encompass its elements. We statistically show that our method displays the best performance that can be achieved under the assumption of normal distribution of the elements of the clusters. We also show that this is a good alternative when this assumption is not met.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
Casella, G., Robert, C.P.: Monte carlo statistical methods (1999)
De Sa, J.M.: Pattern recognition: concepts, methods, and applications. Springer (2001)
Digalakis, J., Margaritis, K.: An experimental study of benchmarking functions for genetic algorithms (2002)
Duda, R., Hart, P., Stork, D.: Pattern classification, Section 10, P. 6. John Wiley, New York (2001)
Eshelman, L.: The chc adaptive search algorithm. how to have safe search when engaging in nontraditional genetic recombination (1991)
Gallager, R.G.: Information theory and reliable communication (1968)
Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. Journal of Intelligent Information Systems 17(2), 107–145 (2001)
Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. (1998)
Hsu, C.W., Chang, C.C., Lin, C.J., et al.: A practical guide to support vector classification (2003)
Johnson, J.L.: Probability and statistics for computer science. Wiley Online Library (2003)
Kim, J.H., Myung, H.: Evolutionary programming techniques for constrained optimization problems (1997)
Kuri-Morales, A.: A statistical genetic algorithm (1999)
Kuri-Morales, A., Aldana-Bobadilla, E.: A comprehensive comparative study of structurally different genetic algorithms (sent for publication, 2013)
Kuri-Morales, A., Aldana-Bobadilla, E.: The Search for Irregularly Shaped Clusters in Data Mining. Kimito, Funatsu and Kiyoshi, Hasegawa (2011)
MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, California, USA, vol. 1, p. 14 (1967)
Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press (1996)
Mitchell, M., Holland, J., Forrest, S.: When Will a Genetic Algorithm Outperform Hill Climbing? In: Advances of Neural Information Processing Systems 6, pp. 51–58. Morgan Kaufmann (1994)
Molga, M., Smutnicki, C.: Test functions for optimization needs, pp. 41–42 (2005), http://www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf (retrieved March 11, 2012)
Pohlheim, H.: Geatbx: Genetic and evolutionary algorithm toolbox for use with matlab documentation (2012)
Rezaee, J., Hashemi, A., Nilsaz, N., Dezfouli, H.: Analysis of the strategies in heuristic techniques for solving constrained optimisation problems (2012)
Rudolph, G.: Convergence Analysis of Canonical Genetic Algorithms. IEEE Transactions on Neural Networks 5(1), 96–101 (1994)
Rumelhart, D.E., Hintont, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
Sánchez-Ferrero, G., Arribas, J.: A statistical-genetic algorithm to select the most significant features in mammograms (2007)
Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review 5(1), 3–55 (2001)
Steliga, K., Szynal, D.: On markov-type inequalities (2010)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Aldana-Bobadilla, E., Kuri-Morales, A. (2013). Unsupervised Classifier Based on Heuristic Optimization and Maximum Entropy Principle. In: Emmerich, M., et al. EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation IV. Advances in Intelligent Systems and Computing, vol 227. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-01128-8_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-01128-8_2
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-01127-1
Online ISBN: 978-3-319-01128-8
eBook Packages: EngineeringEngineering (R0)